WO2019000832A1 - A voiceprint creation and registration method and apparatus (一种声纹创建与注册方法及装置) - Google Patents
- Publication number
- WO2019000832A1 (PCT/CN2017/113772)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- voiceprint
- registration
- voice
- module
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
Definitions
- the present application relates to the field of artificial intelligence applications, and in particular, to a voiceprint creation and registration method and apparatus.
- Artificial Intelligence is a new technical science that studies and develops theories, methods, techniques, and applications for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that responds in a manner similar to human intelligence. Research in this area includes robotics, speech recognition, image recognition, natural language processing, and expert systems. One aspect of artificial intelligence is voiceprint recognition technology.
- An advantage of voice dialogue is that it can record the user's voice.
- Everyone's voice is unique, just like a fingerprint, so a person's voice is also called a voiceprint.
- Through a speaker's voiceprint, it is possible to determine who the speaker is.
- In this way, a device can identify the user and retrieve the user's data to provide a personalized service.
- However, voiceprint technology in the industry is immature, and it is difficult to meet the requirements of productization.
- In addition, existing voiceprint creation and registration methods have a high learning cost and are disruptive to the user.
- aspects of the present application provide a voiceprint creation and registration method and apparatus for providing personalized services to users and reducing learning costs.
- a voiceprint creation and registration method including:
- when the device is first enabled, the user is prompted to create and register a voiceprint;
- the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
- the user's voiceprint model is generated based on the gender tag and voice.
- a method for creating and registering a voiceprint including:
- if the user ID is not recognized, the user is prompted to create and register a voiceprint;
- the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
- acquiring the voice request sent by the user further includes:
- the prompting to create a voiceprint and registering includes:
- the voiceprint model for which no user ID is recognized is marked with an ID number;
- a user ID is generated; the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
- the prompting to create a voiceprint and registering includes:
- a text-related training method is used to establish a voiceprint model for a user who does not recognize the user ID.
- the text-related training method is used to establish a voiceprint model for a user who does not recognize the user ID, including:
- the user's voiceprint model is generated based on the gender tag and voice.
- a voiceprint creation and registration apparatus comprising:
- a prompt module a voiceprint creation module, an input module, and a registration module;
- the prompting module is configured to promptly create a voiceprint and register when the device is first enabled
- the voiceprint establishing module is configured to establish a voiceprint model for a user by using a text-related training method
- the input module is configured to generate a user ID
- the registration module is configured to store the user ID and the voiceprint model correspondingly to the voiceprint registration database.
- the voiceprint establishing module specifically includes the following sub-modules:
- a receiving submodule configured to receive voice information that the user reads the registration string
- a sub-module is generated for generating a user's voiceprint model based on gender tags and voice.
- a voiceprint creation and registration apparatus comprising:
- the obtaining module is configured to acquire a voice request sent by a user
- the voiceprint recognition module is configured to identify a user ID that issues a voice request according to the voice request, by using a voiceprint recognition method;
- the prompting module is configured to prompt an unregistered user to create a voiceprint and register
- the input module is configured to generate a user ID
- the registration module is configured to store the user ID and the voiceprint model correspondingly to the voiceprint registration database.
- the voiceprint model for which no user ID is recognized is marked with an ID number;
- a user ID is generated; the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
- a text-related training method is used to create a voiceprint model for unregistered users.
- the prompting module includes the following submodules:
- a receiving submodule configured to receive voice information that the user reads the registration string
- a sub-module is generated for generating a user's voiceprint model based on gender tags and voice.
- an apparatus comprising:
- one or more processors;
- a storage device for storing one or more programs
- the one or more programs are executed by the one or more processors such that the one or more processors implement any of the methods described above.
- a computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements any of the above methods.
- the embodiment of the present application can avoid the problem that the voiceprint recognition method in the prior art has strong technical dependence, a single use strategy, and a low degree of productization. It has a high technical fault tolerance rate, speeds up productization, and provides users with personalized services.
- FIG. 1 is a schematic flowchart of a voiceprint creation and registration method according to an embodiment of the present application
- FIG. 2 is a schematic flowchart of a text-related training method in a voiceprint creation and registration method according to an embodiment of the present invention, and a voiceprint model is established for an unregistered user;
- FIG. 3 is a schematic flowchart of a voiceprint creation and registration method according to another embodiment of the present disclosure.
- FIG. 4 is a schematic flowchart of identifying, according to a voice request and by using a voiceprint recognition method, the user ID that issued the voice request, in a voiceprint creation and registration method according to another embodiment of the present application;
- FIG. 5 is a schematic flowchart of prompting an unregistered user to create a voiceprint and registering in a voiceprint creation and registration method according to another embodiment of the present disclosure
- FIG. 6 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present disclosure.
- FIG. 7 is a schematic structural diagram of a voiceprint establishing module of a voiceprint creating and registering apparatus according to an embodiment of the present disclosure
- FIG. 8 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present disclosure.
- FIG. 9 is a schematic structural diagram of a prompting module of a voiceprint creating and registering device according to another embodiment of the present disclosure.
- FIG. 10 is a block diagram of an exemplary computer system/server suitable for use in implementing embodiments of the present invention.
- For an intelligent voice interaction device, there is a companion application, MateAPP, on the mobile terminal that cooperates with the intelligent voice interaction device to complete a series of tasks.
- a "voice management" function module is created on MateAPP, in which the user can create, delete and modify the voiceprint under the account.
- FIG. 1 is a schematic flowchart of a voiceprint creation and registration method according to an embodiment of the present application. As shown in Figure 1, the following steps are included:
- the device when the device is first enabled, it prompts to create a voiceprint and register;
- when the device is powered on for the first time, the user is prompted to register at least one voiceprint ID through MateAPP and to confirm relevant identity information, such as name, age, and gender.
- the user creates a voiceprint through MateAPP or by voice expressing the will to create a voiceprint.
- a text-related training method is used to establish a voiceprint model for the user; specifically, as shown in FIG. 2, the following sub-steps are included:
- the registration string is provided to the user.
- the registration string can be in many forms:
- the registration string can be a randomly generated string of numbers.
- each digit in the registration string appears only once.
- the registration string can be a randomly generated Chinese character string.
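The digit-string variant described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the string length of 8 is an assumption, since the patent does not fix one.

```python
import random

def generate_registration_string(length=8):
    """Randomly pick distinct digits so that no digit repeats,
    matching the constraint that each digit appears only once.
    The length parameter is an illustrative assumption."""
    if not 1 <= length <= 10:
        raise ValueError("at most 10 distinct digits are available")
    return "".join(random.sample("0123456789", length))

print(generate_registration_string())  # e.g. "73920615"
```

Sampling without replacement guarantees the "each digit appears only once" property by construction.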
- the voice information of the user reading the registration string is received.
- the user can read the provided registration string aloud multiple times to generate a plurality of voices for registration.
- specifically, the voice information generated when the user reads the provided registration string aloud may be received.
- the user's gender tag is determined based on the gender classifier and voice.
- the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
- the gender tag includes male or female.
- the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
- the gender classifier analyzes the first feature information, and obtains the gender tag of the first feature information, that is, the gender tag of the user.
- specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may be extracted first, and a posterior probability value may then be calculated for these features against a Gaussian mixture model.
- the gender of the user is determined according to the calculation result. For example, if the Gaussian mixture model is a male Gaussian mixture model, and the calculated posterior probability value is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability value is low (less than the threshold), the gender of the user may be determined to be female.
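The thresholding decision described above can be sketched numerically. This is a toy, one-dimensional stand-in for the real MFCC/fundamental-frequency features: the model parameters and the threshold are illustrative assumptions, not values from the patent.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a 1-D GMM
    (a toy stand-in for scoring MFCC features against the model)."""
    total = 0.0
    for x in frames:
        total += math.log(sum(
            w * math.exp(gaussian_logpdf(x, m, v))
            for w, m, v in zip(weights, means, variances)))
    return total / len(frames)

def classify_gender(frames, male_gmm, threshold=-2.0):
    """High likelihood under the male model -> 'male', else 'female'."""
    return "male" if gmm_avg_loglik(frames, *male_gmm) > threshold else "female"

# Toy "male" model: two components; all numbers are illustrative.
male_gmm = ([0.5, 0.5], [0.0, 1.0], [0.5, 0.5])
print(classify_gender([0.1, 0.4, 0.9], male_gmm))  # frames near the model -> "male"
```

A production system would score multi-dimensional feature vectors against trained male and female models; only the threshold comparison is what this sketch illustrates.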
- a user's voiceprint model is generated based on the gender tag and voice.
- a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
- the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
- a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
- each posterior probability is normalized according to a unified background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to each voice and its normalized posterior probability.
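The data flow of the step above can be sketched as posterior-weighted pooling. This is only an illustration under assumed shapes: the real system uses a trained DNN, UBM statistics, and a trained extraction model, whereas here the normalization is approximated by simple renormalization and the "second feature vector" by weighted first-order statistics.

```python
def normalize_posteriors(posteriors):
    """Renormalize each frame's posterior vector to sum to 1 (a stand-in for
    the UBM-based normalization, whose exact form the patent leaves open)."""
    return [[p / sum(row) for p in row] for row in posteriors]

def extract_second_feature_vector(frames, posteriors):
    """Pool per-frame features into one fixed-length vector, weighting each
    frame by its normalized posterior for each component."""
    post = normalize_posteriors(posteriors)
    n_frames, n_comp, dim = len(frames), len(post[0]), len(frames[0])
    counts = [sum(post[t][c] for t in range(n_frames)) for c in range(n_comp)]
    vector = []
    for c in range(n_comp):
        for d in range(dim):
            stat = sum(post[t][c] * frames[t][d] for t in range(n_frames))
            vector.append(stat / counts[c])  # weighted mean per component/dim
    return vector

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]      # toy per-frame features
posteriors = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]  # toy per-frame DNN posteriors
vec = extract_second_feature_vector(frames, posteriors)  # length = components * dims
```

The result has a fixed length regardless of the number of frames, which is the property the registration step depends on.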
- a user ID is generated, and the user is prompted to input user ID related data such as name, gender, age, hobby, home address, and work address.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that the voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
- the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
- each intelligent voice interaction device is uniquely bound to an account, and the device is associated with voiceprints through the account. A voiceprint can be registered from any device under the account, and once registered, it can be used on any intelligent voice device under that account.
- when a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account to recognize the voiceprint ID. This unifies device, account, and voiceprint, and realizes an end-to-end voiceprint collection and recognition solution.
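The matching of a collected voiceprint against the voiceprint set under the same account can be sketched as a nearest-neighbor search with a similarity threshold. Cosine similarity and the threshold value are assumptions for illustration; the patent does not specify the matching metric.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_voiceprint(query, account_voiceprints, threshold=0.8):
    """Return the best-matching user ID in the account's voiceprint set,
    or None if nothing is similar enough (i.e. an unregistered user)."""
    best_id, best_score = None, threshold
    for user_id, model in account_voiceprints.items():
        score = cosine(query, model)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id

# Hypothetical account with two registered voiceprint models.
account = {"dad": [0.9, 0.1, 0.2], "mom": [0.1, 0.9, 0.3]}
print(match_voiceprint([0.88, 0.12, 0.21], account))  # -> dad
```

Returning None is what triggers the "prompt an unregistered user to create a voiceprint" branch described later in the document.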
- FIG. 3 is a schematic flowchart of a voiceprint creation and registration method according to another embodiment of the present application. As shown in FIG. 3, the method includes the following steps:
- after the intelligent voice interaction device is connected to the network, the user performs voice interaction with the intelligent voice interaction device; the device determines whether the voice request needs to be sent to the cloud, and if so, further identifies the user ID that sent the voice request.
- specifically, voice recognition is first performed on the voice request to obtain the command described by the command voice, and the vertical class to which the command corresponds is determined. If the vertical class does not need the user ID to provide a personalized recommendation, the voice request is responded to directly; if the vertical class needs the user ID to provide a personalized recommendation, the user ID that issued the voice request is further identified.
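The routing decision above can be sketched as a lookup: only commands whose vertical class needs personalization go through voiceprint identification. The set of vertical class names is an illustrative assumption.

```python
# Hypothetical vertical classes that need a user ID for personalization.
PERSONALIZED_VERTICALS = {"music", "news", "shopping"}

def needs_user_id(vertical):
    return vertical in PERSONALIZED_VERTICALS

def handle_request(command, vertical, identify_user, respond):
    """Respond directly for non-personalized verticals; otherwise run the
    voiceprint identification path first to obtain the user ID."""
    if needs_user_id(vertical):
        user_id = identify_user(command)   # voiceprint recognition path
        return respond(command, user_id)
    return respond(command, None)          # answer directly, no user ID

# Usage: a personalized request triggers identification, a plain one does not.
uid = handle_request("play jazz", "music",
                     identify_user=lambda c: "user-42",
                     respond=lambda c, u: u)
```

This keeps voiceprint recognition off the critical path for requests that do not benefit from it.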
- the voiceprint identification method is used to identify the user ID that sends the voice request; specifically, as shown in FIG. 4, the following sub-steps are included:
- a voiceprint identification method is used to identify a user gender tag that issues a voice request.
- model training can be performed according to the voice characteristics of the user group to realize voiceprint analysis for user groups of different genders.
- the voiceprint recognition mode is used to identify the gender information of the user who issues the voice request.
- the voiceprint of the speaker needs to be modeled, that is, "training” or “learning”. Specifically, the first feature vector of each voice in the training set is extracted by applying the deep neural network DNN voiceprint baseline system; and the gender classifier is trained according to the first feature vector of each voice and the pre-labeled gender label. Thus, a gender-based voiceprint processing model is established.
- the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the command voice.
- specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may first be extracted from the voice request, and a posterior probability value may then be calculated for these features against a Gaussian mixture model.
- the gender of the user is determined according to the calculation result. For example, if the Gaussian mixture model is a male Gaussian mixture model, and the calculated posterior probability value is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability value is low (less than the threshold), the gender of the user is determined to be female.
- the user voiceprint ID that issued the command voice is further identified.
- Each user's voice will have a unique voiceprint ID that records personal data such as the user's name, gender, age, and hobbies.
- the voice input by the user is sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice request returned by the gender classifier. That is, if the voice request corresponds to a male voice, the voice is sent to the male DNN model. If the voice request corresponds to a female voice, the voice is sent to the female DNN model.
- each posterior probability is normalized according to a unified background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to each voice and its normalized posterior probability.
- there are many ways to obtain the voiceprint model from the second feature vectors, and the manner may be selected according to different application requirements; for example:
- the voiceprint is created and registered
- the non-text related training method is used to establish a voiceprint model for the unregistered user and register.
- the obtained voiceprint model of the unregistered user is marked with an ID number
- the user ID is generated, and the user is prompted to input data related to the user ID such as name, gender, age, hobby, home address, work address, and the like.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
- the interruption to the user can be minimized, and the voiceprint can be created only for the frequently used home users. Specifically:
- the voiceprint model for which no user ID is recognized is marked with an ID number, but no user ID is generated and the user is not prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address; only the behavior of the user to whom the ID number belongs is recorded in the background.
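The "minimize interruption" strategy above can be sketched as follows: an unrecognized voiceprint gets an anonymous ID whose activity is counted in the background, and only a frequent user is eventually prompted to register. The prompt threshold is an illustrative assumption.

```python
from collections import Counter

class DeferredRegistration:
    """Track background activity per anonymous voiceprint ID and signal when
    a frequent user should be prompted to create a user ID and register."""

    def __init__(self, prompt_threshold=5):
        self.activity = Counter()
        self.prompt_threshold = prompt_threshold

    def record(self, anonymous_id):
        """Record one interaction; return True exactly when the activity
        count reaches the threshold, i.e. the moment to prompt the user."""
        self.activity[anonymous_id] += 1
        return self.activity[anonymous_id] == self.prompt_threshold

reg = DeferredRegistration(prompt_threshold=3)
results = [reg.record("vp-001") for _ in range(3)]  # [False, False, True]
```

Infrequent visitors never cross the threshold, so they are never interrupted, which matches the intent described above.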
- a user ID is generated, prompting the user to input user ID related data such as name, gender, age, hobby, home address, work address, and the like.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
- a text-related training method is used to establish a voiceprint model for a user whose user ID is not recognized; when the voiceprint technology is not perfect, the text-related training method may be used to improve the recognition rate.
- using a text-related training method to establish a voiceprint model for a user who does not recognize the user ID includes the following sub-steps:
- the registration string is provided to the user.
- the registration string can be in many forms:
- the registration string can be a randomly generated string of numbers.
- each digit in the registration string appears only once.
- the registration string can be a randomly generated Chinese character string.
- the voice information of the user reading the registration string is received.
- the user can perform a plurality of readings in accordance with the provided registration string for registration.
- the voice information generated by the user to read aloud according to the provided registration string may be received.
- the user's gender tag is determined based on the gender classifier and voice.
- the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
- the gender tag includes male or female.
- the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
- the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
- specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may be extracted first, and a posterior probability value may then be calculated for these features against a Gaussian mixture model.
- the gender of the user is determined according to the calculation result. For example, if the Gaussian mixture model is a male Gaussian mixture model, and the calculated posterior probability value is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability value is low (less than the threshold), the gender of the user may be determined to be female.
- a user's voiceprint model is generated based on the gender tag and voice.
- a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
- the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
- a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
- each posterior probability is normalized according to a unified background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to each voice and its normalized posterior probability.
- there are many ways to obtain the voiceprint model from the second feature vectors, and the manner may be selected according to different application requirements; for example:
- a user ID is generated, and the user is prompted to input user ID related data such as name, gender, age, hobby, home address, and work address.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition can subsequently be performed according to the voiceprint model, and voice control can be performed on any intelligent voice device under the account.
- the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
- each intelligent voice interaction device is uniquely bound to an account, and the device is associated with voiceprints through the account. A voiceprint can be registered from any device under the account, and once registered, it can be used on any intelligent voice device under that account.
- when a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account to recognize the voiceprint ID. This unifies device, account, and voiceprint, and realizes an end-to-end voiceprint collection and recognition solution.
- the user can log in by MateAPP and modify the user ID and the voiceprint model.
- the method in this embodiment can avoid the problems of prior-art voiceprint creation and registration methods, which have a high learning cost and disturb the user.
- a gender-based voiceprint registration process is implemented, so that the gender-specific voiceprint authentication processing model can improve the efficiency and accuracy of voiceprint authentication. The voiceprint establishment process covers various scenes: the user can be guided to establish a voiceprint at various stages, or voiceprint establishment can be separated from registration according to usage frequency, minimizing interruption of the user. The user is guided to register a voiceprint, after which the voice interaction product can provide personalized service to the user based on the voiceprint.
- FIG. 6 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present invention. As shown in FIG. 6, the apparatus includes a prompting module 61, a voiceprint establishing module 62, an input module 63, and a registration module 64.
- the prompting module 61 is configured to prompt to create a voiceprint and register when the device is first enabled
- when the device is powered on for the first time, the user is guided to register at least one voiceprint ID through MateAPP and to confirm relevant identity information, such as name, age, and gender.
- the user creates a voiceprint through MateAPP or by voice expressing the will to create a voiceprint.
- the voiceprint establishing module 62 is configured to establish a voiceprint model for the user by using a text-related training method; specifically, as shown in FIG. 7, the following sub-module is included:
- a sub-module 71 is provided for providing a registration string to the user.
- the registration string can be in many forms:
- the registration string can be a randomly generated string of numbers.
- each digit in the registration string appears only once.
- the registration string can be a randomly generated Chinese character string.
- the receiving sub-module 72 is configured to receive voice information that the user reads the registration string.
- the user can perform a plurality of readings according to the provided registration string to generate a plurality of voices for registration.
- the voice information generated by the user to read aloud according to the provided registration string may be received.
- the determining sub-module 73 is configured to determine the gender tag of the user according to the gender classifier and the voice.
- the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
- the gender tag includes male or female.
- the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
- the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
- specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may be extracted first, and a posterior probability value may then be calculated for these features against a Gaussian mixture model.
- the gender of the user is determined according to the calculation result. For example, if the Gaussian mixture model is a male Gaussian mixture model, and the calculated posterior probability value is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability value is low (less than the threshold), the gender of the user may be determined to be female.
- the generating sub-module 74 is configured to generate a voiceprint model of the user according to the gender tag and the voice.
- a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
- the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
- a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
- each posterior probability is normalized according to a unified background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to each voice and its normalized posterior probability.
- there are many ways to obtain the voiceprint model from the second feature vectors, and the manner may be selected according to different application requirements; for example:
- the input module 63 is configured to generate a user ID, and prompt the user to input user ID related data such as name, gender, age, hobby, home address, and work address.
- the registration module 64 is configured to store the user ID and the voiceprint model correspondingly under a certain account of the voiceprint registration database, so that voiceprint recognition can subsequently be performed according to the voiceprint model, and voice control can be performed on any intelligent voice device under the account.
- the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
- each intelligent voice interaction device is uniquely bound to an account, and the device is associated with voiceprints through the account. A voiceprint can be registered from any device under the account, and once registered, it can be used on any intelligent voice device under that account.
- when a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account to recognize the voiceprint ID. This unifies device, account, and voiceprint, and realizes an end-to-end voiceprint collection and recognition solution.
- FIG. 8 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present application. As shown in FIG. 8, the apparatus includes the following modules:
- the obtaining module 81 is configured to obtain a voice request sent by the user.
- After the intelligent voice interaction device is connected to the network, the user performs voice interaction with it. The device determines whether a voice request needs to be sent to the cloud; if so, the user ID that issued the voice request is further identified.
- Speech recognition is first performed on the voice request to obtain the command it describes, and the vertical class corresponding to the command is determined. If the vertical class does not need the user ID to provide personalized recommendations, the voice request is responded to directly; if it does, the user ID that issued the voice request is further identified.
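The dispatch decision described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the vertical-class names, the `PERSONALIZED_VERTICALS` set, and the local-command list are assumptions made for the example.

```python
# Hypothetical sketch of the dispatch logic: on-device commands never reach
# the cloud, and only verticals that personalize results trigger
# voiceprint-based user identification. Names below are illustrative.

PERSONALIZED_VERTICALS = {"music", "recommendation", "shopping"}
LOCAL_COMMANDS = {"volume up", "volume down", "pause"}

def needs_user_id(command: str, vertical: str) -> bool:
    """Return True when the request should trigger voiceprint user-ID lookup."""
    if command in LOCAL_COMMANDS:   # handled on-device, never sent to the cloud
        return False
    return vertical in PERSONALIZED_VERTICALS
```

A music request ("play jazz") would be identified, while a purely informational vertical such as a clock query would be answered directly.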
- the voiceprint recognition module 82 is configured to identify a user ID that issues a voice request according to the voice request, and specifically includes the following submodules:
- the user gender identification sub-module is configured to identify, by means of voiceprint recognition, the gender tag of the user who issued the voice request.
- model training can be performed according to the voice characteristics of the user group to realize voiceprint analysis for user groups of different genders.
- the voiceprint recognition mode is used to identify the gender information of the user who issued the voice request.
- The voiceprint of the speaker first needs to be modeled, that is, "trained" or "learned". Specifically, the first feature vector of each voice in the training set is extracted by applying the deep neural network (DNN) voiceprint baseline system, and the gender classifier is trained according to the first feature vector of each voice and the pre-labeled gender label, thus establishing a gender-based voiceprint processing model.
- The gender classifier analyzes the first feature information and obtains the gender tag of the first feature information, that is, the gender tag of the command voice.
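The training-then-tagging flow above can be illustrated with a deliberately simple stand-in classifier. This is not the patent's model: the nearest-centroid rule and the synthetic 2-D "first feature vectors" are assumptions used only to make the train/classify split concrete.

```python
import numpy as np

# Minimal nearest-centroid sketch of a gender classifier: feature vectors
# (stand-ins for DNN baseline embeddings) plus gender labels train the model;
# at recognition time the nearest centroid supplies the gender tag.

def train_gender_classifier(vectors, labels):
    """Return one centroid per gender label."""
    return {g: vectors[[lab == g for lab in labels]].mean(axis=0)
            for g in set(labels)}

def classify_gender(model, vector):
    """Tag a new first feature vector with the nearest gender centroid."""
    return min(model, key=lambda g: np.linalg.norm(vector - model[g]))
```

In practice the classifier would operate on high-dimensional embeddings, but the train/score separation is the same.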
- Specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may first be extracted from the voice request, and a posterior probability value may then be calculated for these features under a Gaussian mixture model; the gender of the user is determined according to the result. For example, if the Gaussian mixture model is a male Gaussian mixture model and the calculated posterior probability is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability is low (less than the threshold), the gender may be determined to be female.
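The threshold decision just described can be sketched as follows, assuming the pitch (fundamental frequency) feature has already been extracted. The 1-D Gaussian mixtures, their parameters, and the 0.5 threshold are illustrative assumptions, not values from the patent.

```python
import math

# Sketch of gender decision by posterior probability under a male vs. female
# Gaussian mixture over the fundamental-frequency feature.

def gmm_score(x, weights, means, stds):
    """Likelihood of a feature value under a 1-D Gaussian mixture."""
    return sum(w * math.exp(-((x - m) ** 2) / (2 * s * s))
               / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, stds))

def decide_gender(f0, male_gmm, female_gmm, threshold=0.5):
    """Male posterior from the two mixtures, thresholded as in the text."""
    male = gmm_score(f0, *male_gmm)
    female = gmm_score(f0, *female_gmm)
    posterior_male = male / (male + female)
    return "male" if posterior_male > threshold else "female"
```

With typical pitch ranges (male speech roughly 85-180 Hz, female roughly 165-255 Hz), a low fundamental frequency yields a high male posterior.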
- the user voiceprint ID identification sub-module is configured to identify the voiceprint ID of the user who issued the command voice, after the gender label of that user has been identified.
- Each user's voice will have a unique voiceprint ID that records personal data such as the user's name, gender, age, and hobbies.
- the voice input by the user is sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice request returned by the gender classifier. That is, if the voice request corresponds to a male voice, the voice is sent to the male DNN model. If the voice request corresponds to a female voice, the voice is sent to the female DNN model.
- Each posterior probability is normalized according to a unified background model corresponding to the gender label, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to that voice and its corresponding normalized posterior probability.
- The user's voiceprint model is then obtained from the second feature vectors; there are many ways of obtaining it, which may be selected according to different application requirements.
- If the matching value is less than a preset threshold, it is determined that the user is not registered, i.e. that this user is using the smart device for the first time.
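The matching step can be sketched as a best-match search over the family voiceprint set followed by the threshold test above. The cosine-similarity metric and the 0.7 threshold are illustrative assumptions; the patent does not specify the matching function.

```python
import math

# Sketch: compare a query voiceprint vector against every registered
# voiceprint under the account; a best match below the threshold means
# the speaker is treated as unregistered.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def match_voiceprint(query, family_set, threshold=0.7):
    """Return the best-matching voiceprint ID, or None if unregistered."""
    if not family_set:
        return None
    best_id = max(family_set, key=lambda vid: cosine(query, family_set[vid]))
    return best_id if cosine(query, family_set[best_id]) >= threshold else None
```

Returning `None` here is what triggers the prompting module's "create a voiceprint and register" flow.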
- the prompting module 83 is configured to prompt to create a voiceprint and register if the user ID is not recognized;
- The prompting module 83 may use a text-independent (non-text-related) training method to establish a voiceprint model for the user.
- The voiceprint model for which no user ID was recognized is marked with an ID number.
- A user ID is generated, and the user is prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address, and to register the voiceprint.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
- Alternatively, interruption of the user can be minimized by creating voiceprints only for frequently-appearing home users. Specifically:
- The voiceprint model for which no user ID was recognized is marked with an ID number, but no user ID is generated and the user is not prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address; the behavior of the user to whom the ID number belongs is recorded only in the background.
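This frequency gate (also recited in claim 6: delete rarely-seen provisional IDs, register frequent ones) can be sketched as a small counter. The threshold value and method names are illustrative assumptions.

```python
# Sketch of frequency-gated registration: unrecognized voiceprints are
# tracked under provisional ID numbers; only IDs that appear often enough
# trigger a registration prompt, and rare ones are pruned.

class ProvisionalVoiceprints:
    def __init__(self, threshold=5):
        self.threshold = threshold
        self.counts = {}              # provisional ID -> appearance count

    def observe(self, vid):
        """Record one appearance; return the action for this voiceprint."""
        self.counts[vid] = self.counts.get(vid, 0) + 1
        if self.counts[vid] >= self.threshold:
            return "prompt_registration"   # frequent user: generate a user ID
        return "keep_watching"             # keep recording behavior silently

    def prune(self):
        """Delete provisional IDs still below the threshold (rare visitors)."""
        self.counts = {v: c for v, c in self.counts.items()
                       if c >= self.threshold}
```

This way a guest who speaks to the device once never sees a registration prompt, while a household member eventually does.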
- Once the ID number appears frequently enough, a user ID is generated, and the user is prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address.
- the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
- Alternatively, the prompting module 83 adopts a text-related training method to establish and register a voiceprint model for the unrecognized user ID; while voiceprint technology is not yet perfect, the text-related training method improves the recognition rate. Specifically, as shown in FIG. 9, the following sub-modules are included:
- the providing sub-module 91 is configured to provide a registration string to the user.
- the registration string can be in many forms:
- The registration string can be a randomly generated digit string; furthermore, each digit may appear only once in the string.
- Alternatively, the registration string can be a randomly generated Chinese character string.
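The digit-string variant above, with every digit appearing at most once, maps directly onto sampling without replacement. The string length of 8 is an illustrative assumption; the patent does not fix a length.

```python
import random

# Sketch of the registration-string generator: a random digit string in
# which each digit appears only once, so no digit is enrolled twice in
# one reading.

def make_registration_string(length=8):
    """Random digit string with no repeated digits (length <= 10)."""
    return "".join(random.sample("0123456789", length))
```

`random.sample` draws without replacement, which is exactly the "each digit appears only once" property.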
- the receiving sub-module 92 is configured to receive voice information that the user reads the registration string.
- For registration, the user can read the provided registration string aloud a plurality of times.
- the voice information generated by the user to read aloud according to the provided registration string may be received.
- the determining sub-module 93 is configured to determine a gender tag of the user according to the gender classifier and the voice.
- the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
- the gender tag includes male or female.
- the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
- the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
- Specifically, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature may first be extracted from the voice, and a posterior probability value may then be calculated for these features under a Gaussian mixture model; the gender of the user is determined according to the result. For example, if the Gaussian mixture model is a male Gaussian mixture model and the calculated posterior probability is high (greater than a certain threshold), the gender of the user may be determined to be male; if the posterior probability is low (less than the threshold), the gender may be determined to be female.
- the generating sub-module 94 is configured to generate a voiceprint model of the user according to the gender tag and the voice.
- a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
- the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
- a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
- Each posterior probability is normalized according to a unified background model corresponding to the gender label, and a pre-trained feature vector extraction model is applied to extract the second feature vector of each voice according to that voice and its corresponding normalized posterior probability.
- The user's voiceprint model is then obtained from the second feature vectors; there are many ways of obtaining it, which may be selected according to different application requirements.
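The normalization step described above can be sketched numerically. This is a simplified stand-in, not the patent's extractor: dividing frame posteriors by a universal-background-model (UBM) prior and pooling frames by posterior weight are assumptions chosen to make the data flow visible.

```python
import numpy as np

# Sketch: per-frame DNN posteriors are rescaled by the gender-matched UBM
# prior, renormalized, and then used to weight the frames when pooling
# them into a single feature vector for the utterance.

def normalize_posteriors(posteriors, ubm_prior):
    """Divide per-frame posteriors by the UBM prior and renormalize rows."""
    scaled = posteriors / ubm_prior
    return scaled / scaled.sum(axis=1, keepdims=True)

def extract_vector(frames, norm_post):
    """Weighted average of frame features, using max posterior as weight."""
    weights = norm_post.max(axis=1)
    return (frames * weights[:, None]).sum(axis=0) / weights.sum()
```

Real systems extract an i-vector-style embedding here; the sketch only shows where the normalized posteriors enter the computation.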
- the input module 84 is configured to generate a user ID, and prompt the user to input user ID related data such as name, gender, age, hobby, home address, and work address.
- the registration module 85 is configured to store the user ID and the voiceprint model correspondingly under a certain account of the voiceprint registration database, so that voiceprint recognition can subsequently be performed according to the voiceprint model and voice control can be performed on any intelligent voice device under the account.
- the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
- Each intelligent voice interaction device is uniquely bound to an account, and the device is associated with voiceprints through that account. A voiceprint can be registered from any device under the account; once registered, it can be used on any intelligent voice device under the same account.
- When a device under a certain account collects a voiceprint, the voiceprint is matched within the family voiceprint collection under the same account and the voiceprint ID is recognized. This unifies device, account, and voiceprint, realizing an end-to-end voiceprint collection and identification solution.
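The account binding just described can be sketched as a small registry. Field and method names are illustrative assumptions; the point is that any bound device writes into, and matches against, the same per-account family set.

```python
# Sketch of the device/account/voiceprint binding: every device is bound
# to exactly one account, registration via any bound device lands in that
# account's family set, and recognition matches only within that set.

class VoiceprintRegistry:
    def __init__(self):
        self.accounts = {}   # account -> {user_id: voiceprint_model}
        self.devices = {}    # device_id -> account (unique binding)

    def bind_device(self, device_id, account):
        self.devices[device_id] = account

    def register(self, device_id, user_id, model):
        """Registering via any bound device stores under the shared account."""
        account = self.devices[device_id]
        self.accounts.setdefault(account, {})[user_id] = model

    def family_set(self, device_id):
        """Voiceprint collection this device's recognizer matches against."""
        return self.accounts.get(self.devices[device_id], {})
```

A voiceprint registered from the kitchen speaker is thus immediately usable from the bedroom speaker bound to the same account.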
- The user can also log in via MateAPP and modify the user ID and the voiceprint model.
- The method in this embodiment avoids the problems of the prior-art voiceprint creation and registration methods, namely a high learning cost and frequent disturbance of the user.
- A gender-based voiceprint registration process is implemented, so that the gender-specific voiceprint authentication processing model can improve the efficiency and accuracy of voiceprint authentication. The voiceprint establishment process can cover various scenarios: the user can be guided at various stages of voiceprint establishment, or voiceprint establishment can be separated from registration by frequency of use, minimizing interruption of the user. Guiding the user to register a voiceprint then enables the voice interaction product to provide personalized service based on the voiceprint.
- The device in this embodiment likewise avoids the problems of the prior-art voiceprint creation and registration methods, namely a high learning cost and frequent disturbance of the user.
- The voiceprint establishment process can cover various scenarios: the user can be guided at various stages, or voiceprint establishment can be separated from registration by frequency of use, minimizing disturbance of the user. Guiding the user to register a voiceprint then enables the voice interaction product to provide personalized services based on the voiceprint.
- the disclosed methods and apparatus may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- In actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
- Figure 10 illustrates an exemplary computer system/server suitable for use in implementing embodiments of the present invention.
- the computer system/server 012 shown in FIG. 10 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
- computer system/server 012 is represented in the form of a general purpose computing device.
- Components of computer system/server 012 may include, but are not limited to, one or more processors or processing units 016, system memory 028, and bus 018 that connects different system components, including system memory 028 and processing unit 016.
- Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
- These architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MAC) bus, an Enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
- Computer system/server 012 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer system/server 012, including volatile and non-volatile media, removable and non-removable media.
- System memory 028 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032.
- Computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- The storage system 034 can be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 10, commonly referred to as a "hard disk drive"). A disk drive may also be provided for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), as well as an optical drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM or DVD-ROM) or other optical media. In these cases, each drive can be coupled to bus 018 via one or more data medium interfaces.
- Memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present invention.
- A program/utility 040 having a set (at least one) of program modules 042 may be stored, for example, in memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
- Program module 042 typically performs the functions and/or methods of the embodiments described herein.
- The computer system/server 012 can also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.); in the present invention, the computer system/server 012 communicates with an external radar device. It may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 022.
- computer system/server 012 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) via network adapter 020.
- network adapter 020 communicates with other modules of computer system/server 012 via bus 018.
- Other hardware and/or software modules may be utilized in conjunction with computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
- Processing unit 016 performs the functions and/or methods of the described embodiments of the present invention by running a program stored in system memory 028.
- The computer program described above may be provided in a computer storage medium, i.e., the computer storage medium is encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flow and/or device operations of the embodiments of the invention described above.
- the transmission route of computer programs is no longer limited by tangible media, and can also be downloaded directly from the network. Any combination of one or more computer readable media can be utilized.
- the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
- the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
- A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer readable program code. Such a propagated data signal can take a variety of forms including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
- The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can send, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wireline, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for performing the operations of the present invention may be written in one or more programming languages, or a combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- In the remote-computer case, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Abstract
Description
Claims (18)
- 1. A voiceprint creation and registration method, comprising: when a device is enabled for the first time, prompting creation and registration of a voiceprint; establishing a voiceprint model for the user using a text-related training method; generating a user ID; and storing the user ID and the voiceprint model correspondingly in a voiceprint registration database.
- 2. The voiceprint creation and registration method according to claim 1, wherein establishing a voiceprint model for the user using a text-related training method comprises the following sub-steps: providing a registration string to the user; receiving voice information of the user reading the registration string; determining a gender tag of the user according to a gender classifier and the voice; and generating a voiceprint model of the user according to the gender tag and the voice.
- 3. A voiceprint creation and registration method, comprising: obtaining a voice request sent by a user; identifying, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request; if no user ID is recognized, prompting creation and registration of a voiceprint; generating a user ID; and storing the user ID and the voiceprint model correspondingly in a voiceprint registration database.
- 4. The voiceprint creation and registration method according to claim 3, wherein obtaining the voice request sent by the user further comprises: determining whether the voice request needs to be sent to the cloud, and if so, identifying, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request.
- 5. The voiceprint creation and registration method according to claim 3 or 4, wherein obtaining the voice request sent by the user further comprises: determining whether the voice request requires identification of a user ID, and if so, identifying, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request.
- 6. The voiceprint creation and registration method according to claim 3, 4 or 5, wherein prompting creation and registration of a voiceprint comprises: marking the voiceprint model for which no user ID was recognized with an ID number; determining the appearance frequency of the voiceprint model marked with the ID number; if the frequency is below a threshold, deleting the ID number; if it is above the threshold, generating a user ID; and storing the user ID and the voiceprint model correspondingly in the voiceprint registration database.
- 7. The voiceprint creation and registration method according to any one of claims 3 to 6, wherein prompting creation and registration of a voiceprint comprises: establishing a voiceprint model, using a text-related training method, for the user whose user ID was not recognized.
- 8. The voiceprint creation and registration method according to claim 7, wherein establishing a voiceprint model, using a text-related training method, for the user whose user ID was not recognized comprises: providing a registration string to the user; receiving voice information of the user reading the registration string; determining a gender tag of the user according to a gender classifier and the voice; and generating a voiceprint model of the user according to the gender tag and the voice.
- 9. A voiceprint creation and registration apparatus, comprising: a prompting module, a voiceprint establishing module, an input module, and a registration module; wherein the prompting module is configured to prompt creation and registration of a voiceprint when a device is enabled for the first time; the voiceprint establishing module is configured to establish a voiceprint model for the user using a text-related training method; the input module is configured to generate a user ID; and the registration module is configured to store the user ID and the voiceprint model correspondingly in a voiceprint registration database.
- 10. The voiceprint creation and registration apparatus according to claim 9, wherein the voiceprint establishing module specifically comprises the following sub-modules: a providing sub-module configured to provide a registration string to the user; a receiving sub-module configured to receive voice information of the user reading the registration string; a determining sub-module configured to determine a gender tag of the user according to a gender classifier and the voice; and a generating sub-module configured to generate a voiceprint model of the user according to the gender tag and the voice.
- 11. A voiceprint creation and registration apparatus, comprising: an obtaining module, a voiceprint recognition module, a prompting module, an input module, and a registration module; wherein the obtaining module is configured to obtain a voice request sent by a user; the voiceprint recognition module is configured to identify, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request; the prompting module is configured to prompt an unregistered user to create and register a voiceprint; the input module is configured to generate a user ID; and the registration module is configured to store the user ID and the voiceprint model correspondingly in a voiceprint registration database.
- 12. The voiceprint creation and registration apparatus according to claim 11, wherein the obtaining module specifically: determines whether the voice request needs to be sent to the cloud, and if so, identifies, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request.
- 13. The voiceprint creation and registration apparatus according to claim 11 or 12, wherein the prompting module specifically: determines whether the voice request requires identification of a user ID, and if so, identifies, according to the voice request and by means of voiceprint recognition, the user ID that issued the voice request.
- 14. The voiceprint creation and registration apparatus according to claim 11, 12 or 13, wherein the prompting module specifically: marks the voiceprint model for which no user ID was recognized with an ID number; determines the appearance frequency of the voiceprint model marked with the ID number; if the frequency is below a threshold, deletes the ID number; if it is above the threshold, generates a user ID; and stores the user ID and the voiceprint model correspondingly in the voiceprint registration database.
- 15. The voiceprint creation and registration apparatus according to claim 13, wherein the prompting module specifically: establishes a voiceprint model for an unregistered user using a text-related training method.
- 16. The voiceprint creation and registration apparatus according to claim 15, wherein the prompting module comprises the following sub-modules: a providing sub-module configured to provide a registration string to the user; a receiving sub-module configured to receive voice information of the user reading the registration string; a determining sub-module configured to determine a gender tag of the user according to a gender classifier and the voice; and a generating sub-module configured to generate a voiceprint model of the user according to the gender tag and the voice.
- 17. A device, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
- 18. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019530680A JP2020503541A (ja) | 2017-06-30 | 2017-11-30 | 声紋の作成・登録の方法及び装置 |
US16/477,121 US11100934B2 (en) | 2017-06-30 | 2017-11-30 | Method and apparatus for voiceprint creation and registration |
KR1020197016874A KR102351670B1 (ko) | 2017-06-30 | 2017-11-30 | 성문 구축 및 등록 방법 및 그 장치 |
EP17915945.4A EP3564950B1 (en) | 2017-06-30 | 2017-11-30 | Method and apparatus for voiceprint creation and registration |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710527022.7A CN107492379B (zh) | 2017-06-30 | 2017-06-30 | 一种声纹创建与注册方法及装置 |
CN201710527022.7 | 2017-06-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019000832A1 true WO2019000832A1 (zh) | 2019-01-03 |
Family
ID=60644303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/113772 WO2019000832A1 (zh) | 2017-06-30 | 2017-11-30 | 一种声纹创建与注册方法及装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11100934B2 (zh) |
EP (1) | EP3564950B1 (zh) |
JP (2) | JP2020503541A (zh) |
KR (1) | KR102351670B1 (zh) |
CN (1) | CN107492379B (zh) |
WO (1) | WO2019000832A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210829A (zh) * | 2020-02-19 | 2020-05-29 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置、系统、设备和计算机可读存储介质 |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108597525B (zh) * | 2018-04-25 | 2019-05-03 | 四川远鉴科技有限公司 | 语音声纹建模方法及装置 |
CN109036436A (zh) * | 2018-09-18 | 2018-12-18 | 广州势必可赢网络科技有限公司 | 一种声纹数据库建立方法、声纹识别方法、装置及系统 |
CN109510844B (zh) * | 2019-01-16 | 2022-02-25 | 中民乡邻投资控股有限公司 | 一种基于声纹的对话交流式的账号注册方法及装置 |
CN111798857A (zh) * | 2019-04-08 | 2020-10-20 | 北京嘀嘀无限科技发展有限公司 | 一种信息识别方法、装置、电子设备及存储介质 |
CN109920435B (zh) * | 2019-04-09 | 2021-04-06 | 厦门快商通信息咨询有限公司 | 一种声纹识别方法及声纹识别装置 |
CN112127090A (zh) * | 2019-06-06 | 2020-12-25 | 青岛海尔洗衣机有限公司 | 用于衣物处理设备的控制方法 |
CN110459227A (zh) * | 2019-08-29 | 2019-11-15 | 四川长虹电器股份有限公司 | 基于智能电视的声纹注册方法 |
CN110570873B (zh) * | 2019-09-12 | 2022-08-05 | Oppo广东移动通信有限公司 | 声纹唤醒方法、装置、计算机设备以及存储介质 |
CN111081258B (zh) * | 2019-11-07 | 2022-12-06 | 厦门快商通科技股份有限公司 | 一种声纹模型管理方法、系统、存储介质及装置 |
CN110992930A (zh) * | 2019-12-06 | 2020-04-10 | 广州国音智能科技有限公司 | 声纹特征提取方法、装置、终端及可读存储介质 |
CN111368504A (zh) * | 2019-12-25 | 2020-07-03 | 厦门快商通科技股份有限公司 | 语音数据标注方法、装置、电子设备及介质 |
CN111161746B (zh) * | 2019-12-31 | 2022-04-15 | 思必驰科技股份有限公司 | 声纹注册方法及系统 |
CN111477234A (zh) * | 2020-03-05 | 2020-07-31 | 厦门快商通科技股份有限公司 | 一种声纹数据注册方法和装置以及设备 |
CN111599367A (zh) * | 2020-05-18 | 2020-08-28 | 珠海格力电器股份有限公司 | 一种智能家居设备的控制方法、装置、设备及介质 |
US11699447B2 (en) * | 2020-06-22 | 2023-07-11 | Rovi Guides, Inc. | Systems and methods for determining traits based on voice analysis |
CN111914803B (zh) * | 2020-08-17 | 2023-06-13 | 华侨大学 | 一种唇语关键词检测方法、装置、设备及存储介质 |
CN112185362A (zh) * | 2020-09-24 | 2021-01-05 | 苏州思必驰信息科技有限公司 | 针对用户个性化服务的语音处理方法及装置 |
CN112423063A (zh) * | 2020-11-03 | 2021-02-26 | 深圳Tcl新技术有限公司 | 一种智能电视自动设置方法、装置及存储介质 |
CN112634909B (zh) * | 2020-12-15 | 2022-03-15 | 北京百度网讯科技有限公司 | 声音信号处理的方法、装置、设备、计算机可读存储介质 |
CN113506577A (zh) * | 2021-06-25 | 2021-10-15 | 贵州电网有限责任公司 | 一种基于增量采集电话录音完善声纹库的方法 |
CN113707154B (zh) * | 2021-09-03 | 2023-11-10 | 上海瑾盛通信科技有限公司 | 模型训练方法、装置、电子设备和可读存储介质 |
CN117221450A (zh) * | 2023-09-25 | 2023-12-12 | 深圳我买家网络科技有限公司 | Ai智慧客服系统 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104967622A (zh) * | 2015-06-30 | 2015-10-07 | 百度在线网络技术(北京)有限公司 | 基于声纹的通讯方法、装置和系统 |
CN105185379A (zh) * | 2015-06-17 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | 声纹认证方法和装置 |
CN105656887A (zh) * | 2015-12-30 | 2016-06-08 | 百度在线网络技术(北京)有限公司 | 基于人工智能的声纹认证方法以及装置 |
CN105913850A (zh) * | 2016-04-20 | 2016-08-31 | 上海交通大学 | 文本相关声纹密码验证方法 |
CN106057206A (zh) * | 2016-06-01 | 2016-10-26 | 腾讯科技(深圳)有限公司 | 声纹模型训练方法、声纹识别方法及装置 |
US20160314790A1 (en) * | 2015-04-22 | 2016-10-27 | Panasonic Corporation | Speaker identification method and speaker identification device |
CN106098068A (zh) * | 2016-06-12 | 2016-11-09 | 腾讯科技(深圳)有限公司 | 一种声纹识别方法和装置 |
CN106847292A (zh) * | 2017-02-16 | 2017-06-13 | 平安科技(深圳)有限公司 | 声纹识别方法及装置 |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5864548A (ja) | 1981-10-14 | 1983-04-16 | Fujitsu Ltd | 音声日本語処理システム |
JP3776805B2 (ja) | 2001-02-27 | 2006-05-17 | アルパイン株式会社 | 携帯電話選択無線通信装置 |
US20060222210A1 (en) | 2005-03-31 | 2006-10-05 | Hitachi, Ltd. | System, method and computer program product for determining whether to accept a subject for enrollment |
US20070219801A1 (en) | 2006-03-14 | 2007-09-20 | Prabha Sundaram | System, method and computer program product for updating a biometric model based on changes in a biometric feature of a user |
JP2009109712A (ja) * | 2007-10-30 | 2009-05-21 | National Institute Of Information & Communication Technology | オンライン話者逐次区別システム及びそのコンピュータプログラム |
JP2009237774A (ja) * | 2008-03-26 | 2009-10-15 | Advanced Media Inc | 認証サーバ、サービス提供サーバ、認証方法、通信端末、およびログイン方法 |
US8442824B2 (en) * | 2008-11-26 | 2013-05-14 | Nuance Communications, Inc. | Device, system, and method of liveness detection utilizing voice biometrics |
JP5577737B2 (ja) | 2010-02-18 | 2014-08-27 | 株式会社ニコン | 情報処理システム |
AU2013203139B2 (en) * | 2012-01-24 | 2016-06-23 | Auraya Pty Ltd | Voice authentication and speech recognition system and method |
US20160372116A1 (en) * | 2012-01-24 | 2016-12-22 | Auraya Pty Ltd | Voice authentication and speech recognition system and method |
US9691377B2 (en) * | 2013-07-23 | 2017-06-27 | Google Technology Holdings LLC | Method and device for voice recognition training |
US9548047B2 (en) * | 2013-07-31 | 2017-01-17 | Google Technology Holdings LLC | Method and apparatus for evaluating trigger phrase enrollment |
WO2015033523A1 (ja) | 2013-09-03 | 2015-03-12 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 音声対話制御方法 |
JP2015153258A (ja) * | 2014-02-17 | 2015-08-24 | パナソニックIpマネジメント株式会社 | 車両用個人認証システム及び車両用個人認証方法 |
US20150302856A1 (en) * | 2014-04-17 | 2015-10-22 | Qualcomm Incorporated | Method and apparatus for performing function by speech input |
CN104616655B (zh) * | 2015-02-05 | 2018-01-16 | 北京得意音通技术有限责任公司 | 声纹模型自动重建的方法和装置 |
EP3380964A1 (en) * | 2015-11-24 | 2018-10-03 | Koninklijke Philips N.V. | Two-factor authentication in a pulse oximetry system |
CN106782571A (zh) * | 2017-01-19 | 2017-05-31 | 广东美的厨房电器制造有限公司 | 一种控制界面的显示方法和装置 |
-
2017
- 2017-06-30 CN CN201710527022.7A patent/CN107492379B/zh active Active
- 2017-11-30 JP JP2019530680A patent/JP2020503541A/ja active Pending
- 2017-11-30 WO PCT/CN2017/113772 patent/WO2019000832A1/zh unknown
- 2017-11-30 KR KR1020197016874A patent/KR102351670B1/ko active IP Right Grant
- 2017-11-30 US US16/477,121 patent/US11100934B2/en active Active
- 2017-11-30 EP EP17915945.4A patent/EP3564950B1/en active Active
-
2020
- 2020-10-27 JP JP2020179787A patent/JP7062851B2/ja active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160314790A1 (en) * | 2015-04-22 | 2016-10-27 | Panasonic Corporation | Speaker identification method and speaker identification device |
CN105185379A (zh) * | 2015-06-17 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | 声纹认证方法和装置 |
CN104967622A (zh) * | 2015-06-30 | 2015-10-07 | 百度在线网络技术(北京)有限公司 | 基于声纹的通讯方法、装置和系统 |
CN105656887A (zh) * | 2015-12-30 | 2016-06-08 | 百度在线网络技术(北京)有限公司 | 基于人工智能的声纹认证方法以及装置 |
CN105913850A (zh) * | 2016-04-20 | 2016-08-31 | 上海交通大学 | 文本相关声纹密码验证方法 |
CN106057206A (zh) * | 2016-06-01 | 2016-10-26 | 腾讯科技(深圳)有限公司 | 声纹模型训练方法、声纹识别方法及装置 |
CN106098068A (zh) * | 2016-06-12 | 2016-11-09 | 腾讯科技(深圳)有限公司 | 一种声纹识别方法和装置 |
CN106847292A (zh) * | 2017-02-16 | 2017-06-13 | 平安科技(深圳)有限公司 | 声纹识别方法及装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3564950A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210829A (zh) * | 2020-02-19 | 2020-05-29 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置、系统、设备和计算机可读存储介质 |
Also Published As
Publication number | Publication date |
---|---|
JP7062851B2 (ja) | 2022-05-09 |
EP3564950A4 (en) | 2020-08-05 |
JP2020503541A (ja) | 2020-01-30 |
KR20190077088A (ko) | 2019-07-02 |
JP2021021955A (ja) | 2021-02-18 |
US20190362724A1 (en) | 2019-11-28 |
KR102351670B1 (ko) | 2022-01-13 |
CN107492379B (zh) | 2021-09-21 |
EP3564950B1 (en) | 2022-03-23 |
US11100934B2 (en) | 2021-08-24 |
CN107492379A (zh) | 2017-12-19 |
EP3564950A1 (en) | 2019-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019000832A1 (zh) | 一种声纹创建与注册方法及装置 | |
CN107481720B (zh) | 一种显式声纹识别方法及装置 | |
JP6771805B2 (ja) | 音声認識方法、電子機器、及びコンピュータ記憶媒体 | |
WO2019000991A1 (zh) | 一种声纹识别方法及装置 | |
US10068588B2 (en) | Real-time emotion recognition from audio signals | |
CN107153496B (zh) | 用于输入表情图标的方法和装置 | |
US20200126566A1 (en) | Method and apparatus for voice interaction | |
EP3617946B1 (en) | Context acquisition method and device based on voice interaction | |
US10395655B1 (en) | Proactive command framework | |
WO2017112813A1 (en) | Multi-lingual virtual personal assistant | |
JP2021533397A (ja) | 話者埋め込みと訓練された生成モデルとを使用する話者ダイアライゼーション | |
WO2020019591A1 (zh) | 用于生成信息的方法和装置 | |
JP2020034895A (ja) | 応答方法及び装置 | |
US20220076674A1 (en) | Cross-device voiceprint recognition | |
CN108363556A (zh) | 一种基于语音与增强现实环境交互的方法和系统 | |
CN112969995A (zh) | 电子装置及其控制方法 | |
TW201937344A (zh) | 智慧型機器人及人機交互方法 | |
CN110704618B (zh) | 确定对话数据对应的标准问题的方法及装置 | |
CN112632244A (zh) | 一种人机通话的优化方法、装置、计算机设备及存储介质 | |
CN113703585A (zh) | 交互方法、装置、电子设备及存储介质 | |
US20190103110A1 (en) | Information processing device, information processing method, and program | |
US10831442B2 (en) | Digital assistant user interface amalgamation | |
CN112037772B (zh) | 基于多模态的响应义务检测方法、系统及装置 | |
CN115171673A (zh) | 一种基于角色画像的交流辅助方法、装置及存储介质 | |
CN111556096B (zh) | 信息推送方法、装置、介质及电子设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17915945 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019530680 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20197016874 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2017915945 Country of ref document: EP Effective date: 20190729 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |