WO2019000832A1 - 一种声纹创建与注册方法及装置 - Google Patents

一种声纹创建与注册方法及装置 Download PDF

Info

Publication number
WO2019000832A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voiceprint
registration
voice
module
Prior art date
Application number
PCT/CN2017/113772
Other languages
English (en)
French (fr)
Inventor
王文宇
胡媛
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority to JP2019530680A (published as JP2020503541A)
Priority to US16/477,121 (published as US11100934B2)
Priority to KR1020197016874A (published as KR102351670B1)
Priority to EP17915945.4A (published as EP3564950B1)
Publication of WO2019000832A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification
    • G10L 17/04 - Training, enrolment or model building
    • G10L 17/06 - Decision making techniques; Pattern matching strategies
    • G10L 17/22 - Interactive procedures; Man-machine interfaces
    • G10L 17/24 - Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase

Definitions

  • the present application relates to the field of artificial intelligence applications, and in particular, to a voiceprint creation and registration method and apparatus.
  • Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. One important aspect of artificial intelligence is voiceprint recognition technology.
  • An advantage of voice dialogue is that it can capture the user's voice. Everyone has their own voice, just like a fingerprint, so each person's voice is also called a voiceprint.
  • From a speaker's voiceprint, the system can determine which user is speaking and retrieve that user's data to provide a personalized service.
  • At present, voiceprint technology in the industry is immature and has difficulty meeting the requirements of productization.
  • In the prior art, voiceprint creation and registration methods have a high learning cost and tend to disturb the user.
  • aspects of the present application provide a voiceprint creation and registration method and apparatus for providing personalized services to users and reducing learning costs.
  • a voiceprint creation and registration method including:
  • when the device is enabled for the first time, the user is prompted to create a voiceprint and register it;
  • the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
  • the user's voiceprint model is generated based on the gender tag and voice.
  • a method for creating and registering a voiceprint including:
  • if the user ID is not recognized, the user is prompted to create a voiceprint and register;
  • the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
  • the acquiring the voice request sent by the user further includes: determining whether the voice request needs to be sent to the cloud, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request;
  • the acquiring the voice request sent by the user further includes: determining whether the voice request requires identification of the user ID, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request;
  • the prompting to create a voiceprint and register includes:
  • the voiceprint model for which no user ID is recognized is marked with an ID number;
  • the appearance frequency of the voiceprint model marked with the ID number is determined; if the frequency is below a threshold, the ID number is deleted; if it is above the threshold, a user ID is generated, and the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
  • the prompting to create a voiceprint and registering includes:
  • a text-related training method is used to establish a voiceprint model for a user whose user ID is not recognized.
  • the establishing, by using the text-related training method, of a voiceprint model for the user whose user ID is not recognized includes:
  • the user's voiceprint model is generated based on the gender tag and voice.
  • a voiceprint creation and registration apparatus comprising:
  • a prompting module, a voiceprint establishing module, an input module, and a registration module;
  • the prompting module is configured to prompt the user to create a voiceprint and register when the device is first enabled
  • the voiceprint establishing module is configured to establish a voiceprint model for a user by using a text-related training method
  • the input module is configured to generate a user ID
  • the registration module is configured to store the user ID and the voiceprint model correspondingly to the voiceprint registration database.
  • the voiceprint establishing module specifically includes the following sub-modules:
  • a providing sub-module, configured to provide a registration string to the user;
  • a receiving sub-module, configured to receive voice information of the user reading the registration string;
  • a determining sub-module, configured to determine the user's gender tag according to a gender classifier and the voice;
  • a generating sub-module, configured to generate the user's voiceprint model according to the gender tag and the voice.
  • a voiceprint creation and registration apparatus comprising:
  • the obtaining module is configured to acquire a voice request sent by a user
  • the voiceprint recognition module is configured to identify a user ID that issues a voice request according to the voice request, by using a voiceprint recognition method;
  • the prompting module is configured to prompt an unregistered user to create a voiceprint and register
  • the input module is configured to generate a user ID
  • the registration module is configured to store the user ID and the voiceprint model correspondingly to the voiceprint registration database.
  • the voiceprint model for which no user ID is recognized is marked with an ID number;
  • the appearance frequency of the voiceprint model marked with the ID number is determined; if the frequency is below a threshold, the ID number is deleted; if it is above the threshold, a user ID is generated, and the user ID and the voiceprint model are correspondingly stored in the voiceprint registration database.
  • a text-related training method is used to create a voiceprint model for unregistered users.
  • the prompting module includes the following sub-modules:
  • a providing sub-module, configured to provide a registration string to the user;
  • a receiving sub-module, configured to receive voice information of the user reading the registration string;
  • a determining sub-module, configured to determine the user's gender tag according to a gender classifier and the voice;
  • a generating sub-module, configured to generate the user's voiceprint model according to the gender tag and the voice.
  • an apparatus comprising:
  • one or more processors;
  • a storage device for storing one or more programs
  • the one or more programs are executed by the one or more processors such that the one or more processors implement any of the methods described above.
  • a computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements any of the above methods.
  • the embodiment of the present application can avoid the problem that the voiceprint recognition method in the prior art has strong technical dependence, a single use strategy, and a low degree of productization. It has a high technical fault tolerance rate, speeds up productization, and provides users with personalized services.
  • FIG. 1 is a schematic flowchart of a voiceprint creation and registration method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of establishing a voiceprint model for an unregistered user by using a text-related training method in a voiceprint creation and registration method according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a voiceprint creation and registration method according to another embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request in a voiceprint creation and registration method according to another embodiment of the present application;
  • FIG. 5 is a schematic flowchart of prompting an unregistered user to create a voiceprint and registering in a voiceprint creation and registration method according to another embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a voiceprint establishing module of a voiceprint creating and registering apparatus according to an embodiment of the present disclosure
  • FIG. 8 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of a prompting module of a voiceprint creating and registering device according to another embodiment of the present disclosure.
  • FIG. 10 is a block diagram of an exemplary computer system/server suitable for use in implementing embodiments of the present invention.
  • For an intelligent voice interaction device, there is a companion application, MateAPP, on the mobile terminal that cooperates with the intelligent voice interaction device to complete a series of tasks.
  • To establish voiceprints, a "voiceprint management" function module is created in MateAPP, in which the user can create, delete, and modify the voiceprints under the account.
  • FIG. 1 is a schematic flowchart of a voiceprint creation and registration method according to an embodiment of the present application. As shown in Figure 1, the following steps are included:
  • when the device is first enabled, the user is prompted to create a voiceprint and register;
  • When the device is powered on for the first time, the user is prompted to register at least one voiceprint ID through MateAPP and to confirm relevant identity information, such as name, age, and gender.
  • the user creates a voiceprint through MateAPP or by expressing, by voice, the intention to create a voiceprint.
  • a text-related training method is used to establish a voiceprint model for the user; specifically, as shown in FIG. 2, the following sub-steps are included:
  • the registration string is provided to the user.
  • the registration string can be in many forms:
  • the registration string can be a randomly generated string of numbers.
  • each digit in the registration string appears only once, so as to cover a larger sample space.
  • the registration string can be a randomly generated Chinese character string.
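  • As an illustration of the digit-string form described above, the following is a minimal sketch of how such a registration string might be generated; the function name and the default length of 8 are illustrative assumptions, not taken from the application.

```python
import random

def generate_registration_string(length: int = 8) -> str:
    """Generate a random digit registration string in which no digit repeats,
    so that reading it aloud covers a larger sample space."""
    digits = list("0123456789")
    random.shuffle(digits)
    return "".join(digits[:length])

# Example: the string shown to the user, who then reads it aloud several times.
print(generate_registration_string())  # e.g. "37158204"
```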
  • voice information of the user reading the registration string is received.
  • the user can perform a plurality of readings according to the provided registration string to generate a plurality of voices for registration.
  • the voice information generated by the user to read aloud according to the provided registration string may be received.
  • the user's gender tag is determined based on the gender classifier and voice.
  • the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
  • the gender tag includes male or female.
  • the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
  • the gender classifier analyzes the first feature information to obtain the gender tag of the first feature information, which is the gender tag of the user.
  • For example, taking a Gaussian mixture model as the gender classification model, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice, and a posterior probability value can then be computed for these features under the Gaussian mixture model; the user's gender is determined from the result. For example, if the Gaussian mixture model is a male Gaussian mixture model, the user's gender can be determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
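  • The passage above can be illustrated with a minimal sketch. The application describes thresholding a posterior probability under a (for example, male) Gaussian mixture model; the sketch below instead compares the average log-likelihood of an utterance under two GMMs trained on male and female speech, uses random arrays as stand-ins for real fundamental-frequency plus MFCC features, and relies on scikit-learn purely for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-ins for per-frame feature vectors (fundamental frequency + MFCCs)
# extracted from labelled male and female training speech.
male_features = np.random.randn(1000, 14)
female_features = np.random.randn(1000, 14)

male_gmm = GaussianMixture(n_components=8, random_state=0).fit(male_features)
female_gmm = GaussianMixture(n_components=8, random_state=0).fit(female_features)

def classify_gender(utterance_features: np.ndarray) -> str:
    """Return the gender tag of an utterance by comparing its average
    log-likelihood under the male and female Gaussian mixture models."""
    male_score = male_gmm.score(utterance_features)
    female_score = female_gmm.score(utterance_features)
    return "male" if male_score > female_score else "female"

print(classify_gender(np.random.randn(200, 14)))
```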
  • a user's voiceprint model is generated based on the gender tag and voice.
  • a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
  • the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
  • a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract a second feature vector for each voice based on the voice and its normalized posterior probabilities.
  • The user's voiceprint model is then obtained from the multiple second feature vectors corresponding to the multiple voices; for example, the average of the second feature vectors can be taken as the user's voiceprint model.
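  • A minimal sketch of the enrolment step above: the gender-specific DNN/UBM pipeline is abstracted behind an assumed extract_embedding callable, and the per-utterance "second feature vectors" are simply averaged and length-normalised into the user's voiceprint model, which is one of the options the text mentions.

```python
import numpy as np

def build_voiceprint_model(enrollment_voices, extract_embedding):
    """Average the per-utterance second feature vectors into one enrolment
    voiceprint model. `extract_embedding` stands in for the pre-trained,
    gender-specific DNN/UBM feature-vector extraction described above."""
    vectors = [extract_embedding(voice) for voice in enrollment_voices]
    model = np.mean(vectors, axis=0)
    return model / np.linalg.norm(model)  # length-normalise for later scoring

# Toy usage with a placeholder extractor that returns random 64-dim vectors.
voices = ["voice_1.wav", "voice_2.wav", "voice_3.wav"]
model = build_voiceprint_model(voices, lambda v: np.random.randn(64))
```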
  • a user ID is generated, and the user is prompted to input user ID related data such as name, gender, age, hobby, home address, and work address.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that the voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
  • the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
  • Each intelligent voice interaction device is uniquely bound to an account, and the intelligent voice interaction device is associated with the voiceprints through the account. A voiceprint can be registered from any device under the account; once registered, it can be used on any intelligent voice device under that account.
  • When a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account and the voiceprint ID is recognized, achieving the unification of the three and realizing an end-to-end voiceprint collection and recognition solution.
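  • The account-centred organisation described above might be represented roughly as follows; the class and field names are illustrative assumptions rather than structures defined by the application.

```python
from dataclasses import dataclass, field

@dataclass
class VoiceprintRecord:
    user_id: str
    profile: dict          # name, gender, age, hobby, home/work address, ...
    model: list            # the enrolment voiceprint vector

@dataclass
class AccountVoiceprintSet:
    """All voiceprints registered under one account form a single set that
    any intelligent voice device bound to the account can query."""
    account_id: str
    devices: set = field(default_factory=set)    # device ids bound to the account
    records: dict = field(default_factory=dict)  # user_id -> VoiceprintRecord

    def register(self, record: VoiceprintRecord) -> None:
        self.records[record.user_id] = record

household = AccountVoiceprintSet(account_id="baidu-account-001")
household.devices.add("smart-speaker-livingroom")
household.register(VoiceprintRecord("uid-1", {"name": "Alice"}, [0.1, 0.3]))
```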
  • FIG. 3 is a schematic flowchart of a voiceprint creation and registration method according to another embodiment of the present application. As shown in FIG. 3, the method includes the following steps:
  • In one implementation, after the intelligent voice interaction device is connected to the network, the user performs voice interaction with the device, and it is determined whether the voice request needs to be sent to the cloud; if so, the user ID that issued the voice request is further identified.
  • In another implementation, speech recognition is first performed on the voice request to obtain the command described by the command voice, and the vertical class to which the command corresponds is determined. If that vertical class does not need the user ID in order to provide a personalized recommendation, the voice request is responded to directly; if it does need the user ID, the user ID that issued the voice request is further identified.
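  • A minimal sketch of the second implementation above, with the speech recogniser, vertical classifier, speaker identifier, and responder abstracted as callables; the set of verticals that require a user ID is purely illustrative.

```python
# Vertical classes assumed, for illustration, to need the speaker's identity
# before a personalised recommendation can be produced.
PERSONALIZED_VERTICALS = {"music", "recommendation", "schedule"}

def handle_voice_request(audio, recognize_text, classify_vertical,
                         identify_speaker, respond):
    command = recognize_text(audio)          # speech recognition on the request
    vertical = classify_vertical(command)    # which vertical class the command hits
    if vertical not in PERSONALIZED_VERTICALS:
        return respond(command, user_id=None)   # answer the request directly
    user_id = identify_speaker(audio)           # voiceprint recognition
    return respond(command, user_id=user_id)
```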
  • the voiceprint identification method is used to identify the user ID that sends the voice request; specifically, as shown in FIG. 4, the following sub-steps are included:
  • a voiceprint identification method is used to identify a user gender tag that issues a voice request.
  • model training can be performed according to the voice characteristics of the user group to realize voiceprint analysis for user groups of different genders.
  • voiceprint recognition is used to identify the gender information of the user who issued the voice request.
  • the voiceprint of the speaker needs to be modeled, that is, "training” or “learning”. Specifically, the first feature vector of each voice in the training set is extracted by applying the deep neural network DNN voiceprint baseline system; and the gender classifier is trained according to the first feature vector of each voice and the pre-labeled gender label. Thus, a gender-based voiceprint processing model is established.
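  • A minimal sketch of training such a gender classifier on the first feature vectors; random arrays stand in for the vectors produced by the DNN voiceprint baseline system, and logistic regression is used only as an illustrative choice of classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins for the first feature vectors extracted by the DNN voiceprint
# baseline system, with their pre-labelled gender tags (0 = female, 1 = male).
first_feature_vectors = np.random.randn(2000, 64)
gender_labels = np.random.randint(0, 2, size=2000)

gender_classifier = LogisticRegression(max_iter=1000)
gender_classifier.fit(first_feature_vectors, gender_labels)

def gender_tag(first_feature_vector: np.ndarray) -> str:
    """Map a single first feature vector to its gender tag."""
    label = gender_classifier.predict(first_feature_vector.reshape(1, -1))[0]
    return "male" if label == 1 else "female"
```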
  • the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the command voice.
  • For example, taking a Gaussian mixture model as the gender classification model, the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice request, and a posterior probability value can then be computed for these features under the Gaussian mixture model; the user's gender is determined from the result. For example, if the Gaussian mixture model is a male Gaussian mixture model, the user's gender is determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
  • the user voiceprint ID that issued the command voice is further identified.
  • Each user's voice will have a unique voiceprint ID that records personal data such as the user's name, gender, age, and hobbies.
  • the voice input by the user is sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice request returned by the gender classifier. That is, if the voice request corresponds to a male voice, the voice is sent to the male DNN model. If the voice request corresponds to a female voice, the voice is sent to the female DNN model.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and the pre-trained feature vector extraction model is applied to extract the second feature vector of the voice request from the voice and its normalized posterior probabilities.
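  • The extracted second feature vector is then scored against the voiceprint models registered under the account, as the following bullets describe. A minimal sketch of that matching step, assuming cosine similarity and an illustrative threshold (neither is specified by the text):

```python
import numpy as np

def identify_user(request_vector, registered_models, threshold=0.7):
    """Return the user ID whose registered voiceprint model best matches the
    voice request, or None when the best match falls below the preset
    threshold (i.e. the speaker is treated as unregistered)."""
    best_id, best_score = None, -1.0
    for user_id, model in registered_models.items():
        score = float(np.dot(request_vector, model) /
                      (np.linalg.norm(request_vector) * np.linalg.norm(model)))
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None
```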
  • There are many ways of obtaining this, which may be selected according to different application requirements; for example, if the matching value against the pre-stored voiceprint models is less than a preset threshold, it is determined that the user is not registered.
  • In that case, a voiceprint is created and registered: a non-text-related training method is used to establish a voiceprint model for the unregistered user, and the model is registered.
  • the obtained voiceprint model of the unregistered user is marked with an ID number
  • the user ID is generated, and the user is prompted to input data related to the user ID such as name, gender, age, hobby, home address, work address, and the like.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
  • Alternatively, interruption to the user can be minimized by creating voiceprints only for frequent home users. Specifically:
  • the voiceprint model for which no user ID is recognized is marked with an ID number, but no user ID is generated and the user is not prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address; the behavior of the user to whom the ID number belongs is recorded only in the background.
  • when the appearance frequency of the marked voiceprint model exceeds a threshold, a user ID is generated, prompting the user to input user-ID-related data such as name, gender, age, hobby, home address, and work address.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
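  • A minimal sketch of the frequency-based strategy above: unrecognised voiceprints receive only a provisional ID number, their appearances are counted in the background, and only a voiceprint heard often enough triggers the registration prompt. The class name and the promotion threshold are illustrative assumptions.

```python
from collections import defaultdict

class ProvisionalVoiceprintTracker:
    """Track voiceprint models that could not be matched to a registered user
    ID, so that only frequent household users are asked to register."""

    def __init__(self, promote_after: int = 5):
        self.promote_after = promote_after
        self.appearances = defaultdict(int)   # provisional ID number -> count

    def record(self, id_number: str) -> bool:
        """Record one appearance of the marked voiceprint; return True when it
        should be promoted (prompt for name, gender, age, etc. and register)."""
        self.appearances[id_number] += 1
        return self.appearances[id_number] >= self.promote_after

    def prune(self, id_number: str) -> None:
        """Delete a provisional ID whose appearance frequency stays below the threshold."""
        self.appearances.pop(id_number, None)
```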
  • Alternatively, a text-related training method is used to establish a voiceprint model for the user whose user ID is not recognized; while voiceprint technology is not yet mature, the text-related training method may be used to improve the recognition rate.
  • Using the text-related training method to establish a voiceprint model for the user whose user ID is not recognized includes the following sub-steps:
  • the registration string is provided to the user.
  • the registration string can be in many forms:
  • the registration string can be a randomly generated string of numbers.
  • the number in the registration string only appears once.
  • the registration string can be a randomly generated Chinese character string.
  • voice information of the user reading the registration string is received.
  • the user can perform a plurality of readings in accordance with the provided registration string for registration.
  • the voice information generated by the user to read aloud according to the provided registration string may be received.
  • the user's gender tag is determined based on the gender classifier and voice.
  • the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
  • the gender tag includes male or female.
  • the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
  • the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
  • the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice, and a posterior probability value can then be computed for these features under the Gaussian mixture model.
  • the user's gender is determined from the result: for example, if the Gaussian mixture model is a male Gaussian mixture model, the gender is determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
  • a user's voiceprint model is generated based on the gender tag and voice.
  • a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
  • the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
  • a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract a second feature vector for each voice based on the voice and its normalized posterior probabilities.
  • the user's voiceprint model is obtained from the multiple second feature vectors corresponding to the multiple voices; there are many ways of doing this, which may be selected according to different application requirements, for example taking the average of the second feature vectors as the user's voiceprint model.
  • a user ID is generated, and the user is prompted to input user ID related data such as name, gender, age, hobby, home address, and work address.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition can subsequently be performed according to the voiceprint model, and voice control can be performed on any intelligent voice device under the account.
  • the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
  • Each intelligent voice interaction device is uniquely bound to an account, and the intelligent voice interaction device is associated with the voiceprints through the account. A voiceprint can be registered from any device under the account; once registered, it can be used on any intelligent voice device under that account.
  • When a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account and the voiceprint ID is recognized, achieving the unification of the three and realizing an end-to-end voiceprint collection and recognition solution.
  • the user can log in by MateAPP and modify the user ID and the voiceprint model.
  • the method in this embodiment can avoid the problem in the prior art that voiceprint creation and registration methods have a high learning cost and tend to disturb the user.
  • A gender-based voiceprint registration process is implemented, so that the gender-specific voiceprint authentication processing model can improve the efficiency and accuracy of voiceprint authentication. The voiceprint establishment process can cover various scenarios: the user can be guided at various stages, or voiceprint establishment can be separated from registration by frequency, minimizing the disturbance to the user, guiding the user to register a voiceprint, and then enabling the voice interaction product to provide personalized services to the user based on the voiceprint.
  • FIG. 6 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present application. As shown in FIG. 6, the apparatus includes a prompting module 61, a voiceprint establishing module 62, an input module 63, and a registration module 64.
  • the prompting module 61 is configured to prompt to create a voiceprint and register when the device is first enabled
  • When the device is powered on for the first time, the user is guided to register at least one voiceprint ID through MateAPP and to confirm relevant identity information, such as name, age, and gender.
  • the user creates a voiceprint through MateAPP or by expressing, by voice, the intention to create a voiceprint.
  • the voiceprint establishing module 62 is configured to establish a voiceprint model for the user by using a text-related training method; specifically, as shown in FIG. 7, the following sub-module is included:
  • a providing sub-module 71, configured to provide a registration string to the user.
  • the registration string can be in many forms:
  • the registration string can be a randomly generated string of numbers.
  • each digit in the registration string appears only once.
  • the registration string can be a randomly generated Chinese character string.
  • the receiving sub-module 72 is configured to receive voice information that the user reads the registration string.
  • the user can perform a plurality of readings according to the provided registration string to generate a plurality of voices for registration.
  • the voice information generated by the user to read aloud according to the provided registration string may be received.
  • the determining sub-module 73 is configured to determine the gender tag of the user according to the gender classifier and the voice.
  • the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
  • the gender tag includes male or female.
  • the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
  • the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
  • the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice, and a posterior probability value can then be computed for these features under the Gaussian mixture model; the user's gender is determined from the result.
  • For example, if the Gaussian mixture model is a male Gaussian mixture model, the gender is determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
  • the generating sub-module 74 is configured to generate a voiceprint model of the user according to the gender tag and the voice.
  • a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
  • the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
  • a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract a second feature vector for each voice based on the voice and its normalized posterior probabilities.
  • the user's voiceprint model is obtained from the multiple second feature vectors corresponding to the multiple voices; there are many ways of doing this, which may be selected according to different application requirements, for example taking the average of the second feature vectors as the user's voiceprint model.
  • the input module 63 is configured to generate a user ID, and prompt the user to input user ID related data such as name, gender, age, hobby, home address, and work address.
  • the registration module 64 is configured to store the user ID and the voiceprint model correspondingly under a certain account of the voiceprint registration database, so as to subsequently perform voiceprint recognition according to the voiceprint model and perform voice control on any intelligent voice device under the account.
  • the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
  • Each intelligent voice interaction device is uniquely bound to an account, and the intelligent voice interaction device is associated with the voiceprints through the account. A voiceprint can be registered from any device under the account; once registered, it can be used on any intelligent voice device under that account.
  • When a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account and the voiceprint ID is recognized, achieving the unification of the three and realizing an end-to-end voiceprint collection and recognition solution.
  • FIG. 8 is a schematic structural diagram of a voiceprint creation and registration apparatus according to another embodiment of the present application. As shown in FIG. 8, the apparatus includes the following modules:
  • the obtaining module 81 is configured to obtain a voice request sent by the user.
  • After the intelligent voice interaction device is connected to the network, the user performs voice interaction with the intelligent voice interaction device, and it is determined whether a voice request needs to be sent to the cloud; if so, the user ID that issued the voice request is further identified.
  • Speech recognition is first performed on the voice request to obtain the command described by the command voice, and the vertical class to which the command corresponds is determined; if the vertical class does not need the user ID to provide a personalized recommendation, the voice request is responded to directly; if it does need the user ID to provide a personalized recommendation, the user ID that issued the voice request is further identified.
  • the voiceprint recognition module 82 is configured to identify a user ID that issues a voice request according to the voice request, and specifically includes the following submodules:
  • the user gender identification sub-module is configured to identify a user gender tag that issues a voice request according to the voice request and adopt a voiceprint recognition manner.
  • model training can be performed according to the voice characteristics of the user group to realize voiceprint analysis for user groups of different genders.
  • voiceprint recognition is used to identify the gender information of the user who issued the voice request.
  • the voiceprint of the speaker needs to be modeled, that is, "training” or “learning”. Specifically, the first feature vector of each voice in the training set is extracted by applying the deep neural network DNN voiceprint baseline system; and the gender classifier is trained according to the first feature vector of each voice and the pre-labeled gender label. Thus, a gender-based voiceprint processing model is established.
  • the gender classifier analyzes the first feature information and obtains the gender tag of the first feature information, that is, the gender tag of the command voice.
  • the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice request, and a posterior probability value can then be computed for these features under the Gaussian mixture model.
  • the user's gender is determined from the result: for example, if the Gaussian mixture model is a male Gaussian mixture model, the gender is determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
  • the user voiceprint ID identification sub-module is configured to identify the user voiceprint ID that issues the command voice after identifying the user gender label that issues the voice request.
  • Each user's voice will have a unique voiceprint ID that records personal data such as the user's name, gender, age, and hobbies.
  • the voice input by the user is sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice request returned by the gender classifier. That is, if the voice request corresponds to a male voice, the voice is sent to the male DNN model. If the voice request corresponds to a female voice, the voice is sent to the female DNN model.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and the pre-trained feature vector extraction model is applied to extract the second feature vector of the voice request from the voice and its normalized posterior probabilities.
  • the second feature vector is then matched against the pre-stored voiceprint models to obtain the user voiceprint ID; there are many ways of doing this, which may be selected according to different application requirements, for example:
  • if the matching value is less than a preset threshold, it is determined that the user is not registered, for example when the smart device is being used for the first time.
  • the prompting module 83 is configured to prompt to create a voiceprint and register if the user ID is not recognized;
  • the prompting module 83 uses a non-text related training method to establish a voiceprint model for the user.
  • the voiceprint model for which no user ID is recognized is marked with an ID number;
  • a user ID is generated, the user is prompted to input data related to the user ID such as name, gender, age, hobby, home address, and work address, and the voiceprint is registered.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
  • Alternatively, interruption to the user can be minimized by creating voiceprints only for frequent home users. Specifically:
  • the voiceprint model for which no user ID is recognized is marked with an ID number, but no user ID is generated and the user is not prompted to input user-ID-related data such as name, gender, age, hobbies, home address, and work address; the behavior of the user to whom the ID number belongs is recorded only in the background.
  • when the appearance frequency of the marked voiceprint model exceeds a threshold, a user ID is generated, prompting the user to input user-ID-related data such as name, gender, age, hobby, home address, and work address.
  • the user ID and the voiceprint model are correspondingly stored under a certain account of the voiceprint registration database, so that voiceprint recognition is performed according to the voiceprint model, and voice control is performed on any intelligent voice device under the account.
  • Alternatively, the prompting module 83 adopts a text-related training method to establish and register a voiceprint model for the user whose user ID is not recognized; while voiceprint technology is not yet mature, the text-related training method may be used to improve the recognition rate. Specifically, as shown in FIG. 9, the following sub-modules are included:
  • a providing sub-module 91, configured to provide a registration string to the user.
  • the registration string can be in many forms:
  • the registration string can be a randomly generated string of numbers.
  • each digit in the registration string appears only once.
  • the registration string can be a randomly generated Chinese character string.
  • the receiving sub-module 92 is configured to receive voice information that the user reads the registration string.
  • the user can perform a plurality of readings in accordance with the provided registration string for registration.
  • the voice information generated by the user to read aloud according to the provided registration string may be received.
  • the determining sub-module 93 is configured to determine a gender tag of the user according to the gender classifier and the voice.
  • the voice may be gender-classified according to the gender classifier to obtain the gender label of the user.
  • the gender tag includes male or female.
  • the first feature information of the acquired voice is extracted, and the first feature information is sent to a pre-generated gender classifier.
  • the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the user.
  • the fundamental frequency feature and the Mel-frequency cepstral coefficient (MFCC) feature can first be extracted from the voice, and a posterior probability value can then be computed for these features under the Gaussian mixture model; the user's gender is determined from the result.
  • For example, if the Gaussian mixture model is a male Gaussian mixture model, the gender is determined to be male when the posterior probability is high (greater than a certain threshold) and female when the posterior probability is low (less than the threshold).
  • the generating sub-module 94 is configured to generate a voiceprint model of the user according to the gender tag and the voice.
  • a posterior probability of each speech is obtained according to a DNN model corresponding to the gender tag.
  • the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag corresponding to the voice returned by the gender classifier. That is, if the voice corresponds to a male voice, the voice is sent to the male DNN model. If the voice corresponds to a female voice, the voice is sent to the female DNN model.
  • a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
  • Each posterior probability is normalized according to the universal background model corresponding to the gender tag, and a pre-trained feature vector extraction model is applied to extract a second feature vector for each voice based on the voice and its normalized posterior probabilities.
  • the user's voiceprint model is obtained from the multiple second feature vectors corresponding to the multiple voices; there are many ways of doing this, which may be selected according to different application requirements, for example taking the average of the second feature vectors as the user's voiceprint model.
  • the input module 84 is configured to generate a user ID, and prompt the user to input user ID related data such as name, gender, age, hobby, home address, and work address.
  • the registration module 85 is configured to store the user ID and the voiceprint model correspondingly under a certain account of the voiceprint registration database, so as to subsequently perform voiceprint recognition according to the voiceprint model and perform voice control on any intelligent voice device under the account.
  • the pre-stored voiceprint model is associated with the same account, for example, a Baidu account, and all the voiceprints under the account form a set.
  • Each intelligent voice interaction device is uniquely bound to an account, and the intelligent voice interaction device is associated with the voiceprints through the account. A voiceprint can be registered from any device under the account; once registered, it can be used on any intelligent voice device under that account.
  • When a device under a certain account collects a voiceprint, the voiceprint is matched against the family voiceprint set under the same account and the voiceprint ID is recognized, achieving the unification of the three and realizing an end-to-end voiceprint collection and recognition solution.
  • the user can log in by MateAPP and modify the user ID and the voiceprint model.
  • the apparatus in this embodiment can avoid the problem in the prior art that voiceprint creation and registration methods have a high learning cost and tend to disturb the user.
  • A gender-based voiceprint registration process is implemented, so that the gender-specific voiceprint authentication processing model can improve the efficiency and accuracy of voiceprint authentication.
  • The voiceprint establishment process can cover various scenarios: the user can be guided at various stages, or voiceprint establishment can be separated from registration by frequency, minimizing the disturbance to the user, guiding the user to register a voiceprint, and then enabling the voice interaction product to provide personalized services to the user based on the voiceprint.
  • the disclosed methods and apparatus may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • In actual implementation, there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • Figure 10 illustrates an exemplary computer system/server suitable for implementing embodiments of the present invention.
  • the computer system/server 012 shown in FIG. 10 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • computer system/server 012 is represented in the form of a general purpose computing device.
  • Components of computer system/server 012 may include, but are not limited to, one or more processors or processing units 016, system memory 028, and bus 018 that connects different system components, including system memory 028 and processing unit 016.
  • Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • these architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MAC) bus, an Enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
  • Computer system/server 012 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer system/server 012, including volatile and non-volatile media, removable and non-removable media.
  • System memory 028 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032.
  • Computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 034 can be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 10, commonly referred to as a "hard disk drive").
  • a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM or DVD-ROM) or other optical media, may also be provided.
  • In these cases, each drive can be coupled to bus 018 via one or more data medium interfaces.
  • Memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present invention.
  • Program/utility 040, having a set (at least one) of program modules 042, may be stored, for example, in memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • Program module 042 typically performs the functions and/or methods of the embodiments described herein.
  • the computer system/server 012 can also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.); in the present invention, the computer system/server 012 communicates with an external radar device. It can also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 022.
  • computer system/server 012 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) via network adapter 020.
  • network adapter 020 communicates with other modules of computer system/server 012 via bus 018.
  • other hardware and/or software modules may be used in conjunction with computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • Processing unit 016 performs the functions and/or methods of the described embodiments of the present invention by running a program stored in system memory 028.
  • the computer program described above may be provided in a computer storage medium, that is, the computer storage medium is encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations of the embodiments of the invention described above.
  • the transmission route of computer programs is no longer limited by tangible media, and can also be downloaded directly from the network. Any combination of one or more computer readable media can be utilized.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present invention may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).

Abstract

A voiceprint creation and registration method and apparatus, including: when a device is enabled for the first time, prompting to create a voiceprint and register (101); establishing a voiceprint model for the user by using a text-related training method (102); generating a user ID (103) and prompting the user to input user-ID-related data; and storing the user ID and the voiceprint model correspondingly in a voiceprint registration database (104). This can avoid the problem that voiceprint creation and registration methods have a high learning cost and tend to disturb the user. The voiceprint establishment process can cover various scenarios: the user can be guided at various stages, or voiceprint establishment can be separated from registration by frequency, minimizing the disturbance to the user and guiding the user to register a voiceprint so that the voice interaction product can provide personalized services to the user based on the voiceprint.

Description

Voiceprint creation and registration method and apparatus
This application claims priority to the Chinese patent application filed on June 30, 2017 with application number 201710527022.7 and entitled "Voiceprint creation and registration method and apparatus".
Technical Field
The present application relates to the field of artificial intelligence applications, and in particular to a voiceprint creation and registration method and apparatus.
Background
Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. One important aspect of artificial intelligence is voiceprint recognition technology.
In recent years, artificial intelligence technology has developed profoundly and is gradually being productized. In particular, with the rise of the Amazon Echo smart speaker and the Google Home smart speaker abroad, intelligent voice dialogue products, especially smart speakers, with dialogue as the main mode of interaction have become a popular wave of smart-home products.
A typical usage scenario of intelligent voice dialogue products, including smart speakers, is in the home, where it is very natural for users to interact with a machine by voice. A household usually has multiple users, and each user inevitably has different needs; however, current products offer only a coarse service in which the same set of services is provided to all users and every user request is answered according to the same general standard, so users' personalized needs cannot be met.
The advantage of voice dialogue is that it can capture the user's voice. Everyone has their own voice, just like a fingerprint, so each person's voice is also called a voiceprint. From a speaker's voiceprint, the system can determine which user is speaking and retrieve that user's data to provide a personalized service. At present, voiceprint technology in the industry is immature and has difficulty meeting the requirements of productization.
In the prior art, voiceprint creation and registration methods have a high learning cost and tend to disturb the user.
Summary
Aspects of the present application provide a voiceprint creation and registration method and apparatus, which are used to provide personalized services to users and to reduce learning costs.
One aspect of the present application provides a voiceprint creation and registration method, including:
when a device is enabled for the first time, prompting to create a voiceprint and register;
establishing a voiceprint model for the user by using a text-related training method;
generating a user ID;
storing the user ID and the voiceprint model correspondingly in a voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which establishing a voiceprint model for the user by using the text-related training method includes the following sub-steps:
providing a registration string to the user;
receiving voice information of the user reading the registration string;
determining the user's gender tag according to a gender classifier and the voice;
generating the user's voiceprint model according to the gender tag and the voice.
Another aspect of the present application provides a voiceprint creation and registration method, including:
acquiring a voice request sent by a user;
identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request;
if no user ID is recognized, prompting to create a voiceprint and register;
generating a user ID;
storing the user ID and the voiceprint model correspondingly in the voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which acquiring the voice request sent by the user further includes:
determining whether the voice request needs to be sent to the cloud, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request.
According to the above aspect and any possible implementation, an implementation is further provided in which acquiring the voice request sent by the user further includes:
determining whether the voice request requires identification of the user ID, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request.
According to the above aspect and any possible implementation, an implementation is further provided in which prompting to create a voiceprint and register includes:
marking the voiceprint model for which no user ID was recognized with an ID number;
determining the appearance frequency of the voiceprint model marked with the ID number;
if the frequency is below a threshold, deleting the ID number;
if the frequency is above the threshold, generating a user ID, and storing the user ID and the voiceprint model correspondingly in the voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which prompting to create a voiceprint and register includes:
establishing, by using a text-related training method, a voiceprint model for the user whose user ID was not recognized.
According to the above aspect and any possible implementation, an implementation is further provided in which establishing, by using the text-related training method, a voiceprint model for the user whose user ID was not recognized includes:
providing a registration string to the user;
receiving voice information of the user reading the registration string;
determining the user's gender tag according to a gender classifier and the voice;
generating the user's voiceprint model according to the gender tag and the voice.
Another aspect of the present invention provides a voiceprint creation and registration apparatus, including:
a prompting module, a voiceprint establishing module, an input module, and a registration module, wherein
the prompting module is configured to prompt to create a voiceprint and register when a device is enabled for the first time;
the voiceprint establishing module is configured to establish a voiceprint model for the user by using a text-related training method;
the input module is configured to generate a user ID;
the registration module is configured to store the user ID and the voiceprint model correspondingly in a voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which the voiceprint establishing module specifically includes the following sub-modules:
a providing sub-module, configured to provide a registration string to the user;
a receiving sub-module, configured to receive voice information of the user reading the registration string;
a determining sub-module, configured to determine the user's gender tag according to a gender classifier and the voice;
a generating sub-module, configured to generate the user's voiceprint model according to the gender tag and the voice.
Another aspect of the present invention provides a voiceprint creation and registration apparatus, including:
an obtaining module, a voiceprint recognition module, a prompting module, an input module, and a registration module, wherein
the obtaining module is configured to acquire a voice request sent by a user;
the voiceprint recognition module is configured to identify, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request;
the prompting module is configured to prompt an unregistered user to create a voiceprint and register;
the input module is configured to generate a user ID;
the registration module is configured to store the user ID and the voiceprint model correspondingly in the voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which the obtaining module specifically performs:
determining whether the voice request needs to be sent to the cloud, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request.
According to the above aspect and any possible implementation, an implementation is further provided in which the prompting module specifically performs:
determining whether the voice request requires identification of the user ID, and if so, identifying, according to the voice request and by using voiceprint recognition, the user ID that issued the voice request.
According to the above aspect and any possible implementation, an implementation is further provided in which the prompting module specifically performs:
marking the voiceprint model for which no user ID was recognized with an ID number;
determining the appearance frequency of the voiceprint model marked with the ID number;
if the frequency is below a threshold, deleting the ID number;
if the frequency is above the threshold, generating a user ID, and storing the user ID and the voiceprint model correspondingly in the voiceprint registration database.
According to the above aspect and any possible implementation, an implementation is further provided in which the prompting module specifically performs:
establishing, by using a text-related training method, a voiceprint model for an unregistered user.
According to the above aspect and any possible implementation, an implementation is further provided in which the prompting module includes the following sub-modules:
a providing sub-module, configured to provide a registration string to the user;
a receiving sub-module, configured to receive voice information of the user reading the registration string;
a determining sub-module, configured to determine the user's gender tag according to a gender classifier and the voice;
a generating sub-module, configured to generate the user's voiceprint model according to the gender tag and the voice.
Another aspect of the present application provides a device, characterized in that the device includes:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any of the methods described above.
Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, any of the methods described above is implemented.
It can be seen from the above technical solutions that the embodiments of the present application can avoid the problems in the prior art that voiceprint recognition methods are highly technology-dependent, have a single usage strategy, and have a low degree of productization; they have a high technical fault-tolerance rate, accelerate productization, and provide users with personalized services.
附图说明
图1为本申请一实施例提供的声纹创建与注册方法的流程示意图;
图2为本申请一实施例提供的声纹创建与注册方法中采用文本相关的训练方法,为未注册用户建立声纹模型的流程示意图;
图3为本申请另一实施例提供的声纹创建与注册方法的流程示意图；
图4为本申请另一实施例提供的声纹创建与注册方法中根据所述语音请求，采用声纹识别方式，识别发出语音请求的用户ID的流程示意图；
图5为本申请另一实施例提供的声纹创建与注册方法中提示未注册用户创建声纹并注册的流程示意图;
图6为本申请另一实施例提供的声纹创建与注册装置的结构示意图;
图7为本申请一实施例提供的声纹创建与注册装置的声纹建立模块的结构示意图;
图8为本申请另一实施例提供的声纹创建与注册装置的结构示意图;
图9为本申请另一实施例提供的声纹创建与注册装置的提示模块的结构示意图;
图10为适于用来实现本发明实施例的示例性计算机系统/服务器的框图。
具体实施方式
为了使本发明的目的、技术方案和优点更加清楚,下面结合附图和具体实施例对本发明进行详细描述。
另外,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
对于一个智能语音交互设备,存在一个MateAPP在手机端与智能语音交互设备配合,完成一系列任务。为了建立声纹,在MateAPP上创建有一个“声纹管理”功能模块,在其中用户可以创建、删除和修改账号下的声纹。
图1为本申请一实施例提供的声纹创建与注册方法的流程示意图，如图1所示，包括以下步骤：
在101中,当设备首次启用,提示创建声纹并注册;
当设备第一次启动上电,提示用户通过MateAPP注册至少一个声纹ID,并确认相关身份信息,如姓名、年龄、性别等信息。
用户通过MateAPP或通过语音表达要创建声纹的意愿从而进行声纹创建。
在102中,采用文本相关的训练方法,为用户建立声纹模型;具体的,如图2所示,包括以下子步骤:
在201中,将注册字符串提供给用户。
可以理解,该注册字符串的形式可以有很多种:
作为一种示例,该注册字符串可为随机生成的数字串。此外,为了能够覆盖更大的样本空间,注册字符串中的数字只出现一次。
作为另一种示例,该注册字符串可为随机生成的汉字字符串。
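As an illustration of the two registration-string forms described above, the following Python sketch generates both variants. It is only a sketch under assumptions: the description fixes neither the string length nor the character pool, so DIGIT_LENGTH and HANZI_POOL below are made-up placeholders.

```python
import random

DIGIT_LENGTH = 8
HANZI_POOL = "天地玄黄宇宙洪荒日月盈昃辰宿列张"  # hypothetical character pool

def make_digit_enroll_string(length: int = DIGIT_LENGTH) -> str:
    """Random digit string in which each digit occurs at most once,
    so the enrollment text covers a wider sample space."""
    if length > 10:
        raise ValueError("digits 0-9 can each appear only once")
    return "".join(random.sample("0123456789", length))

def make_hanzi_enroll_string(length: int = 8) -> str:
    """Random Chinese-character string, the alternative enrollment-text form."""
    return "".join(random.choices(HANZI_POOL, k=length))

if __name__ == "__main__":
    print(make_digit_enroll_string())   # e.g. "52980317"
    print(make_hanzi_enroll_string())
```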
在202中,接收用户阅读注册字符串的语音信息。
具体地,在将该注册字符串提供给用户之后,用户可按照提供的注册字符串进行多次朗读以生成多条语音进行注册。在用户阅读该注册字符串的过程中,或者在用户完成阅读该注册字符串时,可接收用户按照提供的注册字符串进行朗读而生成的语音信息。
在203中,根据性别分类器和语音确定用户的性别标签。
在本发明的实施例中，可根据性别分类器对语音进行性别分类，得到该用户的性别标签。其中，该性别标签包括男性或女性。具体而言，提取所获取到的语音的第一特征信息，并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析，获取所述第一特征信息的性别标签，也就是用户的性别标签。
举例而言,以性别分类模型为高斯混合模型为例,可先对该语音提取基频特征以及梅尔频率倒谱系数MFCC特征,之后,可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算,根据计算结果确定该用户的性别,例如,假设该高斯混合模型为男性高斯混合模型,则当计算结果为后验概率值很高,如大于一定阈值时,可确定该用户的性别为男性,当计算结果为后验概率值很小,如小于一定阈值时,可确定该用户的性别为女性。
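The paragraph above describes gender classification from pitch (fundamental frequency) and MFCC features scored against a Gaussian mixture model with a likelihood threshold. The sketch below illustrates that flow in Python under assumptions: the librosa-based feature extraction, the feature dimension, the pre-trained male_gmm and the threshold value are illustrative stand-ins, not parameters given in the description.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

N_MFCC = 13                      # assumed feature dimension
MALE_LOGLIK_THRESHOLD = -45.0    # assumed decision threshold

def gender_features(wave: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Per-frame features: fundamental frequency (pitch) stacked with MFCCs."""
    mfcc = librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=N_MFCC)    # shape (13, T)
    f0 = librosa.yin(wave, fmin=50, fmax=400, sr=sr)             # shape (T',)
    t = min(mfcc.shape[1], f0.shape[0])
    return np.vstack([f0[None, :t], mfcc[:, :t]]).T              # shape (t, 14)

def classify_gender(wave: np.ndarray, male_gmm: GaussianMixture, sr: int = 16000) -> str:
    """High average likelihood under the male GMM -> 'male', otherwise 'female',
    mirroring the thresholding example in the description."""
    avg_loglik = male_gmm.score(gender_features(wave, sr))  # mean log-likelihood per frame
    return "male" if avg_loglik > MALE_LOGLIK_THRESHOLD else "female"

# The male GMM itself would be fitted offline on features from male speakers, e.g.:
#   male_gmm = GaussianMixture(n_components=16).fit(male_training_features)
```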
在204中,根据性别标签和语音生成用户的声纹模型。
根据与所述性别标签对应的DNN模型获取每条语音的后验概率。
根据与所述性别标签对应的统一背景模型和特征向量提取模型,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型。
具体地,根据性别分类器返回的与语音对应的性别标签,将用户输入的多条语音发送到对应性别的DNN模型中。也就是说,如果语音对应的是男性语音,将语音发送到男性DNN模型中。如果语音对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取每条语音对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型，获取的方式很多，可以根据不同的应用需要进行选择，例如：
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
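As a minimal sketch of the enrollment aggregation step just described (averaging the per-utterance second feature vectors into one voiceprint model), the code below assumes the gender-specific DNN, the universal background model normalisation and the vector extraction are already wrapped in an extract_second_vector callable; only the averaging, plus an assumed length-normalisation, is shown.

```python
import numpy as np
from typing import Callable, List

def build_voiceprint_model(
    utterances: List[np.ndarray],
    extract_second_vector: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Average the per-utterance 'second feature vectors' into a single
    voiceprint model, the aggregation example given in the description."""
    vectors = np.stack([extract_second_vector(u) for u in utterances])  # (N, D)
    model = vectors.mean(axis=0)
    return model / np.linalg.norm(model)  # length-normalised for later scoring (assumption)

# Toy usage with a stand-in extractor; a real system would plug in the
# gender-specific DNN + UBM + vector-extraction pipeline here.
fake_extractor = lambda wave: np.abs(np.fft.rfft(wave, n=512))[:128]
enroll_waves = [np.random.randn(16000) for _ in range(3)]
print(build_voiceprint_model(enroll_waves, fake_extractor).shape)  # (128,)
```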
在103中,生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据。
在104中,将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
其中,所述预存的声纹模型关联在同一账号,例如百度账号,之下,该账号下所有的声纹形成一个集合。各个智能语音交互设备与账号是唯一绑定的,通过账号将智能语音交互设备与声纹联系起来,声纹可以通过账号下的任意设备注册,一旦注册,可以在账号下任意智能语音设备中使用。当某一账号下的设备采集声纹后,就在该同一账号下的家庭声纹集合中进行匹配,识别声纹ID,达到了三者的统一,实现了从端到端的声纹集合识别解决方案。
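A sketch of the account-level voiceprint registry described above: voiceprint models are stored per account, so any device bound to the same account can use a voiceprint registered on any other device under that account. The class and field names are illustrative and not the actual schema of the registration database.

```python
from dataclasses import dataclass, field
from typing import Dict
import numpy as np

@dataclass
class VoiceprintRegistry:
    # account -> {user_id -> voiceprint model}; every device bound to the same
    # account shares this one set of voiceprints.
    accounts: Dict[str, Dict[str, np.ndarray]] = field(default_factory=dict)

    def register(self, account: str, user_id: str, model: np.ndarray) -> None:
        """Store a user's voiceprint model under the account, regardless of which
        device under that account performed the enrollment."""
        self.accounts.setdefault(account, {})[user_id] = model

    def family_set(self, account: str) -> Dict[str, np.ndarray]:
        """All voiceprints registered under one account, i.e. the set that a newly
        captured voiceprint is matched against."""
        return self.accounts.get(account, {})
```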
图3为本申请另一实施例提供的声纹创建与注册方法的流程示意图,如图3所示,包括以下步骤:
在301中,获取用户发送的语音请求;
在本实施例的一种实现方式中,在智能语音交互设备联网后,用户与智能语音交互设备进行语音交互,判断是否需要向云端发送语音请求;如果是,则进一步识别发出语音请求的用户ID。
在本实施例的另一种实现方式中，首先对语音请求进行语音识别，得到命令语音所描述的命令，确定所述命令对应垂类；如果所述垂类不需要确定用户ID以提供个性化推荐，则直接响应语音请求；如果所述垂类需要确定用户ID以提供个性化推荐，则进一步识别发出语音请求的用户ID。
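The two gating checks described in the implementations above (whether the request needs the cloud at all, and whether its vertical category needs a user ID for personalisation) could be combined as in the sketch below. The vertical and command names, and the identify_speaker / respond callables, are assumptions for illustration only.

```python
# Assumed example values; the description does not enumerate verticals or commands.
PERSONALIZED_VERTICALS = {"music", "video", "recommendation"}
LOCAL_ONLY_COMMANDS = {"volume_up", "volume_down", "pause"}

def handle_voice_request(command: str, vertical: str, identify_speaker, respond):
    """identify_speaker() -> user ID or None; respond(command, user_id) -> answer.
    Both are stand-in callables for the device/cloud plumbing."""
    if command in LOCAL_ONLY_COMMANDS:
        # Handled on-device; the request never needs to go to the cloud.
        return respond(command, user_id=None)
    if vertical in PERSONALIZED_VERTICALS:
        # Only verticals that need personalisation trigger voiceprint recognition.
        return respond(command, user_id=identify_speaker())
    return respond(command, user_id=None)

# Minimal usage:
print(handle_voice_request("play_song", "music",
                           identify_speaker=lambda: "user_001",
                           respond=lambda c, user_id: (c, user_id)))
```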
在302中,根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID;具体的,如图4所示,包括以下子步骤:
在401中,根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户性别标签。
由于不同性别的用户群，具有特殊的声纹特征，因此，可以根据用户群的声音特点，进行模型训练，以实现面向不同性别的用户群的声纹分析。当用户发起语音请求时，根据用户发出的语音请求，采用声纹识别方式，识别出发出语音请求的用户性别信息。
在声纹识别之前,需要先对说话人的声纹进行建模,即“训练”或“学习”。具体的,通过应用深度神经网络DNN声纹基线系统,提取训练集中每条语音的第一特征向量;根据所述每条语音的第一特征向量以及预先标注的性别标签训练性别分类器。从而建立了区分性别的声纹处理模型。
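As a hedged sketch of the classifier-training step just described: the DNN voiceprint baseline that produces the per-utterance "first feature vectors" is assumed to exist elsewhere, and logistic regression stands in for the unspecified classifier family. The toy data at the end only demonstrates the call pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_gender_classifier(first_vectors: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """first_vectors: (N, D) utterance-level vectors from the DNN voiceprint baseline
    (assumed to be computed elsewhere); labels: (N,) pre-annotated 'male'/'female'."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(first_vectors, labels)
    return clf

# Toy usage with random stand-in vectors, purely to show the call pattern.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 64))
labels = np.array(["male", "female"] * 100)
classifier = train_gender_classifier(vectors, labels)
print(classifier.predict(vectors[:3]))
```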
根据所获取到的命令语音,提取所述命令语音的第一特征信息,并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析,获取所述第一特征信息的性别标签,也就是命令语音的性别标签。
举例而言，以性别分类器为高斯混合模型为例，可先对所述语音请求提取基频特征以及梅尔频率倒谱系数MFCC特征，之后，可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算，根据计算结果确定该用户的性别，例如，假设该高斯混合模型为男性高斯混合模型，则当计算结果为后验概率值很高，如大于一定阈值时，可确定该用户的性别为男性，当计算结果为后验概率值很小，如小于一定阈值时，可确定该用户的性别为女性。
在402中,识别出发出语音请求的用户性别标签后,进一步识别发出命令语音的用户声纹ID。
每个用户的声音会有一个唯一的声纹ID,该ID记录有该用户姓名、性别、年龄、爱好等个人数据。
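The personal record attached to one voiceprint ID could be modelled as below; the fields follow the examples listed in the description (name, gender, age, hobbies, addresses), while everything else about the structure is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class VoiceprintProfile:
    """Personal data attached to one voiceprint ID; the exact schema is assumed."""
    voiceprint_id: str
    name: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[int] = None
    hobbies: Tuple[str, ...] = ()
    home_address: Optional[str] = None
    work_address: Optional[str] = None
```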
具体地,根据性别分类器返回的与语音请求对应的性别标签,将用户输入的语音发送到对应性别的DNN模型中。也就是说,如果语音请求对应的是男性语音,将语音发送到男性DNN模型中。如果语音请求对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取语音请求对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型,获取的方式很多,可以根据不同的应用需要进行选择,例如:
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
通过将获取到的所述用户的声纹模型,与预存的声纹模型进行匹配,如果所述匹配值小于预先设定的阈值,则确定所述用户未进行注册,为首次使用智能设备,执行步骤303。
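The matching-and-threshold decision above can be sketched as a similarity search over the voiceprints registered under the account; cosine similarity and the MATCH_THRESHOLD value are assumptions, since the description only speaks of a preset threshold.

```python
import numpy as np
from typing import Dict, Optional

MATCH_THRESHOLD = 0.75  # assumed value; the description only says "a preset threshold"

def identify_user(request_vector: np.ndarray,
                  family_set: Dict[str, np.ndarray]) -> Optional[str]:
    """Return the best-matching registered user ID, or None when the best score is
    below the threshold, i.e. the speaker is treated as unregistered (step 303)."""
    best_id, best_score = None, -1.0
    for user_id, model in family_set.items():
        score = float(np.dot(request_vector, model)
                      / (np.linalg.norm(request_vector) * np.linalg.norm(model)))
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= MATCH_THRESHOLD else None
```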
在303中,若未识别到用户ID,则提示创建声纹并注册;
在本实施例的一种实现方式中,若未识别到用户ID,则采用非文本相关的训练方法,为未注册用户建立声纹模型并注册。
具体的,将获取到的未进行注册的用户的声纹模型打上ID号;
生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据并注册该声纹。
将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
在本实施例的另一种实现方式中，为了将对用户的打扰降到最小，可以只引导经常使用的家庭用户创建声纹，具体的：
将未识别到用户ID的声纹模型打上ID号;但不生成用户ID提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据;仅在后台记录该ID号所属用户的行为。
判断打上ID号的声纹模型的出现频率;
如果该声纹出现频次低,则自动删除该ID号;
如果该声纹出现频次较高或连续多天出现,则生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据并注册该声纹。将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
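A sketch of the frequency-gated registration flow described above: unrecognised voiceprints receive a temporary ID, are tracked in the background, and are either pruned or promoted to a registration prompt. MIN_HITS and MIN_DAYS are assumed values, and the distinct-day count only approximates the "appears on several consecutive days" condition.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Set

MIN_HITS = 5   # assumed frequency threshold
MIN_DAYS = 3   # assumed day-count threshold (approximates "several consecutive days")

class PendingVoiceprints:
    """Unrecognised voiceprints get a temporary ID and are only promoted to a
    registration prompt once they recur often enough; rare ones are pruned."""
    def __init__(self) -> None:
        self.hits: Dict[str, int] = defaultdict(int)
        self.days: Dict[str, Set[date]] = defaultdict(set)

    def observe(self, temp_id: str, when: date) -> None:
        """Record one appearance of the temporarily tagged voiceprint."""
        self.hits[temp_id] += 1
        self.days[temp_id].add(when)

    def should_prompt_registration(self, temp_id: str) -> bool:
        return self.hits[temp_id] >= MIN_HITS or len(self.days[temp_id]) >= MIN_DAYS

    def prune(self, temp_id: str) -> None:
        """Drop a temporary ID whose voiceprint rarely reappears."""
        self.hits.pop(temp_id, None)
        self.days.pop(temp_id, None)
```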
在本实施例的一种实现方式中,采用文本相关的训练方法,为未识别到用户ID的用户建立声纹模型;在声纹技术尚不完善的情况下,可以使用文本相关的训练方法提高识别率。
具体的,如图5所示,采用文本相关的训练方法,为未识别到用户ID的用户建立声纹模型包括以下子步骤:
在501中,将注册字符串提供给用户。
可以理解,该注册字符串的形式可以有很多种:
作为一种示例,该注册字符串可为随机生成的数字串。此外,为了能够覆盖更大的样本空间,注册字符串中的数字只出现一次。
作为另一种示例,该注册字符串可为随机生成的汉字字符串。
在502中,接收用户阅读注册字符串的语音信息。
具体地,在将该注册字符串提供给用户之后,用户可按照提供的注册字符串进行多次朗读以进行注册。在用户阅读该注册字符串的过程中,或者在用户完成阅读该注册字符串时,可接收用户按照提供的注册字符串进行朗读而生成的语音信息。
在503中,根据性别分类器和语音确定用户的性别标签。
在本发明的实施例中,可根据性别分类器对语音进行性别分类,得到该用户的性别标签。其中,该性别标签包括男性或女性。具体而言,提取所获取到的语音的第一特征信息,并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析,获取所述第一特征信息的性别标签,也就是用户的性别标签。
举例而言，以性别分类模型为高斯混合模型为例，可先对该语音提取基频特征以及梅尔频率倒谱系数MFCC特征，之后，可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算，根据计算结果确定该用户的性别，例如，假设该高斯混合模型为男性高斯混合模型，则当计算结果为后验概率值很高，如大于一定阈值时，可确定该用户的性别为男性，当计算结果为后验概率值很小，如小于一定阈值时，可确定该用户的性别为女性。
在504中,根据性别标签和语音生成用户的声纹模型。
根据与所述性别标签对应的DNN模型获取每条语音的后验概率。
根据与所述性别标签对应的统一背景模型和特征向量提取模型,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型。
具体地,根据性别分类器返回的与语音对应的性别标签,将用户输入的多条语音发送到对应性别的DNN模型中。也就是说,如果语音对应的是男性语音,将语音发送到男性DNN模型中。如果语音对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取每条语音对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型,获取的方式很多,可以根据不同的应用需要进行选择,例如:
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
在304中,生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据。
在305中，将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下，以便后续根据该声纹模型进行声纹识别，并对账号下任意智能语音设备进行语音控制。
其中,所述预存的声纹模型关联在同一账号,例如百度账号,之下,该账号下所有的声纹形成一个集合。各个智能语音交互设备与账号是唯一绑定的,通过账号将智能语音交互设备与声纹联系起来,声纹可以通过账号下的任意设备注册,一旦注册,可以在账号下任意智能语音设备中使用。当某一账号下的设备采集声纹后,就在该同一账号下的家庭声纹集合中进行匹配,识别声纹ID,达到了三者的统一,实现了从端到端的声纹集合识别解决方案。
优选的,用户可以通过MateAPP以语音登录,对用户ID、声纹模型进行修改。
本实施例所述方法能够避免现有技术中声纹创建与注册方法技术学习成本较高,较为打扰用户的问题。实现了区分性别的声纹注册过程,以便应用区分性别的声纹认证处理模型提高了声纹认证的效率和准确性;使得声纹的建立过程能够覆盖各种场景,声纹建立可以在各个阶段引导用户,或者通过频次将声纹建立与注册分离,对用户的打扰最小化,引导用户注册声纹而后使得语音交互产品可以基于声纹对用户提供个性化服务。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在所述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
图6为本申请另一实施例提供的声纹创建与注册装置的结构示意图，如图6所示，包括提示模块61、声纹建立模块62、输入模块63、注册模块64；其中，
所述提示模块61,用于当设备首次启用,提示创建声纹并注册;
当设备第一次启动上电,引导用户通过MateAPP注册至少一个声纹ID,并确认相关身份信息,如姓名、年龄、性别等信息。
用户通过MateAPP或通过语音表达要创建声纹的意愿从而进行声纹创建。
所述声纹建立模块62,用于采用文本相关的训练方法,为用户建立声纹模型;具体的,如图7所示,包括以下子模块:
提供子模块71,用于将注册字符串提供给用户。
可以理解,该注册字符串的形式可以有很多种:
作为一种示例,该注册字符串可为随机生成的数字串。此外,为了能够覆盖更大的样本空间,注册字符串中的数字只出现一次。
作为另一种示例,该注册字符串可为随机生成的汉字字符串。
接收子模块72,用于接收用户阅读注册字符串的语音信息。
具体地,在将该注册字符串提供给用户之后,用户可按照提供的注册字符串进行多次朗读以生成多条语音进行注册。在用户阅读该注册字符串的过程中,或者在用户完成阅读该注册字符串时,可接收用户按照提供的注册字符串进行朗读而生成的语音信息。
确定子模块73,用于根据性别分类器和语音确定用户的性别标签。
在本发明的实施例中,可根据性别分类器对语音进行性别分类,得到该用户的性别标签。其中,该性别标签包括男性或女性。具体而言,提取所获取到的语音的第一特征信息,并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析,获取所述第一特征信息的性别标签,也就是用户的性别标签。
举例而言,以性别分类模型为高斯混合模型为例,可先对该语音提取基频特征以及梅尔频率倒谱系数MFCC特征,之后,可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算,根据计算结果确定该用户的性别,例如,假设该高斯混合模型为男性高斯混合模型,则当计算结果为后验概率值很高,如大于一定阈值时,可确定该用户的性别为男性,当计算结果为后验概率值很小,如小于一定阈值时,可确定该用户的性别为女性。
生成子模块74,用于根据性别标签和语音生成用户的声纹模型。
根据与所述性别标签对应的DNN模型获取每条语音的后验概率。
根据与所述性别标签对应的统一背景模型和特征向量提取模型,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型。
具体地,根据性别分类器返回的与语音对应的性别标签,将用户输入的多条语音发送到对应性别的DNN模型中。也就是说,如果语音对应的是男性语音,将语音发送到男性DNN模型中。如果语音对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取每条语音对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型,获取的方式很多,可以根据不同的应用需要进行选择,例如:
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
输入模块63,用于生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据。
注册模块64,用于将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
其中,所述预存的声纹模型关联在同一账号,例如百度账号,之下,该账号下所有的声纹形成一个集合。各个智能语音交互设备与账号是唯一绑定的,通过账号将智能语音交互设备与声纹联系起来,声纹可以通过账号下的任意设备注册,一旦注册,可以在账号下任意智能语音设备中使用。当某一账号下的设备采集声纹后,就在该同一账号下的家庭声纹集合中进行匹配,识别声纹ID,达到了三者的统一,实现了从端到端的声纹集合识别解决方案。
图8为本申请另一实施例提供的声纹创建与注册装置的结构示意图，如图8所示，包括以下模块：
获取模块81,用于获取用户发送的语音请求;
在本实施例的一种实现方式中，在智能语音交互设备联网后，用户与智能语音交互设备进行语音交互，判断是否需要向云端发送语音请求；如果是，则进一步识别发出语音请求的用户ID。
在本实施例的另一种实现方式中,首先对语音请求进行语音识别,得到命令语音所描述的命令,确定所述命令对应垂类;如果所述垂类不需要确定用户ID以提供个性化推荐,则直接响应语音请求;如果所述垂类需要确定用户ID以提供个性化推荐,则进一步识别发出语音请求的用户ID。
声纹识别模块82,用于根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID;具体的,包括以下子模块:
用户性别识别子模块,用于根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户性别标签。
由于不同性别的用户群，具有特殊的声纹特征，因此，可以根据用户群的声音特点，进行模型训练，以实现面向不同性别的用户群的声纹分析。当用户发起语音请求时，根据用户发出的语音请求，采用声纹识别方式，识别出发出语音请求的用户性别信息。
在声纹识别之前,需要先对说话人的声纹进行建模,即“训练”或“学习”。具体的,通过应用深度神经网络DNN声纹基线系统,提取训练集中每条语音的第一特征向量;根据所述每条语音的第一特征向量以及预先标注的性别标签训练性别分类器。从而建立了区分性别的声纹处理模型。
根据所获取到的命令语音，提取所述命令语音的第一特征信息，并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析，获取所述第一特征信息的性别标签，也就是命令语音的性别标签。
举例而言,以性别分类器为高斯混合模型为例,可先对所述语音请求提取基频特征以及梅尔频率倒谱系数MFCC特征,之后,可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算,根据计算结果确定该用户的性别,例如,假设该高斯混合模型为男性高斯混合模型,则当计算结果为后验概率值很高,如大于一定阈值时,可确定该用户的性别为男性,当计算结果为后验概率值很小,如小于一定阈值时,可确定该用户的性别为女性。
用户声纹ID识别子模块,用于识别出发出语音请求的用户性别标签后,进一步识别发出命令语音的用户声纹ID。
每个用户的声音会有一个唯一的声纹ID,该ID记录有该用户姓名、性别、年龄、爱好等个人数据。
具体地,根据性别分类器返回的与语音请求对应的性别标签,将用户输入的语音发送到对应性别的DNN模型中。也就是说,如果语音请求对应的是男性语音,将语音发送到男性DNN模型中。如果语音请求对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取语音请求对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型,获取的方式很多,可以根据不同的应用需要进行选择,例如:
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
通过将获取到的所述用户的声纹模型,与预存的声纹模型进行匹配,如果所述匹配值小于预先设定的阈值,则确定所述用户未进行注册,为首次使用智能设备。
提示模块83,用于若未识别到用户ID,则提示创建声纹并注册;
在本实施例的一种实现方式中,若未识别到用户ID,为首次使用智能设备,则提示模块83采用非文本相关的训练方法,为用户建立声纹模型。
具体的,
将未识别到用户ID的声纹模型打上ID号;
生成用户ID;提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据并注册该声纹。
将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
在本实施例的另一种实现方式中，为了将对用户的打扰降到最小，可以只引导经常使用的家庭用户创建声纹，具体的：
将未识别到用户ID的声纹模型打上ID号;但不生成用户ID提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据;仅在后台记录该ID号所属用户的行为。
判断打上ID号的声纹模型的出现频率;
如果该声纹出现频次低,则自动删除该ID号;
如果该声纹出现频次较高或连续多天出现，则生成用户ID，提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据并注册该声纹。将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下，以便后续根据该声纹模型进行声纹识别，并对账号下任意智能语音设备进行语音控制。
在本实施例的一种实现方式中，提示模块83采用文本相关的训练方法，为未识别到用户ID的用户建立声纹模型并注册；在声纹技术尚不完善的情况下，可以使用文本相关的训练方法提高识别率。具体的，如图9所示，包括以下子模块：
提供子模块91,用于将注册字符串提供给用户。
可以理解,该注册字符串的形式可以有很多种:
作为一种示例,该注册字符串可为随机生成的数字串。此外,为了能够覆盖更大的样本空间,注册字符串中的数字只出现一次。
作为另一种示例,该注册字符串可为随机生成的汉字字符串。
接收子模块92,用于接收用户阅读注册字符串的语音信息。
具体地,在将该注册字符串提供给用户之后,用户可按照提供的注册字符串进行多次朗读以进行注册。在用户阅读该注册字符串的过程中,或者在用户完成阅读该注册字符串时,可接收用户按照提供的注册字符串进行朗读而生成的语音信息。
确定子模块93,用于根据性别分类器和语音确定用户的性别标签。
在本发明的实施例中,可根据性别分类器对语音进行性别分类,得到该用户的性别标签。其中,该性别标签包括男性或女性。具体而言,提取所获取到的语音的第一特征信息,并将第一特征信息发送给预先生成的性别分类器。性别分类器对第一特征信息进行分析,获取所述第一特征信息的性别标签,也就是用户的性别标签。
举例而言,以性别分类模型为高斯混合模型为例,可先对该语音提取基频特征以及梅尔频率倒谱系数MFCC特征,之后,可基于高斯混合模型对基频特征以及MFCC特征进行后验概率值计算,根据计算结果确定该用户的性别,例如,假设该高斯混合模型为男性高斯混合模型,则当计算结果为后验概率值很高,如大于一定阈值时,可确定该用户的性别为男性,当计算结果为后验概率值很小,如小于一定阈值时,可确定该用户的性别为女性。
生成子模块94,用于根据性别标签和语音生成用户的声纹模型。
根据与所述性别标签对应的DNN模型获取每条语音的后验概率。
根据与所述性别标签对应的统一背景模型和特征向量提取模型,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型。
具体地,根据性别分类器返回的与语音对应的性别标签,将用户输入的多条语音发送到对应性别的DNN模型中。也就是说,如果语音对应的是男性语音,将语音发送到男性DNN模型中。如果语音对应的是女性语音,将语音发送到女性DNN模型中。
根据与性别标签对应的DNN模型获取每条语音对应的多个后验概率。
根据与性别标签对应的统一背景模型对每个后验概率进行归一化处理,应用预先训练的特征向量提取模型根据每条语音,以及对应的归一化的后验概率,分别提取每条语音的第二特征向量。
根据与所述多条语音对应的多个第二特征向量获取所述用户的声纹模型,获取的方式很多,可以根据不同的应用需要进行选择,例如:
获取多个第二特征向量的平均特征向量作为所述用户的声纹模型。
输入模块84,用于生成用户ID,提示用户输入姓名、性别、年龄、爱好、家庭住址、工作地址等用户ID相关数据。
注册模块85,用于将用户ID和声纹模型对应存储到声纹注册数据库的某一账号之下,以便后续根据该声纹模型进行声纹识别,并对账号下任意智能语音设备进行语音控制。
其中,所述预存的声纹模型关联在同一账号,例如百度账号,之下,该账号下所有的声纹形成一个集合。各个智能语音交互设备与账号是唯一绑定的,通过账号将智能语音交互设备与声纹联系起来,声纹可以通过账号下的任意设备注册,一旦注册,可以在账号下任意智能语音设备中使用。当某一账号下的设备采集声纹后,就在该同一账号下的家庭声纹集合中进行匹配,识别声纹ID,达到了三者的统一,实现了从端到端的声纹集合识别解决方案。
优选的,用户可以通过MateAPP以语音登录,对用户ID、声纹模型进行修改。
本实施例所述方法能够避免现有技术中声纹创建与注册方法技术学习成本较高,较为打扰用户的问题。实现了区分性别的声纹注册过程,以便应用区分性别的声纹认证处理模型提高了声纹认证的效率和准确性;使得声纹的建立过程能够覆盖各种场景,声纹建立可以在各个阶段引导用户,或者通过频次将声纹建立与注册分离,对用户的打扰最小化,引导用户注册声纹而后使得语音交互产品可以基于声纹对用户提供个性化服务。
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，所述描述的终端和服务器的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。
本实施例所述装置能够避免现有技术中声纹创建与注册方法技术学习成本较高,较为打扰用户的问题。使得声纹的建立过程能够覆盖各种场景,声纹建立可以在各个阶段引导用户,或者通过频次将声纹建立与注册分离,对用户的打扰最小化,引导用户注册声纹而后使得语音交互产品可以基于声纹对用户提供个性化服务。
在本申请所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。所述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
图10示出了适于用来实现本发明实施方式的示例性计算机系统/服务器012的框图。图10显示的计算机系统/服务器012仅仅是一个示例，不应对本发明实施例的功能和使用范围带来任何限制。
如图10所示,计算机系统/服务器012以通用计算设备的形式表现。计算机系统/服务器012的组件可以包括但不限于:一个或者多个处理器或者处理单元016,系统存储器028,连接不同系统组件(包括系统存储器028和处理单元016)的总线018。
总线018表示几类总线结构中的一种或多种,包括存储器总线或者存储器控制器,外围总线,图形加速端口,处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说,这些体系结构包括但不限于工业标准体系结构(ISA)总线,微通道体系结构(MAC)总线,增强型ISA总线、视频电子标准协会(VESA)局域总线以及外围组件互连(PCI)总线。
计算机系统/服务器012典型地包括多种计算机系统可读介质。这些介质可以是任何能够被计算机系统/服务器012访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。
系统存储器028可以包括易失性存储器形式的计算机系统可读介质，例如随机存取存储器（RAM）030和/或高速缓存存储器032。计算机系统/服务器012可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例，存储系统034可以用于读写不可移动的、非易失性磁介质（图10未显示，通常称为"硬盘驱动器"）。尽管图10中未示出，可以提供用于对可移动非易失性磁盘（例如"软盘"）读写的磁盘驱动器，以及对可移动非易失性光盘（例如CD-ROM，DVD-ROM或者其它光介质）读写的光盘驱动器。在这些情况下，每个驱动器可以通过一个或者多个数据介质接口与总线018相连。存储器028可以包括至少一个程序产品，该程序产品具有一组（例如至少一个）程序模块，这些程序模块被配置以执行本发明各实施例的功能。
具有一组(至少一个)程序模块042的程序/实用工具040,可以存储在例如存储器028中,这样的程序模块042包括——但不限于——操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块042通常执行本发明所描述的实施例中的功能和/或方法。
计算机系统/服务器012也可以与一个或多个外部设备014(例如键盘、指向设备、显示器024等)通信,在本发明中,计算机系统/服务器012与外部雷达设备进行通信,还可与一个或者多个使得用户能与该计算机系统/服务器012交互的设备通信,和/或与使得该计算机系统/服务器012能与一个或多个其它计算设备进行通信的任何设备(例如网卡,调制解调器等等)通信。这种通信可以通过输入/输出(I/O)接口022进行。并且,计算机系统/服务器012还可以通过网络适配器020与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图10所示,网络适配器020通过总线018与计算机系统/服务器012的其它模块通信。应当明白,尽管图10中未示出,可以结合计算机系统/服务器012使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。
处理单元016通过运行存储在系统存储器028中的程序,从而执行本发明所描述的实施例中的功能和/或方法。
上述的计算机程序可以设置于计算机存储介质中,即该计算机存储介质被编码有计算机程序,该程序在被一个或多个计算机执行时,使得一个或多个计算机执行本发明上述实施例中所示的方法流程和/或装置操作。
随着时间、技术的发展,介质含义越来越广泛,计算机程序的传播途径不再受限于有形介质,还可以直接从网络下载等。可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括——但不限于——电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上包含的程序代码可以用任何适当的介质传输，包括——但不限于——无线、电线、光缆、RF等等，或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本发明操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
以上所述仅为本发明的较佳实施例而已,并不用以限制本发明,凡在本发明的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本发明保护的范围之内。

Claims (18)

  1. 一种声纹创建与注册方法,其特征在于,包括:
    当设备首次启用,提示创建声纹并注册;
    采用文本相关的训练方法,为用户建立声纹模型;
    生成用户ID;
    将用户ID和声纹模型对应存储到声纹注册数据库。
  2. 根据权利要求1所述的声纹创建与注册方法,其特征在于,所述采用文本相关的训练方法,为用户建立声纹模型包括以下子步骤:
    将注册字符串提供给用户;
    接收用户阅读注册字符串的语音信息;
    根据性别分类器和语音确定用户的性别标签;
    根据性别标签和语音生成用户的声纹模型。
  3. 一种声纹创建与注册方法,其特征在于,包括:
    获取用户发送的语音请求;
    根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID;
    若未识别到用户ID,则提示创建声纹并注册;
    生成用户ID;
    将用户ID和声纹模型对应存储到声纹注册数据库。
  4. 根据权利要求3所述的声纹创建与注册方法,其特征在于,所述获取用户发送的语音请求进一步包括:
    判断是否需要向云端发送所述语音请求,如果是,则根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID。
  5. 根据权利要求3或4所述的声纹创建与注册方法,其特征在于,所述获取用户发送的语音请求进一步包括:
    判断所述语音请求是否需要识别用户ID,如果是,则根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID。
  6. 根据权利要求3、4或5所述的声纹创建与注册方法,其特征在于,所述提示创建声纹并注册包括:
    将未识别到用户ID的声纹模型打上ID号;
    判断所述打上ID号的声纹模型的出现频率;
    如果低于阈值,则删除该ID号;
    如果高于阈值,则生成用户ID;将用户ID和声纹模型对应存储到声纹注册数据库。
  7. 根据权利要求3至6任一权项所述的声纹创建与注册方法,其特征在于,所述提示创建声纹并注册包括:
    采用文本相关的训练方法,为未识别到用户ID的用户建立声纹模型。
  8. 根据权利要求7所述的声纹创建与注册方法,其特征在于,所述采用文本相关的训练方法,为未识别到用户ID的用户建立声纹模型包括:
    将注册字符串提供给用户;
    接收用户阅读注册字符串的语音信息;
    根据性别分类器和语音确定用户的性别标签;
    根据性别标签和语音生成用户的声纹模型。
  9. 一种声纹创建与注册装置,其特征在于,包括:
    提示模块、声纹建立模块、输入模块、注册模块;其中,
    所述提示模块,用于当设备首次启用,提示创建声纹并注册;
    所述声纹建立模块,用于采用文本相关的训练方法,为用户建立声纹模型;
    所述输入模块,用于生成用户ID;
    所述注册模块,用于将用户ID和声纹模型对应存储到声纹注册数据库。
  10. 根据权利要求9所述的声纹创建与注册装置,其特征在于,所述声纹建立模块,具体包括以下子模块:
    提供子模块,用于将注册字符串提供给用户;
    接收子模块,用于接收用户阅读注册字符串的语音信息;
    确定子模块,用于根据性别分类器和语音确定用户的性别标签;
    生成子模块,用于根据性别标签和语音生成用户的声纹模型。
  11. 一种声纹创建与注册装置,其特征在于,包括:
    获取模块、声纹识别模块、提示模块、输入模块和注册模块;其中,
    所述获取模块,用于获取用户发送的语音请求;
    所述声纹识别模块,用于根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID;
    所述提示模块,用于提示未注册用户创建声纹并注册;
    所述输入模块,用于生成用户ID;
    所述注册模块,用于将用户ID和声纹模型对应存储到声纹注册数据库。
  12. 根据权利要求11所述的声纹创建与注册装置,其特征在于,所述获取模块具体执行:
判断是否需要向云端发送所述语音请求，如果是，则根据所述语音请求，采用声纹识别方式，识别发出语音请求的用户ID。
  13. 根据权利要求11或12所述的声纹创建与注册装置,其特征在于,所述提示模块具体执行:
    判断所述语音请求是否需要识别用户ID,如果是,则根据所述语音请求,采用声纹识别方式,识别发出语音请求的用户ID。
  14. 根据权利要求11、12或13所述的声纹创建与注册装置,其特征在于,所述提示模块具体执行:
    将未识别到用户ID的声纹模型打上ID号;
    判断所述打上ID号的声纹模型的出现频率;
    如果低于阈值,则删除该ID号;
    如果高于阈值,则生成用户ID;将用户ID和声纹模型对应存储到声纹注册数据库。
  15. 根据权利要求13所述的声纹创建与注册装置,其特征在于,所述提示模块具体执行:
    采用文本相关的训练方法,为未注册用户建立声纹模型。
  16. 根据权利要求15所述的声纹创建与注册装置,其特征在于,所述提示模块包括以下子模块:
    提供子模块,用于将注册字符串提供给用户;
    接收子模块,用于接收用户阅读注册字符串的语音信息;
    确定子模块,用于根据性别分类器和语音确定用户的性别标签;
    生成子模块,用于根据性别标签和语音生成用户的声纹模型。
  17. 一种设备,其特征在于,所述设备包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-8中任一所述的方法。
  18. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现如权利要求1-8中任一所述的方法。
PCT/CN2017/113772 2017-06-30 2017-11-30 一种声纹创建与注册方法及装置 WO2019000832A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2019530680A JP2020503541A (ja) 2017-06-30 2017-11-30 声紋の作成・登録の方法及び装置
US16/477,121 US11100934B2 (en) 2017-06-30 2017-11-30 Method and apparatus for voiceprint creation and registration
KR1020197016874A KR102351670B1 (ko) 2017-06-30 2017-11-30 성문 구축 및 등록 방법 및 그 장치
EP17915945.4A EP3564950B1 (en) 2017-06-30 2017-11-30 Method and apparatus for voiceprint creation and registration

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710527022.7A CN107492379B (zh) 2017-06-30 2017-06-30 一种声纹创建与注册方法及装置
CN201710527022.7 2017-06-30

Publications (1)

Publication Number Publication Date
WO2019000832A1 true WO2019000832A1 (zh) 2019-01-03

Family

ID=60644303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/113772 WO2019000832A1 (zh) 2017-06-30 2017-11-30 一种声纹创建与注册方法及装置

Country Status (6)

Country Link
US (1) US11100934B2 (zh)
EP (1) EP3564950B1 (zh)
JP (2) JP2020503541A (zh)
KR (1) KR102351670B1 (zh)
CN (1) CN107492379B (zh)
WO (1) WO2019000832A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210829A (zh) * 2020-02-19 2020-05-29 腾讯科技(深圳)有限公司 语音识别方法、装置、系统、设备和计算机可读存储介质

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108597525B (zh) * 2018-04-25 2019-05-03 四川远鉴科技有限公司 语音声纹建模方法及装置
CN109036436A (zh) * 2018-09-18 2018-12-18 广州势必可赢网络科技有限公司 一种声纹数据库建立方法、声纹识别方法、装置及系统
CN109510844B (zh) * 2019-01-16 2022-02-25 中民乡邻投资控股有限公司 一种基于声纹的对话交流式的账号注册方法及装置
CN111798857A (zh) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 一种信息识别方法、装置、电子设备及存储介质
CN109920435B (zh) * 2019-04-09 2021-04-06 厦门快商通信息咨询有限公司 一种声纹识别方法及声纹识别装置
CN112127090A (zh) * 2019-06-06 2020-12-25 青岛海尔洗衣机有限公司 用于衣物处理设备的控制方法
CN110459227A (zh) * 2019-08-29 2019-11-15 四川长虹电器股份有限公司 基于智能电视的声纹注册方法
CN110570873B (zh) * 2019-09-12 2022-08-05 Oppo广东移动通信有限公司 声纹唤醒方法、装置、计算机设备以及存储介质
CN111081258B (zh) * 2019-11-07 2022-12-06 厦门快商通科技股份有限公司 一种声纹模型管理方法、系统、存储介质及装置
CN110992930A (zh) * 2019-12-06 2020-04-10 广州国音智能科技有限公司 声纹特征提取方法、装置、终端及可读存储介质
CN111368504A (zh) * 2019-12-25 2020-07-03 厦门快商通科技股份有限公司 语音数据标注方法、装置、电子设备及介质
CN111161746B (zh) * 2019-12-31 2022-04-15 思必驰科技股份有限公司 声纹注册方法及系统
CN111477234A (zh) * 2020-03-05 2020-07-31 厦门快商通科技股份有限公司 一种声纹数据注册方法和装置以及设备
CN111599367A (zh) * 2020-05-18 2020-08-28 珠海格力电器股份有限公司 一种智能家居设备的控制方法、装置、设备及介质
US11699447B2 (en) * 2020-06-22 2023-07-11 Rovi Guides, Inc. Systems and methods for determining traits based on voice analysis
CN111914803B (zh) * 2020-08-17 2023-06-13 华侨大学 一种唇语关键词检测方法、装置、设备及存储介质
CN112185362A (zh) * 2020-09-24 2021-01-05 苏州思必驰信息科技有限公司 针对用户个性化服务的语音处理方法及装置
CN112423063A (zh) * 2020-11-03 2021-02-26 深圳Tcl新技术有限公司 一种智能电视自动设置方法、装置及存储介质
CN112634909B (zh) * 2020-12-15 2022-03-15 北京百度网讯科技有限公司 声音信号处理的方法、装置、设备、计算机可读存储介质
CN113506577A (zh) * 2021-06-25 2021-10-15 贵州电网有限责任公司 一种基于增量采集电话录音完善声纹库的方法
CN113707154B (zh) * 2021-09-03 2023-11-10 上海瑾盛通信科技有限公司 模型训练方法、装置、电子设备和可读存储介质
CN117221450A (zh) * 2023-09-25 2023-12-12 深圳我买家网络科技有限公司 Ai智慧客服系统

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104967622A (zh) * 2015-06-30 2015-10-07 百度在线网络技术(北京)有限公司 基于声纹的通讯方法、装置和系统
CN105185379A (zh) * 2015-06-17 2015-12-23 百度在线网络技术(北京)有限公司 声纹认证方法和装置
CN105656887A (zh) * 2015-12-30 2016-06-08 百度在线网络技术(北京)有限公司 基于人工智能的声纹认证方法以及装置
CN105913850A (zh) * 2016-04-20 2016-08-31 上海交通大学 文本相关声纹密码验证方法
CN106057206A (zh) * 2016-06-01 2016-10-26 腾讯科技(深圳)有限公司 声纹模型训练方法、声纹识别方法及装置
US20160314790A1 (en) * 2015-04-22 2016-10-27 Panasonic Corporation Speaker identification method and speaker identification device
CN106098068A (zh) * 2016-06-12 2016-11-09 腾讯科技(深圳)有限公司 一种声纹识别方法和装置
CN106847292A (zh) * 2017-02-16 2017-06-13 平安科技(深圳)有限公司 声纹识别方法及装置

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5864548A (ja) 1981-10-14 1983-04-16 Fujitsu Ltd 音声日本語処理システム
JP3776805B2 (ja) 2001-02-27 2006-05-17 アルパイン株式会社 携帯電話選択無線通信装置
US20060222210A1 (en) 2005-03-31 2006-10-05 Hitachi, Ltd. System, method and computer program product for determining whether to accept a subject for enrollment
US20070219801A1 (en) 2006-03-14 2007-09-20 Prabha Sundaram System, method and computer program product for updating a biometric model based on changes in a biometric feature of a user
JP2009109712A (ja) * 2007-10-30 2009-05-21 National Institute Of Information & Communication Technology オンライン話者逐次区別システム及びそのコンピュータプログラム
JP2009237774A (ja) * 2008-03-26 2009-10-15 Advanced Media Inc 認証サーバ、サービス提供サーバ、認証方法、通信端末、およびログイン方法
US8442824B2 (en) * 2008-11-26 2013-05-14 Nuance Communications, Inc. Device, system, and method of liveness detection utilizing voice biometrics
JP5577737B2 (ja) 2010-02-18 2014-08-27 株式会社ニコン 情報処理システム
AU2013203139B2 (en) * 2012-01-24 2016-06-23 Auraya Pty Ltd Voice authentication and speech recognition system and method
US20160372116A1 (en) * 2012-01-24 2016-12-22 Auraya Pty Ltd Voice authentication and speech recognition system and method
US9691377B2 (en) * 2013-07-23 2017-06-27 Google Technology Holdings LLC Method and device for voice recognition training
US9548047B2 (en) * 2013-07-31 2017-01-17 Google Technology Holdings LLC Method and apparatus for evaluating trigger phrase enrollment
WO2015033523A1 (ja) 2013-09-03 2015-03-12 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 音声対話制御方法
JP2015153258A (ja) * 2014-02-17 2015-08-24 パナソニックIpマネジメント株式会社 車両用個人認証システム及び車両用個人認証方法
US20150302856A1 (en) * 2014-04-17 2015-10-22 Qualcomm Incorporated Method and apparatus for performing function by speech input
CN104616655B (zh) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 声纹模型自动重建的方法和装置
EP3380964A1 (en) * 2015-11-24 2018-10-03 Koninklijke Philips N.V. Two-factor authentication in a pulse oximetry system
CN106782571A (zh) * 2017-01-19 2017-05-31 广东美的厨房电器制造有限公司 一种控制界面的显示方法和装置

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314790A1 (en) * 2015-04-22 2016-10-27 Panasonic Corporation Speaker identification method and speaker identification device
CN105185379A (zh) * 2015-06-17 2015-12-23 百度在线网络技术(北京)有限公司 声纹认证方法和装置
CN104967622A (zh) * 2015-06-30 2015-10-07 百度在线网络技术(北京)有限公司 基于声纹的通讯方法、装置和系统
CN105656887A (zh) * 2015-12-30 2016-06-08 百度在线网络技术(北京)有限公司 基于人工智能的声纹认证方法以及装置
CN105913850A (zh) * 2016-04-20 2016-08-31 上海交通大学 文本相关声纹密码验证方法
CN106057206A (zh) * 2016-06-01 2016-10-26 腾讯科技(深圳)有限公司 声纹模型训练方法、声纹识别方法及装置
CN106098068A (zh) * 2016-06-12 2016-11-09 腾讯科技(深圳)有限公司 一种声纹识别方法和装置
CN106847292A (zh) * 2017-02-16 2017-06-13 平安科技(深圳)有限公司 声纹识别方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3564950A4

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210829A (zh) * 2020-02-19 2020-05-29 腾讯科技(深圳)有限公司 语音识别方法、装置、系统、设备和计算机可读存储介质

Also Published As

Publication number Publication date
JP7062851B2 (ja) 2022-05-09
EP3564950A4 (en) 2020-08-05
JP2020503541A (ja) 2020-01-30
KR20190077088A (ko) 2019-07-02
JP2021021955A (ja) 2021-02-18
US20190362724A1 (en) 2019-11-28
KR102351670B1 (ko) 2022-01-13
CN107492379B (zh) 2021-09-21
EP3564950B1 (en) 2022-03-23
US11100934B2 (en) 2021-08-24
CN107492379A (zh) 2017-12-19
EP3564950A1 (en) 2019-11-06

Similar Documents

Publication Publication Date Title
WO2019000832A1 (zh) 一种声纹创建与注册方法及装置
CN107481720B (zh) 一种显式声纹识别方法及装置
JP6771805B2 (ja) 音声認識方法、電子機器、及びコンピュータ記憶媒体
WO2019000991A1 (zh) 一种声纹识别方法及装置
US10068588B2 (en) Real-time emotion recognition from audio signals
CN107153496B (zh) 用于输入表情图标的方法和装置
US20200126566A1 (en) Method and apparatus for voice interaction
EP3617946B1 (en) Context acquisition method and device based on voice interaction
US10395655B1 (en) Proactive command framework
WO2017112813A1 (en) Multi-lingual virtual personal assistant
JP2021533397A (ja) 話者埋め込みと訓練された生成モデルとを使用する話者ダイアライゼーション
WO2020019591A1 (zh) 用于生成信息的方法和装置
JP2020034895A (ja) 応答方法及び装置
US20220076674A1 (en) Cross-device voiceprint recognition
CN108363556A (zh) 一种基于语音与增强现实环境交互的方法和系统
CN112969995A (zh) 电子装置及其控制方法
TW201937344A (zh) 智慧型機器人及人機交互方法
CN110704618B (zh) 确定对话数据对应的标准问题的方法及装置
CN112632244A (zh) 一种人机通话的优化方法、装置、计算机设备及存储介质
CN113703585A (zh) 交互方法、装置、电子设备及存储介质
US20190103110A1 (en) Information processing device, information processing method, and program
US10831442B2 (en) Digital assistant user interface amalgamation
CN112037772B (zh) 基于多模态的响应义务检测方法、系统及装置
CN115171673A (zh) 一种基于角色画像的交流辅助方法、装置及存储介质
CN111556096B (zh) 信息推送方法、装置、介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17915945

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019530680

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20197016874

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017915945

Country of ref document: EP

Effective date: 20190729

NENP Non-entry into the national phase

Ref country code: DE