WO2021232213A1 - Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method - Google Patents


Info

Publication number
WO2021232213A1
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
registered
user
voiceprint information
voice
Application number
PCT/CN2020/090930
Other languages
French (fr)
Chinese (zh)
Inventor
高振东 (Gao Zhendong)
吴晶 (Wu Jing)
陈晓 (Chen Xiao)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN202080001170.5A (CN114026637A)
Priority to PCT/CN2020/090930 (WO2021232213A1)
Publication of WO2021232213A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • This application relates to the technical field of voiceprint recognition, and in particular to a voiceprint recognition apparatus, a voiceprint registration apparatus, and a cross-device voiceprint recognition method.
  • Voiceprint recognition is a biometric technology in which a computer converts a voice signal into a digital signal and uses it to determine the speaker's identity.
  • To use a voiceprint recognition function, the user's voiceprint must be registered in advance.
  • Registering the voiceprint on every device that offers this function has become a cumbersome process.
  • In the current technology, a voiceprint mapping model can be established between different devices: voiceprints are extracted from speech recorded by a first device to perform voiceprint registration, and voiceprint features are then extracted from voice commands recorded by a second device.
  • The established voiceprint mapping model maps those voiceprint features to the voiceprint registered on the first device, so that no voiceprint needs to be registered on the second device.
  • However, each saved voiceprint mapping model is device-specific, and a model is needed for every pair of devices, which produces too many mapping relationships. For example, when an environment contains Q devices, Q × (Q − 1) pairwise voiceprint mapping models must be established. Moreover, if the channel of any single device changes, the whole group of mapping models associated with that device becomes invalid, so recognition accuracy is difficult to guarantee.
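The Q × (Q − 1) count follows from assigning one directed mapping model to every ordered pair of distinct devices. The short sketch below (function names are illustrative, not from the application) computes that total and, assuming that a channel change on one device invalidates every model mapping to or from that device, how many models such a change touches.

```python
def pairwise_mapping_models(q: int) -> int:
    """Directed mapping models for Q devices: one per ordered pair of
    distinct devices, i.e. Q x (Q - 1)."""
    if q < 0:
        raise ValueError("device count must be non-negative")
    return q * (q - 1)


def models_affected_by_one_device(q: int) -> int:
    """Models that map from or to one given device: (Q - 1) outgoing plus
    (Q - 1) incoming. Assumed to need rebuilding if that device's channel changes."""
    if q < 1:
        raise ValueError("need at least one device")
    return 2 * (q - 1)


for q in (2, 5, 10):
    print(q, pairwise_mapping_models(q), models_affected_by_one_device(q))
```

For 10 devices this gives 90 mapping models, 18 of which involve any single device, whereas a device-independent registered voiceprint is recorded once and shared.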
  • Embodiments of this application provide a voiceprint recognition apparatus in a recognition device, a voiceprint registration apparatus in a registration device, and a cross-device voiceprint recognition method.
  • The method is applied to a cross-device voiceprint recognition system that includes a recognition device and a registration device. The registration device registers the user's voice: it eliminates the noise related to the registration device from the recorded voice to obtain the user's biological voiceprint, and records the biological voiceprint under the user's voiceprint identifier.
  • Registration can be understood as "recording" the user's biological voiceprint; the recorded biological voiceprint is called the user's registered voiceprint information and provides the basis for voiceprint recognition. The recognition device performs voiceprint recognition on the user's voice.
  • After receiving the voice input by the user, the recognition device eliminates the noise related to the recognition device from the voice to obtain the user's biological voiceprint (called target voiceprint information), and then performs voiceprint recognition using the registered voiceprint information shared by the registration device: the target voiceprint information is matched against the registered voiceprint information, and the matching result indicates the user's identity.
  • Target voiceprint information: the user's biological voiceprint, obtained after the recognition device eliminates the noise related to itself.
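The division of labour between the registration device and the recognition device can be sketched as follows, under loudly simplified assumptions that do not come from the application: feature vectors stand in for voiceprints, each device's noise is modelled as a known fixed offset (a hypothetical `remove_device_noise` stands in for the extraction model or filter), and matching uses cosine similarity with an arbitrary 0.9 threshold.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def remove_device_noise(features: Sequence[float], device_offset: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the extraction model or filter: here each
    device is assumed to add a fixed, known offset to the features."""
    return [f - o for f, o in zip(features, device_offset)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Shared registered voiceprint information, keyed by voiceprint ID.
registered: Dict[str, Vector] = {}


def register(voiceprint_id: str, features: Sequence[float], device_offset: Sequence[float]) -> None:
    """Registration device: strip its own noise, then record the biological voiceprint."""
    registered[voiceprint_id] = remove_device_noise(features, device_offset)


def recognize(features: Sequence[float], device_offset: Sequence[float],
              threshold: float = 0.9) -> Optional[str]:
    """Recognition device: strip its own noise, then match against shared templates."""
    target = remove_device_noise(features, device_offset)
    best_id, best_score = None, threshold
    for voiceprint_id, template in registered.items():
        score = cosine(target, template)
        if score >= best_score:
            best_id, best_score = voiceprint_id, score
    return best_id


# Usage: register on a phone, recognize on a smart speaker.
voiceprint = [1.0, 0.2, 0.5]   # the user's "clean" biological voiceprint
phone = [0.3, -0.1, 0.0]       # assumed phone-specific distortion
speaker = [-0.2, 0.4, 0.1]     # assumed speaker-specific distortion
register("user-001", [v + o for v, o in zip(voiceprint, phone)], phone)
print(recognize([v + o for v, o in zip(voiceprint, speaker)], speaker))  # user-001
```

No registration step runs on the speaker; it only needs the shared template and its own noise removal.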
  • According to a first aspect, an embodiment of this application provides a voiceprint recognition apparatus in a recognition device.
  • The recognition device may be a terminal, and the voiceprint recognition apparatus may be a processor, a chip, or a chip system in the terminal; alternatively, the voiceprint recognition apparatus may be a terminal that serves as a component of the recognition device.
  • The apparatus includes a processor, and a voice input/output device and a transceiver connected to the processor. The voice input/output device receives the voice input by the user. The processor eliminates the noise related to the recognition device from the received voice to obtain target voiceprint information, that is, the biological voiceprint after the recognition device's influence on the voice has been eliminated; the user's biological voiceprint is equivalent to the voiceprint a listener hears when the user speaks face-to-face.
  • The registered voiceprint information can be shared with other recognition devices.
  • The user only needs to register the voiceprint on one device (the registration device) to use the voiceprint recognition function on multiple devices.
  • No voiceprint registration is required on the recognition device, which eliminates the cumbersome registration process, enables cross-device voiceprint recognition, and greatly improves the user experience.
  • Moreover, because both the registered voiceprint information and the target voiceprint information are biological voiceprints from which device-related noise has been separated, the influence of the device itself on the voiceprint is eliminated, and recognition accuracy is high regardless of the type of terminal the recognition device is.
  • The apparatus further includes a transceiver connected to the processor. The transceiver receives the registered voiceprint information and the voiceprint identifier of the registered voiceprint information sent by a storage device or the registration device; the voiceprint identifier indicates the user. In this embodiment, the registered voiceprint information can be kept in a storage device, which provides a basis for sharing it with other devices.
  • When the user needs to use a recognition device for voiceprint recognition, no voiceprint registration is needed on that recognition device; instead, the registered voiceprint information held by the storage device or the registration device can be shared.
  • The transceiver is also used to send the user's voiceprint identifier to the storage device or the registration device to request the user's registered voiceprint information. Because the user's voiceprint identifier corresponds to the registered voiceprint information, the storage device or the registration device then sends the registered voiceprint information corresponding to that identifier to the voiceprint recognition apparatus.
  • The processor is specifically configured to input the voice into a voiceprint extraction model, which eliminates the noise related to the recognition device to obtain the target voiceprint information. In this embodiment, the voiceprint extraction model is pre-trained by machine learning, and the trained model extracts the user's biological voiceprint directly from the speech, which gives good robustness.
  • The voiceprint extraction model is obtained by learning from a corpus collected by multiple devices. These may be terminals of different types, including but not limited to smart home terminals, lighting systems, smart speakers, robots, in-vehicle terminals, user equipment, mobile phones, tablets, personal computers, virtual reality terminal devices, and augmented reality terminal devices. In this embodiment, the voiceprint extraction model is obtained by learning from the corpus collected by these different devices.
  • The voiceprint information output by the voiceprint extraction model is therefore free of the influence of the different devices on the voice, which yields the user's biological voiceprint.
  • The processor may also perform frequency response compensation on the voice through a filter; the frequency response compensation eliminates the noise related to the recognition device in the voice to obtain the target voiceprint information.
  • In this embodiment, the filter processes the voice signal by attenuating over-emphasized frequency components and boosting weak ones, thereby compensating the frequency response and removing device-related noise.
  • The filter removes the noise signal directly, so the output signal is the user's biological voiceprint signal; this approach is simple and fast to implement.
  • The processor is also used to obtain user data associated with the user identity and to perform the operation corresponding to that user data. In this embodiment, because the processor performs operations corresponding to the user's historical information, the user does not have to repeat habitual operations every time, which improves the user experience.
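The frequency response compensation described for the recognition device (attenuate over-emphasized components, boost weak ones) can be illustrated with a per-band equalizer. This is a minimal sketch that assumes the device's magnitude response has already been measured in a few bands, in dB relative to flat; the band values and the 12 dB boost cap are made up for illustration and are not taken from the application.

```python
from typing import List, Sequence


def compensation_gains_db(device_response_db: Sequence[float],
                          max_boost_db: float = 12.0) -> List[float]:
    """Per-band gain that inverts the device response. Bands the device
    over-emphasises get a negative gain (attenuation); bands it weakens get
    a positive gain (boost), capped so the noise floor is not over-amplified."""
    return [min(-r, max_boost_db) for r in device_response_db]


def compensate(band_levels_db: Sequence[float], device_response_db: Sequence[float],
               max_boost_db: float = 12.0) -> List[float]:
    """Apply the gains; in the dB domain a gain is simply added."""
    gains = compensation_gains_db(device_response_db, max_boost_db)
    return [level + g for level, g in zip(band_levels_db, gains)]


# Hypothetical 4-band example: true speech levels and a device that rolls off
# the lowest and highest bands and has a peak in the third band.
speech_db = [60.0, 55.0, 50.0, 45.0]
device_db = [-8.0, 0.0, 4.0, -15.0]
recorded = [s + d for s, d in zip(speech_db, device_db)]  # [52.0, 55.0, 54.0, 30.0]
print(compensate(recorded, device_db))  # [60.0, 55.0, 50.0, 42.0]
```

The top band is only partly restored (42 dB instead of 45 dB) because of the boost cap, a trade-off between flattening the response and amplifying noise in bands the device barely captures.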
  • According to a second aspect, an embodiment of this application provides a voiceprint registration apparatus in a registration device.
  • The registration device may be a terminal, and the voiceprint registration apparatus may be a processor, a chip, or a chip system in the terminal; alternatively, the voiceprint registration apparatus may be a terminal that serves as a component of the registration device.
  • The apparatus includes a processor and a voice input/output device connected to the processor. The voice input/output device receives the voice input by the user, and the processor eliminates the noise related to the registration device from the voice to obtain the user's registered voiceprint information.
  • The registered voiceprint information includes the user's biological voiceprint. It provides the basis on which other devices perform voiceprint recognition and can be shared by multiple recognition devices. In this embodiment, the registration device processes the received voice, eliminates device-related noise, and obtains the user's "clean" biological voiceprint (the voiceprint closest to what a listener hears when the user speaks face-to-face). Because the device's influence on the voice is eliminated, the extracted biological voiceprint can serve as registered voiceprint information shared by multiple devices, enabling cross-device voiceprint recognition and greatly enhancing the user experience.
  • The apparatus further includes a transceiver connected to the processor. The transceiver sends the registered voiceprint information and its corresponding voiceprint identifier to a storage device or a recognition device; the voiceprint identifier indicates the user. In this embodiment, the storage device or recognition device acts as the storage center for registered voiceprint information and voiceprint identifiers, which provides a basis for sharing the registered voiceprint information.
  • The processor is specifically configured to input the voice into a voiceprint extraction model, which eliminates the noise related to the registration device to obtain the registered voiceprint information. In this embodiment, the voiceprint extraction model is pre-trained by machine learning, and the trained model extracts the user's biological voiceprint directly from the speech, which gives good robustness.
  • The voiceprint extraction model is obtained by learning from a corpus collected by multiple devices. In this embodiment, the voiceprint information output by the model is free of the influence of the different devices on the voice, which yields the user's biological voiceprint.
  • The processor may also perform frequency response compensation on the voice through a filter to eliminate the noise related to the registration device and obtain the registered voiceprint information. In this embodiment, the filter attenuates over-emphasized components of the speech signal and boosts weak ones, compensating the frequency response and removing device-related noise. Because the filter removes the noise signal directly, the output signal is the user's biological voiceprint signal; this is simple and fast to implement.
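Both apparatus descriptions state only that the voiceprint extraction model is learned from a corpus collected by multiple devices. One common way to push a model toward device-independent voiceprints, assumed here and not stated in the application, is to train on triplets whose positive example is the same speaker recorded on a different device. The data-preparation step might look like this; the corpus layout and function name are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

# (speaker, device, clip_id): a hypothetical corpus entry
Utterance = Tuple[str, str, str]
Triplet = Tuple[Utterance, Utterance, Utterance]


def cross_device_triplets(corpus: List[Utterance], seed: int = 0) -> List[Triplet]:
    """Build (anchor, positive, negative) triplets in which the positive is the
    same speaker on a different device and the negative is another speaker.
    Training on such triplets rewards embeddings that ignore the device."""
    rng = random.Random(seed)
    triplets: List[Triplet] = []
    for anchor, positive in itertools.permutations(corpus, 2):
        same_speaker = anchor[0] == positive[0]
        other_device = anchor[1] != positive[1]
        if not (same_speaker and other_device):
            continue
        negatives = [u for u in corpus if u[0] != anchor[0]]
        if negatives:
            triplets.append((anchor, positive, rng.choice(negatives)))
    return triplets


corpus = [
    ("alice", "phone", "a1"), ("alice", "speaker", "a2"),
    ("bob", "phone", "b1"), ("bob", "car", "b2"),
]
for t in cross_device_triplets(corpus):
    print(t)
```

Pairs recorded on the same device are skipped on purpose, so the model cannot lower its loss by relying on shared channel characteristics.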
  • According to a third aspect, an embodiment of this application provides a cross-device voiceprint recognition method applied to a recognition device. The method may include the following steps: the recognition device receives the voice input by the user, and then eliminates the noise related to the recognition device from the voice to obtain the user's biological voiceprint.
  • This biological voiceprint is called the target voiceprint information. The recognition device then obtains the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by eliminating the noise related to the registration device, and matches the target voiceprint information against the registered voiceprint information to obtain a matching result that indicates the user's identity. In this embodiment, the registered voiceprint information is the user's own biological voiceprint, extracted after the noise related to the registration device itself has been eliminated. Because the registration device's influence on the biological voiceprint is removed, the registered voiceprint information can be shared with other recognition devices, and the user only needs to register the voiceprint on the registration device.
  • The voiceprint recognition function can then be realized on multiple devices.
  • Recognition accuracy is high regardless of the type of terminal the recognition device is.
  • Obtaining the user's registered voiceprint information may include: receiving the registered voiceprint information and its corresponding voiceprint identifier sent by a storage device or the registration device, where the voiceprint identifier indicates the user.
  • Before receiving the registered voiceprint information and the corresponding voiceprint identifier from the storage device or the registration device, the method further includes: sending the voiceprint identifier to the storage device or the registration device to request the user's registered voiceprint information.
  • Eliminating the noise related to the recognition device to obtain the target voiceprint information may include: inputting the voice into a voiceprint extraction model, which eliminates that noise to obtain the target voiceprint information. In this embodiment, the voiceprint extraction model is pre-trained by machine learning and extracts the user's biological voiceprint directly from the voice, which gives good robustness.
  • The voiceprint extraction model is obtained by learning from a corpus collected by multiple devices. In this embodiment, the voiceprint information output by the model is free of the influence of the different devices on the voice, which yields the user's biological voiceprint.
  • The method further includes: obtaining user data associated with the user identity, which may be historical data of the user's operations, and performing the operation corresponding to the user data. In this embodiment, the recognition device obtains the user's data and performs the operation corresponding to the historical information, so the user does not need to repeat habitual operations each time, which improves the user experience.
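The matching step and the follow-up operation of the third-aspect method can be sketched as below. Assumptions not taken from the application: voiceprints are vectors, each voiceprint ID is scored by its best-matching template using cosine similarity, the 0.8 acceptance threshold is arbitrary, and the `user_data` table and its "last_program" field are hypothetical examples of operation history.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Template = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match(target: Sequence[float], registered: Dict[str, List[Template]],
          threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Score each voiceprint ID by its best-matching template; accept the
    top-scoring ID only if it clears the (assumed) threshold."""
    best_id, best_score = None, -1.0
    for voiceprint_id, templates in registered.items():
        score = max((cosine(target, t) for t in templates), default=-1.0)
        if score > best_score:
            best_id, best_score = voiceprint_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)


# Hypothetical user data (e.g. operation history) keyed by voiceprint ID.
user_data = {"user-001": {"last_program": "evening news"}}


def act_on_identity(voiceprint_id: Optional[str]) -> str:
    """Perform the operation associated with the identified user's data."""
    if voiceprint_id is None:
        return "guest mode"
    program = user_data.get(voiceprint_id, {}).get("last_program")
    return f"resume {program}" if program else "default home screen"


registered = {
    "user-001": [[0.9, 0.1, 0.4], [1.0, 0.3, 0.5]],
    "user-002": [[-0.4, 1.0, 0.2]],
}
identity, score = match([1.0, 0.2, 0.5], registered)
print(identity, act_on_identity(identity))  # user-001 resume evening news
```

Taking the maximum over templates lets one voiceprint ID hold several templates (for example, recorded in different sessions) without any single template dominating.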
  • According to a fourth aspect, an embodiment of this application provides a cross-device voiceprint recognition method applied to a registration device. The method may include: receiving the voice input by a user, and eliminating the noise related to the registration device from the voice to obtain registered voiceprint information.
  • The registered voiceprint information includes the user's biological voiceprint. In this embodiment, the registration device processes the received voice, eliminates device-related noise, and obtains the user's "clean" biological voiceprint (the voiceprint closest to what a listener hears when the user speaks face-to-face). Because the device's influence on the speech is eliminated, the extracted biological voiceprint can serve as registered voiceprint information shared by multiple devices, enabling cross-device voiceprint recognition and greatly enhancing the user experience.
  • The registration device sends the registered voiceprint information and its corresponding voiceprint identifier to a storage device or a recognition device.
  • The storage device or recognition device thus serves as the storage center for registered voiceprint information and voiceprint identifiers, which provides a basis for sharing the registered voiceprint information.
  • Eliminating the noise related to the registration device to obtain the biological voiceprint information may include: inputting the voice into the voiceprint extraction model, which eliminates the noise related to the registration device.
  • The voiceprint extraction model is pre-trained by machine learning and extracts the user's biological voiceprint directly from the voice, which gives good robustness.
  • The voiceprint extraction model is obtained by learning from a corpus collected by multiple devices. In this embodiment, the voiceprint information output by the model is free of the influence of the different devices on the voice, which yields the user's biological voiceprint.
  • Eliminating the noise related to the registration device to obtain the biological voiceprint information may also include: performing frequency response compensation on the voice through a filter, which eliminates the device-related noise in the voice.
  • The filter attenuates over-emphasized components of the voice signal and boosts weak ones, compensating the frequency response and thereby eliminating device-related noise.
  • Because the filter removes the noise signal directly, the output signal is the user's biological voiceprint signal; this is simple and fast to implement.
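The registration method does not say how the "at least one biological voiceprint template" is built (FIG. 12 is only named). A common practice, assumed here, is to length-normalize the noise-removed voiceprints of several enrollment utterances and average them into one template. The sketch also packages the template with its voiceprint identifier as the message the registration device sends to a storage or recognition device; the message layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_template(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average the length-normalized voiceprints of several enrollment
    utterances into one biological voiceprint template."""
    if not embeddings:
        raise ValueError("need at least one enrollment utterance")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def registration_message(voiceprint_id: str,
                         embeddings: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Payload sent by the registration device: identifier plus template(s)."""
    return {"voiceprint_id": voiceprint_id, "templates": [build_template(embeddings)]}


msg = registration_message("user-001", [[3.0, 4.0], [6.0, 8.0], [4.0, 3.0]])
print(msg)
```

Normalizing before averaging keeps a loud enrollment utterance from dominating the template; only the direction of each voiceprint vector contributes.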
  • An embodiment of this application further provides a voiceprint recognition apparatus that has the functions performed by the recognition device in the third aspect. These functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions. The apparatus includes: a voice input/output module, configured to receive the voice input by the user; a processing module, configured to eliminate the noise related to the recognition device from the voice received by the voice input/output module to obtain target voiceprint information, which is the user's biological voiceprint; and a transceiver module, configured to obtain the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by eliminating the noise related to the registration device. The processing module is further configured to match the target voiceprint information it obtained against the registered voiceprint information obtained by the transceiver module 1303 to obtain a matching result indicating the user's identity.
  • An embodiment of this application further provides a voiceprint registration apparatus.
  • The voiceprint registration apparatus has the functions performed by the registration device in the fourth aspect; these functions may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the functions. The apparatus includes: a voice input/output module, configured to receive the voice input by the user; and a processing module, configured to eliminate the noise related to the registration device from the voice received by the voice input/output module to obtain the user's registered voiceprint information.
  • An embodiment of this application provides a cross-device voiceprint recognition system that includes a registration device and a recognition device. The registration device receives a first voice input by the user and eliminates the noise related to the registration device from the first voice to obtain registered voiceprint information.
  • The registered voiceprint information includes the user's biological voiceprint.
  • The registered voiceprint information provides the basis for voiceprint recognition by the recognition device. The recognition device receives a second voice input by the user and eliminates the noise related to the recognition device from the second voice.
  • This yields target voiceprint information, which is the user's biological voiceprint. The recognition device matches the target voiceprint information against the registered voiceprint information to obtain a matching result that indicates the user's identity.
  • The registered voiceprint information is the user's own biological voiceprint, extracted from the voice after the noise related to the registration device itself has been eliminated. Because the registration device's influence on the biological voiceprint is removed, the registered voiceprint information can be shared with other voiceprint recognition devices.
  • The user only needs to register the voiceprint on one device (the registration device) to use the voiceprint recognition function on multiple devices, without registering on each device. This eliminates the cumbersome registration process, enables cross-device voiceprint recognition, and greatly improves the user experience. Both the registered voiceprint information and the target voiceprint information are obtained after eliminating device-related noise.
  • As a result, the device's influence on the voiceprint is removed, and recognition accuracy is high no matter which device the user uses for voiceprint recognition.
  • The system further includes a storage device, which receives and stores the registered voiceprint information and the corresponding voiceprint identifier sent by the registration device; the voiceprint identifier indicates the user. The recognition device receives the registered voiceprint information and its corresponding voiceprint identifier from the storage device.
  • The storage device stores the registered voiceprint information, which provides a basis for sharing it with other devices. When the user needs to use another device (also called a voiceprint recognition device) for voiceprint recognition, no voiceprint registration is needed on that device; it can share the registered voiceprint information already registered on the registration device.
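The system aspect only requires that the storage device keep registered voiceprint information together with its voiceprint identifier and hand it to recognition devices on request. An in-memory dictionary is the simplest stand-in; the class and method names below are hypothetical, and a real storage device would add persistence, access control, and a transport protocol.

```python
from typing import Dict, List, Optional, Sequence


class VoiceprintStore:
    """Hypothetical storage device: registered voiceprint information keyed by
    voiceprint ID, each ID holding one or more biological voiceprint templates."""

    def __init__(self) -> None:
        self._templates: Dict[str, List[List[float]]] = {}

    def store(self, voiceprint_id: str, templates: Sequence[Sequence[float]]) -> None:
        """Handle what the registration device sends."""
        if not templates:
            raise ValueError("registered voiceprint information needs at least one template")
        self._templates.setdefault(voiceprint_id, []).extend(list(t) for t in templates)

    def fetch(self, voiceprint_id: str) -> Optional[List[List[float]]]:
        """Answer a recognition device's request; returns copies, or None if unknown."""
        templates = self._templates.get(voiceprint_id)
        return None if templates is None else [list(t) for t in templates]


store = VoiceprintStore()
store.store("user-001", [[0.6, 0.8]])   # from the registration device
print(store.fetch("user-001"))          # [[0.6, 0.8]]
print(store.fetch("user-999"))          # None
```

Returning copies means a recognition device cannot accidentally alter the shared templates that other devices rely on.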
  • An embodiment of this application provides a chip that includes a processor and a memory.
  • The memory is used to store a program or instructions.
  • When the program or instructions are executed by the processor, the recognition device performs the method of any implementation of the third aspect.
  • Alternatively, when the program or instructions are executed by the processor, the registration device performs the method of any implementation of the fourth aspect.
  • An embodiment of this application provides a computer-readable medium for storing a computer program or instructions.
  • When the computer program or instructions are run on a computer, the computer performs the method of any implementation of the third aspect or of the fourth aspect.
  • FIG. 1 is a schematic diagram of an embodiment of a communication system in an embodiment of this application;
  • FIG. 2 is a schematic flowchart of an embodiment of cross-device voiceprint recognition in an embodiment of this application;
  • FIG. 3 is a schematic flowchart of an embodiment of voiceprint registration on a registration device in an embodiment of this application;
  • FIG. 4 is a schematic diagram of an embodiment of storing registered voiceprint information in an embodiment of this application;
  • FIG. 5 is a schematic diagram of another embodiment of storing registered voiceprint information in an embodiment of this application;
  • FIG. 6 is a schematic diagram of yet another embodiment of storing registered voiceprint information in an embodiment of this application;
  • FIG. 7 is a schematic flowchart of an embodiment of voiceprint recognition in an embodiment of this application;
  • FIG. 8 is a schematic diagram of an application scenario of cross-device voiceprint recognition in an embodiment of this application;
  • FIG. 9 is a schematic diagram of another application scenario of cross-device voiceprint recognition in an embodiment of this application;
  • FIG. 10 is a schematic diagram of training a voiceprint extraction model in an embodiment of this application;
  • FIG. 11A is a schematic diagram of a curve without frequency response compensation in an embodiment of this application;
  • FIG. 11B is a schematic diagram of a curve after frequency response compensation in an embodiment of this application;
  • FIG. 12 is a schematic diagram of generating at least one voiceprint information template in an embodiment of this application;
  • FIG. 13 is a schematic diagram of an embodiment of an apparatus in an embodiment of this application;
  • FIG. 14 is a schematic diagram of another embodiment of an apparatus in an embodiment of this application.
  • An embodiment of this application provides a cross-device voiceprint recognition method.
  • In this method, voiceprint registration is completed on one device, and the registered voiceprint information can then be used on multiple devices on which no registration was performed to complete corresponding tasks. The multiple devices may be of the same type or of different types.
  • The communication system includes a server 101 and multiple terminals 102.
  • The server 101 may be a single server, a server cluster, or a cloud server.
  • The terminal 102 may be a terminal in a smart home, including but not limited to a smart home appliance (such as a smart screen).
  • The terminal 102 may also be an in-vehicle terminal, user equipment (UE), a mobile phone, a tablet computer (pad), a personal computer, a virtual reality (VR) terminal device, or an augmented reality (AR) terminal device. By way of example and not limitation, the terminal in this application may also be a wearable device.
  • A wearable device, also called a wearable smart device, is a general term for everyday wearables, such as glasses, gloves, watches, clothing, and shoes, that are intelligently designed using wearable technology.
  • A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories.
  • A wearable device is not merely a piece of hardware; it achieves powerful functions through software support, data interaction, and cloud interaction.
  • Wearable smart devices include full-featured, large-sized devices that can implement all or some functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on one type of application function and must be used together with another device such as a smartphone.
  • The terminal 102 may also be a terminal in an Internet of Things (IoT) system.
  • The Internet of Things (IoT) is an intelligent network that realizes human-machine interconnection and the interconnection of things.
  • the device-related noise is added during the process from the device receiving the voice to the voice signal processing.
  • the device-related noise includes but not limited to channel noise, coding noise, microphone physical characteristics and quantity, and gain Related noise caused by, distance, environment, pre-processing algorithms, etc. It is understandable that if the user's voice is entered through the registered device, the voiceprint information in the voice will be affected by the device, and thus change, that is, it is no longer a "clean" voiceprint, but a device-related voiceprint is added. Voiceprint after the noise. If the channels of different devices are different, the noise may vary. For example, because the signal will undergo signal attenuation or signal delay during transmission, the sound signal will change during transmission. The sound signal is composed of signals of different frequencies.
  • the received sound signal will be distorted. Will cause channel distortion. Or during signal transmission, it is necessary to convert analog signals into digital signals for transmission, and distortion may also be introduced.
  • the higher the bit rate the more high-quality signals can be transmitted.
  • encoding and decoding a signal cannot be completely lossless, so the encoded and decoded speech signal always suffers some loss. Different devices may therefore affect the sound signal differently.
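As a rough illustration of coding loss, the sketch below uses uniform quantization as a stand-in for a codec (an assumption; real speech codecs are far more elaborate). At a fixed sample rate, more bits per sample means a higher bit rate. The tone, the 16 kHz sample rate, and the bit depths are hypothetical values chosen only to show that fewer bits leave a larger reconstruction error.

```python
import math
from typing import List, Sequence


def quantize(samples: Sequence[float], bits: int) -> List[float]:
    """Uniformly quantize samples in [-1.0, 1.0] to the given bit depth (a crude codec stand-in)."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 steps per polarity at 8 bits
    return [round(s * levels) / levels for s in samples]


def max_abs_error(original: Sequence[float], decoded: Sequence[float]) -> float:
    return max(abs(o - d) for o, d in zip(original, decoded))


# A 440 Hz tone sampled at 16 kHz stands in for a "clean" voice signal.
clean = [0.8 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]

for bits in (4, 8, 16):
    error = max_abs_error(clean, quantize(clean, bits))
    print(f"{bits:2d} bits -> max reconstruction error {error:.6f}")
```

The error is bounded by half a quantization step, so each added bit roughly halves it.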
  • Voiceprint ID: the voiceprint ID is used to distinguish registered voiceprint information. Each piece of registered voiceprint information has a unique voiceprint ID.
  • the voiceprint ID in this application can be a universal account independently owned by each user, such as this
  • the voiceprint identifier can be a user ID (identity), used as a general account for performing device-related operations. If the device is a mobile phone, the general account can use the full range of services, such as downloading software, data synchronization, and mobile phone positioning. If the device is a TV, the account can be used for user-related data recommendations (such as favorite programs). Alternatively, the voiceprint identifier may be a dedicated account or identifier used only for the voiceprint recognition function.
  • One voiceprint identifier may correspond to at least one biological voiceprint template of the same user, and the voiceprint identifier is used to indicate the user.
  • in the embodiments of this application, the voiceprint identifier is described using a user ID as an example.
  • Registered voiceprint information: information related to the user's biological voiceprint, obtained after the noise related to the registration device is eliminated.
  • the registered voiceprint information includes at least one biological voiceprint template (that is, a preset biological voiceprint used for matching or recognition).
  • User's biological voiceprint: the user's own voiceprint, independent of the device that records the voice; it is equivalent to the voiceprint a listener hears when the speaker talks face-to-face.
  • the devices included in the communication system corresponding to FIG. 1 are classified by function into registration devices, storage devices, and voiceprint recognition devices.
  • the registration device is used to register the user's voice.
  • the registration device extracts registered voiceprint information from the voice; the registered voiceprint information includes the user's own biological voiceprint obtained after the noise related to the registration device is separated.
  • the storage device is also called a voiceprint sharing device.
  • the voiceprint recognition device is the device that performs voiceprint recognition on the user's voice. It may be a device on which the user's voiceprint has not been registered, but it can still realize the voiceprint recognition function through the shared registered voiceprint information.
  • the registration device may be any terminal among multiple terminals in the communication system.
  • the terminal may be a mobile phone, a tablet computer, or the like.
  • the storage device may be a server, a cloud server, or any of the multiple terminals; for example, it may be at least one of the terminals, or each of the terminals.
  • the voiceprint recognition device may be a terminal, among the multiple terminals, on which the voiceprint has not been registered.
  • the voiceprint recognition device may be a wearable device, a vehicle-mounted terminal, a smart screen, a headset, a personal computer, etc.
  • the voiceprint recognition device is also referred to as the "first terminal".
  • the registration device is also referred to as the "second terminal".
  • a single terminal can act as a registration device, a storage device, or a voiceprint recognition device, or as several of these at once.
  • a mobile phone can be used to register the user’s voiceprint, and it can also be used to store registered voiceprint information.
  • a terminal can also be both a storage device and a voiceprint recognition device.
  • a personal computer can be used to store registered voiceprint information as well as a voiceprint recognition device.
  • the registration device receives the voice input by the user, separates the noise in the voice related to the registration device itself, extracts the user's biological voiceprint information from the voice, and registers it to obtain the registered voiceprint information. Because the registered voiceprint information has had device-related noise removed, the device's influence on the user's biological voiceprint is eliminated, and the registered voiceprint information can be shared with other devices. The storage device stores the registered voiceprint information, which provides a basis for sharing it with other devices.
  • the voiceprint recognition device receives the voice input by the user, separates the noise in the voice related to the voiceprint recognition device itself, and extracts the user's biological voiceprint information (also called target voiceprint information) from the voice. The voiceprint recognition device obtains the registered voiceprint information from the storage device and matches it against the target voiceprint information to perform voiceprint recognition and identify the user.
  • the registered voiceprint information refers to the extracted biological voiceprint information of the user in the voice after eliminating the noise related to the registered device itself.
  • voiceprint registration is performed in this way, which eliminates the complicated voiceprint registration process, realizes cross-device voiceprint recognition, and greatly improves the user experience. Because both the registered voiceprint information and the target voiceprint information have been separated from device-related noise, the influence of the device on the voiceprint is eliminated, and voiceprint recognition is highly accurate whichever device the user uses.
  • This application mainly includes three stages, 1) the collection stage of registered voiceprint information; 2) the storage stage of registered voiceprint information, and 3) the speaker's voiceprint recognition stage.
  • in the collection stage of registered voiceprint information, device-related noise is eliminated from the voice to obtain the user's biological voiceprint, and the biological voiceprint is recorded under the existing user ID, that is, registered, to obtain the registered voiceprint information. In the registered voiceprint information storage stage, the registered voiceprint information is stored in association with the user ID, which provides a basis for sharing it. In the speaker recognition stage, the registered voiceprint information is matched with the target voiceprint information to obtain a matching result, which indicates the identity of the user.
  • the execution subject of this stage is the registration device, or the execution subject of this stage is the processor, chip or chip system in the registration device.
  • in the following, the registration device is taken as the execution subject of this stage; the registration device can perform the following steps:
  • the voice is mainly used to extract voiceprint information; its text content is not important and is not limited.
  • specifically, the voice may be collected through a voice input and output device; for details, refer to the description of FIG. 14.
  • Step 302 The registration device eliminates noise related to the registration device in the voice to obtain the user's biological voiceprint.
  • for the specific processing scheme, refer to the noise elimination schemes described later; details are not repeated here.
  • Step 303 The registration device registers the user's biological voiceprint to obtain the user's registered voiceprint information.
  • "Registration" refers to the operation of recording the user's biological voiceprint under an existing voiceprint identifier. It can also be understood as establishing the correspondence between the user's biological voiceprint and the voiceprint identifier, or as configuring the user's voiceprint identifier for the user's biological voiceprint.
  • the voiceprint identifier may be a general account for performing device-related operations. Taking a mobile phone as the registration device as an example: after purchasing a mobile phone, a user registers a user ID, and the user ID can be used for services such as downloading software and locating the mobile phone.
  • the voiceprint identification may also be a dedicated account for the voiceprint recognition function, for example, the user's mobile phone number may be used as the dedicated account.
  • voiceprint information that has been "recorded", or a biological voiceprint that has been configured with a voiceprint identifier, is called "registered voiceprint information".
  • the registration device establishes the correspondence between the user's biological voiceprint and the user ID, and completes the registration process of the user's biological voiceprint.
  • the registered voiceprint information includes the biological voiceprint of the user obtained in step 302.
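Steps 301 to 303 can be sketched as a small pipeline. In this minimal sketch, `remove_device_noise` and `extract_voiceprint` are hypothetical placeholders for the noise elimination and voiceprint extraction schemes described later, and the registry is a plain dictionary keyed by user ID; none of these names come from the source. Running the pipeline twice for the same user ID yields two templates, matching the repeated registration described below.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def register_voiceprint(
    registry: Dict[str, List[Vector]],
    user_id: str,
    voice: Sequence[float],
    remove_device_noise: Callable[[Sequence[float]], Sequence[float]],
    extract_voiceprint: Callable[[Sequence[float]], Vector],
) -> Vector:
    """Register one biological voiceprint template for user_id (steps 301-303)."""
    # Step 301 happened upstream: `voice` is the speech received by the registration device.
    cleaned = remove_device_noise(voice)    # step 302: strip registration-device noise
    template = extract_voiceprint(cleaned)  # the user's biological voiceprint
    # Step 303: record the template under the existing voiceprint identifier.
    registry.setdefault(user_id, []).append(template)
    return template


# Hypothetical stand-ins: the "device" adds a constant offset, and the
# "voiceprint" is just (mean, peak-to-peak) of the cleaned samples.
def toy_denoise(voice: Sequence[float]) -> List[float]:
    return [s - 0.1 for s in voice]


def toy_extract(voice: Sequence[float]) -> Vector:
    return [sum(voice) / len(voice), max(voice) - min(voice)]


registry: Dict[str, List[Vector]] = {}
register_voiceprint(registry, "1A", [0.3, 0.5, 0.1], toy_denoise, toy_extract)
register_voiceprint(registry, "1A", [0.2, 0.6, 0.1], toy_denoise, toy_extract)
print(len(registry["1A"]))  # prints 2
```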
  • the registered device can be used as a storage device to store the registered voiceprint information.
  • the registration device may also send the registered voiceprint information and the corresponding user ID to other storage devices, and the storage device stores the registered voiceprint information.
  • the storage device may be a cloud server, a server or other terminals.
  • device-related noise is eliminated from the received voice of the user to obtain the user's "clean" biological voiceprint (the one closest to the voiceprint heard when the user speaks face-to-face with another person). Because the device's influence on the voice is reduced, the extracted biological voiceprint information can be used as registered voiceprint information shared by multiple devices.
  • the execution subject of storing the registered voiceprint information may be a storage device, or may also be a memory in the storage device.
  • the storage device receives the registered voiceprint information and the corresponding user ID sent by the registration device, and the storage device stores the registered voiceprint information and the corresponding user ID.
  • the registration device may also send information such as the registration time of the registered voiceprint information, the identification of the registered device, and the voice intensity received by the registered device to the storage device, and the storage device stores the foregoing information in association with the user ID.
  • the registration time is used to record the time when the voiceprint information is registered, and can be used to prompt the user to update regularly according to the registration time.
  • the identifier of the registration device allows the storage device to identify the registration device: when the storage device receives a user ID, it can tell whether the user ID was sent by the registration device or by an unregistered device (a voiceprint recognition device).
  • the voice intensity information can be used to indicate the distance between the speaker and the registered device when the registered voiceprint information is collected.
  • the user ID can correspond to multiple biological voiceprint templates of the same user, and these templates can correspond to the same or different voice intensity information.
  • one biological voiceprint template can be obtained through a process of performing step 301 to step 303; the registration device can obtain multiple biological voiceprint templates by performing the above process multiple times.
  • when multiple biological voiceprint templates correspond to different voice intensity information, the templates of the same user cover multiple situations (voiceprints recorded at different distances), which can improve the accuracy of voiceprint recognition.
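The stored record described above (user ID, registration time, registration device identifier, and templates with their voice intensity) can be modelled as a small data structure. This is a minimal sketch: the field names, the dB unit for voice intensity, and the `max_age` refresh policy are assumptions for illustration, not details from the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class VoiceprintTemplate:
    vector: List[float]        # one biological voiceprint template
    voice_intensity_db: float  # voice intensity at registration (a proxy for distance)


@dataclass
class RegisteredVoiceprint:
    user_id: str                 # voiceprint identifier, e.g. "1A"
    registration_device_id: str  # lets the storage device recognise the registration device
    registered_at: datetime      # registration time
    templates: List[VoiceprintTemplate] = field(default_factory=list)

    def needs_refresh(self, now: datetime, max_age: timedelta) -> bool:
        """True once the registration is at least max_age old, so the user can be prompted to update."""
        return now - self.registered_at >= max_age

    def intensities(self) -> List[float]:
        """Distinct voice intensities covered by this user's templates, ascending."""
        return sorted({t.voice_intensity_db for t in self.templates})


record = RegisteredVoiceprint(
    "1A", "phone-01", datetime(2020, 1, 1),
    [VoiceprintTemplate([0.1, 0.2], 60.0), VoiceprintTemplate([0.1, 0.3], 45.0)],
)
print(record.intensities())  # [45.0, 60.0]
```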
  • the registration device sends the registered voiceprint information and user ID to the cloud (or server), and the cloud (or server) stores the registered voiceprint information and user ID in association.
  • storing the user's registered voiceprint information in the cloud (or server) and distinguishing it by user ID saves storage space on the terminals and suits a wider range of application scenarios: not only indoor home scenarios but also outdoor ones, such as cellular Internet of Vehicles scenarios. Any terminal that can connect to the network can share the registered voiceprint information.
  • the registration device sends the registered voiceprint information and user ID to the cloud (or server), and any of the multiple terminals (such as terminal 2, also referred to as the third terminal) can download the registered voiceprint information locally according to its own storage resources; the registered voiceprint information can be stored in a third terminal (such as a tablet), and the third terminal can be in a local area network
  • the terminals included in the home scene include mobile phones, smart screens, smart lamps, and tablets.
  • the multiple terminals can communicate with each other through wireless fidelity (WiFi), Bluetooth, infrared, and network cards.
  • the registered voiceprint information can be distributed and stored in each of multiple terminals (such as smart screens, personal computers, vehicle-mounted terminals, and tablet computers).
  • each terminal sends a user ID to the cloud (or server) and receives the registered voiceprint information corresponding to that user ID from the cloud (or server).
  • the voiceprint data can also be shared between various terminals through Bluetooth, WiFi, infrared, network card, etc.
  • the registered voiceprints can be synchronized and shared between terminals periodically or aperiodically. Users can also configure each device not to update or share voiceprints. If the terminals cannot share the registered voiceprint data through the above communication methods, the registered voiceprint information can still be shared when the terminals connect to other devices.
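Pairwise synchronization between terminals can be sketched as follows. The source does not specify a conflict rule; this minimal sketch assumes the most recent registration wins, models the per-device "do not share" setting as a single `share_enabled` flag that blocks the exchange in both directions, and uses hypothetical integer timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (registration timestamp, voiceprint template); timestamps are plain integers here.
Entry = Tuple[int, List[float]]


@dataclass
class Terminal:
    name: str
    share_enabled: bool = True  # the user may turn voiceprint sharing off per device
    store: Dict[str, Entry] = field(default_factory=dict)  # user_id -> entry


def sync(a: Terminal, b: Terminal) -> None:
    """Exchange registered voiceprints between two terminals; the newest registration wins."""
    if not (a.share_enabled and b.share_enabled):
        return  # sharing disabled on either side: leave both stores untouched
    for user_id in set(a.store) | set(b.store):
        candidates = [t.store[user_id] for t in (a, b) if user_id in t.store]
        newest = max(candidates, key=lambda entry: entry[0])
        a.store[user_id] = newest
        b.store[user_id] = newest


phone = Terminal("phone", store={"S": (100, [0.1, 0.7])})
tablet = Terminal("tablet", store={"S": (50, [0.9, 0.2]), "T": (70, [0.5, 0.5])})
sync(phone, tablet)
print(sorted(phone.store))  # ['S', 'T']
```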
  • Each terminal has stored the registered voiceprint information corresponding to the user ID.
  • the voiceprint recognition device is also a storage device.
  • the voiceprint recognition device can obtain the registered voiceprint information from the local.
  • the registered voiceprint information needs to be obtained from the cloud or from other devices; that is, because the registered voiceprint information is shared, voiceprint recognition can be performed quickly.
  • the execution subject of this stage is the voiceprint recognition device (also called the first terminal), or the processor, chip, or chip system in the first terminal.
  • the first terminal receives the voice input by the user, separates the noise in the voice related to the first terminal, and extracts the user's biological voiceprint information (also called target voiceprint information). The first terminal obtains the registered voiceprint information from the storage device and matches it with the target voiceprint information to perform cross-device voiceprint recognition. If the registered voiceprint information matches the target voiceprint information, voiceprint recognition succeeds; if it does not match, voiceprint recognition fails.
  • the execution subject takes the first terminal as an example for description.
  • the first terminal (such as a smart speaker) can perform the following steps. Step 701: the first terminal receives the voice input by the user; specifically, the voice can be collected through a voice input and output device, which is not described in detail here.
  • step 702 The first terminal eliminates noise related to the recognition device in the voice to obtain target voiceprint information, where the target voiceprint information is the biological voiceprint of the user. For details of this process, refer to the subsequent description.
  • Step 703 The first terminal obtains the registered voiceprint information of the user.
  • the registered voiceprint information can be obtained from the registration device or another device when needed, or obtained from the registration device or another device in advance, stored in the memory of the first terminal, and read from the memory when used.
  • for the specific process of obtaining this information from the registration device or other devices, refer to the following description.
  • the registered voiceprint information and the ID corresponding to the registered voiceprint information are stored by a storage device or a registration device.
  • the storage device may be a cloud server, a server, a terminal, or multiple terminals.
  • the voiceprint recognition device sends a user ID to the storage device or the registration device to request the user's registered voiceprint information; the voiceprint recognition device receives the registered voiceprint information and the corresponding user ID sent by the storage device or the registration device.
  • the storage device can share (or synchronize) the registered voiceprint information and the corresponding user ID with other terminals. For example, it can be synchronized regularly, or multiple terminals can be synchronized when connected to the same LAN.
  • the storage device is a tablet computer
  • the communication system includes 3 terminals, a tablet computer, a TV, and a mobile phone.
  • the tablet computer transmits the stored registered voiceprint information and the corresponding ID to the mobile phone and the TV; that is, the voiceprint recognition device receives the registered voiceprint information and the corresponding ID sent by the storage device.
  • the voiceprint recognition device does not need to send a user ID to the storage device to request the registration of voiceprint information.
  • the first terminal sends the user ID and the identifier of the first terminal to the cloud server. Based on the user ID, the cloud server determines where the registered voiceprint information corresponding to that user ID is stored (that is, in which device). When the cloud server determines that it is stored in a first device, the cloud server sends the user ID and the identifier of the first terminal to the first device, which is communicatively connected to the first terminal.
  • the first device sends the registered voiceprint information to the first terminal according to the identifier of the first terminal, and the first terminal receives the registered voiceprint information sent by the first device.
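The cloud-mediated lookup in the two paragraphs above can be sketched as follows. The class and method names are hypothetical; in this sketch the cloud only records which device holds each user's registered voiceprint information, and the holding device then delivers the data to the requesting first terminal. In a real deployment these calls would be network messages rather than method calls.

```python
from typing import Dict, List


class Device:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.voiceprints: Dict[str, List[float]] = {}  # registered voiceprints held here
        self.inbox: Dict[str, List[float]] = {}        # voiceprints received from peers


class CloudServer:
    """Tracks which device stores each user's registered voiceprint, not the voiceprint itself."""

    def __init__(self) -> None:
        self.location: Dict[str, str] = {}  # user_id -> id of the storing device
        self.devices: Dict[str, Device] = {}

    def attach(self, device: Device) -> None:
        self.devices[device.device_id] = device
        for user_id in device.voiceprints:
            self.location[user_id] = device.device_id

    def request(self, user_id: str, requester_id: str) -> bool:
        """Route (user_id, requester_id) to the storing device, which sends the voiceprint."""
        holder_id = self.location.get(user_id)
        if holder_id is None or requester_id not in self.devices:
            return False
        holder = self.devices[holder_id]
        # The storing "first device" delivers straight to the requesting first terminal.
        self.devices[requester_id].inbox[user_id] = holder.voiceprints[user_id]
        return True


tablet = Device("tablet")
tablet.voiceprints["S"] = [0.2, 0.4]
speaker = Device("speaker")
cloud = CloudServer()
cloud.attach(tablet)
cloud.attach(speaker)
print(cloud.request("S", "speaker"), speaker.inbox)  # True {'S': [0.2, 0.4]}
```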
  • Step 704: the first terminal matches the target voiceprint information with the registered voiceprint information to obtain a matching result, which indicates the identity of the user. If the target voiceprint information matches the registered voiceprint information, the user is determined to be the same user as the one corresponding to the registered voiceprint information, that is, the user identity is true (the user is the preset user). If they do not match, the user is determined to be a different user from the one corresponding to the registered voiceprint information, that is, the user identity is false.
  • if the user identity is the preset user, the first terminal executes the voice control instruction; if the user identity is false (not the preset user), it is determined that the user and the user corresponding to the registered voiceprint information are different users, and the first terminal does not need to execute the voice control instruction.
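Step 704 and the command gating can be sketched as below. The source does not say how voiceprints are compared; this minimal sketch assumes voiceprints are fixed-length vectors scored with cosine similarity against a hypothetical threshold of 0.8, and accepts the claimed user ID if any of its templates matches.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def verify_and_execute(
    target: Sequence[float],
    registered_templates: Sequence[Sequence[float]],
    command: str,
    threshold: float = 0.8,  # hypothetical decision threshold
) -> Optional[str]:
    """Step 704 plus command gating.

    Returns the command when the target voiceprint matches any registered template
    of the claimed user ID (identity true), or None when it does not (identity false).
    """
    if not registered_templates:
        return None
    best = max(cosine_similarity(target, t) for t in registered_templates)
    return command if best >= threshold else None


templates = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]]
print(verify_and_execute([0.95, 0.05, 0.25], templates, "play music"))  # play music
print(verify_and_execute([0.0, 1.0, 0.0], templates, "play music"))     # None
```

Scoring against every template of the user, rather than one, lets templates recorded at different distances all contribute.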
  • the terminals that the user S usually uses include mobile phones, tablet computers, computers, smart screens, and vehicle-mounted terminals.
  • User S can register his voiceprint through a registration device (such as a mobile phone).
  • the mobile phone receives the voice input by user S, such as "Hello, Xiaoyi".
  • the text of the input voice does not matter. The mobile phone extracts the biological voiceprint information in the voice, that is, the voiceprint information remaining after the noise related to the mobile phone is separated, associates it with the user ID of user S, and sends the user ID and the associated biological voiceprint information to the cloud, which serves as the storage center for registered voiceprint information.
  • when user S wants to control the vehicle-mounted terminal by voice, user S logs in with the user ID on the vehicle-mounted terminal (this can be done in advance rather than every time). The vehicle-mounted terminal sends the user ID to the cloud through the cellular network, and the cloud sends the registered voiceprint information corresponding to the user ID to the vehicle-mounted terminal. When the vehicle-mounted terminal receives a voice input by user S, such as "play music", it separates the noise in the voice related to the vehicle-mounted terminal and extracts the target voiceprint information from the voice.
  • the vehicle-mounted terminal will match the registered voiceprint information received from the cloud with the target voiceprint information.
  • if the two match, the vehicle-mounted terminal confirms that user S is the preset user and executes the voice command "play music".
  • the application scenario corresponding to FIG. 8 above is a "speaker verification" scenario: the target voiceprint information is extracted from the voice received by the vehicle-mounted terminal, and the terminal determines whether the target voiceprint information and the registered voiceprint information corresponding to the user ID belong to the same person. After the identity of user S is confirmed, the vehicle-mounted terminal can execute the voice command of user S; if the target voiceprint information and the registered voiceprint information do not match, the vehicle-mounted terminal does not need to execute the voice command.
  • the application scenarios corresponding to FIG. 8 are only examples.
  • the application scenarios of this application include but are not limited to account login (such as bank account login), identity verification (such as security door voice recognition, identity recognition in financial securities transactions) and other application scenarios.
  • each registered voiceprint information corresponds to a user ID.
  • each user ID corresponds to user data.
  • the user data can be different information for different application scenarios.
  • for example, the user data can be the user's favorite programs, the preferred air-conditioner temperature, and so on.
  • taking the air-conditioner temperature as an example of user data, the correspondence between user IDs and user data can be as shown in Table 1 below:
  • the target voiceprint information is matched with the multiple pieces of registered voiceprint information corresponding to the voiceprint identifiers to determine the target registered voiceprint information that matches the target voiceprint information; the user data associated with the user ID corresponding to the target registered voiceprint information is obtained; and the operation corresponding to that user data is then performed.
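The identification steps above can be sketched as a best-match search followed by a user-data lookup. As with verification, cosine similarity and the 0.8 threshold are assumptions, not the source's method. The template vectors for "1A", "2D", and "3C" are hypothetical; only the 25°C preference for "1A" comes from the example later in this description.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_speaker(
    target: Sequence[float],
    registered: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical minimum similarity
) -> Optional[str]:
    """Return the user ID whose registered voiceprint best matches target, or None."""
    best_id: Optional[str] = None
    best_score = threshold
    for user_id, template in registered.items():
        score = cosine_similarity(target, template)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def action_for(user_id: Optional[str], user_data: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Turn the stored user data into an operation (here: an air-conditioner setpoint)."""
    if user_id is None or user_id not in user_data:
        return None
    return f"set air conditioner to {user_data[user_id]['ac_temperature_c']:g}°C"


registered = {"1A": [1.0, 0.0, 0.0], "2D": [0.0, 1.0, 0.0], "3C": [0.0, 0.0, 1.0]}
user_data = {"1A": {"ac_temperature_c": 25}}
speaker = identify_speaker([0.9, 0.1, 0.05], registered)
print(speaker, action_for(speaker, user_data))  # 1A set air conditioner to 25°C
```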
  • a family includes 3 family members, such as user f (such as father), user g (such as mother) and user c (such as child).
  • the tablet can register a preset number of pieces of voiceprint information; the preset number can be set by the user or by the system settings of the tablet.
  • here, a terminal that can register the voiceprint information of 3 users is taken as an example.
  • User f logs in to the user ID (or pre-login), and the tablet computer receives the voice input for registration by user f.
  • the voice can have any text content (such as "Hello, Xiaoyi"); the text content is not limited. The tablet computer separates the noise in the voice related to the tablet computer and extracts the biological voiceprint information of user f from the voice.
  • the biological voiceprint information is associated with the user ID (such as "1A"), and the registration is performed to obtain the first registered voiceprint information of the user f.
  • the tablet computer receives the voice for registration input by user g, and obtains the second registered voiceprint information of user g.
  • the second registered voiceprint information corresponds to a user ID (such as "2D"). The tablet computer then receives the voice for registration input by user c and obtains the third registered voiceprint information of user c, which corresponds to a user ID (such as "3C"). The tablet computer sends each piece of registered voiceprint information and its corresponding user ID to the cloud, where they are stored.
  • the smart speaker receives the voice "Xiaoyi, turn on the air conditioner" input by user f, separates the noise in the voice related to the smart speaker, and extracts the user's biological voiceprint information (target voiceprint information) from the voice. The smart speaker sends the user ID to the cloud, and the cloud sends the three user IDs and the corresponding 3 pieces of registered voiceprint information to the smart speaker; alternatively, the smart speaker has already stored the 3 pieces of registered voiceprint information and the corresponding user IDs. The smart speaker then matches the target voiceprint information against the 3 pieces of registered voiceprint information.
  • the smart speaker determines the user ID (such as "1A") corresponding to the first registered voiceprint information, and the smart speaker can determine the relevant user data corresponding to the voiceprint identifier.
  • the relevant user data is "temperature 25°C".
  • the smart speaker recognizes the user ID (such as "1A") corresponding to the first registered voiceprint information, indicating that the voice is the voice input by user f, and the historical user data recorded by the smart speaker can be: user ID (1A) corresponds to "temperature 25°C", which means that user f often adjusts the temperature of the air conditioner to 25°C, and the smart speaker sends control commands to the air conditioner according to the relevant information to adjust the temperature of the air conditioner to 25°C.
  • smart devices can also provide personalized services for different users and expand the applications of smart devices.
  • the above application scenario is a home application scenario.
  • This application can also be applied to a work scenario.
  • the registered voiceprint information of multiple members of a working group corresponds to multiple IDs.
  • the voiceprint recognition technology in this application can be used to identify which user of the working group the speaker is (that is, speaker identification). Different users have different permissions; if the speaker is identified as user d corresponding to a user ID, the voiceprint recognition device directly executes the operations permitted by the user data of user d.
  • voiceprint recognition can thus perform "speaker identification", which determines the user data corresponding to the speaker.
  • the user data includes, but is not limited to, historical information associated with the "speaker", or user data corresponding to the "speaker's" permissions.
  • the device can be intelligently customized or intelligently recommended based on user data.
  • the smart speaker can perform the operation corresponding to the historical information (such as the temperature 25°C) associated with the "speaker" (the user corresponding to user ID "1A"), so the user does not need to repeat habitual operations every time, which improves the user experience.
  • the first terminal can extract the target voiceprint information, which is unrelated to the first terminal, from the voice in either of two ways: 1. machine learning; 2. signal processing.
  • the first terminal uses the voice as the input of a voiceprint extraction model and outputs the target voiceprint information through the model; the voiceprint extraction model is learned from corpora collected by multiple devices.
  • the voiceprint extraction model includes, but is not limited to, algorithms such as the Gaussian mixture model (GMM), the Gaussian mixture model-universal background model (GMM-UBM), i-vector, x-vector, dnn-ivector, deep neural networks (DNN), speech analysis, speech factorization, clustering, and transformation.
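To make the GMM-UBM entry concrete, the sketch below shows the common recipe of mean-only MAP adaptation of a universal background model towards a speaker, followed by average log-likelihood-ratio scoring. This is a generic textbook technique, not the source's model: it uses scalar features instead of real acoustic feature vectors, and the UBM parameters, enrollment frames, and relevance factor of 16 (a commonly used value) are illustrative assumptions. On its own it does not remove device noise; the source attributes that to training on multi-device corpora.

```python
import math
from typing import List, Sequence, Tuple

# One component = (weight, mean, variance); scalar features keep the sketch short.
Gmm = List[Tuple[float, float, float]]


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def frame_likelihood(x: float, gmm: Gmm) -> float:
    # Floor at a tiny value so far-away frames do not produce log(0).
    return max(sum(w * gaussian_pdf(x, m, v) for w, m, v in gmm), 1e-300)


def avg_log_likelihood(frames: Sequence[float], gmm: Gmm) -> float:
    return sum(math.log(frame_likelihood(x, gmm)) for x in frames) / len(frames)


def map_adapt_means(ubm: Gmm, frames: Sequence[float], relevance: float = 16.0) -> Gmm:
    """Mean-only MAP adaptation of the UBM towards one speaker's enrollment frames."""
    # Posterior probability of each UBM component for each frame.
    posteriors = []
    for x in frames:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in ubm]
        total = max(sum(joint), 1e-300)
        posteriors.append([p / total for p in joint])
    adapted: Gmm = []
    for k, (w, m, v) in enumerate(ubm):
        n_k = sum(post[k] for post in posteriors)  # soft frame count
        if n_k == 0.0:
            adapted.append((w, m, v))  # no evidence: keep the UBM mean
            continue
        e_k = sum(post[k] * x for post, x in zip(posteriors, frames)) / n_k
        alpha = n_k / (n_k + relevance)  # data-dependent adaptation coefficient
        adapted.append((w, alpha * e_k + (1.0 - alpha) * m, v))
    return adapted


def llr_score(frames: Sequence[float], speaker: Gmm, ubm: Gmm) -> float:
    """Average log-likelihood ratio; a positive value favours the claimed speaker."""
    return avg_log_likelihood(frames, speaker) - avg_log_likelihood(frames, ubm)


ubm = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
speaker_model = map_adapt_means(ubm, [1.4, 1.6, 1.5, 1.7, 1.3, 1.5])
print(llr_score([1.45, 1.55, 1.5], speaker_model, ubm))   # positive: genuine speaker
print(llr_score([-1.5, -1.4, -1.6], speaker_model, ubm))  # negative: impostor
```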
  • Machine learning includes: the training phase of the voiceprint extraction model and the application phase of the voiceprint extraction model.
  • in the training phase, the large corpus includes speech collected by different types of devices.
  • the different types of devices include, but are not limited to, smart home appliances (such as smart screens, TVs, smart washing machines, and smart air conditioners), lighting systems, smart speakers, and robots; the terminal can also be a vehicle-mounted terminal, user equipment, a mobile phone, a tablet computer, a personal computer, a virtual reality terminal device, an augmented reality terminal device, a wearable device, or another device with voice recording capability.
  • the reference data is voiceprint information that has nothing to do with the device (or has a weak correlation).
  • the voiceprint data output by the GMM-UBM model is compared with the reference data.
  • the reference data is the user's biological voiceprint data. If the difference between the output voiceprint data and the reference data is greater than or equal to a threshold, the output voiceprint data is fed back into the GMM-UBM model, the model outputs new voiceprint data, and the new output is again compared with the reference data.
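The compare-and-re-input loop described above can be sketched as follows. The model is a hypothetical callable standing in for the GMM-UBM model, `max_abs_difference` is an assumed difference measure (the source does not name one), and `max_iters` is a safety cap added here rather than part of the described procedure. With the toy model, which moves its input halfway towards the reference on each pass, the loop stops after 8 passes.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def max_abs_difference(u: Sequence[float], v: Sequence[float]) -> float:
    return max(abs(a - b) for a, b in zip(u, v))


def refine_until_close(
    model: Callable[[Vector], Vector],
    voiceprint: Vector,
    reference: Vector,
    threshold: float,
    max_iters: int = 50,  # safety cap; not part of the described procedure
) -> Tuple[Vector, int]:
    """Re-input the model output until it differs from the reference by less than threshold."""
    output = model(voiceprint)
    iters = 1
    while max_abs_difference(output, reference) >= threshold and iters < max_iters:
        output = model(output)  # feed the output voiceprint data back into the model
        iters += 1
    return output, iters


reference = [1.0, 2.0]  # stands in for the user's biological voiceprint


def toy_model(v: Vector) -> Vector:
    """Hypothetical model: moves each value halfway towards the reference."""
    return [0.5 * (a + r) for a, r in zip(v, reference)]


out, n = refine_until_close(toy_model, [0.0, 0.0], reference, threshold=0.01)
print(n)  # 8
```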
  • the voiceprint extraction model is used to separate device-related noise and extract the user's biological voiceprint.
  • the voiceprint extraction model is pre-trained through machine learning, and the biological voiceprint of the user in the speech is directly extracted through the trained voiceprint extraction model, which has good robustness.
  • a filter may be used to process the voice signal to perform frequency response compensation.
  • the frequency response compensation is used to eliminate noise in the voice related to the recognition device to obtain the user's biological voiceprint information.
  • a filter is one of the most basic signal-processing devices; it extracts the desired signal from a mixture of signals.
  • the main function of the filter is to eliminate various noises that affect signal processing.
  • the filter applies different gains at different frequencies, attenuating overly prominent components and boosting weaker ones, thereby eliminating device noise.
  • the filter is shown in the following expression: $y(n)=\sum_{k=0}^{N} a_k\,x(n-k)$ (1)
  • formula (1) above describes a finite impulse response (FIR) filter, where n is the time index, N is the unit impulse response length of the digital filter, the coefficients a are convolved with the input x to produce the filtered output y, and k runs from 0 to N.
  • the filter is shown in the following expression:
  • formula (2) above describes an infinite impulse response (IIR) filter, where n is the time index and N and P are the unit impulse response lengths of the digital filter; k runs from 0 to N, convolving the coefficients a with x, and j runs from 0 to P, convolving the coefficients b with y; the sum of these convolutions produces the filtered output y.
  • in formula (1), the frequency responses of different devices are made flat or consistent by adjusting the coefficients a; in formula (2), they are made flat or consistent by adjusting the coefficients a and b. Either way, device-related noise is filtered out of the speech.
  • Figure 11A includes three curves: the upper-limit curve 1101, the lower-limit curve 1102, and the curve 1103 without frequency response compensation.
  • the frequency response curve 1103 has a peak between 600 Hz and 1.2 kHz that exceeds the upper-limit curve 1101, and it trends upward between 300 Hz and 500 Hz.
  • Figure 11B includes three curves: the upper-limit curve 1101, the lower-limit curve 1102, and the frequency-response-compensated curve 1104.
  • after compensation, the frequency responses of different devices tend to be flat: by adjusting the coefficients in formula (1), or the coefficients a and b in formula (2), the frequency response in the 300 Hz to 2.5 kHz range is brought to a consistent level (such as -2 dBr), so that device-related noise is filtered out of the voice and the user's biological voiceprint is obtained.
  • the filter processes the voice signal, attenuating prominent components and enhancing weaker ones; this compensates the frequency response and removes device-related noise.
  • filtering removes the noise signal directly, and the output is the user's biological voiceprint signal, which makes this approach simple and fast to implement.
  • a person's biological voiceprint is not static; it changes with factors such as the time of day, health condition (for example, healthy or ill), and age.
  • therefore, multiple voiceprints can be registered for the same user; that is, the same user corresponds to multiple biological voiceprint templates.
  • one user ID corresponds to one set of registered voiceprint information.
  • the registered voiceprint information includes multiple biological voiceprint templates.
  • the storage device can generate a voiceprint model (i.e., a voiceprint template) based on multiple biological voiceprints of the same user.
  • the x-vector system is a DNN-based speaker recognition system: a trained DNN maps a speaker's speech to a fixed-dimensional embedding vector, called an x-vector.
  • the x-vector network receives the voiceprint information of the same user.
  • the voiceprint information is the user's voiceprint for removing device-related noise.
  • the x-vector network can capture the user's voiceprint information from shorter speech and is more robust on short utterances.
  • each input corresponds to one x-vector, which is the user's voiceprint information (it is device-independent, because device-related information was removed beforehand), also called the user's voiceprint model. If multiple pieces of device-independent voiceprint information are input, whether through multiple devices or through one device, there will be multiple x-vectors.
  • LDA linear discriminant analysis
  • the multiple vectors can cover more of the user's pronunciation conditions (such as voiceprint information from different times of day, or in healthy or unhealthy states); that is, one user corresponds to multiple biological voiceprint models (or templates), which together form a voiceprint information template library that further improves voiceprint recognition.
  • the number of templates for the same user can be 10 to 30, and the templates can be updated periodically or aperiodically, replacing old templates with new ones to improve the robustness of the biological voiceprint templates.
  • the registration device can register one biological voiceprint template at a time or register multiple templates over multiple sessions; alternatively, each of multiple registration devices can register a biological voiceprint template.
  • the registration device can send one or more biological voiceprint templates to the storage device or the recognition device in a single message, or send multiple biological voiceprint templates in multiple separate messages.
  • the sending channels include, but are not limited to, wireless and wired methods, which can be specifically implemented through a transceiver. For details, refer to the corresponding description in FIG. 14.
  • the recognition device may obtain the one or more biological voiceprint templates from the registration device or the storage device.
  • the one or more biological voiceprint templates are used for voiceprint recognition matching and constitute registered voiceprint information usable by the recognition device, even though that registered voiceprint information was not all generated and registered by the recognition device itself but comes from other devices.
  • the embodiments of the present application also provide corresponding devices, including corresponding modules for executing the foregoing embodiments.
  • the module can be software, hardware, or a combination of software and hardware.
  • an embodiment of the present application also provides a device 1300, which may be a terminal or a component of a terminal (for example, an integrated circuit, a chip, etc.).
  • the voiceprint recognition device includes a voice input and output module 1301 (or a voice input and output unit), a processing module 1302 (or a processing unit), and a transceiver module 1303 (or a transceiver unit).
  • the device 1300 can perform the functions of the recognition device in the foregoing method embodiment: the voice input and output module 1301 is configured to receive voice input by the user; the processing module 1302 is configured to eliminate noise related to the recognition device in the voice received by the voice input and output module 1301 to obtain target voiceprint information, which is the user's biological voiceprint; the transceiver module 1303 is configured to obtain the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by eliminating noise related to the registration device; and the processing module 1302 is further configured to match the target voiceprint information it obtained against the registered voiceprint information obtained by the transceiver module 1303 to obtain a matching result, which indicates the identity of the user.
  • the voice input and output module 1301 is configured to execute step 701 in the embodiment corresponding to FIG. 7 above.
  • the processing module 1302 is configured to execute step 702 and step 704 in the embodiment corresponding to FIG. 7.
  • the transceiver module 1303 is configured to perform step 703 in the embodiment corresponding to FIG. 7.
  • the device 1300 can perform the functions of the registration device in the foregoing method embodiment: the voice input and output module 1301 is configured to receive voice input by the user, and the processing module 1302 is configured to eliminate noise related to the registration device in the voice received by the voice input and output module 1301 to obtain the user's registered voiceprint information.
  • the transceiver module 1303 is configured to send the registered voiceprint information and the corresponding voiceprint identifier obtained by the processing module 1302 to other storage devices.
  • the voice input and output module 1301 is configured to execute step 301 in the embodiment corresponding to FIG. 3 above.
  • the processing module 1302 is configured to execute step 302 and step 303 in the embodiment corresponding to FIG. 3, and details are not described here.
  • the device may be a chip or an integrated circuit.
  • the transceiver module 1303 may be a communication interface
  • the processing module 1302 may be a logic circuit
  • the voice input and output module 1301 may be an audio circuit.
  • the communication interface may be an input/output interface or a transceiver circuit.
  • the input and output interface may include an input interface and an output interface.
  • the transceiver circuit may include an input interface circuit and an output interface circuit.
  • the processing module 1302 may be a processing device, and the functions of the processing device may be partially or fully implemented by software.
  • the processing device may include a memory and a processor, where the memory is used to store a computer program, and the processor reads and executes the computer program stored in the memory to perform corresponding processing and/or steps in any method embodiment.
  • the processing device may only include a processor.
  • the memory for storing the computer program is located outside the processing device, and the processor is connected to the memory through a circuit/wire to read and execute the computer program stored in the memory.
  • each functional component in FIG. 13 can be implemented by software, hardware or a combination of the two, which is not specifically limited.
  • FIG. 14 is a schematic structural diagram of an apparatus 1400 provided by an embodiment of the application.
  • the device may be a terminal, or may also be an integrated circuit, chip, or chip system in the terminal.
  • a terminal is used here as an example of the device.
  • the terminal may include, but is not limited to, a terminal in a smart home, a lighting system, a smart speaker, a robot, etc.; the terminal may also be a vehicle-mounted terminal, user equipment, mobile phone, tablet computer, personal computer, etc.
  • the apparatus 1400 includes a processor 1401, a transceiver 1402, a memory 1403, and a voice input and output device 1404.
  • the processor 1401, the transceiver 1402, the memory 1403, and the voice input and output device 1404 can communicate with each other through internal connection paths, and transfer control signals and/or data signals.
  • the memory 1403 is used to store a computer program
  • the processor 1401 is used to call and run the computer program from the memory 1403 to control the transceiver 1402 to send and receive signals.
  • the processor 1401, the control center of the device, connects all parts of the device (for example, a mobile phone) through various interfaces and lines, and performs the device's functions and processes data by running or executing software programs and/or modules stored in the memory 1403 and calling data stored in the memory 1403.
  • the voice input and output device 1404 is used for the audio interface between the user and the mobile phone.
  • the voice input unit may be an audio circuit, or may be a voice recognizer.
  • the audio circuit may include a speaker 14041 and a microphone 14042.
  • the microphone 14042 converts collected sound signals into electrical signals, which the audio circuit receives and converts into audio data; the audio data is then output to the processor 1401, which eliminates device-related noise to obtain the user's biological voiceprint.
  • the device may also include an antenna.
  • the transceiver 1402 transmits or receives wireless signals through an antenna.
  • the transceiver 1402 can be used to send or receive registered voiceprint information and corresponding voiceprint identifiers to other devices.
  • the processor 1401 and the memory 1403 may be combined into one processing device, and the processor 1401 is configured to execute the program code stored in the memory 1403 to implement the foregoing functions.
  • the memory 1403 may also be integrated in the processor 1401.
  • alternatively, the memory 1403 is independent of the processor 1401, that is, located outside the processor 1401.
  • the transceiver 1402 includes, but is not limited to, a radio frequency (RF) circuit, a communication interface, a WiFi module, a Bluetooth module, and so on.
  • the device may further include a display unit 1405 which can be used to display information input by the user or information provided to the user and various images.
  • the display unit can be configured with a display panel in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • the device 1400 can be used to perform the functions performed by the recognition device in the method embodiment: the voice input and output device 1404 is configured to receive voice input by the user; the processor 1401 is configured to eliminate noise related to the recognition device in the voice received by the voice input and output device 1404 to obtain target voiceprint information, which is the user's biological voiceprint; the transceiver 1402 is configured to obtain the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by eliminating noise related to the registration device; and the processor 1401 is further configured to match the target voiceprint information against the registered voiceprint information to obtain a matching result, which indicates the identity of the user.
  • the transceiver 1402 is configured to receive, from the storage device or the registration device, the registered voiceprint information and its voiceprint identifier, which indicates the user.
  • the transceiver 1402 is also used to send the user's voiceprint identification to the storage device or the registration device to request the user's registered voiceprint information.
  • the processor 1401 is specifically configured to input the voice into the voiceprint extraction model, and eliminate noise related to the recognition device in the voice through the voiceprint extraction model to obtain target voiceprint information.
  • the voiceprint extraction model is obtained by learning corpus collected by multiple devices.
  • the processor 1401 is further configured to perform frequency response compensation on the voice through a filter, and the frequency response compensation is used to eliminate noise related to the recognition device in the voice to obtain target voiceprint information.
  • the processor 1401 is further configured to obtain user data associated with the user identity; and perform operations corresponding to the user data.
  • the device 1400 can be used to perform the functions performed by the registration device in the foregoing method embodiment: the voice input and output device 1404 is configured to receive voice input by the user, and the processor 1401 is configured to eliminate noise related to the registration device in the voice to obtain the user's registered voiceprint information, which specifically includes the user's biological voiceprint.
  • the transceiver 1402 is configured to send the registered voiceprint information and the voiceprint identifier corresponding to the registered voiceprint information to the storage device or the recognition device, and the voiceprint identifier is used to indicate the user.
  • the processor 1401 is specifically configured to input the voice into the voiceprint extraction model, and eliminate noise related to the registration device in the voice through the voiceprint extraction model to obtain registered voiceprint information.
  • the voiceprint extraction model is obtained by learning corpus collected by multiple devices.
  • the processor 1401 is further configured to perform frequency response compensation on the voice through a filter, and the frequency response compensation is used to eliminate noise related to the registration device in the voice to obtain registered voiceprint information.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the above-mentioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processing unit used to execute these techniques in a communication device can be implemented in one or more general-purpose processors, DSPs, digital signal processing devices, ASICs, programmable logic devices, FPGAs or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, or any combination of the foregoing.
  • the general-purpose processor may be a microprocessor.
  • the general-purpose processor may also be any traditional processor, controller, microcontroller, or state machine.
  • the processor can also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.
  • the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • many forms of RAM can be used, such as static random access memory (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM.
  • the present application also provides a computer-readable medium on which a computer program is stored, and when the computer program is executed by a computer, the function of any of the foregoing method embodiments is realized.
  • This application also provides a computer program product, which, when executed by a computer, realizes the functions of any of the foregoing method embodiments.
  • "at least one of" as used herein means all or any combination of the listed items. For example, "at least one of A, B and C" can mean any of seven cases: A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together, where each of A, B and C can be singular or plural.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B based on A does not mean that B is determined only based on A, and B can also be determined based on A and/or other information.
  • the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone, where A and B can each be singular or plural.
  • the character "/" generally indicates an "or" relationship between the associated objects before and after it.
  • the systems, devices, and methods described in this application can also be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • if the functions described in this embodiment are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this application essentially, or the part of it that contributes to the prior art, or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application.
  • the aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
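The GMM-UBM check described above (compare the model output with device-independent reference data and feed the output back into the model while the difference is at or above a threshold) can be sketched as a simple control loop. This is a minimal illustration, not the application's implementation: the `model` callable is a hypothetical stand-in for the GMM-UBM stage, and the Euclidean distance and the `max_iters` safety cap are assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed difference measure between two voiceprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def refine_until_close(
    model: Callable[[Vector], Vector],
    features: Vector,
    reference: Vector,
    threshold: float,
    max_iters: int = 100,
) -> Vector:
    """Re-input the model output until it is within `threshold` of `reference`.

    `model` is a hypothetical stand-in for the GMM-UBM stage; `max_iters`
    is an assumed cap so the loop always terminates.
    """
    output = model(features)
    iterations = 1
    while euclidean(output, reference) >= threshold and iterations < max_iters:
        output = model(output)  # feed the output back into the model
        iterations += 1
    return output
```

With a toy model that halves the distance to a reference of `[1.0, 1.0]` on each pass, starting from `[5.0, 5.0]` with a threshold of 0.1, the loop stops at `[1.0625, 1.0625]`, the first output closer than 0.1 to the reference.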
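Formulas (1) and (2) above describe direct-form FIR and IIR difference equations. The sketch below implements them in plain Python under stated assumptions. For the FIR case it follows the text (taps k = 0..N, so `a` holds N+1 coefficients). For the IIR case it assumes the common convention in which the feedback sum starts at j = 1, b[0] is normalized to 1, and the feedback terms are subtracted: y(n) = sum_k a[k] x(n-k) - sum_{j>=1} b[j] y(n-j). That convention is an assumption, because the text lets j start at 0. Samples before n = 0 are treated as zero. Note that SciPy's `scipy.signal.lfilter(b, a, x)` uses the opposite naming: its `b` holds the feed-forward coefficients and its `a` the feedback coefficients.

```python
from typing import List, Sequence


def fir_filter(a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Formula (1): y(n) = sum_{k=0}^{N} a[k] * x(n - k), with N = len(a) - 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for k, a_k in enumerate(a):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += a_k * x[n - k]
        y.append(acc)
    return y


def iir_filter(a: Sequence[float], b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Formula (2) under an assumed convention:

    y(n) = sum_{k=0}^{N} a[k] * x(n - k) - sum_{j=1}^{P} b[j] * y(n - j),
    where b[0] is taken to be 1 and is not used.
    """
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(a_k * x[n - k] for k, a_k in enumerate(a) if n - k >= 0)
        acc -= sum(b[j] * y[n - j] for j in range(1, len(b)) if n - j >= 0)
        y.append(acc)
    return y
```

For example, `fir_filter([0.5, 0.5], [1.0, 2.0, 3.0])` is a two-tap moving average and returns `[0.5, 1.5, 2.5]`, while `iir_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0])` returns the decaying impulse response `[1.0, 0.5, 0.25]`. With `b = [1.0]` the IIR form reduces to the FIR form.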
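The frequency response compensation illustrated by Figures 11A and 11B can be sketched as choosing a per-frequency correction gain that pulls each device's measured response toward a common flat target inside the compensated band. In this minimal sketch, the -2 dBr target and the 300 Hz to 2.5 kHz band come from the text above. The dictionary representation, the choice to leave out-of-band frequencies untouched, and the sample responses are illustrative assumptions. In practice the gains would be realized by choosing the filter coefficients of formula (1) or (2).

```python
from typing import Dict, Tuple

Response = Dict[float, float]  # frequency in Hz -> level in dBr (hypothetical format)


def compensation_gains(
    measured: Response,
    target_db: float = -2.0,
    band: Tuple[float, float] = (300.0, 2500.0),
) -> Response:
    """Gain in dB per frequency that brings the in-band response to target_db.

    Frequencies outside `band` get 0 dB, i.e. they are left unchanged
    (an assumption; the text only describes the in-band behavior).
    """
    low, high = band
    return {
        f: (target_db - level) if low <= f <= high else 0.0
        for f, level in measured.items()
    }


def apply_gains(measured: Response, gains: Response) -> Response:
    """Levels and gains are both in dB, so compensation is an addition."""
    return {f: level + gains[f] for f, level in measured.items()}
```

A device with a +3.5 dBr peak at 800 Hz receives -5.5 dB of attenuation there, while a dip of -4.0 dBr at 400 Hz receives +2.0 dB of boost. After compensation, two devices with different raw responses both read -2.0 dBr at every in-band frequency.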
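As described above, one user ID maps to registered voiceprint information that holds several biological voiceprint templates (10 to 30 per user), and old templates are replaced by new ones. The template library can therefore be sketched as a bounded store per user. The class name, the `max_templates=30` default, and oldest-first (FIFO) replacement are assumptions for illustration; the text does not specify a replacement policy.

```python
from collections import deque
from typing import Deque, Dict, List

Vector = List[float]


class TemplateLibrary:
    """Hypothetical per-user store of biological voiceprint templates."""

    def __init__(self, max_templates: int = 30) -> None:
        if max_templates < 1:
            raise ValueError("max_templates must be at least 1")
        self.max_templates = max_templates
        self._templates: Dict[str, Deque[Vector]] = {}

    def add(self, user_id: str, template: Vector) -> None:
        # A deque with maxlen drops its oldest entry once full, which
        # models "replacing the old template with a new template".
        if user_id not in self._templates:
            self._templates[user_id] = deque(maxlen=self.max_templates)
        self._templates[user_id].append(list(template))

    def add_many(self, user_id: str, templates: List[Vector]) -> None:
        # One registration message may carry several templates.
        for template in templates:
            self.add(user_id, template)

    def templates(self, user_id: str) -> List[Vector]:
        return list(self._templates.get(user_id, ()))
```

After 35 templates are added for one user with the default cap, the library keeps only the 30 most recent ones.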

Abstract

A voiceprint recognition apparatus, a voiceprint registration apparatus, and a cross-device voiceprint recognition method. The method is applied to a system comprising a voiceprint recognition apparatus and a voiceprint registration apparatus. The method comprises: the voiceprint registration apparatus eliminating apparatus-related noise in a first voice to obtain registered voiceprint information, which provides a basis for a recognition device to perform voiceprint recognition; the voiceprint recognition apparatus eliminating apparatus-related noise in a second voice to obtain target voiceprint information; and, during voiceprint recognition, the voiceprint recognition apparatus matching the target voiceprint information against the registered voiceprint information to obtain a matching result indicating the identity of a user. Because the registered voiceprint information is a voiceprint obtained after the apparatus's influence on the biological voiceprint has been eliminated, it can be shared with other devices. A user can therefore use the voiceprint recognition function on the voiceprint recognition apparatus without repeating voiceprint registration, which avoids a cumbersome registration process and enables cross-device voiceprint recognition.

Description

Voiceprint recognition apparatus, voiceprint registration apparatus, and cross-device voiceprint recognition method
Technical field
This application relates to the technical field of voiceprint recognition, and in particular to a voiceprint recognition apparatus, a voiceprint registration apparatus, and a cross-device voiceprint recognition method.
Background
Voiceprint recognition is a biometric technology that uses a computer to identify a speaker by converting the voice signal into a digital signal. Currently, a device that uses voiceprint recognition must register the user's voiceprint in advance. As smart-device technology develops and the number of voiceprint-capable devices grows, registering voiceprints has become a cumbersome process.
In the prior art, a voiceprint mapping model can be established between different devices: a voiceprint is extracted from voice recorded by a first device and registered; voiceprint features are then extracted from voice commands recorded by a second device and mapped, using the established mapping model, to the voiceprint registered through the first device, so that no voiceprint needs to be registered on the second device.
In this solution, the saved voiceprint mapping models are device-specific and a mapping is required between every pair of devices, which yields too many mapping relationships: for example, with Q devices in an environment, Q×(Q-1) pairwise voiceprint mapping models must be built. Moreover, if the channel of any individual device changes, the corresponding group of voiceprint mapping models all become wrong, so recognition accuracy is hard to guarantee.
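The growth in the number of mapping models can be made concrete with a small calculation. This sketch assumes, as the Q×(Q-1) figure implies, one directed mapping model per ordered pair of devices; the helper names are hypothetical. Under that assumption, a channel change on one device invalidates every model that maps to or from it, which is 2×(Q-1) models.

```python
def pairwise_mapping_models(q: int) -> int:
    """Directed device-to-device mapping models needed for q devices: q*(q-1)."""
    if q < 0:
        raise ValueError("device count must be non-negative")
    return q * (q - 1)


def models_invalidated_by_one_channel_change(q: int) -> int:
    """Models mapping to or from one changed device: (q-1) outgoing + (q-1) incoming."""
    if q < 1:
        raise ValueError("need at least one device")
    return 2 * (q - 1)


if __name__ == "__main__":
    for q in (2, 5, 10):
        print(q, pairwise_mapping_models(q), models_invalidated_by_one_channel_change(q))
```

For example, 10 devices need 90 mapping models, and a channel change on one of them invalidates 18 of those models.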
Summary
Embodiments of this application provide a voiceprint recognition apparatus in a recognition device, a voiceprint registration apparatus in a registration device, and a cross-device voiceprint recognition method. The method is applied to a cross-device voiceprint recognition system that includes a recognition device and a registration device. The registration device registers the user's voice: it eliminates noise related to the registration device in the recorded voice to obtain the user's biological voiceprint, and records that biological voiceprint under the user's voiceprint identifier to obtain registered voiceprint information. In other words, the user's biological voiceprint is now "on record", and this recorded biological voiceprint is called the user's registered voiceprint information. The registered voiceprint information provides the basis for voiceprint recognition. The recognition device is the device that performs voiceprint recognition on the user's voice: after receiving voice input by the user, it eliminates noise related to the recognition device in the voice to obtain the user's biological voiceprint (called target voiceprint information), and implements voiceprint recognition using the registered voiceprint information shared by the registration device; that is, it matches the target voiceprint information against the registered voiceprint information, and the matching result indicates the user's identity.
In a first aspect, an embodiment of this application provides a voiceprint recognition apparatus in a recognition device. The recognition device may be a terminal, and the voiceprint recognition apparatus may be a processor, chip, or chip system in the terminal; alternatively, the voiceprint recognition apparatus is a terminal that forms part of the recognition device. The apparatus includes a processor, and a voice input and output device and a transceiver connected to the processor. The voice input and output device is configured to receive voice input by a user. The processor is configured to eliminate noise related to the recognition device in the received voice to obtain target voiceprint information; that is, the target voiceprint information is the biological voiceprint after the recognition device's influence on the voice has been eliminated, and the user's biological voiceprint is equivalent to the speaker's voiceprint as a listener would hear it face to face. The user's registered voiceprint information is then obtained through the transceiver; the registered voiceprint information includes at least one biological voiceprint template obtained by eliminating noise related to the registration device. The processor is further configured to match the target voiceprint information against the registered voiceprint information to obtain a matching result, which indicates the user's identity. In the embodiments of this application, the registered voiceprint information is the user's own biological voiceprint information, extracted from the voice after noise related to the registration device itself has been eliminated. Because the registration device's influence on the biological voiceprint has been eliminated, the registered voiceprint information can be shared with other recognition devices: the user only needs to register a voiceprint on one device (the registration device) to use voiceprint recognition on multiple devices. The recognition device does not need to perform voiceprint registration, which removes the cumbersome registration process, enables cross-device voiceprint recognition, and can greatly improve the user experience. Moreover, because both the registered voiceprint information and the target voiceprint information are biological voiceprints from which device-related noise has been separated, the device's own influence on the voiceprint is eliminated, and recognition accuracy is high regardless of the type of terminal used as the recognition device.
In a possible implementation, the apparatus further includes a transceiver connected to the processor. The transceiver is configured to receive the registered voiceprint information and its voiceprint identifier from a storage device or the registration device; the voiceprint identifier indicates the user. In this embodiment, the registered voiceprint information can be kept on a storage device, which makes it possible to share it with other devices. When the user wants to use a recognition device for voiceprint recognition, there is no need to register a voiceprint on that recognition device again. Instead, the registered voiceprint information held by the storage device or the registration device can be shared.
In a possible implementation, the transceiver is further configured to send the user's voiceprint identifier to the storage device or the registration device to request the user's registered voiceprint information. Because the user's voiceprint identifier corresponds to the registered voiceprint information, the storage device or registration device responds by sending the registered voiceprint information associated with that identifier to the voiceprint recognition apparatus.
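The request-by-identifier exchange between the recognition device and the storage or registration device can be sketched as a keyed lookup. The class and method names below are hypothetical. The in-memory dictionary stands in for whatever storage and transport a real device would use, and templates are assumed to be lists of floats.

```python
from typing import Dict, List, Optional

Template = List[float]


class VoiceprintStore:
    """Hypothetical in-memory stand-in for the storage device (or registration
    device) that keeps registered voiceprint information keyed by voiceprint ID."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Template]] = {}

    def save(self, voiceprint_id: str, templates: List[Template]) -> None:
        # Registration side: store device-independent templates under the user's ID.
        self._records[voiceprint_id] = [list(t) for t in templates]

    def handle_request(self, voiceprint_id: str) -> Optional[List[Template]]:
        # Recognition side sends a voiceprint ID; reply with copies of the
        # matching templates, or None if that user never registered.
        templates = self._records.get(voiceprint_id)
        if templates is None:
            return None
        return [list(t) for t in templates]


store = VoiceprintStore()
store.save("user-001", [[0.12, -0.40, 0.88]])
print(store.handle_request("user-001"))  # [[0.12, -0.4, 0.88]]
print(store.handle_request("user-999"))  # None
```

The store returns copies so that a recognition device cannot accidentally modify the shared registered templates.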
In a possible implementation, the processor is specifically configured to input the voice into a voiceprint extraction model. The model removes recognition-device-related noise from the voice to obtain the target voiceprint information. In this embodiment, the voiceprint extraction model is trained in advance by machine learning, and the trained model directly extracts the user's biological voiceprint from the voice, which gives good robustness.
In a possible implementation, the voiceprint extraction model is obtained by learning from corpora collected by multiple devices. The devices may be terminals of different types, including but not limited to:

- smart-home terminals, lighting systems, smart speakers, and robots;
- in-vehicle terminals, user equipment, mobile phones, tablets, and personal computers;
- virtual reality terminal devices and augmented reality terminal devices.

Because the model is learned from corpora collected by many different devices, the voiceprint information it outputs has the influence of each individual device on the voice removed, leaving the user's biological voiceprint.
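One plausible way to use a multi-device corpus is to build positive training pairs from the same speaker recorded on different devices. A metric-learning model is then encouraged to map each such pair to the same voiceprint. This is offered as an assumption rather than as the training procedure of this application. The record layout and field names below are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Utterance(NamedTuple):
    speaker_id: str
    device_type: str  # e.g. "phone", "smart_speaker", "car"
    path: str         # hypothetical path to the audio file


def cross_device_pairs(corpus: List[Utterance]) -> List[Tuple[Utterance, Utterance]]:
    """Positive training pairs: same speaker recorded on different devices.

    Training a model to map each such pair to the same voiceprint encourages
    it to ignore device characteristics and keep speaker characteristics."""
    return [
        (a, b)
        for a, b in combinations(corpus, 2)
        if a.speaker_id == b.speaker_id and a.device_type != b.device_type
    ]


corpus = [
    Utterance("spk1", "phone", "spk1_phone.wav"),
    Utterance("spk1", "smart_speaker", "spk1_speaker.wav"),
    Utterance("spk1", "phone", "spk1_phone_2.wav"),
    Utterance("spk2", "car", "spk2_car.wav"),
]
print(len(cross_device_pairs(corpus)))  # 2
```

The sketch deliberately excludes same-device pairs. Those pairs share both the speaker and the device, so they give no signal for separating the two.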
In a possible implementation, the processor is further configured to perform frequency response compensation on the voice through a filter. The frequency response compensation removes recognition-device-related noise from the voice to obtain the target voiceprint information. In this embodiment, the filter attenuates frequency components that the device over-emphasizes and boosts those it weakens. This compensates the frequency response and so removes device-related noise. Filtering removes the noise component directly, so the output signal is the user's biological voiceprint signal. The approach is simple and fast.
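The attenuate-the-peaks, boost-the-dips behavior of frequency response compensation can be sketched in the dB domain. The sketch rests on three choices, none of which come from this application:

- the device's magnitude response is assumed to have been measured per frequency band already (the values below are made up);
- the mean response serves as the reference level;
- gains are clamped to ±12 dB.

```python
from typing import List, Sequence


def compensation_gains_db(
    device_response_db: Sequence[float],
    max_boost_db: float = 12.0,
    max_cut_db: float = 12.0,
) -> List[float]:
    """Per-band gains (dB) that flatten a measured device magnitude response.

    Bands where the device is louder than its average response are cut, and
    bands where it is quieter are boosted; gains are clamped so that a deep
    notch does not cause excessive amplification."""
    if not device_response_db:
        raise ValueError("device response must contain at least one band")
    reference = sum(device_response_db) / len(device_response_db)
    gains = []
    for response in device_response_db:
        gain = reference - response
        gains.append(min(max(gain, -max_cut_db), max_boost_db))
    return gains


def apply_gains_db(band_levels_db: Sequence[float], gains_db: Sequence[float]) -> List[float]:
    """Gains in dB add to band levels in dB."""
    return [level + gain for level, gain in zip(band_levels_db, gains_db)]


device = [0.0, 6.0, -6.0, 0.0]        # assumed response: peak in the 2nd band, dip in the 3rd
captured = [60.0, 66.0, 54.0, 60.0]   # a flat 60 dB voice as recorded by this device
gains = compensation_gains_db(device)
print(gains)                            # [0.0, -6.0, 6.0, 0.0]
print(apply_gains_db(captured, gains))  # [60.0, 60.0, 60.0, 60.0]
```

The clamp is a design choice. Fully inverting a deep notch would amplify whatever little energy survives in that band, including noise.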
In a possible implementation, the processor is further configured to obtain user data associated with the user identity and to perform an operation corresponding to that user data. In this embodiment, the processor performs operations corresponding to the user's historical information, so the user does not have to repeat habitual operations every time. This improves the user experience.
In a second aspect, an embodiment of this application provides a voiceprint registration apparatus in a registration device. The registration device may be a terminal, and the voiceprint registration apparatus may be a processor, a chip, or a chip system in the terminal. Alternatively, the voiceprint registration apparatus is a terminal that forms part of the registration device. The apparatus includes a processor and a voice input/output device connected to the processor. The voice input/output device is configured to receive voice input by a user. The processor is configured to remove registration-device-related noise from the voice to obtain the user's registered voiceprint information, which includes the user's biological voiceprint. The registered voiceprint information serves as the basis for voiceprint recognition on other devices and may be shared by multiple recognition devices. In this embodiment, the registration device processes the received voice and removes device-related noise from it. This yields the user's "clean" biological voiceprint, the voiceprint closest to what a listener hears when the person speaks face to face. Because the device's influence on the voice has been removed, the extracted biological voiceprint information can be used as registered voiceprint information shared by multiple devices. This enables cross-device voiceprint recognition and can greatly improve the user experience.
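How several enrollment utterances might become the registered voiceprint information can be sketched as follows. The sketch assumes that the device-related noise has already been removed and that each utterance has been reduced to an embedding vector. Averaging L2-normalized embeddings into a centroid template is a common convention in speaker verification and is used here as an assumption. The optional per-utterance templates illustrate the "at least one template" wording. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def build_templates(
    embeddings: Sequence[Sequence[float]], keep_individual: bool = False
) -> List[List[float]]:
    """Turn several enrollment embeddings into registered voiceprint templates.

    The first template is the normalized centroid of all embeddings; optionally
    each normalized embedding is kept as an extra template."""
    if not embeddings:
        raise ValueError("need at least one enrollment embedding")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    centroid = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    templates = [l2_normalize(centroid)]
    if keep_individual:
        templates.extend(normalized)
    return templates


templates = build_templates([[3.0, 4.0], [6.0, 8.0], [4.0, 3.0]], keep_individual=True)
print(len(templates))  # 4
```

Normalizing before averaging keeps a loud or long utterance from dominating the centroid.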
In a possible implementation, the apparatus further includes a transceiver connected to the processor. The transceiver is configured to send the registered voiceprint information and its corresponding voiceprint identifier to a storage device or a recognition device; the voiceprint identifier indicates the user. In this embodiment, the storage device or recognition device acts as the storage center for registered voiceprint information and voiceprint identifiers. This provides the basis for sharing the registered voiceprint information.
In a possible implementation, the processor is specifically configured to input the voice into a voiceprint extraction model. The model removes registration-device-related noise from the voice to obtain the registered voiceprint information. In this embodiment, the voiceprint extraction model is trained in advance by machine learning, and the trained model directly extracts the user's biological voiceprint from the voice, which gives good robustness.
In a possible implementation, the voiceprint extraction model is obtained by learning from corpora collected by multiple devices. In this embodiment, the voiceprint information output by the model has the influence of the individual devices on the voice removed, leaving the user's biological voiceprint.
In a possible implementation, the processor is further configured to perform frequency response compensation on the voice through a filter. The frequency response compensation removes registration-device-related noise from the voice to obtain the registered voiceprint information. In this embodiment, the filter attenuates frequency components that the device over-emphasizes and boosts those it weakens. This compensates the frequency response and so removes device-related noise. Filtering removes the noise component directly, so the output signal is the user's biological voiceprint signal. The approach is simple and fast.
In a third aspect, an embodiment of this application provides a cross-device voiceprint recognition method applied to a recognition device. The method may include the following steps:

- The recognition device receives voice input by a user.
- The recognition device removes recognition-device-related noise from the voice to obtain the user's biological voiceprint, referred to as target voiceprint information.
- The recognition device obtains the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by removing registration-device-related noise.
- The recognition device matches the target voiceprint information against the registered voiceprint information to obtain a matching result, which indicates the identity of the user.

In this embodiment, the registered voiceprint information is the user's own biological voiceprint, extracted from the voice after the noise related to the registration device itself has been removed. Because the influence of the registration device on the biological voiceprint has been removed, the registered voiceprint information can be shared with other recognition devices. The user only needs to register a voiceprint on the registration device to use voiceprint recognition on multiple devices, without registering on every recognition device. This skips a tedious registration process, enables cross-device voiceprint recognition, and can greatly improve the user experience. Moreover, both the registered voiceprint information and the target voiceprint information are biological voiceprints from which device-related noise has been separated. The influence of the device itself on the voiceprint is thus removed, so recognition accuracy stays high regardless of what type of terminal the recognition device is.
In an optional implementation, obtaining the user's registered voiceprint information may include receiving the registered voiceprint information and its corresponding voiceprint identifier from a storage device or the registration device; the voiceprint identifier indicates the user.
In an optional implementation, the method further includes a step performed before the registered voiceprint information and its corresponding voiceprint identifier are received from the storage device or registration device. In that step, the recognition device sends the voiceprint identifier to the storage device or registration device to request the user's registered voiceprint information.
In an optional implementation, removing recognition-device-related noise from the voice to obtain the target voiceprint information may include inputting the voice into a voiceprint extraction model. The model removes the recognition-device-related noise to obtain the target voiceprint information. In this embodiment, the model is trained in advance by machine learning and directly extracts the user's biological voiceprint from the voice, which gives good robustness.
In an optional implementation, the voiceprint extraction model is obtained by learning from corpora collected by multiple devices. In this embodiment, the voiceprint information output by the model has the influence of the individual devices on the voice removed, leaving the user's biological voiceprint.
In an optional implementation, the method further includes steps performed after the target voiceprint information is matched against the registered voiceprint information. The recognition device obtains user data associated with the user identity, such as historical data of the user's operations, and performs an operation corresponding to the user data. In this embodiment, the recognition device obtains the user's data and performs the operation corresponding to the historical information. The user therefore does not have to repeat habitual operations every time, which improves the user experience.
In a fourth aspect, an embodiment of this application provides a cross-device voiceprint recognition method applied to a registration device. The method may include:

- receiving voice input by a user;
- removing registration-device-related noise from the voice to obtain registered voiceprint information, which includes the user's biological voiceprint.

In this embodiment, the registration device processes the received voice and removes device-related noise from it. This yields the user's "clean" biological voiceprint, the voiceprint closest to what a listener hears when the person speaks face to face. Because the device's influence on the voice is reduced, the extracted biological voiceprint information can be used as registered voiceprint information shared by multiple devices. This enables cross-device voiceprint recognition and can greatly improve the user experience.
In an optional implementation, the registration device sends the registered voiceprint information and its corresponding voiceprint identifier to a storage device or a recognition device. That device then serves as the storage center for registered voiceprint information and voiceprint identifiers, providing the basis for sharing the registered voiceprint information.
In an optional implementation, removing registration-device-related noise from the voice to obtain the biological voiceprint information may include inputting the voice into a voiceprint extraction model. The model removes the registration-device-related noise to obtain the registered voiceprint information. In this embodiment, the model is trained in advance by machine learning and directly extracts the user's biological voiceprint from the voice, which gives good robustness.
In an optional implementation, the voiceprint extraction model is obtained by learning from corpora collected by multiple devices. In this embodiment, the voiceprint information output by the model has the influence of the individual devices on the voice removed, leaving the user's biological voiceprint.
In an optional implementation, removing registration-device-related noise from the voice to obtain the biological voiceprint information may include performing frequency response compensation on the voice through a filter. The compensation removes the registration-device-related noise to obtain the registered voiceprint information. In this embodiment, the filter attenuates frequency components that the device over-emphasizes and boosts those it weakens. This compensates the frequency response and so removes device-related noise. Filtering removes the noise component directly, so the output signal is the user's biological voiceprint signal. The approach is simple and fast.
In a fifth aspect, an embodiment of this application provides a voiceprint recognition apparatus that implements the functions performed by the recognition device in the third aspect. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to these functions. The apparatus includes:

- a voice input/output module, configured to receive voice input by a user;
- a processing module, configured to remove recognition-device-related noise from the voice received by the voice input/output module to obtain target voiceprint information, which is the user's biological voiceprint;
- a transceiver module, configured to obtain the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by removing registration-device-related noise.

The processing module is further configured to match the target voiceprint information it obtained against the registered voiceprint information obtained by the transceiver module 1303. The resulting matching result indicates the identity of the user.
In a sixth aspect, an embodiment of this application provides a voiceprint registration apparatus that implements the functions performed by the registration device in the fourth aspect. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to these functions. The apparatus includes:

- a voice input/output module, configured to receive voice input by a user;
- a processing module, configured to remove registration-device-related noise from the voice received by the voice input/output module to obtain the user's registered voiceprint information.
In a seventh aspect, an embodiment of this application provides a cross-device voiceprint recognition system that includes a registration device and a recognition device. The two devices work as follows:

- The registration device receives a first voice input by a user and removes registration-device-related noise from the first voice to obtain registered voiceprint information. The registered voiceprint information includes the user's biological voiceprint and serves as the basis for voiceprint recognition on the recognition device.
- The recognition device receives a second voice input by the user and removes recognition-device-related noise from the second voice to obtain target voiceprint information, which is the user's biological voiceprint.
- The recognition device matches the target voiceprint information against the registered voiceprint information to obtain a matching result, which indicates the identity of the user.

In this embodiment, the registered voiceprint information is the user's own biological voiceprint, extracted from the voice after the noise related to the registration device itself has been removed. Because the influence of the registration device on the biological voiceprint has been removed, the registered voiceprint information can be shared with other voiceprint recognition devices. The user only needs to register a voiceprint on one device (the registration device) to use voiceprint recognition on multiple devices, without registering on every device. This skips a tedious registration process, enables cross-device voiceprint recognition, and can greatly improve the user experience. Moreover, device-related noise has been removed from both the registered voiceprint information and the target voiceprint information. The influence of the device on the voiceprint is thus removed, so accuracy stays high whichever device the user uses for voiceprint recognition.
In an optional implementation, the system further includes a storage device. The storage device receives the registered voiceprint information and its corresponding voiceprint identifier from the registration device and stores them; the voiceprint identifier indicates the user. The recognition device receives the registered voiceprint information and its corresponding voiceprint identifier from the storage device. In this embodiment, the storage device stores the registered voiceprint information, which makes it possible to share it with other devices. The user may want to use another device (also called a voiceprint recognition device) for voiceprint recognition. In that case there is no need to register a voiceprint on that device again; the voiceprint information already registered on the registration device can be shared instead.
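The register-once, recognize-anywhere flow of the system can be illustrated with a toy simulation. It makes strong simplifying assumptions:

- the voice is a fixed vector of per-band levels in dB;
- each device's effect is a known additive dB offset per band, and removal is exact subtraction;
- matching uses a mean absolute distance with a 1 dB threshold.

All names and numbers are hypothetical. The point is only to show why removing each device's own influence lets a template registered on one device match on another.

```python
from typing import Dict, List, Optional

# Hypothetical per-band magnitude responses (dB) of two devices, assumed to be
# known to each device, for example from factory calibration.
DEVICE_RESPONSE_DB: Dict[str, List[float]] = {
    "phone": [0.0, 4.0, -3.0, 2.0],
    "speaker": [-5.0, 0.0, 6.0, -2.0],
}


def capture(voice_db: List[float], device: str) -> List[float]:
    """Toy linear channel: the device adds its dB response to every band."""
    return [v + r for v, r in zip(voice_db, DEVICE_RESPONSE_DB[device])]


def remove_device_effect(captured_db: List[float], device: str) -> List[float]:
    """Undo the device's own response to recover the biological voiceprint."""
    return [c - r for c, r in zip(captured_db, DEVICE_RESPONSE_DB[device])]


def mean_abs_distance(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class StorageDevice:
    def __init__(self) -> None:
        self._templates: Dict[str, List[float]] = {}

    def save(self, voiceprint_id: str, template: List[float]) -> None:
        self._templates[voiceprint_id] = template

    def fetch(self, voiceprint_id: str) -> Optional[List[float]]:
        return self._templates.get(voiceprint_id)


def register(voice_db: List[float], device: str, voiceprint_id: str,
             storage: StorageDevice) -> None:
    storage.save(voiceprint_id, remove_device_effect(capture(voice_db, device), device))


def recognize(voice_db: List[float], device: str, voiceprint_id: str,
              storage: StorageDevice, max_distance: float = 1.0,
              compensate: bool = True) -> bool:
    captured = capture(voice_db, device)
    target = remove_device_effect(captured, device) if compensate else captured
    template = storage.fetch(voiceprint_id)
    return template is not None and mean_abs_distance(target, template) <= max_distance


alice = [60.0, 55.0, 50.0, 45.0]
bob = [50.0, 60.0, 55.0, 40.0]
storage = StorageDevice()
register(alice, "phone", "alice", storage)                              # register once, on the phone
print(recognize(alice, "speaker", "alice", storage))                    # True
print(recognize(alice, "speaker", "alice", storage, compensate=False))  # False
print(recognize(bob, "speaker", "alice", storage))                      # False
```

Real speech varies between utterances and real device effects are only approximately known. A practical system would therefore compare learned embeddings rather than raw band levels, but the structure of the flow stays the same.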
In an eighth aspect, this embodiment provides a chip including a processor and a memory. The memory stores a program or instructions that, when executed by the processor, cause the recognition device to perform the method of any implementation of the third aspect. Alternatively, they cause the registration device to perform the method of any implementation of the fourth aspect.
In a ninth aspect, this embodiment provides a computer-readable medium storing a computer program or instructions. When executed, the program or instructions cause a computer to perform the method of any implementation of the third aspect or of the fourth aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an embodiment of a communication system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of an embodiment of cross-device voiceprint recognition according to an embodiment of this application;
FIG. 3 is a schematic flowchart of the steps of an embodiment of voiceprint registration on a registration device according to an embodiment of this application;
FIG. 4 is a schematic diagram of a scenario of an embodiment of storing registered voiceprint information according to an embodiment of this application;
FIG. 5 is a schematic diagram of a scenario of another embodiment of storing registered voiceprint information according to an embodiment of this application;
FIG. 6 is a schematic diagram of a scenario of another embodiment of storing registered voiceprint information according to an embodiment of this application;
FIG. 7 is a schematic flowchart of the steps of an embodiment of voiceprint recognition according to an embodiment of this application;
FIG. 8 is a schematic diagram of an application scenario of cross-device voiceprint recognition according to an embodiment of this application;
FIG. 9 is a schematic diagram of another application scenario of cross-device voiceprint recognition according to an embodiment of this application;
FIG. 10 is a schematic diagram of training a voiceprint extraction model according to an embodiment of this application;
FIG. 11A is a schematic diagram of a curve without frequency response compensation according to an embodiment of this application;
FIG. 11B is a schematic diagram of a curve after frequency response compensation according to an embodiment of this application;
FIG. 12 is a schematic diagram of generating at least one voiceprint information template according to an embodiment of this application;
FIG. 13 is a schematic diagram of an embodiment of an apparatus according to an embodiment of this application;
FIG. 14 is a schematic diagram of another embodiment of an apparatus according to an embodiment of this application.
Detailed Description of Embodiments
The following describes the technical solutions in the embodiments of this application with reference to the accompanying drawings.

The terms "first", "second", "third", "fourth", and the like (if any) in the specification, claims, and drawings of this application are used to distinguish between similar objects. They do not necessarily describe a specific order or sequence. It should be understood that data termed in this way are interchangeable in appropriate circumstances, so that the embodiments described here can be implemented in orders other than those illustrated or described.

In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed. It may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of this application provides a cross-device voiceprint recognition method. In cross-device voiceprint recognition, voiceprint registration is completed on one device, and the registered voiceprint information can then be used on multiple other, unregistered devices to complete corresponding tasks. The multiple devices may include devices of the same type or of different types.
The method applies to a communication system. For an example architecture of the communication system, refer to FIG. 1: the communication system includes a server 101 and a plurality of terminals 102. The server 101 may be a single server, a server cluster, or a cloud server. A terminal 102 may be any of the following:

- a terminal in a smart home, including but not limited to smart home appliances (such as smart screens, televisions, smart washing machines, and smart air conditioners), lighting systems, smart speakers, and robots;
- an in-vehicle terminal, user equipment (UE), a mobile phone, a tablet computer (pad), or a personal computer;
- a virtual reality (VR) terminal device or an augmented reality (AR) terminal device.

By way of example rather than limitation, in this application the terminal may also be a wearable device. A wearable device, also called a wearable smart device, is a general term for everyday wearables designed intelligently with wearable technology, such as glasses, gloves, watches, clothing, and shoes. It is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a piece of hardware; it also achieves powerful functions through software support, data exchange, and cloud interaction. In a broad sense, wearable smart devices fall into two groups. The first consists of full-featured, larger devices that can implement all or some functions without relying on a smartphone, such as smart watches or smart glasses. The second consists of devices that focus on one type of application function and must be used together with another device such as a smartphone, for example the various smart bands and smart jewelry used for vital-sign monitoring.
In this application, the terminal 102 may be a terminal in an Internet of Things (IoT) system. IoT is an important part of the future development of information technology; its main technical feature is connecting things to a network through communication technologies, forming an intelligent network of human-machine and thing-to-thing interconnection. To better describe the embodiments of this application, the terms used in this application are first explained.
Device-related noise: noise related to the device that is introduced between the device receiving the voice and the device processing the voice signal. Device-related noise includes, but is not limited to, channel noise, coding noise, and noise caused by the physical characteristics and number of microphones, gain, distance, environment, pre-processing algorithms, and the like. It can be understood that when the user's voice is recorded through the registration device, the voiceprint information in the voice is affected by the device and changes accordingly; that is, it is no longer a "clean" voiceprint but a voiceprint with device-related noise added. For example, different devices have different channels, so the noise may differ. Because a signal is attenuated or delayed during transmission, the sound signal changes as it is transmitted. A sound signal is composed of components at different frequencies; if these components are attenuated or delayed by different amounts during transmission, the received sound signal is distorted, which causes channel distortion. Distortion may also be introduced when the analog signal is converted into a digital signal for transmission. In addition, because the bandwidth of the transmission channel is limited, a higher encoding bit rate allows a higher-quality signal to be transmitted; however, encoding and decoding cannot be completely lossless, so the decoded speech signal always suffers some loss. Different devices may affect the sound signal differently.
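The frequency-dependent attenuation described above can be illustrated with a small numerical sketch. It assumes the device channel is a linear time-invariant filter with a hypothetical impulse response (the coefficients for `device_a` and `device_b` are invented) and uses circular convolution so that the identity Y(f) = H(f) X(f) holds exactly; coding loss, gain, and microphone effects are not modeled. The same input acquires a different spectral shape on each device, which is why a voiceprint taken directly from one device's recording does not transfer cleanly to another.

```python
import numpy as np

def channel_gain(h: np.ndarray, n: int) -> np.ndarray:
    """Magnitude response |H(f)| of impulse response h on an n-point FFT grid."""
    return np.abs(np.fft.fft(h, n))  # fft zero-pads h to length n

def apply_channel(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Pass x through a linear channel h using circular convolution.

    Circular convolution makes Y(f) = H(f) * X(f) hold exactly on the FFT grid;
    real devices are not circular, so this is an idealization.
    """
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n)))

rng = np.random.default_rng(0)
speech = rng.standard_normal(512)        # stand-in for one frame of the same utterance
device_a = np.array([1.0, 0.5])          # hypothetical impulse response of device A
device_b = np.array([0.8, -0.3, 0.1])    # hypothetical impulse response of device B

# Per-frequency gain each device applies to the identical input.
ratio_a = np.abs(np.fft.fft(apply_channel(speech, device_a))) / np.abs(np.fft.fft(speech))
ratio_b = np.abs(np.fft.fft(apply_channel(speech, device_b))) / np.abs(np.fft.fft(speech))
```

`ratio_a` and `ratio_b` recover each device's magnitude response and differ from each other, even though the speaker and utterance are the same.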
Voiceprint identifier: the voiceprint identifier is used to distinguish registered voiceprint information; each piece of registered voiceprint information has a unique voiceprint identifier. In this application, the voiceprint identifier may be a general account owned independently by each user; for example, it may be a user ID (identity). The user ID may be a general account used to perform device-related operations: if the device is a mobile phone, the general account gives access to the complete set of services, such as downloading software, data synchronization, and phone location; if the device is a television, it can be used for user-related data recommendations (such as favorite programs). Alternatively, the voiceprint identifier may be an account or identifier dedicated to the voiceprint recognition function. One voiceprint identifier may correspond to at least one biological voiceprint template of the same user, and the voiceprint identifier is used to indicate the user. In the embodiments of this application, a user ID is used as an example of the voiceprint identifier.
Registered voiceprint information: information about the user's biological voiceprint after the noise related to the registration device has been eliminated. The registered voiceprint information includes at least one biological voiceprint template (that is, a preset biological voiceprint used for matching or recognition).
User's biological voiceprint: the user's own voiceprint, independent of the device that records the voice; it is equivalent to the speaker's voiceprint as heard by a listener face to face.
In this application, the devices in the communication system corresponding to FIG. 1 are classified by function into registration devices, storage devices, and voiceprint recognition devices. A registration device is used to register the user's voice: it extracts registered voiceprint information from the voice, where the registered voiceprint information includes the user's own biological voiceprint obtained after the noise related to the registration device has been separated out. A storage device (also called a voiceprint sharing device) is used to store the user's registered voiceprint information and to provide a sharing service for it. A voiceprint recognition device is a device that performs voiceprint recognition on the user's voice; it may be a device on which the user's voiceprint has not been registered, but it can implement the voiceprint recognition function by using the shared registered voiceprint information.
The registration device (also called a voiceprint registration apparatus) may be any of the multiple terminals in the communication system, for example, a mobile phone or a tablet computer.
The storage device may be a server, a cloud server, or any of the multiple terminals; for example, it may be at least one of the multiple terminals, or each of the multiple terminals.
The voiceprint recognition device (also called a recognition device or a voiceprint recognition apparatus) may be any of the multiple terminals on which the voiceprint has not been registered. For example, if the registration terminals are a mobile phone and a tablet computer, the voiceprint recognition device may be a wearable device, a vehicle-mounted terminal, a smart screen, a headset, a personal computer, or the like.
In this application, the voiceprint recognition device is also referred to as the "first terminal", and the registration device is also referred to as the "second terminal".
It should be noted that the division into registration devices, storage devices, and voiceprint recognition devices is made at the functional level only for ease of description; it does not limit each device to a single function. For example, one terminal may act as a registration device, a storage device, and a voiceprint recognition device at the same time: a mobile phone can register the user's voiceprint, store the registered voiceprint information, and recognize the user's voiceprint. A terminal may also be both a storage device and a voiceprint recognition device; for example, a personal computer can store registered voiceprint information and also serve as a voiceprint recognition device.
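Because the three roles are functional rather than exclusive, one way to model them is as combinable flags. The sketch below uses Python's `enum.Flag`; the `Role` names and the device assignments are illustrative only, mirroring the mobile phone and personal computer examples above.

```python
from enum import Flag, auto

class Role(Flag):
    """Functional roles a terminal may play (illustrative names, not from the source)."""
    REGISTRATION = auto()   # collects speech and registers the voiceprint
    STORAGE = auto()        # stores and shares registered voiceprint information
    RECOGNITION = auto()    # matches incoming speech against registered voiceprints

# One device may combine several roles, as in the examples above.
phone = Role.REGISTRATION | Role.STORAGE | Role.RECOGNITION
personal_computer = Role.STORAGE | Role.RECOGNITION
smart_speaker = Role.RECOGNITION

def can(device_roles: Role, role: Role) -> bool:
    """True if the device's role set includes the given role."""
    return role in device_roles
```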
In the embodiments of this application, the registration device receives voice input by the user, separates out the noise in the voice related to the registration device itself, extracts the user's biological voiceprint information from the voice, and registers that voiceprint information to obtain registered voiceprint information. Because the device-related noise has been removed from the registered voiceprint information, the device's influence on the user's biological voiceprint is eliminated, and the registered voiceprint information can be shared with other devices. The storage device stores the registered voiceprint information, which provides the basis for sharing it with other devices. When the user needs to perform voiceprint recognition on another device (a voiceprint recognition device), the user does not need to register a voiceprint on that device again; instead, the registered voiceprint information already registered on the registration device can be shared. The voiceprint recognition device receives voice input by the user, separates out the noise in the voice related to the voiceprint recognition device itself, and extracts the user's biological voiceprint information (also called target voiceprint information) from the voice. The voiceprint recognition device then obtains the registered voiceprint information from the storage device and matches it against the target voiceprint information, thereby performing voiceprint recognition to identify the user.
In the embodiments of this application, the registered voiceprint information is the user's own biological voiceprint information extracted from the voice after the noise related to the registration device itself has been eliminated. Because the registration device's influence on the biological voiceprint is removed, the registered voiceprint information can be shared with other voiceprint recognition devices. The user only needs to register a voiceprint on one device (the registration device) to use voiceprint recognition on multiple devices, instead of registering on every device. This eliminates a cumbersome registration process, enables cross-device voiceprint recognition, and can greatly improve the user experience. Moreover, because device-related noise has been separated out of both the registered voiceprint information and the target voiceprint information, the device's influence on the voiceprint is eliminated, and recognition accuracy remains high regardless of which device the user uses for voiceprint recognition.
This application mainly involves three stages: 1) a registered voiceprint information collection stage; 2) a registered voiceprint information storage stage; and 3) a speaker voiceprint recognition stage. Refer to FIG. 2. In the collection stage, device-related noise in the voice is eliminated to obtain the user's biological voiceprint, and the biological voiceprint is recorded under an existing user ID; that is, the biological voiceprint is registered to obtain registered voiceprint information. In the storage stage, the registered voiceprint information is stored in association with the user ID, providing the basis for sharing the registered voiceprint information. In the speaker recognition stage, the registered voiceprint information is matched against the target voiceprint information to obtain a matching result, which indicates the identity of the user.
First, for stage 1), the collection of registered voiceprint information, the execution body is the registration device, or a processor, chip, or chip system in the registration device. Refer to FIG. 3. Taking the registration device as the execution body, the registration device may perform the following steps:
Step 301: The registration device receives voice entered by the user. The text content of the voice is not relevant and is not limited; the voice is used mainly to extract voiceprint information. Specifically, the voice may be collected through a voice input/output device; for details, refer to the description of FIG. 14, which is not limited here.
Step 302: The registration device eliminates the noise related to the registration device from the voice to obtain the user's biological voiceprint. For the specific processing, refer to the subsequent description of the noise elimination scheme; details are not repeated here.
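The actual noise-elimination scheme for step 302 is described later in the source, so the following is not that scheme. As a hedged stand-in, it shows cepstral mean normalization (CMN), a standard technique for removing a stationary convolutional channel: in the log-spectral or cepstral domain the channel becomes a constant per-coefficient offset, and subtracting the per-utterance mean cancels it. The feature arrays and device offsets below are synthetic.

```python
import numpy as np

def cepstral_mean_normalize(features: np.ndarray) -> np.ndarray:
    """Remove a stationary channel offset from log-domain features.

    features: shape (num_frames, num_coeffs), e.g. log-spectra or MFCCs.
    A linear time-invariant channel multiplies the spectrum, which becomes a
    constant additive offset per coefficient in the log domain; subtracting the
    per-utterance mean cancels that offset (together with the speech mean).
    """
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 20))   # synthetic log-domain frames of one utterance
offset_phone = rng.standard_normal(20)   # hypothetical log|H| of a phone
offset_car = rng.standard_normal(20)     # hypothetical log|H| of a vehicle-mounted terminal

via_phone = clean + offset_phone
via_car = clean + offset_car

normalized_phone = cepstral_mean_normalize(via_phone)
normalized_car = cepstral_mean_normalize(via_car)
```

After normalization both devices yield the same features. CMN handles only the stationary channel; additive background noise, nonlinear gain, and codec loss need other treatment.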
Step 303: The registration device registers the user's biological voiceprint to obtain the user's registered voiceprint information. "Registration" refers to recording the user's biological voiceprint under an existing voiceprint identifier; equivalently, it is establishing a correspondence between the user's biological voiceprint and the voiceprint identifier, or configuring the user's voiceprint identifier for the user's biological voiceprint. The voiceprint identifier may be a general account used to perform device-related operations. For example, taking a mobile phone as the registration device, after purchasing the phone the user registers a user ID, through which services such as software downloads and phone location can be used. The voiceprint identifier may also be an account dedicated to the voiceprint recognition function; for example, the user's mobile phone number may be used as the dedicated account. To distinguish voiceprint information that has been "recorded" from voiceprint information that has not, voiceprint information that has been "recorded", that is, a biological voiceprint to which a voiceprint identifier has been assigned, is called "registered voiceprint information". The registration device establishes the correspondence between the user's biological voiceprint and the user ID, completing the registration of the user's biological voiceprint. The registered voiceprint information includes the user's biological voiceprint obtained in step 302.
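The registration operation of step 303, recording a biological voiceprint under an identifier that already exists, can be sketched as an in-memory mapping in which one ID may accumulate several templates. The `VoiceprintRegistry` class and its methods are hypothetical; a real system would persist the data and obtain the voiceprint vector from the noise-eliminated speech rather than from a literal list.

```python
from collections import defaultdict

class VoiceprintRegistry:
    """Hypothetical registry: voiceprint identifier (user ID) -> list of templates."""

    def __init__(self, known_ids):
        self._known_ids = set(known_ids)      # IDs that already exist (e.g. accounts)
        self._templates = defaultdict(list)

    def register(self, user_id: str, voiceprint: list[float]) -> int:
        """Record a biological voiceprint under an existing ID; return the template count."""
        if user_id not in self._known_ids:
            raise KeyError(f"unknown voiceprint identifier: {user_id}")
        self._templates[user_id].append(list(voiceprint))
        return len(self._templates[user_id])

    def templates(self, user_id: str) -> list[list[float]]:
        return list(self._templates.get(user_id, []))

registry = VoiceprintRegistry(known_ids=["user-S"])
first = registry.register("user-S", [0.12, -0.40, 0.33])   # one run of steps 301-303
second = registry.register("user-S", [0.10, -0.38, 0.35])  # a second run, second template
```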
The registration device may itself act as a storage device and store the registered voiceprint information. Optionally, the registration device may instead send the registered voiceprint information and the corresponding user ID to another storage device, which stores the registered voiceprint information; that storage device may be a cloud server, a server, or another terminal.
In this application, in the collection stage, the received user voice is processed to eliminate device-related noise, yielding the user's "clean" biological voiceprint (the voiceprint closest to that heard when people speak face to face). Because the device's influence on the voice is reduced, the extracted biological voiceprint information can serve as registered voiceprint information shared by multiple devices.
Next, for stage 2), the storage (or sharing) of registered voiceprint information, the execution body that stores the registered voiceprint information may be the storage device or a memory in the storage device. For example, if the storage device is a server, the execution body may be the server or a memory in the server. The storage device receives the registered voiceprint information and the corresponding user ID sent by the registration device, and stores them. Optionally, the registration device may also send the storage device information such as the registration time of the registered voiceprint information, the identifier of the registration device, and the voice intensity received by the registration device, and the storage device stores this information in association with the user ID. The registration time records when the voiceprint information was registered and can be used to prompt the user to update it periodically. The identifier of the registration device allows the storage device to recognize the registration device: when the storage device receives a user ID, it can determine whether the ID was sent by the registration device or by a non-registration device (a voiceprint recognition device). The voice intensity information indicates how far the speaker was from the registration device when the registered voiceprint information was recorded. One user ID may correspond to multiple biological voiceprint templates of the same user, and these templates may correspond to the same or different voice intensity information. Each time the registration device performs collection, noise elimination, and registration, that is, performs steps 301 to 303 once, one biological voiceprint template is obtained; performing this process multiple times yields multiple templates. When the templates correspond to different voice intensity information, the same user's templates cover multiple situations (voiceprints recorded at different distances), which improves the accuracy of voiceprint recognition.
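The optional metadata listed above (registration time, registering-device identifier, voice intensity) and the one-ID-to-many-templates relationship can be sketched as plain records. The field names, the use of dB for intensity, the age-based update rule, and the nearest-intensity template selection are assumptions for illustration, not details given in the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class TemplateRecord:
    """One biological voiceprint template plus optional metadata (field names hypothetical)."""
    voiceprint: list[float]
    registered_at: datetime        # registration time, used for update reminders
    registering_device: str        # identifier of the device that registered it
    voice_intensity_db: float      # assumed unit; proxy for speaker-to-device distance

@dataclass
class UserVoiceprints:
    user_id: str
    templates: list[TemplateRecord] = field(default_factory=list)

    def sent_by_registering_device(self, device_id: str) -> bool:
        """True if device_id registered any of this user's templates."""
        return any(t.registering_device == device_id for t in self.templates)

    def needs_update(self, now: datetime, max_age: timedelta) -> bool:
        """Assumed rule: prompt re-registration once every template is older than max_age."""
        return all(now - t.registered_at > max_age for t in self.templates)

    def closest_intensity(self, level_db: float) -> TemplateRecord:
        """Pick the template recorded at the most similar voice intensity (distance)."""
        return min(self.templates, key=lambda t: abs(t.voice_intensity_db - level_db))

now = datetime(2020, 6, 1)
user = UserVoiceprints("user-S", [
    TemplateRecord([0.1, 0.2], now - timedelta(days=400), "phone-01", 65.0),  # near field
    TemplateRecord([0.1, 0.3], now - timedelta(days=10), "phone-01", 45.0),   # far field
])
```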
Optionally, in a first example, refer to FIG. 4. The registration device sends the registered voiceprint information and the user ID to the cloud (or a server), which stores them in association. In this example, the user's registered voiceprint information is stored in the cloud (or on the server) and distinguished by user ID. This saves storage space on the terminals and suits a wider range of application scenarios, covering not only indoor home scenarios but also outdoor ones, such as cellular vehicle networks: any terminal that can connect to the network can share the registered voiceprint information.
Optionally, in a second example, refer to FIG. 5. The registration device sends the registered voiceprint information and the user ID to the cloud (or a server). Any of the multiple terminals (such as terminal 2, also called the third terminal) may download the registered voiceprint information locally depending on its own storage resources, so that the registered voiceprint information is stored on the third terminal (such as a tablet computer). The third terminal may be any terminal in a local area network. For example, in a home scenario, the terminals include a mobile phone, a smart screen, smart lamps, and a tablet computer, which may be connected through wireless fidelity (WiFi), Bluetooth, infrared, network cards, or other communication methods. When a voiceprint recognition device needs to perform voiceprint recognition, it can quickly obtain the registered voiceprint information from the third terminal.
Optionally, in a third example, refer to FIG. 6. The registered voiceprint information may be stored in a distributed manner on each of multiple terminals (such as a smart screen, a personal computer, a vehicle-mounted terminal, and a tablet computer). For example, each terminal sends the user ID to the cloud and receives the registered voiceprint information corresponding to that user ID from the cloud (or server). Alternatively, the terminals may share voiceprint data with each other through Bluetooth, WiFi, infrared, network cards, or the like, synchronizing and sharing voiceprints periodically or aperiodically. The user may also configure individual devices not to update or share voiceprints. If the terminals cannot share the registered voiceprint data through the communication methods above, they can still share the registered voiceprint information once they are connected to other devices.
Because each terminal already stores the registered voiceprint information corresponding to the user ID, when voiceprint recognition is needed the voiceprint recognition device is also a storage device and can obtain the registered voiceprint information locally, without fetching it from the cloud or another device. The registered voiceprint information is thus shared, and recognition can be performed quickly.
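One round of the periodic or aperiodic synchronization described in the third example could look like the following. It assumes each terminal keeps a per-user version number and that the newest version wins; both this versioning scheme and the `Terminal` and `sync_round` names are hypothetical. A terminal on which the user has disabled sharing neither contributes nor receives data.

```python
from dataclasses import dataclass, field

@dataclass
class Terminal:
    """Hypothetical terminal holding a local copy of registered voiceprints."""
    name: str
    share: bool = True                            # user may disable sharing per device
    store: dict = field(default_factory=dict)     # user_id -> (version, voiceprint)

def sync_round(terminals: list) -> None:
    """One synchronization round: every sharing terminal ends with the newest copy."""
    sharing = [t for t in terminals if t.share]
    newest = {}
    for t in sharing:
        for user_id, (version, voiceprint) in t.store.items():
            if user_id not in newest or version > newest[user_id][0]:
                newest[user_id] = (version, voiceprint)
    for t in sharing:
        t.store.update(newest)

tablet = Terminal("tablet", store={"user-S": (2, [0.11, 0.42, -0.30])})
tv = Terminal("tv", store={"user-S": (1, [0.10, 0.40, -0.31])})
car = Terminal("car", share=False)   # user opted out on this device
sync_round([tablet, tv, car])
```

After the round, the television holds the tablet's newer version 2 template, while the opted-out vehicle terminal is left untouched.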
Finally, for stage 3), speaker voiceprint recognition, the execution body is the voiceprint recognition device (also called the first terminal), or a processor, chip, or chip system in the first terminal. Taking the first terminal as the execution body: the first terminal receives voice input by the user, separates out the noise in the voice related to the first terminal, and extracts the user's biological voiceprint information (also called target voiceprint information). The first terminal obtains the registered voiceprint information from the storage device and matches it against the target voiceprint information to perform cross-device voiceprint recognition. If the registered voiceprint information matches the target voiceprint information, voiceprint recognition succeeds; if it does not match, voiceprint recognition fails.
Refer to FIG. 7. In the speaker voiceprint recognition stage, taking the first terminal (such as a smart speaker) as the execution body, the first terminal may perform the following steps:
Step 701: The first terminal receives voice input by the user. This may be implemented by voice collection through a voice input/output device and is not detailed here.
Step 702: The first terminal eliminates the noise related to the recognition device from the voice to obtain target voiceprint information, where the target voiceprint information is the user's biological voiceprint. For details of this process, refer to the subsequent description.
Step 703: The first terminal obtains the user's registered voiceprint information. The registered voiceprint information may be obtained from the registration device or another device when needed, or obtained from such a device in advance, stored in the memory of the first terminal, and read from the memory when used. For the specific process of obtaining this information from the registration device or another device, refer to the following description.
In one possible implementation, the registered voiceprint information and its corresponding user ID are stored by a storage device or by the registration device. The storage device may be a cloud server, a server, a single terminal, or multiple terminals.
The voiceprint recognition device sends the user ID to the storage device or the registration device to request the user's registered voiceprint information, and then receives the registered voiceprint information and the corresponding user ID sent by the storage device or the registration device.
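Step 703 allows the registered voiceprint information to be fetched on demand or read from local memory. A minimal client that does both, requesting by user ID only on a cache miss and checking that the returned ID matches the request, might look like this. The transport is abstracted as a hypothetical `fetch` callable standing in for the storage or registration device; the class name is also illustrative.

```python
from typing import Callable

# Hypothetical request function: sends a user ID, returns (user_id, templates).
FetchFn = Callable[[str], tuple]

class RegisteredVoiceprintClient:
    """Recognition-side helper: read from local memory, else request by user ID."""

    def __init__(self, fetch: FetchFn):
        self._fetch = fetch
        self._cache = {}          # user_id -> list of templates held in local memory
        self.remote_calls = 0

    def get(self, user_id: str) -> list:
        if user_id not in self._cache:
            self.remote_calls += 1
            returned_id, templates = self._fetch(user_id)
            if returned_id != user_id:
                raise ValueError("response does not match the requested user ID")
            self._cache[user_id] = templates
        return self._cache[user_id]

storage = {"user-S": [[0.1, 0.2, 0.3]]}   # stands in for the storage/registration device
client = RegisteredVoiceprintClient(lambda uid: (uid, storage[uid]))
first = client.get("user-S")    # remote request
again = client.get("user-S")    # served from local memory
```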
In another possible implementation, if the storage device is one or more terminals in the communication system, the storage device may share (synchronize) the registered voiceprint information and the corresponding user ID with the other terminals, for example periodically, or whenever the terminals connect to the same local area network. For example, suppose the storage device is a tablet computer and the communication system includes three terminals: the tablet computer, a television, and a mobile phone. When the three terminals are communicatively connected (for example, over WiFi), the tablet computer transmits the stored registered voiceprint information and the corresponding ID to the mobile phone and the television; that is, the voiceprint recognition devices receive the registered voiceprint information and corresponding ID sent by the storage device. In this implementation, the voiceprint recognition device does not need to send a user ID to the storage device to request the registered voiceprint information.
In yet another possible implementation, the first terminal sends the user ID and the identifier of the first terminal to a cloud server. Based on the user ID, the cloud server determines where the corresponding registered voiceprint information is stored (that is, on which device); here, it determines that the information is stored on a first device. The cloud server sends the user ID and the identifier of the first terminal to the first device, which is communicatively connected to the first terminal. Based on the identifier of the first terminal, the first device sends the registered voiceprint information to the first terminal, which receives it.
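The directory-style lookup in this implementation, where the cloud server knows which device holds a user's registered voiceprint and forwards the request rather than the data, can be sketched with in-memory objects. `CloudDirectory`, `FirstDevice`, and the `outbox` list standing in for delivery to the first terminal are hypothetical names; networking, authentication, and error handling are omitted.

```python
class FirstDevice:
    """Device that actually holds registered voiceprints (e.g. a home tablet)."""

    def __init__(self, store: dict):
        self.store = store      # user_id -> list of templates
        self.outbox = []        # (terminal_id, user_id, templates) delivered to terminals

    def handle_forward(self, user_id: str, terminal_id: str) -> None:
        # Send the templates directly to the requesting terminal.
        self.outbox.append((terminal_id, user_id, self.store[user_id]))

class CloudDirectory:
    """Cloud server that records where each user's voiceprints live, not the data itself."""

    def __init__(self):
        self.location = {}      # user_id -> device holding the registered voiceprints

    def request(self, user_id: str, terminal_id: str) -> None:
        holder = self.location[user_id]            # which device stores it?
        holder.handle_forward(user_id, terminal_id)

tablet = FirstDevice({"user-S": [[0.1, 0.2, 0.3]]})
cloud = CloudDirectory()
cloud.location["user-S"] = tablet
cloud.request("user-S", terminal_id="speaker-01")  # first terminal asks via the cloud
```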
Step 704: The first terminal matches the target voiceprint information against the registered voiceprint information to obtain a matching result, which indicates the identity of the user. If the target voiceprint information matches the registered voiceprint information, it is determined that the speaker and the user corresponding to the registered voiceprint information are the same user, that is, the user identity is true (a preset user). If they do not match, it is determined that the speaker and the user corresponding to the registered voiceprint information are different users, that is, the user identity is false. Optionally, if the user identity is a preset user, the first terminal executes the control instruction carried in the voice; if the user identity is false (not a preset user), the speaker is a different user from the one corresponding to the registered voiceprint information, and the first terminal does not need to execute the control instruction.
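The source does not specify how the match in step 704 is scored. A common choice for fixed-length voiceprint embeddings is cosine similarity against each registered template with an acceptance threshold; the sketch below assumes that choice and an arbitrary threshold of 0.8, and gates execution of the voice command on the result. The function names are illustrative.

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify(target, templates, threshold=0.8) -> bool:
    """True (preset user) if the target matches any registered template; threshold assumed."""
    return any(cosine(target, t) >= threshold for t in templates)

def handle_command(target, templates, command, execute) -> bool:
    """Execute the voice command only when the speaker is verified."""
    if verify(target, templates):
        execute(command)
        return True
    return False

registered = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]]   # synthetic registered templates
executed = []
accepted = handle_command([0.95, 0.05, 0.25], registered, "play music", executed.append)
rejected = handle_command([0.0, 1.0, 0.0], registered, "play music", executed.append)
```

In practice the threshold trades false acceptances against false rejections and would be tuned on real data.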
The three stages above are illustrated with the application scenario shown in Figure 8. The terminals that user S regularly uses include a mobile phone, a tablet computer, a computer, a smart screen, and a vehicle-mounted terminal.

User S can register a voiceprint on a registration device such as the mobile phone. The phone receives speech from user S, for example "Hello, Xiaoyi"; the text content of the speech does not matter. The phone extracts the biometric voiceprint information from the speech, that is, the voiceprint information that remains after the noise associated with the phone has been separated out. It associates this information with user S's user ID and sends the user ID and the biometric voiceprint information to the cloud, which serves as the storage center for registered voiceprint information.

When user S wants to control the vehicle-mounted terminal by voice, user S logs in with the user ID on the vehicle-mounted terminal (this can be done in advance rather than every time). The vehicle-mounted terminal sends the user ID to the cloud over the cellular network, and the cloud returns the registered voiceprint information corresponding to that user ID. The vehicle-mounted terminal then receives speech from user S, for example "Play music". It separates out the noise associated with the vehicle-mounted terminal and extracts the target voiceprint information from the speech. It then matches the registered voiceprint information received from the cloud against the target voiceprint information. When they match, it confirms that user S is the preset user and executes the voice command "play music".
By purpose, voiceprint recognition can be divided into speaker verification and speaker identification. Speaker verification determines whether the speaker is a specified person. Speaker identification determines which of the enrolled speakers the speaker is.
The scenario in Figure 8 is a speaker-verification scenario. The vehicle-mounted terminal extracts the target voiceprint information from the speech it receives and determines whether that information and the registered voiceprint information corresponding to the user ID belong to the same person. Once user S's identity is confirmed, the vehicle-mounted terminal can execute user S's voice command. If the target voiceprint information does not match the registered voiceprint information, the vehicle-mounted terminal does not execute the command.

The Figure 8 scenario is only an example. Applications of this application include, but are not limited to, account login (such as bank account login) and identity verification (such as voice recognition for security doors, or identity checks in financial and securities transactions).
This application can also be applied to speaker identification. Optionally, each piece of registered voiceprint information corresponds to one user ID. In a household scenario, for example, each user ID is associated with user data, which can differ across application scenarios, such as the user's favorite programs or preferred air-conditioner temperature. Taking air-conditioner temperature as the user data, the correspondence between user IDs and user data can be as shown in Table 1:
Table 1

User ID | User | Air-conditioner temperature
1A | f | 25 °C
2D | g | 20 °C
3C | c | 27 °C
The identification proceeds in three steps:

1. Match the target voiceprint information against the multiple pieces of registered voiceprint information corresponding to the voiceprint identifier, to determine the target registered voiceprint information that matches it.
2. Obtain the user data associated with the user ID corresponding to the target registered voiceprint information.
3. Perform the operation corresponding to that user data.
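The three steps above (match against several enrolled voiceprints, look up the matched user's data, act on it) can be sketched as follows. This assumes cosine-similarity scoring with a hypothetical threshold of 0.7. `USER_DATA` simply mirrors Table 1, and the embedding vectors used with it are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical user data mirroring Table 1: user ID -> preferred temperature (degrees C).
USER_DATA = {"1A": 25, "2D": 20, "3C": 27}


def identify_speaker(target, registered, threshold=0.7):
    """Speaker identification: return the user ID whose registered voiceprint best
    matches the target, or None if even the best score is below the threshold."""
    best_id, best_score = None, float("-inf")
    for user_id, template in registered.items():
        score = cosine_similarity(target, template)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None


def operation_for_speaker(target, registered, user_data=USER_DATA):
    """Look up the matched user's data and return the operation to perform."""
    user_id = identify_speaker(target, registered)
    if user_id is None:
        return None
    return f"set air conditioner to {user_data[user_id]} degrees C"
```

Keeping a rejection threshold in the identification step means an unknown voice does not silently fall back to the nearest enrolled user.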
For example, Figure 9 shows a speaker-identification scenario in which a household has three members: user f (for example, the father), user g (the mother), and user c (the child). With a tablet computer as the registration device, all three family members can register their voiceprints on the tablet. The tablet can register a preset number of voiceprints, set either by the user or by the tablet's system settings. In this scenario, assume that one terminal can register the voiceprint information of three users.

User f logs in with his user ID (or has logged in beforehand), and the tablet receives registration speech from user f. The speech may have any text content (for example, "Hello, Xiaoyi"); the text is not restricted. The tablet separates out the noise associated with itself and extracts user f's biometric voiceprint information. It then associates that information with the user ID (for example, "1A") to complete registration, yielding user f's first registered voiceprint information.

Likewise, the tablet receives registration speech from user g and obtains user g's second registered voiceprint information, which corresponds to user ID "2D". It receives registration speech from user c and obtains user c's third registered voiceprint information, which corresponds to user ID "3C". The tablet sends each piece of registered voiceprint information, together with its user ID, to the cloud for storage.
Suppose a user wants to control the air conditioner through a smart speaker, which has already been associated with the user ID in the smart-home app on the tablet. The smart speaker receives the utterance "Xiaoyi, turn on the air conditioner" from user f. It separates out the noise associated with itself and extracts the user's biometric voiceprint information (the target voiceprint information) from the speech.

The smart speaker then sends the user ID to the cloud, and the cloud returns the three user IDs and the corresponding three pieces of registered voiceprint information. Alternatively, the smart speaker may already have stored them. The smart speaker matches the target voiceprint information against the three pieces of registered voiceprint information. If it matches the first registered voiceprint information (user f's), the speaker executes the voice command.

Optionally, the smart speaker determines the user ID corresponding to the first registered voiceprint information (for example, "1A") and retrieves the related user data for that identifier, for example "temperature 25 °C". Recognizing user ID "1A" indicates that the speech was input by user f. The historical user data recorded by the smart speaker may be that user ID 1A corresponds to "temperature 25 °C", meaning user f usually sets the air conditioner to 25 °C. Based on this information, the smart speaker sends a control instruction to set the air conditioner to 25 °C.
Voiceprint recognition thus makes it possible to identify a specific user and execute commands selectively. By distinguishing among users, smart devices can also provide each user with personalized services, broadening the range of applications for smart devices.
The scenario above is a household one, but this application can also be applied in the workplace. The registered voiceprint information of the members of a work group corresponds to multiple user IDs, and the voiceprint recognition technology in this application can identify which member of the group is speaking. Different users have different permissions. If the speaker is identified as user d, who corresponds to a given user ID, the voiceprint recognition device directly performs the operations on user data that user d's permissions allow.
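Once identification has produced a user ID, gating on permissions reduces to a table lookup. The following minimal sketch uses a hypothetical `PERMISSIONS` table and illustrative operation names. It assumes that an identification step, such as nearest-template matching with a rejection threshold, has already produced `speaker_id`, or `None` when no enrolled member matched.

```python
# Hypothetical permission table: user ID -> operations that user may trigger.
PERMISSIONS = {
    "D1": {"open_shared_report", "approve_expense"},
    "E2": {"open_shared_report"},
}


def execute_if_permitted(speaker_id, operation, permissions=PERMISSIONS):
    """speaker_id is the result of speaker identification (None if no match)."""
    if speaker_id is None:
        return "rejected: unknown speaker"
    if operation not in permissions.get(speaker_id, set()):
        return f"rejected: {speaker_id} may not {operation}"
    return f"executing {operation} for {speaker_id}"
```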
In this example, voiceprint recognition enables speaker identification, which in turn determines the user data corresponding to the speaker. The user data includes, but is not limited to, historical information associated with the speaker, or user data corresponding to the speaker's permissions. The device can then customize its behavior or make recommendations based on that user data.

In one scenario, for example, the smart speaker performs the operation corresponding to the historical information (such as a temperature of 25 °C) associated with the speaker, that is, the user with user ID "1A". The user therefore does not have to repeat habitual operations each time, which improves the user experience.
Two extraction steps share the same set of techniques: step 702 of the implementation corresponding to Figure 7, in which the first terminal extracts target voiceprint information that is independent of the first terminal, and step 302 of the implementation corresponding to Figure 3, in which the registration device extracts device-independent biometric voiceprint information. Optionally, either step may use: 1. a machine-learning approach; 2. a signal-processing approach.
1. Machine-learning approach
The first terminal (or voiceprint recognition device) feeds the speech into a voiceprint extraction model, which outputs the target voiceprint information. The voiceprint extraction model is obtained by learning from corpora collected on multiple devices.
The voiceprint extraction model includes, but is not limited to, algorithms such as:

- the Gaussian mixture model (GMM);
- the Gaussian mixture model-universal background model (GMM-UBM);
- i-vector, x-vector and DNN i-vector;
- deep neural networks (DNN);
- speech parsing and speech factorization;
- clustering and transformation.
The machine-learning approach has two phases: training the voiceprint extraction model, and applying it.
As shown in Figure 10, the training phase starts by obtaining a large corpus for learning, together with reference data. The corpus includes speech collected by different types of devices. These include, but are not limited to:

- smart home appliances (such as smart screens, TVs, smart washing machines, and smart air conditioners);
- lighting systems, smart speakers and robots;
- vehicle-mounted terminals, user equipment, mobile phones, tablet computers and personal computers;
- virtual reality devices, augmented reality devices, wearable devices, and other devices capable of recording speech.

The reference data is voiceprint information that is independent of the device, or only weakly correlated with it.
A large corpus recorded by the same user is collected on different types of devices and fed into a voiceprint model, such as a GMM-UBM model. The voiceprint data output by the GMM-UBM model is compared with the reference data, which is the user's biometric voiceprint data. If the difference between the output voiceprint data and the reference data is greater than or equal to a threshold, the output voiceprint data is fed back into the GMM-UBM model. The model again outputs voiceprint data, and that output is again compared with the reference data.

Training iterates in this way until the difference between the model's output and the reference data falls below the threshold, which indicates that the model's output closely approximates the reference data. The result is the voiceprint extraction model, which separates out device-related noise and extracts the user's biometric voiceprint.

In this example, the voiceprint extraction model is trained in advance by machine learning, and the trained model then extracts the user's biometric voiceprint directly from speech. This gives good robustness.
2. Signal-processing approach
In this application, the speech signal can be processed with a filter that performs frequency-response compensation. The compensation removes the noise in the speech that is associated with the recognition device, leaving the user's biometric voiceprint information.

A filter is the most basic signal-processing component: it extracts the desired signal from a mixture of signals. Here, its main function is to remove the various kinds of noise that interfere with signal processing. Because a filter applies different gains at different frequencies, it can attenuate components that are over-emphasized and boost components that are too weak, and so remove device noise.
For example, the filter can be expressed as:
$$y(n) = \sum_{k=0}^{N} a_k\, x(n-k) \qquad (1)$$
Equation (1) describes a finite impulse response (FIR) filter, where:

- n is the time index;
- N is the filter order, so the unit impulse response spans N + 1 samples;
- k runs from 0 to N.

Convolving the coefficients a with the input x produces the filtered output y.
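Equation (1) can be evaluated directly as a convolution. The following minimal pure-Python sketch assumes a zero initial state, so samples before n = 0 are treated as 0. The coefficient values in the usage note are arbitrary illustrations, not device-compensation coefficients from this application.

```python
def fir_filter(x, a):
    """FIR filter y(n) = sum_{k=0}^{N} a[k] * x(n - k), with N = len(a) - 1
    and x(n) = 0 for n < 0 (zero initial state)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, a_k in enumerate(a):  # k = 0..N
            if n - k >= 0:
                acc += a_k * x[n - k]
        y.append(acc)
    return y
```

Feeding in a unit impulse returns the coefficients themselves. For example, `fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2])` gives `[0.5, 0.3, 0.2, 0.0]`, a quick check that the output is the convolution of a with x.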
Alternatively, the filter can be expressed as:
$$y(n) = \sum_{k=0}^{N} a_k\, x(n-k) + \sum_{j=1}^{P} b_j\, y(n-j) \qquad (2)$$
Equation (2) describes an infinite impulse response (IIR) filter, where:

- n is the time index;
- N and P are the orders of the feedforward and feedback parts, respectively;
- k runs from 0 to N, and the coefficients a are convolved with the input x;
- j runs from 1 to P, and the coefficients b are convolved with past outputs y.

The sum of the two convolutions produces the filtered output y.
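Equation (2) adds a feedback term over past outputs. The following minimal sketch assumes the plus-sign form with the feedback sum over j = 1, ..., P, so y(n) never appears on both sides, and a zero initial state. SciPy's `scipy.signal.lfilter(b, a, x)` uses a different convention: b is the feedforward numerator, a is the feedback denominator, and feedback terms are subtracted. Coefficients from that library therefore cannot be plugged in here unchanged.

```python
def iir_filter(x, a, b):
    """Direct-form IIR filter:
        y(n) = sum_{k=0}^{N} a[k] * x(n - k) + sum_{j=1}^{P} b_j * y(n - j)
    `a` holds a_0..a_N; `b` holds the feedback coefficients b_1..b_P,
    so b_j is b[j - 1]. Samples before n = 0 are taken as zero."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, a_k in enumerate(a):            # feedforward part, k = 0..N
            if n - k >= 0:
                acc += a_k * x[n - k]
        for j, b_j in enumerate(b, start=1):   # feedback part, j = 1..P
            if n - j >= 0:
                acc += b_j * y[n - j]
        y.append(acc)
    return y
```

With `a = [1.0]` and `b = [0.5]`, a unit impulse produces the decaying response `[1.0, 0.5, 0.25, 0.125]`. That decay illustrates why the impulse response of an IIR filter is, in principle, infinite.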
Adjusting the coefficients a in equation (1), or the coefficients a and b in equation (2), makes the frequency responses of different devices flat, or consistent with one another. This filters out the device-related noise in the speech.
Figure 11A shows three curves: an upper-limit curve 1101, a lower-limit curve 1102, and a curve 1103 without frequency-response compensation. Between 600 Hz and 1.2 kHz, curve 1103 has a peak that exceeds the upper-limit curve 1101. Between 300 Hz and 500 Hz, curve 1103 rises.

Figure 11B shows the upper-limit curve 1101, the lower-limit curve 1102, and a frequency-response-compensated curve 1104. Adjusting the coefficients a in equation (1), or a and b in equation (2), flattens the frequency responses of different devices. In the range 300 Hz to 2.5 kHz, the response becomes consistent (for example, about -2 dBr). Device-related noise in the speech is thereby filtered out, leaving the user's biometric voiceprint.
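The compensation target in Figure 11B, a flat response of about -2 dBr between 300 Hz and 2.5 kHz, can be expressed as per-band correction gains. This minimal sketch works on magnitude levels only. The band frequencies, the measured levels in the test, and the 12 dB clipping limits are hypothetical. Turning these gains into the coefficients a (and b) of equations (1) and (2) is a separate filter-design step that is not shown here.

```python
def compensation_gains_db(measured_db, target_db=-2.0, max_boost_db=12.0, max_cut_db=12.0):
    """Per-band correction that moves a device's measured response toward a flat target.

    measured_db: dict mapping band centre frequency (Hz) -> measured level (dBr).
    Peaks receive a cut and dips receive a boost; both are clipped to hypothetical limits."""
    return {
        freq: max(-max_cut_db, min(max_boost_db, target_db - level))
        for freq, level in measured_db.items()
    }


def apply_gains_db(measured_db, gains_db):
    """Response after compensation: levels expressed in dB simply add."""
    return {freq: measured_db[freq] + gains_db[freq] for freq in measured_db}
```

Clipping the correction keeps a deep notch in one device's response from being boosted so far that it amplifies that device's own noise floor.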
In this example, the filter processes the speech signal by attenuating over-emphasized components and boosting weak ones. This compensates the frequency response and so removes device-related noise. Because filtering removes the noise signal directly, the output is the user's biometric voiceprint signal, and the approach is simple and fast.
In an optional implementation, account is taken of the fact that a person's biometric voiceprint is not fixed. It varies with factors such as time of day, health (for example, healthy versus ill), and age, so the same person's voiceprint changes over time. To improve system robustness, multiple voiceprints can be registered for the same user, so that one user corresponds to multiple biometric voiceprint templates. One user ID corresponds to one piece of registered voiceprint information, which contains multiple biometric voiceprint templates.
The storage device can generate voiceprint models (that is, voiceprint templates) from multiple biometric voiceprints of the same user. Figure 12 takes an x-vector system as an example. An x-vector system is a DNN-based speaker recognition system: the DNN is trained to map a speaker's speech to a fixed-dimension embedding vector, called an x-vector.
The x-vector network receives voiceprint information from a single user, from which device-related noise has already been removed. The network can capture a user's voiceprint from relatively short speech and is more robust on short utterances. Each input yields one x-vector. This vector is the user's voiceprint information, and it can also serve as the user's voiceprint model. Because device-related information was removed beforehand, the vector is device-independent.

If device-independent voiceprint information is input several times, whether through multiple devices or through one device, the result is multiple x-vectors, whose dimensionality is reduced by linear discriminant analysis (LDA). Together these vectors cover more of the user's speaking conditions, such as voiceprints at different times of day, or in healthy and unhealthy states. In other words, one user corresponds to multiple biometric voiceprint models (or templates). These templates form a voiceprint template library, which can further improve recognition.

Each user can have 10 to 30 templates. The library can be updated periodically or as needed, with new templates replacing old ones, to keep the biometric voiceprint templates robust.
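The template library described above can be sketched as a bounded per-user store in which the oldest template is replaced first. This is a minimal sketch under stated assumptions. The `TemplateLibrary` class is hypothetical, and the LDA projection is not shown; embeddings are assumed to be already reduced. The default cap of 30 follows the upper end of the 10 to 30 range in the text. Scoring a target by its best cosine similarity to any template is an illustrative choice, not the method stated in this application.

```python
import math
from collections import deque


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class TemplateLibrary:
    """Per-user library of device-independent voiceprint templates (e.g. x-vectors).
    When a user's library is full, adding a template evicts the oldest one."""

    def __init__(self, max_templates=30):
        self.max_templates = max_templates
        self._by_user = {}

    def add(self, user_id, embedding):
        # deque(maxlen=...) discards the oldest entry automatically once full.
        lib = self._by_user.setdefault(user_id, deque(maxlen=self.max_templates))
        lib.append(list(embedding))

    def templates(self, user_id):
        return list(self._by_user.get(user_id, ()))

    def score(self, user_id, target):
        """Best cosine similarity between the target and any of the user's templates."""
        return max(
            (cosine_similarity(target, t) for t in self._by_user.get(user_id, ())),
            default=0.0,
        )
```

Because replacement is first-in, first-out, a library updated regularly tracks gradual changes in a voice, such as those due to age.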
The registration device may register one biometric voiceprint template at a time, or register multiple templates over several sessions. Alternatively, each of several registration devices may register one template. Optionally, the registration device may send one or more biometric voiceprint templates to the storage device or recognition device in a single message, or send multiple templates in separate messages. Transmission channels include, but are not limited to, wireless and wired links, and can be implemented by a transceiver; see the description of Figure 14.

The recognition device can obtain the one or more biometric voiceprint templates from the registration device or the storage device. For the recognition device, these templates are used for matching during voiceprint recognition. They constitute registered voiceprint information that it can use, even though it did not generate and register that information itself but received it from another device.

This solution enables sharing of device-independent voiceprint information and flexible cross-device voiceprint recognition, improving the user experience.
Corresponding to the methods in the foregoing method embodiments, embodiments of this application further provide corresponding apparatuses, including modules for executing the foregoing embodiments. A module may be software, hardware, or a combination of both.

As shown in Figure 13, an embodiment of this application provides an apparatus 1300, which may be a terminal or a component of a terminal (for example, an integrated circuit or a chip). The voiceprint recognition apparatus includes:

- a voice input/output module 1301 (also called a voice input/output unit);
- a processing module 1302 (processing unit);
- a transceiver module 1303 (transceiver unit).
In one possible design, the apparatus 1300 performs the functions of the recognition device in the foregoing method embodiments:

- The voice input/output module 1301 receives speech input by a user.
- The processing module 1302 removes, from the speech received by module 1301, the noise associated with the recognition device. This yields the target voiceprint information, which is the user's biometric voiceprint.
- The transceiver module 1303 obtains the user's registered voiceprint information. This includes at least one biometric voiceprint template obtained by removing the noise associated with the registration device.
- The processing module 1302 also matches the target voiceprint information it obtained against the registered voiceprint information obtained by the transceiver module 1303. The resulting matching result indicates the user's identity.
Specifically, in the embodiment corresponding to Figure 7:

- the voice input/output module 1301 performs step 701;
- the processing module 1302 performs steps 702 and 704;
- the transceiver module 1303 performs step 703.

For details, see the descriptions of those steps, which are not repeated here.
In another possible design, the apparatus 1300 performs the functions of the registration device in the foregoing method embodiments:

- The voice input/output module 1301 receives speech input by a user.
- The processing module 1302 removes, from the speech received by module 1301, the noise associated with the registration device, to obtain the user's registered voiceprint information.
- Optionally, the transceiver module 1303 sends the registered voiceprint information obtained by the processing module 1302, together with the corresponding voiceprint identifier, to another storage device.
Specifically, in the embodiment corresponding to Figure 3, the voice input/output module 1301 performs step 301; for details, see the description of step 301. The processing module 1302 performs steps 302 and 303. Details are not repeated here.
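The division of work among modules 1301 to 1303 in the two designs can be sketched as a small class whose modules are injected as plain callables. All names are hypothetical. The callables stand in for the audio circuit, the noise-removal processing and the transceiver; only the module and step numbers come from the text.

```python
class VoiceprintApparatus:
    """Hypothetical sketch of apparatus 1300; each module is an injected callable."""

    def __init__(self, capture_voice, remove_device_noise, fetch_registered, match):
        self.capture_voice = capture_voice              # voice input/output module 1301
        self.remove_device_noise = remove_device_noise  # processing module 1302
        self.fetch_registered = fetch_registered        # transceiver module 1303
        self.match = match                              # processing module 1302

    def recognize(self, user_id):
        """Recognition-device design (Figure 7, steps 701 to 704)."""
        voice = self.capture_voice()                    # step 701
        target = self.remove_device_noise(voice)        # step 702
        registered = self.fetch_registered(user_id)     # step 703
        return self.match(target, registered)           # step 704: matching result

    def register(self, voiceprint_id, send=None):
        """Registration-device design (Figure 3, steps 301 and 302)."""
        voice = self.capture_voice()                    # step 301
        template = self.remove_device_noise(voice)      # step 302
        if send is not None:                            # optional: transceiver module 1303
            send(voiceprint_id, template)
        return template
```

Injecting the modules as callables mirrors the text's point that each module may be software, hardware, or both: a test double and a driver for real audio hardware can be swapped in without changing the class.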
In another implementation, the apparatus may be a chip or an integrated circuit. In that case, the transceiver module 1303 may be a communication interface, the processing module 1302 a logic circuit, and the voice input/output module 1301 an audio circuit. Optionally, the communication interface may be an input/output interface or a transceiver circuit. An input/output interface may include an input interface and an output interface, and a transceiver circuit may include an input interface circuit and an output interface circuit.
In one implementation, the processing module 1302 may be a processing device whose functions are implemented partly or entirely in software. In that case, the processing device may include a memory and a processor. The memory stores a computer program, and the processor reads and executes that program to perform the corresponding processing and/or steps in any of the method embodiments.

Optionally, the processing device may include only the processor. The memory that stores the computer program is then located outside the processing device, and the processor is connected to it through circuits or wires in order to read and execute the stored program.
It can be understood that each functional component in FIG. 13 may be implemented in software, hardware, or a combination of both; this is not specifically limited.
In addition, FIG. 14 is a schematic structural diagram of an apparatus 1400 according to an embodiment of this application. The apparatus may be a terminal, or an integrated circuit, chip, or chip system in a terminal. Taking a terminal as an example, the terminal may include, but is not limited to, a smart-home terminal, a lighting system, a smart speaker, or a robot; it may also be a vehicle-mounted terminal, user equipment, a mobile phone, a tablet computer, a personal computer, or the like.
As shown in FIG. 14, the apparatus 1400 includes a processor 1401, a transceiver 1402, a memory 1403, and a voice input/output device 1404, which communicate with one another through internal connection paths to transfer control signals and/or data signals. The memory 1403 stores a computer program, and the processor 1401 calls and runs the program from the memory 1403 to control the transceiver 1402 to send and receive signals. The processor 1401 is the control center of the apparatus: it connects all parts of the apparatus (for example, a mobile phone) through various interfaces and lines, and performs the functions of the apparatus and processes data by running or executing the software programs and/or modules stored in the memory 1403 and calling the data stored in the memory 1403.
The voice input/output device 1404 provides the audio interface between the user and the apparatus (for example, a mobile phone). The voice input unit may be an audio circuit or a speech recognizer. The audio circuit may include a speaker 14041 and a microphone 14042. The microphone 14042 converts collected sound signals into electrical signals, which the audio circuit receives and converts into audio data; the audio data is then output to the processor 1401 for processing, and the processor 1401 removes the noise related to the apparatus to obtain the user's biological voiceprint.
Optionally, the apparatus may further include an antenna, through which the transceiver 1402 transmits or receives wireless signals. The transceiver 1402 may be used to send registered voiceprint information and the corresponding voiceprint identifier to other devices, or to receive them from other devices. Optionally, the processor 1401 and the memory 1403 may be combined into one processing device, in which the processor 1401 executes program code stored in the memory 1403 to implement the foregoing functions. Optionally, the memory 1403 may be integrated in the processor 1401, or it may be independent of the processor 1401, that is, located outside it. Optionally, the transceiver 1402 includes, but is not limited to, a radio frequency (RF) circuit, a communication interface, a Wi-Fi module, a Bluetooth module, and the like.
Optionally, the apparatus may further include a display unit 1405, which may be used to display information entered by the user, information provided to the user, and various images. The display panel of the display unit may take the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
In a possible design, the apparatus 1400 may be used to perform the functions of the recognition device in the method embodiments: the voice input/output device 1404 receives voice input by the user; the processor 1401 removes noise related to the recognition device from the voice to obtain target voiceprint information, which is the user's biological voiceprint; the transceiver 1402 obtains the user's registered voiceprint information, which includes at least one biological voiceprint template obtained by removing noise related to the registration device; and the processor 1401 further matches the target voiceprint information against the registered voiceprint information to obtain a matching result that indicates the user's identity.
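The matching step performed by the processor 1401 can be illustrated with a minimal sketch. The source does not specify the voiceprint representation or the scoring rule, so the following assumes, purely for illustration, that the target voiceprint and each registered template are fixed-length embedding vectors compared by cosine similarity against a hypothetical threshold; `cosine_similarity`, `match_voiceprint`, and `MATCH_THRESHOLD` are illustrative names, not part of the application.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical decision threshold (an assumption); a real system would
# calibrate it on held-out data.
MATCH_THRESHOLD = 0.7


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length voiceprint vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no identity information
    return dot / (norm_a * norm_b)


def match_voiceprint(
    target: Sequence[float],
    templates: List[Sequence[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Tuple[bool, float]:
    """Score the target voiceprint against every registered template.

    Returns (is_match, best_score); the user is accepted when the
    best-scoring template reaches the threshold.
    """
    if not templates:
        return False, 0.0
    best = max(cosine_similarity(target, t) for t in templates)
    return best >= threshold, best
```

For example, `match_voiceprint([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])` accepts the user because the first template scores about 0.99. Taking the best score over all templates reflects that the registered voiceprint information may contain more than one template.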
Optionally, the transceiver 1402 is configured to receive, from a storage device or the registration device, the registered voiceprint information and its voiceprint identifier, where the voiceprint identifier indicates the user. Optionally, the transceiver 1402 is further configured to send the user's voiceprint identifier to the storage device or the registration device to request the user's registered voiceprint information.
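The identifier-based exchange between the recognition device and a storage device (or the registration device) can be sketched as follows. This is a hypothetical in-memory model: the source does not define a message format, so plain method calls stand in for transceiver messages, and `VoiceprintStore`, `register`, and `request` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoiceprintStore:
    """Hypothetical storage device mapping a voiceprint identifier to templates."""

    records: Dict[str, List[List[float]]] = field(default_factory=dict)

    def register(self, voiceprint_id: str, template: List[float]) -> None:
        # A user may have several templates (e.g. from different utterances),
        # so templates are appended under the same identifier.
        self.records.setdefault(voiceprint_id, []).append(list(template))

    def request(self, voiceprint_id: str) -> Optional[List[List[float]]]:
        # Answers a recognition device's request; returns None for an
        # unknown identifier and copies so callers cannot alter stored data.
        templates = self.records.get(voiceprint_id)
        if templates is None:
            return None
        return [list(t) for t in templates]
```

A recognition device would send the user's voiceprint identifier, for example `store.request("user-42")`, and pass the returned templates to its matching step.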
Optionally, the processor 1401 is specifically configured to input the voice into a voiceprint extraction model, which removes noise related to the recognition device from the voice to obtain the target voiceprint information. Optionally, the voiceprint extraction model is obtained by training on corpora collected by multiple devices. Optionally, the processor 1401 is further configured to perform frequency response compensation on the voice through a filter; the compensation removes noise related to the recognition device from the voice to obtain the target voiceprint information. Optionally, the processor 1401 is further configured to obtain user data associated with the user identity and to perform an operation corresponding to the user data.
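The source does not describe the filter design, so the following is only a sketch of one common form of frequency response compensation, under stated assumptions: the device behaves as a linear filter with a known, previously calibrated magnitude response |H(f)| per frequency bin, and compensation is applied to each frame's magnitude spectrum by dividing out that response. The function name and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence


def compensate_frequency_response(
    spectrum: Sequence[float],
    device_response: Sequence[float],
    floor: float = 1e-3,
) -> List[float]:
    """Undo a device's magnitude response on one frame's magnitude spectrum.

    Assumes the microphone acts as a linear filter, |Y(f)| = |H(f)| * |X(f)|,
    so dividing by |H(f)| recovers |X(f)|. The floor avoids dividing by
    near-zero gains, which would hugely amplify bins the device barely passes.
    """
    if len(spectrum) != len(device_response):
        raise ValueError("spectrum and response must have the same number of bins")
    return [y / max(abs(h), floor) for y, h in zip(spectrum, device_response)]
```

With a device response of [0.5, 1.0, 2.0], an observed spectrum [0.5, 2.0, 6.0] is restored to [1.0, 2.0, 3.0]. The floor is a design choice that trades exact inversion for robustness in weakly passed bins.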
In a possible design, the apparatus 1400 is used to perform the functions of the registration device in the foregoing method embodiments: the voice input/output device 1404 receives voice input by the user, and the processor 1401 removes noise related to the registration device from the voice to obtain the user's registered voiceprint information, which specifically includes the user's biological voiceprint.
Optionally, the transceiver 1402 is configured to send the registered voiceprint information and the corresponding voiceprint identifier to a storage device or the recognition device, where the voiceprint identifier indicates the user. Optionally, the processor 1401 is specifically configured to input the voice into a voiceprint extraction model, which removes noise related to the registration device from the voice to obtain the registered voiceprint information. Optionally, the voiceprint extraction model is obtained by training on corpora collected by multiple devices. Optionally, the processor 1401 is further configured to perform frequency response compensation on the voice through a filter; the compensation removes noise related to the registration device from the voice to obtain the registered voiceprint information.
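Why removing device-related effects at both the registration and the recognition ends makes cross-device matching possible can be shown with a classical channel-normalization technique, log-spectral mean normalization (the log-spectral form of cepstral mean normalization). This is a swapped-in illustration, not the application's voiceprint extraction model. It assumes each device acts as a time-invariant linear filter, so in the log domain its response adds the same constant to every frame, and subtracting the per-bin mean over frames removes it. All helper names, including the simulation helper `through_device`, are hypothetical.

```python
import math
from typing import List, Sequence


def through_device(
    frames: Sequence[Sequence[float]], response: Sequence[float]
) -> List[List[float]]:
    """Simulate recording: scale each frame's magnitudes by the device gains."""
    return [[x * h for x, h in zip(frame, response)] for frame in frames]


def log_spectra(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert per-frame magnitude spectra to log-magnitude spectra."""
    return [[math.log(max(v, 1e-12)) for v in frame] for frame in frames]


def mean_normalize(log_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-bin mean over all frames.

    A time-invariant device response adds the same constant log|H(f)| to
    every frame in bin f, so this subtraction removes it exactly.
    """
    if not log_frames:
        return []
    n = len(log_frames)
    bins = len(log_frames[0])
    means = [sum(frame[b] for frame in log_frames) / n for b in range(bins)]
    return [[frame[b] - means[b] for b in range(bins)] for frame in log_frames]
```

Passing the same frames through two devices with different responses and normalizing both yields identical features, which is the property that lets a template registered on one device be compared with speech captured on another. Real microphones and rooms are not perfectly time-invariant linear filters, so in practice this removes only part of the device effect.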
It can be understood that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The solutions described in this application may be implemented in various ways. For example, these techniques may be implemented in hardware, software, or a combination of hardware and software. For a hardware implementation, the processing unit that performs these techniques at a communication apparatus (for example, a base station, a terminal, a network entity, or a chip) may be implemented in one or more general-purpose processors, DSPs, digital signal processing devices, ASICs, programmable logic devices, FPGAs or other programmable logic apparatuses, discrete gate or transistor logic, discrete hardware components, or any combination thereof. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller, or state machine. The processor may also be implemented as a combination of computing apparatuses, for example, a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors together with a digital signal processor core, or any other similar configuration.
It can be understood that the memory in the embodiments of this application may be volatile or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
This application further provides a computer-readable medium storing a computer program that, when executed by a computer, implements the functions of any of the foregoing method embodiments. This application further provides a computer program product that, when executed by a computer, implements the functions of any of the foregoing method embodiments.
It can be understood that references throughout the specification to an "embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of this application. Therefore, occurrences of this term throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It can also be understood that, in the embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
It can be understood that, in this application, "when", "if", and "in case" all mean that the apparatus performs corresponding processing under certain objective circumstances; they do not limit timing, do not require the apparatus to perform a determination action, and do not imply any other limitation. Elements expressed in the singular in this application are intended to mean "one or more" rather than "one and only one", unless otherwise specified. In this application, unless otherwise specified, "at least one" is intended to mean "one or more", and "multiple" is intended to mean "two or more". In addition, the terms "system" and "network" are often used interchangeably herein.
The term "at least one of ..." as used herein means all or any combination of the listed items. For example, "at least one of A, B, and C" may indicate the following seven cases: only A, only B, only C, both A and B, both A and C, both B and C, and all of A, B, and C, where each of A, B, and C may be singular or plural.
It can be understood that, in the embodiments of this application, "B corresponding to A" means that B is associated with A and that B can be determined based on A. However, determining B based on A does not mean that B is determined based only on A; B may also be determined based on A and/or other information. The term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A, both A and B, and only B, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
A person of ordinary skill in the art can understand that, for convenience and brevity of description, for the specific working processes of the foregoing systems, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
It can be understood that the systems, apparatuses, and methods described in this application may also be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
If the functions described in this embodiment are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
For the same or similar parts among the embodiments of this application, reference may be made to each other. Across the embodiments of this application, and across the implementations and methods within each embodiment, unless otherwise specified or logically conflicting, terms and/or descriptions are consistent and may be cross-referenced, and technical features of different embodiments and implementations may be combined according to their inherent logical relationships to form new embodiments, implementations, or methods. The implementations of this application described above do not constitute a limitation on the protection scope of this application.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceived of by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application.

Claims (23)

  1. A voiceprint recognition apparatus in a recognition device, comprising: a processor, and a voice input/output device and a transceiver that are connected to the processor, wherein
    the voice input/output device is configured to receive voice input by a user;
    the processor is configured to remove noise related to the recognition device from the voice to obtain target voiceprint information, wherein the target voiceprint information is a biological voiceprint of the user;
    the transceiver is configured to obtain registered voiceprint information of the user, wherein the registered voiceprint information comprises at least one biological voiceprint template obtained by removing noise related to a registration device; and
    the processor is further configured to match the target voiceprint information against the registered voiceprint information to obtain a matching result, wherein the matching result indicates an identity of the user.
  2. The apparatus according to claim 1, wherein
    the transceiver is configured to receive the registered voiceprint information and a voiceprint identifier of the registered voiceprint information that are sent by a storage device or the registration device, wherein the voiceprint identifier indicates the user.
  3. The apparatus according to claim 2, wherein
    the transceiver is further configured to send the voiceprint identifier of the user to the storage device or the registration device to request the registered voiceprint information of the user.
  4. The apparatus according to any one of claims 1 to 3, wherein
    the processor is specifically configured to input the voice into a voiceprint extraction model and to remove, by using the voiceprint extraction model, noise related to the recognition device from the voice to obtain the target voiceprint information.
  5. The apparatus according to claim 4, wherein the voiceprint extraction model is obtained by training on corpora collected by multiple devices.
  6. The apparatus according to any one of claims 1 to 3, wherein
    the processor is further configured to perform frequency response compensation on the voice through a filter, wherein the frequency response compensation is used to remove noise related to the recognition device from the voice to obtain the target voiceprint information.
  7. The apparatus according to any one of claims 1 to 6, wherein the processor is further configured to:
    obtain user data associated with the identity of the user; and
    perform an operation corresponding to the user data.
  8. A voiceprint registration apparatus in a registration device, comprising: a processor and a voice input/output device connected to the processor, wherein
    the voice input/output device is configured to receive voice input by a user; and
    the processor is configured to remove noise related to the registration device from the voice to obtain registered voiceprint information of the user, wherein the registered voiceprint information comprises a biological voiceprint of the user.
  9. The apparatus according to claim 8, further comprising a transceiver connected to the processor, wherein
    the transceiver is configured to send the registered voiceprint information and a voiceprint identifier corresponding to the registered voiceprint information to a storage device or a recognition device, wherein the voiceprint identifier indicates the user.
  10. The apparatus according to claim 8 or 9, wherein
    the processor is specifically configured to input the voice into a voiceprint extraction model and to remove, by using the voiceprint extraction model, noise related to the registration device from the voice to obtain the registered voiceprint information.
  11. The apparatus according to claim 10, wherein the voiceprint extraction model is obtained by training on corpora collected by multiple devices.
  12. The apparatus according to claim 8 or 9, wherein
    the processor is further configured to perform frequency response compensation on the voice through a filter, wherein the frequency response compensation is used to remove noise related to the registration device from the voice to obtain the registered voiceprint information.
  13. A cross-device voiceprint recognition method, applied to a recognition device, comprising:
    receiving voice input by a user;
    removing noise related to the recognition device from the voice to obtain target voiceprint information, wherein the target voiceprint information is a biological voiceprint of the user;
    obtaining registered voiceprint information of the user, wherein the registered voiceprint information comprises at least one biological voiceprint template obtained by removing noise related to a registration device; and
    matching the target voiceprint information against the registered voiceprint information to obtain a matching result, wherein the matching result indicates an identity of the user.
  14. The method according to claim 13, wherein the obtaining registered voiceprint information of the user comprises:
    receiving the registered voiceprint information and a voiceprint identifier corresponding to the registered voiceprint information that are sent by a storage device or the registration device, wherein the voiceprint identifier indicates the user.
  15. The method according to claim 14, wherein before the receiving the registered voiceprint information and the voiceprint identifier corresponding to the registered voiceprint information that are sent by the storage device or the registration device, the method further comprises:
    sending the voiceprint identifier to the storage device or the registration device to request the registered voiceprint information of the user.
  16. The method according to any one of claims 13 to 15, wherein the removing noise related to the recognition device from the voice to obtain target voiceprint information comprises:
    inputting the voice into a voiceprint extraction model, and removing, by using the voiceprint extraction model, noise related to the recognition device from the voice to obtain the target voiceprint information.
  17. The method according to claim 16, wherein the voiceprint extraction model is obtained by training on corpora collected by multiple devices.
  18. The method according to any one of claims 13 to 17, wherein after the matching the target voiceprint information against the registered voiceprint information to obtain a matching result, the method further comprises:
    obtaining user data associated with the identity of the user; and
    performing an operation corresponding to the user data.
  19. A cross-device voiceprint recognition method, applied to a registration device, comprising:
    receiving voice input by a user; and
    removing noise related to the registration device from the voice to obtain registered voiceprint information, wherein the registered voiceprint information comprises a biological voiceprint of the user.
  20. The method according to claim 19, wherein the method further comprises:
    sending the registered voiceprint information and a voiceprint identifier corresponding to the registered voiceprint information to a storage device or a recognition device.
  21. The method according to claim 19 or 20, wherein the removing noise related to the registration device from the voice to obtain biological voiceprint information comprises:
    inputting the voice into a voiceprint extraction model, and removing, by using the voiceprint extraction model, noise related to the registration device from the voice to obtain the registered voiceprint information.
  22. The method according to claim 21, wherein the voiceprint extraction model is obtained by training on corpora collected by multiple devices.
  23. The method according to claim 19 or 20, wherein the removing noise related to the registration device from the voice to obtain biological voiceprint information comprises:
    performing frequency response compensation on the voice through a filter, wherein the frequency response compensation is used to remove noise related to the registration device from the voice to obtain the registered voiceprint information.
PCT/CN2020/090930 2020-05-19 2020-05-19 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method WO2021232213A1 (en)

Priority Applications (2)

Application Number Publication Priority Date Filing Date Title
CN202080001170.5A CN114026637A (en) 2020-05-19 2020-05-19 Voiceprint recognition and registration device and cross-device voiceprint recognition method
PCT/CN2020/090930 WO2021232213A1 (en) 2020-05-19 2020-05-19 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method

Applications Claiming Priority (1)

Application Number Publication Priority Date Filing Date Title
PCT/CN2020/090930 WO2021232213A1 (en) 2020-05-19 2020-05-19 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method

Publications (1)

Publication Number Publication Date
WO2021232213A1 (en) 2021-11-25

Family ID: 78708944

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
PCT/CN2020/090930 WO2021232213A1 (en) 2020-05-19 2020-05-19 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method

Country Status (2)

Country Link
CN (1) CN114026637A (en)
WO (1) WO2021232213A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114780787A (en) * 2022-04-01 2022-07-22 杭州半云科技有限公司 Voiceprint retrieval method, identity verification method, identity registration method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160042739A1 (en) * 2014-08-07 2016-02-11 Nuance Communications, Inc. Fast speaker recognition scoring using i-vector posteriors and probabilistic linear discriminant analysis
CN106779121A (en) * 2016-12-23 2017-05-31 上海木爷机器人技术有限公司 Electronic bill processing method and system
CN108172230A (en) * 2018-01-03 2018-06-15 平安科技(深圳)有限公司 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model
CN108989349A (en) * 2018-08-31 2018-12-11 平安科技(深圳)有限公司 User account number unlocking method, device, computer equipment and storage medium
CN109360579A (en) * 2018-12-05 2019-02-19 途客电力科技(天津)有限公司 Charging pile phonetic controller and system
CN111161746A (en) * 2019-12-31 2020-05-15 苏州思必驰信息科技有限公司 Voiceprint registration method and system

Also Published As

Publication number Publication date
CN114026637A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
US11386905B2 (en) Information processing method and device, multimedia device and storage medium
EP3525205B1 (en) Electronic device and method of performing function of electronic device
US10805470B2 (en) Voice-controlled audio communication system
CN105118257B (en) Intelligent control system and method
CN104335559B (en) A kind of method of automatic regulating volume, volume adjustment device and electronic equipment
CN103236263B (en) A kind of method, system and mobile terminal improving speech quality
US20150088515A1 (en) Primary speaker identification from audio and video data
US20160087952A1 (en) Scalable authentication process selection based upon sensor inputs
KR101883301B1 (en) Method for Providing Personalized Voice Recognition Service Using Artificial Intellignent Speaker Recognizing Method, and Service Providing Server Used Therein
US10916249B2 (en) Method of processing a speech signal for speaker recognition and electronic apparatus implementing same
US20220172729A1 (en) System and Method For Achieving Interoperability Through The Use of Interconnected Voice Verification System
CN111415673A (en) Customized audio processing based on user-specific and hardware-specific audio information
US20240013789A1 (en) Voice control method and apparatus
CN103426429A (en) Voice control method and voice control device
US20120330663A1 (en) Identity authentication system and method
US20180182393A1 (en) Security enhanced speech recognition method and device
WO2021244471A1 (en) Real-name authentication method and device
US10433081B2 (en) Consumer electronics device adapted for hearing loss compensation
CN109104664A (en) Control method, system, intelligent sound box and the storage medium of intelligent sound box
US20190312864A1 (en) Method and apparatus for establishing association between devices
CN111653284A (en) Interaction and recognition method, device, terminal equipment and computer storage medium
WO2022007846A1 (en) Speech enhancement method, device, system, and storage medium
CN112417923A (en) System, method and apparatus for controlling smart devices
JP2020503610A (en) Systems, methods, and media for utilizing remote data with biometric signature samples

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20936534; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 EP: PCT application non-entry in European phase
    Ref document number: 20936534; Country of ref document: EP; Kind code of ref document: A1