WO2019006587A1 - Speaker recognition system, speaker recognition method, and in-ear device - Google Patents
- Publication number
- WO2019006587A1 (PCT/CN2017/091466)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- ear
- signal
- ear canal
- voiceprint
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
Description
- The present application relates to a speaker recognition system and a speaker recognition method, and more particularly to a speaker recognition system and a speaker recognition method that can avoid covert recording and voice theft.
- Speaker recognition has been widely used in voice security systems and voice authorization systems, and has become an indispensable feature of contemporary technology products.
- Existing speech recognition systems mainly use a microphone outside the human body to pick up sound; what such a microphone receives is the sound wave emitted from the speaker's mouth and propagated through the external air medium, so existing speaker recognition carries the risk of being covertly recorded or pirated by malicious parties.
- In detail, an attacker (Mr. A) can follow a target user (Mr. B) and covertly record or eavesdrop on Mr. B's voice, or even forge Mr. B's voice with speech synthesis, and store the voice in a recorder in advance. When Mr. A later wants to pass the identity verification of a voice access control system or voice authorization system, he can simply play back the stored voice, pass the verification, and impersonate Mr. B, possibly causing Mr. B financial loss or even endangering Mr. B's life and property. The prior art therefore needs improvement.
- It is therefore a primary object of the present application to provide a speaker recognition system and a speaker recognition method that can avoid being covertly recorded or pirated, so as to improve on the shortcomings of the prior art.
- To solve the above technical problem, the present application provides a speaker recognition system including an in-ear device to be placed in an external auditory canal of a user, and a terminal device. The in-ear device includes a sound receiver for receiving an ear canal sound wave from the external auditory canal and generating an ear canal acoustic signal corresponding to the ear canal sound wave, and an audio processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal. The terminal device determines, according to the voiceprint feature signal, whether the user is an authenticated user.
- For example, the in-ear device is a wired or wireless in-ear earphone, an in-ear headset, an earplug, or a hearing aid.
- the audio processing module performs a speech detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
- the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
- the terminal device is a mobile electronic device, a computer host or an access control system.
- The terminal device establishes a voiceprint model corresponding to the authenticated user, receives a voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal against the voiceprint model to generate a similarity signal; the terminal device then determines, according to the similarity signal, whether the user is the authenticated user.
- The audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
- For example, the physiological detection operation is a respiratory detection operation and the physiological detection result is a respiratory detection result.
- As another example, the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
- The present application also provides a speaker recognition method applied to a speaker recognition system. The speaker recognition system includes an in-ear device and a terminal device; the in-ear device includes a sound receiver and an audio processing module and is placed in an external auditory canal of a user. The speaker recognition method includes: the sound receiver receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; the audio processing module extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and the terminal device determining, according to the voiceprint feature signal, whether the speaking end of the speaker recognition system is the user himself or herself, where the speaking end of the speaker recognition system is the person or device that emits sound to the speaker recognition system for voiceprint recognition.
- The present application uses an in-ear device to pick up the ear canal sound waves in the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison, so as to determine whether the speaking end of the speaker recognition system is the user himself or herself.
- Compared with the prior art, the present application avoids the risk of the user's voice being covertly recorded or pirated by malicious parties.
- FIG. 1 is a schematic diagram of the appearance of a speaker recognition system according to an embodiment of the present application.
- FIG. 2 is a functional block diagram of the speaker recognition system of FIG. 1.
- FIG. 3 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of a voiceprint feature extraction process according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of a voiceprint comparison process according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
- FIG. 8 is a schematic functional block diagram of a speaker recognition system according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
- The specific embodiments described below are merely illustrative of the present application and are not intended to limit it.
- When a person breathes through the lungs, the airflow passes through the narrow glottis and sets the vocal cord mucosa vibrating; this vibration excites the surrounding air medium into compression waves, that is, sound waves. These sound waves resonate in organs such as the pharynx, oral cavity, nasal cavity, and sinuses to gain volume, and are then shaped by the lips, teeth, and tongue into the voice heard by an external listener. Existing speaker recognition mainly receives the sound wave emitted from the speaker's mouth and carried through the air outside the body to a microphone outside the body, so for security systems that rely on speaker recognition (such as voice access control systems or voice payment systems), existing speaker recognition carries the risk of covert recording or voice theft by malicious parties.
- However, besides the sound emitted from the mouth into the external air medium, the sound waves generated by the vocal cord mucosa are also transmitted through the Eustachian tube to the internal auditory meatus and even the external auditory meatus, and the sound waves in the external auditory canal (the ear canal sound waves) have acoustic characteristics different from those of sound waves received by a microphone outside the human body. In other words, even for the same speaker, the ear canal sound waves differ in acoustic characteristics from any covertly recorded or stolen sound waves.
- Therefore, the speaker recognition system of the present application picks up sound in the user's external auditory canal, extracts the voiceprint features of the ear canal sound waves, and performs speaker recognition on those voiceprint features, so as to avoid the risk of the user's voice being covertly recorded or pirated.
- FIG. 1 and FIG. 2 are, respectively, an appearance diagram and a functional block diagram of a speaker recognition system 10 according to an embodiment of the present application.
- The speaker recognition system 10 includes an in-ear device (i.e., a canal-type device) 100 and a terminal device 120.
- The terminal device 120 can be a computer host, a mobile electronic device, or an access control system with computing capability. The in-ear device 100 can be placed into an external auditory canal (external acoustic meatus) of a user USR and can be one of an in-ear earphone, an in-ear headset, an earplug, or a hearing aid.
- The in-ear device 100 can include a sound receiver 102, a speaker 104, and an audio processing module 106.
- The sound receiver 102 can be a microphone for receiving an ear canal sound wave CWV from the external auditory canal of the user USR.
- The sound receiver 102 converts the ear canal sound wave CWV into an ear canal acoustic signal CSg; that is, the sound receiver 102 generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
- The audio processing module 106 is coupled to the sound receiver 102 and extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg to generate a voiceprint feature signal VPF, where the voiceprint feature signal VPF carries the voiceprint features of the user USR.
- The in-ear device 100 can transmit the voiceprint feature signal VPF to the terminal device 120 through wired or wireless transmission.
- In general, the terminal device 120 determines, according to the voiceprint feature signal it receives, whether the speaking end is the user USR as an authenticated user, some other person, or even a recorder that has stored the user USR's voice in advance. The speaking end of the speaker recognition system 10 refers to the person or device (such as a recorder or a device with speech synthesis capability) that emits sound to the speaker recognition system 10 for voiceprint recognition.
- In other words, the terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal it receives. In the ideal case, the terminal device 120 receives the voiceprint feature signal VPF generated by the in-ear device 100 and, based on the voiceprint feature signal VPF, determines that the user USR is indeed an authenticated user.
- FIG. 3 is a schematic diagram of a voiceprint identification process 30 according to an embodiment of the present application.
- the voiceprint recognition process 30 can be performed by the speaker recognition system 10, which includes the following steps:
- Step 302 The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
- Step 304 The audio processing module 106 of the in-ear device 100 extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
- Step 306 The terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal VPF.
- For the operational details of step 304, in which the audio processing module 106 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF, refer to FIG. 4.
- FIG. 4 is a schematic diagram of a voiceprint feature extraction process 40, which is performed by the audio processing module 106 of the in-ear device 100.
- As shown in FIG. 4, the audio processing module 106 can perform a voice detection operation, a noise suppression operation, and a feature extraction operation on the ear canal acoustic signal CSg to generate the voiceprint feature signal VPF. The voice detection, noise suppression, and feature extraction operations are not limited to any specific algorithm; their technical details are well known to those skilled in the art and are not repeated here.
- Note that the voice detection, noise suppression, and feature extraction operations in the voiceprint feature extraction process 40 are all performed by the audio processing module 106 disposed in the in-ear device 100; that is, the voiceprint feature signal VPF is generated inside the in-ear device 100. After the audio processing module 106 generates the voiceprint feature signal VPF, it can transmit the voiceprint feature signal VPF to the terminal device 120 by wired or wireless transmission.
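- The application leaves the voice detection, noise suppression, and feature extraction operations open to any algorithm. Purely as an illustration of how the voiceprint feature extraction process 40 could be realised in the audio processing module 106, the following Python sketch assumes an energy-based voice activity detector, a crude spectral-subtraction noise suppressor, and MFCC feature vectors computed with the librosa library; these concrete choices, and the function extract_voiceprint_features itself, are assumptions for illustration and are not part of the application.

```python
import numpy as np
import librosa  # assumed third-party dependency for framing, STFT and MFCCs


def extract_voiceprint_features(csg, sr=16000, frame_len=400, hop=160):
    """Sketch of process 40: voice detection, noise suppression, feature extraction.

    csg: 1-D NumPy array holding the ear canal acoustic signal CSg.
    Returns a (num_frames, 13) matrix of MFCC vectors standing in for the
    voiceprint feature signal VPF.
    """
    # --- Voice detection (assumed: simple energy-based VAD) ---
    frames = librosa.util.frame(csg, frame_length=frame_len, hop_length=hop)
    energy = (frames ** 2).mean(axis=0)
    voiced = energy > 0.1 * energy.max()
    if not voiced.any():
        return np.empty((0, 13))  # no speech detected in CSg
    # Overlapping voiced frames are simply concatenated; adequate for a sketch.
    speech = frames[:, voiced].T.reshape(-1)

    # --- Noise suppression (assumed: crude spectral subtraction) ---
    spec = librosa.stft(speech, n_fft=512, hop_length=hop)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_floor = np.percentile(mag, 10, axis=1, keepdims=True)
    clean = librosa.istft(np.maximum(mag - noise_floor, 0.0) * np.exp(1j * phase),
                          hop_length=hop)

    # --- Feature extraction (assumed: 13 MFCCs per frame) ---
    vpf = librosa.feature.mfcc(y=clean, sr=sr, n_mfcc=13, n_fft=512, hop_length=hop)
    return vpf.T  # shape: (num_frames, 13)
```

- The frame-by-frame feature matrix returned above stands in for the voiceprint feature signal VPF that the in-ear device 100 transmits to the terminal device 120.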
- For the operational details of step 306, in which the terminal device 120 determines according to the voiceprint feature signal VPF whether the user USR is an authenticated user, refer to FIG. 5.
- FIG. 5 is a schematic diagram of a voiceprint comparison process 50, which is performed by the terminal device 120 outside the human body. As shown in FIG. 5, the terminal device 120 may first establish a voiceprint model MD corresponding to the authenticated user according to the voiceprint feature signal VPF. After the voiceprint model MD is established, the terminal device 120 compares the voiceprint feature signal VPF against the voiceprint model MD to perform voiceprint matching, and generates a similarity score SC according to the matching result, where the similarity score SC represents the degree of similarity between the voiceprint feature signal VPF and the voiceprint model MD and can serve as a similarity signal.
- In detail, the terminal device 120 may establish the voiceprint model MD corresponding to the authenticated user at a first time t1; that is, at the first time t1 it receives a first voiceprint feature signal VPF1 generated by the audio processing module 106 for the authenticated user (VPF1 denotes the voiceprint feature signal VPF at the first time t1) and builds the voiceprint model MD from the first voiceprint feature signal VPF1. After the voiceprint model MD is established, the terminal device 120 may, at a second time t2, receive a second voiceprint feature signal VPF2 generated by the audio processing module 106 (VPF2 denotes the voiceprint feature signal VPF at the second time t2), compare the second voiceprint feature signal VPF2 against the voiceprint model MD for voiceprint matching, and generate the similarity score SC according to the matching result.
- After the terminal device 120 generates the similarity score SC, it can determine, according to the similarity score SC, whether the user USR is an authenticated user; this is the "identify identity" step in FIG. 5. In one embodiment, when the similarity score SC is greater than a specific value, the terminal device 120 determines that the user USR is indeed an authenticated user.
- The "build voiceprint model", "voiceprint matching", and "obtain similarity score" steps in FIG. 5 are not limited to any specific algorithm; their technical details are well known to those skilled in the art and are not repeated here.
- In short, the speaker recognition system 10 uses the in-ear device 100 to receive the ear canal sound wave CWV, uses the audio processing module 106 to extract the voiceprint feature corresponding to the user USR, and uses the terminal device 120 to determine, according to the voiceprint feature signal VPF, whether the user USR is an authenticated user.
- Existing speaker recognition systems all pick up sound with a microphone outside the human body and therefore risk being covertly recorded or pirated; a malicious party can even use speech synthesis to produce a voice similar to the user USR's voiceprint and thereby defeat a security system that relies on speaker recognition (such as a voice access control system, hereinafter a voice security system) or a voice authorization system (a system that uses speaker recognition to confirm the speaker's identity before authorizing the next action, such as a voice payment system, voice transfer transaction system, voice credit card transaction system, or voice login system).
- In contrast, the speaker recognition system 10 picks up sound in the external auditory canal of the user USR and performs voiceprint recognition on the voiceprint features of the ear canal sound wave CWV. Because the ear canal sound wave has acoustic characteristics different from those of sound waves received by a microphone outside the body, a malicious party cannot defeat a voice security system equipped with the speaker recognition system 10 through covert recording, voice theft, or speech synthesis, which further improves the security of the voice security system or voice authorization system.
- Furthermore, when a person breathes through the lungs, breathing still produces respiratory sound waves (with a specific respiratory frequency) in the external auditory canal, and these respiratory sound waves are contained in the ear canal sound wave CWV. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains respiratory sound waves, that is, perform a physiological detection operation on the ear canal acoustic signal CSg, to confirm that the speaking end of the speaker recognition system 10 is a natural person with physiological characteristics rather than a device such as a recorder or speech synthesizer. The physiological detection operation can be a respiration detection operation or even a heart rate detection operation.
- FIG. 6 is a schematic diagram of a voiceprint identification process 60 according to an embodiment of the present application.
- the voiceprint recognition process 60 can be performed by the speaker recognition system 10, which includes the following steps:
- Step 602 The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
- Step 603 The audio processing module 106 of the in-ear device 100 performs a physiological detection operation on the ear canal acoustic signal CSg to generate a physiological detection result Bio.
- Step 604 The audio processing module 106 of the in-ear device 100 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
- Step 606 The terminal device 120 determines, according to the voiceprint feature signal VPF and the physiological detection result Bio, whether the user USR is the authenticated user himself or herself.
- The voiceprint recognition process 60 is similar to the voiceprint recognition process 30; the difference is that the voiceprint recognition process 60 further includes step 603.
- In step 603, the audio processing module 106 is not limited to any specific algorithm for performing the respiration detection operation on the ear canal acoustic signal CSg. For example, the audio processing module 106 can detect, from the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains respiratory sound waves at a specific respiratory frequency, but the detection is not limited to this approach. The technical details of respiration detection are well known to those skilled in the art and are not repeated here.
- Taking a respiration detection result as an example of the physiological detection result Bio, the result Bio can be a binary value indicating "breathing detected" or "no breathing detected"; when Bio indicates "breathing detected", the speaking end of the speaker recognition system 10 is a natural person. Alternatively, the physiological detection result Bio can be a non-binary value such as a gray-level value, representing the confidence level that breathing is (or is not) detected, or representing the specific respiratory frequency and characteristics of the user USR.
- The terminal device 120 determines, according to the voiceprint feature signal VPF and the physiological detection result Bio, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In one embodiment, the terminal device 120 determines that the user USR is indeed an authenticated user when the physiological detection result Bio indicates "breathing detected" and the similarity score SC is greater than a specific value.
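- The application does not tie the respiration detection of step 603 to any particular algorithm. As one illustrative sketch, respiration can be looked for as slow periodic bumps in the energy envelope of the band-passed ear canal signal; the filter band, envelope rate, breathing-rate limits, and the decision rule below combining Bio with the similarity score SC are all assumed values rather than the application's prescribed method.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks  # assumed dependency


def detect_breathing(csg, sr=16000, min_bpm=6, max_bpm=30):
    """Sketch of step 603: does the ear canal signal CSg contain respiratory
    sound waves at a plausible respiratory frequency?

    Returns a binary physiological detection result Bio.
    """
    # Keep an assumed 50-2000 Hz band in which breathing noise carries energy.
    sos = butter(4, [50, 2000], btype="bandpass", fs=sr, output="sos")
    band = sosfiltfilt(sos, csg)

    # Energy envelope, downsampled to roughly 50 samples per second.
    hop = sr // 50
    envelope = np.array([np.sum(band[i:i + hop] ** 2)
                         for i in range(0, len(band) - hop, hop)])

    # Breathing shows up as slow periodic peaks in the envelope.
    peaks, _ = find_peaks(envelope,
                          distance=50 * 60 // max_bpm,
                          prominence=0.2 * envelope.max())
    duration_min = len(envelope) / 50 / 60
    bpm = len(peaks) / duration_min if duration_min > 0 else 0.0
    return min_bpm <= bpm <= max_bpm  # Bio: True means "breathing detected"


def decide(bio, sc, threshold=0.8):
    """Step 606 (one embodiment): authenticated only when breathing is detected
    and the similarity score SC exceeds a specific value."""
    return bio and sc > threshold
```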
- In addition, a voice security system or voice authorization system usually involves a question-and-answer dialogue.
- For example, the customer service end (the bank, credit card center, or payment system center, hereinafter the customer service end) may ask in a voice call, "May I have your account number?", and the user may answer, "123456789". The question from the customer service end can be played into the external auditory canal of the user USR through the speaker 104, so the ear canal sound wave CWV can contain the question sound wave from the customer service end.
- The audio processing module 106 in the in-ear device 100 can determine, from the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the question sound wave, so as to generate a reflected wave detection result.
- When the reflected wave detection result shows that the ear canal sound wave CWV does contain a reflected sound wave corresponding to the question sound wave, the in-ear device 100 is indeed placed in a human external auditory canal, so the speaking end of the speaker recognition system 10 is a natural person rather than a device such as a recorder or speech synthesizer; this rules out the possibility that the speaking end of the speaker recognition system 10 is a device.
- More broadly, the question sound wave can be regarded as a prompt sound wave, and after the prompt sound wave ends the user USR can start to speak. For example, the customer service end may say in the voice call, "After you hear the buzzer, please read out your account number/password" (the prompt statement), and the prompt sound wave may include the sound wave of the prompt statement and/or the buzzer sound.
- FIG. 7 is a schematic diagram of a voiceprint identification process 70 according to an embodiment of the present application.
- the voiceprint recognition process 70 can be performed by the speaker recognition system 10, which includes the following steps:
- Step 701 The speaker 104 plays a prompt sound wave into the external auditory canal of the user USR.
- Step 702 The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
- Step 703 The audio processing module 106 of the in-ear device 100 determines whether there is a reflected sound wave corresponding to the prompt sound wave in the ear canal sound wave CWV according to the ear canal sound signal CSg to generate a reflected wave detection result Rf.
- Step 704 The audio processing module 106 of the in-ear device 100 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
- Step 706 The terminal device 120 determines whether the user USR is an authenticated user based on the voiceprint characteristic signal VPF and the reflected wave detection result Rf.
- The voiceprint recognition process 70 is similar to the voiceprint recognition process 30; the difference is that the voiceprint recognition process 70 further includes steps 701 and 703.
- In step 703, the audio processing module 106 is not limited to any specific algorithm for determining whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave. For example, since the human external auditory canal has a limited range of ear canal lengths, the audio processing module 106 can determine, according to that ear canal length range, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave. The technical details of detecting reflected waves in the ear canal, like those of the physiological detection operations (such as respiration or heart rate detection), are well known to those skilled in the art and are not repeated here.
- The reflected wave detection result Rf can be a binary value representing "reflected wave present" or "no reflected wave"; when Rf indicates "reflected wave present", the speaking end of the speaker recognition system 10 is a natural person.
- The terminal device 120 determines, according to the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In one embodiment, the terminal device 120 determines that the user USR is indeed an authenticated user when the reflected wave detection result Rf indicates "reflected wave present" and the similarity score SC is greater than a specific value.
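- As an illustration of step 703, one way to exploit the ear canal length range mentioned above is to cross-correlate the captured ear canal signal with the prompt sound wave and look for a correlation peak at a round-trip delay consistent with typical ear canal lengths. The sketch below does exactly that; the length range, speed of sound, sample rate, and correlation threshold are assumptions, and at ordinary audio sample rates the expected delay spans only a few samples, so this is a simplification rather than the application's prescribed method.

```python
import numpy as np

SPEED_OF_SOUND = 343.0             # m/s in air (assumed)
CANAL_LEN_RANGE = (0.015, 0.035)   # assumed plausible ear canal lengths in metres


def detect_reflection(csg, prompt, sr=48000):
    """Sketch of step 703: does CSg contain a reflection of the prompt sound wave
    within the delay window implied by the ear canal length range?

    csg    : captured ear canal acoustic signal (includes the played prompt).
    prompt : the prompt sound wave samples played by the speaker 104.
    Returns the binary reflected wave detection result Rf.
    """
    # Round-trip delays, in samples, corresponding to the ear canal length range.
    lo = int(2 * CANAL_LEN_RANGE[0] / SPEED_OF_SOUND * sr)
    hi = int(2 * CANAL_LEN_RANGE[1] / SPEED_OF_SOUND * sr) + 1

    # Cross-correlation at non-negative lags; lag 0 is the direct path.
    xcorr = np.correlate(csg, prompt, mode="full")[len(prompt) - 1:]
    xcorr = xcorr / (np.linalg.norm(csg) * np.linalg.norm(prompt) + 1e-9)

    # A strong correlation peak inside the expected delay window suggests a reflection.
    window = xcorr[lo:hi]
    return bool(window.size and window.max() > 0.1)  # Rf
```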
- In addition, the speaker recognition system of the present application can use a personal electronic device such as a smart phone to apply a voice-changing operation to the voiceprint feature signal VPF generated by the in-ear device 100, and the terminal device 120 then performs speaker recognition based on the voice-changed voiceprint feature signal; that is, it determines according to the voice-changed voiceprint feature signal whether the speaking end of the speaker recognition system is the user USR.
- In this way, the user USR can pass the terminal device 120's speaker recognition only while holding the personal electronic device, which further increases the security of the voice security system or voice authorization system.
- FIG. 8 is a functional block diagram of a speaker recognition system 80 according to an embodiment of the present application.
- the speaker recognition system 80 is similar to the speaker recognition system 10.
- the speaker recognition system 80 further includes a personal electronic device 800, which may be a smart wearable device, a smart phone, or a tablet computer.
- The personal electronic device 800 receives the voiceprint feature signal VPF generated by the in-ear device 100, performs a voice-changing operation on the voiceprint feature signal VPF to generate a voice-changed voiceprint feature signal VPF', and transmits the voice-changed voiceprint feature signal VPF' to the terminal device 120; the terminal device 120 performs speaker recognition based on the voice-changed voiceprint feature signal VPF'.
- FIG. 9 is a schematic diagram of a voiceprint identification process 90 according to an embodiment of the present application.
- the voiceprint recognition process 90 can be performed by the speaker recognition system 80, which includes the following steps:
- Step 902 The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
- Step 904 The audio processing module 106 of the in-ear device 100 extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
- Step 905 The personal electronic device 800 performs a voice-changing operation on the voiceprint feature signal VPF to generate a voice-changing voiceprint feature signal VPF'.
- Step 906 The terminal device 120 determines whether the user USR is an authenticated user based on the voice-changed voiceprint feature signal VPF'.
- The voiceprint recognition process 90 is similar to the voiceprint recognition process 30; the difference is that the voiceprint recognition process 90 further includes step 905. In step 905, the personal electronic device 800 is not limited to any specific algorithm for performing the voice-changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF'; the operation effectively encrypts the voiceprint feature signal VPF, and its technical details are well known to those skilled in the art and are not repeated here.
- In addition, the terminal device 120 may first establish, according to the voice-changed voiceprint feature signal VPF', a voiceprint model MD' corresponding to the user USR and the personal electronic device 800. After establishing the voiceprint model MD', the terminal device 120 compares the voice-changed voiceprint feature signal VPF' against the voiceprint model MD' for voiceprint matching and generates a similarity score SC' according to the matching result, where SC' represents the degree of similarity between the voice-changed voiceprint feature signal VPF' and the voiceprint model MD'. For the remaining operational details, refer to the related paragraphs above; they are not repeated here.
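- The voice-changing operation of step 905 is likewise left open by the application. One hypothetical realisation, sketched below, multiplies each feature frame by a fixed random orthogonal matrix derived from a seed held only on the personal electronic device 800, so that the voiceprint model MD' built by the terminal device 120 is valid only for the combination of the user USR and that particular device; the key construction and the function names are illustrative assumptions, not the application's algorithm.

```python
import numpy as np


def make_device_key(seed, dim=13):
    """A per-device 'voice-changing' key: a fixed random orthogonal matrix
    derived from a seed stored on the personal electronic device 800."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def voice_change(vpf, key):
    """Step 905: transform VPF into the voice-changed feature signal VPF'.

    Applying the device key to every feature frame effectively encrypts the
    voiceprint feature signal, as described above."""
    return vpf @ key.T  # VPF'


# Usage sketch:
# key = make_device_key(seed=0xC0FFEE)                        # kept on the personal device
# vpf_prime = voice_change(vpf, key)                           # sent to the terminal device 120
# md_prime = build_voiceprint_model(voice_change(vpf1, key))   # enrollment of MD'
```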
- It should be noted that the terminal device 120 is not limited to being a computer host; any electronic device that can perform the voiceprint comparison process 50 shown in FIG. 5, such as a cloud server or even a mobile electronic device (a mobile phone, a tablet computer, and so on), meets the requirements of this application and falls within its scope.
- Furthermore, the audio processing module is not limited to being disposed in the in-ear device; it may also be disposed in the terminal device. In that case the in-ear device only needs to send the ear canal acoustic signal to the terminal device, and the audio processing module in the terminal device extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal, which also meets the requirements of the present application and falls within its scope.
- In summary, the speaker recognition system of the present application uses an in-ear device to pick up the ear canal sound waves in the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison based on the voiceprint feature signal, so as to determine whether the speaking end of the speaker recognition system is the user himself or herself.
- Compared with the prior art, the present application avoids the risk of the user's voice being covertly recorded or pirated by malicious parties.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Headphones And Earphones (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
A speaker recognition system (10) comprises an in-ear device (100) for positioning in an ear canal of a user; and a terminal device (120). The in-ear device (100) comprises: a sound receiver (102) used to receive an ear-canal acoustic wave from the ear canal so as to generate an ear-canal acoustic signal corresponding to the ear-canal acoustic wave; and an audio processing module (106) connected to the sound receiver (102) and used to extract, from the ear-canal acoustic signal, a voiceprint feature corresponding to the user so as to generate a voiceprint feature signal. The terminal device (120) is used to determine, according to the voiceprint feature signal, whether the user is an authenticated user.
Description
本申请涉及一种说话者识别系统及说话者识别方法,尤其涉及一种可避免被侧录或盗录的说话者识别系统及说话者识别方法。The present application relates to a speaker recognition system and a speaker recognition method, and more particularly to a speaker recognition system and a speaker recognition method that can avoid being recorded or pirated.
说话者识别已被广泛的运用在语音安全系统或语音授权系统上,已成为当代科技产品中不可或缺的一项功能之一。现有语音辨识系统主要是利用人体以外麦克风来进行收音,其所收到的声音为人体经由口腔发送出来并经过外在空气介质传导的声波,而现有说话者识别存有遭到有心人士侧录或盗录的风险。详细来说,某甲君可跟踪某乙君并侧录乙君的说话声音,或窃听乙君的说话声音,甚至利用语音合成的技术伪造乙君的说话声音,并将乙君的声音事先储存于录音机中,当甲君欲通过某语音门禁系统或语音授权系统的身份验证时,甲君可用录音机播放乙君的声音而通过份验证,进而盗用乙君的身份,可能造成乙君的财务损失,甚至危害乙君生命财产安全。因此,现有技术实有改进的必要。Speaker recognition has been widely used in voice security systems or voice authorization systems, and has become an indispensable feature in contemporary technology products. The existing speech recognition system mainly uses the microphone outside the human body to collect the sound, and the sound received by the human body is the sound wave transmitted by the human body through the oral cavity and transmitted through the external air medium, and the existing speaker identification is affected by the person concerned. The risk of recording or pirating. In detail, a certain prince can track a certain prince and record the voice of the prince, or eavesdrop on the voice of the prince, and even use the technique of speech synthesis to forge the voice of the singer and store the voice of the prince in advance. In the recorder, when A Jun wants to pass the identity verification of a voice access control system or a voice authorization system, A can use the recorder to play the voice of the king and pass the verification, and then steal the identity of the king, which may cause the financial loss of the king. And even endanger the safety of life and property of B. Therefore, there is a need for improvement in the prior art.
发明内容Summary of the invention
因此,本申请的主要目的即在于提供一种说话者识别系统及说话者识别方法,其可避免被侧录或盗录,以改善现有技术的缺点。Accordingly, it is a primary object of the present application to provide a speaker recognition system and a speaker identification method that can avoid being recorded or pirated to improve the disadvantages of the prior art.
为了解决上述技术问题,本申请提供了一种一种说话者识别系统,包括入耳式装置,用于置入使用者的外耳道,所述入耳式装置包括收音器,用来接收来自所述外耳道的耳道声波,以产生对应于所述耳道声波的一耳道声信号;声
频处理模块,耦接于所述收音器,用于自所述耳道声信号中撷取对应于所述使用者的声纹特征,以产生一声纹特征信号;以及一终端装置,用于根据所述声纹特征信号判断所述使用者是否为一认证用户。In order to solve the above technical problem, the present application provides a speaker recognition system including an in-ear device for inserting an external auditory canal of a user, the in-ear device comprising a sound receiver for receiving from the external auditory canal. An ear canal sound wave to generate an ear canal acoustic signal corresponding to the ear canal sound wave;
a frequency processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal sound signal to generate a voiceprint feature signal; and a terminal device for The voiceprint feature signal determines whether the user is an authenticated user.
例如,所述入耳式装置为一有线或无线的入耳式耳机、入耳式耳机麦克风、耳塞或助听器。For example, the in-ear device is a wired or wireless in-ear earphone, an in-ear earphone microphone, an earplug or a hearing aid.
例如,所述声频处理模块对所述耳道声信号进行一语音检测运算以及一特征提取运算,以产生所述声纹特征信号。For example, the audio processing module performs a speech detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
例如,所述声频处理模块对所述耳道声信号进行一噪声抑制运算。For example, the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
例如,所述终端装置为移动电子装置、计算机主机或门禁系统。For example, the terminal device is a mobile electronic device, a computer host or an access control system.
例如,所述终端装置建立对应于所述认证用户的一声纹模型,并接收来自所述声频处理模块的一声纹特征信号,根据所述声纹模型比对所述声纹特征信号,以产生一相似度信号,所述终端装置根据所述相似度信号判断所述使用者是否为所述认证用户。For example, the terminal device establishes a voiceprint model corresponding to the authenticated user, and receives a voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal according to the voiceprint model to generate a The similarity signal, the terminal device determines, according to the similarity signal, whether the user is the authenticated user.
例如,所述声频处理模块对所述耳道声信号进行一生理检测运算,以产生一生理检测结果,所述终端装置根据所述声纹特征信号以及所述生理检测结果,判断所述使用者是否为所述认证用户。For example, the audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines the user according to the voiceprint characteristic signal and the physiological detection result. Whether it is the authenticated user.
例如,所述生理检测运算为一呼吸检测运算,所述生理检测结果为一呼吸检测结果。For example, the physiological detection operation is a respiratory detection operation, and the physiological detection result is a respiratory detection result.
例如,所述生理检测运算为一心率检测运算,所述生理检测结果为一心率检测结果。For example, the physiological detection operation is a heart rate detection operation, and the physiological detection result is a heart rate detection result.
本申请还提供了一种说话者识别方法,应用于一说话者识别系统,所述说话者识别系统包括一入耳式装置及一终端装置,所述入耳式装置包括一收音器及一声频处理模块,所述入耳式装置置入一使用者的一外耳道,其特征在于,所述说话者识别方法包括所述收音器接收来自所述外耳道的一耳道声波,以产生对应于所述耳道声波的一耳道声信号;所述声频处理模块自所述耳道声信号中撷取对应于所述使用者的声纹特征,以产生一声纹特征信号;以及所述终端
装置根据所述声纹特征信号,判断所述说话者识别系统的一发话端是否为所述使用者本身;其中,所述说话者识别系统的所述发话端为对所述说话者识别系统发出声音以进行声纹辨识的人或装置。The present application also provides a speaker recognition method, which is applied to a speaker recognition system, the speaker recognition system includes an in-ear device and a terminal device, and the in-ear device includes a microphone and an audio processing module. The in-ear device is placed in an external auditory canal of a user, wherein the speaker identification method includes the microphone receiving an ear canal sound wave from the external auditory canal to generate an acoustic wave corresponding to the ear canal An ear canal acoustic signal; the audio processing module extracts a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and the terminal
Determining, according to the voiceprint characteristic signal, whether an utterance of the speaker recognition system is the user itself; wherein the utterance of the speaker recognition system is for the speaker recognition system A person or device that performs sound recognition.
本申请利用入耳式装置来进行收音,以接收使用者外耳到的耳道声波,利用入耳式装置中的声频处理模块撷取使用者的声纹特征,并利用终端装置进行声纹对比,以判断说话者识别系统的发话端是否为使用者本身。相较于现有技术,本申请可避免遭到有心人士侧录或盗录的风险。The present application utilizes an in-ear device for receiving sound to receive the ear canal sound waves from the user's external ear, and uses the audio processing module in the in-ear device to capture the voiceprint characteristics of the user, and uses the terminal device to perform voiceprint comparison to determine Whether the speaker of the speaker recognition system is the user itself. Compared with the prior art, the present application can avoid the risk of being recorded or pirated by a person with a heart.
图1为本申请实施例一说话者识别系统的外观示意图。FIG. 1 is a schematic diagram of the appearance of a speaker recognition system according to an embodiment of the present application.
图2为图1的说话者识别系统的功能方块示意图。2 is a functional block diagram of the speaker recognition system of FIG. 1.
图3为本申请实施例一声纹辨识流程的示意图。FIG. 3 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
图4为本申请实施例一声纹特征撷取流程的示意图。FIG. 4 is a schematic diagram of a voiceprint feature extraction process according to an embodiment of the present application.
图5为本申请实施例一声纹对比流程的示意图。FIG. 5 is a schematic diagram of a voiceprint comparison process according to an embodiment of the present application.
图6为本申请实施例一声纹辨识流程的示意图。FIG. 6 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
图7为本申请实施例一声纹辨识流程的示意图。FIG. 7 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
图8为本申请实施例一说话者识别系统的功能方块示意图。FIG. 8 is a schematic functional block diagram of a speaker recognition system according to an embodiment of the present application.
图9为本申请实施例一声纹辨识流程的示意图。FIG. 9 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.
人体发声流程为透过肺部呼吸时,气流通过狭窄的声门,声带黏膜会产生波动,此波动会使附近的空气介质振动,而形成疏密波,即为声波。这些声波会在咽、口腔、鼻腔及鼻窦等器官产生共鸣或共振而具有较大音量,再由嘴唇、
牙齿及舌头等器官修正成为外在听者听到的声音。现有说话者识别主要是接收由发话者口腔发送出来的声波,并藉由人体外在空气介质将声波传至人体以外的麦克风,而对于需经说话者识别的安全系统(如语音门禁系统、语音支付系统)来说,现有说话者识别存在有遭到有心人士侧录或盗录的风险。When the human body vocalization process is through the lungs, the airflow passes through the narrow glottis, and the vocal cord mucosa will fluctuate. This fluctuation will cause the nearby air medium to vibrate, and form a sparse wave, which is a sound wave. These sound waves will resonate or resonate in organs such as the pharynx, mouth, nose, and sinuses, and have a louder volume.
Organs such as teeth and tongue are corrected to be heard by external listeners. The existing speaker recognition mainly receives the sound waves sent by the speaker's mouth, and transmits the sound waves to the microphone outside the human body through the air medium outside the human body, and for the security system that needs to be recognized by the speaker (such as the voice access control system, In the case of voice payment systems, there is a risk that existing speaker recognition has been recorded or stolen by interested people.
然而,人体除了由口腔发出至外在空气介质的声音之外,声带黏膜所产生的声波亦会透过耳咽管传递至内耳道(Internal Auditory Meatus)甚至是外耳道(External Auditory Meatus),而于外耳道的声波(或称耳道声波)与利用人体以外麦克风所接收的声波具有不同的声音特征,换句话说,即使发话者为同一人,其耳道声波与侧录或盗录到的声波具有不同的声音特征。因此,本申请的说话者识别系统于使用者的外耳道进行收音,并撷取耳道声波的声纹特征,并针对耳道声波的声纹特征进行说话者识别,以避免使用者的声音遭到侧录或盗录的风险。However, in addition to the sound from the oral cavity to the external air medium, the sound waves generated by the vocal cord mucosa are transmitted through the Eustachian tube to the Internal Auditory Meatus or even the External Auditory Meatus, and to the external auditory canal. Sound waves (or ear canal sound waves) have different sound characteristics than sound waves received by microphones other than the human body. In other words, even if the speaker is the same person, the ear canal sound waves are different from the side-recorded or stolen sound waves. Sound characteristics. Therefore, the speaker recognition system of the present application performs sound collection on the external auditory canal of the user, and captures the voiceprint features of the ear canal sound waves, and performs speaker recognition on the voiceprint features of the ear canal sound waves to prevent the user's voice from being affected. The risk of side recording or piracy.
具体来说,请参考图1及图2,图1及图2分别为本申请实施例一说话者识别系统10的外观示意图及功能方块示意图。说话者识别系统10包括一入耳式(In-Ear)装置(即耳道式装置(Canal-Type Device))100以及一终端装置120,终端装置120可为具运算功能的一计算机主机、移动电子装置或门禁系统,入耳式装置100可置入一使用者USR的一外耳道(Canal,即External Acoustic Meatus),其可为一入耳式耳机(Earphone)、一入耳式耳机麦克风(Headset)、一耳塞(Earplug)或一助听器(Hearing Aid)其中之一。入耳式装置100可包含一收音器102、一扬声器(Speaker)104以及一声频处理模块106,收音器102可为一麦克风(Microphone),用来接收来自使用者USR外耳道的一耳道声波CWV,并将耳道声波CWV转换成为一耳道声信号CSg,即收音器102可产生对应于耳道声波CWV的耳道声信号CSg。声频处理模块106耦接于收音器102,用来自耳道声信号CSg中撷取对应于使用者USR的声纹特征(Voiceprint Feature),以产生一声纹特征信号VPF,其中声纹特征信号VPF包括使用者
USR的声纹特征。入耳式装置100可透过有线传输或是无线传输,将声纹特征信号VPF传送至终端装置120。Specifically, please refer to FIG. 1 and FIG. 2 . FIG. 1 and FIG. 2 are schematic diagrams showing the appearance and functional blocks of the speaker recognition system 10 according to the embodiment of the present application. The speaker recognition system 10 includes an In-Ear device (ie, a Canal-Type Device) 100 and a terminal device 120. The terminal device 120 can be a computer host with computing functions and mobile electronics. The device or the access control system, the in-ear device 100 can be placed into an external ear canal (Canal, ie, External Acoustic Meatus) of the user USR, which can be an earphone, an earphone, and an earplug. (Earplug) or one of the hearing aids (Hearing Aid). The in-ear device 100 can include a radio 102, a speaker 104, and an audio processing module 106. The radio 102 can be a microphone for receiving an ear canal CWV from the external ear canal of the user USR. The ear canal acoustic wave CWV is converted into an ear canal acoustic signal CSg, that is, the radio receiver 102 can generate an ear canal acoustic signal CSg corresponding to the ear canal acoustic wave CWV. The audio processing module 106 is coupled to the radio receiver 102, and extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg to generate a voiceprint feature signal VPF, wherein the voiceprint feature signal VPF includes User
Voiceprint features of the USR. The in-ear device 100 can transmit the voiceprint feature signal VPF to the terminal device 120 through wired transmission or wireless transmission.
一般来说,终端装置120可根据其所接收到的声纹特征信号,判断使用者USR是否为一认证用户或其他人,甚至是事先录有使用者USR声音的录音机,其中,说话者识别系统10的发话端是指对说话者识别系统10发出声音以进行声纹辨识的人或装置(如录音机或具有语音合成功能的装置)。换句话说,终端装置120可根据其所接收到的声纹特征信号,判断使用者USR是否为认证用户。在理想的情况下,终端装置120接收到入耳式装置100所产生的声纹特征信号VPF,并根据声纹特征信号VPF判断使用者USR确实为认证用户。Generally, the terminal device 120 can determine whether the user USR is an authenticated user or other person according to the voiceprint characteristic signal received by the user, or even a recorder that records the USR sound of the user in advance, wherein the speaker recognition system The utterance of 10 refers to a person or device (such as a recorder or a device having a speech synthesis function) that emits a sound to the speaker recognition system 10 for voiceprint recognition. In other words, the terminal device 120 can determine whether the user USR is an authenticated user according to the voiceprint feature signal it receives. In an ideal case, the terminal device 120 receives the voiceprint feature signal VPF generated by the in-ear device 100, and determines that the user USR is indeed an authenticated user based on the voiceprint feature signal VPF.
说话者识别系统10的操作可归纳为一声纹辨识流程。请参考图3,图3为本申请实施例一声纹辨识流程30的示意图。声纹辨识流程30可由说话者识别系统10来执行,其包含以下步骤:The operation of the speaker recognition system 10 can be summarized as a voiceprint recognition process. Please refer to FIG. 3. FIG. 3 is a schematic diagram of a voiceprint identification process 30 according to an embodiment of the present application. The voiceprint recognition process 30 can be performed by the speaker recognition system 10, which includes the following steps:
步骤302:入耳式装置100的收音器102自使用者USR外耳道接收耳道声波CWV,并产生对应于耳道声波CWV的耳道声信号CSg。Step 302: The tuner 102 of the in-ear device 100 receives the ear canal acoustic wave CWV from the external ear canal of the user USR, and generates an ear canal acoustic signal CSg corresponding to the ear canal acoustic wave CWV.
步骤304:入耳式装置100的声频处理模块106自耳道声信号CSg中撷取对应于使用者USR的声纹特征,并产生声纹特征信号VPF。Step 304: The audio processing module 106 of the in-ear device 100 extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
步骤306:终端装置120根据声纹特征信号VPF,判断使用者USR是否为一认证用户。Step 306: The terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal VPF.
于步骤304,声频处理模块106自耳道声信号CSg中撷取对应于使用者USR的声纹特征并产生声纹特征信号VPF的操作细节,可参考图4,图4为一声纹特征撷取流程40的示意图,声纹特征撷取流程40是由入耳式装置100的声频处理模块106来执行。由图4可知,声频处理模块106可对耳道声信号CSg进行一语音检测(Voice Detection)运算、一噪声抑制(Noise Suppression)运算以及一特征提取(Feature Extraction)运算,即可产生声纹特征信号VPF,其中,语音检测运算、噪声抑制运算以及特征提取运算不限于利用特定算法来实现,其技术细节为本领域技术人员所熟知,故于此不再赘述。需注意的是,声纹特
征撷取流程40中的语音检测运算、噪声抑制运算以及特征提取运算皆由设置于入耳式装置100中的声频处理模块106来执行,即由设置于入耳式装置100中的声频处理模块106产生声纹特征信号VPF。声频处理模块106产生声纹特征信号VPF后,可将声纹特征信号VPF利用有线传输或是无线传输的方式传送至终端装置120。In step 304, the audio processing module 106 extracts the operation details corresponding to the voiceprint feature of the user USR and generates the voiceprint feature signal VPF from the ear canal acoustic signal CSg. Referring to FIG. 4, FIG. 4 is a voiceprint feature capture. The schematic of the flow 40, the voiceprint feature extraction process 40 is performed by the audio processing module 106 of the in-ear device 100. As can be seen from FIG. 4, the audio processing module 106 can perform a voice detection operation, a noise suppression (Noise Suppression) operation, and a feature extraction operation on the ear canal acoustic signal CSg to generate voiceprint features. The signal VPF, wherein the speech detection operation, the noise suppression operation, and the feature extraction operation are not limited to being implemented by using a specific algorithm, and the technical details thereof are well known to those skilled in the art, and thus will not be described herein. It should be noted that the sound pattern
The speech detection operation, the noise suppression operation, and the feature extraction operation in the enrollment process 40 are all performed by the audio processing module 106 disposed in the in-ear device 100, that is, generated by the audio processing module 106 disposed in the in-ear device 100. Voiceprint feature signal VPF. After the audio processing module 106 generates the voiceprint feature signal VPF, the voiceprint feature signal VPF can be transmitted to the terminal device 120 by wire transmission or wireless transmission.
于步骤306,终端装置120根据声纹特征信号VPF,判断使用者USR是否为认证用户的操作细节,请参考图5,图5为一声纹对比流程50的示意图,声纹对比流程50是由人体以外的终端装置120来执行。由图5可知,终端装置120可先根据声纹特征信号VPF建立对应于认证用户的一声纹模型MD,于建立声纹模型MD后,再对比声纹特征信号VPF与声纹模型MD,以进行「声纹匹配」,并根据声纹匹配结果,产生一相似度得分SC(Score),其中,相似度得分SC代表声纹特征信号VPF与声纹模型MD之间的相似程度,其可为一种相似度信号。详细来说,终端装置120可于一第一时间t1建立对应于认证用户声纹模型MD(或于第一时间t1接收声频处理模块106所产生对应于认证用户的一第一声纹特征信号VPF1,并根据第一声纹特征信号VPF1建立对应于认证用户的声纹模型MD,第一声纹特征信号VPF1代表于第一时间t1的声纹特征信号VPF),于建立声纹模型MD后,终端装置120可于一第二时间t2接收声频处理模块106所产生的一第二声纹特征信号VPF2(其代表于第二时间t2的声纹特征信号VPF),终端装置120可比对第二声纹特征信号VPF2与声纹模型MD以进行声纹匹配,并根据声纹匹配结果,产生相似度得分SC。In step 306, the terminal device 120 determines, according to the voiceprint characteristic signal VPF, whether the user USR is an operation detail of the authenticated user. Referring to FIG. 5, FIG. 5 is a schematic diagram of a voiceprint comparison process 50, and the voiceprint comparison process 50 is performed by the human body. It is executed by the terminal device 120 other than the terminal device 120. As shown in FIG. 5, the terminal device 120 may first establish a voiceprint model MD corresponding to the authenticated user according to the voiceprint feature signal VPF, and after establishing the voiceprint model MD, compare the voiceprint feature signal VPF with the voiceprint model MD to perform "soundprint matching", and according to the voiceprint matching result, a similarity score SC (Score) is generated, wherein the similarity score SC represents the degree of similarity between the voiceprint characteristic signal VPF and the voiceprint model MD, which may be one Similarity signal. In detail, the terminal device 120 may correspond to the established user authentication voiceprint model MD (or the times t 1 to the first audio processing module 106 receives the generated corresponding to the authenticated user a first sound pattern feature in a first time t 1 VPF1 signal, and establishes corresponding to the authenticated user according to the first acoustic model MD voiceprint characteristic signal pattern VPF1, first acoustic signal characteristic pattern voiceprint features VPF1 t represents a first time signal to the VPF 1), to establish a model voiceprint after the MD, the terminal device 120 may be at a second time t 2 receiving the audio signal processing a second acoustic characteristic pattern VPF2 (which represents the second time t voiceprint signal VPF 2) is generated by module 106, the terminal apparatus 120 The second voiceprint characteristic signal VPF2 and the voiceprint model MD may be compared for voiceprint matching, and a similarity score SC is generated according to the voiceprint matching result.
终端装置120产生相似度得分SC后,即可根据相似度得分SC,判断使用者USR是否为认证用户,即为终端装置120执行图5中「识别身份」的步骤。于一实施例中,当相似度得分SC大于一特定值时,终端装置120可判定使用者USR确实为认证用户。另外,图5中的「建立声纹模型」、「声纹匹配」以及「取得相似度得分」等步骤不限于利用特定算法来实现,其技术细节为本领域技术人员所熟知,故于此不再赘述。
After the terminal device 120 generates the similarity score SC, it can determine whether the user USR is an authenticated user based on the similarity score SC, that is, the terminal device 120 performs the step of "identifying the identity" in FIG. In an embodiment, when the similarity score SC is greater than a specific value, the terminal device 120 may determine that the user USR is indeed an authenticated user. In addition, the steps of "establishing a voiceprint model", "soundprint matching", and "acquiring a similarity score" in FIG. 5 are not limited to being implemented by using a specific algorithm, and the technical details thereof are well known to those skilled in the art, and thus Let me repeat.
In short, the speaker recognition system 10 uses the in-ear device 100 to receive the ear canal sound wave CWV, uses the audio processing module 106 to extract the voiceprint features corresponding to the user USR, and uses the terminal device 120 to determine, according to the voiceprint feature signal VPF, whether the user USR is the authenticated user.
Existing speaker recognition systems all use microphones outside the human body to pick up sound, and thus run the risk of being covertly recorded or pirated; a malicious party may even use speech synthesis technology to synthesize a voice similar to the voiceprint of the user USR and thereby defeat a security system that relies on speaker recognition (such as a voice access control system, hereinafter a voice security system) or a voice authorization system (a system that identifies the speaker to confirm his or her identity and authorize the next action, such as a voice payment system, a voice transfer transaction system, a voice credit card transaction system or a voice login system). In contrast, the speaker recognition system 10 picks up sound inside the external auditory canal of the user USR and performs voiceprint recognition on the voiceprint features of the ear canal sound wave CWV. Because the ear canal sound wave has acoustic characteristics different from those of sound waves received by a microphone outside the human body, a malicious party cannot defeat a voice security system equipped with the speaker recognition system 10 by covert recording, pirated recording or speech synthesis, which further improves the security of the voice security system or the voice authorization system.
Furthermore, when a person breathes through the lungs, breathing sound waves (with a specific respiratory frequency) are still produced in the external auditory canal, and these breathing sound waves are contained in the ear canal sound wave CWV. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains breathing sound waves; that is, it can perform a physiological detection operation on the ear canal acoustic signal CSg to confirm that the speaking end of the speaker recognition system 10 is a natural person with physiological characteristics rather than a device such as a recorder or a speech synthesizer. The physiological detection operation may be a breath detection operation or even a heart rate detection operation.
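Since the physiological detection operation is not limited to any particular algorithm, the sketch below is purely illustrative: it checks for a dominant component of the ear canal signal's amplitude envelope inside a typical respiratory band. The 0.1 to 0.7 Hz band, the 16 kHz sampling rate in the usage comment and the energy-ratio threshold are assumptions made for the example, not values taken from the disclosure.

```python
import numpy as np

def detect_breathing(csg, fs, band_hz=(0.1, 0.7), ratio_threshold=0.3):
    """Physiological (breath) detection sketch on the ear canal signal CSg:
    returns True ("breathing detected") when the amplitude envelope of the
    signal has a dominant spectral component inside an assumed
    respiratory-frequency band, otherwise False."""
    csg = np.asarray(csg, dtype=float)
    envelope = np.abs(csg)                       # crude amplitude envelope
    envelope = envelope - envelope.mean()        # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    total_energy = spectrum[1:].sum() + 1e-12    # ignore the residual DC bin
    return bool(spectrum[in_band].sum() / total_energy > ratio_threshold)

# e.g. bio = detect_breathing(csg_samples, fs=16000)
```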
Specifically, please refer to FIG. 6, which is a schematic diagram of a voiceprint recognition process 60 according to an embodiment of the present application. The voiceprint recognition process 60 may be performed by the speaker recognition system 10 and includes the following steps:
Step 602: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 603: The audio processing module 106 of the in-ear device 100 performs a physiological detection operation on the ear canal acoustic signal CSg to generate a physiological detection result Bio.
Step 604: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 606: The terminal device 120 determines, according to the voiceprint feature signal VPF and the physiological detection result Bio, whether the user USR is the authenticated user himself or herself.
The voiceprint recognition process 60 is similar to the voiceprint recognition process 30, except that it further includes step 603. In step 603, the audio processing module 106 is not limited to any particular algorithm for performing the breath detection operation on the ear canal acoustic signal CSg; for example, the audio processing module 106 may detect, according to the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains breathing sound waves with a specific respiratory frequency, but this is not a limitation. The technical details of the breath detection operation are well known to those skilled in the art and are not repeated here. Taking the case where the physiological detection result Bio is a breath detection result as an example, the physiological detection result Bio may be a binary value representing that "breathing" or "no breathing" is detected; when the physiological detection result Bio indicates that "breathing" is detected, the speaking end of the speaker recognition system 10 is a natural person. Alternatively, the physiological detection result Bio may be a non-binary value, such as a gray-level value, representing the confidence level with which "breathing" (or "no breathing") is detected, or the specific respiratory frequency and characteristics of the user USR.
In step 606, the terminal device 120 determines, according to the voiceprint feature signal VPF and the physiological detection result Bio, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In an embodiment, when the physiological detection result Bio indicates that "breathing" is detected and the similarity score SC is greater than the specific value, the terminal device 120 may determine that the user USR is indeed the authenticated user.
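A minimal sketch of the combined decision in step 606, under the assumption that Bio is a binary value and that the same similarity threshold as in the earlier sketch is used:

```python
def recognize_speaker_60(bio, sc, sc_threshold=0.8):
    """Step 606 sketch: accept the user USR as the authenticated user only
    when breathing was detected (Bio) AND the similarity score SC is high."""
    return bool(bio) and sc > sc_threshold
```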
In addition, a voice security system or a voice authorization system usually involves a question-and-answer dialogue. For example, a bank (or a credit card center or payment system center, hereinafter the customer service end) may ask over a voice call, "May I have your account number?", and the user may answer "123456789". The question from the customer service end can be transmitted to the external auditory canal of the user USR through the speaker 104, in which case the ear canal sound wave CWV may contain a reflected wave of the question sound wave. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the question sound wave, so as to generate a reflected wave detection result. When the reflected wave detection result shows that the ear canal sound wave CWV contains the reflected sound wave, the speaking end of the speaker recognition system 10 is a natural person rather than a device such as a recorder or a speech synthesizer, so the possibility that the speaking end of the speaker recognition system 10 is a device can be ruled out. The question sound wave can also be regarded, in a broad sense, as a prompt sound wave, and the user USR may begin to speak only after the prompt sound wave ends; for example, the customer service end may say over the voice call, "Please state your account number/password after the beep" (i.e., a prompt sentence), and the prompt sound wave may include the sound wave of the prompt sentence or the beep.
In detail, please refer to FIG. 7, which is a schematic diagram of a voiceprint recognition process 70 according to an embodiment of the present application. The voiceprint recognition process 70 may be performed by the speaker recognition system 10 and includes the following steps:
Step 701: The speaker 104 emits a prompt sound wave toward the external auditory canal of the user USR.
Step 702: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 703: The audio processing module 106 of the in-ear device 100 determines, according to the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave, to generate a reflected wave detection result Rf.
Step 704: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 706: The terminal device 120 determines, according to the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the user USR is the authenticated user.
The voiceprint recognition process 70 is similar to the voiceprint recognition process 30, except that it further includes step 701 and step 703. In step 703, the audio processing module 106 is not limited to any particular algorithm for determining whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave; for example, since the external auditory canal of the human body has a certain range of canal lengths, the audio processing module 106 may determine, according to this range of ear canal lengths, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave. The technical details of in-ear physiological detection operations (such as the breath detection operation or the heart rate detection operation) are well known to those skilled in the art and are not repeated here. The reflected wave detection result Rf may be a binary value representing "reflected wave present" or "no reflected wave"; when the reflected wave detection result Rf indicates "reflected wave present", the speaking end of the speaker recognition system 10 is a natural person.
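Purely as an illustration of the ear-canal-length idea described above, the sketch below looks for a cross-correlation peak between the emitted prompt wave and the ear canal signal inside the round-trip delay window implied by an assumed canal length range of 2 cm to 4 cm; the length range, the speed of sound and the peak threshold are assumptions for this example, not values taken from the disclosure.

```python
import numpy as np

def detect_reflection(csg, prompt_wave, fs, canal_length_m=(0.02, 0.04),
                      speed_of_sound=343.0, peak_threshold=0.2):
    """Reflected-wave detection sketch (step 703): cross-correlate the ear
    canal signal CSg with the emitted prompt wave and report Rf = True when
    a correlation peak (normalized by overall signal energy) falls inside
    the round-trip delay window implied by an assumed ear-canal length."""
    csg = np.asarray(csg, dtype=float)
    prompt_wave = np.asarray(prompt_wave, dtype=float)
    corr = np.correlate(csg, prompt_wave, mode="full")
    corr = corr / (np.linalg.norm(csg) * np.linalg.norm(prompt_wave) + 1e-12)
    lags = np.arange(-prompt_wave.size + 1, csg.size)   # lag in samples
    # Round-trip delay range (out and back) for the assumed canal lengths.
    lag_min = int(np.floor(2 * canal_length_m[0] / speed_of_sound * fs))
    lag_max = int(np.ceil(2 * canal_length_m[1] / speed_of_sound * fs))
    window = (lags >= lag_min) & (lags <= lag_max)
    return bool(window.any() and corr[window].max() > peak_threshold)
```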
In step 706, the terminal device 120 determines, according to the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In an embodiment, when the reflected wave detection result Rf indicates "reflected wave present" and the similarity score SC is greater than the specific value, the terminal device 120 may determine that the user USR is indeed the authenticated user.
In addition, in an embodiment, the speaker recognition system of the present application may use a personal electronic device such as a smartphone to perform a voice-changing operation on the voiceprint feature signal VPF generated by the in-ear device 100. The terminal device 120 then performs speaker recognition according to the voice-changed voiceprint feature signal, i.e., it determines according to the voice-changed voiceprint feature signal whether the speaking end of the speaker recognition system is the user USR. In other words, only when the user USR holds the personal electronic device can he or she pass the speaker recognition verification of the terminal device 120, which further increases the security of the voice security system or the voice authorization system.
Specifically, please refer to FIG. 8, which is a functional block diagram of a speaker recognition system 80 according to an embodiment of the present application. The speaker recognition system 80 is similar to the speaker recognition system 10, except that it further includes a personal electronic device 800, which may be a smart wearable device, a smartphone, a tablet computer, a personal computer or another personal electronic device. The personal electronic device 800 receives the voiceprint feature signal VPF generated by the in-ear device 100, performs a voice-changing operation on the voiceprint feature signal VPF to generate a voice-changed voiceprint feature signal VPF', and transmits the voice-changed voiceprint feature signal VPF' to the terminal device 120; the terminal device 120 performs speaker recognition according to the voice-changed voiceprint feature signal VPF'.
The operation of the speaker recognition system 80 can be summarized as a voiceprint recognition process. Please refer to FIG. 9, which is a schematic diagram of a voiceprint recognition process 90 according to an embodiment of the present application. The voiceprint recognition process 90 may be performed by the speaker recognition system 80 and includes the following steps:
Step 902: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 904: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 905: The personal electronic device 800 performs a voice-changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF'.
Step 906: The terminal device 120 determines, according to the voice-changed voiceprint feature signal VPF', whether the user USR is the authenticated user.
The voiceprint recognition process 90 is similar to the voiceprint recognition process 30, except that it further includes step 905. In step 905, the personal electronic device 800 is not limited to any particular algorithm for performing the voice-changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF' and thereby encrypt this information, i.e., the voiceprint feature signal VPF; the technical details are well known to those skilled in the art and are not repeated here.
In step 906, the terminal device 120 may first establish, according to the voice-changed voiceprint feature signal VPF', a voiceprint model MD' corresponding to the user USR and the personal electronic device 800; after the voiceprint model MD' is established, the terminal device 120 compares the voice-changed voiceprint feature signal VPF' with the voiceprint model MD' to perform voiceprint matching, and generates a similarity score SC' according to the matching result, where the similarity score SC' represents the degree of similarity between the voice-changed voiceprint feature signal VPF' and the voiceprint model MD'. For the remaining operational details, please refer to the related paragraphs above; they are not repeated here.
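Since the voice-changing operation is likewise not limited to a particular algorithm, the following sketch merely illustrates the idea of a keyed, repeatable transform of the feature signal: only feature vectors transformed with the same device key will match the model MD' held by the terminal device. The permutation-and-scaling transform and the integer device key are assumptions for this example, not the algorithm of the disclosure.

```python
import numpy as np

def voice_change(vpf, device_key):
    """Voice-changing sketch (step 905): apply a keyed, repeatable transform
    to the voiceprint feature signal VPF on the personal electronic device
    800, producing VPF'. The device key (an integer, an assumption here)
    seeds a fixed permutation and per-dimension scaling of the vector."""
    vpf = np.asarray(vpf, dtype=float)
    rng = np.random.default_rng(device_key)
    permutation = rng.permutation(vpf.size)
    scaling = rng.uniform(0.5, 1.5, size=vpf.size)
    return vpf[permutation] * scaling

# The terminal device would then enroll and match entirely in the
# transformed domain, e.g. reusing the earlier sketch:
#   md_prime = build_voiceprint_model([voice_change(v, key) for v in enroll_vpfs])
#   sc_prime = similarity_score(md_prime, voice_change(test_vpf, key))
```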
It should be noted that the foregoing embodiments are intended to illustrate the concept of the present application, and those of ordinary skill in the art may make various modifications accordingly without being limited thereto. For example, the terminal device 120 is not limited to being a computer host; as long as the terminal device 120 is an electronic device (such as a cloud server) or even a mobile electronic device (such as a mobile phone or a tablet computer) capable of performing the voiceprint comparison process 50 shown in FIG. 5, it meets the requirements of the present application and falls within its scope. In addition, the audio processing module is not limited to being disposed in the in-ear device; the audio processing module may also be disposed in the terminal device, in which case the in-ear device only needs to send the ear canal acoustic signal to the terminal device, and the audio processing module in the terminal device extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal, which also meets the requirements of the present application and falls within its scope.
In summary, the speaker recognition system of the present application uses an in-ear device to pick up sound and receive the ear canal sound wave from the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison according to the voiceprint feature signal so as to determine whether the speaking end of the speaker recognition system is the user himself or herself. Compared with the prior art, the present application avoids the risk of being covertly recorded or pirated by a malicious party.
The above description covers only preferred embodiments of the present application and is not intended to limit the present application. Any modification, equivalent substitution and improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Claims (21)
- A speaker recognition system, characterized by comprising: an in-ear device, configured to be placed in an external auditory canal of a user, the in-ear device comprising: a sound receiver, configured to receive an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; and an audio processing module, coupled to the sound receiver, configured to extract a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and a terminal device, configured to determine, according to the voiceprint feature signal, whether the user is an authenticated user.
- The speaker recognition system of claim 1, wherein the in-ear device is a wired or wireless in-ear earphone, in-ear headset microphone, earbud or hearing aid.
- The speaker recognition system of claim 1, wherein the audio processing module performs a voice detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
- The speaker recognition system of claim 3, wherein the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
- The speaker recognition system of claim 1, wherein the terminal device is a mobile electronic device, a computer host or an access control system.
- The speaker recognition system of claim 1, wherein the terminal device establishes a voiceprint model corresponding to the authenticated user, receives the voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal against the voiceprint model to generate a similarity signal, and the terminal device determines, according to the similarity signal, whether the user is the authenticated user.
- The speaker recognition system of claim 1, wherein the audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
- The speaker recognition system of claim 7, wherein the physiological detection operation is a breath detection operation and the physiological detection result is a breath detection result.
- The speaker recognition system of claim 7, wherein the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
- The speaker recognition system of claim 1, wherein the in-ear device further comprises: a speaker, configured to emit a first sound wave toward the external auditory canal; wherein the audio processing module determines, according to the ear canal acoustic signal, whether the ear canal sound wave contains a reflected sound wave corresponding to the first sound wave, to generate a reflected wave detection result, and the terminal device determines, according to the voiceprint feature signal and the reflected wave detection result, whether the user is the authenticated user.
- The speaker recognition system of claim 1, further comprising: a personal electronic device, configured to receive the voiceprint feature signal from the in-ear device and perform a voice-changing operation on the voiceprint feature signal to generate a voice-changed voiceprint feature signal; wherein the terminal device determines, according to the voice-changed voiceprint feature signal generated by the personal electronic device, whether the user is the authenticated user.
- A speaker recognition method, applied to a speaker recognition system, the speaker recognition system comprising an in-ear device and a terminal device, the in-ear device comprising a sound receiver and an audio processing module, the in-ear device being placed in an external auditory canal of a user, characterized in that the speaker recognition method comprises: receiving, by the sound receiver, an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; extracting, by the audio processing module, a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and determining, by the terminal device, according to the voiceprint feature signal, whether the user is an authenticated user.
- The speaker recognition method of claim 12, wherein the step of the audio processing module extracting the voiceprint feature corresponding to the user from the ear canal acoustic signal to generate the voiceprint feature signal comprises: performing, by the audio processing module, a voice detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
- The speaker recognition method of claim 13, wherein the step of the audio processing module extracting the voiceprint feature corresponding to the user from the ear canal acoustic signal to generate the voiceprint feature signal further comprises: performing, by the audio processing module, a noise suppression operation on the ear canal acoustic signal.
- The speaker recognition method of claim 12, wherein the step of the terminal device determining, according to the voiceprint feature signal, whether the user is the authenticated user comprises: establishing, by the terminal device, a voiceprint model corresponding to the authenticated user; receiving, by the terminal device, the voiceprint feature signal from the audio processing module and comparing the voiceprint feature signal against the voiceprint model to generate a similarity score; and determining, by the terminal device, according to the similarity score, whether the user is the authenticated user.
- The speaker recognition method of claim 12, further comprising: performing, by the audio processing module, a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result; and determining, by the terminal device, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
- The speaker recognition method of claim 16, wherein the physiological detection operation is a breath detection operation and the physiological detection result is a breath detection result.
- The speaker recognition method of claim 16, wherein the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
- The speaker recognition method of claim 12, wherein the in-ear device comprises a speaker, and the speaker recognition method further comprises: emitting, by the speaker, a first sound wave toward the external auditory canal; determining, by the audio processing module, according to the ear canal acoustic signal, whether the ear canal sound wave contains a reflected sound wave corresponding to the first sound wave, to generate a reflected wave detection result; and determining, by the terminal device, according to the voiceprint feature signal and the reflected wave detection result, whether the user is the authenticated user.
- The speaker recognition method of claim 12, wherein the speaker recognition system comprises a personal electronic device, and the speaker recognition method further comprises: performing, by the personal electronic device, a voice-changing operation on the voiceprint feature signal to generate a voice-changed voiceprint feature signal; and determining, by the terminal device, according to the voice-changed voiceprint feature signal generated by the personal electronic device, whether the user is the authenticated user.
- An in-ear device for speaker recognition, configured to be placed in an external auditory canal of a user, characterized by comprising: a sound receiver, configured to receive an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; and an audio processing module, coupled to the sound receiver, configured to extract a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal and send the voiceprint feature signal to an external terminal.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780000606.7A CN110100278B (en) | 2017-07-03 | 2017-07-03 | Speaker recognition system, speaker recognition method and in-ear device |
PCT/CN2017/091466 WO2019006587A1 (en) | 2017-07-03 | 2017-07-03 | Speaker recognition system, speaker recognition method, and in-ear device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/091466 WO2019006587A1 (en) | 2017-07-03 | 2017-07-03 | Speaker recognition system, speaker recognition method, and in-ear device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019006587A1 true WO2019006587A1 (en) | 2019-01-10 |
Family
ID=64949595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/091466 WO2019006587A1 (en) | 2017-07-03 | 2017-07-03 | Speaker recognition system, speaker recognition method, and in-ear device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110100278B (en) |
WO (1) | WO2019006587A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113643707A (en) * | 2020-04-23 | 2021-11-12 | 华为技术有限公司 | Identity verification method and device and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2071856U (en) * | 1990-05-22 | 1991-02-20 | 查苕章 | Earplug type telephone receiver and transmiter |
JP2003058190A (en) * | 2001-08-09 | 2003-02-28 | Mitsubishi Heavy Ind Ltd | Personal authentication system |
CN101042869A (en) * | 2006-03-24 | 2007-09-26 | 致胜科技股份有限公司 | Nasal bone conduction living body sound-groove identification apparatus |
CN101541238A (en) * | 2007-01-24 | 2009-09-23 | 松下电器产业株式会社 | Biological information measurement device and method of controlling the same |
JP2010086328A (en) * | 2008-09-30 | 2010-04-15 | Yamaha Corp | Authentication device and cellphone |
CN203984682U (en) * | 2013-11-29 | 2014-12-03 | 华北电力大学 | A kind of auditory prosthesis for special object |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101442933A (en) * | 2005-10-07 | 2009-05-27 | 皇家飞利浦电子股份有限公司 | Ear-thermometer with ear identification |
JP4937661B2 (en) * | 2006-07-31 | 2012-05-23 | ナップエンタープライズ株式会社 | Mobile personal authentication method and electronic commerce method |
US8622919B2 (en) * | 2008-11-17 | 2014-01-07 | Sony Corporation | Apparatus, method, and computer program for detecting a physiological measurement from a physiological sound signal |
CN102142254A (en) * | 2011-03-25 | 2011-08-03 | 北京得意音通技术有限责任公司 | Voiceprint identification and voice identification-based recording and faking resistant identity confirmation method |
US10154818B2 (en) * | 2014-12-24 | 2018-12-18 | Samsung Electronics Co., Ltd. | Biometric authentication method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN110100278B (en) | 2023-09-22 |
CN110100278A (en) | 2019-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17916959; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 17916959; Country of ref document: EP; Kind code of ref document: A1 |