WO2019006587A1 - Speaker recognition system, speaker recognition method, and in-ear device - Google Patents

Speaker recognition system, speaker recognition method, and in-ear device

Info

Publication number
WO2019006587A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
ear
signal
ear canal
voiceprint
Prior art date
Application number
PCT/CN2017/091466
Other languages
French (fr)
Chinese (zh)
Inventor
黄彦颖
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司 filed Critical 深圳市汇顶科技股份有限公司
Priority to CN201780000606.7A priority Critical patent/CN110100278B/en
Priority to PCT/CN2017/091466 priority patent/WO2019006587A1/en
Publication of WO2019006587A1 publication Critical patent/WO2019006587A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques

Definitions

  • the present application relates to a speaker recognition system and a speaker recognition method, and more particularly to a speaker recognition system and a speaker recognition method that can avoid being recorded or pirated.
  • Speaker recognition has been widely used in voice security systems or voice authorization systems, and has become an indispensable feature in contemporary technology products.
  • Existing speech recognition systems mainly use a microphone outside the human body to collect sound; the received sound is the sound wave emitted from the speaker's mouth and transmitted through the external air medium, so existing speaker recognition carries the risk of being covertly recorded or pirated by malicious parties.
  • In detail, a party A can follow a person B and covertly record B's voice, eavesdrop on B's voice, or even forge B's voice using speech synthesis, and store B's voice in a recorder in advance; when A wants to pass the identity verification of a voice access control system or voice authorization system, A can play back B's recorded voice to pass the verification and thereby impersonate B, possibly causing B financial losses or even endangering B's life and property.
  • The present application provides a speaker recognition system including an in-ear device to be placed in a user's external auditory canal, the in-ear device comprising a sound receiver for receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave, and an audio processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; the system further includes a terminal device for determining, according to the voiceprint feature signal, whether the user is an authenticated user.
  • the in-ear device is a wired or wireless in-ear earphone, an in-ear earphone microphone, an earplug or a hearing aid.
  • the audio processing module performs a speech detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
  • the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
  • the terminal device is a mobile electronic device, a computer host or an access control system.
  • the terminal device establishes a voiceprint model corresponding to the authenticated user, receives a voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal against the voiceprint model to generate a similarity signal; the terminal device determines, according to the similarity signal, whether the user is the authenticated user.
  • the audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
  • the physiological detection operation is a respiratory detection operation and the physiological detection result is a respiratory detection result
  • the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result
  • the present application also provides a speaker recognition method, which is applied to a speaker recognition system
  • the speaker recognition system includes an in-ear device and a terminal device
  • the in-ear device includes a microphone and an audio processing module.
  • the in-ear device is placed in an external auditory canal of a user
  • the speaker recognition method includes the microphone receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave
  • the audio processing module extracts a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal
  • the terminal device determines, according to the voiceprint feature signal, whether the speaking party of the speaker recognition system is the user; the speaking party of the speaker recognition system is the person or device that emits sound toward the speaker recognition system for voiceprint recognition.
  • The present application uses an in-ear device to collect sound, i.e., to receive the ear canal sound waves in the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison to determine whether the speaking party of the speaker recognition system is the user.
  • In this way, the present application avoids the risk of being covertly recorded or pirated by malicious parties.
  • FIG. 1 is a schematic diagram of the appearance of a speaker recognition system according to an embodiment of the present application.
  • FIG. 2 is a functional block diagram of the speaker recognition system of FIG. 1.
  • FIG. 3 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a voiceprint feature extraction process according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a voiceprint comparison process according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
  • FIG. 8 is a schematic functional block diagram of a speaker recognition system according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a voiceprint identification process according to an embodiment of the present application.
  • Existing speaker recognition mainly receives the sound waves emitted from the speaker's mouth and transmitted through the air medium outside the body to a microphone outside the human body; for security systems that rely on speaker recognition (such as voice access control systems and voice payment systems), existing speaker recognition therefore carries the risk of being covertly recorded or pirated by malicious parties.
  • However, in addition to the sound emitted from the oral cavity into the external air medium, the sound waves generated by the vocal cord mucosa are also transmitted through the Eustachian tube to the internal auditory meatus (Internal Auditory Meatus) and even the external auditory meatus (External Auditory Meatus), and the sound waves in the external auditory canal (ear canal sound waves) have acoustic characteristics different from those of sound waves received by a microphone outside the human body. In other words, even for the same speaker, the ear canal sound waves have acoustic characteristics different from covertly recorded or pirated sound waves.
  • Therefore, the speaker recognition system of the present application collects sound in the user's external auditory canal, extracts the voiceprint features of the ear canal sound waves, and performs speaker recognition on those voiceprint features, so as to avoid the risk of the user's voice being covertly recorded or pirated.
  • FIG. 1 and FIG. 2 are schematic diagrams showing the appearance and functional blocks of the speaker recognition system 10 according to the embodiment of the present application.
  • The speaker recognition system 10 includes an in-ear (canal-type) device 100 and a terminal device 120.
  • The terminal device 120 can be a computer host, mobile electronic device, or access control system with computing capability, and the in-ear device 100 can be placed into an external auditory canal (External Acoustic Meatus) of the user USR; it can be one of an in-ear earphone (Earphone), an in-ear headset (Headset), an earplug (Earplug), or a hearing aid (Hearing Aid).
  • The in-ear device 100 can include a sound receiver 102, a speaker (Speaker) 104, and an audio processing module 106.
  • The sound receiver 102 can be a microphone (Microphone) for receiving an ear canal sound wave CWV from the external auditory canal of the user USR and converting it into an ear canal acoustic signal CSg; that is, the sound receiver 102 can generate the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
  • The audio processing module 106 is coupled to the sound receiver 102 and extracts a voiceprint feature (Voiceprint Feature) corresponding to the user USR from the ear canal acoustic signal CSg to generate a voiceprint feature signal VPF, where the voiceprint feature signal VPF includes the voiceprint features of the user USR.
  • the in-ear device 100 can transmit the voiceprint feature signal VPF to the terminal device 120 through wired transmission or wireless transmission.
  • Generally, the terminal device 120 can determine, according to the voiceprint feature signal it receives, whether the speaking party is the user USR, another person, or even a recorder in which the user USR's voice has been recorded in advance, where the speaking party of the speaker recognition system 10 refers to the person or device (such as a recorder or a device with speech synthesis capability) that emits sound toward the speaker recognition system 10 for voiceprint recognition.
  • the terminal device 120 can determine whether the user USR is an authenticated user according to the voiceprint feature signal it receives. In an ideal case, the terminal device 120 receives the voiceprint feature signal VPF generated by the in-ear device 100, and determines that the user USR is indeed an authenticated user based on the voiceprint feature signal VPF.
  • FIG. 3 is a schematic diagram of a voiceprint identification process 30 according to an embodiment of the present application.
  • the voiceprint recognition process 30 can be performed by the speaker recognition system 10, which includes the following steps:
  • Step 302: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR, and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
  • Step 304 The audio processing module 106 of the in-ear device 100 extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
  • Step 306 The terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal VPF.
  • In step 304, for the details of how the audio processing module 106 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF, refer to FIG. 4, which is a schematic diagram of a voiceprint feature extraction process 40; the voiceprint feature extraction process 40 is performed by the audio processing module 106 of the in-ear device 100.
  • As shown in FIG. 4, the audio processing module 106 can perform a voice detection (Voice Detection) operation, a noise suppression (Noise Suppression) operation, and a feature extraction (Feature Extraction) operation on the ear canal acoustic signal CSg to generate the voiceprint feature signal VPF; the voice detection, noise suppression, and feature extraction operations are not limited to any specific algorithm, and their technical details are well known to those skilled in the art and are not described further here.
  • Note that the voice detection, noise suppression, and feature extraction operations in the voiceprint feature extraction process 40 are all performed by the audio processing module 106 disposed in the in-ear device 100; that is, the voiceprint feature signal VPF is generated by the audio processing module 106 disposed in the in-ear device 100. After generating the voiceprint feature signal VPF, the audio processing module 106 can transmit it to the terminal device 120 through wired or wireless transmission.
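  • The disclosure deliberately leaves the voice detection, noise suppression, and feature extraction algorithms unspecified, so the following is only a minimal illustrative sketch of what process 40 could look like in software: an energy-based voice activity detector, spectral-subtraction noise suppression, and log band-energy features standing in for a production voiceprint front end (e.g., MFCCs). The function names, frame sizes, and thresholds are assumptions for illustration, not part of the patent.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into windowed, overlapping frames
    (25 ms / 10 ms at 16 kHz); assumes len(x) >= frame_len."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hanning(frame_len)

def extract_voiceprint_features(csg, n_bins=26):
    """Toy stand-in for process 40: voice detection, noise suppression,
    and feature extraction applied to the ear-canal signal CSg."""
    frames = frame_signal(np.asarray(csg, dtype=float))
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # per-frame power spectrum

    # Voice detection: keep frames whose energy exceeds an adaptive threshold.
    energy = spec.sum(axis=1)
    voiced = energy > 2.0 * np.median(energy)

    # Noise suppression: spectral subtraction, using the quieter frames as a
    # noise estimate.
    noise = spec[~voiced].mean(axis=0) if (~voiced).any() else np.zeros(spec.shape[1])
    clean = np.maximum(spec - noise, 1e-10)

    # Feature extraction: log energies in linearly spaced bands (a crude
    # substitute for a mel filterbank / MFCC front end).
    bands = np.array_split(clean[voiced], n_bins, axis=1)
    feats = np.stack([np.log(b.sum(axis=1)) for b in bands], axis=1)
    return feats   # this (frames x bands) matrix plays the role of the signal VPF
```

  • The resulting feature matrix would then be sent to the terminal device 120 over the wired or wireless link, as described above.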
  • In step 306, for the details of how the terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal VPF, refer to FIG. 5, which is a schematic diagram of a voiceprint comparison process 50; the voiceprint comparison process 50 is performed by the terminal device 120, which is located outside the human body.
  • As shown in FIG. 5, the terminal device 120 can first establish a voiceprint model MD corresponding to the authenticated user according to the voiceprint feature signal VPF; after establishing the voiceprint model MD, it compares the voiceprint feature signal VPF with the voiceprint model MD to perform voiceprint matching and, according to the matching result, generates a similarity score SC (Score), where the similarity score SC represents the degree of similarity between the voiceprint feature signal VPF and the voiceprint model MD and can serve as a similarity signal.
  • In detail, the terminal device 120 can, at a first time t1, establish the voiceprint model MD corresponding to the authenticated user; that is, at the first time t1 it receives a first voiceprint feature signal VPF1 generated by the audio processing module 106 and corresponding to the authenticated user (VPF1 denotes the voiceprint feature signal VPF at the first time t1), and establishes the voiceprint model MD corresponding to the authenticated user according to the first voiceprint feature signal VPF1. After the voiceprint model MD has been established, the terminal device 120 can, at a second time t2, receive a second voiceprint feature signal VPF2 generated by the audio processing module 106 (VPF2 denotes the voiceprint feature signal VPF at the second time t2); the terminal device 120 can then compare the second voiceprint feature signal VPF2 with the voiceprint model MD to perform voiceprint matching and generate the similarity score SC according to the matching result.
  • After generating the similarity score SC, the terminal device 120 can determine, according to the similarity score SC, whether the user USR is an authenticated user; that is, the terminal device 120 performs the "identify identity" step in FIG. 5. In one embodiment, when the similarity score SC is greater than a specific value, the terminal device 120 can determine that the user USR is indeed an authenticated user.
  • In addition, the "establish voiceprint model", "voiceprint matching", and "obtain similarity score" steps in FIG. 5 are not limited to any specific algorithm; their technical details are well known to those skilled in the art and are not described further here.
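  • Because the enrollment, matching, and scoring steps are likewise left to known techniques, the sketch below shows one possible, simplified realization of process 50: the model MD is just the per-dimension mean and spread of the enrollment features VPF1, and the score SC maps a normalized distance for a later signal VPF2 into (0, 1]. The function names, the scoring formula, and the 0.5 decision threshold are illustrative assumptions, not the patented method.

```python
import numpy as np

def enroll_voiceprint_model(vpf1):
    """Build a toy voiceprint model MD from the enrollment feature signal VPF1:
    per-dimension mean and standard deviation of the feature frames."""
    vpf1 = np.asarray(vpf1, dtype=float)
    return vpf1.mean(axis=0), vpf1.std(axis=0) + 1e-6

def similarity_score(model, vpf2):
    """Compare a later feature signal VPF2 against the model MD and return a
    similarity score SC in (0, 1]; higher means more similar."""
    mean, std = model
    v = np.asarray(vpf2, dtype=float).mean(axis=0)
    # Normalized distance between the test utterance and the enrolled model.
    d = np.linalg.norm((v - mean) / std) / np.sqrt(len(mean))
    return float(np.exp(-d))

def identify(model, vpf2, threshold=0.5):
    """Step 306 / the 'identify identity' step: accept when SC exceeds a set value."""
    return similarity_score(model, vpf2) > threshold
```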
  • In short, the speaker recognition system 10 uses the in-ear device 100 to receive the ear canal sound wave CWV, uses the audio processing module 106 to extract the voiceprint features corresponding to the user USR, and uses the terminal device 120 to determine, according to the voiceprint feature signal VPF, whether the user USR is an authenticated user.
  • Existing speaker recognition systems all use microphones outside the human body to collect sound and thus carry the risk of covert recording or piracy; a malicious party may even use speech synthesis to produce a voice whose voiceprint is similar to that of the user USR and thereby break a security system that relies on speaker recognition (such as a voice access control system, hereinafter a voice security system) or a voice authorization system (a system that uses speaker recognition to confirm the speaker's identity before authorizing the next action, such as a voice payment system, voice transfer transaction system, voice credit card transaction system, or voice login system).
  • In contrast, the speaker recognition system 10 collects sound in the external auditory canal of the user USR and performs voiceprint recognition on the voiceprint features of the ear canal sound wave CWV. Because ear canal sound waves have acoustic characteristics different from sound waves received by a microphone outside the body, a malicious party cannot break a voice security system equipped with the speaker recognition system 10 through covert recording, piracy, or speech synthesis, which further enhances the security of the voice security system or voice authorization system.
  • Furthermore, when the human body breathes through the lungs, respiratory sound waves (with a specific respiratory frequency) are still produced in the external auditory canal, and these respiratory sound waves are contained in the ear canal sound wave CWV. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains respiratory sound waves, i.e., perform a physiological detection operation on the ear canal acoustic signal CSg, to confirm that the speaking party of the speaker recognition system 10 is a natural person with physiological characteristics rather than a device such as a recorder or speech synthesizer; the physiological detection operation can be a breathing detection operation or even a heart rate detection operation.
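  • The patent does not prescribe how the heart rate variant of the physiological detection operation would be computed, so the following is only a hedged sketch of one plausible approach: look for a dominant periodicity in the low-frequency envelope of the ear canal signal within a typical heart-rate band. The 100 Hz envelope rate, the 0.8-3 Hz band, and the 0.2 peak-ratio threshold are assumed values chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_heart_rate(csg, sample_rate=16000):
    """Toy heart-rate check on the ear-canal signal CSg: take a coarse envelope,
    keep the 0.8-3 Hz band (roughly 48-180 beats per minute), and look for a
    dominant periodicity via autocorrelation. Returns (detected, bpm)."""
    x = np.abs(np.asarray(csg, dtype=float))             # crude amplitude envelope
    step = sample_rate // 100                            # resample envelope to 100 Hz
    env = x[: len(x) // step * step].reshape(-1, step).mean(axis=1)
    if len(env) < 200:                                   # need at least ~2 s of signal
        return False, 0.0

    b, a = butter(2, [0.8 / 50.0, 3.0 / 50.0], btype="band")
    env = filtfilt(b, a, env - env.mean())

    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lags = np.arange(len(ac))
    valid = (lags >= 33) & (lags <= 125)                 # periods of 0.33-1.25 s
    if ac[0] <= 0:
        return False, 0.0
    peak = lags[valid][np.argmax(ac[valid])]
    bpm = 60.0 * 100.0 / peak
    detected = ac[peak] / ac[0] > 0.2                    # enough periodic energy
    return bool(detected), float(bpm)
```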
  • FIG. 6 is a schematic diagram of a voiceprint identification process 60 according to an embodiment of the present application.
  • the voiceprint recognition process 60 can be performed by the speaker recognition system 10, which includes the following steps:
  • Step 602: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR, and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
  • Step 603 The audio processing module 106 of the in-ear device 100 performs a physiological detection operation on the ear canal acoustic signal CSg to generate a physiological detection result Bio.
  • Step 604 The audio processing module 106 of the in-ear device 100 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
  • Step 606 The terminal device 120 determines, according to the voiceprint characteristic signal VPF and the physiological detection result Bio, whether the user USR is the authenticated user itself.
  • the voiceprint recognition process 60 is similar to the voiceprint recognition process 30. Unlike the voiceprint recognition process 30, the voiceprint recognition process 60 further includes a step 603.
  • In step 603, the audio processing module 106 is not limited to any specific algorithm for performing the breath detection operation on the ear canal acoustic signal CSg; for example, the audio processing module 106 can detect, according to the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains respiratory sound waves at a specific respiratory frequency, but is not limited thereto. The technical details of the breath detection operation are well known to those skilled in the art and are not described further here.
  • Taking the case where the physiological detection result Bio is a breath detection result as an example, Bio can be a binary value (Binary Value) representing that "breathing" or "no breathing" was detected; when Bio indicates that "breathing" was detected, the speaking party of the speaker recognition system 10 is a natural person.
  • Alternatively, the physiological detection result Bio can be a non-binary value, such as a gray-level (Gray Level) value, representing the confidence level (Confidence Level) that "breathing" (or "no breathing") was detected, or the specific respiratory frequency and characteristics of the user USR.
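  • As a minimal sketch of how step 603 might compute either form of Bio (the disclosure again names no algorithm), one can assume that breathing shows up as slow amplitude modulation of the ear canal signal: the fraction of envelope energy in an assumed 0.1-0.7 Hz respiratory band can serve as the gray-level confidence, and comparing it with a threshold yields the binary "breathing"/"no breathing" value. The band limits, envelope rate, and threshold are illustrative assumptions.

```python
import numpy as np

def breathing_confidence(csg, sample_rate=16000):
    """Toy breath detection for step 603: return a confidence in [0, 1] that the
    ear-canal signal CSg contains periodic respiratory modulation."""
    x = np.abs(np.asarray(csg, dtype=float))       # crude amplitude envelope
    step = sample_rate // 20                       # resample envelope to 20 Hz
    env = x[: len(x) // step * step].reshape(-1, step).mean(axis=1)
    if len(env) < 100:                             # need at least ~5 s of signal
        return 0.0
    env = env - env.mean()

    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / 20.0)
    band = (freqs >= 0.1) & (freqs <= 0.7)         # about 6-42 breaths per minute
    if spec.sum() <= 0.0:
        return 0.0
    return float(spec[band].sum() / spec.sum())    # fraction of energy in the band

def breathing_detected(csg, sample_rate=16000, threshold=0.3):
    """Binary form of the result Bio: True when the confidence exceeds a threshold."""
    return breathing_confidence(csg, sample_rate) > threshold
```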
  • In step 606, the terminal device 120 determines, according to the voiceprint feature signal VPF and the physiological detection result Bio, whether the speaking party of the speaker recognition system 10 is the user USR. In one embodiment, when the physiological detection result Bio indicates that "breathing" was detected and the similarity score SC is greater than a specific value, the terminal device 120 can determine that the user USR is indeed an authenticated user.
  • In addition, a voice security system or voice authorization system usually involves a question-and-answer dialogue. For example, the service side (a bank, credit card center, or payment system center, hereinafter referred to as the customer service) may ask in a voice call "May I have your account number?" and the user may answer "123456789"; the customer service question can be played into the external auditory canal of the user USR through the speaker 104, so the ear canal sound wave CWV can include the question sound wave from the customer service.
  • The audio processing module 106 in the in-ear device 100 can determine, from the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the question sound wave, and generate a reflected wave detection result. When the reflected wave detection result shows that the ear canal sound wave CWV contains such a reflected sound wave, the speaking party of the speaker recognition system 10 is a natural person rather than a device such as a recorder or a speech synthesizer, thereby excluding the possibility that the speaking party of the speaker recognition system 10 is a device.
  • The question sound wave can be regarded broadly as a prompt sound wave, after which the user USR can start to speak. For example, the customer service may say in the voice call "Please read out your account number/password after the beep" (i.e., the prompt statement); the prompt sound wave may include the sound wave of the prompt statement or the beep.
  • FIG. 7 is a schematic diagram of a voiceprint identification process 70 according to an embodiment of the present application.
  • the voiceprint recognition process 70 can be performed by the speaker recognition system 10, which includes the following steps:
  • Step 701: The speaker 104 emits a prompt sound wave into the external auditory canal of the user USR.
  • Step 702: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR, and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
  • Step 703 The audio processing module 106 of the in-ear device 100 determines whether there is a reflected sound wave corresponding to the prompt sound wave in the ear canal sound wave CWV according to the ear canal sound signal CSg to generate a reflected wave detection result Rf.
  • Step 704 The audio processing module 106 of the in-ear device 100 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
  • Step 706 The terminal device 120 determines whether the user USR is an authenticated user based on the voiceprint characteristic signal VPF and the reflected wave detection result Rf.
  • the voiceprint recognition process 70 is similar to the voiceprint recognition process 30. Different from the voiceprint recognition process 30, the voiceprint recognition process 70 further includes steps 701 and 703.
  • In step 703, the audio processing module 106 is not limited to any specific algorithm for determining whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave; for example, since the external auditory canal of the human body falls within a certain length range, the audio processing module 106 can judge, according to that ear canal length range, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave.
  • The technical details of the physiological detection operations in the ear canal (such as the breathing detection operation or the heart rate detection operation) are likewise well known to those skilled in the art and are not described further here.
  • The reflected wave detection result Rf can be a binary value representing "reflected wave" or "no reflected wave"; when the reflected wave detection result Rf indicates "reflected wave", the speaking party of the speaker recognition system 10 is a natural person.
  • In step 706, the terminal device 120 determines, according to the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the speaking party of the speaker recognition system 10 is the user USR. In one embodiment, when the reflected wave detection result Rf indicates "reflected wave" and the similarity score SC is greater than a specific value, the terminal device 120 can determine that the user USR is indeed an authenticated user.
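  • The disclosure only states that the reflected wave can be judged from the ear canal length range, so the sketch below is one rough, assumed realization of step 703: cross-correlate the prompt sound wave with the ear canal signal and check for a correlation peak at a delay consistent with a round trip over a 2-3.5 cm ear canal. The 48 kHz sample rate, the canal length range, and the 0.1 correlation threshold are assumptions; at such short delays a practical system would likely need finer time resolution or frequency-domain (resonance-based) processing.

```python
import numpy as np

def detect_reflection(prompt, csg, sample_rate=48000, speed_of_sound=343.0,
                      canal_len_range=(0.02, 0.035)):
    """Toy version of step 703: return True (Rf = 'reflected wave') when the
    ear-canal signal CSg contains a delayed, attenuated copy of the prompt
    sound wave at a lag consistent with a human ear-canal round trip."""
    p = np.asarray(prompt, dtype=float)
    x = np.asarray(csg, dtype=float)
    xc = np.correlate(x, p, mode="full")[len(p) - 1:]      # non-negative delays
    norm = np.sqrt(np.dot(p, p) * np.dot(x, x)) + 1e-12

    # Round-trip delays (in samples) allowed by the assumed canal length range.
    lo = int(2 * canal_len_range[0] / speed_of_sound * sample_rate)
    hi = int(2 * canal_len_range[1] / speed_of_sound * sample_rate) + 1
    window = np.abs(xc[lo:hi]) / norm
    return bool(window.size and window.max() > 0.1)
```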
  • In addition, the speaker recognition system of the present application can use a personal electronic device, such as a smartphone, to perform a voice-changing operation on the voiceprint feature signal VPF generated by the in-ear device 100,
  • and the terminal device 120 then performs speaker recognition based on the voice-changed voiceprint feature signal, i.e., determines according to the voice-changed voiceprint feature signal whether the speaking party of the speaker recognition system is the user USR.
  • In this way, the user USR can pass the speaker recognition of the terminal device 120 only while holding the personal electronic device, which further increases the security of the voice security system or voice authorization system.
  • FIG. 8 is a functional block diagram of a speaker recognition system 80 according to an embodiment of the present application.
  • the speaker recognition system 80 is similar to the speaker recognition system 10.
  • the speaker recognition system 80 further includes a personal electronic device 800, which may be a smart wearable device, a smart phone, or a tablet computer.
  • the personal electronic device 800 receives the voiceprint characteristic signal VPF generated by the in-ear device 100, and performs a voice-changing operation on the voiceprint characteristic signal VPF to generate a voice-changing voiceprint characteristic signal VPF', and
  • the voice-changing voiceprint characteristic signal VPF' is transmitted to the terminal device 120, and the terminal device 120 performs speaker recognition based on the voice-changing voiceprint characteristic signal VPF'.
  • FIG. 9 is a schematic diagram of a voiceprint identification process 90 according to an embodiment of the present application.
  • the voiceprint recognition process 90 can be performed by the speaker recognition system 80, which includes the following steps:
  • Step 902: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR, and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
  • Step 904 The audio processing module 106 of the in-ear device 100 extracts a voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg, and generates a voiceprint feature signal VPF.
  • Step 905 The personal electronic device 800 performs a voice-changing operation on the voiceprint feature signal VPF to generate a voice-changing voiceprint feature signal VPF'.
  • Step 906 The terminal device 120 determines whether the user USR is an authenticated user based on the voiced characteristic signal VPF'.
  • The voiceprint recognition process 90 is similar to the voiceprint recognition process 30; unlike the voiceprint recognition process 30, the voiceprint recognition process 90 further includes step 905. In step 905, the personal electronic device 800 is not limited to any specific algorithm for performing the voice-changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF', which in effect encrypts the voiceprint feature signal VPF; the relevant technical details are well known to those skilled in the art and are not described further here.
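  • Since step 905 is not tied to a particular algorithm either, the sketch below shows one hedged interpretation of the voice-changing operation as a keyed, device-specific transformation of the feature frames: only a terminal that enrolled the model MD' with features transformed by the same key would later obtain a high similarity score. The seed-derived orthogonal matrix is purely an illustrative choice, not the disclosed method; enrollment would apply the same transformation to the first feature signal before building MD', mirroring the description that follows.

```python
import numpy as np

def make_voice_change_key(seed, dim):
    """Derive a device-specific orthogonal matrix from a secret seed; this plays
    the role of the personal electronic device 800's voice-changing key."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def voice_change(vpf, key):
    """Step 905: map the voiceprint feature signal VPF (frames x dims) to VPF'
    so that it only matches a model enrolled through the same key/device."""
    return np.asarray(vpf, dtype=float) @ key
```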
  • In this case, the terminal device 120 can first establish a voiceprint model MD' corresponding to the user USR and the personal electronic device 800 according to the voice-changed voiceprint feature signal VPF'; after establishing the voiceprint model MD', the terminal device 120 compares the voice-changed voiceprint feature signal VPF' with the voiceprint model MD' to perform voiceprint matching and, according to the matching result, generates a similarity score SC', which represents the degree of similarity between the voice-changed voiceprint feature signal VPF' and the voiceprint model MD'. For details of the remaining operations, refer to the related paragraphs above, which are not repeated here.
  • It should be noted that the terminal device 120 is not limited to being a computer host; any electronic device that can perform the voiceprint comparison process 50 shown in FIG. 5, such as a cloud server or even a mobile electronic device (a mobile phone, a tablet computer, etc.), meets the requirements of the present application and falls within its scope.
  • In addition, the audio processing module is not limited to being disposed in the in-ear device; it can also be disposed in the terminal device. In that case, the in-ear device only needs to send the ear canal acoustic signal to the terminal device, and the audio processing module in the terminal device extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal, which also meets the requirements of the present application and falls within its scope.
  • In summary, the speaker recognition system of the present application uses an in-ear device to collect sound, i.e., to receive the ear canal sound waves in the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison based on the voiceprint feature signal to determine whether the speaking party of the speaker recognition system is the user.
  • Compared with the prior art, the present application avoids the risk of being covertly recorded or pirated by malicious parties.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Headphones And Earphones (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

A speaker recognition system (10) comprises an in-ear device (100) for positioning in an ear canal of a user; and a terminal device (120). The in-ear device (100) comprises: a sound receiver (102) used to receive an ear-canal acoustic wave from the ear canal so as to generate an ear-canal acoustic signal corresponding to the ear-canal acoustic wave; and an audio processing module (106) connected to the sound receiver (102) and used to extract, from the ear-canal acoustic signal, a voiceprint feature corresponding to the user so as to generate a voiceprint feature signal. The terminal device (120) is used to determine, according to the voiceprint feature signal, whether the user is an authenticated user.

Description

Speaker recognition system, speaker recognition method, and in-ear device
Technical Field
The present application relates to a speaker recognition system and a speaker recognition method, and more particularly to a speaker recognition system and a speaker recognition method that can avoid covert recording or piracy.
Background
Speaker recognition has been widely used in voice security systems and voice authorization systems and has become an indispensable feature of contemporary technology products. Existing speech recognition systems mainly use a microphone outside the human body to collect sound; the received sound is the sound wave emitted from the speaker's mouth and transmitted through the external air medium, so existing speaker recognition carries the risk of being covertly recorded or pirated by malicious parties. In detail, a party A can follow a person B and covertly record B's voice, eavesdrop on B's voice, or even forge B's voice using speech synthesis, and store B's voice in a recorder in advance; when A wants to pass the identity verification of a voice access control system or voice authorization system, A can play back B's voice with the recorder to pass the verification and thereby impersonate B, possibly causing B financial losses or even endangering B's life and property. The prior art therefore needs improvement.
Summary of the Invention
Accordingly, a main objective of the present application is to provide a speaker recognition system and a speaker recognition method that can avoid covert recording or piracy, so as to improve on the disadvantages of the prior art.
To solve the above technical problem, the present application provides a speaker recognition system, including an in-ear device to be placed in a user's external auditory canal, the in-ear device including a sound receiver for receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave, and an audio processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and a terminal device for determining, according to the voiceprint feature signal, whether the user is an authenticated user.
For example, the in-ear device is a wired or wireless in-ear earphone, in-ear headset, earplug, or hearing aid.
For example, the audio processing module performs a voice detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
For example, the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
For example, the terminal device is a mobile electronic device, a computer host, or an access control system.
For example, the terminal device establishes a voiceprint model corresponding to the authenticated user, receives a voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal against the voiceprint model to generate a similarity signal; the terminal device determines, according to the similarity signal, whether the user is the authenticated user.
For example, the audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
For example, the physiological detection operation is a breath detection operation and the physiological detection result is a breath detection result.
For example, the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
The present application further provides a speaker recognition method applied to a speaker recognition system, the speaker recognition system including an in-ear device and a terminal device, the in-ear device including a sound receiver and an audio processing module, the in-ear device being placed in an external auditory canal of a user. The speaker recognition method includes: the sound receiver receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; the audio processing module extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and the terminal device determining, according to the voiceprint feature signal, whether the speaking party of the speaker recognition system is the user; the speaking party of the speaker recognition system is the person or device that emits sound toward the speaker recognition system for voiceprint recognition.
The present application uses an in-ear device to collect sound, i.e., to receive the ear canal sound waves in the user's external auditory canal, uses the audio processing module in the in-ear device to extract the user's voiceprint features, and uses the terminal device to perform voiceprint comparison to determine whether the speaking party of the speaker recognition system is the user. Compared with the prior art, the present application avoids the risk of being covertly recorded or pirated by malicious parties.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the appearance of a speaker recognition system according to an embodiment of the present application.
FIG. 2 is a functional block diagram of the speaker recognition system of FIG. 1.
FIG. 3 is a schematic diagram of a voiceprint recognition process according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a voiceprint feature extraction process according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a voiceprint comparison process according to an embodiment of the present application.
FIG. 6 is a schematic diagram of a voiceprint recognition process according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a voiceprint recognition process according to an embodiment of the present application.
FIG. 8 is a functional block diagram of a speaker recognition system according to an embodiment of the present application.
FIG. 9 is a schematic diagram of a voiceprint recognition process according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.
The human vocalization process is as follows: when breathing through the lungs, the airflow passes through the narrow glottis and the vocal cord mucosa vibrates; this vibration makes the nearby air medium vibrate and forms compression waves, i.e., sound waves. These sound waves resonate in organs such as the pharynx, oral cavity, nasal cavity, and sinuses to gain volume, and are then shaped by organs such as the lips, teeth, and tongue into the sound heard by an external listener. Existing speaker recognition mainly receives the sound waves emitted from the speaker's mouth and transmitted through the air medium outside the body to a microphone outside the human body; for security systems that rely on speaker recognition (such as voice access control systems and voice payment systems), existing speaker recognition therefore carries the risk of being covertly recorded or pirated by malicious parties.
However, in addition to the sound emitted from the oral cavity into the external air medium, the sound waves produced by the vocal cord mucosa are also transmitted through the Eustachian tube to the internal auditory meatus (Internal Auditory Meatus) and even the external auditory meatus (External Auditory Meatus), and the sound waves in the external auditory canal (ear canal sound waves) have acoustic characteristics different from those of sound waves received by a microphone outside the human body. In other words, even for the same speaker, the ear canal sound waves have acoustic characteristics different from covertly recorded or pirated sound waves. Therefore, the speaker recognition system of the present application collects sound in the user's external auditory canal, extracts the voiceprint features of the ear canal sound waves, and performs speaker recognition on those voiceprint features, so as to avoid the risk of the user's voice being covertly recorded or pirated.
Specifically, please refer to FIG. 1 and FIG. 2, which are respectively a schematic diagram of the appearance and a functional block diagram of a speaker recognition system 10 according to an embodiment of the present application. The speaker recognition system 10 includes an in-ear (canal-type) device 100 and a terminal device 120. The terminal device 120 can be a computer host, mobile electronic device, or access control system with computing capability, and the in-ear device 100 can be placed in an external auditory canal (External Acoustic Meatus) of a user USR; it can be one of an in-ear earphone (Earphone), an in-ear headset (Headset), an earplug (Earplug), or a hearing aid (Hearing Aid). The in-ear device 100 can include a sound receiver 102, a speaker (Speaker) 104, and an audio processing module 106. The sound receiver 102 can be a microphone (Microphone) for receiving an ear canal sound wave CWV from the external auditory canal of the user USR and converting the ear canal sound wave CWV into an ear canal acoustic signal CSg; that is, the sound receiver 102 can generate the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV. The audio processing module 106 is coupled to the sound receiver 102 and extracts a voiceprint feature (Voiceprint Feature) corresponding to the user USR from the ear canal acoustic signal CSg to generate a voiceprint feature signal VPF, where the voiceprint feature signal VPF includes the voiceprint features of the user USR. The in-ear device 100 can transmit the voiceprint feature signal VPF to the terminal device 120 through wired or wireless transmission.
Generally, the terminal device 120 can determine, according to the voiceprint feature signal it receives, whether the speaking party is the user USR, another person, or even a recorder in which the user USR's voice has been recorded in advance, where the speaking party of the speaker recognition system 10 refers to the person or device (such as a recorder or a device with speech synthesis capability) that emits sound toward the speaker recognition system 10 for voiceprint recognition. In other words, the terminal device 120 can determine, according to the voiceprint feature signal it receives, whether the user USR is an authenticated user. In the ideal case, the terminal device 120 receives the voiceprint feature signal VPF generated by the in-ear device 100 and determines from the voiceprint feature signal VPF that the user USR is indeed an authenticated user.
The operation of the speaker recognition system 10 can be summarized as a voiceprint recognition process. Please refer to FIG. 3, which is a schematic diagram of a voiceprint recognition process 30 according to an embodiment of the present application. The voiceprint recognition process 30 can be performed by the speaker recognition system 10 and includes the following steps:
Step 302: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 304: The audio processing module 106 of the in-ear device 100 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 306: The terminal device 120 determines, according to the voiceprint feature signal VPF, whether the user USR is an authenticated user.
In step 304, for the details of how the audio processing module 106 extracts the voiceprint feature corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF, refer to FIG. 4, which is a schematic diagram of a voiceprint feature extraction process 40; the voiceprint feature extraction process 40 is performed by the audio processing module 106 of the in-ear device 100. As shown in FIG. 4, the audio processing module 106 can perform a voice detection operation, a noise suppression operation, and a feature extraction operation on the ear canal acoustic signal CSg to generate the voiceprint feature signal VPF; these operations are not limited to any specific algorithm, and their technical details are well known to those skilled in the art and are not described further here. Note that the voice detection, noise suppression, and feature extraction operations in the voiceprint feature extraction process 40 are all performed by the audio processing module 106 disposed in the in-ear device 100; that is, the voiceprint feature signal VPF is generated by the audio processing module 106 disposed in the in-ear device 100. After generating the voiceprint feature signal VPF, the audio processing module 106 can transmit it to the terminal device 120 through wired or wireless transmission.
In step 306, for the details of how the terminal device 120 determines whether the user USR is an authenticated user according to the voiceprint feature signal VPF, refer to FIG. 5, which is a schematic diagram of a voiceprint comparison process 50; the voiceprint comparison process 50 is performed by the terminal device 120 outside the human body. As shown in FIG. 5, the terminal device 120 can first establish a voiceprint model MD corresponding to the authenticated user according to the voiceprint feature signal VPF; after establishing the voiceprint model MD, it compares the voiceprint feature signal VPF with the voiceprint model MD to perform voiceprint matching and, according to the matching result, generates a similarity score SC, where the similarity score SC represents the degree of similarity between the voiceprint feature signal VPF and the voiceprint model MD and can serve as a similarity signal. In detail, the terminal device 120 can, at a first time t1, establish the voiceprint model MD corresponding to the authenticated user; that is, at the first time t1 it receives a first voiceprint feature signal VPF1 generated by the audio processing module 106 and corresponding to the authenticated user (VPF1 denotes the voiceprint feature signal VPF at the first time t1), and establishes the voiceprint model MD corresponding to the authenticated user according to the first voiceprint feature signal VPF1. After the voiceprint model MD has been established, the terminal device 120 can, at a second time t2, receive a second voiceprint feature signal VPF2 generated by the audio processing module 106 (VPF2 denotes the voiceprint feature signal VPF at the second time t2); the terminal device 120 can then compare the second voiceprint feature signal VPF2 with the voiceprint model MD to perform voiceprint matching and generate the similarity score SC according to the matching result.
After generating the similarity score SC, the terminal device 120 can determine, according to the similarity score SC, whether the user USR is an authenticated user; that is, the terminal device 120 performs the "identify identity" step in FIG. 5. In one embodiment, when the similarity score SC is greater than a specific value, the terminal device 120 can determine that the user USR is indeed an authenticated user. In addition, the "establish voiceprint model", "voiceprint matching", and "obtain similarity score" steps in FIG. 5 are not limited to any specific algorithm; their technical details are well known to those skilled in the art and are not described further here.
In short, the speaker recognition system 10 uses the in-ear device 100 to receive the ear canal sound wave CWV, uses the audio processing module 106 to extract the voiceprint features corresponding to the user USR, and uses the terminal device 120 to determine, according to the voiceprint feature signal VPF, whether the user USR is an authenticated user.
Existing speaker recognition systems all use microphones outside the human body to collect sound and thus carry the risk of covert recording or piracy; a malicious party may even use speech synthesis to produce a voice whose voiceprint is similar to that of the user USR and thereby break a security system that relies on speaker recognition (such as a voice access control system, hereinafter a voice security system) or a voice authorization system (a system that uses speaker recognition to confirm the speaker's identity before authorizing the next action, such as a voice payment system, voice transfer transaction system, voice credit card transaction system, or voice login system). In contrast, the speaker recognition system 10 collects sound in the external auditory canal of the user USR and performs voiceprint recognition on the voiceprint features of the ear canal sound wave CWV. Because ear canal sound waves have acoustic characteristics different from sound waves received by a microphone outside the body, a malicious party cannot break a voice security system equipped with the speaker recognition system 10 through covert recording, piracy, or speech synthesis, which further enhances the security of the voice security system or voice authorization system.
Furthermore, when the human body breathes through the lungs, respiratory sound waves (with a specific respiratory frequency) are still produced in the external auditory canal, and these respiratory sound waves are contained in the ear canal sound wave CWV. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains respiratory sound waves, i.e., perform a physiological detection operation on the ear canal acoustic signal CSg, to confirm that the speaking party of the speaker recognition system 10 is a natural person with physiological characteristics rather than a device such as a recorder or speech synthesizer; the physiological detection operation can be a breath detection operation or even a heart rate detection operation.
Specifically, please refer to FIG. 6, which is a schematic diagram of a voiceprint recognition process 60 according to an embodiment of the present application. The voiceprint recognition process 60 may be performed by the speaker recognition system 10 and includes the following steps:
Step 602: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 603: The audio processing module 106 of the in-ear device 100 performs a physiological detection operation on the ear canal acoustic signal CSg to generate a physiological detection result Bio.
Step 604: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 606: The terminal device 120 determines, from the voiceprint feature signal VPF and the physiological detection result Bio, whether the user USR is the authenticated user.
The voiceprint recognition process 60 is similar to the voiceprint recognition process 30; unlike the voiceprint recognition process 30, it further includes step 603. In step 603, the audio processing module 106 is not limited to any particular algorithm for performing the respiration detection operation on the ear canal acoustic signal CSg; for example, the audio processing module 106 may detect from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains respiratory sound waves at a specific respiratory frequency, but is not limited thereto. The technical details of respiration detection are well known to those skilled in the art and are not repeated here. Taking a respiration detection result as an example of the physiological detection result Bio, the result Bio may be a binary value representing that "breathing" or "no breathing" has been detected; when the result Bio indicates "breathing", the speaking end of the speaker recognition system 10 is a natural person. Alternatively, the result Bio may be a non-binary value, such as a gray-level value, representing the confidence level with which "breathing" (or "no breathing") was detected, or the specific respiratory frequency and characteristics of the user USR.
In step 606, the terminal device 120 determines, from the voiceprint feature signal VPF and the physiological detection result Bio, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In an embodiment, when the physiological detection result Bio indicates that "breathing" is detected and the similarity score SC is greater than the specific value, the terminal device 120 may determine that the user USR is indeed the authenticated user.
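A minimal sketch of the step-606 decision follows, assuming the physiological detection result Bio is supplied either as a binary value or as a confidence level as described above; the function name and both threshold values are illustrative assumptions of this sketch.

```python
def is_live_authenticated_user(sc: float, bio, sc_threshold: float = 0.8,
                               bio_threshold: float = 0.5) -> bool:
    """Step 606 as a conjunction of voiceprint match and liveness.

    `bio` may be a boolean ("breathing" / "no breathing") or a
    confidence level in [0, 1], matching the two forms described above.
    """
    breath_ok = bio if isinstance(bio, bool) else bio > bio_threshold
    return breath_ok and sc > sc_threshold
```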
In addition, a voice security system or a voice authorization system usually involves a question-and-answer dialogue. For example, the bank (or a credit card center or payment system center, hereinafter referred to as the customer service end) may ask over a voice call, "What is your account number?", and the user may answer "123456789". The question from the customer service end can be played into the external auditory canal of the user USR through the speaker 104, in which case the ear canal sound wave CWV may contain a reflected wave of the question sound wave. The audio processing module 106 in the in-ear device 100 can therefore determine from the ear canal acoustic signal CSg whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the question sound wave, to generate a reflected wave detection result. When the reflected wave detection result shows that the ear canal sound wave CWV contains the reflected sound wave, the speaking end of the speaker recognition system 10 is a natural person rather than a device such as a recorder or a speech synthesizer, which rules out the possibility that the speaking end of the speaker recognition system 10 is such a device. The question sound wave can broadly be regarded as a prompt sound wave, after which the user USR may start speaking; for example, the customer service end may say over the voice call, "Please state your account number/password after the beep" (i.e. a prompt sentence), and the prompt sound wave may include the sound wave of the prompt sentence or the beep.
In detail, please refer to FIG. 7, which is a schematic diagram of a voiceprint recognition process 70 according to an embodiment of the present application. The voiceprint recognition process 70 may be performed by the speaker recognition system 10 and includes the following steps:
Step 701: The speaker 104 emits a prompt sound wave toward the external auditory canal of the user USR.
Step 702: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 703: The audio processing module 106 of the in-ear device 100 determines, from the ear canal acoustic signal CSg, whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave, to generate a reflected wave detection result Rf.
Step 704: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 706: The terminal device 120 determines, from the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the user USR is the authenticated user.
The voiceprint recognition process 70 is similar to the voiceprint recognition process 30; unlike the voiceprint recognition process 30, it further includes steps 701 and 703. In step 703, the audio processing module 106 is not limited to any particular algorithm for determining whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave; for example, since the external auditory canal of the human body falls within a certain range of canal lengths, the audio processing module 106 may use this ear canal length range to determine whether the ear canal sound wave CWV contains a reflected sound wave corresponding to the prompt sound wave. The technical details of in-canal physiological detection operations (such as respiration detection or heart rate detection) are well known to those skilled in the art and are not repeated here. The reflected wave detection result Rf may be a binary value representing "reflected wave present" or "no reflected wave"; when the result Rf indicates "reflected wave present", the speaking end of the speaker recognition system 10 is a natural person.
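For illustration only, the following sketch checks for a reflection of the prompt sound wave by cross-correlating the played prompt with the ear canal acoustic signal CSg and searching for a correlation peak inside the round-trip delay window implied by an assumed ear canal length range; the speed of sound, length range, and correlation threshold are assumptions of this sketch, and the direct speaker-to-microphone path is ignored for simplicity.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, room-temperature air (assumption)

def detect_prompt_reflection(prompt: np.ndarray, csg: np.ndarray, fs: int,
                             canal_len_m=(0.02, 0.04),
                             min_corr: float = 0.1) -> bool:
    """Look for an echo of the prompt sound wave inside the ear canal signal.

    The round-trip delay for a canal of length L is roughly 2*L/c, so the
    correlation peak is searched only inside the lag window implied by the
    assumed ear canal length range.
    """
    lo = int(2 * canal_len_m[0] / SPEED_OF_SOUND * fs)
    hi = max(lo + 1, int(2 * canal_len_m[1] / SPEED_OF_SOUND * fs))
    # Normalized cross-correlation of the recorded signal against the prompt.
    corr = np.correlate(csg, prompt, mode="full")
    corr = corr / (np.linalg.norm(csg) * np.linalg.norm(prompt) + 1e-12)
    zero_lag = len(prompt) - 1          # index of lag 0 in the 'full' output
    window = corr[zero_lag + lo: zero_lag + hi + 1]
    return bool(np.max(np.abs(window)) > min_corr)   # "reflected wave" present?
```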
In step 706, the terminal device 120 determines, from the voiceprint feature signal VPF and the reflected wave detection result Rf, whether the speaking end of the speaker recognition system 10 is the user USR himself or herself. In an embodiment, when the reflected wave detection result Rf indicates "reflected wave present" and the similarity score SC is greater than the specific value, the terminal device 120 may determine that the user USR is indeed the authenticated user.
In addition, in an embodiment, the speaker recognition system of the present application may use a personal electronic device such as a smartphone to perform a voice changing operation on the voiceprint feature signal VPF generated by the in-ear device 100, and the terminal device 120 then performs speaker recognition based on the voice-changed voiceprint feature signal, i.e. it determines from the voice-changed voiceprint feature signal whether the speaking end of the speaker recognition system is the user USR. In other words, only when the user USR holds the personal electronic device can he or she pass the speaker recognition verification performed by the terminal device 120, which further increases the security of the voice security system or the voice authorization system.
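The present application does not limit the voice changing operation to any particular algorithm; the following sketch merely illustrates one key-dependent possibility, in which the personal electronic device applies a deterministic orthogonal transform seeded by a device key to the voiceprint feature signal VPF, so that the transformed features remain matchable only against a model built with the same key; the function name and key scheme are assumptions of this sketch.

```python
import numpy as np

def voice_change(vpf: np.ndarray, device_key: int) -> np.ndarray:
    """Key-dependent 'voice changing' of the voiceprint feature signal VPF.

    A random orthogonal matrix seeded by the device key rotates/reflects
    the feature space deterministically, so VPF' is of little use without
    the same key, yet distances (and cosine similarities) are preserved
    for matching against a model built from features transformed with the
    same key.
    """
    rng = np.random.default_rng(device_key)
    # QR decomposition of a random matrix yields an orthogonal transform.
    q, _ = np.linalg.qr(rng.standard_normal((vpf.size, vpf.size)))
    return q @ vpf
```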
Specifically, please refer to FIG. 8, which is a functional block diagram of a speaker recognition system 80 according to an embodiment of the present application. The speaker recognition system 80 is similar to the speaker recognition system 10; unlike the speaker recognition system 10, it further includes a personal electronic device 800, which may be a smart wearable device, a smartphone, a tablet computer, a personal computer, or another personal electronic device. The personal electronic device 800 receives the voiceprint feature signal VPF generated by the in-ear device 100, performs a voice changing operation on the voiceprint feature signal VPF to generate a voice-changed voiceprint feature signal VPF', and transmits the voice-changed voiceprint feature signal VPF' to the terminal device 120, which performs speaker recognition based on the voice-changed voiceprint feature signal VPF'.
The operation of the speaker recognition system 80 can be summarized as a voiceprint recognition process. Please refer to FIG. 9, which is a schematic diagram of a voiceprint recognition process 90 according to an embodiment of the present application. The voiceprint recognition process 90 may be performed by the speaker recognition system 80 and includes the following steps:
Step 902: The sound receiver 102 of the in-ear device 100 receives the ear canal sound wave CWV from the external auditory canal of the user USR and generates the ear canal acoustic signal CSg corresponding to the ear canal sound wave CWV.
Step 904: The audio processing module 106 of the in-ear device 100 extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal CSg and generates the voiceprint feature signal VPF.
Step 905: The personal electronic device 800 performs a voice changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF'.
Step 906: The terminal device 120 determines, from the voice-changed voiceprint feature signal VPF', whether the user USR is the authenticated user.
The voiceprint recognition process 90 is similar to the voiceprint recognition process 30; unlike the voiceprint recognition process 30, it further includes step 905. In step 905, the personal electronic device 800 is not limited to any particular algorithm for performing the voice changing operation on the voiceprint feature signal VPF to generate the voice-changed voiceprint feature signal VPF', i.e. to encrypt the voiceprint feature signal VPF; its technical details are well known to those skilled in the art and are not repeated here.
In step 906, the terminal device 120 may first build, from the voice-changed voiceprint feature signal VPF', a voiceprint model MD' corresponding to the user USR and the personal electronic device 800. After the voiceprint model MD' is built, the terminal device 120 compares the voice-changed voiceprint feature signal VPF' with the voiceprint model MD' to perform "voiceprint matching" and, based on the matching result, generates a similarity score SC' representing the degree of similarity between the voice-changed voiceprint feature signal VPF' and the voiceprint model MD'. For the remaining operational details, reference may be made to the related paragraphs above, and they are not repeated here.
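A minimal sketch of the enrollment side of step 906, assuming the voiceprint model MD' is simply the average of several voice-changed enrollment feature vectors and that matching reuses a cosine-style similarity such as the similarity_score sketch given earlier; these choices are assumptions of this sketch, not limitations of the present application.

```python
import numpy as np

def build_voiceprint_model(enroll_vpf_changed) -> np.ndarray:
    """Enrollment: average several voice-changed feature vectors VPF'
    into a single model vector MD' for the user/personal-device pair."""
    return np.mean(np.stack(list(enroll_vpf_changed)), axis=0)

# Verification then computes SC' = similarity_score(vpf_changed, md_changed)
# and compares it with the terminal device's threshold, as in FIG. 5.
```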
It should be noted that the foregoing embodiments are intended to illustrate the concept of the present application, and those skilled in the art may make various modifications accordingly; the present application is not limited thereto. For example, the terminal device 120 is not limited to a computer host: as long as the terminal device 120 is an electronic device capable of performing the voiceprint comparison process 50 shown in FIG. 5, such as a cloud server or even a mobile electronic device (e.g. a mobile phone or tablet computer), it meets the requirements of the present application and falls within its scope. In addition, the audio processing module is not limited to being disposed in the in-ear device; it may also be disposed in the terminal device, in which case the in-ear device only needs to send the ear canal acoustic signal to the terminal device, and the audio processing module in the terminal device extracts the voiceprint features corresponding to the user USR from the ear canal acoustic signal, which likewise meets the requirements of the present application and falls within its scope.
In summary, the speaker recognition system of the present application uses an in-ear device to pick up sound, receiving the ear canal sound wave from the external auditory canal of the user; uses the audio processing module in the in-ear device to extract the voiceprint features of the user; and uses the terminal device to perform voiceprint comparison on the voiceprint feature signal to determine whether the speaking end of the speaker recognition system is the user himself or herself. Compared with the prior art, the present application avoids the risk of being covertly recorded or pirated by an attacker.
The above description is only the preferred embodiment of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (21)

  1. A speaker recognition system, comprising:
    an in-ear device for being placed in an external auditory canal of a user, the in-ear device comprising:
    a sound receiver for receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; and
    an audio processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and
    a terminal device for determining, according to the voiceprint feature signal, whether the user is an authenticated user.
  2. The speaker recognition system of claim 1, wherein the in-ear device is a wired or wireless in-ear earphone, in-ear earphone microphone, earplug, or hearing aid.
  3. The speaker recognition system of claim 1, wherein the audio processing module performs a speech detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
  4. The speaker recognition system of claim 3, wherein the audio processing module performs a noise suppression operation on the ear canal acoustic signal.
  5. The speaker recognition system of claim 1, wherein the terminal device is a mobile electronic device, a computer host, or an access control system.
  6. The speaker recognition system of claim 1, wherein the terminal device builds a voiceprint model corresponding to the authenticated user, receives the voiceprint feature signal from the audio processing module, and compares the voiceprint feature signal against the voiceprint model to generate a similarity signal, and the terminal device determines, according to the similarity signal, whether the user is the authenticated user.
  7. The speaker recognition system of claim 1, wherein the audio processing module performs a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result, and the terminal device determines, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
  8. The speaker recognition system of claim 7, wherein the physiological detection operation is a respiration detection operation and the physiological detection result is a respiration detection result.
  9. The speaker recognition system of claim 7, wherein the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
  10. The speaker recognition system of claim 1, wherein the in-ear device further comprises:
    a speaker for emitting a first sound wave toward the external auditory canal;
    wherein the audio processing module determines, according to the ear canal acoustic signal, whether the ear canal sound wave contains a reflected sound wave corresponding to the first sound wave, to generate a reflected wave detection result, and the terminal device determines, according to the voiceprint feature signal and the reflected wave detection result, whether the user is the authenticated user.
  11. The speaker recognition system of claim 1, further comprising:
    a personal electronic device for receiving the voiceprint feature signal from the in-ear device and performing a voice changing operation on the voiceprint feature signal to generate a voice-changed voiceprint feature signal;
    wherein the terminal device determines, according to the voice-changed voiceprint feature signal generated by the personal electronic device, whether the user is the authenticated user.
  12. A speaker recognition method, applied to a speaker recognition system, the speaker recognition system comprising an in-ear device and a terminal device, the in-ear device comprising a sound receiver and an audio processing module, the in-ear device being placed in an external auditory canal of a user, wherein the speaker recognition method comprises:
    the sound receiver receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave;
    the audio processing module extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal; and
    the terminal device determining, according to the voiceprint feature signal, whether the user is an authenticated user.
  13. The speaker recognition method of claim 12, wherein the step of the audio processing module extracting the voiceprint feature corresponding to the user from the ear canal acoustic signal to generate the voiceprint feature signal comprises:
    the audio processing module performing a speech detection operation and a feature extraction operation on the ear canal acoustic signal to generate the voiceprint feature signal.
  14. The speaker recognition method of claim 13, wherein the step of the audio processing module extracting the voiceprint feature corresponding to the user from the ear canal acoustic signal to generate the voiceprint feature signal further comprises:
    the audio processing module performing a noise suppression operation on the ear canal acoustic signal.
  15. The speaker recognition method of claim 12, wherein the step of the terminal device determining, according to the voiceprint feature signal, whether the user is the authenticated user comprises:
    the terminal device building a voiceprint model corresponding to the authenticated user;
    the terminal device receiving the voiceprint feature signal from the audio processing module and comparing the voiceprint feature signal against the voiceprint model to generate a similarity score; and
    the terminal device determining, according to the similarity score, whether the user is the authenticated user.
  16. The speaker recognition method of claim 12, further comprising:
    the audio processing module performing a physiological detection operation on the ear canal acoustic signal to generate a physiological detection result; and
    the terminal device determining, according to the voiceprint feature signal and the physiological detection result, whether the user is the authenticated user.
  17. The speaker recognition method of claim 16, wherein the physiological detection operation is a respiration detection operation and the physiological detection result is a respiration detection result.
  18. The speaker recognition method of claim 16, wherein the physiological detection operation is a heart rate detection operation and the physiological detection result is a heart rate detection result.
  19. The speaker recognition method of claim 12, wherein the in-ear device comprises a speaker, and the speaker recognition method further comprises:
    the speaker emitting a first sound wave toward the external auditory canal;
    the audio processing module determining, according to the ear canal acoustic signal, whether the ear canal sound wave contains a reflected sound wave corresponding to the first sound wave, to generate a reflected wave detection result; and
    the terminal device determining, according to the voiceprint feature signal and the reflected wave detection result, whether the user is the authenticated user.
  20. The speaker recognition method of claim 12, wherein the speaker recognition system comprises a personal electronic device, and the speaker recognition method further comprises:
    the personal electronic device performing a voice changing operation on the voiceprint feature signal to generate a voice-changed voiceprint feature signal; and
    the terminal device determining, according to the voice-changed voiceprint feature signal generated by the personal electronic device, whether the user is the authenticated user.
  21. An in-ear device for speaker recognition, for being placed in an external auditory canal of a user, comprising:
    a sound receiver for receiving an ear canal sound wave from the external auditory canal to generate an ear canal acoustic signal corresponding to the ear canal sound wave; and
    an audio processing module, coupled to the sound receiver, for extracting a voiceprint feature corresponding to the user from the ear canal acoustic signal to generate a voiceprint feature signal and transmitting the voiceprint feature signal to an external terminal.
PCT/CN2017/091466 2017-07-03 2017-07-03 Speaker recognition system, speaker recognition method, and in-ear device WO2019006587A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780000606.7A CN110100278B (en) 2017-07-03 2017-07-03 Speaker recognition system, speaker recognition method and in-ear device
PCT/CN2017/091466 WO2019006587A1 (en) 2017-07-03 2017-07-03 Speaker recognition system, speaker recognition method, and in-ear device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/091466 WO2019006587A1 (en) 2017-07-03 2017-07-03 Speaker recognition system, speaker recognition method, and in-ear device

Publications (1)

Publication Number Publication Date
WO2019006587A1

Family

ID=64949595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091466 WO2019006587A1 (en) 2017-07-03 2017-07-03 Speaker recognition system, speaker recognition method, and in-ear device

Country Status (2)

Country Link
CN (1) CN110100278B (en)
WO (1) WO2019006587A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643707A (en) * 2020-04-23 2021-11-12 华为技术有限公司 Identity verification method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2071856U (en) * 1990-05-22 1991-02-20 查苕章 Earplug type telephone receiver and transmiter
JP2003058190A (en) * 2001-08-09 2003-02-28 Mitsubishi Heavy Ind Ltd Personal authentication system
CN101042869A (en) * 2006-03-24 2007-09-26 致胜科技股份有限公司 Nasal bone conduction living body sound-groove identification apparatus
CN101541238A (en) * 2007-01-24 2009-09-23 松下电器产业株式会社 Biological information measurement device and method of controlling the same
JP2010086328A (en) * 2008-09-30 2010-04-15 Yamaha Corp Authentication device and cellphone
CN203984682U (en) * 2013-11-29 2014-12-03 华北电力大学 A kind of auditory prosthesis for special object

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101442933A (en) * 2005-10-07 2009-05-27 皇家飞利浦电子股份有限公司 Ear-thermometer with ear identification
JP4937661B2 (en) * 2006-07-31 2012-05-23 ナップエンタープライズ株式会社 Mobile personal authentication method and electronic commerce method
US8622919B2 (en) * 2008-11-17 2014-01-07 Sony Corporation Apparatus, method, and computer program for detecting a physiological measurement from a physiological sound signal
CN102142254A (en) * 2011-03-25 2011-08-03 北京得意音通技术有限责任公司 Voiceprint identification and voice identification-based recording and faking resistant identity confirmation method
US10154818B2 (en) * 2014-12-24 2018-12-18 Samsung Electronics Co., Ltd. Biometric authentication method and apparatus

Also Published As

Publication number Publication date
CN110100278B (en) 2023-09-22
CN110100278A (en) 2019-08-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17916959; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17916959; Country of ref document: EP; Kind code of ref document: A1)