CN115035886A - Voiceprint recognition method and electronic equipment

Voiceprint recognition method and electronic equipment

Info

Publication number
CN115035886A
Authority
CN
China
Prior art keywords
electronic device
voice signal
voiceprint information
signal
voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111094139.3A
Other languages
Chinese (zh)
Other versions
CN115035886B (en)
Inventor
孙运平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202111094139.3A
Publication of CN115035886A
Application granted
Publication of CN115035886B

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 - Speech classification or search
    • G10L15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 - Hidden Markov Models [HMMs]
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/24 - Speech recognition using non-acoustical features
    • G10L15/26 - Speech to text systems
    • G10L2015/223 - Execution procedure of a spoken command
    • G10L2015/226 - Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application provides a voiceprint recognition method and an electronic device. The method comprises the following steps: after receiving a target voice signal and a first muscle vibration signal, the electronic device separates the target voice signal into a first voice signal and a second voice signal. Parameter extraction and model calculation are then performed on each voice signal together with the first muscle vibration signal, yielding first voiceprint information with a corresponding confidence and second voiceprint information with a corresponding confidence. Each confidence is compared with the registered-voiceprint confidence threshold, from which it is determined that the first voiceprint information belongs to the registered user and that the second voiceprint information does not. Finally, the first voice signal, corresponding to the first voiceprint information that belongs to the registered user, is sent to another electronic device. In this way, provided the registered user wears the wearable electronic device correctly, the voice signal uttered by the registered user is identified through the combined action of the voice signal and the muscle vibration signal.

Description

Voiceprint recognition method and electronic equipment
Technical Field
The present application relates to the field of wearable electronic devices, and in particular, to a voiceprint recognition method and an electronic device.
Background
With the development of wearable electronic device technology, wearable electronic devices have come into wide use in fields such as gaming, entertainment, and education. Accurate voice recognition is a core capability in this field: it is of great significance when a wearable electronic device is used together with terminals or display devices, and it gives the user a better experience when using the device.
Existing voice recognition on wearable electronic devices collects sound through a microphone, and such devices have a wide range of applications. For example, a wearable electronic device can be interconnected with a mobile phone: the wearable device recognizes the user's voice command and sends it to the phone, which then performs a series of operations. In practice, however, the user cannot control the surrounding environment and may find themselves in a noisy or multi-speaker acoustic environment. Existing wearable electronic devices have a low wake-up rate in such environments; the wake-up word may need to be spoken several times before the device wakes, or the device may be falsely woken by someone other than the user.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a voiceprint recognition method and an electronic device. In the method, the electronic device derives voiceprint information from a received voice signal and muscle vibration signal, compares the confidence of that voiceprint information with the registered-voiceprint confidence threshold to determine whether it belongs to the registered user, and sends the voice signal corresponding to the registered user's voiceprint information to another electronic device. This improves the wake-up rate of the electronic device, reduces the false wake-up rate, and improves the user experience.
In a first aspect, the present application provides a voiceprint recognition method comprising the following steps. The electronic device acquires a target voice signal and a first muscle vibration signal, where the target voice signal comprises a first voice signal and a second voice signal. The electronic device separates the target voice signal to obtain the first voice signal and the second voice signal. The electronic device then acquires first voiceprint information based on the first voice signal and the first muscle vibration signal, and second voiceprint information based on the second voice signal and the first muscle vibration signal. Next, the electronic device obtains a confidence for the first voiceprint information and a confidence for the second voiceprint information. On detecting that the confidence of the first voiceprint information is greater than or equal to the registered-voiceprint confidence threshold, the electronic device determines that the first voiceprint information belongs to the registered user; on detecting that the confidence of the second voiceprint information is less than the threshold, it determines that the second voiceprint information does not belong to the registered user. Finally, the electronic device sends the first voice signal corresponding to the first voiceprint information to another electronic device. In this way, combining the first muscle vibration signal with each voice signal yields different voiceprint information, and during voiceprint judgment only the voiceprint information formed from the registered user's voice signal and the registered user's muscle vibration signal passes the comparison with the registered-voiceprint confidence threshold; that is, the voiceprint information belonging to the registered user can be found.
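The separation, scoring, and forwarding decision described in this aspect can be illustrated with a minimal Python sketch. This is not the patented implementation: the function and variable names and the example confidence values are assumptions for illustration only, and the confidences would come from the model calculation described later in this document.

```python
from typing import List, Tuple
import numpy as np

def select_registered_signals(
    candidates: List[Tuple[np.ndarray, float]],  # (separated voice signal, voiceprint confidence)
    threshold: float,                            # registered-voiceprint confidence threshold
) -> List[np.ndarray]:
    """Keep only the voice signals whose voiceprint confidence is greater
    than or equal to the threshold; these belong to the registered user
    and are the ones sent to the other electronic device."""
    return [signal for signal, conf in candidates if conf >= threshold]

# Illustrative values only: 0.91 passes the (assumed) threshold, 0.42 does not.
first_signal, second_signal = np.zeros(16000), np.zeros(16000)
to_send = select_registered_signals(
    [(first_signal, 0.91), (second_signal, 0.42)], threshold=0.80
)  # contains only first_signal
```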
Illustratively, the first voice signal, the second voice signal, and the first muscle vibration signal may be received simultaneously or sequentially; the order in which the signals are received is not limited.
For example, the first voice signal and the first muscle vibration signal may be produced by the registered user, and the second voice signal may be produced by a first user. The first user may be one user or two or more users.
Illustratively, the other electronic device may be a wearable electronic device or a mobile phone.
For example, when generating the voiceprint information, the first voiceprint information may be calculated first, the second voiceprint information may be calculated first, or both may be calculated at the same time. The present application does not limit the calculation order of the voiceprint information.
According to the first aspect, before the step of sending the first voice signal corresponding to the first voiceprint information to the other electronic device, the method further includes: the electronic device filters out the second voice signal corresponding to the second voiceprint information. In this way, after the second voice signal is filtered out, only the first voice signal remains in the target voice signal, producing a clean voice environment. The electronic device then sends only the first voice signal, which belongs to the registered user, to the other electronic device, so that the voice data exchanged during communication between the two electronic devices is clearer, improving the user experience.
According to the first aspect, or any implementation manner of the first aspect above, before the step of acquiring the target voice signal and the first muscle vibration signal, the electronic device is in a screen-off state, and the method further includes: the electronic device acquires a third voice signal and a second muscle vibration signal, and acquires third voiceprint information based on them. The electronic device then obtains a confidence for the third voiceprint information. When the electronic device detects that the confidence of the third voiceprint information is greater than or equal to the registered-voiceprint confidence threshold, it determines that the third voiceprint information belongs to the registered user and displays the desktop. When the confidence is less than the threshold, it determines that the third voiceprint information does not belong to the registered user and remains in the screen-off state. Thus, the electronic device of the present application, which may for example be a wearable electronic device or a mobile phone, stays in the screen-off state when unused; after receiving a third voice signal and a second muscle vibration signal produced by a user speaking a wake-up word, it obtains the user's voiceprint information from the two signals. Through voiceprint comparison, the electronic device either confirms that the voiceprint information belongs to the registered user, in which case it unlocks and displays the desktop, or confirms that it does not, in which case it remains screen-off. This voiceprint recognition method reduces the false wake-up rate of the electronic device and improves the user experience.
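The screen-off wake-up decision above reduces to a simple comparison; a sketch follows, again with assumed function names and return values rather than the patent's actual code.

```python
def wake_decision(third_voiceprint_conf: float, threshold: float) -> str:
    """Display the desktop only when the third voiceprint's confidence
    reaches the registered-voiceprint confidence threshold; otherwise the
    device stays in the screen-off state."""
    if third_voiceprint_conf >= threshold:
        return "unlock_and_display_desktop"
    return "stay_screen_off"
```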
Illustratively, the third voice signal and the second muscle vibration signal may be received simultaneously or sequentially; the order in which the signals are received is not limited.
Illustratively, the electronic device is in a screen-off state, i.e., a state in which the desktop is not displayed.
For example, the electronic device may receive, process, and transmit signals in the off-screen state.
According to the first aspect, or any implementation manner of the first aspect above, after the step of displaying the desktop, the method further includes: the electronic device acquires a fourth voice signal and a third muscle vibration signal, the fourth voice signal instructing the electronic device to start a target application. Based on the fourth voice signal and the third muscle vibration signal, the electronic device starts the target application. With this voiceprint recognition method, the electronic device executes an instruction after receiving the registered user's instruction voice signal and muscle vibration signal, which improves wake-up efficiency and the user experience.
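A similarly hedged sketch of this command step: the command string format, the app name, and the return values are hypothetical, and a real implementation would pair speech recognition of the fourth voice signal with the voiceprint check.

```python
def handle_voice_command(conf: float, threshold: float, command: str) -> str:
    """Start the target application only if the speaker's voiceprint
    confidence shows the command came from the registered user."""
    if conf < threshold:
        return "ignore"  # not the registered user
    if command.startswith("open "):
        return "launch:" + command[len("open "):]  # e.g. "launch:navigation"
    return "unknown_command"
```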
Illustratively, the fourth voice signal and the third muscle vibration signal may be received simultaneously or sequentially; the order in which the signals are received is not limited.
Illustratively, the fourth voice signal and the third muscle vibration signal of the present application belong to a registered user.
According to the first aspect, or any one of the above implementations of the first aspect, before the step of acquiring the target voice signal and the first muscle vibration signal, the method further comprises: the electronic device acquires a fifth voice signal and a fourth muscle vibration signal, then acquires registered voiceprint information based on them, and finally obtains the registered-voiceprint confidence threshold based on the registered voiceprint information. With this voiceprint registration method, the identity of the device holder, i.e., the registered user, can be established. The registered voiceprint information belonging to the registered user can be stored on the electronic device before the device is used. A registered-voiceprint confidence threshold can be obtained from the registered voiceprint information by a statistical method within the model calculation, and this threshold serves as the reference in the voiceprint comparisons of any of the above implementation manners. Specifically, because the registered voiceprint information is obtained from the fifth voice signal combined with the registered user's fourth muscle vibration signal, voiceprint information belonging to the registered user can be obtained more accurately and distinguished from other voiceprint information. The registered user's voiceprint information can be understood as exclusive ID information of the holder of the electronic device.
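The patent says only that the registered-voiceprint confidence threshold is obtained from the registered voiceprint information by a statistical method of model calculation. One plausible statistic, shown below as an assumption rather than the patented method, is to score the registered user's own enrollment utterances against the enrolled model and place the threshold a margin below their mean score.

```python
import numpy as np

def enrollment_threshold(genuine_scores: np.ndarray, margin: float = 2.0) -> float:
    """Derive a confidence threshold from the scores the enrolled model
    assigns to the registered user's own enrollment utterances: the mean
    minus `margin` standard deviations (this statistic is an assumption)."""
    return float(np.mean(genuine_scores) - margin * np.std(genuine_scores))

# Illustrative enrollment scores from repeated wake-word utterances.
scores = np.array([-41.2, -39.8, -40.5, -42.0])
threshold = enrollment_threshold(scores)  # roughly -42.5 here
```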
According to the first aspect, or any implementation manner of the first aspect above, the electronic device is a wearable electronic device.
According to the first aspect, or any one of the above implementations of the first aspect, the wearable electronic device includes a microphone and at least one vibration sensor, the at least one vibration sensor being positioned so that, when the device is worn, it fits against the muscle vibration hot zone of the user's neck.
According to the first aspect, or any implementation manner of the first aspect above, the step of acquiring the target voice signal and the first muscle vibration signal comprises: the wearable electronic device collects the target voice signal through the microphone and collects the first muscle vibration signal through the at least one vibration sensor. By collecting the muscle vibration signal produced when the user speaks through at least one vibration sensor on the wearable electronic device, and processing it together with the voice signal, voiceprint information can be obtained and compared so as to identify the voiceprint information belonging to the registered user. This recognition method helps identify the user quickly, improves the wake-up rate of the electronic device, and reduces the false wake-up rate.
According to the first aspect, or any implementation manner of the first aspect above, the electronic device is a mobile phone.
Illustratively, the electronic device may also be a tablet, a computer, or the like.
For example, the electronic device may receive the above signals from the wearable electronic device and process the signals.
According to the first aspect, or any implementation manner of the first aspect above, the step of acquiring the target voice signal and the first muscle vibration signal comprises: the mobile phone receives the target voice signal and the first muscle vibration signal collected by the wearable electronic device. This achieves the effects described above, which are not repeated here.
In a second aspect, the present application provides an electronic device. The electronic device includes a memory and a processor coupled to each other. The memory stores program instructions that, when executed by the processor, cause the electronic device to perform the following steps: acquire a target voice signal and a first muscle vibration signal; separate the target voice signal to obtain a first voice signal and a second voice signal; acquire first voiceprint information based on the first voice signal and the first muscle vibration signal, and second voiceprint information based on the second voice signal and the first muscle vibration signal; obtain the confidence of the first voiceprint information and the confidence of the second voiceprint information; when the confidence of the first voiceprint information is detected to be greater than or equal to the registered-voiceprint confidence threshold, determine that the first voiceprint information belongs to the registered user; when the confidence of the second voiceprint information is detected to be less than the threshold, determine that the second voiceprint information does not belong to the registered user; and finally, send the first voice signal corresponding to the first voiceprint information to another electronic device.
According to a second aspect, prior to the step of transmitting the first voice signal corresponding to the first voiceprint information to the further electronic device, the program instructions, when executed by the processor, cause the electronic device to perform the steps of: and filtering a second voice signal corresponding to the second voiceprint information.
According to the second aspect, or any implementation of the second aspect above, before the step of acquiring the target voice signal and the first muscle vibration signal, the program instructions, when executed by the processor, cause the electronic device to perform the following steps: acquire a third voice signal and a second muscle vibration signal; acquire third voiceprint information based on them; obtain the confidence of the third voiceprint information; when the confidence of the third voiceprint information is detected to be greater than or equal to the registered-voiceprint confidence threshold, determine that the third voiceprint information belongs to the registered user and display the desktop; when the confidence is less than the threshold, determine that the third voiceprint information does not belong to the registered user and remain in the screen-off state.
According to a second aspect, or any implementation manner of the second aspect above, after the step of displaying the desktop by the electronic device, the program instructions, when executed by the processor, cause the electronic device to perform the steps of: and acquiring a fourth voice signal and a third muscle vibration signal, wherein the fourth voice signal is used for indicating the electronic equipment to start the target application. Based on the fourth voice signal and the third muscle vibration signal, the target application is launched.
According to the second aspect, or any implementation of the second aspect above, before the step of acquiring the target voice signal and the first muscle vibration signal, the program instructions, when executed by the processor, cause the electronic device to perform the following steps: acquire a fifth voice signal and a fourth muscle vibration signal; acquire registered voiceprint information based on them; and obtain the registered-voiceprint confidence threshold based on the registered voiceprint information.
According to a second aspect, or any implementation manner of the second aspect above, the electronic device may be a wearable electronic device.
According to the second aspect, or any implementation manner of the second aspect above, the wearable electronic device may include a microphone and at least one vibration sensor, the at least one vibration sensor being positioned so that, when the device is worn, it fits against the muscle vibration hot zone of the user's neck.
According to the second aspect, or any implementation of the second aspect above, the program instructions, when executed by the processor, cause the wearable electronic device to perform the following steps: collect the target voice signal through the microphone, and collect the first muscle vibration signal through the at least one vibration sensor.
According to a second aspect or any implementation manner of the second aspect above, the electronic device is a mobile phone.
According to a second aspect, or any implementation of the second aspect above, the program instructions, when executed by the processor, cause the electronic device to perform the steps of: the method for acquiring the target voice signal and the first muscle vibration signal comprises the step of receiving the target voice signal and the first muscle vibration signal acquired by the wearable electronic device.
The second aspect and each implementation manner of the second aspect correspond, respectively, to the first aspect and the corresponding implementation manner of the first aspect. For the technical effects of the second aspect and any implementation manner thereof, reference may be made to the technical effects of the first aspect and the corresponding implementation manner, which are not repeated here.
In a third aspect, the present application provides a computer readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a chip comprising one or more processing circuits and one or more transceiver pins, wherein the transceiver pins and the processing circuits communicate with each other via an internal connection path, and the processing circuits execute the method of the first aspect or any possible implementation manner of the first aspect to control a receiving pin to receive a signal and a sending pin to send a signal.
In a fifth aspect, the present application provides a computer program comprising instructions for carrying out the method of the first aspect or any possible implementation manner of the first aspect.
Drawings
Fig. 1 is a schematic structural diagram of an exemplary illustrated electronic device;
fig. 2 is a first wearing schematic diagram of the exemplary electronic device;
FIG. 3 is a second wear diagram of the exemplary electronic device;
FIG. 4 is a third wear diagram of the exemplary illustrated electronic device;
FIG. 5 is a schematic diagram of a software architecture of an exemplary illustrated electronic device;
FIG. 6 is a schematic diagram of an application scenario in which a cell phone is illustratively shown interacting with an electronic device;
FIG. 7 is a flow diagram illustrating interaction of a cell phone with an electronic device;
FIG. 8 is a schematic diagram of yet another exemplary application scenario;
FIG. 9 is yet another flow diagram illustrating interaction of a cell phone with an electronic device;
FIG. 10 is yet another flow chart diagram illustrating interaction of a cell phone with an electronic device;
FIG. 11 is yet another flow diagram illustrating interaction of a cell phone with an electronic device;
FIG. 12 is yet another flow chart diagram illustrating interaction of a cell phone with an electronic device;
FIG. 13 is yet another flow chart diagram illustrating interaction of a cell phone with an electronic device;
fig. 14 is a diagram of yet another exemplary application scenario.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "and/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.
The terms "first" and "second," and the like, in the description and claims of the embodiments of the present application are used to distinguish between different objects, not to describe a particular order of the objects. For example, a first target object and a second target object are different target objects, not target objects in a particular sequence.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
In the description of the embodiment of the present application, a wearable electronic device and a mobile phone are taken as examples for illustration, and in other embodiments, the present application is also applicable to connection scenarios between an electronic device such as a large screen, a laptop computer, a desktop computer, a palmtop computer (e.g., a tablet computer, a smart phone, etc.) and an electronic device such as an intelligent wearable device (e.g., an intelligent neck ring, an intelligent neck headset, etc.).
Fig. 1 is a schematic diagram of the hardware structure of the mobile phone, or of the wearable electronic device, in the embodiment of the present application; the hardware structure is described here taking the electronic device 100 as a mobile phone. The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like.
Illustratively, the audio module 170 may include a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, and the like.
For example, the sensor module 180 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.; in the embodiment of the present application, the sensor module 180 may further include a vibration sensor.
Further, processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc.
It will be appreciated that in a particular electronic device, the different processing units may be separate devices or may be integrated in one or more processors.
Further, in some embodiments, the controller may be a neural hub and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
In addition, memory in the processor 110 is used primarily for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like.
In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150 and antenna 2 is coupled to wireless communication module 160 so that electronic device 100 can communicate with networks and other devices through wireless communication techniques.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. For example: the processor 110 can execute the instructions stored in the internal memory 121, so as to enable the electronic device 100 to execute the voiceprint recognition scheme provided by the embodiment of the application.
Furthermore, it should be noted that in a specific implementation, the internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like.
The electronic device 100 may implement audio functions through the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor included in the audio module 170. For example, the speaker 170A may be used to play music, the receiver 170B may be used to listen to voice during calls, and the microphone 170C may be used to collect voice signals.
The bone conduction sensor may acquire a vibration signal. In some embodiments, the bone conduction sensor may acquire the vibration signal of bone tissue vibrated by the human voice. The bone conduction sensor may also contact the human pulse to receive a blood-pressure pulsation signal. In some embodiments, the bone conduction sensor may be disposed in a headset to form a bone conduction headset. The audio module 170 may parse a voice signal from the vibration signal of voice-vibrated bone tissue obtained by the bone conduction sensor, thereby implementing a voice function. The application processor may parse heart-rate information from the blood-pressure pulsation signal obtained by the bone conduction sensor, thereby implementing a heart-rate detection function. In particular, in some embodiments, the bone conduction sensor may serve as a vibration sensor for collecting muscle vibration signals around the vocal cords.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In addition, the keys 190 in the electronic device 100 include a power-on key, a volume key, and the like.
It should be noted that the electronic device 100 shown in fig. 1 is only one example of an electronic device, and in particular implementations, the electronic device 100 may have more or fewer components than those shown, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In addition, it should be noted that the hardware structure of the electronic device 100 described above may be that of the mobile phone in the embodiment of the present application, or that of a wearable electronic device capable of simultaneously acquiring a voice signal and a muscle vibration signal, such as a Bluetooth headset, a smart wearable device (such as a smart watch), or a neck ring; these are not listed one by one here, and the present application is not limited in this respect.
Wearable electronic devices are described below in conjunction with fig. 2-4. The hardware structure of the wearable electronic device may include the hardware structure of the electronic device 100 described above. Illustratively, the wearable electronic device may include a microphone 170C and a receiver 170B. The wearable electronic device may further comprise vibration sensors, at least one of which may be provided.
For example, the wearable electronic device in the embodiments of the present application may be a neck-worn Bluetooth headset. Referring to fig. 2, a schematic diagram of a user wearing a neck-worn Bluetooth headset is shown. In terms of the body parts that produce sound, the region indicated by the dashed line on the neck in fig. 2 is the muscle vibration hot zone, where vocal cord vibration drives the surrounding muscles to vibrate. The side of the wearable electronic device that wraps around and faces the neck is provided with a first region and a second region spaced apart from the first region; a first vibration sensor is disposed in the first region and a second vibration sensor in the second region. When the user wears the device on the neck, the first region and the second region each fit against the user's neck, ensuring that the sensing range of each vibration sensor falls within the muscle vibration hot zone.
Illustratively, referring to fig. 3, a schematic view of a user wearing an open neck-worn device is shown. The side of the open neck-worn device that wraps around and faces the neck is provided with a third region and a fourth region spaced apart from the third region. The third and fourth regions fit against the two sides of the user's neck near the throat, with the fitted positions located in the muscle vibration hot zone; a third vibration sensor and a fourth vibration sensor are disposed in the third and fourth regions, respectively.
Illustratively, referring to fig. 4, a schematic view of a user wearing a closed-loop collar is shown. A fifth region, a sixth region, and a seventh region are arranged at intervals on the closed-loop collar at positions around the neck. The sixth region fits against the neck near the throat, and the fifth and seventh regions fit against the two sides of the throat, with all fitted positions located in the muscle vibration hot zone; a fifth, a sixth, and a seventh vibration sensor are disposed in the fifth, sixth, and seventh regions, respectively.
Specifically, the placement positions of the vibration sensors can be derived from the hot zone of the user's neck-contact area, obtained through user-experience design combined with big data from users of previous neck-worn products. Illustratively, the user's neck-contact hot zone comprises the muscle vibration hot zone. The number of vibration sensors is not limited and may be set according to the product's positioning and the desired effect. For example, in the above scenarios, the closer a vibration sensor is placed to the user's throat, the stronger the acquired muscle vibration signal.
Based on the wearable electronic device, the muscle vibration signals when the user speaks can be collected through the vibration sensor, and the voice signals when the user speaks can be collected through the microphone 170C. The collected muscle vibration signal and the voice signal can be sent to the mobile phone for corresponding processing. The software structure of the electronic device 100 is described below with reference to fig. 5. Before explaining the software structure of the electronic device 100, first, an architecture that can be adopted by the software system of the electronic device 100 will be explained.
Specifically, in practical applications, the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture.
Furthermore, it is understood that software systems currently used in mainstream electronic devices include, but are not limited to, Windows systems, Android systems, and iOS systems. For convenience of description, in the embodiment of the present application, an Android system with a layered architecture is taken as an example to exemplarily illustrate a software structure of the electronic device 100.
In addition, the wearable electronic device and the mobile phone related to the following description of the voiceprint recognition scheme provided by the embodiment of the application take an Android system as an example. In specific implementation, however, the voiceprint recognition scheme provided by the embodiment of the application is also applicable to other systems.
Referring to fig. 5, a block diagram of a software structure of the electronic device 100 according to the embodiment of the present application is shown.
As shown in fig. 5, the layered architecture of the electronic device 100 divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into three layers, i.e., an application layer, an application framework layer, and a kernel layer from top to bottom, as shown in fig. 5.
Wherein the application layer may include a series of application packages. As shown in FIG. 5, the application package may include applications such as application marketplace, call, navigation, Bluetooth, Wi-Fi, settings, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer.
Wherein the application framework layer comprises a number of predefined functions. As shown in fig. 5, the application framework layer may include a front-end processing module, a fused feature extraction module, a voiceprint scoring determination module, a voice recognition analysis module, a call module, and the like.
It should be noted that the predefined functions in the application framework layer shown in fig. 5 are those involved in implementing the voiceprint recognition scheme provided in the embodiment of the present application. In a specific implementation, the application framework layer may further include other predefined functions according to actual service requirements, such as a phone manager that provides the communication function of the electronic device 100 (managing call states such as connecting and hanging up), and a resource manager that provides applications with various resources such as localized character strings, icons, pictures, layout files, and video files; the present application is not limited in this respect.
Furthermore, it is understood that the kernel layer in the Android system is a layer between hardware and software. The kernel layer at least comprises a display driver, a Wi-Fi driver, a Bluetooth driver, an audio driver and a sensor driver.
Note that the layers in the software structure shown in fig. 5 and the components included in each layer do not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer layers than those shown, and may include more or fewer components in each layer, which is not limited in this application.
In order to better describe the implementation process of the voiceprint recognition method provided by the embodiment of the present application, an interaction process of the wearable electronic device and the mobile phone is taken as an example in the embodiment of the present application, and with reference to fig. 6, the wearable electronic device and the mobile phone are connected by bluetooth.
In the bluetooth connection state, the voiceprint recognition scheme provided in the embodiment of the present application is described through the following several scenarios. It should be noted that the voiceprint recognition scheme may include a voiceprint registration method, a voiceprint determination method for a registered user, a voiceprint noise filtering method, and the like.
The voiceprint registration method provided in the embodiment of the present application is described in detail below with reference to fig. 7 and fig. 8. For better understanding of the voiceprint registration scheme, it is assumed that the wearable electronic device is used by the user for the first time and the user correctly wears the wearable electronic device, that is, the position on the wearable electronic device where the vibration sensor is configured can be attached to the muscle vibration hot zone on the periphery of the throat on the neck. A voiceprint registration scheme in the voiceprint recognition scheme is described with reference to fig. 7 and 8.
Referring to fig. 7, the method specifically includes:
s101a, a voice signal is collected by a collection module in the wearable electronic device.
Illustratively, as shown in fig. 8, the wearable electronic device collects voice signals through a microphone. For example, the voice signal in the embodiment of the present application may be a specific wake-up word set for the product, such as "my glory". And performing voiceprint registration of a specific user through a specific awakening word, wherein the specific user refers to a registered user of the wearable electronic device.
For example, each voice signal and each muscle vibration signal in the embodiment of the present application may be acquired simultaneously or sequentially. In this embodiment, the collection timing of each voice signal and each muscle vibration signal is not limited.
S101b, the wearable electronic device sends the voice signal to a front-end processing module of the mobile phone.
Specifically, based on the Bluetooth connection between the wearable electronic device and the mobile phone, the wearable electronic device sends the voice signal data to the phone.
For example, each voice signal and each muscle vibration signal in the embodiment of the present application may be transmitted simultaneously or sequentially. The present embodiment does not limit the transmission timing of each voice signal and each muscle vibration signal.
For example, each voice signal and each muscle vibration signal in the embodiment of the present application may be received simultaneously or sequentially. In this embodiment, the receiving timing of each voice signal and each muscle vibration signal is not limited.
S102a, collecting muscle vibration signals by a collecting module in the wearable electronic equipment.
Referring to fig. 8, the wearable electronic device collects muscle vibration signals through a vibration sensor or a bone conduction sensor.
For example, the muscle vibration signal in the embodiment of the present application may be a muscle vibration signal when the user utters a sound of a specific wake-up word.
And S102b, the wearable electronic device sends the muscle vibration signal acquired in the S102a to a front-end processing module of the mobile phone.
Specifically, based on the Bluetooth connection between the wearable electronic device and the mobile phone, the wearable electronic device sends the muscle vibration signal data to the phone. Illustratively, the wearable device sends a request message (which may also be called a target signal, request information, etc.) to the electronic device; the request message includes the voice signal and the muscle vibration signal and requests the electronic device to process them.
And S103, the mobile phone performs noise reduction processing on the received voice signal and the muscle vibration signal. Specifically, after the noise reduction processing, a voice signal and a muscle vibration signal after the noise reduction are obtained.
Illustratively, a front-end processing module in the mobile phone performs noise reduction on the received voice signal and muscle vibration signal. Noise reduction includes, but is not limited to, filtering out ambient noise, which may be implemented by a front-end noise reduction algorithm such as a classic LMS adaptive filter (or one of its variants), spectral subtraction, or Wiener filtering. The embodiment of the present application does not limit the front-end noise reduction algorithm, as long as the noise reduction effect is achieved, and does not limit the order in which the voice signal and the muscle vibration signal are denoised. Processing the voice signal and muscle vibration signal sent by the wearable electronic device with a noise reduction algorithm effectively removes the noise component and yields a useful signal with a higher signal-to-noise ratio.
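As an illustration of one of the algorithms named above, here is a minimal spectral-subtraction denoiser in Python using SciPy. The frame length, the assumption that the first frames are speech-free, and the zero flooring are all simplifying choices for the sketch, not the patent's actual front-end algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x: np.ndarray, fs: int, noise_frames: int = 10) -> np.ndarray:
    """Estimate the noise magnitude spectrum from the first few frames
    (assumed to contain no speech) and subtract it from every frame,
    flooring negative magnitudes at zero, then resynthesize."""
    _, _, Z = stft(x, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.0)
    _, x_clean = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return x_clean
```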
And S104, the front-end processing module sends the voice signal and the muscle vibration signal after noise reduction to the fusion feature extraction module.
Illustratively, the front-end processing module is located in a mobile phone, the fusion feature extraction module is located in a cloud server, and the front-end processing module and the fusion feature extraction module can transmit noise-reduced voice signals and muscle vibration signal data through a cloud local area network.
And S105, extracting characteristic parameters and carrying out model calculation. Specifically, characteristic parameters are extracted from the voice signal and the muscle vibration signal after noise reduction, and model calculation is performed.
Specifically, the characteristic parameters are parameters extracted from the speaker's voice signal and vibration signal that can represent the speaker's specific organ structure or habitual behavior; they differ from person to person because of individual differences. For example, the sizes of the vocal cavities, in particular the throat, nasal cavity, and oral cavity, and the shape, size, and position of these organs determine the tension of the vocal cords and the range of vocal frequencies. Therefore, even when different people say the same thing, the frequency distribution of the sound differs; some voices sound deep, others loud. Each person's vocal cavity is unique and, like a fingerprint, each person's voice has unique characteristics. The vocal organs, including the lips, teeth, tongue, soft palate, and palatal muscles, are manipulated in different ways and interact to produce clear speech; the manner of their coordination is learned through a person's communication with the people around them. In the course of learning to speak, a person gradually forms their own voiceprint characteristics by imitating the speech of the different people around them. Illustratively, there are many methods for extracting the characteristic parameters. For example, spectral parameters of the signal may be used, such as the pitch spectrum and its contour, the energy of pitch frames, and the occurrence frequency of pitch formants. Alternatively, characteristic parameters may be extracted by linear prediction, where an existing mathematical model estimates the signal characteristics with corresponding approximation parameters. Different methods such as linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) may be used for feature coefficient extraction.
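For example, MFCC extraction from the two denoised signals might look like the following sketch (using librosa; concatenating the speech and vibration features into one fused matrix is an assumption about the fusion step, which the patent does not specify in detail):

```python
import numpy as np
import librosa

def fused_features(speech: np.ndarray, vibration: np.ndarray, sr: int) -> np.ndarray:
    """Extract MFCCs from the denoised voice signal and muscle vibration
    signal and concatenate them frame-by-frame into one feature matrix
    of shape (frames, 40)."""
    mfcc_speech = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=20).T
    mfcc_vibration = librosa.feature.mfcc(y=vibration, sr=sr, n_mfcc=20).T
    n = min(len(mfcc_speech), len(mfcc_vibration))  # align frame counts
    return np.hstack([mfcc_speech[:n], mfcc_vibration[:n]])
```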
Further, the model calculation involves pattern matching and recognition techniques: once the characteristic parameters have been obtained, the parameters to be recognized are matched against the model base obtained during training and learning, and the best of the possible results is judged and output. Illustratively, the pattern matching may use a vector-based model such as a support vector machine (SVM); a stochastic model such as a hidden Markov model (HMM) or a Gaussian mixture model (GMM); or a neural network model. The embodiment of the present application does not specifically limit the modeling method used for pattern matching and recognition.
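As one concrete instance of the GMM option named above, here is a sketch using scikit-learn. Enrolling on the registered user's fused features and using the average per-frame log-likelihood as the confidence score are illustrative choices, not the patent's exact model calculation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def enroll_gmm(features: np.ndarray, n_components: int = 16) -> GaussianMixture:
    """Fit a diagonal-covariance GMM to the registered user's fused
    feature matrix (frames x coefficients)."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag", random_state=0).fit(features)

def voiceprint_confidence(model: GaussianMixture, features: np.ndarray) -> float:
    """Average per-frame log-likelihood of new features under the enrolled
    model; this is the score compared against the registration threshold."""
    return float(model.score(features))
```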
And S106, acquiring the registered voiceprint information.
Illustratively, the extracted characteristic parameters are stored in a certain form, so that they constitute the fixed voiceprint information of the speaker. The voiceprint information is embodied differently from the model applied in S105; illustratively, it can be embodied as a combination of digits. The embodiment of the present application does not limit the form in which the voiceprint information is embodied.
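One minimal sketch of such a 'combination of digits', purely an assumption for illustration, collapses the frame-level characteristic parameters into a fixed-length vector and persists it as the registered voiceprint:

```python
# Toy sketch: derive and store a fixed-length voiceprint template.
import numpy as np

def make_voiceprint(features: np.ndarray) -> np.ndarray:
    """Average (n_frames, n_dims) features into one fixed-length template."""
    return features.mean(axis=0)

registered_voiceprint = make_voiceprint(features)   # features from the fused extraction
np.save("registered_voiceprint.npy", registered_voiceprint)  # hypothetical storage form
```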
In conclusion, the registered voiceprint information can be successfully stored in the mobile phone. It should be noted that the above generation process of the registered voiceprint information should be performed in a quiet environment, with the vibration sensor positioned so that it can pick up the muscle vibration signal at the periphery of the registered user's throat.
After the registered voiceprint information is successfully generated, it can be used for subsequent functions. The following assumes that the wearable electronic device and the mobile phone are in the Bluetooth-connected state and that the mobile phone is in the screen-off state. For example, when the mobile phone is in the screen-off state, it does not display the desktop, but it can still receive, process and transmit signals. Referring to fig. 9, the flow by which a user wakes up the mobile phone, that is, a specific scheme of the registered-voiceprint determination method, is described below.
S201a, a collection module in the wearable electronic device collects the voice signal.
The specific contents may refer to the description in S101 a. For example, the voice signal in the present embodiment may be a voice signal when the user says "unlock".
S201b, the wearable electronic device sends the voice signal to the front-end processing module of the mobile phone.
Specific contents refer to those in S101 b.
S202a, a collecting module in the wearable electronic device collects muscle vibration signals.
Specific contents refer to those in S102 a. Illustratively, the muscle vibration signal in this embodiment may be a muscle vibration signal when the user says "unlock".
And S202b, the wearable electronic device sends the muscle vibration signal acquired in the S202a to a front-end processing module of the mobile phone.
Specific contents refer to those in S102 b.
And S203, the front-end processing module performs noise reduction processing on the received voice signal and the received muscle vibration signal.
The specific contents refer to the contents in S103.
And S204, the front-end processing module sends the voice signal and the muscle vibration signal after noise reduction to a fusion feature extraction module in the mobile phone. The specific contents refer to those in S104.
And S205, extracting characteristic parameters and carrying out model calculation.
Illustratively, in the fusion feature extraction module, feature parameters are extracted from the received noise-reduced voice signal and muscle vibration signal, and model calculation is performed. The extraction method and model reference basis of the feature parameters are the same as those in S105, and are not limited here as long as the extraction of the feature parameters of the speaker and the acquisition of the voiceprint information can be achieved.
S206, acquiring first voiceprint information.
Specifically, the first voiceprint information is the voiceprint information of the current user, and the embodiment of the first voiceprint information is the same as the embodiment of the voiceprint information of the registered user in S106, which is not described herein again.
And S207, the fused feature extraction module sends the first voiceprint information to the voiceprint scoring judgment module.
Illustratively, the fused feature extraction module and the voiceprint scoring judgment module can be both located in the cloud server, and data transmission between the fused feature extraction module and the voiceprint scoring judgment module can be achieved in the cloud server.
And S208, detecting that the confidence score of the first voiceprint information is greater than or equal to the registration voiceprint confidence critical value.
Specifically, in the voiceprint scoring judgment module, the first voiceprint information of the current user is compared with the voiceprint information of the registered user, the most common comparison being a comparison of confidence. Confidence, also referred to as reliability, confidence level or confidence coefficient, reflects the fact that when an overall parameter is estimated from a sample, the conclusion is always uncertain because of the randomness of the sample. A probabilistic statement, namely interval estimation in mathematical statistics, is therefore used: it states the probability that the estimated value falls within a certain allowable error range of the overall parameter, and this probability is called the confidence.
In one example, if the confidence score of the first voiceprint information is greater than or equal to the registered voiceprint confidence threshold, S209 is performed.
In another example, if the confidence score of the first voiceprint information is less than the registered voiceprint confidence threshold, the difference between the first voiceprint information and the voiceprint information of the registered user is large, indicating that the first voiceprint information does not belong to the registered user, and S209 is not performed.
For example, the voiceprint scoring judgment module can employ a Gaussian Mixture Model (GMM) or a Hidden Markov Model (HMM). The model calculation in the voiceprint scoring judgment module is not limited, as long as the voiceprints can be compared. For example, whichever model is applied, a confidence level of the voiceprint is obtained; the confidence level of the first voiceprint information is compared with the registered user's confidence threshold, and when the confidence score of the first voiceprint information is detected to be greater than or equal to that threshold, S209 is executed.
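A minimal sketch of the S208 decision, assuming the GMM enrollment model sketched earlier and an invented critical value (the embodiment fixes neither the scoring formula nor the threshold):

```python
# Sketch: compare the confidence score with the registration voiceprint
# confidence critical value. The threshold value below is invented.
REGISTERED_VOICEPRINT_THRESHOLD = -45.0   # assumed critical value

def is_registered_user(gmm, features) -> bool:
    """True when the confidence score reaches the registration critical value."""
    confidence = gmm.score(features)      # average log-likelihood per frame
    return confidence >= REGISTERED_VOICEPRINT_THRESHOLD

# Hypothetical usage, with registered_gmm and first_features prepared upstream:
# if is_registered_user(registered_gmm, first_features):
#     wake_up_phone()                     # proceed to S209/S210
```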
S209, determining that the current user belongs to the registered user.
Specifically, since the confidence score of the first voiceprint information is detected to be greater than or equal to the registered voiceprint confidence critical value, it can be concluded that the first voiceprint information belongs to the registered user, and therefore that the current user is the registered user.
S210, the mobile phone is awakened.
Specifically, the mobile phone is unlocked and, for example, can display the desktop. In some embodiments, the mobile phone may also display an application interface or the like, which is not limited in this application.
For example, after the mobile phone obtains the information of "determining that the current user belongs to the registered user", the mobile phone is unlocked and displays the desktop, that is, the mobile phone is successfully awakened.
By adopting the above registered-user voiceprint determination method, it can be determined whether the user wearing the wearable electronic device is the registered user, and the mobile phone is woken up only when the current user is determined to belong to the registered user. The picked-up signals are matched against the voiceprint information of the registered user, and because the voiceprint information fuses the vibration signal, the wake-up rate is improved while the false-acceptance rate and the false wake-up rate are reduced. Meanwhile, since recognition of the user instruction is completed under the action of voiceprint information fused with the vibration signal, recording attacks and synthesized-voice attacks can be prevented.
Further, through the wearable electronic device, the mobile phone can recognize and execute subsequent instructions of the current user. Specifically, the following description, with reference to fig. 10, takes as its scenario a wearable electronic device that maintains the Bluetooth connection state with mobile phone A, on the premise that the current user is the registered user and wears the wearable electronic device correctly, so that the vibration sensor on the wearable electronic device can collect the muscle vibration signal at the periphery of the throat and the microphone on the wearable electronic device can collect the voice signal. In the following, the interaction flow among the wearable electronic device, mobile phone A and mobile phone B is described in detail, taking as an example a call instruction received after the mobile phone is woken up. Referring to fig. 10, the method specifically includes:
The current user utters the sound signal 'call to the mobile phone B'; the specific steps of the interaction process are then as follows:
S301a, a collection module in the wearable electronic device collects the voice signal.
Specifically, as shown in fig. 8, the wearable electronic device collects voice signals through a microphone. For example, the voice signal in the embodiment of the present application may be a voice signal of a type of executing an instruction, such as "call to handset B".
S301b, the wearable electronic device sends the voice signal to the front-end processing module of the mobile phone.
The specific contents refer to contents in S101b, wherein the contents of the voice signal refer to S301 a.
S302a, collecting the muscle vibration signal by a collecting module in the wearable electronic equipment.
Specific contents refer to the contents in S102a, wherein the muscle vibration signal may be a muscle vibration signal when the voice signal of S301a is uttered.
And S302b, the wearable electronic device sends the muscle vibration signal acquired in S302a to the front-end processing module of the mobile phone A.
Specific contents refer to those in S102 b.
And S303, the front-end processing module of the mobile phone A performs noise reduction processing on the received voice signal and the muscle vibration signal to obtain a noise-reduced voice signal and a muscle vibration signal.
Illustratively, the method of the noise reduction processing is the same as the method mentioned in S103, and is not limited herein as long as the noise reduction effect can be achieved.
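The exact method of S103 is not restated in this embodiment; as a generic illustration only, a spectral-subtraction step of the following shape is one common noise-reduction option:

```python
# Sketch: spectral subtraction, assuming the opening frames contain only noise.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(signal: np.ndarray, sr: int, noise_frames: int = 10) -> np.ndarray:
    """Estimate a noise profile from the first frames and subtract its magnitude."""
    f, t, spec = stft(signal, fs=sr, nperseg=512)
    magnitude, phase = np.abs(spec), np.angle(spec)
    noise_profile = magnitude[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = np.maximum(magnitude - noise_profile, 0.0)   # floor at zero
    _, denoised = istft(cleaned * np.exp(1j * phase), fs=sr, nperseg=512)
    return denoised
```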
And S304, the front-end processing module of the mobile phone A sends the voice signal and the muscle vibration signal after noise reduction to the voice recognition analysis module of the mobile phone A.
Specifically, the front-end processing module and the voice recognition analysis module are both arranged in the mobile phone A, and signal transmission between the front-end processing module and the voice recognition analysis module is directly transmitted through electric signals.
And S305, recognizing and analyzing the voice signal and the muscle vibration signal after noise reduction to obtain an instruction signal.
Specifically, after the voice signal of "call to the mobile phone B" is recognized and analyzed in the voice recognition and analysis module of the mobile phone a, a specific instruction signal is obtained, so that the instruction signal can be directed to the call module of the mobile phone a.
And S306, the voice recognition and analysis module of the mobile phone A sends an instruction signal to the call module of the mobile phone A.
Specifically, the voice recognition analysis module and the call module are both arranged in the mobile phone A, and signal transmission between the voice recognition analysis module and the call module is directly transmitted through an electric signal.
And S307, interacting voice data.
Illustratively, after receiving an instruction signal of 'call to mobile phone B', the call module of mobile phone a performs an operation of dialing to mobile phone B, and meanwhile, mobile phone a and mobile phone B interact voice data.
By adopting the above flow, the instruction 'call to the mobile phone B' can be executed accurately through the wearable electronic device, so that mobile phone A performs the calling action correctly; the accuracy of signal acquisition by the wearable electronic device is improved, and with it the accuracy of waking up mobile phone A and of instruction recognition. Further, while mobile phone A is in a call, the voice signal and muscle vibration signal collected by the wearable electronic device can be processed to filter out the voices of unregistered users, that is, the voice signals of speakers other than the registered user are filtered out. The interaction flow involved in filtering out the voices of unregistered users is described below with reference to fig. 11.
When the current user utters the call speech 'I want to make an appointment with you' while other users in the current user's environment are also making sounds in the same period, then, in order to ensure that the call content is received clearly by mobile phone B, mobile phone A can perform noise filtering on the sound information of that time interval and filter out all voiceprints other than the registered user's. Referring to fig. 11, the method specifically includes:
S401a, a voice signal is collected by a collection module in the wearable electronic device.
The specific contents refer to those of S101a. Illustratively, the voice signal may be call content during a call, such as 'I want to make an appointment with you'. Meanwhile, the voice signal of this embodiment may also include the voices of other users.
S401b, the wearable electronic device sends the voice signal to the front-end processing module of the mobile phone.
The specific contents refer to those of S101 b.
S402a, an acquisition module within the wearable electronic device acquires the muscle vibration signal.
The specific contents refer to the contents of S102a. Illustratively, the muscle vibration signal may be the muscle vibration signal produced when the user speaks during the call, for example when saying 'I want to make an appointment with you'.
And S402b, the wearable electronic device sends the muscle vibration signal acquired in the S402a to the front-end processing module of the mobile phone A.
The specific contents refer to the contents of S102 b.
S403, separating a plurality of voice tracks by separation technique.
Specifically, the front-end processing module of mobile phone A performs noise reduction on the ambient sound in the received voice signal and muscle vibration signal. Illustratively, the noise reduction method is the same as that mentioned in S103 and is not limited here, as long as the noise reduction effect can be achieved. For example, the front-end processing module can also use a separation technique to place the received voice signals in different voice tracks: for example, if two other people speak while the current user says 'I want to make an appointment with you', and their sounds are collected by the microphone of the wearable electronic device, the speech of that time period can be separated into three voice tracks, each corresponding to the voice signal of one person.
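The embodiment does not name a specific separation algorithm. As one hedged example, and under the additional assumption (not stated in the embodiment) that the pickup provides at least as many channels as speakers, independent component analysis can split a multi-channel recording into per-speaker voice tracks:

```python
# Sketch: blind source separation into per-speaker voice tracks with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

def separate_tracks(multichannel: np.ndarray, n_speakers: int = 3) -> np.ndarray:
    """multichannel: (n_samples, n_channels) with n_channels >= n_speakers;
    returns (n_samples, n_speakers), one column per separated voice track."""
    ica = FastICA(n_components=n_speakers)
    return ica.fit_transform(multichannel)
```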
S404, the front-end processing module sends the voice signals of all voice tracks to the fusion feature extraction module of the mobile phone A.
The specific contents refer to the contents of S104.
S405, extracting the characteristic parameters of each voice signal and carrying out model calculation.
Illustratively, according to the example of S403, three tracks correspond to the voice signals of three persons, respectively. Illustratively, the three tracks include: a first voice track, a second voice track, and a third voice track, the voice signals of the three persons including: a first speech signal, a second speech signal and a third speech signal. The first voice audio track corresponds to a first voice signal, the second voice audio track corresponds to a second voice signal, and the third voice audio track corresponds to a third voice signal.
For example, the first voice signal corresponding to the first voice track, the second voice signal corresponding to the second voice track, and the third voice signal corresponding to the third voice track are respectively combined with the muscle vibration signal acquired in S402a, and then feature parameters are respectively extracted and model calculation is performed after combination, where specific contents refer to contents of S105, which is not described herein again.
S406, obtaining each voiceprint information.
Illustratively, based on the extraction of the feature parameters and the model calculation in S405, first voiceprint information corresponding to the first speech signal, second voiceprint information corresponding to the second speech signal, and third voiceprint information corresponding to the third speech signal are obtained, respectively.
Illustratively, the three pieces of voiceprint information may be obtained simultaneously or sequentially, and the calculation timing sequence of the three pieces of voiceprint information is not limited in the embodiment of the present application.
Specifically, since the wearable electronic device is worn by the registered user and the vibration sensor is attached to the registered user's neck, the voiceprint information of the registered user is obtained by processing the registered user's voice signal and muscle vibration signal together; only when the characteristic parameters are extracted from the muscle vibration signal in combination with the registered user's own voice signal does the calculation yield voiceprint information belonging to the registered user. Even if the voice signal of another user is combined with the registered user's muscle vibration signal, the characteristic parameter extraction process and the calculation result differ from those of the registered user, and only voiceprint information different from the registered user's can be obtained. Thus, with the cooperation of the muscle vibration signal, it is ensured that identical voiceprint information does not occur within the same period; therefore, S407 is performed.
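A toy sketch of S405-S406 under these observations: each separated track is paired with the same muscle vibration signal and yields its own voiceprint. The fusion rule below (stacking time-aligned samples) is invented for illustration; the real rule is internal to the fusion feature extraction module.

```python
# Sketch: one voiceprint per separated track, fused with the vibration signal.
import numpy as np

def extract_fused_features(track: np.ndarray, vibration: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: stack time-aligned speech and vibration samples."""
    n = min(len(track), len(vibration))
    return np.stack([track[:n], vibration[:n]], axis=1)   # toy per-sample "features"

def voiceprints_for_tracks(tracks, vibration):
    """Return one fixed-length voiceprint per voice track (cf. S405-S406)."""
    return [extract_fused_features(t, vibration).mean(axis=0) for t in tracks]
```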
And S407, the fusion feature extraction module sends the voiceprint information to the voiceprint scoring judgment module.
The specific contents refer to the contents of S207.
S408, the voiceprint scoring judgment module detects that the confidence value of one voiceprint information is larger than or equal to the registered voiceprint confidence threshold value.
The specific contents refer to the contents of S208. Illustratively, it is assumed that the confidence level of the first voiceprint information is greater than the registration voiceprint confidence threshold.
S409, determining that the voiceprint information belongs to the registered user.
Specifically, based on the detection step of S408, it is determined that the one piece of voiceprint information belongs to the registered user. Illustratively, the first voiceprint information is determined to belong to the registered user based on the confidence level of the first voiceprint information being greater than the registered voiceprint confidence threshold.
S410, the voiceprint scoring judgment module sends one voiceprint information as the voiceprint information of the registered user to the front-end processing module.
For example, the voiceprint scoring judgment module and the front-end processing module may both be located in the mobile phone, and data transmission between the two modules may be directly through electrical signal transmission.
Illustratively, the voiceprint scoring judgment module and the front-end processing module may be both located in the cloud server, and data transmission between the voiceprint scoring judgment module and the front-end processing module may be directly transmitted in the local area network.
Illustratively, according to the detection of S408, one of the voiceprint information refers to the first voiceprint information.
S411, the voice track of the unregistered user is filtered from the voice signal.
Illustratively, upon receiving the indication that the first voiceprint information is the voiceprint information of the registered user, the front-end processing module performs the operation of filtering out the voice tracks of unregistered users. For example, filtering out the voice tracks of unregistered users may include removing the voice tracks corresponding to voiceprint information other than the registered voiceprint information, so that only the voice track of the registered user, corresponding to the registered voiceprint information, is retained. For example, since the first voiceprint information is the voiceprint information of the registered user, this step may filter out the second voice track corresponding to the second voiceprint information and the third voice track corresponding to the third voiceprint information, thereby retaining the first voice track corresponding to the first voiceprint information, the first voice track being the carrier of the first voice signal.
S412, generating a pure voice signal.
Specifically, based on the operation of filtering out voiceprint information other than the registered user in S411, the first voice track with the first voice signal can be obtained, and the first voice signal with only the voiceprint information of the registered user can be further obtained, so as to generate a clean voice signal.
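A hedged sketch of S408-S412 taken together: each track's voiceprint is scored against the registered voiceprint, and only the track that clears the critical value is kept as the pure voice signal. The cosine-similarity confidence measure and the threshold value are assumptions of this sketch.

```python
# Sketch: keep only the voice track whose voiceprint matches the registered user.
import numpy as np

def pure_voice_signal(tracks, voiceprints, registered_print, threshold=0.8):
    """Return the single track belonging to the registered user, else None."""
    for track, vp in zip(tracks, voiceprints):
        confidence = np.dot(vp, registered_print) / (
            np.linalg.norm(vp) * np.linalg.norm(registered_print) + 1e-9)
        if confidence >= threshold:
            return track          # e.g. the first voice track in the example above
    return None                   # no registered speaker found in this period
```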
S413, the front-end processing module sends the pure voice signal to the call module of the mobile phone a.
Specifically, the front-end processing module sends the pure voice signal of S412 to the call module of the mobile phone a, so that pure voice interaction is performed between the mobile phone a and other mobile phones.
By adopting this flow, environmental noise can be filtered out during the call, and the voices of speakers other than the registered user can be filtered out, so that the call environment is cleaner. Call quality is thereby improved and call efficiency enhanced, and mobile phone A, connected to the wearable electronic device, transmits the registered user's call speech accurately to other mobile phones.
The above describes the signal processing flow relating to a call after the current user is confirmed to be the registered user. In the following, with reference to fig. 12, the scenario is again a wearable electronic device that maintains the Bluetooth connection state with mobile phone A, on the premise that the current user is the registered user and wears the wearable electronic device correctly, so that the vibration sensor on the wearable electronic device can collect the muscle vibration signal at the periphery of the throat, and the receiver or microphone on the wearable electronic device can collect the voice signal. For example, the wearable electronic device and the mobile phone may be configured such that, after the mobile phone is woken up, an instruction to open one of the application programs in the mobile phone is received and executed. Referring to fig. 12, the method specifically includes:
the current user sends out a voice signal of 'turn on navigation', and at the moment, the specific steps of the interactive process are as follows:
S501a, a collection module in the wearable electronic device collects the voice signal.
The specific contents refer to those of S101 a. For example, the voice signal may be instruction information for opening an application in the mobile phone, such as "open navigation".
S501b, the wearable electronic device sends the voice signal to the front-end processing module of the mobile phone.
The specific contents refer to the contents of S101 b.
S502a, the acquisition module in the wearable electronic device acquires the muscle vibration signal.
The specific contents refer to the contents of S102 a. Illustratively, the muscle vibration signal may be a muscle vibration signal when the user speaks during a conversation, for example a muscle vibration command signal when saying "turn on navigation".
S502b, the wearable electronic device sends a muscle vibration signal to the front-end processing module of the cell phone.
The specific contents refer to the contents of S102 b.
And S503, the front-end processing module of the mobile phone performs noise reduction processing on the received voice signal and muscle vibration signal to obtain a noise-reduced voice signal and muscle vibration signal.
The specific contents refer to the contents of S103. For example, the front-end processing module may further perform operations of filtering the voice signal of the unregistered user in S406-S412 by interacting with the fusion feature extraction module and the voiceprint scoring determination module, so as to obtain a pure voice signal.
And S504, the front-end processing module sends the voice signal and the muscle vibration signal after noise reduction to a voice recognition analysis module of the mobile phone.
The specific contents refer to the contents of S304.
And S505, recognizing and analyzing the voice signal and the muscle vibration signal after noise reduction to obtain an instruction signal.
Illustratively, the voice signal of the 'turn on navigation' and the muscle vibration signal are recognized and analyzed in the voice recognition analysis module of the mobile phone to obtain an instruction signal, so that the instruction can be directed to the navigation module of the mobile phone.
S506, the voice recognition and analysis module of the mobile phone sends an instruction signal to the navigation module of the mobile phone.
For example, the speech recognition and analysis module and the navigation module may be located in the mobile phone, and the signal transmission between the two modules may be directly transmitted through an electric signal.
And S507, executing an instruction and opening navigation.
Illustratively, after the navigation module of the mobile phone receives the command signal of 'turn on navigation', the navigation module executes the command, calls the cellular data and responds to turn on navigation.
And S508, the voice recognition and analysis module of the mobile phone sends an instruction signal to the navigation module of the mobile phone.
Specifically, the wearable electronic device continues to send a voice signal of "navigate to location a", the steps from S501a to S505 are continuously executed, a command signal of "navigate to location a" is obtained, and the voice recognition and analysis module sends the command signal to the navigation module.
S509, executing the command and navigating to the destination.
Specifically, the navigation module receives the command signal of "navigate to location A", executes the command, and navigates to the destination "location A". For example, the instruction information is not limited to the word "navigate"; it may also be "where is location A", "how to go to location A", "search location A", "location A", and the like. The navigation instruction can be recognized as long as the name of the destination can be recognized. The navigation module can give navigation broadcasts during navigation.
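To illustrate the point that any phrasing containing a recognizable destination can serve as a navigation instruction, the following sketch maps several invented utterance patterns onto one intent; the patterns and the helper function are assumptions, not the recognizer of the embodiment.

```python
# Sketch: recognize a navigation intent from several phrasings.
import re

PATTERNS = [
    r"navigate to (?P<dest>.+)",
    r"where is (?P<dest>.+)",
    r"how to go to (?P<dest>.+)",
    r"search (?P<dest>.+)",
]

def extract_destination(utterance: str):
    """Return the destination if the utterance is a navigation instruction."""
    for pattern in PATTERNS:
        match = re.match(pattern, utterance.strip().lower())
        if match:
            return match.group("dest")
    return None

print(extract_destination("Navigate to location A"))   # -> "location a"
```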
S510, the navigation module sends navigation broadcast information to the wearable electronic device, so that the current user can hear the navigation broadcasts of the navigation module through the wearable electronic device, making navigation more intelligent and convenient to use.
It should be noted that after the instruction signal of "navigate to location A" of this embodiment is executed, the navigation module can search the position of location A in real time through a cellular network or a wireless network. This differs from the processing of "call to mobile phone B" described above, since the instruction signal of "call to mobile phone B" must be executed over a communication network.
By adopting the flow, the navigation instruction can filter out environmental noise and the voice of speakers except registered users, so that the sent instruction can be accurately identified.
The above describes, on the premise that the registered user wears the wearable electronic device correctly and that the wearable electronic device and the mobile phone maintain the Bluetooth connection state, the interaction flows in different scenarios once the current user is determined to belong to the registered user. In the following, with reference to fig. 13, the case is explained in which the wearable electronic device is not worn correctly and the mobile phone is in the screen-off state, so that the mobile phone cannot be woken up even if the sound of a wake-up word is uttered. Referring to fig. 13, the method specifically includes:
S601, a collection module in the wearable electronic device collects the voice signal.
The specific contents refer to those of S101 a.
S602, the wearable electronic device sends the voice signal to a front-end processing module of the mobile phone.
The specific contents refer to those of S101 b.
S603, performing noise reduction processing on the received speech signal.
Specifically, the front-end processing module performs noise reduction processing on the received voice signal to obtain a noise-reduced voice signal.
Illustratively, the method of the noise reduction processing is the same as the method mentioned in S103, and is not limited herein as long as the noise reduction effect can be achieved.
S604, the front-end processing module sends the voice signal after noise reduction to a fusion feature extraction module in the mobile phone.
The specific contents refer to the contents of S104.
And S605, extracting characteristic parameters and carrying out model calculation.
Specifically, in the fusion feature extraction module, feature parameters are extracted from the received voice signal, and model calculation is performed. The extraction method and model reference basis of the feature parameters are the same as those in S105, and are not limited here as long as the extraction of the feature parameters of the speaker and the acquisition of the voiceprint information can be achieved.
And S606, acquiring second voiceprint information.
Specifically, based on the calculation result of S605, the voiceprint information of the current user, that is, the second voiceprint information, is acquired. The second voiceprint information is embodied in the same manner as the voiceprint information of the registered user in S106, and is not limited herein.
And S607, the fusion feature extraction module sends the second voiceprint information to a voiceprint scoring judgment module in the mobile phone.
The specific contents refer to the contents of S207.
And S608, detecting that the confidence score of the second voiceprint information is smaller than the registered user confidence critical value.
Specifically, in the voiceprint scoring determination module, the second voiceprint information of the current user is compared with the voiceprint information of the registered user, and when it is detected that the confidence score of the second voiceprint information is smaller than the confidence threshold of the registered user, S609 is executed in the voiceprint scoring determination module.
And S609, determining that the current user does not belong to the registered user.
Specifically, based on the fact that the confidence score of the second voiceprint information detected in S608 is smaller than the confidence threshold of the registered user, it is determined that the current user does not belong to the registered user.
S610, the mobile phone is not awakened. Specifically, the mobile phone is still in the screen-off state.
Specifically, based on the determination that the current user does not belong to the registered user in S609, the mobile phone is not woken up, and the mobile phone is always in the screen-off state.
As can be seen from the above description, when the user does not wear the wearable electronic device correctly, that is, when the vibration sensor is not located in the sensing area, the vibration sensor cannot pick up the muscle vibration signal. Even if the receiver or microphone picks up the voice signal, in the absence of the muscle vibration signal, the voiceprint information corresponding to the user's voice signal does not match the voiceprint information generated at registration from both the muscle vibration signal and the voice signal. Therefore, the terminal cannot be woken up when the wearable electronic device is not worn correctly.
It should be noted that, in the embodiments of the present application, the execution body that processes the signals is an electronic device, for example a mobile phone. In other embodiments, the execution body may also be the wearable electronic device. For example, after the wearable electronic device acquires the voice signal and the muscle vibration signal, it may process them accordingly; the specific processing is similar to that of the mobile phone in the foregoing embodiments and is not repeated in this application.
In addition, it should be noted that, the present embodiment also provides an electronic device 100 (specifically, a mobile phone or a wearable electronic device), which includes a memory and a processor, where the memory and the processor are coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the above-described related method steps to implement the voiceprint recognition method in the above-described embodiment.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer instruction is stored in the computer-readable storage medium, and when the computer instruction runs on an electronic device, the electronic device is caused to execute the relevant method steps to implement the voiceprint recognition method in the foregoing embodiment.
In addition, an embodiment of the present application further provides a computer program product, which when running on a computer, causes the computer to execute the above related steps to implement the voiceprint recognition method in the above embodiment.
In addition, embodiments of the present application also provide a chip (which may also be a component or a module), which may include one or more processing circuits and one or more transceiver pins; the transceiver pins and the processing circuit communicate with each other through an internal connection path, and the processing circuit executes the related method steps to implement the voiceprint recognition method in the above embodiments, controlling the receive pin to receive signals and the transmit pin to send signals.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (22)

1. A voiceprint recognition method, comprising:
acquiring a target voice signal and a first muscle vibration signal;
separating the target voice signal to obtain a first voice signal and a second voice signal;
acquiring first voiceprint information based on the first voice signal and the first muscle vibration signal;
acquiring second voiceprint information based on the second voice signal and the first muscle vibration signal;
obtaining the confidence coefficient of the first voiceprint information based on the first voiceprint information;
acquiring the confidence coefficient of the second voiceprint information based on the second voiceprint information;
detecting that the confidence coefficient of the first voiceprint information is greater than or equal to a registered voiceprint confidence coefficient critical value, and determining that the first voiceprint information belongs to a registered user;
detecting that the confidence coefficient of the second voiceprint information is smaller than the registration voiceprint confidence coefficient critical value, and determining that the second voiceprint information does not belong to a registered user;
and sending the first voice signal corresponding to the first voiceprint information to another electronic device.
2. The method according to claim 1, wherein after the step of sending the first voice signal corresponding to the first voiceprint information to another electronic device, the method further comprises:
and filtering the second voice signal corresponding to the second voiceprint information.
3. The method according to claim 1, wherein before the step of acquiring the target voice signal and the first muscle vibration signal, the electronic device is in a screen-off state, and the method further comprises:
acquiring a third voice signal and a second muscle vibration signal;
acquiring third voiceprint information based on the third voice signal and the second muscle vibration signal;
obtaining the confidence of the third voiceprint information based on the third voiceprint information;
when the confidence coefficient of the third voiceprint information is detected to be larger than or equal to the registration voiceprint confidence coefficient critical value, the third voiceprint information is determined to belong to a registered user, and the electronic equipment displays a desktop;
and when the confidence coefficient of the third voiceprint information is smaller than the registration voiceprint confidence coefficient critical value, determining that the third voiceprint information does not belong to the registered user, and the electronic equipment is still in the screen-off state.
4. The method of claim 3, wherein after the step of the electronic device displaying a desktop, the method further comprises:
acquiring a fourth voice signal and a third muscle vibration signal, wherein the fourth voice signal is used for indicating the electronic equipment to start a target application;
starting the target application based on the fourth voice signal and the third muscle vibration signal.
5. The method of any of claims 1-4, wherein prior to the step of obtaining the target speech signal and the first muscle vibration signal, the method further comprises:
acquiring a fifth voice signal and a fourth muscle vibration signal;
acquiring registered voiceprint information based on the fifth voice signal and the fourth muscle vibration signal;
and acquiring the registration voiceprint confidence critical value based on the registration voiceprint information.
6. The method of any of claims 1-5, wherein the electronic device is a wearable electronic device.
7. The method of claim 6, wherein the wearable electronic device comprises:
a microphone and at least one vibration sensor; the at least one vibration sensor is arranged to fit against a muscle vibration hot area of the neck of the user.
8. The method of claim 7, wherein the method of obtaining the target speech signal and the first muscle vibration signal comprises:
collecting the target voice signal through the microphone; and acquiring the first muscle vibration signal by the at least one vibration sensor.
9. The method of any of claims 1-5, wherein the electronic device is a cell phone.
10. The method of claim 9, wherein the method of obtaining the target speech signal and the first muscle vibration signal comprises:
receiving the target voice signal and the first muscle vibration signal acquired by the wearable electronic device.
11. An electronic device, comprising:
a memory and a processor, the memory and the processor coupled;
the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the steps of:
acquiring a target voice signal and a first muscle vibration signal;
separating the target voice signal to obtain a first voice signal and a second voice signal;
acquiring first voiceprint information based on the first voice signal and the first muscle vibration signal;
acquiring second voiceprint information based on the second voice signal and the first muscle vibration signal;
acquiring the confidence coefficient of the first voiceprint information based on the first voiceprint information;
obtaining the confidence of the second voiceprint information based on the second voiceprint information;
detecting that the confidence coefficient of the first voiceprint information is greater than or equal to a registration voiceprint confidence coefficient critical value, and determining that the first voiceprint information belongs to a registered user;
detecting that the confidence coefficient of the second voiceprint information is smaller than the registration voiceprint confidence coefficient critical value, and determining that the second voiceprint information does not belong to a registered user;
and sending the first voice signal corresponding to the first voiceprint information to another electronic device.
12. The electronic device of claim 11, wherein prior to the step of sending the first voice signal corresponding to the first voiceprint information to another electronic device, the program instructions, when executed by the processor, cause the electronic device to perform the steps of:
and filtering the second voice signal corresponding to the second voiceprint information.
13. The electronic device of claim 11 or 12, wherein prior to the step of acquiring the target speech signal and the first muscle vibration signal, the program instructions, when executed by the processor, cause the electronic device to perform the steps of:
acquiring a third voice signal and a second muscle vibration signal;
acquiring third voiceprint information based on the third voice signal and the second muscle vibration signal;
acquiring the confidence of the third voiceprint information based on the third voiceprint information;
when the confidence coefficient of the third voiceprint information is detected to be larger than or equal to the registration voiceprint confidence coefficient critical value, determining that the third voiceprint information belongs to a registered user, and displaying a desktop by the electronic equipment;
and when the confidence coefficient of the third voiceprint information is smaller than the registration voiceprint confidence coefficient critical value, determining that the third voiceprint information does not belong to the registered user, and the electronic equipment is still in the screen-off state.
14. The electronic device of claim 13, wherein after the step of the electronic device displaying a desktop, the program instructions, when executed by the processor, cause the electronic device to perform the steps of:
acquiring a fourth voice signal and a third muscle vibration signal, wherein the fourth voice signal is used for indicating the electronic equipment to start a target application;
starting the target application based on the fourth voice signal and the third muscle vibration signal.
15. The electronic device of any of claims 11-14, wherein prior to the step of acquiring the target speech signal and the first muscle vibration signal, the program instructions, when executed by the processor, cause the electronic device to perform the steps of:
acquiring a fifth voice signal and a fourth muscle vibration signal;
acquiring registered voiceprint information based on the fifth voice signal and the fourth muscle vibration signal;
and acquiring the registration voiceprint confidence critical value based on the registration voiceprint information.
16. The electronic device of any of claims 11-15, wherein the electronic device is a wearable electronic device.
17. The electronic device of claim 16, wherein the wearable electronic device comprises:
a microphone and at least one vibration sensor; the at least one vibration sensor is arranged to fit against a muscle vibration hot area of the neck of the user.
18. The electronic device of claim 17, wherein the program instructions, when executed by the processor, cause the wearable electronic device to perform the steps of: the method for acquiring a target voice signal and a first muscle vibration signal comprises the steps of collecting the target voice signal through the microphone; and acquiring the first muscle vibration signal by the at least one vibration sensor.
19. The electronic device of any of claims 11-15, wherein the electronic device is a cell phone.
20. The electronic device of claim 19, wherein the program instructions, when executed by the processor, cause the electronic device to perform the steps of: the method for acquiring the target voice signal and the first muscle vibration signal comprises the step of receiving the target voice signal and the first muscle vibration signal acquired by the wearable electronic device.
21. A computer-readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform a voiceprint recognition method as claimed in any one of claims 1 to 7.
22. A chip comprising one or more processing circuits and one or more transceiver pins; wherein the transceiver pin and the processing circuit communicate with each other via an internal connection path, and the processing circuit performs the voiceprint recognition method of any one of claims 1 to 7 to control the receiver pin to receive signals and to control the transmitter pin to transmit signals.
CN202111094139.3A 2021-09-17 2021-09-17 Voiceprint recognition method and electronic equipment Active CN115035886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111094139.3A CN115035886B (en) 2021-09-17 2021-09-17 Voiceprint recognition method and electronic equipment

Publications (2)

Publication Number Publication Date
CN115035886A true CN115035886A (en) 2022-09-09
CN115035886B CN115035886B (en) 2023-04-14

Family

ID=83118013

Country Status (1)

Country Link
CN (1) CN115035886B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190179594A1 (en) * 2017-12-07 2019-06-13 Motorola Mobility Llc Electronic Devices and Methods for Selectively Recording Input from Authorized Users
CN111475206A (en) * 2019-01-04 2020-07-31 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant