CN114120950A - Human voice shielding method and electronic equipment


Info

Publication number
CN114120950A
CN114120950A (application CN202210097399.4A)
Authority
CN
China
Prior art keywords
target person
sound
target
sound data
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210097399.4A
Other languages
Chinese (zh)
Other versions
CN114120950B (en)
Inventor
杨昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210097399.4A priority Critical patent/CN114120950B/en
Publication of CN114120950A publication Critical patent/CN114120950A/en
Application granted granted Critical
Publication of CN114120950B publication Critical patent/CN114120950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/1752Masking
    • G10K11/1754Speech masking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1781Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
    • G10K11/17821Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions characterised by the analysis of the input signals only
    • G10K11/17827Desired external signals, e.g. pass-through audio such as music or speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1783Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase handling or detecting of non-standard events or conditions, e.g. changing operating modes under specific operating conditions
    • G10K11/17837Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase handling or detecting of non-standard events or conditions, e.g. changing operating modes under specific operating conditions by retaining part of the ambient acoustic environment, e.g. speech or alarm signals that the user needs to hear
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K2210/00Details of active noise control [ANC] covered by G10K11/178 but not provided for in any of its subgroups
    • G10K2210/30Means
    • G10K2210/301Computational
    • G10K2210/3025Determination of spectrum characteristics, e.g. FFT

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

In the technical solution of the human voice shielding method and electronic device provided by the embodiments of the invention, the correspondence between the speakers in the current scene and the sound data is determined according to a plurality of pieces of collected sound data; in response to an operation of a user, a target person and initial relative orientation information of the target person relative to the user are determined from the speakers in the current scene; target sound spectrum information is extracted from the sound data corresponding to the target person, and spatial difference compensation filter coefficients are obtained according to the initial relative orientation information; and the sound of the target person is shielded according to the target sound spectrum information and the spatial difference compensation filter coefficients. The embodiments of the invention can shield only the sound of a specific person while keeping the sounds of other people and the ambient sound.

Description

Human voice shielding method and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to a human voice shielding method and electronic equipment.
Background
At present, TWS earphones and AR/VR-related audio technologies are developing rapidly. When a user is in a scene with multiple people and multiple sound sources, the user sometimes needs to shield the voice of a specific person while keeping the voices of other people and the ambient sound. Conventional sound-signal shielding techniques are indiscriminate, and cannot achieve the goal of shielding only the voice of a specific person while keeping the voices of other people and the ambient sound.
Disclosure of Invention
In view of this, embodiments of the present invention provide a human voice shielding method and an electronic device, which can shield only the voice of a specific person and retain the voice of other persons and environmental sounds.
In a first aspect, an embodiment of the present invention provides a human voice shielding method, where the method includes:
determining a corresponding relation between a speaker in a current scene and the sound data according to the collected sound data;
responding to an operation of a user, and determining, from the speakers in the current scene, a target person and initial relative orientation information of the target person relative to the user;
extracting target sound spectrum information from the sound data corresponding to the target person, and obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information;
and shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient.
With reference to the first aspect, in certain implementations of the first aspect, the extracting target sound spectrum information from the sound data corresponding to the target person includes:
obtaining a discrete Fourier coefficient by performing discrete time Fourier transform on the sound data corresponding to the target person;
and performing voice signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information.
With reference to the first aspect, in certain implementations of the first aspect, the obtaining spatial difference compensation filter coefficients according to the initial relative orientation information includes:
acquiring real-time relative azimuth information of the target person relative to the user;
obtaining real-time direction difference of the target person relative to the user according to the initial relative direction information and the real-time relative direction information;
and acquiring the spatial difference compensation filter coefficient corresponding to the real-time azimuth difference from a spatial cue library.
With reference to the first aspect, in certain implementations of the first aspect, the masking the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficients includes:
obtaining a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficient;
generating, according to the signal to be shielded, a shielding signal which is opposite in phase to the signal to be shielded and equal in amplitude to the signal to be shielded;
and shielding the signal to be shielded through the shielding signal so as to eliminate the sound of the target person.
With reference to the first aspect, in certain implementations of the first aspect, the determining, according to a plurality of collected sound data, a correspondence between a speaker in a current scene and the sound data includes:
and determining the corresponding relation between the speaker and the sound data in the current scene through a speaker segmentation clustering algorithm according to the sound data.
With reference to the first aspect, in some implementations of the first aspect, before the determining, in response to an operation of a user, of a target person and initial relative orientation information of the target person relative to the user from the speakers in the current scene, the method further includes:
and extracting the voiceprint characteristics of the corresponding speaker from the voice data.
With reference to the first aspect, in certain implementations of the first aspect, before the masking the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient, the method further includes:
judging whether the currently received sound data comprises the voiceprint characteristics of the target person;
if the currently received sound data is judged to include the voiceprint characteristics of the target person, continuing to execute the step of shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient;
and if the currently received sound data does not comprise the voiceprint features of the target person, continuing to execute the step of judging whether the currently received sound data comprises the voiceprint features of the target person.
With reference to the first aspect, in certain implementations of the first aspect, the voiceprint features include a spectrogram, a fundamental frequency trace, and a long-time averaged spectrum.
With reference to the first aspect, in certain implementations of the first aspect, the obtaining real-time relative orientation information of the target person with respect to the user includes:
and obtaining the real-time relative orientation information according to the binaural time difference, the binaural amplitude difference and the binaural cross-correlation coefficient.
In a second aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the computer program includes program instructions, and when the processor executes the program instructions, the electronic device is caused to perform the following steps:
determining a corresponding relation between a speaker in a current scene and the sound data according to the collected sound data;
responding to an operation of a user, and determining, from the speakers in the current scene, a target person and initial relative orientation information of the target person relative to the user;
extracting target sound spectrum information from the sound data corresponding to the target person, and obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information;
and shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient.
With reference to the second aspect, in some implementations of the second aspect, the extracting target sound spectrum information from the sound data corresponding to the target person includes:
obtaining a discrete Fourier coefficient by performing discrete time Fourier transform on the sound data corresponding to the target person;
and performing voice signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information.
With reference to the second aspect, in some implementations of the second aspect, the deriving a spatial difference compensation filter coefficient according to the initial relative orientation information includes:
acquiring real-time relative azimuth information of the target person relative to the user;
obtaining real-time direction difference of the target person relative to the user according to the initial relative direction information and the real-time relative direction information;
and acquiring the spatial difference compensation filter coefficient corresponding to the real-time azimuth difference from a spatial cue library.
With reference to the second aspect, in certain implementations of the second aspect, the masking the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficients includes:
obtaining a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficient;
generating, according to the signal to be shielded, a shielding signal which is opposite in phase to the signal to be shielded and equal in amplitude to the signal to be shielded;
and shielding the signal to be shielded through the shielding signal so as to eliminate the sound of the target person.
With reference to the second aspect, in certain implementations of the second aspect, the determining, according to the collected multiple pieces of sound data, a correspondence between a speaker in the current scene and the sound data includes:
and determining the corresponding relation between the speaker and the sound data in the current scene through a speaker segmentation clustering algorithm according to the sound data.
With reference to the second aspect, in some implementations of the second aspect, before the determining, in response to an operation of a user, of a target person and initial relative orientation information of the target person relative to the user from the speakers in the current scene, the method further includes:
and extracting the voiceprint characteristics of the corresponding speaker from the voice data.
With reference to the second aspect, in certain implementations of the second aspect, before the masking the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient, the method further includes:
judging whether the currently received sound data comprises the voiceprint characteristics of the target person;
if the currently received sound data is judged to include the voiceprint characteristics of the target person, continuing to execute the step of shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient;
and if the currently received sound data does not comprise the voiceprint features of the target person, continuing to execute the step of judging whether the currently received sound data comprises the voiceprint features of the target person.
With reference to the second aspect, in certain implementations of the second aspect, the voiceprint features include a spectrogram, a fundamental frequency trace, and a long-time averaged spectrum.
With reference to the second aspect, in some implementations of the second aspect, the obtaining the real-time relative orientation information of the target person with respect to the user includes:
and obtaining the real-time relative orientation information according to the binaural time difference, the binaural amplitude difference and the binaural cross-correlation coefficient.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a computer, cause the computer to perform the method described above.
According to the technical solution of the human voice shielding method and electronic device, the correspondence between the speakers in the current scene and the sound data is determined according to a plurality of pieces of collected sound data; in response to an operation of a user, a target person and initial relative orientation information of the target person relative to the user are determined from the speakers in the current scene; target sound spectrum information is extracted from the sound data corresponding to the target person, and spatial difference compensation filter coefficients are obtained according to the initial relative orientation information; and the sound of the target person is shielded according to the target sound spectrum information and the spatial difference compensation filter coefficients. The embodiments of the invention can shield only the sound of a specific person while keeping the sounds of other people and the ambient sound.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a human voice shielding method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a first interface;
FIG. 3 is a schematic diagram of separating time domain speech information of different speakers by a speaker segmentation and clustering algorithm;
FIG. 4 is a schematic illustration of a voiceprint feature;
FIG. 5 is a flowchart of the extraction of target audio spectrum information from the audio data corresponding to the target person in FIG. 1;
FIG. 6 is a schematic diagram of extracting target sound spectrum information and generating a signal to be shielded;
fig. 7 is a general flowchart of a human voice shielding method according to an embodiment of the present invention;
FIG. 8 is a flowchart of the method of FIG. 1 for obtaining spatial difference compensation filter coefficients corresponding to a target person;
FIG. 9 is a schematic diagram of spatial difference binaural compensation;
FIG. 10 is a schematic illustration of ITD calculation;
FIG. 11 is a diagram illustrating the generation of a spatial cue library;
FIG. 12 is a flowchart of shielding the sound of the target person according to the target sound spectrum information and spatial difference compensation filter coefficients in FIG. 1;
FIG. 13 is a schematic diagram of the principle of human acoustic shielding;
FIG. 14 is a schematic diagram of a standard Hybrid ANC algorithm framework;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of an electronic device according to yet another embodiment of the present invention.
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects, and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship.
The human voice shielding method provided by the embodiments of the present invention is applied in scenes with multiple people and multiple sound sources; when a user uses the electronic device, the voice of a specific person is shielded by the electronic device. In the embodiments of the present invention, the electronic device includes an earphone, an Augmented Reality (AR)/Virtual Reality (VR) device, or another similar small electronic terminal that supports active noise cancellation.
For example, a scene with multiple people and multiple sound sources may be a live conference. If a participant keeps talking about matters unrelated to the conference topic, this interferes with the user's communication with the other attendees; or the user simply does not want to hear that participant's voice in the current scene, yet still wants to communicate with the other people in the conference. In this case, the participant's voice can be removed from the physical sound signal in real time by the electronic device.
As another example, a scene with multiple people and multiple sound sources may be an AR/VR scene, in which people present in the real environment and other people merged into the current environment through AR/VR technology exist at the same time. When their positions overlap or their voices are difficult to distinguish by ear, the voices of the people the user is paying relatively little attention to can be temporarily shielded by the electronic device.
By adopting the human voice shielding method provided by the embodiments of the present invention in a scene with multiple people and multiple sound sources, the voice of a specific person can be shielded while the voices of other people and the ambient sound are retained.
Specifically, the electronic device collects the sound data of all speakers in the current scene through its microphone and determines the correspondence between the speakers in the current scene and the sound data according to the collected sound data. The user selects the identifier of the target person from the identifiers of the speakers presented on a first interface of the electronic device and, while facing the target person, clicks the calibration control. In response to the user's operation, the electronic device determines the target person and the initial relative orientation information of the target person relative to the user from the speakers in the current scene; it extracts target sound spectrum information from the sound data corresponding to the target person and obtains spatial difference compensation filter coefficients according to the initial relative orientation information; it then obtains a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficients, generates a shielding signal that is opposite in phase and equal in amplitude to the signal to be shielded, and sends the shielding signal to the receiver of the electronic device so as to shield the sound of the target person.
Fig. 1 is a flowchart of a human voice shielding method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
and step 102, determining the corresponding relation between the speaker and the sound data in the current scene according to the collected sound data.
In the embodiment of the invention, each step is executed by the electronic equipment. The electronic device comprises a headset, an AR/VR device or other similar small electronic terminal that supports active noise reduction techniques.
Before this step, the user needs to input the number of speakers in the current scene on the first interface of the electronic device, as shown in fig. 2. For example, when the number of speakers in the current scene is 7 (including the user), the user inputs 8 (one greater than the actual number of speakers); a "target person selection" widget is then displayed in the first interface, containing the entries A, B, C, D, E, F, G and Other. A, B, C, D, E, F and G represent the 7 speakers in the current scene, and Other is used to avoid or mitigate the situation where several people speak at the same time. The electronic device then starts to collect the sound data of all speakers in the current scene through the microphone. Collection is continuous, but whenever a new speaker's sound data has been collected the user needs to click the button on the right side of the collection progress bar, and the corresponding collection completion indicator increases; after the sound data of all 7 speakers has been collected, the button is grayed out and shows "completed", as shown in fig. 2.
In the embodiment of the present invention, step 102 specifically includes: and determining the corresponding relation between the speaker and the sound data in the current scene through a speaker segmentation clustering algorithm according to the plurality of sound data.
In the step, after the voice data of all speakers in the current scene are acquired, the acquired voice data are processed through a speaker segmentation clustering algorithm, and the corresponding relation between the speakers and the voice data in the current scene is determined. As shown in fig. 3, the speaker segmentation and clustering algorithm can separate time-domain speech information of different speakers.
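As an illustration only and not part of the patent, the following minimal Python sketch shows one way frame-level spectral features could be clustered into per-speaker labels. Real speaker segmentation and clustering systems use learned speaker embeddings (for example x-vectors) rather than raw log-spectra; the function name, the SciPy/scikit-learn calls and all parameter values here are assumptions made for this sketch.

```python
import numpy as np
from scipy.signal import stft
from sklearn.cluster import AgglomerativeClustering

def diarize(audio, sr, n_speakers, frame_len=1024, hop=512):
    """Toy speaker segmentation/clustering: label each frame with a speaker id.
    Log-magnitude STFT frames stand in for real speaker embeddings."""
    _, _, spec = stft(audio, fs=sr, nperseg=frame_len, noverlap=frame_len - hop)
    feats = np.log1p(np.abs(spec)).T                     # one feature vector per frame
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(feats)
    return labels                                        # labels[i] = cluster (speaker) of frame i
```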
Step 104, extracting the voiceprint features of the corresponding speaker from the sound data.
This step mainly extracts the voiceprint features of each speaker. As shown in fig. 4, the voiceprint features include a Spectrogram, a Pitch contour (fundamental frequency trace) and a Long-Time Average Spectrum (LTAS). It should be noted that the voiceprint features may include other features according to actual needs.
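For illustration, a minimal Python sketch of the three voiceprint features named above follows; the autocorrelation pitch tracker, the 60-400 Hz search band and all other parameters are assumptions made for this sketch, not values taken from the patent.

```python
import numpy as np
from scipy.signal import stft

def voiceprint_features(audio, sr, frame_len=1024, hop=256):
    """Compute a spectrogram, a crude fundamental-frequency (pitch) contour
    and the long-time average spectrum (LTAS) for one speaker's audio."""
    _, _, spec = stft(audio, fs=sr, nperseg=frame_len, noverlap=frame_len - hop)
    spectrogram = np.abs(spec)                    # |X(f, t)|
    ltas = spectrogram.mean(axis=1)               # long-time average spectrum

    pitch = []
    for start in range(0, len(audio) - frame_len, hop):
        frame = audio[start:start + frame_len]
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = int(sr / 400), int(sr / 60)      # assume f0 between 60 and 400 Hz
        lag = lo + np.argmax(ac[lo:hi])
        pitch.append(sr / lag)                    # crude f0 estimate for this frame
    return spectrogram, np.array(pitch), ltas
```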
Step 106, in response to an operation of the user, determining a target person and initial relative orientation information of the target person relative to the user from the speakers in the current scene.
The user operation includes the user selecting the identifier of the target person from the identifiers of the speakers displayed on the first interface and, while facing the target person, clicking the calibration control.
In this step, as shown in fig. 2, the first interface further includes a "calibration" widget, which contains a compass, a calibration control and the prompt "Note: after facing the target speaker, click 'Calibrate'". After the user determines the target person (i.e. the speaker whose sound the user wants to shield) from the speakers in the current scene, the user faces the target person and clicks the calibration control. The electronic device receives the user's operation and, in response to it, determines the target person and the initial relative orientation information of the target person relative to the user from the speakers in the current scene. The initial relative orientation information is the relative orientation information of the target person relative to the user at the moment the user faces the target person.
Step 108, extracting the target sound spectrum information from the sound data corresponding to the target person, and continuing to execute step 112.
In the embodiment of the present invention, as shown in fig. 5, step 108 includes:
Step 1082, obtaining discrete Fourier coefficients by performing a discrete-time Fourier transform on the sound data corresponding to the target person.
As shown in fig. 6, first, Discrete Time Fourier Transform (DTFT) is performed on sound data corresponding to a target person, and then Discrete Fourier Transform (DFT) coefficients at a frame level are obtained, where the DFT coefficients include a mixed spectrum real part and a mixed spectrum imaginary part.
Step 1084, performing speech signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information, and continuing to execute step 112.
In the embodiment of the present invention, the sound data corresponding to the target person may include environmental noise and other non-linear interferences in addition to the voice of the target person. Therefore, the DFT coefficients are further subjected to background noise cancellation, nonlinear processing and other speech signal enhancement to obtain target sound spectrum information with higher confidence, where the target sound spectrum information includes a target sound spectrum real part and a target sound spectrum imaginary part, as shown in fig. 6.
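A minimal sketch of steps 1082 and 1084 follows, assuming a simple spectral-subtraction style enhancement as a stand-in for the patent's (unspecified) background-noise cancellation and nonlinear processing; the noise estimate, the spectral floor and the function interface are all assumptions.

```python
import numpy as np

def target_spectrum(frames, noise_mag):
    """Frame-level DFT of the target person's sound data followed by a simple
    spectral-subtraction enhancement; returns the real and imaginary parts
    of the (enhanced) target sound spectrum.
    frames    : 2-D array, one windowed time-domain frame per row
    noise_mag : estimated magnitude spectrum of the background noise"""
    spec = np.fft.rfft(frames, axis=1)                   # mixed-spectrum DFT coefficients
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, 0.1 * mag)   # subtract noise, keep a spectral floor
    clean = clean_mag * np.exp(1j * phase)
    return clean.real, clean.imag                        # target spectrum real / imaginary parts
```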
Conventional pass-through (transparency) and speech-enhancement techniques only enhance target speech indiscriminately, whereas the embodiment of the present invention also applies this enhancement specifically to the voice of the target person who is to be shielded.
Step 110, obtaining spatial difference compensation filter coefficients according to the initial relative orientation information, and continuing to execute step 114.
Fig. 7 is a general flowchart of the human voice shielding method according to an embodiment of the present invention. As shown in fig. 7, before the spatial difference compensation filter coefficients are obtained, the real-time relative orientation information and the real-time orientation difference are obtained in turn, and the spatial difference compensation filter coefficients are then obtained according to the real-time orientation difference.
Specifically, as shown in fig. 8, step 110 includes:
and 1102, acquiring real-time relative direction information of the target person relative to the user.
In the embodiment of the invention, because the relative orientation between the user and the target person may change over time, the real-time relative orientation information of the target person relative to the user needs to be acquired.
Specifically, real-time relative orientation information is obtained according to a binaural time difference (ITD), a binaural amplitude difference (ILD) and a binaural cross-correlation coefficient (IACC).
Human hearing distinguishes spatial orientation mainly by means of binaural cues and monaural cues. The two kinds of cues have different emphases and support each other. The binaural cues are mainly the ITD and the ILD, and mainly affect the perceived azimuth of a sound in the horizontal plane. The monaural cues are mainly the spectral changes caused by the reflection of signals arriving from different directions off the head, torso, pinna and outer ear; the brain then decodes the perceived elevation (vertical direction) from these changes in the monaural spectrum. Of course, neither the binaural cues nor the monaural cues work completely independently; they also cooperate, and this "cooperation" can be described simply by the IACC.
As shown in fig. 9, the embodiment of the present invention infers the real-time relative orientation information from the ITD, ILD and IACC, and then calculates the real-time orientation difference with respect to the initial relative orientation information. The ITD is calculated as

ITD(θ) = (a / c) · (θ + sin θ)    (1)

where a is generally taken as the constant 0.0875 m, c is the speed of sound and θ is the incidence angle; for a source directly ahead the incidence angle is 0° and the ITD is likewise 0. Taking fig. 10 as an example, suppose the sound source is at an azimuth θ1 to the front left and, after the orientation changes, at an azimuth θ2. Equations (2) and (3) express the corresponding left-ear and right-ear signals as modulated signals offset by the inter-aural delay, where ω is the modulation frequency and M is the modulation index. Therefore, once the delay between the left-ear and right-ear signals is known, the corresponding change in angle can be determined.
The relationship between the ILD and orientation is more direct: the ILD at azimuth θs is the amplitude difference, in dB, between the left-ear and right-ear signals (equation (4)); from a measured ILD(θs) = x dB, the corresponding orientation θs can be deduced inversely.
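As a rough illustration of how equations (1) and (4) could be used in code, the sketch below estimates the azimuth by measuring the inter-aural delay with a cross-correlation and inverting the ITD model of equation (1) on a grid, and computes an ILD alongside it. The search grid, the sign convention for left/right and the simple amplitude-ratio ILD are assumptions made for this sketch.

```python
import numpy as np

A_HEAD = 0.0875   # head radius a in metres, as used in equation (1)
C_SOUND = 343.0   # speed of sound c in m/s

def itd_model(theta):
    """Equation (1): ITD for azimuth theta in radians (0 = straight ahead)."""
    return (A_HEAD / C_SOUND) * (theta + np.sin(theta))

def estimate_azimuth(left, right, sr):
    """Estimate azimuth from the measured inter-aural delay and report the ILD."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)             # delay of left relative to right, in samples
    itd_measured = lag / sr
    grid = np.linspace(-np.pi / 2, np.pi / 2, 1801)      # candidate azimuths
    theta = grid[np.argmin(np.abs(itd_model(grid) - itd_measured))]
    ild_db = 20 * np.log10((np.abs(right).mean() + 1e-12) /
                           (np.abs(left).mean() + 1e-12))
    return theta, ild_db
```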
and 1104, obtaining the real-time direction difference of the target person relative to the user according to the initial relative direction information and the real-time relative direction information.
As shown in fig. 9, the real-time spatial orientation difference module computes the real-time orientation difference from the real-time relative orientation information and the initial relative orientation information.
Step 1106, obtaining the spatial difference compensation filter coefficients corresponding to the real-time orientation difference from the spatial cue library, and continuing to step 114.
After the real-time orientation difference is calculated, the corresponding spatial difference compensation filter coefficients can be found in the spatial cue library, as shown in table 1. The spatial difference compensation filter coefficients consist of two-channel Finite Impulse Response (FIR) filter coefficients, which are used to apply compensating filtering to the left-ear and right-ear channels.
[Table 1: correspondence between the real-time orientation difference and the spatial difference compensation filter coefficients]
Different speakers differ in their physiological characteristics, so the data in the spatial cue library can initially be obtained from measurements with the Head and Torso Simulator (HATS) from Brüel & Kjær, the HMS artificial head from Head Acoustics, and the KEMAR artificial head (Knowles Electronics Manikin for Acoustic Research) from GRAS, as shown in fig. 11. Because these artificial-head standards are statistically optimized over a large number of real people, the average of the data measured with the three companies' artificial heads should cover most real human cases. Since the artificial-head standards are mainly based on the biological characteristics of European populations, the averaged data can additionally be personalized to compensate for the statistical differences in biological characteristics of people in different regions. Finally, the spatial cue library is built from the measured relationship between orientation and cue; when it is used, the corresponding real-time binaural and monaural cues can be obtained simply by inputting the relative orientation information.
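A minimal sketch of looking up the two-channel FIR compensation filters for a given real-time orientation difference and applying them to the left/right target spectra is shown below. The dictionary-based cue library, nearest-key lookup and frequency-domain application are assumptions for illustration; the patent does not prescribe a storage format.

```python
import numpy as np

def compensate(spec_l, spec_r, azimuth_diff_deg, cue_library):
    """Apply the spatial difference compensation filters for the current
    orientation difference to the left-ear and right-ear target spectra.
    cue_library : dict mapping a quantised azimuth difference (degrees)
                  to a pair of FIR coefficient arrays (left, right)"""
    key = min(cue_library, key=lambda k: abs(k - azimuth_diff_deg))   # nearest library entry
    fir_l, fir_r = cue_library[key]
    n_fft = 2 * (spec_l.shape[-1] - 1)            # assume half-spectra from rfft
    h_l = np.fft.rfft(fir_l, n_fft)               # filter responses on the same bins
    h_r = np.fft.rfft(fir_r, n_fft)
    return spec_l * h_l, spec_r * h_r
```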
Step 112, judging whether the currently received sound data includes the voiceprint features of the target person; if so, executing step 114; if not, repeating step 112.
The time-frequency characteristics of each speaker can be expressed with a time-frequency feature matrix

F_i = { f_i(m, n, p) },  m = 1, …, M; n = 1, …, N; p = 1, …, P    (5)

where i indexes the i-th speaker, M is the number of features, N is the number of frequency bins and P is the total number of frames; each f_i(m, n, p) corresponds to the time-frequency model of one feature after specific encoding. This time-frequency model is built in order to judge whether the currently received sound data is the sound data of the target person.
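For illustration only, the sketch below decides whether the currently received frames contain the target person's voiceprint by comparing them with a stored time-frequency model of the form in equation (5); a cosine similarity over averaged spectra stands in for the patent's unspecified "specific encoding", and the threshold value is an assumption.

```python
import numpy as np

def is_target_speaking(frame_spec, target_model, threshold=0.6):
    """frame_spec   : complex spectrum of the current frames, shape (N, frames)
    target_model : stored model F_i, shape (M features, N bins, P frames)"""
    cur = np.abs(frame_spec).mean(axis=1)               # average spectrum of current frames
    ref = target_model.mean(axis=(0, 2))                # model averaged over features and frames
    cos = cur @ ref / (np.linalg.norm(cur) * np.linalg.norm(ref) + 1e-12)
    return cos > threshold                              # True -> proceed to step 114
```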
Step 114, shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficients.
In the embodiment of the present invention, as shown in fig. 12, step 114 includes:
and step 1142, compensating the filter coefficient according to the target sound spectrum information and the spatial difference to obtain a signal to be shielded.
The spatial difference compensation filtering compensates for the change in the perceived speech spectrum caused by the change in orientation. The signal generated at this point is the target signal to be eliminated.
Step 1144, generating, according to the signal to be shielded, a shielding signal whose phase is opposite to and whose amplitude is equal to that of the signal to be shielded.
Since the final purpose of the embodiment of the present invention is to shield the sound of the target person, once the amplitude-frequency and phase-frequency information of the target person's sound is available the approach is similar to conventional Active Noise Cancellation (ANC): it is only necessary to send to the receiver of the electronic device a signal that is opposite in phase and equal in amplitude to the target person's signal to be shielded, which cancels the signal to be shielded, as shown in fig. 13.
Step 1146, shielding the signal to be shielded with the shielding signal so as to eliminate the sound of the target person.
The above steps determine what anti-phase signal (i.e. the shielding signal) should be generated at the receiver of the electronic device, thereby shielding the sound of the target person. This step adopts an algorithm framework similar to that of conventional ANC, except that the noise in conventional ANC is replaced by the target person's signal to be shielded. As shown in fig. 14, since the purpose of the embodiment of the present invention is to eliminate the target person's signal to be shielded x(n), the input of the algorithm becomes the spectral information of that signal; the signal to be shielded passes through the primary path P(z) (the primary path represents the full acoustic link over which the target person's voice travels from the target person to the user's eardrum, producing d(n)) and is superimposed with the anti-phase shielding signal y'(n) estimated by the algorithm, so that the target person's sound is cancelled. In fig. 14, e(n) is the residual signal of the algorithm iteration, S(z) is the secondary path (the acoustic path from the loudspeaker of the electronic device to the eardrum; in a practical engineering implementation it is approximated by the transfer function from the loudspeaker to the feedback microphone), and Ŝ(z) is the secondary-path response estimated by the adaptive filter. At this point, the embodiment of the invention has completed the elimination of the target person's sound.
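To make the ANC-style cancellation concrete, here is a minimal FxLMS-style sketch in which the reference is the target person's signal to be shielded rather than generic noise, as described above. The filter length, step size and the use of known impulse responses for P(z), S(z) and Ŝ(z) are assumptions made for this sketch; it is not the patent's implementation.

```python
import numpy as np

def fxlms_mask(x, p_path, s_path, s_hat, num_taps=64, mu=0.005):
    """FxLMS-style loop adapting an anti-phase shielding signal for the
    target person's signal-to-be-shielded x(n); returns the residual e(n)
    at the eardrum.
    p_path : impulse response of the primary path P(z) (target voice -> eardrum)
    s_path : impulse response of the secondary path S(z) (speaker -> eardrum)
    s_hat  : estimate of the secondary path used to filter the reference"""
    n_samples = len(x)
    w = np.zeros(num_taps)                        # adaptive filter W(z)
    x_buf = np.zeros(num_taps)                    # reference buffer
    xf_buf = np.zeros(num_taps)                   # filtered-reference buffer
    d = np.convolve(x, p_path)[:n_samples]        # target voice d(n) at the eardrum
    x_f = np.convolve(x, s_hat)[:n_samples]       # reference filtered by S_hat(z)
    y_hist = np.zeros(len(s_path))                # history of W(z) output, for S(z)
    e = np.zeros(n_samples)
    for n in range(n_samples):
        x_buf = np.roll(x_buf, 1); x_buf[0] = x[n]
        xf_buf = np.roll(xf_buf, 1); xf_buf[0] = x_f[n]
        y = w @ x_buf                             # anti-phase signal before the speaker
        y_hist = np.roll(y_hist, 1); y_hist[0] = y
        y_prime = s_path @ y_hist                 # y'(n): anti-phase signal after S(z)
        e[n] = d[n] - y_prime                     # residual heard at the eardrum
        w += mu * e[n] * xf_buf                   # FxLMS weight update
    return e
```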
In the technical solution of the human voice shielding method provided by the embodiments of the invention, the correspondence between the speakers in the current scene and the sound data is determined according to a plurality of pieces of collected sound data; in response to an operation of a user, a target person and initial relative orientation information of the target person relative to the user are determined from the speakers in the current scene; target sound spectrum information is extracted from the sound data corresponding to the target person, and spatial difference compensation filter coefficients are obtained according to the initial relative orientation information; and the sound of the target person is shielded according to the target sound spectrum information and the spatial difference compensation filter coefficients. The embodiments of the invention can shield only the sound of a specific person while keeping the sounds of other people and the ambient sound.
Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and it should be understood that the electronic device 200 is capable of performing the steps of the human voice shielding method, and details thereof are not described herein to avoid repetition. The electronic apparatus 200 includes: a processing unit 201 and a receiving unit 202.
The processing unit 201 is configured to determine a correspondence between a speaker in a current scene and sound data according to a plurality of collected sound data. Specifically, the processing unit 201 is configured to determine, according to the plurality of sound data, a corresponding relationship between the speaker and the sound data in the current scene through a speaker segmentation and clustering algorithm.
The processing unit 201 is further configured to extract a voiceprint feature of the corresponding speaker from the sound data. The voiceprint features include a spectrogram, a fundamental frequency trace, and a long-time average spectrum.
The receiving unit 202 is used for receiving the operation of the user.
The user operation comprises the operation that the user selects the identification of the target person from the identifications of the speakers displayed on the first interface, and when the user faces the target person, the user clicks the input of the calibration control.
The processing unit 201 is further configured to determine initial relative orientation information of the target person and the target person relative to the user from the speakers in the current scene in response to the user's operation.
The processing unit 201 is further configured to extract the target sound spectrum information from the sound data corresponding to the target person, and continue to perform the operation of determining whether the currently received sound data includes the voiceprint features of the target person. Specifically, the processing unit 201 is configured to perform a discrete-time Fourier transform on the sound data corresponding to the target person to obtain discrete Fourier coefficients, perform speech signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information, and continue to perform the operation of determining whether the currently received sound data includes the voiceprint features of the target person.
The processing unit 201 is further configured to obtain spatial difference compensation filter coefficients according to the initial relative orientation information, and continue to perform the operation of determining whether the currently received sound data includes the voiceprint features of the target person. Specifically, the processing unit 201 is configured to obtain real-time relative orientation information of the target person relative to the user, obtain the real-time orientation difference of the target person relative to the user according to the initial relative orientation information and the real-time relative orientation information, obtain the spatial difference compensation filter coefficients corresponding to the real-time orientation difference from the spatial cue library, and continue to perform the operation of determining whether the currently received sound data includes the voiceprint features of the target person. Specifically, the processing unit 201 is configured to obtain the real-time relative orientation information according to the binaural time difference, the binaural amplitude difference and the binaural cross-correlation coefficient.
The processing unit 201 is further configured to determine whether the currently received sound data includes a voiceprint feature of the target person.
The processing unit 201 is further configured to, if the processing unit 201 determines that the currently received sound data does not include the voiceprint feature of the target person, continue to perform the determination of whether the currently received sound data includes the voiceprint feature of the target person.
The processing unit 201 is further configured to mask the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient if the processing unit 201 determines that the currently received sound data includes the voiceprint feature of the target person. Specifically, the processing unit 201 is configured to obtain a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficient, generate a shielding signal with an opposite phase and an equal amplitude to the signal to be shielded according to the signal to be shielded, and shield the signal to be shielded through the shielding signal to eliminate the sound of the target person.
It should be understood that the electronic device 200 herein is embodied in the form of a functional unit. The term "unit" herein may be implemented in software and/or hardware, and is not particularly limited thereto. For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implement the above-described functions. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
Accordingly, the units of the respective examples described in the embodiments of the present invention can be realized in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The embodiment of the application provides electronic equipment which can be terminal equipment or circuit equipment arranged in the terminal equipment. The electronic device may be adapted to perform the functions/steps of the above-described method embodiments.
Fig. 16 is a schematic structural diagram of an electronic device 300 according to yet another embodiment of the present invention. The electronic device 300 may include a processor 310, an external memory interface 320, an internal memory 321, a Universal Serial Bus (USB) interface 330, a charging management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a mobile communication module 350, a wireless communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, an earphone interface 370D, a sensor module 380, keys 390, a motor 391, an indicator 392, a camera 393, a display 394, and a Subscriber Identification Module (SIM) card interface 395, and the like. The sensor module 380 may include a pressure sensor 380A, a gyroscope sensor 380B, an air pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity light sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, an ambient light sensor 380L, a bone conduction sensor 380M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 300. In other embodiments of the present application, electronic device 300 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 310 may include one or more processing units, such as: the processor 310 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 310 for storing instructions and data. In some embodiments, the memory in the processor 310 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 310. If the processor 310 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 310, thereby increasing the efficiency of the system.
In some embodiments, processor 310 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, the processor 310 may include multiple sets of I2C buses. The processor 310 may be coupled to the touch sensor 380K, the charger, the flash, the camera 393, etc., via different I2C bus interfaces. For example: the processor 310 may be coupled to the touch sensor 380K via an I2C interface, such that the processor 310 and the touch sensor 380K communicate via an I2C bus interface to implement the touch functionality of the electronic device 300.
The I2S interface may be used for audio communication. In some embodiments, the processor 310 may include multiple sets of I2S buses. The processor 310 may be coupled to the audio module 370 via an I2S bus to enable communication between the processor 310 and the audio module 370. In some embodiments, the audio module 370 may communicate audio signals to the wireless communication module 360 via an I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 370 and the wireless communication module 360 may be coupled by a PCM bus interface. In some embodiments, the audio module 370 may also transmit audio signals to the wireless communication module 360 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel forms. In some embodiments, a UART interface is generally used to connect the processor 310 with the wireless communication module 360. For example, the processor 310 communicates with the Bluetooth module in the wireless communication module 360 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 370 may transmit an audio signal to the wireless communication module 360 through the UART interface, so as to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect processor 310 with peripheral devices such as display 394, camera 393, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 310 and camera 393 communicate over a CSI interface to implement the capture functionality of electronic device 300. The processor 310 and the display screen 394 communicate via the DSI interface to implement the display functions of the electronic device 300.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 310 with the camera 393, the display 394, the wireless communication module 360, the audio module 370, the sensor module 380, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 330 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 330 may be used to connect a charger to charge the electronic device 300, to transmit data between the electronic device 300 and peripheral devices, or to connect earphones and play audio through them. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only an illustration, and does not limit the structure of the electronic device 300. In other embodiments of the present application, the electronic device 300 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 340 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 340 may receive charging input from a wired charger via the USB interface 330. In some wireless charging embodiments, the charging management module 340 may receive a wireless charging input through a wireless charging coil of the electronic device 300. The charging management module 340 may also supply power to the electronic device through the power management module 341 while charging the battery 342.
The power management module 341 is configured to connect the battery 342, the charging management module 340 and the processor 310. The power management module 341 receives input from the battery 342 and/or the charge management module 340 and provides power to the processor 310, the internal memory 321, the display 394, the camera 393, and the wireless communication module 360. The power management module 341 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery state of health (leakage, impedance). In other embodiments, the power management module 341 may also be disposed in the processor 310. In other embodiments, the power management module 341 and the charging management module 340 may be disposed in the same device.
The wireless communication function of the electronic device 300 may be implemented by the antenna 1, the antenna 2, the mobile communication module 350, the wireless communication module 360, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 300 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 350 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 300. The mobile communication module 350 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 350 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the filtered electromagnetic wave to the modem processor for demodulation. The mobile communication module 350 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 350 may be disposed in the processor 310. In some embodiments, at least some of the functional modules of the mobile communication module 350 may be disposed in the same device as at least some of the modules of the processor 310.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 370A, the receiver 370B, etc.) or displays images or video through the display 394. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be separate from the processor 310, and may be disposed in the same device as the mobile communication module 350 or other functional modules.
The wireless communication module 360 may provide solutions for wireless communication applied to the electronic device 300, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 360 may be one or more devices integrating at least one communication processing module. The wireless communication module 360 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 310. The wireless communication module 360 may also receive a signal to be transmitted from the processor 310, frequency-modulate and amplify the signal, and convert the signal into electromagnetic waves via the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the electronic device 300 is coupled to the mobile communication module 350, and the antenna 2 is coupled to the wireless communication module 360, so that the electronic device 300 may communicate with networks and other devices via wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite based augmentation systems (SBAS).
The electronic device 300 implements display functions via the GPU, the display 394, and the application processor, among other things. The GPU is an image processing microprocessor coupled to a display 394 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 310 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 394 is used to display images, video, and the like. The display screen 394 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 300 may include 1 or N display screens 394, N being a positive integer greater than 1.
The electronic device 300 may implement a shooting function through the ISP, the camera 393, the video codec, the GPU, the display 394, the application processor, and the like.
The ISP is used to process the data fed back by the camera 393. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image, and can optimize parameters such as exposure and color temperature of a shooting scene. In some embodiments, the ISP may be located in the camera 393.
Camera 393 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 300 may include 1 or N cameras 393, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals, and can process digital image signals as well as other digital signals. For example, when the electronic device 300 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 300 may support one or more video codecs. In this way, the electronic device 300 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent recognition of the electronic device 300, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 300. The external memory card communicates with the processor 310 through the external memory interface 320 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 321 may be used to store computer-executable program code, which includes instructions. The internal memory 321 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data (e.g., audio data, a phone book) created during use of the electronic device 300, and the like. In addition, the internal memory 321 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 310 performs various functional applications and data processing of the electronic device 300 by executing instructions stored in the internal memory 321 and/or instructions stored in a memory provided in the processor.
The electronic device 300 may implement audio functions, such as music playing and recording, through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the earphone interface 370D, and the application processor.
The audio module 370 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 370 may also be used to encode and decode audio signals. In some embodiments, the audio module 370 may be disposed in the processor 310, or some functional modules of the audio module 370 may be disposed in the processor 310.
The speaker 370A, also called a "horn", is used to convert an audio electrical signal into a sound signal. The electronic device 300 can play music or take hands-free calls through the speaker 370A.
The receiver 370B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic device 300 receives a call or voice information, it can receive voice by placing the receiver 370B close to the ear of the person.
The microphone 370C, also known as a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal into the microphone 370C by speaking with the mouth close to the microphone 370C. The electronic device 300 may be provided with at least one microphone 370C. In other embodiments, the electronic device 300 may be provided with two microphones 370C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 300 may further include three, four, or more microphones 370C to collect sound signals, reduce noise, identify sound sources, perform directional recording, and so on.
The headphone interface 370D is used to connect wired headphones. The headset interface 370D may be the USB interface 330, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 380A is used for sensing a pressure signal and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 380A may be disposed on the display screen 394. There are many types of pressure sensors 380A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of an electrically conductive material. When a force acts on the pressure sensor 380A, the capacitance between the electrodes changes, and the electronic device 300 determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 394, the electronic device 300 detects the intensity of the touch operation according to the pressure sensor 380A. The electronic device 300 may also calculate the touch position from the detection signal of the pressure sensor 380A. In some embodiments, touch operations that are applied to the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
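As an illustration of the pressure-threshold behaviour described above, the short Python sketch below dispatches different instructions for touches at the same position but with different intensities. The threshold value, the function name, and the two returned instruction names are assumptions made for this example only and are not taken from the application.

    # Illustrative only: map touch pressure on the short message icon to an instruction.
    FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized pressure value

    def handle_message_icon_touch(pressure: float) -> str:
        """Return the instruction to execute for a touch on the short message icon."""
        if pressure < FIRST_PRESSURE_THRESHOLD:
            return "view_short_message"        # light press: view the message
        return "create_new_short_message"      # firm press: create a new message

    print(handle_message_icon_touch(0.3))  # view_short_message
    print(handle_message_icon_touch(0.8))  # create_new_short_message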
The gyro sensor 380B may be used to determine the motion posture of the electronic device 300. In some embodiments, the angular velocity of the electronic device 300 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 380B. The gyro sensor 380B may be used for anti-shake during shooting. For example, when the shutter is pressed, the gyro sensor 380B detects the shake angle of the electronic device 300, calculates the distance that the lens module needs to compensate according to the shake angle, and allows the lens to counteract the shake of the electronic device 300 through a reverse movement, thereby achieving anti-shake. The gyro sensor 380B may also be used for navigation and somatosensory gaming scenarios.
The air pressure sensor 380C is used to measure air pressure. In some embodiments, electronic device 300 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 380C.
The magnetic sensor 380D includes a Hall sensor. The electronic device 300 may detect the opening and closing of a flip holster using the magnetic sensor 380D. In some embodiments, when the electronic device 300 is a flip phone, the electronic device 300 may detect the opening and closing of the flip cover according to the magnetic sensor 380D. Features such as automatic unlocking upon flipping open can then be set according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 380E may detect the magnitude of acceleration of the electronic device 300 in various directions (typically along three axes). The magnitude and direction of gravity can be detected when the electronic device 300 is stationary. The acceleration sensor can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
The distance sensor 380F is used to measure distance. The electronic device 300 may measure distance by infrared or laser. In some embodiments, such as in a shooting scene, the electronic device 300 may use the distance sensor 380F to measure distance for fast focusing.
The proximity light sensor 380G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 300 emits infrared light outward through the light emitting diode and detects infrared light reflected from nearby objects using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 300; when insufficient reflected light is detected, the electronic device 300 may determine that there is no object nearby. The electronic device 300 can use the proximity light sensor 380G to detect that the user is holding the electronic device 300 close to the ear for a call, so as to automatically turn off the screen and save power. The proximity light sensor 380G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 380L is used to sense the ambient light level. The electronic device 300 may adaptively adjust the brightness of the display 394 based on the perceived ambient light level. The ambient light sensor 380L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 380L may also cooperate with the proximity light sensor 380G to detect whether the electronic device 300 is in a pocket to prevent inadvertent contact.
The fingerprint sensor 380H is used to capture a fingerprint. The electronic device 300 may utilize the collected fingerprint characteristics to implement fingerprint unlocking, access an application lock, fingerprint photographing, fingerprint incoming call answering, and the like.
The temperature sensor 380J is used to detect temperature. In some embodiments, the electronic device 300 implements a temperature processing strategy using the temperature detected by the temperature sensor 380J. For example, when the temperature reported by the temperature sensor 380J exceeds a threshold, the electronic device 300 reduces the performance of a processor located near the temperature sensor 380J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 300 heats the battery 342 to avoid an abnormal shutdown caused by low temperature. In other embodiments, when the temperature is below a further threshold, the electronic device 300 boosts the output voltage of the battery 342 to avoid an abnormal shutdown caused by low temperature.
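The temperature processing strategy above can be pictured as a simple threshold policy. The sketch below is only an illustration; the numeric thresholds and the action names are assumptions and are not taken from the application.

    # Illustrative only: a threshold-based temperature processing strategy.
    def thermal_policy(temp_c: float) -> str:
        if temp_c > 45.0:        # hypothetical upper threshold: throttle to cool down
            return "reduce_processor_performance"
        if temp_c < -10.0:       # hypothetical further threshold: boost battery output voltage
            return "boost_battery_output_voltage"
        if temp_c < 0.0:         # hypothetical lower threshold: heat the battery
            return "heat_battery"
        return "normal_operation"

    print(thermal_policy(50.0))   # reduce_processor_performance
    print(thermal_policy(-5.0))   # heat_battery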
The touch sensor 380K is also referred to as a "touch device". The touch sensor 380K may be disposed on the display screen 394, and the touch sensor 380K and the display screen 394 form a touch screen, which is also referred to as a "touch screen". The touch sensor 380K is used to detect a touch operation applied thereto or thereabout. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided via the display 394. In other embodiments, the touch sensor 380K can be disposed on a surface of the electronic device 300 at a different location than the display 394.
The bone conduction sensor 380M can acquire vibration signals. In some embodiments, the bone conduction sensor 380M can acquire the vibration signal of the bone mass vibrated by the human vocal part. The bone conduction sensor 380M may also contact the human pulse to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 380M may also be disposed in an earphone to form a bone conduction earphone. The audio module 370 may parse out a voice signal based on the vibration signal of the bone mass vibrated by the vocal part, acquired by the bone conduction sensor 380M, so as to implement a voice function. The application processor can parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 380M, so as to implement a heart rate detection function.
The keys 390 include a power key, volume keys, and the like. The keys 390 may be mechanical keys or touch keys. The electronic device 300 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 300.
The motor 391 may generate vibration prompts. The motor 391 may be used for incoming-call vibration prompts as well as touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, voice shielding) may correspond to different vibration feedback effects, and touch operations on different areas of the display 394 may also correspond to different vibration feedback effects. Different application scenarios (such as time reminders, receiving information, alarm clocks, and games) may likewise correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 392 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 395 is used to connect a SIM card. The SIM card can be brought into and out of contact with the electronic device 300 by being inserted into or pulled out of the SIM card interface 395. The electronic device 300 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 395 may support a Nano SIM card, a Micro SIM card, a standard SIM card, and the like. Multiple cards can be inserted into the same SIM card interface 395 at the same time; the types of the cards may be the same or different. The SIM card interface 395 may also be compatible with different types of SIM cards, as well as with an external memory card. The electronic device 300 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 300 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 300 and cannot be separated from the electronic device 300.
Embodiments of the present application provide a computer-readable storage medium, which stores instructions that, when executed on a terminal device, cause the terminal device to perform the functions/steps in the above method embodiments.
Embodiments of the present application also provide a computer program product containing instructions which, when the computer program product runs on a computer or on at least one processor, cause the computer to perform the functions/steps in the above method embodiments.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, and c" may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may each be single or multiple.
Those of ordinary skill in the art will appreciate that the various units and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of the two. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, any function, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered by the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

1. A human voice shielding method, characterized in that the method comprises:
determining a correspondence between a speaker in a current scene and collected sound data according to the collected sound data;
in response to an operation of a user, determining a target person, and initial relative orientation information of the target person relative to the user, from speakers in the current scene;
extracting target sound spectrum information from the sound data corresponding to the target person, and obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information;
and shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient.
2. The method according to claim 1, wherein the extracting target sound spectrum information from the sound data corresponding to the target person comprises:
obtaining discrete Fourier coefficients by performing a discrete-time Fourier transform on the sound data corresponding to the target person;
and performing voice signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information.
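For illustration only, the following Python sketch shows one possible, non-limiting reading of claim 2: discrete Fourier coefficients are computed for a frame of the target person's sound data, and a simple spectral-subtraction style enhancement is applied to obtain the target sound spectrum information. The frame length, Hann window, noise-floor value, and the choice of spectral subtraction as the enhancement are assumptions made for this example, not details taken from the application.

    import numpy as np

    def extract_target_spectrum(target_audio: np.ndarray,
                                frame_len: int = 512,
                                noise_floor: float = 1e-3) -> np.ndarray:
        """Sketch of claim 2: DFT of the target person's sound data, then a crude
        speech-enhancement step (spectral subtraction is an assumption here)."""
        # Window one analysis frame (Hann window is an assumption).
        frame = target_audio[:frame_len] * np.hanning(frame_len)
        # Discrete Fourier coefficients of the frame.
        coeffs = np.fft.rfft(frame)
        # Enhancement: subtract an assumed noise floor from the magnitudes,
        # keeping the original phases.
        mag = np.maximum(np.abs(coeffs) - noise_floor, 0.0)
        return mag * np.exp(1j * np.angle(coeffs))

    # Tiny usage example with synthetic data.
    audio = np.sin(2 * np.pi * 200 * np.arange(512) / 16000)
    print(extract_target_spectrum(audio).shape)  # (257,)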
3. The method of claim 1, wherein the obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information comprises:
acquiring real-time relative orientation information of the target person relative to the user;
obtaining a real-time orientation difference of the target person relative to the user according to the initial relative orientation information and the real-time relative orientation information;
and acquiring the spatial difference compensation filter coefficient corresponding to the real-time orientation difference from a spatial cue library.
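The spatial cue library of claim 3 can be pictured as a lookup table of filter coefficients indexed by the real-time orientation difference. In the sketch below, the orientation information is reduced to a single azimuth angle in degrees, the library resolution is 10 degrees, and the coefficient values are placeholders; all of these are assumptions made purely for illustration and are not the application's data.

    import numpy as np

    # Hypothetical spatial cue library: FIR coefficients indexed by the orientation
    # (azimuth) difference in degrees, quantized to 10-degree steps. Placeholder values.
    SPATIAL_CUE_LIBRARY = {d: np.array([1.0, 0.5 * np.cos(np.radians(d)), 0.1])
                           for d in range(-180, 181, 10)}

    def spatial_compensation_filter(initial_azimuth_deg: float,
                                    realtime_azimuth_deg: float) -> np.ndarray:
        """Sketch of claim 3: compute the real-time orientation difference of the
        target person relative to the user and fetch matching filter coefficients."""
        diff = realtime_azimuth_deg - initial_azimuth_deg
        # Wrap the difference into (-180, 180] and quantize to the library's grid.
        diff = (diff + 180.0) % 360.0 - 180.0
        key = int(round(diff / 10.0)) * 10
        return SPATIAL_CUE_LIBRARY[key]

    print(spatial_compensation_filter(30.0, 75.0))  # coefficients for roughly a +40 degree difference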
4. The method of claim 1, wherein the shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient comprises:
obtaining a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficient;
generating, according to the signal to be shielded, a shielding signal which is opposite in phase to the signal to be shielded and has the same amplitude as the signal to be shielded;
and shielding the signal to be shielded through the shielding signal so as to eliminate the sound of the target person.
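Purely as an illustration of claim 4, the sketch below reconstructs a time-domain estimate of the signal to be shielded from the target sound spectrum information and the compensation filter, then forms the shielding signal by inverting its sign, which gives the opposite phase at the same amplitude. The use of an inverse FFT plus FIR convolution as the reconstruction is an assumption for the example rather than the application's implementation.

    import numpy as np

    def build_shielding_signal(target_spectrum: np.ndarray,
                               filter_coeffs: np.ndarray) -> np.ndarray:
        """Sketch of claim 4: derive the signal to be shielded, then output a
        shielding signal with opposite phase and equal amplitude."""
        # Time-domain estimate of the target voice as it would reach the user.
        target_time = np.fft.irfft(target_spectrum)
        signal_to_shield = np.convolve(target_time, filter_coeffs, mode="same")
        # Opposite phase, same amplitude: a sign inversion flips the phase by 180 degrees.
        return -signal_to_shield

    # Toy check: the shielding signal cancels the signal it was built from.
    spectrum = np.fft.rfft(np.sin(2 * np.pi * 200 * np.arange(512) / 16000))
    to_shield = np.convolve(np.fft.irfft(spectrum), np.array([1.0]), mode="same")
    shield = build_shielding_signal(spectrum, np.array([1.0]))
    print(np.max(np.abs(to_shield + shield)))  # ~0.0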
5. The method of claim 1, wherein the determining the correspondence between the speaker in the current scene and the sound data according to the collected sound data comprises:
and determining the corresponding relation between the speaker and the sound data in the current scene through a speaker segmentation clustering algorithm according to the sound data.
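A speaker segmentation clustering (diarization) step, as recited in claim 5, can be approximated by splitting the sound data into short segments, describing each segment with a feature vector, and clustering the segments by speaker. The sketch below uses a crude log-spectral feature and scikit-learn's agglomerative clustering with a fixed speaker count; a practical diarization system would use learned speaker embeddings, so every parameter here is an assumption made for illustration.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def diarize_speakers(audio: np.ndarray, fs: int = 16000,
                         segment_s: float = 0.5, n_speakers: int = 2) -> dict:
        """Sketch of claim 5: split sound data into segments, cluster them by
        speaker, and return each speaker's concatenated sound data."""
        seg_len = int(segment_s * fs)
        segments = [audio[i:i + seg_len]
                    for i in range(0, len(audio) - seg_len + 1, seg_len)]
        # Crude per-segment feature: log-magnitude spectrum up to about 1 kHz.
        feats = np.array([np.log1p(np.abs(np.fft.rfft(s))[:512]) for s in segments])
        labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(feats)
        # Correspondence between each speaker label and that speaker's sound data.
        return {lab: np.concatenate([s for s, l in zip(segments, labels) if l == lab])
                for lab in set(labels)}

    # Synthetic two-speaker example: one second of a low tone, then a high tone.
    t = np.arange(16000) / 16000
    audio = np.concatenate([np.sin(2 * np.pi * 150 * t), np.sin(2 * np.pi * 600 * t)])
    print({k: v.shape for k, v in diarize_speakers(audio).items()})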
6. The method of claim 1, wherein, prior to the determining, in response to a user operation, a target person and initial relative orientation information of the target person relative to the user from the speakers in the current scene, the method further comprises:
extracting voiceprint features of the corresponding speaker from the sound data.
7. The method of claim 6, wherein, before the shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient, the method further comprises:
judging whether the currently received sound data includes the voiceprint features of the target person;
if the currently received sound data is judged to include the voiceprint features of the target person, continuing to execute the step of shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient;
and if the currently received sound data does not include the voiceprint features of the target person, continuing to execute the step of judging whether the currently received sound data includes the voiceprint features of the target person.
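The decision loop of claim 7 — keep shielding while the currently received sound data still contains the target person's voiceprint, otherwise keep checking — can be illustrated as follows. The feature used here (a normalized log-magnitude spectrum) and the 0.9 cosine-similarity threshold are stand-ins chosen for the example; claim 8 names the voiceprint features actually contemplated.

    import numpy as np

    def voiceprint_feature(audio: np.ndarray) -> np.ndarray:
        """Crude stand-in for a voiceprint feature (an assumption): a normalized
        log-magnitude spectrum of the frame."""
        spec = np.log1p(np.abs(np.fft.rfft(audio)))
        return spec / (np.linalg.norm(spec) + 1e-12)

    def contains_target_voiceprint(current_audio: np.ndarray,
                                   target_feature: np.ndarray,
                                   threshold: float = 0.9) -> bool:
        """Sketch of the judgment in claim 7, using cosine similarity (assumption)."""
        similarity = float(np.dot(voiceprint_feature(current_audio), target_feature))
        return similarity >= threshold

    # Toy usage: frames of the same length as the enrolled voiceprint frame.
    t = np.arange(8000) / 16000
    target_feature = voiceprint_feature(np.sin(2 * np.pi * 220 * t))
    print(contains_target_voiceprint(np.sin(2 * np.pi * 220 * t), target_feature))  # True
    print(contains_target_voiceprint(np.sin(2 * np.pi * 700 * t), target_feature))  # False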
8. The method of claim 7, wherein the voiceprint features comprise a spectrogram, a fundamental frequency trace, and a long-time averaged spectrum.
9. The method of claim 3, wherein the obtaining real-time relative orientation information of the target person with respect to the user comprises:
and obtaining the real-time relative orientation information according to a binaural time difference, a binaural amplitude difference, and a binaural cross-correlation coefficient.
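As one non-limiting illustration of claim 9, the sketch below estimates the binaural time difference from the peak of the cross-correlation between the two ear signals, computes the binaural amplitude difference as a secondary cue, and maps the time difference to an azimuth with the simple ITD = d*sin(theta)/c model. The ear spacing, speed of sound, and the mapping model are assumptions made for the example.

    import numpy as np

    def binaural_orientation(left: np.ndarray, right: np.ndarray,
                             fs: int = 16000, ear_distance_m: float = 0.18):
        """Sketch of claim 9: derive relative orientation from binaural cues."""
        c = 343.0                                      # speed of sound, m/s (assumption)
        corr = np.correlate(left, right, mode="full")  # binaural cross-correlation
        lag = int(np.argmax(corr)) - (len(right) - 1)  # samples by which left lags right
        itd = lag / fs                                 # binaural time difference, s
        ild_db = 20 * np.log10((np.abs(left).mean() + 1e-12) /
                               (np.abs(right).mean() + 1e-12))  # binaural amplitude difference
        sin_theta = np.clip(itd * c / ear_distance_m, -1.0, 1.0)
        azimuth_deg = float(np.degrees(np.arcsin(sin_theta)))
        return azimuth_deg, ild_db

    # Toy usage: the right-ear signal is a slightly delayed copy of the left-ear signal.
    src = np.sin(2 * np.pi * 300 * np.arange(4000) / 16000)
    print(binaural_orientation(src, np.roll(src, 3)))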
10. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions that, when executed by the processor, cause the electronic device to perform the steps of:
determining a correspondence between a speaker in a current scene and collected sound data according to the collected sound data;
in response to an operation of a user, determining a target person, and initial relative orientation information of the target person relative to the user, from speakers in the current scene;
extracting target sound spectrum information from the sound data corresponding to the target person, and obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information;
and shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient.
11. The electronic device according to claim 10, wherein the extracting target sound spectrum information from the sound data corresponding to the target person includes:
obtaining discrete Fourier coefficients by performing a discrete-time Fourier transform on the sound data corresponding to the target person;
and performing voice signal enhancement processing on the discrete Fourier coefficients to obtain the target sound spectrum information.
12. The electronic device of claim 10, wherein the obtaining a spatial difference compensation filter coefficient according to the initial relative orientation information comprises:
acquiring real-time relative orientation information of the target person relative to the user;
obtaining a real-time orientation difference of the target person relative to the user according to the initial relative orientation information and the real-time relative orientation information;
and acquiring the spatial difference compensation filter coefficient corresponding to the real-time orientation difference from a spatial cue library.
13. The electronic device of claim 10, wherein the shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient comprises:
obtaining a signal to be shielded according to the target sound spectrum information and the spatial difference compensation filter coefficient;
generating, according to the signal to be shielded, a shielding signal which is opposite in phase to the signal to be shielded and has the same amplitude as the signal to be shielded;
and shielding the signal to be shielded through the shielding signal so as to eliminate the sound of the target person.
14. The electronic device of claim 10, wherein the determining a correspondence between a speaker in a current scene and the sound data according to the collected sound data comprises:
and determining the corresponding relation between the speaker and the sound data in the current scene through a speaker segmentation clustering algorithm according to the sound data.
15. The electronic device of claim 10, wherein, prior to the determining, in response to a user operation, a target person and initial relative orientation information of the target person relative to the user from the speakers in the current scene, the steps further comprise:
extracting voiceprint features of the corresponding speaker from the sound data.
16. The electronic device of claim 15, wherein, before the shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient, the steps further comprise:
judging whether the currently received sound data includes the voiceprint features of the target person;
if the currently received sound data is judged to include the voiceprint features of the target person, continuing to execute the step of shielding the sound of the target person according to the target sound spectrum information and the spatial difference compensation filter coefficient;
and if the currently received sound data does not include the voiceprint features of the target person, continuing to execute the step of judging whether the currently received sound data includes the voiceprint features of the target person.
17. The electronic device of claim 16, wherein the voiceprint features comprise a spectrogram, a fundamental frequency trace, and a long-time averaged spectrum.
18. The electronic device of claim 12, wherein the obtaining real-time relative orientation information of the target person with respect to the user comprises:
and obtaining the real-time relative orientation information according to a binaural time difference, a binaural amplitude difference, and a binaural cross-correlation coefficient.
19. A computer-readable storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method according to any one of claims 1-9.
CN202210097399.4A 2022-01-27 2022-01-27 Human voice shielding method and electronic equipment Active CN114120950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210097399.4A CN114120950B (en) 2022-01-27 2022-01-27 Human voice shielding method and electronic equipment

Publications (2)

Publication Number Publication Date
CN114120950A true CN114120950A (en) 2022-03-01
CN114120950B CN114120950B (en) 2022-06-10

Family

ID=80361843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210097399.4A Active CN114120950B (en) 2022-01-27 2022-01-27 Human voice shielding method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114120950B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101682810A (en) * 2007-05-16 2010-03-24 雅马哈株式会社 Video conference device
CN102568468A (en) * 2010-10-20 2012-07-11 雅马哈株式会社 Standing wave attenuation device
CN104508738A (en) * 2012-07-24 2015-04-08 皇家飞利浦有限公司 Directional sound masking
CN105933558A (en) * 2016-04-25 2016-09-07 四川联友电讯技术有限公司 Teleconference high-noise conference participant intelligent shielding and canceling method
CN107154256A (en) * 2017-06-27 2017-09-12 山东省计算中心(国家超级计算济南中心) Sound masking system and self-adapting regulation method based on auditory localization
CN108806711A (en) * 2018-08-07 2018-11-13 吴思 A kind of extracting method and device
CN110517677A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 Speech processing system, method, equipment, speech recognition system and storage medium
CN113707183A (en) * 2021-09-02 2021-11-26 北京奇艺世纪科技有限公司 Audio processing method and device in video
CN113825076A (en) * 2020-06-18 2021-12-21 西万拓私人有限公司 Method for direction dependent noise suppression for a hearing system comprising a hearing device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668892A (en) * 2022-11-14 2023-08-29 荣耀终端有限公司 Audio signal processing method, electronic device and readable storage medium
CN116668892B (en) * 2022-11-14 2024-04-12 荣耀终端有限公司 Audio signal processing method, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN114120950B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN113496708B (en) Pickup method and device and electronic equipment
CN113393856B (en) Pickup method and device and electronic equipment
CN110458902A (en) 3D illumination estimation method and electronic equipment
CN114727212B (en) Audio processing method and electronic equipment
CN114157945B (en) Data processing method and related device
CN114422340A (en) Log reporting method, electronic device and storage medium
CN113225661A (en) Loudspeaker identification method and device and electronic equipment
CN114257920B (en) Audio playing method and system and electronic equipment
CN114339429A (en) Audio and video playing control method, electronic equipment and storage medium
CN114120950B (en) Human voice shielding method and electronic equipment
CN112188094B (en) Image processing method and device, computer readable medium and terminal equipment
CN113518189B (en) Shooting method, shooting system, electronic equipment and storage medium
CN115514844A (en) Volume adjusting method, electronic equipment and system
CN115145527A (en) Method and system for adjusting volume and electronic equipment
CN113129916B (en) Audio acquisition method, system and related device
WO2023197997A1 (en) Wearable device, and sound pickup method and apparatus
CN115412678A (en) Exposure processing method and device and electronic equipment
CN115641867A (en) Voice processing method and terminal equipment
CN114390406B (en) Method and device for controlling displacement of loudspeaker diaphragm
CN115762108A (en) Remote control method, remote control device and controlled device
CN115706755A (en) Echo cancellation method, electronic device, and storage medium
CN115393676A (en) Gesture control optimization method and device, terminal and storage medium
CN113963712A (en) Method for filtering echo, electronic device and computer readable storage medium
CN113436635A (en) Self-calibration method and device of distributed microphone array and electronic equipment
CN111245551A (en) Signal processing method, signal processing device, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20220601

Address after: 100080 floors 2-14, building 3, yard 5, honeysuckle Road, Haidian District, Beijing

Applicant after: Beijing Honor Device Co.,Ltd.

Address before: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Applicant before: Honor Device Co.,Ltd.
