WO2024032035A1 - Voice signal output method and electronic device - Google Patents

Voice signal output method and electronic device

Info

Publication number
WO2024032035A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
frequency point
signal
spectral density
power spectral
Application number
PCT/CN2023/091095
Other languages
English (en)
French (fr)
Other versions
WO2024032035A9 (zh)
Inventor
杨枭
褚建飞
Original Assignee
Honor Device Co., Ltd. (荣耀终端有限公司)
Application filed by Honor Device Co., Ltd.
Publication of WO2024032035A1
Publication of WO2024032035A9

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/1752 Masking
    • G10K11/1754 Speech masking
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used

Definitions

  • the present application relates to the field of terminal technology, and in particular, to a voice signal output method and an electronic device.
  • in a handheld call, when the user turns up the volume, the driving force of the ceramic driver in the sound-emitting component of the ceramic-driven screen increases.
  • as a result, the intensity of the sound leaking into the user's surroundings also increases, so other people nearby can clearly hear the user's call content, which compromises the user's privacy and degrades the user experience.
  • Embodiments of the present application provide a voice signal output method and an electronic device to solve the current problem that, in handheld call scenarios, increasing the volume of an electronic device causes the user's call content to leak and degrades the user experience.
  • embodiments of the present application provide a voice signal output method, which method is used in an electronic device.
  • the electronic device includes a first sound-generating component and a second sound-generating component, and the first sound-generating component is disposed at a first position of the electronic device.
  • the second sound-emitting component is disposed in a second position different from the first position.
  • the method includes: generating a first voice signal, which is an interference signal generated according to a downlink voice signal; and generating a second voice signal, which is a voice signal obtained by performing delay processing on the downlink voice signal so that it has the same time delay as the first voice signal.
  • at the same output time, the second voice signal is output through the first sound-generating component, and the first voice signal is output through the second sound-generating component.
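The delay step can be pictured as a delay line on the downlink path. The sketch below is a minimal illustration, assuming that audio is processed in frames and that the interference-generation path lags its input by a fixed, known number of samples (`delay_samples`); the class and function names are hypothetical and are not taken from the patent.

```python
import numpy as np


class DownlinkDelayLine:
    """Delays a stream of audio frames by a fixed number of samples."""

    def __init__(self, delay_samples: int):
        if delay_samples < 0:
            raise ValueError("delay_samples must be non-negative")
        # Samples still waiting to be emitted; initially silence.
        self.pending = np.zeros(delay_samples)

    def process(self, frame: np.ndarray) -> np.ndarray:
        joined = np.concatenate([self.pending, frame])
        out = joined[: len(frame)]          # emit as many samples as were received
        self.pending = joined[len(frame):]  # carry the rest over to the next frame
        return out


def route_frame(downlink_frame, interference_frame, delay_line):
    """Return (frame for the first component near the ear, frame for the second component).

    Assumes interference_frame already lags the downlink by the delay line's delay.
    """
    second_voice = delay_line.process(np.asarray(downlink_frame, dtype=float))
    first_voice = np.asarray(interference_frame, dtype=float)
    return second_voice, first_voice
```

For example, with `delay_samples=3`, feeding the frame `[1, 2, 3, 4]` yields `[0, 0, 0, 1]`, and the remaining samples appear at the start of the next frame. In a real device the delay would be whatever the interference path actually introduces, which the source does not quantify.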
  • the downlink voice signal can be masked by the interference signal generated based on the downlink voice signal.
  • the sound entering the user's ears is strong enough to ensure that the user can clearly hear the call content.
  • for other people in the user's surroundings, the sound reaching their ears is weaker and its information is incomplete, so they cannot clearly hear the content of the user's call.
  • the user's privacy is therefore effectively protected, and the user experience is better.
  • generating the first speech signal includes: generating a first power spectral density, where the first power spectral density is the power spectral density calculated according to the downlink speech signal; and generating the first speech signal according to the first power spectral density.
  • generating the first speech signal according to the first power spectral density includes: generating a masking signal and a pink noise signal according to the first power spectral density; adjusting the masking signal and the pink noise signal to the same time delay; and generating the first speech signal according to the masking signal and the pink noise signal adjusted to the same time delay.
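A minimal sketch of the alignment and mixing step, assuming each generation path lags the downlink speech by a known number of samples and that the first speech signal is the sample-wise sum of the two aligned signals (the source does not state how they are combined). The function name and latency parameters are illustrative.

```python
import numpy as np


def combine_with_equal_delay(masking, masking_latency, pink, pink_latency):
    """Align two generated signals to the same delay, then sum them.

    Each *_latency is the (hypothetical) number of samples by which that path's
    output lags the downlink speech. The faster path is delayed by the difference.
    """
    target = max(masking_latency, pink_latency)
    m = np.concatenate([np.zeros(target - masking_latency), np.asarray(masking, dtype=float)])
    p = np.concatenate([np.zeros(target - pink_latency), np.asarray(pink, dtype=float)])
    n = max(len(m), len(p))
    # Zero-pad the tails so the two arrays can be added sample by sample.
    return np.pad(m, (0, n - len(m))) + np.pad(p, (0, n - len(p)))
```

Delaying the faster path, rather than advancing the slower one, keeps the sketch causal, which matters for a real-time call.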
  • the generated first voice signal can mask the signal at a frequency point with a larger power spectral density value in the downlink voice signal.
  • generating a masking signal according to the first power spectral density includes: determining a first average power according to the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determining first frequency points, where a first frequency point is a frequency point, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; and generating the masking signal according to the first frequency points.
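The first average power, the first frequency points, and the branch among the masking variants described below can be sketched as follows. This assumes the PSD is already available as arrays of frequencies and values; `first_threshold_hz` is a placeholder for the unspecified first preset frequency threshold, and the returned case labels are hypothetical.

```python
import numpy as np


def first_points_and_case(freqs, psd, first_threshold_hz):
    """Return (first average power, first frequency points, masking variant label)."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    first_average_power = psd.mean()            # mean PSD over all frequency points
    first_points = freqs[psd > first_average_power]
    if len(first_points) == 0:
        case = "none"                           # flat PSD: no point exceeds the mean
    elif len(first_points) == 1:
        case = "single"                         # eighth / ninth-tenth point variants
    elif first_points.max() - first_points.min() < first_threshold_hz:
        case = "narrow"                         # second-third point variant
    else:
        case = "wide"                           # fourth-fifth point or interval variants
    return first_average_power, first_points, case
```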
  • a masking sound can be generated that can mask the signal with a larger power spectral density in the downlink speech signal.
  • the components of the downlink speech signal with larger power spectral density can be masked, so that the intensity of the sound entering the ears of other people in the user's surroundings is weakened.
  • as a result, they cannot clearly hear the user's call content, and the user's privacy is well protected.
  • generating the masking signal according to the first frequency point includes: if there are multiple first frequency points, and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is less than a first preset frequency threshold, selecting a first preset number of first frequency points from all the first frequency points in descending order of their power spectral density values and determining them as second frequency points; determining a third frequency point, where the third frequency point is located between two adjacent second frequency points; determining, according to a preset human ear masking effect curve, the amplitude corresponding to the third frequency point, where the amplitude characterizes the strength of the signal; and generating the masking signal according to the frequency value of the third frequency point and the amplitude corresponding to the third frequency point.
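A sketch of this narrow-spread variant, assuming the third frequency points are the midpoints of adjacent second frequency points and that the masking signal is a sum of sinusoids. The source specifies neither; `hypothetical_masking_amplitude` is an invented placeholder for the preset human ear masking effect curve, not a psychoacoustic model.

```python
import numpy as np


def hypothetical_masking_amplitude(freq_hz):
    """Placeholder for the preset human ear masking effect curve (not given in the source)."""
    return 0.05 * np.sqrt(1000.0 / max(freq_hz, 100.0))


def narrow_case_masking(first_points, first_psd, first_preset_number, fs, n_samples,
                        amplitude_of=hypothetical_masking_amplitude):
    """Return (third frequency points, masking signal)."""
    pts = np.asarray(first_points, dtype=float)
    # Second frequency points: the strongest first points, then sorted by frequency.
    order = np.argsort(np.asarray(first_psd, dtype=float))[::-1]
    second = np.sort(pts[order[:first_preset_number]])
    # Third frequency points: assumed here to be midpoints of adjacent second points.
    third = (second[:-1] + second[1:]) / 2.0
    t = np.arange(n_samples) / fs
    masking = np.zeros(n_samples)
    for f in third:
        masking += amplitude_of(f) * np.sin(2 * np.pi * f * t)
    return third, masking
```

With first points at 400, 500, 600, and 700 Hz and PSD values 1, 4, 3, and 2, a first preset number of 3 keeps 500, 600, and 700 Hz as second points and places third points at 550 and 650 Hz.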
  • generating the masking signal according to the first frequency point includes: if there are multiple first frequency points, and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, selecting, from all the first frequency points, the first frequency point with the largest power spectral density value, the first frequency point with the largest frequency value, and the first frequency point with the smallest frequency value, and determining them as fourth frequency points; selecting, near each fourth frequency point and between the fourth frequency points, a frequency point whose frequency difference from that fourth frequency point is less than or equal to a second preset frequency threshold, and determining it as a fifth frequency point; determining, according to the preset human ear masking effect curve, the amplitude corresponding to the fifth frequency point, where the amplitude characterizes the strength of the signal; and generating the masking signal according to the frequency value of the fifth frequency point and the amplitude corresponding to the fifth frequency point.
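The selection of fourth and fifth frequency points is ambiguous in the text. The sketch below shows one reading, in which each fifth point lies at most the second preset frequency threshold away from its fourth point, stepping toward the nearest other fourth point. Only point selection is shown; amplitude lookup and synthesis follow the same pattern as the other variants. All names are hypothetical.

```python
import numpy as np


def fourth_and_fifth_points(first_points, first_psd, second_threshold_hz):
    """Return (fourth frequency points, fifth frequency points) under one interpretation."""
    pts = np.asarray(first_points, dtype=float)
    psd = np.asarray(first_psd, dtype=float)
    # Fourth points: strongest, highest and lowest first points (duplicates merged, sorted).
    fourth = np.unique([pts[np.argmax(psd)], pts.max(), pts.min()])
    fifth = []
    for i, f in enumerate(fourth):
        neighbours = [fourth[j] for j in (i - 1, i + 1) if 0 <= j < len(fourth)]
        if not neighbours:
            fifth.append(f + second_threshold_hz)
            continue
        nearest = min(neighbours, key=lambda g: abs(g - f))
        # Step toward the nearest fourth point: at most the threshold and at most halfway.
        step = min(second_threshold_hz, abs(nearest - f) / 2.0)
        fifth.append(f + np.sign(nearest - f) * step)
    return fourth, np.array(fifth)
```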
  • generating the masking signal according to the first frequency point includes: if there are multiple first frequency points, and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, selecting, in each frequency point interval, a second preset number of first frequency points in descending order of their power spectral density values and determining them as sixth frequency points, where, in each frequency point interval, the frequency difference between the end frequency point and the start frequency point is less than or equal to a third preset frequency threshold, and the number of first frequency points in each interval is greater than or equal to a third preset number; determining a seventh frequency point for each frequency point interval, where the seventh frequency point is located between two adjacent sixth frequency points in that interval; determining, according to the preset human ear masking effect curve, the amplitude corresponding to the seventh frequency point, where the amplitude characterizes the strength of the signal; and generating the masking signal according to the frequency value of the seventh frequency point and the amplitude corresponding to the seventh frequency point.
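One way to form the frequency point intervals is a greedy scan over the sorted first frequency points, as sketched here. The greedy grouping and the midpoint placement of seventh points are assumptions; `third_threshold_hz`, `min_count`, and `per_interval` stand in for the third preset frequency threshold, the third preset number, and the second preset number.

```python
import numpy as np


def seventh_points_by_interval(first_points, first_psd, third_threshold_hz,
                               min_count, per_interval):
    """Return seventh frequency points, one between each adjacent pair of sixth points."""
    order = np.argsort(np.asarray(first_points, dtype=float))
    pts = np.asarray(first_points, dtype=float)[order]
    psd = np.asarray(first_psd, dtype=float)[order]
    seventh = []
    start = 0
    while start < len(pts):
        # Grow the interval while its span stays within the third threshold.
        end = start
        while end + 1 < len(pts) and pts[end + 1] - pts[start] <= third_threshold_hz:
            end += 1
        if end - start + 1 >= min_count:
            idx = np.arange(start, end + 1)
            # Sixth points: strongest per_interval points in this interval.
            top = idx[np.argsort(psd[idx])[::-1][:per_interval]]
            sixth = np.sort(pts[top])
            seventh.extend((sixth[:-1] + sixth[1:]) / 2.0)
        start = end + 1
    return np.array(seventh)
```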
  • generating the masking signal according to the first frequency point includes: if there is only one first frequency point, selecting one frequency point on each side of the first frequency point and determining them as eighth frequency points; determining, according to the preset human ear masking effect curve, the amplitude corresponding to each eighth frequency point, where the amplitude characterizes the strength of the signal; and generating the masking signal according to the frequency values of the eighth frequency points and their corresponding amplitudes.
  • generating the masking signal according to the first frequency point includes: if there is only one first frequency point, selecting one frequency point on each side of the first frequency point and determining them as ninth frequency points; selecting, between each ninth frequency point and the first frequency point, one frequency point and determining it as a tenth frequency point; determining, according to the preset human ear masking effect curve, the amplitude corresponding to each tenth frequency point, where the amplitude characterizes the strength of the signal; and generating the masking signal according to the frequency values of the tenth frequency points and their corresponding amplitudes.
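For a single first frequency point, the side points and a sum-of-sinusoids synthesis might look like the sketch below. `side_offset_hz` is a hypothetical spacing (the source gives none), the tenth points are assumed to sit midway between each side point and the first point, and `amplitude_of` stands in for the preset human ear masking effect curve.

```python
import numpy as np


def single_point_frequencies(first_point_hz, side_offset_hz):
    """Return (eighth or ninth points, tenth points) for one first frequency point."""
    side = np.array([first_point_hz - side_offset_hz, first_point_hz + side_offset_hz])
    # Tenth points: assumed to lie midway between each side point and the first point.
    tenth = (side + first_point_hz) / 2.0
    return side, tenth


def synthesize_masking(freqs, amplitude_of, fs, n_samples):
    """Sum of sinusoids at the chosen frequencies, each scaled by its curve amplitude."""
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for f in freqs:
        out += amplitude_of(f) * np.sin(2 * np.pi * f * t)
    return out
```

For a first point at 1000 Hz and a 200 Hz offset, the side points are 800 and 1200 Hz and the tenth points are 900 and 1100 Hz.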
  • generating a pink noise signal according to the first power spectral density includes: determining a second average power, which is the average of the power spectral density values of all eleventh frequency points, where an eleventh frequency point is a frequency point, among all frequency points corresponding to the first power spectral density, whose power spectral density value is less than or equal to the first average power, and the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; obtaining a preset pink noise band-pass filter gain corresponding to the second average power; adjusting the gain of a first band-pass filter to the preset pink noise band-pass filter gain; and band-pass filtering the signal output by a pink noise signal source through the gain-adjusted first band-pass filter to generate the pink noise signal.
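A sketch of the pink noise branch, under several assumptions: pink noise is produced by shaping white noise to a 1/f power spectrum in the frequency domain, the first band-pass filter is modelled as an ideal FFT mask over a hypothetical 300 to 3400 Hz band, and `GAIN_TABLE` is an invented stand-in for the preset mapping from second average power to filter gain. Applying the gain after filtering is equivalent to setting the filter's gain, because both operations are linear.

```python
import numpy as np

# Hypothetical lookup table: (upper bound of second average power, band-pass gain).
GAIN_TABLE = [(1e-6, 0.05), (1e-4, 0.1), (np.inf, 0.2)]


def pink_noise(n, rng):
    """White noise shaped so that power falls off as 1/f (amplitude as 1/sqrt(f))."""
    spec = np.fft.rfft(rng.standard_normal(n))
    k = np.arange(len(spec), dtype=float)
    k[0] = 1.0          # avoid division by zero at DC
    spec /= np.sqrt(k)
    spec[0] = 0.0       # drop the DC component
    return np.fft.irfft(spec, n)


def ideal_bandpass(x, fs, lo, hi, gain):
    """Zero every FFT bin outside [lo, hi] Hz, then scale by gain."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return gain * np.fft.irfft(spec, len(x))


def pink_noise_signal(psd, n, fs, band=(300.0, 3400.0), seed=0):
    """Return (selected gain, band-limited pink noise of length n)."""
    psd = np.asarray(psd, dtype=float)
    first_average = psd.mean()
    eleventh = psd[psd <= first_average]        # never empty: min(psd) <= mean
    second_average = eleventh.mean()
    gain = next(g for bound, g in GAIN_TABLE if second_average <= bound)
    noise = pink_noise(n, np.random.default_rng(seed))
    return gain, ideal_bandpass(noise, fs, band[0], band[1], gain)
```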
  • in this way, a pink noise signal that complements the masking signal can be determined, so that the subsequently generated interference signal masks the downlink voice signal more effectively, thereby better protecting the privacy of the user's call.
  • generating the first speech signal according to the first power spectral density includes: determining a first average power according to the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determining twelfth frequency points, where a twelfth frequency point is a frequency point, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; selecting a fourth preset number of twelfth frequency points from all the twelfth frequency points in descending order of their power spectral density values and determining them as thirteenth frequency points; generating a notch filter according to the thirteenth frequency points, where the notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; and performing notch filtering, through the notch filter, on the signal output by the pink noise signal source to generate the first speech signal.
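The notch-filter alternative can be sketched with a cascade of second-order notch sections. This uses the standard biquad notch from Robert Bristow-Johnson's Audio EQ Cookbook, a swapped-in design because the source does not specify the filter structure; the quality factor `q=5.0` and all function names are assumptions.

```python
import numpy as np


def thirteenth_points(freqs, psd, fourth_preset_number):
    """Strongest twelfth points (PSD above the mean), as notch frequencies."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    twelfth = np.flatnonzero(psd > psd.mean())
    top = twelfth[np.argsort(psd[twelfth])[::-1][:fourth_preset_number]]
    return freqs[top]


def notch_coeffs(f0, fs, q):
    """Audio EQ Cookbook notch biquad, normalized so that a[0] == 1."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    c = np.cos(w0)
    b = np.array([1.0, -2.0 * c, 1.0])
    a = np.array([1.0 + alpha, -2.0 * c, 1.0 - alpha])
    return b / a[0], a / a[0]


def biquad(x, b, a):
    """Direct form I second-order IIR filter."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def notch_filter(x, fs, notch_freqs, q=5.0):
    """Cascade one notch section per frequency in notch_freqs."""
    y = np.asarray(x, dtype=float)
    for f0 in notch_freqs:
        b, a = notch_coeffs(f0, fs, q)
        y = biquad(y, b, a)
    return y
```

In use, the input `x` would be the pink noise source output and `notch_freqs` the result of `thirteenth_points`, so the noise is suppressed at the speech's strongest frequency points.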
  • the generated first speech signal can mask out the frequency points with larger power spectral density values in the downlink speech signal.
  • in this way, other people cannot clearly hear the user's call content, the user's privacy is well protected, and the user experience is better.
  • generating the first power spectral density includes: band-pass filtering the downlink speech signal through a second band-pass filter to obtain a first signal within a first bandwidth range, where the first bandwidth is the bandwidth of the second band-pass filter; calculating the power spectral density of the first signal; and determining the power spectral density of the first signal as the first power spectral density.
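A minimal sketch of computing the first power spectral density, assuming the second band-pass filter is an ideal FFT mask over a hypothetical 300 to 3400 Hz band and the PSD is a single Hann-windowed periodogram. A production estimator would more likely average several frames (for example with Welch's method); the function name is illustrative.

```python
import numpy as np


def first_psd(downlink, fs, band=(300.0, 3400.0)):
    """Return (frequencies, first power spectral density, first signal)."""
    x = np.asarray(downlink, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # First signal: downlink speech restricted to the filter's bandwidth.
    first_signal = np.fft.irfft(np.where(in_band, spectrum, 0.0), n)
    window = np.hanning(n)
    # Windowed periodogram, normalized by sample rate and window energy.
    psd = np.abs(np.fft.rfft(first_signal * window)) ** 2 / (fs * np.sum(window ** 2))
    return freqs, psd, first_signal
```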
  • the inaudible signals in the downlink voice signal can be filtered out through the second bandpass filter, thereby improving the output efficiency of the downlink voice signal, making the call process smoother and the user experience better.
  • the electronic device further includes a third sound-generating component, the third sound-generating component is disposed at a third position close to the first position, and the method further includes: generating a third voice signal,
  • the third voice signal is a voice signal obtained by delay processing of the downlink voice signal so that it has the same time delay as the first voice signal; and, at the same output time, the third voice signal is output through the third sound-generating component.
  • the sound emitted by the first sound emitting component can be supplemented by the sound emitted by the third sound emitting component, so that the sound heard by the user is clearer and the user's experience is improved.
  • embodiments of the present application provide an electronic device.
  • the electronic device includes a first sound-generating component and a second sound-generating component.
  • the first sound-generating component is disposed at a first position of the electronic device; when the user holds the electronic device during a call, the first position is close to the user's ear, and the second sound-emitting component is disposed at a second position different from the first position.
  • the electronic device also includes a memory and a processor, and the memory is coupled to the processor; the memory is used to store computer program code, the computer program code includes computer instructions, and when the processor executes the computer instructions, the electronic device is caused to perform the method according to any implementation of the first aspect.
  • the electronic device can mask the downlink voice signal with the interference signal generated based on the downlink voice signal.
  • the sound entering the user's ear is strong enough for the user to hear the call content clearly, while the sound reaching other people nearby is weaker and its information is incomplete, so they cannot clearly hear the content of the user's call; the user's privacy is effectively protected and the user experience is better.
  • the present application provides a computer storage medium in which a computer program or instructions are stored.
  • when the computer program or instructions are executed, the method according to any implementation of the first aspect is performed.
  • the downlink voice signal can be masked by the interference signal generated based on the downlink voice signal.
  • the sound entering the user's ear is strong enough for the user to hear the call content clearly, while the sound reaching other people nearby is weaker and its information is incomplete, so they cannot clearly hear the content of the user's call; the user's privacy is effectively protected and the user experience is better.
  • Figure 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 2 is a software structure block diagram of an electronic device provided by an embodiment of the present application.
  • Figure 3 is a schematic flowchart of a voice signal output method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of a method for generating a first voice signal provided by an embodiment of the present application.
  • Figure 6 is a schematic flowchart of a method for generating a first speech signal according to a first power spectral density provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of a method for generating a masking signal according to the first power spectral density provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 16 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 17 is a schematic flowchart of a method for generating a pink noise signal according to the first power spectral density provided by an embodiment of the present application.
  • Figure 18 is a schematic flowchart of another method for generating a first speech signal according to the first power spectral density provided by an embodiment of the present application.
  • Figure 19 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • multiple sound-generating components can be provided in an electronic device, for example, a sound-generating component driven by a ceramic screen, a top-mounted speaker, a bottom-mounted speaker, etc.
  • the electronic device usually produces sound through the sound-emitting component of the ceramic driver screen and/or the speaker set on the top.
  • when the user is on a handheld call, the sound-emitting area of the ceramic-driven screen can face the user's ear, and the top speaker can also be close to the user's ear.
  • most of the sound emitted by the electronic device therefore enters the user's ear, and only a very small part leaks into the surrounding environment. Even if other people are nearby, they cannot clearly hear the content of the user's call, so the user's privacy is well protected.
  • however, when the user turns up the volume, the ceramic driver further increases the intensity of the sound emitted by the sound-emitting component of the ceramic-driven screen, and the sound emitted by the top speaker also becomes louder, so the intensity of the sound leaked into the surrounding environment increases accordingly. Other people nearby can then hear the user's call content clearly, which leaks the user's privacy and results in a poor user experience.
  • embodiments of the present application provide a voice signal output method, device and electronic equipment.
  • This method can be applied to electronic equipment.
  • during the user's call, the electronic equipment can generate an interference signal based on the downlink voice signal, adjust the downlink voice signal and the interference signal to the same delay, and then output them simultaneously.
  • the downlink speech signal is output through the first sound-emitting component, which is close to the human ear, and the interference signal is output through the second sound-emitting component, which is far from the human ear, so that the downlink speech signal is masked by the interference signal. As a result, other people in the surrounding environment cannot clearly hear the content of the user's call, which protects the privacy of the user's call and gives the user a better experience.
  • the electronic device of the present application may be stationary or mobile.
  • Electronic equipment may include communication terminals, vehicle-mounted equipment, mobile equipment, user terminals, mobile terminals, wireless communication equipment, portable terminals, user agents, user devices, service equipment, user equipment (UE), and other devices located at the periphery of a computer network.
  • Peripheral devices are mainly used for data input and output or display of processing results.
  • the terminal device may be a mobile phone, a cordless phone, a smart watch, a wearable device, a tablet device, a handheld device with wireless communication capabilities, a computing device, a vehicle-mounted communication module or other processing device connected to a wireless modem, or the like.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone interface 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and Subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the controller can generate operation control signals based on instruction operation codes and timing signals to control instruction fetching and execution.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache. It may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs those instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 can separately couple the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 can be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 can be coupled with the audio module 170 through the I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts data to be transmitted between serial and parallel forms.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, display screen 194, wireless communication module 160, audio module 170, sensor module 180, etc.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other electronic devices, such as AR devices, etc.
  • the interface connection relationships between the modules illustrated in the embodiment of the present invention are only schematic illustrations and do not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle count, battery health status (leakage, impedance), and other parameters.
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
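The modulator/demodulator roles described above can be illustrated with a toy double-sideband, suppressed-carrier scheme with coherent demodulation. The patent does not name a modulation scheme, so this choice is an assumption. The carrier (1 kHz at an 8 kHz sample rate) is far below a real radio "medium-high frequency" and is picked only so that the 2·fc mixing product, which has a period of 4 samples, is cancelled exactly by an 8-sample moving average. `modulate` and `demodulate` are hypothetical names.

```python
import math
from typing import List


def modulate(baseband: List[float], fc: float, fs: float) -> List[float]:
    """Shift a low-frequency baseband signal up to carrier frequency fc
    (double sideband, suppressed carrier) by multiplying with cos(2*pi*fc*t)."""
    return [m * math.cos(2 * math.pi * fc * n / fs) for n, m in enumerate(baseband)]


def demodulate(received: List[float], fc: float, fs: float, window: int) -> List[float]:
    """Coherent demodulation: multiplying by 2*cos(2*pi*fc*t) gives the baseband
    plus a component at 2*fc; a moving average of `window` samples acts as a
    crude low-pass filter that removes the 2*fc component."""
    mixed = [2 * x * math.cos(2 * math.pi * fc * n / fs) for n, x in enumerate(received)]
    out: List[float] = []
    running = 0.0
    for n, value in enumerate(mixed):
        running += value
        if n >= window:
            running -= mixed[n - window]      # slide the averaging window
        out.append(running / min(n + 1, window))
    return out
```

A real modem would use proper filters, I/Q (quadrature) processing, and carrier recovery. The moving average is used here only to keep the sketch dependency-free.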
  • the wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), global navigation satellite system (GLONASS), BeiDou navigation satellite system (BDS), quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the display screen 194 of the electronic device 100 may also be called a screen 194.
  • a sound-generating area may also be provided on the screen 194 of the electronic device 100.
  • the electronic device 100 may use ceramics or other driving devices to drive the sound-generating area of the screen so that it produces sound.
  • the sound-generating area on the screen 194 may face the user's ear.
  • the electronic device 100 can implement the shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened, the light is transmitted to the camera sensor through the lens, the optical signal is converted into an electrical signal, and the camera sensor passes the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
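The text says the DSP converts digital image signals into standard formats such as RGB and YUV, but it does not say which YUV variant. As an assumption, the following sketch uses the full-range BT.601 YCbCr coefficients used by JFIF/JPEG, one common "YUV" variant. `rgb_to_ycbcr` is a hypothetical helper, not an ISP or DSP API.

```python
from typing import Tuple


def _clamp_u8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (JFIF variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b              # luma
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference chroma
    return _clamp_u8(y), _clamp_u8(cb), _clamp_u8(cr)
```

White maps to `(255, 128, 128)` and black to `(0, 128, 128)`: neutral colors carry no chroma, which is why YUV-style formats can subsample Cb/Cr with little visible loss.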
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
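The frequency-point energy mentioned above can be made concrete. For an N-point frame sampled at fs, frequency point k sits at k·fs/N Hz, and its energy is |X[k]|², where X is the discrete Fourier transform. The following sketch uses a direct O(N²) DFT from the standard library so it stays self-contained; a real DSP would use an FFT. Function names are hypothetical.

```python
import cmath
import math
from typing import List


def bin_energies(frame: List[float]) -> List[float]:
    """Return |X[k]|^2 for frequency points k = 0..N/2 of a real-valued frame,
    computed with a direct DFT (an FFT gives the same values faster)."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        energies.append(abs(xk) ** 2)
    return energies


def bin_frequency(k: int, n: int, fs: float) -> float:
    """Centre frequency in Hz of frequency point k for an n-point frame at rate fs."""
    return k * fs / n
```

A 1000 Hz cosine in a 64-point frame at 8000 Hz falls exactly on frequency point 8, whose energy is (N/2)² = 1024. All other points are numerically zero.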
  • Video codecs are used to compress or decompress digital video.
  • Electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
  • Intelligent cognitive applications of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the program storage area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
  • the data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, the application processor, and the like.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the user can listen to music or a hands-free call through the speaker 170A of the electronic device 100.
  • Multiple speakers 170A may be provided in the electronic device 100. For example, one speaker 170A may be provided at the top of the electronic device 100, and another speaker 170A may be provided at the bottom.
  • Receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the user can hear the voice by bringing the receiver 170B close to the ear.
  • the speaker 170A and the receiver 170B can be configured as one component, which is not limited in this application.
  • Microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into it.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions, etc.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D may be a USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • pressure sensor 180A may be disposed on display screen 194.
  • there are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may include at least two parallel plates of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure based on the change in capacitance.
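Why pressing changes the capacitance can be seen from the idealized parallel-plate model, an assumption since the patent gives no formula. With permittivity ε, plate overlap area A, and gap d, pressing the plates closer by Δd raises the capacitance:

```latex
C = \frac{\varepsilon A}{d}, \qquad
\Delta C = \varepsilon A\left(\frac{1}{d-\Delta d} - \frac{1}{d}\right)
         = \frac{\varepsilon A\,\Delta d}{d\,(d-\Delta d)} > 0
\quad \text{for } 0 < \Delta d < d
```

A larger force gives a larger Δd and therefore a larger ΔC, which is what allows the device to map capacitance change to pressure intensity.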
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch location but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction to create a new short message is executed.
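The short-message example above amounts to a threshold comparison on touch intensity. A minimal sketch follows. The enum, function name, and the normalized intensity scale are hypothetical; the patent specifies only the comparison against a first pressure threshold.

```python
from enum import Enum


class SmsIconAction(Enum):
    VIEW_SHORT_MESSAGE = "view short message"
    CREATE_SHORT_MESSAGE = "create new short message"


def action_for_sms_icon_touch(intensity: float, first_pressure_threshold: float) -> SmsIconAction:
    """Same touch location, different intensity, different instruction:
    a light press views messages, a firm press creates a new one."""
    if intensity < first_pressure_threshold:
        return SmsIconAction.VIEW_SHORT_MESSAGE
    return SmsIconAction.CREATE_SHORT_MESSAGE
```

Note that a touch exactly at the threshold counts as "greater than or equal to", matching the wording of the embodiment.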
  • the gyro sensor 180B may be used to determine the motion posture of the electronic device 100.
  • the angular velocity of electronic device 100 about three axes may be determined by gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to offset the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory gaming scenarios.
  • Air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
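The patent does not state how altitude is derived from air pressure. One widely used approximation is the international barometric formula for the standard-atmosphere troposphere, h = 44330 · (1 − (p/p₀)^(1/5.255)), and the sketch below assumes it. The default reference p₀ = 1013.25 hPa is the standard sea-level pressure; a real device would ideally use the local sea-level pressure.

```python
def altitude_from_pressure(p_hpa: float, p0_hpa: float = 1013.25) -> float:
    """Approximate altitude in metres from pressure in hPa using the
    international barometric formula (standard-atmosphere assumption)."""
    if p_hpa <= 0 or p0_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))
```

At 899 hPa the formula gives roughly 1000 m, and lower pressure always maps to higher altitude, which is the property positioning and navigation rely on.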
  • Magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may utilize the magnetic sensor 180D to detect the opening and closing of a flip leather case.
  • the electronic device 100 may detect the opening and closing of a flip cover according to the magnetic sensor 180D, and then, based on the detected open or closed state of the leather case or flip cover, set features such as automatic unlocking when the flip cover is opened.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices and be used in horizontal and vertical screen switching, pedometer and other applications.
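Landscape/portrait switching can be sketched from the gravity components measured while the device is stationary. This sketch assumes x runs along the screen's short edge and y along its long edge (check the platform's sensor coordinate convention), and it assumes a hypothetical "flat" cut-off of half of standard gravity. The function name and thresholds are illustrative only.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def classify_orientation(ax: float, ay: float, flat_ratio: float = 0.5) -> str:
    """Classify posture from gravity along the screen's short (x) and long (y)
    axes, in m/s^2, measured while the device is stationary."""
    in_plane = math.hypot(ax, ay)  # part of gravity lying in the screen plane
    if in_plane < flat_ratio * STANDARD_GRAVITY:
        return "flat"  # gravity mostly along z: e.g. lying on a table
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

Returning "flat" lets the caller keep the current layout instead of flipping randomly when the phone lies on a table. Production code would also add hysteresis around the 45-degree boundary.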
  • Distance sensor 180F is used to measure distance.
  • Electronic device 100 can measure distance via infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may utilize the distance sensor 180F to measure distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outwardly through the light emitting diode.
  • the electronic device 100 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect when the user holds the electronic device 100 close to the ear for talking, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used to automatically unlock and lock the screen in leather-case mode and pocket mode.
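The "sufficient / insufficient reflected light" decision and the in-call screen-off behavior can be sketched as follows. Using two thresholds (hysteresis) rather than one is a design assumption, not something the patent states. It keeps readings that hover near a single cut-off from toggling the screen. Class, function, and threshold values are hypothetical.

```python
class ProximityDetector:
    """Decide whether an object is near from reflected infrared intensity,
    using separate near/far thresholds so the state does not flicker."""

    def __init__(self, near_threshold: float, far_threshold: float) -> None:
        if far_threshold >= near_threshold:
            raise ValueError("far_threshold must be below near_threshold")
        self.near_threshold = near_threshold
        self.far_threshold = far_threshold
        self.near = False

    def update(self, reflected: float) -> bool:
        """Feed one reflected-light reading; return the current near state."""
        if not self.near and reflected >= self.near_threshold:
            self.near = True   # sufficient reflected light: object close by
        elif self.near and reflected <= self.far_threshold:
            self.near = False  # reflected light dropped: object gone
        return self.near


def screen_should_be_on(in_call: bool, object_near: bool) -> bool:
    """Turn the screen off only while in a call with the phone held to the ear."""
    return not (in_call and object_near)
```

With thresholds 100 and 60, a reading of 80 keeps whatever state was last decided, so small fluctuations near the ear do not wake the screen.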
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touching.
  • Fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to achieve fingerprint unlocking, access to application locks, fingerprint photography, fingerprint answering of incoming calls, etc.
  • Temperature sensor 180J is used to detect temperature.
  • the electronic device 100 utilizes the temperature detected by the temperature sensor 180J to execute the temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing the electronic device 100 to shut down abnormally. In some other embodiments, when the temperature is lower than another threshold, the electronic device 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
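The temperature processing strategy above is a set of threshold rules. The patent presents them as alternative embodiments and gives no threshold values. The following sketch combines them in one function and uses hypothetical thresholds purely for illustration; the action names are likewise invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ThermalThresholds:
    throttle_above_c: float       # above this: reduce nearby processor performance
    heat_battery_below_c: float   # below this: heat battery 142
    boost_voltage_below_c: float  # below this: boost battery 142 output voltage


def thermal_actions(temp_c: float, th: ThermalThresholds) -> List[str]:
    """Return the (hypothetical) actions triggered at a reported temperature."""
    actions: List[str] = []
    if temp_c > th.throttle_above_c:
        actions.append("reduce_nearby_processor_performance")
    if temp_c < th.heat_battery_below_c:
        actions.append("heat_battery")
    if temp_c < th.boost_voltage_below_c:
        actions.append("boost_battery_output_voltage")
    return actions
```

With thresholds of 45 °C, 0 °C and -10 °C, a reading of -15 °C triggers both battery heating and voltage boosting, while 25 °C triggers nothing.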
  • Touch sensor 180K is also called a "touch device".
  • the touch sensor 180K can be disposed on the display screen 194.
  • the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-control screen".
  • the touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.
  • Bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human body's vocal part.
  • the bone conduction sensor 180M can also contact the human body's pulse and receive blood pressure beating signals.
  • the bone conduction sensor 180M can also be provided in an earphone and combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vocal vibrating bone obtained by the bone conduction sensor 180M to implement the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M to implement the heart rate detection function.
  • the buttons 190 include a power button, a volume button, etc.
  • the buttons 190 may be mechanical buttons or touch buttons.
  • the electronic device 100 may receive button inputs and generate button signal inputs related to user settings and function control of the electronic device 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also produce different vibration feedback effects for touch operations acting on different areas of the display screen 194.
  • different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195.
  • the electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the electronic device 100 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • This embodiment of the present invention takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 2 is a software structure block diagram of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the Android system is divided into four layers, from top to bottom: application layer, application framework layer, Android runtime and system libraries, and kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include camera, gallery, calendar, calling, map, navigation, WLAN, Bluetooth, music, video, short message and other applications.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications.
  • the data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100.
  • for example, it manages call status, including connected, hung up, etc.
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also present notifications in the status bar at the top of the system in the form of charts or scrolling text, such as notifications of applications running in the background, or present notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt tone sounds, the electronic device vibrates, or the indicator light flashes.
  • the Android Runtime includes core libraries and virtual machines. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library contains two parts: the functions that the Java language needs to call, and the core library of Android.
  • the application layer and application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and application framework layer as binary files.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • System libraries can include multiple functional modules. For example: surface manager (surface manager), media libraries (Media Libraries), 3D graphics processing libraries (for example: OpenGL ES), 2D graphics engines (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the following exemplifies the workflow of the software and hardware of the electronic device 100 with reference to a photo-capture scenario.
  • when the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, and other information). Raw input events are stored at the kernel level.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation and the control corresponding to the click operation as a camera application icon control as an example, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer, so that camera 193 captures still images or video.
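The touch-to-camera flow above (raw input event, then control lookup, then application handler) can be modelled as a toy hit-test dispatcher. It is only an illustration of the layering under stated assumptions. `RawInputEvent`, `Control`, `FrameworkLayer`, and `start_camera_app` are hypothetical names and are not Android APIs. Android's real input pipeline is far more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class RawInputEvent:
    """What the kernel layer records for a touch: coordinates and a timestamp."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    name: str
    bounds: Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)
    on_click: Callable[[], str]

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.bounds
        return left <= x < right and top <= y < bottom


@dataclass
class FrameworkLayer:
    controls: List[Control] = field(default_factory=list)

    def dispatch(self, event: RawInputEvent) -> Optional[str]:
        """Identify the control under the touch and run its click handler."""
        for control in self.controls:
            if control.contains(event.x, event.y):
                return control.on_click()
        return None  # touch landed on no registered control


def start_camera_app() -> str:
    # Stand-in for: camera app -> framework interface -> kernel camera driver -> camera 193.
    return "camera driver started"
```

For example, registering `Control("camera_icon", (0, 0, 100, 100), start_camera_app)` and dispatching a tap at (50, 50) returns `"camera driver started"`. A tap outside the icon returns `None`.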
  • Figure 3 is a schematic flowchart of a voice signal output method provided by an embodiment of the present application.
  • the method can be applied to an electronic device, which is provided with a first sound-generating component and a second sound-generating component.
  • the first sound-generating component can be disposed at a first position of the electronic device.
  • the first position is close to the user's ear.
  • the second sound-generating component may be disposed at a second position different from the first position in the electronic device.
  • FIG. 4 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • the first position may be located on the screen 194 of the electronic device 100.
  • the first sound-generating component 401 may be the sound-generating component of a ceramic-driven screen.
  • the second position may be located at the bottom of the electronic device 100, and the second sound-generating component 402 may be a speaker disposed at the bottom of the electronic device 100.
  • the first position and the second position can also be located elsewhere on the electronic device.
  • for example, the first position may be located at the top of the electronic device, above the screen.
  • the second position may be located on a side of the electronic device, etc.
  • the first sound-generating component and the second sound-generating component can also be other types of sound-generating components.
  • for example, the first sound-generating component can be a speaker disposed at the top of the electronic device.
  • the second sound-generating component can be a speaker disposed on a side of the electronic device, etc. This application does not limit this.
  • the method may include the following steps:
  • Step S101 Generate a first voice signal.
  • the first voice signal refers to an interference signal generated based on the downlink voice signal.
  • the downlink voice signal may be the voice signal received from the far end during the user's call.
  • the electronic device held by the user can continuously receive, in real time, the downlink voice signal transmitted to it by another terminal device (hereinafter referred to as the terminal device).
  • the electronic device can process the received downlink voice signal and then output it according to the voice signal output method provided by the embodiment of the present application.
  • generating the first voice signal can be implemented through the following steps:
  • Step S201 Generate a first power spectral density.
  • the first power spectral density refers to the power spectral density calculated based on the downlink speech signal.
  • the first power spectral density may be the power spectral density of the downlink speech signal.
  • for example, the power spectral density of the downlink speech signal may first be calculated by the autocorrelation method, and the calculated power spectral density is then determined as the first power spectral density.
  • alternatively, the downlink voice signal can be band-pass filtered through a preset second band-pass filter to obtain a first signal within a first bandwidth, where the first bandwidth is the bandwidth of the second band-pass filter. The power spectral density of the first signal is then calculated by the autocorrelation method and determined as the first power spectral density. In this way, the second band-pass filter removes the inaudible components of the downlink voice signal, which improves the efficiency of outputting the downlink voice signal, makes the call smoother, and improves the user experience.
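The autocorrelation method referred to above can be illustrated with a minimal sketch. It assumes a single real-valued frame `x` sampled at `fs` Hz and uses the Wiener-Khinchin relation (the PSD is the Fourier transform of the autocorrelation sequence). The function names are hypothetical, a plain O(N²) DFT is used only to keep the example self-contained, and the optional pre-filtering by the second band-pass filter is not shown.

```python
import cmath
import math

def autocorrelation(x):
    """Biased autocorrelation r[k] = (1/N) * sum_n x[n] * x[n + k], k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(n)]

def psd_autocorrelation(x, fs):
    """Estimate the PSD of frame x by transforming its autocorrelation.

    The two-sided sequence r[-(N-1)..N-1] (length 2N-1) is transformed with a DFT.
    Returns (frequencies_hz, psd_values) for the bins from 0 up to fs/2.
    """
    n = len(x)
    r = autocorrelation(x)
    lags = range(-(n - 1), n)
    m = 2 * n - 1
    freqs, psd = [], []
    for k in range(m // 2 + 1):
        s = sum(r[abs(l)] * cmath.exp(-2j * math.pi * k * l / m) for l in lags)
        freqs.append(k * fs / m)
        psd.append(s.real / fs)  # r is even, so the imaginary part is ~0
    return freqs, psd
```

In practice an FFT (for example `numpy.fft`) would replace the explicit DFT loop.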
  • Step S202 Generate a first speech signal according to the first power spectral density.
  • generating the first speech signal according to the first power spectral density can be implemented in multiple ways. For example, as shown in Figure 6, it can be implemented through the following steps:
  • Step S301 Generate a masking signal and a pink noise signal according to the first power spectral density.
  • the masking signal can be used to mask the downlink voice signal. When the two are output at the same time, the sound corresponding to the downlink voice signal reaches the ears of other people in the surrounding environment with reduced intensity and incomplete information. Even if the user increases the volume, other people nearby cannot hear the user's call clearly, which protects the user's privacy.
  • generating the masking signal based on the first power spectral density can be implemented through the following steps:
  • Step S401 Determine the first average power according to the first power spectral density.
  • the first average power refers to the average value of the power spectral density values of all frequency points corresponding to the first power spectral density.
  • Step S402 Determine the first frequency point.
  • the first frequency point refers to a frequency point whose corresponding power spectral density value is greater than the first average power among all frequency points corresponding to the first power spectral density.
  • for example, if the power spectral density values at frequency points f1, f2 and f3 are greater than the first average power, f1, f2 and f3 are all determined as first frequency points.
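Steps S401 and S402 amount to a mean followed by a threshold comparison. The following sketch assumes the first power spectral density is given as parallel lists of frequencies (Hz) and PSD values; the function names and the values in the usage line are illustrative only.

```python
def first_average_power(psd):
    """Step S401: mean of the PSD values over all frequency points."""
    return sum(psd) / len(psd)

def first_frequency_points(freqs, psd):
    """Step S402: (frequency, PSD) pairs whose PSD exceeds the first average power."""
    avg = first_average_power(psd)
    return [(f, p) for f, p in zip(freqs, psd) if p > avg]

print(first_frequency_points([100.0, 200.0, 300.0, 400.0, 500.0],
                             [5.0, 1.0, 6.0, 1.0, 7.0]))
# [(100.0, 5.0), (300.0, 6.0), (500.0, 7.0)]
```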
  • Step S403 Generate a masking signal according to the first frequency point.
  • there may be one or more first frequency points. Accordingly, the masking signal can be generated from the first frequency points in several ways.
  • in one implementation, if the difference between the frequency value of the first frequency point with the largest frequency value and that of the first frequency point with the smallest frequency value is less than a first preset frequency threshold, a first preset number of first frequency points are selected from all first frequency points and determined as second frequency points.
  • the first preset frequency threshold can be set according to the requirements of the actual application scenario, for example, to 1000 Hz.
  • the first preset number can also be set according to the needs of the actual application scenario, for example, it can be set to 3 or 5.
  • a third frequency point is determined.
  • the third frequency point is located between two adjacent second frequency points. That is, one third frequency point can be selected between every two adjacent second frequency points.
  • each third frequency point can be used as the frequency of a single masking tone.
  • a frequency point can be randomly selected between two adjacent second frequency points and determined as the third frequency point.
  • a frequency point located in the middle of two adjacent second frequency points can also be selected as the third frequency point, and this application does not limit this.
  • the amplitude corresponding to the third frequency point can be determined according to the preset human ear masking effect curve, and this amplitude can be used to characterize the strength of the signal.
  • a masking signal can be generated based on the frequency value of the third frequency point and the amplitude corresponding to the third frequency point.
  • the masking signal includes a signal whose frequency is a frequency value corresponding to the third frequency point and whose amplitude is an amplitude value corresponding to the third frequency point.
  • the frequency values (hereinafter referred to as preset frequency values, in hertz, Hz) and amplitudes (hereinafter referred to as preset amplitudes) of the points on the human ear masking effect curve can be stored in the electronic device.
  • the preset frequency value equal to the frequency value of the third frequency point can be looked up among the stored preset frequency values, and the preset amplitude stored for that preset frequency value is determined as the amplitude corresponding to the third frequency point.
  • the amplitude can be sound intensity in decibels (dB), or the amplitude can be power spectral density in decibels/hertz, or the amplitude can be power, etc. This application does not limit this.
  • if no stored preset frequency value equals the frequency value of the third frequency point, the amplitude corresponding to the third frequency point can be determined by interpolation from the stored preset frequency values and preset amplitudes.
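The lookup-then-interpolate rule can be sketched as follows. The stored curve points below are invented for illustration (they are not the curve used by this application), linear interpolation in hertz is one assumed choice among several, and clamping frequencies outside the stored range to the nearest stored point is likewise an assumption.

```python
from bisect import bisect_left

# Hypothetical stored points of a human ear masking effect curve:
# (preset frequency value in Hz, preset amplitude in dB). Values are made up.
MASKING_CURVE = [(100.0, 40.0), (250.0, 35.0), (500.0, 30.0), (1000.0, 28.0), (2000.0, 32.0)]

def masking_amplitude(freq, curve=MASKING_CURVE):
    """Amplitude for `freq`: exact match if stored, else linear interpolation."""
    freqs = [f for f, _ in curve]
    i = bisect_left(freqs, freq)
    if i < len(freqs) and freqs[i] == freq:
        return curve[i][1]  # a stored preset frequency value matches exactly
    if i == 0:
        return curve[0][1]  # below the stored range: clamp (assumption)
    if i == len(freqs):
        return curve[-1][1]  # above the stored range: clamp (assumption)
    (f0, a0), (f1, a1) = curve[i - 1], curve[i]
    return a0 + (a1 - a0) * (freq - f0) / (f1 - f0)

print(masking_amplitude(500.0), masking_amplitude(750.0))  # 30.0 29.0
```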
  • for example, suppose there are three first frequency points, f1, f2 and f3.
  • the first frequency point with the largest frequency value, f3, is at 400 Hz, and the first frequency point with the smallest frequency value, f1, is at 100 Hz.
  • the difference between the two is 300 Hz, which is less than the first preset frequency threshold of 1000 Hz.
  • therefore, the first frequency points f1, f2 and f3 can all be determined as second frequency points, a third frequency point fa1 is selected between f1 and f2, and a third frequency point fa2 is selected between f2 and f3.
  • the preset frequency values equal to the frequency values of the third frequency points fa1 and fa2 are then looked up among the pre-stored preset frequency values, and the corresponding preset amplitudes are taken as the amplitudes of fa1 and fa2, from which the masking signal is determined.
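A minimal sketch of this first implementation, assuming first frequency points given as `(frequency_hz, psd)` pairs. The text does not state how the first preset number of points is chosen, so keeping the points with the largest PSD values is an assumption, as is placing each third frequency point at the midpoint (one of the two options described above). The 250 Hz value for f2 in the usage line is hypothetical.

```python
def third_frequency_points(first_points, first_preset_number=3, first_threshold_hz=1000.0):
    """Return the third frequency points, or None when the spread of the
    first frequency points reaches the first preset frequency threshold."""
    freqs = [f for f, _ in first_points]
    if max(freqs) - min(freqs) >= first_threshold_hz:
        return None  # another implementation applies
    # Assumption: keep the first_preset_number points with the largest PSD values.
    chosen = sorted(first_points, key=lambda fp: fp[1], reverse=True)[:first_preset_number]
    second = sorted(f for f, _ in chosen)  # second frequency points, ascending
    # One third frequency point (here: the midpoint) between each adjacent pair.
    return [(a + b) / 2 for a, b in zip(second, second[1:])]

# f1 = 100 Hz and f3 = 400 Hz as in the example; f2 = 250 Hz is hypothetical.
print(third_frequency_points([(100.0, 5.0), (250.0, 6.0), (400.0, 7.0)]))  # [175.0, 325.0]
```

The amplitude of each returned frequency would then be read from the masking effect curve as described above.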
  • in another implementation, if the difference between the frequency values of the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, the first frequency point with the largest power spectral density value, the first frequency point with the largest frequency value, and the first frequency point with the smallest frequency value are selected from all first frequency points and determined as fourth frequency points.
  • one frequency point is selected between each fourth frequency point and its adjacent first frequency point, and is determined as the fifth frequency point.
  • the second preset frequency threshold can be set according to the requirements of the actual application scenario. For example, the second preset frequency threshold can be set to 5 Hz, 20 Hz, or 50 Hz.
  • the amplitude corresponding to the fifth frequency point is determined based on the preset human ear masking effect curve.
  • for the specific content of the amplitude, refer to the foregoing embodiments; details are not repeated here.
  • a masking signal is generated based on the frequency value of the fifth frequency point and the amplitude corresponding to the fifth frequency point.
  • the masking signal includes a signal whose frequency is the frequency value corresponding to the fifth frequency point and whose amplitude is the amplitude value corresponding to the fifth frequency point.
  • for example, suppose there are three first frequency points, f4, f5 and f6.
  • the first frequency point f5 corresponds to the largest power spectral density value.
  • the first frequency point with the largest frequency value, f6, is at 1500 Hz.
  • the first frequency point with the smallest frequency value, f4, is at 100 Hz.
  • the difference between the two is 1400 Hz, which is greater than the first preset frequency threshold of 1000 Hz.
  • therefore, the first frequency points f4, f5 and f6 can all be determined as fourth frequency points.
  • one frequency point (fa3, fa4 and fa5 respectively) can be selected near each of the fourth frequency points f4, f5 and f6, and fa3, fa4 and fa5 are determined as fifth frequency points.
  • the preset frequency values equal to the frequency values of the fifth frequency points fa3, fa4 and fa5 are then looked up among the pre-stored preset frequency values, and the corresponding preset amplitudes are taken as the amplitudes of fa3, fa4 and fa5, from which the masking signal is determined.
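The selection of fourth frequency points can be sketched as below, using the same `(frequency_hz, psd)` representation. The PSD values and the 600 Hz frequency for f5 in the usage line are hypothetical. The placement of each fifth frequency point is left out because the text does not fix its exact position relative to the fourth frequency point.

```python
def fourth_frequency_points(first_points, first_threshold_hz=1000.0):
    """Return the fourth frequency points (ascending), or None when the spread
    of the first frequency points is below the first preset frequency threshold."""
    freqs = [f for f, _ in first_points]
    if max(freqs) - min(freqs) < first_threshold_hz:
        return None  # the first implementation applies instead
    strongest = max(first_points, key=lambda fp: fp[1])[0]  # largest PSD value
    # A set removes duplicates when one point satisfies several criteria.
    return sorted({strongest, max(freqs), min(freqs)})

# f4 = 100 Hz and f6 = 1500 Hz as in the example; f5 = 600 Hz is hypothetical.
print(fourth_frequency_points([(100.0, 3.0), (600.0, 9.0), (1500.0, 4.0)]))
# [100.0, 600.0, 1500.0]
```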
  • in yet another implementation, if the difference between the frequency values of the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, then in each frequency point interval a second preset number of first frequency points are selected in descending order of their power spectral density values and determined as sixth frequency points. The difference between the frequency values of the end frequency point and the start frequency point of each frequency point interval is less than or equal to a third preset frequency threshold, and each frequency point interval contains at least a third preset number of first frequency points.
  • the second preset number can be set according to the needs of the actual application scenario.
  • the second preset number can be set to 3 or 5, etc.
  • the third preset frequency threshold can also be set according to the needs of actual application scenarios.
  • the third preset frequency threshold can be set to 50 Hz or 100 Hz.
  • the third preset number can also be set according to the needs of the actual application scenario.
  • the third preset number can be set to 8 or 10, etc. This application does not limit this.
  • a seventh frequency point is determined for each frequency point interval, located between two adjacent sixth frequency points in that interval. That is, in each frequency point interval, one frequency point is selected between every two adjacent sixth frequency points and determined as a seventh frequency point of that interval.
  • one frequency point can be selected arbitrarily between two adjacent sixth frequency points as the seventh frequency point.
  • the frequency point in the middle of two adjacent sixth frequency points can also be used as the seventh frequency point. This application does not limit this.
  • the amplitude corresponding to each seventh frequency point can be determined according to the preset human ear masking effect curve.
  • a masking signal can be generated based on the frequency value of each seventh frequency point and the corresponding amplitude of each seventh frequency point.
  • the masking signal includes a signal whose frequency value is the frequency value of the seventh frequency point and whose amplitude is the amplitude corresponding to the seventh frequency point.
  • for example, suppose there are 10 first frequency points, among which f7, f8, f9, f10 and f11 are concentrated in one frequency point interval (frequency point interval 1).
  • frequency points f12, f13, f14, f15 and f16 are concentrated in another frequency point interval (frequency point interval 2).
  • in frequency point interval 1, f8, f9 and f10 can be selected in descending order of their power spectral density values and determined as sixth frequency points.
  • in frequency point interval 2, f13, f15 and f16 are selected in the same way and determined as sixth frequency points.
  • in frequency point interval 1, a frequency point fa6 is selected between the sixth frequency points f8 and f9, and a frequency point fa7 between f9 and f10; both are determined as seventh frequency points.
  • in frequency point interval 2, a frequency point fa8 is selected between the sixth frequency points f13 and f15, and a frequency point fa9 between f15 and f16; both are determined as seventh frequency points.
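The text does not specify how frequency point intervals are formed. The sketch below assumes a greedy scan over the first frequency points sorted by frequency, starting a new interval whenever a point lies more than the third preset frequency threshold above the interval's start, and keeping only intervals with at least the third preset number of points. The third preset number is set to 5 here to match the five-point intervals of the example, seventh frequency points are placed at midpoints, and all frequencies and PSD values in the usage lines are hypothetical.

```python
def seventh_frequency_points(first_points, second_preset_number=3,
                             third_threshold_hz=100.0, third_preset_number=5):
    """Return one list of seventh frequency points per qualifying interval."""
    # Assumed interval formation: greedy grouping by distance from the interval start.
    intervals, current = [], []
    for f, p in sorted(first_points):
        if current and f - current[0][0] > third_threshold_hz:
            intervals.append(current)
            current = []
        current.append((f, p))
    if current:
        intervals.append(current)
    result = []
    for interval in intervals:
        if len(interval) < third_preset_number:
            continue  # too few first frequency points in this interval
        # Sixth frequency points: the strongest points of the interval, by PSD.
        top = sorted(interval, key=lambda fp: fp[1], reverse=True)[:second_preset_number]
        sixth = sorted(f for f, _ in top)
        result.append([(a + b) / 2 for a, b in zip(sixth, sixth[1:])])
    return result

interval_1 = [(200.0, 1.0), (210.0, 5.0), (220.0, 6.0), (230.0, 7.0), (240.0, 2.0)]  # f7..f11
interval_2 = [(800.0, 1.0), (820.0, 6.0), (840.0, 2.0), (860.0, 7.0), (880.0, 8.0)]  # f12..f16
print(seventh_frequency_points(interval_1 + interval_2))  # [[215.0, 225.0], [840.0, 870.0]]
```

With these values the sixth frequency points are f8, f9, f10 and f13, f15, f16, matching the selection in the example.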
  • in another implementation, the frequency value of an eighth frequency point may differ from that of the first frequency point by a fourth preset frequency threshold.
  • the fourth preset frequency threshold can be set according to the needs of actual application scenarios.
  • the fourth preset frequency threshold can be set to 5 Hz, 10 Hz or 20 Hz, etc.
  • the amplitude corresponding to the eighth frequency point can be determined according to the preset human ear masking effect curve.
  • for the content of the amplitude, refer to the foregoing embodiments; details are not repeated here.
  • a masking signal can be generated based on the frequency value of the eighth frequency point and the amplitude corresponding to the eighth frequency point.
  • the masking signal includes a signal whose frequency is the frequency value of the eighth frequency point and whose amplitude is the amplitude corresponding to the eighth frequency point.
  • for example, the preset frequency values equal to the frequency values of the eighth frequency points f18 and f19 can be looked up among the pre-stored preset frequency values, and the corresponding preset amplitudes are taken as the amplitudes of f18 and f19, from which the masking signal is determined.
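Two pieces of this step can be sketched. Placing the eighth frequency points one fourth preset frequency threshold below and one above the first frequency point is an assumption (consistent with the two eighth frequency points f18 and f19 in the example), and the masking signal is synthesized as a sum of sinusoids with linear amplitudes; converting the curve amplitudes (for example from dB) to linear values is not shown. The function names and the amplitude 0.1 are hypothetical.

```python
import math

def eighth_frequency_points(first_freq_hz, fourth_threshold_hz=10.0):
    """Assumed placement: one eighth point on each side of the first frequency point."""
    return [first_freq_hz - fourth_threshold_hz, first_freq_hz + fourth_threshold_hz]

def masking_signal(tones, fs, num_samples):
    """Sum of sinusoids; `tones` is a list of (frequency_hz, linear_amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in tones)
            for n in range(num_samples)]

tones = [(f, 0.1) for f in eighth_frequency_points(1000.0)]  # hypothetical amplitude 0.1
signal = masking_signal(tones, fs=8000, num_samples=160)
```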
  • in another implementation, the frequency value of a ninth frequency point may differ from that of the first frequency point by a fifth preset frequency threshold.
  • the fifth preset frequency threshold can be set according to the needs of actual application scenarios.
  • the fifth preset frequency threshold can be set to 5 Hz, 10 Hz or 20 Hz, etc.
  • one frequency point is selected between each ninth frequency point and the first frequency point and determined as a tenth frequency point.
  • the amplitude corresponding to the tenth frequency point can be determined according to the preset human ear masking effect curve.
  • a masking signal can be generated based on the frequency value of the tenth frequency point and the amplitude corresponding to the tenth frequency point.
  • the masking signal includes a signal whose frequency is the frequency value corresponding to the tenth frequency point and whose amplitude is the amplitude value corresponding to the tenth frequency point.
  • for example, one frequency point, fa10, can be selected between the ninth frequency point f21 and the first frequency point f20, and another, fa11, between the ninth frequency point f22 and the first frequency point f20.
  • fa10 and fa11 are determined as tenth frequency points.
  • the preset frequency values equal to the frequency values of the tenth frequency points fa10 and fa11 can then be looked up among the pre-saved preset frequency values, and the corresponding preset amplitudes are taken as the amplitudes of fa10 and fa11, from which the masking signal is determined.
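Read together with the example (ninth frequency points f21 and f22 around the first frequency point f20), this step can be sketched as below. Placing the two ninth frequency points symmetrically at the fifth preset frequency threshold below and above the first frequency point, and the tenth frequency points at midpoints, are assumptions; the 500 Hz value is hypothetical.

```python
def tenth_frequency_points(first_freq_hz, fifth_threshold_hz=20.0):
    """Return (ninth_points, tenth_points) for one first frequency point."""
    # Assumption: one ninth point on each side of the first frequency point.
    ninth = [first_freq_hz - fifth_threshold_hz, first_freq_hz + fifth_threshold_hz]
    # One tenth point (here: the midpoint) between each ninth point and the first point.
    tenth = [(f9 + first_freq_hz) / 2 for f9 in ninth]
    return ninth, tenth

print(tenth_frequency_points(500.0))  # ([480.0, 520.0], [490.0, 510.0])
```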
  • generating the pink noise signal based on the first power spectral density can be implemented through the following steps:
  • Step S501 Determine the second average power.
  • the second average power refers to the average of the power spectral density values of all eleventh frequency points.
  • the eleventh frequency point refers to a frequency point whose power spectral density value is less than or equal to the first average power among all frequency points corresponding to the first power spectral density.
  • the first average power refers to the average of the power spectral density values of all frequency points corresponding to the first power spectral density.
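Step S501 can be sketched as follows, with the first power spectral density given as a list of PSD values (the values in the usage line are hypothetical).

```python
def second_average_power(psd):
    """Mean PSD over the eleventh frequency points (PSD <= first average power)."""
    first_avg = sum(psd) / len(psd)
    eleventh = [p for p in psd if p <= first_avg]  # never empty: min(psd) <= mean
    return sum(eleventh) / len(eleventh)

print(second_average_power([5.0, 1.0, 6.0, 1.0, 7.0]))  # 1.0
```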
  • Step S502 Obtain the preset pink noise bandpass filter gain corresponding to the second average power.
  • a gain rule table can be preset in the electronic device based on multiple call tests.
  • the gain rule table stores pairs of powers (hereinafter referred to as preset powers) and corresponding pink noise band-pass filter gains (hereinafter referred to as preset pink noise band-pass filter gains).
  • the electronic device can look up, in the preset gain rule table, the preset power equal to the second average power, and determine the preset pink noise band-pass filter gain stored for that preset power as the preset pink noise band-pass filter gain corresponding to the second average power.
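A sketch of the lookup in step S502. The table contents are invented for illustration, and because a measured power rarely equals a stored preset power exactly, the sketch falls back to the nearest entry on a logarithmic scale; that fallback is an assumption, not something the text specifies.

```python
import math

# Hypothetical gain rule table: (preset power, preset pink noise band-pass filter gain in dB).
GAIN_RULE_TABLE = [(1e-6, -30.0), (1e-5, -20.0), (1e-4, -10.0), (1e-3, 0.0)]

def pink_noise_gain(second_avg_power, table=GAIN_RULE_TABLE):
    """Return the gain of the entry whose preset power is closest (in log10 scale)."""
    return min(table,
               key=lambda row: abs(math.log10(row[0]) - math.log10(second_avg_power)))[1]

print(pink_noise_gain(2e-4))  # -10.0
```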
  • Step S503 Adjust the gain of the first bandpass filter to the preset pink noise bandpass filter gain.
  • the first band-pass filter is pre-set in the electronic device and is used for band-pass filtering the signal output by the pink noise signal source.
  • Step S504 Band-pass filter the signal output from the pink noise signal source through the gain-adjusted first band-pass filter to generate a pink noise signal.
  • if the first power spectral density generated in step S201 is the power spectral density of the first signal, then in step S504 the bandwidth of the first band-pass filter also needs to be set to the first bandwidth. The signal output by the pink noise signal source is then band-pass filtered by the first band-pass filter, whose bandwidth is the first bandwidth and whose gain is the preset pink noise band-pass filter gain corresponding to the second average power, to generate the pink noise signal.
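Steps S503 and S504 can be sketched as below. The pink noise signal source is modelled with the Voss-McCartney algorithm and the first band-pass filter with an RBJ Audio EQ Cookbook biquad (0 dB peak gain) followed by a gain in dB; both are stand-ins chosen for illustration, and parameterizing the filter by a center frequency and a bandwidth (for example the first bandwidth) is an assumption. The parameter values in the usage lines are hypothetical.

```python
import math
import random

def pink_noise(num_samples, num_rows=8, seed=0):
    """Approximate pink (1/f) noise in [-1, 1] with the Voss-McCartney algorithm."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(num_rows)]
    out = []
    for n in range(num_samples):
        if n > 0:
            tz = (n & -n).bit_length() - 1  # number of trailing zero bits of n
            if tz < num_rows:
                rows[tz] = rng.uniform(-1.0, 1.0)  # refresh one row per sample
        out.append((sum(rows) + rng.uniform(-1.0, 1.0)) / (num_rows + 1))
    return out

def bandpass(x, fs, center_hz, bandwidth_hz, gain_db=0.0):
    """RBJ-cookbook band-pass biquad (0 dB peak, Direct Form I), then a gain in dB."""
    w0 = 2 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2 * (center_hz / bandwidth_hz))  # Q = center / bandwidth
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    g = 10 ** (gain_db / 20)
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(g * yn)
    return y

noise = pink_noise(8000, seed=1)
pink = bandpass(noise, fs=8000, center_hz=1000.0, bandwidth_hz=1500.0, gain_db=-10.0)
```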
  • Step S302 Adjust the masking signal and the pink noise signal to the same time delay.
  • Step S303 Generate a first speech signal based on the masking signal and the pink noise signal adjusted to the same time delay.
  • for example, the masking signal and the pink noise signal adjusted to the same time delay can be summed, and the summed signal is determined as the first speech signal.
  • alternatively, the masking signal and the pink noise signal adjusted to the same time delay can be weighted and summed, and the weighted sum is determined as the first speech signal.
  • the weights corresponding to the masking signal and the pink noise signal can be determined in advance through call tests.
  • alternatively, the gains of the delay-adjusted masking signal and pink noise signal can be adjusted separately.
  • the gains corresponding to the masking signal and the pink noise signal can be the same or different. They can be determined from call tests and preset in the electronic device, and retrieved from the electronic device when the gains of the masking signal and the pink noise signal are adjusted.
  • the gain-adjusted masking signal and pink noise signal are then summed, or weighted and summed, and the resulting signal is determined as the first speech signal.
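Steps S302 and S303, including the optional gain adjustment, can be sketched as follows. Delays are expressed as the processing latency (in samples) already present in each signal, and the signal with less latency is padded with leading zeros; this representation, the function names, and the default gains and weights (which the text says would come from call tests) are assumptions.

```python
def delay(signal, samples):
    """Delay a signal by prepending `samples` zeros."""
    return [0.0] * samples + list(signal)

def combine(masking, pink, masking_delay, pink_delay,
            g_mask=1.0, g_pink=1.0, w_mask=1.0, w_pink=1.0):
    """Align both signals to the larger delay (step S302), apply per-signal gains,
    then form the weighted sum (step S303). With all gains and weights equal to
    1.0 this is a plain sum."""
    target = max(masking_delay, pink_delay)
    m = delay(masking, target - masking_delay)
    p = delay(pink, target - pink_delay)
    n = min(len(m), len(p))  # truncate to the overlapping length
    return [w_mask * g_mask * m[i] + w_pink * g_pink * p[i] for i in range(n)]

print(combine([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], masking_delay=1, pink_delay=0))
# [1.0, 12.0, 23.0]
```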
  • in this way, a suitable masking signal and pink noise signal can be determined in advance from call tests, so that the generated first voice signal subsequently masks the downlink voice signal better: the user can still hear the call content clearly while other people nearby cannot, and the user experience is better.
  • the generated first voice signal can mask the components of the downlink voice signal at frequency points with large power spectral density values. When the first voice signal and the second voice signal are subsequently output at the same time, the sound reaching the ears of other people around the user is weakened, so they cannot hear the user's call content clearly, which protects the user's privacy and improves the user experience.
  • generating the first speech signal according to the first power spectral density can also be implemented according to the following steps:
  • Step S601 Determine the first average power according to the first power spectral density.
  • the first average power refers to the average value of the power spectral density values of all frequency points corresponding to the first power spectral density.
  • Step S602 Determine the twelfth frequency point.
  • the twelfth frequency point refers to a frequency point whose corresponding power spectral density value is greater than the first average power among all frequency points corresponding to the first power spectral density.
  • Step S603 Select a fourth preset number of twelfth frequency points from all the twelfth frequency points in order of corresponding power spectral density values from large to small, and determine them as the thirteenth frequency points.
  • the fourth preset number can be set according to the needs of actual application scenarios.
  • the fourth preset number can be set to 3 or 5.
  • Step S604 Generate a notch filter according to the thirteenth frequency point.
  • the notch frequency of the notch filter includes the frequency value of the thirteenth frequency point. That is to say, after filtering by the notch filter, the signal with the frequency value of the thirteenth frequency point can be filtered out.
  • Step S605 Notch filter the signal output by the pink noise signal source through the notch filter to generate a first speech signal.
  • that is, the notch filter, whose notch frequencies include the frequency values of the thirteenth frequency points, performs notch filtering on the signal output by the pink noise signal source, and the signal obtained after notch filtering is determined as the first speech signal.
  • if the first power spectral density generated in step S201 is the power spectral density of the first signal, then in step S605 the bandwidth of the notch filter also needs to be set to the first bandwidth. The notch filter, whose bandwidth is the first bandwidth and whose notch frequencies include the frequency values of the thirteenth frequency points, then performs notch filtering on the signal output by the pink noise signal source, and the resulting signal is determined as the first speech signal.
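Steps S601 to S605 can be sketched as below. Realizing the notch filter as a cascade of RBJ Audio EQ Cookbook notch biquads, one per thirteenth frequency point, and the quality factor `q` are assumptions; in use, `x` would be the output of the pink noise signal source, and limiting the band to the first bandwidth is not shown.

```python
import math

def thirteenth_frequency_points(freqs, psd, fourth_preset_number=3):
    """Steps S601 to S603: strongest frequency points above the first average power."""
    first_avg = sum(psd) / len(psd)
    twelfth = [(f, p) for f, p in zip(freqs, psd) if p > first_avg]
    top = sorted(twelfth, key=lambda fp: fp[1], reverse=True)[:fourth_preset_number]
    return [f for f, _ in top]

def notch(x, fs, notch_hz, q=10.0):
    """One RBJ-cookbook notch biquad (Direct Form I) centered on notch_hz."""
    w0 = 2 * math.pi * notch_hz / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1.0, -2 * math.cos(w0), 1.0
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def notch_filter(x, fs, notch_freqs, q=10.0):
    """Steps S604 and S605: cascade one notch per thirteenth frequency point."""
    for f in notch_freqs:
        x = notch(x, fs, f, q)
    return x
```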
  • the generated first voice signal can mask most of the downlink voice signal other than the components at frequency points with large power spectral density values. When the first voice signal and the second voice signal are subsequently output at the same time, the sound reaching the ears of other people around the user lacks signal information at most frequency points, so they cannot hear the user's call content clearly; the user's privacy is thus also well protected, and the user experience is better.
  • after calculating the first power spectral density, the electronic device may also determine whether the difference between the maximum power spectral density value of the first power spectral density and the first average power is greater than or equal to a preset power spectral density threshold. If it is, the first speech signal is generated through the implementation shown in Figure 18. Otherwise, that is, if the difference is less than the preset power spectral density threshold, the first speech signal is generated through any one of the implementations shown in Figures 6 to 17.
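The selection rule described above can be sketched as a small decision function. The returned labels are hypothetical names for the implementation of Figure 18 and the implementations of Figures 6 to 17, and the threshold is assumed to be in the same unit as the PSD values.

```python
def choose_implementation(psd, psd_threshold):
    """Pick how to generate the first speech signal from the first PSD."""
    first_avg = sum(psd) / len(psd)
    if max(psd) - first_avg >= psd_threshold:
        return "figure_18"  # implementation shown in Figure 18
    return "figures_6_to_17"  # any implementation shown in Figures 6 to 17
```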
  • the preset power spectral density threshold can be set according to the needs of the actual scenario.
  • in this way, as the electronic device continuously obtains the downlink voice signal in real time, it can adjust in real time how the first voice signal is generated according to the attributes of the downlink voice signal, and thus generate a more suitable interference signal. This ensures that the user can hear the call content clearly while preventing other people around the user from doing so, and the user experience is better.
  • alternatively, generating the first voice signal according to the first power spectral density can be implemented as follows: the electronic device generates one first voice signal through the implementation shown in Figure 18 (recorded below as a fourth voice signal) and another first voice signal through any of the implementations shown in Figures 6 to 17 (recorded below as a fifth voice signal). The fourth voice signal and the fifth voice signal are then summed, or weighted and summed, and the resulting signal is determined as the interference signal, which may be used as the first voice signal in step S202.
  • in some embodiments, attribute information of the downlink voice signal, such as its signal-to-noise ratio and/or gain, can be adjusted first, and the first speech signal is then generated from the adjusted downlink speech signal. This application does not limit this.
  • Step S102 Generate a second voice signal.
  • the second voice signal refers to the voice signal obtained by delaying the downlink voice signal so that it has the same time delay as the first voice signal.
  • optionally, other attribute information of the downlink voice signal, such as its signal-to-noise ratio and/or gain, can also be adjusted first. The adjusted downlink voice signal is then delayed to obtain a voice signal with the same time delay as the first voice signal, and the obtained voice signal is determined as the second voice signal.
  • the method of adjusting the attribute information of the downlink speech signal when generating the first speech signal may be the same as the method of adjusting the attribute information of the downlink speech signal when generating the second speech signal, or may be different. This application does not limit this.
  • Step S103 At the same output time, output the second voice signal through the first sound-generating component, and output the first voice signal through the second sound-generating component.
  • both the first voice signal and the second voice signal are converted into sound signals and then emitted.
  • the first voice signal can mask the second voice signal, that is, the interference signal of the downlink voice signal can mask the downlink voice signal.
  • because the first sound-generating component is disposed at the first position of the electronic device, which is close to (or even directly facing) the user's ear when the user holds the phone during a call, and the second sound-generating component is disposed at the second position away from the user's ear, the sound entering the user's ear remains strong enough when the user turns up the volume, even though the first voice signal masks the second voice signal, so the user can clearly hear the call content.
  • a third sound-generating component may also be provided in the electronic device.
  • the third sound-generating component may be disposed at a third position close to the first position.
  • the first sound-generating component may be a sound-generating component of a ceramic-driven screen.
  • the second sound-generating component may be a speaker disposed at the bottom of the electronic device, and the third sound-generating component may be a speaker disposed at the top of the electronic device.
  • the voice signal output method provided by the present application may also include: generating a third voice signal.
  • the third voice signal refers to a voice signal with the same time delay as the first voice signal, obtained by delaying the downlink voice signal.
  • the third voice signal is output through the third sound-emitting component.
  • the sound emitted by the third sound-generating component supplements the sound emitted by the first sound-generating component, so that the sound heard by the user is clearer and the user experience is improved.
  • the downlink voice signal can be masked by using an interference signal generated based on the downlink voice signal.
  • when the user holds the electronic device during a call, the sound entering the user's ear is strong enough to ensure that the user can clearly hear the call content.
  • for other people nearby, the interference signal masks the downlink voice signal, so the sound reaching their ears is weaker and its information is incomplete, and they cannot clearly hear the content of the user's call.
  • the user's privacy is therefore effectively protected and the user experience is better.
  • the above embodiments introduce the speech signal output method provided by this application.
  • embodiments of the electronic device provided by this application are introduced below.
  • the electronic device includes hardware structures and/or software modules corresponding to each function.
  • Persons skilled in the art should easily realize that, with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • Embodiments of the present application can divide the electronic device into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • FIG. 19 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 1900 includes a first sound-generating component 1901 and a second sound-generating component 1902.
  • the first sound-generating component 1901 is disposed at a first position of the electronic device 1900, which is close to the user's ear when the user holds the electronic device during a call, and the second sound-generating component 1902 is disposed at a second position different from the first position.
  • the electronic device 1900 also includes a memory 1903 and a processor 1904, and the memory 1903 is coupled to the processor 1904; the memory 1903 is used to store computer program code, and the computer program code includes computer instructions.
  • when the processor 1904 executes the computer instructions, the electronic device 1900 is caused to execute the voice signal output method described in any one of the embodiments of FIG. 3 to FIG. 18.
  • the electronic device 1900 can perform the operations of the above method embodiments.
  • the processor 1904 may be configured to: generate a first voice signal, where the first voice signal is an interference signal generated based on the downlink voice signal; generate a second voice signal, where the second voice signal is a voice signal obtained by performing delay processing on the downlink voice signal and has the same delay as the first voice signal; and, at the same output time, output the second voice signal through the first sound-generating component and output the first voice signal through the second sound-generating component.
  • the processor 1904 is configured to generate the first voice signal, specifically: the processor 1904 is configured to generate a first power spectral density, where the first power spectral density is the power spectral density calculated from the downlink voice signal, and to generate the first voice signal according to the first power spectral density.
  • the processor 1904 is configured to generate the first voice signal according to the first power spectral density, specifically: the processor 1904 is configured to generate a masking signal and a pink noise signal according to the first power spectral density; adjust the masking signal and the pink noise signal to the same delay; and generate the first voice signal based on the masking signal and the pink noise signal adjusted to the same delay.
  • the processor 1904 is configured to generate the masking signal according to the first power spectral density, specifically: the processor 1904 is configured to determine a first average power based on the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determine first frequency points, which are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; and generate the masking signal according to the first frequency points.
  • the processor 1904 is configured to generate the masking signal according to the first frequency points, specifically: the processor 1904 is configured to: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is less than a first preset frequency threshold, select a first preset number of first frequency points in descending order of power spectral density value and determine them as second frequency points; determine third frequency points, each located between two adjacent second frequency points; determine, according to a preset human-ear masking effect curve, the amplitude corresponding to each third frequency point, where the amplitude represents signal strength; and generate the masking signal according to the frequency values of the third frequency points and their corresponding amplitudes.
  • the processor 1904 is configured to generate the masking signal according to the first frequency points, specifically: the processor 1904 is configured to: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, select, from all first frequency points, the first frequency point with the largest power spectral density value, the first frequency point with the largest frequency value, and the first frequency point with the smallest frequency value, and determine them as fourth frequency points; near each fourth frequency point, select one frequency point whose frequency differs from that fourth frequency point by no more than a second preset frequency threshold, and determine it as a fifth frequency point; determine, according to the preset human-ear masking effect curve, the amplitude corresponding to each fifth frequency point, where the amplitude represents signal strength; and generate the masking signal according to the frequency values of the fifth frequency points and their corresponding amplitudes.
  • the processor 1904 is configured to generate the masking signal according to the first frequency points, specifically: the processor 1904 is configured to: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, select, in each frequency point interval and in descending order of power spectral density value, a second preset number of first frequency points and determine them as sixth frequency points, where the frequency difference between the end frequency point and the start frequency point of each interval is less than or equal to a third preset frequency threshold, and each interval contains at least a third preset number of first frequency points; determine a seventh frequency point for each interval, located between two adjacent sixth frequency points in that interval; determine, according to the preset human-ear masking effect curve, the amplitude corresponding to each seventh frequency point, where the amplitude represents signal strength; and generate the masking signal according to the frequency values of the seventh frequency points and their corresponding amplitudes.
  • the processor 1904 is configured to generate the masking signal according to the first frequency points, specifically: the processor 1904 is configured to: if there is only one first frequency point, select one frequency point on each side of the first frequency point and determine them as eighth frequency points; determine, according to the preset human-ear masking effect curve, the amplitude corresponding to each eighth frequency point, where the amplitude represents signal strength; and generate the masking signal according to the frequency values of the eighth frequency points and their corresponding amplitudes.
  • the processor 1904 is configured to generate the masking signal according to the first frequency points, specifically: the processor 1904 is configured to: if there is only one first frequency point, select one frequency point on each side of the first frequency point and determine them as ninth frequency points; select one frequency point between each ninth frequency point and the first frequency point and determine it as a tenth frequency point; determine, according to the preset human-ear masking effect curve, the amplitude corresponding to each tenth frequency point, where the amplitude represents signal strength; and generate the masking signal according to the frequency values of the tenth frequency points and their corresponding amplitudes.
  • the processor 1904 is configured to generate the pink noise signal according to the first power spectral density, specifically: the processor 1904 is configured to determine a second average power, where the second average power is the average of the power spectral density values of all eleventh frequency points.
  • the eleventh frequency points are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is less than or equal to the first average power.
  • the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; the processor 1904 then obtains the preset pink noise band-pass filter gain corresponding to the second average power, adjusts the gain of the first band-pass filter to the preset pink noise band-pass filter gain, and band-pass filters the signal output by the pink noise signal source through the gain-adjusted first band-pass filter to generate the pink noise signal.
  • the processor 1904 is configured to generate the first voice signal according to the first power spectral density, specifically: the processor 1904 is configured to determine the first average power according to the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determine twelfth frequency points, which are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; select a fourth preset number of twelfth frequency points in descending order of power spectral density value and determine them as thirteenth frequency points; generate a notch filter according to the thirteenth frequency points, where the notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; and notch filter the signal output by the pink noise signal source through the notch filter to generate the first voice signal.
  • the processor 1904 is configured to generate the first power spectral density, specifically: the processor 1904 is configured to band-pass filter the downlink voice signal through a second band-pass filter to obtain a first signal within a first bandwidth, where the first bandwidth is the bandwidth of the second band-pass filter; calculate the power spectral density of the first signal; and determine the power spectral density of the first signal as the first power spectral density.
  • the electronic device 1900 further includes a third sound-generating component disposed at a third position close to the first position, and the processor 1904 is further configured to generate a third voice signal.
  • the third voice signal is a voice signal obtained by performing delay processing on the downlink voice signal and has the same delay as the first voice signal; at the same output time, the third voice signal is output through the third sound-generating component.
  • each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA field programmable gate array
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache.
  • RAM random access memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • DDR SDRAM double data rate SDRAM
  • ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchronous link dynamic random access memory
  • DR RAM direct rambus RAM
  • the embodiment of the present application also provides a computer program product.
  • the computer program product includes a computer program or instructions; when the computer program or instructions are run on a computer, the computer executes the method in any of the above embodiments.
  • the embodiment of the present application also provides a computer storage medium.
  • the computer storage medium stores a computer program or instructions.
  • when the computer program or instructions are run on a computer, the computer executes the method in any of the above embodiments.
  • the disclosed devices and methods can be implemented in other ways.
  • the electronic device embodiments described above are only illustrative.
  • the division of modules is only a logical function division. In actual implementation, there may be other division methods.
  • multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application can be integrated into a processing unit, or each module can exist physically alone, or two or more modules can be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially, or the part contributing to the existing technology, or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
  • the electronic devices, computer storage media, and computer program products provided in the above embodiments of this application are all used to execute the methods provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods; these are not repeated here.
  • the execution order of the steps should be determined by their functions and internal logic.
  • the sequence numbers of the steps do not imply an order of execution and do not limit the implementation process of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Telephone Function (AREA)
  • Noise Elimination (AREA)

Abstract

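Embodiments of this application provide a voice signal output method and an electronic device. The method can generate a first voice signal, which is an interference signal generated based on a downlink voice signal, and a second voice signal, which is obtained by performing delay processing on the downlink voice signal and has the same delay as the first voice signal. Then, at the same output time, the second voice signal is output through a first sound-generating component that can be close to the human ear, and the first voice signal is output through a second sound-generating component far from the human ear. In this way, the downlink voice signal is masked by the interference signal generated from it, so that people other than the user cannot clearly hear the user's call content, which protects the user's privacy well.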

Description

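Voice signal output method and electronic device
This application claims priority to Chinese Patent Application No. 202210960657.7, filed with the China National Intellectual Property Administration on August 11, 2022 and entitled "Voice signal output method and electronic device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular, to a voice signal output method and an electronic device.
Background
Currently, to reduce sound leakage during handheld calls, an electronic device can produce sound through a ceramic-driven screen sound-generating component. During a handheld call, the screen sound-generating area of this component can directly face the user's ear, so most of the sound emitted by the electronic device enters the user's ear and only a very small part leaks into the surroundings. Other people nearby therefore cannot clearly hear the user's call content, which protects the user's privacy well.
However, if the user increases the call volume to further raise the loudness in the ear, the drive applied to the ceramic device in the ceramic-driven screen sound-generating component increases. The intensity of the sound leaking into the user's surroundings then increases as well, so other people can clearly hear the user's call content. This leaks the user's privacy and results in a poor user experience.
Summary
Embodiments of this application provide a voice signal output method and an electronic device, to resolve the problem that, in handheld call scenarios, increasing the volume on current electronic devices leaks the user's call content and results in a poor user experience.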
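According to a first aspect, an embodiment of this application provides a voice signal output method. The method is applied to an electronic device that includes a first sound-generating component and a second sound-generating component. The first sound-generating component is disposed at a first position of the electronic device, which is close to the user's ear when the user holds the electronic device during a call, and the second sound-generating component is disposed at a second position different from the first position. The method includes: generating a first voice signal, where the first voice signal is an interference signal generated based on a downlink voice signal; generating a second voice signal, where the second voice signal is a voice signal obtained by performing delay processing on the downlink voice signal and has the same delay as the first voice signal; and, at the same output time, outputting the second voice signal through the first sound-generating component and outputting the first voice signal through the second sound-generating component.
In this way, the downlink voice signal can be masked by an interference signal generated from it. When the user holds the electronic device during a call, the sound entering the user's ear is strong enough to ensure that the user can clearly hear the call content. For other people nearby, because the interference signal masks the downlink voice signal, the sound reaching their ears is weaker and its information is incomplete, so they cannot clearly hear the user's call content. This protects the user's privacy well and improves the user experience.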
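Read as timing logic, the first aspect delays the downlink voice signal by exactly the latency of the interference-generation path, so that both signals leave their sound-generating components at the same output time. The Python sketch below illustrates only that alignment and routing. It assumes frame-based lists of samples and a latency known in samples; the names `delay`, `route_outputs`, and the dictionary keys are hypothetical, not taken from the source.

```python
def delay(signal, samples):
    """Delay a list of samples by `samples`, zero-padding the start and keeping the length."""
    samples = max(0, min(samples, len(signal)))
    return [0.0] * samples + list(signal[:len(signal) - samples])


def route_outputs(downlink, interference, interference_latency):
    """Pair each output with its sound-generating component for the same output time.

    The second voice signal is the downlink signal delayed by the latency of the
    interference-generation path, so both outputs carry the same delay.
    """
    second_voice = delay(downlink, interference_latency)
    first_voice = list(interference)  # assumed to already carry `interference_latency`
    return {
        "first_component": second_voice,   # near the ear: delayed downlink speech
        "second_component": first_voice,   # away from the ear: interference signal
    }


out = route_outputs([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], 2)
print(out["first_component"])  # [0.0, 0.0, 1.0, 2.0]
```

Delaying the speech rather than advancing the interference keeps the sketch causal: a real-time device cannot output interference before it has computed it.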
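In an implementation, generating the first voice signal includes: generating a first power spectral density, where the first power spectral density is a power spectral density calculated from the downlink voice signal; and generating the first voice signal according to the first power spectral density.
In an implementation, generating the first voice signal according to the first power spectral density includes: generating a masking signal and a pink noise signal according to the first power spectral density; adjusting the masking signal and the pink noise signal to the same delay; and generating the first voice signal based on the masking signal and the pink noise signal adjusted to the same delay.
In this way, the generated first voice signal can mask the signal at the frequency points of the downlink voice signal that have larger power spectral density values. After the first voice signal and the second voice signal are output at the same time, the intensity of the sound reaching other people around the user is reduced, so they cannot clearly hear the user's call content. This protects the user's privacy well and improves the user experience.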
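One way to picture "adjusting the masking signal and the pink noise signal to the same delay" is to delay whichever path has the lower latency by the latency difference and then combine the two. In this hedged sketch the combination is a plain sample-wise sum, which is an assumption (the source does not state how the two aligned signals are combined), and `align_and_mix` is a hypothetical name.

```python
def align_and_mix(masking, masking_latency, pink, pink_latency):
    """Delay the lower-latency path so both share the larger latency, then add them.

    Returns the mixed samples and the common latency (in samples).
    """
    target = max(masking_latency, pink_latency)

    def shift(signal, extra):
        # Zero-pad at the start; the output keeps the input length.
        extra = min(extra, len(signal))
        return [0.0] * extra + list(signal[:len(signal) - extra])

    m = shift(masking, target - masking_latency)
    p = shift(pink, target - pink_latency)
    n = min(len(m), len(p))
    return [m[i] + p[i] for i in range(n)], target


mixed, latency = align_and_mix([1.0, 1.0, 1.0, 1.0], 3, [2.0, 2.0, 2.0, 2.0], 1)
print(mixed, latency)  # [1.0, 1.0, 3.0, 3.0] 3
```

The common latency returned here is what the downlink path would then have to match.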
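In an implementation, generating the masking signal according to the first power spectral density includes: determining a first average power according to the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determining first frequency points, which are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; and generating the masking signal according to the first frequency points.
In this way, a masking sound can be generated that masks the components of the downlink voice signal with larger power spectral density values. This reduces the intensity of the sound reaching other people around the user, so they cannot clearly hear the user's call content, which protects the user's privacy well.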
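The first average power and the first frequency points reduce to a mean and a strict threshold test over the PSD bins. A minimal sketch, assuming the first power spectral density is held as parallel lists of bin frequencies and PSD values; the example numbers are illustrative, not measured data.

```python
def first_average_power(psd):
    """Mean of the PSD values over all frequency points."""
    return sum(psd) / len(psd)


def first_frequency_points(freqs, psd):
    """Frequency points whose PSD value is strictly greater than the first average power."""
    avg = first_average_power(psd)
    return [f for f, p in zip(freqs, psd) if p > avg]


freqs = [250.0, 500.0, 750.0, 1000.0, 1250.0]
psd = [0.2, 1.8, 0.3, 1.1, 0.1]  # illustrative values only
print(first_frequency_points(freqs, psd))  # [500.0, 1000.0]
```

How many points pass this test, and how far apart they are, decides which of the masking-signal variants below applies.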
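In an implementation, generating the masking signal according to the first frequency points includes: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is less than a first preset frequency threshold, selecting a first preset number of first frequency points in descending order of power spectral density value and determining them as second frequency points; determining third frequency points, each located between two adjacent second frequency points; determining, according to a preset human-ear masking effect curve, the amplitude corresponding to each third frequency point, where the amplitude represents signal strength; and generating the masking signal according to the frequency values of the third frequency points and their corresponding amplitudes.
In this way, when the frequency points whose power spectral density values exceed the first average power are relatively concentrated, a fairly accurate masking signal can be determined, and the subsequent masking effect is good.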
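The concentrated case can be sketched as follows. The source does not publish the preset human-ear masking effect curve, so `masking_amplitude` below is a hypothetical placeholder. Placing each third frequency point at the midpoint of its two neighbouring second frequency points is likewise an assumption. Synthesizing the masking signal as a sum of sinusoids is one plausible reading of "generating the masking signal according to the frequency values and amplitudes".

```python
import math


def masking_amplitude(freq_hz):
    """Hypothetical placeholder for the preset human-ear masking effect curve."""
    return 0.5 / (1.0 + freq_hz / 4000.0)


def is_concentrated(first_points, first_threshold_hz):
    """True when the first points span less than the first preset frequency threshold."""
    return len(first_points) > 1 and max(first_points) - min(first_points) < first_threshold_hz


def concentrated_masking_components(first_points, psd_of, first_preset_number):
    # Second frequency points: the highest-PSD first points, re-sorted by frequency.
    second = sorted(first_points, key=lambda f: psd_of[f], reverse=True)[:first_preset_number]
    second.sort()
    # Third frequency points: one between each adjacent pair (midpoint assumed).
    third = [(lo + hi) / 2.0 for lo, hi in zip(second, second[1:])]
    return [(f, masking_amplitude(f)) for f in third]


def synthesize(components, sample_rate, num_samples):
    """Sum of sinusoids, one per (frequency, amplitude) component."""
    return [
        sum(a * math.sin(2.0 * math.pi * f * n / sample_rate) for f, a in components)
        for n in range(num_samples)
    ]


points = [500.0, 600.0, 700.0, 800.0]
psd_of = {500.0: 3.0, 600.0: 9.0, 700.0: 5.0, 800.0: 7.0}
if is_concentrated(points, first_threshold_hz=1000.0):
    comps = concentrated_masking_components(points, psd_of, first_preset_number=3)
    print([f for f, _ in comps])  # [650.0, 750.0]
    masking = synthesize(comps, sample_rate=8000, num_samples=160)
```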
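In an implementation, generating the masking signal according to the first frequency points includes: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, selecting, from all first frequency points, the first frequency point with the largest power spectral density value, the first frequency point with the largest frequency value, and the first frequency point with the smallest frequency value, and determining them as fourth frequency points; near each fourth frequency point, selecting one frequency point whose frequency differs from that fourth frequency point by no more than a second preset frequency threshold, and determining it as a fifth frequency point; determining, according to the preset human-ear masking effect curve, the amplitude corresponding to each fifth frequency point, where the amplitude represents signal strength; and generating the masking signal according to the frequency values of the fifth frequency points and their corresponding amplitudes.
In this way, when the frequency points whose power spectral density values exceed the first average power are relatively scattered and the frequency difference between the highest and lowest of them is large, a fairly accurate masking signal can be determined, and the subsequent masking effect is good.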
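For the dispersed case, a sketch under stated assumptions. `masking_amplitude` is again a hypothetical placeholder for the unpublished masking curve. Each fifth frequency point is placed at a fixed offset from its fourth frequency point, and the offset is checked against the second preset frequency threshold. The source only requires the gap to be at most that threshold, so the offset's size and direction are illustrative.

```python
def masking_amplitude(freq_hz):
    """Hypothetical placeholder for the preset human-ear masking effect curve."""
    return 0.5 / (1.0 + freq_hz / 4000.0)


def is_dispersed(first_points, first_threshold_hz):
    """True when the first points span at least the first preset frequency threshold."""
    return len(first_points) > 1 and max(first_points) - min(first_points) >= first_threshold_hz


def dispersed_masking_components(first_points, psd_of, offset_hz, second_threshold_hz):
    """Fourth points: PSD peak, highest and lowest first points; one fifth point near each."""
    if abs(offset_hz) > second_threshold_hz:
        raise ValueError("fifth points must lie within the second preset frequency threshold")
    peak = max(first_points, key=lambda f: psd_of[f])
    fourth = sorted({peak, max(first_points), min(first_points)})  # the set drops duplicates
    fifth = [f + offset_hz for f in fourth]
    return [(f, masking_amplitude(f)) for f in fifth]


points = [300.0, 1200.0, 3500.0]
psd_of = {300.0: 4.0, 1200.0: 10.0, 3500.0: 6.0}
if is_dispersed(points, first_threshold_hz=1000.0):
    comps = dispersed_masking_components(points, psd_of, offset_hz=50.0, second_threshold_hz=100.0)
    print([f for f, _ in comps])  # [350.0, 1250.0, 3550.0]
```

When the PSD peak coincides with the highest or lowest first point, the set yields only two fourth points, so fewer masking tones are produced.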
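In an implementation, generating the masking signal according to the first frequency points includes: if there are multiple first frequency points and the frequency difference between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value is greater than or equal to the first preset frequency threshold, selecting, in each frequency point interval and in descending order of power spectral density value, a second preset number of first frequency points and determining them as sixth frequency points, where the frequency difference between the end frequency point and the start frequency point of each interval is less than or equal to a third preset frequency threshold, and each interval contains at least a third preset number of first frequency points; determining a seventh frequency point for each interval, located between two adjacent sixth frequency points in that interval; determining, according to the preset human-ear masking effect curve, the amplitude corresponding to each seventh frequency point, where the amplitude represents signal strength; and generating the masking signal according to the frequency values of the seventh frequency points and their corresponding amplitudes.
In this way, when the frequency points whose power spectral density values exceed the first average power form two or more dense intervals and the frequency difference between the highest and lowest of them is large, a fairly accurate masking signal can be determined, and the subsequent masking effect is good.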
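The source does not say how the frequency point intervals are formed. It only requires that each interval spans at most the third preset frequency threshold and contains at least the third preset number of first frequency points. The sketch below assumes a greedy left-to-right grouping that satisfies both constraints. The grouping rule, the midpoint placement of seventh points, and `masking_amplitude` are all assumptions.

```python
def masking_amplitude(freq_hz):
    """Hypothetical placeholder for the preset human-ear masking effect curve."""
    return 0.5 / (1.0 + freq_hz / 4000.0)


def frequency_intervals(first_points, third_threshold_hz, third_preset_number):
    """Greedy grouping (an assumption): start a new interval when a point lies more than
    `third_threshold_hz` above the current interval's start; keep well-populated intervals."""
    intervals, current = [], []
    for f in sorted(first_points):
        if current and f - current[0] > third_threshold_hz:
            intervals.append(current)
            current = []
        current.append(f)
    if current:
        intervals.append(current)
    return [iv for iv in intervals if len(iv) >= third_preset_number]


def interval_masking_components(first_points, psd_of, third_threshold_hz,
                                third_preset_number, second_preset_number):
    components = []
    for interval in frequency_intervals(first_points, third_threshold_hz, third_preset_number):
        # Sixth points: the strongest first points in this interval, ordered by frequency.
        strongest = sorted(interval, key=lambda f: psd_of[f], reverse=True)[:second_preset_number]
        sixth = sorted(strongest)
        # Seventh points: between adjacent sixth points (midpoint assumed).
        for lo, hi in zip(sixth, sixth[1:]):
            f = (lo + hi) / 2.0
            components.append((f, masking_amplitude(f)))
    return components


points = [200.0, 250.0, 300.0, 2000.0, 2050.0, 2100.0, 5000.0]
psd_of = {200.0: 5.0, 250.0: 8.0, 300.0: 6.0, 2000.0: 7.0, 2050.0: 3.0, 2100.0: 9.0, 5000.0: 4.0}
comps = interval_masking_components(points, psd_of, 200.0, 3, 2)
print([f for f, _ in comps])  # [275.0, 2050.0]
```

The isolated point at 5000 Hz forms an interval with too few points, so it contributes no seventh point.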
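In an implementation, generating the masking signal according to the first frequency points includes: if there is only one first frequency point, selecting one frequency point on each side of the first frequency point and determining them as eighth frequency points; determining, according to the preset human-ear masking effect curve, the amplitude corresponding to each eighth frequency point, where the amplitude represents signal strength; and generating the masking signal according to the frequency values of the eighth frequency points and their corresponding amplitudes.
In this way, when only one frequency point has a power spectral density value greater than the first average power, a fairly accurate masking signal can be determined, and the subsequent masking effect is good.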
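With a single first frequency point, the eighth frequency points flank it on both sides. This sketch assumes a symmetric spacing `side_offset_hz` (the source gives no spacing). It uses a hypothetical `masking_amplitude` placeholder for the masking curve and synthesizes the masking signal as a sum of sinusoids.

```python
import math


def masking_amplitude(freq_hz):
    """Hypothetical placeholder for the preset human-ear masking effect curve."""
    return 0.5 / (1.0 + freq_hz / 4000.0)


def eighth_point_components(first_point, side_offset_hz):
    """Eighth points: one frequency point on each side of the single first point."""
    eighth = [first_point - side_offset_hz, first_point + side_offset_hz]
    return [(f, masking_amplitude(f)) for f in eighth]


def masking_signal(components, sample_rate, num_samples):
    """Sum of sinusoids, one per (frequency, amplitude) component."""
    return [
        sum(a * math.sin(2.0 * math.pi * f * n / sample_rate) for f, a in components)
        for n in range(num_samples)
    ]


masking = masking_signal(eighth_point_components(1000.0, 100.0), 8000, 160)
```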
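In an implementation, generating the masking signal according to the first frequency points includes: if there is only one first frequency point, selecting one frequency point on each side of the first frequency point and determining them as ninth frequency points; selecting one frequency point between each ninth frequency point and the first frequency point and determining it as a tenth frequency point; determining, according to the preset human-ear masking effect curve, the amplitude corresponding to each tenth frequency point, where the amplitude represents signal strength; and generating the masking signal according to the frequency values of the tenth frequency points and their corresponding amplitudes.
In this way, when only one frequency point has a power spectral density value greater than the first average power, a fairly accurate masking signal can be determined, and the subsequent masking effect is good.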
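The ninth/tenth variant places the masking components closer to the single peak than the ninth points themselves. This hedged sketch assumes symmetric ninth points and tenth points at the midpoints between each ninth point and the first frequency point. Both placements, and `masking_amplitude`, are illustrative assumptions.

```python
def masking_amplitude(freq_hz):
    """Hypothetical placeholder for the preset human-ear masking effect curve."""
    return 0.5 / (1.0 + freq_hz / 4000.0)


def tenth_point_components(first_point, side_offset_hz):
    """Ninth points flank the first point; tenth points sit between them and the first point."""
    ninth = [first_point - side_offset_hz, first_point + side_offset_hz]
    # Tenth points: midpoint between each ninth point and the first point (assumed).
    tenth = [(n + first_point) / 2.0 for n in ninth]
    return [(f, masking_amplitude(f)) for f in tenth]


print([f for f, _ in tenth_point_components(1000.0, 200.0)])  # [900.0, 1100.0]
```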
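In an implementation, generating the pink noise signal according to the first power spectral density includes: determining a second average power, where the second average power is the average of the power spectral density values of all eleventh frequency points; the eleventh frequency points are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is less than or equal to the first average power; and the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density. Generating the pink noise signal further includes: obtaining a preset pink noise band-pass filter gain corresponding to the second average power; adjusting the gain of a first band-pass filter to the preset pink noise band-pass filter gain; and band-pass filtering the signal output by a pink noise signal source through the gain-adjusted first band-pass filter to generate the pink noise signal.
In this way, a pink noise signal that works together with the masking signal can be determined, so that the subsequently generated interference signal masks the downlink voice signal better and protects the user's call privacy.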
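The pink-noise path can be sketched in pure Python. Several pieces are assumptions rather than details from the source. The gain table `PINK_GAIN_TABLE` is hypothetical. The pink noise signal source is approximated with Paul Kellet's well-known "economy" filter applied to white noise. The first band-pass filter is a single band-pass biquad from the RBJ Audio EQ Cookbook, with an illustrative 1 kHz centre and a Q of 0.7.

```python
import math
import random

# Hypothetical preset table: (upper bound on second average power, band-pass gain).
PINK_GAIN_TABLE = [(0.01, 0.05), (0.1, 0.2), (1.0, 0.5), (float("inf"), 0.8)]


def second_average_power(psd):
    """Mean PSD over the eleventh frequency points (PSD <= first average power)."""
    first_avg = sum(psd) / len(psd)
    eleventh = [p for p in psd if p <= first_avg]  # never empty: min(psd) <= mean
    return sum(eleventh) / len(eleventh)


def preset_pink_gain(second_avg):
    """Look up the preset pink-noise band-pass gain for a second average power."""
    for upper, gain in PINK_GAIN_TABLE:
        if second_avg <= upper:
            return gain
    return PINK_GAIN_TABLE[-1][1]


def pink_noise(num_samples, rng):
    """Approximate pink noise: Paul Kellet's 'economy' filter applied to white noise."""
    b0 = b1 = b2 = 0.0
    out = []
    for _ in range(num_samples):
        white = rng.uniform(-1.0, 1.0)
        b0 = 0.99765 * b0 + white * 0.0990460
        b1 = 0.96300 * b1 + white * 0.2965164
        b2 = 0.57000 * b2 + white * 1.0526913
        out.append(b0 + b1 + b2 + white * 0.1848)
    return out


def bandpass(signal, center_hz, q, sample_rate, gain):
    """RBJ band-pass biquad (0 dB peak gain), with the output scaled by `gain`."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(gain * y)
    return out


def generate_pink_noise_signal(psd, num_samples, sample_rate=8000, seed=0):
    gain = preset_pink_gain(second_average_power(psd))
    source = pink_noise(num_samples, random.Random(seed))
    # 1 kHz centre and Q = 0.7 are illustrative settings for the first band-pass filter.
    return bandpass(source, 1000.0, 0.7, sample_rate, gain)
```

Tying the gain to the second average power (the quieter bins) scales the noise floor to the level of the speech spectrum outside its peaks, which the masking tones do not cover.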
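In an implementation, generating the first voice signal according to the first power spectral density includes: determining a first average power according to the first power spectral density, where the first average power is the average of the power spectral density values of all frequency points corresponding to the first power spectral density; determining twelfth frequency points, which are those frequency points, among all frequency points corresponding to the first power spectral density, whose power spectral density value is greater than the first average power; selecting a fourth preset number of twelfth frequency points in descending order of power spectral density value and determining them as thirteenth frequency points; generating a notch filter according to the thirteenth frequency points, where the notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; and notch filtering the signal output by the pink noise signal source through the notch filter to generate the first voice signal.
In this way, the generated first voice signal can mask most of the downlink voice signal other than the signal at the frequency points with larger power spectral density values. After the first voice signal and the second voice signal are output at the same time, the sound reaching other people around the user lacks the information of most frequency points, so they cannot clearly hear the user's call content. This likewise protects the user's privacy well and improves the user experience.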
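The notch-filter variant can be sketched by cascading one notch biquad (RBJ Audio EQ Cookbook form) per thirteenth frequency point and running the pink-noise samples through the cascade. The Q value, the sample rate, and the function names are assumptions. `pink_source` is whatever pink noise signal source the caller supplies; in the usage lines, seeded uniform white noise merely stands in for it.

```python
import math
import random


def notch(signal, notch_hz, q, sample_rate):
    """RBJ notch biquad centred on `notch_hz`."""
    w0 = 2.0 * math.pi * notch_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def thirteenth_points(freqs, psd, fourth_preset_number):
    """Twelfth points (PSD above the mean), keeping the `fourth_preset_number` strongest."""
    avg = sum(psd) / len(psd)
    twelfth = [(p, f) for f, p in zip(freqs, psd) if p > avg]
    twelfth.sort(reverse=True)
    return [f for _, f in twelfth[:fourth_preset_number]]


def notched_interference(pink_source, freqs, psd, fourth_preset_number,
                         sample_rate=8000, q=5.0):
    out = list(pink_source)
    for f in thirteenth_points(freqs, psd, fourth_preset_number):
        out = notch(out, f, q, sample_rate)  # cascade one notch per thirteenth point
    return out


rng = random.Random(1)
white = [rng.uniform(-1.0, 1.0) for _ in range(800)]  # stand-in for a pink noise source
interference = notched_interference(white, [250.0, 500.0, 750.0, 1000.0],
                                    [0.1, 3.0, 0.2, 2.0], fourth_preset_number=2)
```

The notch zeros sit exactly on the unit circle at each thirteenth frequency, so in steady state the interference carries no energy at the strongest speech bins.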
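In an implementation, generating the first power spectral density includes: band-pass filtering the downlink voice signal through a second band-pass filter to obtain a first signal within a first bandwidth, where the first bandwidth is the bandwidth of the second band-pass filter; calculating the power spectral density of the first signal; and determining the power spectral density of the first signal as the first power spectral density.
In this way, components of the downlink voice signal that the human ear cannot hear are first filtered out by the second band-pass filter. This improves the output efficiency of the downlink voice signal, makes the call smoother, and improves the user experience.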
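Generating the first power spectral density can be sketched as band-pass filtering followed by a PSD estimate. This sketch assumes a 300 Hz to 3400 Hz passband, a common telephone band; the source does not give the second band-pass filter's bandwidth. It roughly approximates that passband with one RBJ band-pass biquad and uses a plain one-sided periodogram computed by direct DFT. A production implementation would more likely use an FFT and an averaged estimator such as Welch's method.

```python
import cmath
import math


def bandpass(signal, low_hz, high_hz, sample_rate):
    """Second band-pass filter sketch: one RBJ band-pass biquad centred at the geometric
    mean of the band edges, with Q = centre / bandwidth."""
    center = math.sqrt(low_hz * high_hz)
    q = center / (high_hz - low_hz)
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def periodogram(signal, sample_rate):
    """One-sided periodogram PSD estimate via a direct DFT (O(N^2), fine for short frames)."""
    n = len(signal)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(acc) ** 2 / (sample_rate * n)
        if 0 < k < n / 2:
            p *= 2.0  # fold negative frequencies into the one-sided estimate
        freqs.append(k * sample_rate / n)
        psd.append(p)
    return freqs, psd


def first_power_spectral_density(downlink_frame, sample_rate=8000, band=(300.0, 3400.0)):
    filtered = bandpass(downlink_frame, band[0], band[1], sample_rate)
    return periodogram(filtered, sample_rate)


frame = [math.sin(2.0 * math.pi * 1000.0 * t / 8000) for t in range(64)]
freqs, psd = first_power_spectral_density(frame)
```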
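In an implementation, the electronic device further includes a third sound-generating component disposed at a third position close to the first position, and the method further includes: generating a third voice signal, where the third voice signal is a voice signal obtained by performing delay processing on the downlink voice signal and has the same delay as the first voice signal; and, at the same output time, outputting the third voice signal through the third sound-generating component.
In this way, the sound emitted by the third sound-generating component supplements the sound emitted by the first sound-generating component, so that the sound heard by the user is clearer and the user experience is improved.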
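With a third sound-generating component, the only change to the routing is one more copy of the delayed downlink signal. A minimal sketch that reuses a hypothetical zero-padding delay; the dictionary keys are illustrative names. The source does not say whether the second and third voice signals differ in any other processing, so here they are identical.

```python
def delay(signal, samples):
    """Delay a list of samples by `samples`, zero-padding the start and keeping the length."""
    samples = max(0, min(samples, len(signal)))
    return [0.0] * samples + list(signal[:len(signal) - samples])


def three_component_outputs(downlink, interference, interference_latency):
    delayed = delay(downlink, interference_latency)
    return {
        "first_component": delayed,               # second voice signal, at the ear
        "third_component": list(delayed),         # third voice signal, next to the first position
        "second_component": list(interference),   # first voice signal, away from the ear
    }


out = three_component_outputs([1.0, 2.0, 3.0], [9.0, 9.0, 9.0], 1)
print(out["third_component"])  # [0.0, 1.0, 2.0]
```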
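According to a second aspect, an embodiment of this application provides an electronic device. The electronic device includes a first sound-generating component and a second sound-generating component. The first sound-generating component is disposed at a first position of the electronic device, which is close to the user's ear when the user holds the electronic device during a call, and the second sound-generating component is disposed at a second position different from the first position. The electronic device further includes a memory and a processor coupled to each other. The memory is configured to store computer program code including computer instructions, and when the processor executes the computer instructions, the electronic device is caused to perform the method according to any implementation of the first aspect.
In this way, the electronic device can mask the downlink voice signal with an interference signal generated from it. When the user holds the electronic device during a call, the sound entering the user's ear is strong enough to ensure that the user can clearly hear the call content. For other people nearby, because the interference signal masks the downlink voice signal, the sound reaching their ears is weaker and its information is incomplete, so they cannot clearly hear the user's call content. This protects the user's privacy well and improves the user experience.
According to a third aspect, this application provides a computer storage medium storing a computer program or instructions. When the computer program or instructions are executed, the method according to any implementation of the first aspect is performed.
In summary, with the voice signal output method and the electronic device provided in the embodiments of this application, the downlink voice signal can be masked by an interference signal generated from it. When the user holds the electronic device during a call, the sound entering the user's ear is strong enough to ensure that the user can clearly hear the call content. For other people nearby, because the interference signal masks the downlink voice signal, the sound reaching their ears is weaker and its information is incomplete, so they cannot clearly hear the user's call content. This protects the user's privacy well and improves the user experience.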
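Brief Description of Drawings
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 2 is a block diagram of the software structure of an electronic device according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a voice signal output method according to an embodiment of this application;
FIG. 4 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a method for generating a first voice signal according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a method for generating a first voice signal according to a first power spectral density according to an embodiment of this application;
FIG. 7 is a schematic flowchart of a method for generating a masking signal according to a first power spectral density according to an embodiment of this application;
FIG. 8 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 9 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 10 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 11 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 12 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 13 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 14 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 15 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 16 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 17 is a schematic flowchart of a method for generating a pink noise signal according to a first power spectral density according to an embodiment of this application;
FIG. 18 is a schematic flowchart of another method for generating a first voice signal according to a first power spectral density according to an embodiment of this application;
FIG. 19 is a structural block diagram of an electronic device according to an embodiment of this application.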
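Detailed Description
The following describes the technical solutions of this application with reference to the accompanying drawings.
In the descriptions of this application, unless otherwise specified, "and/or" describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, or B exists alone. In addition, "at least one" means one or more, "at least two" means two or more, and "multiple" also means two or more. Words such as "first" and "second" do not limit quantity or execution order, nor do they require that the items they qualify be different.
It should be noted that in this application, words such as "exemplarily" or "for example" are used to give an example, illustration, or description. Any embodiment or design described as "exemplarily" or "for example" in this application should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a specific manner.
To facilitate understanding of the technical solutions of this application, application scenarios of the technical solutions provided in this application are first described by way of example.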
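Currently, an electronic device may be provided with multiple sound-generating components, for example, a ceramic-driven screen sound-generating component, a speaker at the top, and a speaker at the bottom. In a handheld call scenario, the electronic device usually produces sound through the ceramic-driven screen sound-generating component and/or the top speaker. During a handheld call, the screen sound-generating area of the ceramic-driven component can directly face the user's ear, and the top speaker can also be right next to the user's ear. Most of the sound emitted by the electronic device therefore enters the user's ear and only a very small part leaks into the surroundings, so even if other people are nearby, they cannot clearly hear the content of the user's call, which protects the user's privacy well.
However, when the user turns up the call volume to further increase loudness in the ear, the ceramic drive increases further, and the intensity of the sound emitted by the ceramic-driven screen sound-generating component and by the top speaker increases as well. The intensity of the sound leaking into the surroundings increases accordingly, so other people nearby can clearly hear the user's call content. This leaks the user's call content and privacy and results in a poor user experience.
To resolve the foregoing technical problem, embodiments of this application provide a voice signal output method, an apparatus, and an electronic device. The method can be applied to an electronic device. With this method, the electronic device can generate an interference signal based on the downlink voice signal during the user's call, adjust the downlink voice signal and the interference signal to the same delay, and then, at the same output time, output the downlink voice signal through a first sound-generating component close to the human ear and output the interference signal through a second sound-generating component far from the human ear. The interference signal thus masks the downlink voice signal, so that other people nearby cannot clearly hear the user's call content. This protects the user's call privacy and improves the user experience.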
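It should be noted that the electronic device in this application may be stationary or mobile. The electronic device may include a communication terminal, a vehicle-mounted device, a mobile device, a user terminal, a mobile terminal, a wireless communication device, a portable terminal, a user agent, a user apparatus, a service device, user equipment (UE), or another device at the outermost edge of a computer network, mainly used for data input and for outputting or displaying processing results. For example, the terminal device may be a mobile phone, a cordless phone, a smartwatch, a wearable device, a tablet device, a handheld device with a wireless communication function, a computing device, a vehicle-mounted communication module, or another processing device connected to a wireless modem.
For example, refer to FIG. 1, which is a schematic structural diagram of an electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, an optical proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In some other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.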
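The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller can generate an operation control signal based on an instruction operation code and a timing signal, to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.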
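The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple groups of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple groups of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface can also be used for audio communication, sampling, quantizing, and encoding an analog signal. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit an audio signal to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts data to be transmitted between serial and parallel forms. In some embodiments, the UART interface is usually used to connect the processor 110 to the wireless communication module 160. For example, the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface, to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral components such as the display 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the photographing function of the electronic device 100, and communicates with the display 194 through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface can be configured by software, either as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.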
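The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, or to transmit data between the electronic device 100 and a peripheral device. It may also be used to connect a headset and play audio through the headset. The interface may further be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present invention are merely illustrative and do not constitute a structural limitation on the electronic device 100. In some other embodiments of this application, the electronic device 100 may alternatively use interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger, which may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage and impedance). In some other embodiments, the power management module 141 may alternatively be disposed in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same component.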
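The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive an electromagnetic wave through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic wave, and transmit it to the modem processor for demodulation. The mobile communication module 150 may also amplify a signal modulated by the modem processor and convert it into an electromagnetic wave radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be disposed in the same component.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator is configured to demodulate a received electromagnetic wave signal into a low-frequency baseband signal, and then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is transmitted to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or a video through the display 194. In some embodiments, the modem processor may be an independent component. In some other embodiments, the modem processor may be independent of the processor 110 and disposed in the same component as the mobile communication module 150 or another functional module.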
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local  area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
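The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, videos, and the like. The display 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or N displays 194, where N is a positive integer greater than 1. In some other embodiments, the display 194 of the electronic device 100 may also be referred to as the screen 194. A sound-emitting area may further be provided on the screen 194 of the electronic device 100, and the electronic device 100 may drive the screen in the sound-emitting area through a ceramic or other driver component to produce sound. In a handheld call scenario, the sound-emitting area on the screen 194 may directly face the human ear.
The electronic device 100 may implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is configured to process data fed back by the camera 193. For example, during photographing, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP, which processes it into an image visible to the naked eye. The ISP may further apply algorithmic optimization to image noise, brightness, and skin tone, and may optimize parameters such as exposure and color temperature of the photographing scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is configured to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 performs frequency point selection, the digital signal processor is configured to perform a Fourier transform or the like on the frequency point energy.
The video codec is configured to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, for example, Moving Picture Experts Group (MPEG) 1, MPEG 2, MPEG 3, and MPEG 4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, for example a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be configured to store computer-executable program code, and the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, applications required by at least one function (such as a sound playback function and an image playback function), and the like. The data storage area may store data (such as audio data and a phone book) created during use of the electronic device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). The processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory disposed in the processor.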
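The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 may further be configured to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "loudspeaker", is configured to convert an audio electrical signal into a sound signal. The electronic device 100 may be used to listen to music or a hands-free call through the speaker 170A. Multiple speakers 170A may be provided in the electronic device 100; for example, one speaker 170A may be provided at the top of the electronic device 100 and another at the bottom.
The receiver 170B, also referred to as an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the electronic device 100 is used to answer a call or listen to a voice message, the voice can be heard by placing the receiver 170B close to the ear. In some embodiments, the speaker 170A and the receiver 170B may also be configured as one component; this is not limited in this application.
The microphone 170C, also referred to as a "mic" or "sound transducer", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into it. At least one microphone 170C may be provided in the electronic device 100. In some other embodiments, two microphones 170C may be provided, which, in addition to collecting sound signals, can implement a noise reduction function. In some other embodiments, three, four, or more microphones 170C may be provided to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headset jack 170D is configured to connect a wired headset. The headset jack 170D may be the USB interface 130, or may be a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.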
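The pressure sensor 180A is configured to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the pressure intensity from the change in capacitance. When a touch operation acts on the display 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the Messages application icon, an instruction to view messages is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the Messages application icon, an instruction to create a new message is executed.
The gyroscope sensor 180B may be configured to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (the x, y, and z axes) may be determined through the gyroscope sensor 180B. The gyroscope sensor 180B may be used for image stabilization during photographing. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle through which the electronic device 100 shakes, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shake through reverse motion, thereby implementing stabilization. The gyroscope sensor 180B may also be used in navigation and motion-sensing gaming scenarios.
The barometric pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the air pressure value measured by the barometric pressure sensor 180C to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, it may detect opening and closing of the flip cover through the magnetic sensor 180D, and then set features such as automatic unlocking on flip-open according to the detected open/closed state of the case or of the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the electronic device, for applications such as landscape/portrait switching and pedometers.
The distance sensor 180F is configured to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a photographing scenario, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The LED may be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 may use the proximity light sensor 180G to detect that the user is holding it close to the ear during a call, so as to turn off the screen automatically and save power. The proximity light sensor 180G may also be used for automatic unlocking and screen locking in case mode and pocket mode.
The ambient light sensor 180L is configured to sense ambient light brightness. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the sensed ambient light brightness. The ambient light sensor 180L may also be used to automatically adjust white balance during photographing, and may cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device 100 may use the collected fingerprint features to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J is configured to detect temperature. In some embodiments, the electronic device 100 executes a temperature handling policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 lowers the performance of a processor near the temperature sensor 180J to reduce power consumption and implement thermal protection. In some other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display 194, and the touch sensor 180K and the display 194 form a touchscreen, also referred to as a "touch screen". The touch sensor 180K is configured to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 194. In some other embodiments, the touch sensor 180K may alternatively be disposed on a surface of the electronic device 100 at a position different from that of the display 194.
The bone conduction sensor 180M may acquire vibration signals. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the bone that vibrates when a person speaks. The bone conduction sensor 180M may also contact the human pulse to receive a blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vocal vibration signal acquired by the bone conduction sensor 180M, to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulse signal acquired by the bone conduction sensor 180M, to implement a heart rate detection function.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 may receive button input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration alerts. The motor 191 may be used for incoming call vibration alerts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. For touch operations acting on different areas of the display 194, the motor 191 may also produce different vibration feedback effects. Different application scenarios (for example, time reminders, receiving messages, alarms, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light, and may be used to indicate the charging status and battery level changes, or to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by inserting it into or removing it from the SIM card interface 195. The electronic device 100 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time; the cards may be of the same type or of different types. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with a network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card, which may be embedded in the electronic device 100 and cannot be separated from it.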
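The software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present invention take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is configured to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The content provider is configured to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, a phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system may be used to build applications. A display interface may consist of one or more views. For example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
The telephony manager is configured to provide communication functions of the electronic device 100, for example, management of call states (including connecting and hanging up).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction; for example, the notification manager is used to notify that a download is complete or to give message reminders. The notification manager may also present notifications in the top status bar of the system in the form of a chart or scroll-bar text, for example notifications from applications running in the background, or present notifications on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the electronic device vibrates, or an indicator light blinks.
The Android Runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and managing the Android system.
The core libraries consist of two parts: the functions that the Java language needs to call, and the Android core libraries.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example, a surface manager, media libraries, a 3D graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is configured to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of many common audio and video formats, as well as still image files. The media libraries may support multiple audio and video encoding formats, for example, MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The 3D graphics processing library is configured to implement 3D graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following describes, by way of example, the working process of the software and hardware of the electronic device 100 with reference to a photo capture scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a single tap whose corresponding control is the Camera application icon, the Camera application calls an interface of the application framework layer to start the Camera application, then starts the camera driver by calling the kernel layer, and captures a still image or video through the camera 193.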
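The following describes embodiments of the voice signal output method provided in this application.
Refer to FIG. 3, which is a schematic flowchart of the voice signal output method according to an embodiment of this application. The method may be applied to an electronic device provided with a first sound-emitting component and a second sound-emitting component. The first sound-emitting component may be disposed at a first position of the electronic device; when the user holds the electronic device for a call, that is, when the electronic device is in a handheld call scenario, the first position is close to the user's ear. The second sound-emitting component may be disposed at a second position of the electronic device that is different from the first position.
For example, refer to FIG. 4, which is a schematic diagram of an application scenario according to an embodiment of this application. As shown in FIG. 4, the first position may be on the screen 194 of the electronic device 100, and the first sound-emitting component 401 may be a ceramic-driven screen sound-emitting component. The second position may be at the bottom of the electronic device 100, and the second sound-emitting component 402 may be a speaker disposed at the bottom of the electronic device 100.
It should be noted that the first position and the second position may also be at other locations of the electronic device. For example, the first position may be at the top of the electronic device or above the screen, and the second position may be on a side of the electronic device; this is not limited in this application. Similarly, the first and second sound-emitting components may be other sound-emitting components; for example, the first sound-emitting component may be a speaker at the top of the electronic device and the second sound-emitting component a speaker on a side of the electronic device; this is not limited in this application.
As shown in FIG. 3, the method may include the following steps:
Step S101: Generate a first voice signal.
The first voice signal is an interference signal generated from a downlink voice signal. The downlink voice signal may be the downlink voice signal of the user's call.
It should be noted that, in a handheld call scenario, the electronic device held by the user (hereinafter "the electronic device") may receive, in real time and continuously, downlink voice signals transmitted to it by another terminal device (hereinafter "the terminal device"). Each time a downlink voice signal is received, the electronic device may process it according to the voice signal output method provided in the embodiments of this application and then output it.
As shown in FIG. 5, generating the first voice signal may be implemented through the following steps:
Step S201: Generate a first power spectral density.
The first power spectral density is a power spectral density calculated from the downlink voice signal.
In one possible implementation, the first power spectral density may be the power spectral density of the downlink voice signal itself. Specifically, the power spectral density of the downlink voice signal may first be calculated using the autocorrelation function method (that is, as the Fourier transform of the signal's autocorrelation), and the calculated power spectral density is then taken as the first power spectral density.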
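The autocorrelation function method estimates the PSD as the Fourier transform of the signal's autocorrelation. A minimal pure-Python sketch follows. It uses the Blackman-Tukey form with a triangular lag window, and the lag count `max_lag`, the bin count `nfft`, and the function names are illustrative assumptions, since the text does not specify them.

```python
import math


def autocorrelation(x, max_lag):
    """Biased estimate r[k] = (1/N) * sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def psd_autocorrelation(x, fs, max_lag, nfft):
    """Blackman-Tukey PSD: DTFT of the lag-windowed autocorrelation.

    Returns (freqs_hz, psd) for bins 0..nfft/2. The triangular (Bartlett) lag
    window keeps the estimate non-negative.
    """
    r = autocorrelation(x, max_lag)
    window = [1.0 - k / (max_lag + 1) for k in range(max_lag + 1)]
    freqs, psd = [], []
    for b in range(nfft // 2 + 1):
        w = 2 * math.pi * b / nfft
        # r is even in k, so its DTFT reduces to a cosine series.
        s = r[0] + 2 * sum(window[k] * r[k] * math.cos(w * k) for k in range(1, max_lag + 1))
        freqs.append(b * fs / nfft)
        psd.append(s / fs)
    return freqs, psd
```

With a 1000 Hz test tone sampled at 8000 Hz, the largest bin lands at 1000 Hz. On a device the cosine sum would normally be replaced by an FFT; the direct form here only keeps the example free of dependencies.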
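In another possible implementation, the downlink voice signal may first be band-pass filtered by a preset second band-pass filter to obtain a first signal within a first bandwidth, where the first bandwidth is the bandwidth of the second band-pass filter. The power spectral density of the first signal is then calculated using the autocorrelation function method and taken as the first power spectral density. In this way, the components of the downlink voice signal that the human ear cannot hear are first removed by the second band-pass filter, which improves the output efficiency of the downlink voice signal, makes the call smoother, and improves the user experience.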
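The text does not specify the second band-pass filter. As a hedged sketch, the example below assumes a single RBJ (Audio EQ Cookbook) band-pass biquad with 0 dB peak gain and a hypothetical first bandwidth of 300 Hz to 3400 Hz, the classic narrowband telephony band; `first_signal` and the other names are illustrative. A production design would probably use a steeper, higher-order filter.

```python
import math


def bandpass_coeffs(f_low, f_high, fs):
    """RBJ band-pass biquad (0 dB peak gain) centred on the geometric mean of the band edges."""
    f0 = math.sqrt(f_low * f_high)
    q = f0 / (f_high - f_low)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def first_signal(downlink, fs, f_low=300.0, f_high=3400.0):
    """Band-limit the downlink voice to the (assumed) first bandwidth before the PSD step."""
    b, a = bandpass_coeffs(f_low, f_high, fs)
    return biquad(downlink, b, a)
```

The output of `first_signal` would then be passed to the PSD estimator in place of the raw downlink signal.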
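Step S202: Generate the first voice signal based on the first power spectral density.
In specific implementation, the first voice signal may be generated from the first power spectral density in multiple ways. For example, as shown in FIG. 6, it may be generated through the following steps:
Step S301: Generate a masking signal and a pink noise signal based on the first power spectral density.
The masking signal is used to mask the downlink voice signal. When the two are output at the same time, the sound corresponding to the downlink voice signal reaches the ears of other people nearby with reduced intensity and incomplete information, so even if the user turns up the volume, others nearby cannot clearly hear the call content, which protects the user's privacy well.
As shown in FIG. 7, generating the masking signal based on the first power spectral density may be implemented through the following steps:
Step S401: Determine a first average power based on the first power spectral density.
The first average power is the average of the power spectral density values at all frequency points of the first power spectral density.
Step S402: Determine first frequency points.
The first frequency points are those frequency points, among all frequency points of the first power spectral density, whose power spectral density values are greater than the first average power.
For example, as shown in FIG. 8, if among all frequency points of the first power spectral density the values at frequency points f1, f2, and f3 are greater than the first average power, then f1, f2, and f3 are all determined as first frequency points.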
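Steps S401 and S402 reduce to a mean and a comparison. A small sketch follows (the function name is hypothetical); it also returns the bins at or below the mean, which the pink noise branch later calls the eleventh frequency points.

```python
def classify_frequency_points(freqs, psd):
    """Split PSD bins around the first average power (the mean of all PSD values)."""
    first_avg = sum(psd) / len(psd)
    above = [f for f, p in zip(freqs, psd) if p > first_avg]          # first frequency points
    at_or_below = [f for f, p in zip(freqs, psd) if p <= first_avg]   # eleventh frequency points
    return first_avg, above, at_or_below
```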
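Step S403: Generate the masking signal based on the first frequency points.
In specific implementation, there may be one first frequency point or multiple first frequency points. Accordingly, the masking signal may be generated from the first frequency points in multiple ways.
For example, if there are multiple first frequency points and the difference between the highest-frequency first frequency point and the lowest-frequency first frequency point is less than a first preset frequency threshold, a first preset number of first frequency points are selected from all first frequency points in descending order of power spectral density value and determined as second frequency points. The first preset frequency threshold may be set according to the needs of the actual application scenario, for example, 1000 Hz. The first preset number may also be set according to actual needs, for example, 3 or 5.
Then, third frequency points are determined, each located between two adjacent second frequency points; that is, one third frequency point may be selected between each pair of adjacent second frequency points. The third frequency points determined in this way may be used as the frequencies of individual masking tones. Optionally, any frequency point between two adjacent second frequency points may be selected as the third frequency point. Optionally, the frequency point midway between two adjacent second frequency points may be selected as the third frequency point; this is not limited in this application.
After that, the amplitude corresponding to each third frequency point may be determined according to a preset human-ear masking effect curve; the amplitude characterizes the strength of the signal. Finally, the masking signal may be generated based on the frequency values of the third frequency points and their corresponding amplitudes. The masking signal includes, for each third frequency point, a signal whose frequency is the frequency value of that third frequency point and whose amplitude is the amplitude corresponding to it.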
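Each masking tone is a sinusoid at a masker frequency with the amplitude read from the masking curve. A minimal sketch, assuming levels are peak amplitudes in dB relative to full scale (the text also allows dB/Hz or power) and that all tones start in phase; `synthesize_masking_signal` is a hypothetical name.

```python
import math


def synthesize_masking_signal(tones, fs, n_samples):
    """Sum one sinusoid per masker point.

    tones: iterable of (frequency_hz, level_db); level_db is treated as a peak
    amplitude in dB relative to full scale (an assumption).
    """
    out = [0.0] * n_samples
    for freq, level_db in tones:
        amp = 10 ** (level_db / 20)   # dB -> linear amplitude
        for n in range(n_samples):
            out[n] += amp * math.sin(2 * math.pi * freq * n / fs)
    return out
```

For example, `synthesize_masking_signal([(fa1, a1), (fa2, a2)], fs, n)` would build the two-tone masker of the FIG. 9 example once the amplitudes have been looked up.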
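In specific implementation, the frequency values (hereinafter "preset frequency values", in hertz (Hz)) and amplitudes (hereinafter "preset amplitudes") of the points on the human-ear masking effect curve may be stored in the electronic device in correspondence. To determine the amplitude of a third frequency point from the preset curve, the preset frequency value equal to the frequency value of the third frequency point is found among the stored preset frequency values, and the preset amplitude stored with it is taken as the amplitude of the third frequency point. The amplitude may be a sound intensity in decibels (dB), a power spectral density value in dB/Hz, a power, or the like; this is not limited in this application. If no stored preset frequency value equals the frequency value of the third frequency point, the amplitude of the third frequency point may instead be determined by interpolation from the stored preset frequency values and preset amplitudes.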
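The curve lookup with an interpolation fallback can be sketched as follows. The table is assumed to be sorted by frequency; linear interpolation and clamping outside the table are assumptions (the text says only "interpolation"), and the curve values in the test are made up.

```python
import bisect


def amplitude_from_curve(curve, freq_hz):
    """Look up a masker amplitude on a stored masking-effect curve.

    curve: list of (preset_frequency_hz, preset_amplitude) sorted by frequency.
    Exact matches return the stored value; otherwise the two neighbouring
    entries are linearly interpolated. Values outside the table are clamped.
    """
    freqs = [f for f, _ in curve]
    i = bisect.bisect_left(freqs, freq_hz)
    if i < len(freqs) and freqs[i] == freq_hz:
        return curve[i][1]            # exact match on a stored preset frequency
    if i == 0:
        return curve[0][1]            # below the table: clamp
    if i == len(freqs):
        return curve[-1][1]           # above the table: clamp
    (f0, a0), (f1, a1) = curve[i - 1], curve[i]
    return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
```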
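For example, still as shown in FIG. 8, there are three first frequency points, f1, f2, and f3. The highest, f3, is at 400 Hz and the lowest, f1, is at 100 Hz; their difference, 300 Hz, is less than 1000 Hz, so f1, f2, and f3 may all be determined as second frequency points.
Then, as shown in FIG. 9, a frequency point fa1 may be selected between second frequency points f1 and f2, and a frequency point fa2 between f2 and f3; fa1 and fa2 are then determined as third frequency points.
After that, the preset frequency values equal to the frequency values of third frequency points fa1 and fa2 may be found among the pre-stored preset frequency values, together with their preset amplitudes, which gives the amplitudes of fa1 and fa2 and, from these, the masking signal.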
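The narrow-spread branch (second and third frequency points) could look like this, using the midpoint option for the third frequency points. The 1000 Hz threshold and the count of 3 are the example values from the text; `psd_of`, a mapping from frequency to PSD value, is an assumed input format.

```python
def third_frequency_points(first_points, psd_of, spread_threshold_hz=1000.0, first_preset_number=3):
    """Narrow-spread branch: top-N first points by PSD, then one masker midway between neighbours.

    Returns None when the branch does not apply (fewer than two points,
    or a spread at or above the threshold).
    """
    if len(first_points) < 2 or max(first_points) - min(first_points) >= spread_threshold_hz:
        return None
    strongest = sorted(first_points, key=lambda f: psd_of[f], reverse=True)[:first_preset_number]
    second = sorted(strongest)                                     # order by frequency
    third = [(lo + hi) / 2 for lo, hi in zip(second, second[1:])]  # midpoint option
    return second, third
```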
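For example, if there are multiple first frequency points and the difference between the highest-frequency and lowest-frequency first frequency points is greater than or equal to the first preset frequency threshold, then the first frequency point with the largest power spectral density value, the first frequency point with the highest frequency, and the first frequency point with the lowest frequency are selected from all first frequency points and determined as fourth frequency points.
Then, a frequency point is selected between each fourth frequency point and its adjacent first frequency point and determined as a fifth frequency point. Alternatively, a frequency point may be selected near each fourth frequency point, at a frequency within a second preset frequency threshold of it, and determined as a fifth frequency point; that is, each fifth frequency point differs from its adjacent fourth frequency point by no more than the second preset frequency threshold. The second preset frequency threshold may be set according to actual needs, for example, 5 Hz, 20 Hz, or 50 Hz.
After that, the amplitudes corresponding to the fifth frequency points are determined according to the preset human-ear masking effect curve; for details of the amplitude, refer to the foregoing embodiments, which are not repeated here. Finally, the masking signal is generated based on the frequency values of the fifth frequency points and their corresponding amplitudes. The masking signal includes, for each fifth frequency point, a signal whose frequency is that fifth frequency point's frequency value and whose amplitude is its corresponding amplitude.
For example, as shown in FIG. 10, there are three first frequency points, f4, f5, and f6, where f5 has the largest power spectral density value, the highest, f6, is at 1500 Hz, and the lowest, f4, is at 100 Hz. Their difference, 1400 Hz, is greater than 1000 Hz, so f4, f5, and f6 may all be determined as fourth frequency points.
Then, as shown in FIG. 11, frequency points fa3, fa4, and fa5 may be selected near fourth frequency points f4, f5, and f6 respectively and determined as fifth frequency points.
After that, the preset frequency values equal to the frequency values of fifth frequency points fa3, fa4, and fa5 may be found among the pre-stored preset frequency values, together with their preset amplitudes, which gives the amplitudes of fa3, fa4, and fa5 and, from these, the masking signal.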
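A sketch of the wide-spread branch. The text only requires each fifth point to lie within the second preset threshold of its fourth point; placing it `offset_hz` above the fourth point is an assumption, as is deduplicating when one point is both the strongest and an endpoint.

```python
def fifth_frequency_points(first_points, psd_of, spread_threshold_hz=1000.0, offset_hz=20.0):
    """Wide-spread branch: anchor maskers on the strongest, highest and lowest first points."""
    if len(first_points) < 2 or max(first_points) - min(first_points) < spread_threshold_hz:
        return None
    strongest = max(first_points, key=lambda f: psd_of[f])
    fourth = sorted({strongest, max(first_points), min(first_points)})  # set removes duplicates
    # One masker per fourth point, offset_hz above it (offset_hz <= second preset threshold).
    fifth = [f + offset_hz for f in fourth]
    return fourth, fifth
```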
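For the specific implementation process, refer also to the foregoing embodiments; details are not repeated here.
For example, if there are multiple first frequency points and the difference between the highest-frequency and lowest-frequency first frequency points is greater than or equal to the first preset frequency threshold, then in each frequency point interval a second preset number of first frequency points are selected in descending order of power spectral density value and determined as sixth frequency points. The end frequency point and the start frequency point of each interval differ by no more than a third preset frequency threshold, and each interval contains at least a third preset number of first frequency points. The second preset number may be set according to actual needs, for example, 3 or 5. The third preset frequency threshold may be set according to actual needs, for example, 50 Hz or 100 Hz. The third preset number may be set according to actual needs, for example, 8 or 10. This is not limited in this application.
Then, the seventh frequency points corresponding to each frequency point interval are determined, each located between two adjacent sixth frequency points in the corresponding interval. That is, in each interval, one frequency point is selected between every two adjacent sixth frequency points and determined as a seventh frequency point of that interval. Optionally, any frequency point between two adjacent sixth frequency points may be selected as the seventh frequency point. Optionally, the frequency point midway between two adjacent sixth frequency points may be used as the seventh frequency point. This is not limited in this application.
After that, the amplitude corresponding to each seventh frequency point may be determined according to the preset human-ear masking effect curve; for details of the amplitude, refer to the foregoing embodiments, which are not repeated here. Finally, the masking signal may be generated based on the frequency values of the seventh frequency points and their corresponding amplitudes. The masking signal includes, for each seventh frequency point, a signal whose frequency is that point's frequency value and whose amplitude is its corresponding amplitude.
For example, as shown in FIG. 12, there are 10 first frequency points: f7, f8, f9, f10, and f11 are concentrated in one frequency point interval (for example, interval 1), and f12, f13, f14, f15, and f16 in another (for example, interval 2). In interval 1, f8, f9, and f10 may be selected in descending order of power spectral density value and determined as sixth frequency points; in interval 2, f13, f15, and f16 may be selected in the same way and determined as sixth frequency points.
Then, as shown in FIG. 13, in interval 1 a frequency point fa6 is selected between sixth frequency points f8 and f9 and a frequency point fa7 between f9 and f10, both determined as seventh frequency points. In interval 2, a frequency point fa8 is selected between sixth frequency points f13 and f15 and a frequency point fa9 between f15 and f16, both determined as seventh frequency points.
After that, the preset frequency values equal to the frequency values of seventh frequency points fa6, fa7, fa8, and fa9 may be found among the pre-stored preset frequency values, together with their preset amplitudes, which gives the amplitudes of fa6, fa7, fa8, and fa9 and, from these, the masking signal.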
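The text does not say how the frequency point intervals are formed. The sketch below assumes a greedy grouping over the sorted first points: an interval starts at the lowest unassigned point, grows while it stays within `interval_width_hz` (the third preset frequency threshold) of its start, and is kept only if it holds at least `min_points` points (the third preset number). All defaults are illustrative.

```python
def seventh_frequency_points(first_points, psd_of, spread_threshold_hz=1000.0,
                             interval_width_hz=100.0, min_points=5, second_preset_number=3):
    """Clustered branch: group first points into narrow intervals, then mask inside each one."""
    if len(first_points) < 2 or max(first_points) - min(first_points) < spread_threshold_hz:
        return None
    pts = sorted(first_points)
    intervals, i = [], 0
    while i < len(pts):
        j = i
        # Greedily extend the interval while it stays within interval_width_hz of its start.
        while j + 1 < len(pts) and pts[j + 1] - pts[i] <= interval_width_hz:
            j += 1
        if j - i + 1 >= min_points:
            intervals.append(pts[i:j + 1])
        i = j + 1
    result = []
    for group in intervals:
        top = sorted(group, key=lambda f: psd_of[f], reverse=True)[:second_preset_number]
        sixth = sorted(top)                                          # sixth points, by frequency
        seventh = [(lo + hi) / 2 for lo, hi in zip(sixth, sixth[1:])]  # midpoint option
        result.append((sixth, seventh))
    return result
```

The test data loosely mimic FIG. 12 and FIG. 13: two clusters of five points each, of which the three strongest per cluster become sixth points.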
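For the specific implementation process, refer also to the foregoing embodiments; details are not repeated here.
For example, if there is only one first frequency point, one frequency point on each side of it may be selected and determined as eighth frequency points. In specific implementation, each eighth frequency point may differ from the first frequency point by a fourth preset frequency threshold, which may be set according to actual needs, for example, 5 Hz, 10 Hz, or 20 Hz.
Then, the amplitudes corresponding to the eighth frequency points may be determined according to the preset human-ear masking effect curve; for details of the amplitude, refer to the foregoing embodiments, which are not repeated here.
After that, the masking signal may be generated based on the frequency values of the eighth frequency points and their corresponding amplitudes. The masking signal includes, for each eighth frequency point, a signal whose frequency is that point's frequency value and whose amplitude is its corresponding amplitude.
For example, as shown in FIG. 14, there is only one first frequency point, f17; frequency points f18 and f19 may be selected on either side of f17 and determined as eighth frequency points.
Then, the preset frequency values equal to the frequency values of eighth frequency points f18 and f19 may be found among the pre-stored preset frequency values, together with their preset amplitudes, which gives the amplitudes of f18 and f19 and, from these, the masking signal.
For the specific implementation process, refer also to the foregoing embodiments; details are not repeated here.
For example, if there is only one first frequency point, one frequency point on each side of it may be selected and determined as ninth frequency points. In specific implementation, each ninth frequency point may differ from the first frequency point by a fifth preset frequency threshold, which may be set according to actual needs, for example, 5 Hz, 10 Hz, or 20 Hz.
Then, a frequency point is selected between each ninth frequency point and the first frequency point and determined as a tenth frequency point; that is, one tenth frequency point lies between each ninth frequency point and the first frequency point.
After that, the amplitudes corresponding to the tenth frequency points may be determined according to the preset human-ear masking effect curve; for details of the amplitude, refer to the foregoing embodiments, which are not repeated here. Finally, the masking signal may be generated based on the frequency values of the tenth frequency points and their corresponding amplitudes. The masking signal includes, for each tenth frequency point, a signal whose frequency is that point's frequency value and whose amplitude is its corresponding amplitude.
For example, as shown in FIG. 15, there is only one first frequency point, f20; frequency points f21 and f22 may be selected on either side of f20 and determined as ninth frequency points.
Then, as shown in FIG. 16, one frequency point may be selected between ninth frequency point f21 and first frequency point f20, and another between ninth frequency point f22 and f20; these points, fa10 and fa11, are determined as tenth frequency points.
After that, the preset frequency values equal to the frequency values of tenth frequency points fa10 and fa11 may be found among the pre-stored preset frequency values, together with their preset amplitudes, which gives the amplitudes of fa10 and fa11 and, from these, the masking signal.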
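Both single-peak embodiments (eighth points, and ninth plus tenth points) fit in one small helper. Placing the points symmetrically and taking the tenth points as midpoints are assumptions; the text only requires one point on each side of the peak and one point between each ninth point and the peak.

```python
def single_point_maskers(first_point_hz, offset_hz=20.0, use_midpoints=False):
    """Single-peak branches.

    Without midpoints: the eighth frequency points, offset_hz below and above the peak.
    With midpoints: treat those as ninth points and return the tenth points,
    midway between each ninth point and the peak.
    """
    outer = [first_point_hz - offset_hz, first_point_hz + offset_hz]
    if not use_midpoints:
        return outer
    return [(first_point_hz + f) / 2 for f in outer]
```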
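For the specific implementation process, refer also to the foregoing embodiments; details are not repeated here.
As shown in FIG. 17, generating the pink noise signal based on the first power spectral density may be implemented through the following steps:
Step S501: Determine a second average power.
The second average power is the average of the power spectral density values at all eleventh frequency points. The eleventh frequency points are those frequency points, among all frequency points of the first power spectral density, whose power spectral density values are less than or equal to the first average power. The first average power is the average of the power spectral density values at all frequency points of the first power spectral density.
Step S502: Obtain a preset pink noise band-pass filter gain corresponding to the second average power.
In specific implementation, a gain rule table may be preset in the electronic device based on multiple call tests. The gain rule table stores powers (hereinafter "preset powers") and pink noise band-pass filter gains (hereinafter "preset pink noise band-pass filter gains") in correspondence.
When performing step S502, the electronic device may find, in the preset gain rule table, the preset power equal to the second average power, then find the preset pink noise band-pass filter gain stored in correspondence with that preset power, and determine it as the preset pink noise band-pass filter gain corresponding to the second average power.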
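Step S501 and the table lookup of step S502 can be sketched as below. The text describes an exact match against stored preset powers; because a measured power rarely equals a stored value exactly, the nearest-entry fallback is an added assumption, and the table contents in the test are hypothetical.

```python
def second_average_power(psd):
    """Mean PSD over the eleventh points (bins at or below the first average power)."""
    first_avg = sum(psd) / len(psd)
    # The fallback to all bins only guards against floating-point rounding when all values are equal.
    low = [p for p in psd if p <= first_avg] or list(psd)
    return sum(low) / len(low)


def preset_pink_gain(power, gain_table):
    """Exact-match lookup as described; falls back to the nearest preset power (assumption)."""
    if power in gain_table:
        return gain_table[power]
    nearest = min(gain_table, key=lambda p: abs(p - power))
    return gain_table[nearest]
```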
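Step S503: Set the gain of a first band-pass filter to the preset pink noise band-pass filter gain.
The first band-pass filter is preset in the electronic device and is used to band-pass filter the signal output by a pink noise source.
Step S504: Band-pass filter the signal output by the pink noise source through the gain-adjusted first band-pass filter to generate the pink noise signal.
It should be noted that if the first power spectral density generated in step S201 is the power spectral density of the first signal, then when step S504 is performed the bandwidth of the first band-pass filter also needs to be set to the first bandwidth. The signal output by the pink noise source is then band-pass filtered by the first band-pass filter, whose bandwidth is the first bandwidth and whose gain is the preset pink noise band-pass filter gain corresponding to the second average power, to generate the pink noise signal.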
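The text names a pink noise source and a gain-adjusted first band-pass filter without specifying either. This sketch assumes the Voss-McCartney algorithm for the pink noise source and an RBJ band-pass biquad for the first band-pass filter, with the looked-up gain expressed in dB; the band edges and all names are hypothetical.

```python
import math
import random


def pink_noise(n_samples, n_rows=16, seed=0):
    """Voss-McCartney pink noise: row k is refreshed every 2**(k+1) samples."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    running = sum(rows)
    out = []
    for i in range(1, n_samples + 1):
        k = (i & -i).bit_length() - 1      # index of the lowest set bit of the counter
        if k < n_rows:
            new = rng.uniform(-1.0, 1.0)
            running += new - rows[k]
            rows[k] = new
        # Add one white sample per output and normalise to [-1, 1].
        out.append((running + rng.uniform(-1.0, 1.0)) / (n_rows + 1))
    return out


def shaped_pink_noise(n_samples, fs, f_low, f_high, gain_db, seed=0):
    """Band-limit pink noise with an RBJ band-pass biquad and apply the looked-up gain."""
    f0 = math.sqrt(f_low * f_high)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * f0 / (f_high - f_low))   # sin(w0) / (2Q), Q = f0 / bandwidth
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0                      # b1 is 0 for this band-pass
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    g = 10 ** (gain_db / 20)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in pink_noise(n_samples, seed=seed):
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(g * y)
    return out
```

Keeping the gain outside the filter state means changing the looked-up gain scales the output without disturbing the filter, which matches the text's separation of gain adjustment (step S503) from filtering (step S504).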
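Step S302: Adjust the masking signal and the pink noise signal to the same delay.
After the masking signal and the pink noise signal are generated, a delay adjustment is applied to each of them so that both have the same delay.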
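Delay alignment amounts to adding the missing delay to whichever signal is ahead. A minimal buffer-level sketch, with delays expressed in samples (an assumption); a streaming implementation would more likely carry the padding across frames in a ring buffer.

```python
def align_delays(masker, masker_delay, pink, pink_delay):
    """Pad both buffers so they share the larger of the two delays (in samples).

    Leading zeros add the missing delay; trailing zeros equalise lengths so the
    buffers can be mixed sample by sample.
    """
    target = max(masker_delay, pink_delay)
    m = [0.0] * (target - masker_delay) + list(masker)
    p = [0.0] * (target - pink_delay) + list(pink)
    n = max(len(m), len(p))
    m += [0.0] * (n - len(m))
    p += [0.0] * (n - len(p))
    return m, p, target
```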
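Step S303: Generate the first voice signal based on the masking signal and the pink noise signal adjusted to the same delay.
In one possible implementation, after the masking signal and the pink noise signal are adjusted to the same delay, they may be summed, and the resulting signal is determined as the first voice signal.
In another possible implementation, after the masking signal and the pink noise signal are adjusted to the same delay, their weighted sum may be computed, and the resulting signal is determined as the first voice signal. The weights of the masking signal and the pink noise signal may be determined in advance through call tests.
In another possible implementation, after the masking signal and the pink noise signal are adjusted to the same delay, a gain adjustment may also be applied to each of them. The gain for the masking signal and the gain for the pink noise signal may be the same or different; both may be determined through call tests and preset in the electronic device, from which they are retrieved when the gain adjustment is performed.
Then, the gain-adjusted masking signal and pink noise signal are summed or weighted-summed, and the resulting signal is determined as the first voice signal. In this way, suitable masking and pink noise signals can be determined in advance through call tests, so that the generated first voice signal masks the downlink voice signal more effectively. This ensures that the user hears the call content clearly while preventing other people nearby from hearing it, for a better user experience.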
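The three variants of step S303 (plain sum, weighted sum, per-signal gain followed by a sum) collapse into one expression. In this sketch the weights and dB gains are placeholders for the call-test values the text mentions; the function name is hypothetical.

```python
def mix_interference(masker, pink, w_masker=1.0, w_pink=1.0, g_masker_db=0.0, g_pink_db=0.0):
    """First voice signal = w_m * g_m * masker + w_p * g_p * pink.

    Inputs must already be delay-aligned and of equal length. With the defaults
    this is the plain sum.
    """
    if len(masker) != len(pink):
        raise ValueError("align the masking and pink noise signals first")
    km = w_masker * 10 ** (g_masker_db / 20)
    kp = w_pink * 10 ** (g_pink_db / 20)
    return [km * m + kp * p for m, p in zip(masker, pink)]
```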
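A first voice signal generated by any of the implementations shown in FIG. 6 to FIG. 17 masks the components of the downlink voice signal at the frequency points with large power spectral density values. When the first voice signal and the second voice signal are subsequently output at the same time, the sound reaching the ears of other people around the user is weaker, so they cannot hear the call content clearly. This protects the user's privacy well and improves the user experience.
For example, as shown in FIG. 18, generating the first voice signal based on the first power spectral density may also be implemented through the following steps:
Step S601: Determine the first average power based on the first power spectral density.
The first average power is the average of the power spectral density values at all frequency points of the first power spectral density.
Step S602: Determine twelfth frequency points.
The twelfth frequency points are those frequency points, among all frequency points of the first power spectral density, whose power spectral density values are greater than the first average power.
Step S603: Select a fourth preset number of twelfth frequency points from all twelfth frequency points in descending order of power spectral density value, and determine them as thirteenth frequency points.
The fourth preset number may be set according to the needs of the actual application scenario, for example, 3 or 5.
Step S604: Generate a notch filter based on the thirteenth frequency points.
The notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; that is, filtering with this notch filter removes the components at the frequencies of the thirteenth frequency points.
Step S605: Notch filter the signal output by the pink noise source through the notch filter to generate the first voice signal.
It should be noted that if the first power spectral density generated in step S201 is the power spectral density of the downlink voice signal, then in step S605 the signal output by the pink noise source is notch filtered by the notch filter whose notch frequencies include the frequency values of the thirteenth frequency points, and the notch-filtered signal is determined as the first voice signal.
If the first power spectral density generated in step S201 is the power spectral density of the first signal, then in step S605 the bandwidth of the notch filter also needs to be set to the first bandwidth. The signal output by the pink noise source is then notch filtered by the notch filter whose bandwidth is the first bandwidth and whose notch frequencies include the frequency values of the thirteenth frequency points, and the notch-filtered signal is determined as the first voice signal.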
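Steps S601 to S605 can be sketched as a selection of the strongest above-average bins followed by a cascade of notch filters. The RBJ notch biquad and the quality factor `q = 10` are assumptions; the text states only that the notch frequencies include the thirteenth frequency points. In use, the input would be the pink noise source output; the test uses pure tones so the effect is easy to check.

```python
import math


def thirteenth_frequency_points(freqs, psd, fourth_preset_number=3):
    """Bins above the first average power (twelfth points), strongest first, top-N kept."""
    first_avg = sum(psd) / len(psd)
    twelfth = [(p, f) for f, p in zip(freqs, psd) if p > first_avg]
    twelfth.sort(reverse=True)                 # descending PSD
    return [f for _, f in twelfth[:fourth_preset_number]]


def notch_filter(x, notch_freqs, fs, q=10.0):
    """Cascade of RBJ notch biquads, one per thirteenth frequency point."""
    y = list(x)
    for f0 in notch_freqs:
        w0 = 2 * math.pi * f0 / fs
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        b0, b1, b2 = 1 / a0, -2 * math.cos(w0) / a0, 1 / a0   # zeros on the unit circle at f0
        a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for xn in y:
            yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, xn, y1, yn
            out.append(yn)
        y = out
    return y
```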
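A first voice signal generated by the implementation shown in FIG. 18 masks most of the downlink voice signal apart from the components at the frequency points with large power spectral density values. When the first voice signal and the second voice signal are subsequently output at the same time, the sound reaching the ears of other people around the user lacks the information at most frequency points, so they cannot hear the call content clearly. This likewise protects the user's privacy well and improves the user experience.
In some other optional embodiments, after calculating the first power spectral density, the electronic device may further determine whether the difference between the largest power spectral density value of the first power spectral density and the first average power is greater than or equal to a preset power spectral density threshold. If it is, the first voice signal is generated through the implementation shown in FIG. 18; if it is less than the threshold, the first voice signal is generated through any of the implementations shown in FIG. 6 to FIG. 17. The preset power spectral density threshold may be set according to the needs of the actual scenario.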
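The selection rule compares the dominant peak with the first average power. A one-function sketch, assuming the PSD values and the threshold share the same unit; the returned labels are hypothetical.

```python
def choose_generation_mode(psd, threshold):
    """'notch' selects the FIG. 18 path, 'masker' the FIG. 6 to FIG. 17 path."""
    first_avg = sum(psd) / len(psd)
    return "notch" if max(psd) - first_avg >= threshold else "masker"
```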
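In this way, as the electronic device acquires downlink voice signals in real time and continuously, it can switch, in real time and according to the properties of each downlink voice signal, the implementation used to generate the first voice signal. A more suitable method can thus be used to generate a more suitable interference signal, ensuring that the user hears the call content clearly while preventing others nearby from hearing it, for a better user experience.
In some other optional embodiments, the first voice signal may also be generated from the first power spectral density as follows: the electronic device generates one first voice signal through the implementation shown in FIG. 18 (referred to below in this embodiment as a fourth voice signal) and another through any of the implementations shown in FIG. 6 to FIG. 17 (referred to below in this embodiment as a fifth voice signal). The fourth and fifth voice signals are then summed or weighted-summed, and the result is determined as the interference signal. This interference signal may then be used as the first voice signal in step S202.
It should be noted that, in some other optional embodiments, on the basis of any of the foregoing embodiments, attribute information of the downlink voice signal, such as its signal-to-noise ratio and/or gain, may be adjusted before the first voice signal is generated from it, and the first voice signal is then generated from the adjusted downlink voice signal. This is not limited in this application.
Step S102: Generate a second voice signal.
The second voice signal is a voice signal obtained by applying delay processing to the downlink voice signal so that it has the same delay as the first voice signal.
It should be noted that, in some optional embodiments, other attribute information of the downlink voice signal, such as its signal-to-noise ratio and/or gain, may be adjusted before the delay processing. Delay processing is then applied to the adjusted downlink voice signal to obtain a voice signal with the same delay as the first voice signal, and this voice signal is determined as the second voice signal.
It should further be noted that the way the attribute information of the downlink voice signal is adjusted when generating the first voice signal may be the same as or different from the way it is adjusted when generating the second voice signal; this is not limited in this application.
Step S103: At the same output time, output the second voice signal through the first sound-emitting component and output the first voice signal through the second sound-emitting component.
When the second voice signal is output through the first sound-emitting component and the first voice signal through the second sound-emitting component at the same output time, both are converted into sound and emitted. In this way, the first voice signal masks the second voice signal; that is, the interference signal derived from the downlink voice signal masks the downlink voice signal.
Because the first sound-emitting component is disposed at the first position of the electronic device, which during a handheld call is close to or even directly facing the user's ear, while the second sound-emitting component is disposed at a second position away from the user's ear, the sound entering the user's ear remains loud enough, when the user turns up the volume, for the call content to be heard clearly even though the first voice signal masks the second voice signal. For other people around the user, however, both the sound from the first sound-emitting component and the sound from the second sound-emitting component originate far from their ears; with the first voice signal masking the second voice signal, the sound reaching their ears is weak and they cannot hear the call content clearly. This protects the privacy of the user's call and improves the user experience.
In some optional embodiments, the electronic device may further be provided with a third sound-emitting component disposed at a third position close to the first position. For example, the first sound-emitting component may be a ceramic-driven screen sound-emitting component, the second sound-emitting component may be a speaker at the bottom of the electronic device, and the third sound-emitting component may be a speaker at the top of the electronic device. In this scenario, the voice signal output method provided in this application may further include: generating a third voice signal, which is a voice signal obtained by applying delay processing to the downlink voice signal so that it has the same delay as the first voice signal; and outputting the third voice signal through the third sound-emitting component at the same output time as the first voice signal and the second voice signal. In this way, the sound from the third sound-emitting component supplements the sound from the first sound-emitting component, making the sound heard by the user clearer and improving the user experience.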
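It can be seen that, in the voice signal output method provided in the embodiments of this application, the downlink voice signal is masked by an interference signal generated from it. When the user holds the electronic device for a call, the sound entering the user's ear is loud enough for the user to hear the call content clearly. For other people nearby, because of the masking effect of the interference signal on the downlink voice signal, the sound reaching their ears is weak and its information incomplete, so they cannot hear the user's call content clearly. This protects the user's privacy well and improves the user experience.
The method embodiments described herein may be independent solutions or may be combined according to their internal logic; all of these solutions fall within the protection scope of this application.
The foregoing embodiments describe the voice signal output method provided in this application. The following describes embodiments of the electronic device provided in this application. It can be understood that, to implement the foregoing functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
In the embodiments of this application, the electronic device may be divided into functional modules based on the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division into modules in the embodiments of this application is illustrative and is merely a logical functional division; other divisions are possible in actual implementation.
The method provided in the embodiments of this application has been described in detail above with reference to FIG. 3 to FIG. 18. The electronic device provided in the embodiments of this application is described in detail below with reference to FIG. 19. It should be understood that the description of the electronic device embodiments corresponds to that of the method embodiments; therefore, for content not described in detail, refer to the foregoing method embodiments. For brevity, details are not repeated here.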
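Refer to FIG. 19, which is a structural block diagram of an electronic device according to an embodiment of this application. As shown in FIG. 19, the electronic device 1900 includes a first sound-emitting component 1901 and a second sound-emitting component 1902. The first sound-emitting component 1901 is disposed at a first position of the electronic device 1900, which is close to the user's ear when the user holds the electronic device for a call; the second sound-emitting component 1902 is disposed at a second position different from the first position. The electronic device 1900 further includes a memory 1903 and a processor 1904 coupled to each other. The memory 1903 is configured to store computer program code, which includes computer instructions; when the processor 1904 executes the computer instructions, the electronic device 1900 is caused to perform the voice signal output method described in any one of the embodiments of FIG. 3 to FIG. 18. The electronic device 1900 may perform the operations of the foregoing method embodiments.
For example, in an optional embodiment of this application, the processor 1904 may be configured to: generate a first voice signal, where the first voice signal is an interference signal generated from a downlink voice signal; generate a second voice signal, where the second voice signal is a voice signal obtained by applying delay processing to the downlink voice signal so that it has the same delay as the first voice signal; and, at the same output time, output the second voice signal through the first sound-emitting component and output the first voice signal through the second sound-emitting component.
In one possible implementation, that the processor 1904 is configured to generate the first voice signal specifically means that the processor 1904 is configured to: generate a first power spectral density, where the first power spectral density is a power spectral density calculated from the downlink voice signal; and generate the first voice signal based on the first power spectral density.
In one possible implementation, that the processor 1904 is configured to generate the first voice signal based on the first power spectral density specifically means that the processor 1904 is configured to: generate a masking signal and a pink noise signal based on the first power spectral density; adjust the masking signal and the pink noise signal to the same delay; and generate the first voice signal based on the masking signal and the pink noise signal adjusted to the same delay.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first power spectral density specifically means that the processor 1904 is configured to: determine a first average power based on the first power spectral density, where the first average power is the average of the power spectral density values at all frequency points of the first power spectral density; determine first frequency points, which are those frequency points of the first power spectral density whose power spectral density values are greater than the first average power; and generate the masking signal based on the first frequency points.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first frequency points specifically means that the processor 1904 is configured to: if there are multiple first frequency points and the difference between the highest-frequency and lowest-frequency first frequency points is less than a first preset frequency threshold, select a first preset number of first frequency points from all first frequency points in descending order of power spectral density value and determine them as second frequency points; determine third frequency points, each located between two adjacent second frequency points; determine, according to a preset human-ear masking effect curve, the amplitudes corresponding to the third frequency points, where an amplitude characterizes the strength of a signal; and generate the masking signal based on the frequency values of the third frequency points and their corresponding amplitudes.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first frequency points specifically means that the processor 1904 is configured to: if there are multiple first frequency points and the difference between the highest-frequency and lowest-frequency first frequency points is greater than or equal to the first preset frequency threshold, select from all first frequency points the first frequency point with the largest power spectral density value, the first frequency point with the highest frequency, and the first frequency point with the lowest frequency, and determine them as fourth frequency points; select, near each fourth frequency point, a frequency point whose frequency differs from that fourth frequency point by no more than a second preset frequency threshold, and determine it as a fifth frequency point; determine, according to the preset human-ear masking effect curve, the amplitudes corresponding to the fifth frequency points, where an amplitude characterizes the strength of a signal; and generate the masking signal based on the frequency values of the fifth frequency points and their corresponding amplitudes.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first frequency points specifically means that the processor 1904 is configured to: if there are multiple first frequency points and the difference between the highest-frequency and lowest-frequency first frequency points is greater than or equal to the first preset frequency threshold, select, in each frequency point interval, a second preset number of first frequency points in descending order of power spectral density value and determine them as sixth frequency points, where the end frequency point and the start frequency point of each frequency point interval differ by no more than a third preset frequency threshold, and each frequency point interval contains at least a third preset number of first frequency points; determine the seventh frequency points corresponding to each frequency point interval, each located between two adjacent sixth frequency points in the corresponding interval; determine, according to the preset human-ear masking effect curve, the amplitudes corresponding to the seventh frequency points, where an amplitude characterizes the strength of a signal; and generate the masking signal based on the frequency values of the seventh frequency points and their corresponding amplitudes.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first frequency points specifically means that the processor 1904 is configured to: if there is only one first frequency point, select one frequency point on each side of it and determine them as eighth frequency points; determine, according to the preset human-ear masking effect curve, the amplitudes corresponding to the eighth frequency points, where an amplitude characterizes the strength of a signal; and generate the masking signal based on the frequency values of the eighth frequency points and their corresponding amplitudes.
In one possible implementation, that the processor 1904 is configured to generate the masking signal based on the first frequency points specifically means that the processor 1904 is configured to: if there is only one first frequency point, select one frequency point on each side of it and determine them as ninth frequency points; select a frequency point between each ninth frequency point and the first frequency point and determine it as a tenth frequency point; determine, according to the preset human-ear masking effect curve, the amplitudes corresponding to the tenth frequency points, where an amplitude characterizes the strength of a signal; and generate the masking signal based on the frequency values of the tenth frequency points and their corresponding amplitudes.
In one possible implementation, that the processor 1904 is configured to generate the pink noise signal based on the first power spectral density specifically means that the processor 1904 is configured to: determine a second average power, where the second average power is the average of the power spectral density values at all eleventh frequency points, the eleventh frequency points are those frequency points of the first power spectral density whose power spectral density values are less than or equal to a first average power, and the first average power is the average of the power spectral density values at all frequency points of the first power spectral density; obtain a preset pink noise band-pass filter gain corresponding to the second average power; set the gain of a first band-pass filter to the preset pink noise band-pass filter gain; and band-pass filter, through the gain-adjusted first band-pass filter, the signal output by a pink noise source to generate the pink noise signal.
In one possible implementation, that the processor 1904 is configured to generate the first voice signal based on the first power spectral density specifically means that the processor 1904 is configured to: determine a first average power based on the first power spectral density, where the first average power is the average of the power spectral density values at all frequency points of the first power spectral density; determine twelfth frequency points, which are those frequency points of the first power spectral density whose power spectral density values are greater than the first average power; select a fourth preset number of twelfth frequency points from all twelfth frequency points in descending order of power spectral density value and determine them as thirteenth frequency points; generate a notch filter based on the thirteenth frequency points, where the notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; and notch filter, through the notch filter, the signal output by a pink noise source to generate the first voice signal.
In one possible implementation, that the processor 1904 is configured to generate the first power spectral density specifically means that the processor 1904 is configured to: band-pass filter the downlink voice signal through a second band-pass filter to obtain a first signal within a first bandwidth, where the first bandwidth is the bandwidth of the second band-pass filter; calculate the power spectral density of the first signal; and determine the power spectral density of the first signal as the first power spectral density.
In one possible implementation, the electronic device 1900 further includes a third sound-emitting component disposed at a third position close to the first position, and the processor 1904 is further configured to: generate a third voice signal, which is a voice signal obtained by applying delay processing to the downlink voice signal so that it has the same delay as the first voice signal; and, at the same output time, output the third voice signal through the third sound-emitting component.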
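In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
It should be noted that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
It can be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
According to the method provided in the embodiments of this application, an embodiment of this application further provides a computer program product. The computer program product includes a computer program or instructions that, when run on a computer, cause the computer to perform the method of any one of the method embodiments.
According to the method provided in the embodiments of this application, an embodiment of this application further provides a computer storage medium storing a computer program or instructions that, when run on a computer, cause the computer to perform the method of any one of the method embodiments.
A person of ordinary skill in the art may be aware that the various illustrative logical blocks and steps described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the electronic device and modules described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the electronic device embodiments described above are merely illustrative; the division into modules is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each module may exist physically alone, or two or more modules may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The electronic device, computer storage medium, and computer program product provided in the embodiments of this application are all used to perform the method provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above. Details are not repeated here.
It should be understood that, in the embodiments of this application, the execution order of the steps should be determined by their functions and internal logic; the sequence numbers of the steps do not imply an execution order and do not constitute any limitation on the implementation of the embodiments.
The parts of this specification are described in a progressive manner; for identical or similar parts of the embodiments, refer to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the electronic device, the computer storage medium, and the computer program product are described relatively briefly because they are basically similar to the method embodiments; for related parts, refer to the descriptions in the method embodiments.
Although preferred embodiments of this application have been described, a person skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of this application.
The foregoing implementations of this application do not constitute a limitation on the protection scope of this application.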

Claims (15)

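  1. A voice signal output method, wherein the method is applied to an electronic device, the electronic device comprises a first sound-emitting component and a second sound-emitting component, the first sound-emitting component is disposed at a first position of the electronic device, the first position is close to an ear of a user when the user holds the electronic device for a call, the second sound-emitting component is disposed at a second position different from the first position, and the method comprises:
    generating a first voice signal, wherein the first voice signal is an interference signal generated from a downlink voice signal;
    generating a second voice signal, wherein the second voice signal is a voice signal that is obtained by performing delay processing on the downlink voice signal and that has the same delay as the first voice signal; and
    at the same output time, outputting the second voice signal through the first sound-emitting component and outputting the first voice signal through the second sound-emitting component.
  2. The method according to claim 1, wherein the generating a first voice signal comprises:
    generating a first power spectral density, wherein the first power spectral density is a power spectral density calculated from the downlink voice signal; and
    generating the first voice signal based on the first power spectral density.
  3. The method according to claim 2, wherein the generating the first voice signal based on the first power spectral density comprises:
    generating a masking signal and a pink noise signal based on the first power spectral density;
    adjusting the masking signal and the pink noise signal to the same delay; and
    generating the first voice signal based on the masking signal and the pink noise signal adjusted to the same delay.
  4. The method according to claim 3, wherein the generating a masking signal based on the first power spectral density comprises:
    determining a first average power based on the first power spectral density, wherein the first average power is the average of the power spectral density values at all frequency points of the first power spectral density;
    determining first frequency points, wherein the first frequency points are frequency points, among all the frequency points of the first power spectral density, whose power spectral density values are greater than the first average power; and
    generating the masking signal based on the first frequency points.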
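  5. The method according to claim 4, wherein the generating the masking signal based on the first frequency points comprises:
    if there are a plurality of first frequency points and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value among all the first frequency points is less than a first preset frequency threshold, selecting a first preset number of first frequency points from all the first frequency points in descending order of power spectral density value, and determining them as second frequency points;
    determining third frequency points, wherein each third frequency point is located between two adjacent second frequency points;
    determining, according to a preset human-ear masking effect curve, amplitudes corresponding to the third frequency points, wherein an amplitude characterizes the strength of a signal; and
    generating the masking signal based on the frequency values of the third frequency points and the amplitudes corresponding to the third frequency points.
  6. The method according to claim 4, wherein the generating the masking signal based on the first frequency points comprises:
    if there are a plurality of first frequency points and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value among all the first frequency points is greater than or equal to a first preset frequency threshold, selecting, from all the first frequency points, the first frequency point with the largest power spectral density value, the first frequency point with the largest frequency value, and the first frequency point with the smallest frequency value, and determining them as fourth frequency points;
    selecting, near each fourth frequency point, a frequency point whose frequency value differs from that of the fourth frequency point by no more than a second preset frequency threshold, and determining it as a fifth frequency point;
    determining, according to a preset human-ear masking effect curve, amplitudes corresponding to the fifth frequency points, wherein an amplitude characterizes the strength of a signal; and
    generating the masking signal based on the frequency values of the fifth frequency points and the amplitudes corresponding to the fifth frequency points.
  7. The method according to claim 4, wherein the generating the masking signal based on the first frequency points comprises:
    if there are a plurality of first frequency points and the difference in frequency value between the first frequency point with the largest frequency value and the first frequency point with the smallest frequency value among all the first frequency points is greater than or equal to a first preset frequency threshold, selecting, in each frequency point interval, a second preset number of first frequency points in descending order of power spectral density value, and determining them as sixth frequency points, wherein the difference in frequency value between the end frequency point and the start frequency point of each frequency point interval is less than or equal to a third preset frequency threshold, and the number of first frequency points included in each frequency point interval is greater than or equal to a third preset number;
    determining seventh frequency points corresponding to each frequency point interval, wherein each seventh frequency point is located between two adjacent sixth frequency points in the corresponding frequency point interval;
    determining, according to a preset human-ear masking effect curve, amplitudes corresponding to the seventh frequency points, wherein an amplitude characterizes the strength of a signal; and
    generating the masking signal based on the frequency values of the seventh frequency points and the amplitudes corresponding to the seventh frequency points.
  8. The method according to claim 4, wherein the generating the masking signal based on the first frequency points comprises:
    if there is one first frequency point, selecting one frequency point on each side of the first frequency point, and determining them as eighth frequency points;
    determining, according to a preset human-ear masking effect curve, amplitudes corresponding to the eighth frequency points, wherein an amplitude characterizes the strength of a signal; and
    generating the masking signal based on the frequency values of the eighth frequency points and the amplitudes corresponding to the eighth frequency points.
  9. The method according to claim 4, wherein the generating the masking signal based on the first frequency points comprises:
    if there is one first frequency point, selecting one frequency point on each side of the first frequency point, and determining them as ninth frequency points;
    selecting one frequency point between each ninth frequency point and the first frequency point, and determining it as a tenth frequency point;
    determining, according to a preset human-ear masking effect curve, amplitudes corresponding to the tenth frequency points, wherein an amplitude characterizes the strength of a signal; and
    generating the masking signal based on the frequency values of the tenth frequency points and the amplitudes corresponding to the tenth frequency points.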
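  10. The method according to claim 3, wherein the generating a pink noise signal based on the first power spectral density comprises:
    determining a second average power, wherein the second average power is the average of the power spectral density values at all eleventh frequency points, the eleventh frequency points are frequency points, among all the frequency points of the first power spectral density, whose power spectral density values are less than or equal to a first average power, and the first average power is the average of the power spectral density values at all the frequency points of the first power spectral density;
    obtaining a preset pink noise band-pass filter gain corresponding to the second average power;
    setting the gain of a first band-pass filter to the preset pink noise band-pass filter gain; and
    band-pass filtering, through the gain-adjusted first band-pass filter, a signal output by a pink noise source, to generate the pink noise signal.
  11. The method according to claim 2, wherein the generating the first voice signal based on the first power spectral density comprises:
    determining a first average power based on the first power spectral density, wherein the first average power is the average of the power spectral density values at all frequency points of the first power spectral density;
    determining twelfth frequency points, wherein the twelfth frequency points are frequency points, among all the frequency points of the first power spectral density, whose power spectral density values are greater than the first average power;
    selecting a fourth preset number of twelfth frequency points from all the twelfth frequency points in descending order of power spectral density value, and determining them as thirteenth frequency points;
    generating a notch filter based on the thirteenth frequency points, wherein notch frequencies of the notch filter include the frequency values of the thirteenth frequency points; and
    notch filtering, through the notch filter, a signal output by a pink noise source, to generate the first voice signal.
  12. The method according to claim 2, wherein the generating a first power spectral density comprises:
    band-pass filtering the downlink voice signal through a second band-pass filter to obtain a first signal within a first bandwidth, wherein the first bandwidth is the bandwidth of the second band-pass filter;
    calculating a power spectral density of the first signal; and
    determining the power spectral density of the first signal as the first power spectral density.
  13. The method according to any one of claims 1 to 12, wherein the electronic device further comprises a third sound-emitting component, the third sound-emitting component is disposed at a third position close to the first position, and the method further comprises:
    generating a third voice signal, wherein the third voice signal is a voice signal that is obtained by performing delay processing on the downlink voice signal and that has the same delay as the first voice signal; and
    at the same output time, outputting the third voice signal through the third sound-emitting component.
  14. An electronic device, wherein the electronic device comprises a first sound-emitting component and a second sound-emitting component, the first sound-emitting component is disposed at a first position of the electronic device, the first position is close to an ear of a user when the user holds the electronic device for a call, and the second sound-emitting component is disposed at a second position different from the first position; the electronic device further comprises a memory and a processor, and the memory is coupled to the processor; the memory is configured to store computer program code, the computer program code comprises computer instructions, and when the processor executes the computer instructions, the electronic device is caused to perform the method according to any one of claims 1 to 13.
  15. A computer storage medium, wherein the computer storage medium stores a computer program or instructions, and when the computer program or instructions are executed, the method according to any one of claims 1 to 13 is performed.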
PCT/CN2023/091095 2022-08-11 2023-04-27 Voice signal output method and electronic device WO2024032035A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210960657.7A CN116320123B (zh) 2022-08-11 2022-08-11 Voice signal output method and electronic device
CN202210960657.7 2022-08-11

Publications (2)

Publication Number Publication Date
WO2024032035A1 true WO2024032035A1 (zh) 2024-02-15
WO2024032035A9 WO2024032035A9 (zh) 2024-04-18

Family

ID=86801906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091095 WO2024032035A1 (zh) 2022-08-11 2023-04-27 Voice signal output method and electronic device

Country Status (2)

Country Link
CN (1) CN116320123B (zh)
WO (1) WO2024032035A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050091040A1 (en) * 2003-01-09 2005-04-28 Nam Young H. Preprocessing of digital audio data for improving perceptual sound quality on a mobile phone
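JP2014146941A (ja) * 2013-01-29 2014-08-14 Pioneer Electronic Corp Noise reduction device, broadcast receiving device, and noise reduction method
CN113497849A (zh) * 2020-03-20 2021-10-12 Huawei Technologies Co., Ltd. Sound masking method and apparatus, and terminal device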

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
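CN104575486B (zh) * 2014-12-25 2019-04-02 Institute of Information Engineering, Chinese Academy of Sciences Sound leakage protection method and system based on the acoustic masking principle
EP3048608A1 (en) * 2015-01-20 2016-07-27 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. Speech reproduction device configured for masking reproduced speech in a masked speech zone
CN105872275B (zh) * 2016-03-22 2019-10-11 TCL Corporation Voice signal delay estimation method and system for echo cancellation
CN109727605B (zh) * 2018-12-29 2020-06-12 Suzhou AISpeech Information Technology Co., Ltd. Method and system for processing sound signals
CN113129916B (zh) * 2019-12-30 2024-04-12 Huawei Technologies Co., Ltd. Audio collection method and system, and related apparatus
CN111524498B (zh) * 2020-04-10 2023-06-16 Vivo Mobile Communication Co., Ltd. Filtering method and apparatus, and electronic device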

Non-Patent Citations (2)

Title
JIAN BAI, CAO PENG: "Capacity Estimation Research for Speech Information Hiding", JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, GAI-KAN BIANJIBU, BEIJING, CN, no. S1, 1 June 2016 (2016-06-01), CN , pages 76 - 80, XP093139834, ISSN: 1007-5321, DOI: 10.13190/j.jbupt.2016.s.018 *
O. YILMAZ, S. RICKARD: "Blind Separation of Speech Mixtures via Time-Frequency Masking", IEEE TRANSACTIONS ON SIGNAL PROCESSING, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, vol. 52, no. 7, 1 July 2004 (2004-07-01), pages 1830 - 1847, XP055150683, ISSN: 1053587X, DOI: 10.1109/TSP.2004.828896 *

Also Published As

Publication number Publication date
CN116320123B (zh) 2024-03-08
CN116320123A (zh) 2023-06-23
WO2024032035A9 (zh) 2024-04-18

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23851265

Country of ref document: EP

Kind code of ref document: A1