CN111863004A

CN111863004A - Sound signal processing method, system, electronic device and storage medium

Info

Publication number: CN111863004A
Application number: CN202010744551.4A
Authority: CN
Inventors: 马洪刚; 冯亚东
Original assignee: Shanghai Simcom Wireless Solutions Co Ltd
Current assignee: Shanghai Simcom Wireless Solutions Co Ltd
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2020-10-30

Abstract

The invention discloses a method and a system for processing a sound signal, an electronic device, and a storage medium. The processing method comprises the following steps: monitoring an external sound signal; judging whether the external sound signal meets a trigger condition; if so, raising the frequency of the sound signal to be played; and playing the sound signal to be played after the frequency is raised. The invention monitors the external sound signal in real time while a sound signal is being played. When the external sound signal meets the trigger condition, for example when the playing environment changes from quiet to noisy, the frequency of the sound signal to be played is automatically raised before playback, so that the user can hear the played content more clearly and the clarity of the played content is enhanced automatically.

Description

Sound signal processing method, system, electronic device and storage medium

Technical Field

The present invention relates to the field of audio signal processing technologies, and in particular, to a method and a system for processing an audio signal, an electronic device, and a storage medium.

Background

Currently, when the playing environment changes from quiet to noisy while a sound signal is being played (for example, during a voice call or during music playback), common ways to hear the played content more clearly include increasing the playback volume, wearing earphones, or moving to a quieter environment. None of these enhances the clarity of the played content automatically.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defect in the prior art that the clarity of played content cannot be automatically enhanced when the playing environment changes from quiet to noisy, by providing a method and a system for processing a sound signal, an electronic device, and a storage medium.

The invention solves the technical problems through the following technical scheme:

a method of processing a sound signal, comprising:

monitoring an external sound signal;

judging whether the external sound signal meets a trigger condition;

if so, raising the frequency of the sound signal to be played;

and playing the sound signal to be played after the frequency is raised.

Preferably, the external sound signal includes an external noise signal, and the step of determining whether the external sound signal satisfies a trigger condition includes:

identifying an external noise signal in the external sound signal;

judging whether the volume of the external noise is greater than a first threshold value according to the external noise signal;

if yes, determining that the external sound signal meets a trigger condition;

and/or, the external sound signal includes an external voice signal, and the step of judging whether the external sound signal satisfies the trigger condition includes:

recognizing an external voice signal in the external sound signal;

judging whether the difference between the volume of the external voice at the current sampling moment and the volume of the external voice at the previous sampling moment is larger than a second threshold value or not according to the external voice signal;

and if so, determining that the external sound signal meets the trigger condition.

Preferably, the external sound signal includes an external voice signal, and the step of determining whether the external sound signal satisfies the trigger condition includes:

recognizing an external voice signal in the external sound signal;

converting the external voice signal into text;

judging whether the text includes a trigger keyword;

and if so, determining that the external sound signal meets the trigger condition.

Preferably, the step of monitoring the external sound signal includes monitoring the external sound signal during the voice call;

the sound signal to be played comprises a voice call signal.

A system for processing a sound signal, comprising:

the monitoring module is used for monitoring external sound signals;

the judging module is used for judging whether the external sound signal meets a triggering condition;

if so, calling a lifting module, wherein the lifting module is used for lifting the frequency of the sound signal to be played;

and the playing module is used for playing the sound signal to be played after the frequency is raised.

Preferably, the external sound signal includes an external noise signal, and the determining module includes:

a first recognition unit for recognizing an external noise signal in the external sound signal;

a first judgment unit for judging whether the volume of the external noise is larger than a first threshold value according to the external noise signal;

if yes, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets a trigger condition;

and/or, the external sound signal includes an external voice signal, and the judging module includes:

a second recognition unit for recognizing an external voice signal among the external sound signals;

the second judgment unit is used for judging whether the difference between the volume of the external voice at the current sampling moment and the volume of the external voice at the previous sampling moment is larger than a second threshold value or not according to the external voice signal;

and if so, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets the triggering condition.

Preferably, the external sound signal includes an external voice signal, and the determining module includes:

a conversion unit for converting the external voice signal into a text;

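a third judging unit, configured to judge whether the text includes a trigger keyword;

if so, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets the trigger condition.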

Preferably, the monitoring module is specifically configured to monitor an external sound signal during a voice call;

the sound signal to be played comprises a voice call signal;

the playing module comprises an earphone and/or a loudspeaker.

An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements any one of the above sound signal processing methods when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of any of the above-mentioned sound signal processing methods.

The positive effects of the invention are as follows: the invention monitors the external sound signal in real time while a sound signal is being played. When the external sound signal meets the trigger condition, for example when the playing environment changes from quiet to noisy, the frequency of the sound signal to be played is automatically raised before playback, so that the user can hear the played content more clearly and the clarity of the played content is enhanced automatically.

Drawings

Fig. 1 is a flowchart of a processing method of a sound signal according to embodiment 1 of the present invention.

Fig. 2 is a detailed flowchart of a processing method of a sound signal according to embodiment 1 of the present invention.

Fig. 3 is a schematic diagram of the frequency raising in a processing method of a sound signal according to embodiment 1 of the present invention.

Fig. 4 is a block diagram of a system for processing a sound signal according to embodiment 2 of the present invention.

Fig. 5 is a schematic structural diagram of an electronic device according to embodiment 3 of the present invention.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.

Example 1

Referring to fig. 1, the processing method of the present embodiment includes:

S101, monitoring an external sound signal;

S102, judging whether the external sound signal meets a trigger condition;

if yes, go to step S103;

S103, raising the frequency of the sound signal to be played;

S104, playing the sound signal to be played after the frequency is raised.
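As a minimal sketch only, and not an implementation given in this embodiment, steps S101 to S104 may be arranged as a per-frame routine. The function name `process_frame` and its callable parameters (`meets_trigger`, `raise_frequency`, `play`) are hypothetical, and it is assumed here that the signal is played unchanged when the trigger condition is not met.

```python
from typing import Callable, Sequence

Samples = Sequence[float]


def process_frame(
    external: Samples,
    to_play: Samples,
    meets_trigger: Callable[[Samples], bool],
    raise_frequency: Callable[[Samples], Samples],
    play: Callable[[Samples], None],
) -> bool:
    """Run S101-S104 for one frame; return True if the frequency was raised."""
    # S101: `external` is the frame captured while monitoring the environment.
    # S102: judge whether the external sound signal meets the trigger condition.
    triggered = meets_trigger(external)
    if triggered:
        # S103: raise the frequency of the sound signal to be played.
        to_play = raise_frequency(to_play)
    # S104: play the signal (assumed to be played unchanged when not triggered).
    play(to_play)
    return triggered
```

Passing the three operations in as callables keeps the control flow of fig. 1 separate from any particular trigger condition or frequency raising technique.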

The processing method of this embodiment may be applicable to a voice call process, where the to-be-played sound signal may include a voice call signal, and the step S101 may specifically include monitoring an external sound signal during the voice call process. It should be understood that the processing method of the present embodiment may also be applied to the playing process of sound media such as music.

Specifically, in this embodiment, the external sound signal may include an external noise signal and an external voice signal, for example, when the sound signal to be played includes a voice call signal, the external noise signal may include a sound signal of an environment in which the voice call device is located, which is monitored by a microphone of the voice call device, and the external voice signal may include a sound signal input by a user of the voice call device through the microphone of the voice call device.

In this embodiment, the trigger condition that triggers processing of the sound signal to be played may be customized according to the actual application. For example, the trigger condition may include: the volume of the external noise being greater than a preset threshold; the volume of the external voice suddenly increasing; or the content of the external voice indicating that the user of the voice call device cannot hear the played content clearly. Different trigger conditions may be checked in parallel, or checked sequentially in a preset order; this embodiment does not limit this.
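One possible way to check several trigger conditions sequentially in a preset order is sketched below. The name `trigger_condition_met` and the `Condition` type are hypothetical; the sketch covers only the sequential case and stops at the first condition that is satisfied.

```python
from typing import Callable, Iterable, Sequence

Samples = Sequence[float]
Condition = Callable[[Samples], bool]


def trigger_condition_met(external: Samples, conditions: Iterable[Condition]) -> bool:
    """Return True as soon as one condition, tried in the given order, is satisfied."""
    # any() short-circuits, so later conditions are not evaluated once one is met.
    return any(condition(external) for condition in conditions)
```

Placing the cheapest check first (for example, a volume threshold before speech-to-text) would avoid unnecessary work, although the embodiment does not prescribe any order.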

Specifically, referring to fig. 2, step S102 in this embodiment may include:

S1021, recognizing an external noise signal in the external sound signal;

S1022, judging, according to the external noise signal, whether the volume of the external noise is greater than a first threshold;

if yes, go to step S1023;

S1023, determining that the external sound signal meets the trigger condition.

The first threshold may be customized according to the actual application; for example, the first threshold may be 50 dB.
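The comparison in step S1022 may be sketched as follows, under stated assumptions. The embodiment does not say how the volume is measured; here the level is taken as the RMS of normalized samples in decibels relative to full scale, and a device-specific `calibration_offset_db` (a hypothetical parameter) is assumed to map that value onto the scale in which the 50 dB threshold is expressed.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float], calibration_offset_db: float = 0.0) -> float:
    """RMS level in dB relative to full scale (1.0), plus an assumed offset."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms) + calibration_offset_db


def noise_exceeds_threshold(
    noise: Sequence[float],
    calibration_offset_db: float,
    first_threshold_db: float = 50.0,
) -> bool:
    """S1022: is the external noise volume greater than the first threshold?"""
    return level_db(noise, calibration_offset_db) > first_threshold_db
```

The offset has to come from calibrating the actual microphone; any value used when trying out this sketch is arbitrary.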

Step S102 may further include:

S1024, recognizing an external voice signal in the external sound signal;

S1025, judging, according to the external voice signal, whether the difference between the volume of the external voice at the current sampling moment and its volume at the previous sampling moment is greater than a second threshold;

if yes, go to step S1023.

The second threshold may be customized according to the actual application; for example, the second threshold may be 6 dB.
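The comparison in step S1025 may be sketched as a small stateful detector that remembers the volume at the previous sampling moment. The class name `VoiceJumpDetector` and its `update` method are hypothetical, and the volumes are assumed to be supplied already in decibels. For orientation, a 6 dB increase corresponds to roughly a doubling of signal amplitude.

```python
from typing import Optional


class VoiceJumpDetector:
    """Flags a sudden increase in external voice volume between two samplings."""

    def __init__(self, second_threshold_db: float = 6.0) -> None:
        self.second_threshold_db = second_threshold_db
        self._previous_db: Optional[float] = None

    def update(self, current_db: float) -> bool:
        """S1025: compare the current volume with the previous one."""
        previous_db = self._previous_db
        self._previous_db = current_db
        if previous_db is None:
            # No previous sampling moment exists yet, so nothing to compare.
            return False
        return (current_db - previous_db) > self.second_threshold_db
```

Only an increase triggers; a drop in volume yields a negative difference and is ignored.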

After step S1024, the method may further include:

S1026, converting the external voice signal into text;

S1027, judging whether the text includes a trigger keyword;

if yes, go to step S1023.

The trigger keywords may be set according to the actual application, for example, "can't hear clearly", "what", "hello", "click", and the like.
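Step S1027 may be sketched as a simple substring check on the converted text. The speech-to-text conversion of step S1026 is assumed to have happened upstream and is not shown; the function name `contains_trigger_keyword` and the keyword list in the usage note are illustrative only.

```python
from typing import Iterable


def contains_trigger_keyword(text: str, keywords: Iterable[str]) -> bool:
    """S1027: does the converted text include any trigger keyword?"""
    # casefold() makes the comparison case-insensitive.
    normalized = text.casefold()
    return any(keyword.casefold() in normalized for keyword in keywords)
```

For example, `contains_trigger_keyword("Sorry, WHAT did you say?", ["what", "can't hear"])` evaluates to `True`. Because this is plain substring matching, "what" would also match inside "whatever"; word-level matching would be a stricter alternative.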

In this embodiment, if it is determined in step S102 that the external sound signal does not satisfy the trigger condition, the frequency raising process is not performed on the sound signal to be played.

In this embodiment, referring to fig. 3, the frequency raising process raises the frequency of the sound signal to be played while leaving its volume unchanged; that is, the sound signal is shifted in the direction of increasing frequency. Because the human ear is more sensitive to sound of higher frequency, the listener can hear the content of the played sound signal more clearly after the frequency is raised, and the clarity of the played content is enhanced.
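The embodiment does not specify how the frequency is raised. One technique that matches the description of translating the signal toward higher frequency is to move every bin of the frame's discrete Fourier transform upward by a fixed number of bins; the sketch below does this with a naive transform on a short frame. The names `dft`, `raise_frequency`, `rms`, and `shift_bins` are hypothetical, and an even frame length is assumed.

```python
import cmath
import math
from typing import List, Sequence


def dft(frame: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def raise_frequency(frame: Sequence[float], shift_bins: int) -> List[float]:
    """Translate the spectrum of a real frame upward by `shift_bins` DFT bins."""
    n = len(frame)
    if n % 2 != 0 or shift_bins < 0:
        raise ValueError("expects an even frame length and a non-negative shift")
    half = n // 2
    spectrum = dft(frame)
    shifted = [0j] * n
    # Move each positive-frequency bin k to k + shift_bins. DC and the Nyquist
    # bin are left out, and bins that would cross Nyquist are discarded.
    for k in range(1, half):
        target = k + shift_bins
        if target < half:
            shifted[target] = spectrum[k]
            # Mirror into the negative frequencies so the output stays real.
            shifted[n - target] = spectrum[k].conjugate()
    # Inverse DFT; the imaginary part is only rounding error.
    return [
        (sum(shifted[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as a stand-in for volume."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))
```

Under this sketch the RMS amplitude is preserved as long as no significant bins are pushed past the Nyquist limit and discarded. A practical system would more likely use a fast Fourier transform with overlapping windows; that refinement is outside what the embodiment describes.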

Specifically, when the processing method of this embodiment is applied to a voice call, raising the frequency may reduce how well the sound penetrates, but because the user is close to the voice call device during the call, the effect of propagation distance can be ignored. During the voice call, when the trigger condition is met, that is, when it is judged that the user of the voice call device may not be able to hear clearly what the other party is saying, the voice call device raises the frequency of the sound signal to be played in real time, so that the user hears the frequency-raised signal and can clearly make out what the other party is saying.

Thus, the voice call signal input by the user of the voice call device (user A) is processed by dual-microphone noise reduction and noise/echo cancellation algorithms before being transmitted to the other party (user B), which addresses the echo and noise generated during the call so that what user A says is transmitted clearly to user B. In addition, when user A cannot clearly hear what user B is saying, the frequency of the voice call signal input by user B is raised in real time, so that user A can hear user B clearly. Together, these further improve the quality of the voice call.

Example 2

Referring to fig. 4, the processing system of this embodiment includes:

a monitoring module 201, configured to monitor an external sound signal;

the judging module 202 is configured to judge whether the external sound signal satisfies a trigger condition;

if yes, calling the lifting module 203;

a lifting module 203, configured to raise the frequency of the sound signal to be played;

a playing module 204, configured to play the sound signal to be played after the frequency is raised.

The processing system of this embodiment may be applied to a voice call device, wherein the to-be-played sound signal may include a voice call signal, and the monitoring module 201 may include a microphone of the voice call device, and may be specifically configured to monitor an external sound signal during a voice call. It should be understood that the processing system of the present embodiment may also be applied to a playing device for sound media such as music.

Specifically, in this embodiment, the external sound signal may include an external noise signal and an external voice signal, for example, when the sound signal to be played includes a voice call signal, the external noise signal may include a sound signal of an environment where the voice call device is located, which is monitored by a microphone of the voice call device, the external voice signal may include a sound signal input by a user of the voice call device through the microphone of the voice call device, and the playing module 204 may include at least one of an earphone and a speaker of the voice call device.

Specifically, referring to fig. 4, the determining module 202 in this embodiment may include:

a first identifying unit 2021 for identifying an external noise signal in the external sound signal;

a first judging unit 2022, configured to judge whether the volume of the external noise is greater than a first threshold according to the external noise signal;

if yes, the determining unit 2023 is invoked;

a determining unit 2023, configured to determine that the external sound signal satisfies the trigger condition.

The determining module 202 may further include:

a second recognition unit 2024 for recognizing an external voice signal in the external sound signal;

a second determining unit 2025, configured to determine, according to the external voice signal, whether a difference between a volume of the external voice at the current sampling time and a volume of the external voice at the previous sampling time is greater than a second threshold;

if so, the determining unit 2023 is invoked.

The determining module 202 may further include:

a conversion unit 2026 for converting the external voice signal into text;

a third judging unit 2027, configured to judge whether the text includes a trigger keyword;

if so, the determining unit 2023 is invoked.

In this embodiment, if the determining module 202 determines that the external sound signal does not satisfy the trigger condition, the raising module 203 is not called to perform the frequency raising process on the sound signal to be played.

Specifically, when the processing system of this embodiment is applied to a voice call device, raising the frequency may reduce how well the sound penetrates, but because the user is close to the voice call device during the call, the effect of propagation distance can be ignored. During the voice call, when the trigger condition is met, that is, when it is judged that the user of the voice call device may not be able to hear clearly what the other party is saying, the voice call device raises the frequency of the sound signal to be played in real time, so that the user hears the frequency-raised signal and can clearly make out what the other party is saying.

Example 3

The present embodiment provides an electronic device, which may be represented in the form of a computing device (for example, may be a server device), and includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for processing the sound signal provided in embodiment 1.

Fig. 5 shows a schematic diagram of a hardware structure of the present embodiment, and as shown in fig. 5, the electronic device 9 specifically includes:

at least one processor 91, at least one memory 92, and a bus 93 for connecting the various system components (including the processor 91 and the memory 92), wherein:

the bus 93 includes a data bus, an address bus, and a control bus.

Memory 92 includes volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and can further include Read Only Memory (ROM) 923.

Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The processor 91 executes various functional applications and data processing, such as a processing method of a sound signal provided in embodiment 1 of the present invention, by running the computer program stored in the memory 92.

The electronic device 9 may further communicate with one or more external devices 94 (e.g., a keyboard, a pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 via the bus 93. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.

It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.

Example 4

The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the processing method of sound signals provided in embodiment 1.

More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a possible implementation, the invention may also be implemented in the form of a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to carry out the steps of the method for processing a sound signal described in embodiment 1.

The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on a remote device.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

1. A method for processing a sound signal, comprising:

monitoring an external sound signal;

judging whether the external sound signal meets a trigger condition;

if so, raising the frequency of the sound signal to be played;

and playing the sound signal to be played after the frequency is raised.

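2. The method for processing the sound signal according to claim 1, wherein the external sound signal includes an external noise signal, and the step of determining whether the external sound signal satisfies a trigger condition includes:

identifying an external noise signal in the external sound signal;

judging whether the volume of the external noise is greater than a first threshold value according to the external noise signal;

if yes, determining that the external sound signal meets a trigger condition;

and/or, the external sound signal includes an external voice signal, and the step of judging whether the external sound signal satisfies the trigger condition includes:

recognizing an external voice signal in the external sound signal;

judging whether the difference between the volume of the external voice at the current sampling moment and the volume of the external voice at the previous sampling moment is larger than a second threshold value or not according to the external voice signal;

and if so, determining that the external sound signal meets the trigger condition.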

3. The method for processing the sound signal according to claim 1, wherein the external sound signal includes an external voice signal, and the step of determining whether the external sound signal satisfies a trigger condition includes:

recognizing an external voice signal in the external sound signal;

converting the external voice signal into text;

judging whether the text includes a trigger keyword;

and if so, determining that the external sound signal meets the trigger condition.

4. The method for processing the audio signal according to claim 1, wherein the step of monitoring the external audio signal includes monitoring the external audio signal during a voice call;

the sound signal to be played comprises a voice call signal.

5. A system for processing a sound signal, comprising:

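the monitoring module is used for monitoring external sound signals;

the judging module is used for judging whether the external sound signal meets a triggering condition;

if so, calling a lifting module, wherein the lifting module is used for lifting the frequency of the sound signal to be played;

and the playing module is used for playing the sound signal to be played after the frequency is raised.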

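6. The sound signal processing system according to claim 5, wherein the external sound signal includes an external noise signal, and the determining module includes:

a first recognition unit for recognizing an external noise signal in the external sound signal;

a first judgment unit for judging whether the volume of the external noise is larger than a first threshold value according to the external noise signal;

if yes, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets a trigger condition;

and/or, the external sound signal includes an external voice signal, and the judging module includes:

a second recognition unit for recognizing an external voice signal among the external sound signals;

the second judgment unit is used for judging whether the difference between the volume of the external voice at the current sampling moment and the volume of the external voice at the previous sampling moment is larger than a second threshold value or not according to the external voice signal;

and if so, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets the triggering condition.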

7. The sound signal processing system according to claim 5, wherein the external sound signal includes an external voice signal, and the determining module includes:

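a conversion unit for converting the external voice signal into a text;

a third judging unit, configured to judge whether the text includes a trigger keyword;

if so, calling a determining unit, wherein the determining unit is used for determining that the external sound signal meets the trigger condition.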

8. The system for processing the sound signal according to claim 5, wherein the monitoring module is specifically configured to monitor an external sound signal during a voice call;

the sound signal to be played comprises a voice call signal;

the playing module comprises an earphone and/or a loudspeaker.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of processing a sound signal according to any one of claims 1 to 4 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of processing a sound signal according to any one of claims 1 to 4.