
CN112216302A - Audio signal processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN112216302A
Application number: CN202010941131.5A
Authority: CN
Inventors: 周德海
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2021-01-12
Anticipated expiration: 2040-09-09
Also published as: CN112216302B

Abstract

The application provides an audio signal processing method, an audio signal processing device, an electronic device and a readable storage medium, wherein the audio signal processing method comprises the following steps: acquiring an input audio signal and user interaction information corresponding to the audio signal; determining a target user usage scene corresponding to the audio signal according to the user interaction information; and determining a target adjusting parameter value of the audio signal according to the target user using scene, and performing corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal. By adopting the method, the error identification condition of the audio signal can be reduced.

Description

Audio signal processing method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of signal processing technologies, and in particular, to an audio signal processing method and apparatus, an electronic device, and a computer-readable storage medium.

Background

When a user performs voice input, the electronic device generally needs to perform enhancement processing on the audio signal. If the user triggers voice input but does not speak while other people are chatting nearby, the electronic device may recognize the other people's conversation, which the user does not actually want recognized. Traditional audio signal processing methods therefore suffer from false recognition.

Disclosure of Invention

The embodiment of the application provides an audio signal processing method, an audio signal processing device, electronic equipment and a computer readable storage medium, which can reduce the error identification condition of an audio signal.

An audio signal processing method applied to an electronic device comprises the following steps:

acquiring an input audio signal and user interaction information corresponding to the audio signal;

determining a target user usage scene corresponding to the audio signal according to the user interaction information;

and determining a target adjusting parameter value of the audio signal according to the target user using scene, and performing corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

An audio signal processing apparatus comprising:

the information acquisition module is used for acquiring an input audio signal and user interaction information corresponding to the audio signal;

the scene determining module is used for determining a target user usage scene corresponding to the audio signal according to the user interaction information;

and the signal adjusting module is used for determining a target adjusting parameter value of the audio signal according to the target user using scene, and performing corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the audio signal processing method described above.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the audio signal processing method described above.

According to the audio signal processing method, the audio signal processing device, the electronic equipment and the readable storage medium, the input audio signal and the user interaction information corresponding to the audio signal are obtained, and the target user use scene corresponding to the audio signal is determined according to the user interaction information, namely the voice input of a user in which scene is distinguished; and the adjusting parameter value of the audio signal is determined according to the use scene of the target user, the audio signal under different scenes can be processed in a targeted manner, the accuracy of audio signal processing is improved, and the condition of error identification of the audio signal is reduced.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow diagram of a method of audio signal processing in one embodiment;

FIG. 2 is a diagram illustrating a first usage scenario in one embodiment;

FIG. 3 is a diagram illustrating a second usage scenario in one embodiment;

FIG. 4 is a flow chart illustrating an audio signal processing method according to another embodiment;

FIG. 5 is a block diagram of an audio signal processing apparatus according to an embodiment;

FIG. 6 is a schematic diagram of an internal structure of an electronic device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

It will be understood that the terms "first," "second," and the like as used herein may be used herein to describe various data, but the data is not limited by these terms. These terms are only used to distinguish one datum from another. For example, a first usage scenario may be referred to as a second usage scenario, and similarly, the second usage scenario may be referred to as the first usage scenario, without departing from the scope of the present application. The first usage scenario and the second usage scenario are both usage scenarios, but they are not the same usage scenario.

FIG. 1 is a flow diagram of an audio signal processing method in one embodiment. The method in this embodiment is described taking as an example the case in which it runs on an electronic device. As shown in FIG. 1, the audio signal processing method includes steps 102 to 106.

Step 102, acquiring an input audio signal and user interaction information corresponding to the audio signal.

An audio signal is a carrier of the frequency and amplitude variations of regular sound waves such as voice, music, and sound effects. The input audio signal is the signal that the user inputs through an audio acquisition device when recording sound with the electronic device. The audio acquisition device may be a microphone or the like. The user interaction information corresponding to the audio signal refers to information generated in the electronic device when the user inputs the audio signal. For example, the user interaction information may be at least one of motion data and the trigger manner corresponding to the step of acquiring the input audio signal, but is not limited thereto. The motion data may be at least one of gyroscope data and acceleration data, but is not limited thereto.

Specifically, the electronic device acquires an input audio signal through the audio acquisition device in response to an audio input operation, and acquires corresponding user interaction information when acquiring the input audio signal.

Step 104, determining a target user usage scenario corresponding to the audio signal according to the user interaction information.

Wherein the user usage scenario includes a first usage scenario and a second usage scenario. A first distance between the user and the electronic device characterized by the first usage scenario is greater than a second distance between the user and the electronic device characterized by the second usage scenario. For example, the first usage scenario may be a far-field usage scenario and the second usage scenario may be a near-field usage scenario. The target user usage scenario is either a first usage scenario or a second usage scenario.

Specifically, the electronic device may determine a target user usage scenario corresponding to the audio signal according to at least one of the motion data and the trigger manner corresponding to the step of acquiring the input audio signal.

Step 106, determining a target adjustment parameter value of the audio signal according to the target user usage scenario, and performing corresponding signal adjustment processing on the audio signal based on the target adjustment parameter value to obtain the target audio signal.

The target adjustment parameter value may be at least one of, but is not limited to, an adjustment multiple, an adjustment phase value, and an adjustment pitch frequency value. Each user usage scenario has a corresponding target adjustment parameter value, and different usage scenarios correspond to different target adjustment parameter values. For example, the adjustment multiple corresponding to the first usage scenario is 1.5 times, and the adjustment multiple corresponding to the second usage scenario is 0.8 times. The target audio signal is the audio signal obtained by performing signal adjustment processing on the input audio signal.

Specifically, the electronic device determines a target adjustment parameter value of the audio signal according to the target user usage scenario, and then performs corresponding signal adjustment processing on the audio signal based on the target adjustment parameter value to obtain the target audio signal. For example, if the electronic device determines that the target user usage scenario is the first usage scenario, the corresponding target adjustment parameter value is 1.5 times, and the electronic device amplifies the audio signal by a factor of 1.5.

In this embodiment, the electronic device performs speech recognition processing on the target audio signal to obtain a speech recognition result.

In the audio signal processing method in this embodiment, by acquiring the input audio signal and the user interaction information corresponding to the audio signal, a target user usage scenario corresponding to the audio signal is determined according to the user interaction information, that is, a voice input performed by a user in what scenario is distinguished; and the adjusting parameter value of the audio signal is determined according to the use scene of the target user, the audio signal under different scenes can be processed in a targeted manner, the accuracy of audio signal processing is improved, and the condition of error identification of the audio signal is reduced.
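Steps 102 to 106 can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the names `Scene`, `ADJUSTMENT_MULTIPLE`, `classify_by_trigger` and `process_audio` are hypothetical, the audio signal is assumed to be a plain list of float samples, and the multiples 1.5 and 0.8 are the example values quoted above.

```python
from enum import Enum
from typing import Callable, Dict, List, Mapping, Sequence


class Scene(Enum):
    FIRST = "far_field"    # user assumed farther from the device
    SECOND = "near_field"  # user assumed closer to the device


# Example adjustment multiples quoted in the description (assumed fixed here).
ADJUSTMENT_MULTIPLE: Dict[Scene, float] = {Scene.FIRST: 1.5, Scene.SECOND: 0.8}


def classify_by_trigger(interaction: Mapping[str, object]) -> Scene:
    """Hypothetical classifier: voice wake-up implies the first scenario."""
    if interaction.get("trigger") == "voice_wakeup":
        return Scene.FIRST
    return Scene.SECOND


def process_audio(
    samples: Sequence[float],
    interaction: Mapping[str, object],
    classify: Callable[[Mapping[str, object]], Scene],
) -> List[float]:
    """Steps 104 and 106 applied to samples acquired in step 102."""
    scene = classify(interaction)           # step 104: pick the usage scenario
    multiple = ADJUSTMENT_MULTIPLE[scene]   # step 106: look up the parameter
    return [sample * multiple for sample in samples]
```

Passing the classifier in as a parameter keeps the scenario decision separate from the signal adjustment, so the motion-based or trigger-based variants described in the following embodiments could be swapped in without changing `process_audio`.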

In one embodiment, the user interaction information includes motion data. Determining a target user usage scenario of an audio signal according to user interaction information, comprising: when the variation amplitude of the motion data in the preset time length is in a preset range, determining a target user usage scene of the audio signal as a first usage scene; and when the variation amplitude of the motion data in the preset time length is not in the preset range, determining that the target user usage scene of the audio signal is a second usage scene.

The electronic device comprises a motion detection device, which detects motion data acting on the electronic device. The motion detection device may be a gyroscope, an accelerometer, or the like. A gyroscope is an angular motion detection device. When the electronic device includes a gyroscope, the motion data may specifically be angular velocity data. When the electronic device includes an accelerometer, the motion data may specifically be acceleration data. The motion data may be at least one of acceleration data and gyroscope data, but is not limited thereto. The preset time length refers to a time length preset on the electronic device, and may be, for example, 1 second, 2 seconds, 3 seconds, 1 minute, or the like, without being limited thereto. The preset range represents a shaking range of the motion data. A variation amplitude within the preset range indicates that the motion data changes little; a variation amplitude outside the preset range indicates that the motion data changes greatly.

Specifically, when the variation amplitude of the motion data within the preset time length is within the preset range, the motion data changes little during the period of audio input, that is, the electronic device is probably stationary, and the target user usage scenario corresponding to the audio signal is determined to be the first usage scenario. The first usage scenario may be referred to as a far-field usage scenario. When the variation amplitude of the motion data within the preset time length is not within the preset range, the motion data changes greatly during the period of audio input, that is, the electronic device is probably in motion, and the target user usage scenario corresponding to the audio signal is determined to be the second usage scenario. The second usage scenario may be referred to as a near-field usage scenario. Taking a mobile phone as the electronic device, a preset time length of 3 seconds, and a preset range of 0-0.1 as an example: when the variation amplitude of the motion data within 3 seconds is 0-0.05, the mobile phone has probably been placed on a desk by the user, and the usage scenario is a far-field usage scenario. Conversely, when the variation amplitude of the motion data within 3 seconds is 0-2, the mobile phone is probably being held by the user, and the usage scenario is a near-field usage scenario.

In the audio signal processing method in this embodiment, when the variation amplitude of the motion data within the preset duration is within the preset range, it is determined as a first usage scenario; when the change amplitude of the motion data in the preset duration is not in the preset range, the second use scene is determined, then the scenes are distinguished according to the motion data, and the target adjusting parameter value is determined based on different scenes, so that the accuracy of audio signal processing can be improved, and the error recognition condition of the audio signal is reduced.
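The motion-based decision can be sketched as follows. The description does not define how the variation amplitude is computed, so this sketch assumes it is the peak-to-peak spread of the sensor readings collected during the preset time length; the function names and the default preset range of 0-0.1 (taken from the mobile phone example) are illustrative only.

```python
from typing import Sequence, Tuple


def variation_amplitude(readings: Sequence[float]) -> float:
    """Peak-to-peak spread of the readings (an assumed amplitude measure)."""
    if not readings:
        raise ValueError("at least one motion reading is required")
    return max(readings) - min(readings)


def scene_from_motion(
    readings: Sequence[float],
    preset_range: Tuple[float, float] = (0.0, 0.1),
) -> str:
    """Return 'far_field' for a steady device, otherwise 'near_field'."""
    low, high = preset_range
    amplitude = variation_amplitude(readings)
    # Inside the preset range: device probably resting, so first scenario.
    if low <= amplitude <= high:
        return "far_field"
    # Outside the preset range: device probably hand-held, so second scenario.
    return "near_field"
```

Here `readings` stands for the gyroscope or accelerometer values sampled over the preset time length (for example, 3 seconds).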

In one embodiment, the user interaction information includes a trigger corresponding to the step of acquiring the input audio signal. Determining a target user usage scenario of an audio signal according to user interaction information, comprising: when the triggering mode is triggered through a voice awakening mode, determining that a target user use scene of the audio signal is a first use scene; and when the triggering mode is triggered in a mode of touching the electronic equipment, determining that the target user usage scene of the audio signal is a second usage scene.

The trigger manner refers to the operation performed by the user before the input audio signal is acquired. For example, the trigger manner may be triggering by voice wake-up or triggering by touching the electronic device. Voice wake-up means waking up the audio signal input function by voice, and only specific voice wake-up information can wake up the audio signal input function. For example, the user may utter voice wake-up information such as "perform voice recognition" or "I want to record", and the electronic device then starts to acquire the input audio signal. Touching the electronic device includes, but is not limited to, touching a virtual control of the electronic device, touching a physical control of the electronic device, touching the screen of the electronic device, and the like. For example, touching the electronic device may be long-pressing the power key, tapping the floating ball, or tapping the audio input control, but is not limited thereto.

Specifically, when the trigger manner is voice wake-up, the distance between the electronic device and the user is considered long, and the target user usage scenario of the audio signal is determined to be the first usage scenario. The first usage scenario may be referred to as a far-field usage scenario. When the trigger manner is touching the electronic device, the distance between the electronic device and the user is considered short, and the target user usage scenario of the audio signal is determined to be the second usage scenario. The second usage scenario may be referred to as a near-field usage scenario. Taking a mobile phone as the electronic device: when the user says "please perform voice recognition" to the mobile phone, that is, triggers acquisition by voice wake-up, the user is probably far from the mobile phone, and the scenario is determined to be a far-field usage scenario. When the user triggers acquisition of the input audio signal by touching a control on the mobile phone, the mobile phone is very probably being held by the user, and the scenario is determined to be a near-field usage scenario.

In the audio signal processing method in this embodiment, triggering by voice wake-up indicates that the distance between the user and the electronic device is large, so the first usage scenario is determined; triggering by touching the electronic device indicates that the distance between the user and the electronic device is small, so the second usage scenario is determined. Different scenarios can thus be distinguished from the trigger manner alone, without directly measuring the distance between the user and the mobile phone. Determining the target adjustment parameter value from the scenario improves the accuracy of audio signal processing and thereby reduces misrecognition of the audio signal.
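The trigger-based decision reduces to a direct mapping, sketched below. The `Trigger` enum and the string labels are hypothetical names chosen for illustration; the description only distinguishes voice wake-up from touching the electronic device.

```python
from enum import Enum


class Trigger(Enum):
    VOICE_WAKEUP = "voice_wakeup"  # e.g. the user says a wake-up phrase
    TOUCH = "touch"                # e.g. long-pressing the power key


def scene_from_trigger(trigger: Trigger) -> str:
    """Map the trigger manner to a usage scenario label."""
    if trigger is Trigger.VOICE_WAKEUP:
        return "far_field"   # first usage scenario
    if trigger is Trigger.TOUCH:
        return "near_field"  # second usage scenario
    raise ValueError(f"unsupported trigger manner: {trigger!r}")
```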

In one embodiment, the user interaction information includes motion data and a trigger corresponding to the step of obtaining the input audio signal. Determining a target user usage scenario of an audio signal according to user interaction information, comprising: when the triggering mode is triggered in a voice awakening mode and the variation amplitude of the motion data in the preset time length is in a preset range, determining a target user use scene of the audio signal as a first use scene; and when at least one condition that the triggering mode is triggered in a mode of touching the electronic equipment and the variation amplitude of the motion data in the preset time length is not in the preset range is met, determining the target user usage scene of the audio signal as a second usage scene.

Specifically, when acquisition of the input audio signal is triggered by voice wake-up and the variation amplitude of the motion data within the preset time length is within the preset range, the electronic device is very probably resting on a stable surface such as a desk and is very probably far from the user, so the target user usage scenario of the audio signal is determined to be the first usage scenario. When the trigger manner is touching the electronic device, or the variation amplitude of the motion data within the preset time length is not within the preset range, or both, the electronic device is very probably being held by the user, that is, the electronic device is very probably close to the user, so the target user usage scenario of the audio signal is determined to be the second usage scenario. For example, when the user says "please perform voice recognition" to the electronic device and the variation amplitude of the motion data within the preset time length is within the preset range, that is, the electronic device is stationary, the target user usage scenario of the audio signal is determined to be the first usage scenario. When the user triggers the step of acquiring the input audio signal by touching a control of the electronic device, the user is not far from the electronic device, and the second usage scenario is determined.

In this embodiment, FIG. 2 shows a scene diagram of the first usage scenario. FIG. 2 includes an electronic device 210, a desk 220, a voice wake-up mode 220, and a user 230. The electronic device 210 may be placed on the desk 220; in this case the variation amplitude of the motion data within the preset time length is within the preset range, so the target user usage scenario may be determined to be the first usage scenario. The user 230 may trigger the step of acquiring the input audio signal through the voice wake-up mode 220; in this case the user 230 is probably relatively far from the electronic device 210, so the target user usage scenario may be determined to be the first usage scenario. Likewise, when acquisition of the input audio signal is triggered through the voice wake-up mode 220 and the variation amplitude of the motion data within the preset time length is within the preset range, the target user usage scenario of the audio signal may also be determined to be the first usage scenario. That is, when the mobile phone rests steadily on the desk and is woken by voice, the distance between the mobile phone and the user is considered relatively far, and a relatively large signal enhancement amplitude (that is, target adjustment parameter value) can be set.

In this embodiment, FIG. 3 shows a scene diagram of the second usage scenario. FIG. 3 includes a user 310 and an electronic device 320. When the user 310 holds the electronic device 320, the step of acquiring the input audio signal is generally triggered by touching the electronic device, and the variation amplitude of the motion data within the preset time length is generally large, so the target user usage scenario of the input audio signal can be determined to be the second usage scenario. In this case the user is holding the mobile phone while speaking, and the signal enhancement amplitude is limited to a lower level.

In the audio signal processing method in this embodiment, the motion data and the triggering manner corresponding to the step of acquiring the input audio signal are combined to determine the target user usage scenario of the electronic device, so that accuracy of scenario judgment can be improved, accuracy of audio signal processing can be improved, and false recognition of the audio signal can be reduced.
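The combined rule is a logical AND for the first usage scenario and, equivalently, a logical OR for the second. A minimal sketch follows, assuming the variation amplitude has already been computed as a single number and using the example preset range of 0-0.1; all names are illustrative.

```python
from typing import Tuple


def scene_from_trigger_and_motion(
    voice_wakeup: bool,
    amplitude: float,
    preset_range: Tuple[float, float] = (0.0, 0.1),
) -> str:
    """Combine the trigger manner and the motion data into one decision.

    voice_wakeup: True if acquisition was triggered by voice wake-up,
                  False if it was triggered by touching the device.
    amplitude:    variation amplitude of the motion data over the preset
                  time length (how it is measured is an assumption).
    """
    low, high = preset_range
    steady = low <= amplitude <= high
    # First scenario only when BOTH conditions hold; any single failing
    # condition (touch trigger or large motion) yields the second scenario.
    return "far_field" if (voice_wakeup and steady) else "near_field"
```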

In one embodiment, performing corresponding signal conditioning processing on the audio signal based on the target conditioning parameter value to obtain the target audio signal includes: acquiring a reference adjusting parameter value of the audio signal; and when the reference adjusting parameter value is larger than the target adjusting parameter value, carrying out corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

Wherein, the reference adjusting parameter value can be obtained by the existing voice adjusting algorithm. The target tuning parameter value may be a maximum multiple of a signal amplification.

Specifically, when the reference adjustment parameter value is larger than the target adjustment parameter value, the electronic device performs corresponding signal adjustment processing on the audio signal based on the target adjustment parameter value to obtain the target audio signal. For example, the target user usage scenario is the first usage scenario, the target adjustment parameter value corresponding to the first usage scenario is M, and the adaptive signal enhancement algorithm computes a reference adjustment parameter value X from the input audio signal; when X > M, the audio signal is adjusted according to M to obtain the target audio signal. Alternatively, the target user usage scenario is the second usage scenario, the target adjustment parameter value corresponding to the second usage scenario is N, and the adaptive signal enhancement algorithm computes a reference adjustment parameter value X from the input audio signal; when X > N, the audio signal is adjusted according to N to obtain the target audio signal.

In the audio signal processing method in this embodiment, when the reference adjusting parameter value is greater than the target adjusting parameter value, corresponding signal adjustment processing is performed on the audio signal based on the target adjusting parameter value to obtain the target audio signal, the signal adjusting parameter value is limited, and a smaller value is taken, so that when the user does not speak, and another person nearby is in a chat scene, a smaller adjusting parameter value can be obtained, the accuracy of audio signal processing is improved, the chat content of another person is prevented from being entered, and the situation of misidentification of the audio signal is reduced.

In one embodiment, the audio signal processing method further includes: and when the reference adjusting parameter value is smaller than or equal to the target adjusting parameter value, carrying out corresponding signal adjusting processing on the audio signal based on the reference adjusting parameter value to obtain a target audio signal.

Specifically, when the reference adjustment parameter value is smaller than or equal to the target adjustment parameter value, the audio signal is subjected to corresponding signal adjustment processing based on the reference adjustment parameter value to obtain the target audio signal. For example, the target user usage scenario is the first usage scenario, the target adjustment parameter value corresponding to the first usage scenario is M, and the adaptive signal enhancement algorithm computes a reference adjustment parameter value X from the input audio signal; when X is less than or equal to M, the audio signal is adjusted according to X to obtain the target audio signal. Alternatively, the target user usage scenario is the second usage scenario, the target adjustment parameter value corresponding to the second usage scenario is N, and the adaptive signal enhancement algorithm computes a reference adjustment parameter value X from the input audio signal; when X is less than or equal to N, the audio signal is adjusted according to X to obtain the target audio signal.

In the audio signal processing method in this embodiment, when the reference adjusting parameter value is less than or equal to the target adjusting parameter value, corresponding signal adjustment processing is performed on the audio signal based on the reference adjusting parameter value to obtain the target audio signal, and then corresponding signal adjustment processing is performed on the audio signal based on a smaller value of the reference adjusting parameter value and the target adjusting parameter value, so that an effect of correctly identifying the input audio signal can be achieved, and when the user does not speak, and other people nearby are in a chat scene, a smaller adjusting parameter value can be obtained, accuracy of audio signal processing is improved, entry of chat contents of other people is avoided, and a situation of misidentification of the audio signal is reduced.
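Taken together, the two cases (X > M and X ≤ M) mean the signal is adjusted by the smaller of the reference value and the scenario's target value. A minimal sketch, assuming the adjustment parameter is a multiplicative gain applied to float samples and that the reference value X has already been produced by some adaptive enhancement algorithm (not shown); the function names are hypothetical.

```python
from typing import List, Sequence


def effective_multiple(reference: float, target: float) -> float:
    """Cap the reference value X at the scenario's target value (M or N)."""
    if reference > target:
        return target     # X > M: adjust according to M
    return reference      # X <= M: adjust according to X


def adjust_signal(
    samples: Sequence[float], reference: float, target: float
) -> List[float]:
    """Scale every sample by the capped multiple."""
    multiple = effective_multiple(reference, target)
    return [sample * multiple for sample in samples]
```

For instance, with a target value of 10, a reference value of 12 is capped to 10, while a reference value of 6 is used unchanged.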

In one embodiment, the user usage scenario includes a first usage scenario and a second usage scenario, wherein a distance between the user and the electronic device characterized by the first usage scenario is greater than a distance between the user and the electronic device characterized by the second usage scenario; adjusting the parameter value comprises adjusting multiple;

determining a target adjusting parameter value of an audio signal according to a target user usage scene, and performing corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal, including:

when the target user usage scene is a first usage scene, determining that a target adjusting parameter value of the audio signal comprises a first adjusting multiple corresponding to the first usage scene, and performing corresponding signal adjusting processing on the audio signal based on the first adjusting multiple to obtain a target audio signal;

when the target user usage scene is a second usage scene, determining that a target adjustment parameter value of the audio signal comprises a second adjustment multiple corresponding to the second usage scene, and performing corresponding signal adjustment processing on the audio signal based on the second adjustment multiple to obtain a target audio signal; the first adjustment factor is greater than the second adjustment factor.

The user may be the owner of the electronic device, or may be whoever is currently using the electronic device. The distance between the user and the electronic device characterized by the first usage scenario is greater than the distance between the user and the electronic device characterized by the second usage scenario. The adjustment parameter value includes an adjustment multiple. An adjustment multiple can be preset for each user usage scenario, and multiples of different grades can be distinguished according to the intensity of the audio signal in each user usage scenario. For example, the first adjustment multiple may be the maximum adjustment multiple set for the first usage scenario, and the second adjustment multiple may be the maximum adjustment multiple set for the second usage scenario, both of which may be determined according to actual test results. For example, if the intensities of the audio signal are 1 and 2, then in the first usage scenario, intensity 1 corresponds to a multiple of 10 and intensity 2 corresponds to a multiple of 5; in the second usage scenario, intensity 1 corresponds to a multiple of 8, intensity 2 corresponds to a multiple of 4, and so on. The specific audio signal strength is positively correlated with the multiple. The same audio signal corresponds to different adjustment multiples in different user usage scenarios.

Specifically, when the usage scene of the target user is the first usage scene, the electronic device determines that a target adjustment parameter value of the audio signal is a first adjustment multiple corresponding to the first usage scene, and performs corresponding signal adjustment processing on the audio signal based on the first adjustment multiple to obtain the target audio signal. When the target user usage scene is a second usage scene, the electronic device determines that a target adjustment parameter value of the audio signal is a second adjustment multiple corresponding to the second usage scene, and performs corresponding signal adjustment processing on the audio signal based on the second adjustment multiple to obtain a target audio signal. For example, the first usage scenario is a far-field usage scenario, the second usage scenario is a near-field usage scenario, the first adjustment factor is 10, and the second adjustment factor is 8. When the target user usage scene is a far-field usage scene, determining that a target adjustment parameter value of the audio signal is a first adjustment multiple 10 corresponding to the far-field usage scene, and increasing the input audio signal by 10 times; and when the target user usage scene is the near-field usage scene, determining that the target adjustment parameter value of the audio signal is a second adjustment multiple 8 corresponding to the near-field usage scene, and increasing the input audio signal by 8 times to obtain the target audio signal.
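As a rough illustration of the far-field and near-field example, the sketch below scales each sample of the input signal by the multiple preset for the detected scene. It assumes that "increasing the signal by N times" means multiplying every sample by N; the names `SCENE_MULTIPLE` and `adjust_signal` and the list-of-floats signal representation are hypothetical choices made for the example.

```python
from typing import List

# Example multiples from the text: 10 for far-field, 8 for near-field.
SCENE_MULTIPLE = {"far_field": 10.0, "near_field": 8.0}


def adjust_signal(samples: List[float], scene: str) -> List[float]:
    """Scale every sample by the adjustment multiple of the given scene.

    Assumption: the adjustment is a plain per-sample multiplication.
    """
    multiple = SCENE_MULTIPLE[scene]
    return [sample * multiple for sample in samples]
```

A real implementation would typically also guard against clipping after amplification; that step is not described in the text and is omitted here.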

In this embodiment, the audio signal processing method further includes: acquiring a reference multiple value of the audio signal. Performing corresponding signal adjustment processing on the audio signal based on the first adjustment multiple to obtain a target audio signal includes: when the target user usage scene is the first usage scene and the reference multiple value is greater than the first adjustment multiple, performing corresponding signal adjustment processing on the input audio signal based on the first adjustment multiple to obtain the target audio signal. Performing corresponding signal adjustment processing on the audio signal based on the second adjustment multiple to obtain a target audio signal includes: when the target user usage scene is the second usage scene and the reference multiple value is greater than the second adjustment multiple, performing corresponding signal adjustment processing on the input audio signal based on the second adjustment multiple to obtain the target audio signal.

In the audio signal processing method in this embodiment, when the usage scenario of the target user is a first usage scenario, a target adjustment parameter value of the audio signal is determined to be a first adjustment multiple corresponding to the first usage scenario, and signal adjustment is performed based on the first adjustment multiple; when the target user usage scene is the second usage scene, determining that the target adjustment parameter value of the audio signal is the second adjustment multiple corresponding to the second usage scene, and performing signal adjustment based on the second adjustment multiple, that is, the actual adjustment multiples of the same audio signal in different scenes are different, so that the accuracy of adjusting the audio signal multiple is improved, and the misidentification condition of the audio signal is reduced.

In one embodiment, fig. 4 is a flow chart of an audio signal processing method in another embodiment. The electronic device records sound and acquires the input audio signal. The audio signal is processed by an adaptive enhancement algorithm to obtain a reference adjustment parameter value. While recording, the electronic device acquires user interaction information, determines a target user usage scene based on the user interaction information, and determines a target adjustment parameter value according to the target user usage scene. When the reference adjustment parameter value is greater than the target adjustment parameter value, signal adjustment processing is performed on the audio signal based on the target adjustment parameter value to obtain a target audio signal. The target audio signal is input to a recognition engine to obtain a voice recognition result. In the audio signal processing method in this embodiment, the input audio signal and the user interaction information corresponding to the audio signal are acquired, and the target user usage scene corresponding to the audio signal is determined according to the user interaction information, that is, the scene in which the user performs the voice input is distinguished. The adjustment parameter value of the audio signal is then determined according to the target user usage scene, so that audio signals in different scenes can be processed in a targeted manner, which improves the accuracy of audio signal processing and reduces misrecognition of the audio signal.
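The flow of fig. 4 can be sketched as a single function that wires the steps together. This is only an outline under stated assumptions: the adaptive enhancement algorithm, the scene classifier, and the recognition engine are passed in as hypothetical callables (`estimate_reference`, `determine_scene`, `recognize`) because the text does not specify them, and the adjustment is assumed to be a per-sample multiplication.

```python
from typing import Callable, Dict, List, Sequence


def process_recording(
    samples: Sequence[float],
    interaction_info: dict,
    estimate_reference: Callable[[Sequence[float]], float],
    determine_scene: Callable[[dict], str],
    scene_to_target: Dict[str, float],
    recognize: Callable[[List[float]], str],
) -> str:
    """Run the fig. 4 flow on one recording and return the recognition result."""
    # Reference value produced by the (unspecified) adaptive enhancement step.
    reference = estimate_reference(samples)
    # Scene inferred from user interaction information gathered while recording.
    scene = determine_scene(interaction_info)
    target = scene_to_target[scene]
    # Use the target value when the reference exceeds it; otherwise keep the
    # reference value (the latter branch follows the embodiment in which the
    # reference value is less than or equal to the target value).
    multiple = target if reference > target else reference
    adjusted = [sample * multiple for sample in samples]
    return recognize(adjusted)
```

Passing the three stages in as callables keeps the sketch independent of any particular enhancement algorithm or recognition engine.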

It should be understood that although the steps in the flowcharts of fig. 1 or 4 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 1 or 4 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Fig. 5 is a block diagram of an audio signal processing apparatus according to an embodiment. As shown in fig. 5, the audio signal processing apparatus includes an information acquisition module 502, a scene determination module 504, and a signal adjustment module 506, wherein:

an information obtaining module 502, configured to obtain an input audio signal and user interaction information corresponding to the audio signal;

a scene determining module 504, configured to determine, according to the user interaction information, a target user usage scene corresponding to the audio signal;

and the signal adjusting module 506 is configured to determine a target adjusting parameter value of the audio signal according to the target user usage scenario, and perform corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

In the audio signal processing apparatus in this embodiment, when the variation amplitude of the motion data within a preset duration is within a preset range, the usage scene is determined to be the first usage scene; when the variation amplitude of the motion data within the preset duration is not within the preset range, the usage scene is determined to be the second usage scene. The scenes are thus distinguished according to the motion data, and the target adjustment parameter value is determined based on the different scenes, which improves the accuracy of audio signal processing and reduces misrecognition of the audio signal.

In one embodiment, the user interaction information includes motion data. The scene determining module 504 is configured to determine a target user usage scene of the audio signal as a first usage scene when a variation amplitude of the motion data within a preset time length is within a preset range; and when the variation amplitude of the motion data in the preset time length is not in the preset range, determining that the target user usage scene of the audio signal is a second usage scene.

In one embodiment, an accelerometer is included in the electronic device; the user interaction information includes acceleration data. The scene determining module 504 is configured to determine a target user usage scene of the audio signal as a first usage scene when a variation amplitude of the acceleration data within a preset duration is within a preset acceleration range; and when the variation amplitude of the acceleration data in the preset time length is not in the preset range, determining that the target user usage scene of the audio signal is a second usage scene.

In the audio signal processing device in the embodiment of the application, when the variation amplitude of the acceleration data within the preset duration is within the preset range, the usage scene is determined to be the first usage scene; when the variation amplitude of the acceleration data within the preset duration is not within the preset range, the usage scene is determined to be the second usage scene. The scenes are thus distinguished according to the acceleration data, and the target adjustment parameter value is determined based on the different scenes, which improves the accuracy of audio signal processing and reduces misrecognition of the audio signal.
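The acceleration-based decision can be sketched as follows. The text does not define "variation amplitude", so this example assumes it is the difference between the largest and smallest reading in the window; the function names and the default preset range of 0.0 to 0.5 are hypothetical values chosen only for illustration.

```python
from typing import Sequence, Tuple


def variation_amplitude(readings: Sequence[float]) -> float:
    """Assumed definition: peak-to-peak spread of the readings in the window."""
    return max(readings) - min(readings)


def scene_from_acceleration(
    readings: Sequence[float],
    preset_range: Tuple[float, float] = (0.0, 0.5),  # hypothetical bounds
) -> str:
    """Classify the usage scene from accelerometer readings in one window.

    Returns "first" when the variation amplitude lies inside the preset
    range (device nearly still) and "second" otherwise.
    """
    low, high = preset_range
    amplitude = variation_amplitude(readings)
    return "first" if low <= amplitude <= high else "second"
```

The window length corresponds to the preset duration in the text; how many readings that covers depends on the sensor sampling rate, which the text does not specify.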

In one embodiment, the user interaction information includes a triggering mode corresponding to the step of acquiring the input audio signal. The scene determining module 504 is configured to determine a target user usage scene of the audio signal as a first usage scene when the triggering mode is triggering by voice wake-up; and when the triggering mode is triggering by touching the electronic device, determine that the target user usage scene of the audio signal is a second usage scene.

In the audio signal processing device in the embodiment of the application, when triggering is performed by voice wake-up, the user is most likely far from the electronic device, so the usage scene is determined to be the first usage scene; when triggering is performed by touching the electronic device, the user is most likely close to the electronic device, so the usage scene is determined to be the second usage scene. Different scenes can thus be distinguished based on the triggering mode without directly acquiring the distance between the user and the mobile phone, and the target adjustment parameter value is determined based on the different scenes, which improves the accuracy of audio signal processing and reduces misrecognition of the audio signal.
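The trigger-based rule amounts to a direct mapping from triggering mode to usage scene. A minimal Python sketch, in which the `Trigger` enum and the function name are hypothetical and only the two triggering modes named in the text are modeled:

```python
from enum import Enum


class Trigger(Enum):
    """The two triggering modes named in the text."""
    VOICE_WAKEUP = "voice_wakeup"
    TOUCH = "touch"


def scene_from_trigger(trigger: Trigger) -> str:
    """Map the triggering mode to a usage scene.

    Voice wake-up suggests the user is probably far from the device
    ("first"); touching the device suggests the user is close ("second").
    """
    if trigger is Trigger.VOICE_WAKEUP:
        return "first"
    if trigger is Trigger.TOUCH:
        return "second"
    raise ValueError(f"unsupported trigger: {trigger!r}")
```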

In one embodiment, the user interaction information includes motion data and a triggering mode corresponding to the step of acquiring the input audio signal. The scene determining module 504 is configured to determine a target user usage scene of the audio signal as a first usage scene when the triggering mode is triggering by voice wake-up and the variation amplitude of the motion data within a preset duration is within a preset range; and when at least one of the following conditions is met: the triggering mode is triggering by touching the electronic device, or the variation amplitude of the motion data within the preset duration is not within the preset range, determine the target user usage scene of the audio signal as a second usage scene.

The audio signal processing device in this embodiment combines the motion data and the triggering mode corresponding to the step of acquiring the input audio signal to determine the target user usage scenario of the electronic device, so that the accuracy of scenario determination can be improved, the accuracy of audio signal processing can be improved, and the false recognition condition of the audio signal can be reduced.
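Combining the two signals gives a conjunction for the first scene and a disjunction for the second. The sketch below assumes only two triggering modes exist (the strings `"voice_wakeup"` and `"touch"`), assumes the variation amplitude is the peak-to-peak spread of the motion readings, and uses a hypothetical preset range; none of these specifics come from the text.

```python
from typing import Sequence, Tuple


def scene_from_trigger_and_motion(
    trigger: str,
    motion_readings: Sequence[float],
    preset_range: Tuple[float, float] = (0.0, 0.5),  # hypothetical bounds
) -> str:
    """Return "first" only if BOTH conditions hold, otherwise "second".

    First scene:  voice wake-up AND motion amplitude within the preset range.
    Second scene: touch trigger OR motion amplitude outside the preset range.
    """
    low, high = preset_range
    amplitude = max(motion_readings) - min(motion_readings)  # assumed definition
    within_range = low <= amplitude <= high
    if trigger == "voice_wakeup" and within_range:
        return "first"
    return "second"
```

Requiring both conditions for the first scene makes the far-field classification the stricter one, so a moving device woken by voice is still treated as the second scene.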

In one embodiment, the signal conditioning module 506 is configured to obtain a reference conditioning parameter value for the audio signal; and when the reference adjusting parameter value is larger than the target adjusting parameter value, carrying out corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

In the audio signal processing device in this embodiment, when the reference adjustment parameter value is greater than the target adjustment parameter value, corresponding signal adjustment processing is performed on the audio signal based on the target adjustment parameter value to obtain the target audio signal. The signal adjustment parameter value is thereby limited to the smaller value, so that in a scene where the user has not yet spoken and other people are chatting nearby, a smaller adjustment parameter value is obtained. This improves the accuracy of audio signal processing, avoids entering other people's chat content, and reduces misrecognition of the audio signal.

In one embodiment, the signal conditioning module 506 is further configured to perform corresponding signal conditioning processing on the audio signal based on the reference conditioning parameter value to obtain the target audio signal when the reference conditioning parameter value is less than or equal to the target conditioning parameter value.

In the audio signal processing device in this embodiment, when the reference adjustment parameter value is less than or equal to the target adjustment parameter value, corresponding signal adjustment processing is performed on the audio signal based on the reference adjustment parameter value to obtain the target audio signal. In other words, the audio signal is adjusted based on the smaller of the reference adjustment parameter value and the target adjustment parameter value, which still allows the input audio signal to be recognized correctly. In a scene where the user has not yet spoken and other people are chatting nearby, a smaller adjustment parameter value is obtained, which improves the accuracy of audio signal processing, avoids entering other people's chat content, and reduces misrecognition of the audio signal.
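Taken together, the two branches (reference greater than target, and reference less than or equal to target) select the smaller of the two values. A minimal sketch, with the hypothetical function name `effective_adjustment`:

```python
def effective_adjustment(reference: float, target: float) -> float:
    """Pick the adjustment parameter value actually applied to the signal.

    reference: value produced by the enhancement algorithm for this signal.
    target:    value preset for the detected usage scene.
    """
    if reference > target:
        # Reference exceeds the scene limit: cap it at the target value.
        return target
    # Reference is already within the limit: use it unchanged.
    return reference
```

For instance, if nearby chatter leads the enhancement step to propose a multiple of 20 while the scene's target is 10, the applied multiple is 10; if the proposal is 6, it is used as is. The result always equals `min(reference, target)`.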

In one embodiment, the user usage scenario includes a first usage scenario and a second usage scenario, wherein a distance between the user and the electronic device characterized by the first usage scenario is greater than a distance between the user and the electronic device characterized by the second usage scenario; the adjustment parameter value includes an adjustment multiple. The signal adjusting module 506 is configured to, when the target user usage scene is the first usage scene, determine that a target adjustment parameter value of the audio signal includes a first adjustment multiple corresponding to the first usage scene, and perform corresponding signal adjustment processing on the audio signal based on the first adjustment multiple to obtain a target audio signal; and when the target user usage scene is the second usage scene, determine that a target adjustment parameter value of the audio signal includes a second adjustment multiple corresponding to the second usage scene, and perform corresponding signal adjustment processing on the audio signal based on the second adjustment multiple to obtain a target audio signal; the first adjustment multiple is greater than the second adjustment multiple.

In the audio signal processing apparatus in this embodiment, when the usage scenario of the target user is a first usage scenario, a target adjustment parameter value of the audio signal is determined to be a first adjustment multiple corresponding to the first usage scenario, and signal adjustment is performed based on the first adjustment multiple; when the target user usage scene is the second usage scene, determining that the target adjustment parameter value of the audio signal is the second adjustment multiple corresponding to the second usage scene, and performing signal adjustment based on the second adjustment multiple, that is, the actual adjustment multiples of the same audio signal in different scenes are different, so that the accuracy of adjusting the audio signal multiple is improved, and the misidentification condition of the audio signal is reduced.

The division of the modules in the audio signal processing apparatus is merely for illustration, and in other embodiments, the audio signal processing apparatus may be divided into different modules as needed to complete all or part of the functions of the audio signal processing apparatus.

For the specific definition of the audio signal processing apparatus, reference may be made to the above definition of the audio signal processing method, which is not described herein again. The respective modules in the audio signal processing apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

Fig. 6 is a schematic diagram of an internal structure of an electronic device in one embodiment. As shown in fig. 6, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capability and supports the operation of the whole electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program is executable by the processor to implement the audio signal processing method provided in the embodiments of the present application. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, or a wearable device.

Each module in the audio signal processing apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a terminal or a server. The program modules constituted by the computer program may be stored in the memory of the electronic device. When the computer program is executed by a processor, the steps of the method described in the embodiments of the present application are performed.

The embodiments of the application also provide a computer-readable storage medium, namely one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the audio signal processing method.

The embodiments of the application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the audio signal processing method.

Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An audio signal processing method applied to an electronic device includes:

2. The method of claim 1, wherein the user interaction information comprises motion data;

the determining a target user usage scenario of the audio signal according to the user interaction information includes:

when the variation amplitude of the motion data in a preset time length is in a preset range, determining that a target user usage scene of the audio signal is a first usage scene;

and when the variation amplitude of the motion data in the preset time length is not in the preset range, determining that the target user usage scene of the audio signal is a second usage scene.

3. The method according to claim 1, wherein the user interaction information includes a trigger mode corresponding to the step of acquiring the input audio signal;

when the triggering mode is triggered through a voice awakening mode, determining that a target user usage scene of the audio signal is a first usage scene;

and when the triggering mode is triggered in a mode of touching the electronic equipment, determining that the target user usage scene of the audio signal is a second usage scene.

4. The method of claim 1, wherein the user interaction information includes motion data and a trigger mode corresponding to the step of acquiring the input audio signal;

when the triggering mode is triggered in a voice awakening mode and the variation amplitude of the motion data in a preset time length is in a preset range, determining that a target user usage scene of the audio signal is a first usage scene;

and when at least one condition that the triggering mode is triggered in a mode of touching the electronic equipment and the change amplitude of the motion data in a preset time length is not in a preset range is met, determining that the target user usage scene of the audio signal is a second usage scene.

5. The method according to any one of claims 1 to 4, wherein performing the corresponding signal conditioning processing on the audio signal based on the target conditioning parameter value to obtain a target audio signal comprises:

acquiring a reference adjusting parameter value of the audio signal;

and when the reference adjusting parameter value is larger than the target adjusting parameter value, carrying out corresponding signal adjusting processing on the audio signal based on the target adjusting parameter value to obtain a target audio signal.

6. The method of claim 5, further comprising:

and when the reference adjusting parameter value is smaller than or equal to the target adjusting parameter value, carrying out corresponding signal adjusting processing on the audio signal based on the reference adjusting parameter value to obtain a target audio signal.

7. The method of any one of claims 1 to 4, wherein the user usage scenario comprises a first usage scenario and a second usage scenario, wherein a distance between the user and the electronic device characterized by the first usage scenario is greater than a distance between the user and the electronic device characterized by the second usage scenario; the adjustment parameter value comprises an adjustment multiple;

the determining a target adjustment parameter value of the audio signal according to the target user usage scenario, and performing corresponding signal adjustment processing on the audio signal based on the target adjustment parameter value to obtain a target audio signal includes:

when the target user usage scene is a first usage scene, determining a target adjustment parameter value of the audio signal as a first adjustment multiple corresponding to the first usage scene, and performing corresponding signal adjustment processing on the audio signal based on the first adjustment multiple to obtain a target audio signal;

when the target user usage scene is a second usage scene, determining a target adjustment parameter value of the audio signal as a second adjustment multiple corresponding to the second usage scene, and performing corresponding signal adjustment processing on the audio signal based on the second adjustment multiple to obtain a target audio signal; the first adjustment multiple is greater than the second adjustment multiple.

8. An audio signal processing apparatus, comprising:

9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, wherein the computer program, when executed by the processor, causes the processor to carry out the steps of the audio signal processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.