CN117292691A - Audio energy analysis method and related device - Google Patents
Audio energy analysis method and related device
- Publication number
- CN117292691A (application CN202210697612.5A)
- Authority
- CN
- China
- Prior art keywords
- energy
- audio energy
- audio
- sound source
- total
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The embodiment of the application discloses an audio energy analysis method and a related device. When audio energy analysis is performed, an energy loss parameter of a first device is determined first; this parameter identifies the loss incurred when a second device transmits audio energy to the first device. The processing device can then acquire the first total audio energy received by the first device and the own audio energy the second device generates by playing audio. By combining these quantities, the audio energy the first device receives from the voice-interaction sound source can be obtained, eliminating the interference that the second device's self-played audio causes in voice-interaction recognition. The device the user wants to interact with can thus be identified more accurately on the basis of audio energy, improving the user's voice-interaction experience.
Description
Technical Field
The present disclosure relates to the field of data analysis technologies, and in particular, to an audio energy analysis method and a related device.
Background
Voice interaction is one of the most common human-machine interaction means. When several devices supporting voice interaction are present in the same scene, the relevant devices must determine which device the user actually intends to interact with and respond accordingly.
In the related art, when the devices performing voice interaction are not themselves emitting sound, the device the user is addressing can be identified accurately; however, when these devices are playing audio of their own, it is difficult to determine which device the user wants to interact with, and the user's voice-interaction experience suffers.
Disclosure of Invention
To solve this technical problem, the application provides an audio energy analysis method in which a processing device analyzes a device's own audio interference, so that the sound source of the voice interaction can be identified accurately and the user's voice-interaction experience is improved.
The embodiment of the application discloses the following technical scheme:
in a first aspect, embodiments of the present application disclose an audio energy analysis method, the method comprising:
determining an energy loss parameter of a first device, the energy loss parameter being used to identify a loss in a second device delivering audio energy to the first device;
acquiring first total audio energy of the first device and self audio energy of the second device, wherein the first total audio energy is audio energy received by the first device, and the self audio energy is energy generated based on audio played by the second device;
and determining first sound source audio energy of the first device according to the energy loss parameter, the self audio energy and the first total audio energy, wherein the first sound source audio energy is the audio energy acquired from a sound source by the first device.
In one possible implementation, the energy loss parameter is based on the following:
determining a test own audio energy of the second device when playing audio and a received audio energy received by the first device from the second device in an environment without other sound sources than the second device;
and determining an energy loss parameter of the first device according to the ratio of the received audio energy to the test own audio energy.
In one possible implementation, the method further includes:
determining a second total audio energy received by the second device;
determining second sound source audio energy of the second device according to the self audio energy and the second total audio energy, wherein the second sound source audio energy and the first sound source audio energy are from the same sound source;
and waking up a voice interaction function of the second device in response to the second source audio energy being greater than the first source audio energy.
In a possible implementation manner, the determining the second sound source audio energy of the second device according to the self audio energy and the second total audio energy includes:
determining nonlinear energy of the second device according to the self audio energy, wherein the nonlinear energy is generated based on vibration generated when the second device plays audio;
and removing the self audio energy and the nonlinear energy in the second total audio energy to obtain the second sound source audio energy.
In a possible implementation manner, the determining, according to the self audio energy, nonlinear energy of the second device includes:
and determining nonlinear energy of the second equipment according to the self audio energy through a neural network model.
In one possible implementation, the neural network model is trained by:
acquiring a training sample set, wherein the training sample set comprises sample self audio energy and sample total audio energy acquired by target equipment in an environment without other sound sources;
determining sample nonlinear energy of the target device according to the sample total audio energy and the sample own audio energy;
and training an initial neural network model through the total audio energy of the sample, the self audio energy of the sample and the nonlinear energy of the sample to obtain the neural network model.
In one possible implementation, the neural network model is a convolutional recurrent network model having a single-layer encoding layer and a single-layer decoding layer.
In a second aspect, an embodiment of the present application discloses an audio energy analysis apparatus, the apparatus including a first determination unit, an acquisition unit, and a second determination unit:
the first determining unit is configured to determine an energy loss parameter of a first device, where the energy loss parameter is used to identify a loss when a second device transmits audio energy to the first device;
the acquiring unit is configured to acquire a first total audio energy of the first device and an own audio energy of the second device, where the first total audio energy is audio energy received by the first device, and the own audio energy is energy generated based on audio played by the second device;
the second determining unit is configured to determine, according to the energy loss parameter, the own audio energy, and the first total audio energy, first sound source audio energy of the first device, where the first sound source audio energy is audio energy obtained by the first device from a sound source.
In one possible implementation, the energy loss parameter is based on the following:
determining a test own audio energy of the second device when playing audio and a received audio energy received by the first device from the second device in an environment without other sound sources than the second device;
and determining an energy loss parameter of the first device according to the ratio of the received audio energy to the test own audio energy.
In a possible implementation manner, the apparatus further includes a third determining unit, a fourth determining unit, and a wake-up unit:
the third determining unit is configured to determine a second total audio energy received by the second device;
the fourth determining unit is configured to determine second sound source audio energy of the second device according to the self audio energy and the second total audio energy, where the second sound source audio energy and the first sound source audio energy are from a same sound source;
the awakening unit is used for awakening the voice interaction function of the second equipment if the second sound source audio energy is larger than the first sound source audio energy;
and if the second sound source audio energy is smaller than the first sound source audio energy, waking up the voice interaction function of the first device.
In a possible implementation manner, the fourth determining unit is specifically configured to:
determining nonlinear energy of the second device according to the self audio energy, wherein the nonlinear energy is generated based on vibration generated when the second device plays audio;
and removing the self audio energy and the nonlinear energy in the second total audio energy to obtain the second sound source audio energy.
In a possible implementation manner, the fourth determining unit is specifically configured to:
and determining nonlinear energy of the second equipment according to the self audio energy through a neural network model.
In one possible implementation, the neural network model is trained by:
acquiring a training sample set, wherein the training sample set comprises sample self audio energy and sample total audio energy acquired by target equipment in an environment without other sound sources;
determining sample nonlinear energy of the target device according to the sample total audio energy and the sample own audio energy;
and training an initial neural network model through the total audio energy of the sample, the self audio energy of the sample and the nonlinear energy of the sample to obtain the neural network model.
In one possible implementation, the neural network model is a convolutional recurrent network model having a single-layer encoding layer and a single-layer decoding layer.
In a third aspect, embodiments of the present application disclose a computer-readable storage medium storing computer-executable instructions which, when executed by at least one processor, perform the audio energy analysis method of any of the first aspects.
In a fourth aspect, embodiments of the present application disclose a computer device comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, perform the audio energy analysis method of any of the first aspects.
As can be seen from the above technical solution, when performing audio energy analysis, the processing device may first determine an energy loss parameter of the first device, which identifies the loss incurred when the second device transmits audio energy to the first device. The processing device may also acquire the first total audio energy received by the first device and the own audio energy the second device generates by playing audio, and combine these quantities to obtain the audio energy the first device receives from the voice-interaction sound source. This eliminates the interference of the second device's self-played audio with voice-interaction recognition, so the device the user wants to interact with can be identified more accurately on the basis of audio energy, improving the user's voice-interaction experience.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of an audio energy analysis method according to an embodiment of the present application;
fig. 2 is a block diagram of an audio energy analysis device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
It will be appreciated that the method may be applied to a processing device capable of audio energy analysis, for example a terminal device or a server with audio energy analysis functionality. The method can be performed independently by the terminal device or the server, or applied in a network scenario involving communication between a terminal device and a server and executed by the two in cooperation. The terminal device can be a computer, a mobile phone or other equipment. The server can be an application server or a Web server; in actual deployment it can be an independent server or a server cluster.
Referring to fig. 1, fig. 1 is a flowchart of an audio energy analysis method according to an embodiment of the present application, where the method includes:
s101: an energy loss parameter of the first device is determined.
Wherein the energy loss parameter is used to identify the loss when the second device transmits audio energy to the first device, i.e. how much of the audio energy the first device is able to receive when that energy emanates from the second device. For example, the energy loss parameter may be 0.9: when the second device emits audio energy, the first device receives 90% of the emitted audio energy.
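As a toy illustration of what the parameter expresses (the function name and numbers below are hypothetical, not taken from the patent), the energy arriving at the first device is the emitted energy scaled by the loss parameter:

```python
def received_energy(emitted_energy: float, loss_parameter: float) -> float:
    """Audio energy arriving at the first device when the second device
    emits `emitted_energy`; `loss_parameter` is the surviving fraction."""
    return loss_parameter * emitted_energy

# With a loss parameter of 0.9, the first device receives 90% of the energy.
print(received_energy(100.0, 0.9))  # 90.0
```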
S102: a first total audio energy of the first device and an own audio energy of the second device are acquired.
The first total audio energy is the audio energy received by the first device, and the own audio energy is the energy generated by the audio the second device plays. It will be appreciated that, in a scenario where a device plays its own audio, the audio energy received by the first device comprises two parts: audio energy generated by the user performing voice interaction, and audio energy generated by the audio emitted by the second device. In a multi-device scenario, accurately determining which device the user wants to interact with typically relies on the audio energy each device receives from the user. Therefore, in the embodiment of the present application, the processing device first removes the portion of the first total audio energy that originates from the second device, so that accurate voice interaction can be performed.
S103: a first source audio energy of the first device is determined based on the energy loss parameter, the own audio energy, and the first total audio energy.
The energy loss parameter identifies the loss incurred when the second device transmits audio energy to the first device, and the own audio energy is the audio energy generated by the second device's playback. The processing device can therefore use the energy loss parameter and the own audio energy to determine the audio energy the first device receives from the second device, remove that energy from the first total audio energy, and obtain the first sound source audio energy of the first device. The first sound source audio energy is the audio energy the first device obtains from the sound source performing the voice interaction, a sound source distinct from the second device, for example the user performing the voice interaction.
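A minimal sketch of this subtraction step, assuming per-bin energy lists and an illustrative loss parameter (none of the names or values below come from the patent):

```python
def first_source_energy(total_energy, self_energy, loss_parameter):
    """Per-bin first sound-source audio energy: subtract from the first
    total audio energy the portion attributable to the second device's
    own playback, scaled by the energy loss parameter."""
    return [y - loss_parameter * e for y, e in zip(total_energy, self_energy)]

y_total = [5.0, 4.0, 3.0]   # first total audio energy per frequency bin
e_self = [2.0, 2.0, 2.0]    # second device's own audio energy per bin
print(first_source_energy(y_total, e_self, 0.5))  # [4.0, 3.0, 2.0]
```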
As can be seen from the above technical solution, when performing audio energy analysis, the processing device may first determine an energy loss parameter of the first device, which identifies the loss incurred when the second device transmits audio energy to the first device. The processing device may also acquire the first total audio energy received by the first device and the own audio energy the second device generates by playing audio, and combine these quantities to obtain the audio energy the first device receives from the voice-interaction sound source. This eliminates the interference of the second device's self-played audio with voice-interaction recognition, so the device the user wants to interact with can be identified more accurately on the basis of audio energy, improving the user's voice-interaction experience.
In one possible implementation, the energy loss parameter may be determined as follows. In an environment with no sound sources other than the second device, the processing device determines the test own audio energy of the second device while it plays audio, i.e. the audio energy generated by the played audio, and the received audio energy the first device receives from the second device. The processing device then determines the energy loss parameter of the first device as the ratio of the received audio energy to the test own audio energy. This ratio reflects the difference between the audio energy the first device actually receives and the audio energy the second device emits, and thus identifies the loss incurred when the second device transmits audio energy to the first device.
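The ratio computation might be sketched as follows (hypothetical helper; the small epsilon guarding against division by zero is an added assumption):

```python
def energy_loss_parameter(received, emitted, eps=1e-12):
    """Per-bin energy loss parameter: ratio of the energy the first device
    receives to the test own audio energy the second device emits,
    measured with no other sound sources present."""
    return [r / max(e, eps) for r, e in zip(received, emitted)]

x_emit = [10.0, 8.0]  # test own audio energy (illustrative)
x_recv = [9.0, 6.0]   # received audio energy at the first device
print(energy_loss_parameter(x_recv, x_emit))  # [0.9, 0.75]
```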
In one possible implementation, to determine whether the user wants to interact with the second device or the first device, the processing device may further determine the second total audio energy received by the second device, and then determine the second sound source audio energy of the second device from the own audio energy and the second total audio energy, that is, by removing the own-audio-energy portion from the second total audio energy; the second sound source audio energy and the first sound source audio energy come from the same sound source. The processing device can then compare the second sound source audio energy with the first sound source audio energy; this comparison reflects, to a certain extent, which device the user intends to interact with. If the second sound source audio energy is greater than the first sound source audio energy, the user more likely intends to interact with the second device, and the processing device can wake up the voice interaction function of the second device; if the second sound source audio energy is smaller, the user intends to interact with the first device, and the processing device can wake up the voice interaction function of the first device.
For example, consider two smart speakers A and B. The two speakers are configured with the same wake-up word and have a distributed wake-up function. When both devices receive the wake-up word, the processing device can compute the audio energy and upload it to the cloud for a decision, and the cloud selects the device that should respond according to the scoring criterion.
Firstly, the cloud can instruct speaker A to play white-noise audio for a period of time. The audio signals received by speakers A and B are converted by STFT, and the average audio energy over this period is computed as X_A(k) and X_{A→B}(k). Taking a 16 kHz sampling rate and an FFT length of 512 as an example, with the statistical frequency range chosen as 200-5000 Hz (corresponding to bins k = 3-160), the energy loss parameter of speaker B is C_{A→B}(k) = X_{A→B}(k) / X_A(k).
Conversely, the energy loss parameter of speaker A can be obtained as C_{B→A}(k) = X_{B→A}(k) / X_B(k).
Wherein X_A(k) and X_{A→B}(k) are computed on-device and uploaded to the cloud; the cloud computes C_{A→B}(k) and pushes it to device A, and C_{B→A}(k) is obtained and pushed in the same way.
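The calibration above can be sketched as follows; this is a simplified stand-in in which per-frame bin energies are passed in directly rather than computed by STFT, and bins 3-160 approximate the 200-5000 Hz band at 16 kHz with a 512-point FFT:

```python
def band_mean_energy(frames, k_lo=3, k_hi=161):
    """Average each frequency bin over time frames, keeping bins k_lo..k_hi-1."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(k_lo, k_hi)]

def loss_curve(emit_frames, recv_frames):
    """C_{A->B}(k) = X_{A->B}(k) / X_A(k) over the selected band."""
    x_a = band_mean_energy(emit_frames)
    x_ab = band_mean_energy(recv_frames)
    return [b / a for a, b in zip(x_a, x_ab)]

# Two synthetic frames of 257 bins each: B receives half of A's energy.
emit = [[2.0] * 257, [2.0] * 257]
recv = [[1.0] * 257, [1.0] * 257]
c = loss_curve(emit, recv)
print(len(c), c[0])  # 158 0.5
```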
When device A is playing audio and the user approaches A and utters the wake-up word, the audio energy received by A comprises:
Y_A(l,k) = S_A(l,k) + E_A(l,k), where S_A(l,k) denotes the sound-source audio energy at speaker A, E_A(l,k) denotes speaker A's own audio energy, and Y_A(l,k) denotes the second total audio energy.
The audio energy received by speaker B at this point comprises:
Y_{A→B}(l,k) = S_B(l,k) + E_{A→B}(l,k), where S_B(l,k) denotes the sound-source audio energy at speaker B and E_{A→B}(l,k) denotes the audio energy speaker B receives from the audio emitted by speaker A.
Through AEC (acoustic echo cancellation), the sound-source audio energy of speaker A can be obtained as Y'_A(l,k) = S_A(l,k), so that E_A(l,k) = Y_A(l,k) - Y'_A(l,k). After correction, E_{A→B}(l,k) = E_A(l,k) * C_{A→B}(l,k), from which the sound-source audio energy S_B(l,k) of speaker B can be determined.
By comparing the magnitudes of S_B(l,k) and S_A(l,k), the processing device can determine which speaker the user actually wants to wake up.
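That final comparison might look like this (summing the source energy over bins is an assumption; the patent only specifies a magnitude comparison):

```python
def speaker_to_wake(s_a, s_b):
    """Compare the sound-source energies (summed over bins) received by the
    two speakers and pick the larger, per the decision rule in the example."""
    return "A" if sum(s_a) >= sum(s_b) else "B"

print(speaker_to_wake([4.0, 3.0], [1.0, 2.0]))  # A
print(speaker_to_wake([1.0, 1.0], [2.0, 3.0]))  # B
```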
It will be appreciated that, because of vibrations and similar effects produced during playback, the audio energy a device receives from its own playback may not equal the energy of the audio being played. Thus, in one possible implementation, for more accurate audio energy analysis when determining the sound-source audio energy of the second device, the processing device may determine the nonlinear energy of the second device from its own audio energy, this nonlinear energy being generated by vibrations produced when the second device plays audio. The processing device can then remove both the own audio energy and the nonlinear energy from the second total audio energy to obtain a more accurate second sound source audio energy.
In one possible implementation, the processing device may determine the nonlinear energy of the second device from its own audio energy, in particular, by means of a neural network model. The neural network model may be trained by:
first, the processing device may acquire a training sample set that includes the sample's own audio energy and the sample's total audio energy that the target device acquired in an environment without other sound sources. Since the sample self audio energy is determined directly based on the audio information played by the target device itself, the sample self audio energy is the linear audio energy of the target device; the sample total audio energy is the total audio energy received from the target device, i.e. comprises linear audio energy and nonlinear audio energy, so that the processing device can determine the sample nonlinear energy of the target device according to the sample total audio energy and the sample own audio energy, and then train an initial neural network model through the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model, and in the training process, the neural network model can learn the association relation between the nonlinear audio energy part and the linear audio energy part, so as to learn how to determine the nonlinear audio energy part based on the linear audio energy part.
For example, the microphone signal Mic (i.e., the sample total audio energy) can be echo-cancelled using the reference playback signal Ref (i.e., the sample own audio energy) to yield Aec(l,k) (the sample nonlinear energy). Linear methods such as NLMS and RLS remove the linear audio energy component E_linear(l,k), but a portion of the nonlinear audio energy component E_residual(l,k) still remains.
The speech signals Mic, Ref and Aec are complex-valued frequency-domain signals obtained by STFT. Taking a 16 kHz sampling rate, a 16 ms frame length and an FFT length of 512 as an example, each frequency-domain frame is a set of 257 complex values (512/2 + 1 = 257, by the conjugate symmetry of the FFT). Taking the absolute value of the complex signal and converting to the Bark domain yields 64-dimensional data. The model input, the concatenated Bark values of Aec, Ref and Mic, is therefore a 64 x 3 vector per frame.
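A rough sketch of assembling one input frame; the uniform band pooling below is a crude stand-in for the true Bark-scale conversion, whose band edges differ:

```python
def to_bands(mag, n_bands=64):
    """Pool 257 magnitude bins into n_bands bands with uniform edges, a crude
    stand-in for the Bark-domain conversion (true Bark band edges differ)."""
    n = len(mag)
    edges = [round(i * n / n_bands) for i in range(n_bands + 1)]
    return [sum(mag[edges[i]:edges[i + 1]]) / (edges[i + 1] - edges[i])
            for i in range(n_bands)]

def model_input_frame(aec, ref, mic):
    """One frame of model input: concatenated band values, shape 64 x 3."""
    return [list(v) for v in zip(to_bands(aec), to_bands(ref), to_bands(mic))]

frame = [1.0] * 257  # |STFT| magnitudes for one frame (illustrative)
x = model_input_frame(frame, frame, frame)
print(len(x), len(x[0]))  # 64 3
```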
In one possible implementation, the neural network model is a convolutional recurrent network model with a single-layer encoder and a single-layer decoder, so it is suitable for devices with limited processing power while still providing the audio energy analysis functionality. For example, the model adopts a Convolutional Recurrent Network (CRN) structure; considering on-device memory and performance constraints, the encoder and decoder each have only one layer. Specifically, the encoder layer is a one-dimensional convolution with 192 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU; the enhancement layer is an LSTM with input size 64 and hidden size 64; the decoder layer is a two-dimensional convolution with 64+64 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU; the activation function is sigmoid. The loss function is MSE, the mean squared error between the estimated and true residual echo components. Adam is used as the optimizer, with a learning rate of 0.001, β1 = 0.9 and β2 = 0.999.
The model input dimension is 64 x 3 and the output dimension is 64. Converted to the frequency domain, the output gives the gain value G(l,k), and E_residual(l,k) = (Mic(l,k) - E_linear(l,k)) * G(l,k); that is, after the linear audio energy is removed from the total audio energy received by the device, the nonlinear audio energy can be determined by applying the gain value. Through this process, the model learns how to determine the nonlinear audio-energy portion from the linear audio-energy portion.
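The gain application can be sketched per band as follows (illustrative values; Mic, E_linear and G here are plain energy lists rather than full spectrograms):

```python
def residual_echo_energy(mic, e_linear, gain):
    """E_residual(l,k) = (Mic(l,k) - E_linear(l,k)) * G(l,k): after removing
    the linear echo estimate, the model's gain scales what remains to give
    the nonlinear (residual) component."""
    return [(m - e) * g for m, e, g in zip(mic, e_linear, gain)]

print(residual_echo_energy([5.0, 4.0], [3.0, 1.0], [0.5, 0.25]))  # [1.0, 0.75]
```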
Based on the audio energy analysis method provided in the foregoing embodiments, the present application further provides an audio energy analysis device, referring to fig. 2, fig. 2 is a block diagram of an audio energy analysis device 200 provided in the embodiment of the present application, where the device includes a first determining unit 201, an obtaining unit 202, and a second determining unit 203:
the first determining unit 201 is configured to determine an energy loss parameter of a first device, where the energy loss parameter is used to identify a loss when a second device transmits audio energy to the first device;
the acquiring unit 202 is configured to acquire a first total audio energy of the first device and an own audio energy of the second device, where the first total audio energy is audio energy received by the first device, and the own audio energy is energy generated based on audio played by the second device;
the second determining unit 203 is configured to determine a first sound source audio energy of the first device according to the energy loss parameter, the self audio energy, and the first total audio energy, where the first sound source audio energy is an audio energy obtained by the first device from a sound source.
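A minimal sketch of the second determining unit's computation (function and variable names are illustrative assumptions, not taken from the patent): the second device's playback reaches the first device attenuated by the energy loss parameter, and what remains of the first total audio energy after that contribution is removed is attributed to the sound source.

```python
def first_sound_source_energy(first_total_energy, self_energy, loss_param):
    """First sound source audio energy of the first device.

    The second device's self audio energy arrives at the first device
    scaled by the energy loss parameter; subtracting that contribution
    from the first total audio energy leaves the energy obtained from
    the sound source.
    """
    return first_total_energy - loss_param * self_energy

# Example: the first device receives 10.0 in total, the second device
# plays at 8.0, and half of that energy is lost in transit.
print(first_sound_source_energy(10.0, 8.0, 0.5))  # 6.0
```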
In one possible implementation, the energy loss parameter is determined as follows:
determining, in an environment with no sound sources other than the second device, a test self audio energy of the second device while playing audio and a received audio energy received by the first device from the second device;
and determining the energy loss parameter of the first device as the ratio of the received audio energy to the test self audio energy.
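This calibration can be sketched as follows (names are illustrative): with no other sound sources present, everything the first device hears comes from the second device, so the ratio of received to emitted energy directly gives the loss parameter.

```python
def calibrate_energy_loss(received_energy, test_self_energy):
    """Energy loss parameter of the first device: the fraction of the
    second device's played audio energy that survives the path to the
    first device, measured with no other sound sources present."""
    return received_energy / test_self_energy

# Example: the second device emits 8.0 and the first device receives 2.0.
print(calibrate_energy_loss(2.0, 8.0))  # 0.25
```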
In one possible implementation, the apparatus further includes a third determining unit, a fourth determining unit, and a wake-up unit:
the third determining unit is configured to determine a second total audio energy received by the second device;
the fourth determining unit is configured to determine second sound source audio energy of the second device according to the self audio energy and the second total audio energy, where the second sound source audio energy and the first sound source audio energy are from a same sound source;
the wake-up unit is configured to wake up the voice interaction function of the second device if the second sound source audio energy is greater than the first sound source audio energy, and to wake up the voice interaction function of the first device if the second sound source audio energy is less than the first sound source audio energy.
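The wake-up decision above can be sketched as follows (names are illustrative; the behavior when the two energies are equal is not specified in the patent):

```python
def device_to_wake(first_source_energy, second_source_energy):
    """Wake the voice interaction function on the device that captured
    more audio energy from the shared sound source, i.e. the device the
    speaker is presumably closer to."""
    if second_source_energy > first_source_energy:
        return "second"
    if second_source_energy < first_source_energy:
        return "first"
    return None  # equal energies: tie-break unspecified in the patent

print(device_to_wake(3.0, 5.0))  # second
```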
In one possible implementation, the fourth determining unit is specifically configured to:
determine nonlinear energy of the second device according to the self audio energy, wherein the nonlinear energy is energy produced by the vibration of the second device when it plays audio;
and removing the self audio energy and the nonlinear energy in the second total audio energy to obtain the second sound source audio energy.
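A sketch of this subtraction (names illustrative): the second total audio energy comprises the second device's own playback (linear), its vibration-induced nonlinear energy, and the sound source's contribution; removing the first two leaves the second sound source audio energy.

```python
def second_sound_source_energy(second_total_energy, self_energy, nonlinear_energy):
    """Second sound source audio energy: the part of the second device's
    received energy not explained by its own playback (linear part) or
    by the vibration it causes (nonlinear part)."""
    return second_total_energy - self_energy - nonlinear_energy

# Example: 10.0 received in total, 6.0 of own playback, 1.5 of
# vibration-induced nonlinear energy.
print(second_sound_source_energy(10.0, 6.0, 1.5))  # 2.5
```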
In one possible implementation, the fourth determining unit is specifically configured to:
determine the nonlinear energy of the second device from the self audio energy by means of a neural network model.
In one possible implementation, the neural network model is trained by:
acquiring a training sample set, wherein the training sample set comprises sample self audio energy and sample total audio energy acquired by target equipment in an environment without other sound sources;
determining sample nonlinear energy of the target device according to the sample total audio energy and the sample own audio energy;
and training an initial neural network model through the total audio energy of the sample, the self audio energy of the sample and the nonlinear energy of the sample to obtain the neural network model.
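The labeling step can be sketched as follows (names illustrative): because the target device records in an environment with no other sound sources, anything in the sample total audio energy beyond the sample self audio energy is attributed to the nonlinear component and used as the training target for the neural network.

```python
def sample_nonlinear_energy(sample_total_energy, sample_self_energy):
    """Training label for the neural network: with no other sound
    sources present, the excess of recorded energy over played (linear)
    energy is the vibration-induced nonlinear component."""
    return sample_total_energy - sample_self_energy

# Example: the device records 9.0 in total while playing at 6.5.
print(sample_nonlinear_energy(9.0, 6.5))  # 2.5
```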
In one possible implementation, the neural network model is a convolutional recurrent network model having a single encoder layer and a single decoder layer.
The application further discloses a computer-readable storage medium storing instructions that, when executed by at least one processor, perform the audio energy analysis method of any of the above embodiments.
The application also discloses a computer device comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, perform the audio energy analysis method of any of the above embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by program instructions directing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium may be any medium that can store program code, such as read-only memory (ROM), RAM, a magnetic disk, or an optical disk.
It should be noted that the embodiments in this specification are described in a progressive manner: identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of audio energy analysis, the method comprising:
determining an energy loss parameter of a first device, the energy loss parameter being used to identify a loss in a second device delivering audio energy to the first device;
acquiring first total audio energy of the first device and self audio energy of the second device, wherein the first total audio energy is audio energy received by the first device, and the self audio energy is energy generated based on audio played by the second device;
and determining first sound source audio energy of the first device according to the energy loss parameter, the self audio energy and the first total audio energy, wherein the first sound source audio energy is the audio energy acquired from a sound source by the first device.
2. The method of claim 1, wherein the energy loss parameter is determined by:
determining, in an environment with no sound sources other than the second device, a test self audio energy of the second device while playing audio and a received audio energy received by the first device from the second device;
and determining the energy loss parameter of the first device as the ratio of the received audio energy to the test self audio energy.
3. The method according to claim 1, wherein the method further comprises:
determining a second total audio energy received by the second device;
determining second sound source audio energy of the second device according to the self audio energy and the second total audio energy, wherein the second sound source audio energy and the first sound source audio energy are from the same sound source;
if the second sound source audio energy is larger than the first sound source audio energy, waking up a voice interaction function of the second device;
and if the second sound source audio energy is smaller than the first sound source audio energy, waking up the voice interaction function of the first device.
4. The method according to claim 3, wherein the determining second sound source audio energy of the second device according to the self audio energy and the second total audio energy comprises:
determining nonlinear energy of the second device according to the self audio energy, wherein the nonlinear energy is energy produced by the vibration of the second device when it plays audio;
and removing the self audio energy and the nonlinear energy in the second total audio energy to obtain the second sound source audio energy.
5. The method of claim 4, wherein said determining the nonlinear energy of the second device from the self audio energy comprises:
and determining the nonlinear energy of the second device from the self audio energy by means of a neural network model.
6. The method of claim 5, wherein the neural network model is trained by:
acquiring a training sample set, wherein the training sample set comprises sample self audio energy and sample total audio energy acquired by target equipment in an environment without other sound sources;
determining sample nonlinear energy of the target device according to the sample total audio energy and the sample own audio energy;
and training an initial neural network model through the total audio energy of the sample, the self audio energy of the sample and the nonlinear energy of the sample to obtain the neural network model.
7. The method of claim 5, wherein the neural network model is a convolutional recurrent network model having a single encoder layer and a single decoder layer.
8. An audio energy analysis device, characterized in that the device comprises a first determination unit, an acquisition unit and a second determination unit:
the first determining unit is configured to determine an energy loss parameter of a first device, where the energy loss parameter is used to identify a loss when a second device transmits audio energy to the first device;
the acquiring unit is configured to acquire a first total audio energy of the first device and an own audio energy of the second device, where the first total audio energy is audio energy received by the first device, and the own audio energy is energy generated based on audio played by the second device;
the second determining unit is configured to determine, according to the energy loss parameter, the own audio energy, and the first total audio energy, first sound source audio energy of the first device, where the first sound source audio energy is audio energy obtained by the first device from a sound source.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, perform the audio energy analysis method of any of claims 1-7.
10. A computer device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, perform the audio energy analysis method of any of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210697612.5A CN117292691A (en) | 2022-06-20 | 2022-06-20 | Audio energy analysis method and related device |
PCT/CN2022/102036 WO2023245700A1 (en) | 2022-06-20 | 2022-06-28 | Audio energy analysis method and related apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210697612.5A CN117292691A (en) | 2022-06-20 | 2022-06-20 | Audio energy analysis method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117292691A (en) | 2023-12-26
Family
ID=89252361
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210697612.5A Pending CN117292691A (en) | 2022-06-20 | 2022-06-20 | Audio energy analysis method and related device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117292691A (en) |
WO (1) | WO2023245700A1 (en) |
Also Published As
Publication number | Publication date |
---|---|
WO2023245700A1 (en) | 2023-12-28 |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||