WO2023245700A1 - Audio energy analysis method and related apparatus - Google Patents

Audio energy analysis method and related apparatus

Info

Publication number
WO2023245700A1
WO2023245700A1 (PCT/CN2022/102036)
Authority
WO
WIPO (PCT)
Prior art keywords
energy
audio energy
audio
sound source
total
Prior art date
Application number
PCT/CN2022/102036
Other languages
English (en)
French (fr)
Inventor
郝斌
Original Assignee
青岛海尔科技有限公司
海尔智家股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 青岛海尔科技有限公司 and 海尔智家股份有限公司
Publication of WO2023245700A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • the present disclosure relates to the field of data analysis technology, and in particular to an audio energy analysis method and related devices.
  • Voice interaction is one of the commonly used human-computer interaction methods today. When there are multiple devices that support voice interaction in the scene, the relevant device needs to determine which device the user really wants to interact with and perform corresponding voice interaction.
  • the processing device can analyze the audio interference of the device itself, thereby accurately identifying the sound source of voice interaction and improving the user's voice interaction experience.
  • an embodiment of the present disclosure discloses an audio energy analysis method, which method includes:
  • determining an energy loss parameter corresponding to a second device, where the energy loss parameter is used to identify the loss when a first device transmits audio energy to the second device;
  • acquiring a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, where the second total audio energy is the audio energy received by the second device, and the own audio energy is energy generated based on audio played by the first device;
  • determining, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, where the second sound source audio energy is the audio energy that the second device obtains from the sound source.
  • an embodiment of the present disclosure discloses an audio energy analysis device, which includes a first determination unit, an acquisition unit, and a second determination unit:
  • the first determining unit is configured to determine the energy loss parameter corresponding to the second device, where the energy loss parameter is used to identify the loss when the first device transmits audio energy to the second device;
  • the acquisition unit is configured to acquire the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device, where the second total audio energy is the audio energy received by the second device, and the own audio energy is energy generated based on the audio played by the first device;
  • the second determination unit is configured to determine, according to the energy loss parameter, the own audio energy and the second total audio energy, the second sound source audio energy corresponding to the second device, where the second sound source audio energy is the audio energy that the second device obtains from the sound source.
  • an embodiment of the present disclosure discloses a computer-readable storage medium.
  • when instructions in the computer-readable storage medium are run by at least one processor, the at least one processor performs the audio energy analysis method of any one of the first aspect.
  • an embodiment of the present disclosure discloses a computer device, including:
  • at least one processor;
  • at least one memory storing computer-executable instructions,
  • wherein, when the computer-executable instructions are run by the at least one processor, the at least one processor performs the audio energy analysis method according to any one of the first aspect.
  • the processing device when performing audio energy analysis, can first determine the energy loss parameter corresponding to the second device.
  • the energy loss parameter can identify the loss when the first device transmits audio energy to the second device.
  • the processing device can obtain the second total audio energy received by the second device and the own audio energy that the first device generates by playing audio, and analyze these data together to obtain the audio energy that the second device receives from the sound source of the voice interaction.
  • this eliminates the interference of the first device's self-played audio on voice interaction recognition, so the device that the user wants to interact with can be analyzed more accurately based on audio energy, improving the user's voice interaction experience.
  • Figure 1 is a flow chart of an audio energy analysis method provided by an embodiment of the present disclosure
  • Figure 2 is a structural block diagram of an audio energy analysis device provided by an embodiment of the present disclosure
  • Figure 3 is a structural block diagram of an optional computer device according to an embodiment of the present disclosure.
  • this method can be applied to a processing device, which is a processing device capable of audio energy analysis, for example, it can be a terminal device or a server with an audio energy analysis function.
  • This method can be executed independently by the terminal device or the server, or can be applied to a network scenario in which the terminal device and the server communicate, and can be executed by the terminal device and the server in cooperation.
  • the terminal device can be a computer, a mobile phone and other devices.
  • the server can be understood as an application server or a Web server. In actual deployment, the server can be an independent server or a cluster server.
  • Figure 1 is a flow chart of an audio energy analysis method provided by an embodiment of the present disclosure.
  • the method includes:
  • S101: Determine the energy loss parameter corresponding to the second device.
  • the energy loss parameter is used to identify the loss when the first device transmits audio energy to the second device, that is, how much of the audio energy emitted by the first device the second device can receive.
  • for example, the energy loss parameter may be 0.9, meaning that when the first device emits audio energy, the second device receives 90% of the emitted audio energy.
  • S102: Obtain the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device.
  • the second total audio energy is the audio energy received by the second device, and the own audio energy is the energy generated based on the audio played by the first device.
  • in a scenario where a device plays its own audio, the audio energy received by the second device includes two parts: one part is the audio energy produced by the audio uttered by the user performing voice interaction, and the other part is the audio energy produced by the audio emitted by the first device.
  • in a multi-device scenario, which device the user wants to interact with is usually judged from the audio energy each device receives from that user; therefore, the processing device needs to first remove the audio energy portion coming from the first device from the second total audio energy before accurate voice interaction can be performed.
  • S103: Determine the second sound source audio energy corresponding to the second device according to the energy loss parameter, the own audio energy and the second total audio energy.
  • the energy loss parameter can identify the loss when the first device transmits audio energy to the second device.
  • the own audio energy is the audio energy generated by the first device playing audio. Therefore, the processing device can use the energy loss parameter and the own audio energy to determine the audio energy that the second device receives from the first device, and remove this part of the energy from the second total audio energy to obtain the second sound source audio energy corresponding to the second device.
  • the audio energy of the second sound source is the audio energy obtained by the second device from the sound source.
  • the sound source is a sound source performing voice interaction, for example, it may be a user performing voice interaction.
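  • As a minimal sketch of S101-S103 in code form, assuming per-frame, per-bin energy arrays and a per-bin loss parameter (the function and variable names are illustrative, not from the disclosure):

```python
import numpy as np

def second_source_energy(Y2, E1, C):
    """Estimate the audio energy the second device obtains from the sound source.

    Y2: second total audio energy, shape (frames, bins)
    E1: first device's own audio energy, shape (frames, bins)
    C:  energy loss parameter per frequency bin, shape (bins,)
    """
    E1_to_2 = E1 * C               # energy arriving at device 2 from device 1
    S2 = Y2 - E1_to_2              # remove the first device's contribution
    return np.clip(S2, 0.0, None)  # clamp: energies cannot be negative
```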
  • the processing device when performing audio energy analysis, can first determine the energy loss parameter corresponding to the second device.
  • the energy loss parameter can identify the loss when the first device transmits audio energy to the second device.
  • the processing device can obtain the second total audio energy received by the second device and the own audio energy that the first device generates by playing audio, and analyze these data together to obtain the audio energy that the second device receives from the sound source of the voice interaction.
  • this eliminates the interference of the first device's self-played audio on voice interaction recognition, so the device that the user wants to interact with can be analyzed more accurately based on audio energy, improving the user's voice interaction experience.
  • in one possible implementation, the energy loss parameter can be obtained as follows.
  • in an environment without sound sources other than the first device, the processing device can determine the test own audio energy corresponding to the first device when it plays audio, and the received audio energy that the second device receives from the first device; the test own audio energy refers to the audio energy produced by the audio played by the first device.
  • the processing device can then determine the energy loss parameter corresponding to the second device from the ratio of the received audio energy to the test own audio energy. This ratio reflects the difference between the audio energy actually received by the second device and the audio energy emitted by the first device, and can therefore identify the loss when the first device transmits audio energy to the second device.
  • in one possible implementation, in order to determine whether the user wants to perform voice interaction with the first device or the second device, the processing device can also determine the first total audio energy received by the first device, and then determine the first sound source audio energy corresponding to the first device according to the own audio energy and the first total audio energy, that is, remove the own audio energy portion from the first total audio energy. The first sound source audio energy and the second sound source audio energy come from the same sound source.
  • the processing device can compare the magnitudes of the first sound source audio energy and the second sound source audio energy; this relationship reflects, to a certain extent, the user's intention to interact with the first device or the second device.
  • in response to the first sound source audio energy being greater than the second sound source audio energy, which indicates that the user more likely wants to interact with the first device, the processing device can wake up the voice interaction function of the first device.
  • for example, consider two speakers A and B. Both speakers are configured with the same wake-up word and support distributed wake-up.
  • when the two devices receive the wake-up word, the processing device can calculate the corresponding audio energy and upload it to the cloud for decision-making.
  • the cloud selects the device that should respond based on scoring criteria.
  • first, the cloud can control speaker A to play white noise for a period of time. The audio signals received by A and B are transformed by STFT, and the average audio energy over that period is recorded as X_A(k) and X_{A→B}(k).
  • taking a 16 kHz sampling rate as an example, with an FFT length of 512 and a frequency statistics range of 200-5000 Hz (frequency bins k = 3-160), the energy loss parameter corresponding to speaker B is C_{A→B}(k) = X_{A→B}(k) / X_A(k); conversely, the energy loss parameter corresponding to speaker A is C_{B→A}(k) = X_{B→A}(k) / X_B(k).
  • X_A(k) and X_{A→B}(k) are computed on the devices and uploaded to the cloud; after computing C_{A→B}(k), the cloud pushes it to device A, and C_{B→A}(k) is pushed in the same way.
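  • A sketch of this calibration step under the stated assumptions (16 kHz sampling, FFT length 512, bins k = 3-160); the helper names and the epsilon guard are illustrative additions:

```python
import numpy as np
from scipy.signal import stft

FS, NFFT = 16000, 512
K_LO, K_HI = 3, 161  # bins k = 3..160, the 200-5000 Hz range stated above

def band_energy(x):
    """Average per-bin energy |STFT(x)|^2 over the calibration period."""
    _, _, Z = stft(x, fs=FS, nperseg=NFFT)
    return (np.abs(Z) ** 2).mean(axis=1)[K_LO:K_HI]

def loss_parameter(x_emitter, x_receiver):
    """C(k) = X_received(k) / X_emitted(k) while the emitter plays white noise."""
    X_e = band_energy(x_emitter)
    X_r = band_energy(x_receiver)
    return X_r / (X_e + 1e-12)  # epsilon avoids division by zero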
  • when device A is playing audio and the user approaches A and utters the wake-up word, the audio energy received by A includes:
  • Y_A(l,k) = S_A(l,k) + E_A(l,k), where S_A(l,k) represents the sound source audio energy corresponding to speaker A, E_A(l,k) represents the own audio energy corresponding to speaker A, and Y_A(l,k) represents the first total audio energy.
  • the audio energy received by speaker B at this moment includes:
  • Y_{A→B}(l,k) = S_B(l,k) + E_{A→B}(l,k), where S_B(l,k) represents the sound source audio energy corresponding to speaker B, and E_{A→B}(l,k) represents the audio energy that speaker B receives from the audio emitted by speaker A.
  • after AEC, the sound source audio energy corresponding to speaker A can be obtained as Y'_A(l,k) = S_A(l,k), so E_A(l,k) = Y_A(l,k) - Y'_A(l,k). After correction, E_{A→B}(l,k) = E_A(l,k) * C_{A→B}(l,k), so that the sound source audio energy S_B(l,k) corresponding to speaker B can be determined.
  • based on the relative magnitudes of S_B(l,k) and S_A(l,k), the processing device can determine the speaker that the user actually wants to wake up.
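  • Putting the pieces together, the wake-up arbitration for speakers A and B might be sketched as follows; the AEC output Y'_A is assumed to come from an existing echo canceller, and summing over all frames and bins is one plausible scoring choice rather than one the disclosure prescribes:

```python
import numpy as np

def arbitrate(Y_A, Y_A_prime, Y_AB, C_AB):
    """Decide which speaker the user intends to wake.

    Y_A:       total energy received at speaker A, shape (frames, bins)
    Y_A_prime: AEC output at A, i.e. Y'_A = S_A
    Y_AB:      total energy received at speaker B
    C_AB:      energy loss parameter from A to B, shape (bins,)
    """
    S_A = Y_A_prime                        # sound-source energy at A (after AEC)
    E_A = Y_A - Y_A_prime                  # speaker A's own playback energy
    E_AB = E_A * C_AB                      # A's playback as heard at B, corrected
    S_B = np.clip(Y_AB - E_AB, 0.0, None)  # sound-source energy at B
    return "A" if S_A.sum() > S_B.sum() else "B"
```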
  • it can be understood that, because of vibration produced when playing audio, the audio energy a device receives from its own playback may not equal the played audio energy. Therefore, in one possible implementation, to make the audio energy analysis more precise, when analyzing the sound source audio energy corresponding to the first device, the processing device can determine, based on the own audio energy, the nonlinear energy corresponding to the first device; the nonlinear energy is generated by the vibration produced when the first device plays audio. The processing device may remove the own audio energy and the nonlinear energy from the first total audio energy to obtain the first sound source audio energy, thereby obtaining more accurate sound source energy.
  • the processing device can determine the nonlinear energy corresponding to the first device according to its own audio energy through a neural network model.
  • the neural network model can be trained in the following ways:
  • first, the processing device can obtain a training sample set, which includes the sample own audio energy and the sample total audio energy collected by a target device in an environment without other sound sources. Since the sample own audio energy is determined directly from the audio information played by the target device itself, it is the linear audio energy corresponding to the target device; the sample total audio energy is the total audio energy received from the target device, which includes both linear and nonlinear audio energy.
  • therefore, from the sample total audio energy and the sample own audio energy, the processing device can determine the sample nonlinear energy corresponding to the target device, and then train an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model.
  • during this training, the neural network model can learn the correlation between the nonlinear audio energy portion and the linear audio energy portion, and thus learn how to determine the nonlinear audio energy component from the linear audio energy component.
  • for example, the echo reference signal Ref(l,k) (i.e., the sample own audio energy) can be used to perform echo cancellation on the microphone signal Mic(l,k) (i.e., the sample total audio energy) to obtain Aec(l,k) (the sample nonlinear energy). After a linear method such as NLMS or RLS removes the linear audio energy component E_linear(l,k), a nonlinear audio energy component E_residual(l,k) still remains.
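  • As one conventional way to obtain the linear echo estimate and residual mentioned above, a time-domain NLMS canceller could be used; the filter length, step size and regularization below are assumed values:

```python
import numpy as np

def nlms_aec(mic, ref, taps=256, mu=0.5, eps=1e-6):
    """Subtract the linear echo of `ref` from `mic`; the returned residual
    contains user speech plus the nonlinear echo component."""
    w = np.zeros(taps)             # adaptive FIR estimate of the echo path
    out = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = ref[n - taps:n][::-1]        # most recent reference samples first
        e = mic[n] - w @ x               # error = mic minus linear echo estimate
        w += mu * e * x / (x @ x + eps)  # normalized LMS update
        out[n] = e
    return out
```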
  • the speech signals mic, ref and aec are converted to frequency-domain signals by STFT. The frequency-domain signals are complex; with 16 kHz sampling, a 16 ms frame length and an FFT length of 512, each frequency-domain signal is a complex array of 257 bins (by the symmetry of the FFT, 512/2 + 1 = 257). Taking the absolute value of the complex signal and converting it to the Bark domain yields 64-dimensional data.
  • the input of the model is the concatenated Bark values of aec, ref and mic; each frame is a 64*3 vector.
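  • The feature preparation could be sketched as follows; the uniform grouping of four consecutive STFT bins per band stands in for a true Bark filterbank, which the text does not spell out:

```python
import numpy as np
from scipy.signal import stft

def bark64(x, fs=16000, nfft=512):
    """|STFT| magnitudes (257 bins) pooled into 64 bands, one row per frame."""
    _, _, Z = stft(x, fs=fs, nperseg=nfft)
    mag = np.abs(Z)[:256, :]                      # drop the Nyquist bin: 256 = 64*4
    return mag.reshape(64, 4, -1).mean(axis=1).T  # (frames, 64)

def model_input(aec, ref, mic):
    """Each frame is the 64*3 concatenation of the aec, ref and mic band values."""
    feats = [bark64(s) for s in (aec, ref, mic)]
    return np.concatenate(feats, axis=1)          # (frames, 192)
```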
  • the model adopts the CRN structure; considering the memory and performance limits on the device, the encoder and decoder each have only one layer.
  • specifically, the encoder layer uses a one-dimensional convolution with 192 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU;
  • the enhancer layer uses an LSTM with input size 64 and hidden size 64;
  • the decoder layer uses a two-dimensional convolution with 64+64 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU;
  • the output activation function is sigmoid.
  • the loss function is MSE, the mean square error between the estimated residual echo component and the true residual echo component; the optimizer is Adam with a learning rate of 0.001, β1 = 0.9 and β2 = 0.999.
  • the model input dimension is 64*3 and the output dimension is 64; converted back to the frequency domain this yields a gain G(l,k), with E_residual(l,k) = (Mic(l,k) - E_linear(l,k)) * G(l,k). That is, after the linear audio energy is removed from the total audio energy emitted by the device, the nonlinear audio energy can be determined by applying this gain. Through this process, the model learns how to determine the nonlinear audio energy portion from the linear audio energy portion.
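  • A PyTorch sketch consistent with the stated dimensions (encoder 192→64, LSTM 64→64, decoder over the 64+64 skip concatenation, sigmoid gain). The text describes the decoder as a two-dimensional convolution; this sketch uses a one-dimensional convolution over time throughout for simplicity, and the padding choice is an assumption:

```python
import torch
import torch.nn as nn

class ResidualEchoNet(nn.Module):
    """One-layer CRN: Conv1d encoder -> LSTM enhancer -> decoder -> sigmoid gain."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(192, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.PReLU())
        self.enhancer = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Conv1d(64 + 64, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.PReLU())

    def forward(self, x):                            # x: (batch, frames, 192)
        h = self.encoder(x.transpose(1, 2))          # (batch, 64, frames)
        z, _ = self.enhancer(h.transpose(1, 2))      # (batch, frames, 64)
        skip = torch.cat([h, z.transpose(1, 2)], 1)  # (batch, 128, frames)
        return torch.sigmoid(self.decoder(skip))     # gain in (0, 1)
```

  • Training would then minimize the MSE between the estimated and true residual echo components, e.g. with torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999)) as stated above.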
  • Figure 2 is a structural block diagram of an audio energy analysis device 200 provided by the embodiment of the present disclosure.
  • the device includes a first determination unit 201, an acquisition unit 202 and a second determination unit 203:
  • the first determining unit 201 is configured to determine the energy loss parameter corresponding to the second device, where the energy loss parameter is used to identify the loss when the first device transmits audio energy to the second device;
  • the acquisition unit 202 is configured to acquire the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device, where the second total audio energy is the audio energy received by the second device, and the own audio energy is energy generated based on the audio played by the first device;
  • the second determination unit 203 is configured to determine, according to the energy loss parameter, the own audio energy and the second total audio energy, the second sound source audio energy corresponding to the second device, where the second sound source audio energy is the audio energy that the second device obtains from the sound source.
  • the energy loss parameter is obtained based on the following method:
  • in an environment without sound sources other than the first device, the test own audio energy corresponding to the first device when it plays audio is determined, as well as the received audio energy that the second device receives from the first device;
  • the energy loss parameter corresponding to the second device is determined according to the ratio of the received audio energy to the test own audio energy.
  • the device further includes a third determination unit, a fourth determination unit and a wake-up unit:
  • the third determining unit is configured to determine the first total audio energy received by the first device
  • the fourth determination unit is configured to determine the first sound source audio energy corresponding to the first device according to the own audio energy and the first total audio energy, where the first sound source audio energy and the second sound source audio energy come from the same sound source;
  • the wake-up unit is configured to wake up the corresponding voice interaction function of the first device in response to the audio energy of the first sound source being greater than the audio energy of the second sound source.
  • the fourth determining unit is specifically configured to:
  • the nonlinear energy corresponding to the first device is determined according to the own audio energy, the nonlinear energy being generated by the vibration produced when the first device plays audio;
  • the first sound source audio energy is obtained by removing the own audio energy and the nonlinear energy from the first total audio energy.
  • the fourth determining unit is specifically configured to:
  • the nonlinear energy corresponding to the first device is determined, through a neural network model, according to the own audio energy.
  • the neural network model is trained in the following manner:
  • obtain a training sample set, which includes the sample own audio energy and the sample total audio energy collected by a target device in an environment without other sound sources;
  • determine the sample nonlinear energy corresponding to the target device according to the sample total audio energy and the sample own audio energy;
  • train an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model.
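  • A hedged sketch of that training procedure follows; how the per-band residual-echo targets are derived from the sample energies is assumed here rather than specified by the text:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10):
    """loader yields (feats, target): feats (batch, frames, 192) are the
    concatenated aec/ref/mic bands; target (batch, 64, frames) is the true
    residual-echo component in the same band domain."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    mse = nn.MSELoss()
    for _ in range(epochs):
        for feats, target in loader:
            aec = feats[:, :, :64].transpose(1, 2)  # (batch, 64, frames)
            est = model(feats) * aec                # gain applied to the AEC bands
            loss = mse(est, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```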
  • the present disclosure discloses a computer-readable storage medium.
  • when instructions in the computer-readable storage medium are run by at least one processor, the at least one processor executes the audio energy analysis method described in any one of the above embodiments.
  • the present disclosure also discloses a computer device.
  • the computer device includes:
  • at least one processor 304;
  • at least one memory 302 storing computer-executable instructions,
  • wherein, when the computer-executable instructions are run by the at least one processor, the at least one processor performs the audio energy analysis method as described in any one of the above embodiments.
  • the above-mentioned computer device may be located in at least one network device among multiple network devices of the computer network.
  • the above-mentioned processor may be configured to perform the following steps through a computer program:
  • S1: Determine the energy loss parameter corresponding to the second device, where the energy loss parameter is used to identify the loss when the first device transmits audio energy to the second device;
  • S2: Obtain the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device, where the second total audio energy is the audio energy received by the second device, and the own audio energy is energy generated based on the audio played by the first device;
  • S3: Determine the second sound source audio energy corresponding to the second device according to the energy loss parameter, the own audio energy and the second total audio energy, where the second sound source audio energy is the audio energy that the second device obtains from the sound source.
  • the structure shown in Figure 3 is only illustrative, and the computer device can also be a smart phone (such as an Android phone, iOS phone, etc.), a tablet computer, a handheld computer, and a mobile Internet device (Mobile Internet Devices, MID), PAD and other terminal equipment.
  • Figure 3 does not limit the structure of the above computer equipment.
  • the computer device may also include more or fewer components than shown in Figure 3 (such as a network interface), or have a different configuration from that shown in Figure 3.
  • the memory 302 can be used to store software programs and modules, such as the program instructions/modules corresponding to the audio energy analysis method and apparatus in the embodiments of the present disclosure. The processor 304 runs the software programs and modules stored in the memory 302 to perform various functional applications and data processing, that is, to implement the above audio energy analysis method.
  • Memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 302 may further include memory located remotely relative to the processor 304, and these remote memories may be connected to the terminal through a network.
  • the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the memory 302 may include, but is not limited to, the first determination unit 201, the acquisition unit 202 and the second determination unit 203 in the audio energy analysis device described above. In addition, it may also include, but is not limited to, other module units in the audio energy analysis device, which will not be described again in this example.
  • the above-mentioned transmission device 306 is used to receive or send data via a network.
  • Specific examples of the above-mentioned network may include wired networks and wireless networks.
  • the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers through network cables to communicate with the Internet or a local area network.
  • the transmission device 306 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
  • the above-mentioned electronic device also includes: a display 308; and a connection bus 310 for connecting various module components in the above-mentioned electronic device.
  • the foregoing program can be stored in a computer-readable storage medium.
  • when executed, the program performs the steps of the above method embodiments; the aforementioned storage medium can be at least one of the following media capable of storing program code: read-only memory (ROM), RAM, a magnetic disk or an optical disk.
  • each embodiment in this specification is described in a progressive manner; for the parts that are the same or similar between embodiments, the embodiments can refer to one another, and each embodiment focuses on its differences from the others.
  • the device and system embodiments are described relatively simply because they are basically similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
  • the device and system embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

Embodiments of the present disclosure provide an audio energy analysis method and a related apparatus. When performing audio energy analysis, a processing device can first determine an energy loss parameter corresponding to a second device, the energy loss parameter identifying the loss when a first device transmits audio energy to the second device. The processing device can also obtain the second total audio energy received by the second device and the own audio energy that the first device generates by playing audio, and analyze these data together to obtain the audio energy that the second device receives from the sound source of the voice interaction.

Description

Audio energy analysis method and related apparatus

Technical Field

The present disclosure relates to the field of data analysis technology, and in particular to an audio energy analysis method and a related apparatus.

The present disclosure claims priority to the Chinese patent application filed with the China Patent Office on June 20, 2022, with application number 202210697612.5 and the invention title "Audio energy analysis method and related apparatus", the entire contents of which are incorporated into the present disclosure by reference.

Background

Voice interaction is one of the commonly used means of human-computer interaction today. When there are multiple devices in a scene that support voice interaction, the relevant device needs to determine which device the user really wants to interact with and carry out the corresponding voice interaction.

In the related art, when the devices performing voice interaction do not themselves emit sound, the device the user wants to interact with can be determined fairly accurately; however, when these devices do emit sound themselves, it is difficult to determine the device the user wants to interact with, and the user's voice interaction experience is poor.
Summary

To solve the above technical problem, the present disclosure provides an audio energy analysis method in which a processing device can analyze the audio interference of a device itself, thereby accurately identifying the sound source of the voice interaction and improving the user's voice interaction experience.

The embodiments of the present disclosure disclose the following technical solutions:

In a first aspect, an embodiment of the present disclosure discloses an audio energy analysis method, the method including:

determining an energy loss parameter corresponding to a second device, the energy loss parameter being used to identify the loss when a first device transmits audio energy to the second device;

acquiring a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on audio played by the first device;

determining, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, the second sound source audio energy being the audio energy obtained by the second device from the sound source.

In a second aspect, an embodiment of the present disclosure discloses an audio energy analysis apparatus, the apparatus including a first determination unit, an acquisition unit and a second determination unit:

the first determination unit is configured to determine an energy loss parameter corresponding to a second device, the energy loss parameter being used to identify the loss when a first device transmits audio energy to the second device;

the acquisition unit is configured to acquire a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on audio played by the first device;

the second determination unit is configured to determine, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, the second sound source audio energy being the audio energy obtained by the second device from the sound source.

In a third aspect, an embodiment of the present disclosure discloses a computer-readable storage medium; when instructions in the computer-readable storage medium are run by at least one processor, the at least one processor executes the audio energy analysis method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present disclosure discloses a computer device, including:

at least one processor;

at least one memory storing computer-executable instructions,

wherein, when the computer-executable instructions are run by the at least one processor, the at least one processor executes the audio energy analysis method according to any one of the first aspect.

It can be seen from the above technical solutions that, when performing audio energy analysis, the processing device can first determine the energy loss parameter corresponding to the second device, the energy loss parameter identifying the loss when the first device transmits audio energy to the second device. The processing device can also obtain the second total audio energy received by the second device and the own audio energy that the first device generates by playing audio, and analyze these data together to obtain the audio energy that the second device receives from the sound source of the voice interaction. This eliminates the interference of the first device's self-played audio on voice interaction recognition, allows the device that the user wants to interact with to be analyzed more accurately based on audio energy, and improves the user's voice interaction experience.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is a flow chart of an audio energy analysis method provided by an embodiment of the present disclosure;

Figure 2 is a structural block diagram of an audio energy analysis apparatus provided by an embodiment of the present disclosure;

Figure 3 is a structural block diagram of an optional computer device according to an embodiment of the present disclosure.

Detailed Description

The embodiments of the present disclosure are described below with reference to the drawings.

It can be understood that the method can be applied to a processing device capable of audio energy analysis, for example a terminal device or a server with an audio energy analysis function. The method can be executed independently by the terminal device or the server, or applied to a network scenario in which the terminal device and the server communicate and executed by the terminal device and the server in cooperation. The terminal device can be a computer, a mobile phone or another device. The server can be understood as an application server or a Web server; in actual deployment, it can be an independent server or a cluster server.

Referring to Figure 1, Figure 1 is a flow chart of an audio energy analysis method provided by an embodiment of the present disclosure. The method includes:
S101: Determine the energy loss parameter corresponding to the second device.

The energy loss parameter is used to identify the loss when the first device transmits audio energy to the second device, that is, how much of the audio energy emitted by the first device the second device can receive. For example, the energy loss parameter may be 0.9, meaning that when the first device emits audio energy, the second device receives 90% of the emitted audio energy.

S102: Obtain the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device.

The second total audio energy is the audio energy received by the second device, and the own audio energy is energy generated based on the audio played by the first device. It can be understood that, in a scenario where a device plays its own audio, the audio energy received by the second device includes two parts: one part is the audio energy produced by the audio uttered by the user performing voice interaction, and the other part is the audio energy produced by the audio emitted by the first device. In a multi-device scenario, in order to accurately analyze which device the user wants to interact with, the judgment is usually based on the audio energy each device receives from that user; therefore, in the embodiments of the present disclosure, the processing device needs to first remove the audio energy portion coming from the first device from the second total audio energy before accurate voice interaction can be performed.

S103: Determine the second sound source audio energy corresponding to the second device according to the energy loss parameter, the own audio energy and the second total audio energy.

As mentioned above, the energy loss parameter can identify the loss when the first device transmits audio energy to the second device, and the own audio energy is the audio energy generated by the first device playing audio. Therefore, the processing device can use the energy loss parameter and the own audio energy to determine the audio energy that the second device receives from the first device, and remove this part of the energy from the second total audio energy to obtain the second sound source audio energy corresponding to the second device. The second sound source audio energy is the audio energy obtained by the second device from the sound source; the sound source is the sound source performing voice interaction, for example a user performing voice interaction.
It can be seen from the above technical solutions that, when performing audio energy analysis, the processing device can first determine the energy loss parameter corresponding to the second device, the energy loss parameter identifying the loss when the first device transmits audio energy to the second device. The processing device can also obtain the second total audio energy received by the second device and the own audio energy that the first device generates by playing audio, and analyze these data together to obtain the audio energy that the second device receives from the sound source of the voice interaction. This eliminates the interference of the first device's self-played audio on voice interaction recognition, allows the device that the user wants to interact with to be analyzed more accurately based on audio energy, and improves the user's voice interaction experience.

In one possible implementation, the energy loss parameter can be obtained as follows. In an environment without sound sources other than the first device, the processing device can determine the test own audio energy corresponding to the first device when it plays audio, and the received audio energy that the second device receives from the first device; the test own audio energy refers to the audio energy produced by the audio played by the first device. The processing device can determine the energy loss parameter corresponding to the second device from the ratio of the received audio energy to the test own audio energy; this ratio reflects the difference between the audio energy actually received by the second device and the audio energy emitted by the first device, and can therefore identify the loss when the first device transmits audio energy to the second device.

In one possible implementation, in order to determine whether the user wants to perform voice interaction with the first device or the second device, the processing device can also determine the first total audio energy received by the first device, and then determine the first sound source audio energy corresponding to the first device according to the own audio energy and the first total audio energy, that is, remove the own audio energy portion from the first total audio energy; the first sound source audio energy and the second sound source audio energy come from the same sound source. The processing device can compare the magnitudes of the first sound source audio energy and the second sound source audio energy; this relationship reflects, to a certain extent, the user's intention to interact with the first device or the second device. In response to the first sound source audio energy being greater than the second sound source audio energy, which indicates that the user more likely wants to interact with the first device, the processing device can wake up the voice interaction function corresponding to the first device.
For example, the flow is illustrated with two speakers A and B. Both speakers are configured with the same wake-up word and support distributed wake-up. When the two devices receive the wake-up word, the processing device can calculate the corresponding audio energy and upload it to the cloud for decision-making, and the cloud selects the device that should respond based on scoring criteria.

First, the cloud can control speaker A to play white noise for a period of time. The audio signals received by A and B are transformed by STFT, and the average audio energy over that period is recorded as X_A(k) and X_{A→B}(k). Taking a 16 kHz sampling rate as an example, with an FFT length of 512 and a frequency statistics range of 200-5000 Hz (frequency bins k = 3-160), the energy loss parameter corresponding to speaker B can be C_{A→B}(k) = X_{A→B}(k) / X_A(k).

Conversely, the energy loss parameter corresponding to speaker A can be C_{B→A}(k) = X_{B→A}(k) / X_B(k).

X_A(k) and X_{A→B}(k) are computed on the devices and uploaded to the cloud; after the cloud computes C_{A→B}(k), it pushes it to device A, and C_{B→A}(k) is pushed in the same way.

When device A is playing audio and the user approaches A and utters the wake-up word, the audio energy received by A includes:

Y_A(l,k) = S_A(l,k) + E_A(l,k), where S_A(l,k) represents the sound source audio energy corresponding to speaker A, E_A(l,k) represents the own audio energy corresponding to speaker A, and Y_A(l,k) represents the first total audio energy.

At this moment, the audio energy received by speaker B includes:

Y_{A→B}(l,k) = S_B(l,k) + E_{A→B}(l,k), where S_B(l,k) represents the sound source audio energy corresponding to speaker B, and E_{A→B}(l,k) represents the audio energy that speaker B receives from the audio emitted by speaker A.

After AEC, the sound source audio energy corresponding to speaker A can be obtained as Y'_A(l,k) = S_A(l,k), so E_A(l,k) = Y_A(l,k) - Y'_A(l,k). After correction, E_{A→B}(l,k) = E_A(l,k) * C_{A→B}(l,k), so that the sound source audio energy S_B(l,k) corresponding to speaker B can be determined.

Based on the relative magnitudes of S_B(l,k) and S_A(l,k), the processing device can determine the speaker that the user actually wants to wake up.
It can be understood that, because of the vibration produced when playing audio, the audio energy a device receives from its own playback may not equal the played audio energy. Therefore, in one possible implementation, to make the audio energy analysis more precise, when analyzing the sound source audio energy corresponding to the first device, the processing device can determine, based on the own audio energy, the nonlinear energy corresponding to the first device; the nonlinear energy is generated by the vibration produced when the first device plays audio. The processing device can remove the own audio energy and the nonlinear energy from the first total audio energy to obtain the first sound source audio energy, thereby obtaining more accurate sound source energy.

In one possible implementation, specifically, the processing device can determine, through a neural network model, the nonlinear energy corresponding to the first device according to the own audio energy. The neural network model can be trained as follows:

First, the processing device can obtain a training sample set, which includes the sample own audio energy and the sample total audio energy collected by a target device in an environment without other sound sources. Since the sample own audio energy is determined directly from the audio information played by the target device itself, it is the linear audio energy corresponding to the target device; the sample total audio energy is the total audio energy received from the target device, which includes both linear and nonlinear audio energy. Therefore, from the sample total audio energy and the sample own audio energy, the processing device can determine the sample nonlinear energy corresponding to the target device, and then train an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model. During this training, the neural network model can learn the correlation between the nonlinear audio energy portion and the linear audio energy portion, and thus learn how to determine the nonlinear audio energy component from the linear audio energy component.

For example, the echo reference signal Ref(l,k) (i.e., the sample own audio energy) can be used to perform echo cancellation on the microphone signal Mic(l,k) (i.e., the sample total audio energy) to obtain Aec(l,k) (the sample nonlinear energy). After a linear method such as NLMS or RLS removes the linear audio energy component E_linear(l,k), a nonlinear audio energy component E_residual(l,k) still remains.

The speech signals mic, ref and aec are converted to the corresponding frequency-domain signals by STFT. The frequency-domain signals are complex; taking 16 kHz sampling, a 16 ms frame length and an FFT length of 512 as an example, each frequency-domain signal is a complex array of 257 bins (by the symmetry of the FFT, 512/2 + 1 = 257). Taking the absolute value of the complex signal and converting it to the Bark domain yields 64-dimensional data. The input of the model is the concatenated Bark values of aec, ref and mic; each frame is a 64*3 vector.

The model adopts the CRN structure; considering the memory and performance limits on the device, the encoder and decoder each have only one layer. Specifically, the encoder layer uses a one-dimensional convolution with 192 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU; the enhancer layer uses an LSTM with input size 64 and hidden size 64; the decoder layer uses a two-dimensional convolution with 64+64 input channels, 64 output channels and a kernel size of 3, followed by BatchNorm and PReLU; the activation function is sigmoid. The loss function is MSE, the mean square error between the estimated residual echo component and the true residual echo component. The optimizer is Adam with a learning rate of 0.001, β1 = 0.9 and β2 = 0.999.

The model input dimension is 64*3 and the output dimension is 64; converted to the frequency domain this yields a gain value G(l,k), with E_residual(l,k) = (Mic(l,k) - E_linear(l,k)) * G(l,k). That is, after the linear audio energy is removed from the total audio energy emitted by the device, the nonlinear audio energy can be determined by applying this gain. Through this process, the model can learn how to determine the nonlinear audio energy portion from the linear audio energy portion.
Based on the audio energy analysis method provided by the above embodiments, an embodiment of the present disclosure further provides an audio energy analysis apparatus. Referring to Figure 2, Figure 2 is a structural block diagram of an audio energy analysis apparatus 200 provided by an embodiment of the present disclosure. The apparatus includes a first determination unit 201, an acquisition unit 202 and a second determination unit 203:

the first determination unit 201 is configured to determine an energy loss parameter corresponding to a second device, the energy loss parameter being used to identify the loss when a first device transmits audio energy to the second device;

the acquisition unit 202 is configured to acquire a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on audio played by the first device;

the second determination unit 203 is configured to determine, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, the second sound source audio energy being the audio energy obtained by the second device from the sound source.

In one possible implementation, the energy loss parameter is obtained as follows:

in an environment without sound sources other than the first device, determining the test own audio energy corresponding to the first device when it plays audio, and the received audio energy that the second device receives from the first device;

determining the energy loss parameter corresponding to the second device according to the ratio of the received audio energy to the test own audio energy.

In one possible implementation, the apparatus further includes a third determination unit, a fourth determination unit and a wake-up unit:

the third determination unit is configured to determine the first total audio energy received by the first device;

the fourth determination unit is configured to determine the first sound source audio energy corresponding to the first device according to the own audio energy and the first total audio energy, the first sound source audio energy and the second sound source audio energy coming from the same sound source;

the wake-up unit is configured to wake up the voice interaction function corresponding to the first device in response to the first sound source audio energy being greater than the second sound source audio energy.

In one possible implementation, the fourth determination unit is specifically configured to:

determine the nonlinear energy corresponding to the first device according to the own audio energy, the nonlinear energy being generated by the vibration produced when the first device plays audio;

remove the own audio energy and the nonlinear energy from the first total audio energy to obtain the first sound source audio energy.

In one possible implementation, the fourth determination unit is specifically configured to:

determine, through a neural network model, the nonlinear energy corresponding to the first device according to the own audio energy.

In one possible implementation, the neural network model is trained as follows:

obtaining a training sample set, which includes the sample own audio energy and the sample total audio energy collected by a target device in an environment without other sound sources;

determining the sample nonlinear energy corresponding to the target device according to the sample total audio energy and the sample own audio energy;

training an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model.
The present disclosure discloses a computer-readable storage medium; when instructions in the computer-readable storage medium are run by at least one processor, the at least one processor executes the audio energy analysis method described in any one of the above embodiments.

The present disclosure also discloses a computer device. As shown in Figure 3, the computer device includes:

at least one processor 304;

at least one memory 302 storing computer-executable instructions,

wherein, when the computer-executable instructions are run by the at least one processor, the at least one processor executes the audio energy analysis method as described in any one of the above embodiments.

Optionally, in this embodiment, the above computer device may be located in at least one network device among multiple network devices of a computer network.

Optionally, in this embodiment, the above processor may be configured to perform the following steps through a computer program:

S1: Determine the energy loss parameter corresponding to the second device, the energy loss parameter being used to identify the loss when the first device transmits audio energy to the second device;

S2: Obtain the second total audio energy corresponding to the second device and the own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on the audio played by the first device;

S3: Determine the second sound source audio energy corresponding to the second device according to the energy loss parameter, the own audio energy and the second total audio energy, the second sound source audio energy being the audio energy obtained by the second device from the sound source.

Optionally, a person of ordinary skill in the art can understand that the structure shown in Figure 3 is only illustrative; the computer device can also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a Mobile Internet Device (MID), a PAD or another terminal device. Figure 3 does not limit the structure of the above computer device. For example, the computer device may also include more or fewer components than shown in Figure 3 (such as a network interface), or have a different configuration from that shown in Figure 3.

The memory 302 can be used to store software programs and modules, such as the program instructions/modules corresponding to the audio energy analysis method and apparatus in the embodiments of the present disclosure. The processor 304 runs the software programs and modules stored in the memory 302 to perform various functional applications and data processing, that is, to implement the above audio energy analysis method. The memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 302 may further include memory arranged remotely relative to the processor 304, and these remote memories may be connected to the terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof. As an example, as shown in Figure 3, the memory 302 may include, but is not limited to, the first determination unit 201, the acquisition unit 202 and the second determination unit 203 in the above audio energy analysis apparatus. In addition, it may also include, but is not limited to, other module units in the above audio energy analysis apparatus, which will not be described again in this example.

Optionally, the above transmission device 306 is used to receive or send data via a network. Specific examples of the above network may include wired networks and wireless networks. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers through a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 306 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.

In addition, the above electronic device also includes: a display 308; and a connection bus 310 for connecting the various module components in the above electronic device.

A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium can be at least one of the following media capable of storing program code: read-only memory (ROM), RAM, a magnetic disk or an optical disk.

It should be noted that each embodiment in this specification is described in a progressive manner; for the parts that are the same or similar between embodiments, the embodiments can refer to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively simply because they are basically similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device and system embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

The above is only one specific implementation of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art could easily conceive of within the technical scope disclosed by the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

  1. An audio energy analysis method, the method comprising:
    determining an energy loss parameter corresponding to a second device, the energy loss parameter being used to identify the loss when a first device transmits audio energy to the second device;
    acquiring a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on audio played by the first device;
    determining, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, the second sound source audio energy being the audio energy obtained by the second device from the sound source.
  2. The method according to claim 1, wherein the energy loss parameter is obtained as follows:
    in an environment without sound sources other than the first device, determining a test own audio energy corresponding to the first device when it plays audio, and a received audio energy that the second device receives from the first device;
    determining the energy loss parameter corresponding to the second device according to the ratio of the received audio energy to the test own audio energy.
  3. The method according to claim 1, wherein the method further comprises:
    determining a first total audio energy received by the first device;
    determining, according to the own audio energy and the first total audio energy, a first sound source audio energy corresponding to the first device, the first sound source audio energy and the second sound source audio energy coming from the same sound source;
    in response to the first sound source audio energy being greater than the second sound source audio energy, waking up a voice interaction function corresponding to the first device.
  4. The method according to claim 3, wherein determining, according to the own audio energy and the first total audio energy, the first sound source audio energy corresponding to the first device comprises:
    determining, according to the own audio energy, a nonlinear energy corresponding to the first device, the nonlinear energy being generated based on vibration produced when the first device plays audio;
    removing the own audio energy and the nonlinear energy from the first total audio energy to obtain the first sound source audio energy.
  5. The method according to claim 4, wherein determining, according to the own audio energy, the nonlinear energy corresponding to the first device comprises:
    determining, through a neural network model, the nonlinear energy corresponding to the first device according to the own audio energy.
  6. The method according to claim 5, wherein the neural network model is trained as follows:
    obtaining a training sample set, the training sample set including a sample own audio energy and a sample total audio energy collected by a target device in an environment without other sound sources;
    determining, according to the sample total audio energy and the sample own audio energy, a sample nonlinear energy corresponding to the target device;
    training an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model.
  7. An audio energy analysis apparatus, the apparatus comprising a first determination unit, an acquisition unit and a second determination unit:
    the first determination unit is configured to determine an energy loss parameter corresponding to a second device, the energy loss parameter being used to identify the loss when a first device transmits audio energy to the second device;
    the acquisition unit is configured to acquire a second total audio energy corresponding to the second device and an own audio energy corresponding to the first device, the second total audio energy being the audio energy received by the second device, and the own audio energy being energy generated based on audio played by the first device;
    the second determination unit is configured to determine, according to the energy loss parameter, the own audio energy and the second total audio energy, a second sound source audio energy corresponding to the second device, the second sound source audio energy being the audio energy obtained by the second device from the sound source.
  8. The apparatus according to claim 7, wherein the energy loss parameter is obtained as follows:
    in an environment without sound sources other than the first device, determining a test own audio energy corresponding to the first device when it plays audio, and a received audio energy that the second device receives from the first device;
    determining the energy loss parameter corresponding to the second device according to the ratio of the received audio energy to the test own audio energy.
  9. The apparatus according to claim 7, wherein the apparatus further comprises a third determination unit, a fourth determination unit and a wake-up unit:
    the third determination unit is configured to determine a first total audio energy received by the first device;
    the fourth determination unit is configured to determine, according to the own audio energy and the first total audio energy, a first sound source audio energy corresponding to the first device, the first sound source audio energy and the second sound source audio energy coming from the same sound source;
    the wake-up unit is configured to wake up a voice interaction function corresponding to the first device in response to the first sound source audio energy being greater than the second sound source audio energy.
  10. The apparatus according to claim 9, wherein the fourth determination unit is further configured to:
    determine, according to the own audio energy, a nonlinear energy corresponding to the first device, the nonlinear energy being generated based on vibration produced when the first device plays audio;
    remove the own audio energy and the nonlinear energy from the first total audio energy to obtain the first sound source audio energy.
  11. The apparatus according to claim 9, wherein the fourth determination unit is further configured to:
    determine, through a neural network model, the nonlinear energy corresponding to the first device according to the own audio energy.
  12. The apparatus according to claim 11, wherein the neural network model is trained as follows:
    obtaining a training sample set, the training sample set including a sample own audio energy and a sample total audio energy collected by a target device in an environment without other sound sources;
    determining, according to the sample total audio energy and the sample own audio energy, a sample nonlinear energy corresponding to the target device;
    training an initial neural network model with the sample total audio energy, the sample own audio energy and the sample nonlinear energy to obtain the neural network model.
  13. A computer-readable storage medium, wherein, when instructions in the computer-readable storage medium are run by at least one processor, the at least one processor executes the audio energy analysis method according to any one of claims 1-6.
  14. A computer device, comprising:
    at least one processor;
    at least one memory storing computer-executable instructions,
    wherein, when the computer-executable instructions are run by the at least one processor, the at least one processor executes the audio energy analysis method according to any one of claims 1-6.
PCT/CN2022/102036 2022-06-20 2022-06-28 Audio energy analysis method and related apparatus WO2023245700A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210697612.5 2022-06-20
CN202210697612.5A CN117292691A (zh) 2022-06-20 2022-06-20 Audio energy analysis method and related apparatus

Publications (1)

Publication Number Publication Date
WO2023245700A1 true WO2023245700A1 (zh) 2023-12-28

Family

ID=89252361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102036 WO2023245700A1 (zh) 2022-06-20 2022-06-28 Audio energy analysis method and related apparatus

Country Status (2)

Country Link
CN (1) CN117292691A (zh)
WO (1) WO2023245700A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9947333B1 (en) * 2012-02-10 2018-04-17 Amazon Technologies, Inc. Voice interaction architecture with intelligent background noise cancellation
CN111091828A (zh) * 2019-12-31 2020-05-01 华为技术有限公司 语音唤醒方法、设备及系统
CN113593548A (zh) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 智能设备的唤醒方法和装置、存储介质及电子装置
CN113674761A (zh) * 2021-07-26 2021-11-19 青岛海尔科技有限公司 设备确定方法及设备确定系统

Also Published As

Publication number Publication date
CN117292691A (zh) 2023-12-26

Similar Documents

Publication Publication Date Title
JP7434137B2 Speech recognition method, apparatus, device, and computer-readable storage medium
US11502859B2 Method and apparatus for waking up via speech
CN114283795B Speech enhancement model training and recognition method, electronic device, and storage medium
CN110288997A Device wake-up method and system for acoustic networking
CN107799126A Voice endpoint detection method and device based on supervised machine learning
JP2019204074A Voice interaction method, apparatus, and system
CN109658935B Method and system for generating multi-channel noisy speech
US20210287653A1 System and method for data augmentation of feature-based voice data
CN113241085B Echo cancellation method, apparatus, device, and readable storage medium
WO2023116660A2 Model training and timbre conversion method, apparatus, device, and medium
CN111868823B Sound source separation method, apparatus, and device
CN111048061B Method, apparatus, and device for obtaining the step size of an echo cancellation filter
CN110956976B Echo cancellation method, apparatus, device, and readable storage medium
US20130246061A1 Automatic realtime speech impairment correction
CN111142066A Direction-of-arrival estimation method, server, and computer-readable storage medium
US11641592B1 Device management using stored network metrics
CN110169082A Combined audio signal output
WO2023245700A1 Audio energy analysis method and related apparatus
CN113517000A Echo cancellation test method, terminal, and storage device
WO2023051622A1 Method for improving far-field voice interaction performance, and far-field voice interaction system
CN115376538A Voice noise reduction method and system for interaction, electronic device, and storage medium
CN104078049B Signal processing device and signal processing method
CN113436610B Test method, apparatus, and system
US11924368B2 Data correction apparatus, data correction method, and program
CN115019826A Audio signal processing method, device, system, and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22947463

Country of ref document: EP

Kind code of ref document: A1