WO2020024508A1 - Voice information obtaining method and apparatus - Google Patents

Voice information obtaining method and apparatus

Info

Publication number
WO2020024508A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice information
voice
information
frequency
sound frequency
Application number
PCT/CN2018/120368
Other languages
French (fr)
Chinese (zh)
Inventor
廖湖锋
王子
刘健军
Original Assignee
珠海格力电器股份有限公司
Application filed by 珠海格力电器股份有限公司
Publication of WO2020024508A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • This application relates to, but is not limited to, the field of electrical appliances, and in particular, to a method and device for acquiring voice information.
  • Online voice devices already account for a considerable share of the market, and that share will continue to grow.
  • Online voice devices generally support voice interaction and additional functions such as singing and weather broadcasts. However, while such a device is broadcasting, voice communication with it is affected by the sound the device itself is producing.
  • The embodiments of the present application provide a method and an apparatus for acquiring voice information, so as to at least solve the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects.
  • A method for acquiring voice information includes: a device collects first voice information in the environment where the device is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency and is deleted from the first voice information to obtain target voice information.
  • Another method for acquiring voice information includes: a first device collects first voice information in the environment in which it is located, and obtains from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment contains multiple voice playback devices; the first device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency and is deleted from the first voice information to obtain target voice information.
  • A further method for acquiring voice information includes: the device collects first voice information in the environment in which it is located; the device determines first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself; and third voice information within the first voice information is determined according to the similarity between the first feature information and the second feature information and is deleted from the first voice information to obtain target voice information.
  • An apparatus for acquiring voice information includes: a first acquisition module configured to acquire first voice information in the environment where the device is located; a first determining module configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and a second determining module configured to determine third voice information within the first voice information according to the similarity between the first sound frequency and the second sound frequency, and to delete the third voice information from the first voice information to obtain target voice information.
  • An apparatus for acquiring voice information includes: a second acquisition module configured to acquire first voice information in the environment in which the device is located and to obtain, from the network side, second voice information currently being played by all voice playback devices in the current environment, where the environment contains multiple voice playback devices; a third determining module configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and a fourth determining module configured to determine third voice information within the first voice information according to the similarity between the first sound frequency and the second sound frequency, and to delete the third voice information from the first voice information to obtain target voice information.
  • An apparatus for acquiring voice information includes: a third acquisition module configured to acquire first voice information in the environment where the device is located; a fifth determining module configured to determine first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself; and a sixth determining module configured to determine third voice information within the first voice information according to the similarity between the first feature information and the second feature information, and to delete the third voice information from the first voice information to obtain target voice information.
  • A storage medium stores a computer program that, when run, is configured to execute the steps in any one of the foregoing method embodiments.
  • An electronic device includes a memory and a processor.
  • The memory stores a computer program.
  • The processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
  • With this application, a device collects first voice information in the environment in which it is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency and is deleted from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • FIG. 1 is a block diagram of the hardware structure of a home appliance running the method for acquiring voice information according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for acquiring voice information according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a voice device according to the present application.
  • FIG. 1 shows the hardware structure of a home appliance on which the method for acquiring voice information according to an embodiment of the present application runs.
  • As shown in FIG. 1, the home appliance 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 configured to store data. Optionally, the home appliance may further include a transmission device 106 for communication functions and an input/output device 108.
  • The structure shown in FIG. 1 is only schematic and does not limit the structure of the home appliance.
  • The home appliance 10 may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
  • The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for acquiring voice information in the embodiments of the present application.
  • The processor 102 runs the software programs and modules stored in the memory 104 to perform various functional applications and data processing, that is, to implement the method described above.
  • The memory 104 may include high-speed random access memory and may further include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • The memory 104 may further include memory located remotely from the processor 102 and connected to the home appliance 10 through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The transmission device 106 is configured to receive or transmit data via a network.
  • A specific example of such a network is a wireless network provided by the communication provider of the home appliance 10.
  • In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the Internet.
  • In another example, the transmission device 106 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a method for acquiring voice information according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
  • Step S202: The device collects first voice information in the environment in which it is located.
  • The first voice information may include sound the device plays itself, such as music, as well as a user's control instruction to the device.
  • Step S204: The device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself.
  • Step S206: Third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency, and is deleted from the first voice information to obtain target voice information.
  • The semantics of the target voice information can then be recognized to determine the user's control instruction.
  • In this way, the device collects first voice information in the environment in which it is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and third voice information within the first voice information is determined according to the similarity between the two sound frequencies and is deleted from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • The above steps may be executed by home appliances such as air conditioners and refrigerators, but are not limited thereto.
  • The device determines the second sound frequency by obtaining it from its own buffer.
  • The voice information the device plays is generally stored in the buffer in advance; it may also be obtained from another connected storage medium, such as a USB flash drive.
  • Determining the third voice information within the first voice information according to the similarity between the first sound frequency and the second sound frequency, and deleting it from the first voice information to obtain the target voice information, includes: determining, within the first sound frequency, the sound frequency whose similarity to the second sound frequency is higher than a threshold, and taking the portion at that sound frequency as the third voice information; and deleting the third voice information from the first voice information to obtain the target voice information.
  • That is, the portion of the first sound frequency that is highly similar to the second sound frequency can be judged to be the sound the device plays itself and deleted; what remains is the user's voice information.
  • After the device collects the first voice information in its environment, if it detects that it is not currently playing any voice, the first voice information is determined to be the target voice information.
  • Collecting the first voice information in the device's environment includes collecting it through a microphone.
  • Another method for acquiring voice information is provided, including the following steps:
  • Step 1: The first device collects first voice information in the environment in which it is located, and obtains from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment contains multiple voice playback devices.
  • Step 2: The first device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information.
  • Step 3: Third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency, and is deleted from the first voice information to obtain target voice information.
  • That is, the voice playback devices share the voice information they are playing with a network-side device, so that other devices can refer to it when identifying the user's control instruction and keep only the user's voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • A further method for acquiring voice information is provided, including the following steps:
  • Step 1: The device collects first voice information in the environment in which it is located.
  • Step 2: The device determines first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself.
  • Step 3: Third voice information within the first voice information is determined according to the similarity between the first feature information and the second feature information, and is deleted from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • The first feature information and the second feature information each include at least one of the following: sound frequency, pitch, timbre, and volume.
  • The devices in this application support online voice functions as well as voice broadcast and interaction functions.
  • FIG. 3 is a schematic structural diagram of a voice device according to the present application. As shown in FIG. 3, the device includes a voice acquisition module, a control unit, and a voice playback module. When the device broadcasts voice, the control unit simultaneously buffers the frequency of the broadcast sound, and it also receives the audio captured by the voice acquisition module. The control unit compares the captured audio with the buffered broadcast audio and deletes from the captured audio the part that is highly similar to the broadcast audio; what remains is the audio actually captured from the environment.
  • The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
  • This embodiment also provides an apparatus for acquiring voice information. The apparatus is configured to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated.
  • As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatuses described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • An apparatus for acquiring voice information includes:
  • a first acquisition module configured to acquire first voice information in the environment in which the device is located;
  • a first determining module configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and
  • a second determining module configured to determine third voice information within the first voice information according to the similarity between the first sound frequency and the second sound frequency, and to delete the third voice information from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • An apparatus for acquiring voice information includes:
  • a second acquisition module configured to collect first voice information in the environment in which the device is located and to obtain, from the network side, second voice information currently being played by all voice playback devices in the current environment, where the environment contains multiple voice playback devices;
  • a third determining module configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and
  • a fourth determining module configured to determine third voice information within the first voice information according to the similarity between the first sound frequency and the second sound frequency, and to delete the third voice information from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • An apparatus for acquiring voice information includes:
  • a third acquisition module configured to acquire first voice information in the environment in which the device is located;
  • a fifth determining module configured to determine first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself; and
  • a sixth determining module configured to determine third voice information within the first voice information according to the similarity between the first feature information and the second feature information, and to delete the third voice information from the first voice information to obtain target voice information.
  • Adopting the above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
  • The above modules can be implemented in software or hardware. In the latter case, they can be implemented in, but are not limited to, the following ways: the modules are all located in the same processor, or the modules are distributed, in any combination, across different processors.
  • An embodiment of the present application further provides a storage medium.
  • The storage medium may be configured to store program code for performing the following steps:
  • the device collects first voice information in the environment in which it is located;
  • the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself;
  • The storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • An embodiment of the present application further provides an electronic device including a memory and a processor.
  • The memory stores a computer program.
  • The processor is configured to run the computer program to perform the steps in any one of the foregoing method embodiments.
  • The electronic device may further include a transmission device and an input/output device, both connected to the processor.
  • The processor may be configured to execute the following steps by means of the computer program:
  • the device collects first voice information in the environment in which it is located;
  • the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself;
  • The modules or steps of the present application may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of multiple computing devices.
  • They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one here, or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
  • The sound played by the device itself is deleted so as to eliminate interference from the device's own sound as far as possible, which solves the problem in the related art that the sound broadcast by the device itself is difficult to distinguish from the voice information the device collects.
  • The two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
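The FIG. 3 flow described above (the control unit buffers what is broadcast, checks captured audio against that buffer, and treats captured audio as the target voice information when nothing is being played) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `ControlUnit` class, its method names, frame-by-frame buffering, and one-to-one time alignment between captured and broadcast frames are all hypothetical, and the comparison itself is passed in as a `remove_own_sound` callable because the application does not specify how similarity is computed.

```python
from collections import deque
from typing import Callable, List

Frame = List[float]


class ControlUnit:
    """Hypothetical model of the FIG. 3 control unit: it buffers what the voice
    playback module broadcasts and consults that buffer for each captured frame."""

    def __init__(self, remove_own_sound: Callable[[Frame, Frame], Frame],
                 max_frames: int = 50):
        # remove_own_sound(captured, played) -> cleaned frame. Left pluggable
        # because the application does not fix how similarity is measured.
        self._remove = remove_own_sound
        self._buffer: deque = deque(maxlen=max_frames)  # most recent broadcast frames
        self.playing = False

    def on_broadcast(self, frame: Frame) -> None:
        """Called by the voice playback module for every frame it plays."""
        self.playing = True
        self._buffer.append(list(frame))

    def on_playback_stopped(self) -> None:
        """Called when the device stops broadcasting; drops stale reference audio."""
        self.playing = False
        self._buffer.clear()

    def on_capture(self, frame: Frame) -> Frame:
        """Called by the voice acquisition module for every captured frame."""
        if not self.playing or not self._buffer:
            # Nothing is being played: the captured audio is already the target voice.
            return list(frame)
        # Assumption: captured and broadcast frames are time-aligned one-to-one.
        played = self._buffer.popleft()
        return self._remove(list(frame), played)
```

In a real device the loudspeaker-to-microphone path adds delay and room reflections, so the alignment assumption would have to be replaced by some form of delay estimation.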

Abstract

A voice information obtaining method and apparatus. The method comprises: a device acquires first voice information in the environment where the device is located (S202); the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, wherein the second voice information is voice played by the device itself (S204); and third voice information in the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency and is deleted from the first voice information to obtain target voice information (S206). The invention solves the problem in the prior art that sound played by a device itself is difficult to distinguish from voice information acquired by the device: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and voice interaction with the device is achieved.
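The application does not say how a "sound frequency" or its "similarity" is computed. One possible formalization of S204 and S206, offered only as an illustration, assumes that a frame of N samples is represented by its discrete Fourier spectrum and that similarity is compared bin by bin using a hypothetical magnitude ratio s(k) (taken as 1 when both magnitudes are zero). Here x_1 is the captured first voice information, x_2 the device's own playback (second voice information), and θ the threshold mentioned in the description; the bins with s(k) > θ play the role of the third voice information, and x̂ is the target voice information.

```latex
X_1(k) = \sum_{t=0}^{N-1} x_1(t)\, e^{-2\pi i k t / N}, \qquad
X_2(k) = \sum_{t=0}^{N-1} x_2(t)\, e^{-2\pi i k t / N}, \qquad k = 0, \dots, N-1

s(k) = \frac{\min\{|X_1(k)|,\ |X_2(k)|\}}{\max\{|X_1(k)|,\ |X_2(k)|\}} \in [0, 1]

\hat{X}(k) =
\begin{cases}
  0, & s(k) > \theta \\
  X_1(k), & \text{otherwise}
\end{cases}
\qquad
\hat{x}(t) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}(k)\, e^{2\pi i k t / N}
```

Under this reading, a bin where the user's voice and the playback overlap yields a lower ratio and is kept, playback included, so the separation is only as clean as the two sources are spectrally disjoint.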

Description

Method and device for acquiring voice information
Technical field
This application relates to, but is not limited to, the field of electrical appliances, and in particular to a method and device for acquiring voice information.
Background art
In the related art, online voice devices already account for a considerable share of the market, and that share will continue to grow. Online voice devices generally support voice interaction and additional functions such as singing and weather broadcasts, but while such a device is broadcasting, voice communication with it is affected by the sound the device itself is producing.
No effective solution has yet been proposed for the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects.
Summary of the invention
The embodiments of the present application provide a method and an apparatus for acquiring voice information, so as to at least solve the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects.
According to an embodiment of the present application, a method for acquiring voice information is provided, including: a device collects first voice information in the environment where the device is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and third voice information within the first voice information is determined according to the similarity between the first sound frequency and the second sound frequency and is deleted from the first voice information to obtain target voice information.
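The first embodiment can be illustrated with a minimal Python sketch. Beyond what the application states, it assumes that the first and second sound frequencies are the discrete Fourier spectra of equal-length, time-aligned frames, that similarity is the hypothetical per-bin magnitude ratio `bin_similarity`, and that deleting the third voice information means zeroing the matching bins before transforming back. The function names and the default threshold of 0.8 are illustrative only; a naive DFT keeps the example dependency-free, where real code would use an FFT library.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; adequate for short demo frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse of dft(); keeps the real part, since the input frames are real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def bin_similarity(a: complex, b: complex) -> float:
    """Hypothetical per-bin similarity in [0, 1]: smaller magnitude over larger."""
    hi = max(abs(a), abs(b))
    return 1.0 if hi == 0 else min(abs(a), abs(b)) / hi


def acquire_target_voice(captured: Sequence[float], played: Sequence[float],
                         threshold: float = 0.8) -> List[float]:
    """Sketch of S204/S206: drop bins of the captured frame that resemble the played frame."""
    if len(captured) != len(played):
        raise ValueError("captured and played frames must have the same length")
    first = dft(captured)   # "first sound frequency": spectrum of the collected audio
    second = dft(played)    # "second sound frequency": spectrum of the device's own playback
    # Bins whose similarity exceeds the threshold are treated as third voice information.
    kept = [0j if bin_similarity(f, s) > threshold else f for f, s in zip(first, second)]
    return idft(kept)       # what remains is the target voice information
```

For example, if the captured frame is a user tone plus the device's own tone at a different frequency, the device's bins match the playback spectrum almost exactly and are zeroed, while the user's bins have no counterpart in the playback and survive.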
According to another embodiment of the present application, a method for acquiring voice information is further provided, including: a first device collects first voice information in the environment where it is located, and obtains from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment includes a plurality of voice playback devices; the first device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
According to another embodiment of the present application, a method for acquiring voice information is further provided, including: a device collects first voice information in the environment where the device is located; the device determines first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself; and, according to the similarity between the first feature information and the second feature information, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including: a first acquisition module, configured to collect first voice information in the environment where the device is located; a first determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and a second determining module, configured to determine, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including: a second acquisition module, configured to collect first voice information in the environment where the device is located, and to obtain from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment includes a plurality of voice playback devices; a third determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and a fourth determining module, configured to determine, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including: a third acquisition module, configured to collect first voice information in the environment where the device is located; a fifth determining module, configured to determine first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself; and a sixth determining module, configured to determine, according to the similarity between the first feature information and the second feature information, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
According to yet another embodiment of the present application, a storage medium is further provided. The storage medium stores a computer program, and the computer program is configured to perform, when run, the steps in any one of the above method embodiments.
According to yet another embodiment of the present application, an electronic apparatus is further provided, including a memory and a processor. The memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
With this application, a device collects first voice information in the environment where it is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information. This technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their description serve to explain the application and do not constitute an undue limitation on it. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a home appliance running a method for acquiring voice information according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for acquiring voice information according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a voice device according to the present application.
Detailed Description
The present application is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with each other.
It should be noted that the terms "first", "second", and so on in the specification, claims, and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of the present application may be executed in a home appliance, a computer terminal, or a similar computing apparatus. Taking execution on a home appliance as an example, FIG. 1 is a block diagram of the hardware structure of a home appliance running a method for acquiring voice information according to an embodiment of the present application. As shown in FIG. 1, the home appliance 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing apparatus such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 configured to store data. Optionally, the home appliance may further include a transmission apparatus 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the home appliance. For example, the home appliance 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for acquiring voice information in the embodiments of the present application. By running the software programs and modules stored in the memory 104, the processor 102 performs various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may further include non-volatile memory, such as one or more magnetic storage apparatuses, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the home appliance 10 over a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission apparatus 106 is configured to receive or send data via a network. A specific example of such a network may include a wireless network provided by the communication provider of the home appliance 10. In one example, the transmission apparatus 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station to communicate with the Internet. In another example, the transmission apparatus 106 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.
In this embodiment, a method for acquiring voice information running on the above home appliance is provided. FIG. 2 is a flowchart of the method for acquiring voice information according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
Step S202: the device collects first voice information in the environment where the device is located;
The first voice information may include audio that the device itself is playing, such as music, as well as the user's control instructions to the device.
Step S204: the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself;
Step S206: according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
After the target voice information is obtained, its semantics can be recognized to determine the user's control instruction.
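The patent does not say how the "sound frequency" or the "similarity" in steps S204 and S206 is computed. The following is a minimal sketch under stated assumptions, not the applicant's implementation: audio is a list of float samples, the captured and played signals are already time-aligned, each fixed-size frame is turned into a magnitude spectrum with a naive DFT, cosine similarity is the similarity measure, and a captured frame whose similarity to the corresponding played frame exceeds the threshold is treated as third voice information and dropped. The function names, the 32-sample frame size, and the 0.9 threshold are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra; 0.0 if either is silent."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_frames(samples, size):
    """Consecutive non-overlapping frames; a partial tail is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def extract_target(captured, played, frame_size=32, threshold=0.9):
    """Keep only captured frames that do not match the device's own playback.

    captured: microphone samples (first voice information)
    played:   samples the device broadcast over the same interval (second voice information)
    """
    played_frames = split_frames(played, frame_size)
    target = []
    for i, frame in enumerate(split_frames(captured, frame_size)):
        if i < len(played_frames):
            similarity = cosine_similarity(
                magnitude_spectrum(frame), magnitude_spectrum(played_frames[i])
            )
            if similarity > threshold:
                continue  # third voice information: the device's own sound, drop it
        target.append(frame)
    return target


# Example: the device plays a 4-cycle tone; the user's "voice" is a 10-cycle tone.
played = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
user = [math.sin(2 * math.pi * 10 * t / 32) for t in range(32)]
captured = played[:32] + user  # frame 0: playback only, frame 1: user only
print(len(extract_target(captured, played)))  # 1
```

Dropping whole frames also discards any user speech that overlaps playback within the same frame; practical systems usually use adaptive acoustic echo cancellation for this, which the sketch does not attempt.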
Through the above steps, the device collects first voice information in the environment where it is located; the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself; and, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information. This technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
Optionally, the above steps may be executed by a home appliance such as an air conditioner or a refrigerator, but are not limited thereto.
Optionally, the second sound frequency is determined by obtaining it from the device's buffer. The voice information that the device itself plays is generally stored in the buffer in advance; it may also be obtained from another connected storage medium, such as a USB flash drive.
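One way to picture this buffer is a bounded ring buffer that the playback path writes into just before samples reach the speaker, regardless of whether they came from internal storage or a USB flash drive. The sketch below is an assumption, not the device's firmware: the class name `PlaybackCache`, its methods, and the capacity are hypothetical. `collections.deque` with `maxlen` supplies the ring-buffer behavior by discarding the oldest samples automatically.

```python
from collections import deque


class PlaybackCache:
    """Hypothetical bounded buffer of the samples the device is broadcasting."""

    def __init__(self, max_samples):
        self._buffer = deque(maxlen=max_samples)  # oldest samples fall off automatically
        self.is_playing = False

    def write(self, samples):
        """Record samples on their way to the speaker (any source: flash, USB drive, stream)."""
        self._buffer.extend(samples)
        self.is_playing = True

    def stop(self):
        """Playback finished: clear the reference so nothing is subtracted afterwards."""
        self._buffer.clear()
        self.is_playing = False

    def latest(self, n):
        """Return up to the n most recent samples, oldest first."""
        if n <= 0:
            return []
        return list(self._buffer)[-n:]


cache = PlaybackCache(max_samples=4)
cache.write([0.1, 0.2, 0.3])
cache.write([0.4, 0.5])
print(cache.latest(4))  # [0.2, 0.3, 0.4, 0.5]
```

Bounding the buffer keeps memory constant on an appliance-class processor; the capacity only needs to cover the longest expected delay between broadcasting a sound and capturing it again.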
Optionally, determining the third voice information in the first voice information according to the similarity between the first sound frequency and the second sound frequency, and deleting the third voice information from the first voice information to obtain the target voice information, includes: within the first sound frequency, determining the sound frequencies whose similarity to the second sound frequency is higher than a threshold, and taking them as a third sound frequency; and deleting the third voice information corresponding to the third sound frequency from the first voice information to obtain the target voice information.
The part of the first sound frequency that is highly similar to the second sound frequency can be identified as the sound the device itself is playing; once it is deleted, what remains is the user's voice information.
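Read literally, this step works on individual frequency components rather than whole frames. A minimal sketch, assuming the captured and played audio have already been converted into magnitude spectra with the same number of bins, and assuming a per-bin similarity of min/max of the two magnitudes (a hypothetical measure chosen for illustration), zeroes every bin whose similarity exceeds the threshold and keeps the rest as the target spectrum:

```python
def remove_similar_bins(captured_mag, played_mag, threshold=0.9):
    """Zero each bin of the captured spectrum that closely matches the played spectrum.

    Both arguments are non-negative magnitude spectra of equal length (an assumption).
    """
    if len(captured_mag) != len(played_mag):
        raise ValueError("spectra must have the same number of bins")
    target = []
    for c, p in zip(captured_mag, played_mag):
        larger = max(c, p)
        similarity = 1.0 if larger == 0 else min(c, p) / larger  # 1.0 means identical
        # Bins above the threshold form the "third sound frequency" and are removed.
        target.append(0.0 if similarity > threshold else c)
    return target


print(remove_similar_bins([1.0, 5.0, 3.0], [1.0, 0.0, 3.1]))  # [0.0, 5.0, 0.0]
```

A bin where user speech and playback genuinely overlap is removed as well, so this literal reading trades some loss of user energy for simplicity; turning the masked magnitudes back into audio would also need the original phases, which the sketch omits.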
Optionally, after the device collects the first voice information in its environment, if it detects that it is not currently playing any voice, the first voice information is determined to be the target voice information.
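The shortcut in this step can be expressed as a thin wrapper around whatever removal routine is used. In this sketch the function name `acquire_target_voice` and the `remove_playback` callback are hypothetical; the point is only that when nothing is being broadcast, the captured audio is returned unchanged and the similarity computation is skipped entirely.

```python
from typing import Callable, List, Sequence

# Hypothetical signature for any playback-removal routine (frame-level, bin-level, ...).
RemovePlayback = Callable[[Sequence[float], Sequence[float]], List[float]]


def acquire_target_voice(
    captured: Sequence[float],
    played: Sequence[float],
    is_playing: bool,
    remove_playback: RemovePlayback,
) -> List[float]:
    """Return the target voice information for one capture interval."""
    if not is_playing or len(played) == 0:
        # Nothing is being broadcast: all captured audio is the target voice.
        return list(captured)
    # Otherwise strip the device's own playback from the capture.
    return remove_playback(captured, played)


def drop_everything(captured, played):
    return []


print(acquire_target_voice([0.1, 0.2], [], False, drop_everything))  # [0.1, 0.2]
```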
Optionally, the device collecting the first voice information in its environment includes: the device collecting the first voice information through a microphone.
According to another embodiment of the present application, a method for acquiring voice information is further provided, including the following steps:
Step 1: a first device collects first voice information in the environment where it is located, and obtains from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment includes a plurality of voice playback devices;
Step 2: the first device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information;
Step 3: according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
When there are multiple voice playback devices in the current environment, each of them shares the voice information it is playing with a network-side device, so that other devices can refer to it when recognizing a user's control command and retain as much of the user's voice information as possible.
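The multi-device variant changes where the reference audio comes from: instead of a local buffer, the first device consults what every playback device has reported to the network side. The sketch below assumes that each device publishes a precomputed magnitude spectrum of its current output, models the network side as an in-memory dictionary, and flags a captured spectrum as playback if its cosine similarity to any published reference exceeds a threshold. `PlaybackRegistry`, its methods, the device IDs, and the toy four-bin spectra are hypothetical; network latency and clock alignment are not modeled.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PlaybackRegistry:
    """Hypothetical network-side record of what each device is playing right now."""

    def __init__(self) -> None:
        self._current: Dict[str, List[float]] = {}

    def publish(self, device_id: str, spectrum: Sequence[float]) -> None:
        # A playback device reports the magnitude spectrum of its current output.
        self._current[device_id] = list(spectrum)

    def withdraw(self, device_id: str) -> None:
        # The device stopped playing; its output is no longer a reference.
        self._current.pop(device_id, None)

    def references(self) -> List[List[float]]:
        return list(self._current.values())


def is_device_playback(
    captured: Sequence[float], registry: PlaybackRegistry, threshold: float = 0.9
) -> bool:
    """True if the captured spectrum matches what any registered device is playing."""
    return any(cosine_similarity(captured, ref) > threshold for ref in registry.references())


registry = PlaybackRegistry()
registry.publish("tv", [0.0, 0.0, 1.0, 0.0])
registry.publish("speaker", [1.0, 0.0, 0.0, 0.0])
print(is_device_playback([0.0, 0.0, 0.5, 0.0], registry))  # True: matches the TV
print(is_device_playback([0.0, 1.0, 0.0, 0.0], registry))  # False: treated as user voice
```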
The above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
According to another embodiment of the present application, a method for acquiring voice information is further provided, including the following steps:
Step 1: the device collects first voice information in the environment where the device is located;
Step 2: the device determines first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself;
Step 3: according to the similarity between the first feature information and the second feature information, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
The above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
Optionally, the first feature information and the second feature information each include at least one of the following: sound frequency, pitch, timbre, and volume.
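The feature-based variant compares a small vector of descriptors instead of raw spectra. The patent names the features but not how to compute them, so the mapping below is an assumption: the frequency of the strongest DFT bin stands in for both sound frequency and pitch, the spectral centroid serves as a rough proxy for timbre, and RMS amplitude stands for volume. Similarity is the mean of per-feature min/max ratios, which equals 1.0 for identical feature vectors. All function names are hypothetical.

```python
import cmath
import math


def features(frame, sample_rate):
    """Return (dominant_hz, centroid_hz, rms); the mapping to the patent's features is assumed."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    bin_hz = sample_rate / n
    dominant_hz = max(range(len(mags)), key=lambda k: mags[k]) * bin_hz  # frequency / pitch
    total = sum(mags)
    centroid_hz = sum(k * m for k, m in enumerate(mags)) / total * bin_hz if total else 0.0  # timbre proxy
    rms = math.sqrt(sum(x * x for x in frame) / n)  # volume
    return dominant_hz, centroid_hz, rms


def ratio(a, b):
    """min/max similarity of two non-negative values; 1.0 when both are zero."""
    larger = max(a, b)
    return 1.0 if larger == 0 else min(a, b) / larger


def feature_similarity(fa, fb):
    """Mean of per-feature ratios: 1.0 for identical feature vectors."""
    return sum(ratio(a, b) for a, b in zip(fa, fb)) / len(fa)


sr = 8000
played = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]        # 1000 Hz tone
voice = [0.2 * math.sin(2 * math.pi * 10 * t / 32) for t in range(32)]  # 2500 Hz, quieter
print(features(played, sr)[0])  # 1000.0
```

A captured frame would then be treated as the device's own sound when `feature_similarity` exceeds a chosen threshold, exactly as in the frequency-based variant.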
The following description is given with reference to another embodiment of the present application.
This application addresses the following technical problem: ensuring that the voice signal received by an online voice device is not affected by the sound the device itself broadcasts.
The device in this application supports online voice functions as well as voice broadcast and interaction functions.
The overall system in this application comprises a voice capture part, a control unit, and a voice playback part. FIG. 3 is a schematic structural diagram of a voice device according to the present application. As shown in FIG. 3, the device includes a voice capture module, a control unit, and a voice playback module. While the device is broadcasting voice, the control unit simultaneously buffers the frequency of the broadcast sound; at the same time, the control unit receives the audio captured by the voice capture module. Within the control unit, the captured audio is compared with the buffered broadcast audio, the parts of the captured audio that are highly similar to the broadcast audio are deleted, and the remainder is the audio content of the actual capture environment.
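The data flow in FIG. 3 can be sketched as a control unit with two entry points, one called by the playback module and one by the capture module. This is a hypothetical structure, not the applicant's implementation: the class and method names, the eight-frame history, and the 0.95 threshold are assumptions. The magnitude spectrum of each broadcast frame is cached, and a captured frame is discarded when its spectrum is highly similar to any cached one. Comparing against a short history instead of a single frame is one simple way to tolerate the unknown delay between broadcasting a sound and capturing it, and cosine similarity makes the test insensitive to the attenuation along the speaker-to-microphone path.

```python
import cmath
import math
from collections import deque


def magnitude_spectrum(frame):
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ControlUnit:
    """Hypothetical control unit between the capture and playback modules of FIG. 3."""

    def __init__(self, threshold=0.95, history_frames=8):
        self.threshold = threshold
        self._broadcast_cache = deque(maxlen=history_frames)  # spectra of recent broadcast frames

    def on_broadcast(self, frame):
        """Called by the playback module for every frame it sends to the speaker."""
        self._broadcast_cache.append(magnitude_spectrum(frame))

    def on_capture(self, frame):
        """Called by the capture module; returns None if the frame is the device's own audio."""
        spectrum = magnitude_spectrum(frame)
        if any(cosine_similarity(spectrum, ref) > self.threshold for ref in self._broadcast_cache):
            return None
        return list(frame)  # remaining audio from the actual environment


tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
voice = [math.sin(2 * math.pi * 10 * t / 32) for t in range(32)]
unit = ControlUnit()
unit.on_broadcast(tone)
print(unit.on_capture([0.3 * x for x in tone]))  # None: attenuated echo of the broadcast
```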
By adopting the above technical solution, the influence of the sound broadcast by the online voice device itself is eliminated, and the accuracy of the online voice device's sound sampling is improved.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
Embodiment 2
This embodiment further provides an apparatus for acquiring voice information. The apparatus is configured to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including:
a first acquisition module, configured to collect first voice information in the environment where the device is located;
a first determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself;
a second determining module, configured to determine, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
The above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including:
a second acquisition module, configured to collect first voice information in the environment where the device is located, and to obtain from the network side second voice information currently being played by all voice playback devices in the current environment, where the environment includes a plurality of voice playback devices;
a third determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information;
a fourth determining module, configured to determine, according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
The above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
According to another embodiment of the present application, an apparatus for acquiring voice information is further provided, including:
a third acquisition module, configured to collect first voice information in the environment where the device is located;
a fifth determining module, configured to determine first feature information corresponding to the first voice information and second feature information corresponding to second voice information, where the second voice information is the voice played by the device itself;
a sixth determining module, configured to determine, according to the similarity between the first feature information and the second feature information, third voice information in the first voice information, and to delete the third voice information from the first voice information to obtain target voice information.
The above technical solution solves the problem in the related art that the sound broadcast by a device itself is difficult to distinguish from the voice information the device collects: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and enable voice interaction with the device.
It should be noted that each of the above modules may be implemented in software or hardware. For the latter, they may be implemented in, but are not limited to, the following ways: all of the modules are located in the same processor; or the modules are distributed across different processors in any combination.
Embodiment 3
An embodiment of the present application further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: the device collects first voice information in the environment where the device is located;
S2: the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is the voice played by the device itself;
S3: according to the similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information is determined and deleted from the first voice information to obtain target voice information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
本申请的实施例还提供了一种电子装置,包括存储器和处理器,该存储器中存储有计算机程序,该处理器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。An embodiment of the present application further provides an electronic device including a memory and a processor. The memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any one of the foregoing method embodiments.
可选地,上述电子装置还可以包括传输装置以及输入输出设备,其中, 该传输装置和上述处理器连接,该输入输出设备和上述处理器连接。Optionally, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor, and the input / output device is connected to the processor.
可选地,在本实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:Optionally, in this embodiment, the foregoing processor may be configured to execute the following steps by a computer program:
S1: a device collects first voice information in the environment where the device is located;
S2: the device determines a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, where the second voice information is voice played by the device itself;
S3: according to the similarity between the first sound frequency and the second sound frequency, third voice information is determined in the first voice information and deleted from the first voice information to obtain target voice information.
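Claim 2 below obtains the second sound frequency from a buffer of the device, so one way the memory of this electronic device could serve step S2 is to keep the spectra of recently played frames rather than recomputing them. The sketch is hypothetical: the class `PlaybackSpectrumBuffer`, the integer frame indexing, the capacity of 50 frames, the magnitude-ratio similarity and the threshold 0.8 are illustrative assumptions, not details from the application.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple


class PlaybackSpectrumBuffer:
    """Hypothetical ring buffer of magnitude spectra for frames the device has played."""

    def __init__(self, capacity: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest frame once capacity is reached.
        self._frames: Deque[Tuple[int, List[float]]] = deque(maxlen=capacity)

    def push(self, frame_index: int, magnitudes: Sequence[float]) -> None:
        self._frames.append((frame_index, list(magnitudes)))

    def lookup(self, frame_index: int) -> Optional[List[float]]:
        # Search newest first; None means nothing was played for that frame.
        for idx, mags in reversed(self._frames):
            if idx == frame_index:
                return mags
        return None


def bins_to_delete(mic_mags: Sequence[float], ref_mags: Sequence[float],
                   threshold: float = 0.8, floor: float = 1e-6) -> List[int]:
    """Indices of bins whose similarity to the played spectrum is higher than threshold."""
    deleted = []
    for k, (m, r) in enumerate(zip(mic_mags, ref_mags)):
        if r <= floor:
            continue  # the device played nothing in this bin
        if min(m, r) / max(m, r) > threshold:
            deleted.append(k)
    return deleted


def process_frame(buffer: PlaybackSpectrumBuffer, frame_index: int,
                  mic_mags: Sequence[float], threshold: float = 0.8) -> List[float]:
    ref = buffer.lookup(frame_index)
    if ref is None:
        # Nothing buffered for this frame: treat the whole capture as target (cf. claim 4).
        return list(mic_mags)
    drop = set(bins_to_delete(mic_mags, ref, threshold))
    return [0.0 if k in drop else m for k, m in enumerate(mic_mags)]
```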
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations; details are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from that described here. Alternatively, they may each be made into an individual integrated circuit module, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present application is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.
Industrial Applicability
In the above technical solution provided by the present application, the sound played by the device itself is deleted from the environmental voice information collected by the device, so as to eliminate interference from the device's own sound as far as possible. This solves the problem in the related art that the sound broadcast by the device itself is difficult to distinguish from the voice information collected by the device: the two are accurately separated according to sound frequency, so that the device can accurately obtain the user's voice information and realize voice interaction with the device.

Claims (13)

  1. A method for acquiring voice information, comprising:
    collecting, by a device, first voice information in an environment in which the device is located;
    determining, by the device, a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, wherein the second voice information is voice played by the device itself; and
    determining, according to a similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and deleting the third voice information from the first voice information to obtain target voice information.
  2. The method according to claim 1, wherein the second sound frequency is determined by:
    acquiring the second sound frequency from a buffer of the device.
  3. The method according to claim 1, wherein determining, according to the similarity between the first sound frequency and the second sound frequency, the third voice information in the first voice information, and deleting the third voice information from the first voice information to obtain the target voice information comprises:
    determining, among the first sound frequencies, a sound frequency whose similarity to the second sound frequency is higher than a threshold, and using the determined sound frequency as a third sound frequency; and
    deleting the third voice information corresponding to the third sound frequency from the first voice information to obtain the target voice information.
  4. The method according to claim 1, wherein after the device collects the first voice information in the environment in which the device is located, the method further comprises:
    when it is detected that the device is not currently playing voice, determining the first voice information as the target voice information.
  5. The method according to claim 1, wherein collecting, by the device, the first voice information in the environment in which the device is located comprises:
    collecting, by the device, the first voice information through a microphone.
  6. A method for acquiring voice information, comprising:
    collecting, by a first device, first voice information in an environment in which the first device is located, and acquiring, from a network side, second voice information currently played by all voice playback devices in the current environment, wherein the environment comprises multiple voice playback devices;
    determining, by the first device, a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and
    determining, according to a similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and deleting the third voice information from the first voice information to obtain target voice information.
  7. A method for acquiring voice information, comprising:
    collecting, by a device, first voice information in an environment in which the device is located;
    determining, by the device, first feature information corresponding to the first voice information and second feature information corresponding to second voice information, wherein the second voice information is voice played by the device itself; and
    determining, according to a similarity between the first feature information and the second feature information, third voice information in the first voice information, and deleting the third voice information from the first voice information to obtain target voice information.
  8. The method according to claim 7, wherein the first feature information and the second feature information each comprise at least one of the following:
    sound frequency, pitch, timbre, and volume.
  9. An apparatus for acquiring voice information, comprising:
    a first acquisition module, configured to collect first voice information in an environment in which a device is located;
    a first determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to second voice information, wherein the second voice information is voice played by the device itself; and
    a second determining module, configured to determine, according to a similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and delete the third voice information from the first voice information to obtain target voice information.
  10. An apparatus for acquiring voice information, comprising:
    a second acquisition module, configured to collect first voice information in an environment in which a device is located, and acquire, from a network side, second voice information currently played by all voice playback devices in the current environment, wherein the environment comprises multiple voice playback devices;
    a third determining module, configured to determine a first sound frequency corresponding to the first voice information and a second sound frequency corresponding to the second voice information; and
    a fourth determining module, configured to determine, according to a similarity between the first sound frequency and the second sound frequency, third voice information in the first voice information, and delete the third voice information from the first voice information to obtain target voice information.
  11. An apparatus for acquiring voice information, comprising:
    a third acquisition module, configured to collect first voice information in an environment in which a device is located;
    a fifth determining module, configured to determine first feature information corresponding to the first voice information and second feature information corresponding to second voice information, wherein the second voice information is voice played by the device itself; and
    a sixth determining module, configured to determine, according to a similarity between the first feature information and the second feature information, third voice information in the first voice information, and delete the third voice information from the first voice information to obtain target voice information.
  12. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to perform, when run, the method according to any one of claims 1 to 8.
  13. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 8.
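Claims 7 and 8 generalize the comparison from sound frequency alone to feature information (sound frequency, pitch, timbre, volume). A minimal sketch, assuming each captured segment has already been reduced to those four non-negative features: the `Features` dataclass, the spectral centroid as a stand-in for timbre, the relative-difference similarity, the per-feature weights and the threshold 0.9 are all hypothetical choices. Setting a weight to zero drops that feature, which mirrors the "at least one of" wording of claim 8.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Features:
    frequency_hz: float  # dominant sound frequency
    pitch_hz: float      # pitch, e.g. an estimated fundamental frequency
    centroid_hz: float   # spectral centroid, used here as a rough proxy for timbre
    volume: float        # e.g. RMS level of the segment


def feature_similarity(a: Features, b: Features,
                       weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted mean of per-feature similarities 1 - |x - y| / max(|x|, |y|)."""
    pairs = [(a.frequency_hz, b.frequency_hz), (a.pitch_hz, b.pitch_hz),
             (a.centroid_hz, b.centroid_hz), (a.volume, b.volume)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one feature weight must be positive")
    score = 0.0
    for (x, y), w in zip(pairs, weights):
        hi = max(abs(x), abs(y))
        # Two zero values count as identical; for non-negative inputs s lies in [0, 1].
        s = 1.0 if hi == 0.0 else 1.0 - abs(x - y) / hi
        score += w * s
    return score / total


def remove_own_segments(captured: Sequence[Features], played: Features,
                        threshold: float = 0.9,
                        weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)
                        ) -> List[Features]:
    """Keep segments whose similarity to the played voice is not higher than threshold."""
    return [seg for seg in captured
            if feature_similarity(seg, played, weights) <= threshold]
```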
PCT/CN2018/120368 2018-08-01 2018-12-11 Voice information obtaining method and apparatus WO2020024508A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810866714.9A CN110797048B (en) 2018-08-01 2018-08-01 Method and device for acquiring voice information
CN201810866714.9 2018-08-01

Publications (1)

Publication Number Publication Date
WO2020024508A1 (en) 2020-02-06

Family

ID=69230807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/120368 WO2020024508A1 (en) 2018-08-01 2018-12-11 Voice information obtaining method and apparatus

Country Status (2)

Country Link
CN (1) CN110797048B (en)
WO (1) WO2020024508A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509567A (en) * 2020-12-25 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for processing voice data

Citations (5)

Publication number Priority date Publication date Assignee Title
US20040131201A1 (en) * 2003-01-08 2004-07-08 Hundal Sukhdeep S. Multiple wireless microphone speakerphone system and method
CN202197344U (en) * 2011-07-08 2012-04-18 歌尔声学股份有限公司 Transmitter array echo eliminating system
CN103325379A (en) * 2012-03-23 2013-09-25 杜比实验室特许公司 Method and device used for acoustic echo control
CN104158990A (en) * 2013-05-13 2014-11-19 英特尔Ip公司 Method for processing an audio signal and audio receiving circuit
CN105187594A (en) * 2015-07-28 2015-12-23 小米科技有限责任公司 Echo canceling method and device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN103325383A (en) * 2012-03-23 2013-09-25 杜比实验室特许公司 Audio processing method and audio processing device
CN104517607A (en) * 2014-12-16 2015-04-15 佛山市顺德区美的电热电器制造有限公司 Speed-controlled appliance and method of filtering noise therein
CN105657150A (en) * 2015-09-29 2016-06-08 宇龙计算机通信科技(深圳)有限公司 Noise elimination method and device and electronic device
CN106098078B (en) * 2016-06-14 2020-06-02 惠州Tcl移动通信有限公司 Voice recognition method and system capable of filtering loudspeaker noise
CN111968643A (en) * 2017-09-29 2020-11-20 赵成智 Intelligent recognition method, robot and computer readable storage medium

Also Published As

Publication number Publication date
CN110797048A (en) 2020-02-14
CN110797048B (en) 2022-09-13

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18928906; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase. Ref country code: DE

122 Ep: PCT application non-entry in European phase. Ref document number: 18928906; Country of ref document: EP; Kind code of ref document: A1