WO2021180162A1 - Power consumption control, mode configuration and VAD method, device and storage medium - Google Patents

Power consumption control, mode configuration and VAD method, device and storage medium


Publication number
WO2021180162A1
WO2021180162A1 PCT/CN2021/080172 CN2021080172W WO2021180162A1 WO 2021180162 A1 WO2021180162 A1 WO 2021180162A1 CN 2021080172 W CN2021080172 W CN 2021080172W WO 2021180162 A1 WO2021180162 A1 WO 2021180162A1
Authority
WO
WIPO (PCT)
Prior art keywords
vad
voice chip
mode
voice
currently used
Application number
PCT/CN2021/080172
Other languages
English (en)
French (fr)
Inventor
杨智慧
付强
田彪
马骁
吴登峰
袁斌
余磊
左玲云
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司



Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/87: Detection of discrete points within a voice signal
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • This application relates to the field of voice processing technology, and in particular to a method, device and storage medium for power consumption control, mode configuration and VAD.
  • VAD: Voice Activity Detection, also known as voice endpoint detection.
  • In existing solutions, a hardware VAD module detects whether there is voice input based on signal energy. While no voice input is detected, the voice chip stays in the low power consumption mode. When voice input is detected, the voice chip is woken up and starts voice processing.
  • Because the hardware VAD module decides whether there is voice input based only on signal energy, it is easily triggered by mistake in a noisy environment. This degrades the low-power performance of the voice chip.
  • Various aspects of the present application provide power consumption control, mode configuration and VAD methods, devices and storage media. They aim to improve the accuracy of voice input detection and reduce the probability of false triggering.
  • An embodiment of the present application provides a power consumption control method. It applies to a voice chip or device that has a hardware VAD function and a software VAD function. The method includes:
    • collecting the sound signal input to the voice chip or device;
    • using the VAD mode currently used by the voice chip or device to detect whether the sound signal contains a voice signal;
    • if the sound signal contains a voice signal, switching the voice chip or device from the low power consumption mode to the normal working mode.
    The currently used VAD mode is one of multiple VAD modes produced by the combined use of the hardware VAD function and the software VAD function.
  • the embodiment of the present application also provides a VAD mode configuration method, which is suitable for a voice chip or device that has a hardware VAD function and a software VAD function.
  • The method includes: receiving a policy configuration instruction; and, according to that instruction, configuring the VAD mode usage strategy that the voice chip or device requires for using multiple VAD modes. The multiple VAD modes are generated by the combined use of the hardware VAD function and the software VAD function.
  • An embodiment of the present application also provides a voice endpoint detection method. It applies to a voice chip or device that has a hardware VAD function and a software VAD function. The method includes: collecting the sound signal input to the voice chip or device; and performing VAD processing on the sound signal using the VAD mode currently used by the voice chip or device. The currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • An embodiment of the present application also provides a voice chip, including a sound pickup module, a hardware VAD module, a processor, and a memory.
    • The memory stores a VAD program and a power consumption control program.
    • The sound pickup module collects the sound signal input to the voice chip.
    • The hardware VAD module detects, in hardware, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
    • The processor executes the VAD program to detect, in software, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
    • The processor also executes the power consumption control program. When the currently used VAD mode detects that the sound signal contains a voice signal, this program switches the voice chip from the low power consumption mode to the normal working mode.
    The currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • An embodiment of the present application also provides a voice chip, including a sound pickup module, a hardware VAD module, a main processor, a coprocessor, and a memory.
    • The memory stores a VAD program and a power consumption control program.
    • The sound pickup module collects the sound signal input to the voice chip.
    • The hardware VAD module detects, in hardware, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
    • The coprocessor executes the VAD program to detect, in software, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
    • The coprocessor also executes the power consumption control program. When the currently used VAD mode detects that the sound signal contains a voice signal, this program switches the main processor from the low power consumption mode to the normal working mode.
    The currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • An embodiment of the present application also provides an intelligent terminal that includes a voice chip. The voice chip includes a pickup module, a hardware VAD module, a processor, and a memory.
    • The memory stores a VAD program and a power consumption control program.
    • The pickup module collects the sound signal input to the voice chip.
    • The hardware VAD module detects, in hardware, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
    • The processor executes the VAD program to detect, in software, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
    • The processor also executes the power consumption control program. When the currently used VAD mode detects that the sound signal contains a voice signal, this program switches the voice device from the low power consumption mode to the normal working mode.
    The currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • An embodiment of the present application also provides an intelligent terminal, including a voice chip and a main processor. The voice chip includes a pickup module, a hardware VAD module, a coprocessor, and a memory.
    • The memory stores a VAD program and a power consumption control program.
    • The pickup module collects the sound signal input to the voice chip.
    • The hardware VAD module detects, in hardware, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
    • The coprocessor executes the VAD program to detect, in software, whether the sound signal contains a voice signal. It does so when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
    • The coprocessor also executes the power consumption control program. When the currently used VAD mode detects that the sound signal contains a voice signal, this program switches the main processor from the low power consumption mode to the normal working mode.
    The currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • the embodiment of the present application also provides a self-service terminal, including a voice chip and a main processor;
  • the voice chip includes a pickup module, a hardware VAD module, a coprocessor, and a memory;
  • the memory stores a VAD program and a power consumption control program;
  • the pickup module is used to collect the sound signal input to the voice chip;
  • the hardware VAD module is used to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
  • the coprocessor is used to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
  • the coprocessor is also used to execute the power consumption control program, so as to switch the main processor from the low power consumption mode to the normal working mode when the currently used VAD mode detects that the sound signal contains a voice signal; the currently used VAD mode is one of multiple VAD modes generated by the combined use of the hardware VAD function and the software VAD function.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • When a processor executes the computer program, the processor implements the steps of one of the following methods provided by the embodiments of the present application: the power consumption control method, the VAD mode configuration method, or the voice endpoint detection method.
  • In the embodiments of the present application, the voice chip or device has a VAD function that can detect whether there is voice input.
  • When voice input is detected, the voice chip or device enters the normal working mode from the low power consumption mode. Otherwise it stays in the low power consumption mode, which saves power. Further, the voice chip or device has both the hardware VAD function and the software VAD function.
  • Combining the hardware VAD function and the software VAD function produces a variety of VAD modes.
  • Flexibly configuring the VAD mode used has three effects, to a certain extent. It improves the accuracy of voice input detection, reduces the probability of false triggering, and improves the low-power performance of the voice chip or device.
  • FIG. 1 is a schematic flowchart of a power consumption control method according to an exemplary embodiment of this application;
  • FIG. 2 is a schematic flowchart of a method for configuring a VAD mode according to an exemplary embodiment of this application;
  • FIG. 3 is a schematic flowchart of a voice endpoint detection method according to an exemplary embodiment of this application;
  • FIG. 4a is a schematic structural diagram of a voice chip provided by an exemplary embodiment of this application;
  • FIG. 4b is a schematic diagram of a state of configuring a voice chip provided by an exemplary embodiment of this application;
  • FIG. 4c is a schematic structural diagram of another voice chip provided by an exemplary embodiment of this application;
  • FIG. 5 is a schematic structural diagram of still another voice chip provided by an exemplary embodiment of this application;
  • FIG. 6 is a schematic structural diagram of a smart terminal provided by an exemplary embodiment of this application;
  • FIG. 7 is a schematic structural diagram of another smart terminal provided by an exemplary embodiment of this application.
  • In the embodiments of the present application, the voice chip or device has both the hardware VAD function and the software VAD function. Combining the two produces multiple VAD modes, and the VAD mode used by the voice chip or device can be configured flexibly. To a certain extent, this improves the accuracy of voice input detection, reduces the probability of false triggering, and improves the low-power performance of the voice chip or device.
  • The voice chip or device in the embodiments of the present application is explained first.
  • The type of voice chip or device is not limited. Any chip or device capable of voice signal detection and other processing (such as storage and playback) is applicable to the embodiments of the present application.
  • the voice device may be an electronic device that includes a voice chip.
  • the voice device may also include other components, such as communication modules such as WiFi and Bluetooth, a display, and a power supply module.
  • the voice chip contains at least a processor and a memory.
  • In some embodiments, the voice device contains no processor other than the one in the voice chip.
  • In this case, the processor in the voice chip must implement both the voice processing functions and the basic functions of the voice device. For example, suppose the voice device is a smart alarm clock that supports voice broadcast. An alarm clock's functions are relatively simple, so the smart alarm clock contains no processor other than the one in its voice chip, which simplifies its design. The processor in the voice chip then implements the voice processing functions as well as basic functions such as timekeeping.
  • the voice device may also include a main processor.
  • In this case, the processor in the voice chip can be implemented as a coprocessor.
  • For example, suppose the voice device is a smartphone that contains a voice chip. A smartphone has more powerful functions, so it includes a main processor in addition to the processor in the voice chip. The processor in the voice chip acts as a coprocessor and is mainly responsible for voice processing functions.
  • The main processor is mainly responsible for the smartphone's basic functions, such as communication, wireless Internet access, gaming, video playback, photography and online transactions.
  • These voice chips or devices can be used in low-power scenarios, where power consumption must be considered.
  • For example, power consumption must be considered when the voice chip or device is powered by a battery (such as a dry cell, battery, or battery pack), or when the device containing the voice chip is battery-powered.
  • The battery-powered voice device, or the device containing the voice chip, can be any smart terminal that contains a voice chip with voice functions. Examples include, but are not limited to: remote controls, story machines, smart speakers, tablet computers, smartphones, smart robots, smart alarm clocks, smart bracelets, smart switches, unmanned delivery vehicles, self-service express cabinets and self-service terminals.
  • The self-service terminal may be a supermarket POS machine, a bank self-service teller machine, or a self-service shopping-guide terminal in places such as shopping malls and airports.
  • The voice chip or device can also be powered by a non-battery source (such as mains power). Examples include, but are not limited to, a TV, an air conditioner, a water heater or a desktop computer.
  • the voice chip or device in the embodiment of the present application has both the hardware VAD function and the software VAD function.
  • The hardware VAD function refers to the VAD function realized by a hardware VAD module built into the voice chip or device.
  • The hardware VAD module can be fixed in the voice chip or device, and its behavior can be adjusted through configuration parameters.
  • the software VAD function refers to the VAD function realized by the VAD program executed by the processor in the voice chip or device.
  • the hardware VAD function and the software VAD function are used in combination.
  • the combination of the hardware VAD function and the software VAD function can generate a variety of VAD modes.
  • at least the following VAD modes can be combined: hardware VAD mode, software VAD mode, soft-hard combined VAD mode, and soft-soft combined VAD mode, etc.
  • the hardware VAD mode refers to the mode of using the hardware VAD module alone to perform VAD
  • the software VAD mode refers to the mode of using the processor to execute the VAD program alone to perform VAD
  • The soft-hard combined VAD mode is a mode in which the hardware VAD module first performs VAD, and then the processor executes the VAD program to perform VAD again.
  • the soft-soft combined VAD mode refers to the mode in which the processor executes multiple VAD programs to perform multiple VADs.
  • the implementation of the hardware VAD is not limited.
  • For example, the hardware VAD module can detect whether the received sound signal contains a voice signal based on the energy of the sound signal.
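The sketch below illustrates energy-based detection. It computes the mean-square energy of each frame in dB and reports voice once a few consecutive frames exceed a threshold. It is a minimal sketch, not the hardware design in the application. All of the following are hypothetical parameters:
- a frame length of 160 samples (10 ms at an assumed 16 kHz sample rate);
- a -40 dB threshold;
- the `min_active_frames` rule.

```python
import math
from typing import Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB, relative to a full-scale amplitude of 1.0."""
    power = sum(x * x for x in frame) / len(frame)
    # An all-zero frame has no energy; map it to -inf instead of calling log10(0).
    return 10.0 * math.log10(power) if power > 0 else float("-inf")


def energy_vad(samples: Sequence[float], frame_len: int = 160,
               threshold_db: float = -40.0, min_active_frames: int = 3) -> bool:
    """Return True once `min_active_frames` consecutive frames exceed `threshold_db`."""
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_energy_db(samples[start:start + frame_len]) > threshold_db:
            run += 1
            if run >= min_active_frames:
                return True
        else:
            run = 0  # a quiet frame resets the run of active frames
    return False
```

The decision uses energy alone. A loud tone passes the threshold, and so does loud non-speech input such as alternating ±0.1 samples. This is the false-triggering weakness of energy-only hardware VAD described earlier.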
  • the implementation of the software VAD is not limited.
  • For example, one software VAD implementation works as follows:
    • divide the sound signal into frames;
    • extract features from each frame;
    • train a classifier on a set of frames from regions known to contain voice or silence;
    • classify each unknown frame as voice or silence, which determines whether there is voice input.
  • Another software VAD implementation pre-trains a neural network VAD (Neural Network VAD, NN-VAD) model on human voice samples, so that the model can detect whether a sound signal contains a voice signal. The sound signal is then fed into the NN-VAD model, which detects whether there is voice input.
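The classifier-based approach (framing, per-frame features, a classifier trained on frames from known voice and silence regions) can be sketched as follows. This is a hedged illustration, not the classifier the application uses. The two features (log energy and zero-crossing rate) and the nearest-centroid classifier are stand-ins chosen for brevity. The features are also left unnormalized, so log energy dominates the distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the signal into non-overlapping frames, dropping any trailing partial frame."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_features(frame: Sequence[float]) -> Feature:
    """(log energy in dB, zero-crossing rate) for one frame."""
    power = sum(x * x for x in frame) / len(frame)
    log_energy = 10.0 * math.log10(power + 1e-12)  # small floor avoids log10(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return log_energy, crossings / (len(frame) - 1)


def _centroid(feats: List[Feature]) -> Feature:
    n = len(feats)
    return sum(f[0] for f in feats) / n, sum(f[1] for f in feats) / n


class NearestCentroidVad:
    """Labels a frame as voice if its features are closer to the voice centroid."""

    def __init__(self, voice_frames: List[Sequence[float]],
                 silence_frames: List[Sequence[float]]) -> None:
        # "Training" = averaging the features of the labeled voice and silence frames.
        self.centroids: Dict[bool, Feature] = {
            True: _centroid([frame_features(f) for f in voice_frames]),
            False: _centroid([frame_features(f) for f in silence_frames]),
        }

    def is_voice_frame(self, frame: Sequence[float]) -> bool:
        e, z = frame_features(frame)
        dist = {label: (e - c[0]) ** 2 + (z - c[1]) ** 2 for label, c in self.centroids.items()}
        return dist[True] < dist[False]

    def detect(self, samples: Sequence[float], frame_len: int = 160) -> bool:
        """Report voice input if any frame is classified as voice."""
        return any(self.is_voice_frame(f) for f in split_frames(samples, frame_len))
```

A real implementation would normalize the features and use a stronger classifier. The structure stays the same: frame, featurize, classify each frame, aggregate.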
  • Combining the hardware VAD function and the software VAD function produces a variety of VAD modes. The VAD mode used by the voice chip or device can then be configured flexibly according to its environment, application scenario, time information and/or user preferences. Using a suitable VAD mode can, to a certain extent, improve the accuracy of voice input detection and reduce the probability of false triggering. It also improves the low-power performance of the voice chip or device.
  • FIG. 1 is a schematic flowchart of a power consumption control method provided by an exemplary embodiment of this application. This method is suitable for voice chips or devices, which have both hardware VAD functions and software VAD functions. For related descriptions of the voice chip or device, the hardware VAD function and the software VAD function, refer to the foregoing embodiment, and will not be repeated here. As shown in Figure 1, the method includes:
  • Collect the sound signal input to the voice chip or device, and use the currently used VAD mode to detect whether the sound signal contains a voice signal. If it does, the voice chip or device enters the normal working mode from the low power consumption mode. The currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
  • The voice chip or device includes a pickup module such as a microphone, which captures sound signals input to the voice chip or device.
  • A sound signal generally refers to the sound wave produced by a vibrating object.
  • the sound signal entering the voice chip or device may contain both the voice signal and the environmental noise, or it may only contain the environmental noise but not the voice signal.
  • environmental noise includes but is not limited to: traffic noise, construction noise, industrial noise and social noise.
  • The collected sound signals differ from scenario to scenario. For example, in an outdoor scene, the sound signal collected by the voice chip or device may include traffic noise, or a voice signal uttered by the user of the voice chip or device.
  • In another scene, the sound signal collected by the voice chip or device may include noise generated by surrounding people, or a voice signal uttered by the user of the voice chip or device.
  • the voice chip or device can be in a low power consumption mode to save power consumption.
  • In the low power consumption mode, the VAD function of the voice chip or device can detect whether the sound signal contains a voice signal. When a voice signal is detected, the voice chip or device enters the normal working mode for voice processing. In the normal working mode, it can process the sound signal further, for example by checking whether it contains a designated wake-up word. If it does not, the voice chip or device can return to the sleep state.
  • If no voice signal is detected, the voice chip or device remains in the low power consumption mode to save power.
  • In the low power consumption mode, the sound pickup module and the hardware modules that implement the VAD function can work normally, while other hardware modules can stop working.
  • The normal working mode is defined relative to the low power consumption mode. In the normal working mode, every hardware module in the voice chip or device can work normally, and most or all functions are available. Which functions are available in each mode can be decided flexibly during chip or device design, according to application scenarios and requirements; this is not limited here.
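The switch between the two modes can be sketched as a small state machine. This is a minimal sketch under stated assumptions:
- `detect_voice` stands for whichever VAD mode is currently configured;
- `is_wake_word` stands for the wake-word check run in the normal working mode;
- both are hypothetical callables, and the sketch models only the two power modes named in the text.

```python
from enum import Enum
from typing import Callable, Sequence

Sound = Sequence[float]


class PowerMode(Enum):
    LOW_POWER = "low power consumption mode"
    NORMAL = "normal working mode"


class PowerController:
    """Hypothetical controller for a voice chip or device."""

    def __init__(self, detect_voice: Callable[[Sound], bool],
                 is_wake_word: Callable[[Sound], bool]) -> None:
        self.mode = PowerMode.LOW_POWER
        self.wakeups = 0  # how often VAD woke the chip, false triggers included
        self.detect_voice = detect_voice
        self.is_wake_word = is_wake_word

    def on_sound(self, sound: Sound) -> PowerMode:
        if self.mode is PowerMode.NORMAL:
            return self.mode  # already awake; further voice processing is not modeled
        if self.detect_voice(sound):
            self.mode = PowerMode.NORMAL  # voice detected: wake up for voice processing
            self.wakeups += 1
            if not self.is_wake_word(sound):
                self.mode = PowerMode.LOW_POWER  # no wake word: go back to sleep
        return self.mode
```

Some wakeups end back in `LOW_POWER`. Each of these is a wake-up that a more accurate VAD mode could have avoided, and avoiding them is the power saving this application targets.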
  • the voice chip or device not only has the hardware VAD function, but also has the software VAD function.
  • the software VAD function and the hardware VAD function are used in combination to obtain multiple VAD modes.
  • the combination of the software VAD function and the hardware VAD function can at least obtain the following VAD modes: hardware VAD mode, software VAD mode, soft-hard combined VAD mode, and soft-soft combined VAD mode.
  • For details of the hardware VAD function, the software VAD function and the various VAD modes, refer to the foregoing embodiment; they are not repeated here.
  • the VAD mode that the voice chip or device supports (or chooses to use) is not limited.
  • The VAD modes supported by the voice chip or device include multiple (i.e., two or more) of the VAD modes listed above. In this embodiment, the VAD mode currently used by the voice chip or device can be configured flexibly.
  • The VAD mode currently used by the voice chip or device is one of the VAD modes it supports. Which mode it can be depends on the set of supported modes, and this is not limited here.
  • The following four cases give examples of the VAD modes supported by the voice chip or device and the corresponding currently used VAD mode:
  • Case 1: The voice chip or device supports the hardware VAD mode and the software VAD mode. The currently used VAD mode is one of these two modes and can be configured flexibly.
  • Case 2: The voice chip or device supports the hardware VAD mode and the soft-hard combined VAD mode. The currently used VAD mode is one of these two modes and can be configured flexibly.
  • Case 3: The voice chip or device supports the software VAD mode and the soft-hard combined VAD mode. The currently used VAD mode is one of these two modes and can be configured flexibly.
  • Case 4: The voice chip or device supports the hardware VAD mode, the software VAD mode and the soft-hard combined VAD mode. The currently used VAD mode is one of these three modes and can be configured flexibly.
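The four cases can be written as sets of supported modes, with the currently used mode restricted to one member of the set. The `VadMode` names, the `CASES` table and the `VadConfig` class are hypothetical illustrations. The check that at least two modes are supported mirrors the "two or more" wording above.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable


class VadMode(Enum):
    HARDWARE = "hardware VAD mode"
    SOFTWARE = "software VAD mode"
    SOFT_HARD = "soft-hard combined VAD mode"
    SOFT_SOFT = "soft-soft combined VAD mode"


# Supported-mode sets for Cases 1 to 4.
CASES: Dict[int, FrozenSet[VadMode]] = {
    1: frozenset({VadMode.HARDWARE, VadMode.SOFTWARE}),
    2: frozenset({VadMode.HARDWARE, VadMode.SOFT_HARD}),
    3: frozenset({VadMode.SOFTWARE, VadMode.SOFT_HARD}),
    4: frozenset({VadMode.HARDWARE, VadMode.SOFTWARE, VadMode.SOFT_HARD}),
}


class VadConfig:
    def __init__(self, supported: Iterable[VadMode], initial: VadMode) -> None:
        self.supported = frozenset(supported)
        if len(self.supported) < 2:
            raise ValueError("expected two or more supported VAD modes")
        self.set_current(initial)

    def set_current(self, mode: VadMode) -> None:
        """Switch the currently used VAD mode, rejecting modes the chip does not support."""
        if mode not in self.supported:
            raise ValueError(f"{mode.value} is not supported")
        self.current = mode
```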
  • the VAD mode currently used by the voice chip or the device can be used to detect whether the sound signal contains a voice signal.
  • How the currently used VAD mode detects whether the sound signal contains a voice signal differs from mode to mode.
  • the following example illustrates:
  • Hardware VAD mode: the sound signal can be sent to the hardware VAD module in the voice chip or device, which detects in hardware whether the sound signal contains a voice signal. If the hardware detection finds a voice signal, it can be directly determined that the sound signal contains a voice signal.
  • In this case, while in the low power consumption mode, the sound pickup module and the hardware VAD module in the voice chip or device work normally. Whether other modules work normally is not limited.
  • Software VAD mode: the sound signal can be sent to the processor in the voice chip or device, which detects in software whether the sound signal contains a voice signal. If the software detection finds a voice signal, it can be directly determined that the sound signal contains a voice signal.
  • In this case, while in the low power consumption mode, the pickup module in the voice chip or device works normally, and the processor can at least execute the VAD program. Whether other modules work normally is not limited.
  • Suppose instead that the VAD mode currently used by the voice chip or device is the soft-hard combined VAD mode.
  • In this case, the sound signal is sent both to the hardware VAD module and to the processor in the voice chip or device. The hardware VAD module first checks, in hardware, whether the sound signal contains a voice signal. If it finds one, the processor checks again in software. If the software check also finds a voice signal, it is determined that the sound signal contains a voice signal.
  • If the hardware detection finds no voice signal, the subsequent software detection need not be executed, and the voice chip or device remains in the low power consumption mode.
  • Likewise, if the software re-check finds no voice signal, the voice chip or device remains in the low power consumption mode.
  • In this case, while in the low power consumption mode, the pickup module and the hardware VAD module in the voice chip or device work normally, and the processor can at least execute the VAD program. Whether other modules work normally is not limited.
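The per-mode detection paths can be sketched as a cascade of detectors, where a later stage runs only when every earlier stage reported voice. The detector callables are hypothetical stand-ins for the hardware VAD module and the VAD programs. Chaining the programs of the soft-soft combined mode in the same way is an assumption; the text only says that several VAD programs are executed.

```python
from enum import Enum
from typing import Callable, List, Sequence

Sound = Sequence[float]
Detector = Callable[[Sound], bool]


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    SOFT_HARD = "soft_hard"
    SOFT_SOFT = "soft_soft"


def stages_for(mode: VadMode, hw: Detector, sw_programs: Sequence[Detector]) -> List[Detector]:
    """Ordered detection stages for each VAD mode."""
    if mode is VadMode.HARDWARE:
        return [hw]
    if mode is VadMode.SOFTWARE:
        return [sw_programs[0]]
    if mode is VadMode.SOFT_HARD:
        return [hw, sw_programs[0]]  # hardware first, then a software re-check
    if len(sw_programs) < 2:
        raise ValueError("soft-soft combined mode needs at least two VAD programs")
    return list(sw_programs)


def contains_voice(mode: VadMode, sound: Sound, hw: Detector,
                   sw_programs: Sequence[Detector]) -> bool:
    # all() stops at the first False, so in soft-hard mode the processor-side
    # check is skipped whenever the hardware stage finds no voice.
    return all(stage(sound) for stage in stages_for(mode, hw, sw_programs))
```

The short-circuit is what makes the soft-hard combined mode attractive for power. The cheap hardware stage filters most silence, and the software stage only confirms candidates.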
  • In this embodiment, the voice chip or device supports multiple VAD modes, and the mode it uses can be configured flexibly as required. A more suitable VAD mode can therefore be used. To a certain extent, this improves the accuracy of voice input detection, reduces the probability of false triggering, and improves the low-power performance of the voice chip or device.
  • The strategy the voice chip or device needs for using multiple VAD modes can be pre-configured; it is called the VAD mode usage strategy.
  • That is, the voice chip or device is configured with information about the conditions that must be met to use each VAD mode. In practical applications, the VAD mode currently used by the voice chip or device can then be configured according to the pre-configured VAD mode usage strategy.
  • For how the VAD mode usage strategy is pre-configured, refer to the following VAD mode configuration method embodiment. This method can also be implemented on its own, independently of the foregoing power consumption control method embodiment.
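A VAD mode usage strategy can be pictured as an ordered list of (condition, mode) rules, evaluated against the device's current context, with a default mode when no rule matches. The following are all hypothetical examples, not values from the application:
- the context keys (`noise_db`, `hour`);
- the thresholds;
- the mode chosen for each rule.
They are loosely based on the factors mentioned earlier (environment, time, scenario, user preference).

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping

Context = Mapping[str, float]


@dataclass(frozen=True)
class Rule:
    condition: Callable[[Context], bool]
    mode: str


@dataclass
class VadModeUsageStrategy:
    rules: List[Rule]
    default_mode: str

    def select(self, ctx: Context) -> str:
        """Return the mode of the first rule whose condition holds, else the default."""
        for rule in self.rules:
            if rule.condition(ctx):
                return rule.mode
        return self.default_mode


# Hypothetical strategy: double-check in noisy places, hardware-only at night.
strategy = VadModeUsageStrategy(
    rules=[
        Rule(lambda c: c.get("noise_db", 0.0) > 60.0, "soft_hard"),
        Rule(lambda c: c.get("hour", 12.0) < 6.0, "hardware"),
    ],
    default_mode="software",
)
```

First-match ordering keeps the strategy predictable: when several conditions hold at once, the earlier rule wins.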
  • FIG. 2 is a schematic flowchart of a VAD mode configuration method provided by an exemplary embodiment of this application. This method is also applicable to voice chips or devices, which have both hardware VAD functions and software VAD functions. For related descriptions of the voice chip or device, the hardware VAD function and the software VAD function, refer to the foregoing embodiment, and will not be repeated here. As shown in Figure 2, the method includes:
  • Receive a policy configuration instruction. According to that instruction, configure the VAD mode usage strategy the voice chip or device needs for using multiple VAD modes. The multiple VAD modes are generated by the combined use of the hardware VAD function and the software VAD function.
  • The sender of the policy configuration instruction is not limited. It can be any configuration party with configuration management authority over the voice chip or device, such as a computer, a virtual machine, an application, a terminal device, or configuration personnel.
  • the method for the configuration party to generate the policy configuration instruction is also not limited.
  • For example, the policy configuration instruction can be generated in any of the following ways:
    • through a command window or command line;
    • through an app on a terminal that manages the voice chip or device;
    • through a web page for managing the voice chip or device.
  • The policy configuration instruction tells the voice chip or device to configure the VAD mode usage strategy it needs for using the various VAD modes.
  • After receiving the policy configuration instruction, the voice chip or device configures the VAD mode usage strategy accordingly.
  • the timing of configuring the use strategy of the VAD mode is not limited.
  • For example, when the voice chip or device is initialized, a policy configuration instruction can be generated and sent to it. The voice chip or device then configures the VAD mode usage strategy according to that instruction during initialization.
  • As another example, a policy configuration instruction can be generated and sent to the voice chip or device during factory configuration. The voice chip or device then configures the VAD mode usage strategy according to that instruction during the factory configuration process.
  • The content of the policy configuration instruction is not limited.
  • Depending on that content, the way the VAD mode usage policy is configured from the instruction differs, and so does the way the VAD mode currently used by the voice chip or device is later configured from the pre-configured policy.
  • The following examples illustrate this:
  • In one example, in practical applications the user is allowed to directly specify the VAD mode used by the voice chip or device through a VAD configuration instruction.
  • In this case, configuration personnel can configure identifiers for the various VAD modes through the policy configuration instruction.
  • For example, the policy configuration instruction can carry the identifiers of the various VAD modes, or the rule for assigning identifiers to them, along with other information.
  • When the user wants to specify a VAD mode, its identifier can be carried in a VAD configuration instruction provided to the voice chip or device. The voice chip or device receives the VAD configuration instruction and configures its currently used VAD mode according to the carried identifier, so that the user-specified VAD mode is used. Specifically, the voice chip or device parses the VAD mode identifier out of the VAD configuration instruction.
  • The VAD mode identifier identifies the user-specified VAD mode, which is the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode; the voice chip or device then configures its currently used VAD mode as the specified mode.
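The text does not define how VAD mode identifiers are encoded or what a VAD configuration instruction looks like. The sketch below assumes a hypothetical text command of the form `VAD_MODE=<identifier>` with identifiers `hw`, `sw`, and `hw+sw`; the class and method names are illustrative only. It shows the parse-then-configure step: the identifier is extracted and the current mode is switched only if the identifier is valid.

```python
from enum import Enum


class VadMode(Enum):
    # Hypothetical identifiers; the text does not define an encoding.
    HARDWARE = "hw"
    SOFTWARE = "sw"
    COMBINED = "hw+sw"


class VoiceChip:
    """Toy model of a voice chip that stores its currently used VAD mode."""

    def __init__(self, mode: VadMode = VadMode.HARDWARE) -> None:
        self.current_mode = mode

    def handle_vad_config_command(self, command: str) -> VadMode:
        """Parse an assumed 'VAD_MODE=<identifier>' command and apply it."""
        key, sep, identifier = command.strip().partition("=")
        if key.strip() != "VAD_MODE" or not sep:
            raise ValueError(f"not a VAD configuration command: {command!r}")
        # Lookup by value raises ValueError for an unknown identifier,
        # leaving the current mode unchanged.
        self.current_mode = VadMode(identifier.strip())
        return self.current_mode
```

Rejecting unknown identifiers before assignment keeps the chip in its previous mode when a malformed instruction arrives.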
  • In another example, the use of the VAD modes is tied to the remaining power of the voice chip or device. The remaining power range corresponding to each VAD mode can be configured according to the policy configuration instruction. During application, the current remaining power of the voice chip or device is monitored, and the currently used VAD mode is configured according to it.
  • When the remaining power is low, the hardware VAD mode can be used to give priority to saving power.
  • One or more power thresholds can be set to divide the remaining power into ranges; the number of thresholds can be set flexibly according to application requirements.
  • For example, two power thresholds can be set, a first power threshold and a second power threshold, where the first is greater than the second.
  • For example, the first power threshold is 90% of full power and the second is 40%, although the thresholds are not limited to these values.
  • The first and second power thresholds can be carried in the policy configuration instruction and provided to the voice chip or device. The voice chip or device then maps the combined software-and-hardware VAD mode to a remaining power greater than or equal to the first power threshold, the software VAD mode to a remaining power greater than or equal to the second power threshold but less than the first, and the hardware VAD mode to a remaining power less than the second power threshold.
  • During application, the current remaining power of the voice chip or device is monitored. If it is greater than or equal to the first power threshold, the currently used VAD mode is configured as the combined software-and-hardware VAD mode; if it is greater than or equal to the second power threshold but less than the first, it is configured as the software VAD mode; and if it is less than the second power threshold, it is configured as the hardware VAD mode.
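The three-range power policy above can be expressed as a small mapping function. This is a sketch only: the function name is hypothetical, and the defaults simply reuse the example thresholds from the text (90% and 40%), expressed as percentages of full charge.

```python
from enum import Enum


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


def mode_for_remaining_power(
    remaining_pct: float,
    first_threshold: float = 90.0,   # example value from the text
    second_threshold: float = 40.0,  # example value from the text
) -> VadMode:
    """Map the remaining power (percent) to a VAD mode.

    remaining >= first threshold           -> combined software+hardware VAD
    second <= remaining < first threshold  -> software VAD
    remaining < second threshold           -> hardware VAD (lowest power)
    """
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must exceed second threshold")
    if remaining_pct >= first_threshold:
        return VadMode.COMBINED
    if remaining_pct >= second_threshold:
        return VadMode.SOFTWARE
    return VadMode.HARDWARE
```

A device would call this on each battery-level update and reconfigure its current VAD mode only when the returned value changes.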
  • In another example, the use of the VAD modes is tied to the attributes of the user of the voice chip or device.
  • The user attributes corresponding to each VAD mode can be configured according to the policy configuration instruction.
  • During application, the attributes of the user currently using the voice chip or device are monitored, and the currently used VAD mode is configured according to them.
  • The user attributes are not limited; they can be any user-related information, such as gender, age, voiceprint features, volume, user level, or the distance between the user and the voice chip.
  • Users can be divided into categories according to one or more attributes. For example, users can be divided into men and women by gender, or into adults and non-adults (including the elderly and children) by age. On this basis, set user categories, such as women or non-adults, can use the software VAD mode or the combined software-and-hardware VAD mode, while users outside the set categories, such as men or adults, use the hardware VAD mode. The set user categories can be carried in the policy configuration information and provided to the voice chip or device, which parses them out, maps the software VAD mode or the combined VAD mode to the set user categories, and maps the hardware VAD mode to the other categories.
  • Different VAD modes can also be used for individual users. For example, set users (such as family member 1 and family member 2) use the software VAD mode or the combined software-and-hardware VAD mode, while other users (such as family member 3 and family member 4) use the hardware VAD mode.
  • The identification information of the set users can be carried in the policy configuration information and provided to the voice chip or device, which parses it out, maps the software VAD mode or the combined VAD mode to the set users, and maps the hardware VAD mode to all other users.
  • During application, the attributes of the user currently using the voice chip or device are monitored and used to determine whether that user belongs to a set user category or is a set user. If so, the currently used VAD mode is configured as the software VAD mode or the combined software-and-hardware VAD mode; otherwise, it is configured as the hardware VAD mode.
  • For example, voiceprint recognition can be performed on the collected user speech to decide from the voiceprint whether the user belongs to a set user category or is a set user; alternatively, this can be decided from the account the user logged in with.
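The set-category and set-user rules above reduce to a membership test. In this sketch the `UserVadPolicy` shape, the category strings, and the user IDs are all hypothetical; the user ID and categories are passed in directly, standing in for the output of voiceprint recognition or the login account, which is not modelled here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


@dataclass(frozen=True)
class UserVadPolicy:
    """Hypothetical shape of the user-related part of the policy."""
    set_categories: FrozenSet[str]              # e.g. {"female", "non-adult"}
    set_users: FrozenSet[str]                   # e.g. {"family_member_1"}
    preferred_mode: VadMode = VadMode.SOFTWARE  # or VadMode.COMBINED


def mode_for_user(policy: UserVadPolicy,
                  user_id: Optional[str],
                  categories: FrozenSet[str]) -> VadMode:
    """Pick the VAD mode for the current user.

    user_id and categories stand in for the result of voiceprint
    recognition or the login account; how they are obtained is not shown.
    """
    is_set_user = user_id is not None and user_id in policy.set_users
    in_set_category = bool(categories & policy.set_categories)
    if is_set_user or in_set_category:
        return policy.preferred_mode
    return VadMode.HARDWARE
```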
  • Users can also be divided into categories by volume. Volume thresholds are set for this purpose, and their number can be chosen as needed. For example, two volume thresholds are set, a first volume threshold and a second volume threshold, with the first smaller than the second.
  • For users whose volume is below the first volume threshold, the combined software-and-hardware VAD mode can be used. For users whose volume is greater than or equal to the first volume threshold but less than the second, the volume is relatively large, so the software VAD mode alone can be used to reduce VAD power consumption while keeping the voice detection results as accurate as possible. For users whose volume is greater than or equal to the second volume threshold, the volume is large, and the hardware VAD mode can be used to save further power while still ensuring accurate voice detection.
  • The first and second volume thresholds can be carried in the policy configuration instruction and provided to the voice chip or device, which parses them out and configures the combined software-and-hardware VAD mode for users whose volume is less than the first volume threshold, the software VAD mode for users whose volume is greater than or equal to the first volume threshold but less than the second, and the hardware VAD mode for users whose volume is greater than or equal to the second volume threshold.
  • During application, the volume of the user currently using the voice chip or device is monitored. If it is less than the first volume threshold, the currently used VAD mode is configured as the combined software-and-hardware VAD mode; if it is greater than or equal to the first volume threshold but less than the second, it is configured as the software VAD mode; and if it is greater than or equal to the second volume threshold, it is configured as the hardware VAD mode.
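The volume rule is the power rule turned around: quieter users get the more accurate (and more power-hungry) mode. A minimal sketch follows; the function name is hypothetical, and because the text gives no numeric volume thresholds or units, both thresholds are required arguments in whatever unit the caller measures volume in.

```python
from enum import Enum


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


def mode_for_volume(volume: float, first_threshold: float,
                    second_threshold: float) -> VadMode:
    """Map the current user's volume to a VAD mode.

    volume < first              -> combined software+hardware VAD
    first <= volume < second    -> software VAD
    volume >= second            -> hardware VAD
    The unit of volume (e.g. dB) is left to the caller.
    """
    if first_threshold >= second_threshold:
        raise ValueError("first volume threshold must be below the second")
    if volume < first_threshold:
        return VadMode.COMBINED
    if volume < second_threshold:
        return VadMode.SOFTWARE
    return VadMode.HARDWARE
```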
  • Distance ranges between the user and the voice chip can also be set.
  • A different VAD mode is used for each range. For example, if the user is close to the voice chip or device, within a first distance threshold, the hardware VAD mode can be used; if the user is somewhat farther away, beyond the first distance threshold but within a second distance threshold, the software VAD mode can be used; and if the user is far away, beyond the second distance threshold, the combined software-and-hardware VAD mode can be used.
  • The distance between the user and the voice chip or device can be estimated from collected information such as the amplitude and direction of the user's voice.
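Since the distance bands are ordered, a binary search over the thresholds picks the band directly. The sketch below assumes hypothetical default thresholds of 1 m and 3 m (the text gives no values) and takes the distance as already estimated; the amplitude- and direction-based estimation itself is not modelled.

```python
import bisect
from enum import Enum
from typing import Sequence


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


# Index i is the mode for the i-th distance band, nearest band first.
_MODES_BY_BAND = (VadMode.HARDWARE, VadMode.SOFTWARE, VadMode.COMBINED)


def mode_for_distance(distance_m: float,
                      thresholds: Sequence[float] = (1.0, 3.0)) -> VadMode:
    """thresholds = (first, second) distance thresholds in metres; the
    default values are hypothetical."""
    if len(thresholds) != 2 or thresholds[0] >= thresholds[1]:
        raise ValueError("expected two increasing distance thresholds")
    # bisect_right gives 0 below the first threshold, 1 from the first up to
    # (but not including) the second, and 2 at or beyond the second.
    return _MODES_BY_BAND[bisect.bisect_right(thresholds, distance_m)]
```

Keeping the band-to-mode table separate from the thresholds makes it easy to add a third threshold later without changing the lookup logic.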
  • In another example, the use of the VAD modes is tied to the state of an upper-layer application associated with the voice chip or device. The upper-layer application state corresponding to each VAD mode can be configured according to the policy configuration instruction. During application, the state of the associated upper-layer application is monitored, and the currently used VAD mode is configured according to it.
  • The upper-layer application associated with the voice chip or device is not limited.
  • For example, it may be any application that can use the voice function, such as a navigation application, a map application, or an instant messaging application.
  • The state of the upper-layer application can be divided into a running state and a non-running state.
  • While the upper-layer application is running, the software VAD mode or the combined software-and-hardware VAD mode can be used to detect voice more accurately; while it is not running, the hardware VAD mode can be used.
  • Information mapping the running state of the upper-layer application to the software VAD mode or the combined VAD mode, and the non-running state to the hardware VAD mode, can be carried in the policy configuration instruction and provided to the voice chip or device.
  • According to this information, the voice chip or device maps the software VAD mode or the combined VAD mode to the running state of the upper-layer application and maps the hardware VAD mode to its non-running state.
  • During application, the state of the upper-layer application associated with the voice chip or device is monitored. If the application is running, the currently used VAD mode is configured as the software VAD mode or the combined software-and-hardware VAD mode; if it is not running, the currently used VAD mode is configured as the hardware VAD mode.
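A sketch of the application-state rule, with hypothetical names throughout. The text speaks of a single associated upper-layer application; treating several applications as "running if any one of them runs" is an assumption made here for illustration.

```python
from enum import Enum
from typing import Mapping


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


class AppState(Enum):
    RUNNING = "running"
    NOT_RUNNING = "not_running"


def mode_for_app_states(states: Mapping[str, AppState],
                        running_mode: VadMode = VadMode.SOFTWARE) -> VadMode:
    """Assumption: if any associated upper-layer application is running,
    the more accurate mode is used; otherwise hardware VAD saves power."""
    if running_mode is VadMode.HARDWARE:
        raise ValueError("running_mode should be SOFTWARE or COMBINED")
    if any(state is AppState.RUNNING for state in states.values()):
        return running_mode
    return VadMode.HARDWARE
```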
  • In another example, an order of use and a duration of use can be preset for the various VAD modes; the modes are then used one after another in that order, each for its set duration. The order and durations can be configured according to the policy configuration instruction.
  • In this case, the policy configuration instruction includes the order and duration of use of the various VAD modes.
  • During application, when the duration of the previous VAD mode ends, the VAD mode currently used by the voice chip or device is configured as the next VAD mode, according to the order and durations configured in the VAD mode usage policy.
  • The order and duration of use of the various VAD modes are not limited.
  • For example, the order of use can be: the combined software-and-hardware VAD mode, the software VAD mode, then the hardware VAD mode; or the hardware VAD mode, the software VAD mode, then the combined VAD mode.
  • Because the hardware VAD mode consumes little power, its duration of use can be set longer, such as 10 hours; the software VAD mode and the combined VAD mode consume relatively more power, so their durations of use can be set shorter, such as 2 hours.
  • Alternatively, all three VAD modes can have the same duration of use.
  • Further, the order and/or duration of use of the VAD modes can be adjusted dynamically according to information about the environment of the voice chip or device. For example, if the voice chip or device is in a noisy environment for a long time, the durations of the software VAD mode and the combined VAD mode can be increased to keep the voice detection results as accurate as possible.
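The order-and-duration policy can be modelled as a list of (mode, duration) pairs. This sketch uses the example order and durations from the text (combined 2 h, software 2 h, hardware 10 h). Restarting the sequence from the first mode after the last one finishes is an assumption, as are the names `mode_at` and `DEFAULT_SCHEDULE`.

```python
from enum import Enum
from typing import Sequence, Tuple


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


Schedule = Sequence[Tuple[VadMode, float]]  # (mode, duration in hours)

DEFAULT_SCHEDULE: Schedule = (
    (VadMode.COMBINED, 2.0),   # example durations from the text
    (VadMode.SOFTWARE, 2.0),
    (VadMode.HARDWARE, 10.0),
)


def mode_at(elapsed_hours: float, schedule: Schedule = DEFAULT_SCHEDULE) -> VadMode:
    """Return the mode in effect after elapsed_hours, assuming the schedule
    repeats from the first mode once the last one has finished."""
    if not schedule or any(duration <= 0 for _, duration in schedule):
        raise ValueError("schedule needs at least one positive duration")
    cycle = sum(duration for _, duration in schedule)
    t = elapsed_hours % cycle
    for mode, duration in schedule:
        if t < duration:
            return mode
        t -= duration
    return schedule[-1][0]  # guards against floating-point rounding
```

The dynamic adjustment for noisy environments then amounts to passing a different schedule, for example one with longer software and combined durations.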
  • In another example, the use of the VAD modes is tied to information about the environment of the voice chip or device. The environmental information corresponding to each VAD mode can be configured according to the policy configuration instruction. During application, the current environmental information of the voice chip or device is monitored, and the currently used VAD mode is configured according to it.
  • Configuration personnel can obtain in advance information about the environment in which the voice chip or device will be located and carry it in the policy configuration instruction; that is, the policy configuration instruction includes that environmental information.
  • For example, configuration personnel can enter the environmental information on a web page or App page and then click a configuration control on the page to issue the policy configuration instruction.
  • The voice chip or device parses the environmental information out of the policy configuration instruction and, based on it, configures the environmental information corresponding to each VAD mode.
  • Alternatively, configuration personnel can instruct the voice chip or device to collect information about its environment itself.
  • The collection instruction can be carried in the policy configuration instruction; that is, the policy configuration instruction includes an instruction to collect environmental information.
  • For example, configuration personnel can enter the collection instruction on a web page or App page and then click a configuration control on the page to issue the policy configuration instruction.
  • The voice chip or device collects information about its environment as instructed and, based on it, configures the environmental information corresponding to each VAD mode.
  • The environmental information is not limited.
  • It may include, but is not limited to, at least one of: the location of the environment, the environment type, the environmental noise level, the environmental noise category, and time information.
  • The environmental noise level can be graded by sound decibels, for example into very noisy, moderately noisy, relatively quiet, and very quiet. Environment types can include home, office, entertainment venues, and public places. Environmental noise categories can include human noise and non-human noise, and non-human noise can be further divided into animal noise, construction noise, traffic noise, and so on. Time information can be divided into day and night, or into morning, afternoon, evening, and so on. These kinds of environmental information (location, type, noise level, noise category, and time) can be used alone or combined in any way. For example, one piece of environmental information may be: an office environment during the day, relatively quiet; another: an entertainment venue at night, very noisy; another: a home environment during the day; and so on.
  • Accordingly, the environmental information corresponding to each VAD mode can be configured according to at least one of the location, type, noise level, noise category, and time information of the environment of the voice chip or device.
  • That is, the environmental information corresponding to each VAD mode also includes at least one of the environment location, environment type, environmental noise level, environmental noise category, and time information.
  • Configuring the VAD mode currently used by the voice chip or device then means configuring it according to at least one of the location, type, noise level, noise category, and time information of the current environment of the voice chip or device.
  • The resulting VAD mode differs with the environmental information it is based on. Three scenarios are described below:
  • Scenario C1: the currently used VAD mode is configured according to the noise level of the environment of the voice chip or device. For example, if the noise level, known in advance or measured, is greater than a set noise threshold, the environment is relatively noisy and the hardware VAD mode has a higher probability of false triggering, so the currently used VAD mode can be configured as the software VAD mode or the combined software-and-hardware VAD mode, which helps improve the accuracy of subsequent voice input detection. If the noise level is less than or equal to the set noise threshold, the environment is relatively quiet, and the currently used VAD mode can be configured as the hardware VAD mode, which favors low power consumption of the voice chip or device.
  • Scenario C2: the currently used VAD mode is configured according to the time information of the environment of the voice chip or device. For example, time can be divided into two periods, day and night; compared with night, the environment during the day is relatively noisy and complex.
  • If the time information corresponds to the daytime period, that is, the voice chip or device is operating during the day, the currently used VAD mode can be configured as the software VAD mode or the combined software-and-hardware VAD mode.
  • If the time information corresponds to the night period, that is, the voice chip or device is operating at night, the currently used VAD mode is configured as the hardware VAD mode, which favors low power consumption of the voice chip or device.
  • Scenario C3: the currently used VAD mode is configured according to the noise category of the environment of the voice chip or device.
  • For example, environmental noise can be divided into human noise and non-human noise. If the noise category includes human noise, the currently used VAD mode can be configured as the software VAD mode or the combined software-and-hardware VAD mode to better distinguish human noise from valid voice signals, which helps improve the accuracy of subsequent voice input detection. If the noise category does not include human noise, the currently used VAD mode can be configured as the hardware VAD mode, which favors low power consumption of the voice chip or device.
  • In each scenario, one of the VAD modes supported by the voice chip or device is selected as the currently used VAD mode.
  • For example, suppose the voice chip or device supports the hardware VAD mode and the software VAD mode.
  • In scenario C1, if the noise level is greater than the set noise threshold, the currently used VAD mode is configured as the software VAD mode; if it is less than or equal to the set noise threshold, the currently used VAD mode is configured as the hardware VAD mode.
  • For another example, suppose the voice chip or device supports the hardware VAD mode, the software VAD mode, and the combined software-and-hardware VAD mode.
  • In scenario C2, if the time information corresponds to the daytime period, when there is more human activity, the currently used VAD mode is configured as the software VAD mode or the combined software-and-hardware VAD mode; if it corresponds to the night period, when there is less human activity, the currently used VAD mode is configured as the hardware VAD mode.
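Scenarios C1 to C3, together with the rule that only a supported mode may be chosen, can be sketched as "preference list, then first supported entry". The function names, the `"human"` category label, and the noise values in decibels are hypothetical, and so is the ordering that tries the combined mode before plain software VAD when accuracy is preferred.

```python
from enum import Enum
from typing import AbstractSet, Tuple


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


Preference = Tuple[VadMode, ...]
# Assumed ordering: prefer the combined mode, fall back to software VAD.
ACCURATE: Preference = (VadMode.COMBINED, VadMode.SOFTWARE)
LOW_POWER: Preference = (VadMode.HARDWARE,)


def preference_c1(noise_db: float, noise_threshold_db: float) -> Preference:
    """Scenario C1: noisier than the threshold -> accuracy first."""
    return ACCURATE if noise_db > noise_threshold_db else LOW_POWER


def preference_c2(is_daytime: bool) -> Preference:
    """Scenario C2: daytime -> accuracy first; night -> low power."""
    return ACCURATE if is_daytime else LOW_POWER


def preference_c3(noise_categories: AbstractSet[str]) -> Preference:
    """Scenario C3: human noise present -> accuracy first."""
    return ACCURATE if "human" in noise_categories else LOW_POWER


def select_supported(preference: Preference,
                     supported: AbstractSet[VadMode]) -> VadMode:
    """Return the first preferred mode the chip actually supports."""
    for mode in preference:
        if mode in supported:
            return mode
    raise ValueError("the chip supports none of the preferred VAD modes")
```

With a chip that supports only hardware and software VAD, a noisy C1 environment falls back to the software VAD mode, matching the first example above.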
  • The solution provided in the embodiments of this application can reduce the probability of false triggering of the voice chip or device and achieve low power consumption control of the voice chip or device.
  • Low power consumption control of a voice chip or device is only one application scenario of the technical solution provided by the embodiments of this application; the solution is not limited to it.
  • The technical solution can also be applied to voice endpoint detection and to operation control of various voice devices, as described in detail in the following embodiments.
  • FIG. 3 is a schematic flowchart of a voice endpoint detection method provided by an exemplary embodiment of this application. This method is suitable for voice chips or devices, which have hardware VAD and software VAD functions. For related descriptions of the voice chip or device, the hardware VAD function and the software VAD function, refer to the foregoing embodiment, and will not be repeated here. As shown in Figure 3, the method includes:
  • The VAD mode currently used by the voice chip or device is used to perform VAD processing on the sound signal, where the currently used VAD mode is one of the multiple VAD modes produced by combining hardware VAD and software VAD.
  • The application scenarios of the voice chip or device are not limited; the voice chip or device can be applied in various scenarios.
  • For example, the voice chip or device can be built into smart home devices such as sweeping robots, air conditioners, and televisions, so that these devices can be controlled by voice.
  • It can also be applied to in-vehicle navigation equipment for voice navigation while driving, such as planning a route or querying the vehicle speed.
  • it can also be applied to automatic ticket vending machines, public lighting systems, and so on.
  • After a voice signal is detected, it can be processed further.
  • For example, the further processing may be local speech-to-text conversion to recognize the operation instruction carried by the voice signal, or sending the voice signal to the cloud, where the operation instruction it carries is recognized.
  • The process of identifying whether a voice signal is input is the voice endpoint detection process.
  • The VAD mode currently used by the voice chip or device is used to detect whether the sound signal contains a voice signal.
  • The method shown in FIG. 2 can be used to pre-configure the VAD mode usage policy that the voice chip or device needs for using the various VAD modes; in actual application, the currently used VAD mode can then be configured flexibly according to that policy in combination with information such as the environment of the voice chip or device, the related user attributes, the upper-layer application state, and the remaining power.
  • For the specific configuration process, refer to the foregoing embodiments; it is not repeated here.
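The text does not specify the hardware or software VAD algorithms, nor how the two are combined. The sketch below uses common textbook stand-ins, all of them assumptions: a frame-energy gate plays the role of hardware VAD, an energy plus zero-crossing-rate test plays the role of software VAD, and the combined mode runs the hardware gate first and lets software VAD confirm. The endpoints are taken as the first and last frames flagged as speech.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMBINED = "software+hardware"


Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def zero_crossing_rate(frame: Frame) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def hardware_vad(frame: Frame, energy_threshold: float = 0.01) -> bool:
    # Stand-in for the chip's hardware VAD: a plain energy gate.
    return frame_energy(frame) > energy_threshold


def software_vad(frame: Frame, energy_threshold: float = 0.01,
                 max_zcr: float = 0.5) -> bool:
    # Stand-in for software VAD: energy plus a zero-crossing-rate check,
    # since voiced speech usually crosses zero less often than hiss-like noise.
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < max_zcr)


def detect(frames: Sequence[Frame], mode: VadMode) -> List[bool]:
    """Per-frame speech decisions under the currently used VAD mode."""
    decisions = []
    for frame in frames:
        if mode is VadMode.HARDWARE:
            decisions.append(hardware_vad(frame))
        elif mode is VadMode.SOFTWARE:
            decisions.append(software_vad(frame))
        else:  # COMBINED: hardware gate first, software confirms
            decisions.append(hardware_vad(frame) and software_vad(frame))
    return decisions


def speech_endpoints(decisions: Sequence[bool]) -> Optional[Tuple[int, int]]:
    """Indices of the first and last speech frames, or None if no speech."""
    hits = [i for i, is_speech in enumerate(decisions) if is_speech]
    return (hits[0], hits[-1]) if hits else None
```

The short-circuiting `and` in the combined branch means the software detector runs only on frames the hardware gate already passed, which is one plausible way the combination could save power.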
  • For example, smart home devices such as sweeping robots, air conditioners, and televisions have built-in voice chips, which are pre-configured to use the software VAD mode during the day and the hardware VAD mode at night.
  • For flexibility of operation, the sweeping robot is battery-powered so that it is not restricted by the location of household power sockets or by a charging cable. To save battery power, the sweeping robot works in a low power consumption mode when there is no voice signal input.
  • During the day, the user issues a cleaning instruction to the sweeping robot by voice, such as "clean the living room". The built-in voice chip of the sweeping robot collects sound signals from the surrounding environment and uses the software VAD mode to detect whether they contain a voice signal. When a voice signal is detected, the sweeping robot switches from the low power consumption mode to the normal working mode, and the voice chip continues to recognize the voice signal to determine whether it contains set instruction words, such as clean + living room, clean + kitchen, or clean. In this embodiment, the voice chip recognizes the instruction words clean + living room and reports the recognition result to the processor of the sweeping robot, which controls the robot to clean the living room according to the recognized instruction words.
  • If the user issues the voice instruction at night, the built-in voice chip of the sweeping robot collects sound signals from the surrounding environment and uses the hardware VAD mode to detect whether they contain a voice signal; when a voice signal is detected, the sweeping robot likewise switches from the low power consumption mode to the normal working mode.
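The sweeping-robot walkthrough combines a day/night VAD policy, a wake-on-voice transition, and instruction-word matching. In this sketch the 07:00 to 19:00 day window, the task strings, and all names are hypothetical; the VAD decision and the recognized text are passed in rather than computed, so only the control flow described above is shown.

```python
from enum import Enum
from typing import Optional


class VadMode(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"


class PowerState(Enum):
    LOW_POWER = "low_power"
    NORMAL = "normal"


# Hypothetical day window (07:00-19:00); the text only says "day" and "night".
DAY_START_HOUR, NIGHT_START_HOUR = 7, 19
ROOMS = ("living room", "kitchen")  # example instruction words from the text


def vad_mode_for_hour(hour: int) -> VadMode:
    """Pre-configured policy: software VAD by day, hardware VAD at night."""
    if DAY_START_HOUR <= hour < NIGHT_START_HOUR:
        return VadMode.SOFTWARE
    return VadMode.HARDWARE


class SweepingRobot:
    def __init__(self) -> None:
        self.state = PowerState.LOW_POWER

    def on_sound(self, contains_voice: bool, recognized_text: str) -> Optional[str]:
        """contains_voice is the VAD result for the collected sound;
        recognized_text stands in for the voice chip's recognizer output."""
        if not contains_voice:
            return None  # stay in low-power mode
        self.state = PowerState.NORMAL  # wake up on detected speech
        text = recognized_text.lower()
        if "clean" not in text:
            return None  # no set instruction word recognized
        for room in ROOMS:
            if room in text:
                return f"clean:{room}"  # task reported to the processor
        return "clean:all"
```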
  • Air conditioners are relatively power-consuming equipment. They are generally installed near a power socket and powered by AC power.
  • a voice chip is built in the air conditioner, and the user can control the air conditioner by voice.
  • For example, at night the user issues a cooling command to the air conditioner by voice, such as "Turn on the air conditioner, temperature 27°C". The voice chip built into the air conditioner collects the sound signals in the surrounding environment and uses the hardware VAD mode to detect whether they contain a voice signal. When a voice signal is detected, the voice chip continues to recognize it to determine whether it contains set instruction words, such as turn on + temperature value, turn off, or turn on + working mode name.
  • In this embodiment, the voice chip recognizes that the voice signal contains the instruction words turn on + 27°C and reports the recognition result to the processor of the air conditioner.
  • The processor of the air conditioner controls the air conditioner to cool according to the instruction words recognized by the voice chip.
  • The cooling system starts working, with the cooling temperature set to 27°C.
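The air-conditioner instruction words (turn on + temperature value, turn off, turn on + working mode name) can be matched against recognized text with a small parser. This is a sketch under stated assumptions: the input is already-recognized English text, the mode names are hypothetical placeholders, and the `(action, argument)` return shape is invented for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical working-mode names; the text only says "working mode name".
MODE_NAMES = ("cooling", "heating", "dehumidifying")
_TEMPERATURE = re.compile(r"(\d{1,2})\s*°?\s*C\b", re.IGNORECASE)


def parse_ac_command(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map recognized text to (action, argument) following the instruction
    word patterns 'turn on + temperature', 'turn off', 'turn on + mode'."""
    lowered = text.lower()
    if "turn off" in lowered:
        return ("off", None)
    if "turn on" not in lowered:
        return None  # no set instruction word
    match = _TEMPERATURE.search(text)
    if match:
        return ("set_temperature", match.group(1))
    for name in MODE_NAMES:
        if name in lowered:
            return ("set_mode", name)
    return ("on", None)
```

For the example utterance in the text, the parser yields `("set_temperature", "27")`, which the processor would translate into starting the cooling system at 27°C.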
  • the execution subject of each step of the method provided in the foregoing embodiment may be the same device, or different devices may also be the execution subject of the method.
  • for example, the execution subject of steps 11 to 13 may be device A; or the execution subject of steps 11 and 12 may be device A while that of step 13 is device B; and so on.
  • Fig. 4a is a schematic structural diagram of a voice chip provided by an exemplary embodiment of this application.
  • the voice chip has both hardware VAD function and software VAD function.
  • the combination of the hardware VAD function and the software VAD function can produce multiple VAD modes.
  • the voice chip of this embodiment supports multiple VAD modes, and the VAD mode it uses can be configured flexibly according to the application scenario in which the voice chip is or will be located, time information, and/or user preferences.
  • the voice chip 40 includes: a sound pickup module 41, a hardware VAD module 42, a processor 43, and a memory 44.
  • the application form of the voice chip is not limited.
  • voice chips can be applied to low-power or energy-saving devices.
  • Low-power or energy-saving devices can include, but are not limited to: battery-powered remote controls, story machines, smart speakers, tablets, smartphones, smart alarm clocks, smart bracelets, smart switches, smart robots, unmanned delivery vehicles, self-service express cabinets, self-service terminals, and the like.
  • the voice chip can also be implemented as an independent voice device.
  • the sound pickup module 41 is used to collect sound signals input to the voice chip 40.
  • the sound pickup module 41 may be a microphone.
  • the sound signal input to the voice chip 40 is not limited.
  • the sound signal input to the voice chip 40 may include, but is not limited to: a voice signal, human noise, environmental noise, and so on.
  • the hardware VAD module 42 is used to detect whether the sound signal contains a voice signal in a hardware manner when the currently used VAD mode indicates that the hardware VAD function of the voice chip 40 is enabled.
  • the currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function; for example, it can be the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode.
  • the memory 44 stores a VAD program and a power consumption control program.
  • the processor 43 is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
  • the voice chip 40 of this embodiment supports a low-power scheme: when there is no voice signal, the voice chip 40 stays in the low power consumption mode, and it enters the normal working mode only when a voice signal appears.
  • the processor 43 is also used to execute the power consumption control program, to control the voice chip to enter the normal working mode from the low power consumption mode when the currently used VAD mode detects that the sound signal contains a voice signal.
  • the processor 43 is further configured to control the voice chip to remain in the low power consumption mode when the currently used VAD mode detects that the sound signal does not contain a voice signal.
  • depending on the currently used VAD mode, the sound pickup module 41 sends its signal to different destinations.
  • the sound pickup module 41 may specifically send the collected sound signals to the hardware VAD module and the processor 43 respectively.
  • the sound pickup module 41 may specifically send the collected sound signal to the hardware VAD module.
  • because the processor 43 further processes the voice signal or the sound signal, the sound pickup module 41 may also send the collected sound signal to the processor 43.
  • the sound pickup module 41 may specifically send the collected sound signal to the processor 43.
  • the voice chip 40 further includes: a switching module 45; the switching module 45 is connected between the sound pickup module 41, the hardware VAD module 42 and the processor 43.
  • the switching module 45 is specifically configured to switch the sound pickup module 41 to be connected to the hardware VAD module 42 and/or the processor 43 according to the currently used VAD mode.
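  • The routing performed by the switching module 45 can be pictured as a mapping from the currently used VAD mode to the modules that receive the pickup output. The sketch below is a software model only, under stated assumptions: the mode labels `"hardware"`, `"software"` and `"combined"`, the destination names, and the `forward_to_processor` flag (modeling the optional extra path to the processor 43 described above) are hypothetical.

```python
from typing import FrozenSet

def route_targets(mode: str, forward_to_processor: bool = False) -> FrozenSet[str]:
    """Return the modules the sound pickup output is switched to.

    mode: "hardware", "software" or "combined" (hypothetical labels).
    forward_to_processor: in hardware mode, also send the raw sound signal
        to the processor, e.g. when it must further process that signal.
    """
    if mode == "hardware":
        targets = {"hardware_vad"}
        if forward_to_processor:
            targets.add("processor")
    elif mode == "software":
        targets = {"processor"}
    elif mode == "combined":
        # The processor needs the signal too, to re-check it in software.
        targets = {"hardware_vad", "processor"}
    else:
        raise ValueError(f"unknown VAD mode: {mode}")
    return frozenset(targets)
```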
  • when the hardware VAD module 42 receives the sound signal sent by the sound pickup module 41, it detects in hardware whether the sound signal contains a voice signal and reports the detection result to the processor 43.
  • the actions of the processor 43 are different depending on the currently used VAD mode.
  • the processor 43 is specifically configured to: when the currently used VAD mode is the combined software-and-hardware VAD mode, if the detection result reported by the hardware VAD module 42 shows that the sound signal contains a voice signal, detect again in software whether the sound signal contains a voice signal, and control the voice chip 40 to enter the normal working mode from the low power consumption mode when the software detection also finds a voice signal.
  • the processor 43 is specifically configured to: when the currently used VAD mode is the hardware VAD mode, if the detection result reported by the hardware VAD module 42 shows that the sound signal contains a voice signal, control the voice chip 40 to enter the normal working mode from the low power consumption mode.
  • the processor 43 is specifically configured to: when the currently used VAD mode is the software VAD mode, detect in software whether the sound signal contains a voice signal, and when a voice signal is detected, control the voice chip 40 to enter the normal working mode from the low power consumption mode.
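  • The three per-mode behaviors of the processor 43 can be condensed into one decision function. This is a minimal sketch, not the chip firmware: the mode labels, the function name `handle_frame`, and the `software_vad` callback (standing in for the stored VAD program) are hypothetical. In the combined mode the software check runs only after a hardware hit, so the more expensive software VAD is skipped while the hardware VAD reports silence.

```python
from typing import Callable, Optional

def handle_frame(mode: str, hw_detected: Optional[bool],
                 software_vad: Callable[[], bool]) -> str:
    """Decide the next power mode of the voice chip.

    mode: "hardware", "software" or "combined" (hypothetical labels).
    hw_detected: result reported by the hardware VAD module,
        or None when the hardware VAD function is not enabled.
    software_vad: runs the software VAD on the buffered sound signal.
    Returns "normal" to wake the chip, "low_power" to stay asleep.
    """
    if mode == "hardware":
        speech = bool(hw_detected)
    elif mode == "software":
        speech = software_vad()
    elif mode == "combined":
        # Short-circuit: software VAD runs only as a second check after a hardware hit.
        speech = bool(hw_detected) and software_vad()
    else:
        raise ValueError(f"unknown VAD mode: {mode}")
    return "normal" if speech else "low_power"
```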
  • multiple VAD modes supported by the voice chip 40 can be pre-configured.
  • for example, the voice chip 40 may support the hardware VAD mode and the software VAD mode; or the hardware VAD mode and the combined software-and-hardware VAD mode; or the software VAD mode and the combined mode; or all three: the hardware VAD mode, the software VAD mode, and the combined mode.
  • the VAD mode currently used by the voice chip 40 can also be configured flexibly. To this end, a VAD mode usage policy, which the voice chip 40 follows when using the various VAD modes, can be pre-configured; based on this, the processor 43 is further configured to configure the VAD mode currently used by the voice chip according to the pre-configured VAD mode usage policy.
  • when the processor 43 pre-configures the VAD mode usage policy, it is specifically configured to: during initialization of the voice chip, configure, according to a policy configuration instruction, the VAD mode usage policy the voice chip 40 needs when using the various VAD modes; or, during factory configuration of the voice chip, configure that policy according to a policy configuration instruction; or, while the voice chip is in use, reconfigure that policy according to a policy configuration instruction.
  • the processor 43 is specifically configured to perform at least one of the following operations when configuring the VAD mode usage policy according to the policy configuration instruction:
  • configuring, according to the policy configuration instruction, the environmental information corresponding to each VAD mode used by the voice chip;
  • configuring, according to the policy configuration instruction, the user attributes corresponding to each VAD mode used by the voice chip;
  • configuring, according to the policy configuration instruction, the upper-layer application state corresponding to each VAD mode used by the voice chip;
  • configuring identifiers for the various VAD modes so that the user can specify the VAD mode used by the voice chip through a VAD configuration instruction.
  • when the processor 43 configures the environmental information corresponding to each VAD mode, it is specifically configured to: obtain, according to the policy configuration instruction, the environmental information of where the voice chip needs to be located; and, based on that information, configure the environmental information corresponding to each VAD mode used by the voice chip.
  • for example, configuration personnel configure, through an App page or Web page on a terminal, the environmental information of where the voice chip needs to be located, and click the configuration control on the page to issue a policy configuration instruction to the voice chip 40.
  • the policy configuration instruction carries information about the environment where the voice chip needs to be located.
  • the communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43; the processor 43 parses from it the environmental information of where the voice chip needs to be located and, based on that information, configures the environmental information corresponding to each VAD mode used by the voice chip.
  • the configuration personnel configure the remaining power range corresponding to various VAD modes through the App page or Web page on the terminal, and click the configuration control on the page to issue a policy configuration instruction to the voice chip 40.
  • the policy configuration instruction carries the remaining power range corresponding to each VAD mode.
  • the communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43; the processor 43 parses from it the remaining power range corresponding to each VAD mode and configures the remaining power range in which the voice chip uses each VAD mode.
  • the configuration personnel configures the correspondence between various VAD modes and the upper-layer application state through the App page or Web page on the terminal, and clicks the configuration control on the page to issue a policy configuration instruction to the voice chip 40.
  • the policy configuration instruction carries the correspondence between various VAD modes and upper-layer application states.
  • the communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43; the processor 43 parses from it the correspondence between the VAD modes and upper-layer application states, and configures the upper-layer application state corresponding to each VAD mode used by the voice chip.
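  • The text describes the policy configuration instruction only as something that carries environmental information, remaining power ranges, or mode/application-state correspondences; no wire format is given. Purely as an illustration, the sketch below assumes a JSON payload with hypothetical keys `power_ranges` and `app_state_modes`, and merges it into a hypothetical in-memory `VadPolicy` object held by the processor.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class VadPolicy:
    # Remaining-power range (low, high), in percent, for each VAD mode.
    power_ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Upper-layer application state (e.g. "running") mapped to a VAD mode.
    app_state_modes: Dict[str, str] = field(default_factory=dict)

def apply_policy_instruction(policy: VadPolicy, payload: str) -> VadPolicy:
    """Parse a (hypothetical) JSON policy configuration instruction."""
    msg = json.loads(payload)
    for mode, (low, high) in msg.get("power_ranges", {}).items():
        policy.power_ranges[mode] = (float(low), float(high))
    policy.app_state_modes.update(msg.get("app_state_modes", {}))
    return policy
```

Keys absent from a given instruction leave the corresponding part of the policy unchanged, which mirrors reconfiguring only part of the policy while the chip is in use.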
  • the processor 43 is specifically configured to perform at least one of the following operations when configuring the VAD mode currently used by the voice chip according to a pre-configured VAD mode use policy:
  • according to the use order and use duration of the multiple VAD modes configured in the VAD mode usage policy, configuring the VAD mode currently used by the voice chip when the use duration of the previous VAD mode ends;
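  • The use-order-and-duration rule can be sketched as a schedule lookup: each VAD mode is used for its configured duration, and when that duration ends the next mode in the order becomes current. The sketch assumes the schedule repeats cyclically after the last mode, which the text does not state; `mode_at` and the `(mode, seconds)` schedule format are hypothetical.

```python
from typing import List, Tuple

def mode_at(schedule: List[Tuple[str, float]], elapsed_s: float) -> str:
    """Return the VAD mode in use after elapsed_s seconds.

    schedule: (mode, use_duration_s) pairs in use order; the cycle repeats.
    """
    cycle = sum(duration for _, duration in schedule)
    if cycle <= 0:
        raise ValueError("schedule must have a positive total duration")
    t = elapsed_s % cycle
    for mode, duration in schedule:
        if t < duration:
            return mode
        t -= duration
    return schedule[-1][0]  # guards against float rounding at the cycle boundary
```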
  • the content of the environmental information where the voice chip is located is not limited.
  • the environmental information may include, but is not limited to: the location and type of the environment where the voice chip is located, environmental noisiness, the category of environmental noise, time information, and so on.
  • the above environmental information, such as environment location, environment type, environmental noisiness, environmental noise category, and time information, can be used alone or in any combination.
  • the processor 43 is specifically configured to configure the VAD mode currently used by the voice chip according to at least one type of information in the voice chip's current environmental information; different environmental information leads to a different configured VAD mode.
  • the processor 43 is specifically configured to configure the VAD mode currently used by the voice chip according to the noisiness of the environment where the voice chip is currently located. For example, if the environmental noisiness is greater than a set noisiness threshold, the VAD mode currently used by the voice chip is configured as the software VAD mode or the combined software-and-hardware VAD mode; otherwise, it is configured as the hardware VAD mode.
  • the processor 43 is specifically configured to configure the VAD mode currently used by the voice chip according to time information corresponding to the environment where the voice chip is currently located.
  • the time can be divided into two time periods, day and night.
  • if the time information corresponds to the daytime period, that is, the voice chip is in the daytime period, the VAD mode currently used by the voice chip is configured as the software VAD mode or the combined software-and-hardware VAD mode.
  • if the time information corresponds to the night period, that is, the voice chip is in the night period, the VAD mode currently used by the voice chip is configured as the hardware VAD mode.
  • the processor 43 is specifically configured to configure the VAD mode currently used by the voice chip according to the environmental noise category corresponding to the environment where the voice chip is currently located.
  • for example, environmental noise can be divided into two categories: human noise and non-human noise. If the environmental noise categories include human noise, the VAD mode currently used by the voice chip is configured as the software VAD mode or the combined software-and-hardware VAD mode; if they do not include human noise, it is configured as the hardware VAD mode.
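  • The three environment-based rules above (noisiness, time period, noise category) each pick either the hardware VAD mode or "the software VAD mode or the combined mode". A minimal sketch follows; since the text leaves open which of the latter two is chosen, the hypothetical `sw_mode` parameter selects it, and the 07:00 to 22:00 daytime window, the function names, and the category label `"human"` are likewise assumptions.

```python
from datetime import time
from typing import Set

DAY_START, DAY_END = time(7, 0), time(22, 0)  # hypothetical day/night boundary

def mode_from_noisiness(noisiness: float, threshold: float,
                        sw_mode: str = "combined") -> str:
    # Noisiness above the threshold: software or combined VAD; otherwise hardware.
    return sw_mode if noisiness > threshold else "hardware"

def mode_from_time(now: time, sw_mode: str = "combined") -> str:
    # Daytime period: software or combined VAD; night period: hardware VAD.
    return sw_mode if DAY_START <= now < DAY_END else "hardware"

def mode_from_noise_categories(categories: Set[str],
                               sw_mode: str = "combined") -> str:
    # Human noise present: software or combined VAD; otherwise hardware.
    return sw_mode if "human" in categories else "hardware"
```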
  • when the processor 43 configures the VAD mode currently used by the voice chip according to the voice chip's current remaining power, it is specifically configured to: if the current remaining power is greater than or equal to a first power threshold, configure the combined software-and-hardware VAD mode; if it is greater than or equal to a second power threshold but less than the first power threshold, configure the software VAD mode; if it is less than the second power threshold, configure the hardware VAD mode; where the first power threshold is greater than the second power threshold.
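  • The remaining-power rule partitions the power axis into three bands. A minimal sketch, assuming power is expressed as a percentage; the function name `mode_from_power` and the mode labels are hypothetical.

```python
def mode_from_power(remaining: float, first: float, second: float) -> str:
    """Select the VAD mode from the remaining power; requires first > second."""
    if first <= second:
        raise ValueError("first power threshold must exceed the second")
    if remaining >= first:
        return "combined"   # ample power: combined software-and-hardware VAD
    if remaining >= second:
        return "software"   # second <= remaining < first
    return "hardware"       # remaining < second: hardware VAD only
```

With thresholds of 50% and 20%, for example, a chip at 30% remaining power would use the software VAD mode.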
  • when the processor 43 configures the VAD mode currently used by the voice chip according to the attributes of the user currently using it, it is specifically configured to: if that user belongs to a preset user category or is a preset user, configure the software VAD mode or the combined software-and-hardware VAD mode; if that user belongs to neither, configure the hardware VAD mode.
  • alternatively, when the processor 43 configures the VAD mode currently used by the voice chip according to the attributes of the user currently using it, it is specifically configured to: if that user's volume is less than a set first volume threshold, configure the combined software-and-hardware VAD mode; if the volume is greater than or equal to the first volume threshold but less than a second volume threshold, configure the software VAD mode; if the volume is greater than the second volume threshold, configure the hardware VAD mode; where the first volume threshold is less than the second volume threshold.
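  • The volume rule mirrors the power rule but in the opposite direction: quieter speech gets the combined mode. A minimal sketch; the function name, mode labels, and volume units are hypothetical, and because the text does not say what happens when the volume exactly equals the second threshold, the sketch assumes the hardware VAD mode in that case.

```python
def mode_from_volume(volume: float, first: float, second: float) -> str:
    """Select the VAD mode from the current user's volume; requires first < second."""
    if first >= second:
        raise ValueError("first volume threshold must be below the second")
    if volume < first:
        return "combined"   # below the first threshold
    if volume < second:
        return "software"   # first <= volume < second
    return "hardware"       # volume >= second (equality case is an assumption)
```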
  • when the processor 43 configures the VAD mode currently used by the voice chip according to the state of the upper-layer application associated with the voice chip, it is specifically configured to: if that application is running, configure the software VAD mode or the combined software-and-hardware VAD mode; if it is not running, configure the hardware VAD mode.
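  • The user-category rule and the upper-layer application rule are both binary choices. A minimal sketch of each; the function names, the user and category identifiers, and the `sw_mode` parameter (choosing between the software and combined modes, which the text leaves open) are hypothetical.

```python
from typing import Set

def mode_from_user(user: str, category: str, preset_users: Set[str],
                   preset_categories: Set[str], sw_mode: str = "combined") -> str:
    # A preset user, or a user in a preset category, gets software or combined VAD.
    if user in preset_users or category in preset_categories:
        return sw_mode
    return "hardware"

def mode_from_app_state(app_running: bool, sw_mode: str = "combined") -> str:
    # Upper-layer application running: software or combined VAD; otherwise hardware.
    return sw_mode if app_running else "hardware"
```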
  • the processor 43 is further configured to dynamically adjust the use order and/or use duration of various VAD modes according to the environmental information in which the voice chip is located.
  • when the processor 43 configures the VAD mode currently used by the voice chip according to the VAD mode identifier carried in a VAD configuration instruction, it is specifically configured to: parse the VAD mode identifier from the VAD configuration instruction, where the identifier identifies the VAD mode specified by the user and the specified mode is the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode; and configure the VAD mode currently used by the voice chip as the specified VAD mode.
  • the voice chip 40 may further include: a communication module 46, an analog-to-digital conversion (A/D) module 47, a digital-to-analog conversion (D/A) module 48, an audio output component 49 (such as a speaker), a power component 491, and so on.
  • the A/D module 47 is connected between the sound pickup module 41 and the hardware VAD module 42 for analog-to-digital conversion of the sound signal collected by the sound pickup module 41 and then sent to the hardware VAD module 42.
  • the positions of the A/D module 47 and the switching module 45 are not limited. For example, the A/D module 47 may be located in front of the switching module 45, as shown in the corresponding figure, or the switching module 45 may be located in front of the A/D module 47.
  • the D/A module 48 is connected between the audio output component 49 and the processor 43, and is used to convert the digital signal output by the processor 43 into an analog signal and send it to the audio output component 49 for output.
  • the voice chip has the VAD function.
  • the VAD function can be used to detect whether there is a voice signal input. When the voice signal is detected, the voice chip enters the normal working mode from the low power consumption mode, which can save the power of the voice chip.
  • the voice chip has both the hardware VAD function and the software VAD function, whose combination yields multiple VAD modes. Flexibly configuring the VAD mode used by the voice chip can improve the accuracy of voice input detection to a certain extent, reduce the probability of false triggers, and improve the low-power performance of the voice chip.
  • FIG. 5 is a schematic structural diagram of still another voice chip provided by an exemplary embodiment of this application.
  • the voice chip has both hardware VAD function and software VAD function.
  • the hardware VAD function and the software VAD function refer to the foregoing embodiment, and will not be repeated here.
  • the combination of the hardware VAD function and the software VAD function can produce multiple VAD modes.
  • the voice chip of this embodiment supports multiple VAD modes, and the VAD mode it uses can be configured flexibly according to the application scenario in which the voice chip or device is or will be located, time information, and/or user preferences. As shown in FIG. 5,
  • the voice chip 50 includes: a sound pickup module 51, a hardware VAD module 52, a main processor 53, a coprocessor 56 and a memory 54.
  • the coprocessor 56 mainly assists the main processor 53 with processing tasks that the main processor cannot execute, or executes with low efficiency and poor results.
  • the coprocessor 56 is mainly used to replace the main processor 53 to complete some work when the main processor 53 is in the low power consumption mode, and is responsible for waking up the main processor 53.
  • the application form of the voice chip is not limited.
  • the voice chip can be applied to low-power or energy-saving devices.
  • Low-power or energy-saving devices can include, but are not limited to: battery-powered remote controls, story machines, smart speakers, tablets, smart phones, sweeping robots, etc.
  • the voice chip can also be implemented as an independent voice device or an independent application.
  • the sound pickup module 51 is used to collect sound signals input to the voice chip 50.
  • the sound pickup module 51 may be a microphone.
  • the sound signal input to the voice chip 50 is not limited.
  • the sound signal input to the voice chip 50 may include, but is not limited to: a voice signal, human noise, environmental noise, and so on.
  • the hardware VAD module 52 is used to detect whether the sound signal contains a voice signal in a hardware manner when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
  • the currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function; for example, it can be the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode.
  • the memory 54 stores a VAD program and a power consumption control program.
  • the coprocessor 56 is used to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
  • the voice chip 50 of this embodiment supports a low-power scheme: when there is no voice signal, the voice chip 50 is in the low power consumption mode and the main processor 53 does not work.
  • only when a voice signal appears does the main processor 53 enter the normal working mode.
  • the coprocessor 56 is also used to execute the power consumption control program, to control the main processor 53 to enter the normal working mode from the low power consumption mode when the currently used VAD mode detects that the sound signal contains a voice signal.
  • the main processor 53 can wake up other hardware modules or functions in the voice chip 50.
  • the coprocessor 56 is also used to control the voice chip 50 to remain in the low power consumption mode when the currently used VAD mode detects that the sound signal does not contain a voice signal.
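  • The division of labor between the coprocessor 56 and the main processor 53 can be modeled as a small state machine: the coprocessor stays awake running VAD and wakes the main processor only on a detection. This is a toy sketch, not the chip's power-management interface; the class `MainProcessor`, its `wake` method, and `coprocessor_step` are hypothetical.

```python
class MainProcessor:
    """Toy model of the main processor's power state."""

    def __init__(self) -> None:
        self.mode = "low_power"
        self.wake_count = 0

    def wake(self) -> None:
        # Enter the normal working mode from the low power consumption mode.
        if self.mode == "low_power":
            self.mode = "normal"
            self.wake_count += 1

def coprocessor_step(main: MainProcessor, speech_detected: bool) -> str:
    """One decision by the coprocessor after the current VAD mode has run."""
    if speech_detected:
        main.wake()
    # Without speech, the main processor's state is left unchanged,
    # so a sleeping main processor stays in the low power consumption mode.
    return main.mode
```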
  • the voice chip 50 includes a main processor 53 and a coprocessor 56.
  • the function of the coprocessor 56 is similar to that of the processor 43 in the embodiment shown in FIG. 4a to FIG. 4c, and will not be repeated here. Please refer to the embodiment shown in FIG. 4a to FIG. 4c.
  • the functions of the modules other than the main processor 53 and the coprocessor 56 are similar to those of the corresponding modules in the embodiment shown in FIGS. 4a to 4c and are not repeated here; please refer to that embodiment.
  • Fig. 6 is a schematic structural diagram of a smart terminal provided by an exemplary embodiment of this application.
  • the smart terminal 60 includes a voice chip 65, and the voice chip 65 includes: a sound pickup module 61, a hardware VAD module 62, a processor 63 and a memory 64.
  • the memory 64 stores a VAD program and a power consumption control program.
  • the sound pickup module 61 is used to collect sound signals input to the voice chip.
  • the hardware VAD module 62 is used to detect whether the sound signal contains a voice signal in a hardware manner when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
  • the processor 63 is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. Further, the processor 63 is also used to execute the power consumption control program, to control the smart terminal to enter the normal working mode from the low power consumption mode when the currently used VAD mode detects a voice signal.
  • the currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function; for example, it can be the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode.
  • the difference between the smart terminal shown in Fig. 6 and the voice chip shown in Figs. 4a to 4c is that the product form is different.
  • the smart terminal shown in Figure 6 is a device, such as a remote control, story machine, smart speaker, tablet computer, smartphone, smart alarm clock, smart bracelet, smart switch, smart robot, unmanned delivery vehicle, self-service courier cabinet, self-service terminal, or other battery-powered electronic device; it can also be an air conditioner, refrigerator, TV, or other device that does not rely on battery power.
  • the smart terminal further includes: a communication component 66, a power supply component 68, a display 67 and other components. Among them, the components in the dashed box are optional components, not mandatory components.
  • Fig. 7 is a schematic structural diagram of another smart terminal provided by an exemplary embodiment of this application.
  • the smart terminal 70 includes a voice chip 75 and a main processor 73; the voice chip 75 includes a sound pickup module 71, a hardware VAD module 72, a coprocessor 76, and a memory 74; the memory 74 stores a VAD program and a power consumption control program.
  • the sound pickup module 71 is used to collect sound signals input to the voice chip.
  • the hardware VAD module 72 is used to detect whether the sound signal contains a voice signal in a hardware manner when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled.
  • the coprocessor 76 is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. Further, the coprocessor 76 is also used to execute the power consumption control program, to control the main processor 73 to enter the normal working mode from the low power consumption mode when the currently used VAD mode detects that the sound signal contains a voice signal.
  • the currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function; for example, it can be the hardware VAD mode, the software VAD mode, or the combined software-and-hardware VAD mode.
  • the difference between the smart terminal shown in Fig. 7 and the voice chip shown in Fig. 5 is that the product form is different.
  • the smart terminal shown in Figure 7 is a device, such as a remote control, story machine, smart speaker, tablet computer, smartphone, smart alarm clock, smart bracelet, smart switch, smart robot, unmanned delivery vehicle, self-service courier cabinet, self-service terminal, or other battery-powered electronic device; it can also be an air conditioner, refrigerator, TV, or other device that does not rely on battery power.
  • the smart terminal further includes: a communication component 77, a power supply component 79, a display 78 and other components. Among them, the components in the dashed box are optional components, not mandatory components.
  • an embodiment of the present application also provides a self-service terminal, including a voice chip and a main processor. The voice chip includes a sound pickup module, a hardware VAD module, a coprocessor, and a memory; the memory stores a VAD program and a power consumption control program. The sound pickup module is used to collect the sound signal input to the voice chip; the hardware VAD module is used to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled; the coprocessor is used to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled; the coprocessor is also used to execute the power consumption control program to control the main processor to enter the normal working mode from the low power consumption mode when the currently used VAD mode detects that the sound signal contains a voice signal. The currently used VAD mode is one of the multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
  • the difference between the self-service terminal provided in this embodiment and the voice chip shown in FIG. 5 is that the product form is different.
  • the self-service terminal provided in this embodiment may be a supermarket POS machine, a bank self-service teller machine, or a self-service shopping-guide terminal in a shopping mall, an airport, or the like.
  • for details of each module in the self-service terminal, refer to the description of the corresponding modules in the embodiments shown in FIG. 5 and FIGS. 4a to 4c.
  • the embodiment of the present application provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed by a processor, it causes the processor to implement the steps in the method embodiment shown in FIG. 1.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed by a processor, it causes the processor to implement the steps in the method embodiment shown in FIG. 2.
  • the embodiment of the present application further provides a computer readable storage medium storing a computer program.
  • when the computer program is executed by a processor, it causes the processor to implement the steps in the method embodiment shown in FIG. 3.
  • the memory in the above embodiments is used to store computer programs, and can be configured to store various other data to support operations on the chips or devices to which it belongs. Examples of these data include instructions, messages, pictures, audio, etc., for any application or method operating on the chip or device to which the memory belongs.
  • the memory in the above embodiment can be implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM) , Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic Disk or Optical Disk.
  • the communication component in the foregoing embodiment is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access wireless networks based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination of them.
  • the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the power supply component in the above embodiment provides power to various components of the device where the power supply component is located.
  • the power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device where the power supply component is located.
  • the audio output component in the above embodiment may be configured to output audio signals.
  • The audio output component includes a speaker for outputting audio signals.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable equipment to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

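A power consumption control method, a mode configuration method, a VAD method, a device, and a storage medium. A voice chip (40, 50, 65, 75) or device has a VAD function that can detect whether a voice signal is input. When a voice signal is detected, the voice chip (40, 50, 65, 75) or device switches from a low-power mode to a normal working mode, which saves power. Further, the voice chip (40, 50, 65, 75) or device has both a hardware VAD function and a software VAD function, and combining the two produces multiple VAD modes. By flexibly configuring the VAD mode used by the voice chip (40, 50, 65, 75) or device, the accuracy of voice input detection can be improved to a certain extent, the probability of false triggering reduced, and the low-power performance of the voice chip (40, 50, 65, 75) or device improved.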

Description

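Power Consumption Control, Mode Configuration, and VAD Methods, Devices, and Storage Medium
Technical Field
The present application relates to the technical field of voice processing, and in particular to a power consumption control method, a mode configuration method, a VAD method, a device, and a storage medium.
Background
Conventional low-power schemes for voice chips mainly build a hardware VAD (Voice Activity Detection) module into the voice chip. The hardware VAD module detects whether a voice signal is input based on signal energy. While no voice input is detected, the voice chip stays in a low-power mode; when voice input is detected, the voice chip is woken up and starts voice processing.
Because the hardware VAD module relies on signal energy to detect voice input, it is easily falsely triggered in noisy environments, which degrades the low-power performance of the voice chip.
Summary
Aspects of the present application provide a power consumption control method, a mode configuration method, a VAD method, a device, and a storage medium, so as to improve the accuracy of voice input detection and reduce the probability of false triggering.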
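An embodiment of the present application provides a power consumption control method applicable to a voice chip or device that has a hardware VAD function and a software VAD function. The method includes: collecting a sound signal input to the voice chip or device; detecting, using the VAD mode currently used by the voice chip or device, whether the sound signal contains a voice signal; and, if the sound signal contains a voice signal, switching the voice chip or device from a low-power mode to a normal working mode. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
An embodiment of the present application further provides a VAD mode configuration method applicable to a voice chip or device that has a hardware VAD function and a software VAD function. The method includes: receiving a policy configuration instruction; and configuring, according to the policy configuration instruction, the VAD mode usage policy that the voice chip or device needs in order to use multiple VAD modes. The multiple VAD modes are produced by combining the hardware VAD function and the software VAD function.
An embodiment of the present application further provides a voice activity detection method applicable to a voice chip or device that has a hardware VAD function and a software VAD function. The method includes: collecting a sound signal input to the voice chip or device; and performing VAD processing on the sound signal using the VAD mode currently used by the voice chip or device. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.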
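An embodiment of the present application further provides a voice chip, including a sound pickup module, a hardware VAD module, a processor, and a memory. The memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect a sound signal input to the voice chip. The hardware VAD module is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The processor is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The processor is further configured to execute the power consumption control program so as to switch the voice chip from a low-power mode to a normal working mode when detection using the currently used VAD mode finds that the sound signal contains a voice signal. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
An embodiment of the present application further provides a voice chip, including a sound pickup module, a hardware VAD module, a main processor, a coprocessor, and a memory. The memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect a sound signal input to the voice chip. The hardware VAD module is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The coprocessor is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The coprocessor is further configured to execute the power consumption control program so as to switch the main processor from a low-power mode to a normal working mode when detection using the currently used VAD mode finds that the sound signal contains a voice signal. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.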
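An embodiment of the present application further provides a smart terminal including a voice chip. The voice chip includes a sound pickup module, a hardware VAD module, a processor, and a memory. The memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect a sound signal input to the voice chip. The hardware VAD module is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The processor is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The processor is further configured to execute the power consumption control program so as to switch the smart terminal from a low-power mode to a normal working mode when detection using the currently used VAD mode finds that the sound signal contains a voice signal. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
An embodiment of the present application further provides a smart terminal, including a voice chip and a main processor. The voice chip includes a sound pickup module, a hardware VAD module, a coprocessor, and a memory. The memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect a sound signal input to the voice chip. The hardware VAD module is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The coprocessor is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The coprocessor is further configured to execute the power consumption control program so as to switch the main processor from a low-power mode to a normal working mode when detection using the currently used VAD mode finds that the sound signal contains a voice signal. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.
An embodiment of the present application further provides a self-service terminal, including a voice chip and a main processor. The voice chip includes a sound pickup module, a hardware VAD module, a coprocessor, and a memory. The memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect a sound signal input to the voice chip. The hardware VAD module is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The coprocessor is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The coprocessor is further configured to execute the power consumption control program so as to switch the main processor from a low-power mode to a normal working mode when detection using the currently used VAD mode finds that the sound signal contains a voice signal. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.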
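An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it causes the processor to implement the steps of the power consumption control method, the VAD mode configuration method, or the voice activity detection method provided in the embodiments of the present application.
In the embodiments of the present application, the voice chip or device has a VAD function that can detect whether a voice signal is input. When a voice signal is detected, the voice chip or device switches from a low-power mode to a normal working mode, which saves power. Further, the voice chip or device has both a hardware VAD function and a software VAD function, and combining the two produces multiple VAD modes. By flexibly configuring the VAD mode used by the voice chip or device, the accuracy of voice input detection can be improved to a certain extent, the probability of false triggering reduced, and the low-power performance of the voice chip or device improved.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it. The exemplary embodiments of the present application and their descriptions explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of a power consumption control method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic flowchart of a VAD mode configuration method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic flowchart of a voice activity detection method according to an exemplary embodiment of the present application;
FIG. 4a is a schematic structural diagram of a voice chip according to an exemplary embodiment of the present application;
FIG. 4b is a schematic diagram of states for configuring a voice chip according to an exemplary embodiment of the present application;
FIG. 4c is a schematic structural diagram of another voice chip according to an exemplary embodiment of the present application;
FIG. 5 is a schematic structural diagram of yet another voice chip according to an exemplary embodiment of the present application;
FIG. 6 is a schematic structural diagram of a smart terminal according to an exemplary embodiment of the present application;
FIG. 7 is a schematic structural diagram of another smart terminal according to an exemplary embodiment of the present application.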
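Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
In the prior art, detecting voice input with a hardware VAD module is prone to false triggering in noisy environments, which degrades the low-power performance of the voice chip. To address this technical problem, in the embodiments of the present application the voice chip or device has both a hardware VAD function and a software VAD function, and combining the two produces multiple VAD modes. By flexibly configuring the VAD mode used by the voice chip or device, the accuracy of voice input detection can be improved to a certain extent, the probability of false triggering reduced, and the low-power performance of the voice chip or device improved.
Before the embodiments are described in detail, the voice chip or device used in them is explained. The embodiments do not restrict the voice chip or device: any chip or device capable of detecting voice signals and performing other processing (for example, storage or playback) is applicable. A voice device may be an electronic device containing a voice chip. Besides the voice chip, it may contain other components, such as communication modules (WiFi, Bluetooth, and the like), a display, and a power module. The voice chip contains at least a processor and a memory.
If the voice device has relatively simple functions and the processor in the voice chip is sufficient to implement its basic functions, the voice device need not contain any other processor. In this case, the processor in the voice chip implements both the voice-processing functions and the basic functions of the voice device. For example, suppose the voice device is a smart alarm clock that supports voice announcements and contains a voice chip. Because an alarm clock's functions are simple, it contains no processor other than the one in the voice chip, which lowers its cost. The processor in the voice chip then implements both the voice-processing functions and basic functions such as timekeeping.
Of course, the processor in the voice chip may be insufficient to implement the basic functions of the voice device, either because the device has more powerful functions or because the processor's processing capability is limited. In that case, the voice device may also contain a main processor, and the processor in the voice chip may be implemented as a coprocessor. For example, suppose the voice device is a smartphone containing a voice chip. Because a smartphone is functionally powerful, it includes a main processor in addition to the processor in the voice chip. The processor in the voice chip acts as a coprocessor mainly responsible for voice-processing functions. The main processor implements the smartphone's basic functions, such as communication, wireless Internet access, gaming, video playback, photography, and online transactions.
In the embodiments of the present application, these voice chips or devices may be used in low-power scenarios where power consumption must be considered. One example is a voice chip or device powered by a battery (such as a dry cell, a rechargeable battery, or a battery pack); another is a device containing the voice chip that is battery-powered. Such a battery-powered device may be any smart terminal that contains a voice chip and has voice functions, including but not limited to:
- remote controls, story machines, smart speakers, and smart loudspeakers;
- tablet computers, smartphones, smart robots, smart alarm clocks, smart bands, and smart switches;
- unmanned delivery vehicles, self-service parcel lockers, and self-service terminals.
A self-service terminal may be a supermarket POS machine, a bank ATM, or a self-service shopping-guide terminal in a shopping mall, airport, or similar setting. Of course, the voice chip or device may also be powered by a non-battery source (for example, mains power), as in televisions, air conditioners, water heaters, and desktop computers, without being limited thereto.
Regardless of the power supply method or the form in which the voice chip or device is implemented, the voice chip or device in the embodiments of the present application has both a hardware VAD function and a software VAD function. The hardware VAD function refers to the VAD function implemented by a hardware VAD module built into the voice chip or device. Optionally, the hardware VAD module may be hardwired into the voice chip or device, and the VAD function it implements may be modified through configuration parameters. The software VAD function refers to the VAD function implemented by a processor in the voice chip or device executing a VAD program.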
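In the embodiments of the present application, the hardware VAD function and the software VAD function are used in combination, which produces multiple VAD modes, including at least the following:
- Hardware VAD mode: the hardware VAD module alone performs VAD.
- Software VAD mode: the processor alone performs VAD by executing a VAD program.
- Combined software-hardware VAD mode: the hardware VAD module first performs one VAD pass, and the processor then performs another by executing a VAD program.
- Combined software-software VAD mode: the processor executes two or more VAD programs to perform VAD multiple times.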
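The mode composition above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the applicant's implementation. `VadMode`, `build_pipeline`, and `detect_speech` are hypothetical names, and each detector is modelled as a function from one audio frame to a yes/no decision. The combined modes are modelled as a cascade in which every stage must report speech. Embodiment A3 states this rule for the software-hardware mode; applying the same rule to the software-software mode is an assumption.

```python
from enum import Enum
from typing import Callable, List, Sequence

# A detector receives one frame of audio samples and returns True if it judges speech present.
Detector = Callable[[Sequence[float]], bool]


class VadMode(Enum):
    HARDWARE = "hardware"  # hardware VAD module alone
    SOFTWARE = "software"  # one VAD program on the processor alone
    SW_HW = "sw+hw"        # hardware VAD pass first, then a software VAD pass
    SW_SW = "sw+sw"        # two or more software VAD programs in sequence (assumed cascade)


def build_pipeline(mode: VadMode, hw: Detector, sw: List[Detector]) -> List[Detector]:
    """Return the ordered detector stages that a VAD mode combines."""
    if mode is VadMode.HARDWARE:
        return [hw]
    if mode is VadMode.SOFTWARE:
        return [sw[0]]
    if mode is VadMode.SW_HW:
        return [hw, sw[0]]
    if len(sw) < 2:
        raise ValueError("the software-software mode needs at least two VAD programs")
    return list(sw)


def detect_speech(stages: List[Detector], frame: Sequence[float]) -> bool:
    # all() stops at the first stage that reports no speech, so a later (typically
    # costlier) stage runs only after every earlier stage has reported speech.
    return all(stage(frame) for stage in stages)
```

Because `all()` short-circuits, the software stage of the combined software-hardware mode only runs on frames that the hardware stage has already accepted. This is what keeps that mode cheap during quiet periods.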
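The embodiments of the present application do not limit how hardware VAD is implemented. For example, the hardware VAD mode may detect whether a received sound signal contains a voice signal based on the energy of the sound signal. Likewise, the embodiments do not limit how software VAD is implemented. One software VAD implementation, for example, includes:
1. dividing the sound signal into frames;
2. extracting features from each frame;
3. training a classifier on a set of data frames with known speech and silence regions;
4. classifying unknown frames as speech or silence, thereby determining whether a voice signal is input.
Another software VAD implementation trains a neural-network VAD (NN-VAD) model in advance on human voice samples. The model can detect whether a sound signal contains a voice signal, so the sound signal is fed into the NN-VAD model, which detects whether a voice signal is input.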
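The two example implementations can be illustrated with a small sketch. It assumes audio arrives as lists of float samples.
- `energy_vad` stands in for an energy-based check of the kind attributed to the hardware VAD module.
- `CentroidVad` stands in for "a classifier" trained on labelled speech and silence frames. It is a nearest-centroid classifier over two hand-picked features: log-energy and zero-crossing rate.

It is not an NN-VAD model, and all names and features here are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def split_frames(signal: Sequence[float], size: int, hop: int) -> List[List[float]]:
    """Cut a signal into fixed-length frames; a trailing partial frame is dropped."""
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_vad(frame: Sequence[float], threshold: float) -> bool:
    """Energy-based check of the kind a hardware VAD module may perform."""
    return short_time_energy(frame) > threshold


def features(frame: Sequence[float]) -> Tuple[float, float]:
    # log-energy and zero-crossing rate; the 1e-10 floor avoids log10(0) on silent frames
    return (math.log10(short_time_energy(frame) + 1e-10), zero_crossing_rate(frame))


class CentroidVad:
    """Software VAD sketch: nearest-centroid classifier over labelled speech/silence frames."""

    def fit(self, speech: List[Sequence[float]], silence: List[Sequence[float]]) -> "CentroidVad":
        self.centroids = {
            "speech": self._mean([features(f) for f in speech]),
            "silence": self._mean([features(f) for f in silence]),
        }
        return self

    @staticmethod
    def _mean(points: List[Tuple[float, float]]) -> Tuple[float, float]:
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def is_speech(self, frame: Sequence[float]) -> bool:
        f = features(frame)
        return math.dist(f, self.centroids["speech"]) < math.dist(f, self.centroids["silence"])
```

The fixed energy threshold also shows why the text calls purely energy-based detection fragile: loud non-speech noise above the threshold passes the check just as speech does.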
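In the embodiments of the present application, combining the hardware VAD function with the software VAD function produces multiple VAD modes. The VAD mode used by the voice chip or device can be configured flexibly according to the environment it is or will be in, the application scenario, time information, and/or user preferences. Using an appropriate VAD mode can improve the accuracy of voice input detection to a certain extent, reduce the probability of false triggering, and improve the low-power performance of the voice chip or device.
With reference to the drawings, two methods provided by the embodiments of the present application are described in detail below: the power consumption control method based on the VAD mode currently used by the voice chip or device, and the method for configuring that VAD mode.
FIG. 1 is a schematic flowchart of a power consumption control method according to an exemplary embodiment of the present application. The method is applicable to a voice chip or device that has both a hardware VAD function and a software VAD function. For descriptions of the voice chip or device and of the hardware and software VAD functions, see the foregoing embodiments, which are not repeated here. As shown in FIG. 1, the method includes:
11. Collect a sound signal input to the voice chip or device.
12. Detect, using the VAD mode currently used by the voice chip or device, whether the sound signal contains a voice signal.
13. If the sound signal contains a voice signal, the voice chip or device switches from a low-power mode to a normal working mode. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function.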
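Steps 11 to 13 can be read as a small state machine. The following sketch assumes a `detect_speech` callable that stands in for whichever VAD mode is currently configured. `PowerState` and `power_control` are hypothetical names. The sketch does not model the optional return to sleep after wake-up, for example when a wake word is not recognized.

```python
from enum import Enum, auto
from typing import Callable, Iterable, List, Sequence


class PowerState(Enum):
    LOW_POWER = auto()
    NORMAL = auto()


def power_control(frames: Iterable[Sequence[float]],
                  detect_speech: Callable[[Sequence[float]], bool]) -> List[PowerState]:
    """Apply steps 11-13 to a stream of captured frames; return the state after each frame.

    detect_speech stands for whichever VAD mode is currently configured.
    """
    state = PowerState.LOW_POWER
    history: List[PowerState] = []
    for frame in frames:                                            # step 11: collected sound
        if state is PowerState.LOW_POWER and detect_speech(frame):  # step 12: current VAD mode
            state = PowerState.NORMAL                               # step 13: leave low power
        history.append(state)
    return history
```

VAD is only consulted while the chip is in low-power mode. Once in normal mode, further processing (such as wake-word recognition) takes over.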
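In this embodiment, the voice chip or device includes a sound pickup module, such as a microphone, which captures sound signals entering the voice chip or device. Here, a sound signal generally refers to an acoustic wave produced by the vibration of an object. A sound signal entering the voice chip or device may contain both a voice signal and ambient noise, or only ambient noise without any voice signal. Ambient noise includes, but is not limited to, traffic noise, construction noise, industrial noise, and social noise. The collected sound signal varies with the application scenario. For example, outdoors, the sound signal may include traffic noise and may also include a voice signal uttered by the user of the voice chip or device. As another example, in social scenarios such as business transactions, sports events, parades and rallies, and entertainment venues, the sound signal may contain hubbub from surrounding people and may also contain a voice signal uttered by the user.
In this embodiment, the voice chip or device can stay in the low-power mode while there is no voice input, to save power. When a sound signal is collected, the VAD function of the voice chip or device can be used to detect whether it contains a voice signal. If it does, the voice chip or device switches from the low-power mode to the normal working mode for voice processing. In the normal working mode, the voice chip or device can process the voice signal further, for example by recognizing whether it is a specified wake-up word and, if not, re-entering the sleep state. Optionally, when the sound signal does not contain a voice signal, the voice chip or device remains in the low-power mode to save power.
In this embodiment, when the voice chip or device enters the low-power mode, the sound pickup module and the hardware modules that implement the VAD function can keep working, while other hardware modules can stop. The normal working mode is defined relative to the low-power mode: in the normal working mode, all hardware modules of the voice chip or device can work normally, and most or all functions are available. Which functions remain available in each mode can be decided flexibly during chip or device design according to the application scenario and requirements, and is not limited here.
In this embodiment, the voice chip or device has not only a hardware VAD function but also a software VAD function. Combining the two yields multiple VAD modes, including at least: a hardware VAD mode, a software VAD mode, a combined software-hardware VAD mode, and a combined software-software VAD mode. For detailed descriptions of the hardware VAD function, the software VAD function, and these VAD modes, see the foregoing embodiments, which are not repeated here.
This embodiment does not limit which VAD modes the voice chip or device supports (or chooses to use). The supported modes include at least several (that is, two or more) of the modes listed above. The VAD mode currently used by the voice chip or device can be configured flexibly, and it is one of the modes the voice chip or device supports. The currently used mode therefore depends on which modes are supported, without limitation. The supported modes and the currently used mode are illustrated below in four cases:
Case 1: the voice chip or device supports the hardware VAD mode and the software VAD mode. Accordingly, the currently used VAD mode is one of these two, and which one can be configured flexibly.
Case 2: the voice chip or device supports the hardware VAD mode and the combined software-hardware VAD mode. Accordingly, the currently used VAD mode is one of these two, and which one can be configured flexibly.
Case 3: the voice chip or device supports the software VAD mode and the combined software-hardware VAD mode. Accordingly, the currently used VAD mode is one of these two, and which one can be configured flexibly.
Case 4: the voice chip or device supports the hardware VAD mode, the software VAD mode, and the combined software-hardware VAD mode. Accordingly, the currently used VAD mode is one of these three, and which one can be configured flexibly.
In this embodiment, after the sound signal is collected, the VAD mode currently used by the voice chip or device can be used to detect whether the sound signal contains a voice signal. How this detection is performed depends on the currently used VAD mode, as illustrated below:
In optional embodiment A1, the voice chip or device currently uses the hardware VAD mode. The sound signal is fed into the hardware VAD module of the voice chip or device, which detects in hardware whether the sound signal contains a voice signal. If the hardware detection finds a voice signal, it is directly determined that the sound signal contains a voice signal. In optional embodiment A1, in the low-power mode the sound pickup module and the hardware VAD module are in a normal working state. Whether other modules are in a normal working state is not limited.
In optional embodiment A2, the voice chip or device currently uses the software VAD mode. The sound signal is fed into the processor of the voice chip or device, which detects in software whether the sound signal contains a voice signal. If the software detection finds a voice signal, it is directly determined that the sound signal contains a voice signal. In optional embodiment A2, in the low-power mode the sound pickup module is in a normal working state, and the processor can at least execute the VAD program. Whether other modules are in a normal working state is not limited.
In optional embodiment A3, the voice chip or device currently uses the combined software-hardware VAD mode. The sound signal is fed both into the hardware VAD module, which detects in hardware whether it contains a voice signal, and into the processor. If the hardware detection finds a voice signal, the processor detects again, in software, whether the sound signal contains a voice signal. If this software detection also finds a voice signal, it is determined that the sound signal contains a voice signal. If the hardware detection finds no voice signal, the subsequent steps are skipped and the voice chip or device remains in the low-power mode. Likewise, if the second, software detection finds no voice signal, the voice chip or device remains in the low-power mode. In optional embodiment A3, in the low-power mode the sound pickup module and the hardware VAD module are in a normal working state, and the processor can at least execute the VAD program. Whether other modules are in a normal working state is not limited.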
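In the embodiments of the present application, the voice chip or device supports multiple VAD modes, and the VAD mode it uses can be configured flexibly as needed. Using a mode better suited to the requirements improves the accuracy of voice input detection to a certain extent, reduces the probability of false triggering, and improves the low-power performance of the voice chip or device. To make the VAD mode easy to configure, a policy for using the multiple VAD modes, called a VAD mode usage policy, can be configured in advance. The policy specifies the conditions under which the voice chip or device uses each VAD mode. In practice, the VAD mode currently used by the voice chip or device can then be configured according to the preconfigured VAD mode usage policy.
For how the VAD mode usage policy is preconfigured, see the VAD mode configuration method embodiment below. Note that this configuration method embodiment can also be implemented on its own, independently of the power consumption control method embodiment above.
FIG. 2 is a schematic flowchart of a VAD mode configuration method according to an exemplary embodiment of the present application. This method is likewise applicable to a voice chip or device that has both a hardware VAD function and a software VAD function. For descriptions of the voice chip or device and of the hardware and software VAD functions, see the foregoing embodiments, which are not repeated here. As shown in FIG. 2, the method includes:
21. Receive a policy configuration instruction.
22. Configure, according to the policy configuration instruction, the VAD mode usage policy that the voice chip or device needs in order to use multiple VAD modes. The multiple VAD modes are produced by combining the hardware VAD function and the software VAD function.
This embodiment does not limit the sender of the policy configuration instruction. It may be any configuring party with configuration-management authority over the voice chip or device, such as a computer, a virtual machine, an application, a terminal device, or configuration personnel. Nor does this embodiment limit how the configuring party generates the instruction: for example, via a command window or command line, via an app on a terminal for managing the voice chip or device, or via a web page for managing the voice chip or device.
In this embodiment, the policy configuration instruction instructs the voice chip or device to configure the VAD mode usage policy it needs to use each VAD mode. After receiving the instruction, the voice chip or device configures that policy accordingly.
The embodiments of the present application do not limit when the VAD mode usage policy is configured. For example:
- The policy configuration instruction may be generated and sent to the voice chip or device during its initialization, so that the policy is configured as part of initialization.
- The instruction may be generated and sent during factory configuration, so that the policy is configured as part of factory configuration.
- The instruction may be generated and sent while the voice chip or device is in use, so that the policy is reconfigured, that is, updated, during use.
In any of these scenarios, this embodiment does not limit the content of the policy configuration instruction. The content determines how the VAD mode usage policy is configured, and therefore also how the currently used VAD mode is configured from the preconfigured policy. Examples follow:
In optional embodiment B1, a user can directly specify the VAD mode used by the voice chip or device through a VAD configuration instruction. To help users identify the VAD modes, configuration personnel can configure an identifier for each VAD mode through the policy configuration instruction. That instruction may carry the identifiers themselves, or information such as how identifiers are assigned to the VAD modes. During use, a user who wants to specify a VAD mode includes the identifier of that mode in a VAD configuration instruction sent to the voice chip or device. On receiving the VAD configuration instruction, the voice chip or device configures its currently used VAD mode according to the carried identifier, so that it uses the mode the user specified. Specifically, the voice chip or device parses a VAD mode identifier from the VAD configuration instruction. The identifier denotes the user-specified VAD mode, which is the hardware VAD mode, the software VAD mode, or the combined software-hardware VAD mode. The voice chip or device then sets its currently used VAD mode to the specified mode.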
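Embodiment B1 could look like the sketch below. The instruction format `VAD_MODE=<id>`, the identifier table `MODE_IDS`, and the `VoiceChip` class are hypothetical. The text only says that identifiers are assigned through the policy configuration instruction and carried in the VAD configuration instruction.

```python
from typing import Dict

# Hypothetical identifiers, as a policy configuration instruction might assign them.
MODE_IDS: Dict[str, str] = {"1": "hardware", "2": "software", "3": "sw+hw"}


def parse_vad_config(instruction: str, mode_ids: Dict[str, str]) -> str:
    """Extract the mode identifier from a hypothetical 'VAD_MODE=<id>' instruction."""
    key, sep, value = instruction.partition("=")
    if not sep or key.strip() != "VAD_MODE":
        raise ValueError(f"not a VAD configuration instruction: {instruction!r}")
    try:
        return mode_ids[value.strip()]
    except KeyError:
        raise ValueError(f"unknown VAD mode identifier: {value.strip()!r}") from None


class VoiceChip:
    """Hypothetical holder for the currently used VAD mode."""

    def __init__(self, mode: str = "hardware") -> None:
        self.current_mode = mode

    def apply_vad_config(self, instruction: str) -> None:
        # set the current mode to the user-specified one
        self.current_mode = parse_vad_config(instruction, MODE_IDS)
```

Rejecting unknown identifiers with an error, rather than silently keeping the old mode, is a design choice of this sketch. The text does not say how invalid instructions are handled.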
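In optional embodiment B2, the use of VAD modes is tied to the remaining battery charge of the voice chip or device. Accordingly, the remaining-charge range corresponding to each VAD mode can be configured according to the policy configuration instruction. During use, the current remaining charge of the voice chip or device is monitored, and the currently used VAD mode is configured according to it.
For example:
- When the remaining charge is high, the combined software-hardware VAD mode can be used to ensure accurate voice detection.
- When it is moderate, the software VAD mode can be used, reducing the power consumed by VAD while keeping detection as accurate as possible.
- When it is low, the hardware VAD mode can be used to prioritize saving power.
One or more charge thresholds can be set to divide the remaining-charge ranges, and their number can be set flexibly according to application requirements.
For example, two charge thresholds can be set: a first charge threshold and a second charge threshold, with the first greater than the second. For example, the first charge threshold is 90% and the second is 40%, though other values may be used. The two thresholds can be carried in the policy configuration instruction provided to the voice chip or device, which then maps the remaining-charge ranges to modes as follows:
- greater than or equal to the first threshold: the combined software-hardware VAD mode;
- greater than or equal to the second threshold but less than the first: the software VAD mode;
- less than the second threshold: the hardware VAD mode.
When configuring the currently used VAD mode, the current remaining charge is monitored and the mode is set according to the range it falls in. If the charge is greater than or equal to the first charge threshold, the currently used VAD mode is set to the combined software-hardware VAD mode. If it is greater than or equal to the second threshold and less than the first, the mode is set to the software VAD mode. If it is less than the second threshold, the mode is set to the hardware VAD mode.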
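The two-threshold rule of embodiment B2 translates directly into code. This sketch assumes the remaining charge is given as a fraction between 0.0 and 1.0. It uses the 90% and 40% example values from the text as defaults. `mode_for_battery` is a hypothetical name, and the mode strings are illustrative labels.

```python
def mode_for_battery(level: float, first: float = 0.90, second: float = 0.40) -> str:
    """Embodiment B2: map remaining charge (fraction 0.0-1.0) to a VAD mode."""
    if not first > second:
        raise ValueError("the first charge threshold must be greater than the second")
    if level >= first:
        return "sw+hw"     # ample charge: favour detection accuracy
    if level >= second:
        return "software"  # moderate charge: balance accuracy and power
    return "hardware"      # low charge: favour power saving
```

Re-evaluating this function whenever the monitored charge changes gives the switching behaviour described in the text.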
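In optional embodiment B3, the use of VAD modes is tied to attributes of the user of the voice chip or device. Accordingly, the user attributes corresponding to each VAD mode can be configured according to the policy configuration instruction. During use, the attributes of the current user are monitored, and the currently used VAD mode is configured according to them. This embodiment does not limit the user attributes; they may be any user-related information, such as gender, age group, voiceprint features, volume, user level, or the distance between the user and the voice chip.
In one optional embodiment, users are divided into categories by one or more attributes, and the currently used VAD mode depends on the current user.
- Categories: for example, men and women by gender, or adults and non-adults (here including the elderly and children) by age.
- Preset categories (for example, women or non-adults) use the software VAD mode or the combined software-hardware VAD mode; non-preset categories (for example, men or adults) use the hardware VAD mode. The preset categories are carried in the policy configuration information, and the voice chip or device parses them out and maps the modes accordingly.
- Individual users: in some scenarios, different VAD modes can be used for individual users. For example, preset users (such as family member 1 and family member 2) use the software VAD mode or the combined software-hardware VAD mode, while non-preset users (such as family member 3 and family member 4) use the hardware VAD mode. The identification information of the preset users is carried in the policy configuration information, and the voice chip or device parses it out and maps the modes accordingly.
- Selecting the current mode: the attributes of the current user are monitored to determine whether that user belongs to a preset category or is a preset user. If so, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode; otherwise, it is set to the hardware VAD mode.
- Identifying the user: optionally, voiceprint recognition is performed on the collected user voice, and the voiceprint determines whether the user belongs to a preset category or is a preset user. Alternatively, this can be determined from the account information the user logged in with.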
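A minimal sketch of the category-based and user-based rules in embodiment B3 follows. `UserPolicy` is a hypothetical name, and the category labels and user identifiers are placeholders. How `category` and `user_id` are obtained is left outside the sketch; per the text, this would be voiceprint recognition or the login account.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class UserPolicy:
    preset_categories: Set[str] = field(default_factory=set)  # e.g. {"female", "non-adult"}
    preset_users: Set[str] = field(default_factory=set)       # e.g. {"member1", "member2"}
    precise_mode: str = "sw+hw"                                # or "software"

    def mode_for(self, category: Optional[str], user_id: Optional[str]) -> str:
        """Preset categories or preset users get the more accurate mode; others get hardware."""
        if category in self.preset_categories or user_id in self.preset_users:
            return self.precise_mode
        return "hardware"
```

Matching either a preset category or a preset user is enough in this sketch. The text presents the two as alternative groupings, so accepting both at once is an assumption.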
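In another optional embodiment, users are divided into categories by their volume, using volume thresholds whose number can be set flexibly as needed. For example, two volume thresholds are set: a first volume threshold and a second volume threshold, with the first less than the second.
- Volume below the first threshold: the user is quiet, so the combined software-hardware VAD mode is used to ensure accurate voice detection.
- Volume greater than or equal to the first threshold but below the second: the volume is relatively high, so the software VAD mode alone is used, reducing the power consumed by VAD while keeping detection as accurate as possible.
- Volume above the second threshold: the volume is high, so the hardware VAD mode can be used, which keeps detection accurate while saving more power.
The two thresholds can be carried in the policy configuration instruction. The voice chip or device parses them out and maps the three volume ranges to the corresponding modes as listed above. When configuring the currently used VAD mode, the voice chip or device monitors the volume of the current user and applies the same mapping: the combined software-hardware VAD mode below the first volume threshold, the software VAD mode from the first threshold up to the second, and the hardware VAD mode above the second threshold.
In another optional embodiment, distance ranges between the user and the voice chip can be set, and different VAD modes are used depending on which range the distance between the user and the voice chip or device falls in. For example:
- User close to the voice chip or device, within a first distance threshold: the hardware VAD mode can be used.
- User somewhat farther away, beyond the first distance threshold but within a second distance threshold: the software VAD mode can be used.
- User far away, beyond the second distance threshold: the combined software-hardware VAD mode can be used.
Optionally, the distance between the user and the voice chip or device can be estimated from information such as the amplitude and direction of the collected user voice.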
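Both the volume bands and the distance bands map a measured value onto ordered threshold bands. The sketch below assumes thresholds in ascending order and uses Python's `bisect_right`, so a value equal to a threshold falls into the higher band. The text leaves some boundary cases open, so this tie-breaking is an assumption, as are the example thresholds in the tests (dB for volume, metres for distance). `band_select`, `VOLUME_MODES`, and `DISTANCE_MODES` are hypothetical names.

```python
from bisect import bisect_right
from typing import Sequence


def band_select(value: float, thresholds: Sequence[float], modes: Sequence[str]) -> str:
    """Return modes[i], where i is the number of thresholds <= value.

    thresholds must be ascending, and there must be one more mode than thresholds.
    """
    if list(thresholds) != sorted(thresholds) or len(modes) != len(thresholds) + 1:
        raise ValueError("need ascending thresholds and exactly one more mode than thresholds")
    return modes[bisect_right(thresholds, value)]


# Volume: quieter users get the more accurate (costlier) modes.
VOLUME_MODES = ("sw+hw", "software", "hardware")
# Distance: farther users get the more accurate (costlier) modes.
DISTANCE_MODES = ("hardware", "software", "sw+hw")
```

For instance, `band_select(45.0, (40.0, 60.0), VOLUME_MODES)` selects the software VAD mode, and `band_select(5.0, (1.0, 3.0), DISTANCE_MODES)` selects the combined software-hardware mode.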
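In optional embodiment B4, the use of VAD modes is tied to the state of upper-layer applications related to the voice chip or device. Accordingly, the upper-layer application state corresponding to each VAD mode can be configured according to the policy configuration instruction. During use, the state of the related upper-layer applications is monitored, and the currently used VAD mode is configured according to it.
This embodiment does not limit the upper-layer applications related to the voice chip or device. They may be any applications that may use voice functions, such as navigation, map, or instant-messaging applications.
Optionally, upper-layer application states are divided into running and non-running. A running upper-layer application is more likely to use the voice chip, so the software VAD mode or the combined software-hardware VAD mode can be used for more accurate voice detection. Conversely, when the upper-layer application is not running, the hardware VAD mode can be used. The policy configuration instruction provided to the voice chip or device can carry the mapping of the running state to the software or combined software-hardware VAD mode and of the non-running state to the hardware VAD mode. The voice chip or device configures these mappings accordingly. When configuring the currently used VAD mode according to the associated upper-layer application, the state of that application is monitored. If it is running, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode; if it is not running, the mode is set to the hardware VAD mode.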
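Embodiment B4 reduces to a one-line decision once application states are known. In this sketch, `app_running` is a hypothetical mapping from associated upper-layer applications to whether they are running. The text describes a single application, so using `any()` when several applications are associated is an assumption. `mode_for_apps` is a hypothetical name.

```python
from typing import Mapping


def mode_for_apps(app_running: Mapping[str, bool], precise_mode: str = "software") -> str:
    """Embodiment B4: a running associated app (e.g. navigation) makes voice use likely,
    so a more accurate mode is chosen; otherwise the low-power hardware mode is used."""
    if precise_mode not in ("software", "sw+hw"):
        raise ValueError("precise_mode must be 'software' or 'sw+hw'")
    return precise_mode if any(app_running.values()) else "hardware"
```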
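In optional embodiment B5, a usage order and a usage duration can be preset for the VAD modes. The modes are used one after another in that order, and each mode is used for its set duration each time. Accordingly, the usage order and usage duration of each VAD mode are configured according to the policy configuration instruction, which contains them. During use, when the usage duration of the previous VAD mode ends, the currently used VAD mode is set to the next mode in the configured order.
This embodiment does not limit the usage order or durations. For example, the order may be: combined software-hardware VAD mode, software VAD mode, hardware VAD mode; or hardware VAD mode, software VAD mode, combined software-hardware VAD mode. In addition, the hardware VAD mode consumes less power, so its usage duration can be set longer, for example 10 hours. The software VAD mode and the combined software-hardware VAD mode consume relatively more power, so their durations can be set shorter to save power, for example 2 hours. Of course, the three modes can also have the same duration.
Further, the usage order and/or usage duration of the VAD modes can be adjusted dynamically according to the environment of the voice chip or device. For example, if the voice chip or device stays in a noisy environment for a long time, the usage durations of the software VAD mode and the combined software-hardware VAD mode can be increased to keep voice detection as accurate as possible.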
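The rotation in embodiment B5 and the noise-driven adjustment above can be sketched as follows. The sketch uses the text's example durations: 2 hours for the software-based modes and 10 hours for the hardware mode. Cycling back to the first mode after the last one, and the doubling factor in `adjust_for_noise`, are assumptions. `mode_at` and `adjust_for_noise` are hypothetical names.

```python
from typing import List, Tuple

Schedule = List[Tuple[str, float]]  # (mode, usage duration in hours), in usage order


def mode_at(elapsed_h: float, schedule: Schedule) -> str:
    """Mode in use after elapsed_h hours, cycling through the schedule in order."""
    cycle = sum(duration for _, duration in schedule)
    t = elapsed_h % cycle
    for mode, duration in schedule:
        if t < duration:
            return mode
        t -= duration
    return schedule[-1][0]  # only reached through floating-point rounding at the cycle end


def adjust_for_noise(schedule: Schedule, noisy: bool, factor: float = 2.0) -> Schedule:
    """Lengthen the software-based modes when the environment stays noisy."""
    if not noisy:
        return list(schedule)
    return [(mode, duration * factor if mode != "hardware" else duration)
            for mode, duration in schedule]
```

With the plan `[("sw+hw", 2), ("software", 2), ("hardware", 10)]`, hour 5 falls in the hardware slot. After `adjust_for_noise(plan, True)`, each cycle spends 8 of 18 hours in the software-based modes instead of 4 of 14.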
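In optional embodiment B6, the use of VAD modes is tied to information about the environment of the voice chip or device. Accordingly, the environment information corresponding to each VAD mode can be configured according to the policy configuration instruction. During use, the current environment information of the voice chip or device is monitored, and the currently used VAD mode is configured according to it.
Optionally, configuration personnel can obtain in advance information about the environment the voice chip or device will be in and carry it in the policy configuration instruction. For example, they can enter the environment information on a web page or app page and click a configuration control on the page to issue the policy configuration instruction. The voice chip or device then parses this environment information out of the instruction. Based on it, the voice chip or device configures the environment information corresponding to each VAD mode.
Optionally, configuration personnel can instead instruct the voice chip or device to collect information about the environment it will be in. To do so, they carry an instruction to collect environment information in the policy configuration instruction. For example, they can enter such an instruction on a web page or app page and click a configuration control to issue the policy configuration instruction. The voice chip or device then collects the environment information as instructed. Based on it, the voice chip or device configures the environment information corresponding to each VAD mode.
The embodiments of the present application do not limit the environment information of the voice chip or device. It may include, but is not limited to, at least one of the following:
- Environment location.
- Noise level: can be classified by decibels, for example into very noisy, moderately noisy, fairly quiet, and very quiet.
- Environment type: includes, but is not limited to, home, office, entertainment venues, and public places.
- Noise category: includes, but is not limited to, human noise and non-human noise; non-human noise can be further divided into animal noise, construction noise, traffic noise, and so on.
- Time information: can be divided into day and night, or into morning, afternoon, and evening.
These kinds of environment information can be used alone or combined in any way. For example, one item of environment information may be an office environment in the daytime that is fairly quiet. Another may be an entertainment venue at night that is very noisy. Yet another may be a home environment in the daytime.
Based on the above, the environment information corresponding to each VAD mode can be configured according to at least one of the location, type, noise level, noise category, and time information of the environment. The environment information corresponding to each VAD mode therefore also includes at least one of these items. Accordingly, the currently used VAD mode is configured according to at least one of the location, type, noise level, noise category, and time information of the current environment. The resulting mode differs depending on which information is used. Three scenarios are described below:
Scenario C1: configure the currently used VAD mode according to the noise level of the environment, which is known or can be known in advance.
- If the noise level exceeds a set noise threshold, the environment is noisy and the hardware VAD mode has a high false-trigger probability. The currently used VAD mode can therefore be set to the software VAD mode or the combined software-hardware VAD mode, which improves the accuracy of subsequent voice input detection.
- If the noise level is less than or equal to the threshold, the environment is fairly quiet. The currently used VAD mode can be set to the hardware VAD mode, which benefits the low-power performance of the voice chip or device.
Scenario C2: configure the currently used VAD mode according to time information. For example, time can be divided into daytime and nighttime periods; daytime environments are noisier and more complex than nighttime ones.
- When the voice chip or device is in the daytime period, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode, which improves the accuracy of subsequent voice input detection.
- When it is in the nighttime period, the currently used VAD mode is set to the hardware VAD mode, which benefits the low-power performance of the voice chip or device.
Scenario C3: configure the currently used VAD mode according to the noise category of the environment. For example, ambient noise can be divided into human noise and non-human noise.
- If the noise category includes human noise, the currently used VAD mode can be set to the software VAD mode or the combined software-hardware VAD mode. This better distinguishes human noise from valid voice signals and improves the accuracy of subsequent voice input detection.
- If it does not include human noise, the currently used VAD mode can be set to the hardware VAD mode, which benefits the low-power performance of the voice chip or device.
The three scenarios above can be combined with the four cases described earlier: the configuration rules of each scenario select one of the VAD modes supported by the voice chip or device as the currently used mode. For example, in Case 1 the voice chip or device supports the hardware and software VAD modes. Under scenario C1, the currently used VAD mode is set to the software VAD mode if the noise level exceeds the set noise threshold, and to the hardware VAD mode otherwise. As another example, in Case 4 the voice chip or device supports the hardware, software, and combined software-hardware VAD modes. Under scenario C2, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode during the daytime, when there is more human activity. At night, when there is less human activity, it is set to the hardware VAD mode.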
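A sketch of scenarios C1 to C3 combined with the supported-mode cases follows. All names are hypothetical, and the sketch rests on three assumptions:
- any one of the three signals (noise above the threshold, daytime, or human noise present) is enough to call for a more accurate mode;
- the combined software-hardware mode is preferred over the software mode when both are supported;
- if no preferred mode is supported, the sketch falls back to the lowest-power supported mode, using the power ordering implied by embodiment B2.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional

# Modes ordered from lowest to highest power, following the ordering used in embodiment B2.
CHEAPEST_FIRST = ("hardware", "software", "sw+hw")


@dataclass(frozen=True)
class Environment:
    noise_db: Optional[float] = None        # C1: noise level
    is_daytime: Optional[bool] = None       # C2: time information
    has_human_noise: Optional[bool] = None  # C3: noise category


def pick_mode(env: Environment, supported: AbstractSet[str], noise_threshold_db: float) -> str:
    """Pick the current VAD mode from the modes the chip supports (Cases 1-4)."""
    demanding = (
        (env.noise_db is not None and env.noise_db > noise_threshold_db)  # C1
        or env.is_daytime is True                                        # C2
        or env.has_human_noise is True                                   # C3
    )
    preferred = ("sw+hw", "software") if demanding else ("hardware",)
    # First supported preferred mode; otherwise the cheapest supported mode.
    for mode in preferred + CHEAPEST_FIRST:
        if mode in supported:
            return mode
    raise ValueError("the chip supports none of the known VAD modes")
```

In Case 1 (hardware and software supported), a noisy environment yields the software mode, as in the example above. In Case 3 (no hardware mode), a quiet environment falls back to the software mode.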
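Note that in low-power scenarios, the solutions provided by the embodiments of the present application can reduce the false-trigger probability of the voice chip or device and achieve low-power control of it. Low-power control is only one application of these technical solutions. They can also be applied to voice activity detection and to operation control of various voice devices, as detailed in the following embodiments.
FIG. 3 is a schematic flowchart of a voice activity detection method according to an exemplary embodiment of the present application. The method is applicable to a voice chip or device that has hardware VAD and software VAD functions. For descriptions of the voice chip or device and of the hardware and software VAD functions, see the foregoing embodiments, which are not repeated here. As shown in FIG. 3, the method includes:
31. Collect a sound signal input to the voice chip or device.
32. Perform VAD processing on the sound signal using the VAD mode currently used by the voice chip or device. The currently used VAD mode is one of multiple VAD modes produced by combining hardware VAD and software VAD.
The embodiments of the present application do not limit the application scenarios of the voice chip or device, which can be used in various scenarios. For example, it can be used in smart home devices such as robot vacuums, air conditioners, and televisions, so that these devices can be controlled by voice. It can also be used in in-vehicle navigation devices for voice navigation while driving, such as route planning and speed queries. It can further be used in automatic ticket machines, public lighting systems, and so on.
In any of these scenarios, it is necessary to recognize whether a voice signal is input and, if so, to process it further. Further processing may be converting the voice signal to text locally and recognizing the operation command it carries, or sending the voice signal to the cloud, which recognizes the command, and so on. Recognizing whether a voice signal is input is the voice activity detection process. In this embodiment, the VAD mode currently used by the voice chip or device is used to detect whether the sound signal contains a voice signal. For detailed embodiments of this detection, see the foregoing embodiments, which are not repeated here.
In this embodiment, the method shown in FIG. 2 can be used to preconfigure the VAD mode usage policy that the voice chip or device needs in order to use each VAD mode. In practice, the currently used VAD mode can then be configured flexibly according to that policy. The configuration can combine information such as the environment of the voice chip or device, related user attributes, upper-layer application states, and remaining charge. For the specific configuration process, see the foregoing embodiments, which are not repeated here.
For example, in the scenario of voice control of smart home devices, smart home devices such as robot vacuums, air conditioners, and televisions have built-in voice chips. The voice chips are preconfigured to use the software VAD mode during the daytime and the hardware VAD mode at night.
Application of the voice chip in a robot vacuum:
For flexible operation, the robot vacuum is powered by a rechargeable battery, so it is not constrained by the location of household power outlets or by a charging cable. To save battery power, the robot vacuum works in the low-power mode when there is no voice input. During the daytime, a user faces the robot vacuum and issues a cleaning command by voice, such as "clean the living room". The voice chip built into the robot vacuum collects the surrounding sound signal and uses the software VAD mode to detect whether it contains a voice signal. When it does, the robot vacuum switches from the low-power mode to the normal working mode. The voice chip then continues to recognize the voice signal to determine whether it contains a set command word, such as clean + living room, clean + kitchen, or clean. In this embodiment, the voice chip recognizes the command word clean + living room and reports the recognition result to the robot vacuum's processor. Based on the recognized command word, the processor controls the robot vacuum to perform the cleaning task in the living room.
If the user issues the cleaning command, such as "clean the living room", at night, the voice chip built into the robot vacuum collects the surrounding sound signal and uses the hardware VAD mode to detect whether it contains a voice signal. When it does, the robot vacuum switches from the low-power mode to the normal working mode. For the operations after the robot vacuum enters the normal working mode, see the description above, which is not repeated here.
Application of the voice chip in an air conditioner:
An air conditioner is a relatively power-hungry device, usually installed near a power outlet and powered by AC mains. With a built-in voice chip, the air conditioner can be controlled by voice. At night, a user faces the air conditioner and issues a cooling command by voice, such as "turn on the air conditioner, temperature 27°C". The voice chip built into the air conditioner collects the surrounding sound signal and uses the hardware VAD mode to detect whether it contains a voice signal. When it does, the voice chip continues to recognize the voice signal to determine whether it contains a set command word, such as turn on + temperature value, power off, or turn on + working mode name. In this embodiment, the voice chip recognizes the command word turn on + 27°C and reports the recognition result to the air conditioner's processor. Based on the recognized command word, the processor starts the refrigeration system and sets the cooling temperature to 27°C.
Note that the steps of the methods provided in the above embodiments may all be executed by the same device, or the methods may be executed by different devices. For example, steps 11 to 13 may be executed by device A; or steps 11 and 12 may be executed by device A and step 13 by device B; and so on.
In addition, some of the processes described in the above embodiments and drawings include multiple operations in a specific order. It should be clearly understood, however, that these operations may be executed in parallel or in a different order from that in which they appear here. Operation numbers such as 11 and 12 only distinguish different operations and do not themselves imply any execution order. These processes may also include more or fewer operations, which may be executed sequentially or in parallel. Note also that descriptions such as "first" and "second" herein distinguish different messages, devices, modules, and so on. They do not imply a sequence, nor do they require that the "first" and "second" items be of different types.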
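FIG. 4a is a schematic structural diagram of a voice chip according to an exemplary embodiment of the present application. The voice chip has both a hardware VAD function and a software VAD function. For descriptions of these functions, see the foregoing embodiments, which are not repeated here. Combining the hardware VAD function with the software VAD function produces multiple VAD modes. The voice chip of this embodiment supports multiple VAD modes, and the VAD mode it uses can be configured flexibly according to the application scenario it is or will be in, time information, and/or user preferences. As shown in FIG. 4a, the voice chip 40 includes a sound pickup module 41, a hardware VAD module 42, a processor 43, and a memory 44.
The embodiments of the present application do not limit how the voice chip is applied. For example, it can be applied in low-power or energy-saving devices, including but not limited to battery-powered remote controls, story machines, smart speakers, tablet computers, smartphones, smart alarm clocks, smart bands, smart switches, smart loudspeakers, smart robots, unmanned delivery vehicles, self-service parcel lockers, and self-service terminals. As another example, the voice chip can also be implemented as a standalone voice device.
The sound pickup module 41 is configured to collect the sound signal input to the voice chip 40 and may be a microphone. The embodiments do not limit the sound signal input to the voice chip 40; it may include, but is not limited to, voice signals, human noise, and ambient noise.
The hardware VAD module 42 is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip 40 is enabled. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function. For example, it may be the hardware VAD mode, the software VAD mode, or the combined software-hardware VAD mode.
The memory 44 stores a VAD program and a power consumption control program. The processor 43 is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
The voice chip 40 of this embodiment supports a low-power scheme. When there is no voice signal, the voice chip 40 is in the low-power mode; it enters the normal working mode only when a voice signal appears. Further, the processor 43 is also configured to execute the power consumption control program. When detection using the currently used VAD mode finds that the sound signal contains a voice signal, the program switches the voice chip from the low-power mode to the normal working mode. Optionally, the processor 43 is further configured to keep the voice chip in the low-power mode when detection using the currently used VAD mode finds no voice signal in the sound signal.
The signal path of the sound pickup module 41 differs with the currently used VAD mode, as shown in FIG. 4b:
- Combined software-hardware VAD mode: the sound pickup module 41 sends the collected sound signal to both the hardware VAD module and the processor 43.
- Hardware VAD mode: the sound pickup module 41 sends the collected sound signal to the hardware VAD module. Optionally, if the processor 43 further processes the voice or sound signal after the voice chip enters the normal working mode, the sound pickup module 41 may also send the collected sound signal to the processor 43.
- Software VAD mode: the sound pickup module 41 sends the collected sound signal to the processor 43.
Further optionally, as shown in FIG. 4c, the voice chip 40 also includes a switching module 45 connected between the sound pickup module 41, the hardware VAD module 42, and the processor 43. The switching module 45 connects the sound pickup module 41 to the hardware VAD module 42 and/or the processor 43 according to the currently used VAD mode.
In an optional embodiment, regardless of the currently used VAD mode, whenever the hardware VAD module 42 receives a sound signal from the sound pickup module 41, it detects in hardware whether the sound signal contains a voice signal. It then reports the detection result to the processor 43.
The actions of the processor 43 differ with the currently used VAD mode:
- Combined software-hardware VAD mode: the processor 43 checks the result reported by the hardware VAD module 42. If the hardware VAD module detected a voice signal, the processor detects again, in software, whether the sound signal contains a voice signal. If the software detection also finds one, the processor switches the voice chip 40 from the low-power mode to the normal working mode.
- Hardware VAD mode: if the result reported by the hardware VAD module 42 shows that the module detected a voice signal, the processor 43 switches the voice chip 40 from the low-power mode to the normal working mode.
- Software VAD mode: the processor 43 detects in software whether the sound signal contains a voice signal. If it does, the processor switches the voice chip 40 from the low-power mode to the normal working mode.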
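The routing by switching module 45 and the per-mode decisions of processor 43 can be sketched as follows. Module names are plain string labels, and `ROUTES`, `switch_targets`, and `should_wake` are hypothetical names. The `awake` flag models the optional extra route to the processor after wake-up in the hardware mode.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# Where switching module 45 connects the pickup output for each mode (per FIG. 4b).
ROUTES: Dict[str, FrozenSet[str]] = {
    "hardware": frozenset({"hw_vad"}),
    "software": frozenset({"processor"}),
    "sw+hw": frozenset({"hw_vad", "processor"}),
}


def switch_targets(mode: str, awake: bool) -> FrozenSet[str]:
    """Sinks connected to the pickup module; once awake, the processor may also
    receive the signal for further processing."""
    targets = ROUTES[mode]
    return (targets | {"processor"}) if awake else targets


def should_wake(mode: str, hw_report: bool,
                sw_detect: Callable[[Sequence[float]], bool], frame: Sequence[float]) -> bool:
    """Processor 43's decision for one frame; hw_report is the result reported by module 42."""
    if mode == "hardware":
        return hw_report
    if mode == "software":
        return sw_detect(frame)
    if mode == "sw+hw":
        # the software re-check runs only after the hardware module has reported speech
        return hw_report and sw_detect(frame)
    raise ValueError(f"unknown VAD mode: {mode!r}")
```

Keeping the routing table separate from the decision function mirrors the split between switching module 45 and processor 43. The A/D module's position relative to the switch, which the text leaves open, does not affect this logic.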
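Further optionally, the multiple VAD modes supported by the voice chip 40 can be preconfigured. For example, the voice chip 40 may support the hardware and software VAD modes; the hardware VAD mode and the combined software-hardware VAD mode; the software VAD mode and the combined software-hardware VAD mode; or the hardware, software, and combined software-hardware VAD modes. The VAD mode currently used by the voice chip 40 can also be configured flexibly. To this end, the VAD mode usage policy that the voice chip 40 needs in order to use each VAD mode can be preconfigured. The processor 43 is then further configured to configure the VAD mode currently used by the voice chip according to the preconfigured VAD mode usage policy.
Optionally, when preconfiguring the VAD mode usage policy, the processor 43 configures, according to a policy configuration instruction, the policy the voice chip 40 needs in order to use each VAD mode. This can be done during initialization of the voice chip or during its factory configuration. Alternatively, the processor 43 can reconfigure the policy according to a policy configuration instruction while the voice chip is in use.
In an optional embodiment, when configuring the VAD mode usage policy according to the policy configuration instruction, the processor 43 is specifically configured to perform at least one of the following operations:
configuring, according to the policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip;
configuring, according to the policy configuration instruction, the remaining-charge range corresponding to each VAD mode used by the voice chip;
configuring, according to the policy configuration instruction, the user attributes corresponding to each VAD mode used by the voice chip;
configuring, according to the policy configuration instruction, the upper-layer application state corresponding to each VAD mode used by the voice chip;
configuring, according to the policy configuration instruction, the usage order and usage duration of each VAD mode;
configuring, according to the policy configuration instruction, an identifier for each VAD mode, so that a user can specify the VAD mode used by the voice chip through a VAD configuration instruction.
Further, when configuring the environment information corresponding to each VAD mode, the processor 43 is specifically configured to obtain, according to the policy configuration instruction, information about the environment the voice chip will be in. Based on that information, it configures the environment information corresponding to each VAD mode.
As shown in FIG. 4b, configuration personnel configure the environment information for the voice chip through an app page or web page on a terminal. They then click a configuration control on the page, which sends a policy configuration instruction carrying that environment information to the voice chip 40. The communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43. The processor 43 parses out the environment information and, based on it, configures the environment information corresponding to each VAD mode.
Or:
As shown in FIG. 4b, configuration personnel configure the remaining-charge range corresponding to each VAD mode through an app page or web page on a terminal. They then click a configuration control on the page, which sends a policy configuration instruction carrying these ranges to the voice chip 40. The communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43. The processor 43 parses out the remaining-charge range for each VAD mode and configures the ranges for the voice chip.
Or:
As shown in FIG. 4b, configuration personnel configure the correspondence between the VAD modes and upper-layer application states through an app page or web page on a terminal. They then click a configuration control on the page, which sends a policy configuration instruction carrying this correspondence to the voice chip 40. The communication module 46 in the voice chip 40 receives the policy configuration instruction and reports it to the processor 43. The processor 43 parses out the correspondence and configures the upper-layer application state corresponding to each VAD mode.
Further optionally, when configuring the VAD mode currently used by the voice chip according to the preconfigured VAD mode usage policy, the processor 43 is specifically configured to perform at least one of the following operations:
configuring the currently used VAD mode according to the current environment information of the voice chip;
configuring the currently used VAD mode according to the current remaining charge of the voice chip;
configuring the currently used VAD mode according to the attributes of the current user of the voice chip;
configuring the currently used VAD mode according to the state of the upper-layer application associated with the voice chip;
configuring the currently used VAD mode when the usage duration of the previous VAD mode ends, according to the usage order and usage durations configured in the VAD mode usage policy;
configuring the currently used VAD mode according to the VAD mode identifier carried in a VAD configuration instruction.
The embodiments of the present application do not limit the content of the environment information of the voice chip. It may include, but is not limited to, the location, type, noise level, and noise category of the environment, and time information. These kinds of environment information can be used alone or combined in any way. Accordingly, the processor 43 is specifically configured to configure the currently used VAD mode according to at least one item of the current environment information of the voice chip. The resulting mode differs depending on which information is used.
Optionally, the processor 43 is specifically configured to configure the currently used VAD mode according to the noise level of the current environment. For example, if the noise level exceeds a set noise threshold, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode. If it is less than or equal to the threshold, the mode is set to the hardware VAD mode.
Optionally, the processor 43 is specifically configured to configure the currently used VAD mode according to time information of the current environment. For example, time can be divided into daytime and nighttime periods. When the voice chip is in the daytime period, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode. When it is in the nighttime period, the mode is set to the hardware VAD mode.
Optionally, the processor 43 is specifically configured to configure the currently used VAD mode according to the noise category of the current environment. For example, ambient noise can be divided into human noise and non-human noise. If the noise category includes human noise, the currently used VAD mode is set to the software VAD mode or the combined software-hardware VAD mode. If it does not, the mode is set to the hardware VAD mode.
Further optionally, when configuring the currently used VAD mode according to the current remaining charge of the voice chip, the processor 43 is specifically configured to:
- set the combined software-hardware VAD mode if the remaining charge is greater than or equal to a first charge threshold;
- set the software VAD mode if it is greater than or equal to a second charge threshold and less than the first;
- set the hardware VAD mode if it is less than the second charge threshold.
The first charge threshold is greater than the second.
Further optionally, when configuring the currently used VAD mode according to the attributes of the current user of the voice chip, the processor 43 is specifically configured to act as follows. If the current user belongs to a preset user category or is a preset user, it sets the currently used VAD mode to the software VAD mode or the combined software-hardware VAD mode. If the current user belongs to a non-preset category or is a non-preset user, it sets the mode to the hardware VAD mode.
Further optionally, when configuring the currently used VAD mode according to the attributes of the current user of the voice chip, the processor 43 is specifically configured to:
- set the combined software-hardware VAD mode if the user's volume is below a set first volume threshold;
- set the software VAD mode if it is greater than or equal to the first volume threshold but below a second volume threshold;
- set the hardware VAD mode if it is above the second volume threshold.
The first volume threshold is less than the second.
Further optionally, when configuring the currently used VAD mode according to the state of the upper-layer application associated with the voice chip, the processor 43 is specifically configured to act as follows. If the associated upper-layer application is running, it sets the currently used VAD mode to the software VAD mode or the combined software-hardware VAD mode. If the application is not running, it sets the mode to the hardware VAD mode.
Further optionally, the processor 43 is also configured to dynamically adjust the usage order and/or usage durations of the VAD modes according to the environment information of the voice chip.
Further optionally, when configuring the currently used VAD mode according to the VAD mode identifier carried in a VAD configuration instruction, the processor 43 is specifically configured to parse the VAD mode identifier from the VAD configuration instruction. The identifier denotes the user-specified VAD mode, which is the hardware VAD mode, the software VAD mode, or the combined software-hardware VAD mode. The processor 43 then sets the VAD mode currently used by the voice chip to the specified mode.
Further, as shown in FIG. 4c, the voice chip 40 may also include a communication module 46, an analog-to-digital (A/D) conversion module 47, a digital-to-analog (D/A) conversion module 48, an audio output component 49 (for example, a speaker), a power supply component 491, and so on. The A/D module 47 is connected between the sound pickup module 41 and the hardware VAD module 42. It converts the sound signal collected by the sound pickup module 41 from analog to digital before sending it to the hardware VAD module 42. The relative positions of the A/D module 47 and the switching module 45 are not limited. For example, the A/D module 47 may be located before the switching module 45, as shown in FIG. 4c, or the switching module 45 may be located before the A/D module 47. The D/A module 48 is connected between the audio output component 49 and the processor 43. It converts the digital signal output by the processor 43 into an analog signal and sends it to the audio output component 49 for output.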
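In this embodiment, the voice chip has a VAD function that can detect whether a voice signal is input. When a voice signal is detected, the voice chip switches from the low-power mode to the normal working mode, which saves power. Further, the voice chip has both a hardware VAD function and a software VAD function, and combining the two produces multiple VAD modes. By flexibly configuring the VAD mode the voice chip uses, the accuracy of voice input detection can be improved to a certain extent, the probability of false triggering reduced, and the low-power performance of the voice chip improved.
FIG. 5 is a schematic structural diagram of yet another voice chip according to an exemplary embodiment of the present application. The voice chip has both a hardware VAD function and a software VAD function. For descriptions of these functions, see the foregoing embodiments, which are not repeated here. Combining the two produces multiple VAD modes. The voice chip of this embodiment supports multiple VAD modes, and the VAD mode it uses can be configured flexibly according to the application scenario the voice chip or device is or will be in, time information, and/or user preferences. As shown in FIG. 5, the voice chip 50 includes a sound pickup module 51, a hardware VAD module 52, a main processor 53, a coprocessor 56, and a memory 54. The coprocessor 56 mainly assists the main processor 53 with processing that the main processor cannot perform, or performs inefficiently or poorly. In this embodiment, the coprocessor 56 mainly takes over some work for the main processor 53 while the main processor 53 is in the low-power mode. It is also responsible for waking the main processor 53.
The embodiments of the present application do not limit how the voice chip is applied. For example, it can be applied in low-power or energy-saving devices, including but not limited to battery-powered remote controls, story machines, smart speakers, tablet computers, smartphones, and robot vacuums. As another example, the voice chip can be implemented as a standalone voice device or a standalone application.
The sound pickup module 51 is configured to collect the sound signal input to the voice chip 50 and may be a microphone. The embodiments do not limit the sound signal input to the voice chip 50; it may include, but is not limited to, voice signals, human noise, and ambient noise.
The hardware VAD module 52 is configured to detect, in hardware, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The currently used VAD mode is one of multiple VAD modes produced by combining the hardware VAD function and the software VAD function. For example, it may be the hardware VAD mode, the software VAD mode, or the combined software-hardware VAD mode.
The memory 54 stores a VAD program and a power consumption control program. The coprocessor 56 is configured to execute the VAD program to detect, in software, whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled.
The voice chip 50 of this embodiment supports a low-power scheme. When there is no voice signal, the voice chip 50 is in the low-power mode and the main processor 53 does not work; the main processor 53 enters the normal working mode only when a voice signal appears. Further, the coprocessor 56 is also configured to execute the power consumption control program. When detection using the currently used VAD mode finds that the sound signal contains a voice signal, the program switches the main processor 53 from the low-power mode to the normal working mode. The main processor 53 can then wake up other hardware modules or functions in the voice chip 50. Optionally, the coprocessor 56 is further configured to keep the voice chip 50 in the low-power mode when detection using the currently used VAD mode finds no voice signal in the sound signal.
Compared with the embodiment shown in FIGS. 4a-4c, the main difference of the embodiment shown in FIG. 5 is that the voice chip 50 includes a main processor 53 and a coprocessor 56. The functions of the coprocessor 56 are similar to those of the processor 43 in the embodiment shown in FIGS. 4a-4c. The functions of the other modules, apart from the main processor 53 and the coprocessor 56, are similar to those of the corresponding modules in that embodiment. For details, see the embodiment shown in FIGS. 4a-4c, which is not repeated here.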
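FIG. 6 is a schematic structural diagram of a smart terminal according to an exemplary embodiment of the present application. As shown in FIG. 6, the smart terminal 60 includes a voice chip 65, and the voice chip 65 includes a sound pickup module 61, a hardware VAD module 62, a processor 63, and a memory 64.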
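The memory 64 stores a VAD program and a power consumption control program. The sound pickup module 61 is configured to collect the sound signal input to the voice chip. The hardware VAD module 62 is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The processor 63 is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. Further, the processor 63 is also configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the smart terminal to switch from the low-power mode to the normal working mode. The currently used VAD mode is one of the multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination; for example, it may be a hardware VAD mode, a software VAD mode, or a combined hardware-software VAD mode.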
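The smart terminal shown in FIG. 6 differs from the voice chip shown in FIGS. 4a-4c in product form. The smart terminal shown in FIG. 6 is a device. It may be a battery-powered electronic device such as a remote control, story machine, smart speaker, tablet computer, smartphone, smart alarm clock, smart wristband, smart switch, smart loudspeaker, smart robot, unmanned delivery vehicle, self-service parcel locker, or self-service kiosk. It may also be a device that is not battery-powered, such as an air conditioner, refrigerator, or television. For details of each module in the smart terminal shown in FIG. 6, refer to the descriptions of the corresponding modules in the embodiments shown in FIGS. 4a-4c. Further, as shown in FIG. 6, the smart terminal also includes other components such as a communication component 66, a power supply component 68, and a display 67. Components in dashed boxes are optional rather than mandatory.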
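FIG. 7 is a schematic structural diagram of another smart terminal according to an exemplary embodiment of the present application. As shown in FIG. 7, the smart terminal 70 includes a voice chip 75 and a main processor 73; the voice chip 75 includes a sound pickup module 71, a hardware VAD module 72, a coprocessor 76, and a memory 74; the memory 74 stores a VAD program and a power consumption control program.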
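The sound pickup module 71 is configured to collect the sound signal input to the voice chip. The hardware VAD module 72 is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The coprocessor 76 is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. Further, the coprocessor 76 is also configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the main processor 73 to switch from the low-power mode to the normal working mode. The currently used VAD mode is one of the multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination; for example, it may be a hardware VAD mode, a software VAD mode, or a combined hardware-software VAD mode.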
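The smart terminal shown in FIG. 7 differs from the voice chip shown in FIG. 5 in product form. The smart terminal shown in FIG. 7 is a device. It may be a battery-powered electronic device such as a remote control, story machine, smart speaker, tablet computer, smartphone, smart alarm clock, smart wristband, smart switch, smart loudspeaker, smart robot, unmanned delivery vehicle, self-service parcel locker, or self-service kiosk. It may also be a device that is not battery-powered, such as an air conditioner, refrigerator, or television. For details of each module in the smart terminal shown in FIG. 7, refer to the descriptions of the corresponding modules in the embodiments shown in FIG. 5 and FIGS. 4a-4c. Further, as shown in FIG. 7, the smart terminal also includes other components such as a communication component 77, a power supply component 79, and a display 78. Components in dashed boxes are optional rather than mandatory.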
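In addition to the above smart terminals, an embodiment of the present application further provides a self-service terminal, including a voice chip and a main processor. The voice chip includes a sound pickup module, a hardware VAD module, a coprocessor, and a memory, and the memory stores a VAD program and a power consumption control program. The sound pickup module is configured to collect the sound signal input to the voice chip. The hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled. The coprocessor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled. The coprocessor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the main processor to switch from the low-power mode to the normal working mode. The currently used VAD mode is one of the multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.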
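The self-service terminal provided in this embodiment differs from the voice chip shown in FIG. 5 in product form. It may be a supermarket POS machine, a bank ATM, or a self-service shopping-guide terminal in shopping malls, airports, and similar venues. For details of each module in the self-service terminal, refer to the descriptions of the corresponding modules in the embodiments shown in FIG. 5 and FIGS. 4a-4c.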
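An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the method embodiment shown in FIG. 1.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the method embodiment shown in FIG. 2.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the method embodiment shown in FIG. 3.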
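The memory in the foregoing embodiments is configured to store computer programs and may be configured to store various other data to support operations on the chip or device to which it belongs. Examples of such data include instructions for any application or method operating on that chip or device, messages, pictures, audio, and so on.
The memory in the foregoing embodiments may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The communication component in the foregoing embodiments is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device may access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component in the foregoing embodiments supplies power to the various components of the device in which it is located. The power supply component may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for that device.
The audio output component in the foregoing embodiments may be configured to output audio signals. For example, the audio output component includes a speaker (or loudspeaker) for outputting audio signals.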
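Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by the processor of the computer or other programmable data processing device then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in that computer-readable memory then produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.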
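In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.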
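It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion. A process, method, commodity, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
The above descriptions are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.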

Claims (41)

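  1. A power consumption control method, applicable to a voice chip or device, characterized in that the voice chip or device has a hardware VAD function and a software VAD function; the method comprises:
    collecting a sound signal input to the voice chip or device;
    detecting, using the VAD mode currently used by the voice chip or device, whether the sound signal contains a voice signal;
    if the sound signal contains a voice signal, the voice chip or device switching from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.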
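  2. The method according to claim 1, further comprising:
    if the sound signal does not contain a voice signal, the voice chip or device remaining in the low-power mode.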
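  3. The method according to claim 1, wherein, when the currently used VAD mode is a combined hardware-software VAD mode, detecting, using the currently used VAD mode, whether the sound signal contains a voice signal comprises:
    feeding the sound signal into a hardware VAD module in the voice chip or device to detect in hardware whether the sound signal contains a voice signal;
    if it is detected in hardware that the sound signal contains a voice signal, feeding the sound signal into a processor in the voice chip or device to detect again in software whether the sound signal contains a voice signal;
    if it is detected again in software that the sound signal contains a voice signal, determining that the sound signal contains a voice signal.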
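  4. The method according to any one of claims 1-3, further comprising, before using the VAD mode currently used by the voice chip or device:
    configuring the VAD mode currently used by the voice chip or device according to a preconfigured VAD mode usage policy.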
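  5. The method according to claim 4, wherein preconfiguring the VAD mode usage policy comprises:
    configuring the VAD mode usage policy according to a policy configuration instruction when the voice chip or device is initialized;
    or
    configuring the VAD mode usage policy according to a policy configuration instruction when the voice chip or device undergoes factory configuration.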
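  6. The method according to claim 5, wherein configuring the VAD mode usage policy according to the policy configuration instruction comprises at least one of the following operations:
    configuring, according to the policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the remaining battery level range corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the user attribute corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the upper-layer application state corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the usage order and usage duration of the various VAD modes;
    configuring, according to the policy configuration instruction, identifiers of the various VAD modes so that a user can specify, through a VAD configuration instruction, the VAD mode used by the voice chip or device.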
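  7. The method according to claim 6, wherein configuring, according to the policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip or device comprises:
    obtaining, according to the policy configuration instruction, the environment information of the environment in which the voice chip or device is to be located;
    configuring, according to the environment information of the environment in which the voice chip or device is to be located, the environment information corresponding to each VAD mode used by the voice chip or device.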
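  8. The method according to claim 7, wherein the environment information corresponding to each VAD mode used by the voice chip or device comprises at least one of: environment location, environment type, environmental noise level, environmental noise category, and time information.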
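  9. The method according to claim 4, wherein configuring the VAD mode currently used by the voice chip or device according to the preconfigured VAD mode usage policy comprises at least one of the following:
    configuring the VAD mode currently used by the voice chip or device according to the environment information of the environment in which the voice chip or device is currently located;
    configuring the VAD mode currently used by the voice chip or device according to the current remaining battery level of the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device according to the attribute of the user currently using the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device according to the state of an upper-layer application associated with the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device, when the usage duration of the previous VAD mode ends, according to the usage order and usage durations of the multiple VAD modes configured in the VAD mode usage policy;
    configuring the VAD mode currently used by the voice chip or device according to a VAD mode identifier carried in a VAD configuration instruction.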
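  10. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the environment information of the environment in which the voice chip or device is currently located comprises:
    configuring the VAD mode currently used by the voice chip or device according to at least one of the environment location, environment type, environmental noise level, environmental noise category, and time information of the environment in which the voice chip or device is currently located.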
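  11. The method according to claim 10, wherein configuring the VAD mode currently used by the voice chip or device according to the environmental noise level of the environment in which the voice chip or device is currently located comprises:
    if the environmental noise level is greater than a set noise level threshold, configuring the VAD mode currently used by the voice chip or device as a software VAD mode or a combined hardware-software VAD mode;
    if the environmental noise level is less than or equal to the set noise level threshold, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode.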
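  12. The method according to claim 10, wherein configuring the VAD mode currently used by the voice chip or device according to the time information of the environment in which the voice chip or device is currently located comprises:
    if the time information corresponds to a daytime period, configuring the VAD mode currently used by the voice chip or device as a software VAD mode or a combined hardware-software VAD mode;
    if the time information corresponds to a nighttime period, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode.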
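  13. The method according to claim 10, wherein configuring the VAD mode currently used by the voice chip or device according to the environmental noise category of the environment in which the voice chip or device is currently located comprises:
    if the environmental noise category includes human noise, configuring the VAD mode currently used by the voice chip or device as a software VAD mode or a combined hardware-software VAD mode;
    if the environmental noise category does not include human noise, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode.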
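  14. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the current remaining battery level of the voice chip or device comprises:
    if the current remaining battery level of the voice chip or device is greater than or equal to a first battery threshold, configuring the VAD mode currently used by the voice chip or device as a combined hardware-software VAD mode;
    if the current remaining battery level of the voice chip or device is greater than or equal to a second battery threshold and less than the first battery threshold, configuring the VAD mode currently used by the voice chip or device as a software VAD mode;
    if the remaining battery level of the voice chip or device is less than the second battery threshold, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode;
    wherein the first battery threshold is greater than the second battery threshold.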
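  15. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the attribute of the user currently using the voice chip or device comprises:
    if the user currently using the voice chip or device belongs to a preset user category or is a preset user, configuring the VAD mode currently used by the voice chip or device as a software VAD mode or a combined hardware-software VAD mode;
    if the user currently using the voice chip or device does not belong to the preset user category or is not the preset user, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode.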
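  16. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the attribute of the user currently using the voice chip or device comprises:
    if the volume of the user currently using the voice chip or device is less than a set first volume threshold, configuring the VAD mode currently used by the voice chip or device as a combined hardware-software VAD mode;
    if the volume of the user currently using the voice chip or device is greater than or equal to the first volume threshold but less than a second volume threshold, configuring the VAD mode currently used by the voice chip or device as a software VAD mode;
    if the volume of the user currently using the voice chip or device is greater than the second volume threshold, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode;
    wherein the first volume threshold is less than the second volume threshold.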
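  17. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the state of the upper-layer application associated with the voice chip or device comprises:
    if the upper-layer application associated with the voice chip or device is in a running state, configuring the VAD mode currently used by the voice chip or device as a software VAD mode or a combined hardware-software VAD mode;
    if the upper-layer application associated with the voice chip or device is in a non-running state, configuring the VAD mode currently used by the voice chip or device as a hardware VAD mode.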
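  18. The method according to claim 9, further comprising:
    dynamically adjusting the usage order and/or usage duration of the various VAD modes according to the environment information of the voice chip or device.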
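  19. The method according to claim 9, wherein configuring the VAD mode currently used by the voice chip or device according to the VAD mode identifier carried in the VAD configuration instruction comprises:
    parsing the VAD mode identifier from the VAD configuration instruction, the VAD mode identifier identifying a user-specified VAD mode, and the specified VAD mode being a hardware VAD mode, a software VAD mode, or a combined hardware-software VAD mode;
    configuring the VAD mode currently used by the voice chip or device as the specified VAD mode.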
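  20. A VAD mode configuration method, applicable to a voice chip or device, characterized in that the voice chip or device has a hardware VAD function and a software VAD function, and the method comprises:
    receiving a policy configuration instruction;
    configuring, according to the policy configuration instruction, a VAD mode usage policy required for the voice chip or device to use multiple VAD modes;
    wherein the multiple VAD modes are produced by using the hardware VAD function and the software VAD function in combination.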
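  21. The method according to claim 20, wherein configuring, according to the policy configuration instruction, the VAD mode usage policy required for the voice chip or device to use multiple VAD modes comprises:
    configuring, according to the policy configuration instruction, the VAD mode usage policy required for the voice chip or device to use multiple VAD modes when the voice chip or device is initialized;
    or
    configuring, according to the policy configuration instruction, the VAD mode usage policy required for the voice chip or device to use multiple VAD modes when the voice chip or device undergoes factory configuration.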
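  22. The method according to claim 20 or 21, wherein configuring, according to the policy configuration instruction, the VAD mode usage policy required for the voice chip or device to use multiple VAD modes comprises at least one of the following operations:
    configuring, according to the policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the remaining battery level range corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the user attribute corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the upper-layer application state corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the usage order and usage duration of the various VAD modes;
    configuring, according to the policy configuration instruction, identifiers of the various VAD modes so that a user can specify, through a VAD configuration instruction, the VAD mode used by the voice chip or device.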
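  23. The method according to claim 22, wherein configuring, according to the policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip or device comprises:
    obtaining, according to the policy configuration instruction, the environment information of the environment in which the voice chip or device is to be located;
    configuring, according to the environment information of the environment in which the voice chip or device is to be located, the environment information corresponding to each VAD mode used by the voice chip or device.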
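  24. The method according to claim 23, wherein obtaining, according to the policy configuration instruction, the environment information of the environment in which the voice chip or device is to be located comprises:
    parsing, from the policy configuration instruction, the environment information of the environment in which the voice chip or device is to be located;
    or
    collecting, according to the policy configuration instruction, the environment information of the environment in which the voice chip or device is to be located.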
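  25. The method according to claim 24, wherein receiving the policy configuration instruction comprises:
    receiving the policy configuration instruction sent by a configuration terminal, the policy configuration instruction being generated by the configuration terminal from the environment information, entered by configuration personnel on a configuration interface, of the environment in which the voice chip or device is to be located.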
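  26. A voice endpoint detection method, applicable to a voice chip or device, characterized in that the voice chip or device has a hardware VAD function and a software VAD function, and the method comprises:
    collecting a sound signal input to the voice chip or device;
    performing VAD processing on the sound signal using the VAD mode currently used by the voice chip or device;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.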
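  27. A voice chip, characterized by comprising: a sound pickup module, a hardware VAD module, a processor, and a memory; the memory stores a VAD program and a power consumption control program;
    the sound pickup module is configured to collect a sound signal input to the voice chip;
    the hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
    the processor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
    the processor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the voice chip to switch from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.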
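  28. The voice chip according to claim 27, wherein the processor is further configured to:
    when the currently used VAD mode detects that the sound signal does not contain a voice signal, keep the voice chip in the low-power mode.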
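  29. The voice chip according to claim 27, wherein
    the sound pickup module is specifically configured to: feed the collected sound signal into both the hardware VAD module and the processor when the currently used VAD mode is a combined hardware-software VAD mode; or feed the collected sound signal into the hardware VAD module when the currently used VAD mode is a hardware VAD mode; or feed the collected sound signal into the processor when the currently used VAD mode is a software VAD mode.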
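  30. The voice chip according to claim 29, further comprising a switching module, the switching module being connected between the sound pickup module, the hardware VAD module, and the processor;
    the switching module is configured to switch, according to the currently used VAD mode, the sound pickup module into connection with the hardware VAD module and/or the processor.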
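  31. The voice chip according to claim 29, wherein the processor is specifically configured to:
    when the currently used VAD mode is a combined hardware-software VAD mode, if it is determined from the detection result reported by the hardware VAD module that the hardware VAD module has detected a voice signal in the sound signal, detect again in software whether the sound signal contains a voice signal, and, when it is detected again in software that the sound signal contains a voice signal, control the voice chip to switch from the low-power mode to the normal working mode; or,
    when the currently used VAD mode is a hardware VAD mode, if it is determined from the detection result reported by the hardware VAD module that the hardware VAD module has detected a voice signal in the sound signal, control the voice chip to switch from the low-power mode to the normal working mode; or,
    when the currently used VAD mode is a software VAD mode, detect in software whether the sound signal contains a voice signal, and, when it is detected in software that the sound signal contains a voice signal, control the voice chip to switch from the low-power mode to the normal working mode.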
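  32. The voice chip according to any one of claims 27-31, wherein the processor is further configured to:
    configure the VAD mode currently used by the voice chip according to a preconfigured VAD mode usage policy.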
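  33. The voice chip according to claim 32, wherein, when preconfiguring the VAD mode usage policy, the processor is specifically configured to perform at least one of the following operations:
    configuring, according to a policy configuration instruction, the environment information corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the remaining battery level range corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the user attribute corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the upper-layer application state corresponding to each VAD mode used by the voice chip or device;
    configuring, according to the policy configuration instruction, the usage order and usage duration of the various VAD modes;
    configuring, according to the policy configuration instruction, the identifiers of the various VAD modes in a VAD configuration instruction.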
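  34. The voice chip according to claim 32, wherein, when configuring the VAD mode currently used by the voice chip, the processor is specifically configured to perform at least one of the following operations:
    configuring the VAD mode currently used by the voice chip or device according to the environment information of the environment in which the voice chip or device is currently located;
    configuring the VAD mode currently used by the voice chip or device according to the current remaining battery level of the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device according to the attribute of the user currently using the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device according to the state of an upper-layer application associated with the voice chip or device;
    configuring the VAD mode currently used by the voice chip or device, when the usage duration of the previous VAD mode ends, according to the usage order and usage durations of the multiple VAD modes configured in the VAD mode usage policy;
    configuring the VAD mode currently used by the voice chip or device according to a VAD mode identifier carried in a VAD configuration instruction.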
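  35. A voice chip, characterized by comprising: a sound pickup module, a hardware VAD module, a main processor, a coprocessor, and a memory; the memory stores a VAD program and a power consumption control program;
    the sound pickup module is configured to collect a sound signal input to the voice chip;
    the hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
    the coprocessor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
    the coprocessor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the main processor to switch from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.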
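  36. A smart terminal, characterized by comprising a voice chip, the voice chip comprising a sound pickup module, a hardware VAD module, a processor, and a memory; the memory stores a VAD program and a power consumption control program;
    the sound pickup module is configured to collect a sound signal input to the voice chip;
    the hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
    the processor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
    the processor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the smart terminal to switch from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.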
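  37. The smart terminal according to claim 36, wherein the smart terminal is a smart alarm clock, a smart wristband, a smart switch, a smart loudspeaker, a smart speaker, a smartphone, a smart robot, an unmanned delivery vehicle, a self-service parcel locker, or a self-service kiosk.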
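  38. A smart terminal, characterized by comprising a voice chip and a main processor, the voice chip comprising a sound pickup module, a hardware VAD module, a coprocessor, and a memory; the memory stores a VAD program and a power consumption control program;
    the sound pickup module is configured to collect a sound signal input to the voice chip;
    the hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
    the coprocessor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
    the coprocessor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the main processor to switch from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.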
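  39. The smart terminal according to claim 38, wherein the smart terminal is a smart alarm clock, a smart wristband, a smart switch, a smart loudspeaker, a smart speaker, a smartphone, a smart robot, an unmanned delivery vehicle, a self-service parcel locker, or a self-service kiosk.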
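  40. A self-service terminal, characterized by comprising a voice chip and a main processor; the voice chip comprises a sound pickup module, a hardware VAD module, a coprocessor, and a memory; the memory stores a VAD program and a power consumption control program;
    the sound pickup module is configured to collect a sound signal input to the voice chip;
    the hardware VAD module is configured to detect in hardware whether the sound signal contains a voice signal when the currently used VAD mode indicates that the hardware VAD function of the voice chip is enabled;
    the coprocessor is configured to execute the VAD program to detect in software whether the sound signal contains a voice signal when the currently used VAD mode indicates that the software VAD function of the voice chip is enabled;
    the coprocessor is further configured to execute the power consumption control program to: when the currently used VAD mode detects that the sound signal contains a voice signal, control the main processor to switch from a low-power mode to a normal working mode;
    wherein the currently used VAD mode is one of multiple VAD modes produced by using the hardware VAD function and the software VAD function in combination.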
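  41. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the processor is caused to implement the steps of the method according to any one of claims 1-26.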
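The threshold rules in claims 11, 14 and 16 map a measured quantity onto one of the three modes. A minimal Python sketch of those three rules follows. The units (battery in percent, volume and noise in dB) and all function names are assumptions. Claim 11 allows either the software mode or the combined mode above the noise threshold, so the sketch makes that choice a parameter. Claim 16 says "greater than" the second volume threshold and leaves the case of equality unstated; the sketch assigns equality to the hardware mode by assumption.

```python
def mode_for_battery(level, first_threshold, second_threshold):
    """Claim 14: tiered choice by remaining battery level (first > second)."""
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if level >= first_threshold:
        return "combined"
    if level >= second_threshold:
        return "software"
    return "hardware"


def mode_for_volume(volume, first_threshold, second_threshold):
    """Claim 16: quieter users get the more sensitive modes (first < second)."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be less than second_threshold")
    if volume < first_threshold:
        return "combined"
    if volume < second_threshold:
        return "software"
    # Claim 16 says "greater than" the second threshold; equality is assigned here by assumption.
    return "hardware"


def mode_for_noise(noise_level, noise_threshold, noisy_mode="combined"):
    """Claim 11: software or combined above the threshold, hardware otherwise."""
    return noisy_mode if noise_level > noise_threshold else "hardware"


print(mode_for_battery(45, 60, 20))  # software
print(mode_for_volume(35, 40, 70))   # combined
```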
PCT/CN2021/080172 2020-03-13 2021-03-11 Power consumption control, mode configuration and VAD methods, devices, and storage medium WO2021180162A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010176807.6 2020-03-13
CN202010176807.6A CN113393865B (zh) 2020-03-13 Power consumption control, mode configuration and VAD methods, devices, and storage medium

Publications (1)

Publication Number Publication Date
WO2021180162A1 (zh) 2021-09-16

Family

ID=77616161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080172 WO2021180162A1 (zh) 2020-03-13 2021-03-11 Power consumption control, mode configuration and VAD methods, devices, and storage medium

Country Status (2)

Country Link
CN (1) CN113393865B (zh)
WO (1) WO2021180162A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
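CN114531441A (zh) * 2022-01-11 2022-05-24 Nanjing Bolian Intelligent Technology Co., Ltd. Method and system for converting the form of a multifunctional smart panel based on dynamic configuration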

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114512127B (zh) * 2022-01-29 2023-12-26 Shenzhen Jiutian Ruixin Technology Co., Ltd. Voice control method, apparatus, device, medium, and intelligent voice acquisition system

Citations (5)

Publication number Priority date Publication date Assignee Title
US20140122078A1 (en) * 2012-11-01 2014-05-01 3iLogic-Designs Private Limited Low Power Mechanism for Keyword Based Hands-Free Wake Up in Always ON-Domain
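CN105224074A (zh) * 2015-08-31 2016-01-06 Lenovo (Beijing) Co., Ltd. Control method and electronic device
CN106531165A (zh) * 2016-12-15 2017-03-22 Beijing Saibin Technology Co., Ltd. Portable smart home voice control system and control method
CN108986822A (zh) * 2018-08-31 2018-12-11 Mobvoi Information Technology Co., Ltd. Speech recognition method and apparatus, electronic device, and non-transitory computer storage medium
CN110660413A (zh) * 2018-06-28 2020-01-07 Nuvoton Technology Corporation Voice activity detection system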

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US8626498B2 (en) * 2010-02-24 2014-01-07 Qualcomm Incorporated Voice activity detection based on plural voice activity detectors
US9293131B2 (en) * 2010-08-10 2016-03-22 Nec Corporation Voice activity segmentation device, voice activity segmentation method, and voice activity segmentation program
US9142215B2 (en) * 2012-06-15 2015-09-22 Cypress Semiconductor Corporation Power-efficient voice activation
CN103065629A (zh) * 2012-11-20 2013-04-24 Guangdong University of Technology Speech recognition system for a humanoid robot
US10748529B1 (en) * 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US9854526B2 (en) * 2015-01-28 2017-12-26 Qualcomm Incorporated Sensor activated power reduction in voice activated mobile platform
EP3185244B1 (en) * 2015-12-22 2019-02-20 Nxp B.V. Voice activation system
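CN108665889B (zh) * 2018-04-20 2021-09-28 Baidu Online Network Technology (Beijing) Co., Ltd. Speech signal endpoint detection method, apparatus, device, and storage medium
CN110858488A (zh) * 2018-08-24 2020-03-03 Alibaba Group Holding Limited Voice activity detection method, apparatus, device, and storage medium
CN109473123B (zh) * 2018-12-05 2022-05-31 Baidu Online Network Technology (Beijing) Co., Ltd. Voice activity detection method and apparatus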

Cited By (2)

Publication number Priority date Publication date Assignee Title
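CN114531441A (zh) * 2022-01-11 2022-05-24 Nanjing Bolian Intelligent Technology Co., Ltd. Method and system for converting the form of a multifunctional smart panel based on dynamic configuration
CN114531441B (zh) * 2022-01-11 2024-03-12 Nanjing Bolian Intelligent Technology Co., Ltd. Method and system for converting the form of a multifunctional smart panel based on dynamic configuration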

Also Published As

Publication number Publication date
CN113393865A (zh) 2021-09-14
CN113393865B (zh) 2022-06-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21767922

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21767922

Country of ref document: EP

Kind code of ref document: A1