WO2023020076A1 - 设备的唤醒方法 - Google Patents

设备的唤醒方法 (Method for waking up a device)

Info

Publication number
WO2023020076A1
WO2023020076A1 (PCT/CN2022/097202, CN2022097202W)
Authority
WO
WIPO (PCT)
Prior art keywords
wake
angle
voice
detection information
smart devices
Prior art date
Application number
PCT/CN2022/097202
Other languages
English (en)
French (fr)
Inventor
郝斌
Original Assignee
青岛海尔科技有限公司
海尔智家股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 青岛海尔科技有限公司, 海尔智家股份有限公司 filed Critical 青岛海尔科技有限公司
Publication of WO2023020076A1 publication Critical patent/WO2023020076A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/28 - Constructional details of speech recognition systems
    • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Definitions

  • the present application relates to the technical field of control, and in particular to a method for waking up a device.
  • An embodiment of the present application provides a method for waking up a device, which is used to solve the problem that multiple devices are woken up at the same time or cannot be woken up, and improve user experience.
  • the embodiment of the present application provides a method for waking up a device, which is applied to a target device, and the method includes:
  • the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include the target device;
  • receiving the wake-up instruction sent by the server, and controlling the target device to wake up according to the wake-up instruction.
  • the embodiment of the present application provides a method for waking up a device, which is applied to a server, and the method includes:
  • an embodiment of the present application provides a device wake-up device, which is applied to a target device, and the device includes: an acquisition module, a reception module, and a determination module;
  • an acquisition module configured to acquire an identifier of the control strategy
  • the receiving module is configured to receive the wake-up voice input by the user, the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include the target device;
  • the determination module is configured to determine the detection information according to the identification and the wake-up voice, and send the detection information to the server;
  • the receiving module is further configured to receive the wake-up instruction sent by the server, and control the target device to wake up according to the wake-up instruction.
  • the embodiment of the present application provides a device wake-up device, which is applied to a server, and the device includes: a receiving module, a determining module, and a sending module; wherein,
  • the receiving module is configured to receive detection information sent by multiple smart devices
  • a determining module configured to determine the target device among the multiple smart devices according to the detection information sent by the multiple smart devices
  • the sending module is configured to send a wake-up instruction to the target device.
  • the embodiment of the present application provides a smart device, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method according to any one of the first aspect.
  • the embodiment of the present application provides a server, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method according to any one of the second aspect.
  • the embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the method according to any one of the first aspect is implemented.
  • the embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the method according to any one of the second aspect is implemented.
  • an embodiment of the present application provides a computer program product, including a computer program, and when the computer program is executed by a processor, the method according to any one of the first aspect is implemented.
  • an embodiment of the present application provides a computer program product, including a computer program, and when the computer program is executed by a processor, the method according to any one of the second aspect is implemented.
  • An embodiment of the present application provides a method for waking up a device.
  • the method includes: acquiring an identification of a control strategy; receiving a wake-up voice input by the user, where the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include the target device;
  • determining the detection information according to the identification and the wake-up voice, and sending the detection information to the server; and receiving the wake-up instruction sent by the server, and controlling the target device to wake up according to the wake-up instruction.
  • In this method, the device determines the detection information according to the identification and the wake-up voice, and sends the detection information to the server.
  • The server determines the target device among the multiple smart devices according to the detection information sent by the multiple smart devices, and sends a wake-up instruction only to the target device, so that only the target device is controlled to wake up; that is, only the target device is woken up by the wake-up voice, which solves the problem that multiple devices are woken up at the same time or none of them can be woken up, and improves the user experience.
  • FIG. 1 is a schematic diagram of an application scenario of a method for waking up a device provided in an embodiment of the present application
  • FIG. 2 is a flow chart 1 of a method for waking up a device provided in an embodiment of the present application
  • FIG. 3 is the second flow chart of the device wake-up method provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of a setting interface provided by an embodiment of the present application.
  • FIG. 5 shows application scenario 1 provided by an embodiment of the present application.
  • FIG. 6 shows application scenario 2 provided by an embodiment of the present application.
  • FIG. 7 shows application scenario 2 provided by an embodiment of the present application.
  • FIG. 8 is a third flowchart of a method for waking up a device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram 1 of a wake-up device provided by an embodiment of the present application.
  • FIG. 10 is a second schematic diagram of a device wake-up device provided by the embodiment of the present application.
  • FIG. 11 is a schematic diagram of the hardware structure of the smart device provided by the embodiment of the present application.
  • FIG. 12 is a schematic diagram of a hardware structure of a server provided by an embodiment of the present application.
  • In the related art, when the wake-up words of multiple devices are the same, multiple devices are woken up at the same time or none of them can be woken up, resulting in a poor user experience.
  • To solve this problem, the inventor conceived of setting a control strategy: under this control strategy, the target device to be woken up is determined according to the wake-up voice input by the user, and the target device is then controlled to wake up, so as to solve the problem that multiple devices are woken up at the same time or none of them can be woken up, and to improve the user experience.
  • FIG. 1 is a schematic diagram of an application scenario of a method for waking up a device provided in an embodiment of the present application.
  • the application scenario includes: multiple devices, servers and users.
  • multiple devices include device 1 , device 2 , and device 3 .
  • the wake-up words of the multiple devices are the same, for example, all of them are "Xiao U Xiao U".
  • Users can set a control strategy for the multiple devices. After the user sets the control strategy, when the user calls "Xiao U Xiao U", all of the devices can receive the wake-up voice "Xiao U Xiao U", determine the detection information according to the set control strategy and the wake-up voice, and then send the detection information to the server.
  • After receiving the detection information, the server determines the target device to be woken up among the multiple devices according to the detection information, and then sends a wake-up instruction to the target device, so that the target device is woken up according to the wake-up instruction.
  • the control strategy set by the user enables the server to send a wake-up instruction only to the target device, so that only the target device is controlled to wake up, which solves the problem that multiple devices are woken up at the same time or none of them can be woken up, and improves the user experience.
  • FIG. 2 is a first flowchart of a method for waking up a device provided in an embodiment of the present application. As shown in Figure 2, the method includes:
  • the target device acquires the identifier of the control policy.
  • the target device may be a smart device such as a TV, a speaker or a refrigerator, or a smart device with a screen such as a TV or a refrigerator.
  • the identification can be obtained according to the voice instruction input by the user, or can be obtained according to the setting information sent by the control device.
  • the control device can be a smart phone, a tablet computer, etc. installed with an application program, or a controller specially designed for an application scenario of the Internet of Things, etc.
  • When the identification is obtained according to the voice instruction or the setting information, the voice instruction or setting information includes the identification, or includes mapping information corresponding to the identification. It should be noted that, for the method of obtaining the identification according to the setting information, refer to the embodiment in FIG. 4 .
  • When the voice instruction or the setting information includes the mapping information, a mapping list is pre-stored in the target device, and the mapping list includes multiple identifiers and the mapping information corresponding to each identifier; after receiving the mapping information, the target device searches the mapping list according to the mapping information to obtain the identification.
  • each identifier indicates at least one type of information included in the detection information.
  • the target device receives a wake-up voice input by the user, the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include the target device.
  • the preset range can be the range covered by the same wireless local area network, and the multiple smart devices are the smart devices located in that wireless local area network.
  • the wake-up words may all be "Xiao Bing Xiao Bing", or "Xiao Xi Xiao Xi” and so on.
  • the wake word of each device can be customized by the user through the application program.
  • the target device determines detection information according to the identification and the wake-up voice.
  • the detection information includes at least one of the following information: the energy of the wake-up speech; the angle of the sound source of the wake-up speech within a preset angle range in front of the target device; or the change information of the sound source.
  • each piece of detection information may also include a device identifier, which indicates to the server which device sent the detection information.
  • the preset angle range is 0-180 degrees from left to right in front of the device.
  • For example, when the identification is "1", it indicates that the detection information includes the energy; when it is "2", it indicates that the detection information includes the energy and the angle; when it is "3", it indicates that the detection information includes the energy, the angle, and the change information.
  • For example, when the mapping information or the voice command includes "Voice Smart Mode 1", it indicates that the detection information includes the energy; when the mapping information or the voice command includes "Voice Smart Mode 2", it indicates that the detection information includes the energy and the angle; when the mapping information or the voice command includes "Voice Smart Mode 3", it indicates that the detection information includes the energy, the angle, and the change information.
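  • As an illustration only, the following Python sketch shows how a device might gate which detection fields it reports based on such an identifier; the dictionary keys follow the "1"/"2"/"3" example above, while the field names, device IDs and numeric values are hypothetical.

```python
# Hypothetical sketch: which detection fields a control-strategy identifier
# selects, following the "1"/"2"/"3" example above.  Field names are
# illustrative and not taken from the application.
DETECTION_FIELDS_BY_ID = {
    "1": ("energy",),
    "2": ("energy", "angle"),
    "3": ("energy", "angle", "change_info"),
}

def build_detection_info(identifier: str, measurements: dict, device_id: str) -> dict:
    """Keep only the measured fields that the identifier asks for."""
    wanted = DETECTION_FIELDS_BY_ID.get(identifier, ("energy",))
    info = {"device_id": device_id}  # lets the server know which device reported
    info.update({k: v for k, v in measurements.items() if k in wanted})
    return info

# Example: identifier "2" keeps energy and angle, drops change information.
print(build_detection_info("2", {"energy": 0.8, "angle": 95.0, "change_info": 4.0}, "tv-01"))
```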
  • the target device sends detection information to the server.
  • the server receives detection information sent by multiple smart devices.
  • Each of the multiple smart devices sends detection information to the server.
  • the method for each smart device to send the detection information to the server is similar to the methods in S201-S204, and will not be repeated here.
  • the server determines the target device among the multiple smart devices according to the detection information sent by the multiple smart devices.
  • For example, when the detection information includes only the energy, the server determines the smart device with the largest energy among the multiple smart devices as the target device.
  • In this case, the corresponding application scenario may be as shown in FIG. 5 .
  • For example, when the detection information includes the energy and the angle, the server determines the smart device with the largest wake-up score among the multiple smart devices as the target device; the corresponding application scenario may be as shown in FIG. 6 .
  • Here, the wake-up score is equal to the sum of the product of the energy and the weight corresponding to the energy and the product of the angle and the weight corresponding to the angle.
  • For example, when the detection information includes the energy, the angle, and the change information of the sound source, the server determines the smart device with the largest wake-up score among the multiple smart devices as the target device; the corresponding application scenario may be as shown in FIG. 7 .
  • Here, the wake-up score is equal to the sum of the product of the energy and the weight corresponding to the energy, the product of the angle and the weight corresponding to the angle, and the product of the change information and the weight corresponding to the change information.
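  • A minimal Python sketch of this score-based selection is shown below. The weights are placeholders chosen for illustration; the application states that the score is a weighted combination of the reported quantities but gives no weight values.

```python
# Hedged sketch of the weighted wake-up score: the weights below are
# illustrative assumptions, not values from the application.
def wakeup_score(energy, angle=None, change=None,
                 w_energy=1.0, w_angle=0.5, w_change=0.5):
    score = energy * w_energy
    if angle is not None:
        score += angle * w_angle
    if change is not None:
        score += change * w_change
    return score

def pick_target(reports):
    """reports: dicts with 'device_id', 'energy' and optional 'angle'/'change_info'."""
    best = max(reports, key=lambda r: wakeup_score(r["energy"],
                                                   r.get("angle"),
                                                   r.get("change_info")))
    return best["device_id"]

reports = [
    {"device_id": "speaker-1", "energy": 0.6, "angle": 30.0, "change_info": 12.0},
    {"device_id": "tv-1", "energy": 0.5, "angle": 90.0, "change_info": 3.0},
]
print(pick_target(reports))  # the device with the largest weighted score wins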
  • the server sends a wake-up instruction to the target device.
  • the wake-up indication is used to instruct the target device to wake up.
  • the target device wakes up according to the wakeup instruction.
  • For example, when the target device is an air conditioner and is woken up, the user can adjust the temperature, fan strength, and air direction of the air conditioner by voice.
  • In the device wake-up method provided in the embodiment of FIG. 2, the device determines the detection information according to the identification and the wake-up voice and sends the detection information to the server; the server determines the target device among the multiple smart devices according to the detection information sent by the multiple smart devices and sends a wake-up instruction only to the target device, so that only the target device is controlled to wake up; that is, only the target device is woken up by the wake-up voice, which solves the problem that multiple devices are woken up at the same time or none of them can be woken up, and improves the user experience.
  • Taking the target device as the execution subject as an example, the process of the wake-up method executed on the target device side is described below. For details, refer to the embodiment in FIG. 3 .
  • FIG. 3 is a second flowchart of a method for waking up a device provided in an embodiment of the present application. As shown in Figure 3, the method includes:
  • the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include a target device.
  • It should be noted that the target device is provided with a microphone array, the microphone array includes at least one microphone, and each microphone can receive the wake-up voice; the target device may perform the method in S303 only on the wake-up voice received by any one of the at least one microphone.
  • the wake-up voice is an analog signal, and the voice sequence is a digital signal.
  • the wake-up speech is sampled by using a preset sampling frequency to obtain a speech sequence.
  • the preset sampling frequency may be 16000, or other values, and the preset sampling frequency is not limited here.
  • S304 Perform segmentation processing on the speech sequence to obtain multiple speech subsequences.
  • the speech sequence is segmented according to the preset data length, and the length of each speech subsequence may be equal to the preset data length.
  • the preset data length may be 512, or other values, and the preset data length is not limited here.
  • Optionally, frequency-domain transformation is performed on each of the multiple speech subsequences according to a preset transform length to obtain the multiple frequency-domain subsequences. For example, the preset transform length may be 257, or other values; the preset transform length is not limited here.
  • an average value of sums of amplitudes corresponding to each frequency point within a preset frequency range in the frequency domain subsequence may be determined as the energy of the frequency domain subsequence.
  • For details, please refer to Formula 1: E(n) = (1/m) · Σ_{f=f_1}^{f_m} |X_n(f)|.
  • where E(n) is the energy of the n-th frequency-domain subsequence,
  • X_n(f) is the n-th frequency-domain subsequence,
  • f is a frequency point, and
  • f_1 ~ f_m is the preset frequency range.
  • the sum of amplitudes corresponding to each frequency point within a preset frequency range in the frequency domain subsequence may be determined as the energy of the frequency domain subsequence.
  • For details, please refer to Formula 2: E(n) = Σ_{f=f_1}^{f_m} |X_n(f)|, i.e., the sum of the amplitudes corresponding to each frequency point within the preset frequency range in the frequency-domain subsequence.
  • S307. Determine an average value of energies corresponding to each of the multiple frequency domain subsequences as the energy of the wake-up speech.
  • an average value of energies corresponding to the partial frequency-domain subsequences may be determined as the energy of the wake-up speech.
  • For example, the energy of the wake-up speech can be obtained as the following Formula 3: E = (1/(b - a + 1)) · Σ_{n=a}^{b} E(n).
  • E is the energy of the wake-up speech, and part of the frequency-domain subsequences are the a-th frequency-domain sub-sequence to the b-th frequency-domain sub-sequence.
  • an average value of energies corresponding to all frequency-domain subsequences may be determined as the energy of the wake-up speech.
  • For example, the energy of the wake-up speech can be obtained as the following Formula 4: E = (1/N) · Σ_{n=1}^{N} E(n), where N is the total number of frequency-domain subsequences.
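  • The whole S303-S307 energy pipeline can be summarised in a few lines of NumPy, as in the sketch below. It assumes the wake-up voice is already a digitised 1-D sample array; the 512-sample frame length matches the example data length above (a 512-point real FFT yields the 257-bin transform length also mentioned above), while the frequency band limits are placeholders, since the application does not give a concrete preset frequency range.

```python
import numpy as np

def wake_voice_energy(samples, fs=16000, frame_len=512, band=(300.0, 3400.0)):
    """Sketch of S303-S307: segment, transform, band energy per frame, average.

    band is the preset frequency range (f1..fm); its limits here are assumptions.
    """
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return 0.0
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)   # S304
    spectra = np.fft.rfft(frames, axis=1)                                  # S305 (257 bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    frame_energy = np.abs(spectra[:, in_band]).mean(axis=1)   # Formula 1, per subsequence (S306)
    return float(frame_energy.mean())                         # Formula 4 (S307)

# Example on one second of synthetic audio sampled at 16 kHz:
rng = np.random.default_rng(0)
print(wake_voice_energy(rng.standard_normal(16000)))
```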
  • It should be noted that in S308 the wake-up voice includes the wake-up voice received by each microphone in the microphone array.
  • the obtained search function can be the following Formula 5: Aml(θ) = Σ_f PA(θ, f)^H · Rxx(f) · PA(θ, f),
  • where Aml(θ) is the search function,
  • θ is the angle variable,
  • PA(θ, f) is the direction vector (steering vector), and
  • Rxx(f) is the covariance matrix.
  • Here, Rxx(f) = X(f) · X*(f),
  • where X(f) is the frequency-domain signal corresponding to the wake-up voice collected by each of the multiple microphones included in the microphone array on the target device,
  • and X*(f) is the conjugate matrix of X(f).
  • d is the distance between the microphones,
  • and c is the speed of light.
  • Specifically, the search function can be optimized and searched through a particle swarm optimization algorithm to obtain the angle corresponding to the maximum function value of the search function.
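  • The sketch below illustrates the idea of S308-S310 for a uniform linear microphone array. It is a hedged approximation rather than the application's exact procedure: the objective is a conventional steered-response form built from the steering vector PA(θ, f) and the covariance Rxx(f) = X(f) · X*(f), the optimisation is a plain grid scan standing in for the swarm-based search, the array geometry and spacing are assumed, and the propagation speed is taken as the speed of sound (343 m/s), the usual choice for acoustic arrays.

```python
import numpy as np

def doa_angle(snapshots, freqs, mic_spacing=0.035, c=343.0, step_deg=1.0):
    """Hedged sketch of S308-S310 for a uniform linear array.

    snapshots: complex array (n_mics, n_freqs), one frequency-domain snapshot
    of the wake-up voice per microphone.  The geometry, the grid search and
    the objective are assumptions standing in for the application's search
    function and swarm optimisation.
    """
    n_mics = snapshots.shape[0]
    # Rxx(f) = X(f) X*(f), stacked per frequency bin.
    rxx = np.einsum('mf,nf->fmn', snapshots, snapshots.conj())
    best_angle, best_val = 0.0, -np.inf
    for angle_deg in np.arange(0.0, 180.0 + step_deg, step_deg):
        theta = np.deg2rad(angle_deg)
        delays = np.arange(n_mics)[:, None] * mic_spacing * np.cos(theta) / c
        steer = np.exp(-2j * np.pi * freqs[None, :] * delays)      # PA(theta, f)
        val = np.real(np.einsum('mf,fmn,nf->', steer.conj(), rxx, steer))
        if val > best_val:
            best_angle, best_val = angle_deg, val
    return best_angle   # angle (degrees) at which the search function peaks

# Example: two microphones, a handful of synthetic frequency bins.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))
print(doa_angle(X, np.linspace(300.0, 3400.0, 8)))
```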
  • S311 Determine a first angle corresponding to the first audio frame of the wake-up voice, and a second angle corresponding to the last audio frame.
  • the method for determining the first angle and the second angle is the same as the method for S306 to S307, and will not be repeated here.
  • S312. Determine change information of the sound source according to the first angle and the second angle.
  • the difference between the first angle and the second angle may be determined as the change information of the sound source.
  • Optionally, the ratio of the difference between the first angle and the second angle to the duration of the wake-up speech may also be determined as the change information of the sound source.
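  • Both readings of the change information reduce to a couple of lines, as in the hypothetical sketch below: the plain difference between the first-frame and last-frame angles, or that difference divided by the utterance duration.

```python
# Sketch of S311-S312: change information of the sound source from the
# first-frame and last-frame angles.  Treating the second option as a rate
# of change (difference over duration) is an assumption.
def sound_source_change(first_angle_deg, last_angle_deg, duration_s=None):
    delta = abs(last_angle_deg - first_angle_deg)
    return delta / duration_s if duration_s else delta

print(sound_source_change(80.0, 95.0))        # plain angle difference
print(sound_source_change(80.0, 95.0, 1.2))   # degrees per second
```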
  • the identification indicates that the detection information includes energy, angle, and sound source change information, so that the target device sequentially determines the energy, angle, and sound source change information, and sends the detection information to the server (including the determined energy, angle, and change information of the sound source) to improve the accuracy of the server in determining the target device.
  • FIG. 4 is a schematic diagram of a setting interface provided by an embodiment of the present application.
  • the control device may display a setting interface, and the setting interface includes: multiple controls of the control strategy.
  • Each control has a corresponding name, for example, the name corresponding to control 1 is control strategy 1, and for example, the name corresponding to control 2 is control strategy 2.
  • the setting interface may also include a description corresponding to each control; the description is used to let the user understand how the server determines the target device. For example, the description corresponding to control 1 is "decision based on energy".
  • In a specific application, the identification corresponding to each control is pre-stored in the control device.
  • When the user taps one of the controls, the control device looks up the identification corresponding to that control, and sends setting information to the target device according to the identification.
  • each smart device includes two judgment modules: an energy judgment module and a direction of arrival (DOA) judgment module.
  • the energy judgment module can be used to execute the above S303-S307
  • the DOA judgment module can be used to execute the above-mentioned S308-S312.
  • the identification in this application can control the switch of the energy judgment module and/or the DOA judgment module. For example, if the identification indicates that the detection information includes energy, the energy judging module is controlled to be turned on. For example, if the identification indicates that the detection information includes energy, angle and change information, then the energy judging module and the DOA judging module are controlled to be turned on at the same time.
  • Optionally, each smart device may also include a de-reverberation module.
  • The de-reverberation module is used to perform linear de-reverberation, through the generalized weighted prediction error (Generalization Weighted Prediction Error, Gwpe) method, on the speech sequence corresponding to each microphone in the microphone array, to obtain the frequency-domain signal corresponding to each microphone; further, inverse transformation can be performed on the frequency-domain signal corresponding to each microphone to obtain the speech sequence after the linear de-reverberation processing, and S304-S312 are then performed on each de-reverberated speech sequence.
  • Here, the speech sequence corresponding to each microphone can be obtained through the above S302-S303.
  • FIG. 5 shows application scenario 1 provided by the embodiment of the present application.
  • the application scenario includes, for example: device 1 , device 2 and a user.
  • Device 1 and Device 2 can be any smart device with or without a screen.
  • Both the device 1 and the device 2 can execute the methods shown in S301-S307 in the embodiment of FIG. 3 to obtain the detection information, and send the detection information to the server (not shown in FIG. 5 ).
  • If the server determines, according to the detection information, that the energy corresponding to device 2 is greater than the energy corresponding to device 1, it determines device 2 as the target device and sends a wake-up instruction to device 2, so that device 2 wakes up.
  • FIG. 6 shows the second application scenario provided by the embodiment of the present application.
  • the application scenario includes, for example: device 1 , device 2 , device 3 and a user.
  • When device 1 is a smart device with a screen (such as a TV) and device 2 and device 3 are smart devices without a screen (for example, both are speakers), device 1 can execute the method shown in S301-S310 in the embodiment of FIG. 3 above to obtain the detection information and send the detection information to the server, whereas device 2 and device 3 can only execute the method shown in S301-S307 in the embodiment of FIG. 3 to obtain the detection information (that is, the angle in the detection information is 0) and send the detection information to the server.
  • Optionally, the server may determine the target device as follows: for the detection information sent by each device, if the angle is a non-zero value, the server judges whether the angle is within the preset angle range in front of that device; if it is, the device is determined as the target device. For example, in FIG. 6, device 1 may be determined as the target device.
  • FIG. 7 shows the second application scenario provided by the embodiment of the present application.
  • the application scenario includes, for example: device 1 , device 2 and user.
  • When device 1 is a smart device with a screen (such as a TV) and device 2 is a smart device with a screen (such as a refrigerator), the screens of device 1 and device 2 are perpendicular to each other. When the user moves from position 1 to position 2 while inputting the wake-up voice,
  • both device 1 and device 2 execute the methods shown in S301-S312 in the embodiment of FIG. 3 to obtain detection information, and send the detection information to the server.
  • the server determines the target device through the following two methods.
  • Method 1: when the server determines that the angle corresponding to device 1 is within the preset angle range and the angle corresponding to device 2 is within the preset angle range, the server determines the device with the smallest sound-source change information among device 1 and device 2 as the target device. For example, in FIG. 7, device 2 is determined as the target device.
  • Method 2: the server determines the wake-up scores corresponding to device 1 and device 2 according to the energy, angle, and change information corresponding to each of device 1 and device 2, and determines the smart device with the largest wake-up score among device 1 and device 2 as the target device.
  • FIG. 8 is a third flowchart of a method for waking up a device according to an embodiment of the present application. As shown in Figure 8, the method includes:
  • the detection information includes the energy of the wake-up voice, the angle of the sound source of the wake-up voice within a preset angle range in front of the smart device, and the change information of the sound source.
  • Optionally, the detection information may also have the following two designs.
  • the detection information includes the energy of the wake-up speech.
  • the detection information includes the energy of the wake-up voice and the angle of the sound source of the wake-up voice within a preset angle range in front of the smart device.
  • the minimum variation information of the sound source indicates the minimum DOA variation.
  • Further, for Design 1 above, the server determines the smart device with the largest energy among the multiple smart devices as the target device.
  • Further, for Design 2 above, the server judges whether there is, among the multiple smart devices, a first smart device whose angle is the same as the pre-stored preset angle of that first smart device; if so, the first smart device is determined as the target device; if not, the wake-up scores corresponding to the multiple smart devices are determined according to the energies and angles corresponding to the multiple smart devices, and the smart device with the largest wake-up score among the multiple smart devices is determined as the target device.
  • Optionally, when a smart device is a smart device without a screen, its pre-stored preset angle may be empty.
  • Optionally, the preset angle may be 90 degrees.
  • For example, in the embodiment of FIG. 6, device 1 may be determined as the first smart device.
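  • A hedged sketch of this Design-2 decision is given below: if some device reports an angle matching its pre-stored preset angle (within a small tolerance, which is an assumption, since the text asks for equality), it is chosen directly; otherwise the server falls back to the weighted wake-up score. The tolerance and weights are illustrative.

```python
# Hedged sketch of the Design-2 server decision.  Tolerance and weights are
# illustrative assumptions; the application compares angles for equality and
# does not give weight values.
def pick_target_design2(reports, preset_angles, tol_deg=5.0, w_energy=1.0, w_angle=0.5):
    """reports: [{'device_id', 'energy', 'angle'}]; preset_angles: device_id -> degrees or None."""
    for r in reports:
        preset = preset_angles.get(r["device_id"])
        if preset is not None and abs(r["angle"] - preset) <= tol_deg:
            return r["device_id"]                   # the "first smart device"
    # Fallback: largest weighted wake-up score over energy and angle.
    return max(reports, key=lambda r: r["energy"] * w_energy + r["angle"] * w_angle)["device_id"]

reports = [{"device_id": "tv-1", "energy": 0.5, "angle": 91.0},
           {"device_id": "speaker-1", "energy": 0.7, "angle": 0.0}]
print(pick_target_design2(reports, {"tv-1": 90.0, "speaker-1": None}))  # -> tv-1
```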
  • the detection information is different, and the method for determining the target device by the server is also different, which improves the diversity and flexibility of determining the target device.
  • FIG. 9 is a first schematic diagram of a device wake-up device provided by an embodiment of the present application.
  • the wake-up device 10 is applied to a target device, and the wake-up device 10 includes: an acquiring module 101, a receiving module 102 and a determining module 103;
  • the obtaining module 101 is configured to obtain the identification of the control strategy
  • the receiving module 102 is configured to receive a wake-up voice input by the user, the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include the target device;
  • the determination module 103 is configured to determine the detection information according to the identification and the wake-up voice, and send the detection information to the server;
  • the receiving module 102 is further configured to receive the wake-up indication sent by the server, and control the target device to wake up according to the wake-up indication.
  • the wake-up device 10 provided in the embodiment of the present application can execute the method steps performed by the target device in the above-mentioned method embodiments, and its implementation principles and beneficial effects are similar, and will not be repeated here.
  • the detection information includes at least one of the following information:
  • the identification indicates that the detection information includes the energy of the wake-up speech; the determination module 103 is specifically configured to:
  • the identification indicates that the detection information includes the angle of the sound source of the wake-up voice within a preset angle range in front of the target device; the determination module 103 is specifically configured to:
  • the angle corresponding to the maximum function value of the search function is determined as the angle.
  • the identification indicates that the detection information includes the change information of the sound source of the wake-up speech; the determination module 103 is specifically configured to:
  • change information of the sound source is determined.
  • the wake-up device 10 provided in the embodiment of the present application can execute the method steps performed by the target device in the above-mentioned method embodiments, and its implementation principles and beneficial effects are similar, and will not be repeated here.
  • FIG. 10 is a second schematic diagram of a device wake-up device provided by an embodiment of the present application.
  • the wake-up device 20 is applied to a server, and the wake-up device 20 includes: a receiving module 201, a determining module 202, and a sending module 203; wherein,
  • the receiving module 201 is configured to receive detection information sent by multiple smart devices
  • the determining module 202 is configured to determine the target device among the multiple smart devices according to the detection information sent by the multiple smart devices;
  • the sending module 203 is configured to send a wake-up indication to the target device.
  • the wake-up device 20 provided in the embodiment of the present application can execute the method steps performed by the server in the above method embodiments, and its implementation principles and beneficial effects are similar, and will not be repeated here.
  • the detection information includes at least one of the following information:
  • the detection information includes the energy of the wake-up speech; the determining module 202 is specifically configured to:
  • the smart device with the largest energy among the plurality of smart devices is determined as the target device.
  • the detection information includes the energy of the wake-up voice and the angle of the sound source of the wake-up voice within a preset angle range in front of the smart device; the determining module 202 is specifically configured to:
  • judge whether there is a first smart device among the multiple smart devices whose angle is the same as the pre-stored preset angle of the first smart device; if so, determine the first smart device as the target device;
  • if not, determine the wake-up scores corresponding to the multiple smart devices according to the energies and angles corresponding to the multiple smart devices, and determine the smart device with the largest wake-up score among the multiple smart devices as the target device.
  • the detection information includes the energy of the wake-up voice, the angle of the sound source of the wake-up voice within a preset angle range in front of the smart device, and the change information of the sound source; the determining module 202 is specifically configured to:
  • the wake-up device 20 provided in the embodiment of the present application can execute the method steps performed by the server in the above method embodiments, and its implementation principles and beneficial effects are similar, and will not be repeated here.
  • FIG. 11 is a schematic diagram of a hardware structure of a smart device provided by an embodiment of the present application.
  • the smart device 30 includes: a processor 301 and a memory 302,
  • processor 301 and the memory 302 are connected through a bus 303 .
  • the processor 301 executes the computer-executable instructions stored in the memory 302, so that the processor 301 executes the above method executed by the target device.
  • FIG. 12 is a schematic diagram of a hardware structure of a server provided by an embodiment of the present application.
  • the server 40 includes: a processor 401 and a memory 402,
  • processor 401 and the memory 402 are connected through a bus 403 .
  • the processor 401 executes the computer-executable instructions stored in the memory 402, so that the processor 401 executes the above method executed by the server.
  • the processor can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in conjunction with the application can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • The memory may include high-speed RAM memory, and may also include non-volatile memory (NVM), such as disk storage.
  • the bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the buses in the drawings of the present application are not limited to only one bus or one type of bus.
  • An embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the above method executed by the target device is implemented.
  • An embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the above method executed by the server is implemented.
  • An embodiment of the present application provides a computer program product, including a computer program.
  • the computer program is executed by a processor, the above-mentioned method performed by the target device is implemented.
  • An embodiment of the present application provides a computer program product, including a computer program.
  • the computer program is executed by a processor, the above-mentioned method performed by the server is implemented.
  • the above-mentioned computer-readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
  • A readable storage medium can be any available medium that can be accessed by a general-purpose or special-purpose computer.
  • An exemplary readable storage medium is coupled to the processor, such that the processor can read information from, and write information to, the readable storage medium.
  • the readable storage medium can also be a component of the processor.
  • the processor and the readable storage medium may be located in Application Specific Integrated Circuits (ASIC for short).
  • the processor and the readable storage medium can also exist in the device as discrete components.
  • the division of units is only a division of logical functions; in actual implementation, there may be other division methods. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
  • the aforementioned program can be stored in a computer-readable storage medium.
  • the program executes the steps including the above-mentioned method embodiments; and the aforementioned storage medium includes: ROM, RAM, magnetic disk or optical disk and other various media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Electric Clocks (AREA)
  • Selective Calling Equipment (AREA)

Abstract

An embodiment of the present application provides a method for waking up a device. The method includes: acquiring an identification of a control strategy; receiving a wake-up voice input by a user, where the wake-up voice includes the same wake-up word of multiple smart devices within a preset range, and the multiple smart devices include a target device; determining detection information according to the identification and the wake-up voice, and sending the detection information to a server; and receiving a wake-up indication sent by the server, and controlling the target device to wake up according to the wake-up indication.

Description

设备的唤醒方法
本申请要求于2021年08月18日提交中国专利局、申请号为202110949891.5、申请名称为“设备的唤醒方法”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及控制技术领域,尤其涉及一种设备的唤醒方法。
背景技术
目前,随着工业互联网技术的发展,用户可以通过唤醒词(例如小冰小冰)唤醒设备(例如冰箱)。
在相关技术中,当多个设备的唤醒词一致时,存在多个设备同时被唤醒或者多个设备都无法被唤醒的问题,导致用户体验差。
发明内容
本申请实施例提供一种设备的唤醒方法,用于解决多个设备同时被唤醒或者多个设备都无法被唤醒的问题,提高用户体验。
第一方面,本申请实施例提供一种设备的唤醒方法,应用于目标设备,方法包括:
获取控制策略的标识;
接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备;
根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息;
接收服务器发送唤醒指示,并根据唤醒指示控制目标设备唤醒。
第二方面,本申请实施例提供一种设备的唤醒方法,应用于服务器,方法包括:
接收多个智能设备发送的检测信息;
根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备;
并向目标设备发送唤醒指示。
第三方面,本申请实施例提供一种设备的唤醒装置,应用于目标设备,装置包括:获取模块、接收模块和确定模块;
获取模块,被配置为获取控制策略的标识;
接收模块,被配置为接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多 个智能设备的相同唤醒词,多个智能设备中包括目标设备;
确定模块,被配置为根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息;
接收模块,还被配置为接收服务器发送唤醒指示,并根据唤醒指示控制目标设备唤醒。
第四方面,本申请实施例提供一种设备的唤醒装置,应用于服务器,装置包括:接收模块、确定模块和发送模块;其中,
接收模块,被配置为接收多个智能设备发送的检测信息;
确定模块,被配置为根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备;
发送模块,被配置为并向目标设备发送唤醒指示。
第五方面,本申请实施例提供一种智能设备,包括:处理器和存储器;
存储器存储计算机执行指令;
处理器执行存储器存储的计算机执行指令,使得处理器执行如第一方面任一项的方法。
第六方面,本申请实施例提供一种服务器,包括:处理器和存储器;
存储器存储计算机执行指令;
处理器执行存储器存储的计算机执行指令,使得处理器执行如第二方面任一项的方法。
第七方面,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行如第一方面任一项的方法。
第八方面,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行如第二方面任一项的方法。
第九方面,本申请实施例提供一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现如第一方面任一项的方法。
第十方面,本申请实施例提供一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现如第二方面任一项的方法。
本申请实施例提供一种设备的唤醒方法,该方法包括:获取控制策略的标识;接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备;根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息;接收服务器发送唤醒指示,并根据唤醒指示控制目标设备唤醒。在该方 法中,设备根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息,服务器根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备,服务器向目标设备发送唤醒指示,仅以控制目标设备唤醒,即实现通过唤醒语音仅控制目标设备唤醒,解决了多个设备同时被唤醒或者多个设备都无法被唤醒的问题,提高了用户体验。
附图说明
图1为本申请实施例提供的设备的唤醒方法的应用场景示意图;
图2为本申请实施例提供的设备的唤醒方法的流程图一;
图3为本申请实施例提供的设备的唤醒方法的流程图二;
图4为本申请实施例提供的设置界面的示意图;
图5为本申请实施例提供的应用场景一;
图6为本申请实施例提供的应用场景二;
图7为本申请实施例提供的应用场景二;
图8为本申请实施例提供的设备的唤醒方法的流程图三;
图9为本申请实施例提供的设备的唤醒装置的示意图一;
图10为本申请实施例提供的设备的唤醒装置的示意图二;
图11为本申请实施例提供的智能设备的硬件结构示意图;
图12为本申请实施例提供的服务器的硬件结构示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
为了解决相关技术中,当多个设备的唤醒词一致时,存在多个设备同时被唤醒或者多个设备都无法被唤醒的问题,导致用户体验差的问题,在本申请中,发明人想到设置控制策略,在该控制策略下,根据用户输入的唤醒语音,确定要唤醒的目标设备,进而控制目标设备唤醒,从而解决多个设备同时被唤醒或者多个设备都无法被唤醒的问题,提高用户体验。
下面结合图1对本申请中提供的设备的唤醒方法的应用场景进行说明。
图1为本申请实施例提供的设备的唤醒方法的应用场景示意图。如图1所示,例如应 用场景中包括:多个设备、服务器和用户。例如多个设备中包括设备1、设备2、设备3。多个设备的唤醒词均相同,例如唤醒词均为“小U小U”。用户可以设置多个设备的控制策略。在用户设置控制策略之后,当用户呼叫“小U小U”时,多个设备均可以接收到唤醒语音“小U小U”,并根据设置的控制策略和唤醒语音,确定检测信息,进而向服务器发送检测信息。
服务器在接收到检测信息之后,根据检测信息,在多个设备中确定要唤醒的目标设备,进而向目标设备发送唤醒指示,以使目标设备根据唤醒指示被唤醒。
在本申请中,用户设置的控制策略,能够使得服务器仅向目标设备发送唤醒指示,以控制目标设备被唤醒,从而解决多个设备同时被唤醒或者多个设备都无法被唤醒的问题,提高用户体验。
下面以具体地实施例对本申请的技术方案进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例不再赘述。
图2为本申请实施例提供的设备的唤醒方法的流程图一。如图2所示,该方法包括:
S201、目标设备获取控制策略的标识。
目标设备可以为电视、音箱或者冰箱等智能设备,或者为电视或者冰箱等带有屏幕的智能设备。
标识可以为根据用户输入的语音指令得到的,也可以为根据控制设备发送的设置信息得到的。
控制设备可以为安装有应用程序的智能手机、平板电脑等,还可以为专门为物联网应用场景设计的控制器等。
当根据语音指令或者设置信息得到标识时,语音指令或者设置信息中包括标识、或者包括标识对应的映射信息。需要说明的是,根据设置信息得到标识的方法可以参见图4实施例。
当语音指令或者设置信息中包括映射信息时,在目标设备中预先存储映射列表,映射列表中包括多个标识和每个标识对应的映射信息;目标设备在接收映射信息之后,根据映射信息,对映射列表进行查找,得到标识。
在本申请中,控制策略的标识可以有多个,每个标识指示检测信息中包括的至少一种类型的信息。
S202、目标设备接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备。
可选地,预先范围可以为同一个无线局域网络覆盖的范围,多个智能设备为位于无线 局域网络中多个智能设备。
例如,唤醒词可以均为“小冰小冰”、或者“小洗小洗”等。可选地,每个设备的唤醒词均可以为用户通过应用程序进行自定义设置的。
S203、目标设备根据标识和唤醒语音,确定检测信息。
其中,检测信息中包括如下至少一种信息:唤醒语音的能量;唤醒语音的声源在目标设备的前方的预设角度范围内的角度;或者,声源的变化信息。
可选地,每个检测信息中还可以包括设备的标识,设备的标识用于指示服务器发送检测信息的设备。
其中,预设角度范围为设备的正前方从左至右的0~180度。
例如,当标识为“1”时,指示检测信息中包括能量;当标识为“2”时,指示检测信息中包括能量和角度;当标识为“3”时,指示检测信息中包括能量、角度和变化信息。
例如,当映射信息或者语音指令中包括“语音智能方式1”时,指示检测信息中包括能量;当映射信息或者语音指令中包括“语音智能方式2”时,指示检测信息中包括能量和角度;当映射信息或者语音指令中包括“语音智能方式3”时,指示检测信息中包括能量、角度和变化信息。
S204、目标设备向服务器发送检测信息。
S205、服务器接收多个智能设备发送的检测信息。
多个智能设备中的每个智能设备均向服务器发送检测信息。各智能设备向服务器发送检测信息的方法与S201~S204的方法相似,此处不再赘述。
S206、服务器根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备。
例如,当检测信息仅包括能量时,服务器将多个智能设备中能量最大的智设备确定为目标设备。当检测信息仅包括能量时,其对应的应用场景可以如图5所示。
例如,当检测信息包括能量和角度时,服务器将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。其中,唤醒评分等于能量和能量对应的权重的乘积与角度和角度对应的权重的乘积之和。当检测信息包括能量和角度时,其对应的应用场景可以如图6所示。
例如,当检测信息包括能量、角度和声源的变化信息时,服务器将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。其中,唤醒评分等于能量和能量对应的权重的乘积、角度和角度对应的权重的乘积、以及变化信息和变化信息对应的权重的乘积。当检测信息包括能量、角度和声源的变化信息时,其对应的应用场景可以如图7所示。
S207、服务器向目标设备发送唤醒指示。
其中,唤醒指示用于指示目标设备唤醒。
S208、目标设备根据唤醒指示唤醒。
例如,当目标设备为空调时,目标设备被唤醒,则用户可以通过语音调节空调的温度、风力、风向等。
在图2实施例提供的设备的唤醒方法中,设备根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息,服务器根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备,服务器向目标设备发送唤醒指示,仅以控制目标设备唤醒,即实现通过唤醒语音仅控制目标设备唤醒,解决了多个设备同时被唤醒或者多个设备都无法被唤醒的问题,提高了用户体验。
下面以执行主体为目标设备为例,对目标设置侧执行的唤醒方法的过程进行说明。具体的,请参见图3实施例。
图3为本申请实施例提供的设备的唤醒方法的流程图二。如图3所示,该方法包括:
S301、获取控制策略的标识,标识指示检测信息中包括唤醒语音的能量、唤醒语音的声源在目标设备的前方的预设角度范围内的角度、声源的变化信息。
S302、接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备。
S303、对唤醒语音进行采样处理,得到语音序列。
需要说明的是,目标设备上设置有麦克风阵列,麦克风阵列中包括至少一个麦克风,每个麦克风均可以接收唤醒语音,目标设备可以仅对至少一个麦克风中的任意一个麦克风接收到唤醒语音执行S303中的方法。
唤醒语音为模拟信号,语音序列为数字信号。
可选地,采用预设采样频率,对唤醒语音进行采样处理,得到语音序列。
例如,预设采样频率可以为16000,还可以为其他,此处不对预设采样频率进行限定。
S304、对语音序列进行分段处理,得到多个语音子序列。
可选地,根据预设数据长度对语音序列进行分段处理,多个语音子序列,每个语音子序列的长度可以等于预设数据长度。
例如,预设数据长度可以为512,还可以为其他,此处不对预设数据长度进行限定。
S305、分别对多个语音子序列进行频域变换,得到多个频域子序列。
可选地,根据预设变换长度对分别对多个语音子序列进行频域变换,得到多个频域子序列。
例如,预设变换长度为257,还可以为其他,此处不对预设变换长度进行限定。
S306、确定在预设频率范围内多个频域子序列各自对应的能量。
可选地,可以将频域子序列中预设频率范围内的每个频点对应的幅度之和的平均值,确定为频域子序列的能量。具体的,请参见公式1。
E(n) = (1/m) · Σ_{f=f_1}^{f_m} |X_n(f)|（公式1）
其中,E(n)为第n个频域子序列的能量,X n(f)为第n个频域子序列,f为频点,f 1~f m为预设频率范围。
可选地,可以将频域子序列中预设频率范围内的每个频点对应的幅度之和,确定为频域子序列的能量。具体的,请参见公式2。
E(n) = Σ_{f=f_1}^{f_m} |X_n(f)|（公式2）
S307、将多个频域子序列各自对应的能量的平均值,确定为唤醒语音的能量。
可选地,可以将部分频域子序列各自对应的能量的平均值,确定为唤醒语音的能量。例如,得到的唤醒语音的能量可以为如下公式3。
E = (1/(b-a+1)) · Σ_{n=a}^{b} E(n)（公式3）
其中,E为唤醒语音的能量,部分频域子序列为第a个频域子序列至第b个频域子序列。
可选地,可以将所有频域子序列各自对应的能量的平均值,确定为唤醒语音的能量。例如,得到的唤醒语音的能量可以为如下公式4。
E = (1/N) · Σ_{n=1}^{N} E(n)（公式4，其中N为频域子序列的总数）
S308、根据唤醒语音和角度变量,构造角度变量的搜索函数。
需要说明的是,在S308中唤醒语音包括麦克风阵列中每个麦克风均可以接收唤醒语音。
可选地,得到的搜索函数可以为如下公式5:
Aml(θ) = Σ_f PA(θ,f)^H · Rxx(f) · PA(θ,f)（公式5）
其中,Aml(θ)为搜索函数,θ为角度变量,PA(θ,f)为方向导量,Rxx(f)为协方差矩阵。
其中,Rxx(f)=X(f)*X *(f),X(f)为目标设备上麦克风阵列中包括的多个麦克风各自采集到的唤醒语音对应的频域信号,X *(f)为X(f)的共轭矩阵。
其中,
[方向导量PA(θ,f)的表达式，原公式图像缺失；由麦克风间距d、传播速度c与角度变量θ构成]
d为麦克风之间的距离,c为光速。
S309、对搜索函数进行优化搜索,得到搜索函数的函数值最大时对应的角度。
具体的,能够通过离子群优化算法对搜索函数进行优化搜索,得到搜索函数的函数值最大时对应的角度。
S310、将搜索函数的函数值最大时对应的角度,确定为唤醒语音的声源在目标设备的前方的预设角度范围内的角度。
S311、确定唤醒语音的首个音频帧对应的第一角度、以及最后一个音频帧对应的第二角度。
具体的,确定第一角度和第二角度的方法与S306至S307的方法相同,此处不再赘述。
S312、根据第一角度和第二角度,确定声源的变化信息。
可选地,可以将第一角度和第二角度的差值确定为声源的变化信息。
可选地,也可以将第一角度和第二角度与唤醒语音的持续时长的比值,确定为声源的变化信息。
在图3实施例提供的设备的唤醒方法中,标识指示检测信息中包括能量、角度、声源的变化信息,使目标设备依次确定能量、角度、声源的变化信息,并向服务器发送检测信息(包括确定出的能量、角度、声源的变化信息),提高服务器确定目标设备的准确性。
图4为本申请实施例提供的设置界面的示意图。如图4所示,控制设备可以显示设置界面,设置界面中包括:控制策略的多个控件。每个控制具有对应的名称,例如控件1对应的名称为控制策略1,例如控件2对应的名称为控制策略2。
设置界面中还可以包括控件对应的描述。其中,描述用于使用户了解服务器确定目标设备的方式。例如,控件1对应的描述为“依据能量决策”。
在具体应用中,控制设备中预先存储有每个控件对应的标识,当用户点击其中一个控件时,控制设备查找与控件对应标识,并根据该标识向目标设备发送设置信息。
需要说明的是,每个智能设备中包括2个判决模块:能量判决模块和波达方向(direction of arrival,DOA)判决模块。其中,能量判决模块能够用于执行上述S303~S307,DOA判决模块能够用于执行上述S308~S312。
本申请中的标识可以控制能量判决模块和/或DOA判决模块的开关。例如,标识指示检测信息中包括能量,则控制能量判决模块打开。例如,标识指示检测信息中包括能量、角度和变化信息,则控制能量判决模块DOA判决模块同时打开。
可选地,每个智能设备中还可以包括:混响音模块。混响音模块用于通过泛化加权预 测误差(Generalization Weighted Prediction Error,Gwpe)对每个麦克风阵列中的每个麦克风对应的语音序列进行线性去混响音处理,得到每个麦克风对应的频域信号,进一步地可以分别对每个麦克风对应的频信号进行逆变换处理,得到进行线性去混响音处理后的语音序列,并针对每个进行线性去混响音处理后的语音序列执行S304~S312。其中,麦克风对应的语音序列可以通过上述S302~S303得到。
需要说明的是,在采用Gwpe进行线性去混响音处理的过程中,涉及矩阵求逆计算,因此导致计算量较大,处理时间较长,使得混响音处理的效率较低。而在本申请中,采用LDLT分解算法代替矩阵求逆计算,从而节省计算量,降低处理时间,提高混响音处理的效率。
在实际应用中,当利用通道直接的相干关系进行混响抑制时,对智能设备的依赖性较大,由于实际中各个智能设备采用的麦克风型号不一定相同,麦克风的灵敏度、频响等可能存在差异,因此不适用利用通道直接的相干关系进行混响抑制,若要利用通道直接的相干关系进行混响抑制,则需要对麦克风进行校准,由于校准方式较复杂,因此导致进行混响抑制的过程较为复杂。而在本申请中,Gwpe属于线性预测,对智能设备依赖性较小,无需对麦克风进行校准,简化了混响抑制的过程。
图5为本申请实施例提供的应用场景一。当检测信息仅包括能量时,如图5所示,应用场景例如包括:设备1、设备2和用户。
设备1、设备2可以为任意带有屏幕或者无屏幕的智能设备。
设备1和设备2均可以执行上述图3实施例中S301~S307所示的方法得到检测信息,向服务器发送检测信息(图5中未示出)。
服务器根据检测信息,若确定设备2对应的能量大于设备1对应的能量,则确定设备2为目标设备,并向设备2发送唤醒指示,以使设备2唤醒。
图6为本申请实施例提供的应用场景二。当检测信息包括能量和角度时,如图6所示,应用场景例如包括:设备1、设备2、设备3和用户。
当设备1为带有屏幕的智能设备(例如电视)、设备2和设备3为未带有屏幕的智能设备(例如均为音箱)时,设备1可以执行上述图3实施例中S301~S310所示的方法得到检测信息,向服务器发送检测信息。而设备2和设备3仅可以执行图3实施例中S301~S307所示的方法得到检测信息(即检测信息中的角度为0),向服务器发送检测信息。
可选地,服务器可以通过如下方法确定目标设备:针对每个设备发送的检测信息,若该角度为非0值,则判断角度是否在该设备的前方的预设角度范围内,若角度在该设备的前方的预设角度范围内,则将该设备确定为目标设备。例如在图6中可以将设备1确定为 目标设备。
图7为本申请实施例提供的应用场景二。当检测信息包括能量、角度和声源的变化信息时,如图7所示,应用场景例如包括:设备1、设备2和用户。
可选地,当设备1为带有屏幕的智能设备(例如电视)、设备2为带有屏幕的智能设备(例如冰箱)时,设备1和设备2屏幕互相垂直,当用户输入唤醒语音的过程中,由位置1变动到位置2,设备1和设备2均执行上述图3实施例中S301~S312所示的方法得到检测信息,向服务器发送检测信息。
可选地,服务器通过如下2种方法确定目标设备。
方法1,服务器在确定设备1对应的角度在预设角度范围内、设备2对应的角度在预设角度范围内时,将设备1、设备2中声源的变化信息最小的设备,确定为目标设备。例如在图7中设备2确定为目标设备。
方法2,服务器根据设备1和设备2各自对应的能量、角度和变化信息,确定设备1和设备2各自对应的唤醒评分,并将设备1和设备2中唤醒评分最大的智能设备,确定为目标设备。
下面以执行主体为服务器为例,对服务器侧执行的唤醒方法的过程进行说明。具体的,请参见图8实施例。
图8为本申请实施例提供的设备的唤醒方法的流程图三。如图8所示,该方法包括:
S801、接收多个智能设备发送的检测信息,检测信息中包括唤醒语音的能量、唤醒语音的声源在智能设备的前方的预设角度范围内的角度和声源的变化信息。
可选地,检测信息还具有如下2中设计。
设计1,检测信息中包括唤醒语音的能量。
设计2,检测信息中包括唤醒语音的能量和唤醒语音的声源在智能设备的前方的预设角度范围内的角度。
S802、判断多个智能设备中是否存在至少一个智能设备对应的角度在预设角度范围内。
若是,则执行S803,若否,则执行S804。
S803、将至少一个智能设备中声源的变化信息最小的设备,确定为目标设备。
声源的变化信息最小指示DOA变动最小。
S804、根据多个智能设备对应的能量、角度和变化信息,确定多个智能设备各自对应的唤醒评分,将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
进一步地,针对上述设计1,服务器将多个智能设备中能量最大的智能设备,确定为目标设备。
进一步地,针对上述设计2,服务器判断多个智能设备中是否存在第一智能设备的角度与预先存储的第一智能设备的预设角度相同;若是,则将第一智能设备确定为目标设备;若否,则根据多个智能设备对应的能量和角度,确定多个智能设备各自对应的唤醒评分,并将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
可选地,当智能设备为无屏幕的智能设备时,预先存储的预设角度可以为空。
可选地,预设角度可以为90度。例如在图6实施例中,可以将设备1确定第一智能设备。
在图8所示的设备的唤醒方法中,检测信息不同,服务器确定目标设备的方法也不同,提高确定目标设备的多样性和灵活性。
图9为本申请实施例提供的设备的唤醒装置的示意图一。该唤醒装置10应用于目标设备,唤醒装置10包括:获取模块101、接收模块102和确定模块103;
获取模块101,被配置为获取控制策略的标识;
接收模块102,被配置为接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备;
确定模块103,被配置为根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息;
接收模块102,还被配置为接收服务器发送唤醒指示,并根据唤醒指示控制目标设备唤醒。
本申请实施例提供的唤醒装置10可以执行上述方法实施例中目标设备执行的方法步骤,其实现原理以及有益效果类似,此处不再进行赘述。
在一种可能的设计中,检测信息中包括如下至少一种信息:
唤醒语音的能量;
唤醒语音的声源在目标设备的前方的预设角度范围内的角度;或者,
声源的变化信息。
在一种可能的设计中,标识指示检测信息中包括唤醒语音的能量;确定模块103具体被配置为:
对唤醒语音进行采样处理,得到语音序列;
对语音序列进行分段处理,得到多个语音子序列;
分别对多个语音子序列进行频域变换,得到多个频域子序列;
确定在预设频率范围内多个频域子序列各自对应的能量,并将多个频域子序列各自对应的能量的平均值,确定为唤醒语音的能量。
在一种可能的设计中,标识指示检测信息中包括唤醒语音的声源在目标设备的前方的预设角度范围内的角度;确定模块103具体被配置为:
根据唤醒语音和角度变量,构造角度变量的搜索函数;
对搜索函数进行优化搜索,得到搜索函数的函数值最大时对应的角度;
将搜索函数的函数值最大时对应的角度,确定为角度。
在一种可能的设计中,标识指示检测信息中包括唤醒语音的声源的变化信息;确定模块103具体被配置为:
确定唤醒语音的首个音频帧对应的第一角度、以及最后一个音频帧对应的第二角度;
根据第一角度和第二角度,确定声源的变化信息。
本申请实施例提供的唤醒装置10可以执行上述方法实施例中目标设备执行的方法步骤,其实现原理以及有益效果类似,此处不再进行赘述。
图10为本申请实施例提供的设备的唤醒装置的示意图二。该唤醒装置20应用于服务器,唤醒装置20包括:接收模块201、确定模块202和发送模块203;其中,
接收模块201,被配置为接收多个智能设备发送的检测信息;
确定模块202,被配置为根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备;
发送模块203,被配置为并向目标设备发送唤醒指示。
本申请实施例提供的唤醒装置20可以执行上述方法实施例中服务器执行的方法步骤,其实现原理以及有益效果类似,此处不再进行赘述。
在一种可能的设计中,检测信息中包括如下至少一种信息:
唤醒语音的能量;
唤醒语音的声源在智能设备的前方的预设角度范围内的角度;或者,
声源的变化信息。
在一种可能的设计中,检测信息中包括唤醒语音的能量;确定模块102具体被配置为:
将多个智能设备中能量最大的智能设备,确定为目标设备。
在一种可能的设计中,检测信息中包括唤醒语音的能量和唤醒语音的声源在智能设备的前方的预设角度范围内的角度;确定模块102具体被配置为,包括:
判断多个智能设备中是否存在第一智能设备的角度与预先存储的第一智能设备的预设角度相同;
若是,则将第一智能设备确定为目标设备;
若否,则根据多个智能设备对应的能量和角度,确定多个智能设备各自对应的唤醒评 分,并将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
在一种可能的设计中,检测信息中包括唤醒语音的能量和唤醒语音的声源在智能设备的前方的预设角度范围内的角度和声源的变化信息;确定模块102具体被配置为:
判断多个智能设备中是否存在至少一个智能设备对应的角度在预设角度范围内;
若是,则将至少一个智能设备中声源的变化信息最小的设备,确定为目标设备;
若否,则根据多个智能设备对应的能量、角度和变化信息,确定多个智能设备各自对应的唤醒评分,将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
本申请实施例提供的唤醒装置20可以执行上述方法实施例中服务器执行的方法步骤,其实现原理以及有益效果类似,此处不再进行赘述。
图11为本申请实施例提供的智能设备的硬件结构示意图。如11所示,该智能设备30包括:处理器301和存储器302,
其中,处理器301、存储器302通过总线303连接。
在具体实现过程中,处理器301执行存储器302存储的计算机执行指令,使得处理器301执行如上的目标设备执行的方法。
处理器301的具体实现过程可参见上述目标设备执行的方法,其实现原理和技术效果类似,本实施例此处不再赘述。
图12为本申请实施例提供的服务器的硬件结构示意图。如12所示,该服务器40包括:处理器401和存储器402,
其中,处理器401、存储器402通过总线403连接。
在具体实现过程中,处理器401执行存储器402存储的计算机执行指令,使得处理器401执行如上的目标设备执行的方法。
处理器401的具体实现过程可参见上述服务器执行的方法,其实现原理和技术效果类似,本实施例此处不再赘述。
在上述图11-图12所示的实施例中,应理解,处理器可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合申请所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
存储器可能包含高速RAM存储器,也可能还包括非易失性存储NVM,例如磁盘存储器。
总线可以是工业标准体系结构(Industry Standard Architecture,ISA)总线、外部设备互连(Peripheral Component,PCI)总线或扩展工业标准体系结构(Extended Industry Standard Architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,本申请附图中的总线并不限定仅有一根总线或一种类型的总线。
本申请实施例提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行如上述目标设备执行的方法。
本申请实施例提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行如上述服务器执行的方法。
本申请实施例提供一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现如上述目标设备执行的方法。
本申请实施例提供一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现如上述服务器执行的方法。
上述的计算机可读存储介质,上述可读存储介质可以是由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。可读存储介质可以是通用或专用计算机能够存取的任何可用介质。
一种示例性的可读存储介质耦合至处理器,从而使处理器能够从该可读存储介质读取信息,且可向该可读存储介质写入信息。当然,可读存储介质也可以是处理器的组成部分。处理器和可读存储介质可以位于专用集成电路(Application Specific Integrated Circuits,简称:ASIC)中。当然,处理器和可读存储介质也可以作为分立组件存在于设备中。
单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一 个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本领域普通技术人员可以理解:实现上述各方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成。前述的程序可以存储于一计算机可读取存储介质中。该程序在执行时,执行包括上述各方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (24)

  1. 一种设备的唤醒方法,应用于目标设备,所述方法包括:
    获取控制策略的标识;
    接收用户输入的唤醒语音,所述唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,所述多个智能设备中包括所述目标设备;
    根据所述标识和所述唤醒语音,确定检测信息,并向服务器发送所述检测信息;
    接收所述服务器发送所述唤醒指示,并根据所述唤醒指示控制所述目标设备唤醒。
  2. 根据权利要求1所述的方法,其中,所述检测信息中包括如下至少一种信息:
    所述唤醒语音的能量;
    所述唤醒语音的声源在所述目标设备的前方的预设角度范围内的角度;或者,
    所述声源的变化信息。
  3. 根据权利要求1或2所述的方法,其中,所述标识指示所述检测信息中包括所述唤醒语音的能量;所述根据所述标识和所述唤醒语音,确定检测信息,包括:
    对所述唤醒语音进行采样处理,得到语音序列;
    对所述语音序列进行分段处理,得到多个语音子序列;
    分别对所述多个语音子序列进行频域变换,得到多个频域子序列;
    确定在预设频率范围内所述多个频域子序列各自对应的能量,并将所述多个频域子序列各自对应的能量的平均值,确定为所述唤醒语音的能量。
  4. 根据权利要求1-3中任一项所述的方法,其中,所述标识指示所述检测信息中包括所述唤醒语音的声源在所述目标设备的前方的预设角度范围内的角度;
    所述根据所述标识和所述唤醒语音,确定检测信息,包括:
    根据所述唤醒语音和角度变量,构造所述角度变量的搜索函数;
    对所述搜索函数进行优化搜索,得到所述搜索函数的函数值最大时对应的角度;
    将所述搜索函数的函数值最大时对应的角度,确定为所述角度。
  5. 根据权利要求1-4中任一项所述的方法,其中,所述标识指示所述检测信息中包括所述唤醒语音的声源的变化信息;所述根据所述标识和所述唤醒语音,确定检测信息,包括:
    确定所述唤醒语音的首个音频帧对应的第一角度、以及最后一个音频帧对应的第二角度;
    根据所述第一角度和第二角度,确定所述声源的变化信息。
  6. 一种设备的唤醒方法,应用于服务器,所述方法包括:
    接收多个智能设备发送的检测信息;
    根据所述多个智能设备发送的检测信息,在所述多个智能设备中确定目标设备;
    并向所述目标设备发送唤醒指示。
  7. 根据权利要求6所述的方法,其中,所述检测信息中包括如下至少一种信息:
    所述唤醒语音的能量;
    所述唤醒语音的声源在智能设备的前方的预设角度范围内的角度;或者,
    所述声源的变化信息。
  8. 根据权利要求6或7所述的方法,其中,所述检测信息中包括所述唤醒语音的能量;根据所述多个智能设备发送的检测信息,在所述多个智能设备中确定目标设备,包括:
    将所述多个智能设备中能量最大的智能设备,确定为所述目标设备。
  9. 根据权利要求6-8中任一项所述的方法,其中,所述检测信息中包括所述唤醒语音的能量和所述唤醒语音的声源在智能设备的前方的预设角度范围内的角度;根据所述多个智能设备发送的检测信息,在所述多个智能设备中确定目标设备,包括:
    判断所述多个智能设备中是否存在第一智能设备的角度与预先存储的第一智能设备的预设角度相同;
    若是,则将第一智能设备确定为所述目标设备;
    若否,则根据所述多个智能设备对应的能量和角度,确定所述多个智能设备各自对应的唤醒评分,并将多个智能设备中唤醒评分最大的智能设备,确定为所述目标设备。
  10. 根据权利要求6-9中任一项所述的方法,其中,所述检测信息中包括所述唤醒语音的能量、所述唤醒语音的声源在智能设备的前方的预设角度范围内的角度、以及所述声源的变化信息;
    根据所述多个智能设备发送的检测信息,在所述多个智能设备中确定目标设备,包括:
    判断所述多个智能设备中是否存在至少一个智能设备对应的角度在预设角度范围内;
    若是,则将所述至少一个智能设备中声源的变化信息最小的设备,确定为所述目标设备;
    若否,则根据所述多个智能设备对应的能量、角度和变化信息,确定所述多个智能设备各自对应的唤醒评分,将多个智能设备中唤醒评分最大的智能设备,确定为所述目标设备。
  11. 一种设备的唤醒装置,应用于目标设备,装置包括:获取模块、接收模块和确定模块;
    获取模块,被配置为获取控制策略的标识;
    接收模块,被配置为接收用户输入的唤醒语音,唤醒语音中包括预设范围内的多个智能设备的相同唤醒词,多个智能设备中包括目标设备;
    确定模块,被配置为根据标识和唤醒语音,确定检测信息,并向服务器发送检测信息;
    接收模块,还被配置为接收服务器发送唤醒指示,并根据唤醒指示控制目标设备唤醒。
  12. 根据权利要求11所述的装置,其中,所述检测信息中包括如下至少一种信息:
    所述唤醒语音的能量;
    所述唤醒语音的声源在所述目标设备的前方的预设角度范围内的角度;或者,
    所述声源的变化信息。
  13. 根据权利要求11或12所述的装置,其中,标识指示检测信息中包括唤醒语音的能量;确定模块具体被配置为:
    对唤醒语音进行采样处理,得到语音序列;
    对语音序列进行分段处理,得到多个语音子序列;
    分别对多个语音子序列进行频域变换,得到多个频域子序列;
    确定在预设频率范围内多个频域子序列各自对应的能量,并将多个频域子序列各自对应的能量的平均值,确定为唤醒语音的能量。
  14. 根据权利要求11-13中任一项所述的装置,其中,标识指示检测信息中包括唤醒语音的声源在目标设备的前方的预设角度范围内的角度;确定模块具体被配置为:
    根据唤醒语音和角度变量,构造角度变量的搜索函数;
    对搜索函数进行优化搜索,得到搜索函数的函数值最大时对应的角度;
    将搜索函数的函数值最大时对应的角度,确定为角度。
  15. 根据权利要求11-14中任一项所述的装置,其中,标识指示检测信息中包括唤醒语音的声源的变化信息;确定模块具体被配置为:
    确定唤醒语音的首个音频帧对应的第一角度、以及最后一个音频帧对应的第二角度;
    根据第一角度和第二角度,确定声源的变化信息。
  16. 一种设备的唤醒装置,应用于服务器,装置包括:接收模块、确定模块和发送模块;其中,
    接收模块,被配置为接收多个智能设备发送的检测信息;
    确定模块,被配置为根据多个智能设备发送的检测信息,在多个智能设备中确定目标设备;
    发送模块,被配置为并向目标设备发送唤醒指示。
  17. 根据权利要求16所述的装置,其中,检测信息中包括如下至少一种信息:
    唤醒语音的能量;
    所述唤醒语音的声源在智能设备的前方的预设角度范围内的角度;或者,
    声源的变化信息。
  18. 根据权利要求16或17所述的装置,其中,检测信息中包括唤醒语音的能量;确定模块具体被配置为:
    将多个智能设备中能量最大的智能设备,确定为目标设备。
  19. 根据权利要求16-18中任一项所述的装置,其中,检测信息中包括唤醒语音的能量和唤醒语音的声源在智能设备的前方的预设角度范围内的角度;确定模块具体被配置为:
    判断多个智能设备中是否存在第一智能设备的角度与预先存储的第一智能设备的预设角度相同;
    若是,则将第一智能设备确定为目标设备;
    若否,则根据多个智能设备对应的能量和角度,确定多个智能设备各自对应的唤醒评分,并将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
  20. 根据权利要求16-19中任一项所述的装置,其中,检测信息中包括唤醒语音的能量、唤醒语音的声源在智能设备的前方的预设角度范围内的角度、以及声源的变化信息;确定模块具体被配置为:
    判断多个智能设备中是否存在至少一个智能设备对应的角度在预设角度范围内;
    若是,则将至少一个智能设备中声源的变化信息最小的设备,确定为目标设备;
    若否,则根据多个智能设备对应的能量、角度和变化信息,确定多个智能设备各自对应的唤醒评分,将多个智能设备中唤醒评分最大的智能设备,确定为目标设备。
  21. 一种智能设备,包括:处理器和存储器;
    存储器存储计算机执行指令;
    处理器执行存储器存储的计算机执行指令,使得处理器执行如权利要求1-5中所述的方法。
  22. 一种服务器,包括:处理器和存储器;
    存储器存储计算机执行指令;
    处理器执行存储器存储的计算机执行指令,使得处理器执行如权利要求6-10中所述的方法。
  23. 一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行如权利要求1-10中所述的方法。
  24. 一种计算机程序产品,包括计算机程序,计算机程序被处理器执行时实现如权利要求1-10中所述的方法。
PCT/CN2022/097202 2021-08-18 2022-06-06 设备的唤醒方法 WO2023020076A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110949891.5A CN113763950A (zh) 2021-08-18 2021-08-18 设备的唤醒方法
CN202110949891.5 2021-08-18

Publications (1)

Publication Number Publication Date
WO2023020076A1 true WO2023020076A1 (zh) 2023-02-23

Family

ID=78790319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/097202 WO2023020076A1 (zh) 2021-08-18 2022-06-06 设备的唤醒方法

Country Status (2)

Country Link
CN (1) CN113763950A (zh)
WO (1) WO2023020076A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763950A (zh) * 2021-08-18 2021-12-07 青岛海尔科技有限公司 设备的唤醒方法


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461449A (zh) * 2018-12-29 2019-03-12 苏州思必驰信息科技有限公司 用于智能设备的语音唤醒方法及系统
CN112289313A (zh) * 2019-07-01 2021-01-29 华为技术有限公司 一种语音控制方法、电子设备及系统
CN110610711A (zh) * 2019-10-12 2019-12-24 深圳市华创技术有限公司 分布式物联网设备的全屋智能语音交互方法及其系统
CN111091828A (zh) * 2019-12-31 2020-05-01 华为技术有限公司 语音唤醒方法、设备及系统
CN111640431A (zh) * 2020-04-30 2020-09-08 海尔优家智能科技(北京)有限公司 一种设备响应处理方法及装置
CN112634872A (zh) * 2020-12-21 2021-04-09 北京声智科技有限公司 语音设备唤醒方法及装置
CN113763950A (zh) * 2021-08-18 2021-12-07 青岛海尔科技有限公司 设备的唤醒方法

Also Published As

Publication number Publication date
CN113763950A (zh) 2021-12-07

Similar Documents

Publication Publication Date Title
CN106952653B (zh) 噪声去除方法、装置和终端设备
CN108922553B (zh) 用于音箱设备的波达方向估计方法及系统
US20160180837A1 (en) System and method of speech recognition
US9633655B1 (en) Voice sensing and keyword analysis
WO2023020076A1 (zh) 设备的唤醒方法
US11222652B2 (en) Learning-based distance estimation
CN113132193B (zh) 智能设备的控制方法、装置、电子设备以及存储介质
US11790888B2 (en) Multi channel voice activity detection
US9508345B1 (en) Continuous voice sensing
WO2019119593A1 (zh) 语音增强方法及装置
CN111261143B (zh) 一种语音唤醒方法、装置及计算机可读存储介质
KR20230113368A (ko) 검출들의 시퀀스에 기반한 핫프레이즈 트리거링
WO2024041512A1 (zh) 音频降噪方法、装置、电子设备及可读存储介质
WO2024051676A1 (zh) 模型训练方法、装置、电子设备及介质
WO2024027246A1 (zh) 声音信号处理方法、装置、电子设备和存储介质
CN110890104B (zh) 语音端点检测方法及系统
CN115862604B (zh) 语音唤醒模型训练及语音唤醒方法、装置及计算机设备
WO2023098103A9 (zh) 音频处理方法和音频处理装置
CN111383634B (zh) 根据基于声音的机制停用智能显示设备的显示器的方法及系统
US11205433B2 (en) Method and apparatus for activating speech recognition
CN114333017A (zh) 一种动态拾音方法、装置、电子设备及存储介质
CN112947100B (zh) 一种语音助手设备唤醒方法、装置、系统及存储介质
WO2024016793A1 (zh) 语音信号的处理方法、装置、设备及计算机可读存储介质
TWI756817B (zh) 語音活動偵測裝置與方法
US20230113883A1 (en) Digital Signal Processor-Based Continued Conversation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22857390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE