CN111899730A - Voice control method, device and computer readable storage medium - Google Patents
- Publication number
- CN111899730A (application CN201910370848.6A)
- Authority
- CN
- China
- Prior art keywords
- voice data
- voice
- data
- transmitting
- central gateway
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/04—Protocols for data compression, e.g. ROHC
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The present invention relates to the field of communications technologies, and in particular to a voice control method, apparatus, and computer-readable storage medium. The voice control method comprises the following steps: collecting voice data; compressing the voice data; transmitting the compressed voice data to a central gateway and decompressing it to obtain first transmission data; transmitting the first transmission data from the central gateway to a cloud, where it is recognized and analyzed to obtain a control instruction; and transmitting the control instruction to the central gateway, which forwards it to a smart device and controls the smart device to execute the operation corresponding to the instruction. The voice control method ensures that the cloud can recognize and analyze the voice data.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a voice control method, apparatus, and computer-readable storage medium.
Background
A smart device is a device, apparatus, or machine with computing and processing capabilities. Taking a Bluetooth speaker as an example, the speaker can be controlled by voice, for instance instructed by voice to play music.
At present, cloud recognition technology is commonly used to recognize speech: a product device collects what the user says, compresses the data, and transmits it to the cloud. The problem is that the cloud server may be unable to recognize the compressed data.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide a voice control method, apparatus, and computer readable storage medium, which ensure that a cloud can identify and analyze voice data.
A first aspect of an embodiment of the present invention provides a voice control method, including the following steps: collecting voice data; compressing the voice data; transmitting the compressed voice data to a central gateway, and decompressing the compressed voice data to obtain first transmission data; transmitting the first transmission data from the central gateway to a cloud end, identifying and analyzing the first transmission data, and obtaining a control instruction after analysis; and transmitting the control instruction to a central gateway, transmitting the control instruction to intelligent equipment from the central gateway, and controlling the intelligent equipment to execute the operation corresponding to the control instruction.
In a first possible implementation manner of the first aspect of the embodiments of the present invention, the transmitting the compressed voice data to a central gateway, and decompressing the compressed voice data to obtain first transmission data includes: and transmitting the compressed voice data to the central gateway through a low-power-consumption Bluetooth protocol, and decompressing the compressed voice data to obtain first transmission data.
In a second possible implementation manner of the first aspect of the embodiment of the present invention, after the acquiring the voice data, the method further includes: and preprocessing the voice data.
In a third possible implementation manner of the first aspect of the embodiment of the present invention, after the acquiring the voice data, the method further includes: determining whether the voice data contains a wake-up word, and if so, executing wake-up operation; wherein the speech data comprises a set of speech segments within successive time periods.
With reference to the third possible implementation manner of the first aspect of the embodiment of the present invention, in a fourth possible implementation manner, the determining whether the voice data includes a wakeup word, and if the voice data includes the wakeup word, before executing a wakeup operation, the method further includes: confirming whether the sound energy of the voice data is larger than a preset energy threshold value; and when the sound energy of the voice data is larger than a preset energy threshold value, determining whether the voice data contains a wake-up word, and if so, executing wake-up operation.
A second aspect of an embodiment of the present invention provides a voice control apparatus, including: the acquisition module is used for acquiring voice data; the compression module is used for compressing and processing the voice data; the first transmission module is used for transmitting the compressed voice data to the central gateway and decompressing the compressed voice data to obtain first transmission data; the identification module is used for transmitting the first transmission data from the central gateway to the cloud, identifying and analyzing the first transmission data, and obtaining a control instruction after analysis; and the execution module is used for transmitting the control instruction to the central gateway, transmitting the control instruction to the intelligent equipment from the central gateway, and controlling the intelligent equipment to execute the operation corresponding to the control instruction.
In a first possible implementation manner of the second aspect of the embodiment of the present invention, the voice control apparatus further includes a preprocessing module, configured to preprocess the voice data.
In a second possible implementation manner of the second aspect of the embodiment of the present invention, the voice control apparatus further includes a wake-up module, configured to determine whether the voice data includes a wake-up word and, if so, execute a wake-up operation; wherein the voice data comprises a set of voice segments within successive time periods.
A third aspect of the embodiments of the present invention provides a speech control apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described voice control method.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
according to the embodiment of the invention, the requirement of voice data on bandwidth can be reduced by compressing the voice data, the power consumption is reduced, and the control instruction corresponding to the voice data can be identified and analyzed by the cloud terminal by decompressing the compressed data through the central gateway.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flowchart of a first embodiment of a voice control method provided by the present invention;
FIG. 2 is a schematic flowchart of a second embodiment of the voice control method provided by the present invention;
FIG. 3 is a schematic flowchart of a third embodiment of the voice control method provided by the present invention;
FIG. 4 is a schematic flowchart of a fourth embodiment of the voice control method provided by the present invention;
FIG. 5 is a schematic structural diagram of a voice control apparatus according to a first embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a second embodiment of a voice control apparatus provided in the present invention;
FIG. 7 is a schematic structural diagram of a voice control apparatus according to a third embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a fourth embodiment of a voice control apparatus provided in the present invention;
FIG. 9 is a schematic structural diagram of a voice control apparatus provided in the present invention;
FIG. 10 is a second schematic structural diagram of the voice control apparatus provided in the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The embodiment of the invention discloses a voice control method, a voice control device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a voice control method according to a first embodiment of the present invention, specifically:
s101, collecting voice data;
Voice data is collected by a smart device. The smart device may be an air conditioner, a washing machine, a speaker, a rice cooker, a refrigerator, a water heater, a lamp, a curtain, a door, or the like, and voice data in the environment can be collected through a microphone installed on the smart device, for example on a speaker. The voice data may be speech uttered by any (non-specific) person, for example, "turn on the air conditioner and set the temperature to 27 ℃". Voices uttered by different people differ in timbre, pitch, and energy.
S102, compressing the voice data;
The voice data is compressed; compression effectively reduces the required transmission bandwidth and thereby improves transmission efficiency. The compression coding applied to the voice data may be waveform coding, parametric coding, or hybrid coding, but is not limited to these methods.
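As a non-limiting illustration of the compress/decompress round trip in steps S102 and S103, the sketch below uses lossless zlib compression as a stand-in for a real speech codec (waveform, parametric, or hybrid coding would be used in practice); the function names are hypothetical.

```python
import zlib

def compress_voice(pcm: bytes, level: int = 6) -> bytes:
    """Compress raw PCM voice data on the smart device (step S102)."""
    return zlib.compress(pcm, level)

def decompress_voice(payload: bytes) -> bytes:
    """Invert the compression at the central gateway to obtain the
    first transmission data (part of step S103)."""
    return zlib.decompress(payload)

# Round trip: the gateway must recover exactly what the device captured,
# so the cloud receives data it can recognize.
pcm = bytes(range(256)) * 4          # stand-in for captured audio samples
payload = compress_voice(pcm)
assert decompress_voice(payload) == pcm
assert len(payload) < len(pcm)       # bandwidth requirement is reduced
```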
S103, transmitting the compressed voice data to a central gateway, and decompressing the compressed voice data to obtain first transmission data;
A smart device such as a Bluetooth speaker can be set as the central gateway, and the other smart devices in the home are controlled through it. The compressed voice data is transmitted to the central gateway, which manages, decompresses, or forwards it.
When the Bluetooth speaker is set as the central gateway, the speaker decompresses the compressed voice data. The speaker contains a Bluetooth wireless module, and data can be exchanged through the module's data channel, so the voice data compressed by a smart device can be transmitted to the Bluetooth speaker. The protocol adopted by the Bluetooth wireless module may be the BLE (Bluetooth Low Energy) protocol, the RFCOMM (cable replacement) protocol, the SPP (Serial Port Profile) protocol, an SCO (Synchronous Connection-Oriented) link, or the A2DP (Advanced Audio Distribution Profile) protocol, but is not limited to these protocols.
Specifically, the compressed voice data may be transmitted to the central gateway via the BLE protocol and decompressed there to obtain the first transmission data. BLE transmission reduces both cost and power consumption.
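The BLE transport in step S103 must fit the compressed stream into small link-layer payloads; with the default ATT MTU of 23 bytes, about 20 bytes of application data fit in each notification. The sketch below, with assumed function names and a 20-byte payload size, shows chunking on the device side and reassembly at the gateway.

```python
def to_ble_chunks(payload: bytes, mtu_payload: int = 20) -> list[bytes]:
    """Split the compressed voice stream into BLE-sized payloads.
    20 bytes corresponds to the default ATT MTU of 23 minus the
    3-byte attribute header (an illustrative assumption)."""
    return [payload[i:i + mtu_payload]
            for i in range(0, len(payload), mtu_payload)]

def reassemble(chunks: list[bytes]) -> bytes:
    """Gateway side: concatenate chunks back into the compressed stream
    before decompressing it into the first transmission data."""
    return b"".join(chunks)

data = b"compressed-voice-frame" * 10
chunks = to_ble_chunks(data)
assert all(len(c) <= 20 for c in chunks)
assert reassemble(chunks) == data
```

A real implementation would also number the chunks and acknowledge loss; this sketch only shows the size constraint.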
S104, transmitting the first transmission data from the central gateway to a cloud end, identifying and analyzing the first transmission data, and obtaining a control instruction after analysis;
After receiving the first transmission data, the cloud can perform matching analysis using a speech recognition model established at the cloud, and a control instruction is obtained after analysis.
The speech recognition model is established before recognition by collecting a large number of voice samples at the cloud. The samples may come from different speakers; through repeated training, the model learns the differing voice characteristics in these samples and summarizes their similarities to establish the speech recognition model.
And S105, transmitting the control instruction to a central gateway, transmitting the control instruction to the intelligent equipment from the central gateway, and controlling the intelligent equipment to execute the operation corresponding to the control instruction.
For example, the speaker utters "small space one, turn on the air conditioner and set the temperature to 27 ℃". After cloud recognition and analysis, the central gateway forwards the control instruction to air conditioner one, and air conditioner one executes the operation corresponding to the control instruction.
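A real cloud parser would rely on the trained speech recognition model; purely to illustrate the mapping from recognized text to a control instruction in steps S104 and S105, the toy rule-based sketch below uses an invented device name, action words, and instruction format.

```python
import re

def parse_command(text: str) -> dict:
    """Toy mapping from recognized text to a control instruction.
    The device identifier, action names, and temperature pattern are
    illustrative assumptions, not the patent's actual cloud analysis."""
    instruction = {"device": None, "action": None, "params": {}}
    if "air conditioner" in text:
        instruction["device"] = "air_conditioner_1"
    if "turn on" in text:
        instruction["action"] = "power_on"
    m = re.search(r"(\d+)\s*(?:degrees|℃|C)", text)
    if m:
        instruction["params"]["temperature"] = int(m.group(1))
    return instruction

cmd = parse_command("turn on the air conditioner and set the temperature to 27 degrees")
assert cmd == {"device": "air_conditioner_1", "action": "power_on",
               "params": {"temperature": 27}}
```

The resulting instruction dictionary stands in for whatever structured message the gateway would actually forward to the smart device.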
According to the embodiment of the invention, the requirement of voice data on bandwidth can be reduced by compressing the voice data, the power consumption is reduced, and the control instruction corresponding to the voice data can be identified and analyzed by the cloud terminal by decompressing the compressed data through the central gateway.
With reference to fig. 2, fig. 2 shows a flowchart of a voice control method according to a second embodiment of the present invention, and after step S101, the method further includes, S106, preprocessing the voice data.
The preprocessing operation can reduce noise of the voice data and enhance the voice data, so that the accuracy of voice recognition and analysis is improved.
With reference to fig. 3, fig. 3 shows a flowchart of a voice control method according to a third embodiment of the present invention, which further includes, after step S101,
s107, determining whether the voice data contains a wake-up word, and if so, executing wake-up operation; wherein the speech data comprises a set of speech segments within successive time periods.
Within a period of time, a speaker may utter "small space one, turn on the air conditioner, set the temperature to 27 ℃" either continuously or with pauses between phrases; in both cases, all of the voice uttered within that period, "small space one, turn on the air conditioner, set the temperature to 27 ℃", is collected.
When the intelligent equipment is not triggered or used for a long time, a low power consumption mode can be set, and the intelligent equipment only has partial functions in the low power consumption mode; when a user needs to use the intelligent equipment, the intelligent equipment needs to be awakened, and after the intelligent equipment is awakened, the intelligent equipment restores to a normal working state and has all functions.
After the voice data is collected, it is judged whether it contains the wake-up word: if it does, the wake-up operation is executed; if it does not, the smart device remains in the low-power-consumption mode.
Different smart devices can be assigned different wake-up words, which may contain the device's name or a keyword for the device; for example, the air conditioner in the living room may be named "air conditioner one" or "small space one". Wake-up words may be preset at the factory or trained and configured by the user as needed; letting the user freely train and configure them gives a better user experience.
For example, suppose the smart device is the air conditioner in the living room and it is in the low-power-consumption mode. After the speaker talks, the voice data collected by the air conditioner begins with "small space one"; the air conditioner recognizes the wake-up word "small space one" locally, wakes up its full functions, and restores the normal working state. It then continues to collect the voice data "turn on the air conditioner, set the temperature to 27 ℃". The air conditioner may compress and transmit to the central gateway and the cloud only the voice data collected after wake-up ("turn on the air conditioner, set the temperature to 27 ℃"), or it may compress and transmit all of the collected voice data ("small space one, turn on the air conditioner, set the temperature to 27 ℃"); the latter adds a secondary verification of the wake-up word at the cloud and improves the accuracy of voice recognition.
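The local wake-word gating described above can be sketched as follows; the wake word, the segment representation (already-recognized text rather than raw audio), and the function name are all illustrative assumptions.

```python
def handle_segments(segments: list[str],
                    wake_word: str = "small space one") -> list[str]:
    """Stay in low-power mode until a segment contains the wake word,
    then pass every later segment on for compression and cloud
    recognition (step S107). Segments are shown as text for brevity."""
    awake = False
    forwarded = []
    for seg in segments:
        if not awake and wake_word in seg:
            awake = True            # wake-up operation: restore full function
            continue
        if awake:
            forwarded.append(seg)   # collected after wake-up, sent onward
    return forwarded

segs = ["background chatter", "small space one",
        "turn on the air conditioner", "set the temperature to 27"]
assert handle_segments(segs) == ["turn on the air conditioner",
                                 "set the temperature to 27"]
```

Forwarding the wake-word segment as well (the patent's alternative) would simply mean not skipping it once detected.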
With reference to fig. 4, fig. 4 shows a schematic flow chart of a voice control method according to a fourth embodiment of the present invention, before step S107, the method further includes:
s108, confirming whether the sound energy of the voice data is larger than a preset energy threshold value; and when the sound energy of the voice data is larger than a preset energy threshold value, determining whether the voice data contains a wake-up word, and if so, executing wake-up operation.
And when the sound energy of the voice data is smaller than a preset energy threshold value, the intelligent equipment is still in a low power consumption mode.
An energy threshold value is preset, so that false awakening can be avoided;
The smart device continuously collects sound. When the sound energy is not greater than the preset energy threshold, the device enters the low-power-consumption mode to reduce power consumption; when the sound energy is greater than the preset energy threshold, the smart device is awakened.
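The energy gate of step S108 can be sketched by computing the root-mean-square energy of one PCM frame and comparing it with a preset threshold; the threshold value and function names here are illustrative only.

```python
import math

def sound_energy(samples: list[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples (step S108)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_check_wake_word(samples: list[int],
                           threshold: float = 500.0) -> bool:
    """Gate the wake-word detector: below the threshold the device stays
    in low-power mode and the frame is discarded, avoiding false wake-ups.
    The 500.0 threshold is an arbitrary illustrative value."""
    return sound_energy(samples) > threshold

quiet = [10, -12, 8, -9] * 100           # low-level background noise
loud = [2000, -1800, 2100, -1900] * 100  # nearby speech
assert not should_check_wake_word(quiet)
assert should_check_wake_word(loud)
```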
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In the embodiment of the present invention, a voice control apparatus is further provided, and the voice control apparatus includes modules for executing the steps in the embodiment corresponding to fig. 1. Please refer to fig. 1 for the related description of the corresponding embodiment.
Fig. 5 is a schematic structural diagram of a voice control apparatus according to a first embodiment of the present invention. As shown in fig. 5, the voice control apparatus 2 of this embodiment includes,
the acquisition module 21 is used for acquiring voice data;
a compression module 22, configured to compress and process the voice data;
the first transmission module 23 is configured to transmit the compressed voice data to the central gateway, and decompress the compressed voice data to obtain first transmission data;
the identification module 24 is configured to transmit the first transmission data from the central gateway to the cloud, identify and analyze the first transmission data, and obtain a control instruction after analysis;
and the execution module 25 is configured to transmit the control instruction to the central gateway, transmit the control instruction to the intelligent device from the central gateway, and control the intelligent device to execute an operation corresponding to the control instruction.
Fig. 6 is a schematic structural diagram of a second embodiment of the voice control apparatus provided in the present invention, and based on fig. 5, further includes a preprocessing module 26,
the preprocessing module 26 is configured to preprocess the voice data.
Fig. 7 is a schematic structural diagram of a voice control apparatus according to a third embodiment of the present invention. Based on fig. 5, a wake-up module 27 is also included,
the wake-up module 27 is configured to determine whether the voice data contains a wake-up word, and if the voice data contains the wake-up word, perform a wake-up operation; wherein the speech data comprises a set of speech segments within successive time periods.
Fig. 8 is a schematic structural diagram of a fourth embodiment of the voice control apparatus provided in the present invention. Based on fig. 7, a threshold determination module 28 is further included,
the threshold determination module 28 is configured to determine whether sound energy of the voice data is greater than a preset energy threshold; and when the sound energy of the voice data is larger than a preset energy threshold value, determining whether the voice data contains a wake-up word, and if so, executing wake-up operation.
Fig. 9 is a schematic diagram of a voice control apparatus according to an embodiment of the present invention. The central gateway 32 performs bidirectional wireless data transmission with the intelligent devices 31 in the home, such as air conditioners, electric lamps, curtains and the like, and the central gateway 32 can be connected with the cloud 33 through the Internet.
Fig. 10 is a schematic diagram of a voice control apparatus according to an embodiment of the present invention. As shown in fig. 10, the voice control device 6 includes: a processor 60, a memory 61 and a computer program 62, such as a speech controlled implementation program, stored in said memory 61 and executable on said processor 60. The processor 60, when executing the computer program 62, implements the steps in the various speech control method embodiments described above, such as S101 to S105 shown in fig. 1. Alternatively, the processor 60, when executing the computer program 62, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 21 to 25 shown in fig. 5.
Illustratively, the computer program 62 may be partitioned into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 62 in the voice control apparatus 6. For example, the computer program 62 may be divided into an acquisition module, a compression module, a first transmission module, an identification module, and an execution module (modules in a virtual apparatus), each with the following specific functions:
the acquisition module is used for acquiring voice data; the compression module is used for compressing and processing the voice data; the first transmission module is used for transmitting the compressed voice data to the central gateway and decompressing the compressed voice data to obtain first transmission data; the identification module is used for transmitting the first transmission data from the central gateway to the cloud, identifying and analyzing the first transmission data, and obtaining a control instruction after analysis; and the execution module is used for transmitting the control instruction to the central gateway, transmitting the control instruction to the intelligent equipment from the central gateway, and controlling the intelligent equipment to execute the operation corresponding to the control instruction.
The voice control device 6 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or another computing device. The voice control device 6 may include, but is not limited to, a processor 60 and a memory 61. It will be understood by those skilled in the art that fig. 10 is merely an example of the voice control apparatus 6 and does not constitute a limitation of it; the apparatus may include more or fewer components than shown, combine some components, or use different components; for example, the voice control apparatus 6 may further include input-output devices, network access devices, a bus, etc.
The Processor 60 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the voice control apparatus 6, such as a hard disk or memory of the voice control apparatus 6. The memory 61 may also be an external storage device of the voice control apparatus 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash card provided on the voice control apparatus 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the voice control apparatus 6. The memory 61 is used to store the computer program and the other programs and data required by the voice control apparatus 6, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Claims (10)
1. A voice control method, comprising the steps of:
collecting voice data;
compressing the voice data;
transmitting the compressed voice data to a central gateway, and decompressing the compressed voice data to obtain first transmission data;
transmitting the first transmission data from the central gateway to a cloud end, and performing recognition and analysis on the first transmission data to obtain a control instruction;
and transmitting the control instruction to the central gateway, transmitting the control instruction from the central gateway to an intelligent device, and controlling the intelligent device to execute the operation corresponding to the control instruction.
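The five method steps of claim 1 can be illustrated with a minimal end-to-end sketch. Every function and name below is hypothetical: `zlib` stands in for whatever lossless codec the device actually uses, the radio link and cloud recognition are simulated with plain function calls, and the keyword match is a placeholder for real speech recognition.

```python
import zlib

def collect_voice_data() -> bytes:
    # Step 1: stand-in for microphone capture (hypothetical fixture data).
    return b"turn on the living room light"

def compress(voice_data: bytes) -> bytes:
    # Step 2: lossless compression before transmission to the gateway.
    return zlib.compress(voice_data)

def gateway_decompress(compressed: bytes) -> bytes:
    # Step 3: the central gateway recovers the "first transmission data".
    return zlib.decompress(compressed)

def cloud_recognize(first_transmission_data: bytes) -> str:
    # Step 4: stand-in for cloud recognition and analysis;
    # a real system would run ASR + intent parsing here.
    text = first_transmission_data.decode()
    return "LIGHT_ON" if "turn on" in text else "UNKNOWN"

def device_execute(instruction: str) -> str:
    # Step 5: the intelligent device executes the parsed instruction.
    return f"executed {instruction}"

raw = collect_voice_data()
instruction = cloud_recognize(gateway_decompress(compress(raw)))
result = device_execute(instruction)
```

The round trip through `compress`/`gateway_decompress` is lossless, which is what lets the cloud end analyze exactly the audio that was captured.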
2. The voice control method according to claim 1, wherein the transmitting the compressed voice data to a central gateway, and decompressing the compressed voice data to obtain first transmission data includes:
and transmitting the compressed voice data to the central gateway through a low-power-consumption Bluetooth protocol, and decompressing the compressed voice data to obtain first transmission data.
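Claim 2 carries the compressed payload over Bluetooth Low Energy. BLE notifications are bounded by the negotiated ATT MTU, so a payload of any size is typically split into small packets and reassembled at the gateway. A sketch, assuming the 20-byte payload of the default BLE 4.x ATT MTU; the framing itself is an illustrative assumption, not part of the claim:

```python
def chunk_for_ble(payload: bytes, mtu_payload: int = 20) -> list[bytes]:
    # Split the compressed voice data into MTU-sized BLE packets.
    return [payload[i:i + mtu_payload]
            for i in range(0, len(payload), mtu_payload)]

def reassemble(packets: list[bytes]) -> bytes:
    # The central gateway concatenates the packets in order
    # before decompressing them into the first transmission data.
    return b"".join(packets)
```

A real implementation would also negotiate a larger MTU where supported and add sequence numbers so out-of-order or dropped packets can be detected.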
3. The voice control method according to claim 1, further comprising, after the collecting the voice data: and preprocessing the voice data.
4. The voice control method according to claim 1, further comprising, after the collecting the voice data:
determining whether the voice data contains a wake-up word, and if so, executing wake-up operation;
wherein the voice data comprises a set of speech segments within successive time periods.
5. The voice control method according to claim 4, wherein before the determining whether the voice data contains a wake-up word and, if the wake-up word is contained, performing the wake-up operation, the method further comprises:
confirming whether the sound energy of the voice data is larger than a preset energy threshold value;
and when the sound energy of the voice data is larger than a preset energy threshold value, determining whether the voice data contains a wake-up word, and if so, executing wake-up operation.
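Claim 5 gates wake-word detection on sound energy, so that silence or background hum never reaches the detector. A sketch, assuming integer PCM samples, a hypothetical energy threshold, and a plain text match standing in for a trained wake-word model:

```python
def sound_energy(samples: list[int]) -> float:
    # Mean squared amplitude as a simple per-frame energy measure.
    return sum(s * s for s in samples) / max(len(samples), 1)

def maybe_wake(samples: list[int], transcript: str,
               wake_word: str = "hello",
               threshold: float = 1000.0) -> bool:
    # Run wake-word detection only when the sound energy
    # exceeds the preset energy threshold (claim 5).
    if sound_energy(samples) <= threshold:
        return False
    return wake_word in transcript.lower()
```

The energy check is cheap enough to run continuously on the device, which is the point of placing it before the (more expensive) wake-word detection.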
6. A voice control apparatus, comprising:
the acquisition module is used for acquiring voice data;
the compression module is used for compressing the voice data;
the first transmission module is used for transmitting the compressed voice data to the central gateway and decompressing the compressed voice data to obtain first transmission data;
the identification module is used for transmitting the first transmission data from the central gateway to the cloud, identifying and analyzing the first transmission data, and obtaining a control instruction after analysis;
and the execution module is used for transmitting the control instruction to the central gateway, transmitting the control instruction to the intelligent equipment from the central gateway, and controlling the intelligent equipment to execute the operation corresponding to the control instruction.
7. The voice-controlled apparatus according to claim 6, further comprising a preprocessing module for preprocessing the voice data.
8. The voice control device according to claim 6, further comprising a wake-up module for determining whether the voice data contains a wake-up word, and if so, performing a wake-up operation; wherein the voice data comprises a set of speech segments within successive time periods.
9. A voice control apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910370848.6A CN111899730A (en) | 2019-05-06 | 2019-05-06 | Voice control method, device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111899730A true CN111899730A (en) | 2020-11-06 |
Family
ID=73169418
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910370848.6A Pending CN111899730A (en) | 2019-05-06 | 2019-05-06 | Voice control method, device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111899730A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103794215A (en) * | 2012-10-30 | 2014-05-14 | 上海斐讯数据通信技术有限公司 | Speech control-based handheld terminal, system and speech control-based control method |
CN103916431A (en) * | 2013-01-04 | 2014-07-09 | 云联(北京)信息技术有限公司 | Man-machine interaction system and method |
CN105050034A (en) * | 2015-08-25 | 2015-11-11 | 百度在线网络技术(北京)有限公司 | Method, device and system for implementing voice service based on Bluetooth connection |
CN105827493A (en) * | 2016-04-29 | 2016-08-03 | 镇江惠通电子有限公司 | Processing method and device of voice data and terminal equipment |
CN105912092A (en) * | 2016-04-06 | 2016-08-31 | 北京地平线机器人技术研发有限公司 | Voice waking up method and voice recognition device in man-machine interaction |
CN106157950A (en) * | 2016-09-29 | 2016-11-23 | 合肥华凌股份有限公司 | Speech control system and awakening method, Rouser and household electrical appliances, coprocessor |
CN107172662A (en) * | 2017-07-24 | 2017-09-15 | 京信通信系统(中国)有限公司 | A kind of communication means and device |
CN108428452A (en) * | 2018-03-14 | 2018-08-21 | 百度在线网络技术(北京)有限公司 | Terminal support and far field voice interactive system |
CN108566634A (en) * | 2018-03-30 | 2018-09-21 | 深圳市冠旭电子股份有限公司 | Method and apparatus for reducing continuous wake-up delay of a Bluetooth speaker, and Bluetooth speaker |
CN108648752A (en) * | 2018-04-17 | 2018-10-12 | 重庆物奇科技有限公司 | A kind of intelligent sound control system and its control method based on cloud processing |
CN109243497A (en) * | 2018-11-02 | 2019-01-18 | 钟祥博谦信息科技有限公司 | The control method and device that voice wakes up |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |

Application publication date: 20201106