CN110752973B - Terminal equipment control method and device and terminal equipment - Google Patents
- Publication number
- CN110752973B (application CN201810819351A)
- Authority
- CN
- China
- Prior art keywords
- spectrogram
- information
- control instruction
- voice information
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/2803—Home automation networks
- H04L12/2816—Controlling appliance services of a home automation network by calling their functionalities
- H04L12/282—Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
The invention is applicable to the technical field of communications, and provides a control method and apparatus for a terminal device, and the terminal device. The method comprises the following steps: collecting voice information input by a user; converting the collected voice information into spectrogram information; and converting the spectrogram information into a control instruction, and starting the functional mode of the terminal device corresponding to the control instruction based on that instruction. The invention adds a new voice-control mode to the smart-home terminal system: this mode converts voice information directly into a control-instruction format that the intelligent terminal system can recognize, eliminating the complex intermediate layer. Moreover, the method does not need to recognize any language information in the user's voice other than the device name and the corresponding abbreviation of the device, which greatly reduces the complexity of the model; the direct end-to-end conversion greatly reduces the amount of computation and also reduces the consumption of computer hardware resources, giving the method strong usability and practicability.
Description
Technical Field
The invention belongs to the technical field of communications, and in particular relates to a control method and apparatus for a terminal device, and to the terminal device.
Background
At present, with the continuous development of artificial-intelligence technology, the fields in which it is applied grow ever broader, and a wide variety of artificial-intelligence devices have entered daily life. Demand for such devices keeps increasing, and manual operation no longer meets the need for more intelligent, safe, and comfortable control. Accordingly, intelligent terminals for these devices have gained functional modes for controlling the devices by voice. However, the voice-control modes in existing intelligent terminals are complex: the voice must first be converted into text, the text must then be converted into an instruction format that the smart-home terminal system can recognize, and only then can the system control the device's state through that instruction format. This mode involves an intermediate conversion layer, entails a large amount of computation, and consumes considerable software and hardware resources.
Therefore, it is necessary to provide a solution to the above problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide a terminal-device control method and apparatus, and a terminal device, to solve the problem that prior-art control methods for terminal devices require a complex intermediate layer, which results in a large amount of computation and consumes considerable software and hardware resources.
A first aspect of an embodiment of the present invention provides a method for controlling a terminal device, including:
collecting voice information input by a user;
converting the collected voice information into spectrogram information;
and converting the spectrogram information into a control instruction, and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction.
Optionally, before starting the functional mode corresponding to the control instruction in the terminal device, the method further includes:
judging the complexity level of the functional mode of the terminal equipment;
if the complexity level is higher than a predetermined level, translating the spectrogram information into a logic-format instruction;
and if the complexity level is lower than or equal to the predetermined level, classifying the spectrograms and classifying each spectrogram into its corresponding category.
Optionally, if the complexity level is lower than or equal to a predetermined level, classifying the spectrograms, and classifying each spectrogram into a corresponding category, including:
and if the complexity level is lower than or equal to the predetermined level, classifying the spectrograms by using a neural network based on the spectrogram information, and classifying each spectrogram into its corresponding category.
Optionally, if the complexity level is higher than a predetermined level, translating the spectrogram information into a logic format instruction, including:
if the complexity level is higher than the preset level, generating a state vector by the received spectrogram information through an encoder;
outputting the state vector as a logic format instruction through a decoder.
Optionally, the outputting the state vector as a logic format instruction by a decoder includes:
inputting the state vector into a neural network to obtain a first output result;
decoding the first output result through a decoder of the neural network to obtain probability space distribution information to which the first output result obtained through decoding belongs;
and based on the probability spatial distribution information, taking, among the first output results, the output result occupying the largest area or volume in the probability space as the final output result, thereby realizing the translation from the state vector to the logic-format instruction.
Optionally, before responding to the control instruction, the method further includes:
detecting the volume of the voice information through a volume amplitude detector;
triggering a response to the control instruction when the volume is greater than a predetermined threshold.
A second aspect of an embodiment of the present invention provides a control apparatus for a terminal device, including:
the acquisition module is used for acquiring voice information input by a user;
the conversion module is used for converting the collected voice information into spectrogram information;
and the control module is used for converting the spectrogram information into a control instruction and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction.
A third aspect of embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of the first aspect.
In the embodiment of the invention, voice information input by a user is first collected; the collected voice information is then converted into spectrogram information; and finally the spectrogram information is converted into a control instruction, based on which the corresponding functional mode of the terminal device is started. The invention adds a new voice-control mode to the smart-home terminal system: this mode converts voice information directly into a control-instruction format that the intelligent terminal system can recognize, eliminating the complex intermediate layer. Moreover, the method does not need to recognize any language information in the user's voice other than the device name and the corresponding abbreviation of the device, which greatly reduces the complexity of the model; the direct end-to-end conversion greatly reduces the amount of computation and also reduces the consumption of computer hardware resources, giving the method strong usability and practicability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a control method for a terminal device according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating an implementation of a control method for a terminal device according to a second embodiment of the present invention;
fig. 3 is a schematic flow chart illustrating an implementation of a control method for a terminal device according to an embodiment of the present invention;
fig. 4 is a block diagram of a control apparatus of a terminal device according to a third embodiment of the present invention;
fig. 5 is a schematic diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Fig. 1 shows a schematic implementation flow diagram of a control method for a terminal device according to an embodiment of the present invention. As shown in fig. 1, the method for controlling the terminal device may specifically include the following steps S101 to S103.
Step S101: and collecting voice information input by a user.
The execution subject of this embodiment is a terminal device, which may be a smart-home device such as a smart television or an air conditioner. The terminal device includes an audio collector, which automatically collects the user's voice information. Note that the audio collector collects only the voice information corresponding to the device name or the device abbreviation, and does not collect other voice commands. For example, "television, power on" is voice information that the audio collector collects, whereas "I'm coming right now" belongs to other voice instructions, which the audio collector does not collect.
Step S102: and converting the collected voice information into spectrogram information.
The voice information collected by the audio collector is natural human language, which the computer cannot process directly. The terminal device therefore processes the voice information and generates a spectrogram, a data form that the system can recognize.
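As an illustrative sketch (not the patent's implementation), the voice-to-spectrogram step can be performed with a short-time Fourier transform; the frame length, hop size, and log compression below are assumptions:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D audio signal into a log-magnitude spectrogram.

    Each column is the FFT magnitude of one Hann-windowed frame, so the
    result is a (frequency_bins x frames) matrix that later stages can
    treat as image-like input.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return np.log1p(mag).T                     # log-compress; bins x frames

# Synthetic one-second "voice" sample at 8 kHz: a 440 Hz tone.
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(audio)
print(spec.shape)  # (129, 61): rfft of 256 samples gives 129 bins
```

The tone's energy concentrates near bin 14 (440 Hz / 31.25 Hz per bin), which is the kind of frequency-over-time pattern the downstream instruction conversion consumes.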
Step S103: and converting the spectrogram information into a control instruction, and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction.
The spectrogram information is converted into a control instruction by an instruction conversion module, and the functional mode of the terminal device corresponding to the control instruction is started based on that instruction. Preferably, before responding to the control instruction, the volume of the voice information is detected by a volume-amplitude detector, and the response to the control instruction is triggered only when the volume exceeds a predetermined threshold. For example, a volume-amplitude detector is added to each device and given a threshold; a device responds with the corresponding functional mode only when the volume it receives is above that threshold, which prevents false triggering of devices of the same type within a certain range while a device is executing an instruction. The control instruction in this embodiment is a condensed, abstract form of natural language. Its formulation rule may follow the examples below, but is not limited to this form; an appropriate logic format may be defined according to the specific situation at implementation time. For example, in the unconditional case, the logic format corresponding to "power on" is "launch power"; in the conditional case, the logic format corresponding to "power on" is "launch power register=tv".
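The volume-gating step can be sketched as follows; the RMS amplitude measure and the threshold value are illustrative assumptions rather than the patent's specification:

```python
import numpy as np

def should_respond(signal, threshold=0.1):
    """Return True only when the signal's RMS amplitude exceeds the
    detector threshold, so nearby devices of the same type ignore
    speech that is not loud enough to be directed at them."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return bool(rms > threshold)

loud = 0.5 * np.sin(np.linspace(0, 100, 8000))    # user close to the device
faint = 0.01 * np.sin(np.linspace(0, 100, 8000))  # distant background speech
print(should_respond(loud), should_respond(faint))  # True False
```

In practice the threshold would be calibrated per device and microphone; here 0.1 is an arbitrary placeholder.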
For example, consider a scenario in which a user wants to turn on the television by voice. In the prior art, the user speaks "turn on", the voice information is converted into text information, the text is converted into an instruction format the television can recognize, and only then is the television's state controlled through that format. In the embodiment of the invention, the user directly inputs the voice information "turn on"; the terminal device converts the collected voice into spectrogram information, converts the spectrogram information into a control instruction, and responds to the control instruction to turn on the television. Because this embodiment completes the conversion from voice information to machine instruction (control instruction) directly, the complexity and computation of the model are greatly reduced, as is the consumption of software and hardware resources.
In the embodiment of the invention, voice information input by a user is first collected; the collected voice information is then converted into spectrogram information; and finally the spectrogram information is converted into a control instruction, based on which the corresponding functional mode of the terminal device is started. The invention adds a new voice-control mode to the smart-home terminal system: this mode converts voice information directly into a control-instruction format that the intelligent terminal system can recognize, eliminating the complex intermediate layer. Moreover, the method does not need to recognize any language information in the user's voice other than the device name and the corresponding abbreviation of the device, which greatly reduces the complexity of the model; the direct end-to-end conversion greatly reduces the amount of computation and also reduces the consumption of computer hardware resources, giving the method strong usability and practicability.
Example two
On the basis of the first embodiment, fig. 2 shows a schematic flow chart of an implementation of the control method for the terminal device provided by the second embodiment of the present invention:
step S201: and collecting voice information input by a user.
Step S202: and converting the collected voice information into spectrogram information.
The implementation processes of steps S201 and S202 are similar to the implementation processes of S101 and S102, respectively, and are not described herein again.
Step S203: judging the complexity level of the functional mode of the terminal equipment, and if the complexity level is higher than a preset level, executing a step S204; if the complexity level is lower than or equal to the predetermined level, step S205 is performed.
In this embodiment, the complexity level of the terminal device's functional mode is determined by the device type. For example, when the current terminal device is judged to be one with relatively few function-control instructions, such as a washing machine or a smart curtain, the complexity level of its functional mode is judged to be low, e.g. level 1 or level 2; when it is judged to be a device with many function-control instructions, such as a television or an air conditioner, the complexity level is judged to be high, e.g. level 3. This embodiment describes only the case of three complexity levels: levels 1 and 2 represent the low levels and level 3 the high level. In a specific embodiment, the predetermined level is level 2: when the complexity level of the functional mode of the terminal device (e.g. a television) is level 3 or above, the spectrogram information is translated into a logic-format instruction; when the complexity level (e.g. for a washing machine) is level 2 or below, the spectrograms are classified and each spectrogram is assigned to its corresponding category.
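The dispatch by complexity level can be sketched as follows, following the embodiment's example mapping (washing machine and smart curtain low, television and air conditioner high); the device table and handler labels are illustrative assumptions:

```python
# Dispatch by functional-mode complexity, following the embodiment's
# example mapping: levels 1-2 = few instructions, level 3 = many.
COMPLEXITY = {"washing machine": 1, "smart curtain": 2,
              "television": 3, "air conditioner": 3}
PREDETERMINED_LEVEL = 2

def handle(device, spectrogram):
    """Route the spectrogram to translation (step S204) or
    classification (step S205) based on the device's level."""
    if COMPLEXITY[device] > PREDETERMINED_LEVEL:
        return ("translate", spectrogram)  # encoder/decoder to logic format
    return ("classify", spectrogram)       # neural-network classification

print(handle("television", "spec")[0], handle("washing machine", "spec")[0])
# translate classify
```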
Step S204: and translating the spectrogram information into a logic format instruction.
Specifically, translating the spectrogram information into a logic format instruction includes:
step S301: and generating a state vector by the encoder according to the received spectrogram information.
The spectrogram information serves as the input data, from which the encoder generates a state vector; the state vector may be a row vector or a column vector, which is not limited here. Optionally, an attention mechanism is added to the encoding process to prevent the loss of information when sentences in the voice information are long.
Step S302: outputting the state vector as a logic format instruction through a decoder.
Optionally, outputting the state vector as a logic format instruction through a decoder specifically includes:
and A1, inputting the state vector into a neural network to obtain a first output result.
A2, decoding the first output result through a decoder of the neural network, and acquiring probability space distribution information to which the decoded first output result belongs.
And A3, based on the probability space distribution information, taking the output result with the largest area or volume on the probability space in the first output result as the final output result, so as to realize the translation of the state vector to the logic format instruction.
Steps A1, A2, and A3 mainly realize outputting the state vector generated by the encoder as a logic-format instruction through the decoder. The decoding process feeds the input state vector through a neural network, determines, according to the probability spatial distribution information to which the decoded first output result belongs, the output result occupying the largest region of the probability space, and takes it as the final output result, thereby translating the state vector into the logic-format instruction.
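The encoder-decoder flow of steps S301, S302 and A1 to A3 can be sketched as a toy sequence model. The weights here are random and the token vocabulary (borrowed from the "launch power" example above) is invented, so the sketch shows only the shape of the computation (state vector in, greedy argmax over a probability distribution out), not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["launch", "power", "register=tv", "<eos>"]  # illustrative tokens

# Encoder (S301): mean-pool spectrogram frames, project to a state vector.
W_enc = rng.normal(size=(129, 16))

def encode(spec):
    return np.tanh(spec.mean(axis=1) @ W_enc)  # state vector, shape (16,)

# Decoder (S302 / A1-A3): at each step, map the state to a probability
# distribution over tokens (A2) and greedily keep the most probable one (A3).
W_dec = rng.normal(size=(16, len(VOCAB)))

def decode(state, max_len=4):
    out = []
    for _ in range(max_len):
        logits = state @ W_dec
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                # probability distribution (A2)
        tok = VOCAB[int(np.argmax(probs))]  # largest-probability result (A3)
        if tok == "<eos>":
            break
        out.append(tok)
        state = np.tanh(state + 0.1)        # toy state update between steps
    return out

spec = rng.random((129, 61))                # stand-in spectrogram
print(decode(encode(spec)))                 # some token sequence from VOCAB
```

A real implementation would use trained recurrent or attention-based encoder and decoder networks; this only demonstrates the state-vector hand-off between the two stages.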
Step S205: and classifying the spectrograms, and classifying each spectrogram into a corresponding category.
Optionally, based on the spectrogram information, classifying the spectrograms by using a neural network, and classifying each spectrogram into a corresponding category.
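For low-complexity devices, the classification step can be sketched with a nearest-centroid classifier standing in for the unspecified neural network; the command categories and template vectors are illustrative assumptions:

```python
import numpy as np

# Each command category gets one template vector; a new spectrogram
# (here flattened to a small feature vector) is assigned to the nearest
# centroid. The patent uses a neural network; nearest-centroid is a
# stand-in showing the mapping "spectrogram -> category -> instruction".
centroids = {"start": np.full(8, 1.0), "stop": np.full(8, -1.0)}

def classify(spec_vec):
    dists = {c: np.linalg.norm(spec_vec - v) for c, v in centroids.items()}
    return min(dists, key=dists.get)  # category with the closest template

print(classify(np.full(8, 0.9)))   # start
print(classify(np.full(8, -0.7)))  # stop
```

Each category then maps directly to one control instruction, which is why this simpler path suffices for devices with few functional modes.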
Step S206: and converting the spectrogram information into a control instruction, and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction.
The terminal device executes the functional mode of the device according to the logic-format instruction output by the decoding module.
In the embodiment of the invention, terminal devices are additionally processed differently according to the complexity of their functional modes, and the conversion from voice instruction to machine instruction is completed directly with an end-to-end model, further reducing the complexity and computation of the model and the consumption of software and hardware resources.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
EXAMPLE III
Referring to fig. 4, there is shown a block diagram of a control apparatus of a terminal device according to the third embodiment of the present invention; for convenience of description, only the parts related to this embodiment are shown. The control device 40 of the terminal device includes: an acquisition module 41, a conversion module 42, and a control module 43. The specific functions of each module are as follows:
an acquisition module 41, configured to acquire voice information input by a user;
a conversion module 42, configured to convert the collected voice information into spectrogram information;
and the control module 43 is configured to convert the spectrogram information into a control instruction, and start a function mode corresponding to the control instruction in the terminal device based on the control instruction.
Optionally, the control device 40 of the terminal device further includes:
the judging module is used for judging the complexity level of the functional mode of the terminal equipment;
the translation module is used for translating the spectrogram information into a logic format instruction if the complexity level is higher than a preset level;
and the classification module is used for classifying the spectrogram if the complexity grade is lower than or equal to a preset grade and classifying each spectrogram into a corresponding class.
Optionally, the translation module includes:
the encoding unit is configured to generate a state vector from the received spectrogram information through an encoder if the complexity level is higher than a predetermined level;
and the output unit is used for outputting the state vector as a logic format instruction through a decoder.
Optionally, the classification module comprises:
and the classification unit is configured to classify the spectrograms by using a neural network based on the spectrogram information, and to classify each spectrogram into its corresponding category, if the complexity level is lower than or equal to a predetermined level.
Optionally, the output unit includes:
a generating subunit, configured to generate, by an encoder, a state vector for the received spectrogram information if the complexity level is higher than a predetermined level;
and the output subunit is used for outputting the state vector as a logic format instruction through a decoder.
Optionally, the output subunit includes:
the input subunit is used for inputting the state vector into a neural network to obtain a first output result;
the decoding subunit is configured to decode the first output result through a decoder of the neural network, and acquire probability spatial distribution information to which the decoded first output result belongs;
and the translation unit is used for taking the output result with the largest area or volume on the probability space in the first output result as a final output result based on the probability space distribution information so as to realize the translation from the state vector to the logic format instruction.
Optionally, the control device 40 of the terminal device further includes:
the detection module is used for detecting the volume of the voice information through a volume amplitude detector;
and the triggering module is used for triggering and responding to the control instruction when the volume is greater than a preset threshold value.
In the embodiment of the invention, voice information input by a user is first collected; the collected voice information is then converted into spectrogram information; and finally the spectrogram information is converted into a control instruction, based on which the corresponding functional mode of the terminal device is started. The invention adds a new voice-control mode to the smart-home terminal system: this mode converts voice information directly into a control-instruction format that the intelligent terminal system can recognize, eliminating the complex intermediate layer. Moreover, the method does not need to recognize any language information in the user's voice other than the device name and the corresponding abbreviation of the device, which greatly reduces the complexity of the model; the direct end-to-end conversion greatly reduces the amount of computation and also reduces the consumption of computer hardware resources, giving the method strong usability and practicability.
Example four
Fig. 5 is a schematic diagram of a terminal device according to the fourth embodiment of the present invention; for convenience of illustration, only the parts relevant to this embodiment are shown. As shown in fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and executable on the processor 50, such as a control-method program for a terminal device. When executing the computer program 52, the processor 50 implements the steps in the embodiments of the control method for a terminal device described above, such as steps S101 to S103 shown in fig. 1. Alternatively, when executing the computer program 52, the processor 50 implements the functions of the modules in the above device embodiments, such as the functions of modules 41 to 43 shown in fig. 4.
Illustratively, the computer program 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 52 in the terminal device 5. For example, the computer program 52 may be divided into an acquisition module, a conversion module and a control module, and the specific functions of each module are as follows:
the acquisition module is used for acquiring voice information input by a user;
the conversion module is used for converting the collected voice information into spectrogram information;
and the control module is used for converting the spectrogram information into a control instruction and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction.
The terminal device 5 may be a desktop computer, a notebook, a palm computer, or other computing devices. The terminal device may include, but is not limited to, a processor 50, a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of a terminal device and is not limiting and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input output devices, network access devices, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or memory of the terminal device 5. The memory 51 may also be an external storage device attached to the terminal device 5, such as a plug-in hard disk, a SmartMedia Card (SMC), a Secure Digital (SD) card, or a flash card. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store the computer program and the other programs and data required by the terminal device, and may also be used to temporarily store data that has been or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be distributed among different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice provide that computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Claims (8)
1. A control method of a terminal device, comprising:
collecting voice information input by a user;
converting the collected voice information into spectrogram information;
converting the spectrogram information into a control instruction, and starting a functional mode corresponding to the control instruction in the terminal device based on the control instruction;
wherein, before starting the functional mode corresponding to the control instruction in the terminal device, the method further comprises:
judging the complexity level of the functional mode of the terminal device;
if the complexity level is higher than a preset level, translating the spectrogram information into a logic-format instruction;
and if the complexity level is lower than or equal to the preset level, classifying the spectrograms and assigning each spectrogram to a corresponding category.
2. The method of claim 1, wherein, if the complexity level is lower than or equal to the preset level, classifying the spectrograms and assigning each spectrogram to a corresponding category comprises:
if the complexity level is lower than or equal to the preset level, classifying the spectrograms with a neural network based on the spectrogram information, and assigning each spectrogram to a corresponding category.
3. The method of claim 1, wherein translating the spectrogram information into a logic-format instruction if the complexity level is higher than the preset level comprises:
if the complexity level is higher than the preset level, generating a state vector from the received spectrogram information through an encoder; and
outputting the state vector as a logic-format instruction through a decoder.
4. The control method of claim 3, wherein outputting the state vector as a logic-format instruction through a decoder comprises:
inputting the state vector into a neural network to obtain a first output result;
decoding the first output result through a decoder of the neural network to obtain the probability-space distribution information to which the decoded first output result belongs; and
based on the probability-space distribution information, taking the output result occupying the largest area or volume in the probability space among the first output results as the final output result, so as to translate the state vector into the logic-format instruction.
5. The method for controlling a terminal device according to any one of claims 1 to 4, wherein before the step of starting the functional mode corresponding to the control instruction in the terminal device based on the control instruction, the method further comprises:
detecting the volume of the voice information through a volume amplitude detector;
triggering a response to the control instruction when the volume is greater than a predetermined threshold.
6. A control apparatus of a terminal device, characterized by comprising:
the acquisition module is used for acquiring voice information input by a user;
the conversion module is used for converting the collected voice information into spectrogram information;
the control module is used for converting the spectrogram information into a control instruction and starting a functional mode corresponding to the control instruction in the terminal equipment based on the control instruction;
the judging module is used for judging the complexity level of the functional mode of the terminal equipment;
the translation module is used for translating the spectrogram information into a logic format instruction if the complexity level is higher than a preset level;
and the classification module is used for classifying the spectrograms if the complexity level is lower than or equal to the preset level, and assigning each spectrogram to a corresponding category.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
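For illustration only, and not as part of the granted claims, the complexity branch recited in claims 1, 3 and 4 — an encoder producing a state vector, a decoder projecting it onto a probability distribution over instructions, and argmax selection of the most probable one — can be sketched with toy numpy linear maps; all shapes, weights, and the instruction vocabulary below are assumptions, not the patent's actual network:

```python
import numpy as np

def encode(spectrogram, w_enc):
    """Encoder (cf. claim 3): collapse the spectrogram into a fixed-length
    state vector, here by mean-pooling over time followed by a linear map."""
    return w_enc @ spectrogram.mean(axis=1)

def decode(state, w_dec, vocab):
    """Decoder (cf. claim 4): project the state vector onto a probability
    distribution over instruction tokens and keep the most probable one."""
    logits = w_dec @ state
    probs = np.exp(logits - logits.max())  # softmax, numerically stabilized
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))], probs

def handle(spectrogram, complexity, preset, w_enc, w_dec, vocab):
    """Complexity branch (cf. claim 1): translate into a logic-format
    instruction above the preset level, classify into a category otherwise
    (both paths share the same toy network here for brevity)."""
    token, _ = decode(encode(spectrogram, w_enc), w_dec, vocab)
    return ("translate" if complexity > preset else "classify", token)
```

In this sketch the "largest area or volume in the probability space" of claim 4 is reduced to a plain argmax over a softmax distribution, which is the simplest reading of that selection step.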
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810819351.3A CN110752973B (en) | 2018-07-24 | 2018-07-24 | Terminal equipment control method and device and terminal equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810819351.3A CN110752973B (en) | 2018-07-24 | 2018-07-24 | Terminal equipment control method and device and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110752973A CN110752973A (en) | 2020-02-04 |
CN110752973B true CN110752973B (en) | 2020-12-25 |
Family
ID=69275474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810819351.3A Active CN110752973B (en) | 2018-07-24 | 2018-07-24 | Terminal equipment control method and device and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110752973B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113905110B (en) * | 2020-06-18 | 2022-11-18 | Oppo广东移动通信有限公司 | Display screen control method and device, computer equipment and storage medium |
CN113053383B (en) * | 2021-04-14 | 2021-09-07 | 浙江华创视讯科技有限公司 | Information processing method, device and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360187A (en) * | 2011-05-25 | 2012-02-22 | 吉林大学 | Chinese speech control system and method with mutually interrelated spectrograms for driver |
CN102710838A (en) * | 2012-04-25 | 2012-10-03 | 华为技术有限公司 | Volume regulation method and device as well as electronic equipment |
CN103745722A (en) * | 2014-02-10 | 2014-04-23 | 上海金牌软件开发有限公司 | Voice interaction smart home system and voice interaction method |
CN105096936A (en) * | 2014-05-15 | 2015-11-25 | 哈尔滨海能达科技有限公司 | Push-to-talk service control method and apparatus |
CN105183081A (en) * | 2015-09-07 | 2015-12-23 | 北京君正集成电路股份有限公司 | Voice control method of intelligent glasses and intelligent glasses |
CN106328122A (en) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Voice identification method using long-short term memory model recurrent neural network |
CN106898350A (en) * | 2017-01-16 | 2017-06-27 | 华南理工大学 | A kind of interaction of intelligent industrial robot voice and control method based on deep learning |
CN107408111A (en) * | 2015-11-25 | 2017-11-28 | 百度(美国)有限责任公司 | End-to-end speech recognition |
CN107481718A (en) * | 2017-09-20 | 2017-12-15 | 广东欧珀移动通信有限公司 | Audio recognition method, device, storage medium and electronic equipment |
CN107564516A (en) * | 2016-07-01 | 2018-01-09 | 北京新唐思创教育科技有限公司 | Control method for playing back, device and the intelligent tutoring system of courseware |
CN107680229A (en) * | 2017-10-23 | 2018-02-09 | 西安科技大学 | Gate control system and its control method based on phonetic feature and recognition of face |
CN108198547A (en) * | 2018-01-18 | 2018-06-22 | 深圳市北科瑞声科技股份有限公司 | Sound end detecting method, device, computer equipment and storage medium |
CN108305617A (en) * | 2018-01-31 | 2018-07-20 | 腾讯科技(深圳)有限公司 | The recognition methods of voice keyword and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1692409A (en) * | 2002-11-05 | 2005-11-02 | Koninklijke Philips Electronics N.V. | Spectrogram reconstruction by means of a codebook |
- 2018-07-24: CN CN201810819351.3A patent/CN110752973B/en, status Active
Non-Patent Citations (2)
Title |
---|
Speaker Verification based on Comparing Normalized Spectrograms; Jia-Guu Leu et al.; 2011 Carnahan Conference on Security Technology; 2011-10-21; full text *
Speech Recognition of Two-Character Chinese Words via Fourier Transform of Spectrograms; Pan Di et al.; Modern Electronics Technique; 2017-08-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110752973A (en) | 2020-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108615073B (en) | Image processing method and device, computer readable storage medium and electronic device | |
JP6601470B2 (en) | NATURAL LANGUAGE GENERATION METHOD, NATURAL LANGUAGE GENERATION DEVICE, AND ELECTRONIC DEVICE | |
CN108376543B (en) | Control method, device, equipment and storage medium for electrical equipment | |
CN106887225B (en) | Acoustic feature extraction method and device based on convolutional neural network and terminal equipment | |
US11295760B2 (en) | Method, apparatus, system and storage medium for implementing a far-field speech function | |
CN111081217B (en) | Voice wake-up method and device, electronic equipment and storage medium | |
CN106997342B (en) | Intention identification method and device based on multi-round interaction | |
JP2019128938A (en) | Lip reading based voice wakeup method, apparatus, arrangement and computer readable medium | |
CN110752973B (en) | Terminal equipment control method and device and terminal equipment | |
CN111145732B (en) | Processing method and system after multi-task voice recognition | |
CN111402861A (en) | Voice recognition method, device, equipment and storage medium | |
CN111199733A (en) | Multi-stage recognition voice awakening method and device, computer storage medium and equipment | |
CN111508472B (en) | Language switching method, device and storage medium | |
CN108665900B (en) | Cloud wake-up method and system, terminal and computer readable storage medium | |
CN113470646B (en) | Voice awakening method, device and equipment | |
CN112634417B (en) | Method, device and equipment for generating role animation and storage medium | |
CN113497995B (en) | Microphone array control method and device, electronic equipment and computer storage medium | |
CN110750295B (en) | Information processing method, device, electronic equipment and storage medium | |
CN116978368A (en) | Wake-up word detection method and related device | |
CN115312028A (en) | Speech recognition method, speech recognition device, computer-readable storage medium and computer equipment | |
CN113534691A (en) | Control system and method of terminal equipment | |
US11651246B2 (en) | Question inference device | |
US20220178814A1 (en) | Method for calculating a density of stem cells in a cell image, electronic device, and storage medium | |
CN108564950A (en) | Method, intelligent terminal and the computer storage media of speech-to-text | |
CN116959442B (en) | Chip for intelligent switch panel and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 516006 TCL Technology Building, No. 17, Huifeng Third Road, Zhongkai High-tech Zone, Huizhou City, Guangdong Province; Applicant after: TCL Technology Group Co., Ltd. |
Address before: 516006 No. nineteen District, Zhongkai Hi-tech Development Zone, Huizhou, Guangdong Province; Applicant before: TCL RESEARCH AMERICA Inc. |
GR01 | Patent grant | ||