CN110580904A - Method and device for controlling small program through voice, electronic equipment and storage medium - Google Patents

Method and device for controlling small program through voice, electronic equipment and storage medium

Info

Publication number
CN110580904A
CN110580904A (application CN201910931446.9A)
Authority
CN
China
Prior art keywords
instruction
applet
voice
preset
command
Prior art date
Legal status
Pending
Application number
CN201910931446.9A
Other languages
Chinese (zh)
Inventor
贺学焱
欧阳能钧
Current Assignee
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910931446.9A
Publication of CN110580904A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

The application discloses a method, an apparatus, an electronic device, and a storage medium for controlling an applet by voice, and relates to speech recognition technology. The specific implementation scheme is as follows: receiving a voice instruction for controlling an applet; recognizing the voice instruction according to a preset instruction set associated with the applet, and determining a control instruction; and sending the control instruction to the applet so that the applet responds to it. Because the user's instruction is recognized through the preset instruction set associated with the applet, the control instruction corresponding to the applet can still be recognized accurately even when the applet involves personalized vocabulary, so the applet can be controlled accurately and the problem of control failure caused by instruction misrecognition is solved.

Description

Method and device for controlling small program through voice, electronic equipment and storage medium
Technical Field
The present disclosure relates to computer technology, and more particularly, to speech recognition technology.
Background
At present, many applet platforms exist, and developers can develop applets and publish them on these platforms. To make these applets easier to use, a voice-controlled way of operating them may be provided.
In the prior art, a voice instruction can be recognized and the recognition result sent to an applet, so that the applet executes a corresponding action according to the result.
However, the names and operations of many applets contain personalized vocabulary, so the applets cannot be operated according to the speech recognition result in the prior art.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for controlling an applet by voice, so as to solve the technical problem in the prior art that an applet cannot be operated according to a speech recognition result when its name or operations contain personalized vocabulary.
A first aspect of the present disclosure provides a method for controlling an applet by voice, including:
Receiving a voice instruction for controlling an applet;
Recognizing the voice instruction according to a preset instruction set associated with the applet, and determining a control instruction;
And sending the control instruction to the applet so that the applet responds to the control instruction.
Because the voice instruction issued by the user is recognized through the instruction set associated with the applet, a control instruction corresponding to the applet can be obtained, and the applet can be operated accurately through that control instruction.
In an optional embodiment, the method further comprises: receiving the preset instruction set and associating it with the applet. By associating the applet with the preset instruction set, the preset instruction set can be acquired when the applet is controlled by voice, and the user's voice instruction can then be recognized based on that set.
In an alternative embodiment, the method further comprises:
When the applet is started, adding preset control instructions to a language model according to the preset instruction set, and setting the preset control instructions as the instructions with the highest confidence in the language model;
The recognizing of the voice instruction according to a preset instruction set associated with the applet and the determining of a control instruction comprise:
Recognizing the voice instruction, and determining the control instruction corresponding to the voice instruction according to the recognition result and the confidence of the instructions in the language model.
By adding the instructions included in the preset instruction set to the language model, the user's voice instruction can be recognized against both the applet-specific instructions and the common instructions, yielding a better recognition result.
Further, because the confidence of the instructions included in the preset instruction set is set to the highest, the voice instruction can be recognized by combining the confidence of each instruction in the language model, and the instruction corresponding to the applet is preferentially taken as the control instruction, so the applet can be controlled more accurately.
In an optional implementation manner, the recognizing of the voice instruction and the determining of the control instruction according to the recognition result and the confidence of the instructions in the language model include:
Recognizing the voice instruction to obtain syllable information;
And determining corresponding matching instructions in the language model according to the syllable information, and determining the instruction with the highest confidence among the matching instructions as the control instruction.
Because the confidence of the instructions included in the preset instruction set is set to the highest in advance, the instruction with the highest confidence among the matching instructions is determined as the control instruction; that is, the instructions in the preset instruction set, which correspond to the applet, are used preferentially, so the applet can be controlled more accurately.
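The matching step described above can be sketched as follows. The syllable strings, instruction entries, and the 1-to-5 confidence scale are illustrative assumptions, not taken from the patent text.

```python
def pick_control_instruction(syllables, language_model):
    """Return the matching instruction with the highest confidence, or None."""
    matches = [entry for entry in language_model
               if entry["syllables"] == syllables]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry["confidence"])["text"]

language_model = [
    {"text": "open menu",      "syllables": "kai-cai-dan", "confidence": 4},
    # Added from the applet's preset instruction set, so highest confidence:
    {"text": "start ordering", "syllables": "kai-cai-dan", "confidence": 5},
]

print(pick_control_instruction("kai-cai-dan", language_model))  # -> start ordering
```

Both entries match the same syllables, but the preset instruction wins because its confidence was set to the maximum when the applet opened.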
In an alternative embodiment, the receiving voice instructions for controlling an applet comprises:
Receiving, through a voice interaction system, a voice instruction for controlling the applet.
In an alternative embodiment, the preset control instructions in the language model are cleared when the applet is closed.
By clearing the instructions associated with the applet from the language model when the applet is closed, a voice instruction intended to control another program cannot be misrecognized as an instruction associated with the applet, avoiding interference with the control of that other program.
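The open/close lifecycle described in this embodiment can be sketched as below; the class and method names are assumptions for illustration.

```python
class LanguageModel:
    """Toy language model: instruction text mapped to a confidence score."""

    def __init__(self):
        self.entries = {}  # instruction text -> confidence

    def add_preset(self, instructions, confidence=5):
        # Called when the applet starts: merge its preset instructions
        # with the highest confidence.
        for text in instructions:
            self.entries[text] = confidence

    def clear_preset(self, instructions):
        # Called when the applet closes: remove its instructions so they
        # cannot interfere with recognition for other programs.
        for text in instructions:
            self.entries.pop(text, None)

lm = LanguageModel()
preset = ["start ordering", "Maidonghao"]
lm.add_preset(preset)    # applet opened
lm.clear_preset(preset)  # applet closed
print(lm.entries)        # -> {}
```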
A second aspect of the present disclosure provides an apparatus for controlling an applet by voice, comprising:
A receiving module, configured to receive a voice instruction for controlling an applet;
A recognition module, configured to recognize the voice instruction according to a preset instruction set associated with the applet and determine a control instruction;
A sending module, configured to send the control instruction to the applet so that the applet responds to the control instruction.
a third aspect of the present disclosure provides an electronic device that controls an applet by voice, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein
The memory stores instructions executable by the at least one processor to cause the at least one processor to perform any of the methods for controlling an applet by voice according to the first aspect.
A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the methods for controlling an applet by voice according to the first aspect.
According to the method, the apparatus, the electronic device, and the storage medium for controlling an applet by voice, the user's instruction is recognized through the preset instruction set associated with the applet. Even when the applet involves personalized vocabulary, the control instruction corresponding to the applet can still be recognized accurately because of the preset instruction set, so the applet can be controlled accurately and the problem of control failure caused by instruction misrecognition is solved.
Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a diagram illustrating a system architecture according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating a method for controlling an applet by speech according to an exemplary embodiment of the present application;
FIG. 2A is a schematic view of a first interface shown in an exemplary embodiment of the present application;
FIG. 2B is a second interface schematic shown in an exemplary embodiment of the present application;
FIG. 2C is a third interface schematic shown in an exemplary embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for controlling an applet by speech according to another exemplary embodiment of the present application;
FIG. 3A is a schematic diagram illustrating a voice command recognition process according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for controlling applets by speech according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram of an apparatus for controlling an applet by speech according to another exemplary embodiment of the present application;
FIG. 6 is a block diagram of an electronic device that controls an applet through speech according to another exemplary embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted in the following for clarity and conciseness.
FIG. 1 is a diagram illustrating a system architecture according to an exemplary embodiment of the present application.
As shown in fig. 1, the system may include at least one server 11 and may further include at least one terminal device 12.
A network is provided between the server 11 and the terminal device 12, and may comprise various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may interact with the server 11 over a network using a terminal device 12. Various applications may be installed on the terminal device 12, such as voice interaction applications, applications for using applets, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal device 12 may be hardware or software. When the terminal device 12 is hardware, it may be any electronic device capable of processing speech and controlling an applet according to the processing result, including but not limited to a smartphone, a tablet computer, a laptop computer, or a desktop computer. When the terminal device 12 is software, it can be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 11 may be a server that provides various services, such as a background server supporting the information displayed on the terminal device 12. The background server may analyze and process received data such as a voice instruction, and feed a processing result back to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
It should be noted that the method for controlling an applet through voice provided in the embodiments of the present application may be executed by the terminal device 12 or by the server 11. Accordingly, the apparatus for controlling an applet by voice may be provided in the terminal device 12 or in the server 11.
At present, a user can start an applet on the terminal side, and if the platform where the applet is located is associated with a voice interaction program, the applet can be controlled by voice. For example, after the user opens the applet, the voice control function of the applet may also be enabled. The user can speak a voice instruction; the voice interaction program recognizes it to obtain a control instruction and sends the control instruction to the applet, so that the applet responds to it.
However, since many applets have personalized names or vocabulary, an instruction spoken by the user may be recognized as a common vocabulary item, causing the applet to fail to respond. For example, an applet may be named "Maidonghao"; when the user speaks this name, the voice interaction program recognizes it as a homophonous common phrase instead.
According to the scheme provided by the embodiments of the application, an instruction set corresponding to the applet is associated in advance. When the user's voice instruction is recognized, it is recognized according to the preset instruction set associated with the applet, so the control instruction corresponding to the applet can be identified, solving the problem of failure when controlling an applet by voice.
Fig. 2 is a flowchart illustrating a method for controlling an applet through speech according to an exemplary embodiment of the present application.
As shown in fig. 2, the method for controlling an applet through voice according to this embodiment includes:
Step 201: receiving a voice instruction for controlling an applet.
The method provided by the present embodiment may be executed by an electronic device with computing capability, for example, may be a terminal device as shown in fig. 1, and may also be a server as shown in fig. 1.
Specifically, if the terminal device executes the method provided in this embodiment, a voice interaction program may be provided in the terminal device to recognize the user's voice instruction and control the applet according to the recognition result.
Further, if the server executes the method provided by this embodiment, the server may obtain the user's voice instruction through the terminal device. A front-end program of the server may be provided in the terminal device, and the voice instruction transmitted to the background server through this program. The background server can recognize the voice instruction and send the recognition result to the background server corresponding to the applet so as to control the applet.
In practical use, an applet (Mini Program) is an application that can be used without download and installation. Developers can develop these applets relying on an applet platform, in which users can then use them.
The user can open an applet and enable the function of controlling it through voice. For example, an operable control for starting the voice control function may be provided in the page of the applet, and the user may operate this control to enable the applet's voice control.
Specifically, when the user terminal detects that the voice control function needs to be started, the voice interaction program can be launched, so that the user's instruction is recognized through the voice interaction program and the applet is then controlled.
The voice interaction program may be a program provided by the platform carrying the applet, or a program independent from the platform; this embodiment does not limit it.
Fig. 2A is a schematic view of a first interface according to an exemplary embodiment of the present application.
Fig. 2B is a second interface diagram according to an exemplary embodiment of the present application.
As shown in fig. 2A, the user may operate the terminal device to start an applet, so that the terminal device displays the operation interface corresponding to the applet.
A button for starting the voice control function can be provided in the operation interface, and the user can operate this button to call up the voice interaction program.
An applet interface with the voice control function turned on may be as shown in fig. 2B.
The user can then issue a voice instruction for controlling the currently opened applet. In one embodiment, the voice instruction can be received and recognized by the terminal device; in another embodiment, the terminal device can upload the received voice instruction to the server, so that the server receives and recognizes it.
Step 202: recognizing the voice instruction according to a preset instruction set associated with the applet, and determining a control instruction.
If the voice instruction is recognized by the terminal device, the preset instruction set associated with the applet may be stored in the terminal device, or the terminal device may obtain it from the server side.
Specifically, if the server recognizes the voice instruction, the preset instruction set associated with the applet may be stored in the server.
Further, the preset instruction set may be set by a developer based on the specific situation of the applet, and the developer may upload the instruction set to the server. The server may be, for example, a server of a platform on which the applet is located, or a server for performing speech recognition.
In actual application, the preset instruction set comprises at least one instruction, and these instructions correspond to the applet. For example, if the preset instruction set includes 100 instructions, each of the 100 instructions can control the applet to perform a certain operation.
When the voice instruction is recognized, the syllables corresponding to it can be identified; if the syllables match an instruction in the preset instruction set, the matched instruction can be used as the control instruction.
Specifically, the preset instruction set may also contain only the instructions corresponding to the applet that include personalized vocabulary, which keeps the set small. In this embodiment, the instructions in the preset instruction set may be matched against the syllables of the voice instruction first; if there is no matching instruction, the instructions in the language model may be matched against the syllables, and the matching instruction is used as the control instruction.
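A minimal sketch of the two-stage matching just described: the applet's preset instruction set is tried first, and the general language model is used only as a fallback. All field names and data below are illustrative assumptions.

```python
def match_instruction(syllables, preset_set, general_model):
    """Match syllables against the preset set first, then the general model."""
    for instr in preset_set:  # applet-specific instructions take priority
        if instr["syllables"] == syllables:
            return instr["text"]
    for instr in general_model:  # fallback: common instructions
        if instr["syllables"] == syllables:
            return instr["text"]
    return None

preset_set = [{"syllables": "mai-dong-hao", "text": "Maidonghao"}]
general_model = [
    {"syllables": "mai-dong-hao", "text": "common homophone"},
    {"syllables": "fan-hui", "text": "go back"},
]

print(match_instruction("mai-dong-hao", preset_set, general_model))  # -> Maidonghao
```

Note that the homophonous common entry never shadows the applet's own name, because the preset set is consulted first.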
Furthermore, a language model can be provided for recognizing the voice instruction. The language model may include common instructions; after an applet is opened, the preset instructions in the preset instruction set associated with the applet may be added to the language model with their confidence set to the highest. For example, if the highest confidence is 5, the confidences of all the preset instructions are set to 5.
In practical application, matching instructions can be determined in the language model according to the syllables corresponding to the voice instruction, and the matching instruction with the highest confidence is then determined as the control instruction. For example, suppose the instructions matching the syllables are instruction 1, instruction 2, and instruction 3, where instruction 3 was added to the language model from the preset instruction set with confidence 5, and instructions 1 and 2 have confidences 4 and 3 respectively; then instruction 3 is determined as the control instruction.
If two instructions have the same confidence and one of them was added to the language model from the preset instruction set, the preset instruction can be placed before the other, so that the preset instruction is retrieved first.
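The ranking rule above (highest confidence first; on ties, preset instructions placed ahead of ordinary ones) can be expressed as a single sort key. The candidate data and field names below are illustrative assumptions.

```python
candidates = [
    {"text": "instruction 1", "confidence": 4, "preset": False},
    {"text": "instruction 2", "confidence": 3, "preset": False},
    {"text": "instruction 3", "confidence": 5, "preset": True},
    {"text": "instruction 4", "confidence": 5, "preset": False},
]

# Sort by confidence descending; on equal confidence, preset instructions
# come first (False sorts before True, so we negate the flag).
ranked = sorted(candidates, key=lambda c: (-c["confidence"], not c["preset"]))
control = ranked[0]["text"]
print(control)  # -> instruction 3
```

Instruction 3 and instruction 4 tie on confidence 5, but the preset flag breaks the tie in favor of the applet's own instruction.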
When the voice instruction is recognized through this step, a control instruction associated with the applet can be obtained, and the applet can be controlled correctly through it.
Step 203: sending the control instruction to the applet so that the applet responds to it.
Specifically, the control instruction may be sent to the applet by the terminal device or by the server. For example, if the terminal device recognizes the voice instruction and obtains the control instruction, the terminal device may transmit the control instruction to the applet.
Further, if the server recognizes the voice instruction and obtains the control instruction, the server may send the control instruction to the applet.
Fig. 2C is a third interface diagram according to an exemplary embodiment of the present application.
In actual application, after receiving the corresponding control instruction, the applet can respond to it, thereby responding to the voice instruction issued by the user. For example, when the interface of the terminal device is as shown in fig. 2B and the user says "start ordering", the terminal device receives the voice instruction through a microphone or the like. The terminal device may recognize the instruction itself, or upload it to the server for recognition; the resulting control instruction is then fed back to the applet on the terminal device, which responds by opening the menu page, as shown in fig. 2C.
The method provided by this embodiment is used for controlling an applet by voice and is performed by a device on which the method is deployed, typically implemented in hardware and/or software.
The method for controlling an applet by voice provided by this embodiment comprises: receiving a voice instruction for controlling an applet; recognizing the voice instruction according to a preset instruction set associated with the applet and determining a control instruction; and sending the control instruction to the applet so that the applet responds to it. Because the user's instruction is recognized through the preset instruction set associated with the applet, the control instruction corresponding to the applet can still be recognized accurately even when the applet involves personalized vocabulary, so the applet can be controlled accurately and the problem of control failure caused by instruction misrecognition is solved.
fig. 3 is a flowchart illustrating a method for controlling an applet by voice according to another exemplary embodiment of the present application.
As shown in fig. 3, the method for controlling an applet through voice according to this embodiment includes:
Step 301: receiving a preset instruction set and associating it with an applet.
If the method provided by this embodiment is executed by the terminal device, the developer can upload the preset instruction set to the server, which then issues it to the terminal device. For example, the terminal device may acquire the preset instruction set from the server side after the user starts the applet.
If the method provided by this embodiment is executed by the server, the developer may upload the preset instruction set to the server. The server may be, for example, a background server of the applet, or a server provided with the speech recognition method.
Specifically, when uploading the preset instruction set, the developer may also specify the applet corresponding to it, for example by opening an applet development page and uploading the set there. The server may associate the applet with the instruction set; when the server issues the set to the terminal device, it may also carry the applet identifier corresponding to the preset instruction set, so that the terminal device can associate the applet with the set.
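A hypothetical sketch of the association step: the server keeps a mapping from applet identifiers to the uploaded preset instruction sets. The storage layout and function names are assumptions for illustration.

```python
preset_store = {}  # applet_id -> list of preset instructions

def upload_preset(applet_id, instructions):
    """Developer uploads an instruction set; server associates it with the applet."""
    preset_store[applet_id] = list(instructions)

def get_preset(applet_id):
    """Look up the instruction set issued for a given applet identifier."""
    return preset_store.get(applet_id, [])

upload_preset("applet-123", ["start ordering", "open menu"])
print(get_preset("applet-123"))  # -> ['start ordering', 'open menu']
```

The same lookup works on either side: the server consults it when recognizing uploaded audio, or the terminal device consults a local copy issued together with the applet identifier.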
Step 302: when the applet is started, adding preset control instructions to the language model according to the preset instruction set and setting them as the instructions with the highest confidence in the language model.
Fig. 3A is a schematic diagram illustrating a voice command recognition process according to an exemplary embodiment of the present application.
As shown in fig. 3A, in the voice instruction recognition process, endpoint detection may first be performed on the input voice to extract the effective speech segment, which is then subjected to signal processing such as windowing and framing. Acoustic features are then extracted from each signal based on the acoustic model and converted into the syllables corresponding to the voice instruction. The instructions corresponding to the syllables and their confidences are then determined according to the language model, and the control instruction is determined based on those confidences.
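The pipeline of FIG. 3A can be sketched end to end as below. Every stage is a deliberately trivial stand-in (an assumption, not a real acoustic or language model), kept only to show the data flow: endpoint detection, framing, syllable conversion, and a confidence-based language-model lookup.

```python
def detect_endpoints(audio):
    # Toy endpoint detection: keep non-silent samples (silence modeled as 0).
    return [s for s in audio if s != 0]

def window_and_frame(segment, frame_len=2):
    # Toy signal processing: split the segment into fixed-size frames.
    return [segment[i:i + frame_len] for i in range(0, len(segment), frame_len)]

def to_syllables(frames):
    # Stand-in "acoustic model": map each frame to a fake syllable id.
    return ["syl%d" % (sum(f) % 3) for f in frames]

def lookup(syllables, language_model):
    # Language-model stage: pick the highest-confidence matching instruction.
    matches = [e for e in language_model if e["syllables"] == syllables]
    return max(matches, key=lambda e: e["confidence"])["text"] if matches else None

audio = [0, 0, 3, 1, 2, 2, 0]
syllables = to_syllables(window_and_frame(detect_endpoints(audio)))
language_model = [{"syllables": syllables, "confidence": 5, "text": "start ordering"}]
print(lookup(syllables, language_model))  # -> start ordering
```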
In the method provided by this embodiment, when the applet starts, a preset control instruction may be added to the language model according to a preset instruction set, for example, the preset control instruction may be added to the language model shown in fig. 3A.
For example, after the terminal device detects that the applet is started, the terminal device may read an instruction included in the preset instruction set and add the instruction as a preset control instruction to the language model. For another example, when the terminal device interacts with the backend server of the applet to request data, the backend server of the applet may read the instructions included in the preset instruction set and add the instructions to the language model as the preset control instructions. Or, the applet background server may indicate the background server of the voice interaction program, so that the applet background server adds a preset control instruction to the language module, and identifies the voice instruction uploaded by the terminal device according to the modified language module.
The electronic device may further set a confidence level for each added preset control instruction; specifically, the preset control instructions may be assigned the highest confidence level.
Specifically, the language model may already contain some instructions, each with its own confidence level. Instructions may be represented as strings and confidence levels as probabilities; in that case the language model can be constructed as a probability distribution p(s) over a string s, where p(s) reflects the probability that s appears as an instruction. For example, the confidence level of an added preset control instruction may be set to 100%.
Further, an instruction path may be constructed for the instructions in the language model, specifically by sorting the instructions according to confidence. For example, instructions may be ordered from high to low confidence, with instructions of equal confidence placed at the same level. The path can include both the original instructions of the language model and the added preset control instructions. Since the preset control instructions are assigned the highest confidence, they all appear at the start position of the instruction path.
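As a sketch of this construction (the function name and the convention that preset control instructions carry confidence 1.0 are assumptions for illustration, not taken from the patent):

```python
from itertools import groupby

def build_instruction_path(original_instructions, preset_instructions):
    """Merge original and preset instructions into confidence-ordered levels.

    `original_instructions` maps each instruction string to a confidence in
    [0, 1]; every preset control instruction is forced to confidence 1.0, so
    the preset instructions always occupy the first level of the path.
    """
    merged = dict(original_instructions)
    merged.update({cmd: 1.0 for cmd in preset_instructions})
    ranked = sorted(merged.items(), key=lambda kv: -kv[1])
    # Instructions with equal confidence share a level in the path.
    return [(conf, [cmd for cmd, _ in group])
            for conf, group in groupby(ranked, key=lambda kv: kv[1])]

path = build_instruction_path({"open map": 0.6, "play music": 0.8},
                              ["McDonald's"])
# The path begins with the preset control instruction at confidence 1.0.
```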
Step 303, receiving a voice instruction for controlling the applet through the voice interaction system.
In practical application, the voice interaction system can be started through the applet, and the voice instruction is then received through the voice interaction system. For example, the user may operate the applet to turn on a voice control function, thereby invoking the voice interaction system.
After the voice interaction system is started, a microphone of the terminal device can be turned on, and the voice instruction input by the user is collected through the microphone. The voice instruction can be recognized by the terminal device itself, or uploaded by the terminal device to the server and recognized there.
Step 304, recognizing the voice instruction, and determining the control instruction corresponding to the voice instruction according to the recognition result and the confidence levels of the instructions in the language model.
Specifically, in the process of recognizing the voice instruction, the voice instruction itself may be processed first, for example recognized to obtain syllable information, and the control instruction is then determined according to the processing result.
Furthermore, the corresponding matching instructions can be determined in the language model according to the syllable information, and the matching instruction with the highest confidence level is determined as the control instruction.
In practice, the instructions in the language model may include both the original instructions and the preset control instructions added in step 302. Several instructions may match the determined syllable information; for example, if the syllable information is "maidanglao", two homophonous instruction strings may both match it. In that case, the control instruction is determined according to the confidence levels of the two matching instructions. For example, if "McDonald's" was added to the language model as a preset control instruction, its confidence level is the highest, so "McDonald's" is determined as the control instruction.
For example, the instruction path can be traversed using the syllable information, specifically in order from high confidence to low confidence, and the first traversed instruction that matches the syllable information is determined as the control instruction.
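Assuming the instruction path is represented as a list of (confidence, instructions) levels already sorted from high to low, and a hypothetical pinyin table maps each instruction string to its syllable spelling, this traversal could be sketched as:

```python
# Hypothetical pinyin table: instruction string -> syllable spelling.
PINYIN = {
    "McDonald's": "maidanglao",
    "open map": "dakaiditu",
}

def match_instruction(path, syllables, pinyin=PINYIN):
    """Walk the instruction path from high to low confidence and return the
    first instruction whose syllable spelling matches the recognized syllables."""
    for confidence, level in path:  # levels are pre-sorted high -> low
        for instruction in level:
            if pinyin.get(instruction) == syllables:
                return instruction
    return None  # no instruction matches the syllable information

path = [(1.0, ["McDonald's"]), (0.7, ["open map"])]
# "maidanglao" matches the preset instruction on the first (highest) level.
```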
Step 305, sending a control instruction to the applet, so that the applet responds to the control instruction.
Specifically, the implementation and effect of step 305 are similar to those of step 103, and are not described again.
After step 305, the process may return to step 303 to continue controlling the applet by voice.
Step 306, when the applet is closed, clearing the preset control instruction in the language model.
Furthermore, when the applet is closed, the preset control instructions in the language model can be cleared, so that the electronic device again recognizes the user's voice instructions based on the general instructions.
In practical application, the terminal device may clear the preset control instructions in the language model after detecting that the applet has closed. Alternatively, the terminal device may interact with the applet's backend server when the applet closes, so that the server knows the applet has closed and clears the preset control instructions corresponding to that applet from the language model.
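As an illustration of this add-on-start / clear-on-close lifecycle (the class and method names below are invented for the sketch and do not come from the patent):

```python
class LanguageModel:
    """Toy language model holding instruction -> confidence entries."""

    def __init__(self, general_instructions):
        self.general = dict(general_instructions)
        self.preset = {}  # applet-specific preset control instructions

    def on_applet_start(self, instruction_set):
        # Step 302: add the preset control instructions with highest confidence.
        for cmd in instruction_set:
            self.preset[cmd] = 1.0

    def on_applet_close(self):
        # Step 306: clear the preset control instructions again, so recognition
        # falls back to the general instructions alone.
        self.preset.clear()

    def instructions(self):
        # Preset entries override general entries with the same text.
        merged = dict(self.general)
        merged.update(self.preset)
        return merged

lm = LanguageModel({"navigate home": 0.8})
lm.on_applet_start(["McDonald's"])   # applet opened: preset instruction active
lm.on_applet_close()                 # applet closed: preset instruction cleared
```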
Fig. 4 is a block diagram of an apparatus for controlling an applet by voice according to an exemplary embodiment of the present application.
As shown in fig. 4, the apparatus for controlling an applet through voice according to this embodiment includes:
A receiving module 41, configured to receive a voice instruction for controlling an applet;
The recognition module 42 is configured to recognize the voice command according to a preset command set associated with the applet, and determine a control command;
A sending module 43, configured to send the control instruction to the applet, so that the applet responds to the control instruction.
The apparatus for controlling an applet through voice provided by this embodiment includes: a receiving module for receiving a voice instruction for controlling the applet; a recognition module for recognizing the voice instruction according to a preset instruction set associated with the applet and determining a control instruction; and a sending module for sending the control instruction to the applet so that the applet responds to it. Because the apparatus recognizes user instructions through the preset instruction set associated with the applet, even when the applet involves personalized words the corresponding control instruction can still be recognized accurately, so the applet can be controlled accurately and control failures caused by instruction recognition errors are avoided.
The specific principle and implementation of the apparatus for controlling an applet through voice provided in this embodiment are similar to those of the embodiment shown in fig. 2, and are not described herein again.
Fig. 5 is a block diagram of an apparatus for controlling an applet by voice according to another exemplary embodiment of the present application.
As shown in fig. 5, in the apparatus for controlling an applet through voice according to this embodiment, on the basis of the foregoing embodiment, the receiving module 41 is further configured to:
receive the preset instruction set and associate it with the applet.
the apparatus further comprises an adding module 44 for:
When the applet is started, add a preset control instruction to the language model according to the preset instruction set, and set the preset control instruction as the instruction with the highest confidence level in the language model;
The identification module 42 is specifically configured to:
and identifying the voice instruction, and determining the control instruction corresponding to the voice instruction according to an identification result and the confidence degree of the instruction in the language model.
The identification module 42 includes:
The recognition unit 421 is configured to recognize the voice command to obtain syllable information;
The determining unit 422 is configured to determine corresponding matching instructions in the language model according to the syllable information, and determine the matching instruction with the highest confidence level as the control instruction.
The receiving module 41 is specifically configured to:
And receiving, through a voice interaction system, a voice instruction for controlling the applet.
The apparatus further comprises a purge module 45 for:
and when the applet is closed, clearing the preset control instruction in the language model.
The specific principle and implementation of the apparatus for controlling an applet through voice provided in this embodiment are similar to those of the embodiment shown in fig. 3, and are not described herein again.
according to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for controlling an applet through speech according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device for controlling an applet by voice includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of controlling an applet by speech provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method of controlling an applet by speech provided herein.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the method of controlling an applet by voice in the embodiments of the present application (e.g., the receiving module 41, the recognition module 42, and the sending module 43 shown in fig. 4). The processor 601 executes various functional applications of the server and performs data processing by running the non-transitory software programs, instructions, and modules stored in the memory 602, that is, it implements the method of controlling the applet by voice in the above-described method embodiments.
the memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device by the voice control applet, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to an electronic device that controls the applet through speech. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device that controls the applet by voice may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for controlling the applet by voice; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, joystick, or other input devices. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
the computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The present embodiment also provides a computer program comprising a program code for performing any of the methods of controlling an applet by speech as described above when the computer program is run by a computer.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited thereto as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for controlling an applet by speech, comprising:
Receiving a voice instruction for controlling an applet;
Recognizing the voice instruction according to a preset instruction set associated with the applet, and determining a control instruction;
And sending the control instruction to the applet so that the applet responds to the control instruction.
2. The method of claim 1, further comprising: receiving the preset instruction set, and associating the preset instruction set with the applet.
3. The method of claim 1, further comprising:
When the applet is started, adding a preset control instruction to a language model according to the preset instruction set, and setting the preset control instruction as the instruction with the highest confidence level in the language model;
The recognizing the voice command according to a preset command set associated with the applet and determining a control command comprise:
And identifying the voice instruction, and determining the control instruction corresponding to the voice instruction according to an identification result and the confidence degree of the instruction in the language model.
4. The method according to claim 3, wherein the recognizing the voice command and determining the control command corresponding to the voice command according to the recognition result and the confidence level of the command in the language model comprises:
Recognizing the voice instruction to obtain syllable information;
And determining a corresponding matching instruction in the language model according to the syllable information, and determining an instruction with the highest confidence level in the matching instruction as the control instruction.
5. The method of claim 1, wherein receiving voice instructions for controlling an applet comprises:
And receiving, through a voice interaction system, a voice instruction for controlling the applet.
6. the method of claim 3, further comprising:
And when the applet is closed, clearing the preset control instruction in the language model.
7. An apparatus for controlling an applet by speech, comprising:
The receiving module is used for receiving a voice instruction for controlling an applet;
The recognition module is used for recognizing the voice instruction according to a preset instruction set associated with the applet and determining a control instruction;
And the sending module is used for sending the control instruction to the applet so that the applet responds to the control instruction.
8. The apparatus of claim 7, wherein the receiving module is further configured to:
receive the preset instruction set and associate it with the applet.
9. The apparatus of claim 7, further comprising an adding module 44 configured to:
When the applet is started, adding a preset control instruction to the language model according to the preset instruction set, and setting the preset control instruction as the instruction with the highest confidence level in the language model;
the identification module 42 is specifically configured to:
and identifying the voice instruction, and determining the control instruction corresponding to the voice instruction according to an identification result and the confidence degree of the instruction in the language model.
10. the apparatus of claim 9, wherein the identification module 42 comprises:
The recognition unit 421 is configured to recognize the voice instruction to obtain syllable information;
The determining unit 422 is configured to determine corresponding matching instructions in the language model according to the syllable information, and determine the matching instruction with the highest confidence level as the control instruction.
11. The apparatus according to claim 7, wherein the receiving module 41 is specifically configured to:
And receiving, through a voice interaction system, a voice instruction for controlling the applet.
12. The apparatus of claim 9, further comprising a purge module 45 configured to:
And when the applet is closed, clearing the preset control instruction in the language model.
13. An electronic device for controlling an applet by speech, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN201910931446.9A 2019-09-29 2019-09-29 Method and device for controlling small program through voice, electronic equipment and storage medium Pending CN110580904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910931446.9A CN110580904A (en) 2019-09-29 2019-09-29 Method and device for controlling small program through voice, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110580904A true CN110580904A (en) 2019-12-17

Family

ID=68813870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910931446.9A Pending CN110580904A (en) 2019-09-29 2019-09-29 Method and device for controlling small program through voice, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110580904A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8006074B1 (en) * 2003-12-24 2011-08-23 Altera Corporation Methods and apparatus for executing extended custom instructions
CN103885783A (en) * 2014-04-03 2014-06-25 深圳市三脚蛙科技有限公司 Voice control method and device of application program
CN107240400A (en) * 2017-07-03 2017-10-10 重庆小雨点小额贷款有限公司 Terminal operation method and device
CN107885089A (en) * 2017-11-06 2018-04-06 四川长虹电器股份有限公司 Intelligent domestic voice control method based on self-defined phonetic order storehouse
CN108831469A (en) * 2018-08-06 2018-11-16 珠海格力电器股份有限公司 Voice command method for customizing, device and equipment and computer storage medium
CN109074808A (en) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Sound control method, control device and storage medium
CN109741738A (en) * 2018-12-10 2019-05-10 平安科技(深圳)有限公司 Sound control method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111327469A (en) * 2020-02-21 2020-06-23 苏州浪潮智能科技有限公司 Configuration file generation method, system, equipment and medium
CN113643697A (en) * 2020-04-23 2021-11-12 百度在线网络技术(北京)有限公司 Voice control method and device, electronic equipment and storage medium
CN111724785A (en) * 2020-06-29 2020-09-29 百度在线网络技术(北京)有限公司 Voice control method, device and storage medium for small program
CN113763946A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Message processing method, voice processing method, device, terminal and storage medium
CN112947256A (en) * 2021-03-31 2021-06-11 中国建设银行股份有限公司 Robot-based computer system control method, device, medium and robot
CN113593555A (en) * 2021-07-23 2021-11-02 北京百度网讯科技有限公司 Method, device and program product for controlling program in voice mode
WO2023000697A1 (en) * 2021-07-23 2023-01-26 北京百度网讯科技有限公司 Method for controlling program by means of speech, and device and program product
CN114639384A (en) * 2022-05-16 2022-06-17 腾讯科技(深圳)有限公司 Voice control method, device, equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN110580904A (en) Method and device for controlling small program through voice, electronic equipment and storage medium
CN111428008B (en) Method, apparatus, device and storage medium for training a model
JP6751122B2 (en) Page control method and apparatus
CN108255290B (en) Modal learning on mobile devices
CN107924483B (en) Generation and application of generic hypothesis ranking model
US20190027147A1 (en) Automatic integration of image capture and recognition in a voice-based query to understand intent
US10650811B2 (en) Correction of speech recognition on repetitive queries
US20160328205A1 (en) Method and Apparatus for Voice Operation of Mobile Applications Having Unnamed View Elements
US11527233B2 (en) Method, apparatus, device and computer storage medium for generating speech packet
US11967315B2 (en) System and method for multi-spoken language detection
CN112365880A (en) Speech synthesis method, speech synthesis device, electronic equipment and storage medium
CN110473537B (en) Voice skill control method, device, equipment and storage medium
CN105074817A (en) Systems and methods for switching processing modes using gestures
US11580970B2 (en) System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection
CN112270167B (en) Role labeling method and device, electronic equipment and storage medium
CN109144458B (en) Electronic device for performing operation corresponding to voice input
US8954314B2 (en) Providing translation alternatives on mobile devices by usage of mechanic signals
WO2013188622A1 (en) Apparatus and methods for managing resources for a system using voice recognition
US20180090147A1 (en) Apparatus and methods for dynamically changing a language model based on recognized text
CN111105800A (en) Voice interaction processing method, device, equipment and medium
WO2023142451A1 (en) Workflow generation methods and apparatuses, and electronic device
KR20200048701A (en) Electronic apparatus for sharing customized voice command and thereof control method
CN111681647A (en) Method, apparatus, device and storage medium for recognizing word slot
CN112102833A (en) Voice recognition method, device, equipment and storage medium
US11545144B2 (en) System and method supporting context-specific language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211021

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191217