CN111309283A - Voice control method and device for user interface, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111309283A
Authority
CN
China
Prior art keywords
information
control
controlled
user interface
text
Prior art date
Legal status
Granted
Application number
CN202010220645.1A
Other languages
Chinese (zh)
Other versions
CN111309283B (en)
Inventor
李扬 (Li Yang)
王雷 (Wang Lei)
李士岩 (Li Shiyan)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010220645.1A
Publication of CN111309283A
Application granted
Publication of CN111309283B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems

Abstract

The application discloses a voice control method and apparatus for a user interface, an electronic device, and a storage medium, relating to voice technology within the field of computer technology. The specific implementation scheme is as follows: voice information input by a user is first acquired and converted into text information; a control instruction is then generated according to the text information and feature information in the user interface to be controlled; finally, in response to the control instruction, the control to be controlled that corresponds to the control instruction in the user interface to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.

Description

Voice control method and device for user interface, electronic equipment and storage medium
Technical Field
The present application relates to a voice technology in the field of computer technologies, and in particular, to a method and an apparatus for controlling a user interface with voice, an electronic device, and a storage medium.
Background
With the rapid development of voice technology, the voice technology has wide application in the field of electronic devices (e.g., the field of service devices).
Existing interaction with service devices generally adopts a touch-only mode, a voice-only mode, or a simple superposition of the two. In addition, voice interaction generally requires the user to speak a specific wake-up word before the voice service will run.
As a result, the current means of controlling a service device, and especially the user interface on a service device, is limited: the business logic requirements of multi-round interaction can only be met by combining voice with a touch mode.
Disclosure of Invention
The voice control method and apparatus for a user interface, electronic device, and storage medium provided by the present application can control the user interface to be controlled directly according to voice information input by the user, and can thereby meet the user's complex business logic requirements.
In a first aspect, an embodiment of the present application provides a method for controlling a user interface with voice, including:
determining text information according to the acquired voice information;
generating a control instruction according to the text information and feature information in the user interface to be controlled, wherein the feature information corresponds to each control in the user interface to be controlled;
and responding to the control instruction, and triggering a control to be controlled corresponding to the control instruction in the user interface to be controlled.
In this embodiment, the voice information input by the user is acquired and converted into text information, a control instruction is generated according to the text information and the feature information in the user interface to be controlled, and, in response to the control instruction, the control to be controlled corresponding to the control instruction in the user interface to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.
In a second aspect, an embodiment of the present application provides a voice control apparatus for a user interface, including:
the acquisition module is used for acquiring voice information;
the recognition module is used for determining text information according to the voice information;
the generating module is used for generating a control instruction according to the text information and feature information in the user interface to be controlled, wherein the feature information corresponds to each control in the user interface to be controlled;
and the control module is used for responding to the control instruction and triggering the control to be controlled corresponding to the control instruction in the user interface to be controlled.
In this embodiment, the voice information input by the user is acquired and converted into text information, a control instruction is generated according to the text information and the feature information in the user interface to be controlled, and, in response to the control instruction, the control to be controlled corresponding to the control instruction in the user interface to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of voice control of a user interface as claimed in any one of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the voice control method of the user interface according to any one of the first aspect.
In a fifth aspect, an embodiment of the present application provides a program product, where the program product includes: a computer program stored in a readable storage medium, the computer program being readable from the readable storage medium by at least one processor of a server, execution of the computer program by the at least one processor causing the server to perform the method of voice control of a user interface of any of the first aspects.
According to the technology of the present application, the user can complete interface operations on the user interface to be controlled by voice and interact with the service device more directly, which in turn provides the user with a more accommodating and relaxed service experience.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a diagram of an application scenario in which a voice control method of a user interface according to an embodiment of the present application may be implemented;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is an interface interaction diagram of the second embodiment;
FIG. 5 is another interface interaction diagram of the second embodiment;
FIG. 6 is a further interface interaction diagram of the second embodiment;
FIG. 7 is a schematic illustration according to a third embodiment of the present application;
FIG. 8 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 9 is a schematic illustration according to a fifth embodiment of the present application;
fig. 10 is a block diagram of an electronic device for implementing a voice control method of a user interface according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Existing interaction with service devices generally adopts a touch-only mode, a voice-only mode, or a simple superposition of the two; voice interaction additionally requires the user to speak a specific wake-up word before the voice service will run. The current means of controlling a service device, and especially the user interface on a service device, is therefore limited, and the business logic requirements of multi-round interaction can only be met by combining voice with a touch mode.
To solve these technical problems, the present application provides a voice control method and apparatus for a user interface, an electronic device, and a storage medium: voice information input by a user is first acquired and converted into text information, a control instruction is generated according to the text information and feature information in the user interface to be controlled, and finally, in response to the control instruction, the control to be controlled corresponding to the control instruction in the user interface to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.
Fig. 1 is a diagram of an application scenario of the voice control method of a user interface according to an embodiment of the present application. As shown in fig. 1, the method provided by this embodiment may be applied to interaction between a user 100 and an electronic device 200: the user 100 controls, by voice, the user interface to be controlled that is displayed on the electronic device 200. The electronic device 200 may be a robot, a service terminal, a computer, a smart phone, and the like. Furthermore, multiple rounds of interaction with the user 100 may be carried out in the user interface of the electronic device 200 by displaying a "digital human" avatar.
For example, when the electronic device 200 is a banking terminal, the user 100 may control the loan application interface displayed on the terminal by voice: the loan amount is filled in, a loan term is selected, and the selected information is confirmed entirely by voice control, so that the user completes the interaction with the loan application interface through voice input alone.
For another example, when the electronic device 200 is a food service terminal, the user 100 may control the ordering interface displayed on the terminal by voice: the number of diners is filled in, dishes are selected, and the selected information is confirmed by voice control, so that the user completes the interaction with the ordering interface through voice input alone.
Fig. 2 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 2, the voice control method for a user interface provided in this embodiment includes:
and S101, determining text information according to the acquired voice information.
In this step, a microphone may be disposed on the electronic device to acquire the voice information input by the user. After acquiring the voice information, the electronic device can recognize it by speech recognition to determine the corresponding text information.
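As a concrete illustration of this step (not part of the patent text), the sketch below captures microphone audio and converts it to text in Python. The third-party SpeechRecognition package and the Google Web Speech API used here are stand-ins chosen for the example; the patent does not name any particular recognition engine.

```python
# A minimal sketch of S101, assuming the third-party SpeechRecognition package;
# the patent itself does not prescribe a recognition engine.
import speech_recognition as sr

def acquire_text_information() -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:          # microphone disposed on the device
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)    # acquire the user's voice information
    # convert the voice information into text information (Mandarin assumed here)
    return recognizer.recognize_google(audio, language="zh-CN")
```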
And S102, generating a control instruction according to the text information and the characteristic information in the user interface to be controlled.
Specifically, the control instruction may be generated according to the recognized text information and the feature information in the user interface to be controlled. The feature information corresponds to each control in the user interface to be controlled and may be, for example, the name or attribute information of each control.
In one possible implementation, each user interface to be controlled corresponds to a data list that stores the feature information of each of its controls. After the text information is obtained, the data list can be searched according to the text information to determine the control to be controlled, and the specific operation on that control is then determined from the text information, thereby generating the control instruction.
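A minimal sketch of such a data list and its retrieval might look as follows; all names, the three control types, and the sample entries are illustrative assumptions, not structures defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlFeature:
    name: str          # name of the control, e.g. "loan amount" (feature information)
    control_type: str  # assumed types: "input_box", "click", "select"
    control_id: str    # handle used later to trigger the control

# one data list per user interface to be controlled (sample entries are made up)
LOAN_UI_FEATURES = [
    ControlFeature("loan amount", "input_box", "amount_input"),
    ControlFeature("loan term", "select", "term_select"),
    ControlFeature("confirm", "click", "confirm_button"),
]

def find_control(text: str, features: list) -> Optional[ControlFeature]:
    """Retrieve the control whose name keywords appear in the recognized text."""
    for feature in features:
        if feature.name in text:
            return feature
    return None
```

For example, `find_control("the loan amount is 50 thousand", LOAN_UI_FEATURES)` would return the input-box entry, matching the first-text keyword behavior described in the second embodiment below.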
And S103, responding to the control instruction, and triggering a control to be controlled corresponding to the control instruction in the user interface to be controlled.
After the control instruction is generated, the electronic device responds to it and triggers the control to be controlled that corresponds to the control instruction in the user interface to be controlled. For example, when the control instruction is an instruction to click a "confirm" button, the control corresponding to the "confirm" button is triggered in response, realizing voice control of each control in the user interface to be controlled.
In this embodiment, by acquiring the voice information input by the user, converting it into text information, generating a control instruction according to the text information and the feature information in the user interface to be controlled, and triggering, in response to the control instruction, the corresponding control to be controlled, the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.
Fig. 3 is a schematic diagram according to a second embodiment of the present application. As shown in fig. 3, the voice control method for a user interface provided in this embodiment includes:
s201, determining text information according to the acquired voice information.
In this step, a microphone may be disposed on the electronic device to acquire the voice information input by the user. After acquiring the voice information, the electronic device can recognize it by speech recognition to determine the corresponding text information.
S202, determining a control to be controlled according to the text information and the characteristic information in the user interface to be controlled.
Specifically, the control instruction may be generated according to the recognized text information and the feature information in the user interface to be controlled. The feature information corresponds to each control in the user interface to be controlled and may be, for example, the name or attribute information of each control.
In one possible implementation, each user interface to be controlled corresponds to a data list that stores the feature information of each of its controls; after the text information is obtained, the control to be controlled corresponding to the text information can be determined by searching the data list according to the text information.
For example, the feature information may include the names of the controls in the user interface to be controlled, and the control to be controlled is determined according to a first text in the text information and those names, where the first text contains keywords from the name of the control to be controlled. Matching the first text against keywords in the control names accurately locates the target control that the user intends to operate.
And S203, generating a control instruction according to the text information and the control type of the control to be controlled.
And S204, responding to the control instruction, and triggering a control to be controlled corresponding to the control instruction in the user interface to be controlled.
After the control to be controlled is determined, a control instruction can be generated according to the text information and the control type of the control to be controlled, and, in response to the control instruction, the corresponding control to be controlled in the user interface to be controlled is triggered.
If the control type of the control to be controlled is an input-box control, an input instruction is determined according to a second text in the text information, and, in response to the input instruction, the second text is entered in the to-be-input box in the user interface to be controlled. In other words, once the control to be controlled is identified as an input-box control, the second text in the text information is used to fill the to-be-input box, allowing the user to fill in its content simply by speaking.
These steps are described in detail below, taking a loan application interface displayed on a banking terminal as the user interface to be controlled.
Fig. 4 is an interface interaction diagram of the second embodiment, and fig. 5 is another interface interaction diagram of the second embodiment. As shown in figs. 4-5, the current interface displays a "loan amount" input box together with a "confirm" button and a "cancel" button.
When the user says "the loan amount is 50 thousand", the control to be controlled can be located as the "loan amount" input box according to the first text "loan amount" in the text information; an input instruction is then determined according to the second text "50 thousand", and "50000" is entered in the "loan amount" input box through the input instruction.
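The step from the spoken "50 thousand" (5万 in the original Chinese) to the digits "50000" implies a normalization step that the patent does not spell out. A toy version for simple single-unit amounts, offered purely as an assumption, is:

```python
# Toy normalization of spoken amounts such as "5万" (50 thousand) into digit
# strings; only single-unit forms are handled, purely for illustration.
CN_UNITS = {"万": 10_000, "千": 1_000, "百": 100}

def normalize_amount(second_text: str) -> str:
    for unit, factor in CN_UNITS.items():
        if second_text.endswith(unit) and second_text[:-1].isdigit():
            return str(int(second_text[:-1]) * factor)
    return second_text  # already digits, or a form this toy parser cannot handle

assert normalize_amount("5万") == "50000"
```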
If the control type of the control to be controlled is a click control, a click instruction is determined according to the first text, and, in response to the click instruction, the control to be clicked that corresponds to the click instruction in the user interface to be controlled is triggered. Thus, once the control to be controlled is identified as a click control, the control to be clicked corresponding to the first text is obtained and a click instruction is generated, allowing the user to press buttons in the user interface to be controlled by voice.
With continued reference to fig. 5, after "50000" has been entered in the "loan amount" input box, the user continues by saying "confirm", so that the "confirm" button in the loan application interface is located according to the text information and triggered by the generated click instruction.
If the control type of the control to be controlled is a selection control, a selection instruction is determined according to the second text in the text information, and, in response to the selection instruction, the option corresponding to the second text is selected in the to-be-selected box in the user interface to be controlled. Thus, once the control to be controlled is identified as a selection control, the second text in the text information is used to choose among the options of the to-be-selected box, allowing the user to make selections by voice.
Fig. 6 is a further interface interaction diagram of the second embodiment. As shown in fig. 6, in the loan term selection interface the user continues by saying "the loan term is 6 months". The control to be controlled is located as the "loan term" selection box according to the first text "loan term" in the text information, and a selection instruction is then determined according to the second text "6 months", so that "6 months" is selected in the "loan term" selection box through the selection instruction.
In this embodiment, the voice information input by the user is acquired and converted into text information, the corresponding control to be controlled is accurately located according to the text information and the feature information in the user interface to be controlled, a control instruction is generated according to the text information and the control type of that control, and, in response to the control instruction, the corresponding control to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice.
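Continuing the illustrative sketch started above (again an assumption, not the patent's implementation), generating the control instruction from the control type and dispatching it could look like the following; `ui.find`, `set_text`, `choose`, and `click` are hypothetical UI-toolkit calls.

```python
def generate_instruction(text: str, control: ControlFeature) -> dict:
    """Build a control instruction from the recognized text and the control type."""
    # treat whatever remains after the control-name keywords as the second text
    second_text = text.replace(control.name, "").strip() or text
    if control.control_type == "input_box":
        return {"action": "input", "target": control.control_id, "value": second_text}
    if control.control_type == "select":
        return {"action": "select", "target": control.control_id, "option": second_text}
    return {"action": "click", "target": control.control_id}

def respond_to_instruction(instruction: dict, ui) -> None:
    """Trigger the control to be controlled in response to the control instruction."""
    widget = ui.find(instruction["target"])      # hypothetical widget lookup
    if instruction["action"] == "input":
        widget.set_text(instruction["value"])    # fill the to-be-input box
    elif instruction["action"] == "select":
        widget.choose(instruction["option"])     # pick the option in the select box
    else:
        widget.click()                           # press the button
```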
In addition, on the basis of the above embodiment, after the text information is determined from the acquired voice information, feedback information may be determined according to the text information and language environment information, where the language environment information includes a plurality of intention information corresponding to the text information and the feedback information is the intention information with the highest service-request confidence among them; the feedback information is then output. Thus, after the user inputs voice information, a plurality of intention information is determined from the converted text information and the language environment information, and the intention information with the highest service-request confidence is selected as the feedback information. This realizes conversational interaction skills grounded in the understanding of natural speech, so that long-duration voice interaction and service flows can be completed.
Specifically, to determine the feedback information according to the text information and the language environment information, multiple rounds of historical text information corresponding to the user's historical voice input may first be obtained, then the multiple rounds of historical feedback information output in response to that historical voice input may be obtained, and finally the service-request confidence of each intention among the plurality of intention information may be determined from the historical text information and the historical feedback information. Determining the confidence of each intention from multiple rounds of history allows the language environment to be tracked and understood over a longer period, so that the true purpose behind the user's voice input is identified, the interaction of multi-round dialogue is improved, and the mechanical supplementation of multi-round dialogue in the prior art is avoided.
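As a rough sketch of how the service-request confidence of each intention could be scored against the whole dialogue history: the patent assumes an NLP model here, so the keyword counting below is only a stand-in, and the intent/cue structure is invented for the example.

```python
def intent_confidences(intents: list, history_texts: list, history_feedback: list) -> dict:
    """Score each candidate intention against the full dialogue history,
    not merely the last utterance (toy stand-in for the assumed NLP model).
    Each intent is a dict with a "name" and a non-empty list of "cues"."""
    context = " ".join(history_texts + history_feedback)
    return {
        intent["name"]: sum(cue in context for cue in intent["cues"]) / len(intent["cues"])
        for intent in intents
    }

def pick_feedback(intents: list, history_texts: list, history_feedback: list) -> str:
    """Feedback information = the intention with the highest service-request confidence."""
    scores = intent_confidences(intents, history_texts, history_feedback)
    return max(scores, key=scores.get)
```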
To illustrate the technical effect of determining the feedback information according to the text information and the language environment information, consider the following examples:
the multiple rounds of dialog of the existing mechanical complement are as follows:
the user: recommending financial products;
an electronic device: a, preparing a product;
the user: the yield is too low;
an electronic device: and B, product.
In this mechanically supplemented dialogue, when the electronic device recommends product B it is responding only to the voice information input in the last round, mechanically substituting a product stored as higher-yield. Because only the last utterance, "the yield is too low", is taken into account, there is a high possibility that the yield of the recommended product B is actually lower than that of product A. The recommendation is thus merely a mechanical supplement, not a reflection of the user's true intention.
By contrast, a multi-round dialogue in which the feedback information is determined according to the text information and the language environment information proceeds as follows:
User: recommend a financial product.
Electronic device: product A.
User: the yield is too low.
Electronic device: product C.
When the electronic device recommends product C, the recommendation reflects the user's true intention: based on natural language processing (NLP), the service-request confidence of each intention among the plurality of intention information is determined from the multiple rounds of historical text information and historical feedback information, so that the language environment is tracked and understood over a longer period.
In addition, during voice control of the user interface, the feedback information may be determined according to the text information and the language environment information, and the control instruction may then be generated according to the feedback information and the feature information in the user interface to be controlled, making the voice control of the user interface more accurate.
Fig. 7 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 7, the voice control method for a user interface provided in this embodiment includes:
s301, acquiring image information of the face of the user, and generating a wake-up instruction according to the voice information and the image information.
To carry out long-duration voice interaction naturally across multiple rounds of dialogue, and thereby provide the user with a more accommodating and relaxed service experience, the voice service of the electronic device must be woken up naturally.
Specifically, the voice information input by the user and the image information of the user's face may be acquired, and a wake-up instruction generated according to the voice information and the image information, where the wake-up instruction is used to wake up the voice service. Whether to generate the wake-up instruction can be determined by the degree of match between the voice information and the image information: when the degree of match exceeds a set threshold, the wake-up condition is considered satisfied. Generating the wake-up instruction in this way enables natural conversation between the user and the electronic device across multiple rounds of dialogue, without requiring the wake-up word that the prior art demands before each round begins.
In one possible design, the wake-up instruction is generated from the voice information and the image information as follows: the lip movement frequency is determined according to the image information, and if the voice information matches the lip movement frequency, the wake-up instruction is generated. Deciding whether to wake up by comparing the lip movement frequency with the voice information thus enables natural conversation between the user and the electronic device across multiple rounds of dialogue.
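A minimal sketch of this lip-movement check follows. The per-frame features and the 0.6 threshold are assumptions; the patent only states that the voice information is matched against the lip movement frequency.

```python
def should_wake(speech_active: list, mouth_open: list, threshold: float = 0.6) -> bool:
    """Generate a wake-up decision when detected speech activity and lip
    movement agree often enough over the same window of frames.

    speech_active: per-frame booleans from a voice-activity detector.
    mouth_open:    per-frame booleans from face-landmark analysis of the image.
    Both inputs and the threshold are illustrative assumptions."""
    if not speech_active or len(speech_active) != len(mouth_open):
        return False
    agreement = sum(v == m for v, m in zip(speech_active, mouth_open))
    return agreement / len(speech_active) >= threshold
```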
And S302, determining text information according to the acquired voice information.
And S303, determining a control to be controlled according to the text information and the characteristic information in the user interface to be controlled.
And S304, generating a control instruction according to the text information and the control type of the control to be controlled.
S305, responding to the control instruction, and triggering a to-be-controlled control corresponding to the control instruction in the to-be-controlled user interface.
It should be noted that, for the specific implementation of S302 to S305 in this embodiment, reference may be made to the description of S201 to S204 in the embodiment shown in fig. 3, and details are not repeated here.
To illustrate the technical effect of the voice control method of this embodiment, the description continues with the examples shown in figs. 4 to 6.
First, consider controlling the loan application interface on a banking terminal through the interaction of the related art. To enter the loan amount, the user needs to tap the "loan amount" input box and then type "50000" on a keyboard, or hold down a voice input key, or speak the wake-up word before saying "50 thousand".
After the input is completed, the user needs to tap the "confirm" button, and then select "6 months" in the "loan term" selection box of the loan term selection interface by touch.
If voice input is to continue in the input boxes of other interfaces, the voice input key must be pressed again, or the wake-up word spoken again, so natural interaction between the user and the user interface displayed on the electronic device cannot be achieved.
By contrast, if the user interface is controlled by the voice control method of the embodiment shown in fig. 7, the flow is as follows:
with continued reference to fig. 5-6, when the user inputs "5 ten thousand loan amount" by voice, the electronic device wakes up the voice service naturally by acquiring the voice information input by the user and the image information of the user's face and then generating a wake-up command according to the voice information and the image information.
The spoken "the loan amount is 50 thousand" is then converted into text information, the control to be controlled is located as the "loan amount" input box according to the first text "loan amount", an input instruction is determined according to the second text "50 thousand", and "50000" is entered in the "loan amount" input box through the input instruction.
After "50000" has been entered in the "loan amount" input box, the user continues by saying "confirm". The electronic device keeps the voice service awake by continuously acquiring the user's voice information and facial image information and generating the wake-up instruction from them. The "confirm" button in the loan application interface is then located according to the text information and triggered by the generated click instruction.
When the interface jumps to the loan term selection, the user continues by saying "the loan term is 6 months"; the voice service likewise remains naturally awake. The control to be controlled is located as the "loan term" selection box according to the first text "loan term", and a selection instruction is determined according to the second text "6 months", so that "6 months" is selected in the "loan term" selection box through the selection instruction.
Thus, based on natural wake-up, the voice information input by the user is acquired and converted into text information, the corresponding control to be controlled is accurately located according to the text information and the feature information in the user interface to be controlled, a control instruction is generated according to the text information and the control type of that control, and finally, in response to the control instruction, the corresponding control to be controlled is triggered. The user can therefore operate the entire user interface to be controlled by voice, with no explicit wake-up and no touch, which greatly improves the service experience.
Fig. 8 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 8, the speech control apparatus 400 of a user interface provided in this embodiment includes:
an obtaining module 401, configured to obtain voice information;
a recognition module 402, configured to determine text information according to the voice information;
a generating module 403, configured to generate a control instruction according to the text information and feature information in the user interface to be controlled, where the feature information corresponds to each control in the user interface to be controlled;
and the control module 404 is configured to trigger a to-be-controlled control corresponding to the control instruction in the to-be-controlled user interface in response to the control instruction.
In this embodiment, by acquiring the voice information input by the user, converting it into text information, generating a control instruction according to the text information and the feature information in the user interface to be controlled, and triggering, in response to the control instruction, the corresponding control to be controlled, the user can complete interface operations on the user interface to be controlled by voice. Operating the user interface by voice in this way lets the user interact with the service device more directly, providing a more accommodating and relaxed service experience.
In one possible design, the generating module 403 is specifically configured to:
determining a control to be controlled according to the text information and the characteristic information in the user interface to be controlled;
and generating a control instruction according to the text information and the control type of the control to be controlled.
In this embodiment, the voice information input by the user is acquired and converted into text information, the corresponding control to be controlled is accurately located according to the text information and the feature information in the user interface to be controlled, a control instruction is generated according to the text information and the control type of that control, and, in response to the control instruction, the corresponding control to be controlled is triggered, so that the user can complete interface operations on the user interface to be controlled by voice.
In one possible design, the feature information includes the names of the controls in the user interface to be controlled;
correspondingly, the generating module 403 is specifically configured to:
and determining the control to be controlled according to a first text in the text information and the name of each control in the user interface to be controlled, wherein the first text comprises keywords in the name of the control to be controlled.
In this embodiment, the control to be controlled is determined by matching the first text in the text information against keywords in the name of the control to be controlled, so as to accurately locate the target control that the user intends to operate.
In one possible design, the generating module 403 is specifically configured to:
if the control type of the control to be controlled is an input box control, determining an input instruction according to a second text in the text information;
correspondingly, the control module 404 is specifically configured to:
and responding to the input instruction, and inputting a second text in a to-be-input box in the to-be-controlled user interface.
In this embodiment, when the control type of the control to be controlled is determined to be an input-box control, the second text in the text information can be used to fill the to-be-input box in the user interface to be controlled, so that the user can fill in its content by voice.
In one possible design, the generating module 403 is specifically configured to:
if the control type of the control to be controlled is a selection control, determining a selection instruction according to a second text in the text information;
correspondingly, the control module 404 is specifically configured to:
and responding to the selection instruction, and selecting an option corresponding to the second text in a to-be-selected box in the to-be-controlled user interface.
In this embodiment, when the control type of the control to be controlled is determined to be a selection control, the second text in the text information can be used to choose among the options in the to-be-selected box, so that the user can make a selection in the to-be-selected box by voice.
In one possible design, the generating module 403 is specifically configured to:
if the control type of the control to be controlled is a click control, determining a click instruction according to the first text;
correspondingly, the control module 404 is specifically configured to:
and responding to the click command, and triggering a to-be-clicked control corresponding to the click command in the to-be-controlled user interface.
In this embodiment, when the control type of the control to be controlled is determined to be a click control, the control to be clicked corresponding to the first text in the text information is obtained and a click instruction is generated, so that the user can press buttons in the user interface to be controlled by voice.
In one possible design, the obtaining module 401 is further configured to acquire voice information input by the user;
the obtaining module 401 is further configured to acquire image information of the user's face;
and the generating module 403 is further configured to generate a wake-up instruction according to the voice information and the image information, where the wake-up instruction is used to wake up the voice service.
In this embodiment, by acquiring the voice information input by the user and the image information of the user's face, and generating the wake-up instruction from the two, natural conversation between the user and the electronic device across multiple rounds of dialogue can be achieved without the wake-up word that the prior art requires before each round.
In one possible design, the generating module 403 is specifically configured to:
determining lip movement frequency according to the image information;
and if the voice information is matched with the lip movement frequency, generating a wake-up instruction.
In this embodiment, whether to generate the wake-up instruction is decided by comparing the lip movement frequency with the voice information, enabling natural conversation between the user and the electronic device across multiple rounds of dialogue.
On the basis of the embodiment shown in fig. 8, fig. 9 is a schematic view according to a fifth embodiment of the present application. As shown in fig. 9, the speech control apparatus 400 of a user interface according to this embodiment further includes:
a determining module 405, configured to determine feedback information according to the text information and language environment information, where the language environment information includes a plurality of intention information corresponding to the text information and the feedback information is the intention information with the highest service-request confidence among them;
and an output module 406, configured to output the feedback information.
In this embodiment, after the user inputs voice information, a plurality of intention information is determined from the converted text information and the language environment information, and the intention information with the highest service-request confidence is selected as the feedback information. This realizes conversational interaction skills grounded in the understanding of natural speech, so that long-duration voice interaction and service flows can be completed.
In one possible design, the determining module 405 is specifically configured to:
acquiring multiple rounds of historical text information corresponding to the voice information input by the user over multiple historical rounds;
acquiring the multiple rounds of historical feedback information output in response to that historical voice input;
and determining the service-request confidence of each intention among the plurality of intention information according to the multiple rounds of historical text information and historical feedback information.
In this embodiment, the service-request confidence of each intention is determined from multiple rounds of historical text information and historical feedback information, allowing the language environment to be tracked and understood over a longer period, so that the true purpose behind the user's voice input is identified, the interaction of multi-round dialogue is improved, and the mechanical supplementation of multi-round dialogue in the prior art is avoided.
In one possible design, the generating module 403 is specifically configured to:
determining feedback information according to the text information and the language environment information;
and generating a control instruction according to the feedback information and the characteristic information in the user interface to be controlled.
In this embodiment, during voice control of the user interface, the feedback information may be determined according to the text information and the language environment information, and the control instruction may then be generated according to the feedback information and the feature information in the user interface to be controlled, making the voice control of the user interface more accurate.
The voice control apparatus of the user interface of the embodiments shown in figs. 8 to 9 may execute the steps of the above method embodiments; for the specific implementation process and technical principles, reference is made to the relevant description in those embodiments, which is not repeated here.
Fig. 10 is a block diagram of an electronic device for implementing a voice control method of a user interface according to an embodiment of the present application. As shown in fig. 10, the electronic device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, robots, personal digital assistants, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 10, the electronic device 500 includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing some of the necessary operations. In fig. 10, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the voice control method of the user interface provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform a voice control method of a user interface provided by the present application.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the voice control method of the user interface in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., a voice control method of the user interface in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through the use of the electronic device, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, connected to the electronic device shown in fig. 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device shown in fig. 10 may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 10 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus shown in fig. 10, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), GPUs (graphics processing units), FPGAs (field-programmable gate arrays), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order; this is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (24)

1. A method for voice control of a user interface, comprising:
determining text information according to the acquired voice information;
generating a control instruction according to the text information and feature information in the user interface to be controlled, wherein the feature information is the feature information corresponding to each control in the user interface to be controlled;
and in response to the control instruction, triggering a control to be controlled corresponding to the control instruction in the user interface to be controlled.
2. The method according to claim 1, wherein the generating of the control instruction according to the text information and the feature information in the user interface to be controlled comprises:
determining the control to be controlled according to the text information and the feature information in the user interface to be controlled;
and generating the control instruction according to the text information and the control type of the control to be controlled.
3. The method according to claim 2, wherein the feature information comprises the names of the respective controls in the user interface to be controlled;
correspondingly, the determining the control to be controlled according to the text information and the feature information in the user interface to be controlled includes:
and determining the control to be controlled according to a first text in the text information and the name of each control in the user interface to be controlled, wherein the first text comprises keywords in the name of the control to be controlled.
4. The method according to claim 3, wherein the generating of the control instruction according to the text information and the control type of the control to be controlled comprises:
if the control type of the control to be controlled is an input box control, determining an input instruction according to a second text in the text information;
correspondingly, the triggering, in response to the control instruction, of the control to be controlled corresponding to the control instruction in the user interface to be controlled comprises:
and in response to the input instruction, inputting the second text into the corresponding input box in the user interface to be controlled.
5. The method according to claim 3, wherein the generating of the control instruction according to the text information and the control type of the control to be controlled comprises:
if the control type of the control to be controlled is a selection control, determining a selection instruction according to a second text in the text information;
correspondingly, the triggering, in response to the control instruction, of the control to be controlled corresponding to the control instruction in the user interface to be controlled comprises:
and in response to the selection instruction, selecting the option corresponding to the second text in the corresponding selection box in the user interface to be controlled.
6. The method according to claim 3, wherein the generating of the control instruction according to the text information and the control type of the control to be controlled comprises:
if the control type of the control to be controlled is a click control, determining a click instruction according to the first text;
correspondingly, the triggering, in response to the control instruction, of the control to be controlled corresponding to the control instruction in the user interface to be controlled comprises:
and in response to the click instruction, triggering the control to be clicked corresponding to the click instruction in the user interface to be controlled.
7. The method for voice control of a user interface according to any one of claims 1 to 6, further comprising, before the determining of the text information according to the acquired voice information:
acquiring the voice information input by a user;
acquiring image information of the user's face, and generating a wake-up instruction according to the voice information and the image information, wherein the wake-up instruction is used for waking up a voice service.
8. The method of claim 7, wherein the generating of the wake-up instruction according to the voice information and the image information comprises:
determining lip movement frequency according to the image information;
and if the voice information matches the lip movement frequency, generating the wake-up instruction.
9. The method for voice control of a user interface according to any one of claims 1 to 6, further comprising, after the determining of the text information according to the acquired voice information:
determining feedback information according to the text information and language environment information, wherein the language environment information comprises a plurality of pieces of intention information corresponding to the text information, and the feedback information is the piece of intention information with the highest service request confidence among the plurality of pieces of intention information;
and outputting the feedback information.
10. The method of claim 9, wherein the determining of the feedback information according to the text information and the language environment information comprises:
acquiring multiple rounds of historical text information corresponding to voice information historically input by a user over multiple rounds;
acquiring multiple rounds of historical feedback information output in response to the voice information historically input over multiple rounds;
and determining the service request confidence of each piece of intention information among the plurality of pieces of intention information according to the multiple rounds of historical text information and the multiple rounds of historical feedback information.
11. The method according to claim 10, wherein the generating of the control instruction according to the text information and the feature information in the user interface to be controlled comprises:
determining the feedback information according to the text information and the language environment information;
and generating the control instruction according to the feedback information and the feature information in the user interface to be controlled.
12. A voice control apparatus for a user interface, comprising:
the acquisition module is used for acquiring voice information;
the recognition module is used for determining text information according to the voice information;
the generating module is used for generating a control instruction according to the text information and feature information in the user interface to be controlled, wherein the feature information is the feature information corresponding to each control in the user interface to be controlled;
and the control module is used for responding to the control instruction and triggering the control to be controlled corresponding to the control instruction in the user interface to be controlled.
13. The voice control apparatus for a user interface according to claim 12, wherein the generating module is specifically configured to:
determining the control to be controlled according to the text information and the feature information in the user interface to be controlled;
and generating the control instruction according to the text information and the control type of the control to be controlled.
14. The voice control apparatus for a user interface according to claim 13, wherein the feature information comprises the names of the respective controls in the user interface to be controlled;
correspondingly, the generating module is specifically configured to:
and determining the control to be controlled according to a first text in the text information and the name of each control in the user interface to be controlled, wherein the first text comprises keywords in the name of the control to be controlled.
15. The voice control apparatus for a user interface according to claim 14, wherein the generating module is specifically configured to:
if the control type of the control to be controlled is an input box control, determining an input instruction according to a second text in the text information;
the control module is specifically configured to:
and in response to the input instruction, inputting the second text into the corresponding input box in the user interface to be controlled.
16. The voice control apparatus for a user interface according to claim 14, wherein the generating module is specifically configured to:
if the control type of the control to be controlled is a selection control, determining a selection instruction according to a second text in the text information;
the control module is specifically configured to:
and in response to the selection instruction, selecting the option corresponding to the second text in the corresponding selection box in the user interface to be controlled.
17. The voice control apparatus for a user interface according to claim 14, wherein the generating module is specifically configured to:
if the control type of the control to be controlled is a click control, determining a click instruction according to the first text;
the control module is specifically configured to:
and in response to the click instruction, triggering the control to be clicked corresponding to the click instruction in the user interface to be controlled.
18. The voice control apparatus for a user interface according to any one of claims 12 to 17, wherein the acquisition module is further used for acquiring the voice information input by a user;
the acquisition module is further used for acquiring image information of the user's face and generating a wake-up instruction according to the voice information and the image information, wherein the wake-up instruction is used for waking up a voice service.
19. The voice control apparatus for a user interface according to claim 18, wherein the generating module is specifically configured to:
determining lip movement frequency according to the image information;
and if the voice information matches the lip movement frequency, generating the wake-up instruction.
20. The voice control apparatus for a user interface according to any one of claims 12 to 17, further comprising:
the determining module is used for determining feedback information according to the text information and language environment information, wherein the language environment information comprises a plurality of pieces of intention information corresponding to the text information, and the feedback information is the piece of intention information with the highest service request confidence among the plurality of pieces of intention information;
and the output module is used for outputting the feedback information.
21. The voice control apparatus for a user interface according to claim 20, wherein the determining module is specifically configured to:
acquiring multiple rounds of historical text information corresponding to voice information historically input by a user over multiple rounds;
acquiring multiple rounds of historical feedback information output in response to the voice information historically input over multiple rounds;
and determining the service request confidence of each piece of intention information among the plurality of pieces of intention information according to the multiple rounds of historical text information and the multiple rounds of historical feedback information.
22. The voice control apparatus for a user interface according to claim 21, wherein the generating module is specifically configured to:
determining the feedback information according to the text information and the language environment information;
and generating the control instruction according to the feedback information and the feature information in the user interface to be controlled.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for voice control of a user interface according to any one of claims 1 to 11.
24. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used for causing a computer to perform the method for voice control of a user interface according to any one of claims 1 to 11.
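
The following Python sketches illustrate the claimed flow; they are illustrations only and not part of the claims. First, the name matching and control-type dispatch of claims 1 to 6, assuming a hypothetical Control structure, a whole-name keyword test, and invented instruction dictionaries:

# A sketch of claims 1-6: find the control whose name appears in the text,
# then build an instruction according to the control type. Control, the
# substring test, and the instruction dictionaries are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Control:
    name: str          # the control's name in the user interface to be controlled
    control_type: str  # assumed types: "input", "select", or "click"

def find_control(text: str, controls: list) -> Optional[Control]:
    # Claim 3: the first text contains a keyword of the control's name;
    # using the whole name as the keyword is an assumption here.
    for control in controls:
        if control.name in text:
            return control
    return None

def make_instruction(text: str, control: Control) -> dict:
    # Claims 4-6: dispatch on the control type; taking the "second text" to
    # be the recognized text minus the control name is an assumption.
    second_text = text.replace(control.name, "").strip()
    if control.control_type == "input":
        return {"action": "input", "target": control.name, "value": second_text}
    if control.control_type == "select":
        return {"action": "select", "target": control.name, "option": second_text}
    return {"action": "click", "target": control.name}

controls = [Control("destination", "input"), Control("confirm", "click")]
matched = find_control("destination Beijing", controls)
if matched is not None:
    print(make_instruction("destination Beijing", matched))
    # -> {'action': 'input', 'target': 'destination', 'value': 'Beijing'}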
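
Next, the wake-up condition of claims 7 and 8. The claims require only that the voice information match the lip movement frequency; the per-frame activity thresholds and the agreement ratio below are assumptions made for illustration:

# A sketch of claims 7-8: wake the voice service only when the voice signal
# matches the lip movement. The 0.1 activity thresholds and the 0.5
# agreement ratio are assumptions; the claims leave the criterion open.
def should_wake(voice_energy, lip_openings, agreement=0.5):
    frames = min(len(voice_energy), len(lip_openings))
    if frames == 0:
        return False
    # Count frames where speech activity and lip movement agree.
    agree = sum(
        1 for i in range(frames)
        if (voice_energy[i] > 0.1) == (lip_openings[i] > 0.1)
    )
    return agree / frames >= agreement

# Speech energy rises exactly when the lips move, so the service wakes.
print(should_wake([0.0, 0.6, 0.7, 0.0], [0.0, 0.3, 0.4, 0.0]))  # True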
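
Finally, the feedback selection of claims 9 to 11: the intention with the highest service request confidence, given the multi-round history, is chosen as feedback. The word-overlap confidence score below is an assumption; the claims do not fix a formula:

# A sketch of claims 9-11: score each candidate intention against the
# multi-round historical texts and feedback, and return the one with the
# highest service request confidence. The overlap score is an assumption.
def rank_intents(intents, history_texts, history_feedback):
    history_words = set()
    for entry in history_texts + history_feedback:
        history_words.update(entry.lower().split())

    def confidence(intent):
        words = intent.lower().split()
        if not words:
            return 0.0
        # Fraction of the intention's words seen in the dialogue history.
        return sum(w in history_words for w in words) / len(words)

    # Claim 9: the feedback is the intention with the highest confidence.
    return max(intents, key=confidence)

print(rank_intents(
    ["book a train ticket", "play music"],
    ["I need a ticket to Beijing"],
    ["Which train would you like?"]))
# -> book a train ticket
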
CN202010220645.1A 2020-03-25 2020-03-25 Voice control method and device of user interface, electronic equipment and storage medium Active CN111309283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220645.1A CN111309283B (en) 2020-03-25 2020-03-25 Voice control method and device of user interface, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111309283A true CN111309283A (en) 2020-06-19
CN111309283B CN111309283B (en) 2023-12-05

Family

ID=71150325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220645.1A Active CN111309283B (en) 2020-03-25 2020-03-25 Voice control method and device of user interface, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111309283B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431573A (en) * 2007-11-08 2009-05-13 上海赢思软件技术有限公司 Method and equipment for implementing automatic customer service through human-machine interaction technology
CN105551488A (en) * 2015-12-15 2016-05-04 深圳Tcl数字技术有限公司 Voice control method and system
US20170193992A1 (en) * 2015-12-30 2017-07-06 Le Holdings (Beijing) Co., Ltd. Voice control method and apparatus
CN106205611A (en) * 2016-06-29 2016-12-07 北京智能管家科技有限公司 A kind of man-machine interaction method based on multi-modal historical responses result and system
CN106297777A (en) * 2016-08-11 2017-01-04 广州视源电子科技股份有限公司 A kind of method and apparatus waking up voice service up
CN107316643A (en) * 2017-07-04 2017-11-03 科大讯飞股份有限公司 Voice interactive method and device
CN109992237A (en) * 2018-01-03 2019-07-09 腾讯科技(深圳)有限公司 Intelligent sound apparatus control method, device, computer equipment and storage medium
US20190228212A1 (en) * 2018-01-22 2019-07-25 Beijing Baidu Netcom Science And Technology Co., Ltd. Wakeup method, apparatus and device based on lip reading, and computer readable medium
CN109741737A (en) * 2018-05-14 2019-05-10 北京字节跳动网络技术有限公司 A kind of method and device of voice control
CN109003605A (en) * 2018-07-02 2018-12-14 北京百度网讯科技有限公司 Intelligent sound interaction processing method, device, equipment and storage medium
US20200013406A1 (en) * 2018-07-03 2020-01-09 Boe Technology Group Co., Ltd. Control method for human-computer interaction device, human-computer interaction device and human-computer interaction system
CN110874201A (en) * 2018-08-29 2020-03-10 阿里巴巴集团控股有限公司 Interaction method, device, storage medium and operating system
CN109377995A (en) * 2018-11-20 2019-02-22 珠海格力电器股份有限公司 A kind of method and apparatus controlling equipment
CN109960537A (en) * 2019-03-29 2019-07-02 北京金山安全软件有限公司 Interaction method and device and electronic equipment
CN110675870A (en) * 2019-08-30 2020-01-10 深圳绿米联创科技有限公司 Voice recognition method and device, electronic equipment and storage medium
CN110737335A (en) * 2019-10-11 2020-01-31 深圳追一科技有限公司 Interaction method and device of robot, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Guibao (徐贵宝): "Research on Voice Control of Internet Interaction and Its Key Technologies", Telecommunications Network Technology (电信网技术), no. 01, pages 38-42 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102832A (en) * 2020-09-18 2020-12-18 广州小鹏汽车科技有限公司 Speech recognition method, speech recognition device, server and computer-readable storage medium
CN112102832B (en) * 2020-09-18 2021-12-28 广州小鹏汽车科技有限公司 Speech recognition method, speech recognition device, server and computer-readable storage medium
CN112201245A (en) * 2020-09-30 2021-01-08 中国银行股份有限公司 Information processing method, device, equipment and storage medium
CN112201245B (en) * 2020-09-30 2024-02-06 中国银行股份有限公司 Information processing method, device, equipment and storage medium
CN113571057A (en) * 2021-06-15 2021-10-29 北京来也网络科技有限公司 Voice control method and device combining RPA and AI
WO2023087934A1 (en) * 2021-11-19 2023-05-25 杭州逗酷软件科技有限公司 Voice control method, apparatus, device, and computer storage medium
WO2023103918A1 (en) * 2021-12-07 2023-06-15 杭州逗酷软件科技有限公司 Speech control method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
CN111309283B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN111309283B (en) Voice control method and device of user interface, electronic equipment and storage medium
KR20190023341A (en) Method for operating speech recognition service and electronic device supporting the same
JP7091430B2 (en) Interaction information recommendation method and equipment
CN112533041A (en) Video playing method and device, electronic equipment and readable storage medium
CN111354360A (en) Voice interaction processing method and device and electronic equipment
CN111680517B (en) Method, apparatus, device and storage medium for training model
CN111275190A (en) Neural network model compression method and device, image processing method and processor
CN111507111B (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium
CN111443801B (en) Man-machine interaction method, device, equipment and storage medium
CN112466280B (en) Voice interaction method and device, electronic equipment and readable storage medium
EP3799036A1 (en) Speech control method, speech control device, electronic device, and readable storage medium
CN112148850A (en) Dynamic interaction method, server, electronic device and storage medium
KR20220011083A (en) Information processing method, device, electronic equipment and storage medium in user dialogue
CN113325954A (en) Method, apparatus, device, medium and product for processing virtual objects
CN110767212B (en) Voice processing method and device and electronic equipment
CN112382291B (en) Voice interaction processing method and device, electronic equipment and storage medium
US20210098012A1 (en) Voice Skill Recommendation Method, Apparatus, Device and Storage Medium
CN112652304A (en) Voice interaction method and device of intelligent equipment and electronic equipment
CN112382292A (en) Voice-based control method and device
EP3901905B1 (en) Method and apparatus for processing image
US20220075952A1 (en) Method and apparatus for determining recommended expressions, device and computer storage medium
CN116339871A (en) Control method and device of terminal equipment, terminal equipment and storage medium
CN111651229A (en) Font changing method, device and equipment
CN111783872A (en) Method and device for training model, electronic equipment and computer readable storage medium
CN111738325A (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant