WO2023040692A1 - Speech control method, apparatus and device, and medium - Google Patents

Speech control method, apparatus and device, and medium

Info

Publication number
WO2023040692A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
target
control instruction
interface
voice
Application number
PCT/CN2022/117090
Other languages
French (fr)
Chinese (zh)
Inventor
胡明国 (Hu Mingguo)
徐超 (Xu Chao)
Original Assignee
北京车和家信息技术有限公司 (Beijing CHJ Information Technology Co., Ltd.)
Application filed by 北京车和家信息技术有限公司


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • The present disclosure relates to the technical field of voice recognition, and in particular, to a voice control method, apparatus, device and medium.
  • Control instructions corresponding to each interactive interface are pre-stored in an electronic device, and the user can operate each interactive interface of the electronic device by uttering these control instructions.
  • However, because the number of pre-stored control instructions is limited, the user cannot achieve full voice control of every interactive interface with these control instructions.
  • The present disclosure provides a voice control method, apparatus, device and medium.
  • In a first aspect, the present disclosure provides a voice control method, including: displaying a target interactive interface; loading a target control instruction set corresponding to the target interactive interface, where the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface; in response to receiving a user control voice, querying the target control instruction set for a target control instruction that matches the user control voice; and, in response to finding the target control instruction, executing the target control operation corresponding to the target control instruction.
  • The present disclosure further provides a voice control apparatus, including:
  • an interface display module configured to display a target interactive interface;
  • an instruction loading module configured to load the target control instruction set corresponding to the target interactive interface, where the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface;
  • an instruction matching module configured to, in response to receiving a user control voice, query the target control instruction set for a target control instruction that matches the user control voice; and
  • an instruction execution module configured to, in response to finding the target control instruction, execute the target control operation corresponding to the target control instruction.
  • The present disclosure further provides a voice control device, including a processor and a memory for storing executable instructions, where the processor is configured to read the executable instructions from the memory and execute them to implement the voice control method described in the first aspect.
  • The present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the voice control method described in the first aspect.
  • The present disclosure provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the voice control method described in the first aspect.
  • With the voice control method, apparatus, device and medium of the embodiments of the present disclosure, after the target interactive interface is displayed, the target control instruction set corresponding to it can be loaded; then, when a user control voice is received, the target control instruction set is queried for a target control instruction that matches the received user control voice, and the matched instruction is executed, thereby realizing the user's voice control of the target interactive interface.
  • Because the interface control data can cover all interface controls in the target interactive interface, full voice control of the target interactive interface can be realized, achieving a "what you see can be said" (Display Can be Said, DCS) effect for the target interactive interface.
  • FIG. 1 is a schematic flowchart of a voice control method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • FIG. 3A is a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • FIG. 3B is a schematic diagram of another application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a processing procedure for dynamic control data provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of another voice control method provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • The term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • The term “based on” means “based at least in part on”.
  • The term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms are given in the description below.
  • Control instructions corresponding to each interactive interface are pre-stored in the electronic device, and the user can operate each interactive interface of the electronic device by uttering these control instructions.
  • For example, the control instruction may be a wake-up word.
  • Each interactive interface may be registered with a fixed number of wake-up words, and the user can meet the voice control requirements of each interactive interface by speaking these wake-up words.
  • However, the control instructions need to be set in advance, which means that the wake-up words of each interactive interface need to be designed in advance. If the content of an interactive interface is loaded dynamically, the limited number of pre-stored control instructions prevents the user from achieving full voice control of each interactive interface with these instructions.
  • In addition, wake-up models based on wake-up words are usually small and cannot support a large number of complex wake-up words in one scenario. Registering too many wake-up words on one interactive interface also causes false wake-ups of the wake-up model.
  • To address this, the embodiments of the present disclosure provide a voice control method, apparatus, device and medium capable of realizing "see-to-talk".
  • The voice control method provided by the embodiments of the present disclosure is first described below with reference to FIG. 1 to FIG. 5.
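  • The voice control method may be executed by an electronic device.
  • The electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a wearable electronic device, a smart home device, or another device with a voice control function.
  • FIG. 1 shows a schematic flowchart of a voice control method provided by an embodiment of the present disclosure.
  • As shown in FIG. 1, the voice control method may include the following steps S110 to S140: S110, displaying a target interactive interface; S120, loading the target control instruction set corresponding to the target interactive interface; S130, in response to receiving a user control voice, querying the target control instruction set for a target control instruction that matches the user control voice; S140, in response to finding the target control instruction, executing the target control operation corresponding to the target control instruction.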
  • In S110, the target interactive interface may be an interface visually displayed on a display screen of the electronic device.
  • At least one interface control may be displayed in the target interactive interface.
  • The interface controls may be user-operable controls in the interface, such as buttons, options, icons or links.
  • In some embodiments, the target interactive interface may include the main interface displayed when the electronic device is powered on or in a standby state.
  • In this case, the interface controls displayed in the target interactive interface may be icons of application programs.
  • FIG. 2 shows a schematic diagram of a main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • As shown in FIG. 2, the vehicle-mounted terminal can display a main interface 201, in which multiple interface controls can be displayed, such as a “Setting Application” icon 202, a “File Application” icon 203, a “Browser Application” icon 204 and a “Music Application” icon 205.
  • In other embodiments, the target interactive interface may include an application interface of any application program installed on the electronic device.
  • In this case, the interface controls displayed in the target interactive interface may be buttons, options, icons or links in the application interface.
  • FIG. 3A shows a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • As shown in FIG. 3A, the vehicle-mounted terminal can display the main application interface 301 of a settings application program, in which multiple interface controls can be displayed, such as a “My Device” button 302, a “Control Center” button 303 and a “More Settings” button 304.
  • FIG. 3B shows a schematic diagram of another application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • As shown in FIG. 3B, the vehicle-mounted terminal can display the main application interface 305 of a music application program, in which multiple interface controls can be displayed, such as a “Daily Recommendation” option 306, a “Song List” option 307, a “Local Music” option 308, a “Settings” option 309, various playback control buttons 310, various playlist links 311 and a “Back” button 312.
  • In S120, the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface.
  • After the target interactive interface is displayed, the target control instruction set corresponding to it can be loaded, and each control instruction in the target control instruction set is generated according to the interface control data of the target interactive interface.
  • The interface control data may include control data corresponding to all interface controls, that is, every interface control in the target interactive interface has corresponding control instructions.
  • The target control instruction set may be the set of control instructions corresponding to the interface controls in the target interactive interface.
  • In some embodiments, the control instructions may include first control instructions generated according to static control data in the interface control data.
  • A first control instruction is a control instruction of a static control.
  • The static control data may be control data corresponding to static controls in the target interactive interface.
  • A static control may be an interface control that is always displayed in a fixed manner, that is, it does not change with user preferences or settings.
  • Static controls may be interface controls that ship with the device from the factory and cannot be dynamically updated or changed by the user.
  • Taking FIG. 2 as an example, the “Setting Application” icon 202, the “File Application” icon 203 and the “Browser Application” icon 204 are static controls of the main interface 201, and the control instructions generated from their control data are the first control instructions corresponding to the main interface 201.
  • Static controls may also be interface controls that are displayed in a fixed manner within the interface framework and do not change with user preferences, such as built-in resources preset in the interface project; these built-in resources can be displayed in the application interface, and their content is known in advance, before the interface is presented to the user.
  • Taking FIG. 3A as an example, the “My Device” button 302, the “Control Center” button 303 and the “More Settings” button 304 are static controls of the main application interface 301, and the control instructions generated from their control data are the first control instructions corresponding to the main application interface 301.
  • In some embodiments, the control instructions may further include second control instructions generated according to dynamic control data in the interface control data.
  • A second control instruction is a control instruction of a dynamic control.
  • The dynamic control data may be control data corresponding to dynamic controls in the target interactive interface.
  • A dynamic control may be an interface control that can be dynamically updated or changed according to user preferences or settings.
  • For example, a dynamic control may be an interface control added by the user.
  • Taking FIG. 2 as an example, the “Music Application” icon 205 corresponding to the music application is a dynamic control of the main interface 201, and the control instruction generated from the control data of the “Music Application” icon 205 is the second control instruction corresponding to the main interface 201.
  • A dynamic control may also be an interface control that is dynamically updated within the interface framework, such as a resource fetched from a network source and then filled into the interface.
  • Taking FIG. 3B as an example, the playlist links 311 corresponding to the playlist names are dynamic controls of the main application interface 305, and the control instructions generated from the control data of each playlist link 311 are the second control instructions corresponding to the main application interface 305.
  • A dynamic control may also be an interface control within the interface framework that changes according to user preferences, which is not described in detail here.
  • Therefore, the loaded target control instruction set corresponding to the target interactive interface can be used to control both the static controls and the dynamic controls in the target interactive interface.
  • In this way, by loading the target control instruction set corresponding to the target interactive interface, the electronic device ensures that every interface control in the target interactive interface has corresponding control instructions, so that the user can control the target interactive interface by voice more comprehensively.
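A minimal Python sketch of what a loaded set might contain, assuming hypothetical names (`ControlInstruction`, `make_instruction`, `load_target_instruction_set`) and whitespace splitting in place of real word segmentation. Static instructions are treated as pre-generated, dynamic ones are built from the current control data, so every control on the interface ends up with at least one instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInstruction:
    """Hypothetical record for one control instruction."""
    verbs: frozenset       # semantically similar verbs, e.g. {"open", "launch"}
    segments: frozenset    # word segments of the control text
    control_id: str        # interface control the instruction acts on


def make_instruction(control_id: str, control_text: str, verbs: set) -> ControlInstruction:
    # Whitespace splitting stands in for a real word segmenter.
    return ControlInstruction(frozenset(verbs), frozenset(control_text.lower().split()), control_id)


def load_target_instruction_set(static_instructions, dynamic_controls, verbs):
    """Combine pre-generated static instructions with instructions built from dynamic control data."""
    dynamic = [make_instruction(cid, text, verbs) for cid, text in dynamic_controls.items()]
    return list(static_instructions) + dynamic


static = [make_instruction("icon_202", "Setting Application", {"open", "launch"})]
dynamic_controls = {"icon_205": "Music Application"}  # e.g. an app added by the user
instruction_set = load_target_instruction_set(static, dynamic_controls, {"open", "launch"})
print(sorted(i.control_id for i in instruction_set))  # ['icon_202', 'icon_205']
```

Keeping the static part as a ready-made list reflects the idea that it can be pre-stored, while only the dynamic part has to be built when the interface is shown.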
  • In S130, after loading the target control instruction set, the electronic device can monitor for user control voice and, after receiving a user control voice, search the control instructions of the target control instruction set for a target control instruction that matches the user control voice.
  • In some embodiments, S130 may specifically include: converting the user control voice into target voice text; and querying the target control instruction set for a target control instruction that matches the target voice text.
  • Specifically, the electronic device can input the user control voice into an offline automatic speech recognition (ASR) engine to obtain the target voice text output by the ASR engine, and then query the control instructions of the target control instruction set for a target control instruction that matches the target voice text.
  • The target control instruction matches the target voice text if the target voice text contains any verb and any control text word segment of the target control instruction, or if a verb in the target voice text is the same as a verb in the target control instruction and the similarity between a noun in the target voice text and any control text word segment of the target control instruction is greater than or equal to a preset similarity threshold.
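The two matching conditions can be sketched in Python as follows. This is an illustration under stated assumptions: whitespace tokenization replaces real word segmentation, every non-verb token is treated as a noun, and `difflib.SequenceMatcher.ratio()` stands in for the unspecified similarity measure; `matches`, `query` and the threshold value 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def matches(voice_text, verbs, segments, threshold=0.8):
    """Return True if voice_text satisfies either matching condition for one instruction."""
    text = voice_text.lower()
    tokens = text.split()
    if not any(v in tokens for v in verbs):
        return False  # both conditions require a verb from the instruction
    if any(seg in text for seg in segments):
        return True   # condition 1: a verb plus a contained control text segment
    # Condition 2: a verb plus a noun similar enough to some control text segment.
    nouns = [t for t in tokens if t not in verbs]  # crude stand-in for noun extraction
    return any(SequenceMatcher(None, n, seg).ratio() >= threshold
               for n in nouns for seg in segments)


def query(instruction_set, voice_text):
    """Return the control_id of the first matching instruction, or None."""
    for verbs, segments, control_id in instruction_set:
        if matches(voice_text, verbs, segments):
            return control_id
    return None


instructions = [({"open", "launch"}, {"music"}, "icon_205")]
print(query(instructions, "open music"))   # icon_205 (condition 1)
print(query(instructions, "launch musc"))  # icon_205 (condition 2, similarity about 0.89)
print(query(instructions, "play video"))   # None
```

The similarity fallback tolerates small ASR errors in the control name, while the verb check keeps ordinary conversation that merely mentions a control name from triggering it.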
  • By querying the target control instruction set for the target control instruction that matches the user control voice, the electronic device can determine the user's voice control intention.
  • In S140, if a matching target control instruction is found, the electronic device may execute the target control operation corresponding to the target control instruction; if no matching target control instruction is found, the electronic device may continue to detect the user's voice and wait for the next user control voice.
  • In some embodiments, S140 may specifically include: performing the target control operation on the target interface control involved in the target control instruction.
  • Each control instruction can be used to trigger the target control operation on the target interface control involved in that control instruction, that is, on the interface control to which the control data used to generate the control instruction belongs.
  • The target control operation may be a control operation performed in the target control mode indicated by the target control instruction.
  • Specifically, the electronic device may perform, in the target control mode indicated by the target control instruction, a control operation on the target interface control to which the control data used to generate the control instruction belongs.
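Steps S130 and S140 together can be pictured with the following Python sketch. All names here (`handle_user_voice`, the `asr` and `is_match` callables, the dictionary keys `control_id` and `control_mode`, the `controls` handler table) are hypothetical; the sketch only shows the control flow: recognize, query, then either run the operation on the control that produced the instruction or return `None` so the device keeps listening.

```python
from typing import Callable, Optional


def handle_user_voice(audio, asr: Callable, instruction_set: list,
                      is_match: Callable, controls: dict) -> Optional[str]:
    text = asr(audio)  # S130: user control voice -> target voice text
    for instr in instruction_set:
        if is_match(text, instr):
            # S140: run the operation, in the instruction's control mode,
            # on the control whose data generated the instruction.
            return controls[instr["control_id"]](instr["control_mode"])
    return None  # no match: keep listening for the next user control voice


def is_match(text, instr):
    return (any(v in text for v in instr["verbs"])
            and any(s in text for s in instr["segments"]))


def fake_asr(audio):
    return audio  # pretend the "audio" is already recognized text


instruction_set = [{"verbs": ("play",), "segments": ("daily recommendation",),
                    "control_id": "option_306", "control_mode": "play"}]
controls = {"option_306": lambda mode: f"option_306 -> {mode}"}

print(handle_user_voice("play daily recommendation", fake_asr, instruction_set, is_match, controls))
# option_306 -> play
print(handle_user_voice("open settings", fake_asr, instruction_set, is_match, controls))  # None
```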
  • After the target control operation is executed, the electronic device may enter a new interactive interface or remain in the target interactive interface.
  • When the electronic device remains in the target interactive interface, it does not need to reload a control instruction set and can continue to support the user's voice control of the target interactive interface based on the target control instruction set.
  • When the electronic device enters a new interactive interface, it needs to load the control instruction set corresponding to the new interactive interface, so as to support the user's voice control of the new interactive interface based on the reloaded control instruction set.
  • For example, when the user has the electronic device perform the control operation “open the music application”, the electronic device enters the main application interface of the music application. Therefore, after jumping from the main interface 201 to that interface, the electronic device needs to reload the control instruction set corresponding to the main application interface of the music application, so as to support the user's voice control of that interface based on the reloaded control instruction set.
  • As another example, when the user has the electronic device perform the control operation “play daily recommendation”, the electronic device can play the daily recommended songs directly in the main application interface 305 of the music application without jumping to another interface. Therefore, without reloading a control instruction set, the user can continue to control the main application interface 305 by voice based on the control instruction set corresponding to that interface.
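The reload rule reduces to caching the instruction set of the currently displayed interface. A minimal sketch, assuming a hypothetical `loader` callable that returns the instruction set for an interface identifier; the class name and identifiers are illustrative only.

```python
class InstructionSetCache:
    """Keep the instruction set of the currently displayed interface (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader          # interface_id -> control instruction set
        self._interface_id = None
        self.instructions = None
        self.load_count = 0

    def on_interface_displayed(self, interface_id):
        if interface_id != self._interface_id:
            # Entered a new interactive interface: load its control instruction set.
            self._interface_id = interface_id
            self.instructions = self._loader(interface_id)
            self.load_count += 1
        # Same interface (e.g. after "play daily recommendation"): keep the current set.
        return self.instructions


cache = InstructionSetCache(lambda iid: [f"instructions for {iid}"])
cache.on_interface_displayed("main_201")
cache.on_interface_displayed("music_main_305")  # after "open the music application"
cache.on_interface_displayed("music_main_305")  # after "play daily recommendation"
print(cache.load_count)  # 2
```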
  • In summary, after the target interactive interface is displayed, the target control instruction set corresponding to it can be loaded; when a user control voice is received, the target control instruction set is queried for a target control instruction that matches the received user control voice, and the found target control instruction is executed, thereby realizing the user's voice control of the target interactive interface.
  • Because the loaded target control instruction set includes control instructions generated according to the interface control data of the target interactive interface, and the interface control data can cover all interface controls in the target interactive interface, full voice control of the target interactive interface can be realized, achieving the DCS effect for the target interactive interface and improving the user experience.
  • In some embodiments, when the control instructions include the first control instructions, the electronic device may directly acquire pre-generated first control instructions.
  • In these embodiments, S120 may specifically include: determining the target application to which the target interactive interface belongs; querying, among multiple pre-stored preset control instruction sets, the control instruction set corresponding to the target application; and extracting the first control instructions from the control instruction set corresponding to the target application.
  • Specifically, multiple preset control instruction sets may be pre-stored in the electronic device, each corresponding to one application program, that is, each preset control instruction set may contain the control instructions of all static controls involved in that application program.
  • The target application may be the application program to which the target interactive interface belongs.
  • The electronic device may take the application program that needs to run to display the target interactive interface as the target application to which the target interactive interface belongs.
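A minimal sketch of this lookup, assuming the pre-stored preset sets are kept as a nested dictionary keyed first by application and then by interface; `load_first_instructions`, the mapping names and the sample identifiers are hypothetical.

```python
def load_first_instructions(interface_id, interface_to_app, preset_sets):
    """Return the pre-generated first control instructions of one interface."""
    target_app = interface_to_app[interface_id]   # app that must run to show the interface
    app_set = preset_sets.get(target_app, {})     # preset set pre-stored for that app
    return app_set.get(interface_id, [])          # static-control instructions of this interface


interface_to_app = {"settings_main_301": "settings", "music_main_305": "music"}
preset_sets = {
    "settings": {"settings_main_301": ["open my device", "open control center", "open more settings"]},
    "music": {"music_main_305": ["play daily recommendation", "open local music"]},
}
print(load_first_instructions("settings_main_301", interface_to_app, preset_sets))
# ['open my device', 'open control center', 'open more settings']
```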
  • The electronic device may first receive the preset control instruction sets from a server.
  • The server may receive, as input by a developer, the control instructions of all static controls corresponding to each interactive interface of an application program, together with the control mode corresponding to each control instruction.
  • The control instruction of each static control includes a verb set and a control text word segmentation set corresponding to the static control.
  • The control text word segmentation set may be the set of word segments extracted from the static control text, where the static control text may be the control name of the static control visible to the user; the verb set in the control instruction includes multiple semantically similar verbs.
  • The server may also extract the control text word segmentation set from the control data of the static control, that is, from the static control text in the static control data, and then obtain multiple control instructions of the static control by combining different preset verb sets with the control text word segmentation set, where the verb set in each control instruction includes multiple semantically similar verbs.
  • The server may further determine the control mode corresponding to a control instruction according to the action indicated by the verb set in the control instruction and the control function of the static control corresponding to the control text word segmentation set.
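Combining preset verb sets with a control's word segmentation set, and attaching a control mode to each result, might look like the following Python sketch. The verb sets, the action names and the `"action:control_id"` encoding of the control mode are assumptions for illustration; the source does not specify them.

```python
# Hypothetical preset verb sets: action name -> semantically similar verbs.
VERB_SETS = {
    "open": ("open", "launch", "start"),
    "close": ("close", "exit", "quit"),
}


def generate_instructions(control_id, segments, supported_actions):
    """Build one control instruction per supported action of a control."""
    instructions = []
    for action in supported_actions:
        instructions.append({
            "verbs": VERB_SETS[action],
            "segments": tuple(segments),
            "control_id": control_id,
            # Control mode = action of the verb set applied to this control's function.
            "control_mode": f"{action}:{control_id}",
        })
    return instructions


result = generate_instructions("icon_203", ["file", "file application"], ["open", "close"])
for instr in result:
    print(instr["control_mode"], instr["verbs"])
# open:icon_203 ('open', 'launch', 'start')
# close:icon_203 ('close', 'exit', 'quit')
```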
  • each participle in the control text participle set can be connected by "
  • In this way, word segmentation set content conforming to the Extended Backus-Naur Form (EBNF) grammar can be obtained, so that the first control instructions can be loaded into the language model of a grammar (Grammar) engine.
  • the verb set can be "open
  • the participle set may be "music
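For illustration only, a rule built from a verb set and a word segmentation set could be rendered in ISO/IEC 14977 EBNF notation, where `|` separates alternatives, `,` concatenates and `;` ends a rule. The rule name, the quoting and the use of `|` as the joining symbol are assumptions of this sketch, since the exact separator is not legible in the source text; it also assumes symbols (including quotes) were already stripped from the segments.

```python
def to_ebnf_rule(name, verbs, segments):
    """Render a verb set and a word segmentation set as one EBNF rule (illustrative)."""
    verb_alternatives = " | ".join(f'"{v}"' for v in verbs)
    segment_alternatives = " | ".join(f'"{s}"' for s in segments)
    return f"{name} = ( {verb_alternatives} ) , ( {segment_alternatives} ) ;"


print(to_ebnf_rule("open_music", ["open", "launch"], ["music", "music application"]))
# open_music = ( "open" | "launch" ) , ( "music" | "music application" ) ;
```

A real grammar engine would expect its own rule syntax (for example JSGF or SRGS), so the string format here is only a stand-in for "alternatives joined into one loadable rule".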
  • Because the static controls in the target interactive interface may be updated for reasons such as version upgrades, the preset control instruction sets stored in the electronic device also need to be updated, to ensure that the user can control all static controls in the updated target interactive interface by voice.
  • In some embodiments, the voice control method may further include: detecting the instruction set version of the control instruction set corresponding to the target application.
  • Specifically, the electronic device may detect the instruction set version of the control instruction set corresponding to the target application to obtain its version number.
  • Accordingly, extracting the first control instructions from the control instruction set corresponding to the target application may specifically include: in response to detecting that the instruction set version is the latest version, extracting the first control instructions from the control instruction set corresponding to the target application.
  • The electronic device can determine whether the instruction set version is the latest version by checking whether the detected version number is the latest version number. If it is, the instruction set version is the latest version; the control instruction set corresponding to the target application does not need to be updated, and the first control instructions corresponding to the target interactive interface can be extracted directly from it.
  • In some embodiments, the voice control method may further include: in response to detecting that the instruction set version is not the latest version, downloading, from the server, a to-be-updated control instruction set corresponding to the target application; replacing the control instruction set corresponding to the target application with the to-be-updated control instruction set; and extracting the first control instructions from the to-be-updated control instruction set.
  • If the electronic device determines that the version number is not the latest version number, it determines that the instruction set version is not the latest version.
  • In this case, the control instruction set corresponding to the target application needs to be updated, and the electronic device can send a control instruction set update request for the target application to the server, so that the server, in response to receiving the update request, returns the latest version of the control instruction set corresponding to the target application, that is, the to-be-updated control instruction set; in this way, the electronic device downloads the to-be-updated control instruction set from the server.
  • The electronic device then replaces the control instruction set corresponding to the target application with the to-be-updated control instruction set, that is, takes the to-be-updated control instruction set as the new control instruction set of the target application and deletes the outdated one, and then extracts the first control instructions corresponding to the target interactive interface from the new control instruction set.
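The version check and replacement can be sketched as below. `fetch_latest_version` and `download_set` stand in for the server interaction (the update request and its response) and are hypothetical, as is keeping versions in a plain dictionary.

```python
def ensure_latest_set(app, local_sets, local_versions, fetch_latest_version, download_set):
    """Return the app's control instruction set, updating it first if outdated."""
    latest = fetch_latest_version(app)
    if local_versions.get(app) != latest:
        # Not the latest version: download the to-be-updated set and replace the old one.
        local_sets[app] = download_set(app)
        local_versions[app] = latest
    return local_sets[app]


local_sets = {"music": {"music_main_305": ["play daily recommendation"]}}
local_versions = {"music": 3}
server_versions = {"music": 4}
server_sets = {"music": {"music_main_305": ["play daily recommendation", "open local music"]}}

current = ensure_latest_set("music", local_sets, local_versions,
                           server_versions.get, server_sets.get)
print(local_versions["music"], current["music_main_305"])
# 4 ['play daily recommendation', 'open local music']
```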
  • In this way, the control instructions of all static controls can be compiled in advance for each interactive interface of each application program, and the control instructions of all static controls of all installed application programs can then be pre-stored in the electronic device as static preset content, enabling the first control instructions of the target interactive interface to be loaded quickly.
  • In some embodiments, when the control instructions include second control instructions generated according to the dynamic control data in the interface control data, the electronic device may generate the second control instructions according to the dynamic control data.
  • In these embodiments, loading the target control instruction set corresponding to the target interactive interface may specifically include: processing the dynamic control data to generate the second control instructions.
  • A dynamic control is a control formed by filling control data into a field reserved for dynamic content.
  • FIG. 4 shows a schematic flowchart of a processing procedure for dynamic control data provided by an embodiment of the present disclosure.
  • As shown in FIG. 4, the processing procedure of the dynamic control data may include the following steps S410 to S430: S410, extracting the dynamic control text from the dynamic control data; S420, performing word segmentation on the dynamic control text to obtain a word segmentation set; S430, generating the second control instructions according to the word segmentation set.
  • In S410, each piece of static control data may belong to one static control, and each piece of dynamic control data may belong to one dynamic control.
  • The electronic device may extract, from the dynamic control data corresponding to the target interactive interface, the dynamic control text of the dynamic control to which the data belongs; the dynamic control text may be the control name of the dynamic control that is visible to the user.
  • Taking FIG. 3B as an example, the playlist links 311 are dynamic controls of the main application interface 305. For the playlist link 311 “Ambient Piano Music as Soul and Endless Void Dialogue”, the dynamic control text is “Ambient Piano Music as Soul and Endless Void Dialogue”.
  • In S420, the electronic device may perform word segmentation on the dynamic control text to obtain the word segmentation set corresponding to the dynamic control text, that is, the control text word segmentation set of the dynamic control.
  • Specifically, the electronic device can use any word segmentation algorithm to split the dynamic control text into multiple control text word segments, then combine any number of adjacent control text word segments into multiple word segment combinations, and finally obtain a word segmentation set for the dynamic control text that includes both the individual control text word segments and the word segment combinations.
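The segmentation-and-combination step can be sketched as follows. The sketch splits on whitespace as a stand-in for a real word segmenter (Chinese text would need a dedicated segmenter), and `word_segment_set` is a hypothetical name; every run of adjacent segments, including single segments, is added to the set.

```python
def word_segment_set(control_text):
    """Individual segments plus every combination of adjacent segments."""
    segments = control_text.split()  # stand-in for a real word segmentation algorithm
    result = set()
    for start in range(len(segments)):
        for end in range(start + 1, len(segments) + 1):
            result.add(" ".join(segments[start:end]))
    return result


print(sorted(word_segment_set("ambient piano music")))
# ['ambient', 'ambient piano', 'ambient piano music', 'music', 'piano', 'piano music']
```

With n segments the set holds n(n+1)/2 entries, so a user who says only part of a long playlist name, such as "piano music", can still match it.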
  • the method of combining multiple control text word segments and multiple word segment combinations to obtain a word segment set may include connecting multiple control text word segments and multiple word segment combinations with "
  • the content of the participle set conforming to the EBNF grammatical paradigm can be obtained, so that the generated second control instruction can be loaded into the language model of the Grammar engine.
  • the electronic device may generate the second control instruction according to the word segmentation set based on a preset control instruction generation manner.
  • S430 may specifically include: generating the second control instruction according to a preset verb set and the word segment set.
  • the electronic device can combine different pre-set verb sets and word segmentation sets to obtain multiple control instructions of the dynamic control, and the verb set in each control instruction includes multiple verbs with similar semantics.
  • The electronic device can also determine the control mode corresponding to each control instruction from the control action indicated by the verb set in the instruction and the control function of the dynamic control indicated by the word segment set.
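Generating second control instructions from verb sets and a word segment set amounts to a Cartesian product. The sketch below rests on several assumptions. The verb sets are hypothetical and keyed by action name. "The control function of the dynamic control" is modelled as a list of actions the control supports. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ControlInstruction:
    phrase: str      # utterance the recognizer should accept
    control_id: str  # dynamic control the instruction acts on
    action: str      # control mode implied by the verb set


# Hypothetical verb sets: each action maps to verbs with similar meaning.
VERB_SETS: Dict[str, List[str]] = {
    "open": ["open", "play", "start"],
    "favorite": ["favorite", "save", "bookmark"],
}


def generate_instructions(
    control_id: str,
    segment_set: Sequence[str],
    supported_actions: Sequence[str],
    verb_sets: Dict[str, List[str]] = VERB_SETS,
    joiner: str = " ",
) -> List[ControlInstruction]:
    """Pair every verb of each supported action with every segment of the control.

    `supported_actions` stands in for the control's function: a verb set is only
    combined with controls that can actually perform that action.
    """
    instructions: List[ControlInstruction] = []
    for action in supported_actions:
        for verb, segment in product(verb_sets[action], segment_set):
            instructions.append(
                ControlInstruction(f"{verb}{joiner}{segment}", control_id, action)
            )
    return instructions
```

For example, `generate_instructions("song_list_311", ["piano", "piano music"], ["open"])` yields six instructions, from "open piano" through "start piano music". Each one carries the action "open", which later tells the executor what to do with the control.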
  • the voice control method may further include: preprocessing the dynamic control text.
  • After extracting the dynamic control text, and before performing word segmentation on it to obtain the corresponding word segmentation set, the electronic device can also preprocess the dynamic control text to obtain a preprocessed dynamic control text.
  • The preprocessing may include symbol removal and numeral conversion.
  • Symbol removal eliminates symbols from the dynamic control text, such as punctuation marks, special symbols, mathematical symbols, and any other symbols without semantic meaning.
  • Numeral conversion converts Arabic numerals in the dynamic control text into Chinese numerals. If an Arabic number has multiple digits, the whole number can be converted into one Chinese numeral, or each digit can be converted into a Chinese numeral separately.
  • the electronic device may first remove symbols in the dynamic control text to obtain the dynamic control text after removing symbols. Then, the electronic device can convert the Arabic numerals in the dynamic control text after removing symbols into Chinese numerals.
  • For example, the Arabic numeral "200" can be converted as a whole into the Chinese numeral "two hundred", or digit by digit into "two zero zero", to obtain the numeral-converted text.
  • The resulting dynamic control text is "Chinese class representative's private collection cheat sheet 200 idiom song titles".
  • The electronic device can then perform word segmentation and word segment combination on the converted dynamic control text to obtain the word segment set, whose entries include the full text "Chinese class representative's private collection cheat sheet 200 idiom song titles".
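The two preprocessing steps can be sketched as follows. The sketch makes three assumptions. "Symbols" are taken to be every character outside the Unicode letter, number, and mark categories. Whole-number conversion handles only 0 to 9999, and larger numbers fall back to digit-by-digit conversion. Numbers are written with 二 rather than 两, and 10 to 19 as 十, 十一, and so on. The sample strings are made up for illustration.

```python
import re
import unicodedata

CN_DIGITS = "零一二三四五六七八九"
CN_UNITS = ["", "十", "百", "千"]


def remove_symbols(text: str) -> str:
    """Keep letters, numbers and combining marks; drop punctuation and symbols."""
    kept = []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N", "M"):
            kept.append(ch)
        elif ch.isspace():
            kept.append(" ")
    # Collapse any whitespace runs left behind by removed symbols.
    return " ".join("".join(kept).split())


def digits_to_chinese(number: str) -> str:
    """Digit-by-digit conversion, e.g. "200" -> "二零零"."""
    return "".join(CN_DIGITS[int(d)] for d in number)


def int_to_chinese(number: str) -> str:
    """Whole-number conversion, e.g. "205" -> "二百零五" (up to four digits)."""
    value = int(number)
    if value == 0:
        return "零"
    if value > 9999:
        # Beyond what this sketch handles; fall back to digit-by-digit.
        return digits_to_chinese(number)
    s = str(value)
    parts = []
    zero_pending = False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            zero_pending = True  # internal zeros collapse into a single 零
            continue
        if zero_pending:
            parts.append("零")
            zero_pending = False
        parts.append(CN_DIGITS[d] + CN_UNITS[len(s) - 1 - i])
    result = "".join(parts)
    # 10-19 are conventionally written 十, 十一, ... rather than 一十, 一十一, ...
    return result[1:] if result.startswith("一十") else result


def preprocess(text: str, whole_numbers: bool = True) -> str:
    """Symbol removal first, then Arabic-to-Chinese numeral conversion."""
    text = remove_symbols(text)
    convert = int_to_chinese if whole_numbers else digits_to_chinese
    return re.sub(r"[0-9]+", lambda m: convert(m.group()), text)
```

With these rules, `preprocess("歌单|200首!")` returns "歌单二百首", and `preprocess("歌单|200首!", whole_numbers=False)` returns "歌单二零零首". The two results correspond to the whole-number and digit-by-digit options described above.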
  • After displaying the target interaction interface, the electronic device can generate control instructions for all the dynamic controls in the target interaction interface (that is, the second control instructions) based on the data of each dynamic control, and then use these control instructions as dynamically loaded content, so that the second control instructions of the target interaction interface are loaded reliably and efficiently.
  • Fig. 5 shows a schematic flowchart of another voice control method provided by an embodiment of the present disclosure.
  • the voice control method may include the following steps S510 to S560.
  • an electronic device with a voice control function may display a target interaction interface, so that the user may perform voice control on the target interaction interface.
  • The electronic device may initialize the ASR engine, load a language model whose instruction content is empty, and then load the target control instruction set corresponding to the target interaction interface into the language model. While the ASR engine is being started and initialized and the target control instruction set is being loaded, the electronic device does not receive user voice.
  • the electronic device needs to load the first control instruction and the second control instruction in the target control instruction set into the language model.
  • The electronic device can first determine whether the application to which the preloaded language model belongs is the target application to which the target interaction interface belongs. If so, it loads the target control instruction set into the language model. If not, it reloads an empty language model corresponding to the target application and then loads the target control instruction set into the reloaded language model.
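The decision of whether to reuse or reload the language model, together with pausing voice input while instructions load, can be sketched as below. This is a hypothetical in-memory model, not an ASR engine API. `LanguageModel`, `InstructionLoader`, and the `accepting_audio` flag are illustrative names. The text also does not say whether each load replaces the previous interface's instructions; the sketch assumes it does.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageModel:
    app_id: str  # application the model was loaded for
    instructions: List[str] = field(default_factory=list)


class InstructionLoader:
    """Hypothetical loader that pauses voice input while instructions load."""

    def __init__(self) -> None:
        self.model: Optional[LanguageModel] = None
        self.accepting_audio = False

    def load_for_interface(self, target_app: str, instructions: List[str]) -> LanguageModel:
        self.accepting_audio = False  # no user voice is received while loading
        model = self.model
        if model is None or model.app_id != target_app:
            # Preloaded model belongs to another application: reload an empty one.
            model = LanguageModel(app_id=target_app)
            self.model = model
        # Assumption: the new interface's instruction set replaces the old one.
        model.instructions = list(instructions)
        self.accepting_audio = True  # loading done, voice input may resume
        return model
```

Switching between two interfaces of the same application reuses the existing model object. Switching to another application's interface creates a fresh, empty model before loading.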
  • The electronic device may first determine the target application to which the target interaction interface belongs. It then queries the control instruction set corresponding to the target application among multiple preset control instruction sets and determines whether the instruction set version of that control instruction set is the latest version. If it is the latest version, the control instruction set does not need to be updated, and the first control instruction corresponding to the target interaction interface can be extracted directly from it. If it is not the latest version, the control instruction set corresponding to the target application needs to be updated, and the control instruction set to be updated is downloaded from the server to replace it.
  • In that case, the first control instruction corresponding to the target interaction interface is extracted from the control instruction set to be updated. After acquiring the first control instruction, the electronic device can load it into the language model.
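The version check and update path for the first control instructions might look like the following sketch. It assumes integer instruction set versions. It also injects two hypothetical callables: `latest_version`, which asks the server for the newest version, and `download`, which fetches the set to be updated. This keeps the undescribed server protocol out of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InstructionSet:
    version: int  # assumption: versions are comparable integers
    # interface id -> first (static) control instructions of that interface
    first_instructions: Dict[str, List[str]] = field(default_factory=dict)


def get_first_instructions(
    app_id: str,
    interface_id: str,
    local_sets: Dict[str, InstructionSet],
    latest_version: Callable[[str], int],
    download: Callable[[str], InstructionSet],
) -> List[str]:
    current = local_sets[app_id]  # preset set of the target application
    if current.version < latest_version(app_id):
        # Not the latest version: download the set to be updated and replace the local copy.
        current = download(app_id)
        local_sets[app_id] = current
    return current.first_instructions.get(interface_id, [])
```

Once the local copy is up to date, later calls skip the download entirely, so the server is only contacted for a version check.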
  • the electronic device may acquire dynamic control data corresponding to all dynamic controls in the target interaction interface.
  • The electronic device can extract the dynamic control text of each dynamic control from the dynamic control data. It removes the symbols in the dynamic control text and converts its Arabic numerals into Chinese numerals to obtain the preprocessed dynamic control text. It then performs word segmentation on the preprocessed text to obtain the corresponding word segmentation set, and finally generates the second control instruction according to the preset verb set and the word segmentation set.
  • the second control instruction may be loaded into the language model.
  • The electronic device may also convert the first control instruction and the second control instruction into binary code and load them into the language model.
  • The electronic device may wait for the user to input voice. When the start of human voice is detected based on Voice Activity Detection (VAD), recording starts and continues; when the end of human voice is detected based on VAD, recording stops. The electronic device can use the recorded audio as the user control voice and input it into the ASR engine to obtain the target voice text corresponding to the user control voice.
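The text does not say which VAD algorithm is used. As a stand-in, the sketch below uses a simple frame-energy threshold. Recording starts at the first frame above the threshold and stops after a run of quiet frames. The threshold and the number of quiet frames are arbitrary assumptions, and production systems generally use more robust detectors.

```python
from typing import Iterable, List, Sequence

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / max(len(frame), 1)


def capture_utterance(
    frames: Iterable[Frame],
    threshold: float = 0.01,
    max_silence_frames: int = 3,
) -> List[Frame]:
    """Record from the first voiced frame until enough quiet frames follow."""
    recorded: List[Frame] = []
    speaking = False
    silence_run = 0
    for frame in frames:
        is_speech = frame_energy(frame) > threshold
        if not speaking:
            if is_speech:
                speaking = True  # start of human voice: begin recording
                recorded.append(frame)
            continue
        recorded.append(frame)
        if is_speech:
            silence_run = 0
        else:
            silence_run += 1
            if silence_run >= max_silence_frames:
                break  # end of human voice: stop recording
    # Drop the trailing quiet frames that only served to detect the end.
    if silence_run:
        recorded = recorded[:-silence_run]
    return recorded
```

A pause shorter than `max_silence_frames` stays inside the recording, so brief gaps between words do not end the utterance.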
  • the electronic device may search the target control command set for the target control command matching the target voice text.
  • The electronic device may determine whether the target control instruction is found. If it is found, S560 is executed; otherwise, the process returns to S530.
  • The electronic device may perform, according to the target control mode indicated by the target control instruction, a control operation on the target interface control whose control data was used to generate the target control instruction.
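The find-or-retry loop can be sketched as a lookup over successive recognition results. The sketch assumes that the target voice text must match an instruction phrase exactly after lower-casing and whitespace normalization. `utterances` stands in for successive ASR results, and `execute` is a hypothetical callback that performs the control operation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class TargetInstruction:
    control_id: str  # interface control whose data generated the instruction
    action: str      # control mode, e.g. "open"


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial ASR variations still match."""
    return " ".join(text.split()).lower()


def build_index(phrases: Dict[str, TargetInstruction]) -> Dict[str, TargetInstruction]:
    return {normalize(p): instr for p, instr in phrases.items()}


def run_voice_loop(
    utterances: Iterable[str],
    index: Dict[str, TargetInstruction],
    execute: Callable[[TargetInstruction], None],
) -> Optional[TargetInstruction]:
    for text in utterances:  # each item: target voice text from one recording
        match = index.get(normalize(text))  # query the target control instruction set
        if match is None:
            continue  # not found: go back to waiting for voice (S530)
        execute(match)  # found: operate on the target interface control (S560)
        return match
    return None
```

A dictionary keyed by normalized phrase keeps each lookup constant-time. This is why even an instruction set with thousands of phrases per interface can be searched quickly.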
  • Because control instructions are generated from verb sets and word segment sets, a single interface can support voice control with utterances on the order of thousands. The control instructions include static control instructions generated from the static controls of the interface and dynamic control instructions generated from the dynamic controls of the interaction interface. Therefore, while a sufficiently large grammar is supported, the control instructions can also be extended arbitrarily to achieve the "what you see is what you can say" effect for the interaction interface.
  • The loading of control instructions and the recognition of the user's control voice are independent of each other and do not interfere with each other, which can improve recognition accuracy.
  • both the ASR engine and the Grammar engine are offline engines, which can run on the terminal side (that is, run in the electronic device) without relying on the network.
  • The engine models are small and require little computing power, so the control instructions that the interaction interface needs to support can be responded to faster (on average, about 1.2 s faster than cloud recognition and about 500 ms faster than offline general-purpose recognition), which brings greater benefits in vehicle application scenarios.
  • Fig. 6 shows a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • the apparatus shown in FIG. 6 may be applied to electronic equipment.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, and other devices with a voice control function.
  • The voice control device 600 may include an interface display module 610, an instruction loading module 620, an instruction matching module 630, and an instruction execution module 640.
  • the interface display module 610 may be configured to display a target interaction interface.
  • the instruction loading module 620 may be configured to load a target control instruction set corresponding to the target interactive interface, where the target control instruction set includes control instructions generated according to interface control data of the target interactive interface.
  • the instruction matching module 630 may be configured to, in response to receiving the user control speech, query the target control instruction set for the target control instruction matching the user control speech.
  • the instruction executing module 640 may be configured to execute the target control operation corresponding to the target control instruction in response to querying the target control instruction.
  • After the target interaction interface is displayed, the target control instruction set corresponding to it can be loaded. When user control voice is received, the target control instruction set is queried for a target control instruction matching that voice, and the found target control instruction is executed, realizing the user's voice control of the target interaction interface.
  • The loaded target control instruction set includes control instructions generated from the interface control data of the target interaction interface, and this interface control data can cover all interface controls in the target interaction interface. Full voice control of the target interaction interface can therefore be achieved, realizing the "what you see is what you can say" effect for the target interaction interface and improving the user experience.
  • The control instructions may include first control instructions generated from the static control data in the interface control data.
  • The instruction loading module 620 may further include an application determination unit, a first query unit, and a first extraction unit.
  • the application determining unit may be configured to determine the target application to which the target interactive interface belongs.
  • The first query unit may be configured to query, among multiple pre-stored control instruction sets, the control instruction set corresponding to the target application.
  • The first extraction unit may be configured to extract the first control instruction from the control instruction set corresponding to the target application.
  • The instruction loading module 620 may further include a version detection unit, which may be configured to detect the instruction set version of the control instruction set corresponding to the target application before the first control instruction is extracted from that set.
  • the first extraction unit may be further configured to extract the first control instruction from the target control instruction set when the version detection unit detects that the instruction set version is the latest version.
  • The instruction loading module 620 may further include an instruction set downloading unit, a first processing unit, and a second extraction unit.
  • The instruction set downloading unit may be configured to download, from the server, the control instruction set to be updated corresponding to the target application. It does so if, after the instruction set version of the control instruction set corresponding to the target application is detected, that version is found not to be the latest version.
  • the first processing unit may be configured to replace the target control instruction set with the control instruction set to be updated.
  • the second extracting unit may be configured to extract the first control instruction from the set of control instructions to be updated.
  • The control instructions may include second control instructions generated from the dynamic control data in the interface control data.
  • The instruction loading module 620 may further include a third extraction unit, a second processing unit, and an instruction generation unit.
  • the third extracting unit may be configured to extract dynamic control text from the dynamic control data.
  • the second processing unit may be configured to perform word segmentation processing on the dynamic control text to obtain a word segmentation set corresponding to the dynamic control text.
  • the instruction generation unit may be configured to generate a second control instruction according to the word segmentation set.
  • The instruction loading module 620 may further include a third processing unit. This unit may be configured to preprocess the dynamic control text before word segmentation is performed on it to obtain the corresponding word segmentation set, where the preprocessing includes symbol removal and numeral conversion.
  • The instruction generation unit may be further configured to generate the second control instruction according to a preset verb set and the word segmentation set.
  • The instruction matching module 630 may include a text conversion unit and a second query unit.
  • the text converting unit may be configured to convert the user control speech into target speech text.
  • The second query unit may be configured to query the target control instruction set for the target control instruction matching the target voice text.
  • the instruction executing module 640 may be further configured to execute a target control operation for the target interface control involved in the target control command.
  • The voice control device 600 shown in FIG. 6 can execute each step of the method embodiments shown in FIG. 1 to FIG. 5 and achieve the corresponding processes and effects, which are not repeated here.
  • Fig. 7 shows a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • the voice control device shown in FIG. 7 may be an electronic device.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, and other devices with a voice control function.
  • the voice control device may include a processor 701 and a memory 702 storing computer program instructions.
  • The processor 701 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
  • Memory 702 may include mass storage for information or instructions.
  • The memory 702 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these.
  • The memory 702 may include removable or non-removable (or fixed) media, where appropriate.
  • The memory 702 may be internal or external to the voice control device, where appropriate.
  • In some embodiments, the memory 702 is non-volatile solid-state memory.
  • In some embodiments, the memory 702 includes read-only memory (ROM).
  • The ROM can be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
  • the processor 701 reads and executes the computer program instructions stored in the memory 702 to execute the steps of the voice control method provided by the embodiments of the present disclosure.
  • The voice control device may further include a transceiver 703 and a bus 704.
  • The processor 701, the memory 702, and the transceiver 703 are connected through the bus 704 and communicate with one another.
  • Bus 704 includes hardware, software, or both.
  • The bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these.
  • Bus 704 may comprise one or more buses, where appropriate.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium can store a computer program, and when the computer program is executed by the processor, the processor implements the voice control method provided by the embodiment of the present disclosure.
  • The above storage medium may include, for example, the memory 702 storing computer program instructions, which can be executed by the processor 701 of the voice control device to complete the voice control method provided by the embodiments of the present disclosure.
  • The storage medium may be a non-transitory computer-readable storage medium, for example ROM, random access memory (RAM), compact disc read-only memory (CD-ROM), magnetic tape, a floppy disk, or an optical data storage device.
  • An embodiment of the present disclosure also provides a computer program product including a computer program or instructions which, when executed by a processor, implement the voice control method described in the first aspect.

Abstract

A speech control method, apparatus and device, and a medium. The method comprises: displaying a target interaction interface (S110); loading a target control instruction set corresponding to the target interaction interface, wherein the target control instruction set comprises control instructions that are generated according to interface control data of the target interaction interface (S120); when user control speech is received, querying the target control instruction set for a target control instruction that matches the user control speech (S130); and if the target control instruction is found, executing a target control operation corresponding to the target control instruction (S140).

Description

Voice control method, apparatus, device and medium
Cross-Reference to Related Applications
This application is based on Chinese patent application No. 202111084298.5, filed on September 14, 2021, and claims priority to that application, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of speech recognition, and in particular to a voice control method, apparatus, device and medium.
Background
With continuous technological progress, more and more application scenarios have introduced electronic devices with voice control functions, for example vehicle-mounted terminals with voice control in vehicles.
Generally, some control instructions corresponding to each interactive interface are pre-stored in the electronic device, and the user can operate each interactive interface of the electronic device by speaking these control instructions. However, because the number of pre-stored control instructions is limited, the user cannot achieve full voice control of each interactive interface with them.
Summary
To solve the above technical problem, or at least partly solve it, the present disclosure provides a voice control method, apparatus, device and medium.
In a first aspect, the present disclosure provides a voice control method, including:
displaying a target interaction interface;
loading a target control instruction set corresponding to the target interaction interface, the target control instruction set including control instructions generated according to interface control data of the target interaction interface;
in response to receiving user control voice, querying the target control instruction set for a target control instruction matching the user control voice; and
in response to finding the target control instruction, executing a target control operation corresponding to the target control instruction.
In a second aspect, the present disclosure provides a voice control apparatus, including:
an interface display module configured to display a target interaction interface;
an instruction loading module configured to load a target control instruction set corresponding to the target interaction interface, the target control instruction set including control instructions generated according to interface control data of the target interaction interface;
an instruction matching module configured to, in response to receiving user control voice, query the target control instruction set for a target control instruction matching the user control voice; and
an instruction execution module configured to, in response to finding the target control instruction, execute the target control operation corresponding to the target control instruction.
In a third aspect, the present disclosure provides a voice control device, including:
a processor; and
a memory configured to store executable instructions;
wherein the processor is configured to read the executable instructions from the memory and execute them to implement the voice control method described in the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the voice control method described in the first aspect.
In a fifth aspect, the present disclosure provides a computer program product including a computer program or instructions which, when executed by a processor, implement the voice control method described in the first aspect.
With the voice control method, apparatus, device and medium of the embodiments of the present disclosure, the target control instruction set corresponding to the target interaction interface can be loaded after that interface is displayed. When user control voice is received, the target control instruction set is queried for a target control instruction matching that voice, and the found instruction is executed, realizing the user's voice control of the target interaction interface. The loaded target control instruction set includes control instructions generated from the interface control data of the target interaction interface, and this data can cover all interface controls in that interface. Full voice control of the target interaction interface can therefore be achieved, realizing the "what you see is what you can say" (Display Can be Said, DCS) effect.
Brief Description of the Drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a voice control method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of an application interface of another vehicle-mounted terminal provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a processing procedure of dynamic control data provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be executed in different orders and/or in parallel. In addition, method embodiments may include additional steps and/or omit illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.
It should be noted that the modifiers "a/one" and "multiple" in this disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
As technology advances, electronic devices with voice control functions are being introduced into more and more application scenarios.
Typically, an electronic device pre-stores a number of control instructions for each interaction interface. The user operates each interaction interface by speaking these control instructions.
For example, in wake-up-based voice control, a control instruction may be a wake-up word. Each interaction interface registers a fixed number of wake-up words, and the user meets their voice control needs on that interface by speaking these words.
However, the control instructions must be set in advance, which means that the wake-up words for every interaction interface must be designed beforehand. If the content of an interaction interface is loaded dynamically, the number of pre-stored control instructions is limited. As a result, the user cannot fully control every interaction interface by voice.
In addition, wake-up models built from wake-up words are usually small and cannot support many complex wake-up words in a single scenario. Registering too many wake-up words on one interaction interface also causes the wake-up model to produce false wake-ups.
To solve the above problems, the embodiments of the present disclosure provide a voice control method, apparatus, device, and medium that realize "what you see is what you can say", meaning that any control visible on the screen can be operated by voice.
The voice control method provided by the embodiments of the present disclosure is first described below with reference to FIG. 1 to FIG. 5.
In the embodiments of the present disclosure, the voice control method may be executed by an electronic device. The electronic device may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a wearable electronic device, a smart home device, or any other device with a voice control function.
FIG. 1 shows a schematic flowchart of a voice control method provided by an embodiment of the present disclosure.
As shown in FIG. 1, the voice control method may include the following steps S110 to S140.
S110: Display a target interaction interface.
In the embodiments of the present disclosure, the target interaction interface may be an interface visually presented on the display screen of the electronic device.
In some embodiments, at least one interface control may be displayed in the target interaction interface. An interface control may be any element the user can operate, such as a button, option, icon, or link in the interface.
In some embodiments, the target interaction interface may include the main interface displayed when the electronic device is powered on or in a standby state.
In some embodiments, the interface controls displayed on the target interaction interface may be application icons.
FIG. 2 shows a schematic diagram of the main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
As shown in FIG. 2, the vehicle-mounted terminal may display a main interface 201 containing multiple interface controls. Examples include a "Settings application" icon 202, a "Files application" icon 203, a "Browser application" icon 204, and a "Music application" icon 205.
In other embodiments, the target interaction interface may include the application interface of any application installed on the electronic device.
In some embodiments, the interface controls displayed in the target interaction interface may be buttons, options, icons, links, and the like in the application interface.
FIG. 3A shows a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
As shown in FIG. 3A, the vehicle-mounted terminal may display the main application interface 301 of the Settings application. Multiple interface controls may be displayed in it, such as a "My Device" button 302, a "Control Center" button 303, and a "More Settings" button 304.
FIG. 3B shows a schematic diagram of another application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
As shown in FIG. 3B, the vehicle-mounted terminal may display the main application interface 305 of a music application. Multiple interface controls may be displayed in it, such as:
- a "Daily Recommendations" option 306;
- a "Playlists" option 307;
- a "Local Music" option 308;
- a "Settings" option 309;
- various playback control buttons 310;
- individual playlist links 311;
- a "Back" button 312.
S120: Load a target control instruction set corresponding to the target interaction interface. The target control instruction set includes control instructions generated from the interface control data of the target interaction interface.
In the embodiments of the present disclosure, after the electronic device displays the target interaction interface, it may load the corresponding target control instruction set. Every control instruction in this set is generated from the interface control data of the target interaction interface. The interface control data may include the control data of all interface controls, so every interface control in the target interaction interface has corresponding control instructions.
In some embodiments, the target control instruction set may be the collection of control instructions corresponding to the individual interface controls in the target interaction interface.
In some embodiments, the control instructions may include first control instructions generated from the static control data in the interface control data. A first control instruction is a control instruction for a static control.
The static control data may be the control data of the static controls in the target interaction interface. A static control is an interface control that is always displayed in a fixed manner; that is, it does not change with user preferences or settings.
When the target interaction interface is the main interface, the static controls may be interface controls that ship with the device from the factory. They are neither dynamically updated nor changeable by the user.
Continuing with FIG. 2, the Settings, Files, and Browser applications are built into the main interface 201 at the factory. Their "Settings application" icon 202, "Files application" icon 203, and "Browser application" icon 204 are therefore static controls of the main interface 201. The control instructions generated from the control data of icons 202, 203, and 204 are the first control instructions of the main interface 201.
When the target interaction interface is an application interface, the static controls may be interface controls that are displayed at fixed positions within the interface framework and do not change with user preferences. An example is a built-in resource preset in the interface project. Such built-in resources can be known in advance, before the content of the application interface is pushed to the user.
Continuing with FIG. 3A, My Device, Control Center, and More Settings are fixed settings functions of the main application interface 301 of the Settings application. The corresponding "My Device" button 302, "Control Center" button 303, and "More Settings" button 304 are therefore static controls of the main application interface 301. The control instructions generated from the control data of buttons 302, 303, and 304 are the first control instructions of the main application interface 301.
Continuing with FIG. 3B, Daily Recommendations, Playlists, Local Music, and Settings are fixed modules of the main application interface 305 of the music application. Back and the playback controls are fixed interface functions of the same interface 305. The following options and buttons are therefore static controls of the main application interface 305: the "Daily Recommendations" option 306, the "Playlists" option 307, the "Local Music" option 308, the "Settings" option 309, the playback control buttons 310, and the "Back" button 312. The control instructions generated from their control data are the first control instructions of the main application interface 305.
In other embodiments, the control instructions may also include second control instructions generated from the dynamic control data in the interface control data. A second control instruction is a control instruction for a dynamic control.
The dynamic control data may be the control data of the dynamic controls in the target interaction interface. A dynamic control is an interface control that can be dynamically updated or that changes with user preferences or settings.
When the target interaction interface is the main interface, the dynamic controls may be interface controls added by the user.
Continuing with FIG. 2, the music application is an application the user can download on their own. Its "Music application" icon 205 is therefore a dynamic control of the main interface 201. The control instructions generated from the control data of icon 205 are the second control instructions of the main interface 201.
When the target interaction interface is an application interface, the dynamic controls may be interface controls that are dynamically updated within the interface framework. An example is a resource filled in after retrieval from a network source.
Continuing with FIG. 3B, the playlist names shown under the "Playlists" option 307 tab can be updated dynamically. The corresponding playlist links 311 are therefore dynamic controls of the main application interface 305. The control instructions generated from the control data of each playlist link 311 are the second control instructions of the main application interface 305.
It should be noted that when the target interaction interface is an application interface, the dynamic controls may also be interface controls within the interface framework that change with user preferences. This case is not described further here.
In the embodiments of the present disclosure, the loaded target control instruction set can control both the static controls and the dynamic controls of the target interaction interface. By loading this set, the electronic device ensures that every interface control in the target interaction interface has corresponding control instructions. The user can therefore control the target interaction interface by voice more comprehensively.
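As a rough illustration of S120, the sketch below models a target control instruction set as the union of first (static) and second (dynamic) control instructions. It also reports any displayed control that has no instruction. The `ControlInstruction` fields, the default control mode, and the coverage check are assumptions made for illustration, not the data format of the disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class ControlInstruction:
    """Hypothetical record for one voice control instruction."""
    control_id: str                 # interface control the instruction targets
    verbs: FrozenSet[str]           # verbs with similar meanings, e.g. {"open", "enter"}
    control_tokens: FrozenSet[str]  # tokens taken from the visible control text
    control_mode: str = "click"     # how the control is operated (assumed default)
    is_static: bool = True          # True: first control instruction; False: second


def load_target_instruction_set(
    displayed_controls: Sequence[str],
    first_instructions: Sequence[ControlInstruction],
    second_instructions: Sequence[ControlInstruction],
) -> Tuple[List[ControlInstruction], List[str]]:
    """Combine static and dynamic instructions; list controls left without one."""
    instruction_set = list(first_instructions) + list(second_instructions)
    covered = {instr.control_id for instr in instruction_set}
    uncovered = [c for c in displayed_controls if c not in covered]
    return instruction_set, uncovered


# FIG. 2: icons 202-204 are static controls; icon 205 is a dynamic control.
first = [ControlInstruction(cid, frozenset({"open"}), frozenset({name}))
         for cid, name in [("202", "settings"), ("203", "files"), ("204", "browser")]]
second = [ControlInstruction("205", frozenset({"open"}), frozenset({"music"}), is_static=False)]
instruction_set, uncovered = load_target_instruction_set(
    ["202", "203", "204", "205"], first, second)
```

An empty `uncovered` list corresponds to the property the text describes: every control on the interface has at least one control instruction.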
S130: In response to receiving a user control voice, query the target control instruction set for a target control instruction that matches the user control voice.
In the embodiments of the present disclosure, after loading the target control instruction set, the electronic device may listen for user control voice. When it receives one, it searches the control instructions in the target control instruction set for a target control instruction that matches it.
In some embodiments, S130 may specifically include the following:
- converting the user control voice into a target speech text;
- querying the target control instruction set for a target control instruction that matches the target speech text.
In some embodiments, the electronic device may input the user control voice into an offline automatic speech recognition (ASR) engine and obtain the target speech text output by the engine. It then searches the control instructions in the target control instruction set for one that matches the target speech text.
A target control instruction matches the target speech text if either of the following holds:
- the target speech text contains any verb and any control text token of the target control instruction; or
- the verb in the target speech text is identical to one of the verbs of the target control instruction, and the similarity between the noun in the target speech text and any control text token of the target control instruction is greater than or equal to a preset similarity threshold.
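The two matching conditions can be sketched as follows. The disclosure does not name a similarity measure, so `difflib.SequenceMatcher.ratio()` stands in for it. The threshold of 0.75 and the dictionary layout of an instruction are also assumptions. The `speech_verb` and `speech_noun` arguments are assumed to come from a word segmenter that is not shown. Plain substring containment is used, which also works for unsegmented Chinese text.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Optional


def token_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the actual measure is unspecified."""
    return SequenceMatcher(None, a, b).ratio()


def matches(speech_text: str, verbs: Iterable[str], control_tokens: Iterable[str],
            speech_verb: Optional[str] = None, speech_noun: Optional[str] = None,
            threshold: float = 0.75) -> bool:
    verbs, control_tokens = list(verbs), list(control_tokens)
    # Condition 1: the text contains some verb and some control text token.
    if any(v in speech_text for v in verbs) and any(t in speech_text for t in control_tokens):
        return True
    # Condition 2: identical verb, and the noun is similar enough to some token.
    if speech_verb in verbs and speech_noun:
        best = max((token_similarity(speech_noun, t) for t in control_tokens), default=0.0)
        return best >= threshold
    return False


def query_target_instruction(speech_text: str, instruction_set: List[dict],
                             speech_verb: Optional[str] = None,
                             speech_noun: Optional[str] = None) -> Optional[dict]:
    """S130: return the first matching instruction, or None if nothing matches."""
    for instr in instruction_set:
        if matches(speech_text, instr["verbs"], instr["control_tokens"],
                   speech_verb, speech_noun):
            return instr
    return None


instruction_set = [{"control_id": "205", "verbs": ["open", "enter"],
                    "control_tokens": ["music", "music app"]}]
```

For example, "open musik" (a misrecognition) fails condition 1. It then matches through condition 2, because "open" is a listed verb and "musik" is similar enough to "music".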
Thus, by querying the target control instruction set for the instruction that matches the user control voice, the electronic device can determine the user's voice control intention.
S140: In response to finding the target control instruction, execute the target control operation corresponding to the target control instruction.
In the embodiments of the present disclosure, if the electronic device finds a target control instruction matching the user control voice, it may execute the corresponding target control operation. If it finds none, it may continue listening and wait for the next user control voice.
In some embodiments, S140 may specifically include: performing the target control operation on the target interface control involved in the target control instruction.
Each control instruction is generated from the control data of its interface control. Each control instruction can therefore trigger the target control operation on the target interface control it involves, that is, on the interface control whose control data generated the instruction.
Further, the target control operation may be a control operation performed in the target control mode indicated by the target control instruction.
In some embodiments, after finding the target control instruction, the electronic device may operate the target interface control whose control data generated the instruction. The operation is performed in the target control mode indicated by the target control instruction.
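S140 can be pictured as a dispatch from an instruction to the control it was generated from, using the control mode the instruction names. In the minimal sketch below, the registry that maps control IDs to per-mode actions is a hypothetical stand-in for the device's UI layer. As in the text, a query with no match leads to no operation, and the device keeps listening.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: control id -> {control mode -> action}.
Controls = Dict[str, Dict[str, Callable[[], str]]]


def execute_target_operation(instruction: dict, controls: Controls) -> str:
    """Operate the control the instruction was generated from, in its control mode."""
    modes = controls[instruction["control_id"]]
    action = modes.get(instruction["control_mode"])
    if action is None:
        raise ValueError(f"control {instruction['control_id']!r} "
                         f"has no mode {instruction['control_mode']!r}")
    return action()


def handle_query_result(instruction: Optional[dict], controls: Controls) -> Optional[str]:
    """Execute on a match; otherwise return None so the caller keeps listening."""
    if instruction is None:
        return None
    return execute_target_operation(instruction, controls)


controls: Controls = {
    "205": {"click": lambda: "opened music application"},
    "306": {"click": lambda: "playing daily recommendations"},
}
```

Raising an error for an unsupported mode is a design choice of this sketch. A real device might instead ignore the instruction or prompt the user.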
In the embodiments of the present disclosure, after S140 the electronic device may either enter a new interaction interface or remain in the target interaction interface.
In some embodiments, if the electronic device remains in the target interaction interface, it does not need to reload a control instruction set. It can continue to provide voice control of the target interaction interface based on the current target control instruction set.
In other embodiments, if the electronic device enters a new interaction interface, it needs to load the control instruction set for that interface. Voice control of the new interface, which becomes the target interaction interface, is then based on the newly loaded set.
Continuing with FIG. 2, suppose the user has the electronic device perform the control operation "open the music application". The device then enters the main application interface of the music application. After jumping there from the main interface 201, the device needs to load the control instruction set of the music application's main application interface. The user's voice control of that interface is then based on the newly loaded set.
Continuing with FIG. 3B, suppose the user has the electronic device perform the control operation "play daily recommendations". The device can play the daily recommended songs directly within the main application interface 305 of the music application, without jumping to another interface. No reload is needed, and voice control of the main application interface 305 continues to use its existing control instruction set.
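The reload rule above reduces to a simple condition: reload only when the operation changes the displayed interface. The sketch below tracks this with a small session object. The `loader` callable and the interface identifiers are hypothetical placeholders for the device's actual loading mechanism.

```python
from typing import Callable, List


class VoiceSession:
    """Keeps the control instruction set in sync with the displayed interface."""

    def __init__(self, loader: Callable[[str], List[dict]], interface_id: str):
        self._loader = loader  # hypothetical: interface id -> control instruction set
        self.interface_id = interface_id
        self.instruction_set = loader(interface_id)

    def after_operation(self, resulting_interface_id: str) -> bool:
        """Call after S140; reload only if the operation changed the interface."""
        if resulting_interface_id == self.interface_id:
            return False  # stayed on the same interface: keep the current set
        self.interface_id = resulting_interface_id
        self.instruction_set = self._loader(resulting_interface_id)
        return True


load_calls: List[str] = []


def fake_loader(interface_id: str) -> List[dict]:
    load_calls.append(interface_id)
    return [{"interface": interface_id}]


session = VoiceSession(fake_loader, "main_201")
reloaded_on_open = session.after_operation("music_main_305")  # "open the music application"
reloaded_on_play = session.after_operation("music_main_305")  # "play daily recommendations"
```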
In the embodiments of the present disclosure, the flow is as follows:
- After the target interaction interface is displayed, its target control instruction set is loaded.
- When a user control voice is received, the set is queried for a matching target control instruction.
- The instruction found is executed, realizing the user's voice control of the target interaction interface.
The loaded target control instruction set includes control instructions generated from the interface control data of the target interaction interface, and that data can cover all interface controls in the interface. The target interaction interface can therefore be fully controlled by voice. This achieves the "what you see is what you can say" effect (written as DCS in the original text) on the target interaction interface and improves the user experience.
In another implementation of the present disclosure, the control instructions include first control instructions generated from the static control data in the interface control data. In this case, the electronic device may directly obtain pre-generated first control instructions.
In some embodiments, S120 may specifically include the following:
- determining the target application to which the target interaction interface belongs;
- querying a plurality of pre-stored preset control instruction sets for the control instruction set corresponding to the target application;
- extracting the first control instructions from that control instruction set.
In the embodiments of the present disclosure, the electronic device may pre-store a plurality of preset control instruction sets, each corresponding to one application. Each preset control instruction set may contain the control instructions of all static controls of its application.
Further, the target application may be the application to which the target interaction interface belongs. The electronic device may take the application that must run to display the target interaction interface as the target application.
In some embodiments, the electronic device proceeds as follows:
- It takes the application that must run to display the target interaction interface as the target application.
- It queries the preset control instruction sets for the target application's control instruction set.
- It extracts from that set the first control instructions of the target interaction interface, which may include the control instructions of all static controls involved in the interface.
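A minimal sketch of the static part of S120 follows. It assumes, hypothetically, that the preset control instruction sets are stored as a mapping from application name to a versioned record holding per-interface instruction lists. The disclosure does not prescribe this layout.

```python
from typing import Dict, List

# Hypothetical layout of the pre-stored preset control instruction sets:
# application -> {"version": int, "interfaces": {interface id -> [instructions]}}
PresetSets = Dict[str, dict]


def extract_first_instructions(preset_sets: PresetSets, running_app: str,
                               interface_id: str) -> List[dict]:
    """Find the running application's set and pull out one interface's instructions."""
    app_set = preset_sets.get(running_app)  # target application = app showing the interface
    if app_set is None:
        return []  # no preset set is stored for this application
    return list(app_set["interfaces"].get(interface_id, []))


preset_sets: PresetSets = {
    "settings": {"version": 2, "interfaces": {
        "main_301": [{"control_id": "302"}, {"control_id": "303"}, {"control_id": "304"}]}},
    "music": {"version": 5, "interfaces": {
        "main_305": [{"control_id": "306"}, {"control_id": "307"}]}},
}
```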
In the embodiments of the present disclosure, before S110 the electronic device may first receive the preset control instruction sets sent by a server.
In some embodiments, for each application, the server may receive from a developer the following:
- the control instructions of all static controls of each interaction interface of the application;
- the control mode of each control instruction.
Each static control's control instruction contains a verb set and the control text token set of that static control. The developer extracts the control text token set from the static control text in that control's control data (the static control data). The static control text may be the name of the static control as seen by the user. The verb set in the control instruction contains multiple verbs with similar meanings.
In other embodiments, the server processes each static control in each interaction interface of each application. It may extract the control text token set from the static control text in that control's static control data. It then combines different preset verb sets with this token set to obtain multiple control instructions for the static control. The verb set of each resulting control instruction contains multiple verbs with similar meanings. For each of these control instructions, the server may also determine the control mode from two inputs: the operation expressed by the instruction's verb set, and the control function of the static control that the token set belongs to.
Within a control text token set, the tokens may be joined by "|". Likewise, the verbs within a verb set may be joined by "|".
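The server-side combination step can be sketched as pairing each preset verb set with the control's token set. Both sets are serialized with "|" as described. The operation names, the `control_function` label, and the `"operation:function"` encoding of the control mode are all assumptions of this sketch. Token extraction from the control text is not shown.

```python
from typing import Dict, List, Sequence


def generate_static_instructions(control_id: str, control_function: str,
                                 control_tokens: Sequence[str],
                                 verb_sets: Dict[str, Sequence[str]]) -> List[dict]:
    """Pair every preset verb set with the control's token set.

    verb_sets maps an operation name (assumed, e.g. "open") to verbs with similar
    meanings. The control mode is derived, as an assumption, from that operation
    plus the control's function.
    """
    token_field = "|".join(control_tokens)
    instructions = []
    for operation, verbs in verb_sets.items():
        instructions.append({
            "control_id": control_id,
            "verbs": "|".join(verbs),
            "control_tokens": token_field,
            "control_mode": f"{operation}:{control_function}",
        })
    return instructions


music_icon = generate_static_instructions(
    "205", "launch_app", ["music", "music app", "music icon"],
    {"open": ["open", "enter", "launch"], "close": ["close", "exit"]},
)
```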
As a result, in the embodiments of the present disclosure, the token sets conform to Extended Backus-Naur Form (EBNF) syntax. The first control instructions can therefore be loaded into the language model of a grammar engine.
Continuing with FIG. 2, consider the control instructions of the "Music application" icon 205. For a control instruction that opens the music application:
- the verb set may be "打开|开|进入|进|点|点击" (roughly "open | open | enter | go in | tap | click");
- the control text token set may be "音乐|音乐应用|音乐的|音乐图标" ("music | music application | music's | music icon").
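To show how a "|"-joined verb set and token set become a grammar rule, the sketch below renders them in ISO/IEC 14977 EBNF notation. It also compiles an equivalent regular expression for offline checks. The exact grammar format accepted by the grammar engine is not stated in the source, so the notation and the rule name `open_music` are assumptions.

```python
import re
from typing import Sequence


def to_ebnf_rule(rule_name: str, verbs: Sequence[str], tokens: Sequence[str]) -> str:
    """Render 'verb followed by control token' as an ISO 14977-style EBNF rule."""
    def alternatives(items: Sequence[str]) -> str:
        return "( " + " | ".join(f'"{item}"' for item in items) + " )"
    return f"{rule_name} = {alternatives(verbs)} , {alternatives(tokens)} ;"


def to_regex(verbs: Sequence[str], tokens: Sequence[str]) -> "re.Pattern[str]":
    """Equivalent regular expression, handy for checking the rule offline."""
    verb_alt = "|".join(map(re.escape, verbs))
    token_alt = "|".join(map(re.escape, tokens))
    return re.compile(f"(?:{verb_alt})(?:{token_alt})")


verbs = "打开|开|进入|进|点|点击".split("|")
tokens = "音乐|音乐应用|音乐的|音乐图标".split("|")
rule = to_ebnf_rule("open_music", verbs, tokens)
pattern = to_regex(verbs, tokens)
```

Here `rule` is `open_music = ( "打开" | "开" | "进入" | "进" | "点" | "点击" ) , ( "音乐" | "音乐应用" | "音乐的" | "音乐图标" ) ;`. `pattern.fullmatch` accepts utterances such as "点击音乐图标" ("click music icon").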
In the embodiments of the present disclosure, the static controls in the target interaction interface may be updated, for example after a version upgrade. The preset control instruction sets stored in the electronic device therefore also need to be updated, so that the user can control all static controls of the updated target interaction interface by voice.
In some embodiments, before extracting the first control instructions from the target application's control instruction set, the voice control method may further include: detecting the instruction set version of that control instruction set.
In some embodiments, the electronic device may detect the instruction set version of the target application's control instruction set to obtain its version number.
Correspondingly, extracting the first control instructions from the target application's control instruction set may specifically include: in response to detecting that the instruction set version is the latest version, extracting the first control instructions from that set.
Specifically, the electronic device may check whether the detected version number is the latest version number. If it is, the instruction set version is the latest version and there is no need to update the target application's control instruction set. The first control instructions of the target interaction interface can be extracted from it directly.
In other embodiments, after detecting the instruction set version, the voice control method may further include the following steps, performed in response to detecting that the instruction set version is not the latest version:
- downloading a to-be-updated control instruction set for the target application from the server;
- replacing the target application's control instruction set with the to-be-updated control instruction set;
- extracting the first control instructions from the to-be-updated control instruction set.
In some embodiments, the electronic device may determine that the version number is not the latest. The instruction set version is then outdated, and the target application's control instruction set must be updated, as follows:
- The electronic device may send the server a control instruction set update request for the target application.
- In response, the server returns the latest version of the target application's control instruction set, that is, the to-be-updated control instruction set.
- The electronic device downloads this set and uses it in place of the old set as the target application's new control instruction set, deleting the outdated set.
- It then extracts the first control instructions of the target interaction interface from the new set.
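The version check and update flow can be sketched as below. `latest_version` and `download` are hypothetical callables standing in for the server exchange, which the disclosure does not specify. The check uses "not equal to the latest version" as its test for "not the latest version".

```python
from typing import Callable, Dict, List


def first_instructions_with_update(
    local_sets: Dict[str, dict],
    app: str,
    interface_id: str,
    latest_version: Callable[[str], int],
    download: Callable[[str], dict],
) -> List[dict]:
    """Use the stored set if current; otherwise fetch, replace, then extract."""
    stored = local_sets.get(app)
    if stored is None or stored["version"] != latest_version(app):
        stored = download(app)    # the to-be-updated control instruction set
        local_sets[app] = stored  # replaces (and so discards) the outdated set
    return list(stored["interfaces"].get(interface_id, []))


# Fake server and local storage for illustration only.
server = {"music": {"version": 6, "interfaces": {"main_305": [
    {"control_id": "306"}, {"control_id": "307"}, {"control_id": "new_control"}]}}}
local = {"music": {"version": 5, "interfaces": {"main_305": [
    {"control_id": "306"}, {"control_id": "307"}]}}}
downloads: List[str] = []


def latest_version(app: str) -> int:
    return server[app]["version"]


def download(app: str) -> dict:
    downloads.append(app)
    return server[app]


result = first_instructions_with_update(local, "music", "main_305", latest_version, download)
```

A second call for the same application finds version 6 stored locally, so it extracts the instructions directly without downloading again.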
Thus, in the embodiments of the present disclosure, the control instructions of all static controls can be compiled in advance for each interaction interface of each application. The electronic device pre-stores the control instructions of all static controls of all applications installed on it and treats them as static preset content, so that the first control instructions of the target interaction interface can be loaded quickly.
In yet another implementation of the present disclosure, when the control instructions include a second control instruction generated from the dynamic control data in the interface control data, the electronic device may generate the second control instruction from the dynamic control data.
In some embodiments, loading the target control instruction set corresponding to the target interaction interface may specifically include: processing the dynamic control data to generate the second control instruction.
In some embodiments, a dynamic control is a control formed by filling control data into a field reserved for dynamic content.
The processing of dynamic control data provided by the embodiments of the present disclosure is described in detail below with reference to FIG. 4.
FIG. 4 shows a schematic flowchart of a process for processing dynamic control data according to an embodiment of the present disclosure.
As shown in FIG. 4, the processing of the dynamic control data may include the following steps S410 to S430.
S410. Extract the dynamic control text from the dynamic control data.
In the embodiments of the present disclosure, both static control data and dynamic control data may include the control text of the control, the display parameters of the control, and the like. Each piece of static control data belongs to one static control, and each piece of dynamic control data belongs to one dynamic control.
In some embodiments, the electronic device may extract, from the dynamic control data corresponding to the target interaction interface, the dynamic control text of the dynamic control to which that data belongs. The dynamic control text may be the name of the dynamic control as it is displayed to the user.
Continuing to refer to FIG. 3B, the playlist link 311 is a dynamic control of the application main interface 305. Taking the playlist link 311 titled "氛围钢琴曲当灵魂与无尽虚空对话" (roughly "Ambient piano: when the soul converses with the endless void") as an example, its dynamic control text is "氛围钢琴曲当灵魂与无尽虚空对话".
S420. Perform word segmentation on the dynamic control text to obtain the segment set corresponding to the dynamic control text.
In the embodiments of the present disclosure, after extracting the dynamic control text of any dynamic control, the electronic device may perform word segmentation on that text to obtain the segment set corresponding to the dynamic control text, i.e. the control-text segment set of the dynamic control.
In some embodiments, the electronic device may use any word segmentation algorithm to split the dynamic control text into multiple control-text segments, then combine any number of adjacent segments into segment combinations, and finally obtain a segment set for the dynamic control text that contains both the individual segments and the segment combinations.
In some embodiments, the individual segments and the segment combinations may be joined with "|" to form the segment set.
In this way, the embodiments of the present disclosure obtain segment-set content that conforms to EBNF (Extended Backus-Naur Form) syntax, so that the generated second control instruction can be loaded into the language model of the Grammar engine.
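The combination of adjacent segments and the "|" joining described above can be sketched as below. This is an illustration under assumptions: the input is an already-segmented token list (an off-the-shelf segmenter such as jieba's `lcut` could supply it, but none is used here), every contiguous run of segments is generated, and the function names are hypothetical. The disclosure's own example set is a subset of what this produces, so the real system may keep fewer combinations.

```python
from __future__ import annotations


def build_segment_set(segments: list[str], joiner: str = "") -> list[str]:
    """Every run of adjacent segments (single segments included), longest first, deduplicated."""
    n = len(segments)
    seen: set[str] = set()
    result: list[str] = []
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            # joiner="" suits Chinese text, which has no spaces between words
            phrase = joiner.join(segments[start:start + length])
            if phrase and phrase not in seen:
                seen.add(phrase)
                result.append(phrase)
    return result


def to_alternation(phrases: list[str]) -> str:
    """Join the alternatives with '|', the EBNF alternation operator."""
    return "|".join(phrases)
```

With the four segments 语文课代表 / 私藏小抄 / 二百首 / 成语歌名, this yields 10 alternatives, starting with the full text and ending with 成语歌名.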
S430. Generate the second control instruction from the segment set.
In the embodiments of the present disclosure, after obtaining the segment set corresponding to the dynamic control text, the electronic device may generate the second control instruction from the segment set according to a preset instruction generation scheme.
In some embodiments, S430 may specifically include: generating the second control instruction from a preset verb set and the segment set.
The electronic device may combine different preset verb sets with the segment set to obtain multiple control instructions for the dynamic control, where the verb set in each control instruction contains multiple verbs with similar meanings. For each control instruction of each dynamic control, the electronic device may further determine the control mode of that instruction from the operation indicated by its verb set and the control function of the dynamic control to which its segment set corresponds.
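Combining preset verb sets with a control's segment set can be sketched as follows. The sketch makes several assumptions that the source does not spell out. `SecondControlInstruction` and `build_second_instructions` are hypothetical names. The `action` label stands in for "the operation indicated by the verb set". The rule format `(verbs) (segments)` is an EBNF-style illustration rather than the Grammar engine's actual syntax. It also shows why the number of accepted phrasings grows multiplicatively.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SecondControlInstruction:
    verbs: tuple[str, ...]     # near-synonym verbs from one preset verb set
    segments: tuple[str, ...]  # segment set of one dynamic control
    action: str                # operation implied by the verb set (hypothetical label)
    control_id: str            # dynamic control the instruction targets

    def grammar_rule(self) -> str:
        """EBNF-style rule: one verb alternative followed by one segment alternative."""
        return "(" + "|".join(self.verbs) + ") (" + "|".join(self.segments) + ")"

    def phrases(self) -> set[str]:
        """Every utterance the rule accepts (verb + segment, no space as in Chinese)."""
        return {verb + seg for verb, seg in product(self.verbs, self.segments)}


def build_second_instructions(
    verb_sets: dict[str, tuple[str, ...]],
    segments: list[str],
    control_id: str,
) -> list[SecondControlInstruction]:
    """One instruction per preset verb set, all sharing the control's segment set."""
    return [
        SecondControlInstruction(tuple(verbs), tuple(segments), action, control_id)
        for action, verbs in verb_sets.items()
    ]
```

Each instruction accepts |verbs| × |segments| phrasings, so a handful of verb sets over a few dozen segments already reaches the thousands of phrasings per interface claimed later in the text.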
In some embodiments of the present disclosure, before S420, the voice control method may further include: preprocessing the dynamic control text.
In the embodiments of the present disclosure, after extracting the dynamic control text and before segmenting it into the corresponding segment set, the electronic device may first preprocess the dynamic control text so that it is suitable for text processing.
In some embodiments, the preprocessing may include symbol removal and numeral conversion.
Symbol removal strips symbols from the dynamic control text, such as punctuation marks, special symbols, mathematical symbols, and any other symbols that carry no semantic meaning.
Numeral conversion converts the Arabic numerals in the dynamic control text into Chinese numerals. If an Arabic number has multiple digits, the whole number may be converted into a single Chinese number reading, or each digit may be converted individually.
In some embodiments, after extracting the dynamic control text, the electronic device may first remove the symbols from it and then convert the Arabic numerals in the symbol-free text into Chinese numerals. Take the dynamic control text "语文课代表私藏小抄200首成语歌名" (roughly "The Chinese-class representative's private cheat sheet: 200 idiom song titles") as an example. The Arabic number "200" can be converted either into "二百" (read as a whole number, "two hundred") or into "二零零" (read digit by digit, "two-zero-zero"), giving the converted text "语文课代表私藏小抄二百首成语歌名|语文课代表私藏小抄二零零首成语歌名". The electronic device can then segment the converted text and combine the segments to obtain the segment set "语文课代表私藏小抄二百首成语歌名|语文课代表私藏小抄二零零首成语歌名|二百首成语歌名|二零零首成语歌名|成语歌名|语文课代表|私藏小抄".
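The two preprocessing steps can be sketched as follows. This is a minimal sketch, not the disclosed implementation. Symbol removal is approximated with a regular expression that drops every character that is neither a word character nor whitespace. The whole-number reading only covers 0 to 99,999,999, and larger numbers fall back to the digit-by-digit reading. It uses 二 rather than 两 (as in the source's "二百"). All function names are hypothetical.

```python
from __future__ import annotations

import re

DIGITS = "零一二三四五六七八九"
UNITS = ("", "十", "百", "千")


def _read_section(n: int) -> str:
    """Read 1 <= n <= 9999 in Chinese, emitting one 零 for an internal run of zeros."""
    out, zero_pending = "", False
    for pos in range(3, -1, -1):
        d = (n // 10 ** pos) % 10
        if d == 0:
            if out:  # a zero only matters after a higher digit was emitted
                zero_pending = True
        else:
            if zero_pending:
                out += "零"
                zero_pending = False
            out += DIGITS[d] + UNITS[pos]
    return out


def int_to_chinese(n: int) -> str:
    """Whole-number reading for 0 <= n < 10**8, e.g. 200 -> 二百."""
    if n == 0:
        return "零"
    if not 0 < n < 10 ** 8:
        raise ValueError("only 0 <= n < 10**8 is supported in this sketch")
    high, low = divmod(n, 10_000)
    if high == 0:
        text = _read_section(low)
    else:
        text = _read_section(high) + "万"
        if low:
            if low < 1000:
                text += "零"  # e.g. 10001 -> 一万零一
            text += _read_section(low)
    if text.startswith("一十"):  # colloquial: 十五 rather than 一十五
        text = text[1:]
    return text


def digits_to_chinese(s: str) -> str:
    """Digit-by-digit reading, e.g. "200" -> 二零零."""
    return "".join(DIGITS[int(c)] for c in s)


def preprocess(text: str) -> list[str]:
    """Strip symbols, then return the whole-number and digit-by-digit variants."""
    cleaned = re.sub(r"[^\w\s]|_", "", text)  # drop punctuation, special and math symbols

    def whole(m: re.Match) -> str:
        n = int(m.group())
        return int_to_chinese(n) if n < 10 ** 8 else digits_to_chinese(m.group())

    variants = [
        re.sub(r"\d+", whole, cleaned),
        re.sub(r"\d+", lambda m: digits_to_chinese(m.group()), cleaned),
    ]
    return list(dict.fromkeys(variants))  # dedupe while keeping order
```

Applied to "语文课代表私藏小抄200首成语歌名!", `preprocess` returns the two variants from the example above. Text without digits yields a single variant.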
Thus, in the embodiments of the present disclosure, after displaying the target interaction interface, the electronic device can generate the control instructions of all dynamic controls in the target interaction interface, i.e. the second control instructions, from the data of each dynamic control, and treat these instructions as dynamically loaded content, so that the second control instructions of the target interaction interface are loaded reliably and efficiently.
The voice control method provided by the embodiments of the present disclosure is described in detail below using an example.
FIG. 5 shows a schematic flowchart of another voice control method according to an embodiment of the present disclosure.
As shown in FIG. 5, the voice control method may include the following steps S510 to S560.
S510. Display the target interaction interface.
In the embodiments of the present disclosure, an electronic device with a voice control function may display the target interaction interface so that the user can control it by voice.
S520. Load the target control instruction set corresponding to the target interaction interface.
In the embodiments of the present disclosure, after displaying the target interaction interface, the electronic device may initialize the ASR (automatic speech recognition) engine, load a language model whose instruction content is empty, and then load the target control instruction set corresponding to the target interaction interface into that language model. While the ASR engine is starting up and initializing and while the target control instruction set is being loaded, the electronic device does not accept user speech.
The electronic device needs to load both the first control instructions and the second control instructions of the target control instruction set into the language model.
The electronic device may first determine whether the preloaded language model belongs to the target application to which the target interaction interface belongs. If it does, the electronic device loads the target control instruction set into that language model; if it does not, the electronic device reloads an empty language model for the target application and then loads the target control instruction set into the reloaded model.
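The model-reuse check and the "no speech while loading" rule can be sketched as below. `GrammarLanguageModel` and `RecognitionSession` are hypothetical stand-ins, not an actual ASR or Grammar engine API. The source does not say whether loading into an existing same-application model appends to or replaces its previous instruction content, so this sketch assumes replacement.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GrammarLanguageModel:
    """Hypothetical stand-in for the Grammar engine's language model."""
    app_id: str
    instructions: list[str] = field(default_factory=list)  # starts empty


class RecognitionSession:
    def __init__(self) -> None:
        self.model: GrammarLanguageModel | None = None
        self.accepting_speech = False

    def prepare(self, target_app: str, first: list[str], second: list[str]) -> None:
        # No user speech is accepted while the engine initializes and loads.
        self.accepting_speech = False
        if self.model is None or self.model.app_id != target_app:
            # The preloaded model belongs to another app: reload an empty one.
            self.model = GrammarLanguageModel(target_app)
        # Load the first (static) and second (dynamic) control instructions.
        # Assumption: loading replaces the model's previous instruction content.
        self.model.instructions = list(first) + list(second)
        self.accepting_speech = True
```

Switching between interfaces of the same application keeps the same model object. Switching to another application creates a fresh, empty model before loading.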
In some embodiments, the electronic device may first determine the target application to which the target interaction interface belongs, look up the control instruction set corresponding to the target application among multiple preset control instruction sets, and check whether the instruction set version of that set is the latest version. If it is, no update is needed, and the first control instruction corresponding to the target interaction interface can be extracted directly from the target application's control instruction set. If it is not, the set must be updated: the electronic device downloads the to-be-updated control instruction set for the target application from the server, replaces the target application's control instruction set with it, and extracts the first control instruction corresponding to the target interaction interface from the to-be-updated set. After obtaining the first control instruction, the electronic device can load it into the language model.
In some other embodiments, the electronic device may obtain the dynamic control data of all dynamic controls in the target interaction interface. For the dynamic control data of each dynamic control, the electronic device may extract the control's dynamic control text, remove the symbols from it, and convert its Arabic numerals into Chinese numerals to obtain the preprocessed text. It then segments that text to obtain the corresponding segment set and finally generates the second control instruction from the preset verb set and the segment set. After obtaining the second control instruction, the electronic device can load it into the language model.
When the second control instruction is generated from the preset verb set and the segment set, the different grammar contents are added at the corresponding code positions of the language model. Finally, the updated language model is compiled into a binary language model resource file and sent to the audio recognition model.
Further, the electronic device may also convert the first control instructions and the second control instructions into binary code and then load them into the speech model.
S530. Receive the user control speech.
In the embodiments of the present disclosure, after loading the target control instruction set, the electronic device may wait for the user to speak. When voice activity detection (VAD) detects the start of human speech, the device starts recording and keeps recording; when VAD detects the end of the speech, it stops recording. The electronic device uses the recorded audio as the user control speech and inputs it into the ASR engine to obtain the target voice text corresponding to the user control speech.
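The start/stop recording behavior can be illustrated with a deliberately simple energy-threshold detector. This is a swapped-in technique: the source does not describe how its VAD decides where speech starts and ends. The threshold, the `hangover` count of silent frames that ends an utterance, and the function names are all assumptions made for illustration.

```python
from __future__ import annotations


def frame_energy(frame: list[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def capture_utterance(frames, threshold: float, hangover: int = 3) -> list[list[float]]:
    """Record from detected speech start until `hangover` consecutive silent frames."""
    recording, silent_run = False, 0
    captured: list[list[float]] = []
    for frame in frames:
        is_speech = frame_energy(frame) >= threshold
        if not recording:
            if is_speech:  # speech start detected: begin recording
                recording = True
                captured.append(frame)
            continue
        captured.append(frame)  # keep recording, including short pauses
        silent_run = 0 if is_speech else silent_run + 1
        if silent_run >= hangover:  # speech end detected: stop, drop trailing silence
            return captured[:-hangover]
    return captured
```

A pause shorter than `hangover` frames stays inside the utterance, and leading silence is never recorded. Production VADs usually rely on trained models rather than a fixed energy threshold.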
S540. Based on the target voice text corresponding to the user control speech, search the target control instruction set for a target control instruction that matches the user control speech.
In the embodiments of the present disclosure, the electronic device may search the target control instruction set for a target control instruction that matches the target voice text.
S550. Determine whether a target control instruction has been found.
In the embodiments of the present disclosure, the electronic device may determine whether a target control instruction has been found. If one has been found, S560 is executed; otherwise, the process returns to S530.
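Steps S530 to S560 form a simple loop, sketched below. `recognize`, `instruction_index`, and `execute` are hypothetical stand-ins for the ASR engine, the loaded target control instruction set, and the control executor. Matching is assumed to be an exact lookup of the recognized text, whereas the real system's matching happens inside its grammar-based recognizer and is not specified in detail.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def control_loop(
    utterances: Iterable[bytes],
    recognize: Callable[[bytes], str],
    instruction_index: dict[str, str],
    execute: Callable[[str], None],
) -> Optional[str]:
    """S530-S560: recognize each utterance until one matches a loaded instruction."""
    for audio in utterances:                       # S530: next user control speech
        text = recognize(audio)                    # ASR -> target voice text
        instruction = instruction_index.get(text)  # S540: look up a matching instruction
        if instruction is None:                    # S550: not found -> back to S530
            continue
        execute(instruction)                       # S560: run the target control operation
        return instruction
    return None
```

An unmatched utterance is simply skipped, mirroring the "return to S530" branch, and the loop ends after the first successful execution.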
S560. Execute the target control operation corresponding to the target control instruction.
In the embodiments of the present disclosure, the electronic device may, in the target control mode indicated by the target control instruction, perform the control operation on the target interface control whose control data was used to generate that instruction.
In summary, in the embodiments of the present disclosure, because control instructions are generated from verb sets and segment sets, a single interaction interface can support voice control with thousands of phrasings. The control instructions include static control instructions generated from the static controls of the interaction interface and dynamic control instructions generated from its dynamic controls. Therefore, on top of supporting a sufficiently large grammar, the control instructions can be extended arbitrarily, achieving a "what you see is what you can say" effect for the interaction interface. In addition, the loading of control instructions and the recognition of user control speech are independent of each other and do not interfere, which can improve recognition accuracy.
Furthermore, in the embodiments of the present disclosure, both the ASR engine and the Grammar engine are offline engines that can run on the device side (i.e. inside the electronic device) without relying on a network. Their models are also small and need little computing power, so the control instructions the interaction interface must support can be responded to quickly: on average about 1.2 s faster than cloud recognition and about 500 ms faster than offline general-purpose recognition. This is particularly beneficial in vehicle scenarios.
FIG. 6 shows a schematic structural diagram of a voice control apparatus according to an embodiment of the present disclosure.
In some embodiments of the present disclosure, the apparatus shown in FIG. 6 may be applied to an electronic device. The electronic device may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, or any other device with a voice control function.
As shown in FIG. 6, the voice control apparatus 600 may include an interface display module 610, an instruction loading module 620, an instruction matching module 630, and an instruction execution module 640.
The interface display module 610 may be configured to display the target interaction interface.
The instruction loading module 620 may be configured to load the target control instruction set corresponding to the target interaction interface, where the target control instruction set includes control instructions generated from the interface control data of the target interaction interface.
The instruction matching module 630 may be configured to, in response to receiving user control speech, search the target control instruction set for a target control instruction that matches the user control speech.
The instruction execution module 640 may be configured to, in response to finding the target control instruction, execute the target control operation corresponding to it.
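The division of apparatus 600 into four modules can be sketched as a small composition of callables, one per module. This is a hypothetical wiring, not the disclosed software structure. For brevity, `on_speech_text` takes already-recognized text, although in the disclosure the speech-to-text conversion is part of module 630.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class VoiceControlApparatus:
    """Apparatus 600 wired from four hypothetical callables, one per module."""
    display: Callable[[str], None]                         # module 610: show interface
    load: Callable[[str], dict[str, str]]                  # module 620: load instruction set
    match: Callable[[str, dict[str, str]], Optional[str]]  # module 630: find instruction
    execute: Callable[[str], None]                         # module 640: run operation
    instructions: dict[str, str] = field(default_factory=dict)

    def show(self, interface_id: str) -> None:
        # Display the target interface, then load its target control instruction set.
        self.display(interface_id)
        self.instructions = self.load(interface_id)

    def on_speech_text(self, text: str) -> Optional[str]:
        # Execute only when a matching target control instruction is found.
        instruction = self.match(text, self.instructions)
        if instruction is not None:
            self.execute(instruction)
        return instruction
```

Keeping each module behind its own callable mirrors how the text describes them as separately configurable units.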
In the embodiments of the present disclosure, after the target interaction interface is displayed, the target control instruction set corresponding to it can be loaded. When user control speech is received, the target control instruction set is searched for a target control instruction that matches it, and the found instruction is executed, so that the user can control the target interaction interface by voice. The loaded target control instruction set includes control instructions generated from the interface control data of the target interaction interface, and that data can cover all interface controls in the interface. The entire target interaction interface can therefore be controlled by voice, achieving a "what you see is what you can say" effect and improving the user experience.
In some embodiments of the present disclosure, the control instructions may include a first control instruction generated from the static control data in the interface control data.
In some embodiments of the present disclosure, the instruction loading module 620 may further include an application determination unit, a first query unit, and a first extraction unit.
The application determination unit may be configured to determine the target application to which the target interaction interface belongs.
The first query unit may be configured to look up the control instruction set corresponding to the target application among multiple pre-stored preset control instruction sets.
The first extraction unit may be configured to extract the first control instruction from the control instruction set corresponding to the target application.
In some embodiments of the present disclosure, the instruction loading module 620 may further include a version detection unit, which may be configured to detect the instruction set version of the control instruction set corresponding to the target application before the first control instruction is extracted from the target control instruction set.
The first extraction unit may be further configured to extract the first control instruction from the target control instruction set when the version detection unit detects that the instruction set version is the latest version.
In some embodiments of the present disclosure, the instruction loading module 620 may further include an instruction set download unit, a first processing unit, and a second extraction unit.
The instruction set download unit may be configured to download, from the server, the to-be-updated control instruction set corresponding to the target application if, after the instruction set version of the target application's control instruction set has been detected, that version is found not to be the latest.
The first processing unit may be configured to replace the target control instruction set with the to-be-updated control instruction set.
The second extraction unit may be configured to extract the first control instruction from the to-be-updated control instruction set.
In some embodiments of the present disclosure, the control instructions may include a second control instruction generated from the dynamic control data in the interface control data.
In some embodiments of the present disclosure, the instruction loading module 620 may further include a third extraction unit, a second processing unit, and an instruction generation unit.
The third extraction unit may be configured to extract the dynamic control text from the dynamic control data.
The second processing unit may be configured to perform word segmentation on the dynamic control text to obtain the segment set corresponding to the dynamic control text.
The instruction generation unit may be configured to generate the second control instruction from the segment set.
In some embodiments of the present disclosure, the instruction loading module 620 may further include a third processing unit, which may be configured to preprocess the dynamic control text before it is segmented into the corresponding segment set, where the preprocessing includes symbol removal and numeral conversion.
In some embodiments of the present disclosure, the instruction generation unit may be further configured to generate the second control instruction from a preset verb set and the segment set.
In some embodiments of the present disclosure, the instruction matching module 630 may include a text conversion unit and a second query unit.
The text conversion unit may be configured to convert the user control speech into the target voice text.
The second query unit may be configured to search the target control instruction set for a target control instruction that matches the target voice text.
In some embodiments of the present disclosure, the instruction execution module 640 may be further configured to execute the target control operation on the target interface control involved in the target control instruction.
It should be noted that the voice control apparatus 600 shown in FIG. 6 can perform each step of the method embodiments shown in FIG. 1 to FIG. 5 and achieve each of their processes and effects, which are not repeated here.
FIG. 7 shows a schematic structural diagram of a voice control device according to an embodiment of the present disclosure.
In some embodiments of the present disclosure, the voice control device shown in FIG. 7 may be an electronic device.
The electronic device may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, or any other device with a voice control function.
As shown in FIG. 7, the voice control device may include a processor 701 and a memory 702 storing computer program instructions.
Specifically, the processor 701 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 702 may include mass storage for information or instructions. By way of example and not limitation, the memory 702 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 702 may include removable or non-removable (fixed) media. Where appropriate, the memory 702 may be internal or external to the integrated gateway device. In particular embodiments, the memory 702 is non-volatile solid-state memory. In particular embodiments, the memory 702 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
处理器701通过读取并执行存储器702中存储的计算机程序指令,以执行本公开实施例所提供的语音控制方法的步骤。The processor 701 reads and executes the computer program instructions stored in the memory 702 to execute the steps of the voice control method provided by the embodiments of the present disclosure.
在一个示例中,该语音控制设备还可包括收发器703和总线704。其中,如图7所示,处理器701、存储器702和收发器703通过总线704连接并完成相互间的通信。In an example, the voice control device may further include a transceiver 703 and a bus 704 . Wherein, as shown in FIG. 7 , a processor 701 , a memory 702 and a transceiver 703 are connected through a bus 704 and complete mutual communication.
Bus 704 includes hardware, software, or both. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, bus 704 may include one or more buses. Although the embodiments of this application describe and illustrate a particular bus, this application contemplates any suitable bus or interconnect.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement the voice control method provided by the embodiments of the present disclosure.
The storage medium may be, for example, memory 702 storing computer program instructions that can be executed by processor 701 of the voice control device to carry out the voice control method provided by the embodiments of the present disclosure. Optionally, the storage medium may be a non-transitory computer-readable storage medium, such as ROM, random access memory (RAM), compact disc read-only memory (CD-ROM), magnetic tape, a floppy disk, or an optical data storage device.
An embodiment of the present disclosure further provides a computer program product including a computer program or instructions that, when executed by a processor, implement the voice control method described in the first aspect.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the term "comprising" is intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.
The above are merely specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

  1. A voice control method, comprising:
    displaying a target interactive interface;
    loading a target control instruction set corresponding to the target interactive interface, wherein the target control instruction set includes control instructions generated according to interface control data of the target interactive interface;
    in response to receiving a user control voice, querying the target control instruction set for a target control instruction matching the user control voice; and
    in response to finding the target control instruction, executing a target control operation corresponding to the target control instruction.
  2. The method according to claim 1, wherein the control instructions include a first control instruction generated according to static control data in the interface control data.
  3. The method according to claim 2, wherein loading the target control instruction set corresponding to the target interactive interface comprises:
    determining a target application to which the target interactive interface belongs;
    querying, among a plurality of pre-stored preset control instruction sets, a control instruction set corresponding to the target application; and
    extracting the first control instruction from the control instruction set corresponding to the target application.
  4. The method according to claim 3, wherein, before extracting the first control instruction from the control instruction set corresponding to the target application, the method further comprises:
    detecting an instruction set version of the control instruction set corresponding to the target application;
    wherein extracting the first control instruction from the control instruction set corresponding to the target application comprises:
    in response to detecting that the instruction set version is the latest version, extracting the first control instruction from the control instruction set corresponding to the target application.
  5. The method according to claim 4, wherein, after detecting the instruction set version of the control instruction set corresponding to the target application, the method further comprises:
    in response to detecting that the instruction set version is not the latest version, downloading, from a server, a control instruction set to be updated corresponding to the target application;
    replacing the control instruction set corresponding to the target application with the control instruction set to be updated; and
    extracting the first control instruction from the control instruction set to be updated.
  6. The method according to claim 1, wherein the control instructions include a second control instruction generated according to dynamic control data in the interface control data.
  7. The method according to claim 6, wherein loading the target control instruction set corresponding to the target interactive interface comprises:
    extracting dynamic control text from the dynamic control data;
    performing word segmentation on the dynamic control text to obtain a word segmentation set corresponding to the dynamic control text; and
    generating the second control instruction according to the word segmentation set.
  8. The method according to claim 7, wherein, before performing word segmentation on the dynamic control text to obtain the word segmentation set corresponding to the dynamic control text, the method further comprises:
    preprocessing the dynamic control text;
    wherein the preprocessing includes symbol removal and number conversion.
  9. The method according to claim 7, wherein generating the second control instruction according to the word segmentation set comprises:
    generating the second control instruction according to a preset verb set and the word segmentation set.
  10. The method according to claim 1, wherein querying the target control instruction set for the target control instruction matching the user control voice comprises:
    converting the user control voice into target speech text; and
    querying the target control instruction set for the target control instruction matching the target speech text.
  11. The method according to claim 1, wherein executing the target control operation corresponding to the target control instruction comprises:
    executing the target control operation on a target interface control involved in the target control instruction.
  12. A voice control apparatus, comprising:
    an interface display module configured to display a target interactive interface;
    an instruction loading module configured to load a target control instruction set corresponding to the target interactive interface, wherein the target control instruction set includes control instructions generated according to interface control data of the target interactive interface;
    an instruction matching module configured to, in response to receiving a user control voice, query the target control instruction set for a target control instruction matching the user control voice; and
    an instruction execution module configured to, in response to finding the target control instruction, execute a target control operation corresponding to the target control instruction.
  13. A voice control device, comprising:
    a processor; and
    a memory configured to store executable instructions;
    wherein the processor is configured to read the executable instructions from the memory and execute them to implement the voice control method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to implement the voice control method according to any one of claims 1 to 11.
  15. A computer program product, comprising a computer program or instructions that, when executed by a processor, implement the voice control method according to any one of claims 1 to 11.
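As a rough illustration of claims 1, 10 and 11, the following Python sketch builds an instruction set from the controls of the displayed interface, matches recognized text against it, and executes the matched operation. It is not the applicant's implementation: the `InterfaceControl` and `ControlInstruction` types, the `click` operation, and the exact-match lookup are all hypothetical, and speech recognition is assumed to have already turned the user control voice into text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class InterfaceControl:
    """Hypothetical record for one on-screen control (button, list item, ...)."""
    control_id: str
    label: str


@dataclass(frozen=True)
class ControlInstruction:
    """Hypothetical binding of a spoken phrase to an operation on one control."""
    phrase: str
    control_id: str
    operation: str


def normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace so "Next  Song" matches "next song".
    return " ".join(text.lower().split())


def load_instruction_set(controls: List[InterfaceControl]) -> Dict[str, ControlInstruction]:
    # Claim 1: build the target instruction set from the displayed interface's controls.
    # Binding every label to a "click" operation is an assumption of this sketch.
    instruction_set: Dict[str, ControlInstruction] = {}
    for control in controls:
        phrase = normalize(control.label)
        instruction_set[phrase] = ControlInstruction(phrase, control.control_id, "click")
    return instruction_set


def match_instruction(speech_text: str,
                      instruction_set: Dict[str, ControlInstruction]) -> Optional[ControlInstruction]:
    # Claim 10: speech_text is assumed to be the ASR output for the user control voice;
    # exact match after normalization stands in for whatever matching is really used.
    return instruction_set.get(normalize(speech_text))


def handle_utterance(speech_text: str,
                     instruction_set: Dict[str, ControlInstruction],
                     execute: Callable[[str, str], None]) -> bool:
    # Claims 1 and 11: on a match, perform the operation on the control it names.
    instruction = match_instruction(speech_text, instruction_set)
    if instruction is None:
        return False  # no instruction for this interface matched the utterance
    execute(instruction.control_id, instruction.operation)
    return True


if __name__ == "__main__":
    controls = [InterfaceControl("btn_play", "Play"), InterfaceControl("btn_next", "Next Song")]
    instruction_set = load_instruction_set(controls)
    performed = []
    handle_utterance("next  song", instruction_set, lambda cid, op: performed.append((cid, op)))
    print(performed)  # [('btn_next', 'click')]
```

Keying the set by normalized phrase keeps each lookup to a single dictionary access, and because the set is rebuilt for each displayed interface, only commands that make sense on the current screen can match.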
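Claims 3 to 5 describe a version check on the per-application instruction set before the static (first) control instructions are extracted. A minimal sketch, assuming hypothetical `latest_version` and `download` callables in place of whatever server protocol an actual system uses, and an invented `interface_to_app` mapping for determining the target application:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InstructionSet:
    """Hypothetical pre-stored control instruction set for one application."""
    app_id: str
    version: int
    # Static ("first") control instructions, grouped by interface id.
    static_instructions: Dict[str, List[str]] = field(default_factory=dict)


class InstructionSetStore:
    def __init__(self,
                 local_sets: Dict[str, InstructionSet],
                 interface_to_app: Dict[str, str],
                 latest_version: Callable[[str], int],
                 download: Callable[[str], InstructionSet]) -> None:
        self.local_sets = local_sets              # app_id -> locally stored set
        self.interface_to_app = interface_to_app  # interface_id -> owning app_id
        self.latest_version = latest_version      # assumed server query for newest version
        self.download = download                  # assumed server call returning newest set

    def load_static_instructions(self, interface_id: str) -> List[str]:
        # Claim 3: determine the application the interface belongs to and
        # look up that application's pre-stored instruction set.
        app_id = self.interface_to_app[interface_id]
        current = self.local_sets[app_id]
        # Claim 4: check whether the stored version is the latest one.
        if current.version < self.latest_version(app_id):
            # Claim 5: download the set to be updated and replace the stored one.
            current = self.download(app_id)
            self.local_sets[app_id] = current
        # Claims 3 to 5: extract the first control instructions for this interface.
        return list(current.static_instructions.get(interface_id, []))
```

Because the stored set is replaced after a download, a second call for the same interface does not download again unless the server reports a newer version.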
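Claims 7 to 9 generate second control instructions from dynamic control text. The sketch below is illustrative only and assumes English text: whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated segmenter), digits are spelled out one at a time as a simplified form of number conversion, and `PRESET_VERBS` and the `"id"`/`"text"` keys of each dynamic control are invented for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical preset verb set (claim 9); a real system would choose its own.
PRESET_VERBS: Tuple[str, ...] = ("open", "play", "select")

DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def preprocess(text: str) -> str:
    # Claim 8, symbol removal: replace anything that is not a word character or space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Claim 8, number conversion: spell out each ASCII digit (simplified; "12" -> "one two").
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    return text.lower()


def segment(text: str) -> List[str]:
    # Claim 7, word segmentation: whitespace split is a stand-in for a real segmenter.
    return text.split()


def generate_dynamic_instructions(dynamic_controls: List[Dict[str, str]],
                                  verbs: Tuple[str, ...] = PRESET_VERBS) -> Dict[str, str]:
    """Return a mapping of generated phrase -> id of the dynamic control it targets."""
    instructions: Dict[str, str] = {}
    for control in dynamic_controls:
        # Claim 7: extract the dynamic control text, then preprocess and segment it.
        words = segment(preprocess(control.get("text", "")))
        if not words:
            continue  # nothing speakable left after preprocessing
        phrase = " ".join(words)
        instructions[phrase] = control["id"]
        # Claim 9: combine the segmented text with each preset verb.
        for verb in verbs:
            instructions[f"{verb} {phrase}"] = control["id"]
    return instructions
```

For a list item labeled `Track 3: Blue-Sky!`, this yields `track three blue sky` plus the same phrase prefixed by each verb. If two controls reduce to the same phrase, the later one overwrites the earlier in this sketch; a real system would need a disambiguation rule.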
PCT/CN2022/117090 2021-09-14 2022-09-05 Speech control method, apparatus and device, and medium WO2023040692A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111084298.5A CN115810354A (en) 2021-09-14 2021-09-14 Voice control method, device, equipment and medium
CN202111084298.5 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023040692A1 (en) 2023-03-23

Family

ID=85482069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117090 WO2023040692A1 (en) 2021-09-14 2022-09-05 Speech control method, apparatus and device, and medium

Country Status (2)

Country Link
CN (1) CN115810354A (en)
WO (1) WO2023040692A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872362A (en) * 2010-06-25 2010-10-27 大陆汽车亚太管理(上海)有限公司 Information inquiry system of dynamic voice label and information inquiry method thereof
CN108108142A (en) * 2017-12-14 2018-06-01 广东欧珀移动通信有限公司 Voice information processing method, device, terminal device and storage medium
CN111833868A (en) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Voice assistant control method, device and computer readable storage medium
CN112295220A (en) * 2020-10-29 2021-02-02 北京字节跳动网络技术有限公司 AR game control method, AR game control device, electronic equipment and storage medium
WO2021027267A1 (en) * 2019-08-15 2021-02-18 华为技术有限公司 Speech interaction method and apparatus, terminal and storage medium
CN112825030A (en) * 2020-02-28 2021-05-21 腾讯科技(深圳)有限公司 Application program control method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115810354A (en) 2023-03-17

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22869063

Country of ref document: EP

Kind code of ref document: A1