WO2023040692A1 - Voice control device, apparatus and method, and medium - Google Patents

Voice control device, apparatus and method, and medium

Info

Publication number
WO2023040692A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
target
control instruction
interface
voice
Prior art date
Application number
PCT/CN2022/117090
Other languages
English (en)
Chinese (zh)
Inventor
胡明国
徐超
Original Assignee
北京车和家信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京车和家信息技术有限公司 filed Critical 北京车和家信息技术有限公司
Publication of WO2023040692A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • the present disclosure relates to the technical field of voice recognition, and in particular, to a voice control method, device, equipment and medium.
  • control instructions corresponding to each interactive interface are pre-stored in the electronic device, and the user can control the operation of each interactive interface of the electronic device by uttering these control instructions.
  • however, due to the limited number of pre-stored control instructions, the user cannot fully realize voice control of each interactive interface based on these control instructions alone.
  • the present disclosure provides a voice control method, device, equipment and medium.
  • the present disclosure provides a voice control method, including:
  • the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface
  • the target control operation corresponding to the target control instruction is executed.
  • a voice control device including:
  • an interface display module configured to display a target interactive interface
  • the instruction loading module is configured to load the target control instruction set corresponding to the target interactive interface, and the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface;
  • the instruction matching module is configured to, in response to receiving the user control speech, query the target control instruction set for a target control instruction that matches the user control speech;
  • the instruction execution module is configured to execute the target control operation corresponding to the target control instruction in response to querying the target control instruction.
  • a voice control device, comprising: a memory storing executable instructions; and a processor;
  • the processor is used to read the executable instructions from the memory, and execute the executable instructions to implement the voice control method described in the first aspect.
  • the present disclosure provides a computer-readable storage medium, the storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements the voice control method described in the first aspect.
  • the present disclosure provides a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the voice control method described in the first aspect is implemented.
  • the voice control method, device, equipment, and medium of the embodiments of the present disclosure can load the target control instruction set corresponding to the target interactive interface after displaying the target interactive interface; then, when the user control voice is received, the target control instruction set can be queried for the target control instruction that matches the received user control voice, and the queried target control instruction can be executed, thereby realizing the user's voice control of the target interactive interface.
  • because the loaded target control instruction set includes control instructions generated according to the interface control data of the target interactive interface, and the interface control data can cover all the interface controls in the target interactive interface, voice control of the entire target interactive interface can be fully realized, achieving a "what you see is what you can say" (Display Can be Said, DCS) effect for the target interactive interface.
  • FIG. 1 is a schematic flowchart of a voice control method provided by an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of a main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure;
  • FIG. 3A is a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure;
  • FIG. 3B is a schematic diagram of an application interface of another vehicle-mounted terminal provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic flowchart of a processing process of dynamic control data provided by an embodiment of the present disclosure;
  • FIG. 5 is a schematic flowchart of another voice control method provided by an embodiment of the present disclosure;
  • FIG. 6 is a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure;
  • FIG. 7 is a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • control instructions corresponding to each interactive interface are pre-stored in the electronic device, and the user can control the operation of each interactive interface of the electronic device by uttering these control instructions.
  • for example, the control command may be a wake-up word;
  • each interactive interface may be registered with a fixed number of wake-up words, and the user may satisfy voice control requirements for each interactive interface by speaking these wake-up words.
  • control instructions need to be set in advance, which means that the wake-up words on each interactive interface need to be designed in advance. If the content on the interactive interface is dynamically loaded, then due to the limited number of pre-stored control instructions, users will not be able to fully realize voice control of each interactive interface based on these instructions.
  • in addition, wake-up models based on wake-up words are often small, and a single model cannot support a large number of complex wake-up words in one scene. If too many wake-up words are registered on an interactive interface, it will also cause false wake-up problems in the wake-up model.
  • the embodiments of the present disclosure provide a voice control method, device, equipment, and medium capable of realizing "see-to-talk".
  • the voice control method provided by the embodiment of the present disclosure will first be described below with reference to FIG. 1 to FIG. 5 .
  • the voice control method may be executed by an electronic device.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, and other devices with a voice control function.
  • Fig. 1 shows a schematic flowchart of a voice control method provided by an embodiment of the present disclosure.
  • the voice control method may include the following steps S110 to S140.
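  • As an illustration of the overall flow of steps S110 to S140, a minimal sketch follows; the class, method, and store names in it are hypothetical assumptions, not identifiers from the present disclosure:

```python
# Minimal sketch of steps S110-S140. All names here (VoiceController,
# instruction_store, transcribe, ...) are illustrative assumptions.

class VoiceController:
    def __init__(self, asr_engine, instruction_store):
        self.asr = asr_engine            # offline ASR engine (assumed interface)
        self.store = instruction_store   # per-interface control instruction sets
        self.active_set = {}             # set loaded for the displayed interface

    def show_interface(self, interface):
        """S110 + S120: display the target interface, then load its instruction set."""
        interface.render()
        self.active_set = self.store.load(interface.id)

    def on_user_speech(self, audio):
        """S130 + S140: recognize the control voice and match it against the set."""
        text = self.asr.transcribe(audio)
        for instruction_text, operation in self.active_set.items():
            if instruction_text == text:  # simplistic exact match; S130 refines this
                operation()               # execute the target control operation
                return
        # No match: keep listening for the next user control voice.
```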
  • the target interaction interface may be an interface visually displayed through a display screen of the electronic device.
  • At least one interface control may be displayed in the target interaction interface.
  • the interface controls may be controls that can be manipulated by the user, such as buttons, options, icons, or links in the interface.
  • the target interaction interface may include a main interface displayed when the electronic device is turned on or in a standby state.
  • the interface control displayed on the target interaction interface may be an icon of an application program.
  • Fig. 2 shows a schematic diagram of a main interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • the vehicle-mounted terminal can display a main interface 201, and multiple interface controls can be displayed in the main interface 201, such as a "setting application" icon 202, a "file application" icon 203, a "browser application" icon 204, and a "music application" icon 205.
  • the target interaction interface may include an application interface of any application program installed in the electronic device.
  • the interface controls displayed in the target interaction interface may be buttons, options, icons or links in the application interface.
  • Fig. 3A shows a schematic diagram of an application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • the vehicle-mounted terminal may display an application main interface 301 of the setting application program, and multiple interface controls may be displayed on the application interface 301, such as a "My Device" button 302, a "Control Center" button 303, and a "More Settings" button 304.
  • Fig. 3B shows a schematic diagram of another application interface of a vehicle-mounted terminal provided by an embodiment of the present disclosure.
  • the vehicle-mounted terminal can display an application main interface 305 of a music application program, and multiple interface controls can be displayed in the application interface 305, such as a "Daily Recommendation" option 306, a "Song List" option 307, a "Local Music" option 308, a "Settings" option 309, various playback control buttons 310, various playlist links 311, and a "Back" button 312.
  • the target control instruction set includes control instructions generated according to the interface control data of the target interactive interface.
  • after the target interaction interface is displayed, the target control instruction set corresponding to it can be loaded, and each control instruction in the target control instruction set is generated according to the interface control data of the target interaction interface.
  • the interface control data may include control data corresponding to all interface controls, that is, all interface controls in the target interactive interface have corresponding control instructions.
  • the target control instruction set may be a set of control instructions corresponding to various interface controls in the target interactive interface.
  • control instruction may include a first control instruction generated according to static control data in the interface control data.
  • the first control instruction is a control instruction of the static control.
  • the static control data may be control data corresponding to static controls in the target interactive interface.
  • the static control may be an interface control that is always fixed and displayed, that is, the static control will not change according to user preferences or settings.
  • the static controls can be interface controls that come with the device when it leaves the factory, and cannot be dynamically updated or changed by the user.
  • in FIG. 2, the "setting application" icon 202, the "file application" icon 203, and the "browser application" icon 204 belong to the static controls of the main interface 201, and the control instructions generated based on the control data corresponding to these icons are the first control instructions corresponding to the main interface 201.
  • alternatively, static controls can be interface controls that are fixedly displayed within the interface frame and will not change according to user preferences, such as built-in resources preset in the interface project; these built-in resources can be displayed in the application interface, so the interface content can be perceived in advance before being pushed to the user.
  • in FIG. 3A, the "My Device" button 302, the "Control Center" button 303, and the "More Settings" button 304 corresponding to these setting functions belong to the static controls of the application main interface 301, and the control instructions generated based on the control data corresponding to these buttons are the first control instructions corresponding to the application main interface 301.
  • control instruction may further include a second control instruction generated according to the dynamic control data in the interface control data.
  • the second control instruction is a control instruction of the dynamic control.
  • the dynamic control data may be control data corresponding to the dynamic controls in the target interactive interface.
  • the dynamic control may be an interface control that can be dynamically updated or changed with user preferences or settings.
  • the dynamic control may be an interface control added by the user.
  • in FIG. 2, the "music application" icon 205 corresponding to the music application belongs to the dynamic controls of the main interface 201, and the control instruction generated based on the control data corresponding to the "music application" icon 205 is the second control instruction corresponding to the main interface 201.
  • the dynamic control may be an interface control that will be dynamically updated within the interface framework, such as a resource that is filled after retrieval based on a network source.
  • in FIG. 3B, the playlist links 311 corresponding to these playlist names belong to the dynamic controls of the application main interface 305, and the control instruction generated based on the control data corresponding to each playlist link 311 is the second control instruction corresponding to the application main interface 305.
  • the dynamic control may also be an interface control within the interface frame that changes according to the user's preference, and details are not described here.
  • the target control instruction set corresponding to the loaded target interaction interface can be used to control static controls in the target interaction interface, and can also be used to control dynamic controls in the target interaction interface.
  • the electronic device can load the target control instruction set corresponding to the target interaction interface, so that each interface control in the target interaction interface has a corresponding control instruction, so that the user can perform voice control on the target interaction interface more comprehensively.
  • after the electronic device loads the target control instruction set, it can monitor the user control voice, and after receiving the user control voice, search the control instructions of the target control instruction set for a target control instruction matching the user control voice.
  • S130 may specifically include: converting the user's control voice into the target voice text; querying the target control instruction set for a target control instruction that matches the target voice text.
  • specifically, the electronic device can input the user control voice into an offline automatic speech recognition (ASR) engine to obtain the target voice text output by the ASR engine, and then query the control instructions of the target control instruction set for a target control instruction that matches the target voice text.
  • the matching of the target control instruction and the target speech text may be that the target speech text contains any verb and any control text participle in the target control instruction, or that the verb in the target speech text is the same as a verb in the target control instruction and the similarity between the nouns in the target speech text and any control text participle in the target control instruction is greater than or equal to a preset similarity threshold.
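  • A hedged sketch of this matching rule is given below; the tokenization and the similarity function are stand-ins chosen for illustration (a production system for Chinese text would use a proper word segmenter and a stronger similarity measure):

```python
# Illustrative matching of a target speech text against one control
# instruction's verb set and control text participle set.

from difflib import SequenceMatcher

def similar(a: str, b: str) -> float:
    # Placeholder string similarity in [0, 1]; an assumption, not the
    # disclosure's measure.
    return SequenceMatcher(None, a, b).ratio()

def matches(speech_text: str, verbs: set[str], participles: set[str],
            threshold: float = 0.8) -> bool:
    # Rule 1: the speech text contains any verb and any control text participle.
    if any(v in speech_text for v in verbs) and any(p in speech_text for p in participles):
        return True
    # Rule 2: a word of the speech text equals a verb, and some word is
    # similar enough to a participle (stand-in for the verb/noun analysis).
    words = speech_text.split()  # naive tokenization, assumed for illustration
    verb_ok = any(w in verbs for w in words)
    noun_ok = any(similar(w, p) >= threshold for w in words for p in participles)
    return verb_ok and noun_ok

# Example: "open the music" matches verbs {"open"} with participles {"music"}.
print(matches("open the music", {"open"}, {"music"}))  # True
```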
  • the electronic device can determine the user's voice control intention by querying the target control command matching the user's control voice in the target control command set.
  • in response to finding the matching target control instruction, the electronic device may execute the target control operation corresponding to the target control instruction; if no matching target control instruction is found, the detection of the user's voice can be continued while waiting for the next user control voice.
  • S140 may specifically include: performing a target control operation for the target interface control involved in the target control instruction.
  • in the embodiments of the present disclosure, each control instruction can be used to trigger the execution of the target control operation on the target interface control involved in the control instruction, that is, on the target interface control to which the control data that generated the control instruction belongs.
  • the target control operation may be a control operation implemented in a target control manner indicated by the target control instruction.
  • the electronic device may perform a control operation on the target interface control to which the control data generating the control instruction belongs according to the target control mode indicated by the target control instruction.
  • after the target control operation is executed, the electronic device may enter a new interaction interface, or may remain in the target interaction interface.
  • the electronic device when the electronic device remains in the target interaction interface, the electronic device does not need to reload the control instruction set, and can continue to implement the user's voice control of the target interaction interface based on the target control instruction set.
  • when the electronic device enters a new interactive interface, the electronic device needs to reload the control instruction set corresponding to the new interactive interface, so as to implement the user's voice control of the new interactive interface based on the reloaded control instruction set.
  • for example, when the user controls the electronic device to perform the "open the music application" control operation, the electronic device enters the application main interface of the music application; therefore, after jumping from the main interface 201 to the application main interface of the music application, the electronic device needs to reacquire the control instruction set corresponding to the application main interface of the music application, so as to realize the user's voice control of that interface based on the reloaded control instruction set.
  • for another example, when the user controls the electronic device to perform the "play daily recommendation" control operation, the electronic device can directly play the daily recommended songs in the application main interface 305 of the music application without jumping to another interface; therefore, without reloading the control instruction set, the user can continue to voice-control the application main interface 305 based on the control instruction set corresponding to the application main interface 305.
  • in summary, after the target interactive interface is displayed, the target control instruction set corresponding to the target interaction interface can be loaded; then, when the user control voice is received, the target control instruction set can be queried for the target control instruction that matches the received user control voice, and the queried target control instruction can be executed, thereby realizing the user's voice control of the target interactive interface.
  • because the loaded target control instruction set includes control instructions generated according to the interface control data of the target interactive interface, and the interface control data can cover all interface controls in the target interactive interface, all voice control of the target interactive interface can be fully realized, the DCS effect is achieved for the target interactive interface, and the user experience is improved.
  • the electronic device may directly acquire the pre-generated first control instruction.
  • S120 may specifically include: determining the target application to which the target interactive interface belongs; querying the control instruction set corresponding to the target application among multiple pre-stored preset control instruction sets; and extracting the first control instruction from the control instruction set corresponding to the target application.
  • specifically, multiple preset control instruction sets may be pre-stored in the electronic device, and each preset control instruction set may correspond to an application program, that is, each preset control instruction set may contain the control instructions of all static controls involved in that application program.
  • the target application may be an application program to which the target interactive interface belongs.
  • the electronic device may use the application program that needs to be run when displaying the target interaction interface as the target application to which the target interaction interface belongs.
  • the electronic device may first receive the preset control instruction set sent by the server.
  • the server may receive the control instructions of all static controls corresponding to each interactive interface of the application program and the control mode corresponding to each control instruction input by the developer.
  • the control instruction of each static control includes a verb set and a control text word segmentation set corresponding to the static control; the control text word segmentation set is a set of word segments extracted from the static control text, the static control text may be the control name of the static control that can be seen by the user, and the verb set in the control instruction includes multiple verbs with similar semantics.
  • in some embodiments, the server may extract the control text word segmentation set from the control data of the static control, that is, from the static control text in the static control data, and then obtain multiple control instructions of the static control by combining different pre-set verb sets with the control text segmentation set, where the verb set in each control instruction includes multiple verbs with similar semantics.
  • the server can also determine the control mode corresponding to each control instruction according to the verb set in the control instruction and the control function of the static control corresponding to the control text participle set.
  • each participle in the control text participle set can be connected by "|", so that participle set content conforming to the Extended Backus-Naur Form (EBNF) grammatical paradigm can be obtained and the first control instruction can be loaded into the language model of the grammar (Grammar) engine.
  • for example, for the "music application" icon, the verb set can be "open" together with other semantically similar verbs, and the participle set may be "music" together with other related word segments.
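  • A small sketch of how such instruction content could be assembled into "|"-joined EBNF-style alternations is shown below; the concrete verb and participle values are assumptions used only to make the output concrete:

```python
# Illustrative EBNF-style rule construction from a verb set and a control
# text participle set; the concrete words are assumed examples.

def alternation(items):
    """Join word segments with '|' as in the EBNF grammatical paradigm."""
    return "(" + "|".join(items) + ")"

def grammar_rule(name, verbs, participles):
    return f"{name} = {alternation(verbs)} {alternation(participles)} ;"

verbs = ["open", "launch", "start"]            # semantically similar verbs (assumed)
participles = ["music", "music application"]   # control text participles (assumed)
print(grammar_rule("open_music", verbs, participles))
# open_music = (open|launch|start) (music|music application) ;
```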
  • since the static controls in the target interaction interface may be updated due to reasons such as version upgrades, the preset control instruction set stored in the electronic device also needs to be updated to ensure that the user can voice-control all static controls in the updated target interactive interface.
  • the voice control method may further include: detecting an instruction set version of the control instruction set corresponding to the target application.
  • the electronic device may detect the instruction set version of the control instruction set corresponding to the target application, and obtain the version number of the control instruction set corresponding to the target application.
  • extracting the first control instruction from the control instruction set corresponding to the target application may specifically include: in response to detecting that the version of the instruction set is the latest version, extracting the first control instruction from the control instruction set corresponding to the target application.
  • specifically, the electronic device can determine whether the instruction set version of the control instruction set corresponding to the target application is the latest version by judging whether the detected version number is the latest version number. If the electronic device determines that the version number is the latest version number, it can determine that the instruction set version is the latest version; in this case, there is no need to update the control instruction set corresponding to the target application, and the first control instruction corresponding to the target interactive interface can be extracted directly from the control instruction set corresponding to the target application.
  • the voice control method may further include: in response to detecting that the instruction set version is not the latest version, downloading from the server the control instruction set to be updated corresponding to the target application; replacing the control instruction set corresponding to the target application with the control instruction set to be updated; and extracting the first control instruction from the control instruction set to be updated.
  • if the electronic device determines that the version number is not the latest version number, it can determine that the instruction set version is not the latest version.
  • in this case, the control instruction set corresponding to the target application needs to be updated. The electronic device can send a control instruction set update request for the target application to the server, so that the server, in response to receiving the update request, feeds back to the electronic device the latest version of the control instruction set corresponding to the target application, that is, the control instruction set to be updated. The electronic device downloads the control instruction set to be updated from the server, uses it to replace the control instruction set corresponding to the target application (deleting the control instruction set that is not the latest version), and then extracts the first control instruction corresponding to the target interaction interface from the control instruction set to be updated, that is, from the new control instruction set corresponding to the target application.
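  • The version check and update flow described above can be sketched as follows; the server endpoints and the local store interface are assumptions, not an API defined by the disclosure:

```python
# Hedged sketch of the S120 refinement: check the instruction set version,
# update from the server when it is stale, then extract the first control
# instructions. `local_store` and `server` are assumed interfaces.

def load_first_instructions(app_id, local_store, server):
    local_set = local_store.get(app_id)              # pre-stored preset set
    latest = server.latest_version(app_id)           # assumed server endpoint
    if local_set is None or local_set.version != latest:
        # Download the control instruction set to be updated and replace
        # the non-latest version with it.
        local_set = server.download_instruction_set(app_id)
        local_store.replace(app_id, local_set)
    return local_set.instructions                    # first control instructions
```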
  • in this way, the control instructions of all static controls can be sorted out in advance for each interactive interface of each application program, the control instructions of all static controls corresponding to all installed application programs can then be pre-stored in the electronic device, and these control instructions can be used as static preset content to realize fast loading of the first control instructions of the target interactive interface.
  • when the control instructions include a second control instruction generated according to the dynamic control data in the interface control data, the electronic device may generate the second control instruction according to the dynamic control data.
  • loading the target control instruction set corresponding to the target interaction interface may specifically include: processing the dynamic control data to generate the second control instruction.
  • the dynamic control is a control formed by filling control data in a dynamic content reserved field.
  • Fig. 4 shows a schematic flowchart of a processing procedure of dynamic control data provided by an embodiment of the present disclosure.
  • the processing process of the dynamic control data may include the following steps S410 to S430.
  • each static control data can belong to a static control, and each dynamic control data can belong to a dynamic control.
  • specifically, the electronic device may extract, from the dynamic control data corresponding to the target interaction interface, the dynamic control text of the dynamic control to which the dynamic control data belongs; the dynamic control text may be the control name of the dynamic control that can be seen by the user.
  • in FIG. 3B, the playlist links 311 belong to the dynamic controls of the application main interface 305. Taking the playlist link 311 named "Ambient Piano Music as Soul and Endless Void Dialogue" as an example, its dynamic control text is "Ambient Piano Music as Soul and Endless Void Dialogue".
  • the electronic device may perform word segmentation processing on the dynamic control text to obtain a word segment set corresponding to the dynamic control text, that is, a control text word segment set of the dynamic control.
  • specifically, the electronic device can use any word segmentation algorithm to split the dynamic control text into multiple control text word segments, then combine any number of adjacent control text word segments to obtain multiple word segment combinations, and finally obtain a word segment set corresponding to the dynamic control text, which includes the multiple control text word segments and the multiple word segment combinations.
  • the method of combining the multiple control text word segments and the multiple word segment combinations into a word segment set may include connecting the multiple control text word segments and the multiple word segment combinations with "|", so that participle set content conforming to the EBNF grammatical paradigm can be obtained and the generated second control instruction can be loaded into the language model of the Grammar engine.
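  • The segmentation and adjacent-segment combination can be sketched as below; the pre-split token list stands in for a real Chinese word segmenter, which is outside the scope of this sketch:

```python
# Illustrative S410-S430 word segment handling: enumerate every contiguous
# run of adjacent segments, then join the results with '|'.

def segment_runs(tokens):
    """Return all single segments plus all combinations of adjacent segments."""
    runs = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            runs.append("".join(tokens[i:j]))  # Chinese segments join without spaces
    return runs

tokens = ["每日", "推荐"]   # assumed segmenter output ("daily", "recommendation")
print("|".join(segment_runs(tokens)))
# 每日|每日推荐|推荐
```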
  • the electronic device may generate the second control instruction according to the word segmentation set based on a preset control instruction generation manner.
  • S430 may specifically include: generating a second control instruction according to a preset verb set and participle set.
  • the electronic device can combine different pre-set verb sets and word segmentation sets to obtain multiple control instructions of the dynamic control, and the verb set in each control instruction includes multiple verbs with similar semantics.
  • the electronic device can also determine the control mode corresponding to each control instruction according to the verb set in the control instruction and the control function of the dynamic control corresponding to the participle set.
  • the voice control method may further include: preprocessing the dynamic control text.
  • after the electronic device extracts the dynamic control text, and before performing word segmentation processing on the dynamic control text to obtain the corresponding word segmentation set, it can also preprocess the dynamic control text to obtain the preprocessed dynamic control text.
  • the preprocessing may include symbol removal processing and numeral conversion processing.
  • symbol removal processing can be used to remove symbols in the dynamic control text, such as punctuation marks, special symbols, mathematical symbols, and any other symbols that carry no semantic meaning.
  • numeral conversion processing can be used to convert Arabic numerals in the dynamic control text into Chinese numerals; if an Arabic number has more than two digits, the entire number can be converted into a single Chinese numeral, or each digit can be converted into a Chinese numeral respectively.
  • specifically, the electronic device may first remove the symbols in the dynamic control text to obtain the symbol-removed dynamic control text, and then convert the Arabic numerals in the symbol-removed dynamic control text into Chinese numerals.
  • for example, the Arabic number "200" can be converted as a whole into the Chinese numeral for "two hundred", or digit by digit into the Chinese numerals for "two zero zero"; after numeral conversion, the dynamic control text becomes "the Chinese class representative's private collection cheat sheet of two hundred idiom song titles".
  • the electronic device can then perform word segmentation and word segment combination on the converted dynamic control text to obtain the participle set corresponding to the converted dynamic control text.
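  • A minimal sketch of this preprocessing, assuming a digit-by-digit conversion table (a whole-number reading such as "two hundred" would need a fuller converter), is:

```python
# Illustrative preprocessing: symbol removal followed by numeral conversion.

import re

DIGITS = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
          "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}

def strip_symbols(text: str) -> str:
    # Drop punctuation and other non-semantic symbols; keep letters,
    # digits, and CJK characters.
    return re.sub(r"[^\w\u4e00-\u9fff]", "", text)

def digits_to_chinese(text: str) -> str:
    # Digit-by-digit conversion, e.g. "200" -> "二零零".
    return re.sub(r"\d", lambda m: DIGITS[m.group()], text)

def preprocess(text: str) -> str:
    return digits_to_chinese(strip_symbols(text))

print(preprocess("私藏200首!"))  # assumed example input -> 私藏二零零首
```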
  • in this way, after the electronic device displays the target interaction interface, it can generate control instructions for all the dynamic controls in the target interaction interface, that is, the second control instructions, based on the data of each dynamic control in the target interaction interface, and then use these control instructions as dynamically loaded content to realize reliable and efficient loading of the second control instructions of the target interactive interface.
  • Fig. 5 shows a schematic flowchart of another voice control method provided by an embodiment of the present disclosure.
  • the voice control method may include the following steps S510 to S560.
  • an electronic device with a voice control function may display a target interaction interface, so that the user may perform voice control on the target interaction interface.
  • the electronic device may initialize the ASR engine, and load the language model whose instruction content is empty. Then, load the target control instruction set corresponding to the target interactive interface into the language model. During the start-up and initialization of the ASR engine and the loading of the target control instruction set, the electronic device does not receive user voice.
  • the electronic device needs to load the first control instruction and the second control instruction in the target control instruction set into the language model.
  • specifically, the electronic device can first determine whether the application to which the preloaded language model belongs is the target application to which the target interactive interface belongs; if so, it loads the target control instruction set into the language model; if not, it reloads an empty language model corresponding to the target application, and then loads the target control instruction set into the reloaded language model.
  • for the first control instruction, the electronic device may first determine the target application to which the target interactive interface belongs, query the control instruction set corresponding to the target application among the multiple preset control instruction sets, and then determine whether the instruction set version of the control instruction set corresponding to the target application is the latest version. If the instruction set version is the latest version, there is no need to update the control instruction set corresponding to the target application, and the first control instruction corresponding to the target interactive interface can be extracted directly from it; if the instruction set version is not the latest version, the control instruction set corresponding to the target application needs to be updated first, and the first control instruction corresponding to the target interactive interface is then extracted from the control instruction set to be updated. After acquiring the first control instruction, the electronic device can load it into the language model.
  • the electronic device may acquire dynamic control data corresponding to all dynamic controls in the target interaction interface.
  • for the second control instruction, the electronic device can extract the dynamic control text of each dynamic control from the dynamic control data, remove the symbols in the dynamic control text and convert the Arabic numerals in the dynamic control text into Chinese numerals to obtain the preprocessed dynamic control text, then perform word segmentation on the preprocessed dynamic control text to obtain the corresponding word segmentation set, and finally generate the second control instruction according to the preset verb set and the word segmentation set.
  • the second control instruction may be loaded into the language model.
  • in some embodiments, the electronic device may also convert the first control instruction and the second control instruction into binary codes and load them into the language model.
  • in S530, the electronic device may wait for the user to input voice. If the start of a human voice is recognized based on Voice Activity Detection (VAD), the recording continues; if the end of the human voice is recognized based on VAD, the recording stops. The electronic device can use the recorded audio as the user control voice, and then input the user control voice into the ASR engine to obtain the target voice text corresponding to the user control voice.
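  • As an illustration, a simple energy-based VAD gate is sketched below; the disclosure only states that VAD detects the start and end of the human voice, so the thresholding scheme here is an assumption:

```python
# Hedged sketch of VAD-gated recording: collect audio frames from the
# detected start of speech until the detected end.

import audioop  # stdlib helper for PCM audio (deprecated in newer Python)

def record_user_control_voice(frames, threshold=500):
    """frames: iterable of 16-bit PCM byte chunks; returns the voiced audio."""
    recording, voiced = [], False
    for frame in frames:
        energy = audioop.rms(frame, 2)   # RMS energy of the frame
        if energy >= threshold:
            voiced = True                # start (or continuation) of human voice
            recording.append(frame)
        elif voiced:
            break                        # end of human voice: stop recording
    return b"".join(recording)
```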
  • the electronic device may search the target control command set for the target control command matching the target voice text.
  • the electronic device may determine whether the target control instruction is found, and if the target control instruction is found, execute S560, otherwise return to execute S530.
  • the electronic device may perform a control operation on the target interface control to which the control data generating the control command belongs according to the target control mode indicated by the target control command.
  • since the verb set and the participle set are used to generate the control instructions, voice control of thousands of utterances can be supported in a single interface; and since the control instructions include both the static control instructions generated based on the static controls of the interface and the dynamic control instructions generated based on the dynamic controls in the interactive interface, on the basis of supporting a sufficiently large grammar, the control instructions can also be arbitrarily expanded to achieve a "what you see is what you can say" effect for the interactive interface.
  • the loading process of the control instruction and the recognition process of the user's control voice are independent of each other and do not interfere with each other, which can improve the accuracy of recognition.
  • both the ASR engine and the Grammar engine are offline engines, which can run on the terminal side (that is, run in the electronic device) without relying on the network.
  • the model of the engine is small enough and requires little computing power, so that the control commands that need to be supported in the interactive interface can be responded to faster (on average about 1.2 s faster than cloud recognition results and about 500 ms faster than offline general recognition results), which can bring higher benefits in vehicle application scenarios.
  • Fig. 6 shows a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • the apparatus shown in FIG. 6 may be applied to electronic equipment.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, and other devices with a voice control function.
  • the voice control device 600 may include an interface display module 610 , an instruction loading module 620 , an instruction matching module 630 and an instruction execution module 640 .
  • the interface display module 610 may be configured to display a target interaction interface.
  • the instruction loading module 620 may be configured to load a target control instruction set corresponding to the target interactive interface, where the target control instruction set includes control instructions generated according to interface control data of the target interactive interface.
  • the instruction matching module 630 may be configured to, in response to receiving the user control speech, query the target control instruction set for the target control instruction matching the user control speech.
  • the instruction executing module 640 may be configured to execute the target control operation corresponding to the target control instruction in response to querying the target control instruction.
  • with the voice control device of the embodiments of the present disclosure, the target control instruction set corresponding to the target interaction interface can be loaded; then, when the user control voice is received, the target control instruction set can be queried for the target control instruction that matches the received user control voice, and the queried target control instruction can be executed, thereby realizing the user's voice control of the target interactive interface.
  • because the loaded target control instruction set includes control instructions generated according to the interface control data of the target interactive interface, and the interface control data can cover all the interface controls in the target interactive interface, all voice control of the target interactive interface can be fully realized, thereby achieving the "what you see is what you can say" effect for the target interactive interface and improving the user experience.
  • control instruction may include a first control instruction generated according to static control data in the interface control data.
  • in some embodiments, the instruction loading module 620 may further include an application determination unit, a first query unit, and a first extraction unit.
  • the application determining unit may be configured to determine the target application to which the target interactive interface belongs.
  • the first query unit may be configured to query the control instruction set corresponding to the target application among the multiple preset control instruction sets stored in advance.
  • the first extraction unit may be configured to extract the first control instruction from the control instruction set corresponding to the target application.
  • in some embodiments, the instruction loading module 620 may further include a version detection unit, which may be configured to detect the instruction set version of the control instruction set corresponding to the target application before the first control instruction is extracted.
  • accordingly, the first extraction unit may be further configured to extract the first control instruction from the control instruction set corresponding to the target application when the version detection unit detects that the instruction set version is the latest version.
  • in some embodiments, the instruction loading module 620 may further include an instruction set downloading unit, a first processing unit, and a second extraction unit.
  • the instruction set downloading unit may be configured to download the control instruction set to be updated corresponding to the target application from the server when the instruction set version of the control instruction set corresponding to the target application is detected not to be the latest version.
  • the first processing unit may be configured to replace the control instruction set corresponding to the target application with the control instruction set to be updated.
  • the second extraction unit may be configured to extract the first control instruction from the control instruction set to be updated.
  • control instruction may include a second control instruction generated according to the dynamic control data in the interface control data.
  • in some embodiments, the instruction loading module 620 may further include a third extraction unit, a second processing unit, and an instruction generation unit.
  • the third extracting unit may be configured to extract dynamic control text from the dynamic control data.
  • the second processing unit may be configured to perform word segmentation processing on the dynamic control text to obtain a word segmentation set corresponding to the dynamic control text.
  • the instruction generation unit may be configured to generate a second control instruction according to the word segmentation set.
  • in some embodiments, the instruction loading module 620 may further include a third processing unit, and the third processing unit may be configured to preprocess the dynamic control text before the word segmentation processing is performed on the dynamic control text to obtain the corresponding word segmentation set, wherein the preprocessing includes symbol removal processing and numeral conversion processing.
  • the instruction generation unit may be further configured to generate a second control instruction according to a preset verb set and participle set.
  • in some embodiments, the instruction matching module 630 may include a text conversion unit and a second query unit.
  • the text converting unit may be configured to convert the user control speech into target speech text.
  • the second query unit may be configured to query the target control instruction set for the target control instruction that matches the target voice text.
  • the instruction executing module 640 may be further configured to execute a target control operation for the target interface control involved in the target control command.
  • the voice control device 600 shown in FIG. 6 can execute each step in the method embodiments shown in FIG. 1 to FIG. 5 and realize the corresponding processes and effects, which will not be described again here.
  • Fig. 7 shows a schematic structural diagram of a voice control device provided by an embodiment of the present disclosure.
  • the voice control device shown in FIG. 7 may be an electronic device.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle terminal, a wearable electronic device, a smart home device, and other devices with a voice control function.
  • the voice control device may include a processor 701 and a memory 702 storing computer program instructions.
  • the processor 701 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present disclosure.
  • Memory 702 may include mass storage for information or instructions.
  • in one example, the memory 702 may include a hard disk drive (HDD), a floppy disk drive, a flash memory, an optical disk, a magneto-optical disk, a magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these.
  • the memory 702 may include removable or non-removable (or fixed) media, where appropriate.
  • the memory 702 may be internal or external to the electronic device, where appropriate.
  • in a particular embodiment, the memory 702 is a non-volatile solid-state memory.
  • the memory 702 includes a read-only memory (Read-Only Memory, ROM).
  • where appropriate, the ROM can be a mask-programmed ROM, a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), an electrically alterable ROM (EAROM), or a flash memory, or a combination of two or more of these.
  • the processor 701 reads and executes the computer program instructions stored in the memory 702 to execute the steps of the voice control method provided by the embodiments of the present disclosure.
  • the voice control device may further include a transceiver 703 and a bus 704 .
  • the processor 701, the memory 702, and the transceiver 703 are connected through the bus 704 and communicate with one another.
  • Bus 704 includes hardware, software, or both.
  • by way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or another graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these.
  • Bus 704 may comprise one or more buses, where appropriate.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium can store a computer program, and when the computer program is executed by the processor, the processor implements the voice control method provided by the embodiment of the present disclosure.
  • the above-mentioned storage medium may include, for example, the memory 702 storing computer program instructions, and the above-mentioned instructions can be executed by the processor 701 of the voice control device to complete the voice control method provided by the embodiments of the present disclosure.
  • optionally, the storage medium can be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • An embodiment of the present disclosure also provides a computer program product, including a computer program or instructions; when the computer program or instructions are executed by a processor, the voice control method provided by the embodiments of the present disclosure is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to a voice control device, apparatus and method, and a medium. The method comprises: displaying a target interaction interface (S110); loading a target control instruction set corresponding to the target interaction interface, the target control instruction set comprising control instructions generated according to interface control data of the target interaction interface (S120); when user control speech is received, querying the target control instruction set for a target control instruction that matches the user control speech (S130); and, if the target control instruction is found, executing a target control operation corresponding to the target control instruction (S140).
PCT/CN2022/117090 2021-09-14 2022-09-05 Voice control device, apparatus and method, and medium WO2023040692A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111084298.5A CN115810354A (zh) 2021-09-14 2021-09-14 Voice control method, apparatus, device and medium
CN202111084298.5 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023040692A1 true WO2023040692A1 (fr) 2023-03-23

Family

ID=85482069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117090 WO2023040692A1 (fr) 2021-09-14 2022-09-05 Voice control device, apparatus and method, and medium

Country Status (2)

Country Link
CN (1) CN115810354A (fr)
WO (1) WO2023040692A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872362A (zh) * 2010-06-25 2010-10-27 大陆汽车亚太管理(上海)有限公司 Dynamic voice tag information query system and information query method thereof
CN108108142A (zh) * 2017-12-14 2018-06-01 广东欧珀移动通信有限公司 Voice information processing method and apparatus, terminal device, and storage medium
WO2021027267A1 (fr) * 2019-08-15 2021-02-18 华为技术有限公司 Speech interaction method and apparatus, terminal, and storage medium
CN112825030A (zh) * 2020-02-28 2021-05-21 腾讯科技(深圳)有限公司 Application program control method, apparatus, device, and storage medium
CN111833868A (zh) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Voice assistant control method and apparatus, and computer-readable storage medium
CN112295220A (zh) * 2020-10-29 2021-02-02 北京字节跳动网络技术有限公司 AR game control method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN115810354A (zh) 2023-03-17

Similar Documents

Publication Publication Date Title
US11176141B2 (en) Preserving emotion of user input
US11194448B2 (en) Apparatus for vision and language-assisted smartphone task automation and method thereof
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
  • JP5948671B2 (ja) Multimedia information retrieval method and electronic device
US10496276B2 (en) Quick tasks for on-screen keyboards
  • WO2018045646A1 (fr) Artificial-intelligence-based method and device for human-machine interaction
  • CN109817210B (zh) Voice writing method and apparatus, terminal, and storage medium
US11238858B2 (en) Speech interactive method and device
AU2017216520A1 (en) Common data repository for improving transactional efficiencies of user interactions with a computing device
  • RU2733816C1 (ru) Speech information processing method, device, and information storage medium
US11630825B2 (en) Method and system for enhanced search term suggestion
  • WO2022135474A1 (fr) Information recommendation method and apparatus, and electronic device
  • KR102140391B1 (ko) Search method and electronic device applying the same
  • WO2022105754A1 (fr) Character input method and apparatus, and electronic device
  • EP3149926B1 (fr) System and method for supporting a spoken user request
  • CN110825840A (zh) Word bank expansion method, apparatus, device, and storage medium
  • WO2023040692A1 (fr) Voice control device, apparatus and method, and medium
US20130179165A1 (en) Dynamic presentation aid
  • CN111858966A (zh) Knowledge graph updating method and apparatus, terminal device, and readable storage medium
  • CN111079422A (zh) Keyword extraction method, apparatus, and storage medium
  • CN114402384A (zh) Data processing method and apparatus, server, and storage medium
  • CN107168627B (zh) Text editing method and apparatus for a touch screen
US20200105249A1 (en) Custom temporal blacklisting of commands from a listening device
US20140181672A1 (en) Information processing method and electronic apparatus
  • CN113360127B (zh) Audio playback method and electronic device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22869063

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE