WO2022000828A1 - Voice control method, device, and storage medium for applet - Google Patents

Voice control method, device, and storage medium for applet

Info

Publication number
WO2022000828A1
WO2022000828A1 PCT/CN2020/117498 CN2020117498W WO2022000828A1 WO 2022000828 A1 WO2022000828 A1 WO 2022000828A1 CN 2020117498 W CN2020117498 W CN 2020117498W WO 2022000828 A1 WO2022000828 A1 WO 2022000828A1
Authority
WO
WIPO (PCT)
Prior art keywords
applet
voice
target applet
target
information
Prior art date
Application number
PCT/CN2020/117498
Other languages
English (en)
French (fr)
Inventor
史南胜
谢马林
季林峰
曹姣
Original Assignee
Baidu Online Network Technology (Beijing) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology (Beijing) Co., Ltd.
Priority to KR1020217020655A priority Critical patent/KR20210091328A/ko
Priority to EP20943669.0A priority patent/EP4170650A1/en
Priority to JP2022520806A priority patent/JP7373063B2/ja
Priority to US17/357,660 priority patent/US11984120B2/en
Publication of WO2022000828A1 publication Critical patent/WO2022000828A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44568 Immediately runnable code
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • the embodiments of the present application relate to voice technology in computer technology, and in particular, to a method, device, and storage medium for voice control of an applet.
  • the interaction between the existing intelligent voice device and the applet is inconvenient to operate.
  • intelligent hardware devices such as car screens and TV screens
  • interaction through touch is particularly inconvenient; in particular, operating in-vehicle applications while driving may create safety hazards.
  • Touching the screen interrupts voice interaction and fragments the user's attention; this fragmented interaction and inconvenience make users likely to quit or give up halfway. The workflow is also too long: to find a favorite applet, the user must first go through the applet center. Together, these factors make for a very poor user experience.
  • the present application provides a voice control method, device and storage medium for an applet, so as to realize the voice control of the applet on an intelligent voice device, improve the convenience of user interaction with the applet, and thus improve the interactive experience.
  • a voice control method for an applet is provided, applied to an intelligent voice device on which a voice interaction system and a target applet are configured, and the method includes:
  • the target applet receives the intention information transmitted by the voice interaction system, wherein the intention information is obtained after the voice interaction system performs speech recognition and intention analysis on the voice control instruction of the target applet sent by the user;
  • the target applet converts the intention information into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
  • a voice control method for an applet which is applied to an intelligent voice device, where a voice interaction system and a target applet are configured on the intelligent voice device, and the method includes:
  • the voice interaction system obtains the voice control instruction of the target applet sent by the user
  • the voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information
  • the voice interaction system transmits the intention information to the target applet, so that the target applet converts the intention information into a control instruction executable by the thread of the target applet and executes it.
  • a voice control method for an applet is provided, which is applied to an intelligent voice device, and the intelligent voice device is configured with a voice interaction system and a target applet, and the method includes:
  • the voice interaction system performs voice recognition and intent analysis on the voice control instructions, obtains intent information, and transmits the intent information to the target applet;
  • the intent information is received by the target applet, and the intent information is converted into a control instruction executable by the thread of the target applet, and the control instruction is executed by the thread of the target applet.
  • an intelligent voice device on which a voice interaction system and a target applet are configured; wherein the voice interaction system includes:
  • an acquisition module configured to acquire the user's voice control instruction for the target applet in the intelligent voice device through the voice acquisition device;
  • a voice processing module configured to perform voice recognition and intent analysis on the voice control instruction, obtain intent information, and transmit the intent information to the target applet;
  • the target applet includes:
  • a receiving module configured to receive the intent information
  • an instruction conversion module configured to convert the intent information into a control instruction executable by the thread of the target applet
  • an execution module configured to execute the control instruction through the thread of the target applet.
  • an electronic device comprising:
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method described in the first, second, or third aspect.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method described in the first, second, or third aspect.
  • a user's voice control instruction for a target applet in an intelligent voice device is obtained through a voice acquisition device; the voice interaction system performs speech recognition and intent analysis on the voice control instruction to obtain intent information, and transmits the intent information to the target applet; the target applet receives the intent information, converts it into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
  • the embodiments of the present application can realize voice control of the target applet and improve the convenience of the interaction process, avoiding the unsustainable voice interaction and divided attention caused by having to interact with the applet by touch. This improves the user's experience of using applets, and also provides strong support for the distribution and usage of applets on smart voice devices.
  • FIG. 1 is a diagram of a scenario in which the voice control method for an applet according to an embodiment of the present application can be implemented
  • FIG. 2 is a flowchart of a voice control method for an applet provided according to an embodiment of the present application
  • FIG. 3 is a flowchart of a voice control method for an applet provided according to another embodiment of the present application.
  • FIG. 4 is a flowchart of a voice control method for an applet provided according to another embodiment of the present application.
  • FIG. 5 is a flowchart of a voice control method for an applet provided according to another embodiment of the present application.
  • FIG. 6 is a flowchart of a voice control method for an applet provided according to another embodiment of the present application.
  • FIG. 7 is a flowchart of a voice control method for an applet provided according to another embodiment of the present application.
  • FIG. 8 is a block diagram of an intelligent voice device provided according to an embodiment of the present application.
  • FIG. 9 is a block diagram of an electronic device used to implement the voice control method of the applet according to the embodiment of the present application.
  • existing smart voice devices, especially those with screens such as screen speakers and car screens, mainly rely on manual activation for applet distribution, and the user interacts with the applet mainly through manual touch. For example, an information or video applet on a screen speaker must be selected from the manually triggered applet center before it can be called up, and is then browsed and watched by manually turning pages up and down, tapping to play, and so on.
  • the present application provides a voice control method for an applet, applied to voice technology within computer technology, which realizes voice control of the applet through interaction between the voice interaction system and the applet framework. This improves the convenience of the interaction process, avoids the unsustainable voice interaction and distraction caused by having to interact with the applet by touch, improves the user experience of using applets, and also provides strong support for the distribution and usage of applets on smart voice devices.
  • the present application can be applied to a smart voice device with a screen, and of course it can also be applied to a smart voice device without a screen.
  • the applet can still have the function of being activated and controlled by manual touch.
  • the embodiment of the present application is applied to the scenario shown in FIG. 1 .
  • the intelligent voice device 100 is configured with a voice interaction system 110 and a target applet 120. The voice interaction system 110 performs speech recognition and intent analysis on the user's voice control command for the target applet, as collected by the voice collection device, obtains intent information, and transmits the intent information to the target applet 120. The target applet 120 receives the intent information, converts it into a control instruction that can be executed by the thread of the target applet 120, and the thread of the target applet 120 executes the control instruction.
  • FIG. 2 is a flowchart of the voice control method for an applet provided by an embodiment of the present application.
  • the execution body may be an intelligent voice device, on which a voice interaction system and a target applet are configured; as shown in FIG. 2, the specific steps of the voice control method for the applet are as follows:
  • when the user needs to control the target applet on the smart voice device, the user can issue a voice control command.
  • For example, if the user wants the video applet A on the smart speaker to play variety show B, the user can issue the voice control command "I want to watch variety show B of the video applet A", which can then be collected by a voice acquisition device such as the smart speaker's microphone.
  • the voice interaction system performs voice recognition and intent analysis on the voice control instruction, obtains intent information, and transmits the intent information to the target applet.
  • After the voice acquisition device collects the voice control instruction for the target applet, the voice interaction system of the intelligent voice device performs speech recognition and intent analysis on it. Speech recognition converts the collected voice control instruction into text the machine understands, and intent analysis parses and processes that text to extract the key information. For the voice control instruction above, the keywords "applet", "video applet A", and "variety show B" can be parsed out, so the user's intent information is obtained. The target applet itself does not need to deploy speech recognition or intent analysis functions.
  • After acquiring the intent information, the voice interaction system only needs to transmit it to the target applet, that is, to the thread of the target applet.
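As an illustrative sketch of the speech-recognition and intent-analysis step described above, the following TypeScript models a toy intent analyzer. The `IntentInfo` shape, the `analyzeIntent` function, and the regular expression are assumptions for illustration only, not the patent's actual schema:

```typescript
// Hypothetical intent schema; all field names are illustrative assumptions.
interface IntentInfo {
  appletId: string;              // which applet the user addressed
  operationType: string;         // e.g. "play"
  slots: Record<string, string>; // parsed key information
}

// Toy intent analysis: extract the applet name and the requested resource
// from the recognized text of the voice control instruction.
function analyzeIntent(recognizedText: string): IntentInfo | null {
  const m = recognizedText.match(/watch (.+) of (?:the )?video applet (\w+)/);
  if (!m) return null;
  return {
    appletId: `video-applet-${m[2]}`,
    operationType: "play",
    slots: { resource: m[1] },
  };
}

// The example instruction from the text yields applet A and variety show B.
const intent = analyzeIntent("I want to watch variety show B of the video applet A");
```

In a real system this step would be a full NLU pipeline rather than a regular expression; the point is only that the voice interaction system, not the applet, produces the structured intent.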
  • the intent information is received by the target applet, and the intent information is converted into a control instruction executable by the thread of the target applet, and the control instruction is executed by the thread of the target applet.
  • The intent information obtained by the voice interaction system may not be directly understandable or executable by the applet. Therefore, when the target applet has already been activated on the intelligent voice device and receives the intent information transmitted by the voice interaction system, it converts the intent information into control instructions that the thread of the target applet can execute, and the thread then executes them, realizing the user's intended function. For example, suppose the video applet A has already been activated on the smart speaker. After video applet A receives the intent information indicating that the user wants to play variety show B, it converts that information into resource-library search and playback instructions that video applet A can execute, and then executes them to play variety show B in video applet A.
  • If the target applet has not yet been activated, the voice interaction system needs to first activate the target applet, after which the target applet performs the above process.
  • In summary, a user's voice control instruction for a target applet in an intelligent voice device is obtained through a voice acquisition device; the voice interaction system performs speech recognition and intent analysis on the voice control instruction, obtains intent information, and transmits it to the target applet; the target applet receives the intent information, converts it into a control instruction that the thread of the target applet can execute, and the thread executes the control instruction.
  • Voice control of the target applet can thus be realized, improving the convenience of the interaction process and avoiding the unsustainable voice interaction and divided attention caused by interacting with the applet by touch. This improves the user's experience of using applets, and also provides strong support for the distribution and usage of applets on smart voice devices.
  • The acquisition of the intent information by the voice interaction system described in S202 may specifically include: the voice interaction system generating the intent information from the intent analysis result according to a preset protocol.
  • When the target applet described in S203 converts the intent information into a control instruction executable by the thread of the target applet, this may specifically include:
  • the target applet determines a predetermined conversion rule according to the intention information, and converts the intention information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule.
  • Specifically, the voice interaction system can convert the intent parsing result into intent information satisfying a preset protocol. The preset protocol may include a protocol header and a protocol content part: the protocol header may include, but is not limited to, the namespace of the instruction, the instruction name, and the encryption type (if the protocol is embedded in the system, encrypted transmission is not required); the protocol content may include, but is not limited to, the intended operation information and the target applet information, where the operation information may include the operation type and extension information (a supplement to the operation type), and the target applet information may include the applet identifier of the target applet;
  • the operation type is "fast-forward"
  • the applet ID is the ID of the video applet A
  • the extended information is the fast-forward speed, such as 2x fast-forward speed.
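The preset protocol described above can be sketched as a message type. The concrete field names (`namespace`, `encryption`, and so on) are illustrative assumptions, not the patent's actual wire format:

```typescript
// Protocol header: instruction namespace, instruction name, encryption type.
interface ProtocolHeader {
  namespace: string;                // namespace of the instruction
  name: string;                     // instruction name
  encryption: "none" | "encrypted"; // "none" when embedded in the system
}

// Operation info: the operation type plus optional extension information.
interface OperationInfo {
  type: string;       // e.g. "fast-forward"
  extension?: string; // supplement to the type, e.g. the speed
}

// Protocol content: operation info and the target applet's identifier.
interface ProtocolContent {
  operation: OperationInfo;
  appletId: string;
}

interface IntentMessage {
  header: ProtocolHeader;
  content: ProtocolContent;
}

// The fast-forward example from the text: 2x speed on video applet A.
const fastForward: IntentMessage = {
  header: { namespace: "applet.control", name: "FastForward", encryption: "none" },
  content: {
    operation: { type: "fast-forward", extension: "2x" },
    appletId: "video-applet-A",
  },
};
```

Splitting header from content lets the applet pick a conversion rule from the header alone before interpreting the operation details.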
  • After receiving the intent information satisfying the preset protocol transmitted by the voice interaction system, the target applet converts the intent information into control instructions that the target applet can execute.
  • For example, if the target applet is based on the SwanJS (Smart Web Application Native JavaScript) architecture, SwanJS being the core of an Internet enterprise's applet framework, then the intent information satisfying the preset protocol needs to be converted into SwanJS event commands.
  • Specifically, the target applet obtains the preset conversion rule corresponding to the protocol header according to the protocol header in the intent information, where the preset conversion rule includes the correspondence between preset operation information and preset control instructions.
  • According to this correspondence, the target applet converts the protocol content part of the intent information into a control instruction that the thread of the target applet can execute. That is, all control instructions the target applet can execute, together with the correspondence between each preset control instruction and a preset operation type, can be preset in advance; the protocol content part is then matched against this correspondence to determine which control instruction it maps to.
  • In addition, SwanJS only implements the general function of each interface; the developer can override the command execution function through the interface. For example, for fast-forward, SwanJS only provides an interface implementing the general fast-forward function, and the specific fast-forward behavior can be configured by the developer according to the specific scenario.
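A minimal sketch of this override pattern follows, with invented names (`CommandRegistry`, `register`, `execute`): the framework registers a general handler for each preset operation type, and a developer replaces it for a specific scene. Real SwanJS APIs are not modeled here; this only illustrates mapping operation types to executable handlers:

```typescript
// A handler turns an operation's extension info into an executed action;
// here it returns a string describing what was done.
type Handler = (extension?: string) => string;

class CommandRegistry {
  private handlers = new Map<string, Handler>();

  // Later registrations override earlier ones, so a developer-supplied
  // handler replaces the framework's general default.
  register(operationType: string, handler: Handler): void {
    this.handlers.set(operationType, handler);
  }

  execute(operationType: string, extension?: string): string {
    const h = this.handlers.get(operationType);
    if (!h) throw new Error(`no handler for operation: ${operationType}`);
    return h(extension);
  }
}

const registry = new CommandRegistry();
// General default provided by the framework.
registry.register("fast-forward", () => "fast-forward by default step");
// Developer override for a specific scene: honor the requested speed.
registry.register("fast-forward", (ext) => `fast-forward at ${ext ?? "1x"}`);

const result = registry.execute("fast-forward", "2x");
```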
  • the specific method may further include:
  • when the target applet determines that the control instruction needs to interact with the server, the target applet controls the thread of the target applet to temporarily store the relevant content that needs to be interacted with, and uploads that content to the server asynchronously.
  • Specifically, a control instruction that needs to interact with the server may not need to do so immediately. Through the target applet's instruction scheduling process, the relevant content that needs to interact with the server is temporarily stored locally on the smart voice device and then uploaded to the server asynchronously.
  • For example, if the user wants to favorite a video, the favorite needs to be recorded on the server so that the user can still see it the next time the target applet is called up, or when the applet is called up on another device. The target applet can temporarily store the favorite-related content locally and then upload it asynchronously.
  • The upload to the server can be performed in a silent, idle-time mode, so as to avoid frequent interaction between the target applet and the server while the user is interacting, ensuring sufficient bandwidth during user interaction and improving the user experience.
  • a synchronous uploading method can also be used when receiving a control instruction that needs to interact with the server.
  • this embodiment is compatible with asynchronous and synchronous uploading, and can be selected according to specific scenarios such as network environment.
  • After obtaining the control instruction, determine whether it needs to interact with the server. If no interaction is required, the control instruction can be executed directly locally. If interaction is required, determine whether local temporary storage is needed: if not, upload directly to the server; if so, temporarily store the content locally and then upload it to the server asynchronously.
  • The asynchronous upload method also ensures that the same user has a consistent experience when using the same target applet on different devices, such as different smart speakers: the user's favorites, browsing history, comments, likes, order and purchase records, and other data for the same target applet can be viewed on any of them.
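The decision flow above can be sketched as a routing function. The `ControlInstruction` flags and the three route names are invented for illustration; they are not part of the patent's protocol:

```typescript
// Flags describing a control instruction's server needs (illustrative).
interface ControlInstruction {
  name: string;
  needsServer: boolean;    // must this instruction reach the server?
  needsTempStore: boolean; // can the upload be deferred via local storage?
}

type Route = "execute-locally" | "upload-directly" | "stash-then-async-upload";

// Mirror of the three-way decision: local execution, direct upload,
// or local temporary storage followed by asynchronous upload.
function routeInstruction(ins: ControlInstruction): Route {
  if (!ins.needsServer) return "execute-locally";
  return ins.needsTempStore ? "stash-then-async-upload" : "upload-directly";
}
```

A "favorite" instruction, for instance, needs the server eventually but not immediately, so it takes the stash-then-async-upload route.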
  • the method may further include:
  • the voice interaction system performs speech recognition and intent analysis on the voice call-up instruction, determines the target applet to be called up according to the intent analysis result, and calls up the target applet.
  • S201-S203 describe voice control of the target applet after it has been activated on the intelligent voice device; before voice control of the target applet, the target applet needs to be called up.
  • the call-up process can also be carried out by voice control, that is, the user's voice call-up instruction to the target applet is obtained through a voice acquisition device.
  • For example, the user issues the voice call-up command "start video applet A"; speech recognition and intent analysis are then performed through the voice interaction system to determine that the user's intent is to call up the target applet "video applet A", and the target applet can then be called up.
  • When the voice interaction system described in S402 calls up the target applet, this may specifically include:
  • If the applet package of the target applet already exists on the intelligent voice device, the target applet can be called up directly; otherwise, if the target applet does not exist on the intelligent voice device, the applet package of the target applet needs to be obtained from the server, and then the target applet is called up.
  • When obtaining the applet package of the target applet from the server as described in S502, this may specifically include:
  • if the intelligent voice device does not support voice interaction with the applet, obtaining a partial applet package of the target applet from the server, wherein the partial applet package does not load the modules related to voice interaction.
  • Specifically, when the voice interaction system obtains the applet package SDK (software development kit) of the target applet from the server, it can first determine whether the intelligent voice device has the capability to support voice interaction with the applet. If voice interaction with the applet is supported, the full SDK applet package of the target applet can be obtained from the server. If the smart voice device does not support voice interaction with the applet, a partial SDK applet package of the target applet can be obtained instead. Relative to the full SDK package, the partial package does not load the modules related to voice interaction, meaning the target applet lacks the ability to receive intent information and convert it into control instructions its thread can execute; this reduces the size of the applet package, reduces the traffic consumed during loading, and speeds up calling up the target applet.
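The call-up flow of S402/S502 can be sketched as follows. The `Device` shape and `resolvePackage` function are assumptions for illustration; no real Baidu or SwanJS API is modeled:

```typescript
type PackageKind = "full-sdk" | "partial-sdk";

// Minimal device model: which applet packages are already installed,
// and whether the device can host applet voice interaction.
interface Device {
  localPackages: Set<string>;
  supportsAppletVoice: boolean;
}

// If the package exists locally, call up directly; otherwise choose
// which package variant to download based on device capability.
function resolvePackage(device: Device, appletId: string): "local" | PackageKind {
  if (device.localPackages.has(appletId)) return "local";
  return device.supportsAppletVoice ? "full-sdk" : "partial-sdk";
}
```

The partial-SDK branch corresponds to the smaller package without voice-interaction modules described above.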
  • determining the target applet to be called up according to the result of the intention analysis includes:
  • S701: when the voice interaction system determines that the intent analysis result includes the target applet to be called up and the resource information requested by the user, it searches the resource library of the target applet for a target resource corresponding to the resource information;
  • S702: if the target resource does not exist, obtain other applets that can provide the target resource, and recommend them to the user as alternative target applets.
  • the voice call-up instruction for the target applet sent by the user may specifically include the name of the target applet and the requested resource information.
  • the aforementioned voice command can also be used as a voice call-up command.
  • Through speech recognition and intent analysis, it can be determined that the target applet to be called up is video applet A and that the resource requested by the user is variety show B. The server can then be searched to check whether the resource exists in video applet A's resource library; if the resource exists, video applet A is called up.
  • Otherwise, a fuzzy search can be performed based on the requested resource to determine which applet has the resource; that applet is then determined as the target applet and called up.
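S701/S702 can be sketched with an invented in-memory resource library; the data, the `findApplet` name, and the exact-match "fuzzy" fallback are illustrative only:

```typescript
// Toy resource libraries keyed by applet id (invented data).
const libraries: Record<string, string[]> = {
  "video-applet-A": ["variety show C"],
  "video-applet-B": ["variety show B"],
};

// Prefer the applet named by the user; if its library lacks the resource,
// fall back to searching other applets and flag the result as an alternative
// recommendation.
function findApplet(
  requested: string,
  preferred: string,
): { appletId: string; alternative: boolean } | null {
  if ((libraries[preferred] ?? []).includes(requested)) {
    return { appletId: preferred, alternative: false };
  }
  for (const [id, resources] of Object.entries(libraries)) {
    if (resources.includes(requested)) return { appletId: id, alternative: true };
  }
  return null;
}
```

A production system would use fuzzy string matching against the server-side resource index rather than exact `includes` checks.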
  • the voice control method for the applet may further include:
  • the voice interaction system periodically acquires a predetermined number of popular applet packages from the server, and caches them.
  • Specifically, the voice interaction system can periodically access the server to obtain information about popular applets, select a predetermined number of popular applets according to that information, and obtain and cache the applet packages of these popular applets from the server, so that when the user needs to call up a popular applet, it can be called up quickly.
  • the server can also periodically push information about popular applets to the voice interaction system, and then the voice interaction system selects a predetermined number of popular applets according to the information about popular applets.
  • The number of popular applets to cache may be determined according to the storage space of the intelligent voice device, and which popular applets to cache may be determined based on factors such as applet download volume and user interests.
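One possible caching policy consistent with the paragraph above is sketched here. The scoring formula (downloads weighted by interest) and the storage budget are assumptions; the patent only names the factors, not how to combine them:

```typescript
// Candidate popular applet with the ranking factors named in the text
// plus an assumed package size for the storage-budget check.
interface PopularApplet {
  id: string;
  downloads: number;
  interest: number; // user-interest weight (assumed scale)
  sizeMb: number;
}

// Rank by downloads * interest, then greedily fill the storage budget.
function selectForCache(applets: PopularApplet[], budgetMb: number): string[] {
  const ranked = [...applets].sort(
    (a, b) => b.downloads * b.interest - a.downloads * a.interest,
  );
  const chosen: string[] = [];
  let used = 0;
  for (const a of ranked) {
    if (used + a.sizeMb <= budgetMb) {
      chosen.push(a.id);
      used += a.sizeMb;
    }
  }
  return chosen;
}
```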
  • The voice control method for an applet provided in this embodiment realizes voice control of the target applet through interaction between the voice interaction system and the target applet framework. Calling up the target applet by voice improves the convenience of invoking applets, simplifies the touch-based process of finding and opening an applet, improves the user experience of using applets, and also provides strong support for the distribution and usage of applets on smart voice devices.
  • An embodiment of the present application provides a voice control method for an applet, which is applied to an intelligent voice device, where a voice interaction system and a target applet are configured on the intelligent voice device, and the method includes:
  • the target applet receives the intention information transmitted by the voice interaction system, wherein the intention information is obtained after the voice interaction system performs speech recognition and intention analysis on the voice control instruction of the target applet sent by the user;
  • the target applet converts the intention information into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
  • the intent information is generated by the voice interaction system from the intent analysis result according to a preset protocol
  • the target applet converts the intent information into control instructions executable by the thread of the target applet, including:
  • the target applet determines a predetermined conversion rule according to the intention information, and converts the intention information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule.
  • the preset protocol includes a protocol header and a protocol content, wherein the protocol content includes operation information and target applet information corresponding to the intention;
  • the target applet determines a predetermined conversion rule according to the intention information, and converts the intention information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule, including:
  • the target applet acquires, according to the protocol header in the intent information, the preset conversion rule corresponding to the protocol header, wherein the preset conversion rule includes the correspondence between preset operation information and preset control instructions;
  • the target applet converts the protocol content part of the intention information into a control instruction executable by the thread of the target applet according to the correspondence between the preset operation information and the preset control instructions.
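The preset-rule conversion described above can be sketched as follows. This is an illustrative assumption, not the actual protocol: the shape of `IntentInfo`, the field names, and the instruction strings in `presetRules` are all hypothetical, standing in for whatever the real protocol content and control instructions look like.

```typescript
// Hypothetical shape of the intent information under the preset protocol:
// a header plus a content part carrying operation info and the applet id.
interface IntentInfo {
  header: { namespace: string; name: string; encryption: string };
  content: { operation: string; extra?: Record<string, string>; appletId: string };
}

// Preset correspondence between operation information and control instructions.
const presetRules: Record<string, (extra?: Record<string, string>) => string> = {
  play: () => "player.play()",
  fastForward: (extra) => `player.seek(speed=${extra?.speed ?? "2"})`,
};

// Convert protocol content into an instruction the applet thread can execute;
// operations with no preset rule yield null and are simply not executed.
function toControlInstruction(intent: IntentInfo): string | null {
  const rule = presetRules[intent.content.operation];
  return rule ? rule(intent.content.extra) : null;
}
```

For example, a fast-forward intent with extension information `{ speed: "2" }` would map to the instruction string `player.seek(speed=2)` under these assumed rules.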
  • the execution of the control instruction by the thread of the target applet includes:
  • if the target applet determines that the control instruction needs to interact with the server, the target applet controls the thread of the target applet to temporarily store the relevant content that needs to be interacted locally, and uploads the relevant content to the server asynchronously.
  • before the target applet receives the intent information transmitted by the voice interaction system, the method further includes:
  • the target applet is called up by the voice interaction system according to the voice calling instruction of the target applet issued by the user.
  • the smart voice device is a smart speaker.
  • the voice control method of the applet provided in this embodiment is the process performed by the target applet of the intelligent voice device in the above embodiment; for its specific implementation and technical effects, refer to the above embodiment, which will not be repeated here.
  • An embodiment of the present application provides a voice control method for an applet, which is applied to an intelligent voice device, where a voice interaction system and a target applet are configured on the intelligent voice device, and the method includes:
  • the voice interaction system obtains the voice control instruction of the target applet sent by the user
  • the voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information
  • the voice interaction system transmits the intention information to the target applet, so that the target applet converts the intention information into a control instruction executable by the thread of the target applet and executes it.
  • the obtaining intention information includes:
  • the voice interaction system generates the intent information from the intent parsing result according to a preset protocol, wherein the preset protocol includes a protocol header and protocol content, and the protocol content includes the operation information corresponding to the intent and the target applet information.
  • before the voice interaction system acquires the voice control instruction for the target applet issued by the user, the method further includes:
  • the voice interaction system obtains the voice call-up instruction for the target applet issued by the user;
  • the voice interaction system performs voice recognition and intent analysis on the voice call-up instruction, determines the target applet to be called up according to the intent analysis result, and calls up the target applet.
  • the calling up of the target applet includes:
  • if the voice interaction system determines that the target applet does not exist in the intelligent voice device, obtaining the applet package of the target applet from the server and calling up the target applet; or
  • if the voice interaction system determines that the target applet already exists in the intelligent voice device, directly calling up the target applet.
  • the obtaining the applet package of the target applet from the server includes:
  • if the voice interaction system determines that the intelligent voice device supports voice interaction with the applet, obtaining the full applet package of the target applet from the server; or
  • if the voice interaction system determines that the intelligent voice device does not support voice interaction with the applet, obtaining a partial applet package of the target applet from the server, wherein the partial applet package does not load the modules related to voice interaction with the applet.
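The full-versus-partial package choice can be sketched as below. The `"full"`/`"partial"` request kinds and the `"voice-interaction"` module name are illustrative assumptions based on the description, not a real server API.

```typescript
// Sketch of selecting which applet package to request from the server,
// based on whether the device supports voice interaction with applets.
interface PackageRequest {
  appletId: string;
  kind: "full" | "partial";
  excludedModules: string[];
}

function choosePackage(appletId: string, deviceSupportsVoice: boolean): PackageRequest {
  return deviceSupportsVoice
    ? { appletId, kind: "full", excludedModules: [] }
    // Smaller download and faster call-up when voice interaction is unsupported.
    : { appletId, kind: "partial", excludedModules: ["voice-interaction"] };
}
```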
  • determining the target applet to be called up according to the intent analysis result includes:
  • if the voice interaction system determines that the intent analysis result includes the target applet to be called up and the resource information requested by the user, it searches the resource library of the target applet to see whether there is a target resource corresponding to the resource information; if no such target resource exists, other applets capable of providing the target resource are obtained and recommended to the user as candidate target applets.
  • the method further includes:
  • the voice interaction system periodically obtains the applet packages of a predetermined number of popular applets from the server, and caches them.
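The periodic popular-applet caching step can be sketched as below. `fetchPopular` stands in for the server query (or the server's push of popular-applet information), and `limit` models the bound imposed by the device's storage space; all names are illustrative assumptions.

```typescript
// Sketch of the periodic "cache the top-N popular applets" behaviour.
interface PopularApplet {
  appletId: string;
  downloads: number;
}

function refreshPopularCache(
  fetchPopular: () => PopularApplet[],
  cache: Set<string>,
  limit: number,
): string[] {
  // Rank by a popularity signal (download volume here) and keep the top N.
  const top = fetchPopular()
    .slice()
    .sort((a, b) => b.downloads - a.downloads)
    .slice(0, limit);
  for (const { appletId } of top) cache.add(appletId); // pre-cache the package
  return top.map((p) => p.appletId);
}
```

A scheduler would call `refreshPopularCache` periodically so that a popular applet can be called up quickly without first downloading its package.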
  • the smart voice device is a smart speaker.
  • the voice control method of the applet provided in this embodiment is the process performed by the voice interaction system of the intelligent voice device in the above embodiment; for its specific implementation and technical effects, refer to the above embodiment, which will not be repeated here.
  • FIG. 8 is a structural diagram of the intelligent voice device provided by an embodiment of the present application.
  • the intelligent voice device 800 is configured with a voice interaction system 810 and a target applet 820 thereon.
  • the voice interaction system 810 may include: an acquisition module 811 and a voice processing module 812;
  • An acquisition module 811 configured to acquire the user's voice control instruction for the target applet in the intelligent voice device through the voice acquisition device;
  • a voice processing module 812 configured to perform voice recognition and intent analysis on the voice control instruction, obtain intent information, and transmit the intent information to the target applet;
  • the speech processing module 812 may specifically include a speech recognition sub-module and an intent parsing sub-module.
  • the target applet 820 may include: a receiving module 821, an instruction converting module 822 and an executing module 823;
  • a receiving module 821 configured to receive the intention information
  • an instruction conversion module 822 configured to convert the intent information into a control instruction executable by the thread of the target applet
  • the execution module 823 is configured to execute the control instruction through the thread of the target applet.
  • when the speech processing module 812 acquires the intention information, it is configured to: generate the intention information from the intent parsing result according to the preset protocol;
  • when the instruction conversion module 822 converts the intention information into a control instruction executable by the thread of the target applet, it is configured to:
  • a predetermined conversion rule is determined according to the intention information, and the intention information is converted into a control instruction executable by the thread of the target applet according to the predetermined conversion rule.
  • when the execution module 823 executes the control instruction through the thread of the target applet, it is configured to: judge whether the control instruction needs to interact with the server;
  • if the control instruction needs to interact with the server, control the thread of the target applet to temporarily store the relevant content that needs to be interacted locally, and upload the relevant content to the server asynchronously.
  • the execution module 823 may specifically include a scheduling sub-module and an uploading sub-module.
  • the acquisition module 811 is further configured to: acquire, through the voice acquisition device, the user's voice call-up instruction for the target applet;
  • the voice interaction system 810 also includes an applet calling module 813 for:
  • perform voice recognition and intent analysis on the voice call-up instruction, determine the target applet to be called up according to the intent analysis result, and call up the target applet.
  • if the target applet does not exist in the intelligent voice device, obtain the applet package of the target applet from the server and call up the target applet; or
  • if the target applet already exists in the intelligent voice device, directly call up the target applet.
  • the applet calling module 813 obtains the applet package of the target applet from the server, it is used to:
  • if the intelligent voice device supports voice interaction with the applet, obtain the full applet package of the target applet from the server; or
  • if the intelligent voice device does not support voice interaction with the applet, obtain a partial applet package of the target applet from the server, wherein the partial applet package does not load the modules related to voice interaction.
  • the applet calling module 813 determines the target applet to be called up according to the intent analysis result, it is used to:
  • if the intent parsing result includes the target applet to be called up and the resource information requested by the user, search the resource library of the target applet to see whether there is a target resource corresponding to the resource information; if no such target resource exists, obtain other applets capable of providing the target resource and recommend them to the user as candidate target applets.
  • the applet calling module 813 is also used for:
  • a predetermined number of popular applet packages are periodically obtained from the server and cached.
  • the intelligent voice device provided in this embodiment may be specifically used to execute the method embodiments provided in the above figures, and specific functions will not be repeated here.
  • In the intelligent voice device provided in this embodiment, the user's voice control instruction for the target applet in the intelligent voice device is acquired through the voice acquisition device; the voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information and transmits the intent information to the target applet; the target applet receives the intent information and converts it into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
  • Through the interaction between the voice interaction system and the target applet framework, voice control of the target applet can be realized, the convenience of the interaction process is improved, and the attention split caused by unsustainable voice interaction when the applet has to be operated by touch is avoided, which improves the user's experience of using applets and provides strong support for the distribution and usage of applets on smart voice devices.
  • the present application further provides an electronic device and a readable storage medium.
  • FIG. 9 is a block diagram of an electronic device for the voice control method of the applet according to an embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the application described and/or claimed herein.
  • the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are interconnected using different buses and may be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface.
  • multiple processors and/or multiple buses may be used with multiple memories, if desired.
  • multiple electronic devices may be connected, each providing some of the necessary operations (eg, as a server array, a group of blade servers, or a multiprocessor system).
  • a processor 901 is taken as an example in FIG. 9 .
  • the memory 902 is the non-transitory computer-readable storage medium provided by the present application.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the voice control method of the applet provided by the present application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause the computer to execute the voice control method of the applet provided by the present application.
  • the memory 902 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the voice control method of the applet in the embodiments of the present application (for example, the modules shown in FIG. 8).
  • the processor 901 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 902, that is, to implement the voice control method of the applet in the above method embodiments.
  • the memory 902 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function, and the storage data area may store data created according to the use of the electronic device for the voice control method of the applet, and the like. Additionally, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memory located remotely relative to the processor 901, and these remote memories may be connected through a network to the electronic device for the voice control method of the applet. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the electronic device for the voice control method of the applet may further include: an input device 903 and an output device 904 .
  • the processor 901 , the memory 902 , the input device 903 and the output device 904 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 9 .
  • the input device 903 can receive input numerical or character information, and generate key signal input related to user settings and function control of the electronic device for the voice control method of the applet, such as a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, or other input devices.
  • Output devices 904 may include display devices, auxiliary lighting devices (eg, LEDs), haptic feedback devices (eg, vibration motors), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the systems and techniques described herein may be implemented on a computer having a display device (eg, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (eg, a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (eg, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (eg, as a data server), or a computing system that includes middleware components (eg, an application server), or a computing system that includes front-end components (eg, a user's computer having a graphical user interface or web browser through which the user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
  • a computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the user's voice control instruction for the target applet in the intelligent voice device is obtained through the voice acquisition device; the voice interaction system performs speech recognition and intent analysis on the voice control instruction to obtain intent information and transmits the intent information to the target applet; the target applet receives the intent information and converts it into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
  • Through the interaction between the voice interaction system and the target applet framework, voice control of the target applet can be realized, the convenience of the interaction process is improved, and the attention split caused by unsustainable voice interaction when the applet has to be operated by touch is avoided, which improves the user's experience of using applets and provides strong support for the distribution and usage of applets on smart voice devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses a voice control method, device and storage medium for an applet, relating to voice technology in computer technology. The specific implementation scheme is: acquiring, through a voice acquisition device, the user's voice control instruction for a target applet in an intelligent voice device; performing voice recognition and intent analysis on the voice control instruction by a voice interaction system to obtain intent information, and transmitting the intent information to the target applet; receiving the intent information by the target applet, converting the intent information into a control instruction executable by the thread of the target applet, and executing the control instruction by the thread of the target applet. Through the interaction between the voice interaction system and the target applet framework, the embodiments of the present application realize voice control of the target applet, improve the convenience of the interaction process, improve the user's experience of using applets, and provide strong support for the distribution and usage of applets on intelligent voice devices.

Description

Voice control method, device and storage medium for applet
This application claims priority to the Chinese patent application No. 202010605375.6, filed with the Chinese Patent Office on June 29, 2020 and entitled "Voice Control Method, Device and Storage Medium for Applet", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to voice technology in computer technology, and in particular, to a voice control method, device and storage medium for an applet.
Background
With the rapid development of artificial intelligence and applets, various applications and products built around applets in intelligent software and hardware devices have been attracting attention. Various intelligent hardware products provide users with more services by embedding and distributing applets.
At present, many intelligent voice devices that support voice interaction, such as smart speakers with screens and in-vehicle screens, can also use applets, but the distribution of applets on them still mainly relies on manual invocation, and the user's interaction with an applet is also mainly through manual touch. For example, a news/video applet on a smart speaker with a screen can only be called up after manually triggering the applet center and making a selection, and can only be browsed and watched through manual operations such as paging up and down and tapping to play.
The existing way of interacting with applets on intelligent voice devices is inconvenient to operate. For intelligent hardware devices such as in-vehicle screens and television screens, interaction by touch is especially inconvenient; in particular, operating in-vehicle applications while driving may cause safety hazards. Touching the screen also makes voice interaction unsustainable and splits the user's attention, and the fragmentation and inconvenience of the interaction easily cause users to quit midway or abandon the applet. The process is also too long; for example, a favorited applet can only be reached through the applet center. For the above reasons, the user experience is extremely poor.
Summary
The present application provides a voice control method, device and storage medium for an applet, so as to realize voice control of applets on an intelligent voice device and improve the convenience of the user's interaction with applets, thereby improving the interaction experience.
According to a first aspect of the present application, a voice control method for an applet is provided, which is applied to an intelligent voice device on which a voice interaction system and a target applet are configured, the method including:
receiving, by the target applet, intent information transmitted by the voice interaction system, where the intent information is obtained after the voice interaction system performs voice recognition and intent analysis on the user's voice control instruction for the target applet;
converting, by the target applet, the intent information into a control instruction executable by the thread of the target applet, and executing the control instruction by the thread of the target applet.
According to a second aspect of the present application, a voice control method for an applet is provided, which is applied to an intelligent voice device on which a voice interaction system and a target applet are configured, the method including:
acquiring, by the voice interaction system, the user's voice control instruction for the target applet;
performing, by the voice interaction system, voice recognition and intent analysis on the voice control instruction to obtain intent information;
transmitting, by the voice interaction system, the intent information to the target applet, so that the target applet converts the intent information into a control instruction executable by the thread of the target applet and executes it.
According to a third aspect of the present application, a voice control method for an applet is provided, which is applied to an intelligent voice device on which a voice interaction system and a target applet are configured, the method including:
acquiring, through a voice acquisition device, the user's voice control instruction for the target applet in the intelligent voice device;
performing, by the voice interaction system, voice recognition and intent analysis on the voice control instruction to obtain intent information, and transmitting the intent information to the target applet;
receiving, by the target applet, the intent information, converting the intent information into a control instruction executable by the thread of the target applet, and executing the control instruction by the thread of the target applet.
According to a fourth aspect of the present application, an intelligent voice device is provided, on which a voice interaction system and a target applet are configured, the voice interaction system including:
an acquisition module, configured to acquire, through a voice acquisition device, the user's voice control instruction for the target applet in the intelligent voice device;
a voice processing module, configured to perform voice recognition and intent analysis on the voice control instruction to obtain intent information, and transmit the intent information to the target applet;
the target applet including:
a receiving module, configured to receive the intent information;
an instruction conversion module, configured to convert the intent information into a control instruction executable by the thread of the target applet;
an execution module, configured to execute the control instruction through the thread of the target applet.
According to a fifth aspect of the present application, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the first aspect, the second aspect or the third aspect.
According to a sixth aspect of the present application, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause the computer to perform the method described in the first aspect, the second aspect or the third aspect.
In the voice control method, device and storage medium for an applet provided by the embodiments of the present application, the user's voice control instruction for the target applet in the intelligent voice device is acquired through the voice acquisition device; the voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information, and transmits the intent information to the target applet; the target applet receives the intent information, converts it into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction. Through the interaction between the voice interaction system and the target applet framework, the embodiments of the present application can realize voice control of the target applet, improve the convenience of the interaction process, avoid the attention split caused by unsustainable voice interaction when the applet has to be operated by touch, improve the user's experience of using applets, and provide strong support for the distribution and usage of applets on intelligent voice devices.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present application, nor is it intended to limit the scope of the present application. Other features of the present application will become easy to understand from the following description.
Brief Description of the Drawings
The drawings are used for a better understanding of the present solution and do not constitute a limitation of the present application, where:
FIG. 1 is a scenario diagram in which the voice control method for an applet according to an embodiment of the present application can be implemented;
FIG. 2 is a flowchart of a voice control method for an applet according to an embodiment of the present application;
FIG. 3 is a flowchart of a voice control method for an applet according to another embodiment of the present application;
FIG. 4 is a flowchart of a voice control method for an applet according to another embodiment of the present application;
FIG. 5 is a flowchart of a voice control method for an applet according to another embodiment of the present application;
FIG. 6 is a flowchart of a voice control method for an applet according to another embodiment of the present application;
FIG. 7 is a flowchart of a voice control method for an applet according to another embodiment of the present application;
FIG. 8 is a block diagram of an intelligent voice device according to an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device for implementing the voice control method for an applet according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the drawings, including various details of the embodiments of the present application to facilitate understanding, which should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
In prior-art intelligent voice devices, especially some intelligent voice devices with screens such as smart speakers with screens and in-vehicle screens, the distribution of applets still mainly relies on manual invocation, and the user's interaction with an applet is also mainly through manual touch. For example, a news/video applet on a smart speaker with a screen can only be called up after manually triggering the applet center and making a selection, and can only be browsed and watched through manual operations such as paging up and down and tapping to play. The existing way of interacting with applets on intelligent voice devices is inconvenient to operate. For intelligent hardware devices such as in-vehicle screens and television screens, interaction by touch is especially inconvenient; in particular, operating in-vehicle applications while driving may cause safety hazards. Touching the screen also makes voice interaction unsustainable and splits the user's attention, and the fragmentation and inconvenience of the interaction easily cause users to quit midway or abandon the applet. The process is also too long; for example, a favorited applet can only be reached through the applet center. For the above reasons, the user experience is extremely poor.
In view of the above technical problems in the prior art, the present application provides a voice control method for an applet, applied to voice technology in computer technology. Through the interaction between the voice interaction system and the applet framework, voice control of the target applet is realized, the convenience of the interaction process is improved, the attention split caused by unsustainable voice interaction when the applet has to be operated by touch is avoided, the user's experience of using applets is improved, and strong support is provided for the distribution and usage of applets on intelligent voice devices. The present application can be applied to intelligent voice devices with screens, and of course also to intelligent voice devices without screens. Of course, the applet can still retain the function of being called up and controlled by manual touch.
The embodiments of the present application are applied to the scenario shown in FIG. 1. An intelligent voice device 100 is configured with a voice interaction system 110 and a target applet 120, where the voice interaction system 110 can perform voice recognition and intent analysis on the user's voice control instruction for the target applet collected by a voice acquisition device, obtain intent information, and transmit the intent information to the target applet 120; the target applet 120 receives the intent information, converts it into a control instruction executable by the thread of the target applet 120, and the thread of the target applet 120 executes the control instruction. Through the above interaction between the voice interaction system 110 and the framework of the target applet 120, voice control of the target applet can be realized.
The voice control process of the applet of the present application is described in detail below with reference to specific embodiments and the drawings.
An embodiment of the present application provides a voice control method for an applet. FIG. 2 is a flowchart of the voice control method for an applet provided by this embodiment of the present application. The execution subject may be an intelligent voice device on which a voice interaction system and a target applet are configured. As shown in FIG. 2, the specific steps of the voice control method for an applet are as follows:
S201: Acquire, through a voice acquisition device, the user's voice control instruction for the target applet in the intelligent voice device.
In this embodiment, when the user needs to control the target applet on the intelligent voice device, the user may issue a voice control instruction. For example, if the user wants the already-called-up video applet A on a smart speaker to play variety show B, the user may issue the voice control instruction "I want to watch variety show B on video applet A", which can then be collected by a voice acquisition device on the smart speaker, such as a microphone.
It can be understood that when the user issues a voice control instruction containing video applet A and variety show B, these may refer to a real video applet and a real variety show; they are anonymized here.
S202: The voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information, and transmits the intent information to the target applet.
In this embodiment, since the voice interaction system of an intelligent voice device usually has voice recognition and intent analysis capabilities, after the voice acquisition device collects the voice control instruction for the target applet, the voice recognition and intent analysis functions of the voice interaction system are used to process it. Voice recognition converts the collected voice control instruction into machine-readable text, while intent analysis processes the text and extracts the key information in it. For example, from the above voice control instruction, keywords such as "applet", "video applet A" and "variety show B" can be parsed out, so that the user's intent information can be obtained. The target applet itself does not need to deploy voice recognition and intent analysis functions; after the voice interaction system obtains the intent information, it only needs to transmit the intent information to the target applet, that is, to the thread of the target applet.
S203: The target applet receives the intent information, converts the intent information into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction.
In this embodiment, since different applets may be developed with different frameworks or development languages, which may differ from that of the voice interaction system, the intent information obtained by the voice interaction system may not be directly understood and executed by the applet. Therefore, when the target applet has been called up on the intelligent voice device, after receiving the intent information transmitted by the voice interaction system, the target applet converts the intent information into a control instruction executable by the thread of the target applet, and the thread of the target applet then executes the control instruction, thereby realizing the intent or function desired by the user. For example, when video applet A has been called up on the smart speaker and receives intent information indicating that the user wants to play variety show B, it converts the intent information into resource-library search and playback instructions that video applet A can execute, and then executes them, so that variety show B is played in video applet A.
Of course, if the target applet is not currently called up, the voice interaction system first needs to call up the target applet before the target applet performs the above process.
In the voice control method for an applet provided by this embodiment, the user's voice control instruction for the target applet in the intelligent voice device is acquired through the voice acquisition device; the voice interaction system performs voice recognition and intent analysis on the voice control instruction to obtain intent information and transmits the intent information to the target applet; the target applet receives the intent information, converts it into a control instruction executable by the thread of the target applet, and the thread of the target applet executes the control instruction. Through the interaction between the voice interaction system and the target applet framework, this embodiment can realize voice control of the target applet, improve the convenience of the interaction process, avoid the attention split caused by unsustainable voice interaction when the applet has to be operated by touch, improve the user's experience of using applets, and provide strong support for the distribution and usage of applets on intelligent voice devices.
On the basis of any of the above embodiments, when the voice interaction system described in S202 obtains the intent information, the step may specifically include:
generating, by the voice interaction system, the intent information from the intent analysis result according to a preset protocol;
Correspondingly, when the target applet described in S203 converts the intent information into a control instruction executable by the thread of the target applet, the step may specifically include:
determining, by the target applet, a predetermined conversion rule according to the intent information, and converting the intent information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule.
In this embodiment, after obtaining the intent analysis result, the voice interaction system may convert the intent analysis result into intent information that satisfies a preset protocol. The preset protocol may specifically include a protocol header and a protocol content part. The protocol header may include, but is not limited to, the namespace of the instruction, the instruction name, and the encryption type (if the protocol is embedded in the system, encrypted transmission is unnecessary), while the protocol content part may include, but is not limited to, the operation information corresponding to the intent and the target applet information. The operation information may include, but is not limited to, an operation type and extension information, where the extension information may supplement the operation type; the target applet information may include the applet identifier of the target applet. For example, if the user's intent is to fast-forward the video currently playing in video applet A, the operation type is "fast-forward", the applet identifier is that of video applet A, and the extension information is the fast-forward speed, such as 2x.
Further, after receiving the intent information satisfying the preset protocol transmitted by the voice interaction system, the target applet converts the intent information into a control instruction that the target applet can execute. For example, if the target applet is built on the SwanJS (Smart Web Application Native JavaScript) architecture — SwanJS is the core of a certain Internet company's applet framework — the intent information satisfying the preset protocol needs to be converted into SwanJS event commands. Optionally, during instruction conversion, the target applet obtains, according to the protocol header in the intent information, the preset conversion rule corresponding to the protocol header, where the preset conversion rule includes the correspondence between preset operation information and preset control instructions; the target applet then converts the protocol content part of the intent information into a control instruction executable by the thread of the target applet according to this correspondence.
Further, the conversion rule corresponding to the namespace in the protocol header is first used to parse out the agreed instruction space, and whether the intent information is directed at the target applet can be determined according to the instruction space, operation type, encryption type, applet identifier and the like. If not, the intent information is determined to be invalid for the target applet and is returned directly, so the subsequent voice control process is not performed, the applet thread is not occupied, and the running overhead is reduced; if so, the subsequent voice control process continues. The protocol content part is the part that the control instruction needs to execute and supports event invocation of the developer's application; the protocol content part can be converted into a control instruction under the SwanJS framework according to the conversion rule. For example, all control instructions that the target applet can execute, and the correspondence between each preset control instruction and each preset operation type, can be set in advance, and then the control instruction corresponding to the protocol content part is determined.
In this embodiment, considering the diversity of applet adaptation and implementation, SwanJS only implements the generic function of the interface; for the specific mode of a specific applet in a specific scenario, the developer can override the instruction execution function through the interface. For example, for a fast-forward control instruction, SwanJS only provides the interface implementing the generic fast-forward function, and the specific manner of fast-forwarding can be configured by the developer according to the specific scenario.
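The early validity check described in the preceding paragraphs can be sketched as below. The field names and the `swan-event:` handler prefix are illustrative assumptions, not the actual SwanJS API; the point illustrated is that an intent not directed at this applet is rejected before any thread time is spent on it.

```typescript
// Sketch: inspect the protocol header and applet identifier before conversion,
// and return immediately for intents not directed at this applet.
interface IncomingIntent {
  namespace: string;
  appletId: string;
  operation: string;
}

function dispatch(
  intent: IncomingIntent,
  selfId: string,
  agreedNamespace: string,
): string | null {
  // Not directed at this applet: invalid, return directly, no thread cost.
  if (intent.namespace !== agreedNamespace || intent.appletId !== selfId) return null;
  // Hand off to the generic (developer-overridable) instruction handler.
  return `swan-event:${intent.operation}`;
}
```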
On the basis of any of the above embodiments, as shown in FIG. 3, when the thread of the target applet executes the control instruction in S203, the following may specifically be further included:
S301: The target applet judges whether the control instruction needs to interact with the server.
S302: If the control instruction needs to interact with the server, the target applet controls the thread of the target applet to temporarily store the relevant content that needs to be interacted locally, and uploads the relevant content to the server asynchronously.
In this embodiment, considering that when the intelligent voice device is in a weak-network or disconnected environment, some control instructions that need to interact with the server do not have to interact with the server immediately, the target applet performs an instruction scheduling process: the relevant content in the target applet that needs to interact with the server is temporarily stored locally on the intelligent voice device, and then uploaded to the server asynchronously. For example, when the intelligent voice device is in a weak-network or disconnected environment and the control instruction is to favorite a video, the favorited video needs to be recorded on the server so that the user can still see it the next time the target applet is called up, or when the applet is called up on another device. The target applet can temporarily store the relevant content of the favorited video locally and upload it asynchronously, for example when the network environment is good, or by silent idle upload, so as to avoid the target applet frequently interacting with the server during user interaction, guarantee sufficient bandwidth during user interaction, and improve the user experience.
Of course, in the above embodiment, synchronous upload may also be used when a control instruction that needs to interact with the server is received while the network environment is good. In addition, this embodiment is compatible with both asynchronous and synchronous upload, which can be selected according to specific scenarios such as the network environment. Optionally, after the control instruction is obtained, it is judged whether interaction with the server is needed; if no interaction is needed, the control instruction is executed locally; if interaction is needed, it is judged whether local temporary storage is needed; if not, the content is uploaded to the server directly; if so, the content is first stored locally and then uploaded to the server asynchronously.
Optionally, in this embodiment, asynchronous upload may also be used for some data that needs to be synchronized to the server, which ensures that the same user has a consistent experience when using the same target applet on different devices. For example, the user's favorites, browsing history, comments, likes, purchase records and other data can be viewed in the same target applet on different smart speakers.
On the basis of any of the above embodiments, as shown in FIG. 4, before acquiring, through the voice acquisition device, the user's voice control instruction for the target applet in the intelligent voice device as described in S201, the method may further include:
S401: Acquire, through the voice acquisition device, the user's voice call-up instruction for the target applet.
S402: The voice interaction system performs voice recognition and intent analysis on the voice call-up instruction, determines the target applet to be called up according to the intent analysis result, and calls up the target applet.
In this embodiment, since S201-S203 describe the voice control of the target applet after it has been called up on the intelligent voice device, the target applet needs to be called up before it can be voice-controlled. Specifically, the call-up process can also be performed by voice control, that is, the user's voice call-up instruction for the target applet is acquired through the voice acquisition device. For example, the user issues the voice call-up instruction "launch video applet A"; the voice interaction system then performs voice recognition and intent analysis and determines that the user's intent is to call up the target applet "video applet A", so the target applet can be called up.
On the basis of the above embodiment, as shown in FIG. 5, when the voice interaction system described in S402 calls up the target applet, the step may specifically include:
S501: The voice interaction system judges whether the target applet already exists in the intelligent voice device.
S502: If it is determined that the target applet does not exist in the intelligent voice device, the applet package of the target applet is obtained from the server and the target applet is called up; or
S503: If it is determined that the target applet already exists in the intelligent voice device, the target applet is called up directly.
In this embodiment, if the target applet has been used before and is still cached in the intelligent voice device, or has not been used before but has been cached in advance, there is no need to obtain the applet package of the target applet from the server again, and the target applet can be called up directly; otherwise, if the target applet does not exist in the intelligent voice device, the applet package of the target applet needs to be obtained from the server before the target applet is called up.
Optionally, as shown in FIG. 6, when the applet package of the target applet is obtained from the server as described in S502, the step may specifically include:
S601: The voice interaction system judges whether the intelligent voice device supports voice interaction with applets.
S602: If the intelligent voice device supports voice interaction with applets, the full applet package of the target applet is obtained from the server; or
S603: If the intelligent voice device does not support voice interaction with applets, a partial applet package of the target applet is obtained from the server, where the partial applet package does not load the modules related to voice interaction.
In this embodiment, when the voice interaction system obtains the applet package SDK (software development kit) of the target applet from the server, it may first judge whether the intelligent voice device has the capability to support voice interaction with applets. If the intelligent voice device supports voice interaction with applets, the full SDK applet package of the target applet is obtained from the server; if not, a partial SDK applet package of the target applet can be obtained from the server. The partial SDK applet package is defined relative to the full SDK applet package and does not load the modules related to voice interaction; that is, in this case the target applet does not have the capability to receive intent information or convert it into control instructions executable by the thread of the target applet, which reduces the size of the applet package, reduces the traffic consumed during loading, and increases the speed of calling up the target applet.
On the basis of any of the above embodiments, as shown in FIG. 7, determining the target applet to be called up according to the intent analysis result includes:
S701: If the voice interaction system determines that the intent analysis result includes the target applet to be called up and the resource information requested by the user, it searches the resource library of the target applet for a target resource corresponding to the resource information.
S702: If no such resource exists, other applets capable of providing the target resource are obtained and recommended to the user as candidate target applets.
In this embodiment, the user's voice call-up instruction for the target applet may specifically include the name of the target applet and the requested resource information. For example, when video applet A has not been called up on the intelligent voice device and the user issues the voice instruction "I want to watch variety show B on video applet A", this voice instruction can serve as a voice call-up instruction. After voice recognition and intent analysis, it can be determined that the target applet to be called up is video applet A and the resource requested by the user is variety show B. The server can then be queried as to whether the resource exists in the resource library of video applet A. If the resource exists, video applet A is called up, either directly or after its applet package is obtained; if the resource does not exist in the resource library of video applet A, it can be queried whether the resource exists in the resource library of another applet. For example, if the resource exists in the resource library of video applet C, video applet C can be recommended to the user, for example by asking the user whether to call up video applet C, or video applet C can be called up directly as the target applet.
In addition, optionally, if the user's voice instruction only includes the requested resource, a fuzzy search can be performed according to the requested resource to determine the target applet. For example, if the user issues the voice instruction "I want to watch variety show B", the applets that have this resource can be looked up, and one of them is determined as the target applet and called up.
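The resource lookup with fallback recommendation described above can be sketched as follows. The maps stand in for the server-side resource libraries, and all names (`videoAppletA`, `varietyShowB`, etc.) are the anonymized placeholders used in this description.

```typescript
// Sketch: search the target applet's resource library first; if the resource
// is absent, look for another applet that can provide it and offer it as a
// candidate; if no applet has it, report failure.
function resolveApplet(
  target: string,
  resource: string,
  libraries: Map<string, Set<string>>,
): { applet: string; recommended: boolean } | null {
  if (libraries.get(target)?.has(resource)) {
    return { applet: target, recommended: false }; // call up the target directly
  }
  for (const [applet, library] of libraries) {
    if (applet !== target && library.has(resource)) {
      return { applet, recommended: true }; // candidate applet to suggest
    }
  }
  return null; // no applet can serve the requested resource
}
```

The same function also covers the fuzzy-search case: passing an empty `target` makes every applet that holds the resource a candidate.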
On the basis of any of the above embodiments, the voice control method for an applet may further include:
periodically obtaining, by the voice interaction system, the applet packages of a predetermined number of popular applets from the server, and caching them.
In this embodiment, the voice interaction system may periodically access the server to obtain information about popular applets, select a predetermined number of popular applets according to this information, and obtain the applet packages of these popular applets from the server for caching, so that a popular applet can be called up quickly when the user needs it. Of course, the server may also periodically push information about popular applets to the voice interaction system, which then selects a predetermined number of popular applets accordingly. Optionally, the number of cached popular applets can be determined according to the storage space of the intelligent voice device, and which popular applets to cache can be determined according to factors such as the download volume of the applets and the user's interests.
In the voice control methods for an applet provided by the above embodiments, voice control of the target applet can be realized through the interaction between the voice interaction system and the target applet framework; in addition, when the target applet has not been called up, it can be called up by voice control, which also improves the convenience of invoking applets, simplifies the process of finding and opening an applet by touch, improves the user's experience of using applets, and provides strong support for the distribution and usage of applets on intelligent voice devices.
An embodiment of the present application provides a voice control method for an applet, applied to an intelligent voice device on which a voice interaction system and a target applet are configured, the method including:
receiving, by the target applet, intent information transmitted by the voice interaction system, where the intent information is obtained after the voice interaction system performs voice recognition and intent analysis on the user's voice control instruction for the target applet;
converting, by the target applet, the intent information into a control instruction executable by the thread of the target applet, and executing the control instruction by the thread of the target applet.
On the basis of the above embodiment, the intent information is generated by the voice interaction system from the intent analysis result according to a preset protocol;
the converting, by the target applet, of the intent information into a control instruction executable by the thread of the target applet includes:
determining, by the target applet, a predetermined conversion rule according to the intent information, and converting the intent information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule.
On the basis of the above embodiment, the preset protocol includes a protocol header and protocol content, where the protocol content includes the operation information corresponding to the intent and the target applet information;
the determining, by the target applet, of a predetermined conversion rule according to the intent information and converting the intent information into a control instruction executable by the thread of the target applet according to the predetermined conversion rule includes:
obtaining, by the target applet according to the protocol header in the intent information, the preset conversion rule corresponding to the protocol header, where the preset conversion rule includes the correspondence between preset operation information and preset control instructions;
converting, by the target applet, the protocol content part of the intent information into a control instruction executable by the thread of the target applet according to the correspondence between the preset operation information and the preset control instructions.
On the basis of the above embodiment, the executing of the control instruction by the thread of the target applet includes:
if the target applet determines that the control instruction needs to interact with the server, controlling, by the target applet, the thread of the target applet to temporarily store the relevant content that needs to be interacted locally, and uploading the relevant content to the server asynchronously.
On the basis of the above embodiment, before the target applet receives the intent information transmitted by the voice interaction system, the method further includes:
calling up the target applet by the voice interaction system according to the user's voice call-up instruction for the target applet.
On the basis of the above embodiment, the intelligent voice device is a smart speaker.
The voice control method for an applet provided in this embodiment is the process performed by the target applet of the intelligent voice device in the above embodiments; for its specific implementation and technical effects, refer to the above embodiments, which will not be repeated here.
An embodiment of the present application provides a voice control method for an applet, applied to an intelligent voice device on which a voice interaction system and a target applet are configured, the method including:
acquiring, by the voice interaction system, the user's voice control instruction for the target applet;
performing, by the voice interaction system, voice recognition and intent analysis on the voice control instruction to obtain intent information;
transmitting, by the voice interaction system, the intent information to the target applet, so that the target applet converts the intent information into a control instruction executable by the thread of the target applet and executes it.
On the basis of the above embodiment, the obtaining of the intent information includes:
generating, by the voice interaction system, the intent information from the intent analysis result according to a preset protocol, where the preset protocol includes a protocol header and protocol content, and the protocol content includes the operation information corresponding to the intent and the target applet information.
On the basis of the above embodiment, before the voice interaction system acquires the user's voice control instruction for the target applet, the method further includes:
acquiring, by the voice interaction system, the user's voice call-up instruction for the target applet;
performing, by the voice interaction system, voice recognition and intent analysis on the voice call-up instruction, determining the target applet to be called up according to the intent analysis result, and calling up the target applet.
On the basis of the above embodiment, the calling up of the target applet includes:
if the voice interaction system determines that the target applet does not exist in the intelligent voice device, obtaining the applet package of the target applet from the server and calling up the target applet; or
if the voice interaction system determines that the target applet already exists in the intelligent voice device, directly calling up the target applet.
On the basis of the above embodiment, the obtaining of the applet package of the target applet from the server includes:
if the voice interaction system determines that the intelligent voice device supports voice interaction with applets, obtaining the full applet package of the target applet from the server; or
if the voice interaction system determines that the intelligent voice device does not support voice interaction with applets, obtaining a partial applet package of the target applet from the server, where the partial applet package does not load the modules related to voice interaction.
On the basis of the above embodiment, the determining of the target applet to be called up according to the intent analysis result includes:
if the voice interaction system determines that the intent analysis result includes the target applet to be called up and the resource information requested by the user, searching the resource library of the target applet for a target resource corresponding to the resource information;
if no such resource exists, obtaining other applets capable of providing the target resource and recommending them to the user as candidate target applets.
On the basis of the above embodiment, the method further includes:
periodically obtaining, by the voice interaction system, the applet packages of a predetermined number of popular applets from the server, and caching them.
On the basis of the above embodiment, the intelligent voice device is a smart speaker.
The voice control method for an applet provided in this embodiment is the process performed by the voice interaction system of the intelligent voice device in the above embodiments; for its specific implementation and technical effects, refer to the above embodiments, which will not be repeated here.
An embodiment of the present application provides an intelligent voice device. FIG. 8 is a structural diagram of the intelligent voice device provided by an embodiment of the present application. As shown in FIG. 8, a voice interaction system 810 and a target applet 820 are deployed on the intelligent voice device 800.
The voice interaction system 810 may include: an acquisition module 811 and a voice processing module 812;
the acquisition module 811 is configured to acquire, through a voice collection device, a user's voice control instruction for the target applet in the intelligent voice device;
the voice processing module 812 is configured to perform speech recognition and intent parsing on the voice control instruction, acquire intent information, and transmit the intent information to the target applet;
the voice processing module 812 may specifically include a speech recognition submodule and an intent parsing submodule.
The target applet 820 may include: a receiving module 821, an instruction conversion module 822 and an execution module 823;
the receiving module 821 is configured to receive the intent information;
the instruction conversion module 822 is configured to convert the intent information into a control instruction executable by a thread of the target applet;
the execution module 823 is configured to execute the control instruction through the thread of the target applet.
On the basis of any of the above embodiments, when acquiring the intent information, the voice processing module 812 is configured to:
generate the intent information according to an intent parsing result and in accordance with a preset protocol;
when converting the intent information into a control instruction executable by a thread of the target applet, the instruction conversion module 822 is configured to:
determine a predetermined conversion rule according to the intent information, and convert the intent information into a control instruction executable by a thread of the target applet in accordance with the predetermined conversion rule.
On the basis of any of the above embodiments, when executing the control instruction through the thread of the target applet, the execution module 823 is configured to:
determine whether the control instruction requires interaction with a server;
if the control instruction requires interaction with the server, control the thread of the target applet to temporarily store the content to be exchanged locally, and upload the content to be exchanged to the server in an asynchronous manner.
That is, the execution module 823 may specifically include a scheduling submodule and an upload submodule.
On the basis of any of the above embodiments, before acquiring, through the voice collection device, the user's voice control instruction for the target applet in the intelligent voice device, the acquisition module 811 is further configured to: acquire, through the voice collection device, the user's voice invocation instruction for the target applet;
the voice interaction system 810 further includes an applet invocation module 813, configured to:
perform speech recognition and intent parsing on the voice invocation instruction, determine the target applet to be invoked according to the intent parsing result, and invoke the target applet.
On the basis of any of the above embodiments, when invoking the target applet, the applet invocation module 813 is configured to:
determine whether the target applet already exists on the intelligent voice device;
if it is determined that the target applet does not exist on the intelligent voice device, acquire an applet package of the target applet from a server and invoke the target applet; or
if it is determined that the target applet already exists on the intelligent voice device, invoke the target applet directly.
On the basis of any of the above embodiments, when acquiring the applet package of the target applet from the server, the applet invocation module 813 is configured to:
determine whether the intelligent voice device supports voice interaction with applets;
if the intelligent voice device supports voice interaction with applets, acquire a full applet package of the target applet from the server; or
if the intelligent voice device does not support voice interaction with applets, acquire a partial applet package of the target applet from the server, where the partial applet package does not load modules related to voice interaction.
On the basis of any of the above embodiments, when determining the target applet to be invoked according to the intent parsing result, the applet invocation module 813 is configured to:
if it is determined that the intent parsing result includes the target applet to be invoked as well as resource information requested by the user, search a resource library of the target applet for a target resource corresponding to the resource information;
if the target resource does not exist, acquire other applets capable of providing the target resource and recommend them to the user as alternative target applets.
On the basis of any of the above embodiments, the applet invocation module 813 is further configured to:
periodically acquire applet packages of a predetermined number of popular applets from the server and cache them.
The intelligent voice device provided in this embodiment may be specifically used to execute the method embodiments provided above; its specific functions are not repeated here.
The intelligent voice device provided in this embodiment acquires, through a voice collection device, a user's voice control instruction for the target applet in the intelligent voice device; the voice interaction system performs speech recognition and intent parsing on the voice control instruction, acquires intent information, and transmits the intent information to the target applet; the target applet receives the intent information, converts the intent information into a control instruction executable by a thread of the target applet, and the thread of the target applet executes the control instruction. Through the interaction between the voice interaction system and the target applet framework, this embodiment enables voice control of the target applet, makes the interaction process more convenient, avoids the split of attention caused by having to interact with the applet by touch, which interrupts the continuity of voice interaction, improves the user's experience of using applets, and provides strong support for the distribution and usage of applets on intelligent voice devices.
According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
FIG. 9 is a block diagram of an electronic device for the applet voice control method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 9, the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). One processor 901 is taken as an example in FIG. 9.
The memory 902 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the applet voice control method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the applet voice control method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 902 may be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the applet voice control method in the embodiments of the present application (for example, the modules shown in FIG. 8). By running the non-transitory software programs, instructions and modules stored in the memory 902, the processor 901 executes the various functional applications and data processing of the server, that is, implements the applet voice control method in the above method embodiments.
The memory 902 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the electronic device for the applet voice control method, and the like. In addition, the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memories remotely located relative to the processor 901, and these remote memories may be connected via a network to the electronic device for the applet voice control method. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The electronic device for the applet voice control method may further include: an input apparatus 903 and an output apparatus 904. The processor 901, the memory 902, the input apparatus 903 and the output apparatus 904 may be connected via a bus or in other ways; connection via a bus is taken as an example in FIG. 9.
The input apparatus 903 may receive input digital or character information and generate key signal inputs related to user settings and function control of the electronic device for the applet voice control method, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input apparatus. The output apparatus 904 may include a display device, an auxiliary lighting apparatus (e.g., an LED), a haptic feedback apparatus (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein may be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, where the programmable processor may be a special-purpose or general-purpose programmable processor that may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or haptic feedback), and the input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system including a back-end component (e.g., as a data server), or a computing system including a middleware component (e.g., an application server), or a computing system including a front-end component (e.g., a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system including any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include: a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
According to the technical solutions of the embodiments of the present application, a user's voice control instruction for the target applet in the intelligent voice device is acquired through a voice collection device; the voice interaction system performs speech recognition and intent parsing on the voice control instruction, acquires intent information, and transmits the intent information to the target applet; the target applet receives the intent information, converts the intent information into a control instruction executable by a thread of the target applet, and the thread of the target applet executes the control instruction. Through the interaction between the voice interaction system and the target applet framework, this embodiment enables voice control of the target applet, makes the interaction process more convenient, avoids the split of attention caused by having to interact with the applet by touch, which interrupts the continuity of voice interaction, improves the user's experience of using applets, and provides strong support for the distribution and usage of applets on intelligent voice devices.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (27)

  1. A voice control method for an applet, applied to an intelligent voice device on which a voice interaction system and a target applet are deployed, the method comprising:
    receiving, by the target applet, intent information transmitted by the voice interaction system, wherein the intent information is obtained by the voice interaction system performing speech recognition and intent parsing on a voice control instruction for the target applet issued by a user;
    converting, by the target applet, the intent information into a control instruction executable by a thread of the target applet, and executing the control instruction by the thread of the target applet.
  2. The method according to claim 1, wherein the intent information is generated by the voice interaction system according to an intent parsing result and in accordance with a preset protocol;
    converting, by the target applet, the intent information into a control instruction executable by a thread of the target applet comprises:
    determining, by the target applet, a predetermined conversion rule according to the intent information, and converting the intent information into a control instruction executable by a thread of the target applet in accordance with the predetermined conversion rule.
  3. The method according to claim 2, wherein the preset protocol comprises a protocol header and protocol content, and the protocol content comprises operation information corresponding to the intent and target applet information;
    determining, by the target applet, a predetermined conversion rule according to the intent information and converting the intent information into a control instruction executable by a thread of the target applet in accordance with the predetermined conversion rule comprises:
    obtaining, by the target applet according to the protocol header in the intent information, a preset conversion rule corresponding to the protocol header, wherein the preset conversion rule comprises correspondences between preset operation information and preset control instructions;
    converting, by the target applet according to the correspondences between the preset operation information and the preset control instructions, the protocol content part of the intent information into a control instruction executable by a thread of the target applet.
  4. The method according to any one of claims 1-3, wherein executing the control instruction by the thread of the target applet comprises:
    if the target applet determines that the control instruction requires interaction with a server, controlling, by the target applet, the thread of the target applet to temporarily store the content to be exchanged locally, and uploading the content to be exchanged to the server in an asynchronous manner.
  5. The method according to any one of claims 1-4, before the target applet receives the intent information transmitted by the voice interaction system, further comprising:
    invoking the target applet by the voice interaction system according to a voice invocation instruction for the target applet issued by the user.
  6. The method according to any one of claims 1-5, wherein the intelligent voice device is a smart speaker.
  7. A voice control method for an applet, applied to an intelligent voice device on which a voice interaction system and a target applet are deployed, the method comprising:
    acquiring, by the voice interaction system, a voice control instruction for the target applet issued by a user;
    performing, by the voice interaction system, speech recognition and intent parsing on the voice control instruction to acquire intent information;
    transmitting, by the voice interaction system, the intent information to the target applet, so that the target applet converts the intent information into a control instruction executable by a thread of the target applet and executes the control instruction.
  8. The method according to claim 7, wherein acquiring the intent information comprises:
    generating, by the voice interaction system, the intent information according to an intent parsing result and in accordance with a preset protocol, wherein the preset protocol comprises a protocol header and protocol content, and the protocol content comprises operation information corresponding to the intent and target applet information.
  9. The method according to claim 7 or 8, before the voice interaction system acquires the voice control instruction for the target applet issued by the user, further comprising:
    acquiring, by the voice interaction system, a voice invocation instruction for the target applet issued by the user;
    performing, by the voice interaction system, speech recognition and intent parsing on the voice invocation instruction, determining the target applet to be invoked according to the intent parsing result, and invoking the target applet.
  10. The method according to claim 9, wherein invoking the target applet comprises:
    if the voice interaction system determines that the target applet does not exist on the intelligent voice device, acquiring an applet package of the target applet from a server and invoking the target applet; or
    if the voice interaction system determines that the target applet already exists on the intelligent voice device, invoking the target applet directly.
  11. The method according to claim 10, wherein acquiring the applet package of the target applet from the server comprises:
    if the voice interaction system determines that the intelligent voice device supports voice interaction with applets, acquiring a full applet package of the target applet from the server; or
    if the voice interaction system determines that the intelligent voice device does not support voice interaction with applets, acquiring a partial applet package of the target applet from the server, wherein the partial applet package does not load modules related to voice interaction.
  12. The method according to any one of claims 9-11, wherein determining the target applet to be invoked according to the intent parsing result comprises:
    if the voice interaction system determines that the intent parsing result includes the target applet to be invoked as well as resource information requested by the user, searching a resource library of the target applet for a target resource corresponding to the resource information;
    if the target resource does not exist, acquiring other applets capable of providing the target resource and recommending them to the user as alternative target applets.
  13. The method according to any one of claims 10-12, further comprising:
    periodically acquiring, by the voice interaction system, applet packages of a predetermined number of popular applets from the server and caching them.
  14. The method according to any one of claims 7-13, wherein the intelligent voice device is a smart speaker.
  15. A voice control method for an applet, applied to an intelligent voice device on which a voice interaction system and a target applet are deployed, the method comprising:
    acquiring, through a voice collection device, a user's voice control instruction for the target applet;
    performing, by the voice interaction system, speech recognition and intent parsing on the voice control instruction, acquiring intent information, and transmitting the intent information to the target applet;
    receiving, by the target applet, the intent information, converting the intent information into a control instruction executable by a thread of the target applet, and executing the control instruction by the thread of the target applet.
  16. The method according to claim 15, before acquiring, through the voice collection device, the user's voice control instruction for the target applet, further comprising:
    acquiring, through the voice collection device, a user's voice invocation instruction for the target applet;
    performing, by the voice interaction system, speech recognition and intent parsing on the voice invocation instruction, determining the target applet to be invoked according to the intent parsing result, and invoking the target applet.
  17. An intelligent voice device, on which a voice interaction system and a target applet are deployed; wherein the voice interaction system comprises:
    an acquisition module, configured to acquire, through a voice collection device, a user's voice control instruction for the target applet in the intelligent voice device;
    a voice processing module, configured to perform speech recognition and intent parsing on the voice control instruction, acquire intent information, and transmit the intent information to the target applet;
    the target applet comprises:
    a receiving module, configured to receive the intent information;
    an instruction conversion module, configured to convert the intent information into a control instruction executable by a thread of the target applet;
    an execution module, configured to execute the control instruction through the thread of the target applet.
  18. The device according to claim 17, wherein when acquiring the intent information, the voice processing module is configured to:
    generate the intent information according to an intent parsing result and in accordance with a preset protocol;
    when converting the intent information into a control instruction executable by a thread of the target applet, the instruction conversion module is configured to:
    determine a predetermined conversion rule according to the intent information, and convert the intent information into a control instruction executable by a thread of the target applet in accordance with the predetermined conversion rule.
  19. The device according to claim 17 or 18, wherein when executing the control instruction through the thread of the target applet, the execution module is configured to:
    determine whether the control instruction requires interaction with a server;
    if the control instruction requires interaction with the server, control the thread of the target applet to temporarily store the content to be exchanged locally, and upload the content to be exchanged to the server in an asynchronous manner.
  20. The device according to any one of claims 17-19, wherein before acquiring, through the voice collection device, the user's voice control instruction for the target applet in the intelligent voice device, the acquisition module is further configured to: acquire, through the voice collection device, a user's voice invocation instruction for the target applet;
    the voice interaction system further comprises an applet invocation module, configured to:
    perform speech recognition and intent parsing on the voice invocation instruction, determine the target applet to be invoked according to the intent parsing result, and invoke the target applet.
  21. The device according to claim 20, wherein when invoking the target applet, the applet invocation module is configured to:
    determine whether the target applet already exists on the intelligent voice device;
    if it is determined that the target applet does not exist on the intelligent voice device, acquire an applet package of the target applet from a server and invoke the target applet; or
    if it is determined that the target applet already exists on the intelligent voice device, invoke the target applet directly.
  22. The device according to claim 21, wherein when acquiring the applet package of the target applet from the server, the applet invocation module is configured to:
    determine whether the intelligent voice device supports voice interaction with applets;
    if the intelligent voice device supports voice interaction with applets, acquire a full applet package of the target applet from the server; or
    if the intelligent voice device does not support voice interaction with applets, acquire a partial applet package of the target applet from the server, wherein the partial applet package does not load modules related to voice interaction.
  23. The device according to any one of claims 20-22, wherein when determining the target applet to be invoked according to the intent parsing result, the applet invocation module is configured to:
    if it is determined that the intent parsing result includes the target applet to be invoked as well as resource information requested by the user, search a resource library of the target applet for a target resource corresponding to the resource information;
    if the target resource does not exist, acquire other applets capable of providing the target resource and recommend them to the user as alternative target applets.
  24. The device according to any one of claims 20-23, wherein the applet invocation module is further configured to:
    periodically acquire applet packages of a predetermined number of popular applets from the server and cache them.
  25. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein,
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method according to any one of claims 1-16.
  26. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1-16.
  27. A computer program, comprising program code, wherein when a computer runs the computer program, the program code executes the method according to any one of claims 1-16.
PCT/CN2020/117498 2020-06-29 2020-09-24 Voice control method for applet, device and storage medium WO2022000828A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020217020655A KR20210091328A (ko) 2020-06-29 2020-09-24 Voice control method for applet, device and storage medium
EP20943669.0A EP4170650A1 (en) 2020-06-29 2020-09-24 Speech control method for mini-program, and devices and storage medium
JP2022520806A JP7373063B2 (ja) 2020-06-29 2020-09-24 Voice control method for mini-program, device and storage medium
US17/357,660 US11984120B2 (en) 2020-06-29 2021-06-24 Voice control method for applet and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010605375.6 2020-06-29
CN202010605375.6A CN111724785B (zh) 2020-06-29 2020-06-29 Voice control method for applet, device and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/357,660 Continuation US11984120B2 (en) 2020-06-29 2021-06-24 Voice control method for applet and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022000828A1 true WO2022000828A1 (zh) 2022-01-06

Family

ID=72569589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117498 WO2022000828A1 (zh) 2020-06-29 2020-09-24 Voice control method for applet, device and storage medium

Country Status (2)

Country Link
CN (1) CN111724785B (zh)
WO (1) WO2022000828A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112242141B * 2020-10-15 2022-03-15 广州小鹏汽车科技有限公司 Voice control method, smart cockpit, server, vehicle and medium
CN112379945B * 2020-11-20 2024-04-19 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for running an application
CN113763946A (zh) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Message processing method, voice processing method, apparatus, terminal and storage medium
CN113093596A (zh) * 2021-03-29 2021-07-09 北京金山云网络技术有限公司 Control instruction processing method and apparatus
CN113823283B * 2021-09-22 2024-03-08 百度在线网络技术(北京)有限公司 Information processing method, device, storage medium and program product
CN114407796A (zh) * 2022-01-21 2022-04-29 腾讯科技(深圳)有限公司 Control method, apparatus, device and storage medium for a vehicle-mounted terminal
CN114639384B * 2022-05-16 2022-08-23 腾讯科技(深圳)有限公司 Voice control method, apparatus, computer device and computer storage medium
CN115588433A (zh) * 2022-11-24 2023-01-10 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106297782A (zh) * 2016-07-28 2017-01-04 北京智能管家科技有限公司 Human-computer interaction method and system
CN108470566A (zh) * 2018-03-08 2018-08-31 腾讯科技(深圳)有限公司 Application operation method and apparatus
CN110060679A (zh) * 2019-04-23 2019-07-26 诚迈科技(南京)股份有限公司 Interaction method and system with full voice control
CN110718221A (zh) * 2019-10-08 2020-01-21 百度在线网络技术(北京)有限公司 Voice skill control method, voice device, client and server

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002351652A (ja) * 2001-05-23 2002-12-06 Nec System Technologies Ltd Speech recognition operation support system, speech recognition operation support method, and speech recognition operation support program
US20070286360A1 (en) * 2006-03-27 2007-12-13 Frank Chu System and Method for Providing Screen-Context Assisted Information Retrieval
US9959865B2 (en) * 2012-11-13 2018-05-01 Beijing Lenovo Software Ltd. Information processing method with voice recognition
CN107765838A (zh) * 2016-08-18 2018-03-06 北京北信源软件股份有限公司 Human-computer interaction assistance method and apparatus
US10127908B1 (en) * 2016-11-11 2018-11-13 Amazon Technologies, Inc. Connected accessory for a voice-controlled device
CN109429522A (zh) * 2016-12-06 2019-03-05 吉蒂机器人私人有限公司 Voice interaction method, apparatus and system
US11204787B2 (en) * 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10713007B2 (en) * 2017-12-12 2020-07-14 Amazon Technologies, Inc. Architecture for a hub configured to control a second device while a connection to a remote system is unavailable
CN108632140A (zh) * 2018-04-22 2018-10-09 厦门声连网信息科技有限公司 Applet-based sound processing system, method and server
CN110659013A (zh) * 2018-06-28 2020-01-07 比亚迪股份有限公司 Message processing method and apparatus, and storage medium
CN109036396A (zh) * 2018-06-29 2018-12-18 百度在线网络技术(北京)有限公司 Interaction method and system for third-party applications
CN110797022B * 2019-09-06 2023-08-08 腾讯科技(深圳)有限公司 Application control method, apparatus, terminal and server
CN110647305B * 2019-09-29 2023-10-31 阿波罗智联(北京)科技有限公司 Voice interaction method, apparatus, device and medium for an application
CN110580904A (zh) * 2019-09-29 2019-12-17 百度在线网络技术(北京)有限公司 Method, apparatus, electronic device and storage medium for controlling an applet by voice
CN111145755A (zh) * 2019-12-24 2020-05-12 北京摩拜科技有限公司 Method, apparatus and terminal device for invoking a shared-bicycle application unlocking interface by voice


Also Published As

Publication number Publication date
CN111724785A (zh) 2020-09-29
CN111724785B (zh) 2023-07-04

Similar Documents

Publication Publication Date Title
WO2022000828A1 (zh) Voice control method for applet, device and storage medium
JP7373063B2 (ja) Voice control method for mini-program, device and storage medium
US20210200537A1 (en) Distributed cross-platform application projection management and delivery
US10074365B2 (en) Voice control method, mobile terminal device, and voice control system
US8763058B2 (en) Selective data downloading and presentation based on user interaction
KR101259157B1 (ko) 사용자 인터페이스를 관리하는 장치 및 방법
US10002115B1 (en) Hybrid rendering of a web page
US9479564B2 (en) Browsing session metric creation
KR20110040604A (ko) 클라우드 서버, 클라이언트 단말, 디바이스, 클라우드 서버의 동작 방법 및 클라이언트 단말의 동작 방법
JP7087121B2 (ja) Landing page processing method, apparatus, device and medium
JP7381518B2 (ja) Operation guidance method, apparatus, device and readable storage medium for an application program
CN110058832A (zh) 图像处理装置及其控制方法
US10140985B2 (en) Server for processing speech, control method thereof, image processing apparatus, and control method thereof
CN112104905B (zh) Server, display device and data transmission method
US10223458B1 (en) Automatic magazine generator for web content
JP7130803B2 (ja) Search method, search apparatus, electronic device and storage medium
US20100318671A1 (en) System and method for selection of streaming media
US20120110067A1 (en) Remote graphics rendering
US20120317493A1 (en) Methods and System for Locally Generated Gesture and Transition Graphics Interaction with Terminal Control Services
WO2022100192A1 (zh) Multimedia file processing method, apparatus, terminal and network access point device
CN110741339A (zh) 具有延迟考虑的显示模式相关响应生成
WO2022247507A1 (zh) Control method for playback system, and playback system
US10298567B1 (en) System for providing multi-device access to complementary content
KR101561524B1 (ko) Remote user interface management system and method
EP2718834A1 (en) Methods and system for locally generated gesture and transition graphics interaction with terminal control services

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20217020655

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20943669

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022520806

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020943669

Country of ref document: EP

Effective date: 20230118