WO2022247466A1 - Method, terminal and server for resource display - Google Patents

Method, terminal and server for resource display

Info

Publication number
WO2022247466A1
Authority
WO
WIPO (PCT)
Prior art keywords
card
voice
resource
control
candidate set
Prior art date
Application number
PCT/CN2022/085018
Other languages
English (en)
French (fr)
Inventor
黄益贵
乔登龙
夏潘斌
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022247466A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • This application relates to the technical field of voice control in the field of artificial intelligence, in particular to a method, terminal and server for resource display.
  • Terminals include, for example, mobile phones, tablets, smart screens, notebooks, and wearable devices.
  • Terminals can interact not only on the basis of user gestures; voice interaction is also becoming more and more common.
  • Voice interaction can realize the basic needs of users, such as checking the weather, opening applications, and recommending food.
  • When the process of voice interaction is more complicated, for example in multiple rounds of dialogue where the content obtained in a previous round needs to be displayed and controlled, the terminal cannot accurately identify the user's intention and is prone to misrecognition or incorrect responses.
  • The present application provides a resource display method, terminal and server. When recognizing the second or a later voice of the user in multiple rounds of dialogue, the control instruction corresponding to the user's voice is identified based on the first resource displayed in the first round or an earlier round of the dialogue. This constrains the recognizable control instructions to a certain range, so that the recognized control instruction is more accurate and the terminal can respond precisely to the user's intention.
  • the first aspect of this application provides a resource display method, including:
  • the second voice is used to control the displayed first resource
  • In response to the control instruction corresponding to the second voice, the second part of the content of the first resource is displayed, or a second resource related to the first resource is displayed.
  • the content acquired in the previous round of dialogue (such as the above-mentioned first resource) can be displayed and controlled through the user's second voice, so as to display the second part of the content of the first resource, or to display a second resource related to the first resource.
  • the content displayed on the first user interface includes the first part of the first resource, specifically including: displaying the first part of the first resource in the form of a card on the first user interface.
  • the terminal can present various resources to the user for display through a card or card-like display form.
  • Cards can be in the form of views (views) or components (Widget components), and usually present information in the form of an information box on the user interface.
  • the information box is usually in the shape of a rectangle or a rounded rectangle.
  • the card may be a card displayed on the negative screen of the terminal, or a quick application card.
  • the card includes multiple types, and the card type of the first card presented by the terminal is related to the content to be displayed by the first resource.
  • the first resource to be displayed and the corresponding card type to be used can be specified in advance (for example, specified by the first resource provider), so that the first resource can be displayed with a more suitable card type.
  • the types of cards include plain text cards, picture-text cards, grid-type cards, list-type cards, and the like.
  • the first card corresponds to a first voice command control candidate set
  • the first voice command control candidate set includes one or more instructions for displaying and controlling the content of the first card
  • the candidate set controlled by the first voice command is related to the card type of the first card
  • If the instruction corresponding to the second voice matches a first instruction in the first voice command control candidate set, a first display control associated with the first instruction is executed on the first card, so as to display the second part of the content of the first resource, or to display a second resource related to the first resource.
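The matching step above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the function, phrase strings, and action names are all hypothetical.

```python
# Hypothetical sketch: match a recognized voice instruction against the
# candidate set associated with the currently displayed card.
def match_instruction(recognized_text, candidate_set):
    """Return the action of the first candidate whose phrase appears in the
    recognized text, or None if the voice matches no display control."""
    for instruction in candidate_set:
        if instruction["phrase"] in recognized_text:
            return instruction["action"]
    return None

# Illustrative candidate set for a grid-type card.
grid_card_candidates = [
    {"phrase": "select the next", "action": "move_selection_next"},
    {"phrase": "slide down", "action": "scroll_content_down"},
]

action = match_instruction("please select the next one", grid_card_candidates)
```

A match triggers the associated display control; no match means the voice falls outside the range of controls defined for the card.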
  • The method further includes: receiving the user's third voice; if the instruction corresponding to the third voice matches a second instruction in the first voice command control candidate set, executing on the first card a second display control associated with the second instruction, the second display control being different from the first display control.
  • For example, the second voice can be a voice command that controls the movement of a selection box (for example, "select the next one" or "select the left/right one"), in which case the first display control moves the selection box;
  • the third voice can be a voice command that moves the card's displayed content as a whole (analogous to a finger touch-and-slide gesture), for example "slide to the right" or "slide down";
  • in that case the second display control moves the displayed content of the card, for example to the left or upward, so that part of the content moves out of the card and another part (corresponding to part of the content of the second resource) moves into the card display.
  • Different card types correspond to different voice command control candidate sets. The method also includes: receiving the user's fourth voice; displaying the third resource corresponding to the fourth voice with a second card, where the type of the second card is different from the type of the first card; the second card corresponds to a second voice command control candidate set, which includes one or more instructions for display control of the content of the second card and is different from the first voice command control candidate set.
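The correspondence between card types and candidate sets can be sketched as a simple lookup. The card types follow the list given in this application; the instruction names are hypothetical placeholders.

```python
# Hypothetical mapping from card type to its voice command control candidate
# set: different card types support different display controls.
CANDIDATE_SETS = {
    "plain_text":   ["move", "turn_page", "view_details"],
    "picture_text": ["move", "turn_page", "view_details"],
    "grid":         ["move", "turn_page", "select_object", "move_selection_box"],
    "list":         ["move", "turn_page", "select_object", "move_selection_box"],
}

def candidate_set_for(card_type):
    """Return the voice command candidate set for a given card type."""
    return CANDIDATE_SETS[card_type]
```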
  • Fig. 4f-1 displays the obtained first resource (that is, the available Mate mobile phone models) with a grid-type card (equivalent to the type of the first card).
  • Fig. 4f-2 displays the obtained third resource (that is, the introduction information of the Mate Porsche Design Edition) with a picture-text card (equivalent to the type of the second card). The server generates a corresponding command control candidate set (corresponding to the first voice command control candidate set) based on the grid card and the obtained first resource, and generates another corresponding candidate set (corresponding to the second voice command control candidate set) based on the picture-text card and the obtained third resource. On this basis, the second round of dialogue (corresponding to the second voice) and the third round of dialogue (corresponding to the fourth voice) can be supported.
  • the candidate set controlled by the first voice command is generated by the server based on the card type of the first card and the first resource.
  • Since the first voice command control candidate set is related to the card type and to the first resource used, this is equivalent to setting a corresponding range (namely the candidate set) for the recognizable control instructions; the recognized control instruction of the second voice is therefore related to the card type and the first resource, and is accordingly more accurate.
  • The method further includes: sending the data of the second voice to the server; and, based on the server's result of matching the instruction corresponding to the second voice against the first instruction of the candidate set, executing on the first card the first display control associated with the first instruction.
  • The instruction corresponding to the second voice is obtained by the server matching against the first voice command control candidate set. Since that candidate set is related to the card type and the first resource used, the recognized control instruction of the second voice is more accurate.
  • The method further includes: receiving the fifth voice of the user; displaying the fourth resource corresponding to the fifth voice with a third card, where the type of the third card is the same as the type of the first card and the fourth resource is different from the first resource; the third card corresponds to a third voice command control candidate set, which includes one or more instructions for display control of the content of the third card and is different from the first voice command control candidate set.
  • For example, Fig. 4a-1 displays the obtained first resource (that is, a brief introduction to Zhang San) with a picture-text card, and Fig. 4a-2 also uses a picture-text card to display the obtained fourth resource (that is, detailed information about Zhang San obtained by triggering a jump link). The server generates a corresponding command control candidate set based on the picture-text card and the obtained first resource (corresponding to the first voice command control candidate set, for example including an instruction to view details), and generates another candidate set based on the picture-text card and the obtained fourth resource (corresponding to the third voice command control candidate set, for example including an instruction to move the displayed content of the card up or down as a whole, so as to display the part not currently shown in the card).
  • the second round of dialogue corresponds, for example, to the second voice;
  • the third round of dialogue corresponds, for example, to the fifth voice.
  • the types of cards include one or more of the following: plain text cards, picture-text cards, grid-type cards, and list-type cards.
  • the candidate set of the first control instruction includes one or more of moving, turning pages, or viewing details.
  • a candidate set of corresponding first control instructions can be generated according to the type of the above card.
  • the generated candidate set of first control instructions may be related to the type of card used.
  • the candidate set of control instructions includes one or more of moving, page turning, selecting an object, or moving a selection box.
  • a candidate set of corresponding first control instructions can be generated according to the type of the above card.
  • the generated candidate set of first control instructions may be related to the type of card used.
  • The candidate set of the first control instruction further includes a control instruction for a button on the card, the control instruction being used to trigger that control.
  • corresponding control instructions can also be generated according to the buttons on the card, expanding the content that can be controlled by voice.
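Generating instructions from the card's buttons can be sketched as follows; the data shapes and names are hypothetical, but the idea is that each button label becomes an additional recognizable voice command.

```python
# Hypothetical: extend a card's candidate set with one control instruction per
# button on the card, so saying the button label triggers that control.
def add_button_instructions(candidate_set, buttons):
    extended = list(candidate_set)  # leave the original set untouched
    for button in buttons:
        extended.append({"phrase": button["label"],
                         "action": ("trigger", button["id"])})
    return extended

base = [{"phrase": "turn the page", "action": "turn_page"}]
buttons = [{"id": "btn_buy", "label": "buy now"}]
extended = add_button_instructions(base, buttons)
```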
  • the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
  • the session identifiers corresponding to the first voice and the second voice are the same.
  • the second aspect of this application provides a resource display method, which is applied to the server, including:
  • receiving data corresponding to the user's second voice sent by the terminal; the second voice is used by the terminal to control the displayed first resource;
  • the terminal can present multiple resources to the user for display through a card or similar display form.
  • Cards can be in the form of views (views) or components (Widget components), and usually present information in the form of an information box on the user interface.
  • the information box is usually in the shape of a rectangle or a rounded rectangle.
  • the card may be a card displayed on the negative screen of the terminal, or a quick application card.
  • The method further includes: sending the information of the first card to the terminal, so that the terminal displays the first resource in the first card, wherein the information of the first card includes the first card itself, the identifier of the first card, or the resource access path of the first card.
  • The terminal can obtain the first card in various ways. For example, the server may send the first card to the terminal directly; or it may send the identifier of the first card, in which case the terminal determines the corresponding card from pre-stored cards, or obtains the corresponding card from the server according to the identifier; or it may send the resource access path of the first card, in which case the terminal obtains the first card via that path.
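The three ways of obtaining the first card can be sketched in one resolution function; the field names and fetch callbacks are hypothetical.

```python
# Hypothetical resolution of the "card information" sent by the server: the
# card itself, a card identifier, or a resource access path.
def resolve_card(card_info, local_cards, fetch_by_id, fetch_by_path):
    if "card" in card_info:
        return card_info["card"]              # card sent directly
    if "card_id" in card_info:
        cid = card_info["card_id"]
        # prefer a pre-stored card; otherwise ask the server by identifier
        return local_cards.get(cid) or fetch_by_id(cid)
    return fetch_by_path(card_info["path"])   # download via the access path

local_cards = {"c1": "grid_card_template"}
card = resolve_card({"card_id": "c1"}, local_cards,
                    fetch_by_id=lambda cid: "fetched:" + cid,
                    fetch_by_path=lambda p: "fetched:" + p)
```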
  • the method further includes:
  • the first voice command control candidate set includes one or more instructions for controlling the display of the content of the first card
  • the method further includes:
  • a second display control instruction for the first card associated with the second instruction is sent to the terminal, and the second display control instruction is different from the first display control instruction.
  • different card types correspond to different voice command control candidate sets; the method also includes:
  • the information of the second card includes the second card itself, the identifier of the second card, or the resource access path of the second card; the type of the second card is different from the type of the first card;
  • the second voice command control candidate set includes one or more instructions for controlling the display of the content of the second card, the second voice command control candidate set is different from the first voice command control candidate set.
  • the method further includes:
  • the third voice command control candidate set includes one or more commands for display control of the content of the third card, the third voice command control candidate set is different from the first voice command control candidate set.
  • the types of cards include one or more of the following: plain text cards, picture-text cards, grid-type cards, and list-type cards.
  • the candidate set of the first control instruction includes one or more of moving, turning pages, or viewing details.
  • the candidate set of control instructions includes one or more of moving, page turning, selecting an object, or moving a selection box.
  • the candidate set of the first control instruction further includes: a control instruction for the button, and the control instruction is used to trigger the control.
  • the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
  • the session identifiers corresponding to the first voice and the second voice are the same.
  • the third aspect of the present application provides a resource display method, including:
  • the fourth aspect of the present application provides a terminal, including a processor, an internal memory, a display screen, and a communication interface;
  • the internal memory stores a computer-executable program; when the computer-executable program is invoked by the processor, the processor is configured to implement, through the communication interface or the display screen, any resource display method provided in the first aspect of the present application.
  • the fifth aspect of the present application provides a server, including a processor, a memory, and a communication interface;
  • the memory stores a computer-executable program; when the computer-executable program is invoked by the processor, the processor is configured to implement, through the communication interface, any resource display method provided in the second aspect of the present application.
  • Fig. 1 is a schematic diagram showing a recipe through a card
  • FIG. 2 is a schematic diagram of an embodiment of an application scenario of the present application
  • Fig. 3 is a flow chart of the first embodiment of the method for resource display in this application.
  • Figure 4a-1 is a schematic diagram of an embodiment of the graphic card of the present application.
  • Fig. 4a-2 is a schematic diagram of an embodiment of the details page corresponding to Fig. 4a-1 of the present application;
  • Fig. 4b is a schematic diagram of an embodiment of the grid-shaped card of the present application.
  • Fig. 4c is a schematic diagram of an embodiment of the list card of the present application.
  • Fig. 4d is a schematic diagram of an embodiment of the plain text card of the present application.
  • Fig. 4e is a schematic diagram of another embodiment of the list card of the present application.
  • Figure 4f-1 is a schematic diagram of displaying a card in response to "I want to buy Mate40" in the first embodiment of the resource display method of this application;
  • Figure 4f-2 is a schematic diagram of displaying a card responding to "Mate40 Porsche Design Edition" in the first embodiment of the resource display method of this application;
  • Fig. 5 is a schematic diagram of an embodiment of the development interface provided by the present application.
  • Fig. 6 is a flow chart of the first embodiment of the resource display method of the present application.
  • Fig. 7a is a schematic diagram of the second embodiment of the resource display method of this application.
  • Fig. 7b is a schematic diagram of the third embodiment of the resource display method of this application.
  • FIG. 8 is a schematic diagram of an embodiment of a terminal of the present application.
  • FIG. 9 is a schematic diagram of an embodiment of the server of the present application.
  • a user may choose to use a voice mode to interact with the terminal when using the terminal.
  • The recipe-related picture and text information displayed on the card page shown in FIG. 1 is taken as an example for illustration.
  • The terminal responds to the user's voice "how to make braised pork" and shows the braised pork recipe to the user through a card. Because the content is large, the current display interface only shows the first step of the recipe.
  • FIG. 1 shows that when the user then says "next step", the terminal recognizes "next step" as the name of a poem and displays the content of the corresponding poem through a card.
  • Cards are a new form of service.
  • Terminals can display multiple resources to users through cards or card-like displays.
  • the information of these resources includes information such as pictures, texts, controls, jump links, etc. Through these resources, interactive services can be provided to users.
  • A card can be in the form of a view or a component (Widget); it is a way of presenting information in an information box on a user interface.
  • the information box is usually in the shape of a rectangle or a rounded rectangle.
  • the embodiment of the present application will be described later by taking displaying information in the form of a card as an example. It can be understood that the terminal may also display information in other forms, not limited to cards, which is not specifically limited in this embodiment of the present application.
  • the embodiment of the present application provides a resource display method, which can realize the voice interaction between the user and the first resource displayed on the current interface. For example, as shown in Figure 1, based on the resource display method of the embodiment of the present application, when the user says "next step", the terminal can respond to the user's intention more accurately and display another part of the first resource (braised pork recipe), Or display the second resource (recipe details of braised pork) related to the first resource.
  • An application scenario of the embodiment of the present application is shown in the schematic diagram of FIG. 2, which includes a terminal 20, a server 24 and a third-party server 26.
  • the terminal 20 can be any terminal with a communication function, such as a mobile phone, a tablet computer, a wearable device, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a notebook computer, a vehicle-mounted terminal and other electronic devices.
  • The server 24 can be a general server or a cloud server, etc., and is used to provide the terminal 20 with the information required for user interface display. This information can include data provided by the third-party server 26, data of the server 24 itself, card templates, and more.
  • the server 24 can be realized by multiple servers, for example, the server 24 can be realized by a voice assistant server and a voice open server.
  • The server 24 can also be implemented by a single server; in this case, the information interaction steps that occur between servers in the multi-server implementation are omitted.
  • the third-party server 26 may be a general server, or a cloud server, etc., and is used to provide resource data required for the voice interaction process, such as the data displayed in the card in the user interface of the terminal 20 .
  • the terminal 20 includes a voice APP or voice service.
  • a voice APP can also be called a voice assistant APP.
  • Terminal 20 may also include a card engine.
  • the card engine can be another APP or another service, and it can also be integrated in the voice assistant APP as a functional module in the voice assistant APP.
  • the voice assistant APP may include an automatic speech recognition (ASR) module, and may also include a text-to-speech (TTS) module.
  • the voice assistant APP collects the user's voice through the microphone of the terminal, and submits the voice to the ASR module, which recognizes the voice as text and sends it to the voice assistant server.
  • the TTS module can convert the text sent by the voice assistant server into voice to play to the user.
  • the card engine is used to process the data to be displayed and display it in the form of a card to present to the user.
  • the card engine can integrate the received card with the data, and process/adapt the data to be displayed according to the card layout and configuration information (such as font size, theme color mode, etc.) to present to the user.
  • the life cycle of the card engine can be managed by the voice assistant APP, and the voice assistant APP will pull up and destroy it.
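The card engine's integrate-and-adapt step can be sketched as follows; the template syntax and configuration keys are hypothetical stand-ins for the card layout and configuration information mentioned above.

```python
# Hypothetical card-engine step: merge the received card template with the
# resource data and apply display configuration (font size, theme, ...).
def render_card(template, data, config):
    filled = template.format(**data)          # integrate card with data
    return {"content": filled,
            "font_size": config.get("font_size", 14),  # assumed defaults
            "theme": config.get("theme", "light")}

card = render_card("Recipe: {title}", {"title": "braised pork"},
                   {"font_size": 16})
```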
  • the voice assistant server may include a Natural Language Understanding (NLU) module and a Dialog Manager (DM) module.
  • the NLU module is used to analyze the received natural language to obtain corresponding semantics.
  • the semantics may include skill information, intent information, or slot information.
  • the DM module has the ability to orchestrate the services involved in the process of man-machine dialogue to control the process of man-machine dialogue, or to manage the context of man-machine dialogue to drive multiple rounds of man-machine dialogue.
  • the terminal 20 may also have an NLU module and a DM module to perform semantic analysis and speech dialogue association.
  • The voice open server can determine the skill corresponding to the voice command according to the semantics of the voice command, and can determine, according to the skill, the interface of the third-party server 26 from which resources (also called data) need to be obtained.
  • The voice open server can send the intent information and slot information in the semantics to the third-party server 26 to receive the resources fed back by the third-party server 26, and then provide them to the terminal 20 through the voice assistant server.
  • The third-party server 26 can also provide relevant card information, so that the voice open server can learn which card is used; the card is also provided to the terminal 20, together with the control intents used by the voice open server to generate multiple rounds of dialogue.
  • the third-party server 26 is a server that provides service resources.
  • the third-party server 26 may be a server of the same manufacturer as the server 24, or a server of a different manufacturer, which is not specifically limited in this embodiment of the present application.
  • S10 The terminal receives the user's first voice.
  • The terminal can continuously detect the user's voice to recognize the user's voice command, that is, the first voice, such as "how to make braised pork" in Figure 1.
  • a conversation process in which the user and the terminal perform voice interaction may include one or more rounds of dialogue between the user and the terminal.
  • Each session can be identified by a corresponding session identifier.
  • the session identifier may be sessionId or dialogId.
  • One session process can correspond to one skill. For example, when the user's voice input hits the skill of ordering takeaway, the electronic device can assign sessionId 1 to the session process. Subsequently, all conversations between the user and the electronic device under the skill of ordering takeaway belong to session 1, and their session identifiers are all sessionId 1.
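The one-skill-one-session rule can be sketched as follows (a toy illustration; real session identifiers would be generated by the dialogue system):

```python
import itertools

# Hypothetical session management: the first dialogue under a skill creates a
# session; every later round under that skill reuses the same identifier.
_counter = itertools.count(1)
_sessions = {}

def session_id_for(skill):
    if skill not in _sessions:
        _sessions[skill] = "sessionId %d" % next(_counter)
    return _sessions[skill]

a = session_id_for("order takeaway")
b = session_id_for("order takeaway")   # same skill, same session
c = session_id_for("recipe query")     # new skill, new session
```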
  • "How to make braised pork" in Figure 1 can be matched with the skill of "recipe query".
  • S15 The terminal sends the data corresponding to the first voice to the server.
  • the data corresponding to the first voice may be text corresponding to the first voice.
  • the terminal can recognize the user's voice and convert it into text.
  • The data corresponding to the first voice may be the received voice data as-is, the voice data after noise reduction processing, or compressed voice data.
  • the terminal sends the voice data to the server for processing.
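The choice of payload can be sketched as follows; the stage functions are hypothetical hooks for on-device ASR, noise reduction, and compression.

```python
# Hypothetical: prepare the "data corresponding to the first voice" to send to
# the server - recognized text if on-device ASR is available, otherwise raw,
# denoised, or compressed voice data.
def prepare_payload(audio, asr=None, denoise=None, compress=None):
    if asr is not None:
        return {"type": "text", "data": asr(audio)}
    if denoise is not None:
        audio = denoise(audio)
    if compress is not None:
        audio = compress(audio)
    return {"type": "audio", "data": audio}
```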
  • S20 The server identifies the semantics of the first voice, and obtains a corresponding first resource according to the semantics of the first voice.
  • the server may obtain the first resource from a third-party server according to the semantics of the first voice.
  • the semantics may include intent information, slot information, or skill information.
  • the recognition of the semantics corresponding to the text is performed on the server.
  • the identification of semantics can also be performed on the terminal, or the identification of semantics is jointly performed by the terminal and the server.
  • the NLU module can be deployed on the server or terminal, which can be composed of a deep neural network, such as a recurrent neural network (Recurrent Neural Network, RNN), a convolutional neural network (Convolutional Neural Network, CNN), Transformer-based bidirectional encoder (Bidirectional Encoder Representation from Transformers, BERT) and other network implementations.
  • The terminal can directly obtain the complete or partial semantics corresponding to the first voice (such as all or part of the intent, slot, and skill information) without completely relying on identification by the server, which reduces the communication requirements between the server and the terminal and at the same time protects the privacy of the terminal user.
  • the recognized semantics includes skill information
  • the server may acquire data (first resource) that satisfies the voice instruction and return it to the terminal.
  • the server can also determine a third-party server that can provide the resource, find an interface to call the third-party server according to the skill information, and send the intent information and slot information in the semantics to the third-party server through the interface, and from the third-party server to obtain the data that satisfies the voice instruction, that is, the first resource.
  • For example, the semantics of the user's voice "how to make braised pork" in Fig. 1 is recognized as "get the recipe: braised pork"; if the first resource corresponding to these semantics is not cached on the server, a third-party server that can provide the first resource is determined, and the recipe for cooking braised pork is obtained from that third-party server.
  • S25 The server sends the first resource to the terminal.
  • After the server obtains data related to the voice control command from itself or from a third-party server, it can send the data directly to the terminal, or process the data first, for example by integrating it, removing content that cannot be displayed, or re-orchestrating it, and then send the processed data to the terminal.
  • That is, the data obtained from the third-party server and the data sent by the server to the terminal may differ because of server-side processing. In this embodiment of the present application, all of these data can be regarded as the first resource.
  • S30 The terminal receives the first resource, and the terminal displays a first user interface, where content displayed on the first user interface includes a first part of the first resource.
  • The first resource includes a first part and at least one second part, described later. Due to the limitation of the display screen size, when the terminal cannot display all of the first resource on one page of the user interface, or when the first resource itself includes multiple parts displayed on multiple pages, the terminal displays the first user interface, and the first user interface includes the first part of the first resource.
  • For example, the cooking recipe for braised pork shown in Fig. 1 includes six steps displayed across six pages. What is displayed in the first user interface in Fig. 1 is the page for the first step of the recipe, that is, the first part of the first resource; the other steps of the recipe, not shown, correspond to the multiple second parts of the first resource.
  • In some cases, for example when the amount of data in the first resource is small, the first resource may include only the first part; that is, the terminal can display the entire first resource on one page of the first user interface.
  • S35 The terminal receives a second voice from the user; the second voice is used to control the displayed first resource.
  • the interaction can be performed through gesture operation or voice.
  • the user can view more content of the first resource, or view more content related to the first resource, through the second voice.
  • the second voice is related to the current dialogue context, and the server can determine the corresponding control instruction according to the semantics and context corresponding to the second voice.
  • the terminal may also use prompts on the first user interface to display words, sentences, etc. that the user can refer to, so as to improve the accuracy of the voice interaction.
  • Figure 4a-1 shows such a prompt message, "Okay, I have found this information, you can continue to tell me to check the details", which means that relevant information has been found according to the user's first voice "Who is Zhang San", and which prompts the user to continue the conversation by saying "view details". The prompt information shown in Figure 4e includes "Found these merchants, you can try to tell me which merchant to choose", indicating that the merchants shown in the picture were found according to the user's first voice "What's delicious nearby", and prompting the user to continue the conversation by choosing a merchant; the server can then accurately identify the "choose which merchant" semantics and transform it into a corresponding control instruction.
  • the second voice may correspond to the same session process as the first voice, for example, the session identifiers of the second voice and the first voice are the same.
  • S40 The terminal sends data corresponding to the second voice to the server.
  • For this step, reference may be made to step S15; details are not repeated here.
  • S45 The server recognizes the semantics corresponding to the second voice.
  • For this step, refer to step S20.
  • the second voice is used to control the display of the content related to the first resource displayed on the first user interface.
  • When the server recognizes the second voice, the NLU module can recognize the same semantics for different expressions of the same intention. For example, for the control command of sliding to the left, the user may express it as "slide left", "scroll left", "previous page", and so on, and the NLU module recognizes the same semantics for all of them.
  • the server determines a control instruction corresponding to the semantics of the second voice.
  • the server may directly determine the control instruction according to the semantics of the second voice.
  • the server may determine the control instruction according to the semantics of the second voice and the form of the content to be displayed through the user interface (for example, the form corresponding to the card type described later).
  • Such control includes but is not limited to: triggering a jump link to view the details page (that is, the second resource); page turning, such as sliding left, sliding right, sliding up, or sliding down; moving the selection box, such as previous, next, previous line, or next line; and selecting one of multiple selectable objects, such as selecting the nth one, which is equivalent to a certain control on the first user interface being triggered (for example, a button control being clicked). This will be described in detail later.
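The control categories above can be summarized as a small instruction vocabulary. The sketch below is illustrative only; the member names reuse tokens that appear later in this document (CHECK_DETAIL, Next, Previous, Select ItemID), while the enum structure itself is an assumption, not the patent's implementation.

```python
from enum import Enum

class ControlInstruction(Enum):
    """Control categories a displayed card can respond to (names follow the document)."""
    CHECK_DETAIL = "CHECK_DETAIL"      # trigger a jump link to view the details page
    NEXT = "Next"                      # page turning down/back
    PREVIOUS = "Previous"              # page turning up/forward
    MOVE_SELECTBOX = "SelectBox Move"  # move the selection box
    SELECT_ITEM = "Select ItemID"      # select the nth of multiple objects

# A recognized second voice is ultimately mapped to one instruction,
# optionally with a parameter (here, the item number for a selection).
instruction = (ControlInstruction.SELECT_ITEM, {"n": 4})
```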
  • S55 The server sends the control instruction to the terminal.
  • S60 The terminal receives the control instruction, and in response to the control instruction, displays a second part of the first resource, or displays a second resource related to the first resource.
  • the terminal in response to the control instruction corresponding to the second voice, may display the second part of the first resource on the first user interface, or display the second part of the first resource on another user interface.
  • The second part of the first resource may include part or all of the first resource other than the first part, and it may also include the first part together with other content; this is not specifically limited in this embodiment of the present application.
  • the terminal in response to the control instruction corresponding to the second voice, may display the second resource related to the first resource on the first user interface, or display the second resource on another user interface.
  • the second resource may be an additional resource.
  • the first resource may carry a jump link of the second resource.
  • The terminal can obtain the second resource by triggering the jump link of the second resource, or jump to the display interface corresponding to the jump link.
  • For example, if the control command corresponding to the second voice is "view details", the terminal can jump to the page displaying the details according to the jump link of the details page; the page can be an H5 page, an App page, a quick-application page, or the like.
  • the terminal may also detect the user's subsequent voice, so as to continue to interact with the user.
  • The subsequently detected voice can belong to the same conversation and perform other display controls on the first card. As shown in Fig. 1 for "how to make braised pork", the user can input the voice "next step", "view details", and so on to perform other display controls.
  • The subsequent voice detected by the terminal can also belong to another conversation: instead of continuing the previous conversation, it queries other information, such as "I want to buy a Mate40", and the terminal displays other content.
  • the resource displayed by the terminal may be displayed in a different manner from the previous one, and the executable control instructions may also be different from the previous one.
  • The first part of the resource displayed by the terminal through the first user interface may also be obtained from a third-party server in other ways, instead of through steps S10-S20 above. For example, here are two possible ways:
  • The first resource is provided to the terminal in response to a non-voice input on the terminal.
  • For example, the user may manually enter the first user interface of the terminal and click a certain icon displayed there (such as a recipe icon), or manually enter a certain word (such as "braised pork") in the search box provided by the terminal. The terminal sends the corresponding information to the server, and after obtaining the corresponding request, the server obtains the request result (that is, the first resource) from the third-party server and provides it to the terminal.
  • Alternatively, when the user uses a subscription function, the first resource may be actively pushed to the terminal by the third-party server through the server. For example, if a user subscribes to a certain recipe, when the third-party server provides a new "braised pork" recipe (that is, the first resource), the recipe is pushed to the terminal through the server, and when the user enters the first user interface, the pushed recipe (that is, the first resource) can be displayed.
  • When the terminal displays the first part or the second part of the first resource, or displays the second resource related to the first resource, the first user interface may be displayed in the form of a card. Because the content to be displayed differs, the card can take many different forms: different content layouts suit different card formats, or card templates. Therefore, the layout of the content displayed by the card can be determined by the card template adopted by the terminal/server.
  • The terminal/server can generate a candidate set of control instructions for displaying the first resource from the obtained first resource and the card template, and the candidate set can be used to match the control instruction corresponding to the semantics of the second voice.
  • For example, when the first resource is a recipe for braised pork and the first user interface displays only the first step of the recipe for "braised pork", the current card is a picture-text card containing multiple pages of content. The terminal/server can generate a candidate set of control instructions for the braised-pork recipe card, including control instructions such as "next page"/"next step"/"page turning" for switching to the second part for display, and control instructions such as "view details" for switching to the details page for display.
  • In contrast, if the card template is a picture-text card that does not contain multiple pages of content, the candidate set of control commands generated by the terminal/server does not include control instructions such as "next page"/"next step"/"page turning" for switching to a second part for display.
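The dependence of the candidate set on both the card template and the data to be displayed can be sketched as follows. The function name, the card-type strings, and the parameter names are illustrative assumptions; only the behavior (page-turning instructions are generated only for multi-page content) comes from the text above.

```python
def build_candidate_set(card_type: str, page_count: int, has_detail_link: bool) -> set:
    """Generate the candidate set of voice control instructions for one card."""
    candidates = set()
    if page_count > 1:                   # multi-page content -> page-turning instructions
        candidates |= {"Next", "Previous"}
    if has_detail_link:                  # the data carries a jump link -> view details
        candidates.add("CHECK_DETAIL")
    if card_type in ("grid", "list"):    # multiple selectable objects -> selection
        candidates |= {"Select ItemID", "SelectBox Move"}
    return candidates

# Braised-pork recipe: a picture-text card with six pages and a details link.
recipe_card = build_candidate_set("picture_text", page_count=6, has_detail_link=True)
# Single-page picture-text card: no page-turning instructions are generated.
single_page = build_candidate_set("picture_text", page_count=1, has_detail_link=True)
```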
  • In this embodiment of the present application, the card can be divided into a plain-text card, a picture-text card, a grid card, a list card, and the like.
  • There may be a control in the card, such as a button control, which corresponds to an instruction. For example, if the button control is a call button, it corresponds to a call operation (such as invoking a phone app and dialing a corresponding number); if the button control is a page-turning control (for example, "next page" is written on the control), it corresponds to an instruction for displaying the content of the next page.
  • A plain-text card refers to a card whose displayed content is text, or mainly text, such as the example shown in FIG. 4d.
  • There may be button controls on the plain-text card, and the user can trigger the function of a button control by voice or touch. This also applies to the other types of cards and will not be described in detail again when those card types are introduced later. Through the "view details" control, the terminal can switch to a page displaying the detailed content of the first resource, or jump, through the jump link of the details page, to a second page related to the details and display the details content (the second resource).
  • the candidate set of voice control commands may include commands corresponding to the needs of turning pages, moving, and viewing details, so that the user can also realize the above-mentioned needs of turning pages, moving, and viewing details of the displayed content through voice commands.
  • a candidate set of voice control instructions may be generated in advance according to the card template and the data to be displayed.
  • the candidate set of voice control instructions may include instructions corresponding to checking details (CHECK_DETAIL).
  • The candidate set of voice control commands can also include: instructions such as "previous page" and "previous step" for turning pages up/forward (Previous); instructions such as "next page" and "next step" for turning pages down/back (Next); and movement-related instructions such as "left" or "right" (equivalent to instructions for moving a finger on the screen to move the entire page displayed by the card).
  • the candidate set of voice control instructions may also include voice instructions corresponding to the functions of the controls. This is also applicable to other types of cards, and will not be described in detail when other types of cards are described later.
  • the candidate set of voice control instructions may be generated based on a card template and data to be displayed.
  • the candidate set of voice control instructions can also be generated based on a card template. This is also applicable to other types of cards, and will not be described in detail when other types of cards are described later.
  • The control commands in the candidate set of voice control commands are all pre-associated with controls of the card display that the terminal can respond to. Therefore, after receiving the user's voice command, the terminal/server can match the voice command (such as the intent of the voice command) against the control commands in the candidate set, and, if the match succeeds, execute the function of the corresponding command.
  • For example, the card shown in Figure 4d has three button functions, corresponding to the three buttons call, change country, and travel guide. With the technical solution of this embodiment of the application, the text displayed on each button, that is, the control name (corresponding to the button name in FIG. 4d), is used to generate a control command corresponding to that button. For example, when the semantics of the voice command in the user's voice is recognized as "call", the button whose control name is "call" is hit, and the control command corresponding to that button is sent to the terminal for execution.
  • the command corresponding to the "call” button If the dialed number is "0123-987654321", the terminal executes the command.
  • the command corresponding to the "Call” button can be triggered through the user's voice, and due to the robustness of semantic recognition, when the user says “call”, “outgoing", “telephone” and “dial”, all
  • the instruction corresponding to the "call” button can be executed.
  • A picture-text card refers to a card that displays pictures and text, or a card that shows only pictures without text, such as those in Figure 4a-1 and Figure 4a-2.
  • The layout of pictures and text can be top-bottom, left-right, surrounding, embedded, and so on. There may be button controls on the picture-text card, and the user can trigger the function of a button control by voice or touch.
  • The candidate set of voice control commands is similar to that of plain-text cards and may include commands corresponding to the needs of turning pages, moving, and viewing details. For details, please refer to the introduction of the candidate set of voice control commands for the plain-text card; details are not repeated here.
  • For example, the generated candidate set of voice control instructions may include the control instruction "check details (CHECK_DETAIL)". When the semantics of the user's voice command is recognized as "view details", the corresponding control command is hit and then executed: for example, the terminal jumps to the corresponding page according to the jump link, or pulls up an app to display the page of that jump link.
  • Figure 4a-1 shows an example of using a picture-text card. Here the card presents brief information about Zhang San, and the data displayed on this card includes a jump link.
  • The generated candidate set of voice control commands may include page-turning voice control commands, such as "previous page" and "previous step" for turning pages up/forward (Previous), and "next page" and "next step" for turning pages down/back (Next).
  • A grid card refers to a card whose content is displayed in the form of multiple grids or similar grids. The content displayed in a grid card is usually arranged in more than one row and more than one column, and each grid can include pictures and text, or only pictures or only text. The layout of the grids is usually multiple rows and multiple columns; for example, a grid card with 2 rows and 2 columns is a card composed of four graphics or pictures, and one with 3 rows and 3 columns is a card composed of nine graphics or pictures. For an example of a grid card, please refer to Figure 4b or Figure 4f-1.
  • The candidate set of voice control instructions can include instructions for page turning and moving. Since there are multiple grid objects, the candidate set can also include selection-related instructions such as "select the nth (Select ItemID: n)", used to indicate that one of multiple objects is selected and that the details page of the selected object is entered, and it may also include instructions for moving the selection box.
  • In addition, each grid object may have a number for selection, and the generated candidate set of voice control instructions may include instructions for selecting a certain number. Through the voice control command, one of the grid objects can be selected and controlled accordingly, for example entering the details page of the selected grid object, as shown in Figure 4f-2.
  • The numbers are generated sequentially by the server according to the position of each grid object. This also applies to list cards and will not be described in detail later.
  • The generated candidate set of voice control instructions can also include instructions for moving (Move) the selection box, such as moving the selection box up (SelectBox up), down (SelectBox down), left (SelectBox left), right (SelectBox right), to the next item (SelectBox Next), to the previous item (SelectBox Previous), and so on.
  • Note that grid cards usually support not only up-and-down movement but also left-and-right movement, whereas list cards usually move either up-and-down or left-and-right, and usually do not support both directions at the same time.
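The direction constraint described above can be sketched as follows; the function name, the layout strings, and the instruction tokens are illustrative assumptions built from the SelectBox instructions listed earlier.

```python
def selectbox_moves(card_type: str, layout: str) -> list:
    """Selection-box movement instructions allowed for a card type and layout."""
    common = ["SelectBox Next", "SelectBox Previous"]
    if card_type == "grid":
        # Grid cards support both vertical and horizontal movement.
        return common + ["SelectBox up", "SelectBox down",
                         "SelectBox left", "SelectBox right"]
    if card_type == "list" and layout == "vertical":
        return common + ["SelectBox up", "SelectBox down"]
    if card_type == "list" and layout == "horizontal":
        return common + ["SelectBox left", "SelectBox right"]
    return common

grid_moves = selectbox_moves("grid", layout="any")
list_moves = selectbox_moves("list", layout="vertical")
```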
  • The layout direction can also be taken into account when generating page-turning control instructions, to generate control instructions for turning pages left/right or up/down. For example, control instructions for sliding to the left/right to turn pages may be generated.
  • A list card means that the displayed content is presented in the form of a list, which can be a multi-column (two or more columns) horizontal layout or a multi-row (two or more rows) vertical layout. The displayed content of a list card can be pictures and text, only pictures, or only text, such as the examples shown in Figure 4c and Figure 4e.
  • The candidate set of voice control commands is similar to that of the grid card and can include instructions for turning pages, moving the selection box, and selecting one of multiple objects to enter its details page. For details, please refer to the introduction of the candidate set of voice control commands for grid cards; details are not repeated here.
  • For example, the list cards shown in Fig. 4c and Fig. 4e generate numbers for selecting list items, and the candidate set of voice control instructions may also include instructions related to "select the nth (Select ItemID: n)".
  • For example, the server recognizes that the semantics of the user's voice is "select the fourth" and hits the corresponding instruction. This number-triggered method is more convenient from the perspective of the user's voice input, and more simple and accurate from the perspective of recognition accuracy.
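Number-triggered selection can be sketched as mapping the recognized ordinal to a "Select ItemID:n" instruction. The ordinal table and the parsing approach are illustrative assumptions; only the "select the fourth" example and the "Select ItemID:n" token come from the text above.

```python
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}

def parse_selection(recognized_text: str):
    """Map semantics like 'select the fourth' to a 'Select ItemID:n' instruction."""
    for word, n in ORDINALS.items():
        if word in recognized_text:
            return f"Select ItemID:{n}"
    return None  # not a selection command

selection = parse_selection("select the fourth")
```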
  • It should be noted that the division of card types may differ in different instances. For example, a card comprising multiple squares arranged horizontally in a single row, or multiple squares arranged vertically in a single column, such as the card shown in Figure 4b, can belong to the grid type in some instances; in other instances, cards with multiple grids arranged horizontally in a single row or vertically in a single column may instead belong to the list type rather than the grid type.
  • Whether the terminal/server supports different card types, and whether there are different candidate sets of voice control commands for different card types, can be confirmed by whether it can respond to the user's different voice requests.
  • In addition, the matching of the above voice commands with the pre-generated control commands in the voice control command set may be implemented by precise matching or fuzzy matching, which is not specifically limited in this embodiment of the present application. For example, in fuzzy matching mode, when the text corresponding to the recognized voice is "next", it can match "next page".
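A minimal sketch of the two matching modes, assuming substring containment as the fuzzy criterion (the text does not fix a particular fuzzy algorithm, so this criterion is an assumption):

```python
def match_command(recognized: str, candidates: list, fuzzy: bool = True):
    """Match recognized text against the candidate control commands."""
    for cand in candidates:          # precise matching first
        if recognized == cand:
            return cand
    if fuzzy:                        # fall back to substring (fuzzy) matching
        for cand in candidates:
            if recognized in cand or cand in recognized:
                return cand
    return None

hit = match_command("next", ["previous page", "next page", "view details"])
```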
  • In addition, different expressions corresponding to the same voice command can be learned in advance, and the different expressions of the same control command can be further expanded. These different expressions can be provided by the third-party server that provides the data to be displayed, indicating the extension words and the corresponding control commands. For example, when the server generates control instructions for expressing the next page, "next step", "next page", and so on will be recognized during semantic recognition as the page-turning control instruction corresponding to the next page.
  • When data related to a forging process is provided, the third-party server may provide an extended description (for example, provided in text form) such as "next-page extension: next process"; then, during semantic recognition, "next process" will be recognized as the page-turning control instruction for the next page.
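The extension mechanism can be sketched as merging a third-party extension entry into the expression table consulted during semantic recognition. The text format of the extension and the table structure are illustrative assumptions.

```python
# Expressions the server already recognizes as the next-page instruction.
expansions = {"next page": "Next", "next step": "Next"}

def apply_extension(table: dict, extension_text: str) -> dict:
    """Merge a third-party extension like 'next page turning extension: next process'."""
    merged = dict(table)
    label, phrase = extension_text.split(":", 1)
    if "next" in label:  # the extension targets the next-page instruction
        merged[phrase.strip()] = "Next"
    return merged

expanded = apply_extension(expansions, "next page turning extension: next process")
# "next process" is now recognized as the next-page control instruction.
```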
  • card templates may have coarse-grained and fine-grained divisions.
  • In an optional manner, the card template can be stored in the terminal, and updates to the card template can be obtained from the server periodically or aperiodically; the update can be requested from the server by the terminal, or actively pushed by the server.
  • In this case, the server may send the terminal the identifier of the template used to display the first resource (the server may first obtain the template identifier from a third-party server), and the terminal reads the stored card template according to the template identifier so as to display the first part or the second part of the first resource. When the card template is stored in the terminal, the user may also select a favorite card template from multiple card templates to display the first part or the second part of the first resource.
  • the card template is stored in the server, and the server provides the terminal with the card template required for displaying the first resource.
  • The identifier of the card template to use may be obtained from the third-party server providing the first resource; for example, when the server obtains the first resource from the third-party server, it also obtains a card template identifier. The server determines the specific card template according to the identifier and provides it to the terminal together with the first resource, and the terminal uses the card template to display the first or second part of the first resource.
  • Alternatively, the server can provide the terminal with the resource access path of a card template, and the terminal obtains the card template according to that path. Resource access paths of various types of card templates may be provided to the terminal so that the terminal can download or update the various stored card templates, or only the resource access path of the card template corresponding to the first resource is provided, so that the terminal can download or update that card template.
  • the server may generate a candidate set of control instructions according to the card form (or card template) used when the user interface of the terminal displays data.
  • the server matches the recognized voice information of the user with the instructions in the candidate set, and then executes the corresponding control instruction after matching the control instruction in the candidate set.
  • The card form (or card template) can be further combined with the data (resources) to be displayed to generate the candidate set of voice control instructions for the card-control intent.
  • In addition, the server can provide a development interface in the form of a webpage, which enables third-party developers to select the type of card to be used. A card template (such as the template types shown on the left side of Figure 3) determines the card type, and basic configuration of the selected template can be performed through the development interface, such as the font size shown on the upper and right sides of Figure 5, whether to display titles, picture scale, picture style, or theme mode. These configuration items can be stored per third-party server, so that during use the corresponding configuration can be obtained according to the third-party identifier and applied by the terminal when displaying cards.
  • FIG. 3 also shows a sample of the selected template (such as the sample of the grid card in the middle of FIG. 3), whose font, proportions, and so on can change according to the configuration.
  • In this way, the developer of the third-party server (that is, the provider of the required data) can select the corresponding card template on the server, for example on the development page of the voice open server. The developer of the third-party server only needs to input the business data content, without inputting programming content (data formats, control intents) related to the business data, thereby reducing development difficulty; presets of the supported control instructions by the third party are also not required.
  • In a specific embodiment, the server includes a voice assistant server and a voice open server, and the terminal includes a voice assistant APP and a card engine. The application environment of this specific embodiment is a scenario in which a user initiates a card service through the voice assistant APP, and it includes the following steps:
  • S100 The terminal receives the user's first voice.
  • The voice assistant App of the terminal can start voice interaction with the user and receive voice commands from the user; for example, the user wakes up the voice assistant App through a voice wake-up word such as "Xiaoyi Xiaoyi".
  • the voice assistant APP in the terminal can call the terminal microphone to receive the user's voice, or the terminal can receive the user's voice through the microphone and provide it to the voice assistant APP.
  • the user's first voice received by the terminal is "I want to buy Mate40", where Mate40 is the model of the mobile phone.
  • the terminal sends the received first voice ("I want to buy Mate40") to the voice assistant server.
  • the terminal can send the first voice to the server.
  • What the terminal sends may be the first voice, or the processed text of the first voice, which is not limited in this embodiment of the present application.
  • After receiving the first voice, the voice assistant server processes it. The voice assistant server can perform semantic analysis on the first voice ("I want to buy Mate40") through the DM module and the NLU module, and obtain the result of the semantic analysis, which includes the skill, intent, and slots, such as "skill: Huawei shopping guide, intent: buy equipment, slot 1 - model: Mate40, slot 2 - device type: mobile phone".
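The semantic-analysis result can be represented as a simple skill/intent/slots structure. The field names and the dataclass form are illustrative assumptions; the values come from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticResult:
    """Result of semantic analysis: skill, intent, and slots."""
    skill: str
    intent: str
    slots: dict = field(default_factory=dict)

result = SemanticResult(
    skill="Huawei shopping guide",
    intent="buy equipment",
    slots={"model": "Mate40", "device type": "mobile phone"},
)
```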
  • the voice assistant server can send the results of semantic analysis (skills, intentions and slot-related data) to the voice open server.
  • The voice open server can identify, according to the skill (such as the skill name or skill ID), the third-party server that provides the service corresponding to the skill, and send the result of the semantic analysis ("skill: Huawei shopping guide, intent: buy equipment, slot 1 - model: Mate40, slot 2 - device type: mobile phone"), that is, the data related to the intent and slots, or to the skill, intent, and slots, to that third-party server through the preset interface.
  • The third-party server may acquire the corresponding data according to the intent and slot information, or according to the skill, intent, and slot information, and return the data (the first resource) to the voice open server.
  • the information returned by the third-party server to the open voice server may also include a card template ID (that is, which type of card is selected to display the data to be displayed).
• the voice open server may determine the card template according to the returned data that needs to be displayed.
• the voice open server can automatically determine which type of card template to select according to the data to be displayed.
• the voice open server may pre-record the card type for the data to be displayed returned by the third-party server.
• For example, the developer corresponding to the third-party server can select a card template for association when connecting the resources of the third-party server to the voice open server. Subsequent data returned by the third-party server can then be displayed based on that card template. If more than one template is selected, one can be chosen from the multiple associated card templates according to the content of the data.
  • An example for selecting a template may be a grid-shaped card template as shown in FIG. 5 , but this embodiment of the present application does not specifically limit it.
  • the information returned by the third-party server not only includes data associated with the first voice command (the first resource), but also includes a card template.
  • the voice open server can determine the card template according to the card template name or card template ID in the information returned by the third-party server.
• the voice open server may store detailed information of the card templates, or the voice open server may obtain the data of the corresponding card template from the card engine according to the ID of the card template.
• the card engine can be a module of the voice open server, or a module of another server or another device.
• the voice open server generates a candidate set of voice control instructions according to the card template and the data required for card display returned by the third-party server.
• For the same card template, different display data may correspond to different candidate sets of voice control commands. For example, when the displayed data does not need to be shown in multiple pages, the candidate set of voice control commands contains no candidate control commands such as page turning and moving.
• Optionally, the voice open server may also generate the candidate set of voice control instructions according to the card template alone. In this case, for the same card template, the candidate sets of voice control instructions corresponding to different display data are the same.
  • the candidate set of voice control instructions may be a complete set of possible display control instructions based on this type of card template.
• For example, the returned data includes 4 Mate models of mobile phones, and the page displaying these 4 models of mobile phones uses a grid-shaped template. Based on the grid-shaped card and the card data, the page for the above-mentioned 4 Mate mobile phones can be displayed (such as the page shown in Figure 4f-1, generated in subsequent steps), and the generated candidate set of voice control instructions includes: "select the nth" (Select ItemID: n), and moving the selection box to the next (SelectBox Next) or the previous (SelectBox Previous) item.
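The candidate-set generation just described can be sketched as a function of the card template type and the data to be displayed. The command strings follow the "Select ItemID: n" / "SelectBox Next" naming used above; the function itself, the `page_size` parameter, and the paging rule are illustrative assumptions.

```python
def build_candidate_set(template_type, items, page_size=4):
    """Generate a voice-control candidate set from a card template type
    and the data the card will display (illustrative sketch)."""
    commands = []
    if template_type == "grid":
        # One "select the nth" command per displayed object.
        commands += [f"Select ItemID: {n}" for n in range(1, len(items) + 1)]
        # Selection-box movement commands.
        commands += ["SelectBox Next", "SelectBox Previous"]
    # Page-turning commands only when the data spans multiple pages,
    # matching the note above that single-page data has no paging commands.
    if len(items) > page_size:
        commands += ["Page Next", "Page Previous"]
    return commands

phones = ["Mate40", "Mate40pro", "Mate40pro+", "Mate40 Porsche Design"]
candidates = build_candidate_set("grid", phones)
```

With 4 items on a single page, the resulting set contains only the four selection commands and the two selection-box movement commands.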
  • the target page can be entered directly, or the target page can be entered after receiving a similar voice command indicating "enter" from the user.
• the voice open server returns the data of the card template, the data required for card display, and the candidate set of voice control instructions to the voice assistant server.
  • the voice assistant server stores the candidate set of voice control commands for responding to subsequent voice commands.
  • the voice assistant server sends the data of the card template and the data to be displayed to the terminal.
  • the voice assistant server can send the card template to the terminal, or can send the card template identifier (such as the card template ID or the card template name) to the terminal when the terminal stores the card template.
  • the voice assistant server or voice open server may process the data to be displayed, such as identifying, extracting, and typesetting the displayed content.
  • the data to be displayed acquired from a third party and the data sent to the terminal for display are regarded as the data to be displayed or the first resource.
  • the voice assistant APP of the terminal displays the received data based on the card template.
• the voice assistant App of the terminal can pull up the card engine and transmit the data of the card template and the data required for card display to the card engine, and the card engine completes the rendering and display of the card, that is, displays the first part of the first resource.
• If the voice assistant server sends the card template itself, the terminal can use the template directly.
• If the voice assistant server sends the card template ID and various card templates are stored on the terminal, the card template to be used can be determined according to the received template ID.
• If the voice assistant server sends the resource access path of the card template, the terminal can access the resource access path to obtain the corresponding card template.
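The three template-resolution cases above (template sent directly, template ID looked up locally, or template fetched from a resource access path) can be sketched as one resolver. The message keys and the `fetch` callback are hypothetical names introduced for illustration.

```python
def resolve_card_template(info, local_templates, fetch=None):
    """Resolve the card template from the server's message, trying the
    three cases described above in order (illustrative sketch)."""
    if "template" in info:
        # Case 1: the full card template was sent directly.
        return info["template"]
    if info.get("template_id") in local_templates:
        # Case 2: templates are stored on the terminal; look up by ID.
        return local_templates[info["template_id"]]
    if "resource_path" in info and fetch is not None:
        # Case 3: access the resource path to obtain the template.
        return fetch(info["resource_path"])
    raise ValueError("card template cannot be resolved")

local = {"grid-4": "<grid card template>"}
tpl = resolve_card_template({"template_id": "grid-4"}, local)
```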
• For example, a grid-shaped card page as shown in Figure 4f-1 is generated, including the pictures of the 4 Mate model phones, the model names below the pictures, and the corresponding numbers 1-4.
  • S200 The terminal receives the user's second voice.
• the terminal can continue to detect the user's voice to realize multiple rounds of dialogue. Specifically, after the voice assistant App of the terminal displays the first part of the content of the first resource, the terminal can continue to receive voice commands from the user to continue the previous conversation. As shown in Figure 4f-1, the page also shows the voice commands that the user can use ("select the nth, previous/next"), so that the voice spoken by the user is as close as possible to a recognizable voice command, which facilitates accurately identifying the user's intent.
  • the second voice and the first voice belong to the same session process, such as the session ID of the second voice is the same as the session ID of the above-mentioned first voice.
• the voice assistant APP in the terminal can receive the user's voice, such as "previous", "next", or "the nth", and can perform further display control on the content displayed in the previous round of dialogue, such as moving the selection box.
  • the user's voice instruction of "select the fourth one" can be received.
  • the voice used to express the same intention may also be “the 4th”, “choose the 4th” and so on.
• S205 The terminal recognizes the received voice as text through the ASR module, and then sends the text to the voice assistant server.
  • the automatic speech recognition module converts speech to text as "select 4th".
• if the voice is "the 4th" or "choose the 4th", it will be converted into the text "the 4th" or "choose the 4th" accordingly.
  • the terminal may send the second voice or text corresponding to the second voice to the server for voice recognition.
  • the terminal can also recognize the voice locally without processing by the server.
  • the voice assistant server recognizes the semantics corresponding to the second voice, and matches the semantics corresponding to the second voice with the candidate set of voice control instructions to match the corresponding control instructions.
• the voice assistant server will recognize the same semantics for "select the 4th", "the 4th", and "choose the 4th"; assume that the recognized semantics in this example is "select the 4th".
• the voice assistant server matches "select the 4th" against the candidate set of voice control commands; for example, the matched complete control command is "CardAction: Select ItemID: 4".
• Alternatively, if in step S200 the received user voice is "next", then in this step the voice assistant server matches the recognized intention of "next" with "move the selection box to the next" in the candidate set of voice control commands, and then determines the control instruction corresponding to this card control intention, such as "CardAction: GeneralAction: Move Next".
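The matching step above can be sketched as a lookup from recognized semantics into the candidate set, producing the full control instruction. The normalization rules (the ordinal regex and keyword checks) are illustrative assumptions; a real system would use the NLU result rather than string matching.

```python
import re

def match_control_command(semantics, candidate_set):
    """Map recognized semantics onto a candidate control command and
    return the complete control instruction (illustrative sketch)."""
    # Hypothetical normalization: "select the 4th", "the 4th" and
    # "choose the 4th" all reduce to an item-selection intent.
    m = re.search(r"(\d+)(st|nd|rd|th)", semantics)
    if m and f"Select ItemID: {m.group(1)}" in candidate_set:
        return f"CardAction: Select ItemID: {m.group(1)}"
    if "next" in semantics and "SelectBox Next" in candidate_set:
        return "CardAction: GeneralAction: Move Next"
    if "previous" in semantics and "SelectBox Previous" in candidate_set:
        return "CardAction: GeneralAction: Move Previous"
    return None  # no match: fall back to normal dialogue handling

candidate_set = ["Select ItemID: 1", "Select ItemID: 2",
                 "Select ItemID: 3", "Select ItemID: 4",
                 "SelectBox Next", "SelectBox Previous"]
```

Returning `None` on no match corresponds to treating the voice as a new query rather than a card control instruction.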
  • the voice assistant server sends the determined control instruction to the voice assistant APP of the terminal.
• After the voice assistant server matches the control instruction corresponding to the second voice, it sends the instruction to the terminal, so that the terminal performs the corresponding display control on the card.
  • the voice assistant APP of the terminal sends the control command to the card engine, and the card engine makes the card respond to the control command.
  • the card engine can be a part of the module/function of the voice assistant app, or it can be a module/function independent of the voice assistant app.
  • the card engine is used to process and control the display of cards, and may be called by other names, which is not limited in this embodiment of the present application.
• For example, in response to the control command "CardAction: GeneralAction: Move Next" of step S210, the card engine moves the selection box in Figure 4f-1 down to the second grid object (that is, onto the picture corresponding to the Mate40pro), and this round of dialogue is over.
• The specific way for the card engine to respond to the command is: according to the message format agreed with the voice assistant APP, feed back a message including the information of the fourth item to the voice assistant APP. For example, the message format can be {"message2vassistant":"Mate40 Porsche Design"}, and the process continues to S230-S235 (indicated by a dotted line).
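The agreed message format can be sketched as a small JSON payload following the `{"message2vassistant": ...}` example above; the helper function name is an illustrative assumption.

```python
import json

def card_engine_feedback(selected_item):
    """Build the feedback message the card engine returns to the voice
    assistant APP after an item is selected, in the agreed JSON format
    (the key "message2vassistant" is taken from the example above)."""
    return json.dumps({"message2vassistant": selected_item})

msg = card_engine_feedback("Mate40 Porsche Design")
```

The voice assistant APP can then parse this payload and forward it to the voice assistant server for the next round of dialogue.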
  • the voice assistant APP can process the received messages according to the preset processing method, for example:
• After the voice assistant APP receives the message, it sends the message to the voice assistant server, and after the voice assistant server receives it, the next round of dialogue processing proceeds.
• For example, the details page data of the Mate40 Porsche Design and the template to be used are obtained from the third-party server and sent to the terminal for display (see steps S110-S150 for the specific principle, which will not be described in detail). Figure 4f-2 shows that in this next round of dialogue, the detailed introduction of the Mate40 Porsche Design displayed on the terminal, that is, the details page, adopts a picture-text template.
• In addition, the generated candidate set of voice control instructions corresponding to the card page in Figure 4f-2 includes: "previous page" and "next page" indicating page-turning instructions, and "show full details" for viewing details.
• The above description takes step S200, the second dialogue after the first round of man-machine dialogue, as an example; it is not difficult for those skilled in the art to understand that step S200 can begin at any time after the first round of man-machine dialogue.
• the terminal may also continue to receive a second voice to perform other display control on the card shown in FIG. 4f-2.
  • voice commands such as "next page” or "show full details”.
  • "display all detailed information” can jump to a third-party page for display, and the third-party page can be an H5 page, or an application or quick application page.
  • the terminal may also conduct other multiple rounds of conversations with the user to implement other conversation processes.
  • the terminal may display different resources with different types of cards (such as the above-described cards of text type, picture type, grid type, list type, etc.). Different types of cards may have the same candidate set of voice control commands.
• the candidate set of voice control commands includes one or more of moving, turning pages, and viewing details. Different types of cards can also have different candidate sets of voice control commands. Moreover, for the same type of card, when the data (resources) to be displayed are different, the cards can also have different candidate sets of voice control commands; for example, for the same type of card, single-page display and multi-page display have different candidate sets of voice control commands.
  • whether the terminal or the server supports different candidate sets of voice control commands can be verified by testing whether the terminal or the server can respond to specific commands of the user.
  • the present application further provides another embodiment of a method for displaying resources, which is applied to a terminal.
  • Relevant content in the foregoing embodiments is applicable to this embodiment, and will not be repeated here. Referring to the flow chart shown in Figure 7a, it includes the following steps:
  • S310 Receive the user's first voice.
  • S320 Send the data corresponding to the first voice to the server.
  • S330 Receive a first resource corresponding to the first voice sent by the server.
  • S340 Display a first user interface, where content displayed on the first user interface includes a first part of content of the first resource.
  • S350 Receive a second voice from the user; the second voice is used to control the displayed first resource.
  • S360 Display a second part of the content of the first resource in response to the control instruction corresponding to the second voice, or display a second resource related to the first resource in response to the control instruction corresponding to the second voice .
  • the content displayed on the first user interface includes the first part of the first resource, which specifically includes: displaying the first part of the first resource in the form of a card on the first user interface.
  • FIG. 1 , FIG. 4a-1 , FIG. 4b to FIG. 4f-1 are all embodiments of displaying the first part of the first resource in the form of a card.
  • the card includes multiple types, and the card type of the first card presented by the terminal is related to the content to be displayed by the first resource. For example, plain text cards, graphic cards, grid cards, list cards, etc.
• In an implementable manner, the first card corresponds to a first voice command control candidate set; the first voice command control candidate set includes one or more commands for display control of the content of the first card, and the first voice command control candidate set is related to the card type of the first card. Correspondingly, displaying the second part of the content of the first resource in response to the control command corresponding to the second voice, or displaying the second resource related to the first resource in response to that control command, specifically includes: when the command corresponding to the second voice matches a first command of the first voice command control candidate set, executing the first display control associated with the first command on the first card, so as to display the second part of the content of the first resource, or display the second resource related to the first resource.
• For a practicable manner of generating the first voice command control candidate set, reference may be made to the content described in step S130 above.
  • the embodiment described in FIG. 1 displays the content of the second part of the first resource
  • the embodiments described in FIG. 4a-1 and FIG. 4a-2 display the second resource related to the first resource.
• In an implementable manner, the method further includes: receiving a third voice of the user; and, if the instruction corresponding to the third voice matches a second instruction of the first voice command control candidate set, executing on the first card a second display control associated with the second instruction, where the second display control is different from the first display control.
  • the second voice can be a voice instruction to control the movement of the selection box (such as the voice is: select the next one or select the left/right, etc.), and the first display control is the control The movement of the selection box;
• the third voice can be a voice command for moving the card display content as a whole (corresponding to a finger touch-and-slide), such as "slide to the right" or "slide down", and the second display control is controlling the movement of the card display content, such as moving the displayed content to the left or upward, so as to move part of the content out of the card and move another part of the content (corresponding to part of the content of the second resource) into the card display.
• In an implementable manner, different card types correspond to different voice command control candidate sets. The method further includes: receiving the user's fourth voice; and displaying the third resource corresponding to the fourth voice with a second card, where the type of the second card is different from the type of the first card. The second card corresponds to a second voice command control candidate set, which includes one or more instructions for display control of the content of the second card, and the second voice command control candidate set is different from the first voice command control candidate set. This is illustrated by the embodiment composed of Fig. 4f-1 and Fig. 4f-2: Fig. 4f-1 displays the obtained first resource (that is, each Mate mobile phone model and picture) with a grid-type card (equivalent to the type of the first card), and Fig. 4f-2 displays the obtained third resource (that is, the introduction information of the Mate Porsche Design edition) with a picture-text card (equivalent to the type of the second card). The server generates a corresponding command control candidate set (corresponding to the first voice command control candidate set) according to the grid-type card and the obtained first resource, and generates another corresponding command control candidate set (corresponding to the second voice command control candidate set) according to the picture-text card and the obtained third resource.
  • the candidate set controlled by the first voice command is generated by the server based on the card type of the first card and the first resource. For example, as described above with reference to FIG. 1 , the terminal/server can generate a candidate set of control instructions for displaying the first resource through the obtained first resource and the card template.
• In an implementable manner, the method further includes: sending the data of the second voice to the server; and, upon receiving the result confirmed by the server that the instruction corresponding to the second voice matches the first instruction in the first voice command control candidate set, executing the first display control associated with the first instruction on the first card.
• In an implementable manner, the method further includes: receiving the fifth voice of the user; and displaying the fourth resource corresponding to the fifth voice with a third card, where the type of the third card is the same as the type of the first card, and the fourth resource is different from the first resource. The third card corresponds to a third voice command control candidate set; the third voice command control candidate set includes one or more instructions for display control of the content of the third card, and the third voice command control candidate set is different from the first voice command control candidate set.
• For example, Fig. 4a-1 displays the obtained first resource with a picture-text card (i.e., the first card), and Fig. 4a-2 displays the obtained fourth resource with another picture-text card (i.e., the third card). The server generates a corresponding command control candidate set (corresponding to the first voice command control candidate set, for example including a command to view details) according to the picture-text card and the obtained first resource, and generates another command control candidate set (corresponding to the third voice command control candidate set, for example including an instruction to move the content displayed by the card up/down as a whole, so as to display the part of the content of the current card that has not yet been displayed) according to the picture-text card and the obtained fourth resource.
  • the types of the cards include one or more of the following: plain text cards, picture-text cards, grid-type cards, and list-type cards.
• For example, Figure 4d is a plain text card; Figure 1, Figure 4a-1, Figure 4a-2, and Figure 4f-2 are picture-text cards; Figure 4b and Figure 4f-1 are grid-type cards; and Figures 4c and 4e are list-type cards.
  • the candidate set of the first control instruction includes one or more of moving, turning pages, or viewing details.
  • the candidate set of control instructions includes one or more of moving, turning pages, selecting an object, or moving a selection box.
• For an implementation of moving, please refer to the introduction to grid-type cards or list-type cards above.
  • the candidate set of the first control instruction further includes: a control instruction for the button, and the control instruction is used to trigger the control.
  • the first resource is a resource obtained from a third-party server, and the card type is selected by a third party corresponding to the first resource.
  • the session identifiers corresponding to the first voice and the second voice are the same.
  • an implementation is introduced with reference to FIG. 3 . Steps S10 - S60 shown in FIG. 3 may be in one session, so the same session ID may be used.
  • the present application also provides another embodiment of a method for displaying resources, which is applied to a server.
  • the relevant content in the above embodiments is applicable to this embodiment, and will not be repeated here. Referring to the flow chart shown in Figure 7b, including:
  • S420 Acquire a first resource according to the data corresponding to the first voice.
  • S430 Send the first resource to the terminal for display.
  • S440 Receive data corresponding to the user's second voice sent by the terminal; the second voice is used for the terminal to control the displayed first resource.
• S460 Send the control instruction to the terminal, so that the terminal displays the second part of the first resource in response to the control instruction, or, in response to the control instruction, displays the second resource related to the first resource.
• For an implementable manner of the above steps S410-S430, reference may be made to the content described in steps S15-S25 above.
• For steps S440-S460, refer to the content described in steps S35-S55 above.
• In an implementable manner, the method further includes: sending the information of the first card to the terminal, so that the terminal displays the first resource with the first card, where the information of the first card includes the first card, or the identifier of the first card, or the resource access path of the first card.
• In an implementable manner, the method further includes: generating a first voice command control candidate set according to the card type of the first card, or generating the first voice command control candidate set according to the card type of the first card and the first resource, where the first voice command control candidate set includes one or more instructions for display control of the content of the first card. Identifying the control instruction corresponding to the second voice and sending the control instruction to the terminal, so that the terminal responds to the control instruction, specifically includes: matching the command corresponding to the second voice with the control commands in the first voice command control candidate set; and, if the command corresponding to the second voice matches the first command in the first voice command control candidate set, sending to the terminal a first display control instruction for the first card associated with the first command, so that the terminal displays the second part of the first resource, or displays the second resource related to the first resource.
• For a practicable manner of generating the first voice command control candidate set, reference may be made to the content described in step S130 above.
  • the embodiment described in FIG. 1 displays the content of the second part of the first resource
  • the embodiments described in FIG. 4a-1 and FIG. 4a-2 display the second resource related to the first resource.
• In an implementable manner, the method further includes: receiving data corresponding to the user's third voice sent by the terminal; matching the command corresponding to the third voice with the control commands in the first voice command control candidate set; and, if the command corresponding to the third voice matches a second command of the first voice command control candidate set, sending to the terminal a second display control instruction for the first card associated with the second command, where the second display control instruction is different from the first display control instruction.
• In an implementable manner, different card types correspond to different voice command control candidate sets. The method further includes: receiving data corresponding to the user's fourth voice sent by the terminal; acquiring the third resource corresponding to the fourth voice; and sending the information of the second card and the third resource to the terminal, so that the terminal displays the third resource with the second card. The information of the second card includes the second card, or the identification of the second card, or the resource access path of the second card, and the type of the second card is different from the type of the first card. The method further includes: generating, according to the card type of the second card, a second voice command control candidate set associated with the second card, or generating, according to the card type of the second card and the third resource, the second voice command control candidate set associated with the second card; the second voice command control candidate set includes one or more instructions for display control of the content of the second card, and the second voice command control candidate set is different from the first voice command control candidate set.
• In an implementable manner, the method further includes: receiving data corresponding to the fifth voice of the user sent by the terminal; acquiring a fourth resource corresponding to the fifth voice; and sending the fourth resource to the terminal, so that the terminal displays the fourth resource with the third card, where the type of the third card is the same as the type of the first card, and the fourth resource is different from the first resource. The method further includes: generating, according to the card type of the third card, a third voice command control candidate set associated with the third card, or generating, according to the card type of the third card and the fourth resource, the third voice command control candidate set associated with the third card; the third voice command control candidate set includes one or more instructions for display control of the content of the third card, and the third voice command control candidate set is different from the first voice command control candidate set.
• For the implementation manner, refer to the above introduction to the implementation example involving the fifth voice on the terminal side; details are not repeated here.
  • the types of the cards include one or more of the following: plain text cards, picture-text cards, grid-type cards, and list-type cards.
• For example, Figure 4d is a plain text card; Figure 1, Figure 4a-1, Figure 4a-2, and Figure 4f-2 are picture-text cards; Figure 4b and Figure 4f-1 are grid-type cards; and Figures 4c and 4e are list-type cards.
  • the candidate set of the first control instruction includes one or more of moving, turning pages, or viewing details.
  • the candidate set of control instructions includes one or more of moving, turning pages, selecting an object, or moving a selection box.
• For an implementation of moving, please refer to the introduction to grid-type cards or list-type cards above.
  • the candidate set of the first control instruction further includes: a control instruction for the button, and the control instruction is used to trigger the control.
  • the first resource is a resource obtained from a third-party server, and the card type is selected by a third party corresponding to the first resource.
  • the session identifiers corresponding to the first voice and the second voice are the same.
  • an implementation is introduced with reference to FIG. 3 . Steps S10 - S60 shown in FIG. 3 may be in one session, so the same session ID may be used.
• FIG. 8 shows a schematic structural diagram of a terminal 20 according to an embodiment of the present application.
• The terminal 20 may include at least one of a cell phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device.
• the terminal 20 may include a processor 110, an internal memory 120, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a display screen 130, and the like.
  • the sensor module 140 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
• The terminal 20 accesses the mobile communication network or a wireless network through the mobile communication module 150 and the antenna 1, or through the wireless communication module 160 and the antenna 2, to realize data transmission with the server; receives the user's voice through the microphone 170C or the receiver 170B; plays voice through the loudspeaker 170A, or through the earphone interface 170D to an earphone or an external speaker connected to the earphone interface 170D; displays cards through the display screen 130; and realizes the recognition of voice as text, the conversion of text into voice, the execution of relevant control instructions, the rendering and display of cards, and the like.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 20 .
  • the terminal 20 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor can generate an operation control signal according to the instruction opcode and the timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 may be a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, and a camera through at least one of the above interfaces.
  • the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the terminal 20 .
  • the terminal 20 may also adopt different interface connection modes in the above embodiments, or a combination of multiple interface connection modes.
  • the wireless communication function of the terminal 20 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor, etc.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in terminal 20 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the terminal 20 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 130 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied on the terminal 20, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), Bluetooth low energy (BLE), ultra-wideband (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication solutions.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the terminal 20 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 20 can communicate with the network and other electronic devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the terminal 20 may implement a display function through a GPU, a display screen 130, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 130 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 130 is used to display images, videos and the like.
  • the display screen 130 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the terminal 20 may include one or more display screens 130 .
  • Internal memory 120 may be used to store computer-executable program code, including instructions.
  • the internal memory 120 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data created during the use of the terminal 20 (such as audio data, phonebook, etc.) and the like.
  • The internal memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
  • the processor 110 executes various functional methods or data processing of the terminal 20 by executing instructions stored in the internal memory 120 and/or instructions stored in a memory provided in the processor.
  • the terminal 20 may implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 can be set in the processor 110, or some functional modules of the audio module 170 can be set in the processor 110.
  • The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the terminal 20 can listen to music through the speaker 170A, or output an audio signal for a hands-free call.
  • The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to listen to the voice.
  • The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can put his mouth close to the microphone 170C to make a sound, and input the sound signal to the microphone 170C.
  • the terminal 20 may be provided with at least one microphone 170C.
  • the terminal 20 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals.
  • the terminal 20 can also be equipped with three, four or more microphones 170C to realize sound signal collection, noise reduction, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 170D is used for connecting wired earphones.
  • The earphone interface 170D may be a USB interface, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • FIG. 9 is a schematic structural diagram of a server 200 provided by an embodiment of the present application.
  • the server 200 includes: a processor 210 , a memory 220 , a communication interface 230 , and a bus 240 .
  • The server 200 performs data or instruction transmission with the terminal through the communication interface 230, and performs data or instruction transmission with a third-party server; functions such as natural language understanding, dialog management, card-control-intent generation, and storage are realized through the processor.
  • the steps or functions performed by the server described in the specific implementation manner will not be repeated here.
  • The communication interface 230 in the server 200 shown in FIG. 9 can be used to communicate with other devices.
  • the processor 210 may be connected to the memory 220 .
  • The memory 220 can be used to store program code and data. The memory 220 may thus be a storage unit inside the processor 210, an external storage unit independent of the processor 210, or a component including both a storage unit inside the processor 210 and an external storage unit independent of the processor 210.
  • The server 200 may also include a bus 240.
  • the memory 220 and the communication interface 230 may be connected to the processor 210 through the bus 240 .
  • The bus 240 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus 240 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used in FIG. 9, but this does not mean that there is only one bus or one type of bus.
  • the processor 210 may be a central processing unit (central processing unit, CPU).
  • The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 210 adopts one or more integrated circuits for executing related programs, so as to realize the technical solutions provided by the embodiments of the present application.
  • the memory 220 may include read-only memory and random-access memory, and provides instructions and data to the processor 210 .
  • a portion of processor 210 may also include non-volatile random access memory.
  • processor 210 may also store device type information.
  • the processor 210 executes the computer-executed instructions in the memory 220 to perform the operation steps of the above method.
  • The server 200 may correspond to the corresponding subject in the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the server 200 are for realizing the corresponding processes of each method; for the sake of brevity, they are not repeated here.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions described above are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • The embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it is used to execute a method for resource display, the method including at least one of the options described in the above embodiments.
  • the computer storage medium of the embodiment of the present application can adopt any combination of one or more computer-readable mediums.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).


Abstract

The present application relates to the technical field of voice control under artificial intelligence, and in particular to a method for resource display, including: a terminal receives a first voice of a user (S310) and sends data corresponding to the first voice to a server (S320); receives a first resource corresponding to the first voice sent by the server (S330); displays a first user interface, the content displayed on the first user interface including a first part of the first resource (S340); receives a second voice of the user, the second voice being used to control the displayed first resource (S350); and, in response to a control instruction corresponding to the second voice, displays a second part of the first resource, or, in response to the control instruction corresponding to the second voice, displays a second resource related to the first resource (S360). Since the control instruction corresponding to the second voice can be determined on the basis of the first resource, recognition of the control instruction is restricted to a certain range; the recognized control instruction is therefore more accurate, that is, the user's voice control intent is understood more accurately.

Description

Method for resource display, terminal, and server
This application claims priority to the Chinese patent application with application number 202110579359.9, entitled "Method for resource display, terminal, and server", filed with the China National Intellectual Property Administration on May 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of voice control in the field of artificial intelligence, and in particular to a method, terminal, and server for resource display.
Background Art
At present, smart terminals such as mobile phones, tablets, smart screens, notebooks, and wearable devices are increasingly widely used. Terminals can interact not only on the basis of user gestures; voice interaction is also becoming more and more common.
Voice interaction can satisfy a user's basic needs, such as checking the weather, opening an application, or recommending food. However, if the voice interaction process is slightly more complex, for example when multiple rounds of dialog require display control over content obtained in a previous round, the terminal cannot accurately recognize the user's intent, and misrecognition or erroneous responses easily occur.
Summary of the Invention
In view of the above problems in the prior art, the present application provides a method, terminal, and server for resource display, so that in multi-round dialog, when recognizing the user's voice in the second or a later round, the control instruction corresponding to the user's voice is recognized on the basis of the first resource displayed in the first round, or in some round before the current round. The control instruction recognized from the voice is thereby restricted to a certain range, so the recognized control instruction is more accurate and the user's intent can be responded to accurately.
To achieve the above object, a first aspect of the present application provides a method for resource display, including:
receiving a first voice of a user;
sending data corresponding to the first voice to a server;
receiving a first resource corresponding to the first voice sent by the server;
displaying a first user interface, the content displayed on the first user interface including a first part of the first resource;
receiving a second voice of the user, the second voice being used to control the displayed first resource;
in response to a control instruction corresponding to the second voice, displaying a second part of the first resource, or, in response to the control instruction corresponding to the second voice, displaying a second resource related to the first resource.
Through the above steps, in multi-round dialog, the user's second voice can exert display control over the content obtained in a previous round (such as the above first resource), so as to display the second part of the first resource or to display a second resource related to the first resource. Since the control instruction corresponding to the user's voice in the second or a later round is recognized on the basis of the first resource displayed in the first round, or in some round before the current round (relative to the current round, that round can be regarded as the first round), the control instruction recognized from the voice is restricted to a certain range; the recognized control instruction is therefore more accurate, so the user's intent can be responded to accurately.
As a possible implementation of the first aspect, the content displayed on the first user interface including the first part of the first resource specifically includes: displaying, on the first user interface, the first part of the first resource in the form of a card.
Thus the terminal can present various resources to the user through a card or a card-like display form. A card may take the form of a view or a widget, and usually presents information on the user interface in an information box, which is typically a rectangle or rounded rectangle. In some possible implementations, the card may be a card displayed on the terminal's minus-one screen, or a quick-app card.
As a possible implementation of the first aspect, cards include multiple types, and the card type of the first card presented by the terminal is related to the content of the first resource that needs to be displayed.
Thus the card type to be used for the displayed first resource can be specified in advance (for example, by the provider of the first resource), so that the first resource can be displayed with a more suitable card type. Card types include, for example, plain-text cards, image-and-text cards, grid cards, and list cards.
As a possible implementation of the first aspect, the first card corresponds to a first candidate set of voice instruction controls, the first candidate set including one or more instructions for controlling the display of the content of the first card, and the first candidate set being related to the card type of the first card;
in response to the control instruction corresponding to the second voice, displaying the second part of the first resource, or, in response to the control instruction corresponding to the second voice, displaying the second resource related to the first resource, specifically includes:
if the instruction corresponding to the second voice matches a first instruction in the first candidate set, performing on the first card a first display control associated with the first instruction, so as to display the second part of the first resource or to display the second resource related to the first resource.
Thus, by generating the first candidate set of voice instruction controls, recognition of the control instruction corresponding to the second voice is performed by matching against this candidate set. Since the first candidate set is related to the card type in use, this effectively sets a range (i.e., the candidate set) for the recognizable control instructions, so the control instruction recognized for the second voice is more accurate.
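The matching step above can be illustrated with a minimal sketch. The command phrases, instruction names, and function names below are illustrative assumptions, not terms defined by the application:

```python
# Minimal sketch of matching a recognized utterance against the candidate
# set of voice-control instructions generated for the currently shown card.
# Phrases and instruction names are illustrative assumptions only.

FIRST_CANDIDATE_SET = {
    "next step": "SHOW_NEXT_PART",      # show the second part of the first resource
    "previous step": "SHOW_PREV_PART",
    "view details": "OPEN_DETAILS",     # show a second, related resource
}

def match_control_instruction(utterance_text, candidate_set):
    """Return the control instruction matched in the candidate set, or None.

    Restricting recognition to the candidate set is what narrows the range
    of interpretations and avoids mismatches such as the poem in FIG. 1.
    """
    normalized = utterance_text.strip().lower()
    return candidate_set.get(normalized)

# The second voice "next step" now resolves to a display control on the
# card instead of an unrelated search query.
instruction = match_control_instruction("Next step", FIRST_CANDIDATE_SET)
```

An utterance outside the candidate set simply fails to match, so it can be handled as a new query rather than a display control.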
As a possible implementation of the first aspect, the method further includes: receiving a third voice of the user; if the instruction corresponding to the third voice matches a second instruction in the first candidate set, performing on the first card a second display control associated with the second instruction, the second display control being different from the first display control.
For example, in the example corresponding to FIG. 4b or FIG. 4c, the second voice may be a voice instruction that moves the selection box (e.g., "select the next one" or "select the left/right one"), and the first display control is moving the selection box; the third voice may be a voice instruction that moves the displayed content of the card as a whole (corresponding to a touch-and-slide gesture, e.g., "slide right" or "slide down"), and the second display control is moving the displayed content of the card, for example moving the displayed content leftward or upward so that part of the content moves out of the card and another part (corresponding to part of the second resource) moves into the card for display. Thus, support for multi-round dialog with a second round (corresponding to the second voice) and a third round (corresponding to the third voice) can be achieved.
As a possible implementation of the first aspect, different card types correspond to different candidate sets of voice instruction controls. The method further includes: receiving a fourth voice of the user; displaying a third resource corresponding to the fourth voice with a second card, the type of the second card being different from that of the first card; the second card corresponding to a second candidate set of voice instruction controls, the second candidate set including one or more instructions for controlling the display of the content of the second card, and the second candidate set being different from the first candidate set.
For example, in the embodiment formed by FIG. 4f-1 and FIG. 4f-2, FIG. 4f-1 shows the obtained first resource (i.e., the Mate phone models and images) displayed with a grid card (corresponding to the type of the first card), and FIG. 4f-2 shows the obtained third resource (i.e., the obtained introduction to the Mate Porsche Design edition) displayed with an image-and-text card (corresponding to the type of the second card). The server generates a corresponding candidate set of instruction controls from the grid card and the obtained first resource (corresponding to the first candidate set of voice instruction controls), and generates a corresponding candidate set of instruction controls from the image-and-text card and the obtained second resource (corresponding to the second candidate set of voice instruction controls). Thus, support for multi-round dialog with a second round (corresponding to the second voice) and a third round (corresponding to the fourth voice) can be achieved.
As a possible implementation of the first aspect, the first candidate set of voice instruction controls is generated by the server on the basis of the card type of the first card and the first resource.
Since the first candidate set of voice instruction controls is related to the card type in use and to the first resource, this effectively sets a range (i.e., the candidate set) for the recognizable control instructions; the control instruction recognized for the second voice is therefore related to the card type and the first resource, and is thus more accurate.
As a possible implementation of the first aspect, after receiving the second voice of the user, the method further includes: sending data of the second voice to the server; and, upon receiving from the server the result confirming that the instruction corresponding to the second voice matches the first instruction in the first candidate set, performing on the first card the first display control associated with the first instruction.
Thus the instruction corresponding to the second voice is obtained by the server through matching against the first candidate set of voice instruction controls. Since the first candidate set is related to the card type in use and to the first resource, the control instruction recognized for the second voice is more accurate.
As a possible implementation of the first aspect, the method further includes: receiving a fifth voice of the user; displaying a fourth resource corresponding to the fifth voice with a third card, the type of the third card being the same as that of the first card, and the fourth resource being different from the first resource; the third card corresponding to a third candidate set of voice instruction controls, the third candidate set including one or more instructions for controlling the display of the content of the third card, and the third candidate set being different from the first candidate set.
This is illustrated with the embodiment formed by FIG. 4a-1 and FIG. 4a-2. FIG. 4a-1 shows the obtained first resource (i.e., a profile of Zhang San) displayed with an image-and-text card, and FIG. 4a-2 shows the obtained fourth resource (i.e., Zhang San's details obtained by triggering a jump link), also displayed with an image-and-text card. The server generates a corresponding candidate set of instruction controls from the image-and-text card and the obtained first resource (corresponding to the first candidate set, including, for example, an instruction to view details), and generates a corresponding candidate set of instruction controls from the image-and-text card and the obtained fourth resource (corresponding to the third candidate set, including, for example, an instruction to move the content displayed on the card upward/downward as a whole, so as to reveal the part the card has not yet displayed). Thus, support for multi-round dialog with a second round (e.g., corresponding to the second voice) and a third round (e.g., corresponding to the fifth voice) can be achieved.
As a possible implementation of the first aspect, card types include one or more of the following: plain-text cards, image-and-text cards, grid cards, and list cards.
Thus, when displaying through cards, different card types can be used.
As a possible implementation of the first aspect, for a plain-text card or an image-and-text card, the candidate set of first control instructions includes one or more of move, page-turn, or view details.
Thus the corresponding candidate set of first control instructions can be generated according to the above card type, and the generated candidate set can be related to the card type in use.
As a possible implementation of the first aspect, for a grid card or a list card, the candidate set of control instructions includes one or more of move, page-turn, select an object, or move the selection box.
Thus the corresponding candidate set of first control instructions can be generated according to the above card type, and the generated candidate set can be related to the card type in use.
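The correspondence between card types and candidate sets described above can be sketched as follows. The instruction names are illustrative assumptions, not terms defined by the application:

```python
# Sketch of generating a candidate set of voice-control instructions from
# the card type, following the per-type instruction lists given above.
# Instruction names are illustrative assumptions.

BASE_INSTRUCTIONS = {
    "plain_text": ["move", "page_turn", "view_details"],
    "image_text": ["move", "page_turn", "view_details"],
    "grid":       ["move", "page_turn", "select_object", "move_selection_box"],
    "list":       ["move", "page_turn", "select_object", "move_selection_box"],
}

def generate_candidate_set(card_type, has_button=False):
    """Build the candidate set for a card; a displayed button adds a
    trigger instruction, as described for cards that also show buttons."""
    candidates = list(BASE_INSTRUCTIONS[card_type])
    if has_button:
        candidates.append("trigger_button")
    return candidates
```

In practice the set could also be refined with the first resource itself (e.g., one selectable phrase per list entry), as the implementation that generates the candidate set from both card type and resource suggests.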
As a possible implementation of the first aspect, when the first card also displays a button, the candidate set of first control instructions further includes: a control instruction for the button, the control instruction being used to trigger the control.
Thus corresponding control instructions can also be generated from the buttons on the card, extending the content that can be controlled by voice.
As a possible implementation of the first aspect, the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
Thus the card type can be selected by the third party that provides the first resource, making it convenient for the third party to select a card type that matches the first resource.
As a possible implementation of the first aspect, the first voice and the second voice correspond to the same session identifier.
Thus the interaction flow of multi-round dialog can be implemented conveniently.
A second aspect of the present application provides a method for resource display, applied to a server, including:
receiving data corresponding to a first voice of a user sent by a terminal;
obtaining a first resource according to the data corresponding to the first voice;
sending the first resource to the terminal for display;
receiving data corresponding to a second voice of the user sent by the terminal, the second voice being used by the terminal to control the displayed first resource;
recognizing a control instruction corresponding to the second voice;
sending the control instruction to the terminal, so that the terminal, in response to the control instruction, displays a second part of the first resource, or, in response to the control instruction, displays a second resource related to the first resource.
Through the above steps of the server, the terminal can present various resources to the user through a card or a card-like display form. A card may take the form of a view or a widget, and usually presents information on the user interface in an information box, which is typically a rectangle or rounded rectangle. In some possible implementations, the card may be a card displayed on the terminal's minus-one screen, or a quick-app card.
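The server-side steps above can be sketched as a minimal two-phase handler. All names are illustrative, and resource retrieval and speech recognition are stubbed out for the sketch:

```python
# Minimal sketch of the server-side flow of the second aspect: the first
# voice fetches a resource; the second voice is matched against the
# candidate set associated with that resource's card. The stubbed lookups
# and all names are illustrative assumptions.

class ResourceDisplayServer:
    def __init__(self):
        self.sessions = {}  # session_id -> candidate set for the shown card

    def handle_first_voice(self, session_id, text):
        """Obtain the first resource and remember the candidate set."""
        resource = {"title": text, "parts": ["part 1", "part 2"]}  # stubbed fetch
        self.sessions[session_id] = {"next part": ("SHOW_PART", 1)}
        return resource

    def handle_second_voice(self, session_id, text):
        """Recognize the control instruction within the candidate set only."""
        candidate_set = self.sessions.get(session_id, {})
        return candidate_set.get(text.strip().lower())  # None if no match

server = ResourceDisplayServer()
server.handle_first_voice("s1", "braised pork recipe")
control = server.handle_second_voice("s1", "Next part")
```

The returned control instruction would then be sent to the terminal, which performs the associated display control on the card.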
For the various possible implementations of the second aspect listed below, their beneficial effects can be found in the corresponding possible implementations of the first aspect above and are not repeated here.
As a possible implementation of the second aspect, after obtaining the first resource according to the data corresponding to the first voice, the method further includes: sending information of a first card to the terminal, so that the terminal displays the first resource with the first card, where the information of the first card includes the first card itself, or an identifier of the first card, or a resource access path of the first card.
Thus the terminal can obtain the first card in multiple ways: for example, the first card is sent directly to the terminal; or the identifier of the first card is sent to the terminal, and the terminal determines the card corresponding to the identifier among pre-stored cards, or obtains the corresponding card from the server according to the card identifier; or the resource access path of the first card is sent to the terminal, and the terminal obtains the first card according to the path.
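Resolving the first card from whichever form of card information the server sent can be sketched as follows; the field names and stores are hypothetical:

```python
# Sketch of the three delivery options for the first card's information
# described above: the card itself, a card identifier, or a resource
# access path. Field names and the stores are illustrative assumptions.

PRESTORED_CARDS = {"card-42": {"type": "image_text"}}  # cards cached on the terminal

def resolve_card(card_info, fetch_by_path=None):
    """Return the card from a dict holding one of: card, id, or path."""
    if "card" in card_info:                      # option 1: card sent directly
        return card_info["card"]
    if "id" in card_info:                        # option 2: look up by identifier
        return PRESTORED_CARDS[card_info["id"]]
    if "path" in card_info and fetch_by_path:    # option 3: fetch via access path
        return fetch_by_path(card_info["path"])
    raise ValueError("no usable card information")

card = resolve_card({"id": "card-42"})
```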
As a possible implementation of the second aspect, after obtaining the first resource according to the data corresponding to the first voice, the method further includes:
generating a first candidate set of voice instruction controls according to the card type of the first card, or generating the first candidate set according to the card type of the first card and the first resource, the first candidate set including one or more instructions for controlling the display of the content of the first card;
recognizing the control instruction corresponding to the second voice and sending the control instruction to the terminal, so that the terminal responds to the control instruction, specifically includes:
matching the instruction corresponding to the second voice against the control instructions in the first candidate set;
if the instruction corresponding to the second voice matches a first instruction in the first candidate set, sending to the terminal a first display control instruction for the first card associated with the first instruction, so that the terminal displays the second part of the first resource or displays the second resource related to the first resource.
As a possible implementation of the second aspect, the method further includes:
receiving data corresponding to a third voice of the user sent by the terminal;
matching the instruction corresponding to the third voice against the control instructions in the first candidate set;
if the instruction corresponding to the third voice matches a second instruction in the first candidate set, sending to the terminal a second display control instruction for the first card associated with the second instruction, the second display control instruction being different from the first display control instruction.
As a possible implementation of the second aspect, different card types correspond to different candidate sets of voice instruction controls, and the method further includes:
receiving data corresponding to a fourth voice of the user sent by the terminal;
obtaining a third resource corresponding to the fourth voice;
sending information of a second card and the third resource to the terminal, so that the terminal displays the third resource with the second card, the information of the second card including the second card itself, or an identifier of the second card, or a resource access path of the second card, and the type of the second card being different from that of the first card;
and generating, according to the card type of the second card, a second candidate set of voice instruction controls associated with the second card, or generating, according to the card type of the second card and the third resource, the second candidate set associated with the second card, the second candidate set including one or more instructions for controlling the display of the content of the second card, and the second candidate set being different from the first candidate set.
As a possible implementation of the second aspect, the method further includes:
receiving data corresponding to a fifth voice of the user sent by the terminal;
obtaining a fourth resource corresponding to the fifth voice;
sending a third card or an identifier of the third card, and the fourth resource, to the terminal, so that the terminal displays the fourth resource with the third card, the type of the third card being the same as that of the first card, and the fourth resource being different from the first resource;
and generating, according to the card type of the third card, a third candidate set of voice instruction controls associated with the third card, or generating, according to the card type of the third card and the fourth resource, the third candidate set associated with the third card, the third candidate set including one or more instructions for controlling the display of the content of the third card, and the third candidate set being different from the first candidate set.
As a possible implementation of the second aspect, card types include one or more of the following: plain-text cards, image-and-text cards, grid cards, and list cards.
As a possible implementation of the second aspect, for a plain-text card or an image-and-text card, the candidate set of first control instructions includes one or more of move, page-turn, or view details.
As a possible implementation of the second aspect, for a grid card or a list card, the candidate set of control instructions includes one or more of move, page-turn, select an object, or move the selection box.
As a possible implementation of the second aspect, when the first card also displays a button, the candidate set of first control instructions further includes: a control instruction for the button, the control instruction being used to trigger the control.
As a possible implementation of the second aspect, the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
As a possible implementation of the second aspect, the first voice and the second voice correspond to the same session identifier.
A third aspect of the present application provides a method for resource display, including:
any possible implementation of the method for resource display provided by the first aspect of the present application, and any possible implementation of the method for resource display provided by the second aspect of the present application.
A fourth aspect of the present application provides a terminal, including a processor, an internal memory, a display screen, and a communication interface;
the internal memory is used to store a computer-executable program;
and the computer-executable program, when invoked by the processor, causes the processor to implement, through the communication interface or the display screen, any of the methods for resource display provided by the first aspect of the present application.
A fifth aspect of the present application provides a server, including a processor, a memory, and a communication interface;
the memory is used to store a computer-executable program;
and the computer-executable program, when invoked by the processor, causes the processor to implement, through the communication interface, any of the methods for resource display provided by the second aspect of the present application.
These and other aspects of the present application will be more clearly understood in the following description of the embodiment(s).
Brief Description of the Drawings
The various features of the present application and the relationships between them are further described below with reference to the accompanying drawings. The drawings are exemplary, some features are not shown to actual scale, and some drawings may omit features that are customary in the field concerned and not essential to the present application, or additionally show features that are not essential to the present application; the combinations of features shown in the drawings are not intended to limit the present application. In addition, throughout this specification, the same reference numerals refer to the same content. The specific drawings are described as follows:
FIG. 1 is a schematic diagram of displaying a recipe through a card;
FIG. 2 is a schematic diagram of an embodiment of an application scenario of the present application;
FIG. 3 is a flowchart of a first embodiment of the method for resource display of the present application;
FIG. 4a-1 is a schematic diagram of an embodiment of an image-and-text card of the present application;
FIG. 4a-2 is a schematic diagram of an embodiment of the details page corresponding to FIG. 4a-1 of the present application;
FIG. 4b is a schematic diagram of an embodiment of a grid card of the present application;
FIG. 4c is a schematic diagram of an embodiment of a list card of the present application;
FIG. 4d is a schematic diagram of an embodiment of a plain-text card of the present application;
FIG. 4e is a schematic diagram of another embodiment of a list card of the present application;
FIG. 4f-1 is a schematic diagram of a card display in response to "I want to buy a Mate40" in the first specific implementation of the method for resource display of the present application;
FIG. 4f-2 is a schematic diagram of a card display in response to "Mate40 Porsche Design edition" in the first specific implementation of the method for resource display of the present application;
FIG. 5 is a schematic diagram of an embodiment of a development interface provided by the present application;
FIG. 6 is a flowchart of the first specific implementation of the method for resource display of the present application;
FIG. 7a is a schematic diagram of a second embodiment of the method for resource display of the present application;
FIG. 7b is a schematic diagram of a third embodiment of the method for resource display of the present application;
FIG. 8 is a schematic diagram of an embodiment of a terminal of the present application;
FIG. 9 is a schematic diagram of an embodiment of a server of the present application.
Detailed Description of the Embodiments
The words "first, second, third, etc." in the specification and claims, or similar terms such as module A, module B, and module C, are used only to distinguish similar objects and do not represent a particular ordering of the objects. It can be understood that, where permitted, a particular order or sequence may be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
In the following description, the reference numerals indicating steps, such as S110, S120, etc., do not mean that the steps are necessarily performed in that order; where permitted, the order of the steps may be interchanged, or they may be performed simultaneously or in combination.
The term "comprising" used in the specification and claims should not be interpreted as being limited to what is listed thereafter; it does not exclude other elements or steps. It should therefore be interpreted as specifying the presence of the mentioned features, integers, steps, or components, without excluding the presence or addition of one or more other features, integers, steps, or components, or groups thereof. Accordingly, the expression "a device comprising means A and B" should not be limited to a device consisting only of components A and B.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the phrases "in one embodiment" or "in an embodiment" appearing in various places in this specification do not necessarily all refer to the same embodiment, but may do so. Furthermore, in one or more embodiments, the particular features, structures, or characteristics can be combined in any suitable manner, as would be apparent to a person of ordinary skill in the art from this disclosure.
With the development of intelligent voice interaction technology, a user can choose to interact with a terminal in voice mode. Take the image-and-text information related to a recipe shown on the card-display page in FIG. 1 as an example. In this example, in response to the user's voice "How do I make braised pork", the terminal shows the user the braised pork recipe through a card. Because the content is long, the current display interface shows only the first step of the recipe. When the user wants the card to show the second cooking step and says "next step", the terminal cannot recognize that "next step" should correspond to the intent of displaying the page for the second cooking step of braised pork, and cannot perform the operation the user expects, i.e., cannot accurately respond to the user's voice "next step". For example, FIG. 1 shows the terminal recognizing "next step" as the name of a poem and displaying the poem corresponding to "next step" through a card.
A card is a new service form; a terminal can present various resources to a user through a card or a card-like display form. The information of these resources includes pictures, text, controls, jump links, and so on, through which interactive services can be provided to the user. A card may take the form of a view or a widget, and is a way of presenting information on the user interface in an information box, typically a rectangle or rounded rectangle. The embodiments of the present application are subsequently described using the display of information in card form as an example. It can be understood that the terminal may also present information in other forms, not limited to cards, which is not specifically limited in the embodiments of the present application.
The embodiments of the present application provide a method for resource display that enables voice interaction between a user and a first resource displayed on the current interface. For example, in FIG. 1, based on the resource display method of the embodiments of the present application, when the user says "next step", the terminal can respond to the user's intent relatively accurately, displaying another part of the first resource (the braised pork recipe) or a second resource related to the first resource (the braised pork recipe details).
An application scenario of the embodiments of the present application can be seen in the schematic diagram shown in FIG. 2, which includes a terminal 20, a server 24, and a third-party server 26.
The terminal 20 may be any terminal with a communication function, for example an electronic device such as a mobile phone, a tablet computer, a wearable device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, or an in-vehicle terminal. Human-machine interaction can be achieved through the terminal 20, and content can be displayed in card form on the user interface.
The server 24 may be a general-purpose server or a cloud server, etc., and is used to provide the terminal 20 with the information needed for user-interface display. This information may include data provided by the third-party server 26 or the server 24's own data, and may also include card templates and the like. In some embodiments, the server 24 may be implemented by multiple servers; for example, the server 24 may be implemented by a voice assistant server and a voice open server. In some embodiments, the server 24 may also be implemented by one server, in which case the information-exchange steps between servers that exist in the multi-server implementation are absent.
The third-party server 26 may be a general-purpose server or a cloud server, etc., and is used to provide the resource data required by the voice interaction process, for example the data displayed in the card on the user interface of the terminal 20.
在一些实施例中,终端20包括语音APP或语音服务。语音APP又可以称之为语音助手APP。终端20还可以包括卡片引擎。其中,卡片引擎可以是另外一个APP或另外一个服务,其也可以集成在语音助手APP中作为语音助手APP中的一个功能模块。
在一些实施例中,语音助手APP可包括自动语音识别(ASR)模块,还可包括文本转语音(TTS)模块。语音助手APP通过终端的麦克风收录用户语音,并将该语音提交给ASR模块,由ASR模块将语音识别为文本,然后发送到语音助手服务器。TTS模块可以将语音助手服务器发送来的文本转化为语音,以播放给用户。
在一些实施例中,卡片引擎用于将要展示的数据进行处理通过卡片的形式进行显示,呈现给用户。在一些实施例中,卡片引擎可以将接收的卡片与所述数据进行整合,按照卡片的布局、配置信息(如字体大小、主题颜色模式等)将要展示的数据进行处理/适配,以呈现给用户。卡片引擎的生命周期可以由语音助手APP管理,由语音助手APP拉起和销毁。
在一些实施例中，语音助手服务器可包括自然语言理解（Natural Language Understanding，NLU）模块和对话管理（Dialog Manager，DM）模块。NLU模块用于将接收的自然语言通过语义分析得到相应的语义。其中，语义可包括技能信息、意图信息、或槽位信息。DM模块具有对人机对话过程中涉及的服务进行编排的能力，以控制人机对话的过程，或者说通过管理人机对话过程中的上下文，驱动多轮人机对话。在一种实现方式中，终端20也可以具有NLU模块和DM模块，以进行语义分析和语音对话关联。
在一些实施例中,语音开放服务器可以根据语音指令的语义确定语音指令对应的技能,并可以根据技能确定需要获取资源(或称为数据)的第三方服务器26的接口。语音开放服务器可将语义中的意图信息和槽位信息发送到第三方服务器26,以接收由第三方服务器26反馈的资源,然后通过语音助手服务器提供给终端20。在一些实施例中,如果第三方服务器26反馈的资源/数据在终端上通过卡片形式呈现时,第三方服务器26还可以提供相关的卡片信息,以使语音开放服务器可以据此得知所使用的卡片并提供给终端20,以及用于语音开放服务器生成多轮对话所使用的控制意图。
第三方服务器26是提供服务资源的服务器。第三方服务器26可以是与服务器24为同一厂商的服务器,也可以是不同厂商的服务器,本申请实施例对此不作具体限定。
下面参照图3示出的流程图,对本申请资源显示的方法的第一实施例进行介绍,包括以下步骤:
S10:终端接收用户的第一语音。
在用户对终端的语音APP或语音服务进行唤醒后,终端可以继续对用户的语音进行检测,以识别用户的语音指令,即第一语音,例如图1中的“红烧肉怎么做”。
用户与终端进行语音交互过程的一次会话过程可以包括用户与终端之间的一轮或多轮对话。每次会话可通过对应的会话标识标识。例如,该会话标识可以为sessionId或dialogId等。一般,一次会话过程可与一个技能对应。例如,当用户的语音输入命中点外卖的技能后,电子设备可为本次会话过程分配sessionId 1,后续,用户与电子设备在点外卖这一技能下的对话均属于会话1,其会话标识均为sessionId 1。图1中的“红烧肉怎么做”可以与“查询菜谱”的技能匹配。
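为便于理解，上述按技能为会话分配会话标识的过程可用如下Python草图示意。其中SessionManager类名、会话标识的字符串格式等均为为便于说明而作的假设，并非本申请限定的实现：

```python
import itertools

class SessionManager:
    """按技能为语音会话分配会话标识（sessionId）的示意实现。"""

    def __init__(self):
        self._counter = itertools.count(1)
        self._current = None  # (会话标识, 技能)

    def on_utterance(self, skill):
        # 命中的技能与当前会话不同则开启新会话，否则沿用当前会话标识
        if self._current is None or self._current[1] != skill:
            self._current = ("sessionId %d" % next(self._counter), skill)
        return self._current[0]

mgr = SessionManager()
sid1 = mgr.on_utterance("查询菜谱")   # “红烧肉怎么做”命中查询菜谱技能
sid2 = mgr.on_utterance("查询菜谱")   # “下一步”仍属于同一会话
sid3 = mgr.on_utterance("华为导购")   # “我想买Mate40”开启新会话
```

可见同一技能下的多轮对话共用会话标识，命中新技能时才分配新的会话标识。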
S15:终端将所述第一语音对应的数据发送给服务器。
在一些实施例中,第一语音对应的数据可以是第一语音对应的文本。终端可以对用户的语音进行识别,转化为文本。
在一些实施例中,第一语音对应的数据可以是接收到的语音数据,也可以是原语音数据经过降噪处理、或压缩后的语音数据。终端将语音数据发送给服务器进行处理。
S20:服务器识别所述第一语音的语义,根据所述第一语音的语义获得对应的第一资源。
在一些实施例中,服务器可以是根据第一语音的语义从第三方服务器获取所述第一资源。
在一些实施例中,所述语义可包括意图信息、槽位信息、或技能信息。
在本实施例中，对文本对应的语义的识别在服务器执行。在一些实施例中，语义的识别也可以在终端执行，或者语义的识别由终端与服务器联合执行。
当语义的识别在服务器或终端上执行时,服务器或终端上可布设NLU模块,该模块可以由深度神经网络,如循环神经网络(Recurrent Neural Network,RNN)、卷积神经网络(Convolutional Neural Network,CNN)、基于Transformer的双向编码器(Bidirectional Encoder Representation from Transformers,BERT)等网络实现。当语义的识别由终端与服务器联合执行时,深度神经网络的一部分可布设在终端上,另一部分可布设在服务器上。进一步的,若终端具备完整或部分的语义的识别功能,终端可直接获得第一语音对应的完整或部分的语义(比如意图信息、槽位信息、技能信息中的全部或部分)而无须完全依赖服务器进行识别,降低了服务器和终端之间的通信要求的同时,也保护了终端的使用者的隐私。
本实施例中,所识别的语义中包括技能信息,服务器可以获取自身满足语音指令的数据(第一资源)返回给终端。服务器也可以确定出可提供该资源的第三方服务器,根据技能信息找到调用第三方服务器的接口,通过该接口将语义中的意图信息和槽位信息发送到第三方服务器,从所述第三方服务器来获取满足语音指令的数据,即所述第一资源。例如,当图1中用户语音“红烧肉怎么做”被识别的语义为“获取菜谱:红烧肉”时,由于服务器上并未缓存有该语义对应的第一资源,则确定出可提供该第一资源的第三方服务器,并从该第三方服务器获取到红烧肉烹饪的菜谱。
S25:服务器将所述第一资源发送给所述终端。
在一种实现方式中,服务器在获取到自身或第三方服务器与语音控制指令相关的数据后,可以直接发送给终端,也可以先进行处理,例如整合数据、去除不能显示的内容或对数据进行编排等,然后将处理后的数据发送给终端。
可以理解的是,从第三方服务器获取的数据和服务器发送给终端的数据可能经过服务器的处理,本申请实施例中,针对这些数据都可以认为是第一资源。
S30:终端接收所述第一资源,并且,终端显示第一用户界面,该第一用户界面显示的内容包括第一资源的第一部分。
其中,第一资源包括第一部分,还包括后述的至少一个第二部分。基于显示屏幕大小的限制,终端无法通过用户界面的一个页面全部显示,或者第一资源本身即包括了通过多个页面显示的多个部分时,终端显示第一用户界面,所述第一用户界面包括第一资源中的第一部分。
例如图1示出的红烧肉的烹饪菜谱,包含了六个步骤,采用了六个页面的显示方式,图1中第一用户界面所显示的是该菜谱的第一步的页面,即对应第一资源的第一部分;尚未显示的该菜谱的其他步骤,对应第一资源的多个第二部分。
在一些实施例中,第一资源也可能仅包括第一部分,即终端可以通过第一用户界面的一个页面完整显示出该第一资源。例如在第一资源的数据量很少的情况下。
S35:终端接收用户的第二语音;所述第二语音用于对所述显示的第一资源进行控制。
用户希望查看第一资源相关的更多内容时,可以通过手势操作进行交互,也可以通过语音进行交互。
在本申请实施例的一种实现方式中，终端的第一用户界面上显示了第一资源的第一部分内容之后，用户可以在本次会话的基础上，通过第二语音，查看更多第一资源的内容，或查看与第一资源相关的更多的内容。该第二语音会与当前对话上下文有关，服务器可以根据该第二语音对应的语义和上下文确定相应的控制指令。
可选的，终端还可以在第一用户界面上采用提示的方式显示出用户可以参考说出的文字、语句等，以提高语音交互的准确性。例如图4a-1示出了这样的提示信息“好的，已经找到这些信息，可以继续对我说出查看详情”，表示已经根据用户的第一语音“张三是谁”找到了相关信息，并提示用户可以采用“查看详情”的方式继续对话；又如图4e示出的提示信息中包括“找到这些商家，你可以试着对我说选择第几个商家……”，表示已经根据用户第一语音“附近有什么好吃的”找到图中所显示的商家，并且提示用户可以采用“选择第几个商家”的方式继续对话，服务器可以准确识别该“选择第几个商家”的语义，进而可以转化为对应的控制指令。
第二语音可以与第一语音对应同一会话过程,如第二语音和第一语音的会话标识相同。
S40:终端将所述第二语音对应的数据发送给服务器。
该步骤可参见步骤S15,不再赘述。
S45:服务器识别出所述第二语音对应的语义。
该步骤可参见步骤S20。不同的是,本实施例中,第二语音是用于对第一用户界面显示的第一资源相关的内容进行显示控制。
可以理解的是，服务器对第二语音进行识别时，由于NLU模块本身对语义识别的鲁棒性，可以对相同意图的不同表达识别出相同的语义。例如对于向左滑的控制指令，用户可能表达为向左滑，还可能表达为向左滚动、前一页、上页，NLU模块可以将其识别为相同的语义。在一些实施例中，为了实现语义识别的泛化，还可以进一步在将各个语义与控制指令对应时，通过配置参数等方式，指明某控制指令可对应的扩展词汇，例如近义词。这样当NLU模型识别为这些扩展词，或者NLU模型输入的是这些扩展词时，也都能对应到所述控制指令上。
S50:服务器确定所述第二语音的语义对应的控制指令。在一些实施例中,服务器可以直接根据所述第二语音的语义确定出控制指令。
在一些实施例中,服务器可以根据所述第二语音的语义、所要通过用户界面所显示内容的形式(例如后面介绍的卡片类型对应的形式),来确定出控制指令。
在一些实施例中,所述控制包括但不限于:触发一跳转链接以查看详情页面(即第二资源);页面内容进行翻页:如向左滑、向右滑、向上滑、向下滑等;移动选择框,如上一个、下一个、上一行、下一行等;选中多个可选对象的某个对象,如选第n个,或等价于第一用户界面上某控件被触发(如按钮控件被点击)。将在后文对此进行详述。
S55:服务器将所述控制指令发送给终端。
S60:终端接收所述控制指令,响应所述控制指令,显示所述第一资源的第二部分,或者显示与所述第一资源相关的第二资源。
在一种实现方式中，终端响应所述第二语音对应的控制指令，可以在第一用户界面显示第一资源的第二部分，也可以在另一个用户界面显示第一资源的第二部分。第一资源的第二部分可以包括第一资源第一部分之外的一部分或全部，第一资源的第二部分也可以包括第一部分以及第一部分外的其他内容，本申请实施例对此不做具体限定。
在一种实现方式中,终端响应所述第二语音对应的控制指令,可以在第一用户界面显示与所述第一资源相关的第二资源,也可以在另一个用户界面显示第二资源。第二资源可能是另外获取的资源,例如第一资源中可以携带第二资源的跳转链接,终端可以通过触发第二资源的跳转链接获取第二资源,或者跳转到第二资源的跳转链接对应的显示界面。例如,第二语音对应的控制指令是查看详情,终端可以根据详情页面的跳转链接跳转到显示详情的页面,该页面可以是H5页面,App页面,或者快应用页面等。
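终端响应“查看详情”控制指令、通过第一资源携带的跳转链接获取第二资源的过程，可用如下Python草图示意。其中字段名detail_url、返回的动作元组以及示例URL均为为便于说明而作的假设：

```python
def handle_instruction(cmd, resource):
    # 若控制指令为查看详情且资源中携带跳转链接，则返回需要打开的页面地址
    if cmd == "CHECK_DETAIL" and resource.get("detail_url"):
        return ("OPEN_PAGE", resource["detail_url"])
    # 其他指令由各自的处理逻辑响应，此处仅作占位
    return ("NO_OP", None)

resource = {"title": "红烧肉菜谱", "detail_url": "https://example.com/detail"}  # URL仅为示例
action = handle_instruction("CHECK_DETAIL", resource)
```

终端拿到返回的页面地址后，即可跳转到H5页面、App页面或快应用页面进行显示。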
可选的,在步骤S60之后,终端还可以检测用户的后续的语音,以继续与用户进行交互。后续检测到的语音可以是同一会话的语音,对第一卡片进行其他的显示控制,如图1中的“红烧肉怎么做”中,用户可以输入语音“再下一步”或“查看详细内容”等进行其他显示控制。终端后续检测到的语音,也可以是其他会话的语音,不再接续之前的会话,查询其他信息,例如“我想买Mate40”,终端再显示其他内容。可选的,后续会话过程中,终端显示的资源的显示的方式可以与之前的方式不同,能执行的控制指令也可以与之前不同。
在一些实施例中,针对上述步骤S30,终端通过第一用户界面所显示的资源的第一部分,也可以通过其他方式从第三方服务器获得,而非上述步骤S10-S20,这种情况下则不需要包括步骤S10-S20。例如,下面列举了两种可能的方式:
第一种方式:在一些实施例中,第一资源是响应终端的非语音的输入后提供给终端的。例如,可以是用户手动进入终端的第一用户界面,点击第一用户界面所显示的某图标(如菜谱图标),或在终端提供的搜索框手动输入某词(如“红烧肉”),终端将相应信息发送到服务器,服务器获得相应的请求后,从第三方服务器获得相应的请求结果(即第一资源)后提供给终端。
第二种方式:在一些实施例中,当用户使用了订阅功能的情况下,第一资源可以是第三方服务器主动通过服务器推送到终端的。例如用户订阅了某菜谱,当第三方服务器提供了新的“红烧肉”菜谱(即第一资源)后,则通过服务器推送到终端,当用户进入第一用户界面时,则可以显示该推送的菜谱(即第一资源)。
在一些实施例中,终端在显示第一资源的第一部分或第二部分时,或者在显示与第一资源相关的第二资源时,所述第一用户界面可以采用卡片的形式进行显示。由于需要展示的内容不同,卡片也会有多种不同的形式。不同的内容布局适用不同的卡片形式,或称为卡片模板。因此,通过采用的卡片模板终端/服务器也可以确定卡片所展示的内容的布局。
为了更准确有效地识别用户的意图，终端/服务器可以通过获取的第一资源以及卡片模板来生成针对该第一资源进行显示的控制指令的候选集合，该控制指令的候选集合可以用于匹配出第二语音的语义对应的控制指令。例如如图1所示，第一资源为红烧肉菜谱，第一用户界面只显示了“红烧肉”做法的第一步。当前的卡片属于图文型卡片，且包含多页内容，终端/服务器可以生成针对该红烧肉菜谱卡片的控制指令的候选集合包括：“下一页”/“下一步”/“翻页”等切换到第二部分进行显示的控制指令、“查看详情”等切换到详情页面进行显示的控制指令等。可选的，如果第一资源只有一个页面，卡片模板为图文型卡片，有可能终端/服务器生成的控制指令的候选集合就没有“下一页”/“下一步”/“翻页”等切换到第二部分进行显示的控制指令。
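上述根据卡片模板与第一资源（如页面数量、是否携带详情跳转链接）生成控制指令候选集合的逻辑，可用如下Python草图示意。指令名与卡片类型的字符串均为为说明而作的假设：

```python
def build_candidates(card_type, page_count, has_detail_link):
    # 根据卡片类型与需展示的数据生成语音控制指令的候选集合
    candidates = set()
    if has_detail_link:
        candidates.add("CHECK_DETAIL")           # 查看详情
    if page_count > 1:
        candidates.update({"Previous", "Next"})  # 多页时才生成翻页指令
    if card_type in ("宫格型", "列表型"):
        # 多对象卡片还可生成选择、选择框移动相关的指令
        candidates.update({"Select ItemID:n", "SelectBox Next", "SelectBox Previous"})
    return candidates

multi_page = build_candidates("图文型", page_count=6, has_detail_link=True)
single_page = build_candidates("图文型", page_count=1, has_detail_link=True)
```

可见同一卡片类型下，单页显示与多页显示的候选集合不同。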
在一些实施例中,按照卡片所展示的内容的呈现形式,卡片可以分为纯文本型卡片、图文型卡片、宫格型卡片、列表型卡片等。其中,卡片中还可以有控件,如按钮控件,该控件会对应一指令,例如,按钮控件为一电话呼出按钮时,该电话呼出按钮控件对应呼叫操作(例如调用电话APP并呼出相应的号码)的指令,又如,按钮控件为一用于翻页的控件(如控件上写着下一页)时,该翻页按钮控件对应显示下一页内容的指令。
可以理解的是,上述卡片的分类只是示例,卡片还可以有不同的分类方式,也可以有不同的命名等,本申请实施例对此不作具体限定。以下先以上述列出的四种卡片类型来进行举例说明。
1)纯文本型卡片:指卡片中的展示的内容是文本的卡片,或者,卡片中的展示的内容以文本为主的卡片,例如图4d所示的示例。
可选的,纯文本型卡片上可以有按钮控件,用户可以通过语音或触控的方式触发按钮控件的功能。对此,同样适用于其他类型的卡片,后文描述其他类型卡片时,将不再赘述。
该类型的卡片,如果当前显示的内容是第一资源(服务器返回的用于显示的数据)的部分信息时,用户可以通过触控卡片,来对显示的内容进行翻页、移动、查看第一资源的详情等,来进行显示控制。其中,查看第一资源内容的详情时,终端可以切换到显示第一资源的详细信息的内容的页面,也可以是终端通过详情页面的跳转链接,跳转到与该跳转链接对应的与详情相关的第二页面并显示详情内容(第二资源)。
相应的,语音控制指令的候选集合可包括实现翻页、移动、查看详情需求对应的指令,从而用户也可以通过语音指令实现上述对显示的内容进行翻页、移动、查看详情等需求。可选的,针对该类型的卡片,在获取到需要进行显示的数据(第一资源)时,可以根据卡片模板和需要显示的数据预先生成语音控制指令的候选集合。例如语音控制指令的候选集合可以包括查看详情(CHECK_DETAIL)对应的指令。当需要多页来显示数据时,该语音控制指令的候选集合还可以包括:“上一页”、“上一步”等往上/往前翻页(Previous)的指令,“下一页”、“下一步”等往下/往后翻页(Next)的指令;可选的,该语音控制指令的候选集合还可以包括“往上(Up)”、“往下(Down)”、“往左(Left)”或“往右(Right)”等移动相关的指令(等价于手指在屏幕移动,使卡片显示的页面的整体移动的指令);等。若卡片还包括控件,该语音控制指令的候选集合还可以包括与控件功能对应的语音指令。对此,同样适用于其他类型的卡片,后文描述其他类型卡片时,将不再赘述。
可选的,语音控制指令的候选集合可以是基于卡片模板和需要显示的数据生成的。语音控制指令的候选集合也可以是基于卡片模板生成。对此,同样适用于其他类型的卡片,后文描述其他类型卡片时,将不再赘述。
语音控制指令的候选集合中的控制指令都预先关联了终端能响应的与卡片显示关联的控制。因此，当接收到用户的语音指令后，终端/服务器可以将语音指令（如语音指令的意图）与语音控制指令的候选集合中的控制指令进行匹配，若匹配成功，则执行对应指令的功能，以实现翻页、移动、查看详情、触发控件操作等控制。对此，同样适用于其他类型的卡片，后文描述其他类型卡片时，将不再赘述。
进一步的,以图4d所示的纯文本型卡片的例子进一步进行说明:图4d所示卡片存在三个按钮功能,对应呼叫、更换国家、出行指南三个按钮,采用本申请实施例的技术方案会将该三个按钮功能中展示的文字,即控件名称(图4d中即对应按钮名称),作为可响应语音指令的文本。当语音指令匹配(或称被语音指令命中)控件名称(即对应按钮名称)时,会生成该按钮对应的控制指令。例如,识别用户语音中的语音指令的语义为“呼叫”,命中控件名称为“呼叫”的所述按钮,将该按钮对应的控制指令发送到终端以执行,假设该“呼叫”按钮对应的指令是呼出号码“0123-987654321”,则终端执行该指令。由此,通过用户语音即可实现触发该“呼叫”按钮对应的指令,并且,由于语义识别的鲁棒性,在用户说出“呼叫”、“呼出”、“电话”、“拨号”,都可以识别为语义为“呼叫”(即都可以识别语音指令为呼叫)时,都可以执行该“呼叫”按钮对应的指令。
2)图文型卡片:指一个卡片中展示的内容是图片和文字,或者,卡片中展示的内容只有图片没有文字的卡片,例如图4a-1和图4a-2。图片与文字的排布可以是上下排布、左右排布、包围排布、嵌入排布等等。
可选的,该图文型卡片上可以有按钮控件,用户可以通过语音或触控的方式触发按钮控件的功能。
该类型的卡片,语音控制指令的候选集合与纯文本型卡片类似,语音控制指令的候选集合可包括实现翻页、移动、查看详情需求对应的指令。具体可参见纯文本型卡片的语音控制指令的候选集合的介绍,不再赘述。
例如,当图文型卡片所对应的第三方提供的数据包含隐藏的对应详情页的跳转链接时,所生成的该语音控制指令的候选集合可包括“查看详情(CHECK_DETAIL)”的控制指令,当识别用户的语音指令的语义为“查看详情”时,命中该“查看详情”的对应控制指令,然后执行该控制指令,例如根据对应的跳转链接,跳转到对应的跳转链接对应的页面,或拉起某应用显示对应的跳转链接的页面。例如图4a-1示出的即采用了图文型卡片的一个例子,卡片此时呈现的是张三的简要信息,此卡片所展示的数据中包括跳转链接,图4a-1中“张三(Sam Zhang)”以下划线表示该数据包含跳转链接(也可以无这类显示),该跳转链接链接到用于展示张三的详细信息的地址,基于此生成查看详情对应的控制指令。当用户语音为“查看详情”、“详细信息”等类似语音,并被识别为语义为查看详情时,命中该“查看详情”的控制指令,假设该控制指令是访问所述跳转链接的指令,则将该指令发送到终端并被终端执行时,所呈现效果为终端的卡片响应用户语音指令,卡片展示的内容从张三的简要信息切换到所述跳转链接的地址所展示的张三的详细信息,可以是直接显示地址的对应的网页信息,或启动另一应用,如百度百科来提供对应的详情页面等。如图4a-2示出了仍通过卡片形式显示张三的详细信息的例子。
又如，图1示出的具有红烧肉的菜谱的烹饪步骤，也采用了图文型卡片来显示，其需要对应6个烹饪步骤的6个页面数据，即需要多页来显示数据，所生成的语音控制指令的候选集合中可包括翻页的语音控制指令，如“上一页”、“上一步”等往上/往前翻页（Previous）的指令，“下一页”、“下一步”等往下/往后翻页（Next）的指令。
3)宫格型卡片:是指以多个方格或类方格形式来展示的内容的卡片。宫格型卡片中显示的内容通常是多于一行和一列的卡片,宫格型卡片里显示的内容可以是包括图片和文字的卡片,或者,卡片中的展示的内容也可以只有图片或文字。宫格的布局通常为多行多列排列,例如,宫格型卡片如2行2列即是四个图文或者图片组成的卡片,3行3列即是9个图文或者图片组成的卡片,也可以是2行3列或者3列2行,或者超过3行或3列的图文或者图片组成的卡片。宫格型卡片的例子可参见图4b或图4f-1所示的示例。
该类型的卡片,语音控制指令的候选集合可包括实现翻页、移动对应的指令,并且,由于存在多个宫格对象时,语音控制指令的候选集合还可以包括:“选择第n个(Select ItemID:n)”的表示选择的相关指令,该指令用于表示选择多个对象中的一个,并进入所选择对象的详情页面,以及还可包括选择框移动的相关指令。
例如,图4b或图4f-1示出的宫格型卡片,每个宫格对象上可以具有用于选择的编号,生成的语音控制指令的候选集合中可包括:选择某编号的相关指令,使用该语音控制指令可以选中其中的一个宫格对象,并执行相应控制,如进入所选择的宫格对象的详情页,如图4f-2所示。其中,所述编号为服务器根据各个宫格对象的位置依次生成。对此,同样适用于列表型卡片,后文将不再赘述。
采用该类型卡片时，生成的语音控制指令的候选集合中还可包括用于执行选择框的移动（Move）的语音控制指令，如选择框的往上（SelectBox up）、往下（SelectBox down）、往左（SelectBox left）、往右（SelectBox right），下一个（SelectBox Next），上一个（SelectBox Previous）等。例如针对图4b示出的横向布局的宫格型卡片，假设当前选择框位于电影二的海报上（以粗体框线示意选择框），通过对应的语音控制指令可以将选择框移动到相邻的电影海报上。对此，同样适用于列表型卡片，后文将不再赘述。宫格型卡片和列表型卡片可能的一个区别是，宫格型卡片的移动相关的指令通常不仅可以包括上下方向移动，还可以包括左右方向移动，而列表型卡片通常是上下移动或者左右移动，通常不会同时支持上下方向和左右方向移动。
采用该卡片时,当生成语音控制指令的候选集合时,在生成翻页的控制指令时也可结合布局方向,生成向左/向右翻页、或者向上/向下翻页的控制指令。例如针对图4b示出的宫格型卡片横向布局时(即仅有两行),可生成向左/向右滑翻页的控制指令。
4)列表型卡片:指卡片中的展示的内容以列表的形式呈现,可以是多列(大于等于2)的横向布局型,或者多行(大于等于2)的纵向布局型。列表型卡片显示的内容可以是图片和文字,或者只有图片,或者只有文字组成的卡片,例如图4c和图4e所示的示例。
该类型的卡片,语音控制指令的候选集合与宫格型卡片类似,语音控制指令的候选集合可包括实现翻页、移动、用于表示选择多个对象中的一个并进入所选择对象的 详情页面、选择框移动的相关指令。具体可参见宫格型卡片的语音控制指令的候选集合的介绍,不再赘述。
图4c和图4e示出的列表型卡片,生成了用于选择列表项的编号,对应的,语音控制指令的候选集合还可以包括:“选择第n个(Select ItemID:n)”的相关指令。这样会方便用户通过说出编号的方式来进行操作,简化了用户的使用繁琐度。例如,服务器识别用户语音的语义为“选择第4个”时,因此能够对应到相应的第四个选项被选中的控制指令,采用该方式借助于编号触发,从用户语音输入角度来看会更便捷,从识别准确率角度来看会更简化和准确。
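将“选择第4个”“第4个”“选第4”等语音归一化为选择编号的控制指令，可用如下Python草图示意。指令字符串格式参考前述“Select ItemID:n”的写法，解析规则是为说明而作的简化假设（例如未处理“第四个”这类中文数字）：

```python
import re

def parse_select(text):
    # 从语音文本中提取数字编号，并归一化为选择指令
    m = re.search(r"(\d+)", text)
    if m and ("选" in text or "第" in text):
        return "CardAction:Select ItemID:%s" % m.group(1)
    return None
```

这样，不同表述的选择语音都可以对应到同一条选择控制指令上，借助编号触发也简化了识别。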
可选的，不同的实例中对卡片类型的划分可以不同。例如，在一种可实现方式中，对于包括单行横向排列的多个方格或单列纵向排列的多个方格的卡片，例如图4b示出的卡片，当其多个方格仅为一行或一列时，可以属于宫格型卡片。可选的，在另一种可实现方式中，单行横向排列的多个方格或单列纵向排列的多个方格的卡片也可以属于列表型卡片，而不属于宫格型卡片。
终端/服务器是否支持不同的卡片类型,以及是否针对不同的卡片类型有不同的语音控制指令的候选集合,可以通过是否能响应用户的不同的语音请求来确认。
可以理解的是，上述语音指令与语音控制指令集合中预生成的控制指令进行匹配，可以通过精确匹配实现，也可以通过模糊匹配实现，本申请实施例对此不作具体限定。例如在模糊匹配模式下，当识别到的语音对应的文本为“下一”时，即可匹配到“下一个”。另一方面，基于神经网络的鲁棒性，也可以预先学习到对应相同语音指令的不同表述方式。进一步的，也可以根据所要展示的数据的内容/领域等，进一步扩展表述相同控制指令的不同表达方式，这种不同的表达方式，可以由提供所要展示数据的第三方服务器提供，指明扩展词和所对应的控制指令。例如服务器生成的表述下一页的控制指令，语义识别时，会将“下一步”、“下一页”等都识别为对应下一页的翻页控制指令；当提供某锻造工艺的相关数据的第三方服务器提供了包括类似“下一页翻页扩展：下一工序”的扩展说明（如以文本方式提供），则语义识别时，会将“下一工序”识别为下一页的翻页控制指令。
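上述扩展词（近义词）并入词表与模糊匹配的机制可用如下Python草图示意。基础词表、扩展词表的数据结构均为为说明而作的假设：

```python
# 基础词表：表述到控制指令的映射
BASE_VOCAB = {"下一页": "Next", "下一步": "Next", "上一页": "Previous", "上一步": "Previous"}

def match_command(text, extensions=None):
    # 第三方服务器提供的扩展词（如 {"下一工序": "Next"}）并入基础词表
    vocab = dict(BASE_VOCAB)
    if extensions:
        vocab.update(extensions)
    if text in vocab:                 # 精确匹配
        return vocab[text]
    for word, cmd in vocab.items():   # 模糊匹配：如“下一”可命中“下一页”
        if word.startswith(text):
            return cmd
    return None
```

这样“下一工序”等领域扩展词与“下一”这类不完整表述都可以落到同一条翻页控制指令上。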
在一些实施例中,卡片模板可以有粗粒度和细粒度的划分。卡片模板可存储在终端中,并且,可以定期或不定期的从服务器获得卡片模板的更新,其中,该更新的方式可以是通过终端向服务器请求更新,或服务器主动推送更新。在一些实施例中,当卡片模板存储在终端中时,当显示上述第一资源时,可以由服务器发送给终端显示该第一资源所使用的模板标识(可以先由服务器从第三方服务器获取该模板标识),由终端根据该模板标识读取存储的卡片模板,以用于显示所述第一资源的第一部分或第二部分,在一些实施例中,当卡片模板存储在终端中时,也可以是用户从多个卡片模板中选择其所喜欢的卡片模板来进行第一资源的第一部分或第二部分的显示。
在一些实施例中,卡片模板存储在服务器中,服务器会将显示第一资源所需的卡片模板提供给终端。所使用的卡片模板的标识可以是从提供第一资源的第三方服务器获得。例如,在服务器从第三方服务器获取第一资源时,同时会获得一卡片模板标识,服务器根据该卡片模板标识确定出具体的卡片模板,连同第一资源提供给终端,终端 使用该卡片模板显示第一资源的第一部分或第二部分。
在一些实施例中，服务器可以提供给终端卡片模板的资源访问路径，终端可根据资源访问路径来获得卡片模板。在一些实施例中，提供给终端的可以是各类卡片模板的资源访问路径，从而终端可以下载或更新所存储的各类卡片模板。在一些实施例中，可以是在提供给终端第一资源时，提供给终端对应第一资源的卡片模板的资源访问路径，从而终端可以下载或更新该卡片模板。
服务器可以根据所述终端的用户界面展示数据时使用的卡片形式(或卡片模板),来生成控制指令的候选集合。服务器将识别的用户语音信息与候选集合中的指令进行匹配,进而在匹配到候选集合中的控制指令后,执行对应的控制指令。在一些实施例中,在生成卡片控制意图对应的语音控制指令的候选集合时,可以进一步将所示卡片形式(或卡片模板)和所要展示的数据(资源)结合,来生成卡片控制意图的语音控制指令的候选集合。
在另一些实施例中，如图5所示，服务器还可提供网页页面形式的开发界面，用于使得第三方开发者可通过该开发界面选择所要使用的卡片类型，其中可以采用卡片模板（如图5左侧示出的模板类型）的方式来确定卡片类型，并且可以通过开发界面对所选用模板进行基本配置，如图5上侧和右侧示出的字体大小、是否显示标题、图片比例、图片样式、或主题模式等配置。这些配置信息可对应不同的第三方服务器进行存储，以在使用时能够根据第三方的标识获取到对应的配置信息，用于终端在进行卡片展示时对应该配置进行展示。并且图5示出的开发界面还示出了所选模板的样例（如图5中间宫格型卡片的样例），该样例可随所述配置进行相应的字体、比例等变化。第三方服务器的开发人员（指展示所需数据的提供商）可以基于服务器，例如语音开放服务器的开发页面选择对应的卡片模板，第三方服务器的开发人员只需要输入业务数据内容，无需输入与业务数据相关的编程内容（数据格式、控制意图），从而降低了第三方用户在开发时的开发难度，可以不需要第三方用户预设支持的控制指令。
为了对上述实施例进一步的进行理解,下面参照图6示出的流程图,对本申请用于资源显示的方法的第一具体实施方式进行介绍。上述实施例中描述的内容均适用于图6所示实施例,此处不再赘述。该具体实施方式中,服务器包括语音助手服务器和语音开放服务器,终端包括有语音助手APP和卡片引擎,且该具体实施方式的应用环境为用户通过语音助手APP发起卡片服务的情景,包括以下步骤:
S100:终端接收用户的第一语音。
在一种可选的实现方式中,终端的语音助手App在被启动或被唤醒后,可以开启与用户的语音交互,接收用户的语音指令。例如,用户通过“小艺小艺”等语音唤醒词唤醒语音助手App。终端中的语音助手APP可以调用终端麦克风接收到用户语音,或终端通过麦克风接收用户语音并提供给语音助手APP。本具体实施方式中,例如,终端接收的用户的第一语音为“我想买Mate40”,这里Mate40是手机的型号。
S105:终端将接收到的第一语音(“我想买Mate40”)发送给语音助手服务器。终端可以将第一语音发送给服务器。终端发送的可以是第一语音,或者第一语音经过处理后的文本等,本申请实施例对此不作限定。
S110：语音助手服务器在接收到第一语音后，对第一语音进行处理。语音助手服务器可以通过DM模块和NLU模块，对第一语音（“我想买Mate40”）做语义分析，得到语义分析的结果，该结果包括技能、意图和槽位，如“技能：华为导购，意图：买设备，槽位1-型号：Mate40，槽位2-设备类型：手机”。语音助手服务器可以将语义分析的结果（技能、意图和槽位的相关数据）发送到语音开放服务器。
S115：语音开放服务器可以根据技能，如技能名称或技能ID，确认提供该技能对应的服务的第三方服务器，并将该语义分析的结果（“技能：华为导购，意图：买设备，槽位1-型号：Mate40，槽位2-设备类型：手机”）（意图和槽位的相关数据，或技能、意图和槽位的相关数据）通过预设的接口发给第三方服务器。
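步骤S110-S115中语义分析结果及其按技能路由到第三方服务器接口的过程，可用如下Python草图示意。字段名与接口注册表均为为便于说明而作的假设：

```python
# 语义分析结果的一种可能的数据表示（字段名为假设）
semantic = {
    "skill": "华为导购",
    "intent": "买设备",
    "slots": {"型号": "Mate40", "设备类型": "手机"},
}

def route_to_third_party(sem, skill_registry):
    # 语音开放服务器根据技能名称查找提供该技能服务的第三方服务器接口
    return skill_registry.get(sem["skill"])

endpoint = route_to_third_party(semantic, {"华为导购": "shop_api"})  # "shop_api"为假设的接口名
```

路由得到接口后，即可将意图和槽位的相关数据发给对应的第三方服务器。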
S120:第三方服务器可以根据意图和槽位信息获取对应的数据,或,根据技能、意图和槽位信息获取对应的数据,并将所述数据(第一资源)返回给语音开放服务器。
可选的,第三方服务器返回给语音开放服务器的信息还可以包括卡片模板ID(即选择哪类卡片类型来显示需要展示的数据)。
S125:语音开放服务器可以根据返回的需要显示的数据来确定卡片模板。
在一种实现方式中，语音开放服务器可以根据需要显示的数据来自动判断选择什么类型的卡片模板。
在另一种实现方式中,语音开放服务器可以预先记录该第三方服务器返回的需要显示的数据的卡片类型。
例如，第三方服务器对应的开发者可以在将第三方服务器的资源接入语音开放服务器时，选择卡片模板进行关联。后续该第三方服务器返回的数据都可以基于该卡片模板进行显示。如果选择了多个卡片模板，可以根据数据的内容从多个关联的卡片模板中选择一个。选择模板的示例可如图5所示的宫格型卡片模板，但本申请实施例不做具体限定。
在另一种实现方式中,第三方服务器返回的信息中不仅包括与第一语音指令关联的数据(第一资源),还可以包括卡片模板。语音开放服务器可以根据第三方服务器返回信息中的卡片模板名称或卡片模板ID等来确定卡片模板。
可选的，语音开放服务器可以保存有卡片模板的详细信息，语音开放服务器也可以根据卡片模板ID从卡片引擎获取对应的卡片模板的数据。卡片引擎可以是语音开放服务器的模块，也可以是其他服务器的模块或其他设备。
S130:语音开放服务器根据卡片模板和第三方服务器返回的卡片展示需要的数据生成语音控制指令的候选集合。
语音开放服务器根据卡片模板和第三方服务器返回的卡片展示需要的数据生成语音控制指令的候选集合。同一卡片模板,但不同的显示数据,可能对应的语音控制指令的候选集合不同。例如显示的数据不需要多页显示时,语音控制指令的候选集合就没有翻页、移动等候选控制指令。
可选的，语音开放服务器也可以仅根据卡片模板生成语音控制指令的候选集合。卡片模板相同时，不同的显示数据对应的语音控制指令的候选集合相同。例如，语音控制指令的候选集合可以为基于该类卡片模板的可能的显示控制的指令全集。
具体语音控制指令的候选集合,可以参考上述卡片类型介绍部分的描述,此处不再赘述。卡片的类型可以多样,上述示例作为参考,但不构成限制。
在上述“我想买Mate40”的实例中,假设所返回的数据包括了4种Mate型号的手机,显示这4种型号手机的页面采用了宫格型模板,基于宫格型卡片,以及该卡片可显示上述4种Mate手机的页面(例如在后续步骤中生成的如图4f-1示出的页面),所生成的语音控制指令的候选集合包括:“选择第n个(Select ItemID:n)”、表示选择框移动的下一个(SelectBox Next),上一个(SelectBox Previous)。在选择框移动到目标位置后,可以直接进入目标页面,或者等待接收到用户的表示“进入”等类似的语音指令后再进入目标页面。
S135:语音开放服务器将卡片模板的数据、卡片展示需要的数据、语音控制指令的候选集合返回给语音助手服务器。
S140:语音助手服务器存储所述语音控制指令的候选集合,用于响应后续语音指令。
S145:语音助手服务器将卡片模板的数据和需要显示的数据下发给终端。
语音助手服务器可以将卡片模板发送给终端,也可以在终端保存有卡片模板时,将卡片模板标识(如卡片模板ID、或卡片模板名称)发送给终端。
可选的,在将需要显示的数据(第一资源)发送给终端前,语音助手服务器或语音开放服务器可以对需要显示的数据进行处理,例如对显示的内容进行识别、提取、排版等。本申请实施例中将从第三方获取的需要显示的数据和发送给终端进行显示的数据均认为是需要显示的数据或第一资源。
S150:终端的语音助手APP基于卡片模板对接收的数据进行显示。
例如,终端的语音助手App可以拉起卡片引擎,将卡片模板的数据和卡片展示需要的数据传给卡片引擎,由卡片引擎完成卡片的渲染并进行展示,即对第一资源的第一部分进行展示。
当语音助手服务器发送了卡片模板时,则终端可以使用该模板。当语音助手发送的是卡片标识时,且终端上存储有各类卡片模板时,可以根据收到的模板标识确定出所使用的卡片模板。当语音助手服务器发送的是卡片模板的资源访问路径时,则终端可以访问该资源访问路径获得对应的卡片模块。
在上述“我想买Mate40”的实例中,根据卡片模板标识获得的宫格型模板,以及相应的要展示的数据,生成了如图4f-1示出的宫格型卡片页面,包括了4种Mate型号手机的图和位于图下方的型号,以及对应的编号1-4。
S200:终端接收用户的第二语音。
基于上述语音对话交互，终端可以继续检测用户的语音，实现多轮对话。具体的，终端的语音助手App显示第一资源的第一部分内容后，终端还可以继续接收用户的语音指令，接续之前的对话。其中，如图4f-1所示，显示了用户可以使用的语音指令“选第几个、上/下一个”，以便于用户所说的语音尽量贴近可识别的语音指令，便于精准地识别用户的意图。第二语音与第一语音属于同一会话过程，如第二语音的会话标识与上述第一语音的会话标识相同。
本实施例中,终端中的语音助手APP可以接收用户的语音,如“上一个、下一个、第n个”等语音,可以对上一轮对话显示内容,如选择框的移动等,以进行进一步显示控制。
在上述“我想买Mate40”的实例中，可以接收用户的“选择第4个”的语音指令。在一些实施例中，表示相同意图的用户语音也可以是“第4个”、“选第4”等等。
S205:终端将接收到语音由ASR模块识别为文本,然后发送到语音助手服务器。
例如，自动语音识别模块将语音转换成文本“选择第4个”。对应步骤S200中的例子，当语音为“第4个”、“选第4”时，则转换成文本“第4个”、“选第4”。
终端可以将第二语音或第二语音对应的文本等发送给服务器进行语音识别。可选的,终端具有语音识别功能时,也可以终端本地对语音进行识别,而不需要通过服务器处理。
S210:语音助手服务器识别第二语音对应的语义,并将第二语音对应的语义与语音控制指令的候选集合进行匹配,匹配出对应的控制指令。
其中,对于“选择第4个”、“第4个”、“选第4”语音助手服务器会识别为相同的语义,假设本例中所识别的语义为“选择第4个”。
然后,语音助手服务器识别出“选择第4个”与语音控制指令的候选集合中所匹配的匹配控制指令,如所匹配的完整的控制指令为“CardAction:Select ItemID:4th”。
再例如,假设步骤S200中,接收用户语音为“下一个”时,则本步骤中语音助手服务器识别出“下一个”与语音控制指令的候选集合中的“移动选择框到下一个”的意图相匹配,进而确定出该卡片控制意图对应的控制指令,如为“CardAction:GeneralAction:Move Next”。
S215:语音助手服务器向终端的语音助手APP发送所确定出的控制指令。
语音助手在匹配出第二语音对应的控制指令后,发送给终端,以使终端对卡片进行相应的显示控制。
S220-S225:终端的语音助手APP将所述控制指令发送给卡片引擎,通过卡片引擎使卡片响应该控制指令。
卡片引擎可以是语音助手App的一部分模块/功能,也可以是独立于语音助手App模块/功能。卡片引擎用于对卡片显示进行处理和控制,也可以是其他名称,本申请实施例对此不作限定。
假设控制指令是对应步骤S210中用户语音“下一个”的控制指令:“CardAction:GeneralAction:Move Next”,本步骤中,卡片引擎将图4f-1中的选择框下移到第二个宫格对象(即Mate40pro对应的图片上),此时本轮对话结束。
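卡片引擎响应选择框移动指令（如“CardAction:GeneralAction:Move Next”）时的边界处理，可用如下Python草图示意。指令字符串沿用前文示例，“越界时停在首/末项”是此处为说明而假设的一种处理方式：

```python
def move_selectbox(index, count, cmd):
    # 根据控制指令移动选择框：index为当前选中项下标（从0开始），count为宫格对象总数
    if cmd.endswith("Move Next"):
        return min(index + 1, count - 1)   # 已到末项则保持不动
    if cmd.endswith("Move Previous"):
        return max(index - 1, 0)           # 已到首项则保持不动
    return index
```

例如图4f-1中选择框位于第一个宫格对象时，响应“下一个”后移动到第二个宫格对象。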
假设控制指令是对应步骤S210中用户语音“选第4个”的控制指令“CardAction:Select ItemID:4th”，则本步骤中，卡片引擎响应该指令的具体方式为：根据与语音助手APP约定的消息格式，向语音助手APP反馈包括第四项的信息的消息，如该消息格式可以为{"message2vassistant":"Mate40保时捷设计版"}，并继续S230-S235（以虚线表示）。
S230-S235:语音助手APP可根据预设的处理方式对收到的消息进行处理,举例说明:
语音助手APP收到所述消息后,发送到语音助手服务器,语音助手服务器收到后,继续进行下一轮的对话处理。
例如,下一轮对话中,从第三方服务器获得Mate40保时捷设计版的详情页面数据及所使用的模板,并发送到终端显示(具体原理可参见步骤S110-S150,不再赘述),例如图4f-2示出了下一轮对话中,终端所显示的Mate40保时捷设计版的详细介绍,即详情页面,采用了图文型模板。并且,在下一轮对话中,图4f-2卡片页面对应的所生成的语音控制指令的候选集合包括:表示翻页指令的“上一页”、“下一页”、查看详情指令的“显示全部详情信息”。
其中,上述实施例中,上述步骤S200开始是进行的人机首轮对话之后的第二次对话,本领域技术人员不难理解,步骤S200开始可以是进行的人机首轮对话之后的任一次对话的过程。
在一种可选的实现方式中,终端还可以接续第二语音,对图4f-2中所示的卡片进行其他显示控制。例如“下一页”或“显示全部详情信息”等语音指令。其中,“显示全部详情信息”可以跳转到第三方的页面进行显示,该第三方页面可以是H5页面,也可以是应用或快应用页面。
在一种可选的实现方式中,终端还可以与用户进行其他多轮对话,实现其他会话过程。基于接收到的用户的语音,终端可以以不同类型的卡片(例如上述描述的文本型、图片型、宫格型、列表型等卡片)显示不同的资源。不同类型的卡片可以有相同的语音控制指令的候选集合,例如语音控制指令的候选集合包括移动、翻页、查看详情等一个或多个。不同类型的卡片也可以有不同的语音控制指令的候选集合,或者,针对同一类型的卡片,需要显示的数据(资源)不同,不同卡片也可以有不同的语音控制指令的候选集合,例如同一类型的卡片,单页显示和多页显示的语音控制指令的候选集合不同。可选的,终端或服务器是否支持有不同的语音控制指令的候选集合,可以通过测试终端或服务器是否能响应用户的特定的指令来验证。
多轮对话是否属于同一次对话,可以通过会话标识来区分。
基于上述实施例,本申请还提供了一种资源显示的方法的另一实施例,应用于终端。上述实施例中相关的内容均适用于本实施例,此处不再赘述。参见图7a示出的流程图,包括以下步骤:
S310:接收用户的第一语音。
S320:将所述第一语音对应的数据发送给服务器。
S330:接收所述服务器发送的与所述第一语音对应的第一资源。
S340:显示第一用户界面,所述第一用户界面显示的内容包括所述第一资源的第一部分内容。
S350:接收所述用户的第二语音;所述第二语音用于对显示的所述第一资源进行控制。
S360:响应所述第二语音对应的控制指令,显示所述第一资源的第二部分内容,或者,响应所述第二语音对应的控制指令,显示与所述第一资源相关的第二资源。
其中,上述步骤S310和步骤S320的一种可实现的方式,可参见上述步骤S10-S15所描述的内容。上述步骤S330-S350的一种可实现的方式,可参见上述步骤S30-S35所描述的内容。上述步骤S360的一种可实现的方式可参见上述步骤S60所描述的内容。
在一些实施例中,所述第一用户界面显示的内容包括第一资源的第一部分,具体包括:在所述第一用户界面,以卡片形式显示所述第一资源的第一部分。例如图1、图4a-1、图4b到图4f-1均为以卡片形式显示所述第一资源的第一部分的实施例。
在一些实施例中,所述卡片包括多个类型,所述终端呈现的第一卡片的卡片类型与所述第一资源需要显示的内容相关。例如纯文本型卡片、图文型卡片、宫格型卡片、列表型卡片等。
在一些实施例中,所述第一卡片对应第一语音指令控制的候选集合,所述第一语音指令控制的候选集合包括一个或多个对所述第一卡片的内容进行显示控制的指令,所述第一语音指令控制的候选集合与所述第一卡片的卡片类型相关;相应的,响应所述第二语音对应的控制指令,显示所述第一资源的第二部分内容,或者,响应所述第二语音对应的控制指令,显示与所述第一资源相关的第二资源,具体包括:若所述第二语音对应的指令与所述第一语音指令控制的候选集合的第一指令匹配时,对所述第一卡片执行与所述第一指令关联的第一显示控制以显示所述第一资源的第二部分内容,或者显示与所述第一资源相关的第二资源。其中,生成第一语音指令控制的候选集合的一种可实现的方式可参见上述步骤S130所描述的内容。另外,图1所描述的实施例为显示所述第一资源的第二部分内容,图4a-1和图4a-2所描述的实施例为显示与所述第一资源相关的第二资源。
在一些实施例中,所述方法还包括:接收所述用户的第三语音;若所述第三语音对应的指令与所述第一语音指令控制的候选集合的第二指令匹配时,对所述第一卡片执行与所述第二指令关联的第二显示控制,所述第二显示控制不同于所述第一显示控制。例如图4b或图4c示出对应的例子中,第二语音可以是控制选择框移动的语音指令(如语音为:选择下一个或选择左边的/右边的等等),第一显示控制是控制选择框的移动;第三语音可以是卡片显示内容整体移动(对应手指触摸滑动)的语音指令(如语音为:向右滑、向下滑等等),第二显示控制是控制卡片显示内容的移动,例如将所显示内容向左或向上方向移动,以将部分内容移出卡片,而将另一部分(对应第二资源的部分内容)移入卡片显示。
在一些实施例中,不同的卡片类型对应的语音指令控制的候选集合不同;所述方法还包括:接收用户的第四语音;以第二卡片显示所述第四语音对应的第三资源,所述第二卡片的类型不同于所述第一卡片的类型;所述第二卡片对应第二语音指令控制的候选集合,所述第二语音指令控制的候选集合包括一个或多个对所述第二卡片的内容进行显示控制的指令,所述第二语音指令控制的候选集合不同于所述第一语音指令控制的候选集合。例如,以图4f-1和图4f-2构成的实施例对此进行说明,图4f-1为以宫格型卡片(相当于第一卡片的类型)显示获得的第一资源(即获得各Mate手机 型号和图),图4f-2为采用图文型卡片(相当于第二卡片的类型)显示获得的第三资源(即获得的Mate保时捷设计版的介绍信息),服务器会根据宫格型卡片和获得的第一资源生成对应的指令控制的候选集合(对应第一语音指令控制的候选集合),会根据图文型卡片和获得的第二资源生成对应的指令控制的候选集合(对应第二语音指令控制的候选集合)。
在一些实施例中,所述第一语音指令控制的候选集合由服务器基于所述第一卡片的卡片类型和所述第一资源生成。例如前文参考图1所描述的终端/服务器可以通过获取的第一资源以及卡片模板来生成针对该第一资源进行显示的控制指令的候选集合的例子。
在一些实施例中,接收所述用户的第二语音之后,所述方法还包括:向服务器发送所述第二语音的数据;接收服务器确认的所述第二语音对应的指令与所述第一语音指令控制的候选集合的所述第一指令匹配的结果,对所述第一卡片执行与所述第一指令关联的第一显示控制。其中,一种可实现的方式可参见上述步骤S40、步骤S60所描述的内容。
在一些实施例中,所述方法还包括:接收用户的第五语音;以第三卡片显示所述第五语音对应的第四资源,所述第三卡片的类型与所述第一卡片的类型相同,所述第四资源不同于所述第一资源;所述第三卡片对应第三语音指令控制的候选集合,所述第三语音指令控制的候选集合包括一个或多个对所述第三卡片的内容进行显示控制的指令,所述第三语音指令控制的候选集合不同于所述第一语音指令控制的候选集合。例如,以图4a-1和图4a-2构成的实施例对此进行说明,图4a-1为以图文型卡片显示获得的第一资源(即对张三的简介),图4a-2为也采用图文型卡片显示获得的第四资源(即通过触发跳转链接获得的张三的详情信息),服务器会根据图文型卡片和获得的第一资源生成对应的指令控制的候选集合(对应第一语音指令控制的候选集合,例如包括查看详情的指令),会根据图文型卡片和获得的第四资源生成对应的指令控制的候选集合(对应第三语音指令控制的候选集合,例如包括使卡片所显示内容整体向上/下移动的指令,以显示出当前卡片尚未显示出来的部分)。
在一些实施例中,所述卡片的类型包括如下一个或多个:纯文本型卡片、图文型卡片、宫格型卡片、列表型卡片。例如图4d为纯文本型卡片,图1、图4a-1、图4a-2、图4f-2为图文型卡片,图4b、图4f-1为宫格型卡片,图4c、图4e为列表型卡片。
在一些实施例中,对于纯文本型卡片或图文型卡片,所述第一控制指令的候选集合包括移动、翻页、或查看详情中的一个或多个。其一种实现方式,可参见对纯文本型卡片、图文型卡片的介绍。
在一些实施例中,对于宫格型卡片或列表型卡片,所述控制指令的候选集合包括移动、翻页、选中对象、或移动选择框中的一个或多个。其一种实现方式,可参见对宫格型卡片或列表型卡片的介绍。
在一些实施例中,所述第一卡片还显示有按钮时,所述第一控制指令的候选集合还包括:对所述按钮的控制指令,所述控制指令用于触发所述控件。其一种实现方式,可参见纯文本型卡片中对按钮的介绍。
在一些实施例中,所述第一资源是从第三方服务器获取的资源,所述卡片类型由 第一资源对应的第三方选择。其一种实现方式,可参见在针对步骤S30终端通过第一用户界面所显示的资源的第一部分的介绍。
在一些实施例中,所述第一语音和所述第二语音对应的会话标识相同。这里参照图3对一种实现方式进行介绍,图3示出的步骤S10-步骤S60可以是在一次对话中,故可以采用相同的会话标识。
基于上述实施例,本申请还提供了一种资源显示的方法的又一实施例,应用于服务器,上述实施例中相关的内容均适用于本实施例,此处不再赘述。参见图7b示出的流程图,包括:
S410:接收终端发送的用户的第一语音对应的数据。
S420:根据所述第一语音对应的数据获取第一资源。
S430:将所述第一资源发送给所述终端进行显示。
S440:接收所述终端发送的所述用户的第二语音对应的数据;所述第二语音用于所述终端对所显示的所述第一资源进行控制。
S450:识别所述第二语音对应的控制指令。
S460:将所述控制指令发送给所述终端,以使所述终端响应所述控制指令,显示所述第一资源的第二部分内容,或者,响应所述控制指令,显示与所述第一资源相关的第二资源。
其中,上述步骤S410-S430的一种可实现的方式,可参见上述步骤S15-S25所描述的内容。步骤S440-S460的一种可实现的方式,可参见上述步骤35-S55所描述的内容。
在一些实施例中,在根据所述第一语音对应的数据获取第一资源之后,所述方法还包括:将第一卡片的信息发送给所述终端,以使所述终端以所述第一卡片显示所述第一资源,其中,所述第一卡片的信息包括所述第一卡片或所述第一卡片的标识或所述第一卡片的资源访问路径。其中,一种可实现的方式可参见上述步骤S125所描述的内容。
在一些实施例中,所述在根据所述第一语音对应的数据获取第一资源之后,所述方法还包括:根据所述第一卡片的卡片类型生成第一语音指令控制的候选集合,或,根据所述第一卡片的卡片类型和所述第一资源生成第一语音指令控制的候选集合,所述第一语音指令控制的候选集合包括一个或多个对所述第一卡片的内容进行显示控制的指令;所述识别所述第二语音对应的控制指令,将所述控制指令发送给所述终端,以使所述终端响应所述控制指令,具体包括:将所述第二语音对应的指令与所述第一语音指令控制的候选集合中的控制指令进行匹配;若所述第二语音对应的指令与所述第一语音指令控制的候选集合的第一指令匹配时,向所述终端发送与所述第一指令关联的对所述第一卡片的第一显示控制指令,以使所述终端显示所述第一资源的第二部分内容,或者显示与所述第一资源相关的第二资源。其中,生成第一语音指令控制的候选集合的一种可实现的方式可参见上述步骤S130所描述的内容。另外,图1所描述的实施例为显示所述第一资源的第二部分内容,图4a-1和图4a-2所描述的实施例为显示与所述第一资源相关的第二资源。
在一些实施例中,所述方法还包括:接收所述终端发送的所述用户的第三语音对应的数据;将所述第三语音对应的指令与所述第一语音指令控制的候选集合中的控制指令进行匹配;若所述第三语音对应的指令与所述第一语音指令控制的候选集合的第二指令匹配时,向所述终端发送与所述第二指令关联的对所述第一卡片的第二显示控制指令,所述第二显示控制指令不同于所述第一显示控制指令。其中,可实现方式,可以参见前述的对终端侧涉及第三语音时的可实现的例子的介绍,不再赘述。
在一些实施例中,不同的卡片类型对应的语音指令控制的候选集合不同;所述方法还包括:接收所述终端发送的所述用户的第四语音对应的数据;获取与所述第四语音对应的第三资源;将第二卡片的信息,以及所述第三资源发送给所述终端,以使所述终端以所述第二卡片显示所述第三资源;所述第二卡片的信息包括所述第二卡片或所述第二卡片的标识或所述第二卡片的资源访问路径,所述第二卡片的类型不同于所述第一卡片的类型;并根据所述第二卡片的卡片类型,生成与所述第二卡片关联的第二语音指令控制的候选集合,或,根据所述第二卡片的卡片类型和所述第三资源,生成与所述第二卡片关联的第二语音指令控制的候选集合,所述第二语音指令控制的候选集合包括一个或多个对所述第二卡片的内容进行显示控制的指令,所述第二语音指令控制的候选集合不同于所述第一语音指令控制的候选集合。其中,可实现方式,可以参见前述的对终端侧涉及第四语音时的可实现的例子的介绍,不再赘述。
在一些实施例中,所述方法还包括:接收所述终端发送的所述用户的第五语音对应的数据;获取与所述第五语音对应的第四资源;将第三卡片或第三卡片的标识,以及所述第四资源发送给所述终端,以使所述终端以所述第三卡片显示所述第四资源,所述第三卡片的类型与所述第一卡片的类型相同,所述第四资源不同于所述第一资源;并根据所述第三卡片的卡片类型,生成与所述第三卡片关联的第三语音指令控制的候选集合,或,根据所述第三卡片的卡片类型和所述第四资源,生成与所述第三卡片关联的第三语音指令控制的候选集合,所述第三语音指令控制的候选集合包括一个或多个对所述第三卡片的内容进行显示控制的指令,所述第三语音指令控制的候选集合不同于所述第一语音指令控制的候选集合。其中,可实现方式,可以参见前述的对终端侧涉及第五语音时的可实现的例子的介绍,不再赘述。
在一些实施例中,所述卡片的类型包括如下一个或多个:纯文本型卡片、图文型卡片、宫格型卡片、列表型卡片。例如图4d为纯文本型卡片,图1、图4a-1、图4a-2、图4f-2为图文型卡片,图4b、图4f-1为宫格型卡片,图4c、图4e为列表型卡片。
在一些实施例中,对于纯文本型卡片或图文型卡片,所述第一控制指令的候选集合包括移动、翻页、或查看详情中的一个或多个。其一种实现方式,可参见对纯文本型卡片、图文型卡片的介绍。
在一些实施例中,对于宫格型卡片或列表型卡片,所述控制指令的候选集合包括移动、翻页、选中对象、或移动选择框中的一个或多个。其一种实现方式,可参见对宫格型卡片或列表型卡片的介绍。
在一些实施例中,所述第一卡片还显示有按钮时,所述第一控制指令的候选集合还包括:对所述按钮的控制指令,所述控制指令用于触发所述控件。其一种实现方式,可参见纯文本型卡片中对按钮的介绍。
在一些实施例中,所述第一资源是从第三方服务器获取的资源,所述卡片类型由第一资源对应的第三方选择。其一种实现方式,可参见在针对步骤S30终端通过第一用户界面所显示的资源的第一部分的介绍。
在一些实施例中,所述第一语音和所述第二语音对应的会话标识相同。这里参照图3对一种实现方式进行介绍,图3示出的步骤S10-步骤S60可以是在一次对话中,故可以采用相同的会话标识。
图8示出了本申请实施例的一种终端20的结构示意图。
终端20可以包括手机、可折叠电子设备、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备、或智慧城市设备中的至少一种。本申请实施例对该终端20的具体类型不作特殊限制。
终端20可以包括处理器110，内部存储器120，天线1，天线2，移动通信模块150，无线通信模块160，音频模块170，扬声器170A，受话器170B，麦克风170C，耳机接口170D，传感器模块180，显示屏130等。其中传感器模块180可以包括压力传感器，陀螺仪传感器，气压传感器，磁传感器，加速度传感器，距离传感器，接近光传感器，指纹传感器，温度传感器，触摸传感器，环境光传感器，骨传导传感器等。
本实施例中，终端20通过移动通信模块150和天线1，或无线通信模块160和天线2接入移动通信网络或无线网络，实现与服务器的数据传输；通过麦克风170C或受话器170B实现接收用户语音；通过扬声器170A播放语音，或通过耳机接口170D向该耳机接口170D所接入的耳机或外接扬声器播放语音；通过显示屏130显示卡片；通过处理器110实现语音识别为文本、文本转换为语音、相关控制指令的执行、卡片的渲染和展示等等。这些具体的功能或其他未在此一一列举的功能，具体可参见上述各个实施例和具体实施方式中所描述的终端所执行的步骤或功能，不再赘述。
可以理解的是,本申请实施例示意的结构并不构成对终端20的具体限定。在本申请另一些实施例中,终端20可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
处理器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器，用于存储指令和数据。在一些实施例中，处理器110中的存储器可以为高速缓冲存储器。该存储器可以保存处理器110用过或使用频率较高的指令或数据。如果处理器110需要使用该指令或数据，可从该存储器中直接调用。避免了重复存取，减少了处理器110的等待时间，因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。处理器110可以通过以上至少一种接口连接触摸传感器、音频模块、无线通信模块、显示器、摄像头等模块。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对终端20的结构限定。在本申请另一些实施例中,终端20也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
终端20的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。终端20中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在终端20上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏130显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在终端20上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),蓝牙低功耗(bluetooth low energy,BLE),超宽带(ultra wide band,UWB),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation, FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,终端20的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得终端20可以通过无线通信技术与网络和其他电子设备通信。该无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。该GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
终端20可以通过GPU,显示屏130,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏130和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏130用于显示图像，视频等。显示屏130包括显示面板。显示面板可以采用液晶显示屏（liquid crystal display，LCD），有机发光二极管（organic light-emitting diode，OLED），有源矩阵有机发光二极管或主动矩阵有机发光二极管（active-matrix organic light-emitting diode，AMOLED），柔性发光二极管（flex light-emitting diode，FLED），Mini-LED，Micro-LED，Micro-OLED，量子点发光二极管（quantum dot light emitting diodes，QLED）等。在一些实施例中，终端20可以包括1个或多个显示屏130。
内部存储器120可以用于存储计算机可执行程序代码,该可执行程序代码包括指令。内部存储器120可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储终端20使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器120可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器120的指令,和/或存储在设置于处理器中的存储器的指令,执行终端20的各种功能方法或数据处理。
终端20可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模 块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。终端20可以通过扬声器170A收听音乐,或输出免提通话的音频信号。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当终端20接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。终端20可以设置至少一个麦克风170C。在另一些实施例中,终端20可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,终端20还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
图9是本申请实施例提供的一种服务器200的结构示意图。该服务器200包括：处理器210、存储器220、通信接口230、总线240。其中，服务器200通过通信接口230与终端进行数据或指令的传输，与第三方服务器进行数据或指令的传输。通过处理器实现自然语言理解的功能、对话管理的功能、卡片控制意图生成的功能、存储的功能等等，这些具体的功能或其他一些未在此一一列举的功能，具体可参见上述各个实施例和具体实施方式中所描述的服务器所执行的步骤或功能，不再赘述。
应理解，图9所示的服务器200中的通信接口230可以用于与其他设备之间进行通信。
其中,该处理器210可以与存储器220连接。该存储器220可以用于存储该程序代码和数据。因此,该存储器220可以是处理器210内部的存储单元,也可以是与处理器210独立的外部存储单元,还可以是包括处理器210内部的存储单元和与处理器210独立的外部存储单元的部件。
可选的，服务器200还可以包括总线240。其中，存储器220、通信接口230可以通过总线240与处理器210连接。总线240可以是外设部件互连标准（Peripheral Component Interconnect，PCI）总线或扩展工业标准结构（Extended Industry Standard Architecture，EISA）总线等。该总线240可以分为地址总线、数据总线、控制总线等。为便于表示，图9中仅用一条线表示，但并不表示仅有一根总线或一种类型的总线。
应理解,在本申请实施例中,该处理器210可以采用中央处理单元(central processing unit,CPU)。该处理器还可以是其它通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate Array,FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。或者该处理器210采用一个或多个集成电路,用于执行相关程序,以实现本申请实施例所提供的技术方案。
该存储器220可以包括只读存储器和随机存取存储器,并向处理器210提供指令和数据。处理器210的一部分还可以包括非易失性随机存取存储器。例如,处理器210还可以存储设备类型的信息。
在服务器200运行时，所述处理器210执行所述存储器220中的计算机执行指令，执行上述方法的操作步骤。
应理解，根据本申请实施例的服务器200可以对应于执行根据本申请各实施例的方法中的相应主体，并且服务器200中的各个模块的上述和其它操作和/或功能分别为了实现本实施例各方法的相应流程，为了简洁，在此不再赘述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例还提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时用于执行一种资源显示的方法，该方法包括上述各个实施例所描述的方案中的至少之一。
本申请实施例的计算机存储介质，可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是，但不限于，电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。计算机可读存储介质的更具体的例子（非穷举的列表）包括：具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器（RAM）、只读存储器（ROM）、可擦式可编程只读存储器（EPROM或闪存）、光纤、便携式紧凑磁盘只读存储器（CD-ROM）、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中，计算机可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上的程序代码可以用任何适当的介质传输,包括、但不限于无线、电线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
注意,上述仅为本申请的较佳实施例及所运用技术原理。本领域技术人员会理解,本申请不限于这里所述的特定实施例,对本领域技术人员来说能够进行各种明显的变化、重新调整和替代而不会脱离本申请的保护范围。因此,虽然通过以上实施例对本申请进行了较为详细的说明,但是本申请不仅仅限于以上实施例,在不脱离本申请构思的情况下,还可以包括更多其他等效实施例,均属于本申请保护范畴。

Claims (30)

  1. 一种资源显示的方法,应用于终端,其特征在于,包括:
    接收用户的第一语音;
    将所述第一语音对应的数据发送给服务器;
    接收所述服务器发送的与所述第一语音对应的第一资源;
    显示第一用户界面,所述第一用户界面显示的内容包括所述第一资源的第一部分内容;
    接收所述用户的第二语音;所述第二语音用于对显示的所述第一资源进行控制;
    响应所述第二语音对应的控制指令,显示所述第一资源的第二部分内容,或者,响应所述第二语音对应的控制指令,显示与所述第一资源相关的第二资源。
  2. 根据权利要求1所述的方法,其特征在于,所述第一用户界面显示的内容包括第一资源的第一部分,具体包括:
    在所述第一用户界面,以卡片形式显示所述第一资源的第一部分。
  3. 根据权利要求2所述的方法,其特征在于,
    所述卡片包括多个类型,所述终端呈现的第一卡片的卡片类型与所述第一资源需要显示的内容相关。
  4. 根据权利要求2或3所述的方法,其特征在于,所述第一卡片对应第一语音指令控制的候选集合,所述第一语音指令控制的候选集合包括一个或多个对所述第一卡片的内容进行显示控制的指令,所述第一语音指令控制的候选集合与所述第一卡片的卡片类型相关;
    响应所述第二语音对应的控制指令,显示所述第一资源的第二部分内容,或者,响应所述第二语音对应的控制指令,显示与所述第一资源相关的第二资源,具体包括:
    若所述第二语音对应的指令与所述第一语音指令控制的候选集合的第一指令匹配时,对所述第一卡片执行与所述第一指令关联的第一显示控制以显示所述第一资源的第二部分内容,或者显示与所述第一资源相关的第二资源。
  5. The method according to claim 4, further comprising:
    receiving a third voice of the user; and
    if the instruction corresponding to the third voice matches a second instruction in the first candidate set, performing on the first card a second display control associated with the second instruction, the second display control being different from the first display control.
  6. The method according to claim 4 or 5, wherein different card types correspond to different candidate sets of voice control instructions; and the method further comprises:
    receiving a fourth voice of the user; and
    displaying, with a second card, a third resource corresponding to the fourth voice, wherein the type of the second card is different from the type of the first card; the second card corresponds to a second candidate set of voice control instructions, the second candidate set includes one or more instructions for controlling the display of the content of the second card, and the second candidate set is different from the first candidate set.
  7. The method according to any one of claims 2 to 6, wherein the first candidate set of voice control instructions is generated by the server based on the card type of the first card and the first resource.
  8. The method according to claim 7, wherein, after the receiving of the second voice of the user, the method further comprises:
    sending data of the second voice to the server; and
    receiving, from the server, a confirmation that the instruction corresponding to the second voice matches the first instruction in the first candidate set, and performing on the first card the first display control associated with the first instruction.
  9. The method according to claim 7 or 8, further comprising:
    receiving a fifth voice of the user; and
    displaying, with a third card, a fourth resource corresponding to the fifth voice, wherein the type of the third card is the same as the type of the first card, and the fourth resource is different from the first resource; the third card corresponds to a third candidate set of voice control instructions, the third candidate set includes one or more instructions for controlling the display of the content of the third card, and the third candidate set is different from the first candidate set.
  10. The method according to any one of claims 2 to 9, wherein the card types include one or more of the following: a plain-text card, an image-and-text card, a grid card, and a list card.
  11. The method according to claim 10, wherein, for a plain-text card or an image-and-text card, the first candidate set of control instructions includes one or more of moving, page turning, or viewing details.
  12. The method according to claim 10 or 11, wherein, for a grid card or a list card, the candidate set of control instructions includes one or more of moving, page turning, selecting an object, or moving a selection box.
  13. The method according to any one of claims 10 to 12, wherein, when the first card further displays a button, the first candidate set of control instructions further includes a control instruction for the button, the control instruction being used to trigger the button.
  14. The method according to any one of claims 2 to 13, wherein the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
  15. The method according to any one of claims 1 to 14, wherein the first voice and the second voice correspond to the same session identifier.
  16. A resource display method, applied to a server, comprising:
    receiving data, sent by a terminal, corresponding to a first voice of a user;
    obtaining a first resource according to the data corresponding to the first voice;
    sending the first resource to the terminal for display;
    receiving data, sent by the terminal, corresponding to a second voice of the user, wherein the second voice is used by the terminal to control the displayed first resource;
    recognizing a control instruction corresponding to the second voice; and
    sending the control instruction to the terminal, so that the terminal, in response to the control instruction, displays a second part of content of the first resource, or, in response to the control instruction, displays a second resource related to the first resource.
  17. The method according to claim 16, wherein, after the obtaining of the first resource according to the data corresponding to the first voice, the method further comprises:
    sending information of a first card to the terminal, so that the terminal displays the first resource with the first card, wherein the information of the first card includes the first card, or an identifier of the first card, or a resource access path of the first card.
  18. The method according to claim 17, wherein, after the obtaining of the first resource according to the data corresponding to the first voice, the method further comprises:
    generating a first candidate set of voice control instructions according to the card type of the first card, or according to the card type of the first card and the first resource, the first candidate set including one or more instructions for controlling the display of the content of the first card; and
    the recognizing of the control instruction corresponding to the second voice and the sending of the control instruction to the terminal, so that the terminal responds to the control instruction, specifically includes:
    matching the instruction corresponding to the second voice against the control instructions in the first candidate set; and
    if the instruction corresponding to the second voice matches a first instruction in the first candidate set, sending to the terminal a first display control instruction for the first card associated with the first instruction, so that the terminal displays the second part of content of the first resource or displays the second resource related to the first resource.
  19. The method according to claim 18, further comprising:
    receiving data, sent by the terminal, corresponding to a third voice of the user;
    matching the instruction corresponding to the third voice against the control instructions in the first candidate set; and
    if the instruction corresponding to the third voice matches a second instruction in the first candidate set, sending to the terminal a second display control instruction for the first card associated with the second instruction, the second display control instruction being different from the first display control instruction.
  20. The method according to claim 18 or 19, wherein different card types correspond to different candidate sets of voice control instructions; and the method further comprises:
    receiving data, sent by the terminal, corresponding to a fourth voice of the user;
    obtaining a third resource corresponding to the fourth voice;
    sending information of a second card and the third resource to the terminal, so that the terminal displays the third resource with the second card, wherein the information of the second card includes the second card, or an identifier of the second card, or a resource access path of the second card, and the type of the second card is different from the type of the first card; and
    generating, according to the card type of the second card, or according to the card type of the second card and the third resource, a second candidate set of voice control instructions associated with the second card, the second candidate set including one or more instructions for controlling the display of the content of the second card, and the second candidate set being different from the first candidate set.
  21. The method according to any one of claims 18 to 20, further comprising:
    receiving data, sent by the terminal, corresponding to a fifth voice of the user;
    obtaining a fourth resource corresponding to the fifth voice;
    sending a third card or an identifier of the third card, together with the fourth resource, to the terminal, so that the terminal displays the fourth resource with the third card, wherein the type of the third card is the same as the type of the first card, and the fourth resource is different from the first resource; and
    generating, according to the card type of the third card, or according to the card type of the third card and the fourth resource, a third candidate set of voice control instructions associated with the third card, the third candidate set including one or more instructions for controlling the display of the content of the third card, and the third candidate set being different from the first candidate set.
  22. The method according to any one of claims 17 to 21, wherein the card types include one or more of the following: a plain-text card, an image-and-text card, a grid card, and a list card.
  23. The method according to claim 22, wherein, for a plain-text card or an image-and-text card, the first candidate set of control instructions includes one or more of moving, page turning, or viewing details.
  24. The method according to claim 22 or 23, wherein, for a grid card or a list card, the candidate set of control instructions includes one or more of moving, page turning, selecting an object, or moving a selection box.
  25. The method according to any one of claims 22 to 24, wherein, when the first card further displays a button, the first candidate set of control instructions further includes a control instruction for the button, the control instruction being used to trigger the button.
  26. The method according to any one of claims 17 to 25, wherein the first resource is a resource obtained from a third-party server, and the card type is selected by the third party corresponding to the first resource.
  27. The method according to any one of claims 16 to 26, wherein the first voice and the second voice correspond to the same session identifier.
  28. A resource display method, comprising:
    the resource display method according to any one of claims 1 to 15, and the resource display method according to any one of claims 16 to 27.
  29. A terminal, comprising a processor, an internal memory, a display screen, and a communication interface, wherein
    the internal memory is configured to store a computer-executable program; and
    when the computer-executable program is invoked by the processor, the processor is enabled to implement, through the communication interface or the display screen, the resource display method according to any one of claims 1 to 15.
  30. A server, comprising a processor, a memory, and a communication interface, wherein
    the memory is configured to store a computer-executable program; and
    when the computer-executable program is invoked by the processor, the processor is enabled to implement, through the communication interface, the resource display method according to any one of claims 16 to 27.
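The server-side flow of claims 16 to 18 — receive first-voice data, obtain a resource, send card information, then match later utterances against the card's candidate set within the same session — can be sketched as a minimal dialogue loop. All class, method, and field names below are the editor's illustration under assumed names; the resource fetch and card selection are stand-ins, not the application's actual logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Session:
    """State kept per dialogue session (first and second voice share a session ID)."""
    card_type: Optional[str] = None
    candidate_set: List[str] = field(default_factory=list)

class ResourceDisplayServer:
    def __init__(self) -> None:
        self.sessions: Dict[str, Session] = {}

    def handle_first_voice(self, session_id: str, query: str) -> dict:
        """Fetch a resource for the query, pick a card, build its candidate set."""
        resource = {"title": query, "items": ["a", "b", "c"]}  # stand-in fetch
        card_type = "list" if len(resource["items"]) > 1 else "plain_text"
        session = self.sessions.setdefault(session_id, Session())
        session.card_type = card_type
        # Candidate set generated from the card type (and, optionally, the resource).
        session.candidate_set = ["move", "turn_page", "select_object"]
        # The terminal receives the resource plus card info (card id or access path).
        return {"resource": resource, "card_type": card_type}

    def handle_second_voice(self, session_id: str, instruction: str) -> Optional[str]:
        """Match the recognized instruction against this session's candidate set."""
        session = self.sessions.get(session_id)
        if session and instruction in session.candidate_set:
            return instruction  # sent back to the terminal as a display-control command
        return None
```

In this sketch, an utterance that matches the session's candidate set is echoed back as the display-control instruction, and an unmatched utterance yields no control command — mirroring the match/no-match branch in claim 18.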
PCT/CN2022/085018 2021-05-26 2022-04-02 Resource display method, terminal, and server WO2022247466A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110579359.9A CN115408492A (zh) 2021-05-26 2021-05-26 Resource display method, terminal, and server
CN202110579359.9 2021-05-26

Publications (1)

Publication Number Publication Date
WO2022247466A1 true WO2022247466A1 (zh) 2022-12-01

Family

ID=84156275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085018 WO2022247466A1 (zh) Resource display method, terminal, and server

Country Status (2)

Country Link
CN (1) CN115408492A (zh)
WO (1) WO2022247466A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117198291B (zh) * 2023-11-08 2024-01-23 Sichuan Shutian Information Technology Co., Ltd. Method, apparatus, and system for controlling a terminal interface by voice

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103578473A (zh) * 2012-07-27 2014-02-12 iKala Interactive Media Inc. Voice-controllable song-requesting system and operation process thereof
US20150039307A1 (en) * 2013-08-05 2015-02-05 Samsung Electronics Co., Ltd. Interfacing device and method for supporting speech dialogue service
CN108491178A (zh) * 2018-02-13 2018-09-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Information browsing method, browser, and server
CN108538291A (zh) * 2018-04-11 2018-09-14 Baidu Online Network Technology (Beijing) Co., Ltd. Voice control method, terminal device, cloud server, and system
CN109584879A (zh) * 2018-11-23 2019-04-05 Huawei Technologies Co., Ltd. Voice control method and electronic device
CN110706702A (zh) * 2019-10-15 2020-01-17 Wuhu Mengbo Technology Co., Ltd. Unlimited-level multi-turn dialogue method for speech recognition
CN110704004A (zh) * 2019-08-26 2020-01-17 Huawei Technologies Co., Ltd. Voice-controlled split-screen display method and electronic device


Also Published As

Publication number Publication date
CN115408492A (zh) 2022-11-29

Similar Documents

Publication Publication Date Title
WO2020238774A1 (zh) Notification message preview method and electronic device
CN110634483B (zh) Human-computer interaction method and apparatus, electronic device, and storage medium
WO2020192456A1 (zh) Voice interaction method and electronic device
CN106202165B (zh) Intelligent learning method and apparatus for human-computer interaction
WO2021204098A1 (zh) Voice interaction method and electronic device
WO2022052776A1 (zh) Human-computer interaction method, electronic device, and system
KR102390713B1 (ko) Electronic device and method for providing a call service of the electronic device
JP2018508086A (ja) Input processing method, apparatus, and device
WO2022078295A1 (zh) Device recommendation method and electronic device
WO2020259514A1 (zh) Method and apparatus for invoking a service
CN114115674A (zh) Method for locating audio recordings and document content, electronic device, and storage medium
WO2022247466A1 (zh) Resource display method, terminal, and server
US11354520B2 (en) Data processing method and apparatus providing translation based on acoustic model, and storage medium
WO2022089197A1 (zh) Information prompting method and apparatus, electronic device, and storage medium
CN114745468A (zh) Electronic device, theme setting method therefor, and medium
CN112433787A (zh) Serialization method and apparatus for a target object, electronic device, and storage medium
CN113449068A (zh) Voice interaction method and electronic device
WO2023071441A1 (zh) Method and apparatus for displaying contact-list letters, and terminal device
CN115841814A (zh) Voice interaction method and electronic device
CN113035189A (zh) Document presentation control method, apparatus, and device
CN114327198A (zh) Control function pushing method and device
CN109725798B (zh) Intelligent role switching method and related apparatus
WO2023197949A1 (zh) Chinese translation method and electronic device
WO2023197951A1 (zh) Search method and electronic device
WO2024051611A1 (zh) Human-computer interaction method and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22810199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE