WO2022193735A1 - Display Device and Voice Interaction Method - Google Patents

Display Device and Voice Interaction Method

Info

Publication number
WO2022193735A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
data
display
display device
response
Prior art date
Application number
PCT/CN2021/134357
Other languages
English (en)
French (fr)
Inventor
张大钊
王冰
Original Assignee
海信视像科技股份有限公司 (Hisense Visual Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110291989.6A (see CN113066491A)
Priority claimed from CN202110320136.0A (see CN113079400A)
Application filed by 海信视像科技股份有限公司 (Hisense Visual Technology Co., Ltd.)
Publication of WO2022193735A1


Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L15/26 Speech to text systems
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/41 Structure of client; Structure of client peripherals
                • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
              • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
          • H04N5/00 Details of television systems
            • H04N5/44 Receiver circuitry for the reception of television signals according to analogue transmission standards
              • H04N5/445 Receiver circuitry for the reception of television signals according to analogue transmission standards for displaying additional information

Definitions

  • The present application relates to the technical field of voice interaction, and in particular to a display device and a voice interaction method.
  • In addition to traditional remote control, TVs can support voice control.
  • The user can speak to the TV; the TV recognizes the text in the speech, queries the semantics of that text over the network, and responds according to a preset mapping between semantics and the services of the display device.
  • For example, the user's speech may be a query sentence,
  • and the display device may respond by displaying the answer to the query and reading the answer aloud.
  • However, reading the answer aloud and displaying it are independent processes, so the user has to concentrate on the spoken answer while also watching the displayed one, which makes for a poor voice interaction experience.
  • To address this, the present application provides a display device and a voice interaction method.
  • The display device provided by the present application includes:
  • a controller connected to the display, the controller being configured to:
  • where the response data includes audio data and display data,
  • when the display data includes answer data for the voice command and recommendation data for the voice command, generate a response interface containing a graphic object corresponding to the answer data and a recommended control corresponding to the recommendation data, wherein the recommended control is configured to jump, in response to being triggered, to the user interface corresponding to that control;
  • control the display to show the response interface, and control the connected audio output device to play the audio corresponding to the audio data.
  • The server provided by this application is configured to:
  • send the answer data and recommendation data to the display device.
  • The voice interaction method provided by this application includes:
  • where the response data includes audio data and display data,
  • when the display data includes answer data for the voice command and recommendation data for the voice command, generating a response interface containing a graphic object corresponding to the answer data and a recommended control corresponding to the recommendation data, wherein the recommended control is configured to jump, in response to being triggered, to the user interface corresponding to that control;
  • controlling the display to show the response interface, and controlling the connected audio output device to play the audio corresponding to the audio data.
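As a rough illustration of the response interface described above, the sketch below models the display data as an answer plus a list of recommended controls, each of which jumps to its own user interface when triggered. The dictionary keys (`display`, `answer`, `recommendations`, `label`, `page`, `audio`) and the page identifiers are hypothetical; the source does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecommendedControl:
    label: str
    target_page: str  # hypothetical id of the user interface to jump to


@dataclass
class ResponseInterface:
    answer_text: str  # stands in for the graphic object built from the answer data
    controls: List[RecommendedControl] = field(default_factory=list)
    current_page: str = "response_interface"

    def trigger(self, index: int) -> str:
        """Simulate triggering a recommended control: jump to its user interface."""
        self.current_page = self.controls[index].target_page
        return self.current_page


def build_response_interface(response_data: dict) -> Tuple[ResponseInterface, bytes]:
    """Split response data into a response interface and the audio to play with it."""
    display = response_data["display"]
    controls = [RecommendedControl(r["label"], r["page"])
                for r in display.get("recommendations", [])]
    return ResponseInterface(display["answer"], controls), response_data["audio"]
```

On a real device, showing the interface and playing the returned audio would run concurrently; that scheduling is outside this sketch.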
  • In another aspect, the display device provided by the present application includes:
  • a controller connected to the display, the controller being configured to:
  • where the response data includes broadcast data,
  • control the audio output device to play the audio corresponding to the broadcast data and, when a broadcast target is played, highlight that broadcast target on the display.
  • In another aspect, the voice interaction method provided by this application includes:
  • where the response data includes broadcast data,
  • controlling the audio output device to play the audio corresponding to the broadcast data and, when a broadcast target is played, highlighting that broadcast target on the display.
  • FIG. 1 shows a schematic diagram of an operation scenario between a display device and a control apparatus according to some embodiments.
  • FIG. 2 shows a hardware configuration block diagram of the control apparatus 100 according to some embodiments.
  • FIG. 3 shows a hardware configuration block diagram of the display device 200 according to some embodiments.
  • FIG. 4 shows a schematic diagram of the software configuration of the display device 200 according to some embodiments.
  • FIG. 5 shows a schematic diagram of the principle of voice interaction according to some embodiments.
  • FIG. 6 shows a schematic diagram of a voice interaction interface according to some embodiments.
  • FIG. 7 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 8 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 9 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 10 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 11 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 12 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • FIG. 13 shows a schematic diagram of another voice interaction interface according to some embodiments.
  • The term "module" refers to any known or later-developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code capable of performing the function associated with that element.
  • FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1, a user can operate the display device 200 through the smart device 300 or the control apparatus 100.
  • The control apparatus 100 may be a remote controller that communicates with the display device via infrared protocol communication, Bluetooth protocol communication, or other short-range communication methods, controlling the display device 200 wirelessly or by wire.
  • The user can control the display device 200 by entering instructions through keys on the remote control, voice input, control-panel input, and the like.
  • The user can also control the display device 200 through an application running on a smart device 300 (e.g., a mobile terminal, a tablet computer, a computer, or a notebook computer).
  • The display device 200 can also be controlled by means other than the control apparatus 100 and the smart device 300.
  • For example, a voice command acquisition module inside the display device 200 can directly receive the user's voice commands.
  • The user's voice commands can also be received through a voice control device provided outside the display device 200.
  • The display device 200 also communicates with the server 400.
  • The display device 200 may communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • The server 400 may provide various content and interactions to the display device 200.
  • The server 400 may be one cluster or multiple clusters and may include one or more types of servers.
  • FIG. 2 shows a configuration block diagram of the control apparatus 100 according to an exemplary embodiment.
  • The control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply.
  • The control apparatus 100 receives the user's input operation instructions and converts them into instructions that the display device 200 can recognize and respond to, acting as an intermediary between the user and the display device 200.
  • FIG. 3 shows a hardware configuration block diagram of the display device 200 according to an exemplary embodiment.
  • The display device 200 includes at least one of a tuner 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
  • The controller includes a processor, a video processor, an audio processor, a graphics processor, RAM, ROM, and first through nth input/output interfaces.
  • The display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives the image signals output by the controller and displays video content, image content, menu manipulation interfaces, and the user manipulation UI.
  • The display 260 may be a liquid crystal display, an OLED display, or a projection display, and may also be a projection device with a projection screen.
  • The communicator 220 is a component for communicating with external devices or servers according to various types of communication protocols.
  • The communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network or near-field communication protocol chips, and an infrared receiver.
  • Through the communicator 220, the display device 200 can send and receive control signals and data signals to and from the external control apparatus 100 or the server 400.
  • The user interface may be used to receive control signals from the control apparatus 100 (e.g., an infrared remote control).
  • The detector 230 collects signals from the external environment or from interaction with the outside.
  • For example, the detector 230 may include a light receiver, i.e., a sensor that collects ambient light intensity; an image collector, such as a camera, which can capture external scenes, user attributes, or user interaction gestures; or a sound collector, such as a microphone, which receives external sound.
  • The external device interface 240 may include, but is not limited to, any one or more of a High-Definition Multimedia Interface (HDMI), an analog or digital high-definition component input interface (Component), a composite video input interface (CVBS), a USB input interface (USB), and an RGB port. It may also be a composite input/output interface formed from several of these interfaces.
  • The controller 250 controls the overall operation of the display device 200 and responds to user operations.
  • The controller 250 may perform an operation related to an object selected by a user command.
  • The object may be any selectable object, such as a hyperlink, an icon, or another operable control.
  • Operations related to the selected object include displaying the page, document, image, etc. linked to a hyperlink, or running the program corresponding to an icon.
  • A "user interface" is a medium for interaction and information exchange between an application or operating system and a user; it converts between the internal form of information and a form acceptable to the user.
  • A commonly used form of user interface is the graphical user interface (GUI), a user interface for computer operations that is displayed graphically. It may consist of interface elements such as icons, windows, and controls displayed on the screen of an electronic device, where controls can include visual elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
  • The system is divided into four layers, from top to bottom: the application layer ("application layer"), the application framework layer ("framework layer"), the Android runtime and system library layer ("system runtime layer"), and the kernel layer.
  • At least one application runs in the application layer. These may be applications bundled with the operating system, such as a window program, a system settings program, or a clock program, or applications developed by third-party developers.
  • The application packages in the application layer are not limited to these examples.
  • The framework layer provides an application programming interface (API) and a programming framework for applications.
  • The application framework layer includes some predefined functions.
  • The application framework layer acts as a processing center that decides which actions the applications in the application layer take.
  • Through the API, an application can access system resources and obtain system services during execution.
  • The system runtime layer supports the layer above it, namely the framework layer.
  • The Android operating system runs the C/C++ libraries included in the system runtime layer to implement the functions required by the framework layer.
  • The kernel layer sits between hardware and software. As shown in FIG. 4, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (for fingerprint, temperature, and pressure sensors, etc.), power driver, and so on.
  • The hardware or software architecture in some embodiments may be the one introduced above, or another similar hardware or software architecture on which the technical solutions of the present application can be implemented.
  • FIG. 5 is a schematic diagram of a speech recognition network architecture provided by an embodiment of the present application.
  • The smart device receives input information and outputs the result of processing that information.
  • The speech recognition service device is an electronic device on which a speech recognition service is deployed;
  • the semantic service device is an electronic device on which a semantic service is deployed;
  • the business service device is an electronic device on which a business service is deployed.
  • The electronic devices here may include servers, computers, etc.
  • The speech recognition service, the semantic service (also called the semantic engine), and the business service are web services deployed on these electronic devices: the speech recognition service converts audio into text, the semantic service performs semantic analysis on the text, and the business service provides specific services, such as the weather query service of Moji Weather or the music query service of QQ Music.
  • The following describes how information input to the smart device is processed based on the architecture shown in FIG. 5.
  • The processing consists of the following three stages:
  • First, after receiving a query sentence input by voice, the smart device uploads the audio of the query sentence to the speech recognition service device, which recognizes the audio as text through the speech recognition service and returns the text to the smart device.
  • Before uploading the audio, the smart device may denoise it, for example by removing echo and ambient noise.
  • Second, the smart device uploads the recognized text of the query sentence to the semantic service device, which performs semantic analysis on the text through the semantic service to obtain the business domain and intent of the text.
  • Third, the semantic service device sends a query instruction to the corresponding business service device to obtain the query result given by the business service.
  • The smart device can then obtain the query result from the semantic service device and output it.
  • The semantic service device may also send the semantic parsing result of the query sentence to the smart device, so that the smart device outputs the feedback sentence contained in that result.
  • FIG. 5 is only an example and does not limit the scope of protection of the present application.
  • Other architectures may also be used to implement similar functions; for example, all or part of the three stages may be completed by the smart terminal, which is not described further here.
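The three-stage flow of FIG. 5 can be sketched as a chain of pluggable services. This is a minimal illustration, assuming each service is a plain callable; `SemanticResult`, `run_voice_query`, and the domain-to-service dictionary are hypothetical names, and real deployments would make network calls rather than local function calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SemanticResult:
    domain: str  # business domain, e.g. "weather"
    intent: str  # e.g. "query_today"
    slots: Dict[str, str] = field(default_factory=dict)


def run_voice_query(
    audio: bytes,
    recognize: Callable[[bytes], str],
    parse: Callable[[str], SemanticResult],
    business_services: Dict[str, Callable[[SemanticResult], str]],
    denoise: Callable[[bytes], bytes] = lambda a: a,
) -> str:
    # Optional pre-processing on the smart device (echo / ambient noise removal).
    clean_audio = denoise(audio)
    # Stage 1: the speech recognition service turns audio into text.
    text = recognize(clean_audio)
    # Stage 2: the semantic service extracts the business domain and intent.
    semantics = parse(text)
    # Stage 3: the query is routed to the business service for that domain.
    service = business_services[semantics.domain]
    return service(semantics)
```

Keeping each stage injectable mirrors the note that any of the three stages may run on the terminal itself or on a server.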
  • The smart device shown in FIG. 5 can be a display device, such as a smart TV;
  • the function of the speech recognition service device can be implemented by the sound collector and the controller of the display device;
  • and the functions of the semantic service device and the business service device can be implemented by the controller of the display device or by the server of the display device.
  • A query sentence or other interactive sentence that the user inputs to the display device by voice may be referred to as a voice command.
  • In some embodiments, what the display device obtains from the semantic service device is the query result given by the business service.
  • The display device can analyze the query result, generate response data for the voice command, and then perform the corresponding actions according to the response data. For example, analyzing the query result may show that it includes a piece of text marked with a broadcast identifier.
  • The display device can then generate response data according to a preset response rule.
  • An exemplary response rule is: when text marked with the broadcast identifier is obtained, generate a dialog box containing the text corresponding to the broadcast data on the voice interaction interface, and read that text aloud.
  • Following this rule, the display device can generate response data that includes UI interface data and broadcast data.
  • The UI interface corresponding to the UI interface data contains a dialog box with the text of the query result, and the broadcast data contains the text to be broadcast.
  • In some embodiments, the display device can analyze the semantic parsing result, generate response data, and then perform the corresponding actions according to the response data.
  • In some embodiments, the response data for the voice command includes service type data but no broadcast data.
  • The service type data may include UI interface data and/or control instructions for the display device. This generally occurs when the user is issuing an instruction to the display device.
  • For example, for a command to turn up the volume, the response data may include UI interface data for displaying a volume bar and a control instruction for increasing the speaker volume.
  • The display device then adjusts the volume and displays the volume bar according to the response data, without a voice broadcast.
  • In some embodiments, the response data for the voice command includes both broadcast data and service type data, where the service type data may include UI interface data.
  • This occurs when the display device needs to feed the query result back to the user as a voice broadcast, so the response data for the voice command includes broadcast data.
  • For example, for a query about today's weather, the response data may include UI data showing today's weather details and broadcast data containing weather information such as temperature, wind, and humidity.
  • The display device can then display the UI interface and perform the voice broadcast according to the response data.
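The two kinds of response data described above (service type data only, or service type data plus broadcast data) and the example broadcast-identifier rule can be sketched as follows. The query-result keys (`text`, `broadcast_flag`, `ui`, `controls`) are hypothetical, and the returned action strings stand in for real UI, device-control, and text-to-speech calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponseData:
    ui: Optional[str] = None  # hypothetical UI interface data (here just a description)
    controls: List[str] = field(default_factory=list)  # hypothetical control instructions
    broadcast: Optional[str] = None  # text to read aloud, if any


def apply_response_rule(query_result: dict) -> ResponseData:
    """Example rule: text flagged for broadcast goes into a dialog box and is read aloud."""
    text = query_result.get("text")
    if query_result.get("broadcast_flag") and text:
        return ResponseData(ui=f"dialog:{text}", broadcast=text)
    return ResponseData(ui=query_result.get("ui"),
                        controls=list(query_result.get("controls", [])))


def respond(data: ResponseData) -> List[str]:
    """Return the actions the display device would perform, in order."""
    actions = [f"execute:{cmd}" for cmd in data.controls]
    if data.ui is not None:
        actions.append(f"show:{data.ui}")
    if data.broadcast is not None:
        actions.append(f"speak:{data.broadcast}")
    return actions
```

A volume command yields only `execute` and `show` actions, while a weather query also yields a `speak` action; in this naive form the display and the broadcast are still unrelated, which is the shortcoming the following paragraphs address.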
  • However, the display device performs the UI display and the voice broadcast as two independent, unrelated processes, so the user has to connect the broadcast text to the UI interface on their own, which makes for a poor experience.
  • In some embodiments, an initial UI interface can be generated from the UI interface data, and the broadcast targets corresponding to the broadcast data can be detected in the initial UI interface.
  • The display device then displays the initial UI interface and performs the voice broadcast.
  • When a broadcast target is broadcast, it is highlighted on the initial UI interface, so that the user can see on the UI interface what is currently being spoken. This automatically links the broadcast text of the voice broadcast to the UI interface and improves the user experience.
  • The following takes the voice interaction process between a user and the display device as an example to describe in detail the above technical solution for associating the broadcast text of the voice broadcast with the UI interface.
  • In some embodiments, a voice control button may be provided on the remote control of the display device. After the user presses this button, the controller of the display device may control the display to show a voice interaction interface and control a sound collector, such as a microphone, to pick up sound around the display device. The user can then input a voice command to the display device.
  • In some embodiments, the display device may support voice wake-up, with its sound collector continuously collecting sound. After the user speaks the wake-up word, the display device performs speech recognition on the input; once it recognizes the wake-up word, it can control the display to show the voice interaction interface, and the user can then continue to input a voice command.
  • The sound collector of the display device can remain in the sound-collecting state, so the user can press and hold the voice control button on the remote control at any time to input a new voice command, or say the wake-up word.
  • The display device can then end the previous voice interaction process and start a new one according to the new voice command, which keeps voice interaction responsive in real time.
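The preemption behavior above (a new voice command ends the previous interaction and starts a new one) can be sketched with a small session manager. `VoiceSessionManager` and its log format are hypothetical; a real device would also stop the ongoing broadcast and UI updates where the comment indicates.

```python
import itertools
from typing import List, Optional, Tuple


class VoiceSessionManager:
    """Keeps at most one active voice interaction; a new command preempts the old one."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.active: Optional[Tuple[int, str]] = None  # (session id, command text)
        self.log: List[str] = []

    def on_voice_command(self, command: str) -> int:
        if self.active is not None:
            # A real device would also stop any ongoing broadcast and UI updates here.
            self.log.append(f"end:{self.active[0]}")
        session_id = next(self._ids)
        self.active = (session_id, command)
        self.log.append(f"start:{session_id}:{command}")
        return session_id
```

Numbering sessions makes it easy to discard late results that belong to an interaction the user has already replaced.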
  • Suppose the current interface of the display device is the voice interaction interface. The display device performs speech recognition on the voice command input by the user to obtain the corresponding text; the display device itself or its server then processes the text to obtain the user intent, processes the user intent to obtain a semantic parsing result, and generates response data from that result.
  • This response data can be called the initial response data. If the display device responded directly according to it, the voice broadcast might be independent of the UI interface.
  • In some embodiments, echo cancellation may be performed on the sound of the voice broadcast.
  • The display device can instead process the initial response data into final response data and respond according to the final response data, which links the voice broadcast to the UI interface.
  • If the initial response data does not include broadcast data, the display device can respond directly according to it.
  • The process by which the display device turns the initial response data into the final response data is described below.
  • In some embodiments, an initial UI interface can be generated from the UI interface data, and the broadcast targets corresponding to the broadcast data can then be detected on the initial UI interface.
  • A broadcast target is an object that is associated with the broadcast text and can be specially displayed, such as text or a graphic.
  • Special display means a display that differs from that of the initial UI interface, for example highlighting.
  • In some embodiments, the broadcast targets corresponding to the broadcast text may include text targets.
  • The content displayed on the initial UI interface usually includes the broadcast text.
  • For example, a dialog box is placed on the initial UI interface and the broadcast text is shown in the dialog box.
  • The display device can split the broadcast text into at least two character groups according to preset splitting rules and treat each character group as a text target.
  • One exemplary splitting rule is to split the broadcast text into single Chinese characters, each forming one character group. A punctuation mark between a Chinese character and the next one can be ignored, appended to the character group of the preceding character, or prepended to the character group of the next character. For example, the broadcast text "你好！" ("Hello!") can be split into two character groups, "你" and "好！", giving two text targets.
  • Another exemplary splitting rule is to split the broadcast text into words, handling punctuation as above. For example, the broadcast text "你好，我叫小A。" ("Hello, my name is Xiao A.") can be split into four character groups, "你好，" ("Hello,"), "我" ("I"), "叫" ("am named"), and "小A。" ("Xiao A."), giving four text targets.
  • A third exemplary splitting rule is to split the broadcast text into short clauses at punctuation marks. For example, "Hello, my name is Xiao A." can be split into two character groups, "Hello," and "My name is Xiao A.", giving two text targets.
  • These splitting rules are only examples; in actual implementations, other rules may be used.
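The three example splitting rules can be sketched as below. This assumes punctuation is appended to the preceding character group (one of the options the text allows) and uses the Chinese sentences assumed to underlie the translated examples. Word splitting here uses forward maximum matching against a tiny hypothetical dictionary; that is a stand-in technique, and a production system would more likely use a proper Chinese word segmenter.

```python
from typing import Iterable, List, Set

# Punctuation treated as a boundary (full-width Chinese and ASCII forms).
PUNCT = set("，。！？；：、,.!?;:")


def attach_punct(tokens: Iterable[str]) -> List[str]:
    """Append each punctuation token to the character group before it."""
    groups: List[str] = []
    for tok in tokens:
        if tok in PUNCT and groups:
            groups[-1] += tok
        else:
            groups.append(tok)
    return groups


def split_chars(text: str) -> List[str]:
    """Rule 1: one character group per character."""
    return attach_punct(text)


def split_words(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Rule 2: words, via forward maximum matching against a word list."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return attach_punct(tokens)


def split_clauses(text: str) -> List[str]:
    """Rule 3: short clauses ending at punctuation marks."""
    groups, current = [], ""
    for ch in text:
        current += ch
        if ch in PUNCT:
            groups.append(current)
            current = ""
    if current:
        groups.append(current)
    return groups
```

With the dictionary `{"你好", "小A"}`, `split_words("你好，我叫小A。", ...)` yields the four groups of the word-splitting example, and `split_clauses` yields the two groups of the clause example.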
  • The content displayed on the initial UI interface may also include other text, which can be text-matched against the character groups; text that matches a character group is also treated as a text target.
  • For example, take the broadcast text "Today's weather is cloudy."
  • It can be split into the following character groups: "Today's", "weather", "is", and "cloudy."
  • The voice interaction dialog box displays the broadcast text "Today's weather is cloudy."
  • Today's weather details also include text matching the character group "cloudy."
  • Therefore, "cloudy" in today's weather details can also be determined as a text target corresponding to the character group "cloudy."
  • Text matching may require identical text, or text with the same, similar, or related meaning. For example, if a character group of the broadcast text is "3 to 8 degrees Celsius" and the weather details on the initial UI interface contain the text "3°C~8°C", the two have the same meaning, so "3°C~8°C" is determined as a text target.
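A minimal sketch of the text-matching step, assuming matching means equality after normalization. The normalization (lower-casing, dropping trailing punctuation, and rewriting temperature ranges such as "3 to 8 degrees Celsius" and "3°C~8°C" into one canonical form) is a hypothetical stand-in; recognizing "similar or related" meanings in general would need synonym tables or a semantic model, which this sketch does not attempt.

```python
import re
from typing import List

# Hypothetical pattern for temperature ranges written in either style.
_RANGE = re.compile(r"(-?\d+)\s*(?:°c)?\s*(?:to|~|-)\s*(-?\d+)\s*(?:°c|degrees celsius)")


def normalize(text: str) -> str:
    """Canonical form for matching: lower-case, no trailing punctuation,
    and temperature ranges rewritten as 'LO..HIC'."""
    t = text.strip().lower().rstrip(".,!?。，！？")
    m = _RANGE.fullmatch(t)
    return f"{m.group(1)}..{m.group(2)}C" if m else t


def find_text_targets(group: str, ui_texts: List[str]) -> List[int]:
    """Indices of UI text items that match a character group."""
    key = normalize(group)
    return [i for i, t in enumerate(ui_texts) if key and normalize(t) == key]
```

For the UI texts `["Today's weather is cloudy.", "cloudy", "3°C~8°C"]`, the group "cloudy." matches index 1 and the group "3 to 8 degrees Celsius" matches index 2.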
  • In some embodiments, the broadcast targets corresponding to the broadcast text may include graphic targets.
  • The initial UI interface may display multiple graphics, some of which have text descriptions that may match the broadcast text.
  • For example, the broadcast text is "Ultraviolet rays are weak, not suitable for fishing, suitable for indoor exercise."
  • It can be split into the following character groups: "ultraviolet rays", "weak,", "unsuitable", "fishing,", "suitable", "indoor", and "exercise."
  • The graphics displayed on the initial UI interface may include a sun graphic with the text description "weak ultraviolet rays" beside it and a fish graphic with the text description "fishing is average" beside it.
  • The text description "weak ultraviolet rays" is related to the content of the character group "ultraviolet rays", and the text description "fishing is average" is related to the content of the character group "fishing,".
  • Therefore, the sun graphic can be determined as the graphic target corresponding to the character group "ultraviolet rays",
  • and the fish graphic as the graphic target corresponding to the character group "fishing,".
  • one character group may correspond to one text target or may correspond to multiple text targets.
  • a character group may also correspond to one graphic object or may not correspond to a graphic object, and in some embodiments, one character group may also correspond to multiple graphic objects.
  • The display device may assign a number to each character group to distinguish the character groups from one another.
  • After the broadcast targets are obtained, the display device needs to track the broadcast progress during the voice broadcast. This ensures that each broadcast target is highlighted on time when the text corresponding to it is played.
  • In some embodiments, the display device collects sound through a microphone during the voice broadcast, which captures the sound coming from its own speaker. It converts this sound to text and matches the converted text against the broadcast text to obtain the current broadcast progress. When the broadcast progress reaches a broadcast target, that broadcast target is highlighted.
  • The broadcast progress obtained this way lags the actual broadcast progress only by the data processing time of the display device. It therefore closely reflects the actual broadcast progress and ensures that the correct broadcast target is highlighted.
  • However, this method requires the display device to process the sound from its speaker in real time, which consumes considerable computing resources.
  • If the computing power of the display device is weak, the display device may freeze.
  • In some embodiments, the display device instead pre-calculates the time from the start of the broadcast to each broadcast target. It then starts timing when the broadcast begins and derives the broadcast progress from the elapsed time and its own voice broadcast speed.
  • The specific implementation is as follows:
  • For its voice broadcast, the display device may calculate separately the time from the start of the broadcast to each broadcast target.
  • The display device may support broadcasting in different timbres during voice interaction, and the speaking rates of different timbres may differ slightly.
  • For example, the timbres supported by the display device include a female voice and a male voice. The female voice speaks slightly faster and the male voice slightly slower.
  • The display device may use the female voice as the default timbre.
  • Alternatively, the display device may use the male voice as the default timbre.
  • The user can set the timbre of the display device in advance, and the display device then broadcasts in that timbre. If the user has not set a timbre, the display device broadcasts in the default timbre.
  • The display device determines its voice broadcast speed from the current timbre. It then calculates the start time of each character group from this speed and the character distance between the start of the voice broadcast and the start of the character group. It calculates the broadcast duration of the group from the same speed and the length of the group.
  • The display device can adjust the UI interface data to control emphasis. When the start time of a broadcast target is reached, that is, when the broadcast progress reaches the broadcast target, the target is highlighted according to a preset emphasis display rule. The target stays emphasized for its broadcast duration, and is de-emphasized once that duration ends. The user can thus follow the current broadcast progress.
  • Alternatively, the UI interface data can be adjusted so that the broadcast target stays highlighted after its broadcast duration ends. It is de-emphasized only when the next broadcast target needs to be highlighted.
  • Alternatively, the broadcast target stays highlighted and is not de-emphasized when the next broadcast target is highlighted. Both broadcast targets are then highlighted at the same time, so that the user can follow the broadcast progress.
  • Alternatively, both broadcast targets are highlighted at the same time, but the emphasis level of the earlier broadcast target is lowered.
  • One way to highlight a broadcast target is to change its color so that it differs from its previous color.
  • Different emphasis levels may use different text colors. The text color for a higher emphasis level may contrast more strongly with the background color.
  • Another way to highlight a broadcast target is to display it on the top layer of the playing interface, or at the foremost position of the playing interface.
  • Graphic targets with different emphasis levels may be arranged at different positions. Graphic targets with a higher emphasis level may be placed on a higher layer or further forward.
  • Another way to highlight a broadcast target is to move the focus of the display device to the broadcast target on the playing interface.
  • the UI interface corresponding to the adjusted UI interface data can be a dynamic interface that changes with the broadcast progress, and the final response data can be obtained according to the adjusted UI interface data and broadcast data.
  • The display device can control the audio output device to start broadcasting the broadcast text and control the display to show the initial UI interface. When the broadcast reaches a broadcast target, that target is highlighted on the initial UI interface. After the target has been broadcast, it is de-emphasized.
  • the audio output device may correspond to the audio output interface in FIG. 3 , wherein the audio output interface may include or be connected to a speaker and an external audio output terminal.
  • FIGS. 6-8 show schematic diagrams of voice interaction interfaces according to some embodiments.
  • For example, in FIG. 6, the broadcast text includes: “Find the following jokes for you. Students in the writing class must write a short story in class, including religion, royalty,”.
  • The display device can split the broadcast text into multiple character groups. The first character group can be “Find the following jokes for you”. The second and subsequent character groups can each contain a single character: the second is “写” (write), the third is “作” (zuo), the fourth is “课” (class), and so on.
  • Each character group is used as a broadcast target.
  • When the broadcast reaches a broadcast target, the color of that target can be changed.
  • In FIG. 6, the text “…writing class must write a short story in class, including religion,” is shown in a color that differs from the other broadcast targets. This indicates that the current broadcast progress is at “教” (teaching), the second character of “宗教” (religion).
  • For example, in FIG. 7, the broadcast text includes: “It is cloudy today in Laoshan District, 3 to 8 degrees Celsius,”.
  • The display device can split the broadcast text into multiple character groups. The first can be “Laoshan District”, the second “today”, the third “cloudy, 3 to 8 degrees Celsius”, and so on.
  • Each character group is used as a broadcast target.
  • When the broadcast reaches a broadcast target, the color of that target can be changed.
  • In FIG. 7, “cloudy, 3 to 8 degrees Celsius” is shown in a color that differs from the other broadcast targets. This indicates that the current broadcast progress is at “cloudy, 3 to 8 degrees Celsius”.
  • FIG. 7 also includes text targets that match the broadcast text. For example, “cloudy 3°C~8°C” in the lower left corner of FIG. 7 matches the broadcast target “cloudy, 3 to 8 degrees Celsius”, so “cloudy 3°C~8°C” is also set as a broadcast target. When the voice broadcast reaches “cloudy, 3 to 8 degrees Celsius”, “cloudy 3°C~8°C” can also be shown in a changed color.
  • FIG. 8 is a schematic diagram of the interface in FIG. 7 after it has been updated.
  • In FIG. 8, the broadcast target “good air” in the broadcast text and the matching text target “air quality 62 (good air)” can both be shown in a changed color.
  • For example, in FIG. 9, the broadcast text includes: “It is cloudy today in Laoshan District, 3 to 8 degrees Celsius, the average temperature today is 2 degrees lower than yesterday, it is advisable to wash the car,”.
  • The display device can split the broadcast text into multiple character groups: “Laoshan District”, “today”, “cloudy, 3 to 8 degrees Celsius”, “today”, “average temperature”, “than”, “yesterday”, “2 degrees lower”, “advisable to wash the car”, and so on.
  • Each character group is used as a broadcast point.
  • When the broadcast reaches a broadcast point, the color of that point can be changed.
  • In FIG. 9, “advisable to wash the car” is shown in a color that differs from the other broadcast points. This indicates that the current broadcast progress is at “advisable to wash the car”.
  • FIG. 9 also includes text targets that match the broadcast text. For example, “car washing: suitable” in the middle of FIG. 9 matches the broadcast point “advisable to wash the car”, so “car washing: suitable” can also be set as a broadcast point.
  • When the voice broadcast reaches “advisable to wash the car”, “car washing: suitable” can also be shown in a changed color. In addition, the graphic target in FIG. 9, namely the car icon corresponding to “car washing: suitable”, can also be highlighted. For example, the background color of the car icon area can be changed so that it differs from the background of the other graphic icons.
  • FIG. 7 to FIG. 9 show that, as the broadcast reaches different broadcast points, different graphic and text targets can be highlighted in turn to show the user the current broadcast progress.
  • In summary, the display device provided by the present application parses the semantic analysis result after receiving it. When the result includes broadcast data, the display device detects the broadcast targets corresponding to the broadcast data in the UI interface. It highlights each broadcast target when the broadcast reaches that target. The user can thus see on the UI interface what is currently being broadcast, the broadcast text is linked with the UI interface, and the user experience is improved.
  • In some embodiments, the service type data and the broadcast data may provide relatively limited information.
  • The server may therefore also generate recommendation data on its own initiative, and generate the response data from the recommendation data, the service type data and the broadcast data.
  • the server may further extract keywords from the broadcast data, and obtain recommendation data corresponding to the keywords according to preset recommendation rules.
  • The preset recommendation rules may include a mapping between keywords and recommendation data. If the broadcast data contains multiple keywords, multiple sets of recommendation data can be obtained. The server associates each set of recommendation data with its corresponding keyword in the broadcast data.
  • An exemplary recommendation rule may be: if the keyword is a certain action, the recommendation data includes business information that can perform the action; if the keyword is a certain person, the recommendation data includes the encyclopedia information and/or work information of the person.
  • the recommendation data is determined according to the keywords in the broadcast data, so that in the subsequent voice broadcast, the relevant recommended content can be displayed following the progress of the voice broadcast, and the relevance of the content of the voice broadcast and the UI interface can be improved.
  • The recommendation data does not have to be determined from keywords in the broadcast data; it may also be determined in other ways. For example, the service type data can be analyzed to obtain its corresponding text, pictures and similar content. Data related to this content can then be queried on the network or in a pre-established database and used as recommendation data. The query can rely on existing techniques such as text search and image search, which are not described in detail in this application.
  • The recommendation data may include text data and graphic data.
  • The display device may display the content corresponding to the text data and the content corresponding to the graphic data on the voice interaction interface.
  • The recommendation data may also include jump instructions for some applications. A jump instruction may be configured to jump to an interface of an application in response to a trigger.
  • If the application is not installed on the display device, the jump fails. Even if the application is installed, the hardware of the display device may not support the functions of the application. In that case the device can open the application but cannot use its functions properly, and this can also be regarded as a jump failure.
  • The server may deliver recommendation data to the display device according to the terminal capability parameters of the display device. The terminal capability parameters may include hardware capability parameters and software capability parameters. The software capability parameters may include a list of installed applications. The hardware capability parameters may include a list of the hardware configured on the display device, such as a camera or a positioning module.
  • The server may obtain the terminal capability parameters of the display device in advance, before the display device requests response data for a voice command. When the display device requests the response data, the server calculates the recommendation data for the voice command text provided by the display device. It then checks whether the terminal capability parameters of the display device support the application jumps corresponding to the recommendation data. If all recommendation data is supported, all of it is sent to the display device. If none is supported, no recommendation data is sent. If only part is supported, only the supported part is sent.
  • Alternatively, the server may not obtain the terminal capability parameters in advance. In that case, the display device sends its terminal capability parameters to the server together with the text corresponding to the voice command.
  • The server may then calculate only recommendation data supported by the display device according to its terminal capability parameters. Alternatively, it may first determine candidate recommendation data and then filter it according to those parameters. Either way, only recommendation data supported by the display device is sent to it.
  • Alternatively, the server may not check the terminal capability parameters at all. It assumes by default that the display device supports all recommendation data, and directly sends the recommendation data calculated from the broadcast data or service type data. The display device then checks the recommendation data against its own terminal capability parameters. If the data is supported, the display device displays its content on the voice interaction interface; if not, the display device does not display that content. This also avoids displaying recommended content whose corresponding application jump is not supported.
  • The server may not calculate and deliver recommendation data on its own initiative, or it may find no recommendation data that matches the terminal capability parameters of the display device. In either case, the response data it returns for the voice command text sent by the display device does not include recommendation data.
  • In this case, either the response data carries a recommendation data identifier whose corresponding data is empty, or the response data carries no recommendation data identifier at all.
  • In either of these two situations, or in any other situation where the response data is identified as containing no recommendation data, the display device may send a recommendation data request to the server. The server then delivers recommendation data on request.
  • If the data corresponding to the recommendation data identifier is empty, the server has performed the recommendation calculation but has not found recommendation data that matches the terminal capability parameters of the display device.
  • This may be because the server has not stored the terminal capability parameters of the display device in advance. It may also be because the stored terminal capability parameters do not support displaying the recommendation data. The display device can therefore send a recommendation data request when the data corresponding to the identifier is empty, and attach its current terminal capability parameters to the request. The server then calculates the recommendation data according to those parameters and sends it to the display device. This solves the problem of unsupported recommendation data caused by missing or outdated terminal capability parameters.
  • The latest terminal capability parameters may still not support displaying the recommendation data.
  • In that case, the data sent by the server to the display device may still contain no recommendation data, and the display device does not request recommendation data from the server again.
  • If the response data contains no recommendation data identifier, the server is configured by default not to calculate recommendation data on its own initiative.
  • In this case, the display device may send a recommendation data request to the server with its current terminal capability parameters attached. The server then calculates and delivers recommendation data according to the parameters in the request.
  • Alternatively, the display device may leave its terminal capability parameters out of the recommendation data request. The server then assumes that the display device supports all recommendation data, and sends the recommendation data calculated from the broadcast data or service type data.
  • The terminal capability parameters uploaded by the display device to the server may include all of its hardware capability parameters and software capability parameters.
  • Alternatively, the uploaded parameters may be limited to those relevant to the answer data. For this purpose, the display device may pre-store a mapping between keywords and terminal capability parameters.
  • The display device can then extract keywords from the answer data, look up the terminal capability parameters corresponding to those keywords, and upload only those parameters. The server then calculates the recommendation data according to these parameters.
  • Suppose the server sends the recommendation data on its own initiative. After receiving response data that contains recommendation data, the display device can generate a UI interface containing the recommended content from the response data. Once the UI interface is generated, the display device controls the display to show it and controls the audio output device to play the text corresponding to the broadcast data.
  • If the server does not deliver the recommendation data on its own initiative, the display device needs some time to obtain it from the server through a recommendation data request.
  • In this case, the display device can be configured to first generate and display a UI interface based on the service type data and broadcast data, and to update the UI interface after receiving the recommendation data.
  • Alternatively, the display device can be configured to obtain the recommendation data first, and only then generate and display the UI interface and start broadcasting.
  • The display device may mark the response data as initial response data, which may include broadcast data, service type data and recommendation data. If the display device responded directly according to the initial response data, the voice broadcast might be unrelated to the UI interface.
  • The display device can therefore process the initial response data into final response data and respond according to the final response data. This links the voice broadcast with the UI interface.
  • If the initial response data does not include broadcast data, the display device can respond directly according to the initial response data.
  • The process by which the display device turns the initial response data into the final response data is described below.
  • A recommendation control can be configured from the recommendation data, and the recommendation control is triggered in response to a trigger.
  • For example, the display device can configure the trigger condition of the recommendation control as follows. The recommendation control is triggered when it has focus and the display device receives a confirmation signal input by the user, such as the signal sent by the confirm key on the remote control.
  • A response interface may be generated from the answer data and the recommendation data. The response interface then includes graphic and text objects corresponding to the answer data and recommendation controls corresponding to the recommendation data.
  • Among the graphic and text objects corresponding to the answer data on the response interface, the display device may detect the target graphics and text that correspond to the broadcast text.
  • The target graphics and text may be objects, such as text or graphics, that are associated with the broadcast text and can be specially displayed.
  • Special display means a display that differs from how the object was displayed before the broadcast.
  • The target graphics and text may include target text. The content corresponding to the answer data usually contains the same text as the broadcast text. For example, the response interface may contain a dialog box whose text is identical to the broadcast text. The target text to be specially displayed as the broadcast progresses can therefore be determined from the text displayed on the response interface.
  • FIGS. 10-13 show schematic diagrams of voice interaction interfaces during content recommendation according to some embodiments.
  • A recommendation control can be generated from the recommendation data.
  • For example, the recommendation data corresponding to the broadcast text “advisable to wash the car” is nearby car wash shops.
  • The associated apps are a group-purchase app and a map app.
  • Multiple recommendation controls can therefore be generated. The recommendation control on the left is configured to jump to the car wash shop interface of a group-purchase app.
  • The recommendation control on the right is configured to jump to the car wash shop navigation interface of a map app.
  • When the broadcast reaches “advisable to wash the car”, the display effect of these recommendation controls can be updated. For example, they can be moved to a prominent position on the current interface, such as the center.
  • The broadcast points may be highlighted one after another, following the order of the voice broadcast.
  • For example, the broadcast text includes “XX1, XX2, XX3, XX4, XX5”. XX1, XX3, XX4 and XX5 each correspond to two recommendation controls. One is an avatar control, which is a graphic control. The other is an entry control, which is a graphic-and-text control that displays an avatar and a text introduction.
  • When XX1 is broadcast, the target text and recommendation controls corresponding to XX1 are updated. The target text can be updated by changing its color. The avatar control corresponding to XX1 can be updated by changing the color of its edge. The entry control corresponding to XX1 can be updated by displaying it below the avatar control and giving it the focus of the display device. The entry controls corresponding to XX3, XX4 and XX5 are not displayed for the time being.
  • When XX2 is broadcast, the display effect of the target text corresponding to XX2 is updated, for example by changing its color. XX2 has no corresponding recommendation control, so the recommendation controls corresponding to XX1 can either keep the display effect they had while XX1 was broadcast, or revert to the display effect they had before XX1 was broadcast.
  • When XX3 is broadcast, the target text corresponding to XX3 can be updated by changing its color. The avatar control corresponding to XX3 can be updated by changing the color of its edge. The entry control corresponding to XX3 can be updated by displaying it below the avatar control and giving it the focus of the display device. The entry controls corresponding to XX1, XX4 and XX5 are not displayed for the time being. Alternatively, the entry control corresponding to XX1 can be moved below the entry control corresponding to XX3, which gives it a lower display priority than the entry control corresponding to XX3.
  • When XX5 is broadcast, the target text corresponding to XX5 can be updated by changing its color. The avatar control corresponding to XX5 can be updated by changing the color of its edge. The entry control corresponding to XX5 can be updated by displaying it below the avatar control and giving it the focus of the display device. The entry controls corresponding to XX1, XX3 and XX4 are not displayed. Alternatively, the entry control corresponding to XX4 can be moved below the entry control corresponding to XX5, which gives it a lower display priority than the entry control corresponding to XX5.
  • In summary, the display device parses the response data after receiving it for a voice command. When the response data includes recommendation data, the display device generates recommendation controls in the UI interface and highlights them during the broadcast. After entering one voice command, the user can see on the UI interface the recommendations related to the current voice broadcast, without having to enter another voice command. This reduces interaction steps and improves the interaction experience.
  • Further, the broadcast points and recommendation controls in the UI interface are highlighted automatically as the voice broadcast progresses. The user can follow the progress of the broadcast on the UI interface without listening closely to the voice broadcast, which reduces the user's attention burden and improves the user experience.
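The recommendation-request logic described in these embodiments can be sketched as follows. The sketch covers three points: a missing identifier is treated differently from an identifier with empty data, the device asks at most once, and the device can optionally filter recommendations on its side. This is a hypothetical illustration, not the applicant's implementation. The field name `recommendation`, the dictionary layouts of the response data and the terminal capability parameters, and the function names are all assumptions.

```python
from typing import Optional

# Hypothetical name of the recommendation data identifier in the response data.
REC_KEY = "recommendation"


def recommendation_request(response: dict, device_caps: dict,
                           already_requested: bool = False,
                           attach_caps: bool = True) -> Optional[dict]:
    """Decide whether the display device should ask the server for recommendation data.

    Returns the request payload to send, or None if no request is needed.
    """
    if already_requested:
        # The server already answered one request; do not ask again even if still empty.
        return None
    if REC_KEY not in response:
        # No identifier: the server does not compute recommendations by default.
        # Capability parameters are optional; without them the server assumes
        # the device supports every recommendation.
        return {"caps": device_caps if attach_caps else None}
    if not response[REC_KEY]:
        # Identifier present but empty: the server's capability parameters were
        # missing or outdated, so always attach the current ones.
        return {"caps": device_caps}
    return None


def displayable(recs: list, device_caps: dict) -> list:
    """Client-side filter: keep recommendations whose required apps and hardware
    are all present on this device (the dictionary keys are assumptions)."""
    apps = set(device_caps.get("apps", ()))
    hardware = set(device_caps.get("hardware", ()))
    return [
        r for r in recs
        if set(r.get("apps", ())) <= apps and set(r.get("hardware", ())) <= hardware
    ]
```

The `already_requested` flag models the rule that the display device does not ask again when the server's answer to its request is still empty.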

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

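Embodiments of the present application provide a display device and a voice interaction method. The display device includes a display and a controller connected to the display. The controller is configured to: receive a voice command input by a user; in response to the voice command, obtain response data corresponding to the voice command; when the response data includes broadcast data, obtain from the response data at least two broadcast targets corresponding to the broadcast data; and control an audio output device to play the audio corresponding to the broadcast data and, when playback reaches a broadcast target, highlight the broadcast target on the display. The present application solves the technical problem of a poor voice interaction experience.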

Description

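Display device and voice interaction method
This application claims priority to Chinese patent application No. 202110291989.6, entitled "Display device and voice interaction method" and filed with the China Patent Office on March 18, 2021. It also claims priority to Chinese patent application No. 202110320136.0, entitled "Display device, server and voice interaction method" and filed with the China Patent Office on March 25, 2021. The entire contents of both applications are incorporated herein by reference.
Technical Field
The present application relates to the technical field of voice interaction, and in particular to a display device and a voice interaction method.
Background
As televisions become increasingly intelligent, they now support voice control in addition to traditional remote control. A user can speak to the television. The television recognizes text from the speech, queries the semantics of the text over the network, and responds according to a preset relationship between semantics and the services of the display device. For example, the user's speech may be a query, and the display device may respond by displaying the answer to the query and reading the answer aloud. However, in the related art, reading the answer aloud and displaying the answer are independent processes. The user has to concentrate on listening to the spoken answer while also watching the displayed answer, which makes for a poor voice interaction experience.
Summary
To solve the technical problem of a poor voice interaction experience, the present application provides a display device and a voice interaction method.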
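In a first aspect, the present application provides a display device, including:
a display; and
a controller connected to the display, the controller being configured to:
receive a voice command input by a user;
in response to the voice command, obtain response data corresponding to the voice command;
when the response data includes audio data and display data, and the display data includes answer data for the voice command and recommendation data for the voice command, generate a response interface containing graphic and text objects corresponding to the answer data and recommendation controls corresponding to the recommendation data, wherein each recommendation control is configured to jump, when triggered, to the user interface corresponding to that recommendation control; and
control the display to show the response interface, and control an audio output device connected to the display device to play the audio corresponding to the audio data.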
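The response interface of this aspect can be modeled as a set of answer objects plus recommendation controls. Following the trigger condition described in the embodiments, each control performs its jump only when it has focus and the confirm key is pressed. A minimal sketch, assuming a key code named `OK`, a `display_data` dictionary with `answer` and `recommendation` entries, and string identifiers for target interfaces; all of these names are hypothetical, and a real display device would use its own UI framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecommendControl:
    label: str
    target_ui: str                   # hypothetical identifier of the interface to jump to
    on_jump: Callable[[str], None]   # callback that performs the jump
    focused: bool = False

    def on_key(self, key: str) -> bool:
        """Jump only when this control has focus and the confirm key arrives."""
        if self.focused and key == "OK":
            self.on_jump(self.target_ui)
            return True
        return False


@dataclass
class ResponseInterface:
    answer_objects: list = field(default_factory=list)  # graphic/text objects for the answer data
    controls: list = field(default_factory=list)        # recommendation controls


def build_response_interface(display_data: dict,
                             jump: Callable[[str], None]) -> ResponseInterface:
    """Build a response interface from answer data plus optional recommendation data."""
    controls = [
        RecommendControl(r["label"], r["target_ui"], jump)
        for r in display_data.get("recommendation", [])
    ]
    return ResponseInterface(list(display_data.get("answer", [])), controls)
```

If the display data carries no recommendation data, the interface simply contains the answer objects and no controls.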
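In a second aspect, the present application provides a server, the server being configured to:
receive, from a display device, text converted from a voice command;
obtain answer data corresponding to the text according to preset service processing rules;
obtain recommendation data corresponding to the answer data according to preset recommendation rules; and
send the answer data and the recommendation data to the display device.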
第三方面,本申请提供的语音交互方法,该方法包括:
接收用户输入的语音指令;
响应于所述语音指令,获取所述语音指令对应的响应数据;
在所述响应数据包括音频数据和显示数据时,若所述显示数据包括所述语音指令的答 案数据和所述语音指令的推荐数据,生成包含所述答案数据对应的图文对象、以及所述推荐数据对应的推荐控件的响应界面,其中,所述推荐控件被配置为响应于触发时跳转至所述推荐控件对应的用户界面;
控制所述显示器显示所述响应界面,并控制与之相连接的音频输出装置播放所述音频数据对应的音频。
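In a fourth aspect, the present application provides a display device, including:
a display;
a controller connected to the display, the controller being configured to:
receive a voice command input by a user;
in response to the voice command, obtain response data corresponding to the voice command;
when the response data includes broadcast data, obtain from the response data at least two broadcast targets corresponding to the broadcast data;
control an audio output device to play audio corresponding to the broadcast data and, when playback reaches a broadcast target, display that broadcast target with emphasis on the display.
In a fifth aspect, the present application provides a voice interaction method, including:
receiving a voice command input by a user;
in response to the voice command, obtaining response data corresponding to the voice command;
when the response data includes broadcast data, obtaining from the response data at least two broadcast targets corresponding to the broadcast data;
controlling an audio output device to play audio corresponding to the broadcast data and, when playback reaches a broadcast target, displaying that broadcast target with emphasis on the display.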
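Brief Description of the Drawings
To describe the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control apparatus according to some embodiments;
FIG. 2 exemplarily shows a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
FIG. 3 exemplarily shows a hardware configuration block diagram of the display device 200 according to some embodiments;
FIG. 4 exemplarily shows a schematic diagram of the software configuration of the display device 200 according to some embodiments;
FIG. 5 exemplarily shows a schematic diagram of the voice interaction principle according to some embodiments;
FIG. 6 exemplarily shows a schematic diagram of a voice interaction interface according to some embodiments;
FIG. 7 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 8 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 9 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 10 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 11 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 12 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments;
FIG. 13 exemplarily shows a schematic diagram of another voice interaction interface according to some embodiments.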
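Detailed Description
To make the objectives and implementations of the present application clearer, the exemplary implementations of the present application are described clearly and completely below with reference to the drawings of the exemplary embodiments. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the brief explanations of terms in the present application are only intended to facilitate understanding of the implementations described below, not to limit them. Unless otherwise stated, these terms should be understood according to their ordinary and customary meanings.
The terms "first", "second", "third", and the like in the specification, claims, and drawings of the present application are used to distinguish similar or same-type objects or entities and do not necessarily imply a particular order or sequence, unless otherwise noted. It should be understood that terms so used are interchangeable where appropriate.
The terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a product or device containing a series of components is not necessarily limited to the components expressly listed, but may include other components that are not expressly listed or that are inherent to the product or device.
The term "module" refers to any known or later-developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code capable of performing the function associated with that element.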
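FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1, a user can operate the display device 200 through the smart device 300 or the control apparatus 100.
In some embodiments, the control apparatus 100 may be a remote control. Communication between the remote control and the display device includes infrared protocol communication, Bluetooth protocol communication, and other short-range communication methods, and the display device 200 is controlled wirelessly or by wire. The user can input commands through keys on the remote control, voice input, control panel input, and the like to control the display device 200.
In some embodiments, a smart device 300 (such as a mobile terminal, tablet computer, computer, or laptop) can also be used to control the display device 200, for example through an application running on the smart device.
In some embodiments, the display device 200 may also be controlled in ways other than through the control apparatus 100 and the smart device 300. For example, the user's voice commands may be received directly by a voice-command acquisition module configured inside the display device 200, or by a voice control device provided outside the display device 200.
In some embodiments, the display device 200 also communicates data with the server 400. The display device 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various content and interactions to the display device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.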
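FIG. 2 exemplarily shows a configuration block diagram of the control apparatus 100 according to an exemplary embodiment. As shown in FIG. 2, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 can receive the user's input operation commands and convert them into commands that the display device 200 can recognize and respond to, acting as an intermediary for interaction between the user and the display device 200.
FIG. 3 shows a hardware configuration block diagram of the display device 200 according to an exemplary embodiment.
In some embodiments, the display device 200 includes at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
In some embodiments, the controller includes a processor, a video processor, an audio processor, a graphics processor, RAM, ROM, and first to n-th interfaces for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output by the controller and displays video content, image content, menu control interfaces, and user-operated UI interfaces.
In some embodiments, the display 260 may be a liquid crystal display, an OLED display, or a projection display, or may be a projection device with a projection screen.
In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near-field communication protocol chips, and an infrared receiver. Through the communicator 220, the display device 200 can send and receive control signals and data signals to and from the external control apparatus 100 or the server 400.
In some embodiments, the user interface can be used to receive control signals from the control apparatus 100 (such as an infrared remote control).
In some embodiments, the detector 230 is used to collect signals from the external environment or from interaction with the outside. For example, the detector 230 includes a light receiver, i.e., a sensor for collecting ambient light intensity; or the detector 230 includes an image collector, such as a camera, which can capture the external environment scene, user attributes, or user interaction gestures; or the detector 230 includes a sound collector, such as a microphone, for receiving external sound.
In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following: a High-Definition Multimedia Interface (HDMI), an analog or data high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, and the like. It may also be a composite input/output interface formed by several of the above interfaces.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory. The controller 250 controls the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any selectable object, such as a hyperlink, an icon, or another operable control. Operations related to the selected object include displaying the page, document, or image linked by a hyperlink, or executing the program corresponding to an icon.
In some embodiments, a "user interface" is a medium interface for interaction and information exchange between an application or operating system and a user; it converts between the internal form of information and a form acceptable to the user. A common form of user interface is the Graphic User Interface (GUI), i.e., a user interface related to computer operations that is displayed graphically. It may consist of interface elements such as icons, windows, and controls displayed on the screen of an electronic device, where controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.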
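Referring to FIG. 4, in some embodiments the system is divided into four layers, from top to bottom: the Applications layer ("application layer"), the Application Framework layer ("framework layer"), the Android runtime and system library layer ("system runtime library layer"), and the kernel layer.
In some embodiments, at least one application runs in the application layer. These may be applications that ship with the operating system, such as a window program, a system settings program, or a clock program, or applications developed by third-party developers. In specific implementations, the application packages in the application layer are not limited to these examples.
The framework layer provides an application programming interface (API) and a programming framework for applications. The application framework layer includes some predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API, applications can access system resources and obtain system services during execution.
In some embodiments, the system runtime library layer supports the layer above it, i.e., the framework layer; when the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime library layer to implement the functions required by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in FIG. 4, the kernel layer contains at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint, temperature, and pressure sensors), and power driver.
In some embodiments, the hardware or software architecture may be based on the description in the above embodiments, or on other similar hardware or software architectures, as long as the technical solutions of the present application can be implemented.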
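To clearly describe the embodiments of the present application, a speech recognition network architecture provided by an embodiment of the present application is described below with reference to FIG. 5.
Referring to FIG. 5, which is a schematic diagram of a speech recognition network architecture provided by an embodiment of the present application. In FIG. 5, the smart device receives input information and outputs the result of processing that information. The speech recognition service device is an electronic device on which a speech recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. These electronic devices may include servers, computers, and the like. The speech recognition service, the semantic service (also called a semantic engine), and the business service are web services that can be deployed on electronic devices: the speech recognition service recognizes audio as text, the semantic service performs semantic parsing of text, and the business service provides specific services, such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, the architecture shown in FIG. 5 may include multiple physical service devices on which different business services are deployed, or one or more functional services may be combined in one or more physical service devices.
In some embodiments, the processing of information input to the smart device based on the architecture shown in FIG. 5 is described below by way of example. Taking a spoken query as the input information, the process may include the following three stages:
[Speech Recognition]
After receiving a spoken query, the smart device may upload the audio of the query to the speech recognition service device, which recognizes the audio as text through the speech recognition service and returns the text to the smart device. In one embodiment, before uploading the audio, the smart device may denoise it; denoising may include steps such as removing echo and ambient noise.
[Semantic Understanding]
The smart device uploads the query text recognized by the speech recognition service to the semantic service device, which performs semantic parsing of the text through the semantic service to obtain its service domain, intent, and so on.
[Semantic Response]
Based on the semantic parsing result of the query text, the semantic service device issues a query command to the corresponding business service device to obtain the query result given by the business service. The smart device may obtain the query result from the semantic service device and output it. As one embodiment, the semantic service device may also send the semantic parsing result of the query to the smart device, so that the smart device outputs the feedback sentence contained in that result.
It should be noted that the architecture shown in FIG. 5 is only an example and does not limit the scope of protection of the present application. In the embodiments of the present application, other architectures may also be used to implement similar functions; for example, all or some of the three stages may be completed by the smart terminal, which is not described in detail here.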
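The three stages above can be sketched as a small pipeline. This is a minimal illustration only: the three services are stubbed as local functions with hypothetical names (`recognize`, `parse`, `query_service`), the denoising step is omitted, and a canned weather answer stands in for a real business service. In FIG. 5 these are web services that may run on separate devices.

```python
from dataclasses import dataclass


@dataclass
class ParseResult:
    domain: str  # service domain, e.g. "weather"
    intent: str  # user intent, e.g. "query_today"
    text: str    # recognized query text


def recognize(audio: bytes) -> str:
    """Stub for the speech recognition service (audio -> text).
    Stand-in assumption: the 'audio' is already UTF-8 text."""
    return audio.decode("utf-8")


def parse(text: str) -> ParseResult:
    """Stub for the semantic service (text -> domain and intent)."""
    if "天气" in text:
        return ParseResult("weather", "query_today", text)
    return ParseResult("chat", "unknown", text)


def query_service(result: ParseResult) -> str:
    """Stub for the business service chosen by the semantic service."""
    services = {"weather": lambda r: "崂山区今天多云,3到8摄氏度"}
    handler = services.get(result.domain, lambda r: "抱歉,我没有听懂")
    return handler(result)


def handle_voice_query(audio: bytes) -> str:
    text = recognize(audio)       # [Speech Recognition]
    result = parse(text)          # [Semantic Understanding]
    return query_service(result)  # [Semantic Response]
```

Splitting the stages into separate functions mirrors the point made above: any stage can be moved to the smart device or to a remote service without changing the others.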
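In some embodiments, the smart device shown in FIG. 5 may be a display device, such as a smart TV. The functions of the speech recognition service device may be implemented jointly by a sound collector and the controller of the display device, and the functions of the semantic service device and the business service device may be implemented by the controller of the display device or by a server of the display device.
In some embodiments, a query or other interactive sentence that the user inputs to the display device by voice may be called a voice command.
In some embodiments, what the display device obtains from the semantic service device is the query result given by the business service. The display device may analyze the query result, generate response data for the voice command, and then control the display device to perform corresponding actions according to the response data. For example, analysis may show that the query result includes a piece of text marked with a broadcast flag. The display device may generate response data according to a preset response rule; one exemplary rule is: when text marked with a broadcast flag is obtained, generate on the voice interaction interface a dialog box containing the text of the broadcast data, and broadcast that text by voice. Accordingly, the display device may generate, by this rule, response data including UI interface data and broadcast data: the UI interface corresponding to the UI interface data contains a dialog box with the text of the query result, and the broadcast data includes the text to be broadcast and a control instruction that calls the audio playback device to play that text.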
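The exemplary response rule can be sketched as a function that turns a query result into response data consisting of UI interface data and, when the text carries a broadcast flag, broadcast data. The structure and the field names (`broadcast_flag`, `dialog`, `play_tts`) are hypothetical; the specification does not define a data format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryResult:
    text: str
    broadcast_flag: bool  # True if the text is marked for voice broadcast


@dataclass
class ResponseData:
    ui: dict = field(default_factory=dict)  # UI interface data
    broadcast: Optional[dict] = None        # broadcast data, if any


def apply_response_rule(result: QueryResult) -> ResponseData:
    """Flagged text -> a dialog box showing the text, plus broadcast data that
    asks the audio playback device to speak the same text."""
    response = ResponseData(ui={"dialog": result.text})
    if result.broadcast_flag:
        response.broadcast = {"text": result.text, "instruction": "play_tts"}
    return response
```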
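In some embodiments, what the display device obtains from the semantic service device is the semantic parsing result of the voice command; the display device may analyze that result, generate response data, and then control the display device to perform corresponding actions according to the response data.
In the related art, the response data for a voice command includes service-type data but no broadcast data, where the service-type data may include UI interface data and/or control instructions for the display device. This typically occurs when the user is issuing a command to the display device. For example, when the voice command is a volume adjustment command to increase the volume, the response data may include UI interface data for displaying a volume bar and a control instruction to increase the speaker volume. The display device adjusts the volume and displays the volume bar according to the response data, without a voice broadcast.
In some other related art, the response data for a voice command includes broadcast data and service-type data, where the service-type data may include UI interface data. This typically occurs in human-machine dialogue. For example, when the user says "How is the weather today?", the display device needs to feed the query result back to the user through a voice broadcast, so the response data includes broadcast data. For a query about today's weather, the response data may include UI data for displaying today's weather details and broadcast data containing weather information such as temperature, wind force, and humidity. The display device may display the UI interface and perform the voice broadcast according to the response data.
However, in the related art, when the response data includes both broadcast data and service-type data, the display device displays the UI interface and performs the voice broadcast as two independent, unrelated processes; the user has to relate the broadcast text to the UI interface on their own, which makes for a poor experience.
To solve this technical problem, in some embodiments, after obtaining the response data for a voice command, if the response data includes broadcast data, the display device generates an initial UI interface from the UI interface data, detects the broadcast targets corresponding to the broadcast data in the initial UI interface, displays the initial UI interface, and performs the voice broadcast; when the broadcast reaches a broadcast target, the target is displayed with emphasis on the initial UI interface. The user can thus see on the UI interface what is currently being broadcast: the broadcast text is automatically linked with the UI interface, improving the user experience.
Taking the voice interaction between a user and the display device as an example, the technical solution for linking the broadcast text with the UI interface is described in detail below.
In some embodiments, a voice control key may be provided on the remote control of the display device. After the user presses and holds the voice control key, the controller of the display device may control the display to show a voice interaction interface and control a sound collector, such as a microphone, to collect sound around the display device. The user may then input a voice command to the display device.
In some embodiments, the display device may support voice wake-up, with its sound collector continuously collecting sound. After the user speaks a wake-up word, the display device performs speech recognition on the input; once the input is recognized as the wake-up word, the display device may control the display to show the voice interaction interface, and the user may continue to input voice commands.
In some embodiments, after the user inputs a voice command, the sound collector may keep collecting sound while the display device is obtaining the response data or responding according to it. The user may at any time press and hold the voice control key on the remote control to input a new voice command, or speak the wake-up word; the display device may then end the previous voice interaction session and start a new one for the newly input voice command, keeping voice interaction responsive in real time.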
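One way to realize this "newest command wins" behavior is to run each voice interaction session as an asyncio task and cancel the running task when a new command arrives. This is a sketch under that assumption; `SessionManager` and the `handle` coroutine are hypothetical names, not part of the described device.

```python
import asyncio


class SessionManager:
    """Keeps at most one voice interaction session alive at a time."""

    def __init__(self, handle):
        self._handle = handle  # coroutine function: command -> result
        self._current = None   # the running session task, if any

    async def submit(self, command):
        # A new command (voice key press or wake word) ends the previous session.
        if self._current is not None and not self._current.done():
            self._current.cancel()
            try:
                await self._current  # let the old session finish cancelling
            except asyncio.CancelledError:
                pass
        self._current = asyncio.create_task(self._handle(command))
        return self._current
```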
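In some embodiments, when the current interface of the display device is the voice interaction interface, the display device performs speech recognition on the user's voice command to obtain the corresponding text; the display device itself or its server performs semantic understanding on the text to obtain the user intent, processes the intent to obtain a semantic parsing result, and generates response data from that result. This response data may be called initial response data; if the display device responded directly according to it, the voice broadcast and the UI interface might be independent of each other.
In some embodiments, to prevent the voice broadcast from interfering with the sound collector's pickup, echo cancellation may be applied to the broadcast sound.
To keep the voice broadcast and the UI interface from being independent, in some embodiments the display device may process the initial response data to obtain final response data and respond according to the final response data, so that the voice broadcast is linked with the UI interface. If the initial response data does not contain broadcast data, the display device may respond directly according to the initial response data.
Taking initial response data that includes broadcast data as an example, the process by which the display device processes the initial response data into final response data is described below.
In some embodiments, if the initial response data includes broadcast data and UI interface data, an initial UI interface may be generated from the UI interface data, and then the broadcast targets corresponding to the broadcast data may be detected on the initial UI interface. A broadcast target may be an object, such as text or a graphic, that is associated with the broadcast text and can be specially displayed; special display means a display that differs from that of the initial UI interface.
In some embodiments, the broadcast targets may include text targets. The content shown on the initial UI interface usually includes the broadcast text; for example, the initial UI interface contains a dialog box in which the broadcast text appears. After extracting the broadcast text from the semantic parsing result, the display device may split it into at least two character groups according to a preset splitting rule and determine each character group as a text target.
One exemplary splitting rule: split the broadcast text into individual Chinese characters, each of which serves as a character group. If a punctuation mark lies between a character and the next one, the mark may be ignored, added to the group of the preceding character, or added to the group of the following character. For example, the broadcast text "你好!" ("Hello!") may be split into two character groups, "你" and "好!", giving two text targets.
Another exemplary splitting rule: split the broadcast text into words, handling punctuation as above. For example, the broadcast text "你好,我叫小A。" ("Hello, my name is Little A.") may be split into four character groups, "你好,", "我", "叫", and "小A。", giving four text targets.
Another exemplary splitting rule: use punctuation marks as delimiters and split the broadcast text into short clauses. For example, the broadcast text "你好,我叫小A。" may be split into two character groups, "你好," and "我叫小A。", giving two text targets.
These splitting rules are only examples; other splitting rules may be used in practice.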
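The character-level and clause-level rules above can be sketched as follows. The punctuation set is an assumption, and this sketch attaches each punctuation mark to the preceding group (one of the three options described). The word-level rule would additionally need a Chinese word segmenter and is not shown.

```python
import re

# Punctuation treated as delimiters (full-width and ASCII); an assumed set.
PUNCT = "，。！？、；：,.!?;:"
_P = re.escape(PUNCT)


def split_by_char(text: str) -> list[str]:
    """Character rule: one group per character; a punctuation mark joins
    the group of the character before it."""
    groups: list[str] = []
    for ch in text:
        if ch in PUNCT and groups:
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


def split_by_clause(text: str) -> list[str]:
    """Clause rule: each clause keeps its trailing punctuation marks."""
    return re.findall(rf"[^{_P}]+[{_P}]*", text)
```

For example, `split_by_clause("你好,我叫小A。")` yields the two groups of the clause example above.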
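In some embodiments, in addition to the broadcast text, the initial UI interface displays other text. This text may be matched against the character groups, and any text that matches a character group is also determined as a text target. For example, the broadcast text "今天的天气是多云。" ("Today's weather is cloudy.") may be split into the character groups "今天的", "天气", "是", and "多云。". The initial UI interface contains the voice interaction dialog box and today's weather details: the dialog box shows the broadcast text "今天的天气是多云。", and the weather details include the text "多云" ("cloudy"), which matches a character group. Therefore, "多云" in the weather details may also be determined as a text target for the character group "多云".
In some embodiments, the text matching rule may require the texts to be identical, or their meanings to be identical, similar, or related. For example, if a character group of the broadcast text is "3到8摄氏度" ("3 to 8 degrees Celsius") and the weather details on the initial UI interface contain "3℃~8℃", then "3℃~8℃" may be determined as a text target because it means the same as "3到8摄氏度".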
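A minimal sketch of the "same meaning" rule, assuming it is approximated by normalizing both strings (dropping the units 摄氏度/℃, unifying range separators between digits, removing punctuation) and then testing whether the normalized character group occurs in the normalized UI text. The two-character minimum is also an assumption, meant to stop single characters such as "是" from matching everywhere. A production system might use a synonym table or a semantic similarity model instead.

```python
import re

PUNCT_RE = re.compile(r"[\s，。！？、；：,.!?;:()（）]")
RANGE_RE = re.compile(r"(\d+)\s*(?:到|至|~|～|-)\s*(\d+)")


def normalize(text: str) -> str:
    text = text.replace("摄氏度", "").replace("℃", "")  # unify temperature units
    text = RANGE_RE.sub(r"\1~\2", text)                  # "3到8" / "3~8" -> "3~8"
    return PUNCT_RE.sub("", text)                        # drop punctuation and spaces


def matches(group: str, ui_text: str, min_len: int = 2) -> bool:
    """True if the normalized character group occurs in the normalized UI text."""
    g, u = normalize(group), normalize(ui_text)
    return len(g) >= min_len and g in u
```

With this normalization, "多云,3到8摄氏度" and "多云3℃~8℃" both reduce to "多云3~8", which is the pairing shown later for FIG. 7.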
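In some embodiments, the broadcast targets may include graphic targets. The initial UI interface may show multiple graphics, some of which carry text captions that may match the broadcast text. For example, the broadcast text "紫外线弱,不适宜钓鱼,适宜室内锻炼。" ("UV is weak; not suitable for fishing; suitable for indoor exercise.") may be split into the character groups "紫外线", "弱,", "不适宜", "钓鱼,", "适宜", "室内", and "锻炼". The graphics on the initial UI interface may include a sun graphic captioned "紫外线弱" ("UV weak") and a fish graphic captioned "钓鱼一般" ("fishing: fair"). By the text matching rule, the caption "紫外线弱" is related to the character group "紫外线" and the caption "钓鱼一般" to the character group "钓鱼", so the sun graphic may be determined as a graphic target for "紫外线" and the fish graphic as a graphic target for "钓鱼".
Thus, after a piece of broadcast text is split into character groups, each character group may correspond on the initial UI interface to one or several text targets, and to one graphic target or none; in some embodiments, a character group may also correspond to several graphic targets.
In some embodiments, the display device may number the character groups to distinguish them.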
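Putting these steps together, the sketch below numbers each character group and collects the other UI texts and the captioned graphics it matches. It assumes a punctuation-insensitive substring test with a two-character minimum (both assumptions); `Graphic` and `build_target_map` are hypothetical names. With the UV/fishing example it reproduces the result described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Graphic:
    graphic_id: str  # hypothetical identifier of a graphic on the UI
    caption: str     # text caption displayed beside the graphic


def _key(text: str) -> str:
    """Strip punctuation and whitespace before comparing."""
    return re.sub(r"[\s，。！？、；：,.!?;:]", "", text)


def build_target_map(groups, ui_texts, graphics, min_len=2):
    """Map group number -> {"group", "texts", "graphics"} for that group."""
    targets = {}
    for number, group in enumerate(groups):
        key = _key(group)
        usable = len(key) >= min_len
        targets[number] = {
            "group": group,
            "texts": [t for t in ui_texts if usable and key in _key(t)],
            "graphics": [g.graphic_id for g in graphics
                         if usable and key in _key(g.caption)],
        }
    return targets
```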
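After the broadcast targets are obtained, the display device needs to track the broadcast progress during the voice broadcast, so that it can emphasize each broadcast target on time once the corresponding text has been played.
In some embodiments, the display device may collect sound through the microphone during the voice broadcast, thereby capturing the sound emitted by its speaker, convert that sound to text, and match the converted text against the broadcast text to obtain the current playback progress; when the broadcast progress corresponds to a broadcast target, that target is emphasized. The lag between the progress obtained this way and the actual broadcast progress equals the time the display device takes to process the data, so this method tracks the actual broadcast progress closely and keeps the emphasis of broadcast targets accurate.
However, this method requires the display device to process the speaker output in real time, which consumes considerable resources; on a display device with weak computing capability, it may cause stuttering.
In some embodiments, the display device may instead precompute the time from the start of the broadcast to each broadcast target, start a timer when the broadcast starts, and derive the broadcast progress from its voice broadcast speed. The specific implementation is as follows:
In some embodiments, after obtaining multiple broadcast targets, the display device may calculate, for each broadcast target, the time the voice broadcast takes to go from its starting point to that target.
In some embodiments, the display device may support broadcasting in different voices (timbres), and different voices may speak at slightly different speeds; for example, the supported voices include a female voice, which is slightly faster, and a male voice, which is slightly slower. The display device may use the female voice by default (or, alternatively, the male voice). The user may preset the voice, in which case the display device broadcasts in the user's chosen voice; otherwise, it broadcasts in the default voice.
In some embodiments, the display device may determine its voice broadcast speed from the current voice, and calculate the start time and duration of each character group's broadcast from the broadcast speed and the character distance between the start of the character group and the start of the broadcast. In some embodiments, after calculating the start time and duration for a broadcast target, the display device may adjust the UI interface data so that: once the start time of the broadcast target is reached (i.e., the broadcast progress reaches the target), the target is emphasized according to a preset emphasis rule; the emphasis is maintained during the target's broadcast duration; and after that duration, the emphasis is removed, letting the user follow the current broadcast progress.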
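The timing calculation (also the subject of claim 15) amounts to start = offset / speed and duration = length / speed, with offsets measured in characters from the start of the broadcast text. A minimal sketch, assuming a constant speed per voice: the speeds below are placeholder values, not figures from the specification, and punctuation is counted as one character even though a real TTS engine pauses for it.

```python
from dataclasses import dataclass
from typing import Optional

VOICE_SPEED = {"female": 5.0, "male": 4.0}  # placeholder characters/second
DEFAULT_VOICE = "female"


@dataclass
class GroupTiming:
    number: int
    group: str
    start_s: float
    duration_s: float

    @property
    def end_s(self) -> float:
        return self.start_s + self.duration_s


def schedule(groups: list[str], voice: Optional[str] = None) -> list[GroupTiming]:
    speed = VOICE_SPEED[voice or DEFAULT_VOICE]  # user's voice, else the default
    timings, offset = [], 0
    for number, group in enumerate(groups):
        timings.append(GroupTiming(number, group, offset / speed, len(group) / speed))
        offset += len(group)  # character distance from the start of the broadcast
    return timings
```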
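In some embodiments, the UI interface data may instead be adjusted so that, after a target's broadcast duration, the target stays emphasized until the next broadcast target needs to be emphasized, and only then is its emphasis removed.
In some embodiments, the UI interface data may also be adjusted so that, after a target's broadcast duration, the target stays emphasized and, when the next target is emphasized, its emphasis is not removed; both targets are emphasized at the same time, which also shows the user the current broadcast progress.
In some embodiments, the UI interface data may also be adjusted so that, after a target's broadcast duration, the target stays emphasized and, when the next target is emphasized, both are emphasized at the same time, but the emphasis level of the previous target is lowered.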
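The four emphasis strategies described above (remove after the target's duration, hold until the next target, keep every reached target, keep all but lower the earlier ones) can be expressed as a function of the playback clock. This sketch assumes each target's start time and duration have already been computed from the broadcast speed as described above, and uses a two-level scale (2 = full, 1 = lowered) chosen for illustration; the policy names are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    CLEAR_AFTER_DURATION = "clear"  # un-emphasize once the target's duration ends
    HOLD_UNTIL_NEXT = "hold"        # keep emphasis until the next target starts
    KEEP_ALL = "keep"               # every reached target stays emphasized
    KEEP_ALL_FADED = "fade"         # earlier targets stay, at a lower level


FULL, LOWERED = 2, 1


def emphasis_levels(spans, t, policy):
    """spans: (start_s, duration_s) per target, sorted by start time.
    Returns {target_index: level} for the targets emphasized at time t."""
    reached = [i for i, (start, _) in enumerate(spans) if start <= t]
    if not reached:
        return {}
    latest = reached[-1]
    if policy is Policy.CLEAR_AFTER_DURATION:
        start, duration = spans[latest]
        return {latest: FULL} if t < start + duration else {}
    if policy is Policy.HOLD_UNTIL_NEXT:
        return {latest: FULL}
    if policy is Policy.KEEP_ALL:
        return {i: FULL for i in reached}
    return {i: (FULL if i == latest else LOWERED) for i in reached}
```

Because the state is a pure function of elapsed time, the UI can simply re-evaluate it on each timer tick rather than tracking transitions.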
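In some embodiments, if the broadcast target is a text target, it may be emphasized by changing its color so that it differs from its previous color. Different emphasis levels may use different text colors; a higher emphasis level may use a text color with a relatively larger color difference from the background.
In some embodiments, if the broadcast target is a graphic target, it may be emphasized by displaying it at the top of the interface, or in front of the non-target objects on the interface. Different emphasis levels may correspond to different positions of the graphic target; a graphic target with a higher emphasis level may be placed higher or further forward.
In some embodiments, if the broadcast target is a graphic target, it may also be emphasized by moving the focus of the display device onto it.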
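The emphasis rules can be sketched as three small helpers: one picks text colors so that a higher emphasis level has a larger color difference from the background, one orders graphics back-to-front so that higher levels are drawn in front, and one picks the graphic that receives focus. Measuring color difference as Euclidean distance in RGB is an assumption (a perceptual metric such as CIEDE2000 would track human vision more closely), and the palette is made up.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two RGB triples (a simple stand-in metric)."""
    return math.dist(a, b)


def emphasis_colors(background, palette):
    """Return {2: color, 1: color}: the higher level gets the palette color
    farthest from the background, the lower level the next farthest.
    The palette must contain at least two colors."""
    ranked = sorted(palette, key=lambda c: color_distance(c, background), reverse=True)
    return {2: ranked[0], 1: ranked[1]}


def draw_order(levels):
    """Back-to-front drawing order; level 0 means not emphasized.
    Higher levels are drawn later, i.e. in front of the others."""
    return sorted(levels, key=levels.get)


def focus_target(levels):
    """Graphic that should receive focus: the highest emphasized level, if any."""
    emphasized = {g: lv for g, lv in levels.items() if lv > 0}
    return max(emphasized, key=emphasized.get) if emphasized else None
```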
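After the UI interface data is adjusted according to the above emphasis rules, the corresponding UI interface may be a dynamic interface that changes with the broadcast progress, and the final response data may be obtained from the adjusted UI interface data and the broadcast data.
In some embodiments, after obtaining the final response data, the display device may control the audio output device to start broadcasting the broadcast text and control the display to show the initial UI interface; when the broadcast reaches a broadcast target, that target is emphasized on the initial UI interface, and once the target has been broadcast, its emphasis is removed.
The audio output device may correspond to the audio output interface in FIG. 3, which may include or be connected to a speaker and an external audio output terminal.
To further describe how the display interface changes during this voice interaction, FIGS. 6-8 show schematic diagrams of voice interaction interfaces according to some embodiments.
Referring to FIG. 6, the broadcast text includes: "为您找到以下笑话写作班同学须在堂上写一篇短故事、要包括宗教、皇室、……" ("Here is a joke I found for you: students in a writing class had to write a short story in class, which had to include religion, royalty, ..."). The display device may split the broadcast text into character groups: the first may be "为您找到以下笑话" ("Here is a joke I found for you"), and the second and subsequent groups may each contain a single character, i.e., the second group is "写", the third "作", the fourth "班", and so on.
Each character group serves as a broadcast target, and when the broadcast reaches a target, its color may be changed. In FIG. 6, "写作班同学须在堂上写一篇短故事、要包括宗教、" is shown in a color different from the other broadcast targets, indicating that the current broadcast progress has reached "教".
Referring to FIG. 7, the broadcast text includes: "崂山区今天多云,3到8摄氏度,……" ("Laoshan District is cloudy today, 3 to 8 degrees Celsius, ..."). The display device may split the broadcast text into character groups: the first may be "崂山区" ("Laoshan District"), the second "今天" ("today"), the third "多云,3到8摄氏度" ("cloudy, 3 to 8 degrees Celsius"), and so on.
Each character group serves as a broadcast target, and when the broadcast reaches a target, its color may be changed. In FIG. 7, "多云,3到8摄氏度" is shown in a color different from the other broadcast targets, indicating that the current broadcast progress is at "多云,3到8摄氏度".
FIG. 7 also contains text targets that match the broadcast text. For example, "多云3℃~8℃" in the lower-left corner of FIG. 7 matches the broadcast target "多云,3到8摄氏度", so "多云3℃~8℃" may also be set as a broadcast target; when the voice broadcast reaches "多云,3到8摄氏度", "多云3℃~8℃" may also change color.
Referring to FIG. 8, which shows the interface of FIG. 7 after an update. As shown in FIG. 8, when the broadcast progress reaches "空气良" ("air quality good"), both "空气良" in the broadcast text and the matching text target "空气质量62(空气良)" ("air quality 62 (good)") may change color.
As FIGS. 7 and 8 show, as the broadcast reaches different broadcast targets, those targets may be emphasized in turn to indicate the current broadcast progress to the user.
Referring to FIG. 9, the broadcast text includes: "崂山区今天多云,3到8摄氏度,今天平均温度比昨天的低2度,宜洗车,……" ("Laoshan District is cloudy today, 3 to 8 degrees Celsius; today's average temperature is 2 degrees lower than yesterday's; good for car washing; ..."). The display device may split the broadcast text into character groups, in order: "崂山区", "今天", "多云,3到8摄氏度", "今天", "平均温度", "比", "昨天的", "低2度", "宜洗车", and so on. Each character group serves as a broadcast point, and when the broadcast reaches a point, its color may be changed. In FIG. 9, "宜洗车" is shown in a color different from the other broadcast points, indicating that the current broadcast progress is at "宜洗车".
FIG. 9 also contains text targets that match the broadcast text. For example, "洗车较适宜" ("fairly suitable for car washing") in the middle of FIG. 9 matches the broadcast point "宜洗车", so "洗车较适宜" may also be set as a broadcast point; when the voice broadcast reaches "宜洗车", "洗车较适宜" may also change color. Further, the graphic target in FIG. 9, i.e., the car icon corresponding to "洗车较适宜", may be emphasized, for example by changing the background color of the car icon area so that it differs from that of the other icons.
As FIGS. 7-9 show, as the broadcast reaches different broadcast points, different image-text objects may be emphasized in turn to indicate the current broadcast progress to the user.
As can be seen from the above embodiments, after receiving the semantic parsing result, the display device provided by the present application may analyze it; when the result includes broadcast data, the display device may detect the broadcast targets corresponding to the broadcast data in the UI interface and emphasize each target when the broadcast reaches it. The user can thus see on the UI interface what is currently being broadcast; the broadcast text is linked with the UI interface, improving the user experience.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.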
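In some embodiments, the service-type data and the broadcast data may carry limited information. To enrich what the user can learn from a single voice interaction, the server may also proactively generate recommendation data and build the response data from the recommendation data, the service-type data, and the broadcast data.
In some embodiments, after obtaining the broadcast data, the server may extract keywords from it and obtain the corresponding recommendation data according to a preset recommendation rule. The preset recommendation rule may include a mapping between keywords and recommendation data; if the broadcast data contains several keywords, several sets of recommendation data may be obtained, and the server records for each set the keyword in the broadcast data to which it corresponds. Exemplary recommendation rules: if the keyword is an action, the recommendation data includes information about merchants that can perform the action; if the keyword is a person, the recommendation data includes encyclopedia information and/or works of that person. Because the recommendation data is derived from keywords in the broadcast data, related recommended content can later be shown in step with the voice broadcast, strengthening the connection between the broadcast content and the UI interface.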
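A minimal sketch of the keyword-based recommendation rule, assuming keyword extraction is a dictionary lookup and that each recommendation records the keyword's character offset so it can later be shown in step with the broadcast. The keyword table, categories, and result labels are hypothetical; "XX1" stands for a person, following the placeholder names used with FIGS. 11-13.

```python
from dataclasses import dataclass

# Hypothetical keyword dictionary: keyword -> category.
KEYWORD_CATEGORY = {"洗车": "action", "钓鱼": "action", "XX1": "person"}

# Recommendation rule: category -> kind of recommendation data.
RULES = {"action": "merchants", "person": "encyclopedia/works"}


@dataclass
class Recommendation:
    keyword: str  # keyword in the broadcast data this set belongs to
    offset: int   # character position of the keyword in the broadcast text
    kind: str     # kind of recommendation data to fetch


def recommend(broadcast_text: str) -> list[Recommendation]:
    """One recommendation set per keyword found, ordered by broadcast position."""
    recs = []
    for keyword, category in KEYWORD_CATEGORY.items():
        pos = broadcast_text.find(keyword)
        if pos >= 0:
            recs.append(Recommendation(keyword, pos, RULES[category]))
    return sorted(recs, key=lambda r: r.offset)
```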
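In some embodiments, recommendation data need not be derived from keywords in the broadcast data; it may be determined in other ways. For example, the service-type data may be analyzed to extract its text, images, and other content, and data associated with that content may be looked up on the network or in a pre-built database as recommendation data. The lookup may use existing techniques such as text search and image search, which are not detailed in the present application.
In some embodiments, the recommendation data may include text data and graphic data, whose content the display device may show on the voice interaction interface.
In some embodiments, to let the user learn more about the recommended content and to promote associated merchant information, media asset information, and the like, the recommendation data may include jump instructions for certain applications; a jump instruction may be configured to open, when triggered, a particular interface of a particular application. However, if the display device does not have that application installed, the jump fails; and even if the application is installed, if the hardware of the display device does not support the application's functions, those functions cannot be used after the jump, which may also be regarded as a jump failure. To avoid jump failures, the server may send recommendation data according to the terminal capability parameters of the display device. These parameters may include hardware capability parameters and software capability parameters: the software capability parameters may include the list of applications installed on the display device, and the hardware capability parameters include the list of hardware configured on the display device, such as a camera and a positioning module.
In some embodiments, the server may obtain the terminal capability parameters of the display device in advance, before the display device requests the response data for a voice command. After the request, the server calculates the recommendation data for the voice command text provided by the display device and checks whether the terminal capability parameters support jumping to the applications corresponding to the recommendation data: if all recommendation data is supported, all of it is sent; if none is supported, no recommendation data is sent; if only some is supported, only the supported recommendation data is sent. Alternatively, the server need not obtain the terminal capability parameters in advance; the display device may send them together with the text of the voice command.
In some embodiments, the server may either compute the supported recommendation data directly from the terminal capability parameters, or first determine candidate recommendation data and then filter it by those parameters, so that only recommendation data supported by the display device is sent.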
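Server-side filtering by terminal capability parameters can be sketched as a support check per recommendation item: the target application must be installed and any hardware it needs must be present. The class names, app identifiers, and hardware names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TerminalCapabilities:
    installed_apps: set[str]  # software capability: installed application list
    hardware: set[str]        # hardware capability: e.g. {"camera", "positioning"}


@dataclass
class RecommendationItem:
    title: str
    target_app: str  # application opened by the item's jump instruction
    required_hardware: set[str] = field(default_factory=set)


def is_supported(item: RecommendationItem, caps: TerminalCapabilities) -> bool:
    """A jump succeeds only if the app is installed and its hardware is present."""
    return (item.target_app in caps.installed_apps
            and item.required_hardware <= caps.hardware)


def filter_recommendations(items, caps):
    """Keep only supported items; all, some, or none of them may survive."""
    return [item for item in items if is_supported(item, caps)]
```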
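In some embodiments, the server may also skip the capability check, assume the display device supports all recommendation data, and directly send the recommendation data calculated from the broadcast data or service-type data. If the display device's own terminal capability parameters support a set of recommendation data, its content is shown on the interaction interface; otherwise, it is not shown on the voice interaction interface. This likewise avoids showing recommended content whose application jump is not supported.
In some embodiments, the server may not proactively calculate and send recommendation data, or may not find any recommendation data matching the terminal capability parameters of the display device. In that case, the response data sent by the server for the voice command text contains no recommendation data: for example, the response data contains a recommendation data identifier whose data is empty, or contains no recommendation data identifier at all. In either case, or in any other case indicating that the response data contains no recommendation data, the display device may send a recommendation data request to the server, so that the server sends recommendation data on request.
In some embodiments, if the data for the recommendation data identifier is empty, the server did calculate recommendation data but found none matching the display device's terminal capability parameters. This may be because the server had not stored those parameters in advance, or because the stored parameters do not support displaying the recommendation data. Therefore, when the data for the recommendation data identifier is empty, the display device may send a recommendation data request with its current terminal capability parameters attached, so that the server calculates recommendation data according to those parameters; this resolves the case where no supported recommendation data was found because the parameters were missing or outdated. After calculating the recommendation data, the server sends it to the display device. Even with up-to-date parameters, however, the recommendation data may still be unsupported; in that case, the data sent by the server may still contain no recommendation data, and the display device does not request recommendation data again.
In some embodiments, if the response data contains no recommendation data identifier, the server is configured by default not to calculate recommendation data proactively. In this case, the display device may send a recommendation data request with its current terminal capability parameters attached, so that the server calculates and sends recommendation data according to them. Alternatively, the display device may omit the terminal capability parameters from the request, in which case the server assumes the display device supports all recommendation data and sends the recommendation data calculated from the broadcast data or service-type data.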
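The display device's decision logic for the cases above can be sketched as a single function. The identifier name `recommendations` and the payload layout are hypothetical; treating a present-but-empty list as "computed, but nothing matched" and a missing key as "the server does not recommend by default" follows the description above.

```python
from typing import Optional

RECOMMEND_KEY = "recommendations"  # hypothetical recommendation data identifier


def plan_recommendation_request(response: dict, capabilities: dict,
                                already_requested: bool = False,
                                attach_when_absent: bool = True) -> Optional[dict]:
    """Return the request payload to send to the server, or None if no
    request should be sent."""
    if RECOMMEND_KEY in response:
        if response[RECOMMEND_KEY]:
            return None  # server already sent recommendation data
        if already_requested:
            return None  # even current capabilities gave nothing: stop asking
        # Identifier present but empty: capabilities were missing or stale.
        return {"request": "recommendations", "capabilities": capabilities}
    # Identifier absent: the server does not compute recommendations by default.
    payload = {"request": "recommendations"}
    if attach_when_absent:
        payload["capabilities"] = capabilities
    return payload
```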
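In some embodiments, the terminal capability parameters uploaded by the display device may include all of its hardware capability parameters and software capability parameters.
In some embodiments, the uploaded terminal capability parameters may instead be limited to those relevant to the answer data. The display device may store a mapping between keywords and terminal capability parameters in advance; after receiving the answer data for the voice command from the server, it may extract keywords from the answer data, look up the corresponding terminal capability parameters, and upload them to the server, so that the server calculates recommendation data according to them.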
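The keyword-based selection of capability parameters (see also claim 6) can be sketched as follows. The mapping table, app names, and hardware names are hypothetical; returning `None` models the case in which the device holds no capability parameters relevant to the answer data and therefore uploads nothing.

```python
from typing import Optional

# Hypothetical keyword -> capability mapping preloaded on the display device.
KEYWORD_CAPABILITIES = {
    "洗车": {"apps": ["group_buying_app", "map_app"], "hardware": ["positioning"]},
    "钓鱼": {"apps": ["map_app"], "hardware": ["positioning"]},
}


def capabilities_for_answer(answer_text: str, installed_apps: set,
                            hardware: set) -> Optional[dict]:
    """Collect the capability parameters that are relevant to keywords in the
    answer AND present on this device; None means nothing to upload."""
    apps, hw = set(), set()
    for keyword, needed in KEYWORD_CAPABILITIES.items():
        if keyword in answer_text:
            apps.update(a for a in needed["apps"] if a in installed_apps)
            hw.update(h for h in needed["hardware"] if h in hardware)
    if not apps and not hw:
        return None
    return {"apps": sorted(apps), "hardware": sorted(hw)}
```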
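By way of example, the process by which the display device responds according to the response data is described below.
It should be noted that if the server sends recommendation data proactively, the display device, upon receiving response data containing recommendation data, may generate a UI interface containing the recommended content from the response data, then control the display to show it and control the audio output device to play the text of the broadcast data. If the server does not send recommendation data proactively, obtaining it through a recommendation data request takes time. To keep the voice interaction responsive, the display device may be configured to first generate and show the UI interface from the service-type data and broadcast data and then update the UI interface once the recommendation data arrives; if the recommendation data still has not arrived when the voice playback ends, the UI interface need not be updated with it. Alternatively, the display device may be configured to generate and show the UI interface and start the broadcast only after obtaining the recommendation data.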
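The "show first, update if recommendations arrive in time" strategy can be sketched with asyncio by racing the recommendation request against audio playback. The callback names are hypothetical, and real playback and network calls are replaced by coroutine functions supplied by the caller.

```python
import asyncio


async def respond(render, play_audio, fetch_recommendations):
    """Show the UI and start the broadcast at once; merge recommendations into
    the UI only if they arrive before playback ends.

    render(recs): redraws the response interface (recs=None means none yet).
    play_audio, fetch_recommendations: coroutine functions.
    """
    render(None)  # answer content and broadcast text, no recommendation controls
    playback = asyncio.create_task(play_audio())
    fetch = asyncio.create_task(fetch_recommendations())
    done, _ = await asyncio.wait({playback, fetch},
                                 return_when=asyncio.FIRST_COMPLETED)
    if fetch in done and fetch.exception() is None and fetch.result():
        render(fetch.result())  # add the recommendation controls
    await playback
    if not fetch.done():
        fetch.cancel()  # playback is over: a late result is no longer needed
        await asyncio.gather(fetch, return_exceptions=True)
```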
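In some embodiments, after receiving from the server the response data for the voice command text, the display device may mark it as initial response data, which may include broadcast data, service-type data, and recommendation data. If the display device responded directly according to the initial response data, the voice broadcast and the UI interface might be independent of each other.
Therefore, in some embodiments, the display device may process the initial response data into final response data and respond according to it, so that the voice broadcast is linked with the UI interface. If the initial response data contains no broadcast data, the display device may respond directly according to it.
Taking initial response data that includes broadcast data as an example, the process of turning it into final response data is described below.
In some embodiments, if the initial response data includes broadcast data and UI interface data, and the UI interface data includes the answer data and the recommendation data for the voice command, recommendation controls may be configured from the recommendation data. When triggered, a recommendation control jumps to the interface of the corresponding application. The display device may configure the trigger condition as follows: while the recommendation control has focus, receiving a confirmation signal from the user, such as the signal sent by the OK key on the remote control, triggers the control.
In some embodiments, after the recommendation controls are configured, a response interface may be generated from the answer data and the recommendation data, containing image-text objects for the answer data and recommendation controls for the recommendation data.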
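The trigger condition (focus plus a confirmation signal) and the jump and return behavior of claims 7 and 8 can be sketched as a small key handler with a navigation stack. Key names such as `"OK"` and `"BACK"` and the page identifiers are hypothetical.

```python
class RecommendationControl:
    def __init__(self, title: str, jump_target: str):
        self.title = title
        self.jump_target = jump_target  # hypothetical id of the app page to open
        self.focused = False


class ResponseScreen:
    """Response interface with recommendation controls and a navigation stack."""

    def __init__(self, controls):
        self.controls = controls
        self.stack = ["response_interface"]

    def move_focus(self, index: int) -> None:
        for i, control in enumerate(self.controls):
            control.focused = (i == index)

    def on_key(self, key: str) -> str:
        """Handle a remote-control key and return the page now on screen."""
        if key == "OK":
            # Trigger only when a control has focus AND a confirm signal arrives.
            for control in self.controls:
                if control.focused:
                    self.stack.append(control.jump_target)
                    break
        elif key == "BACK" and len(self.stack) > 1:
            self.stack.pop()  # return instruction: back to the response interface
        return self.stack[-1]
```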
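In some embodiments, the display device may identify, among the image-text objects for the answer data on the response interface, the target image-text objects corresponding to the broadcast text. A target image-text object may be an object, such as text or a graphic, that is associated with the broadcast text and can be specially displayed; special display means a display that differs from the display before the broadcast.
In some embodiments, the target image-text objects may include target text. Because the answer content usually includes text identical to the broadcast text (for example, the response interface contains a dialog box whose text is the same as the broadcast text), the target text whose display effect follows the broadcast progress can be determined from the text shown on the response interface.
FIGS. 10-13 show schematic diagrams of voice interaction interfaces during content recommendation according to some embodiments.
In some embodiments, if the broadcast text has corresponding recommendation data, recommendation controls may be generated. For example, referring to FIG. 10, the recommendation data for the broadcast text "宜洗车" ("good for car washing") is nearby car wash shops, and the associated apps are a group-buying app and a map app; several recommendation controls may then be generated, where the left control is configured to jump to the car wash merchant page of the group-buying app and the right control to the car wash navigation page of the map app. When the broadcast reaches "宜洗车", the display effect of these recommendation controls may be updated, for example by moving them to a prominent position on the current interface, such as the center.
In some embodiments, if several recommendation controls are generated from the broadcast text, the broadcast points may be emphasized one by one in broadcast order. For example, the broadcast text includes "XX1, XX2, XX3, XX4, XX5", where XX1, XX3, XX4, and XX5 have recommendation controls and XX2 has none. Each of XX1, XX3, XX4, and XX5 corresponds to two recommendation controls: an avatar control, which is a graphic control, and an entry control, which is a graphic control showing an avatar and a text introduction.
Referring to FIG. 11, when the broadcast reaches XX1, the display effects of XX1's target text and recommendation controls are updated: XX1's target text may change color, the edge of XX1's avatar control may change color, and XX1's entry control may be shown below the avatar control and receive the focus of the display device, while the entry controls of XX3, XX4, and XX5 are not yet shown.
When the broadcast reaches XX2, the display effect of XX2's target text is updated, for example by changing its color. Since XX2 has no recommendation control, XX1's recommendation controls may keep the display effect they had while XX1 was broadcast, or revert to the display effect they had before XX1 was broadcast.
Referring to FIG. 12, when the broadcast reaches XX3, the display effects of XX3's target text and recommendation controls are updated: XX3's target text may change color, the edge of XX3's avatar control may change color, and XX3's entry control may be shown below the avatar control and receive focus, while the entry controls of XX1, XX4, and XX5 are not shown. Alternatively, XX1's entry control may be moved below XX3's entry control, giving it a lower display priority than XX3's.
Referring to FIG. 13, when the broadcast reaches XX5, the display effects of XX5's target text and recommendation controls are updated: XX5's target text may change color, the edge of XX5's avatar control may change color, and XX5's entry control may be shown below the avatar control and receive focus, while the entry controls of XX1, XX3, and XX4 are not shown. Alternatively, the entry controls of XX1, XX3, and XX4 may be stacked below XX5's entry control, with display priorities increasing in broadcast order: XX1 lowest, then XX3, then XX4, and XX5 highest.
As FIGS. 11-13 show, as the broadcast reaches different broadcast points, those points and their recommendation controls may be emphasized in turn, indicating the current broadcast progress and presenting related recommended content to the user.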
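The ordering of entry controls in FIGS. 11-13 can be sketched as follows, assuming the entry controls are simply listed by display priority (top first, with the top one holding focus). `stack_previous=True` models the alternative in which earlier entry controls are kept below the current one, and `False` models showing only the latest one; both the function and the option name are hypothetical.

```python
def entry_stack(points, has_controls, current, stack_previous=True):
    """Entry controls to show when the broadcast reaches `current`, listed
    top (highest display priority, holds focus) to bottom."""
    reached = points[: points.index(current) + 1]
    with_controls = [p for p in reached if has_controls.get(p)]
    if not with_controls:
        return []
    if not stack_previous:
        return [with_controls[-1]]        # only the latest entry control is shown
    return list(reversed(with_controls))  # latest on top, earlier ones below
```

At XX2, which has no controls, both modes keep XX1's entry control, matching the "keep the previous effect" option; the "revert" option would instead return an empty list.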
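As can be seen from the above embodiments, after receiving the response data for a voice command, the display device provided by the present application may analyze it; when the response data includes recommendation data, the display device may generate recommendation controls in the UI interface and emphasize each recommendation control when the broadcast reaches the corresponding broadcast point. Thus, after inputting a single voice command, the user can see on the UI interface recommended information related to the content currently being broadcast, without having to input another voice command to obtain it; this reduces interaction steps and improves the interaction experience. Further, because the broadcast points and recommendation controls in the UI interface are automatically emphasized as the voice broadcast progresses, the user can follow the broadcast progress on the UI interface without having to listen closely to its content, which reduces the user's attention burden and improves the user experience.
For ease of explanation, the above description has been given with reference to specific implementations. However, the above exemplary discussion is not intended to be exhaustive or to limit the implementations to the specific forms disclosed. Many modifications and variations are possible in light of the above teachings. The implementations were chosen and described to better explain the principles and their practical applications, so that those skilled in the art can make best use of the implementations and of various modified implementations suited to particular uses.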

Claims (20)

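  1. A display device, characterized by comprising:
    a display, configured to present a user interface;
    a controller connected to the display, the controller being configured to:
    receive a voice command input by a user;
    in response to the voice command, obtain response data corresponding to the voice command;
    when the response data includes audio data and display data, if the display data includes answer data for the voice command and recommendation data for the voice command, generate a response interface containing image-text objects corresponding to the answer data and a recommendation control corresponding to the recommendation data, wherein the recommendation control is configured to, when triggered, jump to a user interface corresponding to the recommendation control;
    control the display to show the response interface, and control an audio output device connected thereto to play audio corresponding to the audio data.
  2. The display device according to claim 1, wherein the controller is further configured to:
    when playback reaches reference text matching the recommendation control, update the display effect of the recommendation control so that it differs from the display effect before the reference text was played, wherein the reference text belongs to the text corresponding to the audio data.
  3. The display device according to claim 2, wherein updating the display effect of the recommendation control comprises:
    if the focus of the display device is not on the recommendation control before the reference text is played, moving the focus of the display device to the recommendation control when the reference text is played.
  4. The display device according to claim 2, wherein the controller is further configured to:
    when playback reaches the reference text matching the recommendation control, update the display effect of the image-text object corresponding to the recommendation control so that it differs from the display effect before the reference text was played.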
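  5. The display device according to claim 1, wherein the controller is further configured to:
    if the display data includes the answer data for the voice command but not the recommendation data for the voice command, obtain the recommendation data for the voice command from a server before playback of the audio corresponding to the audio data has finished;
    generate the recommendation control from the recommendation data, and additionally display the recommendation control on the response interface.
  6. The display device according to claim 5, wherein obtaining the recommendation data for the voice command from the server comprises:
    detecting whether terminal capability parameters corresponding to the answer data exist on the display device;
    if terminal capability parameters corresponding to the answer data exist on the display device, sending the terminal capability parameters to the server to obtain the recommendation data corresponding to the answer data, wherein the terminal capability parameters are used to implement functions corresponding to the recommendation data.
  7. The display device according to claim 1, wherein the controller is further configured to:
    receive a trigger instruction for the recommendation control;
    in response to the trigger instruction, control the display to jump from the response interface to the user interface corresponding to the recommendation control.
  8. The display device according to claim 7, wherein the controller is further configured to:
    receive a return instruction input by the user;
    in response to the return instruction, control the display to jump from the user interface corresponding to the recommendation control back to the response interface.
  9. A server, characterized in that the server is configured to:
    receive text converted from a voice command from a display device;
    obtain answer data corresponding to the text according to a preset service processing rule;
    obtain recommendation data corresponding to the answer data according to a preset recommendation rule;
    send the answer data and the recommendation data to the display device.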
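  10. A voice interaction method, characterized by comprising:
    receiving a voice command input by a user;
    in response to the voice command, obtaining response data corresponding to the voice command;
    when the response data includes audio data and display data, if the display data includes answer data for the voice command and recommendation data for the voice command, generating a response interface containing image-text objects corresponding to the answer data and a recommendation control corresponding to the recommendation data, wherein the recommendation control is configured to, when triggered, jump to a user interface corresponding to the recommendation control;
    controlling the display to show the response interface, and controlling an audio output device connected thereto to play audio corresponding to the audio data.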
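  11. A display device, characterized by comprising:
    a display, configured to present a user interface;
    a controller connected to the display, the controller being configured to:
    receive a voice command input by a user;
    in response to the voice command, obtain response data corresponding to the voice command;
    when the response data includes audio data and display data, generate a response interface from the display data, and match the text corresponding to the audio data against the image-text objects on the response interface to obtain matching reference text and target image-text, wherein the reference text belongs to the text corresponding to the audio data, and the target image-text belongs to the image-text objects;
    control the display to show the response interface, and control an audio output device connected thereto to play audio corresponding to the audio data;
    when playback reaches the reference text, update the display effect of the target image-text on the response interface so that it differs from the display effect before the reference text was played.
  12. The display device according to claim 11, wherein the controller is further configured to:
    after the reference text has been played, restore the display effect of the target image-text so that it is the same as the display effect before the reference text was played.
  13. The display device according to claim 11, wherein the controller is further configured to:
    after the reference text has been played, update the display effect of the target image-text again so that it differs both from the display effect before the reference text was played and from the display effect while the reference text was being played.
  14. The display device according to claim 11, wherein matching the text corresponding to the audio data against the image-text objects on the response interface to obtain matching reference text and target image-text comprises:
    splitting the text corresponding to the audio data into multiple character groups;
    matching the character groups against the text on the response interface; if the matching succeeds, determining the character group as the reference text and the matched text on the response interface as target text, wherein the image-text objects on the response interface include the text on the response interface, and the target image-text includes the target text.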
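  15. The display device according to claim 14, wherein the controller is further configured to:
    obtain the broadcast speed of the display device, and obtain the character distance between the start of the reference text and the start of the text corresponding to the audio data;
    calculate the time at which broadcasting of the reference text starts according to the broadcast speed and the character distance;
    calculate the time at which broadcasting of the reference text ends according to the broadcast speed and the character length of the reference text.
  16. The display device according to claim 11, wherein matching the text corresponding to the audio data against the image-text objects on the response interface to obtain matching reference text and target image-text comprises:
    splitting the text corresponding to the audio data into multiple character groups;
    matching the character groups against the graphics on the response interface; if the matching succeeds, determining the character group as the reference text and the matched graphic on the response interface as a target graphic, wherein the image-text objects on the response interface include the graphics on the response interface, and the target image-text includes the target graphic.
  17. The display device according to claim 11, wherein updating the display effect of the target image-text on the response interface so that it differs from the display effect before the reference text was played comprises:
    changing the color of the target image-text so that it differs from its color before the reference text was played.
  18. The display device according to claim 11, wherein updating the display effect of the target image-text on the response interface so that it differs from the display effect before the reference text was played comprises:
    adjusting the position of the target image-text on the response interface so that it differs from its position before the reference text was played.
  19. The display device according to claim 11, wherein updating the display effect of the target image-text on the response interface so that it differs from the display effect before the reference text was played comprises:
    if the focus of the display device is not on the target image-text before the reference text is played, moving the focus of the display device to the target image-text when the reference text is played.
  20. A voice interaction method, characterized by comprising:
    receiving a voice command input by a user;
    in response to the voice command, obtaining response data corresponding to the voice command;
    when the response data includes audio data and display data, generating a response interface from the display data, and matching the text corresponding to the audio data against the image-text objects on the response interface to obtain matching reference text and target image-text, wherein the reference text belongs to the text corresponding to the audio data, and the target image-text belongs to the image-text objects;
    controlling the display to show the response interface, and controlling an audio output device to play audio corresponding to the audio data;
    when playback reaches the reference text, updating the display effect of the target image-text on the response interface so that it differs from the display effect before the reference text was played.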
PCT/CN2021/134357 2021-03-18 2021-11-30 Display device and voice interaction method WO2022193735A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110291989.6A CN113066491A (zh) 2021-03-18 2021-03-18 Display device and voice interaction method
CN202110291989.6 2021-03-18
CN202110320136.0A CN113079400A (zh) 2021-03-25 2021-03-25 Display device, server and voice interaction method
CN202110320136.0 2021-03-25

Publications (1)

Publication Number Publication Date
WO2022193735A1 true WO2022193735A1 (zh) 2022-09-22

Family

ID=83321701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134357 WO2022193735A1 (zh) 2021-03-18 2021-11-30 Display device and voice interaction method

Country Status (1)

Country Link
WO (1) WO2022193735A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
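JP2018156013A (ja) * 2017-03-21 2018-10-04 Toshiba Corporation Information processing device, information processing method, and information processing program
CN111949782A (zh) * 2020-08-07 2020-11-17 Hisense Visual Technology Co., Ltd. Information recommendation method and service device
CN112349287A (zh) * 2020-10-30 2021-02-09 Shenzhen TCL New Technology Co., Ltd. Display device and control method therefor, slave device, and computer-readable storage medium
CN112492371A (zh) * 2020-11-18 2021-03-12 Hisense Visual Technology Co., Ltd. Display device
CN113066491A (zh) * 2021-03-18 2021-07-02 Hisense Visual Technology Co., Ltd. Display device and voice interaction method
CN113079400A (zh) * 2021-03-25 2021-07-06 Hisense Visual Technology Co., Ltd. Display device, server and voice interaction method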

Similar Documents

Publication Publication Date Title
US11620984B2 (en) Human-computer interaction method, and electronic device and storage medium thereof
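CN107370649B (zh) Household appliance control method and system, control terminal, and storage medium
TWI511125B (zh) Voice control method, mobile terminal device, and voice control system
CN112511882B (zh) Display device and voice wake-up method
US10999415B2 (en) Creating a cinematic storytelling experience using network-addressable devices
CN112165627B (zh) Information processing method, apparatus, storage medium, terminal, and system
CN113066491A (zh) Display device and voice interaction method
CN114627864A (zh) Display device and voice interaction method
CN111539216A (zh) Method, device, and system for disambiguating natural language content titles
WO2022193735A1 (zh) Display device and voice interaction method
CN117809649A (zh) Display device and semantic analysis method
CN115270808A (zh) Display device and semantic understanding method
CN113035194B (zh) Voice control method, display device, and server
US20220215833A1 (en) Method and device for converting spoken words to text form
CN113079400A (zh) Display device, server, and voice interaction method
CN110764618A (zh) Bionic interaction system and method, and corresponding generation system and method
CN113038217A (zh) Display device, server, and response phrase generation method
CN112380871A (zh) Semantic recognition method, device, and medium
CN114302248B (zh) Display device and multi-window voice broadcast method
WO2022237381A1 (zh) Method, terminal, and server for saving meeting records
CN113076427B (zh) Media asset search method, display device, and server
CN118283339A (zh) Display device, server, and voice command recognition method
CN117806587A (zh) Display device and multi-turn dialogue corpus generation method
CN117292692A (zh) Display device and audio recognition method
CN118331531A (zh) Display device and multi-category command response method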

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21931309

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21931309

Country of ref document: EP

Kind code of ref document: A1