CN112905149A - Processing method of voice instruction on display device, display device and server

Info

Publication number
CN112905149A
Authority
CN
China
Prior art keywords
voice
instruction
display
display device
voice service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110368889.9A
Other languages
Chinese (zh)
Inventor
鲁亚凯
朱赵龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vidaa Netherlands International Holdings BV
Vidaa USA Inc
Original Assignee
Vidaa Netherlands International Holdings BV
Vidaa USA Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vidaa Netherlands International Holdings BV, Vidaa USA Inc
Priority to CN202110368889.9A
Publication of CN112905149A
Priority to US18/278,537 (US20240053957A1)
Priority to EP22772086.9A (EP4309031A1)
Priority to PCT/US2022/020435 (WO2022197737A1)
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a method for processing a voice instruction on a display device, a display device, and a server. A user can control the display device to perform related operations by inputting voice content to the display device. To spare the display device from parsing and processing voice content separately for each voice service, the display device sends the voice content input by the user to the server, which parses it and converts it into a target voice instruction that meets a unified instruction standard. After receiving the target voice instruction, the display device only needs to process it in one uniform way. Because the server generates voice instructions that conform to a single instruction standard, the display device handles only one kind of voice instruction and needs no separate processing code for each kind, which reduces both the amount of code and the maintenance cost of the display device.

Description

Processing method of voice instruction on display device, display device and server
Technical Field
The present application relates to the field of display technologies, and in particular to a method for processing a voice instruction on a display device, a display device, and a server.
Background
Smart voice services are favored by a growing number of display device manufacturers as a signature feature and are used more and more frequently on display devices. However, some mainstream voice services on display devices are currently available only in certain countries. To cover more countries, a display device may contain several voice services at once, but each voice service uses a different voice instruction standard to control the same function. If several voice services are used, the display device has to process different types of voice instructions. Adding handling for each kind of voice instruction increases both the amount of code on the display device and its maintenance cost.
Disclosure of Invention
The application provides a method for processing a voice instruction on a display device, a display device, and a server, and aims to solve the problem that existing display devices must process different voice instructions separately for different voice services.
In a first aspect, the present application provides a display device comprising:
a display;
a controller configured to:
sending voice content input by a user to a server, so that the server parses the voice content using the voice service to which the voice content belongs and converts the parsed voice content into a target voice instruction that meets a unified instruction standard;
receiving the target voice instruction sent back by the server;
and, in response to the target voice instruction, controlling the display device to execute the related operation.
In some embodiments, the controller is further configured to:
when the display device is powered on for the first time, displaying a voice service selection page during the startup navigation process;
in response to a first selection instruction by which a user selects a target voice service on the voice service selection page, switching the voice service currently used by the display device to the target voice service;
and sending the voice content input by the user in the target voice service to the server together with the type of the target voice service.
In some embodiments, the controller is further configured to:
displaying a voice option on a launch page of the display device;
in response to a second selection instruction by which the user selects the voice option on the launch page, controlling the display to display a voice service selection page;
in response to a third selection instruction by which the user selects a target voice service on the voice service selection page, switching the voice service currently used by the display device to the target voice service;
and sending the voice content input by the user in the target voice service to the server together with the type of the target voice service.
In some embodiments, the controller is further configured to:
controlling the display to display a voice service selection page in response to a voice setting instruction input by a user through the control device;
in response to a fourth selection instruction by which the user selects a target voice service on the voice service selection page, switching the voice service currently used by the display device to the target voice service;
and sending the voice content input by the user in the target voice service to the server together with the type of the target voice service.
In some embodiments, the controller is further configured to:
controlling the display to display a settings page in response to a settings page selection instruction input by a user;
in response to a fifth selection instruction by which the user selects the voice service setting item on the settings page, controlling the display to display a voice service selection page;
in response to a sixth selection instruction by which the user selects a target voice service on the voice service selection page, switching the voice service currently used by the display device to the target voice service;
and sending the voice content input by the user in the target voice service to the server together with the type of the target voice service.
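Each of the selection flows above ends the same way: the voice content is sent to the server together with the type of the target voice service. A minimal sketch of such a request follows, assuming a JSON payload. The field names `voice_content` and `service_type` and the value `"service_a"` are hypothetical; the application does not define a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VoiceRequest:
    """Hypothetical request body; the field names are assumptions."""
    voice_content: str  # content the user input in the target voice service
    service_type: str   # type of the voice service currently in use


def build_request(voice_content: str, service_type: str) -> str:
    """Serialize the voice content together with the voice service type."""
    return json.dumps(asdict(VoiceRequest(voice_content, service_type)))


payload = build_request("open application 1", "service_a")
```

Carrying the service type in every request is what lets the server choose the matching instruction standard before parsing.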
In some embodiments, the controller is further configured to:
after receiving the second selection instruction, detecting whether the voice function on the display device is registered;
and, when the voice function is not registered, controlling the display to display a voice registration page so that the user can complete the voice registration operation.
In some embodiments, the controller is further configured to:
after receiving the voice setting instruction, detecting whether the voice function on the display device is registered;
and, when the voice function is not registered, controlling the display to display a voice registration page so that the user can complete the voice registration operation.
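The registration check in the two embodiments above can be sketched as a small decision function. This is an illustration only: the page identifiers are hypothetical, and showing the voice service selection page when the voice function is already registered is an assumption drawn from the preceding embodiments rather than something stated here.

```python
REGISTRATION_PAGE = "voice_registration_page"            # hypothetical page id
SERVICE_SELECTION_PAGE = "voice_service_selection_page"  # hypothetical page id


def page_to_display(voice_registered: bool) -> str:
    """Pick the page shown after the voice option or voice setting instruction."""
    if not voice_registered:
        # Not registered: let the user complete voice registration first.
        return REGISTRATION_PAGE
    # Assumed behaviour: a registered device goes on to the selection page.
    return SERVICE_SELECTION_PAGE
```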
In a second aspect, the present application further provides a server, including:
a controller configured to:
receiving voice content sent by a display device and the type of voice service currently used by the display device;
parsing the voice content according to the instruction standard corresponding to the voice service type to obtain parsed content, wherein different types of voice services have different instruction standards;
converting the parsed content into a target voice instruction that conforms to a target instruction standard, the target instruction standard being a unified instruction generation standard in the server;
and sending the target voice instruction back to the display device.
In a third aspect, the present application provides a method for processing a voice instruction on a display device, including:
sending voice content input by a user to a server, so that the server parses the voice content using the voice service to which the voice content belongs and converts the parsed voice content into a target voice instruction that meets a unified instruction standard;
receiving the target voice instruction sent back by the server;
and, in response to the target voice instruction, controlling the display device to execute the related operation.
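The three steps of this method can be sketched from the display device's side as follows. The transport is injected as a plain function so the sketch stays self-contained. `send_to_server`, `execute`, the stub server, and the placeholder instruction value `"UNIFIED_OPEN"` are all hypothetical names introduced for illustration.

```python
from typing import Callable, List


def process_voice_input(
    voice_content: str,
    send_to_server: Callable[[str], str],
    execute: Callable[[str], None],
) -> str:
    """Send the voice content, receive the target instruction, then act on it."""
    # Steps 1 and 2: send the content and receive the target voice instruction.
    target_instruction = send_to_server(voice_content)
    # Step 3: respond to the instruction by executing the related operation.
    execute(target_instruction)
    return target_instruction


def fake_server(voice_content: str) -> str:
    """Stub server for illustration: always answers with a placeholder."""
    return "UNIFIED_OPEN"


executed: List[str] = []
result = process_voice_input("open application 1", fake_server, executed.append)
```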
In a fourth aspect, the present application further provides another method for processing a voice instruction on a display device, including:
receiving voice content sent by a display device and the type of voice service currently used by the display device;
parsing the voice content according to the instruction standard corresponding to the voice service type to obtain parsed content, wherein different types of voice services have different instruction standards;
converting the parsed content into a target voice instruction that conforms to a target instruction standard, the target instruction standard being a unified instruction generation standard in the server;
and sending the target voice instruction back to the display device.
As can be seen from the foregoing, the present application provides a method for processing a voice instruction on a display device, a display device, and a server. A user can control the display device to perform related operations by inputting voice content to the display device. To spare the display device from parsing and processing voice content separately for each voice service, the display device sends the voice content input by the user to the server, which parses it and converts it into a target voice instruction that meets a unified instruction standard. After receiving the target voice instruction, the display device only needs to process it in one uniform way. Because the server generates voice instructions that conform to a single instruction standard, the display device handles only one kind of voice instruction and needs no separate processing code for each kind, which reduces both the amount of code and the maintenance cost of the display device.
Drawings
To explain the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 illustrates a schematic diagram of a usage scenario of a display device according to some embodiments;
FIG. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
FIG. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 illustrates a communication flow diagram for server 400 and display device 200 according to some embodiments;
FIG. 6 illustrates another communication flow diagram for server 400 and display device 200 according to some embodiments;
FIG. 7 illustrates a schematic diagram of a voice service selection page in accordance with some embodiments;
FIG. 8 illustrates a process flow diagram for display device 200 according to some embodiments;
FIG. 9 illustrates a schematic diagram of a launch page of display device 200 according to some embodiments;
FIG. 10 illustrates a second process flow diagram of the display device 200 according to some embodiments;
FIG. 11 illustrates a schematic diagram of a settings page according to some embodiments;
FIG. 12 illustrates a third process flow diagram for display device 200 according to some embodiments;
FIG. 13 illustrates a fourth process flow diagram of the display device 200 according to some embodiments;
FIG. 14 illustrates a second schematic diagram of a voice service selection page in accordance with some embodiments;
FIG. 15 illustrates a third schematic diagram of a voice service selection page, according to some embodiments.
Detailed Description
To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
FIG. 1 illustrates a schematic diagram of a usage scenario of a display device according to some embodiments. As shown in fig. 1, the display apparatus 200 is also in data communication with a server 400, and a user can operate the display apparatus 200 through the smart device 300 or the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes at least one of infrared protocol communication, Bluetooth protocol communication, and other short-distance communication methods, and the display device 200 is controlled in a wireless or wired manner. The user may input user instructions through at least one of keys on the remote controller, voice input, control panel input, and the like to control the display device 200.
In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.
In some embodiments, the smart device 300 may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the smart device 300 and the display device may also be used for communication of data.
In some embodiments, the display device 200 may also be controlled by means other than the control apparatus 100 and the smart device 300. For example, the user's voice instruction may be received directly by a module configured inside the display device 200 for obtaining voice instructions, or by a voice control apparatus provided outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display apparatus 200. The server 400 may be a cluster or a plurality of clusters, and may include one or more types of servers.
In some embodiments, software steps executed by one step execution agent may be migrated on demand to another step execution agent in data communication therewith for execution. Illustratively, software steps performed by the server may be migrated to be performed on a display device in data communication therewith, and vice versa, as desired.
Fig. 2 illustrates a block diagram of a hardware configuration of the control apparatus 100 according to some embodiments. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200.
In some embodiments, the communication interface 130 is used for external communication, and includes at least one of a Wi-Fi chip, a Bluetooth module, NFC, or an alternative module.
In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
Fig. 3 illustrates a hardware configuration block diagram of a display device 200 according to some embodiments.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments, the controller includes a central processor, a video processor, an audio processor, a graphic processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display. It receives image signals output by the controller and displays video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals via wired or wireless reception, and demodulates audio/video signals and data signals such as EPG data from a plurality of wireless or wired broadcast television signals.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus 200 may exchange control signals and data signals with the control device 100 or the server 400 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: High-Definition Multimedia Interface (HDMI), analog or data high-definition component input interface (Component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may also be a composite input/output interface formed by several of these interfaces.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may be located in a device external to the main device that contains the controller 250, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any selectable object, such as a hyperlink, an icon, or another actionable control. The operation related to the selected object is, for example, displaying the page, document, or image that a hyperlink points to, or running the program corresponding to an icon.
In some embodiments, the controller includes at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a Random Access Memory (RAM), a Read-Only Memory (ROM), a first interface to an nth interface for input/output, a communication bus, and the like.
The CPU executes the operating system and application instructions stored in the memory, and runs various applications, data, and content according to interactive instructions received from external input, so as to finally display and play various audio and video content. The CPU may include a plurality of processors, for example a main processor and one or more sub-processors.
In some embodiments, the graphics processor generates various graphics objects, such as at least one of an icon, an operation menu, and a graphic displayed for a user input instruction. The graphics processor includes an arithmetic unit, which performs operations on the interactive instructions input by the user and displays objects according to their display attributes. It also includes a renderer, which renders the objects obtained from the arithmetic unit so that they can be displayed on the display.
In some embodiments, the video processor is configured to receive an external video signal and perform at least one kind of video processing, such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, according to the standard codec protocol of the input signal, so as to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio and video data stream. The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like. The image synthesis module superimposes and mixes the GUI signal, which the graphics generator produces itself or in response to user input, with the scaled video image to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the video output signal received after frame rate conversion into a signal that conforms to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform at least one of noise reduction, digital-to-analog conversion, and amplification processing to obtain a sound signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a medium for interaction and information exchange between an application or operating system and a user; it converts between the internal form of information and a form that is acceptable to the user. A commonly used presentation form of the user interface is a Graphical User Interface (GUI), which refers to a user interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, or a control displayed on the display screen of the electronic device, where the control may include at least one visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, or a widget.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical buttons on the body of the display device, or the like).
As shown in fig. 4, the system of the display device is divided into three layers, i.e., an application layer, a middleware layer and a hardware layer from top to bottom.
The application layer mainly includes common applications on the television and an application framework. The common applications are mainly browser-based applications, such as HTML5 apps, and native apps.
The application framework is a complete program model that has all the basic functions required by standard application software, such as file access and data exchange, together with the interfaces for using these functions (toolbars, status bars, menus, dialog boxes).
Native apps may support online or offline use, message push, or local resource access.
The middleware layer comprises various television protocols, multimedia protocols, system components and other middleware. The middleware can use basic service (function) provided by system software to connect each part of an application system or different applications on a network, and can achieve the purposes of resource sharing and function sharing.
The hardware layer mainly comprises a HAL interface, hardware, and drivers. The HAL interface is a unified interface for interfacing with all television chips, and the specific logic is implemented by each chip. The drivers mainly include an audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, and pressure sensor), power driver, and so on.
Smart voice services are favored by a growing number of display device manufacturers as a signature feature and are used more and more frequently on display devices. However, some mainstream voice services on the display device 200 are currently available only in certain countries. To cover more countries, one display device 200 may contain a variety of voice services, but each voice service uses a different voice instruction standard to control the same function. If several voice services are used, the display device 200 has to process different types of voice instructions. Adding handling for each kind of voice instruction increases both the amount of code on the display apparatus 200 and its maintenance cost.
Based on the above, in order to reduce the amount of code for processing the voice instruction in the display apparatus 200 and reduce the maintenance cost, the embodiment of the present application provides a server 400 that can receive the voice content sent from the display apparatus 200, wherein the voice content is the content input to the display apparatus 200 by the user when using the display apparatus 200.
In general, different types of voice services have different instruction standards, and the voice contents performing the same operation in the different types of voice services may also be slightly different. The conventional server needs to parse specific voice content in the current voice service scenario to know what operation the user wants to perform. The conventional server then generates voice commands belonging to the current voice service for the voice content according to the command standard of the current voice service.
For example, suppose voice service A uses instruction standard A and voice service B uses instruction standard B. If the voice content input by the user on the display device 200 through voice service A is "application 1", the conventional server parses the voice content within voice service A, determines that the user wants to open application 1, and then generates the target voice instruction "turnOn" from the parsed content using instruction standard A. If the voice content input by the user on the display device 200 through voice service B is "open application 1", the conventional server parses the voice content within voice service B, determines that the user wants to open application 1, and then generates the target voice instruction "true" from the parsed content using instruction standard B. Here "turnOn" is the open instruction belonging to voice service A, and "true" is the open instruction belonging to voice service B.
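The conventional behaviour in this example can be sketched as a lookup keyed by voice service. The values "turnOn" and "true" come from the example; the table layout, the keys "A" and "B", and the function name are assumptions made for illustration.

```python
# Open instruction under each service's own instruction standard (from the example).
SERVICE_OPEN_INSTRUCTION = {
    "A": "turnOn",  # instruction standard A
    "B": "true",    # instruction standard B
}


def conventional_open_instruction(service_type: str) -> str:
    """Return the service-specific open instruction a conventional server emits."""
    return SERVICE_OPEN_INSTRUCTION[service_type]


instructions = [conventional_open_instruction(s) for s in ("A", "B")]
```

The same user intent yields two different instructions, so a display device served this way has to understand both.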
As shown in fig. 5, the server 400 in the embodiment of the present application also parses the voice content according to the voice service type currently used by the display device 200. However, instead of generating a voice instruction conforming to the current voice service, it converts the parsed content into a target voice instruction using a unified instruction standard. Thus, the display device 200 always receives one type of target voice instruction and only needs to parse that type, without being configured with code for parsing multiple types of voice instructions.
For example, the target voice instruction sent back to the display device 200 by the conventional server may be "turnOn" or "true". In this case, both code 1 and code 2 must be configured in the display device 200 to parse these target voice instructions. By contrast, if the server 400 in the embodiment of the present application converts both "turnOn" of voice service A and "true" of voice service B into "1" under the unified instruction standard, then "1" is sent back to the display device 200 as the target voice instruction. The display device 200 only needs to parse the instruction "1", so only one set of code needs to be configured, which effectively avoids code redundancy in the display device 200.
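The difference on the device side can be illustrated with a minimal Python sketch. Only the strings "turnOn", "true" and "1" come from the example above; the function names and the returned operation name "open" are hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch: only "turnOn", "true" and "1" come from the example;
# all function names and the "open" operation label are assumptions.

def conventional_device_parse(target_instruction: str) -> str:
    """Device-side parsing when the server returns service-specific instructions."""
    if target_instruction == "turnOn":   # "code 1": open instruction of voice service A
        return "open"
    if target_instruction == "true":     # "code 2": open instruction of voice service B
        return "open"
    raise ValueError(f"unknown instruction: {target_instruction!r}")


def unified_device_parse(target_instruction: str) -> str:
    """Device-side parsing when the server always uses the unified standard."""
    if target_instruction == "1":        # single code path for the unified open instruction
        return "open"
    raise ValueError(f"unknown instruction: {target_instruction!r}")
```

With the conventional approach, each additional voice service adds another branch on the device; with the unified standard, the device-side branch count does not depend on the number of voice services.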
In the above process, the controller of the server 400 may be configured to: receive the voice content from the display device 200 and the type of voice service currently used by the display device 200; parse the voice content in the context of the current voice service to determine the user's intention and obtain the parsed content; convert the parsed content into a target voice instruction conforming to the target instruction standard; and send the target voice instruction back to the display device 200, so that the display device 200 performs the corresponding operation as required by the target voice instruction.
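The server-side steps described here can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the parser functions, their simple string rules, the `PARSERS` and `UNIFIED_STANDARD` tables, and the service type labels "A" and "B" are all hypothetical; the source does not specify how parsing is actually implemented.

```python
from typing import Callable, Dict

# Hypothetical per-service parsers: each maps voice content to a parsed intent.
# The string rules below are assumptions made only so the sketch can run.
def parse_service_a(voice_content: str) -> str:
    return "open" if "application" in voice_content else "unknown"

def parse_service_b(voice_content: str) -> str:
    return "open" if voice_content.startswith("open ") else "unknown"

PARSERS: Dict[str, Callable[[str], str]] = {"A": parse_service_a, "B": parse_service_b}

# Hypothetical unified instruction standard: parsed intent -> target voice instruction.
UNIFIED_STANDARD: Dict[str, str] = {"open": "1"}

def handle_voice_request(voice_content: str, service_type: str) -> str:
    """Receive content and service type, parse, convert, and return the target instruction."""
    try:
        parser = PARSERS[service_type]          # choose the parser for the current voice service
    except KeyError:
        raise ValueError(f"unsupported voice service type: {service_type!r}") from None
    parsed = parser(voice_content)              # parsed content (the user's intention)
    if parsed not in UNIFIED_STANDARD:
        raise ValueError(f"no unified instruction for intent: {parsed!r}")
    return UNIFIED_STANDARD[parsed]             # instruction to send back to the display device
```

For instance, `handle_voice_request("application 1", "A")` and `handle_voice_request("open application 1", "B")` both yield the same unified instruction in this sketch, regardless of which voice service produced the request.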
As can be seen, in the embodiment of the present application, the server 400 may uniformly convert the voice instructions of multiple voice services into voice instructions conforming to one instruction standard, so that the display device 200 does not need to parse the voice instructions of different voice services. This reduces the amount of code for parsing voice instructions in the display device 200 and also reduces its maintenance cost.
In order to achieve the above object, a display device 200 is also provided in the embodiment of the present application. As shown in fig. 6, the display device 200 may provide different voice services to the user according to the user's needs, and then receive the voice content input by the user in the current scenario of the voice service. The display apparatus 200 transmits the voice content and the type of the current voice service to the server 400, and the server 400 generates a target voice command and transmits it back to the display apparatus 200. The display device 200 then performs the associated operations, etc., in accordance with the requirements of the target voice command.
In this process, the controller 250 of the display apparatus 200 may be configured to: transmit the voice content input by the user to the server 400; receive the target voice instruction sent back by the server 400; and control the display apparatus 200 to perform the relevant operation in response to the target voice instruction.
Before inputting a voice instruction to control the display device 200, the user may select which voice service to use on the display device 200 according to his or her needs, such as Google Assistant or Amazon Alexa. To make voice services more convenient and varied for the user, the voice service selection page may be displayed on the display device 200 in various ways: during the navigation process when the display device 200 is first powered on, from the start page after the display device 200 is powered on, from a setting page of the display device 200, or directly in response to a control instruction input by the user.
In some embodiments, the voice service selection page displayed during navigation when the display device 200 is first powered on is shown in fig. 7. The selectable voice services in fig. 7 are "Google Assistant" and "Alexa", and the page contains the prompt "Select your Voice Assistant" to prompt the user to make a selection. The user may select a target voice service on the voice service selection page by voice control or through the control apparatus 100, such as a remote controller.
In this process, as shown in fig. 8, the controller 250 of the display apparatus 200 is configured to: in the case where the display apparatus 200 is first powered on, a voice service selection page as shown in fig. 7 is displayed during the power-on navigation. The user selects a target voice service on the voice service selection page, i.e., inputs a first selection instruction to the display device 200, and the controller 250 switches the voice service on the display device 200 to the target voice service, e.g., "Google Assistant" or the like, in response to the first selection instruction.
After the display apparatus 200 sets the voice service, as shown in fig. 8, the user may input the voice content to the display apparatus 200 again in the current scenario of the target voice service, and then the controller 250 of the display apparatus 200 transmits the voice content together with the type of the target voice service to the server 400, and the server 400 continues the processing.
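The device-side behavior described above (switch to the selected target voice service, then send each voice content together with the service type) can be sketched as follows. The class name, method names, and the injected `send_to_server` callable are hypothetical; the mapping of "1" to an open operation reuses the earlier example and is an assumption rather than a specified protocol.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DisplayController:
    """Hypothetical stand-in for the controller 250 of the display device."""
    # Assumed transport: (voice_content, service_type) -> target voice instruction.
    send_to_server: Callable[[str, str], str]
    current_service: str = ""

    def select_service(self, target_service: str) -> None:
        # Responds to a selection instruction by switching the voice service in use.
        self.current_service = target_service

    def on_voice_content(self, voice_content: str) -> str:
        if not self.current_service:
            raise RuntimeError("no voice service selected")
        # The voice content is sent together with the type of the target voice service.
        target_instruction = self.send_to_server(voice_content, self.current_service)
        return self.execute(target_instruction)

    def execute(self, target_instruction: str) -> str:
        # Only the unified standard is parsed; "1" as "open" is an assumed example.
        if target_instruction == "1":
            return "open"
        return "ignored"
```

Injecting the transport as a callable keeps the sketch independent of any particular network protocol, which the source does not specify.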
In some embodiments, the first selection instruction may be input by pressing a direction key of the control apparatus 100 such as a remote controller or the like or may be directly input to the display device 200 by a voice instruction.
In addition, in some embodiments, the control apparatus 100 is also configured with a function key dedicated to receiving voice content. The user can input voice content to the control apparatus 100 by pressing this voice function key, and the control apparatus 100 then forwards the voice content to the display device 200. Alternatively, some display devices 200 may themselves be equipped with sound pickup means, so that the user may input voice content directly to the display device 200, where it is received by the sound pickup means.
In some embodiments, the start-up page of the display device 200 after power-on is shown in fig. 9, in which many resource options are shown, and some function options such as "search", "set", "user", "voice", etc. are displayed at the top of the start-up page in fig. 9. Wherein a microphone icon may be used as a "voice" option that the user may select on the launch page.
In this process, as shown in fig. 10, the controller 250 of the display apparatus 200 may be further configured to: the voice option as shown in fig. 9 is displayed on the start page of the display device 200. Then, the user selects a voice option on the launch page, i.e., inputs a second selection instruction to the display device 200, and the controller 250 controls the display 260 to display a voice service selection page in response to the second selection instruction. The user may continue to select the target voice service on the voice service selection page, i.e., continue to input the third selection instruction to the display apparatus 200, and the controller 250 continues to switch the voice service currently used by the display apparatus 200 to the target voice service selected by the user in response to the third selection instruction.
After the display apparatus 200 sets the voice service, as shown in fig. 10, the user may input the voice content to the display apparatus 200 again in the current scenario of the target voice service, and then the controller 250 of the display apparatus 200 transmits the voice content together with the type of the target voice service to the server 400, and the server 400 continues the processing.
In some embodiments, the second selection instruction and the third selection instruction may be input by pressing a key of the control apparatus 100 such as a remote controller or directly inputting a voice instruction to the display device 200.
In some embodiments, the setting page of the display device 200 is illustrated in fig. 11, taking a "System" page as an example, and includes several function setting items, such as "Time", "Timer Settings", "System PIN", "Parental Control", "Language and Location", "Voice Service", "Application Settings", "HDMI & CEC", and the like. The user may select a Voice Service setup item on the setup page, and then control the display device 200 to display a Voice Service selection page. And, when the focus frame is positioned on the Voice Service setting item, displaying a corresponding prompt content on the setting page, such as "Use your Voice to control the TV.
In this process, as shown in fig. 12, the controller 250 of the display apparatus 200 may be further configured to: the display 260 is controlled to display a setup page in response to a setup page selection instruction input by the user. Then, the user selects a voice service setting item on the setting page, i.e., inputs a fifth selection instruction to the display device 200, and the controller 250 controls the display 260 to display a voice service selection page in response to the fifth selection instruction. The user continues to select the target voice service on the voice service selection page, i.e., a sixth selection instruction is input to the display apparatus 200, and the controller 250 switches the voice service currently used by the display apparatus 200 to the target voice service in response to the sixth selection instruction.
After the display apparatus 200 sets the voice service, as shown in fig. 12, the user may input the voice content to the display apparatus 200 again in the current scenario of the target voice service, and then the controller 250 of the display apparatus 200 transmits the voice content together with the type of the target voice service to the server 400, and the processing is continued by the server 400.
In some embodiments, the user may input the page selection instruction, the fifth selection instruction, and the sixth selection instruction by pressing a key of the control apparatus 100 such as a remote controller or directly inputting the page selection instruction, the fifth selection instruction, and the sixth selection instruction to the display device 200 by voice.
In some embodiments, a voice service function key may be further configured on the control apparatus 100 associated with the display device 200, and the user may press the function key on the control apparatus 100 to control the display device 200 to directly display the voice service selection page.
In this process, as shown in fig. 13, the controller 250 of the display apparatus 200 may be further configured to: the display 260 is controlled to directly display a voice service selection page in response to a voice setting instruction input by the user through the control apparatus 100. The voice setting instruction is an instruction sent by a user pressing the voice service function key. Then, the user may continue to select the target voice service on the voice service selection page, i.e., a fourth selection instruction is input to the display apparatus 200, and the controller 250 switches the voice service currently used by the display apparatus 200 to the target voice service in response to the fourth selection instruction.
After the voice service is set in the display apparatus 200, as shown in fig. 13, the user may input the voice content to the display apparatus 200 again in the current scenario of the target voice service, and then the controller 250 of the display apparatus 200 transmits the voice content together with the type of the target voice service to the server 400, and the processing is continued by the server 400.
In some embodiments, the fourth selection instruction may be input by pressing a key of the control apparatus 100 such as a remote controller or may be directly input to the display device 200 by voice.
FIG. 14 illustrates a second schematic diagram of a voice service selection page, in accordance with some embodiments. As shown in FIG. 14, three voice services are shown on the voice service selection page: "Google Assistant", "amazon alexa", and "yandex". "yandex" is one of the most important web portals in Russia and is also an artificial intelligence assistant. The page also contains the prompt "Ask questions, search for your favorite movies, and more just by asking Google" to help the user make a selection. The voice service selection page is also provided with a confirmation option and the prompt "Set voice Assistant to Google Assistant".
FIG. 15 illustrates a third schematic diagram of a voice service selection page, according to some embodiments. As shown in FIG. 15, when the focus box is positioned on the "amazon Alexa" voice service option, prompts of example voice content that the user may try are displayed, such as "Alexa, what's the weather today?".
The voice service selection page described in the foregoing embodiments may all adopt the contents shown in fig. 14 or fig. 15, and after the user selects the target voice service on the voice service selection page, the display device 200 may switch the voice service used by the current system to the target voice service.
Generally, most display devices 200 require that their own voice functions be registered before using a voice service, so as to ensure that the display devices 200 reasonably and legally collect the voice content of users, and further ensure the legality and security of the voice service. After the voice function is registered, the voice service can enhance the accuracy of semantic recognition or content recognition according to the historical requirements of the user and the like, so that the user can use the voice service more conveniently.
Based on this, in some embodiments, the display device 200 further needs to detect whether the voice function is registered on the display device 200 before the user selects the voice service. For example, after the user selects the voice option on the above-mentioned start page, the display device 200 may detect whether the voice function is registered or activated on it. In this process, the controller 250 of the display apparatus 200 is configured to: after receiving the second selection instruction, detect whether the voice function on the display apparatus 200 is registered. In the case where the voice function is not registered, the controller 250 controls the display 260 to display a voice registration page so that the user can complete the voice registration operation. In the case where the voice function is registered, the controller 250 may control the display 260 to directly display the voice service selection page.
Alternatively, after the user presses the voice service function key of the control apparatus 100, the display device 200 may detect whether the voice function is registered or activated before displaying the voice service selection page. In this process, the controller 250 of the display apparatus 200 is further configured to: after receiving the voice setting instruction, detect whether the voice function on the display apparatus 200 is registered. In the case where the voice function is not registered, the controller 250 controls the display 260 to display a voice registration page so that the user can complete the voice registration operation. In the case where the voice function is registered, the controller 250 may control the display 260 to directly display the voice service selection page.
Still alternatively, after the user selects the voice service setting item on the setting page, the display device 200 may also detect whether the voice function is registered or activated on it before displaying the voice service selection page. In this process, the controller 250 of the display apparatus 200 is further configured to: after receiving the fifth selection instruction, detect whether the voice function on the display apparatus 200 is registered. In the case where the voice function is not registered, the controller 250 controls the display 260 to display a voice registration page so that the user can complete the voice registration operation. In the case where the voice function is registered, the controller 250 may control the display 260 to directly display the voice service selection page.
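The registration check shared by these three entry points can be summarized in a short sketch. The instruction labels and page identifiers below are hypothetical names introduced only for illustration; the source names the instructions and pages but does not define identifiers for them.

```python
# Hypothetical identifiers for the pages and entry instructions described above.
REGISTRATION_PAGE = "voice_registration_page"
SELECTION_PAGE = "voice_service_selection_page"

# Entry points that lead to the selection page: voice option on the start page,
# voice service function key, and voice service setting item on the setting page.
ENTRY_INSTRUCTIONS = {"second_selection", "voice_setting", "fifth_selection"}

def page_after_entry(instruction: str, voice_function_registered: bool) -> str:
    """Return the page to display after an entry instruction is received."""
    if instruction not in ENTRY_INSTRUCTIONS:
        raise ValueError(f"not a voice service entry instruction: {instruction!r}")
    # Unregistered voice function: show the registration page first.
    if not voice_function_registered:
        return REGISTRATION_PAGE
    # Registered voice function: go directly to the selection page.
    return SELECTION_PAGE
```

Because the check is identical for all three entry points, a single function of this shape would avoid repeating the same logic per entry point.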
It can be seen that the display device 200 in the above embodiments of the present application can not only provide the user with multiple entries to the voice service selection page, but can also parse and respond to the target voice instruction sent back by the server 400. Since the server 400 generates voice instructions conforming to one instruction standard, the display apparatus 200 only needs to process one type of voice instruction and does not need separate processing code for each of several types of voice instructions, thereby reducing the amount of code and the maintenance cost of the display apparatus 200.
To reduce redundancy in the code that processes voice instructions in the display device 200, the embodiment of the present application further provides a method for processing a voice instruction on a display device. The method may be applied to the server 400 described above and may specifically include the following steps: receiving the voice content sent by the display device 200 and the type of voice service currently used by the display device 200; parsing the voice content according to the instruction standard corresponding to the voice service type to obtain parsed content, wherein different types of voice services have different instruction standards; converting the parsed content into a target voice instruction conforming to a target instruction standard, wherein the target instruction standard is a set of instruction generation standards unified in the server 400; and sending the target voice instruction back to the display device 200.
Meanwhile, the embodiment of the present application also provides another method for processing a voice instruction on a display device. The method may be applied to the display device 200 described above and may specifically include the following steps: transmitting the voice content input by the user to the server 400, so that the server 400 parses the voice content using the voice service to which the voice content belongs and converts the parsed voice content into a target voice instruction conforming to the unified instruction standard; receiving the target voice instruction sent back by the server 400; and, in response to the target voice instruction, controlling the display apparatus 200 to perform the relevant operation.
As the processing method of the voice command on the display device in the embodiment of the present application may be applied to the display device 200 and the server 400 in the foregoing embodiments, respectively, other contents regarding the processing method of the voice command on the display device in the embodiment of the present application may refer to the contents regarding the foregoing embodiments of the display device 200 and the server 400, and are not described herein again.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, comprising:
a display;
a controller configured to:
send voice content input by a user to a server, so that the server parses the voice content using the voice service to which the voice content belongs and converts the parsed voice content into a target voice instruction conforming to a unified instruction standard;
receive the target voice instruction sent back by the server;
and in response to the target voice instruction, control the display device to perform a related operation.
2. The display device of claim 1, wherein the controller is further configured to:
in the case where the display device is powered on for the first time, display a voice service selection page during power-on navigation;
in response to a first selection instruction by which the user selects a target voice service on the voice service selection page, switch the voice service currently used by the display device to the target voice service;
and send the voice content input by the user in the target voice service to the server together with the type of the target voice service.
3. The display device of claim 1, wherein the controller is further configured to:
display a voice option on a start page of the display device;
in response to a second selection instruction by which the user selects the voice option on the start page, control the display to display a voice service selection page;
in response to a third selection instruction by which the user selects a target voice service on the voice service selection page, switch the voice service currently used by the display device to the target voice service;
and send the voice content input by the user in the target voice service to the server together with the type of the target voice service.
4. The display device of claim 1, wherein the controller is further configured to:
control the display to display a voice service selection page in response to a voice setting instruction input by a user through a control device;
in response to a fourth selection instruction by which the user selects a target voice service on the voice service selection page, switch the voice service currently used by the display device to the target voice service;
and send the voice content input by the user in the target voice service to the server together with the type of the target voice service.
5. The display device of claim 1, wherein the controller is further configured to:
control the display to display a setting page in response to a setting page selection instruction input by a user;
in response to a fifth selection instruction by which the user selects a voice service setting item on the setting page, control the display to display a voice service selection page;
in response to a sixth selection instruction by which the user selects a target voice service on the voice service selection page, switch the voice service currently used by the display device to the target voice service;
and send the voice content input by the user in the target voice service to the server together with the type of the target voice service.
6. The display device of claim 3, wherein the controller is further configured to:
after receiving the second selection instruction, detect whether a voice function on the display device is registered;
and in the case where the voice function is not registered, control the display to display a voice registration page so that the user can complete a voice registration operation.
7. The display device of claim 4, wherein the controller is further configured to:
after receiving the voice setting instruction, detect whether a voice function on the display device is registered;
and in the case where the voice function is not registered, control the display to display a voice registration page so that the user can complete a voice registration operation.
8. A server, comprising:
a controller configured to:
receive voice content sent by a display device and a voice service type currently used by the display device;
parse the voice content according to an instruction standard corresponding to the voice service type to obtain parsed content, wherein different types of voice services have different instruction standards;
convert the parsed content into a target voice instruction conforming to a target instruction standard, wherein the target instruction standard is a set of unified instruction generation standards in the server;
and send the target voice instruction back to the display device.
9. A method for processing voice commands on a display device, comprising:
sending voice content input by a user to a server, so that the server parses the voice content using the voice service to which the voice content belongs and converts the parsed voice content into a target voice instruction conforming to a unified instruction standard;
receiving the target voice instruction sent back by the server;
and in response to the target voice instruction, controlling the display device to perform a related operation.
10. A method for processing voice commands on a display device, comprising:
receiving voice content sent by a display device and a voice service type currently used by the display device;
parsing the voice content according to an instruction standard corresponding to the voice service type to obtain parsed content, wherein different types of voice services have different instruction standards;
converting the parsed content into a target voice instruction conforming to a target instruction standard, wherein the target instruction standard is a set of unified instruction generation standards in the server;
and sending the target voice instruction back to the display device.
CN202110368889.9A 2021-03-15 2021-04-06 Processing method of voice instruction on display device, display device and server Pending CN112905149A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202110368889.9A CN112905149A (en) 2021-04-06 2021-04-06 Processing method of voice instruction on display device, display device and server
US18/278,537 US20240053957A1 (en) 2021-03-15 2022-03-15 Display apparatus and display method
EP22772086.9A EP4309031A1 (en) 2021-03-15 2022-03-15 Display apparatus and display method
PCT/US2022/020435 WO2022197737A1 (en) 2021-03-15 2022-03-15 Display apparatus and display method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110368889.9A CN112905149A (en) 2021-04-06 2021-04-06 Processing method of voice instruction on display device, display device and server

Publications (1)

Publication Number Publication Date
CN112905149A true CN112905149A (en) 2021-06-04

Family

ID=76110024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110368889.9A Pending CN112905149A (en) 2021-03-15 2021-04-06 Processing method of voice instruction on display device, display device and server

Country Status (1)

Country Link
CN (1) CN112905149A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103474068A (en) * 2013-08-19 2013-12-25 安徽科大讯飞信息科技股份有限公司 Method, equipment and system for implementing voice command control
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN111526402A (en) * 2020-05-06 2020-08-11 海信电子科技(武汉)有限公司 Method for searching video resources through voice of multi-screen display equipment and display equipment
CN112565849A (en) * 2019-09-26 2021-03-26 深圳市茁壮网络股份有限公司 Voice control method of digital television, television control system and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581959A (en) * 2020-12-15 2021-03-30 四川虹美智能科技有限公司 Intelligent device control method and system and voice server
CN113490041A (en) * 2021-06-30 2021-10-08 Vidaa美国公司 Voice function switching method and display device
CN113490041B (en) * 2021-06-30 2023-05-05 Vidaa美国公司 Voice function switching method and display device
CN113608715A (en) * 2021-08-13 2021-11-05 Vidaa美国公司 Display device and voice service switching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination