CN116189674A - Voice control method and display device - Google Patents

Voice control method and display device

Info

Publication number
CN116189674A
Authority
CN
China
Prior art keywords
voice information
control module
voice
main control
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211597149.3A
Other languages
Chinese (zh)
Inventor
杨香斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202211597149.3A priority Critical patent/CN116189674A/en
Publication of CN116189674A publication Critical patent/CN116189674A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
        • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 — Speech recognition
                    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223 — Execution procedure of a spoken command
                    • G10L 15/26 — Speech to text systems
                    • G10L 15/28 — Constructional details of speech recognition systems
                        • G10L 15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H — ELECTRICITY
        • H04 — ELECTRIC COMMUNICATION TECHNIQUE
            • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/41 — Structure of client; Structure of client peripherals
                            • H04N 21/422 — Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
                                • H04N 21/42203 — Input-only peripherals: sound input device, e.g. microphone
                        • H04N 21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/431 — Generation of visual interfaces for content selection or interaction; Content or additional data rendering
                            • H04N 21/439 — Processing of audio elementary streams
                            • H04N 21/443 — OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
                                • H04N 21/4436 — Power management, e.g. shutting down unused components of the receiver
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D 30/00 — Reducing energy consumption in communication networks
                    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Abstract

The embodiments of the present application disclose a voice control method and a display device, which relate to the technical field of intelligent terminals and can improve the speed at which a user can make a display device in a standby state display media resources (such as videos). The display device includes: a display configured to display media assets; a main control module configured to stop working when the display device is in a standby state; and a power-on control module configured to: acquire voice information when the display device is in the standby state; and if the voice information includes first voice information, trigger the main control module to start and determine whether the voice information includes second voice information in addition to the first voice information, the first voice information indicating a wake-up word. The main control module is configured to: if the voice information includes the second voice information, control, according to the second voice information, the display to display the media resource indicated by the second voice information.

Description

Voice control method and display device
Technical Field
The application relates to the technical field of intelligent terminals, in particular to a voice control method and display equipment.
Background
Currently, display devices (e.g., mobile phones, televisions, etc.) are becoming more and more intelligent; for example, many display devices now provide a far-field voice function. With the far-field voice function, the user can control the display device by voice without any physical operation, for example powering on the display device and playing a video by voice.
When a display device is in a standby state, it responds only to the power-on voice that triggers its start-up. Therefore, if a user wants to make a standby display device play a video, the user must first utter the power-on voice and wait for the device to start. Only after the display device has started successfully can the user utter the display-instruction voice that triggers video playback, which the display device then receives and responds to by displaying the video. The user may thus need to speak multiple times to make the display device play a video. This undoubtedly increases the steps for controlling a standby display device to play a video, thereby reducing the speed of doing so.
Disclosure of Invention
The embodiments of the present application provide a voice control method and a display device, which can improve the speed at which a user can make a display device in a standby state display a media resource (such as a video).
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a voice control method is provided, applied to a display device. The display device includes a power-on control module that keeps working while the display device is in a standby state. The method includes the following steps: the power-on control module acquires voice information while the display device is in the standby state; if the voice information includes first voice information, the display device is powered on, and it is determined whether the voice information also includes second voice information in addition to the first voice information, the first voice information indicating a wake-up word; and if the voice information includes the second voice information, the display device displays, according to the second voice information, the media resource indicated by the second voice information.
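The first-aspect flow above can be sketched in a few lines of Python. This is an illustrative toy, not the patent's implementation: the wake word, function names, and return strings are all assumptions made here for clarity.

```python
# Hypothetical sketch of the first-aspect flow. The wake word and all names
# below are illustrative assumptions, not taken from the patent itself.

WAKE_WORD = "hi tv"  # the first voice information (wake-up word); assumed value

def handle_standby_voice(utterance: str) -> str:
    """Handle one utterance captured while the device is in standby.

    Returns a description of the action taken, so the flow is easy to trace.
    """
    words = utterance.lower().split()
    wake = WAKE_WORD.split()
    # Step 1: does the voice information include the first voice information?
    if words[: len(wake)] != wake:
        return "stay in standby"          # keep waiting for the next utterance
    # Step 2: wake word found -> power on immediately
    action = "power on"
    # Step 3: check the SAME utterance for second voice information (a command)
    command = " ".join(words[len(wake):])
    if command:
        action += f"; display media for: {command!r}"
    return action
```

For example, `handle_standby_voice("hi tv play the ball game")` both powers on and requests a media display from a single utterance, which is the key difference from the related art described in the Background.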
With reference to the first aspect, in a possible implementation manner, the method further includes: the power-on control module determines whether the voice information includes the first voice information; if not, the power-on control module determines whether the next piece of voice information includes the first voice information, where the next piece of voice information is acquired by the power-on control module after the current voice information.
With reference to the first aspect, in another possible implementation manner, the display device further includes a main control module that stops working while the display device is in the standby state. Powering on the display device and determining whether the voice information includes second voice information in addition to the first voice information includes: the power-on control module triggers the main control module to start, and the power-on control module determines whether the voice information includes the second voice information in addition to the first voice information.
With reference to the first aspect, in another possible implementation manner, displaying, by the display device according to the second voice information, the media resource indicated by the second voice information includes: the main control module acquires the second voice information from the power-on control module; the main control module sends the second voice information to a server; the main control module receives a voice recognition result of the second voice information returned by the server; and the main control module determines, according to the voice recognition result, the media resource indicated by the second voice information and displays it.
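The server round trip described above can be simulated with plain functions. The patent does not specify the transport or recognition protocol, so everything below, including the stand-in server function, is an assumption for illustration:

```python
# Toy simulation of delegating recognition of the second voice information to
# a server. All names are hypothetical; the real protocol is unspecified.

def server_recognize(voice: bytes) -> str:
    """Stand-in for the remote ASR service: audio bytes in, transcript out."""
    return voice.decode("utf-8")          # pretend the server transcribed it

def main_control_display(second_voice: bytes) -> str:
    """Main control module: send voice to the server, act on the result."""
    transcript = server_recognize(second_voice)   # send voice, receive result
    # Map the recognition result to a media resource and drive the display
    return f"display media asset for: {transcript}"
```

The design point is that wake-word spotting stays local in the power-on control module, while open-vocabulary command recognition, which is far harder, is offloaded to the server once the main control module is running.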
With reference to the first aspect, in another possible implementation manner, acquiring, by the main control module, the second voice information from the power-on control module includes: if the voice information includes the second voice information, the power-on control module generates a first identifier, where the first identifier indicates that the voice information includes the second voice information; the main control module sends a first query request to the power-on control module; the power-on control module sends the first identifier to the main control module in response to the first query request; the main control module sends a second query request to the power-on control module according to the first identifier; the power-on control module sends the second voice information to the main control module in response to the second query request; and the main control module receives the second voice information sent by the power-on control module.
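The identifier-and-query handshake above can be sketched as two cooperating objects. All class and method names below are hypothetical; the patent describes the message sequence, not an API:

```python
# Hypothetical sketch of the handshake between the power-on control module
# (always powered in standby) and the main control module (started on wake).
from typing import Optional

class PowerOnControl:
    """Stays powered in standby; buffers the command part of the utterance."""
    def __init__(self) -> None:
        self.flag = False                      # the "first identifier"
        self.second_voice: Optional[str] = None

    def on_utterance(self, has_wake_word: bool, second_voice: Optional[str]) -> None:
        if has_wake_word and second_voice:
            self.flag = True                   # record that a command exists
            self.second_voice = second_voice

    def query_flag(self) -> bool:              # answers the first query request
        return self.flag

    def query_second_voice(self) -> Optional[str]:  # answers the second query
        return self.second_voice

class MainControl:
    """Started by the wake word; fetches the buffered command after boot."""
    def fetch_command(self, poc: PowerOnControl) -> Optional[str]:
        if poc.query_flag():                   # first query: is there a command?
            return poc.query_second_voice()    # second query: fetch it
        return None                            # wake word only; just power on
```

The two-step query lets the main control module, which boots after the utterance was captured, first learn cheaply whether a command was buffered before transferring the voice data itself.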
In a second aspect, there is provided a display device including: the device comprises a display, a starting control module and a main control module.
The display is configured to display media assets. The main control module is configured to stop working when the display device is in a standby state. The power-on control module is configured to: acquire voice information when the display device is in the standby state; and if the voice information includes first voice information, trigger the main control module to start and determine whether the voice information includes second voice information in addition to the first voice information, the first voice information indicating a wake-up word. The main control module is configured to: if the voice information includes the second voice information, control, according to the second voice information, the display to display the media resource indicated by the second voice information.
With reference to the second aspect, in another possible implementation manner, the power-on control module is further configured to: determine whether the voice information includes the first voice information; and if not, determine whether the next piece of voice information includes the first voice information, where the next piece of voice information is acquired by the power-on control module after the current voice information.
With reference to the second aspect, in another possible implementation manner, the display device includes a communicator. The main control module is specifically configured to: acquire the second voice information from the power-on control module; control the communicator to send the second voice information to a server; control the communicator to receive the voice recognition result of the second voice information returned by the server; and determine, according to the voice recognition result, the media resource indicated by the second voice information and control the display to display it.
With reference to the second aspect, in another possible implementation manner, the power-on control module is further configured to: if the voice information includes the second voice information, generate a first identifier, where the first identifier indicates that the voice information includes the second voice information. The main control module is further configured to send a first query request to the power-on control module. The power-on control module is further configured to send the first identifier to the main control module in response to the first query request. The main control module is further configured to send a second query request to the power-on control module according to the first identifier. The power-on control module is further configured to send the second voice information to the main control module in response to the second query request.
In a third aspect, a display device is provided that has the functionality to implement the method of the first aspect above. The functionality may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions described above.
In a fourth aspect, there is provided a display device including: a processor and a memory; the memory is configured to store computer instructions that, when executed by the display device, cause the display device to perform the speech control method of any one of the first aspects described above.
In a fifth aspect, there is provided a computer readable storage medium having instructions stored therein that, when run on a display device, cause the display device to perform the speech control method of any one of the first aspects above.
In a sixth aspect, there is provided a computer program product comprising computer instructions which, when run on a display device, enable the display device to perform the speech control method of any one of the first aspects above.
In a seventh aspect, an apparatus is provided (for example, the apparatus may be a chip system) that includes a processor for supporting a display device in implementing the functions referred to in the first aspect above. In one possible design, the apparatus further includes a memory for storing the program instructions and data necessary for the display device. When the apparatus is a chip system, it may consist of a chip, or may include the chip and other discrete devices.
An embodiment of the present application provides a voice control method in which, after the display device acquires voice information in the standby state, it first determines whether the voice information includes a wake-up word. If it does, the display device powers on and further determines whether the same voice information also includes second voice information indicating a media asset. If it does, the display device then displays that media asset. That is, with a single utterance while the display device is in the standby state, the user can both power on the display device and control it to display a media resource. The user does not need to wait for the display device to finish starting up before issuing a second command, which simplifies the steps for controlling a standby display device to display media resources and can thereby improve the speed of that control.
Drawings
FIG. 1 is a flow chart of a voice control method provided in the related art;
fig. 2 is a schematic view of a scenario of a voice control method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a display device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a display device according to a second embodiment of the present application;
fig. 5 is a flowchart of a voice control method according to an embodiment of the present application;
fig. 6 is a second flowchart of a voice control method according to an embodiment of the present application;
fig. 7 is a schematic diagram of a user voice control television playing video according to an embodiment of the present application;
fig. 8 is a schematic hardware diagram of a display device according to an embodiment of the present application;
fig. 9 is a flowchart III of a voice control method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram III of a display device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Detailed Description
For clarity, the following describes exemplary implementations of the present application completely with reference to the accompanying drawings, in which those exemplary implementations are illustrated. Obviously, the described implementations are only some, not all, of the examples of the present application.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first," "second," "third," and the like in the description, the claims, and the figures above are used to distinguish between similar objects and do not necessarily limit a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements but may include other elements not expressly listed or inherent to such product or apparatus.
Currently, television sets provide more and more functionality, such as the far-field voice function. Through the far-field voice function, a user can control the television to perform a corresponding operation by speaking within a certain range of the television, without operating the television's control device; for example, the user can power on the television and play a video by voice.
When the television is powered but in a standby state, essentially all modules in the television (e.g., the controller and the communication module) stop working. Therefore, to ensure that a user can still power on a standby television by voice, a power-on control module is provided in the television; this module keeps working normally in the standby state. The power-on control module can acquire the user's power-on voice that triggers start-up and trigger the television to start accordingly. In this way, a television in the standby state is started under the user's voice control.
However, besides powering on and off, the television provides many other functions: displaying a certain application, showing a certain channel, playing a ball-game video, playing a singing video, and so on. The variety of these other television functions is very large, and the content of the functional voice with which a user triggers them is correspondingly rich and varied, which makes such functional voice much harder to recognize. The power-on control module in the television has limited processing capability: it can only recognize the power-on voice and cannot recognize functional voice with such rich content. The controller in the television therefore needs to control the communication module to communicate with a server and request the server to recognize the functional voice.
However, in the standby state both the controller and the communication module in the television stop working, so the television can only be powered on by voice through the still-running power-on control module. If a user wants a standby television to perform some other function, such as playing a ball-game video, the user must first utter the power-on voice, wait for the television to start, and only after a successful start utter the display-instruction voice that triggers playing the video; the television then receives and responds to that voice and displays the ball-game video. The user thus needs to speak multiple times, which undoubtedly increases the steps for controlling a standby television to play the video and reduces the speed of doing so.
Illustratively, a voice control method provided by the related art, as shown in fig. 1, includes the following steps: S11, in the standby state, the television acquires first voice information sent by the user; S12, the television judges whether the first voice information includes a wake-up word;
S13, if the first voice information includes the wake-up word, the television starts up and enters the powered-on state; S14, if the first voice information does not include the wake-up word, the television remains in the standby state; S15, in the powered-on state, the television acquires second voice information sent by the user; S16, the television judges whether the second voice information includes the wake-up word; S17, if the second voice information includes the wake-up word, the television continues to recognize the second voice information and determines the media resource it indicates; S18, if the second voice information does not include the wake-up word, the television remains in the powered-on state; and S19, the television displays the media resource indicated by the second voice information.
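For contrast with the proposed method, the related-art steps S11-S19 can be sketched as a loop over utterances in which the wake word must appear in both utterances before any media is displayed. The wake word and all names below are illustrative assumptions:

```python
# Hedged sketch of the related-art flow (steps S11-S19): the command only
# takes effect in a SECOND wake-word utterance, after the set has booted.

WAKE = "hi tv"  # assumed wake word for illustration

def related_art_session(utterances):
    """Process a sequence of utterances; return the actions taken."""
    actions = []
    powered_on = False
    for u in utterances:
        has_wake = u.lower().startswith(WAKE)
        if not powered_on:
            if has_wake:
                powered_on = True              # S13: power on
                actions.append("power on")
            # S14: otherwise remain in standby
        else:
            if has_wake:
                cmd = u[len(WAKE):].strip()    # S17: recognize the rest
                actions.append(f"display: {cmd}")
            # S18: otherwise remain powered on
    return actions
```

Note that a single combined utterance like `"hi tv play news"` only powers the set on in this flow; the command portion is lost, which is exactly the inefficiency the embodiments of the present application address.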
It will be appreciated that the user needs to utter two separate pieces of voice, both of which must include the wake-up word, and after uttering the first the user must wait for the television to start before uttering the second. This undoubtedly increases the steps for controlling a standby television to play a media asset, thereby reducing the speed of doing so.
In view of the foregoing, an embodiment of the present application provides a voice control method in which, after the display device acquires voice information in the standby state, it determines whether the voice information includes a wake-up word. If so, the display device powers on and further determines whether the same voice information includes second voice information indicating a media asset. If it does, the display device then displays that media asset. That is, with a single utterance while the display device is in the standby state, the user can both power on the display device and control it to display a media resource, without waiting for start-up before issuing a second command. This simplifies the steps for controlling a standby display device to display media resources and can improve the speed of doing so.
The following describes a voice control method provided in the embodiment of the present application.
The display device provided in the embodiments of the present application may take various forms, for example a tablet computer, a PC, a television, a smart television, a laser projection device, an electronic table, or another display device with a display; the specific form of the display device is not limited here. In the embodiments of the present application, the display device is described by taking a television set as an example.
Fig. 2 is a schematic diagram of a scenario in which a user controls a display device according to an embodiment. As shown in fig. 2, a user may operate the television 200 through the control apparatus 100 or the smart device 300. Alternatively, the user may also speak a voice within a certain range from the television 200, through which the television 200 is controlled.
In some embodiments, the control device 100 may be a remote controller. Communication between the remote controller and the television 200 includes infrared protocol communication and other short-range communication modes, and the television 200 is controlled wirelessly or by wire. The user may control the television 200 by inputting user instructions through keys on the remote control, voice input, control panel input, and the like.
In some embodiments, the user may also control the television 200 using a smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.). For example, the television 200 is controlled using an application running on the smart device 300.
In some embodiments, the television 200 may not receive instructions using the smart device 300 or the control apparatus 100 described above, but may receive instructions of a user through touch or gesture, or the like.
In some embodiments, the television 200 may also be controlled in ways other than through the control apparatus 100 and the smart device 300; for example, the user's voice may be received directly through a voice acquisition module (e.g., a microphone) configured inside the television 200, or through a voice acquisition device configured outside it. The method provided in the embodiments of the present application is described below taking as an example receiving the user's voice through the internal voice acquisition module of the television 200.
In some embodiments, the television 200 is also in data communication with a server 400. The television 200 may communicate via a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various content and interactions to the television 200; it may be one cluster or multiple clusters, and may include one or more types of servers.
Fig. 3 is a schematic structural diagram of a television according to an embodiment of the present application.
As shown in fig. 3, the television 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280.
In some embodiments, the controller 250 includes: at least one of a CPU, a video processor, an audio processor, a graphic processor (Graphics Processing Unit, GPU), a random access Memory (Random Access Memory, RAM), a Read-Only Memory (ROM), a first interface to an nth interface for input/output, a communication Bus (Bus), and the like.
The modem 210 receives broadcast television signals in a wired or wireless manner and demodulates audio and video signals from a plurality of wireless or wired broadcast television signals. The detector 230 is used to collect signals from the external environment or from interaction with the outside. The controller 250 and the modem 210 may be located in separate devices; that is, the modem 210 may be located in a device external to the main device housing the controller 250, such as an external set-top box. The display 260 may be at least one of a liquid crystal display, an organic light-emitting diode (OLED) display, a touch display, and a projection display, and may also be a projection device with a projection screen.
In some embodiments, the controller 250 controls the overall operation of the television 200 and responds to user operations through various software control programs stored in the memory. The user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the GUI. Alternatively, the user may input a user command via a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
In some embodiments, the sound collector may be a microphone, which converts sound signals into electrical signals. During voice interaction, the user can speak near the microphone to input a sound signal into the microphone. The display device 200 may be provided with at least one microphone. In other embodiments, the display device 200 may be provided with two microphones, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the display device 200 may also be provided with three, four, or more microphones to implement sound signal collection, noise reduction, sound source identification, directional recording functions, and the like.
The microphone may be built into the television 200, or connected to the television 200 in a wired or wireless manner. For example, the microphone may be provided at the lower edge of the display 260 of the television 200. Of course, the position of the microphone on the television 200 is not limited in the embodiments of the present application. Alternatively, the television 200 may not include a microphone, i.e., no microphone is provided in the television 200. The television 200 may be coupled to an external microphone through an interface such as the USB interface 130. The external microphone may be secured to the television 200 by an external fastener, such as a camera mount with a clip. For example, the external microphone may be secured at an edge of the display 260, such as the upper edge, by an external mount.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include at least one of a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some examples, taking the operating system of the television 200 as an Android system as an example, as shown in fig. 4, the television 200 may be logically divided into an application layer (Applications) 21, a kernel layer 22 and a hardware layer 23.
The hardware layers may include the controller 250, the communicator 220, the detector 230, the display 260, and the like shown in fig. 3, as shown in fig. 4. The application layer 21 includes one or more applications. The application may be a system application or a third party application. For example, the application layer 21 includes far-field speech applications, which may provide far-field speech functionality. The far-field voice application may be specifically configured to obtain voice information sent by a user when the television 200 is in a power-on state, and request the server 400 to identify the obtained voice information; and then controlling the television 200 to display the media resources indicated by the identification result according to the identification result.
The kernel layer 22 acts as software middleware between the hardware layer and the application layer 21 for managing and controlling hardware and software resources.
The server 400 includes a communication control module 201 and a voice recognition module 202. The communication control module 201 is configured to establish a communication connection with the television 200. Such as: the far-field voice application in the television 200 enables a communication connection with the communication control module 201 of the server 400 by invoking the communicator 220.
In some examples, the kernel layer 22 includes a detector driver for sending the voice information collected by the sound collector in the detector 230 to the far-field voice application. When the television 200 is in the on state, the far-field voice application and the communicator 220 in the television 200 are started, and the communicator 220 establishes a communication connection with the communication control module 201 in the server 400. After the sound collector in the detector 230 collects voice information input by the user, the detector driver sends the voice information to the far-field voice application, and the far-field voice application sends the voice information to the voice recognition module 202 of the server 400. After receiving the voice information sent by the television 200, the voice recognition module 202 determines the voice text corresponding to the voice information and sends the voice text to the far-field voice application of the television 200. Upon receiving the voice text sent by the server 400, the far-field voice application controls the display 260 to display the media asset indicated by the voice text.
The voice information referred to in the present application is data authorized by the user or fully authorized by all parties.
The methods in the following embodiments may be implemented in a display device having the above-described hardware structure.
The following describes the voice control method provided in the embodiment of the present application in detail with reference to fig. 5. As shown in fig. 5, continuing to illustrate the television 200 as a display device, the voice control method provided in the embodiment of the present application may include the following S501-S503.
S501, in a standby state, the television 200 acquires voice information through the power-on control module.
The television 200 may include a power-on control module and a main control module. In the standby state of the television 200, the power-on control module in the television 200 is still in a working state, and a plurality of modules such as the main control module and the communication module included in the television 200 stop working. At this time, if the user sends out voice within a certain range from the television 200, the television 200 may obtain the voice information sent by the user through the power-on control module, and cache the voice information.
The power-on control module in the television 200 is used for acquiring voice information and recognizing the voice information. The power-on control module is mainly used for recognizing the voice information that triggers the television to power on. The power-on control module may also be referred to as a far-field voice module.
Illustratively, the power-on control module in the television 200 may be implemented by digital signal processing (Digital Signal Processing, DSP). The main control module in the television 200 may be implemented by a System On Chip (SOC).
In the standby state of the television 200, a plurality of modules such as the main control module, the communication module, and the display stop working, which reduces the power consumption of the television 200. Therefore, this standby state may also be referred to as a low power consumption state. In addition, because the display also stops working when the television 200 is in the standby state, the display shows a black screen.
In some embodiments, while the television 200 is in the standby state, the power-on control module may continuously acquire voice information. After the startup control module acquires the voice information, it can recognize the voice information to determine whether the voice information includes the first voice information. If the voice information includes the first voice information, this indicates that the user has successfully woken up the television, and the television is turned on, i.e., S502 is executed. If the voice information does not include the first voice information, the television 200 may remain in the standby state, and the power-on control module may recognize the next voice information acquired after this voice information to determine whether the next voice information includes the first voice information.
The first voice information is used for indicating a wake-up word. The wake word may be a specified word. For example, wake words may include "hello".
It should be noted that, the processing procedure of the next voice information by the startup control module may refer to the processing procedure of the voice information by the startup control module.
Illustratively, in the standby state, the detector driver and the detector 230 of the television 200 are still in operation. If the user utters a voice, the sound collector in the detector 230 collects the voice information uttered by the user, and the detector driver sends the voice information collected by the sound collector to the startup control module. After acquiring the voice information from the sound collector, the startup control module judges whether the voice information includes a wake-up word.
For example, the power-on control module may acquire the voice information of the first duration each time, and process (e.g., recognize) the acquired voice information of the first duration. After the startup control module acquires the voice information of one first duration, the next voice information of the first duration can be continuously acquired.
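The chunked acquisition and wake-word check described above can be sketched as follows. This is an illustrative Python model only: the wake word "hello" is taken from the example in the text, while the function names and the assumption that each first-duration chunk has already been transcribed to text are simplifications introduced here, not part of the embodiment.

```python
# Hypothetical sketch of the power-on control module's standby loop (S501):
# audio is consumed in fixed first-duration chunks, and each chunk is
# checked for the wake word before the next chunk is acquired.

WAKE_WORD = "hello"  # the wake word indicated by the first voice information


def contains_wake_word(transcript: str) -> bool:
    """Return True if the recognized text of a chunk includes the wake word."""
    return WAKE_WORD in transcript.lower()


def standby_loop(chunks):
    """Process successive first-duration chunks; return the chunk that
    triggers power-on (S502), or None if the set stays in standby."""
    for transcript in chunks:
        if contains_wake_word(transcript):
            return transcript  # trigger the main control module to start
        # otherwise remain in standby and examine the next chunk
    return None
```

For instance, `standby_loop(["background noise", "hello, I want to exercise"])` would return the second chunk, mirroring how the power-on control module keeps checking successive chunks until one contains the wake-up word.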
S502, if the voice information comprises the first voice information, the television 200 is started, and whether the voice information comprises the second voice information except the first voice information is determined; the first voice information is used for indicating a wake-up word.
If the startup control module determines that the voice information includes the first voice information (i.e., includes a wake-up word), indicating that voice wake-up is successful, the startup control module may trigger the main control module in the television 200 to start, and the startup control module determines whether the voice information includes the second voice information in addition to the first voice information. Further, after the main control module is started, the main control module can control the display to display an interface (such as the main interface) of the television.
In some embodiments, if the voice message includes the first voice message, other modules in the television 200 that stop operating, such as a communication module, may be started in addition to the start-up control module triggering the start-up of the main control module. For example, the power-on control module triggers the start-up of other modules in the television 200 that are not active, or the main control module triggers the start-up of other modules in the television 200 that are not active.
In some embodiments, after the power-on control module in the television 200 obtains the voice information, the voice information may be cached. And then, after determining that the voice information comprises the first voice information, the starting-up control module determines whether the voice information comprises other voice information except the first voice information, namely, the second voice information. If the voice information includes the second voice information, the television set 200 may recognize the second voice information, i.e., perform S503. If the voice information does not include the second voice information, the startup control module may delete the voice information.
Illustratively, the power-on control module may employ voice activity detection (Voice Activity Detection, VAD) to determine whether the voice information includes second voice information in addition to the first voice information.
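As a rough illustration of the VAD step, the sketch below marks the audio after the wake word as second voice information only when its energy exceeds a threshold. Real VAD implementations (such as WebRTC VAD) are considerably more elaborate; the energy measure, threshold value, and function name here are assumptions made for the example, not the embodiment's actual algorithm.

```python
# Minimal energy-based stand-in for VAD: decide whether the samples after
# the wake word carry speech (i.e., second voice information exists).

def has_second_voice(samples, wake_end_index, threshold=0.01):
    """Return True if the audio after the wake word carries speech energy."""
    tail = samples[wake_end_index:]
    if not tail:
        return False  # nothing was spoken after the wake word
    energy = sum(s * s for s in tail) / len(tail)  # mean-square energy
    return energy > threshold
```

Under this sketch, a silent tail after "hello" yields False and the voice information is treated as wake-word-only, matching the branch in which the startup control module deletes the buffered voice information.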
S503, if the voice information includes the second voice information, the television 200 displays the media resource indicated by the second voice information according to the second voice information.
If the startup control module determines that the voice information includes the second voice information, the startup control module may send the second voice information to the main control module in the television 200. The main control module may then display the media asset indicated by the second voice information based on the second voice information. For example, the second voice information may include a live video of the ball game, and the main control module may control the display to play the video of the ball game.
Illustratively, the main control module displaying the media resource indicated by the second voice information according to the second voice information may include the steps of: firstly, recognizing the second voice information to obtain a voice text (which can be called as a voice recognition result) corresponding to the second voice information; and controlling a display to display the media resources indicated by the voice text (namely, the media resources indicated by the second voice information) according to the voice text.
The main control module may transmit the second voice information to the server through the communication module, for example. The server identifies the second voice information to obtain a voice text corresponding to the second voice information, and sends the voice text to the main control module.
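The recognize-then-display flow of S503 can be modeled as below. The recognizer stands in for the server-side voice recognition module and the catalog stands in for the media-resource lookup; both are injected parameters invented for this sketch so the flow can be shown without a real server or display.

```python
# Hedged sketch of S503: the main control module forwards the second voice
# information for recognition, receives the voice text, and resolves the
# media asset that the text indicates.

def display_media(second_voice, recognize, catalog):
    """Return the media asset indicated by the second voice information."""
    voice_text = recognize(second_voice)  # e.g., server-side recognition
    asset = catalog.get(voice_text)       # media resource the text indicates
    return asset                          # would then be sent to the display
```

For example, with a catalog mapping "exercise" to an exercise video and a recognizer that maps "I want to exercise" to the text "exercise", the function returns the exercise video, as in the fig. 7 example.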
In some embodiments, the television 200 may locally store the media asset indicated by the second voice information, or the television 200 may obtain the media asset indicated by the second voice information from the server.
In some embodiments, the television 200 may be installed with far field voice applications. The main control module can realize the above-mentioned "according to the second voice information, display the media resource indicated by the second voice information" through the far-field voice application.
Illustratively, taking the example that the user wants to watch the exercise video in the standby state of the television, and the television 200 includes a power-on control module and a main control module, the voice control method provided in the embodiment of the present application is described. As shown in fig. 6, S501 in the method includes S601. The method further includes S602. S502 in the method includes S603, and S503 includes S604-S607.
S601, in a standby state, the television 200 acquires voice information through the power-on control module.
For example, as shown in fig. 7 (a), the television 200 is in the standby state, and the display of the television 200 shows a black screen. The user says "hello, I want to exercise" within a certain range of the television 200. The startup control module can acquire the voice information uttered by the user and buffer it. The voice information includes "hello, I want to exercise".
S602, the startup control module judges whether the voice information comprises first voice information, wherein the first voice information is used for indicating a wake-up word.
If it is determined that the voice information includes the first voice information, the power-on control module executes S603-S604. If it is determined that the voice information does not include the first voice information, the television 200 remains in the standby state.
For example, continuing with the example in which the voice information acquired by the power-on control module includes "hello, I want to exercise", the power-on control module may determine that the voice information includes the wake-up word, i.e., "hello", and then execute S603.
S603, the startup control module triggers the main control module to start.
For example, as shown in fig. 7 (b), when the main control module is started, the display of the television 200 is controlled to display the main interface 701.
S604, the startup control module acquires second voice information except the first voice information in the voice information.
For example, continuing with the example in which the voice information acquired by the startup control module includes "hello, I want to exercise", the startup control module may determine that the second voice information in the voice information includes "I want to exercise".
It should be noted that, in addition to the power-on control module shown in fig. 6 executing S603 first and then executing S604, the power-on control module may also execute S603 and S604 simultaneously. The embodiment of the present application does not limit the order of S603 and S604.
S605, the startup control module sends the second voice information to the main control module.
S606, the main control module recognizes the second voice information to obtain a voice text corresponding to the second voice information.
For example, continuing with the example in which the second voice information includes "I want to exercise", the main control module recognizes the second voice information and can obtain the voice text corresponding to the second voice information, which includes "exercise".
S607, the main control module displays the media resources indicated by the voice text according to the voice text corresponding to the second voice information.
For example, as shown in fig. 7 (c), taking the case where the voice text corresponding to the second voice information includes "exercise" as an example, the main control module may determine the exercise video indicated by "exercise" and control the display to display the exercise video 702. The exercise video may be an exercise video historically browsed by the user or an exercise video customized by the television 200.
It should be noted that, after the user says "hello, I want to exercise", in addition to the case shown in fig. 7 in which the display of the television jumps from the black screen to the main interface 701 and then from the main interface 701 to the interface including the exercise video 702, the display of the television may also jump directly from the black screen to the interface including the exercise video 702. The embodiments of the present application are not limited in this regard.
It can be understood that, when the television 200 is in the standby state, a single user voice that includes the wake-up word can not only control the television 200 to start up, but also control the started television 200 to play the media resource indicated by that voice. In other words, one voice input issued in the standby state is enough to make the television 200 display the media resource. This process of controlling the television 200 in the standby state to display media resources with one voice input may be referred to as oneshot. The user does not need to wait for the television 200 to start up before issuing a second voice, which increases the speed of controlling the television 200 to display media resources.
Illustratively, as shown in fig. 8, the power-on control module in the television 200 is implemented by a DSP and the main control module is implemented by an SOC. The DSP may include, among other things, a wake module 811, a VAD module 812 and an audio interaction module 813.
The voice control method provided in the embodiment of the present application is described with reference to the structure of the television 200 shown in fig. 8.
First, regardless of whether the television 200 is in the on state or the standby state, the DSP can always acquire voice information through the sound collector. The wake-up module 811 can recognize the voice information acquired by the DSP to determine whether the voice information includes the first voice information. If the voice information includes the first voice information, the DSP may send start notification information to the SOC. The start notification information is used to trigger the General-purpose input/output (GPIO) pin of the SOC to be set to a low level. When the GPIO pin of the SOC is at a low level, the SOC starts. The GPIO pin of the SOC is used to control the suspension/start of the SOC.
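The GPIO-based start notification can be modeled with the toy sketch below. Pin levels are represented as a plain attribute and the SOC as a small class; the class and method names are inventions of this illustration, and real GPIO handling on an SOC would go through platform driver APIs rather than Python objects.

```python
# Illustrative model of the start notification: when the wake word is found,
# the DSP pulls the SOC's GPIO pin low, and the SOC treats a low level as
# the signal to start.

class Soc:
    def __init__(self):
        self.gpio_level = 1   # high level: SOC remains suspended
        self.started = False

    def on_gpio_change(self):
        if self.gpio_level == 0:  # low level starts the SOC
            self.started = True


def dsp_send_start_notification(soc: Soc):
    """DSP side: set the SOC's GPIO pin to low to trigger startup."""
    soc.gpio_level = 0
    soc.on_gpio_change()
```

The point of the sketch is only the direction of control: the DSP drives the pin, and the SOC's startup is a consequence of the level change, matching the text's description of the start notification information.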
Secondly, the DSP still acquires the voice information through the sound collector while judging whether the voice information comprises the first voice information. The DSP may determine, after acquiring the voice information of the first duration, through the VAD module 812, whether the acquired voice information of the first duration includes second voice information in addition to the first voice information.
After the SOC starts, the SOC may trigger other modules in the television 200 that are inactive to start, such as triggering the communicator 220 to start. The SOC may also trigger the start of at least one application installed in the television set 200, including far-field voice applications. The far-field speech application in the SOC may obtain the second speech information from the audio interaction module 813 in the DSP. The far-field voice application in the SOC may then communicate with the server via the communicator 220 requesting the server to recognize the second voice information. The far-field speech application in the SOC may receive the speech text corresponding to the second speech information sent by the server. The SOC can control the display to display the media assets indicated by the voice text in accordance with the voice text.
The first duration may be a duration required for starting the SOC, such as 2 seconds(s).
In some embodiments, after the power-on control module in the television 200 determines the second voice information, it waits for the main control module to send a query request (i.e., the second query request described below) that triggers acquisition of the second voice information. After receiving the query request sent by the main control module, the power-on control module responds to the query request by sending the second voice information to the main control module.
Illustratively, the television 200 includes a power-on control module and a main control module, and the television 200 is installed with a far-field voice application as an example, the voice control method provided in the embodiment of the present application is described. As shown in fig. 9, S502 in the method may further include S901, and S606 in S503 may further include S902-S906. The method may further comprise S907-S908.
S901, the startup control module judges whether the voice information comprises second voice information except the first voice information.
The startup control module may perform VAD on the information other than the first voice information in the voice information to determine whether that information includes the second voice information. If it does, that is, the voice information includes the second voice information, S604 and S902 are performed. If it does not, that is, the voice information does not include the second voice information, the startup control module may generate a second identifier, which characterizes that the voice information does not include the second voice information; that is, S907 is executed.
S902, a starting-up control module generates a first identifier; the first identification characterizing speech information includes second speech information.
If the voice information includes the second voice information, the startup control module may generate the first identifier. The first identification characterizing speech information includes second speech information. The second voice message is a voice message except for triggering the television 200 to be turned on, that is, the second voice message belongs to a functional voice. The functional voice is used for triggering other television functions besides the on-off function.
S903, the main control module sends a first query request to the startup control module.
After the main control module is started, it can send a first query request to the startup control module, where the first query request is used to query whether the user has input functional voice.
S904, the startup control module responds to the first query request and sends a first identification to the main control module.
After the startup control module generates a first identifier, the startup control module responds to a first query request sent by the main control module and sends the first identifier to the main control module.
S905, the main control module sends a second query request to the startup control module according to the first identification.
The main control module determines that the user inputs the second voice information (i.e. the functional voice) according to the first identifier sent by the startup control module, and then can send a second query request to the startup control module. The second query request is for requesting acquisition of second voice information.
S906, the startup control module responds to the second query request and sends second voice information to the main control module.
The startup control module responds to a second query request sent by the main control module and sends second voice information to the main control module.
S907, the startup control module generates a second identifier; the second identification characterizing speech information does not include the second speech information.
If the voice information does not include the second voice information, the startup control module may generate a second identifier. The second identifier characterizes the speech information as excluding the second speech information, i.e. characterizes the user as not entering functional speech.
S908, the startup control module responds to the first query request and sends a second identification to the main control module.
After the startup control module generates the second identifier, it sends the second identifier to the main control module in response to the first query request sent by the main control module. Then, according to the second identifier, the main control module can send the first query request to the startup control module again, so as to query again whether the user has input functional voice.
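The first-query/second-query exchange of S903-S908 can be sketched as the small message protocol below. The string identifiers "FIRST" and "SECOND" and the class and function names are assumptions of this sketch; in the embodiment the two modules actually exchange these messages over a USB transport between the SOC and the DSP.

```python
# Sketch of the query protocol between the main control module (SOC side)
# and the startup control module (DSP side).

class PowerOnControl:
    def __init__(self, second_voice=None):
        self.second_voice = second_voice  # None: no functional voice input

    def handle_first_query(self):
        # S904/S908: answer with whichever identifier was generated
        return "FIRST" if self.second_voice is not None else "SECOND"

    def handle_second_query(self):
        # S906: hand over the buffered second voice information
        return self.second_voice


def main_control_fetch(dsp: PowerOnControl):
    """S903/S905: query whether functional voice exists, then fetch it."""
    if dsp.handle_first_query() == "FIRST":
        return dsp.handle_second_query()
    return None  # second identifier: re-send the first query later
```

With functional voice buffered, the fetch returns the second voice information; otherwise the main control module gets the second identifier and would retry the first query, as the text describes.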
Illustratively, a specific process of the main control module obtaining the second voice information from the power-on control module in the embodiment of the present application will be described with reference to the structure of the television 200 shown in fig. 8. First, after the SOC is started, a first query request may be sent to the audio interaction module 813 through the far-field voice application. The audio interaction module 813, in response to the first query request, may obtain the first identifier generated by the VAD module 812; the first identification is then sent to a far-field speech application in the SOC. Wherein the VAD module 812 generates the first identification upon determining that the acquired voice information includes the second voice information.
The far-field speech application in the SOC may then send a second query request to the audio interaction module 813 based on the first identification. The audio interaction module 813 may send the second voice information to a far-field voice application in the SOC in response to the second query request.
Wherein the audio interaction module 813 may divide the second voice information into a plurality of audio blocks (e.g., audio blocks of 512 bytes in size) which are sequentially transmitted to the far-field voice application in the SOC. Alternatively, the audio interaction module 813 may also send the complete second voice information to the far-field voice application in the SOC once.
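The block division mentioned above amounts to fixed-size slicing of the audio buffer. The 512-byte block size comes from the text; the function name is an illustrative assumption.

```python
# Divide the second voice information into fixed-size audio blocks for
# sequential transfer from the audio interaction module to the SOC.

def split_audio(data: bytes, block_size: int = 512):
    """Split the second voice information into audio blocks of at most
    block_size bytes; the final block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Concatenating the blocks on the receiving side reconstructs the original second voice information, which is why sending either the blocks in order or the complete buffer at once is equivalent.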
The SOC and DSP may be connected by a universal serial bus (Universal Serial Bus, USB) interface, for example. At this time, the far-field voice application in the SOC and the audio interaction module 813 in the DSP may interact through the USB interface.
The foregoing description of the solution provided in the embodiments of the present application has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide functional modules of a display device (e.g., the television 200) according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
The embodiment of the application also provides a display device. As shown in fig. 10, the display device 1000 includes: a display 1001, a power-on control module 1002 and a main control module 1003.
Wherein the display 1001 is configured to display media assets. The main control module 1003 is configured to stop the operation when the display device is in a standby state. A power-on control module 1002 configured to: acquiring voice information while the display device 1000 is in a standby state; if the voice information includes the first voice information, triggering the main control module 1003 to start and determining whether the voice information includes the second voice information other than the first voice information; the first voice information is used for indicating a wake-up word. A main control module 1003 configured to: if the voice information includes the second voice information, the display 1001 is controlled to display the media resource indicated by the second voice information according to the second voice information.
In one possible implementation, the power-on control module 1002 is further configured to: determining whether the voice information includes first voice information; if the voice information does not include the first voice information, determining whether the next voice information includes the first voice information; the next voice message is acquired by the power-on control module after the voice message.
In another possible implementation, the display device 1000 includes a communicator 1004. The main control module 1003 is specifically configured to: acquire the second voice information from the power-on control module 1002; control the communicator 1004 to send the second voice information to a server; control the communicator 1004 to receive a voice recognition result of the second voice information sent by the server; and, according to the voice recognition result, determine the media asset indicated by the second voice information and control the display 1001 to display that media asset.
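The server round trip described above can be modeled as three steps: send, receive, display. The `Communicator` class, its methods, and the stand-in recognizer below are all assumptions for this sketch; the patent specifies only the message flow, not any wire format or API.

```python
# Hedged model of the main control module's server round trip: the
# communicator forwards the second voice information to a recognizer
# (standing in for the remote server) and returns its result.
class Communicator:
    """Stand-in communicator; the real transport is unspecified."""
    def __init__(self, recognizer):
        self._recognizer = recognizer  # simulates the remote ASR server
        self._last = None

    def send(self, voice_info):
        self._last = voice_info

    def receive(self):
        return self._recognizer(self._last)

def handle_second_voice_info(second_voice_info, communicator, display):
    # 1. Send the second voice information to the server.
    communicator.send(second_voice_info)
    # 2. Receive the server's voice recognition result.
    result = communicator.receive()
    # 3. Treat the result as indicating a media asset and display it.
    display(result)
    return result
```

Offloading recognition to a server keeps the device-side logic thin: the main control module only brokers messages and drives the display.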
In another possible implementation, the power-on control module 1002 is further configured to: if the voice information includes the second voice information, generate a first identifier, where the first identifier characterizes that the voice information includes the second voice information. The main control module 1003 is further configured to: send a first query request to the power-on control module 1002. The power-on control module 1002 is further configured to: in response to the first query request sent by the main control module 1003, send the first identifier to the main control module 1003. The main control module 1003 is further configured to: send a second query request to the power-on control module 1002 according to the first identifier sent by the power-on control module 1002. The power-on control module 1002 is further configured to: in response to the second query request sent by the main control module 1003, send the second voice information to the main control module 1003.
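The two-step handshake above can be sketched as follows: the main control module first queries for the flag (the first identifier) and, only if it is set, issues a second query for the command audio itself. All class and method names here, and the placeholder wake word, are assumptions for illustration.

```python
# Illustrative model of the query handshake between the two modules.
class PowerOnControlModule:
    def __init__(self):
        self.first_identifier = False   # set when second voice info exists
        self.second_voice_info = None

    def on_voice_info(self, voice_info, wake_word="hi tv"):
        # Whatever remains after the wake word is the second voice
        # information; generate the first identifier if it is non-empty.
        command = voice_info.replace(wake_word, "").strip()
        if command:
            self.first_identifier = True
            self.second_voice_info = command

    def handle_first_query(self):
        return self.first_identifier

    def handle_second_query(self):
        return self.second_voice_info

class MainControlModule:
    def fetch_command(self, power_on_ctrl):
        # First query: was any second voice information captured?
        if not power_on_ctrl.handle_first_query():
            return None
        # Second query: retrieve the second voice information itself.
        return power_on_ctrl.handle_second_query()
```

Querying a cheap flag before transferring audio lets the freshly started main control module avoid requesting voice data that does not exist.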
Of course, the display device 1000 provided in the embodiments of the present application includes, but is not limited to, the above modules; for example, the display device 1000 may further include a memory. The memory may be used to store the executable instructions of the display device 1000, and may also be used to store data generated by the display device 1000 during operation, such as the acquired voice information.
The embodiment of the application also provides a display device, which comprises: a processor and a memory; the memory is used for storing computer instructions, and when the display device runs, the processor executes the computer instructions stored in the memory, so that the display device executes the voice control method provided by the embodiment of the application.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions, and when the computer instructions are executed on the display device, the display device can execute the voice control method provided by the embodiment of the application.
For example, the computer-readable storage medium may be a ROM, a RAM, a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
The embodiment of the present application also provides a computer program product containing computer instructions that, when executed on a display device, enable the display device to perform the voice control method provided by the embodiments of the present application.
The embodiment of the present application also provides an apparatus (for example, the apparatus may be a chip system), which includes a processor for supporting the display device in implementing the voice control method provided by the embodiments of the present application. In one possible design, the apparatus further includes a memory for storing the program instructions and data necessary for the display device. When the apparatus is a chip system, it may consist of a chip, or may include the chip together with other discrete devices.
Illustratively, as shown in fig. 11, a chip system provided by an embodiment of the present application may include at least one processor 1101 and at least one interface circuit 1102. The processor 1101 may be a processor in the television 200 described above. The processor 1101 and the interface circuit 1102 may be interconnected by wires. The processor 1101 may receive computer instructions from the memory of the television 200 through the interface circuit 1102 and execute them. When executed by the processor 1101, the computer instructions may cause the television 200 to perform the steps performed by the television 200 in the above embodiments. Of course, the chip system may also include other discrete devices, which are not specifically limited in this embodiment of the present application.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional modules is illustrated. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the modules or units is merely a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A display device, the display device comprising:
a display configured to display media assets;
a main control module configured to stop operation when the display device is in a standby state;
a power-on control module configured to: acquiring voice information when the display equipment is in a standby state; if the voice information comprises first voice information, triggering the main control module to start, and determining whether the voice information comprises second voice information except the first voice information; the first voice information is used for indicating a wake-up word;
the main control module is configured to: and if the voice information comprises the second voice information, controlling the display to display the media resource indicated by the second voice information according to the second voice information.
2. The display device of claim 1, wherein
the power-on control module is further configured to: determine whether the voice information includes the first voice information; if the voice information does not include the first voice information, determine whether the next voice information includes the first voice information; the next voice information is acquired by the power-on control module after the voice information.
3. The display device of claim 2, wherein the display device comprises a communicator;
the main control module is specifically configured to:
acquiring the second voice information from the power-on control module;
controlling the communicator to send the second voice information to a server;
controlling the communicator to receive the voice recognition result of the second voice information sent by the server;
and determining the media resources indicated by the second voice information according to the voice recognition result, and controlling the display to display the media resources indicated by the second voice information.
4. The display device of claim 3, wherein
the power-on control module is further configured to: if the voice information comprises the second voice information, generating a first identifier; the first identifier characterizes that the voice information comprises the second voice information;
the main control module is further configured to: send a first query request to the power-on control module;
the power-on control module is further configured to: responding to the first query request sent by the main control module, and sending the first identification to the main control module;
the main control module is further configured to: send a second query request to the power-on control module according to the first identifier sent by the power-on control module;
the power-on control module is further configured to: and responding to the second query request sent by the main control module, and sending the second voice information to the main control module.
5. A voice control method, applied to a display device, wherein the display device comprises a power-on control module, and the power-on control module operates when the display device is in a standby state; the method comprises:
when the display device is in the standby state, the power-on control module acquires voice information;
if the voice information comprises first voice information, the display device is started up, and whether the voice information comprises second voice information other than the first voice information is determined; the first voice information is used for indicating a wake-up word;
if the voice information comprises the second voice information, the display device displays, according to the second voice information, the media resource indicated by the second voice information.
6. The method of claim 5, wherein the method further comprises:
the power-on control module determines whether the voice information comprises the first voice information;
if the voice information does not comprise the first voice information, the power-on control module determines whether the next voice information comprises the first voice information; the next voice information is acquired by the power-on control module after the voice information.
7. The method of claim 6, wherein the display device further comprises a main control module; the main control module stops operating when the display device is in the standby state;
the starting up of the display device and the determining of whether the voice information comprises second voice information other than the first voice information comprise:
the power-on control module triggers the main control module to start, and the power-on control module determines whether the voice information comprises the second voice information other than the first voice information.
8. The method of claim 7, wherein the displaying, according to the second voice information, the media resource indicated by the second voice information comprises:
the main control module acquires the second voice information from the power-on control module;
the main control module sends the second voice information to a server;
the main control module receives a voice recognition result of the second voice information sent by the server;
and the main control module determines, according to the voice recognition result, the media resource indicated by the second voice information, and displays the media resource.
9. The method of claim 8, wherein the acquiring, by the main control module, the second voice information from the power-on control module comprises:
if the voice information comprises the second voice information, the power-on control module generates a first identifier; the first identifier characterizes that the voice information comprises the second voice information;
the main control module sends a first query request to the power-on control module;
the power-on control module responds to the first query request sent by the main control module and sends the first identifier to the main control module;
the main control module sends a second query request to the power-on control module according to the first identifier sent by the power-on control module;
the power-on control module responds to the second query request sent by the main control module and sends the second voice information to the main control module;
and the main control module receives the second voice information sent by the power-on control module.
CN202211597149.3A 2022-12-12 2022-12-12 Voice control method and display device Pending CN116189674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211597149.3A CN116189674A (en) 2022-12-12 2022-12-12 Voice control method and display device

Publications (1)

Publication Number Publication Date
CN116189674A true CN116189674A (en) 2023-05-30

Family

ID=86447952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211597149.3A Pending CN116189674A (en) 2022-12-12 2022-12-12 Voice control method and display device

Country Status (1)

Country Link
CN (1) CN116189674A (en)

Similar Documents

Publication Publication Date Title
US20200265838A1 (en) Electronic device and operation method therefor
JP6229287B2 (en) Information processing apparatus, information processing method, and computer program
JP2014203207A (en) Information processing unit, information processing method, and computer program
US20100088096A1 (en) Hand held speech recognition device
JP2014203208A (en) Information processing unit, information processing method, and computer program
WO2018095219A1 (en) Media information processing method and device
CN112599126B (en) Awakening method of intelligent device, intelligent device and computing device
JP7210745B2 (en) Display device control method and display device thereby
US20190066669A1 (en) Graphical data selection and presentation of digital content
CN112511882A (en) Display device and voice call-up method
CN112165641A (en) Display device
CN112182196A (en) Service equipment applied to multi-turn conversation and multi-turn conversation method
CN109032554A (en) A kind of audio-frequency processing method and electronic equipment
CN111885400A (en) Media data display method, server and display equipment
CN109389977B (en) Voice interaction method and device
CN112002321B (en) Display device, server and voice interaction method
CN115150501A (en) Voice interaction method and electronic equipment
WO2023155607A1 (en) Terminal devices and voice wake-up methods
CN113449068A (en) Voice interaction method and electronic equipment
CN108334339A (en) A kind of bluetooth equipment driving method and device
CN116437155A (en) Live broadcast interaction method and device, computer equipment and storage medium
JP2020198077A (en) Voice control method of electronic device, voice control apparatus of electronic device, computer device, and storage medium
CN116189674A (en) Voice control method and display device
CN113038048B (en) Far-field voice awakening method and display device
CN114694661A (en) First terminal device, second terminal device and voice awakening method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination