WO2023246151A9 - Display device and control method - Google Patents

Display device and control method

Info

Publication number: WO2023246151A9
Authority: WO (WIPO PCT)
Prior art keywords: preset, display device, information, target, text
Application number: PCT/CN2023/078157
Other languages: English (en), French (fr)
Other versions: WO2023246151A1 (zh)
Inventor
王建君
李霞
张立泽
董逸晨
王娜
Original Assignee
Hisense Visual Technology Co., Ltd.
Priority claimed from CN202210713044.3A (CN115240665A)
Priority claimed from CN202210768515.0A (CN115270808A)
Application filed by Hisense Visual Technology Co., Ltd.
Publication of WO2023246151A1
Publication of WO2023246151A9


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech-to-text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41: Structure of client; Structure of client peripherals
    • H04N 21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]

Definitions

  • the embodiments of the present application relate to the field of display technology, and in particular, to a display device and a control method.
  • Current display devices can support user voice interaction. Users can control the display device through voice, such as voice control of the volume of the display device, voice search for relevant information in the display device, etc.
  • When the display device receives voice control information from the user, it needs to send the voice control information to the server for semantic analysis, and then respond to the user's voice information based on the semantic analysis results returned by the server.
  • Each interaction between the user and the display device therefore requires the server to perform semantic analysis, which increases server pressure and leads to greater loss of server resources.
  • An embodiment of the present application provides a display device.
  • the display device includes a controller configured to: receive a user's target control request for the display device; in response to the target control request, obtain the target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result;
  • the preset text library includes: preset term information and preset resource information; based on the semantic matching result, the controller executes the control instruction corresponding to the target control request.
  • Embodiments of the present application provide a control method, including: receiving a user's target control request for the display device; in response to the target control request, obtaining target control text, and performing semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including preset term information and preset resource information; based on the semantic matching result, executing the control instruction corresponding to the target control request; and displaying the response interface corresponding to the executed control instruction.
  • Figure 1 is a schematic diagram of an operation scenario between a display device and a control device according to one or more embodiments
  • FIG. 2 is a hardware configuration block diagram of the control device 100 according to one or more embodiments
  • Figure 3 is a hardware configuration block diagram of a display device 200 according to one or more embodiments.
  • Figure 4 is a schematic diagram of software configuration in the display device 200 according to one or more embodiments.
  • Figure 5 is a schematic diagram of an icon control interface display of an application in the display device 200 according to one or more embodiments
  • Figure 6A is a system framework diagram for control according to one or more embodiments
  • Figure 6B is an architectural diagram of control according to one or more embodiments.
  • Figure 7A is a schematic flow chart of a control method
  • Figure 7B is a schematic interface diagram of a display device
  • Figure 7C is a schematic interface diagram of another display device
  • Figure 7D is a schematic interface diagram of yet another display device
  • Figure 7E is a schematic interface diagram of yet another display device
  • Figure 7F is a schematic interface diagram of yet another display device
  • Figure 7G is a schematic interface diagram of yet another display device
  • Figure 8A is a schematic structural diagram of matching target control text and candidate terms
  • Figure 8B is another schematic structural diagram of matching target control text and candidate terms
  • Figure 8C is yet another schematic structural diagram of matching target control text and candidate terms
  • Figure 8D is yet another schematic structural diagram of matching target control text and candidate terms
  • Figure 8E is yet another schematic structural diagram of matching target control text and candidate terms
  • Figure 8F is a schematic structural diagram of communication between a display device and a server
  • Figure 9 is a schematic diagram of an operation scenario between a display device and a control device according to one or more embodiments.
  • Figure 10A is a schematic flow chart of a semantic understanding method
  • Figure 10B is a schematic diagram of preprocessing of a semantic understanding method
  • Figure 11A is a schematic flow chart of another semantic understanding method
  • Figure 11B is a schematic diagram of the principle of another semantic understanding method
  • Figure 12A is a schematic flow chart of another semantic understanding method
  • Figure 12B is a schematic diagram of the principle of another semantic understanding method
  • Figure 12C is the first interface display diagram of a semantic understanding method
  • Figure 12D is the second interface display diagram of a semantic understanding method
  • Figure 12E is the third interface display diagram of a semantic understanding method.
  • the display device can respond to the user's control instructions in the display interface by interacting with the user, where the user can send voice control information to the display device to perform corresponding voice control. After receiving the voice control information sent by the user, the display device sends the voice control information to the server for semantic analysis, so that it can execute the corresponding controls based on the semantic analysis results returned by the server.
  • In the embodiments of the present application, after receiving the target control request sent by the user, the display device performs text conversion on the target control request to obtain the target control text, and performs semantic matching on the target control text based on the preset text library.
  • Determining the semantic matching result in this way realizes semantic understanding on the device side when interacting with the user; based on the semantic matching result, the display device executes the control instruction corresponding to the target control request and displays the response interface corresponding to the executed control instruction.
  • Because the device side performs semantic matching based on the preset text library, the need to upload to the server, and thus consume server-side resources, on every interaction is avoided, which effectively relieves the pressure on the server side and improves the efficiency of interaction control.
  • FIG. 1 is a schematic diagram of an operation scenario between a display device and a control device according to one or more embodiments.
  • the user can speak voice control information to the display device 200, or send voice control information to the control device 100 of the display device 200 (or the smart device 300 associated with the display device 200), to implement semantic control of the display device.
  • This enables the display device 200 to perform semantic matching locally on the device, effectively identify the voice control information issued by the user, execute the relevant control instructions, and display a response interface corresponding to the executed control instructions, improving the user experience.
  • the user can use a remote control or mobile phone to turn on the display device and send voice control information to the display device.
  • the display device can perform semantic matching on the voice control information and identify user needs.
  • the user can send voice data to the display device from the viewing position, and the microphone array of the display device collects the voice data spoken by the user and performs semantic matching. Multiple preset positions can be set on the display device for loading microphone arrays, to facilitate the effective acquisition of external voice data.
  • multiple preset positions can be set at the bottom of the display of the display device, such as a first position, a second position and a third position, and a microphone array can be loaded in each of the first, second and third positions.
  • Alternatively, multiple preset positions can be set on the upper part of the display of the display device, such as a fourth position, a fifth position and a sixth position, and a microphone array can be loaded in each of the fourth, fifth and sixth positions.
  • Alternatively, multiple preset positions can be set on the left side of the display of the display device, such as a seventh position, an eighth position and a ninth position, and a microphone array can be loaded in each of the seventh, eighth and ninth positions.
  • Alternatively, multiple preset positions can be set on the right side of the display, such as a tenth position, an eleventh position and a twelfth position, and a microphone array can be loaded in each of the tenth, eleventh and twelfth positions.
  • In this way, a microphone array can be installed in the display device, and when the display device receives the user's voice data, it can effectively and quickly perform semantic matching to respond to the user. This solves the problem in the existing technology by enabling the display device to perform semantic matching on the device side based on the preset text library, avoiding the need to upload to the server, and consume server-side resources, on every interaction, which effectively relieves the pressure on the server and improves the efficiency of interaction control.
  • Microphone arrays can also be installed in different areas of the display at the same time; for example, the first area, the second area, ..., and the twelfth area may exist at the same time, so that voice data can be obtained more effectively and accurately. This is not specifically limited in the embodiments of the present application.
  • the control device 100 may be a remote controller, and the communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication, and other wireless or wired methods to control the display device 200.
  • the user can control the display device 200 by inputting user instructions through buttons on the remote control, voice input, and control panel input.
  • mobile terminals, tablets, computers, laptops, and other smart devices can also be used to control the display device 200 .
  • the display device 200 may not use the above-mentioned smart device or control device to receive instructions, but may receive user control through touch, gestures, or voice input.
  • the display device 200 can also be controlled in a manner other than the control device 100 and the smart device 300 .
  • the display device 200 can directly receive the user's voice command control through a module configured inside the display device 200 to obtain voice commands.
  • the user's voice command control can also be received through a voice control device provided outside the display device 200 .
  • the smart device 300 can communicate with the software application installed in the display device 200 through a network communication protocol to achieve one-to-one control operations and data communication purposes.
  • the audio and video content displayed on the smart device 300 can also be transmitted to the display device 200 to realize the synchronous display function.
  • the display device 200 also performs data communication with the server 400 through various communication methods, which can allow the display device 200 to communicate through a local area network (LAN), a wireless local area network (WLAN) and other networks.
  • the server 400 can provide various content and interactions to the display device 200.
  • the display device 200 may be a liquid crystal display, an OLED display, a projection display device, or the like.
  • the display device 200 may also additionally provide a smart network television function that provides computer support functions.
  • FIG. 2 is a configuration block diagram of the control device 100 according to an exemplary embodiment.
  • the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply.
  • the control device 100 can receive input operation instructions from the user, and convert the operation instructions into instructions that the display device 200 can recognize and respond to, thereby mediating the interaction between the user and the display device 200 .
  • the communication interface 130 is used to communicate with the outside and includes at least one of a WIFI chip, a Bluetooth module, NFC or a replaceable module.
  • the user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, a button, or a replaceable module.
  • FIG. 3 is a hardware configuration block diagram of the display device 200 according to an exemplary embodiment.
  • the display device 200 includes a tuner and demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface (i.e., a user input interface) 280.
  • the controller 250 includes a central processing unit, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output.
  • the display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen.
  • the tuner-demodulator 210 receives broadcast television signals through wired or wireless reception, and demodulates audio and video signals, as well as EPG data signals, from among multiple wireless or wired broadcast television signals.
  • the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types.
  • the communicator may include at least one of a Wifi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near field communication protocol chips, and an infrared receiver.
  • the display device 200 can establish the transmission and reception of control signals and data signals with the external control device 100 or the server 400 through the communicator 220.
  • the detector 230 is used to collect signals from the external environment or interactions with the outside.
  • the controller 250 and the tuner-demodulator 210 may be located in different separate devices, that is, the tuner-demodulator 210 may also be located in an external device of the main device where the controller 250 is located, such as an external set-top box.
  • the user interface 280 can be used to receive control signals from the control device 100 (such as an infrared remote control, etc.).
  • the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory.
  • the controller 250 controls the overall operation of the display device 200.
  • the user may input a user command into a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the graphical user interface (GUI).
  • the user can input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.
  • FIG. 4 is a schematic diagram of the software configuration in the display device 200 according to one or more embodiments of the present application. As shown in Figure 4, the system is divided into four layers; from top to bottom they are the Applications layer ("Application layer"), the Application Framework layer ("Framework layer"), the Android runtime and system library layer ("System Runtime layer"), and the kernel layer.
  • At least one application program runs in the application layer.
  • These applications can be the window program, system setting program or clock program that comes with the operating system; they can also be applications developed by third-party developers.
  • applications in the application layer include but are not limited to the above examples.
  • the system runtime layer provides support for the upper layer, that is, the framework layer.
  • the Android operating system will run the C/C++ library included in the system runtime layer to implement the functions to be implemented by the framework layer.
  • the kernel layer is a layer between hardware and software, and includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor driver (such as fingerprint sensor, temperature sensor, pressure sensor, etc.), and power driver, etc.
  • Figure 5 is a schematic diagram of the icon control interface display of an application in the display device 200 according to one or more embodiments of this application.
  • the application layer includes at least one application that can display a corresponding icon control in the display, such as: a live TV application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, a game application icon control, etc.
  • Live TV app that provides live TV from different sources.
  • Video on demand application that can provide videos from different storage sources. Unlike live TV applications, video on demand offers the display of video from certain storage sources.
  • Media center application can provide various multimedia content playback applications. The application center can provide storage for various applications.
  • the above-mentioned display device is a terminal device with a display function, such as a television or a flat-panel television.
  • An output interface (display 260, and/or audio output interface 270) configured to output user interaction information
  • Communicator 220 used to communicate with server 400;
  • the controller 250 is configured to: receive a user's target control request for the display device;
  • the target control text is obtained, and the target control text is semantically matched based on a preset text library to obtain a semantic matching result.
  • the preset text library includes: preset term information and preset resource information;
  • the display 260 is configured to display a response interface corresponding to executing the control instruction.
  • controller 250 is specifically configured to:
  • if the target control text does not match any of the terms included in the preset term information, semantic matching is performed on the target control text based on the preset resource information to obtain a semantic matching result.
  • controller 250 is specifically configured to:
  • controller 250 is specifically configured to:
  • controller 250 is also configured to:
  • controller 250 is also configured to:
  • in response to the second information sending request, receive the preset resource information sent by the server, wherein the preset resource information is determined by the server based on the user's historical control requests and the user's historical access volume, and the preset resource information includes popular resources in at least two fields;
  • controller 250 is specifically configured to:
  • if the target control text does not match any of the resources included in the preset resource information, the target control text is sent to the server so that the server performs semantic analysis on the target control text and determines a third control instruction corresponding to the target control text;
  • in response to the third control instruction corresponding to the target control text sent by the server, the third control instruction is executed.
  • In this way, after receiving the target control request sent by the user, the display device performs text conversion on the target control request to obtain the target control text, and performs semantic matching on the target control text based on the preset text library.
  • Determining the semantic matching result in this way realizes semantic understanding on the device side when interacting with the user; based on the semantic matching result, the display device executes the control instruction corresponding to the target control request and displays the response interface corresponding to the executed control instruction.
  • Performing semantic matching based on the preset text library on the device side avoids the need to upload to the server, and consume server-side resources, on every interaction, which effectively relieves the pressure on the server and improves the efficiency of interaction control.
  • Figure 6A is a system framework diagram for control according to one or more embodiments of this application.
  • the system may include a control request receiving module 601, a semantic matching module 602, a control instruction execution module 603 and a response interface display module 604.
  • the system receives the user's target control request for the display device through the control request receiving module 601; the semantic matching module 602 responds to the target control request, obtains the target control text, and performs semantic matching on the target control text based on the preset text library to obtain a semantic matching result.
  • the preset text library includes: preset term information and preset resource information.
  • the control instruction execution module 603 executes the control instruction corresponding to the target control request based on the semantic matching result.
  • Figure 6B is an architecture diagram for control according to one or more embodiments of the present application. Based on the above system framework, the implementation in the Android system is shown in Figure 6B.
  • the Android system mainly includes an application layer, a framework layer, a system runtime layer and a kernel layer; the implementation logic is mainly reflected in the application layer, which includes the control request receiving module, the semantic matching module, the control instruction execution module and the response interface display module.
  • the control method provided in the embodiments of the present application detects the user's voice behavior in real time, receives the voice data sent by the user, and collects the user's voice data during one speaking period as the user's target control request for the display device.
  • Text analysis is carried out on the target control request to obtain the target control text. A preset text library can be pre-stored in the display device; the preset text library can include preset term information describing the user's habitual expressions and preset resource information describing the currently popular resources.
  • Displaying the user interface corresponding to the executed control instruction lets the user learn the control result. Semantic matching on the device side therefore does not require the participation of the server on every interaction, which reduces the number of visits to the server, avoids the resource loss caused by invalid access on the server side, reduces server pressure, and improves interaction efficiency. A sketch of this flow follows.
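  • The outline below is illustrative only: every attribute on the device object is a hypothetical stand-in for one of the modules of Figure 6A, not an API defined by the embodiments.

```python
# Illustrative outline of the device-side control flow described above.
# Each attribute of `device` stands in for a module from Figure 6A
# (control request receiving, semantic matching, control instruction
# execution, response interface display); all names are hypothetical.
def handle_interaction(device):
    voice = device.capture_voice()       # receive the target control request
    text = device.asr.transcribe(voice)  # convert it to target control text
    result = device.match(text)          # semantic matching against the
                                         # on-device preset text library
    device.execute(result)               # execute the control instruction
    device.show_response(result)         # display the response interface
```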
  • The following is described in conjunction with FIG. 7A in an exemplary manner. It can be understood that, during actual implementation, the flow of FIG. 7A may include more steps or fewer steps, and the order of these steps may also differ.
  • Figure 7A is a schematic flow chart of a control method provided by an embodiment of the present application.
  • the control method specifically includes the following steps:
  • the display device may include some smart devices capable of voice interaction.
  • the display device may include: a smart TV, a smart air conditioner, a smart refrigerator, a smart speaker, etc.
  • the display device may also include other smart devices that support voice interaction. The embodiments of this application do not specifically limit this.
  • the display device can record the interactive voice issued by the user in real time, so as to feed back to the user the interaction information responding to the interactive voice.
  • the user's interactive voice with the smart TV can be "Turn the volume down", and correspondingly, the user's target control request for the smart TV is “Turn the volume down", indicating the user's need to lower the volume.
  • Smart TV can effectively determine the user's target control request for the smart TV by collecting the user's voice data in real time.
  • the smart TV can collect the user's voice data in its corresponding preset area in real time. If the voice data is not detected within the preset period, or the voice data is detected to be unclear, the preset area can be expanded for continuous voice collection. Thus, the user's voice data can be collected effectively and accurately.
  • the preset area may be an area corresponding to the preset sound field range of the smart TV, or other areas around the preset smart TV, which are not specifically limited in the embodiments of the present application.
  • the interactive voice between the user and the smart air conditioner can be "temperature decrease", and correspondingly, the user's target control request for the smart air conditioner is "temperature decrease", indicating that the user has a need to lower the temperature.
  • the smart air conditioner can effectively determine the user's target control request by collecting the user's voice data in real time.
  • the smart air conditioner can collect the user's voice data in its corresponding preset area in real time. If the voice data is not detected within the preset period, or the voice data is detected to be unclear, the preset area can be expanded for continuous voice collection. Thus, the user's voice data can be collected effectively and accurately.
  • the preset area may be other areas around the preset smart air conditioner, such as a designated area in front of the smart air conditioner.
  • the user's interactive voice with the smart refrigerator can be "open/open the door", and correspondingly, the user's target control request for the smart refrigerator is "open/open the door", indicating that the user needs to open the refrigerator.
  • the smart refrigerator can effectively determine the user's target control request by collecting the user's voice data in real time.
  • the smart refrigerator can collect the user's voice data in its corresponding preset area in real time. If the voice data is not detected within the preset period, or the voice data is detected to be unclear, the preset area can be expanded for continuous voice collection. Thus, the user's voice data can be collected effectively and accurately.
  • the preset area may be other areas around the preset smart refrigerator, such as a designated area in front of the smart refrigerator, which is not specifically limited in the embodiments of the present application.
  • the user's interactive voice with the smart speaker can be "turn up the playback volume", and correspondingly, the user's target control request for the smart speaker is "turn up the playback volume", indicating that the user needs the playback volume increased.
  • smart speakers can effectively determine the user's target control request for smart speakers by collecting the user's voice data in real time.
  • the smart speaker can collect the user's voice data in its corresponding preset area in real time. If the voice data is not detected within the preset period, or the voice data is detected to be unclear, the preset area can be expanded for continuous voice collection. Thus, the user's voice data can be collected effectively and accurately.
  • the preset area may be an area corresponding to the preset sound field range of the smart speaker, or other areas around the preset smart speaker.
  • the preset area can be related to the current placement position of the smart speaker. For example, when the smart speaker moves from a first position to a second position, the preset area can change from the first preset area corresponding to the first position to the second preset area corresponding to the second position.
  • the display device can perform a text conversion operation on the received target control request, and convert the voice data into text data to facilitate text matching by the display device.
  • the text conversion operation is an operation of converting speech data into text data.
  • the text conversion operation can be implemented through a speech recognition algorithm or speech recognition software, as sketched below.
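  • A minimal sketch of the text-conversion step; the asr_engine object and its transcribe() method are hypothetical stand-ins, since the embodiments only require that voice data be converted into text, not a particular engine.

```python
# A minimal sketch of the text-conversion operation; `asr_engine` and its
# transcribe() method are hypothetical stand-ins for whatever speech
# recognition algorithm/software the display device ships with.
def to_control_text(voice_data: bytes, asr_engine) -> str:
    """Convert the user's voice control request into target control text."""
    return asr_engine.transcribe(voice_data).strip()
```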
  • the display device can semantically match the target control text based on the preset text library on the device side, thereby avoiding multiple accesses to the server and occupying server resources.
  • the preset text library may include: preset term information and preset resource information.
  • the preset term information consists of the user's habitual expressions, and the preset resource information consists of the resources popular on the network during the current period.
  • the preset term information can be the expressions used by the user when interacting with the display device.
  • For a smart TV, the preset term information can include but is not limited to: turn on the TV, lower the volume, turn up the volume, turn off the TV, etc.
  • the corresponding preset resource information can be: song name A, TV series B, video C, etc.
  • For a smart air conditioner, the preset term information may include but is not limited to: turning on the air conditioner, lowering the temperature, increasing the temperature, turning off the air conditioner, etc.
  • the corresponding preset resource information may be: 26°, 16°, etc.
  • For a smart refrigerator, the preset term information may include but is not limited to: opening the refrigerator, lowering the temperature, increasing the temperature, closing the refrigerator, etc.
  • the corresponding preset resource information may be: silent mode, deodorization mode, etc.
  • For a smart speaker, the preset term information may include but is not limited to: turning on the speaker, turning up the volume, turning down the volume, turning off the sound, etc.
  • the corresponding preset resource information may be: song name D, popular song E, etc.
  • the preset text library can correspond to individual users; that is, different users have different preset term information and preset resource information. Different users can be identified by their voice characteristics, and a preset text library uniquely corresponding to each user can be set, so that when a user's voice data is collected, it can be matched against that user's preset text library.
  • When it is determined that the person interacting with the display device is the first user, the target control text can be semantically matched based on the first preset text library corresponding to the first user to obtain a semantic matching result.
  • When it is determined that the person interacting with the display device is the second user, the target control text can be semantically matched based on the second preset text library corresponding to the second user to obtain a semantic matching result; a sketch of this selection follows.
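  • A sketch of per-user library selection, assuming a speaker-identification step that returns a user ID; identify_speaker() and the "default" fallback key are hypothetical.

```python
def library_for(voice_data: bytes, libraries: dict, identify_speaker):
    """Pick the preset text library matching the speaker's voice
    characteristics; fall back to a shared default library."""
    user_id = identify_speaker(voice_data)  # hypothetical voiceprint step
    return libraries.get(user_id, libraries.get("default"))
```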
  • the display device can store preset text libraries corresponding to multiple users.
  • the number of preset text libraries that can be stored in the display device is related to the storage capacity of the display device. For example, a display device with a large storage capacity can store more preset text libraries, and a display device with a smaller storage capacity stores fewer.
  • the display device can update the preset text library stored by the display device according to the storage time or the number of user visits.
  • the display device can pre-set an update time (such as one week) and regularly clean up the stored preset text libraries, for example deleting preset text libraries that have not been matched for a long time to make room for new preset text libraries.
  • While the display device updates regularly, it can also perform adaptive updates based on its storage capacity.
  • For example, the display device stores preset text libraries corresponding to ten users; when an eleventh preset text library is detected to be joining, one stored preset text library can be deleted according to its number of uses or usage time to make room for the new one, thereby improving the usefulness of the preset text libraries stored by the display device.
  • The display device can record the number of visits of each user. For example, if the number of visits of the first user is 2, the number of visits of the second user is 10, and the number of visits of the third user is 15, then it can be determined that the first user is not a frequent user, and the preset text library corresponding to the first user can be deleted when the update time arrives, thereby facilitating real-time updating of the preset text library and ensuring its practicality. A sketch of such a policy follows.
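  • The sketch below illustrates one such eviction policy; the capacity, counters and field names are illustrative, not values fixed by the embodiments.

```python
MAX_LIBRARIES = 10  # illustrative capacity; depends on device storage

def evict_coldest(libraries: dict) -> None:
    """libraries: user_id -> {'visits': int, 'last_match': float}.

    When the store is full, drop the library whose user has the fewest
    visits and the oldest successful match, making room for a new library.
    """
    if len(libraries) < MAX_LIBRARIES:
        return
    coldest = min(
        libraries,
        key=lambda uid: (libraries[uid]["visits"], libraries[uid]["last_match"]),
    )
    del libraries[coldest]
```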
  • the semantic matching result can effectively reflect the execution content corresponding to the target control request issued by the user, so the control instruction corresponding to that execution content can be effectively determined and executed.
  • For example, when the display device is a smart TV, the control instruction corresponding to the target control request is to increase the current playback volume of the smart TV.
  • When the display device is a smart air conditioner, the control instruction corresponding to the target control request is to lower the temperature of the smart air conditioner.
  • When the display device is a smart refrigerator, the control instruction corresponding to the target control request is to open the door of the smart refrigerator.
  • When the display device is a smart speaker, the control instruction corresponding to the target control request is to increase the playback volume of the smart speaker.
  • the method further includes: displaying a response interface corresponding to the execution control instruction.
  • the response interface can be a new display content on the original interface, or it can be a display interface after changing the original interface.
  • Displaying the response interface corresponding to the executed control instruction can be implemented in the following multiple ways.
  • In one implementation, the response interface may be the current display interface of the display device with new display content added.
  • For example, the current display interface of the smart TV plays display content 1, as exemplarily shown in Figure 7B, and the control instruction corresponding to the target control request is to turn up the volume; when the control instruction corresponding to the target control request is executed, the current display interface of the smart device does not change.
  • The corresponding response interface can display a volume turn-up indication in a preset area of the current display interface.
  • the response interface is as exemplarily shown in Figure 7C.
  • the current display interface of the smart refrigerator shows that the temperature of the refrigerator compartment is 5°C, as exemplarily shown in Figure 7D.
  • The control instruction corresponding to the target control request is to adjust the temperature of the refrigerator compartment to 1°C; when the control instruction corresponding to the target control request is executed, the current display interface of the smart device does not change.
  • The corresponding response interface can show the temperature adjusted to 1°C in the temperature display area of the current display interface and can display an adjustment message; the response interface is exemplarily shown in Figure 7E.
  • In another implementation, the response interface may be a different display interface obtained after changing the current display interface of the display device.
  • For example, the current display interface of the smart TV displays the first episode of TV series 1, and the playback content is display content 2, as exemplarily shown in Figure 7F.
  • The control instruction corresponding to the target control request is to play the next episode; when the control instruction is executed, the current display interface of the smart TV changes to another display interface.
  • The corresponding response interface can be the display interface of the next episode after the current display interface changes, displaying the second episode of TV series 1 with playback content being display content 3; the response interface is exemplarily shown in Figure 7G.
  • When determining the semantic matching result, the target control text can be semantically matched according to the preset term information and the preset resource information in the preset text library. For example, first, all candidate terms included in the preset term information are semantically matched with the target control text; if none of the candidate terms included in the preset term information successfully matches the target control text, all candidate resources included in the preset resource information can be matched one by one against the target control text to determine the semantic matching result.
  • A matching threshold can be set to measure the matching degree between the target control text and each candidate term included in the preset term information. For example, the matching threshold can be set to 85%: if the matching degree between the target control text and a candidate term included in the preset term information is greater than 85%, the match is determined to be successful; if the matching degree is less than or equal to 85%, the match is determined to have failed.
  • For example, the target control text is matched with five candidate terms included in the preset term information: the matching degree with candidate term 1 is 25%, with candidate term 2 is 28%, with candidate term 3 is 40%, with candidate term 4 is 50%, and with candidate term 5 is 10%. No candidate term exceeds the threshold, so none matches successfully.
  • If the target control text matches a candidate term in the preset term information (the matching degree is higher than 85%), the control instruction corresponding to the target control request is the first control instruction corresponding to this candidate term, and the first control instruction can be executed to respond to the target control request.
  • For example, the target control text is matched with five candidate terms included in the preset term information: the matching degree with candidate term 1 is 25%, with candidate term 2 is 90%, with candidate term 3 is 40%, with candidate term 4 is 50%, and with candidate term 5 is 10%. It can then be determined that the candidate term that matches the target control text is candidate term 2.
  • When more than one candidate term exceeds the threshold, the candidate term with the highest matching degree can be selected as the candidate term that successfully matches the target control text.
  • For example, the target control text is matched with five candidate terms included in the preset term information: the matching degree with candidate term 1 is 25%, with candidate term 2 is 90%, with candidate term 3 is 92%, with candidate term 4 is 50%, and with candidate term 5 is 10%. Candidate terms 2 and 3 both exceed the threshold, so the candidate term that matches the target control text is determined to be candidate term 3. A sketch of this selection follows.
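  • A minimal sketch of the candidate matching described above, assuming the matching degree is a plain string-similarity ratio; the embodiments do not fix a particular metric, and difflib's ratio is used here purely for illustration.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # the 85% threshold from the examples above

def matching_degree(target_text: str, candidate: str) -> float:
    """Matching degree between the target control text and one candidate."""
    return SequenceMatcher(None, target_text, candidate).ratio()

def best_match(target_text: str, candidates: list[str]) -> str | None:
    """Return the candidate with the highest matching degree above the
    threshold (candidate term 3 in the last example), or None when every
    candidate is at or below the threshold."""
    if not candidates:
        return None
    score, best = max((matching_degree(target_text, c), c) for c in candidates)
    return best if score > MATCH_THRESHOLD else None
```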
  • If no candidate term matches, all candidate resources in the preset resource information can be matched with the target control text. If a candidate resource included in the preset resource information successfully matches the target control text, it can be determined that the control instruction corresponding to the target control request is the second control instruction corresponding to that candidate resource, and the second control instruction can be executed to respond to the target control request.
  • Similarly, a matching threshold can be set to measure the matching degree between the target control text and each candidate resource included in the preset resource information. For example, the matching threshold can be set to 85%: if the matching degree between the target control text and a candidate resource included in the preset resource information is greater than 85%, the match is determined to be successful; if it is less than or equal to 85%, the match is determined to have failed.
  • For example, the target control text is matched with three candidate resources included in the preset resource information: the matching degree with candidate resource 1 is 40%, with candidate resource 2 is 93%, and with candidate resource 3 is 20%. It can then be determined that the candidate resource matching the target control text is candidate resource 2.
  • the candidate resource with the highest matching degree may be selected as the candidate resource that successfully matches the target control text.
  • For example, the target control text is matched with three candidate resources included in the preset resource information: the matching degree with candidate resource 1 is 90%, with candidate resource 2 is 86%, and with candidate resource 3 is 20%. Both exceed the threshold, so the candidate resource matching the target control text is determined to be candidate resource 1. Combining the two stages with the server fallback gives the pipeline sketched below.
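  • The sketch below chains the term stage, the resource stage, and the server fallback of the earlier embodiments; best_match() is the sketch above, and send_to_server() is a hypothetical stand-in for the round-trip in which the server performs full semantic analysis.

```python
def resolve(target_text: str, preset_terms: dict, preset_resources: dict):
    """preset_terms / preset_resources map candidate text to its control
    instruction (the first / second control instruction, respectively)."""
    term = best_match(target_text, list(preset_terms))
    if term is not None:
        return preset_terms[term]          # first control instruction
    resource = best_match(target_text, list(preset_resources))
    if resource is not None:
        return preset_resources[resource]  # second control instruction
    # No on-device match: only now is the server consulted.
    return send_to_server(target_text)     # third control instruction
                                           # (hypothetical server round-trip)
```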
  • the server can predetermine a preset text library based on the user's historical access data, and send the preset text library to the display device for storage.
  • the server may send a first information sending request to the display device based on the update frequency of the preset term information, so that the display device receives the preset term information sent by the server to build/update the preset text library.
  • when the display device interacts with the user, it can understand the user's intention on the device side based on the preset term information and quickly provide feedback to the user.
  • When the server determines the preset term information, it can compute the user's habitual expressions based on the user's voice access history data.
  • the display format of the user's access history data is shown in Table 1 below.
  • the server may send a second information sending request to the display device based on the update frequency of the preset resource information, so that The display device receives the preset resource information sent by the server to build/update the preset text library.
  • when the display device interacts with the user, it can understand the user's intention on the device side based on the preset resource information and quickly provide feedback to the user.
  • When the server determines the preset resource information, it can obtain popular media information in the corresponding field based on the user portrait.
  • A user portrait collects data from dimensions such as the user's social attributes, consumption habits and preference characteristics, characterizes the user's or product's feature attributes, and statistically mines these features for potentially valuable information, thereby abstracting a full picture of the user. The details are as follows.
  • Creating popular resources in various fields: popular resources in each field are obtained based on the search results of all users and the current hot search list, as shown in Table 3 below.
  • Obtaining popular resources: the popular media resource results in the corresponding field are obtained according to the tag type; the hash value of the user ID is computed and taken modulo 7 (assuming that the data is pushed between 1 and 6 a.m.) to obtain the slot value, and the data is saved in the corresponding entry of Table 5. The slot computation is sketched below.
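  • A sketch of the slot computation; MD5 is chosen here only because Python's built-in hash() is salted per process, which would make slots unstable across restarts. The function name is illustrative.

```python
import hashlib

def push_slot(user_id: str) -> int:
    """Stable hash of the user ID taken modulo 7, giving a slot value in
    0..6 that spreads pushes across the 1 a.m.-6 a.m. window."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 7
```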
  • the server can regularly send the preset terminology information and preset resource information to the display device to facilitate storage/updating of the display device.
  • the user's habitual expressions (the preset term information) change within a small range and have a low update frequency; they can be set to update once a month.
  • the popular Internet media information (the preset resource information) has a higher update frequency and can be set to update once a day.
  • For example, the server can be set to push once per hour between 1 a.m. and 6 a.m. on the 1st of each month according to the slot value in Table 2; for example, data with a slot value of 1 is pushed between 1 o'clock and 2 o'clock, and so on. Similarly, for Table 5, data is pushed according to the slot value between 1 a.m. and 6 a.m. every morning.
  • the interaction between the display device and the server can use the lightweight Message Queuing Telemetry Transport (MQTT) publish/subscribe message transmission protocol.
  • a corresponding topic is established between the two parties, using the QoS 1 quality of service (an agreement between the data sender and the data receiver that each message is delivered at least once), and setting the expiration time to 120 seconds.
  • the server sends the network hot-word media information and the user idiom information to the corresponding topic according to the above schedule; if the corresponding message is successfully consumed, the corresponding data flag in the database is set to 1.
  • the server only pushes the user idioms for the month and the hot words and media information for the day; if the display device fails to read a message, the server sends the message again so that it reaches the display device at least once.
  • MQTT is a machine-to-machine/IoT connectivity protocol designed as an extremely lightweight publish/subscribe message transfer protocol. It is useful for remote connections that require a small code footprint or where network bandwidth is at a premium. Designed for constrained devices and for low-bandwidth, high-latency or unreliable networks, it suits connected devices in the emerging "machine-to-machine" or Internet of Things world; its small size, low power consumption, minimal data packets and efficient distribution of information to one or more receivers also make it ideal for mobile applications.
  • Under QoS 1 the display device may receive repeated messages; for repeatedly received messages, the display device can use a message-overwriting strategy for storage, as sketched below.
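  • A device-side sketch of this subscription, using the common paho-mqtt client (1.x callback API); the broker address and topic names are illustrative, not part of the embodiments.

```python
import paho.mqtt.client as mqtt

latest = {}  # topic -> most recent payload (message-overwriting strategy)

def on_message(client, userdata, msg):
    # QoS 1 is at-least-once, so the same message may be delivered twice;
    # overwriting keeps only the newest copy per topic.
    latest[msg.topic] = msg.payload

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883, keepalive=60)
client.subscribe("device/hot-media", qos=1)    # illustrative topic names
client.subscribe("device/user-idioms", qos=1)
client.loop_forever()
```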
  • the embodiments of the present application execute the above control method on the display device.
  • After the display device receives the target control request sent by the user, it performs text conversion on the target control request to obtain the target control text, and performs semantic matching on the target control text based on the preset text library to determine the semantic matching result. This realizes semantic understanding on the device side when interacting with the user; based on the semantic matching result, the display device executes the control instruction corresponding to the target control request and displays the response interface corresponding to the executed control instruction.
  • Semantic matching based on the preset text library on the device side avoids the need to upload to the server, and consume server-side resources, on every interaction, which effectively relieves the pressure on the server and improves the efficiency of interaction control.
  • the display device is explained by taking voice interaction on a television or the like as an example.
  • the display device may also be a display device on a smart refrigerator or other home appliances.
  • FIG. 9 is a schematic diagram of an application scenario of a semantic understanding process of a display device provided by an embodiment of the present application.
  • the user can operate the display device 200 through the control device 100 or the terminal device 300 .
  • the semantic understanding process can be used in voice interaction scenarios between users and smart homes.
  • the display device 200 in this scenario can be a smart refrigerator, a smart washing machine, and other smart devices with smart display functions.
  • When the user wants to interact with a smart device in this scenario, a voice command needs to be issued first.
  • After the smart device receives the voice command, it performs semantic understanding of the voice command and determines the semantic understanding result corresponding to the voice command, so that the smart device can subsequently display the semantic result or execute the corresponding control instruction, meeting the user's needs and improving the user experience.
  • the display device 200 can also be controlled using a terminal device 300 (such as a mobile terminal, a tablet computer, a notebook computer, etc.).
  • the display device 200 is controlled using an application program running on the terminal device 300 .
  • the terminal device 300 can install a software application with the display device 200 to realize connection communication through a network communication protocol to achieve one-to-one control operations and data communication purposes.
  • the semantic understanding content displayed on the terminal device 300 can also be transmitted to the display device 200 to realize the synchronous display function.
  • the display device 200 can also display content according to the user's voice instructions. For example, if the user's voice command is "Can you help me put two kilograms of beef and three kilograms of pork with a shelf life of 3 days in the refrigerator?", the display device 200 will display the corresponding content after semantic understanding.
  • the display device 200 may not use the above-mentioned terminal device 300 to receive instructions, but may instead receive user control through touch or gestures. For example, when the display device 200 is a smart refrigerator and the user adjusts the upper refrigerator door from the open state to the closed state, the display device 200 displays the multiple food ingredients put in before the refrigerator door was closed and the attribute identifiers corresponding to each of the food ingredients.
  • the semantic understanding method provided by the embodiments of the present application can be implemented based on computer equipment, or functional modules or functional entities in the computer equipment.
  • the computer device may be a personal computer (PC), a server, a mobile phone, a tablet computer, a notebook computer, a large computer, etc., which are not specifically limited in the embodiments of this application.
  • the method will be described in an exemplary manner with reference to FIGS. 10A, 11A, and 12A.
  • although the steps in the flowcharts of FIGS. 10A, 11A, and 12A are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order; unless otherwise specified herein, there is no strict order restriction on their execution, and these steps can be executed in other orders.
  • at least some of the steps in FIG. 10A, FIG. 11A, and FIG. 12A may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same time, but can be executed at different times.
  • semantic understanding uses a series of AI (artificial intelligence) algorithms to parse text into structured, machine-readable intent and slot information, making it easier for Internet developers to better understand and meet user needs.
  • the semantic understanding scenario is that the smart refrigerator performs semantic understanding on the operations of multiple ingredients, multiple attributes or missing attributes spoken by the user, and displays the semantic understanding content on the display device of the smart refrigerator.
  • the method specifically includes the following steps:
  • the word segmentation information includes: at least one ingredient word segment and the attribute identifiers corresponding to the ingredient word segments.
  • the steps to obtain the word segmentation information corresponding to the voice ingredient data can be as follows: first, remove prepositions, adverbs, and other words without attributes relevant to refrigerator ingredient management; then, obtain all recognizable ingredient names and mark them uniformly by ingredient type; finally, mark other attributes, such as quantifiers, along separate dimensions (a minimal code sketch of this pipeline, and of the later phrase grouping, appears after this list).
  • the word segmentation information obtained is: two jin, shelf life, 3 days, beef, pork, three jin.
  • the ingredient word segments can be mango, apple, beef, pork, mutton, etc.
  • the attribute identifiers corresponding to the ingredient word segments can be fruit, meat, vegetables, weight, etc.
  • the attribute identifiers are: two jin and three jin.
  • the ingredient word segments determined from them are beef and pork, and the corresponding target word segments are: beef and pork.
  • the display device can output a prompt such as "I don't understand this sentence" indicating that the keywords of the user's voice data cannot be recognized.
  • the semantic understanding content corresponding to the at least one ingredient word segment is determined based on the sequence relationship between the target word segment and the other word segments; take "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example.
  • the semantic understanding content determined for the ingredient word segments is: beef, two jin, shelf life 3 days; pork, three jin.
  • the semantic understanding content displayed for the ingredient word segments is: beef, two jin, shelf life 3 days; pork, three jin.
  • the display device first obtains the word segmentation information corresponding to the voice ingredient data, the word segmentation information including at least one ingredient word segment and the corresponding attribute identifiers; it then determines the target word segment among the ingredient word segments based on the attribute identifiers, next determines the semantic understanding content corresponding to the ingredient word segments based on the sequence relationship between the target word segment and the other word segments, and finally displays that semantic understanding content.
  • FIG. 11A is a schematic flowchart of another semantic understanding method provided by an embodiment of the present application; this embodiment is a further expansion and optimization on the basis of FIG. 10A. Optionally, it mainly explains the process of step S53 (determining the semantic understanding content corresponding to the at least one ingredient word segment based on the sequence relationship between the target word segment and the other word segments).
  • the target word segment is the first subject word segment that is ordered first among the at least one ingredient word segment.
  • the phrase information includes at least one group of phrases.
  • the phrase information represents the combination of the phrase corresponding to the first subject word segment and the phrase corresponding to the second subject word segment.
  • since the target word segment is the first subject word segment ordered first among the ingredient word segments, in this embodiment the target word segment is beef.
  • the phrase corresponding to the first subject word segment is "two jin, shelf life 3 days", the second subject word segment is "pork", and the phrase corresponding to the second subject word segment is "three jin".
  • the semantic understanding content determined for the ingredient word segments is: beef, two jin, shelf life 3 days; pork, three jin.
  • FIG. 12A is a schematic flowchart of yet another semantic understanding method provided by an embodiment of the present application; this embodiment is a further expansion and optimization on the basis of FIG. 11A. Optionally, it mainly explains the process of step S531 (determining phrase information based on the attribute identifier of the first traversal word segment among the at least one ingredient word segment and the sequence relationship between the first traversal word segment and the target word segment).
  • the word segmentation information obtained is: "two jin", "shelf life", "3 days", "beef", "pork", "three jin".
  • the first traversal word segment is "two jin"; there can be multiple second subject word segments.
  • the second subject word segment is pork.
  • the first traversal word segment is "two jin", the target word segment is "beef", and the phrase information is "shelf life 3 days".
  • step S5312 determines phrase information based on the sequence relationship between the first traversal word segment and the target word segment.
  • the word segmentation information obtained is: "two jin", "shelf life", "3 days", "beef", "pork", "three jin".
  • the first traversal word segment is "two jin" and the target word segment is "beef".
  • the sequence relationship between the first traversal word segment and the target word segment belongs to a preset relationship, and an object word exists between them; the object word is "shelf life".
  • the object word is "shelf life" and the word segment corresponding to the object word is "3 days".
  • the first phrase is composed of the object word and its corresponding word segment, that is, "shelf life 3 days".
  • step B can be implemented in the following ways:
  • B-1: determine, based on the attribute identifier of the second traversal word segment among the at least one ingredient word segment, that the second traversal word segment is a second subject word segment, and form a second phrase from the first phrase and the other word segments between the target word segment and the first traversal word segment.
  • the word segmentation information obtained is: "two jin", "shelf life", "3 days", "beef", "pork", "three jin".
  • the second traversal word segment is "pork" and its attribute identifier is "three jin", so it is determined to be a second subject word segment; the target word segment is "beef" and the first traversal word segment is "two jin".
  • the other word segments include: the first phrase "shelf life 3 days" and the first subject word segment "beef".
  • the individual word segments included among the other word segments may be the same as the individual word segments split out of the first phrase (the first phrase may be a compound).
  • for example, the other word segments include: "shelf life 3 days".
  • the first phrase splits into "shelf life + 3 days".
  • the second phrase formed is: "beef + two jin + shelf life 3 days".
  • alternatively, the individual word segments included among the other word segments may differ from the individual word segments split out of the first phrase (the first phrase may be a compound).
  • for example, the other word segments can include: "shelf life + 3 days + quality".
  • the first phrase splits into "shelf life + 3 days".
  • the second phrase formed is "beef + two jin + shelf life 3 days + quality".
  • in general, beef is graded by quality (represented by fat marbling) and physiological maturity (age) into eight grades: Prime, Choice, Select, Standard, Commercial, Utility, Cutter, and Canner.
  • if the other word segments are "shelf life + 3 days + Prime",
  • the first phrase splits into "shelf life + 3 days",
  • and the second phrase formed is "beef + two jin + shelf life 3 days + Prime".
  • B-2: determine that the second traversal word segment is the last second subject word segment not yet traversed, and form a third phrase from the second traversal word segment and the remaining untraversed word segments among the at least one ingredient word segment.
  • taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example:
  • the word segmentation information obtained is: "two jin", "shelf life", "3 days", "beef", "pork", "three jin".
  • the second traversal word segment is "pork", the remaining untraversed word segment is "three jin", and the third phrase formed is "pork + three jin".
  • step B-3 can be implemented in the following ways:
  • the second phrase is "beef + two jin + shelf life 3 days",
  • the third phrase is "pork + three jin",
  • and the candidate information determined is "beef", "two jin", "shelf life", "3 days", "pork", "three jin".
  • a verb word segment is added for the subject word segment included in each phrase.
  • the verb word segment can be "add" or "put".
  • for example, if the added verb is "add", the phrase information obtained is "beef", "add", "two jin", "shelf life", "3 days"; "pork", "add", "three jin".
  • step S54 (displaying the semantic understanding content corresponding to the at least one ingredient word segment) can be implemented as follows: in response to a preset operation on the display device, display the semantic understanding content corresponding to the at least one ingredient word segment; or, when displaying that semantic understanding content, hide the verb word segments included in it.
  • the preset operation can be closing the refrigerator door, triggering a preset gesture/button, or the like, to evoke the display interface.
  • after the display interface has been evoked in response to the preset operation, the refrigerator no longer displays the interface repeatedly once it detects closure, avoiding multiple displays and improving the user experience.
  • one display mode may display the semantic understanding content corresponding to the ingredient word segments as shown in FIG. 12C; this mode includes the verb word segment corresponding to each subject.
  • another display mode may be as shown in FIG. 12D:
  • the verb word segments included in the semantic understanding content are hidden.
  • when the ingredients stored in the refrigerator change, parameters such as ingredient type and attribute identifiers can be modified accordingly.
  • for example, the user stores "two jin of beef with a shelf life of three days and three jin of pork" in the morning, takes out one jin of beef and one jin of pork at noon for lunch, and then says to the refrigerator, "Change the beef to one jin with a shelf life of three days, and the pork to two jin."
  • the refrigerator's display device then displays the modified semantic understanding content corresponding to the ingredient word segments.
  • the display device first obtains the word segmentation information corresponding to the voice ingredient data, the word segmentation information including at least one ingredient word segment and the corresponding attribute identifiers; it then determines the target word segment among the ingredient word segments based on the attribute identifiers, next determines the semantic understanding content corresponding to the ingredient word segments based on the sequence relationship between the target word segment and the other word segments, and finally displays that semantic understanding content.
  • An embodiment of the present application provides a computer device, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the semantic understanding methods described in the embodiments of this application.
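To make the pipeline above concrete, the following is a minimal Python sketch of the ingredient preprocessing and tagging step. The lexicons, attribute names, and the segment function are illustrative assumptions for this document, not the patent's actual implementation, which may use any segmentation algorithm.

```python
# Hypothetical sketch of the preprocessing step: words outside the
# lexicon (prepositions, adverbs, etc.) are implicitly dropped, and the
# remaining segments are tagged with attribute identifiers.
ATTRIBUTE_LEXICON = {
    "beef": "meat", "pork": "meat", "mango": "fruit", "apple": "fruit",
    "two jin": "weight", "three jin": "weight",
    "shelf life": "shelf-life", "3 days": "duration",
}

def segment(utterance: str) -> list[tuple[str, str]]:
    """Return (segment, attribute) pairs in utterance order."""
    text = utterance.lower()
    found = []
    for phrase, attr in ATTRIBUTE_LEXICON.items():
        pos = text.find(phrase)
        if pos >= 0:
            found.append((pos, phrase, attr))
    # Keep original word order, which the grouping step relies on.
    return [(p, a) for _, p, a in sorted(found)]

print(segment("Can you help me put two jin of beef with a shelf life "
              "of 3 days and three jin of pork in the refrigerator"))
# [('two jin', 'weight'), ('beef', 'meat'), ('shelf life', 'shelf-life'),
#  ('3 days', 'duration'), ('three jin', 'weight'), ('pork', 'meat')]
```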

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A display device and a control method, applied to the field of display technology. The display device (200) includes: a controller (250) configured to: receive a target control request from a user for the display device (S710); in response to the target control request, obtain target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information (S720); and, based on the semantic matching result, execute a control instruction corresponding to the target control request (S730); and a display (260) configured to display a response interface corresponding to execution of the control instruction.

Description

Display device and control method
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210713044.3, filed on June 22, 2022, and Chinese patent application No. 202210768515.0, filed on June 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of display technology, and in particular to a display device and a control method.
Background
Current display devices can support voice interaction with users: a user can control the display device by voice, for example controlling the device's volume by voice or searching for information on the device by voice.
When the display device receives voice control information issued by the user, it needs to send that information to a server for semantic analysis and can respond to the user's voice information only on the basis of the semantic analysis result returned by the server. However, requiring server-side semantic analysis for every interaction between the user and the display device increases the pressure on the server and causes considerable loss of server resources.
Summary
An embodiment of this application provides a display device, the display device including: a controller configured to: receive a target control request from a user for the display device; in response to the target control request, obtain target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information; and, based on the semantic matching result, execute a control instruction corresponding to the target control request.
An embodiment of this application provides a control method, including: receiving a target control request from a user for a display device; in response to the target control request, obtaining target control text, and performing semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information; based on the semantic matching result, executing a control instruction corresponding to the target control request; and displaying a response interface corresponding to execution of the control instruction.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to one or more embodiments;
FIG. 2 is a block diagram of the hardware configuration of a control device 100 according to one or more embodiments;
FIG. 3 is a block diagram of the hardware configuration of a display device 200 according to one or more embodiments;
FIG. 4 is a schematic diagram of the software configuration in a display device 200 according to one or more embodiments;
FIG. 5 is a schematic diagram of the icon control interface display of applications in a display device 200 according to one or more embodiments;
FIG. 6A is a system framework diagram for control according to one or more embodiments;
FIG. 6B is an architecture diagram for control according to one or more embodiments;
FIG. 7A is a schematic flowchart of a control method;
FIG. 7B is a schematic diagram of an interface of a display device;
FIG. 7C is a schematic diagram of an interface of another display device;
FIG. 7D is a schematic diagram of an interface of yet another display device;
FIG. 7E is a schematic diagram of an interface of yet another display device;
FIG. 7F is a schematic diagram of an interface of yet another display device;
FIG. 7G is a schematic diagram of an interface of yet another display device;
FIG. 8A is a schematic structural diagram of matching between target control text and candidate terms;
FIG. 8B is another schematic structural diagram of matching between target control text and candidate terms;
FIG. 8C is yet another schematic structural diagram of matching between target control text and candidate terms;
FIG. 8D is yet another schematic structural diagram of matching between target control text and candidate terms;
FIG. 8E is yet another schematic structural diagram of matching between target control text and candidate terms;
FIG. 8F is a schematic structural diagram of communication between a display device and a server;
FIG. 9 is a schematic diagram of an operation scenario between a display device and a control apparatus according to one or more embodiments;
FIG. 10A is a schematic flowchart of a semantic understanding method;
FIG. 10B is a schematic diagram of preprocessing in a semantic understanding method;
FIG. 11A is a schematic flowchart of another semantic understanding method;
FIG. 11B is a schematic diagram of the principle of another semantic understanding method;
FIG. 12A is a schematic flowchart of yet another semantic understanding method;
FIG. 12B is a schematic diagram of the principle of yet another semantic understanding method;
FIG. 12C is the first interface display diagram of a semantic understanding method;
FIG. 12D is the second interface display diagram of a semantic understanding method;
FIG. 12E is the third interface display diagram of a semantic understanding method.
Detailed Description
To make the above objects, features, and advantages of the embodiments of this application clearer, the embodiments are further described below. It should be noted that, where there is no conflict, the embodiments of this application and features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the embodiments of this application, but the embodiments may also be implemented in ways other than those described here; obviously, the embodiments in this specification are only some, not all, of the embodiments of this application.
A display device can interact with a user and respond to the user's control instructions on a display interface. The user can send voice control information to the display device to exercise corresponding voice control over it; after receiving the voice control information sent by the user, the display device sends it to a server for semantic analysis and can then execute the corresponding control based on the semantic analysis result returned by the server.
However, a given user tends to have many fixed language habits. When controlling the volume of a display device, for example, some users like to say "raise the volume" while others like to say "a bit louder"; for a fixed user, the way each control need is phrased hardly changes. If the display device performs server-side semantic analysis every time it receives the user's voice control information, server-side resources are wasted.
To solve the above problem, in the embodiments of this application, after receiving a user's target control request for the display device, the display device converts the target control request into text to obtain target control text, performs semantic matching on the target control text based on a preset text library, and determines a semantic matching result, achieving device-side semantic understanding of the user interaction; based on the semantic matching result, it executes the control instruction corresponding to the target control request and displays the response interface corresponding to execution of the control instruction. With this method, semantic matching can be performed on the device side based on the preset text library, avoiding the need to upload every user interaction to the server and consume server-side resources, which effectively relieves server pressure and improves interaction control efficiency.
FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to one or more embodiments. As shown in FIG. 1, the user may speak voice control information to the display device 200, or issue voice control information to the control apparatus 100 of the display device 200 (or to a smart device 300 associated with the display device 200), to achieve semantic control of the display device, so that the display device 200 performs semantic matching locally on the device side, effectively recognizes the voice control information issued by the user, executes the relevant control instruction, and displays the response interface corresponding to execution of the control instruction, for a better user experience.
In some embodiments, the user can turn on the display device with a remote control or mobile phone and send voice control information to the display device, and the display device can perform semantic matching on this voice control information to recognize the user's needs.
Exemplarily, the user can send voice data to the display device from the viewing position, and a microphone array of the display device collects the voice data spoken by the user and performs semantic matching. Multiple preset positions can be provided on the display device for mounting microphone arrays so that external voice data can be captured effectively: for example, several preset positions at the bottom of the display (a first, second, and third position, each holding one microphone array), or at the top of the display (a fourth, fifth, and sixth position), or on the left side of the display (a seventh, eighth, and ninth position), or on the right side of the display (a tenth, eleventh, and twelfth position), each position likewise holding one microphone array. Then, when the display device receives the user's voice data, it can perform semantic matching quickly and effectively to respond to the user, thereby solving the problems in the prior art: semantic matching can be done on the device side based on the preset text library, avoiding the need to upload to the server, and thus consume server-side resources, on every interaction, effectively relieving server pressure and improving interaction control efficiency.
It should be noted that microphone arrays can be mounted in different regions of the display at the same time — the first region, second region, ..., twelfth region can coexist — so that voice data can be captured more effectively and accurately; the embodiments of this application do not specifically limit this.
In some embodiments, the control apparatus 100 may be a remote control, and communication between the remote control and the display device includes infrared protocol communication, Bluetooth protocol communication, or other wireless or wired methods to control the display device 200. The user can control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, and the like. In some embodiments, mobile terminals, tablet computers, computers, notebook computers, and other smart devices can also be used to control the display device 200.
In some embodiments, the display device 200 may receive the user's control through touch, gestures, voice input, or the like, rather than using the above smart device or control device to receive instructions.
In some embodiments, the display device 200 may also be controlled in ways other than the control apparatus 100 and the smart device 300: for example, the user's voice instruction control may be received directly by a voice-instruction acquisition module configured inside the display device 200, or by a voice control device provided outside the display device 200.
In some embodiments, the smart device 300 and a software application installed on the display device 200 can establish connection and communication through a network communication protocol, achieving one-to-one control operation and data communication. Audio and video content displayed on the smart device 300 can also be transmitted to the display device 200 for synchronized display. The display device 200 also communicates data with the server 400 through multiple communication methods, and may be allowed to connect via a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 can provide various content and interactions to the display device 200. The display device 200 may be a liquid crystal display, an OLED display, a projection display device, or the like. Besides the broadcast-receiving television function, the display device 200 can additionally provide a smart network TV function with computer support.
FIG. 2 is a configuration block diagram of the control apparatus 100 according to an exemplary embodiment. As shown in FIG. 2, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 can receive the user's input operation instructions and convert them into instructions that the display device 200 can recognize and respond to, acting as an intermediary for interaction between the user and the display device 200. The communication interface 130 is used for external communication and contains at least one of a WIFI chip, a Bluetooth module, NFC, or an alternative module. The user input/output interface 140 contains at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
FIG. 3 is a hardware configuration block diagram of the display device 200 according to an exemplary embodiment. As shown in FIG. 3, the display device 200 includes at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface (i.e., user input interface) 280. The controller 250 includes a central processing unit, a video processor, an audio processor, a graphics processor, RAM, ROM, and first through n-th interfaces for input/output. The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, or may be a projection apparatus and projection screen. The tuner-demodulator 210 receives broadcast television signals by wired or wireless reception and demodulates audio/video signals, such as EPG data signals, from multiple wireless or wired broadcast television signals. The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types; for example, the communicator may include at least one of a WiFi module, a Bluetooth module, a wired Ethernet module or other network communication protocol chips or near-field communication protocol chips, and an infrared receiver. The display device 200 can establish sending and receiving of control signals and data signals with the external control device 100 or the server 400 through the communicator 220. The detector 230 is used to collect signals from the external environment or from interaction with the outside. The controller 250 and the tuner-demodulator 210 may be located in different separate devices; that is, the tuner-demodulator 210 may also be in an external device of the main device where the controller 250 is located, such as an external set-top box. The user interface 280 can be used to receive control signals from the control apparatus 100 (e.g., an infrared remote control).
In some embodiments, the controller 250 controls the operation of the display device and responds to the user's operations through various software control programs stored in the memory. The controller 250 controls the overall operation of the display device 200. The user may input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the user input commands through the GUI; or the user may input user commands by inputting specific sounds or gestures, and the user input interface recognizes the sounds or gestures through sensors to receive the commands.
FIG. 4 is a schematic diagram of the software configuration in the display device 200 according to one or more embodiments of this application. As shown in FIG. 4, the system is divided into four layers, from top to bottom: the applications layer ("application layer"), the application framework layer ("framework layer"), the Android runtime and system library layer ("system runtime library layer"), and the kernel layer.
In some embodiments, at least one application runs in the application layer; these applications may be window programs, system settings programs, clock programs, or other programs that come with the operating system, or applications developed by third-party developers. In specific implementations, the applications in the application layer are not limited to the above examples.
In some embodiments, the system runtime library layer supports the layer above it, i.e., the framework layer; when the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime library layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is the layer between hardware and software and contains at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, and pressure sensor), power driver, and so on.
FIG. 5 is a schematic diagram of the icon control interface display of applications in the display device 200 according to one or more embodiments of this application. As shown in FIG. 5, the application layer contains at least one application that can display a corresponding icon control on the display, such as a live TV application icon control, a video-on-demand application icon control, a media center application icon control, an application center icon control, and a game application icon control. The live TV application can provide live television through different signal sources. The video-on-demand application can provide videos from different storage sources; unlike the live TV application, video on demand provides video display from certain storage sources. The media center application can provide playback of various multimedia content. The application center can store various applications.
In some embodiments, the display device above is a terminal device with a display function, such as a television set or a flat-panel TV. In this display device:
an output interface (the display 260 and/or the audio output interface 270) is configured to output user interaction information;
the communicator 220 is used to communicate with the server 400;
the controller 250 is configured to: receive a target control request from the user for the display device;
in response to the target control request, obtain target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information;
based on the semantic matching result, execute the control instruction corresponding to the target control request;
and the display 260 is configured to: display a response interface corresponding to execution of the control instruction.
In some embodiments, the controller 250 is specifically configured to:
perform semantic matching on the target control text based on the preset term information;
upon detecting that the target control text matches none of the terms included in the preset term information, perform semantic matching on the target control text based on the preset resource information to obtain a semantic matching result.
In some embodiments, the controller 250 is specifically configured to:
upon detecting that the target control text matches a candidate term included in the preset term information, determine that the control instruction corresponding to the target control request is a first control instruction corresponding to the candidate term;
execute the first control instruction corresponding to the candidate term.
In some embodiments, the controller 250 is specifically configured to:
upon detecting that the target control text matches a candidate resource included in the preset resource information, determine that the control instruction corresponding to the target control request is a second control instruction corresponding to the candidate resource;
execute the second control instruction corresponding to the candidate resource.
In some embodiments, the controller 250 is further configured to:
in response to a first information sending request, receive preset term information sent by the server, where the preset term information is determined by the server based on the user's historical control requests;
add the preset term information to the preset text library.
In some embodiments, the controller 250 is further configured to:
in response to a second information sending request, receive preset resource information sent by the server, where the preset resource information is determined by the server based on the user's historical control requests and the user's historical visit counts, and includes popular resources in at least two domains;
add the preset resource information to the preset text library.
In some embodiments, the controller 250 is specifically configured to:
upon detecting that the target control text matches none of the resources included in the preset resource information, send the target control text to the server so that the server performs semantic analysis on the target control text and determines a third control instruction corresponding to the target control text;
in response to the third control instruction corresponding to the target control text sent by the server, execute the third control instruction.
In summary, in the embodiments of this application, after receiving the user's target control request for the display device, the display device converts the request into text to obtain target control text, performs semantic matching on it based on the preset text library, and determines the semantic matching result, achieving device-side semantic understanding during user interaction; based on the semantic matching result, it executes the control instruction corresponding to the target control request and displays the response interface corresponding to execution of the control instruction. With this method, semantic matching is done on the device side based on the preset text library, avoiding the need to upload to the server, and thus consume server-side resources, on every user interaction; this effectively relieves server pressure and improves interaction control efficiency.
FIG. 6A is a system framework diagram for control according to one or more embodiments of this application. As shown in FIG. 6A, the system may include a control request receiving module 601, a semantic matching module 602, a control instruction execution module 603, and a response interface display module 604. The system receives the user's target control request for the display device through the control request receiving module 601; in response to the target control request, the semantic matching module 602 obtains the target control text and performs semantic matching on it based on the preset text library, which includes preset term information and preset resource information, to obtain a semantic matching result; the control instruction execution module 603 executes the control instruction corresponding to the target control request based on the semantic matching result; and the response interface display module 604 displays the response interface corresponding to execution of the control instruction. Thus, semantic matching can be performed on the device side based on the preset text library, avoiding uploading to the server on every user interaction, effectively relieving server pressure and improving interaction control efficiency.
FIG. 6B is an architecture diagram for control according to one or more embodiments of this application. Based on the system framework above, the implementation of the embodiments in the Android system is shown in FIG. 6B: the Android system mainly includes the application layer, framework layer, system runtime library layer, and kernel layer, and the implementation logic is mainly embodied in the application layer, which includes the control request receiving module, the semantic matching module, the control instruction execution module, and the response interface display module.
In the control method provided by the embodiments of this application, the user's voice behavior is detected in real time, the voice data issued by the user is received, and the voice data collected during one speaking period is taken as the user's target control request for the display device; text analysis is performed on the target control request to obtain the target control text. A preset text library may be stored in advance in the display device, containing preset term information describing the user's habitual phrasings and preset resource information describing currently popular resources. Similarity matching is performed on the target control text against the preset term information and preset resource information to determine the semantic matching result, so that, according to the semantic matching result, the corresponding control instruction is executed on the display device to control it, and the user interface corresponding to execution of the control instruction is displayed, letting the user know the control result. Semantic matching can thus be done on the device side without requiring the server's participation in every interaction, reducing the number of server accesses, avoiding the resource loss caused by many ineffective server accesses, lowering server pressure, and improving interaction efficiency.
The following description is given by way of example with reference to FIG. 7A. It will be appreciated that the steps involved in FIG. 7A may, in actual implementation, include more or fewer steps, and the order between them may also differ.
As shown in FIG. 7A, which is a schematic flowchart of a control method provided by an embodiment of this application, the control method specifically includes the following steps:
S710: receive a target control request from a user for the display device.
The display device may include smart devices capable of voice interaction; for example, it may include a smart TV, a smart air conditioner, a smart refrigerator, a smart speaker, and so on. Of course, the display device may also include other smart devices that support voice interaction, which the embodiments of this application do not specifically limit.
The display device can record the interactive voice issued by the user in real time and feed back to the user the interaction information responding to that voice.
When the display device is a smart TV, the user's interactive voice may be "turn the sound down"; correspondingly, the user's target control request to the smart TV is "turn the sound down", expressing the user's need to lower the volume. By collecting the user's voice data in real time, the smart TV can effectively determine the user's target control request.
The smart TV can collect in real time the voice data of users within its corresponding preset region; if no voice data is detected within a preset period, or the detected voice data is unclear, the preset region can be enlarged for continued voice recording, so that the user's voice data is collected effectively and accurately.
It should be noted that the preset region may be a region corresponding to the preset sound field range of the smart TV, or another preset region around the smart TV, which the embodiments of this application do not specifically limit.
When the display device is a smart air conditioner, the user's interactive voice may be "lower the temperature"; correspondingly, the target control request is "lower the temperature", expressing the user's need to reduce the temperature. By collecting the user's voice data in real time, the smart air conditioner can effectively determine the user's target control request.
The smart air conditioner can likewise collect the voice data of users within its preset region in real time, and enlarge the preset region for continued voice recording if no voice data, or only unclear voice data, is detected within a preset period, so that the user's voice data is collected effectively and accurately.
It should be noted that the preset region may be another preset region around the smart air conditioner, such as a designated region in front of it.
When the display device is a smart refrigerator, the user's interactive voice may be "open/turn on"; correspondingly, the target control request is "open/turn on", expressing the user's need to open the refrigerator. By collecting the user's voice data in real time, the smart refrigerator can effectively determine the user's target control request.
The smart refrigerator can likewise collect the voice data of users within its preset region in real time, and enlarge the preset region for continued voice recording if no voice data, or only unclear voice data, is detected within a preset period, so that the user's voice data is collected effectively and accurately.
It should be noted that the preset region may be another preset region around the smart refrigerator, such as a designated region in front of it, which the embodiments of this application do not specifically limit.
When the display device is a smart speaker, the user's interactive voice may be "turn up the playback volume"; correspondingly, the target control request is "turn up the playback volume", expressing the user's need to increase the playback volume. By collecting the user's voice data in real time, the smart speaker can effectively determine the user's target control request.
The smart speaker can likewise collect the voice data of users within its preset region in real time, and enlarge the preset region for continued voice recording if no voice data, or only unclear voice data, is detected within a preset period, so that the user's voice data is collected effectively and accurately.
It should be noted that the preset region may be a region corresponding to the preset sound field range of the smart speaker, or another preset region around it.
For the smart speaker, the preset region may be related to its current placement: for example, when the smart speaker is moved from a first position to a second position, the preset region may change from a first preset region corresponding to the first position to a second preset region corresponding to the second position.
S720: in response to the target control request, obtain target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result.
The display device can perform a text conversion operation on the received target control request, converting voice data into text data, which facilitates text matching by the display device.
It should be noted that the text conversion operation is the operation of converting voice data into text data; specifically, it can be implemented by a speech recognition algorithm or speech recognition software.
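As one possible way to realize this text conversion step, the sketch below uses the open-source speech_recognition package; the package choice, the Google recognizer, and the file name are assumptions — the embodiment only requires some speech recognition algorithm or software.

```python
# Minimal sketch of converting voice data to text before matching.
# The speech_recognition package and its Google engine are illustrative
# assumptions; any ASR component would serve the same purpose.
import speech_recognition as sr

def voice_to_text(wav_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    # Any recognizer works here; recognize_google is just convenient.
    return recognizer.recognize_google(audio, language="zh-CN")

target_control_text = voice_to_text("target_control_request.wav")
```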
The display device can perform semantic matching on the target control text on the device side based on the preset text library, thereby avoiding repeated accesses to the server side that occupy server resources.
The preset text library may include: preset term information and preset resource information, where the preset term information consists of the user's usual habitual phrasings, and the preset resource information consists of popular phrases on the network/online during the current period.
The preset term information may be the phrases the user habitually uses when interacting with the display device by voice. For voice interaction with a smart TV, the preset term information may include, but is not limited to: turn on the TV, switch on the TV, lower the volume, raise the volume, turn off the TV, etc.; and the preset resource information may be: song title A, TV series B, video C, etc.
For voice interaction with a smart air conditioner, the preset term information may include, but is not limited to: turn on the air conditioner, switch on the air conditioner, lower the temperature, raise the temperature, turn off the air conditioner, etc.; and the preset resource information may be: 26°, 16°, etc.
For voice interaction with a smart refrigerator, the preset term information may include, but is not limited to: open the refrigerator, turn on the refrigerator, lower the temperature, raise the temperature, turn off the refrigerator, etc.; and the preset resource information may be: silent mode, odor-removal mode, etc.
For voice interaction with a smart speaker, the preset term information may include, but is not limited to: turn on the speaker, switch on the speaker, turn the volume up, turn the volume down, turn off the speaker, etc.; and the preset resource information may be: song title D, pop music E, etc.
In addition, the preset text library may correspond to the user; that is, different users correspond to different preset term information and preset resource information. Different users can be identified by their voice characteristics, and a preset text library uniquely corresponding to each user can be set, so that when the user's voice data is collected, matching can be performed against the preset text library corresponding to that user.
When the user interacting with the display device is determined to be a first user, semantic matching can be performed on the target control text based on a first preset text library corresponding to the first user to obtain the semantic matching result; or, when it is a second user, semantic matching can be performed based on a second preset text library corresponding to the second user.
The display device can store preset text libraries corresponding to multiple users. Specifically, the number of preset text libraries the display device can store is related to its storage capacity: a display device with larger storage can store more preset text libraries, and one with smaller storage can store fewer.
It should be noted that the display device can update its stored preset text libraries according to storage time or user visit counts.
The display device can set an update period in advance (such as one week) and, for example, periodically clean up the stored preset text libraries every Monday, such as deleting libraries that have not been matched for a long time, making room for new preset text libraries.
Besides periodic updates, the display device can also update adaptively based on its storage capacity. For example, if the display device stores preset text libraries for ten users and detects the addition of an eleventh, it can delete one library according to usage count/usage time, making room for the new one and improving the practicality of the stored preset text libraries.
The display device can record each user's visit count. For example, if the first user's visit count is 2, the second user's is 10, and the third user's is 15, it can be determined that the first user is not a frequent user, and when the update time arrives, the first user's preset text library can be deleted, keeping the preset text libraries up to date and practical.
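A minimal sketch of how such per-user preset text libraries could be stored and pruned on the device is shown below; the class name, the capacity of ten libraries, and the visit threshold are illustrative assumptions drawn from the examples above.

```python
# Sketch of on-device storage for per-user preset text libraries with
# the update policies described above: weekly cleanup, capacity-based
# eviction, and pruning of infrequent users. Thresholds are illustrative.
import time
from collections import OrderedDict

MAX_LIBRARIES = 10   # bounded by device storage
MIN_VISITS = 3       # below this a user is not considered frequent

class PresetLibraryStore:
    def __init__(self):
        # user_id -> {"library", "last_used", "visits"}; order tracks LRU
        self._libs = OrderedDict()

    def get(self, user_id: str):
        entry = self._libs.get(user_id)
        if entry is None:
            return None
        entry["last_used"] = time.time()
        entry["visits"] += 1
        self._libs.move_to_end(user_id)          # mark most recently used
        return entry["library"]

    def add(self, user_id: str, library: dict):
        if user_id not in self._libs and len(self._libs) >= MAX_LIBRARIES:
            self._libs.popitem(last=False)       # evict least recently used
        self._libs[user_id] = {"library": library,
                               "last_used": time.time(), "visits": 0}
        self._libs.move_to_end(user_id)

    def weekly_cleanup(self):
        # e.g. run every Monday: drop libraries of infrequent users
        for uid in [u for u, e in self._libs.items()
                    if e["visits"] < MIN_VISITS]:
            del self._libs[uid]
```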
S730: based on the semantic matching result, execute the control instruction corresponding to the target control request.
The semantic matching result effectively reflects the execution content corresponding to the target control instruction issued by the user, so the control instruction corresponding to that execution content can be effectively determined and executed.
Following the examples above, when the display device is a smart TV and the semantic matching result determines that the execution content corresponding to the user's target control instruction is raising the volume, executing the control instruction corresponding to the target control request increases the smart TV's current playback volume.
When the display device is a smart air conditioner and the execution content determined from the semantic matching result is lowering the temperature, executing the control instruction corresponding to the target control request lowers the smart air conditioner's temperature.
When the display device is a smart refrigerator and the execution content determined from the semantic matching result is opening the refrigerator door, executing the control instruction corresponding to the target control request opens the smart refrigerator's door.
When the display device is a smart speaker and the execution content determined from the semantic matching result is raising the volume, executing the control instruction corresponding to the target control request increases the smart speaker's playback volume.
In some embodiments, the method further includes: displaying a response interface corresponding to execution of the control instruction.
The response interface may add new display content on top of the original interface, or may be a display interface obtained by changing the original interface; displaying the response interface corresponding to execution of the control instruction can be implemented in the following ways.
In some embodiments, when executing the control instruction corresponding to the target control request does not change the current display interface of the display device, the displayed response interface may be the current display interface with new display content added.
When the display device is a smart TV whose current display interface is playing display content 1, as exemplarily shown in FIG. 7B, and the control instruction corresponding to the target control request is to turn up the volume, the current display interface does not change when the instruction is executed; the corresponding response interface may display a volume-up indicator in a preset region of the current display interface, as exemplarily shown in FIG. 7C.
When the display device is a smart refrigerator whose current display interface shows a refrigerating-compartment temperature of 5℃, as exemplarily shown in FIG. 7D, and the control instruction corresponding to the target control request is to adjust the refrigerating-compartment temperature to 1℃, the current display interface does not change when the instruction is executed; the corresponding response interface may adjust the displayed temperature to 1℃ in the temperature display region of the current interface and may display an "adjusted" message, as exemplarily shown in FIG. 7E.
In other embodiments, when executing the control instruction corresponding to the target control request changes the current display interface of the display device, the displayed response interface may be another display interface obtained by changing the current one.
When the display device is a smart TV whose current interface shows episode 1 of TV series 1 with playback content being display content 2, as exemplarily shown in FIG. 7F, and the control instruction corresponding to the target control request is to play the next episode, executing the instruction changes the current display interface to another one: the response interface may be the display interface corresponding to the next episode, showing episode 2 of TV series 1 with playback content being display content 3, as exemplarily shown in FIG. 7G.
Based on the description of the above embodiments, when determining the semantic matching result, the target control text can be semantically matched in turn against the preset term information and then the preset resource information in the preset text library: first, all candidate terms included in the preset term information are matched against the target control text; if none of them matches successfully, all candidate resources included in the preset resource information can be matched against the target control text one by one, thereby determining the semantic matching result.
A matching threshold can be set to measure the matching degree between the target control text and each candidate term included in the preset term information. For example, the matching threshold can be set to 85%: if the matching degree between the target control text and a candidate term exceeds 85%, the match is deemed successful; if it is less than or equal to 85%, the match is deemed failed.
As exemplarily shown in FIG. 8A, the target control text is matched against five candidate terms included in the preset term information, with matching degrees of 25% with candidate term 1, 28% with candidate term 2, 40% with candidate term 3, 50% with candidate term 4, and 10% with candidate term 5.
When the target control text is determined to match one candidate term in the preset term information (matching degree above 85%), the control instruction corresponding to the target control request can be determined to be the same as the first control instruction corresponding to that candidate term, and the first control instruction can be executed to respond to the target control request.
As exemplarily shown in FIG. 8B, the target control text is matched against five candidate terms, with matching degrees of 25%, 90%, 40%, 50%, and 10% with candidate terms 1 through 5 respectively; the candidate term matching the target control text is therefore determined to be candidate term 2.
It should be noted that, if at least two candidate terms have matching degrees above 85% with the target control text, the candidate term with the highest matching degree can be selected as the one successfully matched.
As exemplarily shown in FIG. 8C, the target control text is matched against five candidate terms, with matching degrees of 25%, 90%, 92%, 50%, and 10% with candidate terms 1 through 5 respectively; the candidate term matching the target control text is therefore determined to be candidate term 3.
When matching against the preset resource information, all candidate resources in the preset resource information can be matched against the target control text; if among them there is a candidate resource that matches the target control text successfully, the control instruction corresponding to the target control request can be determined to be the same as the second control instruction corresponding to that candidate resource, and the second control instruction can be executed to respond to the target control request.
A matching threshold can likewise be set to measure the matching degree between the target control text and each candidate resource included in the preset resource information; for example, the threshold can be set to 85%, with a matching degree above 85% deemed a successful match and one at or below 85% deemed a failed match.
As exemplarily shown in FIG. 8D, the target control text is matched against three candidate resources, with matching degrees of 40%, 93%, and 20% with candidate resources 1 through 3 respectively; the candidate resource matching the target control text is therefore determined to be candidate resource 2.
It should be noted that, if at least two candidate resources have matching degrees above 85% with the target control text, the candidate resource with the highest matching degree can be selected as the one successfully matched.
As exemplarily shown in FIG. 8E, the target control text is matched against three candidate resources, with matching degrees of 90%, 86%, and 20% with candidate resources 1 through 3 respectively; the candidate resource matching the target control text is therefore determined to be candidate resource 1.
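The matching cascade described above — candidate terms first, then candidate resources, otherwise fall back to the server — can be sketched as follows. difflib's SequenceMatcher stands in for whatever similarity measure the device actually uses; the 85% threshold and the rule of taking the highest-scoring candidate follow the description, while all names and sample data are hypothetical.

```python
# Sketch of the device-side matching cascade: preset terms, then preset
# resources, else send the text to the server for semantic analysis.
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85   # "matching degree above 85%" per the description

def best_match(text: str, candidates: list[str]):
    scored = [(SequenceMatcher(None, text, c).ratio(), c) for c in candidates]
    score, cand = max(scored, default=(0.0, None))  # highest degree wins ties
    return cand if score > MATCH_THRESHOLD else None

def match_control_text(text, preset_terms, preset_resources):
    term = best_match(text, list(preset_terms))
    if term:
        return ("first_instruction", preset_terms[term])
    resource = best_match(text, list(preset_resources))
    if resource:
        return ("second_instruction", preset_resources[resource])
    return ("send_to_server", text)  # server returns a third instruction

preset_terms = {"turn the volume up": "VOLUME_UP"}
preset_resources = {"song a by singer 1": "PLAY_MEDIA"}
# A near-identical habitual phrasing clears the 85% bar on the device;
# a novel phrasing would fall through to the server instead.
print(match_control_text("turn the volume up.", preset_terms, preset_resources))
# ('first_instruction', 'VOLUME_UP')
```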
The server can determine the preset text library in advance based on the user's historical access data and send it to the display device for storage.
The server can send a first information sending request to the display device according to the update frequency of the preset term information, so that the display device receives the preset term information sent by the server to build/update the preset text library.
Thus, when interacting with the user, the display device can understand the user's intent on the device side based on the preset term information and feed back to the user quickly.
When determining the preset term information, the server can compile the user's habitual phrasings from the user's historical voice access data; the presentation form of the user access history data is shown in Table 1 below.
Table 1: User access history data
Result A is: {"msg":"","slots":[{"name":"singer","value":"歌手1"},{"name":"song","value":"歌曲a"}],"code":0,"session_complete":true,"domain":"music","skill_id":"990835315751129088","intent":"play"}.
The user's high-frequency phrasings can be obtained through a grouped query. To improve accuracy, the grouping conditions can be: total user visits >= 1000 and a given phrasing occurring >= 30 times, taking the top 20 habitual phrasings for each user.
The hash value of this user ID is obtained and then taken modulo 7 (assuming data is pushed between 1 and 6 a.m.) to obtain the classification value (slot), and the data is saved in the corresponding Table 2, as shown below.
Table 2: Classification values of user access history data
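A rough sketch of this server-side preparation step is given below, assuming the access history is already aggregated per user. The thresholds (total visits >= 1000, phrasing count >= 30, top 20 per user) follow the description, while the md5-based hashing is only one stable way to realize hash(user ID) mod 7.

```python
# Sketch of compiling habitual phrasings and assigning push slots.
# Data shapes and function names are illustrative assumptions.
import hashlib

def frequent_phrases(history: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """history: user_id -> {phrasing: count}; returns top phrasings per user."""
    result = {}
    for user_id, counts in history.items():
        if sum(counts.values()) < 1000:          # total visits >= 1000
            continue
        hot = [p for p, n in counts.items() if n >= 30]
        hot.sort(key=counts.get, reverse=True)
        result[user_id] = hot[:20]               # top 20 per user
    return result

def push_slot(user_id: str) -> int:
    # A stable hash is needed (Python's built-in hash() is salted per
    # process); modulo 7 yields the slot used for the 1-6 a.m. push window.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % 7
```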
The server can send a second information sending request to the display device according to the update frequency of the preset resource information, so that the display device receives the preset resource information sent by the server to build/update the preset text library.
Thus, when interacting with the user, the display device can understand the user's intent on the device side based on the preset resources and feed back quickly.
When determining the preset resource information, the server can obtain hot media-asset information in the corresponding domains according to the user portrait. A user portrait characterizes a user's or product's feature attributes by collecting data across dimensions such as social attributes, consumption habits, and preference features, and mines potential value information through statistical analysis of these features, thereby abstracting a full picture of a user. The details are as follows.
Creating popular resources for each domain: popular resources for each domain are obtained from all users' search results and the current trending search lists, as shown in Table 3 below.
Table 3: Popular resources
Result B is: {"msg":"","slots":[{"name":"singer","value":"歌手1"},{"name":"song","value":"电影"}],"code":0,"session_complete":true,"domain":"music","skill_id":"990836308639354880","intent":"play"}.
Creating user portraits from the users' historical voice access data: the portrait tags of a given user are ranked by visit counts, as in Table 4 below.
Table 4: Portrait tag ranking
Obtaining popular resources: hot media-asset results in the corresponding domain are obtained according to tag type; the hash value of the user identifier is obtained and taken modulo 7 (assuming data is pushed between 1 and 6 a.m.) to obtain the slot value, and the data is saved in the corresponding Table 5.
Table 5: Data storage
After determining the preset term information and preset resource information associated with the user, the server can periodically send them to the display device, making it easy for the display device to store/update them.
The user's habitual phrasings (preset term information) vary little and are updated infrequently, so they can be set to update once a month; network hot-word media-asset information (preset resource information) updates more frequently and can be set to update once a day. For example, the server can push data according to the slot values in Table 2 on each full hour between 1 and 6 a.m. on the 1st of each month — e.g., pushing slot-1 data between 1 and 2 a.m., and so on; similarly for Table 5, data is pushed by slot value between 1 and 6 a.m. every day.
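The slot-based schedule could look like the following sketch; the mapping of the 1-6 a.m. hours onto slot values mirrors the example ("slot-1 data between 1 and 2 a.m."), and how slot 0 is scheduled is not specified in the description.

```python
# Sketch of the push scheduler: habitual phrasings monthly on the 1st,
# hot-resource info daily, one slot per hour between 1 and 6 a.m.
from datetime import datetime

def slots_to_push(now: datetime) -> dict:
    if not 1 <= now.hour <= 6:
        return {"terms": None, "resources": None}
    slot = now.hour                     # e.g. between 1 and 2 a.m. -> slot 1
    return {
        "terms": slot if now.day == 1 else None,  # monthly, on the 1st
        "resources": slot,                        # daily
    }

print(slots_to_push(datetime(2022, 7, 1, 1, 30)))
# {'terms': 1, 'resources': 1}
```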
In addition, as exemplarily shown in FIG. 8F, the interaction between the display device and the server can adopt the lightweight MQTT (Message Queuing Telemetry Transport) publish/subscribe messaging protocol: a corresponding topic is established between each display device and the server, QoS 1 quality of service (a protocol between the data sender and the data receiver) is used, and the expiry time is set to 120 seconds. The server side sends the network hot-word media-asset information and the user habitual phrasing information to the corresponding topics according to the procedure above; if the corresponding message is consumed successfully, the corresponding data flag in the database is set to 1.
For special scenarios — for example, the user has not powered the device on for a long time — after power-on the server only pushes the current month's user habitual phrasings and the current day's network hot-word media-asset information; if the user fails to read a message, the server sends it again, so that it reaches the display device at least once.
MQTT is a machine-to-machine/IoT connectivity protocol designed as an extremely lightweight publish/subscribe messaging transport. It is useful for remote connections that require a small code footprint and/or where network bandwidth is at a premium, and it is designed for constrained devices and low-bandwidth, high-latency, or unreliable networks. These principles also make the protocol ideal for connected devices in the emerging machine-to-machine or IoT world, and for mobile applications where bandwidth and battery power are at a premium; it has the advantages of small size, low power consumption, and minimal data packets, and can efficiently distribute information to one or more receivers.
It should be noted that, because MQTT here uses the QoS 1 transport level, the display device may receive duplicate messages; for duplicates, the display device can store messages using an overwrite strategy.
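On the device side, the MQTT exchange with overwrite-on-duplicate storage might be sketched as below, written against the paho-mqtt 1.x callback API (2.x additionally requires a CallbackAPIVersion argument). The broker host, topic layout, and payload fields are assumptions, and the 120-second expiry — an MQTT v5 message property — is not shown in this sketch.

```python
# Sketch of the device side: per-device topic, QoS 1 (at-least-once,
# so duplicates are possible), and idempotent overwrite-on-duplicate
# storage keyed by message kind.
import json
import paho.mqtt.client as mqtt

DEVICE_ID = "display-device-001"
TOPIC = f"preset/{DEVICE_ID}"       # one topic per device/server pair

store = {}                           # message kind -> latest payload

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC, qos=1)   # QoS 1: delivered at least once

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    # QoS 1 may redeliver: overwriting by key keeps handling idempotent.
    store[payload["kind"]] = payload  # e.g. "terms" or "resources"

client = mqtt.Client()               # paho-mqtt 1.x style constructor
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883, keepalive=60)
client.loop_forever()
```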
In summary, by executing the above control method on the display device, after receiving the user's target control request for the display device, the display device converts the request into text to obtain target control text, performs semantic matching on it based on the preset text library, and determines the semantic matching result, achieving device-side semantic understanding during user interaction; based on the semantic matching result, it executes the control instruction corresponding to the target control request and displays the response interface corresponding to execution of the control instruction. With this method, semantic matching is done on the device side based on the preset text library, avoiding the need to upload to the server, and thus consume server-side resources, on every user interaction; this effectively relieves server pressure and improves interaction control efficiency.
In the embodiments above, the display device has been described by taking voice interaction on a television or the like as an example; in other embodiments of this application, the display device may also be a display device on a smart refrigerator or another home appliance.
As shown in FIG. 9, which is a schematic diagram of an application scenario of the semantic understanding process of a display device provided by an embodiment of this application, the user can operate the display device 200 through the control apparatus 100 or the terminal device 300. The semantic understanding process can be used in voice interaction scenarios between users and smart homes. For example, the display device 200 in this scenario may be a smart refrigerator, a smart washing machine, or another smart device with a smart display function. When the user wants to control a smart device in this scenario, a voice instruction needs to be issued first; on receiving the voice instruction, the smart device performs semantic understanding on it and determines the corresponding semantic understanding result, so that it can subsequently display content according to the result or execute the corresponding control instruction, meeting the user's needs and improving the user experience.
In some embodiments, a terminal device 300 (such as a mobile terminal, a tablet computer, or a notebook computer) can also be used to control the display device 200, for example through an application running on the terminal device 300. The terminal device 300 and the display device 200 can install software applications and establish connection and communication through a network communication protocol, achieving one-to-one control operation and data communication. The semantic understanding content displayed on the terminal device 300 can also be transmitted to the display device 200 for synchronized display.
In some embodiments, the display device 200 can also display content according to the user's voice instructions. For example, if the user's voice instruction is "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator", the display device 200 displays the corresponding content after semantic understanding.
In some embodiments, the display device 200 may receive the user's control through touch, gestures, or the like instead of using the terminal device 300 described above. For example, when the display device 200 is a smart refrigerator and the user moves its upper door from the open state to the closed state, the display device 200 displays the ingredients put in before the door was closed together with the attribute identifiers corresponding to each ingredient.
The semantic understanding method provided by the embodiments of this application can be implemented based on a computer device, or on functional modules or functional entities in a computer device.
The computer device may be a personal computer (PC), a server, a mobile phone, a tablet computer, a notebook computer, a mainframe computer, etc., which the embodiments of this application do not specifically limit.
For a more detailed explanation, the description below proceeds by way of example with reference to FIG. 10A, FIG. 11A, and FIG. 12A. It will be appreciated that, although the steps in these flowcharts are displayed in sequence as indicated by the arrows, they are not necessarily executed in that order; unless explicitly stated herein, there is no strict restriction on their order, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 10A, FIG. 11A, and FIG. 12A may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be executed at different times; nor is their execution order necessarily sequential — they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps — as long as the semantic understanding method provided in the embodiments of this application can be realized. Semantic understanding uses a series of AI (artificial intelligence) algorithms to parse text into structured, machine-readable intent and slot information, making it easier for Internet developers to better understand and meet user needs. For example, in the embodiments of this application, the semantic understanding scenario is a smart refrigerator performing semantic understanding on the user's utterances about multiple ingredients with multiple or missing attributes, and displaying the semantic understanding content on the smart refrigerator's display device.
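For illustration, the structured output of semantic understanding — intent plus slot information, matching the JSON shown in the tables above — might be modeled as follows; the dataclass names are hypothetical.

```python
# Sketch of the machine-readable intent/slot structure produced by
# semantic understanding; field names mirror the JSON results above.
from dataclasses import dataclass, field

@dataclass
class Slot:
    name: str    # e.g. "singer"
    value: str   # e.g. "歌手1"

@dataclass
class SemanticResult:
    domain: str                    # e.g. "music"
    intent: str                    # e.g. "play"
    slots: list[Slot] = field(default_factory=list)

result = SemanticResult(domain="music", intent="play",
                        slots=[Slot("singer", "歌手1"), Slot("song", "歌曲a")])
```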
As shown in FIG. 10A, the method specifically includes the following steps:
S51: obtain word segmentation information corresponding to the voice ingredient data, the word segmentation information including: at least one ingredient word segment and attribute identifiers corresponding to the ingredient word segments.
Specifically, for voice ingredient data in the refrigerator ingredient-management category, the steps for obtaining the corresponding word segmentation information can be as follows: first, remove prepositions, adverbs, and other words without attributes specific to refrigerator ingredient management; then, obtain all recognizable ingredient names and mark them uniformly by ingredient type; finally, mark the other attributes, such as quantifiers, along separate dimensions.
Exemplarily, taking the voice ingredient data "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, with reference to FIG. 10B, the word segmentation information obtained includes: two jin, shelf life, 3 days, beef, pork, three jin.
S52: determine the target word segment among the at least one ingredient word segment based on the attribute identifiers.
The ingredient word segments may be mango, apple, beef, pork, mutton, etc., and the attribute identifiers corresponding to ingredient word segments may be fruit, meat, vegetables, weight, etc.; no specific limitation is placed on the ingredient word segments and attribute identifiers here.
Exemplarily, in "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator", the attribute identifiers are "two jin" and "three jin"; the ingredient word segments thus determined are beef and pork, and the corresponding target word segments are beef and pork.
It should be noted that, after preprocessing the user's voice data and obtaining the corresponding word segmentation information, whether there is a subject (an ingredient-type word) is determined according to the attribute identifiers. If there is a subject, the next step S53 is executed; if there is no subject, the display device can output a prompt such as "I don't understand this sentence" indicating that the keywords of the user's voice data cannot be recognized.
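The subject check can be sketched as below; the attribute set and function names are illustrative assumptions.

```python
# Sketch of the subject check after preprocessing: without a food-type
# segment, the device falls back to an "I don't understand" prompt.
FOOD_ATTRS = {"meat", "fruit", "vegetable"}

def find_subjects(segments: list[tuple[str, str]]) -> list[str]:
    return [seg for seg, attr in segments if attr in FOOD_ATTRS]

def respond(segments):
    subjects = find_subjects(segments)
    if not subjects:
        return "I don't understand this sentence"  # fallback prompt
    return subjects  # e.g. ['beef', 'pork']; the first is the target
```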
S53: determine the semantic understanding content corresponding to the at least one ingredient word segment based on the sequence relationship between the target word segment and the other word segments among the at least one ingredient word segment.
Exemplarily, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, the semantic understanding content determined for the at least one ingredient word segment is: beef, two jin, shelf life 3 days; pork, three jin.
S54: display the semantic understanding content corresponding to the at least one ingredient word segment.
Exemplarily, the semantic understanding content displayed for the at least one ingredient word segment is: beef, two jin, shelf life 3 days; pork, three jin.
In the embodiments of this application, the display device first obtains the word segmentation information corresponding to the voice ingredient data, the word segmentation information including at least one ingredient word segment and the corresponding attribute identifiers; it then determines the target word segment among the ingredient word segments based on the attribute identifiers, next determines the semantic understanding content corresponding to the ingredient word segments based on the sequence relationship between the target word segment and the other word segments, and finally displays that semantic understanding content. By obtaining at least one ingredient word segment and its attribute identifiers from the voice ingredient data, multiple ingredients, their attributes, and their pairings can be understood, which improves the display device's ability to understand inverted and out-of-order phrasings in ingredient management, enables accurate display, and improves the user experience.
FIG. 11A is a schematic flowchart of another semantic understanding method provided by an embodiment of this application. This embodiment is a further expansion and optimization on the basis of FIG. 10A. Optionally, this embodiment mainly explains the process of step S53 (determining the semantic understanding content corresponding to the at least one ingredient word segment based on the sequence relationship between the target word segment and the other word segments among the at least one ingredient word segment).
S531: determine phrase information based on the attribute identifier of the first traversal word segment among the at least one ingredient word segment and the sequence relationship between the first traversal word segment and the target word segment.
The target word segment is the first subject word segment that is ordered first among the at least one ingredient word segment. The phrase information includes at least one group of phrases; in this embodiment, the phrase information represents the combination of the phrase corresponding to the first subject word segment and the phrase corresponding to the second subject word segment.
S532: determine the semantic understanding content corresponding to the at least one ingredient word segment based on the phrase information.
Exemplarily, with reference to FIG. 11B, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example: the leftmost subject in the word segment list is obtained — that is, the first subject word segment is "beef" — and the list is then traversed from the left with the subject as the reference. Since the target word segment is the first subject word segment ordered first among the ingredient word segments, in this embodiment the target word segment is beef. The phrase corresponding to the first subject word segment is "two jin, shelf life 3 days", the second subject word segment is "pork", and the phrase corresponding to the second subject word segment is "three jin". Based on the phrase information, the semantic understanding content determined for the ingredient word segments is: beef, two jin, shelf life 3 days; pork, three jin.
FIG. 12A is a schematic flowchart of yet another semantic understanding method provided by an embodiment of this application. This embodiment is a further expansion and optimization on the basis of FIG. 11A. Optionally, this embodiment mainly explains the process of step S531 (determining phrase information based on the attribute identifier of the first traversal word segment among the at least one ingredient word segment and the sequence relationship between the first traversal word segment and the target word segment).
S5311: determine, based on the attribute identifier of the first traversal word segment among the at least one ingredient word segment, that the first traversal word segment is not a second subject word segment, and determine that no word segment has the same attribute identifier as the first traversal word segment.
Exemplarily, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, the word segmentation information obtained includes: "two jin", "shelf life", "3 days", "beef", "pork", "three jin". The first traversal word segment is "two jin"; there may be multiple second subject word segments, and in the embodiments of this application the second subject word segment is pork.
S5312: determine phrase information based on the sequence relationship between the first traversal word segment and the target word segment.
Exemplarily, the first traversal word segment is "two jin", the target word segment is "beef", and the phrase information is "shelf life 3 days".
This embodiment is a further expansion and optimization on the basis of the above embodiments. Optionally, this embodiment mainly explains the process of step S5312 (determining phrase information based on the sequence relationship between the first traversal word segment and the target word segment).
A: determine that the sequence relationship between the first traversal word segment and the target word segment belongs to a preset relationship, and determine that an object word exists between the first traversal word segment and the target word segment.
Exemplarily, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, the word segmentation information obtained includes: "two jin", "shelf life", "3 days", "beef", "pork", "three jin". The first traversal word segment is "two jin" and the target word segment is "beef"; their sequence relationship belongs to the preset relationship, and an object word exists between them — the object word is "shelf life".
B: form a first phrase from the object word and the word segment corresponding to the object word, and determine phrase information based on the first phrase and the other word segments between the first traversal word segment and the target word segment.
Exemplarily, the object word is "shelf life" and the word segment corresponding to it is "3 days"; the first phrase is composed of the object word and its corresponding word segment, i.e., "shelf life 3 days".
Optionally, step B can be implemented as follows:
B-1: determine, based on the attribute identifier of the second traversal word segment among the at least one ingredient word segment, that the second traversal word segment is a second subject word segment, and form a second phrase from the first phrase and the other word segments between the target word segment and the first traversal word segment.
Exemplarily, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, the word segmentation information obtained includes: "two jin", "shelf life", "3 days", "beef", "pork", "three jin". The second traversal word segment is "pork" and its attribute identifier is "three jin", so the second traversal word segment is determined to be a second subject word segment; the target word segment is "beef", the first traversal word segment is "two jin", and the other word segments include: the first phrase "shelf life 3 days" and the first subject word segment "beef".
The individual word segments included among the other word segments may be the same as the individual word segments split out of the first phrase (the first phrase may be a compound). For example, if the other word segments include "shelf life 3 days" and the first phrase splits into "shelf life + 3 days", the second phrase formed is: "beef + two jin + shelf life 3 days".
Alternatively, the individual word segments included among the other word segments may differ from those split out of the first phrase (the first phrase may be a compound). For example, the other word segments may include "shelf life + 3 days + quality"; the first phrase splits into "shelf life + 3 days", and the second phrase formed is "beef + two jin + shelf life 3 days + quality". In general, beef is graded by quality (represented by fat marbling) and physiological maturity (age) into eight grades: Prime, Choice, Select, Standard, Commercial, Utility, Cutter, and Canner. In this embodiment, if the other word segments are "shelf life + 3 days + Prime", the first phrase splits into "shelf life + 3 days" and the second phrase formed is "beef + two jin + shelf life 3 days + Prime".
B-2: determine that the second traversal word segment is the last second subject word segment not yet traversed, and form a third phrase from the second traversal word segment and the remaining untraversed word segments among the at least one ingredient word segment.
Exemplarily, taking "Can you help me put two jin of beef with a shelf life of 3 days and three jin of pork in the refrigerator" as an example, the word segmentation information obtained includes: "two jin", "shelf life", "3 days", "beef", "pork", "three jin". The second traversal word segment is "pork", the remaining untraversed word segment is "three jin", and the third phrase formed is "pork + three jin".
B-3: determine phrase information based on the sequence relationship between the second phrase and the third phrase.
Optionally, step B-3 can be implemented as follows:
①: determine candidate information based on the sequence relationship between the second phrase and the third phrase.
Exemplarily, the second phrase is "beef + two jin + shelf life 3 days" and the third phrase is "pork + three jin"; the candidate information determined is "beef", "two jin", "shelf life", "3 days", "pork", "three jin".
②: add verb word segments to the candidate information to obtain the phrase information.
Exemplarily, as shown in FIG. 12B, one verb word segment is added for the subject word segment included in each phrase; the verb word segment may be "add" or "put". For example, if the added verb is "add", the phrase information obtained is "beef", "add", "two jin", "shelf life", "3 days"; "pork", "add", "three jin".
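Putting steps S531 through B-3 together, the grouping traversal might be sketched as follows; it assumes distinct subject names and the segment order of the example, and is an illustration of the described procedure rather than the patent's code.

```python
# Sketch of the grouping traversal: walk the segments from the left,
# attach pending attributes and object-word phrases (e.g. "shelf life"
# paired with "3 days") to the current subject, and prepend a verb.
SUBJECT_ATTRS = {"meat", "fruit", "vegetable"}

def group_phrases(segments, verb="add"):
    subjects = [s for s, a in segments if a in SUBJECT_ATTRS]
    if not subjects:
        return []
    groups = {s: [s, verb] for s in subjects}   # assumes distinct subjects
    current = subjects[0]                       # the target word segment
    pending = []                                # segments awaiting a subject
    for seg, attr in segments:
        if attr in SUBJECT_ATTRS:
            current = seg
            groups[current].extend(pending)     # e.g. "two jin" -> "beef"
            pending = []
        else:
            pending.append(seg)                 # object words stay paired
    groups[current].extend(pending)             # trailing, e.g. "three jin"
    return list(groups.values())

segments = [("two jin", "weight"), ("shelf life", "object"),
            ("3 days", "duration"), ("beef", "meat"),
            ("pork", "meat"), ("three jin", "weight")]
print(group_phrases(segments))
# [['beef', 'add', 'two jin', 'shelf life', '3 days'],
#  ['pork', 'add', 'three jin']]
```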
In some embodiments, step S54 (displaying the semantic understanding content corresponding to the at least one ingredient word segment) can be implemented as follows: in response to a preset operation on the display device, display the semantic understanding content corresponding to the at least one ingredient word segment; or, when displaying that semantic understanding content, hide the verb word segments included in it.
The preset operation may be closing the refrigerator door, triggering a preset gesture/button, or the like, to evoke the display interface. In addition, after the display interface has been evoked in response to the preset operation, the refrigerator no longer displays the interface repeatedly once it detects closure, avoiding multiple displays and improving the user experience.
Exemplarily, in response to a preset operation on the display device, one display mode — see FIG. 12C — displays the semantic understanding content corresponding to the ingredient word segments including the verb word segment corresponding to each subject; another display mode — see FIG. 12D — hides the verb word segments included in the semantic understanding content when displaying it.
In addition, when the ingredients stored in the refrigerator change, parameters such as ingredient type and attribute identifiers can be modified accordingly. For example, the user stores "two jin of beef with a shelf life of three days and three jin of pork" in the morning, takes out one jin of beef and one jin of pork at noon for lunch, and then says to the refrigerator, "Change the beef to one jin with a shelf life of three days, and the pork to two jin"; with reference to FIG. 12E, the refrigerator's display device displays the modified semantic understanding content corresponding to the at least one ingredient word segment.
In the embodiments of this application, the display device first obtains the word segmentation information corresponding to the voice ingredient data, the word segmentation information including at least one ingredient word segment and the corresponding attribute identifiers; it then determines the target word segment among the ingredient word segments based on the attribute identifiers, next determines the semantic understanding content corresponding to the ingredient word segments based on the sequence relationship between the target word segment and the other word segments, and finally displays that semantic understanding content. By obtaining at least one ingredient word segment and its attribute identifiers from the voice ingredient data, multiple ingredients, their attributes, and their pairings can be understood, which improves the display device's ability to understand inverted and out-of-order phrasings in ingredient management, enables accurate display, and improves the user experience.
An embodiment of this application provides a computer device, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the semantic understanding methods described in the embodiments of this application.
For convenience of explanation, the above description has been made with reference to specific embodiments. However, the discussion in some of the embodiments above is not intended to be exhaustive or to limit the implementations to the specific forms disclosed; many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to better explain the principles and practical applications, so that those skilled in the art can better use the embodiments, as well as various modified embodiments suited to the particular use contemplated.

Claims (16)

  1. A display device, comprising:
    a display configured to display images and/or a user interface;
    a controller configured to: receive a target control request from a user for the display device;
    in response to the target control request, obtain target control text, and perform semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information;
    based on the semantic matching result, execute a control instruction corresponding to the target control request.
  2. The display device according to claim 1, wherein the controller is specifically configured to:
    perform semantic matching on the target control text based on the preset term information;
    upon detecting that the target control text matches none of the terms included in the preset term information, perform semantic matching on the target control text based on the preset resource information to obtain the semantic matching result.
  3. The display device according to claim 2, wherein the controller is specifically configured to:
    upon detecting that the target control text matches a candidate term included in the preset term information, determine that the control instruction corresponding to the target control request is a first control instruction corresponding to the candidate term;
    execute the first control instruction corresponding to the candidate term.
  4. The display device according to claim 2, wherein the controller is specifically configured to:
    upon detecting that the target control text matches a candidate resource included in the preset resource information, determine that the control instruction corresponding to the target control request is a second control instruction corresponding to the candidate resource;
    execute the second control instruction corresponding to the candidate resource.
  5. The display device according to claim 1, wherein the controller is further configured to:
    in response to a first information sending request, receive preset term information sent by a server, wherein the preset term information is determined by the server based on the user's historical control requests;
    add the preset term information to the preset text library.
  6. The display device according to claim 1, wherein the controller is further configured to:
    in response to a second information sending request, receive preset resource information sent by a server, wherein the preset resource information is determined by the server based on the user's historical control requests and the user's historical visit counts, and the preset resource information includes popular resources in at least two domains;
    add the preset resource information to the preset text library.
  7. The display device according to claim 2, wherein the controller is further configured to:
    upon detecting that the target control text matches none of the resources included in the preset resource information, send the target control text to a server so that the server performs semantic analysis on the target control text and determines a third control instruction corresponding to the target control text;
    in response to the third control instruction corresponding to the target control text sent by the server, execute the third control instruction.
  8. The display device according to claim 1, wherein the controller is further configured to: obtain word segmentation information corresponding to voice ingredient data, the word segmentation information including: at least one ingredient word segment and attribute identifiers corresponding to the ingredient word segments;
    determine a target word segment among the at least one ingredient word segment based on the attribute identifiers;
    determine semantic understanding content corresponding to the at least one ingredient word segment based on a sequence relationship between the target word segment and other word segments among the at least one ingredient word segment.
  9. The display device according to claim 8, wherein the target word segment is a first subject word segment ordered first among the at least one ingredient word segment;
    the controller is further configured to: determine phrase information based on an attribute identifier of a first traversal word segment among the at least one ingredient word segment and a sequence relationship between the first traversal word segment and the target word segment; and determine the semantic understanding content corresponding to the at least one ingredient word segment based on the phrase information.
  10. The display device according to claim 9, wherein the controller is further configured to: determine, based on the attribute identifier of the first traversal word segment among the at least one ingredient word segment, that the first traversal word segment is not a second subject word segment, and determine that no word segment has the same attribute identifier as the first traversal word segment; and determine the phrase information based on the sequence relationship between the first traversal word segment and the target word segment.
  11. The display device according to claim 10, wherein the controller is further configured to: determine that the sequence relationship between the first traversal word segment and the target word segment belongs to a preset relationship, and determine that an object word exists between the first traversal word segment and the target word segment; form a first phrase from the object word and a word segment corresponding to the object word, and determine the phrase information based on the first phrase and other word segments between the first traversal word segment and the target word segment.
  12. The display device according to claim 11, wherein the controller is further configured to: determine, based on an attribute identifier of a second traversal word segment among the at least one ingredient word segment, that the second traversal word segment is a second subject word segment, and form a second phrase from the first phrase and the other word segments between the target word segment and the first traversal word segment; determine that the second traversal word segment is the last second subject word segment not yet traversed, and form a third phrase from the second traversal word segment and remaining untraversed word segments among the at least one ingredient word segment;
    determine the phrase information based on a sequence relationship between the second phrase and the third phrase.
  13. The display device according to claim 12, wherein the controller is further configured to: determine candidate information based on the sequence relationship between the second phrase and the third phrase; and add verb word segments to the candidate information to obtain the phrase information.
  14. The display device according to claim 13, wherein the display is further configured to: in response to a preset operation on the display device, display the semantic understanding content corresponding to the at least one ingredient word segment, or, when displaying the semantic understanding content corresponding to the at least one ingredient word segment, hide verb word segments included in the semantic understanding content.
  15. A control method for a display device, comprising:
    receiving a target control request from a user for the display device;
    in response to the target control request, obtaining target control text, and performing semantic matching on the target control text based on a preset text library to obtain a semantic matching result, the preset text library including: preset term information and preset resource information;
    based on the semantic matching result, executing a control instruction corresponding to the target control request;
    displaying a response interface corresponding to execution of the control instruction.
  16. The method according to claim 15, further comprising:
    obtaining word segmentation information corresponding to voice ingredient data, the word segmentation information including: at least one ingredient word segment and attribute identifiers corresponding to the ingredient word segments;
    determining a target word segment among the at least one ingredient word segment based on the attribute identifiers;
    determining semantic understanding content corresponding to the at least one ingredient word segment based on a sequence relationship between the target word segment and other word segments among the at least one ingredient word segment;
    displaying the semantic understanding content corresponding to the at least one ingredient word segment.
PCT/CN2023/078157 2022-06-22 2023-02-24 显示设备和控制方法 WO2023246151A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210713044.3 2022-06-22
CN202210713044.3A CN115240665A (zh) 2022-06-22 2022-06-22 显示设备、控制方法和存储介质
CN202210768515.0A CN115270808A (zh) 2022-06-30 2022-06-30 显示设备和语义理解方法
CN202210768515.0 2022-06-30

Publications (2)

Publication Number Publication Date
WO2023246151A1 WO2023246151A1 (zh) 2023-12-28
WO2023246151A9 true WO2023246151A9 (zh) 2024-01-25

Family

ID=89379109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078157 WO2023246151A1 (zh) 2022-06-22 2023-02-24 显示设备和控制方法

Country Status (1)

Country Link
WO (1) WO2023246151A1 (zh)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903621A (zh) * 2012-12-26 2014-07-02 联想(北京)有限公司 一种语音识别的方法及电子设备
CN105512182B (zh) * 2015-11-25 2019-03-12 深圳Tcl数字技术有限公司 语音控制方法及智能电视
CN105764185B (zh) * 2016-03-18 2017-12-12 深圳Tcl数字技术有限公司 交流驱动混合调光电路和电视机
CN108399919A (zh) * 2017-02-06 2018-08-14 中兴通讯股份有限公司 一种语义识别方法和装置
CN107220292A (zh) * 2017-04-25 2017-09-29 上海庆科信息技术有限公司 智能对话装置、反馈式智能语音控制系统及方法
CN109410927B (zh) * 2018-11-29 2020-04-03 北京蓦然认知科技有限公司 离线命令词与云端解析结合的语音识别方法、装置和系统
CN116235011A (zh) * 2020-11-04 2023-06-06 海信视像科技股份有限公司 显示设备及界面显示方法
CN115240665A (zh) * 2022-06-22 2022-10-25 海信视像科技股份有限公司 显示设备、控制方法和存储介质
CN115270808A (zh) * 2022-06-30 2022-11-01 海信视像科技股份有限公司 显示设备和语义理解方法

Also Published As

Publication number Publication date
WO2023246151A1 (zh) 2023-12-28

Similar Documents

Publication Publication Date Title
US11972327B2 (en) Method for automating actions for an electronic device
US10659200B2 (en) Companion application for activity cooperation
CN105634881B (zh) 应用场景推荐方法及装置
CN107515944A (zh) 基于人工智能的交互方法、用户终端、及存储介质
KR100995440B1 (ko) 인터랙티브 텔레비전용 개인 채널을 효과적으로 구현하기 위한 시스템 및 방법
EP3680896B1 (en) Method for controlling terminal by voice, terminal, server and storage medium
US20190340521A1 (en) Intelligent Recommendation Method and Terminal
CN107370649A (zh) 家电控制方法、系统、控制终端、及存储介质
WO2021103398A1 (zh) 一种智能电视以及服务器
WO2021218442A1 (zh) 通信方法、控制物联网设备的方法、电子设备
CN104954354A (zh) 数字内容的上下文感知流式传送
US11729470B2 (en) Predictive media routing based on interrupt criteria
WO2018133307A1 (zh) 一种实现语音控制的方法和终端
US20210117402A1 (en) System and method for updating knowledge graph
CN111078986A (zh) 数据检索方法、装置及计算机可读存储介质
CN114067798A (zh) 一种服务器、智能设备及智能语音控制方法
WO2022134689A1 (zh) 多媒体资源展示方法及装置
WO2023246151A9 (zh) 显示设备和控制方法
US20160004784A1 (en) Method of providing relevant information and electronic device adapted to the same
CN115240665A (zh) 显示设备、控制方法和存储介质
WO2022268136A1 (zh) 一种进行语音控制的终端设备及服务器
CN115270808A (zh) 显示设备和语义理解方法
CN115291829A (zh) 显示设备及订阅消息提醒方法
CN114627864A (zh) 显示设备与语音交互方法
CN113220954A (zh) 一种信息展示方法、装置及投影设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23825810

Country of ref document: EP

Kind code of ref document: A1