CN109600646B - Voice positioning method and device, smart television and storage medium - Google Patents


Info

Publication number
CN109600646B
Authority
CN
China
Prior art keywords
positioning
user
voice
label
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811514031.3A
Other languages
Chinese (zh)
Other versions
CN109600646A (en)
Inventor
李鸣
肖云
张奎
储磊
李贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Future Tv Co ltd
Original Assignee
Future Tv Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Future Tv Co ltd filed Critical Future Tv Co ltd
Priority to CN201811514031.3A priority Critical patent/CN109600646B/en
Publication of CN109600646A publication Critical patent/CN109600646A/en
Application granted granted Critical
Publication of CN109600646B publication Critical patent/CN109600646B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/4221Dedicated function buttons, e.g. for the control of an EPG, subtitles, aspect ratio, picture-in-picture or teletext
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a voice positioning method and device, a smart television, and a storage medium. The voice positioning method comprises the following steps: receiving a voice instruction of a user; determining a keyword in the voice instruction; matching the keyword against the positioning labels in a preset positioning label mapping table, wherein the mapping table also comprises a positioning identifier corresponding to each positioning label, the positioning identifier being a positionable element on the desktop and the positioning label being the content corresponding to that identifier; determining the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier; and displaying the user interface corresponding to the target positioning identifier. The method improves the flexibility of voice positioning, enhances the interactivity of the artificial intelligence, and thereby improves the user experience.

Description

Voice positioning method and device, smart television and storage medium
Technical Field
The invention relates to the technical field of voice control, and in particular to a voice positioning method and device.
Background
In the existing voice control process of a smart television, the user issues a voice instruction and the smart television responds by moving the focus to the corresponding position on the television desktop. Compared with key-based positioning on a remote controller, this positioning method is more flexible, and the user can control the television without a remote controller.
However, the voice commands supported by existing voice positioning are not arbitrary: the command issued by the user must match a positioning operation that already exists in the television's current interface, such as "open", "enter", "return", "previous", "next", or "jump to". When the user wants to reach another interface, he may first have to issue a "return" command to exit to the desktop, then issue "next" to enter the next interface, and repeat this many times before reaching the interface he wants to watch. User operation therefore remains tedious, and the interaction flexibility is poor.
Disclosure of Invention
The present application provides a method and an apparatus for voice positioning to solve the technical problem of poor flexibility of voice positioning.
The embodiment of the application is realized by the following steps:
in a first aspect, the present invention provides a voice positioning method, the method comprising: receiving a voice instruction of a user; determining a keyword in the voice instruction; matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table further comprises a positioning identifier corresponding to the positioning label, the positioning identifier is a positionable element on a desktop, and the positioning label is the content corresponding to the positioning identifier; determining the positioning identifier corresponding to the positioning label matched by the keyword as a target positioning identifier; and displaying a user interface corresponding to the target positioning identifier.
In the scheme of the embodiment of the invention, the preset positioning label mapping table is matched with the key words in the user voice command, the target positioning identification is determined, and finally the user interface corresponding to the target positioning identification is displayed. Compared with the traditional voice positioning, the user can initiate any voice instruction without being limited to fixed positioning operation on a desktop, so that the flexibility of voice positioning is improved, the interactivity of artificial intelligence is enhanced, and the user experience is further improved.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before receiving a voice instruction of a user, the method further includes:
generating a positioning label for each positionable element in the desktop; taking each positionable element as a positioning identifier; and storing the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence. Taking each positionable element in the desktop as a positioning identifier and pairing it with its positioning label generates a mapping table that can meet the user's voice positioning needs.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, when the positionable element is a fixed-function button, icon, or text box, generating the positioning label of each positionable element comprises: generating the positioning label according to the function of the button, icon, or text box. Generating the positioning label from the function of a fixed-function element meets the user's function-oriented positioning needs.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, when the positionable element is a recommendation slot of a TV series, a movie, or a variety program, generating the positioning label of the positionable element comprises:
generating the positioning label according to the content title of the recommendation slot. Generating the positioning label from the content title of the recommendation slot meets the user's varied viewing needs.
With reference to the first possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, when the positionable element is a recommendation slot of a TV series, a movie, or a variety program, generating the positioning label of the positionable element comprises:
generating the positioning label according to the content introduction of the recommendation slot, wherein the content introduction comprises the cast of the TV series, movie, or variety program and its genre. Generating the positioning label from the content introduction allows the user to locate the desired content simply by naming a related actor or genre, improving the user experience.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the receiving a voice instruction of a user includes:
receiving a voice instruction related to the user's television-viewing needs; correspondingly, displaying the user interface corresponding to the target positioning identifier comprises: displaying the user interface corresponding to the target positioning identifier on the television desktop. While watching television, the user can bring up the corresponding user interface on the desktop by issuing a simple voice command, improving the user experience.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the method further includes:
analyzing voice commands received from the user on multiple occasions; determining the user's preference information according to the analysis result; and updating the preset positioning label mapping table according to the preference information. Updating the positioning label mapping table generates related interest labels and improves the user experience.
In a second aspect, an embodiment of the present invention provides an apparatus for speech positioning, where the apparatus includes a functional module configured to implement the method according to the first aspect.
In a third aspect, an embodiment of the present invention provides an intelligent television, which may execute the method described in the first aspect and various possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium on which a computer program is stored; when executed by a computer, the computer program implements the method according to the first aspect in any of its possible implementation manners.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. It should be appreciated that the following drawings depict only some embodiments of the invention and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for voice positioning according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for speech localization according to an embodiment of the present invention;
fig. 3 is a schematic external structural diagram of a smart television according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an internal structure of the smart television according to the embodiment of the present invention.
Reference numerals: 300 - smart television; 301 - housing; 302 - receiver; 303 - processor; 304 - display screen.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, a flowchart of a voice positioning method according to an embodiment of the present invention is shown, where the method includes:
step 101: and receiving a voice instruction of a user.
Step 102: and determining key words in the voice instruction according to the voice instruction.
Step 103: and matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label. The positioning identifier is a positioning element on a desktop, and the positioning tag is content corresponding to the positioning identifier.
Step 104: and determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier.
Step 105: and displaying a user interface corresponding to the target positioning identifier.
The preset positioning label mapping table is matched against the keywords in the user's voice instruction, the target positioning identifier is determined, and finally the user interface corresponding to the target positioning identifier is displayed. Compared with traditional voice positioning, the user may issue an arbitrary voice instruction instead of being limited to the fixed positioning operations on the desktop; this improves the flexibility of voice positioning, enhances the interactivity of the artificial intelligence, and thereby improves the user experience.
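A minimal sketch of steps 101 to 105 may help make the flow concrete. All names, the command prefix, and the word-overlap matching below are illustrative assumptions, not taken from the patent; a real system would use speech recognition and natural language understanding as described later in this specification.

```python
# Illustrative end-to-end sketch of steps 101-105; names and the toy
# keyword/matching logic are assumptions, not taken from the patent.

def extract_keywords(voice_text):
    """Step 102: stand-in for speech recognition + NLU keyword extraction."""
    prefix = "i want to watch "
    text = voice_text.lower()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.split()

def match_tag(keywords, tag_table):
    """Step 103: pick the positioning label sharing the most words with the keywords."""
    def overlap(tag):
        return sum(1 for w in keywords if w in tag.lower())
    best = max(tag_table, key=overlap, default=None)
    return best if best is not None and overlap(best) > 0 else None

# Positioning label -> positioning identifier (toy mapping table, cf. Tables 1-4).
TAG_TABLE = {
    "play": "play_button",
    "deep sea jailbreak": "slot_deep_sea_jailbreak",
    "operation red sea": "slot_operation_red_sea",
}

keywords = extract_keywords("I want to watch Operation Red Sea")  # steps 101-102
label = match_tag(keywords, TAG_TABLE)                            # step 103
target = TAG_TABLE[label]                                         # step 104
# Step 105 would hand `target` to the desktop system for display.
```

If no positioning label shares any word with the keywords, `match_tag` returns `None`, i.e. no target positioning identifier is produced.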
The voice positioning method provided by the embodiment of the invention can be applied not only to smart televisions but also to other intelligent interactive devices, such as smart home appliances and smartphones. In the following, the implementation flow of the voice positioning method is described in detail taking the smart television as an example.
The positioning label mapping table comprises positioning identifiers and positioning labels. A positioning identifier is a positionable element in the user interface; through the positioning identifier, the corresponding position on the user interface can be located. The positioning label is the content corresponding to the positioning identifier, and the keywords in the user's voice instruction are matched against the positioning labels. Therefore, before the mapping table is available, i.e. before step 101 is performed, the positioning labels may be generated from the positionable elements.
Because the positioning label is the content corresponding to the positioning identifier, different kinds of positionable elements may generate their labels in different ways.
The first generation mode of the positioning label: the positionable element is a fixed-function button, icon, or text box, and the positioning label is generated according to the element's function.
In this mode, the positioning label can be understood as a function name. Table 1 corresponds to the first generation mode.
Positionable element          Positioning label
Play button                   Play
Episode-selection button      Episodes
Favorites button              Favorites
Movie synopsis text box       Synopsis
Table 1
It should be noted that, in the first generation mode, the positionable elements are not limited to those listed in Table 1 and may be any fixed-function button, icon, or text box. The first generation mode produces positioning labels covering the user's basic operations.
The second generation mode of the positioning label: the positionable element is a recommendation slot for a television program, such as a movie, TV series, or variety program, and the positioning label is generated according to the program title.
In this mode, the title of a recommendation slot generally carries rich information and can serve directly as the positioning label. Table 2 corresponds to the second generation mode.
Positionable element                        Positioning label
"Deep-sea jailbreak" recommendation slot    Deep-sea jailbreak
"Fortune express" recommendation slot       Fortune express
Table 2
The second generation mode produces positioning labels meeting the user's viewing needs.
The third generation mode of the positioning label: the positionable element is again a recommendation slot for a television program, such as a movie, TV series, or variety program, and the positioning label is generated according to the content introduction of the recommendation slot.
Unlike the second generation mode, which depends only on the title of the recommendation slot, the third mode accounts for the fact that a TV series, movie, or variety program may share its title with another work while differing in content, for example TV series of the same name with different casts. A positioning label containing such distinguishing details can therefore be generated. Table 3 corresponds to the third generation mode.
Table 3 (rendered as an image in the original publication; it maps each recommendation slot to positioning labels derived from its content introduction, such as lead actors, genre, and country of production)
Table 3 shows that, for a given recommendation slot, the lead actors, the genre of the film, and even its country of production can be extracted. This information is obtained by parsing the content introduction of the recommendation slot.
In addition, for the second and third generation modes, existing smart televisions are networked so that programs can be watched online; the system can therefore go online and traverse the recommendation slots one by one to learn each slot's title or content introduction and obtain its positioning labels.
After the positioning labels of the positionable elements on the television desktop are generated through the three modes above, each positionable element can be taken as a positioning identifier and mapped one-to-one to its positioning labels to obtain the positioning label mapping table, shown as Table 4.
Table 4 (rendered as an image in the original publication; it is the resulting positioning label mapping table, pairing each positioning identifier with its positioning labels)
The above describes one way of generating the positioning label mapping table; in practical applications, other generation manners are possible.
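The three label-generation modes and the one-to-one mapping described above can be sketched as follows. The `Element` fields, all names, and the lower-casing convention are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch: building the positioning label mapping table from
# desktop elements according to the three generation modes described above.
from dataclasses import dataclass, field

@dataclass
class Element:
    identifier: str                  # positioning identifier (the element itself)
    kind: str                        # "button" | "icon" | "textbox" | "slot"
    function: str = ""               # fixed function (generation mode 1)
    title: str = ""                  # recommendation-slot title (mode 2)
    intro_terms: list = field(default_factory=list)  # cast/genre from intro (mode 3)

def build_tag_table(elements):
    """Map every positioning label (lower-cased) to its positioning identifier."""
    table = {}
    for e in elements:
        if e.kind in ("button", "icon", "textbox"):
            labels = [e.function]               # mode 1: label from function
        else:
            labels = [e.title] + e.intro_terms  # modes 2 and 3
        for label in filter(None, labels):
            table[label.lower()] = e.identifier
    return table

elements = [
    Element("play_button", "button", function="Play"),
    Element("slot_red_sea", "slot", title="Operation Red Sea",
            intro_terms=["actor C", "actor D", "action"]),
]
table = build_tag_table(elements)
```

Note that one positioning identifier can own several positioning labels (title, actors, genre), which is what lets a user locate the same recommendation slot by naming any of them.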
After the positioning label mapping table is stored in the television system, the user initiates a voice instruction while watching television, and step 101 receives it. The voice instruction may relate to the user's program-viewing needs, such as "I want to watch XX TV series" or "I want to watch XX movie"; correspondingly, the user interface finally displayed is an interface on the television desktop. The voice instruction can be received by a microphone on the smart television, or received by another device, such as a remote controller or a mobile phone, and forwarded to the television. From the user's point of view, the command is spoken toward the television, the remote controller, or the phone.
After step 101, step 102 determines the keywords. One possible implementation: the user's speech is recognized by a speech recognition module in the television, the speech information is parsed by natural language understanding, and the keywords in the voice instruction are determined. The speech recognition module may be a standalone module or one integrated into the smart television's processor. The keywords represent the user's positioning intention, and there may be one or more of them.
Natural Language Understanding (NLU), also referred to as natural language processing and closely related to computational linguistics, is a key to realizing artificial intelligence: on the one hand it is a branch of linguistic information processing, and on the other it lies at the core of Artificial Intelligence (AI).
Taking as an example a user voice command of "I want to watch XX TV series" or "I want to watch XX movie", the determined keyword may be the name of the series or movie. If the command further qualifies the series or movie, such as "the XX series starring actor A", the determined keywords may be the name of the XX series and actor A. After the keywords are determined, step 103 is performed. One possible implementation of step 103: match the one or more keywords against the positioning labels one by one by computing the text similarity between keyword and label, yielding a matching result.
The text similarity may be computed with an algorithm such as cosine similarity or the Pearson correlation coefficient.
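As a concrete instance of the cosine option, the keyword and the label can be compared as character-bigram vectors. The bigram vectorization is an illustrative assumption; the patent names only the similarity measures themselves.

```python
# Cosine similarity over character bigrams; the bigram representation is an
# illustrative choice, the patent only names cosine / Pearson as options.
from collections import Counter
from math import sqrt

def bigrams(text):
    """Count the two-character substrings of the lower-cased text."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine_similarity(a, b):
    """Cosine of the angle between the two bigram-count vectors."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm_a = sqrt(sum(v * v for v in va.values()))
    norm_b = sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical strings score 1.0, strings sharing no bigram score 0.0, and partial overlaps fall in between, which is what the descending-similarity ranking below relies on.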
For example, the name of the XX series is similarity-matched against each positioning label in Table 4. After step 103, step 104 is executed. For the matching and confirmation of steps 103 and 104, the embodiment of the invention provides two possible implementations. The first: record the text similarity between the keyword(s) and every positioning label, sort the similarities in descending order, and take the positioning identifier corresponding to the most similar positioning label as the target positioning identifier. The second: compare the text similarity of the current positioning label with that of the previous one; if the current similarity is greater than or equal to the previous one, keep the current positioning label, otherwise keep the previous one. This amounts to tracking the positioning label with the highest text similarity during matching and taking the positioning identifier corresponding to that label as the target positioning identifier.
For example, similarity matching proceeds in the order of Table 4, and the final result is the positioning label most similar to the name of the XX series, or several labels ordered by descending similarity.
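The two confirmation modes of steps 103 and 104 can be sketched as follows. The similarity function is pluggable (for instance the cosine measure mentioned above); the word-overlap measure used in the demo and all helper names are toy assumptions.

```python
# The two confirmation modes described above: (1) score all labels and sort
# descending; (2) keep a running best while matching. Names are illustrative.

def best_label_by_sort(keywords, labels, similarity):
    """Mode 1: record every similarity, sort descending, take the top label."""
    return sorted(labels, key=lambda t: similarity(keywords, t), reverse=True)[0]

def best_label_running(keywords, labels, similarity):
    """Mode 2: compare each label's similarity with the best seen so far."""
    best, best_score = labels[0], similarity(keywords, labels[0])
    for t in labels[1:]:
        s = similarity(keywords, t)
        if s >= best_score:  # '>=' keeps the current label on ties, as described
            best, best_score = t, s
    return best

# Toy similarity: number of shared words (illustrative only).
word_overlap = lambda a, b: len(set(a.split()) & set(b.split()))
labels = ["play", "deep sea jailbreak", "operation red sea"]
```

Both modes return the same label when similarities are distinct; they can differ only on exact ties, where mode 2 as described keeps the later of the tied labels.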
Then step 105 is performed. One possible implementation of step 105: after the target positioning identifier is obtained, a corresponding control instruction is generated and sent to the desktop system, which then displays the relevant page on the user's viewing interface. Another possible implementation: the target positioning identifier itself is sent to the desktop system, which responds to it and switches the user's viewing interface to the relevant page.
For example, the label with the highest similarity is taken as the target positioning identifier, which is sent to the desktop system either directly or wrapped in a generated control instruction; the desktop system responds, the display jumps, and the viewing interface then shows the interface corresponding to the XX series. Whether a control instruction is generated or the target positioning identifier is sent directly, it can be combined with the related action in the user's voice instruction, such as watching, playing, or listening; the action is sent to the desktop system together with the control instruction or the target positioning identifier for response processing.
To illustrate the implementation of the positioning method more clearly, suppose the user initiates a voice command expressing a wish to watch "Operation Red Sea" starring actor C. After the user's speech is recognized, the speech information is parsed by natural language understanding to obtain the keywords: "Operation Red Sea" and "actor C". The keywords are matched one by one against the positioning labels in the mapping table and the similarity values are calculated. Taking the first possible implementation of steps 103 and 104 as an example, the similarities are sorted in descending order after calculation, and the most similar positioning label is obtained: "Operation Red Sea, starring actor C and actor D". From the mapping relationship the target positioning identifier is obtained: the identifier of the "Operation Red Sea" recommendation slot. Combined with the corresponding action "watch", this is sent to the desktop system, which performs response processing, so the television desktop jumps directly to the "Operation Red Sea" recommendation slot. The user can then continue with voice commands such as "play" according to the operations available on that slot.
In addition, for a user who frequently issues voice commands, the positioning label mapping table can be dynamically updated according to the user's personal habits or preferences, so that the positioning labels also function as interest tags.
Therefore, optionally, the method further comprises: analyzing the user's voice commands received multiple times; determining the user's preference information according to the analysis result; and updating the preset positioning label mapping table according to the preference information.
This mode is mainly directed at the third way of generating positioning labels: because the third generation mode includes the specific content of each recommendation position, the positioning labels can be updated according to the user's habits or preferences. The user preference represents the user's interests; for example, positioning labels such as movie genre, plot outline, and emotional style can be added.
By updating the positioning label mapping table in this way, interest-related labels can be generated, improving the user experience.
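A hedged sketch of the dynamic update just described: tally the keywords in the user's past voice commands and, when a theme recurs often enough, append it to the matching positioning labels as an interest tag. The threshold and label format are assumptions, not details from the patent.

```python
from collections import Counter

def update_mapping_table(mapping_table, past_keywords, threshold=3):
    """Append recurring keywords as interest tags to matching labels."""
    counts = Counter(k.lower() for k in past_keywords)
    preferred = [w for w, n in counts.items() if n >= threshold]
    updated = {}
    for label, identifier in mapping_table.items():
        tags = [w for w in preferred if w in label.lower()]
        new_label = label + (" | interests: " + ", ".join(tags) if tags else "")
        updated[new_label] = identifier  # identifier mapping is preserved
    return updated

table = {"Red Sea Action war film": "slot-1", "sitcom night": "slot-2"}
history = ["war", "war", "war", "comedy"]
updated = update_mapping_table(table, history)
```

Enriching labels this way makes future similarity matches more likely to land on content that fits the user's habits, which is the interest-tag effect described above.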
Referring to fig. 2, a voice positioning apparatus 200 according to an embodiment of the present invention includes: a receiving module 201, a determining module 202, a matching module 203, and a display module 204.
The receiving module 201 is configured to receive a user's voice command. The determining module 202 is configured to determine a keyword in the voice command according to the voice command. The matching module 203 is configured to match the keyword against the positioning labels in a preset positioning label mapping table, where the mapping table also contains the positioning identifier corresponding to each positioning label; a positioning identifier is a positionable element on the desktop, and the positioning label is the content corresponding to that identifier. The determining module 202 is further configured to determine the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier. The display module 204 is configured to display the user interface corresponding to the target positioning identifier.
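The cooperation of the four modules can be sketched as a simple pipeline: receive, determine keywords, match, display. Class and method names are illustrative; the patent does not prescribe any implementation language, and the stop-word keyword extraction stands in for real natural language understanding.

```python
class VoicePositioningDevice:
    def __init__(self, mapping_table):
        self.mapping_table = mapping_table  # positioning label -> identifier
        self.displayed = None

    def receive(self, voice_command):       # receiving module 201
        return voice_command.lower()

    def determine_keywords(self, text):     # determining module 202
        stop_words = {"i", "want", "to", "open", "the"}
        return [w for w in text.split() if w not in stop_words]

    def match(self, keywords):              # matching module 203
        for label, identifier in self.mapping_table.items():
            if all(k in label for k in keywords):
                return identifier           # target positioning identifier
        return None

    def display(self, identifier):          # display module 204
        self.displayed = identifier

device = VoicePositioningDevice({"settings button": "settings-id"})
text = device.receive("Open the settings")
device.display(device.match(device.determine_keywords(text)))
```

Keeping each module a separate method mirrors fig. 2 and makes it easy to swap the matching strategy (exact containment here, similarity ranking elsewhere) without touching the rest of the pipeline.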
Optionally, the apparatus 200 may further include: a generating module, configured to generate a positioning label for each positionable element on the desktop; and a storage module, configured to take each positionable element as a positioning identifier and store the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence.
Optionally, when the positionable element is a button, an icon, or a text box with a fixed function, the generating module is further configured to generate the positioning label according to the function of the button, icon, or text box.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the generating module is further configured to generate the positioning label according to the content title of the recommendation position.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the generating module is further configured to generate the positioning label according to the content introduction of the recommendation position, where the content introduction includes the participants in the TV series, movie, or variety program and its genre.
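The three label-generation modes above can be sketched together: a fixed-function element is labelled by its function, while a recommendation position is labelled by its content title or, when a content introduction is available, by the title plus participants and genre. The element records here are hypothetical data shapes, not the patent's internal representation.

```python
def generate_positioning_label(element):
    """Generate a positioning label for a positionable desktop element."""
    if element["kind"] in {"button", "icon", "text_box"}:
        return element["function"]                       # mode one: function
    if element["kind"] == "recommendation":
        intro = element.get("introduction")
        if intro:                                        # mode three: introduction
            return " ".join([element["title"]]
                            + intro["participants"]
                            + [intro["genre"]])
        return element["title"]                          # mode two: title only
    return None

label = generate_positioning_label({
    "kind": "recommendation",
    "title": "Red Sea Action",
    "introduction": {"participants": ["actor C", "actor D"], "genre": "war"},
})
```

Mode three produces the richest labels and is therefore the mode that later supports preference-based updates.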
Optionally, the receiving module 201 is further configured to receive a voice command related to the user's television viewing demand, and the display module 204 is further configured to display the user interface corresponding to the target positioning identifier on the television desktop.
Optionally, the generating module is further configured to analyze the user's voice commands received multiple times, determine the user's preference information according to the analysis result, and update the preset positioning label mapping table according to the preference information.
Referring to fig. 3 and fig. 4, which are schematic external and internal structural diagrams of a smart television 300 according to an embodiment of the present invention, respectively, the smart television 300 includes a housing 301, a receiver 302 disposed in the housing 301, a processor 303, and a display screen 304 disposed on the housing 301.
Alternatively, the processor 303 may be an integrated circuit chip having signal processing capability. The processor 303 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Such a processor can implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The receiver 302 is a voice receiving device; it may be a microphone or any other receiver capable of receiving a user's voice command.
The receiver 302 is configured to receive a user's voice command. The processor 303 is configured to: determine a keyword in the voice command according to the voice command; match the keyword against the positioning labels in a preset positioning label mapping table, where the mapping table also contains the positioning identifier corresponding to each positioning label, the positioning identifier being a positionable element on the desktop and the positioning label being the content corresponding to that identifier; and determine the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier. The display screen 304 is configured to display the user interface corresponding to the target positioning identifier.
Optionally, the processor 303 is further configured to generate a positioning label for each positionable element on the desktop, take each positionable element as a positioning identifier, and store the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence.
Optionally, when the positionable element is a button, an icon, or a text box with a fixed function, the processor 303 is further configured to generate the positioning label according to the function of the button, icon, or text box.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the processor 303 is configured to generate the positioning label according to the content title of the recommendation position.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the processor 303 is further configured to generate the positioning label according to the content introduction of the recommendation position, where the content introduction includes the participants in the TV series, movie, or variety program and its genre.
Optionally, the receiver 302 is configured to receive a voice command related to the user's television viewing demand, and the display screen 304 is configured to display the user interface corresponding to the target positioning identifier on the television desktop.
Optionally, the processor 303 is further configured to analyze the user's voice commands received multiple times, determine the user's preference information according to the analysis result, and update the preset positioning label mapping table according to the preference information.
The implementations and specific examples of the voice positioning method in the foregoing embodiments also apply to the voice positioning apparatus of fig. 2 and the smart television of figs. 3 and 4. Through the foregoing detailed description of the voice positioning method, those skilled in the art can clearly understand how the apparatus of fig. 2 and the smart television of figs. 3 and 4 are implemented, so the details are not repeated here for brevity.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method of speech localization, comprising:
receiving a voice instruction of a user;
determining a keyword in the voice instruction according to the voice instruction;
matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification;
determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier;
displaying a user interface corresponding to the target positioning identification;
analyzing the voice commands of the user received for multiple times;
determining preference information of the user according to the analysis result;
and updating the positioning labels in the preset positioning label mapping table according to the preference information.
2. The method of claim 1, wherein prior to receiving the user's voice instruction, the method further comprises:
generating a positioning label of each positionable element according to each positionable element on each user interface in all user interfaces on the television desktop;
and taking each locatable element as a locating identifier, and storing the locating identifier and the locating label in the locating label mapping table according to a one-to-one corresponding relationship.
3. The method of claim 2, wherein generating a positional tag for each of the locatable elements when the locatable element is a fixed function button, icon or text box comprises:
and generating the positioning label according to the functions of the button, the icon or the text box.
4. The method of claim 2, wherein generating the location tag of the locatable element when the locatable element is a recommendation bit for a television show, a movie, or a variety program comprises:
and generating the positioning label according to the content title of the recommendation bit.
5. The method of claim 2, wherein generating the location tag of the locatable element when the locatable element is a recommendation bit for a television show, a movie, or a variety program comprises:
and generating the positioning label according to the content introduction of the recommendation position, wherein the content introduction comprises the participants of the TV play, the movie or the variety program and the type of the TV play, the movie or the variety program.
6. The method of claim 1, wherein receiving a voice instruction from a user comprises:
receiving a voice instruction related to the television program watching demand of a user;
correspondingly, the step of displaying the user interface corresponding to the target positioning identifier comprises: and displaying a user interface corresponding to the target positioning identifier on a television desktop.
7. An apparatus for speech localization, comprising:
a receiving module: the voice command is used for receiving a voice command of a user;
a determination module: the voice command is used for determining a keyword in the voice command according to the voice command;
a matching module: the keyword is matched with a positioning label in a preset positioning label mapping table, and the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification;
the determination module is further to: determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier;
a display module: the user interface is used for displaying the user interface corresponding to the target positioning identification;
the updating module is used for analyzing the voice instructions of the user received for many times; determining preference information of the user according to the analysis result; and updating the positioning labels in the preset positioning label mapping table according to the preference information.
8. The intelligent television is characterized by comprising a shell;
the voice receiving device is arranged in the shell and used for receiving a voice instruction of a user;
the processor is arranged in the shell and used for determining key words in the voice instruction according to the voice instruction; matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification; determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier; analyzing the voice commands of the user received for multiple times; determining preference information of the user according to the analysis result; updating the positioning labels in the preset positioning label mapping table according to the preference information;
and the display screen is arranged on the shell and used for displaying the user interface corresponding to the target positioning identifier.
9. A readable storage medium, having stored thereon a computer program for performing the steps of the method according to any one of claims 1-6 when the computer program is executed by a computer.
CN201811514031.3A 2018-12-11 2018-12-11 Voice positioning method and device, smart television and storage medium Active CN109600646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811514031.3A CN109600646B (en) 2018-12-11 2018-12-11 Voice positioning method and device, smart television and storage medium

Publications (2)

Publication Number Publication Date
CN109600646A CN109600646A (en) 2019-04-09
CN109600646B true CN109600646B (en) 2021-03-23

Family

ID=65961729



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102427558A (en) * 2011-09-27 2012-04-25 深圳市九洲电器有限公司 Sound control method of set top box and set top box thereof
CN103472990A (en) * 2013-08-27 2013-12-25 小米科技有限责任公司 Appliance, and method and device for controlling same
CN105161106A (en) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 Voice control method of intelligent terminal, voice control device and television system
CN106057203A (en) * 2016-05-24 2016-10-26 深圳市敢为软件技术有限公司 Precise voice control method and device
CN107948698A (en) * 2017-12-14 2018-04-20 深圳市雷鸟信息科技有限公司 Sound control method, system and the smart television of smart television

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175885B2 (en) * 2007-07-23 2012-05-08 Verizon Patent And Licensing Inc. Controlling a set-top box via remote speech recognition




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant