CN109600646B - Voice positioning method and device, smart television and storage medium - Google Patents


Info

Publication number
CN109600646B
Authority
CN
China
Prior art keywords
positioning
user
voice
label
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811514031.3A
Other languages
Chinese (zh)
Other versions
CN109600646A (en)
Inventor
李鸣
肖云
张奎
储磊
李贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Future Tv Co ltd
Original Assignee
Future Tv Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Future Tv Co ltd filed Critical Future Tv Co ltd
Priority to CN201811514031.3A priority Critical patent/CN109600646B/en
Publication of CN109600646A publication Critical patent/CN109600646A/en
Application granted granted Critical
Publication of CN109600646B publication Critical patent/CN109600646B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/4221Dedicated function buttons, e.g. for the control of an EPG, subtitles, aspect ratio, picture-in-picture or teletext
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a voice positioning method and device, a smart television, and a storage medium. The voice positioning method comprises the following steps: receiving a voice instruction of a user; determining a keyword in the voice instruction; matching the keyword against the positioning labels in a preset positioning label mapping table, wherein the mapping table also comprises a positioning identifier corresponding to each positioning label, the positioning identifier being a positionable element on the desktop and the positioning label being the content corresponding to that identifier; determining the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier; and displaying the user interface corresponding to the target positioning identifier. The method improves the flexibility of voice positioning, enhances the interactivity of the artificial intelligence, and thereby improves the user experience.

Description

Voice positioning method and device, smart television and storage medium
Technical Field
The invention relates to the technical field of voice control, and in particular to a voice positioning method and device.
Background
In the existing voice control process of a smart television, the user issues a voice instruction and the smart television responds by moving the focus to the corresponding position on the television desktop. Compared with key-based positioning on a remote controller, this positioning method is more flexible, and the user can control the television without a remote controller.
However, the voice commands supported by existing voice positioning are not arbitrary: the command issued by the user must match a positioning operation that already exists in the television's current interface, such as "open", "enter", "return", "previous", "next", or "jump to". When the user wants to reach another interface, he may first have to issue a "return" command to exit to the desktop, then issue "next" to enter the next interface, and repeat this many times before reaching the interface he wants to watch. User operation therefore remains tedious, and the interaction flexibility is poor.
Disclosure of Invention
The present application provides a method and an apparatus for voice positioning to solve the technical problem of poor flexibility of voice positioning.
The embodiment of the application is realized by the following steps:
in a first aspect, the present invention provides a voice positioning method, the method comprising: receiving a voice instruction of a user; determining a keyword in the voice instruction; matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table further comprises a positioning identifier corresponding to the positioning label, the positioning identifier is a positionable element on a desktop, and the positioning label is the content corresponding to the positioning identifier; determining the positioning identifier corresponding to the positioning label matched by the keyword as a target positioning identifier; and displaying a user interface corresponding to the target positioning identifier.
In the scheme of the embodiment of the invention, the preset positioning label mapping table is matched with the key words in the user voice command, the target positioning identification is determined, and finally the user interface corresponding to the target positioning identification is displayed. Compared with the traditional voice positioning, the user can initiate any voice instruction without being limited to fixed positioning operation on a desktop, so that the flexibility of voice positioning is improved, the interactivity of artificial intelligence is enhanced, and the user experience is further improved.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before receiving a voice instruction of a user, the method further includes:
generating a positioning label for each positionable element in the desktop; taking each positionable element as a positioning identifier; and storing the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence. Taking each positionable element in the desktop as a positioning identifier and pairing it with its positioning label generates a mapping table that can meet the user's voice positioning needs.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, when the positionable element is a fixed-function button, icon, or text box, generating the positioning label of each positionable element comprises: generating the positioning label according to the function of the button, icon, or text box. Generating the positioning label from the function of a fixed-function element meets the user's function-oriented positioning needs.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, when the positionable element is a recommendation slot of a TV series, a movie, or a variety program, generating the positioning label of the positionable element comprises:
generating the positioning label according to the content title of the recommendation slot. Generating the positioning label from the content title of the recommendation slot meets the user's varied viewing needs.
With reference to the first possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, when the positionable element is a recommendation slot of a TV series, a movie, or a variety program, generating the positioning label of the positionable element comprises:
generating the positioning label according to the content introduction of the recommendation slot, wherein the content introduction comprises the cast of the TV series, movie, or variety program and its genre. Generating the positioning label from the content introduction allows the user to locate the desired content simply by naming a related actor or genre, improving the user experience.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the receiving a voice instruction of a user includes:
receiving a voice instruction related to the user's television-viewing needs; correspondingly, displaying the user interface corresponding to the target positioning identifier comprises: displaying the user interface corresponding to the target positioning identifier on the television desktop. While watching television, the user can bring up the corresponding user interface on the desktop by issuing a simple voice command, improving the user experience.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the method further includes:
analyzing voice commands received from the user on multiple occasions; determining the user's preference information according to the analysis result; and updating the preset positioning label mapping table according to the preference information. Updating the positioning label mapping table generates related interest labels and improves the user experience.
In a second aspect, an embodiment of the present invention provides an apparatus for speech positioning, where the apparatus includes a functional module configured to implement the method according to the first aspect.
In a third aspect, an embodiment of the present invention provides an intelligent television, which may execute the method described in the first aspect and various possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium on which a computer program is stored; when executed by a computer, the computer program implements the method according to the first aspect in any of its possible implementation manners.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. It should be appreciated that the following drawings depict only some embodiments of the invention and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for voice positioning according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for speech localization according to an embodiment of the present invention;
fig. 3 is a schematic external structural diagram of a smart television according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an internal structure of the smart television according to the embodiment of the present invention.
Reference numerals: 300 - smart television; 301 - housing; 302 - receiver; 303 - processor; 304 - display screen.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, a flowchart of a voice positioning method according to an embodiment of the present invention is shown, where the method includes:
step 101: and receiving a voice instruction of a user.
Step 102: and determining key words in the voice instruction according to the voice instruction.
Step 103: and matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label. The positioning identifier is a positioning element on a desktop, and the positioning tag is content corresponding to the positioning identifier.
Step 104: and determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier.
Step 105: and displaying a user interface corresponding to the target positioning identifier.
The preset positioning label mapping table is matched against the keywords in the user's voice instruction, the target positioning identifier is determined, and finally the user interface corresponding to the target positioning identifier is displayed. Compared with traditional voice positioning, the user may issue an arbitrary voice instruction instead of being limited to the fixed positioning operations on the desktop; this improves the flexibility of voice positioning, enhances the interactivity of the artificial intelligence, and thereby improves the user experience.
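A minimal sketch of steps 101 to 105 may help make the flow concrete. All names, the command prefix, and the word-overlap matching below are illustrative assumptions, not taken from the patent; a real system would use speech recognition and natural language understanding as described later in this specification.

```python
# Illustrative end-to-end sketch of steps 101-105; names and the toy
# keyword/matching logic are assumptions, not taken from the patent.

def extract_keywords(voice_text):
    """Step 102: stand-in for speech recognition + NLU keyword extraction."""
    prefix = "i want to watch "
    text = voice_text.lower()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.split()

def match_tag(keywords, tag_table):
    """Step 103: pick the positioning label sharing the most words with the keywords."""
    def overlap(tag):
        return sum(1 for w in keywords if w in tag.lower())
    best = max(tag_table, key=overlap, default=None)
    return best if best is not None and overlap(best) > 0 else None

# Positioning label -> positioning identifier (toy mapping table, cf. Tables 1-4).
TAG_TABLE = {
    "play": "play_button",
    "deep sea jailbreak": "slot_deep_sea_jailbreak",
    "operation red sea": "slot_operation_red_sea",
}

keywords = extract_keywords("I want to watch Operation Red Sea")  # steps 101-102
label = match_tag(keywords, TAG_TABLE)                            # step 103
target = TAG_TABLE[label]                                         # step 104
# Step 105 would hand `target` to the desktop system for display.
```

If no positioning label shares any word with the keywords, `match_tag` returns `None`, i.e. no target positioning identifier is produced.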
The voice positioning method provided by the embodiment of the invention can be applied not only to smart televisions but also to other intelligent interactive devices, such as smart home appliances and smartphones. In the following, the implementation flow of the voice positioning method is described in detail taking the smart television as an example.
The positioning label mapping table comprises positioning identifiers and positioning labels. A positioning identifier is a positionable element in the user interface; through the positioning identifier, the corresponding position on the user interface can be located. The positioning label is the content corresponding to the positioning identifier, and the keywords in the user's voice instruction are matched against the positioning labels. Therefore, before the mapping table is available, i.e. before step 101 is performed, the positioning labels may be generated from the positionable elements.
Because the positioning label is the content corresponding to the positioning identifier, different kinds of positionable elements may generate their labels in different ways.
The first generation mode of the positioning label: the positionable element is a fixed-function button, icon, or text box, and the positioning label is generated according to the element's function.
In this mode, the positioning label can be understood as a function name. Table 1 corresponds to the first generation mode.
Positionable element          Positioning label
Play button                   Play
Episode-selection button      Episodes
Favorites button              Favorites
Movie synopsis text box       Synopsis
Table 1
It should be noted that, in the first generation mode, the positionable elements are not limited to those listed in Table 1 and may be any fixed-function button, icon, or text box. The first generation mode produces positioning labels covering the user's basic operations.
The second generation mode of the positioning label: the positionable element is a recommendation slot for a television program, such as a movie, TV series, or variety program, and the positioning label is generated according to the program title.
In this mode, the title of a recommendation slot generally carries rich information and can serve directly as the positioning label. Table 2 corresponds to the second generation mode.
Positionable element                        Positioning label
"Deep-sea jailbreak" recommendation slot    Deep-sea jailbreak
"Fortune express" recommendation slot       Fortune express
Table 2
The second generation mode produces positioning labels meeting the user's viewing needs.
The third generation mode of the positioning label: the positionable element is again a recommendation slot for a television program, such as a movie, TV series, or variety program, and the positioning label is generated according to the content introduction of the recommendation slot.
Unlike the second generation mode, which depends only on the title of the recommendation slot, the third mode accounts for the fact that a TV series, movie, or variety program may share its title with another work while differing in content, for example TV series of the same name with different casts. A positioning label containing such distinguishing details can therefore be generated. Table 3 corresponds to the third generation mode.
Table 3 (rendered as an image in the original publication; it maps each recommendation slot to positioning labels derived from its content introduction, such as lead actors, genre, and country of production)
Table 3 shows that, for a given recommendation slot, the lead actors, the genre of the film, and even its country of production can be extracted. This information is obtained by parsing the content introduction of the recommendation slot.
In addition, for the second and third generation modes, existing smart televisions are networked so that programs can be watched online; the system can therefore go online and traverse the recommendation slots one by one to learn each slot's title or content introduction and obtain its positioning labels.
After the positioning labels of the positionable elements on the television desktop are generated through the three modes above, each positionable element can be taken as a positioning identifier and mapped one-to-one to its positioning labels to obtain the positioning label mapping table, shown as Table 4.
Table 4 (rendered as an image in the original publication; it is the resulting positioning label mapping table, pairing each positioning identifier with its positioning labels)
The above describes one way of generating the positioning label mapping table; in practical applications, other generation manners are possible.
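The three label-generation modes and the one-to-one mapping described above can be sketched as follows. The `Element` fields, all names, and the lower-casing convention are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch: building the positioning label mapping table from
# desktop elements according to the three generation modes described above.
from dataclasses import dataclass, field

@dataclass
class Element:
    identifier: str                  # positioning identifier (the element itself)
    kind: str                        # "button" | "icon" | "textbox" | "slot"
    function: str = ""               # fixed function (generation mode 1)
    title: str = ""                  # recommendation-slot title (mode 2)
    intro_terms: list = field(default_factory=list)  # cast/genre from intro (mode 3)

def build_tag_table(elements):
    """Map every positioning label (lower-cased) to its positioning identifier."""
    table = {}
    for e in elements:
        if e.kind in ("button", "icon", "textbox"):
            labels = [e.function]               # mode 1: label from function
        else:
            labels = [e.title] + e.intro_terms  # modes 2 and 3
        for label in filter(None, labels):
            table[label.lower()] = e.identifier
    return table

elements = [
    Element("play_button", "button", function="Play"),
    Element("slot_red_sea", "slot", title="Operation Red Sea",
            intro_terms=["actor C", "actor D", "action"]),
]
table = build_tag_table(elements)
```

Note that one positioning identifier can own several positioning labels (title, actors, genre), which is what lets a user locate the same recommendation slot by naming any of them.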
After the positioning label mapping table is stored in the television system, the user initiates a voice instruction while watching television, and step 101 receives it. The voice instruction may relate to the user's program-viewing needs, such as "I want to watch XX TV series" or "I want to watch XX movie"; correspondingly, the user interface finally displayed is an interface on the television desktop. The voice instruction can be received by a microphone on the smart television, or received by another device, such as a remote controller or a mobile phone, and forwarded to the television. From the user's point of view, the command is spoken toward the television, the remote controller, or the phone.
After step 101, step 102 determines the keywords. One possible implementation: the user's speech is recognized by a speech recognition module in the television, the speech information is parsed by natural language understanding, and the keywords in the voice instruction are determined. The speech recognition module may be a standalone module or one integrated into the smart television's processor. The keywords represent the user's positioning intention, and there may be one or more of them.
Natural Language Understanding (NLU), also referred to as natural language processing and closely related to computational linguistics, is a key to realizing artificial intelligence: on the one hand it is a branch of linguistic information processing, and on the other it lies at the core of Artificial Intelligence (AI).
Taking as an example a user voice command of "I want to watch XX TV series" or "I want to watch XX movie", the determined keyword may be the name of the series or movie. If the command further qualifies the series or movie, such as "the XX series starring actor A", the determined keywords may be the name of the XX series and actor A. After the keywords are determined, step 103 is performed. One possible implementation of step 103: match the one or more keywords against the positioning labels one by one by computing the text similarity between keyword and label, yielding a matching result.
The text similarity may be computed with an algorithm such as cosine similarity or the Pearson correlation coefficient.
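As a concrete instance of the cosine option, the keyword and the label can be compared as character-bigram vectors. The bigram vectorization is an illustrative assumption; the patent names only the similarity measures themselves.

```python
# Cosine similarity over character bigrams; the bigram representation is an
# illustrative choice, the patent only names cosine / Pearson as options.
from collections import Counter
from math import sqrt

def bigrams(text):
    """Count the two-character substrings of the lower-cased text."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine_similarity(a, b):
    """Cosine of the angle between the two bigram-count vectors."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm_a = sqrt(sum(v * v for v in va.values()))
    norm_b = sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical strings score 1.0, strings sharing no bigram score 0.0, and partial overlaps fall in between, which is what the descending-similarity ranking below relies on.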
For example, the name of the XX series is similarity-matched against each positioning label in Table 4. After step 103, step 104 is executed. For the matching and confirmation of steps 103 and 104, the embodiment of the invention provides two possible implementations. The first: record the text similarity between the keyword(s) and every positioning label, sort the similarities in descending order, and take the positioning identifier corresponding to the most similar positioning label as the target positioning identifier. The second: compare the text similarity of the current positioning label with that of the previous one; if the current similarity is greater than or equal to the previous one, keep the current positioning label, otherwise keep the previous one. This amounts to tracking the positioning label with the highest text similarity during matching and taking the positioning identifier corresponding to that label as the target positioning identifier.
For example, similarity matching proceeds in the order of Table 4, and the final result is the positioning label most similar to the name of the XX series, or several labels ordered by descending similarity.
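The two confirmation modes of steps 103 and 104 can be sketched as follows. The similarity function is pluggable (for instance the cosine measure mentioned above); the word-overlap measure used in the demo and all helper names are toy assumptions.

```python
# The two confirmation modes described above: (1) score all labels and sort
# descending; (2) keep a running best while matching. Names are illustrative.

def best_label_by_sort(keywords, labels, similarity):
    """Mode 1: record every similarity, sort descending, take the top label."""
    return sorted(labels, key=lambda t: similarity(keywords, t), reverse=True)[0]

def best_label_running(keywords, labels, similarity):
    """Mode 2: compare each label's similarity with the best seen so far."""
    best, best_score = labels[0], similarity(keywords, labels[0])
    for t in labels[1:]:
        s = similarity(keywords, t)
        if s >= best_score:  # '>=' keeps the current label on ties, as described
            best, best_score = t, s
    return best

# Toy similarity: number of shared words (illustrative only).
word_overlap = lambda a, b: len(set(a.split()) & set(b.split()))
labels = ["play", "deep sea jailbreak", "operation red sea"]
```

Both modes return the same label when similarities are distinct; they can differ only on exact ties, where mode 2 as described keeps the later of the tied labels.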
Then step 105 is performed. One possible implementation of step 105: after the target positioning identifier is obtained, a corresponding control instruction is generated and sent to the desktop system, which then displays the relevant page on the user's viewing interface. Another possible implementation: the target positioning identifier itself is sent to the desktop system, which responds to it and switches the user's viewing interface to the relevant page.
For example, the label with the highest similarity is taken as the target positioning identifier, which is sent to the desktop system either directly or wrapped in a generated control instruction; the desktop system responds, the display jumps, and the viewing interface then shows the interface corresponding to the XX series. Whether a control instruction is generated or the target positioning identifier is sent directly, it can be combined with the related action in the user's voice instruction, such as watching, playing, or listening; the action is sent to the desktop system together with the control instruction or the target positioning identifier for response processing.
To illustrate the implementation of the positioning method more clearly, suppose the user initiates a voice command expressing a wish to watch "Operation Red Sea" starring actor C. After the user's speech is recognized, the speech information is parsed by natural language understanding to obtain the keywords: "Operation Red Sea" and "actor C". The keywords are matched one by one against the positioning labels in the mapping table and the similarity values are calculated. Taking the first possible implementation of steps 103 and 104 as an example, the similarities are sorted in descending order after calculation, and the most similar positioning label is obtained: "Operation Red Sea, starring actor C and actor D". From the mapping relationship the target positioning identifier is obtained: the identifier of the "Operation Red Sea" recommendation slot. Combined with the corresponding action "watch", this is sent to the desktop system, which performs response processing, so the television desktop jumps directly to the "Operation Red Sea" recommendation slot. The user can then continue with voice commands such as "play" according to the operations available on that slot.
In addition, for a user who frequently issues voice commands, the positioning label mapping table can be dynamically updated according to the user's personal habits or preferences, so that the positioning labels also function as interest tags.
Therefore, optionally, the method further comprises: analyzing the user's voice commands received multiple times; determining the user's preference information according to the analysis result; and updating the preset positioning label mapping table according to the preference information.
This mode is mainly directed at the third way of generating positioning labels: because the third generation mode includes the specific content of each recommendation position, the positioning labels can be updated according to the user's habits or preferences. The user preference represents the user's interests; for example, positioning labels such as movie genre, plot outline, and emotional style can be added.
By updating the positioning label mapping table in this way, interest-related labels can be generated, improving the user experience.
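A hedged sketch of the dynamic update just described: tally the keywords in the user's past voice commands and, when a theme recurs often enough, append it to the matching positioning labels as an interest tag. The threshold and label format are assumptions, not details from the patent.

```python
from collections import Counter

def update_mapping_table(mapping_table, past_keywords, threshold=3):
    """Append recurring keywords as interest tags to matching labels."""
    counts = Counter(k.lower() for k in past_keywords)
    preferred = [w for w, n in counts.items() if n >= threshold]
    updated = {}
    for label, identifier in mapping_table.items():
        tags = [w for w in preferred if w in label.lower()]
        new_label = label + (" | interests: " + ", ".join(tags) if tags else "")
        updated[new_label] = identifier  # identifier mapping is preserved
    return updated

table = {"Red Sea Action war film": "slot-1", "sitcom night": "slot-2"}
history = ["war", "war", "war", "comedy"]
updated = update_mapping_table(table, history)
```

Enriching labels this way makes future similarity matches more likely to land on content that fits the user's habits, which is the interest-tag effect described above.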
Referring to fig. 2, a voice positioning apparatus 200 according to an embodiment of the present invention includes: a receiving module 201, a determining module 202, a matching module 203, and a display module 204.
The receiving module 201 is configured to receive a user's voice command. The determining module 202 is configured to determine a keyword in the voice command according to the voice command. The matching module 203 is configured to match the keyword against the positioning labels in a preset positioning label mapping table, where the mapping table also contains the positioning identifier corresponding to each positioning label; a positioning identifier is a positionable element on the desktop, and the positioning label is the content corresponding to that identifier. The determining module 202 is further configured to determine the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier. The display module 204 is configured to display the user interface corresponding to the target positioning identifier.
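The cooperation of the four modules can be sketched as a simple pipeline: receive, determine keywords, match, display. Class and method names are illustrative; the patent does not prescribe any implementation language, and the stop-word keyword extraction stands in for real natural language understanding.

```python
class VoicePositioningDevice:
    def __init__(self, mapping_table):
        self.mapping_table = mapping_table  # positioning label -> identifier
        self.displayed = None

    def receive(self, voice_command):       # receiving module 201
        return voice_command.lower()

    def determine_keywords(self, text):     # determining module 202
        stop_words = {"i", "want", "to", "open", "the"}
        return [w for w in text.split() if w not in stop_words]

    def match(self, keywords):              # matching module 203
        for label, identifier in self.mapping_table.items():
            if all(k in label for k in keywords):
                return identifier           # target positioning identifier
        return None

    def display(self, identifier):          # display module 204
        self.displayed = identifier

device = VoicePositioningDevice({"settings button": "settings-id"})
text = device.receive("Open the settings")
device.display(device.match(device.determine_keywords(text)))
```

Keeping each module a separate method mirrors fig. 2 and makes it easy to swap the matching strategy (exact containment here, similarity ranking elsewhere) without touching the rest of the pipeline.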
Optionally, the apparatus 200 may further include: a generating module, configured to generate a positioning label for each positionable element on the desktop; and a storage module, configured to take each positionable element as a positioning identifier and store the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence.
Optionally, when the positionable element is a button, an icon, or a text box with a fixed function, the generating module is further configured to generate the positioning label according to the function of the button, icon, or text box.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the generating module is further configured to generate the positioning label according to the content title of the recommendation position.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the generating module is further configured to generate the positioning label according to the content introduction of the recommendation position, where the content introduction includes the participants in the TV series, movie, or variety program and its genre.
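The three label-generation modes above can be sketched together: a fixed-function element is labelled by its function, while a recommendation position is labelled by its content title or, when a content introduction is available, by the title plus participants and genre. The element records here are hypothetical data shapes, not the patent's internal representation.

```python
def generate_positioning_label(element):
    """Generate a positioning label for a positionable desktop element."""
    if element["kind"] in {"button", "icon", "text_box"}:
        return element["function"]                       # mode one: function
    if element["kind"] == "recommendation":
        intro = element.get("introduction")
        if intro:                                        # mode three: introduction
            return " ".join([element["title"]]
                            + intro["participants"]
                            + [intro["genre"]])
        return element["title"]                          # mode two: title only
    return None

label = generate_positioning_label({
    "kind": "recommendation",
    "title": "Red Sea Action",
    "introduction": {"participants": ["actor C", "actor D"], "genre": "war"},
})
```

Mode three produces the richest labels and is therefore the mode that later supports preference-based updates.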
Optionally, the receiving module 201 is further configured to receive a voice command related to the user's television viewing demand, and the display module 204 is further configured to display the user interface corresponding to the target positioning identifier on the television desktop.
Optionally, the generating module is further configured to analyze the user's voice commands received multiple times, determine the user's preference information according to the analysis result, and update the preset positioning label mapping table according to the preference information.
Referring to fig. 3 and fig. 4, which are schematic external and internal structural diagrams of a smart television 300 according to an embodiment of the present invention, respectively, the smart television 300 includes a housing 301, a receiver 302 disposed in the housing 301, a processor 303, and a display screen 304 disposed on the housing 301.
Alternatively, the processor 303 may be an integrated circuit chip having signal processing capability. The processor 303 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Such a processor can implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The receiver 302 is a voice receiving device; it may be a microphone or any other receiver capable of receiving a user's voice command.
The receiver 302 is configured to receive a user's voice command. The processor 303 is configured to: determine a keyword in the voice command according to the voice command; match the keyword against the positioning labels in a preset positioning label mapping table, where the mapping table also contains the positioning identifier corresponding to each positioning label, the positioning identifier being a positionable element on the desktop and the positioning label being the content corresponding to that identifier; and determine the positioning identifier corresponding to the positioning label matched by the keyword as the target positioning identifier. The display screen 304 is configured to display the user interface corresponding to the target positioning identifier.
Optionally, the processor 303 is further configured to generate a positioning label for each positionable element on the desktop, take each positionable element as a positioning identifier, and store the positioning identifiers and positioning labels in the positioning label mapping table in one-to-one correspondence.
Optionally, when the positionable element is a button, an icon, or a text box with a fixed function, the processor 303 is further configured to generate the positioning label according to the function of the button, icon, or text box.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the processor 303 is configured to generate the positioning label according to the content title of the recommendation position.
Optionally, when the positionable element is a recommendation position for a TV series, a movie, or a variety program, the processor 303 is further configured to generate the positioning label according to the content introduction of the recommendation position, where the content introduction includes the participants in the TV series, movie, or variety program and its genre.
Optionally, the receiver 302 is configured to receive a voice command related to the user's television viewing demand, and the display screen 304 is configured to display the user interface corresponding to the target positioning identifier on the television desktop.
Optionally, the processor 303 is further configured to analyze the user's voice commands received multiple times, determine the user's preference information according to the analysis result, and update the preset positioning label mapping table according to the preference information.
The implementations and specific examples of the voice positioning method in the foregoing embodiments also apply to the voice positioning apparatus of fig. 2 and the smart television of figs. 3 and 4. Through the foregoing detailed description of the voice positioning method, those skilled in the art can clearly understand how the apparatus of fig. 2 and the smart television of figs. 3 and 4 are implemented, so the details are not repeated here for brevity.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method of speech localization, comprising:
receiving a voice instruction of a user;
determining a keyword in the voice instruction according to the voice instruction;
matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification;
determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier;
displaying a user interface corresponding to the target positioning identification;
analyzing the voice commands of the user received for multiple times;
determining preference information of the user according to the analysis result;
and updating the positioning labels in the preset positioning label mapping table according to the preference information.
2. The method of claim 1, wherein prior to receiving the user's voice instruction, the method further comprises:
generating a positioning label of each positionable element according to each positionable element on each user interface in all user interfaces on the television desktop;
and taking each locatable element as a locating identifier, and storing the locating identifier and the locating label in the locating label mapping table according to a one-to-one corresponding relationship.
3. The method of claim 2, wherein generating a positional tag for each of the locatable elements when the locatable element is a fixed function button, icon or text box comprises:
and generating the positioning label according to the functions of the button, the icon or the text box.
4. The method of claim 2, wherein generating the location tag of the locatable element when the locatable element is a recommendation bit for a television show, a movie, or a variety program comprises:
and generating the positioning label according to the content title of the recommendation bit.
5. The method of claim 2, wherein generating the location tag of the locatable element when the locatable element is a recommendation bit for a television show, a movie, or a variety program comprises:
and generating the positioning label according to the content introduction of the recommendation position, wherein the content introduction comprises the participants of the TV play, the movie or the variety program and the type of the TV play, the movie or the variety program.
6. The method of claim 1, wherein receiving a voice instruction from a user comprises:
receiving a voice instruction related to the television program watching demand of a user;
correspondingly, the step of displaying the user interface corresponding to the target positioning identifier comprises: and displaying a user interface corresponding to the target positioning identifier on a television desktop.
7. An apparatus for speech localization, comprising:
a receiving module: the voice command is used for receiving a voice command of a user;
a determination module: the voice command is used for determining a keyword in the voice command according to the voice command;
a matching module: the keyword is matched with a positioning label in a preset positioning label mapping table, and the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification;
the determination module is further to: determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier;
a display module: the user interface is used for displaying the user interface corresponding to the target positioning identification;
the updating module is used for analyzing the voice instructions of the user received for many times; determining preference information of the user according to the analysis result; and updating the positioning labels in the preset positioning label mapping table according to the preference information.
8. The intelligent television is characterized by comprising a shell;
the voice receiving device is arranged in the shell and used for receiving a voice instruction of a user;
the processor is arranged in the shell and used for determining key words in the voice instruction according to the voice instruction; matching the keyword with a positioning label in a preset positioning label mapping table, wherein the positioning label mapping table also comprises a positioning identifier corresponding to the positioning label; the positioning identification is a positioning element on each user interface in all user interfaces on a television desktop, and the positioning label is content corresponding to the positioning identification; determining a positioning identifier corresponding to the positioning label corresponding to the keyword as a target positioning identifier; analyzing the voice commands of the user received for multiple times; determining preference information of the user according to the analysis result; updating the positioning labels in the preset positioning label mapping table according to the preference information;
and the display screen is arranged on the shell and used for displaying the user interface corresponding to the target positioning identifier.
9. A readable storage medium, having stored thereon a computer program for performing the steps of the method according to any one of claims 1-6 when the computer program is executed by a computer.
CN201811514031.3A 2018-12-11 2018-12-11 Voice positioning method and device, smart television and storage medium Active CN109600646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811514031.3A CN109600646B (en) 2018-12-11 2018-12-11 Voice positioning method and device, smart television and storage medium

Publications (2)

Publication Number Publication Date
CN109600646A CN109600646A (en) 2019-04-09
CN109600646B true CN109600646B (en) 2021-03-23

Family

ID=65961729



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102427558A (en) * 2011-09-27 2012-04-25 深圳市九洲电器有限公司 Sound control method of set top box and set top box thereof
CN103472990A (en) * 2013-08-27 2013-12-25 小米科技有限责任公司 Appliance, and method and device for controlling same
CN105161106A (en) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 Voice control method of intelligent terminal, voice control device and television system
CN106057203A (en) * 2016-05-24 2016-10-26 深圳市敢为软件技术有限公司 Precise voice control method and device
CN107948698A (en) * 2017-12-14 2018-04-20 深圳市雷鸟信息科技有限公司 Sound control method, system and the smart television of smart television

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175885B2 (en) * 2007-07-23 2012-05-08 Verizon Patent And Licensing Inc. Controlling a set-top box via remote speech recognition




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant