CN113763955A - Cross-screen voice interaction implementation method based on NLP natural language processing - Google Patents

Cross-screen voice interaction implementation method based on NLP natural language processing

Info

Publication number
CN113763955A
Authority
CN
China
Prior art keywords
television
natural language
server
language processing
dimensional code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111109330.0A
Other languages
Chinese (zh)
Inventor
严志雄
邵寻路
麻泽宇
吴晓涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Paco Video Technology Hangzhou Co ltd
Original Assignee
Paco Video Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Paco Video Technology Hangzhou Co ltd filed Critical Paco Video Technology Hangzhou Co ltd
Priority to CN202111109330.0A priority Critical patent/CN113763955A/en
Publication of CN113763955A publication Critical patent/CN113763955A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
        • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 - Speech recognition
                    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223 - Execution procedure of a spoken command
                    • G10L 15/08 - Speech classification or search
                        • G10L 15/18 - Speech classification or search using natural language modelling
                    • G10L 15/26 - Speech to text systems
    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04M - TELEPHONIC COMMUNICATION
                • H04M 1/00 - Substation equipment, e.g. for use by subscribers
                    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
                        • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
                            • H04M 1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
                                • H04M 1/72409 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories
                                    • H04M 1/72415 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories for remote control of appliances
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/41 - Structure of client; Structure of client peripherals
                            • H04N 21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
                                • H04N 21/42204 - User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
                                    • H04N 21/42206 - User interfaces specially adapted for controlling a client device through a remote control device characterized by hardware details
                                        • H04N 21/4222 - Remote control device emulator integrated into a non-television apparatus, e.g. a PDA, media center or smart toy
                        • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/432 - Content retrieval operation from a local storage medium, e.g. hard-disk

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a cross-screen voice interaction implementation method based on NLP natural language processing, comprising the following steps: constructing a custom lexicon and tag system using a graph database; tagging the television programs managed by the platform with keyword tags through an interface; applying for a WeChat applet account, adding a login page and a voice remote-controller page; integrating a speech translation interface; generating a two-dimensional code (QR code) for the applet and adding it to the television set-top box; scanning the two-dimensional code to open the applet and notifying the set-top box to establish a polling connection with the server; collecting audio in the applet and sending a request to the server once the audio file is recorded; the server translating the audio and segmenting the translated text into words; performing context logic processing; and the television large-screen end responding after receiving the information from the server. A mobile terminal such as a mobile phone replaces the voice remote controller, which reduces user cost and makes operation more convenient.

Description

Cross-screen voice interaction implementation method based on NLP natural language processing
Technical Field
The invention relates to the technical field of natural language processing, in particular to a cross-screen voice interaction implementation method based on NLP natural language processing.
Background
IPTV, i.e. interactive network television, integrates internet, multimedia, communication and other technologies, and watching IPTV through a television set-top box is very common. The volume of IPTV program data is very large, easily reaching tens of millions of media-asset records. In daily use the system can meet most users' needs for finding and watching television programs.
At present, program retrieval on IPTV relies mainly on the remote controller. Users can page through categories or type in a search for specific content with the remote-control keys, or use a voice remote controller to complete the search. Although operators already classify the large number of television programs and arrange them into columns, finding a specific program with a standard remote controller is time-consuming and often takes many key presses across many screens. Some users choose a voice remote controller in the hope of operating the television more quickly through voice interaction, but in practice the voice remote controller carries a certain cost, has a low penetration rate, and lacks short-term memory and the ability to handle context and scene-based dialogue.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a cross-screen voice interaction implementation method based on NLP natural language processing, which can effectively solve the problems in the background art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the cross-screen voice interaction realization method based on NLP natural language processing comprises the following steps:
step S1, a user-defined word stock and label system is constructed by using a graph database;
step S2, labeling the keyword label of the television program managed by the platform, storing the television program information, the keyword label information and the associated information of the two, providing an external inquiry interface, and realizing the retrieval of the television program according to the transmitted keyword information;
step S3, applying for a WeChat applet account;
step S4, developing the wechat applet, adding a login page and integrating a wechat applet login authentication interface; adding a voice remote controller page, integrating recording authority authentication of the WeChat applet and a recording use interface; an integrated speech translation interface;
step S5, sending the WeChat applet to generate a two-dimensional code, and adding the two-dimensional code into the client application of the television or the set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet, informing the client to establish a polling request connection with the server, and the set-top box starting to send a polling request to the server;
step S7, a recording collecting function is used in the WeChat applet, and a request is initiated to the server after the audio file is recorded;
step S8, the service end translates the audio frequency, then carries on word segmentation to the translated words, uses the self-defined word stock and the label system to extract the key word label after the word segmentation;
step S9, after the keyword label is extracted, the context logic processing is carried out to form an action instruction and the action instruction is returned to the client of the television or the set top box;
in step S10, the client executes corresponding actions, such as channel change, play, cursor movement, etc., after receiving the action command from the server.
Preferably, in step S1, the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; the custom lexicon further defines dimension words as abstract concepts.
Preferably, in step S4, the speech translation interface includes the translation interface provided by WeChat itself.
Preferably, in step S5, after the television large-screen end is turned on, the applet two-dimensional code is displayed on a designated page, and the mobile-terminal small screen opens the applet by scanning the two-dimensional code and controls the television large-screen end with the mobile phone's voice input.
Preferably, in step S8, the server segments the translated text using the open-source jieba Chinese word-segmentation NLP library.
Preferably, in step S9, the context logic processing combines the previous user utterance with the current one when analyzing the current user intent.
Compared with the prior art, the invention has the following beneficial effects:
the user performs voice interaction with a mobile terminal device such as a mobile phone instead of a voice remote controller, thereby controlling the television large screen; replacing the voice remote controller with a smartphone or other mobile terminal reduces user cost and makes operation more convenient. The mobile terminal device serves as the small-screen end: when the user opens the corresponding WeChat applet on a smartphone and speaks, the application on the server recognizes and translates the collected speech, extracts the intent and performs context processing, that is, it handles the dialogue in combination with the previous dialogue content, responds with the result and notifies the set-top box, and the television large-screen end displays the corresponding content.
Drawings
FIG. 1 is a flowchart of a cross-screen voice interaction implementation method based on NLP natural language processing.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, this embodiment discloses a cross-screen voice interaction implementation method based on NLP natural language processing, comprising the following steps:
step S1, constructing a custom lexicon and tag system using a graph database;
the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; the custom lexicon further defines dimension words as abstract concepts;
for example, two dimension words such as 'type' and 'age group' are created;
a tag word 'children' is created with the dimension 'age group', a tag word 'horror' is created with the dimension 'type', and a mutual-exclusion relationship is added between 'children' and 'horror'. When the application on the server performs context logic processing, i.e. handles the current dialogue in combination with the previous dialogue content, an earlier utterance mentioning 'horror' followed by a later utterance mentioning 'children' results in only the 'children' tag being retained;
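For illustration only, the custom lexicon and tag relationships of step S1 could be stored in ArangoDB roughly as in the following sketch; the python-arango driver, database name, collection names and credentials are assumptions and not part of the claimed method.

```python
# Sketch only: assumes a local ArangoDB instance and the python-arango driver;
# the database and collection names ("voice_lexicon", "tag_words", "tag_relations") are illustrative.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("voice_lexicon", username="root", password="passwd")

# Vertex collection for tag/dimension words, edge collection for relationships between them.
if not db.has_collection("tag_words"):
    db.create_collection("tag_words")
if not db.has_collection("tag_relations"):
    db.create_collection("tag_relations", edge=True)

tags = db.collection("tag_words")
rels = db.collection("tag_relations")

# Dimension words are abstract concepts; tag words reference a dimension.
tags.insert({"_key": "age_group", "kind": "dimension", "name": "age group"})
tags.insert({"_key": "type", "kind": "dimension", "name": "type"})
tags.insert({"_key": "children", "kind": "tag", "name": "children", "dimension": "age_group"})
tags.insert({"_key": "horror", "kind": "tag", "name": "horror", "dimension": "type"})

# Mutual-exclusion relationship between 'children' and 'horror'.
rels.insert({"_from": "tag_words/children", "_to": "tag_words/horror", "relation": "mutual_exclusion"})
```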
step S2, tagging the television programs managed by the platform with keyword tags, storing the television program information, the keyword tag information and the associations between the two, and providing an external query interface so that television programs can be retrieved from incoming keyword information;
for example, a television program 'Aomen Fengyun 3' is created, with cast list: Liu Dehua, Wang Jing, Liu Weiqiang; description: classic Hong Kong film; era: the 1980s;
the television program is converted into a JSON object and passed as a parameter, and the interface outputs the tag group: Macau, Liu Dehua, Wang Jing, classic, Hong Kong film, 1980s;
the correspondence between the tags and the television program is stored, and in subsequent searches, such as a search for Liu Dehua, the tags are matched against this data;
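A minimal sketch, under the same illustrative ArangoDB assumptions as above, of how the program-to-tag associations might be stored and then matched during a search such as 'Liu Dehua'; the collection names and the AQL query are assumptions.

```python
# Sketch only: continues the illustrative ArangoDB setup of the previous example;
# "programs" and "program_tags" are assumed collection names.
from arango import ArangoClient

db = ArangoClient(hosts="http://localhost:8529").db("voice_lexicon", username="root", password="passwd")

if not db.has_collection("tag_words"):
    db.create_collection("tag_words")
if not db.has_collection("programs"):
    db.create_collection("programs")
if not db.has_collection("program_tags"):
    db.create_collection("program_tags", edge=True)

tags = db.collection("tag_words")
programs = db.collection("programs")
program_tags = db.collection("program_tags")

# Store the program and link it to the tags produced by the tagging interface.
prog = programs.insert({"_key": "aomen_fengyun_3", "title": "Aomen Fengyun 3"})
for name in ["Macau", "Liu Dehua", "Wang Jing", "classic", "Hong Kong film", "1980s"]:
    key = name.replace(" ", "_")
    tags.insert({"_key": key, "kind": "tag", "name": name}, overwrite=True)
    program_tags.insert({"_from": prog["_id"], "_to": "tag_words/" + key})

# A later search for 'Liu Dehua' follows the edges back to the tagged programs.
cursor = db.aql.execute(
    """
    FOR tag IN tag_words
      FILTER tag.name == @name
      FOR p IN 1..1 INBOUND tag program_tags
        RETURN p.title
    """,
    bind_vars={"name": "Liu Dehua"},
)
print(list(cursor))  # expected: ['Aomen Fengyun 3']
```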
step S3, applying for a WeChat applet account;
step S4, developing the WeChat applet: adding a login page and integrating the WeChat applet login authentication interface; adding a voice remote-controller page and integrating the WeChat applet recording-permission authentication and recording interface; and integrating a speech translation interface;
step S5, generating a two-dimensional code for the WeChat applet and adding it to the client application of the television or set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet and notifying the client to establish a polling connection with the server, after which the set-top box starts sending polling requests to the server;
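For illustration, the polling connection of step S6 could look roughly like the following set-top-box-side sketch; the server URL, the '/stb/poll' endpoint, the device identifier and the response format are hypothetical assumptions.

```python
# Sketch only: the endpoint "/stb/poll", the device_id parameter and the
# "instruction" field in the response are illustrative assumptions.
import time
import requests

SERVER = "http://server.example.com"
DEVICE_ID = "stb-0001"  # bound to the applet session when the two-dimensional code is scanned

def handle_instruction(instruction):
    # Placeholder for step S10: the real client would act on the instruction here.
    print("received action instruction:", instruction)

def poll_for_instructions():
    while True:
        try:
            resp = requests.get(f"{SERVER}/stb/poll", params={"device_id": DEVICE_ID}, timeout=10)
            data = resp.json()
            if data.get("instruction"):
                handle_instruction(data["instruction"])  # e.g. search, volume_down, play
        except requests.RequestException:
            pass  # transient network error; keep polling
        time.sleep(1)

if __name__ == "__main__":
    poll_for_instructions()
```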
step S7, using the recording function in the WeChat applet and sending a request to the server once the audio file is recorded;
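On the server side, the request of step S7 might be received by an endpoint roughly like the Flask sketch below; the routes, field names and the placeholder functions for steps S8 and S9 are hypothetical.

```python
# Sketch only: a hypothetical Flask server that accepts the audio recorded in the applet
# and queues the resulting action instruction for the polling set-top box.
from flask import Flask, request, jsonify

app = Flask(__name__)
pending_instructions = {}  # device_id -> action instruction waiting to be polled

def translate_audio(data: bytes) -> str:
    # Placeholder: call the integrated speech translation interface here (step S8).
    return ""

def process_text(device_id: str, text: str):
    # Placeholder: word segmentation, tag extraction and context logic (steps S8/S9).
    return {"action": "search", "payload": {"text": text}}

@app.route("/voice/upload", methods=["POST"])
def upload_audio():
    device_id = request.form["device_id"]
    audio = request.files["audio"]                 # audio file recorded in the WeChat applet
    text = translate_audio(audio.read())           # step S8: speech-to-text translation
    instruction = process_text(device_id, text)    # steps S8/S9: segmentation, tags, context
    pending_instructions[device_id] = instruction
    return jsonify({"status": "ok"})

@app.route("/stb/poll")
def poll():
    device_id = request.args["device_id"]
    return jsonify({"instruction": pending_instructions.pop(device_id, None)})

if __name__ == "__main__":
    app.run()
```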
step S8, the server translating the audio, segmenting the translated text into words, and extracting keyword tags from the segmented words using the custom lexicon and tag system;
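Since the preferred embodiment names the open-source jieba library, the segmentation and tag extraction of step S8 could be sketched as follows; the user-dictionary file name and the in-memory tag set are illustrative stand-ins for the lexicon held in the graph database.

```python
# Sketch only: jieba is the open-source segmenter named in the preferred embodiment;
# the user-dictionary file and the in-memory tag set are illustrative assumptions.
import os
import jieba

# Load the custom lexicon so that tag words are segmented as whole tokens.
if os.path.exists("custom_lexicon.txt"):      # one tag word per line
    jieba.load_userdict("custom_lexicon.txt")

KNOWN_TAGS = {"刘德华", "电影", "80年代", "儿童", "恐怖"}  # would come from the graph database

def extract_tags(translated_text: str) -> list:
    """Segment the translated text and keep only the words that are known tags."""
    return [w for w in jieba.lcut(translated_text) if w in KNOWN_TAGS]

print(extract_tags("我想看刘德华主演的电影"))  # expected: ['刘德华', '电影']
```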
step S9, after the keyword tags are extracted, performing context logic processing to form an action instruction and returning it to the client of the television or set-top box;
for example, the user first says 'any movies from the 1980s?' and then says 'starring Liu Dehua'; the result of context logic processing is a result set matching all three keywords '1980s', 'Liu Dehua' and 'movie';
after context logic processing a new group of keyword tags is obtained and the intent is clear; follow-up operations, such as a basic search or a preset action, are then carried out according to the business scenario. For a basic search, for example when programs related to '1980s movies' are wanted, the television program data carrying the '1980s' and 'movie' tags are queried directly and displayed on the television large screen; for a preset action, for example 'turn the volume down a little', the television large-screen end is directly notified to turn down the volume;
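A minimal sketch of the context logic processing described above: tags from the previous utterance are merged with the current ones, and an earlier tag is dropped when it is mutually exclusive with a current tag; the exclusion table is an illustrative stand-in for the relationships stored in the graph database.

```python
# Sketch only: the mutual-exclusion pairs are illustrative stand-ins for the
# relations stored in the graph database.
MUTUALLY_EXCLUSIVE = {frozenset({"children", "horror"})}

def merge_context(previous_tags, current_tags):
    """Combine the previous utterance's tags with the current ones (step S9)."""
    merged = list(current_tags)
    for old in previous_tags:
        if old in merged:
            continue
        # Drop an earlier tag if it is mutually exclusive with a current tag,
        # e.g. 'horror' from the last turn versus 'children' now.
        if any(frozenset({old, new}) in MUTUALLY_EXCLUSIVE for new in current_tags):
            continue
        merged.append(old)
    return merged

# First turn: "any movies from the 1980s?"  Second turn: "starring Liu Dehua".
turn1 = ["1980s", "movie"]
turn2 = ["Liu Dehua"]
print(merge_context(turn1, turn2))  # -> ['Liu Dehua', '1980s', 'movie']
```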
step S10, after receiving the action instruction from the server, the client executing the corresponding action, such as changing the channel, playing, or moving the cursor.
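On the television or set-top-box client, the returned action instruction could be dispatched roughly as in the following sketch; the instruction format and handler functions are assumptions, and a real client would call the platform's own playback and volume APIs.

```python
# Sketch only: the instruction format ({"action": ..., "payload": ...}) and the
# handler functions are hypothetical.
def do_search(payload):
    print("display search results for tags:", payload.get("tags"))

def do_volume_down(payload):
    print("turn the volume down")

def do_change_channel(payload):
    print("switch to channel", payload.get("channel"))

HANDLERS = {
    "search": do_search,
    "volume_down": do_volume_down,
    "change_channel": do_change_channel,
}

def execute_instruction(instruction):
    """Step S10: run the action the server returned after context processing."""
    handler = HANDLERS.get(instruction.get("action"))
    if handler:
        handler(instruction.get("payload", {}))

execute_instruction({"action": "search", "payload": {"tags": ["1980s", "movie", "Liu Dehua"]}})
```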
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (6)

1. A cross-screen voice interaction implementation method based on NLP natural language processing, characterized by comprising the following steps:
step S1, constructing a custom lexicon and tag system using a graph database;
step S2, tagging the television programs managed by the platform with keyword tags, storing the television program information, the keyword tag information and the associations between the two, and providing an external query interface so that television programs can be retrieved from incoming keyword information;
step S3, applying for a WeChat applet account;
step S4, developing the WeChat applet: adding a login page and integrating the WeChat applet login authentication interface; adding a voice remote-controller page and integrating the WeChat applet recording-permission authentication and recording interface; and integrating a speech translation interface;
step S5, generating a two-dimensional code for the WeChat applet and adding it to the client application of the television or set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet and notifying the client to establish a polling connection with the server, after which the set-top box starts sending polling requests to the server;
step S7, using the recording function in the WeChat applet and sending a request to the server once the audio file is recorded;
step S8, the server translating the audio, segmenting the translated text into words, and extracting keyword tags from the segmented words using the custom lexicon and tag system;
step S9, after the keyword tags are extracted, performing context logic processing to form an action instruction and returning it to the client of the television or set-top box;
step S10, the client executing the corresponding action after receiving the action instruction from the server.
2. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S1, the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; and the custom lexicon further defines dimension words as abstract concepts.
3. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S4, the speech translation interface includes the translation interface provided by WeChat itself.
4. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S5, after the television large-screen end is turned on, the applet two-dimensional code is displayed on a designated page, and the mobile-terminal small screen opens the applet by scanning the two-dimensional code and controls the television large-screen end with the mobile phone's voice input.
5. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S8, the server segments the translated text using the open-source jieba Chinese word-segmentation NLP library.
6. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S9, the context logic processing combines the previous user utterance with the current one when analyzing the current user intent.
CN202111109330.0A 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing Pending CN113763955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111109330.0A CN113763955A (en) 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing

Publications (1)

Publication Number Publication Date
CN113763955A true CN113763955A (en) 2021-12-07

Family

ID=78796781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111109330.0A Pending CN113763955A (en) 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing

Country Status (1)

Country Link
CN (1) CN113763955A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105763898A (en) * 2014-12-16 2016-07-13 上海天脉聚源文化传媒有限公司 Method and system for controlling IPTV set-top box by voice
CN106804001A (en) * 2017-02-28 2017-06-06 山东浪潮商用系统有限公司 A kind of method and system by wechat client remote control set-box
CN107241652A (en) * 2017-06-28 2017-10-10 百视通网络电视技术发展有限责任公司 A kind of TV speech remote control system and method based on wechat small routine
US20180041453A1 (en) * 2015-12-28 2018-02-08 Goertek Inc. Method and device for interaction between smart watch and wechat platform, and smart watch
CN107918286A (en) * 2017-12-13 2018-04-17 福建师范大学福清分校 A kind of intelligent home control system and implementation method based on equipment room linkage characteristic
CN109194988A (en) * 2018-09-06 2019-01-11 广州高清视信数码科技股份有限公司 A kind of one-way set-top box voice control channel switching method and system
CN110706696A (en) * 2019-09-25 2020-01-17 珠海格力电器股份有限公司 Voice control method and device
CN110781402A (en) * 2020-01-02 2020-02-11 南京创维信息技术研究院有限公司 System and method for realizing multi-round deep retrieval on television based on Tianmao elfin



Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination