CN113763955A - Cross-screen voice interaction implementation method based on NLP natural language processing - Google Patents

Cross-screen voice interaction implementation method based on NLP natural language processing

Info

Publication number
CN113763955A
Authority
CN
China
Prior art keywords
television
natural language
server
language processing
dimensional code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111109330.0A
Other languages
Chinese (zh)
Inventor
严志雄
邵寻路
麻泽宇
吴晓涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Paco Video Technology Hangzhou Co ltd
Original Assignee
Paco Video Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Paco Video Technology Hangzhou Co ltd filed Critical Paco Video Technology Hangzhou Co ltd
Priority to CN202111109330.0A priority Critical patent/CN113763955A/en
Publication of CN113763955A publication Critical patent/CN113763955A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
        • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 - Speech recognition
                    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223 - Execution procedure of a spoken command
                    • G10L 15/08 - Speech classification or search
                        • G10L 15/18 - Speech classification or search using natural language modelling
                    • G10L 15/26 - Speech to text systems
    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04M - TELEPHONIC COMMUNICATION
                • H04M 1/00 - Substation equipment, e.g. for use by subscribers
                    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
                        • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
                            • H04M 1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
                                • H04M 1/72409 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories
                                    • H04M 1/72415 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories for remote control of appliances
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/41 - Structure of client; Structure of client peripherals
                            • H04N 21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
                                • H04N 21/42204 - User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
                                    • H04N 21/42206 - User interfaces specially adapted for controlling a client device through a remote control device characterized by hardware details
                                        • H04N 21/4222 - Remote control device emulator integrated into a non-television apparatus, e.g. a PDA, media center or smart toy
                        • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/432 - Content retrieval operation from a local storage medium, e.g. hard-disk

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a cross-screen voice interaction implementation method based on NLP natural language processing, comprising the following steps: constructing a custom lexicon and tag system using a graph database; tagging the television programs managed by the platform with keyword tags through an interface; applying for a WeChat applet account, adding a login page and a voice remote-controller page; integrating a speech translation interface; generating a two-dimensional code (QR code) for the applet and adding it to the television set-top box; scanning the two-dimensional code to open the applet and notifying the set-top box to establish a polling connection with the server; collecting audio in the applet and sending a request to the server once the audio file is recorded; the server translating the audio and segmenting the translated text into words; performing context logic processing; and the television large-screen end responding after receiving the information from the server. A mobile terminal such as a mobile phone replaces the voice remote controller, which reduces user cost and makes operation more convenient.

Description

Cross-screen voice interaction implementation method based on NLP natural language processing
Technical Field
The invention relates to the technical field of natural language processing, in particular to a cross-screen voice interaction implementation method based on NLP natural language processing.
Background
IPTV, i.e. interactive network television, integrates internet, multimedia, communication and other technologies, and watching IPTV through a television set-top box is very common. The volume of IPTV program data is very large, easily reaching tens of millions of media-asset records. In daily use the system can meet most users' needs for finding and watching television programs.
At present, program retrieval on IPTV relies mainly on the remote controller. Users can page through categories or type in a search for specific content with the remote-control keys, or use a voice remote controller to complete the search. Although operators already classify the large number of television programs and arrange them into columns, finding a specific program with a standard remote controller is time-consuming and often takes many key presses across many screens. Some users choose a voice remote controller in the hope of operating the television more quickly through voice interaction, but in practice the voice remote controller carries a certain cost, has a low penetration rate, and lacks short-term memory and the ability to handle context and scene-based dialogue.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a cross-screen voice interaction implementation method based on NLP natural language processing, which can effectively solve the problems in the background art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the cross-screen voice interaction realization method based on NLP natural language processing comprises the following steps:
step S1, a user-defined word stock and label system is constructed by using a graph database;
step S2, labeling the keyword label of the television program managed by the platform, storing the television program information, the keyword label information and the associated information of the two, providing an external inquiry interface, and realizing the retrieval of the television program according to the transmitted keyword information;
step S3, applying for a WeChat applet account;
step S4, developing the wechat applet, adding a login page and integrating a wechat applet login authentication interface; adding a voice remote controller page, integrating recording authority authentication of the WeChat applet and a recording use interface; an integrated speech translation interface;
step S5, sending the WeChat applet to generate a two-dimensional code, and adding the two-dimensional code into the client application of the television or the set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet, informing the client to establish a polling request connection with the server, and the set-top box starting to send a polling request to the server;
step S7, a recording collecting function is used in the WeChat applet, and a request is initiated to the server after the audio file is recorded;
step S8, the service end translates the audio frequency, then carries on word segmentation to the translated words, uses the self-defined word stock and the label system to extract the key word label after the word segmentation;
step S9, after the keyword label is extracted, the context logic processing is carried out to form an action instruction and the action instruction is returned to the client of the television or the set top box;
in step S10, the client executes corresponding actions, such as channel change, play, cursor movement, etc., after receiving the action command from the server.
Preferably, in step S1, the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; the custom lexicon further defines dimension words as abstract concepts.
Preferably, in step S4, the speech translation interface includes the translation interface provided by WeChat itself.
Preferably, in step S5, after the television large-screen end is turned on, the applet two-dimensional code is displayed on a designated page, and the mobile-terminal small screen opens the applet by scanning the two-dimensional code and controls the television large-screen end with the mobile phone's voice input.
Preferably, in step S8, the server segments the translated text using the open-source jieba Chinese word-segmentation NLP library.
Preferably, in step S9, the context logic processing combines the previous user utterance with the current one when analyzing the current user intent.
Compared with the prior art, the invention has the following beneficial effects:
the user performs voice interaction with a mobile terminal device such as a mobile phone instead of a voice remote controller, thereby controlling the television large screen; replacing the voice remote controller with a smartphone or other mobile terminal reduces user cost and makes operation more convenient. The mobile terminal device serves as the small-screen end: when the user opens the corresponding WeChat applet on a smartphone and speaks, the application on the server recognizes and translates the collected speech, extracts the intent and performs context processing, that is, it handles the dialogue in combination with the previous dialogue content, responds with the result and notifies the set-top box, and the television large-screen end displays the corresponding content.
Drawings
FIG. 1 is a flowchart of a cross-screen voice interaction implementation method based on NLP natural language processing.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, this embodiment discloses a cross-screen voice interaction implementation method based on NLP natural language processing, comprising the following steps:
step S1, constructing a custom lexicon and tag system using a graph database;
the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; the custom lexicon further defines dimension words as abstract concepts;
for example, two dimension words such as 'type' and 'age group' are created;
a tag word 'children' is created with the dimension 'age group', a tag word 'horror' is created with the dimension 'type', and a mutual-exclusion relationship is added between 'children' and 'horror'. When the application on the server performs context logic processing, i.e. handles the current dialogue in combination with the previous dialogue content, an earlier utterance mentioning 'horror' followed by a later utterance mentioning 'children' results in only the 'children' tag being retained;
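For illustration only, the custom lexicon and tag relationships of step S1 could be stored in ArangoDB roughly as in the following sketch; the python-arango driver, database name, collection names and credentials are assumptions and not part of the claimed method.

```python
# Sketch only: assumes a local ArangoDB instance and the python-arango driver;
# the database and collection names ("voice_lexicon", "tag_words", "tag_relations") are illustrative.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("voice_lexicon", username="root", password="passwd")

# Vertex collection for tag/dimension words, edge collection for relationships between them.
if not db.has_collection("tag_words"):
    db.create_collection("tag_words")
if not db.has_collection("tag_relations"):
    db.create_collection("tag_relations", edge=True)

tags = db.collection("tag_words")
rels = db.collection("tag_relations")

# Dimension words are abstract concepts; tag words reference a dimension.
tags.insert({"_key": "age_group", "kind": "dimension", "name": "age group"})
tags.insert({"_key": "type", "kind": "dimension", "name": "type"})
tags.insert({"_key": "children", "kind": "tag", "name": "children", "dimension": "age_group"})
tags.insert({"_key": "horror", "kind": "tag", "name": "horror", "dimension": "type"})

# Mutual-exclusion relationship between 'children' and 'horror'.
rels.insert({"_from": "tag_words/children", "_to": "tag_words/horror", "relation": "mutual_exclusion"})
```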
step S2, tagging the television programs managed by the platform with keyword tags, storing the television program information, the keyword tag information and the associations between the two, and providing an external query interface so that television programs can be retrieved from incoming keyword information;
for example, a television program 'Aomen Fengyun 3' is created, with cast list: Liu Dehua, Wang Jing, Liu Weiqiang; description: classic Hong Kong film; era: the 1980s;
the television program is converted into a JSON object and passed as a parameter, and the interface outputs the tag group: Macau, Liu Dehua, Wang Jing, classic, Hong Kong film, 1980s;
the correspondence between the tags and the television program is stored, and in subsequent searches, such as a search for Liu Dehua, the tags are matched against this data;
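A minimal sketch, under the same illustrative ArangoDB assumptions as above, of how the program-to-tag associations might be stored and then matched during a search such as 'Liu Dehua'; the collection names and the AQL query are assumptions.

```python
# Sketch only: continues the illustrative ArangoDB setup of the previous example;
# "programs" and "program_tags" are assumed collection names.
from arango import ArangoClient

db = ArangoClient(hosts="http://localhost:8529").db("voice_lexicon", username="root", password="passwd")

if not db.has_collection("tag_words"):
    db.create_collection("tag_words")
if not db.has_collection("programs"):
    db.create_collection("programs")
if not db.has_collection("program_tags"):
    db.create_collection("program_tags", edge=True)

tags = db.collection("tag_words")
programs = db.collection("programs")
program_tags = db.collection("program_tags")

# Store the program and link it to the tags produced by the tagging interface.
prog = programs.insert({"_key": "aomen_fengyun_3", "title": "Aomen Fengyun 3"})
for name in ["Macau", "Liu Dehua", "Wang Jing", "classic", "Hong Kong film", "1980s"]:
    key = name.replace(" ", "_")
    tags.insert({"_key": key, "kind": "tag", "name": name}, overwrite=True)
    program_tags.insert({"_from": prog["_id"], "_to": "tag_words/" + key})

# A later search for 'Liu Dehua' follows the edges back to the tagged programs.
cursor = db.aql.execute(
    """
    FOR tag IN tag_words
      FILTER tag.name == @name
      FOR p IN 1..1 INBOUND tag program_tags
        RETURN p.title
    """,
    bind_vars={"name": "Liu Dehua"},
)
print(list(cursor))  # expected: ['Aomen Fengyun 3']
```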
step S3, applying for a WeChat applet account;
step S4, developing the WeChat applet: adding a login page and integrating the WeChat applet login authentication interface; adding a voice remote-controller page and integrating the WeChat applet recording-permission authentication and recording interface; and integrating a speech translation interface;
step S5, generating a two-dimensional code for the WeChat applet and adding it to the client application of the television or set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet and notifying the client to establish a polling connection with the server, after which the set-top box starts sending polling requests to the server;
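For illustration, the polling connection of step S6 could look roughly like the following set-top-box-side sketch; the server URL, the '/stb/poll' endpoint, the device identifier and the response format are hypothetical assumptions.

```python
# Sketch only: the endpoint "/stb/poll", the device_id parameter and the
# "instruction" field in the response are illustrative assumptions.
import time
import requests

SERVER = "http://server.example.com"
DEVICE_ID = "stb-0001"  # bound to the applet session when the two-dimensional code is scanned

def handle_instruction(instruction):
    # Placeholder for step S10: the real client would act on the instruction here.
    print("received action instruction:", instruction)

def poll_for_instructions():
    while True:
        try:
            resp = requests.get(f"{SERVER}/stb/poll", params={"device_id": DEVICE_ID}, timeout=10)
            data = resp.json()
            if data.get("instruction"):
                handle_instruction(data["instruction"])  # e.g. search, volume_down, play
        except requests.RequestException:
            pass  # transient network error; keep polling
        time.sleep(1)

if __name__ == "__main__":
    poll_for_instructions()
```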
step S7, using the recording function in the WeChat applet and sending a request to the server once the audio file is recorded;
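On the server side, the request of step S7 might be received by an endpoint roughly like the Flask sketch below; the routes, field names and the placeholder functions for steps S8 and S9 are hypothetical.

```python
# Sketch only: a hypothetical Flask server that accepts the audio recorded in the applet
# and queues the resulting action instruction for the polling set-top box.
from flask import Flask, request, jsonify

app = Flask(__name__)
pending_instructions = {}  # device_id -> action instruction waiting to be polled

def translate_audio(data: bytes) -> str:
    # Placeholder: call the integrated speech translation interface here (step S8).
    return ""

def process_text(device_id: str, text: str):
    # Placeholder: word segmentation, tag extraction and context logic (steps S8/S9).
    return {"action": "search", "payload": {"text": text}}

@app.route("/voice/upload", methods=["POST"])
def upload_audio():
    device_id = request.form["device_id"]
    audio = request.files["audio"]                 # audio file recorded in the WeChat applet
    text = translate_audio(audio.read())           # step S8: speech-to-text translation
    instruction = process_text(device_id, text)    # steps S8/S9: segmentation, tags, context
    pending_instructions[device_id] = instruction
    return jsonify({"status": "ok"})

@app.route("/stb/poll")
def poll():
    device_id = request.args["device_id"]
    return jsonify({"instruction": pending_instructions.pop(device_id, None)})

if __name__ == "__main__":
    app.run()
```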
step S8, the server translating the audio, segmenting the translated text into words, and extracting keyword tags from the segmented words using the custom lexicon and tag system;
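Since the preferred embodiment names the open-source jieba library, the segmentation and tag extraction of step S8 could be sketched as follows; the user-dictionary file name and the in-memory tag set are illustrative stand-ins for the lexicon held in the graph database.

```python
# Sketch only: jieba is the open-source segmenter named in the preferred embodiment;
# the user-dictionary file and the in-memory tag set are illustrative assumptions.
import os
import jieba

# Load the custom lexicon so that tag words are segmented as whole tokens.
if os.path.exists("custom_lexicon.txt"):      # one tag word per line
    jieba.load_userdict("custom_lexicon.txt")

KNOWN_TAGS = {"刘德华", "电影", "80年代", "儿童", "恐怖"}  # would come from the graph database

def extract_tags(translated_text: str) -> list:
    """Segment the translated text and keep only the words that are known tags."""
    return [w for w in jieba.lcut(translated_text) if w in KNOWN_TAGS]

print(extract_tags("我想看刘德华主演的电影"))  # expected: ['刘德华', '电影']
```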
step S9, after the keyword tags are extracted, performing context logic processing to form an action instruction and returning it to the client of the television or set-top box;
for example, the user first says 'any movies from the 1980s?' and then says 'starring Liu Dehua'; the result of context logic processing is a result set matching all three keywords '1980s', 'Liu Dehua' and 'movie';
after context logic processing a new group of keyword tags is obtained and the intent is clear; follow-up operations, such as a basic search or a preset action, are then carried out according to the business scenario. For a basic search, for example when programs related to '1980s movies' are wanted, the television program data carrying the '1980s' and 'movie' tags are queried directly and displayed on the television large screen; for a preset action, for example 'turn the volume down a little', the television large-screen end is directly notified to turn down the volume;
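A minimal sketch of the context logic processing described above: tags from the previous utterance are merged with the current ones, and an earlier tag is dropped when it is mutually exclusive with a current tag; the exclusion table is an illustrative stand-in for the relationships stored in the graph database.

```python
# Sketch only: the mutual-exclusion pairs are illustrative stand-ins for the
# relations stored in the graph database.
MUTUALLY_EXCLUSIVE = {frozenset({"children", "horror"})}

def merge_context(previous_tags, current_tags):
    """Combine the previous utterance's tags with the current ones (step S9)."""
    merged = list(current_tags)
    for old in previous_tags:
        if old in merged:
            continue
        # Drop an earlier tag if it is mutually exclusive with a current tag,
        # e.g. 'horror' from the last turn versus 'children' now.
        if any(frozenset({old, new}) in MUTUALLY_EXCLUSIVE for new in current_tags):
            continue
        merged.append(old)
    return merged

# First turn: "any movies from the 1980s?"  Second turn: "starring Liu Dehua".
turn1 = ["1980s", "movie"]
turn2 = ["Liu Dehua"]
print(merge_context(turn1, turn2))  # -> ['Liu Dehua', '1980s', 'movie']
```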
step S10, after receiving the action instruction from the server, the client executing the corresponding action, such as changing the channel, playing, or moving the cursor.
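On the television or set-top-box client, the returned action instruction could be dispatched roughly as in the following sketch; the instruction format and handler functions are assumptions, and a real client would call the platform's own playback and volume APIs.

```python
# Sketch only: the instruction format ({"action": ..., "payload": ...}) and the
# handler functions are hypothetical.
def do_search(payload):
    print("display search results for tags:", payload.get("tags"))

def do_volume_down(payload):
    print("turn the volume down")

def do_change_channel(payload):
    print("switch to channel", payload.get("channel"))

HANDLERS = {
    "search": do_search,
    "volume_down": do_volume_down,
    "change_channel": do_change_channel,
}

def execute_instruction(instruction):
    """Step S10: run the action the server returned after context processing."""
    handler = HANDLERS.get(instruction.get("action"))
    if handler:
        handler(instruction.get("payload", {}))

execute_instruction({"action": "search", "payload": {"tags": ["1980s", "movie", "Liu Dehua"]}})
```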
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (6)

1. A cross-screen voice interaction implementation method based on NLP natural language processing, characterized by comprising the following steps:
step S1, constructing a custom lexicon and tag system using a graph database;
step S2, tagging the television programs managed by the platform with keyword tags, storing the television program information, the keyword tag information and the associations between the two, and providing an external query interface so that television programs can be retrieved from incoming keyword information;
step S3, applying for a WeChat applet account;
step S4, developing the WeChat applet: adding a login page and integrating the WeChat applet login authentication interface; adding a voice remote-controller page and integrating the WeChat applet recording-permission authentication and recording interface; and integrating a speech translation interface;
step S5, generating a two-dimensional code for the WeChat applet and adding it to the client application of the television or set-top box;
step S6, scanning the two-dimensional code to open the WeChat applet and notifying the client to establish a polling connection with the server, after which the set-top box starts sending polling requests to the server;
step S7, using the recording function in the WeChat applet and sending a request to the server once the audio file is recorded;
step S8, the server translating the audio, segmenting the translated text into words, and extracting keyword tags from the segmented words using the custom lexicon and tag system;
step S9, after the keyword tags are extracted, performing context logic processing to form an action instruction and returning it to the client of the television or set-top box;
step S10, the client executing the corresponding action after receiving the action instruction from the server.
2. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S1, the graph database is ArangoDB; the custom lexicon defines commonly used keywords, i.e. tag words, and the possible relationships between tag words, the relationships including subordination, mutual exclusion and similarity; and the custom lexicon further defines dimension words as abstract concepts.
3. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S4, the speech translation interface includes the translation interface provided by WeChat itself.
4. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S5, after the television large-screen end is turned on, the applet two-dimensional code is displayed on a designated page, and the mobile-terminal small screen opens the applet by scanning the two-dimensional code and controls the television large-screen end with the mobile phone's voice input.
5. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S8, the server segments the translated text using the open-source jieba Chinese word-segmentation NLP library.
6. The NLP natural language processing-based cross-screen voice interaction implementation method according to claim 1, wherein: in step S9, the context logic processing combines the previous user utterance with the current one when analyzing the current user intent.
CN202111109330.0A 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing Pending CN113763955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111109330.0A CN113763955A (en) 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing

Publications (1)

Publication Number Publication Date
CN113763955A true CN113763955A (en) 2021-12-07

Family

ID=78796781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111109330.0A Pending CN113763955A (en) 2021-09-22 2021-09-22 Cross-screen voice interaction implementation method based on NLP natural language processing

Country Status (1)

Country Link
CN (1) CN113763955A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105763898A (en) * 2014-12-16 2016-07-13 上海天脉聚源文化传媒有限公司 Method and system for controlling IPTV set-top box by voice
CN106804001A (en) * 2017-02-28 2017-06-06 山东浪潮商用系统有限公司 A kind of method and system by wechat client remote control set-box
CN107241652A (en) * 2017-06-28 2017-10-10 百视通网络电视技术发展有限责任公司 A kind of TV speech remote control system and method based on wechat small routine
US20180041453A1 (en) * 2015-12-28 2018-02-08 Goertek Inc. Method and device for interaction between smart watch and wechat platform, and smart watch
CN107918286A (en) * 2017-12-13 2018-04-17 福建师范大学福清分校 A kind of intelligent home control system and implementation method based on equipment room linkage characteristic
CN109194988A (en) * 2018-09-06 2019-01-11 广州高清视信数码科技股份有限公司 A kind of one-way set-top box voice control channel switching method and system
CN110706696A (en) * 2019-09-25 2020-01-17 珠海格力电器股份有限公司 Voice control method and device
CN110781402A (en) * 2020-01-02 2020-02-11 南京创维信息技术研究院有限公司 System and method for realizing multi-round deep retrieval on television based on Tianmao elfin



Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination