CN112506405B

CN112506405B - Artificial intelligent voice large screen command method based on Internet supervision field

Info

Publication number: CN112506405B
Application number: CN202011396329.6A
Authority: CN
Inventors: 刘磊; 侯居永; 栾丽丽; 陈兆亮
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2022-05-31
Anticipated expiration: 2040-12-03
Also published as: CN112506405A

Abstract

The invention provides an artificial intelligence voice command method for large screens in the field of Internet supervision, and belongs to the field of Internet Plus supervision.

Description

Artificial intelligent voice large screen command method based on Internet supervision field

Technical Field

The invention relates to the technical field of Internet Plus supervision, and in particular to an artificial intelligence voice command method for large screens in the field of Internet supervision.

Background

During operation, the system accumulates a large amount of service data. By building a data warehouse model, the service data is analyzed in depth from multiple angles, so that information hidden in the data is fully mined and provides a scientific basis for leadership decision-making. Supervision data of various kinds is aggregated and, relying on intelligent, applicable data analysis models and tools, supervision data and supervision results are classified along multiple dimensions such as supervision industry, supervision region and supervision field. Big data analysis algorithms such as regression analysis, cluster analysis, heat analysis and classification analysis are used to analyze and display supervision risks from the perspectives of supervision objects, supervision matters and supervision behaviors, producing multi-dimensional statistical analysis of supervision work by region, department and so on. With big data, governments can manage cities more precisely and better serve the public, achieving standardized supervision, precise supervision, joint supervision and supervision of supervision. The data can also be analyzed to provide early risk warning and to drive business decisions.

Existing systems focus only on multi-dimensional analysis and processing of the data and on the display effect of the visualization system, and often neglect interaction with the user; they emphasize display at the expense of interaction.

Disclosure of Invention

To solve the above technical problem, the invention provides an artificial intelligence voice command method for large screens in the field of Internet supervision.

The method applies real-time speech-to-text, natural language processing and semantic analysis technologies to visual large-screen analysis in the field of Internet supervision. It aims to improve the traditional mode of interaction between user and system, improve the user experience, and add a more intelligent input processing mode so that the system is more user-friendly.

The technical scheme of the invention is as follows:

An artificial intelligence voice command method for large screens in the field of Internet supervision, in which:

The user's speech is recognized and converted into text in real time. Keywords are extracted by a natural language processing algorithm to obtain the main information the user wants to express. This information is processed by a back-end algorithm and transmitted to the front end over the WebSocket protocol, where a page control method is called to simulate a user click and display the information the user wants, so that voice replaces the user's click operations.
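The flow described above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the keyword-to-command table, the command names and the `send` callback (standing in for a WebSocket send) are all hypothetical, and keyword extraction is reduced to a vocabulary lookup.

```python
import json
from typing import Callable, Iterable, List, Optional

# Hypothetical mapping from domain keywords to front-end page-control commands.
KEYWORD_TO_COMMAND = {
    "risk warning": "showRiskWarning",
    "regional statistics": "showRegionStats",
}


def extract_keywords(text: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the vocabulary terms found in the transcript, in order of appearance."""
    lowered = text.lower()
    hits = [(lowered.find(term), term) for term in vocabulary if term in lowered]
    return [term for _, term in sorted(hits)]


def build_instruction(keywords: List[str]) -> Optional[str]:
    """Turn the first recognized keyword into a JSON instruction, or None if none match."""
    for keyword in keywords:
        if keyword in KEYWORD_TO_COMMAND:
            return json.dumps({"action": KEYWORD_TO_COMMAND[keyword], "keyword": keyword})
    return None


def handle_utterance(transcript: str, send: Callable[[str], None]) -> bool:
    """Process one transcribed utterance; `send` stands in for a WebSocket send call."""
    message = build_instruction(extract_keywords(transcript, KEYWORD_TO_COMMAND))
    if message is None:
        return False  # nothing actionable was said
    send(message)
    return True
```

In a deployment the transcript would come from the speech transcription engine and `send` would push the message to the browser, where a script maps the `action` field to a page method.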

Further,

Converting the user's speech into text in real time:

Real-time conversion of speech into text is based on a deep full-sequence convolutional neural network framework. For a complete audio file of 60 seconds or less, a long-lived connection is established between the application and the speech transcription engine, and the audio stream data is converted into text data in real time.

Semantic analysis and word segmentation are performed on the text to obtain term vectors, and a deep learning algorithm learns iteratively using a keyword corpus from the Internet supervision field as training data.

Vocabulary from the Internet supervision field is uploaded to the recognition engine. When these words appear in the audio stream being transcribed, the engine can recognize them, which improves recognition accuracy for specialized vocabulary; the process supports iterative training and continuous optimization.

Further,

Natural language processing and semantic analysis:

After the user's voice input has been converted into text, natural language analysis and understanding services are provided on the basis of a corpus from the Internet supervision field, using natural language processing, collaborative crowdsourcing, machine learning and neural network technologies. The text is analyzed using NLP techniques including word segmentation, part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, sentiment analysis and keyword extraction.

The core algorithms involved include a K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, TF-IDF (term frequency-inverse document frequency), the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model, and the neural-network-based FastText text classification model.
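Of the algorithms listed, TF-IDF is the simplest to illustrate for keyword extraction. The sketch below is one common smoothed variant written for illustration; the patent does not specify the weighting formula, and the tiny corpus in the usage note is invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(doc_tokens: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Score each token by term frequency times a smoothed inverse document frequency."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        # Number of corpus documents that contain the term.
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Smoothing (an assumed choice) keeps the IDF finite and positive.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def top_keywords(doc_tokens: List[str], corpus: List[List[str]], k: int = 3) -> List[str]:
    """Return the k highest-scoring tokens, breaking ties alphabetically."""
    scores = tf_idf(doc_tokens, corpus)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

With a corpus of three tokenized commands that all contain "show", calling `top_keywords(["show", "enterprise", "risk"], corpus, 2)` ranks "enterprise" and "risk" above "show", because a word that appears in every command carries little information about which page the user wants.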

Further,

A connection between the server and the client is established over the WebSocket protocol. The generated instruction information is transmitted to the front-end script, which calls the relevant method to simulate a click operation and complete the switching of pages and functions.

The server transmits an instruction to the client:

Using WebSocket technology, the server invokes the client's listener method and transmits the keywords generated in the second step to the client. The client calls different js methods according to the parameters received, thereby replacing the user's click action and completing the switch of the displayed page.
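The dispatch step, in which the client chooses a method according to the parameters it receives, can be modelled as a lookup table keyed by action name. In the patent the client side is a JavaScript front end; the Python sketch below only models the logic, and the message shape (`action`, `params`), the action names and the `PageController` class are assumptions made for illustration.

```python
import json
from typing import Callable, Dict


class PageController:
    """Stand-in for the front-end script; it only tracks which view is displayed."""

    def __init__(self) -> None:
        self.current_view = "home"

    def show_view(self, name: str) -> None:
        self.current_view = name


def encode_instruction(action: str, **params: str) -> str:
    """Server side: serialize an instruction as a JSON text frame."""
    return json.dumps({"action": action, "params": params})


def make_dispatcher(page: PageController) -> Callable[[str], bool]:
    """Client side: build an on-message handler that maps actions to page methods."""
    handlers: Dict[str, Callable[[dict], None]] = {
        "switch_page": lambda params: page.show_view(params["target"]),
        "go_home": lambda params: page.show_view("home"),
    }

    def on_message(raw: str) -> bool:
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            return False  # ignore frames that are not valid JSON
        if not isinstance(message, dict) or not isinstance(message.get("action"), str):
            return False
        handler = handlers.get(message["action"])
        if handler is None:
            return False  # unknown action: leave the page unchanged
        try:
            handler(message.get("params") or {})
        except (KeyError, TypeError):
            return False  # required parameter missing or malformed
        return True

    return on_message
```

Keeping the mapping in one table means a new voice command needs only a new entry, and unknown or malformed instructions are dropped rather than raising an error on the display page.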

1) Deep learning technology improves the speech-to-text recognition rate;

2) A corpus specific to the Internet supervision field improves the accuracy of text processing;

3) Keywords are extracted through natural language processing and semantic analysis;

4) Through WebSocket technology, the server transmits instructions to the client, achieving action triggering and function switching.

Advantageous effects of the invention

The invention improves the interactivity between the visual large screen and its users in the Internet supervision field and thereby the user experience, while making the system more intelligent and enriching its usage scenarios.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

By adding voice command control, the invention can greatly increase user participation and improve the user experience. It applies real-time speech-to-text to an Internet supervision visual analysis system, and applies natural language processing, semantic analysis and related techniques to the transcribed text in order to accurately identify and extract keywords and call the corresponding front-end handling method, thereby converting the transcribed text into operations on the page. Interaction becomes more intelligent and user-friendly.

Taking into account the actual Internet Plus supervision business, the applicable standards and the system's real usage scenarios, and complying with relevant laws and regulations, the invention provides a new mode of interaction, based on artificial intelligence natural language processing, in which the user's voice replaces click operations.

The technical scheme is as follows:

1. Converting the user's speech into text in real time:

Real-time conversion of speech into text is based on a deep full-sequence convolutional neural network framework. For a complete audio file of 60 seconds or less, a long-lived connection is established between the application and the speech transcription engine, and the audio stream data is converted into text data in real time. In addition, the user can upload specialized vocabulary from the Internet supervision field to the recognition engine. When these words appear in the audio stream being transcribed, the engine can recognize them, which improves recognition accuracy for specialized vocabulary; the process supports iterative training and continuous optimization. The system supports audio formats such as wav, pcm and mp3, single-channel audio streams at sampling rates of 8 kHz and 16 kHz, and a sampling precision of 16 bits.
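The audio constraints stated above (single channel, 8 kHz or 16 kHz, 16-bit samples) can be checked before a file is sent for transcription. The following sketch uses Python's standard `wave` module and covers WAV input only. Treating "within 60 seconds" as a maximum duration is an assumption, and the function names are hypothetical.

```python
import io
import wave
from typing import List

SUPPORTED_RATES = (8000, 16000)  # sampling rates named in the description
MAX_SECONDS = 60                 # assumed reading of "within 60 seconds"


def check_wav(data: bytes) -> List[str]:
    """Return a list of problems; an empty list means the WAV meets the constraints.

    Raises wave.Error if the bytes are not a readable WAV file.
    """
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio must be single-channel")
        if wav.getframerate() not in SUPPORTED_RATES:
            problems.append("sampling rate must be 8 kHz or 16 kHz")
        if wav.getsampwidth() != 2:
            problems.append("samples must be 16-bit")
        if wav.getnframes() / wav.getframerate() > MAX_SECONDS:
            problems.append("audio must not exceed 60 seconds")
    return problems


def make_silent_wav(rate: int, channels: int, seconds: float) -> bytes:
    """Build a silent 16-bit WAV in memory, useful for exercising check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buffer.getvalue()
```

Rejecting unsupported audio on the application side avoids opening a long-lived connection to the transcription engine for input it cannot process.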

2. Natural language processing and semantic analysis:

After the user's voice input has been converted into text, personalized, integrated, intelligent and diversified natural language analysis and understanding services are provided on the basis of a corpus specific to the Internet supervision field, using natural language processing, collaborative crowdsourcing, machine learning, neural network and related technologies. Efficient and accurate analysis of the text is provided using NLP techniques such as word segmentation, part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, sentiment analysis and keyword extraction.

The core algorithms involved include a K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, TF-IDF (term frequency-inverse document frequency), the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model, the neural-network-based FastText text classification model, and the like.
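Shortest-path word segmentation treats each character boundary as a graph node and each dictionary word as an edge, then finds the cheapest path from the start of the text to its end. The sketch below shows only the single-best-path case (K = 1) using Dijkstra's algorithm. The frequency dictionary, the cost function and the penalty for unknown characters are assumptions for illustration and are not taken from the patent.

```python
import heapq
import math
from typing import Dict, List


def segment(text: str, freq: Dict[str, int], max_len: int = 6) -> List[str]:
    """Segment text by running Dijkstra's algorithm over its word lattice."""
    total = sum(freq.values())
    # Assumed penalty for a single character that is not in the dictionary.
    unknown_cost = math.log(total + 1) + 1.0
    n = len(text)
    dist = [math.inf] * (n + 1)  # cheapest known cost to reach each boundary
    prev = [-1] * (n + 1)        # boundary from which each boundary was reached
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale queue entry
        if i == n:
            break     # the end of the text has its final cost
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in freq:
                cost = -math.log(freq[word] / total)  # frequent words are cheap
            elif j == i + 1:
                cost = unknown_cost  # keeps every position reachable
            else:
                continue
            if d + cost < dist[j]:
                dist[j] = d + cost
                prev[j] = i
                heapq.heappush(heap, (d + cost, j))
    # Walk back from the end of the text to recover the chosen words.
    words = []
    j = n
    while j > 0:
        words.append(text[prev[j]:j])
        j = prev[j]
    return words[::-1]
```

All edge costs are non-negative, which Dijkstra's algorithm requires. A K-shortest-path variant would keep several candidate paths per node so that a later stage, such as an HMM or CRF tagger, can choose among alternative segmentations.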

3. The server transmits an instruction to the client:

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An artificial intelligence voice command method for large screens in the field of Internet supervision, characterized in that:

the user's speech is recognized and converted into text in real time, keywords are extracted by a natural language processing algorithm to obtain the main information the user wants to express, the main information is processed by a back-end algorithm and transmitted to the front end over the WebSocket protocol, and a page control method is called to simulate a user click operation and display the information the user wants, so that voice replaces the user's click operations;

converting the user's speech into text in real time:

real-time conversion of speech into text is based on a deep full-sequence convolutional neural network framework, and, for a complete audio file of 60 seconds or less, audio stream data is converted into text data in real time by establishing a long-lived connection between the application and the speech transcription engine;

semantic analysis and word segmentation are performed on the text to obtain term vectors, and a deep learning algorithm learns iteratively using a keyword corpus from the Internet supervision field as training data;

vocabulary from the Internet supervision field is uploaded to the recognition engine, and when the vocabulary appears in the audio stream being transcribed it can be recognized by the engine, so that recognition accuracy for specialized vocabulary is improved and iterative training and continuous optimization can be achieved;

natural language processing and semantic analysis:

2. The method of claim 1,

3. The method of claim 1,

4. The method of claim 3,

the server transmits an instruction to the client:

using WebSocket technology, the server invokes the client's listener method and transmits the keywords generated in the second step to the client, and the client calls different js methods according to the parameters received, so as to replace the user's click action and complete the switch of the displayed page.