CN112836517A

CN112836517A - Method for mining risk signals based on natural language processing

Info

Publication number: CN112836517A
Application number: CN202110108106.3A
Authority: CN
Inventors: 崔胜辉; 侯居永; 栾丽丽; 陈兆亮
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2021-01-27
Filing date: 2021-01-27
Publication date: 2021-05-25

Abstract

The invention provides a method for mining risk signals based on natural language processing, which belongs to the field of "Internet plus supervision" applications.

Description

Method for mining risk signals based on natural language processing

Technical Field

The invention relates to the field of "Internet plus supervision" applications, and in particular to a method for mining risk signals based on Natural Language Processing (NLP).

Background

At present, the supervision resources of each supervision department are limited, and supervision tasks originate mainly from random selection and planned tasks. Given the massive number of supervised objects and the strained supervision resources, this way of sourcing supervision tasks leaves large blind spots and has low precision. Meanwhile, massive amounts of data sit unused because effective risk clues cannot be extracted from them, which highlights the waste of resources. How to use the strained supervision resources reasonably and how to use big data resources effectively are urgent problems for regulatory departments.

The risk early warning engine system belongs to the risk early warning category in the field of "Internet plus supervision". It uses constructed data analysis models to find potential risk early warning signals in massive data and drives the supervising bodies to carry out targeted supervision, so that supervision blind spots are reduced and supervision accuracy is improved. However, most of the data is currently in large-text format and is of low quality: it has no structure, no summary, and no extracted features. The engine therefore cannot analyze the data accurately, and preprocessing such as information extraction and formatting must be performed first.

Data imported through the system is generally in this large-text format, so the risk early warning engine and other systems are not suited to performing business analysis on it directly. As a result, the business systems cannot make deep use of the massive data, and the data is wasted.

Disclosure of Invention

In order to solve the above technical problem, the present invention provides a method for mining risk signals based on Natural Language Processing (NLP).

According to the method, the content of the large text is analyzed; key information such as objects, time, places, and emotions is extracted and stored in a formatted manner; and the preprocessed data is then analyzed to extract risk early warning signals.

The technical scheme of the invention is as follows:

A risk early warning engine system preprocesses complaint reports and online public opinion data based on artificial intelligence natural language processing technology and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content based on the constructed risk early warning analysis model to generate a risk early warning signal, and accurately transmits the signal to the relevant department.

Further,

the text information is processed into term vectors through semantic analysis and word segmentation, a keyword corpus in the Internet supervision field is used as the training object, and the key instructions capable of representing the text information are found through iterative learning with a deep learning algorithm.

The extracted information comprises the objects of complaints and reports, time, place, and emotion information.
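As an illustration only, the formatted storage of object, time, place, and emotion can be sketched as a structured record filled from free text. The sketch below is not the patented method: it swaps the trained NLP models for regular expressions and small hand-written word lists, and every name in it (`ComplaintRecord`, `KNOWN_PLACES`, `NEGATIVE_WORDS`, the company suffixes) is a hypothetical assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons; a real system would use trained NER and sentiment models.
KNOWN_PLACES = ["Jinan", "Qingdao"]
NEGATIVE_WORDS = {"expired", "refused", "fraud", "unsafe"}

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Assumption: the object is a capitalized name ending in a business suffix.
ORG_RE = re.compile(r"((?:[A-Z][a-z]+ )+(?:Supermarket|Company|Restaurant))")


@dataclass
class ComplaintRecord:
    obj: Optional[str]
    time: Optional[str]
    place: Optional[str]
    emotion: str


def extract(text: str) -> ComplaintRecord:
    """Turn one large-text complaint into a structured record."""
    date = DATE_RE.search(text)
    org = ORG_RE.search(text)
    place = next((p for p in KNOWN_PLACES if p in text), None)
    words = set(re.findall(r"[a-z]+", text.lower()))
    emotion = "negative" if words & NEGATIVE_WORDS else "neutral"
    return ComplaintRecord(
        obj=org.group(1) if org else None,
        time=date.group(0) if date else None,
        place=place,
        emotion=emotion,
    )


record = extract(
    "On 2021-01-05 the Sunny Day Supermarket in Jinan sold expired milk "
    "and refused a refund."
)
```

A record of this shape maps directly onto one row of a relational table, which is what allows the later analysis step to query the data instead of re-reading the raw text.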

Still further,

natural language processing and semantic analysis:

When the risk early warning engine performs a preliminary analysis of the text data, it relies on natural language processing, collaborative crowdsourcing, machine learning, and neural network technologies to provide natural language analysis and understanding services. It provides analysis services for text information using the following NLP processing techniques: word segmentation, part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, sentiment (emotion) analysis, and keyword extraction.

The algorithms involved include the K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the word2vec (W2V) word vector model, the conditional random field (CRF) model, and the neural-network-based FastText text classification model.
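Of the algorithms listed, TF-IDF is the simplest to show in a few lines. The following is a minimal sketch of the textbook weighting (term frequency multiplied by the logarithm of inverse document frequency), assuming the documents are already tokenized and non-empty. The sample documents are invented for illustration, and the source does not state which TF-IDF variant the engine uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Invented sample data: three already-segmented complaint texts.
docs = [
    ["restaurant", "expired", "food", "complaint"],
    ["restaurant", "noise", "complaint"],
    ["elevator", "fault", "complaint"],
]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "complaint" here, receives weight zero, while terms concentrated in a single document are weighted highest. That is why the measure is useful for keyword extraction.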

Analyzing a risk early warning signal:

Models for analyzing risk early warning signals are stored in a relational database. Users can also add custom analysis models, expanding the model library to adapt to dynamic changes in the focus of early warning.

The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
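One way to picture a model library held in a relational database is to store each analysis model as a row of parameters that the engine reads at run time. The sketch below uses an in-memory SQLite database. The table names, columns, and the single "repeated negative complaints" model are all hypothetical assumptions; the source does not describe the schema or the form of the models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE complaint (obj TEXT, time TEXT, place TEXT, emotion TEXT);
CREATE TABLE analysis_model (name TEXT, emotion TEXT, min_count INTEGER);
CREATE TABLE risk_signal (model TEXT, obj TEXT, hits INTEGER);
""")

# Preprocessed, formatted data (invented sample rows).
conn.executemany("INSERT INTO complaint VALUES (?, ?, ?, ?)", [
    ("Shop A", "2021-01-03", "Jinan", "negative"),
    ("Shop A", "2021-01-04", "Jinan", "negative"),
    ("Shop A", "2021-01-05", "Jinan", "negative"),
    ("Shop B", "2021-01-05", "Jinan", "negative"),
    ("Shop B", "2021-01-06", "Jinan", "neutral"),
])

# A user-defined model is just another row in the model table.
conn.execute(
    "INSERT INTO analysis_model VALUES (?, ?, ?)",
    ("repeat_negative", "negative", 3),
)


def run_models(conn):
    """Apply every stored model and write the resulting signals back."""
    models = conn.execute(
        "SELECT name, emotion, min_count FROM analysis_model"
    ).fetchall()
    for name, emotion, min_count in models:
        rows = conn.execute(
            "SELECT obj, COUNT(*) FROM complaint WHERE emotion = ? "
            "GROUP BY obj HAVING COUNT(*) >= ?",
            (emotion, min_count),
        ).fetchall()
        conn.executemany(
            "INSERT INTO risk_signal VALUES (?, ?, ?)",
            [(name, obj, hits) for obj, hits in rows],
        )
    conn.commit()


run_models(conn)
signals = conn.execute("SELECT model, obj, hits FROM risk_signal").fetchall()
```

Because the models are data rather than code, adding a new early warning focus in this sketch only requires inserting a row, with no change to the engine.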

Pushing the risk early warning signal:

The generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or database table exchange, according to the problem type of the signal, the supervision unit responsible for that type, and historical experience in dispatching such signals.
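The HTTP POST path can be sketched with the Python standard library. The routing table, unit names, and URLs below are placeholders invented for illustration, and the sketch routes only by signal type; it does not model the use of historical dispatch experience. The request is built but not sent, so the example runs without a network.

```python
import json
import urllib.request

# Hypothetical routing table: signal type -> (supervision unit, endpoint URL).
ROUTES = {
    "food_safety": ("market_supervision", "http://example.invalid/api/signals"),
    "elevator_fault": ("special_equipment", "http://example.invalid/api/equipment"),
}
DEFAULT_ROUTE = ("general_office", "http://example.invalid/api/unrouted")


def build_push_request(signal: dict) -> urllib.request.Request:
    """Pick the target platform by signal type and build the POST request."""
    unit, url = ROUTES.get(signal["type"], DEFAULT_ROUTE)
    body = json.dumps({**signal, "unit": unit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_push_request({"type": "food_safety", "obj": "Shop A", "hits": 3})
# To actually send the signal: urllib.request.urlopen(req, timeout=5)
```

The library table exchange path would instead write the same payload into a table shared with the receiving platform.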

The invention has the advantages that

Risk early warning signals generated from massive data can reflect potential risk points and serve as more accurate leads for supervision work. Meanwhile, limited resources are used reasonably, and automatically pushing the generated risk early warning signals to the relevant supervision units is more efficient.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

The risk early warning engine system is applied to the field of "Internet plus supervision" and is used specifically to extract valuable information such as objects, time, places, and emotions from large-text content so that risk early warning signals can be analyzed.

The method preprocesses the content of the large text, mines information such as objects, time, places, and emotions, and writes the information back to a database, thereby giving the content structure and a summary. After this stage of analysis and processing, the risk early warning engine can use the mined, structured result data directly for business analysis and calculation. This improves the retrieval speed of the business data and greatly improves the utilization of the data.

The invention is applied to a risk early warning engine system and belongs to the field of "Internet plus supervision" applications. The risk early warning engine system preprocesses data based on artificial intelligence natural language processing technology and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content based on the constructed risk early warning analysis model to generate a risk early warning signal, and accurately transmits the signal to the relevant department. The technical scheme of the invention is as follows:

1. Natural language processing and semantic analysis:

When the risk early warning engine performs a preliminary analysis of large-text data, it relies on natural language processing, collaborative crowdsourcing, machine learning, neural network, and similar technologies to provide personalized, integrated, intelligent, and diversified natural language analysis and understanding services. It provides efficient and accurate analysis services for text information using NLP processing techniques such as word segmentation, part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, sentiment (emotion) analysis, and keyword extraction. The core algorithms involved include the K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the word2vec (W2V) word vector model, the conditional random field (CRF) model, the neural-network-based FastText text classification model, and the like.
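Shortest-path word segmentation and Dijkstra's algorithm are commonly combined by treating character positions as graph nodes and dictionary words as weighted edges. The cheapest path from the start of the text to its end then gives the segmentation. The sketch below shows only the single-best-path case and assumes a tiny hypothetical dictionary with hand-picked costs (lower cost meaning a more likely word). A K-shortest-path variant would keep the K best paths instead of one, and the source does not say how the engine builds or weights its graph.

```python
import heapq

# Hypothetical dictionary: word -> cost (lower means more likely).
DICT = {
    "risk": 1.0, "warning": 1.0, "war": 2.0, "ning": 5.0,
    "signal": 1.0, "sign": 2.0, "al": 4.0,
}
UNKNOWN_COST = 10.0  # cost of emitting one character that is not a word


def segment(text):
    """Segment text along the cheapest path through the word graph."""
    n = len(text)
    dist = {0: 0.0}   # best known cost to reach each character position
    prev = {}         # predecessor position on the best path
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == n:
            break
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(i + 1, n + 1):
            cost = DICT.get(text[i:j])
            if cost is None:
                if j > i + 1:
                    continue
                cost = UNKNOWN_COST  # single-character fallback edge
            if d + cost < dist.get(j, float("inf")):
                dist[j] = d + cost
                prev[j] = i
                heapq.heappush(heap, (d + cost, j))
    # Walk the predecessor links back from the end of the text.
    words, pos = [], n
    while pos > 0:
        words.append(text[prev[pos]:pos])
        pos = prev[pos]
    return words[::-1]


words = segment("riskwarningsignal")
```

The single-character fallback edge guarantees that the end of the text is always reachable, so text containing out-of-dictionary characters is still segmented rather than rejected.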

2. Analyzing a risk early warning signal:

Models for analyzing risk early warning signals are stored in a relational database, so that users can horizontally expand the number of data analysis models and enrich the model library. The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.

3. Pushing the risk early warning signal:

1) Based on a deep learning technology, the recognition rate from voice to text is improved;

2) the accuracy rate of text processing is improved based on a scene specific corpus in the Internet supervision field;

3) extracting keywords through natural language processing and semantic analysis;

The text information is processed into term vectors through semantic analysis, word segmentation, and the like. With a keyword corpus specific to the Internet supervision field as the training object, iterative learning with a deep learning algorithm finds the key instructions that represent the text information more accurately and improves the recognition accuracy of the algorithm.

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for mining risk signals based on natural language processing, characterized in that,

data is preprocessed based on artificial intelligence natural language processing technology, and the extracted information is formatted and stored; the preprocessed, formatted content is then analyzed based on the constructed risk early warning analysis model to generate a risk early warning signal, and the signal is accurately transmitted to the relevant department.

2. The method of claim 1,

the text information is processed into term vectors through semantic analysis and word segmentation, a keyword corpus in the Internet supervision field is used as the training object, iterative learning is carried out with a deep learning algorithm, and the instructions representing the text information are found.

3. The method of claim 2,

the extracted information includes object, time, place, and emotion information.

4. The method according to claim 2 or 3,

natural language processing and semantic analysis:

5. The method of claim 4,

6. The method of claim 5,

analyzing a risk early warning signal:

7. The method of claim 6,

8. The method of claim 7,

push risk early warning signal