CN112836517A - Method for processing mining risk signal based on natural language - Google Patents


Info

Publication number
CN112836517A
Authority
CN
China
Prior art keywords
early warning
risk early
model
analysis
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110108106.3A
Other languages
Chinese (zh)
Inventor
崔胜辉
侯居永
栾丽丽
陈兆亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-27
Filing date
2021-01-27
Publication date
2021-05-25
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110108106.3A
Publication of CN112836517A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The invention provides a method for mining risk signals based on natural language processing, which belongs to the field of Internet Plus Supervision applications.

Description

Method for processing mining risk signal based on natural language
Technical Field
The invention relates to the field of Internet Plus Supervision applications, and in particular to a method for mining risk signals based on natural language processing (NLP).
Background
At present, each regulatory department has limited supervision resources, and supervision tasks originate mainly from random matching and planned tasks. Given the massive number of supervised objects and the strained supervision resources, this way of sourcing supervision tasks leaves large blind spots and has low precision. Meanwhile, massive amounts of data sit unused because effective risk clues cannot be extracted from them, which highlights the waste of resources. How to allocate strained resources reasonably and how to make effective use of big data resources are urgent problems for regulatory departments.
The risk early warning engine system belongs to the risk early warning category in the field of Internet Plus Supervision. It uses constructed data analysis models to analyze potential risk early warning signals from massive data and drives the supervising body to carry out targeted supervision, thereby reducing supervision blind spots and improving supervision accuracy. However, at present the most forward-looking data is in large-text format and has low-quality characteristics: it is unstructured, unabstracted and lacks extracted features. The engine therefore cannot analyze the data accurately, and preprocessing such as information extraction and formatting must be performed first.
Data imported through the system is generally in this large-text format, so the risk early warning engine and other systems are not suited to performing business analysis on it directly. As a result, business systems cannot make in-depth use of the massive data, which is wasted.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a method for mining risk signals based on Natural Language Processing (NLP).
The method analyzes large-text content, extracts key information such as objects, times, places and emotions, stores it in a formatted manner, and then analyzes the preprocessed data to extract risk early warning signals.
The technical scheme of the invention is as follows:
A risk early warning engine system preprocesses complaint reports and online public opinion data using artificial intelligence natural language processing technology and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content using the constructed risk early warning analysis model to generate risk early warning signals, and transmits each signal accurately to the relevant department.
Further,
the text information is processed into term vectors through semantic analysis and word segmentation; a keyword corpus from the Internet supervision field is used as the training object; and key instructions that can represent the text information are found through iterative learning with a deep learning algorithm.
The extracted information comprises complaint reporting objects, time, places and emotional information.
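The formatted storage of extracted information can be pictured as one structured record per complaint. The following is a minimal sketch only: the `ComplaintRecord` type, the keyword lists and the regular expressions are hypothetical stand-ins for the trained entity recognition and emotion analysis models that the description refers to.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons; a real system would use trained NER and emotion models.
NEGATIVE_WORDS = {"expired", "refused", "unsafe", "fake"}
KNOWN_PLACES = ("Jinan", "Qingdao")


@dataclass
class ComplaintRecord:
    """Structured form of one large-text complaint (field names are assumptions)."""
    obj: Optional[str]    # complained-about object, e.g. a business
    time: Optional[str]   # first date mentioned in the text
    place: Optional[str]  # first known place mentioned in the text
    emotion: str          # "negative" or "neutral"


def extract_record(text: str) -> ComplaintRecord:
    # Time: first ISO-style date in the text.
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    # Object: capitalised words followed by a business-type suffix.
    obj = re.search(r"((?:[A-Z][a-z]+ )+(?:Supermarket|Pharmacy|Restaurant))", text)
    # Place: first entry of the place list that occurs in the text.
    place = next((p for p in KNOWN_PLACES if p in text), None)
    # Emotion: negative if any negative lexicon word occurs.
    words = set(re.findall(r"[a-z]+", text.lower()))
    emotion = "negative" if words & NEGATIVE_WORDS else "neutral"
    return ComplaintRecord(
        obj=obj.group(1) if obj else None,
        time=date.group(0) if date else None,
        place=place,
        emotion=emotion,
    )
```

A record of this shape can be written to a database row as-is, which is what lets later analysis steps query objects, times, places and emotions without re-reading the raw text.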
Still further,
natural language processing and semantic analysis:
when the risk early warning engine performs preliminary analysis of the text data, natural language analysis and understanding services are provided by means of natural language processing, collaborative crowdsourcing, machine learning and neural network technologies. Analysis services for text information are provided using the following NLP processing techniques: word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, emotion (sentiment) analysis and keyword extraction.
The algorithms involved include a K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model.
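Of the listed algorithms, TF-IDF is the simplest to illustrate for keyword extraction. The sketch below is a plain-Python version under stated assumptions: documents are already tokenised, the score is term frequency multiplied by `log(N / df)`, and the function names `tf_idf` and `top_keywords` are illustrative rather than taken from the patent.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenised document.

    score = (count of term in doc / doc length) * log(N / document frequency)
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts a term at most once
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


def top_keywords(docs, k=2):
    """Return the k highest-scoring terms per document (ties broken alphabetically)."""
    return [sorted(s, key=lambda t: (-s[t], t))[:k] for s in tf_idf(docs)]
```

Terms that occur in every document get a score of zero, so generic words drop out and the terms that distinguish one complaint from the others rise to the top.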
Analyzing risk early warning signals:
the models used to analyze risk early warning signals are stored in a relational database; users can also add custom analysis models, expanding the model library to adapt to dynamic changes in their own early-warning focus.
The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
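One way to read this step is that each analysis model is a row of parameters in the relational database, and running a model is a query over the preprocessed records. The sketch below uses an in-memory SQLite database; the table names, columns and the "at least N complaints with a given emotion per object" rule are all assumptions made for illustration, not the patent's schema.

```python
import sqlite3


def build_db():
    """Create hypothetical tables for models, preprocessed records and signals."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE analysis_model (
            id INTEGER PRIMARY KEY, name TEXT, emotion TEXT, min_count INTEGER);
        CREATE TABLE complaint (
            id INTEGER PRIMARY KEY, obj TEXT, place TEXT, emotion TEXT);
        CREATE TABLE risk_signal (
            id INTEGER PRIMARY KEY, model_id INTEGER, obj TEXT, hits INTEGER);
        """
    )
    return conn


def add_model(conn, name, emotion, min_count):
    """User-defined model: flag objects with >= min_count records of this emotion."""
    conn.execute(
        "INSERT INTO analysis_model (name, emotion, min_count) VALUES (?, ?, ?)",
        (name, emotion, min_count),
    )


def run_models(conn):
    """Apply every stored model and persist the resulting risk signals."""
    models = conn.execute(
        "SELECT id, emotion, min_count FROM analysis_model"
    ).fetchall()
    for model_id, emotion, min_count in models:
        rows = conn.execute(
            "SELECT obj, COUNT(*) FROM complaint WHERE emotion = ? "
            "GROUP BY obj HAVING COUNT(*) >= ?",
            (emotion, min_count),
        ).fetchall()
        conn.executemany(
            "INSERT INTO risk_signal (model_id, obj, hits) VALUES (?, ?, ?)",
            [(model_id, obj, hits) for obj, hits in rows],
        )
    return conn.execute(
        "SELECT model_id, obj, hits FROM risk_signal ORDER BY id"
    ).fetchall()
```

Because models are data rather than code, adding a new rule is an insert into `analysis_model`, which matches the idea of letting users extend the model library.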
Pushing risk early warning signals:
the generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or database table exchange, according to the problem type of the signal, the supervision unit responsible for that type, and historical experience in dispatching such signals.
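The HTTP POST variant of the push can be sketched with the Python standard library. This is an illustration under assumptions: the JSON payload layout, the field names and the endpoint URL are hypothetical, since the patent does not specify the receiving platform's interface.

```python
import json
import urllib.request


def build_push_request(signal: dict, endpoint: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one risk signal."""
    body = json.dumps(signal, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def push(signal: dict, endpoint: str, timeout: float = 5.0) -> int:
    """Send the signal to the receiving platform and return the HTTP status code."""
    req = build_push_request(signal, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending keeps the payload format testable without a live receiving platform; a call such as `push(signal, "http://example.invalid/api/risk-signals")` would only succeed against a real endpoint.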
Advantageous effects of the invention
Risk early warning signals generated from massive data reflect potential risk points and serve as more accurate clues for carrying out supervision work. In addition, limited resources are used reasonably, and automatically pushing the generated risk early warning signals to the relevant supervision units is more efficient.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The risk early warning engine system is applied in the field of Internet Plus Supervision and is dedicated to extracting valuable information such as objects, times, places and emotions from large-text content in order to analyze risk early warning signals.
The method preprocesses the large-text content, mines information such as objects, times, places and emotions, and writes it back to a database, thereby structuring and abstracting the content. After this step, the risk early warning engine can use the mined, structured result data directly for business analysis and calculation. This improves the retrieval speed of business data and greatly increases data utilization.
The invention is applied to a risk early warning engine system in the field of Internet Plus Supervision applications. The system preprocesses data using artificial intelligence natural language processing technology and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content using the constructed risk early warning analysis model to generate risk early warning signals, and transmits each signal accurately to the relevant department. The technical scheme of the invention is as follows:
1. Natural language processing and semantic analysis:
When the risk early warning engine performs preliminary analysis of large-text data, personalized, integrated, intelligent and diversified natural language analysis and understanding services are provided by relying on natural language processing, collaborative crowdsourcing, machine learning, neural network and related technologies. Efficient and accurate analysis services for text information are provided using NLP processing techniques such as word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, emotion (sentiment) analysis and keyword extraction. The core algorithms involved include a K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model, among others.
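Shortest-path word segmentation and Dijkstra's algorithm fit together naturally: character positions form the nodes of a word lattice, every dictionary word is an edge, and the cheapest path from the start to the end of the text is the segmentation. The sketch below finds only the single best path (the K = 1 case) and assumes a hypothetical dictionary of non-negative word costs, for example negative log probabilities; unknown single characters get a fixed penalty.

```python
import heapq


def segment(text, costs, unknown_cost=10.0):
    """Segment text by running Dijkstra over a word lattice.

    Nodes are character offsets 0..len(text). An edge i -> j exists when
    text[i:j] is in `costs`, or when j == i + 1 (unknown single character).
    All costs must be non-negative for Dijkstra to be valid.
    """
    n = len(text)
    max_len = max(map(len, costs), default=1)
    dist = [float("inf")] * (n + 1)
    prev = [None] * (n + 1)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale queue entry
        if i == n:
            break  # reached the end of the text with the cheapest path
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in costs:
                w = costs[word]
            elif j == i + 1:
                w = unknown_cost  # fall back to a single unknown character
            else:
                continue
            if d + w < dist[j]:
                dist[j] = d + w
                prev[j] = i
                heapq.heappush(heap, (dist[j], j))
    # Walk the predecessor links back from the end to recover the words.
    words, j = [], n
    while j > 0:
        i = prev[j]
        words.append(text[i:j])
        j = i
    return words[::-1]
```

A K-shortest-path variant would keep the K cheapest partial paths per node instead of one, so that a later model such as an HMM or CRF can choose among candidate segmentations.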
2. Analyzing risk early warning signals:
The models used to analyze risk early warning signals are stored in a relational database, so that users can horizontally expand the number of data analysis models and enrich the model library. The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
3. Pushing risk early warning signals:
The generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or database table exchange, according to the problem type of the signal, the supervision unit responsible for that type, and historical experience in dispatching such signals.
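The three routing inputs named here (problem type, responsible supervision unit, dispatch history) can be combined in a small selection function. This is a sketch under assumptions: the mapping from types to candidate units, the history format of `(type, unit)` pairs and the "most frequently used unit wins" rule are illustrative choices, not the patent's specified logic.

```python
from collections import Counter


def choose_unit(signal_type, type_to_units, history):
    """Pick the supervision unit that should receive a signal.

    type_to_units: {signal type: [candidate units responsible for that type]}
    history: iterable of (signal type, unit) pairs from past dispatches
    Returns the candidate that handled this type most often in the past;
    with no history, the first listed candidate; None for an unknown type.
    """
    candidates = type_to_units.get(signal_type, [])
    if not candidates:
        return None
    handled = Counter(
        unit for t, unit in history if t == signal_type and unit in candidates
    )
    # max() returns the first maximal element, so list order breaks ties.
    return max(candidates, key=lambda u: handled[u])
```

The chosen unit then determines which platform endpoint or exchange table the signal is written to.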
1) the speech-to-text recognition rate is improved based on deep learning technology;
2) the accuracy of text processing is improved based on a scenario-specific corpus from the Internet supervision field;
3) keywords are extracted through natural language processing and semantic analysis:
the text information is processed into term vectors through semantic analysis, word segmentation and similar steps; a specific keyword corpus from the Internet supervision field is used as the training object; and iterative learning with a deep learning algorithm finds the key instructions that represent the text information more accurately, improving the recognition accuracy of the algorithm.
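The idea of turning text into term vectors and iteratively learning which terms carry the signal can be shown at toy scale. The sketch below deliberately swaps the deep learning algorithm for a single-layer logistic regression trained by stochastic gradient descent, so it illustrates the vectorize-then-iterate loop only; the labels, hyperparameters and function names are assumptions.

```python
import math


def vectorize(tokens, vocab):
    """Bag-of-words term vector: one count per vocabulary term."""
    return [float(tokens.count(t)) for t in vocab]


def train(docs, labels, epochs=200, lr=0.5):
    """Fit logistic regression on term vectors; label 1 marks a risk text."""
    vocab = sorted({t for d in docs for t in d})
    xs = [vectorize(d, vocab) for d in docs]
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted risk probability
            g = p - y                       # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return vocab, w, b


def key_terms(vocab, w, k=2):
    """Terms with the largest learned weights, i.e. strongest risk indicators."""
    ranked = sorted(zip(vocab, w), key=lambda pair: -pair[1])
    return [t for t, _ in ranked[:k]]
```

Terms that appear only in risk-labelled texts end up with positive weights and terms that appear only in harmless texts with negative ones, which is the sense in which training surfaces the terms that represent a text.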
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method for mining risk signals based on natural language processing, characterized in that
data is preprocessed based on artificial intelligence natural language processing technology, and the extracted information is formatted and stored; the preprocessed, formatted content is then analyzed based on a constructed risk early warning analysis model to generate a risk early warning signal, and the signal is accurately transmitted to the relevant department.
2. The method of claim 1, wherein
the text information is processed into term vectors through semantic analysis and word segmentation, a keyword corpus from the Internet supervision field is used as the training object, iterative learning is carried out with a deep learning algorithm, and instructions representing the text information are found.
3. The method of claim 2, wherein
the extracted information includes object, time, place and emotion information.
4. The method according to claim 2 or 3, wherein,
for natural language processing and semantic analysis:
when the risk early warning engine performs preliminary analysis of the text data, natural language analysis and understanding services are provided by means of natural language processing, collaborative crowdsourcing, machine learning and neural network technologies, and analysis services for text information are provided using the NLP processing techniques of word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time noun recognition, syntactic dependency analysis, automatic summarization, text similarity, text classification, emotion (sentiment) analysis and keyword extraction.
5. The method of claim 4, wherein
the algorithms involved include a K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model.
6. The method of claim 5, wherein,
for analyzing risk early warning signals:
the models used to analyze risk early warning signals are stored in a relational database; users can also add custom analysis models, expanding the model library to adapt to dynamic changes in their own early-warning focus.
7. The method of claim 6, wherein
the risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
8. The method of claim 7, wherein,
for pushing risk early warning signals:
the generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or database table exchange, according to the problem type of the signal, the supervision unit responsible for that type, and historical experience in dispatching such signals.
CN202110108106.3A 2021-01-27 2021-01-27 Method for processing mining risk signal based on natural language Pending CN112836517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110108106.3A CN112836517A (en) 2021-01-27 2021-01-27 Method for processing mining risk signal based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110108106.3A CN112836517A (en) 2021-01-27 2021-01-27 Method for processing mining risk signal based on natural language

Publications (1)

Publication Number Publication Date
CN112836517A (en) 2021-05-25

Family

ID=75931728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110108106.3A Pending CN112836517A (en) 2021-01-27 2021-01-27 Method for processing mining risk signal based on natural language

Country Status (1)

Country Link
CN (1) CN112836517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328907A (en) * 2021-10-22 2022-04-12 浙江嘉兴数字城市实验室有限公司 Natural language processing method for early warning risk upgrade event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529804A (en) * 2016-11-09 2017-03-22 国网江苏省电力公司南京供电公司 Client complaint early-warning monitoring analyzing method based on text mining technology
CN110008311A (en) * 2019-04-04 2019-07-12 北京邮电大学 A kind of product information security risk monitoring method based on semantic analysis
CN110347719A (en) * 2019-06-24 2019-10-18 华南农业大学 A kind of enterprise's foreign trade method for prewarning risk and system based on big data
CN111899089A (en) * 2020-07-01 2020-11-06 苏宁金融科技(南京)有限公司 Enterprise risk early warning method and system based on knowledge graph

Similar Documents

Publication Publication Date Title
CN109189942B (en) Construction method and device of patent data knowledge graph
CN107679039B (en) Method and device for determining statement intention
CN111046656B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN109325040B (en) FAQ question-answer library generalization method, device and equipment
CN111460125A (en) Intelligent question and answer method and system for government affair service
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
CN111639484A (en) Method for analyzing seat call content
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN107526721A (en) A kind of disambiguation method and device to electric business product review vocabulary
CN117033571A (en) Knowledge question-answering system construction method and system
CN115759071A (en) Government affair sensitive information identification system and method based on big data
CN111241299A (en) Knowledge graph automatic construction method for legal consultation and retrieval system thereof
CN109670045A (en) Emotion reason abstracting method based on ontology model and multi-kernel support vector machine
CN112836517A (en) Method for processing mining risk signal based on natural language
CN113407726A (en) Emergency disposal plan method and system
CN116628173A (en) Intelligent customer service information generation system and method based on keyword extraction
Han et al. A novel part of speech tagging framework for nlp based business process management
CN112506405B (en) Artificial intelligent voice large screen command method based on Internet supervision field
CN112488593B (en) Auxiliary bid evaluation system and method for bidding
CN113688233A (en) Text understanding method for semantic search of knowledge graph
CN109960798A (en) Uighur text emergency event element recognition methods
CN116049385B (en) Method, device, equipment and platform for generating information and create industry research report
CN111949781B (en) Intelligent interaction method and device based on natural sentence syntactic analysis
Sharma et al. Architecture and Types of Intelligent Agent and Uses of Various Technologies
CN113076468B (en) Nested event extraction method based on field pre-training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210525