CN112836517A - Method for processing mining risk signal based on natural language - Google Patents
- Publication number
- CN112836517A
- Authority
- CN
- China
- Prior art keywords
- early warning
- risk early
- model
- analysis
- algorithm
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Abstract
The invention provides a method for mining risk signals based on natural language processing, and belongs to the field of "Internet Plus supervision" applications.
Description
Technical Field
The invention relates to the field of "Internet Plus supervision" applications, and in particular to a method for mining risk signals based on natural language processing (NLP).
Background
At present, the supervision resources of each supervisory department are limited, and supervision tasks originate mainly from random sampling and planned tasks. Given the massive number of supervised objects and the strained supervision resources, this way of sourcing tasks leaves large blind spots and has low precision. Meanwhile, massive amounts of data are available, but effective risk clues cannot be extracted from them, which makes the waste of resources conspicuous. How to use the strained supervision resources reasonably and the big data resources effectively is an urgent problem for supervisory departments.
The risk early warning engine system belongs to the risk early warning category in the field of Internet Plus supervision. It uses constructed data analysis models to identify potential risk early warning signals in massive data and drives the supervising body to carry out targeted supervision, thereby reducing supervision blind spots and improving supervision accuracy. However, the most forward-looking data is currently in long-text form, which is of low quality in that it is unstructured, unabstracted and featureless. The engine therefore cannot analyze the data accurately, and preprocessing such as information extraction and formatting must be performed first.
At present, data imported through the system is generally long text with these same low-quality characteristics, so the risk early warning engine and other systems are not suited to performing business analysis on it directly. As a result, business systems cannot make deep use of the massive data, and it goes to waste.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a method for mining risk signals based on natural language processing (NLP).
According to the method, the content of the long text is analyzed; key information such as objects, time, places and emotions is extracted and stored in a formatted manner; and the preprocessed data is then analyzed to extract risk early warning signals.
The technical scheme of the invention is as follows:
A risk early warning engine system first preprocesses complaint reports and online public opinion data using artificial-intelligence natural language processing technology, and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content using the constructed risk early warning analysis models to generate risk early warning signals, and delivers each signal accurately to the relevant department.
Further,
the text information is processed into term vectors through semantic analysis and word segmentation; a keyword corpus from the Internet supervision field is used as the training object; and key instructions that can represent the text information are found through iterative learning with a deep learning algorithm.
The extracted information comprises the object of the complaint report, the time, the place and emotion information.
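The formatted storage of extracted key information can be pictured with a minimal sketch. It assumes a fixed record with the four fields named above and uses simple string matching in place of the trained recognition models the method relies on; the lexicons, the `ComplaintRecord` type and the `extract_record` helper are hypothetical and exist only for illustration.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical lexicons; a real system would use trained recognition models.
KNOWN_OBJECTS = ("Sunrise Bakery", "Harbor Pharmacy")
KNOWN_PLACES = ("Jinan", "Qingdao")
NEGATIVE_WORDS = ("expired", "refused", "fake", "unsafe")

# Assumption: dates appear in ISO form (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


@dataclass
class ComplaintRecord:
    obj: Optional[str]    # object of the complaint report
    time: Optional[str]
    place: Optional[str]
    emotion: str


def first_match(text: str, candidates) -> Optional[str]:
    """Return the first candidate (in list order) that occurs in the text."""
    for candidate in candidates:
        if candidate in text:
            return candidate
    return None


def extract_record(text: str) -> ComplaintRecord:
    """Turn one free-text complaint into a fixed-format record."""
    date = DATE_PATTERN.search(text)
    negative = any(word in text.lower() for word in NEGATIVE_WORDS)
    return ComplaintRecord(
        obj=first_match(text, KNOWN_OBJECTS),
        time=date.group(0) if date else None,
        place=first_match(text, KNOWN_PLACES),
        emotion="negative" if negative else "neutral",
    )


complaint = ("On 2021-01-15 Sunrise Bakery in Jinan sold me expired bread "
             "and refused a refund.")
record = extract_record(complaint)
print(json.dumps(asdict(record)))
```

Keeping every record in the same shape is what allows the later analysis step to query the data without parsing text again.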
Still further,
Natural language processing and semantic analysis:
When the risk early warning engine performs preliminary analysis of the text data, natural language analysis and understanding services are provided by means of natural language processing, collaborative crowdsourcing, machine learning and neural network technologies. Analysis services are provided for the text information using NLP techniques including word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time expression recognition, syntactic dependency parsing, automatic summarization, text similarity, text classification, emotion analysis and keyword extraction.
The algorithms involved include the K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model.
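Of the listed algorithms, TF-IDF is the simplest to illustrate for keyword extraction. The sketch below assumes already-segmented documents and one common weighting variant, tf = count / document length and idf = log(N / df); the sample documents and the `tf_idf` and `top_keywords` helpers are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized (non-empty) document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=2):
    """Highest-scoring terms first; ties are broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "the bakery sold expired bread".split(),
    "the pharmacy sold fake medicine".split(),
    "the bakery refused the refund".split(),
]
scores = tf_idf(docs)
print(top_keywords(scores[1]))
```

A term that appears in every document, such as "the" here, receives a score of zero, which is why such weighting tends to surface the terms that distinguish one complaint from the rest.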
Analyzing a risk early warning signal:
The models used to further analyze risk early warning signals are stored in a relational database. Users are also supported in adding user-defined analysis models, so that the model library can be expanded to adapt to dynamic changes in the focus of early warning.
The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
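One way to picture a model library held in a relational database is the sketch below, which uses an in-memory SQLite database. The table layout, the notion that a model is a keyword plus a minimum count of negative records per object, and the `add_model` and `run_models` helpers are all assumptions made for illustration; the source does not specify what an analysis model contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    id INTEGER PRIMARY KEY, obj TEXT, place TEXT, emotion TEXT, keyword TEXT);
CREATE TABLE analysis_model (
    id INTEGER PRIMARY KEY, name TEXT, keyword TEXT, min_count INTEGER);
CREATE TABLE risk_signal (
    id INTEGER PRIMARY KEY, model_name TEXT, obj TEXT, hits INTEGER);
""")

# Preprocessed, formatted records (hypothetical sample data).
conn.executemany(
    "INSERT INTO record (obj, place, emotion, keyword) VALUES (?, ?, ?, ?)",
    [
        ("Sunrise Bakery", "Jinan", "negative", "expired"),
        ("Sunrise Bakery", "Jinan", "negative", "expired"),
        ("Harbor Pharmacy", "Qingdao", "negative", "expired"),
        ("Harbor Pharmacy", "Qingdao", "neutral", "price"),
    ],
)


def add_model(name, keyword, min_count):
    """Register a user-defined analysis model in the model library."""
    conn.execute(
        "INSERT INTO analysis_model (name, keyword, min_count) VALUES (?, ?, ?)",
        (name, keyword, min_count),
    )


def run_models():
    """Apply every stored model and persist a signal for each suspicious object."""
    models = conn.execute(
        "SELECT name, keyword, min_count FROM analysis_model").fetchall()
    for name, keyword, min_count in models:
        rows = conn.execute(
            "SELECT obj, COUNT(*) FROM record "
            "WHERE emotion = 'negative' AND keyword = ? "
            "GROUP BY obj HAVING COUNT(*) >= ? ORDER BY obj",
            (keyword, min_count),
        ).fetchall()
        conn.executemany(
            "INSERT INTO risk_signal (model_name, obj, hits) VALUES (?, ?, ?)",
            [(name, obj, hits) for obj, hits in rows],
        )


add_model("repeated-expired-goods", "expired", 2)
run_models()
signals = conn.execute(
    "SELECT model_name, obj, hits FROM risk_signal").fetchall()
print(signals)
```

Because the models are rows rather than code, adding a new early warning focus in this sketch is a single insert into `analysis_model`, with no change to the engine.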
Pushing the risk early warning signal:
The generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or by database table exchange, according to the problem type of the signal, the supervisory unit responsible for that type, and historical experience in distributing signals of that kind.
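The HTTP POST variant of the push step might look like the following sketch. It routes only on the signal type; routing on historical distribution experience is left out. The routing table, the endpoint URLs (under the reserved `.invalid` domain), the JSON payload shape and the `build_push_request` helper are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical routing table: signal type -> endpoint of the supervisory unit.
ROUTES = {
    "food-safety": "http://example.invalid/market-supervision/signals",
    "drug-safety": "http://example.invalid/drug-administration/signals",
}
DEFAULT_ROUTE = "http://example.invalid/general/signals"


def build_push_request(signal: dict) -> urllib.request.Request:
    """Choose the target platform by signal type and build the POST request."""
    url = ROUTES.get(signal["type"], DEFAULT_ROUTE)
    body = json.dumps(signal).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


signal = {"type": "food-safety", "obj": "Sunrise Bakery", "hits": 2}
request = build_push_request(signal)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```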
Advantageous effects of the invention
Risk early warning signals generated from massive data can reflect potential risk points and serve as more accurate clues for carrying out supervision work. At the same time, limited resources are used reasonably, and automatically pushing the generated signals to the relevant supervisory units is more efficient.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are obviously some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The risk early warning engine system is applied in the field of Internet Plus supervision and is dedicated to extracting valuable information such as objects, time, places and emotions from long text content so that risk early warning signals can be analyzed.
The method preprocesses the long text content, mines information such as objects, time, places and emotions, and writes it back to a database, thereby structuring and abstracting the content. After this step, the risk early warning engine can use the mined, structured result data directly for business analysis and calculation. This improves the retrieval speed of business data and greatly improves data utilization.
The invention is applied to a risk early warning engine system and belongs to the field of Internet Plus supervision applications. The system preprocesses data using artificial-intelligence natural language processing technology and stores the extracted key information in a formatted manner. It then analyzes the preprocessed, formatted content using the constructed risk early warning analysis models to generate risk early warning signals, and delivers each signal accurately to the relevant department. The technical scheme of the invention is as follows:
1. Natural language processing and semantic analysis:
When the risk early warning engine performs preliminary analysis of long text data, it provides personalized, integrated, intelligent and diversified natural language analysis and understanding services by relying on natural language processing, collaborative crowdsourcing, machine learning, neural network and similar technologies. It provides efficient and accurate analysis of text information using NLP techniques such as word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time expression recognition, syntactic dependency parsing, automatic summarization, text similarity, text classification, emotion analysis and keyword extraction. The core algorithms involved include the K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model, among others.
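How a shortest-path algorithm applies to word segmentation can be shown with a small sketch. It treats character offsets as graph nodes and dictionary words as weighted edges, then runs Dijkstra's algorithm to find the single cheapest path (the K = 1 case of a K-shortest-path segmenter). The dictionary, its costs (standing in for negative log-probabilities from a corpus), the unknown-character penalty and the `segment` helper are hypothetical.

```python
import heapq

# Hypothetical costs standing in for negative log-probabilities.
# 研究 = research, 研究生 = graduate student, 生命 = life, 命 = fate, 起源 = origin
WORD_COST = {"研究": 1.0, "研究生": 1.5, "生命": 1.0, "命": 3.0, "起源": 1.0}
UNKNOWN_CHAR_COST = 5.0
MAX_WORD_LEN = max(len(word) for word in WORD_COST)


def segment(text):
    """Segment text along the cheapest path through its word lattice."""
    n = len(text)
    best = {0: 0.0}   # cheapest known cost to reach each character offset
    back = {}         # offset -> previous offset on the cheapest path
    heap = [(0.0, 0)]
    while heap:
        cost, pos = heapq.heappop(heap)
        if pos == n:
            break  # the end offset is settled, so its path is optimal
        if cost > best.get(pos, float("inf")):
            continue  # stale queue entry
        for end in range(pos + 1, min(n, pos + MAX_WORD_LEN) + 1):
            word = text[pos:end]
            if word in WORD_COST:
                step = WORD_COST[word]
            elif end == pos + 1:
                step = UNKNOWN_CHAR_COST  # fallback keeps the lattice connected
            else:
                continue
            if cost + step < best.get(end, float("inf")):
                best[end] = cost + step
                back[end] = pos
                heapq.heappush(heap, (cost + step, end))
    # Walk the back-pointers from the end to recover the words in order.
    words, pos = [], n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


print(segment("研究生命起源"))
```

With these costs the path 研究 / 生命 / 起源 (total 3.0) beats 研究生 / 命 / 起源 (total 5.5), which illustrates how path cost resolves an ambiguity that a greedy longest-match segmenter would get wrong.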
2. Analyzing a risk early warning signal:
The models used to further analyze risk early warning signals are stored in a relational database, so that users can scale out the number of data analysis models and enrich the model library. The risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
3. Pushing the risk early warning signal:
The generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or by database table exchange, according to the problem type of the signal, the supervisory unit responsible for that type, and historical experience in distributing signals of that kind.
1) Based on deep learning technology, the recognition rate of speech-to-text conversion is improved;
2) The accuracy of text processing is improved based on a scenario-specific corpus from the Internet supervision field;
3) Keywords are extracted through natural language processing and semantic analysis:
The text information is processed into term vectors through semantic analysis, word segmentation and the like; a keyword corpus specific to the Internet supervision field is used as the training object; and iterative learning with a deep learning algorithm finds the key instructions that represent the text information more accurately, improving the recognition accuracy of the algorithm.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A method for mining risk signals based on natural language processing, characterized in that
data is preprocessed based on artificial-intelligence natural language processing technology, and the extracted information is formatted and stored; the preprocessed, formatted content is then analyzed based on the constructed risk early warning analysis model to generate a risk early warning signal, and the signal is accurately transmitted to the relevant department.
2. The method of claim 1,
the text information is processed into term vectors through semantic analysis and word segmentation, a keyword corpus from the Internet supervision field is used as the training object, iterative learning is carried out with a deep learning algorithm, and instructions representing the text information are found.
3. The method of claim 2,
the extracted information includes object, time, place and emotion information.
4. The method according to claim 2 or 3,
natural language processing and semantic analysis:
when the risk early warning engine performs preliminary analysis of the text data, natural language analysis and understanding services are provided by means of natural language processing, collaborative crowdsourcing, machine learning and neural network technologies, and analysis services are provided for the text information using the NLP techniques of word segmentation and part-of-speech tagging, person name recognition, place name recognition, organization name recognition, time expression recognition, syntactic dependency parsing, automatic summarization, text similarity, text classification, emotion analysis and keyword extraction.
5. The method of claim 4,
the algorithms involved include the K-shortest-path word segmentation algorithm, the hidden Markov model (HMM), Dijkstra's shortest-path algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, the TextRank algorithm, the Word2Vec (W2V) word vector model, the conditional random field (CRF) model and the neural-network-based FastText text classification model.
6. The method of claim 5,
analyzing the risk early warning signal:
the models used to further analyze risk early warning signals are stored in a relational database; users are also supported in adding user-defined analysis models, and the model library is expanded to adapt to dynamic changes in the focus of early warning.
7. The method of claim 6,
the risk early warning engine analyzes the preprocessed, formatted data according to the created analysis models, generates risk early warning signals for suspicious data, and stores the signals in a relational database table.
8. The method of claim 7,
pushing the risk early warning signal:
the generated risk early warning signal is pushed to the corresponding platform through an HTTP POST interface or by database table exchange, according to the problem type of the signal, the supervisory unit responsible for that type, and historical experience in distributing signals of that kind.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110108106.3A CN112836517A (en) | 2021-01-27 | 2021-01-27 | Method for processing mining risk signal based on natural language |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110108106.3A CN112836517A (en) | 2021-01-27 | 2021-01-27 | Method for processing mining risk signal based on natural language |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112836517A (en) | 2021-05-25 |
Family
ID=75931728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110108106.3A Pending CN112836517A (en) | 2021-01-27 | 2021-01-27 | Method for processing mining risk signal based on natural language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836517A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114328907A (en) * | 2021-10-22 | 2022-04-12 | 浙江嘉兴数字城市实验室有限公司 | Natural language processing method for early warning risk upgrade event |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529804A (en) * | 2016-11-09 | 2017-03-22 | 国网江苏省电力公司南京供电公司 | Client complaint early-warning monitoring analyzing method based on text mining technology |
CN110008311A (en) * | 2019-04-04 | 2019-07-12 | 北京邮电大学 | A kind of product information security risk monitoring method based on semantic analysis |
CN110347719A (en) * | 2019-06-24 | 2019-10-18 | 华南农业大学 | A kind of enterprise's foreign trade method for prewarning risk and system based on big data |
CN111899089A (en) * | 2020-07-01 | 2020-11-06 | 苏宁金融科技(南京)有限公司 | Enterprise risk early warning method and system based on knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109189942B (en) | Construction method and device of patent data knowledge graph | |
CN107679039B (en) | Method and device for determining statement intention | |
CN111046656B (en) | Text processing method, text processing device, electronic equipment and readable storage medium | |
CN109325040B (en) | FAQ question-answer library generalization method, device and equipment | |
CN111460125A (en) | Intelligent question and answer method and system for government affair service | |
CN111967761A (en) | Monitoring and early warning method and device based on knowledge graph and electronic equipment | |
CN111639484A (en) | Method for analyzing seat call content | |
CN111782793A (en) | Intelligent customer service processing method, system and equipment | |
CN107526721A (en) | A kind of disambiguation method and device to electric business product review vocabulary | |
CN117033571A (en) | Knowledge question-answering system construction method and system | |
CN115759071A (en) | Government affair sensitive information identification system and method based on big data | |
CN111241299A (en) | Knowledge graph automatic construction method for legal consultation and retrieval system thereof | |
CN109670045A (en) | Emotion reason abstracting method based on ontology model and multi-kernel support vector machine | |
CN112836517A (en) | Method for processing mining risk signal based on natural language | |
CN113407726A (en) | Emergency disposal plan method and system | |
CN116628173A (en) | Intelligent customer service information generation system and method based on keyword extraction | |
Han et al. | A novel part of speech tagging framework for nlp based business process management | |
CN112506405B (en) | Artificial intelligent voice large screen command method based on Internet supervision field | |
CN112488593B (en) | Auxiliary bid evaluation system and method for bidding | |
CN113688233A (en) | Text understanding method for semantic search of knowledge graph | |
CN109960798A (en) | Uighur text emergency event element recognition methods | |
CN116049385B (en) | Method, device, equipment and platform for generating information and create industry research report | |
CN111949781B (en) | Intelligent interaction method and device based on natural sentence syntactic analysis | |
Sharma et al. | Architecture and Types of Intelligent Agent and Uses of Various Technologies | |
CN113076468B (en) | Nested event extraction method based on field pre-training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210525 |