CN115859968B

CN115859968B - Policy granulation analysis system based on natural language analysis and machine learning

Info

Publication number: CN115859968B
Application number: CN202310166168.9A
Authority: CN
Inventors: 杨显华; 杨弋; 丁春利; 王铮; 牛颢; 高屹嵩; 龙树全; 姚晗; 王舒; 魏兵兵; 李�浩; 廖建雄; 周文安; 唐山; 聂珊
Original assignee: Sichuan Institute Of Standardization; SICHUAN INSTITUTE OF COMPUTER SCIENCES
Current assignee: Sichuan Institute Of Standardization; SICHUAN INSTITUTE OF COMPUTER SCIENCES
Priority date: 2023-02-27
Filing date: 2023-02-27
Publication date: 2023-11-21
Anticipated expiration: 2043-02-27
Also published as: CN115859968A

Abstract

The invention relates to a policy granulation analysis system based on natural language analysis and machine learning. It addresses the technical problem of low accuracy by providing a policy file acquisition input module, a natural language processing module, a machine learning optimization module and a policy granulation analysis output module. The policy granulation analysis output module analyzes and outputs the granulation parameters of the policy according to preset policy dimension characteristics and the result of the natural language processing module. The natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary inquiring unit, a deep analysis unit and a natural language processing output unit. The machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit. This technical scheme solves the problem well and can be used for policy granulation analysis.

Description

Policy granulation analysis system based on natural language analysis and machine learning

Technical Field

The invention relates to the field of policy analysis systems, in particular to a policy granulation analysis system based on natural language analysis and machine learning.

Background

Policy analysis is the process by which individuals, groups and research institutions systematically investigate and observe the conditions and problems in the policies, decision procedures and activities that an organization currently enforces or plans to enforce, together with the information these reflect, and analyze them quantitatively and qualitatively. Its purpose is to help policy makers maintain or improve policy goals, so as to achieve social development and benefit the majority of people. The concept was first proposed by the American political scientist Lindblom, who held that policy analysis is common in policy formulation. The main theoretical models of policy analysis include the political system model, the group model, the elite model, the functional process model, the system model, the rational model, the incremental model, the game model, etc.

The invention provides a policy granulation analysis system based on natural language analysis and machine learning to address the above problems.

Disclosure of Invention

The invention aims to solve the technical problem of low accuracy in prior-art policy granulation analysis systems based on natural language analysis and machine learning. It provides a novel policy granulation analysis system based on natural language analysis and machine learning that is characterized by high accuracy.

In order to solve the technical problems, the technical scheme adopted is as follows:

a policy granulation analysis system based on natural language analysis and machine learning, comprising:

a policy file acquisition input module, a natural language processing module, a policy granulation analysis output module and a machine learning optimization module, wherein the machine learning optimization module is connected with the natural language processing module;

the policy granulation analysis output module analyzes and outputs the granulation parameters of the policy according to the preset policy dimension characteristics and the result of the natural language processing module;

the natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary inquiring unit, a deep analysis unit and a natural language processing output unit;

the machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit; the part-of-speech quantization unit is used for processing natural language into machine quantized language, the machine learning algorithm library is used for loading various machine learning algorithms, and the machine learning optimization module executes the following steps:

step s1, the part-of-speech quantization unit processes natural language into machine language;

step s2, the policy text is divided into $s_k$ groups, and $s_k$ kinds of machine learning algorithm models are correspondingly retrieved from the machine learning algorithm library;

step s3, the $s_{ki}$-th subset is defined as the verification set, the remaining $k-1$ subsets are used as the training set and input into the $s_{ki}$-th machine learning algorithm model, so that $s_k \times s_k$ model calculation values are obtained, $ki = 1, 2, 3, \ldots, k$;

step s4, define the intermediate parameter $\psi(w)$ (its defining formula is not reproduced in this text), wherein $\{x_1, x_2, \ldots, x_{ki}, \ldots, x_k\}$ are the calculation values of the independent algorithm models obtained when the $s_{ki}$-th subset is defined as the verification set; $ki = 1, 2, 3, \ldots, k$; $j$ and $w$ are predefined parameters, and $w_1, w_2, \ldots, w_k$ is a set of real numbers;

step s5, from $y_{ki} = \mu + \alpha t_{ki} + \varepsilon_{ki}$ with $\mu = \log(2\gamma)$, the characteristic index $\alpha$ and the weight dispersion coefficient $\gamma$ are calculated; wherein $\varepsilon_{ki}$ are predefined error term coefficients that are independent, identically distributed and have a mean of 0, and $t_{ki} = \log|w_{ki}|$;

step s6, from $z_{ki} = \delta w_{ki} + \varepsilon_{ki}$, the parameter $\delta$ is calculated, wherein $z_{ki} = \arctan(\mathrm{Im}(w_{ki})/\mathrm{Re}(w_{ki}))$ and $\varepsilon_{ki}$ are predefined error term coefficients that are independent, identically distributed and have a mean of 0;

step s7, the characteristic index $\alpha$, the weight dispersion coefficient $\gamma$ and the position parameter $\delta$ obtained in steps s5 and s6 are substituted into $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$, a Fourier transform is performed to obtain the weight distribution function $f(x)$, and the model calculation values are multiplied by the weight distribution function $f(x)$, completing the fitting of the $k$ algorithm model calculation values.
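Steps s5 to s7 can be illustrated with a minimal Python sketch. It assumes that the $y_{ki}$ and $z_{ki}$ values are already available (the text does not reproduce how they are derived), that the regressions in s5 and s6 are fitted by ordinary least squares, and that the Fourier transform in s7 is the inversion $f(x) = \frac{1}{2\pi}\int \Phi(w)\,e^{-jwx}\,dw$, evaluated here with a trapezoidal rule. The function names, the integration limits and the synthetic inputs are hypothetical and are not part of the described system.

```python
import cmath
import math


def fit_line(t, y):
    """Least-squares fit of y = mu + alpha * t; returns (mu, alpha)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    alpha = s_ty / s_tt
    return y_mean - alpha * t_mean, alpha


def fit_delta(w, z):
    """Least-squares fit of z = delta * w (no intercept); returns delta."""
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(wi * wi for wi in w)


def phi(w, alpha, gamma, delta):
    """Phi(w) = exp{j*delta*w - gamma*|w|^alpha} from step s7."""
    return cmath.exp(1j * delta * w - gamma * abs(w) ** alpha)


def weight_density(x, alpha, gamma, delta, w_max=50.0, steps=20000):
    """Approximate f(x) = (1/2pi) * integral of Phi(w) * exp(-j*w*x) dw
    with the trapezoidal rule on [-w_max, w_max] (assumed convention)."""
    h = 2.0 * w_max / steps
    total = 0j
    for i in range(steps + 1):
        w = -w_max + i * h
        term = phi(w, alpha, gamma, delta) * cmath.exp(-1j * w * x)
        total += term / 2 if i in (0, steps) else term
    return (total * h).real / (2.0 * math.pi)


# Synthetic, noise-free inputs chosen so that alpha = 2 and gamma = 0.5.
w_values = [0.5, 1.0, 1.5, 2.0]
t_values = [math.log(abs(w)) for w in w_values]   # t_ki = log|w_ki|
y_values = [math.log(2 * 0.5) + 2.0 * t for t in t_values]
z_values = [0.3 * w for w in w_values]            # implies delta = 0.3

mu, alpha = fit_line(t_values, y_values)
gamma = math.exp(mu) / 2.0                        # from mu = log(2*gamma)
delta = fit_delta(w_values, z_values)
peak = weight_density(delta, alpha, gamma, delta)
```

With $\alpha = 2$ and $\gamma = 0.5$ the function $\Phi(w)$ is the characteristic function of a normal distribution, which gives a convenient sanity check for the numerical inversion. How $f(x)$ is then multiplied with the model calculation values is left open here, because the text does not specify it further.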

The working principle of the invention is as follows: the invention combines natural language recognition and analysis with machine learning technology to realize policy granulation analysis efficiently. On this basis, to improve accuracy, the invention provides a machine learning optimization module in which a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit are used together to fuse multiple machine learning algorithms. The policy text is divided into $s_k$ groups, and $s_k$ kinds of machine learning algorithm models are correspondingly retrieved from the machine learning algorithm library; the outputs of these models are then fused with a dedicated fusion algorithm that weights the individual algorithms, yielding natural language recognition and analysis results of high accuracy.
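The grouping and hold-out scheme of steps s2 and s3 can be sketched as follows. This is only an illustration under stated assumptions: the three toy "models" are hypothetical stand-ins for entries of the machine learning algorithm library, the numeric data stand in for quantized policy text, and each model is taken to return one calculation value per held-out group, which yields a $k \times k$ table.

```python
from statistics import mean, median


def split_into_groups(items, k):
    """Split items into k groups by round-robin assignment."""
    return [items[i::k] for i in range(k)]


def mean_model(training, validation):
    """Toy model: predict the training mean, report mean absolute error."""
    guess = mean(training)
    return mean(abs(v - guess) for v in validation)


def median_model(training, validation):
    """Toy model: predict the training median, report mean absolute error."""
    guess = median(training)
    return mean(abs(v - guess) for v in validation)


def midrange_model(training, validation):
    """Toy model: predict the training midrange, report mean absolute error."""
    guess = (min(training) + max(training)) / 2
    return mean(abs(v - guess) for v in validation)


def cross_validated_outputs(groups, models):
    """For each held-out group, train every model on the other k-1 groups.

    Returns a list of rows; row ki holds the values of all models when
    group ki is the verification set.
    """
    results = []
    for ki, validation in enumerate(groups):
        training = [x for g, grp in enumerate(groups) if g != ki for x in grp]
        results.append([model(training, validation) for model in models])
    return results


data = [1, 2, 3, 4, 5, 6]                 # stand-in for quantized policy text
groups = split_into_groups(data, 3)       # k = 3 groups
models = [mean_model, median_model, midrange_model]   # k models
outputs = cross_validated_outputs(groups, models)     # k x k values
```

Each row of `outputs` corresponds to one choice of verification set, so the table has the $s_k \times s_k$ shape mentioned in step s3.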

As a further optimization of the above scheme, the core processing component unit comprises a word segmentation device, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is an OpenNLP sentence detection module.

Further, the file preprocessing unit: converts the policy file into a plain text file, inserts paragraph marks into the text, corrects wrongly joined words, and inserts hyphens;

word normalization unit: provides a normalized representation for each word in the policy text, normalizing words according to lexical attributes, specifically letter case, singular and plural forms, spelling variants, punctuation marks, attribute marks, stop words, inflection marks, symbols and conjunctions, so that the different written forms of the same word can be mapped to one another; this can be done with the existing SPECIALIST lexical tools;

part-of-speech labeling unit: assigns an appropriate part of speech to each word in a text sentence, the parts of speech including nouns, verbs, adjectives and adverbs; existing rule-based, stochastic or hybrid tagging algorithms can be used;

primary analysis unit: completes keyword marking; the chunking module of the existing cTAKES model can be used;

policy feature entity identification unit: maps each policy feature entity from a term to a concept using the existing dictionary lookup method, searching for exact matches between the words of dictionary entries and the words of the policy text, and matching the canonical forms of words by searching permutations of the words in the dictionary;

deep analysis unit: provides syntactic information and determines the association relations among words; words in natural language are expressed as numerical vectors to obtain word vectors;

natural language processing output unit: outputs the natural language recognition and processing results for the subsequent policy granulation analysis;
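The word normalization and dictionary lookup described above can be illustrated with a small Python sketch. It is not the SPECIALIST or cTAKES implementation: the stop-word list, the dictionary entries and the concept identifiers are hypothetical, normalization is reduced to lowercasing, punctuation removal and stop-word filtering, and word order is neutralized by sorting the tokens so that permutations of the same words match the same entry.

```python
import string

# Illustrative stop-word list; a real system would use a fuller resource.
STOP_WORDS = {"the", "of", "a", "an", "and", "for", "to", "in"}

# Map every punctuation character to a space before splitting.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(phrase):
    """Lowercase, strip punctuation, drop stop words, and sort the tokens."""
    tokens = phrase.lower().translate(_PUNCT_TO_SPACE).split()
    return tuple(sorted(t for t in tokens if t not in STOP_WORDS))


def build_index(dictionary):
    """Index dictionary terms by their normalized, order-free form."""
    return {normalize(term): concept for term, concept in dictionary.items()}


def lookup(index, phrase):
    """Return the concept for an exact normalized match, else None."""
    return index.get(normalize(phrase))


# Hypothetical policy dictionary: term -> concept identifier.
policy_dictionary = {
    "tax reduction": "C001",
    "subsidy for small enterprises": "C002",
}
index = build_index(policy_dictionary)

match = lookup(index, "Reduction of Tax")                 # word order differs
no_match = lookup(index, "Small-enterprise subsidies")    # inflection differs
```

The second lookup finds nothing because this sketch does not normalize singular and plural forms, which is the kind of variation the word normalization unit is meant to handle before lookup.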

the deep analysis unit realizes word association through the following steps:

step k1, take word1 from the seed word set and compute its association degree with word2 from the candidate word set;

step k2, calculate the association degree of word1 and word2, $\mathrm{PMI}(\mathrm{word1}, \mathrm{word2}) = \log\dfrac{P(\mathrm{word1}, \mathrm{word2})}{P(\mathrm{word1})\,P(\mathrm{word2})}$, wherein $P(\mathrm{word1}, \mathrm{word2})$ is the probability that word1 and word2 appear together, $P(\mathrm{word1})$ is the probability that word1 appears in the article, and $P(\mathrm{word2})$ is the probability that word2 appears in the article;

step k3, compare the association degree PMI(word1, word2) with a predefined threshold; if it is greater than the threshold, word1 and word2 are defined as associated, word1 is classified under word2 and word2 is classified under word1; otherwise, they are defined as unassociated.
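Steps k1 to k3 can be sketched in Python as below. The sketch assumes the standard pointwise mutual information, uses the natural logarithm, and estimates each probability as the fraction of documents in which the word or word pair occurs; the documents, the words and the threshold value are hypothetical.

```python
import math


def pmi(docs, word1, word2):
    """Pointwise mutual information of two words over a list of token sets.

    Probabilities are estimated as document frequencies (an assumption).
    Returns -inf when the two words never occur together.
    """
    n = len(docs)
    p1 = sum(word1 in d for d in docs) / n
    p2 = sum(word2 in d for d in docs) / n
    p12 = sum(word1 in d and word2 in d for d in docs) / n
    if p12 == 0:
        return float("-inf")
    return math.log(p12 / (p1 * p2))


def is_associated(docs, word1, word2, threshold):
    """Step k3: the words are associated if PMI exceeds the threshold."""
    return pmi(docs, word1, word2) > threshold


# Hypothetical corpus: each document is reduced to its set of words.
docs = [
    {"tax", "subsidy"},
    {"tax", "subsidy"},
    {"tax"},
    {"loan"},
]

score = pmi(docs, "tax", "subsidy")
linked = is_associated(docs, "tax", "subsidy", threshold=0.1)
unlinked = is_associated(docs, "tax", "loan", threshold=0.1)
```

In practice the threshold would be tuned on the corpus, since PMI estimates from small counts are noisy.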

Further, the deep analysis unit executes the following steps to realize feature space map fusion of the associated words:

step r1, select word1 and word2 and define each as a center node; normalize and sort the association degree values of the word combinations associated with word1, and normalize and sort the association degree values of the word combinations associated with word2, to obtain the association relation space map gl1 of word1 and the association relation space map gl2 of word2 respectively, the association degree values in each map being represented by color depth values;

step r2, select either the association relation space map gl1 or the association relation space map gl2 as the source space map, and the other as the target space map;

step r3, select the center node as the start point and an adjacent word as the end point, retrieve the association degree value $pw_1$ between the start point and the end point in gl1 and the association degree value $pw_2$ between the start point and the end point in gl2, and calculate $pw_1 \times pw_2$; if the value is smaller than a predefined threshold, execute step r7, otherwise execute step r4;

step r4, calculate $pw_1 - pw_2$; if the value is smaller than a predefined threshold, perform coincidence fusion, otherwise perform difference fusion; in coincidence fusion the words serving as the start point and the end point are merged, and the association edge takes the smaller color depth value; in difference fusion gl1 or gl2 is rotated about the start point so that the start points are merged while the end points remain distinct, and the association edges keep their respective color depth values;

step r5, traverse all words to complete the feature space map fusion of the associated words.
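The decision logic of steps r3 and r4 can be sketched as follows, under several assumptions that the text leaves open: each map is represented as a dictionary from end-point word to normalized association degree for a shared start point, the difference is compared by absolute value, a pair that fails the product test is simply skipped, and the geometric rotation of difference fusion is reduced to keeping both edge values. All names and thresholds are hypothetical.

```python
def fuse_edges(gl1, gl2, product_threshold, difference_threshold):
    """Fuse the edges of two association maps that share a start point.

    gl1 and gl2 map an end-point word to its association degree value.
    Returns a dict: word -> ("coincidence", value) or
    ("difference", pw1, pw2). Pairs failing the product test are skipped.
    """
    fused = {}
    for word in gl1.keys() & gl2.keys():
        pw1, pw2 = gl1[word], gl2[word]
        if pw1 * pw2 < product_threshold:
            continue  # step r3: association too weak, skip this pair
        if abs(pw1 - pw2) < difference_threshold:
            # Coincidence fusion: one merged edge with the smaller value.
            fused[word] = ("coincidence", min(pw1, pw2))
        else:
            # Difference fusion: both edges keep their own values.
            fused[word] = ("difference", pw1, pw2)
    return fused


# Hypothetical normalized association degrees around a shared start point.
gl1 = {"subsidy": 0.9, "loan": 0.8, "tariff": 0.1}
gl2 = {"subsidy": 0.85, "loan": 0.3, "tariff": 0.2}

fused = fuse_edges(gl1, gl2, product_threshold=0.1, difference_threshold=0.1)
```

Iterating this over every start point would correspond to the traversal in step r5.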

Further, the dictionary comprises an objective dictionary and a subjective dictionary; the subjective dictionary is a dictionary composed of words indicating policy tendency.

The beneficial effects of the invention are as follows: the invention combines natural language recognition and analysis with machine learning technology to realize policy granulation analysis efficiently. On this basis, to improve accuracy, the invention provides a machine learning optimization module in which a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit are used together to fuse multiple machine learning algorithms. The policy text is divided into $s_k$ groups, and $s_k$ kinds of machine learning algorithm models are correspondingly retrieved from the machine learning algorithm library; the outputs of these models are then fused with a dedicated fusion algorithm that weights the individual algorithms, yielding natural language recognition and analysis results of high accuracy. In the deep analysis unit, the invention adopts an intelligent map fusion technique so that the association between words and sentences is recognized both accurately and efficiently.

Drawings

The invention will be further described with reference to the drawings and examples.

FIG. 1 is a schematic diagram of a policy granular analysis system based on natural language parsing and machine learning.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Example 1

The present embodiment provides a policy granulation analysis system based on natural language analysis and machine learning, as shown in fig. 1, the policy granulation analysis system based on natural language analysis and machine learning includes:

step s2, dividing the policy text into $s_k$ groups, and correspondingly retrieving $s_k$ kinds of machine learning algorithm models from the machine learning algorithm library;

This embodiment combines natural language recognition and analysis with machine learning technology to realize policy granulation analysis efficiently. On this basis, to improve accuracy, the invention provides a machine learning optimization module in which a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit are used together to fuse multiple machine learning algorithms. The policy text is divided into $s_k$ groups, and $s_k$ kinds of machine learning algorithm models are correspondingly retrieved from the machine learning algorithm library; the outputs of these models are then fused with a dedicated fusion algorithm that weights the individual algorithms, yielding natural language recognition and analysis results of high accuracy.

Specifically, the core processing component unit comprises a word segmentation device, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is an OpenNLP sentence detection module.

Specifically, the file preprocessing unit converts the policy file into a plain text file, inserts paragraph marks into the text, corrects wrongly joined words, and inserts hyphens;

Preferably, the deep analysis unit executes the following steps to realize feature space map fusion of the associated words:

step r3, select the center node as the start point and an adjacent word as the end point, retrieve the association degree value $pw_1$ between the start point and the end point in the association relation space map gl1 and the association degree value $pw_2$ between the start point and the end point in the association relation space map gl2, and calculate $pw_1 \times pw_2$; if the value is smaller than a predefined threshold, execute step r7, otherwise execute step r4;

Preferably, the dictionary includes an objective dictionary and a subjective dictionary; the subjective dictionary is a dictionary composed of words indicating policy tendency. By adopting the subjective policy trend dictionary, policy granulation analysis can be further enriched on the basis of the conventional policy trend judgment.
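One way to picture the use of a subjective dictionary is the following Python sketch. It assumes the dictionary maps tendency-indicating words to a tendency label and that the tendency of a text is summarized by counting those labels; the entries, the labels and the sample sentence are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Hypothetical subjective dictionary: word -> policy tendency label.
SUBJECTIVE_DICTIONARY = {
    "encourage": "supportive",
    "support": "supportive",
    "prohibit": "restrictive",
    "restrict": "restrictive",
}


def tendency_counts(tokens, subjective_dictionary):
    """Count tendency labels for tokens found in the subjective dictionary."""
    return Counter(
        subjective_dictionary[t] for t in tokens if t in subjective_dictionary
    )


sentence = "the city will encourage investment and support startups but prohibit dumping"
counts = tendency_counts(sentence.split(), SUBJECTIVE_DICTIONARY)
```

Such counts could serve as one of the granulation parameters, alongside the concepts found through the objective dictionary.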

This embodiment combines natural language recognition and analysis with machine learning technology to realize policy granulation analysis efficiently. On this basis, to improve accuracy, the invention provides a machine learning optimization module in which a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit are used together to fuse multiple machine learning algorithms. The policy text is divided into $s_k$ groups, and $s_k$ kinds of machine learning algorithm models are correspondingly retrieved from the machine learning algorithm library; the outputs of these models are then fused with a dedicated fusion algorithm that weights the individual algorithms, yielding natural language recognition and analysis results of high accuracy. In the deep analysis unit, the invention adopts an intelligent map fusion technique so that the association between words and sentences is recognized both accurately and efficiently.

While the foregoing describes illustrative embodiments of the invention so that those skilled in the art can understand it, the invention is not limited to the scope of these specific embodiments. All variations that remain within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept, fall within the scope of protection.

Claims

1. A policy granulation analysis system based on natural language analysis and machine learning is characterized in that: the policy granulation analysis system based on natural language parsing and machine learning comprises:

step s2, dividing the original text into $s_k$ groups, and correspondingly retrieving $s_k$ kinds of machine learning algorithm models from the machine learning algorithm library;

step s3, defining the $s_{ki}$-th subset as the verification set and the remaining $k-1$ subsets as the training set, inputting them into the $s_{ki}$-th machine learning algorithm model, and obtaining $s_k \times s_k$ model calculation values, $ki = 1, 2, 3, \ldots, k$, $k$ being an integer greater than 1;

step s4, defining the intermediate parameter $\psi(w)$ (its defining formula is not reproduced in this text), wherein $\{x_1, x_2, \ldots, x_{ki}, \ldots, x_k\}$ are the calculation values of the independent algorithm models obtained when the $s_{ki}$-th subset is defined as the verification set; $ki = 1, 2, 3, \ldots, k$; $j$ and $w$ are predefined parameters, $w_1, w_2, \ldots, w_k$ is a set of real numbers, and $w_{ki}$ is the $ki$-th $w$ value;

step s5, from the intermediate parameter $y_{ki} = \mu + \alpha t_{ki} + \varepsilon_{ki}$ and the predefined coefficient $\mu = \log(2\gamma)$, calculating the characteristic index $\alpha$ and the weight dispersion coefficient $\gamma$; wherein $\varepsilon_{ki}$ are predefined error term coefficients that are independent, identically distributed and have a mean of 0, and the intermediate parameter $t_{ki} = \log|w_{ki}|$;

step s6, from the intermediate parameter $z_{ki} = \delta w_{ki} + \varepsilon_{ki}$, calculating the parameter $\delta$, wherein $z_{ki} = \arctan(\mathrm{Im}(w_{ki})/\mathrm{Re}(w_{ki}))$ and $\varepsilon_{ki}$ are predefined error term coefficients that are independent, identically distributed and have a mean of 0;

step s7, substituting the characteristic index $\alpha$, the weight dispersion coefficient $\gamma$ and the position parameter $\delta$ obtained in steps s5 and s6 into the intermediate function $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$, performing a Fourier transform to obtain the weight distribution function $f(x)$, and multiplying the model calculation values by the weight distribution function $f(x)$ to obtain fitting values, completing the fitting of the $k$ algorithm model calculation values.

2. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the core processing component unit comprises a word segmentation device, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is an OpenNLP sentence detection module.

3. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the deep analysis unit comprises the following steps of realizing word association:

4. The policy granular analysis system according to claim 3, wherein: the deep analysis unit performs the following steps to realize feature space map fusion of the associated words;

5. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the dictionary comprises an objective dictionary and a subjective dictionary; the subjective dictionary is a dictionary composed of words indicating policy tendency.