CN115859968B - Policy granulation analysis system based on natural language analysis and machine learning - Google Patents
- Publication number
- CN115859968B (application CN202310166168.9A)
- Authority
- CN
- China
- Prior art keywords
- policy
- machine learning
- unit
- natural language
- word
- Legal status
- Active
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to a policy granulation analysis system based on natural language analysis and machine learning. To address the technical problem of low accuracy, the system comprises a policy file acquisition input module, a natural language processing module, a machine learning optimization module and a policy granulation analysis output module. The policy granulation analysis output module analyzes and outputs the granulation parameters of a policy according to preset policy dimension features and the result of the natural language processing module. The natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary query unit, a deep analysis unit and a natural language processing output unit. The machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit. This technical solution addresses the accuracy problem and can be applied to policy granulation analysis.
Description
Technical Field
The invention relates to the field of policy analysis systems, in particular to a policy granulation analysis system based on natural language analysis and machine learning.
Background
Policy analysis is the process by which individuals, groups and research institutions systematically investigate, observe, and quantitatively and qualitatively analyze the policies, decision procedures and activities that an organization currently enforces or plans to enforce, together with the information, conditions and problems they reflect. Its purpose is to help policy makers decide whether to maintain or improve policy goals, so as to promote social development and benefit the majority of people. The concept was first proposed by the American political scientist Lindblom, who held that policy analysis is pervasive in policy formulation. The main theoretical models of policy analysis include the political system model, group model, elite model, functional process model, system model, rational model, incremental model and game model.
The present invention provides a policy granulation analysis system based on natural language analysis and machine learning to solve the computational problems involved in such analysis.
Disclosure of Invention
The technical problem to be solved by the invention is the low accuracy of policy granulation analysis in the prior art. The proposed policy granulation analysis system based on natural language analysis and machine learning is characterized by high accuracy.
To solve the above technical problem, the following technical solution is adopted:
a policy granulation analysis system based on natural language analysis and machine learning, comprising:
a policy file acquisition input module, a natural language processing module, a machine learning optimization module and a policy granulation analysis output module, wherein the machine learning optimization module is connected to the natural language processing module;
the policy granulation analysis output module analyzes and outputs the granulation parameters of the policy according to preset policy dimension features and the result of the natural language processing module;
the natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary query unit, a deep analysis unit and a natural language processing output unit;
the machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit; the part-of-speech quantization unit converts natural language into a machine-quantized representation, the machine learning algorithm library loads various machine learning algorithms, and the machine learning optimization module executes the following steps:
step s1, the part-of-speech quantization unit converts natural language into a machine-quantized representation;
step s2, divide the policy text into $k$ groups and correspondingly retrieve $k$ kinds of machine learning algorithm models from the machine learning algorithm library;
step s3, define the $k_i$-th group of subset data as the validation set and use the remaining $k-1$ groups as the training set; inputting them into the machine learning algorithm models yields $k \times k$ model calculation values, $k_i = 1, 2, 3, \ldots, k$;
step s4, define $\psi(w)$ [formula not reproduced], where $\{x_1, x_2, \ldots, x_{k_i}, \ldots, x_k\}$ are the calculated values of the independent algorithm models when the $k_i$-th subset is defined as the validation set; $k_i = 1, 2, 3, \ldots, k$; $j$ and $w$ are predefined parameters, and $w_1, w_2, \ldots, w_k$ form a set of real numbers;
step s5, from $y_{k_i} = \mu + \alpha t_{k_i} + \varepsilon_{k_i}$ with $\mu = \log(2\gamma)$, calculate the characteristic index $\alpha$ and the weight dispersion coefficient $\gamma$, where [the definition of $y_{k_i}$ is not reproduced], $\varepsilon_{k_i}$ are independent, identically distributed error terms with a predefined mean of 0, and $t_{k_i} = \log|w_{k_i}|$;
step s6, from $z_{k_i} = \delta w_{k_i} + \varepsilon_{k_i}$, calculate the position parameter $\delta$, where $z_{k_i} = \arctan\big(\operatorname{Im}(w_{k_i}) / \operatorname{Re}(w_{k_i})\big)$ and $\varepsilon_{k_i}$ are independent, identically distributed error terms with a predefined mean of 0;
step s7, substitute the characteristic index $\alpha$, the weight dispersion coefficient $\gamma$ and the position parameter $\delta$ obtained in steps s5 and s6 into $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$, perform a Fourier transform to obtain the weight distribution function $f(x)$, and multiply the model calculation values by $f(x)$ to complete the fitting of the $k$ algorithm model calculation values.
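Steps s2 and s3 amount to a k-fold cross-evaluation in which every validation group is scored by every model. The Python sketch below shows only that bookkeeping. It assumes (our assumption, not the patent's) that each "model" is a callable fitted on the training groups that returns one scalar score on the validation group; `split_into_groups`, `cross_evaluate` and the three toy models are hypothetical stand-ins for the algorithm library.

```python
from statistics import median
from typing import Callable, List, Sequence

# Hypothetical model signature: (training data, validation data) -> one score.
Model = Callable[[Sequence[float], Sequence[float]], float]


def split_into_groups(data: Sequence[float], k: int) -> List[List[float]]:
    """Split data into k contiguous groups whose sizes differ by at most one."""
    if k < 2:
        raise ValueError("k must be an integer greater than 1")
    size, rem = divmod(len(data), k)
    groups, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        groups.append(list(data[start:end]))
        start = end
    return groups


def cross_evaluate(data: Sequence[float], models: List[Model]) -> List[List[float]]:
    """Return a k x k grid: row = validation group k_i, column = model."""
    k = len(models)
    groups = split_into_groups(data, k)
    grid = []
    for ki in range(k):
        valid = groups[ki]
        train = [x for g, grp in enumerate(groups) if g != ki for x in grp]
        grid.append([model(train, valid) for model in models])
    return grid


def mean_abs_error(center: float, valid: Sequence[float]) -> float:
    return sum(abs(v - center) for v in valid) / len(valid)


# Toy stand-ins: predict every validation value with one constant
# learned from the training groups, and report the mean absolute error.
def mean_model(train, valid):
    return mean_abs_error(sum(train) / len(train), valid)


def median_model(train, valid):
    return mean_abs_error(median(train), valid)


def midrange_model(train, valid):
    return mean_abs_error((min(train) + max(train)) / 2, valid)


grid = cross_evaluate([1, 2, 3, 4, 5, 6, 7, 8, 9],
                      [mean_model, median_model, midrange_model])
```

Each row of `grid` then holds the k values $\{x_1, \ldots, x_k\}$ that step s4 treats as one sample.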
Working principle: the invention combines natural language recognition and analysis with machine learning to perform policy granulation analysis efficiently. To improve accuracy, it further provides a machine learning optimization module in which the part-of-speech quantization unit, the machine learning algorithm library and the optimization fusion unit work together to fuse multiple machine learning algorithms. The policy text is divided into $k$ groups, $k$ kinds of machine learning algorithm models are retrieved from the library accordingly, and the model outputs are combined by a dedicated fusion algorithm that weights the individual algorithms, yielding highly accurate natural language recognition and analysis values.
As a further preferred refinement, the core processing component unit comprises a word segmenter, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is the OpenNLP sentence detection module.
Further, file preprocessing unit: converts the policy file into a plain text file, inserts paragraph marks into the text, corrects wrongly joined words, and inserts hyphens;
word normalization unit: provides a normalized representation for each word in the policy text by normalizing words according to their lexical attributes, including letter case, singular and plural forms, spelling variants, punctuation, attribute marks, stop words, inflection marks, symbols and conjunctions; it can map the different written forms of the same word to one another; this can be done with the existing SPECIALIST Lexical Tools;
part-of-speech tagging unit: assigns an appropriate part of speech, such as noun, verb, adjective or adverb, to each word in a sentence; existing rule-based, stochastic or hybrid tagging algorithms can be used;
primary analysis unit: performs keyword marking; the chunking module of the existing cTAKES system can be used;
policy feature entity identification unit: based on existing dictionary lookup methods, maps each policy feature entity from a term to a concept by searching for exact matches between the words of dictionary entries and the words of the policy text, and matches canonical word forms by looking up the word order in the dictionary;
deep analysis unit: provides syntactic information and determines the association relations among words; represents natural-language vocabulary as numerical vectors to obtain word vectors;
natural language processing output unit: outputs the natural language recognition and processing results for subsequent policy granulation analysis;
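The word normalization and dictionary lookup units can be pictured with a deliberately small sketch. It does not use SPECIALIST Lexical Tools or cTAKES: the stop-word list, the crude plural rule, the sorted-token key used for word-order-independent matching, and the sample dictionary are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stop-word list; a real system would use a curated lexicon.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def normalize(text: str) -> List[str]:
    """Lower-case, keep alphanumeric tokens, drop stop words, crude singularization."""
    out = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        if len(tok) > 3 and tok.endswith("ies"):
            tok = tok[:-3] + "y"          # policies -> policy
        elif tok.endswith("sses"):
            tok = tok[:-2]                # businesses -> business
        elif len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]                # reductions -> reduction
        out.append(tok)
    return out


def canonical_key(text: str) -> str:
    """Word-order-independent key: normalized tokens in sorted order."""
    return " ".join(sorted(normalize(text)))


def build_index(dictionary: Dict[str, str]) -> Dict[str, str]:
    """Map the canonical key of every dictionary term to its concept label."""
    return {canonical_key(term): concept for term, concept in dictionary.items()}


def lookup(phrase: str, index: Dict[str, str]) -> Optional[str]:
    """Exact match on the canonical key; None when the phrase is not in the dictionary."""
    return index.get(canonical_key(phrase))


# Hypothetical policy dictionary: term -> concept.
index = build_index({"tax reduction": "FISCAL_SUPPORT", "small business": "SME"})
```

With this index, "Reductions of the Tax" and "tax reduction" share the key `"reduction tax"`, so both map to the same concept.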
the deep analysis unit performs the following steps to realize word association:
step k1, take word1 from the seed word set and calculate its association degree with word2 from the candidate word set;
step k2, calculate the association degree of word1 and word2 as $\mathrm{PMI}(word1, word2) = \log\frac{P(word1, word2)}{P(word1)\,P(word2)}$, where $P(word1, word2)$ is the probability that word1 and word2 occur together, $P(word1)$ is the probability that word1 occurs in the article, and $P(word2)$ is the probability that word2 occurs in the article;
step k3, compare the association degree PMI(word1, word2) with a predefined threshold; if it is larger than the threshold, word1 and word2 are defined as associated, word1 is assigned to the class of word2 and word2 to the class of word1; otherwise they are defined as unrelated.
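Steps k1 to k3 can be sketched as follows. The patent does not say how the probabilities are estimated, so this version assumes that each text unit (for example, a sentence) is a set of words, that $P(w)$ is the fraction of units containing $w$, and that $P(w_1, w_2)$ is the fraction containing both. A pair that never co-occurs is given PMI $= -\infty$. The function names and sample data are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def pmi(units: List[Set[str]], w1: str, w2: str) -> float:
    """PMI(w1, w2) = log(P(w1, w2) / (P(w1) * P(w2))) estimated over text units."""
    n = len(units)
    c1 = sum(1 for u in units if w1 in u)
    c2 = sum(1 for u in units if w2 in u)
    c12 = sum(1 for u in units if w1 in u and w2 in u)
    if c12 == 0:
        return float("-inf")  # never co-occur: treated as unrelated
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))


def associate(units: List[Set[str]], seeds: Iterable[str],
              candidates: Iterable[str], threshold: float) -> Dict[str, Set[str]]:
    """Steps k1-k3: link seed/candidate pairs whose PMI exceeds the threshold."""
    seeds, candidates = list(seeds), list(candidates)
    classes: Dict[str, Set[str]] = {w: set() for w in seeds + candidates}
    for w1 in seeds:
        for w2 in candidates:
            if w1 != w2 and pmi(units, w1, w2) > threshold:
                classes[w1].add(w2)  # word2 joins word1's class
                classes[w2].add(w1)  # and word1 joins word2's class
    return classes


units = [
    {"tax", "relief", "enterprise"},
    {"tax", "relief"},
    {"subsidy", "farmer"},
    {"subsidy", "farmer", "grain"},
]
classes = associate(units, ["tax"], ["relief", "farmer"], threshold=0.5)
```

Here "tax" and "relief" always co-occur, giving PMI $= \log 2 \approx 0.69$, which clears the 0.5 threshold; "tax" and "farmer" never co-occur.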
Further, the deep analysis unit performs the following steps to realize feature space map fusion of the associated words:
step r1, select word1 and word2 and define each as a center node; normalize and sort the association degree values of the word combinations associated with word1, and likewise normalize and sort those associated with word2, obtaining the association relation space map gl1 of word1 and the association relation space map gl2 of word2, in which association degree values are represented by color depth values;
step r2, select gl1 or gl2 as the source space map and the other as the target space map;
step r3, take the center node as the start point and an adjacent word as the end point; retrieve the start-to-end association degree value $pw_1$ from the association relation space map gl1 and the corresponding value $pw_2$ from gl2, and calculate $pw_1 \times pw_2$; if this value is smaller than a predefined threshold, execute step r7, otherwise execute step r4;
step r4, calculate $pw_1 - pw_2$; if the value is smaller than a predefined threshold, perform coincidence fusion, otherwise perform difference fusion. In coincidence fusion, the start-point and end-point words are merged into single nodes and the connecting edge takes the smaller of the two color depth values; in difference fusion, gl1 or gl2 is rotated about the start point so that the start points coincide while the end points remain distinct, and each connecting edge keeps its own color depth value;
step r5, traverse all words to complete the feature space map fusion of the associated words.
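Steps r1 to r5 leave several details open, so the following is only an illustration under explicit assumptions: each association map is reduced to a dictionary from end-point word to association degree around a shared center node; normalization divides by the largest degree, which plays the role of the "color depth"; the comparison in step r4 uses $|pw_1 - pw_2|$; and pairs failing the product test in step r3 are simply skipped. The rotation in difference fusion is represented by keeping both end-point values rather than by any geometric layout. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_map(assoc: Dict[str, float]) -> List[Tuple[str, float]]:
    """Step r1: scale degrees to [0, 1] (the "color depth") and sort descending."""
    top = max(assoc.values())
    scaled = [(word, value / top) for word, value in assoc.items()]
    return sorted(scaled, key=lambda pair: (-pair[1], pair[0]))


def fuse_maps(gl1: Dict[str, float], gl2: Dict[str, float],
              prod_threshold: float, diff_threshold: float) -> Dict[str, tuple]:
    """Steps r2-r5 for two star-shaped maps that share one center node.

    gl1 is treated as the source map and gl2 as the target map.
    """
    m1, m2 = dict(normalize_map(gl1)), dict(normalize_map(gl2))
    fused: Dict[str, tuple] = {}
    for end in sorted(set(m1) | set(m2)):          # r5: traverse all end points
        pw1, pw2 = m1.get(end, 0.0), m2.get(end, 0.0)
        if pw1 * pw2 < prod_threshold:             # r3: weak pair, skipped here
            continue
        if abs(pw1 - pw2) < diff_threshold:        # r4: coincidence fusion
            fused[end] = ("coincident", min(pw1, pw2))
        else:                                      # r4: difference fusion
            fused[end] = ("different", (pw1, pw2))
    return fused


gl1 = {"relief": 4.0, "enterprise": 2.0, "grain": 1.0}
gl2 = {"relief": 3.0, "enterprise": 3.0, "farmer": 1.5}
fused = fuse_maps(gl1, gl2, prod_threshold=0.1, diff_threshold=0.2)
```

In this toy case "relief" has the same normalized depth (1.0) in both maps and is merged, "enterprise" differs (0.5 vs 1.0) and keeps both values, and "grain" and "farmer" appear in only one map and are dropped by the product test.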
Further, the dictionary comprises an objective dictionary and a subjective dictionary; the subjective dictionary is a dictionary composed of words indicating policy tendency.
Beneficial effects of the invention: the invention combines natural language recognition and analysis with machine learning to perform policy granulation analysis efficiently. To improve accuracy, it further provides a machine learning optimization module in which the part-of-speech quantization unit, the machine learning algorithm library and the optimization fusion unit work together to fuse multiple machine learning algorithms. The policy text is divided into $k$ groups, $k$ kinds of machine learning algorithm models are retrieved from the library accordingly, and the model outputs are combined by a dedicated fusion algorithm that weights the individual algorithms, yielding highly accurate natural language recognition and analysis values. In the deep analysis unit, the invention uses an intelligent map fusion technique to recognize the association between words and sentences with high accuracy and efficiency.
Drawings
The invention will be further described with reference to the drawings and examples.
FIG. 1 is a schematic diagram of the policy granulation analysis system based on natural language analysis and machine learning.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The present embodiment provides a policy granulation analysis system based on natural language analysis and machine learning, as shown in fig. 1, the policy granulation analysis system based on natural language analysis and machine learning includes:
a policy file acquisition input module, a natural language processing module, a machine learning optimization module and a policy granulation analysis output module, wherein the machine learning optimization module is connected to the natural language processing module;
the policy granulation analysis output module analyzes and outputs the granulation parameters of the policy according to preset policy dimension features and the result of the natural language processing module;
the natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary query unit, a deep analysis unit and a natural language processing output unit;
the machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit; the part-of-speech quantization unit converts natural language into a machine-quantized representation, the machine learning algorithm library loads various machine learning algorithms, and the machine learning optimization module executes the following steps:
step s1, the part-of-speech quantization unit converts natural language into a machine-quantized representation;
step s2, divide the policy text into $k$ groups and correspondingly retrieve $k$ kinds of machine learning algorithm models from the machine learning algorithm library;
step s3, define the $k_i$-th group of subset data as the validation set and use the remaining $k-1$ groups as the training set; inputting them into the machine learning algorithm models yields $k \times k$ model calculation values, $k_i = 1, 2, 3, \ldots, k$;
step s4, define $\psi(w)$ [formula not reproduced], where $\{x_1, x_2, \ldots, x_{k_i}, \ldots, x_k\}$ are the calculated values of the independent algorithm models when the $k_i$-th subset is defined as the validation set; $k_i = 1, 2, 3, \ldots, k$; $j$ and $w$ are predefined parameters, and $w_1, w_2, \ldots, w_k$ form a set of real numbers;
step s5, from $y_{k_i} = \mu + \alpha t_{k_i} + \varepsilon_{k_i}$ with $\mu = \log(2\gamma)$, calculate the characteristic index $\alpha$ and the weight dispersion coefficient $\gamma$, where [the definition of $y_{k_i}$ is not reproduced], $\varepsilon_{k_i}$ are independent, identically distributed error terms with a predefined mean of 0, and $t_{k_i} = \log|w_{k_i}|$;
step s6, from $z_{k_i} = \delta w_{k_i} + \varepsilon_{k_i}$, calculate the position parameter $\delta$, where $z_{k_i} = \arctan\big(\operatorname{Im}(w_{k_i}) / \operatorname{Re}(w_{k_i})\big)$ and $\varepsilon_{k_i}$ are independent, identically distributed error terms with a predefined mean of 0;
step s7, substitute the characteristic index $\alpha$, the weight dispersion coefficient $\gamma$ and the position parameter $\delta$ obtained in steps s5 and s6 into $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$, perform a Fourier transform to obtain the weight distribution function $f(x)$, and multiply the model calculation values by $f(x)$ to complete the fitting of the $k$ algorithm model calculation values.
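Steps s5 and s6 closely resemble Koutrouvelis's regression-type estimator for the parameters of stable laws, in which both regressions are driven by an empirical characteristic function. Because the formulas for $\psi(w)$ and $y_{k_i}$ are not reproduced in the text, the sketch below assumes $\psi(w) = \frac{1}{n}\sum e^{jwx}$ with $j$ the imaginary unit, $y = \log(-\log|\psi(w)|^2)$ (which agrees with $\mu = \log(2\gamma)$ for $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$), and $z = \arg\psi(w)$. These readings and all function names are our assumptions. It feeds a synthetic sample rather than $k$ model outputs so that the recovered parameters can be checked.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def ecf(xs: Sequence[float], w: float) -> complex:
    """Empirical characteristic function psi(w) = mean(exp(j*w*x)), j = imaginary unit."""
    return sum(cmath.exp(1j * w * x) for x in xs) / len(xs)


def least_squares(t: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = mu + alpha*t; returns (mu, alpha)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((a - t_mean) ** 2 for a in t)
    sxy = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    alpha = sxy / sxx
    return y_mean - alpha * t_mean, alpha


def estimate_params(xs: Sequence[float], ws: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate (alpha, gamma, delta) of Phi(w) = exp(j*delta*w - gamma*|w|**alpha)."""
    psis = [ecf(xs, w) for w in ws]
    # Step s5 (assumed form): y = log(-log|psi|^2) = log(2*gamma) + alpha*log|w| + error.
    t = [math.log(abs(w)) for w in ws]
    y = [math.log(-math.log(abs(p) ** 2)) for p in psis]
    mu, alpha = least_squares(t, y)
    gamma = math.exp(mu) / 2.0  # invert mu = log(2*gamma)
    # Step s6 (assumed form): z = arg psi(w) = delta*w + error, fitted through the origin.
    # atan2 replaces arctan(Im/Re) so the angle lands in the correct quadrant.
    z = [math.atan2(p.imag, p.real) for p in psis]
    delta = sum(zi * w for zi, w in zip(z, ws)) / sum(w * w for w in ws)
    return alpha, gamma, delta


# Synthetic check: deterministic quantiles of a Cauchy law (alpha = 1,
# gamma = 1) shifted by delta = 0.5, in place of real model outputs.
n = 2001
xs = [0.5 + math.tan(math.pi * ((i + 0.5) / n - 0.5)) for i in range(n)]
ws = [0.1 * i for i in range(1, 11)]  # hypothetical choice of w values
alpha, gamma, delta = estimate_params(xs, ws)
```

The estimates come out close to $\alpha = 1$, $\gamma = 1$, $\delta = 0.5$. With only $k$ model outputs per sample, as in step s4, the empirical characteristic function would be far noisier.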
This embodiment combines natural language recognition and analysis with machine learning to perform policy granulation analysis efficiently. To improve accuracy, the invention further provides a machine learning optimization module in which the part-of-speech quantization unit, the machine learning algorithm library and the optimization fusion unit work together to fuse multiple machine learning algorithms. The policy text is divided into $k$ groups, $k$ kinds of machine learning algorithm models are retrieved from the library accordingly, and the model outputs are combined by a dedicated fusion algorithm that weights the individual algorithms, yielding highly accurate natural language recognition and analysis values.
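For step s7, the weight distribution $f(x)$ can be recovered from $\Phi(w)$ by numerical Fourier inversion. Because this $\Phi$ is symmetric about $\delta$, the inversion reduces to $f(x) = \frac{1}{\pi}\int_0^\infty e^{-\gamma w^{\alpha}}\cos\big(w(x-\delta)\big)\,dw$. The sketch below evaluates that integral with the trapezoidal rule, truncated where $e^{-\gamma w^{\alpha}}$ drops below $e^{-50}$. The normalized weighted average in `fuse` is our reading of "multiplying the model calculation values by $f(x)$", not a formula stated in the patent, and both function names are hypothetical.

```python
import math
from typing import Sequence


def stable_density(x: float, alpha: float, gamma: float, delta: float,
                   steps: int = 20000) -> float:
    """Invert Phi(w) = exp(j*delta*w - gamma*|w|**alpha) numerically.

    Uses f(x) = (1/pi) * integral_0^W exp(-gamma*w**alpha) * cos(w*(x - delta)) dw
    with the trapezoidal rule; W is chosen so that gamma*W**alpha = 50.
    """
    upper = (50.0 / gamma) ** (1.0 / alpha)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        value = math.exp(-gamma * w ** alpha) * math.cos(w * (x - delta))
        total += value / 2 if i in (0, steps) else value  # trapezoid end weights
    return total * h / math.pi


def fuse(values: Sequence[float], alpha: float, gamma: float, delta: float) -> float:
    """Weight each model value by f(value) and return the normalized weighted mean."""
    weights = [stable_density(v, alpha, gamma, delta) for v in values]
    return sum(wt * v for wt, v in zip(weights, values)) / sum(weights)


# Cauchy case (alpha = 1, gamma = 1, delta = 0): the peak density is 1/pi.
peak = stable_density(0.0, 1.0, 1.0, 0.0)
# An outlying model value (5.0) receives almost no weight.
fused = fuse([0.9, 1.0, 1.1, 5.0], alpha=1.0, gamma=0.1, delta=1.0)
```

The heavy-tailed weight function is what lets a single outlying model value be down-weighted rather than averaged in at full strength.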
Specifically, the core processing component unit comprises a word segmenter, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is the OpenNLP sentence detection module.
Specifically, file preprocessing unit: converts the policy file into a plain text file, inserts paragraph marks into the text, corrects wrongly joined words, and inserts hyphens;
word normalization unit: provides a normalized representation for each word in the policy text by normalizing words according to their lexical attributes, including letter case, singular and plural forms, spelling variants, punctuation, attribute marks, stop words, inflection marks, symbols and conjunctions; it can map the different written forms of the same word to one another; this can be done with the existing SPECIALIST Lexical Tools;
part-of-speech tagging unit: assigns an appropriate part of speech, such as noun, verb, adjective or adverb, to each word in a sentence; existing rule-based, stochastic or hybrid tagging algorithms can be used;
primary analysis unit: performs keyword marking; the chunking module of the existing cTAKES system can be used;
policy feature entity identification unit: based on existing dictionary lookup methods, maps each policy feature entity from a term to a concept by searching for exact matches between the words of dictionary entries and the words of the policy text, and matches canonical word forms by looking up the word order in the dictionary;
deep analysis unit: provides syntactic information and determines the association relations among words; represents natural-language vocabulary as numerical vectors to obtain word vectors;
natural language processing output unit: outputs the natural language recognition and processing results for subsequent policy granulation analysis;
the deep analysis unit performs the following steps to realize word association:
step k1, take word1 from the seed word set and calculate its association degree with word2 from the candidate word set;
step k2, calculate the association degree of word1 and word2 as $\mathrm{PMI}(word1, word2) = \log\frac{P(word1, word2)}{P(word1)\,P(word2)}$, where $P(word1, word2)$ is the probability that word1 and word2 occur together, $P(word1)$ is the probability that word1 occurs in the article, and $P(word2)$ is the probability that word2 occurs in the article;
step k3, compare the association degree PMI(word1, word2) with a predefined threshold; if it is larger than the threshold, word1 and word2 are defined as associated, word1 is assigned to the class of word2 and word2 to the class of word1; otherwise they are defined as unrelated.
Preferably, the deep analysis unit performs the following steps to realize feature space map fusion of the associated words:
step r1, select word1 and word2 and define each as a center node; normalize and sort the association degree values of the word combinations associated with word1, and likewise normalize and sort those associated with word2, obtaining the association relation space map gl1 of word1 and the association relation space map gl2 of word2, in which association degree values are represented by color depth values;
step r2, select gl1 or gl2 as the source space map and the other as the target space map;
step r3, take the center node as the start point and an adjacent word as the end point; retrieve the start-to-end association degree value $pw_1$ from the association relation space map gl1 and the corresponding value $pw_2$ from gl2, and calculate $pw_1 \times pw_2$; if this value is smaller than a predefined threshold, execute step r7, otherwise execute step r4;
step r4, calculate $pw_1 - pw_2$; if the value is smaller than a predefined threshold, perform coincidence fusion, otherwise perform difference fusion. In coincidence fusion, the start-point and end-point words are merged into single nodes and the connecting edge takes the smaller of the two color depth values; in difference fusion, gl1 or gl2 is rotated about the start point so that the start points coincide while the end points remain distinct, and each connecting edge keeps its own color depth value;
step r5, traverse all words to complete the feature space map fusion of the associated words.
Preferably, the dictionary includes an objective dictionary and a subjective dictionary; the subjective dictionary consists of words indicating policy tendency. Using this subjective policy-tendency dictionary further enriches policy granulation analysis beyond conventional judgment of policy tendency.
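As an illustration of how a subjective (policy-tendency) dictionary might be applied, the sketch below scores a text by the average polarity of the tendency words it contains. The lexicon entries, their polarities and the scoring rule are all hypothetical; the patent states only that the subjective dictionary consists of words indicating policy tendency.

```python
import re
from typing import Dict, Tuple

# Hypothetical subjective dictionary: policy-tendency word -> polarity.
SUBJECTIVE: Dict[str, int] = {
    "encourage": 1, "support": 1, "promote": 1,
    "restrict": -1, "prohibit": -1, "limit": -1,
}


def tendency(text: str, lexicon: Dict[str, int] = SUBJECTIVE) -> Tuple[float, Dict[str, int]]:
    """Return (mean polarity of matched tendency words, positive/negative counts)."""
    hits = [lexicon[t] for t in re.findall(r"[a-z]+", text.lower()) if t in lexicon]
    counts = {"positive": sum(1 for h in hits if h > 0),
              "negative": sum(1 for h in hits if h < 0)}
    score = sum(hits) / len(hits) if hits else 0.0
    return score, counts


score, counts = tendency("Encourage and support SMEs; restrict speculation.")
```

The sample sentence matches two supportive words and one restrictive word, so its mean polarity is 1/3.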
This embodiment combines natural language recognition and analysis with machine learning to perform policy granulation analysis efficiently. To improve accuracy, the invention further provides a machine learning optimization module in which the part-of-speech quantization unit, the machine learning algorithm library and the optimization fusion unit work together to fuse multiple machine learning algorithms. The policy text is divided into $k$ groups, $k$ kinds of machine learning algorithm models are retrieved from the library accordingly, and the model outputs are combined by a dedicated fusion algorithm that weights the individual algorithms, yielding highly accurate natural language recognition and analysis values. In the deep analysis unit, the invention uses an intelligent map fusion technique to recognize the association between words and sentences with high accuracy and efficiency.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
Claims (5)
1. A policy granulation analysis system based on natural language analysis and machine learning, characterized in that the system comprises:
a policy file acquisition input module, a natural language processing module, a machine learning optimization module and a policy granulation analysis output module, wherein the machine learning optimization module is connected to the natural language processing module;
the policy granulation analysis output module analyzes and outputs the granulation parameters of the policy according to preset policy dimension features and the result of the natural language processing module;
the natural language processing module comprises a file preprocessing unit, a core processing component unit, a word normalization unit, a part-of-speech labeling unit, a primary analysis unit, a dictionary query unit, a deep analysis unit and a natural language processing output unit;
the machine learning optimization module comprises a part-of-speech quantization unit, a machine learning algorithm library and an optimization fusion unit; the part-of-speech quantization unit converts natural language into a machine-quantized representation, the machine learning algorithm library loads various machine learning algorithms, and the machine learning optimization module executes the following steps:
step s1, the part-of-speech quantization unit converts natural language into a machine-quantized representation;
step s2, divide the original text into $k$ groups and correspondingly retrieve $k$ kinds of machine learning algorithm models from the machine learning algorithm library;
step s3, define the $k_i$-th group of subset data as the validation set and use the remaining $k-1$ groups as the training set; inputting them into the machine learning algorithm models yields $k \times k$ model calculation values, $k_i = 1, 2, 3, \ldots, k$, where $k$ is an integer greater than 1;
step s4, define the intermediate parameter $\psi(w)$ [formula not reproduced], where $\{x_1, x_2, \ldots, x_{k_i}, \ldots, x_k\}$ are the calculated values of the independent algorithm models when the $k_i$-th subset is defined as the validation set; $k_i = 1, 2, 3, \ldots, k$; $j$ and $w$ are predefined parameters, $w_1, w_2, \ldots, w_k$ form a set of real numbers, and $w_{k_i}$ is the $k_i$-th value of $w$;
step s5, from the intermediate parameter $y_{k_i} = \mu + \alpha t_{k_i} + \varepsilon_{k_i}$ with the predefined coefficient $\mu = \log(2\gamma)$, calculate the characteristic index $\alpha$ and the weight dispersion coefficient $\gamma$, where [the definition of $y_{k_i}$ is not reproduced], $\varepsilon_{k_i}$ are predefined independent, identically distributed error terms with a mean of 0, and the intermediate parameter $t_{k_i} = \log|w_{k_i}|$;
step s6, from the intermediate parameter $z_{k_i} = \delta w_{k_i} + \varepsilon_{k_i}$, calculate the position parameter $\delta$, where $z_{k_i} = \arctan\big(\operatorname{Im}(w_{k_i}) / \operatorname{Re}(w_{k_i})\big)$ and $\varepsilon_{k_i}$ are independent, identically distributed error terms with a predefined mean of 0;
step s7, substitute the characteristic index $\alpha$, the weight dispersion coefficient $\gamma$ and the position parameter $\delta$ obtained in steps s5 and s6 into the intermediate function $\Phi(w) = \exp\{j\delta w - \gamma|w|^{\alpha}\}$, perform a Fourier transform to obtain the weight distribution function $f(x)$, multiply the model calculation values by $f(x)$ to obtain fitted values, and thereby complete the fitting of the $k$ algorithm model calculation values.
2. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the core processing component unit comprises a word segmentation device, a sentence boundary annotator, a substitute sentence detector, a mark generator and a document segment description annotator; the sentence boundary annotator is an OpenNLP sentence detection module.
3. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the deep analysis unit implements word association through the following steps:
step k1, take word1 from the seed word set and calculate its association degree with word2 from the candidate word set;
step k2, calculate the association degree of word1 and word2 as $\mathrm{PMI}(word1, word2) = \log \frac{P(word1, word2)}{P(word1)\,P(word2)}$, where $P(word1, word2)$ is the probability that word1 and word2 appear together, $P(word1)$ is the probability that word1 appears in the article, and $P(word2)$ is the probability that word2 appears in the article;
step k3, compare the association degree $\mathrm{PMI}(word1, word2)$ with a predefined threshold; if it is greater than the threshold, word1 and word2 are defined as associated, and word1 is classified under word2 and word2 under word1; otherwise, they are defined as unrelated.
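Steps k1 to k3 are a pointwise mutual information (PMI) test against a threshold. A minimal sketch, assuming that each probability is estimated as the share of text units (for example, the sentences of one policy article) containing the word or word pair, and using the natural logarithm; the function names and toy word sets are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def pmi(units: Sequence[Set[str]], word1: str, word2: str) -> float:
    """PMI(word1, word2) = log(P(word1, word2) / (P(word1) * P(word2))).

    Probabilities are fractions of text units containing the word(s).
    Returns -inf when the two words never co-occur.
    """
    n = len(units)
    both = sum(1 for u in units if word1 in u and word2 in u)
    if both == 0:
        return float("-inf")
    p1 = sum(1 for u in units if word1 in u) / n
    p2 = sum(1 for u in units if word2 in u) / n
    return math.log((both / n) / (p1 * p2))


def associate(units: Sequence[Set[str]], seeds: Sequence[str],
              candidates: Sequence[str], threshold: float) -> Dict[str, List[str]]:
    """Steps k1-k3: when PMI exceeds the threshold, record the link both ways."""
    links: Dict[str, List[str]] = {}
    for word1 in seeds:
        for word2 in candidates:
            if word1 != word2 and pmi(units, word1, word2) > threshold:
                links.setdefault(word1, []).append(word2)  # word1 classified under word2
                links.setdefault(word2, []).append(word1)  # and word2 under word1
    return links
```

A positive PMI means the pair co-occurs more often than independent occurrence would predict, so the threshold sets how much stronger than chance an association must be.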
4. The policy granular analysis system according to claim 3, wherein: the deep analysis unit performs the following steps to realize feature space map fusion of the associated words:
step r1, select word1 and word2 and define each as a center node; normalize and sort the association degree values of the word combinations associated with word1, and likewise normalize and sort those of the word combinations associated with word2, obtaining the association relation space map gl1 of word1 and the association relation space map gl2 of word2, in which association degree values are represented by color depth values;
step r2, select either association relation space map gl1 or association relation space map gl2 as the source space map and the other as the target space map;
step r3, select a center node as the start point and an adjacent word as the end point; retrieve the start-to-end association degree value $pw_1$ from association relation space map gl1 and the start-to-end association degree value $pw_2$ from association relation space map gl2, and calculate $pw_1 \times pw_2$; if this value is smaller than a predefined threshold, execute step r7, otherwise execute step r4;
step r4, calculate $pw_1 - pw_2$; if the value is smaller than a predefined threshold, perform coincidence fusion, otherwise perform difference fusion. In coincidence fusion, the start-point and end-point words are merged in parallel and the association connecting line takes the smaller color depth value; in difference fusion, association relation space map gl1 or gl2 is rotated about the start point so that the start points are merged while the end points remain distinct, and each association connecting line keeps its own color depth value;
step r5, traverse all words to complete the feature space map fusion of the associated words.
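The decision logic of steps r1 and r3 to r5 can be sketched as follows, under assumptions the claim does not fix: normalization divides by the largest degree, $pw_1$ and $pw_2$ are the degrees of the same end word in gl1 and gl2 (0 if the word is absent from a map), step r4 compares $|pw_1 - pw_2|$, and words routed to step r7, which the claim does not describe further, are only collected. The rotation in difference fusion is a layout operation and is represented here simply by keeping both edge values. `build_map`, `fuse` and `AssocMap` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: neighbour word -> normalised association degree,
# which also serves as the colour depth of the edge from the centre node.
AssocMap = Dict[str, float]


def build_map(degrees: Dict[str, float]) -> AssocMap:
    """Step r1 (sketch): scale degrees by the maximum and sort them descending."""
    top = max(degrees.values())
    if top <= 0:
        raise ValueError("need at least one positive association degree")
    ordered = sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)
    return {word: degree / top for word, degree in ordered}


def fuse(gl1: AssocMap, gl2: AssocMap, product_threshold: float,
         diff_threshold: float):
    """Steps r3-r5 (sketch). Returns (coincident, different, deferred)."""
    coincident: Dict[str, float] = {}               # merged edge -> smaller depth
    different: Dict[str, Tuple[float, float]] = {}  # both edges keep their depth
    deferred: List[str] = []                        # routed to step r7
    for word in sorted(set(gl1) | set(gl2)):        # step r5: traverse all words
        pw1, pw2 = gl1.get(word, 0.0), gl2.get(word, 0.0)
        if pw1 * pw2 < product_threshold:           # step r3
            deferred.append(word)
        elif abs(pw1 - pw2) < diff_threshold:       # step r4: coincidence fusion
            coincident[word] = min(pw1, pw2)
        else:                                       # step r4: difference fusion
            different[word] = (pw1, pw2)
    return coincident, different, deferred
```

The product test in step r3 filters out end words that are weakly tied to either center, so only words with reasonable support in both maps reach the coincidence or difference branch.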
5. The policy granular analysis system based on natural language parsing and machine learning according to claim 1, wherein: the dictionary comprises an objective dictionary and a subjective dictionary; the subjective dictionary is a dictionary composed of words indicating policy tendency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310166168.9A CN115859968B (en) | 2023-02-27 | 2023-02-27 | Policy granulation analysis system based on natural language analysis and machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115859968A CN115859968A (en) | 2023-03-28 |
CN115859968B true CN115859968B (en) | 2023-11-21 |
Family ID: 85658938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310166168.9A Active CN115859968B (en) | 2023-02-27 | 2023-02-27 | Policy granulation analysis system based on natural language analysis and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115859968B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679041A (en) * | 2017-10-20 | 2018-02-09 | 苏州大学 | English event synchronous anomalies method and system based on convolutional neural networks |
CN108228701A (en) * | 2017-10-23 | 2018-06-29 | 武汉大学 | A system for implementing a Chinese near-natural-language query interface |
CN108733653A (en) * | 2018-05-18 | 2018-11-02 | 华中科技大学 | A sentiment analysis method using a Skip-gram model that fuses part-of-speech and semantic information |
AU2019100371A4 (en) * | 2019-04-05 | 2019-05-16 | Ba, He Mr | A Sentiment Analysis System Based on Deep Learning |
CN109766416A (en) * | 2018-11-27 | 2019-05-17 | 中国电力科学研究院有限公司 | A new energy policy information extraction method and system |
CN110609983A (en) * | 2019-08-19 | 2019-12-24 | 广州利科科技有限公司 | Structured decomposition method for policy file |
CN113032552A (en) * | 2021-05-25 | 2021-06-25 | 南京鸿程信息科技有限公司 | Text abstract-based policy key point extraction method and system |
CN113254512A (en) * | 2021-04-26 | 2021-08-13 | 中国人民解放军军事科学院国防科技创新研究院 | Military and civil fusion policy information data analysis and optimization system |
CN115455189A (en) * | 2022-10-08 | 2022-12-09 | 浙江浙里信征信有限公司 | Policy text classification method based on prompt learning |
Non-Patent Citations (1)
Title |
---|
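Research status of cognitive computing in China and abroad and its application in library and information science; Guo Shunli (郭顺利) et al.; Information Science (《情报科学》); 137-146 * |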
Also Published As
Publication number | Publication date |
---|---|
CN115859968A (en) | 2023-03-28 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||