CN109635082B - Policy influence analysis method, device, computer equipment and storage medium - Google Patents
Policy influence analysis method, device, computer equipment and storage medium
- Publication number
- CN109635082B (CN201811417482.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- policy
- news
- keywords
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The application relates to the field of big data, and provides a policy influence analysis method, a policy influence analysis device, computer equipment and a storage medium. The method comprises the following steps: acquiring a policy text, extracting keywords of the policy text, acquiring each news text matched with the policy text according to the keywords, comparing the similarity between the policy text and each news text, screening target news texts with the similarity meeting the preset threshold requirement, identifying subject words of the target news texts, and determining influence results of the policy texts according to the subject words of the target news texts. By taking news texts as comparison objects, screening related news texts and identifying according to subject words to obtain influence results of the policy texts, the efficiency of policy influence analysis is improved.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a policy impact analysis method, a policy impact analysis device, a computer device, and a storage medium.
Background
With the development of big data technology, the targeted analysis of various types of data has become important in many areas. Take the policies issued by the government as an example: the government plays an important macro-level regulatory role in economic and social development and comprises many functional agencies, and the policy information issued by each agency can affect industries, enterprises and products.
For government policy data, traditional processing can acquire and manage policy texts, but the specific scope of influence is generally obtained by analysts interpreting and analyzing the policy documents layer by layer, so the analysis efficiency is low.
Disclosure of Invention
In view of the above, it is desirable to provide a policy influence analysis method, apparatus, computer device, and storage medium capable of improving the efficiency of policy influence analysis.
A policy impact analysis method, the method comprising:
acquiring a policy text, and extracting keywords of the policy text;
Acquiring each news text matched with the policy text according to the keywords;
comparing the similarity between the policy text and each news text, and screening target news texts with the similarity meeting the preset threshold requirement;
Identifying a subject term of the target news text;
and determining the influence result of the policy text according to the subject term of the target news text.
In one embodiment, the obtaining the policy text and extracting the keywords of the policy text includes:
Acquiring a policy text, and extracting a title of the policy text;
traversing a preset stop word library according to the title, and screening stop words contained in the title;
traversing a preset government unit directory according to the title subjected to the screening processing of the stop words, and determining a policy issuing party of the policy text;
carrying out syntactic analysis on the title subjected to the stop word screening process to determine policy points of the policy text;
And determining keywords of the policy text according to the policy issuer and the policy key points.
In one embodiment, the obtaining, according to the keyword, each news text matching the policy text includes:
acquiring the release time of the policy text, and determining a news search time range according to the release time;
and searching the news texts according to the keywords to obtain each news text matched with the policy text in the news searching time range.
In one embodiment, the obtaining, according to the keyword, each news text matching the policy text includes:
acquiring a first type news text matched with the title of the policy text according to the title of the policy text;
acquiring a second type news text matched with the keywords according to the keywords;
Splitting the keyword into a plurality of sub-keywords according to the part of speech of the keyword, and obtaining a third-class news text matched with each sub-keyword;
and determining each news text matched with the policy text according to the first type news text, the second type news text and the third type news text.
In one embodiment, the comparing the similarity between the policy text and each news text, and screening the target news text whose similarity meets the preset threshold requirement includes:
Calculating Jaccard similarity coefficients of each news text and the policy text, and determining similarity of each news text and the policy text;
and screening target news texts with calculation results meeting the preset threshold requirements.
In one embodiment, the identifying the subject term of the target news text includes:
Splitting the target news text by taking sentences as units, and extracting core words of each split sentence;
respectively acquiring the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text;
and determining the subject term of the target news text according to the part of speech and the word frequency of the core term.
In one embodiment, the determining the influence result of the policy text according to the subject term of the target news text includes:
Performing named entity recognition processing on the subject words of the target news text, and dividing the subject words into industry feature words, enterprise name related words and product feature words;
Traversing each preset industry feature word library according to the industry feature words, and determining the influence industry of the policy text according to the matching degree of the industry feature words and each preset industry feature word library;
Traversing a preset lexicon of enterprise full names and abbreviations according to the enterprise name related words, and determining the influence enterprise of the policy text;
acquiring a preset product feature word library containing the product feature words according to the product feature words, and determining the influence products of the policy text according to the product information corresponding to the preset product feature word library;
And determining an influence result of the policy text according to the influence industry, the influence enterprise and the influence product of the policy text.
A policy impact analysis device, the device comprising:
the keyword extraction module is used for acquiring the policy text and extracting keywords of the policy text;
The news text matching module is used for acquiring each news text matched with the policy text according to the keywords;
The target news text screening module is used for comparing the similarity between the policy texts and the news texts and screening target news texts with the similarity meeting the requirement of a preset threshold;
the subject term identification module is used for identifying the subject term of the target news text;
And the influence result determining module is used for determining the influence result of the policy text according to the subject term.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a policy text, and extracting keywords of the policy text;
Acquiring each news text matched with the policy text according to the keywords;
comparing the similarity between the policy text and each news text, and screening target news texts with the similarity meeting the preset threshold requirement;
Identifying a subject term of the target news text;
and determining the influence result of the policy text according to the subject term of the target news text.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a policy text, and extracting keywords of the policy text;
Acquiring each news text matched with the policy text according to the keywords;
comparing the similarity between the policy text and each news text, and screening target news texts with the similarity meeting the preset threshold requirement;
Identifying a subject term of the target news text;
and determining the influence result of the policy text according to the subject term of the target news text.
According to the policy influence analysis method, device, computer equipment and storage medium, the policy text is acquired and its keywords are extracted, related news texts are searched according to the keywords, and the similarity between the policy text and each news text is compared to screen out the target news texts that are highly relevant to the policy text. The subject words of the target news texts are then identified, and the influence result of the policy text is determined according to the subject words. Because news texts are used as comparison objects, related news texts are screened, and the influence result is obtained from the identified subject words, the efficiency of policy influence analysis is improved.
Drawings
FIG. 1 is an application scenario diagram of a policy impact analysis method in one embodiment;
FIG. 2 is a flow chart of a policy impact analysis method according to one embodiment;
FIG. 3 is a flow chart illustrating sub-steps of step S200 of FIG. 2 in one embodiment;
FIG. 4 is a flow chart illustrating sub-steps of step S300 of FIG. 2 in one embodiment;
FIG. 5 is a block diagram of a policy impact analysis device according to one embodiment;
Fig. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The policy influence analysis method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 obtains the policy text, extracts keywords of the policy text, searches related news texts according to the keywords, compares the similarity between the policy text and the news texts, and screens the target news texts, thereby obtaining target news texts that are highly relevant to the policy text. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a policy impact analysis method is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
Step S200, acquiring the policy text and extracting keywords of the policy text.
The policy text refers to news data related to the various policies issued by departments such as government authorities, and the keywords of the policy text comprise the issuer and the policy points of the policy text. The server can crawl news data with a web crawler algorithm to obtain the policy text. The keywords can be extracted by applying stop word screening and syntactic analysis to the title of the policy text. In information retrieval, stop words are characters or words that are automatically filtered out before or after natural language data or text is processed, such as 'about' and 'a plurality of'. Screening out the stop words prevents them from interfering with the keyword extraction result and so improves the accuracy of keyword extraction.
Step S300, according to the keywords, obtaining each news text matched with the policy text.
The news data of the whole network is searched according to the keywords. News is a genre published by the major news platforms and has a fixed format comprising five parts: headline, lead, body, background and conclusion. According to the propagation medium, news can be classified into news data carried by television, radio, newspapers, magazines, internet advertising media and mobile internet media; news text refers to news data published in text form. Based on the extracted keywords, the server uses a web crawler algorithm to search the news data published by the major news platforms across the whole network, and takes the news texts whose titles contain some or all of the keywords as the news texts matched with the policy text.
Step S400, comparing the similarity between the policy text and each news text, and screening target news texts with the similarity meeting the preset threshold requirement.
The similarity can be compared by means of the Jaccard similarity coefficient, the Euclidean distance, the Manhattan distance and the like. Taking the Jaccard similarity coefficient as an example, the greater the calculated coefficient, the higher the similarity between the policy text and the news text. In an embodiment, the obtained news texts can be classified according to the degree to which they match the keywords of the policy text, and the similarity between the policy text and the news texts in each category is calculated separately.
Step S500, identifying subject words of the target news text.
Specifically, the news text is preprocessed, including sentence segmentation, word segmentation, stop word removal, part-of-speech filtering and word frequency filtering, and the weight of each word in the news text is calculated using TF-IDF (term frequency-inverse document frequency) to obtain the subject words of the news text. TF-IDF is used to evaluate how important a word is to one news text among a plurality of news texts: the importance of a word increases in proportion to the number of times it appears in that text and decreases with the number of texts in the collection that contain it.
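The TF-IDF weighting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the texts are already segmented into words, and the particular variant (term frequency normalized by text length, idf = log(N / df)) as well as the function name `tf_idf` and the sample data are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already segmented news texts.
docs = [
    ["loan", "rate", "bank", "loan"],
    ["bank", "pollution", "factory"],
    ["bank", "loan", "car"],
]
scores = tf_idf(docs)
```

In this sample, "bank" appears in every text, so its weight is zero everywhere, while words concentrated in fewer texts receive positive weights.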
Step S600, determining the influence result of the policy text according to the subject word of the target news text.
The influence of the policy text on the industry can be determined when the subject term of the target news text is a related term related to the industry, and similarly, the influence of the policy text on the enterprise or the product can be determined when the subject term of the target news text is a related term related to the specific enterprise or the product.
According to the policy influence analysis method, the keywords of the policy text are extracted through obtaining the policy text, the related news text is searched according to the keywords, the similarity between the policy text and the news text is compared, and the target news text is screened, so that the target news text with high relevance to the policy text is obtained, the influence result of the policy text is determined according to the subject word through recognizing the subject word of the target news text, the news text is used as a comparison object, the related news text is screened, the influence result of the policy text is obtained according to the subject word recognition, and the policy influence analysis efficiency is improved.
In one embodiment, as shown in fig. 3, step S200, obtaining the policy text, and extracting the keywords of the policy text includes:
step S220, the policy text is acquired, and the title of the policy text is extracted.
Step S230, traversing a preset stop word library according to the title, and screening the stop words contained in the title.
Step S240, traversing the preset government unit directory according to the title subjected to the stop word screening process, and determining the policy issuer of the policy text.
Step S250, carrying out syntactic analysis on the title subjected to the stop word screening process to determine the policy gist of the policy text.
Step S260, determining keywords of the policy text according to the policy issuer and the policy gist.
The policy text is a government official document, which generally follows standard format requirements, and the title of the policy text can be extracted according to those requirements. The keywords in the title generally include the policy issuer and the policy points. The stop word library is a word library constructed in advance by collecting stop words, that is, characters or words that are automatically filtered out before or after natural language data or text is processed, such as 'about' and 'a plurality of'. By traversing the preset stop word library, the stop words contained in the title are screened out and filtered, which improves processing efficiency and reduces the interference of invalid words. The policy issuer in the title is determined by matching against the preset government unit directory. Syntactic analysis is then used to determine the head phrase relations of the title after stop word labeling and filtering, and the policy points are screened out.
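A minimal sketch of steps S230 to S260, under stated assumptions: the title is already segmented into tokens, `STOP_WORDS` and `GOVERNMENT_UNITS` are tiny hypothetical stand-ins for the preset stop word library and government unit directory, and the syntactic analysis of step S250 is replaced by a placeholder that simply keeps the non-issuer tokens.

```python
# Hypothetical stand-ins for the preset stop word library and the
# preset government unit directory.
STOP_WORDS = {"about", "several", "of", "the", "on"}
GOVERNMENT_UNITS = {"Ministry of Finance", "State Council"}


def extract_keywords(title_tokens):
    """Return the policy issuer(s) and policy gist tokens of a title."""
    # S230: screen out the stop words contained in the title.
    filtered = [t for t in title_tokens if t.lower() not in STOP_WORDS]
    # S240: match the remaining tokens against the government unit directory.
    issuers = [t for t in filtered if t in GOVERNMENT_UNITS]
    # S250: placeholder for syntactic analysis (assumption): keep every
    # token that is not an issuer as part of the policy gist.
    gist = [t for t in filtered if t not in GOVERNMENT_UNITS]
    # S260: the keywords consist of the issuer and the policy gist.
    return {"issuer": issuers, "gist": gist}


title = ["Ministry of Finance", "notice", "on", "adjusting",
         "car loan", "interest rate"]
result = extract_keywords(title)
```

A real system would replace the placeholder with a dependency or constituency parser that identifies the head phrase of the title.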
In one embodiment, step S300, according to the keywords, acquiring each news text matching the policy text includes:
acquiring the release time of the policy text, and determining the news search time range according to the release time.
And searching the news texts according to the keywords to obtain each news text matched with the policy text in the news searching time range.
News texts related to a policy text are usually reprints and interpretations of it. Because news is highly time-sensitive, the release time information carried by the policy text is acquired, the news search time range is determined according to a preset news timeliness requirement, and the news texts are searched with the keywords as the search basis, so as to acquire each news text whose release time falls within the news search time range.
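The time-range filtering can be sketched as below. The 30-day window, the function names and the `published` field are assumptions made for illustration; the source only states that the range follows from the release time and a preset timeliness requirement.

```python
from datetime import date, timedelta


def news_search_range(release_date, timeliness_days=30):
    """Return (start, end) of the search window; 30 days is an assumed default."""
    return release_date, release_date + timedelta(days=timeliness_days)


def within_range(news_items, release_date, timeliness_days=30):
    """Keep the news items published inside the search window."""
    start, end = news_search_range(release_date, timeliness_days)
    return [n for n in news_items if start <= n["published"] <= end]


# Hypothetical news items with their release dates.
news = [
    {"id": "a", "published": date(2018, 11, 20)},
    {"id": "b", "published": date(2018, 12, 1)},
    {"id": "c", "published": date(2019, 1, 15)},
]
kept = within_range(news, date(2018, 11, 26))
```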
In one embodiment, as shown in fig. 4, step S300, according to the keywords, acquiring each news text matching the policy text includes:
Step S320, according to the title of the policy text, a first type news text matching with the title of the policy text is acquired.
Step S330, obtaining the second type news text matched with the keywords according to the keywords.
Step S340, splitting the keywords into a plurality of sub-keywords according to the parts of speech of the keywords, and obtaining third-class news texts matched with the sub-keywords.
In step S350, each news text matching the policy text is determined according to the first type news text, the second type news text and the third type news text.
Some of the acquired news texts have the same content as the policy text, some interpret the policy text in a popular and easily understood way, and some go further and analyze in depth the influence of the policy's release on various industries. The news texts therefore need to be classified, which can be done by matching titles or keywords. First, in complete matching mode, a web crawler algorithm is used to obtain the news texts whose titles are identical to the title of the policy text; these are recorded as first-class news texts. Second, according to the keywords (the policy points and the policy issuer, where the issuer covers both its full name and its abbreviation), a web crawler algorithm is used to obtain the news texts containing all the keywords; these are recorded as second-class news texts. Third, the keywords are split by part of speech into nouns and verbs, a web crawler algorithm is used to obtain the news texts containing some of the resulting sub-keywords, the results are screened by the number of sub-keywords they contain, and the news texts in which half of the words appear are retained and recorded as third-class news texts. In this way, each news text matched with the policy text is classified into the first, second or third class at the same time as it is obtained.
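The three-class scheme can be illustrated with the sketch below. It classifies an already retrieved news title rather than crawling; the function name, the use of plain substring matching, and the reading of the half-word rule as "at least half of the sub-keywords" are assumptions.

```python
def classify_news(news_title, policy_title, keywords, sub_keywords):
    """Return 1, 2 or 3 for the matching class, or None if nothing matches."""
    # First class: the title is identical to the policy title.
    if news_title == policy_title:
        return 1
    # Second class: every keyword (issuer and policy points) is present.
    if keywords and all(k in news_title for k in keywords):
        return 2
    # Third class: at least half of the sub-keywords are present (assumed rule).
    hits = sum(1 for s in sub_keywords if s in news_title)
    if sub_keywords and hits * 2 >= len(sub_keywords):
        return 3
    return None


# Hypothetical policy title, keywords and sub-keywords.
policy_title = "Ministry of Finance notice on car loan interest rate"
keywords = ["Ministry of Finance", "car loan interest rate"]
sub_keywords = ["car", "loan", "interest", "rate"]

first = classify_news(policy_title, policy_title, keywords, sub_keywords)
second = classify_news("Ministry of Finance adjusts car loan interest rate",
                       policy_title, keywords, sub_keywords)
third = classify_news("Banks lower loan rate",
                      policy_title, keywords, sub_keywords)
unmatched = classify_news("Weather report",
                          policy_title, keywords, sub_keywords)
```

Checking the classes in this order means each news text receives the strictest class it qualifies for.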
In one embodiment, step S400, comparing the similarity between the policy text and each news text, and screening the target news text whose similarity meets the preset threshold requirement includes:
and calculating Jaccard similarity coefficients of the news texts and the policy texts, and determining similarity of the news texts and the policy texts.
And screening target news texts with calculation results meeting the preset threshold requirements.
Calculating the Jaccard similarity coefficient comprises counting the size of the intersection and the size of the union of the keywords of the policy text and of the news text, and calculating the ratio of the intersection size to the union size. The similarity between the policy text and the news text is obtained from this ratio: the greater the Jaccard similarity coefficient, the higher the similarity. The preset threshold requirement can be set separately for each category of news text as needed. For example, the Jaccard similarity coefficient threshold for first-class news texts is used to determine whether a news text is a reprint of the policy text, so a higher threshold can be set, while the thresholds for second-class and third-class news texts can be lowered appropriately.
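The calculation and the per-category screening can be sketched as follows. The threshold values, the `category` and `keywords` fields and the function names are hypothetical; the source only says that first-class texts may use a higher threshold than the other two classes.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-category thresholds (first class is the strictest).
THRESHOLDS = {1: 0.8, 2: 0.5, 3: 0.3}


def screen(policy_keywords, news_items):
    """Keep the news items whose similarity meets their category threshold."""
    return [
        n for n in news_items
        if jaccard(policy_keywords, n["keywords"]) >= THRESHOLDS[n["category"]]
    ]


policy_keywords = {"loan", "rate", "bank", "car"}
news_items = [
    {"id": "a", "category": 1, "keywords": {"loan", "rate", "bank"}},
    {"id": "b", "category": 2, "keywords": {"loan", "rate", "bank"}},
    {"id": "c", "category": 3, "keywords": {"loan", "weather", "sport", "rain"}},
]
targets = screen(policy_keywords, news_items)
```

Items "a" and "b" share the same coefficient of 0.75, but only "b" passes because its category has the lower threshold.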
In one embodiment, as shown in fig. 4, step S500, identifying the subject term of the target news text includes:
Step S520, splitting the target news text by taking sentences as units, and extracting core words of each split sentence.
Step S540, the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text are respectively obtained.
Step S560, determining the subject term of the target news text according to the part of speech and the word frequency of the core word.
Sentence splitting uses punctuation marks, including periods, exclamation marks, question marks and the like, as the splitting basis, and the target news text is split into sentences. Each split sentence is syntactically analyzed to determine the part of speech of each of its component words and to determine and extract its core word. The number of occurrences of each core word in the target news text is then counted to determine its word frequency. Because a subject word is a feature word that reflects the influence result of the policy text, it has a specific part of speech; for example, the part-of-speech recognition result of an enterprise-related feature word is a named entity with a specific meaning. A subject word is also generally mentioned repeatedly in the target news text, so the word frequency of the core word needs to meet a set requirement. The subject words of the target news text are therefore determined according to the part of speech and the word frequency of the core words.
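A minimal sketch of steps S520 to S560, assuming the core-word extraction is supplied as a callable. The toy lexicon lookup `toy_core_word` stands in for real syntactic analysis, and the part-of-speech tags, the minimum frequency and all names are hypothetical.

```python
import re

# Split on Chinese and Western sentence-ending punctuation.
SENTENCE_END = re.compile(r"[。！？.!?]+")


def split_sentences(text):
    """Split text into sentences at periods, exclamation and question marks."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def topic_words(text, core_word_of, allowed_pos=frozenset({"n", "nz"}), min_freq=2):
    """Select core words with an allowed part of speech and enough occurrences."""
    cores = {}
    for sentence in split_sentences(text):
        found = core_word_of(sentence)  # (word, part of speech) or None
        if found is not None:
            word, pos = found
            cores[word] = pos
    # Word frequency is counted over the whole target news text.
    return sorted(
        w for w, pos in cores.items()
        if pos in allowed_pos and text.count(w) >= min_freq
    )


# Toy stand-in for syntactic analysis: first lexicon word in the sentence.
TOY_LEXICON = {"loan": "n", "bank": "n", "rose": "v"}


def toy_core_word(sentence):
    for token in sentence.split():
        if token in TOY_LEXICON:
            return token, TOY_LEXICON[token]
    return None


text = "The loan rate changed. Every bank adjusted its loan offer! Prices rose?"
topics = topic_words(text, toy_core_word)
```

Here "bank" is dropped for appearing only once and "rose" for its part of speech, leaving "loan" as the only subject word.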
In one embodiment, step S600, determining the impact result of the policy text according to the subject term of the target news text includes:
and carrying out named entity recognition processing on the subject words of the target news text, and dividing the subject words into industry feature words, enterprise name related words and product feature words.
Traversing each preset industry feature word library according to the industry feature words, and determining the influence industry of the policy text according to the matching degree of the industry feature words and each preset industry feature word library.
And traversing a preset lexicon of enterprise full names and abbreviations according to the enterprise name related words, and determining the influence enterprise of the policy text.
And obtaining a preset product feature word library containing the product feature words according to the product feature words, and determining the influence product of the policy text according to the product information corresponding to the preset product feature word library.
And determining an influence result of the policy text according to the influence industry, the influence enterprise and the influence product of the policy text.
Named entity recognition refers to recognizing named entities with special meanings in a corpus, such as personal names, place names and organization names. In an embodiment, the subject words of the target news text are input into a pre-constructed named entity recognition model, which recognizes the named entities among them, so that industry feature words, enterprise name related words and product feature words can be distinguished. The named entity recognition model can be a BiLSTM model, a CRF model, or a BiLSTM+CRF model. Determining the affected industries of the policy text: the preset industry feature word libraries are traversed based on the industry feature words, and when the hit rate of the industry feature words against the feature word library of an industry is higher than a set threshold, that industry is determined to be an affected industry related to the policy text; the feature words of an industry may be, for example, environmental protection, pollution discharge, pollution, rectification, supervision, garbage, energy saving and the like. Determining the affected enterprises of the policy text: part-of-speech recognition is carried out on the news text, complete matching is carried out against enterprise full names and abbreviations, and the affected enterprises related to the policy text are determined. Determining the affected products of the policy text: the products can be identified according to preset rules. Taking loan-related products as an example, all loan products are flagged when 'house loan', 'car loan', 'loan interest rate' or 'up/down/up/down' appear in the policy or its related texts.
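The industry and enterprise determination can be sketched as below. The lexicon contents, the enterprise name, the 0.5 threshold and the definition of hit rate as the share of a text's industry feature words found in an industry's library are all assumptions for illustration.

```python
# Hypothetical preset industry feature word libraries.
INDUSTRY_LEXICONS = {
    "environmental protection": {"pollution", "garbage", "energy saving", "supervision"},
    "finance": {"loan", "interest rate", "bank"},
}
# Hypothetical lexicon: enterprise full name -> set of abbreviations.
ENTERPRISES = {"Example Bank Co., Ltd.": {"Example Bank"}}


def affected_industries(industry_words, threshold=0.5):
    """Return industries whose library hit rate exceeds the threshold."""
    words = set(industry_words)
    if not words:
        return []
    result = []
    for industry, lexicon in INDUSTRY_LEXICONS.items():
        # Assumed hit rate: share of the feature words found in the library.
        hit_rate = len(words & lexicon) / len(words)
        if hit_rate > threshold:
            result.append(industry)
    return result


def affected_enterprises(name_words):
    """Return full names that exactly match a full name or an abbreviation."""
    names = set(name_words)
    return sorted(
        full for full, abbreviations in ENTERPRISES.items()
        if full in names or abbreviations & names
    )


industries = affected_industries(["pollution", "garbage", "loan"])
enterprises = affected_enterprises(["Example Bank"])
```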
In an embodiment, the industry feature word libraries can be constructed according to the features of each industry. Specifically, the categories of the word libraries are divided according to the 20 industry categories specified by the state, the related texts of each industry are acquired, the co-occurring words of those related texts are counted, and a co-occurring word is added to the industry's feature word library when it reaches the set threshold requirement.
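One way to read this construction is sketched below: a word counts as co-occurring when it appears in several of an industry's related texts, and it is added once the share of texts containing it reaches a threshold. This interpretation, the 0.6 threshold and the function name are assumptions.

```python
from collections import Counter


def build_lexicon(related_texts, min_share=0.6):
    """Collect words that appear in at least min_share of the related texts."""
    n_texts = len(related_texts)
    if n_texts == 0:
        return set()
    # Count, for each word, the number of texts that contain it.
    doc_freq = Counter()
    for tokens in related_texts:
        doc_freq.update(set(tokens))
    return {w for w, c in doc_freq.items() if c / n_texts >= min_share}


# Hypothetical, already segmented texts for one industry.
texts = [
    ["pollution", "discharge", "factory"],
    ["pollution", "garbage"],
    ["pollution", "discharge", "river"],
]
lexicon = build_lexicon(texts)
```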
In an embodiment, a focus list may also be set, which may include information on companies, products and the like. For a product, the policy points it may involve are supplemented according to public opinion information about the product, or added manually if none are available; for a 'car loan' product, for example, the policy points involved may include car loans, second-hand cars, loan interest rates and the like. For a product of interest, the related news texts are obtained, including explicitly mentioned content, potential content and public opinion. Related content and public opinion can likewise be found from the policy text. The news texts and public opinion related to the product of interest are then compared with the content and public opinion of the policy text to determine the points of intersection, the degree of intersection, the number of coinciding public opinion items and the like, and the resulting relevance is taken as the degree of influence of the policy on the product.
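The intersection comparison can be illustrated with a simple set overlap. The source does not define how the degree of intersection is computed, so the ratio used here (shared points divided by the product's policy points), the function name and the sample data are assumptions.

```python
def product_influence(policy_points, policy_keywords):
    """Return the shared points and an assumed overlap ratio as influence degree."""
    points = set(policy_points)
    shared = points & set(policy_keywords)
    degree = len(shared) / len(points) if points else 0.0
    return sorted(shared), degree


# Hypothetical focus-list entry for a "car loan" product.
car_loan_points = ["car loan", "second-hand car", "loan interest rate"]
policy_keywords = ["loan interest rate", "Ministry of Finance", "car loan"]
shared, degree = product_influence(car_loan_points, policy_keywords)
```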
It should be understood that, although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, a policy influence analysis device is provided, including:
a keyword extraction module 200, configured to obtain a policy text and extract keywords of the policy text;
a news text matching module 300, configured to obtain each news text matched with the policy text according to the keywords;
a target news text screening module 400, configured to compare the similarity between the policy text and each news text, and screen out target news texts whose similarity meets a preset threshold requirement;
a subject word recognition module 500, configured to recognize the subject words of the target news texts; and
an influence result determining module 600, configured to determine an influence result of the policy text according to the subject words.
In one embodiment, the keyword extraction module 200 is further configured to: obtain a policy text and extract its title; traverse a preset stop word library according to the title and filter out the stop words contained in the title; traverse a preset government unit directory according to the stop-word-filtered title to determine the policy issuer of the policy text; perform syntactic analysis on the stop-word-filtered title to determine the policy key points of the policy text; and determine the keywords of the policy text according to the policy issuer and the policy key points.
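The title-based keyword extraction can be sketched as follows. Several simplifications are assumed and should not be read as the patented method: the title is tokenized on whitespace, the stop word library and government unit directory are tiny hypothetical samples, and the syntactic analysis step is replaced by simply treating the words left after removing the issuer as the policy key points.

```python
from typing import List, Optional, Tuple

STOP_WORDS = {"notice", "of", "the", "on", "about"}   # hypothetical stop word library
GOVERNMENT_UNITS = ["State Council", "People's Bank"]  # hypothetical unit directory


def extract_keywords(title: str) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (policy issuer, policy key points, keywords) for a policy title."""
    # Step 1: filter stop words out of the title.
    tokens = [t for t in title.split() if t.lower() not in STOP_WORDS]
    filtered = " ".join(tokens)
    # Step 2: look the filtered title up in the government unit directory.
    issuer = next((unit for unit in GOVERNMENT_UNITS if unit in filtered), None)
    # Step 3 (stand-in for syntactic analysis): the rest are the key points.
    remainder = filtered.replace(issuer, " ") if issuer else filtered
    key_points = remainder.split()
    # Step 4: keywords combine the issuer and the key points.
    keywords = ([issuer] if issuer else []) + key_points
    return issuer, key_points, keywords
```

For the title "Notice of the State Council on energy saving supervision", the sketch yields the issuer "State Council" and the key points "energy", "saving", and "supervision". A production system would use a real word segmenter and parser in place of whitespace splitting.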
In one embodiment, the news text matching module 300 is further configured to obtain a release time of the policy text, determine a news search time range according to the release time, search the news text according to the keyword, and obtain each news text matching the policy text in the news search time range.
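A minimal sketch of deriving the news search time range from the release time is given below. The text does not say how wide the range is, so the defaults (no days before release, 30 days after) and the names `news_search_window` and `in_window` are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import Tuple


def news_search_window(release: date, days_before: int = 0,
                       days_after: int = 30) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates of the news search range."""
    return (release - timedelta(days=days_before),
            release + timedelta(days=days_after))


def in_window(published: date, window: Tuple[date, date]) -> bool:
    """Check whether a news publication date falls inside the range."""
    start, end = window
    return start <= published <= end
```

News texts returned by the keyword search would then be kept only if `in_window` holds for their publication date.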
In one embodiment, the news text matching module 300 is further configured to obtain a first type of news text matching the title of the policy text according to the title of the policy text, obtain a second type of news text matching the keyword according to the keyword, split the keyword into a plurality of sub-keywords according to the part of speech of the keyword, obtain a third type of news text matching each sub-keyword, and determine each news text matching the policy text according to the first type of news text, the second type of news text, and the third type of news text.
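The three-way matching can be sketched as three searches whose results are merged. The sketch assumes the sub-keywords have already been produced by part-of-speech splitting, combines the three classes as a deduplicated union (the text does not specify the combination rule), and uses a naive in-memory substring search as a stand-in for a real news search engine; all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

SearchFn = Callable[[str], List[Tuple[int, str]]]


def make_substring_search(corpus: Dict[int, str]) -> SearchFn:
    """Build a toy search function over an in-memory {doc_id: text} corpus."""
    def search(query: str) -> List[Tuple[int, str]]:
        q = query.lower()
        return [(doc_id, text) for doc_id, text in corpus.items()
                if q in text.lower()]
    return search


def match_news(title: str, keyword: str, sub_keywords: Iterable[str],
               search: SearchFn) -> Dict[int, str]:
    """Merge first-class (title), second-class (keyword) and third-class
    (sub-keyword) matches into one deduplicated result set."""
    matched: Dict[int, str] = {}
    for query in (title, keyword, *sub_keywords):
        for doc_id, text in search(query):
            matched.setdefault(doc_id, text)
    return matched
```

Searching on progressively shorter queries widens recall: a news text that never quotes the full title can still be picked up through a sub-keyword, and the later similarity screening removes weak matches.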
In one embodiment, the target news text screening module 400 is further configured to calculate the Jaccard similarity coefficient between each news text and the policy text to determine their similarity, and screen out target news texts whose calculation result meets a preset threshold requirement.
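The Jaccard similarity coefficient of two keyword sets A and B is |A ∩ B| / |A ∪ B|, which matches the intersection-to-union ratio recited in claim 1. The sketch below applies it to screening; the 0.3 default threshold, the "greater than or equal" comparison, and the function names are assumptions.

```python
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard coefficient of two keyword collections (0.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def screen_target_news(policy_keywords: Iterable[str],
                       news_keywords: Dict[int, Iterable[str]],
                       threshold: float = 0.3) -> List[int]:
    """Return the ids of news texts whose similarity reaches the threshold."""
    policy = set(policy_keywords)
    return [news_id for news_id, words in news_keywords.items()
            if jaccard(policy, words) >= threshold]
```

For example, the keyword sets {a, b, c} and {b, c, d} share two of four distinct keywords, giving a coefficient of 0.5.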
In one embodiment, the subject word recognition module 500 is further configured to split the target news text into sentences, extract the core words of each resulting sentence, obtain the part of speech of each sentence's core words and their word frequency in the target news text, and determine the subject words of the target news text according to the part of speech and word frequency of the core words.
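A simplified sketch of subject word recognition follows. The text does not define how core words are extracted or how part of speech and frequency are combined, so the sketch assumes that a core word is any word with an entry in a supplied part-of-speech lookup (standing in for a real tagger), keeps only nouns, and ranks them by frequency in the whole text; the names and the `top_k` cut-off are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Split on common Western and Chinese sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def subject_words(text: str, pos_lookup: Dict[str, str],
                  allowed_pos: Sequence[str] = ("n",), top_k: int = 3) -> List[str]:
    """Pick the most frequent core words whose part of speech is allowed."""
    freq = Counter(re.findall(r"\w+", text.lower()))
    core_words = set()
    for sentence in split_sentences(text):
        for token in re.findall(r"\w+", sentence.lower()):
            if token in pos_lookup:  # assumed definition of a core word
                core_words.add(token)
    kept = [w for w in core_words if pos_lookup[w] in allowed_pos]
    # Rank by descending frequency, breaking ties alphabetically.
    return sorted(kept, key=lambda w: (-freq[w], w))[:top_k]
```

Filtering by part of speech before ranking keeps frequent but uninformative words, such as common verbs and adverbs, out of the subject word list.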
In one embodiment, the influence result determining module 600 is further configured to: perform named entity recognition on the subject words of the target news text and divide the subject words into industry feature words, enterprise name related words, and product feature words; traverse each preset industry feature word library according to the industry feature words and determine the affected industries of the policy text according to the degree of matching between the industry feature words and each preset industry feature word library; traverse a preset word library of enterprise full names and abbreviations according to the enterprise name related words to determine the affected enterprises of the policy text; obtain, according to the product feature words, the preset product feature word library containing them and determine the affected products of the policy text according to the product information corresponding to that word library; and determine the influence result of the policy text according to the affected industries, affected enterprises, and affected products.
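The enterprise and product lookups and the final combination can be sketched as below. The affected industries are taken as an input, and the two word libraries, the exact-match rule, and all names are hypothetical samples chosen for illustration rather than details given in the text.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical word library: enterprise full name -> known abbreviations.
ENTERPRISE_NAMES: Dict[str, Set[str]] = {
    "Example Bank Co., Ltd.": {"Example Bank", "EB"},
}
# Hypothetical product feature word libraries: product -> feature words.
PRODUCT_WORD_LIBRARIES: Dict[str, Set[str]] = {
    "auto loan": {"car loan", "second-hand car"},
    "mortgage": {"house loan"},
}


def affected_enterprises(name_words: Iterable[str]) -> List[str]:
    """Exact-match name words against full names and abbreviations."""
    words = set(name_words)
    return [full for full, abbrs in ENTERPRISE_NAMES.items()
            if full in words or abbrs & words]


def affected_products(product_words: Iterable[str]) -> List[str]:
    """Return products whose word library contains any product feature word."""
    words = set(product_words)
    return [product for product, library in PRODUCT_WORD_LIBRARIES.items()
            if library & words]


def influence_result(industries: Iterable[str], name_words: Iterable[str],
                     product_words: Iterable[str]) -> Dict[str, List[str]]:
    """Combine the three determinations into one influence result."""
    return {
        "industries": list(industries),
        "enterprises": affected_enterprises(name_words),
        "products": affected_products(product_words),
    }
```

Exact matching is used for enterprise names so that a short abbreviation does not match unrelated words that merely contain it.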
The policy influence analysis device described above obtains a policy text, extracts its keywords, searches for related news texts according to the keywords, compares the similarity between the policy text and the news texts, and screens out target news texts that are highly relevant to the policy text. It then recognizes the subject words of the target news texts and determines the influence result of the policy text according to the subject words. By using news texts as the comparison objects, screening the related news texts, and deriving the influence result from the recognized subject words, the device improves the efficiency of policy influence analysis.
For specific limitations of the policy influence analysis device, reference may be made to the limitations of the policy influence analysis method above, which are not repeated here. Each module of the policy influence analysis device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores policy influence analysis data. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a policy influence analysis method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that when executing the computer program performs the steps of:
acquiring a policy text and extracting keywords of the policy text;
acquiring each news text matched with the policy text according to the keywords;
comparing the similarity between the policy text and each news text, and screening target news texts whose similarity meets a preset threshold requirement;
identifying subject words of the target news texts;
and determining the influence result of the policy text according to the subject words of the target news texts.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a policy text, and extracting a title of the policy text;
traversing a preset stop word library according to the title, and filtering out the stop words contained in the title;
traversing a preset government unit directory according to the stop-word-filtered title, and determining a policy issuer of the policy text;
performing syntactic analysis on the stop-word-filtered title, and determining policy key points of the policy text;
and determining keywords of the policy text according to the policy issuer and the policy key points.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring the release time of the policy text, and determining a news search time range according to the release time;
and searching the news texts according to the keywords to obtain each news text matched with the policy text in the news search time range.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a first type news text matched with the title of the policy text according to the title of the policy text;
acquiring a second type news text matched with the keywords according to the keywords;
splitting the keywords into a plurality of sub-keywords according to the part of speech of the keywords, and obtaining third-class news texts matched with the sub-keywords;
and determining each news text matched with the policy text according to the first type of news text, the second type of news text and the third type of news text.
In one embodiment, the processor when executing the computer program further performs the steps of:
calculating Jaccard similarity coefficients of each news text and the policy text, and determining similarity of each news text and the policy text;
and screening target news texts with calculation results meeting the preset threshold requirements.
In one embodiment, the processor when executing the computer program further performs the steps of:
splitting the target news text into sentences, and extracting the core words of each resulting sentence;
respectively acquiring the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text;
and determining the subject word of the target news text according to the part of speech and the word frequency of the core word.
In one embodiment, the processor when executing the computer program further performs the steps of:
performing named entity recognition processing on the subject words of the target news text, and dividing the subject words into industry feature words, enterprise name related words and product feature words;
traversing each preset industry feature word library according to the industry feature words, and determining the affected industries of the policy text according to the matching degree between the industry feature words and each preset industry feature word library;
traversing a preset word library of enterprise full names and abbreviations according to the enterprise name related words, and determining the affected enterprises of the policy text;
acquiring, according to the product feature words, a preset product feature word library containing the product feature words, and determining the affected products of the policy text according to the product information corresponding to the preset product feature word library;
and determining an influence result of the policy text according to the affected industries, affected enterprises and affected products of the policy text.
The computer device for implementing the policy influence analysis method described above obtains a policy text, extracts its keywords, searches for related news texts according to the keywords, compares the similarity between the policy text and the news texts, and screens out target news texts that are highly relevant to the policy text. It then recognizes the subject words of the target news texts and determines the influence result of the policy text according to the subject words. By using news texts as the comparison objects, screening the related news texts, and deriving the influence result from the recognized subject words, the computer device improves the efficiency of policy influence analysis.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a policy text and extracting keywords of the policy text;
acquiring each news text matched with the policy text according to the keywords;
comparing the similarity between the policy text and each news text, and screening target news texts whose similarity meets a preset threshold requirement;
identifying subject words of the target news texts;
and determining the influence result of the policy text according to the subject words of the target news texts.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a policy text, and extracting a title of the policy text;
traversing a preset stop word library according to the title, and filtering out the stop words contained in the title;
traversing a preset government unit directory according to the stop-word-filtered title, and determining a policy issuer of the policy text;
performing syntactic analysis on the stop-word-filtered title, and determining policy key points of the policy text;
and determining keywords of the policy text according to the policy issuer and the policy key points.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the release time of the policy text, and determining a news search time range according to the release time;
and searching the news texts according to the keywords to obtain each news text matched with the policy text in the news search time range.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a first type news text matched with the title of the policy text according to the title of the policy text;
acquiring a second type news text matched with the keywords according to the keywords;
splitting the keywords into a plurality of sub-keywords according to the part of speech of the keywords, and obtaining third-class news texts matched with the sub-keywords;
and determining each news text matched with the policy text according to the first type of news text, the second type of news text and the third type of news text.
In one embodiment, the computer program when executed by the processor further performs the steps of:
calculating Jaccard similarity coefficients of each news text and the policy text, and determining similarity of each news text and the policy text;
and screening target news texts with calculation results meeting the preset threshold requirements.
In one embodiment, the computer program when executed by the processor further performs the steps of:
splitting the target news text into sentences, and extracting the core words of each resulting sentence;
respectively acquiring the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text;
and determining the subject word of the target news text according to the part of speech and the word frequency of the core word.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing named entity recognition processing on the subject words of the target news text, and dividing the subject words into industry feature words, enterprise name related words and product feature words;
traversing each preset industry feature word library according to the industry feature words, and determining the affected industries of the policy text according to the matching degree between the industry feature words and each preset industry feature word library;
traversing a preset word library of enterprise full names and abbreviations according to the enterprise name related words, and determining the affected enterprises of the policy text;
acquiring, according to the product feature words, a preset product feature word library containing the product feature words, and determining the affected products of the policy text according to the product information corresponding to the preset product feature word library;
and determining an influence result of the policy text according to the affected industries, affected enterprises and affected products of the policy text.
The computer-readable storage medium for implementing the policy influence analysis method described above obtains a policy text, extracts its keywords, searches for related news texts according to the keywords, compares the similarity between the policy text and the news texts, and screens out target news texts that are highly relevant to the policy text. It then recognizes the subject words of the target news texts and determines the influence result of the policy text according to the subject words. By using news texts as the comparison objects, screening the related news texts, and deriving the influence result from the recognized subject words, it improves the efficiency of policy influence analysis.
Those skilled in the art will appreciate that all or part of the flow of the policy influence analysis method in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the method embodiments above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this description.
The above examples represent only a few embodiments of the application. They are described specifically and in detail, but are not thereby to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
Claims (10)
1. A policy impact analysis method, the method comprising:
acquiring a policy text, and extracting keywords of the policy text;
acquiring each news text matched with the policy text according to the keywords;
counting, for each news text, the number of keywords in the intersection of the policy text and the news text and the number of keywords in their union, calculating the ratio of the intersection count to the union count, and determining the similarity between each news text and the policy text according to the ratio;
screening target news texts whose calculation results meet a preset threshold requirement;
identifying a subject term of the target news text;
performing named entity recognition processing on the subject words of the target news text, and dividing the subject words into industry feature words, enterprise name related words and product feature words;
traversing each preset industry feature word library according to the industry feature words, and determining the influence industry of the policy text according to the matching degree of the industry feature words and each preset industry feature word library;
traversing a preset word library of enterprise full names and abbreviations according to the enterprise name related words, and determining the influence enterprise of the policy text;
acquiring a preset product feature word library containing the product feature words according to the product feature words, and determining the influence products of the policy text according to the product information corresponding to the preset product feature word library;
determining an influence result of the policy text according to the influence industry, the influence enterprise and the influence product of the policy text;
the step of obtaining each news text matched with the policy text according to the keywords comprises the following steps:
acquiring a first type news text matched with the title of the policy text according to the title of the policy text;
acquiring a second type news text matched with the keywords according to the keywords;
splitting the keyword into a plurality of sub-keywords according to the part of speech of the keyword, and obtaining a third-class news text matched with each sub-keyword;
and determining each news text matched with the policy text according to the first type news text, the second type news text and the third type news text.
2. The method of claim 1, wherein the obtaining the policy text and extracting keywords of the policy text comprises:
acquiring a policy text, and extracting a title of the policy text;
traversing a preset stop word library according to the title, and screening stop words contained in the title;
traversing a preset government unit directory according to the title subjected to the screening processing of the stop words, and determining a policy issuing party of the policy text;
carrying out syntactic analysis on the title subjected to the screening processing of the stop words to determine policy key points of the policy text;
and determining keywords of the policy text according to the policy issuing party and the policy key points.
3. The method of claim 1, wherein the obtaining each news text matching the policy text based on the keywords comprises:
acquiring the release time of the policy text, and determining a news search time range according to the release time;
and searching the news texts according to the keywords to obtain each news text matched with the policy text in the news searching time range.
4. The method of claim 1, wherein the identifying the subject term of the target news text comprises:
splitting the target news text by taking sentences as units, and extracting core words of each split sentence;
respectively acquiring the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text;
and determining the subject term of the target news text according to the part of speech and the word frequency of the core term.
5. A policy impact analysis device, the device comprising:
the keyword extraction module is used for acquiring the policy text and extracting keywords of the policy text;
The news text matching module is used for acquiring each news text matched with the policy text according to the keywords;
The target news text screening module is used for counting the intersection number of each keyword in the policy text and each news text and the union number of each keyword in the policy text and each news text, calculating the ratio of the intersection number to the union number, and determining the similarity of each news text and the policy text according to the ratio; screening target news texts with calculation results meeting the requirement of a preset threshold;
the subject term identification module is used for identifying the subject term of the target news text;
The influence result determining module is used for carrying out named entity recognition processing on the subject words of the target news text and dividing the subject words into industry characteristic words, enterprise name related words and product characteristic words; traversing each preset industry feature word library according to the industry feature words, and determining the influence industry of the policy text according to the matching degree of the industry feature words and each preset industry feature word library; traversing a word stock of a preset enterprise full abbreviation according to the related words of the enterprise names, and determining the influence enterprise of the policy text; acquiring a preset product feature word library containing the product feature words according to the product feature words, and determining the influence products of the policy text according to the product information corresponding to the preset product feature word library; determining an influence result of the policy text according to the influence industry, the influence enterprise and the influence product of the policy text;
The news text matching module is further configured to: acquiring a first type news text matched with the title of the policy text according to the title of the policy text; acquiring a second type news text matched with the keywords according to the keywords; splitting the keyword into a plurality of sub-keywords according to the part of speech of the keyword, and obtaining a third-class news text matched with each sub-keyword; and determining each news text matched with the policy text according to the first type news text, the second type news text and the third type news text.
6. The apparatus of claim 5, wherein the keyword extraction module is further configured to: acquiring a policy text, and extracting a title of the policy text; traversing a preset stop word library according to the title, and screening stop words contained in the title; traversing a preset government unit directory according to the title subjected to the screening processing of the stop words, and determining a policy issuing party of the policy text; carrying out syntactic analysis on the title subjected to the screening processing of the stop words to determine policy key points of the policy text; and determining keywords of the policy text according to the policy issuing party and the policy key points.
7. The apparatus of claim 5, wherein the news text matching module is further configured to: acquiring the release time of the policy text, and determining a news search time range according to the release time; and searching the news texts according to the keywords to obtain each news text matched with the policy text in the news searching time range.
8. The apparatus of claim 5, wherein the subject term identification module is further configured to: splitting the target news text by taking sentences as units, and extracting core words of each split sentence; respectively acquiring the part of speech of the core word of each sentence and the word frequency of the core word of each sentence in the target news text; and determining the subject term of the target news text according to the part of speech and the word frequency of the core term.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811417482.5A CN109635082B (en) | 2018-11-26 | 2018-11-26 | Policy influence analysis method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811417482.5A CN109635082B (en) | 2018-11-26 | 2018-11-26 | Policy influence analysis method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635082A (en) | 2019-04-16 |
CN109635082B (en) | 2024-08-02 |
Family
ID=66069041
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811417482.5A Active CN109635082B (en) | 2018-11-26 | 2018-11-26 | Policy influence analysis method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635082B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516140A (en) * | 2019-08-15 | 2019-11-29 | 北京泰迪熊移动科技有限公司 | A kind of information processing method, equipment and computer storage medium |
CN110705275B (en) * | 2019-09-18 | 2023-04-25 | 东软集团股份有限公司 | Method and device for extracting subject term, storage medium and electronic equipment |
CN110705285B (en) * | 2019-09-20 | 2022-11-22 | 北京市计算中心有限公司 | Government affair text subject word library construction method, device, server and readable storage medium |
CN111506727B (en) * | 2020-04-16 | 2023-10-03 | 腾讯科技(深圳)有限公司 | Text content category acquisition method, apparatus, computer device and storage medium |
CN111652524A (en) * | 2020-06-11 | 2020-09-11 | 中力数创(重庆)科技有限公司 | Method and device for intelligently matching policy and guiding improvement path |
CN112330342A (en) * | 2020-11-11 | 2021-02-05 | 佰聆数据股份有限公司 | Method and system for optimally matching enterprise name and system user name |
CN113064971A (en) * | 2021-04-12 | 2021-07-02 | 苏州城方信息技术有限公司 | Interactive graph structure-based policy text relation mining and expressing method |
CN113723091A (en) * | 2021-08-17 | 2021-11-30 | 中国光大银行股份有限公司 | Enterprise name identification method and device |
CN114495145B (en) * | 2022-02-16 | 2024-05-28 | 平安国际智慧城市科技股份有限公司 | Policy and document extraction method, device, equipment and storage medium |
CN116992111B (en) * | 2023-09-28 | 2023-12-26 | 中国科学技术信息研究所 | Data processing method, device, electronic equipment and computer storage medium |
CN117520552B (en) * | 2024-01-08 | 2024-04-16 | 北京中科江南信息技术股份有限公司 | Policy text processing method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4859893B2 (en) * | 2008-08-12 | 2012-01-25 | ヤフー株式会社 | Advertisement distribution apparatus, advertisement distribution method, and advertisement distribution control program |
CN107633033A (en) * | 2017-09-08 | 2018-01-26 | 成都链科信息科技有限公司 | Policy big data intelligent matching system and matching method |
CN108334626B (en) * | 2018-02-12 | 2022-06-10 | 百度在线网络技术(北京)有限公司 | News column generation method and device and computer equipment |
CN108446402A (en) * | 2018-03-31 | 2018-08-24 | 四川久久合创信息技术有限公司 | Topological data processing system for policy information matching |
- 2018-11-26: CN application CN201811417482.5A (CN109635082B), status Active
Non-Patent Citations (2)
Title |
---|
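Research on differences in energy conservation and emission reduction policy objectives in Beijing-Tianjin-Hebei; Ye Yaqiong; China Master's Theses Full-text Database, Economics and Management Sciences; p. J145-12 * |
Public opinion monitoring system based on semantic sentiment analysis; Zou Huang; Wanfang Data; pp. 1-88 * |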
Also Published As
Publication number | Publication date |
---|---|
CN109635082A (en) | 2019-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635082B (en) | Policy influence analysis method, device, computer equipment and storage medium | |
CN109858010B (en) | Method and device for recognizing new words in field, computer equipment and storage medium | |
Stamatatos et al. | Clustering by authorship within and across documents | |
WO2020077896A1 (en) | Method and apparatus for generating question data, computer device, and storage medium | |
US8073877B2 (en) | Scalable semi-structured named entity detection | |
US20210026835A1 (en) | System and semi-supervised methodology for performing machine driven analysis and determination of integrity due diligence risk associated with third party entities and associated individuals and stakeholders | |
US9183286B2 (en) | Methodologies and analytics tools for identifying white space opportunities in a given industry | |
CN111767716B (en) | Method and device for determining enterprise multi-level industry information and computer equipment | |
Chen et al. | Towards robust unsupervised personal name disambiguation | |
US11461371B2 (en) | Methods and text summarization systems for data loss prevention and autolabelling | |
CN110909120B (en) | Resume searching/delivering method, device and system and electronic equipment | |
US20070113292A1 (en) | Automated rule generation for a secure downgrader | |
Mustafa et al. | Multi-label classification of research articles using Word2Vec and identification of similarity threshold | |
CN109710918B (en) | Public opinion identification method, public opinion identification device, computer equipment and storage medium | |
CN107368489B (en) | Information data processing method and device | |
CN105550168A (en) | Method and device for determining notional words of objects | |
US11687647B2 (en) | Method and electronic device for generating semantic representation of document to determine data security risk | |
CN115062135B (en) | Patent screening method and electronic equipment | |
Wang et al. | Cyber threat intelligence entity extraction based on deep learning and field knowledge engineering | |
Pham et al. | Detecting cheapfakes using self-query adaptive-context learning | |
Haider et al. | Corporate news classification and valence prediction: A supervised approach | |
CN111401047A (en) | Method and device for generating dispute focus of legal document and computer equipment | |
Cao et al. | Intention classification in multiturn dialogue systems with key sentences mining | |
CN114580398A (en) | Text information extraction model generation method, text information extraction method and device | |
CN115328945A (en) | Data asset retrieval method, electronic device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||