CN114036921A - Policy information matching method and device


Info

Publication number
CN114036921A
Authority
CN
China
Prior art keywords
policy
enterprise
answer
corpus text
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011510821.1A
Other languages
Chinese (zh)
Inventor
张一凡
芦海
谭小龙
车亚生
陈思操
杜强
王虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202011510821.1A
Publication of CN114036921A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a policy information matching method and device, and relates to the technical field of computers. A specific implementation of the method comprises: processing policy corpus text to obtain a policy answer; obtaining a policy question according to the policy answer; obtaining an enterprise answer according to the policy question and enterprise corpus text; calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer; and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers with the top N similarities. This implementation can provide the policy best suited to an enterprise according to the enterprise's situation, recommends policies accurately and promptly, is applicable to different scenarios and fields, and reduces the labor cost and resource consumption of manual processing.

Description

Policy information matching method and device
Technical Field
The invention relates to the technical field of computers, in particular to a policy information matching method and device.
Background
When extracting policy information of government official websites, it is usually necessary to manually screen and summarize the relevant requirements in the newly issued policy, and then match the relevant requirements with the existing qualification information of enterprises to recommend the available policy.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
manual screening is highly accurate, but its labor cost is too high, and as the volume of released policies grows, timeliness cannot be guaranteed and recommendations lag behind.
Disclosure of Invention
In view of this, embodiments of the present invention provide a policy information matching method and apparatus, which can provide an optimal policy suitable for an enterprise according to an enterprise situation, accurately and timely recommend the policy, and are suitable for different scenarios and fields, thereby reducing labor cost and resource consumption during manual processing.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a policy information matching method including:
processing the policy corpus text to obtain a policy answer;
obtaining a policy question according to the policy answer;
obtaining enterprise answers according to the policy questions and the enterprise corpus texts;
calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers with the top N similarities, wherein N is a natural number.
Optionally, the processing the policy corpus text includes:
carrying out data cleaning on the policy corpus text, and carrying out word segmentation on the cleaned policy corpus text;
and carrying out named entity recognition on the policy corpus text after word segmentation so as to extract an entity result.
Optionally, the entity result is evaluated, and the entity result is iteratively updated according to the evaluation result until an iteration stop condition is met, so as to obtain a final entity result.
Optionally, the processing the policy corpus text to obtain a policy answer includes:
and constructing a mapping relation between the entity result and words/phrases related to the entity result in the policy corpus text based on the context relation of the entity result to obtain the policy answer.
Optionally, obtaining an enterprise answer according to the policy question and the enterprise corpus text includes:
carrying out data cleaning on the enterprise corpus text, and carrying out word segmentation on the cleaned enterprise corpus text;
and obtaining enterprise answers according to the policy questions and the enterprise corpus texts after word segmentation.
Optionally, the method further comprises:
under the condition that the policy corpus text corresponds to a plurality of policy answers, respectively obtaining the policy question according to each policy answer;
and calculating a total similarity between the policy answers and the enterprise answers, and recommending to the enterprise, in descending order of total similarity, the policy corpus texts corresponding to the policy answers ranked in the top N.
Optionally, calculating the total similarity between the policy answers and the enterprise answers includes:
for each policy question, calculating the similarity between the enterprise answer and the policy answer under the same policy question, taking the average of the similarities over all policy questions, and recommending to the enterprise, in descending order of the average, the policy corpus texts corresponding to the policies ranked in the top N.
Optionally, the method further comprises:
and constructing a policy subsidy dictionary based on the policy answer, wherein the policy subsidy dictionary comprises the corresponding relation between the policy corpus text and the policy answer.
Optionally, the method further comprises:
counting the policy questions of the policy answers in the policy subsidy dictionary;
correspondingly storing the policy corpus text, a policy answer obtained based on the policy corpus text and a policy question corresponding to the policy answer;
obtaining the enterprise answers according to the policy questions and the enterprise corpus texts;
and calculating the total similarity between the policy answers and the enterprise answers, and recommending to the enterprise, in descending order of total similarity, the policy corpus texts corresponding to the policy answers ranked in the top N.
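Illustratively, the averaging and ranking described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: answers are assumed to be plain strings keyed by policy question, Python's difflib.SequenceMatcher stands in for the similarity measure (the text does not name one), and the names answer_similarity, total_similarity, and recommend_top_n are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def answer_similarity(policy_answer: str, enterprise_answer: str) -> float:
    """Stand-in similarity in [0, 1]; the source does not specify the measure."""
    return SequenceMatcher(None, policy_answer, enterprise_answer).ratio()


def total_similarity(policy_answers: dict[str, str],
                     enterprise_answers: dict[str, str]) -> float:
    """Mean similarity over all policy questions of one policy."""
    if not policy_answers:
        return 0.0
    scores = [
        # A missing enterprise answer is compared as an empty string.
        answer_similarity(answer, enterprise_answers.get(question, ""))
        for question, answer in policy_answers.items()
    ]
    return sum(scores) / len(scores)


def recommend_top_n(policies: dict[str, dict[str, str]],
                    enterprise_answers: dict[str, str],
                    n: int) -> list[tuple[str, float]]:
    """Return the n policies with the highest total similarity, best first."""
    ranked = sorted(
        ((name, total_similarity(qa, enterprise_answers))
         for name, qa in policies.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]
```

Here each policy is represented by a mapping from policy question to policy answer, and the enterprise by a mapping from the same questions to enterprise answers; a policy whose answers all coincide with the enterprise answers receives a total similarity of 1.0 and ranks first.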
According to still another aspect of an embodiment of the present invention, there is provided a policy information matching apparatus including:
the preprocessing module is used for processing the policy corpus text to obtain a policy answer;
the preprocessing module is also used for obtaining a policy question according to the policy answer;
the reading comprehension module is used for obtaining enterprise answers according to the policy questions and the enterprise corpus texts;
and the recommending module is used for calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers with the top N similarities, wherein N is a natural number.
According to another aspect of the embodiments of the present invention, there is provided a policy information matching electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the policy information matching method provided by the present invention.
According to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the policy information matching method provided by the present invention.
One embodiment of the above invention has the following advantages or benefits. A subsidy dictionary is built from a large amount of policy information collected from government official websites and similar sources, policy questions are extracted from it, a BERT reading comprehension model is used to obtain a policy answer template and an enterprise answer list for the policy questions, answer similarity is obtained by comparing the two, and the top N policies are recommended to the enterprise according to its situation. This overcomes the technical problems that manual screening is costly and that, as the volume of released policies grows, timeliness cannot be guaranteed and recommendations lag behind. The latest policies applicable to an enterprise are extracted promptly from the large amount of policy information on government official websites and recommended; the policy best suited to the enterprise can be provided according to its situation; recommendation is accurate and timely and applies to different scenarios and fields; and the labor cost and resource consumption of manual processing are reduced.
Further effects of the optional implementations described above are explained below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 illustrates an exemplary system architecture diagram of a policy information matching method or policy information matching apparatus suitable for application to an embodiment of the present invention;
fig. 2 is a schematic diagram of a main flow of a policy information matching method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a detailed flow of a policy information matching method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of main blocks of a policy information matching apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a computer system suitable for use with a terminal device implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows an exemplary system architecture to which the policy information matching method or the policy information matching apparatus of an embodiment of the present invention can be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various client applications may be installed on the terminal devices 101, 102, 103, for example applications for uploading enterprise corpus text or policy corpus text.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a background management server that analyzes enterprise corpus text uploaded by users through the terminal devices 101, 102, 103 in order to recommend policies suitable for the enterprise, or a background management server that analyzes policy corpus text uploaded by users through the terminal devices 101, 102, 103 in order to generate policy questions. That is, the background management server may analyze and otherwise process received data, such as a corpus text acquisition request, and feed the processing result (e.g., corpus text) back to the terminal devices 101, 102, 103.
It should be noted that the policy information matching method provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the policy information matching apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a schematic diagram of a main flow of a policy information matching method according to an embodiment of the present invention, and as shown in fig. 2, the policy information matching method of the present invention includes:
step S201, processing the policy corpus text to obtain a policy answer.
Illustratively, the policy corpus text is processed to extract policy answers therefrom.
Illustratively, policy answers may be extracted directly from the policy corpus text.
Further, the processing of the policy corpus text comprises cleaning the policy corpus text and performing word segmentation processing on the cleaned policy corpus text.
Furthermore, named entity recognition is carried out on the policy corpus text after word segmentation so as to extract entity results, and mapping relations between the entity results and related words/phrases are constructed based on context relations of the entity results in the policy corpus text, so that policy answers are obtained.
Step S202, a policy question is obtained according to the policy answer.
Illustratively, policy questions are derived from the policy answers.
Step S203, obtaining enterprise answers according to the policy questions and the enterprise corpus texts.
Illustratively, the enterprise corpus text is cleaned and then segmented into words. The segmented enterprise corpus text and the policy question are input into a BERT model, which outputs the answer corresponding to the policy question; this output is taken as the enterprise answer to the policy question.
Step S204, calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers with the top N similarities, where N is a natural number.
Illustratively, based on the policy answer obtained in step S201, the policy question obtained in step S202, and the enterprise answer obtained in step S203, the similarity between the enterprise answer to the policy question and the policy answer corresponding to that question is calculated so as to match the enterprise answer against the policy answer. The policy corpus texts corresponding to the policy answers with the top N similarities are recommended to the enterprise in descending order of similarity; the higher a policy ranks, the better it matches the enterprise. N is a natural number.
In the embodiment of the invention, a policy answer is obtained by processing the policy corpus text; a policy question is obtained according to the policy answer; an enterprise answer is obtained according to the policy question and the enterprise corpus text; and the similarity between the policy answer corresponding to the policy question and the enterprise answer is calculated, and the policy corpus texts corresponding to the policy answers with the top N similarities are recommended to the enterprise in descending order of similarity. Through these steps, policies suited to the enterprise's situation can be recommended accurately and promptly, reducing the labor cost and resource consumption of manual processing.
Fig. 3 is a schematic diagram of a detailed flow of a policy information matching method according to an embodiment of the present invention, and as shown in fig. 3, the policy information matching method of the present invention includes:
Step S301, cleaning the original enterprise corpus text.
Illustratively, the data in the original enterprise corpus text is cleaned according to preset data cleaning rules. The cleaning may include: deleting repeated characters, abnormal characters (extra spaces, garbled characters, and the like), redundant data, invalid data, and similar information in the text; correcting obvious, recognizable errors such as wrongly written characters and incorrect punctuation; giving duplicated information new definitions so that different objects can be distinguished; and filling in missing information. This facilitates subsequent data analysis and storage and ensures the integrity and consistency of the data.
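Illustratively, a few of the cleaning rules listed above can be sketched as follows. The specific rules chosen here (Unicode normalization, removal of control and replacement characters, collapsing of whitespace and repeated punctuation, removal of duplicate lines) and the function name clean_text are assumptions for illustration; the text only states that preset rules are used and does not define them.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Apply a few illustrative cleaning rules to raw corpus text."""
    # Normalize compatibility forms (e.g. full-width digits become ASCII digits).
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters and the Unicode replacement character,
    # keeping newlines and tabs for the whitespace step below.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or (unicodedata.category(ch) != "Cc" and ch != "\ufffd")
    )
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse repeated punctuation such as "!!" or ",,".
    lines = [re.sub(r"([,.;:!?])\1+", r"\1", line) for line in lines]
    # Remove empty lines and verbatim duplicate lines, keeping first occurrences.
    seen, result = set(), []
    for line in lines:
        if line and line not in seen:
            seen.add(line)
            result.append(line)
    return "\n".join(result)
```

Correcting wrongly written characters and filling in missing information generally require domain knowledge or a reference source and are therefore not covered by this sketch.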
Further, the original enterprise corpus text may be a structured corpus comprising fields such as: enterprise name-A, establishment time-19990101, address-Jiangsu, legal representative-B, enterprise property-private enterprise, branch organization-none, credit level-A level, enterprise annual report-XXX, number of patents-6000, patent award-XXX, and the like, wherein the information corresponding to each field (namely the field value) may be of a different data type.
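Illustratively, a structured corpus of this kind can be read into a field-to-value mapping. The sketch below assumes the comma-separated "field-value" layout used in the example above; the actual storage format of the structured corpus is not specified in the text, and the function name parse_structured_corpus is hypothetical.

```python
from __future__ import annotations


def parse_structured_corpus(text: str) -> dict[str, str]:
    """Parse comma-separated 'field-value' pairs into a dictionary."""
    record = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first hyphen only, so values may themselves contain hyphens.
        field, sep, value = item.partition("-")
        if sep:  # skip fragments that carry no field-value separator
            record[field.strip()] = value.strip()
    return record
```

For the input "enterprise name-A, establishment time-19990101, address-Jiangsu" this yields a mapping with the keys "enterprise name", "establishment time", and "address". All values are kept as strings here; converting them to dates or numbers is left to later processing.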
Further, the original enterprise corpus text may be an unstructured corpus that gives a description of the enterprise, from which enterprise-related information can be extracted. For example: "A Limited, established on January 1, 1999, headquartered in Jiangsu, legal representative B, company type B2C, operating online retail services". The following information can be extracted: enterprise name-A, establishment time-19990101, address-Jiangsu, enterprise field-e-commerce.
Step S302, cleaning the text of the original policy corpus.
For example, data cleaning is performed on data in the original policy corpus text through preset data cleaning rules, which may include the procedure as described in step S301, so as to facilitate subsequent data analysis and storage, and ensure integrity and consistency of the data.
Further, the original policy corpus text may be a structured corpus comprising fields such as: enterprise type requirement, enterprise property requirement, enterprise establishment time requirement, enterprise region requirement, enterprise credit level requirement, enterprise annual report rating requirement, enterprise patent quantity requirement, enterprise patent award requirement, and the like, wherein the information corresponding to each field (namely the field value) may be of a different data type.
Further, the original policy corpus text may be an unstructured corpus that gives a description of the policy, from which policy-related information can be extracted. For example: "Support the construction of e-commerce cluster zones. A provincial e-commerce demonstration base that passes its annual assessment, with main indicators such as e-commerce transaction volume and number of e-commerce enterprises meeting the provincial establishment standard, is awarded 300,000 yuan. A newly established local e-commerce cluster zone with a building area of 10000 square meters, more than 30 e-commerce enterprises, and annual network sales of more than 3 million yuan is awarded 200,000 yuan. A rural e-commerce special park (street) with a building area of 5000 square meters, more than 20 e-commerce enterprises, and annual network sales of more than 50 million yuan receives a one-time reward of no more than 150,000 yuan." The following information can be extracted: enterprise field-e-commerce, building area-10000 square meters, annual network sales-more than 50 million yuan.
Step S303, segmenting the policy corpus text into words.
Illustratively, the cleaned policy corpus text is segmented with a BERT-CRF model tool. The process includes: splitting the text into single characters and punctuation marks; labeling each character with its part of speech and its lexeme, where the lexeme is either word-breaking or non-word-breaking (word-breaking means the character is the last character of a word, non-word-breaking means it is not); adding a word delimiter after each word-breaking character; and combining the characters into words and splitting at the delimiters to obtain the segmented policy corpus text. The result of segmentation with the BERT-CRF model tool includes single characters, phrases, punctuation marks, and the like. For example, for the sentence "the embodiment of the invention provides a policy information matching method", the result after BERT-CRF labeling (rendered here character by character) is: this/B generation/I implementation/B case/I provision/B provision/I type/B type/I administration/B strategy/I information/I matching/B allocation/I method/B method/I, and the word segmentation result is: the embodiment of the invention provides a policy information matching method. Here 'B' indicates that the character is not the last character of its word (non-word-breaking), and 'I' indicates that the character is the last character of its word (word-breaking).
Further, a user-defined dictionary can be associated to improve the accuracy of word segmentation.
Further, the segmented policy corpus text serves as input to the subsequent BERT reading comprehension model.
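Illustratively, the step of turning per-character lexeme labels into words can be sketched as follows, using the labeling convention stated above: 'B' for a character that is not the last character of its word and 'I' for the last character of a word. The labels themselves would come from the BERT-CRF tool, which is not reproduced here; the function name decode_segmentation is hypothetical.

```python
from __future__ import annotations


def decode_segmentation(chars: list[str], tags: list[str]) -> list[str]:
    """Group characters into words using the lexeme tags described in the text.

    'B' marks a character that is not the last character of its word;
    'I' marks the last character of a word, after which a delimiter is placed.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag == "I":  # word-final character: close the current word
            words.append("".join(current))
            current = []
    if current:  # trailing characters that were never closed by an 'I'
        words.append("".join(current))
    return words
```

For instance, the characters a, b, c, d, e labeled B, I, B, B, I are grouped into the two words "ab" and "cde".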
Step S304, identifying the named entity, and extracting the entity result (i.e., entity).
Illustratively, the entity results include one or more of the following: initial entities, coarse-grained entities, and fine-grained entities.
Exemplarily, named entity recognition is performed on the segmented policy corpus text obtained in step S303 to obtain initial entities; coarse-grained entities are extracted from the initial entities; and the features of the coarse-grained entities are refined to obtain fine-grained entities. For example, for the policy corpus text "an e-commerce enterprise whose annual invoicing network sales exceed 100 million yuan is rewarded 500,000 yuan, with a cumulative reward cap of 2 million yuan per enterprise", named entity recognition yields the initial entity "annual invoicing network sales"; extraction from the initial entity yields the coarse-grained entity "annual network sales"; and refinement of the coarse-grained entity yields the fine-grained entity "annual sales".
Further, entities representing occupation, categories and the like can be obtained from the external corpus, and coarse-grained entities and fine-grained entities are obtained.
Step S305, evaluating the fine-grained entity to obtain a final entity result.
Exemplarily, based on the fine-grained entities obtained in step S304, the association among the initial entities, coarse-grained entities, and fine-grained entities is obtained through context awareness and similar means, and the fine-grained entities are evaluated to obtain the final entity result.
Illustratively, the fine-grained entities are evaluated as follows: the initial entities are matched with the fine-grained entities to generate candidates; the number of occurrences of each fine-grained entity is counted and input into a random forest model for training to obtain an evaluation result; the entities are re-divided in the direction indicated by the evaluation result; and after re-division the next round of evaluation is carried out, iteratively updating the entity result until the iteration stop condition is met, to obtain the final entity result.
Illustratively, the final entity results include: ① [annual sales], [annual network sales], [annual invoicing network sales], [annual main business income]; ② [total number of enterprise workers], [number of employees]; and the like.
Further, optionally, the initial entities and/or coarse-grained entities are also evaluated.
Further, the evaluation indexes may include frequency, consistency, information richness, integrity, coverage, and the like. Segmentation is performed with context awareness, so that the order and context of the extracted entity information in the original text are preserved and the characteristics of the entity are accurately reflected.
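Illustratively, the candidate generation and occurrence counting described for step S305 can be sketched as follows. The input is assumed to be a list of (initial entity, fine-grained entity) pairs observed in the corpus, and the function name build_candidates is hypothetical. The random forest training and the iterative re-division are not shown; the counts produced here would only be one input to them.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_candidates(pairs: list[tuple[str, str]]):
    """Count fine-grained entities and group the initial entities mapped to each.

    pairs: (initial_entity, fine_grained_entity) observations from the corpus.
    Returns (occurrence counts, candidates per fine-grained entity).
    """
    counts = Counter(fine for _, fine in pairs)
    grouped = defaultdict(set)
    for initial, fine in pairs:
        grouped[fine].add(initial)
    # Sort for a stable, reproducible candidate order.
    candidates = {fine: sorted(initials) for fine, initials in grouped.items()}
    return counts, candidates
```

With the entities from the example above, both "annual invoicing network sales" and "annual network sales" become candidates for the fine-grained entity "annual sales", which is then counted twice.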
Step S306, constructing a subsidy dictionary by using the TruePIE method.
Illustratively, based on the entity result adjusted in step S305, and according to the position of each entity in the corpus text, the TruePIE method is used to construct a mapping between the entity and the words/phrases related to it in the corpus text, and the entity features are then extracted to obtain concept pairs for the entity. An entity result obtained in step S305 may have several meanings; after the mapping is constructed with the TruePIE method, the concept pair reflects the most accurate meaning of the entity. For example, concept pairs of entities include: ① [annual sales of 3 billion yuan], [annual network sales of 50 million yuan]; ② [annual sales of 50 million yuan], [annual network sales of 20 million yuan]; and the like.
Illustratively, the phrase position mapping is constructed by means of a schema, which builds the phrase as a fixed collocation.
Illustratively, a candidate subsidy dictionary is built from the concept pairs of the entities; the dictionary includes nouns and noun phrases.
Step S307, expanding the subsidy dictionary by using the PU algorithm.
Illustratively, to further improve the applicability of the subsidy dictionary, it is expanded with a PU algorithm. The subsidy dictionary data obtained in step S306 and unlabeled corpus text data are input into a PU algorithm model for training and classification; entity prediction is performed on the unlabeled policy corpus text and suitable entity results are selected for it. If the evaluation result of a round is poor, the entity result is corrected according to the evaluation result and iteratively updated until the iteration stop condition is met, yielding the final entity result for the unlabeled corpus text, with which the subsidy dictionary is expanded.
Further, unlabeled corpus text may be obtained from a number of different data sources.
Furthermore, the supplemented subsidy dictionary is rechecked to correct the subsidy dictionary and obtain a more accurate subsidy dictionary.
Step S308, a policy question and answer template is constructed.
Illustratively, policy questions are extracted from the subsidy dictionary obtained in step S307. For example, for the entry [annual sales of 3 billion] in the subsidy dictionary, the policy question "what is annual sales?" is obtained.
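Illustratively, deriving policy questions from dictionary entries can be sketched with a fixed question pattern. The sketch assumes that the subsidy dictionary is held as a mapping from entity to value and that every question follows the "what is ...?" pattern of the example above; both are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations


def make_policy_question(entity: str) -> str:
    """Build a policy question for one entity using a fixed pattern."""
    return f"what is {entity}?"


def questions_from_dictionary(subsidy_dictionary: dict[str, str]) -> dict[str, str]:
    """Map each entity in the subsidy dictionary to its policy question."""
    return {entity: make_policy_question(entity) for entity in subsidy_dictionary}
```

For an entry whose entity is "annual sales", this produces the question "what is annual sales?"; the value stored with the entity is not needed to form the question.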
In the training stage of the BERT model, the segmented policy corpus text obtained in step S303 and the policy questions extracted from the subsidy dictionary are input. The model performs reading comprehension, predicts the position of the answer to each policy question in the policy corpus text, and outputs the corresponding policy answer, which is used as the standard answer to that policy question.
Illustratively, a policy question-and-answer template is constructed from the policy questions and the standard answers obtained by BERT-based reading comprehension in the training stage. Here, the answer template is the answer to a policy question obtained from the policy corpus text.
Further, for example, the policy questions extracted from the subsidy dictionary are "what is the enterprise qualification requirement?" and "what is the enterprise technical innovation requirement?", and the segmented policy corpus text is: "the enterprise qualification requirement is a high and new technology enterprise, and the enterprise technical innovation requirement is possession of a core patent". The policy questions and the segmented text are input into the BERT model, the answers to the policy questions are output, and a policy question-and-answer template is constructed with the following correspondence: policy question "what is the enterprise qualification requirement?", policy answer "high and new technology enterprise"; policy question "what is the enterprise technical innovation requirement?", policy answer "possession of a core patent".
Furthermore, the policy questions corresponding to the policy answers in the subsidy dictionary are collected, and a subsidy dictionary library (i.e., a policy library) is constructed by storing, in correspondence, the policy corpus text, the policy answers obtained from the policy corpus text, and the policy questions corresponding to those policy answers. In subsequent processing, if the subsidy dictionary built from a new policy corpus text contains a policy answer that is not in the subsidy dictionary library, the policy answer and its corresponding items are added to the library.
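One possible in-memory layout for the policy library is sketched below. The class name `PolicyLibrary`, its fields and the choice to key entries by policy answer are assumptions; they simply mirror the rule that an answer is added, together with its question and policy text, only when the library does not already contain it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PolicyLibrary:
    """Hypothetical store: policy answer -> (policy question, policy corpus text)."""

    entries: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add(self, policy_text: str, question: str, answer: str) -> bool:
        """Store the triple if the answer is new; return True when it was added."""
        if answer in self.entries:
            return False
        self.entries[answer] = (question, policy_text)
        return True

    def questions(self) -> List[str]:
        """All distinct policy questions currently in the library, sorted."""
        return sorted({question for question, _ in self.entries.values()})


library = PolicyLibrary()
first = library.add(
    "policy text A",
    "what is the enterprise qualification requirement?",
    "high and new technology enterprise",
)
# The same answer extracted from a later policy text is not stored twice.
second = library.add(
    "policy text B",
    "what is the enterprise qualification requirement?",
    "high and new technology enterprise",
)
```

Keying by answer follows the wording of the paragraph above literally; a deployment that needs to recommend several policies sharing the same answer would more plausibly key by policy identifier and answer together.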
Step S309, a policy question and answer list is constructed.
Illustratively, in the use stage of the BERT model, the cleaned enterprise corpus text obtained in step S301 and the policy questions obtained in step S308 are input. The model performs reading comprehension, predicts the position of the answer to each policy question, and outputs the enterprise answer corresponding to that policy question.
Illustratively, a policy question-and-answer list is constructed from the policy questions and the enterprise answers obtained by BERT-based reading comprehension in the use stage. Here, the answer list is the answer to a policy question obtained from the enterprise corpus text.
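The construction of the answer list can be sketched with the reader left pluggable. The example below does not run a BERT model; `pattern_reader` is a toy, rule-based stand-in that answers questions of the form "what is the X?" by locating "X is ..." in the text. Both function names and the `Reader` type are assumptions, and the sample enterprise text is invented for illustration.

```python
import re
from typing import Callable, Dict, List

# A reader maps (question, corpus text) to an extracted answer span.
Reader = Callable[[str, str], str]


def pattern_reader(question: str, text: str) -> str:
    """Toy stand-in for the reading-comprehension model (not BERT)."""
    match = re.fullmatch(r"what is the (.+)\?", question.strip(), flags=re.IGNORECASE)
    if not match:
        return ""
    topic = re.escape(match.group(1))
    # Take the span after "<topic> is" up to the next comma, period or semicolon.
    hit = re.search(topic + r" is (.+?)(?:,|\.|;|$)", text, flags=re.IGNORECASE)
    return hit.group(1).strip() if hit else ""


def build_answer_list(
    questions: List[str], corpus_text: str, reader: Reader = pattern_reader
) -> Dict[str, str]:
    """Map each policy question to the answer extracted from the corpus text."""
    return {question: reader(question, corpus_text) for question in questions}


enterprise_text = (
    "The enterprise qualification requirement is a high and new technology "
    "enterprise, and the enterprise technical innovation requirement is "
    "possession of a core patent."
)
answers = build_answer_list(
    [
        "what is the enterprise qualification requirement?",
        "what is the enterprise technical innovation requirement?",
    ],
    enterprise_text,
)
```

Because the reader is passed in as a callable, the same function can build the answer template from policy text in the training stage and the answer list from enterprise text in the use stage, with a span-extraction model substituted for the toy reader.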
Step S310, policy recommendation.
Illustratively, based on the policy question-and-answer template obtained in step S308 and the policy question-and-answer list obtained in step S309, the answer template and the answer list for each policy question are matched and their similarity is compared. The policy (i.e., policy corpus text) corresponding to the answer with the highest similarity is the optimal policy applicable to the enterprise, and the policy corpus texts corresponding to the top N answer templates by similarity are recommended to the enterprise in descending order of similarity. Here, N is a natural number that can be set as required.
Illustratively, for the answer template and the answer list of a policy question under one policy, a mapping relationship between the answer list and the answer template under the same policy question is constructed, the two are matched, and their similarity is calculated. The policy corresponding to the answer template with the highest similarity is the optimal policy, and the policy corpus text corresponding to that policy answer is recommended to the enterprise. The answer list may include one enterprise answer, and the answer template may include a plurality of policy answers.
In one embodiment, if a policy includes multiple policy questions, the policy corpus text provides multiple answer templates and the enterprise corpus text provides multiple answer lists, one for each policy question. For each policy question, a mapping relationship between the answer list and the answer template under the same policy question is constructed, and the similarity between them is calculated. The total similarity over all policy questions is then computed; the higher the total similarity, the better the policy corresponding to the answer templates matches the enterprise, and the corresponding policy corpus text is recommended to the enterprise.
In another embodiment, if there are multiple policies and each policy includes multiple policy questions, the policy corpus text of each policy provides multiple answer templates and the enterprise corpus text provides multiple answer lists. For each policy, a mapping relationship between the answer list and the answer template of each policy question under that policy is constructed, the similarity in each mapping relationship is calculated, and the total similarity of each policy is computed. The higher the total similarity, the better the policy matches the enterprise, and the policy corpus texts corresponding to the top N answer templates are recommended to the enterprise in descending order of total similarity.
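The multi-policy ranking described above can be sketched as follows. For each policy, the enterprise answer is compared with the policy answer under the same question, the per-question similarities are averaged, and the top N policies are returned in descending order. The data layout, the function names and the use of `difflib.SequenceMatcher` as the similarity function are assumptions; the matcher is a standard-library stand-in and is not one of the measures named in the patent.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def rank_policies(
    templates: Dict[str, Dict[str, str]],
    enterprise_answers: Dict[str, str],
    top_n: int,
    similarity: Callable[[str, str], float] = char_similarity,
) -> List[Tuple[str, float]]:
    """Return the top_n (policy id, mean similarity) pairs, best first.

    templates maps policy id -> {policy question: policy answer};
    enterprise_answers maps policy question -> enterprise answer.
    """
    scored = []
    for policy_id, template in templates.items():
        sims = [
            similarity(enterprise_answers.get(question, ""), policy_answer)
            for question, policy_answer in template.items()
        ]
        mean = sum(sims) / len(sims) if sims else 0.0
        scored.append((policy_id, mean))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


templates = {
    "policy_a": {
        "qualification": "high and new technology enterprise",
        "innovation": "possession of a core patent",
    },
    "policy_b": {
        "qualification": "small and micro enterprise",
        "innovation": "possession of a core patent",
    },
}
enterprise = {
    "qualification": "high and new technology enterprise",
    "innovation": "possession of a core patent",
}
ranking = rank_policies(templates, enterprise, top_n=2)
```

The sketch keeps one policy answer per question for brevity. Where an answer template holds several policy answers for one question, taking the maximum similarity over them would be one reasonable extension.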
Further, the similarity may be calculated with a vector-based measure such as Euclidean distance, cosine similarity, or Manhattan distance; a character-based algorithm such as SimHash; or a probability-statistics-based measure such as the Jaccard similarity coefficient.
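Several of the measures listed above can be written in a few lines. The sketch below assumes a simple bag-of-words representation over whitespace tokens, which is a choice made here for illustration; the patent does not state how answers are vectorized. SimHash is omitted to keep the example short.

```python
import math
from collections import Counter


def bag(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: str, b: str) -> float:
    va, vb = bag(a), bag(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def euclidean_distance(a: str, b: str) -> float:
    va, vb = bag(a), bag(b)
    return math.sqrt(sum((va[t] - vb[t]) ** 2 for t in set(va) | set(vb)))


def manhattan_distance(a: str, b: str) -> int:
    va, vb = bag(a), bag(b)
    return sum(abs(va[t] - vb[t]) for t in set(va) | set(vb))


def jaccard_coefficient(a: str, b: str) -> float:
    sa, sb = set(bag(a)), set(bag(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Note that Euclidean and Manhattan distance are distances, so smaller values mean a closer match; they need to be inverted or negated before being used in a ranking that sorts from largest to smallest similarity.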
In this embodiment of the invention, the following steps are performed: cleaning the original enterprise corpus text; cleaning the original policy corpus text; segmenting the policy corpus text; recognizing named entities and extracting entity results; evaluating the fine-grained entities to obtain final entity results; constructing a subsidy dictionary with the TruePIE method; expanding the subsidy dictionary with the PU algorithm; constructing a policy question-and-answer template; constructing a policy question-and-answer list; and recommending policies. These steps make it possible to provide the optimal policy for an enterprise according to the enterprise's situation. Policies are recommended accurately and in a timely manner, the method is applicable to different scenarios and fields, and the labor cost and resource consumption of manual processing are reduced.
Fig. 4 is a schematic diagram of main blocks of a policy information matching apparatus according to an embodiment of the present invention, and as shown in fig. 4, the policy information matching apparatus 400 of the present invention includes:
the preprocessing module 401 is configured to process the policy corpus text to obtain a policy answer.
Illustratively, the preprocessing module 401 processes the policy corpus text to extract a policy answer therefrom.
Further, the processing of the policy corpus text includes that the preprocessing module 401 cleans the policy corpus text, and performs word segmentation processing on the cleaned policy corpus text.
Further, the preprocessing module 401 performs named entity recognition on the segmented policy corpus text to extract an entity result, and constructs a mapping relationship between the entity result and a related word/phrase based on a context relationship of the entity result in the policy corpus text to obtain a policy answer.
The preprocessing module 401 is further configured to obtain a policy question according to the policy answer.
Illustratively, the preprocessing module 401 derives policy questions from the policy answers.
a reading understanding module 402, configured to obtain an enterprise answer according to the policy question and the enterprise corpus text.
Illustratively, the preprocessing module 401 cleans the enterprise corpus text and performs word segmentation on the cleaned enterprise corpus text. The reading understanding module 402 inputs the segmented enterprise corpus text and the policy question into the BERT model and outputs the enterprise answer corresponding to the policy question.
a recommending module 403, configured to calculate the similarity between the policy answer corresponding to the policy question and the enterprise answer, and to recommend to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers ranked in the top N by similarity.
Illustratively, based on the policy question and the policy answer obtained by the preprocessing module 401 and the enterprise answer obtained by the reading understanding module 402, the recommending module 403 matches the enterprise answer with the policy answer for the same policy question, calculates their similarity, and recommends to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers ranked in the top N by similarity. The higher the rank, the better the policy matches the enterprise. Here, N is a natural number.
In this embodiment of the invention, the preprocessing module, the reading understanding module, and the recommending module together make it possible to provide the optimal policy for an enterprise according to the enterprise's situation. Policies are recommended accurately and in a timely manner, the apparatus is applicable to different scenarios and fields, and the labor cost and resource consumption of manual processing are reduced.
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present invention, and as shown in fig. 5, the computer system 500 of the terminal device according to the embodiment of the present invention includes:
a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM503, various programs and data necessary for the operation of the system 500 are also stored. The CPU501, ROM502, and RAM503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read from it is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a preprocessing module, a reading understanding module, and a recommending module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the recommending module may also be described as a "module that sends policy information to a connected client".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform operations comprising: processing the policy corpus text to obtain a policy answer; obtaining a policy question according to the policy answer; obtaining an enterprise answer according to the policy question and the enterprise corpus text; calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers ranked in the top N by similarity.
According to the technical scheme of the embodiment of the invention, the optimal policy suitable for the enterprise can be given according to the enterprise condition, the policy is recommended accurately and timely, the method is suitable for different scenes and fields, and the labor cost and the resource consumption during manual processing are reduced.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A policy information matching method, comprising the following steps:
processing the policy corpus text to obtain a policy answer;
obtaining a policy question according to the policy answer;
obtaining enterprise answers according to the policy questions and the enterprise corpus texts;
calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers ranked in the top N by similarity, wherein N is a natural number.
2. The method of claim 1, wherein processing the policy corpus text comprises:
carrying out data cleaning on the policy corpus text, and carrying out word segmentation on the cleaned policy corpus text;
and carrying out named entity recognition on the policy corpus text after word segmentation so as to extract an entity result.
3. The method of claim 2, wherein the entity result is evaluated, and the entity result is iteratively updated according to the evaluation result until an iteration stop condition is satisfied to obtain a final entity result.
4. The method of claim 2 or 3, wherein processing the policy corpus text to obtain a policy answer comprises:
and constructing a mapping relation between the entity result and words/phrases related to the entity result in the policy corpus text based on the context relation of the entity result to obtain the policy answer.
5. The method of claim 1, wherein obtaining enterprise answers based on the policy questions and the enterprise corpus text comprises:
carrying out data cleaning on the enterprise corpus text, and carrying out word segmentation on the cleaned enterprise corpus text;
and obtaining enterprise answers according to the policy questions and the enterprise corpus texts after word segmentation.
6. The method of claim 1, wherein the method further comprises:
when the policy corpus text corresponds to a plurality of policy answers, obtaining a policy question according to each policy answer;
and determining, according to the total similarity between the plurality of policy answers and the plurality of enterprise answers, the policy corpus texts corresponding to the policy answers ranked in the top N, and recommending them to the enterprise in descending order of total similarity.
7. The method of claim 6, wherein determining the total similarity between the plurality of policy answers and the plurality of enterprise answers comprises:
for each policy question, calculating the similarity between the enterprise answer and the policy answer under the same policy question, taking the average of the similarities over all policy questions, and recommending to the enterprise, in descending order of the average, the policy corpus texts corresponding to the policies ranked in the top N.
8. The method of claim 1, wherein the method further comprises:
and constructing a policy subsidy dictionary based on the policy answer, wherein the policy subsidy dictionary comprises the corresponding relation between the policy corpus text and the policy answer.
9. The method of claim 8, wherein the method further comprises:
counting the policy questions of the policy answers in the policy subsidy dictionary;
correspondingly storing the policy corpus text, a policy answer obtained based on the policy corpus text and a policy question corresponding to the policy answer;
obtaining the enterprise answers according to the policy questions and the enterprise corpus texts;
and determining, according to the total similarity between the plurality of policy answers and the plurality of enterprise answers, the policy corpus texts corresponding to the policy answers ranked in the top N, in descending order of total similarity.
10. A policy information matching apparatus, comprising:
the preprocessing module is used for processing the policy corpus text to obtain a policy answer;
the preprocessing module is also used for obtaining a policy question according to the policy answer;
the reading understanding module is used for obtaining enterprise answers according to the policy questions and the enterprise corpus texts;
and the recommending module is used for calculating the similarity between the policy answer corresponding to the policy question and the enterprise answer, and recommending to the enterprise, in descending order of similarity, the policy corpus texts corresponding to the policy answers ranked in the top N by similarity, wherein N is a natural number.
11. A policy information matching electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN202011510821.1A 2020-12-18 2020-12-18 Policy information matching method and device Pending CN114036921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011510821.1A CN114036921A (en) 2020-12-18 2020-12-18 Policy information matching method and device

Publications (1)

Publication Number Publication Date
CN114036921A true CN114036921A (en) 2022-02-11

Family

ID=80134128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011510821.1A Pending CN114036921A (en) 2020-12-18 2020-12-18 Policy information matching method and device

Country Status (1)

Country Link
CN (1) CN114036921A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170045A (en) * 2022-02-16 2022-10-11 江苏省联合征信有限公司 Intelligent analysis system and method for enterprise-benefiting policy
CN115170045B (en) * 2022-02-16 2024-02-27 江苏省联合征信有限公司 Intelligent analysis system and method for benefit-enterprise policy
CN115470871A (en) * 2022-11-02 2022-12-13 江苏鸿程大数据技术与应用研究院有限公司 Policy matching method and system based on named entity recognition and relation extraction model
CN115470871B (en) * 2022-11-02 2023-02-17 江苏鸿程大数据技术与应用研究院有限公司 Policy matching method and system based on named entity recognition and relation extraction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination