CN115098596B - Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium

Info

Publication number
CN115098596B
Authority
CN
China
Prior art keywords
government
result
matters
clustering
carding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210577037.5A
Other languages
Chinese (zh)
Other versions
CN115098596A (en
Inventor
肖国泉
肖克
梁全锐
关海峰
邵罗树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cape Digital Technology Guangdong Co ltd
Original Assignee
Cape Digital Technology Guangdong Co ltd
Application filed by Cape Digital Technology Guangdong Co ltd
Priority to CN202210577037.5A
Publication of CN115098596A
Application granted
Publication of CN115098596B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a government affair related data carding method, device, equipment and readable storage medium. The method comprises the following steps: acquiring government affair data; based on preset extraction keywords, extracting content related to government matters and content related to application materials from the government affair data to obtain an extraction result; if the extraction result exists in a preset item catalog, performing relevance clustering on the government matters and application materials that have a corresponding relation to obtain a clustering result; and performing redundancy elimination on the repeated results in the clustering result to obtain a target carding result. The application replaces inefficient manual carding with artificial intelligence, integrating and classifying government matters and application materials by relevance so that they are carded efficiently.

Description

Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium
Technical Field
The present disclosure relates to the field of government affair information, and in particular, to a government affair related data carding method, apparatus, device and readable storage medium.
Background
A government service hall handles a very large number of government matters, and the content and number of application materials to be submitted differ from matter to matter. To keep the matters standardized and to help the public prepare the relevant application materials before handling a matter, staff who are experienced in handling government matters usually card the matters, together with the application materials each requires, by type of government matter.
However, the number of government matters and application materials to be carded is large, so the work is hard to complete by manual effort alone, and carding errors occur in the process. Staff therefore have to check their own work continuously while carding, which makes carding government matters and application materials inefficient.
Disclosure of Invention
The main purpose of the application is to provide a government affair related data carding method, device, equipment and readable storage medium, so as to solve the technical problem of how to improve the efficiency of carding government matters and application materials.
In order to achieve the above object, the present application provides a government affair related data carding method, which includes the following steps:
acquiring government affair data;
based on preset extraction keywords, extracting relevant contents of government matters and relevant contents of application materials from the government matters data to obtain an extraction result;
if the extraction result exists in the preset item catalog, carrying out relevance clustering on the government item and the application material with the corresponding relation, and obtaining a clustering result;
and performing redundancy elimination treatment on the repeated results in the clustering results to obtain target carding results.
Illustratively, performing redundancy elimination on the repeated results in the clustering result to obtain a target carding result includes:
performing redundancy elimination on the repeated results in the clustering result to obtain a preliminary carding result;
converting the preliminary carding result into a visual logic diagram, and outputting the visual logic diagram to a display unit so that relevant staff can calibrate the preliminary carding result and adjust the preset extraction keywords accordingly;
and after the relevant staff adjust the preset extraction keywords, re-carding the government affair data to obtain the target carding result.
Illustratively, after re-carding the government affair data to obtain the target carding result, the method further includes:
outputting the target carding result to a corresponding platform so that the platform can present it externally, wherein the government matters are displayed in the order of the preset item catalog and the application materials are displayed by means of a cascade query; the cascade query is a multi-level logical query built from the carding result, and the application materials that a person handling a matter needs to prepare are determined through the cascade query.
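The cascade query can be pictured as a small decision tree. The following is a minimal sketch under stated assumptions: the `CASCADE` structure, the questions and the materials listed are all hypothetical illustrations, not content from the application.

```python
# Hypothetical cascade: each level asks one question; leaves list the materials.
CASCADE = {
    "question": "Which matter is being handled?",
    "options": {
        "marriage registration": {
            "question": "Is either applicant a non-resident?",
            "options": {
                "no": ["identity card", "household registration booklet"],
                "yes": ["passport", "certificate of marital status"],
            },
        },
        "tax return": ["identity card", "bank card"],
    },
}


def cascade_query(node, answers):
    """Walk the cascade with the given answers and return the required materials."""
    for answer in answers:
        if isinstance(node, list):
            break  # a leaf was already reached; ignore surplus answers
        node = node["options"][answer]
    if not isinstance(node, list):
        raise ValueError("more answers needed: " + node["question"])
    return node


marriage_materials = cascade_query(CASCADE, ["marriage registration", "no"])
tax_materials = cascade_query(CASCADE, ["tax return"])
print(marriage_materials)
print(tax_materials)
```

Each answer narrows the query by one level, so a person only sees the materials that apply to their own situation.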
Illustratively, the extraction result includes a matter extraction result and a material extraction result, and extracting, based on preset extraction keywords, content related to government matters and content related to application materials from the government affair data to obtain an extraction result includes:
based on the preset extraction keywords, extracting government matters and the application materials required for handling them from the government affair data;
extracting key content of the government matters, and extracting the relevance among the government matters, to obtain the matter extraction result;
and extracting key content of the application materials, and extracting the corresponding relations between the application materials and the government matters, to obtain the material extraction result.
Illustratively, the clustering result includes a first clustering result and a second clustering result, and performing relevance clustering on the government matters and application materials that have a corresponding relation, if the extraction result exists in the preset item catalog, to obtain a clustering result includes:
if the extraction result exists in the preset item catalog, performing relevance clustering on the extraction result;
clustering the application materials with their corresponding government matters based on the material extraction result to obtain the first clustering result;
and clustering the government matters that are associated with one another based on the matter extraction result to obtain the second clustering result.
Illustratively, before the acquiring the government affair data, the method further includes:
character recognition is carried out on the original data related to government affairs, and a recognition result is obtained;
and storing the identification result in a preset data storage unit.
Illustratively, performing redundancy elimination on the repeated results in the clustering result includes:
extracting, from the clustering result, keywords for judging similarity, the keywords being words whose content is shared between the government matters and the application materials;
and if the keywords exist in the clustering result, determining the repeated results in the clustering result.
Illustratively, to achieve the above object, the present application further provides a government affair related data carding device, which includes:
an acquisition module, configured to acquire government affair data;
an extraction module, configured to extract, based on preset extraction keywords, content related to government matters and content related to application materials from the government affair data to obtain an extraction result;
a clustering module, configured to, if the extraction result exists in the preset item catalog, perform relevance clustering on the government matters and application materials that have a corresponding relation to obtain a clustering result;
and a removal module, configured to perform redundancy elimination on the repeated results in the clustering result to obtain a target carding result.
Illustratively, to achieve the above object, the present application further provides government affair related data carding equipment, including: a memory, a processor, and a government affair related data carding program stored in the memory and executable on the processor, wherein the program is configured to implement the steps of the government affair related data carding method described above.
Illustratively, to achieve the above object, the present application further provides a computer-readable storage medium on which a government affair related data carding program is stored; when executed by a processor, the program implements the steps of the government affair related data carding method described above.
In the prior art, staff who are proficient in handling government matters card the matters and the application materials they require, classifying and matching them by hand, and carding a huge number of government matters and application materials by manual effort alone is inefficient. By contrast, the present application extracts from the original large volume of government affair data: it extracts the main content of the government matters and application materials by means of preset extraction keywords, checks the extraction result, and, if the government matters in the extraction result exist in the preset item catalog, integrates and clusters them to obtain a clustering result that reflects the relevance between government matters and application materials; it then performs redundancy elimination on the repeated results in the clustering result to obtain a target carding result. In other words, the relevant content of the government matters and application materials is extracted, clustered according to the relevance between them, and stripped of repeated results after clustering, yielding a target carding result in which the data are integrated and classified. This avoids the repeated checking of whether the carding is correct that manual carding requires, and therefore improves the efficiency of carding government matters and application materials.
Drawings
FIG. 1 is a schematic flow chart of a first embodiment of a government affair related data carding method of the present application;
FIG. 2 is a schematic flow chart of a second embodiment of a government affair related data carding method of the present application;
FIG. 3 is a schematic flow chart of a third embodiment of a government affair related data carding method according to the present application;
FIG. 4 is a schematic structural diagram of a hardware running environment according to an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The application provides a government affair related data carding method. Referring to FIG. 1, FIG. 1 is a schematic flow chart of a first embodiment of the government affair related data carding method.
The embodiments of the present application provide embodiments of the government affair related data carding method. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. For convenience of description, the executing subject is omitted from the description of each step below. The government affair related data carding method includes:
Step S110: acquiring government affair data;
The government affair data includes content related to government matters, such as the name of a government matter, the procedure for handling it, or the application materials required when handling it.
Illustratively, before obtaining the government affair data, the method further comprises:
step a: character recognition is carried out on the original data related to government affairs, and a recognition result is obtained;
The original data related to government affairs is generally unstructured. It usually consists of government matter policies, handling guides and similar files, and mainly covers the policies for government matters and the application materials required for handling them. The files come in various types, including paper documents, electronic documents and pictures.
Character recognition converts the unstructured original data related to government affairs into structured data, which facilitates subsequent use.
Illustratively, the recognition result is the result obtained after character recognition, including the text content of documents or files recognized from the original data related to government affairs.
Illustratively, the original data is recognized by OCR (Optical Character Recognition) technology: the corresponding characters are recognized from paper documents, electronic documents or pictures, and the recognized content is processed to obtain the recognition result.
When characters are recognized using OCR technology, the original data related to government affairs is captured and recognized to obtain all of its content. After recognition is complete, the recognized characters are checked, split and typeset, so that the recognized original data is arranged into a recognition result with a definite order and regularity.
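One way to picture the arranging step that follows recognition is to group recognized fragments into lines in reading order. The sketch below is only an illustration: it assumes a hypothetical OCR output of `(text, x, y)` fragments and a hypothetical `line_tolerance` parameter, neither of which is specified in the application.

```python
def arrange(fragments, line_tolerance=5):
    """Group (text, x, y) fragments into lines in reading order.

    A fragment joins the current line when its y coordinate differs from
    the first fragment of that line by at most `line_tolerance`.
    """
    lines = []
    # Sort top to bottom, then left to right.
    for text, x, y in sorted(fragments, key=lambda f: (f[2], f[1])):
        if lines and abs(y - lines[-1]["y"]) <= line_tolerance:
            lines[-1]["items"].append((x, text))
        else:
            lines.append({"y": y, "items": [(x, text)]})
    # Within each line, order fragments by x and join them.
    return [" ".join(t for _, t in sorted(line["items"])) for line in lines]


# Hypothetical fragments, deliberately out of order.
fragments = [
    ("registration", 60, 11),
    ("Marriage", 10, 10),
    ("Materials:", 10, 40),
    ("identity card", 70, 42),
]
arranged = arrange(fragments)
print(arranged)
```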
Step b: and storing the identification result in a preset data storage unit.
The recognition result is unstructured data recognized from the original data related to government matters. Unstructured data mainly falls into two categories: text, and images or pictures. Unstructured data differs from structured data in three essential respects: its volume is larger, it is generated faster, and its sources are more diverse.
Illustratively, the preset data storage unit includes a database, a data pool or the like for storing the relevant data; a database is used as an example below.
A preset relational database is established according to the corresponding relations between government matters and the application materials required for handling them. The preset relational database is suited to storing structured data and allows its internal data to be called flexibly. The data in it are arranged according to a preset logical structure or relational model into logical groups of related information organized in rows and columns. The relational database stores data in different tables rather than putting all the data in one large repository, which increases speed and flexibility.
The government affair data is unstructured data stored in the preset relational database, and includes the policies related to government matters, the application materials required when handling them, and related content such as the place and procedure for handling them.
For example, the preset relational database is MySQL, an open-source relational database management system (RDBMS). A relational database uses SQL (Structured Query Language), a database language that provides data manipulation, data definition and other functions and can be used interactively; the database management system should make full use of SQL to improve the working quality and efficiency of the computer application system.
The unstructured data in the preset relational database are stored in different tables rather than all in one large repository, which increases the flexibility and speed of calling internal data.
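As a minimal sketch of keeping matters, materials and their correspondence in separate tables, the following uses Python's built-in `sqlite3` module as a stand-in for the MySQL database named above. The table names, column names and sample rows are hypothetical.

```python
import sqlite3


def build_store(conn):
    """Create separate tables for matters, materials and their links (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE matter   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE material (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE matter_material (
            matter_id   INTEGER NOT NULL REFERENCES matter(id),
            material_id INTEGER NOT NULL REFERENCES material(id),
            PRIMARY KEY (matter_id, material_id)
        );
        """
    )


def link(conn, matter, material):
    """Record that handling `matter` requires `material`; repeats are ignored."""
    conn.execute("INSERT OR IGNORE INTO matter(name) VALUES (?)", (matter,))
    conn.execute("INSERT OR IGNORE INTO material(name) VALUES (?)", (material,))
    conn.execute(
        """
        INSERT OR IGNORE INTO matter_material
        SELECT m.id, a.id FROM matter m, material a
        WHERE m.name = ? AND a.name = ?
        """,
        (matter, material),
    )


def materials_for(conn, matter):
    """Return the material names required for one matter, sorted by name."""
    rows = conn.execute(
        """
        SELECT a.name FROM material a
        JOIN matter_material mm ON mm.material_id = a.id
        JOIN matter m ON m.id = mm.matter_id
        WHERE m.name = ?
        ORDER BY a.name
        """,
        (matter,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
build_store(conn)
link(conn, "marriage registration", "identity card")
link(conn, "marriage registration", "household registration booklet")
link(conn, "marriage registration", "identity card")  # duplicate link is ignored
print(materials_for(conn, "marriage registration"))
```

The link table holds only identifier pairs, so a material shared by many matters is stored once and referenced from each of them.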
The government matters and application materials, which were originally carded by relevant staff, are instead carded by artificial intelligence: the government affair related data is input into an intelligent carding model, which extracts, classifies and cards it.
The intelligent carding model classifies the government matters by the government affair category to which they belong and matches the government matters with their application materials, thereby replacing the carding work of the relevant staff.
Step S120: based on preset extraction keywords, extracting relevant contents of government matters and relevant contents of application materials from the government matters data to obtain an extraction result;
The government affair data is unstructured data in the preset relational database. It contains a large amount of disorganized information, and extracting the relevant important content from it amounts to converting the unstructured government affair data into structured data.
Key points in the government affair data are extracted according to the preset extraction keywords. Before extraction, the keyword lexicon of the preset extraction keywords is adjusted according to the keywords of the relevant important content; the keywords include words related to government matters and words related to application materials.
When a keyword relates to a government matter, it may be the specific name of the matter or the place or time of handling. Examples of specific names are the housing provident fund, social security and medical insurance, and taxation; examples of handling place and time are the street administrative service hall and acceptance on workdays.
When a keyword relates to application materials, it names the specific material to be prepared, for example an identity card, a household registration booklet or an academic certificate.
The extraction result is the structured data obtained after the content of the government affair data has been extracted; it comprises the government matters and the application materials required for handling them.
Illustratively, the keyword lexicon of the preset extraction keywords contains keywords for government matters and keywords for the application materials required for handling them. If data related to a keyword exists in the government affair data, the data is extracted and classified as a government matter or as application material.
When keywords related to government matters are extracted, the data containing those keywords in the government affair data is extracted and classified under government matters.
When keywords related to application materials are extracted, the data containing those keywords is extracted and classified under application materials.
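The keyword-driven extraction and classification described above can be sketched as follows. This is an illustration only: the two lexicons, the sentence-splitting rule and the sample text are assumptions, and a real lexicon would be maintained and adjusted by staff.

```python
import re

# Hypothetical keyword lexicons.
MATTER_KEYWORDS = {"marriage registration", "tax return", "medical insurance"}
MATERIAL_KEYWORDS = {"identity card", "household registration booklet", "bank card"}


def extract(text):
    """Split text into sentences and classify each by the keywords it contains."""
    result = {"matters": [], "materials": []}
    for sentence in re.split(r"[.;。；]\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        lowered = sentence.lower()
        matters = sorted(k for k in MATTER_KEYWORDS if k in lowered)
        materials = sorted(k for k in MATERIAL_KEYWORDS if k in lowered)
        if matters:
            result["matters"].append({"sentence": sentence, "keywords": matters})
        if materials:
            result["materials"].append({"sentence": sentence, "keywords": materials})
    return result


sample = (
    "Marriage registration is handled at the street administrative service hall. "
    "Applicants must bring an identity card and a household registration booklet. "
    "Opening hours are on workdays."
)
extraction = extract(sample)
print(extraction)
```

Sentences that contain no keyword, such as the one about opening hours, are left out of the extraction result.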
Illustratively, the key points in the government affair data are extracted by using a knowledge extraction tool of Deep love (information extraction system), and the extracted data are simply classified according to different types of keywords to form the key point data of the government affair or the key point data of the application material.
Step S130: if the extraction result exists in the preset item catalog, carrying out relevance clustering on the government item and the application material with the corresponding relation, and obtaining a clustering result;
The preset item catalog is the catalog of items to be carded, set on the basis of the 36 elements of the implementation list specified in State Council General Office document Guo Ban Han [2016] No. 108. The preset item catalog includes item names, item codes, item types and the like. The extraction result contains a number of government matters and data related to them. The data in the extraction result is classified against the preset item catalog, and any extraction result that does not exist in the catalog is removed; that is, only matter data present in the preset item catalog is carded.
Illustratively, the government matters in the extraction result are compared against the preset item catalog to judge whether they belong to it. If they do, relevance clustering is performed on the extraction result, and government matters and application materials that are related to each other are clustered together to obtain the clustering result. Illustratively, relevance clustering takes into account the relevance between one government matter and another, as well as the relevance between government matters and application materials in both directions.
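The catalog check can be sketched as a simple membership filter. The catalog entries, item codes and record layout below are hypothetical and serve only to illustrate the comparison.

```python
# Hypothetical catalog entries: item name -> item code.
PRESET_CATALOG = {
    "marriage registration": "MZ-001",
    "tax return": "SW-014",
}


def filter_by_catalog(extracted, catalog):
    """Split extraction records into those whose matter is in the catalog and the rest."""
    kept, dropped = [], []
    for record in extracted:
        (kept if record["matter"] in catalog else dropped).append(record)
    return kept, dropped


extracted = [
    {"matter": "marriage registration", "materials": ["identity card"]},
    {"matter": "library card renewal", "materials": ["photo"]},
    {"matter": "tax return", "materials": ["bank card"]},
]
kept, dropped = filter_by_catalog(extracted, PRESET_CATALOG)
print([r["matter"] for r in kept])
print([r["matter"] for r in dropped])
```

Only the records in `kept` would go on to relevance clustering.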
Illustratively, relevance between government matters takes two forms. First, there may be an order of precedence among matters, for example marriage registration and divorce registration: divorce registration can be handled only after marriage registration has been handled. Second, handling several matters may involve overlapping parts, for example: when an applicant handles social security and medical insurance, several matters such as endowment insurance and medical insurance must each be handled.
When the relevance between government matters and application materials is clustered, clustering follows each government matter and the application materials it requires. The number of application materials varies from matter to matter: some matters require as few as three to five materials, others more than thirty. For example, handling a tax return requires the basic information of the persons concerned, their bank card information, and so on.
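A minimal sketch of the two kinds of clustering, assuming the extraction step yields simple pairs: `(matter, material)` pairs for the first clustering result and `(matter, related matter)` pairs for the second. The pair format and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def cluster(material_links, matter_links):
    """Build the two clustering results from extracted relations.

    material_links: (matter, material) pairs -> first clustering result
    matter_links:   (matter, related matter) pairs -> second clustering result
    """
    first = defaultdict(set)
    for matter, material in material_links:
        first[matter].add(material)
    second = defaultdict(set)
    for a, b in matter_links:
        second[a].add(b)
        second[b].add(a)  # the association is recorded from both sides
    return dict(first), dict(second)


first, second = cluster(
    [
        ("marriage registration", "identity card"),
        ("marriage registration", "household registration booklet"),
    ],
    [("marriage registration", "divorce registration")],
)
print(first)
print(second)
```

Recording each association from both sides means the same relation appears twice in the second result, which is the kind of repetition the redundancy elimination step has to deal with.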
Step S140: and performing redundancy elimination treatment on the repeated results in the clustering results to obtain target carding results.
When the data are analyzed and clustered, relevance clustering is performed for every government matter and every application material. Clustering that starts from a government matter gathers its related government matters and application materials, and clustering that starts from an application material gathers its related government matters, so the clustering result contains repeated parts and is redundant.
Illustratively, redundant repeated results arise in two cases: first, government matters are related to one another, and clustering them produces repeats; second, clustering government matters with their corresponding application materials produces repeats.
In the first case, a first government matter and a second government matter are related. When the first matter is detected it is clustered with the second, and when the second matter is detected it is clustered with the first, so the clustering result is repeated.
In the second case, the government matter is clustered with its application materials and the application materials are clustered with the government matter, so the clustering result is repeated.
After the government affair data has been extracted and clustered, redundancy elimination is performed on the repeated results in the clustering result. On the one hand, repeated results are merged: the repeated clustering results are integrated into one and their important content is retained. On the other hand, redundant items in the repeated results are removed directly. The carding result is thereby obtained.
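The merge-and-remove idea can be sketched as follows. The cluster layout (a set of matters plus a set of materials) and the rule that clusters with equal matter sets are duplicates are assumptions chosen for illustration; the application does not fix a data structure.

```python
def remove_redundancy(clusters):
    """Merge clusters that cover the same matters, keeping first-seen order.

    Each cluster is a dict with a set of "matters" and a set of "materials".
    Clusters with equal matter sets are treated as repeats; their material
    sets are merged so that no content is lost.
    """
    merged = {}
    for c in clusters:
        key = frozenset(c["matters"])
        if key in merged:
            merged[key]["materials"] |= set(c["materials"])
        else:
            merged[key] = {
                "matters": set(c["matters"]),
                "materials": set(c["materials"]),
            }
    return list(merged.values())


clusters = [
    {"matters": {"marriage registration", "divorce registration"}, "materials": {"identity card"}},
    {"matters": {"divorce registration", "marriage registration"}, "materials": {"marriage certificate"}},
    {"matters": {"tax return"}, "materials": {"bank card"}},
]
deduplicated = remove_redundancy(clusters)
print(deduplicated)
```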
Illustratively, performing redundancy elimination on the repeated results in the clustering result includes:
Step c: extracting, from the clustering result, keywords for judging similarity; the keywords are words whose content is shared between the government matters and the application materials;
When redundancy elimination is performed on the clustering result, the frequency with which the keywords appear in the clustering result is counted to judge whether the clustering result is redundant. The importance of a keyword increases in proportion to the number of times it appears in a file.
The clustering result contains government matters and their corresponding application materials, so the extracted keywords must cover both; this prevents only one of the two from being detected while the other is ignored.
For example, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used, which is a statistical method for evaluating how important a word is to one document in a document set or corpus. The similarity of the clustering results is calculated to judge whether the government matters and their application materials are redundant, and the fusion of the matter knowledge base is completed.
The keywords are extracted, for example, to determine the similarity between the clustering results, where the keywords are both in the government matters and in the application materials required for handling the government matters. The names of the government matters exist in the government matters and the application materials.
Step d: if the keywords exist in a plurality of clustering results, determining the repeated results existing in the clustering results.
Keyword detection is performed on each clustering result, the clustering results in which the keywords occur are recorded, and the number of occurrences of the keywords in each clustering result is counted. If the number of occurrences in a clustering result exceeds a preset quantity threshold, the clustering result is deemed to be centered on the keywords and is judged to be a repeated result.
The preset quantity threshold is defined according to the actual situation; for example, it is set to 10 or 15. The occurrences of the keywords are counted, and when the count is greater than the preset quantity threshold, the clustering result containing the keywords is determined to be a repeated result.
By way of example, a TF-IDF algorithm is used to judge the government matters and application materials in which the keywords occur: the government matters and the application materials in the clustering result are judged separately, the number of occurrences of the keywords in each is counted, and whether a repeated result exists in the clustering result is judged according to that number.
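The counting and thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions: each clustering result is flattened to a text string, tokens are split on whitespace, and the smoothed IDF shown is one common variant rather than a formula given in the application. The names `tf_idf` and `flag_repeated` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one document, using a smoothed IDF (an assumption)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf


def flag_repeated(cluster_texts, keyword, threshold):
    """Return indices of clustering results where the keyword count exceeds the threshold."""
    return [i for i, text in enumerate(cluster_texts) if text.count(keyword) > threshold]


cluster_texts = [
    "marriage registration requires identity card; "
    "marriage registration requires household register; marriage certificate",
    "tax refund requires tax payment certificate",
]
corpus = [text.split() for text in cluster_texts]

# A toy threshold of 2 stands in for the preset quantity threshold (e.g. 10 or 15).
flagged = flag_repeated(cluster_texts, "marriage", threshold=2)
score_marriage = tf_idf("marriage", corpus[0], corpus)
score_requires = tf_idf("requires", corpus[0], corpus)
```

In this toy corpus the word "marriage" scores higher than "requires" because it is frequent in the first result but absent from the second, which is the property the keyword judgment relies on.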
In the prior art, staff proficient in handling government matters must manually sort the government matters and the application materials required to handle them, classifying and carding each correspondingly. Carding a huge volume of government matters and application materials by manpower alone is inefficient. In contrast, the present application extracts from the original, voluminous government matter data: the main content of the government matters and application materials is extracted by means of preset extraction keywords, and the extraction result is then judged. If the government matters in the extraction result exist in a preset item catalog, they are integrated and clustered to obtain a clustering result reflecting the relevance between government matters and application materials, and redundancy elimination is performed on the repeated results in the clustering result to obtain a target carding result. In other words, the relevant content of the government matters and application materials is extracted, clustering is performed according to the relevance between them, and repeated results are removed after clustering, yielding a target carding result in which the data is integrated and classified. This avoids the repeated checking for correctness that manual carding requires, and thus improves the efficiency of carding government matters and application materials.
Referring to fig. 2, fig. 2 is a schematic flow chart of a second embodiment of a government affair related data carding method according to the present application, and based on the first embodiment of the government affair related data carding method according to the present application, the second embodiment is provided, where the method further includes:
Step S210: based on preset extraction keywords, extracting government matters and the application materials required for handling the government matters from the government matter data;
The government matter data includes the content of the government matters and of the application materials required for handling them. Correspondences also exist within the data: the number of materials required differs from one government matter to another, and the same application material can be used for different government matters. The extraction process converts the originally complex unstructured data into structured data, which facilitates managing the relationship between government matters and application materials.
Illustratively, relevant content, including government matters and application materials, is extracted from the government matter data.
Relevant government matters are extracted from the government matter data, for example: property transfer, household registration (hukou) transfer, etc.
The application materials required for handling the relevant government matters are extracted from the government matter data, for example: a real estate certificate, the contact information of the applicant, an identity card, etc.
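As an illustration of step S210, keyword-based extraction from unstructured text can be sketched as follows. The keyword lists, the dictionary layout and the function name `extract` are assumptions made for this example; the application does not specify how the preset extraction keywords are stored or matched.

```python
import re

# Hypothetical preset extraction keywords, grouped by what they identify.
PRESET_KEYWORDS = {
    "matter": ["property transfer", "household registration transfer"],
    "material": ["real estate certificate", "identity card", "contact information"],
}


def extract(text, preset):
    """Return the preset keywords of each kind that occur in the text."""
    found = {}
    for kind, keywords in preset.items():
        found[kind] = [
            keyword
            for keyword in keywords
            if re.search(re.escape(keyword), text, re.IGNORECASE)
        ]
    return found


raw = "Property transfer requires a real estate certificate and an identity card."
structured = extract(raw, PRESET_KEYWORDS)
```

The returned dictionary is the structured form of the sentence: one list of government matters and one list of application materials.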
Step S220: extracting key contents of the government matters, and extracting relevance among the government matters to obtain a matter extraction result;
A certain relevance exists between government matters, for example a sequential relationship or an overlapping relationship.
Illustratively, the key sentences and vocabulary specific to a first government matter and those specific to a second government matter are extracted from the government matters and compared; if relevance exists, the relevance between the government matters is extracted to obtain a matter extraction result.
If the compared key sentences and vocabulary are highly similar, it is determined that the first government matter and the second government matter are related, and the relevance between them is extracted to obtain the matter extraction result.
If the compared key sentences and vocabulary have low similarity, it is determined that the first government matter and the second government matter are not related, and no relevance between them is extracted.
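The comparison of key vocabulary can be sketched as follows. The application does not name a similarity measure, so Jaccard similarity over sets of key terms is used here purely as an assumption, and the threshold of 0.3, the function names and the term sets are hypothetical.

```python
def jaccard(first_terms, second_terms):
    """Jaccard similarity of two collections of key terms."""
    a, b = set(first_terms), set(second_terms)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_related(first_terms, second_terms, threshold=0.3):
    """Treat two government matters as related when their key terms overlap enough."""
    return jaccard(first_terms, second_terms) >= threshold


marriage = {"marriage", "registration", "civil affairs bureau", "identity card"}
divorce = {"marriage", "divorce", "registration", "civil affairs bureau"}
tax = {"tax", "declaration", "tax bureau"}

similarity = jaccard(marriage, divorce)  # 3 shared terms out of 5 distinct terms
```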
Step S230: extracting key contents of the application materials, and extracting corresponding relations between the application materials and the government matters to obtain material extraction results;
A correspondence exists between application materials and government matters: handling one government matter requires several application materials, and the same application material can be used for different government matters, so the relationship between the two is complex. The correspondence between the government matters and the application materials, together with the second keywords of the application materials, is extracted to obtain a material extraction result.
Illustratively, the personal information of the applicant is common to most government matters; for example, basic resident documents such as the identity card and the household register are required for most government matters.
Step S240: if the extraction result exists in the preset item catalog, carrying out relevance clustering on the extraction result;
If the extraction result exists in the preset item catalog, the extraction result is confirmed to be government matters and application materials that need to be carded.
Illustratively, the extraction result includes different kinds of government matters, some of which have been replaced or abolished and some of which are currently being handled.
If the extraction result is a currently valid government matter that needs to be carded, it exists in the preset item catalog; relevance clustering is therefore performed on it, clustering government matters that are related to one another and government matters with their corresponding application materials.
If the extraction result is a government matter that has been replaced or abolished, it is determined not to exist in the preset item catalog, and no relevance clustering is performed on it.
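The catalog check of step S240 amounts to a membership test, sketched below. The record layout (a dictionary with a `matter` key), the function name `filter_by_catalog` and the sample entries are assumptions made for illustration.

```python
def filter_by_catalog(extraction_results, catalog):
    """Split extraction results into those in the preset item catalog and those not."""
    kept, skipped = [], []
    for result in extraction_results:
        (kept if result["matter"] in catalog else skipped).append(result)
    return kept, skipped


catalog = {"marriage registration", "tax refund"}
extraction_results = [
    {"matter": "marriage registration", "materials": ["identity card"]},
    {"matter": "abolished permit", "materials": []},  # replaced or abolished matter
]
kept, skipped = filter_by_catalog(extraction_results, catalog)
```

Only the entries in `kept` would go on to relevance clustering; those in `skipped` are left out.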
Step S250: clustering the application materials and the corresponding government matters based on the material extraction result to obtain a first clustering result;
The first clustering result is the clustering result of application materials and their corresponding government matters. The number of application materials required differs from one government matter to another; for example, one government matter requires 5 application materials in total while another requires 15. The same application material may also correspond to several different government matters; for example, the identity card of the applicant is used in most government matters.
For example, clustering is performed from two starting points, the government matter and the application material: each government matter is clustered with the application materials required to handle it, and each application material is clustered with the government matters in which it is used, so as to obtain the first clustering result.
When a government matter is clustered with the application materials required to handle it, for example: handling marriage registration requires materials such as an identity card and a household register, that is, these materials correspond to the marriage registration matter, so the marriage registration matter is clustered with the identity card, household register and other materials.
When an application material is clustered with the government matters in which it is used, for example: tax payment certificates are used for different government matters such as tax declaration and tax refund, so the tax payment certificate is clustered with these different government matters to obtain the first clustering result.
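The two-directional clustering described for step S250 can be sketched as a pair of indexes built from (government matter, application material) pairs. The pair representation and the function name `cluster_materials` are assumptions for this example only.

```python
from collections import defaultdict


def cluster_materials(material_extraction):
    """Build both directions of the matter/material correspondence."""
    by_matter = defaultdict(set)    # government matter -> required materials
    by_material = defaultdict(set)  # application material -> matters using it
    for matter, material in material_extraction:
        by_matter[matter].add(material)
        by_material[material].add(matter)
    return by_matter, by_material


material_extraction = [
    ("marriage registration", "identity card"),
    ("marriage registration", "household register"),
    ("tax declaration", "tax payment certificate"),
    ("tax refund", "tax payment certificate"),
]
by_matter, by_material = cluster_materials(material_extraction)
```

Because both indexes are derived from the same pairs, every association appears once in each direction, which is the source of the repeated results that the later redundancy elimination removes.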
Step S260: clustering the related government matters based on the matter extraction result to obtain a second clustering result;
Some government matters are related to one another. For example, government matters that have a sequential relationship, or whose handling overlaps, are clustered to obtain the second clustering result.
For example, marriage registration and divorce registration both fall within the scope of marital registration, and both must be handled at the civil affairs bureau; an association therefore exists between them, and they are clustered. They also have a sequential relationship: before handling divorce registration, the persons concerned must already have handled marriage registration. The second clustering result is thereby obtained.
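One possible way to form the second clustering result is to treat each extracted relation as an edge and to group government matters by connected component, as sketched below. The application does not prescribe a grouping algorithm, so this choice, the function name `cluster_matters` and the sample relations (including the tax pair) are assumptions.

```python
from collections import defaultdict


def cluster_matters(relations):
    """Group government matters linked, directly or indirectly, by a relation.

    `relations` holds (earlier, later) pairs; the order within each pair
    records the sequential relationship and is left untouched.
    """
    graph = defaultdict(set)
    for earlier, later in relations:
        graph[earlier].add(later)
        graph[later].add(earlier)

    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        clusters.append(group)
    return clusters


relations = [
    ("marriage registration", "divorce registration"),
    ("tax declaration", "tax refund"),  # illustrative pair only
]
clusters = cluster_matters(relations)
```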
In this embodiment, relevant content is extracted from the government matters and application materials: key points are extracted from the original unstructured data and converted into structured data, which makes it easier to sort out the relationship between government matters and application materials. After extraction, the structured data is clustered, that is, government matters and application materials of the same type or with a correspondence are integrated. Converting unstructured data into structured data in this way improves the efficiency of the carding process.
Referring to fig. 3, fig. 3 is a schematic flow chart of a third embodiment of a government affair related data carding method according to the present application, and based on the first embodiment and the second embodiment of the government affair related data carding method according to the present application, the third embodiment is provided, and the method further includes:
Step S310: performing redundancy elimination on the repeated results in the clustering result to obtain a preliminary carding result;
The preliminary carding result is the result obtained after the first carding pass, in which the government matter data is carded by means of the preset extraction keywords. Because the preset extraction keywords may be inaccurate, the preliminary carding result is checked to confirm whether it contains errors.
For example, inaccurate keywords among the preset extraction keywords, or too few types of keywords, may cause errors in the subsequent carding result. The first carding pass therefore yields only a preliminary carding result, which is judged subsequently.
If the keywords used for extraction are too similar in type, government matters that merely resemble the keywords are clustered together even though no relevance exists among them, and the carding result is wrong.
If the preset extraction keywords cover too few types, the extraction result is inaccurate, and government matters that are not closely related are clustered together in the subsequent carding process, which likewise makes the carding result incorrect.
Step S320: converting the preliminary carding result into a visual logic diagram, and outputting the visual logic diagram to a display unit for relevant staff to check the preliminary carding result, so that the relevant staff can adjust the preset extraction keywords;
Converting the preliminary carding result into a visual logic diagram means outputting the data in the preliminary carding result and converting the original tabular or flat data into the form of a mind map. Displaying the data in this clearer form makes it easier for the relevant staff to check the preliminary carding result.
Illustratively, the preliminary carding result is converted into a clear mind map using a corresponding mind map generation tool, for the relevant staff to check.
The mind map tool used is, for example, Baidu Naotu or MindMaster, and the mind map is generated automatically according to the logic of the structured data.
Step S330: after the relevant staff adjust the preset extraction keywords, re-carding the government matter data to obtain a target carding result.
After the mind map is generated automatically, the relevant staff inspect it to check whether the preliminary carding result contains errors. If it does, the staff improve the preset extraction keywords, and carding is performed again with the improved keywords to obtain an accurate carding result.
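As a stand-in for a mind map tool, the conversion of structured data into a hierarchical view can be sketched as an indented text outline. The nested-dictionary layout and the function name `to_outline` are assumptions; the sketch does not use or represent the interface of Baidu Naotu or MindMaster.

```python
def to_outline(tree, depth=0):
    """Render a nested dict (node -> children) as indented outline lines."""
    lines = []
    for node, children in tree.items():
        lines.append("  " * depth + "- " + node)
        lines.extend(to_outline(children, depth + 1))
    return lines


preliminary_result = {
    "Marital registration": {
        "marriage registration": {"identity card": {}, "household register": {}},
        "divorce registration": {},
    }
}
outline = to_outline(preliminary_result)
print("\n".join(outline))
```

Each level of indentation corresponds to one branch level of the mind map, so a reviewer can see at a glance which application materials were attached to which government matter.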
Illustratively, the government affair related data carding method further includes:
Step e: outputting the target carding result to a corresponding platform so that the corresponding platform can display the target carding result externally; the government matters are displayed in the order of the preset item catalog, and the application materials are displayed by means of a cascade query; wherein the cascade query is a multi-level logical query set up on the basis of the carding result, and the application materials that the person handling the matter needs to prepare are determined through the cascade query;
Before the carding result is output to the corresponding platform, it is packaged into a functional component for subsequent use on the corresponding platform.
The corresponding platform includes a government-related display web page, an online government service application, a service guide display or service window, and the like.
The preset item catalog is arranged in order of the major categories of government matters, each major category containing a number of specific government matters; for example, the major categories are divided based on the 36 elements of the implementation list in document Guobanhan [2016] No. 108.
The cascade query is a multi-level logical judgment query whose basic logic is the correspondence between government matters and application materials in the carding result. By querying level by level from the major category down to the specific government matter, the cascade query accurately pinpoints the government matter that the person concerned needs to handle and the application materials required for it.
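A cascade query over the carding result can be sketched as a walk down a nested structure, one level per selection. The three-level layout (major category, government matter, application materials), the function name `cascade_query` and the sample entries are assumptions for illustration.

```python
def cascade_query(carding_result, *path):
    """Return the options available after following `path` level by level."""
    node = carding_result
    for choice in path:
        node = node[choice]
    return list(node)


carding_result = {
    "Marital registration": {
        "marriage registration": ["identity card", "household register"],
        "divorce registration": ["identity card", "marriage certificate"],
    },
    "Taxation": {
        "tax refund": ["tax payment certificate"],
    },
}

categories = cascade_query(carding_result)
matters = cascade_query(carding_result, "Marital registration")
materials = cascade_query(carding_result, "Marital registration", "marriage registration")
```

With no selection the query lists the major categories; each further selection narrows the result until the application materials for one government matter are returned.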
In this embodiment, the carding result produced by artificial intelligence carding is converted into a mind map so that relevant staff can verify it manually. This avoids inaccurate carding results caused by shortcomings of the artificial intelligence carding and improves the accuracy of the government matter data. In addition, once the carding result is obtained it is used on each corresponding platform, so that persons handling government matters can conveniently prepare the corresponding application materials, which improves their satisfaction with the service.
In addition, the present application also provides a government affair related data carding device, which includes:
an acquisition module, configured to acquire government matter data;
an extraction module, configured to extract the relevant content of government matters and of application materials from the government matter data based on preset extraction keywords, to obtain an extraction result;
a clustering module, configured to, if the extraction result exists in the preset item catalog, perform relevance clustering on the government matters and application materials that have a correspondence, to obtain a clustering result;
a removal module, configured to perform redundancy elimination on the repeated results in the clustering result to obtain a target carding result;
an identification module, configured to perform character recognition on original government-related data to obtain a recognition result;
a storage module, configured to store the recognition result in a preset data storage unit.
Illustratively, the extraction module includes:
a first extraction subunit, configured to extract government matters and the application materials required for handling the government matters from the government matter data based on preset extraction keywords;
a second extraction subunit, configured to extract the key content of the government matters and the relevance among the government matters to obtain a matter extraction result;
a third extraction subunit, configured to extract the key content of the application materials and the correspondence between the application materials and the government matters to obtain a material extraction result.
Illustratively, the clustering module further includes:
a judging subunit, configured to perform relevance clustering on the extraction result if the extraction result exists in the preset item catalog;
a first clustering subunit, configured to cluster the application materials with their corresponding government matters based on the material extraction result to obtain a first clustering result;
a second clustering subunit, configured to cluster the related government matters based on the matter extraction result to obtain a second clustering result.
Illustratively, the clustering module further includes:
a fourth extraction subunit, configured to extract, from the clustering result, a third keyword for judging similarity, wherein the third keyword includes the first keyword and the second keyword;
a determining subunit, configured to determine the repeated results existing in the clustering results if the third keyword exists in a plurality of clustering results.
Illustratively, the removal module further includes:
a removing subunit, configured to perform redundancy elimination on the repeated results in the clustering result to obtain a preliminary carding result;
a conversion subunit, configured to convert the preliminary carding result into a visual logic diagram and output the visual logic diagram to a display unit for relevant staff to check the preliminary carding result, so that the relevant staff can adjust the preset extraction keywords;
a carding subunit, configured to re-card the government matter data after the relevant staff adjust the preset extraction keywords, to obtain a target carding result.
Illustratively, the carding subunit further includes:
an output unit, configured to output the target carding result to a corresponding platform so that the corresponding platform can display the target carding result externally; the government matters are displayed in the order of the preset item catalog, and the application materials are displayed by means of a cascade query; wherein the cascade query is a multi-level logical query set up on the basis of the carding result, and the application materials that the person handling the matter needs to prepare are determined through the cascade query.
The specific implementation manner of the government affair related data carding device is basically the same as that of each embodiment of the government affair related data carding method, and is not repeated here.
In addition, the present application also provides government affair related data carding equipment. Fig. 4 is a schematic structural diagram of the hardware operating environment of the government affair related data carding equipment according to an embodiment of the present application.
As shown in fig. 4, the government affair related data carding equipment may include a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 communicate with one another through the communication bus 404; the memory 403 is used for storing a computer program; and the processor 401 is configured to implement the steps of the government affair related data carding method when executing the program stored in the memory 403.
The communication bus 404 of the government affair related data carding equipment may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 404 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is represented in the figure by only one bold line, but this does not mean that there is only one bus or one type of bus.
The communication interface 402 is used for communication between the government affair related data carding equipment and other devices.
The memory 403 may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory 403 may also be at least one storage device located remotely from the aforementioned processor 401.
The processor 401 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The specific implementation of the government affair related data carding equipment is basically the same as that of each embodiment of the government affair related data carding method, and is not repeated here.
In addition, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a government affair related data combing program, and the government affair related data combing program realizes the steps of the government affair related data combing method when being executed by a processor.
The specific implementation manner of the computer readable storage medium is basically the same as the above embodiments of the government affair related data carding method, and will not be repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (8)

1. The government affair related data carding method is characterized by comprising the following steps of:
acquiring government affair data;
based on preset extraction keywords, extracting relevant contents of government matters and relevant contents of application materials from the government matters data to obtain an extraction result;
if the extraction result exists in the preset item catalog, carrying out relevance clustering on the government matters and application materials having a correspondence, to obtain a clustering result; the clustering result is an association processing result of the association between government matters, and of the association between government matters and application materials or between application materials and government matters, wherein the association between government matters comprises a plurality of government matters having a precedence relationship and a plurality of government matters whose content overlaps;
The clustering result comprises a first clustering result and a second clustering result, and if the extraction result exists in a preset item catalog, relevance clustering is carried out on the government item and the application material with corresponding relations, and the clustering result is obtained, and comprises the following steps:
if the extraction result exists in the preset item catalog, carrying out relevance clustering on the extraction result;
clustering the application materials and the corresponding government matters based on a material extraction result to obtain a first clustering result;
clustering the government matters with the association based on the matter extraction result to obtain a second clustering result;
performing redundancy elimination treatment on repeated results in the clustering results to obtain target carding results;
performing redundancy elimination treatment on repeated results in the clustering results to obtain a preliminary carding result;
converting the preliminary carding result into a visual logic diagram, and outputting the visual logic diagram to a display unit for relevant staff to calibrate the preliminary carding result so as to facilitate the relevant staff to adjust preset extracted keywords;
and after the related staff adjusts the preset extraction keywords, re-combing the government affair data to obtain a target combing result.
2. The government affair related data carding method according to claim 1, wherein after the related staff adjusts the preset extraction keywords, the government affair data is carded again to obtain a target carding result, further comprising:
outputting the target carding result to a corresponding platform so that the corresponding platform can display the target carding result outwards; the government affairs are displayed according to the sequence of a preset item catalog, and the application materials are displayed in a mode of setting cascade inquiry; wherein the cascade inquiry is a multi-level logical inquiry set based on the result of the sorting, and the application material to be prepared by the clerk is determined by the cascade inquiry.
3. The government affair related data carding method according to claim 1, wherein the extraction result includes a matter extraction result and a material extraction result, and the step of extracting the relevant content of the government matters and the relevant content of the application materials from the government affair data based on preset extraction keywords to obtain the extraction result comprises:
extracting, based on the preset extraction keywords, the government matters and the application materials required for handling the government matters from the government affair data;
extracting key contents of the government matters, and extracting the relevance among the government matters, to obtain the matter extraction result;
and extracting key contents of the application materials, and extracting the corresponding relations between the application materials and the government matters, to obtain the material extraction result.
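The extraction steps of this claim can be sketched as follows. This is a deliberately naive illustration, assuming that the preset extraction keywords are two plain lists (one for matters, one for materials) and that co-occurrence within a sentence stands in for relevance and correspondence. The keyword lists, the function `extract` and the sample text are hypothetical.

```python
import re

# Hypothetical preset extraction keywords.
MATTER_KEYWORDS = ["business registration", "tax registration"]
MATERIAL_KEYWORDS = ["identity certificate", "business license"]


def extract(text: str, matter_keywords: list[str], material_keywords: list[str]) -> dict:
    """Return matter-matter links and material-matter links found sentence by sentence."""
    matter_links: set[tuple[str, str]] = set()
    material_links: set[tuple[str, str]] = set()
    for sentence in re.split(r"[.;\n]+", text.lower()):
        matters = [k for k in matter_keywords if k in sentence]
        materials = [k for k in material_keywords if k in sentence]
        # Two matters named in one sentence are treated as related.
        for i, first in enumerate(matters):
            for second in matters[i + 1:]:
                matter_links.add((first, second))
        # Naive assumption: a material belongs to every matter in the same sentence.
        for matter in matters:
            for material in materials:
                material_links.add((material, matter))
    return {
        "matter_result": sorted(matter_links),
        "material_result": sorted(material_links),
    }


sample = (
    "Business registration requires an identity certificate. "
    "Tax registration follows business registration and requires a business license."
)
result = extract(sample, MATTER_KEYWORDS, MATERIAL_KEYWORDS)
print(result)
```

Sentence-level co-occurrence over-links materials when one sentence names several matters, which is one reason the method has staff review the result and adjust the keywords.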
4. The government affair related data carding method according to claim 1, wherein before obtaining the government affair data, the method further comprises:
performing character recognition on raw data related to government affairs to obtain a recognition result;
and storing the recognition result in a preset data storage unit.
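The recognize-then-store flow of this claim might look like the sketch below. The patent names neither an OCR engine nor a storage technology, so the recognizer is passed in as a plain callable (here a stand-in that only decodes bytes) and the "preset data storage unit" is assumed to be an SQLite table. All names are hypothetical.

```python
import sqlite3
from typing import Callable


def store_recognition(
    conn: sqlite3.Connection,
    source_id: str,
    raw: bytes,
    recognize: Callable[[bytes], str],
) -> str:
    """Run character recognition on raw data and persist the recognized text."""
    text = recognize(raw)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recognized (source_id TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute("INSERT OR REPLACE INTO recognized VALUES (?, ?)", (source_id, text))
    conn.commit()
    return text


def fake_recognize(raw: bytes) -> str:
    """Stand-in for a real OCR engine: simply decodes UTF-8 bytes."""
    return raw.decode("utf-8")


conn = sqlite3.connect(":memory:")
store_recognition(
    conn,
    "notice-001",
    b"Business registration requires an identity certificate.",
    fake_recognize,
)
stored = conn.execute(
    "SELECT content FROM recognized WHERE source_id = ?", ("notice-001",)
).fetchone()[0]
print(stored)
```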
5. The government affair related data carding method according to claim 1, wherein the step of performing redundancy elimination on the repeated results in the clustering result to obtain a target carding result further comprises:
extracting, from the clustering result, keywords for judging similarity; the keywords are words whose content overlaps between the government matters and the application materials;
and if the keywords are present in the clustering result, determining that repeated results exist in the clustering result.
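One possible reading of this keyword test is sketched below. It assumes that each clustering entry pairs a matter description with a material description, that the similarity keywords are the words the two descriptions share, and that two entries with the same non-empty keyword set count as repeated. The claim does not fix these details; the function names and sample entries are hypothetical.

```python
def shared_keywords(matter: str, material: str) -> frozenset[str]:
    """Words that appear in both the matter and the material description."""
    return frozenset(matter.lower().split()) & frozenset(material.lower().split())


def find_repeats(clusters: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Return (first_index, repeated_index) pairs for entries with identical keyword sets."""
    first_seen: dict[frozenset[str], int] = {}
    repeats: list[tuple[int, int]] = []
    for index, (matter, material) in enumerate(clusters):
        key = shared_keywords(matter, material)
        if not key:
            continue  # no overlapping words, so nothing to compare on
        if key in first_seen:
            repeats.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return repeats


# Hypothetical clustering entries: (government matter, application material).
clusters = [
    ("business license application", "business license application form"),
    ("business license application", "application form business license"),
    ("tax registration", "identity certificate"),
]
repeats = find_repeats(clusters)
print(repeats)
```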
6. A government affair related data carding device, characterized in that the government affair related data carding device comprises:
an acquisition module, configured to acquire government affair data;
an extraction module, configured to extract relevant content of government matters and relevant content of application materials from the government affair data based on preset extraction keywords to obtain an extraction result;
a clustering module, configured to, if the extraction result exists in a preset item catalog, perform relevance clustering on the government matters and the application materials that have a corresponding relation, to obtain a clustering result; the clustering result is the result of associating government matters with one another and of associating government matters with application materials or application materials with government matters, wherein the associations among government matters include a plurality of government matters having a precedence relation and a plurality of government matters that overlap with one another;
the clustering module being further configured to: perform relevance clustering on the extraction result if the extraction result exists in the preset item catalog; cluster the application materials with the corresponding government matters based on a material extraction result to obtain a first clustering result; and cluster the associated government matters based on a matter extraction result to obtain a second clustering result;
a removal module, configured to perform redundancy elimination on repeated results in the clustering result to obtain a target carding result;
the removal module being further configured to: perform redundancy elimination on the repeated results in the clustering result to obtain a preliminary carding result; convert the preliminary carding result into a visual logic diagram and output the visual logic diagram to a display unit, so that relevant staff can check the preliminary carding result and adjust the preset extraction keywords accordingly; and, after the relevant staff have adjusted the preset extraction keywords, card the government affair data again to obtain the target carding result.
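The behaviour of the clustering module can be sketched under stated assumptions: the preset item catalog is a set of matter names, the first clustering result groups materials under each catalogued matter, and the second clustering result groups associated matters into connected sets using a union-find structure. Union-find is a substitute chosen for illustration; the patent does not name a clustering algorithm. All identifiers and sample data are hypothetical.

```python
from collections import defaultdict


def cluster(material_links, matter_links, catalog):
    """Return (first_result, second_result) for links whose matters are in the catalog."""
    # First clustering result: application materials grouped by government matter.
    first = defaultdict(set)
    for material, matter in material_links:
        if matter in catalog:
            first[matter].add(material)

    # Second clustering result: connected groups of associated matters (union-find).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matter_links:
        if a in catalog and b in catalog:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for matter in list(parent):
        groups[find(matter)].add(matter)

    first_result = {matter: sorted(items) for matter, items in first.items()}
    second_result = sorted(sorted(group) for group in groups.values())
    return first_result, second_result


# Hypothetical catalog and extraction results.
catalog = {"business registration", "tax registration", "food permit"}
material_links = [
    ("identity certificate", "business registration"),
    ("business license", "tax registration"),
    ("lease contract", "unlisted matter"),  # skipped: matter not in the catalog
]
matter_links = [("business registration", "tax registration")]

first_result, second_result = cluster(material_links, matter_links, catalog)
print(first_result)
print(second_result)
```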
7. Government affair related data carding equipment, the equipment comprising: a memory, a processor, and a government affair related data carding program stored on the memory and operable on the processor, the government affair related data carding program being configured to implement the steps of the government affair related data carding method as claimed in any one of claims 1 to 5.
8. A computer readable storage medium, wherein a government affair related data carding program is stored on the computer readable storage medium, and the program, when executed by a processor, implements the steps of the government affair related data carding method according to any one of claims 1 to 5.
CN202210577037.5A 2022-05-25 2022-05-25 Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium Active CN115098596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210577037.5A CN115098596B (en) 2022-05-25 2022-05-25 Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN115098596A (en) 2022-09-23
CN115098596B (en) 2023-04-25

Family

ID=83289235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210577037.5A Active CN115098596B (en) 2022-05-25 2022-05-25 Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115098596B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694963A (en) * 2020-05-11 2020-09-22 电子科技大学 Key government affair flow identification method and device based on item association network
CN111754206A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Government affair service affair granulation combing method
CN112561461A (en) * 2020-09-30 2021-03-26 速聚(福建)科技有限公司 Government affair approval method, system, device and storage medium based on machine learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143443B (en) * 2019-11-29 2024-04-05 数字广东网络建设有限公司 Government affair information display method, device, system, terminal and storage medium
CN111192012B (en) * 2019-12-27 2023-04-14 腾讯云计算(北京)有限责任公司 Item processing method, item processing device, server and storage medium
CN111931866B (en) * 2020-09-21 2021-01-01 平安科技(深圳)有限公司 Medical data processing method, device, equipment and storage medium
CN114049089A (en) * 2021-11-16 2022-02-15 姚刚 Method and system for constructing government affair big data platform
CN114117215A (en) * 2021-11-22 2022-03-01 武汉大学深圳研究院 Government affair data personalized recommendation system based on mixed mode


Also Published As

Publication number Publication date
CN115098596A (en) 2022-09-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant