CN114722801A - Government affair data classification storage method and related device

Government affair data classification storage method and related device

Info

Publication number
CN114722801A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011526719.0A
Other languages
Chinese (zh)
Inventor
王岩琪
贺东华
方标新
韦章兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The application discloses a government affair data classified storage method and a related device. Item description information of a target item is acquired, and feature extraction is performed on it to obtain item feature information corresponding to the target item. Classification and identification are then performed on the item feature information to determine the category of the target item corresponding to each piece of item feature information. The item description information of each target item is stored in association with the category of that target item. This addresses, as far as possible, the lack in the related art of a method for storing government affair data by category.

Description

Government affair data classification storage method and related device
Technical Field
The invention relates to the technical field of data processing, in particular to a government affair data classified storage method and a related device.
Background
With the advent of the big data age, electronic government systems have come into wide use. E-government uses modern information and communication technology to integrate management and services over the network, overcoming the limits of time, space, and departmental separation on the internet, and provides society with high-quality, comprehensive, standardized, and transparent management and services that meet international standards. In the related art, an e-government system is divided into several departments. Each department manages numerous items, and when item description information is stored, the item description information belonging to the same department is stored together.
The inventor found that, in the related art, when the item description information of each item under each department is stored (that is, when government affair data is archived as evidence), all the item description information under the same department is usually put together and then classified and stored manually by the personnel who manage government affair data storage. That is, the related art lacks a method for classified storage of government affair data.
Disclosure of Invention
The application aims to provide a government affair data classified storage method and a related device, in order to solve the problem that the related art lacks a method for storing government affair data by category.
In a first aspect, an embodiment of the present application provides a method for classified storage of government affairs data, where the method includes:
acquiring item description information of a target item;
extracting the feature of the item description information to obtain item feature information of the target item;
classifying and identifying the target items based on the item feature information to obtain the category of the target items;
and storing the item description information of the target item in association with the category of the target item.
In some possible embodiments, the performing feature extraction on the item description information to obtain item feature information of the target item includes:
performing word segmentation processing on the item description information to obtain word segmentation results;
determining a plurality of keywords of the item description information from the word segmentation result;
determining a weight for each of the plurality of keywords;
and constructing the item feature information based on the plurality of keywords.
In some possible embodiments, the determining a plurality of keywords of the item description information from the word segmentation result includes:
determining the TF-IDF value of each word in the word segmentation result by using the term frequency-inverse document frequency (TF-IDF) algorithm, and determining the words whose TF-IDF values are greater than or equal to a preset threshold as the keywords.
In some possible embodiments, classifying and identifying the target item based on the item feature information to obtain a category to which the target item belongs includes:
and inputting the plurality of keywords and the preset weight of each keyword into a Bayesian classifier for classification and identification to obtain the category to which the target item belongs.
In some possible embodiments, each keyword corresponds to a preset weight, and if the keyword belongs to the table content in the item description information, the keyword corresponds to a first preset weight; if the keyword does not belong to the table content, the keyword corresponds to a second preset weight; the weight value of the first preset weight is greater than that of the second preset weight.
In some possible embodiments, the categories used for classification and identification are constructed in advance according to the association relationships between different items.
In some possible embodiments, the different items classified into the same category are distinguished using item coding; wherein, each item and the corresponding item code satisfy a one-to-one correspondence relationship.
In some possible embodiments, after storing the item description information of the target item in association with the category to which the target item belongs, the method further comprises:
determining, by ten-fold cross-validation on the stored item description information of the target items, the classification accuracy of a new Bayesian model;
and if the classification accuracy of the new Bayesian model is higher than that of the Bayesian classifier, taking the new Bayesian model as the classification model in the Bayesian classifier.
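The model replacement check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `train_keyword_vote` is a hypothetical toy trainer standing in for Bayesian model training, `k_fold_accuracy` assigns every k-th sample to a fold, and `should_replace` encodes only the "higher accuracy wins" rule from the text.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[str], str]  # (keywords, category)
Predictor = Callable[[Sequence[str]], str]


def train_keyword_vote(training: List[Sample]) -> Predictor:
    """Hypothetical stand-in trainer: each keyword votes for the category it was seen with most."""
    seen: Dict[str, Counter] = {}
    for keywords, label in training:
        for word in keywords:
            seen.setdefault(word, Counter())[label] += 1

    def predict(keywords: Sequence[str]) -> str:
        votes: Counter = Counter()
        for word in keywords:
            if word in seen:
                votes[seen[word].most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else ""

    return predict


def k_fold_accuracy(samples: List[Sample],
                    train: Callable[[List[Sample]], Predictor],
                    k: int = 10) -> float:
    """Fraction of held-out samples classified correctly, pooled over k folds."""
    correct = 0
    for fold in range(k):
        held_out = samples[fold::k]  # every k-th sample, starting at index `fold`
        training = [s for idx, s in enumerate(samples) if idx % k != fold]
        predict = train(training)
        correct += sum(1 for keywords, label in held_out if predict(keywords) == label)
    return correct / len(samples)


def should_replace(new_accuracy: float, current_accuracy: float) -> bool:
    """The new model replaces the current one only if it is strictly more accurate."""
    return new_accuracy > current_accuracy
```

In practice the trainer passed to `k_fold_accuracy` would be whatever routine builds the Bayesian model from the stored item description information; the fold assignment shown here is one simple choice among several.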
In a second aspect, an embodiment of the present application provides a government affairs data classification storage device, including:
the description information acquisition module is used for acquiring the item description information of the target item;
the characteristic information acquisition module is used for extracting the characteristics of the item description information to obtain item characteristic information of the target item;
the category confirmation module is used for classifying and identifying the target items based on the item feature information to obtain the category of the target items;
and the item storage module is used for storing the item description information of the target item and the category of the target item in an associated manner.
In some possible embodiments, the feature information obtaining module includes:
the word segmentation result confirming unit is used for carrying out word segmentation processing on the item description information to obtain a word segmentation result;
a keyword acquisition unit, configured to determine a plurality of keywords of the item description information from the word segmentation result;
a weight confirming unit for determining a weight of each of the plurality of keywords;
a feature information construction unit configured to construct the item feature information based on the plurality of keywords.
In some possible embodiments, the keyword acquisition unit, when determining a plurality of keywords of the item description information from the word segmentation result, is configured to:
determine the TF-IDF value of each word in the word segmentation result by using the term frequency-inverse document frequency (TF-IDF) algorithm, and determine the words whose TF-IDF values are greater than or equal to a preset threshold as the keywords.
In some possible embodiments, the category confirmation module is configured to:
and inputting the plurality of keywords and the preset weight of each keyword into a Bayesian classifier for classification and identification to obtain the category to which the target item belongs.
In some possible embodiments, each keyword corresponds to a preset weight, and if the keyword belongs to the table content in the item description information, the keyword corresponds to a first preset weight; if the keyword does not belong to the table content, the keyword corresponds to a second preset weight; the weight value of the first preset weight is greater than that of the second preset weight.
In some possible embodiments, the categories used for classification and identification are constructed in advance according to the association relationships between different items.
In some possible embodiments, the different items classified into the same category are distinguished using item coding; wherein, each item and the corresponding item code satisfy a one-to-one correspondence relationship.
In some possible embodiments, after the item storage module stores the item description information of the target item in association with the category to which the target item belongs, the item storage module is further configured to:
determining, by ten-fold cross-validation on the stored item description information of the target items, the classification accuracy of a new Bayesian model;
and if the classification accuracy of the new Bayesian model is higher than that of the Bayesian classifier, taking the new Bayesian model as the classification model in the Bayesian classifier.
In a third aspect, another embodiment of the present application further provides an electronic device, including at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any government affair data classification storage method provided by the embodiment of the application.
In a fourth aspect, another embodiment of the present application further provides a computer storage medium, where a computer program is stored, where the computer program is used to make a computer execute any one of the government affair data classification storage methods provided in the embodiments of the present application.
According to the embodiments of the application, item description information of a target item is acquired, and feature extraction is performed on it to obtain item feature information corresponding to the target item. Classification and identification are then performed on the item feature information to determine the category of the target item corresponding to each piece of item feature information. The item description information of each target item is stored in association with the category of that target item. This addresses, as far as possible, the lack in the related art of a method for storing government affair data by category.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic illustration of an application environment according to one embodiment of the present application;
FIG. 2a is a flow chart of a government affair data classified storage method according to an embodiment of the present application;
FIG. 2b is a schematic diagram of item description information according to one embodiment of the present application;
FIG. 3 is a diagram of a government affair data classified storage device according to an embodiment of the present application;
FIG. 4 is a block diagram of an electronic device according to one embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and in detail with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, "a plurality" means two or more.
Other terms should be understood similarly. The preferred embodiments described herein only illustrate and explain the present application and are not intended to limit it, and the features in the embodiments and examples of the present application may be combined with each other where no conflict arises.
To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description follows with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method steps shown in the following embodiments or figures, the method may include more or fewer steps based on conventional or non-inventive work. For steps that have no necessary logical dependency on each other, the order of execution is not limited to the order given in the embodiments. In an actual process or on a control device, the method may be executed in the order shown in the embodiments or drawings, or in parallel.
The inventor has found that, in the related art, when item description information under each department is stored, all the item description information under the same department is put together and then classified and stored by the personnel who manage government affair data storage. Because the classified storage is performed manually, it suffers from low efficiency and classification errors, which hinders accurate evidence storage and archiving of government affair data. Based on this, multiple categories can be preset among all the items managed by each department, and the item description information of items whose application scenarios and/or fields are similar can be stored in the same preset category. The inventive concept of the application is as follows: the item description information corresponding to each target item is acquired from the target items that each department submits for government affair data storage. Word segmentation is performed on the item description information, which specifically includes removing sensitive words, interjections, and stop words, and correcting wrongly written characters. After word segmentation, a TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to obtain the TF-IDF value of each word, and the words whose TF-IDF values are greater than a preset threshold are determined as keywords. All the keywords under the same target item form a keyword group, and a Bayesian classifier classifies the keyword group as the feature information.
Finally, the classification result of the target item is stored in association with the item description information corresponding to the target item. This addresses, as far as possible, the lack in the related art of a method for storing government affair data by category.
The following describes in detail a government affair data classification storage method in the embodiment of the present application with reference to the drawings.
Referring to fig. 1, a schematic diagram of an application environment according to an embodiment of the present application is shown.
As shown in fig. 1, the application environment may include, for example, a network 10, a server 20, at least one terminal device 30, and a database 40.
The server 20 is associated with the database 40, which stores the target items in each preset category and the item description information corresponding to those target items. The terminal device 30 may be any terminal device with a networking function, such as a smartphone, a desktop computer, or a portable computer. Through the network 10, the terminal device 30 may upload to the server 20 the target items belonging to each department and the item description information corresponding to each target item. When a department archives government affair information, it reports each target item and its item description information to the server 20 over the network 10, grouped by the department to which the target item belongs. After receiving the target items and item description information submitted by each department, the server 20 performs word segmentation on each piece of item description information and uses the TF-IDF algorithm to select the keywords that determine the preset category of the corresponding target item. The keywords selected from one piece of item description information form a keyword group. With the keyword group as the item feature information, a Bayesian classifier determines the preset category of the target item corresponding to the keyword group. Once the preset category of each target item is determined, each target item is stored in the database 40 in association with its preset category.
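The associated storage step at the end of this flow might look like the sketch below. It is a minimal illustration that assumes an SQLite database stands in for the database 40; the table name `items`, its columns, and the item codes such as `E-001` are hypothetical. Using the item code as the primary key reflects the one-to-one correspondence between items and item codes described in the embodiments.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; one row per item, keyed by its unique item code."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " item_code TEXT PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " description TEXT NOT NULL)"
    )
    return conn


def store_item(conn: sqlite3.Connection, item_code: str, category: str, description: str) -> None:
    """Store the item description in association with the category assigned by the classifier."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO items (item_code, category, description) VALUES (?, ?, ?)",
            (item_code, category, description),
        )


def items_in_category(conn: sqlite3.Connection, category: str) -> List[Tuple[str, str]]:
    """Return (item_code, description) pairs stored under one preset category."""
    rows = conn.execute(
        "SELECT item_code, description FROM items WHERE category = ? ORDER BY item_code",
        (category,),
    )
    return rows.fetchall()
```

Keeping the category as a column rather than as a separate table per category is a design choice of this sketch; it lets items in the same category be distinguished by item code with a single query.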
In some possible embodiments, the plurality of keywords extracted from the item description information of each target item, together with the preset weight of each keyword, are input into a Bayesian classifier for classification and identification to obtain the preset category to which the target item belongs.
In some possible embodiments, each keyword in the item feature information corresponds to a preset weight value. The table content in the item description information generally appears under only one target item, whereas non-table content may appear repeatedly under different target items. Therefore, to ensure the classification accuracy of the target items, when keywords are extracted from the item description information by the TF-IDF algorithm, keywords extracted from the table content are given a preset weight value of 1, and all other keywords are given a preset weight value of 0.5.
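The weighting rule above can be sketched in a few lines. The weight values 1 and 0.5 come from the text; the function name `assign_weights` and the idea of passing the table words as a set are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

TABLE_WEIGHT = 1.0  # keyword extracted from table content
OTHER_WEIGHT = 0.5  # keyword extracted from any other content


def assign_weights(keywords: Iterable[str], table_words: Set[str]) -> Dict[str, float]:
    """Map each keyword to its preset weight, depending on whether it came from table content."""
    return {word: TABLE_WEIGHT if word in table_words else OTHER_WEIGHT for word in keywords}


weights = assign_weights(["business scope", "health certificate"], {"business scope"})
# weights == {"business scope": 1.0, "health certificate": 0.5}
```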
In some possible embodiments, when the passenger makes a route query, the passenger may further query the route based on the filtering condition. The screening conditions may include total trip distance, total trip time, and the number of transfer vehicles. After the screening conditions are selected, the passengers can respectively obtain riding route sequences according to the total travel route from near to far, the total travel time from short to long and the number of the transfer vehicles from small to large.
To facilitate understanding of the government affair data classified storage method provided by the embodiments of the present application, an overall flowchart of the method is provided in FIG. 2a. As shown in FIG. 2a, the method includes the following steps:
step 201: acquiring the item description information of the target item.
An existing e-government system includes several departments, and each department manages a plurality of target items. For example, some departments handle a large number of target items such as residence declaration, business name declaration, and business address declaration. The item description information of a target item consists of the material forms filled in by the user when applying for the target item and the prerequisite materials submitted with them. To store the item description information of each target item under each department, each target item and its corresponding item description information must first be acquired from the text data submitted by the department (the text data includes the item description information corresponding to each target item).
In the related art, the text data submitted by a department is archived as evidence through manual classified storage, which suffers from low efficiency and classification errors. To solve this problem, a plurality of categories are preset among all the target items managed by a department. Target items with similar application scenarios and/or similar fields are treated as related target items, and the item description information corresponding to related target items is stored in the same preset category. For example, business-related item description information such as "business name registration" and "limited liability company" may be stored in a preset category named "business declaration".
Since the execution flows of classifying and storing the text data of each department are the same, in order to facilitate understanding of the present solution, only how to classify and store the text data submitted by one department will be explained below.
After each target item and its corresponding item description information are extracted from the text data submitted by the department, step 202 is executed: performing feature extraction on the item description information to obtain item feature information of the target item.
To determine the preset category corresponding to each piece of item description information, feature extraction may be performed on the item description information using preset keywords. All the keywords extracted from the item description information form a keyword group, which serves as the feature information corresponding to the item description information.
When the keywords are preset, a plurality of specific words capable of representing target matters can be selected as the keywords according to actual conditions. Taking the item description information (i.e., each table in the figure) corresponding to the target item "business name declaration" shown in fig. 2b as an example, keywords including "business name declaration", "delivery bill of materials", and "business scope" may be set.
When feature extraction is performed on the item description information, word segmentation is first applied to it to obtain the words that make up the item description information, that is, the word segmentation result. The purpose of word segmentation is to allow keywords to be screened out from the resulting words. Item description information usually contains sensitive words, interjections, and stop words that cannot serve as keywords. To reduce the keyword screening workload, sensitive words, interjections and filler words (such as "e" and "and then"), and stop words (words treated as non-retrieval words in computer retrieval) need to be removed during word segmentation.
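The cleanup performed during word segmentation can be sketched as below. This is a simplified illustration: a regular expression over English text stands in for a real Chinese word segmenter, and the word lists `STOP_WORDS`, `INTERJECTIONS`, and `SENSITIVE_WORDS` are hypothetical placeholders, not lists given in the patent.

```python
import re
from typing import FrozenSet, List

# Hypothetical example lists; a deployment would load its own.
STOP_WORDS = {"the", "of", "and", "a", "to"}
INTERJECTIONS = {"uh", "oh", "ah"}
SENSITIVE_WORDS: set = set()

BLOCKED: FrozenSet[str] = frozenset(STOP_WORDS | INTERJECTIONS | SENSITIVE_WORDS)


def segment(description: str, blocked: FrozenSet[str] = BLOCKED) -> List[str]:
    """Split a description into lowercase word tokens and drop every blocked word."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [token for token in tokens if token not in blocked]


words = segment("Uh, the declaration of the business name")
# words == ["declaration", "business", "name"]
```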
In some possible embodiments, to ensure accurate keyword screening, legitimate third-party software may be used to correct the text of the item description information when word segmentation is performed. This prevents preset keywords from going unrecognized because of wrongly written characters in the item description information.
After all the words of the item description information are obtained from the word segmentation result, the TF-IDF value of each word can be calculated using the TF-IDF algorithm. The TF-IDF algorithm measures the importance of a word to a text: the larger the TF-IDF value of a word, the more important the word is in the text. On this basis, the words whose TF-IDF values meet the preset threshold can be determined as keywords. The TF-IDF algorithm is given by the following formula (1):
$$\mathrm{TF\text{-}IDF}_i = \mathrm{TF}_i \times \mathrm{IDF}_i \qquad (1)$$
where TF (Term Frequency) represents the frequency with which the word appears in the item description information; IDF (Inverse Document Frequency) represents the discriminative power of the word; and $i$ denotes the $i$-th word in the word segmentation result.
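For reference, a common textbook form of the quantities named above is sketched below. The symbols $n_i$ (number of occurrences of word $i$ in the item description information), $N$ (number of item descriptions in the reference corpus), and $d_i$ (number of those descriptions that contain word $i$) are assumptions introduced here; the patent text does not state which normalization it uses.

```latex
\mathrm{TF}_i = \frac{n_i}{\sum_j n_j}, \qquad
\mathrm{IDF}_i = \log\frac{N}{d_i}, \qquad
\mathrm{TF\text{-}IDF}_i = \mathrm{TF}_i \times \mathrm{IDF}_i
```

Under this form, a word that occurs often in one item description but in few others receives a high value, which matches the stated role of IDF as a measure of discriminative power.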
After the TF-IDF value of each word is obtained from the formula, the keywords of the item description information are determined by comparing each word's TF-IDF value with the preset threshold. The keywords form a keyword group, which serves as the item feature information of the target item.
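The keyword selection just described can be sketched as follows. This is a minimal illustration under assumptions: TF is taken as the relative frequency of a word within one item description, IDF as the logarithm of the corpus size divided by the number of descriptions containing the word, and the comparison is "greater than or equal to", following the wording of the claimed embodiments. The function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(doc_tokens: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF value of each distinct word in doc_tokens, relative to a corpus of token lists."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {}
    for word, count in counts.items():
        # Number of descriptions containing the word; at least 1 to avoid division by zero.
        doc_freq = max(1, sum(1 for tokens in corpus if word in tokens))
        scores[word] = (count / total) * math.log(len(corpus) / doc_freq)
    return scores


def select_keywords(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the words whose TF-IDF value reaches the preset threshold."""
    return sorted(word for word, score in scores.items() if score >= threshold)


corpus = [
    ["business", "name", "declaration"],
    ["health", "certificate", "declaration"],
]
keywords = select_keywords(tf_idf(corpus[0], corpus), threshold=0.1)
# keywords == ["business", "name"]; "declaration" occurs in every description and scores 0
```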
After the item feature information of the target item is acquired, step 203 is executed: classifying and identifying the target item based on the item feature information to obtain the category of the target item.
In implementation, a Bayesian classifier can be used to classify and identify the item feature information, determining the preset category to which the item feature information belongs and thereby the preset category of the corresponding target item.
The basic principle of classifying and identifying the item feature information with the Bayesian classifier is to calculate the probability that each keyword in the item feature information belongs to a preset category. In implementation, the probability that each keyword in the item feature information belongs to the preset category can be calculated by a Bayesian formula, and the probabilities of the keywords are summed. Whether the target item corresponding to the item feature information belongs to the preset category is then determined by comparing the sum with a preset probability threshold. The Bayesian formula is given by the following formula (2):
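The summation and threshold comparison described above can be sketched as follows. This is an illustration under assumptions: the per-category keyword probabilities are supplied as a plain dictionary with made-up values, and choosing the highest-scoring category before applying the threshold is a choice of this sketch. Note that it follows the summation described in the text, not the product of probabilities used in the standard naive Bayes formulation.

```python
from typing import Dict, Iterable, Optional

Model = Dict[str, Dict[str, float]]  # category -> {keyword: P(keyword | category)}


def category_score(keywords: Iterable[str], word_probs: Dict[str, float]) -> float:
    """Sum of the per-keyword probabilities for one category; unseen keywords contribute 0."""
    return sum(word_probs.get(word, 0.0) for word in keywords)


def classify(keywords: Iterable[str], model: Model, threshold: float) -> Optional[str]:
    """Return the best-scoring category if its summed probability reaches the threshold, else None."""
    words = list(keywords)
    best, best_score = None, 0.0
    for category, word_probs in model.items():
        score = category_score(words, word_probs)
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= threshold else None


# Hypothetical probabilities for two preset categories.
model = {
    "business declaration": {"business": 0.5, "name": 0.25},
    "health management": {"health": 0.5, "certificate": 0.25},
}
result = classify(["business", "name"], model, threshold=0.5)
# result == "business declaration" (summed probability 0.75)
```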
[Formula (2): image RE-GDA0003029244670000102, not reproduced in this text]
where $class_k$ denotes the k-th preset category; $w_i$ denotes the i-th keyword in the item feature information; and $P(w_i \mid class_k)$ denotes the probability that keyword $w_i$ belongs to preset category $class_k$.
The inventor considers that, in text classification, when a word does not appear in the training samples, that is, when the occurrence count of the word is 0, the numerator in formula (2) is 0. According to the principle of Laplace smoothing, when the number of $w_i$ is large, adding 1 to the count of each $w_i$ changes the estimated probability only negligibly, yet it conveniently and effectively avoids the zero-probability problem. For example, assume that there are three classes C1, C2, and C3 in the text classification, and that in the given training samples the observed counts of a word K1 in the three classes are 0, 990, and 10, respectively. The probabilities of K1 are then 0, 0.99, and 0.01. Applying Laplace smoothing to these three counts gives (0+1)/(1000+1×3) ≈ 0.001, (990+1)/(1000+1×3) ≈ 0.988, and (10+1)/(1000+1×3) ≈ 0.011. Thus, after 1 is added to the count of each class and the number of classes is added to the denominator, the change from the original result is negligible, and the zero-probability problem is effectively avoided. Based on this, formula (2) can be optimized into the following formula (3):
[Formula (3): image RE-GDA0003029244670000111, not reproduced in this text]
where $class_k$ denotes the k-th preset category; $w_i$ denotes the i-th keyword in the item feature information; $P(w_i \mid class_k)$ denotes the probability that keyword $w_i$ belongs to preset category $class_k$; and n is the number of keywords included in the item feature information, with $n \ge i$.
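The Laplace smoothing arithmetic in the three-class example (counts 0, 990, and 10) can be checked with a short sketch. The helper name `laplace_smooth` and its `alpha` parameter are illustrative assumptions; the text only describes adding 1 to each count and adding the number of classes to the denominator.

```python
def laplace_smooth(counts, alpha=1):
    """Add `alpha` to every count and `alpha * len(counts)` to the total."""
    total = sum(counts)
    return [(c + alpha) / (total + alpha * len(counts)) for c in counts]


counts = [0, 990, 10]                      # observed counts in C1, C2, C3
raw = [c / sum(counts) for c in counts]    # unsmoothed estimates, first one is zero
smoothed = laplace_smooth(counts)          # no estimate is zero any more

for name, before, after in zip(["C1", "C2", "C3"], raw, smoothed):
    print(f"{name}: raw={before:.3f} smoothed={after:.3f}")
```

The smoothed values still sum to 1, and the largest change is on the order of 0.002, which matches the claim that the adjustment is negligible when the counts are large.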
In addition, the inventor also considers that, in practical application scenarios, the item description information of some target items includes not only the table content shown in fig. 2b but also the materials to be submitted. The same material may appear in the item descriptions of different target items; for example, a "health certificate" must be submitted as a material for several target items such as "maternal and child health care management application" and "personal food management company application". The table content shown in fig. 2b, however, usually differs between the item description information of different target items. Therefore, to ensure classification accuracy, the table content in the item description information is preset with a weight of 1, and the material content is preset with a weight of 0.5. When the probability of each keyword is calculated with formula (3), the corresponding weight needs to be applied. For example, if $w_i$ in formula (3) comes from the table content in the item description information, the probability of $w_i$ is calculated by formula (3). If $w_i$ comes from the submitted material content in the item description information, the probability of $w_i$ is calculated by the following formula (4):
[Formula (4): image RE-GDA0003029244670000112, not reproduced in this text]
where $class_k$ denotes the k-th preset category; $w_i$ denotes the i-th keyword in the item feature information; $P(w_i \mid class_k)$ denotes the probability that keyword $w_i$ belongs to preset category $class_k$; and n is the number of keywords included in the item feature information, with $n \ge i$.
The probability of each keyword in the item feature information is obtained through formula (3) and formula (4), and the probabilities of the keywords are summed. If the sum is greater than the probability threshold of the preset category, the target item corresponding to the item feature information belongs to the preset category. If the sum is smaller than the probability threshold of the preset category, the target item does not belong to the preset category.
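The weighted summation and threshold decision can be sketched as below. Formulas (3) and (4) are only available as images, so this sketch makes two labeled assumptions: the per-keyword probability follows the add-one smoothing of the three-class example, and the weight (1 for table content, 0.5 for submitted materials) is multiplied into that probability. The function names, the sample counts, and the threshold are hypothetical.

```python
TABLE_WEIGHT = 1.0      # weight for keywords from the table content
MATERIAL_WEIGHT = 0.5   # weight for keywords from submitted materials


def keyword_probability(word, category, counts, categories):
    """Smoothed share of `word` occurrences in `category` (assumed form)."""
    per_category = counts.get(word, {})
    total = sum(per_category.get(c, 0) for c in categories)
    return (per_category.get(category, 0) + 1) / (total + len(categories))


def category_score(keywords, category, counts, categories):
    """Sum of weighted keyword probabilities; keywords are (word, source) pairs."""
    score = 0.0
    for word, source in keywords:
        weight = TABLE_WEIGHT if source == "table" else MATERIAL_WEIGHT
        score += weight * keyword_probability(word, category, counts, categories)
    return score


def belongs_to(keywords, category, counts, categories, threshold):
    """True when the summed score exceeds the preset probability threshold."""
    return category_score(keywords, category, counts, categories) > threshold


# Hypothetical training counts: counts[word][category].
categories = ["food", "health"]
counts = {
    "catering": {"food": 8, "health": 0},
    "health certificate": {"food": 5, "health": 5},
}
keywords = [("catering", "table"), ("health certificate", "material")]
```

Because "health certificate" is shared by both categories and carries only half weight, the decision here is driven by the table keyword "catering", which is the effect the weighting is meant to achieve.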
Step 204: store the item description information of the target item in association with the category to which the target item belongs.
The aim is to ensure that, after a plurality of target items have been classified, the item description information of each target item can still be found accurately. Considering that the item description information of a target item has a unique identifier, the identifier may be added to each table and each submitted material in the item description information. The identifier may be the item code in the table of fig. 2b. On this basis, different target items classified into the same category can be distinguished by item code.
In some possible embodiments, after all target items of a department are classified and stored, when the archived government affair data is called, the item description information corresponding to the target item can be quickly found according to the preset classification name and the item code of the target item to be extracted.
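The associated storage and lookup by category name and item code can be sketched with a two-level mapping. The class name `ItemArchive`, its methods, and the sample codes are hypothetical; the text does not specify a storage backend.

```python
from collections import defaultdict


class ItemArchive:
    """Stores item descriptions under (category, item code)."""

    def __init__(self):
        # category name -> {item code -> item description information}
        self._by_category = defaultdict(dict)

    def store(self, category, item_code, description):
        """Associate the description with its category and unique item code."""
        self._by_category[category][item_code] = description

    def find(self, category, item_code):
        """Return the stored description, or None if nothing matches."""
        return self._by_category.get(category, {}).get(item_code)

    def items_in(self, category):
        """Return the item codes classified into the given category."""
        return sorted(self._by_category.get(category, {}))


archive = ItemArchive()
archive.store("food", "A001", "food business licence application")
archive.store("food", "A002", "catering service registration")
```

Because each item has exactly one item code, two items in the same category never collide, and a lookup needs only the category name and the code.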
In addition, to ensure the accuracy of the classification result, the Bayesian classifier can be corrected based on the stored item description information of each target item. The system can periodically extract the item feature information of n target items from the stored classification records. Using ten-fold cross-validation, a method for testing classifier accuracy, the accuracy of a newly trained Bayesian model is compared with that of the model currently in use; if the accuracy of the new model is higher, the new model is used as the Bayesian classifier.
In some possible embodiments, the item description information of n target items is selected from the stored government affair certificate classification records and randomly divided into ten equal parts. Each time, nine parts are used to train a new Bayesian model and the remaining tenth part is used to verify its accuracy. This is repeated ten times, and the accuracy of the new Bayesian classifier is recorded. The accuracy of the newly trained Bayesian model is then compared with that of the model currently in use, and the model is updated if the new model is more accurate.
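The ten-fold procedure and the replacement rule can be sketched as follows. This is an illustration under assumptions: `train_fn` is a hypothetical callback that trains a model on the nine training parts and returns a prediction function, and averaging the ten fold accuracies is an assumed way of "recording" the accuracy.

```python
import random


def ten_fold_accuracy(samples, train_fn, seed=0):
    """Mean accuracy over ten folds.

    samples: list of (features, label) pairs, at least ten of them.
    train_fn(train_samples) must return a function predict(features) -> label.
    """
    if len(samples) < 10:
        raise ValueError("need at least ten samples for ten folds")
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)           # random division
    folds = [shuffled[i::10] for i in range(10)]    # ten near-equal parts
    accuracies = []
    for i, held_out in enumerate(folds):
        # Train on the other nine parts, verify on the held-out part.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train_fn(train)
        correct = sum(1 for features, label in held_out if predict(features) == label)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def should_replace(new_accuracy, current_accuracy):
    """The new model replaces the current one only if it is strictly more accurate."""
    return new_accuracy > current_accuracy
```

A caller would pass the stored (item feature information, category) pairs as `samples` and swap in the new model only when `should_replace` returns true.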
Based on the same inventive concept, the present application further provides a government affairs data classification storage device 300, as shown in fig. 3, comprising:
a description information obtaining module 301, configured to obtain item description information of a target item;
a feature information obtaining module 302, configured to perform feature extraction on the item description information to obtain item feature information of the target item;
a category confirmation module 303, configured to perform classification and identification on the target item based on the item feature information to obtain a category to which the target item belongs;
the item storage module 304 is configured to associate and store the item description information of the target item with the category of the target item.
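The cooperation of modules 301 to 304 can be sketched as a small pipeline. The class name, the callable signatures, and the use of a dictionary as the store are assumptions made for illustration; the text defines only what each module is configured to do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ClassificationStorageDevice:
    obtain_description: Callable[[str], str]        # module 301
    extract_features: Callable[[str], List[str]]    # module 302
    confirm_category: Callable[[List[str]], str]    # module 303
    # module 304: (category, item code) -> item description information
    store: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def process(self, item_code: str) -> str:
        """Run one target item through modules 301-304 and return its category."""
        description = self.obtain_description(item_code)
        features = self.extract_features(description)
        category = self.confirm_category(features)
        self.store[(category, item_code)] = description
        return category


# Hypothetical stand-ins for the real modules.
device = ClassificationStorageDevice(
    obtain_description=lambda code: "food business licence",
    extract_features=str.split,
    confirm_category=lambda words: "food" if "food" in words else "other",
)
category = device.process("A001")
```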
In some possible embodiments, the feature information obtaining module includes:
the word segmentation result confirming unit is used for carrying out word segmentation processing on the item description information to obtain a word segmentation result;
a keyword acquisition unit, configured to determine a plurality of keywords of the item description information from the word segmentation result;
a weight confirming unit for determining a weight of each of the plurality of keywords;
a feature information construction unit configured to construct the item feature information based on the plurality of keywords.
In some possible embodiments, the keyword obtaining unit, when determining a plurality of keywords of the item description information of the plurality of tables from the word segmentation result, is configured to:
determining the TF-IDF value of each word in the word segmentation result by using the term frequency-inverse document frequency (TF-IDF) algorithm, and determining the words whose TF-IDF values are greater than or equal to a preset threshold as the keywords.
In some possible embodiments, the category confirmation module is configured to:
and inputting the plurality of keywords and the preset weight of each keyword into a Bayesian classifier for classification and identification to obtain the category to which the target item belongs.
In some possible embodiments, each keyword corresponds to a preset weight, and if the keyword belongs to the table content in the item description information, the keyword corresponds to a first preset weight; if the keyword does not belong to the table content, the keyword corresponds to a second preset weight; the weight value of the first preset weight is greater than that of the second preset weight.
In some possible embodiments, the categories used for classification and identification are constructed in advance according to the association relationships between different items.
In some possible embodiments, different items classified into the same category are distinguished using item codes, wherein each item and its corresponding item code are in one-to-one correspondence.
In some possible embodiments, after the item storage module stores the item description information of the target item in association with the category to which the target item belongs, the item storage module is further configured to:
determining the classification accuracy of a new Bayesian model from the stored item description information of the target items by using a ten-fold cross-validation algorithm;
and if the classification accuracy of the new Bayesian model is higher than that of the Bayesian classifier, taking the new Bayesian model as the classification model in the Bayesian classifier.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 4. The electronic device 130 shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include programs/utilities 1325 having a set (at least one) of program modules 1324, such program modules 1324 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the various aspects of the government affair data classification storage method provided by the present application may also be implemented in the form of a program product including program code which, when the program product is run on a computer device, causes the computer device to perform the steps of the government affair data classification storage method according to the various exemplary embodiments of the present application described above in this specification.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for government data classified storage of the embodiment of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the user's electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and block diagrams, and combinations of flows and blocks in the flow diagrams and block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A government affair data classification storage method is characterized by comprising the following steps:
acquiring item description information of a target item;
extracting the feature of the item description information to obtain item feature information of the target item;
classifying and identifying the target items based on the item feature information to obtain the category of the target items;
and storing the item description information of the target item in association with the category of the target item.
2. The method according to claim 1, wherein said extracting the feature of the item description information to obtain item feature information of the target item comprises:
performing word segmentation processing on the item description information to obtain a word segmentation result;
determining a plurality of keywords of the item description information from the word segmentation result;
determining a weight for each of the plurality of keywords;
and constructing the item feature information based on the plurality of keywords.
3. The method of claim 2, wherein determining a plurality of keywords for the item description information for the plurality of tables from the word segmentation results comprises:
determining the TF-IDF value of each word in the word segmentation result by using the term frequency-inverse document frequency (TF-IDF) algorithm, and determining the words whose TF-IDF values are greater than or equal to a preset threshold as the keywords.
4. The method according to claim 2, wherein classifying and identifying the target item based on the item feature information to obtain a category to which the target item belongs comprises:
and inputting the plurality of keywords and the preset weight of each keyword into a Bayesian classifier for classification and identification to obtain the category to which the target item belongs.
5. The method according to claim 4, wherein each keyword corresponds to a preset weight, and if the keyword belongs to table contents in the item description information, the keyword corresponds to a first preset weight; if the keyword does not belong to the table content, the keyword corresponds to a second preset weight; the weight value of the first preset weight is greater than that of the second preset weight.
6. The method according to any one of claims 1 to 5, wherein the categories used for classification and identification are constructed in advance according to the association relationships between different items.
7. The method of claim 6, wherein different items classified into the same category are distinguished using item codes; wherein each item and its corresponding item code are in one-to-one correspondence.
8. The method according to claim 1, wherein after storing the item description information of the target item in association with a category to which the target item belongs, the method further comprises:
determining the classification accuracy of a new Bayesian model from the stored item description information of the target items by using a ten-fold cross-validation algorithm;
and if the classification accuracy of the new Bayesian model is higher than that of the Bayesian classifier, taking the new Bayesian model as the classification model in the Bayesian classifier.
9. A government affairs data classification storage device, characterized in that the device comprises:
the description information acquisition module is used for acquiring the item description information of the target item;
the characteristic information acquisition module is used for extracting the characteristics of the item description information to obtain item characteristic information of the target item;
the category confirmation module is used for classifying and identifying the target items based on the item feature information to obtain the category of the target items;
and the item storage module is used for storing the item description information of the target item and the category of the target item in an associated manner.
10. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
11. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-8.
CN202011526719.0A 2020-12-22 2020-12-22 Government affair data classification storage method and related device Pending CN114722801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011526719.0A CN114722801A (en) 2020-12-22 2020-12-22 Government affair data classification storage method and related device

Publications (1)

Publication Number Publication Date
CN114722801A true CN114722801A (en) 2022-07-08

Family

ID=82230190

Country Status (1)

Country Link
CN (1) CN114722801A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110275935A (en) * 2019-05-10 2019-09-24 平安科技(深圳)有限公司 Processing method, device and storage medium, the electronic device of policy information
CN110765265A (en) * 2019-09-06 2020-02-07 平安科技(深圳)有限公司 Information classification extraction method and device, computer equipment and storage medium
CN111475612A (en) * 2020-03-02 2020-07-31 深圳壹账通智能科技有限公司 Construction method, device and equipment of early warning event map and storage medium
WO2020224106A1 (en) * 2019-05-07 2020-11-12 平安科技(深圳)有限公司 Text classification method and system based on neural network, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination