CN112800083B - Government decision-oriented government affair big data analysis method and equipment

Info

Publication number
CN112800083B
Authority
CN
China
Prior art keywords
data
article
government
government affair
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110204049.9A
Other languages
Chinese (zh)
Other versions
CN112800083A (en)
Inventor
史晓浩
李文茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Institute Of Housing And Urban Rural Development
Original Assignee
Shandong Institute Of Housing And Urban Rural Development
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Institute Of Housing And Urban Rural Development
Priority to CN202110204049.9A
Publication of CN112800083A
Application granted
Publication of CN112800083B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that government affair data cannot be effectively integrated, analyzed and applied. The method comprises the following steps: determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result. By the method, available government affair data can be acquired and integrated, analyzed and mined, so that the utilization rate of the government affair data is improved, deep analysis and processing of the government affair data are realized, and valuable reference is provided for decision making work of government departments.

Description

Government decision-oriented government affair big data analysis method and equipment
Technical Field
The application relates to the field of data processing, in particular to a government decision-oriented government affair big data analysis method and device.
Background
With the growth of computer storage capacity and the development of complex algorithms, data volume has increased exponentially in recent years. The integration and analysis of big data have been applied to some extent in fields such as public transportation, public safety and social management, and have promoted interdisciplinary research across the social and natural sciences. More than 80% of China's information data resources are held by government departments at all levels, yet this government data has not been further organized and utilized, which wastes resources.
Therefore, how to combine the traditional statistical technology with the computer technology to realize the integration, analysis and mining of government affair data and apply the government affair data to the decision work of relevant government departments becomes a problem to be solved urgently.
Disclosure of Invention
The embodiment of the application provides a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that data resources are wasted because government affair data cannot be effectively analyzed, mined and applied.
In one aspect, an embodiment of the present application provides a government decision-oriented government affair big data analysis method, including: determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
In one implementation of the present application, before performing batch cleaning on the crawled data, the method further includes: randomly sampling a target data source, and performing matching verification on the target data source and the crawled data; and storing the data into a local database under the condition that the crawled data passes verification.
In one implementation of the present application, cleaning the crawled data in batches specifically includes: extracting the crawled data from the local database; filling null values in the crawled data and filtering out repeated values; and uniformly converting the filtered data into a preset format, and filtering the converted data again.
In one implementation of the present application, displaying the analysis result specifically includes: determining the administrative region level and the time specified in the hierarchy division index, and the category specified in the classification summary index; and displaying, in a preset display form, the government affair data of the summary index corresponding to the respective administrative regions within that time; wherein the preset display form includes at least any one of the following: line graphs, bar graphs, pie charts, tables.
In one implementation of the present application, performing multi-dimensional mining and analysis on the data in the data warehouse specifically includes: according to the administrative region level, performing a drill-down query on the government affair data of the summary index corresponding to the respective administrative regions within the time, and on the government affair data of the corresponding specified category within the time; and/or adding specified keywords and performing a user-defined query through the keywords; and/or determining the administrative region and time specified in the hierarchy division index and the category specified in the classification summary index, and performing a combined query.
In one implementation of the present application, the method further comprises: determining an index to be established according to metadata corresponding to data in a data warehouse; and constructing a column storage structure by taking the data packet as a unit, and establishing an index corresponding to the data warehouse.
In one implementation of the present application, the crawled data includes articles; the method further comprises the following steps: determining a geographical position range according to the geographical position of the user and a preset distance threshold; calculating a pushing coefficient corresponding to each article according to the click quantity and the collection quantity of each article in the geographical position range; and pushing the article to the user according to the pushing coefficient of the article.
In one implementation of the present application, the method further comprises: determining preset keywords corresponding to article types; the article types comprise leader speech, policy and regulation, research reports and practice innovation; performing word segmentation processing on an article to be published, comparing word segmentation results with preset keywords, and calculating similarity; under the condition that the similarity is not smaller than a first preset threshold value, dividing the articles to be published into corresponding article types; and comparing the similarity of the article to be published with other articles in the corresponding article type, and determining that the similarity is not less than a second preset threshold value.
In one implementation of the present application, the method further comprises: for an article contained in an article type, determining the collection type under which a user has filed the article; and comparing the collection type with the article type to which the article belongs, and correcting the article type to which the article belongs.
On the other hand, the embodiment of the present application further provides government affair big data analysis equipment facing government decisions, and the equipment includes: a processor; and a memory having executable code stored thereon, which when executed, causes the processor to perform a government decision oriented big data analysis method as described above.
The government decision-oriented government affair big data analysis method and device provided by the embodiment of the application at least have the following beneficial effects: the government affair data are crawled from the network and are analyzed and processed, so that the effective utilization of the existing government affair data is realized; and multidimensional data mining and analysis are carried out on the government affair data, and the analysis result is displayed in a user-friendly mode, so that multidimensional and omnibearing comprehensive analysis on the government affair data is realized, and valuable references can be provided for decision-making work of government departments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a government decision-oriented government affair big data analysis method provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a classification summary index provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset display form according to an embodiment of the present application;
FIG. 4 is a schematic view of a drill-down query method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another drill-down query method provided in the embodiments of the present application;
FIG. 6 is a schematic diagram of a custom query method provided in the embodiment of the present application;
FIG. 7 is a schematic diagram of a combined query method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a government affair big data analysis device for government decision making according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that the existing government affair data cannot be effectively planned and utilized and cannot meet the actual work requirement of the government, so that the waste of data resources is caused.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a government affair big data analysis method facing government decisions according to an embodiment of the present application. As shown in fig. 1, a government decision-oriented government affair big data analysis method provided in the embodiment of the present application mainly includes the following steps:
S101, determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source.
A search engine is used to find websites, web pages and platforms related to government affair data, which are taken as target data sources. The crawling rule is then determined according to actual requirements, and a crawler crawls the government affair data in the target data sources. The target data sources related to government affair data include, but are not limited to, government portals, government affair information websites and data publicity websites.
In one possible implementation, after determining the target data source, the server may configure the crawling rule through the visual interface, for example, when crawling an article, the crawling rule includes an article title, an article source, an article keyword, a release time, and an article classification. Thus, after the server determines the target data source and configures the crawling rule, a corresponding crawling task is established. And the crawler automatically crawls corresponding government affair data in the target data source according to the crawling task.
It should be noted that the crawler accesses the target data source through simulated login, that is, by imitating a human login rather than by brute-force cracking. The specific process is as follows: first, a login operation is performed with a username and password or a certificate provided by the user. If a verification code (CAPTCHA) is present, the server recognizes it automatically so that the user does not need to enter it manually. After logging in, the identity information of the user, such as the cookie and session, is saved. Once login is complete, the crawler crawls in local acquisition (single-machine acquisition) mode to obtain the government affair data. In addition, by simulating a request to the article link and saving the pictures in the article locally, the restrictions of image hosting services can be worked around.
In an embodiment of the application, after the crawler acquires government affair data in the target data source, the server randomly samples the target data source, matches the randomly sampled data with the crawled data, and calculates a matching rate. If the matching rate is not less than the preset value, the crawled data are relatively accurate, and the verified data are stored in a local database for subsequent analysis and processing; if the matching rate is smaller than the preset value, the accuracy of the crawled data is low, and the data crawl needs to be carried out again until the matching rate is not smaller than the preset value. The data crawled are matched and verified, the accuracy of data crawling can be effectively improved, and errors of analysis results caused by inaccuracy of a data acquisition stage are avoided.
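The matching verification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (dictionaries with a `title` key), the function names, the sample size and the 0.95 threshold are all assumptions, since the text only speaks of "a preset value".

```python
import random

def match_rate(source_records, crawled_records, sample_size, key="title", seed=None):
    """Randomly sample the source and return the share of samples found unchanged in the crawl."""
    rng = random.Random(seed)
    sample = rng.sample(source_records, min(sample_size, len(source_records)))
    if not sample:
        return 0.0
    # Index the crawled records by the (assumed) unique key for fast lookup.
    crawled_by_key = {rec[key]: rec for rec in crawled_records}
    matched = sum(1 for rec in sample if crawled_by_key.get(rec[key]) == rec)
    return matched / len(sample)

def verify_crawl(source_records, crawled_records, threshold=0.95, sample_size=100, seed=None):
    """Return (passed, rate); passed is True when the rate is not less than the threshold."""
    rate = match_rate(source_records, crawled_records, sample_size, seed=seed)
    return rate >= threshold, rate
```

When `verify_crawl` returns `False`, a caller would re-run the crawl and verify again, mirroring the retry loop in the text.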
In an embodiment of the application, after the target data source is determined, the server recommends similar target data sources based on the similarity of indicators associated with the target data source, such as administrative region level, population, area and urban development status, which effectively reduces search cost and improves data collection efficiency.
S102, cleaning the crawled data in batches, and storing the data in a data warehouse.
The server cleans the crawled data in batches and stores the cleaned data in a data warehouse for subsequent analysis and mining.
In one embodiment of the application, the server extracts the data crawled from the target data source from the local database and fills null values for the missing parts of the data according to a preset rule; repeated parts of the data are either merged into a single record or the repeated records are filtered out, which ensures the integrity of the data. In addition, the crawled data may not be uniform in format: for example, pictures with the same content may exist in several formats (e.g., jpg, jpeg, png) because of different acquisition modes or different target data sources. After the repeated values have been filtered, the server therefore converts the data in different formats into a preset format and, once the conversion is complete, filters the converted data again to remove records that have become duplicates. This further reduces redundant data and improves data processing efficiency.
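The cleaning sequence (null filling, duplicate filtering, format unification, second duplicate filtering) can be sketched as below. The field names `title`, `format` and `source`, the default values and the rule that maps `jpeg` to `jpg` are hypothetical; the sketch only normalizes a format label, whereas converting actual image files would need an image library.

```python
def fill_nulls(records, defaults):
    """Fill missing or empty fields with preset default values."""
    filled = []
    for rec in records:
        row = dict(rec)
        for field, default in defaults.items():
            if row.get(field) in (None, ""):
                row[field] = default
        filled.append(row)
    return filled

def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen, unique = set(), []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique

def normalize(rec):
    """Convert a record to the preset format (assumed: trimmed title, lower-case 'jpg')."""
    row = dict(rec)
    row["title"] = row["title"].strip()
    fmt = row["format"].strip().lower()
    row["format"] = {"jpeg": "jpg"}.get(fmt, fmt)
    return row

def clean_batch(records, defaults):
    """Fill nulls, filter duplicates, unify formats, then filter duplicates again."""
    first_pass = drop_duplicates(fill_nulls(records, defaults))
    converted = [normalize(rec) for rec in first_pass]
    return drop_duplicates(converted)
```

The second `drop_duplicates` call matters because two records that differ only in format spelling become identical after conversion.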
The embodiment of the application uses MongoDB to realize distributed storage of unstructured data. MongoDB is a document-oriented non-relational database; among non-relational databases it is one of the richest in features and the closest to a relational database. The data structure it supports is very loose, a JSON-like BSON format, so relatively complex data types can be stored. Its most notable characteristic is a powerful query language whose syntax resembles an object-oriented query language; it can realize most of the functions of single-table queries in a relational database, and it also supports building indexes on data.
MongoDB supports real-time data processing: data can be inserted, updated and queried in real time, and it offers the replication and high flexibility required for real-time data storage. In addition, MongoDB has high performance and can serve as a cache layer of the information infrastructure, so that after a system restart the persistent cache layer built on it prevents the underlying data sources from being overloaded. MongoDB is also used to store unstructured data and article data.
S103, constructing a multi-dimensional data mining model according to the level division indexes and the classification summary indexes related to the government affair data.
The server defines a hierarchy dividing index and a classification summarizing index for data in the data warehouse, and constructs a multi-dimensional data mining model according to the hierarchy dividing index and the classification summarizing index so as to further mine and analyze the data.
Specifically, the hierarchical division index and the categorical summary index are both related to government data. The hierarchical division index indicates a division level of government affair data of a certain dimension, for example, dividing an administrative region into three levels of province, city and district, or dividing time into year, month and day; the classification and summary indexes are classified according to different fields of government affair data.
Fig. 2 is a schematic diagram of a classification summary index provided in an embodiment of the present application. As shown in fig. 2, the indexes of classification and summary include seven categories of resource environment, population employment, industry support, urban and rural construction, technological innovation, public service, and resident life. The type division is carried out on each index, so that the classification of government affair data is facilitated, the user can inquire related data more conveniently, and the work and decision efficiency is improved.
S104, carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
The server performs multi-dimensional and multi-angle mining and analysis on the data in the data warehouse according to the multi-dimensional data mining model, and displays the analysis result in a preset display form.
In one embodiment of the present application, the hierarchy division index includes administrative region levels (e.g., province, city, district) and time dimensions (e.g., year, month, day), and the classification summary index specifies the category by which the data is classified, summarized and calculated. The server first determines the administrative region level and the time specified in the hierarchy division index and the category specified in the classification summary index, then performs a summary calculation on the government affair data of the summary index for the specified administrative regions within the specified time, and displays the result in a preset display form. The preset display form includes at least any one of the following: line graphs, bar graphs, pie charts, tables.
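A minimal sketch of such a summary calculation follows. It assumes each record carries `province`, `city`, `district`, an ISO `date` string, a `category` and a numeric `value`; these field names, and the use of a simple sum as the summary calculation, are illustrative assumptions rather than details given in the application.

```python
from collections import defaultdict

REGION_LEVELS = ("province", "city", "district")
TIME_PREFIX = {"year": 4, "month": 7, "day": 10}  # prefix lengths of a YYYY-MM-DD string

def summarize(records, region_level, time_level, category):
    """Sum `value` per (region path, period) for one category at the chosen levels."""
    if region_level not in REGION_LEVELS or time_level not in TIME_PREFIX:
        raise ValueError("unknown level")
    depth = REGION_LEVELS.index(region_level) + 1
    totals = defaultdict(float)
    for rec in records:
        if rec["category"] != category:
            continue
        # The region key keeps its parents so equally named districts stay distinct.
        region = tuple(rec[level] for level in REGION_LEVELS[:depth])
        period = rec["date"][:TIME_PREFIX[time_level]]
        totals[(region, period)] += rec["value"]
    return dict(totals)
```

The resulting dictionary can be handed to any charting or table component to produce the line graphs, bar graphs, pie charts or tables mentioned above.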
Fig. 3 is a schematic diagram of a preset display form according to an embodiment of the present application. As shown in fig. 3, the summary index data for the urbanization rate of the household-registered population in 2019 is shown as a line graph, in which the ordinate represents the urbanization rate of the household-registered population and the abscissa represents the cities of Shandong Province.
In the embodiment provided by the application, query results are displayed as tables and visual charts, so that a user can see the summarized index data of each administrative region more intuitively, and government departments can adjust work plans and make decisions based on the data for each classification summary index. The query results can also be exported in various formats and passed to the interfaces of common data analysis tools and intelligent analysis and mining tools for deeper mining, which raises the utilization rate of the data resources and the value and level of their analysis and application.
In one embodiment of the application, the server performs multidimensional mining and analysis on data in the data warehouse, and supports query and acquisition of analysis results in multiple ways.
Specifically, the server supports drill-down queries on the classification summary indexes and the administrative region levels: according to the administrative region levels, a drill-down query is performed on the government affair data of the summary index for the respective administrative regions within the specified time and on the government affair data of the specified category within the specified time; and/or specified keywords are added following the default prompts to perform a custom query, for example adding an administrative region code, an index code and a time range to query the government affair data of that index for that administrative region within the specified time; and/or the administrative region, time and summary index category are determined based on the hierarchy division index and the classification summary index to realize an advanced combined query across time, indexes and regions.
Fig. 4 is a schematic view of a drill-down query method according to an embodiment of the present application. Fig. 4 shows the result of a drill-down query on the population employment index for the cities of Shandong Province in 2019. Drilling down into the government affair data of the specified category within the specified time realizes field-by-field query of the government affair data.
Fig. 5 is a schematic view of another drill-down query method provided in the embodiment of the present application. As shown in fig. 5, drilling down by region yields the summarized population employment index data for 2019 for each administrative area within a city, which realizes the drill-down query of the government affair data of the summary index for the respective administrative regions within the specified time.
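The region drill-down can be illustrated with a short sketch: given a position in the province, city, district hierarchy, it returns the summarized values of the regions one level below. The record fields, the `path` tuple and the use of a sum are assumptions made for illustration.

```python
from collections import defaultdict

LEVELS = ("province", "city", "district")

def drill_down(records, path, category, year):
    """Sum `value` per child region one level below `path` for a category and year."""
    if len(path) >= len(LEVELS):
        raise ValueError("already at the lowest level")
    child_level = LEVELS[len(path)]
    totals = defaultdict(float)
    for rec in records:
        # A record belongs to the current node when every level of the path matches.
        in_path = all(rec[LEVELS[i]] == name for i, name in enumerate(path))
        if in_path and rec["category"] == category and rec["date"].startswith(str(year)):
            totals[rec[child_level]] += rec["value"]
    return dict(totals)
```

Calling `drill_down` first with `("Shandong",)` and then with `("Shandong", "Jinan")` reproduces the two steps shown in figs. 4 and 5: province to cities, then city to districts.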
Fig. 6 is a schematic diagram of a custom query method according to an embodiment of the present application. As shown in FIG. 6, the default prompts cover keywords such as dimension, year range, area code and index code. The user adds keywords following the default prompts and can then run a custom query, defining the query conditions according to their own needs to obtain the required data, which improves usability.
Fig. 7 is a schematic diagram of a combined query method according to an embodiment of the present application. As shown in fig. 7, the combined query conditions include time, index grouping, region mark and whether lower-level regions are included. After setting the query conditions, the user can run a multi-faceted combined query on the government affair data; this advanced query mode refines the query granularity and better fits actual working scenarios.
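A combined query over the four conditions named for fig. 7 might look like the sketch below. Representing a region as a tuple running from province downward, and the parameter names `years`, `categories` and `include_lower`, are assumptions introduced here.

```python
def combined_query(records, years, categories, region, include_lower=False):
    """Filter records by year range, index groups and region, optionally with lower regions."""
    start, end = years
    hits = []
    for rec in records:
        path = rec["region"]  # tuple such as ("Shandong", "Jinan", "Lixia")
        # Exact match, or a prefix match when lower-level regions are included.
        region_ok = path == region or (include_lower and path[:len(region)] == region)
        if region_ok and start <= rec["year"] <= end and rec["category"] in categories:
            hits.append(rec)
    return hits
```

With `include_lower=True`, a query on `("Shandong", "Jinan")` also returns records filed under the districts of Jinan; with the flag off, only records filed at the city level itself are returned.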
The multiple query modes provided by the embodiment of the application embody the multidimensional and comprehensive data analysis and mining, and meanwhile, the query of government affair data is completed in multiple modes, so that the demands of different users can be met, and the usability and the practicability are improved.
In one embodiment of the present application, the server implements ad hoc queries and combined queries on the data by building an index. First, the server determines the index to be established according to the metadata corresponding to the data in the data warehouse; then, taking each stored data packet as a unit, a plurality of data packets form a column, and the column storage structure of the data layer, that is, the index corresponding to the data warehouse, is established. Because the index is built on the metadata layer, the user can intuitively and conveniently grasp and monitor the overall picture of the data resources and their processing, and statistical values can be obtained during a query without unpacking the data packets, which further reduces I/O and improves query efficiency.
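One way to picture a column made of data packets with a metadata layer is sketched below. The application does not say which statistics are kept per packet; the count, sum, minimum and maximum used here, the `DataPacket` class and the packet size are assumptions chosen to show how a statistic can be answered without unpacking.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataPacket:
    values: List[float]  # stands in for the packed payload
    count: int           # metadata kept alongside the packet
    total: float
    minimum: float
    maximum: float

def build_column(values, packet_size=4):
    """Split a column into fixed-size packets and record metadata for each."""
    packets = []
    for i in range(0, len(values), packet_size):
        chunk = list(values[i:i + packet_size])
        packets.append(DataPacket(chunk, len(chunk), sum(chunk), min(chunk), max(chunk)))
    return packets

def column_sum(packets):
    """Answer SUM from packet metadata only, without reading any payload."""
    return sum(p.total for p in packets)

def count_greater_than(packets, threshold):
    """Count values above a threshold, unpacking only packets the metadata cannot decide."""
    count, unpacked = 0, 0
    for p in packets:
        if p.maximum <= threshold:
            continue              # no value in this packet can match
        if p.minimum > threshold:
            count += p.count      # every value in this packet matches
        else:
            unpacked += 1         # mixed packet: the payload must be read
            count += sum(1 for v in p.values if v > threshold)
    return count, unpacked
```

In this sketch the second return value of `count_greater_than` reports how many packets had to be read, which is the I/O the metadata layer is meant to reduce.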
The indexing technology provided by the embodiment of the application enables ad hoc queries on the data. It maintains a strict metadata organization alongside a flexible data structure, so that a user can run query analysis in any combination, and it provides the performance needed for ad hoc visualization, which makes exploratory analysis possible.
Meanwhile, a metadata management function and mechanism is established for the whole data processing service process. It provides a global view of the data resources and the data processing process, lets the user intuitively and conveniently grasp and monitor the overall picture of the data resources and their processing, and runs through the whole data processing life cycle, including data sources, ETL, storage, processing, analysis, presentation, use and archiving. Standardized metadata is used to describe the various kinds of original data, realizing integration and unified management of the data resources, so that all indexes can be queried and displayed from multiple angles along dimensions such as time sequence, region and industry.
In one embodiment of the application, articles related to government affairs are included in the crawled data, and the server can conduct personalized pushing of the articles for different users.
Specifically, the geographic position of the user is determined through a positioning system, and the geographic position range to which the user belongs is determined according to the geographic position and a preset distance threshold.
Further, according to the click quantity and the collection quantity of each article in the geographic position range, the pushing coefficient corresponding to each article is calculated through the following formula:
$$E = p_1 \cdot i_1 + p_2 \cdot i_2$$
where $p_1$ denotes the click coefficient, $i_1$ the click count of the article, $p_2$ the collection coefficient, and $i_2$ the collection count of the article.
It should be noted that the click coefficient and the collection coefficient may be set manually or obtained through calculation. For example, the collection counts of an article in different geographic position ranges at a given time are compared; if the maximum collection count of the article is determined to be 1010, the collection coefficient is set to 1/1010. The above parameters are chosen by way of example only and do not limit the scope of the present application.
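As a worked illustration of the formula, the sketch below computes a pushing coefficient with coefficients derived from maximum counts. The function names and the sample counts are hypothetical; deriving the click coefficient in the same way as the collection coefficient is an assumption, since the application gives a calculation example only for the collection coefficient.

```python
def push_coefficient(clicks, collections, click_coeff, collection_coeff):
    """E = p1 * i1 + p2 * i2."""
    return click_coeff * clicks + collection_coeff * collections


def coefficient_from_max(counts):
    """Reciprocal of the largest observed count (0.0 if nothing was observed)."""
    largest = max(counts, default=0)
    return 1.0 / largest if largest > 0 else 0.0


# Hypothetical counts of one article across three geographic position ranges.
collections_by_range = [250, 1010, 640]
clicks_by_range = [4000, 5000, 1200]

p2 = coefficient_from_max(collections_by_range)   # 1/1010, as in the text
p1 = coefficient_from_max(clicks_by_range)        # assumed analogous: 1/5000

E = push_coefficient(clicks=2500, collections=505,
                     click_coeff=p1, collection_coeff=p2)
print(round(E, 6))  # 1.0
```

With coefficients chosen this way each term lies between 0 and 1, so neither the click count nor the collection count dominates merely because of its scale.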
Furthermore, the pushing sequence of each article is determined according to the pushing coefficient, and the articles are sequentially pushed to the user.
Because the articles to be pushed are selected according to the click count and the collection count of each article within the user's geographic position range, the behavior of nearby users serves as a reference when pushing. This improves the pushing accuracy and achieves personalized recommendation of articles.
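The steps above (geographic position range, counts within that range, pushing order) can be sketched as follows. The application does not specify how distance is measured or how events are recorded; the haversine distance, the event tuple layout, the coordinates, and the coefficient values are assumptions for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_articles(user_pos, events, radius_km, p1, p2):
    """Order article ids by pushing coefficient, highest first.

    events: iterable of (lat, lon, article_id, kind), kind in {"click", "collect"}.
    Only events inside the user's geographic position range are counted.
    """
    clicks, collects = Counter(), Counter()
    for lat, lon, article, kind in events:
        if haversine_km(user_pos[0], user_pos[1], lat, lon) > radius_km:
            continue
        if kind == "click":
            clicks[article] += 1
        elif kind == "collect":
            collects[article] += 1
    scores = {a: p1 * clicks[a] + p2 * collects[a]
              for a in set(clicks) | set(collects)}
    # Ties are broken by article id so the order is deterministic.
    return sorted(scores, key=lambda a: (-scores[a], a))


EVENTS = [
    (36.66, 117.13, "A", "click"),
    (36.66, 117.13, "A", "click"),
    (36.64, 117.10, "B", "click"),
    (36.64, 117.10, "B", "collect"),
    (36.07, 120.38, "C", "collect"),   # far outside a 10 km range
]

print(rank_articles((36.65, 117.12), EVENTS, radius_km=10, p1=1.0, p2=3.0))
# ['B', 'A']
```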
Furthermore, the workplace of the user can be used as the reference location. First, the workplace of the user is determined from the user's activity track and the time spent at each place. Second, the click count and the collection count of each article at the workplace are determined, and the pushing coefficient is calculated. Finally, articles are pushed to the user according to the pushing coefficient. Because the user's workplace is more closely associated with government affair data than other locations are, recommending articles based on the users around the workplace gives the recommendations greater reference value and further improves the pushing accuracy.
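One possible way to determine the workplace from the activity track is sketched below. The application only says that the activity track and the stay time at each place are used; restricting the stay time to weekday working hours, the specific hours, and the `(place, arrival, departure)` layout are assumptions of this sketch.

```python
from collections import defaultdict
from datetime import datetime

WORK_START_HOUR = 9    # assumed start of working hours
WORK_END_HOUR = 18     # assumed end of working hours


def infer_workplace(stays):
    """Return the place with the longest total stay during weekday working hours.

    stays: iterable of (place, arrival, departure) with datetime values.
    Assumption: a single stay does not span midnight.
    """
    totals = defaultdict(float)
    for place, arrive, leave in stays:
        if arrive.weekday() >= 5:          # Saturday or Sunday
            continue
        window_start = arrive.replace(hour=WORK_START_HOUR, minute=0,
                                      second=0, microsecond=0)
        window_end = arrive.replace(hour=WORK_END_HOUR, minute=0,
                                    second=0, microsecond=0)
        overlap = (min(leave, window_end) - max(arrive, window_start)).total_seconds()
        if overlap > 0:
            totals[place] += overlap / 3600.0
    return max(totals, key=totals.get) if totals else None


STAYS = [
    ("home", datetime(2021, 2, 22, 0, 0), datetime(2021, 2, 22, 8, 0)),
    ("office", datetime(2021, 2, 22, 9, 0), datetime(2021, 2, 22, 17, 30)),
    ("cafe", datetime(2021, 2, 23, 12, 0), datetime(2021, 2, 23, 13, 0)),
    ("office", datetime(2021, 2, 23, 13, 0), datetime(2021, 2, 23, 18, 0)),
    ("park", datetime(2021, 2, 27, 8, 0), datetime(2021, 2, 27, 20, 0)),
]

print(infer_workplace(STAYS))  # office
```

The pushing coefficient is then computed exactly as before, but over the click and collection counts recorded around the inferred workplace.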
In one embodiment of the application, before publishing the crawled articles, the server classifies the articles into specific types by comparing the similarity of the articles and keywords corresponding to the article types. The process of classifying the articles is as follows:
First, the preset keywords corresponding to each article type are determined. The article types mainly comprise leader speech, policy and regulation, research reports, and practice innovation, and each type may have one or more preset keywords. For example, the keywords of the policy and regulation type may be set to "issue", "policy", "enforcement", and the like.
Second, the article to be published is segmented into words, the segmentation result is compared with the preset keywords of each article type, and the similarity is calculated. If the similarity is not less than a first preset threshold, the article to be published is considered similar to that article type and is placed under it.
Then, after the article to be published has been placed under an article type, its similarity to the other articles under that type is compared in order to verify the classification result. If the similarity is not less than a second preset threshold, the article to be published is similar to the other articles of that type, the classification result is considered accurate, and that article type is taken as the type of the article to be published.
It should be noted that, although individual articles differ from one another, a correctly classified article to be published should still have a high similarity to most of the articles under its assigned type. If, in this second comparison, the number of articles whose similarity to the article to be published is below the second preset threshold exceeds a preset value, the classification result is considered erroneous. In that case, the article to be published is compared with the preset keywords of the other article types and is classified again.
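The two-stage classification described above can be sketched as follows. The application does not name the similarity measures or the word segmentation tool; the keyword-coverage ratio, the Jaccard similarity, the whitespace-based `segment` stand-in, the threshold values, and the keywords for the leader speech type are all assumptions made for this example.

```python
TYPE_KEYWORDS = {
    "policy and regulation": {"issue", "policy", "enforcement"},
    "leader speech": {"speech", "address", "meeting"},   # hypothetical keywords
}


def segment(text):
    """Stand-in for word segmentation: lower-case the text and split on whitespace."""
    return set(text.lower().split())


def keyword_similarity(tokens, keywords):
    """Fraction of the preset keywords that occur in the article (assumed measure)."""
    return len(tokens & keywords) / len(keywords) if keywords else 0.0


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text, published, t1=0.5, t2=0.2, max_dissimilar=1):
    """Return the article type for text, or None if no type passes both checks.

    published maps each article type to the texts already published under it.
    """
    tokens = segment(text)
    for article_type, keywords in TYPE_KEYWORDS.items():
        # Stage 1: compare with the preset keywords (first threshold).
        if keyword_similarity(tokens, keywords) < t1:
            continue
        # Stage 2: compare with the articles already under this type
        # (second threshold) and count the dissimilar ones.
        others = published.get(article_type, [])
        dissimilar = sum(1 for other in others
                         if jaccard(tokens, segment(other)) < t2)
        if dissimilar <= max_dissimilar:
            return article_type
        # Too many dissimilar articles: treat as misclassified, try the next type.
    return None


PUBLISHED = {
    "policy and regulation": [
        "the ministry will issue a new housing policy",
        "enforcement of the housing policy begins",
    ],
}

print(classify("the city will issue a housing policy and begin enforcement",
               PUBLISHED))
# policy and regulation
```

For Chinese text, `segment` would be replaced by a real word segmentation step; the rest of the flow stays the same.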
In one embodiment of the present application, for each published article under each article type, a user may collect the article and assign it to a specific collection type. Besides the original article types, collection types may also be defined by the user. The server then compares the collection types assigned to the article with the article type to which it currently belongs. If the ratio of the number of times the article was collected under a certain collection type to the total number of times it was collected exceeds a preset value, and that collection type differs from the article's current type, the current classification of the article is likely to be wrong, and the article type is changed to the collection type chosen by the users.
For example, suppose an article under the leader speech type has been collected 50 times in total, background data show that users collected it under the policy and regulation type 30 times, and the preset collection ratio is 50%. The ratio for policy and regulation is then 30/50 = 60%, which exceeds the preset ratio, so the article type is changed from leader speech to policy and regulation. The above parameters are chosen by way of example only and do not limit the present application.
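The correction rule and the numerical example above can be checked with a short sketch. The function name `corrected_type` and the representation of collections as a list of collection-type labels are assumptions for illustration.

```python
from collections import Counter


def corrected_type(current_type, collection_types, ratio_threshold=0.5):
    """Return the article type after applying the collection-based correction.

    collection_types holds one label per collection of the article, namely the
    collection type under which a user filed it.
    """
    total = len(collection_types)
    if total == 0:
        return current_type
    # Check collection types from most to least frequent.
    for ctype, count in Counter(collection_types).most_common():
        if ctype != current_type and count / total > ratio_threshold:
            return ctype
    return current_type


# Example from the text: 50 collections in total, 30 under policy and regulation.
collections = ["policy and regulation"] * 30 + ["leader speech"] * 20
print(corrected_type("leader speech", collections))  # policy and regulation
```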
The government decision-oriented government affair big data analysis method of the present application crawls data from target data sources related to government affair data and then analyzes and processes that data, which effectively improves the utilization of government affair data and reduces wasted resources. A multidimensional data mining model is constructed based on multidimensional analysis and data mining technology, and the data in the data warehouse is mined and analyzed in greater depth, which addresses the poor real-time performance, low efficiency, and weak interactivity of traditional government affair data analysis methods. The results of the multidimensional analysis are displayed through a visual interface, and several data query methods are provided for the hierarchy division indexes and the classification summary indexes, which improves usability. By classifying articles by type and pushing them in real time, the method provides reference information for the work of each department and improves the user experience.
The above is the method embodiment proposed by the present application. Based on the same inventive concept, the embodiment of the application also provides government affair big data analysis equipment facing government decisions, and the internal structure of the equipment is shown in fig. 8.
Fig. 8 is a schematic structural diagram of a government affair big data analysis device for government decision making according to an embodiment of the present application. As shown in fig. 8, the apparatus comprises a processor 801 and a memory 802 having executable code stored thereon, which when executed, causes the processor 801 to perform a government decision oriented government big data analysis method as above.
In one embodiment of the present application, the processor 801 is configured to determine a target data source related to government affairs data, configure a crawling rule, and perform data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (3)

1. A government decision-oriented government affairs big data analysis method, characterized in that the method comprises:
determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source;
randomly sampling the target data source, and performing matching verification on the target data source and the crawled data;
storing the data into a local database under the condition that the crawled data pass verification;
extracting data crawled from the target data source from the local database, cleaning the crawled data in batches, and storing the data in a data warehouse;
determining an index to be established according to metadata corresponding to the data in the data warehouse;
constructing a column storage structure by taking a data packet as a unit, and establishing an index corresponding to the data warehouse;
constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data;
based on the multidimensional data mining model, carrying out multidimensional mining and analysis on the data in the data warehouse, and displaying the analysis result;
displaying the analysis result, which specifically comprises the following steps:
determining the level and time of an administrative region specified in the hierarchical division index and the category specified in the classification summary index;
according to a preset display form, displaying the government affair data of the summary indexes corresponding to the corresponding administrative regions in the time respectively; wherein, the preset display form at least comprises any one of the following items: line graphs, bar graphs, sector graphs, tables;
performing multidimensional mining and analysis on the data in the data warehouse, which specifically comprises the following steps:
according to the administrative region level, performing drill-down inquiry on the government affair data of the summary index corresponding to the administrative region in the time and the government affair data of the corresponding specified category in the time; and/or
Adding a specified keyword, and performing user-defined query through the keyword; and/or
Determining administrative regions and time specified in the hierarchical division indexes and categories specified in the classified summary indexes, and performing combined query;
the crawled data comprises articles;
determining a geographical position range according to the geographical position of the user and a preset distance threshold;
calculating a pushing coefficient corresponding to each article according to the click quantity and the collection quantity of each article in the geographic position range;
pushing the article to the user according to the article pushing coefficient;
judging the working place of the user according to the activity track of the user and the staying time at each place;
determining the click rate and the collection rate of each article in the workplace, and calculating a pushing coefficient corresponding to each article according to the click rate and the collection rate;
pushing an article to the user according to the pushing coefficient;
determining preset keywords corresponding to article types; wherein the article types comprise leader speech, policy and regulation, research reports, practice innovation;
performing word segmentation processing on the article to be published, comparing word segmentation results with the preset keywords, and calculating similarity;
under the condition that the similarity is not smaller than a first preset threshold value, dividing the article to be published under the corresponding article type;
comparing the similarity of the article to be published with other articles in the corresponding article type, and determining that the similarity is not less than a second preset threshold;
determining a collection type corresponding to an article divided by a user aiming at the article contained in the article type;
comparing the collection type with the article type to which the article belongs, and correcting the article type to which the article belongs;
recommending similar target data sources according to similar indexes in the target data sources; wherein the indexes comprise administrative region levels, population numbers, areas and urban development conditions.
2. The government decision-oriented government affair big data analysis method according to claim 1, wherein the batch cleaning of the crawled data specifically comprises:
extracting the crawled data from the local database;
filling null values and filtering repeated values of the crawled data;
and uniformly converting the filtered data into a preset format, and filtering the converted data again.
3. A government decision oriented government affairs big data analysis device, characterized in that the device comprises:
a processor;
and a memory having executable code stored thereon, that when executed, causes the processor to perform a government decision oriented government big data analysis method of any of claims 1-2.
CN202110204049.9A 2021-02-24 2021-02-24 Government decision-oriented government affair big data analysis method and equipment Active CN112800083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204049.9A CN112800083B (en) 2021-02-24 2021-02-24 Government decision-oriented government affair big data analysis method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110204049.9A CN112800083B (en) 2021-02-24 2021-02-24 Government decision-oriented government affair big data analysis method and equipment

Publications (2)

Publication Number Publication Date
CN112800083A CN112800083A (en) 2021-05-14
CN112800083B true CN112800083B (en) 2022-03-18

Family

ID=75815439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204049.9A Active CN112800083B (en) 2021-02-24 2021-02-24 Government decision-oriented government affair big data analysis method and equipment

Country Status (1)

Country Link
CN (1) CN112800083B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114240220A (en) * 2021-12-22 2022-03-25 中国建设银行股份有限公司 Government affair data processing method, device, equipment, medium and program product
CN114596182B (en) * 2022-03-09 2023-05-16 王淑娟 Government affair management method and system based on big data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168992A (en) * 2017-03-29 2017-09-15 北京百度网讯科技有限公司 Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence
CN107656938A (en) * 2016-07-26 2018-02-02 北京搜狗科技发展有限公司 It is a kind of to recommend method and apparatus, a kind of device for being used to recommend
CN110781236A (en) * 2019-10-29 2020-02-11 山西云时代技术有限公司 Method for constructing government affair big data management system
CN111222028A (en) * 2020-01-10 2020-06-02 四川日报社 Intelligent data crawling method
CN111783468A (en) * 2020-06-28 2020-10-16 百度在线网络技术(北京)有限公司 Text processing method, device, equipment and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984435B (en) * 2010-11-17 2012-10-10 百度在线网络技术(北京)有限公司 Method and device for distributing texts
CN103309886B (en) * 2012-03-13 2017-05-10 阿里巴巴集团控股有限公司 Trading-platform-based structural information searching method and device
US20180341686A1 (en) * 2017-05-26 2018-11-29 Nanfang Hu System and method for data search based on top-to-bottom similarity analysis
WO2019113977A1 (en) * 2017-12-15 2019-06-20 腾讯科技(深圳)有限公司 Method, device, and server for processing written articles, and storage medium
CN109408642B (en) * 2018-08-30 2021-07-16 昆明理工大学 Domain entity attribute relation extraction method based on distance supervision

Also Published As

Publication number Publication date
CN112800083A (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 250002 No. 128 Wolong Road, Jinan City, Shandong Province

Applicant after: Shandong Institute of housing and urban rural development

Address before: No.17, sanlizhuang, Jingliu Road, Shizhong District, Jinan City, Shandong Province

Applicant before: SHANDONG CONSTRUCTION DEVELOPMENT Research Institute

GR01 Patent grant