CN112800083B - Government decision-oriented government affair big data analysis method and equipment - Google Patents
- Publication number
- CN112800083B (application CN202110204049.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- article
- government
- government affair
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Educational Administration (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application discloses a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that government affair data cannot be effectively integrated, analyzed and applied. The method comprises the following steps: determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result. By the method, available government affair data can be acquired and integrated, analyzed and mined, so that the utilization rate of the government affair data is improved, deep analysis and processing of the government affair data are realized, and valuable reference is provided for decision making work of government departments.
Description
Technical Field
The application relates to the field of data processing, in particular to a government decision-oriented government affair big data analysis method and device.
Background
With the growth of computer storage capacity and the development of complex algorithms, data volumes have increased exponentially in recent years. The integration and analysis of big data have been applied to some extent in fields such as public transportation, public safety and social management, and have promoted interdisciplinary research across the social and natural sciences. More than 80% of China's information data resources are held by government departments at all levels, yet this government data has not been further planned and utilized, which wastes resources.
Therefore, how to combine the traditional statistical technology with the computer technology to realize the integration, analysis and mining of government affair data and apply the government affair data to the decision work of relevant government departments becomes a problem to be solved urgently.
Disclosure of Invention
The embodiment of the application provides a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that data resources are wasted because government affair data cannot be effectively analyzed, mined and applied.
In one aspect, an embodiment of the present application provides a government decision-oriented government affair big data analysis method, including: determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
In one implementation of the present application, before performing batch cleaning on the crawled data, the method further includes: randomly sampling a target data source, and performing matching verification on the target data source and the crawled data; and storing the data into a local database under the condition that the crawled data passes verification.
In one implementation of the present application, performing batch cleaning on the crawled data specifically includes: extracting the crawled data from the local database; filling null values in the crawled data and filtering out repeated values; and uniformly converting the filtered data into a preset format, and filtering the converted data again.
In one implementation of the present application, displaying the analysis result specifically includes: determining the administrative region level and the time specified in the hierarchy division indexes, and the category specified in the classification summary indexes; and displaying, in a preset display form, the government affair data of the summary indexes corresponding to the respective administrative regions within that time; wherein the preset display form includes at least any one of the following: line graphs, bar graphs, pie charts, tables.
In one implementation of the present application, performing multi-dimensional mining and analysis on the data in the data warehouse specifically includes: according to the administrative region level, performing a drill-down query on the government affair data of the summary indexes corresponding to the respective administrative regions within the time, and on the government affair data of the corresponding specified categories within the time; and/or adding specified keywords and performing a user-defined query through the keywords; and/or determining the administrative region and time specified in the hierarchy division indexes and the category specified in the classification summary indexes, and performing a combined query.
In one implementation of the present application, the method further comprises: determining an index to be established according to metadata corresponding to data in a data warehouse; and constructing a column storage structure by taking the data packet as a unit, and establishing an index corresponding to the data warehouse.
In one implementation of the present application, the crawled data includes articles; the method further comprises the following steps: determining a geographical position range according to the geographical position of the user and a preset distance threshold; calculating a pushing coefficient corresponding to each article according to the click quantity and the collection quantity of each article in the geographical position range; and pushing the article to the user according to the pushing coefficient of the article.
In one implementation of the present application, the method further comprises: determining preset keywords corresponding to article types; the article types comprise leader speech, policy and regulation, research reports and practice innovation; performing word segmentation processing on an article to be published, comparing word segmentation results with preset keywords, and calculating similarity; under the condition that the similarity is not smaller than a first preset threshold value, dividing the articles to be published into corresponding article types; and comparing the similarity of the article to be published with other articles in the corresponding article type, and determining that the similarity is not less than a second preset threshold value.
In one implementation of the present application, the method further comprises: for an article contained in an article type, determining the collection type into which a user has placed the article; and comparing the collection type with the article type to which the article belongs, and correcting the article type to which the article belongs.
On the other hand, the embodiment of the present application further provides government affair big data analysis equipment facing government decisions, and the equipment includes: a processor; and a memory having executable code stored thereon, which when executed, causes the processor to perform a government decision oriented big data analysis method as described above.
The government decision-oriented government affair big data analysis method and device provided by the embodiment of the application at least have the following beneficial effects: the government affair data are crawled from the network and are analyzed and processed, so that the effective utilization of the existing government affair data is realized; and multidimensional data mining and analysis are carried out on the government affair data, and the analysis result is displayed in a user-friendly mode, so that multidimensional and omnibearing comprehensive analysis on the government affair data is realized, and valuable references can be provided for decision-making work of government departments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a government decision-oriented government affair big data analysis method provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a classification summary index provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset display form according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a drill-down query method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another drill-down query method provided in the embodiments of the present application;
FIG. 6 is a schematic diagram of a custom query method provided in the embodiment of the present application;
FIG. 7 is a schematic diagram of a combined query method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a government affair big data analysis device for government decision making according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a government decision-oriented government affair big data analysis method and equipment, which are used for solving the technical problem that the existing government affair data cannot be effectively planned and utilized and cannot meet the actual work requirement of the government, so that the waste of data resources is caused.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a government affair big data analysis method facing government decisions according to an embodiment of the present application. As shown in fig. 1, a government decision-oriented government affair big data analysis method provided in the embodiment of the present application mainly includes the following steps:
S101, determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source.
Websites, web pages and platforms related to government affair data are found through a search engine and taken as target data sources; crawling rules are then determined according to actual requirements, and the government affair data in the target data sources is crawled by a crawler. The target data sources related to the government affair data include, but are not limited to, various government portals, government affair information websites and data publicity websites.
In one possible implementation, after determining the target data source, the server may configure the crawling rule through a visual interface. For example, when crawling articles, the crawling rule covers the article title, article source, article keywords, release time and article classification. Once the server has determined the target data source and configured the crawling rule, a corresponding crawling task is established, and the crawler automatically crawls the corresponding government affair data in the target data source according to the crawling task.
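The relationship between a crawling rule and a crawling task can be sketched as follows. This is a minimal illustration only: the class names `CrawlRule` and `CrawlTask`, the field names and the placeholder URL are all hypothetical and do not come from the application, and no actual network access is performed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrawlRule:
    """Hypothetical crawling rule: which article fields to keep for one source."""
    source_url: str
    fields: tuple = ("title", "source", "keywords", "release_time", "category")


@dataclass
class CrawlTask:
    """Hypothetical crawling task created once a rule has been configured."""
    rule: CrawlRule
    records: list = field(default_factory=list)

    def accept(self, raw: dict) -> dict:
        # Keep only the configured fields; fields missing from the page become None.
        record = {name: raw.get(name) for name in self.rule.fields}
        self.records.append(record)
        return record


# Placeholder URL, used only as a label in this sketch.
rule = CrawlRule(source_url="https://example.org/gov-articles")
task = CrawlTask(rule)
sample = task.accept({"title": "Notice", "source": "Portal", "views": 10})
```

Fields that the rule does not list (here `views`) are dropped, and listed fields that the page does not provide are left as `None` for the later null-filling step.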
It should be noted that the crawler accesses the target data source by simulated login, that is, by simulating a human login rather than by brute-force cracking. The specific process is as follows: first, a login operation is performed with a username and password, or a certificate, provided by the user. If a verification code is present, the server recognizes it automatically for login, without manual input by the user. After login, the user's identity information, such as the cookie and session, is saved. Once login is complete, the crawler crawls in local acquisition (single-machine acquisition) mode to obtain the government affair data. In this way, by simulating a request to the article link and saving the pictures in the article locally, the restrictions of image hosting services can be overcome.
In an embodiment of the application, after the crawler acquires government affair data from the target data source, the server randomly samples the target data source, matches the randomly sampled data against the crawled data, and calculates a matching rate. If the matching rate is not less than a preset value, the crawled data is considered relatively accurate, and the verified data is stored in a local database for subsequent analysis and processing; if the matching rate is less than the preset value, the accuracy of the crawled data is low, and the crawl needs to be repeated until the matching rate is not less than the preset value. Matching and verifying the crawled data effectively improves the accuracy of data crawling and avoids errors in the analysis results caused by inaccuracy at the data acquisition stage.
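The matching verification described above can be sketched as follows. The function names, the representation of records as hashable values, and the `max_attempts` limit are assumptions made for illustration; the application only states that crawling is repeated until the matching rate reaches the preset value.

```python
import random


def matching_rate(source_records, crawled_records, sample_size, seed=None):
    """Share of randomly sampled source records that also appear in the crawled data."""
    rng = random.Random(seed)
    sample = rng.sample(source_records, min(sample_size, len(source_records)))
    if not sample:
        return 0.0
    crawled = set(crawled_records)
    hits = sum(1 for record in sample if record in crawled)
    return hits / len(sample)


def crawl_until_verified(crawl, source_records, threshold, sample_size, max_attempts=3):
    """Re-run `crawl` until the matching rate is not less than `threshold`.

    `max_attempts` is an assumed safety limit that is not part of the source text.
    """
    for _ in range(max_attempts):
        crawled = crawl()
        if matching_rate(source_records, crawled, sample_size) >= threshold:
            return crawled  # verified data, ready to be stored in the local database
    raise RuntimeError("matching rate stayed below the preset value")
```

A caller would pass its own `crawl` function; data is stored in the local database only when the function returns normally.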
In an embodiment of the application, after determining the target data source, the server recommends the similar target data source according to the similar indexes of the administrative region level, population quantity, area, urban development condition and the like in the target data source, so that the search cost is effectively reduced, and the data collection efficiency is improved.
S102, cleaning the crawled data in batches, and storing the data in a data warehouse.
The server performs batch cleaning on the crawled data and stores the cleaned data in a data warehouse for subsequent analysis and mining.
In one embodiment of the application, the server extracts the data crawled from the target data source from the local database, and then fills null values in the missing parts of the data according to a preset rule; repeated parts of the data are merged into one record, or the repeated records are filtered out, so as to ensure the integrity of the data. In addition, the crawled data may have inconsistent formats; for example, pictures with the same content may exist in multiple formats (e.g., jpg, jpeg, png) because of different acquisition modes or different target data sources. After repeated-value filtering, the server unifies data in different formats into a preset format and, once the format conversion is complete, filters the converted data again to remove any remaining repeated data. This further reduces unnecessary redundant data and thereby improves data processing efficiency.
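The four cleaning steps (null filling, repeated-value filtering, format unification, second filtering) can be sketched as follows. The names `clean_batch` and `to_jpg`, the use of dictionaries as records, and the use of file-extension normalization as a stand-in for real picture format conversion are all assumptions made for this illustration.

```python
def to_jpg(name):
    """Stand-in for format conversion: normalize jpg/jpeg file names to '.jpg'."""
    stem, _, ext = name.lower().rpartition(".")
    return f"{stem}.jpg" if ext in {"jpg", "jpeg"} else name.lower()


def clean_batch(records, defaults, key_fields, normalizers):
    """Null filling, duplicate filtering, format unification, then a second filtering."""

    def dedupe(rows):
        seen, unique = set(), []
        for row in rows:
            key = tuple(row.get(f) for f in key_fields)
            if key not in seen:
                seen.add(key)
                unique.append(row)
        return unique

    # 1. Fill missing (None) values according to the preset defaults.
    filled = [{**defaults, **{k: v for k, v in r.items() if v is not None}} for r in records]
    # 2. Filter out repeated records.
    unique = dedupe(filled)
    # 3. Convert every field that has a normalizer into the preset format.
    converted = [{k: normalizers.get(k, lambda v: v)(v) for k, v in row.items()} for row in unique]
    # 4. Filter again: records may only become identical after conversion.
    return dedupe(converted)


raw = [
    {"title": "Budget", "image": "chart.JPEG", "source": None},
    {"title": "Budget", "image": "chart.jpg", "source": "Portal"},
    {"title": "Budget", "image": "chart.jpg", "source": "Portal"},
]
cleaned = clean_batch(raw, {"source": "unknown"}, ("title", "image"), {"image": to_jpg})
```

In this example the first and second records differ only in picture format, so they are recognized as repeats only in the second filtering pass, which is why the filtering is applied again after conversion.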
The embodiment of the application adopts MongoDB data storage technology to realize distributed storage of unstructured data. MongoDB is a product that sits between relational and non-relational databases; among non-relational databases, it has the richest feature set and most closely resembles a relational database. The data structure it supports is very loose, a JSON-like BSON format, so it can store relatively complex data types. MongoDB's most notable characteristic is its powerful query language, whose syntax resembles an object-oriented query language; it can implement most of the functions of single-table queries in a relational database, and it also supports building indexes on data.
MongoDB supports real-time data processing: data can be inserted, updated and queried in real time, and it provides the replication and high flexibility required for real-time data storage. In addition, MongoDB has high performance and can serve as a cache layer for the information infrastructure, so that after a system restart the persistent cache layer built on it prevents the underlying data sources from being overloaded. MongoDB is also used to store unstructured data and article data.
S103, constructing a multi-dimensional data mining model according to the level division indexes and the classification summary indexes related to the government affair data.
The server defines a hierarchy dividing index and a classification summarizing index for data in the data warehouse, and constructs a multi-dimensional data mining model according to the hierarchy dividing index and the classification summarizing index so as to further mine and analyze the data.
Specifically, the hierarchical division index and the categorical summary index are both related to government data. The hierarchical division index indicates a division level of government affair data of a certain dimension, for example, dividing an administrative region into three levels of province, city and district, or dividing time into year, month and day; the classification and summary indexes are classified according to different fields of government affair data.
Fig. 2 is a schematic diagram of a classification summary index provided in an embodiment of the present application. As shown in fig. 2, the classification summary indexes comprise seven categories: resource environment, population employment, industry support, urban and rural construction, technological innovation, public service, and resident life. Dividing the indexes by type facilitates the classification of government affair data, makes it more convenient for users to query related data, and improves work and decision efficiency.
S104, carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
The server performs multi-dimensional and multi-angle mining and analysis on the data in the data warehouse according to the multi-dimensional data mining model, and displays the analysis result in a preset display mode.
In one embodiment of the present application, the hierarchy division indexes include the administrative region level (e.g., province, city, district) and the time dimension (e.g., year, month, day), and the classification summary indexes specify a particular category by which the data is classified, summarized and calculated. The server first determines the administrative region level and time specified in the hierarchy division indexes and the category specified in the classification summary indexes, then performs summary calculation on the government affair data of the summary indexes corresponding to the specified administrative region within the specified time, and displays the summary calculation result in a preset display form. The preset display form includes at least any one of the following: line graphs, bar graphs, pie charts, tables.
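The summary calculation over a chosen region level, time and category can be sketched as follows. The `Fact` record layout, the function name `summarize`, the placeholder region names and values, and the choice of summation as the aggregate are assumptions for illustration; the application does not specify the aggregation function, and rate-type indexes would need a different one.

```python
from collections import defaultdict
from dataclasses import dataclass

REGION_LEVELS = ("province", "city", "district")  # hierarchy division of regions


@dataclass(frozen=True)
class Fact:
    """One hypothetical warehouse record; all field names are illustrative."""
    province: str
    city: str
    district: str
    year: int
    month: int
    category: str   # one of the classification summary categories
    indicator: str
    value: float


def summarize(facts, region_level, year, category):
    """Sum indicator values per region at `region_level` for one year and category."""
    depth = REGION_LEVELS.index(region_level) + 1
    totals = defaultdict(float)
    for f in facts:
        if f.year == year and f.category == category:
            region = tuple(getattr(f, level) for level in REGION_LEVELS[:depth])
            totals[(region, f.indicator)] += f.value
    return dict(totals)


# Placeholder regions and made-up values, for illustration only.
facts = [
    Fact("P", "A", "A1", 2019, 1, "population employment", "new jobs", 100.0),
    Fact("P", "A", "A2", 2019, 2, "population employment", "new jobs", 50.0),
    Fact("P", "B", "B1", 2019, 1, "population employment", "new jobs", 70.0),
    Fact("P", "B", "B1", 2018, 1, "population employment", "new jobs", 999.0),
]
by_city = summarize(facts, "city", 2019, "population employment")
```

The resulting dictionary maps each (region, indicator) pair to its summarized value and could then be handed to whichever display form is configured.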
Fig. 3 is a schematic diagram of a preset display form according to an embodiment of the present application. As shown in fig. 3, the summary index data for the household citizenship urbanization rate in 2019 is shown as a line graph, where the ordinate represents the household citizenship urbanization rate and the abscissa represents the cities of Shandong Province.
In the embodiment provided by the application, query results are displayed in forms such as tables and visual charts, so that a user can grasp the summarized index data for each administrative region more intuitively, which helps government departments adjust work plans and make decisions according to the data for each classification summary index. Query results can also be exported in various formats and fed, through their interfaces, into common data analysis tools and intelligent analysis and mining tools for deeper mining, which improves the utilization rate of data resources and raises their value and the level of analysis and application.
In one embodiment of the application, the server performs multidimensional mining and analysis on data in the data warehouse, and supports query and acquisition of analysis results in multiple ways.
Specifically, the server supports drill-down query on the classified summary indexes and the administrative region levels, namely, according to the classified administrative region levels, the drill-down query is performed on the government affair data of the summary indexes corresponding to the administrative regions within the specified time and the government affair data of the corresponding specified categories within the specified time; and/or adding a specified keyword according to a default prompt, and performing custom query, for example, adding an administrative region code, an index code and a time range, and querying government affair data of an index corresponding to the administrative region within specified time; and/or determining the specified administrative region, time and summary index category based on the hierarchy division index and the classification summary index, and realizing the high-level combined query of cross-time, multi-index and multi-region.
Fig. 4 is a schematic diagram of a drill-down query method according to an embodiment of the present application. Fig. 4 shows the result of a drill-down query on the population employment indexes for 2019, with Shandong Province broken down by city. Drilling down on the government affair data of a specified category within a specified time in this way realizes field-by-field queries of the government affair data.
Fig. 5 is a schematic diagram of another drill-down query method provided in the embodiment of the present application. As shown in fig. 5, drilling down on the region returns the summarized data on the population employment indexes in 2019 for each administrative district in the city, thereby realizing a drill-down query on the government affair data of the summary indexes corresponding to the respective administrative regions within the specified time.
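A drill-down over administrative region levels, as in figs. 4 and 5, can be sketched as follows. The row layout, the function name `drill_down`, the placeholder regions and values, and summation as the aggregate are assumptions for illustration only.

```python
from collections import defaultdict

LEVELS = ("province", "city", "district")


def drill_down(rows, path, **filters):
    """Aggregate values one level below `path`.

    `path` is the chain of regions already selected, e.g. ("P",) drills from
    province P down to its cities, and ("P", "A") drills down to the districts of A.
    """
    if len(path) >= len(LEVELS):
        raise ValueError("already at the lowest administrative level")
    next_level = LEVELS[len(path)]
    totals = defaultdict(float)
    for row in rows:
        in_path = all(row[LEVELS[i]] == name for i, name in enumerate(path))
        matches = all(row[key] == wanted for key, wanted in filters.items())
        if in_path and matches:
            totals[row[next_level]] += row["value"]
    return dict(totals)


# Placeholder regions and made-up values, for illustration only.
rows = [
    {"province": "P", "city": "A", "district": "A1", "year": 2019, "category": "population employment", "value": 100.0},
    {"province": "P", "city": "A", "district": "A2", "year": 2019, "category": "population employment", "value": 50.0},
    {"province": "P", "city": "B", "district": "B1", "year": 2019, "category": "population employment", "value": 70.0},
]
cities = drill_down(rows, ("P",), year=2019, category="population employment")
districts = drill_down(rows, ("P", "A"), year=2019, category="population employment")
```

Each call narrows the selected region by one level while keeping the time and category filters fixed, which mirrors the step from fig. 4 (cities) to fig. 5 (districts).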
Fig. 6 is a schematic diagram of a custom query method according to an embodiment of the present application. As shown in fig. 6, the default prompts cover the keywords to be entered, such as the dimensions, year range, area code and index code. By adding keywords according to the default prompts, users can run a custom query, defining the query conditions according to their own needs to obtain the required data, which improves usability.
Fig. 7 is a schematic diagram of a combined query method according to an embodiment of the present application. As shown in fig. 7, the combined query conditions include time, index grouping, region mark, and whether lower-level regions are included. After determining the corresponding query conditions, the user can perform a multi-directional combined query on the government affair data. This advanced query mode refines the query granularity and better matches actual working scenarios.
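The combined query conditions listed for fig. 7 can be sketched as a single filter object. The class name `CombinedQuery`, the row layout, and in particular the convention that a lower-level region's code begins with its parent's code are hypothetical assumptions for this sketch; the application does not describe how region marks are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombinedQuery:
    """Hypothetical combined query: time, index grouping, region mark, lower regions."""
    years: range
    indicator_groups: frozenset
    region_codes: frozenset
    include_lower: bool = False

    def matches(self, row):
        code = row["region_code"]
        if self.include_lower:
            # Assumed convention: a lower region's code starts with its parent's code.
            region_ok = any(code.startswith(c) for c in self.region_codes)
        else:
            region_ok = code in self.region_codes
        return (row["year"] in self.years
                and row["group"] in self.indicator_groups
                and region_ok)


def run_query(rows, query):
    """Return the rows that satisfy every condition of the combined query."""
    return [row for row in rows if query.matches(row)]


# Placeholder codes and made-up values, for illustration only.
rows = [
    {"region_code": "01", "year": 2019, "group": "population employment", "value": 1.0},
    {"region_code": "0101", "year": 2019, "group": "population employment", "value": 2.0},
    {"region_code": "0101", "year": 2017, "group": "population employment", "value": 3.0},
    {"region_code": "02", "year": 2019, "group": "population employment", "value": 4.0},
    {"region_code": "0102", "year": 2018, "group": "public service", "value": 5.0},
]
with_lower = CombinedQuery(range(2018, 2020), frozenset({"population employment"}), frozenset({"01"}), True)
without_lower = CombinedQuery(range(2018, 2020), frozenset({"population employment"}), frozenset({"01"}), False)
```

Because all conditions live in one object, cross-time, multi-index and multi-region queries differ only in the values passed in, not in the query code.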
The multiple query modes provided by the embodiment of the application reflect multi-dimensional and comprehensive data analysis and mining; because government affair data can be queried in several ways, the needs of different users can be met, which improves usability and practicability.
In one embodiment of the present application, the server supports ad hoc queries and combined queries on data by building an index. First, the server determines the index to be established according to the metadata corresponding to the data in the data warehouse. Then, taking each stored data packet as a unit, with a plurality of data packets forming a column, it establishes the column storage structure of the data layer, that is, the index corresponding to the data warehouse. Because the index is built on the metadata layer, users can intuitively and conveniently grasp and monitor the overall picture of the data resources and their processing. In addition, statistical values can be obtained during a query without unpacking the data packets, which further reduces I/O and improves query efficiency.
The indexing technique provided by the embodiments of the present application enables ad hoc queries on data. It maintains a strict metadata organization together with a flexible data structure, so that users can combine query and analysis conditions arbitrarily, and it provides the performance needed for ad hoc visualization, making exploratory analysis possible.
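To illustrate how per-packet statistics can answer part of a query without unpacking, the following is a minimal sketch. The names `PackStats`, `build_column_index`, and `sum_greater_equal`, and the choice of minimum, maximum, sum, and count as the stored statistics, are assumptions for illustration rather than details given in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackStats:
    minimum: float
    maximum: float
    total: float
    count: int


def build_column_index(values, pack_size):
    """Split one column into data packets and record statistics per packet."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    stats = [PackStats(min(p), max(p), sum(p), len(p)) for p in packs]
    return packs, stats


def sum_greater_equal(packs, stats, threshold):
    """Sum all values >= threshold.

    Returns (sum, number_of_packets_unpacked).
    """
    total = 0.0
    unpacked = 0
    for pack, st in zip(packs, stats):
        if st.maximum < threshold:
            continue  # no value in this packet can match: skip it entirely
        if st.minimum >= threshold:
            total += st.total  # every value matches: answer from statistics alone
            continue
        unpacked += 1  # only a mixed packet has to be read value by value
        total += sum(v for v in pack if v >= threshold)
    return total, unpacked
```

In this sketch only packets whose value range straddles the threshold are read, which is one way the I/O reduction described above could arise.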
Meanwhile, a metadata management function and mechanism covering the whole data processing workflow provides a global view of the data resources and the data processing process. This lets users intuitively and conveniently grasp and monitor the full picture of the data resources and their processing across the entire data processing life cycle, including data sources, ETL, storage, processing, analysis, presentation, use, and archiving. Standardized metadata is used to describe the various kinds of raw data, achieving integration and unified management of data resources, so that all indexes can be queried and displayed comprehensively along time series, region, industry, and other dimensions and from multiple angles.
In one embodiment of the application, articles related to government affairs are included in the crawled data, and the server can conduct personalized pushing of the articles for different users.
Specifically, the geographic position of the user is determined through a positioning system, and the geographic position range to which the user belongs is determined according to the geographic position and a preset distance threshold.
Further, according to the click quantity and the collection quantity of each article in the geographic position range, the pushing coefficient corresponding to each article is calculated through the following formula:
$E = p_1 \cdot i_1 + p_2 \cdot i_2$
where $p_1$ represents the click coefficient, $i_1$ represents the click quantity of the article, $p_2$ represents the collection coefficient, and $i_2$ represents the collection quantity of the article.
It should be noted that the click coefficient and the collection coefficient may be set manually or obtained through calculation. For example, by comparing the collection quantity of an article across different geographic position ranges at a given time, the maximum collection quantity of the article is determined to be 1010, and the collection coefficient is then set to 1/1010. The above parameters are chosen by way of example only and do not limit the scope of the present application.
Furthermore, the pushing sequence of each article is determined according to the pushing coefficient, and the articles are sequentially pushed to the user.
The specific articles pushed are determined by the click quantity and collection quantity of each article within the user's geographic position range, so the behavior of nearby users serves as a reference when pushing articles. This improves pushing accuracy and realizes personalized recommendation of articles.
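The pushing steps above (position range, pushing coefficient, pushing order) can be sketched as follows. This is a hedged illustration: the event tuple layout, the function names, and the use of the haversine great-circle distance to apply the distance threshold are assumptions, since the embodiment does not specify how distance is measured.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed distance measure)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def push_order(user_pos, events, radius_km, p1, p2):
    """Rank articles by E = p1 * i1 + p2 * i2 within the user's range.

    events: iterable of (article_id, lat, lon, kind), kind is "click" or "collect".
    Returns [(article_id, E), ...] sorted by E in descending order.
    """
    clicks, collects = {}, {}
    for article_id, lat, lon, kind in events:
        if haversine_km(user_pos[0], user_pos[1], lat, lon) > radius_km:
            continue  # event happened outside the geographic position range
        if kind == "click":
            clicks[article_id] = clicks.get(article_id, 0) + 1
        elif kind == "collect":
            collects[article_id] = collects.get(article_id, 0) + 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    scores = {
        a: p1 * clicks.get(a, 0) + p2 * collects.get(a, 0)
        for a in set(clicks) | set(collects)
    }
    # Ties are broken by article id so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The coefficients `p1` and `p2` are passed in by the caller, matching the statement that they may be set manually or computed separately.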
Furthermore, the user's workplace can be used in place of the current position. First, the workplace of the user is determined according to the user's activity track and the time spent at each place. Second, the click quantity and collection quantity of each article at the workplace are determined, and a pushing coefficient is calculated. Finally, articles are pushed to the user according to the pushing coefficient. Because the user's workplace is more closely associated with government affair data than other locations are, recommending articles based on the users around the workplace gives the recommended articles greater reference value and further improves pushing accuracy.
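One possible way to infer the workplace from dwell time is sketched below. The embodiment only states that the activity track and the time spent at each place are used. The working-hours window, the `(place_id, start_hour, end_hour)` layout, and the function name `infer_workplace` are assumptions introduced for this example.

```python
from collections import defaultdict


def infer_workplace(stays, work_hours=(9, 18)):
    """Pick the place with the longest total dwell time inside working hours.

    stays: iterable of (place_id, start_hour, end_hour) for a single day,
    with hours given as numbers in [0, 24].
    Returns None when no stay overlaps the working-hours window.
    """
    dwell = defaultdict(float)
    for place, start, end in stays:
        # Length of the overlap between the stay and the working-hours window.
        overlap = min(end, work_hours[1]) - max(start, work_hours[0])
        if overlap > 0:
            dwell[place] += overlap
    if not dwell:
        return None
    # Iterating in sorted order makes ties resolve to the smallest place id.
    return max(sorted(dwell), key=dwell.__getitem__)
```

The place returned here would then stand in for the user's position when the click quantity and collection quantity are gathered.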
In one embodiment of the application, before publishing the crawled articles, the server classifies the articles into specific types by comparing the similarity of the articles and keywords corresponding to the article types. The process of classifying the articles is as follows:
First, the preset keywords corresponding to each article type are determined. The article types mainly comprise leader speeches, policies and regulations, research reports, and practice innovation, and each type may have one or more preset keywords. For example, the keywords of the policy and regulation type may be set to "issue", "policy", "enforcement", and the like.
Second, word segmentation is performed on the article to be published, the segmentation result is compared with the preset keywords corresponding to each article type, and a similarity is calculated. If the similarity is not less than a first preset threshold, the article to be published is similar to that article type and is placed under it.
Then, after the article to be published has been placed under an article type, its similarity to the other articles under that type is compared to confirm the accuracy of the classification result. If the similarity is not less than a second preset threshold, the article to be published is similar to the other articles in the assigned type, the classification result is accurate, and that article type is taken as the type of the article to be published.
It should be noted that, even though individual articles under a type may be atypical, an article to be published should still be highly similar to most of the articles in its assigned type. If, in this second comparison, the number of articles whose similarity to the article to be published is less than the second preset threshold exceeds a preset value, the classification result is in error. In that case, the article to be published is compared with the preset keywords of the other article types and is classified again.
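The two-stage classification described above can be sketched as follows. The similarity measures are not specified in the embodiment, so this example assumes keyword coverage for the first stage and Jaccard similarity for the second, and it uses whitespace splitting as a stand-in for a real word segmenter. All function names are hypothetical.

```python
def tokenize(text):
    """Stand-in for word segmentation: lowercase whitespace split."""
    return set(text.lower().split())


def keyword_similarity(tokens, keywords):
    """Fraction of the type's preset keywords found in the article (assumed measure)."""
    keywords = set(keywords)
    return len(tokens & keywords) / len(keywords) if keywords else 0.0


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(article, type_keywords, published, t1, t2, max_dissimilar):
    """Return the first article type that passes both checks, else None.

    type_keywords: {article_type: [keyword, ...]}
    published:     {article_type: [already published article text, ...]}
    t1, t2:        first and second preset thresholds
    max_dissimilar: preset value for how many dissimilar peers are tolerated
    """
    tokens = tokenize(article)
    for article_type, keywords in type_keywords.items():
        if keyword_similarity(tokens, keywords) < t1:
            continue  # stage 1 failed: not similar enough to this type's keywords
        peers = published.get(article_type, [])
        dissimilar = sum(1 for peer in peers if jaccard(tokens, tokenize(peer)) < t2)
        if dissimilar <= max_dissimilar:
            return article_type  # stage 2 passed: consistent with most peers
        # Otherwise treat the placement as an error and try the other types.
    return None
```

Returning `None` leaves the handling of an article that fits no type to the caller, a case the embodiment does not address.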
In one embodiment of the present application, for each published article under each article type, a user may collect the article and place it under a specific collection type. In addition to the original article types, collection types can be customized by the user. The server then compares the collection types assigned to the article with the article type to which it actually belongs. If the ratio of the number of times the article was collected under a certain collection type to its total collection quantity exceeds a preset value, and that collection type differs from the article type to which the article actually belongs, the actual classification result may be in error, and the article type is changed to the collection type assigned by the users.
For example, suppose the total collection quantity of an article under the leader speech type is 50, background data shows that users collected it under the policy and regulation type 30 times, and the preset collection ratio is 50%. The ratio of collections under policy and regulation to the total collection quantity is then 30/50 = 60%, which exceeds the preset ratio, so the article type is changed from leader speech to policy and regulation. The above parameters are chosen by way of example only and do not limit the present application.
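The correction rule and the 30-out-of-50 example can be reproduced with a short sketch. The function name `corrected_type` and the representation of collections as a flat list of type labels are assumptions for illustration.

```python
from collections import Counter


def corrected_type(current_type, collection_types, ratio_threshold):
    """Return the article type after applying the collection-based correction.

    collection_types: one collection-type label per user collection of the article.
    The type changes only if some other type's share of all collections
    exceeds ratio_threshold.
    """
    total = len(collection_types)
    if total == 0:
        return current_type
    # Count only collection types that differ from the current article type.
    others = Counter(t for t in collection_types if t != current_type)
    if not others:
        return current_type
    top_type, top_count = others.most_common(1)[0]
    if top_count / total > ratio_threshold:
        return top_type
    return current_type
```

With 30 of 50 collections under "policy and regulation" and a 50% threshold, the ratio is 0.6 and the function returns "policy and regulation", matching the example above. How the remaining 20 collections are distributed is not stated in the source, and the test data simply assumes they stay under "leader speech".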
According to the government decision-oriented government affair big data analysis method provided by the present application, data in target data sources related to government affair data is crawled and then analyzed and processed, which effectively improves the utilization of government affair data and reduces wasted resources; a multidimensional data mining model is constructed based on multidimensional analysis and data mining techniques, and the data in the data warehouse is mined and analyzed in greater depth, which addresses the poor real-time performance, efficiency, and interactivity of traditional government affair data analysis methods; the results of the multidimensional analysis are displayed through a visual interface, and multiple data query methods are provided for the different hierarchy division indexes and classification summary indexes, which improves usability; and by classifying articles by type and pushing them in real time, information is provided as a reference for the work of each department and the user experience is enhanced.
The above is the method embodiment proposed by the present application. Based on the same inventive concept, the embodiment of the application also provides government affair big data analysis equipment facing government decisions, and the internal structure of the equipment is shown in fig. 8.
Fig. 8 is a schematic structural diagram of a government decision-oriented government affair big data analysis device according to an embodiment of the present application. As shown in Fig. 8, the device comprises a processor 801 and a memory 802 having executable code stored thereon which, when executed, causes the processor 801 to perform the government decision-oriented government affair big data analysis method described above.
In one embodiment of the present application, the processor 801 is configured to determine a target data source related to government affairs data, configure a crawling rule, and perform data crawling on the target data source; cleaning the crawled data in batches, and storing the data in a data warehouse; constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data; and carrying out multi-dimensional mining and analysis on the data in the data warehouse based on the multi-dimensional data mining model, and displaying the analysis result.
The embodiments in the present application are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant details can be found in the corresponding description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (3)
1. A government decision-oriented government affairs big data analysis method, characterized in that the method comprises:
determining a target data source related to government affair data, configuring a crawling rule, and performing data crawling on the target data source;
randomly sampling the target data source, and performing matching verification on the target data source and the crawled data;
storing the data into a local database under the condition that the crawled data pass verification;
extracting data crawled from the target data source from the local database, cleaning the crawled data in batches, and storing the data in a data warehouse;
determining an index to be established according to metadata corresponding to the data in the data warehouse;
constructing a column storage structure by taking a data packet as a unit, and establishing an index corresponding to the data warehouse;
constructing a multi-dimensional data mining model according to the hierarchy division indexes and the classification summary indexes related to the government affair data;
based on the multidimensional data mining model, carrying out multidimensional mining and analysis on the data in the data warehouse, and displaying the analysis result;
displaying the analysis result, which specifically comprises the following steps:
determining the level and time of an administrative region specified in the hierarchical division index and the category specified in the classification summary index;
according to a preset display form, displaying the government affair data of the summary indexes corresponding to the respective administrative regions within the time; wherein the preset display form comprises at least any one of the following: line graphs, bar graphs, pie charts, tables;
performing multidimensional mining and analysis on the data in the data warehouse, which specifically comprises the following steps:
according to the administrative region level, performing drill-down inquiry on the government affair data of the summary index corresponding to the administrative region in the time and the government affair data of the corresponding specified category in the time; and/or
adding a specified keyword, and performing a user-defined query through the keyword; and/or
determining the administrative regions and time specified in the hierarchical division indexes and the categories specified in the classification summary indexes, and performing a combined query;
the crawled data comprises articles;
determining a geographical position range according to the geographical position of the user and a preset distance threshold;
calculating a pushing coefficient corresponding to each article according to the click quantity and the collection quantity of each article in the geographic position range;
pushing the article to the user according to the article pushing coefficient;
judging the working place of the user according to the activity track of the user and the staying time at each place;
determining the click quantity and the collection quantity of each article at the working place, and calculating a pushing coefficient corresponding to each article according to the click quantity and the collection quantity;
pushing an article to the user according to the pushing coefficient;
determining preset keywords corresponding to article types; wherein the article types comprise leader speech, policy and regulation, research reports, practice innovation;
performing word segmentation processing on the article to be published, comparing word segmentation results with the preset keywords, and calculating similarity;
under the condition that the similarity is not smaller than a first preset threshold value, dividing the article to be published under the corresponding article type;
comparing the similarity of the article to be published with other articles in the corresponding article type, and determining that the similarity is not less than a second preset threshold;
determining a collection type corresponding to an article divided by a user aiming at the article contained in the article type;
comparing the collection type with the article type to which the article belongs, and correcting the article type to which the article belongs;
recommending similar target data sources according to similar indexes in the target data sources; wherein the indexes comprise administrative region levels, population numbers, areas and urban development conditions.
2. The government decision-oriented government affair big data analysis method according to claim 1, wherein the batch cleaning of the crawled data specifically comprises:
extracting the crawled data from the local database;
filling null values and filtering repeated values of the crawled data;
and uniformly converting the filtered data into a preset format, and filtering the converted data again.
3. A government decision oriented government affairs big data analysis device, characterized in that the device comprises:
a processor;
and a memory having executable code stored thereon that, when executed, causes the processor to perform the government decision-oriented government affair big data analysis method of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204049.9A CN112800083B (en) | 2021-02-24 | 2021-02-24 | Government decision-oriented government affair big data analysis method and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112800083A (en) | 2021-05-14 |
CN112800083B (en) | 2022-03-18 |
Family
ID=75815439
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204049.9A CN112800083B (en), Active | 2021-02-24 | 2021-02-24 | Government decision-oriented government affair big data analysis method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800083B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114240220A (en) * | 2021-12-22 | 2022-03-25 | 中国建设银行股份有限公司 | Government affair data processing method, device, equipment, medium and program product |
CN114596182B (en) * | 2022-03-09 | 2023-05-16 | 王淑娟 | Government affair management method and system based on big data |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107168992A (en) * | 2017-03-29 | 2017-09-15 | 北京百度网讯科技有限公司 | Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence |
CN107656938A (en) * | 2016-07-26 | 2018-02-02 | 北京搜狗科技发展有限公司 | It is a kind of to recommend method and apparatus, a kind of device for being used to recommend |
CN110781236A (en) * | 2019-10-29 | 2020-02-11 | 山西云时代技术有限公司 | Method for constructing government affair big data management system |
CN111222028A (en) * | 2020-01-10 | 2020-06-02 | 四川日报社 | Intelligent data crawling method |
CN111783468A (en) * | 2020-06-28 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text processing method, device, equipment and medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101984435B (en) * | 2010-11-17 | 2012-10-10 | 百度在线网络技术(北京)有限公司 | Method and device for distributing texts |
CN103309886B (en) * | 2012-03-13 | 2017-05-10 | 阿里巴巴集团控股有限公司 | Trading-platform-based structural information searching method and device |
US20180341686A1 (en) * | 2017-05-26 | 2018-11-29 | Nanfang Hu | System and method for data search based on top-to-bottom similarity analysis |
WO2019113977A1 (en) * | 2017-12-15 | 2019-06-20 | 腾讯科技(深圳)有限公司 | Method, device, and server for processing written articles, and storage medium |
CN109408642B (en) * | 2018-08-30 | 2021-07-16 | 昆明理工大学 | Domain entity attribute relation extraction method based on distance supervision |
Also Published As
Publication number | Publication date |
---|---|
CN112800083A (en) | 2021-05-14 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 250002 No. 128 Wolong Road, Jinan City, Shandong Province. Applicant after: Shandong Institute of housing and urban rural development. Address before: No.17, sanlizhuang, Jingliu Road, Shizhong District, Jinan City, Shandong Province. Applicant before: SHANDONG CONSTRUCTION DEVELOPMENT Research Institute |
GR01 | Patent grant | |