CN111858733A - Government affair information comparison method and system based on internet multi-source heterogeneous data - Google Patents

Info

Publication number
CN111858733A
Authority
CN
China
Prior art keywords
internet
data
source heterogeneous
information
heterogeneous data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010685187.9A
Other languages
Chinese (zh)
Inventor
肖卓明
吴敏东
李兴杰
陈志云
罗琪元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Southern Newspaper Media Group New Media Co ltd
Original Assignee
Guangdong Southern Newspaper Media Group New Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-16
Filing date
2020-07-16
Publication date
2020-10-30
Application filed by Guangdong Southern Newspaper Media Group New Media Co ltd
Priority to CN202010685187.9A
Publication of CN111858733A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a government affair information comparison method and system based on internet multi-source heterogeneous data. The method runs on a cloud server and comprises the following steps: establishing a database; collecting internet multi-source heterogeneous data into the database according to an internet data acquisition mechanism; performing keyword matching on the stored data to generate index association information; and integrating the index association information through an algorithm model to generate a comparison image, thereby completing the comparison of government affair information. Because the internet data acquisition mechanism obtains data related to government affair information directly from the internet, the data being compared are more comprehensive, which improves the comparison result. Collecting the data automatically also greatly reduces the time and effort spent on collection. In addition, integrating and comparing the index association information through an algorithm model overcomes the excessive dependence of existing comparison methods on the analyst's experience.

Description

Government affair information comparison method and system based on internet multi-source heterogeneous data
Technical Field
The invention relates to the technical field of information comparison, in particular to a government affair information comparison method and system based on internet multi-source heterogeneous data.
Background
Government affair information occupies an important position in overall social information. Comparing government affair information makes it possible to understand how a government is operating and plays a crucial role in government planning. Existing comparison methods generally rely on collecting and integrating data manually and then comparing the results.
However, the existing methods have the following defects: (1) manual data collection consumes a large amount of time and effort, so comparison efficiency is low, and the quality of the comparison is closely tied to the analyst's experience; (2) internet multi-source heterogeneous data usually cannot be analysed and used effectively, or a convenient and effective technique for doing so is lacking, so the evaluation of government affair information cannot meet the requirement of objectivity and the comparison result suffers.
Disclosure of Invention
In view of this, the invention provides a method and a system for comparing government affair information based on internet multi-source heterogeneous data, which can solve the problems of low comparison efficiency and poor comparison effect of the existing method for comparing government affair information.
The technical scheme of the invention is realized as follows:
A government affair information comparison method based on internet multi-source heterogeneous data runs on a cloud server and comprises the following steps:
Step S1, establishing, in the cloud server, a database for storing internet multi-source heterogeneous data;
Step S2, collecting internet multi-source heterogeneous data related to government affair information into the database according to a preset internet data acquisition mechanism;
Step S3, performing keyword matching on the data stored in the database to generate index association information;
and Step S4, integrating the index association information through an algorithm model pre-stored in the cloud server to generate a comparison image, thereby completing the comparison of government affair information.
As a further option of the method, the preset internet data acquisition mechanism comprises the websites to access, the access time and the access frequency.
As a further option of the method, step S2 includes the following steps:
Step S21, determining an access task for acquiring internet multi-source heterogeneous data according to the preset internet data acquisition mechanism;
Step S22, calling the corresponding application programming interface (API) unit according to the access task;
and Step S23, obtaining the internet multi-source heterogeneous data returned by the API unit.
As a further option of the method, step S21 includes the following steps:
Step S211, acquiring the task start time in the access task;
Step S212, when the current time reaches or exceeds the task start time, determining that the task corresponding to that task start time is the access task.
As a further option of the method, step S3 includes the following steps:
Step S31, determining a keyword to be retrieved;
and Step S32, screening the data stored in the database according to the keyword, thereby forming the index association information.
As a further option of the method, the screening in step S32 uses a regular expression matching method.
As a further option of the method, the algorithm model comprises a data calculation model and a chart generation model.
As a further option of the method, the internet multi-source heterogeneous data is stored in a distributed manner by means of data slicing and data fragmentation to form the stored data.
As a further option of the method, the index association information includes media index association information, public opinion index association information and government affair index association information.
A government affair information comparison system based on internet multi-source heterogeneous data uses any one of the comparison methods described above.
The beneficial effects of the invention are as follows. The preset internet data acquisition mechanism obtains data related to government affair information directly from the internet, so the data being compared are more comprehensive and the comparison result is improved. The mechanism also collects data automatically, which removes the large amount of time and effort that manual collection consumes. In addition, integrating and comparing the index association information through an algorithm model overcomes the excessive dependence of existing comparison methods on the analyst's experience.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a government affair information comparison method based on internet multi-source heterogeneous data.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a government affair information comparison method based on internet multi-source heterogeneous data runs on a cloud server and comprises the following steps:
Step S1, establishing, in the cloud server, a database for storing internet multi-source heterogeneous data;
Step S2, collecting internet multi-source heterogeneous data related to government affair information into the database according to a preset internet data acquisition mechanism;
Step S3, performing keyword matching on the data stored in the database to generate index association information;
and Step S4, integrating the index association information through an algorithm model pre-stored in the cloud server to generate a comparison image, thereby completing the comparison of government affair information.
In this embodiment, the preset internet data acquisition mechanism obtains data related to government affair information directly from the internet, so the data being compared are more comprehensive and the comparison result is improved. The mechanism also collects data automatically, which removes the large amount of time and effort that manual collection consumes. In addition, integrating and comparing the index association information through an algorithm model overcomes the excessive dependence of existing comparison methods on the analyst's experience.
It should be noted that, in step S3, the keyword determines which index is to be compared. For example, if the keyword is "media propagation", the index to be compared is the media propagation index; the index association information is then generated by matching the data related to media propagation within the collected internet multi-source heterogeneous data related to government affair information.
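As an illustration only, steps S1 to S4 can be sketched end to end in a few lines of Python. Everything here is an assumption made for the sketch and not part of the patent: an in-memory SQLite database stands in for the cloud database, the collected records are hard-coded, keyword matching is a plain substring query, and the "integration" is a simple count per source in place of the algorithm model.

```python
import sqlite3

def create_database():
    # S1: an in-memory SQLite database stands in for the cloud-hosted database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (source TEXT, body TEXT)")
    return conn

def collect(conn, fetched):
    # S2: store already-fetched (source, body) pairs; real collection is not shown.
    conn.executemany("INSERT INTO records VALUES (?, ?)", fetched)
    conn.commit()

def match_keyword(conn, keyword):
    # S3: keyword matching yields the rows that form the index association information.
    cur = conn.execute(
        "SELECT source, body FROM records WHERE body LIKE ?", (f"%{keyword}%",)
    )
    return cur.fetchall()

def integrate(matches):
    # S4 (simplified): count matches per source as the data to be compared.
    counts = {}
    for source, _ in matches:
        counts[source] = counts.get(source, 0) + 1
    return counts

conn = create_database()
collect(conn, [
    ("news", "media propagation of the housing policy"),
    ("forum", "road repairs delayed"),
    ("news", "media propagation of the transit plan"),
])
comparison = integrate(match_keyword(conn, "media propagation"))
```

With the sample rows above, only the two news items mention the keyword, so `comparison` holds a single entry for the `news` source.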
Preferably, the preset internet data acquisition mechanism comprises the websites to access, the access time and the access frequency.
In this embodiment, the internet data acquisition mechanism is preset in the cloud server. The websites to access include, but are not limited to, news websites, forums and microblogs. Because these sites accumulate a large amount of government affair information, collecting from them makes the acquired internet multi-source heterogeneous data more sufficient and comprehensive. The access time is the duration of a single visit to a website; setting different access times changes the depth to which data is captured. For example, a website closely associated with government affair information can be given a longer access time and a weakly associated website a shorter one, so that capture effort is allocated more reasonably. The access frequency is the number of times a website is accessed; setting different access frequencies likewise changes the capture depth.
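One possible way to represent such an acquisition mechanism is a small configuration record per website. The sketch below is hypothetical: the class name `AcquisitionRule`, its field names, the units (seconds per visit, visits per day) and the example URLs are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcquisitionRule:
    url: str             # website to access (placeholder URLs below)
    access_seconds: int  # access time: duration of a single visit
    visits_per_day: int  # access frequency: number of visits per day

def daily_budget_seconds(rule):
    """Total capture time per day implied by one rule."""
    return rule.access_seconds * rule.visits_per_day

rules = [
    # A site closely tied to government affairs gets longer, more frequent visits.
    AcquisitionRule("https://news.example.org", access_seconds=120, visits_per_day=24),
    # A weakly related site gets shorter, rarer visits.
    AcquisitionRule("https://forum.example.org", access_seconds=30, visits_per_day=4),
]
budgets = {r.url: daily_budget_seconds(r) for r in rules}
```

Multiplying access time by access frequency gives a rough per-site capture budget, which is one way to make the "capture depth" described above concrete.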
Preferably, step S2 includes the following steps:
Step S21, determining an access task for acquiring internet multi-source heterogeneous data according to the preset internet data acquisition mechanism;
Step S22, calling the corresponding application programming interface (API) unit according to the access task;
and Step S23, obtaining the internet multi-source heterogeneous data returned by the API unit.
In this embodiment, since the internet data acquisition mechanism covers a variety of news websites, forums and microblogs, access tasks need to be allocated, that is, which news website, forum or microblog is accessed in which time period. Defining access tasks makes the accesses more orderly and improves the efficiency of collecting internet multi-source heterogeneous data. In addition, because the data is returned by the called API unit, the integrity and accuracy of the returned data can be well ensured.
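Steps S22 and S23 amount to dispatching each access task to the API unit that matches it. A minimal sketch follows, assuming each API unit is a plain function keyed by site type; the functions `news_api` and `forum_api`, the task fields and the returned record shapes are invented for the example and return canned data instead of making network calls.

```python
from typing import Callable, Dict, List

# Hypothetical API units: each takes a query and returns raw records.
def news_api(query: str) -> List[dict]:
    return [{"source": "news", "title": f"{query} headline"}]

def forum_api(query: str) -> List[dict]:
    return [{"source": "forum", "post": f"thread about {query}"}]

API_UNITS: Dict[str, Callable[[str], List[dict]]] = {
    "news": news_api,
    "forum": forum_api,
}

def run_access_task(task: dict) -> List[dict]:
    # S22: select the API unit named by the task.
    unit = API_UNITS[task["site_type"]]
    # S23: return whatever data the API unit yields.
    return unit(task["query"])

result = run_access_task({"site_type": "forum", "query": "budget"})
```

Note that the two units return differently shaped records, which is the kind of heterogeneity the database in step S1 has to accommodate.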
Preferably, step S21 includes the following steps:
Step S211, acquiring the task start time in the access task;
Step S212, when the current time reaches or exceeds the task start time, determining that the task corresponding to that task start time is the access task.
In this embodiment, the task start time of each access task is obtained first. The current time is then obtained and compared with the task start time. If the current time has reached or passed the task start time, the task corresponding to that start time is determined to be the access task. It should be noted that, under normal conditions, the access tasks are ordered by task start time, and the next access task is executed once the current one is completed.
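The start-time check in steps S211 and S212 can be sketched as a filter followed by a sort. The task dictionaries, their field names and the sample times are assumptions made for the example.

```python
from datetime import datetime

def due_tasks(tasks, now):
    """Return the tasks whose start time has been reached, earliest first."""
    # S212: a task becomes the access task once now >= its start time.
    ready = [t for t in tasks if now >= t["start"]]
    # Tasks run in order of start time, one after another.
    return sorted(ready, key=lambda t: t["start"])

tasks = [
    {"name": "forum", "start": datetime(2020, 7, 16, 9, 0)},
    {"name": "news", "start": datetime(2020, 7, 16, 8, 0)},
    {"name": "microblog", "start": datetime(2020, 7, 16, 12, 0)},
]
ready = due_tasks(tasks, now=datetime(2020, 7, 16, 9, 0))
names = [t["name"] for t in ready]
```

At 09:00 the news task (08:00) and the forum task (09:00, reached exactly) are due, in that order, while the microblog task waits until 12:00.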
Preferably, step S3 includes the following steps:
Step S31, determining a keyword to be retrieved;
and Step S32, screening the data stored in the database according to the keyword, thereby forming the index association information.
Preferably, the screening in step S32 uses a regular expression matching method.
In this embodiment, the keyword is determined first, and the keyword in turn determines the index to be compared. The collected internet multi-source heterogeneous data related to government affair information that is stored in the database is then screened with a regular expression matching method to pick out the information related to that index. Text semantic analysis is performed on the screening result to form the index association information. Using regular expression matching makes the screened data more accurate.
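A minimal sketch of the regular expression screening in step S32, using Python's standard `re` module. The record strings and keywords are invented, and the subsequent text semantic analysis described above is not shown.

```python
import re

def screen(records, keywords):
    """Keep the records that mention at least one keyword (case-insensitive)."""
    # re.escape treats each keyword literally; "|" joins them as alternatives.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [r for r in records if pattern.search(r)]

records = [
    "Media propagation of the policy doubled.",
    "Public opinion on the transit plan is mixed.",
    "Weather forecast for Tuesday.",
]
hits = screen(records, ["media propagation", "public opinion"])
```

Escaping the keywords keeps punctuation in a keyword from being read as regex syntax; a real deployment might instead allow hand-written patterns for each index.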
Preferably, the algorithm model includes a data calculation model and a chart generation model.
In this embodiment, the index association information is input into the data calculation model for integration, and the data calculation model outputs a calculation result. The calculation result is then input into the chart generation model, which outputs the comparison image and thereby realizes the comparison of one index. Comparing several indexes in this way realizes the comparison of government affair information. It should be noted that the data calculation model and the chart generation model are generated by deep neural network training, have a relatively high recognition rate, and effectively overcome the dependence of existing comparison methods on the analyst's experience.
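The two-stage flow (calculation result first, chart second) can be illustrated with deliberately simple stand-ins. The patent describes both models as trained deep neural networks; the sketch below does not implement that. It substitutes a plain mean for the data calculation model and a text bar chart for the chart generation model, and the input scores are made up. It assumes at least one positive score so that the bars can be scaled.

```python
def calculate(index_info):
    """Stand-in for the data calculation model: mean score per government body."""
    return {body: sum(scores) / len(scores) for body, scores in index_info.items()}

def render_chart(results, width=20):
    """Stand-in for the chart generation model: a text bar chart."""
    top = max(results.values())  # assumes at least one positive value
    lines = []
    for body, value in sorted(results.items()):
        bar = "#" * round(width * value / top)
        lines.append(f"{body:<8} {bar} {value:.1f}")
    return "\n".join(lines)

# Hypothetical media-index scores for two government bodies.
index_info = {"City A": [60, 80], "City B": [30, 40]}
results = calculate(index_info)
chart = render_chart(results)
```

Keeping calculation and rendering as separate functions mirrors the separation between the two models: the calculation output is the only input the chart stage needs.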
Preferably, the internet multi-source heterogeneous data is stored in a distributed manner by means of data slicing and data fragmentation to form the stored data.
In this embodiment, the collected internet multi-source heterogeneous data is stored in the database in a distributed manner by means of data slicing and data fragmentation. This optimizes the storage of both structured and unstructured data and allows data in all kinds of formats to be stored quickly. In addition, when internet multi-source heterogeneous data needs to be looked up, the required data can be found quickly.
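The patent does not say how the slices or fragments are assigned. One common approach, shown here purely as an assumption, is hash-based sharding: a stable hash of the record key selects the shard, so the same computation locates the record again at lookup time. The in-memory dictionaries stand in for separate storage nodes, and the key format is invented.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def store(shards, key, record):
    # Records of any shape (structured or unstructured) go to the chosen shard.
    shards[shard_for(key, len(shards))][key] = record

def lookup(shards, key):
    # Only the one shard that can hold the key is consulted.
    return shards[shard_for(key, len(shards))].get(key)

shards = [{} for _ in range(4)]  # four in-memory shards for the sketch
store(shards, "news:2020-07-16:001", {"title": "Policy update"})
found = lookup(shards, "news:2020-07-16:001")
```

Because lookup touches a single shard, retrieval cost does not grow with the number of shards, which is one way the fast lookup described above could be achieved.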
Preferably, the index association information includes media index association information, public opinion index association information and government affair index association information.
In this embodiment, the index association information is generated according to the keyword. When new index association information needs to be established, it can be generated by determining a new keyword. Comparing across indexes from multiple angles improves the comparison of government affair information and allows government work to be planned better. The index association information is index association information relating to a government body.
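The point that a new index is created simply by choosing a new keyword can be illustrated with a keyword registry. The index names, keywords and sample text below are hypothetical, and the matching is a plain case-insensitive substring test.

```python
INDEX_KEYWORDS = {
    "media": ["media propagation"],
    "public_opinion": ["public opinion"],
    "government_affairs": ["government service"],
}

def classify(text, index_keywords):
    """Return the names of the indexes whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        name for name, words in index_keywords.items()
        if any(w in lowered for w in words)
    )

sample = "Complaints about online permits rose."
before = classify(sample, INDEX_KEYWORDS)

# Establishing a new index only requires registering a new keyword.
INDEX_KEYWORDS["permits"] = ["online permits"]
after = classify(sample, INDEX_KEYWORDS)
```

Before the new keyword is registered the sample text matches no index; afterwards it is associated with the new `permits` index, without any change to the matching code.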
A government affair information comparison system based on internet multi-source heterogeneous data uses any one of the comparison methods described above.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A government affair information comparison method based on internet multi-source heterogeneous data, the method being based on a cloud server and characterized by comprising the following steps:
Step S1, establishing, in the cloud server, a database for storing internet multi-source heterogeneous data;
Step S2, collecting internet multi-source heterogeneous data related to government affair information into the database according to a preset internet data acquisition mechanism;
Step S3, performing keyword matching on the data stored in the database to generate index association information;
and Step S4, integrating the index association information through an algorithm model pre-stored in the cloud server to generate a comparison image, thereby completing the comparison of government affair information.
2. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 1, wherein the preset internet data acquisition mechanism comprises the websites to access, the access time and the access frequency.
3. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 2, wherein step S2 comprises the following steps:
Step S21, determining an access task for acquiring internet multi-source heterogeneous data according to the preset internet data acquisition mechanism;
Step S22, calling the corresponding application programming interface (API) unit according to the access task;
and Step S23, obtaining the internet multi-source heterogeneous data returned by the API unit.
4. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 3, wherein step S21 comprises the following steps:
Step S211, acquiring the task start time in the access task;
Step S212, when the current time reaches or exceeds the task start time, determining that the task corresponding to that task start time is the access task.
5. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 1 or 4, wherein step S3 comprises the following steps:
Step S31, determining a keyword to be retrieved;
and Step S32, screening the data stored in the database according to the keyword, thereby forming the index association information.
6. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 5, wherein the screening in step S32 uses a regular expression matching method.
7. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 1, wherein the algorithm model comprises a data calculation model and a chart generation model.
8. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 7, wherein the internet multi-source heterogeneous data is stored in a distributed manner by means of data slicing and data fragmentation to form the stored data.
9. The government affair information comparison method based on internet multi-source heterogeneous data according to claim 8, wherein the index association information includes media index association information, public opinion index association information and government affair index association information.
10. A government affair information comparison system based on internet multi-source heterogeneous data, characterized in that the system uses the comparison method of any one of claims 1-9.
CN202010685187.9A 2020-07-16 2020-07-16 Government affair information comparison method and system based on internet multi-source heterogeneous data Pending CN111858733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010685187.9A CN111858733A (en) 2020-07-16 2020-07-16 Government affair information comparison method and system based on internet multi-source heterogeneous data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010685187.9A CN111858733A (en) 2020-07-16 2020-07-16 Government affair information comparison method and system based on internet multi-source heterogeneous data

Publications (1)

Publication Number Publication Date
CN111858733A true CN111858733A (en) 2020-10-30

Family

ID=72983573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010685187.9A Pending CN111858733A (en) 2020-07-16 2020-07-16 Government affair information comparison method and system based on internet multi-source heterogeneous data

Country Status (1)

Country Link
CN (1) CN111858733A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385609A (en) * 2021-12-23 2022-04-22 北京北明数科信息技术有限公司 Label-based government affair event processing system, method, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202501A (en) * 2016-07-20 2016-12-07 宁波公众信息产业有限公司 A kind of information analysis system
CN108038229A (en) * 2017-12-25 2018-05-15 河北省科学院应用数学研究所 Government affairs information search method, system and terminal device
CN110751374A (en) * 2019-09-26 2020-02-04 中电万维信息技术有限责任公司 Electronic government affair assessment method based on neural network and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination