CN112667700A - Government affair big data super search system

Info

Publication number: CN112667700A
Application number: CN201910980793.0A
Authority: CN
Inventors: 张丹普; 董雪梅
Original assignee: China Changfeng Science Technology Industry Group Corp
Current assignee: China Changfeng Science Technology Industry Group Corp
Priority date: 2019-10-16
Filing date: 2019-10-16
Publication date: 2021-04-16

Abstract

The invention relates to a government affair big data super search system, which comprises a front-end display module, a super search module, a data management module, a holographic file module and a data storage module, wherein the front-end display module is used for displaying data records. By adopting an ElasticSearch + MPP architecture, the system mainly solves the problem of efficient query and retrieval of large-scale structured data. It efficiently realizes keyword full-text retrieval, pinyin retrieval, range retrieval, logic combination retrieval and portrait retrieval, and allows the display fields of search results to be configured flexibly. It thereby solves the problems of the conventional retrieval mode, such as low retrieval and query efficiency, few types of retrievable data, insufficient retrieval range, and search results that are difficult to display and modify. Combined with the multi-type data storage modes of MPP DB, HDFS and Neo4j, it realizes quick and comprehensive query and viewing of the detail information of search results.

Description

Government affair big data super search system

Technical Field

The invention belongs to the technical field of big data retrieval, and relates to a government affair big data super search system based on ElasticSearch and MPP DB.

Background

Big data is a new stage of informatization development. With the convergence of information technology and human production and life, the internet is rapidly popularized, global data shows the characteristics of explosive growth and mass aggregation, and has great influence on economic development, social governance, national management and people's life. The implementation of the national big data strategy accelerates the construction of digital China, digital economy with data as key elements needs to be constructed, and the development and application of big data cannot be separated from the construction of a modern economic system.

Government affair big data refers to the process by which governments promote the development of big data applications, or to the application practice of big data in the field of public services. However, governments at all levels still face many problems in promoting government affair big data applications, and the quick, efficient and accurate query and retrieval of multi-source, heterogeneous, large-scale data has always been a major problem in big data applications.

Government affair big data search refers to searching, classifying, screening, filtering, sorting and similar operations on a big data set according to data characteristics such as keywords, semantics, content and images. Early data retrieval in government departments relied mainly on SQL-based database retrieval and, as data volumes increased, gradually developed into Solr-based full-text retrieval. In the big data era, however, with the explosion of information, Solr search efficiency becomes very low once the data scale reaches a certain level. ElasticSearch, by contrast, is a real-time distributed search and analysis engine: it searches large-scale data more efficiently and handles large indexes and high query rates more easily.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a government affair big data super search system which can efficiently realize keyword full-text search, pinyin search, range search, logic combination search and portrait search, can flexibly configure the display fields of search results, and solves the problems of low search and query efficiency, few types of search data, insufficient search range, difficult display and modification of search results and the like of the conventional search mode.

The technical scheme of the invention is as follows:

a government affair big data super search system is characterized in that: the system comprises a front-end display module, a super search module, a data management module, a holographic file module and a data storage module;

the front-end display module provides a unified search entry for keyword retrieval, pinyin retrieval, logic combination retrieval, picture retrieval and the like, provides search result display and screening according to different departments and different categories, and supports viewing of holographic files;

the super search module is based on the ElasticSearch full-text retrieval engine, constructs a full-text index library, and provides functions such as a parser, image recognition, a word segmentation device, a query device, range retrieval, pinyin retrieval, logic combination retrieval and an external interface;

the data management module is used for providing functions of metadata management, dictionary management, resource catalogue, resource item configuration and the like;

the holographic file module provides functions of quickly checking basic information, track information, relation maps and other information of target objects by searching a target result list;

the data storage module constructs a data warehouse based on the MPP DB, stores pictures on the HDFS, and stores entity and relationship information of target objects in the Neo4j high-performance graph database, thereby realizing full data storage.

The invention can support accurate and fuzzy keyword query, portrait comparison, vehicle comparison, pinyin query, range query and logic combination query, provide a unified entry of various query modes, flexibly configure query range, support detail check of search targets and fast check of holographic files of the search targets, realize fast check, accurate check and complete check of government affair big data, and remarkably improve full text retrieval efficiency.

Drawings

FIG. 1 is a functional block and system flow diagram of the present invention.

Detailed Description

As shown in FIG. 1, the present invention includes a front-end display module, a super search module, a data management module, a holographic file module, and a data storage module, wherein the front-end display module provides a unified search entry for keyword retrieval, pinyin retrieval, logic combination retrieval, picture retrieval, etc., provides search result display and screening according to different departments and different categories, and supports viewing of holographic files; the super search module is based on the ElasticSearch full-text retrieval engine, constructs a full-text index library, and provides functions such as a parser, image recognition, a word segmentation device, a query device, range retrieval, pinyin retrieval, logic combination retrieval and an external interface; the data management module provides functions such as metadata management, dictionary management, resource catalogue and resource item configuration; the holographic file module provides functions for quickly viewing basic information, track information, relation maps and other information of a target object selected from the search result list; the data storage module constructs a data warehouse based on the MPP DB, stores pictures on the HDFS, and stores entity and relationship information of target objects in the Neo4j high-performance graph database, thereby realizing full data storage.

The working process of the front-end display module is as follows:

step 1, inputting keywords, pinyin and pictures through a unified search entry;

step 2, aiming at retrieval results of different conditions, providing search result display and secondary screening according to different departments and different categories, for example, carrying out retrieval range locking according to factors such as people, places, things, objects, groups and the like, and carrying out secondary screening on condition characteristics of each factor, such as age bracket, sex, nationality, native place and the like of people, and quickly and accurately positioning a search target through the secondary screening;

step 3, displaying the holographic file of the target object in a multi-dimensional mode based on the search target.
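
The secondary screening described in step 2 maps naturally onto an ElasticSearch bool query, where the keyword goes in a scored `must` clause and each screening condition becomes a non-scoring `filter` clause. The following is a minimal sketch under stated assumptions: the helper `build_screening_query` and the field names (`category`, `department`, `gender`, `age`) are hypothetical, since the patent does not specify an index mapping or query bodies.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_screening_query(
    keyword: str,
    category: Optional[str] = None,
    department: Optional[str] = None,
    terms: Optional[Dict[str, str]] = None,
    age_range: Optional[Tuple[int, int]] = None,
) -> Dict[str, Any]:
    """Build an ElasticSearch query body: keyword match plus screening filters.

    All field names are assumptions made for illustration.
    """
    filters: List[Dict[str, Any]] = []
    if category is not None:
        # Lock the retrieval range to one factor, e.g. "person" or "place".
        filters.append({"term": {"category": category}})
    if department is not None:
        filters.append({"term": {"department": department}})
    for field, value in (terms or {}).items():
        # Exact-match screening on a condition feature, e.g. gender.
        filters.append({"term": {field: value}})
    if age_range is not None:
        low, high = age_range
        filters.append({"range": {"age": {"gte": low, "lte": high}}})
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": keyword, "fields": ["*"]}}],
                "filter": filters,
            }
        }
    }
```

For example, `build_screening_query("Zhang", category="person", terms={"gender": "female"}, age_range=(30, 40))` yields a body with one `multi_match` clause and three filters. Filters are used for the screening conditions because they narrow the result set without affecting relevance scoring.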

The working process of the super search module is as follows:

step 1, extracting data from an MPP DB, constructing a full-text index library, and unifying prefixes of index tables;

step 2, the system determines whether the input content is an independent keyword, a logic combination keyword, pinyin, or a picture:

if it is an independent keyword or pinyin, full-text retrieval is carried out directly through the word segmentation device and the query device; if it is a logic combination keyword (containing special symbols such as AND, OR, NOT or a blank space), rule analysis is first carried out through the parser, and full-text retrieval is then carried out through the word segmentation device and the query device; if it is a picture, target identity verification is first carried out through the image recognition service, and the retrieval result is given directly;

step 3, based on the search target, calling and reading the holographic file of the target object, and viewing basic information, track information, the relation map and other information of the search target in a multi-dimensional mode.
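
The input dispatch in step 2 can be sketched as a small classifier followed by a routing table. This is only an illustration: the detection rules (uppercase `AND`/`OR`/`NOT` or whitespace marks a logic combination; pure ASCII letters mark pinyin) and the stage names `parser`, `tokenizer`, `querier` and `image_recognition` are assumptions, as the patent does not state how the system tells the input types apart.

```python
import re
from enum import Enum
from typing import List, Optional


class InputKind(Enum):
    KEYWORD = "keyword"
    LOGICAL = "logical"
    PINYIN = "pinyin"
    PICTURE = "picture"


# Assumption: an operator word or any whitespace marks a logic combination.
_LOGICAL = re.compile(r"\b(AND|OR|NOT)\b|\s")
# Assumption: pinyin input consists only of ASCII letters.
_PINYIN = re.compile(r"[A-Za-z]+")


def classify_input(text: Optional[str] = None, image: Optional[bytes] = None) -> InputKind:
    """Decide which of the four input types from step 2 was submitted."""
    if image is not None:
        return InputKind.PICTURE
    query = (text or "").strip()
    if not query:
        raise ValueError("empty search input")
    if _LOGICAL.search(query):
        return InputKind.LOGICAL
    if _PINYIN.fullmatch(query):
        return InputKind.PINYIN
    return InputKind.KEYWORD


def route(kind: InputKind) -> List[str]:
    """Return the processing stages that step 2 names for each input type."""
    if kind is InputKind.PICTURE:
        return ["image_recognition"]
    if kind is InputKind.LOGICAL:
        return ["parser", "tokenizer", "querier"]
    # Independent keywords and pinyin go straight to full-text retrieval.
    return ["tokenizer", "querier"]
```

Only logic combination input passes through the parser first; the other text inputs skip it, and pictures bypass full-text retrieval entirely.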

The working process of the data management module is as follows:

step 1, extracting a data source from an MPP DB to realize metadata management and dictionary management;

step 2, registering through a resource directory, connecting a database, importing the database table, marking names, classifications, authorizations and the like of related data tables, and constructing a data resource directory;

step 3, according to the registered database tables and the associated metadata, flexibly setting fields of the related database tables as screening conditions of super search, as fields displayed on the result page or the detail page, and setting their display order and the like, thereby realizing resource item configuration.
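
The resource item configuration in step 3 amounts to per-field settings attached to a registered table. A minimal sketch of one possible data structure follows; the class `FieldConfig`, its attribute names and the two helper functions are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldConfig:
    """Display and screening settings for one field of a registered table."""
    name: str
    is_filter: bool = False        # usable as a super search screening condition
    show_in_results: bool = False  # shown on the result page
    show_in_details: bool = False  # shown on the detail page
    order: int = 0                 # display position, ascending


def result_columns(fields: List[FieldConfig]) -> List[str]:
    """Names of the fields shown on the result page, in display order."""
    visible = [f for f in fields if f.show_in_results]
    return [f.name for f in sorted(visible, key=lambda f: f.order)]


def filter_fields(fields: List[FieldConfig]) -> List[str]:
    """Names of the fields offered as screening conditions."""
    return [f.name for f in fields if f.is_filter]
```

Keeping these settings as data rather than code is what would let an administrator change the result page or the screening conditions without redeploying the search service.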

The working process of the holographic file module is as follows:

step 1, extracting a data source from the MPP DB to realize inquiry and display of basic information, track information, social information and the like;

step 2, extracting relevant picture information such as certificate head portraits, vehicle photos and the like from the HDFS;

step 3, querying entity and relationship information of the target object from the Neo4j database and displaying it in a relation map.
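
The three steps above assemble one holographic file from three stores. The sketch below shows that aggregation with the store accesses passed in as plain callables, so it runs without any database; the names `HolographicFile` and `build_holographic_file`, the callable signatures, and the labels in the sample Cypher string are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (source entity, relationship type, target entity)
Relation = Tuple[str, str, str]

# Illustrative Cypher only; the label and property names are assumptions.
EXAMPLE_CYPHER = "MATCH (t:Target {id: $id})-[r]-(n) RETURN t.id, type(r), n.id"


@dataclass
class HolographicFile:
    target_id: str
    basic_info: Dict[str, str]  # step 1: from the MPP DB
    pictures: List[str]         # step 2: file paths on HDFS
    relations: List[Relation]   # step 3: from Neo4j


def build_holographic_file(
    target_id: str,
    fetch_basic: Callable[[str], Dict[str, str]],
    fetch_pictures: Callable[[str], List[str]],
    fetch_relations: Callable[[str], List[Relation]],
) -> HolographicFile:
    """Collect the three parts of a holographic file for one target object."""
    return HolographicFile(
        target_id=target_id,
        basic_info=fetch_basic(target_id),
        pictures=fetch_pictures(target_id),
        relations=fetch_relations(target_id),
    )
```

In a real deployment each callable would wrap a client for the corresponding store; injecting them keeps the aggregation logic testable with in-memory stand-ins.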

The working process of the data storage module is as follows:

step 1, an MPP DB cluster stores full structured data and provides a quick read-write function;

step 2, the HDFS stores semi-structured data (such as Excel, TXT, CSV and the like) and unstructured data (such as pictures, videos, audios and the like), and provides an information sharing exchange interface for the outside;

step 3, the Neo4j high-performance graph database stores entity and relationship information of target objects (such as people, vehicles, mobile phones, enterprises, houses and the like), and provides an information sharing exchange interface for the outside.
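
The division of labour among the three stores can be summarised as a routing rule. The sketch below is an assumption-laden illustration: the function `choose_store`, the record kinds `table_row`, `entity`, `relationship` and `file`, and the extension lists are hypothetical, chosen only to mirror the examples given in the steps above.

```python
from pathlib import PurePath
from typing import Optional

# Extensions mirroring the examples in step 2 (Excel, TXT, CSV; pictures, videos, audios).
DOCUMENT_SUFFIXES = {".xlsx", ".xls", ".txt", ".csv"}
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".wav"}


def choose_store(kind: str, filename: Optional[str] = None) -> str:
    """Pick the backing store for a piece of data, following steps 1 to 3."""
    if kind == "table_row":
        return "mpp_db"  # step 1: structured data
    if kind in ("entity", "relationship"):
        return "neo4j"   # step 3: entities and their relationships
    if kind == "file" and filename is not None:
        suffix = PurePath(filename).suffix.lower()
        if suffix in DOCUMENT_SUFFIXES | MEDIA_SUFFIXES:
            return "hdfs"  # step 2: document and media files
    raise ValueError(f"no store configured for kind={kind!r}, filename={filename!r}")
```

Raising on unknown input, rather than defaulting to one store, makes unsupported data types visible instead of silently misplacing them.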

Claims

1. A government affair big data super search system is characterized in that: the system comprises a front-end display module, a super search module, a data management module, a holographic file module and a data storage module;

2. The government affair big data super search system according to claim 1, wherein: the working process of the front-end display module is as follows:

(21) inputting keywords, pinyin and pictures through a unified search entry;

(22) aiming at retrieval results of different conditions, search result display and secondary screening are provided according to different departments and different categories, the retrieval range is locked according to factors such as people, places, things, objects, groups and the like, secondary screening is carried out on condition characteristics of each factor, such as age bracket, sex, ethnicity, native place and the like of people, and a search target can be quickly and accurately positioned through secondary screening;

(23) displaying the holographic archive of the target object in multiple dimensions based on the search target.

3. The government affair big data super search system according to claim 1, wherein: the working process of the super search module is as follows:

(31) extracting data from the MPP DB, constructing a full-text index library, and unifying prefixes of all index tables;

(32) the system determines whether the input content is an independent keyword, a logic combination keyword, pinyin, or a picture: if it is an independent keyword or pinyin, full-text retrieval is carried out directly through the word segmentation device and the query device; if it is a logic combination keyword, rule analysis is first performed through the parser, and full-text retrieval is then performed through the word segmentation device and the query device; if it is a picture, target identity verification is first carried out through the image recognition service, and the retrieval result is given directly;

(33) based on the search target, the holographic archive of the target object is called and read, and basic information, track information, the relation map and other information of the search target are viewed in a multi-dimensional mode.

4. The government affair big data super search system according to claim 1, wherein: the working process of the data management module is as follows:

(41) extracting a data source from the MPP DB to realize metadata management and dictionary management;

(42) registering through a resource directory, connecting a database, importing database tables, and marking names, classifications, authorizations and the like of related data tables, so as to construct a data resource directory;

(43) according to the registered database tables and the associated metadata, flexibly setting fields of the related database tables as screening conditions of super search, as fields displayed on the result page or the detail page, and setting their display order and the like, so as to realize resource item configuration.

5. The government affair big data super search system according to claim 1, wherein: the working process of the holographic file module is as follows:

(51) extracting a data source from the MPP DB to realize the query and display of basic information, track information and social information;

(52) extracting related picture information including certificate head portraits and vehicle photos from the HDFS;

(53) querying the entity and relationship information of the target object from the Neo4j database and displaying it in a relation map.

6. The government affair big data super search system according to claim 1, wherein: the working process of the data storage module is as follows:

(61) the MPP DB cluster stores full structured data and provides a quick read-write function;

(62) the HDFS stores semi-structured data and provides an information sharing exchange interface for the outside;

(63) the Neo4j high-performance graph database stores entity and relationship information of target objects and provides an information sharing exchange interface for the outside.