CN112667700A - Government affair big data super search system - Google Patents

Info

Publication number
CN112667700A
Authority
CN
China
Prior art keywords
retrieval
data
search
module
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910980793.0A
Other languages
Chinese (zh)
Inventor
张丹普
董雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Changfeng Science Technology Industry Group Corp
Original Assignee
China Changfeng Science Technology Industry Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-16
Filing date
2019-10-16
Publication date
2021-04-16
Application filed by China Changfeng Science Technology Industry Group Corp
Priority to CN201910980793.0A
Publication of CN112667700A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a government affair big data super search system comprising a front-end display module, a super search module, a data governance module, a holographic archive module and a data storage module. The system adopts an ElasticSearch + MPP architecture, which mainly solves the problem of efficient query and retrieval over large-scale structured data. It efficiently supports keyword full-text retrieval, pinyin retrieval, range retrieval, logic combination retrieval and portrait retrieval, and allows the display fields of search results to be configured flexibly. It thereby addresses the shortcomings of conventional retrieval: low query efficiency, few retrievable data types, insufficient retrieval range, and search results that are difficult to display and modify. Combined with the multi-type data storage of MPP DB, HDFS and Neo4j, it enables quick and comprehensive query and viewing of the detail information of search results.

Description

Government affair big data super search system
Technical Field
The invention belongs to the technical field of big data retrieval, and relates to a government affair big data super search system based on ElasticSearch and MPP DB.
Background
Big data is a new stage of informatization development. With the convergence of information technology and human production and life, the internet is rapidly popularized, global data shows the characteristics of explosive growth and mass aggregation, and has great influence on economic development, social governance, national management and people's life. The implementation of the national big data strategy accelerates the construction of digital China, digital economy with data as key elements needs to be constructed, and the development and application of big data cannot be separated from the construction of a modern economic system.
Government affair big data refers to governments promoting the development of big data applications, or to the applied practice of big data in the field of public services. However, governments at all levels still face many problems in promoting government affair big data applications, and fast, efficient and accurate query and retrieval over massive multi-source heterogeneous data remains a major difficulty in big data applications.
Government affair big data search refers to searching, classifying, screening, filtering, sorting and similar operations on a large data set according to data characteristics such as keywords, semantics, contents and figures. Early data retrieval in government departments relied mainly on SQL-based database retrieval and, as data volumes grew, gradually moved to Solr-based full-text retrieval. In the big data era, however, once the data scale reaches a certain level, Solr search efficiency becomes very low. ElasticSearch is a real-time distributed search and analysis engine; ElasticSearch-based retrieval is more efficient for large-scale data and handles large indexes and high query rates more easily.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a government affair big data super search system which can efficiently realize keyword full-text search, pinyin search, range search, logic combination search and portrait search, can flexibly configure the display fields of search results, and solves the problems of low search and query efficiency, few types of search data, insufficient search range, difficult display and modification of search results and the like of the conventional search mode.
The technical scheme of the invention is as follows:
a government affair big data super search system is characterized in that: the system comprises a front-end display module, a super search module, a data management module, a holographic file module and a data storage module;
the front-end display module provides a unified search entry for keyword retrieval, pinyin retrieval, logic combination retrieval, picture retrieval and the like, provides search result display and screening according to different departments and different categories, and supports viewing of holographic files;
the super search module is based on the ElasticSearch full-text retrieval engine, constructs a full-text index library, and provides functions of a parser, image recognition, a word segmentation device, a query device, range retrieval, pinyin retrieval, logic combination retrieval, an external interface and the like;
the data management module is used for providing functions of metadata management, dictionary management, resource catalogue, resource item configuration and the like;
the holographic file module provides functions of quickly checking basic information, track information, relation maps and other information of target objects by searching a target result list;
the data storage module is used for constructing a data warehouse based on the MPP DB, completing picture storage based on the HDFS, and storing entity and relationship information of a target object by a Neo4j high-performance database to realize full data storage.
The invention can support accurate and fuzzy keyword query, portrait comparison, vehicle comparison, pinyin query, range query and logic combination query, provide a unified entry of various query modes, flexibly configure query range, support detail check of search targets and fast check of holographic files of the search targets, realize fast check, accurate check and complete check of government affair big data, and remarkably improve full text retrieval efficiency.
Drawings
FIG. 1 is a functional block and system flow diagram of the present invention.
Detailed Description
As shown in FIG. 1, the present invention includes a front-end display module, a super search module, a data management module, a holographic archive module, and a data storage module. The front-end display module provides a unified search entry for keyword retrieval, pinyin retrieval, logic combination retrieval, picture retrieval, etc., provides search result display and screening according to different departments and different categories, and supports viewing of holographic archives. The super search module is based on the ElasticSearch full-text retrieval engine, constructs a full-text index library, and provides functions of a parser, image recognition, a word segmentation device, a query device, range retrieval, pinyin retrieval, logic combination retrieval, an external interface and the like. The data management module provides functions of metadata management, dictionary management, resource catalogue, resource item configuration and the like. The holographic archive module provides functions of quickly checking basic information, track information, relation maps and other information of target objects from the search result list. The data storage module constructs a data warehouse based on the MPP DB, completes picture storage based on the HDFS, and stores entity and relationship information of target objects in a Neo4j high-performance graph database, realizing full data storage.
The working process of the front-end display module is as follows:
step 1, inputting keywords, pinyin and pictures through a unified search entry;
step 2, aiming at retrieval results of different conditions, providing search result display and secondary screening according to different departments and different categories, for example, carrying out retrieval range locking according to factors such as people, places, things, objects, groups and the like, and carrying out secondary screening on condition characteristics of each factor, such as age bracket, sex, nationality, native place and the like of people, and quickly and accurately positioning a search target through the secondary screening;
step 3, displaying the holographic file of the target object in a multi-dimensional mode based on the search target.
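The secondary screening described in step 2 maps naturally onto a filtered full-text query. The following is a minimal sketch, assuming the Elasticsearch bool query DSL is used for the search body; the function name and the field names (`category`, `department`, `gender`, `age`) are hypothetical and are not specified in the patent.

```python
def build_screened_query(keyword, category=None, department=None,
                         gender=None, age_range=None):
    """Build an Elasticsearch-style bool query body as a plain dict.

    The keyword is matched as full text; every screening condition is
    added as a non-scoring filter clause. All field names are assumed.
    """
    filters = []
    if category is not None:
        # Lock the retrieval range to one factor, e.g. "person" or "place".
        filters.append({"term": {"category": category}})
    if department is not None:
        filters.append({"term": {"department": department}})
    if gender is not None:
        filters.append({"term": {"gender": gender}})
    if age_range is not None:
        low, high = age_range
        # Range retrieval on a numeric field (inclusive bounds).
        filters.append({"range": {"age": {"gte": low, "lte": high}}})
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": keyword}}],
                "filter": filters,
            }
        }
    }


# Example: search for a name, restricted to people aged 30 to 40.
body = build_screened_query("Zhang", category="person", age_range=(30, 40))
```

Keeping the screening conditions in `filter` rather than `must` means they narrow the result set without affecting relevance scoring, which suits a "search first, then screen" workflow.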
The working process of the super search module is as follows:
step 1, extracting data from an MPP DB, constructing a full-text index library, and unifying prefixes of index tables;
step 2, the system judges whether the input content is an independent keyword, a logic combination keyword, pinyin, or a picture:
if it is an independent keyword or pinyin, full-text retrieval is carried out directly through the word segmentation device and the query device; if it is a logic combination keyword (containing special symbols such as AND, OR, NOT and blank space), rule analysis is first carried out through the parser, and full-text retrieval is then carried out through the word segmentation device and the query device; if it is a picture, target identity verification is first carried out through the image recognition service, and the retrieval result is given directly;
step 3, based on the search target, calling and reading the holographic file of the target object, and checking basic information, track information, the relation map and other information of the search target in a multi-dimensional mode.
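The input dispatch in step 2 can be sketched as a small classifier followed by a query builder. This is an illustrative sketch only: the classification rules, the `name.pinyin` field (assumed to be indexed with some pinyin analyzer) and the function names are assumptions, and the logic combination case is delegated to Elasticsearch's `query_string` query, which understands `AND`, `OR` and `NOT`, rather than to the patent's own parser.

```python
import re

LOGIC_TOKENS = {"AND", "OR", "NOT"}
ASCII_LETTERS = re.compile(r"[A-Za-z]+")


def classify_input(content):
    """Return 'picture', 'logic_combination', 'pinyin' or 'keyword'."""
    if isinstance(content, (bytes, bytearray)):
        return "picture"
    text = content.strip()
    tokens = text.split()
    # A blank space or an explicit operator marks a logic combination.
    if len(tokens) > 1 or LOGIC_TOKENS & set(tokens):
        return "logic_combination"
    # Simplification: any single run of ASCII letters is treated as pinyin,
    # so a single English word would also land in this branch.
    if ASCII_LETTERS.fullmatch(text):
        return "pinyin"
    return "keyword"


def build_query(content):
    """Return (kind, body); body is None for pictures, which the patent
    routes to an image recognition service instead of the index."""
    kind = classify_input(content)
    if kind == "picture":
        return kind, None
    text = content.strip()
    if kind == "logic_combination":
        # default_operator="AND" makes a blank space behave like AND.
        body = {"query": {"query_string": {"query": text,
                                           "default_operator": "AND"}}}
    elif kind == "pinyin":
        body = {"query": {"match": {"name.pinyin": text}}}  # assumed field
    else:
        body = {"query": {"multi_match": {"query": text}}}
    return kind, body


kind, body = build_query("张三 AND 北京")
```

A production parser would also need to validate operator placement and escape reserved characters before handing user input to `query_string`.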
The working process of the data management module is as follows:
step 1, extracting a data source from an MPP DB to realize metadata management and dictionary management;
step 2, registering through a resource directory, connecting a database, importing the database table, marking names, classifications, authorizations and the like of related data tables, and constructing a data resource directory;
step 3, according to the registered database tables and the associated metadata, flexibly setting fields of the related database tables as screening conditions of the super search, as result page fields or detail page fields, and setting their display order and the like, thereby realizing resource item configuration.
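Resource item configuration in step 3 amounts to per-field metadata that the search front end reads at query time. A minimal sketch of such a configuration follows, assuming a simple in-memory representation; the class, attribute and table names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldConfig:
    """Display and screening settings for one column of a registered table."""
    name: str
    is_filter: bool = False       # usable as a super search screening condition
    in_result_page: bool = False  # shown in the search result list
    in_detail_page: bool = False  # shown on the detail page
    position: int = 0             # display order, ascending


def result_columns(fields):
    """Names of the fields shown on the result page, in display order."""
    shown = [f for f in fields if f.in_result_page]
    return [f.name for f in sorted(shown, key=lambda f: f.position)]


def filter_fields(fields):
    """Names of the fields offered as screening conditions."""
    return [f.name for f in fields if f.is_filter]


# Hypothetical configuration for a registered "person" table.
PERSON_TABLE = [
    FieldConfig("name", in_result_page=True, in_detail_page=True, position=1),
    FieldConfig("gender", is_filter=True, in_result_page=True,
                in_detail_page=True, position=3),
    FieldConfig("age", is_filter=True, in_result_page=True, position=2),
    FieldConfig("address", in_detail_page=True, position=4),
]

print(result_columns(PERSON_TABLE))  # ['name', 'age', 'gender']
print(filter_fields(PERSON_TABLE))   # ['gender', 'age']
```

Because the configuration is data rather than code, changing which fields appear on a page does not require touching the search or display logic.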
The working process of the holographic file module is as follows:
step 1, extracting a data source from the MPP DB to realize inquiry and display of basic information, track information, social information and the like;
step 2, extracting relevant picture information such as certificate head portraits, vehicle photos and the like from the HDFS;
step 3, querying entity and relationship information of the target object from the Neo4j database and displaying it as a relation map.
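The three steps above assemble one archive from three stores. The sketch below shows one way to express that, with the data access functions injected as callables so the example runs without any database; the Cypher text, the `id` property and all function names are assumptions for illustration, not the patent's implementation.

```python
# Parameterised Cypher: neighbours of the target and the relationship types.
# The node property "id" is an assumed schema detail.
RELATION_QUERY = (
    "MATCH (t {id: $target_id})-[r]-(n) "
    "RETURN type(r) AS relation, n.id AS neighbour_id "
    "LIMIT $limit"
)


def build_archive(target_id, fetch_basic, fetch_pictures, run_cypher, limit=50):
    """Combine data from the three stores into one archive dict.

    fetch_basic(target_id)     -> basic/track/social info (MPP DB in the patent)
    fetch_pictures(target_id)  -> picture references (HDFS in the patent)
    run_cypher(query, params)  -> relationship rows (Neo4j in the patent)
    """
    return {
        "target_id": target_id,
        "basic": fetch_basic(target_id),
        "pictures": fetch_pictures(target_id),
        "relations": run_cypher(
            RELATION_QUERY, {"target_id": target_id, "limit": limit}
        ),
    }


# In-memory stand-ins for the real stores, for demonstration only.
def demo_basic(target_id):
    return {"name": "example person"}


def demo_pictures(target_id):
    return [f"/archive/{target_id}/id_photo.jpg"]


def demo_cypher(query, params):
    return [{"relation": "OWNS", "neighbour_id": "vehicle-1"}]


archive = build_archive("person-1", demo_basic, demo_pictures, demo_cypher)
```

In a real deployment `run_cypher` would wrap a graph database client call that accepts a query string plus a parameter map; passing parameters separately instead of formatting them into the query avoids injection problems.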
The working process of the data storage module is as follows:
step 1, an MPP DB cluster stores full structured data and provides a quick read-write function;
step 2, the HDFS stores semi-structured data (such as Excel, TXT, CSV and the like) and unstructured data (such as pictures, videos, audio and the like), and provides an information sharing exchange interface for the outside;
step 3, the Neo4j high-performance graph database stores entity and relationship information of target objects (such as people, vehicles, mobile phones, enterprises, houses and the like), and provides an information sharing exchange interface for the outside.
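The division of labour among the three stores can be summarised as a routing rule. The following sketch is an assumption-laden illustration: the item shape (`kind`, `name`), the suffix lists and the store labels are all hypothetical, chosen only to mirror the examples named in the steps above.

```python
from pathlib import PurePosixPath

# File types named as examples in the description; the lists are illustrative.
DOCUMENT_SUFFIXES = {".xls", ".xlsx", ".txt", ".csv"}
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".wav"}


def choose_store(item):
    """Return 'mpp_db', 'hdfs' or 'neo4j' for a data item.

    item is a dict with a 'kind' key; file items also carry a 'name'.
    """
    kind = item.get("kind")
    if kind == "table_row":
        return "mpp_db"   # structured data goes to the MPP DB cluster
    if kind in ("entity", "relationship"):
        return "neo4j"    # graph data goes to the graph database
    if kind == "file":
        suffix = PurePosixPath(item["name"]).suffix.lower()
        if suffix in DOCUMENT_SUFFIXES | MEDIA_SUFFIXES:
            return "hdfs"  # documents and media files go to HDFS
    raise ValueError(f"no storage rule for item: {item!r}")


print(choose_store({"kind": "file", "name": "id_photo.JPG"}))  # hdfs
print(choose_store({"kind": "relationship"}))                  # neo4j
```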

Claims (6)

1. A government affair big data super search system is characterized in that: the system comprises a front-end display module, a super search module, a data management module, a holographic file module and a data storage module;
the front-end display module provides a unified search entry for keyword retrieval, pinyin retrieval, logic combination retrieval, picture retrieval and the like, provides search result display and screening according to different departments and different categories, and supports viewing of holographic files;
the super search module is based on the ElasticSearch full-text retrieval engine, constructs a full-text index library, and provides functions of a parser, image recognition, a word segmentation device, a query device, range retrieval, pinyin retrieval, logic combination retrieval, an external interface and the like;
the data management module is used for providing functions of metadata management, dictionary management, resource catalogue, resource item configuration and the like;
the holographic file module provides functions of quickly checking basic information, track information, relation maps and other information of target objects by searching a target result list;
the data storage module is used for constructing a data warehouse based on the MPP DB, completing picture storage based on the HDFS, and storing entity and relationship information of a target object by a Neo4j high-performance database to realize full data storage.
2. The government affair big data super search system according to claim 1, wherein: the working process of the front-end display module is as follows:
(21) inputting keywords, pinyin and pictures through a unified search entry;
(22) aiming at retrieval results of different conditions, search result display and secondary screening are provided according to different departments and different categories, the retrieval range is locked according to factors such as people, places, things, objects, groups and the like, secondary screening is carried out on condition characteristics of each factor, such as age bracket, sex, ethnicity, native place and the like of people, and a search target can be quickly and accurately positioned through secondary screening;
(23) displaying the holographic archive of the target object in multiple dimensions based on the search target.
3. The government affair big data super search system according to claim 1, wherein: the working process of the super search module is as follows:
(31) extracting data from the MPP DB, constructing a full-text index library, and unifying prefixes of all index tables;
(32) the system judges whether the input content is an independent keyword, a logic combination keyword, pinyin, or a picture: if it is an independent keyword or pinyin, full-text retrieval is carried out directly through the word segmentation device and the query device; if it is a logic combination keyword, rule analysis is first performed through the parser, and full-text retrieval is then performed through the word segmentation device and the query device; if it is a picture, target identity verification is first carried out through the image recognition service, and the retrieval result is given directly;
(33) based on the search target, the holographic archive of the target object is called and read, and basic information, track information, the relation map and other information of the search target are checked in a multi-dimensional mode.
4. The government affair big data super search system according to claim 1, wherein: the working process of the data governance module is as follows:
(41) extracting a data source from the MPP DB to realize metadata management and dictionary management;
(42) constructing a data resource directory through resource directory registration, database connection, database table import, and marking of names, classifications, authorizations and the like on the related data tables;
(43) according to the registered database tables and the associated metadata, flexibly setting fields of the related database tables as screening conditions of the super search, as result page fields or detail page fields, and setting their display order and the like, so as to realize resource item configuration.
5. The government affair big data super search system according to claim 1, wherein: the working process of the holographic file module is as follows:
(51) extracting a data source from the MPP DB to realize the query and display of basic information, track information and social information;
(52) extracting related picture information including certificate head portraits and vehicle photos from the HDFS;
(53) querying the entity and relationship information of the target object from the Neo4j database and displaying it as a relationship graph.
6. The government affair big data super search system according to claim 1, wherein: the working process of the data storage module is as follows:
(61) the MPP DB cluster stores full structured data and provides a quick read-write function;
(62) the HDFS stores semi-structured data and provides an information sharing exchange interface for the outside;
(63) the Neo4j high-performance graph database stores entity and relationship information of target objects and provides an information sharing exchange interface for the outside.
CN201910980793.0A 2019-10-16 2019-10-16 Government affair big data super search system Pending CN112667700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910980793.0A CN112667700A (en) 2019-10-16 2019-10-16 Government affair big data super search system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910980793.0A CN112667700A (en) 2019-10-16 2019-10-16 Government affair big data super search system

Publications (1)

Publication Number Publication Date
CN112667700A (en) 2021-04-16

Family

ID=75400139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910980793.0A Pending CN112667700A (en) 2019-10-16 2019-10-16 Government affair big data super search system

Country Status (1)

Country Link
CN (1) CN112667700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051441A (en) * 2021-06-01 2021-06-29 北京道达天际科技有限公司 Storage design and management method of entity object
CN113849695A (en) * 2021-05-13 2021-12-28 南京爱福路汽车科技有限公司 Method and system for intelligently retrieving vehicle type based on vehicle set

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849695A (en) * 2021-05-13 2021-12-28 南京爱福路汽车科技有限公司 Method and system for intelligently retrieving vehicle type based on vehicle set
CN113849695B (en) * 2021-05-13 2024-05-03 南京爱福路汽车科技有限公司 Method and system for intelligently searching vehicle types based on vehicle groups
CN113051441A (en) * 2021-06-01 2021-06-29 北京道达天际科技有限公司 Storage design and management method of entity object

Similar Documents

Publication Publication Date Title
US8914414B2 (en) Integrated repository of structured and unstructured data
US8862566B2 (en) Systems and methods for intelligent parallel searching
US8311999B2 (en) System and method for knowledge research
US20120117116A1 (en) Extended Database Search
US9442905B1 (en) Detecting neighborhoods from geocoded web documents
US20090204590A1 (en) System and method for an integrated enterprise search
US20100198804A1 (en) Security management for data virtualization system
WO2012129149A2 (en) Aggregating search results based on associating data instances with knowledge base entities
US9047368B1 (en) Self-organizing user-centric document vault
CN112667701A (en) Government affair big data super search method
Giangreco et al. ADAM pro: Database support for big multimedia retrieval
US20140019454A1 (en) Systems and Methods for Caching Data Object Identifiers
CN112667700A (en) Government affair big data super search system
CN111680043B (en) Method for quickly retrieving mass data
CN115145871A (en) File query method and device and electronic equipment
US20210382885A1 (en) Collaborating using different object models
US8805820B1 (en) Systems and methods for facilitating searches involving multiple indexes
Mondal et al. Efficient indexing of top-k entities in systems of engagement with extensions for geo-tagged entities
CN116483829A (en) Data query method, device, computer equipment and storage medium
US20230367750A1 (en) System and method for assigning an entity a unique identifier
US11954223B2 (en) Data record search with field level user access control
CN115617905A (en) Method and system for quickly retrieving cloud disk metadata
Karthikeyan et al. Deep Root Memory Optimized Indexing Methodology for Image Search Engines.
CN110019993B (en) Method for realizing sequencing optimization algorithm technology based on massive standard literature data
Aravinth et al. Study and Analysis of Various Big Data Analytics Tools for Data Processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination