CN115757689A - Information query system, method and equipment - Google Patents

Publication number
CN115757689A
Authority
CN
China
Prior art keywords
data
information
layer
module
organization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211153838.5A
Other languages
Chinese (zh)
Inventor
苗壮
刘鹏年
武帅
李杏军
赵晋巍
于安妮
刘奇林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Military Science Information Research Center Of Military Academy Of Chinese Pla
Original Assignee
Military Science Information Research Center Of Military Academy Of Chinese Pla
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Military Science Information Research Center Of Military Academy Of Chinese Pla filed Critical Military Science Information Research Center Of Military Academy Of Chinese Pla
Priority to CN202211153838.5A priority Critical patent/CN115757689A/en
Publication of CN115757689A publication Critical patent/CN115757689A/en
Pending legal-status Critical Current

Abstract

The invention discloses an information query system, method and equipment, relates to the technical field of the Internet, and addresses the problem in the prior art that only research reports are available and that queried information is incomplete and inaccurate. The information query system comprises at least a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer. The data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer processes the organization-related data to obtain processed data; the data service layer performs data service processing on the processed data; the application layer comprises a data processing and fusion module, an organization classification knowledge base, a knowledge search module, a visualization module and a system management module; the display layer is the interactive entry point between the user and the platform and, based on the user's information query request, displays target information together with associated information whose association degree with the target information meets a preset condition. The system can improve both the efficiency and the accuracy of information queries.

Description

Information query system, method and equipment
Technical Field
The invention relates to the technical field of internet, in particular to an information query system, method and device.
Background
With the widespread adoption of mobile intelligent terminals and the rapid, sustained development of communication and mobile Internet technology, location-based information query services play an essential role in daily life. Internet technology is likewise important for institutions in fields such as national defense, energy, aerospace and medicine. Taking scientific research institutions as an example, analyzing and studying the research projects, construction and management of national laboratories in the national defense field provides an important reference for the ongoing work of planning and justifying the construction of such laboratories.
Information about scientific research institutions in the national defense field covers many topics and comes from many sources. Much of it is scattered across different service branches and platforms and presented in different forms; some of it does not exist independently at all and must be analyzed and mined from documents or other research outputs, which makes it very inconvenient to develop and use. Taking the research and test information of the major research institutions in the national defense field as the object of study, analyzing, associating and mining it, and building a domain knowledge base for those institutions provides a service to those who manage equipment research and testing and to front-line researchers. Users can then keep abreast of the overall research status of the target's major research institutions, which offers a reference for organizing and carrying out key research efforts on specific equipment models and is an important guide for subsequent related work.
Therefore, it is desirable to provide a more reliable information query architecture.
Disclosure of Invention
The invention aims to provide an information query system, an information query method and information query equipment, which are used for solving the problems that only research reports exist and query information is incomplete and inaccurate in the prior art.
In order to achieve the above purpose, the invention provides the following technical scheme:
in a first aspect, the present invention provides an information query system, which at least includes:
the system comprises a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer;
the data acquisition layer acquires organization-related data from a plurality of data sources;
the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration;
the data service layer is used for carrying out data service processing on the processed data; the data service layer at least comprises one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module;
the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module;
the display layer is the interactive entry point between a user and the platform and, based on an information query request input by the user, displays target information corresponding to the request together with associated information of the target information; the association degree between the target information and the associated information meets a preset condition.
In a second aspect, the present invention provides an information query method applied to an information query system, where the information query system comprises at least: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer performs data service processing on the processed data; the data service layer comprises at least one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusion module, an organization classification knowledge base, a knowledge search module, a visualization module and a system management module; the display layer is the interactive entry point between a user and the platform;
the information query method comprises the following steps:
acquiring an information query request input by a user through the display layer;
searching target information matched with the information query request and associated information of the target information from the information query system based on the information query request; the association degree between the target information and the associated information meets a preset condition;
and displaying the target information and the associated information of the target information through a display layer.
In a third aspect, the present invention provides an information query device applied to an information query system, where the information query system comprises at least: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer performs data service processing on the processed data; the data service layer comprises at least one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusion module, an organization classification knowledge base, a knowledge search module, a visualization module and a system management module; the display layer is the interactive entry point between a user and the platform; the device comprises:
the communication unit/communication interface is used for acquiring an information query request input by a user through the display layer;
the processing unit/processor is used for searching target information matched with the information query request and the associated information of the target information from the information query system based on the information query request; the association degree between the target information and the associated information meets a preset condition;
and displaying the target information and the associated information of the target information through a display layer.
Compared with the prior art, the invention provides an information query system, method and equipment. The information query system comprises at least a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer. The data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data extraction, data cleaning, data conversion and/or data integration on the organization-related data to obtain processed data; the data service layer performs data service processing on the processed data and comprises at least one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusion module, an organization classification knowledge base, a knowledge search module, a visualization module and a system management module; the display layer is the interactive entry point between the user and the platform and, based on the information query request input by the user, displays target information corresponding to the request together with associated information whose association degree with the target information meets a preset condition. Because data related to the target organization is collected from multidimensional sources, the data can comprehensively cover the target organization; valuable information is selected for analysis and compilation and is processed in several ways, which meets the needs of multiple query and retrieval modes; and the processed data is stored in the knowledge base by type dimension, which improves both the efficiency and the accuracy of information queries.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of an information query system architecture according to the present invention;
FIG. 2 is a schematic flow chart of an information query method according to the present invention;
FIG. 3 is a schematic structural diagram of an information query device provided by the present invention.
Detailed Description
To describe the technical solutions of the embodiments of the present invention clearly, words such as "first" and "second" are used in the embodiments to distinguish identical or similar items that have substantially the same function and effect. For example, the first threshold and the second threshold merely distinguish different thresholds and do not imply an order. Those skilled in the art will appreciate that the terms "first," "second," etc. do not limit quantity or order, nor do they indicate relative importance.
It is to be understood that the terms "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
In the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the relationship between associated objects and covers three cases; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or multiple items. For example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b and c, where a, b and c may each be single or multiple.
Next, the scheme provided by the embodiments of the present specification will be described with reference to the accompanying drawings:
Example 1
Fig. 1 is a schematic diagram of an information query system architecture provided by the present invention, where the information query system at least includes:
a data acquisition layer (in figure 1, referred to as an acquisition layer for short), a data processing layer, a storage layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources;
the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration;
the data service layer is used for carrying out data service processing on the processed data; the data service layer at least comprises one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module;
the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module;
the display layer is the interactive entry point between a user and the platform and, based on an information query request input by the user, displays target information corresponding to the request together with associated information of the target information; the association degree between the target information and the associated information meets a preset condition.
The preset condition on the association degree may be that the associated information lies within one, two or more layers of association from the target information. The number of layers may be determined from a knowledge graph, in which each node represents an entity and each edge represents an association relationship. For example, in the chain A-B-C-D, A, B, C and D are four nodes and each "-" is an edge; A and D are separated by three edges, so A and D can be considered to have three layers of association.
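The layer count described above can be read as the shortest-path distance, in edges, between two entities in the knowledge graph. The following is a minimal sketch under that assumption, using a breadth-first search over an adjacency dictionary. The function names, the `max_layers` threshold and the undirected treatment of edges are illustrative assumptions, not details given in the specification.

```python
from collections import deque


def association_layers(graph, source):
    """Return {entity: number of edges on the shortest path from source}."""
    layers = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in layers:
                # First visit in a breadth-first search gives the shortest distance.
                layers[neighbor] = layers[node] + 1
                queue.append(neighbor)
    return layers


def associated_entities(graph, target, max_layers):
    """Entities whose layer count from target is between 1 and max_layers (assumed preset condition)."""
    layers = association_layers(graph, target)
    return {entity: depth for entity, depth in layers.items() if 0 < depth <= max_layers}


# The chain A-B-C-D from the text, stored as an undirected adjacency dictionary.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(association_layers(chain, "A")["D"])   # layers between A and D
print(associated_entities(chain, "A", 2))    # entities within two layers of A
```

With a threshold of two layers, a query whose target is A would return B and C as associated information but not D.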
The information query system in FIG. 1 collects data related to the target organization from multidimensional data sources, so the data can comprehensively cover the target organization; valuable information is selected for analysis and compilation and is processed in several ways, which meets the needs of multiple query and retrieval modes; and the processed data is stored in the knowledge base by type dimension, which improves both the efficiency and the accuracy of information queries.
In this scheme, the organization may be a scientific research institution, a medical institution, a national defense institution, an energy institution, an aerospace institution, or the like. This specification uses scientific research institutions as the example, focusing on the major research institutions in the national defense field, in particular the research plans, research projects and research priorities published by national laboratories, research laboratories and national test ranges. Using advanced, mature information technology and taking the characteristics of the research data into account, the scheme collects and organizes the research plans, research and test projects, management policies and investment-direction data published by the target's major national defense research institutions, selects valuable information for analysis and compilation, and fuses data from different sources to build an information query system. This changes the earlier service model, which offered only research reports and news updates, and serves relevant users such as senior leaders, administrative bodies and front-line researchers.
As shown in FIG. 1, the bottom layer is the data sources, which supply structured, semi-structured and unstructured data: official websites of scientific research institutions, news websites, procurement data (i.e., scientific research project data), resume data, encyclopedia data and search engines. The data acquisition layer crawls domestic and foreign open-source information; the crawled information is combined with manual processing and is made available through an interface. The data processing layer performs automatic data processing, including data extraction, data cleaning, data conversion and data integration. This layer and the acquisition layer together carry out the initial processing of the collected data. The storage layer stores the various data resources and defines the format and structure of data storage; it holds data such as basic organization information, research results, development history, news updates, research projects, person biographies, plans, and policies and regulations. The data service layer provides data service functions on top of the database, including model training, full-text retrieval, semantic analysis and association analysis. The application layer hosts several application service modules, including processing and fusion, the organization classification knowledge base, knowledge search, visualization and system management. The top layer is the display layer, the interactive entry point between the user and the platform, through which a knowledge service portal is provided to terminals.
The information query system in FIG. 1 is divided according to service requirements into 6 primary function modules, 25 secondary function modules and 5 tertiary function modules. The 6 primary function modules are: the data acquisition module, the data processing and fusion module, the organization classification knowledge base, the knowledge search module, the visualization module and the system management module.
The 25 secondary function modules are as follows:
the data acquisition module comprises an information source splitting and configuration module, a data acquisition and extraction module, an acquisition task scheduling module, a failure detection module and a log statistics module;
the data processing and fusion module comprises a multi-source heterogeneous data fusion, extraction and labeling module, a file import module, a manual labeling and processing module, a multi-dimensional data management module and a data classification and release module;
the organization classification knowledge base comprises an organization profile module, an organization document module and a knowledge association module;
the knowledge search module comprises a full-library and sub-library search module, a Chinese and English retrieval module, a combined retrieval module, a relevance ranking module, a result statistics module and a formatted document retrieval module;
the visualization module comprises a data statistics module and a data visualization module;
the system management module comprises a user management and authorization module, a data maintenance management module, a log management module and a data import and export module.
The 5 tertiary function modules are as follows:
the knowledge association module comprises a product association module, a project association module and a person association module;
the data statistics module comprises a business data statistics module and a user behavior statistics module.
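The module hierarchy listed above can be written down as a nested structure and checked against the stated totals of 6 primary, 25 secondary and 5 tertiary modules. The sketch below is only an illustration: the dictionary layout, the `MODULE_TREE` name and the `count_by_level` helper are assumptions, and the module names are paraphrased from the listing rather than taken from any defined interface.

```python
# Hypothetical nested representation: each key is a module name,
# each value is a dictionary of that module's sub-modules.
MODULE_TREE = {
    "data acquisition": {
        "information source splitting and configuration": {},
        "data acquisition and extraction": {},
        "acquisition task scheduling": {},
        "failure detection": {},
        "log statistics": {},
    },
    "data processing and fusion": {
        "multi-source heterogeneous data fusion, extraction and labeling": {},
        "file import": {},
        "manual labeling and processing": {},
        "multi-dimensional data management": {},
        "data classification and release": {},
    },
    "organization classification knowledge base": {
        "organization profile": {},
        "organization document": {},
        "knowledge association": {
            "product association": {},
            "project association": {},
            "person association": {},
        },
    },
    "knowledge search": {
        "full-library and sub-library search": {},
        "Chinese and English retrieval": {},
        "combined retrieval": {},
        "relevance ranking": {},
        "result statistics": {},
        "formatted document retrieval": {},
    },
    "visualization": {
        "data statistics": {
            "business data statistics": {},
            "user behavior statistics": {},
        },
        "data visualization": {},
    },
    "system management": {
        "user management and authorization": {},
        "data maintenance management": {},
        "log management": {},
        "data import and export": {},
    },
}


def count_by_level(tree, level=1, counts=None):
    """Count modules at each depth of the tree (1 = primary, 2 = secondary, ...)."""
    counts = {} if counts is None else counts
    for children in tree.values():
        counts[level] = counts.get(level, 0) + 1
        count_by_level(children, level + 1, counts)
    return counts


print(count_by_level(MODULE_TREE))
```

Counting per level is a quick consistency check that the listing and the stated totals agree.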
Optionally, the plurality of data sources may include official websites of trusted organizations, social media, news websites, employment data sheets, resume databases, encyclopedia data and/or search engines, where a trusted organization may be a scientific research institution.
The organization-related data may include organization data, project data, and personnel data of each organization; the organization data can comprise basic organization information, organization architecture, organization research results, organization development process, organization news dynamics, organization scientific research projects and organization technical expert data; the data acquisition layer acquires organization-related data from a plurality of data sources, and specifically may include:
the data acquisition layer acquires the organization data, project data and personnel data of each organization from the plurality of data sources by selecting a corresponding acquisition mode; the acquisition modes comprise: anti-crawler evasion, traffic-monitoring avoidance, distributed acquisition, automatic acquisition, incremental acquisition, automatic encoding handling, multi-format acquisition and automatic filtering acquisition.
Taking a scientific research institution as an example, data collection can focus on national laboratories, research laboratories and national test ranges in the national defense field, and can comprehensively cover organizations, projects and personnel. The collected organization data includes basic organization information, organizational structure, research results, development history, news updates, research projects, technical experts and so on. The collected project data includes project name, project introduction, project time, project amount and so on. The collected personnel data includes basic personal information, news updates, social media and so on.
Optionally, after the data service layer is configured to perform data service processing on the processed data, the method may further include:
acquiring an organization ID;
associating the organization related data based on the organization ID to generate an associated data table;
setting a storage structure, field attributes and field descriptions for each associated data table, and storing the processed data, according to a plurality of different data type dimensions, into an organization database, a research result library, a news dynamic library, a scientific research project library, a person introduction library and a policy and regulation library respectively, so as to associate description metadata, management metadata, digital objects and data; the metadata includes title, content, keywords, publication time, source and link.
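The steps above can be sketched as routing each processed record into a library chosen by its data type and grouping it under its organization ID. This is a minimal sketch: the `Record` class, the `data_type` labels and the in-memory dictionaries are assumptions made for illustration, and only the six library categories and the six metadata fields come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Library names mirror the six libraries named in the text; the short labels are assumed.
LIBRARIES = ("organization", "research_result", "news", "project", "person", "policy")


@dataclass
class Record:
    org_id: str            # organization ID used as the association key
    data_type: str         # one of LIBRARIES (assumed labelling)
    title: str
    content: str
    keywords: tuple
    publication_time: str
    source: str
    link: str


def store_by_type(records):
    """Place each record in the library for its type, grouped by organization ID."""
    libraries = {name: defaultdict(list) for name in LIBRARIES}
    for record in records:
        if record.data_type not in libraries:
            raise ValueError(f"unknown data type: {record.data_type}")
        libraries[record.data_type][record.org_id].append(record)
    return libraries


sample = [
    Record("ORG-001", "news", "Lab opens facility", "...", ("facility",),
           "2022-01-01", "example news site", "https://example.org/n1"),
    Record("ORG-001", "project", "Project One", "...", ("materials",),
           "2022-02-01", "example portal", "https://example.org/p1"),
    Record("ORG-002", "news", "New director named", "...", ("personnel",),
           "2022-03-01", "example news site", "https://example.org/n2"),
]
libraries = store_by_type(sample)
print([r.title for r in libraries["news"]["ORG-001"]])
```

Keying every library on the same organization ID is what lets a query for one organization pull its news, projects and people together.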
Taking a scientific research institution as an example, the characteristics of data from each source are studied, and data specifications are drawn up in light of the requirements for building the knowledge base. The set of data elements needed to describe the general attributes of each resource type (organization information, research results, news updates, research projects and person resumes) is specified and stored, so that each resource type can be described, published, managed, stored and interoperated. Association is organized around the organization: associated data tables are designed with the organization ID at their core, and the storage structure, field attributes and field descriptions of each table are set, so that description metadata, management metadata, digital objects and data are associated.
The organization knowledge base may include the English name, Chinese name, organization type, establishment date, organization introduction, organizational chart and other data for scientific research institutions, colleges and universities, and military enterprises. The structure of the organization database's master database is shown in Table 1.
TABLE 1 organization database Master database Structure Table
Figure BDA0003857528130000091
The research result library holds the main research results of authoritative organizations, namely scientific research institutions, research management institutions, colleges and universities, and military enterprises. Its contents include organization name, result name, result type, result introduction, result picture, keywords, result source and so on, and it can be used in association with the organization database. The structure of the research result library's master database is shown in Table 2.
TABLE 2 Structure Table of Master database of research results library
| Name | Identifier | Data type | Data length | Remarks |
|---|---|---|---|---|
| Result ID | Id | Character | 50 | Primary key |
| Organization code | org_code | Character | 50 | Associated organization (many-to-one relation) |
| Organization name | org_name | Character | 200 | |
| Result name | Name | Character | 200 | |
| Result type | type | Character | 50 | |
| Result picture | picture | Character | 500 | |
| Keyword | keyword | Character | 200 | |
| Result introduction | Profile | Text | | |
| Data source | Source | Character | 200 | |
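As an illustration of how the fields in Table 2 might map onto a relational schema, the sketch below creates the table in an in-memory SQLite database and links it to an organization table through `org_code`. The organization table here is a hypothetical two-column stand-in, since Table 1 is only available as an image; the table names, sample rows and the `results_for_org` helper are likewise assumptions.

```python
import sqlite3


def build_result_db():
    """Create an in-memory database with a stand-in organization table and the Table 2 fields."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organization (          -- hypothetical stand-in for Table 1
            org_code VARCHAR(50) PRIMARY KEY,
            org_name VARCHAR(200)
        );
        CREATE TABLE research_result (       -- columns follow Table 2
            id       VARCHAR(50) PRIMARY KEY,
            org_code VARCHAR(50) REFERENCES organization(org_code),
            org_name VARCHAR(200),
            name     VARCHAR(200),
            type     VARCHAR(50),
            picture  VARCHAR(500),
            keyword  VARCHAR(200),
            profile  TEXT,
            source   VARCHAR(200)
        );
        """
    )
    conn.execute("INSERT INTO organization VALUES (?, ?)", ("ORG-001", "Example Laboratory"))
    conn.executemany(
        "INSERT INTO research_result (id, org_code, org_name, name, type) VALUES (?, ?, ?, ?, ?)",
        [
            ("R-2", "ORG-001", "Example Laboratory", "Result B", "report"),
            ("R-1", "ORG-001", "Example Laboratory", "Result A", "paper"),
        ],
    )
    return conn


def results_for_org(conn, org_code):
    """Many results can point at one organization; list their names alphabetically."""
    rows = conn.execute(
        "SELECT name FROM research_result WHERE org_code = ? ORDER BY name", (org_code,)
    )
    return [row[0] for row in rows]


conn = build_result_db()
print(results_for_org(conn, "ORG-001"))
```

The many-to-one relation noted in the remarks column corresponds to several `research_result` rows sharing one `org_code`.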
The news dynamic library holds news found on the Internet about scientific research institutions, research management institutions, colleges and universities, and military enterprises. Its contents include news title, news content, keywords, news source and so on; each item is tagged with an organization ID, which associates it with the organization. The structure of the news dynamic library database is shown in Table 3.
TABLE 3 News dynamic library Master database Structure Table
| Name | Identifier | Data type | Data length | Remarks |
|---|---|---|---|---|
| News ID | Id | Character | 50 | Primary key |
| Organization code | org_code | Character | 50 | Associated organization table (many-to-one relation) |
| News headline | org_name | Character | 200 | |
| News content | content | Text | | |
| Release date | Publish_date | Date | 20 | |
| Keyword | keyword | Character | 200 | Delimiter-separated |
| Data source | Source | Character | 200 | |
The scientific research project library holds information on projects initiated or completed by an organization. Its main contents include the initiating organization, the completing organization, project name, project introduction, keywords, project establishment date, project funds and so on. Both the initiating organization and the completing organization are associated with the organization database. The structure of the scientific research project library database is shown in table 3.
TABLE 3 Master database structure table of scientific research project library
| Name | Identifier | Data type | Data length | Remarks |
|---|---|---|---|---|
| Project ID | ID | Character | 50 | Primary key |
| Party A code | Part_a_code | Character | 50 | Associated organization table (many-to-one relation) |
| Party A name | Part_a_name | Character | 200 | |
| Party B code | Part_b_code | Character | 50 | Associated organization table (many-to-one relation) |
| Party B name | Part_b_name | Character | 200 | |
| Project name | title | Character | 300 | |
| Project introduction | description | Text | | |
| Establishment date | establishment_date | Date | 20 | |
| Project funds | funds | Numeric | 20,4 | |
| Data source | Source | Character | 200 | |
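Because the project table carries two organization codes, one for the initiating party and one for the completing party, a lookup for a given organization has to check both columns. The following sketch shows one way this could be queried in SQLite. The organization table is again a hypothetical stand-in, and the table names, sample rows and the `projects_involving` helper are illustrative assumptions.

```python
import sqlite3


def build_project_db():
    """In-memory database with a stand-in organization table and the project fields above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organization (          -- hypothetical stand-in for Table 1
            org_code VARCHAR(50) PRIMARY KEY,
            org_name VARCHAR(200)
        );
        CREATE TABLE project (
            id                 VARCHAR(50) PRIMARY KEY,
            part_a_code        VARCHAR(50) REFERENCES organization(org_code),
            part_a_name        VARCHAR(200),
            part_b_code        VARCHAR(50) REFERENCES organization(org_code),
            part_b_name        VARCHAR(200),
            title              VARCHAR(300),
            description        TEXT,
            establishment_date DATE,
            funds              NUMERIC(20, 4),
            source             VARCHAR(200)
        );
        """
    )
    conn.executemany(
        "INSERT INTO organization VALUES (?, ?)",
        [("ORG-A", "Lab A"), ("ORG-B", "Lab B"), ("ORG-C", "Lab C")],
    )
    conn.executemany(
        "INSERT INTO project (id, part_a_code, part_b_code, title) VALUES (?, ?, ?, ?)",
        [
            ("P1", "ORG-A", "ORG-B", "Project One"),
            ("P2", "ORG-B", "ORG-C", "Project Two"),
            ("P3", "ORG-A", "ORG-C", "Project Three"),
        ],
    )
    return conn


def projects_involving(conn, org_code):
    """Projects where the organization is either the initiating (A) or completing (B) party."""
    return conn.execute(
        """
        SELECT p.title, a.org_name, b.org_name
        FROM project AS p
        JOIN organization AS a ON a.org_code = p.part_a_code
        JOIN organization AS b ON b.org_code = p.part_b_code
        WHERE p.part_a_code = ? OR p.part_b_code = ?
        ORDER BY p.title
        """,
        (org_code, org_code),
    ).fetchall()


project_conn = build_project_db()
print(projects_involving(project_conn, "ORG-B"))
```

Joining the organization table twice under different aliases resolves the names of both parties in one query.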
The person resume library holds information on people in the organization, such as managers and technical experts, including name, former names, profile, photo, education history, work history and so on. It is associated with the organization database through the organization ID. The structure of the person resume library's master database is shown in Table 4.
Table 4 Master database structure table of person resume library
Figure BDA0003857528130000111
In the invention, taking a scientific research institution as an example, the data comes from open-source Internet data and can be collected by web crawlers, manual collection and similar means. Because the data comes from multiple sources, each channel must be assessed so that the data is kept as current as possible and its authority, accuracy and timeliness are ensured.
By source, the collected information may include: official data, with emphasis on the official website, social media, and news websites of the scientific research institution, from which recent news, activities, and other data of the institution can be extracted; scientific research project data of national defense departments; person resume data, from which basic information about a person can be extracted; and encyclopedia data, such as online encyclopedias and Wikipedia.
For these information sources, multiple collectors can be configured and deployed on virtual machines. The collected fields include title, content, keywords, release time, source, link, and the like. During acquisition, maintenance personnel can configure acquisition parameters, mainly the acquisition address, proxy, scan rules, restrictions, flow control, and advanced options, to optimize acquisition tasks. The URLs listed for each source can be screened and added to the acquisition tool for regular monitoring and acquisition. The data of the target information sources is acquired comprehensively and continuously, yielding a cleaned, filtered, and indexed library of original information resources.
During acquisition, the relevant links in a page are parsed automatically, the linked pages are then captured in turn, the needed content is extracted, and useless information is discarded. Multiple tasks run at the same time to improve acquisition efficiency. Data that is inconvenient to acquire with the acquisition tool can be collected manually: an authorized user logs in to the system and enters data through a browser, which is intuitive and easy to use.
Various acquisition technologies can be adopted to ensure continuous and stable acquisition and to support continuous acquisition of overseas network data. The specific technologies include:
1) Anti-crawler avoidance: many overseas professional websites deploy their own anti-crawler mechanisms and cannot be captured by conventional website crawling. The structure of the website and its pages must be studied to devise means and measures for avoiding the anti-crawler mechanism.
2) Traffic-monitoring avoidance: to avoid being detected and blocked by traffic inspection on overseas websites, technologies such as IP drift are considered to reduce the risk of being tracked.
3) Distributed acquisition: a large number of information sources are acquired with distributed acquisition. Multiple virtual machines can be deployed, and capacity can be expanded as needed when acquisition sources increase in the future.
4) Automatic acquisition: acquisition runs automatically once configured. The start time and the interval can be set freely, changes in the data sources are monitored, and the database is updated automatically after acquisition.
5) Incremental acquisition: all content is acquired the first time. Afterwards, incremental acquisition is performed periodically, and only data added, modified, or deleted since the last monitoring point is acquired.
6) Automatic encoding handling: overseas websites use various character sets, and content appears garbled if the character set is not identified. Therefore, during acquisition the character encodings are identified and uniformly converted, and content in different encodings is converted to Unicode for storage.
7) Multi-format acquisition: text, pictures, audio, video, and file attachments are collected and stored in a unified manner. Pictures, audio, video, and file attachments can be stored in a disk directory and must be associated with their records in the system.
8) Automatic filtering: for web pages, useful data is extracted automatically using page structure analysis, and unnecessary pages or files such as advertisements, useless links, and logos are filtered out.
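The incremental acquisition in item 5) can be illustrated with a minimal sketch. It assumes that each page is identified by its URL and that a change is detected by comparing content fingerprints with those stored at the last monitoring point; the function names and the use of SHA-256 are assumptions made for the example, not details given by the invention.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash page content so that changes can be detected cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_diff(previous: dict, current_pages: dict):
    """Compare the current crawl with the snapshot from the last monitoring point.

    previous      maps URL -> fingerprint recorded at the last monitoring point
    current_pages maps URL -> page content seen in the current run
    Returns (added, modified, deleted, new_snapshot).
    """
    current = {url: fingerprint(body) for url, body in current_pages.items()}
    added = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current if url in previous and current[url] != previous[url]
    )
    deleted = sorted(url for url in previous if url not in current)
    return added, modified, deleted, current
```

On the first run the snapshot is empty, so every page is reported as added, which corresponds to the initial full acquisition. Later runs only report the differences.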
For important fields such as titles, automatic translation of individual records or on-demand invocation is used, combining machine translation with manual proofreading, so that authorized staff have a convenient and friendly experience. Different acquisition strategies are configured for different acquisition targets. For example:
Website priority: acquisition target groups are defined so that websites under key monitoring are grouped together. The acquisition period and update interval of a group can be set so that the group is acquired preferentially.
Multi-threaded acquisition: the number of acquisition worker threads and the acquisition interval can be adjusted.
Automatic operation: acquisition runs automatically once configured, and the start time and interval can be set freely.
Deployment: data acquisition can be deployed on physical or virtual machines at several different locations. Each machine is configured with different acquisition groups, and the acquired data is stored centrally.
Page structure analysis: links are determined by page structure analysis, so that capture is accurate, with neither over-capture nor under-capture, and the maintenance work of reconfiguring templates after a page redesign is reduced.
Multi-format acquisition: pictures, tables, attachments, audio, and video are collected comprehensively, and the data to be collected can be determined from the data formats selected by the user.
Acquisition updating uses a full update the first time and incremental updates thereafter. For example, the first full update acquires and updates all data: for technology and equipment, mainstream key data is acquired; for organizations, current data is acquired; for technical documents and projects, data covering about the last 10 years is acquired. For later incremental updates, strategies are set according to the kind of data: news, for example, requires frequent, high-volume updates, whereas projects can be updated soon after they are disclosed. Data updates can be synchronized to the intranet by burning DVD discs. At the start of each acquisition run, the acquisition list is scanned to determine whether each website can be acquired normally, and only then does formal acquisition begin. Because of address changes, website redesigns, or other factors, a website may fail to be acquired normally. In that case the website is flagged and recorded in the acquisition log, and a prompt is issued so that it can be handled later. After a website fails, parameters can be adjusted first. If acquisition still fails, the acquisition template may need to be customized again, which requires technical personnel to implement and deploy.
The system records acquisition logs, including normal acquisition logs and acquisition failure logs. Log files are stored in the system and can be viewed at any time. Websites that cannot be acquired normally are recorded in the log file together with the time and the error condition. Acquisition statistics are compiled by category from the logs, such as the number of websites acquired successfully, the number of failures, and the number of valid items collected, and the overall log status is recorded.
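The log statistics described above can be sketched as follows. This is an illustration under stated assumptions: the `LogEntry` fields and the `summarize` function are hypothetical, and the invention does not prescribe a log format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One acquisition attempt (hypothetical record layout)."""
    time: str        # when the attempt was made
    site: str        # website that was acquired
    ok: bool         # True for a normal acquisition, False for a failure
    items: int = 0   # number of valid items collected
    error: str = ""  # error condition, recorded for failures only


def summarize(entries: list) -> dict:
    """Compile the per-category statistics from the acquisition log."""
    succeeded = {e.site for e in entries if e.ok}
    failed = {e.site for e in entries if not e.ok}
    return {
        "sites_succeeded": len(succeeded),
        "sites_failed": len(failed),
        "items_collected": sum(e.items for e in entries if e.ok),
        "errors": Counter(e.error for e in entries if not e.ok),
    }
```

Counting errors by type makes it easier to see whether failures come from address changes, redesigned pages, or other causes.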
In this way, data is acquired from multiple data sources, and different acquisition modes and configuration parameters are set for different data sources, so that the data can be acquired quickly and accurately.
The data service layer includes at least a model training module, a full-text retrieval module, a voice analysis module, and an association analysis module. The data service layer performs data service processing on the processed data, which may specifically include the following. The model training module trains an information query model based on historical data; the information query model extracts features from the query information input by the user and outputs a query result. The full-text retrieval module performs full-text retrieval based on the query request input by the user. The voice analysis module analyzes voice information input by the user and extracts keywords in order to match corresponding query results for the user. The association analysis module determines, based on the information query request input by the user, all query results whose degree of association with the query result matching the request satisfies a preset condition.
In this way, the data service layer provides multiple data services and processes data according to multi-dimensional requirements to satisfy different query needs. Through association analysis, in addition to recommending the target information, the system can recommend other information associated with it, giving the user further useful information beyond what was requested.
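The association analysis can be illustrated with a minimal sketch. The invention does not state how the degree of association is computed, so the example assumes a Jaccard similarity over keyword sets and a numeric threshold as the preset condition; both choices, and all names in the code, are assumptions made for illustration.

```python
def association_degree(a: set, b: set) -> float:
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def associated_items(target_id: str, keywords_by_id: dict, threshold: float = 0.3) -> list:
    """Return (item id, degree) pairs whose degree meets the preset threshold.

    Results are ordered by decreasing degree, then by item id.
    """
    target = keywords_by_id[target_id]
    scored = [
        (other, association_degree(target, keywords))
        for other, keywords in keywords_by_id.items()
        if other != target_id
    ]
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

Raising the threshold narrows the associated information to the closest items, while lowering it widens the recommendation.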
Optionally, the data processing and fusion module imports and preliminarily processes the acquired raw data. The module includes a multi-source heterogeneous data fusion and processing unit, a file import unit, a manual processing unit, a multi-dimensional data management unit, and a data classification and publishing unit. The multi-dimensional data management unit uses an automatic association algorithm and multi-level association diagrams to automatically calculate and display the hierarchical relationships of information, thereby achieving multi-dimensional data management.
Organization data and knowledge documents from different sources are heterogeneous. They are processed uniformly and imported into a standard structured database. Data is processed, sorted, and fused by combining automatic computer processing with manual processing, and multi-dimensional data content can be maintained and managed so as to build data association clues and support multi-dimensional association analysis. A dynamic publishing technique supports classified data publishing.
Multi-source heterogeneous data fusion, extraction, and labeling can use natural language processing to extract entities from large volumes of text, such as times, places, technologies, equipment, organizations, and projects, and then automatically construct the relationships among the entities, so that staff can quickly obtain the subject information of a long text. Collected multi-source heterogeneous data such as web pages, databases, and files is uniformly cleaned and fused. Contents such as title, time, body text, and file name are labeled and converted into a structured database, for which indexes are created. For document data and project data, intelligent information extraction is used: natural language processing extracts entities from the text to obtain knowledge. Equipment knowledge and organization knowledge are mainly extracted, such as equipment field, equipment name, equipment model, organization name, organization telephone, and organization address, and knowledge category information is formed through knowledge extraction. Intelligent information extraction overcomes the limitation of traditional acquisition systems, which can only run keyword-based searches over unstructured data. It genuinely cleans the data and produces a well-organized structured database.
Intelligent extraction uses dictionary-based and rule-based entity extraction to automatically extract valuable entity information from unstructured data. Known times, places, technologies, equipment, organizations, projects, and the like are extracted with a dictionary, and unknown ones are extracted with rules, so that the machine approximates human judgment when extracting valuable information. After extraction, statistical measures such as word frequency, word position, phrase relationship, co-occurrence probability, certainty probability, uncertainty probability, far-end probability, and near-end probability are used to calculate the correlation and association among the data, forming a professional database for experts to study and assess.
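The combination of dictionary lookup, rules, and co-occurrence statistics can be sketched as follows. The dictionary entries, the regular expressions, and the entity names are hypothetical examples, and the co-occurrence count stands in for only one of the statistical measures listed above.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary of known entities: surface form -> entity type.
DICTIONARY = {
    "Acme Research Institute": "organization",
    "Project Falcon": "project",
}

# Hypothetical rules for entities that are not in the dictionary.
RULES = [
    ("time", re.compile(r"\b(?:19|20)\d{2}\b")),
    ("organization", re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Laboratory|Institute|Center)\b")),
]


def extract_entities(text: str) -> set:
    """Return (surface form, entity type) pairs found by dictionary or rule."""
    found = set()
    for name, kind in DICTIONARY.items():
        if name in text:
            found.add((name, kind))
    for kind, pattern in RULES:
        for match in pattern.finditer(text):
            found.add((match.group(0), kind))
    return found


def cooccurrence(documents: list) -> Counter:
    """Count how often two entities appear in the same document."""
    counts = Counter()
    for doc in documents:
        names = sorted({name for name, _ in extract_entities(doc)})
        counts.update(combinations(names, 2))
    return counts
```

Pairs with high co-occurrence counts are candidates for an association between the two entities, which an expert can then confirm or reject.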
In addition to acquisition through the acquisition tool, file import stores documents in different formats (mainly Office files and PDF files) in a designated folder, where the system scans them automatically and acquires them in batches. During this process formatting is removed, and the content and file attributes are extracted. Fused data collected automatically by computer can be further edited and processed manually: authorized staff can enter and modify data in the system back end. A visual labeling function is provided for manually labeling knowledge document attributes and organization entity object data. Knowledge document attribute labels include organization name, achievement type, and achievement introduction. Organization entity object data labels include the organization's main research fields, organizational structure, founder, unit introduction, and the like. Manual labeling supports creation, modification, and deletion. Multi-dimensional data management: the data dimensions of the multi-dimensional model established by the system mainly include organization, achievements, projects, personnel, development history, major events, and the like.
The system uses an automatic association algorithm and multi-level association diagrams to automatically calculate and display the hierarchical relationships of information, thereby achieving multi-dimensional data management. Target analysis is performed by associating and overlaying data across these dimensions. The core of target analysis is building a multi-dimensional data model: during data mining, the metadata and data association rules of a specific business domain are abstracted and modeled so that the specific application domain can be identified as a whole. Through system self-learning and human-computer interaction, business rules and requirements are accumulated continuously, improving the efficiency and accuracy of data processing on the platform and yielding analysis results that better meet business needs.
Classified data publishing: the system uses a dynamic publishing technique, so that after the database is updated the changes are displayed automatically in the front end. Classified data is published and managed in a tree structure, and the nodes at every level of the tree can be maintained. The various associations between organization classification data and other data are displayed, including organization portraits, organization documents, and knowledge associations; knowledge associations are further divided into product associations, project associations, and person associations. The organization portrait can use visualization to show the attribute characteristics centered on the organization and supports comprehensive multi-dimensional display and analysis, giving an overall portrait that covers all information about the organization. The underlying data of the organization portrait is a structured database that mainly covers a scientific research organization's basic information, organizational structure, research achievements, development history, news, scientific research projects, technical experts, and the like. Basic information can include the organization's Chinese name, English name, founder or top leader, company type, date of establishment, telephone, website, address, main business fields, and introduction, and the organization's geographic location is shown on a map. The organizational structure can be displayed graphically, clearly showing the division of functions within the organization. Research achievements can show the products developed and the technologies owned by the organization with pictures and text. All products of the organization can be displayed, or they can be displayed by field.
The development history records information such as major events, acquisitions, and divestitures of the enterprise. It comprises three sections: major events, divestitures, and acquisitions. Historical data is shown on a timeline. News can show news related to the organization, and the news can be searched. Scientific research projects can show project information related to the organization, including the Chinese and English project names, the initiating organization, the project amount, the start and end dates, and the data source. The organization's main technical experts can also be listed, including photo, name, and detailed profile. Organization documents can be documents published by the organization, such as technical research reports, strategy documents, policy documents, and publications. They are mainly formatted documents, chiefly original documents in PDF, PPT, and similar formats, classified into planning documents, funding directions, management policies, scientific research reports, papers, periodicals, and the like. The recorded information includes the original document, document date, document title, and the like.
The organization classification knowledge base includes organization portraits, organization documents, and knowledge associations, where the knowledge associations include product associations, project associations, and person associations;
establishing the organization classification knowledge base specifically includes:
establishing an entity model based on the processed data, where the entity model includes at least an organization model; the entities in the organization model include at least organizations, organizational structures, organization personnel, research fields, and organization projects, and the entities are associated through entity attribute information;
and achieving knowledge fusion by combining entity alignment and coreference resolution, merging the different expressions of the same entity, entity attribute, and entity relationship in data from different sources to form the organization classification knowledge base.
In practice, manual labeling can be combined with an automatic association algorithm and multi-level association diagrams. First, the graph entities, entity relationships, and entity attributes are defined. Entities, relationships, and attributes are then created automatically according to the key technical vocabulary in the definitions, and users can extend the attributes and associations of organization entities and equipment entities according to actual business needs. Information such as entities, relationships, and times is obtained from structured, unstructured, and semi-structured data, using a different method for each and combining machine processing with manual work.
For existing structured data, extract-transform-load (ETL) operations are performed. A visual tool allows ETL jobs to be developed and deployed quickly by drag and drop to support data analysis and processing. Data is dynamically screened and filtered by classifications defined with keywords or logical expressions, and a classification tree can be set up in the system to focus on relevant information. Natural language processing is used to extract entities from large volumes of text, such as times, places, technologies, equipment, organizations, and projects, and the relationships among the entities are then constructed automatically, so that staff can quickly obtain the subject information of a long text. Intelligent information extraction overcomes the limitation of traditional acquisition systems, which can only run keyword-based searches over unstructured data. It genuinely cleans the data and produces a well-organized structured database. Knowledge fusion is achieved by combining entity alignment and coreference resolution, merging the different expressions of the same entity, entity attribute, and entity relationship in data from different sources.
Entity alignment determines whether two or more entities from different sources refer to the same real-world object. If several entities represent the same object, an alignment relationship is built among them, and the information they contain is fused and aggregated. A common approach is to use the entities' attribute information to decide whether entities from different sources can be aligned; knowledge of the relevant domain can also be used. Coreference resolution is the process of grouping the different mentions that refer to the same entity into an equivalence set. It effectively resolves unclear references in text, is a fundamental research topic in the NLP field, and plays an important role in machine reading comprehension and information extraction.
Coreference resolution mainly relies on the relationship between an anaphor and its antecedent in context to make the determination.
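Attribute-based entity alignment can be sketched as follows. The example assumes that two records refer to the same object when any identifying attribute matches after normalization, and that the first value seen for an attribute is kept during fusion; the attribute names `name` and `website` and both policies are assumptions made for illustration.

```python
def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so that trivial variants compare equal."""
    return " ".join(value.lower().split())


def same_entity(a: dict, b: dict, keys=("name", "website")) -> bool:
    """Assumed rule: two records align if any identifying attribute matches."""
    return any(
        bool(a.get(k)) and bool(b.get(k)) and normalize(a[k]) == normalize(b[k])
        for k in keys
    )


def align(records: list) -> list:
    """Group aligned records with union-find and fuse their attributes."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_entity(records[i], records[j]):
                parent[find(j)] = find(i)

    merged = {}
    for i, record in enumerate(records):
        target = merged.setdefault(find(i), {})
        for key, value in record.items():
            target.setdefault(key, value)  # keep the first value seen
    return list(merged.values())
```

Union-find makes alignment transitive: if record A aligns with B and B aligns with C, all three are fused into one entity even when A and C share no attribute directly.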
A graph database extends the capabilities of the traditional relational database. The graph structure it supports is more flexible, and graph-based insertion, deletion, query, and modification take a vertex-centered view, so queries and updates on vertices and edges are supported more efficiently. Neo4j is a widely used open-source graph database that is designed with Java applications in mind and performs well. The system can use a Neo4j graph database to store the data of the knowledge graph, and the data is accessed through interface specifications.
In the system, organizations, projects, personnel, products, equipment, contracts, and related documents are associated with one another. From the organization data list, a user can use the association drop-down menu to find the researchers or projects related to an organization and can also view the products, equipment, contracts, and related documents under that organization.
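The lookup of associated entities can be illustrated with a small in-memory property graph. This is only a sketch of the idea, not the Neo4j storage described above; the class, its methods, the node labels, and the relation names are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal labeled graph: nodes carry a label, edges carry a relation name."""

    def __init__(self):
        self.nodes = {}                # node id -> (label, properties)
        self.edges = defaultdict(set)  # node id -> {(relation, neighbor id)}

    def add_node(self, node_id: str, label: str, **properties):
        self.nodes[node_id] = (label, properties)

    def relate(self, src: str, relation: str, dst: str):
        # Stored in both directions so that lookups work from either end.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def related(self, node_id: str, label: str = None) -> list:
        """Return ids of neighbors, optionally restricted to one label."""
        neighbors = {
            other
            for _, other in self.edges.get(node_id, ())
            if label is None or self.nodes[other][0] == label
        }
        return sorted(neighbors)
```

Filtering neighbors by label corresponds to choosing an entry such as related personnel or related projects in the association drop-down menu.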
Knowledge search supports searching the raw data collected from the Internet. It supports full-library and sub-library search, displays information such as the source and title of the raw data, supports searching in Chinese and English at the same time, supports combined search conditions, provides relevance ranking and result statistics, and supports direct search of formatted document files. Search results can be displayed as a list, as associated information (such as related projects, related organizations, and related documents), or as an association graph. Full-library search and sub-library search: when the system creates indexes, it builds unified index data across multiple special-topic libraries, so that cross-library search over heterogeneous resources is supported, the user can choose single-library or multi-library search, joint search is available, and retrieval by character, word, and paragraph is provided.
Chinese and English retrieval: matching does not depend on the grammatical structure of any language. Words are treated as abstract symbols of meaning, and the understanding of a word comes from the context in which it occurs rather than from strict grammatical definitions, so slang and other variants do not affect the results. Language identification, analysis, and search are based on mathematical models from probability theory and information theory and can handle any language that is expressed in written characters.
Combined retrieval: the system uses an intelligent search engine that fully supports keywords, Boolean logic expressions, exact search, and fuzzy search. Keyword logic expressions can be combined using operators including AND, OR, NOT, NEAR, DNEAR, SOUNDEX, FUZZY, and RANGE. The system provides an interface for setting search conditions, where the user can set the search fields (keywords or Boolean logic expressions), the search scope (data type, time range, information format, and language), the result sort order, the filter conditions, the result display form, and the like.
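The AND, OR, and NOT operators can be illustrated with a minimal inverted-index sketch. It covers only these three operators; NEAR, DNEAR, SOUNDEX, FUZZY, and RANGE are not implemented, and the function names and the parameter names `must`, `any_of`, and `must_not` are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index: dict, all_ids, must=(), any_of=(), must_not=()) -> list:
    """Evaluate (AND of must) AND (OR of any_of) AND NOT (any of must_not)."""
    result = set(all_ids)
    for term in must:                       # AND
        result &= index.get(term.lower(), set())
    if any_of:                              # OR
        result &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in must_not:                   # NOT
        result -= index.get(term.lower(), set())
    return sorted(result)
```

Because every operator reduces to a set operation on posting lists, the conditions chosen in the search interface can be combined in any order.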
Relevance ranking: so that users can find documents quickly, search results can be sorted by relevance, by any field or combination of fields, and by relevant document attributes. Result statistics: knowledge search supports classification and navigation of the result data. Statistical charts are displayed directly on the result page, for example charts summarizing the distribution, proportion, and number of hits. The charts act as visual navigation, and clicking a chart opens the corresponding list data. Formatted-file retrieval: for formatted files among the documents (such as strategy files and think-tank documents), no manual processing is needed and the full text of the files is searched directly. Files are classified and organized in directory or list form, and the classification levels can be customized freely. Documents in a directory are scanned automatically, and the directory is extracted automatically as the document's classification index item. For documents in certain standard formats, characteristic values such as title, author, unit, and abstract can be extracted as metadata index items. Document index items are stored automatically, and links to the original documents are generated automatically. Retrieval of file formats such as HTML, XML, Office (Word, PowerPoint, Excel), and PDF is supported.
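Sorting by several fields and summarizing hit distribution can be sketched as follows. The result fields `score`, `date`, and `type`, and both function names, are hypothetical and chosen only to illustrate the ranking and result statistics described above.

```python
from collections import Counter


def rank(results: list, sort_fields=(("score", True), ("date", True))) -> list:
    """Sort by several fields; each entry is (field name, descending?).

    Python's sort is stable, so sorting by the least significant field first
    and the most significant field last gives a multi-field ordering.
    """
    ordered = list(results)
    for field, descending in reversed(sort_fields):
        ordered.sort(key=lambda r: r[field], reverse=descending)
    return ordered


def facet_counts(results: list, field: str) -> Counter:
    """Count hits per value of a field, e.g. to feed a distribution chart."""
    return Counter(r[field] for r in results)
```

The counts returned by `facet_counts` are the kind of data a pie chart of hit distribution would display, with each slice linking back to the matching list entries.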
Visualization is based on statistical data and provides statistical analysis of multi-dimensional data. Analysis results can be presented from multiple angles, for example as a timeline, fishbone diagram, or pie chart, and the presentation can be switched between these views in real time. The functions comprise data statistics and data visualization, and data statistics is divided into business data statistics and user behavior statistics. The system provides lightweight visual statistical charts that can be customized to actual business needs, organizes and counts the various data through charts, and supports multi-dimensional statistical analysis of the data in the system. A chart not only displays the current statistical result: clicking a part of the chart opens the corresponding data list, so that information can be traced to its source. System management monitors and manages users, authorization, and data in a unified manner and comprises four parts: user management and authorization, data maintenance management, log management, and data import and export. User management: individual users are managed, with support for adding, deleting, modifying, and querying, covering user names, passwords, and login logs. Role management: users are managed in groups, with support for adding, deleting, modifying, and querying. For data browsing and searching, permissions are assigned and combined by setting user roles. Role management supports an administrator role and an ordinary user role, and the ordinary user role has only data browsing and query permissions. A user who logs in to the system has no operable functions by default; permissions are granted according to business needs, and the system supports authorization by data category.
Data-category authorization can cover all content or a single database, which effectively controls what a user can do in the system. The permissions that can be granted are create, view, modify, and delete. Relatively fixed permissions can be set for an ordinary role, and personnel added to the role obtain those permissions.
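Role-based authorization by data category can be sketched as follows. The `Role` class, the wildcard category `"*"` for all content, and the `is_allowed` function are assumptions made for the example; only the four permission names come from the description above.

```python
from dataclasses import dataclass, field

ACTIONS = {"create", "view", "modify", "delete"}


@dataclass
class Role:
    """A named role with grants per data category ("*" means all content)."""
    name: str
    grants: dict = field(default_factory=dict)  # category -> set of actions

    def grant(self, category: str, *actions: str):
        unknown = set(actions) - ACTIONS
        if unknown:
            raise ValueError(f"unknown actions: {sorted(unknown)}")
        self.grants.setdefault(category, set()).update(actions)


def is_allowed(roles: list, category: str, action: str) -> bool:
    """Default deny: an action is allowed only if some role grants it."""
    return any(
        action in role.grants.get(category, set())
        or action in role.grants.get("*", set())
        for role in roles
    )
```

A user with no roles is denied everything, which matches the default state of a newly logged-in user before any authorization is assigned.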
The information query system established in embodiment 1 can achieve the following technical effects:
1) The information query system can display the target query information and the associated information for the user based on the query request of the user, and can actively push the associated information for the user based on the historical operation behavior of the user.
2) When the query result is displayed for the user, the query result with multiple dimensions is displayed, and the related information with multiple relevance degrees with the target query information is additionally displayed for the user, so that better assistance is provided for the decision of the user.
3) The system supports various query modes, such as keyword query, sentence query and voice query, and recommends more comprehensive and accurate query results to the user based on these query modes.
4) A large amount of mechanical, repetitive manual labor is replaced by the computer, and the initial fusion and processing of multi-channel information are realized by intelligent analysis technology, so that manual labor is reduced to the greatest extent and usability is high. The design is open and standardized, so the system can exchange data with other associated systems.
5) A multi-level security protection system is established with permission management and control: classified authorization is issued according to the requirements of service objects, user types, data types and the like, which prevents unauthorized operations beyond a user's privileges and gives high security.
6) The platform is easy to manage and maintain, simple to operate, and easy to learn and use; system configuration is convenient, and the operating state, security, performance and the like can be monitored well, so manageability is strong.
7) The system is well structured, and each part has a clear and complete definition, so that a local modification does not affect the overall structure or the operation of other parts; coupling between modules is low, which suits service development and facilitates inheritance and extension of the system.
Example 2
As shown in fig. 2, fig. 2 is a schematic flow chart of the information query method provided by the present invention. It should be noted that the method in embodiment 2 is implemented based on the information query system established in embodiment 1, which is used to implement the information query. The information query method is therefore applied to an information query system that comprises at least: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer is used for carrying out data service processing on the processed data; the data service layer comprises at least one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge search module, a visualization module and a system management module; the display layer is an interactive entry between a user and the platform. The method may include the following steps:
Step 210: Acquire, through the display layer, an information query request input by a user.
Step 220: Based on the information query request, search the information query system for target information matching the request and for associated information of the target information, where the association degree between the target information and the associated information meets a preset condition.
Step 230: Display the target information and the associated information of the target information through the display layer.
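Steps 210 to 230 can be sketched as follows. This is only an illustration under stated assumptions: the specification does not say how the association degree is computed or what the preset condition is, so the sketch substitutes a Jaccard overlap of keyword sets and a fixed threshold. The names `Record`, `association_degree`, `query`, `display` and the sample records are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical in-memory records; the source does not specify a storage
# format or how the association degree is computed.
@dataclass(frozen=True)
class Record:
    title: str
    keywords: frozenset


def association_degree(a: Record, b: Record) -> float:
    """Jaccard overlap of keyword sets (an assumed stand-in measure)."""
    union = a.keywords | b.keywords
    return len(a.keywords & b.keywords) / len(union) if union else 0.0


def query(request: str, records: list, threshold: float = 0.3):
    # Step 220, part 1: find target information matching the request.
    terms = set(request.lower().split())
    targets = [r for r in records if terms & r.keywords]
    # Step 220, part 2: keep associated information whose association
    # degree with some target meets the preset condition (assumed here
    # to be "at least `threshold`").
    associated = [
        r for r in records
        if r not in targets
        and any(association_degree(t, r) >= threshold for t in targets)
    ]
    return targets, associated


def display(targets, associated) -> str:
    # Step 230: render both result groups for the display layer.
    lines = ["Target information:"] + [f"  {r.title}" for r in targets]
    lines += ["Associated information:"] + [f"  {r.title}" for r in associated]
    return "\n".join(lines)


RECORDS = [
    Record("Institute A overview", frozenset({"institute", "a", "radar"})),
    Record("Radar project X", frozenset({"radar", "project", "a"})),
    Record("Unrelated news item", frozenset({"weather", "forecast"})),
]

# Step 210: in the real system the request arrives through the display layer.
targets, associated = query("institute", RECORDS)
print(display(targets, associated))
```

With the sample data, the request matches the institute overview as target information, and the radar project is returned as associated information because it shares two of four keywords with the target; the unrelated item is filtered out.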
Since the method in fig. 2 is implemented based on the system in fig. 1, the technical effect is the same as that of embodiment 1, and is not described herein again.
Based on the same idea, the embodiment of the present specification further provides an information query device. Fig. 3 is a schematic structural diagram of an information query device provided by the present invention. The information query device is applied to an information query system that comprises at least: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer is used for carrying out data service processing on the processed data; the data service layer comprises at least one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module; the display layer is an interactive entry between a user and the platform; the device comprises:
the communication unit/communication interface is used for acquiring an information query request input by a user through the display layer;
the processing unit/processor is used for searching target information matched with the information query request and the associated information of the target information from the information query system based on the information query request; the association degree between the target information and the associated information meets a preset condition;
and displaying the target information and the associated information of the target information through a display layer.
As shown in fig. 3, the terminal device may further include a communication line. The communication line may include a path for transmitting information between the aforementioned components.
Optionally, as shown in fig. 3, the terminal device may further include a memory. The memory is used for storing computer-executable instructions for implementing the inventive arrangements and is controlled for execution by the processor. The processor is used for executing computer execution instructions stored in the memory, thereby realizing the method provided by the embodiment of the invention.
Optionally, the computer-executable instructions in the embodiment of the present invention may also be referred to as application program codes, which is not specifically limited in this embodiment of the present invention.
In one implementation, as shown in fig. 3, the processor may include one or more CPUs, such as CPU0 and CPU1 in fig. 3.
In one embodiment, as shown in fig. 3, the terminal device may include a plurality of processors, such as the processor in fig. 3. Each of these processors may be a single core processor or a multi-core processor.
The above description mainly introduces the scheme provided by the embodiment of the present invention from the perspective of interaction between the modules. It is understood that each module, in order to implement the above functions, includes a corresponding hardware structure and/or software unit for performing each function. Those skilled in the art will readily appreciate that, with the exemplary units and algorithm steps described in connection with the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The functional modules may be divided according to the above method examples, for example, the functional modules may be divided corresponding to the functions, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only one logic function division, and another division manner may be available in actual implementation.
The processor in this specification may also have the function of a memory.
The memory may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disk read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be separate and coupled to the processor via a communication link. The memory may also be integral to the processor.
The method disclosed by the embodiment of the invention can be applied to a processor or realized by the processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers or other storage media well known in the art. The storage medium is located in a memory, and the processor reads information in the memory and completes the steps of the method in combination with its hardware.
In one possible implementation, a computer-readable storage medium is provided, in which instructions are stored, and when executed, are used to implement the method in the above embodiments.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present invention are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a terminal, a user device, or other programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium, such as a floppy disk, hard disk, magnetic tape; or an optical medium, such as a Digital Video Disc (DVD); it may also be a semiconductor medium, such as a Solid State Drive (SSD).
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality.
While the invention has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the invention. Accordingly, the specification and figures are merely exemplary of the invention as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An information inquiry system, characterized in that the information inquiry system comprises at least:
the system comprises a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer;
the data acquisition layer acquires organization-related data from a plurality of data sources;
the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration;
the data service layer is used for carrying out data service processing on the processed data; the data service layer at least comprises one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module;
the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module;
the display layer is an interactive entry between a user and a platform, and displays target information corresponding to an information query request and associated information of the target information for the user based on the information query request input by the user; and the association degree between the target information and the associated information meets a preset condition.
2. The system of claim 1, wherein the plurality of data sources comprises official websites of trusted organizations, social media, news websites, employment data sheets, resume databases, encyclopedia data, and/or search engines;
the organization-related data comprises organization data, project data and personnel data of each organization;
the organization data comprises organization basic information, organization architecture, organization research results, organization development process, organization news dynamics, organization scientific research projects and organization technical expert data;
the data acquisition layer acquiring organization-related data from a plurality of data sources specifically comprises:
the data acquisition layer acquires the organization data, the project data and the personnel data of each organization from the plurality of data sources by selecting a corresponding data acquisition mode; the data acquisition modes comprise: anti-crawler evasion technology, traffic-monitoring evasion, distributed acquisition technology, automatic acquisition technology, incremental acquisition technology, automatic encoding processing technology, multi-format acquisition technology and automatic filtering acquisition technology.
3. The system of claim 1, wherein, after performing data service processing on the processed data, the data service layer is further configured for:
acquiring an organization ID;
associating the organization related data based on the organization ID to generate an associated data table;
setting a storage structure, field attributes and field descriptions of each associated data table, and respectively storing the processed data into an organization database, a research result library, a news dynamic library, a scientific research project library, a personnel introduction library and a policy and regulation library according to a plurality of different data type dimensions, to realize descriptive metadata, management metadata, digital objects and data association; the metadata includes title, content, keywords, time of publication, source, and link.
4. The system of claim 1, wherein the data service layer comprises at least a model training module, a full-text retrieval module, a voice analysis module, and an association analysis module;
the data service layer performing data service processing on the processed data specifically comprises:
the model training module trains an information query model based on historical data, and the information query model is used for extracting the characteristics of query information input by a user and outputting a query result;
the full-text retrieval module is used for performing full-text retrieval based on a query request input by a user;
the voice analysis module is used for analyzing based on voice information input by a user and extracting keywords so as to match corresponding query results for the user;
the association analysis module is used for determining, based on the information query request input by the user, all query results whose association degree with the query result matched to the information query request meets the preset condition.
5. The system of claim 2, wherein, when acquiring organization-related data from the plurality of data sources, the data acquisition layer prioritizes the data acquisition modes based on the data acquisition tasks and sets the number of acquisition worker threads and the acquisition time interval; when collecting and updating, a full update is performed the first time and incremental updates are performed thereafter.
6. The system of claim 1, wherein the data processing and fusing module is configured to import and perform preliminary processing on the collected raw data; the data processing and fusing module comprises a multi-source heterogeneous data fusing and processing unit, a file importing unit, a manual processing unit, a multi-dimensional data management unit and a data classification issuing unit, wherein the multi-dimensional data management unit automatically calculates the hierarchical relationship of the displayed information by means of an automatic association algorithm and a multi-level association diagram, thereby realizing multi-dimensional data management.
7. The system of claim 1, wherein the organization classification knowledge base comprises organization representations, organization documents, and knowledge associations, the knowledge associations comprising product associations, item associations, and person associations;
the establishment of the organization classification knowledge base specifically comprises the following steps:
establishing an entity model based on the processed data, wherein the entity model comprises at least an organization model; entities in the organization model comprise at least organizations, organizational structures, organization personnel, research fields and organization projects, and all the entities are associated through entity attribute information;
realizing knowledge fusion by combining entity alignment and coreference resolution techniques, and merging the different expression forms of the same entity, entity attribute and entity relationship in data from different sources, to form the organization classification knowledge base.
8. The system of claim 1, wherein the knowledge search module comprises: the system comprises a full-library searching and sub-library searching unit, a Chinese and English retrieval unit, a combined retrieval unit, a relevancy sorting unit, a result counting unit and a formatted document retrieval unit;
the visualization module is used for realizing the statistical analysis of the multi-dimensional data based on the statistical data, and the analysis result supports visual display as a time axis, a fishbone diagram and a pie chart and supports multi-angle switching of the visual display.
9. An information inquiry method, characterized in that the information inquiry method is applied to an information inquiry system, and the information inquiry system at least comprises: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer is used for carrying out data service processing on the processed data; the data service layer at least comprises one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module; the display layer is an interactive entry between a user and the platform;
the information query method comprises the following steps:
acquiring an information query request input by a user through the display layer;
searching target information matched with the information query request and associated information of the target information from the information query system based on the information query request; the association degree between the target information and the associated information meets a preset condition;
and displaying the target information and the associated information of the target information through a display layer.
10. An information inquiry apparatus, characterized in that the information inquiry apparatus is applied to an information inquiry system, and the information inquiry system at least comprises: a data acquisition layer, a data processing layer, a data service layer, an application layer and a display layer; the data acquisition layer acquires organization-related data from a plurality of data sources; the data processing layer performs data processing on the organization-related data to obtain processed data; the data processing comprises data extraction, data cleaning, data conversion and/or data integration; the data service layer is used for carrying out data service processing on the processed data; the data service layer at least comprises one or more of a model training module, a full-text retrieval module, a semantic analysis module and an association analysis module; the application layer comprises a data processing and fusing module, an organization classification knowledge base, a knowledge searching module, a visualization module and a system management module; the display layer is an interactive entry between a user and the platform; the apparatus comprises:
the communication unit/communication interface is used for acquiring an information query request input by a user through the display layer;
the processing unit/processor is used for searching target information matched with the information query request and the associated information of the target information from the information inquiry system based on the information query request; the association degree between the target information and the associated information meets a preset condition;
and displaying the target information and the associated information of the target information through a display layer.
CN202211153838.5A 2022-09-21 2022-09-21 Information query system, method and equipment Pending CN115757689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211153838.5A CN115757689A (en) 2022-09-21 2022-09-21 Information query system, method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211153838.5A CN115757689A (en) 2022-09-21 2022-09-21 Information query system, method and equipment

Publications (1)

Publication Number Publication Date
CN115757689A true CN115757689A (en) 2023-03-07

Family

ID=85351761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211153838.5A Pending CN115757689A (en) 2022-09-21 2022-09-21 Information query system, method and equipment

Country Status (1)

Country Link
CN (1) CN115757689A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258138A (en) * 2023-03-15 2023-06-13 北京百度网讯科技有限公司 Knowledge base construction method, entity linking method, device and equipment
CN116258138B (en) * 2023-03-15 2024-01-02 北京百度网讯科技有限公司 Knowledge base construction method, entity linking method, device and equipment
CN117009187A (en) * 2023-09-27 2023-11-07 西安热工研究院有限公司 CID file incremental compiling method, system and equipment for upper computer monitoring system
CN117009187B (en) * 2023-09-27 2024-01-19 西安热工研究院有限公司 CID file incremental compiling method, system and equipment for upper computer monitoring system
CN117453805A (en) * 2023-12-22 2024-01-26 石家庄学院 Visual analysis method for uncertainty data
CN117453805B (en) * 2023-12-22 2024-03-15 石家庄学院 Visual analysis method for uncertainty data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination