CN117171105A

CN117171105A - Electronic archive management system based on knowledge graph

Info

Publication number: CN117171105A
Application number: CN202311195734.5A
Authority: CN
Inventors: 马宝森
Original assignee: Inspur General Software Co Ltd
Current assignee: Inspur General Software Co Ltd
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2023-12-05

Abstract

An embodiment of this specification discloses an electronic archive management system based on a knowledge graph. The system comprises a data storage module, a data management module, a relationship analysis module, a relationship storage module, and a relationship display module. The data storage module is used for collecting archive data and for storing and querying the archives of various businesses through a database; the data management module is used for managing preset business links of the electronic archives; the relationship analysis module is used for aggregating the different archive data stored in the data storage module to obtain the archive relationships among the different archive data; the relationship storage module is used for storing, in a differentiated manner, the archive relationships of different businesses and different historical periods; and the relationship display module is used for obtaining the archive relationships stored in the relationship storage module and displaying, through a knowledge graph, the association relationships between the current archive data and other archive data.

Description

Electronic archive management system based on knowledge graph

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an electronic archive management system based on a knowledge graph.

Background

With the continuous development of digital technology, archive management is gradually shifting from paper archives to digital archives. Compared with traditional paper archive management, electronic archives are managed online, which removes the need to search a physical location and makes archive management more efficient and convenient. As archives accumulate over time, paper archives occupy more and more space, and searching and retrieving them becomes increasingly complicated as their number grows, which reduces the working efficiency of archive managers. By contrast, electronic archives are simpler and more efficient to manage and search: archive managers can locate a specific archive online and quickly consult it through an electronic device. Data permissions for electronic archives can also be set in a more standardized way, and permission isolation between different data permissions and security personnel is more standardized and easier to operate, so that data security is better ensured and leakage of important information caused by the negligence of archive managers during archive review can largely be avoided. In addition to storing and managing massive amounts of information, an electronic archive management system can be combined with artificial intelligence technologies such as pattern recognition and natural language processing to mine massive archives, obtain deep association relationships among archives, fully exploit the value of archive data, and provide more support for archive management.

An electronic archive management system provides a complete solution for the modernized archive management of enterprises and public institutions. It is a computer management information system for receiving, managing, storing, and utilizing electronic archives; it is required to be open, functionally extensible, flexibly configurable, secure, and reliable; it supports the management of multiple categories and multiple formats; and it can assist in the management of physical archives. Electronic archives are an important component of national information resources, and their effective utilization is of great significance for improving working efficiency and maximizing data value.

At present, many electronic archive management systems only store and query archive data. As the volume of electronic archives stored by business systems keeps growing, traditional query and retrieval schemes become inefficient and cannot meet precision and recall requirements when processing non-standard and unstructured data, so that analyzing and displaying the relationships among existing archive data becomes increasingly difficult.

Disclosure of Invention

One or more embodiments of the present disclosure provide an electronic archive management system based on a knowledge graph, which is used for solving the technical problem set forth in the background art.

One or more embodiments of the present disclosure adopt the following technical solutions:

One or more embodiments of the present disclosure provide an electronic archive management system based on a knowledge graph, comprising: a data storage module, a data management module, a relationship analysis module, a relationship storage module, and a relationship display module; wherein,

the data storage module is used for collecting archive data and for storing and querying the archives of various businesses through a database;

the data management module is used for managing preset business links of the electronic archives;

the relationship analysis module is used for aggregating the different archive data stored in the data storage module to obtain the archive relationships among the different archive data;

the relationship storage module is used for storing, in a differentiated manner, the archive relationships of different businesses and different historical periods;

the relationship display module is used for obtaining the archive relationships stored in the relationship storage module and displaying, through a knowledge graph, the association relationships between the current archive data and other archive data.

The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects:

Traditional query and retrieval schemes become less efficient as the number of electronic archives grows. Through the relationship analysis module and the relationship display module, the knowledge-graph-based electronic archive management system of the embodiments of this specification can improve the speed and accuracy of query and retrieval, and the knowledge graph display lets a user see the association relationships among archives more intuitively and quickly find the required archive information.

Traditional query and retrieval schemes also have limited capacity for processing non-standard and unstructured data. The system of the embodiments of this specification can aggregate different archive data and, through the relationship storage module, store the archive relationships of different businesses and historical periods in a differentiated manner, so that non-standard and unstructured data can be processed better. The system can thus provide more accurate and comprehensive retrieval results and meet users' requirements for precision and recall.

Through the relationship display module, the system can visually display the archive relationships stored in the relationship storage module. This display intuitively presents the association relationships between the current archive data and other archive data and helps users better understand the relationships among archives and their value, so that users can quickly locate and use related archive data, which improves working efficiency and maximizes data value.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:

FIG. 1 is a schematic diagram of an electronic archive management system based on a knowledge graph according to one or more embodiments of the present disclosure;

FIG. 2 is a flow diagram of a relationship presentation provided by one or more embodiments of the present disclosure;

FIG. 3 is a schematic diagram illustrating archival relationships according to one or more embodiments of the present disclosure.

Detailed Description

The embodiments of this specification provide an electronic archive management system based on a knowledge graph.

At present, many electronic archive management systems only store and query archive data. As the volume of electronic archives stored by business systems keeps growing, traditional query and retrieval schemes become inefficient and cannot meet precision and recall requirements when processing non-standard and unstructured data, so that analyzing and displaying the relationships among existing archive data becomes increasingly difficult. By analyzing and displaying the archive relationships of different business sources, the utilization efficiency of archive data can be greatly improved. Because electronic archives can be preserved indefinitely, business data relationships from different periods can also be put to better use, which greatly facilitates the consultation and utilization of archive documents and maximizes the value of archive data.

In recent years, the knowledge graph has become a popular technology in the field of artificial intelligence and has been applied successfully in semantic search, information analysis, intelligent question answering, and other fields. A knowledge graph represents the relationships between different things in the form of 'points' and 'edges', structures heterogeneous domain knowledge, and builds associations between pieces of knowledge. It can address the problems that domain data is scattered across multiple systems, that the data is diverse, complex, and siloed, and that a single piece of data has low value in application scenarios; by combining a unified structured representation with rich semantic information, it builds rich association relationships that can be provided directly to downstream applications. Where the data formats and standards of heterogeneous data sources in enterprises and organizations are inconsistent and difficult to integrate and analyze, the data sources can be standardized and converted into a knowledge graph, and the visualization and reasoning capabilities of the graph can then be used for unified query and analysis, which helps users discover hidden relationships and patterns and supports more efficient, accurate, and intelligent data analysis.

Introducing the knowledge graph into the electronic archive management system helps users quickly build the management relationships of archive data from different business systems, quickly search knowledge associations through graph reasoning and visualization, discover hidden data relationships, and effectively associate different business data, so that business processes become more efficient and intelligent and enterprises and organizations can use knowledge data in multiple dimensions. In a traditional electronic archive management system, archives from different business sources are stored and displayed by type. To query an archive, a user has to filter the database by specific field information of the archive; this single query mode becomes slow as the volume of archive data grows. When the user does not know the specific information of the archive being sought, the user cannot locate the target archive directly, which further increases the query cost and delays the user's normal workflow.

With an archive relationship network built on the knowledge graph, the electronic archive management system can associate archive data scattered across multiple business sources and establish potential associations among the archive information of different business sources, which addresses the problems of data silos and the low application value of single pieces of data. Users can combine this information with graph displays customized to business requirements to make domain knowledge explicit, accumulate it, and associate it, thereby expanding the scenarios in which the data is used: the archive association information of different business sources widens the scope of archive query and retrieval, improves the efficiency with which users query and retrieve related archive information, and provides the ability to better organize, manage, and understand massive amounts of Internet information.
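As an illustration only, the relationship network described above can be pictured as a small graph in which archives are points and relationships are labelled edges. The following Python sketch is not taken from the disclosure: the class `ArchiveGraph`, the relation labels, and the archive identifiers are all hypothetical, and a real system would more likely use a graph database.

```python
from collections import defaultdict, deque


class ArchiveGraph:
    """Toy knowledge graph: archives are nodes, relations are labelled edges."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(relation, neighbour)]

    def add_relation(self, source, relation, target):
        # Store both directions so a lookup can start from either archive.
        self._edges[source].append((relation, target))
        self._edges[target].append((relation, source))

    def related(self, start, max_hops=2):
        """Return {archive: hop_count} for archives within max_hops of start."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _, neighbour in self._edges[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        del seen[start]
        return seen


# Hypothetical archives from three business sources.
graph = ArchiveGraph()
graph.add_relation("contract-2023-001", "same_business", "contract-2022-017")
graph.add_relation("contract-2023-001", "other_business", "invoice-2023-044")
graph.add_relation("invoice-2023-044", "other_business", "payment-2023-102")
```

Starting from one archive, `graph.related(...)` reaches archives from other business sources without the user knowing any of their field values, which is the kind of expanded query scope the paragraph above describes.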

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present disclosure.

Fig. 1 is a schematic structural diagram of an electronic archive management system based on a knowledge graph according to one or more embodiments of the present disclosure.

The electronic archive management system may include a data storage module 102, a data management module 104, a relationship analysis module 106, a relationship storage module 108, and a relationship display module 110.

The data storage module in the embodiment of this specification may be used for collecting archive data and for storing and querying the archives of various businesses through a database.

In the embodiment of this specification, the archive data to be collected and its sources may first be determined. A suitable data collection mode is then designed, which may include manual entry, file import, and data interfaces with other systems, so as to ensure that the collected data is accurate and complete while taking data security and the standardization of data formats into account. Next, a suitable database system, such as a relational database (e.g., MySQL or Oracle) or a document database (e.g., MongoDB), may be selected to store the archive data, and the table structure of the database, including fields and indexes, may be designed with the structure and relationships of the data in mind so as to support subsequent query and analysis requirements.

Further, in the embodiment of this specification, the corresponding data tables may be created according to the database design and the data stored in them. Table creation and data insertion may be implemented with database management tools or a programming language. The archive query function is then implemented on the basis of the database's query language (such as SQL) or programming interface. The query interface may be designed according to business requirements and may support queries by keyword, time range, business type, and similar conditions, while ensuring the accuracy of query results and the response speed.
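The table creation, insertion, and conditional query described above might look roughly as follows. This is a minimal sketch that uses Python's built-in `sqlite3` module in place of the MySQL, Oracle, or MongoDB systems named in the text; the table name `archive`, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE archive (
        id            INTEGER PRIMARY KEY,
        title         TEXT NOT NULL,
        business_type TEXT NOT NULL,
        archived_on   TEXT NOT NULL   -- ISO 8601 date, so string order = date order
    )
    """
)
# Index on the columns used for filtering, to keep queries fast as data grows.
conn.execute("CREATE INDEX idx_archive_type_date ON archive (business_type, archived_on)")

conn.executemany(
    "INSERT INTO archive (title, business_type, archived_on) VALUES (?, ?, ?)",
    [
        ("Procurement contract A", "contract", "2023-03-01"),
        ("Procurement invoice A", "invoice", "2023-03-15"),
        ("Maintenance contract B", "contract", "2022-11-20"),
    ],
)


def search_archives(conn, keyword, business_type, start, end):
    """Query by keyword, business type and archiving date range (inclusive)."""
    rows = conn.execute(
        """
        SELECT title FROM archive
        WHERE title LIKE ? AND business_type = ? AND archived_on BETWEEN ? AND ?
        ORDER BY archived_on
        """,
        (f"%{keyword}%", business_type, start, end),
    )
    return [title for (title,) in rows]
```

Parameterized queries are used throughout so that user-supplied keywords cannot alter the SQL statement.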

Meanwhile, the embodiment of this specification may establish a data quality management mechanism, including data cleaning, data verification, and data backup, to ensure the accuracy and integrity of the archive data. Data permission management policies may be designed and implemented according to the sensitivity of the archives and business requirements, so that only authorized personnel can access and modify the corresponding archive data.
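One possible shape for such a permission policy is a role table keyed by archive sensitivity. The sketch below is an assumption for illustration: the sensitivity levels, role names, and the `can_access` helper do not appear in the disclosure.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical role table: highest sensitivity readable, and whether edits are allowed.
ROLE_POLICY = {
    "visitor": (Sensitivity.PUBLIC, False),
    "clerk": (Sensitivity.INTERNAL, False),
    "archivist": (Sensitivity.CONFIDENTIAL, True),
}


def can_access(role, sensitivity, write=False):
    """Return True if the role may read (or, with write=True, modify) the archive."""
    policy = ROLE_POLICY.get(role)
    if policy is None:
        return False  # unknown roles are denied by default
    max_level, may_write = policy
    if sensitivity.value > max_level.value:
        return False
    return may_write or not write
```

Denying unknown roles by default keeps the check conservative when the role table is incomplete.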

In addition, the data storage and query functions may be tested to ensure the stability and performance of the system, and the system may be optimized according to the test results to improve its response speed and user experience.

The data management module in the embodiment of this specification may be used for managing the preset business links of the electronic archives.

In the embodiment of this specification, the specific requirements of the preset business links may be defined first. The preset business links may include the receiving, sorting, utilization, identification, statistics, and audit trail of archive data. Business processes and data requirements may be clarified through communication with archive managers and business departments. A suitable business process is then designed according to the preset business links, defining the specific operation steps, roles, and permissions of each link as well as the data flow and data processing requirements.

In the embodiment of the present disclosure, a flow and a manner of receiving archive data may be designed and implemented, which may include a manner of file uploading, data importing, interface docking, etc., so as to ensure accuracy and integrity of archive data.

In the embodiment of the present specification, the received archive data may be sorted and identified according to archive management specifications and requirements, which may include sorting, archiving, integrity checking, quality control, and data verification, etc., to ensure accuracy, integrity, and consistency of the archive data.

In the embodiment of the specification, functions and interfaces can be provided to support the viewing, searching, analyzing and utilizing of the archival data, and the query interface and the data analysis function are designed according to the service requirements to support the effective utilization of the archival data by a user.

In the embodiment of this specification, statistics and audit trail functions may be designed and implemented to compile statistics on, and audit, the use of archive data: user operation logs are recorded, data usage is tracked, and operations such as modification and deletion of data are audited.
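A minimal in-memory sketch of such an audit trail is shown below. The `AuditEntry` and `AuditLog` names and the action strings are hypothetical; a production system would persist the entries rather than hold them in a list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    user: str
    action: str       # e.g. "view", "modify", "delete"
    archive_id: str
    at: datetime


class AuditLog:
    def __init__(self):
        self._entries = []

    def record(self, user, action, archive_id):
        """Append one operation to the log with a UTC timestamp."""
        entry = AuditEntry(user, action, archive_id, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def trail(self, archive_id):
        """All operations on one archive, oldest first."""
        return [e for e in self._entries if e.archive_id == archive_id]

    def usage_counts(self):
        """How often each action was performed, for usage statistics."""
        return Counter(e.action for e in self._entries)
```

Entries are frozen dataclasses so that a recorded operation cannot be altered after the fact.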

In the embodiment of the present specification, system management functions including user authority management, role management, system configuration, log management, etc. may be provided to ensure security, stability and manageability of the system.

In the embodiment of the specification, the function of the data management module can be tested, the stability and the performance of the system are ensured, the optimization is performed according to the test result, and the response speed and the user experience of the system are improved.

The relationship analysis module of the embodiment of the present disclosure may be configured to aggregate different archive data stored in the data storage module, so as to obtain an archive relationship of the different archive data.

In the embodiment of this specification, the archive relationships of the different archive data may be obtained by performing preprocessing, data reduction, data cleaning, and data mining analysis on the different archive data, where the preprocessing includes missing value processing.

When processing missing values in the different archive data, the embodiment of this specification may fill them as follows. Missing values in archive data of the same business in the same period may be filled by mean imputation, that is, with the average of the other data of the same business in the same period. Missing values in archive data of different businesses in different periods may be filled according to a preset rule, which may be a logic rule predefined according to business requirements and used to judge and fill the missing values. Missing values in archive data of the same business in different periods may be filled by last observation carried forward (LOCF), that is, the last observed value is used as the fill value on the assumption that the value is relatively stable. Missing values in archive data of different businesses in the same period may be filled statistically, that is, a reasonable replacement value is calculated with a statistical method (such as the mean or the median).
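The four filling modes can be sketched as four small functions. This is an illustrative sketch, not the disclosed implementation: missing values are represented as `None`, each list is assumed to contain at least one observed value, the median is chosen arbitrarily as the statistic for the fourth case, and the function names are hypothetical.

```python
from statistics import mean, median


def fill_with_mean(values):
    """Same business, same period: fill gaps with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)
    return [avg if v is None else v for v in values]


def fill_locf(values):
    """Same business, different periods: carry the last observation forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first observation exists
    return filled


def fill_with_median(values):
    """Different businesses, same period: fill gaps with a statistic (median here)."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def fill_with_rule(records, field, rule):
    """Different businesses, different periods: fill gaps using a preset rule."""
    return [
        {**r, field: rule(r)} if r.get(field) is None else r
        for r in records
    ]
```

For example, `fill_locf([5, None, None, 7, None])` yields `[5, 5, 5, 7, 7]`, and a rule such as `lambda r: 10 if r["type"] == "contract" else 5` could supply a default retention period by business type.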

When performing data cleaning on the different archive data, the embodiment of this specification may check, according to a preset detection specification for archive data, whether the content of the different archive data complies with the archive storage specification, and screen out the archive data that does not comply.

Specifically, the embodiment of this specification may first define the archive storage specification, that is, determine which content and rules are considered compliant. The specification may cover data formats, data structures, field rules, data types, and the like. The content of each piece of archive data is then checked against the preset detection specification by comparing the archive data with the requirements the specification defines.

During content detection, the embodiment of this specification identifies archive data that does not meet the specification; such data may contain format errors, missing fields, field values that do not meet the specification, and the like. Non-compliant archive data may be processed or marked. Processing may include repairing format errors, filling in missing fields, and adjusting non-compliant field values. Marks may be used for subsequent processing or further analysis.

Further, the embodiment of the present disclosure may perform the step of data cleansing multiple times to ensure that the archive data meets the specification.
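The detection and screening steps above can be sketched as a specification table plus a checker. The field names, the identifier pattern, and the helpers `check_record` and `screen` are hypothetical; the disclosure does not define a concrete storage specification.

```python
import re

# Hypothetical storage specification: field name -> (required type, extra check).
ARCHIVE_SPEC = {
    "archive_id": (str, lambda v: re.fullmatch(r"[A-Z]{2}-\d{4}-\d{3}", v) is not None),
    "title": (str, lambda v: v.strip() != ""),
    "archived_on": (str, lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None),
}


def check_record(record, spec=ARCHIVE_SPEC):
    """Return a list of problems; an empty list means the record is compliant."""
    problems = []
    for field, (expected_type, is_valid) in spec.items():
        if field not in record or record[field] is None:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type: {field}")
        elif not is_valid(record[field]):
            problems.append(f"invalid value: {field}")
    return problems


def screen(records, spec=ARCHIVE_SPEC):
    """Split records into compliant ones and flagged (record, problems) pairs."""
    compliant, flagged = [], []
    for record in records:
        problems = check_record(record, spec)
        if problems:
            flagged.append((record, problems))
        else:
            compliant.append(record)
    return compliant, flagged
```

Because flagged records keep their list of problems, they can be repaired and passed through `screen` again, which matches the idea of repeating the cleaning step until the data meets the specification.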

When performing data mining analysis on the different archive data, the embodiment of this specification may perform semantic analysis on the field information of the different archive data according to a deep learning network, an NLP framework, and the data characteristics of different businesses, so as to analyze the characters, words, and passages in the different archive data at multiple levels and in multiple dimensions and obtain the semantic analysis content of the different archive data.

In the embodiment of this specification, a deep learning network, such as an autoencoder, a recurrent neural network (RNN), or a convolutional neural network (CNN), may be used to build a suitable model according to the data characteristics of different businesses. Natural language processing (NLP) frameworks and techniques, such as the bag-of-words model, word embeddings, TF-IDF, and topic models, may be used to perform semantic analysis on the field information of the different archive data and extract relevant features.

The embodiment of this specification may apply NLP techniques and deep learning models to perform text analysis and semantic analysis on the characters, words, and passages in the different archive data. The meaning of the data and its associations are extracted and understood by analyzing the semantics, sentiment, topics, keywords, and the like of the text.

The embodiment of the specification can perform multi-level and multi-dimensional data analysis based on the result of semantic analysis. The analysis results may be presented in the form of charts, images, word clouds, etc. using data visualization tools and techniques to better understand and discover relevant patterns and trends of the data.

Through the above implementation steps, multi-level and multi-dimensional data analysis, including semantic analysis, can be performed on the different archive data. This helps to uncover the inherent meaning and relevance of the data and provides richer information and insight to support business decisions and business optimization.
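Of the techniques named above, TF-IDF is the simplest to show in a few lines. The following pure-Python sketch weights terms by TF-IDF (with a smoothed inverse document frequency) and compares archives by cosine similarity. It assumes whitespace-separated text and hypothetical sample documents; Chinese archive text would first need word segmentation, and the deep learning models mentioned in the text are not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization with basic punctuation stripping."""
    return [w.strip(".,;:").lower() for w in text.split() if w.strip(".,;:")]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using TF * smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

A similarity score above some threshold could serve as one signal that two archives are related, alongside the business and period information used elsewhere in the system.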

The embodiment of this specification may perform data reduction on the different archive data through the following implementation steps:

Data cleaning and sorting: the archive data is cleaned and sorted, including removing redundant data, repairing erroneous data, and unifying formats and naming conventions. This improves data quality and accuracy and prepares for the subsequent data reduction.

Data classification and marking: the data is classified and marked according to its type, content, and value. Tags, metadata, or other means may be used to categorize the data for the subsequent formulation and implementation of the reduction strategy.

Data evaluation and screening: each piece of data is evaluated and screened according to actual requirements and goals. The data to be retained, deleted, or archived is determined by considering factors such as its timeliness, importance, and availability.

Formulating a reduction strategy: a specific reduction strategy is formulated according to the results of data evaluation and screening. The following strategies may be considered:

Deleting redundant data: duplicate or no longer useful data is deleted.

Deleting outdated data: data that is outdated or no longer needed is identified and deleted.

Archiving and storing: data that will not need frequent access for a long time is archived to free up storage space.

Compressing data: a compression algorithm is used to compress the data and reduce the storage space it occupies.

Implementing the reduction strategy: data reduction is carried out according to the formulated strategy by deleting, archiving, or compressing data, while ensuring that the reduction process is correct and efficient.

Monitoring and evaluation: the effect and impact of data reduction are monitored during implementation. Whether the reduction achieves the intended effect is assessed, and it is ensured that the reduction process does not cause data loss or errors.

Updating documents and records: all data reduction operations performed, including deleting, archiving, and compressing data, are recorded, and the accuracy and timeliness of the documents and records are ensured.

Periodic review and updating: the data reduction strategy is periodically reviewed and updated to accommodate changing needs and requirements. The data value and retention period are reassessed, and the reduction strategy and criteria are adjusted.

Through the above implementation steps, the different archive data can be reduced effectively, redundant and useless data is cut down, and the efficiency of data management and utilization is improved. This also helps to reduce storage costs, improve data access speed, and ensure the compliance and reliability of the data.
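Three of the strategies listed above (deleting duplicates, deleting outdated data, and compressing what remains), together with the record of operations performed, can be sketched as a single pass. The record layout, the `retention_days` parameter, and the function name `reduce_archives` are assumptions for illustration; content hashing with SHA-256 and compression with zlib are swapped-in choices, not techniques named by the disclosure.

```python
import hashlib
import zlib
from datetime import date


def reduce_archives(records, today, retention_days):
    """Apply three of the listed strategies: drop duplicates, drop outdated, compress."""
    kept, log, seen = [], [], set()
    for rec in records:
        # Identical content hashes mark a record as a duplicate of an earlier one.
        digest = hashlib.sha256(rec["content"]).hexdigest()
        if digest in seen:
            log.append((rec["id"], "deleted: duplicate"))
            continue
        seen.add(digest)
        # Records older than the retention period are treated as outdated.
        if (today - rec["archived_on"]).days > retention_days:
            log.append((rec["id"], "deleted: outdated"))
            continue
        packed = zlib.compress(rec["content"])
        kept.append({**rec, "content": packed, "compressed": True})
        log.append((rec["id"], "compressed"))
    return kept, log
```

The returned log corresponds to the "updating documents and records" step, and `zlib.decompress` restores the original bytes, so the compression itself loses no data.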

The relation storage module of the embodiment of the specification can be used for distinguishing and storing the archive relation of different businesses and different historical periods.

In the embodiment of this specification, when the archive relationships are stored in a differentiated manner, the current archive data may be stored as a first source archive; an archive relationship of the same business archived in the same period may be stored as a second source archive; an archive relationship of a different business archived in the same period may be stored as a current business source archive; an archive relationship of the same business archived in a different period may be stored as a second source archive together with the time interval between the two archives; and an archive relationship of a different business archived in a different period may be stored as a current business source archive together with the time interval between the two archives.

It should be noted that, in the embodiment of this specification, storing the current archive data as the first source archive may indicate that the archive is the most recent one and is directly associated with the current business. Storing archive relationships of the same business archived in the same period as second source archives may indicate that these archives were archived in the same period but are only weakly associated with the current business. Storing archive relationships of different businesses archived in the same period as current business source archives may indicate that these archives were archived in the same period and are directly associated with the current business. Storing archive relationships of the same business archived in different periods as second source archives, and recording the time interval between the two archives, may indicate that these archives were archived in different historical periods of the same business, and the development of the business can be understood from the time interval. Storing archive relationships of different businesses archived in different periods as current business source archives, and recording the time interval between the two archives, may indicate that these archives were archived for different businesses and in different historical periods, and the evolution among the different businesses can be understood from the time interval.

Through the implementation steps, the archive relations of different businesses and different historical periods can be stored in a distinguishing mode, and subsequent management and utilization are facilitated.
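One way to picture the differentiated storage is a relationship record that carries the role of the related archive and, only when the periods differ, the time interval. The sketch below is an interpretation made under stated assumptions: the `Archive` and `RelationRecord` types, the role strings `"second_source"` and `"business_source"`, and the use of a year string as the period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Archive:
    archive_id: str
    business: str
    period: str          # e.g. an archiving year such as "2023"
    archived_on: date


@dataclass(frozen=True)
class RelationRecord:
    first_source: str             # the current archive
    related: str                  # the archive it is related to
    related_role: str             # "second_source" or "business_source"
    interval_days: Optional[int]  # only stored when the periods differ


def store_relation(current, other):
    """Build the record to store for a relation between two archives."""
    # Same business -> second source archive; different business -> business source archive.
    role = "second_source" if other.business == current.business else "business_source"
    interval = None
    if other.period != current.period:
        interval = abs((current.archived_on - other.archived_on).days)
    return RelationRecord(current.archive_id, other.archive_id, role, interval)
```

Keeping the interval as an optional field lets the same record type cover all four storage cases described above.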

Further, in combination with the differentiated storage of archive relationships of different businesses and different historical periods described above, the archive relationships among the different archive data stored in the data storage module may be obtained as follows. The archive data of the same business acquired at the current moment is aggregated, a first association relationship among the archive data of the same business is analyzed, and, when the first association relationship is established, the first source archive and the second source archive are recorded separately. The archive data of a first designated business acquired at the current moment is aggregated with the previously archived archive data of the same business, a second association relationship among the archive data of the same business in different historical periods is analyzed, and, when the second association relationship is established, the first source archive and the second source archive are recorded separately, together with the generation time interval of the archive data. The archive data of a second designated business acquired at the current moment is aggregated and analyzed with the archive data of other businesses acquired at the same time to obtain a third association relationship among all the archive data at the same time, and, when the third association relationship is established, the first source archive and the source archives of the other businesses are recorded separately. The archive data of a third business acquired at the current moment is aggregated and analyzed with the previously archived archive data of other businesses to obtain a fourth association relationship among all the archive data at the same time, and, when the fourth association relationship is established, the current business source archive and the other business source archives are recorded separately, together with the generation time interval of the archive data.

It should be noted that, in the embodiment of the present disclosure, recording the first source archive and the second source archive separately when the first association relationship is established means that, for the archive data of the same service acquired at the current moment, the relationship between the first source archive and the second source archive is analyzed to establish the first association relationship, and the two archives are recorded as distinct sources. Recording the first source archive and the second source archive separately, together with the generation time interval of the archive data, when the second association relationship is established means that the archive data of the first designated service acquired at the current moment is compared with the previously archived archive data of the same service, the relationship between the first source archive and the second source archive is established, and the two archives are recorded separately together with the time interval between them. Recording the first source archive and the other service source archive separately when the third association relationship is established means that the relationship between the first source archive and the other service source archive is obtained by analyzing the archive data of the second designated service acquired at the current moment together with the archive data of other services, and the two archives are recorded as distinct sources. Recording the current service source archive and the other service source archive separately, together with the generation time interval of the archive data, when the fourth association relationship is established means that the archive data of the third service acquired at the current moment is compared with the previously archived archive data of other services to obtain the relationship between them, and the current service source archive and the other service source archive are recorded separately together with the time interval between them.
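
As an illustrative, non-limiting sketch, the four association relationships described above can be read as the combinations of "same or different service" and "same or different period". The names `Archive`, `AssociationKind`, `same_period` and `classify` are hypothetical, and treating a "period" as a calendar year is an assumption, since the specification does not define the length of a period.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional, Tuple


class AssociationKind(Enum):
    FIRST = "same service, same period"
    SECOND = "same service, different period"
    THIRD = "different service, same period"
    FOURTH = "different service, different period"


@dataclass(frozen=True)
class Archive:
    archive_id: str
    service: str    # business (service) the archive belongs to
    filed_on: date  # date on which the archive was filed


def same_period(a: Archive, b: Archive) -> bool:
    # Assumption: a "period" is a calendar year; the source does not say.
    return a.filed_on.year == b.filed_on.year


def classify(current: Archive, other: Archive) -> Tuple[AssociationKind, Optional[int]]:
    """Return the association kind and, for cross-period pairs, the interval in days."""
    same_service = current.service == other.service
    if same_period(current, other):
        kind = AssociationKind.FIRST if same_service else AssociationKind.THIRD
        return kind, None  # no time interval is recorded within one period
    kind = AssociationKind.SECOND if same_service else AssociationKind.FOURTH
    return kind, abs((current.filed_on - other.filed_on).days)
```

In this sketch the time interval is only returned for the second and fourth kinds, mirroring the description, in which the generation time interval is recorded only for archives filed in different periods.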

The relationship display module of the embodiment of the present disclosure may be configured to obtain the archive relationship stored in the relationship storage module, and display, through a knowledge graph, an association relationship between current archive data and other archive data.

In this embodiment of the present disclosure, the archive relationship stored in the relationship storage module may be acquired first, and other archive data associated with the current archive data and a dimension in which the other archive data is located may be determined; and determining the nodes and the associated edges of the current archive data and the other archive data in the associated network through a knowledge graph, and generating a related associated network so as to display the association relationship between the current archive data and the other archive data.

It should be noted that, when acquiring the archive relationship stored in the relationship storage module in the embodiment of the present disclosure, stored archive relationship data may be acquired from the relationship storage module, where the data describes an association relationship between archives, including an association between current archive data and other archive data.

It should be noted that, when determining the other archive data associated with the current archive data and its dimension, the embodiment of the present disclosure may determine the other archive data associated with the current archive data according to the obtained archive relationship data, and further determine the dimension in which that archive data is located, where the dimension may be a service dimension, a time dimension, or another relevant dimension.

It should be noted that, when determining nodes and edges in the association network through the knowledge graph in the embodiment of the present disclosure, knowledge graph techniques and methods may be used to determine, according to the determined association relationships and dimension information, the nodes and associated edges of the current archive data and the other archive data in the association network, where the nodes may represent archive data and the associated edges may represent association relationships between archives.

It should be noted that, when the association network display association relationship is generated in the embodiment of the present disclosure, a related association network diagram may be generated according to the determined nodes and association edges in the association network, and the association network diagram may be displayed in a graphic or other form, so as to display the association relationship between the current archive data and other archive data, and through this diagram, the association situation between the archives may be intuitively known.

It should be noted that, through the above implementation steps, the embodiment of the present disclosure may acquire and display the association relationship between the current archive data and other archive data, so as to help understand and analyze the association between archive data, and further apply to relevant management and decision.

Further, in the embodiment of the present disclosure, when the nodes and associated edges of the current archive data and the other archive data in the association network are determined through the knowledge graph, the first source archive may be determined as the root node of the association network, the other archive data may be determined as slave nodes of the association network, and the second source archive in the same period, the current service source archive in the same period, the second source archive in a different period, and the current service source archive in a different period may be determined as associated edges of the association network.

It should be noted that, in the embodiment of the present disclosure, a knowledge graph is first constructed, and the knowledge graph is constructed to include nodes and associated edges of archive data. The nodes represent different archive data, and the associated edges represent the relations among the different archive data; determining a root node and a slave node, wherein the first source archive is determined to be used as the root node of the association network according to the knowledge graph, and other archive data are used as the slave nodes of the association network; then, determining the associated edge, and determining a second source file in the same period, a current service source file in the same period, a second source file in different periods and a current service source file in different periods as the associated edge in the associated network according to the knowledge graph; and finally, determining the relation between the nodes and the edges, and determining the relation between the current archive data and other archive data according to the nodes and the associated edges in the knowledge graph, namely determining the nodes and the associated edges where the current archive data and other archive data are located in an associated network.
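
The root node, slave node and associated edge roles described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `AssociationNetwork` and `build_network` are hypothetical names, and representing each associated edge as a `(root, slave, label)` tuple, with the label carrying the source-archive category, is one possible reading of the text.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class AssociationNetwork:
    root: str                                        # first source archive
    slaves: List[str] = field(default_factory=list)  # other archive data
    # Each edge is (root id, slave id, label); the label names the category,
    # e.g. "second source archive, same period" (hypothetical wording).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def build_network(root_id: str, relations: Iterable[Tuple[str, str]]) -> AssociationNetwork:
    """Build a star-shaped network from (other_archive_id, label) pairs."""
    net = AssociationNetwork(root=root_id)
    for other_id, label in relations:
        if other_id not in net.slaves:
            net.slaves.append(other_id)  # register each slave node once
        net.edges.append((root_id, other_id, label))
    return net
```

For example, calling `build_network("A1", [("A2", "second source archive, same period")])` would yield a network whose root is `"A1"` with a single slave node `"A2"` joined by one labeled edge.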

It should be noted that the specific implementation steps of the embodiments of the present disclosure may operate according to the above summary, including constructing a knowledge graph, determining a root node and a slave node, determining an associated edge, and determining a relationship between a node and an edge.

It should be noted that existing electronic archive management systems suffer from low query and retrieval efficiency, and return little useful information, as the volume of archive data grows. To address these problems, the embodiment of the specification provides an efficient electronic archive management system based on a knowledge graph. An electronic archive information base is built by collecting the basic data information of the archives. A query can use the field information of the archives as well as the established archive association relationships, so that queries are fast and accurate. When a user is not sure of the specific content of a target archive, the user can still locate it quickly through the relationship network built among other archives, which facilitates archive management and greatly improves the utilization rate of archive data.

The electronic archive management system comprises a data storage module, a data management module, a relationship analysis module, a relationship storage module and a relationship display module:

1. The data storage module realizes the storage and query of various business archives through a database;

2. The data management module realizes the conventional business links of electronic archives and the general functional requirements of electronic archive system management, such as receiving, sorting, utilizing, identifying, counting, audit trail, and system management;

3. The relationship analysis module summarizes the different archive data stored in the data storage module after archiving, and obtains the relationships among the archive data through methods such as data preprocessing, data reduction, data cleaning and evaluation, and data mining analysis;

4. The relationship storage module distinguishes and stores the archive relationships of different services and different historical periods: the electronic archive management system collects archive data from different service sources, stores the data in the data storage module, performs association relationship analysis on the data archived in different time ranges and different source ranges, and stores the obtained archive relationships in the relationship storage module;

5. The relationship display module acquires the archive relationships stored in the relationship storage module and displays the association relationships among different archives.

FIG. 2 is a flowchart illustrating a relationship display process according to one or more embodiments of the present disclosure, where the process includes file data collection, file data archiving, file relationship analysis, file relationship storage, and file relationship display.

The data storage module can collect archive data and realize the storage and inquiry of various business archives through the database.

The data management module can realize the general functional requirements of the conventional electronic file business links and the electronic file system management, such as receiving, sorting, utilizing, identifying, counting, audit trail, system management and the like.

The relation analysis module can collect different archive data stored in the data storage module after archiving, and the relation acquisition of the archive data is carried out through methods such as data preprocessing, data reduction, data cleaning and evaluation, data mining analysis and the like.

The relationship storage module can store the archive relationships of different services and different historical periods. The electronic archive management system collects archive data from different service sources and stores the data in the data storage module, performs association relationship analysis on the data archived in different time ranges and different source ranges through the relationship analysis module, and stores the obtained archive relationships in the relationship storage module.

The relationship display module can be used for acquiring the file relationship stored in the relationship storage module and displaying the association relationship among different files.

It should be noted that, the specific technical scheme of the embodiment of the present specification is as follows:

the relationship analysis module can summarize different archive data stored in the data storage module after archiving, and acquire the relationship of the archive data through methods such as data preprocessing, data reduction, data cleaning and evaluation, data mining analysis and the like:

(1) The data preprocessing comprises missing value processing, data cleaning, data selection, data transformation, data integration, and data reduction;

(2) Missing values of archive data of the same service in the same period are filled by mean value processing; missing values of archive data of different service sources in different periods are filled according to preset rules; missing values of archive data of the same service in different periods are filled by the last observation carried forward (LOCF) method; and missing values of archive data of different service sources in the same period are filled in a statistical manner;

(3) In the data cleaning process, the content of the data is checked against the detection specification for electronic archive data to determine whether it complies with the archive storage specification, and data that does not comply with the archive storage specification is screened out;

(4) Carrying out semantic analysis on field information of the archive data, carrying out semantic understanding by utilizing a deep learning network and an NLP framework to realize multi-level and multi-dimensional data analysis such as characters, words, chapters and the like, obtaining semantic analysis content of the archive data, simultaneously carrying out fine-granularity semantic analysis extraction by combining data characteristics of different service data sources, and expanding characteristic information of the archive data;

(5) The obtained archive data features are re-examined and verified, identifiable errors in the data files are corrected, erroneous or conflicting data is cleaned according to given rules, and the analyzed data feature vectors are obtained in the desired format.
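
Two of the missing value filling modes named in step (2), mean value filling and last observation carried forward (LOCF), may be sketched as follows. This is only an illustration: the function names `fill_with_mean` and `fill_locf` are hypothetical, missing values are assumed to be represented as `None`, and the preset-rule and statistical modes are not shown because the specification does not detail them.

```python
from statistics import mean
from typing import List, Optional


def fill_with_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing can be filled
    avg = mean(observed)
    return [avg if v is None else v for v in values]


def fill_locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the most recent earlier observation.

    The values are assumed to be ordered by period, oldest first. A missing
    value with no earlier observation is left as None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled
```

Mean filling suits values drawn from one period, where no ordering is implied, while LOCF relies on the ordering across periods, which is consistent with the assignment of these modes in step (2).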

The electronic archive management system collects archive data from different service sources and stores the data in the data storage module, performs association relationship analysis on the data archived in different time ranges and different source ranges, and stores the obtained archive relationships in the relationship storage module:

(1) Summarizing the same service source data acquired at the current moment through a relationship analysis module, analyzing the internal association relationship of the data, storing the generated relationship in a relationship storage module, and distinguishing and recording a first source file and a second source file when the relationship is established;

(2) Summarizing a certain service file acquired at the current moment and the same service file filed previously through a relation analysis module, analyzing the association relation existing between the same service file data in different historical periods, storing the acquired association relation in a relation storage module, distinguishing and recording a first source file and a second source file when the relation is established, and distinguishing and recording the generation time interval of the file data;

(3) Summarizing and analyzing a certain service file acquired at the current moment and other service files acquired at the same time through a relation analysis module, acquiring the association relation of all archive file data at the same time, storing the acquired association relation in a relation storage module, and distinguishing and recording a first source file and other service source files when the relation is established;

(4) And carrying out summarization analysis on a certain service file acquired at the current moment and other service files which are filed previously through a relation analysis module, acquiring the association relation of all filed file data at the same time, storing the acquired association relation in a relation storage module, distinguishing and recording the current service source file from other service source files when the relation is established, and distinguishing and recording the generation time interval of the file data.

The relation storage module distinguishes and records the file relation between different service sources and service periods according to each file, records the current file as a first source file, and then distinguishes the different service sources and the service periods:

(1) Taking the same business file relationship filed in the same period as a second source file;

(2) Different business archive relations filed in the same period are used as current business source archives;

(3) Taking the same business archive relationship filed in different periods as a second source archive, and recording the time interval between the two archives;

(4) And taking the different business archive relations filed in different periods as current business source archives, and recording the time interval between the two archives.
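
The four storage rules above may be sketched as a single relation table. This is a minimal sketch under stated assumptions: the table name `archive_relation`, its columns, and the helper names `open_store` and `store_relation` are hypothetical, an in-memory SQLite database stands in for whatever store the system actually uses, and the time interval is assumed to be expressed in days.

```python
import sqlite3

# Hypothetical schema: one row per relationship of a first source archive.
SCHEMA = """
CREATE TABLE archive_relation (
    first_source_id TEXT NOT NULL,
    related_id      TEXT NOT NULL,
    source_type     TEXT NOT NULL
        CHECK (source_type IN ('second_source', 'current_service_source')),
    same_period     INTEGER NOT NULL,
    interval_days   INTEGER,
    -- Rules (3) and (4): a cross-period relationship must record its interval.
    CHECK (same_period = 1 OR interval_days IS NOT NULL)
)
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def store_relation(conn, first_source_id, related_id,
                   same_service, same_period, interval_days=None):
    # Labels follow the rules above: same-service relationships are stored as
    # second source archives, different-service relationships as current
    # service source archives.
    source_type = "second_source" if same_service else "current_service_source"
    conn.execute(
        "INSERT INTO archive_relation VALUES (?, ?, ?, ?, ?)",
        (first_source_id, related_id, source_type, int(same_period), interval_days),
    )
```

The table-level `CHECK` constraint is a design choice of this sketch: it rejects a different-period relationship stored without a time interval, so the requirement of rules (3) and (4) is enforced by the store itself.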

The relationship display module is used for acquiring the archive relationships stored in the relationship storage module and displaying the association relationships among different archives. When viewing a specific archive, the user can choose, on the display and viewing interface of that archive, whether to view the association relationships of the archive, and jumps to the relationship view by clicking the option for viewing archive relationships.

During the jump, the system searches, through the relationship storage module, for the archives of each dimension associated with the current archive (this jump-based query avoids the long delay that would be caused by directly querying all archive information), transfers the associated archive data to the system front end, displays the association relationships between different archives in a "node-edge" form, and displays the associations between archives of different dimensions and the archive node in a visually distinct manner, thereby intuitively showing the association relationships between archives of different levels and business archives of different periods. The currently displayed archive, as the first source archive, serves as the root node of the relationship network; the second source archives, the current service source archives, the second source archives of different periods, and the current service source archives of different periods recorded in the relationship storage module are then displayed together on the relationship display interface, for which reference may be made to the archive relationship diagram shown in FIG. 3.
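
The jump query described above may be sketched as a lookup restricted to the archive being viewed, followed by assembly of a node-edge payload for the front end. The record fields (`first_source`, `related`, `source_type`, `same_period`), the payload layout, and the function names `index_relations` and `jump_payload` are all assumptions made for illustration; the specification does not define a data format.

```python
from collections import defaultdict
from typing import Dict, List


def index_relations(relations: List[dict]) -> Dict[str, List[dict]]:
    """Group stored relation records by their first source archive."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for rel in relations:
        index[rel["first_source"]].append(rel)
    return index


def jump_payload(index: Dict[str, List[dict]], archive_id: str) -> dict:
    """Build the node-edge data for one archive without scanning all archives."""
    nodes = [{"id": archive_id, "role": "root"}]
    edges = []
    for rel in index.get(archive_id, []):
        period = "same_period" if rel["same_period"] else "different_period"
        # The dimension tag lets the front end style each category differently.
        dimension = rel["source_type"] + "/" + period
        nodes.append({"id": rel["related"], "role": "slave", "dimension": dimension})
        edges.append({"from": archive_id, "to": rel["related"], "dimension": dimension})
    return {"nodes": nodes, "edges": edges}
```

Only the records indexed under the viewed archive are touched, which corresponds to the stated aim of avoiding a query over all archive information.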

When a user is not sure of the information of the specific archive to be found, a fuzzy query can be performed using key information of related archives associated with that archive, and the target archive can then be located quickly through the association relationships established between those archives and the target archive, which improves the user's archive query and retrieval efficiency.
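
The fuzzy query followed by relationship-based positioning may be sketched as follows. This is an illustration only: approximate title matching with the Python standard library function `difflib.get_close_matches` is a stand-in chosen for this sketch, not the matching method of the embodiment, and `locate_via_relations`, the `titles` mapping and the `relations` mapping are hypothetical.

```python
from difflib import get_close_matches
from typing import Dict, List


def locate_via_relations(query: str,
                         titles: Dict[str, str],
                         relations: Dict[str, List[str]],
                         cutoff: float = 0.6) -> List[str]:
    """Return candidate target archives reached from fuzzily matched archives.

    titles maps archive id to title (titles are assumed unique here);
    relations maps archive id to the ids of its associated archives.
    """
    by_title = {title: archive_id for archive_id, title in titles.items()}
    hits = get_close_matches(query, list(by_title), n=3, cutoff=cutoff)
    candidates: List[str] = []
    for title in hits:
        for related in relations.get(by_title[title], []):
            if related not in candidates:
                candidates.append(related)  # keep first-seen order, no repeats
    return candidates
```

The user supplies only approximate information about a related archive; the candidates returned are the archives associated with whichever archives that information matched.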

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims

1. An electronic archive management system based on a knowledge graph, the system comprising: the system comprises a data storage module, a data management module, a relationship analysis module, a relationship storage module and a relationship display module; wherein,

2. The system of claim 1, wherein said obtaining archival relationships of said different archival data comprises:

and preprocessing, data reduction, data cleaning and data mining analysis are carried out on the different archive data to obtain archive relations of the different archive data, wherein the preprocessing comprises missing value processing.

3. A system according to claim 2, wherein the missing value processing of the different archive data comprises:

filling missing values of file data of the same service in the same period in an average value processing mode;

filling missing values of file data of different businesses in different periods in a preset rule mode;

filling missing values of file data of the same service in different periods by using the last observation carried forward (LOCF) method;

and supplementing missing values of the archive data of different businesses in the same period in a statistical mode.

4. A system according to claim 2, wherein the data cleansing of the different archive data comprises:

according to the detection specification of the preset archive data, detecting whether the content of different archive data accords with the archive storage specification of the archive or not, and screening out the archive data which does not accord with the archive storage specification.

5. The system of claim 2, wherein performing data mining analysis on the different archive data comprises:

according to the data characteristics of the deep learning network, the NLP framework and different services, carrying out semantic analysis on field information of different archive data so as to realize multi-level and multi-dimensional data analysis on characters, words and chapters in the different archive data and obtain semantic analysis contents of the different archive data.

6. The system of claim 1, wherein the archive relationships for different businesses and different historical periods are stored separately, comprising:

storing the current archive data as a first source archive;

storing the same business archive relation filed in the same period as a second source archive;

storing different business archive relations filed in the same period as current business source archives;

storing the same business archive relationship filed in different periods as a second source archive, and storing the time interval between the two archives;

and storing different business archive relations filed in different periods as current business source archives, and storing the time interval between the two archives.

7. The system of claim 6, wherein the summarizing of the different archive data stored in the data storage module to obtain the archive relationships of the different archive data comprises:

summarizing the archive data of the same service obtained at the current moment, analyzing a first association relation of the archive data of the same service, and distinguishing and recording the first source archive and the second source archive when the first association relation is established;

summarizing the file data of the first appointed service acquired at the current moment and the file data of the same service which is filed previously, analyzing second association relations existing among the file data of the same service in different historical periods, distinguishing and recording the first source file and the second source file when the second association relations are established, and distinguishing and recording the generation time interval of the file data;

summarizing and analyzing the archive data of the second designated service acquired at the current moment and the archive data of other services acquired at the same time to acquire a third association relation of all archive data at the same time, and distinguishing and recording the first source archive and the source archive of other services when the third association relation is established;

and summarizing and analyzing the archive data of the third service acquired at the current moment and the archive data of other previously archived services to acquire a fourth association relation of all archive data at the same time, distinguishing and recording the current service source archive and the other service source archive when the fourth association relation is established, and distinguishing and recording the generation time interval of the archive data.

8. The system of claim 6, wherein the acquiring the archive relationship stored in the relationship storage module and displaying the association relationship between the current archive data and other archive data through a knowledge graph includes:

acquiring the archive relation stored in the relation storage module, and determining other archive data associated with the current archive data and the dimension of the other archive data;

and determining nodes and associated edges of the current archive data and the other archive data in an associated network through a knowledge graph, and generating an associated network so as to display the association relationship between the current archive data and the other archive data.

9. The system of claim 8, wherein the determining, through a knowledge graph, of the nodes and associated edges of the current archive data and the other archive data in an associated network comprises:

and determining the first source file as a root node in the association network through a knowledge graph, determining the other file data as slave nodes in the association network, and determining a second source file in the same period, a current service source file in the same period, a second source file in different periods and a current service source file in different periods as association edges in the association network.

10. The system of claim 1, wherein the preset business links comprise one or more of: receiving, sorting, utilization, identification, statistics and audit trail of the archive data, and system management.