CN106682072A

CN106682072A - Knowledge management based data mining method for digital archives

Info

Publication number: CN106682072A
Application number: CN201611013730.0A
Authority: CN
Inventors: 王学杰; 杨乃平
Original assignee: Anhui Huabo Shengxun Information Technologies Co Ltd
Current assignee: Anhui Huabo Shengxun Information Technologies Co Ltd
Priority date: 2016-11-17
Filing date: 2016-11-17
Publication date: 2017-05-17

Abstract

The invention discloses a knowledge management based data mining method for digital archives. The method creates the conditions for digital archives to implement knowledge management, and is a method and strategy for coordinating and managing a cluster of information processing technologies. It is based on networks and digital resources and on the coordination and cooperation of multiple information technologies, uses mining algorithms and mining models as its means, organizes and discovers the knowledge resources that already exist in digital archives, and aims to provide management objects for knowledge management, so that digital archives use knowledge effectively and achieve knowledge innovation.

Description

A knowledge management based data mining method for digital archives

Technical field

The present invention relates to the field of information management, and in particular to a knowledge management based data mining method for digital archives.

Background art

Digital archives are the new organizational form that traditional physical archives take in the information age. They are the inevitable result of physical archives continuing to innovate and develop in the information age; they are the key to meeting the challenges of the knowledge economy, extending the functions of traditional physical archives, satisfying user needs, and providing personalized and diversified services; and they are a new opportunity to raise society's awareness of archives. How to refine and mine, from the vast digital resources of digital archives, valuable information that effectively supports the archives' knowledge accumulation and knowledge innovation is therefore an important problem facing the future construction of digital archives. Data mining technology is an effective way to solve this problem. Data mining is a current focus of computer science, and its results are also widely applied in library and information science.

Data mining is a very broad interdisciplinary field that originated in computer science. Although it has been applied in many fields, and practice in the library and information communities has fully verified its value, the archival community still regards data mining as an abstruse technology and theory, and many archivists have only a vague understanding of the concept. So what is data mining? Data mining is the process of extracting, from large amounts of incomplete, noisy, fuzzy, and random data, the information and knowledge that is hidden in the data, unknown to people in advance, and potentially useful. The purpose of this process is to discover the "knowledge gold mine" buried in the silt of massive data, so it is more appropriate to define data mining as "knowledge mining in data". For this reason, data mining is also called knowledge mining, knowledge extraction, and so on.

According to the mining task, data mining methods can be divided into several types, including concept description, association analysis, classification analysis, cluster analysis, and deviation detection, as follows:

Concept description collects a class of related data through analysis and comparison, summarizes the relevant characteristics of that class of objects, and describes the large amount of information about the class. These descriptions are abstract and meaningful. There are two types: characteristic description and discriminative description.

1) Characteristic description describes what the objects of a class have in common. For example, the archive database of an archive holds a large amount of basic user information, such as name, age, occupation, and usage preferences. Describing the history researchers among these users would likely give the following result: they are mainly college teachers and students, and their purpose is to compile various local chronicles and write historical research articles.

2) Discriminative description describes the differences between two or more classes of objects. For example, comparing the characteristics of enterprise users with those of history researchers might yield the following rule: enterprise users mainly use archive information on production management and research and development management, with the aim of obtaining economic and social benefits.
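The two kinds of concept description above can be sketched in a few lines. This is only an illustration: the record fields (`group`, `occupation`, `purpose`), the sample values, and the choice of "most frequent value" as the summary are assumptions and are not specified by the source.

```python
from collections import Counter

# Hypothetical user records; field names and values are illustrative only.
USERS = [
    {"group": "history_researcher", "occupation": "college teacher", "purpose": "compiling local chronicles"},
    {"group": "history_researcher", "occupation": "student", "purpose": "writing research articles"},
    {"group": "history_researcher", "occupation": "college teacher", "purpose": "writing research articles"},
    {"group": "enterprise", "occupation": "production manager", "purpose": "economic benefit"},
    {"group": "enterprise", "occupation": "R&D manager", "purpose": "economic benefit"},
]

def characterize(records, group, fields):
    """Characteristic description: the most common value of each field within one class."""
    members = [r for r in records if r["group"] == group]
    return {f: Counter(r[f] for r in members).most_common(1)[0][0] for f in fields}

def discriminate(records, group_a, group_b, fields):
    """Discriminative description: fields whose dominant value differs between two classes."""
    a = characterize(records, group_a, fields)
    b = characterize(records, group_b, fields)
    return {f: (a[f], b[f]) for f in fields if a[f] != b[f]}

FIELDS = ["occupation", "purpose"]
profile = characterize(USERS, "history_researcher", FIELDS)
contrast = discriminate(USERS, "history_researcher", "enterprise", FIELDS)
```

Here `profile` summarizes what history researchers have in common, while `contrast` lists, for each field, the dominant value in each of the two classes where they differ.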

Association analysis describes the correlations that exist between data items in a database, that is, it mines the relationships hidden between data items. Specifically, if two or more data items are associated in some way, one of them can be predicted from the others. Association analysis can discover associations between the different archive information that users consult, and so analyze and predict users' usage patterns.
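As a rough illustration of association analysis, the sketch below derives pairwise rules of the form "users who consult A also consult B" using the standard support and confidence measures. The session data, the archive category names, and the thresholds are assumptions; the source does not name a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of archive categories each user consulted.
SESSIONS = [
    {"land records", "local chronicles"},
    {"land records", "local chronicles", "personnel files"},
    {"land records", "personnel files"},
    {"local chronicles", "land records"},
    {"personnel files"},
]

def association_rules(sessions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(sessions)
    item_count = Counter(item for s in sessions for item in s)
    pair_count = Counter(frozenset(p) for s in sessions for p in combinations(sorted(s), 2))
    rules = []
    for pair, count in pair_count.items():
        support = count / n  # share of sessions containing both items
        if support < min_support:
            continue
        for antecedent in pair:
            (consequent,) = pair - {antecedent}
            confidence = count / item_count[antecedent]  # estimate of P(consequent | antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)

rules = association_rules(SESSIONS)
```

With this toy data, only the pair "land records" and "local chronicles" passes both thresholds, so a user who consults one of them can be predicted to be interested in the other.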

Classification analysis groups the data in a database in an orderly way, which helps people grasp things comprehensively. It can be divided into classification of structured data, such as data in a relational database, and classification of unstructured data, such as text data. The procedure is as follows: the data in a data set are classified into different categories using a set of features; a model describing these data is then found, and data are assigned to categories according to this model, which can also be used to predict unknown data. Using the data in an existing user archive database, classification analysis can reveal the relationship between user characteristics and usage behavior and classify the data according to how strongly they influence user behavior, in order to predict future user behavior.
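The procedure described above (learn a model from data whose classes are known, then use it to predict unknown data) can be sketched with a nearest-centroid classifier. The classifier choice, the two features, and the training values are assumptions made for illustration; the source does not prescribe a particular model, and real features would normally need scaling.

```python
from statistics import mean

# Hypothetical labelled data: (visits per month, share of historical material accessed) -> user class.
TRAINING = [
    ((12, 0.9), "history_researcher"),
    ((10, 0.8), "history_researcher"),
    ((3, 0.2), "enterprise"),
    ((2, 0.1), "enterprise"),
]

def fit_centroids(samples):
    """Model: one mean feature vector (centroid) per class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in by_label.items()}

def predict(centroids, features):
    """Assign an unseen record to the class whose centroid is closest."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

model = fit_centroids(TRAINING)
frequent_visitor = predict(model, (9, 0.7))
occasional_visitor = predict(model, (1, 0.3))
```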

Cluster analysis is the process of dividing the data in a database into different data classes. Unlike classification analysis, cluster analysis places data into classes without a classification model known in advance. The goal of clustering is to partition the data set sensibly on the principle of maximizing similarity within a class and minimizing similarity between classes; put simply, it minimizes the differences within classes and maximizes the differences between classes, so that similar data can be grouped together and certain rules can be derived.
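A minimal k-means sketch shows the principle of minimizing differences within classes without any model known in advance. K-means, the farthest-first choice of starting centroids, and the two hypothetical features (access frequency, average session length) are assumptions; the source does not name a clustering algorithm.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=20):
    centroids = farthest_first(points, k)
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids

# Hypothetical users described by (accesses per week, average session minutes).
POINTS = [(1, 2), (2, 1), (1, 1), (9, 9), (10, 8), (9, 10)]
assignment, centroids = kmeans(POINTS, k=2)
```

The first three users end up in one cluster and the last three in the other, although no class labels were supplied.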

Deviation detection is the process of finding abnormal cases in a database and analyzing the deviating data. Its focus is on finding abnormal changes in the data; such changes may be caused by errors, but are more likely the result of natural changes such as data updates. The significance of deviation detection is that it can effectively exclude a large amount of irrelevant data. For example, before producing a compilation and research result, an archive first searches its user information database and combines the results with the existing resources in the archive database; it then uses data mining techniques and a model to exclude irrelevant users, treats the remaining users as the focus, and formulates a targeted compilation and research strategy.
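One common way to flag deviating values is the modified z-score based on the median absolute deviation (MAD). The sketch below uses it as a stand-in; the source does not specify a detection rule, and the access counts and the threshold of 3.5 are assumptions.

```python
from statistics import median

def find_deviations(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median: nothing can be ranked as deviating
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical daily access counts for one user; the last day is abnormal.
DAILY_ACCESSES = [20, 22, 19, 21, 20, 23, 95]
flagged = find_deviations(DAILY_ACCESSES)
```

Whether a flagged value is an error or a natural change such as a data update still has to be decided by analyzing the record itself.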

Data mining in knowledge management based digital archives must first position the relationship among digital archive resources, knowledge management, and data mining. Organizing and discovering the knowledge resources of digital archives is the basis on which digital archives achieve modern scientific management and provide fast, high-quality service. Implementing knowledge management in digital archives is an inevitable requirement for meeting the challenges of the knowledge economy, maximizing the potential of the archives' knowledge resources, and ultimately achieving knowledge innovation. Digital archives that do not implement knowledge management cannot meet the needs of future development, and knowledge management that lacks objects to manage becomes water without a source. Data mining is an effective way to organize and discover knowledge resources in digital archives; it creates the conditions for implementing knowledge management and is the connecting stage that links the two seamlessly. Here, data mining cannot be regarded as a pure information processing technology; it is a method and strategy for coordinating and managing a cluster of information processing technologies. Data mining in knowledge management based digital archives is a process that is based on networks and digital resources, relies on the coordination and cooperation of multiple information technologies, uses mining algorithms and mining models as its means, and organizes and discovers the knowledge resources that already exist in digital archives, with the aim of providing management objects for knowledge management, so that digital archives use knowledge effectively and achieve knowledge innovation.

The main mining objects in knowledge management based digital archives include:

1) Solidified resources in digital archives. This is the explicit knowledge in digital archives, that is, knowledge recorded on a physical carrier, including: digitized collection resources, existing electronic files, retrieval tools, compilation and research results, the laws, regulations, rules, and industry standards related to digital archive work, the research results and technical data produced in the course of digital archive construction, and other knowledge that contributes to the development of digital archives.

2) Intellectual resources in digital archives. This is the tacit knowledge in digital archives: the large body of uncodified intellectual resources stored in the minds of archive managers, policy and regulation researchers, information technology staff, external collaborators, and others, including management methods, computer processing techniques, and problem-solving ability. Because people are the core of knowledge management and its most active factor, mining this part of knowledge is also the focus of knowledge mining in digital archives.

3) User usage behavior information. This has two aspects: usage information and feedback information. Usage information is generated when users carry out concrete usage behavior in order to solve practical problems and meet needs in science, scientific research, production, and so on. It includes the content accessed, access frequency, access time, and so on, and reflects users' personalized and diversified needs for digital resources and their usage patterns. Feedback information consists of the problems and situations that archive users discover, and their requirements, suggestions, evaluations, and benefits, in the continuous activity of archive use. Mining these data makes it possible to analyze and predict future trends in user usage and to propose management decisions on that basis, providing a foundation for improving the service level of digital archives.

Summary of the invention

The object of the present invention is to provide a knowledge management based data mining method for digital archives.

The object of the present invention can be achieved through the following technical solution:

A knowledge management based data mining method for digital archives, comprising the following steps:

Step 1, determine the theme: determine the data target to be mined;

Step 2, requirement definition: according to the theme determined in step 1, define the theme and make clear the requirements and purpose of the data mining;

Step 3, data collection: while the theme is being defined, collect and extract the explicit knowledge and tacit knowledge in the archive database, and apply concept description to it to summarize the relevant characteristics of the demand;

Step 4, analyze and form results: through cluster analysis, form different demand classification models according to similarity and difference, and place the data into the different classes; by combining the demand classification models with user usage information, carry out difference analysis and deviation detection, exclude a large amount of irrelevant data, and form the mining result;

Step 5, evaluate the mining result: the mining result formed may contain irrelevant data or may fail to satisfy the demand; if it does not meet the mining requirements and purpose, go to step 3 and repeat the mining process; otherwise, go to step 6;

Step 6: once evaluation shows that the mining result meets the data mining requirements and can be used for knowledge management of the digital archives, add it to the original database, achieving knowledge innovation in the archives.
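The control flow of the six steps, including the return from step 5 to step 3, can be sketched as a loop. Everything named here (`mine`, the callables passed to it, `knowledge_base`, and the `max_rounds` cap) is hypothetical; the source describes the steps but not an implementation, and it does not limit the number of repetitions.

```python
def mine(theme, define, collect, analyse, evaluate, knowledge_base, max_rounds=5):
    """Run steps 2 to 6 for a theme chosen in step 1; repeat steps 3 to 5 until evaluation passes."""
    requirements = define(theme)                # step 2: requirement definition
    for round_no in range(1, max_rounds + 1):
        data = collect(requirements, round_no)  # step 3: data collection
        result = analyse(data, requirements)    # step 4: analyze and form the result
        if evaluate(result, requirements):      # step 5: evaluate the mining result
            knowledge_base.append(result)       # step 6: add the result to the database
            return result, round_no
    return None, max_rounds                     # assumed safeguard, not part of the source

# Toy stand-ins so the sketch runs; none of these come from the source.
def define(theme):
    return {"theme": theme, "min_items": 3}

def collect(requirements, round_no):
    return list(range(round_no * 2))            # each repeated round collects a larger sample

def analyse(data, requirements):
    return [x for x in data if x % 2 == 0]      # stand-in for excluding irrelevant data

def evaluate(result, requirements):
    return len(result) >= requirements["min_items"]

knowledge_base = []
result, rounds = mine("user demand", define, collect, analyse, evaluate, knowledge_base)
```

In this toy run the first two rounds fail the evaluation in step 5, so the loop returns to step 3 twice before the third result is stored.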

Beneficial effects of the present invention：

The knowledge management based data mining method for digital archives provided by the present invention creates the conditions for digital archives to implement knowledge management. The present invention is a method and strategy for coordinating and managing a cluster of information processing technologies. The data mining of the present invention is based on networks and digital resources, relies on the coordination and cooperation of multiple information technologies, uses mining algorithms and mining models as its means, and organizes and discovers the knowledge resources that already exist in digital archives, with the aim of providing management objects for knowledge management, so that digital archives use knowledge effectively and achieve knowledge innovation.

Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.

Fig. 1 is a schematic diagram of the present invention.

Detailed description of the embodiments

The core of the present invention is to provide a knowledge management based data mining method for digital archives.

In order that those skilled in the art may better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The embodiments described are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, the present invention provides a knowledge management based data mining method for digital archives, the method comprising the following steps:

Step 1, determine the theme: determine the data target to be mined.

Step 2, requirement definition: according to the theme determined in step 1, define the theme and make clear the requirements and purpose of the data mining.

Step 3, data collection: while the theme is being defined, collect and extract the explicit knowledge and tacit knowledge in the archive database, and apply concept description to it to summarize the relevant characteristics of the demand.

Step 4, analyze and form results: through cluster analysis, form different demand classification models according to similarity and difference, and place the data into the different classes; by combining the demand classification models with user usage information, carry out difference analysis and deviation detection, exclude a large amount of irrelevant data, and form the mining result.

Step 5, evaluate the mining result: the mining result formed may contain irrelevant data or may fail to satisfy the demand; if it does not meet the mining requirements and purpose, go to step 3 and repeat the mining process; otherwise, go to step 6.

The above is merely an example and illustration of the structure of the present invention. Those skilled in the art may make various modifications or additions to the specific embodiments described, or substitute them in a similar manner; provided such changes do not depart from the structure of the invention or go beyond the scope defined by the claims, they all fall within the protection scope of the present invention.

Claims

1. A knowledge management based data mining method for digital archives, characterized in that it comprises the following steps:

Step 1, determine the theme: determine the data target to be mined;

Step 2, requirement definition: according to the theme determined in step 1, define the theme and make clear the requirements and purpose of the data mining;

Step 3, data collection: while the theme is being defined, collect and extract the explicit knowledge and tacit knowledge in the archive database, and apply concept description to it to summarize the relevant characteristics of the demand;

Step 4, analyze and form results: through cluster analysis, form different demand classification models according to similarity and difference, and place the data into the different classes; by combining the demand classification models with user usage information, carry out difference analysis and deviation detection, exclude a large amount of irrelevant data, and form the mining result;

Step 5, evaluate the mining result: the mining result formed may contain irrelevant data or may fail to satisfy the demand; if it does not meet the mining requirements and purpose, go to step 3 and repeat the mining process; otherwise, go to step 6;

Step 6: once evaluation shows that the mining result meets the data mining requirements and can be used for knowledge management of the digital archives, add it to the original database, achieving knowledge innovation in the archives.