CN104361111B - Method for automatic archive compilation and research - Google Patents

Info

Publication number
CN104361111B
Authority
CN
China
Prior art keywords
volume
archives
expert
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410714594.2A
Other languages
Chinese (zh)
Other versions
CN104361111A (en)
Inventor
蒋静
王卓平
门霞
赵毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University
Original Assignee
Qingdao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University
Priority to CN201410714594.2A
Publication of CN104361111A
Application granted
Publication of CN104361111B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G06F16/3323 Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of document classification and retrieval and relates to a method for automatic archive compilation and research based on an archive management information system with a browser/server (B/S) architecture. Archive information is first entered through the unified forms provided by the archive catalog, in-volume catalog, and expert registration card interfaces. The archive entry and management module then classifies and aggregates the archive information automatically using an automatic hierarchical classification algorithm and stores it in the corresponding databases. Next, the archive compilation module retrieves, queries, and aggregates records from the corresponding databases according to the compilation conditions entered by the user and the stored information, and generates the compilation result. Finally, the compilation result is displayed on screen, or exported as a Word document or Excel spreadsheet and printed as a paper document for preservation, which completes the automatic compilation of the archives. The design principle is scientifically sound and reliable; compilation is efficient, requires little labor, omits little information, safeguards the quality and value of the compiled results, and is environmentally friendly.

Description

Method for automatic archive compilation and research
Technical field:
The invention belongs to the technical field of document classification and retrieval and relates to a method for automatic archive compilation and research based on an archive management information system with a B/S architecture. For archive compilations and archive abstract compilations, it provides a technique in which the compilation is carried out automatically by computer software.
Background:
An archive management information system based on the B/S architecture extends the functions of traditional archives into the information society. It retains the basic attributes and functions of traditional archives while meeting the needs of the information age: alongside digitized management of conventional archives, it uses the internet and a digital archive data repository to collect, store, manage, and use the archive information of all departments and of all kinds, and provides information services for the use of archive resources. Archive compilation and research, oriented toward archive use, is specialized work in which an archive or records office, responding to actual demand and drawing on its holdings, produces compiled archive reference materials. In essence, this work studies, processes, and organizes the content of archive files and compiles the results into volumes that can be taken in at a glance, so as to improve the overall management level and working efficiency of functional departments and organizations and to increase the service value of archive resources to society. At present, archive compilation is done mainly by hand, which is slow, inefficient, and yields poor-quality results.
Traditional manual compilation methods fall into two kinds according to the level of processing applied to the archives. The first excerpts, condenses, and edits the original files to form summary material. Its products include collections of issued documents, thematic compilations, abstract compilations of archive series, abstract compilations of the experts, scholars, and academic views of a given field, and abstract compilations of scientific and technological achievements. The second analyzes, studies, and summarizes the relevant content of the original files and writes new material on that basis. Its products include yearbooks, organizational histories, chronicles, and comprehensive technical and economic research reports. Because products of the second kind contain new understanding, new viewpoints, new conclusions, and new suggestions, all of which are newly added information, this kind of compilation is usually done by experts or scholars in the relevant field. Products of the first kind contain only information that already exists in the archives: they add no information and produce no new content, but they must be complete, concise, and accurate, with no omissions. The compiled content has to be comprehensive and careful, erring on the side of including too much rather than leaving anything out. As the original files to be compiled accumulate over time, they reach massive data volumes, and when such volumes are compiled by hand the slightest carelessness causes information to be omitted or recorded wrongly. The quality and value of the compilation products cannot be guaranteed; the work is labor-intensive, inefficient, and inaccurate, its labor cost is very high, and it restricts the full development and use of archive resources at massive data scale and at a higher technical level.
Summary of the invention:
The object of the invention is to overcome the shortcomings of the prior art by designing and providing a method for automatic archive compilation and research based on an archive management information system. The method uses computer-based automatic classification and retrieval to generate the compilation products automatically, which improves the efficiency and accuracy of archive compilation and reduces the information omitted in manual compilation.
To achieve this object, the invention combines an archive entry and management module with an archive compilation module in a B/S-architecture archive management information system to compile archives automatically. The specific steps are:
(1) Enter the archive information: using the unified forms provided by the archive catalog, in-volume catalog, and expert registration card interfaces displayed by the system, enter the archive title, the classification the archive belongs to, the archive number, the year, and the various kinds of basic expert information;
(2) The archive entry and management module automatically classifies and aggregates the archive information entered in step (1) using the automatic hierarchical classification algorithm proposed by the invention, and stores it in the corresponding archive catalog, in-volume catalog, and expert basic information registration catalog databases and in the expert database;
(3) The archive compilation module retrieves, queries, and aggregates records from the corresponding archive catalog, in-volume catalog, and expert basic information registration catalog databases and the expert database, according to the compilation conditions entered by the user and the stored information, and generates the compilation result;
(4) The compilation result is displayed on screen, or exported as a Word document or Excel spreadsheet and printed as a paper document for preservation, which completes the automatic compilation of the archives.
The automatic hierarchical classification algorithm proposed by the invention improves on the conventional naive Bayes algorithm. The naive Bayes algorithm considers all features of a text when classifying it and assigns the sample to the document class with the highest predicted probability.
The naive Bayes classification model used by the invention is as follows. Given an archive text X of unknown class and m classes C1, C2, ..., Cm, the naive Bayes classification rule assigns X to the class with the highest posterior probability P(Ci|X) given X, which by Bayes' theorem is:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
In the formula for P(Ci|X), P(X) is constant, so it suffices to maximize the numerator P(X|Ci)P(Ci). P(Ci) is the class distribution probability in the training set:
$$P(C_i) = \frac{1 + |C_i|}{m + |D|}$$
where the numerator is the number of texts |Ci| in class Ci plus 1, and the denominator is the number of classes m plus the total number of texts |D| in the training set. To simplify the calculation of P(X|Ci), the attributes of a text are assumed to be mutually independent, so calculating P(X|Ci) amounts to calculating the probability that each feature attribute occurs in class Ci. P(X|Ci) is computed with Laplace estimation under one of two models:
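(1) Multivariate model: record whether each feature attribute occurs in the text, 1 if it does and 0 otherwise. In the standard form consistent with the definitions below, the formula is:
$$P(X \mid C_i) = \prod_{t=1}^{|V|} \bigl[ B_{xt}\,P(w_t \mid C_i) + (1 - B_{xt})\bigl(1 - P(w_t \mid C_i)\bigr) \bigr]$$
where |V| is the total number of features, B_{xt} marks whether w_t occurs in text X (1 if it occurs, 0 otherwise), and w_t is the t-th feature, that is, the t-th component of the vector. The standard Laplace estimate of P(w_t|Ci) for this model is:
$$P(w_t \mid C_i) = \frac{1 + \sum_{j=1}^{|D|} B_{jt}\,P(C_i \mid d_j)}{2 + \sum_{j=1}^{|D|} P(C_i \mid d_j)}$$
where B_{jt} marks whether w_t occurs in training text d_j, and P(C_i|d_j) is 1 when d_j belongs to class C_i and 0 otherwise.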
(2) Multinomial model: count the number of times each feature attribute occurs in the text. Up to a factor that does not depend on the class, the formula is:
$$P(X \mid C_i) \propto \prod_{t=1}^{|V|} P(w_t \mid C_i)^{N_{xt}}$$
where N_{xt} is the number of times feature t occurs in text X. The standard Laplace estimate of P(w_t|Ci) for this model is:
$$P(w_t \mid C_i) = \frac{1 + \sum_{j=1}^{|D|} N_{jt}\,P(C_i \mid d_j)}{|V| + \sum_{s=1}^{|V|} \sum_{j=1}^{|D|} N_{js}\,P(C_i \mid d_j)}$$
In this formula, N_{jt} is the number of times feature t occurs in text d_j, |D| is the total number of training texts, |V| is the total number of features, N_{js} is the number of times feature s occurs in text d_j, and P(C_i|d_j) is 1 when d_j belongs to class C_i and 0 otherwise. In essence, the method counts all feature values of a text object and maps them to probabilities over the existing classes.
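The multinomial model can be illustrated with a short Python sketch. It is not the patent's implementation: the function names, the token-list input format, and the sample data in the usage note are assumptions made for illustration. It uses the prior (1 + |Ci|) / (m + |D|) described in the text and a Laplace-smoothed estimate of P(w_t|Ci), and works in log space to avoid underflow.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels):
    """Estimate log P(Ci) and log P(w_t|Ci) from tokenized training texts."""
    classes = sorted(set(labels))
    vocab = sorted({token for doc in docs for token in doc})
    class_sizes = Counter(labels)
    m, total_docs = len(classes), len(docs)

    # Prior: (1 + number of texts in Ci) / (number of classes + number of texts)
    log_prior = {c: math.log((1 + class_sizes[c]) / (m + total_docs)) for c in classes}

    # Per-class occurrence counts of every feature
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)

    # Laplace-smoothed conditional: (1 + N_t) / (|V| + sum over s of N_s)
    log_cond = {}
    for c in classes:
        denom = len(vocab) + sum(counts[c].values())
        log_cond[c] = {w: math.log((1 + counts[c][w]) / denom) for w in vocab}
    return log_prior, log_cond


def classify(tokens, log_prior, log_cond):
    """Return the class that maximizes log P(Ci) + sum of log P(w_t|Ci)."""
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        for token in tokens:
            if token in log_cond[c]:  # tokens outside the vocabulary are ignored
                score += log_cond[c][token]
        scores[c] = score
    return max(scores, key=scores.get)
```

With two made-up accounting titles and two made-up science titles as training data, `classify(["invoice", "budget"], ...)` selects the accounting class, because both tokens have higher smoothed probabilities there.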
The invention improves the naive Bayes algorithm in the archive entry and management module, implementing an automatic hierarchical classification algorithm based on coarse classification by archive catalog title and keywords. Keyword sets are extracted directly from the titles of the archive catalog and the in-volume catalog, and a hierarchical classification model is built. After suitable dimension reduction the model classifies effectively with a low feature dimension, which replaces the Chinese word segmentation step of traditional text classification algorithms and improves both the classification accuracy and the running efficiency for archive documents. The automatic hierarchical classification algorithm based on coarse classification by archive catalog title and keywords proceeds as follows:
(1) Enter the archive information into the system locally or online: using the unified forms provided by the archive catalog, in-volume catalog, and expert registration card interfaces displayed by the system, enter the archive title, classification, archive number, year, and the various kinds of basic expert information;
(2) The system automatically extracts the set of text feature parameters from the archive title and the keywords in the archive text, and saves them in the corresponding database;
(3) If the extracted set of text feature parameters exceeds a threshold, reduce its dimension, because too many features often cause the curse of dimensionality and lower the efficiency of classification;
(4) Perform coarse classification with the naive Bayes classification algorithm on the extracted text feature parameters or keywords;
(5) Within the coarse classification result of step (4), extract features again for each subclass separately;
(6) Apply the naive Bayes classification algorithm again to the text feature parameters of each subclass to complete the fine classification automatically;
(7) Output the classification result and save it in the corresponding database.
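Steps (2) to (6) above can be sketched in Python as a coarse classifier followed by one fine classifier per coarse class. This is a minimal sketch under stated assumptions, not the patented algorithm: whitespace splitting stands in for the title keyword extraction, keeping the most frequent tokens stands in for the unspecified dimension reduction, and the class names, the `MAX_FEATURES` value, and the sample titles in the usage note are hypothetical.

```python
import math
from collections import Counter

MAX_FEATURES = 50  # assumed threshold for dimension reduction


def select_features(token_lists, limit=MAX_FEATURES):
    """Keep at most `limit` of the most frequent tokens (an assumed reduction rule)."""
    freq = Counter(token for tokens in token_lists for token in tokens)
    return {token for token, _ in freq.most_common(limit)}


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over a reduced vocabulary."""

    def fit(self, token_lists, labels):
        self.vocab = select_features(token_lists)
        self.classes = sorted(set(labels))
        sizes = Counter(labels)
        n, m = len(labels), len(self.classes)
        self.log_prior = {c: math.log((1 + sizes[c]) / (m + n)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(token_lists, labels):
            counts[label].update(t for t in tokens if t in self.vocab)
        self.log_cond = {}
        for c in self.classes:
            denom = len(self.vocab) + sum(counts[c].values())
            self.log_cond[c] = {t: math.log((1 + counts[c][t]) / denom) for t in self.vocab}
        return self

    def predict(self, tokens):
        def score(c):
            return self.log_prior[c] + sum(
                self.log_cond[c][t] for t in tokens if t in self.vocab)
        return max(self.classes, key=score)


class HierarchicalClassifier:
    """Coarse classification first, then a separate classifier inside each coarse class."""

    def fit(self, titles, coarse_labels, fine_labels):
        token_lists = [title.lower().split() for title in titles]
        self.coarse = NaiveBayes().fit(token_lists, coarse_labels)
        self.fine = {}
        for c in set(coarse_labels):
            idx = [i for i, label in enumerate(coarse_labels) if label == c]
            # Features are re-extracted from the texts of this subclass only
            self.fine[c] = NaiveBayes().fit(
                [token_lists[i] for i in idx], [fine_labels[i] for i in idx])
        return self

    def predict(self, title):
        tokens = title.lower().split()
        coarse = self.coarse.predict(tokens)
        return coarse, self.fine[coarse].predict(tokens)
```

Refitting the vocabulary inside each coarse class is the design point: the fine classifiers see only features that discriminate within their own subclass, which keeps each model's dimension low.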
The archive compilation module performs basic compilation on the archive catalog, in-volume catalog, and expert registration card catalog established by the archive information entry and management module. Six databases are created in the archive information entry and management module: the archive catalog database, the in-volume catalog database, the archive classification database, the expert basic information registration database, the expert paper details database, and the expert project details database. The archive compilation module consists of three submodules (the classification compilation submodule, the document number index compilation submodule, and the expert information compilation submodule), and an archive compilation base database associated with the six databases above is created in it. The classification compilation submodule, according to the compilation conditions entered by the user, automatically compiles by archive classification, archive title, and filing date, and displays the result as a list. The document number index compilation submodule accepts, according to the user's needs, combined compilation conditions comprising official document number, year, archive number, and retention period; after the user clicks Query it filters the data by these conditions and automatically generates and displays the document number index list. The expert information compilation submodule performs compilation statistics on the expert registration card information in archive management: it runs fuzzy queries on the entered expert name, research direction, and achievements to compile by expert category, expert research direction, expert paper information, and expert project information, and displays the aggregated result on screen. Compilation by expert category automatically generates a result list of all experts in a given research field; the result can be exported to an Excel or Word document for saving or printing.
Compilation in the classification compilation submodule comprises the following steps:
(1) Create the classification compilation view. The view takes the in-volume catalog or the archive catalog as its main table and joins the classification information table to obtain the classification name. Because in-volume catalog information and archive catalog information are stored in different data tables, classification compilation must combine the two sets of information for unified querying and retrieval;
(2) Data access layer code. The data access layer retrieves the archive information to be compiled from the view provided in step (1). Its function takes the query conditions as parameters and retrieves the matching archive information. Classification compilation extracts the archive classification, archive title, filing date, item number, page count, and archive number, and sorts the result by classification name, catalog title, and filing date;
(3) Application layer implementation of classification compilation. The classification compilation conditions are set first, and the search then runs by fuzzy match on the archive classification name, fuzzy match on the archive title, and filing date;
(4) Export the compilation result to an Excel or Word document so that the user can save, view, print, and bind it.
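The view and data access steps above can be sketched with Python's built-in `sqlite3` module. The patent's embodiment uses SQL Server and ASP.NET; SQLite is swapped in here only so the example is self-contained. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and the compilation view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE classification (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE archive_catalog (class_id INTEGER, title TEXT, filing_date TEXT,
                                      archive_no TEXT, pages INTEGER);
        CREATE TABLE in_volume_catalog (class_id INTEGER, title TEXT, filing_date TEXT,
                                        archive_no TEXT, pages INTEGER);
        -- One view gathers both catalogs and attaches the classification name.
        CREATE VIEW classification_compilation AS
            SELECT c.name AS class_name, a.title, a.filing_date, a.archive_no, a.pages
            FROM archive_catalog a JOIN classification c ON c.id = a.class_id
            UNION ALL
            SELECT c.name, v.title, v.filing_date, v.archive_no, v.pages
            FROM in_volume_catalog v JOIN classification c ON c.id = v.class_id;
    """)
    conn.executemany("INSERT INTO classification VALUES (?, ?)",
                     [(1, "Administrative approval"), (2, "Accounting")])
    conn.executemany("INSERT INTO archive_catalog VALUES (?, ?, ?, ?, ?)", [
        (1, "Road works approval", "2013-07-15", "A-002", 8),
        (2, "Annual ledger", "2014-01-10", "B-001", 40),
    ])
    conn.execute("INSERT INTO in_volume_catalog VALUES (?, ?, ?, ?, ?)",
                 (1, "Building permit approval", "2014-03-01", "A-001-03", 12))
    return conn


def compile_by_classification(conn, class_kw="", title_kw="", filed_from=""):
    """Fuzzy-match classification name and title, filter by filing date, then sort."""
    sql = """SELECT class_name, title, filing_date, archive_no, pages
             FROM classification_compilation
             WHERE class_name LIKE ? AND title LIKE ? AND filing_date >= ?
             ORDER BY class_name, title, filing_date"""
    params = (f"%{class_kw}%", f"%{title_kw}%", filed_from)
    return conn.execute(sql, params).fetchall()
```

Passing the conditions as bound parameters rather than concatenating them into the SQL string keeps the fuzzy search safe against injection, which matters for a system exposed through a browser.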
Compilation in the document number index compilation submodule comprises the following steps:
(1) Create the document number index view. The view takes the in-volume catalog as its main table, joins the classification information table, and extracts the official document number, year, document sequence number, archive number, page number, page count, and retention period;
(2) Data access layer code. The data access layer extracts information from the document number index view and sorts it by official document number, document sequence number, and year. Its function takes the query conditions, which are built dynamically by the application layer, as parameters;
(3) Application layer main code. The preset compilation conditions (official document number, year, archive number, and retention period) are combined as needed for input, and the data are filtered by these conditions after the user clicks Query;
(4) Export the compilation result to an Excel or Word document so that the user can save, view, print, and bind it.
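The dynamic construction of query conditions in steps (2) and (3) might look like the following Python sketch. The view name `document_number_index` and its column names are hypothetical; only the four condition fields and the sort order come from the text.

```python
def build_index_query(doc_number=None, year=None, archive_no=None, retention_period=None):
    """Build a parameterized query from whichever conditions the user filled in."""
    conditions = {
        "doc_number": doc_number,
        "year": year,
        "archive_no": archive_no,
        "retention_period": retention_period,
    }
    clauses, params = [], []
    for column, value in conditions.items():
        if value is not None:  # conditions left blank are skipped
            clauses.append(f"{column} = ?")
            params.append(value)

    sql = ("SELECT doc_number, year, file_seq, archive_no, page_no, pages, retention_period "
           "FROM document_number_index")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Sort order stated in the text: document number, sequence number, year
    sql += " ORDER BY doc_number, file_seq, year"
    return sql, params
```

For example, `build_index_query(year=2014, retention_period="permanent")` produces a `WHERE year = ? AND retention_period = ?` clause with the two values as bound parameters, while calling it with no arguments returns the unfiltered, sorted index.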
Compilation in the expert information compilation submodule comprises the following steps:
(1) Create the expert information compilation view. The view takes the expert basic information registration table as its main table, joins the expert paper details table to extract paper information, and joins the expert project details table to extract project and award information. Because expert basic information and expert achievement information are stored in different data tables, information compilation must combine the parts for unified querying;
(2) Data access layer code. The data access layer obtains the expert information to be compiled from the view in step (1). Its function takes the query conditions as parameters and queries the matching expert archive information; compilation extracts the expert name, expert category, research direction, paper information, and project information;
(3) Application layer implementation of expert information compilation. The compilation conditions are set, and the search runs by fuzzy match on expert name, fuzzy match on research direction, fuzzy match on paper title and paper summary, paper publication date, fuzzy match on project name, fuzzy match on project summary, project start and end dates, and project awards. According to the conditions entered, expert information compilation produces compilations by expert category, expert research direction, expert paper information, and expert project information;
(4) Export the compilation result to an Excel or Word document so that the user can save, view, print, and bind it.
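The expert view of step (1) and a fuzzy query over it can be sketched as follows, again with SQLite standing in for the SQL Server database of the embodiment. The schema, the function names, and the two sample experts are made up for illustration. A `LEFT JOIN` is assumed so that experts without papers or projects still appear in the view.

```python
import sqlite3

SCHEMA = """
CREATE TABLE expert (id INTEGER PRIMARY KEY, name TEXT, category TEXT, direction TEXT);
CREATE TABLE expert_paper (expert_id INTEGER, title TEXT, published TEXT);
CREATE TABLE expert_project (expert_id INTEGER, name TEXT, award TEXT);
CREATE VIEW expert_compilation AS
    SELECT e.name AS expert_name, e.category, e.direction,
           p.title AS paper_title, p.published,
           j.name AS project_name, j.award
    FROM expert e
    LEFT JOIN expert_paper p ON p.expert_id = e.id
    LEFT JOIN expert_project j ON j.expert_id = e.id;
"""


def demo_connection():
    """Return an in-memory database loaded with made-up sample experts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO expert VALUES (?, ?, ?, ?)", [
        (1, "Li Wei", "Professor", "text classification"),
        (2, "Zhang Min", "Engineer", "database systems"),
    ])
    conn.execute("INSERT INTO expert_paper VALUES (1, 'Naive Bayes for archive titles', '2013')")
    conn.execute("INSERT INTO expert_project VALUES (2, 'Archive MIS', 'Provincial prize')")
    return conn


def find_experts(conn, name_kw="", direction_kw="", paper_kw=""):
    """Fuzzy-match name, research direction, and paper title; one row per expert."""
    sql = """SELECT DISTINCT expert_name, direction
             FROM expert_compilation
             WHERE expert_name LIKE ? AND direction LIKE ?
               AND COALESCE(paper_title, '') LIKE ?
             ORDER BY expert_name"""
    params = (f"%{name_kw}%", f"%{direction_kw}%", f"%{paper_kw}%")
    return conn.execute(sql, params).fetchall()
```

`COALESCE` turns a missing paper title into an empty string, so an expert without papers still matches when no paper keyword is given but is excluded once a paper keyword is supplied.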
The execution flow of the archive management information system is:
(1) Open a browser on the client, enter the system's website address in the address bar, and send a service request to the web server. When the login page is displayed, fill in the user name, password, and verification code and submit them to the web server. After verifying the user's identity, the web server sends the home page of the archive management information system to the client over HTTP; the client browser receives the home page file and displays it on screen;
(2) Enter the basic archive information. Using the unified forms provided by the archive catalog, in-volume catalog, and expert registration card interfaces displayed on the home page, enter or add the classification the archive belongs to, the archive number, the archive title, the year, and the various kinds of basic expert information. The system runs the corresponding extension application in the business logic layer of the web server and connects to the database server. Before the basic information entered or added by the user is stored via SQL in the corresponding database connected to the web server, the system automatically classifies and files the catalog and archive title and then attaches the original document, which may be an electronic scan or an electronic original;
(3) When a category of archive information needs to be compiled, the system opens the compilation interface for the compilation entry selected by the user, where the compilation conditions are entered. For example, to compile the expert information of a given field, enter the name of the research field or the research direction, together with other compilation parameters such as papers and projects, and click the Query button to retrieve and query the corresponding databases;
(4) Based on the compilation parameters entered in step (3), and once the link with the web server is established, a data processing request is sent to the corresponding database server through SQL statements: the archive compilation base database and the other associated databases are searched and queried, and the data items that satisfy the compilation conditions are counted, analyzed, and aggregated to generate the compilation product;
(5) The database server returns the generated compilation result to the web server, which sends it to the client for display on screen;
(6) Export the compilation product to a Word document or Excel spreadsheet for saving or printing.
Compared with the prior art, the invention rests on a scientifically sound and reliable design principle: compilation is efficient, requires little labor, omits little information, safeguards the quality and value of the compiled results, and is environmentally friendly.
Brief description of the drawings:
Fig. 1 is a schematic block diagram of the hardware structure of the device of the invention.
Fig. 2 is a schematic block diagram of the logical functional structure of the archive compilation module and the archive management module of the invention.
Fig. 3 is the execution flow chart of automatic compilation in the archive management information system of the invention.
Fig. 4 is the execution flow chart of the hierarchical classification algorithm with coarse classification of the invention.
Detailed description of the embodiments:
The invention is further described below through embodiments and with reference to the drawings.
Embodiment 1:
This embodiment tests and evaluates the classification algorithm proposed by the invention. Of the 1000 archive texts collected, 40 texts selected at random from the classes were used to train the method, and the remaining 960 archive texts served as the text set to be classified for evaluating the classification results. Of these, 222 were secretarial archives, 216 scientific and technical archives, 162 accounting archives, 95 personnel archives, 43 audio and video archives, 86 comprehensive photo archives, 35 physical-object archives, 40 archived files, and 61 periodical archives. The classification results were evaluated with three indexes: precision, recall, and the F1 value (the harmonic mean of recall and precision). The results are shown in Table 1.
Table 1. Test evaluation of the classification results
Class                                 Recall    Precision   F1 value
Secretarial archives                  95.08%    91.80%      93.44%
Scientific and technical archives     85.34%    87.93%      86.64%
Accounting archives                   90.32%    93.55%      91.94%
Personnel archives                    92.00%    94.67%      93.33%
Audio and video archives              93.02%    95.35%      94.19%
Photo archives                        97.67%    94.19%      95.93%
Physical-object archives              91.43%    94.29%      92.86%
Archived files                        87.50%    85.00%      86.25%
Periodical archives                   90.16%    83.61%      86.89%
The table shows that the recall, precision, and F1 values of the classification results in this embodiment are good. The feature dimension produced during coarse classification by document title and keywords stays below 50, which improves the running efficiency of the system.
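The three evaluation indexes can be computed as in the short Python sketch below. The function names and the count-based inputs are assumptions for illustration; the per-class counts behind Table 1 are not given in the text.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For a recall of 95.08% and a precision of 91.80%, the harmonic mean evaluates to about 93.41%, marginally below the arithmetic mean of the two values (93.44%).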
Runtime requirements of this embodiment: a networked PC or compatible machine with a dual-core processor or better and at least 2 GB of memory; a server operating system of Windows XP or later; the prerequisite software .NET Framework 3.5 and SQL Server 2005; and the development software Microsoft Visual Studio 2008. The system uses a B/S three-tier architecture, with the presentation layer, business logic layer, and data layer each implemented in ASP.NET.
This embodiment requires a Microsoft SQL Server database server to be installed and configured: add a user name and password for the server, then import the system database. The website (the B/S-architecture archive management information system) is then published. Once it is published, open any browser on a networked PC and enter the website address in the address bar to reach the login page; enter the account, password, and verification code and click Log in to enter the main system administration interface. Clicking 【Archive compilation】 in the tree menu bar on the left of the system administration interface displays the available compilation entries: classification compilation, document number index compilation, and expert information compilation.
Select and click 【Classification compilation】 to open the classification compilation interface and enter the compilation conditions, including archive classification name, archive title, and filing date (later than a given time, earlier than a given time, or within a given period). The system automatically generates the classification compilation result from these conditions, displays it on screen as a list, and can export it to a Word document or Excel for saving or printing.
Select and click 【Document number index compilation】 to open the document number index compilation interface and enter, in whatever combination is needed, conditions such as official document number, year, archive number, and retention period. After the user clicks Query, the system retrieves and filters the data by these conditions and automatically generates and displays the compilation result, which can also be exported to a Word document or Excel for saving or printing.
Select and click 【Expert information compilation】 to open the expert information compilation interface and enter the compilation conditions. The system searches by fuzzy match on expert name, fuzzy match on research direction, fuzzy match on paper title and paper summary, paper publication date, fuzzy match on project name, fuzzy match on project summary, project start and end dates, and project awards (whether an award was won, and its name). Using fuzzy queries on the entered expert name, research direction, and achievements, the system compiles by expert category, expert research direction, expert paper information, and expert project information. The aggregated result is displayed on screen or exported to an Excel or Word document for saving or printing.
Archive classification management in this embodiment configures classification compilation according to actual compilation needs. If, in practice, the archive classification "administrative approval" needs to be compiled, the system administrator only has to add an "administrative approval" record to the archive classification table. Archive information for that classification can then be maintained in the in-volume catalog and archive catalog, and compilation can be run on the classification. Through the entry of all kinds of archive information, the system aggregates that information so that an archive can be retrieved quickly by classification, archive title, or other related information. Checking its storage location lets staff fetch the archive quickly from the corresponding physical location, which shortens search time and improves working efficiency.
Embodiment 2:
The device implementing the automatic archive compilation and research method of this embodiment consists mainly of a client browser 1, an archive information entry and management module 2, an archive compilation and research module 3, and the 7 databases linked to them, connected by electrical signals. The archive information entry and management module 2 comprises three functional submodules: a volume catalog management submodule 4, an archive catalog management submodule 5, and an expert registration card information submodule 6. The archive compilation and research module 3 comprises three functional submodules connected by electrical signals: an archive classification compilation and research submodule 7, a document number index compilation and research submodule 8, and an expert information compilation and research submodule 9. The archive information entry and management module 2 enters and maintains archive information, and classifies and aggregates the archive information of the volume catalog, the archive catalog, and the expert registration cards. The archive compilation and research module 3, according to the conditions entered by the user, automatically carries out compilation and research by archive classification, archive title, and filing time, and displays the results as a list. The document number index compilation and research submodule 8 accepts, according to the user's needs, conditions combining official document number, year, archive number, and retention period; after the user clicks Query, it filters the data by those conditions and automatically generates and displays the document number index list. The expert information compilation and research submodule 9 compiles statistics on the expert registration card information in archive management: it performs a fuzzy query according to the entered expert name, research direction, and achievement conditions, carries out compilation and research by expert classification, expert research direction, expert paper information, and expert project information, and summarizes the results and displays them on screen. Expert classification compilation and research automatically generates a result list of all expert information for a given research field; the results can be exported to an Excel or Word document to be saved or printed. The 7 databases linked to the archive information entry and management module 2 and the archive compilation and research module 3 are the archive catalog database, the volume catalog database, the archive classification database, the expert basic information registration database, the expert paper detail database, the expert project detail database, and the archive compilation and research base database. The client browser 1 is any browser software running on any networked computer or terminal device.

Claims (6)

1. An automatic archive compilation and research method, characterized in that, in an archive management information system based on a B/S (browser/server) architecture, an archive entry and management module and an archive compilation and research module are combined to carry out automatic archive compilation and research, with the following specific steps:
(1) first, archive information is entered: using the unified forms provided by the archive catalog, volume catalog, and expert registration card interfaces displayed by the system, the archive title, classification, archive number, year, and the various basic information about experts are entered respectively;
(2) the archive entry and management module then uses the automatic hierarchical classification algorithm proposed by the present invention to automatically classify and aggregate the archive information entered in step (1), and stores it in the corresponding archive catalog, volume catalog, and expert basic information registration catalog databases and the expert database;
(3) the archive compilation and research module then retrieves, queries, and aggregates the stored information in the corresponding archive catalog, volume catalog, and expert basic information registration catalog databases and the expert database according to the conditions entered by the user, and generates the archive compilation and research result;
(4) the archive compilation and research result is displayed on screen, or exported as a Word document or Excel spreadsheet and printed to form a paper document for preservation, thereby achieving automatic archive compilation and research;
The automatic hierarchical classification algorithm is implemented as follows:
(1) archive information is first entered into the system locally or online: using the unified forms provided by the archive catalog, volume catalog, and expert registration card interfaces displayed by the system, the archive title, classification, archive number, year, and the various basic information about experts are entered respectively;
(2) the system automatically extracts the text feature parameter set of the archive title and the keywords in the archive text, and stores them in the corresponding database;
(3) when the extracted text feature parameter set exceeds a threshold, its dimensionality is reduced, because too many features often lead to the curse of dimensionality and lower classification efficiency;
(4) a coarse classification is performed with the naive Bayes classification algorithm according to the extracted text feature parameters or keywords;
(5) features are then extracted separately for each subclass in the coarse classification result of step (4);
(6) the naive Bayes classification algorithm is executed again on the text feature parameters of each subclass to complete the fine classification automatically;
(7) the classification result is output and saved to the corresponding database;
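The seven steps above can be sketched as a two-stage classifier. This is a minimal illustration under stated assumptions, not the patented implementation: the names `NaiveBayes`, `reduce_features`, `train_hierarchy`, and `classify` are hypothetical, the training data is made up, features are assumed to be pre-extracted keyword lists, and dimensionality reduction is approximated by keeping the most frequent features.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Small multinomial naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in d}
        self.class_docs = Counter(labels)          # texts per class
        self.word_counts = defaultdict(Counter)    # feature counts per class
        for d, c in zip(docs, labels):
            self.word_counts[c].update(d)
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        m = len(self.class_docs)
        best, best_score = None, -math.inf
        for c, n_c in self.class_docs.items():
            total = sum(self.word_counts[c].values())
            score = math.log((1 + n_c) / (m + self.n_docs))   # smoothed prior
            for w in doc:
                if w in self.vocab:                            # ignore unseen features
                    score += math.log((1 + self.word_counts[c][w]) / (len(self.vocab) + total))
            if score > best_score:
                best, best_score = c, score
        return best

def reduce_features(docs, max_features):
    """Step (3), simplified: keep only the most frequent features above a threshold."""
    freq = Counter(w for d in docs for w in d)
    if len(freq) <= max_features:
        return docs
    keep = {w for w, _ in freq.most_common(max_features)}
    return [[w for w in d if w in keep] for d in docs]

def train_hierarchy(docs, coarse, fine, max_features=1000):
    """Steps (4)-(6): one coarse classifier, then one fine classifier per coarse class."""
    docs = reduce_features(docs, max_features)
    coarse_clf = NaiveBayes().fit(docs, coarse)
    fine_clfs = {}
    for c in set(coarse):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        fine_clfs[c] = NaiveBayes().fit([docs[i] for i in idx], [fine[i] for i in idx])
    return coarse_clf, fine_clfs

def classify(doc, coarse_clf, fine_clfs):
    """Coarse classification first, then fine classification inside the chosen class."""
    c = coarse_clf.predict(doc)
    return c, fine_clfs[c].predict(doc)

# Made-up keyword lists standing in for extracted archive features.
docs = [["budget", "approval", "finance"], ["budget", "audit", "finance"],
        ["permit", "approval", "license"], ["permit", "license", "inspection"],
        ["paper", "journal", "research"], ["paper", "conference", "research"],
        ["project", "grant", "research"], ["project", "award", "research"]]
coarse = ["administrative"] * 4 + ["academic"] * 4
fine = ["finance", "finance", "licensing", "licensing",
        "papers", "papers", "projects", "projects"]
coarse_clf, fine_clfs = train_hierarchy(docs, coarse, fine)
```

With this toy data, `classify(["permit", "inspection"], coarse_clf, fine_clfs)` selects the administrative class first and then the licensing subclass. Training a separate fine classifier per coarse class keeps each second-stage feature set small, which matches the stated motivation for reducing dimensionality.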
The naive Bayes algorithm takes all features of a text into account when classifying it; during classification, the sample to be predicted is assigned, according to the prediction result, to the class with the highest probability. The specific classification model is as follows: given an archive text $X$ of unknown class and $m$ classes denoted $C_1, C_2, \ldots, C_m$, according to the naive Bayes classification rule, the posterior probability $P(C_i \mid X)$ of class $C_i$ given $X$, whose maximum determines the assigned class, is calculated as:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
In the formula for $P(C_i \mid X)$, $P(X)$ is constant, so only the numerator $P(X \mid C_i)\,P(C_i)$ needs to be maximized. $P(C_i)$ is the class distribution probability in the training set, calculated as:
$$P(C_i) = \frac{1 + |C_i|}{m + |D|}$$
where the numerator is the number of texts $|C_i|$ contained in class $C_i$ plus 1, and the denominator is the sum of the number of classes $m$ and the total number of texts $|D|$ contained in the training set. To simplify the calculation of $P(X \mid C_i)$, the attributes of a text are assumed to be mutually independent; calculating $P(X \mid C_i)$ therefore amounts to calculating the probability that the feature attributes occur in class $C_i$. $P(X \mid C_i)$ is calculated using two models with Laplace estimation:
(1) the multivariate model records whether a feature attribute occurs in the text, 1 if it occurs and 0 otherwise; the calculation formula is:
$$P(X \mid C_i) = \prod_{t=1}^{|V|} \Big( B_{xt}\,P(w_t \mid C_i) + (1 - B_{xt})\big(1 - P(w_t \mid C_i)\big) \Big)$$
where $|V|$ is the total number of features, $B_{xt}$ indicates whether $w_t$ occurs in text $X$ (1 if $w_t$ occurs, 0 otherwise), and $w_t$ is the $t$-th feature, i.e., the $t$-th component of the feature vector; accordingly, $P(w_t \mid C_i)$ in this formula is calculated as follows:
(2) the multinomial model (Multinomial Model) instead counts the number of occurrences of each feature attribute in the text; the calculation formula is:
$$P(X \mid C_i) = \prod_{t=1}^{|V|} \frac{P(w_t \mid C_i)^{N_{xt}}}{N_{xt}!}$$
where $N_{xt}$ is the number of times feature $t$ occurs in text $X$; $P(w_t \mid C_i)$ is calculated as:
$$P(w_t \mid C_i) = \frac{1 + \sum_{j=1}^{|D|} N_{jt}\,P(C_i \mid d_j)}{|V| + \sum_{s=1}^{|V|} \sum_{j=1}^{|D|} N_{js}\,P(C_i \mid d_j)};$$
In the formula for $P(w_t \mid C_i)$, $N_{jt}$ is the number of times feature $t$ occurs in text $d_j$, $|D|$ is the total number of training texts, $|V|$ is the total number of features, and $N_{js}$ is the number of times feature $s$ occurs in text $d_j$. In essence, the classification method counts all feature values in a text object and maps them to probabilities over each existing class.
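The formulas above can be checked numerically with a short sketch. It assumes hard training labels, so that $P(C_i \mid d_j)$ is 1 when $d_j$ belongs to $C_i$ and 0 otherwise, and it works in log space to avoid underflow. The function names and the toy data are hypothetical. The per-feature estimate for the multivariate model is not given in this text, so `bernoulli_word_prob` uses a commonly used Laplace form as an explicit assumption.

```python
import math
from collections import Counter

def prior(labels, c):
    """P(C_i) = (1 + |C_i|) / (m + |D|)."""
    m = len(set(labels))
    return (1 + labels.count(c)) / (m + len(labels))

def multinomial_word_prob(docs, labels, vocab, c, w):
    """P(w_t|C_i) = (1 + count of w in class c) / (|V| + count of all features in class c)."""
    counts = Counter()
    for d, lab in zip(docs, labels):
        if lab == c:            # hard label: P(C_i|d_j) is 1 here, 0 elsewhere
            counts.update(d)
    total = sum(counts[s] for s in vocab)
    return (1 + counts[w]) / (len(vocab) + total)

def bernoulli_word_prob(docs, labels, c, w):
    """ASSUMPTION, not from the text: (1 + texts in C_i containing w) / (2 + |C_i|)."""
    in_class = [d for d, lab in zip(docs, labels) if lab == c]
    return (1 + sum(w in d for d in in_class)) / (2 + len(in_class))

def log_likelihood_multinomial(x, docs, labels, vocab, c):
    """log P(X|C_i) = sum over features of N_xt * log P(w_t|C_i) - log(N_xt!)."""
    n = Counter(w for w in x if w in vocab)   # features with N_xt = 0 contribute 0
    return sum(k * math.log(multinomial_word_prob(docs, labels, vocab, c, w))
               - math.lgamma(k + 1) for w, k in n.items())

def log_likelihood_bernoulli(x, docs, labels, vocab, c):
    """log P(X|C_i) with B_xt = 1 if w_t occurs in X, else 0."""
    present = set(x)
    total = 0.0
    for w in vocab:
        p = bernoulli_word_prob(docs, labels, c, w)
        total += math.log(p if w in present else 1 - p)
    return total

def classify(x, docs, labels, vocab, log_likelihood):
    """Pick the class maximizing log P(C_i) + log P(X|C_i); P(X) is constant."""
    return max(sorted(set(labels)),
               key=lambda c: math.log(prior(labels, c))
               + log_likelihood(x, docs, labels, vocab, c))

# Made-up training texts, already reduced to keyword lists.
docs = [["budget", "approval", "budget"], ["approval", "permit"],
        ["paper", "research"], ["research", "project", "research"]]
labels = ["administrative", "administrative", "academic", "academic"]
vocab = sorted({w for d in docs for w in d})
```

For this data, `prior(labels, "administrative")` is (1 + 2) / (2 + 4) = 0.5 and `multinomial_word_prob(docs, labels, vocab, "administrative", "budget")` is (1 + 2) / (6 + 5) = 3/11. Both models assign the text `["budget", "permit"]` to the administrative class.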
2. The automatic archive compilation and research method according to claim 1, characterized in that the data processed by the archive compilation and research module is the basic compilation and research carried out on the archive catalog, volume catalog, and expert registration card catalog established by the archive information entry and management module; six databases are created in the archive information entry and management module: the archive catalog database, the volume catalog database, the archive classification database, the expert basic information registration database, the expert paper detail database, and the expert project detail database; the archive compilation and research module consists of three submodules, namely the archive classification compilation and research submodule, the document number index compilation and research submodule, and the expert information compilation and research submodule, and an archive compilation and research base database associated with the above six databases is created in the archive compilation and research module; the archive classification compilation and research submodule, according to the conditions entered by the user, automatically carries out compilation and research by archive classification, archive title, and filing time, and displays the results as a list; the document number index compilation and research submodule accepts, according to the user's needs, conditions combining official document number, year, archive number, and retention period, filters the data by those conditions after the user clicks Query, and automatically generates and displays the document number index list; the expert information compilation and research submodule compiles statistics on the expert registration card information in archive management, performs a fuzzy query according to the entered expert name, research direction, and achievement conditions, carries out compilation and research by expert classification, expert research direction, expert paper information, and expert project information, and summarizes the results and displays them on screen; expert classification compilation and research automatically generates a result list of all expert information for a given research field, and the results can be exported to an Excel or Word document to be saved or printed.
3. The automatic archive compilation and research method according to claim 1, characterized in that the compilation and research performed by the archive classification compilation and research submodule comprises the following steps:
(1) create the classification compilation and research view: the view uses the volume catalog or the archive catalog as its main table and joins the classification information table to obtain the classification name; volume catalog information and archive catalog information are stored in different data tables, so both parts must be aggregated and queried in a unified way when classification compilation and research is carried out;
(2) data access layer implementation code: the data access layer retrieves the archive information to be compiled from the view created in step (1); the function takes the query conditions as parameters and retrieves the matching archive information; classification compilation and research extracts the archive classification, archive title, filing time, number, page count, and archive number, and sorts the results by classification name, archive title, and filing time;
(3) classification compilation and research application layer implementation: the classification conditions are set first, and compilation and research is then performed by fuzzy search on the archive classification name, fuzzy search on the archive title, and search on the filing time;
(4) the results are exported to an Excel or Word document so that the user can save, view, print, and bind them.
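The view, data access function, and export described in these steps might look like the following sketch, which uses Python's built-in `sqlite3` and `csv` modules. All table, column, and function names are hypothetical, the rows are made-up sample data, and CSV output stands in for the Excel or Word export named in the claim.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE volume_catalog (title TEXT, category_id INTEGER, filed_on TEXT,
                             archive_no TEXT, pages INTEGER);
CREATE TABLE archive_catalog (title TEXT, category_id INTEGER, filed_on TEXT,
                              archive_no TEXT, pages INTEGER);
-- Step (1): one view that unifies both catalogs and resolves the category name.
CREATE VIEW classification_view AS
    SELECT c.name AS category, v.title, v.filed_on, v.archive_no, v.pages
      FROM volume_catalog v JOIN category c ON c.id = v.category_id
    UNION ALL
    SELECT c.name, a.title, a.filed_on, a.archive_no, a.pages
      FROM archive_catalog a JOIN category c ON c.id = a.category_id;
""")
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "administrative approval"), (2, "personnel")])
conn.execute("INSERT INTO volume_catalog VALUES (?, ?, ?, ?, ?)",
             ("Building permit approvals 2013", 1, "2013-06-01", "A-001", 40))
conn.executemany("INSERT INTO archive_catalog VALUES (?, ?, ?, ?, ?)", [
    ("Permit approval notice", 1, "2014-03-15", "A-001-07", 3),
    ("Staff appointment letter", 2, "2014-01-10", "P-002-01", 2),
])

def search_by_classification(conn, category="", title="",
                             filed_from="0000-00-00", filed_to="9999-12-31"):
    """Steps (2)-(3): fuzzy match on category and title, range match on filing date."""
    sql = """SELECT category, title, filed_on, archive_no, pages
               FROM classification_view
              WHERE category LIKE ? AND title LIKE ? AND filed_on BETWEEN ? AND ?
              ORDER BY category, title, filed_on"""
    return conn.execute(
        sql, (f"%{category}%", f"%{title}%", filed_from, filed_to)).fetchall()

def export_csv(rows):
    """Step (4), simplified: write the result list as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["category", "title", "filed_on", "archive_no", "pages"])
    writer.writerows(rows)
    return buf.getvalue()

rows = search_by_classification(conn, category="approval")
```

Here `rows` contains both the volume-level and the archive-level records of the "administrative approval" category, sorted by title. Passing the user's text as bound parameters rather than concatenating it into the SQL string keeps the fuzzy search safe against malformed input.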
4. The automatic archive compilation and research method according to claim 1, characterized in that the compilation and research performed by the document number index compilation and research submodule comprises the following steps:
(1) create the document number index view: the view uses the volume catalog as its main table, joins the classification information table, and extracts the official document number, year, document sequence number, archive number, page number, page count, and retention period;
(2) data access layer implementation code: the data access layer extracts information from the document number index view and sorts it by official document number, document sequence number, and year; the function takes the query conditions as parameters, which are constructed dynamically by the application layer;
(3) application layer main code: the default compilation and research conditions, including official document number, year, archive number, and retention period, are combined as needed for input; after the user clicks Query, the data is filtered by those conditions;
(4) the results are exported to an Excel or Word document so that the user can save, view, print, and bind them.
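The dynamic construction of query conditions mentioned in step (2) can be illustrated as follows. This is a sketch under assumptions: `build_index_query`, the `doc_no_index` view, and its columns are hypothetical names, the view omits the join with the classification table for brevity, and the document numbers are invented sample values.

```python
import sqlite3

def build_index_query(doc_no=None, year=None, archive_no=None, retention=None):
    """Build a parameterized query from whichever conditions the user filled in."""
    clauses, params = [], []
    if doc_no:
        clauses.append("doc_no LIKE ?")
        params.append(f"%{doc_no}%")
    if year is not None:
        clauses.append("year = ?")
        params.append(year)
    if archive_no:
        clauses.append("archive_no LIKE ?")
        params.append(f"%{archive_no}%")
    if retention:
        clauses.append("retention = ?")
        params.append(retention)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT doc_no, year, seq_no, archive_no, page_no, pages, retention "
           "FROM doc_no_index" + where + " ORDER BY doc_no, seq_no, year")
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE volume_catalog (doc_no TEXT, year INTEGER, seq_no INTEGER,
    archive_no TEXT, page_no INTEGER, pages INTEGER, retention TEXT);
CREATE VIEW doc_no_index AS
    SELECT doc_no, year, seq_no, archive_no, page_no, pages, retention
      FROM volume_catalog;
""")
conn.executemany("INSERT INTO volume_catalog VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("QU [2014] No. 3", 2014, 1, "A-2014-001", 1, 4, "permanent"),
    ("QU [2014] No. 9", 2014, 2, "A-2014-002", 5, 2, "30 years"),
    ("QU [2013] No. 7", 2013, 1, "A-2013-001", 1, 6, "permanent"),
])

# Combine two of the four possible conditions, as a user might on the query form.
sql, params = build_index_query(year=2014, retention="permanent")
rows = conn.execute(sql, params).fetchall()
```

Only the conditions that were actually supplied end up in the `WHERE` clause, so the same function serves any combination of the four fields, including none at all.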
5. The automatic archive compilation and research method according to claim 1, characterized in that the compilation and research performed by the expert information compilation and research submodule comprises the following steps:
(1) create the expert information compilation and research view: the view uses the expert basic information registration table as its main table, joins the expert paper detail table to extract paper information, and joins the expert project detail table to extract project and award information; expert basic registration information and expert achievement information are stored in different data tables, so all parts must be aggregated and queried in a unified way when the information is compiled;
(2) data access layer implementation code: the data access layer obtains the expert information to be compiled from the view in step (1); the function takes the query conditions as parameters and queries the matching expert archive information; compilation and research extracts the expert name, expert category, research direction, paper information, and project information;
(3) expert information compilation and research application layer implementation: the conditions are set, and the system performs fuzzy search by expert name, fuzzy search by expert research direction, fuzzy search by paper title and paper summary, search by paper publication date, fuzzy search by project name, fuzzy search by project summary, search by project start and end dates, and search by project award status; according to the entered conditions, compilation and research is carried out by expert classification, expert research direction, expert paper information, and expert project information;
(4) the results are exported to an Excel or Word document so that the user can save, view, print, and bind them.
6. The automatic archive compilation and research method according to claim 1, characterized in that the archive management information system executes as follows:
(1) a browser is opened on the client and the system's website address is entered in the address bar, which sends a service request to the Web server; when the system's login page is displayed, the user name, password, and verification code are filled in and sent to the Web server; after verifying the user's identity, the Web server sends the home page of the archive management information system to the client over the HTTP protocol, and the client browser receives the home page file and displays it on screen;
(2) basic archive information entry: using the unified forms provided by the archive catalog, volume catalog, and expert registration card displayed on the home page, the classification, archive number, archive title, year, and the various basic information about experts are entered and added respectively; the system executes the corresponding extension application in the business logic layer of the Web server and connects to the database server; before the basic information entered or added by the user is stored via SQL in the corresponding database connected to the Web server, the system automatically classifies and files the catalog entries and archive titles, and then attaches the original text; the original text may be an electronic scan or an electronic original;
(3) when compilation and research is needed for a certain type of archive information, the system enters the corresponding compilation and research interface according to the entry selected by the user; the conditions are entered in this interface and the Query button is clicked to retrieve and query the information in the corresponding databases;
(4) according to the parameters entered in step (3), once the link with the Web server has been executed, a data processing request is issued to the corresponding database server through SQL statements, that is, the archive compilation and research base database and the other associated databases are retrieved and queried, and the retrieved data items that meet the conditions are counted, analyzed, and aggregated to generate the archive compilation and research result;
(5) the database server submits the generated archive compilation and research result to the Web server, which sends it to the client for display on screen;
(6) the compilation and research result is exported to a Word document or Excel spreadsheet to be saved or printed.
CN201410714594.2A 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically Active CN104361111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410714594.2A CN104361111B (en) 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically

Publications (2)

Publication Number Publication Date
CN104361111A CN104361111A (en) 2015-02-18
CN104361111B true CN104361111B (en) 2017-10-27

Family

ID=52528371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410714594.2A Active CN104361111B (en) 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically

Country Status (1)

Country Link
CN (1) CN104361111B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303321A (en) * 2015-11-04 2016-02-03 广州赛莱拉干细胞科技股份有限公司 Archive management method and apparatus
CN105808770A (en) * 2016-03-22 2016-07-27 北京北方微电子基地设备工艺研究中心有限责任公司 File management method and device
CN106021355B (en) * 2016-05-10 2020-07-28 重庆大学 Statistical method and custom rule establishing method, device and system among multiple tables
CN106227748A (en) * 2016-07-14 2016-12-14 上海超橙科技有限公司 A kind of information generating method and equipment
CN106227749A (en) * 2016-07-14 2016-12-14 上海超橙科技有限公司 A kind of information-pushing method and equipment
CN106776695B (en) * 2016-11-11 2020-12-04 上海信联信息发展股份有限公司 Method for automatically identifying value of document and file
CN107463651A (en) * 2017-07-27 2017-12-12 合肥泓泉档案信息科技有限公司 A kind of electronic record is filed management method
CN107491498A (en) * 2017-07-27 2017-12-19 合肥泓泉档案信息科技有限公司 A kind of automatic adjusting method of dossier table
CN109684608A (en) * 2017-10-19 2019-04-26 航天信息股份有限公司 It is a kind of that the method and system of generation EXCEL document are passed through based on database
CN107894999A (en) * 2017-10-27 2018-04-10 成都准星云学科技有限公司 Towards the topic type automatic classification method and system based on thinking of solving a problem of elementary mathematics
CN107943957A (en) * 2017-11-27 2018-04-20 广西简约科技有限公司 A kind of software design approach for collecting meeting summary
CN108763467B (en) * 2018-05-29 2023-07-11 甘肃集优品网络科技有限公司 Electronic file intelligent processing management system suitable for archives trade
CN109189730A (en) * 2018-09-21 2019-01-11 郑州云海信息技术有限公司 A kind of archives visual management method, system, device and readable storage medium storing program for executing
CN109766439A (en) * 2018-12-15 2019-05-17 内蒙航天动力机械测试所 The unlimited tree-shaped class definition and assigning method of statistical query software
CN111597150B (en) * 2020-05-09 2023-09-12 云南驰宏锌锗股份有限公司 Automatic change and file arrangement information system
CN111858499A (en) * 2020-08-03 2020-10-30 王洋 File identification method, system and device based on black and white list
CN112463896B (en) * 2020-12-08 2024-02-23 常兰会 Archive catalogue data processing method, archive catalogue data processing device, computing equipment and storage medium
CN112861473B (en) * 2021-03-12 2024-02-02 国网浙江省电力有限公司物资分公司 Directory examination result summarizing system and method based on openpyl
CN113204610A (en) * 2021-05-06 2021-08-03 广东博维创远科技有限公司 Automatic cataloguing method based on criminal case electronic file and computer readable storage device
CN113407645A (en) * 2021-05-19 2021-09-17 福建福清核电有限公司 Intelligent sound image archive compiling and researching method based on knowledge graph
CN113220842B (en) * 2021-05-20 2022-04-19 广州中海云科技有限公司 Processing method, device and equipment for maritime affair administration punishment cutting template
CN113590903B (en) * 2021-09-27 2022-01-25 广东电网有限责任公司 Management method and device of information data
CN114947402A (en) * 2022-06-20 2022-08-30 国网山东省电力公司冠县供电公司 Archives screening classification device
CN115329086B (en) * 2022-08-29 2024-04-16 中铁四局集团电气化工程有限公司 Track traffic document retrieval system and method based on classification coding
CN116501862B (en) * 2023-06-25 2023-09-12 桂林电子科技大学 Automatic text extraction system based on dynamic distributed collection
CN116595238B (en) * 2023-07-17 2023-09-19 三土电子有限公司 User archive data analysis processing method based on RFID technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102368273A (en) * 2011-11-29 2012-03-07 神华集团有限责任公司 Archive management system and method
CN103745302A (en) * 2013-12-19 2014-04-23 镇江锐捷信息科技有限公司 Digitalized archival data management system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132496A1 (en) * 2007-11-16 2009-05-21 Chen-Kun Chen System And Method For Technique Document Analysis, And Patent Analysis System

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant