CN110399339A - File classification method, apparatus, device and storage medium for a knowledge base management system - Google Patents

File classification method, apparatus, device and storage medium for a knowledge base management system

Info

Publication number
CN110399339A
Authority
CN
China
Prior art keywords
target data
data file
file
target
management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910524705.6A
Other languages
Chinese (zh)
Inventor
王建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910524705.6A
Publication of CN110399339A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of artificial intelligence and discloses a file classification method, apparatus, device and storage medium for a knowledge base management system. Using natural language processing, classification labels can be set for both new and old data in the knowledge base management system and the data classified accordingly, which makes it easier for document administrators to manage data, improves management efficiency, and also makes it easier for users to retrieve data, improving retrieval efficiency. The method comprises: obtaining a target data file in the knowledge base management system; obtaining the target data text in the target data file; analyzing the key information of the target data text with a preset knowledge base classification model; setting classification labels for the target data file according to the key information of the target data file; associating the classification labels of the target data file with the target data file by building an index; and, if a retrieval instruction is received from a user, displaying the target data file on the retrieval page according to the target classification label carried in the user's retrieval instruction.

Description

File classification method, apparatus, device and storage medium for a knowledge base management system
Technical field
The present invention relates to the field of natural language processing, and in particular to a file classification method, apparatus, device and storage medium for a knowledge base management system.
Background art
In the field of knowledge base management, a knowledge base management system stores large volumes of data shared within enterprises and public institutions. Well-managed knowledge base data is easier for users to search and easier for the organization's administrators to handle.
In existing knowledge base management systems, data is mostly classified manually, new and old data are very difficult to classify and organize, and management efficiency is low.
Summary of the invention
The present invention provides a file classification method for a knowledge base management system. Using natural language processing, it sets labels for both new and old data in the knowledge base management system and classifies the data, which makes it easier for document administrators to manage data, improves management efficiency, and also makes it easier for users to retrieve data, improving retrieval efficiency.
A first aspect of the embodiments of the present invention provides a file classification method for a knowledge base management system, comprising: obtaining a target data file in the knowledge base management system; obtaining the target data text in the target data file; analyzing the key information of the target data text with a preset knowledge base classification model; setting classification labels for the target data file according to the key information of the target data file; associating the classification labels of the target data file with the target data file by building an index; and, if a retrieval instruction is received from a user, displaying the target data file on a retrieval page according to the target classification label carried in the user's retrieval instruction.
Optionally, in a first implementation of the first aspect of the embodiments of the present invention, obtaining the target data text in the target data file comprises: determining the file type of the target data file, the file types including document files, audio/video files and picture files; if the target data file is a document file, obtaining the text contained in the target data file and using that text as the target data text; if the target data file is an audio/video file, converting the audio/video file into the target data text with a preset speech recognition tool; and if the target data file is a picture file, obtaining the target data text from the picture file with a preset character recognition tool.
Optionally, in a second implementation of the first aspect of the embodiments of the present invention, analyzing the key information of the target data file with the preset knowledge base classification model comprises: preprocessing the target data text of the target data file to convert the target data text, which is written in natural language, into target data text in a discrete data format; inputting the target data text in the discrete data format into the preset knowledge base classification model; and obtaining the output phrases of the preset knowledge base classification model and using them as the key information of the target data file.
Optionally, in a third implementation of the first aspect of the embodiments of the present invention, preprocessing the target data text of the target data file to convert the natural-language target data text into target data text in a discrete data format comprises: performing word segmentation on the target data text to obtain a preprocessed vocabulary set; deleting from the preprocessed vocabulary set the words whose frequency of occurrence is higher than a first threshold and the words whose frequency of occurrence is lower than a second threshold, to obtain a target vocabulary set; and converting the target vocabulary set into a target data set using a preset dictionary index table, and using the target data set as the target data text in the discrete data format.
Optionally, in a fourth implementation of the first aspect of the embodiments of the present invention, setting the classification labels of the target data file according to the key information of the target data file comprises: creating an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; and deduplicating the key information of the target data file, adding it to the identifier set of the target data file, and using each classification keyword in the identifier set as a classification label of the target data file.
Optionally, in a fifth implementation of the first aspect of the embodiments of the present invention, displaying the target data file on the retrieval page according to the target classification label carried in the user's retrieval instruction, if a retrieval instruction is received from the user, comprises: identifying the target classification label carried in the retrieval instruction; obtaining the storage address to which the index of the target classification label points; reading from the storage address the data files associated with the target classification label; and displaying on the retrieval page the data files associated with the target classification label.
Optionally, in a sixth implementation of the first aspect of the embodiments of the present invention, the method further comprises: updating the classification labels of the data files in the knowledge base management system at a preset time interval.
A second aspect of the embodiments of the present invention provides a file classification apparatus for a knowledge base management system, comprising: a first obtaining unit, configured to obtain a target data file in the knowledge base management system; a second obtaining unit, configured to obtain the target data text in the target data file; an analysis unit, configured to analyze the key information of the target data file with a preset knowledge base classification model; a setting unit, configured to set classification labels for the target data file according to the key information of the target data file; an association unit, configured to associate the classification labels of the target data file with the target data file by building an index; and a display unit, configured to display the target data file on a retrieval page according to the target classification label carried in a user's retrieval instruction if the retrieval instruction is received.
Optionally, in a first implementation of the second aspect of the embodiments of the present invention, the second obtaining unit is specifically configured to: determine the file type of the target data file, the file types including document files, audio/video files and picture files; if the target data file is a document file, obtain the text contained in the target data file and use that text as the target data text; if the target data file is an audio/video file, convert the audio/video file into the target data text with a preset speech recognition tool; and if the target data file is a picture file, obtain the target data text from the picture file with a preset character recognition tool.
Optionally, in a second implementation of the second aspect of the embodiments of the present invention, the analysis unit specifically comprises: a conversion module, configured to preprocess the target data text of the target data file and convert the natural-language target data text into target data text in a discrete data format; an input module, configured to input the target data text in the discrete data format into the preset knowledge base classification model; and an obtaining module, configured to obtain the output phrases of the preset knowledge base classification model and use them as the key information of the target data file.
Optionally, in a third implementation of the second aspect of the embodiments of the present invention, the conversion module is specifically configured to: perform word segmentation on the target data text to obtain a preprocessed vocabulary set; delete from the preprocessed vocabulary set the words whose frequency of occurrence is higher than a first threshold and the words whose frequency of occurrence is lower than a second threshold, to obtain a target vocabulary set; and convert the target vocabulary set into a target data set using a preset dictionary index table, and use the target data set as the target data text in the discrete data format.
Optionally, in a fourth implementation of the second aspect of the embodiments of the present invention, the setting unit is specifically configured to: create an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; and deduplicate the key information of the target data file, add it to the identifier set of the target data file, and use each classification keyword in the identifier set as a classification label of the target data file.
Optionally, in a fifth implementation of the second aspect of the embodiments of the present invention, the display unit is specifically configured to: identify the target classification label carried in the retrieval instruction; obtain the storage address to which the index of the target classification label points; read from the storage address the data files associated with the target classification label; and display on the retrieval page the data files associated with the target classification label.
Optionally, in a sixth implementation of the second aspect of the embodiments of the present invention, the file classification apparatus for a knowledge base management system further comprises: an updating unit, configured to update the classification labels of the data files in the knowledge base management system at a preset time interval.
A third aspect of the embodiments of the present invention provides a file classification device for a knowledge base management system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the file classification method for a knowledge base management system described in any of the above embodiments.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the steps of the file classification method for a knowledge base management system described in any of the above embodiments.
In the technical solution provided by the embodiments of the present invention, a target data file in the knowledge base management system is obtained; the target data text in the target data file is obtained; the key information of the target data text is analyzed with a preset knowledge base classification model; classification labels are set for the target data file according to the key information of the target data file; the classification labels of the target data file are associated with the target data file by building an index; and, if a retrieval instruction is received from a user, the target data file is displayed on the retrieval page according to the target classification label carried in the user's retrieval instruction. Using natural language processing, the embodiments of the present invention can set labels for both new and old data in the knowledge base management system and classify the data, which makes it easier for document administrators to manage data, improves management efficiency, and also makes it easier for users to retrieve data, improving retrieval efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of one embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention;
Fig. 3 is a schematic diagram of one embodiment of the file classification apparatus for a knowledge base management system in the embodiments of the present invention;
Fig. 4 is a schematic diagram of another embodiment of the file classification apparatus for a knowledge base management system in the embodiments of the present invention;
Fig. 5 is a schematic diagram of one embodiment of the file classification device for a knowledge base management system in the embodiments of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a file classification method, apparatus, device and storage medium for a knowledge base management system, which can use natural language processing to set labels for both new and old data in the knowledge base management system and classify the data. This makes it easier for document administrators to manage data, improves management efficiency, and also makes it easier for users to retrieve data, improving retrieval efficiency.
To enable those skilled in the art to better understand the solution of the present invention, the embodiments of the present invention are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Referring to Fig. 1, a flowchart of a file classification method for a knowledge base management system provided by an embodiment of the present invention specifically includes:
101. Obtain a target data file in the knowledge base management system.
The server obtains a target data file in the knowledge base management system. A target data file in the knowledge base management system is a file that can be converted into text. The knowledge base management system is a centralized storage platform for large volumes of documents, built by enterprises and public institutions to achieve unified document sharing.
The data files in the knowledge base management system include all kinds of Office files, multimedia files and electronic documents. Supported data file formats include Word, Excel, PowerPoint, WPS, Visio, PDF, AVI, WAV, MID, MPEG, MP3, DWF and JPG.
102. Obtain the target data text in the target data file.
The server obtains the target data text in the target data file. Specifically, the server determines the file type of the target data file, the file types including document files, audio/video files and picture files. If the target data file is a document file, the server obtains the text contained in the target data file and uses that text as the target data text. If the target data file is an audio/video file, the server converts the audio/video file into the target data text with a preset speech recognition tool. If the target data file is a picture file, the server obtains the target data text from the picture file with a preset character recognition tool.
It should be noted that the input of the preset knowledge base classification model is text, so before the server can analyze a file with the preset knowledge base classification model, it must first extract the file's text. For example, when the target data file is a song, the server needs to extract the lyrics of the song in advance.
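The type-based routing in step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the extension groups, the function names `file_type` and `extract_text`, and the placeholder extractors are all assumptions introduced here. A real system would plug in an actual document parser, speech recognition tool and character recognition tool.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical extension groups, loosely based on the formats listed above.
DOCUMENT_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".wps", ".pdf"}
AUDIO_VIDEO_EXTS = {".avi", ".wav", ".mid", ".mpeg", ".mp3"}
PICTURE_EXTS = {".jpg", ".jpeg"}


def file_type(path: str) -> str:
    """Classify a file as 'document', 'audio_video' or 'picture' by extension."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in AUDIO_VIDEO_EXTS:
        return "audio_video"
    if ext in PICTURE_EXTS:
        return "picture"
    raise ValueError(f"unsupported file type: {ext!r}")


def extract_text(path: str, extractors: Dict[str, Callable[[str], str]]) -> str:
    """Route the file to the extractor registered for its type and return text."""
    return extractors[file_type(path)](path)


# Placeholder extractors (assumptions): stand-ins for a document parser,
# a speech recognition tool and a character recognition tool.
extractors = {
    "document": lambda p: f"<text parsed from {p}>",
    "audio_video": lambda p: f"<transcript of {p}>",
    "picture": lambda p: f"<OCR text of {p}>",
}

print(extract_text("report.PDF", extractors))  # <text parsed from report.PDF>
```

Keeping the extractors in a dictionary means a new file type only needs a new entry, without changing the routing function.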
103. Analyze the key information of the target data text with the preset knowledge base classification model.
The server analyzes the key information of the target data text with the preset knowledge base classification model. The key information consists of multiple keywords from a preset dictionary. The preset dictionary is a database of classification keywords set up according to the classification requirements for data files in the knowledge base management system. Specifically, the server analyzes the key information of the target data file with the preset knowledge base classification model as follows: the server preprocesses the target data text of the target data file, representing the natural-language target data text in a discrete data format; the server inputs the target data text in the discrete data format into the preset knowledge base classification model; the server obtains the output phrases of the preset knowledge base classification model and uses them as the key information of the target data file.
The preset knowledge base classification model is a text convolutional neural network (TextCNN) model trained on a corpus made up of all the continuously updated files in the knowledge base management system until its accuracy reaches at least 90 percent. The output of the preset knowledge base classification model is multiple keywords from the preset dictionary. A TextCNN model is an algorithm that classifies text using a convolutional neural network.
104. Set the classification labels of the target data file according to the key information of the target data file.
The server sets the classification labels of the target data file according to the key information of the target data file. Specifically, the server creates an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; the server deduplicates the key information of the target data file, adds it to the identifier set of the target data file, and uses each classification keyword in the identifier set as a classification label of the target data file.
It should be noted that the server creates a separate identifier set for the target data file under each classification basis, according to the different classification requirements, and stores all identifier sets of the target data file in a preset classification index table. For example, in the preset classification index table, when the classification basis is the author, the identifier set of the target data file is {author A, author B, author C}.
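Step 104 amounts to deduplicating the model's keywords and storing them per classification basis. The sketch below illustrates that idea under stated assumptions: the function name `set_classification_labels` and the dictionary layout of the classification index table are hypothetical, since the description does not specify a data structure.

```python
def set_classification_labels(key_information, label_sets, basis):
    """Add deduplicated keywords to the identifier set stored under `basis`."""
    identifier_set = label_sets.setdefault(basis, [])
    for keyword in key_information:
        if keyword not in identifier_set:  # deduplicate, keep first-seen order
            identifier_set.append(keyword)
    return identifier_set


# Hypothetical classification index table for one file: basis -> identifier set.
classification_index_table = {}
labels = set_classification_labels(
    ["author A", "author B", "author A", "author C"],
    classification_index_table,
    basis="author",
)
print(labels)  # ['author A', 'author B', 'author C']
```

A list is used instead of a Python `set` only so that the labels keep a stable, first-seen order for display; either would satisfy the deduplication requirement.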
105. Associate the classification labels of the target data file with the target data file by building an index.
The server associates the classification labels of the target data file with the target data file by building an index. The index provides a data pointer associated with each classification label, and the pointer points to the address where the target data file is stored.
In the knowledge base management system, the server may associate classification labels with the target data file by building a hash index, a B+ tree index or a function-based index. The server may also use other types of index, such as a bitmap index or a descending index; no limitation is imposed here.
It should be noted that the preset dictionary contains all classification labels, and each classification label is associated through the index with all data files that carry that classification label.
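As one illustration of step 105, a hash index can be modelled with a dictionary that maps each label to the storage addresses of the files carrying it. The class name `LabelIndex`, its methods and the example addresses are assumptions made for this sketch; the description equally allows B+ tree, function-based, bitmap or descending indexes.

```python
from collections import defaultdict


class LabelIndex:
    """Hash index from a classification label to file storage addresses."""

    def __init__(self):
        self._index = defaultdict(set)

    def associate(self, address, labels):
        """Record that the file stored at `address` carries each label."""
        for label in labels:
            self._index[label].add(address)

    def lookup(self, label):
        """Return the addresses of all files carrying `label`, sorted."""
        return sorted(self._index.get(label, set()))


index = LabelIndex()
index.associate("/kb/0001", ["finance", "author A"])
index.associate("/kb/0002", ["finance"])
print(index.lookup("finance"))  # ['/kb/0001', '/kb/0002']
print(index.lookup("unknown"))  # []
```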
106. If a retrieval instruction is received from a user, display the target data file on the retrieval page according to the target classification label carried in the user's retrieval instruction.
If a retrieval instruction is received from a user, the server displays the target data file on the retrieval page according to the target classification label carried in the user's retrieval instruction. Specifically, the server identifies the target classification label carried in the retrieval instruction; the server obtains the storage address to which the index of the target classification label points; the server reads from the storage address the data files associated with the target classification label; the server displays on the retrieval page the data files associated with the target classification label.
It should be noted that when the server displays the data files associated with the target classification label on the retrieval page, it supports sorting by attributes such as the name, size, date and uploader of the data files.
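The lookup and sorting in step 106 can be sketched as below. The `DataFile` record, the `retrieve` function and the plain dictionaries standing in for the index and the file store are hypothetical names chosen for illustration, and the sample files are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    name: str
    size: int        # bytes
    uploaded: date
    uploader: str


def retrieve(label, index, storage, sort_by="name", descending=False):
    """Resolve a label to storage addresses, read the files and sort them."""
    files = [storage[address] for address in index.get(label, [])]
    return sorted(files, key=lambda f: getattr(f, sort_by), reverse=descending)


storage = {
    "/kb/1": DataFile("budget.xlsx", 2048, date(2019, 5, 1), "wang"),
    "/kb/2": DataFile("annual_report.pdf", 10240, date(2019, 3, 12), "li"),
}
index = {"finance": ["/kb/1", "/kb/2"]}

print([f.name for f in retrieve("finance", index, storage)])
# ['annual_report.pdf', 'budget.xlsx']
print([f.name for f in retrieve("finance", index, storage, "uploaded", True)])
# ['budget.xlsx', 'annual_report.pdf']
```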
Using natural language processing, the embodiments of the present invention can set labels for both new and old data in the knowledge base management system and classify the data, which makes it easier for document administrators to manage data, improves management efficiency, and also makes it easier for users to retrieve data, improving retrieval efficiency.
Referring to Fig. 2, another embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention includes:
201. Obtain a target data file in the knowledge base management system.
The server obtains a target data file in the knowledge base management system. A target data file in the knowledge base management system is a file that can be converted into text. The knowledge base management system is a centralized storage platform for large volumes of documents, built by enterprises and public institutions to achieve unified document sharing.
The data files in the knowledge base management system include all kinds of Office files, multimedia files and electronic documents. Supported data file formats include Word, Excel, PowerPoint, WPS, Visio, PDF, AVI, WAV, MID, MPEG, MP3, DWF, JPG and so on.
202. Obtain the target data text in the target data file.
The server obtains the target data text in the target data file. Specifically, the server determines the file type of the target data file, the file types including document files, audio/video files and picture files. If the target data file is a document file, the server obtains the text contained in the target data file and uses that text as the target data text. If the target data file is an audio/video file, the server converts the audio/video file into the target data text with a preset speech recognition tool. If the target data file is a picture file, the server obtains the target data text from the picture file with a preset character recognition tool.
It should be noted that the input of the preset knowledge base classification model is text, so before the server can analyze a file with the preset knowledge base classification model, it must first extract the file's text. For example, when the target data file is a song, the server needs to extract the lyrics of the song in advance.
203. Preprocess the target data text of the target data file to convert the natural-language target data text into target data text in a discrete data format.
The server preprocesses the target data text of the target data file to convert the natural-language target data text into target data text in a discrete data format. Specifically, the server performs word segmentation on the target data text to obtain a preprocessed vocabulary set. For example, segmenting "LeTV resumes trading tomorrow: shareholders holding 5% or more have no plans to reduce their holdings" yields the preprocessed vocabulary set {LeTV, tomorrow, resumes trading, 5%, or more, shareholders, no, reduce holdings, plans}. The server deletes from the preprocessed vocabulary set the words whose frequency of occurrence is higher than a first threshold and the words whose frequency of occurrence is lower than a second threshold, to obtain a target vocabulary set. The server then converts the target vocabulary set into a target data set using a preset dictionary index table and uses the target data set as the target data text in the discrete data format; for example, the target vocabulary set {LeTV, resumes trading} is converted into the target data set {0, 1}.
The word segmentation algorithm the server applies to the target data text may be a maximum matching algorithm, a maximum probability segmentation algorithm or a minimum segmentation algorithm, or another segmentation algorithm; no limitation is imposed here.
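The three preprocessing stages of step 203 can be sketched as follows. This is an illustrative sketch only: it uses forward maximum matching as the segmentation algorithm (one of the options named above), and the function names, lexicon, thresholds and dictionary index table are all assumptions. An unspaced English string stands in for Chinese text so the segmentation step has something to do.

```python
from collections import Counter


def forward_max_match(text, lexicon):
    """Segment `text` greedily, always taking the longest lexicon word."""
    max_len = max(len(word) for word in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # unknown characters stay single
                tokens.append(piece)
                i += size
                break
    return tokens


def filter_by_frequency(tokens, first_threshold, second_threshold):
    """Drop words occurring more than first_threshold or fewer than
    second_threshold times."""
    counts = Counter(tokens)
    return [t for t in tokens if second_threshold <= counts[t] <= first_threshold]


def to_indices(tokens, dictionary_index):
    """Map each remaining word to its number in the dictionary index table."""
    return [dictionary_index[t] for t in tokens if t in dictionary_index]


lexicon = {"the", "know", "knowledge", "base", "file"}
tokens = forward_max_match("theknowledgebasefilethefilethe", lexicon)
print(tokens)  # ['the', 'knowledge', 'base', 'file', 'the', 'file', 'the']

kept = filter_by_frequency(tokens, first_threshold=2, second_threshold=1)
print(kept)  # ['knowledge', 'base', 'file', 'file']

print(to_indices(kept, {"knowledge": 0, "base": 1, "file": 2}))  # [0, 1, 2, 2]
```

The high-frequency cut removes words such as "the" that appear everywhere and carry little classification signal, while the low-frequency cut (set to 1 here, so nothing is removed) would drop rare words.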
204. Input the target data text in the discrete data format into the preset knowledge base classification model.
The server inputs the target data text in the discrete data format into the preset knowledge base classification model. The preset knowledge base classification model is used to extract the features of the target data text.
It can be understood that the preset knowledge base classification model first combines the target data text in the adjusted data format into a target matrix set, then performs convolution with preset convolution kernels to extract the features of the target matrix set, removes some of the features through a pooling layer, and concatenates the remaining features to obtain the key information in the discrete data format.
It should be noted that the preset knowledge base classification model is a TextCNN model trained on a corpus made up of all the continuously updated files in the knowledge base management system until its accuracy reaches at least 90 percent. The output of the preset knowledge base classification model is multiple keywords from the preset dictionary. A TextCNN model is an algorithm that classifies text using a convolutional neural network.
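The convolution, pooling and concatenation stages described above can be illustrated with a small pure-Python sketch. It is not the trained model: the embedding table, the two kernels and the function names are made-up values and names for illustration, max-over-time pooling is assumed as the pooling operation, and the final layer that would map the feature vector to dictionary keywords is omitted.

```python
def conv_feature_map(matrix, kernel):
    """Slide a kernel that spans the full embedding width over consecutive rows."""
    height, width = len(kernel), len(kernel[0])
    return [
        sum(matrix[i + r][c] * kernel[r][c]
            for r in range(height) for c in range(width))
        for i in range(len(matrix) - height + 1)
    ]


def text_cnn_features(indices, embeddings, kernels):
    """Embed the word indices, convolve with each kernel, keep each maximum."""
    matrix = [embeddings[i] for i in indices]          # one row per word
    return [max(conv_feature_map(matrix, k)) for k in kernels]


# Toy values (assumptions): 2-dimensional embeddings for word indices 0..2.
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],  # looks at two consecutive words
    [[1.0, -1.0]],             # looks at a single word
]
print(text_cnn_features([0, 1, 2], embeddings, kernels))  # [3.0, 1.0]
```

Each kernel contributes one pooled value, so the concatenated feature vector has a fixed length regardless of how many words the input text contains. The input must contain at least as many words as the tallest kernel.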
205. Obtain the output phrases of the preset knowledge base classification model and use them as the key information of the target data file.
The server obtains the output phrases of the preset knowledge base classification model and uses them as the key information of the target data file.
It should be noted that the output phrases of the preset knowledge base classification model are in the discrete data format, so the server needs to use a preset index dictionary to convert the output phrases from the discrete data format into natural-language form.
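Converting the model output back to words is a reverse lookup. The sketch below assumes the index dictionary is simply the inverse of the word-to-number table used during preprocessing; the description does not confirm this, and the function name `decode_output` is hypothetical.

```python
def decode_output(output_indices, dictionary_index):
    """Translate numeric model output back into natural-language keywords."""
    index_to_word = {number: word for word, number in dictionary_index.items()}
    return [index_to_word[i] for i in output_indices if i in index_to_word]


dictionary_index = {"LeTV": 0, "resumes trading": 1}
print(decode_output([1, 0], dictionary_index))  # ['resumes trading', 'LeTV']
```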
206. Set the classification label of the target data file according to the key information of the target data file.
The server sets the classification label of the target data file according to the key information of the target data file. Specifically, the server sets an identifier set for the target data file, where the identifier set is used to store the classification keywords of the target data file; the server deduplicates the key information of the target data file and adds it to the identifier set of the target data file, and uses each classification keyword in the identifier set as a classification label of the target data file.
It should be noted that, according to different classification requirements, the server sets a different identifier set for the target data file under each classification basis, and stores all identifier sets of the target data file in a preset classification index table. For example, in the preset classification index table, when the classification basis is author, the identifier set of the target data file is {author A, author B, author C}.
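The deduplication and per-basis storage described above might look like the following sketch. The table layout (file ID, then classification basis, then identifier set) and the names `classification_index_table` and `set_labels` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical classification index table: file ID -> classification basis -> identifier set.
classification_index_table = defaultdict(dict)

def set_labels(file_id, basis, key_information):
    """Deduplicate the key information and store it as the file's identifier set."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    identifier_set = list(dict.fromkeys(key_information))
    classification_index_table[file_id][basis] = identifier_set
    return identifier_set

labels = set_labels("doc-001", "author", ["author A", "author B", "author A", "author C"])
```

An ordered list is used instead of a Python `set` only so that the labels display in a stable order.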
207. Associate the classification label of the target data file with the target data file by establishing an index.
The server associates the classification label of the target data file with the target data file by establishing an index. The index provides a data pointer associated with the classification label, and the data pointer points to the address where the target data file is stored.
In the knowledge base management system, the server may associate the classification label with the target data file by establishing a hash index, a B+ tree index or a function index; the server may also use other types of index, such as a bitmap index or a descending index, which is not limited here.
It should be noted that the preset dictionary contains all classification labels, and each classification label is associated through the index with all data files that carry that classification label.
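As one possible reading of the hash-index option, the sketch below keeps an in-memory hash map from each classification label to the storage addresses of its files. The class name `LabelIndex` and the file paths are hypothetical; a production system would more likely rely on the index facilities of its database.

```python
from collections import defaultdict

class LabelIndex:
    """Hash index from a classification label to the storage addresses of its files."""

    def __init__(self):
        self._index = defaultdict(set)

    def associate(self, label, address):
        # The stored address plays the role of the data pointer.
        self._index[label].add(address)

    def lookup(self, label):
        # .get avoids creating an empty entry for unknown labels.
        return sorted(self._index.get(label, ()))

index = LabelIndex()
index.associate("claims", "/kb/files/0001.pdf")
index.associate("claims", "/kb/files/0042.docx")
index.associate("policy", "/kb/files/0042.docx")
```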
208. If a search instruction of a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction of the user.
If a search instruction of a user is received, the server displays the target data file on the retrieval page according to the target classification label carried in the search instruction of the user. Specifically, the server identifies the target classification label carried in the search instruction; the server obtains the storage address to which the index of the target classification label points; the server reads the data files associated with the target classification label from the storage address; and the server displays the data files associated with the target classification label on the retrieval page.
It should be noted that when the server displays the data files associated with the target classification label on the retrieval page, it supports sorting by attributes such as the title, size, date and uploader of the data files.
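The four retrieval sub-steps and the attribute-based sorting can be sketched as below. The shape of the search instruction (a dictionary with a `label` key), the `DataFile` fields and the sample records are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataFile:
    title: str
    size: int
    uploaded: date
    uploader: str

# Hypothetical storage (address -> file metadata) and label index (label -> addresses).
STORAGE = {
    "addr1": DataFile("Claims handbook", 2048, date(2019, 5, 1), "li"),
    "addr2": DataFile("Audit checklist", 512, date(2019, 6, 1), "wang"),
}
LABEL_INDEX = {"claims": ["addr1", "addr2"]}

def retrieve(search_instruction, sort_by="title"):
    """Resolve the label in the search instruction and return the sorted files."""
    label = search_instruction["label"]          # identify the target classification label
    addresses = LABEL_INDEX.get(label, [])       # storage addresses the index points to
    files = [STORAGE[a] for a in addresses]      # read the associated files
    return sorted(files, key=lambda f: getattr(f, sort_by))

results = retrieve({"label": "claims"}, sort_by="size")
```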
209. Update the classification labels of the data files in the knowledge base management system at a preset time interval.
The server updates the classification labels of the data files in the knowledge base management system at a preset time interval. Specifically, the server obtains, at the preset time interval, the classification labels updated by the administrator; the server updates the preset dictionary with the classification labels updated by the administrator; the preset knowledge base classification model is retrained based on the updated preset dictionary to obtain a target knowledge base classification model; and the identifier sets of the data files in the knowledge base management system are updated through the target knowledge base classification model.
It should be noted that updating the preset dictionary with the classification labels updated by the administrator includes: the server modifies existing classification labels in the preset dictionary according to the classification labels updated by the administrator; the server deletes existing classification labels in the preset dictionary according to the classification labels updated by the administrator; and the server adds new classification labels to the preset dictionary according to the classification labels updated by the administrator.
The preset time interval may be adjusted according to the actual situation, for example 24 hours or 48 hours, or another duration, which is not limited here.
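The three dictionary operations (modify, delete, add) can be illustrated with a small sketch. The tuple-based update format and the function name `update_dictionary` are assumptions; retraining the model and scheduling at the preset interval are outside the scope of this example.

```python
def update_dictionary(dictionary, updates):
    """Apply administrator updates: modify, delete, or add classification labels."""
    updated = set(dictionary)                    # work on a copy, leave the input untouched
    for op, *args in updates:
        if op == "modify":
            old, new = args
            if old in updated:                   # only rename labels that exist
                updated.discard(old)
                updated.add(new)
        elif op == "delete":
            updated.discard(args[0])
        elif op == "add":
            updated.add(args[0])
        else:
            raise ValueError(f"unknown operation: {op}")
    return updated

preset_dictionary = {"claims", "policy", "audit"}
new_dictionary = update_dictionary(
    preset_dictionary,
    [("modify", "audit", "internal audit"), ("delete", "policy"), ("add", "underwriting")],
)
```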
The embodiment of the present invention can use natural language processing technology to set labels for both new and existing data in the knowledge base management system and classify them, which helps data administrators manage data conveniently and improves management efficiency, while also making it easier for users to retrieve data and improving retrieval efficiency.
The file classifying method of the knowledge base management system in the embodiment of the present invention has been described above; the file classifying device of the knowledge base management system in the embodiment of the present invention is described below. Referring to Fig. 3, one embodiment of the file classifying device of the knowledge base management system in the embodiment of the present invention includes:
A first acquisition unit 301, configured to obtain the target data file in the knowledge base management system;
A second acquisition unit 302, configured to obtain the target data text in the target data file;
An analysis unit 303, configured to analyze the key information of the target data file through the preset knowledge base classification model;
A setting unit 304, configured to set the classification label of the target data file according to the key information of the target data file;
An association unit 305, configured to associate the classification label of the target data file with the target data file by establishing an index;
A display unit 306, configured to, if a search instruction of a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction of the user.
The embodiment of the present invention can use natural language processing technology to set labels for both new and existing data in the knowledge base management system and classify them, which helps data administrators manage data conveniently and improves management efficiency, while also making it easier for users to retrieve data and improving retrieval efficiency.
Referring to Fig. 4, another embodiment of the file classifying device of the knowledge base management system in the embodiment of the present invention includes:
A second acquisition unit 302, configured to obtain the target data text in the target data file;
An analysis unit 303, configured to analyze the key information of the target data file through the preset knowledge base classification model;
A setting unit 304, configured to set the classification label of the target data file according to the key information of the target data file;
An association unit 305, configured to associate the classification label of the target data file with the target data file by establishing an index;
A display unit 306, configured to, if a search instruction of a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction of the user.
Optionally, the second acquisition unit 302 is specifically configured to:
Determine the file type of the target data file, where the file type includes document files, audio/video files and picture files; if the target data file is a document file, obtain the text contained in the target data file and use that text as the target data text; if the target data file is an audio/video file, convert the audio/video file into the target data text through a preset speech recognition tool; if the target data file is a picture file, obtain the target data text from the picture file through a preset character recognition tool.
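The branching on file type can be sketched as a simple dispatch. The extension lists and the three callables (`read_document`, `speech_to_text`, `ocr`) are hypothetical stand-ins; the embodiment does not name specific speech recognition or character recognition tools, and a real system might detect the type from file contents rather than the extension.

```python
from pathlib import Path

# Hypothetical extension groups for the three file types.
DOCUMENT_TYPES = {".txt", ".md"}
AUDIO_VIDEO_TYPES = {".mp3", ".wav", ".mp4"}
PICTURE_TYPES = {".png", ".jpg", ".jpeg"}

def extract_text(path, read_document, speech_to_text, ocr):
    """Pick the extraction route for the target data text from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in DOCUMENT_TYPES:
        return read_document(path)               # text is read directly
    if suffix in AUDIO_VIDEO_TYPES:
        return speech_to_text(path)              # preset speech recognition tool
    if suffix in PICTURE_TYPES:
        return ocr(path)                         # preset character recognition tool
    raise ValueError(f"unsupported file type: {suffix}")

# Stand-in tools so the sketch runs without external dependencies.
text = extract_text(
    "meeting.MP4",
    read_document=lambda p: "document text",
    speech_to_text=lambda p: "transcribed speech",
    ocr=lambda p: "recognized characters",
)
```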
Optionally, the analysis unit 303 specifically includes:
A conversion module 3031, configured to preprocess the target data text of the target data file and convert the target data text composed of natural language into target data text in discrete data format;
An input module 3032, configured to input the target data text in discrete data format into the preset knowledge base classification model;
An obtaining module 3033, configured to obtain the output phrases of the preset knowledge base classification model and use the output phrases of the preset knowledge base classification model as the key information of the target data file.
Optionally, the conversion module 3031 is specifically configured to:
Perform word segmentation on the target data text to obtain a preprocessed vocabulary set; delete the words in the preprocessed vocabulary set whose number of occurrences is higher than a first threshold, and delete the words whose number of occurrences is lower than a second threshold, to obtain a target vocabulary set; convert the target vocabulary set into a target data set through a preset dictionary index table, and use the target data set as the target data text in discrete data format.
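The preprocessing pipeline (segmentation, frequency filtering, index conversion) might be sketched as follows. Whitespace splitting stands in for a real word segmenter, which Chinese text would require; the thresholds, the sample dictionary index table and the choice to drop out-of-dictionary words are illustrative assumptions.

```python
from collections import Counter

def preprocess(text, dictionary_index, first_threshold, second_threshold):
    """Segment, drop too-frequent and too-rare words, then map words to indices."""
    words = text.lower().split()                 # whitespace split stands in for word segmentation
    counts = Counter(words)
    # Keep words whose count is neither above the first nor below the second threshold.
    target_words = [
        w for w in words
        if second_threshold <= counts[w] <= first_threshold
    ]
    # Words missing from the dictionary index table are dropped.
    return [dictionary_index[w] for w in target_words if w in dictionary_index]

dictionary_index = {"claim": 0, "policy": 1, "fraud": 2}
ids = preprocess(
    "the claim the policy the claim fraud the policy",
    dictionary_index,
    first_threshold=3,
    second_threshold=2,
)
```

In this sample, "the" (4 occurrences) exceeds the first threshold and "fraud" (1 occurrence) falls below the second, so only "claim" and "policy" survive.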
Optionally, the setting unit 304 is specifically configured to:
Set an identifier set for the target data file, where the identifier set is used to store the classification keywords of the target data file; deduplicate the key information of the target data file and add it to the identifier set of the target data file, and use each classification keyword in the identifier set as a classification label of the target data file.
Optionally, the display unit 306 is specifically configured to:
Identify the target classification label carried in the search instruction; obtain the storage address to which the index of the target classification label points; read the data files associated with the target classification label from the storage address; and display the data files associated with the target classification label on the retrieval page.
Optionally, the file classifying device of the knowledge base management system further includes:
An updating unit 307, configured to update the classification labels of the data files in the knowledge base management system at a preset time interval.
The embodiment of the present invention can use natural language processing technology to set labels for both new and existing data in the knowledge base management system and classify them, which helps data administrators manage data conveniently and improves management efficiency, while also making it easier for users to retrieve data and improving retrieval efficiency.
Figs. 3 and 4 above describe in detail the file classifying device of the knowledge base management system in the embodiment of the present invention from the perspective of modular functional entities; the file classifying equipment of the knowledge base management system in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a file classifying equipment of a knowledge base management system provided by an embodiment of the present invention. The file classifying equipment 500 of the knowledge base management system may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the file classifying equipment of the knowledge base management system. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the file classifying equipment 500 of the knowledge base management system, the series of instruction operations in the storage medium 508.
The file classifying equipment 500 of the knowledge base management system may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the file classifying equipment of the knowledge base management system, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The processor 501 can perform the functions of the first acquisition unit 301, the second acquisition unit 302, the analysis unit 303, the setting unit 304, the association unit 305 and the display unit 306 in the above embodiments.
The components of the file classifying equipment of the knowledge base management system are described in detail below with reference to Fig. 5:
The processor 501 is the control center of the file classifying equipment of the knowledge base management system and can perform processing according to the configured file classifying method of the knowledge base management system. The processor 501 uses various interfaces and lines to connect the parts of the entire file classifying equipment of the knowledge base management system; by running or executing the software programs and/or modules stored in the memory 509 and calling the data stored in the memory 509, it performs the various functions of the file classifying equipment and processes data, thereby implementing file classification for the knowledge base management system. The storage medium 508 and the memory 509 are both carriers for storing data. In this embodiment, the storage medium 508 may refer to an internal memory with small capacity but high speed, and the memory 509 may be an external memory with large capacity but low speed.
The memory 509 may be used to store software programs and modules; by running the software programs and modules stored in the memory 509, the processor 501 performs the various functional applications and data processing of the file classifying equipment 500 of the knowledge base management system. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function (for example, obtaining the target data text in the target data file), and the data storage area may store data created through use of the file classifying equipment of the knowledge base management system (for example, classification labels). In addition, the memory 509 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. The file classifying method program of the knowledge base management system provided in the embodiment of the present invention and the received data stream are stored in the memory, and the processor 501 calls them from the memory 509 when they are needed.
When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or twisted pair) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, an optical disc) or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
As described above, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A file classifying method of a knowledge base management system, characterized by comprising:
Obtaining a target data file in the knowledge base management system;
Obtaining the target data text in the target data file;
Analyzing the key information of the target data file through a preset knowledge base classification model;
Setting the classification label of the target data file according to the key information of the target data file;
Associating the classification label of the target data file with the target data file by establishing an index;
If a search instruction of a user is received, displaying the target data file on a retrieval page according to the target classification label carried in the search instruction of the user.
2. The file classifying method of the knowledge base management system according to claim 1, characterized in that the obtaining the target data text in the target data file comprises:
Determining the file type of the target data file, the file type including document files, audio/video files and picture files;
If the target data file is a document file, obtaining the text contained in the target data file, and using the text contained in the target data file as the target data text;
If the target data file is an audio/video file, converting the audio/video file into the target data text through a preset speech recognition tool;
If the target data file is a picture file, obtaining the target data text from the picture file through a preset character recognition tool.
3. The file classifying method of the knowledge base management system according to claim 1, characterized in that the analyzing the key information of the target data file through the preset knowledge base classification model comprises:
Preprocessing the target data text of the target data file, and converting the target data text composed of natural language into target data text in discrete data format;
Inputting the target data text in discrete data format into the preset knowledge base classification model;
Obtaining the output phrases of the preset knowledge base classification model, and using the output phrases of the preset knowledge base classification model as the key information of the target data file.
4. The file classifying method of the knowledge base management system according to claim 3, characterized in that the preprocessing the target data text of the target data file, and converting the target data text composed of natural language into target data text in discrete data format, comprises:
Performing word segmentation on the target data text to obtain a preprocessed vocabulary set;
Deleting the words in the preprocessed vocabulary set whose number of occurrences is higher than a first threshold, and deleting the words in the preprocessed vocabulary set whose number of occurrences is lower than a second threshold, to obtain a target vocabulary set;
Converting the target vocabulary set into a target data set through a preset dictionary index table, and using the target data set as the target data text in discrete data format.
5. The file classifying method of the knowledge base management system according to claim 1, characterized in that the setting the classification label of the target data file according to the key information of the target data file comprises:
Setting an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file;
Deduplicating the key information of the target data file and adding it to the identifier set of the target data file, and using each classification keyword in the identifier set as a classification label of the target data file.
6. The file classifying method of the knowledge base management system according to any one of claims 1-5, characterized in that, if a search instruction of a user is received, the displaying the target data file on the retrieval page according to the target classification label carried in the search instruction of the user comprises:
Identifying the target classification label carried in the search instruction;
Obtaining the storage address to which the index of the target classification label points;
Reading the data files associated with the target classification label from the storage address;
Displaying the data files associated with the target classification label on the retrieval page.
7. The file classifying method of the knowledge base management system according to any one of claims 1-5, characterized in that the method further comprises:
Updating the classification labels of the data files in the knowledge base management system at a preset time interval.
8. A file classifying device of a knowledge base management system, characterized by comprising:
A first acquisition unit, configured to obtain a target data file in the knowledge base management system;
A second acquisition unit, configured to obtain the target data text in the target data file;
An analysis unit, configured to analyze the key information of the target data file through a preset knowledge base classification model;
A setting unit, configured to set the classification label of the target data file according to the key information of the target data file;
An association unit, configured to associate the classification label of the target data file with the target data file by establishing an index;
A display unit, configured to, if a search instruction of a user is received, display the target data file on a retrieval page according to the target classification label carried in the search instruction of the user.
9. A file classifying equipment of a knowledge base management system, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the file classifying method of the knowledge base management system according to any one of claims 1-7.
10. A computer-readable storage medium, characterized by comprising instructions which, when run on a computer, cause the computer to execute the file classifying method of the knowledge base management system according to any one of claims 1-7.
CN201910524705.6A 2019-06-18 2019-06-18 File classifying method, device, equipment and the storage medium of knowledge base management system Pending CN110399339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910524705.6A CN110399339A (en) 2019-06-18 2019-06-18 File classifying method, device, equipment and the storage medium of knowledge base management system

Publications (1)

Publication Number Publication Date
CN110399339A true CN110399339A (en) 2019-11-01

Family

ID=68323232

Country Status (1)

Country Link
CN (1) CN110399339A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046225A (en) * 2019-12-20 2020-04-21 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111125016A (en) * 2019-12-24 2020-05-08 普世(南京)智能科技有限公司 Magneto-optical hybrid file storage method and system based on label organization
CN111523289A (en) * 2020-04-24 2020-08-11 支付宝(杭州)信息技术有限公司 Text format generation method, device, equipment and readable medium
CN111881100A (en) * 2020-07-10 2020-11-03 棕榈设计有限公司 Knowledge base management framework system, management method, device and storage medium
CN112256669A (en) * 2020-09-27 2021-01-22 北京三快在线科技有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112445782A (en) * 2020-12-10 2021-03-05 深圳市中博科创信息技术有限公司 Enterprise knowledge base management method for customer service
CN112559670A (en) * 2020-12-22 2021-03-26 江苏鼎岳智慧信息技术有限公司 Data management system
CN112597100A (en) * 2020-09-17 2021-04-02 武汉大学 File management method and device based on object proxy tag
CN113360459A (en) * 2021-07-08 2021-09-07 国网能源研究院有限公司 Method, system and device for semi-automatically marking and storing files
CN113392250A (en) * 2021-06-30 2021-09-14 合肥高维数据技术有限公司 Vector diagram retrieval method and system based on deep learning
CN115422131A (en) * 2022-11-04 2022-12-02 北京国电通网络技术有限公司 Business audit knowledge base retrieval method, device, equipment and computer readable medium
CN115934880A (en) * 2022-10-31 2023-04-07 永道工程咨询有限公司 Construction of project cost document database and search method of project cost document
CN117454396A (en) * 2023-10-24 2024-01-26 深圳市马博士网络科技有限公司 Forced access control system and method for private cloud system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010250439A (en) * 2009-04-13 2010-11-04 Kanagawa Univ Retrieval system, data generation method, program and recording medium for recording program
CN103034667A (en) * 2011-10-08 2013-04-10 英业达股份有限公司 System and method for establishing database according to webpage index labels
CN104123366A (en) * 2014-07-23 2014-10-29 谢建平 Search method and server
CN107038480A (en) * 2017-05-12 2017-08-11 东华大学 A kind of text sentiment classification method based on convolutional neural networks
CN107944559A (en) * 2017-11-24 2018-04-20 国家计算机网络与信息安全管理中心 A kind of entity relationship automatic identifying method and system
CN108255972A (en) * 2017-12-27 2018-07-06 浪潮通用软件有限公司 A kind of text searching method and system
CN108829765A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 A kind of information query method, device, computer equipment and storage medium
CN108932294A (en) * 2018-05-31 2018-12-04 平安科技(深圳)有限公司 Resume data processing method, device, equipment and storage medium based on index
CN109558492A (en) * 2018-10-16 2019-04-02 中山大学 A kind of listed company's knowledge mapping construction method and device suitable for event attribution

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046225A (en) * 2019-12-20 2020-04-21 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111046225B (en) * 2019-12-20 2024-01-26 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111125016A (en) * 2019-12-24 2020-05-08 普世(南京)智能科技有限公司 Magneto-optical hybrid file storage method and system based on label organization
CN111523289A (en) * 2020-04-24 2020-08-11 支付宝(杭州)信息技术有限公司 Text format generation method, device, equipment and readable medium
CN111523289B (en) * 2020-04-24 2023-05-09 支付宝(杭州)信息技术有限公司 Text format generation method, device, equipment and readable medium
CN111881100A (en) * 2020-07-10 2020-11-03 棕榈设计有限公司 Knowledge base management framework system, management method, device and storage medium
CN112597100B (en) * 2020-09-17 2022-07-15 武汉大学 File management method and device based on object proxy label
CN112597100A (en) * 2020-09-17 2021-04-02 武汉大学 File management method and device based on object proxy tag
CN112256669A (en) * 2020-09-27 2021-01-22 北京三快在线科技有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112445782A (en) * 2020-12-10 2021-03-05 深圳市中博科创信息技术有限公司 Enterprise knowledge base management method for customer service
CN112559670A (en) * 2020-12-22 2021-03-26 江苏鼎岳智慧信息技术有限公司 Data management system
CN113392250A (en) * 2021-06-30 2021-09-14 合肥高维数据技术有限公司 Vector diagram retrieval method and system based on deep learning
CN113392250B (en) * 2021-06-30 2024-01-12 合肥高维数据技术有限公司 Vector diagram retrieval method and system based on deep learning
CN113360459A (en) * 2021-07-08 2021-09-07 国网能源研究院有限公司 Method, system and device for semi-automatically marking and storing files
CN115934880A (en) * 2022-10-31 2023-04-07 永道工程咨询有限公司 Construction of project cost document database and search method of project cost document
CN115422131A (en) * 2022-11-04 2022-12-02 北京国电通网络技术有限公司 Business audit knowledge base retrieval method, device, equipment and computer readable medium
CN117454396A (en) * 2023-10-24 2024-01-26 深圳市马博士网络科技有限公司 Forced access control system and method for private cloud system
CN117454396B (en) * 2023-10-24 2024-07-05 深圳市马博士网络科技有限公司 Forced access control system and method for private cloud system

Similar Documents

Publication Publication Date Title
CN110399339A (en) File classifying method, device, equipment and the storage medium of knowledge base management system
Hidayat et al. Sentiment analysis of twitter data related to Rinca Island development using Doc2Vec and SVM and logistic regression as classifier
US11663254B2 (en) System and engine for seeded clustering of news events
US9390086B2 (en) Classification system with methodology for efficient verification
Jiang et al. An improved K-nearest-neighbor algorithm for text categorization
Al Qadi et al. Arabic text classification of news articles using classical supervised classifiers
Bisandu et al. Clustering news articles using efficient similarity measure and N-grams
CN111125086B (en) Method, device, storage medium and processor for acquiring data resources
US10706030B2 (en) Utilizing artificial intelligence to integrate data from multiple diverse sources into a data structure
CA2956627A1 (en) System and engine for seeded clustering of news events
Bolaj et al. Text classification for Marathi documents using supervised learning methods
CN111783861A (en) Data classification method, model training device and electronic equipment
CN115098690B (en) Multi-data document classification method and system based on cluster analysis
CN111522950A (en) Rapid identification system for unstructured massive text sensitive data
CN116401338A (en) Design feature extraction and attention mechanism based on data asset intelligent retrieval input and output requirements and method thereof
CN114266255A (en) Corpus classification method, apparatus, device and storage medium based on clustering model
Ilic et al. Suffix tree clustering–data mining algorithm
Bhatt et al. An improved optimized web page classification using firefly algorithm with nb classifier (wpcnb)
Swarnalatha et al. Classwise clustering for classification of imbalanced text data
Desai et al. Analysis of Health Care Data Using Natural Language Processing
Singh et al. Intra News Category Classification using N-gram TF-IDF Features and Decision Tree Classifier
CN111259150A (en) Document representation method based on word frequency co-occurrence analysis
CN109947941A (en) A method and system based on elevator customer service text classification
Arivarasan et al. Data mining K-means document clustering using tfidf and word frequency count
CN111694948B (en) Text classification method and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191101)