CN106844714A - A knowledge base management system - Google Patents

Publication number
CN106844714A
Authority
CN
China
Prior art keywords
file
user
server
retrieval
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710068416.0A
Other languages
Chinese (zh)
Inventor
耿玉霞
白宏熙
陈慧萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University
Priority to CN201710068416.0A
Publication of CN106844714A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge base management system that uses a Browser/Server (B/S) structure as its overall architecture. It organizes large numbers of files into an orderly document classification system, so that the knowledge assets they contain are effectively consolidated and users can operate on documents conveniently. The system puts user experience first and provides functions such as a fast file activity feed, multi-platform file interconnection, intelligent full-text retrieval, personalized rule customization, and Windows-like operation. It is characterized in that knowledge is stored centrally and efficiently on the server side and managed through the Web, in order to assist knowledge management and allow knowledge to be used in an integrated way. At the architecture level, a file server and a Web server are provided to ensure that user files are stored efficiently. The system organizes and analyzes knowledge assets effectively, helping individuals and organizations make better use of them.

Description

A knowledge base management system
Technical field
The invention belongs to the field of management technology, and in particular relates to a knowledge base management system.
Background art
In daily life and work, everyone continually accumulates material in their own field, forming knowledge assets at different levels of process and resource. These assets are carried in many forms, such as text, fax, graphics, video, and audio. As they accumulate, problems arise: because the assets differ in source, purpose, and carrier, their timeline often becomes disordered, related knowledge cannot be retrieved effectively, and different kinds of knowledge cannot be used together. These problems become more prominent as knowledge assets keep accumulating, so effective means are urgently needed to record, manage, analyze, and compile statistics on them.
Summary of the invention
The present invention provides a knowledge base management system to solve the above problems in the prior art.
To achieve the above object, the technical solution adopted by the present invention is as follows.
A knowledge base management system uses a Browser/Server structure as the overall architecture of the system; the system architecture includes a file server and a Web server.
The system further includes basic file management, centralized file storage and upload, file linking, file processing, file association, file sharing, rule use, intelligent retrieval, and a recommendation system.
Basic file management lets the user create, copy, paste, cut, rename, delete, compress online, and decompress files. It supports Office documents, PDF, images, audio, video, and drawing files; online preview of pictures in various formats, playback of video and audio, and browsing of various Office documents; and file download.
Centralized file storage and upload covers batch upload of files and import of compressed packages by the user. Files are first sliced on the front end and then uploaded in chunks.
File linking includes sending files to a directory.
File processing: after the user uploads a file, a background process applies the Apache Lucene Tika toolkit to extract the text of the file and the keywords in it, or to extract its metadata, and generates a summary of the file. Text processing also includes detecting the language of the file and determining its actual type. Tika consists of a parser framework, a MIME detection mechanism, language detection, and a facade component that ties all the components together. Overall, the architecture of Tika is extensible: new parsers can easily be added and removed. Specifically, it includes the following.
Language detection mechanism: Tika supports language identification. A class called LanguageIdentifier in the package org.apache.tika.language, together with a repository of language profiles, contains the algorithm for detecting the language of a given text. Whenever a text is passed to Tika, it detects the language of that text. It accepts documents that carry no language annotation and adds the detected language to the document's metadata. Internally, language detection uses an N-gram algorithm.
MIME detection mechanism: Tika detects document types according to the MIME standard. Default MIME type detection uses org.apache.tika.mime.MimeTypes, which relies on the org.apache.tika.detect.Detector interface for most content type detection.
Parser interface: the parser interface in org.apache.tika.parser is Tika's primary interface for parsing documents. It extracts text and metadata from a document and presents them in a uniform way to external users who want to write parser plugins. A dedicated concrete parser class is used for each document type, so a large number of file formats is supported. These format-specific classes support the different file formats either by implementing the parsing logic directly or by using an external parser library.
Tika facade class: the Tika class acts as a facade that implements the basic use cases. It abstracts away the underlying complexity of the Tika library, such as the MIME detection mechanism, the parser interface, and the language detection mechanism, and offers the user a simple interface.
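To make the N-gram idea concrete, here is a minimal Python sketch of language identification by character trigram overlap. It is not Tika's implementation: the function names, the two tiny training samples, and the dot-product scoring are assumptions chosen for illustration, and a real detector would train its profiles on far larger corpora.

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Relative frequencies of the character n-grams in a text."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def detect_language(text, profiles, n=3):
    """Pick the language whose profile overlaps most with the text's n-grams."""
    doc = ngram_profile(text, n)

    def score(lang):
        ref = profiles[lang]
        return sum(freq * ref.get(gram, 0.0) for gram, freq in doc.items())

    return max(profiles, key=score)

# Hypothetical, deliberately tiny training samples (assumption for the demo).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat "
          "sat on the mat with the other animals",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "die katze sitzt auf der matte mit den anderen tieren",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(detect_language("the dog and the cat", PROFILES))
print(detect_language("der hund und die katze", PROFILES))
```

The detected language could then be written into the file's metadata record, as the description above says Tika does.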
File association: after a file is created or uploaded, a unique number is generated for it automatically or manually, and batch association of documents is supported.
Rule use includes a rule engine and rule setting.
Intelligent retrieval uses the Apache Lucene full-text search toolkit to build a Solr full-text search engine, combined with TF-IDF and named entity recognition to generate predicted search terms.
The recommendation system is based on Mahout's collaborative filtering and a neural network model, forming an effective retrieval system.
The input to the collaborative-filtering recommendation engine is the user's historical preference information, which Mahout models as Preference (an interface). To optimize performance, the two implementation classes provided by Mahout are used; they aggregate the preferences by user and by item respectively, which compresses the space taken by user IDs or item IDs.
Of the collaborative filtering recommendation strategies that Mahout provides, the three most classic are selected: User CF, Item CF, and Slope One.
1. User CF:
1) Build a DataModel from a file, for example using FileDataModel.
2) Compute the similarity between users from the user preference data, for example with PearsonCorrelationSimilarity (similarity based on the Pearson correlation coefficient).
ItemSimilarity works in a similar way.
Neighboring users are then found according to the similarity measure that has been set up. Two methods are available: a fixed number of neighbors (NearestNUserNeighborhood: take the N nearest neighbors of each user) and similarity-threshold neighbors (ThresholdUserNeighborhood: take as neighbors all users whose similarity to the given user falls within a threshold).
Finally, a GenericUserBasedRecommender is built from the DataModel, UserNeighborhood, and UserSimilarity to implement the User CF recommendation strategy.
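The User CF steps above (preference data, Pearson similarity, a fixed-size neighborhood, then recommendation) can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not Mahout code: the function names, the sample preference table, and the choice to drop neighbors with non-positive similarity are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0

def nearest_n(prefs, user, n):
    """Fixed-size neighborhood: the n users most similar to `user`."""
    others = [(pearson(prefs[user], prefs[o]), o) for o in prefs if o != user]
    return [(s, o) for s, o in sorted(others, reverse=True)[:n] if s > 0]

def recommend(prefs, user, n_neighbors=2, how_many=3):
    """Score unseen items by the similarity-weighted mean of neighbor ratings."""
    totals, weights = {}, {}
    for sim, other in nearest_n(prefs, user, n_neighbors):
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scored = [(totals[i] / weights[i], i) for i in totals]
    return [item for _, item in sorted(scored, reverse=True)[:how_many]]

# Hypothetical preference data: user -> {file: rating}.
PREFS = {
    "alice": {"spec.pdf": 5, "plan.docx": 3, "notes.txt": 4},
    "bob":   {"spec.pdf": 5, "plan.docx": 3, "notes.txt": 4, "budget.xlsx": 5},
    "carol": {"spec.pdf": 1, "plan.docx": 5, "notes.txt": 2, "photo.png": 5},
}

print(recommend(PREFS, "alice"))
```

Here bob rates the shared files exactly as alice does, while carol's ratings are negatively correlated with alice's, so only bob's unseen file is suggested. A threshold neighborhood would replace the `[:n]` cut with a filter such as `s >= threshold`.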
2. Item CF is implemented in a similar way to User CF, but is based on ItemSimilarity.
3. Slope One
With large data volumes, the computational cost of User CF and Item CF becomes very high, which makes recommendation inefficient. A more lightweight CF recommendation strategy, Slope One, is therefore used.
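For readers unfamiliar with Slope One, the following Python sketch shows the weighted variant: the predicted rating of a target item is the user's rating of each other item plus the average rating difference between the two items, weighted by how many users rated both. The function name and the sample ratings are assumptions for illustration; Mahout's own implementation is not shown in the source.

```python
def slope_one_predict(prefs, user, target):
    """Predict `user`'s rating of `target` with weighted Slope One."""
    num = den = 0.0
    for item, rating in prefs[user].items():
        # Rating differences (target - item) over users who rated both items.
        diffs = [p[target] - p[item] for p in prefs.values()
                 if target in p and item in p]
        if diffs:
            avg_diff = sum(diffs) / len(diffs)
            num += (rating + avg_diff) * len(diffs)
            den += len(diffs)
    return num / den if den else None

# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "u1": {"a": 5, "b": 3, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"b": 2, "c": 5},
}

print(slope_one_predict(RATINGS, "u2", "c"))
```

For u2 and item c, item a contributes (3 + (-3)) with weight 1 and item b contributes (4 + 1) with weight 2, giving 10/3. The average differences depend only on item pairs, so they can be precomputed once, which is what makes the method cheap at query time.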
The neural network model first generates a candidate file list with one neural network and then scores and ranks the files in the input list (for example, the top 5 candidates from the search results), so that the top-ranked files are recommended to the user.
Candidate file generation relies on collaborative filtering to produce a broad list of personalized recommendation candidates for the user. The ranking neural network builds on the list produced by the candidate generation network and draws finer distinctions within it, so as to reach a higher recommendation hit rate. An objective function is defined over a set of features describing the files and the user, and the ranking network scores each file according to this objective function. The group of files with the highest scores is recommended to the user.
The benefit of the two-stage neural network model is that it can handle files on the order of millions while ensuring that the files recommended to the user are of high quality. During development and training, the recommendation system uses several quantitative metrics, such as precision, coverage, and ranking loss.
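The two-stage structure (a broad candidate list, then a finer ranking) can be illustrated with the Python sketch below. It is only a structural sketch under loud assumptions: stage 1 uses simple co-access instead of a trained model, and stage 2 uses a hand-written weighted sum as a stand-in for the ranking neural network. The feature names, weights, and data are hypothetical.

```python
def generate_candidates(history, user, limit=100):
    """Stage 1: broad candidates = files opened by users who share a file with `user`."""
    seen = history[user]
    candidates = set()
    for other, files in history.items():
        if other != user and seen & files:
            candidates |= files - seen
    return sorted(candidates)[:limit]

def rank(candidates, features, weights, top_k=5):
    """Stage 2: score each candidate (stand-in for the ranking network) and keep the best."""
    def score(name):
        return sum(weights[f] * value for f, value in features[name].items())
    return sorted(candidates, key=score, reverse=True)[:top_k]

# Hypothetical access history and per-file features.
HISTORY = {
    "alice": {"spec.pdf", "plan.docx"},
    "bob": {"spec.pdf", "budget.xlsx", "memo.txt"},
    "carol": {"photo.png"},
}
FEATURES = {
    "budget.xlsx": {"recency": 0.9, "topic_match": 0.8},
    "memo.txt": {"recency": 0.2, "topic_match": 0.1},
}
WEIGHTS = {"recency": 0.4, "topic_match": 0.6}

CANDIDATES = generate_candidates(HISTORY, "alice")
print(rank(CANDIDATES, FEATURES, WEIGHTS, top_k=1))
```

Keeping the first stage cheap and the second stage expressive is what lets such a design scale: only the short candidate list ever reaches the more expensive scorer.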
The Office documents include the Word, Excel, PowerPoint, WPS, and Visio formats. File processing supports text in the pdf, doc, docx, ppt, excel, txt, html, xml, zip, and tar formats.
File association: the user manually sets a master file and associates its corresponding slave files. Through the unique file number, a file is associated with related files, drawings, pictures, and attachments in other formats, and can be opened with one click on a link. Files are also associated automatically: thumbnails are generated for videos and pictures are compressed, so that picture, audio, and video files gain thumbnails. Summaries are added to files.
File sharing includes sharing by e-mail, in which the link to the file is shared directly and the step of relaying the file through a mail server is saved; sharing to social platforms; sharing files through in-site messages; and a file search and sharing component deployed on the WeChat platform for mobile use. A file server is set up: local physical space is designated as a virtual file server directory for storing files, FTP transfers the files, and Tomcat maintains this virtual file server directory. When the Web server is a Tomcat server, a virtual file server is maintained in Tomcat. Each user is assigned a separate file root directory, all uploaded files are moved into the library, and each user directory stores a file access link.
The system also includes a desktop-style interface: frequently used files are presented to the user as desktop shortcuts, so the user does not have to open each directory to search every time and can view a document by clicking its shortcut directly.
A unique number is generated automatically when a file is uploaded. Batch association of files is supported, and association can also be done manually: the user sets a master file and associates the corresponding slave files. Through the unique number generated at upload, a document is associated with related documents, drawings, pictures, and attachments in other formats, and can be opened with one click on a link. File sharing offers several channels, including sharing by e-mail from within the system, in which the link to the file is shared directly and the step of relaying through a mail server is saved, and the file search and sharing component deployed on the WeChat platform for mobile use.
The rule engine covers encrypting and hiding files, automatic backup, transfer, archiving of infrequently used files, and temporary storage of files. The rule setting function specifies an action, a condition, and an operation for a file; when the action is triggered and the configured condition is met, the system automatically executes the rule's operation.
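The action, condition, and operation triple described above maps naturally onto a small event-driven rule table. The Python sketch below is one possible shape for it; the class names, the "daily_scan" action, and the 180-day archiving threshold are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    action: str                        # triggering action, e.g. "upload"
    condition: Callable[[dict], bool]  # predicate on the file record
    operation: Callable[[dict], None]  # what to do when the rule fires

class RuleEngine:
    def __init__(self):
        self.rules = []

    def add(self, rule):
        self.rules.append(rule)

    def trigger(self, action, file):
        """Run every rule registered for `action` whose condition holds; return how many fired."""
        fired = 0
        for rule in self.rules:
            if rule.action == action and rule.condition(file):
                rule.operation(file)
                fired += 1
        return fired

def archive(file):
    """Example operation: mark an infrequently used file as archived."""
    file["archived"] = True

engine = RuleEngine()
engine.add(Rule("daily_scan", lambda f: f["days_since_access"] > 180, archive))

old = {"name": "report_2015.doc", "days_since_access": 400}
recent = {"name": "draft.doc", "days_since_access": 3}
engine.trigger("daily_scan", old)
engine.trigger("daily_scan", recent)
print(old, recent)
```

Only the old file is archived. Other operations named in the description, such as encryption, backup, or transfer, would be further functions passed as `operation`.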
The Solr full-text search engine built for intelligent retrieval ranks the search results by weight and highlights the search terms. It also provides cross-language retrieval, spell checking, regular-expression search, and real-time recording of search results and search terms, which helps optimize retrieval. During retrieval, queries are auto-completed based on the search history and popular web searches. The user can quickly and precisely find the required file among massive amounts of data, and the results are shown in the display area.
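As an illustration of auto-completion driven by search history and popular searches, the Python sketch below ranks the user's own past queries by frequency and then appends matching popular queries. It does not use Solr; the function name, the ordering policy, and the sample data are assumptions made for the example.

```python
from collections import Counter

def suggest(prefix, history, hot_searches, limit=5):
    """Complete `prefix` from the user's own history first, then from popular searches."""
    prefix = prefix.lower()
    own = Counter(q.lower() for q in history)
    mine = [q for q, _ in own.most_common() if q.startswith(prefix)]
    popular = [q.lower() for q in hot_searches
               if q.lower().startswith(prefix) and q.lower() not in mine]
    return (mine + popular)[:limit]

# Hypothetical search history and popular-search list.
SEARCH_HISTORY = ["project plan", "project budget", "project plan", "patent draft"]
HOT_SEARCHES = ["project charter", "payroll", "project plan"]

print(suggest("pro", SEARCH_HISTORY, HOT_SEARCHES))
```

Placing personal history ahead of popular queries is a design choice of this sketch; a production system might blend the two sources with weights instead.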
The recommendation system works in the display area of the user's search results: it makes parallel recommendations based on the top five search results, shown as the "You may also be looking for" section, and generates predicted search terms for each user as they search.
The system further includes generating a user calendar and a file activity section.
The user calendar records the number of files the user uploads in a file calendar. The file activity section lets the user quickly find the files they need by "context lookup".
Compared with the prior art, the present invention has the following advantages:
The present invention extracts the value in the user's knowledge. With the B/S structure as the overall architecture, the management system is placed on the Web side and files are stored on the server side. While keeping operation convenient, it organizes large numbers of files into an orderly document classification system, so that knowledge assets are effectively consolidated and users can operate on documents conveniently. The system puts user experience first and provides functions such as a fast file activity feed, multi-platform file interconnection, intelligent full-text retrieval, personalized rule customization, and Windows-like operation.
Brief description of the drawings
Fig. 1 is the system architecture diagram of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to an embodiment.
As shown in Fig. 1, a knowledge base management system uses a Browser/Server structure as the overall architecture of the system; the system architecture includes a file server and a Web server.
The knowledge base management system includes the KBMS (Web server). The KBMS (Web server) communicates with the file server (Tomcat), and the KBMS (Web server), the database (MySQL), and the index library (Solr) form a data loop in sequence. Through the KBMS (Web server), the system performs full-text retrieval, intelligent retrieval, term prediction, recommendation, batch upload, text extraction (word segmentation), document association, file sending and receiving, e-mail, external links, the desktop, and the rule engine. Text extraction (word segmentation) covers document summaries, keywords, and named entity recognition.
1. Basic file management
The user can create, copy, paste, cut, rename, and delete files, performing basic management operations on them. Using the PageOffice local component, the system implements online compression and decompression of files and supports all types of files, such as Office documents, PDF, images, audio, video, and drawings. It supports online preview of pictures in various formats, playback of video and audio, and browsing of various Office files, including the Word, Excel, PowerPoint, WPS, and Visio formats. Based on the user's usage behavior, files can be managed further in combination with the system's rule engine.
The system also provides file download: users can freely download the files in their own knowledge base as well as files shared by other users.
2. Centralized file storage and upload
For large batches of files, the system provides batch upload and compressed-package import. Files are first sliced on the front end and then uploaded in chunks, which speeds up the upload and effectively reduces file loss and errors caused by network interruptions during upload.
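The slicing and chunked upload described above can be sketched as follows. This Python example only models the data flow in memory (slicing, tracking which chunks arrived, and reassembly); the chunk record layout, the SHA-256 checksum per chunk, and the function names are assumptions, and no network transfer is performed.

```python
import hashlib

def slice_file(data: bytes, chunk_size: int):
    """Split a payload into numbered chunks, each with its own checksum."""
    return [
        {"index": i // chunk_size,
         "sha256": hashlib.sha256(data[i:i + chunk_size]).hexdigest(),
         "payload": data[i:i + chunk_size]}
        for i in range(0, len(data), chunk_size)
    ]

def missing_chunks(received, total):
    """Indexes still to be sent or re-sent, for example after a dropped connection."""
    return [i for i in range(total) if i not in received]

def reassemble(received, total):
    """Server side: verify each chunk and join the chunks in index order."""
    if missing_chunks(received, total):
        raise ValueError("upload incomplete")
    parts = []
    for i in range(total):
        chunk = received[i]
        if hashlib.sha256(chunk["payload"]).hexdigest() != chunk["sha256"]:
            raise ValueError(f"chunk {i} is corrupted")
        parts.append(chunk["payload"])
    return b"".join(parts)

DATA = b"knowledge base " * 100          # 1500 bytes of sample content
CHUNKS = slice_file(DATA, 256)           # six chunks of at most 256 bytes

# Simulate a network drop that loses chunk 3, then resend only that chunk.
received = {c["index"]: c for c in CHUNKS if c["index"] != 3}
print(missing_chunks(received, len(CHUNKS)))
received[3] = CHUNKS[3]
print(reassemble(received, len(CHUNKS)) == DATA)
```

Because only the missing chunk is resent after an interruption, a dropped connection costs one chunk rather than the whole file, which is the robustness benefit the paragraph above describes.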
3. Document links
The document link function supports sending documents to the frequently used documents, so that the user does not have to open each directory to search every time. The system provides a desktop-style (Windows-like) interface in which frequently used files are presented to the user as desktop shortcuts; the user only needs to click a shortcut to view the corresponding document.
The system also supports sending documents to a directory so that other users can consult them.
4. Text processing
After the user uploads a file, the system's background process applies the Apache Lucene Tika toolkit to extract text from text files in common formats such as pdf, doc, docx, ppt, excel, txt, html, xml, zip, and tar, and extracts the keywords in them. For files such as video and audio, it extracts metadata. File summaries are generated by means such as named entity recognition (the Stanford NER package) and Chinese word segmentation (the IKAnalyzer Chinese word segmentation package). Processing also includes detecting the file's language and determining its actual type.
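The source does not say how keywords are scored; since it names TF-IDF elsewhere for term prediction, the Python sketch below assumes a TF-IDF ranking for keyword extraction as well. The regular-expression tokenizer is a stand-in that handles only Latin letters (Chinese text would first need a word segmenter such as the IKAnalyzer package named above), and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer; a stand-in for a real word segmenter."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(docs, doc_id, k=3):
    """Rank the words of one document by TF-IDF against the whole collection."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for words in tokens.values() for w in set(words))
    tf = Counter(tokens[doc_id])
    n_docs = len(docs)
    scores = {w: (c / len(tokens[doc_id])) * math.log(n_docs / df[w])
              for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Hypothetical extracted texts: document ID -> text.
DOCS = {
    "a": "the pump schedule and the pump maintenance log",
    "b": "the meeting schedule for the project",
    "c": "the budget for the project",
}

print(top_keywords(DOCS, "a", k=1))
```

The word "the" appears in every document, so its inverse document frequency is zero and it never surfaces as a keyword, while "pump" is both frequent in document "a" and unique to it.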
5th, document associations
After a document is created or uploaded, the system generates a unique number for the file, either automatically or manually, and supports batch association of documents.
For manual association, the user can set a master file and associate the corresponding slave files with it, which makes the files easier to use. Through the file's unique number, a document is associated with related documents, drawings, pictures, and attachments in other formats, which can be viewed quickly with one click on a link.
In addition to manual association, the system can associate files automatically according to their format. It can generate thumbnails for video files and compress picture files, thereby adding thumbnails to picture, audio, and video files. The system can also add summaries to files of all kinds.
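The master/slave association by unique number can be sketched with a small in-memory registry. The patent does not say how numbers are generated or stored, so the use of `uuid4` and the class and method names here are assumptions for illustration only.

```python
import uuid
from collections import defaultdict


class AssociationRegistry:
    """Maps unique file numbers to names and master files to slave files."""

    def __init__(self):
        self.names: dict[str, str] = {}
        self.slaves: dict[str, list[str]] = defaultdict(list)

    def register(self, name: str) -> str:
        """Assign a unique number to a newly created or uploaded file."""
        file_id = uuid.uuid4().hex  # assumption: any collision-free ID would do
        self.names[file_id] = name
        return file_id

    def associate(self, master_id: str, slave_ids: list[str]) -> None:
        """Batch-associate slave files with a master file, skipping duplicates."""
        for file_id in [master_id, *slave_ids]:
            if file_id not in self.names:
                raise KeyError(file_id)
        for slave_id in slave_ids:
            if slave_id != master_id and slave_id not in self.slaves[master_id]:
                self.slaves[master_id].append(slave_id)

    def related(self, master_id: str) -> list[str]:
        """Names of the files one click away from the master file."""
        return [self.names[s] for s in self.slaves.get(master_id, [])]
```

Because associations are keyed by number rather than by path, renaming or moving a file would not break the link in this sketch.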
6th, sharing files
The system provides four file-sharing channels. Files can be shared by email, in which case the file's link is shared directly and the relay through a mail server is avoided; they can be shared to social platforms, covering QQ, WeChat, and Weibo; and they can be shared by in-site message. On mobile devices, a file search and sharing component deployed on the WeChat platform addresses the situation where the user's computer is not at hand but a file is needed. Together, these sharing functions help users share files as widely as possible.
7th, rule use
The system's rule engine can perform operations on files such as encrypted hiding, automatic backup, transfer, archiving of infrequently used files, and temporary storage settings. Beyond the engine itself, the system gives users a set of rule-setting functions for files: the user specifies an action (or a combination of actions), a condition, and an operation for a file, and when a triggered action satisfies the configured condition, the system automatically executes the rule's operation, which makes the system more user-friendly.
At present, the main rules provided by the system are:
(1) Generate thumbnails according to Windows task scheduling;
(2) Send specified files to a specified mailbox on a schedule according to user-defined rules, which may include the time, recipients, CC recipients, files to send, and message text;
(3) Move files that have not been used for a certain length of time into the knowledge base recycle bin, according to the user's needs;
(4) Set user-defined classification tags or system-default archiving tags for uploaded files;
(5) Encrypted hiding of files: files that need protection are placed in an encrypted file space, and a separate password must be entered to access them;
(6) File association: users can customize the opening method and viewing mode of frequently used files;
(7) Users can choose whether to enable automatic backup, which guards against the account's files being maliciously deleted and allows files to be recovered effectively;
(8) Temporary file storage: a temporary storage period is set for files so that they can be cleaned up periodically to save space, which effectively avoids the build-up of one-off files;
(9) Set the default access rules applied when a file link is generated, such as access permissions and validity period, to strengthen the copyright protection and security of the user's knowledge;
(10) Users can choose whether to enable the file recommendation function, which helps them obtain recommended knowledge faster and more accurately during retrieval;
(11) Users can customize how documents are presented, for example by attributes, keywords, or file content extracted by the system, to make files easier to find.
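The action / condition / operation structure of a rule can be sketched as below, using rule (3) as the example. The class names, the `daily_scan` action name, and the 90-day threshold are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileRecord:
    name: str
    days_since_last_use: int
    location: str = "library"


@dataclass
class Rule:
    action: str                               # the event that triggers the rule
    condition: Callable[[FileRecord], bool]   # must hold for the rule to fire
    operation: Callable[[FileRecord], None]   # what the system then does


@dataclass
class RuleEngine:
    rules: list[Rule] = field(default_factory=list)

    def trigger(self, action: str, record: FileRecord) -> int:
        """Run every rule bound to this action whose condition is satisfied."""
        executed = 0
        for rule in self.rules:
            if rule.action == action and rule.condition(record):
                rule.operation(record)
                executed += 1
        return executed


def move_to_recycle_bin(record: FileRecord) -> None:
    record.location = "recycle_bin"


# Rule (3): files unused for longer than a threshold go to the recycle bin.
# The 90-day threshold is a placeholder for a user-configured value.
engine = RuleEngine([
    Rule("daily_scan", lambda r: r.days_since_last_use > 90, move_to_recycle_bin),
])
```

Keeping the condition and operation as plain callables means a user-defined rule is just another `Rule` entry, with no change to the engine itself.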
8th, intelligent retrieval
The system performs full-text retrieval based on Solr: with the database as its source, it builds an index library, with a query speed of up to one million records per millisecond. It calculates weights using TF-IDF, ranks retrieval results intelligently according to weight, and highlights the search terms. The system also provides users with cross-language information retrieval, spell checking, regular-expression retrieval (for professional users), and real-time recording of retrieval results and entries, which together optimize assisted retrieval. During retrieval, auto-completion based on history and trending web searches, combined with the recommendation system, gives users a better experience. Users can quickly and precisely retrieve the files they need from massive amounts of data through intelligent search, and by receiving, judging, extracting, analyzing, and summarizing the information obtained after retrieval, they can form their own knowledge system.
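The TF-IDF weighting, ranking, and highlighting steps can be sketched in a few lines. This is a plain textbook TF-IDF, not the scoring formula that Solr or Lucene actually uses, and the regex tokenizer is a stand-in for a proper Chinese segmenter such as IKAnalyzer; the function names and the `<em>` highlight markup are assumptions for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a real system would use a language-aware analyzer."""
    return re.findall(r"\w+", text.lower())


def tf_idf_rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df) + 1.0  # +1 keeps common terms non-zero
            score += tf * idf
        scores[name] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


def highlight(text: str, query: str) -> str:
    """Wrap each whole-word occurrence of a query term in <em> tags."""
    terms = [re.escape(term) for term in tokenize(query)]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", text)
```

A term that appears often in one document but rarely across the collection contributes most to that document's score, which is what pushes it to the top of the ranking.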
9th, recommendation system
In the display area for the user's retrieval results, the system uses collaborative filtering (Mahout) and a neural network model (RapidMiner), combined with the top five of the user's retrieval results, to make parallel recommendations, that is, the "you may also be looking for" section, generating predicted retrieval entries for different users based on the search terms used during retrieval.
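The collaborative-filtering half of this recommendation step can be illustrated with a small user-based sketch. It is not Mahout's API and it omits the neural network model entirely; the Jaccard similarity measure, the function names, and the default of three suggestions are assumptions chosen to keep the example short.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two users' file sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, history: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Suggest files that similar users accessed but the target user has not."""
    seen = history[target]
    scores: dict[str, float] = {}
    for user, items in history.items():
        if user == target:
            continue
        similarity = jaccard(seen, items)
        if similarity == 0:
            continue
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

In the system described, the input would presumably be derived from retrieval and access records, and the output would populate the "you may also be looking for" section.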
10th, user calendar and file activity section
To help users find and use files more easily, the system records the number of files a user uploads in a file calendar; in the file activity section, users can quickly find the files they need by means of "context lookup".
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make a number of improvements and modifications without departing from the principles of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention.

Claims (10)

1. A knowledge base management system, characterized in that: a Browser/Server structure is used as the overall architecture of the system, and the system architecture includes a file server and a Web server;
The system further includes basic file management, centralized file storage and upload, file links, file processing, file association, file sharing, rule use, intelligent retrieval, and a recommendation system;
The basic file management includes the user creating, copying, pasting, cutting, renaming, and deleting files and compressing and decompressing them online; support for Office documents, PDF, image, audio, video, and drawing files; support for online preview of pictures in various formats, playback of video and audio, and browsing of Office documents of all kinds; and file download;
The centralized file storage and upload includes batch upload of files by the user and compressed-package import; files are first sliced at the front end and then uploaded in chunks;
The file link includes sending files to a directory;
The file processing uses Apache Lucene Tika to extract text from a file after the user uploads it, extracting the keywords in the text or extracting metadata, and generates a summary of the file; text processing also includes detecting the file's language and determining its actual type;
The file association: after a file is created or uploaded, a unique number is generated for the file automatically or manually, and batch association of documents is supported;
The rule use includes a rule engine and rule setting;
The intelligent retrieval uses the Apache Lucene full-text search engine toolkit to build a Solr full-text search engine, and combines TF-IDF with named entity recognition to generate predicted entries;
The recommendation system is based on Mahout collaborative filtering and a neural network model, forming an effective retrieval system.
2. The knowledge base management system according to claim 1, characterized in that: the Office documents include the Word, Excel, PowerPoint, WPS, and Visio formats; the file processing supports text in the pdf, doc, docx, ppt, excel, txt, html, xml, zip, and tar formats.
3. The knowledge base management system according to claim 1, characterized in that: in the file association, the user manually sets a master file and associates its corresponding slave files; through the file's unique number, a file is associated with related files, drawings, pictures, and attachments in other formats, which can be viewed quickly with one click on a link; files are also associated automatically, with thumbnails generated for videos and pictures compressed, thereby adding thumbnails to picture, audio, and video files; and summaries are added to files.
4. The knowledge base management system according to claim 1, characterized in that: the file sharing includes sharing by email, in which the file's link is shared directly and the relay through a mail server is avoided; sharing to social platforms; sharing files by in-site message; and, on mobile devices, sharing through the file search deployed on the WeChat platform; a file server is set up in which local physical space is designated as a virtual file server directory for storing files, and FTP is used to transfer files; Tomcat maintains this virtual file server directory; where the Web server is Tomcat Server, a virtual file server is maintained in Tomcat, each user is assigned a separate file root directory, all uploaded files are moved into the file library, and file access links are stored under each user's directory.
5. The knowledge base management system according to claim 1, characterized in that: it further includes a desktop-style interface in which frequently used files are presented to the user as desktop shortcuts; in use, the user does not need to open each directory and search every time, and views the corresponding document by clicking the shortcut directly.
6. The knowledge base management system according to claim 1, characterized in that: a unique number is generated automatically when a file is uploaded, batch association of files is supported, and manual association can be performed, in which the user sets a master file and associates the corresponding slave files; through the unique number generated automatically at upload, a document is associated with related documents, drawings, pictures, and attachments in other formats, which can be viewed quickly with one click on a link; file sharing has multiple channels, including sending email from within the system, in which the file's link is shared directly and the relay through a mail server is avoided; and, on mobile devices, a file search and sharing component deployed on the WeChat platform.
7. The knowledge base management system according to claim 1, characterized in that: the rule engine includes encrypted hiding of files, automatic backup, transfer, archiving of infrequently used files, and temporary storage of files; the rule-setting function specifies an action, a condition, and an operation for a file, and when a triggered action satisfies the configured condition, the system automatically executes the rule's operation.
8. The knowledge base management system according to claim 1, characterized in that: the Solr full-text search engine built in the intelligent retrieval ranks retrieval results intelligently according to weight and highlights the search terms; it provides users with cross-language information retrieval, spell checking, regular-expression retrieval, and real-time recording of retrieval results and entries, optimizing assisted retrieval; during retrieval, it performs auto-completion based on history and trending web searches; users can quickly and precisely retrieve the files they need from massive amounts of data through intelligent retrieval, and the retrieval results are shown in a display area.
9. The knowledge base management system according to claim 8, characterized in that: the recommendation system, in the display area for the user's retrieval results, combines the top five of the user's retrieval results to make parallel recommendations, that is, the "you may also be looking for" section, generating predicted retrieval entries for different users based on the search terms used during retrieval.
10. The knowledge base management system according to claim 1, characterized in that: it further includes generating a user calendar and a file activity section;
The user calendar records the number of files uploaded by the user in a file calendar; the file activity section lets the user quickly find the files they need by means of "context lookup".
CN201710068416.0A 2017-02-08 2017-02-08 A kind of knowledge base management system Pending CN106844714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710068416.0A CN106844714A (en) 2017-02-08 2017-02-08 A kind of knowledge base management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710068416.0A CN106844714A (en) 2017-02-08 2017-02-08 A kind of knowledge base management system

Publications (1)

Publication Number Publication Date
CN106844714A true CN106844714A (en) 2017-06-13

Family

ID=59121506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710068416.0A Pending CN106844714A (en) 2017-02-08 2017-02-08 A kind of knowledge base management system

Country Status (1)

Country Link
CN (1) CN106844714A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967658A (en) * 2017-11-09 2018-04-27 安徽律正科技信息服务有限公司 A kind of scientific and technological achievement share system
CN108363775A (en) * 2018-02-09 2018-08-03 上海宝尊电子商务有限公司 Preview environment method on the high scalability line of rule-based engine
CN108776672A (en) * 2018-05-21 2018-11-09 山东浪潮商用系统有限公司 Knowledge Management System based on SOLR
CN109639812A (en) * 2018-12-24 2019-04-16 山东浪潮云信息技术有限公司 A kind of participle package management method, apparatus and system
CN109992645A (en) * 2019-03-29 2019-07-09 国家计算机网络与信息安全管理中心 A kind of data supervision system and method based on text data
CN111524581A (en) * 2020-04-17 2020-08-11 东莞理工学院 Method for realizing medical film and data interaction through cloud platform
CN111881100A (en) * 2020-07-10 2020-11-03 棕榈设计有限公司 Knowledge base management framework system, management method, device and storage medium
CN113111198A (en) * 2021-06-15 2021-07-13 平安科技(深圳)有限公司 Demonstration manuscript recommendation method based on collaborative filtering algorithm and related equipment
CN115982429A (en) * 2023-03-21 2023-04-18 中交第四航务工程勘察设计院有限公司 Knowledge management method and system based on flow control
CN116226761A (en) * 2022-12-27 2023-06-06 北京关键科技股份有限公司 Training data classification cataloging method and system based on deep neural network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1126384A2 (en) * 1992-11-06 2001-08-22 Ncr International Inc. Data analysis apparatus and methods
CN1845104A (en) * 2006-05-22 2006-10-11 赵开灏 System and method for intelligent retrieval and processing of information
CN101493820A (en) * 2008-01-25 2009-07-29 北京华深慧正系统工程技术有限公司 Medicine Regulatory industry knowledge base platform and construct method thereof
CN102360358A (en) * 2011-09-28 2012-02-22 百度在线网络技术(北京)有限公司 Keyword recommendation method and system
US20130232143A1 (en) * 2012-03-02 2013-09-05 Xerox Corporation Efficient knowledge base system
CN103377208A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Method for updating files in cloud service file management system
CN103905516A (en) * 2012-12-28 2014-07-02 联想(北京)有限公司 Data sharing method and corresponding server and terminal
CN104516861A (en) * 2014-11-26 2015-04-15 无锡永中软件有限公司 Multimedia interactive document processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《百度百科》: "KBMS", 《百度百科》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967658A (en) * 2017-11-09 2018-04-27 安徽律正科技信息服务有限公司 A kind of scientific and technological achievement share system
CN108363775A (en) * 2018-02-09 2018-08-03 上海宝尊电子商务有限公司 Preview environment method on the high scalability line of rule-based engine
CN108776672A (en) * 2018-05-21 2018-11-09 山东浪潮商用系统有限公司 Knowledge Management System based on SOLR
CN109639812A (en) * 2018-12-24 2019-04-16 山东浪潮云信息技术有限公司 A kind of participle package management method, apparatus and system
CN109992645B (en) * 2019-03-29 2021-05-14 国家计算机网络与信息安全管理中心 Data management system and method based on text data
CN109992645A (en) * 2019-03-29 2019-07-09 国家计算机网络与信息安全管理中心 A kind of data supervision system and method based on text data
CN111524581A (en) * 2020-04-17 2020-08-11 东莞理工学院 Method for realizing medical film and data interaction through cloud platform
CN111881100A (en) * 2020-07-10 2020-11-03 棕榈设计有限公司 Knowledge base management framework system, management method, device and storage medium
CN113111198A (en) * 2021-06-15 2021-07-13 平安科技(深圳)有限公司 Demonstration manuscript recommendation method based on collaborative filtering algorithm and related equipment
CN113111198B (en) * 2021-06-15 2021-08-31 平安科技(深圳)有限公司 Demonstration manuscript recommendation method based on collaborative filtering algorithm and related equipment
CN116226761A (en) * 2022-12-27 2023-06-06 北京关键科技股份有限公司 Training data classification cataloging method and system based on deep neural network
CN115982429A (en) * 2023-03-21 2023-04-18 中交第四航务工程勘察设计院有限公司 Knowledge management method and system based on flow control
CN115982429B (en) * 2023-03-21 2023-08-01 中交第四航务工程勘察设计院有限公司 Knowledge management method and system based on flow control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170613)