CN110399339A - File classification method, device, equipment and storage medium for a knowledge base management system - Google Patents
- Publication number: CN110399339A
- Application number: CN201910524705.6A
- Authority: CN (China)
- Prior art keywords: target data, data file, file, target, management system
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of artificial intelligence and discloses a file classification method, device, equipment and storage medium for a knowledge base management system. Using natural language processing, labels are assigned to both new and old data in the knowledge base management system and the data is classified accordingly. This makes it easier for document administrators to manage data, improving management efficiency, and makes it easier for users to retrieve data, improving retrieval efficiency. The method comprises: obtaining a target data file in the knowledge base management system; obtaining the target data text in the target data file; analyzing the key information of the target data text with a preset knowledge base classification model; setting the classification labels of the target data file according to the key information of the target data file; associating the classification labels of the target data file with the target data file by establishing an index; and, if a retrieval instruction is received from a user, displaying the target data file on the retrieval page according to the target classification label carried in the retrieval instruction.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a file classification method, device, equipment and storage medium for a knowledge base management system.
Background
In the field of knowledge base management, a knowledge base management system stores a massive amount of data shared within enterprises and public institutions. Managing knowledge base data well makes it easier for users to retrieve data and helps the administrators of enterprises and public institutions handle data conveniently.
In existing knowledge base management systems, data is mostly classified manually. Classifying and organizing new data together with old data is very difficult, so management efficiency is low.
Summary of the invention
The present invention provides a file classification method for a knowledge base management system. Using natural language processing, labels are assigned to both new and old data in the knowledge base management system and the data is classified accordingly. This makes it easier for document administrators to manage data, improving management efficiency, and makes it easier for users to retrieve data, improving retrieval efficiency.
A first aspect of the embodiments of the present invention provides a file classification method for a knowledge base management system, comprising: obtaining a target data file in the knowledge base management system; obtaining the target data text in the target data file; analyzing the key information of the target data text with a preset knowledge base classification model; setting the classification labels of the target data file according to the key information of the target data file; associating the classification labels of the target data file with the target data file by establishing an index; and, if a retrieval instruction is received from a user, displaying the target data file on the retrieval page according to the target classification label carried in the retrieval instruction.
Optionally, in a first implementation of the first aspect of the embodiments of the present invention, obtaining the target data text in the target data file comprises: determining the file type of the target data file, the file types including document files, audio/video files and picture files; if the target data file is a document file, obtaining the text contained in the target data file and taking that text as the target data text; if the target data file is an audio/video file, converting the audio/video file into the target data text with a preset speech recognition tool; and if the target data file is a picture file, obtaining the target data text from the picture file with a preset character recognition tool.
Optionally, in a second implementation of the first aspect of the embodiments of the present invention, analyzing the key information of the target data file with the preset knowledge base classification model comprises: preprocessing the target data text of the target data file, converting the target data text composed of natural language into target data text in a discrete data format; inputting the target data text in the discrete data format into the preset knowledge base classification model; and obtaining the output phrases of the preset knowledge base classification model and taking them as the key information of the target data file.
Optionally, in a third implementation of the first aspect of the embodiments of the present invention, preprocessing the target data text of the target data file and converting the target data text composed of natural language into target data text in a discrete data format comprises: performing word segmentation on the target data text to obtain a preprocessing vocabulary set; deleting from the preprocessing vocabulary set the words whose number of occurrences is higher than a first threshold and the words whose number of occurrences is lower than a second threshold, to obtain a target vocabulary set; and converting the target vocabulary set into a target data set by means of a preset dictionary index table, and taking the target data set as the target data text in the discrete data format.
Optionally, in a fourth implementation of the first aspect of the embodiments of the present invention, setting the classification labels of the target data file according to the key information of the target data file comprises: setting an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; deduplicating the key information of the target data file and adding it to the identifier set of the target data file; and taking each classification keyword in the identifier set as a classification label of the target data file.
Optionally, in a fifth implementation of the first aspect of the embodiments of the present invention, displaying the target data file on the retrieval page according to the target classification label carried in the retrieval instruction, if a retrieval instruction is received from a user, comprises: identifying the target classification label carried in the retrieval instruction; obtaining the storage address that the index of the target classification label points to; reading the data files associated with the target classification label from the storage address; and displaying the data files associated with the target classification label on the retrieval page.
Optionally, in a sixth implementation of the first aspect of the embodiments of the present invention, the method further comprises: updating the classification labels of the data files in the knowledge base management system at a preset time interval.
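The periodic update in the sixth implementation can be illustrated with a minimal Python sketch. The function name `refresh_due_labels`, the `last_updated` mapping and the `reclassify` callback are hypothetical names introduced here for illustration; the source does not specify how the interval is tracked or how reclassification is triggered.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def refresh_due_labels(
    last_updated: Dict[str, datetime],
    now: datetime,
    interval: timedelta,
    reclassify: Callable[[str], None],
) -> List[str]:
    """Reclassify every file whose labels are at least `interval` old.

    `reclassify` is a hypothetical callback standing in for the full
    text-extraction and classification pipeline. Returns the refreshed ids.
    """
    refreshed = []
    for file_id, stamp in last_updated.items():
        if now - stamp >= interval:
            reclassify(file_id)
            last_updated[file_id] = now  # value update only, keys unchanged
            refreshed.append(file_id)
    return refreshed
```

A scheduler (for example a cron job) would call this function once per tick; the choice of scheduler is left open here.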
A second aspect of the embodiments of the present invention provides a file classification device for a knowledge base management system, comprising: a first obtaining unit, configured to obtain a target data file in the knowledge base management system; a second obtaining unit, configured to obtain the target data text in the target data file; an analysis unit, configured to analyze the key information of the target data file with a preset knowledge base classification model; a setting unit, configured to set the classification labels of the target data file according to the key information of the target data file; an association unit, configured to associate the classification labels of the target data file with the target data file by establishing an index; and a display unit, configured to, if a retrieval instruction is received from a user, display the target data file on the retrieval page according to the target classification label carried in the retrieval instruction.
Optionally, in a first implementation of the second aspect of the embodiments of the present invention, the second obtaining unit is specifically configured to: determine the file type of the target data file, the file types including document files, audio/video files and picture files; if the target data file is a document file, obtain the text contained in the target data file and take that text as the target data text; if the target data file is an audio/video file, convert the audio/video file into the target data text with a preset speech recognition tool; and if the target data file is a picture file, obtain the target data text from the picture file with a preset character recognition tool.
Optionally, in a second implementation of the second aspect of the embodiments of the present invention, the analysis unit specifically comprises: a conversion module, configured to preprocess the target data text of the target data file and convert the target data text composed of natural language into target data text in a discrete data format; an input module, configured to input the target data text in the discrete data format into the preset knowledge base classification model; and an obtaining module, configured to obtain the output phrases of the preset knowledge base classification model and take them as the key information of the target data file.
Optionally, in a third implementation of the second aspect of the embodiments of the present invention, the conversion module is specifically configured to: perform word segmentation on the target data text to obtain a preprocessing vocabulary set; delete from the preprocessing vocabulary set the words whose number of occurrences is higher than a first threshold and the words whose number of occurrences is lower than a second threshold, to obtain a target vocabulary set; and convert the target vocabulary set into a target data set by means of a preset dictionary index table, and take the target data set as the target data text in the discrete data format.
Optionally, in a fourth implementation of the second aspect of the embodiments of the present invention, the setting unit is specifically configured to: set an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; deduplicate the key information of the target data file and add it to the identifier set of the target data file; and take each classification keyword in the identifier set as a classification label of the target data file.
Optionally, in a fifth implementation of the second aspect of the embodiments of the present invention, the display unit is specifically configured to: identify the target classification label carried in the retrieval instruction; obtain the storage address that the index of the target classification label points to; read the data files associated with the target classification label from the storage address; and display the data files associated with the target classification label on the retrieval page.
Optionally, in a sixth implementation of the second aspect of the embodiments of the present invention, the file classification device of the knowledge base management system further comprises: an updating unit, configured to update the classification labels of the data files in the knowledge base management system at a preset time interval.
A third aspect of the embodiments of the present invention provides file classification equipment for a knowledge base management system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the file classification method for a knowledge base management system described in any of the above embodiments.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the steps of the file classification method for a knowledge base management system described in any of the above embodiments.
In the technical solution provided by the embodiments of the present invention, a target data file in the knowledge base management system is obtained; the target data text in the target data file is obtained; the key information of the target data text is analyzed with a preset knowledge base classification model; the classification labels of the target data file are set according to the key information of the target data file; the classification labels of the target data file are associated with the target data file by establishing an index; and, if a retrieval instruction is received from a user, the target data file is displayed on the retrieval page according to the target classification label carried in the retrieval instruction. Using natural language processing, the embodiments of the present invention assign labels to both new and old data in the knowledge base management system and classify the data accordingly. This makes it easier for document administrators to manage data, improving management efficiency, and makes it easier for users to retrieve data, improving retrieval efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of one embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention;
Fig. 3 is a schematic diagram of one embodiment of the file classification device for a knowledge base management system in the embodiments of the present invention;
Fig. 4 is a schematic diagram of another embodiment of the file classification device for a knowledge base management system in the embodiments of the present invention;
Fig. 5 is a schematic diagram of one embodiment of the file classification equipment for a knowledge base management system in the embodiments of the present invention.
Detailed description
The embodiments of the present invention provide a file classification method, device, equipment and storage medium for a knowledge base management system. Using natural language processing, labels are assigned to both new and old data in the knowledge base management system and the data is classified accordingly, which makes it easier for document administrators to manage data, improves management efficiency, and makes it easier for users to retrieve data, improving retrieval efficiency.
To help those skilled in the art better understand the solution of the present invention, the embodiments of the present invention are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Referring to Fig. 1, a flowchart of a file classification method for a knowledge base management system provided by an embodiment of the present invention specifically includes:
101. Obtain the target data file in the knowledge base management system.
The server obtains the target data file in the knowledge base management system. A target data file in the knowledge base management system is a file that can be converted into text. The knowledge base management system is a centralized storage platform for massive numbers of documents, built by enterprises and public institutions to achieve unified document sharing.
The data files in the knowledge base management system include all kinds of Office files, multimedia files and electronic documents. Supported data file formats include Word, Excel, PowerPoint, WPS, Visio, PDF, AVI, WAV, MID, MPEG, MP3, DWF and JPG.
102. Obtain the target data text in the target data file.
The server obtains the target data text in the target data file. Specifically, the server determines the file type of the target data file, the file types including document files, audio/video files and picture files. If the target data file is a document file, the server obtains the text contained in the target data file and takes that text as the target data text. If the target data file is an audio/video file, the server converts the audio/video file into the target data text with a preset speech recognition tool. If the target data file is a picture file, the server obtains the target data text from the picture file with a preset character recognition tool.
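The type check and dispatch in step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the extension groups, the function names `file_kind` and `extract_text`, and the `extractors` mapping are assumptions introduced here, and the speech recognition and character recognition tools are represented only as caller-supplied callables.

```python
from pathlib import Path
from typing import Callable, Dict

# Extension groups are illustrative assumptions loosely based on the formats
# listed in the text; a real deployment would define its own mapping.
DOCUMENT_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".wps", ".pdf"}
AUDIO_VIDEO_EXTS = {".avi", ".wav", ".mid", ".mpeg", ".mp3"}
PICTURE_EXTS = {".jpg", ".jpeg"}


def file_kind(path: str) -> str:
    """Return 'document', 'audio_video' or 'picture' based on the extension."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in AUDIO_VIDEO_EXTS:
        return "audio_video"
    if ext in PICTURE_EXTS:
        return "picture"
    raise ValueError(f"unsupported file type: {ext!r}")


def extract_text(path: str, extractors: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch to the extractor registered for the file's kind.

    `extractors` maps each kind to a hypothetical callable, for example a
    document parser, a speech recognition wrapper and an OCR wrapper.
    """
    return extractors[file_kind(path)](path)
```

Passing the extractors in as callables keeps the sketch independent of any particular speech recognition or OCR library.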
It should be noted that the input of the preset knowledge base classification model is in text form, so the server needs to extract the text information of a file before using the preset knowledge base classification model to extract information from it. For example, when the target data file is a song, the server needs to extract the lyrics of the song in advance.
103. Analyze the key information of the target data text with the preset knowledge base classification model.
The server analyzes the key information of the target data text with the preset knowledge base classification model. The key information consists of multiple keywords from a preset dictionary. The preset dictionary is a database of classification keywords set up according to the classification requirements for data files in the knowledge base management system. Specifically, analyzing the key information of the target data file with the preset knowledge base classification model includes: the server preprocesses the target data text of the target data file and represents the target data text composed of natural language in a discrete data format; the server inputs the target data text in the discrete data format into the preset knowledge base classification model; and the server obtains the output phrases of the preset knowledge base classification model and takes them as the key information of the target data file.
The preset knowledge base classification model is a text classification convolutional neural network (text convolutional neural networks, TextCNN) model. It is trained on a corpus made up of all the files in the knowledge base management system, which is continuously updated, until its accuracy reaches 90% or higher. The output of the preset knowledge base classification model is multiple keywords from the preset dictionary. TextCNN is an algorithm that classifies text using convolutional neural networks.
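The 90% accuracy criterion can be checked with a small helper such as the one below. The source does not say how accuracy is measured for a model that outputs several keywords per file, so this sketch assumes exact-match accuracy (a prediction counts as correct only if the predicted keyword set equals the expected set); `exact_match_accuracy` and `meets_target` are hypothetical names.

```python
from typing import List, Set


def exact_match_accuracy(predicted: List[Set[str]], expected: List[Set[str]]) -> float:
    """Fraction of files whose predicted keyword set equals the expected set."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


def meets_target(accuracy: float, target: float = 0.9) -> bool:
    """True once the model reaches the accuracy target stated in the text."""
    return accuracy >= target
```

Other definitions, such as per-keyword precision and recall, would be equally compatible with the text.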
104. Set the classification labels of the target data file according to the key information of the target data file.
The server sets the classification labels of the target data file according to the key information of the target data file. Specifically, the server sets an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file. The server deduplicates the key information of the target data file, adds it to the identifier set of the target data file, and takes each classification keyword in the identifier set as a classification label of the target data file.
It should be noted that, according to different classification requirements, the server sets a different identifier set for the target data file under each classification basis and stores all the identifier sets of the target data file in a preset classification index table. For example, in the preset classification index table, when the classification basis is the author, the identifier set of the target data file is: {author A, author B, author C}.
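The deduplication and per-basis storage described in step 104 can be sketched as follows. The nested dictionary used as the classification index table and the names `build_identifier_set` and `add_to_index_table` are assumptions made for illustration; the source does not specify the table layout.

```python
from typing import Dict, Iterable, List


def build_identifier_set(key_information: Iterable[str]) -> List[str]:
    """Deduplicate keywords while preserving their first-seen order."""
    return list(dict.fromkeys(key_information))


def add_to_index_table(
    table: Dict[str, Dict[str, List[str]]],
    file_id: str,
    basis: str,
    keywords: Iterable[str],
) -> None:
    """Store one identifier set per (file, classification basis) pair.

    `table` is a hypothetical in-memory stand-in for the preset
    classification index table.
    """
    table.setdefault(file_id, {})[basis] = build_identifier_set(keywords)
```

An ordered list is used instead of a Python `set` so that the labels are displayed in a stable order.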
105. Associate the classification labels of the target data file with the target data file by establishing an index.
The server associates the classification labels of the target data file with the target data file by establishing an index. The index provides a data pointer associated with a classification label, and the pointer points to the address where the target data file is stored.
In the knowledge base management system, the server can associate classification labels with the target data file by establishing a hash index, a B+ tree index or a function index. The server can also use other types of index, such as a bitmap index or a descending index, to associate classification labels with the target data file. No limitation is imposed here.
It should be noted that the preset dictionary contains all the classification labels, and each classification label is associated through the index with all the data files that carry that classification label.
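Of the index types listed, the hash index is the simplest to illustrate. The sketch below uses a Python dictionary as an in-memory hash index from classification label to storage addresses; the class name `LabelIndex` and the example paths are hypothetical, and a production system would more likely rely on the index facilities of its database.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List


class LabelIndex:
    """In-memory hash index: classification label -> storage addresses."""

    def __init__(self) -> None:
        self._addresses: DefaultDict[str, List[str]] = defaultdict(list)

    def associate(self, labels: Iterable[str], address: str) -> None:
        """Point every label of a file at the address where the file is stored."""
        for label in labels:
            if address not in self._addresses[label]:
                self._addresses[label].append(address)

    def lookup(self, label: str) -> List[str]:
        """Return the addresses of all files carrying the label (empty if none)."""
        return list(self._addresses.get(label, []))
```

For example, after `index.associate(["finance", "2019"], "/kb/a.pdf")`, a call to `index.lookup("finance")` returns the address of that file together with any other file labeled "finance".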
106. If a retrieval instruction is received from a user, display the target data file on the retrieval page according to the target classification label carried in the retrieval instruction.
If a retrieval instruction is received from a user, the server displays the target data file on the retrieval page according to the target classification label carried in the retrieval instruction. Specifically, the server identifies the target classification label carried in the retrieval instruction; the server obtains the storage address that the index of the target classification label points to; the server reads the data files associated with the target classification label from the storage address; and the server displays the data files associated with the target classification label on the retrieval page.
It should be noted that when the server displays the data files associated with the target classification label on the retrieval page, it supports sorting by attributes such as the title, size, date and uploader of the data files.
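The lookup and sorting in step 106 can be sketched as below. The `DataFile` record, the dictionary-based `index` and `storage` arguments and the function name `retrieve` are assumptions introduced for illustration; the source only states that results can be sorted by title, size, date and uploader.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

SORTABLE = ("title", "size", "uploaded", "uploader")


@dataclass
class DataFile:
    title: str
    size: int        # size in bytes
    uploaded: date   # upload date
    uploader: str


def retrieve(
    label: str,
    index: Dict[str, List[str]],
    storage: Dict[str, DataFile],
    sort_by: str = "title",
) -> List[DataFile]:
    """Resolve a label to storage addresses, read the files, sort by one attribute."""
    if sort_by not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_by!r}")
    files = [storage[address] for address in index.get(label, [])]
    return sorted(files, key=lambda f: getattr(f, sort_by))
```

A label with no associated files yields an empty result list rather than an error, which matches how a retrieval page would show "no results".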
Using natural language processing, the embodiments of the present invention assign labels to both new and old data in the knowledge base management system and classify the data accordingly. This makes it easier for document administrators to manage data, improving management efficiency, and makes it easier for users to retrieve data, improving retrieval efficiency.
Referring to Fig. 2, another embodiment of the file classification method for a knowledge base management system in the embodiments of the present invention includes:
201. Obtain the target data file in the knowledge base management system.
The server obtains the target data file in the knowledge base management system. A target data file in the knowledge base management system is a file that can be converted into text. The knowledge base management system is a centralized storage platform for massive numbers of documents, built by enterprises and public institutions to achieve unified document sharing.
The data files in the knowledge base management system include all kinds of Office files, multimedia files and electronic documents. Supported data file formats include Word, Excel, PowerPoint, WPS, Visio, PDF, AVI, WAV, MID, MPEG, MP3, DWF, JPG and others.
202. Obtain the target data text in the target data file.
The server obtains the target data text in the target data file. Specifically, the server determines the file type of the target data file, the file types including document files, audio/video files and picture files. If the target data file is a document file, the server obtains the text contained in the target data file and takes that text as the target data text. If the target data file is an audio/video file, the server converts the audio/video file into the target data text with a preset speech recognition tool. If the target data file is a picture file, the server obtains the target data text from the picture file with a preset character recognition tool.
It should be noted that the input of the preset knowledge base classification model is in text form, so the server needs to extract the text information of a file before using the preset knowledge base classification model to extract information from it. For example, when the target data file is a song, the server needs to extract the lyrics of the song in advance.
203. Preprocess the target data text of the target data file, converting the target data text composed of natural language into target data text in a discrete data format.
The server preprocesses the target data text of the target data file and converts the target data text composed of natural language into target data text in a discrete data format. Specifically, the server performs word segmentation on the target data text to obtain a preprocessing vocabulary set. For example, segmenting "LeTV will resume trading tomorrow: shareholders holding 5% or more have no plan to reduce their holdings" yields the preprocessing vocabulary set {LeTV, tomorrow, resume trading, 5%, or more, shareholders, no, reduce holdings, plan}. The server then deletes from the preprocessing vocabulary set the words whose number of occurrences is higher than a first threshold and the words whose number of occurrences is lower than a second threshold, obtaining a target vocabulary set. Finally, the server converts the target vocabulary set into a target data set by means of a preset dictionary index table and takes the target data set as the target data text in the discrete data format. For example, the target vocabulary set {LeTV, resume trading} is converted into the target data set {0, 1}.
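The frequency filtering and index conversion in step 203 can be sketched as follows. The function names, the inclusive treatment of the two thresholds, and the choice to skip words missing from the dictionary index table are assumptions made for illustration; the source does not state the threshold values or how unknown words are handled.

```python
from collections import Counter
from typing import Dict, List


def filter_by_frequency(tokens: List[str], first_threshold: int, second_threshold: int) -> List[str]:
    """Keep each distinct token whose count lies within [second_threshold, first_threshold].

    Tokens occurring more often than first_threshold or less often than
    second_threshold are dropped; first-seen order is preserved.
    """
    counts = Counter(tokens)
    kept: List[str] = []
    for token in tokens:
        if second_threshold <= counts[token] <= first_threshold and token not in kept:
            kept.append(token)
    return kept


def to_indices(words: List[str], dictionary_index: Dict[str, int]) -> List[int]:
    """Map each word to its integer index; words missing from the table are skipped."""
    return [dictionary_index[w] for w in words if w in dictionary_index]
```

With a dictionary index table of `{"LeTV": 0, "resume trading": 1}`, the target vocabulary set from the example above maps to `[0, 1]`.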
The word segmentation algorithm that the server applies to the target data text can be a maximum matching algorithm, a maximum probability segmentation algorithm or a minimum segmentation algorithm, or another word segmentation algorithm. No limitation is imposed here.
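As an illustration of the first option, forward maximum matching can be sketched in a few lines. The function name `forward_max_match`, the `max_len` parameter and the fallback to single characters for unknown text are assumptions of this sketch, not details taken from the source.

```python
from typing import List, Set


def forward_max_match(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation.

    At each position, take the longest vocabulary word of at most max_len
    characters; fall back to a single character when nothing matches.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With the vocabulary `{"know", "knowledge", "base"}` and `max_len=9`, the input `"knowledgebase"` is split into `["knowledge", "base"]`, because the longer match "knowledge" is preferred over "know".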
204. Input the target data text in the discrete data format into the preset knowledge base classification model.
The server inputs the target data text in the discrete data format into the preset knowledge base classification model. The preset knowledge base classification model is used to extract the features of the target data text.
It can be understood that the preset knowledge base classification model first combines the target data text, whose data format has been adjusted, into a target matrix set. It then performs convolution with preset convolution kernels to extract the features of the target matrix set, removes part of the features through a pooling layer, and concatenates the remaining features to obtain the key information in the discrete data format.
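The convolution, pooling and concatenation steps can be illustrated with a dependency-free sketch. It assumes that each token has already been mapped to an embedding row, that a ReLU follows each convolution, and that pooling is max-over-time, which is the usual TextCNN choice; none of these details is stated in the source, and the final classification layer is omitted.

```python
from typing import List

Matrix = List[List[float]]


def conv_over_time(embedded: Matrix, kernel: Matrix) -> List[float]:
    """Slide a kernel spanning len(kernel) tokens down the token axis.

    Each window produces the sum of element-wise products, passed through ReLU.
    """
    height = len(kernel)
    if len(embedded) < height:
        raise ValueError("text is shorter than the kernel height")
    outputs = []
    for start in range(len(embedded) - height + 1):
        total = 0.0
        for row, k_row in zip(embedded[start:start + height], kernel):
            total += sum(x * k for x, k in zip(row, k_row))
        outputs.append(max(0.0, total))  # ReLU
    return outputs


def extract_features(embedded: Matrix, kernels: List[Matrix]) -> List[float]:
    """Max-over-time pool each feature map and concatenate the pooled values."""
    return [max(conv_over_time(embedded, kernel)) for kernel in kernels]
```

In a trained model the kernel weights are learned; here they are plain lists so that the arithmetic can be followed by hand.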
It should be noted that the preset knowledge base classification model is a TextCNN model trained with all the continuously updated files in the knowledge base management system as the corpus, with an accuracy of 90 percent or higher, and that the output of the preset knowledge base classification model is multiple keywords from the preset dictionary. TextCNN is an algorithm that classifies text using convolutional neural networks.
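The convolution, pooling, and concatenation steps of a TextCNN-style feature extractor can be sketched in plain Python as below. This is only an illustration under assumptions: the embeddings and kernel values are made up, max-over-time pooling is assumed as the pooling operation, and no trained model from the source is reproduced.

```python
def conv1d_valid(matrix, kernel):
    """Slide a (k x dim) kernel over a (length x dim) matrix without padding."""
    k = len(kernel)
    feature_map = []
    for start in range(len(matrix) - k + 1):
        total = 0.0
        for offset in range(k):
            for value, weight in zip(matrix[start + offset], kernel[offset]):
                total += value * weight
        feature_map.append(total)
    return feature_map


def extract_features(matrix, kernels):
    """Max-pool each kernel's feature map and concatenate the pooled values.

    The matrix must have at least as many rows as the widest kernel.
    """
    return [max(conv1d_valid(matrix, kernel)) for kernel in kernels]


# Hypothetical 2-dimensional embeddings for word indices 0, 1 and 2.
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
matrix = [embeddings[i] for i in [0, 1, 2]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # kernel spanning two words
    [[1.0, 1.0]],              # kernel spanning one word
]
print(extract_features(matrix, kernels))
```

In a real TextCNN the pooled vector would feed a classification layer; that part is omitted here because the source gives no details of it.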
205. Obtain the output phrases of the preset knowledge base classification model and use them as the key information of the target data file.
The server obtains the output phrases of the preset knowledge base classification model and uses them as the key information of the target data file.
It should be noted that the output phrases of the preset knowledge base classification model are in discrete data format; the server needs to convert them into output phrases in natural-language form by means of the preset index dictionary.
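Converting the discrete output back to natural language amounts to an inverse lookup in the index dictionary. A minimal sketch, assuming the same hypothetical word-to-index table used for encoding and assuming that unknown indices are simply dropped:

```python
def decode_output(indices, index_table):
    """Map integer indices back to words using the inverse of the index table."""
    inverse = {index: word for word, index in index_table.items()}
    # Assumption: indices that do not appear in the table are ignored.
    return [inverse[i] for i in indices if i in inverse]


index_table = {"LeTV": 0, "resume trading": 1}  # hypothetical index dictionary
print(decode_output([1, 0], index_table))
```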
206. Set the classification labels of the target data file according to the key information of the target data file.
The server sets the classification labels of the target data file according to the key information of the target data file. Specifically, the server sets up an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; the server deduplicates the key information of the target data file and adds it to the identifier set of the target data file, and uses each classification keyword in the identifier set as a classification label of the target data file.
It should be noted that, according to different classification needs, the server sets a different identifier set for the target data file under each classification basis and stores all identifier sets of the target data file in a preset classification index table. For example, in the preset classification index table, when the classification basis is author, the identifier set of the target data file is: {author A, author B, author C}.
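The identifier set and the classification index table can be sketched as nested dictionaries. The layout below (file identifier, then classification basis, then identifier set) and all names are assumptions made for illustration; the source does not specify the table's structure.

```python
def set_labels(classification_index, file_id, basis, key_information):
    """Deduplicate the key information and store it as the identifier set for one basis."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    identifier_set = list(dict.fromkeys(key_information))
    classification_index.setdefault(file_id, {})[basis] = identifier_set
    return identifier_set


classification_index = {}  # hypothetical in-memory classification index table
labels = set_labels(
    classification_index,
    "file-001",
    "author",
    ["author A", "author B", "author A", "author C"],
)
print(labels)
print(classification_index)
```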
207. Associate the classification labels of the target data file with the target data file by establishing an index.
The server associates the classification labels of the target data file with the target data file by establishing an index. The index provides a data pointer associated with a classification label, and the pointer points to the address where the target data file is stored.
In the knowledge base management system, the server may associate classification labels with the target data file by establishing a hash index, a B+ tree index, or a function index. The server may also use other types of index, such as a bitmap index or a descending index; no specific limitation is imposed here.
It should be noted that the preset dictionary contains all classification labels, and each classification label is associated through the index with all data files that carry that classification label.
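A hash index from classification label to storage addresses can be sketched with a dictionary, which Python implements as a hash table. The class name, method names, and address strings are hypothetical; a production system would more likely rely on the index facilities of its database.

```python
from collections import defaultdict


class LabelIndex:
    """In-memory hash index mapping each classification label to file addresses."""

    def __init__(self):
        self._index = defaultdict(set)

    def associate(self, label, address):
        """Record that the file stored at `address` carries `label`."""
        self._index[label].add(address)

    def lookup(self, label):
        """Return the addresses of all files carrying `label`, sorted for stable output."""
        return sorted(self._index.get(label, ()))


index = LabelIndex()
index.associate("finance", "/kb/files/0001")  # hypothetical storage addresses
index.associate("finance", "/kb/files/0002")
index.associate("author A", "/kb/files/0001")
print(index.lookup("finance"))
```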
208. If a search instruction from a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction.
If a search instruction from a user is received, the server displays the target data file on the retrieval page according to the target classification label carried in the search instruction. Specifically, the server identifies the target classification label carried in the search instruction; the server obtains the storage address to which the index of the target classification label points; the server reads the data files associated with the target classification label from the storage address; and the server displays those data files on the retrieval page.
It should be noted that, when displaying the data files associated with the target classification label on the retrieval page, the server supports sorting by attributes such as the title, size, date, and uploader of the data files.
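The lookup-then-sort flow of the retrieval step can be sketched as follows. The `DataFile` fields mirror the attributes named in the text (title, size, date, uploader); the data class, the dictionary-based index and storage, and the function signature are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataFile:
    title: str
    size: int
    uploaded: date
    uploader: str


def search(label, label_index, storage, sort_by="title", descending=False):
    """Resolve a label to storage addresses, read the files, and sort them by one attribute."""
    addresses = label_index.get(label, [])
    files = [storage[address] for address in addresses if address in storage]
    return sorted(files, key=lambda f: getattr(f, sort_by), reverse=descending)


# Hypothetical storage and label index.
storage = {
    "addr-1": DataFile("Zeta report", 30, date(2019, 6, 1), "li"),
    "addr-2": DataFile("Alpha report", 10, date(2019, 5, 1), "wang"),
}
label_index = {"finance": ["addr-1", "addr-2"]}
for data_file in search("finance", label_index, storage, sort_by="size", descending=True):
    print(data_file.title, data_file.size)
```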
209. Update the classification labels of the data files in the knowledge base management system at a preset time interval.
The server updates the classification labels of the data files in the knowledge base management system at a preset time interval. Specifically, the server obtains, at the preset time interval, the classification labels updated by an administrator; the server updates the preset dictionary with the classification labels updated by the administrator; the preset knowledge base classification model is trained based on the updated preset dictionary to obtain a target knowledge base classification model; and the identifier sets of the data files in the knowledge base management system are updated by means of the target knowledge base classification model.
It should be noted that updating the preset dictionary with the classification labels updated by the administrator includes: the server modifying existing classification labels in the preset dictionary according to the classification labels updated by the administrator; the server deleting existing classification labels from the preset dictionary according to the classification labels updated by the administrator; and the server adding new classification labels to the preset dictionary according to the classification labels updated by the administrator.
The preset time interval may be adjusted according to the actual situation, for example 24 hours or 48 hours, or another duration; no specific limitation is imposed here.
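The three dictionary operations (modify, delete, add) and the interval check can be sketched as below. The tuple-based update format and both function names are hypothetical, and retraining the model on the updated dictionary is left out because the source gives no training details.

```python
from datetime import datetime, timedelta


def apply_label_updates(dictionary, updates):
    """Return a new label set after applying administrator updates.

    Each update is ("modify", old, new), ("delete", label) or ("add", label).
    """
    updated = set(dictionary)
    for operation, *args in updates:
        if operation == "modify":
            old, new = args
            if old in updated:
                updated.discard(old)
                updated.add(new)
        elif operation == "delete":
            updated.discard(args[0])
        elif operation == "add":
            updated.add(args[0])
        else:
            raise ValueError(f"unknown operation: {operation}")
    return updated


def update_due(last_update, now, interval_hours=24):
    """Check whether the preset time interval has elapsed since the last update."""
    return now - last_update >= timedelta(hours=interval_hours)


labels = {"finance", "legal", "draft"}
updates = [("modify", "legal", "compliance"), ("delete", "draft"), ("add", "research")]
print(sorted(apply_label_updates(labels, updates)))
print(update_due(datetime(2019, 6, 18, 0, 0), datetime(2019, 6, 19, 0, 0)))
```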
The embodiment of the present invention can use natural language processing technology to set labels for, and classify, both new and old materials in the knowledge base management system. This helps document administrators manage materials conveniently and improves management efficiency; it also makes data retrieval easier for users and improves retrieval efficiency.
The file classification method of the knowledge base management system in the embodiment of the present invention has been described above; the file classification device of the knowledge base management system in the embodiment of the present invention is described below. Referring to Fig. 3, one embodiment of the file classification device of the knowledge base management system in the embodiment of the present invention includes:
a first acquisition unit 301, configured to obtain a target data file in the knowledge base management system;
a second acquisition unit 302, configured to obtain the target data text in the target data file;
an analysis unit 303, configured to analyze the key information of the target data file by means of a preset knowledge base classification model;
a setting unit 304, configured to set the classification labels of the target data file according to the key information of the target data file;
an association unit 305, configured to associate the classification labels of the target data file with the target data file by establishing an index;
a display unit 306, configured to, if a search instruction from a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction.
The embodiment of the present invention can use natural language processing technology to set labels for, and classify, both new and old materials in the knowledge base management system. This helps document administrators manage materials conveniently and improves management efficiency; it also makes data retrieval easier for users and improves retrieval efficiency.
Referring to Fig. 4, another embodiment of the file classification device of the knowledge base management system in the embodiment of the present invention includes:
a first acquisition unit 301, configured to obtain a target data file in the knowledge base management system;
a second acquisition unit 302, configured to obtain the target data text in the target data file;
an analysis unit 303, configured to analyze the key information of the target data file by means of a preset knowledge base classification model;
a setting unit 304, configured to set the classification labels of the target data file according to the key information of the target data file;
an association unit 305, configured to associate the classification labels of the target data file with the target data file by establishing an index;
a display unit 306, configured to, if a search instruction from a user is received, display the target data file on the retrieval page according to the target classification label carried in the search instruction.
Optionally, the second acquisition unit 302 is specifically configured to:
determine the file type of the target data file, the file type including document file, audio/video file, and picture file; if the target data file is a document file, obtain the text contained in the target data file and use that text as the target data text; if the target data file is an audio/video file, convert the audio/video file into the target data text by means of a preset speech recognition tool; if the target data file is a picture file, obtain the target data text from the picture file by means of a preset character recognition tool.
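The type dispatch performed by the second acquisition unit can be sketched as below. The extension lists are assumptions, and the three extractor callables are hypothetical stand-ins for a document reader, a speech recognition tool, and a character recognition tool; the source names no specific tools.

```python
from pathlib import Path

# Assumed extension groups; the source does not list concrete file formats.
DOCUMENT_TYPES = {".txt", ".md"}
AUDIO_VIDEO_TYPES = {".mp3", ".wav", ".mp4"}
PICTURE_TYPES = {".png", ".jpg", ".jpeg"}


def get_target_text(path, read_document, speech_to_text, image_to_text):
    """Pick the extractor that matches the file type and return the extracted text."""
    suffix = Path(path).suffix.lower()
    if suffix in DOCUMENT_TYPES:
        return read_document(path)
    if suffix in AUDIO_VIDEO_TYPES:
        return speech_to_text(path)
    if suffix in PICTURE_TYPES:
        return image_to_text(path)
    raise ValueError(f"unsupported file type: {suffix!r}")


# Stub extractors stand in for the real tools in this example.
text = get_target_text(
    "meeting.MP4",
    read_document=lambda p: "document text",
    speech_to_text=lambda p: "transcribed speech",
    image_to_text=lambda p: "recognized characters",
)
print(text)
```

Passing the extractors in as arguments keeps the dispatch logic independent of whichever recognition tools a deployment actually uses.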
Optionally, the analysis unit 303 specifically includes:
a conversion module 3031, configured to preprocess the target data text of the target data file, converting the target data text composed of natural language into target data text in discrete data format;
an input module 3032, configured to input the target data text in discrete data format into the preset knowledge base classification model;
an obtaining module 3033, configured to obtain the output phrases of the preset knowledge base classification model and use them as the key information of the target data file.
Optionally, the conversion module 3031 is specifically configured to:
perform word segmentation on the target data text to obtain a preprocessing vocabulary set; delete from the preprocessing vocabulary set the words whose number of occurrences is higher than a first threshold and the words whose number of occurrences is lower than a second threshold, obtaining a target vocabulary set; convert the target vocabulary set into a target data set by means of a preset dictionary index table, and use the target data set as the target data text in discrete data format.
Optionally, the setting unit 304 is specifically configured to:
set up an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file; deduplicate the key information of the target data file and add it to the identifier set of the target data file, and use each classification keyword in the identifier set as a classification label of the target data file.
Optionally, the display unit 306 is specifically configured to:
identify the target classification label carried in the search instruction; obtain the storage address to which the index of the target classification label points; read the data files associated with the target classification label from the storage address; display the data files associated with the target classification label on the retrieval page.
Optionally, the file classification device of the knowledge base management system further includes:
an updating unit 307, configured to update the classification labels of the data files in the knowledge base management system at a preset time interval.
The embodiment of the present invention can use natural language processing technology to set labels for, and classify, both new and old materials in the knowledge base management system. This helps document administrators manage materials conveniently and improves management efficiency; it also makes data retrieval easier for users and improves retrieval efficiency.
Figs. 3 and 4 above describe the file classification device of the knowledge base management system in the embodiment of the present invention in detail from the perspective of modular functional entities; the file classification equipment of the knowledge base management system in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of the file classification equipment of a knowledge base management system provided by an embodiment of the present invention. The file classification equipment 500 of the knowledge base management system may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient storage or persistent storage. The program stored in the storage medium 508 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the file classification equipment of the knowledge base management system. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the file classification equipment 500 of the knowledge base management system, the series of instruction operations in the storage medium 508.
The file classification equipment 500 of the knowledge base management system may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the file classification equipment of the knowledge base management system, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The processor 501 can perform the functions of the first acquisition unit 301, the second acquisition unit 302, the analysis unit 303, the setting unit 304, the association unit 305, and the display unit 306 in the above embodiments.
Each component of the file classification equipment of the knowledge base management system is described in detail below with reference to Fig. 5:
The processor 501 is the control center of the file classification equipment of the knowledge base management system and can perform processing according to the configured file classification method of the knowledge base management system. The processor 501 uses various interfaces and lines to connect the various parts of the entire file classification equipment of the knowledge base management system and, by running or executing the software programs and/or modules stored in the memory 509 and calling the data stored in the memory 509, performs the various functions of the equipment and processes data, thereby implementing file classification for the knowledge base management system. The storage medium 508 and the memory 509 are both carriers for storing data. In this embodiment, the storage medium 508 may refer to internal storage with a small capacity but high speed, and the memory 509 may be external storage with a large capacity but low speed.
The memory 509 can be used to store software programs and modules; the processor 501 runs the software programs and modules stored in the memory 509 to perform the various functional applications and data processing of the file classification equipment 500 of the knowledge base management system. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required by at least one function (such as obtaining the target data text in the target data file), and so on; the data storage area may store data created through use of the file classification equipment of the knowledge base management system (such as classification labels), and so on. In addition, the memory 509 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. The program of the file classification method of the knowledge base management system provided in the embodiment of the present invention and the received data stream are stored in the memory, and the processor 501 calls them from the memory 509 when they are needed.
When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or twisted pair) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state disk (SSD)), and so on.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A file classification method for a knowledge base management system, characterized by comprising:
obtaining a target data file in the knowledge base management system;
obtaining the target data text in the target data file;
analyzing the key information of the target data text by means of a preset knowledge base classification model;
setting the classification labels of the target data file according to the key information of the target data file;
associating the classification labels of the target data file with the target data file by establishing an index;
if a search instruction from a user is received, displaying the target data file on a retrieval page according to the target classification label carried in the search instruction of the user.
2. The file classification method for a knowledge base management system according to claim 1, characterized in that obtaining the target data text in the target data file comprises:
determining the file type of the target data file, the file type including document file, audio/video file, and picture file;
if the target data file is a document file, obtaining the text contained in the target data file and using that text as the target data text;
if the target data file is an audio/video file, converting the audio/video file into the target data text by means of a preset speech recognition tool;
if the target data file is a picture file, obtaining the target data text from the picture file by means of a preset character recognition tool.
3. The file classification method for a knowledge base management system according to claim 1, characterized in that analyzing the key information of the target data file by means of the preset knowledge base classification model comprises:
preprocessing the target data text of the target data file, converting the target data text composed of natural language into target data text in discrete data format;
inputting the target data text in discrete data format into the preset knowledge base classification model;
obtaining the output phrases of the preset knowledge base classification model and using them as the key information of the target data file.
4. The file classification method for a knowledge base management system according to claim 3, characterized in that preprocessing the target data text of the target data file, converting the target data text composed of natural language into target data text in discrete data format, comprises:
performing word segmentation on the target data text to obtain a preprocessing vocabulary set;
deleting from the preprocessing vocabulary set the words whose number of occurrences is higher than a first threshold and the words whose number of occurrences is lower than a second threshold, obtaining a target vocabulary set;
converting the target vocabulary set into a target data set by means of a preset dictionary index table, and using the target data set as the target data text in discrete data format.
5. The file classification method for a knowledge base management system according to claim 1, characterized in that setting the classification labels of the target data file according to the key information of the target data file comprises:
setting up an identifier set for the target data file, the identifier set being used to store the classification keywords of the target data file;
deduplicating the key information of the target data file and adding it to the identifier set of the target data file, and using each classification keyword in the identifier set as a classification label of the target data file.
6. The file classification method for a knowledge base management system according to any one of claims 1-5, characterized in that, if a search instruction from a user is received, displaying the target data file on the retrieval page according to the target classification label carried in the search instruction of the user comprises:
identifying the target classification label carried in the search instruction;
obtaining the storage address to which the index of the target classification label points;
reading the data files associated with the target classification label from the storage address;
displaying the data files associated with the target classification label on the retrieval page.
7. The file classification method for a knowledge base management system according to any one of claims 1-5, characterized in that the method further comprises:
updating the classification labels of the data files in the knowledge base management system at a preset time interval.
8. A file classification device for a knowledge base management system, characterized by comprising:
a first acquisition unit, configured to obtain a target data file in the knowledge base management system;
a second acquisition unit, configured to obtain the target data text in the target data file;
an analysis unit, configured to analyze the key information of the target data file by means of a preset knowledge base classification model;
a setting unit, configured to set the classification labels of the target data file according to the key information of the target data file;
an association unit, configured to associate the classification labels of the target data file with the target data file by establishing an index;
a display unit, configured to, if a search instruction from a user is received, display the target data file on a retrieval page according to the target classification label carried in the search instruction of the user.
9. File classification equipment for a knowledge base management system, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the file classification method for a knowledge base management system according to any one of claims 1-7.
10. A computer-readable storage medium, characterized by comprising instructions which, when run on a computer, cause the computer to execute the file classification method for a knowledge base management system according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910524705.6A CN110399339A (en) | 2019-06-18 | 2019-06-18 | File classifying method, device, equipment and the storage medium of knowledge base management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910524705.6A CN110399339A (en) | 2019-06-18 | 2019-06-18 | File classifying method, device, equipment and the storage medium of knowledge base management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110399339A (en) | 2019-11-01 |
Family
ID=68323232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910524705.6A Pending CN110399339A (en) | 2019-06-18 | 2019-06-18 | File classifying method, device, equipment and the storage medium of knowledge base management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399339A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046225A (en) * | 2019-12-20 | 2020-04-21 | 网易(杭州)网络有限公司 | Audio resource processing method, device, equipment and storage medium |
CN111125016A (en) * | 2019-12-24 | 2020-05-08 | 普世(南京)智能科技有限公司 | Magneto-optical hybrid file storage method and system based on label organization |
CN111523289A (en) * | 2020-04-24 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Text format generation method, device, equipment and readable medium |
CN111881100A (en) * | 2020-07-10 | 2020-11-03 | 棕榈设计有限公司 | Knowledge base management framework system, management method, device and storage medium |
CN112256669A (en) * | 2020-09-27 | 2021-01-22 | 北京三快在线科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN112445782A (en) * | 2020-12-10 | 2021-03-05 | 深圳市中博科创信息技术有限公司 | Enterprise knowledge base management method for customer service |
CN112559670A (en) * | 2020-12-22 | 2021-03-26 | 江苏鼎岳智慧信息技术有限公司 | Data management system |
CN112597100A (en) * | 2020-09-17 | 2021-04-02 | 武汉大学 | File management method and device based on object proxy tag |
CN113360459A (en) * | 2021-07-08 | 2021-09-07 | 国网能源研究院有限公司 | Method, system and device for semi-automatically marking and storing files |
CN113392250A (en) * | 2021-06-30 | 2021-09-14 | 合肥高维数据技术有限公司 | Vector diagram retrieval method and system based on deep learning |
CN115422131A (en) * | 2022-11-04 | 2022-12-02 | 北京国电通网络技术有限公司 | Business audit knowledge base retrieval method, device, equipment and computer readable medium |
CN115934880A (en) * | 2022-10-31 | 2023-04-07 | 永道工程咨询有限公司 | Construction of project cost document database and search method of project cost document |
CN117454396A (en) * | 2023-10-24 | 2024-01-26 | 深圳市马博士网络科技有限公司 | Forced access control system and method for private cloud system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010250439A (en) * | 2009-04-13 | 2010-11-04 | Kanagawa Univ | Retrieval system, data generation method, program and recording medium for recording program |
CN103034667A (en) * | 2011-10-08 | 2013-04-10 | 英业达股份有限公司 | System and method for establishing database according to webpage index labels |
CN104123366A (en) * | 2014-07-23 | 2014-10-29 | 谢建平 | Search method and server |
CN107038480A (en) * | 2017-05-12 | 2017-08-11 | 东华大学 | A kind of text sentiment classification method based on convolutional neural networks |
CN107944559A (en) * | 2017-11-24 | 2018-04-20 | 国家计算机网络与信息安全管理中心 | A kind of entity relationship automatic identifying method and system |
CN108255972A (en) * | 2017-12-27 | 2018-07-06 | 浪潮通用软件有限公司 | A kind of text searching method and system |
CN108829765A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | A kind of information query method, device, computer equipment and storage medium |
CN108932294A (en) * | 2018-05-31 | 2018-12-04 | 平安科技(深圳)有限公司 | Resume data processing method, device, equipment and storage medium based on index |
CN109558492A (en) * | 2018-10-16 | 2019-04-02 | 中山大学 | A kind of listed company's knowledge mapping construction method and device suitable for event attribution |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046225A (en) * | 2019-12-20 | 2020-04-21 | 网易(杭州)网络有限公司 | Audio resource processing method, device, equipment and storage medium |
CN111046225B (en) * | 2019-12-20 | 2024-01-26 | 网易(杭州)网络有限公司 | Audio resource processing method, device, equipment and storage medium |
CN111125016A (en) * | 2019-12-24 | 2020-05-08 | 普世(南京)智能科技有限公司 | Magneto-optical hybrid file storage method and system based on label organization |
CN111523289A (en) * | 2020-04-24 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Text format generation method, device, equipment and readable medium |
CN111523289B (en) * | 2020-04-24 | 2023-05-09 | 支付宝(杭州)信息技术有限公司 | Text format generation method, device, equipment and readable medium |
CN111881100A (en) * | 2020-07-10 | 2020-11-03 | 棕榈设计有限公司 | Knowledge base management framework system, management method, device and storage medium |
CN112597100B (en) * | 2020-09-17 | 2022-07-15 | 武汉大学 | File management method and device based on object proxy label |
CN112597100A (en) * | 2020-09-17 | 2021-04-02 | 武汉大学 | File management method and device based on object proxy tag |
CN112256669A (en) * | 2020-09-27 | 2021-01-22 | 北京三快在线科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN112445782A (en) * | 2020-12-10 | 2021-03-05 | 深圳市中博科创信息技术有限公司 | Enterprise knowledge base management method for customer service |
CN112559670A (en) * | 2020-12-22 | 2021-03-26 | 江苏鼎岳智慧信息技术有限公司 | Data management system |
CN113392250A (en) * | 2021-06-30 | 2021-09-14 | 合肥高维数据技术有限公司 | Vector diagram retrieval method and system based on deep learning |
CN113392250B (en) * | 2021-06-30 | 2024-01-12 | 合肥高维数据技术有限公司 | Vector diagram retrieval method and system based on deep learning |
CN113360459A (en) * | 2021-07-08 | 2021-09-07 | 国网能源研究院有限公司 | Method, system and device for semi-automatically marking and storing files |
CN115934880A (en) * | 2022-10-31 | 2023-04-07 | 永道工程咨询有限公司 | Construction of project cost document database and search method of project cost document |
CN115422131A (en) * | 2022-11-04 | 2022-12-02 | 北京国电通网络技术有限公司 | Business audit knowledge base retrieval method, device, equipment and computer readable medium |
CN117454396A (en) * | 2023-10-24 | 2024-01-26 | 深圳市马博士网络科技有限公司 | Forced access control system and method for private cloud system |
CN117454396B (en) * | 2023-10-24 | 2024-07-05 | 深圳市马博士网络科技有限公司 | Forced access control system and method for private cloud system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110399339A (en) | File classifying method, device, equipment and the storage medium of knowledge base management system | |
Hidayat et al. | Sentiment analysis of twitter data related to Rinca Island development using Doc2Vec and SVM and logistic regression as classifier | |
US11663254B2 (en) | System and engine for seeded clustering of news events | |
US9390086B2 (en) | Classification system with methodology for efficient verification | |
Jiang et al. | An improved K-nearest-neighbor algorithm for text categorization | |
Al Qadi et al. | Arabic text classification of news articles using classical supervised classifiers | |
Bisandu et al. | Clustering news articles using efficient similarity measure and N-grams | |
CN111125086B (en) | Method, device, storage medium and processor for acquiring data resources | |
US10706030B2 (en) | Utilizing artificial intelligence to integrate data from multiple diverse sources into a data structure | |
CA2956627A1 (en) | System and engine for seeded clustering of news events | |
Bolaj et al. | Text classification for Marathi documents using supervised learning methods | |
CN111783861A (en) | Data classification method, model training device and electronic equipment | |
CN115098690B (en) | Multi-data document classification method and system based on cluster analysis | |
CN111522950A (en) | Rapid identification system for unstructured massive text sensitive data | |
CN116401338A (en) | Design feature extraction and attention mechanism based on data asset intelligent retrieval input and output requirements and method thereof | |
CN114266255A (en) | Corpus classification method, apparatus, device and storage medium based on clustering model | |
Ilic et al. | Suffix tree clustering–data mining algorithm | |
Bhatt et al. | An improved optimized web page classification using firefly algorithm with nb classifier (wpcnb) | |
Swarnalatha et al. | Classwise clustering for classification of imbalanced text data | |
Desai et al. | Analysis of Health Care Data Using Natural Language Processing | |
Singh et al. | Intra News Category Classification using N-gram TF-IDF Features and Decision Tree Classifier | |
CN111259150A (en) | Document representation method based on word frequency co-occurrence analysis | |
CN109947941A (en) | A kind of method and system based on elevator customer service text classification | |
Arivarasan et al. | Data mining K-means document clustering using tfidf and word frequency count | |
CN111694948B (en) | Text classification method and system, electronic equipment and storage medium |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191101 | |