CN110096590A - Document classification method, apparatus, medium, and electronic device - Google Patents

Document classification method, apparatus, medium, and electronic device

Info

Publication number
CN110096590A
Authority
CN
China
Prior art keywords
document
specified directory
user
under
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910206339.XA
Other languages
Chinese (zh)
Inventor
彭龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin ByteDance Technology Co Ltd
Original Assignee
Tianjin ByteDance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin ByteDance Technology Co Ltd
Priority to CN201910206339.XA
Publication of CN110096590A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a document classification method, apparatus, medium, and electronic device. The method comprises: obtaining a user ID and determining the number of documents under a specified directory that are associated with the user ID; when the number of documents exceeds a preset value, adjusting the structural relationship between the documents under the specified directory; and displaying the adjusted result on a preset interface. In the proposed method, the system automatically counts the documents and, according to its settings, prompts the user whether to classify them. When classification is performed, a classification module is called to classify the documents under the specified directory, so that they are sorted and stored according to their content. This makes documents easier for the user to find and improves office efficiency.

Description

Document classification method, apparatus, medium, and electronic device
Technical field
The present disclosure relates to the field of computer technology, and in particular to a document classification method, apparatus, medium, and electronic device.
Background
As documents are used more and more frequently, statistics on users' document data show that the number of documents a user holds far exceeds the number of folders, which indicates that most of a user's documents are probably stored in an unordered, scattered way. In practice, a user who needs a document often creates a new one or copies a related one, and after editing simply saves it in the current location. Over time the documents accumulate, so that a large number of untitled, unorganized documents end up under the same directory, and finding a document of a particular type later becomes tedious.
Therefore, how to classify and organize documents automatically, quickly, and effectively has become a technical problem in urgent need of a solution.
Summary
The purpose of the present disclosure is to provide a document classification method, apparatus, medium, and electronic device that can solve at least one of the technical problems mentioned above. The specific solutions are as follows:
According to a specific embodiment of the present disclosure, in a first aspect, the disclosure provides a document classification method, comprising:
obtaining a user ID, and determining the number of documents under a specified directory that are associated with the user ID;
when the number of documents exceeds a preset value, adjusting the structural relationship between the documents under the specified directory; and
displaying the adjusted result on a preset interface.
Optionally, after displaying the adjusted result on the preset interface, the method further comprises:
according to a user instruction received on the preset interface, displaying the structural relationship of the documents under the specified directory in accordance with the adjusted result.
Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of documents exceeds the preset value, calculating the relevance between the documents according to a predetermined rule; and
classifying the documents according to the relevance.
Optionally, classifying the documents according to the relevance comprises:
obtaining the ID of a document and reading the content information of the document;
aggregating documents with a high degree of association; and
placing the aggregated documents under the same directory.
Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of online documents exceeds the preset value, providing a prompt asking whether to classify the documents; and
after classification is confirmed, automatically classifying the documents under the specified directory.
According to a specific embodiment of the present disclosure, in a second aspect, the disclosure provides a document classification apparatus, comprising:
an acquiring unit, configured to obtain a user ID and determine the number of documents under a specified directory that are associated with the user ID;
a classification unit, configured to adjust the structural relationship between the documents under the specified directory when the number of documents exceeds a preset value; and
a display unit, configured to display the adjusted result on a preset interface.
Optionally, the display unit is further configured to:
according to a user instruction received on the preset interface, display the structural relationship of the documents under the specified directory in accordance with the adjusted result.
Optionally, the classification unit is further configured to:
when the number of documents exceeds the preset value, calculate the relevance between the documents according to a predetermined rule; and
classify the documents according to the relevance.
According to a specific embodiment of the present disclosure, in a third aspect, the disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the above.
According to a specific embodiment of the present disclosure, in a fourth aspect, the disclosure provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the above.
Compared with the prior art, the above solutions of the embodiments of the present disclosure have at least the following beneficial effects: the disclosure proposes a document classification method in which the system automatically counts the documents and, according to its settings, prompts the user whether to classify them. When classification is performed, a classification module is called to classify the documents under the specified directory, so that they are sorted and stored according to their content. This makes documents easier for the user to find and improves office efficiency.
Brief description of the drawings
The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain its principles. The drawings described below show only some embodiments of the disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a flow diagram of the document classification method according to an embodiment of the present disclosure;
Fig. 2 shows a flow diagram of the specific execution of the document classification method according to an embodiment of the present disclosure;
Fig. 3 shows a structural diagram of the document classification apparatus according to an embodiment of the present disclosure;
Fig. 4 shows a diagram of the connection structure of the electronic device according to an embodiment of the present disclosure.
Detailed description
To make the purposes, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this disclosure, without creative effort, fall within the scope of protection of the disclosure.
The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. The singular forms "a", "said", and "the" used in the embodiments of the disclosure and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality of" generally means at least two.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe ..., these ... should not be limited to these terms. These terms are only used to distinguish ... from one another. For example, without departing from the scope of the embodiments of the disclosure, a first ... may also be called a second ..., and similarly a second ... may also be called a first ....
Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a product or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or device. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the product or device that comprises the element.
Optional embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 1, according to a specific embodiment of the present disclosure, the disclosure provides a document classification method, and in particular a method for classifying online documents; it is of course also applicable to classifying ordinary documents. The method can be applied to documents under a main directory (such as the user's home page) or under any specified folder, for example a folder that stores multiple documents. It is not limited to these cases: the automatic classification method can be executed for any location that contains multiple online documents. An online document here may be Word, Excel, or any other text-input editor that allows the user to edit online. The method specifically includes the following steps:
Step S102: obtain a user ID, and determine the number of documents under the specified directory that are associated with the user ID.
Each time an online document editor is launched, the user editing the online document must be determined, which includes confirming ID information such as the user's account, user name, phone number, and email address. The number of documents under the specified directory that are associated with the user ID is then determined. The computer back end counts the online documents under the specified directory in real time: when a document is created in or copied to the directory, the count is incremented by one; conversely, when a document is deleted, the count is decremented by one. The specified directory includes but is not limited to a specified root directory; it may be, for example, the user's home page, a specified online path, or the page currently being edited online, or it may be any specified folder, such as a folder that stores multiple documents. For convenience of explanation, this embodiment is described with a folder as the specified directory.
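The real-time counting in step S102 can be illustrated with a short sketch. It is not taken from the patent: the class name `DirectoryCounter`, its method names, and the use of a (user ID, directory) pair as the key are all assumptions made for illustration.

```python
from collections import defaultdict


class DirectoryCounter:
    """Hypothetical helper: keeps a document count per (user ID, directory)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def on_created(self, user_id, directory):
        # A document was created in, or copied to, the directory: add one.
        self._counts[(user_id, directory)] += 1
        return self._counts[(user_id, directory)]

    def on_deleted(self, user_id, directory):
        # A document was deleted: subtract one, never going below zero.
        key = (user_id, directory)
        self._counts[key] = max(0, self._counts[key] - 1)
        return self._counts[key]

    def count(self, user_id, directory):
        # Read the current count without creating a new entry.
        return self._counts.get((user_id, directory), 0)
```

Keeping the count as an incrementally updated value avoids rescanning the directory every time the threshold in step S104 is checked.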
Step S104: when the number of documents exceeds the preset value, adjust the structural relationship between the documents under the specified directory.
The preset value can be set freely, for example to 20, 25, or 30. No strict limit is placed on the specific number, but a value greater than 10 is advisable, preferably 20 to 25. Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises: when the number of online documents exceeds the preset value, providing a prompt asking whether to classify the documents; and after classification is confirmed, automatically classifying the online documents under the specified directory.
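The threshold check and confirmation prompt described above might look like the following sketch. The names `PRESET_VALUE`, `should_prompt`, and `maybe_classify` are hypothetical, and the default of 20 is just one of the values the text mentions; the `confirm` and `classify` callbacks stand in for the prompt and the classification module.

```python
PRESET_VALUE = 20  # assumed default; the text suggests a value of 20 to 25


def should_prompt(document_count, preset_value=PRESET_VALUE):
    """True only when the count strictly exceeds the preset value."""
    return document_count > preset_value


def maybe_classify(document_count, confirm, classify, preset_value=PRESET_VALUE):
    """Prompt the user and classify only after confirmation.

    confirm:  callable returning True if the user agrees to classification.
    classify: callable that performs the automatic classification.
    """
    if not should_prompt(document_count, preset_value):
        return "below_preset_value"
    if not confirm():
        return "declined"
    classify()
    return "classified"
```

Passing the prompt and the classifier in as callables keeps the threshold logic independent of any particular user interface.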
The specific execution method, shown in Fig. 2, is as follows.
Step S1042: when the number of online documents exceeds the preset value, automatically call the document association analysis module.
The analysis module is trained in advance using a Bayesian algorithm. Once trained, the analysis module can classify documents automatically according to their degree of association.
The Bayesian algorithm is as follows:
In a naive Bayes document classification model, whether a document belongs to type C is calculated with the following formula:
P(F1 F2 ... Fn | C) * P(C) = P(F1 | C) * P(F2 | C) * ... * P(Fn | C) * P(C), where P(C) denotes the probability that a document of type C occurs, and P(F1 | C) denotes the probability that word F1 occurs in documents of type C.
P(F1 | C) is calculated according to the following rules:
1. The total number of words in all documents under type C is N.
2. The number of times word F1 occurs in all documents is M (note: not only the documents under type C, but all documents).
3. The number of distinct words in all documents is NN.
Then P(F1 | C) = M / (N + NN).
With the method above, the probability P(W | C) that any word W occurs in documents of type C can be obtained; if the word does not occur, the probability is 0. The probability that document F belongs to type C is then p1 = P(C) * P(W1 | C) * P(W2 | C) * ... * P(Wn | C), where W1, W2, ..., Wn are the words that occur in document F and P(W1 | C) denotes the probability that word W1 occurs under type C.
The probabilities that document F belongs to the other types are then calculated in the same way, giving p2, p3, and so on. The values p1, p2, p3, etc. are compared; the largest value indicates the type that document F most resembles, and the document is assigned to that type.
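The comparison of p1, p2, p3 can be sketched in code. This is a minimal sketch and not the patent's implementation: it assumes the standard multinomial naive Bayes estimate with add-one (Laplace) smoothing, P(w | C) = (count of w in type C + 1) / (N + NN), so that a word unseen in a type does not reduce the whole product to 0. That estimate is a swapped-in textbook rule and its intermediate probabilities need not match figures computed with M / (N + NN). The function names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """labelled_docs: iterable of (list_of_words, type_label) pairs."""
    doc_counts = Counter()              # documents per type, used for P(C)
    word_counts = defaultdict(Counter)  # word occurrences per type
    vocabulary = set()                  # distinct words overall (NN)
    for words, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return doc_counts, word_counts, vocabulary


def classify(words, model):
    """Return the type with the largest P(C) * product of P(w | C)."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        n_words = sum(word_counts[label].values())  # N for this type
        # Sum logarithms instead of multiplying to avoid numeric underflow.
        score = math.log(doc_counts[label] / total_docs)
        for w in words:
            # Assumed add-one smoothing, see the lead-in above.
            score += math.log(
                (word_counts[label][w] + 1) / (n_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A call such as `classify(["formula"], train([(["formula", "claim"], "patent"), (["signal", "noise"], "science")]))` selects the type whose training documents contain the query words most often.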
A specific example is shown in the table below:
Document ID | Words contained in the document | Document type
1 | Bayes, classification, formula, science | Science and technology
2 | Bayes, signal-to-noise ratio, science | Science and technology
3 | formula, official document | Patent class
4 (to be classified) | Bayes, signal-to-noise ratio, official document |
The specific steps of the algorithm are as follows:
1. First, documents A, B, C, etc. in the offline data are classified manually: A and B belong to science and technology, and C belongs to the patent class. The total number of distinct words in the documents is "Bayes" + "classification" + "formula" + "science" + "signal-to-noise ratio" + "official document" = 6.
2. Take the documents of the same type, A and B, whose corresponding total number of words is 6. The probability that the word "Bayes" occurs is (2)/(6+6) = 2/12, where the first 6 is the total number of words in the science and technology documents and the second 6 is the number of distinct words in all documents. The probability of each word is calculated in turn.
Probability of each word occurring under science and technology documents:
"Bayes" = (2)/(6+6) = 2/12;
"classification" = (1)/(6+6) = 2/12;
"formula" = (1)/(6+6) = 2/12;
"science" = (2)/(6+6) = 2/12;
"signal-to-noise ratio" = (1)/(6+6) = 2/12;
Probability of each word occurring under patent class documents:
"formula" = (1)/(2+6) = 2/8;
"official document" = (1)/(2+6) = 2/8;
3. For document 4, which is to be classified:
Probability of being a science and technology document = (2/3) * (2/12) * (2/12) * (1/12) = 0.001543209;
Probability of being a patent class document = (1/3) * (1/8) * (1/8) * (2/8) = 0.001302083333;
In summary, the probability that document 4 is a science and technology document is greater than the probability that it is a patent class document, so it is classified as a science and technology document.
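The two products in step 3 can be checked with exact fractions. The snippet below only re-evaluates the factors exactly as they are written in step 3; the variable names are illustrative.

```python
from fractions import Fraction

# Factors copied from step 3: prior first, then one factor per word of document 4.
science = Fraction(2, 3) * Fraction(2, 12) * Fraction(2, 12) * Fraction(1, 12)
patent = Fraction(1, 3) * Fraction(1, 8) * Fraction(1, 8) * Fraction(2, 8)

print(science, float(science))  # exact value and decimal approximation
print(patent, float(patent))
print(science > patent)         # True means science and technology wins
```

The exact values are 1/648 and 1/768, which agree with the decimals 0.001543209 and 0.001302083333 quoted in step 3.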
Step S1044: the document association analysis module classifies the online documents according to their relevance.
Optionally, classifying the online documents according to their relevance by the document association analysis module comprises:
First, obtain the ID of an online document, such as its file name or attributes, and then read the content information of the online document. Here the title and abstract of the document can be read first, or the content can be summarized from how often high-frequency words occur, to work out roughly what the document records. For example, if the term "liquid crystal display" appears many times, the document is considered to describe technical content related to "liquid crystal display". Next, the content information of all other documents under the directory is analyzed in the same way, and online documents with a high degree of association are aggregated. For example, if there are 100 documents under the directory and 30 of them refer to "liquid crystal display", those 30 documents related to "liquid crystal display" are aggregated into one class. Finally, the aggregated online documents are placed under the same directory, and that directory can be renamed; for example, after the 30 "liquid crystal display" documents are placed under the same directory, it is named "liquid crystal display".
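The aggregation by frequently occurring terms can be sketched as follows. This is an illustrative simplification rather than the patent's module: it assumes a caller-supplied list of candidate keywords and plain substring counting, and the names `dominant_keyword` and `group_by_keyword` are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_keyword(text, keywords):
    """Return the candidate keyword that occurs most often in text, or None."""
    if not keywords:
        return None
    counts = Counter({k: text.count(k) for k in keywords})
    keyword, hits = counts.most_common(1)[0]
    return keyword if hits > 0 else None


def group_by_keyword(documents, keywords):
    """documents: dict of document ID -> content text.

    Returns a dict of folder name -> list of document IDs, where each folder
    is named after the keyword that dominates the documents placed in it.
    """
    folders = defaultdict(list)
    for doc_id, text in documents.items():
        keyword = dominant_keyword(text, keywords)
        if keyword is not None:
            folders[keyword].append(doc_id)
    return dict(folders)
```

Documents in which none of the candidate keywords occur are left out of the result, which corresponds to leaving them where they are in the specified directory.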
Step S106: display the adjusted result on the preset interface.
Optionally, after displaying the adjusted result on the preset interface, the method further comprises:
The system automatically displays the adjusted classification structure on the interface as a preview. The user can then judge the result of the automatic classification. If the user considers the classification result accurate, the user can accept it by clicking to confirm or by entering a confirmation command, and the online documents under the specified directory are displayed according to the classification result. Otherwise, the user can choose not to accept the classification result, and the online documents under the specified directory remain unchanged.
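The preview-then-confirm behaviour can be modelled with a small sketch. The `Directory` class and its `preview` and `resolve` methods are hypothetical, and representing a layout as a mapping from folder name to document IDs (with the empty string standing for the specified directory itself) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Directory:
    # Folder name -> list of document IDs; "" stands for the directory itself.
    layout: dict

    def preview(self, proposed):
        """Return a copy of the proposed layout for display; nothing is changed."""
        return {folder: list(ids) for folder, ids in proposed.items()}

    def resolve(self, proposed, accepted):
        """Adopt the proposed layout only if the user accepted the preview."""
        if accepted:
            self.layout = {folder: list(ids) for folder, ids in proposed.items()}
        return self.layout
```

Because `preview` never touches `layout`, rejecting the result leaves the directory exactly as it was.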
The present disclosure proposes an online document classification method in which the system automatically counts the online documents and, according to its settings, prompts the user whether to classify them. When classification is performed, a classification module is called to classify the online documents under the specified directory, so that they are sorted and stored according to their content. This makes documents easier for the user to find and improves office efficiency.
Embodiment 2
As shown in Fig. 3, according to a specific embodiment of the present disclosure, the disclosure provides a document classification apparatus, and in particular an apparatus for classifying online documents; it is of course also applicable to classifying ordinary documents. The apparatus can be applied to documents under a main directory (such as the user's home page) or under any specified folder, for example a folder that stores multiple documents. It is not limited to these cases: the automatic classification method can be executed for any location that contains multiple online documents. An online document here may be Word, Excel, or any other text-input editor that allows the user to edit online. The apparatus specifically includes an acquiring unit 302, a classification unit 304, and a display unit 306.
Acquiring unit 302: configured to obtain a user ID and determine the number of documents under the specified directory that are associated with the user ID.
Each time an online document editor is launched, the user editing the online document must be determined, which includes confirming ID information such as the user's account, user name, phone number, and email address. The number of documents under the specified directory that are associated with the user ID is then determined. The computer back end counts the online documents under the specified directory in real time: when a document is created in or copied to the directory, the count is incremented by one; conversely, when a document is deleted, the count is decremented by one. The specified directory includes but is not limited to a specified root directory; it may be, for example, the user's home page, a specified online path, or the page currently being edited online, or it may be any specified folder, such as a folder that stores multiple documents. For convenience of explanation, this embodiment is described with a folder as the specified directory.
Classification unit 304: configured to adjust the structural relationship between the documents under the specified directory when the number of documents exceeds the preset value.
The preset value can be set freely, for example to 20, 25, or 30. No strict limit is placed on the specific number, but a value greater than 10 is advisable, preferably 20 to 25. Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises: when the number of online documents exceeds the preset value, providing a prompt asking whether to classify the documents; and after classification is confirmed, automatically classifying the online documents under the specified directory.
The specific execution, shown in Fig. 2, is as follows. The classification unit is further configured as follows.
First, when the number of online documents exceeds the preset value, the document association analysis module is called automatically.
The analysis module is trained in advance using a Bayesian algorithm. Once trained, the analysis module can classify documents automatically according to their degree of association. For the Bayesian algorithm, see Embodiment 1 above; it is not repeated here.
Second, the document association analysis module classifies the online documents according to their relevance.
A specific example is as follows:
First, obtain the ID of an online document, such as its file name or attributes, and then read the content information of the online document. Here the title and abstract of the document can be read first, or the content can be summarized from how often high-frequency words occur, to work out roughly what the document records. For example, if the term "liquid crystal display" appears many times, the document is considered to describe technical content related to "liquid crystal display". Next, the content information of all other documents under the directory is analyzed in the same way, and online documents with a high degree of association are aggregated. For example, if there are 100 documents under the directory and 30 of them refer to "liquid crystal display", those 30 documents related to "liquid crystal display" are aggregated into one class. Finally, the aggregated online documents are placed under the same directory, and that directory can be renamed; for example, after the 30 "liquid crystal display" documents are placed under the same directory, it is named "liquid crystal display".
Display unit 306: configured to display the adjusted result on the preset interface.
The display unit is further configured as follows: the system automatically displays the adjusted classification structure on the interface as a preview. The user can then judge the result of the automatic classification. If the user considers the classification result accurate, the user can accept it by clicking to confirm or by entering a confirmation command, and the online documents under the specified directory are displayed according to the classification result. Otherwise, the user can choose not to accept the classification result, and the online documents under the specified directory remain unchanged.
The present disclosure proposes an online document classification apparatus in which the system automatically counts the online documents and, according to its settings, prompts the user whether to classify them. When classification is performed, a classification module is called to classify the online documents under the specified directory, so that they are sorted and stored according to their content. This makes documents easier for the user to find and improves office efficiency.
Embodiment 3
As shown in Fig. 4, this embodiment provides an electronic device for classifying online documents. The electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations:
obtaining a user ID, and determining the number of documents under a specified directory that are associated with the user ID;
when the number of documents exceeds a preset value, adjusting the structural relationship between the documents under the specified directory; and
displaying the adjusted result on a preset interface.
Optionally, after displaying the adjusted result on the preset interface, the operations further comprise:
according to a user instruction received on the preset interface, displaying the structural relationship of the documents under the specified directory in accordance with the adjusted result.
Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of documents exceeds the preset value, calculating the relevance between the documents according to a predetermined rule; and
classifying the documents according to the relevance.
Optionally, classifying the documents according to the relevance comprises:
obtaining the ID of a document and reading the content information of the document;
aggregating documents with a high degree of association; and
placing the aggregated documents under the same directory.
Optionally, adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of online documents exceeds the preset value, providing a prompt asking whether to classify the documents; and
after classification is confirmed, automatically classifying the documents under the specified directory.
Embodiment 4
An embodiment of the present disclosure provides a non-volatile computer storage medium storing computer-executable instructions that can perform any of the methods described above.
Embodiment 5
Referring now to Fig. 4, a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 4, the electronic device 400 may include a processing unit (for example, a central processing unit or a graphics processor) 401, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage device 408 into random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing unit 401, the ROM 402, and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices may be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 408 including, for example, a magnetic tape or hard disk; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 4 shows an electronic device 400 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or present. More or fewer devices may alternatively be implemented or present.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the above-described functions defined in the method of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.

Claims (10)

1. A document classification method, characterized by comprising:
obtaining a user ID, and determining the number of documents under a specified directory that are associated with the user ID;
when the number of documents exceeds a preset value, adjusting the structural relationship between the documents under the specified directory;
displaying the adjusted result at a preset interface.
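The three steps of this method can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Document` record, the function names, and in particular the placeholder rule in `adjust_structure` (grouping by the first word of the title) are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner_id: str   # user ID the document is associated with
    directory: str  # directory the document currently sits in
    title: str


def documents_for_user(docs: List[Document], user_id: str, directory: str) -> List[Document]:
    """Step 1: select the user's documents under the specified directory."""
    return [d for d in docs if d.owner_id == user_id and d.directory == directory]


def adjust_structure(docs: List[Document]) -> Dict[str, List[str]]:
    """Step 2 (placeholder rule): group titles by their lower-cased first word."""
    grouped: Dict[str, List[str]] = {}
    for d in docs:
        words = d.title.split()
        key = words[0].lower() if words else "untitled"
        grouped.setdefault(key, []).append(d.title)
    return grouped


def render(directory: str, grouped: Dict[str, List[str]]) -> str:
    """Step 3: produce a text view of the adjusted structure for display."""
    lines = [f"{directory}/"]
    for folder in sorted(grouped):
        lines.append(f"  {folder}/")
        lines.extend(f"    {title}" for title in grouped[folder])
    return "\n".join(lines)


def classify_if_needed(docs: List[Document], user_id: str, directory: str,
                       preset_value: int) -> Optional[str]:
    """Return the rendered adjusted structure, or None if the count is within the preset value."""
    mine = documents_for_user(docs, user_id, directory)
    if len(mine) <= preset_value:
        return None
    return render(directory, adjust_structure(mine))
```

Documents owned by other users are ignored when counting, so the threshold applies per user and per directory, as the claim wording suggests.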
2. The method according to claim 1, characterized in that, after the adjusted result is displayed at the preset interface, the method comprises:
according to a user instruction received at the preset interface, displaying the structural relationship of the documents under the specified directory according to the adjusted result.
3. The method according to claim 1, characterized in that adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of documents exceeds the preset value, calculating the relevance between the documents according to a predetermined rule;
classifying the documents according to the relevance.
4. The method according to claim 3, characterized in that classifying the documents according to the relevance comprises:
obtaining the IDs of the documents and reading the content information of the documents;
aggregating documents with a high degree of relevance;
placing the aggregated documents under the same directory.
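The relevance calculation and aggregation steps can be sketched as follows. The claims leave the "predetermined rule" open, so this sketch assumes one possible rule: Jaccard similarity over the word sets of the document contents, with documents linked whenever their similarity reaches a threshold. The function names, the threshold of 0.3, and the `group_N` directory names are all hypothetical.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word set of a document's content."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def aggregate(contents: Dict[str, str], threshold: float = 0.3) -> List[List[str]]:
    """Group document IDs whose contents are similar, directly or transitively.

    `contents` maps document ID -> content text. Uses union-find so that
    chains of similar documents end up in one group.
    """
    ids = sorted(contents)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    toks = {i: tokens(contents[i]) for i in ids}
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if jaccard(toks[a], toks[b]) >= threshold:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def place_in_directories(groups: List[List[str]]) -> Dict[str, List[str]]:
    """Assign each aggregated group to its own (hypothetically named) directory."""
    return {f"group_{n}": members for n, members in enumerate(groups, start=1)}
```

The pairwise comparison is quadratic in the number of documents, which is acceptable for a single directory but would need an index or vector-based approach at larger scale.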
5. The method according to claim 1, characterized in that adjusting the structural relationship between the documents under the specified directory when the number of online documents exceeds the preset value comprises:
when the number of online documents exceeds the preset value, presenting prompt information asking whether to classify the documents;
after classification is confirmed, automatically classifying the documents under the specified directory.
6. A document classification device, characterized by comprising:
an acquiring unit, configured to obtain a user ID and determine the number of documents under a specified directory that are associated with the user ID;
a classification unit, configured to adjust the structural relationship between the documents under the specified directory when the number of documents exceeds a preset value;
a display unit, configured to display the adjusted result at a preset interface.
7. The device according to claim 6, characterized in that the display unit is further configured to:
according to a user instruction received at the preset interface, display the structural relationship of the documents under the specified directory according to the adjusted result.
8. The device according to claim 7, characterized in that the classification unit is further configured to:
when the number of documents exceeds the preset value, calculate the relevance between the documents according to a predetermined rule;
classify the documents according to the relevance.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
10. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 5.
CN201910206339.XA 2019-03-19 2019-03-19 A kind of document classification method, apparatus, medium and electronic equipment Pending CN110096590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910206339.XA CN110096590A (en) 2019-03-19 2019-03-19 A kind of document classification method, apparatus, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910206339.XA CN110096590A (en) 2019-03-19 2019-03-19 A kind of document classification method, apparatus, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN110096590A true CN110096590A (en) 2019-08-06

Family

ID=67443203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910206339.XA Pending CN110096590A (en) 2019-03-19 2019-03-19 A kind of document classification method, apparatus, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110096590A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1773492A (en) * 2004-11-09 2006-05-17 国际商业机器公司 Method for organizing multi-file and equipment for displaying multi-file
CN1855094A (en) * 2005-04-28 2006-11-01 国际商业机器公司 Method and device for processing electronic files of users
CN104160395A (en) * 2012-02-29 2014-11-19 Ubic股份有限公司 Document classification system, document classification method, and document classification program
CN107943984A (en) * 2017-11-30 2018-04-20 广东欧珀移动通信有限公司 Image processing method, device, computer equipment and computer-readable recording medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674082A (en) * 2019-09-24 2020-01-10 北京字节跳动网络技术有限公司 Method and device for removing online document, electronic equipment and computer readable medium
CN111858518A (en) * 2020-07-09 2020-10-30 北京字节跳动网络技术有限公司 Method and device for updating reference document, electronic equipment and storage medium
CN111858518B (en) * 2020-07-09 2022-10-25 北京字节跳动网络技术有限公司 Method and device for updating reference document, electronic equipment and storage medium
CN111858476A (en) * 2020-07-20 2020-10-30 上海闻泰电子科技有限公司 File processing method and device, electronic equipment and computer readable storage medium
CN112269870A (en) * 2020-11-03 2021-01-26 北京字跳网络技术有限公司 Document sorting method and device, electronic equipment and computer readable storage medium
CN113254583A (en) * 2021-05-28 2021-08-13 北京明略软件系统有限公司 Document marking method, device and medium based on semantic vector
CN113254583B (en) * 2021-05-28 2021-11-02 北京明略软件系统有限公司 Document marking method, device and medium based on semantic vector
CN115757799A (en) * 2022-12-02 2023-03-07 松原市邹佳网络科技有限公司 Data storage method and system based on artificial intelligence and cloud platform
CN115757799B (en) * 2022-12-02 2023-10-24 北京国联视讯信息技术股份有限公司 Data storage method and system based on artificial intelligence and cloud platform

Similar Documents

Publication Publication Date Title
CN110096590A (en) A kind of document classification method, apparatus, medium and electronic equipment
CN109634698B (en) Menu display method and device, computer equipment and storage medium
US9331971B2 (en) Message subscription based on message aggregate characteristics
CN110162796B (en) News thematic creation method and device
US8458194B1 (en) System and method for content-based document organization and filing
WO2020155750A1 (en) Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium
CN111680254B (en) Content recommendation method and device
WO2013189296A1 (en) Method and system for processing recommended target software
CN106911757A (en) The method for pushing and device of a kind of business information
TW201833851A (en) Risk control event automatic processing method and apparatus
CN108764319A (en) A kind of sample classification method and apparatus
WO2023272850A1 (en) Decision tree-based product matching method, apparatus and device, and storage medium
US9002832B1 (en) Classifying sites as low quality sites
CN109284367B (en) Method and device for processing text
CN110362815A (en) Text vector generation method and device
KR20180011261A (en) Search processing method and apparatus
CN103942328A (en) Video retrieval method and video device
CN110321447A (en) Determination method, apparatus, electronic equipment and the storage medium of multiimage
CN110489156A (en) Edition control method, device, medium and the electronic equipment of binary format
US10474700B2 (en) Robust stream filtering based on reference document
CN112084448B (en) Similar information processing method and device
CN114443943A (en) Information scheduling method, device and equipment and computer readable storage medium
CN110704139B (en) Icon classification method and device
CN111428159A (en) Online classification method and device
CN113051919A (en) Method and device for identifying named entity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination