CN110688349A - Document sorting method, device, terminal and computer readable storage medium - Google Patents

Publication number
CN110688349A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910820963.9A
Other languages
Chinese (zh)
Other versions
CN110688349B (en)
Inventor
张登超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Xiaoyu Small Loan Co Ltd
Original Assignee
Chongqing Xiaoyu Small Loan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Xiaoyu Small Loan Co Ltd
Priority to CN201910820963.9A
Publication of CN110688349A
Application granted
Publication of CN110688349B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/144 Query formulation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a document sorting method, a document sorting apparatus, a terminal, and a computer-readable storage medium. The method comprises: determining a plurality of content keywords, and acquiring the documents to be sorted according to a set target path; scanning the documents to be sorted, and extracting from them the information corresponding to each of the content keywords; and filling the extracted information into the position in a summary document that matches each content keyword. With this method, documents can be sorted automatically and according to rules set by the user, which replaces tedious, error-prone manual operation and improves working efficiency.

Description

Document sorting method, device, terminal and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for document organization, a terminal, and a computer-readable storage medium.
Background
With the rapid development of computer technology, electronic documents are gradually replacing traditional paper documents. Enterprises generate large numbers of electronic documents, such as financial documents and personnel documents, in the course of their work, and the need to sort these documents has arisen along with them. Most enterprises still sort documents manually.
In current document sorting methods, searching for a document, opening it, extracting its content, and copying and pasting that content into a target form are all performed manually. This is cumbersome, time-consuming, and labor-intensive, errors occur easily during sorting, and working efficiency cannot be improved.
Disclosure of Invention
Embodiments of the invention provide a document sorting method, a document sorting apparatus, a terminal, and a computer-readable storage medium, which can sort documents automatically and according to rules set by the user, thereby replacing tedious, error-prone manual operation and improving working efficiency.
The first aspect of the embodiment of the invention discloses a document sorting method, which comprises the following steps:
determining a plurality of content keywords, and acquiring a document to be sorted according to a set target path;
scanning the documents to be sorted, and respectively extracting information corresponding to the content keywords from the documents to be sorted;
and filling the extracted information corresponding to the content keywords into the position matched with each content keyword in the summary document.
The second aspect of the embodiment of the present invention discloses a document sorting apparatus, including:
the acquisition module is used for determining a plurality of content keywords and acquiring the document to be sorted according to a set target path;
the extraction module is used for scanning the documents to be sorted and respectively extracting information corresponding to the content keywords from the documents to be sorted;
and the filling module is used for respectively filling the extracted information corresponding to the content keywords into the position, matched with each content keyword, in the summary document.
A third aspect of an embodiment of the present invention discloses a terminal, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
A fourth aspect of the present invention discloses a computer-readable storage medium, wherein the computer storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.
In the embodiment of the invention, a terminal determines a plurality of content keywords, acquires the documents to be sorted according to a set target path, scans the documents to be sorted, extracts from them the information corresponding to each of the content keywords, and fills the extracted information into the position in a summary document that matches each content keyword. With this method, documents can be sorted automatically and according to rules set by the user, which replaces tedious, error-prone manual operation and improves working efficiency.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a document sorting method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another document sorting method according to an embodiment of the present invention;
FIG. 3 shows a sorting interface according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a document sorting apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a document sorting method according to an embodiment of the present invention. The document sorting method described in this embodiment includes the following steps:
101: determining a plurality of content keywords, and acquiring the document to be sorted according to a set target path.
The documents to be sorted may include one or more documents, and may be intellectual property documents, financial documents, personnel documents, and the like. Their types may be Word documents, Excel spreadsheets, PPT slides, and the like. The content keywords can be set according to the user's needs and may be keywords such as a document content title or a date. For example, the application date, issue date, authorization date, application number, applicant, and inventor in patent-related documents can be set as content keywords.
Specifically, when a user needs to sort documents such as intellectual property documents, the user sets a plurality of content keywords and the target path of the documents to be sorted. The terminal then obtains a document sorting request from the user, which includes the plurality of content keywords and the target path, and acquires the documents to be sorted according to the target path.
For example, as shown in FIG. 3, when a user needs to sort documents such as intellectual property documents, the terminal display screen outputs a sorting interface that includes a parameter setting area and a status indication area. The parameter setting area is used by the user to input the target path of the documents to be sorted, the content keywords, and the document keywords. The status indication area displays the progress of document sorting, which may be expressed as a percentage. After the user inputs the target path in the path input box used for searching files, inputs the content keywords in the content keyword input box, and clicks the corresponding "confirm" button, the terminal obtains a document sorting request from the user. The request includes the plurality of content keywords and the target path entered in the parameter setting area, and the terminal acquires the documents to be sorted according to the target path.
It should be noted that the documents to be sorted may all be located under the same path or under different paths. The target path may be a path newly created by the user when sorting the documents, or the original path of the documents before sorting. The target path is set and selected by the user, which is not limited in the embodiment of the present invention.
102: scanning the documents to be sorted, and extracting from them the information corresponding to each of the plurality of content keywords.
Specifically, after determining the documents to be sorted according to the target path, the terminal scans them and extracts the information corresponding to each content keyword. In doing so, the terminal first obtains the name of the document to be sorted and extracts from the name the information corresponding to the content keywords. The terminal then detects whether any of the content keywords is a target content keyword, that is, one for which no corresponding information has yet been extracted. If so, the terminal scans the content of the document to be sorted and extracts the information corresponding to the target content keyword from that content.
For example, suppose the content keyword set by the user is "application date". The terminal scans the document to be sorted and extracts the corresponding information according to "application date". If the document describes the application date as "application date: 2018.03.30", the information extracted by the terminal for "application date" is "2018.03.30".
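The two-pass extraction described above (document name first, then document content for any keyword still missing) might be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `extract_value` and `extract_all` are hypothetical, and the sketch assumes the information appears in the form "keyword: value", as in the "application date: 2018.03.30" example.

```python
import re
from typing import Dict, List, Optional


def extract_value(keyword: str, text: str) -> Optional[str]:
    """Return the text that follows 'keyword:' in text, or None if absent.

    Assumption: values are written as 'keyword: value' and end at a
    newline, comma, or semicolon. Real documents may need other rules.
    """
    pattern = re.escape(keyword) + r"\s*[:：]\s*([^\n;,]+)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def extract_all(keywords: List[str], doc_name: str, doc_content: str) -> Dict[str, str]:
    found: Dict[str, str] = {}
    # First pass: try to extract every keyword from the document name.
    for keyword in keywords:
        value = extract_value(keyword, doc_name)
        if value is not None:
            found[keyword] = value
    # Second pass: scan the content only for keywords still missing.
    for keyword in keywords:
        if keyword not in found:
            value = extract_value(keyword, doc_content)
            if value is not None:
                found[keyword] = value
    return found
```

Because the name is checked first, a value found in the document name takes priority over a value for the same keyword in the content.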
103: filling the extracted information corresponding to the plurality of content keywords into the position in the summary document that matches each content keyword.
The summary document is a Word or Excel document used to collect information from the documents to be sorted.
Specifically, the terminal may obtain a target table in the summary document, determine from the headers of the target table the target header associated with each content keyword, and, for each content keyword, fill the extracted information corresponding to that keyword into the position corresponding to its associated target header.
For example, when the user sorts intellectual property documents, the content keywords preset by the user may be document content title, application date, application number, applicant, and inventor. Suppose Table 1 is the target table in the summary document, and the terminal needs to fill the information corresponding to the content keywords in the documents to be sorted into Table 1. Before filling, the terminal obtains the target table, that is, Table 1, and determines from the headers of Table 1 the target header associated with each content keyword. Here the target headers are document content title, application date, application number, applicant, and inventor. For each content keyword, the terminal then fills the extracted information into the position corresponding to the associated target header. The positions under the issue date and authorization date headers of Table 1 do not need to be filled.
Table 1:
Document title | Application date | Issue date | Authorization date | Application number | Applicant | Inventor
For another example, when the user sorts personnel documents, the content keywords preset by the user may be employee name, date of birth, educational background, graduating institution, home address, contact information, and relatives' information. Suppose Table 2 is the target table in the summary document, and the terminal needs to fill the information corresponding to the content keywords in the documents to be sorted into Table 2. Before filling, the terminal obtains the target table, that is, Table 2, and determines from the headers of Table 2 the target header associated with each content keyword. Here the target headers are employee name, date of birth, educational background, graduating institution, home address, contact information, and relatives' information. For each content keyword, the terminal then fills the extracted information into the position corresponding to the associated target header.
Table 2:
Employee name | Date of birth | Educational background | Graduating institution | Home address | Contact information | Relatives' information
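The header matching used in step 103 can be illustrated with a short sketch. The helper name `build_row` is hypothetical, and matching headers to keywords by case-insensitive string equality is an assumption; the patent only says that each content keyword is associated with a target header.

```python
from typing import Dict, List


def build_row(headers: List[str], extracted: Dict[str, str]) -> List[str]:
    """Place each extracted value under the header matching its keyword.

    Headers with no matching keyword are left empty, as with the issue
    date and authorization date columns in the Table 1 example.
    """
    normalised = {key.strip().lower(): value for key, value in extracted.items()}
    return [normalised.get(header.strip().lower(), "") for header in headers]


# Usage with the Table 1 headers; the values are made up for illustration.
table_1_headers = [
    "Document title", "Application date", "Issue date", "Authorization date",
    "Application number", "Applicant", "Inventor",
]
row = build_row(table_1_headers, {
    "document title": "Example title",
    "application date": "2018.03.30",
    "applicant": "Example Co",
})
```

The resulting row has one cell per header, so it can be written directly into the target table whatever order the keywords were extracted in.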
In one implementation, the terminal scans the current document among the documents to be sorted, acquires from it the information corresponding to each of the content keywords, and places that information in a cache space. The terminal then fills the information in the cache space into the position in the summary document that matches each content keyword, and judges whether the current document is the last of the documents to be sorted. If not, the terminal scans the next document; if so, the terminal ends the scan.
It should be noted that the target table in the summary document is not limited to a table in an Excel document or a Word document, and that a plurality of tables may exist in the summary document. The target table is set and selected by the user, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, a terminal determines a plurality of content keywords, acquires the documents to be sorted according to a set target path, scans the documents to be sorted, extracts from them the information corresponding to each of the content keywords, and fills the extracted information into the position in a summary document that matches each content keyword. With this method, documents can be sorted automatically and according to rules set by the user, which replaces tedious, error-prone manual operation and improves working efficiency.
Please refer to fig. 2, which is a flowchart illustrating another document sorting method according to an embodiment of the present invention. The document sorting method described in this embodiment includes the following steps:
201: acquiring a target table in the summary document, and determining each header of the target table as a content keyword.
The target table in the summary document can be set by the user, and is not limited to a table in an Excel document or a Word document.
Specifically, the terminal may obtain a target table in the summary document and determine each header of the target table as a content keyword. For example, if Table 1 is the target table in the summary document, its headers, namely document title, application date, issue date, authorization date, application number, applicant, and inventor, are determined as the content keywords.
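As a rough sketch of step 201, the headers of the target table can be read and used directly as the content keywords. To stay within the Python standard library, the sketch assumes the target table is available as CSV text; the patent describes Word or Excel summary documents, which would need a suitable reader instead. The function name `keywords_from_header` is hypothetical.

```python
import csv
import io
from typing import List


def keywords_from_header(summary_csv: str) -> List[str]:
    """Return the non-empty cells of the first row as content keywords."""
    reader = csv.reader(io.StringIO(summary_csv))
    header = next(reader, [])  # an empty table yields no keywords
    return [cell.strip() for cell in header if cell.strip()]


keywords = keywords_from_header("Document title,Application date,Application number\n")
```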
202: acquiring the documents to be sorted according to the set target path.
Specifically, the terminal can obtain preset document keywords, which include one or more of document type (txt, xls, xlsx, doc, docx, pptx, and the like), document name, and document editing time. The terminal scans all documents under the set target path, screens out the documents that match the document keywords, and determines those documents as the documents to be sorted.
For example, as shown in FIG. 3, the user inputs the target path of the documents to be sorted in the path input box used for searching files, inputs the content keywords in the content keyword input box, inputs the document keywords in the document keyword input box, and clicks the "ok" button. The terminal then obtains a document sorting request from the user, which includes the target path, the content keywords, and the document keywords entered in the parameter setting area. According to the target path, the terminal determines item by item whether each entry under the path is a document or a folder. If folders exist, the terminal continues searching inside them until no folders remain and only documents are left. If a plurality of documents exist, the terminal screens them according to the document keywords and determines the documents that match the document keywords as the documents to be sorted.
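The folder traversal and document-keyword screening in step 202 might look like the following sketch. The function name `find_documents` and its parameters are hypothetical. The sketch assumes the three document keywords (type, name, editing time) are applied as an extension set, a case-insensitive substring of the file name, and a minimum modification time.

```python
import os
from datetime import datetime
from typing import List, Optional, Set


def find_documents(
    target_path: str,
    extensions: Optional[Set[str]] = None,
    name_contains: Optional[str] = None,
    edited_after: Optional[datetime] = None,
) -> List[str]:
    """Walk target_path recursively and keep files matching every given filter."""
    matches = []
    # os.walk descends into sub-folders until only files remain.
    for folder, _subfolders, file_names in os.walk(target_path):
        for file_name in file_names:
            full_path = os.path.join(folder, file_name)
            extension = os.path.splitext(file_name)[1].lower().lstrip(".")
            if extensions and extension not in extensions:
                continue
            if name_contains and name_contains.lower() not in file_name.lower():
                continue
            if edited_after is not None:
                edited = datetime.fromtimestamp(os.path.getmtime(full_path))
                if edited < edited_after:
                    continue
            matches.append(full_path)
    return sorted(matches)
```

A call such as `find_documents(path, extensions={"doc", "docx"})` would return only Word documents under the path, including those in nested folders.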
203: scanning the documents to be sorted, and extracting from them the information corresponding to each of the plurality of content keywords.
Specifically, for the implementation of step 203, refer to the description of step 102 in the foregoing embodiment; details are not repeated here.
204: filling the extracted information corresponding to the plurality of content keywords into the position in the summary document that matches each content keyword.
Specifically, the terminal may obtain a target table in the summary document, determine from the headers of the target table the target header associated with each content keyword, and, for each content keyword, fill the extracted information corresponding to that keyword into the position corresponding to its associated target header.
In one implementation, the terminal scans the entire content of a document to be sorted according to the content keywords set by the user, acquires the information corresponding to those keywords, and stores it in a cache for later use. The terminal does not store the information for all the documents to be sorted in the cache at once. Instead, it processes each document as it is scanned: taking one document to be sorted as the unit, the terminal fills the information corresponding to each content keyword in the cache into the position of the corresponding target header in the summary document. For example, suppose the headers in Table 1 are document title, application date, issue date, authorization date, application number, applicant, and inventor. The terminal scans one document to be sorted, extracts the information corresponding to the content keywords, and fills it into the positions of the corresponding target headers in the summary document, rather than scanning all the documents and extracting all the information before filling. This avoids running out of cache when there are too many documents or their content is too large. After all the documents to be sorted have been searched and matched, the terminal prompts the user that the search is complete, that is, the task status box in the status indication area shows 100% as shown in FIG. 3, and at this point the target table in the summary document has been filled.
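The one-document-at-a-time processing described above can be sketched as a loop that holds only the current document's values in memory and writes them out before moving on. The names `sort_documents`, `extract`, and `on_progress` are hypothetical, and writing the summary as a CSV file is an assumption made to keep the sketch within the standard library.

```python
import csv
from typing import Callable, Dict, List, Optional


def sort_documents(
    doc_paths: List[str],
    headers: List[str],
    extract: Callable[[str], Dict[str, str]],
    summary_path: str,
    on_progress: Optional[Callable[[int], None]] = None,
) -> None:
    """Extract and fill one document at a time, reporting percent progress."""
    total = len(doc_paths)
    with open(summary_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        for index, path in enumerate(doc_paths, start=1):
            cache = extract(path)  # holds the values of one document only
            writer.writerow([cache.get(header, "") for header in headers])
            cache.clear()  # release the cache before the next document
            if on_progress is not None:
                on_progress(round(100 * index / total))
```

Passing `on_progress=print` would print the percentage after each document, similar in spirit to the status indication area of FIG. 3.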
205: adding an identifier to each sorted document among the documents to be sorted, recording the sorting time of the summary document, and periodically acquiring the editing time of the documents under the target path.
For example, assuming that Table 1 has been completely filled, the terminal adds an identifier to each sorted document recorded in Table 1. The identifier specifies the position of a sorted document's information in the summary document. If the information corresponding to sorted document A is filled into the second row of Table 1 of the summary document, the identifier of sorted document A is the second row of Table 1 of the summary document.
Specifically, the terminal may obtain the editing time of the document under the target path at a fixed time each day, for example, 17:00 each day.
206: when there is a target document whose editing time is later than the sorting time, extracting target information corresponding to the content keywords from the target document.
207: replacing the information at the position in the summary document corresponding to the identifier of the target document with the target information.
Specifically, a sorted document may contain errors in its content for various reasons, for example an incorrect date, and the user may then modify the content of that document. When this happens, the editing time of the document, for example its modification date, changes. If the changed editing time is later than the sorting time, the information from that document already filled into the summary document may be incorrect.
In one implementation, when the terminal detects a document under the target path that has no identifier and whose editing time is later than the sorting time, the terminal scans the document, extracts the information corresponding to each of the content keywords, fills the extracted information into the position in the summary document that matches each content keyword, and adds an identifier to the document.
It can be seen that, after the sorting time of the summary document is recorded, if a target document whose editing time is later than the sorting time is detected, the target information corresponding to the content keywords is extracted from the target document and replaces the information at the position in the summary document corresponding to the identifier of the target document.
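Steps 205 to 207, together with the case of a document that has no identifier yet, can be sketched as a single refresh pass. All names here (`refresh_summary`, `identifiers`, `extract_row`) are hypothetical. The sketch assumes an identifier is simply the zero-based row index of a document in the target table, and that times are comparable values such as timestamps. Scheduling the pass (for example at 17:00 each day) is left out.

```python
from typing import Callable, Dict, List


def refresh_summary(
    rows: List[List[str]],
    identifiers: Dict[str, int],
    sorted_at: float,
    edit_times: Dict[str, float],
    extract_row: Callable[[str], List[str]],
) -> List[str]:
    """Re-extract documents edited after sorted_at; return the updated paths."""
    updated = []
    for path, edited_at in edit_times.items():
        if edited_at <= sorted_at:
            continue  # unchanged since the summary was sorted
        new_row = extract_row(path)
        if path in identifiers:
            # Sorted before: replace the stale row located by the identifier.
            rows[identifiers[path]] = new_row
        else:
            # Not yet sorted: fill a new row and add an identifier.
            rows.append(new_row)
            identifiers[path] = len(rows) - 1
        updated.append(path)
    return updated
```

Only documents whose editing time is later than the recorded sorting time are re-scanned, so an unchanged document costs one timestamp comparison.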
In one implementation, after the summary document has been sorted, the terminal may further scan the content of the target table. If rows contain the same information under a set header, the terminal merges those rows and modifies the document identifiers accordingly. The information under the set header should uniquely identify whether rows represent the same item, such as an application number or an identity card number. The set header can be set by the user.
For example, for Table 1, the user sets the application number as the set header. When scanning the content of the target table, the terminal finds that the application numbers in the first and third rows of Table 1 are the same, so it merges the information in the two rows and fills the merged information into the first row. Which row receives the merged information can be set by the user; it may be the first row or the third row. Because the position of a sorted document's information in Table 1 changes after merging, the identifier of that document needs to be modified. For example, if the merged information is filled into the first row, the identifier of the sorted document whose information was in the third row before merging should be changed from the third row of Table 1 to the first row of Table 1.
In the embodiment of the invention, the terminal acquires the target table in the summary document and determines each header of the target table as a content keyword. The terminal then acquires the documents to be sorted according to a set target path, scans them, extracts the information corresponding to each of the content keywords, and fills the extracted information into the position in the summary document that matches each content keyword. Further, the terminal adds an identifier to each sorted document, records the sorting time of the summary document, and periodically acquires the editing time of the documents under the target path. When there is a target document whose editing time is later than the sorting time, the terminal extracts the target information corresponding to the content keywords from the target document and uses it to replace the information at the position in the summary document corresponding to the identifier of the target document. With this method, documents can be sorted automatically and according to rules set by the user, which replaces tedious, error-prone manual operation and improves working efficiency.
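The row merging and identifier update might be sketched as below. The function name `merge_duplicates` is hypothetical, and several choices are assumptions not stated in the patent: the merged information is kept in the first matching row, blank cells in that row are filled from the duplicate, the duplicate row is removed so later rows shift up, and identifiers are zero-based row indices.

```python
from typing import Dict, List, Tuple


def merge_duplicates(
    headers: List[str],
    rows: List[List[str]],
    identifiers: Dict[str, int],
    key_header: str,
) -> Tuple[List[List[str]], Dict[str, int]]:
    """Merge rows sharing a value under key_header and remap identifiers."""
    key_col = headers.index(key_header)
    first_seen: Dict[str, int] = {}  # key value -> row index in merged table
    remap: Dict[int, int] = {}       # old row index -> new row index
    merged: List[List[str]] = []
    for old_index, row in enumerate(rows):
        key = row[key_col]
        if key and key in first_seen:
            target = merged[first_seen[key]]
            # Fill blank cells of the kept row from the duplicate row.
            for col, value in enumerate(row):
                if not target[col]:
                    target[col] = value
            remap[old_index] = first_seen[key]
        else:
            merged.append(list(row))
            if key:
                first_seen[key] = len(merged) - 1
            remap[old_index] = len(merged) - 1
    new_identifiers = {doc: remap[index] for doc, index in identifiers.items()}
    return merged, new_identifiers
```

Rows with an empty key are never merged, since an empty cell cannot uniquely identify an item.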
Please refer to FIG. 4, which is a schematic structural diagram of a document sorting apparatus according to an embodiment of the present invention. The document sorting apparatus includes:
an obtaining module 401, configured to determine multiple content keywords, and obtain a document to be sorted according to a set target path;
an extracting module 402, configured to scan the documents to be sorted, and extract information corresponding to the content keywords from the documents to be sorted, respectively;
a filling module 403, configured to fill the extracted information corresponding to the content keywords into the position, in the summary document, matching with each content keyword.
In an implementation manner, the extracting module 402 is specifically configured to:
acquiring the names of the documents to be sorted, and respectively extracting information corresponding to the content keywords from the names of the documents to be sorted;
when it is detected that the plurality of content keywords include a target content keyword for which no corresponding information has been extracted, scanning the content of the document to be sorted;
and extracting information corresponding to the target content keywords from the content of the document to be sorted.
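The name-first strategy with a fallback to the document content might look like the sketch below. The naming convention (underscore-separated "keyword=value" tokens in the file name) and the "keyword: value" line format are hypothetical, as are all function names; the text does not specify how information is encoded in names or content.

```python
import re


def extract_from_name(name, keywords):
    """Parse assumed 'keyword=value' tokens, separated by underscores, from the file stem."""
    stem = name.rsplit(".", 1)[0]
    found = {}
    for token in stem.split("_"):
        key, sep, value = token.partition("=")
        if sep and key in keywords:
            found[key] = value
    return found


def extract_from_content(text, keywords):
    """Find assumed 'keyword: value' lines in the document content."""
    found = {}
    for kw in keywords:
        m = re.search(rf"^{re.escape(kw)}[ \t]*:[ \t]*(.+)$", text, flags=re.MULTILINE)
        if m:
            found[kw] = m.group(1).strip()
    return found


def extract_info(name, read_content, keywords):
    """Try the name first; scan the content only for keywords that are still missing."""
    info = extract_from_name(name, keywords)
    missing = [kw for kw in keywords if kw not in info]
    if missing:  # the content is read lazily, only when the name was not sufficient
        info.update(extract_from_content(read_content(), missing))
    return info
```

Passing `read_content` as a callable means a document whose name already covers every keyword is never opened, which is the saving this implementation manner appears to aim for.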
In an implementation manner, the obtaining module 401 is specifically configured to:
acquiring preset document keywords, wherein the document keywords comprise one or more of document types, document names and document editing time;
scanning all documents under the set target path, and screening out documents matched with the document keywords from all the documents;
and determining the document matched with the document keyword as a document to be sorted.
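Screening by document keywords can be sketched with the standard library. Here the document type is approximated by the file extension and the editing time by the file system modification time; both are assumptions, and `select_documents` and its parameters are hypothetical names.

```python
from datetime import datetime
from pathlib import Path


def select_documents(target_path, doc_type=None, name_contains=None, edited_after=None):
    """Return the files under target_path that match every document keyword that was given."""
    selected = []
    for path in sorted(Path(target_path).rglob("*")):
        if not path.is_file():
            continue
        # document type, approximated by the file extension (e.g. ".txt")
        if doc_type is not None and path.suffix.lower() != doc_type.lower():
            continue
        # document name, matched as a substring of the file stem
        if name_contains is not None and name_contains not in path.stem:
            continue
        # document editing time, approximated by the modification time
        if edited_after is not None:
            edited = datetime.fromtimestamp(path.stat().st_mtime)
            if edited <= edited_after:
                continue
        selected.append(path)
    return selected
```

Any keyword left as `None` is not applied, which corresponds to "one or more of" the document type, name, and editing time.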
In an implementation manner, the filling module 403 is specifically configured to:
acquiring a target table in the summary document, and determining, from the headers of the target table, a target header associated with each content keyword;
and for each content keyword, filling the extracted information corresponding to the content keyword into the corresponding position of the target header associated with the content keyword.
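A minimal sketch of filling by header association follows. It assumes the target table is held in memory as a list of rows whose first row contains the headers, and that a keyword is associated with a header by a case-insensitive exact match; the actual association rule is not specified here, and `fill_row` is a hypothetical name.

```python
def fill_row(table, info):
    """Append one row to `table` (list of lists; row 0 holds the headers).

    `info` maps content keywords to extracted values. Each value goes into
    the column whose header matches the keyword; unmatched keywords are ignored.
    """
    headers = table[0]
    column_of = {h.strip().lower(): i for i, h in enumerate(headers)}
    row = [""] * len(headers)
    for keyword, value in info.items():
        col = column_of.get(keyword.strip().lower())
        if col is not None:
            row[col] = value
    table.append(row)
    return row
```

For example, with headers `["Name", "Date", "Amount"]`, the information `{"amount": "250", "name": "Li"}` yields the row `["Li", "", "250"]`.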
In an implementation manner, the obtaining module 401 is specifically configured to:
acquiring a target table in the summary document;
and determining each header of the target table as a content keyword.
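Deriving the content keywords from the headers can be illustrated as below, under the assumption that the summary document is available as CSV text whose first row is the header row. The function name is hypothetical, and a real summary document (for example a spreadsheet) would need a different reader.

```python
import csv
import io


def keywords_from_summary(summary_csv_text):
    """Use the non-empty cells of the first row of the target table as content keywords."""
    reader = csv.reader(io.StringIO(summary_csv_text))
    header = next(reader, [])  # an empty summary yields no keywords
    return [cell.strip() for cell in header if cell.strip()]
```

With this approach the user controls what is extracted simply by editing the header row of the summary table.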
In an implementation manner, the extracting module 402 is specifically configured to scan a current document in the documents to be sorted, respectively obtain information corresponding to the plurality of content keywords from the current document, and extract the obtained information corresponding to the plurality of content keywords into a cache space;
the filling module 403 is specifically configured to fill information corresponding to the content keywords in the current document in the cache space to a position in a summary document that matches each content keyword, and determine whether the current document is a last document of the documents to be sorted, and if not, scan a next document of the current document; if yes, the scanning is finished.
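The scan, cache, fill, and last-document check can be sketched as one loop. Documents are represented here as plain strings and `extract` is a caller-supplied function; both are assumptions made for illustration, as is the name `sort_all`.

```python
def sort_all(documents, keywords, extract):
    """documents: list of document texts; extract: callable(text, keyword) -> value or None."""
    summary = [list(keywords)]            # row 0 is the header row
    index = 0
    while index < len(documents):
        current = documents[index]
        cache = {}                        # cache space for the current document
        for kw in keywords:
            value = extract(current, kw)
            if value is not None:
                cache[kw] = value
        # fill the cached information into the positions matching each keyword
        summary.append([cache.get(kw, "") for kw in keywords])
        if index == len(documents) - 1:   # last document: scanning ends
            break
        index += 1                        # otherwise scan the next document
    return summary
```

Filling one document at a time from a small cache keeps memory use independent of the number of documents.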
In an implementation manner, the obtaining module 401 is further configured to add an identifier to a sorted document in the document to be sorted, where the identifier is used to mark a position of information of the sorted document in the summarized document, record sorting time of the summarized document, and periodically obtain editing time of the document under the target path;
the extracting module 402 is further configured to extract target information corresponding to the content keyword from a target document when the target document exists, where the editing time is later than the sorting time;
the filling module 403 is further configured to replace the information at the position in the summary document corresponding to the identifier of the target document with the target information.
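The update step can be sketched as follows, assuming that each identifier maps to a row index in the summary and that editing times are comparable timestamps. The names `refresh_summary`, `row_of`, and `extract_row` are hypothetical.

```python
def refresh_summary(summary_rows, row_of, sorting_time, edit_times, extract_row):
    """Re-extract documents edited after the last sorting and overwrite their rows.

    summary_rows: list of rows (row 0 is the header row)
    row_of:       identifier -> index of that document's row in summary_rows
    sorting_time: timestamp recorded when the summary was last sorted
    edit_times:   identifier -> latest editing timestamp of the document
    extract_row:  callable(identifier) -> freshly extracted row
    """
    updated = []
    for doc_id, edited in edit_times.items():
        if edited > sorting_time and doc_id in row_of:
            summary_rows[row_of[doc_id]] = extract_row(doc_id)
            updated.append(doc_id)
    return updated
```

In a periodic job, the caller would record a new sorting time after each call so that unchanged documents are not extracted again on the next pass.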
It can be understood that the functions of the functional modules of the document sorting apparatus described in this embodiment of the present invention may be implemented according to the method embodiments described in fig. 1 or fig. 2; for the specific implementation process, refer to the related description of the method embodiments of fig. 1 or fig. 2, which is not repeated here.
In this embodiment of the present invention, the obtaining module 401 determines a plurality of content keywords and obtains the documents to be sorted according to a set target path; the extracting module 402 scans the documents to be sorted and extracts the information corresponding to the plurality of content keywords from them; and the filling module 403 fills the extracted information into the positions in the summary document that match the respective content keywords. In this way, documents can be sorted automatically and according to rules set by the user, which avoids tedious and error-prone manual operation and improves working efficiency.
Please refer to fig. 5, which is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal described in this embodiment includes: a processor 501 and a memory 502. The processor 501 and the memory 502 are connected by a bus.
The processor 501 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may include both read-only memory and random access memory, and provides program instructions and data to the processor 501. A portion of the memory 502 may also include non-volatile random access memory. When calling the program instructions, the processor 501 is configured to perform:
determining a plurality of content keywords, and acquiring a document to be sorted according to a set target path;
scanning the documents to be sorted, and respectively extracting information corresponding to the content keywords from the documents to be sorted;
and filling the extracted information corresponding to the content keywords into the position matched with each content keyword in the summary document.
In one implementation, the processor 501 is specifically configured to:
acquiring the names of the documents to be sorted, and respectively extracting information corresponding to the content keywords from the names of the documents to be sorted;
when it is detected that the plurality of content keywords include a target content keyword for which no corresponding information has been extracted, scanning the content of the document to be sorted;
and extracting information corresponding to the target content keywords from the content of the document to be sorted.
In one implementation, the processor 501 is specifically configured to:
acquiring preset document keywords, wherein the document keywords comprise one or more of document types, document names and document editing time;
scanning all documents under the set target path, and screening out documents matched with the document keywords from all the documents;
and determining the document matched with the document keyword as a document to be sorted.
In one implementation, the processor 501 is specifically configured to:
acquiring a target table in the summary document, and determining, from the headers of the target table, a target header associated with each content keyword;
and for each content keyword, filling the extracted information corresponding to the content keyword into the corresponding position of the target header associated with the content keyword.
In one implementation, the processor 501 is specifically configured to:
acquiring a target table in the summary document;
and determining each header of the target table as a content keyword.
In one implementation, the processor 501 is specifically configured to:
scanning a current document in the documents to be sorted, and respectively acquiring information corresponding to the content keywords from the current document;
extracting the acquired information corresponding to the plurality of content keywords to a cache space;
filling information corresponding to the content keywords of the current document in the cache space into a position matched with each content keyword in a summary document;
judging whether the current document is the last document of the documents to be sorted or not, and if not, scanning the next document of the current document; if yes, the scanning is finished.
In one implementation, the processor 501 is further configured to:
adding an identifier to the sorted documents in the documents to be sorted, wherein the identifier is used for marking the positions of the information of the sorted documents in the summary document;
recording the sorting time of the summary document, and periodically acquiring the editing time of the documents under the target path;
when a target document with the editing time later than the sorting time exists, extracting target information corresponding to the content keywords from the target document;
and replacing the information at the position in the summary document corresponding to the identifier of the target document with the target information.
In a specific implementation, the processor 501 and the memory 502 described in this embodiment of the present invention may execute the implementation manner described in the document sorting method provided in fig. 1 or fig. 2 in this embodiment of the present invention, and may also execute the implementation manner of the document sorting apparatus described in fig. 4 in this embodiment of the present invention, which is not described herein again.
In this embodiment of the present invention, the processor 501 may determine a plurality of content keywords, obtain the documents to be sorted according to a set target path, scan the documents to be sorted, extract the information corresponding to the plurality of content keywords from them, and fill the extracted information into the positions in the summary document that match the respective content keywords. In this way, documents can be sorted automatically and according to rules set by the user, which avoids tedious and error-prone manual operation and improves working efficiency.
An embodiment of the present invention further provides a computer storage medium in which program instructions are stored; when executed, the program may perform some or all of the steps of the document sorting method in the embodiment corresponding to fig. 1 or fig. 2.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present invention.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The document sorting method, apparatus, terminal and computer-readable storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the present invention, and the description of the embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as a limitation on the present invention.

Claims (10)

1. A document sorting method, comprising:
determining a plurality of content keywords, and acquiring a document to be sorted according to a set target path;
scanning the documents to be sorted, and respectively extracting information corresponding to the content keywords from the documents to be sorted;
and filling the extracted information corresponding to the content keywords into the position matched with each content keyword in the summary document.
2. The method according to claim 1, wherein the scanning the documents to be sorted and respectively extracting information corresponding to the plurality of content keywords from the documents to be sorted comprises:
acquiring the names of the documents to be sorted, and respectively extracting information corresponding to the content keywords from the names of the documents to be sorted;
when it is detected that the plurality of content keywords include a target content keyword for which no corresponding information has been extracted, scanning the content of the document to be sorted;
and extracting information corresponding to the target content keywords from the content of the document to be sorted.
3. The method according to claim 1, wherein the acquiring the document to be sorted according to the set target path comprises:
acquiring preset document keywords, wherein the document keywords comprise one or more of document types, document names and document editing time;
scanning all documents under the set target path, and screening out documents matched with the document keywords from all the documents;
and determining the document matched with the document keyword as a document to be sorted.
4. The method according to any one of claims 1 to 3, wherein the filling the extracted information corresponding to the plurality of content keywords into the summary document at the position matched with each content keyword respectively comprises:
acquiring a target table in the summary document, and determining, from the headers of the target table, a target header associated with each content keyword;
and for each content keyword, filling the extracted information corresponding to the content keyword into the corresponding position of the target header associated with the content keyword.
5. The method of claim 4, wherein determining the plurality of content keywords comprises:
acquiring a target table in the summary document;
and determining each header of the target table as a content keyword.
6. The method according to claim 1, wherein the scanning the documents to be sorted and respectively extracting information corresponding to the plurality of content keywords from the documents to be sorted comprises:
scanning a current document in the documents to be sorted, and respectively acquiring information corresponding to the content keywords from the current document;
extracting the acquired information corresponding to the plurality of content keywords to a cache space;
the filling the extracted information corresponding to the content keywords into the summary document at the position matched with each content keyword respectively comprises:
filling information corresponding to the content keywords of the current document in the cache space into a position matched with each content keyword in a summary document;
after the extracted information corresponding to the content keywords is respectively filled in the position matched with each content keyword in the summary document, the method further comprises the following steps:
judging whether the current document is the last document of the documents to be sorted or not, and if not, scanning the next document of the current document; if yes, the scanning is finished.
7. The method according to claim 1, wherein after the extracted information corresponding to the plurality of content keywords is respectively filled in a position matching each content keyword in the summary document, the method further comprises:
adding an identifier to the sorted documents in the documents to be sorted, wherein the identifier is used for marking the positions of the information of the sorted documents in the summary document;
recording the sorting time of the summary document, and periodically acquiring the editing time of the documents under the target path;
when a target document with the editing time later than the sorting time exists, extracting target information corresponding to the content keywords from the target document;
and replacing the information at the position in the summary document corresponding to the identifier of the target document with the target information.
8. A document sorting apparatus, characterized by comprising:
the acquisition module is used for determining a plurality of content keywords and acquiring the document to be sorted according to a set target path;
the extraction module is used for scanning the documents to be sorted and respectively extracting information corresponding to the content keywords from the documents to be sorted;
and the filling module is used for respectively filling the extracted information corresponding to the content keywords into the position, matched with each content keyword, in the summary document.
9. A terminal, characterized in that it comprises a processor and a memory, said processor and memory being interconnected, wherein said memory is adapted to store a computer program comprising program instructions, said processor being configured to invoke said program instructions to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN201910820963.9A 2019-08-29 2019-08-29 Document sorting method, device, terminal and computer readable storage medium Active CN110688349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910820963.9A CN110688349B (en) 2019-08-29 2019-08-29 Document sorting method, device, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110688349A (en) 2020-01-14
CN110688349B (en) 2023-05-26

Family

ID=69108778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910820963.9A Active CN110688349B (en) 2019-08-29 2019-08-29 Document sorting method, device, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110688349B (en)

Also Published As

Publication number Publication date
CN110688349B (en) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant