CN116894005A - File processing method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN116894005A CN202310907058.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
Abstract
The application discloses a file processing method, a file processing device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a file database table corresponding to a folder to be processed; determining the character features corresponding to each file to be processed according to at least one item of file information of each file to be processed in the file database table, the files to be processed being stored in the folder to be processed; determining a classification result for each file to be processed according to the character features and a pre-trained recognition model; and circulating the files to be processed according to the classification result and the file database table. In this technical scheme, recognizing character features improves the accuracy of file classification, and circulating files according to the classification result improves overall file processing efficiency.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for processing a file, an electronic device, and a storage medium.
Background
As every industry becomes increasingly digitized, enterprises must handle a large amount of information in normal operation, such as text, documents, pictures and compressed archives. This information has to be organized, interpreted, processed and circulated inside the enterprise, so every enterprise faces growing challenges in information processing and management.
Currently, most industries still process the various kinds of information arising in enterprise operation manually: the staff concerned judge each item of acquired information by hand, then summarize and circulate the processed information through terminal equipment and the like. This traditional manual approach is inefficient and error-prone, and cannot meet the needs of the various industries.
Disclosure of Invention
The application provides a file processing method, a file processing device, electronic equipment and a storage medium, so as to improve the efficiency and the accuracy of file processing.
According to an aspect of the present application, there is provided a file processing method, the method including:
acquiring a file database table corresponding to a folder to be processed;
determining the corresponding character features of each file to be processed according to at least one file information of each file to be processed in a file database table; the file to be processed is stored in a folder to be processed;
determining a classification result of the file to be processed according to the character features and a pre-trained recognition model;
and according to the classification result and the file database table, circulating the files to be processed.
According to another aspect of the present application, there is provided a document processing apparatus including:
the library table acquisition module is used for acquiring a file database table corresponding to the folder to be processed;
the characteristic determining module is used for determining the character characteristics corresponding to each file to be processed according to at least one file information of each file to be processed in the file database table; the file to be processed is stored in a folder to be processed;
the file classification module is used for determining a classification result of the file to be processed according to the character features and the pre-trained recognition model;
and the file circulation module is used for circulating the files to be processed according to the classification result and the file database table.
According to another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the file processing method according to any one of the embodiments of the present application.
According to another aspect of the present application, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a file processing method according to any embodiment of the present application.
According to the technical scheme, the text characteristics of each file are determined by acquiring the file database table corresponding to the folder to be processed, the text characteristics are identified and classified, and each file is circulated according to the classification result. Through the recognition of the character features, the accuracy of file classification is improved, meanwhile, file circulation is carried out according to classification results, and the overall file processing efficiency is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for processing a file according to a first embodiment of the present application;
FIG. 2 is a flowchart of a file processing method according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of a file classification and circulation scheme applicable to a third embodiment of the present application;
FIG. 4 is a schematic diagram of a document processing apparatus according to a fourth embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device implementing a file processing method according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a file processing method according to an embodiment of the present application, where the method may be performed by a file processing device, and the file processing device may be implemented in hardware and/or software, and the file processing device may be configured in an electronic device. As shown in fig. 1, the method includes:
S110, acquiring a file database table corresponding to the folder to be processed.
The folder to be processed may be a folder whose files are to be classified and circulated, that is, the folder to be processed stores the files to be processed. The folder to be processed may be selected by a user. The files processed by the embodiment of the application may be documents, pictures, compressed archives and the like, and the embodiment of the application does not limit the type of file processed. The method can be applied to any scene that requires file classification and circulation, whatever the form of the files. One example is the various files generated during the operation of a nuclear power plant, where very high safety requirements place correspondingly higher demands on file processing. Acquiring the files to be processed and the corresponding file database table, as well as the subsequent processing of the files, may be based on RPA (Robotic Process Automation) technology; this is only an example and is not limited in the embodiment of the present application.
The file database table may be a data table that summarizes the information of all files in the folder to be processed, for example in the form of an Excel sheet. Because the file database table can be constructed in advance, the corresponding file database table can be acquired directly after the user selects the folder to be processed.
In an alternative embodiment, before acquiring the file database table corresponding to the folder to be processed, the method may include: acquiring the folder to be processed, taking the folder to be processed as a primary folder, and taking each subfolder in the folder to be processed as a secondary folder; determining file information of each file to be processed according to the primary folder and the secondary folder; and constructing the file database table according to the file information.
The folder to be processed serves as the primary folder, and each subfolder stored in it serves as a secondary folder. A secondary folder may itself contain further folders; in that case, all files and folders inside a secondary folder may be treated as a whole belonging to that secondary folder. Processing only these two levels improves file processing efficiency. For example, all files directly in the primary folder are treated as main files, and everything in a secondary folder is treated as attachments.
The file information of a file to be processed may be information that characterizes the content, format, storage path and the like of the file. By reading the primary folder and the secondary folders (for example, traversing the names, formats and storage paths of the files and folders), the information of the main files and the attachments is acquired, recorded uniformly, and stored as the file database table.
Further, determining the file information of each file to be processed according to the primary folder and the secondary folder may include: determining file basic information according to the primary folder; determining portable format (PDF) file information according to the secondary folder; determining a storage path of each file to be processed according to the primary folder and the secondary folder; and taking the file basic information, the portable format file information and the storage path as the file information.
Because the files in the primary folder are main files, the basic information of each main file can be obtained by reading all the files in the primary folder. The file basic information may be used to characterize the content, type, format and the like of these files, and may include: a file title, a file reference number and file content data. The file reference number may be a unique number inherent to each file. The file content data may be used to characterize the specific content of the file, such as a summary of the file content.
A portable format file (Portable Document Format, abbreviated as PDF file) is a general file distribution format that is not affected by the running environment and is known for its high stability and versatility. By scanning the secondary folders, the PDF files contained in the attachments are determined, and the file information of these PDF files (for example, PDF file names, file numbers and content summaries) is acquired. Because of this stability and versatility, more and more users adopt PDF files as carriers for transmitting text, pictures and other information. For example, in a nuclear power plant, where large amounts of text, pictures and even hyperlinks need to be transmitted and reproduced reliably, PDF files are a suitable choice.
By scanning the primary folder and the secondary folders, the storage paths of the different files can be determined, including the storage paths of both the main files and the attachments. Finally, the information of the various files (the main files and the PDF files in the attachments) and their storage paths are saved as the file information, from which the file database table is constructed.
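The folder scan described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names `build_file_table` and `save_file_table` and the column names are hypothetical, the table is written as CSV as a stand-in for the Excel sheet mentioned above, and the file reference number and content summary are left out because the text does not specify how they are read.

```python
import csv
from pathlib import Path

# Hypothetical column names; the application does not prescribe a schema.
FIELDNAMES = ["role", "title", "format", "secondary_folder", "path"]


def build_file_table(root: Path) -> list[dict]:
    """Return one row per recorded file in the folder to be processed.

    Files directly under `root` (the primary folder) are main files.
    PDF files anywhere inside a subfolder of `root` are recorded as
    attachments of that secondary folder.
    """
    rows = []
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            rows.append({
                "role": "main",
                "title": entry.stem,
                "format": entry.suffix.lower(),
                "secondary_folder": "",
                "path": str(entry.resolve()),
            })
        elif entry.is_dir():
            # Everything below a secondary folder is handled as one unit.
            for item in sorted(entry.rglob("*")):
                if item.is_file() and item.suffix.lower() == ".pdf":
                    rows.append({
                        "role": "attachment",
                        "title": item.stem,
                        "format": ".pdf",
                        "secondary_folder": entry.name,
                        "path": str(item.resolve()),
                    })
    return rows


def save_file_table(rows: list[dict], table_path: Path) -> None:
    """Write the rows to a CSV file (a stand-in for the Excel sheet)."""
    with table_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Treating each secondary folder as a single unit keeps the scan to two logical levels, which matches the efficiency argument made for the primary and secondary folder split.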
According to the embodiments, the files of different types in the primary folder and the secondary folder are respectively identified by classifying the folders, and the file information and the storage path of the files are determined, so that basis and support are provided for constructing a file database table and subsequent file classification and circulation, and a foundation is further laid for improving the classification accuracy and the file flow efficiency.
S120, determining the corresponding text features of each file to be processed according to at least one file information of each file to be processed in a file database table; the files to be processed are stored in the folders to be processed.
The files to be processed are the files to be classified and circulated, and are stored in the folder to be processed. The text features can be recognized from the file information. For example, OCR (Optical Character Recognition) technology can be used to recognize the characters in the file title, file reference number and file content data of each file, another text feature extraction algorithm can then extract features from the recognized characters, and the key characters among them are selected as the text features. The text feature extraction algorithm may be any such algorithm in the related art, which is not limited in the embodiment of the present application.
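As a rough illustration of selecting key characters as text features, the sketch below ranks the words found in the recorded file information by frequency. It assumes the text is already machine-readable (scanned files would first need an OCR step, which is not shown), and the function name, the field names `title`, `reference_number` and `content`, and the small stop-word list are all hypothetical choices rather than anything specified by the application.

```python
import re
from collections import Counter

# Assumed stop-word list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def extract_text_features(file_info: dict, top_k: int = 5) -> list[str]:
    """Pick the most frequent informative tokens from the file information."""
    text = " ".join(
        str(file_info.get(key, ""))
        for key in ("title", "reference_number", "content")
    ).lower()
    # Keep hyphenated identifiers such as reference numbers in one piece.
    tokens = [
        token
        for token in re.findall(r"[a-z0-9][a-z0-9\-]*", text)
        if token not in STOPWORDS and len(token) > 1
    ]
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically, so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:top_k]]
```

Frequency ranking is only one simple way to choose key terms; any feature extraction algorithm from the related art could take its place.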
S130, determining a classification result of the file to be processed according to the character features and the pre-trained recognition model.
The pre-trained recognition model may be a model trained in advance for classifying files. Its input is the character features of each file to be processed, and its output is the classification matching result for that file. For example, the model may determine the circulation target corresponding to each file to be processed.
And S140, circulating the file to be processed according to the classification result and the file database table.
According to the classification result, the circulation targets of different files to be processed can be determined, and various file information (including a storage path) of the files to be processed are stored in the file database table, so that different files to be processed can be called according to the file information in the file database table, and the files can be accurately sent to the circulation targets corresponding to the classification result.
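The circulation step can be sketched as a lookup of each file's storage path followed by delivery to its target. In this hedged example, delivery is modeled as copying the file into a folder named after the target, which is an assumption: the application sends files through a document management system rather than a local directory. The function name `circulate_files` and the `title` and `path` keys are hypothetical.

```python
import shutil
from pathlib import Path


def circulate_files(
    classification: dict[str, str],
    file_table: list[dict],
    outbox: Path,
) -> dict[str, Path]:
    """Copy each classified file into a folder named after its target.

    `classification` maps a file title to its circulation target, and
    `file_table` holds rows with at least "title" and "path" keys.
    """
    paths_by_title = {row["title"]: Path(row["path"]) for row in file_table}
    delivered = {}
    for title, target in classification.items():
        source = paths_by_title.get(title)
        if source is None or not source.is_file():
            continue  # no usable storage path is recorded for this file
        destination_dir = outbox / target
        destination_dir.mkdir(parents=True, exist_ok=True)
        copied = shutil.copy2(source, destination_dir / source.name)
        delivered[title] = Path(copied)
    return delivered
```

Files without a usable storage path are skipped rather than raising an error, so one bad table row does not stop the rest of the batch.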
According to the technical scheme, the text characteristics of each file are determined by acquiring the file database table corresponding to the folder to be processed, the text characteristics are identified and classified, and each file is circulated according to the classification result. Through the recognition of the character features, the accuracy of file classification is improved, meanwhile, file circulation is carried out according to classification results, and the overall file processing efficiency is improved.
Example 2
Fig. 2 is a flowchart of a file processing method according to a second embodiment of the present application, where the determining operation of the file classification result is further refined based on the foregoing embodiments. As shown in fig. 2, the method includes:
S210, acquiring a file database table corresponding to the folder to be processed.
S220, determining the corresponding text features of each file to be processed according to at least one file information of each file to be processed in the file database table; the files to be processed are stored in the folders to be processed.
In an optional implementation manner, the determining, according to at least one file information of each file to be processed in the file database table, a text feature corresponding to each file to be processed may include: and carrying out character recognition on at least one file information according to a preset natural language processing algorithm to obtain at least one character feature.
Here, a natural language processing (NLP, Natural Language Processing) algorithm may be a technique that analyzes and recognizes text and converts natural language into a form a computer can process, such as OCR. Character recognition on the different files to be processed yields a number of character features. In general, a plurality of text features can be determined for each file to be processed, which improves the accuracy and efficiency of classification.
S230, inputting the character features into a pre-trained recognition model, and determining the confidence between the character features and each department to be classified.
A department to be classified is a candidate department to which a file may finally be circulated. Because the circulation destination of each file has to be determined, the text features are matched against these departments. The confidence is the degree of correlation between a text feature and a given department. If a text feature is highly correlated with a department, that department may be the destination of the file to which the text feature belongs. A file to be processed may have several text features; a confidence is determined for each of them, and the different text features of a file may each be used as input to the pre-trained recognition model to determine the circulation destination. For example, the confidences of different text features may be assigned different weights, and the total confidence of each file to be processed may be obtained as a weighted sum. The confidence may also be determined with any confidence calculation algorithm in the related art, which is not limited in the embodiment of the present application.
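The weighted-sum idea can be illustrated with a small sketch. The function name `total_confidence`, the shape of the inputs and the normalization of the weights are assumptions made for this example; the application only states that per-feature confidences may be weighted and summed.

```python
def total_confidence(
    feature_scores: dict[str, dict[str, float]],
    feature_weights: dict[str, float],
) -> dict[str, float]:
    """Combine per-feature confidences into one total per department.

    `feature_scores` maps each text feature to {department: confidence},
    with confidences in [0, 1]. Weights are normalized so that the totals
    also stay in [0, 1].
    """
    weight_sum = sum(feature_weights.get(name, 0.0) for name in feature_scores)
    if weight_sum <= 0:
        raise ValueError("at least one feature needs a positive weight")
    totals: dict[str, float] = {}
    for name, per_department in feature_scores.items():
        weight = feature_weights.get(name, 0.0) / weight_sum
        for department, confidence in per_department.items():
            totals[department] = totals.get(department, 0.0) + weight * confidence
    return totals
```

With weights 3 and 1 for two features scoring 0.9 and 0.5 for the same department, the total is 0.75 × 0.9 + 0.25 × 0.5 = 0.8.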
S240, determining a classification result of the target circulation department corresponding to the file to be processed according to each confidence coefficient and a preset threshold value.
The preset threshold may be a preset limiting condition on the confidence. When the confidence between a text feature (or a file to be processed) and a certain department exceeds the preset threshold, the text feature (or the file) is considered highly related to that department, and the department can be taken as the target circulation department of the file, that is, the destination of its circulation. In this way the classification result of each file to be processed is obtained.
It should be noted that some files to be processed may already have been accurately classified in advance. Before the confidence is calculated, it can therefore be determined which files have already been classified, and their classification results can be obtained directly. Alternatively, when the confidence is calculated uniformly, the confidence between the text features (or the file) and the previously assigned department is taken as 100%, and the corresponding classification result is again obtained directly.
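The threshold decision and the shortcut for files classified in advance can be combined as in the sketch below. The function name `choose_department`, the `history` mapping keyed by file title, and returning `None` when no department passes the threshold are assumptions for illustration; the application does not say what happens to files that fall below the threshold.

```python
from typing import Optional


def choose_department(
    title: str,
    confidences: dict[str, float],
    threshold: float,
    history: Optional[dict[str, str]] = None,
) -> Optional[str]:
    """Return the target circulation department for one file, if any."""
    # A file classified in advance keeps its earlier result, which is
    # equivalent to treating its confidence as 100%.
    if history and title in history:
        return history[title]
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    # The confidence must exceed the preset threshold.
    return best if confidences[best] > threshold else None
```

A file for which the function returns `None` would need some fallback, for example manual review, which is outside the scope of this sketch.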
S250, circulation is carried out on the files to be processed according to the classification result and the file database table.
In an optional implementation manner, circulating the file to be processed according to the classification result and the file database table may include: determining target circulation department information according to the classification result; determining the processing end time of the file to be processed according to the file database table; generating a receipt processing list corresponding to the file to be processed according to the target circulation department information and the processing end time; and sending the file to be processed and the receipt processing list to the target circulation department.
The processing end time may be the deadline by which the relevant department must complete the corresponding work on the file to be processed. This deadline is obtained by analyzing the items of file information in the file database table, and any analysis algorithm in the related art may be used for this purpose. The receipt processing list may be an opinion sheet, received by the target circulation department, on how the corresponding file should be handled, and may state the period within which it must be processed. The receipt processing list is therefore generated from the target circulation department information and the determined processing end time, for example: a certain department is advised to complete the processing of a certain file before a given date. The file to be processed is then sent to the corresponding target circulation department according to the receipt processing list.
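A receipt processing list of this kind can be sketched as a small record. The class `ReceiptProcessingList`, the function `make_receipt`, and in particular the rule that the deadline equals the receiving date plus a fixed number of days are assumptions for illustration; the application leaves the analysis that yields the processing end time open.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReceiptProcessingList:
    file_title: str
    target_department: str
    processing_deadline: date
    opinion: str


def make_receipt(
    file_title: str,
    target_department: str,
    received_on: date,
    handling_days: int,
) -> ReceiptProcessingList:
    """Build the sheet that accompanies a file to its target department."""
    # Assumed rule: the deadline is the receiving date plus a fixed period.
    deadline = received_on + timedelta(days=handling_days)
    opinion = (
        f"{target_department} is advised to complete the processing of "
        f"'{file_title}' before {deadline.isoformat()}."
    )
    return ReceiptProcessingList(file_title, target_department, deadline, opinion)
```

For instance, `make_receipt("Pump report", "Maintenance", date(2023, 7, 21), 10)` yields a deadline of 31 July 2023.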
In this technical scheme, the confidence between each character feature and each candidate department is determined and used to classify the file to which the character feature belongs. Introducing the confidence into classification further improves the accuracy of file classification. Taken as a whole, the automatic circulation of the files to be processed addresses the high error rate and poor efficiency of manual classification and circulation in the prior art.
Example 3
This embodiment is a preferred embodiment based on the above embodiments and is suitable for any scene that generates a large number of process files. Taking the various files to be processed by a nuclear power plant as an example, as shown in fig. 3, it comprises the following steps:
S310, automatically importing files to a nuclear power document management system based on an RPA technology.
A user manually starts the RPA robot and selects the folder to be imported (that is, the folder to be processed). The RPA robot automatically acquires the names and file information of all folders under that folder and records them in an Excel sheet. The RPA robot then opens the document import page of the nuclear power document management system, which includes automatic login and page switching, automatically associates the folder files with the system document category and template, and imports the files at the document import node of the document management system. It collects the document information of all primary folders and the information and storage paths of all PDF attachments in the secondary folders, generates the corresponding internal nuclear power item numbers from the incoming document numbers, enters them at the designated position on the page, and initially creates the receipt processing list.
S320, classifying the documents based on natural language processing (NLP).
The document classification result is obtained through document classification analysis, using either a history classification result (namely, a record that was accurately classified in advance) or natural language processing (NLP). When the history classification result is used, the RPA robot opens the associated history receipt to obtain it. When NLP is used (for example, a pre-trained classification recognition model), the documents are automatically classified by analyzing the document title, the incoming document number, the content information, and the like, and the results are further screened according to the confidence of the recognition result, which improves the efficiency and accuracy of file management. Finally, the business department corresponding to the document and a proposed handling opinion are generated and filled into the created receipt processing list.
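The two classification routes in S320 can be sketched as a history lookup with a model fallback. The sketch below rests on assumptions: `classify_document`, the `history` mapping keyed by document number, the `model` callable returning a (department, confidence) pair, and the default threshold of 0.8 are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical model interface: text in, (department, confidence) out.
Classifier = Callable[[str], tuple[str, float]]


def classify_document(
    doc_number: str,
    text: str,
    history: dict[str, str],
    model: Classifier,
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the department for a document, or None if no result is trusted."""
    # A record that was accurately classified in advance takes precedence.
    if doc_number in history:
        return history[doc_number]
    # Otherwise ask the model and screen its answer by confidence.
    department, confidence = model(text)
    return department if confidence >= threshold else None
```

Returning `None` for low-confidence results leaves room for a manual review step, which is one possible way to handle documents the model cannot place reliably.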
S330, automatically circulating the document based on the RPA technology.
The RPA robot obtains the document classification result, automatically fills in the business department and the proposed handling opinion, calculates the document processing deadline from the document title information, and fills it into the receipt processing list. The completed receipt processing list may include a brief introduction to the content of the document to be processed, a processing time limit, the handling opinion of the receiving department, and the like. After the information has been entered, the receipt processing list is automatically sent to the target circulation department.
With the method and device of the present application, RPA-based centralized management and control can be adopted, so that the original path of a file can be found quickly when the file, or documents related to it, later need to be searched for and used. This saves labor, improves the efficiency of document classification and storage, and reduces the omissions or errors that manual extraction and arrangement may cause. Applying RPA technology can also bridge the gap between business operation management and the corresponding business systems, reduce the amount of manual collection and processing needed for various management requirements, and support a shift toward scientific, quantitative management. Adding robot technology to index collection, trend prediction, task allocation, and supervision of handling efficiency can greatly reduce the workload and human-factor risk of manual handling and improve business operation efficiency in practice. Meanwhile, the trained classification recognition model identifies the different files to be classified, calculates confidences, and determines classification results, which greatly improves the accuracy of file classification. The automatic circulation process further improves working efficiency and makes the whole business process smoother and more efficient.
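A receipt processing list of the kind described in S330 might be modeled as below. The text does not say how the deadline is derived from the title, so the rule used here (titles containing "urgent" get 3 days, all others 10) is purely a hypothetical placeholder, as are the class and field names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReceiptProcessingList:
    title: str
    summary: str            # brief introduction to the document content
    department: str         # target circulation department
    proposed_opinion: str   # handling opinion to be filled in
    deadline: date          # processing time limit


def processing_deadline(title: str, received_on: date) -> date:
    # Hypothetical rule: "urgent" titles get 3 days, all others 10.
    days = 3 if "urgent" in title.lower() else 10
    return received_on + timedelta(days=days)


def build_receipt(
    title: str,
    summary: str,
    department: str,
    proposed_opinion: str,
    received_on: date,
) -> ReceiptProcessingList:
    """Assemble the list that is sent to the target circulation department."""
    return ReceiptProcessingList(
        title=title,
        summary=summary,
        department=department,
        proposed_opinion=proposed_opinion,
        deadline=processing_deadline(title, received_on),
    )
```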
Example IV
Fig. 4 is a schematic structural diagram of a file processing device according to a fourth embodiment of the present application. As shown in fig. 4, the apparatus 400 includes:
a library table obtaining module 410, configured to obtain a file database table corresponding to a folder to be processed;
the feature determining module 420 is configured to determine a text feature corresponding to each file to be processed according to at least one file information of each file to be processed in the file database table; the file to be processed is stored in a folder to be processed;
the file classification module 430 is configured to determine a classification result of the file to be processed according to the text feature and the pre-trained recognition model;
the file circulation module 440 is configured to circulate the file to be processed according to the classification result and the file database table.
According to this technical scheme, the file database table corresponding to the folder to be processed is acquired, the text features of each file are determined and then recognized and classified, and each file is circulated according to its classification result. Recognizing the text features improves the accuracy of file classification, and circulating files according to the classification results improves overall file processing efficiency.
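The way the four modules of the apparatus 400 hand data to one another can be illustrated with a small Python sketch. The class name `FileProcessingDevice`, the callable signatures, and the use of a list of dictionaries for the file database table are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileProcessingDevice:
    get_table: Callable[[str], list[dict]]     # library table obtaining module 410
    extract_features: Callable[[dict], str]    # feature determining module 420
    classify: Callable[[str], str]             # file classification module 430
    circulate: Callable[[dict, str], None]     # file circulation module 440

    def run(self, folder: str) -> None:
        """Process every file recorded for the folder to be processed."""
        for record in self.get_table(folder):
            features = self.extract_features(record)
            result = self.classify(features)
            self.circulate(record, result)
```

Passing each module in as a callable keeps the sketch independent of any particular database, model, or document management system.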
In an alternative embodiment, the apparatus 400 may include:
the folder acquisition module is used for acquiring the folder to be processed, taking the folder to be processed as a primary folder, and taking each subfolder in the folder to be processed as a secondary folder;
the file information acquisition module is used for determining the file information of each file to be processed according to the primary folder and the secondary folder;
and the library table construction module is used for constructing a file database table according to the file information.
In an optional implementation manner, the determining the file information of each file to be processed according to the primary folder and the secondary folder includes:
determining file basic information according to the primary folder;
determining portable format file information according to the secondary folder;
determining a storage path of each file to be processed according to the primary folder and the secondary folder;
and taking the file basic information, the portable format file information and the storage path as file information.
In an alternative embodiment, the file basic information may include: a file title, a file document number, and file content data.
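One possible shape for the file database table is sketched below with Python's built-in `sqlite3` module. The table name `files`, the column names, and the choice of the storage path as the key are assumptions; the text only states that the table is built from the file basic information, the portable format file information, and the storage path.

```python
import sqlite3


def build_file_table(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Create the table if needed and store one row per file to be processed."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS files (
            title        TEXT,              -- file basic information
            doc_number   TEXT,
            content      TEXT,
            pdf_name     TEXT,              -- portable format file information
            storage_path TEXT PRIMARY KEY   -- where the file is stored
        )
        """
    )
    # Named placeholders are filled from the keys of each record dictionary.
    conn.executemany(
        "INSERT OR REPLACE INTO files "
        "VALUES (:title, :doc_number, :content, :pdf_name, :storage_path)",
        records,
    )
    conn.commit()
```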
In an alternative embodiment, the file classification module 430 may include:
the confidence determining unit is used for inputting the text features into a pre-trained recognition model and determining the confidence between the text features and each candidate department;
and determining the classification result, namely the target circulation department corresponding to the file to be processed, according to each confidence and a preset threshold value.
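The confidence-and-threshold step can be sketched as follows. The text does not specify how the recognition model produces its confidences, so this sketch assumes the model emits one raw score per candidate department and converts the scores with a softmax; the function names and the example threshold are hypothetical.

```python
import math
from typing import Optional


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw per-department scores into confidences that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def pick_department(confidences: dict[str, float], threshold: float) -> Optional[str]:
    """Return the most confident department if it clears the preset threshold."""
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] >= threshold else None
```

For example, `pick_department(softmax({"Maintenance": 2.0, "Operations": 0.0}), 0.8)` selects "Maintenance", because its confidence is roughly 0.88; with a threshold of 0.9 the same input yields no target department.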
In an alternative embodiment, the feature determination module 420 may be specifically configured to: perform text recognition on at least one item of file information according to a preset natural language processing algorithm to obtain at least one text feature.
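The preset natural language processing algorithm is not named in the text. As a deliberately simple stand-in, the sketch below derives text features as bag-of-words term counts over the file information fields; the function name `text_features` and the dictionary layout are assumptions, and a real system would likely use a more capable tokenizer, especially for Chinese text.

```python
import re
from collections import Counter


def text_features(file_info: dict[str, str]) -> Counter:
    """Count lowercase word tokens across all file-information fields."""
    tokens: list[str] = []
    for value in file_info.values():
        # \w+ matches runs of letters, digits, and underscores.
        tokens.extend(re.findall(r"\w+", value.lower()))
    return Counter(tokens)
```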
In an alternative embodiment, the file transfer module 440 may include:
the department information determining unit is used for determining target circulation department information according to the classification result;
the ending time determining unit is used for determining the processing ending time of the file to be processed according to the file database table;
the receipt processing list generating unit is used for generating a receipt processing list corresponding to the file to be processed according to the target circulation department information and the processing ending time;
and the file circulation unit is used for sending the file to be processed and the receipt processing list to the target circulation department.
The file processing device provided by the embodiment of the application can execute the file processing method provided by any embodiment of the application, and has the functional modules and beneficial effects corresponding to executing that method.
Example V
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as a file processing method.
In some embodiments, the file processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the file processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the file processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present application are achieved, and the present application is not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.
Claims (10)
1. A file processing method, the method comprising:
acquiring a file database table corresponding to a folder to be processed;
determining the corresponding text characteristics of each file to be processed according to at least one file information of each file to be processed in the file database table; the files to be processed are stored in the folders to be processed;
determining a classification result of the file to be processed according to the character features and a pre-trained recognition model;
and according to the classification result and the file database table, circulating the files to be processed.
2. The method according to claim 1, wherein before the obtaining a file database table corresponding to a folder to be processed, the method comprises:
acquiring a folder to be processed, taking the folder to be processed as a primary folder, and taking each subfolder in the folder to be processed as a secondary folder;
determining file information of each file to be processed according to the primary folder and the secondary folder;
and constructing the file database table according to the file information.
3. The method according to claim 2, wherein determining file information of each of the files to be processed according to the primary folder and the secondary folder comprises:
determining file basic information according to the primary folder;
determining portable format file information according to the secondary folder;
determining a storage path of each file to be processed according to the primary folder and the secondary folder;
and taking the file basic information, the portable format file information and the storage path as the file information.
4. A method according to claim 3, wherein the file basic information comprises: a file title, a file document number, and file content data.
5. The method according to any one of claims 1-4, wherein said determining the classification result of the file to be processed according to the text features and a pre-trained recognition model comprises:
inputting the text features into the pre-trained recognition model, and determining the confidence between the text features and each candidate department;
and determining the classification result, namely the target circulation department corresponding to the file to be processed, according to each confidence and a preset threshold value.
6. The method according to any one of claims 1-4, wherein determining the text feature corresponding to each of the files to be processed according to at least one file information of each of the files to be processed in the file database table includes:
and performing text recognition on the at least one item of file information according to a preset natural language processing algorithm to obtain at least one text feature.
7. The method according to any one of claims 1-4, wherein the circulating the file to be processed according to the classification result and the file database table includes:
determining target circulation department information according to the classification result;
determining the processing ending time of the file to be processed according to the file database table;
generating a receipt processing list corresponding to the file to be processed according to the target circulation department information and the processing ending time;
and sending the file to be processed and the receipt processing list to a target circulation department.
8. A document processing apparatus, comprising:
the library table acquisition module is used for acquiring a file database table corresponding to the folder to be processed;
the characteristic determining module is used for determining character characteristics corresponding to each file to be processed according to at least one file information of each file to be processed in the file database table; the files to be processed are stored in the folders to be processed;
the file classification module is used for determining a classification result of the file to be processed according to the character features and a pre-trained recognition model;
and the file circulation module is used for circulating the files to be processed according to the classification result and the file database table.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the file processing method of any one of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores computer instructions for causing a processor to implement the file processing method of any one of claims 1-7 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310907058.3A CN116894005A (en) | 2023-07-24 | 2023-07-24 | File processing method, device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310907058.3A CN116894005A (en) | 2023-07-24 | 2023-07-24 | File processing method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116894005A true CN116894005A (en) | 2023-10-17 |
Family
ID=88314727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310907058.3A Pending CN116894005A (en) | 2023-07-24 | 2023-07-24 | File processing method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116894005A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118211941A (en) * | 2024-05-21 | 2024-06-18 | 江苏移动信息系统集成有限公司 | Automatic community work order circulation method and system based on RPA |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390408B (en) | Transaction object prediction method and device | |
CN115982376B (en) | Method and device for training model based on text, multimode data and knowledge | |
CN116894005A (en) | File processing method, device, electronic equipment and storage medium | |
CN115168562A (en) | Method, device, equipment and medium for constructing intelligent question-answering system | |
CN116340831B (en) | Information classification method and device, electronic equipment and storage medium | |
CN117994021A (en) | Auxiliary configuration method, device, equipment and medium for asset verification mode | |
CN113052325A (en) | Method, device, equipment, storage medium and program product for optimizing online model | |
CN113722593B (en) | Event data processing method, device, electronic equipment and medium | |
CN115909357A (en) | Target identification method based on artificial intelligence, model training method and device | |
CN112784600B (en) | Information ordering method, device, electronic equipment and storage medium | |
CN114187448A (en) | Document image recognition method and device, electronic equipment and computer readable medium | |
CN113536788A (en) | Information processing method, device, storage medium and equipment | |
CN113313196A (en) | Annotation data processing method, related device and computer program product | |
CN113010782A (en) | Demand amount acquisition method and device, electronic equipment and computer readable medium | |
CN117541366B (en) | Method and device for predicting winning probability, electronic equipment and storage medium | |
CN113343090B (en) | Method, apparatus, device, medium and product for pushing information | |
CN117272970B (en) | Document generation method, device, equipment and storage medium | |
CN116340864B (en) | Model drift detection method, device, equipment and storage medium thereof | |
CN113032609A (en) | Picture retrieval method and device, electronic equipment and storage medium | |
CN118606887A (en) | Data processing method, device, electronic equipment and storage medium | |
CN115718734A (en) | Document quality determination method and device, electronic equipment and storage medium | |
CN115660363A (en) | Dialogue processing method and device, electronic equipment and storage medium | |
CN117131197A (en) | Method, device, equipment and storage medium for processing demand category of bidding document | |
CN115965817A (en) | Training method and device of image classification model and electronic equipment | |
CN116452834A (en) | Image set generation, sample set generation and automatic test method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||