CN115757314A

CN115757314A - File processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN115757314A
Application number: CN202211490225.0A
Authority: CN
Inventors: 陈知生; 陈迎昕
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2023-03-07

Abstract

The embodiment of the application discloses a file processing method, a file processing device, electronic equipment and a readable storage medium, wherein the method comprises the following steps: acquiring a first file to be processed; analyzing the first file, and determining first identification information corresponding to the first file; under the condition that the preset processing model is detected to comprise a first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, and the first file is processed based on the first knowledge base to obtain a second file; the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises a first knowledge base.

Description

File processing method and device, electronic equipment and readable storage medium

Technical Field

The present application belongs to the field of information processing technologies, and in particular, to a file processing method and apparatus, an electronic device, and a readable storage medium.

Background

Currently, system development always lags behind business management needs. People have to manually export data from multiple systems and then manually process various types of report files. When processing a large number of complicated files, problems such as non-uniform information and frequent file format changes are often encountered, resulting in long processing times and low processing accuracy.

Thus, the current file processing efficiency is low.

Disclosure of Invention

The embodiment of the application provides a file processing method and apparatus, an electronic device, and a readable storage medium, which can solve the current problem of low file processing efficiency.

In a first aspect, an embodiment of the present application provides a file processing method, where the method includes:

acquiring a first file to be processed;

analyzing the first file, and determining first identification information corresponding to the first file;

under the condition that the preset processing model is detected to comprise a first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, and the first file is processed based on the first knowledge base to obtain a second file;

the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises a first knowledge base.

In a second aspect, an embodiment of the present application provides a file processing apparatus, including:

the acquisition module is used for acquiring a first file to be processed;

the analysis module is used for analyzing the first file and determining first identification information corresponding to the first file;

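the processing module is used for inputting the first file into the preset processing model under the condition that the preset processing model is detected to comprise a first knowledge base corresponding to the first identification information, and processing the first file based on the first knowledge base to obtain a second file; the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises the first knowledge base.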

In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, performs the method as in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement a method as in the first aspect or any possible implementation manner of the first aspect.

In the embodiment of the application, the first file to be processed is analyzed to determine the first identification information corresponding to the first file, so that the first knowledge base capable of processing the first file can subsequently be determined conveniently and quickly. When the preset processing model is detected to include the first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, which is constructed according to at least one knowledge base including the first knowledge base. Finally, the content of the first file is processed based on the first knowledge base to obtain a second file. The first file can thus be processed automatically based on the preset processing model and the first knowledge base, which improves file processing efficiency.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a file processing method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a preset processing model provided in an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present application;

FIG. 4 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

The file processing method provided by the embodiment of the application can be applied to the following application scenarios, which are described below.

Because system development always lags behind management requirements, and in some cases cannot be carried out at all, people have to export data from numerous systems and then manually process various report files. Massive and complex data processing often faces the following problems:

First, there is the problem of non-uniform information. Generally, in the information collection process, there are multiple information sources, multiple data file names, and no uniform convention for field names. For example, installation completion data goes by many names, such as 'table 03-opening tool order installation list', 'table 04-opening maintenance staff open completion tool order drilling on the same day', 'table 09A-installation and movement machine tool order return detailed report', 'service opening tool order list information' and the like. Second, there is the problem of merged cells. Merged cells can cause data misplacement, which in turn can cause import into the database to fail.

Third, there is the problem of frequent file format changes. As time goes on, the same "installed inventory" file may have 53 rows on one day and 105 rows on another, and may be in csv format on one day and in xls or xlsx format on another. Software versions are also not uniform, ranging from XP to 2016. Finally, there is the problem of large differences in data. For example, a value appears as "731xxxx" in one file and as "|731xxxx", "_731xxxx", or "731xxxx" in another. Similar situations exist in many industries and fields.

Based on the application scenario, the following describes in detail a file processing method provided in the embodiment of the present application.

Fig. 1 is a flowchart of a file processing method according to an embodiment of the present application.

As shown in FIG. 1, the file processing method may include steps 110 to 130, and the method is applied to a file processing apparatus, specifically as follows:

Step 110, acquiring a first file to be processed.

Step 120, parsing the first file, and determining first identification information corresponding to the first file.

Step 130, inputting the first file into the preset processing model under the condition that the preset processing model is detected to include a first knowledge base corresponding to the first identification information, and processing the first file based on the first knowledge base to obtain a second file; the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises a first knowledge base.

The knowledge base includes a preset corresponding relationship, which is used for information cleaning, and a preset format, which is used for format conversion. The content of the knowledge base may be determined based on operator input.

In the file processing method provided by the embodiment of the application, the first file to be processed is analyzed to determine the first identification information corresponding to the first file, so that the first knowledge base capable of processing the first file can subsequently be determined conveniently and quickly. When the preset processing model is detected to include the first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, which is constructed according to at least one knowledge base including the first knowledge base. Finally, the content of the first file is processed based on the first knowledge base to obtain a second file. The first file can thus be processed automatically based on the preset processing model and the first knowledge base, which improves file processing efficiency.
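The flow of steps 110 to 130 can be illustrated with a minimal sketch. The class `PresetProcessingModel`, the function `process_file`, and the modeling of a knowledge base as a plain function over file content are assumptions made only for illustration; the application does not prescribe any of these names or data structures.

```python
from typing import Callable, Dict, Optional

# Hypothetical modeling choice: a knowledge base is represented as a
# function that turns the content of a first file into a second file.
KnowledgeBase = Callable[[str], str]


class PresetProcessingModel:
    """Illustrative model constructed from at least one knowledge base."""

    def __init__(self, knowledge_bases: Dict[str, KnowledgeBase]):
        # Knowledge bases are keyed by identification information.
        self.knowledge_bases = knowledge_bases

    def has_knowledge_base(self, identification: str) -> bool:
        return identification in self.knowledge_bases

    def process(self, identification: str, content: str) -> str:
        return self.knowledge_bases[identification](content)


def process_file(
    content: str,
    identify: Callable[[str], Optional[str]],
    model: PresetProcessingModel,
) -> Optional[str]:
    """Run steps 120 and 130 on an already acquired first file (step 110)."""
    identification = identify(content)  # step 120: determine identification
    if identification is not None and model.has_knowledge_base(identification):
        # Step 130: process the first file based on the matching knowledge base.
        return model.process(identification, content)
    # No matching knowledge base: the caller may prompt the operator instead.
    return None
```

In this sketch a return value of `None` marks the case in which no corresponding knowledge base is detected, which the method handles separately by prompting the operator.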

The following describes the contents of steps 110 to 130:

Regarding step 110.

Acquiring a first file to be processed.

The first file may be a table class file (such as a csv file, an xls file, an xlsx file, etc.), a document class file, and the like.

Regarding step 120.

And analyzing the first file and determining first identification information corresponding to the first file.

The first identification information is used to identify category information of the first file, for example, the first identification information is transportation information, that is, the first file includes data related to transportation.

The first identification information may also be used to identify item information related to the first file, for example, if the first file is associated with item a, the first identification information is item a.

In one possible embodiment, step 120 includes:

under the condition that the keywords in the first file are matched with the preset keywords, determining identification information corresponding to the preset keywords;

and determining the identification information corresponding to the preset keyword as first identification information.

For example, the keyword in the first file may be "on the way", and the preset keywords may be "in transit", "transport", and "destination". In this case, the keyword in the first file is determined to match the preset keywords. If the identification information corresponding to the preset keywords is "transportation", that identification information is determined as the first identification information; that is, the first identification information is "transportation".

Here, when the keyword in the first file matches the preset keyword, the identification information corresponding to the preset keyword is determined as the first identification information, so that the first identification information corresponding to the first file can be quickly and accurately determined, and the first knowledge base capable of processing the first file can be conveniently and quickly determined according to the first identification information.
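The keyword matching described above can be sketched as follows. The table `PRESET_KEYWORDS`, the function name `determine_identification`, and the use of case-insensitive substring matching are assumptions for illustration only; the application does not specify how the match is performed.

```python
from typing import Dict, List, Optional

# Hypothetical table: identification information -> preset keywords.
PRESET_KEYWORDS: Dict[str, List[str]] = {
    "transportation": ["in transit", "transport", "destination"],
}


def determine_identification(
    text: str, preset: Dict[str, List[str]] = PRESET_KEYWORDS
) -> Optional[str]:
    """Return the identification whose preset keywords occur in the text."""
    lowered = text.lower()
    for identification, keywords in preset.items():
        # Assumed matching rule: a keyword matches if it occurs as a substring.
        if any(keyword in lowered for keyword in keywords):
            return identification
    # No preset keyword matched, so no identification can be determined.
    return None
```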

In a possible embodiment, after step 120, the method further comprises:

displaying prompt information under the condition that the first knowledge base corresponding to the first identification information is not detected;

receiving a second input in response to the prompt information;

in response to the second input, the first knowledge base is established and the edited content corresponding to the second input is stored in the first knowledge base.

That is, when the first knowledge base corresponding to the first identification information is not detected, prompt information is displayed and the operator establishes the first knowledge base. Specifically, the edited content may include: file classification rules, information cleaning rules, preset corresponding relationships for information conversion, the final unified name of each item of information, standard preset formats, data types, data lengths, and the like.
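A minimal sketch of establishing a missing knowledge base from operator input is given below. The `KnowledgeBase` dataclass, its two fields, and the `ask_operator` callback that stands in for displaying the prompt and receiving the second input are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class KnowledgeBase:
    """Hypothetical container for the content edited by the operator."""
    preset_formats: Dict[str, str] = field(default_factory=dict)
    preset_correspondence: Dict[str, str] = field(default_factory=dict)


def get_or_create_knowledge_base(
    knowledge_bases: Dict[str, KnowledgeBase],
    identification: str,
    ask_operator: Callable[[str], KnowledgeBase],
) -> KnowledgeBase:
    """Return the knowledge base, creating it from operator input if missing."""
    if identification not in knowledge_bases:
        # Display prompt information and receive the second input (assumed
        # here to be delivered as a ready-made KnowledgeBase object).
        prompt = f"No knowledge base found for '{identification}'."
        knowledge_bases[identification] = ask_operator(prompt)
    return knowledge_bases[identification]
```

The callback keeps the sketch independent of any particular user interface; a real system would presumably replace it with its own dialog.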

Regarding step 130.

Under the condition that the preset processing model is detected to comprise a first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, and the first file is processed based on the first knowledge base to obtain a second file; the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises a first knowledge base.

Wherein, the knowledge base can include: knowledge for identifying and converting file names, knowledge for identifying and converting titles, knowledge for identifying and converting sub-tables in files, knowledge for data cleansing, and knowledge for data merging and data analysis.

In addition, if the first knowledge base corresponding to the first identification information is detected, the preset rule table may be further searched for corresponding sub-identification information. If no corresponding sub-identification information exists, a first sub-knowledge base corresponding to the sub-identification information may be newly established to record the preset format and the preset corresponding relationship, and the operator is prompted either to merge the entry into an existing preset format or preset corresponding relationship or to create a new preset format or preset corresponding relationship.

For example, the operator may confirm that a given "order number" field does not carry the original meaning of "order number", and designate it as "product order number" instead.

The preset processing model may be developed with VFP (Visual FoxPro), a database development tool that is simple and convenient to use. VFP, which comes from the same source as the office software, is selected as the development tool, and the preset processing model is built based on at least one knowledge base. The resulting preset processing model provides machine learning and data cleaning as auxiliary functions, working together with data conversion.

In one possible embodiment, step 130 includes:

inputting a first file into a preset processing model, and identifying first information and second information in the first file based on a first knowledge base;

and carrying out format conversion processing on the first information in the first file, and carrying out information cleaning processing on the second information in the first file to obtain a second file.

As shown in FIG. 2, the first file is input into the preset processing model. First, first information and second information in the first file are identified based on the first knowledge base. The semantic information of the first information is consistent with preset semantic information. For example, if the preset semantic information is time information and the first information represents a time, the semantic information of the first information is determined to be consistent with the preset semantic information. First information that represents time may appear in multiple formats, such as "xx xx xx xx xx/xx/xx xx xx: xx" and "xxxx _ xx _ xx _ xx: xx".

The second information may be information of a preset field type, where the preset field type includes: personnel, locations, units, products, and the like. For example, if the second information is "XX unit", the field type of the second information may be determined to be "unit"; information cleaning processing is then performed on the second information in the first file, and the second information after cleaning may be "XX unit in XX city".

Then, format conversion processing may be performed on the first information in the first file based on the preset format, information cleaning processing may be performed on the second information in the first file based on the preset corresponding relationship, and the second file is finally output.

Where the first knowledge base includes a preset format and a preset corresponding relationship, the step of performing format conversion processing on the first information in the first file and performing information cleaning processing on the second information in the first file to obtain the second file may specifically include:

based on a preset format, carrying out format conversion processing on first information in the first file to obtain a third file, wherein the third file comprises third information in the preset format, and semantic information corresponding to the first information is consistent with semantic information corresponding to the third information;

and replacing the second information in the third file with fourth information based on a preset corresponding relationship to obtain a second file, wherein the preset corresponding relationship comprises the second information and the fourth information which correspond to each other.

The operator can modify the preset format and the preset corresponding relationship.

Specifically, the semantic information of the first information may be determined first, such as: location, time, product, etc. And searching a preset format from the first knowledge base according to the semantic information of the first information, and then performing format conversion processing on the first information in the first file based on the preset format to obtain a third file.

Under the condition that the first information is detected, performing format conversion processing on the first information in the first file based on a preset format to obtain a third file, for example:

the various time formats "xxxx/xx/xx xx xx: xx", "xxxx _ xx _ xx xx: xx", "xxxx-xx-xx-xx: xx", "xx-xx-xx-xx" and so on are unified as: "xxxx. xx. xxxx: xx".
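The unification of time formats can be sketched as follows. The concrete format strings in `CANDIDATE_FORMATS` and `PRESET_FORMAT` are assumptions, since the application gives only "x" placeholders; a real knowledge base would supply its own formats.

```python
from datetime import datetime
from typing import Sequence

# Assumed source formats that first information (time) may arrive in.
CANDIDATE_FORMATS = ("%Y/%m/%d %H:%M", "%Y_%m_%d %H:%M", "%Y-%m-%d %H:%M")
# Assumed preset format stored in the first knowledge base.
PRESET_FORMAT = "%Y.%m.%d %H:%M"


def unify_time(
    value: str,
    candidates: Sequence[str] = CANDIDATE_FORMATS,
    preset: str = PRESET_FORMAT,
) -> str:
    """Convert a time string in any known format to the preset format."""
    for fmt in candidates:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # this candidate format does not fit; try the next one
        return parsed.strftime(preset)
    # Unknown format: leave the value untouched rather than guess.
    return value
```

Leaving unrecognized values unchanged is a design choice of this sketch; it keeps the semantic information intact when no format in the knowledge base applies.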

Specifically, the field type of the second information may be determined first, for example: personnel, location, products, etc. And searching a preset corresponding relation from the first knowledge base according to the field type of the second information, and replacing the second information in the third file with fourth information based on the preset corresponding relation to obtain a second file.

Under the condition that the second information is detected, replacing the second information in the third file with fourth information based on a preset corresponding relation to obtain a second file, for example:

TABLE 1

Field item	Second information	Fourth information	Operation
Maintenance department	Liu A city	Liu A City division	Replace
Maintenance department	Ning A county	Ning A county division	Replace
Maintenance department	Wang A county	Wang A county division Co Ltd	Replace
Maintenance department	Long A county	Division of Long A county	Replace
Maintenance department	Central office of south China A	District office of Chang A county A	Replace
Maintenance department	Branch office of garden A	Liuyang city industry A garden branch office	Replace
Maintenance department	Dong A branch office	Ningxiang county east A division	Replace

Therefore, performing format conversion processing on the first information in the first file based on the preset format to obtain the third file unifies files from different data sources with different expressions into the preset format. This solves the problem of merging data sources with different formats, especially in scenarios where formats change frequently. Replacing the second information in the third file with the fourth information based on the preset corresponding relationship to obtain the second file cleans the data and standardizes and unifies data expression across files from different data sources, which greatly facilitates the merging, statistics, and analysis of data.
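The replacement of second information with fourth information can be sketched with a lookup keyed by field item, using two rows of Table 1 as sample data. The nested-dictionary layout, the function name `clean_rows`, and the representation of a file as a list of row dictionaries are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical preset corresponding relationship:
# field item -> {second information: fourth information}.
PRESET_CORRESPONDENCE: Dict[str, Dict[str, str]] = {
    "Maintenance department": {
        "Liu A city": "Liu A City division",
        "Ning A county": "Ning A county division",
    },
}


def clean_rows(
    rows: List[Dict[str, str]],
    correspondence: Dict[str, Dict[str, str]] = PRESET_CORRESPONDENCE,
) -> List[Dict[str, str]]:
    """Replace second information with fourth information, row by row."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the third file's rows stay unmodified
        for field_name, mapping in correspondence.items():
            if field_name in new_row:
                value = new_row[field_name].strip()
                # Values without an entry are kept, stripped of whitespace.
                new_row[field_name] = mapping.get(value, value)
        cleaned.append(new_row)
    return cleaned
```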

In one possible embodiment, step 130 includes:

under the condition that the preset processing model is detected to comprise a first knowledge base corresponding to the first identification information, receiving a first input of the first file, wherein the first input is used for indicating fifth information in the first file;

and responding to the first input, inputting the first file into a preset processing model, and processing fifth information in the first file based on the first knowledge base to obtain a second file.

The first input indicates the fifth information in the first file, where the fifth information is at least part of the information selected from the first file by a user. The fifth information may specifically be: rows in the first file, sections in the first file, the full text of the first file except for the "remarks", and so on. In this way, data columns that are irrelevant and contain a huge amount of data, such as "remarks", can be skipped, which saves conversion time. Only the required fifth information is processed, which avoids wasting processing resources.

Here, by responding to the first input to the first file, the fifth information in the first file can be processed to obtain the second file, that is, the first file is selectively converted, and the file processing efficiency is improved.
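Selective processing of the fifth information can be sketched as applying a transformation only to the fields the user selected. The function `process_selected` and the representation of the first input as a set of field names are assumptions; the application also allows selection by rows or sections, which this sketch does not cover.

```python
from typing import Callable, Dict, List, Set


def process_selected(
    rows: List[Dict[str, str]],
    selected_fields: Set[str],
    transform: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Apply the transformation only to the fields chosen by the first input."""
    result = []
    for row in rows:
        result.append(
            {
                # Unselected fields, such as "Remarks", pass through untouched.
                name: transform(value) if name in selected_fields else value
                for name, value in row.items()
            }
        )
    return result
```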

In a possible embodiment, based on the first knowledge base, title information meeting a preset condition is determined from the first file, and the title information is located in a target row; wherein the preset conditions include: each of the first column to the Nth column of the target row includes a character, and each of the Nth column to the Mth column of the target row does not include a character; wherein N is less than M, and both N and M are positive integers.

Specifically, starting from the first column, N consecutive columns of data may contain characters; that is, each of the first to Nth columns of the target row includes characters, such as numbers, letters, and words. In addition, considering that there may be merged cells in the first file, the preset condition may be relaxed so that at most a preset number (e.g., 1) of columns may be empty among the first to Nth columns of the target row. The columns that follow, from the Nth column to the Mth column, are all empty columns.

For example, if the title information is "transmission list", each of the first to Nth (i.e., 4th) columns of the target row includes the characters "transmission list", and each of the Nth to Mth columns of the target row does not include characters and is a blank column.

Therefore, title information meeting the preset conditions is determined from the first file based on the first knowledge base, which realizes intelligent searching for the title row. This solves the problem of identifying and converting a title row that is not in the first row. The problem of converting data with merged cells can thus be solved while preserving the row and column logic of the data.
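The search for the title row can be sketched as a scan over the rows of a table. The function `find_title_row` and its parameters are hypothetical. Because the stated ranges overlap at the Nth column, this sketch assumes that the empty range covers the columns after the Nth up to the Mth; `max_missing` models the relaxation for merged cells.

```python
from typing import List, Optional, Sequence


def find_title_row(
    rows: Sequence[Sequence[str]], n: int, m: int, max_missing: int = 0
) -> Optional[int]:
    """Return the index of the first row meeting the preset condition."""
    for index, row in enumerate(rows):
        # Pad short rows so that columns up to M can always be inspected.
        padded: List[str] = list(row) + [""] * max(0, m - len(row))
        leading = padded[:n]    # columns 1..N should contain characters
        trailing = padded[n:m]  # columns N+1..M should be empty (assumption)
        missing = sum(1 for cell in leading if not str(cell).strip())
        if missing <= max_missing and all(
            not str(cell).strip() for cell in trailing
        ):
            return index
    # No row satisfies the preset condition.
    return None
```

With `max_missing=1`, a row in which one of the first N cells is empty because of a merged cell is still accepted as the title row.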

In summary, in the embodiment of the present application, the first file to be processed is analyzed to determine the first identification information corresponding to the first file, so that the first knowledge base capable of processing the first file can subsequently be determined conveniently and quickly. When the preset processing model is detected to include the first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, which is constructed according to at least one knowledge base including the first knowledge base. Finally, the content of the first file is processed based on the first knowledge base to obtain a second file. The first file can thus be processed automatically based on the preset processing model and the first knowledge base, which improves file processing efficiency.

Based on the file processing method shown in FIG. 1, an embodiment of the present application further provides a file processing apparatus. As shown in FIG. 3, the apparatus 300 may include:

an obtaining module 310, configured to obtain a first file to be processed;

the parsing module 320 is configured to parse the first file and determine first identification information corresponding to the first file;

the processing module 330 is configured to, when it is detected that a preset processing model includes a first knowledge base corresponding to the first identification information, input the first file into the preset processing model, and process the first file based on the first knowledge base to obtain a second file;

wherein the preset processing model is constructed according to at least one knowledge base, the at least one knowledge base comprising the first knowledge base.

In a possible embodiment, the parsing module 320 is specifically configured to:

under the condition that the keywords in the first file are matched with preset keywords, determining identification information corresponding to the preset keywords;

and determining the identification information corresponding to the preset keyword as the first identification information.

In a possible embodiment, the processing module 330 is specifically configured to:

inputting the first file into the preset processing model, and identifying first information and second information in the first file based on the first knowledge base;

and performing format conversion processing on the first information in the first file, and performing information cleaning processing on the second information in the first file to obtain the second file.

In a possible embodiment, the first knowledge base includes a preset format and a preset corresponding relationship, and the processing module 330 is specifically configured to:

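based on the preset format, performing format conversion processing on the first information in the first file to obtain a third file, wherein the third file comprises third information in the preset format, and semantic information corresponding to the first information is consistent with semantic information corresponding to the third information;

and replacing the second information in the third file with fourth information based on the preset corresponding relationship to obtain the second file, wherein the preset corresponding relationship comprises the second information and the fourth information which correspond to each other.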

In a possible embodiment, the apparatus 300 may further include:

the display module is used for displaying prompt information under the condition that the first knowledge base corresponding to the first identification information is not detected;

the receiving module is used for receiving a second input in response to the prompt information;

an establishing module, configured to, in response to the second input, establish the first knowledge base and store the edited content corresponding to the second input in the first knowledge base.

In a possible embodiment, the apparatus 300 may further include:

the determining module is used for determining title information meeting preset conditions from the first file based on the first knowledge base, wherein the title information is positioned in a target row;

wherein the preset conditions include: each of the first column to the Nth column of the target row comprises characters, and each of the Nth column to the Mth column of the target row does not comprise characters; wherein N is less than M, and both N and M are positive integers.

In summary, in the embodiment of the present application, the first file to be processed is analyzed to determine the first identification information corresponding to the first file, so that the first knowledge base capable of processing the first file can subsequently be determined conveniently and quickly. When the preset processing model is detected to include the first knowledge base corresponding to the first identification information, the first file is input into the preset processing model, which is constructed according to at least one knowledge base including the first knowledge base. Finally, the content of the first file is processed based on the first knowledge base to obtain a second file. The first file can thus be processed automatically based on the preset processing model and the first knowledge base, which improves file processing efficiency.

Fig. 4 shows a hardware structure diagram of an electronic device according to an embodiment of the present application.

The electronic device may comprise a processor 401 and a memory 402 in which computer program instructions are stored.

Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (or fixed) media, where appropriate. The memory 402 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 402 is non-volatile solid-state memory. In a particular embodiment, the memory 402 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.

The processor 401 may implement any of the file processing methods in the above embodiments by reading and executing the computer program instructions stored in the memory 402.

In one example, the electronic device may also include a communication interface 403 and a bus 410. As shown in Fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via the bus 410 and communicate with one another over it.

The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.

The bus 410 includes hardware, software, or both, and couples the components of the electronic device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 410 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated.

The electronic device may execute the file processing method of the embodiments of the present application, thereby implementing the file processing method described in conjunction with Fig. 1 to Fig. 2.

In addition, in combination with the file processing method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementing the method. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the file processing method of Fig. 1 to Fig. 2.

It is to be understood that the present application is not limited to the particular arrangements and instrumentalities described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, Application Specific Integrated Circuits (ASICs), suitable firmware, plug-ins, function cards, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium, or transmitted over a transmission medium or a communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.

It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed at the same time.

The above are only specific embodiments of the present application. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of protection of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall be covered by the scope of protection of the present application.

Claims

1. A method of file processing, the method comprising:

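acquiring a first file to be processed;

parsing the first file, and determining first identification information corresponding to the first file;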

under the condition that a first knowledge base corresponding to the first identification information is detected in a preset processing model, inputting the first file into the preset processing model, and processing the first file based on the first knowledge base to obtain a second file;

the preset processing model is constructed according to at least one knowledge base, and the at least one knowledge base comprises the first knowledge base.

2. The method according to claim 1, wherein the parsing the first file and determining the first identification information corresponding to the first file comprises:

3. The method according to claim 1, wherein the inputting the first file into the preset processing model in a case that it is detected that a first knowledge base corresponding to the first identification information is included in a preset processing model, and processing the first file based on the first knowledge base to obtain a second file comprises:

4. The method according to claim 3, wherein the first knowledge base includes a preset format and a preset correspondence, and the performing format conversion processing on the first information in the first file and performing information cleaning processing on the second information in the first file to obtain the second file includes:

and replacing the second information in the third file with fourth information based on the preset corresponding relationship to obtain the second file, wherein the preset corresponding relationship comprises the second information and the fourth information which correspond to each other.

5. The method according to claim 1, wherein the inputting the first file into the preset processing model and processing the first file based on the first knowledge base to obtain a second file when detecting that a preset processing model includes a first knowledge base corresponding to the first identification information includes:

under the condition that a first knowledge base corresponding to the first identification information is detected to be included in a preset processing model, receiving a first input of the first file, wherein the first input is used for indicating fifth information in the first file;

responding to the first input, inputting the first file into the preset processing model, and processing the fifth information in the first file based on the first knowledge base to obtain the second file.

6. The method according to claim 1, wherein after the parsing the first file and determining the first identification information corresponding to the first file, the method further comprises:

displaying prompt information under the condition that a first knowledge base corresponding to the first identification information is not detected;

receiving a second input on the prompt information;

and responding to the second input, establishing the first knowledge base and storing the editing content corresponding to the second input in the first knowledge base.

7. The method of claim 1, further comprising:

determining title information meeting preset conditions from the first file based on the first knowledge base, wherein the title information is positioned in a target row;

8. A file processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring a first file to be processed;

the processing module is used for inputting the first file into the preset processing model under the condition that a first knowledge base corresponding to the first identification information is detected in the preset processing model, and processing the first file based on the first knowledge base to obtain a second file;

9. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the file processing method according to any one of claims 1 to 7.

10. A readable storage medium, characterized in that the readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the file processing method according to any one of claims 1 to 7.