CN110826619A - File classification method and device of electronic files and electronic equipment - Google Patents


Info

Publication number
CN110826619A
Authority
CN
China
Legal status
Pending
Application number
CN201911058977.8A
Other languages
Chinese (zh)
Inventor
赵岳
贾昌鑫
贺敏
刘明
付阳
张学来
张云仙
Current Assignee
BEIJING THUNISOFT INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING THUNISOFT INFORMATION TECHNOLOGY Co Ltd
Application filed by BEIJING THUNISOFT INFORMATION TECHNOLOGY Co Ltd
Priority to CN201911058977.8A
Publication of CN110826619A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The embodiments of the disclosure provide a document classification method and device for electronic files, and an electronic device, in the technical field of image processing. The method comprises: receiving images corresponding to all material pages of an electronic file to be classified; preprocessing the image of each material page to obtain its layout type, the layout types comprising first page, middle page and last page; performing character recognition on all first-page images of the electronic file; and matching the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image. This processing scheme improves both the efficiency of document classification for electronic files and the utilization of computing resources.

Description

File classification method and device of electronic files and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a document classification method and apparatus for electronic files, and an electronic device.
Background
For a long time, organizing electronic case files in courts has been done manually. Case files are usually large, so this work consumes a great deal of court staff time. With the development of deep learning in recent years, computer vision and pattern recognition have improved greatly, and electronic files can be recognized and classified automatically by means of a rule engine and artificial intelligence (AI). However, existing technical solutions recognize every page in sequence, which takes a long time and occupies substantial computing resources.
Therefore, existing document classification methods for electronic files are slow and consume considerable computing resources.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a document classification method for electronic files that at least partially solves the problems in the prior art.
In a first aspect, an embodiment of the present disclosure provides a document classification method for an electronic file, the method comprising:
receiving images corresponding to all material pages of the electronic file to be classified;
preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise first page, middle page and last page;
performing character recognition on all first-page images of the electronic file; and
matching the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image.
According to a specific implementation of the embodiments of the present disclosure, the preprocessing includes image classification, blank-page detection and image layout classification.
According to a specific implementation of the embodiments of the present disclosure, before the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, the method further includes:
establishing an image layout classification model;
and the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page includes:
inputting the image corresponding to each material page of the electronic file into the image layout classification model to obtain the layout type of the image corresponding to each material page.
According to a specific implementation of the embodiments of the present disclosure, the step of establishing the image layout classification model includes:
training a convolutional neural network with a preset number of samples to obtain an image layout classification model capable of classifying the layout of an image, wherein the samples include at least first-page images, middle-page images and last-page images.
According to a specific implementation of the embodiments of the present disclosure, the step of matching the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image includes:
determining all documents contained in the electronic file according to all first-page images of the electronic file, wherein each first-page image corresponds to one document, and each document comprises a first-page image, middle-page images and a last-page image on sequentially adjacent pages; and
matching the character recognition result of the first-page image of each document against preset rules to obtain the document category of the document corresponding to that first-page image.
According to a specific implementation of the embodiments of the present disclosure, the image types obtained by the preprocessing include an attachment type; the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page further includes:
classifying the image of each material page to obtain the attachment images of the electronic file;
and after the step of matching the character recognition result of the first-page image of each document against preset rules to obtain the document category of the document corresponding to that first-page image, the method further includes:
performing character recognition on each attachment image;
determining, according to the character recognition result of each attachment image, the category of the document to which the attachment image belongs; and
storing each attachment image together with the document of the corresponding category.
According to a specific implementation of the embodiments of the present disclosure, the step of performing character recognition on all first-page images of the electronic file includes:
converting all first-page images of the electronic file into text through optical character recognition (OCR).
In a second aspect, an embodiment of the present disclosure provides a document classification apparatus for electronic files, including:
a receiving module, configured to receive images corresponding to all material pages of the electronic file to be classified;
a preprocessing module, configured to preprocess the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise first page, middle page and last page;
a recognition module, configured to perform character recognition on all first-page images of the electronic file; and
a matching module, configured to match the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the document classification method for an electronic file according to the first aspect or any implementation of the first aspect.
In a fourth aspect, the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the document classification method for an electronic file according to the first aspect or any implementation of the first aspect.
In a fifth aspect, the present disclosure further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the document classification method for an electronic file according to the first aspect or any implementation of the first aspect.
The document classification scheme for electronic files in the embodiments of the disclosure includes: receiving images corresponding to all material pages of the electronic file to be classified; preprocessing the image of each material page to obtain its layout type (first page, middle page or last page); performing character recognition on all first-page images of the electronic file; and matching the character recognition results of all first-page images against preset rules to obtain the document category of the document corresponding to each first-page image. Because only first-page images undergo character recognition, this scheme improves the efficiency of document classification for electronic files.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of document classification for an electronic file according to an embodiment of the present disclosure;
FIG. 2 is a partial flow chart of document classification for an electronic file according to another embodiment of the present disclosure;
FIG. 3 is a partial flow chart of document classification for an electronic file according to another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a document classification apparatus for electronic files according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
An embodiment of the disclosure provides a document classification method for electronic files. The method may be executed by a computing device, which may be implemented as software or as a combination of software and hardware, and may be integrated in a server, a terminal device, or the like.
Referring to FIG. 1, a document classification method for an electronic file provided by an embodiment of the present disclosure includes:
S101: receiving images corresponding to all material pages of the electronic file to be classified.
The document classification method provided by the embodiments of the disclosure can be applied to classifying case-file documents in settings such as courts, public security bureaus and procuratorates.
The image receiving module receives all material page images of the electronic file to be classified. It may send the images directly to the processor for subsequent analysis, or it may store them in a preset storage space; when specific pages of the electronic file need to be analyzed, the corresponding images are retrieved from the preset storage space.
In this embodiment, the electronic file to be classified is a court case file. Its material page images include at least complaint images, identification material images and first-instance judgment images, and may also include material page images such as second-instance judgments and service-of-process documents.
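The receive-then-store behaviour of the image receiving module can be sketched as a small page store keyed by file and page number, so that specific pages can be fetched later without reloading the whole file. This is a minimal in-memory sketch under assumptions: the `PageStore` class, its method names and the byte-string page representation are hypothetical and stand in for whatever "preset storage space" (database, object store, file system) a deployment actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class PageStore:
    """Hypothetical in-memory stand-in for the 'preset storage space'."""

    # (file_id, 1-based page number) -> raw page image bytes
    pages: Dict[Tuple[str, int], bytes] = field(default_factory=dict)

    def put_file(self, file_id: str, images: Iterable[bytes]) -> int:
        """Store every page image of one electronic file; return the page count."""
        count = 0
        for page_no, image in enumerate(images, start=1):
            self.pages[(file_id, page_no)] = image
            count += 1
        return count

    def get_pages(self, file_id: str, page_numbers: Iterable[int]) -> List[bytes]:
        """Fetch only the requested pages, e.g. the first pages chosen for OCR."""
        return [self.pages[(file_id, n)] for n in page_numbers]


store = PageStore()
store.put_file("case-001", [b"page-1", b"page-2", b"page-3"])
first_pages = store.get_pages("case-001", [1, 3])
```

Keying by page number lets later steps pull just the first-page or attachment images they need.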
S102: preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise first page, middle page and last page.
After all material pages of the electronic file are received, their images may be processed by preset operations, which optionally include image classification, blank-page detection and image layout classification. Image classification divides the material page images into at least document images, attachment images and other images. Blank-page detection finds material page images that contain no content and removes them. Image layout classification divides the images of the electronic file into first-page, middle-page and last-page images. Because the type of each document can then be identified from its first-page image alone, the classification process is considerably simplified.
For example, when a court's electronic file is preprocessed, image classification, blank-page detection and image layout classification are performed on all of its material page images to obtain the layout type of the image corresponding to each material page.
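A minimal sketch of the preprocessing order described above: blank pages are dropped first, and each remaining page receives an image type and a layout type. The ink-ratio blank test, its thresholds, and the `classify_type` / `classify_layout` callables are assumptions for illustration; the disclosure does not specify how blank-page detection or image classification is implemented.

```python
from typing import Callable, List, Sequence, Tuple

# A grayscale page as rows of pixel values, 0 = black, 255 = white.
Image = Sequence[Sequence[int]]


def is_blank(image: Image, dark_threshold: int = 200, min_ink_ratio: float = 0.005) -> bool:
    """Assumed heuristic: a page is blank if almost no pixels are darker than dark_threshold."""
    total = sum(len(row) for row in image)
    if total == 0:
        return True
    dark = sum(1 for row in image for px in row if px < dark_threshold)
    return dark / total < min_ink_ratio


def preprocess(
    pages: List[Image],
    classify_type: Callable[[Image], str],    # e.g. "document", "attachment", "other"
    classify_layout: Callable[[Image], str],  # "first", "middle" or "last"
) -> List[Tuple[int, str, str]]:
    """Return (page_number, image_type, layout_type) for every non-blank page."""
    results = []
    for page_no, image in enumerate(pages, start=1):
        if is_blank(image):
            continue  # blank pages are removed before any further processing
        results.append((page_no, classify_type(image), classify_layout(image)))
    return results
```

Original page numbers are kept in the output so that later grouping still sees the true page order after blank pages are removed.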
S103: performing character recognition on all first-page images of the electronic file.
Character recognition means that the electronic device converts the characters in an image into text, so that keywords in the text can be identified in subsequent operations.
After the layout type of each material page image is obtained, the first-page images are extracted and converted into text by optical character recognition (OCR). The resulting texts may be sent to the processor or stored in a preset storage space.
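Restricting OCR to first pages amounts to a filter placed in front of the OCR call. In this sketch the `ocr` callable is a hypothetical placeholder for any OCR engine; the point is only that middle-page and last-page images never reach it.

```python
from typing import Callable, Dict, List, Tuple


def recognize_first_pages(
    pages: List[Tuple[int, str, bytes]],
    ocr: Callable[[bytes], str],
) -> Dict[int, str]:
    """Run OCR only on pages whose layout type is 'first'.

    `pages` holds (page_number, layout_type, image) tuples from preprocessing.
    Returns {page_number: recognized_text}.
    """
    return {
        page_no: ocr(image)
        for page_no, layout, image in pages
        if layout == "first"
    }
```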
S104: matching the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image.
The texts of the first-page images are matched according to a preset algorithm, and the document category of each corresponding document is obtained from the keywords in its first-page text.
For example, if the text of a first-page image contains keywords such as "judgment" and "penalty", the preset algorithm can determine that the corresponding document is a written judgment.
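The keyword matching in S104 could, for instance, be expressed as a rule table that maps each document category to keywords and picks the category with the most keyword hits. This is one possible rule scheme, not the patent's preset algorithm; the categories, English keywords and hit-count scoring are hypothetical (a real court deployment would use Chinese keywords and likely a richer rule engine).

```python
from typing import Dict, List, Optional

# Hypothetical rule table: document category -> keywords that indicate it.
RULES: Dict[str, List[str]] = {
    "judgment": ["judgment", "penalty"],
    "complaint": ["complaint", "plaintiff"],
}


def match_category(text: str, rules: Dict[str, List[str]] = RULES) -> Optional[str]:
    """Return the category whose keywords occur most often in text, or None."""
    lowered = text.lower()
    best, best_hits = None, 0
    for category, keywords in rules.items():
        hits = sum(lowered.count(k.lower()) for k in keywords)
        if hits > best_hits:  # ties keep the earlier category
            best, best_hits = category, hits
    return best
```

Returning `None` when nothing matches leaves room for a fallback such as manual review.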
In the document classification method provided by the embodiments of the disclosure, the document materials in an electronic file are divided into first pages, middle pages and last pages, character recognition is performed only on the first pages, and the first-page recognition results are matched against preset rules to obtain the category of the document corresponding to each first-page image. Skipping character recognition on middle and last pages improves both the efficiency of document classification for electronic files and the utilization of computing resources.
In an embodiment, as shown in FIG. 2, before the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, the method may further include:
S201: establishing an image layout classification model.
The electronic device may establish in advance a model for classifying image layouts, referred to as the image layout classification model, which analyzes an input image to determine whether it is a first-page, middle-page or last-page image.
Optionally, the step of establishing the image layout classification model includes:
training a convolutional neural network with a preset number of samples to obtain an image layout classification model capable of classifying the layout of an image, wherein the samples include at least first-page images, middle-page images and last-page images.
For example, sample images of at least the first pages, middle pages and last pages of materials are collected, 500 images of each type are stored in a database, and the layout classification model is obtained by training a convolutional neural network on these first-page, middle-page and last-page samples.
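Before the convolutional neural network is trained, the 500 samples per layout class are typically divided into training and validation sets. The sketch below does a per-class (stratified) split so both sets keep the class balance. The 80/20 ratio, the label encoding and the file names are assumptions. The CNN architecture and training loop, which the disclosure does not specify, are not shown.

```python
import random
from typing import Dict, List, Tuple

# Assumed label encoding for the three layout classes.
LAYOUT_LABELS = {"first": 0, "middle": 1, "last": 2}


def build_split(
    samples: Dict[str, List[str]],  # layout type -> list of image paths
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split labelled samples class by class so train/val keep the class balance."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Tuple[str, int]] = []
    val: List[Tuple[str, int]] = []
    for layout, paths in samples.items():
        label = LAYOUT_LABELS[layout]
        shuffled = paths[:]
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        val += [(p, label) for p in shuffled[:n_val]]
        train += [(p, label) for p in shuffled[n_val:]]
    return train, val
```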
S202: the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page includes:
inputting the image corresponding to each material page of the electronic file into the image layout classification model to obtain the layout type of the image corresponding to each material page.
For example, if the electronic file has 100 material page images, each of the 100 images is input into the image layout classification model to obtain its layout type.
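Applying the trained model to each page reduces to taking, for every image, the layout class with the highest score. In this sketch `model` is a hypothetical callable returning three scores in the order first, middle, last; a real CNN would return logits or probabilities in whatever order it was trained with.

```python
from typing import Callable, List, Sequence

# Assumed output order of the layout model's three scores.
LAYOUTS = ("first", "middle", "last")


def classify_layouts(
    images: Sequence[object],
    model: Callable[[object], Sequence[float]],
) -> List[str]:
    """Feed each page image to the layout model and keep the highest-scoring class."""
    layouts = []
    for image in images:
        scores = model(image)
        best = max(range(len(LAYOUTS)), key=lambda i: scores[i])  # argmax
        layouts.append(LAYOUTS[best])
    return layouts
```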
In addition, in a specific embodiment, as shown in FIG. 3, matching the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image includes:
S301: determining all documents contained in the electronic file according to all first-page images of the electronic file, wherein each first-page image corresponds to one document, and each document comprises a first-page image, middle-page images and a last-page image on sequentially adjacent pages.
The number of documents in the electronic file is determined from its first-page images, and a first-page image together with the middle-page and last-page images on the pages that immediately follow it is defined as one document.
S302: matching the character recognition result of the first-page image of each document against preset rules to obtain the document category of the document corresponding to that first-page image.
The first-page image of each document is converted into text, and the category of the corresponding document is obtained from that text by a preset algorithm. Because the category of a document can be determined by recognizing only its first-page image, recognition efficiency is improved.
For example, suppose the electronic file has 100 page images, and layout classification labels pages 1 to 2 as first-page images, pages 3 to 10 as middle-page images and page 11 as a last-page image. Pages 1 to 11 are then defined as one document, and the text of the first-page images on pages 1 to 2 is matched against the preset rules; if it is judgment content, the document formed by pages 1 to 11 is determined to be a written judgment.
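The grouping rule of S301 can be sketched as a single pass over the page layouts: a first page opens a new document, a last page closes the current one, and pages outside an open document stay unassigned. This follows the one-document-per-first-page rule stated in S301; how a missing last page is handled (closing the document at the next first page or at the end of the file) is an assumption of this sketch.

```python
from typing import List, Optional, Tuple


def group_documents(layouts: List[str]) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start_page, end_page) ranges, one per document.

    `layouts[i]` is the layout type ("first", "middle" or "last") of page i + 1.
    """
    documents: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for page_no, layout in enumerate(layouts, start=1):
        if layout == "first":
            if start is not None:  # previous document had no last page
                documents.append((start, page_no - 1))
            start = page_no
        elif layout == "last" and start is not None:
            documents.append((start, page_no))
            start = None
    if start is not None:  # file ended before a last page
        documents.append((start, len(layouts)))
    return documents
```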
Further, the image types obtained by the preprocessing include an attachment type.
The step S102 of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page further includes:
classifying the image of each material page to obtain the attachment images of the electronic file.
On the basis of the embodiment disclosed above, after step S104 matches the character recognition result of each document's first-page image against preset rules to obtain the document category of the corresponding document, the method may further include:
performing character recognition on each attachment image;
determining, according to the character recognition result of each attachment image, the category of the document to which the attachment image belongs; and
storing each attachment image together with the document of the corresponding category.
That is, when the material page images are preprocessed, they are also classified by image type, the attachment images in the electronic file are extracted, and their content is converted into text. The category of the document each attachment belongs to is determined from that text, and the attachment images are stored together with the previously classified documents.
For example, if image classification determines that pages 12 to 14 of the electronic file are attachment images, those pages are extracted for character recognition; if they are determined to belong to a written judgment, pages 12 to 14 are stored together with the judgment documents.
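Storing attachments with documents of the matching category can be sketched as a lookup from category to document identifiers. The `categorize` callable (for example, keyword rules applied to the attachment's OCR text), the document identifiers and the choice to attach a page to every document of that category are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def attach_to_documents(
    documents: Dict[str, str],      # hypothetical document id -> category
    attachments: Dict[int, str],    # attachment page number -> OCR text
    categorize: Callable[[str], Optional[str]],
) -> Dict[str, List[int]]:
    """Store each attachment page with every document of the matching category."""
    by_category: Dict[str, List[str]] = defaultdict(list)
    for doc_id, category in documents.items():
        by_category[category].append(doc_id)
    stored: Dict[str, List[int]] = defaultdict(list)
    for page_no, text in sorted(attachments.items()):
        category = categorize(text)
        # .get avoids creating empty entries for unmatched categories.
        for doc_id in by_category.get(category, []):
            stored[doc_id].append(page_no)
    return dict(stored)
```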
When one material in an electronic file consists of multiple images, the layout classification model divides it into first, middle and last pages, and character recognition is performed only on the first-page image; preset rule matching then gives the category of the corresponding document. Character recognition is also performed on attachment images to determine the category of the document each belongs to, and they are stored accordingly. This greatly reduces the load on the image character recognition service and improves both the utilization of computing resources and the efficiency of document classification.
Corresponding to the method embodiment above, and referring to FIG. 4, an embodiment of the present disclosure further provides a document classification apparatus 40 for electronic files, including:
a receiving module 401, configured to receive images corresponding to all material pages of the electronic file to be classified;
a preprocessing module 402, configured to preprocess the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise first page, middle page and last page;
a recognition module 403, configured to perform character recognition on all first-page images of the electronic file; and
a matching module 404, configured to match the character recognition results of all first-page images of the electronic file against preset rules to obtain the document category of the document corresponding to each first-page image.
The apparatus shown in FIG. 4 can correspondingly execute the method embodiment above; for details not described in this embodiment, refer to the method embodiment above.
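The four modules of apparatus 40 can be sketched as one class whose collaborators are injected, which keeps the layout model, OCR engine and rule matcher swappable. All names here are hypothetical, and the blank-page and attachment handling described earlier is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentClassifier:
    """Hypothetical sketch of apparatus 40 with modules 401-404."""

    layout_of: Callable[[bytes], str]      # preprocessing module 402 (layout model)
    ocr: Callable[[bytes], str]            # recognition module 403 (OCR engine)
    match: Callable[[str], Optional[str]]  # matching module 404 (preset rules)

    def run(self, images: List[bytes]) -> Dict[int, Optional[str]]:
        """Receiving module 401 entry point: classify one electronic file's pages."""
        layouts = [self.layout_of(img) for img in images]
        # Only first-page images are sent to OCR.
        texts = {
            page_no: self.ocr(img)
            for page_no, (img, layout) in enumerate(zip(images, layouts), start=1)
            if layout == "first"
        }
        # Each first page's text determines its document's category.
        return {page_no: self.match(text) for page_no, text in texts.items()}
```

Injecting the three callables mirrors the module split in FIG. 4 and lets each stage be tested or replaced on its own.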
Referring to FIG. 5, an embodiment of the present disclosure also provides an electronic device 50, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the document classification method for an electronic file in the method embodiments above.
The embodiments of the disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the document classification method for an electronic file in the method embodiments above.
The embodiments of the disclosure also provide a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the document classification method for an electronic file in the method embodiments above.
Referring now to FIG. 5, a schematic diagram of an electronic device 50 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 50 may include a processing device 501 (e.g., a central processing unit or a graphics processor) that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 50. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 50 to communicate with other devices wirelessly or by wire to exchange data. While the figures illustrate an electronic device 50 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the steps associated with the method embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above describes only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto. Any change or substitution readily conceivable by those skilled in the art within the technical scope of the present disclosure shall fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (10)

1. A method for classifying documents in an electronic case file, the method comprising:
receiving images corresponding to all material pages of the electronic case file to be classified;
preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise a first page, a middle page and a last page;
performing character recognition on all first-page images of the electronic case file; and
performing document category matching according to preset rules on the character recognition results of all first-page images of the electronic case file, to obtain the document category of the document corresponding to each first-page image.
2. The method of claim 1, wherein the preprocessing comprises image classification, blank-page detection, and image layout classification.
3. The method of claim 2, wherein, before the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page, the method further comprises:
establishing an image layout classification model;
and the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page comprises:
inputting the image corresponding to each material page of the electronic case file into the image layout classification model to obtain the layout type of the image corresponding to each material page.
4. The method of claim 3, wherein the step of establishing an image layout classification model comprises:
training a convolutional neural network with a preset number of sample data to obtain the image layout classification model capable of classifying the layout of an image, wherein the sample data comprise at least first-page images, middle-page images and last-page images.
5. The method of claim 2, wherein the step of performing document category matching according to preset rules on the character recognition results of all first-page images of the electronic case file, to obtain the document category of the document corresponding to each first-page image, comprises:
determining all documents contained in the electronic case file according to all first-page images of the electronic case file, wherein each first-page image corresponds to one document, and each document comprises a first-page image followed by the sequentially adjacent middle-page and last-page images; and
performing document category matching according to preset rules on the character recognition result of the first-page image of each document, to obtain the document category of the document corresponding to that first-page image.
6. The method of claim 5, wherein the preprocessed image types comprise an attachment-type image;
the step of preprocessing the image of each material page to obtain the layout type of the image corresponding to each material page further comprises:
classifying the image of each material page to obtain the attachment-type images of the electronic case file;
after the step of performing document category matching according to preset rules on the character recognition result of the first-page image of each document, to obtain the document category of the document corresponding to that first-page image, the method further comprises:
performing character recognition on each attachment-type image;
determining, according to the character recognition result of each attachment-type image, the category of the document corresponding to that attachment-type image; and
storing each attachment-type image together with the document of the corresponding category.
7. The method of claim 1, wherein the step of performing character recognition on all first-page images of the electronic case file comprises:
converting all first-page images of the electronic case file into text format through optical character recognition (OCR).
8. An apparatus for classifying documents in an electronic case file, comprising:
a receiving module, configured to receive images corresponding to all material pages of the electronic case file to be classified;
a preprocessing module, configured to preprocess the image of each material page to obtain the layout type of the image corresponding to each material page, wherein the layout types comprise a first page, a middle page and a last page;
a recognition module, configured to perform character recognition on all first-page images of the electronic case file; and
a matching module, configured to perform document category matching according to preset rules on the character recognition results of all first-page images of the electronic case file, to obtain the document category of the document corresponding to each first-page image.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for classifying documents in an electronic case file of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for classifying documents in an electronic case file of any one of claims 1-7.
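The flow of claims 1, 2, 5, 6 and 7 can be illustrated with the minimal Python sketch below. It is not the applicant's implementation and rests on stated assumptions: the layout label of each page (claims 3-4, CNN classifier) and its OCR text are taken as already available, the OCR engine is replaced by a hypothetical placeholder `default_ocr`, the "preset rules" are modeled as a hypothetical keyword table `RULES` (the claims do not specify a rule format), and `BLANK` and `"attachment"` are hypothetical labels for blank pages and attachment-type images.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# Layout labels from claim 1, plus a hypothetical "blank" label for pages
# flagged by blank-page detection (claim 2).
FIRST, MIDDLE, LAST, BLANK = "first", "middle", "last", "blank"


@dataclass
class Page:
    index: int      # position of the material page within the case file
    layout: str     # label assumed to come from the layout classifier (claims 3-4)
    text: str = ""  # stand-in for what OCR would return for this page image


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)
    category: Optional[str] = None
    attachments: List[Page] = field(default_factory=list)


# Hypothetical preset rules: category -> keywords searched for in OCR text.
# Dicts preserve insertion order, so the first matching rule wins.
RULES: Dict[str, List[str]] = {
    "indictment": ["indictment"],
    "judgment": ["judgment", "hereby rules"],
    "power of attorney": ["power of attorney"],
}


def default_ocr(page: Page) -> str:
    # Placeholder for a real OCR engine (claim 7): returns the canned text.
    return page.text


def match_category(text: str, rules: Dict[str, List[str]] = RULES) -> Optional[str]:
    lowered = text.lower()
    for category, keywords in rules.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None  # no preset rule matched


def group_documents(pages: Sequence[Page]) -> List[Document]:
    """Claim 5: each first page opens a new document, and the adjacent
    middle and last pages that follow are appended to it."""
    docs: List[Document] = []
    for page in sorted(pages, key=lambda p: p.index):
        if page.layout == BLANK:
            continue  # dropped by blank-page detection
        if page.layout == FIRST or not docs:
            docs.append(Document())  # leading orphan pages also get a document
        docs[-1].pages.append(page)
    return docs


def classify_case_file(
    pages: Sequence[Page],
    attachments: Sequence[Page] = (),
    ocr: Callable[[Page], str] = default_ocr,
) -> List[Document]:
    docs = group_documents(pages)
    for doc in docs:
        head = doc.pages[0]
        if head.layout == FIRST:  # only first pages are recognized and matched
            doc.category = match_category(ocr(head))
    # Claim 6: recognize each attachment-type image and store it with the
    # document of the matching category (the last such document, if several).
    by_category = {d.category: d for d in docs if d.category is not None}
    for page in attachments:
        target = by_category.get(match_category(ocr(page)))
        if target is not None:
            target.attachments.append(page)
    return docs


if __name__ == "__main__":
    pages = [
        Page(0, FIRST, "Indictment No. 1"),
        Page(1, LAST),
        Page(2, BLANK),
        Page(3, FIRST, "Criminal Judgment"),
        Page(4, MIDDLE),
        Page(5, LAST),
    ]
    annex = Page(6, "attachment", "Annex to the judgment")
    docs = classify_case_file(pages, [annex])
    print([(d.category, [p.index for p in d.pages], len(d.attachments)) for d in docs])
    # [('indictment', [0, 1], 0), ('judgment', [3, 4, 5], 1)]
```

Grouping runs before matching, so only one recognition call per document is needed (on its first page), which mirrors claim 1's restriction of character recognition to first-page images. A real system would replace the keyword table with whatever rule set is actually used.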
CN201911058977.8A 2019-11-01 2019-11-01 File classification method and device of electronic files and electronic equipment Pending CN110826619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911058977.8A CN110826619A (en) 2019-11-01 2019-11-01 File classification method and device of electronic files and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911058977.8A CN110826619A (en) 2019-11-01 2019-11-01 File classification method and device of electronic files and electronic equipment

Publications (1)

Publication Number Publication Date
CN110826619A (en) 2020-02-21

Family ID: 69551886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911058977.8A Pending CN110826619A (en) 2019-11-01 2019-11-01 File classification method and device of electronic files and electronic equipment

Country Status (1)

Country Link
CN (1) CN110826619A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860657A (en) * 2020-07-23 2020-10-30 中国建设银行股份有限公司 Image classification method and device, electronic equipment and storage medium
CN112990177A (en) * 2021-04-13 2021-06-18 太极计算机股份有限公司 Classified cataloguing method, device and equipment based on electronic file files
CN113344510A (en) * 2021-05-28 2021-09-03 方欣科技有限公司 Intelligent tax material online auditing method, device, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188077A (en) * 2019-05-29 2019-08-30 北京市律典通科技有限公司 A kind of electronics folder intelligent method for classifying, device, electronic equipment and storage medium
CN110363102A (en) * 2019-06-24 2019-10-22 北京融汇金信信息技术有限公司 A kind of identification of objects process method and device of pdf document

Similar Documents

Publication Publication Date Title
CN112507806B (en) Intelligent classroom information interaction method and device and electronic equipment
CN110674349B (en) Video POI (Point of interest) identification method and device and electronic equipment
CN110826619A (en) File classification method and device of electronic files and electronic equipment
CN110278447B (en) Video pushing method and device based on continuous features and electronic equipment
CN111738316B (en) Zero sample learning image classification method and device and electronic equipment
CN109815448B (en) Slide generation method and device
CN111178056A (en) Deep learning based file generation method and device and electronic equipment
CN111753114A (en) Image pre-labeling method and device and electronic equipment
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN111797822B (en) Text object evaluation method and device and electronic equipment
CN112487883A (en) Intelligent pen writing behavior characteristic analysis method and device and electronic equipment
CN112486337A (en) Handwriting graph analysis method and device and electronic equipment
CN111832354A (en) Target object age identification method and device and electronic equipment
CN113486171B (en) Image processing method and device and electronic equipment
CN111402867B (en) Hybrid sampling rate acoustic model training method and device and electronic equipment
CN111738311A (en) Multitask-oriented feature extraction method and device and electronic equipment
CN110852042A (en) Character type conversion method and device
CN112487897A (en) Handwriting content evaluation method and device and electronic equipment
CN113936271A (en) Text recognition method and device, readable medium and electronic equipment
CN112487774A (en) Writing form electronization method and device and electronic equipment
CN110083807B (en) Contract modification influence automatic prediction method, device, medium and electronic equipment
CN110674348B (en) Video classification method and device and electronic equipment
CN112309180A (en) Text processing method, device, equipment and medium
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN112307245B (en) Method and apparatus for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination