CN113377958A - Document classification method and device, electronic equipment and storage medium


Info

Publication number: CN113377958A
Authority: CN (China)
Prior art keywords: image, classified, features, text, document classification
Prior art date
Legal status: Pending
Application number: CN202110768945.8A
Other languages: Chinese (zh)
Inventors: 韩光耀, 许海洋, 王天祺
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority application: CN202110768945.8A; published as CN113377958A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/24: Classification techniques
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/103: Workflow collaboration or project management
    • G06Q 40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a document classification method, a document classification apparatus, an electronic device, a storage medium, and a computer program product, relating in particular to the field of image processing. The specific implementation scheme is as follows: acquire image features related to an audit service from an image to be classified, together with text features corresponding to a specified text in that image; perform feature fusion on the image features and the text features to obtain fused features; and obtain the document classification result corresponding to the image to be classified according to the fused features. Because the scheme considers both the audit-related image features and the text features of the specified text when classifying, the accuracy of the document classification result is improved.

Description

Document classification method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision and image processing, and more specifically to a document classification method and apparatus, an electronic device, and a storage medium.
Background
With the rapid spread of "Internet + government services", more and more residents and enterprises transact business online. When they do, the document materials required for the transaction are often converted into images and uploaded. Taking financial auditing business as an example: to transact it online, checks, deposit agreements, and similar materials must be photographed or scanned into images and uploaded to the business transaction platform.
Business is then handled on the basis of these uploaded images, which are usually classified first. However, because the document materials required by an audit service are numerous and often similar to one another, the accuracy of document classification over such images tends to be low.
Disclosure of Invention
The present disclosure provides a document classification method, apparatus, electronic device, storage medium, and computer program product to improve accuracy of a document classification result.
According to a first aspect of the present disclosure, there is provided a document classification method, including:
acquiring image features related to an audit service in an image to be classified and text features corresponding to a specified text in the image to be classified;
performing feature fusion on the image features and the text features to obtain fusion features;
and obtaining a document classification result corresponding to the image to be classified according to the fusion characteristics.
According to a second aspect of the present disclosure, there is provided a document classification apparatus, the apparatus comprising:
the image characteristic acquisition module is used for acquiring image characteristics related to an audit service in an image to be classified and text characteristics corresponding to a specified text in the image to be classified;
a fusion feature obtaining module, configured to perform feature fusion on the image feature and the text feature to obtain a fusion feature;
and the classification result obtaining module is used for obtaining a document classification result corresponding to the image to be classified according to the fusion characteristics.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any of the methods of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform any one of the methods of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, enables carrying out any of the methods of the present disclosure.
According to this scheme, the image features related to the audit service in the image to be classified and the text features corresponding to the specified text in that image are fused, and the document classification result is then obtained from the fused features. Because both the audit-related image features and the text features of the specified text are considered at once, the accuracy of the document classification result is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a document classification method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a document classification result obtaining method provided in an embodiment of the present disclosure;
FIG. 3 is a flow chart of another document classification result obtaining method provided in an embodiment of the present disclosure;
FIG. 4 is a schematic illustration of a process of document classification provided in an embodiment of the present disclosure;
FIG. 5 is a schematic view of a document sorting apparatus provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The document classification method provided in the embodiment of the present disclosure is shown in fig. 1, which is a flowchart of a document classification method provided in the embodiment of the present disclosure. The method comprises the following steps:
step S101: and acquiring image features related to the auditing service in the image to be classified and text features corresponding to the specified text in the image to be classified.
Step S102: and carrying out feature fusion on the image features and the text features to obtain fusion features.
Step S103: and obtaining a document classification result corresponding to the image to be classified according to the fusion characteristics.
In the document classification method provided in the embodiment of the present disclosure, the execution subject may be a client, where the client is an electronic device for running a target application or software, and the target application or software is a target application or software capable of implementing the document classification method provided in the embodiment of the present disclosure.
In the document classification method provided in the embodiment of the present disclosure, the execution subject may also be a server, where the server is a computing device that provides services such as data processing for a client that runs a specific application or software, and the specific application or software is an application or software that at least has a document classification result acquisition function.
Such electronic devices include, but are not limited to, mobile phones and computers.
The server side is typically implemented as a single server or a server cluster.
Applications and software include, but are not limited to, mobile phone applications, desktop applications or software, and web applications or software.
When obtaining a document classification result for an image to be classified, classification based on image features alone often yields low accuracy, because many document images look similar; classification based on text features alone, on the other hand, must process a large amount of document content, which makes classification unnecessarily complex. Either way, the document classification effect tends to be poor.
The document classification method provided in the embodiment of the disclosure fuses the image features related to the audit service in the image to be classified with the text features corresponding to the specified text in that image, and then obtains the document classification result according to the fused features. Because both kinds of features are considered at once, the accuracy of the document classification result, and with it the classification effect, is improved.
The image to be classified may refer to an image including text for auditing a business, and generally includes: images converted from document material used for auditing transactions, images used for auditing transactions, and the like.
The audit service generally refers to a service for performing audit verification on materials used for handling related services to ensure the accuracy of the materials. Common auditing services include financial auditing services and the like.
To make it clear and simple to determine which image features count as related to the audit service, and so to provide a reference for acquiring them, those features can be determined in advance, before they are extracted from the image to be classified.
In an embodiment of the present disclosure, the image features related to the audit service include at least one of the following: a feature representing the number of tables in the image to be classified, a feature representing the number of seals in the image, a feature representing the proportion of the image area occupied by tables, a feature representing the proportion of handwritten text among all text in the image, and a feature representing the proportion of printed text among all text in the image.
The feature representing the number of tables in the image to be classified can be obtained as follows: first, input the image into a pre-trained table detection model to obtain the tables it contains; then, count the tables; finally, encode the count to obtain the feature representing the number of tables.
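As a sketch of the count-then-encode step (the table detection model, its output format, and the one-hot encoding are our assumptions; the disclosure does not fix them):

```python
import numpy as np

def table_count_feature(image, table_detector, max_tables=8):
    """One-hot encode the number of tables found by a detection model.

    `table_detector` is a hypothetical stand-in for the pre-trained table
    detection model; it is assumed to return one bounding box per table.
    """
    boxes = table_detector(image)               # step 1: detect tables
    count = min(len(boxes), max_tables)         # step 2: count them (clipped)
    feature = np.zeros(max_tables + 1, dtype=np.float32)
    feature[count] = 1.0                        # step 3: feature-encode the count
    return feature

# Usage with a fake detector that "finds" two tables:
fake_detector = lambda img: [(0, 0, 10, 10), (20, 20, 40, 40)]
feat = table_count_feature(None, fake_detector)
```

The same count-then-encode shape would apply to the seal-count feature below, with the seal detection model swapped in.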
The pre-trained table detection model is obtained by training a table detection model on sample images with correspondingly labelled tables.
The feature representing the number of seals in the image to be classified can be obtained in the same way: first, input the image into a pre-trained seal detection model to obtain the seals it contains; then, count the seals; finally, encode the count to obtain the feature representing the number of seals.
The pre-trained seal detection model is obtained by training a seal detection model on sample images with correspondingly labelled seals.
The feature representing the proportion of the image area occupied by tables is obtained as follows: first, obtain the area of the image to be classified from its image attributes; second, sum the areas of the tables detected in the image; then divide the summed table area by the image area to obtain the table-to-image area ratio; finally, encode this ratio to obtain the feature.
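A minimal sketch of the ratio computation, assuming the detector returns (x1, y1, x2, y2) bounding boxes (a hypothetical format):

```python
def table_area_ratio(image_width, image_height, table_boxes):
    """Sum the table areas and divide by the image area.

    `table_boxes` holds (x1, y1, x2, y2) boxes from a table detector;
    this box format is an assumption for illustration.
    """
    image_area = image_width * image_height
    table_area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in table_boxes)
    return min(table_area / image_area, 1.0)   # clip in case boxes overlap

ratio = table_area_ratio(100, 100, [(0, 0, 50, 50)])  # one 50x50 table -> 0.25
```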
The feature representing the proportion of handwritten text among all text in the image is obtained as follows: first, obtain the text, text coordinates, and text content in the image using OCR (Optical Character Recognition) or a similar technique; then, compute the proportion of that text which is handwritten; finally, encode this proportion to obtain the feature.
The feature representing the proportion of printed text among all text in the image is obtained analogously: obtain the text via OCR or a similar technique, compute the proportion of the text that is printed, and encode that proportion to obtain the feature.
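Both proportions can be computed from simplified OCR output; the line format below ('text' plus a 'kind' label) is an assumption for illustration, not a real OCR API:

```python
def text_type_ratios(ocr_lines):
    """Character-weighted handwritten/printed ratios over OCR output.

    `ocr_lines` is a hypothetical simplified OCR result: dicts holding the
    recognized 'text' and a 'kind' of either 'handwritten' or 'printed'.
    """
    total = sum(len(line["text"]) for line in ocr_lines)
    if total == 0:
        return 0.0, 0.0
    hand = sum(len(line["text"]) for line in ocr_lines
               if line["kind"] == "handwritten")
    return hand / total, (total - hand) / total

hand_ratio, print_ratio = text_type_ratios([
    {"text": "abc", "kind": "handwritten"},     # 3 characters
    {"text": "defghij", "kind": "printed"},     # 7 characters
])
```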
The text features refer to features obtained by feature coding of a specified text in an image to be classified.
The specified text includes at least one of the title in the image to be classified and a pre-selected number of words taken in reading order. That is, the specified text may be only the title, only the first specified number of words in reading order, or both together.
The specified number of words may be 250 words, 100 words, and so on; it is generally predetermined based on prior experience and the total number of words in the image to be classified.
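A sketch of assembling the specified text under these choices (treating a "word" as a character, which is our simplification for Chinese-style text; the function name is hypothetical):

```python
def specified_text(title, body_text, word_limit=250, use_title=True):
    """Build the specified text: optionally the title, plus the first
    `word_limit` characters of the body in reading order."""
    parts = []
    if use_title and title:
        parts.append(title)
    parts.append(body_text[:word_limit])   # truncate to the pre-selected length
    return " ".join(parts)

text = specified_text("Interbank Deposit Agreement",
                      "Party A and Party B agree to the terms below.",
                      word_limit=10)
```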
Because a title is generally a short phrase identifying a document, using the title in the image to be classified as the specified text means the specified text often identifies the document type accurately. Taking the corresponding text features as a basis for classification can therefore improve the accuracy of the document classification result to a large extent.
Using a pre-selected number of words, taken in reading order, as the specified text limits the amount of text processed, which reduces the computation required and saves cost.
In addition, because the words at the beginning of a document often summarize its content briefly and precisely, the first specified number of words in reading order also tend to identify the document type accurately, so using their text features as a basis for classification likewise improves the accuracy of the result.
In order to ensure the simplicity of the fused feature acquisition operation, in the embodiment of the present disclosure, feature fusion is performed for image features and text features.
In the embodiment of the present disclosure, feature fusion combines the image features and the text features into a single feature, for example by splicing them into one feature or, if the two feature vectors have the same dimension, by adding them element-wise.
A concrete way to splice the image features and the text features into one feature is feature superposition: superpose the two features and take the superposed feature as the fused feature.
In addition, fusion features may also be obtained in ways other than feature superposition; that is, feature fusion may be implemented by other means, which are not detailed here.
Feature superposition means that the image feature and the text feature are spliced into one feature by concatenation. Specifically, if the image feature is taken as X and the text feature as Y, superposing X and Y yields the feature (X, Y). In terms of feature vectors: if the vector for X is (X1, X2, X3) and the vector for Y is (Y1, Y2), then the vector for the superposed feature (X, Y) is (X1, X2, X3, Y1, Y2).
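The superposition described above is a plain vector concatenation; with illustrative values it can be reproduced as:

```python
import numpy as np

image_feature = np.array([0.2, 0.5, 0.1])   # X = (X1, X2, X3)
text_feature = np.array([0.7, 0.3])         # Y = (Y1, Y2)

# Feature superposition: splice X and Y into one feature (X1, X2, X3, Y1, Y2).
fused = np.concatenate([image_feature, text_feature])
```

The element-wise-addition variant mentioned earlier would instead require both vectors to share a dimension and use `image_feature + text_feature`.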
Step S103 of the present disclosure may be implemented by any one of the steps corresponding to fig. 2 or fig. 3. Fig. 2 is a flowchart of a method for obtaining a document classification result provided in an embodiment of the present disclosure, and fig. 3 is a flowchart of another method for obtaining a document classification result provided in an embodiment of the present disclosure.
As shown in fig. 2, step S201: acquire image features related to the audit service in the image to be classified and text features corresponding to the specified text in the image to be classified.
Step S202: perform feature fusion on the image features and the text features to obtain fused features.
Step S203: input the fused features into a trained image classification model to obtain the document classification result corresponding to the image to be classified.
Because the document classification result is obtained with a trained image classification model, its accuracy can be ensured.
To obtain a high-precision image classification model, the model is trained, before the fused features are input into it, using sample fusion features and correspondingly labelled sample document classification results.
An end condition for training can be preset, including but not limited to: training for a specified number of iterations, or the accuracy of the model's document classification results reaching a corresponding threshold.
To obtain the document classification result for the image to be classified more simply and directly, the method shown in fig. 3 may also be adopted in the embodiment of the present disclosure.
Step S301: acquire image features related to the audit service in the image to be classified and text features corresponding to the specified text in the image to be classified.
Step S302: perform feature fusion on the image features and the text features to obtain fused features.
Step S303: according to the fused features, look up the correspondence between fusion features and document classification results, and take the matched document classification result as the result for the image to be classified.
The correspondence between the fusion features and the document classification results is generally that each fusion feature corresponds to one document classification result, that is, the fusion features and the document classification results are generally in one-to-one correspondence.
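Under the one-to-one assumption, the lookup can be sketched as a table from a hashable form of the fused feature to the class name (the entries below are illustrative):

```python
# Hypothetical one-to-one table from fused feature to document class.
FUSION_TO_CLASS = {
    (1.0, 0.0, 0.0): "handwritten check",
    (0.0, 1.0, 0.0): "interbank deposit agreement",
    (0.0, 0.0, 1.0): "commission charge approval table",
}

def classify_by_lookup(fused_feature):
    """Return the document class matched to the fused feature, if any."""
    return FUSION_TO_CLASS.get(tuple(fused_feature), "unknown")

result = classify_by_lookup([1.0, 0.0, 0.0])
```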
The document classification result generally refers to a document type result obtained by dividing a document type of an image to be classified, and the document classification result includes, but is not limited to: a debt business approval bill, a peer deposit agreement, a special business certificate, a handwritten check, a special transfer certificate and a commission charge approval table.
When the execution subject is a server, a process of document classification is shown in fig. 4, which is a schematic diagram of a process of document classification provided in an embodiment of the present disclosure. The specific process is as follows:
first, the server 401 receives a document classification request message sent by the client 402 for an image to be classified.
Secondly, the server 401 acquires, for the document classification request message, image features related to the audit service in the image to be classified and text features corresponding to the specified text in the image to be classified.
The image to be classified may be sent to the server 401 by the client 402, or may be stored in a storage device in the server 401.
Thirdly, the server 401 performs feature fusion on the image feature and the text feature to obtain a fusion feature.
Fourthly, the server 401 inputs the fusion features into the trained image classification model to obtain a document classification result corresponding to the image to be classified.
In fig. 4, a handwritten check is used as the example classification result.
Fifth, the server 401 provides the classification result of the document to the client 402.
Sixth, client 402 presents the document classification results.
Using the server as the execution subject reduces the computational load on the client and so preserves client performance.
In addition, providing the document classification result to the client after it is obtained lets the user learn the result intuitively.
Likewise, so that the user can learn the result intuitively when the execution subject is a client running the target application or software, the document classification result can be displayed directly once it is obtained.
Because the image to be classified is data in image form, its text cannot be extracted directly. To help relevant personnel obtain keywords from the image, a keyword extraction model is determined according to the document classification result corresponding to the image, and is then used to extract keywords from it. A keyword extraction model here is a model for extracting keywords from images of the corresponding document class.
Specifically, images to be classified corresponding to different document classification results, document contents and forms in the images, layouts of the images, and the like are generally different. In order to ensure the accuracy of keyword extraction, for different types of images to be classified, different keyword extraction models need to be selected to extract keywords from the images to be classified, so as to obtain keywords in the images to be classified.
One way to determine the keyword extraction model from the document classification result is: according to the result, select, from the pre-trained extraction models, the model trained for images of that document class, and use it to extract keywords from the image to be classified. For example, a debt business approval sheet requires the keyword extraction model trained for business approval sheets; so when the classification result for an image is "debt business approval sheet", that model is selected from the pre-trained extraction models and applied to the image.
Another way is: determine, from the document classification result, which keyword extraction model is needed for images of that class, and then train and obtain that model.
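The selection-by-class step can be sketched as a registry keyed by document class; the class names and the toy extractors are illustrative stand-ins, not real models:

```python
# Hypothetical registry: one keyword-extraction model per document class.
EXTRACTORS = {
    "debt business approval sheet": lambda image: ["applicant", "amount"],
    "handwritten check": lambda image: ["payee", "sum"],
}

def extract_keywords(image, doc_class):
    """Pick the extractor registered for `doc_class` and apply it to the image."""
    extractor = EXTRACTORS[doc_class]          # selection by classification result
    return extractor(image)

keywords = extract_keywords(None, "debt business approval sheet")
```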
When the document classification result corresponding to the image to be classified is obtained, the image characteristics related to the auditing service in the image to be classified and the text characteristics corresponding to the specified text in the image to be classified are considered, so that the accuracy of the document classification result can be improved. Therefore, the keyword extraction model is selected based on the document classification result, and the accuracy of selecting the keyword extraction model can be improved.
According to an embodiment of the present disclosure, there is also provided a document classification apparatus, as shown in fig. 5, which is a schematic diagram of a document classification apparatus provided in an embodiment of the present disclosure. The apparatus comprises:
an image feature obtaining module 501, configured to obtain image features related to an audit service in an image to be classified and text features corresponding to a specified text in the image to be classified;
a fusion feature obtaining module 502, configured to perform feature fusion on the image feature and the text feature to obtain a fusion feature;
a classification result obtaining module 503, configured to obtain a document classification result corresponding to the image to be classified according to the fusion feature.
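The interaction of the three modules above can be sketched as follows. The feature values, the stub extractors, and the threshold-based classifier are illustrative stand-ins under assumed inputs, not the disclosed implementation.

```python
# Minimal sketch of the three-module pipeline: image features and text
# features are extracted, fused, and fed to a classifier.
import numpy as np

def get_image_features(image) -> np.ndarray:
    # e.g. table count, seal count, table area ratio, text-type ratios
    return np.array([2.0, 1.0, 0.35, 0.1, 0.9])

def get_text_features(image) -> np.ndarray:
    # e.g. an embedding of the title / the first N characters
    return np.array([0.4, -0.2, 0.7])

def classify(fused: np.ndarray) -> str:
    # A trained model would score document classes here; a threshold
    # stands in for it in this sketch.
    return "business_approval_sheet" if fused[0] > 1 else "other"

image = object()                                     # placeholder input
fused = np.concatenate([get_image_features(image),   # module 501
                        get_text_features(image)])   # module 502
result = classify(fused)                             # module 503
```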
In one embodiment, the fused feature obtaining module 502 further comprises:
and the feature superposition submodule is used for superposing the image features and the text features, and taking the superposed features as the fusion features.
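The "superposition" of image features and text features can be read as vector concatenation, a common fusion choice; this reading is an assumption, since the disclosure does not fix the exact operation.

```python
# Fusion by concatenation: the fused vector keeps every component of
# both feature vectors (values here are illustrative).
import numpy as np

image_features = np.array([3.0, 1.0, 0.25])   # e.g. table/seal statistics
text_features = np.array([0.1, 0.8])          # e.g. a title embedding
fused = np.concatenate([image_features, text_features])
```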
In one embodiment, the classification result obtaining module 503 further includes:
and the first classification result obtaining submodule is used for inputting the fusion characteristics into a trained image classification model to obtain a document classification result corresponding to the image to be classified.
In one embodiment, the classification result obtaining module 503 further includes:
and the model training submodule is used for training the image classification model by using the sample fusion characteristics and the correspondingly labeled sample document classification result to obtain the trained image classification model.
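The model training submodule can be sketched as fitting a classifier on sample fusion features and their labeled sample document classes. The choice of scikit-learn's `LogisticRegression` and the toy, clearly separable training data are assumptions for illustration only.

```python
# Train an image classification model on (sample fusion feature,
# labeled document class) pairs, then classify a new fused vector.
import numpy as np
from sklearn.linear_model import LogisticRegression

sample_fused = np.array([
    [2, 1, 0.4, 0.0, 1.0],   # resembles an approval sheet
    [0, 0, 0.0, 0.9, 0.1],   # resembles a handwritten note
] * 10)                       # 20 labeled samples in total
sample_labels = ["approval_sheet", "handwritten_note"] * 10

model = LogisticRegression().fit(sample_fused, sample_labels)
pred = model.predict([[2, 1, 0.4, 0.0, 1.0]])[0]
```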
In one embodiment, the classification result obtaining module 503 further includes:
and the second classification result obtaining submodule is used for searching the corresponding relation between the fusion characteristics and the document classification results according to the fusion characteristics to obtain corresponding document classification results which are used as the document classification results corresponding to the images to be classified.
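The lookup-based variant above can be sketched as a table mapping fusion features to document classes. Exact-match lookup is assumed here for simplicity; a real system might instead use nearest-neighbor search over stored feature vectors.

```python
# Hypothetical correspondence table from fusion features (as tuples)
# to document classification results.
correspondence = {
    (2.0, 1.0): "business_approval_sheet",
    (0.0, 0.0): "plain_letter",
}

def lookup_class(fused: tuple) -> str:
    """Return the stored document class matching the fusion feature."""
    return correspondence[fused]

doc_class = lookup_class((2.0, 1.0))
```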
In one embodiment, the image features include at least one of the following:
a feature representing the number of tables in the image to be classified;
a feature representing the number of seals in the image to be classified;
a feature representing the image area ratio of the tables in the image to be classified;
a feature representing the proportion of handwritten text among all texts in the image to be classified;
and a feature representing the proportion of printed text among all texts in the image to be classified.
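The five image features listed above can be assembled into a single feature vector. The detector outputs (table count, seal count, areas, text-line counts) are hard-coded assumptions here; in a real system they would come from upstream table, seal, and text detectors.

```python
# Build the five-element image feature vector described above.
def image_feature_vector(n_tables, n_seals, table_area, image_area,
                         n_handwritten, n_printed):
    total_text = n_handwritten + n_printed
    return [
        float(n_tables),             # number of tables
        float(n_seals),              # number of seals
        table_area / image_area,     # table area ratio in the image
        n_handwritten / total_text,  # handwritten-text proportion
        n_printed / total_text,      # printed-text proportion
    ]

features = image_feature_vector(n_tables=2, n_seals=1, table_area=250_000,
                                image_area=1_000_000, n_handwritten=3,
                                n_printed=12)
```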
In one embodiment, the specified text includes at least one of: a title in the image to be classified; and a text of a designated number of characters selected in advance according to the order in which the texts are arranged.
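Selecting the specified text can be sketched as taking the title plus the first N characters of the body in reading order; the value of N and the example strings are assumptions for illustration.

```python
# "Specified text" = title + the first n_chars characters of the body,
# following the order in which the texts are arranged.
def specified_text(title: str, body: str, n_chars: int = 20) -> str:
    return title + " " + body[:n_chars]

spec = specified_text("Business Approval Sheet",
                      "Applicant: Zhang San, Amount: 10000 CNY ...")
```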
In one embodiment, the image feature obtaining module 501 further includes:
the request message receiving submodule is used for receiving a document classification request message sent by a client aiming at the image to be classified;
and the image feature obtaining submodule is used for obtaining the image features related to the auditing service in the image to be classified and the text features corresponding to the specified texts in the image to be classified aiming at the document classification request message.
In one embodiment, the apparatus further comprises:
and the keyword extraction model selection module is used for determining a keyword extraction model for extracting keywords from the image to be classified according to the document classification result.
In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order or good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the document classification method. For example, in some embodiments, the document classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the document classification method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the document classification method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of classifying a document, comprising:
acquiring image features related to an audit service in an image to be classified and text features corresponding to a specified text in the image to be classified;
performing feature fusion on the image features and the text features to obtain fusion features;
and obtaining a document classification result corresponding to the image to be classified according to the fusion characteristics.
2. The method according to claim 1, wherein the performing feature fusion on the image feature and the text feature to obtain a fused feature comprises:
and superposing the features of the image features and the text features, and taking the superposed features as the fusion features.
3. The method according to claim 1, wherein the obtaining of the document classification result corresponding to the image to be classified according to the fusion feature comprises:
and inputting the fusion characteristics into a trained image classification model to obtain a document classification result corresponding to the image to be classified.
4. The method of claim 3, further comprising:
and training the image classification model by using the sample fusion characteristics and the correspondingly labeled sample document classification result to obtain the trained image classification model.
5. The method according to claim 1, wherein the obtaining of the document classification result corresponding to the image to be classified according to the fusion feature comprises:
and searching the corresponding relation between the fusion features and the document classification results according to the fusion features, and obtaining the corresponding document classification results to be used as the document classification results corresponding to the images to be classified.
6. The method of claim 1, wherein the image features comprise at least one of:
a feature representing the number of tables in the image to be classified;
a feature representing the number of seals in the image to be classified;
a feature representing the image area ratio of the tables in the image to be classified;
a feature representing the proportion of handwritten text among all texts in the image to be classified;
and a feature representing the proportion of printed text among all texts in the image to be classified.
7. The method of claim 1, wherein the specified text comprises at least one of: a title in the image to be classified; and a text of a designated number of characters selected in advance according to the order in which the texts are arranged.
8. The method according to claim 1, wherein the obtaining of the image features related to the audit service in the image to be classified and the text features corresponding to the specified text in the image to be classified comprises:
receiving a document classification request message sent by a client aiming at the image to be classified;
and acquiring image features related to the auditing service in the image to be classified and text features corresponding to the specified text in the image to be classified aiming at the document classification request message.
9. The method of claim 1, further comprising:
and determining a keyword extraction model for extracting keywords of the image to be classified according to the document classification result.
10. A document classification apparatus, comprising:
the image characteristic acquisition module is used for acquiring image characteristics related to an audit service in an image to be classified and text characteristics corresponding to a specified text in the image to be classified;
a fusion feature obtaining module, configured to perform feature fusion on the image feature and the text feature to obtain a fusion feature;
and the classification result obtaining module is used for obtaining a document classification result corresponding to the image to be classified according to the fusion characteristics.
11. The apparatus of claim 10, wherein the fused feature obtaining module comprises:
and the feature superposition submodule is used for superposing the features of the image features and the text features and taking the superposed features as the fusion features.
12. The apparatus of claim 11, wherein the classification result obtaining module comprises:
and the first classification result obtaining submodule is used for inputting the fusion characteristics into a trained image classification model to obtain a document classification result corresponding to the image to be classified.
13. The apparatus of claim 12, wherein the classification result obtaining module further comprises:
and the model training submodule is used for training the image classification model by using the sample fusion characteristics and the correspondingly labeled sample document classification result to obtain the trained image classification model.
14. The apparatus of claim 10, wherein the classification result obtaining module comprises:
and the second classification result obtaining submodule is used for searching the corresponding relation between the fusion characteristics and the document classification results according to the fusion characteristics to obtain corresponding document classification results which are used as the document classification results corresponding to the images to be classified.
15. The apparatus of claim 10, wherein the image features comprise at least one of:
a feature representing the number of tables in the image to be classified;
a feature representing the number of seals in the image to be classified;
a feature representing the image area ratio of the tables in the image to be classified;
a feature representing the proportion of handwritten text among all texts in the image to be classified;
and a feature representing the proportion of printed text among all texts in the image to be classified.
16. The apparatus of claim 10, wherein the specified text comprises at least one of: a title in the image to be classified; and a text of a designated number of characters selected in advance according to the order in which the texts are arranged.
17. The apparatus of claim 10, wherein the image feature acquisition module comprises:
the request message receiving submodule is used for receiving a document classification request message sent by a client aiming at the image to be classified;
and the image feature obtaining submodule is used for obtaining the image features related to the auditing service in the image to be classified and the text features corresponding to the specified texts in the image to be classified aiming at the document classification request message.
18. The apparatus of claim 10, further comprising:
and the keyword extraction model selection module is used for determining a keyword extraction model for extracting keywords from the image to be classified according to the document classification result.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
CN202110768945.8A 2021-07-07 2021-07-07 Document classification method and device, electronic equipment and storage medium Pending CN113377958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110768945.8A CN113377958A (en) 2021-07-07 2021-07-07 Document classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110768945.8A CN113377958A (en) 2021-07-07 2021-07-07 Document classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113377958A true CN113377958A (en) 2021-09-10

Family

ID=77581315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110768945.8A Pending CN113377958A (en) 2021-07-07 2021-07-07 Document classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113377958A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019113576A1 (en) * 2017-12-10 2019-06-13 Walmart Apollo, Llc Systems and methods for automated classification of regulatory reports
US20190294874A1 (en) * 2018-03-23 2019-09-26 Abbyy Production Llc Automatic definition of set of categories for document classification
CN109522942A (en) * 2018-10-29 2019-03-26 中国科学院深圳先进技术研究院 A kind of image classification method, device, terminal device and storage medium
CN112685565A (en) * 2020-12-29 2021-04-20 平安科技(深圳)有限公司 Text classification method based on multi-mode information fusion and related equipment thereof
CN113064968A (en) * 2021-04-06 2021-07-02 齐鲁工业大学 Social media emotion analysis method and system based on tensor fusion network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨兴华; 封化民; 江超; 陈春萍: "A Spam Filtering Method Based on Multi-modal Feature Fusion" (一种基于多模态特征融合的垃圾邮件过滤方法), Journal of Beijing Electronic Science and Technology Institute (北京电子科技学院学报), no. 02, pages 50-61 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550156A (en) * 2022-02-18 2022-05-27 支付宝(杭州)信息技术有限公司 Image processing method and device
CN114626455A (en) * 2022-03-11 2022-06-14 北京百度网讯科技有限公司 Financial information processing method, device, equipment, storage medium and product
CN114898388A (en) * 2022-03-28 2022-08-12 支付宝(杭州)信息技术有限公司 Document and picture classification method and device, storage medium and electronic equipment
CN114898388B (en) * 2022-03-28 2024-05-24 支付宝(杭州)信息技术有限公司 Document picture classification method and device, storage medium and electronic equipment
CN115294588A (en) * 2022-08-17 2022-11-04 湖北鑫英泰系统技术股份有限公司 Data processing method and system based on RPA process robot
CN115294588B (en) * 2022-08-17 2024-04-19 湖北鑫英泰系统技术股份有限公司 Data processing method and system based on RPA flow robot
CN116246287A (en) * 2023-03-15 2023-06-09 北京百度网讯科技有限公司 Target object recognition method, training device and storage medium
CN116246287B (en) * 2023-03-15 2024-03-22 北京百度网讯科技有限公司 Target object recognition method, training device and storage medium

Similar Documents

Publication Publication Date Title
CN113377958A (en) Document classification method and device, electronic equipment and storage medium
CN111709339A (en) Bill image recognition method, device, equipment and storage medium
US10685310B1 (en) Utilizing a machine learning model to determine complexity levels, risks, and recommendations associated with a proposed product
CN113705554A (en) Training method, device and equipment of image recognition model and storage medium
CN114429637B (en) Document classification method, device, equipment and storage medium
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113378855A (en) Method for processing multitask, related device and computer program product
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN114092948B (en) Bill identification method, device, equipment and storage medium
EP3869398A2 (en) Method and apparatus for processing image, device and storage medium
US20230052906A1 (en) Entity Recognition Method and Apparatus, and Computer Program Product
CN113360672B (en) Method, apparatus, device, medium and product for generating knowledge graph
CN114139052B (en) Ranking model training method for intelligent recommendation, intelligent recommendation method and device
CN115374063A (en) File processing method, device, equipment and storage medium
CN115759100A (en) Data processing method, device, equipment and medium
CN113850072A (en) Text emotion analysis method, emotion analysis model training method, device, equipment and medium
CN113887394A (en) Image processing method, device, equipment and storage medium
CN114138976A (en) Data processing and model training method and device, electronic equipment and storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113190679A (en) Relationship determination method, relationship determination device, electronic equipment and storage medium
CN113051381A (en) Information quality inspection method, device, computer system and computer readable storage medium
CN112948584A (en) Short text classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination