CN114067338A - Information extraction method, device and medium - Google Patents

Information extraction method, device and medium

Info

Publication number
CN114067338A
CN114067338A
Authority
CN
China
Prior art keywords
information
training
model
feature
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111167852.6A
Other languages
Chinese (zh)
Inventor
秦波 (Qin Bo)
辛晓哲 (Xin Xiaozhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202111167852.6A
Publication of CN114067338A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an information extraction method, an information extraction device and an information extraction medium, relating to the technical fields of computers and artificial intelligence. The method comprises the following steps: acquiring a target image, wherein the target image comprises at least one information unit; acquiring an information extraction model, wherein the information extraction model is obtained through training with training labels, on a plurality of feature types, determined for at least one information unit in a training sample image; and extracting, through the information extraction model, first feature information of at least one information unit in the target image as target feature information. The technical solution of the embodiments of the present application can improve the accuracy of information extraction.

Description

Information extraction method, device and medium
Technical Field
The present application relates to the field of computer and artificial intelligence technologies, and in particular, to an information extraction method, apparatus, and medium.
Background
In an information extraction scenario, for example extracting a formula or text from an image, the information in the image is usually obtained by sequentially segmenting, identifying and post-processing the information units in the image. However, errors may accumulate across these sequential segmentation, identification and post-processing steps, which can make the information extracted from the image inaccurate.
Disclosure of Invention
Embodiments of the present application provide an information extraction method, apparatus, computer program product or computer program, and computer readable medium, which can improve the accuracy of information extraction at least to a certain extent.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided an information extraction method, including: acquiring a target image, wherein the target image comprises at least one information unit; acquiring an information extraction model, wherein the information extraction model is obtained through training with training labels, on a plurality of feature types, determined for at least one information unit in a training sample image; and extracting, through the information extraction model, first feature information of at least one information unit in the target image as target feature information.
According to an aspect of an embodiment of the present application, there is provided an information extraction apparatus, including: a first acquisition unit, configured to acquire a target image, wherein the target image comprises at least one information unit; a second acquisition unit, configured to acquire an information extraction model, wherein the information extraction model is obtained through training with training labels, on a plurality of feature types, determined for at least one information unit in a training sample image; and an extracting unit, configured to extract, through the information extraction model, first feature information of at least one information unit in the target image as target feature information.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit includes: a third acquisition unit configured to acquire a training sample image; a determining unit, configured to determine, for at least one information unit in the training sample image, training labels on multiple feature types to obtain multiple training labels, where each training label is used to characterize feature information of the at least one information unit on a corresponding feature type; and the fourth obtaining unit is used for obtaining a model to be trained, and training the model to be trained through the training sample image and the training labels to obtain the information extraction model.
In some embodiments of the present application, based on the foregoing scheme, the third obtaining unit is configured to: acquiring at least one frame of original training sample image; scaling the height or width of each frame of original training sample image to a preset image height or a preset image width, and scaling the width or height of the original training sample image according to the same scaling ratio, to obtain preprocessed training sample images; and selecting a predetermined number of images from the preprocessed training sample images as the training sample images.
In some embodiments of the present application, based on the foregoing scheme, the third obtaining unit is configured to: sequencing the preprocessed training sample images according to the width or the height of the preprocessed training sample images; and selecting a preset number of images which are connected in sequence from the preprocessed training sample images as the training sample images.
In some embodiments of the present application, based on the foregoing solution, the information unit includes an explicit information unit and an implicit information unit, and the determining unit is configured to: acquiring an information unit dictionary matched with each feature type, wherein at least feature vectors corresponding to all explicit information units are recorded in the information unit dictionary; and constructing a training label on a corresponding feature type for at least one information unit in the training sample image based on the feature vector recorded in each information unit dictionary.
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: searching and acquiring an encoder model and a decoder model through a network structure, wherein the encoder model is used for encoding an image, and the decoder model is used for decoding the characteristics encoded by the encoder to obtain second characteristic information of at least one information unit in the image; and constructing the model to be trained based on the encoder model and the decoder model.
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: training the model to be trained through the training sample images and the training labels according to the preset training times to obtain an information extraction reference model of the preset training times; and carrying out averaging processing on the information extraction reference model with the preset training times to obtain the information extraction model.
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained; determining error information existing on the corresponding feature type in the third feature information based on the training label corresponding to each feature type; and on the basis of error information existing on each feature type, respectively and reversely updating model parameters in the model to be trained through a preset loss function corresponding to each feature type to obtain the information extraction reference model.
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: selecting a training label corresponding to a target feature type from the plurality of training labels; inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained; determining target error information existing on the target feature type in the third feature information based on a training label corresponding to the target feature type; based on the target error information, carrying out reverse update on model parameters in the model to be trained through a preset loss function corresponding to the target characteristic type to obtain an intermediate information extraction reference model; and taking the intermediate information extraction reference model as a new model to be trained, and re-executing the step of selecting a training label corresponding to the target feature type from the plurality of training labels until all the labels in the plurality of training labels are selected to obtain the information extraction reference model.
In some embodiments of the present application, based on the foregoing scheme, the information unit includes a character unit, and the determining unit is configured to: determining a training label on a positioning feature for at least one character unit in the training sample image, wherein the positioning feature is used for at least characterizing the relative relationship feature between the at least one character unit; a training label on a shape feature is determined for at least one character unit in the training sample image.
In some embodiments of the present application, based on the foregoing solution, the extracting unit is configured to: for each target information unit of the at least one information unit, determining all information units and partial information units arranged before the target information unit; predicting first reference feature information of the target information unit based on feature information of all the information units; predicting second reference feature information of the target information unit based on feature information of the partial information unit; determining the first feature information based on the first reference feature information and the second reference feature information of each of the at least one information unit.
In some embodiments of the present application, based on the foregoing solution, the information units include character units, the at least one information unit constitutes one or more formulas containing the character units, and the apparatus further includes an editing unit configured to edit the one or more formulas in the target image to a formula editing area based on the target characteristic information after extracting the first characteristic information of the at least one information unit in the target image as the target characteristic information through the information extraction model.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the information extraction method as described in the above embodiments.
There is also provided, in accordance with an aspect of the embodiments of the present application, an information extraction device, including a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the information extraction method as described in the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded into and executed by a processor to implement the operations performed by the information extraction method as described in the above embodiments.
In the technical solution provided by some embodiments of the present application, first feature information of at least one information unit in a target image may be extracted by training an information extraction model obtained by training at least one information unit in a sample image on training labels of a plurality of feature types. The training labels of the information units on the multiple feature types take supervision feature information of the information units on the multiple feature types in the image into consideration, so that the trained information extraction model has the capability of accurately extracting information, and the information extraction model is used for extracting feature information reflected by at least one information unit in the target image, so that the accuracy of information extraction can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 shows a flow diagram of an information extraction method according to one embodiment of the present application;
FIG. 3 illustrates a detailed flow diagram for obtaining an information extraction model according to one embodiment of the present application;
FIG. 4 illustrates a detailed flow diagram for acquiring training sample images according to one embodiment of the present application;
FIG. 5 illustrates a detailed flow diagram for determining training labels on a plurality of feature types for at least one information unit in the training sample image according to one embodiment of the present application;
FIG. 6 illustrates a detailed flow diagram for obtaining a model to be trained according to one embodiment of the present application;
FIG. 7 illustrates a detailed flow diagram for training the model to be trained via the training sample images and the plurality of training labels according to one embodiment of the present application;
FIG. 8 illustrates a detailed flow diagram for training the model to be trained via the training sample images and the plurality of training labels according to one embodiment of the present application;
FIG. 9 illustrates a detailed flow diagram for training the model to be trained via the training sample images and the plurality of training labels according to one embodiment of the present application;
FIG. 10 shows a block diagram of an information extraction model according to one embodiment of the present application;
FIG. 11 shows a detailed flowchart of extracting first feature information of at least one information unit in the target image according to one embodiment of the present application;
FIG. 12 shows a block diagram of an information extraction apparatus according to one embodiment of the present application;
FIG. 13 shows a block diagram of an information extraction apparatus according to one embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that: reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It is noted that the terms first, second and the like in the description and claims of the present application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or described herein.
Embodiments in this application relate to techniques related to artificial intelligence, i.e., fully automated processing of data (e.g., image data) is achieved through artificial intelligence. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (such as one or more of the smart phone 101, the tablet computer 102, and the portable computer 103 shown in fig. 1, and certainly may be a desktop computer, etc., but is not limited thereto, and the present application is not limited thereto), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
In an embodiment of the application, when a user needs to identify feature information reflected by at least one information unit in a target image, the target image may be sent to the server 105 through the terminal device, the server 105 acquires an information extraction model after acquiring the target image, wherein the information extraction model is obtained by training at least one information unit in a training sample image on training labels of a plurality of feature types, and then the server 105 extracts first feature information of at least one information unit in the target image as target feature information through the information extraction model.
The first feature information proposed in the present application may be all of the feature information of the at least one information unit, or may be part of it. Specifically, the first feature information may include feature information of the information units themselves, feature information of the relative relationships (e.g., relative positional relationships) between the information units, or both the feature information of the information units themselves and the feature information of the relative relationships between them.
For example, taking the scenario of recognizing a formula in an image as an example, the information unit may be a character unit in the formula, and it is understood that the first feature information of the character units in the formula may include shape feature information of each character unit and/or relative positional relationship feature information between the character units.
In this implementation, the information extraction model trained by the training labels of the at least one information unit in the training sample image on the plurality of feature types has the capability of accurately extracting information, and the accuracy of information extraction can be improved by extracting the feature information reflected by the at least one information unit in the target image through the information extraction model.
It should be noted that the information extraction method provided in the embodiment of the present application may be executed by the server 105, and accordingly, the information extraction device is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the information extraction scheme provided by the embodiments of the present application.
It should also be noted that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. According to implementation needs, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like.
It should be explained that cloud computing (cloud computing) as described above is a computing model that distributes computing tasks over a large pool of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the cloud can be infinitely expanded to users, and can be acquired at any time, used as required and expanded at any time. The cloud computing resource pool mainly comprises computing equipment (which is a virtualization machine and comprises an operating system), storage equipment and network equipment.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 2 shows a flowchart of an information extraction method according to an embodiment of the present application, which may be performed by a device having a calculation processing function, such as the server 105 shown in fig. 1, or may be performed by a terminal device shown in fig. 1. Referring to fig. 2, the information extraction method at least includes steps 210 to 250, which are described in detail as follows:
in step 210, a target image is acquired, the target image including at least one information unit.
In the application, the proposed information extraction scheme can be applied to an information recognition scene of a target object in an image, such as recognizing a formula in the image, recognizing text in the image, and recognizing some specific patterns in the image. Further, the target object in the image may be composed of at least one information unit, for example, the formula or text in the image may be composed of at least one character unit, or for example, some specific pattern in the image may be composed of at least one graphic unit.
In the application, the target image may be obtained by capturing a page area containing the target object in the interface, or the target image may be obtained directly from local storage.
With continued reference to FIG. 2, in step 230, an information extraction model is obtained that is trained by training labels on a plurality of feature types for at least one information unit in a training sample image.
In the present application, the information identification of the target object in the image may be implemented by a pre-constructed information extraction model, and the information extraction model proposed in the present application may be obtained by training at least one information unit in the sample image on training labels of multiple feature types.
In order to make the information extraction model proposed in the present application more clearly understood, the details of the information extraction model acquisition technology will be described in detail below:
in one embodiment of step 230 shown in FIG. 2, the obtaining of the information extraction model may be performed according to the steps shown in FIG. 3.
Referring to FIG. 3, a detailed flow diagram of obtaining an information extraction model according to one embodiment of the present application is shown. Specifically, the method comprises steps 231 to 233:
step 231, acquiring a training sample image.
Step 232, determining training labels on multiple feature types for at least one information unit in the training sample image, and obtaining multiple training labels, where each training label is used to represent feature information of the at least one information unit on a corresponding feature type.
And 233, acquiring a model to be trained, and training the model to be trained through the training sample image and the training labels to obtain the information extraction model.
In one embodiment of step 231 shown in FIG. 3, the acquiring of the training sample image may be performed according to the steps shown in FIG. 4.
Referring to FIG. 4, a detailed flow diagram for acquiring training sample images is shown, according to one embodiment of the present application. Specifically, steps 2311 to 2313:
at step 2311, at least one frame of original training sample image is obtained.
Step 2312, the height or width of each frame of original training sample image is scaled to a preset image height or a preset image width, and the width or height of the original training sample image is scaled according to the scaling of the height or the width to obtain a preprocessed training sample image.
At step 2313, a predetermined number of images are selected from the preprocessed training sample images as the training sample images.
In this embodiment, the original training sample image may refer to an initially acquired sample image, which may be one frame, or multiple frames, such as one thousand frames, or ten thousand frames, and the specific number of sample images may be determined according to actual conditions. At least one information unit may be included in each frame sample image.
It should be noted that there may be cases where the original sample image is not uniform in size for the initially acquired sample image, and for this case, the original sample image needs to be preprocessed. Specifically, the height or width of each frame of original training sample image may be scaled to a preset image height or a preset image width, and the width or height of the original training sample image may be scaled according to the scaling of the height or the width.
For example, the original sample images include 5 frames, with height and width dimensions of 15 × 30 for the first frame, 5 × 15 for the second frame, 20 × 30 for the third frame, 5 × 30 for the fourth frame, and 15 × 45 for the fifth frame, respectively. Take the example of enlarging the width of each sample image to the preset image width 60. It can be seen that the width magnification ratio of the first frame image is 2 times, the width magnification ratio of the second frame image is 4 times, the width magnification ratio of the third frame image is 2 times, the width magnification ratio of the fourth frame image is 2 times, and the width magnification ratio of the fifth frame image is 4/3 times.
The height of the image is amplified according to the same amplification ratio as the width of the image, and the height and width of the obtained preprocessed training sample image are respectively 30 × 60 for the first frame, 20 × 60 for the second frame, 40 × 60 for the third frame, 10 × 60 for the fourth frame and 20 × 60 for the fifth frame.
After obtaining the pre-processed training sample images, a predetermined number of images may be selected from the pre-processed training sample images as the training sample images.
In the present application, the height or width of each frame of original training sample image is scaled to the preset image height or preset image width, which is beneficial in that the model training can be accelerated in the subsequent process.
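As an illustration of the preprocessing described above, the following is a minimal sketch, assuming Pillow is used for image handling; the preset width of 60 and the function name are illustrative assumptions rather than the embodiment's actual implementation.

```python
from PIL import Image

PRESET_WIDTH = 60  # preset image width, matching the worked example above (assumption)

def preprocess_sample(path: str) -> Image.Image:
    """Scale the width of one original training sample image to the preset width,
    and scale the height by the same ratio so the aspect ratio is preserved."""
    image = Image.open(path)
    width, height = image.size
    ratio = PRESET_WIDTH / width        # scaling ratio applied to the width
    new_height = round(height * ratio)  # the height follows the same ratio
    return image.resize((PRESET_WIDTH, new_height))
```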
Further, in step 2313 shown in fig. 4, the selecting a predetermined number of images from the preprocessed training sample images as the training sample images may be performed by:
first, the preprocessed training sample images may be ranked according to their width or height. Then, a predetermined number of images connected in series are selected from the preprocessed training sample images as the training sample images.
For example, taking the pre-processed training sample images with the height and width dimensions of 30 × 60 for the first frame, 20 × 60 for the second frame, 40 × 60 for the third frame, 10 × 60 for the fourth frame, and 20 × 60 for the fifth frame as an example, the pre-processed training sample images may be sorted according to the height of the pre-processed training sample images to obtain the sorted fourth frame 10 × 60, 20 × 60 for the fifth frame, 20 × 60 for the second frame, 30 × 60 for the first frame, and 40 × 60 for the third frame.
Further, if the predetermined number is set to 4, the fourth frame, the fifth frame, the second frame, the first frame, or the fifth frame, the second frame, the first frame, and the third frame may be selected as the training sample image.
In the present application, when training a model based on training sample images, the sizes of the training sample images may be kept consistent in order to optimize the training effect. Based on this, when the height or width of the training sample images is consistent and the width or height of the images is inconsistent, the width or height of the training sample images can be supplemented with reference to the maximum width or maximum height of the training sample images so that the height and width of the training sample images are completely consistent.
Sorting the preprocessed training sample images beforehand and selecting a predetermined number of consecutively ordered images as the training sample images prevents the gap between the maximum and minimum width (or height) within the training sample images from becoming too large, so that the width or height of the training sample images does not need to be padded with too much invalid information. This reduces the amount of invalid data the computer has to process during model training and saves computing resources.
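A sketch of this selection step, assuming each preprocessed image is a NumPy array of shape (height, width) whose width has already been normalized; sorting by height and taking a consecutive run keeps the padding needed to equalize heights small. The names and the padding strategy are assumptions.

```python
import numpy as np

def select_batch(preprocessed: list, batch_size: int, start: int = 0) -> np.ndarray:
    """Sort preprocessed images by height, pick a run of consecutive images,
    and pad each to the maximum height of that run so the batch is uniform."""
    ordered = sorted(preprocessed, key=lambda img: img.shape[0])   # sort by height
    batch = ordered[start:start + batch_size]                      # consecutively ordered images
    max_h = max(img.shape[0] for img in batch)                     # reference (maximum) height
    padded = [np.pad(img, ((0, max_h - img.shape[0]), (0, 0))) for img in batch]
    return np.stack(padded)                                        # shape: (batch, max_h, width)
```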
In this application, the information unit may include an explicit information unit and an implicit information unit. The explicit information unit may refer to an information unit contained in the image that can be directly seen, and the implicit information unit may refer to an information unit contained in the image that cannot be directly seen.
For example, taking the scenario of recognizing a formula or text in an image as an example, each character unit composing the formula or text is an explicit information unit, and the relative relationship between the character units (for example, the positional relationship between a base and an exponent in a formula) is an implicit information unit.
In one embodiment of step 232 shown in FIG. 3, the determining of training labels on a plurality of feature types for at least one information unit in the training sample image may be performed according to the steps shown in FIG. 5.
Referring to FIG. 5, a detailed flow diagram of determining training labels on a plurality of feature types for at least one information unit in the training sample image is shown, according to one embodiment of the present application. Specifically, the method comprises steps 2321 to 2322:
step 2321, an information unit dictionary matched with each feature type is obtained, and at least feature vectors corresponding to all dominant information units are recorded in the information unit dictionary.
Step 2322, based on the feature vector recorded in each information unit dictionary, a training label on the corresponding feature type is constructed for at least one information unit in the training sample image.
In order to make the person skilled in the art better understand the present embodiment, the following continues to use an application scenario for recognizing a formula or text in an image as an example, and specifically, in step 232 shown in fig. 2, the feature type may include a location feature of a character unit and a shape feature of the character unit in the application scenario.
Further, the determining of the training labels on the plurality of feature types for the at least one information unit in the training sample image may include the following two types:
first, a training label is determined for at least one character unit in the training sample image on a positioning feature, wherein the positioning feature is used for at least characterizing relative relationship features between the at least one character unit.
Second, a training label on a shape feature is determined for at least one character unit in the training sample image.
In this scenario, on the one hand, the information unit dictionary matching the positioning feature may include the feature vectors of all enumerable explicit information units and implicit information units, for example the feature vector [1 0 0 0 … 0] for one explicit information unit (e.g., the character unit "a"), and the feature vector [0 1 0 0 … 0] for one implicit information unit (e.g., the relative relationship between two character units being a base-exponent relationship). On the other hand, the information unit dictionary matching the shape feature may include only the feature vectors of all enumerable explicit information units.
Further, based on the feature vectors recorded in each information unit dictionary, a training label on a corresponding feature type is constructed for at least one information unit in the training sample image. For example, the image includes the formula "a^b", where the character unit "a" corresponds to the feature vector [1 0 0 0 … 0], the character unit "b" corresponds to the feature vector [0 0 1 0 … 0], and the relative relationship between the character unit "a" and the character unit "b" (i.e., the base-exponent relationship) corresponds to the feature vector [0 1 0 0 … 0].
Based on the above, for the training sample image containing the formula "a^b", the training label constructed on the positioning feature for the information units is "[1 0 0 0 … 0]-[0 1 0 0 … 0]-[0 0 1 0 … 0]", and the training label constructed on the shape feature for the information units is "[1 0 0 0 … 0], [0 0 1 0 … 0]".
It can be seen that the training label "[1 0 0 0 … 0]-[0 1 0 0 … 0]-[0 0 1 0 … 0]" mainly characterizes the features of the relative relationship between the character units, and of course also characterizes the shape features of the character units. The training label "[1 0 0 0 … 0], [0 0 1 0 … 0]" only characterizes the shape feature of each character unit and does not characterize the features of the relative relationship between the character units, so this training label is essentially a weakly supervised label.
Based on the above scenario, it can be understood that in the present application, a weakly supervised training label may be included in the training labels constructed on the respective feature types for at least one information unit in the training sample image.
In this application, determining training labels on a plurality of feature types, including a weakly supervised training label, for at least one information unit in the training sample image has the advantage that the feature information of the training sample image is decomposed across the training labels of the plurality of feature types, giving the model more supervision information, so that the model can be supervised and trained with different types of supervision information, improving the accuracy of model prediction.
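The following sketch shows one way the label construction above could look, assuming one-hot feature vectors and toy information unit dictionaries for the formula "a^b"; the dictionary contents, the "^" marker for the base-exponent relation, and the function names are illustrative assumptions.

```python
import numpy as np

# Assumed information unit dictionaries: the positioning dictionary enumerates
# explicit character units and implicit relation units; the shape dictionary
# enumerates explicit character units only.
POSITIONING_DICT = {"a": 0, "^": 1, "b": 2}  # "^" stands for the base-exponent relation
SHAPE_DICT = {"a": 0, "b": 1}

def one_hot(index: int, size: int) -> np.ndarray:
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

def build_labels(units: list) -> tuple:
    """Build the positioning-feature label (explicit and implicit units, in order)
    and the weakly supervised shape-feature label (explicit units only)."""
    positioning = [one_hot(POSITIONING_DICT[u], len(POSITIONING_DICT)) for u in units]
    shape = [one_hot(SHAPE_DICT[u], len(SHAPE_DICT)) for u in units if u in SHAPE_DICT]
    return positioning, shape

# For the formula "a^b": the positioning label covers "a", the relation and "b",
# while the shape label covers only "a" and "b".
positioning_label, shape_label = build_labels(["a", "^", "b"])
```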
In one embodiment of step 233 shown in FIG. 3, the obtaining of the model to be trained may be performed according to the steps shown in FIG. 6.
Referring to FIG. 6, a detailed flow diagram for obtaining a model to be trained is shown, according to one embodiment of the present application. Specifically, the method includes steps 2331 to 2332:
step 2331, an encoder model and a decoder model are obtained through network structure search, the encoder model is used for encoding an image, and the decoder model is used for decoding the features encoded by the encoder to obtain second feature information of at least one information unit in the image.
Step 2332, building the model to be trained based on the encoder model and the decoder model.
It will be appreciated by those skilled in the art that the encoder model and the decoder model may be of a network architecture model in nature.
In the present application, Network Architecture Search (NAS) is an effective tool for generating and optimizing network architectures. A recurrent network is used as a controller to generate descriptions of network architectures and thereby construct sub-neural networks, without fixing the length and structure of the network in advance. The accuracy obtained after training a sub-network is used as the controller's reward signal, and the controller is updated by computing a policy gradient, iterating and cycling in this way. In the next iteration, the controller will propose a high-accuracy network structure with a higher probability.
Based on the method, the encoder model and the decoder model are obtained in a network structure searching mode, and the method has the advantages that the better encoder model and decoder model can be obtained, so that the constructed model to be trained has accurate learning capability.
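For illustration only, the following is a heavily simplified sketch of the reward-driven search loop described above, with a plain categorical controller in place of the recurrent controller and a stub standing in for actually training a sub-network; the candidate list, reward stub and update rule are all assumptions, not the embodiment's search procedure.

```python
import numpy as np

CANDIDATE_ARCHS = ["small_cnn_encoder", "large_cnn_encoder", "resnet_like_encoder"]  # assumed space

def train_and_evaluate(arch: str) -> float:
    """Stub: build and train the sub-network described by `arch`, then return
    its accuracy, which serves as the controller's reward signal."""
    return np.random.rand()  # placeholder for a real training-and-validation run

def search(steps: int = 100, lr: float = 0.1) -> str:
    logits = np.zeros(len(CANDIDATE_ARCHS))            # controller parameters
    for _ in range(steps):
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over candidate architectures
        idx = np.random.choice(len(probs), p=probs)    # controller samples an architecture
        reward = train_and_evaluate(CANDIDATE_ARCHS[idx])
        grad = -probs                                  # REINFORCE: grad of log prob = one-hot - probs
        grad[idx] += 1.0
        logits += lr * reward * grad                   # policy-gradient update of the controller
    return CANDIDATE_ARCHS[int(np.argmax(logits))]
```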
In an embodiment of step 233 shown in fig. 3, the training of the model to be trained through the training sample image and the training labels to obtain the information extraction model may be performed according to the steps shown in fig. 7.
Referring to fig. 7, a detailed flow diagram for training the model to be trained through the training sample images and the plurality of training labels according to an embodiment of the present application is shown. Specifically, it includes steps 2333 to 2334:
step 2333, training the model to be trained through the training sample images and the training labels according to preset training times to obtain an information extraction reference model of the preset training times;
step 2334, averaging the information extraction reference models of the predetermined training times to obtain the information extraction models.
In this embodiment, the model to be trained may be subjected to multiple rounds of training, for example, 12 rounds of training, through the training sample image and the training labels, where each round of training obtains one information extraction reference model, and then the obtained multiple information extraction reference models are averaged to obtain the information extraction model. The advantage of this is that the accuracy of the model can be improved, and the accuracy of the information extraction model for extracting the feature information in the image can be enhanced.
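A minimal PyTorch-style sketch of the averaging step, assuming the reference models from each training round share the same architecture; the dtype handling and the function name are assumptions.

```python
import torch

def average_reference_models(reference_models, target):
    """Average the parameters of the information extraction reference models
    obtained in each training round and load the result into `target`."""
    state_dicts = [m.state_dict() for m in reference_models]
    averaged = {
        key: torch.stack([sd[key].float() for sd in state_dicts])
                  .mean(dim=0)
                  .to(state_dicts[0][key].dtype)
        for key in state_dicts[0]
    }
    target.load_state_dict(averaged)
    return target  # the averaged information extraction model
```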
In one embodiment of step 2333 shown in fig. 7, the training of the model to be trained by the training sample images and the plurality of training labels may be performed according to the steps shown in fig. 8.
Referring to fig. 8, a detailed flowchart for training the model to be trained through the training sample images and the plurality of training labels according to an embodiment of the present application is shown. Specifically, the method comprises steps 23331 to 23333:
step 23331, inputting the training sample image to the model to be trained, and obtaining third feature information of at least one information unit in the training sample image output by the model to be trained.
At step 23332, error information existing on the corresponding feature type is determined in the third feature information based on the training label corresponding to each feature type.
And 23333, based on the error information existing in each feature type, respectively updating the model parameters in the model to be trained in the reverse direction through the preset loss function corresponding to each feature type to obtain the information extraction reference model.
In this embodiment, after determining the error information existing in each feature type in the third feature information through the training label corresponding to each feature type, the model parameters in the model to be trained may be respectively updated reversely through a preset loss function based on each error information, so as to obtain the information extraction reference model.
In this embodiment, the model parameters in the model to be trained are respectively updated reversely based on each error information, which has the advantages of shortening the training time of the information extraction reference model and improving the model training efficiency.
It should be noted that, in the present application, when the model parameters in the model to be trained are reversely updated through each piece of error information, the loss functions used may be different.
For example, continuing with the application scenario of recognizing a formula or text in an image, for the error information existing on the positioning feature type, the model parameters in the model to be trained may be reversely updated through a cross-entropy loss function (CELoss), and for the error information existing on the shape feature type, the model parameters in the model to be trained may be reversely updated through a BCEWithLogitsLoss function. The advantage of this is that a loss function adapted to each feature type is set based on the error information existing on that feature type, which can enhance the training effect of the model and thus the accuracy of the information extraction model in extracting feature information from images.
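A hedged PyTorch sketch of this kind of update, assuming the model to be trained exposes two output heads, one per feature type; summing the two per-type losses before a single backward pass is one simple way to realize the per-type reverse updates and is not necessarily the embodiment's exact procedure.

```python
import torch.nn as nn

positioning_loss_fn = nn.CrossEntropyLoss()   # loss for the positioning feature type (CELoss)
shape_loss_fn = nn.BCEWithLogitsLoss()        # loss for the shape feature type

def training_step(model, optimizer, images, positioning_labels, shape_labels):
    """One update: predict feature information for both feature types, measure the
    error of each with its own preset loss, and back-propagate to update the model."""
    optimizer.zero_grad()
    positioning_logits, shape_logits = model(images)   # assumed two-headed model output
    loss = (positioning_loss_fn(positioning_logits, positioning_labels)
            + shape_loss_fn(shape_logits, shape_labels))
    loss.backward()                                    # reverse update of model parameters
    optimizer.step()
    return loss.item()
```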
In another embodiment, as shown in step 2333 of fig. 7, the training of the model to be trained by the training sample images and the plurality of training labels may be performed according to the steps shown in fig. 9.
Referring to fig. 9, a detailed flowchart for training the model to be trained through the training sample images and the plurality of training labels according to an embodiment of the present application is shown. Specifically, it includes steps 23334 to 23338:
at step 23334, a training label corresponding to a target feature type is selected from the plurality of training labels.
Step 23335, inputting the training sample image to the model to be trained, and obtaining third feature information of at least one information unit in the training sample image output by the model to be trained.
Step 23336, determining target error information existing on the target feature type in the third feature information based on the training label corresponding to the target feature type.
Step 23337, based on the target error information, performing reverse update on the model parameters in the model to be trained through a preset loss function corresponding to the target feature type to obtain an intermediate information extraction reference model.
Step 23338, using the intermediate information extraction reference model as a new model to be trained, and re-executing the step of selecting a training label corresponding to the target feature type from the plurality of training labels until all the labels in the plurality of training labels are selected, thereby obtaining the information extraction reference model.
In this embodiment, it can be understood that the model to be trained is in effect trained sequentially with each training label in an iterative manner based on the plurality of training labels: after the model to be trained has been trained with the training label corresponding to one feature type, the resulting model is then trained with the training label corresponding to another feature type, and so on. The advantage of this is that the training effect of the model can be enhanced, which improves the accuracy with which the information extraction model extracts feature information from images.
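The iterative procedure of steps 23334 to 23338 could look roughly like the following PyTorch-style sketch, assuming one dataloader and one preset loss function per feature type; the per-type head selection argument is an illustrative assumption.

```python
def iterative_training(model, optimizer, loaders_by_type, loss_fns_by_type, epochs_per_type=1):
    """Train the model on the training label of one feature type, then continue
    training the resulting intermediate model on the next feature type, until
    every training label has been selected."""
    for feature_type, loss_fn in loss_fns_by_type.items():
        for _ in range(epochs_per_type):
            for images, labels in loaders_by_type[feature_type]:
                optimizer.zero_grad()
                outputs = model(images, feature_type=feature_type)  # assumed per-type output head
                loss_fn(outputs, labels).backward()
                optimizer.step()
    return model  # the information extraction reference model
```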
To better enable those skilled in the art to implement the information extraction model mentioned in the present application, the following description will be made with reference to fig. 10.
Referring to FIG. 10, a block diagram of an information extraction model according to one embodiment of the present application is shown.
As shown in fig. 10, the information extraction model is composed of an encoder 1002 and a decoder 1004, wherein in the process of extracting the feature information of at least one information unit in the image, firstly, the encoder 1002 encodes the image data 1001 to obtain the encoded feature information 1003, and then, the decoder 1004 decodes the encoded feature information 1003 to obtain the feature information 1005 of at least one information unit in the image.
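For concreteness, a compact PyTorch sketch of the encoder-decoder arrangement of FIG. 10 is given below; the layer choices, sizes and the fixed number of decoding steps are assumptions made only to keep the sketch short, not the architecture actually obtained by the search described in the embodiment.

```python
import torch
import torch.nn as nn

class InformationExtractionModel(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        # Encoder 1002: encodes the image data into encoded feature information 1003.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 32)),            # collapse height, keep 32 horizontal steps
        )
        self.project = nn.Linear(64, hidden)
        # Decoder 1004: decodes the encoded features into feature information 1005
        # of the information units, one prediction per step.
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(images)                  # (batch, 64, 1, 32)
        feats = feats.squeeze(2).permute(0, 2, 1)     # (batch, 32, 64)
        decoded, _ = self.decoder(self.project(feats))
        return self.out(decoded)                      # (batch, 32, vocab_size)
```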
With continued reference to fig. 2, in step 250, first feature information of at least one information unit in the target image is extracted as target feature information by the information extraction model.
In one embodiment of step 250 shown in fig. 2, the extracting of the first feature information of at least one information unit in the target image by the information extraction model may be performed according to the steps shown in fig. 11.
Referring to fig. 11, a detailed flowchart for extracting first feature information of at least one information unit in the target image according to an embodiment of the present application is shown. Specifically, the method comprises steps 251 to 254:
step 251, for each target information unit of the at least one information unit, determining all information units and partial information units arranged before the target information unit.
Step 252, predicting the first reference feature information of the target information unit based on the feature information of all the information units.
Step 253, predicting second reference characteristic information of the target information unit based on the characteristic information of the partial information unit.
Step 254, determining the first characteristic information based on the first reference characteristic information and the second reference characteristic information of each information unit in the at least one information unit.
In this embodiment, the information units in the target image have some logical relative relationship, for example a spatial relative relationship, and also a reading-order relative relationship.
In this embodiment, all the information units arranged before the target information unit may refer to all the information units logically arranged before the target information unit in space or in reading order. The partial information units arranged before the target information unit may refer to some of the information units logically arranged before the target information unit in space or in reading order.
In this embodiment, taking the application scenario of recognizing a formula or text in an image as an example, suppose the target image includes the formula "a^bcd". If the first feature information of the target information unit "d" needs to be predicted, the first reference feature information of "d" may be predicted based on the feature information of all the information units "a", "b" and "c" arranged before the target information unit. On the other hand, the second reference feature information of "d" may be predicted based on the feature information of the partial information unit "c" arranged before the target information unit. The first feature information is then determined based on the first reference feature information and the second reference feature information.
In this embodiment, when extracting the first feature information of at least one information unit in the target image, by considering the feature information of all information units and the feature information of some information units arranged before each target information unit, the global information and the local information can be integrated at the same time, so that the prediction of the information extraction model is more accurate.
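A small sketch of combining the two predictions, assuming `model` is any callable that maps a prefix of already-decoded unit features to logits for the next information unit; averaging the two sets of logits is an assumed, simple fusion rule rather than the embodiment's exact combination.

```python
import torch

def predict_next_unit(model, prefix_features: torch.Tensor, local_window: int = 1) -> torch.Tensor:
    """Combine a prediction conditioned on all preceding information units (first
    reference feature information) with one conditioned on only the last
    `local_window` units (second reference feature information)."""
    global_logits = model(prefix_features)                        # e.g. uses "a", "b", "c"
    local_logits = model(prefix_features[:, -local_window:, :])   # e.g. uses only "c"
    return (global_logits + local_logits) / 2                     # assumed simple fusion
```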
In this application, continuing to take the application scenario of recognizing a formula in an image as an example, the information units may include character units, and the at least one information unit may constitute one or more formulas containing the character units.
Specifically, in an embodiment after step 250 shown in fig. 2, that is, after the first feature information of at least one information unit in the target image is extracted as the target feature information by the information extraction model, the following scheme may be further performed:
and editing one or more formulas in the target image into a formula editing area based on the target characteristic information.
Specifically, in this application scenario, when a user edits a document, the formula image to be edited can be captured from a webpage; then, based on the information extraction scheme provided by this application, the first feature information of at least one character unit in the formula image is extracted to obtain the target feature information, and the formula in the formula image is then edited into the formula editing area based on the target feature information.
According to the technical scheme, the first feature information of at least one information unit in the target image can be extracted through an information extraction model obtained by training at least one information unit in a training sample image on training labels of a plurality of feature types. The supervised feature information of the information unit in the image on the multiple feature types is considered through the training labels of the at least one information unit in the training sample image on the multiple feature types, so that the information extraction model obtained through training has the capability of accurately extracting the information, and the information extraction model is used for extracting the feature information reflected by the at least one information unit in the target image, so that the accuracy of information extraction can be improved.
The following describes embodiments of an apparatus of the present application, which may be used to perform the information extraction method in the above embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the information extraction method described above in the present application.
FIG. 12 shows a block diagram of an information extraction apparatus according to one embodiment of the present application.
Referring to fig. 12, an information extraction apparatus 1200 according to an embodiment of the present application includes: a first acquisition unit 1201, a second acquisition unit 1202, and an extraction unit 1203.
The first acquiring unit 1201 is used for acquiring a target image, wherein the target image comprises at least one information unit; a second obtaining unit 1202, configured to obtain an information extraction model, where the information extraction model is obtained by training at least one information unit in a training sample image on training labels of multiple feature types; an extracting unit 1203 is configured to extract first feature information of at least one information unit in the target image as target feature information through the information extraction model.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit 1202 includes: a third acquisition unit configured to acquire a training sample image; a determining unit, configured to determine, for at least one information unit in the training sample image, training labels on multiple feature types to obtain multiple training labels, where each training label is used to characterize feature information of the at least one information unit on a corresponding feature type; and the fourth obtaining unit is used for obtaining a model to be trained, and training the model to be trained through the training sample image and the training labels to obtain the information extraction model.
In some embodiments of the present application, based on the foregoing scheme, the third obtaining unit is configured to: acquiring at least one frame of original training sample image; scaling the height or width of each frame of original training sample image to a preset image height or a preset image width, and scaling the width or height of the original training sample image by the same ratio applied to the height or width, to obtain a preprocessed training sample image; and selecting a predetermined number of images from the preprocessed training sample images as the training sample images.
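As a non-limiting sketch of the preprocessing described above, the following Python function scales each original training sample image to a preset height while scaling the width by the same ratio; the preset height of 64 pixels and the use of the Pillow library are assumptions of this example.

from PIL import Image

def preprocess(image_path, preset_height=64):
    # Scale the height to the preset image height and scale the width by the
    # same ratio, so the aspect ratio of the original image is preserved.
    img = Image.open(image_path).convert("L")
    ratio = preset_height / img.height
    new_width = max(1, round(img.width * ratio))
    return img.resize((new_width, preset_height), Image.BILINEAR)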
In some embodiments of the present application, based on the foregoing scheme, the third obtaining unit is configured to: sorting the preprocessed training sample images according to the width or the height of the preprocessed training sample images; and selecting, from the sorted preprocessed training sample images, a predetermined number of images that are adjacent in the sorted order as the training sample images.
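A minimal sketch of this sorting-and-selection step, assuming the preprocessed images are Pillow images and that a fixed batch size stands in for the predetermined number; taking images that are adjacent after sorting keeps the widths within each batch similar.

def make_batches(preprocessed_images, batch_size=32):
    # Sort by width, then take consecutive runs of images as training batches.
    ordered = sorted(preprocessed_images, key=lambda im: im.width)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]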
In some embodiments of the present application, based on the foregoing solution, the information unit includes an explicit information unit and an implicit information unit, and the determining unit is configured to: acquiring an information unit dictionary matched with each feature type, wherein at least the feature vectors corresponding to all explicit information units are recorded in the information unit dictionary; and constructing, based on the feature vectors recorded in each information unit dictionary, a training label on the corresponding feature type for at least one information unit in the training sample image.
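Purely as an illustration, the sketch below builds a training label for one feature type from an information unit dictionary; the dictionary contents, the fallback index used for implicit information units and the example units are hypothetical.

def build_labels(unit_sequence, unit_dict, implicit_index=0):
    # Look up each unit in the dictionary for this feature type; units that are
    # not recorded (implicit information units) fall back to a shared index.
    return [unit_dict.get(unit, implicit_index) for unit in unit_sequence]

# Hypothetical dictionary for a shape feature type.
shape_dict = {"a": 1, "+": 2, "\\frac": 3}
shape_labels = build_labels(["a", "+", "a"], shape_dict)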
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: obtaining an encoder model and a decoder model through network structure search, wherein the encoder model is used for encoding an image, and the decoder model is used for decoding the features encoded by the encoder model to obtain second feature information of at least one information unit in the image; and constructing the model to be trained based on the encoder model and the decoder model.
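The application does not fix a particular encoder or decoder; the following PyTorch sketch assumes a small convolutional encoder and a GRU decoder simply to show how a model to be trained could be constructed from the two parts.

import torch.nn as nn

class EncoderDecoder(nn.Module):
    # Illustrative model to be trained: the encoder encodes the image, and the
    # decoder decodes the encoded features into per-unit feature information.
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse height, keep width steps
        )
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, images):
        feats = self.encoder(images)              # (batch, dim, 1, steps)
        feats = feats.squeeze(2).transpose(1, 2)  # (batch, steps, dim)
        out, _ = self.decoder(feats)
        return self.head(out)                     # per-step unit predictions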
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: training the model to be trained through the training sample images and the plurality of training labels for a preset number of training runs, to obtain one information extraction reference model for each of the preset number of training runs; and averaging the information extraction reference models obtained over the preset number of training runs to obtain the information extraction model.
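One way to realize this averaging, sketched under the assumption that all reference models share the same architecture, is to average their parameters element-wise:

import copy
import torch

def average_models(reference_models):
    # Average the parameters of the information extraction reference models
    # obtained over the preset number of training runs (a rough sketch; integer
    # buffers are cast back when the averaged state is loaded).
    averaged = copy.deepcopy(reference_models[0])
    state = averaged.state_dict()
    for key in state:
        stacked = torch.stack([m.state_dict()[key].float() for m in reference_models])
        state[key] = stacked.mean(dim=0)
    averaged.load_state_dict(state)
    return averaged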
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained; determining error information existing on the corresponding feature type in the third feature information based on the training label corresponding to each feature type; and on the basis of error information existing on each feature type, respectively and reversely updating model parameters in the model to be trained through a preset loss function corresponding to each feature type to obtain the information extraction reference model.
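A hedged sketch of one such joint training step, assuming the model returns a dictionary of predictions keyed by feature type and that each feature type has its own preset loss function:

import torch.nn as nn

loss_fns = {"shape": nn.CrossEntropyLoss(), "position": nn.CrossEntropyLoss()}

def joint_train_step(model, optimizer, images, labels_per_type):
    # The error on every feature type contributes its own preset loss; the
    # summed loss updates the model parameters in a single backward pass.
    optimizer.zero_grad()
    outputs = model(images)   # assumed: {feature_type: (batch, steps, classes)}
    total = sum(loss_fns[t](outputs[t].flatten(0, 1), labels.flatten())
                for t, labels in labels_per_type.items())
    total.backward()
    optimizer.step()
    return total.item()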
In some embodiments of the present application, based on the foregoing scheme, the fourth obtaining unit is configured to: selecting a training label corresponding to a target feature type from the plurality of training labels; inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained; determining target error information existing on the target feature type in the third feature information based on a training label corresponding to the target feature type; based on the target error information, carrying out reverse update on model parameters in the model to be trained through a preset loss function corresponding to the target characteristic type to obtain an intermediate information extraction reference model; and taking the intermediate information extraction reference model as a new model to be trained, and re-executing the step of selecting a training label corresponding to the target feature type from the plurality of training labels until all the labels in the plurality of training labels are selected to obtain the information extraction reference model.
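For contrast, the alternating scheme can be sketched as updating the model on one target feature type at a time until every training label has been used; the same hypothetical dictionary-of-outputs interface as above is assumed.

def alternating_train_step(model, optimizer, images, labels_per_type, loss_fns):
    # Select one target feature type, update the model with its preset loss to
    # obtain an intermediate reference model, then continue with the next type.
    for feature_type, labels in labels_per_type.items():
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fns[feature_type](outputs[feature_type].flatten(0, 1), labels.flatten())
        loss.backward()
        optimizer.step()
    return model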
In some embodiments of the present application, based on the foregoing scheme, the information unit includes a character unit, and the determining unit is configured to: determining a training label on a positioning feature for at least one character unit in the training sample image, wherein the positioning feature is used for at least characterizing the relative relationship feature between the at least one character unit; a training label on a shape feature is determined for at least one character unit in the training sample image.
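A deliberately simplified sketch of a positioning-feature label, assuming each character unit comes with a hypothetical bounding box and that the relative relationship to the previous unit is reduced to three classes:

RELATIONS = ["right-of", "superscript", "subscript"]   # hypothetical relation set

def positioning_labels(boxes):
    # Label every character unit after the first with its relation to the
    # previous unit, derived from the relative vertical position of the boxes.
    labels = []
    for prev, cur in zip(boxes, boxes[1:]):
        if cur["y_center"] < prev["y_top"]:
            labels.append(RELATIONS.index("superscript"))
        elif cur["y_center"] > prev["y_bottom"]:
            labels.append(RELATIONS.index("subscript"))
        else:
            labels.append(RELATIONS.index("right-of"))
    return labels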
In some embodiments of the present application, based on the foregoing solution, the extracting unit 1203 is configured to: for each target information unit of the at least one information unit, determining all information units and partial information units arranged before the target information unit; predicting first reference feature information of the target information unit based on feature information of all the information units; predicting second reference feature information of the target information unit based on feature information of the partial information unit; determining the first feature information based on the first reference feature information and the second reference feature information of each of the at least one information unit.
In some embodiments of the present application, based on the foregoing solution, the information units include character units, the at least one information unit constitutes one or more formulas containing the character units, and the apparatus further includes an editing unit configured to edit the one or more formulas in the target image into a formula editing area based on the target feature information, after the first feature information of the at least one information unit in the target image has been extracted as the target feature information through the information extraction model.
As another aspect, the present application provides another information extraction apparatus, including a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the information extraction method as described in the foregoing embodiments.
FIG. 13 shows a block diagram of an information extraction apparatus according to one embodiment of the present application. For example, apparatus 1300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 13, the apparatus 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.
The processing component 1302 generally controls overall operation of the device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing element 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the method described above. Further, the processing component 1302 can include one or more modules that facilitate interaction between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operation at the device 1300. Examples of such data include instructions for any application or method operating on device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 1306 provides power to the various components of device 1300. Power components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the device 1300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 1300 is in an operational mode, such as a capture mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 1300 is in an operational mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may further be stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1314 includes one or more sensors for providing various aspects of state assessment for the device 1300. For example, the sensor component 1314 may detect the open/closed state of the device 1300 and the relative positioning of components, such as the display and keypad of the apparatus 1300; the sensor component 1314 may also detect a change in the position of the apparatus 1300 or of a component of the apparatus 1300, the presence or absence of user contact with the apparatus 1300, the orientation or acceleration/deceleration of the apparatus 1300, and a change in the temperature of the apparatus 1300. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate communications between the apparatus 1300 and other devices in a wired or wireless manner. The apparatus 1300 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1316 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 1304 comprising instructions, executable by the processor 1320 of the apparatus 1300 to perform the information extraction method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
As another aspect, the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to make the computer device execute the information extraction method implemented in the above embodiments.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer-readable storage medium has stored therein at least one program code, which is loaded and executed by a processor of the apparatus to implement the operations performed by the information extraction method as described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An information extraction method, characterized in that the method comprises:
acquiring a target image, wherein the target image comprises at least one information unit;
acquiring an information extraction model, wherein the information extraction model is obtained by training at least one information unit in a training sample image on training labels of a plurality of characteristic types;
and extracting first characteristic information of at least one information unit in the target image as target characteristic information through the information extraction model.
2. The method of claim 1, wherein obtaining the information extraction model comprises:
acquiring a training sample image;
determining training labels on a plurality of feature types for at least one information unit in the training sample image to obtain a plurality of training labels, wherein each training label is used for representing feature information of the at least one information unit on the corresponding feature type;
and obtaining a model to be trained, and training the model to be trained through the training sample image and the training labels to obtain the information extraction model.
3. The method of claim 2, wherein the acquiring a training sample image comprises:
acquiring at least one frame of original training sample image;
scaling the height or width of each frame of original training sample image to a preset image height or a preset image width, and scaling the width or height of the original training sample image by the same ratio applied to the height or width, to obtain a preprocessed training sample image;
selecting a predetermined number of images from the pre-processed training sample images as the training sample images.
4. The method of claim 3, wherein said selecting a predetermined number of images from said pre-processed training sample images as said training sample images comprises:
sorting the preprocessed training sample images according to the width or the height of the preprocessed training sample images;
and selecting, from the sorted preprocessed training sample images, a preset number of images that are adjacent in the sorted order as the training sample images.
5. The method of claim 2, wherein the information units comprise an explicit information unit and an implicit information unit, and the determining training labels on a plurality of feature types for at least one information unit in the training sample image comprises:
acquiring an information unit dictionary matched with each feature type, wherein at least the feature vectors corresponding to all explicit information units are recorded in the information unit dictionary;
and constructing a training label on a corresponding feature type for at least one information unit in the training sample image based on the feature vector recorded in each information unit dictionary.
6. The method of claim 2, wherein the obtaining the model to be trained comprises:
obtaining an encoder model and a decoder model through network structure search, wherein the encoder model is used for encoding an image, and the decoder model is used for decoding the characteristics encoded by the encoder model to obtain second characteristic information of at least one information unit in the image;
and constructing the model to be trained based on the encoder model and the decoder model.
7. The method of claim 2, wherein the training the model to be trained through the training sample images and the plurality of training labels to obtain the information extraction model comprises:
training the model to be trained through the training sample images and the plurality of training labels for a preset number of training runs, to obtain one information extraction reference model for each of the preset number of training runs;
and averaging the information extraction reference models obtained over the preset number of training runs to obtain the information extraction model.
8. The method of claim 7, wherein the training the model to be trained with the training sample images and the plurality of training labels comprises:
inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained;
determining error information existing on the corresponding feature type in the third feature information based on the training label corresponding to each feature type;
and on the basis of error information existing on each feature type, respectively and reversely updating model parameters in the model to be trained through a preset loss function corresponding to each feature type to obtain the information extraction reference model.
9. The method of claim 7, wherein the training the model to be trained with the training sample images and the plurality of training labels comprises:
selecting a training label corresponding to a target feature type from the plurality of training labels;
inputting the training sample image to the model to be trained, and acquiring third feature information of at least one information unit in the training sample image output by the model to be trained;
determining target error information existing on the target feature type in the third feature information based on a training label corresponding to the target feature type;
based on the target error information, carrying out reverse update on model parameters in the model to be trained through a preset loss function corresponding to the target characteristic type to obtain an intermediate information extraction reference model;
and taking the intermediate information extraction reference model as a new model to be trained, and re-executing the step of selecting a training label corresponding to the target feature type from the plurality of training labels until all the labels in the plurality of training labels are selected to obtain the information extraction reference model.
10. The method of any of claims 2 to 9, wherein the information units comprise character units, and wherein determining training labels on a plurality of feature types for at least one information unit in the training sample image comprises:
determining a training label on a positioning feature for at least one character unit in the training sample image, wherein the positioning feature is used for at least characterizing the relative relationship feature between the at least one character unit;
a training label on a shape feature is determined for at least one character unit in the training sample image.
11. The method according to claim 1, wherein the extracting first feature information of at least one information unit in the target image comprises:
for each target information unit of the at least one information unit, determining all information units and partial information units arranged before the target information unit;
predicting first reference feature information of the target information unit based on feature information of all the information units;
predicting second reference feature information of the target information unit based on feature information of the partial information unit;
determining the first feature information based on the first reference feature information and the second reference feature information of each of the at least one information unit.
12. The method of claim 1, wherein the information units comprise character units, wherein the at least one information unit constitutes one or more formulas containing the character units, and wherein after extracting first feature information of the at least one information unit in the target image as target feature information through the information extraction model, the method further comprises:
and editing one or more formulas in the target image into a formula editing area based on the target characteristic information.
13. An information extraction apparatus, characterized in that the apparatus comprises:
a first acquisition unit, a second acquisition unit and an extraction unit, wherein the first acquisition unit is used for acquiring a target image which comprises at least one information unit;
the second acquisition unit is used for acquiring an information extraction model, and the information extraction model is obtained by training at least one information unit in a training sample image on training labels of a plurality of characteristic types;
and the extracting unit is used for extracting the first characteristic information of at least one information unit in the target image as target characteristic information through the information extracting model.
14. An information extraction device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the information extraction method according to any one of claims 1 to 12.
15. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded into and executed by a processor to perform operations performed by the information extraction method according to any one of claims 1 to 12.
CN202111167852.6A 2021-09-29 2021-09-29 Information extraction method, device and medium Pending CN114067338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111167852.6A CN114067338A (en) 2021-09-29 2021-09-29 Information extraction method, device and medium

Publications (1)

Publication Number Publication Date
CN114067338A (en) 2022-02-18

Family

ID=80234128



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination