CN112101308B

CN112101308B - Method and device for combining text boxes based on language model and electronic equipment

Info

Publication number: CN112101308B
Application number: CN202011257776.3A
Authority: CN
Inventors: 谢春鸿
Original assignee: Beijing Testin Information Technology Co Ltd
Current assignee: Beijing Testin Information Technology Co Ltd
Priority date: 2020-11-11
Filing date: 2020-11-11
Publication date: 2021-02-09
Anticipated expiration: 2040-11-11
Also published as: CN112101308A

Abstract

The application discloses a method and a device for merging text boxes based on a language model, and an electronic device, which address the problem that text box merging and separating operations in the prior art are not accurate enough. The method comprises: acquiring a first text box and a second text box which are adjacent in a target interface; predicting, through a target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box; predicting, through the target language model, a third probability that the first text box and the second text box are merged text boxes; and determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability. The target language model is obtained by training on the corpus of a target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field.

Description

Method and device for combining text boxes based on language model and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for merging text boxes based on a language model, and an electronic device.

Background

Currently, the recognition process of Optical Character Recognition (OCR) generally includes detecting text boxes, filtering the detected text boxes, recognizing the words in each text box, and merging or separating the recognized text boxes according to rules.

In the prior art, text boxes are merged or separated according to layout factors, such as whether the texts in two adjacent text boxes have the same height and lie on the same horizontal line. However, merging or separating text boxes in this way may merge text boxes that should not be merged, or keep separate text boxes that need to be merged.

Therefore, a further solution is needed for accurately merging and separating text boxes.

Disclosure of Invention

The embodiment of the application provides a method and a device for merging text boxes based on a language model and electronic equipment, and aims to solve the problem that the merging and separating operations of the text boxes are not accurate enough in the prior art.

In order to solve the above technical problem, the embodiment of the present application is implemented as follows:

in a first aspect, a method for merging text boxes based on a language model is provided, which includes:

acquiring a first text box and a second text box which are adjacent in a target interface;

predicting a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box through a target language model respectively;

predicting, by the target language model, a third probability that the first text box and the second text box are merged text boxes;

determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability;

wherein the target language model is obtained by training on the corpus of a target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field.

In a second aspect, an apparatus for merging text boxes based on a language model is provided, including:

the text box acquisition module is used for acquiring a first text box and a second text box which are adjacent in the target interface;

the first prediction module is used for respectively predicting a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box through a target language model;

a second prediction module to predict a third probability that the first text box and the second text box are merged text boxes via the target language model;

a text box merging module configured to determine whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability;

In a third aspect, an electronic device is provided, which includes:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

In a fourth aspect, a computer-readable storage medium is presented, the computer-readable storage medium storing one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:

The embodiment of the application can at least achieve the following technical effects by adopting the technical scheme:

In the process of recognizing characters in a target interface, a first text box and a second text box which are adjacent in the target interface are acquired; a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box are then predicted through a target language model; a third probability that the first text box and the second text box are merged text boxes is predicted through the target language model; finally, whether to merge the first text box and the second text box is determined based on the first probability, the second probability, and the third probability. The target language model is obtained by training on the corpus of the target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field. Because a language model capable of recognizing words is introduced into the merging and separating of text boxes, the merged or separated text boxes better conform to the language habits of the specific field, which improves the merging and separating result.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 is a schematic flowchart of a text box merging method based on a language model according to an embodiment of the present specification;

Fig. 2 is a schematic diagram of the text box merging method based on a language model applied in an actual scene according to an embodiment of the present specification;

Fig. 3 is a schematic structural diagram of a text box merging device based on a language model according to an embodiment of the present specification;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present specification.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

In order to solve the problem that text box merging and separating operations in the prior art are not accurate enough, embodiments of the present specification provide a method for merging text boxes based on a language model. With this method, in the process of recognizing characters in a target interface, a first text box and a second text box which are adjacent in the target interface are acquired; a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box are then predicted through a target language model; a third probability that the first text box and the second text box are merged text boxes is predicted through the target language model; finally, whether to merge the first text box and the second text box is determined based on the first probability, the second probability, and the third probability. The target language model is obtained by training on the corpus of the target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field. Because a language model capable of recognizing words is introduced into the merging and separating of text boxes, the merged or separated text boxes better conform to the language habits of the specific field, which improves the merging and separating result.

The execution subject of the text box merging method based on the language model provided by the embodiments of the present specification may be, but is not limited to, a server or another apparatus that can be configured to execute the method provided by the embodiments of the present invention.

For convenience of description, the following embodiments of the method are described taking a server capable of executing the method as the execution subject. It is understood that execution of the method by the server is merely an exemplary illustration and should not be construed as a limitation of the method.

Specifically, an implementation flow diagram of a text box merging method based on a language model provided by one or more embodiments of the present specification is shown in fig. 1, and includes:

Step 101: acquiring a first text box and a second text box which are adjacent in a target interface.

Fig. 2 is a schematic diagram of a target interface provided by an embodiment of the present specification. In the target interface, a first text box and a second text box which are adjacent to each other are first acquired, where the words contained in the first text box are "a group of people and a social organization, an enterprise and public institution", and the words contained in the second text box are "a position leader".
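The acquisition step can be pictured with a small sketch. The `TextBox` fields, the reading-order sort, and the `adjacent_pairs` helper are illustrative assumptions; the specification does not state how text boxes are represented or how adjacency is decided.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class TextBox:
    """A detected text box: recognized text plus a bounding box (hypothetical layout)."""
    text: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def adjacent_pairs(boxes: List[TextBox]) -> Iterator[Tuple[TextBox, TextBox]]:
    """Yield consecutive boxes in reading order (top to bottom, then left to right).

    Treating consecutive boxes as "adjacent" is an assumption of this sketch.
    """
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    for first, second in zip(ordered, ordered[1:]):
        yield first, second


# The two boxes from the Fig. 2 example, given out of order on purpose.
boxes = [
    TextBox("a position leader", x=40, y=60, width=200, height=24),
    TextBox("a group of people and a social organization, an enterprise and public institution",
            x=40, y=30, width=420, height=24),
]
first_box, second_box = next(adjacent_pairs(boxes))
```

Here `first_box` is the upper box and `second_box` the one directly below it; a real detector would likely need a stricter geometric test for adjacency.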

Step 102: predicting, through a target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box.

The target language model is obtained by training on the corpus of the target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field.

It should be understood that, when determining whether to merge two adjacent text boxes, it is necessary to consider not only layout factors, such as whether the two text boxes are adjacent and the distance between them, but also whether the merged words of the two text boxes conform to the language habits of the field to which they belong, so as to accurately determine whether to merge the two adjacent text boxes. For this reason, language models for a plurality of specific fields are trained in advance; each model identifies whether words conform to the language habits of its field, which in turn determines whether the words in two text boxes should be merged. Optionally, predicting, by the target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box, respectively, includes:

determining a target field matched with words in a target interface;

calling a target language model obtained by training on the corpus of the target field with the expected words of the target field as labels;

a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box are respectively predicted through the target language model.
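The three sub-steps above (match a field, call the model trained for it, then predict) can be sketched as a lookup. Everything named here is hypothetical: the keyword sets, the `MODEL_REGISTRY`, and the constant-valued placeholder models stand in for whatever field matching and trained models an implementation would actually use.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for a trained model: maps a phrase to a probability in [0, 1].
LanguageModel = Callable[[str], float]

# Assumed keyword sets used to guess the field of the words in the target interface.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "government": {"institution", "organization", "enterprise"},
    "finance": {"account", "balance", "transfer"},
}

# Assumed registry of pre-trained, per-field models (placeholders returning 0.5).
MODEL_REGISTRY: Dict[str, LanguageModel] = {
    "government": lambda phrase: 0.5,
    "finance": lambda phrase: 0.5,
}


def match_domain(interface_text: str) -> str:
    """Pick the field whose keywords overlap most with the interface text."""
    tokens = set(interface_text.lower().split())
    return max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))


def load_target_model(interface_text: str) -> LanguageModel:
    """Return the model registered for the matched field."""
    return MODEL_REGISTRY[match_domain(interface_text)]


target_model = load_target_model("Transfer the balance to another account")
```

Keyword overlap is only one possible way to match a field; the specification leaves the matching method open.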

Optionally, the target language model may recognize the first word in the first text box and then predict whether the first word is an independent word that conforms to the language habits of the target field, that is, predict a first probability that the first word is an independent word. Likewise, the target language model may recognize the second word in the second text box and predict a second probability that the second word is an independent word. Specifically, predicting a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box respectively through the target language model includes:

identifying a first word in a first text box through a target language model;

identifying a second word in a second text box through the target language model;

a first probability that the first word is an independent word and a second probability that the second word is an independent word are predicted through the target language model.

As shown in fig. 2, the target language model may identify the first word in the first text box as "group and social organization of the masses, enterprise and institution" and the second word in the second text box as "position responsible person". The target language model may then predict a first probability that the first word "group and social organization of the masses, enterprise and institution" is an independent word and a second probability that the second word "position responsible person" is an independent word.
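To make the two predictions concrete, the following sketch scores each word with a toy model. `ToyDomainModel` is an assumption made purely for illustration: it is not the trained target language model, and its similarity-based score (via the standard-library `difflib.SequenceMatcher`) only mimics the idea that a phrase closer to an expected word of the field receives a higher probability.

```python
from difflib import SequenceMatcher
from typing import Iterable


class ToyDomainModel:
    """Illustrative stand-in for the target language model (not the patented model)."""

    def __init__(self, expected_words: Iterable[str]) -> None:
        # Expected words of the target field, which the source uses as training labels.
        self.expected_words = list(expected_words)

    def independent_probability(self, word: str) -> float:
        """Score in [0, 1]: similarity of `word` to the closest expected word."""
        return max(
            (SequenceMatcher(None, word, expected, autojunk=False).ratio()
             for expected in self.expected_words),
            default=0.0,
        )


# Assumed lexicon: the full phrase is an expected word of the field.
model = ToyDomainModel([
    "group and social organization of the masses, enterprise and institution "
    "position responsible person",
])

first_word = "group and social organization of the masses, enterprise and institution"
second_word = "position responsible person"
first_probability = model.independent_probability(first_word)
second_probability = model.independent_probability(second_word)
```

With this lexicon both fragments score below 1.0, because neither is a complete expected word on its own.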

Step 103: predicting, through the target language model, a third probability that the first text box and the second text box are merged text boxes.

Optionally, predicting, by the target language model, a third probability that the first text box and the second text box are merged text boxes includes:

combining words in the first text box and the second text box to obtain a third word;

identifying a third word through the target language model;

a third probability that the third word is an independent word is predicted by the target language model.

As shown in fig. 2, the first word "the group and social organization, the enterprise and public institution" in the first text box and the second word "the responsible person" in the second text box may be merged to obtain the third word "the group and social organization, the responsible person of the enterprise and public institution". The target language model identifies the third word and then predicts the probability that the third word conforms to the language habits of the target field, that is, the third probability that "the group and social organization, the responsible person of the enterprise and public institution" is an independent word.
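The merge-then-score step can be sketched as plain string concatenation followed by one model call. The `separator` parameter, the `KNOWN_TERMS` set, and the `toy_score` function are assumptions for illustration; the source does not say how the words are joined, and a real scorer would be the trained target language model.

```python
from typing import Callable


def merge_words(first_word: str, second_word: str, separator: str = "") -> str:
    """Build the third word by joining the two words in reading order.

    An empty separator suits scripts written without spaces; pass " " for English.
    """
    return first_word + separator + second_word


def third_probability(first_word: str, second_word: str,
                      score: Callable[[str], float], separator: str = "") -> float:
    """Probability that the merged (third) word is an independent word."""
    return score(merge_words(first_word, second_word, separator))


# Hypothetical scorer: high score only for phrases known to the field.
KNOWN_TERMS = {"login password"}


def toy_score(phrase: str) -> float:
    return 0.9 if phrase in KNOWN_TERMS else 0.1


p3 = third_probability("login", "password", toy_score, separator=" ")
```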

Step 104: determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability.

Optionally, determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability includes:

determining an average probability of the first probability and the second probability based on the first probability and the second probability;

determining whether to merge the first text box and the second text box based on the average probability and the third probability.

Optionally, determining whether to merge the first text box and the second text box based on the average probability and the third probability includes:

if the average probability is smaller than the third probability, combining the characters in the first text box and the second text box;

if the average probability is greater than the third probability, the first text box and the second text box are not merged.

As shown in fig. 2, if the average of the first probability and the second probability is less than the third probability, that is, if the third probability that "group of people and social organization, and responsible person of enterprise and public institution" is an independent word is greater than the average of the probabilities that "group of people and social organization, enterprise and public institution" and "responsible person" are each independent words, the first text box and the second text box are merged; otherwise, the first text box and the second text box are not merged.
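The decision rule reduces to a single comparison, sketched below. The function name is hypothetical, and the handling of the tie case is an assumption: the text only covers "smaller than" and "greater than", so this sketch treats equal values as "do not merge".

```python
def should_merge(first_probability: float, second_probability: float,
                 third_probability: float) -> bool:
    """Merge only when the merged word is more plausible than the two words on average."""
    average_probability = (first_probability + second_probability) / 2
    # Assumption: a tie is treated as "do not merge"; the source does not cover equality.
    return average_probability < third_probability


merge_case = should_merge(0.3, 0.5, 0.8)   # average 0.4 is below 0.8
keep_case = should_merge(0.9, 0.7, 0.6)    # average 0.8 is above 0.6
```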

Optionally, after determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability, the method provided in the embodiment of the present specification further includes:

acquiring a third text box adjacent to the second text box in the target interface;

predicting a fourth probability that the third text box is an independent text box through the target language model;

predicting a fourth probability that the second text box and the third text box are merged text boxes through the target language model;

and determining whether to merge the second text box and the third text box based on the second probability, the third probability and the fourth probability.

As shown in fig. 2, after determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability, a third text box adjacent to the second text box in the target interface may be continuously obtained, a fourth probability that the third text box is an independent text box is predicted through the target language model, and a fourth probability that the second text box and the third text box are merged text boxes is predicted through the target language model, and finally, whether to merge the second text box and the third text box is determined based on the second probability, the third probability, and the fourth probability. And so on until the words in all adjacent text boxes in the target interface are recognized once and whether merging is needed is determined.
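One way to read "and so on" is a single left-to-right pass over the text boxes, sketched below. Two points are assumptions of this sketch rather than statements of the source: after a merge, the merged text is used as the current box when comparing with the next one, and each comparison reuses the average-versus-merged rule. The `toy_score` function and its lexicon are hypothetical.

```python
from typing import Callable, List


def merge_pass(texts: List[str], score: Callable[[str], float],
               separator: str = "") -> List[str]:
    """Walk adjacent text boxes once, merging a pair when the merged text scores higher."""
    if not texts:
        return []
    result = [texts[0]]
    for next_text in texts[1:]:
        current = result[-1]
        candidate = current + separator + next_text
        average = (score(current) + score(next_text)) / 2
        if average < score(candidate):
            result[-1] = candidate      # merge: the merged text becomes the current box
        else:
            result.append(next_text)    # keep the boxes separate
    return result


# Hypothetical scorer over a tiny lexicon of expected words.
LEXICON = {"user name", "password"}


def toy_score(phrase: str) -> float:
    return 0.9 if phrase in LEXICON else 0.2


merged = merge_pass(["user", "name", "password"], toy_score, separator=" ")
```

In this toy run the first two boxes are merged and the third stays separate.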

Fig. 3 is a schematic structural diagram of a text box merging apparatus 300 based on a language model according to an embodiment of the present invention. Referring to fig. 3, in a software implementation, the apparatus 300 for merging text boxes based on a language model may include a text box obtaining module 301, a first prediction module 302, a second prediction module 303, and a text box merging module 304, wherein:

a text box obtaining module 301, configured to obtain a first text box and a second text box that are adjacent to each other in a target interface;

a first prediction module 302, configured to predict, through a target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box, respectively;

a second prediction module 303, configured to predict, through the target language model, a third probability that the first text box and the second text box are merged text boxes;

a text box merging module 304, configured to determine whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability;

Optionally, in an embodiment, the text box merging module 304 is configured to:

merging characters in the first text box and the second text box if the average probability is smaller than the third probability;

if the average probability is greater than the third probability, not merging the first text box and the second text box.

Optionally, in an embodiment, the first prediction module 302 is configured to:

determining the target field matched with the words in the target interface;

calling the target language model obtained by training on the corpus of the target field with the expected words of the target field as labels;

and respectively predicting a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box through the target language model.

Optionally, in an embodiment, the first prediction module 302 is configured to:

identifying, by the target language model, a first word in the first text box;

identifying, by the target language model, a second word in the second text box;

predicting, by the target language model, a first probability that the first word is an independent word and a second probability that the second word is an independent word.

Optionally, in an embodiment, the second prediction module 303 is configured to:

merging words in the first text box and the second text box to obtain a third word;

identifying the third word by the target language model;

predicting, by the target language model, a third probability that the third word is an independent word.

Optionally, in an embodiment:

the text box obtaining module 301 is further configured to obtain a third text box adjacent to the second text box in the target interface;

the first prediction module 302 is further configured to predict, through the target language model, a fourth probability that the third text box is an independent text box;

the second prediction module 303, further configured to predict, by the target language model, a fourth probability that the second text box and the third text box are merged text boxes;

the text box merging module 304 is further configured to determine whether to merge the second text box and the third text box based on the second probability, the third probability, and the fourth probability.

The device 300 for merging text boxes based on a language model can implement the method of the method embodiments shown in fig. 1 and fig. 2; for details, refer to the description of those embodiments, which is not repeated here.
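As a rough software picture of the device, the four modules can be mapped onto methods of one class. The class name, method names, and the injected `score` callable are hypothetical; the sketch only shows how modules 301 to 304 might hand data to one another, not how the device is actually built.

```python
from typing import Callable, List, Tuple


class TextBoxMergingDevice:
    """Illustrative mapping of modules 301 to 304 onto methods (names are assumptions)."""

    def __init__(self, score: Callable[[str], float], separator: str = "") -> None:
        self.score = score            # stand-in for the target language model
        self.separator = separator

    def acquire(self, texts: List[str], index: int) -> Tuple[str, str]:
        """Text box obtaining module 301: return two adjacent texts."""
        return texts[index], texts[index + 1]

    def predict_independent(self, first: str, second: str) -> Tuple[float, float]:
        """First prediction module 302: first and second probabilities."""
        return self.score(first), self.score(second)

    def predict_merged(self, first: str, second: str) -> float:
        """Second prediction module 303: third probability for the merged text."""
        return self.score(first + self.separator + second)

    def decide(self, p1: float, p2: float, p3: float) -> bool:
        """Text box merging module 304: merge when the average is below p3."""
        return (p1 + p2) / 2 < p3

    def should_merge(self, texts: List[str], index: int) -> bool:
        first, second = self.acquire(texts, index)
        p1, p2 = self.predict_independent(first, second)
        p3 = self.predict_merged(first, second)
        return self.decide(p1, p2, p3)


device = TextBoxMergingDevice(
    lambda phrase: 0.9 if phrase == "user name" else 0.2, separator=" ")
```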

Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification. Referring to fig. 4, at a hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.

The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the text box merging device based on the language model at a logical level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:

The method for merging text boxes based on a language model disclosed in the embodiment of fig. 1 in this specification can be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the various methods, steps and logic blocks disclosed in one or more embodiments of the present specification. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with one or more embodiments of the present specification may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory or erasable programmable read-only memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device may further perform the method for merging text boxes based on the language model in fig. 1, which is not described herein again.

Of course, in addition to the software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units, and may also be hardware or logic devices.

Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following:

By adopting the computer-readable storage medium provided by the embodiment of the specification, in the process of recognizing characters in a target interface, a first text box and a second text box which are adjacent in the target interface are acquired; a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box are then predicted through a target language model; a third probability that the first text box and the second text box are merged text boxes is predicted through the target language model; finally, whether to merge the first text box and the second text box is determined based on the first probability, the second probability, and the third probability. The target language model is obtained by training on the corpus of the target field with the expected words of the target field as labels, and the words in the first text box and the second text box belong to the target field. Because a language model capable of recognizing words is introduced into the merging and separating of text boxes, the merged or separated text boxes better conform to the language habits of the specific field, which improves the merging and separating result.

In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims

1. A method for merging text boxes based on a language model is characterized by comprising the following steps:

the target language model is obtained by training words in a corpus based on a target field and words in the corpus based on the target field as labels, and the words in the first text box and the second text box belong to the target field;

determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability, including:

determining whether to merge the first text box and the second text box based on the average probability and the third probability;

determining whether to merge the first text box and the second text box based on the average probability and the third probability, including:

2. The method of claim 1, wherein predicting, by a target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box, respectively, comprises:

determining the target field matched with the words in the target interface;

calling the target language model obtained by training based on the corpus of the target field and words in the corpus of the target field as labels;

3. The method of claim 1 or 2, wherein predicting, by the target language model, a first probability that the first text box is an independent text box and a second probability that the second text box is an independent text box, respectively, comprises:

identifying, by the target language model, a first word in the first text box;

4. The method of claim 1, wherein predicting, by the target language model, a third probability that the first text box and the second text box are merged text boxes comprises:

identifying the third word by the target language model;

5. The method of claim 1, wherein after determining whether to merge the first text box and the second text box based on the first probability, the second probability, and the third probability, the method further comprises:

predicting, by the target language model, a fourth probability that the third text box is an independent text box;

predicting, by the target language model, a fourth probability that the second text box and the third text box are merged text boxes;

determining whether to merge the second text box and the third text box based on the second probability, the third probability, and the fourth probability.

6. An apparatus for merging text boxes based on a language model, comprising:

the text box merging module is configured to:

7. An electronic device, comprising:

a processor; and

8. A computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to: