CN114219876A - Text merging method, device, equipment and storage medium - Google Patents
- Publication number: CN114219876A
- Application number: CN202210149404.1A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T 11/60 (Physics; Computing; Image data processing or generation; 2D [Two Dimensional] image generation): Editing figures and text; Combining figures or text
- G06F 40/211 (Physics; Computing; Electric digital data processing; Handling natural language data; Natural language analysis; Parsing): Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F 40/30 (Physics; Computing; Electric digital data processing; Handling natural language data): Semantic analysis
Abstract
The disclosure relates to a text merging method, apparatus, device and storage medium. The method obtains a target picture, and obtains from the target picture the position information of each of at least one text line and a sub-picture containing that text line. Further, an undirected graph is determined according to the position information of each text line and the sub-picture containing it, where the nodes of the undirected graph correspond one-to-one to the at least one text line; the undirected graph can be used as the input of a prediction model, and a directed graph can be obtained through the prediction of the prediction model. Because the two nodes at the ends of a directed edge in the directed graph are strongly associated and have a precedence order given by the direction of the edge, the text lines respectively corresponding to the two nodes are spliced in that order to form a semantically complete sentence, so that subsequent text understanding or processing errors can be avoided.
Description
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a text merging method, apparatus, device, and storage medium.
Background
Current Optical Character Recognition (OCR) technology can recognize text or words in a picture.
However, the inventors of the present application have found that when a piece of text is presented in multiple lines in a picture, the text obtained using OCR technology may be fragmented single sentences with incomplete semantics, resulting in subsequent text understanding or processing errors.
Disclosure of Invention
In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a text merging method, apparatus, device and storage medium that merge multiple fragmented single sentences into one semantically complete sentence, so that subsequent text understanding or processing errors can be avoided.
In a first aspect, an embodiment of the present disclosure provides a text merging method, including:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
In a second aspect, an embodiment of the present disclosure provides a text merging method, where the method is applied to a terminal, and the method includes:
receiving a target picture from a server or acquiring the target picture through a shooting device;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining a directed graph through the prediction model;
and receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
In a third aspect, an embodiment of the present disclosure provides a text merging method, where the method is applied to a server, and the method includes:
receiving a target picture from a terminal;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
merging text lines respectively corresponding to two nodes at two ends of a directed edge in the directed graph to obtain a merged text;
and sending the combined text to the terminal.
In a fourth aspect, an embodiment of the present disclosure provides a text merging apparatus, including:
the first acquisition module is used for acquiring a target picture;
the second acquisition module is used for acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture;
a determining module, configured to determine an undirected graph according to the position information of each text line and a sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module is used for inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model;
and the merging module is used for merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
In a fifth aspect, an embodiment of the present disclosure provides a terminal, including:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a server, including:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In an eighth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
According to the text merging method, apparatus, device and storage medium of the present disclosure, a target picture is obtained, and the position information of each of at least one text line and a sub-picture containing that text line are obtained from the target picture. Further, an undirected graph is determined according to the position information of each text line and the sub-picture containing it, where the nodes of the undirected graph correspond one-to-one to the at least one text line; the undirected graph can be used as the input of a prediction model, and a directed graph can be obtained through the prediction of the prediction model. Because the two nodes at the ends of a directed edge in the directed graph are strongly associated and have a precedence order given by the direction of the edge, the text lines respectively corresponding to the two nodes are spliced in that order to form a semantically complete sentence, so that subsequent text understanding or processing errors can be avoided.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; obviously, other drawings can also be obtained from these drawings by those skilled in the art without inventive effort.
Fig. 1 is a schematic view of a picture provided by an embodiment of the present disclosure;
fig. 2 is a flowchart of a text merging method provided in the embodiment of the present disclosure;
fig. 3 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of a text merging method according to another embodiment of the present disclosure;
FIG. 5 is a flowchart of a text merging method according to another embodiment of the present disclosure;
FIG. 6 is a flowchart of a text merging method according to another embodiment of the present disclosure;
FIG. 7 is a flowchart of a text merging method according to another embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a text merging device according to another embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an embodiment of an electronic device provided in the embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Current Optical Character Recognition (OCR) technology can recognize text or words in a picture and output the recognized text or words. However, when a piece of text is presented in multiple lines in a picture, the text obtained using OCR technology may be fragmented single sentences with incomplete semantics, resulting in subsequent text understanding or processing errors. For example, fig. 1 is a schematic diagram of a certain picture; this embodiment does not limit the content presented by the picture or the source of the picture. Specifically, the picture may be an advertisement picture of a certain commodity, a picture taken by a terminal, or a picture from a network. Since OCR technology mainly locates text positions and obtains text line by line, when a piece of text such as "in hot weather it can be used as the baby's summer quilt" is presented in two lines in the picture, OCR technology recognizes two fragmented single sentences, i.e., "in hot weather it can be used as the baby's" and "summer quilt", thereby introducing systematic errors into subsequent text understanding or text processing. For example, in subsequent machine translation, the two fragments would be translated separately, and the meaning expressed by the two separate translation results may be wrong, whereas translating the merged sentence "in hot weather it can be used as the baby's summer quilt" as a whole yields a result that better conforms to the meaning expressed in the original text.
To solve this problem, embodiments of the present disclosure provide a text merging method, which is described below with reference to specific embodiments. Fig. 2 is a flowchart of a text merging method provided in an embodiment of the present disclosure. The method may be performed by a terminal or a server. As shown in fig. 3, when the method is executed by the terminal 31, the terminal 31 may merge the fragmented single sentences in a picture it takes or in a locally stored picture. Alternatively, the terminal 31 may obtain a picture from the server 32 and merge the fragmented single sentences in the picture. When the method is executed by the server 32, the server 32 may receive a picture sent by the terminal 31, or the server 32 may obtain a picture from another network device or terminal, and merge the fragmented single sentences in the picture. The text merging method is described below by taking execution by the server 32 as an example. As shown in fig. 2, the method comprises the following specific steps:
s201, obtaining a target picture.
For example, the server 32 obtains a target picture, which may be the picture shown in fig. 1.
S202, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
For example, the target picture may be recorded as an original picture, such as the original picture 41 shown in fig. 4. The original picture 41 includes a plurality of text lines, such as "multiple uses of a bath towel", "summer quilt", "in hot weather it can be used as the baby's", "summer quilt", and so on. The server 32 includes an OCR model; the server 32 can input the original picture 41 into the OCR model, and the OCR model can obtain the position information of each text line in the original picture 41 and a sub-picture containing that text line. The position information of a text line may be the position occupied by the text line in the original picture 41. Taking the text line "multiple uses of a bath towel" as an example, its position information may be the position of the center point of box 42 in fig. 4 within the original picture 41, together with the length and width of box 42. Alternatively, the position information of the text line may be the coordinates, in the original picture 41, of the 4 points at the 4 corners of box 42. The sub-picture containing the text line "multiple uses of a bath towel" may be the sub-picture within box 42. That is, for each text line, the OCR model may output the position information of the text line and a sub-picture containing the text line.
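As a minimal illustration of the per-line output of this step, the following sketch organizes each detected line into a position box and a cropped sub-picture. The `TextLine` structure, the `(cx, cy, w, h)` box convention and the `ocr_model.detect` call are assumptions made for illustration, not the actual interface of the OCR model in the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class TextLine:
    box: Tuple[float, float, float, float]  # (cx, cy, w, h) of the box in the original picture
    crop: np.ndarray                        # sub-picture containing the text line


def extract_text_lines(original_picture: np.ndarray, ocr_model) -> List[TextLine]:
    """Run the OCR detector and return one TextLine per detected line."""
    lines = []
    for cx, cy, w, h in ocr_model.detect(original_picture):  # assumed detector interface
        x0, y0 = int(cx - w / 2), int(cy - h / 2)
        crop = original_picture[y0:y0 + int(h), x0:x0 + int(w)]
        lines.append(TextLine(box=(cx, cy, w, h), crop=crop))
    return lines
```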
S203, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one.
As shown in fig. 4, the position information of each text line and the sub-picture containing the text line are input into a text line merging model (Textline2Paragraph model). In some embodiments, the Textline2Paragraph model may also be understood as a model that merges text lines into paragraphs. Specifically, the Textline2Paragraph model may determine an undirected graph according to the position information of each text line and the sub-picture containing the text line, where the undirected graph includes nodes and undirected edges, the number of nodes in the undirected graph is the same as the number of text lines in the original picture 41, and each text line corresponds to one node. In addition, in the undirected graph there is one undirected edge between every two nodes, where the two nodes are any two nodes in the undirected graph.
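A minimal sketch of this graph construction, assuming the node representation vectors are produced by a helper `node_vector` (one possible realization is sketched later, in connection with fig. 5 and fig. 6); every pair of nodes is connected by exactly one undirected edge.

```python
import itertools


def build_undirected_graph(text_lines, node_vector):
    """One node per text line, one undirected edge between every two nodes."""
    nodes = [node_vector(line) for line in text_lines]          # node representation vectors
    edges = list(itertools.combinations(range(len(nodes)), 2))  # fully connected, undirected
    return nodes, edges
```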
And S204, inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model.
For example, the Textline2Paragraph model includes a prediction model, and the prediction model may specifically be a deep learning model such as a Graph Neural Network (GNN) or a Transformer model. The undirected graph described above may be used as the input of the GNN model or the Transformer model, which can predict the directed edges between nodes, thereby obtaining a directed graph.
S205, merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
It can be understood that, in the directed graph, if a directed edge exists between two nodes, it indicates that the association between the two nodes is strong and that a precedence order exists between the two nodes, and this precedence order is related to the direction of the directed edge. For example, if a directed edge exists between node A and node B and points from node A to node B, the text line corresponding to node A needs to be read first, followed by the text line corresponding to node B, and the two text lines are spliced together to form a semantically complete sentence. If there is no directed edge between node A and node B, the text lines corresponding to node A and node B are each already semantically complete sentences. Therefore, the text lines respectively corresponding to the two nodes at the two ends of a directed edge in the directed graph can be merged, thereby obtaining a merged text. As shown in fig. 4, "in hot weather it can be used as the baby's" and "summer quilt" can be combined into one paragraph, i.e., one piece of text; "wrap the baby after bathing" and "to avoid catching a cold" can be combined into one paragraph; and "it can be padded in a stroller" and "or used as a padding cloth when playing" can be combined into one paragraph. Further, the server 32 may also transmit the combined text to the terminal 31.
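The merging rule can be sketched as follows, under the simplifying assumption that each node has at most one outgoing and one incoming directed edge, so the directed graph decomposes into chains. The successor-map representation of the directed edges, the assumption that each sub-picture has already been recognized into a string, and the direct string concatenation are illustrative choices rather than the embodiment's implementation.

```python
from typing import List, Tuple


def merge_text_lines(texts: List[str], directed_edges: List[Tuple[int, int]]) -> List[str]:
    """texts[i] is the recognized string of node i; directed_edges is a list of (src, dst)."""
    successor = {src: dst for src, dst in directed_edges}   # at most one outgoing edge assumed
    has_predecessor = {dst for _, dst in directed_edges}
    paragraphs = []
    # Start from nodes that no directed edge points to, and walk forward along the chain.
    for start in range(len(texts)):
        if start in has_predecessor:
            continue
        chain, node = [texts[start]], start
        while node in successor:
            node = successor[node]
            chain.append(texts[node])
        paragraphs.append("".join(chain))  # splice the text lines in precedence order
    return paragraphs
```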
In the embodiment of the present disclosure, a target picture is acquired, and the position information of each of at least one text line and a sub-picture containing that text line are acquired from the target picture. Further, an undirected graph is determined according to the position information of each text line and the sub-picture containing it, where the nodes of the undirected graph correspond one-to-one to the at least one text line; the undirected graph can be used as the input of a prediction model, and a directed graph can be obtained through the prediction of the prediction model. Because the two nodes at the ends of a directed edge in the directed graph are strongly associated and have a precedence order given by the direction of the edge, the text lines respectively corresponding to the two nodes are spliced in that order to form a semantically complete sentence, so that subsequent text understanding or processing errors can be avoided.
Fig. 5 is a flowchart of a text merging method according to another embodiment of the present disclosure. On the basis of the above embodiment, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line includes the following steps:
s501, extracting the characteristics of the position information of the text line to obtain first characteristic information.
The processing procedure inside the Textline2Paragraph model described above is shown in fig. 6. For example, taking the text line "multiple uses of a bath towel" as an example, the position information of the text line may be the position information of box 42 in the original picture 41; specifically, it may be the position of the center point of box 42 in fig. 4 within the original picture 41, together with the length and width of box 42. Alternatively, the position information of the text line may be the coordinates, in the original picture 41, of the 4 points at the 4 corners of box 42. Further, feature extraction is performed on the position information of the text line by using a Convolutional Neural Network (CNN) or patch embedding, so as to obtain first feature information, where the first feature information may be a representation vector.
And S502, performing feature extraction on the sub-picture containing the text line to obtain second feature information.
For example, feature extraction is performed on the sub-picture containing the text line, that is, the sub-picture in box 42, by using a CNN or patch embedding, so as to obtain second feature information, where the second feature information may also be a representation vector.
S503, obtaining a representation vector of a node corresponding to the text line according to the first characteristic information and the second characteristic information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for undirected edges between nodes.
As shown in fig. 6, node 60 corresponds to the text line in box 42, and the representation vector of node 60 can be obtained from the first feature information corresponding to the position information of box 42 and the second feature information of the sub-picture in box 42. Specifically, the first feature information and the second feature information may be spliced (concatenated) to obtain the representation vector of node 60, or the first feature information and the second feature information may be weighted and summed to obtain the representation vector of node 60. In addition, node 62 shown in fig. 6 corresponds to the text line in box 61, and its representation vector is obtained in the same way. An undirected graph 63 as shown in fig. 6 can then be constructed from the node corresponding to each text line and the undirected edges between the nodes.
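A sketch of S501 to S503 under the concatenation option, with a linear layer standing in for the position feature extractor and a small CNN for the sub-picture feature extractor; the module structure, the dimensions and the `NodeEncoder` name are illustrative assumptions, not the embodiment's exact network.

```python
import torch
import torch.nn as nn


class NodeEncoder(nn.Module):
    """Node representation = concatenation of position features and sub-picture features."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # First feature information: extracted from the position (cx, cy, w, h) of the box.
        self.pos_encoder = nn.Linear(4, dim)
        # Second feature information: extracted from the sub-picture with a small CNN.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, box: torch.Tensor, crop: torch.Tensor) -> torch.Tensor:
        first = self.pos_encoder(box)    # (batch, dim)
        second = self.img_encoder(crop)  # (batch, dim)
        # Splice (concatenate) the two features to obtain the node representation vector.
        return torch.cat([first, second], dim=-1)
```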
Optionally, the representation vector of an undirected edge is determined according to the representation vectors of the two nodes at its two ends; or, the method further comprises: assigning different representation vectors to different undirected edges.
For example, in the undirected graph 63, an undirected edge exists between the node 60 and the node 62, and the undirected edge representation vector can be derived from the representation vector of the node 60 and the representation vector of the node 62. Alternatively, different undirected edges in the undirected graph 63 may be assigned different representation vectors.
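The two options in the preceding paragraphs could be realized, for example, as follows; both functions are illustrative sketches under assumed vector shapes rather than the embodiment's implementation.

```python
import torch
import torch.nn as nn


def edge_vector_from_nodes(node_u: torch.Tensor, node_v: torch.Tensor) -> torch.Tensor:
    # Option 1: derive the undirected edge vector from its two endpoint vectors,
    # using an order-independent combination such as the element-wise sum.
    return node_u + node_v


def assign_edge_vectors(num_edges: int, dim: int) -> torch.Tensor:
    # Option 2: give each undirected edge its own (learnable) representation vector.
    return nn.Embedding(num_edges, dim).weight
```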
In this embodiment, feature extraction is performed on the position information of a text line to obtain first feature information, and feature extraction is performed on the sub-picture containing the text line to obtain second feature information. The representation vector of the node corresponding to the text line is obtained from the first feature information and the second feature information, so that the task of combining text lines into a paragraph is converted into a directed-graph prediction process. In this way, the merging of text lines can be realized by utilizing both the picture information of the text lines and the position information of the text lines.
Fig. 7 is a flowchart of a text merging method according to another embodiment of the disclosure. The method comprises the following specific steps:
and S701, acquiring a target picture.
Specifically, the implementation manner and specific principle of S701 are consistent with those of S201, and are not described herein again.
S702, acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
Specifically, the implementation manner and specific principle of S702 and S202 are consistent, and are not described herein again.
S703, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one.
Specifically, the implementation manner and specific principle of S703 are consistent with those of S203, and are not described herein again.
S704, inputting the undirected graph into a prediction model, and predicting whether directional edges exist among nodes and the direction of the directional edges through the prediction model, wherein the directed graph comprises a representation vector of each node in the at least one node and a representation vector of the directional edges among the nodes.
As shown in fig. 6, the undirected graph 63 is input into a prediction model, which may be a GNN model or a Transformer model; that is, the representation vector of each node and the representation vector of each undirected edge in the undirected graph 63 may be input into the prediction model. The prediction model may predict, according to the representation vectors of the nodes and of the undirected edges, whether a directed edge exists between any two nodes, and may also predict the direction of the directed edge when one exists, thereby obtaining a directed graph 64 as shown in fig. 6. For example, there is a directed edge between node 65 and node 66, which points from node 65 to node 66.
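A sketch of this prediction step as a pairwise three-way classifier over node representation vectors (no directed edge, u to v, or v to u for each undirected edge). A real implementation would first contextualize the node vectors with a GNN or Transformer and train the classifier on labeled data; the untrained stand-in below only illustrates the interface.

```python
import torch
import torch.nn as nn


class EdgeDirectionPredictor(nn.Module):
    def __init__(self, node_dim: int):
        super().__init__()
        # 3 classes: no directed edge, u -> v, v -> u
        self.classifier = nn.Sequential(
            nn.Linear(2 * node_dim, node_dim), nn.ReLU(), nn.Linear(node_dim, 3),
        )

    def forward(self, node_vecs, undirected_edges):
        directed_edges = []
        for u, v in undirected_edges:
            pair = torch.cat([node_vecs[u], node_vecs[v]], dim=-1)
            label = self.classifier(pair).argmax(-1).item()
            if label == 1:
                directed_edges.append((u, v))   # edge points from u to v
            elif label == 2:
                directed_edges.append((v, u))   # edge points from v to u
        return directed_edges
```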
S705, merging the sub-pictures of the text line respectively corresponding to the two nodes at the two ends of the directed edge into a target sub-picture according to the direction of the directed edge in the directed graph.
For example, the text line corresponding to node 65 is "in hot weather it can be used as the baby's", and the text line corresponding to node 66 is "summer quilt". The sub-picture containing "in hot weather it can be used as the baby's" and the sub-picture containing "summer quilt" may be merged into the target sub-picture 67 according to the directed edge pointing from node 65 to node 66.
And S706, acquiring the combined text from the target sub-picture.
For example, the merged text "in hot weather it can be used as the baby's summer quilt" can be obtained from the target sub-picture 67. Therefore, in subsequent machine translation the complete sentence can be translated correctly, rather than translating "in hot weather it can be used as the baby's" and "summer quilt" separately.
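A sketch of S705 and S706: stack the two sub-pictures into one target sub-picture in the direction of the directed edge, then recognize the merged picture as a whole. Zero-padding to a common width, the three-channel crop assumption and the `ocr_model.recognize` call are illustrative assumptions rather than the embodiment's exact procedure.

```python
import numpy as np


def merge_sub_pictures(crop_src: np.ndarray, crop_dst: np.ndarray) -> np.ndarray:
    """Stack the source node's sub-picture above the destination node's sub-picture."""
    width = max(crop_src.shape[1], crop_dst.shape[1])

    def pad(crop: np.ndarray) -> np.ndarray:
        padded = np.zeros((crop.shape[0], width, crop.shape[2]), dtype=crop.dtype)
        padded[:, :crop.shape[1]] = crop
        return padded

    return np.vstack([pad(crop_src), pad(crop_dst)])


# Usage with a hypothetical recognition interface:
# target_sub_picture = merge_sub_pictures(crop_of_node_65, crop_of_node_66)
# merged_text = ocr_model.recognize(target_sub_picture)
```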
In this embodiment, an undirected graph is constructed from the position information of the text lines and the sub-pictures containing the text lines, and a directed graph is predicted by using a deep network model such as a GNN or a Transformer, so that the operation of merging text lines is completed and the position information of the text lines and the sub-pictures containing the text lines are fully and effectively utilized. The merged text can better suit downstream text-processing modules.
It will be appreciated that, after some text lines are merged, the merged text is not only applicable to the machine translation scenario described above. It can also be applied to an OCR-based Visual Question Answering (VQA) scenario. For example, if a user asks "what is this summer quilt used for?", the OCR-based visual question answering system may analyze or understand the merged text and reply to the user "in hot weather it can be used as the baby's summer quilt". In addition, the downstream task is not limited to machine translation or VQA, and may be any other task that requires merging single lines of text, which is not described in detail here.
When the text merging method provided by the embodiment of the present disclosure is executed by the terminal 31, the method specifically includes the following steps:
and S11, receiving the target picture from the server or acquiring the target picture through the shooting device.
For example, the terminal 31 may be provided with a shooting device, which may be a camera. Specifically, the terminal 31 may capture the target picture through the shooting device, or the terminal 31 may receive the target picture from the server 32. It is understood that the terminal 31 may also receive the target picture from other servers.
S12, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
S13, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein the undirected graph comprises at least one node which is in one-to-one correspondence with the at least one text line.
And S14, sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining the directed graph through the prediction model.
S15, receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph.
Specifically, the implementation principles of S12 to S15 can be understood with reference to the description of the above embodiments, and are not repeated here. In this embodiment, in order to reduce the computing load on the terminal 31, the prediction model may be provided on the server 32 side. After determining the undirected graph, the terminal 31 can send it to the server 32, so that the server 32 can input the undirected graph into the prediction model and obtain the directed graph through the prediction model. Further, the terminal 31 may receive the directed graph from the server and continue the subsequent processing.
When the text merging method provided by the embodiment of the present disclosure is executed by the server 32, the method specifically includes the following steps:
and S21, receiving the target picture from the terminal.
For example, the server 32 may receive the target picture from the terminal 31. Or the server 32 may obtain the target picture from other network devices or other terminals.
S22, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
S23, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein the undirected graph comprises at least one node which is in one-to-one correspondence with the at least one text line.
And S24, inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model.
And S25, merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph to obtain a merged text.
And S26, sending the combined text to the terminal.
Specifically, the implementation principles of S21 to S26 can be understood with reference to the description of the above embodiments, and are not repeated here. In this embodiment, after the server 32 merges the text lines respectively corresponding to the two nodes at the two ends of a directed edge in the directed graph to obtain a merged text, it may also send the merged text to the terminal 31.
Fig. 8 is a schematic structural diagram of a text merging device according to an embodiment of the present disclosure. The text combining apparatus provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the text combining method, as shown in fig. 8, the text combining apparatus 80 includes:
a first obtaining module 81, configured to obtain a target picture;
a second obtaining module 82, configured to obtain, from the target picture, position information of each text line in at least one text line and a sub-picture including the text line;
a determining module 83, configured to determine an undirected graph according to the position information of each text line and the sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module 84 is configured to input the undirected graph into a prediction model, and obtain a directed graph through the prediction model;
and a merging module 85, configured to merge text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
Optionally, when the determining module 83 determines an undirected graph according to the position information of each text line and the sub-picture containing the text line, the determining module is specifically configured to:
performing feature extraction on the position information of the text line to obtain first feature information;
performing feature extraction on the sub-picture containing the text line to obtain second feature information;
obtaining a representation vector of a node corresponding to the text line according to the first characteristic information and the second characteristic information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for undirected edges between nodes.
Optionally, the representation vector of an undirected edge is determined according to the representation vectors of the two nodes at its two ends; or
the apparatus further comprises: an assigning module 86, configured to assign different representation vectors to different undirected edges.
Optionally, the prediction module 84 inputs the undirected graph into a prediction model, and when a directed graph is obtained through the prediction model, the prediction module is specifically configured to:
inputting the undirected graph into a prediction model, and predicting whether directional edges exist among nodes and the direction of the directional edges through the prediction model, wherein the directional graph comprises a representation vector of each node in the at least one node and a representation vector of the directional edges among the nodes.
Optionally, when the merging module 85 merges text lines corresponding to two nodes at two ends of a directed edge in the directed graph, the merging module is specifically configured to:
merging sub-pictures of text lines corresponding to two nodes at two ends of the directed edge into a target sub-picture according to the direction of the directed edge in the directed graph;
and acquiring the combined text from the target sub-picture.
The text merging apparatus in the embodiment shown in fig. 8 can be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
The internal functions and structure of the text merging apparatus are described above; the apparatus can be implemented as an electronic device. The electronic device may be a terminal or a server. Fig. 9 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present disclosure. As shown in fig. 9, the electronic device includes a memory 91 and a processor 92.
The memory 91 is used to store programs. In addition to the above-described programs, the memory 91 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 91 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 92 is coupled to the memory 91 and executes the programs stored in the memory 91 for:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
Further, as shown in fig. 9, the electronic device may further include: communication components 93, power components 94, audio components 95, a display 96, and other components. Only some of the components are schematically shown in fig. 9, and the electronic device is not meant to include only the components shown in fig. 9.
The communication component 93 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 93 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 93 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply assembly 94 provides power to the various components of the electronic device. The power components 94 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 95 is configured to output and/or input audio signals. For example, the audio assembly 95 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 91 or transmitted via the communication component 93. In some embodiments, audio assembly 95 also includes a speaker for outputting audio signals.
The display 96 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
In addition, an embodiment of the present disclosure further provides a terminal, where the terminal may include:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text merging method of the above embodiment.
In addition, an embodiment of the present disclosure further provides a server, where the server may include:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text merging method of the above embodiment.
In addition, the embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the text merging method described in the above embodiment.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A text merging method, wherein the method comprises:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
2. The method of claim 1, wherein determining an undirected graph according to the position information of each text line and the sub-picture containing the text line comprises:
performing feature extraction on the position information of the text line to obtain first feature information;
performing feature extraction on the sub-picture containing the text line to obtain second feature information;
obtaining a representation vector of a node corresponding to the text line according to the first characteristic information and the second characteristic information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for undirected edges between nodes.
3. The method of claim 2, wherein the representation vector of an undirected edge is determined from the representation vectors of the two nodes at the two ends of the undirected edge; or
the method further comprises:
assigning different representation vectors to different undirected edges.
4. The method according to claim 1 or 2, wherein inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model comprises:
inputting the undirected graph into a prediction model, and predicting whether directional edges exist among nodes and the direction of the directional edges through the prediction model, wherein the directional graph comprises a representation vector of each node in the at least one node and a representation vector of the directional edges among the nodes.
5. The method of claim 1, wherein merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph comprises:
merging sub-pictures of text lines corresponding to two nodes at two ends of the directed edge into a target sub-picture according to the direction of the directed edge in the directed graph;
and acquiring the combined text from the target sub-picture.
6. A text merging method, wherein the method is applied to a terminal, and the method comprises:
receiving a target picture from a server or acquiring the target picture through a shooting device;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining a directed graph through the prediction model;
and receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
7. A text merging method, wherein the method is applied to a server, and the method comprises the following steps:
receiving a target picture from a terminal;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one to one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
merging text lines respectively corresponding to two nodes at two ends of a directed edge in the directed graph to obtain a merged text;
and sending the combined text to the terminal.
8. A text merging apparatus, comprising:
the first acquisition module is used for acquiring a target picture;
the second acquisition module is used for acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture;
a determining module, configured to determine an undirected graph according to the position information of each text line and a sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module is used for inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model;
and the merging module is used for merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
9. A terminal, comprising:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of claim 6.
10. A server, comprising:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of claim 7.
11. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any of claims 1-5.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210149404.1A (CN114219876B) | 2022-02-18 | 2022-02-18 | Text merging method, device, equipment and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210149404.1A (CN114219876B) | 2022-02-18 | 2022-02-18 | Text merging method, device, equipment and storage medium |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114219876A | 2022-03-22 |
| CN114219876B | 2022-06-24 |
Family

ID=80708916

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210149404.1A (CN114219876B, active) | Text merging method, device, equipment and storage medium | 2022-02-18 | 2022-02-18 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114219876B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110302168A1 (en) * | 2010-06-08 | 2011-12-08 | International Business Machines Corporation | Graphical models for representing text documents for computer analysis |
WO2014173882A1 (en) * | 2013-04-23 | 2014-10-30 | Thales | Method and device for automatically extracting themes from at least one document containing text |
CN106156110A (en) * | 2015-04-03 | 2016-11-23 | 科大讯飞股份有限公司 | text semantic understanding method and system |
CN107977592A (en) * | 2016-10-21 | 2018-05-01 | 中兴通讯股份有限公司 | A kind of image text detection method and system, user terminal and server |
CN107526718A (en) * | 2017-09-19 | 2017-12-29 | 北京百度网讯科技有限公司 | Method and apparatus for generating text |
CN111680168A (en) * | 2020-05-29 | 2020-09-18 | 平安银行股份有限公司 | Text feature semantic extraction method and device, electronic equipment and storage medium |
CN112926564A (en) * | 2021-02-25 | 2021-06-08 | 中国平安人寿保险股份有限公司 | Picture analysis method, system, computer device and computer-readable storage medium |
CN112949476A (en) * | 2021-03-01 | 2021-06-11 | 苏州美能华智能科技有限公司 | Text relation detection method and device based on graph convolution neural network and storage medium |
CN113886568A (en) * | 2021-09-30 | 2022-01-04 | 宿迁硅基智能科技有限公司 | Text abstract generation method and device |
CN113780254A (en) * | 2021-11-12 | 2021-12-10 | 阿里巴巴达摩院(杭州)科技有限公司 | Picture processing method and device, electronic equipment and computer storage medium |
CN114036298A (en) * | 2021-11-17 | 2022-02-11 | 西安理工大学 | Node classification method based on graph convolution neural network and word vector |
Non-Patent Citations (3)

HANZHOU WU et al.: "Linguistic Steganalysis With Graph Neural Networks", IEEE, 26 February 2021 (2021-02-26), pages 558-562, XP011845859, DOI: 10.1109/LSP.2021.3062233 *
翟文洁 (ZHAI Wenjie) et al.: "Multi-class text representation and classification method based on hybrid deep belief networks" (基于混合深度信念网络的多类文本表示与分类方法), 《情报工程》 (Technology Intelligence Engineering), vol. 02, no. 05, 31 October 2016 (2016-10-31), pages 30-40 *
许晶航 (XU Jinghang) et al.: "Causal relation extraction based on graph attention networks" (基于图注意力网络的因果关系抽取), 《计算机研究与发展》 (Journal of Computer Research and Development), no. 01, 31 January 2020 (2020-01-31), pages 161-176 *
Also Published As
Publication number | Publication date |
---|---|
CN114219876B (en) | 2022-06-24 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |