CN114219876A - Text merging method, device, equipment and storage medium

Text merging method, device, equipment and storage medium

Info

Publication number
CN114219876A
Authority
CN
China
Prior art keywords
text
text line
picture
graph
sub
Prior art date
Legal status
Granted
Application number
CN202210149404.1A
Other languages
Chinese (zh)
Other versions
CN114219876B (en)
Inventor
廖敏鹏
樊楷
Current Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202210149404.1A
Publication of CN114219876A
Application granted
Publication of CN114219876B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a text merging method, apparatus, device, and storage medium. The method acquires a target picture and obtains, from the target picture, the position information of each of at least one text line together with a sub-picture containing that text line. An undirected graph is then determined from the position information of each text line and the sub-picture containing it, where the nodes of the undirected graph correspond one-to-one to the text lines. The undirected graph serves as the input of a prediction model, which predicts a directed graph. Because the two nodes at the ends of a directed edge in the directed graph are strongly associated and have a precedence order given by the direction of the edge, the text lines corresponding to those two nodes are spliced in that order to form a sentence with complete semantics, so that subsequent text understanding or processing errors can be avoided.

Description

Text merging method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a text merging method, apparatus, device, and storage medium.
Background
Current Optical Character Recognition (OCR) technology can recognize text or words in a picture.
However, the inventors of the present application have found that when a piece of text is presented in multiple lines in a picture, the text obtained using OCR technology may consist of fragmented single sentences with incomplete semantics, resulting in subsequent text understanding or processing errors.
Disclosure of Invention
In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a text merging method, apparatus, device, and storage medium that merge multiple fragmented single sentences into one semantically complete sentence, so that subsequent text understanding or processing errors can be avoided.
In a first aspect, an embodiment of the present disclosure provides a text merging method, including:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
In a second aspect, an embodiment of the present disclosure provides a text merging method, where the method is applied to a terminal, and the method includes:
receiving a target picture from a server or acquiring the target picture through a shooting device;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining a directed graph through the prediction model;
and receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
In a third aspect, an embodiment of the present disclosure provides a text merging method, where the method is applied to a server, and the method includes:
receiving a target picture from a terminal;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
merging text lines respectively corresponding to two nodes at two ends of a directed edge in the directed graph to obtain a merged text;
and sending the combined text to the terminal.
In a fourth aspect, an embodiment of the present disclosure provides a text merging apparatus, including:
the first acquisition module is used for acquiring a target picture;
the second acquisition module is used for acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture;
a determining module, configured to determine an undirected graph according to the position information of each text line and a sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module is used for inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model;
and the merging module is used for merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
In a fifth aspect, an embodiment of the present disclosure provides a terminal, including:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a server, including:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In an eighth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
According to the text merging method, apparatus, device, and storage medium provided above, a target picture is acquired, and the position information of each of at least one text line and a sub-picture containing that text line are obtained from the target picture. An undirected graph is then determined from the position information of each text line and the sub-picture containing it, where the nodes of the undirected graph correspond one-to-one to the text lines. The undirected graph serves as the input of a prediction model, which predicts a directed graph. Because the two nodes at the ends of a directed edge in the directed graph are strongly associated and have a precedence order given by the direction of the edge, the text lines corresponding to those two nodes are spliced in that order to form a sentence with complete semantics, so that subsequent text understanding or processing errors can be avoided.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that a person skilled in the art can derive other drawings from these drawings without inventive effort.
Fig. 1 is a schematic view of a picture according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a text merging method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of a text merging method according to another embodiment of the present disclosure;
Fig. 5 is a flowchart of a text merging method according to another embodiment of the present disclosure;
Fig. 6 is a flowchart of a text merging method according to another embodiment of the present disclosure;
Fig. 7 is a flowchart of a text merging method according to another embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a text merging device according to another embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure; however, the present disclosure may be practiced in ways other than those described herein. It is to be understood that the embodiments described in this specification are only some, and not all, of the embodiments of the present disclosure.
Current Optical Character Recognition (OCR) technology can recognize text or words in a picture and output the recognized text or words. However, when a piece of text is presented in multiple lines in a picture, the text obtained using OCR technology may consist of fragmented single sentences with incomplete semantics, resulting in subsequent text understanding or processing errors. For example, fig. 1 is a schematic diagram of such a picture; this embodiment does not limit the content presented by the picture or the source of the picture. Specifically, the picture may be an advertisement picture of a commodity, a picture taken by a terminal, or a picture obtained from a network. Since OCR technology primarily locates text positions and extracts text line by line, when a piece of text such as "in hot weather it can be used as the baby's summer quilt" is presented in two lines in the picture, OCR recognizes two fragmented single sentences, namely "in hot weather it can be used as the baby's" and "summer quilt", thereby causing subsequent text understanding or text processing to introduce systematic errors. For example, in subsequent machine translation, the two fragments are translated separately into results such as "Hot weather can be a baby" and "Summer cool is used", and the meanings expressed by such translation results may be wrong. Translating the complete sentence instead yields "In hot weather, it can be used as the baby's summer quilt", which better conforms to the meaning expressed in the original text.
To solve this problem, embodiments of the present disclosure provide a text merging method, which is described below with reference to specific embodiments. Fig. 2 is a flowchart of a text merging method provided by an embodiment of the present disclosure. The method may be performed by a terminal or a server. As shown in fig. 3, when the method is executed by the terminal 31, the terminal 31 may merge the fragmented clauses in pictures it takes or in locally stored pictures; alternatively, the terminal 31 may obtain a picture from the server 32 and merge the fragmented clauses in that picture. When the method is executed by the server 32, the server 32 may receive a picture sent by the terminal 31, or obtain a picture from other network devices or terminals, and merge the fragmented clauses in the picture. The text merging method is described below taking the server 32 as the executing party. As shown in fig. 2, the method includes the following specific steps:
s201, obtaining a target picture.
For example, the server 32 obtains a target picture, which may be the picture shown in fig. 1.
S202, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
For example, the target picture may be recorded as an original picture, such as the original picture 41 shown in fig. 4. The original picture 41 includes a plurality of text lines, such as "multiple uses of bath towel", "in hot weather it can be used as the baby's", "summer quilt", and so on. The server 32 includes an OCR model; the server 32 can input the original picture 41 into the OCR model, and the OCR model obtains the position information of each text line in the original picture 41 and a sub-picture containing the text line. The position information of a text line may be the position that the text line occupies in the original picture 41. Taking the text line "multiple uses of bath towel" as an example, its position information can be the position of the center point of box 42 of fig. 4 in the original picture 41, together with the length and width of box 42. Alternatively, the position information can be the coordinates, in the original picture 41, of the 4 points at the 4 corners of box 42. The sub-picture containing the text line "multiple uses of bath towel" may be the sub-picture inside box 42. That is, for each text line, the OCR model may output the position information of the text line and a sub-picture containing the text line.
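A minimal sketch of this per-text-line output is shown below in Python. The TextLineRegion name, the (x, y, w, h) box format, and the idea that the boxes come from an off-the-shelf OCR detection model are illustrative assumptions; the embodiment only requires position information plus a cropped sub-picture for each line.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TextLineRegion:
    # Position information: center point plus box size, one of the two
    # encodings described for box 42 above.
    cx: float
    cy: float
    width: float
    height: float
    crop: np.ndarray  # the sub-picture containing the text line (H x W x 3)


def extract_text_lines(original: np.ndarray, boxes: list) -> list:
    """Crop one sub-picture per detected text-line box.

    `boxes` is assumed to be a list of (x, y, w, h) tuples in pixel
    coordinates, as produced by an OCR detection model.
    """
    regions = []
    for (x, y, w, h) in boxes:
        crop = original[y:y + h, x:x + w]
        regions.append(TextLineRegion(x + w / 2, y + h / 2, w, h, crop))
    return regions
```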
S203, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one.
As shown in fig. 4, the position information of each text line and the sub-picture containing the text line are input into a text merging model (the Textline2paragraph model), whose name can be read as a model that merges the text lines in a picture into paragraphs. Specifically, the Textline2paragraph model determines an undirected graph from the position information of each text line and the sub-picture containing the text line. The undirected graph includes nodes and undirected edges: the number of nodes is the same as the number of text lines in the original picture 41, each text line corresponds to one node, and there is one undirected edge between every two nodes, where the two nodes are any two nodes in the undirected graph.
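The construction just described amounts to building a complete graph over the text lines. A minimal sketch in Python follows; the use of networkx is an assumption made for illustration, as any graph representation would serve.

```python
import networkx as nx


def build_undirected_graph(regions):
    """One node per text line, one undirected edge between every two nodes."""
    g = nx.Graph()
    for i, region in enumerate(regions):
        # Nodes correspond one-to-one to text lines.
        g.add_node(i, region=region)
    # An undirected edge between every pair of nodes (a complete graph).
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            g.add_edge(i, j)
    return g
```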
And S204, inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model.
For example, the Textline2paragraph model includes a prediction model, and the prediction model may specifically be a deep learning model such as a Graph Neural Network (GNN) or a Transformer model. The undirected graph described above is used as the input of the GNN or Transformer model, which predicts directed edges between nodes, resulting in a directed graph.
S205, merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
It can be understood that, in the directed graph, if a directed edge exists between two nodes, the association between the two nodes is strong, and there is also a precedence order between them that is given by the direction of the directed edge. For example, if a directed edge exists between node A and node B and points from node A to node B, the text line corresponding to node A is read first and the text line corresponding to node B is read next, and splicing the two text lines in that order forms a sentence with complete semantics. If there is no directed edge between node A and node B, the text lines corresponding to node A and node B are each semantically complete sentences on their own. Therefore, the text lines corresponding to the two nodes at the two ends of each directed edge in the directed graph can be merged, so as to obtain a merged text. As shown in fig. 4, "in hot weather it can be used as the baby's" and "summer quilt" can be combined into one paragraph, i.e., one piece of text; "wrap the baby after bathing" and "to avoid catching a cold" can be combined into one paragraph; and "can be padded in a stroller" and "or used as a padding cloth when playing" can be combined into one paragraph. Further, the server 32 may also transmit the combined text to the terminal 31.
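A minimal sketch of this merging step (S205) is given below, under the assumption, not spelled out in the embodiment, that each node has at most one incoming and one outgoing directed edge, so the directed edges form simple chains; nodes touched by no directed edge pass through as stand-alone sentences.

```python
def merge_text_lines(texts, directed_edges):
    """texts[i] is the recognized text of node i; edges are (src, dst) pairs."""
    succ = {a: b for a, b in directed_edges}  # follow each edge's direction
    has_pred = {b for _, b in directed_edges}
    merged = []
    for start in range(len(texts)):
        if start in has_pred:
            continue  # not the head of a chain
        node, parts = start, []
        while node is not None:
            parts.append(texts[node])
            node = succ.get(node)
        # Splice in precedence order; no separator is needed for Chinese text.
        merged.append("".join(parts))
    return merged
```

For the example of fig. 4, merge_text_lines(["in hot weather it can be used as the baby's ", "summer quilt"], [(0, 1)]) would return the single semantically complete sentence.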
In this way, the embodiment of the disclosure acquires a target picture and obtains, from the target picture, the position information of each of at least one text line and a sub-picture containing that text line. An undirected graph is then determined from this information, with its nodes corresponding one-to-one to the text lines, and the undirected graph serves as the input of a prediction model that predicts a directed graph. Because the two nodes at the ends of a directed edge are strongly associated and have a precedence order given by the direction of the edge, the text lines corresponding to those two nodes are spliced in that order to form a sentence with complete semantics, so that subsequent text understanding or processing errors can be avoided.
Fig. 5 is a flowchart of a text merging method according to another embodiment of the present disclosure. On the basis of the above embodiment, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line includes the following steps:
s501, extracting the characteristics of the position information of the text line to obtain first characteristic information.
The processing procedure inside the Textline2paragraph model described above is shown in fig. 6. Taking the text line "multiple uses of bath towel" as an example, the position information of the text line may be the position information of box 42 in the original picture 41; specifically, it may be the position of the center point of box 42 of fig. 4 in the original picture 41, together with the length and width of box 42, or alternatively the coordinates, in the original picture 41, of the 4 points at the 4 corners of box 42. Feature extraction is then performed on the position information of the text line using a Convolutional Neural Network (CNN) or patch embedding, so as to obtain first feature information, which may be a representation vector.
And S502, performing feature extraction on the sub-picture containing the text line to obtain second feature information.
For example, feature extraction is performed on the sub-picture containing the text line, i.e., the sub-picture inside box 42, using a CNN or patch embedding, so as to obtain second feature information, which may likewise be a representation vector.
S503, obtaining a representation vector of a node corresponding to the text line according to the first feature information and the second feature information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for the undirected edges between nodes.
As shown in fig. 6, the node 60 corresponds to the text line in box 42, and a representation vector of node 60 can be obtained from the first feature information corresponding to the position information of box 42 and the second feature information of the sub-picture in box 42. Specifically, the first feature information and the second feature information may be concatenated to obtain the representation vector of node 60, or they may be weighted and summed. Likewise, the node 62 shown in fig. 6 corresponds to the text line in box 61, and its representation vector is obtained in the same way. The undirected graph 63 shown in fig. 6 can then be constructed from the node corresponding to each text line and the undirected edges between the nodes.
Optionally, the representation vector of an undirected edge is determined from the representation vectors of the two nodes at the two ends of that edge; or the method further comprises: assigning different representation vectors to different undirected edges.
For example, in the undirected graph 63, an undirected edge exists between node 60 and node 62, and the representation vector of this undirected edge can be derived from the representation vectors of node 60 and node 62. Alternatively, different undirected edges in the undirected graph 63 may simply be assigned different representation vectors.
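A minimal PyTorch sketch of S501-S503 follows. The network shapes, the choice of a small CNN (rather than patch embedding), and concatenation (rather than weighted summation) are all assumptions made for illustration; the embodiment allows either extractor and either way of combining the two pieces of feature information.

```python
import torch
import torch.nn as nn


class NodeEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # First feature information: embed the position (cx, cy, w, h).
        self.pos_mlp = nn.Sequential(
            nn.Linear(4, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Second feature information: a small CNN over the sub-picture.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, pos, crop):
        # pos: (N, 4); crop: (N, 3, H, W), sub-pictures resized to one size.
        first = self.pos_mlp(pos)
        second = self.cnn(crop)
        # Concatenate the two pieces of feature information into the node
        # representation vector (weighted summation is the stated alternative).
        return torch.cat([first, second], dim=-1)


def edge_vector(node_vecs, i, j):
    # One option from the text: derive the undirected edge's representation
    # vector from the vectors of its two endpoint nodes; the sum is
    # symmetric in i and j, matching the edge's undirectedness.
    return node_vecs[i] + node_vecs[j]
```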
In this embodiment, feature extraction is performed on the position information of a text line to obtain first feature information, and on the sub-picture containing the text line to obtain second feature information. A representation vector of the node corresponding to the text line is then obtained from the first feature information and the second feature information, so that the task of combining text lines into paragraphs is converted into the prediction of a directed graph. The merging of text lines thereby makes use of both the picture information of the text lines and their position information.
Fig. 7 is a flowchart of a text merging method according to another embodiment of the disclosure. The method comprises the following specific steps:
and S701, acquiring a target picture.
Specifically, the implementation manner and specific principle of S701 are consistent with those of S201, and are not described herein again.
S702, acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
Specifically, the implementation manner and specific principle of S702 are consistent with those of S202, and are not described herein again.
S703, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one.
Specifically, the implementation manner and specific principle of S703 are consistent with those of S203, and are not described herein again.
S704, inputting the undirected graph into a prediction model, and predicting, through the prediction model, whether directed edges exist among nodes and the directions of the directed edges, wherein the directed graph comprises a representation vector of each node in the at least one node and a representation vector of the directed edges among the nodes.
As shown in fig. 6, the undirected graph 63 is input into the prediction model, which may be a GNN model or a Transformer model; that is, the representation vector of each node and the representation vector of each undirected edge in the undirected graph 63 are input into the prediction model. From these representation vectors, the prediction model predicts whether a directed edge exists between any two nodes and, when one exists, the direction of that directed edge, thereby obtaining the directed graph 64 shown in fig. 6. For example, there is a directed edge between node 65 and node 66, directed from node 65 to node 66.
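As a minimal sketch of this prediction step, the snippet below scores every ordered node pair with a small MLP head, so that edge existence and edge direction are predicted together; substituting this pairwise head for a full GNN or Transformer is a simplification made purely for illustration.

```python
import torch
import torch.nn as nn


class EdgePredictor(nn.Module):
    def __init__(self, node_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * node_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, node_vecs):
        n = node_vecs.size(0)
        edges = []
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # The input is order-sensitive, so (i, j) and (j, i) score
                # differently; this is what lets the model predict direction.
                pair = torch.cat([node_vecs[i], node_vecs[j]], dim=-1)
                if torch.sigmoid(self.head(pair)) > 0.5:
                    edges.append((i, j))  # directed edge from node i to node j
        return edges
```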
S705, merging the sub-pictures of the text lines respectively corresponding to the two nodes at the two ends of a directed edge into a target sub-picture according to the direction of the directed edge in the directed graph.
For example, the text line corresponding to node 65 is "in hot weather it can be used as the baby's", and the text line corresponding to node 66 is "summer quilt". According to the directed edge pointing from node 65 to node 66, the sub-picture containing "in hot weather it can be used as the baby's" and the sub-picture containing "summer quilt" are merged into the target sub-picture 67.
S706, acquiring the combined text from the target sub-picture.
For example, the merged text "in hot weather it can be used as the baby's summer quilt" can be obtained from the target sub-picture 67. Subsequent machine translation can therefore translate the complete sentence correctly, rather than translating "in hot weather it can be used as the baby's" and "summer quilt" separately.
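A minimal sketch of S705 is shown below: the target sub-picture is taken as the union of the two text-line boxes, in the order given by the directed edge. Treating the union of the bounding boxes as the merged region is an assumption for illustration; S706 may then obtain the merged text from this target sub-picture, for example by re-running recognition on it or by splicing the already-recognized line texts.

```python
def merge_sub_pictures(original, box_a, box_b):
    """Merge the sub-pictures of two text lines into one target sub-picture.

    Boxes are (x, y, w, h) in pixel coordinates; the directed edge points
    from the line in box_a to the line in box_b.
    """
    x0 = min(box_a[0], box_b[0])
    y0 = min(box_a[1], box_b[1])
    x1 = max(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y1 = max(box_a[1] + box_a[3], box_b[1] + box_b[3])
    # The crop covers both text lines, e.g. target sub-picture 67 in fig. 6.
    return original[y0:y1, x0:x1]
```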
In this embodiment, an undirected graph is constructed from the position information of the text lines and the sub-pictures containing them, and a directed graph is predicted using a deep network model such as a GNN or Transformer, completing the merging of the text lines while making full and effective use of the position information of each text line and the sub-picture containing it. The combined text is thus better suited to downstream text-processing modules.
It will be appreciated that after some text lines are merged, the merged text is not only applicable in the machine translation scenario described above; it can also be applied in an OCR-based Visual Question Answering (VQA) scenario. For example, when a user asks "what is this summer quilt for?", an OCR-based visual question answering system can analyze or understand the merged text and reply "in hot weather it can be used as the baby's summer quilt". Moreover, the downstream task is not limited to machine translation or VQA; it may be any other task that requires merging single lines of text, which is not described in detail here.
When the text merging method provided by the embodiment of the present disclosure is executed by the terminal 31, the method specifically includes the following steps:
and S11, receiving the target picture from the server or acquiring the target picture through the shooting device.
For example, the terminal 31 may be provided with a shooting device, which may be a camera. Specifically, the terminal 31 may capture the target picture through the shooting device, or the terminal 31 may receive the target picture from the server 32. It is understood that the terminal 31 may also receive the target picture from other servers.
S12, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
S13, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein the undirected graph comprises at least one node which is in one-to-one correspondence with the at least one text line.
S14, sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining the directed graph through the prediction model.
S15, receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph.
Specifically, the implementation principles of S12-S15 can be understood with reference to the description of the above embodiments and are not repeated here. In this embodiment, in order to reduce the computational load on the terminal 31, the prediction model may be deployed on the server 32 side. After the terminal 31 determines the undirected graph, it sends the undirected graph to the server 32, so that the server 32 can input the undirected graph into the prediction model and obtain the directed graph through the prediction model. Further, the terminal 31 receives the directed graph from the server and continues the subsequent processing.
When the text merging method provided by the embodiment of the present disclosure is executed by the server 32, the method specifically includes the following steps:
and S21, receiving the target picture from the terminal.
For example, the server 32 may receive the target picture from the terminal 31, or the server 32 may obtain the target picture from other network devices or other terminals.
S22, obtaining the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture.
S23, determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein the undirected graph comprises at least one node which is in one-to-one correspondence with the at least one text line.
S24, inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model.
And S25, merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph to obtain a merged text.
S26, sending the combined text to the terminal.
Specifically, the implementation principles of S21-S26 can be understood with reference to the description of the above embodiments and are not repeated here. In this embodiment, after the server 32 merges the text lines corresponding to the two nodes at the two ends of each directed edge in the directed graph to obtain a merged text, it may also send the merged text to the terminal 31.
Fig. 8 is a schematic structural diagram of a text merging device according to an embodiment of the present disclosure. The text merging apparatus provided in this embodiment of the present disclosure can execute the processing procedures provided in the embodiments of the text merging method. As shown in fig. 8, the text merging apparatus 80 includes:
a first obtaining module 81, configured to obtain a target picture;
a second obtaining module 82, configured to obtain, from the target picture, position information of each text line in at least one text line and a sub-picture including the text line;
a determining module 83, configured to determine an undirected graph according to the position information of each text line and the sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module 84 is configured to input the undirected graph into a prediction model, and obtain a directed graph through the prediction model;
and a merging module 85, configured to merge text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
Optionally, when the determining module 83 determines an undirected graph according to the position information of each text line and the sub-picture containing the text line, the determining module is specifically configured to:
performing feature extraction on the position information of the text line to obtain first feature information;
performing feature extraction on the sub-picture containing the text line to obtain second feature information;
obtaining a representation vector of a node corresponding to the text line according to the first feature information and the second feature information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for the undirected edges between nodes.
Optionally, the representation vector of an undirected edge is determined from the representation vectors of the two nodes at the two ends of that edge; or
the apparatus further comprises: an assigning module 86, configured to assign different representation vectors to different undirected edges.
Optionally, when inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model, the prediction module 84 is specifically configured to:
inputting the undirected graph into a prediction model, and predicting, through the prediction model, whether directed edges exist among nodes and the directions of the directed edges, wherein the directed graph comprises a representation vector of each node in the at least one node and a representation vector of the directed edges among the nodes.
Optionally, when the merging module 85 merges text lines corresponding to two nodes at two ends of a directed edge in the directed graph, the merging module is specifically configured to:
merging sub-pictures of text lines corresponding to two nodes at two ends of the directed edge into a target sub-picture according to the direction of the directed edge in the directed graph;
and acquiring the combined text from the target sub-picture.
The text merging apparatus in the embodiment shown in fig. 8 can be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
The internal functions and structure of the text merging apparatus are described above, and the apparatus can be implemented as an electronic device. The electronic device may be a terminal or a server. Fig. 9 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of the present disclosure. As shown in fig. 9, the electronic device includes a memory 91 and a processor 92.
The memory 91 is used to store programs. In addition to the above-described programs, the memory 91 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 91 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 92 is coupled to the memory 91 and executes the programs stored in the memory 91 for:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
Further, as shown in fig. 9, the electronic device may also include: a communication component 93, a power component 94, an audio component 95, a display 96, and other components. Only some of the components are schematically shown in fig. 9, which does not mean that the electronic device includes only the components shown in fig. 9.
The communication component 93 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 93 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 93 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply assembly 94 provides power to the various components of the electronic device. The power components 94 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 95 is configured to output and/or input audio signals. For example, the audio assembly 95 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 91 or transmitted via the communication component 93. In some embodiments, audio assembly 95 also includes a speaker for outputting audio signals.
The display 96 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
In addition, an embodiment of the present disclosure further provides a terminal, where the terminal may include:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text merging method of the above embodiment.
In addition, an embodiment of the present disclosure further provides a server, where the server may include:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text merging method of the above embodiment.
In addition, the embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the text merging method described in the above embodiment.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A text merging method, wherein the method comprises:
acquiring a target picture;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
and merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
2. The method of claim 1, wherein determining an undirected graph according to the position information of each text line and the sub-picture containing the text line comprises:
performing feature extraction on the position information of the text line to obtain first feature information;
performing feature extraction on the sub-picture containing the text line to obtain second feature information;
obtaining a representation vector of a node corresponding to the text line according to the first feature information and the second feature information; the undirected graph includes a representation vector for each of the at least one node and a representation vector for the undirected edges between nodes.
3. The method of claim 2, wherein the representation vector of an undirected edge is determined from the representation vectors of the two nodes at the two ends of the undirected edge; or
The method further comprises the following steps:
different undirected edges are assigned different representation vectors.
4. The method according to claim 1 or 2, wherein inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model, comprises:
inputting the undirected graph into a prediction model, and predicting, through the prediction model, whether directed edges exist among nodes and the directions of the directed edges, wherein the directed graph comprises a representation vector of each node in the at least one node and a representation vector of the directed edges among the nodes.
5. The method of claim 1, wherein merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph comprises:
merging sub-pictures of text lines corresponding to two nodes at two ends of the directed edge into a target sub-picture according to the direction of the directed edge in the directed graph;
and acquiring the combined text from the target sub-picture.
6. A text merging method is applied to a terminal and comprises the following steps:
receiving a target picture from a server or acquiring the target picture through a shooting device;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
sending the undirected graph to a server, wherein the server comprises a prediction model, and the server is used for inputting the undirected graph into the prediction model and obtaining a directed graph through the prediction model;
and receiving the directed graph from the server, and merging text lines corresponding to two nodes at two ends of a directed edge in the directed graph respectively.
7. A text merging method, wherein the method is applied to a server, and the method comprises the following steps:
receiving a target picture from a terminal;
acquiring the position information of each text line in at least one text line and a sub-picture containing the text line from the target picture;
determining an undirected graph according to the position information of each text line and the sub-picture containing the text line, wherein at least one node included in the undirected graph corresponds to the at least one text line one by one;
inputting the undirected graph into a prediction model, and obtaining a directed graph through the prediction model;
merging text lines respectively corresponding to two nodes at two ends of a directed edge in the directed graph to obtain a merged text;
and sending the combined text to the terminal.
8. A text merging apparatus, comprising:
the first acquisition module is used for acquiring a target picture;
the second acquisition module is used for acquiring the position information of each text line in at least one text line and the sub-picture containing the text line from the target picture;
a determining module, configured to determine an undirected graph according to the position information of each text line and a sub-picture including the text line, where at least one node included in the undirected graph corresponds to the at least one text line one to one;
the prediction module is used for inputting the undirected graph into a prediction model and obtaining a directed graph through the prediction model;
and the merging module is used for merging the text lines respectively corresponding to the two nodes at the two ends of the directed edge in the directed graph.
9. A terminal, comprising:
a communication component for communicating with a server;
the shooting device is used for acquiring a target picture;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of claim 6.
10. A server, comprising:
a communication component for communicating with a terminal;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of claim 7.
11. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any of claims 1-5.
CN202210149404.1A 2022-02-18 2022-02-18 Text merging method, device, equipment and storage medium Active CN114219876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210149404.1A CN114219876B (en) 2022-02-18 2022-02-18 Text merging method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210149404.1A CN114219876B (en) 2022-02-18 2022-02-18 Text merging method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114219876A true CN114219876A (en) 2022-03-22
CN114219876B CN114219876B (en) 2022-06-24

Family

ID=80708916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210149404.1A Active CN114219876B (en) 2022-02-18 2022-02-18 Text merging method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114219876B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110302168A1 (en) * 2010-06-08 2011-12-08 International Business Machines Corporation Graphical models for representing text documents for computer analysis
WO2014173882A1 (en) * 2013-04-23 2014-10-30 Thales Method and device for automatically extracting themes from at least one document containing text
CN106156110A (en) * 2015-04-03 2016-11-23 科大讯飞股份有限公司 text semantic understanding method and system
CN107977592A (en) * 2016-10-21 2018-05-01 中兴通讯股份有限公司 A kind of image text detection method and system, user terminal and server
CN107526718A (en) * 2017-09-19 2017-12-29 北京百度网讯科技有限公司 Method and apparatus for generating text
CN111680168A (en) * 2020-05-29 2020-09-18 平安银行股份有限公司 Text feature semantic extraction method and device, electronic equipment and storage medium
CN112926564A (en) * 2021-02-25 2021-06-08 中国平安人寿保险股份有限公司 Picture analysis method, system, computer device and computer-readable storage medium
CN112949476A (en) * 2021-03-01 2021-06-11 苏州美能华智能科技有限公司 Text relation detection method and device based on graph convolution neural network and storage medium
CN113886568A (en) * 2021-09-30 2022-01-04 宿迁硅基智能科技有限公司 Text abstract generation method and device
CN113780254A (en) * 2021-11-12 2021-12-10 阿里巴巴达摩院(杭州)科技有限公司 Picture processing method and device, electronic equipment and computer storage medium
CN114036298A (en) * 2021-11-17 2022-02-11 西安理工大学 Node classification method based on graph convolution neural network and word vector

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HANZHOU WU et al., "Linguistic Steganalysis With Graph Neural Networks", IEEE, 26 February 2021 (2021-02-26), pages 558-562, XP011845859, DOI: 10.1109/LSP.2021.3062233 *
ZHAI Wenjie et al., "Multi-class Text Representation and Classification Method Based on Hybrid Deep Belief Networks", Technology Intelligence Engineering, vol. 02, no. 05, 31 October 2016 (2016-10-31), pages 30-40 *
XU Jinghang et al., "Causal Relation Extraction Based on Graph Attention Networks", Journal of Computer Research and Development, no. 01, 31 January 2020 (2020-01-31), pages 161-176 *

Also Published As

Publication number Publication date
CN114219876B (en) 2022-06-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant