CN112667771A - Answer sequence determination method and device - Google Patents

Answer sequence determination method and device

Publication number
CN112667771A
Authority
CN
China
Prior art keywords
subsequence
node
text box
sequence
binary tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011529776.4A
Other languages
Chinese (zh)
Inventor
王德勋
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011529776.4A
Publication of CN112667771A
Priority to PCT/CN2021/109383 (WO2022134578A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/338 Presentation of query results
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F18/00 Pattern recognition

Abstract

The invention discloses a method and a device for determining an answer sequence, relates to the technical field of intelligent decision making, and mainly aims to solve the problem of low accuracy caused by an answer text box sequence containing irrelevant characters. The method comprises the following steps: acquiring a text box sequence and storing it in a root node S_0 of a binary tree storage structure; clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists among the first subsequence and the second subsequence; if not, backtracking the binary tree storage structure to obtain an answer sequence; if so, saving the endpoint subsequence to the left child node S_1 of the root node S_0, saving the non-endpoint subsequence to the right child node S_2 of the root node S_0, repeating the clustering and detection on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and then backtracking the binary tree storage structure to obtain and output an answer sequence.

Description

Answer sequence determination method and device
Technical Field
The invention relates to the field of intelligent decision making, in particular to a method and a device for determining an answer sequence.
Background
Visual Question Answering (VQA) is a research area that combines text detection, text recognition, and NLP reading comprehension. The process generally includes the following: an Optical Character Recognition (OCR) system detects and recognizes all text regions in a scanned document, all text boxes are sorted by position from left to right and from top to bottom, and a model outputs the answer to the question.
Currently, it is common to use a pre-trained reading comprehension model to output a starting position and an ending position, and to take the text sequence between the two positions as the answer to the question. However, the structure and layout of scanned documents in real scenes are very complicated, so the output answer easily contains irrelevant characters and its accuracy is low.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for determining an answer sequence, and mainly aims to solve the problem that the output answers to a question easily contain irrelevant characters and have a low accuracy due to a complex structure and layout of a scanned document in a real scene.
According to an aspect of the present invention, there is provided a method for determining a sequence of answers, including:
acquiring a text box sequence, and storing the text box sequence in a root node S_0 of a binary tree storage structure;
clustering the text box sequence to obtain a first subsequence and a second subsequence;
detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence simultaneously comprising a first text box and a second text box;
if not, backtracking and merging the binary tree storage structure to obtain an answer sequence;
if so, saving the endpoint subsequence to the left child node S_1 of the root node S_0, saving the non-endpoint subsequence to the right child node S_2 of the root node S_0, repeatedly executing the clustering and detecting steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and performing backtracking and merging on the binary tree storage structure to obtain an answer sequence;
and outputting the answer sequence.
Further, before the clustering and detecting steps are repeatedly executed on the endpoint subsequence in the left child node S_1, the method further comprises:
clustering the non-endpoint subsequence in the right child node S_2 to obtain a third subsequence and a fourth subsequence;
calculating the minimum horizontal distance between each of the third subsequence and the fourth subsequence and the endpoint subsequence in the left child node S_1;
if the minimum horizontal distance is not greater than a preset distance threshold, merging the corresponding third subsequence or fourth subsequence into the left child node S_1.
Further, the performing backtracking merging processing on the binary tree storage structure includes:
finding, according to the left child node S_{2i+1} at the bottommost layer of the binary tree storage structure, the corresponding parent node S_{i+1};
calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node;
judging whether the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} is not greater than a preset distance threshold;
if not, stopping backtracking and determining the subsequence in the left child node S_{2i+1} as the answer sequence;
if so, continuing to backtrack to the upper nodes of the binary tree storage structure.
Further, calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node comprises:
obtaining the minimum and maximum x-coordinates (A1, A2) of the parent node S_{i+1};
obtaining the minimum and maximum x-coordinates (B1, B2) of the sibling node S_{i+2} of the parent node;
calculating the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} according to a preset minimum horizontal distance formula, the minimum horizontal distance formula comprising:
D=max(A2,B2)-min(A1,B1)-(B2-B1)-(A2-A1)
wherein D is the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2}, A1 is the minimum x-coordinate of the parent node S_{i+1}, A2 is the maximum x-coordinate of the parent node S_{i+1}, B1 is the minimum x-coordinate of the sibling node S_{i+2}, and B2 is the maximum x-coordinate of the sibling node S_{i+2}.
Further, the clustering the text box sequence to obtain a first subsequence and a second subsequence includes:
and performing k-means clustering processing on the text box sequence to obtain a first subsequence and a second subsequence.
Further, performing k-means clustering processing on the text box sequence to obtain a first subsequence and a second subsequence includes:
randomly extracting 2 text boxes in the text box sequence as a first centroid and a second centroid;
respectively calculating Euclidean distances between the rest text boxes in the text box sequence and the first centroid and the second centroid;
and dividing the text boxes with the Euclidean distance from the first centroid larger than that from the second centroid into a first subsequence, and dividing the text boxes with the Euclidean distance from the second centroid larger than that from the first centroid into a second subsequence.
Further, before the text box sequence is acquired and stored in the root node S_0 of the binary tree storage structure, the method further comprises:
detecting and identifying the obtained scanned document by using an optical character recognition system to obtain a text box cluster;
arranging the text box clusters according to a preset sequence;
processing the arranged text box cluster by using a pre-trained reading comprehension model to obtain a first text box and a second text box;
and determining a text box cluster between the first text box and the second text box as an output text box sequence.
According to another aspect of the present invention, there is provided an answer sequence determination apparatus including:
an obtaining unit, configured to obtain a text box sequence, and store the text box sequence in a root node S_0 of a binary tree storage structure;
the processing unit is used for clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence which simultaneously comprises the first text box and the second text box;
a backtracking unit, configured to, if no endpoint subsequence exists, perform backtracking and merging on the binary tree storage structure to obtain an answer sequence;
a merging unit, configured to, if an endpoint subsequence exists, save the endpoint subsequence to the left child node S_1 of the root node S_0, save the non-endpoint subsequence to the right child node S_2 of the root node S_0, repeatedly execute the clustering and detecting steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and perform backtracking and merging on the binary tree storage structure to obtain an answer sequence;
and the output unit is used for outputting the answer sequence.
Further, the apparatus further comprises a calculating unit and a judging unit;
the processing unit is further configured to cluster the non-endpoint subsequence in the right child node S_2 to obtain a third subsequence and a fourth subsequence;
the calculating unit is configured to calculate the minimum horizontal distance between each of the third subsequence and the fourth subsequence and the endpoint subsequence in the left child node S_1;
the judging unit is configured to merge the corresponding third subsequence or fourth subsequence into the left child node S_1 if the minimum horizontal distance is not greater than a preset distance threshold.
Further, the backtracking unit includes:
a searching module, configured to find, according to the left child node S_{2i+1} at the bottommost layer of the binary tree storage structure, the corresponding parent node S_{i+1};
a first calculation module, configured to calculate the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node;
a judging module, configured to judge whether the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} is not greater than a preset distance threshold;
a determining module, configured to, if not, stop backtracking and determine the subsequence in the left child node S_{2i+1} as the answer sequence, and if so, continue to backtrack to the upper nodes of the binary tree storage structure.
Further, the first calculation module is specifically configured to obtain the minimum and maximum x-coordinates (A1, A2) of the parent node S_{i+1}; obtain the minimum and maximum x-coordinates (B1, B2) of the sibling node S_{i+2} of the parent node; and calculate the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} according to a preset minimum horizontal distance formula, the minimum horizontal distance formula comprising:
D=max(A2,B2)-min(A1,B1)-(B2-B1)-(A2-A1)
wherein D is the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2}, A1 is the minimum x-coordinate of the parent node S_{i+1}, A2 is the maximum x-coordinate of the parent node S_{i+1}, B1 is the minimum x-coordinate of the sibling node S_{i+2}, and B2 is the maximum x-coordinate of the sibling node S_{i+2}.
Further, the processing unit is specifically configured to perform k-means clustering on the text box sequence to obtain a first subsequence and a second subsequence.
Further, the processing unit comprises:
the extraction module is used for randomly extracting 2 text boxes in the text box sequence to serve as a first centroid and a second centroid;
the second calculation module is used for calculating Euclidean distances between the rest text boxes in the text box sequence and the first centroid and the second centroid respectively;
and the dividing module is used for dividing the text boxes whose Euclidean distance from the first centroid is larger than that from the second centroid into a first subsequence, and dividing the text boxes whose Euclidean distance from the second centroid is larger than that from the first centroid into a second subsequence.
Further, the apparatus further comprises:
the recognition unit is used for detecting and recognizing the obtained scanning document by using an optical character recognition system to obtain a text box cluster;
the arranging unit is used for arranging the text box clusters according to a preset sequence;
the training unit is used for processing the arranged text box cluster by utilizing a pre-trained reading comprehension model to obtain a first text box and a second text box;
and the determining unit is used for determining the text box cluster between the first text box and the second text box as the output text box sequence.
According to still another aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the determination method of the answer sequence as described above.
According to still another aspect of the present invention, there is provided a computer apparatus including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the determination method of the answer sequence.
Compared with the prior art, the method and the device for determining the answer sequence provided by the invention acquire a text box sequence and store it in a root node S_0 of a binary tree storage structure; cluster the text box sequence to obtain a first subsequence and a second subsequence; detect whether an endpoint subsequence exists among the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence that includes both a first text box and a second text box; if not, perform backtracking and merging on the binary tree storage structure to obtain an answer sequence; if so, save the endpoint subsequence to the left child node S_1 of the root node S_0, save the non-endpoint subsequence to the right child node S_2 of the root node S_0, repeatedly execute the clustering and detecting steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and perform backtracking and merging on the binary tree storage structure to obtain an answer sequence; and output the answer sequence. Irrelevant answers in the answer sequence can thereby be deleted automatically, which improves the accuracy of the output answer sequence.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a method for determining answer sequences according to an embodiment of the present invention;
FIG. 2 is a block diagram of an answer sequence determining apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a method for determining an answer sequence, as shown in fig. 1, where the method includes:
101. Acquiring a text box sequence, and storing the text box sequence in a root node S_0 of a binary tree storage structure.
The invention may be applied in the context of visual question answering. Visual Question Answering (VQA) is a new field that requires understanding both text and vision. Given input image information, a visual question answering system uses a deep learning model to automatically analyze answers to questions about the image, such as: What is in the image? What sport is being played? Who is kicking the ball? How many players are in the image? Who are the participants? Is it raining? The system may, for example, produce the answers that 11 players are playing ball, that the participants are Buck, Langzo, Boll, Krie, Thompson, Deck, Novogue, Mark, Gaussel, Kaiwen, and Leff, and that it is raining; the data obtained from this analysis is determined as the acquired text box sequence. In the embodiment of the present invention, after the text box sequence is acquired, it may be saved to the root node S_0 of the binary tree storage structure, thereby obtaining a binary tree storage structure that stores the current text box sequence.
102. And clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence.
Clustering is an unsupervised learning method that automatically divides unlabeled data into several classes such that data in the same class have similar characteristics. Specifically, the first subsequence and the second subsequence are the two subsequences into which the initial text box sequence is split, and together they contain all the text boxes of the initial text box sequence.
In addition, the endpoint subsequence is a subsequence that includes both a first text box and a second text box. The first text box may be the starting text box of the initial text box sequence, and the second text box may be the ending text box of the initial text box sequence. For example, if the reading comprehension model outputs the answer information {I, am, a, boy}, then {I} may be the starting text box and {boy} may be the ending text box. When the text box sequence is acquired, the reading comprehension model may directly output the order information of the starting text box and the ending text box, and whether a subsequence contains both the first text box and the second text box can be determined by traversing all the text boxes in the first subsequence and all the text boxes in the second subsequence.
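The endpoint check described above can be sketched as follows. This is only an illustration: the function name `find_endpoint_subsequence` is hypothetical, and text boxes are modeled as plain strings rather than OCR boxes with coordinates.

```python
from typing import Hashable, Optional, Sequence


def find_endpoint_subsequence(
    first: Sequence[Hashable],
    second: Sequence[Hashable],
    start_box: Hashable,
    end_box: Hashable,
) -> Optional[Sequence[Hashable]]:
    """Return the subsequence holding both the start and end box, or None."""
    for subsequence in (first, second):
        # Traverse the subsequence and check that both endpoints are present.
        if start_box in subsequence and end_box in subsequence:
            return subsequence
    return None


# The {I, am, a, boy} example: "I" is the start box and "boy" is the end box.
together = find_endpoint_subsequence(["I", "a", "boy"], ["am"], "I", "boy")
apart = find_endpoint_subsequence(["I", "am"], ["a", "boy"], "I", "boy")
```

When the two endpoints land in different subsequences, as in the second call, no endpoint subsequence exists and the method proceeds to backtracking.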
103. If not, backtracking and merging the binary tree storage structure to obtain an answer sequence.
For the embodiment of the present invention, the binary tree storage structure may be subjected to backtracking and merging by calculating the minimum horizontal distance between nodes, so as to determine whether the current node is the final result node. The specific backtracking process may include: a. taking the node at the bottommost layer as the current node, detecting whether the minimum horizontal distance of the parent node of the current node is not greater than a preset distance threshold; b. if so, taking the parent node of the current node as the current node and executing step a again; when the minimum horizontal distance is greater than the preset distance threshold, determining the text box sequence stored in the current node as the answer sequence.
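Steps a and b can be sketched as a loop that walks from the bottommost node toward the root. This is one possible reading, not a definitive implementation: the `Node` class, the helpers `attach`, `gap`, and `sibling`, and the modeling of each text box as an `(x_min, x_max)` interval are all assumptions made for illustration. The sketch also assumes that backtracking stops once the parent is the root, because the root has no sibling to compare against.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # hypothetical box model: (x_min, x_max)


@dataclass(eq=False)
class Node:
    boxes: List[Interval]
    parent: Optional["Node"] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def attach(parent: Node, left: Node, right: Node) -> None:
    """Link two children to a parent node."""
    parent.left, parent.right = left, right
    left.parent = right.parent = parent


def gap(a: List[Interval], b: List[Interval]) -> float:
    """Minimum horizontal distance D between the x-extents of two box lists."""
    a1, a2 = min(lo for lo, _ in a), max(hi for _, hi in a)
    b1, b2 = min(lo for lo, _ in b), max(hi for _, hi in b)
    return max(a2, b2) - min(a1, b1) - (b2 - b1) - (a2 - a1)


def sibling(node: Node) -> Optional[Node]:
    """Return the other child of the node's parent, or None for the root."""
    if node.parent is None:
        return None
    return node.parent.right if node.parent.left is node else node.parent.left


def backtrack(bottom: Node, tau: float) -> List[Interval]:
    """Walk upward while the parent stays within tau of its sibling."""
    current = bottom
    while current.parent is not None:
        parent = current.parent
        other = sibling(parent)
        # Assumption: stop when the parent is the root (no sibling exists).
        if other is None or gap(parent.boxes, other.boxes) > tau:
            break
        current = parent
    return current.boxes
```

The node where the loop stops holds the text box sequence that is returned as the answer sequence.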
104. If so, saving the endpoint subsequence to the left child node S_1 of the root node S_0, saving the non-endpoint subsequence to the right child node S_2 of the root node S_0, repeatedly executing the clustering and detecting steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and performing backtracking and merging on the binary tree storage structure to obtain an answer sequence.
In the embodiment of the invention, by default the left child node of the binary tree storage structure is used as the storage space for the final output answer and the right child node is used as the storage space for the discarded subsequence; in an actual application scenario this may be set according to habit and business requirements, and the invention does not specifically limit it. The left child node of the root node may be denoted as S_1, and the right child node of the root node may be denoted as S_2. Specifically, when an endpoint subsequence exists among the first subsequence and the second subsequence, the endpoint subsequence is clustered again, and whether the two new subsequences obtained by clustering include an endpoint subsequence is detected; if so, that endpoint subsequence is clustered again, and the process is repeated until no endpoint subsequence exists. When no endpoint subsequence exists, backtracking and merging are performed on the binary tree storage structure to obtain an answer sequence; the specific backtracking process is the same as in step 103 and is not described again here.
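The repeated split-and-detect loop can be sketched as below. The names `Node`, `build_tree`, and `split_at_midpoint` are hypothetical, each text box is reduced to a single x-coordinate, and a deterministic midpoint split stands in for k-means so that the example is reproducible. The two extra stopping conditions (fewer than two boxes, or an empty discarded side) are assumptions added to guarantee termination; the source does not state them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Split = Callable[[List[float]], Tuple[List[float], List[float]]]


@dataclass(eq=False)
class Node:
    boxes: List[float]
    parent: Optional["Node"] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(boxes: List[float], start: float, end: float,
               split: Split) -> Tuple[Node, Node]:
    """Grow the tree along left children; return (root, bottommost left node)."""
    root = Node(list(boxes))
    current = root
    while len(current.boxes) >= 2:  # assumption: need two boxes to split
        first, second = split(current.boxes)
        if start in first and end in first:
            endpoint, rest = first, second
        elif start in second and end in second:
            endpoint, rest = second, first
        else:
            break  # no endpoint subsequence: stop and backtrack
        if not rest:
            break  # assumption: a degenerate split ends the loop
        current.left = Node(endpoint, parent=current)   # kept candidate answer
        current.right = Node(rest, parent=current)      # discarded boxes
        current = current.left
    return root, current


def split_at_midpoint(xs: List[float]) -> Tuple[List[float], List[float]]:
    """Stand-in for k-means: split at the midpoint of the x-range."""
    mid = (min(xs) + max(xs)) / 2
    return [x for x in xs if x <= mid], [x for x in xs if x > mid]


# Box 50 sits far from the rest and ends up in the right (discarded) child.
root, bottom = build_tree([0, 1, 2, 50, 3], start=0, end=3,
                          split=split_at_midpoint)
```

In this example the first split keeps both endpoints together, while the second split separates them, so the loop stops at the left child of the root.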
105. And outputting the answer sequence.
Specifically, after the answer sequence is obtained, the answer sequence may be output, and in an actual application scenario, the answer sequence may be displayed on a display screen, so that an actual problem can be solved by using the answer sequence.
In the embodiment of the invention, before the text box sequence is acquired and stored in the root node S_0 of the binary tree storage structure, the method further comprises: detecting and recognizing the obtained scanned document by using an optical character recognition system to obtain a text box cluster; arranging the text box cluster according to a preset order; processing the arranged text box cluster by using a pre-trained reading comprehension model to obtain a first text box and a second text box; and determining the text box cluster between the first text box and the second text box as the output text box sequence.
In the embodiment of the invention, the electronic document can be obtained by scanning, and the scanned document is then detected and recognized by the optical character recognition system to obtain the text box cluster, which may be the data set of text boxes produced by that detection and recognition. For example, each text box in the text box cluster {I, am, a, boy} has a position parameter; the position parameter of {I} may be 1 and the position parameter of {am} may be 2, and the text boxes in the cluster may be arranged according to these parameters. The arranged text box cluster is input into the pre-trained reading comprehension model to obtain a first text box and a second text box, namely the starting text box and the ending text box of the answer sequence, and the text box cluster between the starting text box and the ending text box can be determined as the text box sequence to be output.
For further limitation and description, the clustering the text box sequence to obtain a first subsequence and a second subsequence includes: and performing k-means clustering processing on the text box sequence to obtain a first subsequence and a second subsequence.
The specific process may include: randomly extracting 2 text boxes in the text box sequence as a first centroid and a second centroid; respectively calculating Euclidean distances between the rest text boxes in the text box sequence and the first centroid and the second centroid; and dividing the text boxes with the Euclidean distance from the first centroid larger than that from the second centroid into a first subsequence, and dividing the text boxes with the Euclidean distance from the second centroid larger than that from the first centroid into a second subsequence.
The Euclidean distance, also known as the Euclidean metric, is the most common distance metric. It measures the absolute distance between two points in a multidimensional space, that is, the true distance between two points in an m-dimensional space, or the natural length of a vector. In two and three dimensions, the Euclidean distance is the actual distance between two points. The calculation formula is as follows:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where x_i and y_i are the i-th coordinates of the two points x and y; for text boxes in a plane, m = 2 and the coordinates are the horizontal and vertical coordinates.
According to the embodiment of the invention, the text box sequence is clustered into two subsequences by carrying out k-means clustering processing on the text box sequence, so that the useful text box sequence and the waste text box sequence are respectively stored by utilizing the two subsequences subsequently, thereby deleting irrelevant answers and improving the accuracy of the output answer sequence.
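A two-centroid split of this kind might be sketched as follows. The function names are hypothetical and each text box is reduced to its center point. Note one assumption in particular: the sketch follows the conventional k-means rule and assigns each box to the nearer centroid, whereas the translated text above words the comparison with "larger"; which reading the original intends cannot be confirmed from this text. Only the single assignment pass described in the embodiment is shown, without centroid updates.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # hypothetical box model: center (x, y)


def split_by_centroids(points: List[Point], c1: Point,
                       c2: Point) -> Tuple[List[Point], List[Point]]:
    """Assign every point to the nearer of the two centroids."""
    first, second = [], []
    for p in points:
        # math.dist computes the Euclidean distance between two points.
        if math.dist(p, c1) <= math.dist(p, c2):
            first.append(p)
        else:
            second.append(p)
    return first, second


def cluster_once(points: List[Point],
                 rng: Optional[random.Random] = None
                 ) -> Tuple[List[Point], List[Point]]:
    """Pick two text boxes at random as centroids, then split once."""
    rng = rng or random.Random()
    c1, c2 = rng.sample(points, 2)
    return split_by_centroids(points, c1, c2)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0), (101.0, 100.0)]
near, far = split_by_centroids(points, (0.0, 0.0), (100.0, 100.0))
first, second = cluster_once(points, random.Random(0))
```

Because the centroids are drawn at random, the resulting split can vary between runs; passing a seeded `random.Random` makes a run repeatable.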
In an embodiment of the invention, performing backtracking and merging on the binary tree storage structure includes: finding, according to the left child node S_{2i+1} at the bottommost layer of the binary tree storage structure, the corresponding parent node S_{i+1}; calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node; judging whether the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} is not greater than a preset distance threshold; if not, stopping backtracking and determining the subsequence in the left child node S_{2i+1} as the answer sequence; if so, continuing to backtrack to the upper nodes of the binary tree storage structure.
In the embodiment of the invention, calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node may specifically include: obtaining the minimum and maximum x-coordinates (A1, A2) of the parent node S_{i+1}; obtaining the minimum and maximum x-coordinates (B1, B2) of the sibling node S_{i+2} of the parent node; and calculating the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2} according to a preset minimum horizontal distance formula, the minimum horizontal distance formula comprising:
D=max(A2,B2)-min(A1,B1)-(B2-B1)-(A2-A1)
wherein D is the minimum horizontal distance between the parent node S_{i+1} and its sibling node S_{i+2}, A1 is the minimum x-coordinate of the parent node S_{i+1}, A2 is the maximum x-coordinate of the parent node S_{i+1}, B1 is the minimum x-coordinate of the sibling node S_{i+2}, and B2 is the maximum x-coordinate of the sibling node S_{i+2}.
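As a small worked illustration of the formula, the sketch below evaluates D for two x-extents. The function name and the sample coordinates are assumptions. The formula subtracts both widths from the total span, so D is the size of the gap when the extents are separated, zero when they touch, and negative when they overlap.

```python
from typing import Tuple


def min_horizontal_distance(a: Tuple[float, float],
                            b: Tuple[float, float]) -> float:
    """D = max(A2, B2) - min(A1, B1) - (B2 - B1) - (A2 - A1)."""
    a1, a2 = a  # minimum and maximum x-coordinate of the first node
    b1, b2 = b  # minimum and maximum x-coordinate of the second node
    return max(a2, b2) - min(a1, b1) - (b2 - b1) - (a2 - a1)


separated = min_horizontal_distance((0, 30), (200, 210))  # gap of 170
touching = min_horizontal_distance((0, 30), (30, 40))     # extents meet
overlapping = min_horizontal_distance((0, 30), (20, 40))  # overlap of 10
```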
For the embodiment of the present invention, before the clustering and detecting steps are repeatedly executed on the endpoint subsequence in the left child node S_1, the method further comprises: clustering the non-endpoint subsequence in the right child node S_2 to obtain a third subsequence and a fourth subsequence; calculating the minimum horizontal distance between each of the third subsequence and the fourth subsequence and the endpoint subsequence in the left child node S_1; and, if the minimum horizontal distance is not greater than a preset distance threshold, merging the corresponding third subsequence or fourth subsequence into the left child node S_1.
The third subsequence and the fourth subsequence are the two subsequences obtained by clustering the non-endpoint subsequence in the right child node S_2. In this embodiment, the minimum horizontal distance between the third subsequence or the fourth subsequence and the endpoint subsequence in the left child node S_1 is calculated in the same way as the minimum horizontal distance described above, and the details are not repeated here. The preset distance threshold may be a preset distance parameter τ, which in an actual application scenario may generally be set to 30 or 40. If the minimum horizontal distance is not greater than the preset distance threshold, the corresponding third subsequence or fourth subsequence is merged into the left child node S_1. This reduces the risk of discarding text boxes that belong to the answer and improves the accuracy of the final answer sequence.
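The merge decision can be sketched as below, again under stated assumptions: boxes are modeled as `(x_min, x_max)` intervals, the names `x_extent`, `min_horizontal_distance`, and `merge_nearby` are hypothetical, and τ defaults to 30, one of the values the text mentions. Each candidate is compared against the original endpoint subsequence, and merged boxes are simply appended, so re-sorting into reading order is left to the caller.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical box model: (x_min, x_max)


def x_extent(boxes: List[Interval]) -> Interval:
    """Minimum and maximum x-coordinate covered by a list of boxes."""
    return min(lo for lo, _ in boxes), max(hi for _, hi in boxes)


def min_horizontal_distance(a: Interval, b: Interval) -> float:
    """D = max(A2, B2) - min(A1, B1) - (B2 - B1) - (A2 - A1)."""
    return max(a[1], b[1]) - min(a[0], b[0]) - (b[1] - b[0]) - (a[1] - a[0])


def merge_nearby(endpoint: List[Interval], third: List[Interval],
                 fourth: List[Interval], tau: float = 30.0) -> List[Interval]:
    """Merge the third/fourth subsequence into the left child if within tau."""
    merged = list(endpoint)
    target = x_extent(endpoint)
    for candidate in (third, fourth):
        # Skip empty subsequences; keep candidates whose gap is at most tau.
        if candidate and min_horizontal_distance(x_extent(candidate), target) <= tau:
            merged.extend(candidate)
    return merged


left_child = merge_nearby(
    endpoint=[(0, 10), (12, 30)],
    third=[(35, 45)],     # 5 units from the endpoint subsequence: merged
    fourth=[(200, 210)],  # 170 units away: stays in the right child
)
```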
The invention provides a method for determining an answer sequence, which comprises: acquiring a text box sequence and storing the text box sequence in a root node S_0 of a binary tree storage structure; clustering the text box sequence to obtain a first subsequence and a second subsequence; detecting whether an endpoint subsequence exists among the first subsequence and the second subsequence, the endpoint subsequence being a subsequence that contains both a first text box and a second text box; if not, performing backtracking merging on the binary tree storage structure to obtain an answer sequence; if so, storing the endpoint subsequence in a left child node S_1 of the root node S_0, storing the non-endpoint subsequence in a right child node S_2 of the root node S_0, repeating the clustering and detection steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and then performing backtracking merging on the binary tree storage structure to obtain the answer sequence; and outputting the answer sequence. This addresses the technical problem that, because the structure and layout of scanned documents in real scenarios are very complex, the output answers to questions tend to contain irrelevant text and have low accuracy, and thereby improves the accuracy of the answers.
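The splitting phase summarized above could look roughly like the following sketch. Several points are assumptions made for illustration: each text box is reduced to its (x, y) center, clustering is a single 2-means assignment step with two randomly drawn boxes as centroids, each remaining box is assigned to its nearer centroid, and the names `Node`, `split`, and `build_tree` are hypothetical. Backtracking merging is not included here.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float]  # assumption: a text box reduced to its (x, y) center


@dataclass
class Node:
    boxes: List[Box]
    left: Optional["Node"] = None   # holds the endpoint subsequence
    right: Optional["Node"] = None  # holds the non-endpoint subsequence


def split(boxes: List[Box], rng: random.Random) -> Tuple[List[Box], List[Box]]:
    """One 2-means assignment step using two random boxes as centroids."""
    i, j = rng.sample(range(len(boxes)), 2)
    groups: Tuple[List[Box], List[Box]] = ([], [])
    for k, box in enumerate(boxes):
        if k == i:
            groups[0].append(box)
        elif k == j:
            groups[1].append(box)
        else:
            nearer_second = math.dist(box, boxes[j]) < math.dist(box, boxes[i])
            groups[int(nearer_second)].append(box)
    return groups


def build_tree(boxes: List[Box], start: Box, end: Box, seed: int = 0):
    """Split repeatedly while one cluster still contains both endpoint boxes.

    Returns the root node and the deepest left node reached.
    """
    rng = random.Random(seed)
    root = Node(list(boxes))
    node = root
    while len(node.boxes) >= 2:
        first, second = split(node.boxes, rng)
        if start in first and end in first:
            node.left, node.right = Node(first), Node(second)
        elif start in second and end in second:
            node.left, node.right = Node(second), Node(first)
        else:
            break  # no endpoint subsequence exists: stop splitting
        node = node.left
    return root, node


boxes = [(0, 0), (10, 0), (20, 0), (200, 0), (210, 0), (0, 50)]
root, leaf = build_tree(boxes, start=(0, 0), end=(20, 0))
```

Because each centroid stays in its own cluster, both clusters are non-empty and the endpoint subsequence shrinks at every level, so the loop terminates. Whatever the random draws, `leaf.boxes` still contains both endpoint boxes.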
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides an answer sequence determining apparatus, as shown in fig. 2, the apparatus includes:
an obtaining unit 21, configured to obtain a text box sequence and store the text box sequence in a root node S_0 of a binary tree storage structure;
the processing unit 22 is configured to perform clustering processing on the text box sequence to obtain a first subsequence and a second subsequence, and detect whether an endpoint subsequence exists in the first subsequence and the second subsequence, where the endpoint subsequence is a subsequence that includes both the first text box and the second text box;
a backtracking unit 23, configured to, if no endpoint subsequence exists, perform backtracking merging on the binary tree storage structure to obtain an answer sequence;
a merging unit 24, configured to, if an endpoint subsequence exists, store the endpoint subsequence in a left child node S_1 of the root node S_0, store the non-endpoint subsequence in a right child node S_2 of the root node S_0, repeat the clustering and detection steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and perform backtracking merging on the binary tree storage structure to obtain the answer sequence;
and an output unit 25, configured to output the answer sequence.
Further, the apparatus further comprises a calculating unit and a judging unit, wherein:
the processing unit is further configured to cluster the non-endpoint subsequence in the right child node S_2 to obtain a third subsequence and a fourth subsequence;
the calculating unit is configured to calculate the minimum horizontal distance between each of the third subsequence and the fourth subsequence and the endpoint subsequence in the left child node S_1;
the judging unit is configured to, if the minimum horizontal distance is not greater than a preset distance threshold, merge the corresponding third subsequence or fourth subsequence into the left child node S_1.
Further, the backtracking unit includes:
a searching module, configured to find, according to the left child node S_{2i+1} at the bottommost layer of the binary tree storage structure, the corresponding parent node S_{i+1};
a first calculation module, configured to calculate the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node;
a judging module, configured to judge whether the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node is not greater than a preset distance threshold;
a determining module, configured to, if not, stop backtracking and determine the subsequence in the left child node S_{2i+1} as the answer sequence, and, if so, continue backtracking to the upper nodes of the binary tree storage structure.
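One possible reading of the four modules above is sketched below. It is an interpretation under stated assumptions rather than the applicant's implementation: text boxes are reduced to hypothetical (min_x, max_x) pairs, the tree is held with explicit `parent` and `sibling` links, backtracking also stops when the parent has no sibling (the root), and the names `Node`, `link`, `distance`, and `backtrack` are invented for the example.

```python
from typing import List, Optional, Tuple

XExtent = Tuple[float, float]  # assumption: a text box reduced to (min_x, max_x)


class Node:
    def __init__(self, boxes: List[XExtent]) -> None:
        self.boxes = boxes
        self.parent: Optional["Node"] = None
        self.sibling: Optional["Node"] = None


def link(parent: Node, left: Node, right: Node) -> None:
    """Attach a left (endpoint) and right (non-endpoint) child to a parent."""
    left.parent = right.parent = parent
    left.sibling, right.sibling = right, left


def distance(a: Node, b: Node) -> float:
    """Minimum horizontal distance between the extents of two nodes."""
    a1, a2 = min(x[0] for x in a.boxes), max(x[1] for x in a.boxes)
    b1, b2 = min(x[0] for x in b.boxes), max(x[1] for x in b.boxes)
    return max(a2, b2) - min(a1, b1) - (b2 - b1) - (a2 - a1)


def backtrack(bottom_left: Node, tau: float = 30.0) -> List[XExtent]:
    """Climb from the bottommost left child while the parent lies within
    tau of its own sibling; return the subsequence where climbing stops."""
    current = bottom_left
    while current.parent is not None and current.parent.sibling is not None:
        parent = current.parent
        if distance(parent, parent.sibling) > tau:
            break  # parent and its sibling are far apart: stop here
        current = parent
    return current.boxes


s0 = Node([(0, 10), (20, 30), (40, 50), (300, 320)])
s1, s2 = Node([(0, 10), (20, 30), (40, 50)]), Node([(300, 320)])
s3, s4 = Node([(0, 10), (20, 30)]), Node([(40, 50)])
link(s0, s1, s2)
link(s1, s3, s4)
answer = backtrack(s3)
```

Here the parent S_1 is 250 units away from its sibling S_2, so backtracking stops immediately and the subsequence in the bottom left child is returned. How the subsequences are combined at each level when backtracking continues is not spelled out in this passage, so the sketch simply returns the node reached.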
Further, the first calculation module is specifically configured to: obtain the minimum and maximum x-coordinates (A1, A2) of the parent node S_{i+1}; obtain the minimum and maximum x-coordinates (B1, B2) of the sibling node S_{i+2} of the parent node; and calculate the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} according to a preset minimum horizontal distance formula, the minimum horizontal distance formula comprising:
D=max(A2,B2)-min(A1,B1)-(B2-B1)-(A2-A1)
wherein D is the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node, A1 is the minimum x-coordinate of the parent node S_{i+1}, A2 is the maximum x-coordinate of the parent node S_{i+1}, B1 is the minimum x-coordinate of the sibling node S_{i+2}, and B2 is the maximum x-coordinate of the sibling node S_{i+2}.
Further, the processing unit is specifically configured to perform k-means clustering on the text box sequence to obtain a first subsequence and a second subsequence.
Further, the processing unit comprises:
the extraction module is used for randomly extracting 2 text boxes in the text box sequence to serve as a first centroid and a second centroid;
the second calculation module is used for calculating Euclidean distances between the rest text boxes in the text box sequence and the first centroid and the second centroid respectively;
and the dividing module is used for dividing the text boxes whose Euclidean distance from the first centroid is larger than that from the second centroid into a first subsequence, and dividing the text boxes whose Euclidean distance from the second centroid is larger than that from the first centroid into a second subsequence.
Further, the apparatus further comprises:
the recognition unit is used for detecting and recognizing the obtained scanned document by using an optical character recognition system to obtain a text box cluster;
the arranging unit is used for arranging the text box cluster in a preset order;
the training unit is used for processing the arranged text box cluster by using a pre-trained reading comprehension model to obtain a first text box and a second text box;
and the determining unit is used for determining the text boxes located between the first text box and the second text box as the output text box sequence.
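The last two steps of this list can be illustrated with a short sketch. It does not implement the optical character recognition system or the reading comprehension model; the first and second text boxes are simply passed in. The `TextBox` type, the top-to-bottom then left-to-right ordering, and the inclusion of both endpoint boxes in the result are assumptions made for the example, since the text only speaks of a preset order.

```python
from typing import List, NamedTuple


class TextBox(NamedTuple):
    text: str
    x: float  # assumption: left edge of the box
    y: float  # assumption: top edge of the box


def candidate_sequence(
    boxes: List[TextBox], first: TextBox, second: TextBox
) -> List[TextBox]:
    """Arrange boxes in reading order and keep those from the first
    text box to the second text box, endpoints included."""
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    i, j = ordered.index(first), ordered.index(second)
    if i > j:
        i, j = j, i
    return ordered[i : j + 1]


a1 = TextBox("A1", 0, 10)
a2 = TextBox("A2", 50, 10)
a3 = TextBox("A3", 0, 20)
page = [TextBox("footer", 0, 90), a3, TextBox("Q", 0, 0), a2, a1]
sequence = candidate_sequence(page, first=a1, second=a3)
```

With the sample page, `sequence` holds the boxes A1, A2, and A3 in that order, and the question and footer boxes are left out.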
According to an embodiment of the present invention, a storage medium is provided, and the storage medium stores at least one executable instruction, and the computer executable instruction can execute the method for determining the answer sequence in any of the above method embodiments.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computer device.
As shown in fig. 3, the computer device may include: a processor 302, a communication interface 304, a memory 306, and a communication bus 308.
Wherein: the processor 302, communication interface 304, and memory 306 communicate with each other via a communication bus 308.
The communication interface 304 is used for communicating with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute the program 310, and may specifically execute the relevant steps in the above-described answer sequence determination method embodiment.
In particular, program 310 may include program code comprising computer operating instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The computer device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is used for storing the program 310. The memory 306 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 310 may specifically be configured to cause the processor 302 to perform the following operations:
acquiring a text box sequence, and storing the text box sequence in a root node S_0 of a binary tree storage structure;
clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence which simultaneously comprises a first text box and a second text box;
if not, backtracking and merging the binary tree storage structure to obtain an answer sequence;
if so, storing the endpoint subsequence in a left child node S_1 of the root node S_0, storing the non-endpoint subsequence in a right child node S_2 of the root node S_0, repeating the clustering and detection steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and performing backtracking merging on the binary tree storage structure to obtain the answer sequence;
and outputting the answer sequence.
With this technical solution, a text box sequence can be acquired and stored in a root node S_0 of a binary tree storage structure; the text box sequence is clustered to obtain a first subsequence and a second subsequence; whether an endpoint subsequence exists among the first subsequence and the second subsequence is detected, the endpoint subsequence being a subsequence that contains both a first text box and a second text box; if not, backtracking merging is performed on the binary tree storage structure to obtain an answer sequence; if so, the endpoint subsequence is stored in a left child node S_1 of the root node S_0, the non-endpoint subsequence is stored in a right child node S_2 of the root node S_0, the clustering and detection steps are repeated on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and backtracking merging is then performed on the binary tree storage structure to obtain the answer sequence; and the answer sequence is output. This addresses the technical problem that, because the structure and layout of scanned documents in real scenarios are very complex, the output answers to questions tend to contain irrelevant text and have low accuracy, and thereby improves the accuracy of the answers.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for determining a sequence of answers, comprising:
acquiring a text box sequence, and storing the text box sequence in a root node S_0 of a binary tree storage structure;
clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence which simultaneously comprises a first text box and a second text box;
if not, backtracking and merging the binary tree storage structure to obtain an answer sequence;
if so, storing the endpoint subsequence in a left child node S_1 of the root node S_0, storing the non-endpoint subsequence in a right child node S_2 of the root node S_0, repeating the clustering and detection steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and performing backtracking merging on the binary tree storage structure to obtain the answer sequence;
and outputting the answer sequence.
2. The method of claim 1, wherein before the clustering and detection steps are repeated on the endpoint subsequence in the left child node S_1, the method further comprises:
clustering the non-endpoint subsequence in the right child node S_2 to obtain a third subsequence and a fourth subsequence;
calculating the minimum horizontal distance between each of the third subsequence and the fourth subsequence and the endpoint subsequence in the left child node S_1;
if the minimum horizontal distance is not greater than a preset distance threshold, merging the corresponding third subsequence or fourth subsequence into the left child node S_1.
3. The method according to claim 1, wherein the performing backtracking merging on the binary tree storage structure comprises:
finding, according to the left child node S_{2i+1} at the bottommost layer of the binary tree storage structure, the corresponding parent node S_{i+1};
calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node;
judging whether the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node is not greater than a preset distance threshold;
if not, stopping backtracking and determining the subsequence in the left child node S_{2i+1} as the answer sequence;
if so, continuing backtracking to the upper nodes of the binary tree storage structure.
4. The method of claim 3, wherein the calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node comprises:
obtaining the minimum and maximum x-coordinates (A1, A2) of the parent node S_{i+1};
obtaining the minimum and maximum x-coordinates (B1, B2) of the sibling node S_{i+2} of the parent node;
calculating the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node according to a preset minimum horizontal distance formula, the minimum horizontal distance formula comprising:
D=max(A2,B2)-min(A1,B1)-(B2-B1)-(A2-A1)
wherein D is the minimum horizontal distance between the parent node S_{i+1} and the sibling node S_{i+2} of the parent node, A1 is the minimum x-coordinate of the parent node S_{i+1}, A2 is the maximum x-coordinate of the parent node S_{i+1}, B1 is the minimum x-coordinate of the sibling node S_{i+2}, and B2 is the maximum x-coordinate of the sibling node S_{i+2}.
5. The method of claim 1, wherein the clustering the text box sequence to obtain a first subsequence and a second subsequence comprises:
and performing k-means clustering processing on the text box sequence to obtain a first subsequence and a second subsequence.
6. The method of claim 5, wherein the performing k-means clustering processing on the text box sequence to obtain a first subsequence and a second subsequence comprises:
randomly extracting 2 text boxes in the text box sequence as a first centroid and a second centroid;
respectively calculating Euclidean distances between the rest text boxes in the text box sequence and the first centroid and the second centroid;
and dividing the text boxes with the Euclidean distance from the first centroid larger than that from the second centroid into a first subsequence, and dividing the text boxes with the Euclidean distance from the second centroid larger than that from the first centroid into a second subsequence.
7. The method of claim 1, wherein before the acquiring a text box sequence and storing the text box sequence in a root node S_0 of a binary tree storage structure, the method further comprises:
detecting and recognizing the obtained scanned document by using an optical character recognition system to obtain a text box cluster;
arranging the text box clusters according to a preset sequence;
processing the arranged text box cluster by using a pre-trained reading comprehension model to obtain a first text box and a second text box;
and determining a text box cluster between the first text box and the second text box as an output text box sequence.
8. An answer sequence determination apparatus, comprising:
an obtaining unit, configured to obtain a text box sequence and store the text box sequence in a root node S_0 of a binary tree storage structure;
the processing unit is used for clustering the text box sequence to obtain a first subsequence and a second subsequence, and detecting whether an endpoint subsequence exists in the first subsequence and the second subsequence, wherein the endpoint subsequence is a subsequence which simultaneously comprises the first text box and the second text box;
the backtracking unit is used for, if no endpoint subsequence exists, performing backtracking merging on the binary tree storage structure to obtain an answer sequence;
a merging unit, configured to, if an endpoint subsequence exists, store the endpoint subsequence in a left child node S_1 of the root node S_0, store the non-endpoint subsequence in a right child node S_2 of the root node S_0, repeat the clustering and detection steps on the endpoint subsequence in the left child node S_1 until no endpoint subsequence exists, and perform backtracking merging on the binary tree storage structure to obtain the answer sequence;
and the output unit is used for outputting the answer sequence.
9. A storage medium having stored therein executable instructions for causing a processor to perform operations corresponding to the determination method of answer sequence according to any one of claims 1-7.
10. A computer device, comprising: a processor, a memory;
the memory is used for storing executable instructions which enable the processor to execute the operation corresponding to the answer sequence determination method of any one of claims 1-7.
CN202011529776.4A 2020-12-22 2020-12-22 Answer sequence determination method and device Pending CN112667771A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011529776.4A CN112667771A (en) 2020-12-22 2020-12-22 Answer sequence determination method and device
PCT/CN2021/109383 WO2022134578A1 (en) 2020-12-22 2021-07-29 Method and apparatus for determining answer sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011529776.4A CN112667771A (en) 2020-12-22 2020-12-22 Answer sequence determination method and device

Publications (1)

Publication Number Publication Date
CN112667771A true CN112667771A (en) 2021-04-16

Family

ID=75407700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011529776.4A Pending CN112667771A (en) 2020-12-22 2020-12-22 Answer sequence determination method and device

Country Status (2)

Country Link
CN (1) CN112667771A (en)
WO (1) WO2022134578A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022134578A1 (en) * 2020-12-22 2022-06-30 深圳壹账通智能科技有限公司 Method and apparatus for determining answer sequence



Also Published As

Publication number Publication date
WO2022134578A1 (en) 2022-06-30


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40045448; Country of ref document: HK)
SE01 Entry into force of request for substantive examination