CN112434495A - Selection method, selection device, computer equipment and storage medium - Google Patents

Info

Publication number
CN112434495A
Authority
CN
China
Prior art keywords
text
starting point
cursor position
line
constructing
Legal status
Pending
Application number
CN202011471610.1A
Other languages
Chinese (zh)
Inventor
李譞
Current Assignee
Wanxing Technology Hunan Co ltd
Original Assignee
Wanxing Technology Hunan Co ltd
Application filed by Wanxing Technology Hunan Co ltd
Priority to CN202011471610.1A
Publication of CN112434495A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention disclose a selection method, a selection device, computer equipment and a storage medium. The method comprises: constructing a text data structure of a text and establishing a cursor data structure corresponding to the text data structure; receiving a starting point coordinate and an end point coordinate given by a user, locating the starting point coordinate and the end point coordinate in the text data structure, and matching them to a corresponding starting point cursor position and end point cursor position, respectively, in the cursor data structure; and determining a text range from the starting point cursor position and the end point cursor position, and selecting the text content within that range. Because the selected area is computed in the text data structure from the given starting point and end point coordinates, the embodiments select text content with high accuracy and make subsequent annotation convenient.

Description

Selection method, selection device, computer equipment and storage medium
Technical Field
The embodiments of the invention relate to the technical field of text processing, and in particular to a selection method, a selection device, computer equipment and a storage medium.
Background
Commonly used document formats currently include flow (reflowable) documents and fixed-layout documents. When the page content of a fixed-layout document is displayed or edited, the text objects on the page cannot be reflowed, so in an application that can view fixed-layout documents it is difficult to select text content according to the structure of a text paragraph and to annotate it with highlighting, underlining and the like.
The prior art offers many improvements for these problems, but the selection effect can still be disordered during display or editing, and the resulting selection may not follow the commonly understood structure of a text paragraph.
Disclosure of Invention
The embodiments of the present invention provide a selection method, a selection apparatus, a computer device and a storage medium, aiming to solve the problem in the prior art that the accuracy of selecting text content and the effect of annotating text content still need to be improved.
In a first aspect, an embodiment of the present invention provides a selection method, which includes:
constructing a text data structure of a text, and establishing a cursor data structure corresponding to the text data structure;
receiving a starting point coordinate and an end point coordinate given by a user, searching the position of the starting point coordinate and the position of the end point coordinate in the text data structure, and respectively matching a corresponding starting point cursor position and an end point cursor position in the cursor data structure;
and determining a text range according to the starting point cursor position and the end point cursor position, and selecting text contents in the text range.
In a second aspect, an embodiment of the present invention provides a selection apparatus, which includes:
the construction unit is used for constructing a text data structure of a text and establishing a cursor data structure corresponding to the text data structure;
the searching unit is used for receiving a starting point coordinate and an end point coordinate given by a user, searching the position of the starting point coordinate and the position of the end point coordinate in the text data structure, and respectively matching a corresponding starting point cursor position and a corresponding end point cursor position in the cursor data structure;
and the selection unit is used for determining a text range according to the starting point cursor position and the end point cursor position and selecting the text content in the text range.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the selection method according to the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the selection method according to the first aspect.
In summary, the embodiments of the invention disclose a selection method, a selection device, computer equipment and a storage medium. The method constructs a text data structure of a text and establishes a corresponding cursor data structure; receives a starting point coordinate and an end point coordinate given by a user, locates them in the text data structure, and matches them to a starting point cursor position and an end point cursor position, respectively, in the cursor data structure; and determines a text range from these two cursor positions and selects the text content within it. By computing the selected area in the text data structure from the given coordinates, the embodiments select text content accurately and facilitate subsequent annotation.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a selection method according to an embodiment of the present invention;
Fig. 2 is a schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 3 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 4 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 5 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 6 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 7 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 8 is another schematic sub-flowchart of the selection method according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a selection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the embodiments of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the invention. As used in the description of embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the description of embodiments of the present invention and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to Fig. 1, which is a schematic flowchart of a selection method according to an embodiment of the present invention, the method includes steps S101 to S103.
S101, a text data structure of the text is constructed, and a cursor data structure corresponding to the text data structure is established.
In this embodiment, all text objects in the text are analyzed and identified to obtain the logical relationships among text words, text lines, text paragraphs, text blocks and text streams, and these relationships are used to construct the text data structure of the text. At the same time, a cursor data structure corresponding to the text data structure is established. The cursor data structure records the corresponding text stream, text block, text paragraph, text line and text word objects, their indices, and the index of the virtual character, so that the position of the text to be operated on, or a range of text, can be specified by the cursor information.
The embodiments of the invention can be applied to fixed-layout documents such as PDF documents. In an application that can view fixed-layout documents, they select text accurately according to the structure of a text paragraph and make it convenient to annotate the selection with highlighting, underlining and the like.
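As a minimal sketch of the cursor data structure described above, a cursor can be modelled as a record of indices into the text hierarchy. The field names, the nested-list representation of the text, and the convention that a cursor sits in front of the indexed virtual character are assumptions for illustration, not the patent's actual layout.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical nested representation: streams -> blocks -> paragraphs -> lines -> words,
# where each word is a string of virtual characters.
Document = List[List[List[List[List[str]]]]]


@dataclass(frozen=True)
class Cursor:
    """Indices recorded by a cursor (assumed fields, for illustration only)."""
    stream: int
    block: int
    paragraph: int
    line: int
    word: int
    char: int  # index of the virtual character; the cursor sits in front of it


def word_at(doc: Document, c: Cursor) -> str:
    """Resolve the cursor down to the text word it points into."""
    return doc[c.stream][c.block][c.paragraph][c.line][c.word]


def char_after(doc: Document, c: Cursor) -> str:
    """Return the virtual character right after the cursor, or '' at the end of the word."""
    word = word_at(doc, c)
    return word[c.char] if c.char < len(word) else ""
```

For example, with one stream containing the single line `["Hello", "world"]`, `Cursor(0, 0, 0, 0, 1, 0)` points in front of the "w" of "world".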
In one embodiment, as shown in fig. 2, step S101 includes:
S201, identifying the text objects in the text, and constructing text words, text lines, text paragraphs, text blocks, text streams and a text panel object level by level in sequence;
S202, taking the constructed text panel object as the text data structure.
In this embodiment, constructing the text panel object is the process of constructing the text data structure: a text word is constructed from one or more text objects, and a mapping is established between the virtual characters contained in the text word and the actual characters in the text objects; a text line is constructed from one or more text words; a text paragraph from one or more text lines; a text block from one or more text paragraphs; a text stream from one or more text blocks; and the text panel object from one or more text streams. The text panel object is thus built up level by level.
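The level-by-level construction can be sketched as follows, assuming that every node stores an axis-aligned bounding box equal to the union of its children's boxes, and that a word records, for each virtual character, which text object and which character in that object it maps to. The `TextObject`, `Node`, `build_word` and `group` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed convention


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class TextObject:
    """A positioned run of characters as it appears on the fixed-layout page."""
    text: str
    boxes: List[Rect]  # one box per character


@dataclass
class Node:
    kind: str       # "word", "line", "paragraph", "block", "stream" or "panel"
    children: list
    bbox: Rect
    # Words only: virtual character index -> (text object index, character index in it)
    char_map: List[Tuple[int, int]] = field(default_factory=list)


def build_word(objects: List[TextObject]) -> Node:
    """Build a text word from one or more text objects, keeping the virtual-to-actual mapping."""
    char_map = [(i, j) for i, obj in enumerate(objects) for j in range(len(obj.text))]
    boxes = [b for obj in objects for b in obj.boxes]
    return Node("word", list(objects), union(boxes), char_map)


def group(kind: str, children: List[Node]) -> Node:
    """Build the next level up (line, paragraph, block, stream or panel) from its children."""
    return Node(kind, children, union([c.bbox for c in children]))
```

A panel is then obtained by nesting the calls, e.g. `group("panel", [group("stream", [group("block", [group("paragraph", [group("line", words)])])])])`.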
S102, receiving a starting point coordinate and an end point coordinate given by a user, searching the position of the starting point coordinate and the position of the end point coordinate in a text data structure, and respectively matching a corresponding starting point cursor position and an end point cursor position in a cursor data structure.
In this embodiment, the starting point coordinate and the end point coordinate are both position points in the text data structure, and they are located in the text data structure in order to select text within it. Once the two positions are found, they can be displayed as cursors based on the cursor data structure, so that the user can see the exact position points directly, give accurate starting point and end point coordinates, and thereby select the text range accurately.
In one embodiment, as shown in fig. 3, step S102 includes:
S301, finding, in all text streams, the first text blocks in which the starting point coordinate lies, and constructing a first text block candidate set; finding, in all text streams, the second text blocks in which the end point coordinate lies, and constructing a second text block candidate set;
S302, finding, in the first text block candidate set, the first text paragraphs in which the starting point coordinate lies, and constructing a first text paragraph candidate set; finding, in the second text block candidate set, the second text paragraphs in which the end point coordinate lies, and constructing a second text paragraph candidate set;
S303, finding, in the first text paragraph candidate set, the first text lines in which the starting point coordinate lies, and constructing a first text line candidate set; finding, in the second text paragraph candidate set, the second text lines in which the end point coordinate lies, and constructing a second text line candidate set;
S304, finding, in the first text line candidate set, the first text words in which the starting point coordinate lies, and constructing a first text word candidate set; finding, in the second text line candidate set, the second text words in which the end point coordinate lies, and constructing a second text word candidate set;
S305, finding, in the first text word candidate set, the first virtual characters in which the starting point coordinate lies, and constructing a first virtual character candidate set; finding, in the second text word candidate set, the second virtual characters in which the end point coordinate lies, and constructing a second virtual character candidate set;
S306, based on the cursor data structure, selecting from the first virtual character candidate set the first cursor positions closest to the starting point coordinate, and constructing a first cursor candidate set; selecting from the second virtual character candidate set the second cursor positions closest to the end point coordinate, and constructing a second cursor candidate set;
S307, selecting from the first cursor candidate set the first cursor position with the shortest straight-line distance to the starting point coordinate as the starting point cursor position; and selecting from the second cursor candidate set the second cursor position with the shortest straight-line distance to the end point coordinate as the end point cursor position.
In this embodiment, the starting point coordinate is located in the text data structure and matched to the starting point cursor position in the cursor data structure as follows. First, each text stream is searched for all first text blocks in which the starting point coordinate lies, and these form the first text block candidate set; this confirms that the starting point coordinate lies within the blocks of that set and narrows the search range. Next, each first text block is searched for all first text paragraphs in which the starting point coordinate lies, forming the first text paragraph candidate set and narrowing the search further. In the same way, the first text line candidate set, the first text word candidate set and the first virtual character candidate set are constructed in turn, confirming level by level that the starting point coordinate lies within the first virtual character candidate set. Then, based on the cursor data structure, the character origins of the first cursor positions corresponding to all first virtual characters in the first virtual character candidate set are compared with the starting point coordinate, and the first cursor positions closest to it are selected to form the first cursor candidate set; the first cursor positions include the positions in front of and behind each character. Finally, the first cursor position with the shortest straight-line distance to the starting point coordinate is selected from the first cursor candidate set as the starting point cursor position.
The end point coordinate is located in the text data structure and matched to the end point cursor position in the same way: starting from the text streams, the second text block, text paragraph, text line, text word and virtual character candidate sets are constructed level by level, finally confirming that the end point coordinate lies in a second virtual character of the second virtual character candidate set. Then, based on the cursor data structure, the character origins of the second cursor positions corresponding to all second virtual characters in that set are compared with the end point coordinate, and the second cursor positions closest to it are selected to form the second cursor candidate set; the second cursor positions include the positions in front of and behind each character. Finally, the second cursor position with the shortest straight-line distance to the end point coordinate is selected from the second cursor candidate set as the end point cursor position.
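A minimal sketch of the containment search and cursor matching of steps S301 to S307 is given below. It assumes axis-aligned rectangles, horizontal left-to-right writing, and that the cursor positions in front of and behind a character sit at the left and right edges of its box on the baseline. The node layout and the `hit_chars` and `pick_cursor` helpers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed convention


@dataclass
class Node:
    bbox: Rect
    children: List["Node"]
    # Virtual characters only (assumed): index path in the hierarchy and baseline y.
    path: Tuple[int, ...] = ()
    baseline: float = 0.0


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def hit_chars(streams: List[Node], p: Point) -> List[Node]:
    """Descend block -> paragraph -> line -> word -> virtual character, keeping only hits."""
    candidates = streams
    for _ in range(5):  # blocks, paragraphs, lines, words, virtual characters
        candidates = [c for n in candidates for c in n.children if contains(c.bbox, p)]
    return candidates


def pick_cursor(chars: List[Node], p: Point) -> Optional[Tuple[Tuple[int, ...], int]]:
    """Return (character path, 0 = in front / 1 = behind) of the cursor nearest to p."""
    cursor_candidates = []
    for ch in chars:
        front = (ch.bbox[0], ch.baseline)  # origin in front of the character
        behind = (ch.bbox[2], ch.baseline)  # origin behind the character
        options = [(math.dist(front, p), ch.path, 0), (math.dist(behind, p), ch.path, 1)]
        cursor_candidates.append(min(options))  # S306: closest cursor per character
    if not cursor_candidates:
        return None  # nothing hit: fall back to the nearest-distance search (S401 to S406)
    _, path, side = min(cursor_candidates)  # S307: shortest straight-line distance overall
    return path, side
```

Returning `None` when no character contains the point leaves room for the nearest-distance fallback described in the next embodiment.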
In one embodiment, as shown in fig. 4, step S102 further includes:
S401, if no first virtual character in which the starting point coordinate lies is found in the first text word candidate set, and no second virtual character in which the end point coordinate lies is found in the second text word candidate set:
S402, finding, in all text streams, the first text block closest to the starting point coordinate, and constructing a first text block candidate set; finding, in all text streams, the second text block closest to the end point coordinate, and constructing a second text block candidate set;
S403, finding, in the first text block candidate set, the first text paragraph closest to the starting point coordinate, and constructing a first text paragraph candidate set; finding, in the second text block candidate set, the second text paragraph closest to the end point coordinate, and constructing a second text paragraph candidate set;
S404, finding, in the first text paragraph candidate set, the first text line closest to the starting point coordinate, and constructing a first text line candidate set; finding, in the second text paragraph candidate set, the second text line closest to the end point coordinate, and constructing a second text line candidate set;
S405, finding, in the first text line candidate set, the first text word closest to the starting point coordinate, and constructing a first text word candidate set; finding, in the second text line candidate set, the second text word closest to the end point coordinate, and constructing a second text word candidate set;
S406, finding, in the first text word candidate set, the first virtual character closest to the starting point coordinate, and constructing a first virtual character candidate set; and finding, in the second text word candidate set, the second virtual character closest to the end point coordinate, and constructing a second virtual character candidate set.
In this embodiment, if no first virtual character and no second virtual character can be found while searching for the starting point coordinate and the end point coordinate, that is, the two coordinates do not fall within any first or second virtual character, their positions must be searched for again by nearest distance.
Specifically, the first virtual character is searched for again as follows. Each text stream is searched one by one for the first text block closest to the starting point coordinate, forming the first text block candidate set; here the distance is obtained by taking the shortest distances from the starting point coordinate to the two nearest edges of the rectangular region as the two legs of a right triangle and using the length of its hypotenuse. Next, the first text block is searched for the first text paragraph closest to the starting point coordinate, forming the first text paragraph candidate set; here the distance is the shortest distance to the paragraph's edges parallel to the text writing direction. Next, the first text paragraph is searched for the first text line closest to the starting point coordinate, forming the first text line candidate set; the distance is again the shortest distance to the edges parallel to the text writing direction. Next, the first text line is searched for the first text word closest to the starting point coordinate, forming the first text word candidate set; here the distance is the shortest distance to the word's edges perpendicular to the text writing direction. Finally, the first text word is searched for the first virtual character closest to the starting point coordinate, again measured to the edges perpendicular to the text writing direction, forming the first virtual character candidate set. This yields the first virtual character.
Specifically, the second virtual character is searched for again on the same principle: the second text block, text paragraph, text line, text word and virtual character candidate sets are constructed in turn from the top level down, yielding the second virtual character.
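The three distance measures used in this fallback search can be sketched as below, assuming axis-aligned rectangles given as `(x0, y0, x1, y1)` and horizontal writing by default. The function names are hypothetical; only the distance definitions themselves come from the description above.

```python
import math
from typing import Callable, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed convention
Point = Tuple[float, float]


def axis_gap(lo: float, hi: float, v: float) -> float:
    """Distance from v to the interval [lo, hi] along one axis (0 if inside)."""
    return max(lo - v, 0.0, v - hi)


def rect_distance(r: Rect, p: Point) -> float:
    """Block level: the gaps to the two nearest edges are the legs of a right
    triangle, and the hypotenuse is the distance (0 if p lies inside r)."""
    return math.hypot(axis_gap(r[0], r[2], p[0]), axis_gap(r[1], r[3], p[1]))


def cross_distance(r: Rect, p: Point, horizontal: bool = True) -> float:
    """Paragraph and line level: distance to the edges parallel to the writing direction."""
    return axis_gap(r[1], r[3], p[1]) if horizontal else axis_gap(r[0], r[2], p[0])


def along_distance(r: Rect, p: Point, horizontal: bool = True) -> float:
    """Word and character level: distance to the edges perpendicular to the writing direction."""
    return axis_gap(r[0], r[2], p[0]) if horizontal else axis_gap(r[1], r[3], p[1])


def nearest(rects: List[Rect], p: Point, metric: Callable[[Rect, Point], float]) -> int:
    """Index of the rectangle closest to p under the given metric."""
    return min(range(len(rects)), key=lambda i: metric(rects[i], p))
```

Each level of the fallback search would call `nearest` with the metric for that level on the children of the previously chosen node.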
S103, determining a text range according to the starting point cursor position and the end point cursor position, and selecting text contents in the text range.
In this embodiment, the starting point cursor position and the end point cursor position are each a specific position point in the text data structure, and together they determine a specific text range.
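Because the cursor records indices at every level, one plausible way to represent the text range is to compare cursor positions lexicographically, i.e. in reading order. This sketch assumes that representation, and also assumes that a reversed selection (end before start) is simply swapped; the description does not specify either. The tuple layout and both helpers are hypothetical.

```python
from typing import Tuple

# Hypothetical cursor: (stream, block, paragraph, line, word, char) indices.
Cursor = Tuple[int, int, int, int, int, int]
LineKey = Tuple[int, int, int, int]  # (stream, block, paragraph, line)


def text_range(a: Cursor, b: Cursor) -> Tuple[Cursor, Cursor]:
    """Order two cursor positions so that the range always runs from start to end.

    Tuples compare element by element, which matches reading order when the
    indices go from the outermost level (stream) to the innermost (character)."""
    return (a, b) if a <= b else (b, a)


def line_fully_selected(line: LineKey, n_words: int, last_word_len: int,
                        rng: Tuple[Cursor, Cursor]) -> bool:
    """True if both the line head and the line tail lie inside the range."""
    start, end = rng
    head = line + (0, 0)                         # in front of the first character
    tail = line + (n_words - 1, last_word_len)   # behind the last character
    return start <= head and tail <= end
```

A check like `line_fully_selected` is what decides, in the steps below, whether a whole line rectangle is used or only part of the line.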
In one embodiment, as shown in fig. 5, step S103 includes:
S501, determining the corresponding text range according to the starting point cursor position and the end point cursor position;
S502, following the order of the text lines recorded in the text data structure, if a text line does not fall entirely within the text range, obtaining the rectangular region of each character one by one, starting from the line head or the starting point cursor position, merging these rectangular regions until the line tail or the end point cursor position is reached, and adding the merged rectangular region to a highlight region set;
S503, if a text line falls entirely within the text range, adding the rectangular region of the whole text line directly to the highlight region set;
S504, displaying a highlight effect in the user interface according to the rectangular regions in the highlight region set, or adding highlight annotations to the text.
In this embodiment, starting from the starting point cursor position, the highlight region set is constructed line by line in the order recorded in the text paragraph data structure:
First, if a text line is not entirely within the text range, the rectangular regions of its characters are obtained one by one from the starting point cursor position or the line head and merged until the line tail or the end point cursor position is reached, and the merged rectangular region is added to the highlight region set. Second, if the whole text line is within the text range, the rectangular region of the whole line is added directly to the highlight region set. Finally, when the end point cursor position is reached while rectangular regions are being collected, the collection stops, the last merged rectangular region is added to the highlight region set, and construction of the highlight region set is complete. A highlight effect is then displayed in the user interface, or highlight annotations are added to the text, according to the rectangular regions in the highlight region set.
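A minimal sketch of building the highlight region set is shown below. It assumes that characters are addressed by their global index in reading order rather than by full cursor records, that the selection covers characters `start` to `end - 1`, and that rectangles are axis-aligned `(x0, y0, x1, y1)` tuples. `highlight_regions` is a hypothetical helper.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed convention


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def highlight_regions(lines: List[List[Rect]], line_rects: List[Rect],
                      start: int, end: int) -> List[Rect]:
    """Build the highlight region set for characters [start, end) in reading order.

    lines[k] holds the character boxes of line k; line_rects[k] is the
    rectangle of the whole line k.
    """
    regions: List[Rect] = []
    offset = 0  # global index of the current line's first character
    for chars, line_rect in zip(lines, line_rects):
        line_start, line_end = offset, offset + len(chars)
        offset = line_end
        if line_end <= start:
            continue  # line lies before the starting point cursor
        if line_start >= end:
            break  # end point cursor reached: stop collecting
        if start <= line_start and line_end <= end:
            regions.append(line_rect)  # whole line inside the range
        else:
            lo, hi = max(start, line_start), min(end, line_end)
            regions.append(union(chars[lo - line_start:hi - line_start]))  # merged part
    return regions
```

Using the precomputed line rectangle for fully selected lines avoids merging every character box on those lines.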
In one embodiment, as shown in fig. 6, step S103 includes:
S601, determining the corresponding text range according to the starting point cursor position and the end point cursor position;
S602, following the order of the text lines recorded in the text data structure, if a text line does not fall entirely within the text range, taking the origin of the first character, starting from the line head or the starting point cursor position, as the underline starting point, then updating the underline end point character by character until a font size change, the line tail or the end point cursor position is reached, and adding the underline to an underline set;
S603, taking the origin of the next character as a new underline starting point, repeating until a font size change, the line tail or the end point cursor position is reached, and adding the underline to the underline set;
S604, displaying an underline effect in the user interface according to the line segments in the underline set, or adding underline annotations to the text.
In this embodiment, starting from the starting point cursor position, the underline set is constructed line by line in the order recorded in the text paragraph data structure:
First, starting from the starting point cursor position or the line head, the origin of the first character is taken as the underline starting point, and the underline end point (the origin behind each character) is updated character by character until the font size changes or the line tail or the end point cursor position is reached; the underline is then added to the underline set. Second, continuing from where the previous underline ended, the origin of the next character is taken as a new underline starting point, and the end point is again updated character by character to obtain a new underline, which is added to the underline set. Finally, once the end point cursor position is reached, the collection stops, the last underline is added to the underline set, and construction of the underline set is complete. An underline effect is then displayed in the user interface, or underline annotations are added to the text, according to the underlines in the underline set.
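The underline construction for a single text line can be sketched as follows. The sketch assumes horizontal writing, that each character is described by the origin in front of it, its advance to the origin behind it, its baseline and its font size, and that the underline is drawn along the baseline. The `Char` type and `underline_segments` are hypothetical; the strikethrough set of step S702 follows the same splitting rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


@dataclass
class Char:
    x: float          # origin in front of the character (horizontal writing assumed)
    advance: float    # distance to the origin behind the character
    baseline: float   # y coordinate of the baseline
    font_size: float


def underline_segments(line: List[Char], lo: int, hi: int) -> List[Segment]:
    """Underlines for characters lo..hi-1 of one line, split where the font size changes."""
    segments: List[Segment] = []
    seg_start = seg_end = None  # current run of characters with the same font size
    for i in range(lo, hi):
        ch = line[i]
        if seg_start is not None and ch.font_size != line[i - 1].font_size:
            segments.append((seg_start, seg_end))  # font size changed: close the run
            seg_start = None
        if seg_start is None:
            seg_start = (ch.x, ch.baseline)  # origin of the first character of the run
        seg_end = (ch.x + ch.advance, ch.baseline)  # origin behind the current character
    if seg_start is not None:
        segments.append((seg_start, seg_end))  # line tail or end point cursor reached
    return segments
```

Calling this once per text line, with `lo` and `hi` clipped to the selected range, yields the underline set.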
In one embodiment, as shown in fig. 7, step S103 further includes:
S701, determining the corresponding text range according to the starting point cursor position and the end point cursor position;
S702, following the order of the text lines recorded in the text data structure, if a text line does not fall entirely within the text range, taking the origin of the first character, starting from the line head or the starting point cursor position, as the strikethrough starting point, then updating the strikethrough end point character by character until a font size change, the line tail or the end point cursor position is reached, and adding the strikethrough to a strikethrough set;
S703, taking the origin of the next character as a new strikethrough starting point, repeating until a font size change, the line tail or the end point cursor position is reached, and adding the strikethrough to the strikethrough set;
S704, displaying a strikethrough effect in the user interface according to the line segments in the strikethrough set, or adding strikethrough annotations to the text.
In this embodiment, starting from the starting point cursor position, the strikethrough set is constructed line by line, following the order of text lines recorded in the text paragraph data structure:
First, starting from the starting point cursor position (or from the head of the text line), the origin of the first character is taken as the start point of a strikethrough. The strikethrough end point is then advanced one character at a time, with the origin just after each character becoming the new end point, until the font size changes, the end of the text line is reached, or the end point cursor position is reached. The strikethrough is then added to the strikethrough set. Second, continuing from where the previous strikethrough ended, the origin of the next character is taken as the start point of a new strikethrough, and its end point is again advanced character by character. The resulting strikethrough is added to the strikethrough set. Finally, once the end point cursor position is reached, the whole process stops, the last strikethrough is added to the strikethrough set, and construction of the strikethrough set is complete. The strikethroughs in the strikethrough set are then used either to display a strikethrough effect in the user interface or to add a strikethrough annotation to the text.
Further, in one embodiment, when the strikethrough effect is displayed in the user interface, the strikethrough is shifted upward by 25% of the character height, in the direction perpendicular to the writing direction of the text.
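The 25% offset can be applied to baseline runs produced in the same way as the underlines. The following is a minimal Python sketch, not the patent's code. It assumes y-down screen coordinates and a per-run `char_height` field. `Run` and `strikethrough_segments` are hypothetical names. The offset follows the normal of the run's writing direction, so rotated text is handled as well as horizontal text.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Run:
    start: Point        # baseline origin of the first character in the run
    end: Point          # baseline origin just after the last character in the run
    char_height: float  # assumed glyph height for the run (hypothetical field)


def strikethrough_segments(runs: List[Run], ratio: float = 0.25) -> List[Tuple[Point, Point]]:
    """Shift each baseline run toward the top of its glyphs by ratio * char_height.

    The shift is perpendicular to the writing direction. In y-down coordinates,
    left-to-right horizontal text is therefore moved toward smaller y.
    """
    result: List[Tuple[Point, Point]] = []
    for run in runs:
        dx = run.end[0] - run.start[0]
        dy = run.end[1] - run.start[1]
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # empty run: nothing to draw
        # Unit normal pointing toward the top of the glyphs (y-down coordinates).
        nx, ny = dy / length, -dx / length
        off = ratio * run.char_height
        result.append(((run.start[0] + nx * off, run.start[1] + ny * off),
                       (run.end[0] + nx * off, run.end[1] + ny * off)))
    return result
```

A horizontal run from (0, 100) to (40, 100) with a character height of 20 is moved up by 5 units to y = 95. A run written top to bottom is moved to the right instead, which is where the glyph tops face after a clockwise rotation.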
In one embodiment, as shown in fig. 8, step S104 further includes:
S801, determining the corresponding text range according to the starting point cursor position and the end point cursor position;
S802, based on the order of text paragraphs recorded in the text data structure, for a text paragraph that does not fall entirely within the text range, acquiring its characters one by one, starting from the beginning of the paragraph or from the starting point cursor position, and adding them to a content set until the end of the paragraph or the end point cursor position is reached;
S803, for a text paragraph that falls entirely within the text range, adding the characters of the whole paragraph to the content set directly;
S804, after the characters of one text paragraph have been added to the content set and before the characters of the next text paragraph are added, adding a carriage return to the content set;
S805, extracting the plain text content from the content set to obtain the text data.
In this embodiment, starting from the starting point cursor position, the content set is constructed paragraph by paragraph, following the order of text paragraphs recorded in the text paragraph data structure:
First, if a text paragraph does not lie entirely within the text range, its characters are acquired one by one, starting from the starting point cursor position or the beginning of the paragraph, and added to the content set until the end of the paragraph or the end point cursor position is reached. Second, if a text paragraph lies entirely within the text range, all of its characters are added to the content set directly. Before the characters of the next paragraph are acquired, a carriage return is added to the content set. Finally, in the text paragraph containing the end point cursor position, acquisition stops at the end point cursor position and construction of the content set ends. The text in the content set is the plain text content corresponding to the text range, and it can be extracted to obtain the text data.
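The content-set procedure can be illustrated with a minimal Python sketch. It assumes paragraphs are plain strings and cursors are `(paragraph_index, char_index)` pairs with an exclusive end cursor. `extract_plain_text` and its `paragraph_break` parameter are hypothetical. The default `"\r"` follows the "carriage return" in the text, but a caller may prefer `"\n"`.

```python
from typing import List, Tuple


def extract_plain_text(paragraphs: List[str],
                       start: Tuple[int, int],
                       end: Tuple[int, int],
                       paragraph_break: str = "\r") -> str:
    """Build the content set between two cursors and return it as plain text.

    Cursors are (paragraph_index, char_index); the end cursor is exclusive.
    A paragraph break is inserted between consecutive paragraphs in the range.
    """
    content: List[str] = []
    for pi in range(start[0], end[0] + 1):
        para = paragraphs[pi]
        first = start[1] if pi == start[0] else 0       # starting cursor or paragraph head
        last = end[1] if pi == end[0] else len(para)    # end cursor or paragraph tail
        if pi > start[0]:
            # A previous paragraph has been added: separate it from this one.
            content.append(paragraph_break)
        if first == 0 and last == len(para):
            # Paragraph lies entirely within the range: take all of it at once.
            content.extend(para)
        else:
            # Partial paragraph: take characters one by one up to the stop position.
            for ch in para[first:last]:
                content.append(ch)
    return "".join(content)
```

With the paragraphs `["Hello world", "Second para", "Third"]`, a start cursor of `(0, 6)` and an end cursor of `(2, 3)`, the result is `"world\rSecond para\rThi"`.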
An embodiment of the present invention further provides a selection device configured to execute any embodiment of the foregoing selection method. Specifically, refer to Fig. 9, which is a schematic block diagram of a selection device according to an embodiment of the present invention.
As shown in fig. 9, the selection device 900 includes: a construction unit 901, a search unit 902 and a selection unit 903.
a construction unit 901, configured to construct a text data structure of a text and to establish a cursor data structure corresponding to the text data structure;
a search unit 902, configured to receive a starting point coordinate and an end point coordinate given by a user, find the positions of the starting point coordinate and the end point coordinate in the text data structure, and match them to the corresponding starting point cursor position and end point cursor position in the cursor data structure;
and a selection unit 903, configured to determine a text range according to the starting point cursor position and the end point cursor position, and to select the text content within the text range.
The device computes the selected region in the text data structure from the starting point coordinate and end point coordinate provided by the user. This allows text content to be selected accurately and makes subsequent annotation convenient.
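The division into three units can be sketched as a small pipeline. The following is a minimal Python sketch, assuming a greatly simplified text structure: a flat list of glyphs, each with a cursor before and after it. It does not use the full block/paragraph/line hierarchy. `Glyph`, `CursorPos`, the unit classes and `SelectionDevice` are hypothetical names. The search unit here simply picks the cursor with the shortest straight-line distance to the user's coordinate.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Glyph:
    ch: str
    x: float      # left edge of the glyph
    y: float      # vertical centre of its line
    width: float


@dataclass
class CursorPos:
    index: int    # character offset the cursor sits before
    x: float
    y: float


class ConstructionUnit:
    """Builds the (simplified) cursor data structure from the glyphs."""
    def build(self, glyphs: List[Glyph]) -> List[CursorPos]:
        cursors: List[CursorPos] = []
        for i, g in enumerate(glyphs):
            cursors.append(CursorPos(i, g.x, g.y))                # before the glyph
            cursors.append(CursorPos(i + 1, g.x + g.width, g.y))  # after the glyph
        return cursors


class SearchUnit:
    """Matches a user coordinate to the nearest cursor position."""
    def nearest(self, cursors: List[CursorPos], p: Point) -> int:
        best = min(cursors, key=lambda c: math.hypot(c.x - p[0], c.y - p[1]))
        return best.index


class SelectionUnit:
    """Selects the text between two cursor positions, in either drag direction."""
    def select(self, glyphs: List[Glyph], a: int, b: int) -> str:
        lo, hi = sorted((a, b))
        return "".join(g.ch for g in glyphs[lo:hi])


class SelectionDevice:
    def __init__(self, glyphs: List[Glyph]) -> None:
        self.glyphs = glyphs
        self.cursors = ConstructionUnit().build(glyphs)
        self.search = SearchUnit()
        self.selection = SelectionUnit()

    def select(self, start: Point, end: Point) -> str:
        if not self.cursors:
            return ""  # empty text: nothing to select
        s = self.search.nearest(self.cursors, start)
        e = self.search.nearest(self.cursors, end)
        return self.selection.select(self.glyphs, s, e)
```

Sorting the two cursor indices in the selection unit means a backward drag selects the same text as a forward one.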
An embodiment of the present invention further provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above selection method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above selection method is implemented.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
While the embodiments of the present invention have been described with reference to the accompanying drawings, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of selection, comprising:
constructing a text data structure of a text, and establishing a cursor data structure corresponding to the text data structure;
receiving a starting point coordinate and an end point coordinate given by a user, searching the position of the starting point coordinate and the position of the end point coordinate in the text data structure, and respectively matching a corresponding starting point cursor position and an end point cursor position in the cursor data structure;
and determining a text range according to the starting point cursor position and the end point cursor position, and selecting text contents in the text range.
2. The selection method according to claim 1, wherein the constructing a text data structure of the text comprises:
identifying the text objects in the text, and constructing, level by level, text words, text lines, text paragraphs, text blocks, text streams, and a text board object;
and taking the constructed text board object as the text data structure.
3. The selection method according to claim 2, wherein the receiving a start coordinate and an end coordinate given by a user, finding a position of the start coordinate and a position of the end coordinate in the text data structure, and matching a corresponding start cursor position and an end cursor position in the cursor data structure respectively comprises:
finding, in all text streams, the first text block in which the starting point coordinate is located, and constructing a first text block candidate set; finding, in all text streams, the second text block in which the end point coordinate is located, and constructing a second text block candidate set;
finding, in the first text block candidate set, the first text paragraph in which the starting point coordinate is located, and constructing a first text paragraph candidate set; finding, in the second text block candidate set, the second text paragraph in which the end point coordinate is located, and constructing a second text paragraph candidate set;
finding, in the first text paragraph candidate set, the first text line in which the starting point coordinate is located, and constructing a first text line candidate set; finding, in the second text paragraph candidate set, the second text line in which the end point coordinate is located, and constructing a second text line candidate set;
finding, in the first text line candidate set, the first text word in which the starting point coordinate is located, and constructing a first text word candidate set; finding, in the second text line candidate set, the second text word in which the end point coordinate is located, and constructing a second text word candidate set;
finding, in the first text word candidate set, the first virtual character in which the starting point coordinate is located, and constructing a first virtual character candidate set; finding, in the second text word candidate set, the second virtual character in which the end point coordinate is located, and constructing a second virtual character candidate set;
based on the cursor data structure, selecting, from the first virtual character candidate set, the cursor positions closest to the starting point coordinate, and constructing a first cursor candidate set; selecting, from the second virtual character candidate set, the cursor positions closest to the end point coordinate, and constructing a second cursor candidate set;
selecting, from the first cursor candidate set, the cursor position with the shortest straight-line distance to the starting point coordinate as the starting point cursor position; and selecting, from the second cursor candidate set, the cursor position with the shortest straight-line distance to the end point coordinate as the end point cursor position.
4. The selection method according to claim 3, wherein the receiving a start coordinate and an end coordinate given by a user, finding a position of the start coordinate and a position of the end coordinate in the text data structure, and matching a corresponding start cursor position and an end cursor position in the cursor data structure, respectively, further comprises:
if the first virtual character in which the starting point coordinate is located is not found in the first text word candidate set, and the second virtual character in which the end point coordinate is located is not found in the second text word candidate set:
finding, in all text streams, the first text block closest to the starting point coordinate, and constructing a first text block candidate set; finding, in all text streams, the second text block closest to the end point coordinate, and constructing a second text block candidate set;
finding, in the first text block candidate set, the first text paragraph closest to the starting point coordinate, and constructing a first text paragraph candidate set; finding, in the second text block candidate set, the second text paragraph closest to the end point coordinate, and constructing a second text paragraph candidate set;
finding, in the first text paragraph candidate set, the first text line closest to the starting point coordinate, and constructing a first text line candidate set; finding, in the second text paragraph candidate set, the second text line closest to the end point coordinate, and constructing a second text line candidate set;
finding, in the first text line candidate set, the first text word closest to the starting point coordinate, and constructing a first text word candidate set; finding, in the second text line candidate set, the second text word closest to the end point coordinate, and constructing a second text word candidate set;
finding, in the first text word candidate set, the first virtual character closest to the starting point coordinate, and constructing a first virtual character candidate set; and finding, in the second text word candidate set, the second virtual character closest to the end point coordinate, and constructing a second virtual character candidate set.
5. The selection method according to claim 1, wherein the determining a text range according to the starting point cursor position and the ending point cursor position and selecting the text content in the text range comprises:
determining a corresponding text range according to the starting point cursor position and the end point cursor position;
based on the order of text lines recorded in the text data structure, if a text line does not fall entirely within the text range, acquiring the rectangular area of each character one by one, starting from the head of the text line or the starting point cursor position, merging these rectangular areas until the end of the line or the end point cursor position is reached, and adding the merged rectangular area to a highlight area set;
if a text line falls entirely within the text range, adding the rectangular area of the whole text line directly to the highlight area set;
and displaying a highlight effect in a user interface according to the rectangular areas in the highlight area set, or adding a text highlight annotation to the text.
6. The selection method according to claim 1, wherein determining a text range according to the starting point cursor position and the end point cursor position and selecting the text content within the text range further comprises:
determining a corresponding text range according to the starting point cursor position and the end point cursor position;
based on the order of text lines recorded in the text data structure, if a text line does not fall entirely within the text range, starting from the head of the text line or the starting point cursor position, taking the origin of the first character as the underline start point, then advancing the underline end point one character at a time until a font size change, the end of the line, or the end point cursor position is reached, and then adding the underline to an underline set;
taking the origin of the next character as the start point of a new underline, again until a font size change, the end of the line, or the end point cursor position is reached, and then adding that underline to the underline set;
and displaying an underline effect in a user interface according to the line segments in the underline set, or adding a text underline annotation to the text.
7. The selection method according to claim 1, wherein determining a text range according to the starting point cursor position and the end point cursor position and selecting the text content within the text range further comprises:
determining a corresponding text range according to the starting point cursor position and the end point cursor position;
based on the order of text lines recorded in the text data structure, if a text line does not fall entirely within the text range, starting from the head of the text line or the starting point cursor position, taking the origin of the first character as the start point of a strikethrough, then advancing the end point of the strikethrough one character at a time until a font size change, the end of the line, or the end point cursor position is reached, and then adding the strikethrough to a strikethrough set;
taking the origin of the next character as the start point of a new strikethrough, again until a font size change, the end of the line, or the end point cursor position is reached, and adding that strikethrough to the strikethrough set;
and displaying a strikethrough effect in a user interface according to the line segments in the strikethrough set, or adding a text strikethrough annotation to the text.
8. A selection device, comprising:
the construction unit is used for constructing a text data structure of a text and establishing a cursor data structure corresponding to the text data structure;
the searching unit is used for receiving a starting point coordinate and an end point coordinate given by a user, searching the position of the starting point coordinate and the position of the end point coordinate in the text data structure, and respectively matching a corresponding starting point cursor position and a corresponding end point cursor position in the cursor data structure;
and the selection unit is used for determining a text range according to the starting point cursor position and the end point cursor position and selecting the text content in the text range.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the selection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the selection method according to any one of claims 1 to 7.
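Claims 2 to 4 describe a coarse-to-fine hit test through the text hierarchy (text blocks, paragraphs, lines, words, virtual characters under the text streams). The hit test uses a nearest-distance fallback when the coordinate lies in no virtual character. The following is a minimal Python sketch, not the claimed implementation. It assumes every level exposes an axis-aligned bounding box and each virtual character carries its own `(index, (x, y))` cursor positions. `Node`, `LEVELS` and `locate_cursor` are hypothetical names. The sketch applies the fallback to each coordinate independently. It also collapses the two cursor-selection steps of claim 3 into a single nearest-cursor pick, which yields the same cursor.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    """One level of the hierarchy: stream, block, paragraph, line, word or virtual character."""
    x0: float
    y0: float
    x1: float
    y1: float
    children: List["Node"] = field(default_factory=list)
    cursors: List[Tuple[int, Point]] = field(default_factory=list)  # virtual characters only

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1

    def distance(self, p: Point) -> float:
        # Distance from p to the bounding box; 0 when p lies inside it.
        dx = max(self.x0 - p[0], 0.0, p[0] - self.x1)
        dy = max(self.y0 - p[1], 0.0, p[1] - self.y1)
        return math.hypot(dx, dy)


LEVELS = 5  # block, paragraph, line, word, virtual character below the text streams


def _descend(streams: List[Node], p: Point, nearest: bool) -> List[Node]:
    """Narrow the candidate set level by level; return the virtual-character candidates."""
    candidates = streams
    for _ in range(LEVELS):
        children = [c for n in candidates for c in n.children]
        if not children:
            return []
        if nearest:
            best = min(c.distance(p) for c in children)
            candidates = [c for c in children if c.distance(p) == best]
        else:
            candidates = [c for c in children if c.contains(p)]
        if not candidates:
            return []
    return candidates


def locate_cursor(streams: List[Node], p: Point) -> int:
    chars = _descend(streams, p, nearest=False)       # containment search (claim 3)
    if not chars:
        chars = _descend(streams, p, nearest=True)    # nearest-distance fallback (claim 4)
    cursors = [c for ch in chars for c in ch.cursors]
    if not cursors:
        raise ValueError("no cursor positions available")
    # Pick the cursor with the shortest straight-line distance to the coordinate.
    idx, _ = min(cursors, key=lambda c: math.hypot(c[1][0] - p[0], c[1][1] - p[1]))
    return idx
```

Running `locate_cursor` once for the starting point coordinate and once for the end point coordinate gives the two cursor positions that bound the text range. A click in the gap between two words fails the containment search, and the fallback then settles on the nearer word.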
CN202011471610.1A 2020-12-14 2020-12-14 Selection method, selection device, computer equipment and storage medium Pending CN112434495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011471610.1A CN112434495A (en) 2020-12-14 2020-12-14 Selection method, selection device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434495A 2021-03-02

Family

ID=74692631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011471610.1A Pending CN112434495A (en) 2020-12-14 2020-12-14 Selection method, selection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434495A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101326516A (en) * 2005-12-12 2008-12-17 微软公司 Selecting and formatting warped text
JP2010152500A (en) * 2008-12-24 2010-07-08 Fujitsu Ltd Text range selection processing program, method, and device
US20100293460A1 (en) * 2009-05-14 2010-11-18 Budelli Joe G Text selection method and system based on gestures
CN102880417A (en) * 2011-09-12 2013-01-16 微软公司 Dominant touch selection and the cursor is placed
CN106201255A (en) * 2016-06-30 2016-12-07 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN107977346A (en) * 2017-11-23 2018-05-01 万兴科技股份有限公司 A kind of PDF document edit methods and terminal device
CN108205415A (en) * 2016-12-19 2018-06-26 汉王科技股份有限公司 text selection method and device
CN108470021A (en) * 2018-03-26 2018-08-31 阿博茨德(北京)科技有限公司 The localization method and device of table in PDF document
CN109657220A (en) * 2018-12-11 2019-04-19 万兴科技股份有限公司 The online editing method, apparatus and electronic equipment of PDF document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination