CN116522876B - Method and device for implementing web-based PDF text annotation in the Firefox browser

Info

Publication number: CN116522876B
Application number: CN202310513079.7A
Authority: CN
Inventors: 张林辉; 何凡; 计雪莉; 王彬彬; 王云川; 潘海勇
Original assignee: Hubei Haichuang Zhiyun Technology Co ltd; Beijing Leadal Technology Development Co ltd
Current assignee: Hubei Haichuang Zhiyun Technology Co ltd; Beijing Leadal Technology Development Co ltd
Priority date: 2023-05-08
Filing date: 2023-05-08
Publication date: 2024-01-09
Anticipated expiration: 2043-05-08
Also published as: CN116522876A

Abstract

The invention provides a method and a device for implementing web-based PDF text annotation in the Firefox browser, belonging to the field of computer technology. The method comprises: using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes; judging whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and determining an annotation mode according to the result; if they are the same, setting the background color of the selected text directly through style; and if they are not the same, annotating the selected text according to a preset annotation algorithm. The method can annotate text across pages, the annotated text can be copied, and the approach offers greater flexibility, controllability and extensibility, making PDF operations more convenient for users.

Description

Method and device for implementing web-based PDF text annotation in the Firefox browser

Technical Field

The invention relates to the field of computer technology, and in particular to a method, a device, equipment and a storage medium for implementing web-based PDF text annotation in the Firefox browser.

Background

The PDF tool built into the Firefox browser provides only a preview function. The main existing solution on the market for annotating PDF text in a web page achieves the display effect of an annotation by switching canvas layers. This approach addresses annotation only as seen by the user; it does not solve PDF text annotation at its core.

The prior art has the following defects: the selected annotation content cannot be copied, and for multi-page PDFs the canvas approach cannot annotate text across pages.

Disclosure of Invention

The method, device, equipment and storage medium provided by the invention for implementing web-based PDF text annotation in the Firefox browser can annotate text across pages, the annotated text can be copied, and the approach offers greater flexibility, controllability and extensibility, making PDF operations more convenient for users.

In a first aspect, an embodiment of the present invention provides a method for implementing web-based PDF text annotation in the Firefox browser, the method including:

using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes;

judging whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and determining an annotation mode according to the result;

if the DOM node at the start of the drag and the DOM node at the end of the drag are the same, setting the background color of the selected text directly through style;

if the DOM node at the start of the drag and the DOM node at the end of the drag are not the same, annotating the selected text according to a preset annotation algorithm.

This technical solution is entirely different from computing directly on a canvas: the method annotates by working directly with the selected DOM nodes and their text coordinate information. The data structure produced by the original PDF parsing can be optimized and largely preserved, the text can be extracted, the annotation color can be customized freely, a convenient API is provided for secondary developers, and repeated annotation and assignment can be carried out easily.
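The choice between the two annotation modes can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `chooseAnnotationMode` and the mode labels `"style"` and `"algorithm"` are hypothetical, and plain objects stand in for DOM nodes so the sketch also runs outside a browser.

```javascript
// Hypothetical helper: pick the annotation mode from the two drag endpoints.
// "style"     -> start and end are the same DOM node; set the color via style.
// "algorithm" -> start and end differ; use the preset annotation algorithm.
function chooseAnnotationMode(startNode, endNode) {
  // Identity comparison: the same node object means a single-node selection.
  return startNode === endNode ? "style" : "algorithm";
}

// Plain objects stand in for DOM nodes (assumption for illustration).
const modeNodeA = { textContent: "I am the start node" };
const modeNodeB = { textContent: "I am the end node" };

console.log(chooseAnnotationMode(modeNodeA, modeNodeA)); // "style"
console.log(chooseAnnotationMode(modeNodeA, modeNodeB)); // "algorithm"
```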

Optionally, if the DOM node at the start of the drag and the DOM node at the end of the drag are not the same, annotating the selected text according to the preset annotation algorithm includes:

setting, through style, the background color of the selected text in the DOM nodes lying between the DOM node at the start of the drag and the DOM node at the end of the drag;

annotating the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to a head-tail cutting algorithm.
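As a rough sketch of how the cross-node case divides the work, the function below labels each node in the selected range. It assumes the nodes of the text layer are held in document order and that the start index is already smaller than the end index; the name `planCrossNodeAnnotation` and the action labels are hypothetical.

```javascript
// Hypothetical planner for a cross-node selection.
// startIndex / endIndex are positions of the drag start and end nodes in an
// ordered list of nodes; the function assumes startIndex < endIndex.
function planCrossNodeAnnotation(startIndex, endIndex) {
  if (!(startIndex >= 0 && startIndex < endIndex)) {
    throw new RangeError("expected 0 <= startIndex < endIndex");
  }
  const plan = [];
  for (let i = startIndex; i <= endIndex; i++) {
    if (i === startIndex) {
      plan.push({ index: i, action: "cut-head" });    // partially selected start node
    } else if (i === endIndex) {
      plan.push({ index: i, action: "cut-tail" });    // partially selected end node
    } else {
      plan.push({ index: i, action: "style-whole" }); // fully selected intermediate node
    }
  }
  return plan;
}

console.log(planCrossNodeAnnotation(1, 4).map((step) => step.action));
// [ 'cut-head', 'style-whole', 'style-whole', 'cut-tail' ]
```

Only the first and last entries need the head-tail cutting algorithm; every node in between is colored as a whole through style.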

Optionally, annotating the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to the head-tail cutting algorithm includes:

cutting the HTML string into slices using the slice method of JavaScript, according to the DOM node at the start of the drag and the text coordinates of the selected content within it, and the DOM node at the end of the drag and the text coordinates of the selected content within it;

splicing in new DOM nodes at the cutting points, and setting the background color on the spliced-in DOM nodes;

wherein a cutting point is the boundary between the selected content and the unselected content within the DOM node at the start of the drag or the DOM node at the end of the drag.

Optionally, the method further includes:

acquiring node data corresponding to the selected text;

wherein the node data includes: the DOM nodes corresponding to the selected text and the text coordinates of the selected text;

calculating position information of the DOM nodes corresponding to the selected text according to the node data;

wherein the position information includes: a page number and a node number;

the position information is used to locate the DOM nodes when an annotation style is set and when an annotation is displayed again.
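One way to picture the position information is as a (page number, node number) pair that can be stored and later resolved back to a node. The sketch below is an assumption about how this could look, not the method disclosed here: pages are modeled as arrays of nodes, page numbers are 1-based and node numbers 0-based by choice, and the names `computePosition` and `resolvePosition` are hypothetical.

```javascript
// Hypothetical model: `pages` is an array of pages, each an ordered array of nodes.
// Returns { pageNumber, nodeNumber } for a node, or null if it is not found.
function computePosition(pages, node) {
  for (let p = 0; p < pages.length; p++) {
    const n = pages[p].indexOf(node);
    if (n !== -1) {
      return { pageNumber: p + 1, nodeNumber: n }; // 1-based page, 0-based node
    }
  }
  return null;
}

// Resolves a stored position back to the node, e.g. when redisplaying an annotation.
function resolvePosition(pages, position) {
  const page = pages[position.pageNumber - 1];
  if (!page || position.nodeNumber < 0 || position.nodeNumber >= page.length) {
    return null;
  }
  return page[position.nodeNumber];
}

// Plain objects stand in for DOM nodes (assumption for illustration).
const positionPages = [
  [{ text: "page 1, node 0" }, { text: "page 1, node 1" }],
  [{ text: "page 2, node 0" }],
];
const storedPosition = computePosition(positionPages, positionPages[1][0]);
console.log(storedPosition); // { pageNumber: 2, nodeNumber: 0 }
console.log(resolvePosition(positionPages, storedPosition).text); // "page 2, node 0"
```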

In a second aspect, an embodiment of the present invention provides a device for implementing web-based PDF text annotation in the Firefox browser, the device including:

the acquisition module is used for obtaining, using the window.getSelection method, the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes;

the determining module is used for judging whether the DOM node when the mouse starts to drag and the DOM node when the mouse finishes dragging are the same or not, and determining a labeling mode according to a judging result;

Optionally, the apparatus further comprises:

the computing module is used for acquiring node data corresponding to the selected text; calculating the position information of DOM nodes corresponding to the selected text according to the node data;

wherein the node data includes: selecting DOM nodes corresponding to the text and text coordinates of the selected text; the location information includes: page number and node number;

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the method according to any implementation manner of the first aspect when executing the program.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the implementations of the first aspect.

The invention provides a method and a device for implementing web-based PDF text annotation in the Firefox browser. The method comprises: using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes; judging whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and determining an annotation mode according to the result; if they are the same, setting the background color of the selected text directly through style; and if they are not the same, annotating the selected text according to a preset annotation algorithm. The method can annotate text across pages, the annotated text can be copied, and the approach offers greater flexibility, controllability and extensibility, making PDF operations more convenient for users.

It should be understood that the description in this summary is not intended to identify key or essential features of the embodiments of the invention, nor is it intended to limit the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

The above and other features, advantages and aspects of embodiments of the present invention will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements.

FIG. 1 is a flowchart of a method for implementing web-based PDF text annotation in the Firefox browser according to an embodiment of the invention;

FIG. 2 is a schematic structural diagram of a device for implementing web-based PDF text annotation in the Firefox browser according to an embodiment of the present invention;

FIG. 3 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to enable a person skilled in the art to better understand the technical solutions in one or more embodiments of this specification, those technical solutions are described clearly and completely below with reference to the drawings in one or more embodiments of this specification. The described embodiments are evidently only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from one or more embodiments of this specification without inventive effort are intended to fall within the scope of protection of the present disclosure.

It should be noted that, the description of the embodiment of the present invention is only for the purpose of more clearly describing the technical solution of the embodiment of the present invention, and does not constitute a limitation on the technical solution provided by the embodiment of the present invention.

FIG. 1 is a flowchart of a method for implementing web-based PDF text annotation in the Firefox browser according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

S101, using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes.

Optionally, calling the window.getSelection method returns a Selection object, which represents the range of text selected by the user or the current position of the cursor.
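Step S101 can be illustrated with the standard properties of a Selection object (`anchorNode`, `anchorOffset`, `focusNode`, `focusOffset`, `isCollapsed`, `rangeCount`). The sketch below is one possible reading of the step, not the patented code: the helper name `readDragEndpoints` is hypothetical, and a plain object stands in for the Selection so the example also runs outside a browser. In a page it would be called as `readDragEndpoints(window.getSelection())`.

```javascript
// Hypothetical helper: read the drag endpoints from a Selection-like object.
function readDragEndpoints(selection) {
  // Nothing is selected when there is no range or the selection is collapsed.
  if (!selection || selection.rangeCount === 0 || selection.isCollapsed) {
    return null;
  }
  return {
    startNode: selection.anchorNode,     // node where the drag began
    startOffset: selection.anchorOffset, // character offset inside startNode
    endNode: selection.focusNode,        // node where the drag ended
    endOffset: selection.focusOffset,    // character offset inside endNode
  };
}

// Plain-object stand-in for a Selection (assumption for illustration).
const fakeSelection = {
  rangeCount: 1,
  isCollapsed: false,
  anchorNode: { textContent: "I am the start node" },
  anchorOffset: 5,
  focusNode: { textContent: "I am the end node" },
  focusOffset: 12,
};

const dragEndpoints = readDragEndpoints(fakeSelection);
console.log(dragEndpoints.startOffset, dragEndpoints.endOffset); // 5 12
```

Note that the anchor is where the user began the selection, so for a backward drag the anchor node comes after the focus node in document order; a fuller implementation would normalize the order before annotating.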

S102, judging whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and determining the annotation mode according to the result.

Optionally, if the DOM node at the start of the drag and the DOM node at the end of the drag are the same, the background color of the selected text is set directly through style.

For example, the background color of the text or the text color may be set when the text is annotated. The method is not limited to setting the background color and the text color, however; it may also be used to add notes or marks.

Optionally, the slice method extracts a portion of a string and returns the extracted portion as a new string. The portion to extract is specified using the start (inclusive) and end (exclusive) parameters, where the first character of the string is at position 0, the second at position 1, and so on.

Optionally, if a parameter is negative, it specifies a position counted from the end of the string.

Illustratively, -1 refers to the last character of the string, -2 refers to the penultimate character, and so on.
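The indexing rules just described can be checked with a few calls to `String.prototype.slice`. The sample string is an English rendering chosen for illustration.

```javascript
const sliceSample = "I am the start node";

// start is inclusive, end is exclusive; positions begin at 0.
console.log(sliceSample.slice(0, 4)); // "I am"

// Omitting end extracts through the end of the string.
console.log(sliceSample.slice(5));    // "the start node"

// Negative positions count back from the end: -1 is the last character.
console.log(sliceSample.slice(-1));   // "e"
console.log(sliceSample.slice(-4));   // "node"
```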

Illustratively, the text content may be as follows: "A test document, I am the start node, I am the first intermediate node, I am the second intermediate node, I am the third intermediate node, I am the fourth intermediate node, I am the fifth intermediate node, I am the sixth intermediate node, I am the seventh intermediate node, I am the end node."

The content selected by the user during annotation is: "the start node, I am the first intermediate node, I am the second intermediate node, I am the third intermediate node, I am the fourth intermediate node, I am the fifth intermediate node, I am the sixth intermediate node, I am the seventh intermediate node, I am the end".

For "I am the first intermediate node, I am the second intermediate node, I am the third intermediate node, I am the fourth intermediate node, I am the fifth intermediate node, I am the sixth intermediate node, I am the seventh intermediate node", the background color of the text can be set through style.

"the start node" and "I am the end", however, are not the entire content of their DOM nodes, so the HTML string is cut into slices using the slice method of JavaScript. The text after cutting is: "I am", "the start node", "I am the end", "node".

New DOM nodes are then spliced in at the cutting points, that is, between "I am" and "the start node" and between "I am the end" and "node", and the background color is set on the spliced-in DOM nodes.
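The head and tail cuts in this example can be sketched with `slice` as follows. This is a simplified illustration under stated assumptions: the node contents are plain text with no nested markup or characters needing HTML escaping, the English node texts "I am the start node" and "I am the end node" are renderings chosen for illustration, a `span` element is used as the spliced-in node, and the names `highlightSpan`, `cutHead` and `cutTail` are hypothetical.

```javascript
// Wrap a piece of text in a span that carries the annotation background color.
function highlightSpan(text, color) {
  return '<span style="background-color:' + color + '">' + text + "</span>";
}

// Start node: text before `offset` is unselected, text from `offset` on is selected.
function cutHead(text, offset, color) {
  return text.slice(0, offset) + highlightSpan(text.slice(offset), color);
}

// End node: text before `offset` is selected, the remainder is unselected.
function cutTail(text, offset, color) {
  return highlightSpan(text.slice(0, offset), color) + text.slice(offset);
}

const headHtml = cutHead("I am the start node", 5, "yellow");
const tailHtml = cutTail("I am the end node", 12, "yellow");

console.log(headHtml);
// I am <span style="background-color:yellow">the start node</span>
console.log(tailHtml);
// <span style="background-color:yellow">I am the end</span> node
```

Because the text stays in the DOM rather than being painted on a canvas, the highlighted content remains selectable and can be copied.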

Optionally, the method for implementing web-based PDF text annotation in the Firefox browser further includes:

acquiring node data corresponding to the selected text;

wherein the position information includes: a page number and a node number; the position information is used to locate the DOM nodes when an annotation style is set and when an annotation is displayed again.

The embodiment of the invention provides a method for implementing web-based PDF text annotation in the Firefox browser. The method comprises: using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes; judging whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and determining an annotation mode according to the result; if they are the same, setting the background color of the selected text directly through style; and if they are not the same, annotating the selected text according to a preset annotation algorithm. The method can annotate text across pages, the annotated text can be copied, and the approach offers greater flexibility, controllability and extensibility, making PDF operations more convenient for users.

The following describes in detail a device provided by an embodiment of the present application that can execute the above method for implementing web-based PDF text annotation in the Firefox browser.

FIG. 2 is a schematic structural diagram of a device for implementing web-based PDF text annotation in the Firefox browser according to an embodiment of the present invention. As shown in FIG. 2, the annotation device 20 includes:

the acquisition module 201 is configured to obtain, using the window.getSelection method, the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes;

the determining module 202 is configured to judge whether the DOM node at the start of the drag and the DOM node at the end of the drag are the same, and to determine an annotation mode according to the result;

Optionally, the labeling device 20 further comprises:

the computing module 203 is configured to obtain node data corresponding to the selected text, and to calculate the position information of the DOM nodes corresponding to the selected text according to the node data;

Optionally, the determining module 202 is further configured to set, through style, the background color of the selected text in the DOM nodes lying between the DOM node at the start of the drag and the DOM node at the end of the drag, and to annotate the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to the head-tail cutting algorithm.

Optionally, the determining module 202 is further configured to cut the HTML string into slices using the slice method of JavaScript, according to the DOM node at the start of the drag and the text coordinates of the selected content within it, and the DOM node at the end of the drag and the text coordinates of the selected content within it; and to splice in new DOM nodes at the cutting points and set the background color on the spliced-in DOM nodes.

An embodiment of the present invention also provides an electronic device. FIG. 3 shows a schematic structural diagram of an electronic device to which the embodiment of the present invention can be applied. As shown in FIG. 3, the electronic device includes a central processing unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for system operation. The CPU 301, ROM 302, and RAM 303 are connected to each other through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output portion 307 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 308 including a hard disk or the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. The drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 310 as needed, so that a computer program read out therefrom is installed into the storage section 308 as needed.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, a processor may be described as including an acquisition module 201 and a determining module 202. The names of these modules do not in some cases limit the modules themselves; for example, the acquisition module 201 may also be described as "an acquisition module 201 for obtaining, using the window.getSelection method, the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes".

As another aspect, the present invention further provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the device for implementing web-based PDF text annotation in the Firefox browser described in the above embodiment, or it may be a standalone computer-readable storage medium that is not incorporated into an electronic device. The computer-readable storage medium stores one or more programs that are used by one or more processors to perform the method for implementing web-based PDF text annotation in the Firefox browser described in the present invention.

The above description covers only the preferred embodiments of the present invention and the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features disclosed in the present invention (but not limited thereto) that have similar functions.

Claims

1. A method for implementing web-based PDF text annotation in the Firefox browser, characterized by comprising the following steps:

using the window.getSelection method, obtaining the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes;

if the DOM node at the start of the drag and the DOM node at the end of the drag are not the same, annotating the selected text according to a preset annotation algorithm;

wherein, if the DOM node at the start of the drag and the DOM node at the end of the drag are not the same, annotating the selected text according to the preset annotation algorithm comprises:

setting, through style, the background color of the selected text in the DOM nodes lying between the DOM node at the start of the drag and the DOM node at the end of the drag;

annotating the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to a head-tail cutting algorithm;

wherein annotating the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to the head-tail cutting algorithm comprises:

cutting the HTML string into slices using the slice method of JavaScript, according to the DOM node at the start of the drag and the text coordinates of the selected content within it, and the DOM node at the end of the drag and the text coordinates of the selected content within it;

splicing in new DOM nodes at the cutting points, and setting the background color on the spliced-in DOM nodes; wherein a cutting point is the boundary between the selected content and the unselected content within the DOM node at the start of the drag or the DOM node at the end of the drag.

2. The method for implementing web-based PDF text annotation in the Firefox browser according to claim 1, wherein the method further comprises:

acquiring node data corresponding to the selected text; the node data includes: the DOM node corresponding to the selected text and the text coordinates of the selected text;

calculating the position information of the DOM node corresponding to the selected text according to the node data; the location information includes: page number and node number;

3. A device for implementing web-based PDF text annotation in the Firefox browser, characterized by comprising:

the acquisition module is used for obtaining, using the window.getSelection method, the DOM node at which the mouse drag starts, the DOM node at which the mouse drag ends, and the text coordinates of the selected content within those DOM nodes;

the determining module is further used for setting, through style, the background color of the selected text in the DOM nodes lying between the DOM node at the start of the drag and the DOM node at the end of the drag; and annotating the selected text in the DOM node at the start of the drag and in the DOM node at the end of the drag according to a head-tail cutting algorithm;

the determining module is further used for cutting the HTML string into slices using the slice method of JavaScript, according to the DOM node at the start of the drag and the text coordinates of the selected content within it, and the DOM node at the end of the drag and the text coordinates of the selected content within it; splicing in new DOM nodes at the cutting points, and setting the background color on the spliced-in DOM nodes; wherein a cutting point is the boundary between the selected content and the unselected content within the DOM node at the start of the drag or the DOM node at the end of the drag.

4. The device for implementing web-based PDF text annotation in the Firefox browser according to claim 3, further comprising:

the computing module is used for acquiring node data corresponding to the selected text; calculating the position information of the DOM node corresponding to the selected text according to the node data; the node data includes: the DOM node corresponding to the selected text and the text coordinates of the selected text; the location information includes: page number and node number;

5. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of claim 1 or 2 when executing the computer program.

6. A computer-readable storage medium, characterized in that a computer program is stored, which, when being executed by a processor, implements the method according to claim 1 or 2.