CN116384346A - Text replacement method, device, terminal and medium based on HTML format

Publication number: CN116384346A
Authority: CN (China)
Prior art keywords: text, dom, question, proofreading, replacing
Legal status: Pending (an assumption as listed by Google Patents, not a legal conclusion)
Application number: CN202211663274.XA
Other languages: Chinese (zh)
Inventors: 麦淼, 范玉平, 张仲凯, 张桂梅
Current Assignee: Guangdong Southern New Media Technology Co ltd (as listed; the list may be inaccurate)
Original Assignee: Guangdong Southern New Media Technology Co ltd
Priority date: 2022-12-23 (an assumption, not a legal conclusion)
Filing date: 2022-12-23
Publication date: 2023-07-04

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a text replacement method, device, terminal and medium based on the HTML format. The method comprises: extracting text from the HTML DOM tree; performing word segmentation and proofreading on the text to obtain proofreading data, where the proofreading data includes problem text information, the problem text position, suggested modification text and the problem type; determining the DOM position corresponding to the problem text based on the proofreading data; and replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain the target modification text. Compared with the prior art, the invention extracts text from the HTML DOM tree, determines the DOM position corresponding to the problem text from the proofreading data, and replaces the text at that position. This enables proofreading of hypertext markup language, gives higher proofreading efficiency than the regular-expression-based prior art, and effectively avoids text misalignment during replacement.

Description

Text replacement method, device, terminal and medium based on HTML format
Technical Field
The present invention relates to the field of text processing technologies, and in particular to a text replacement method, device, terminal device and computer-readable storage medium based on the HTML format.
Background
At present, many jobs involve a large amount of text processing work, which mainly consists of mechanical operations such as entering, deleting, replacing and searching text.
For text replacement, the prior art mainly uses regular expressions to replace text directly, but this approach can misalign the replaced text. In addition, it can proofread only plain text and not hypertext markup language (HTML), so proofreading efficiency is low.
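One way to picture the misalignment, offered as an illustrative sketch and not as the prior art's actual code: proofreading positions are computed on plain text, so applying them directly to the raw HTML string hits markup instead of the intended words. The sample markup and the inclusive end offset are assumptions chosen for illustration.

```python
import re

html = "<span>b</span><span>aaa</span>"

# Strip tags with a regular expression to get the plain text a proofreader would see.
plain = re.sub(r"<[^>]+>", "", html)

# Offsets of the problem text within the plain text (end offset treated as inclusive).
pos, end_pos = 1, 3
in_plain = plain[pos:end_pos + 1]

# Applying the same offsets to the raw HTML lands inside the first tag instead.
in_html = html[pos:end_pos + 1]

print(plain, in_plain, in_html)
```

The offsets select the intended characters only in the plain text; in the markup they select part of a tag name, which is why a position-aware lookup in the DOM is needed.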
Disclosure of Invention
The invention provides a text replacement method, device, terminal device and computer-readable storage medium based on the HTML format, which solve the prior-art problem of text misalignment during replacement and improve proofreading efficiency.
To solve the above technical problems, an embodiment of the present invention provides a text replacement method based on the HTML format, comprising:
extracting text from the HTML DOM tree;
performing word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type;
determining the DOM position corresponding to the problem text based on the proofreading data; and replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
Preferably, replacing the problem text through a simulated mouse operation to obtain the target modification text specifically comprises:
modifying the text node of the problem text into a DOM node so as to simulate a mouse operation, and replacing the problem text with the target modification text.
Preferably, the target modification text is the suggested modification text;
or, in response to a user input instruction, the target modification text is obtained from the input instruction.
Preferably, determining the DOM position corresponding to the problem text specifically comprises: determining the DOM position corresponding to the problem text through a depth-first search algorithm.
Correspondingly, an embodiment of the invention also provides a text replacement device based on the HTML format, which comprises an extraction module, a proofreading module and a replacement module, wherein:
the extraction module is used for extracting text from the HTML DOM tree;
the proofreading module is used for performing word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type;
the replacement module is used for determining the DOM position corresponding to the problem text based on the proofreading data, and for replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
Preferably, the replacement module replacing the problem text through a simulated mouse operation to obtain the target modification text specifically comprises:
the replacement module modifying the text node of the problem text into a DOM node so as to simulate a mouse operation, and replacing the problem text with the target modification text.
Preferably, the target modification text is the suggested modification text;
or, the replacement module is further used for responding to a user input instruction and obtaining the target modification text from the input instruction.
Preferably, the replacement module determining the DOM position corresponding to the problem text specifically comprises: the replacement module determining the DOM position corresponding to the problem text through a depth-first search algorithm.
Correspondingly, an embodiment of the invention also provides a terminal device, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the HTML-format-based text replacement method when executing the computer program.
Correspondingly, an embodiment of the invention also provides a computer-readable storage medium, which comprises a stored computer program, wherein, when the computer program runs, the device where the computer-readable storage medium is located is controlled to execute the HTML-format-based text replacement method.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a text replacement method, device, terminal device and computer-readable storage medium based on the HTML format. The text replacement method comprises: extracting text from the HTML DOM tree; performing word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type; determining the DOM position corresponding to the problem text based on the proofreading data; and replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text. Compared with the prior art, the invention extracts text from the HTML DOM tree, determines the DOM position corresponding to the problem text from the proofreading data, and replaces the text at that position. This enables proofreading of hypertext markup language, gives higher proofreading efficiency than the regular-expression-based prior art, and effectively avoids text misalignment during replacement.
Further, the problematic DOM position is found with a depth-first search algorithm, which is well matched to the tree structure of the DOM and can further improve text search efficiency.
Drawings
Fig. 1: a schematic flow chart of an embodiment of the HTML-format-based text replacement method provided by the invention.
Fig. 2: a schematic diagram of the text in an embodiment provided by the invention.
Fig. 3: a schematic diagram of the proofreading data returned by the interface in an embodiment provided by the invention.
Fig. 4: a schematic diagram of the depth-first search algorithm in an embodiment provided by the invention.
Fig. 5: an implementation schematic diagram of selecting erroneous text with the simulated mouse in an embodiment provided by the invention.
Fig. 6: an attribute schematic diagram of selecting erroneous text with the simulated mouse in an embodiment provided by the invention.
Fig. 7: a schematic structural diagram of an embodiment of the HTML-format-based text replacement device provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Embodiment one:
Referring to Fig. 1, a text replacement method based on the HTML format according to an embodiment of the present invention includes steps S1 to S3:
Step S1: extract text from the HTML DOM tree.
Step S2: perform word segmentation and proofreading on the text to obtain proofreading data, where the proofreading data includes problem text information, the problem text position, suggested modification text and the problem type.
Step S3: determine the DOM position corresponding to the problem text based on the proofreading data, and replace the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
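The proofreading data of step S2 can be pictured as a list of records, one per problem found. The sketch below is only an illustration under stated assumptions: `pos` and `end_pos` are the field names this description uses for Fig. 3, while `problem_text`, `suggestion`, `problem_type`, the `proofread` function and its `known_errors` dictionary are hypothetical stand-ins for the real proofreading service, whose interface the patent does not specify. The end position is treated as inclusive, matching the aaa example given later in this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProofreadingItem:
    problem_text: str   # the text judged to be wrong (hypothetical field name)
    pos: int            # start offset in the extracted text
    end_pos: int        # end offset, assumed inclusive
    suggestion: str     # suggested modification text (hypothetical field name)
    problem_type: str   # category of the problem (hypothetical field name)


def proofread(text: str, known_errors: Dict[str, Tuple[str, str]]) -> List[ProofreadingItem]:
    """Toy stand-in for the backend: flag every occurrence of a known error."""
    items = []
    for wrong, (right, kind) in known_errors.items():
        start = text.find(wrong)
        while start != -1:
            items.append(ProofreadingItem(wrong, start, start + len(wrong) - 1, right, kind))
            start = text.find(wrong, start + len(wrong))
    # Report problems in document order.
    return sorted(items, key=lambda item: item.pos)


items = proofread("baaa", {"aaa": ("abc", "typo")})
print(items)
```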
It should be noted that HTML stands for hypertext markup language. It comprises a series of tags that unify document formatting, so that distributed resources can be connected into one logical whole. HTML text is descriptive text made up of HTML commands.
In this embodiment, step S1 may extract the text from the HTML DOM tree through the editor and send it to the backend. Step S2 then obtains the proofreading data by performing word segmentation and proofreading on the text. The proofreading data describes the problem text and specifically includes the problem text information, the problem text position, the suggested modification text and the problem type. The suggested modification text is the suggested final result of the modification. For the text sent in the request, see Fig. 2; for the proofreading data returned by the interface, see Fig. 3 (for illustration only).
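As a rough sketch of the extraction in step S1, the text can be collected from the markup in document order while recording where each text node starts, so that positions in the proofreading data can later be mapped back to nodes. This uses Python's standard `html.parser` module in place of the in-browser editor the embodiment describes; the class and function names are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TextExtractor(HTMLParser):
    """Collects text nodes in document order with their start offsets."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: List[Tuple[int, str]] = []  # (start offset, text of the node)
        self.length = 0                          # total characters seen so far

    def handle_data(self, data: str) -> None:
        self.chunks.append((self.length, data))
        self.length += len(data)


def extract_text(html: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Return the concatenated text and the per-node offset table."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(text for _, text in parser.chunks), parser.chunks


text, chunks = extract_text("<p><span>b</span><span>aaa</span></p>")
print(text, chunks)
```

The concatenated string is what would be sent for proofreading; the offset table is what makes the returned positions meaningful against the tree.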
Further, determining the DOM position corresponding to the problem text specifically comprises determining it through a depth-first search (DFS) algorithm. As shown in Fig. 3, pos is the start position of the text and end_pos is its end position.
The depth-first search (DFS) algorithm (see Fig. 4) can be understood as searching paths in the DOM: starting from the root node and proceeding from the left, nodes are visited in the order root node, left node, right node. The returned pos (text start position) and end_pos (text end position) are then used to decide which node is the target DOM.
For example, suppose the text aaa at pos: 1 and end_pos: 3 is problematic. The target DOM then has to be found within the DOM <span>b</span><span>aaa</span>. Because the DOM is tree-structured, the target DOM is found by depth-first search; a mouse operation is then simulated, and the target DOM is modified in code, for example by changing its node attributes and adding styles. DOM is an abbreviation for Document Object Model, a browser-, platform- and language-independent interface; under the W3C DOM specification, XML can be modified dynamically. It expresses an HTML document as a tree structure. Searching in this way is therefore well matched to the tree structure of the DOM, which improves the efficiency, speed and accuracy of text search.
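The search described above can be sketched as a pre-order depth-first traversal that keeps a running character offset and reports the text nodes overlapping the problem range. This is a minimal model under stated assumptions: the `Node` class is a hypothetical stand-in for real DOM nodes, `locate` is an illustrative name, and `end_pos` is treated as inclusive, as the aaa example (pos 1, end_pos 3) suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    tag: Optional[str] = None              # None marks a text node
    text: str = ""                         # content, used by text nodes only
    children: List["Node"] = field(default_factory=list)


def locate(root: Node, pos: int, end_pos: int) -> List[Tuple[Node, int, int]]:
    """Return (text node, start, end) triples covering [pos, end_pos].

    start and end are offsets inside the node; end is exclusive.
    """
    hits = []
    offset = 0                 # characters of text seen so far
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag is None:
            start, end = offset, offset + len(node.text)
            lo, hi = max(start, pos), min(end, end_pos + 1)
            if lo < hi:        # this text node overlaps the problem range
                hits.append((node, lo - start, hi - start))
            offset = end
            if offset > end_pos:
                break          # the whole range has been covered
        else:
            # Push children reversed so the leftmost child is visited first.
            stack.extend(reversed(node.children))
    return hits


target = Node(text="aaa")
tree = Node("p", children=[
    Node("span", children=[Node(text="b")]),
    Node("span", children=[target]),
])
hits = locate(tree, 1, 3)
print(hits)
```

Stopping as soon as the running offset passes `end_pos` means later subtrees are never visited, which is where a tree-aware search saves work compared with scanning the whole document.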
After the DOM position corresponding to the problem text (that is, the target DOM) is determined, the problem text is replaced through a simulated mouse operation according to the proofreading data and the DOM position. The simulated mouse of this embodiment uses functionality of the Chrome browser: it selects the erroneous text through getSelection and getRangeAt (see Figs. 5 and 6) and modifies the erroneous text by using a document.
It should be noted that, in Fig. 6, startContainer and startOffset give the start node and its offset, endContainer and endOffset give the end node and its offset, and commonAncestorContainer is the nearest common ancestor node of all the nodes.
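The effect of replacing a selected range can be modeled outside the browser as follows. This is a simplified sketch, not the browser's selection API: `TextNode` and `replace_range` are hypothetical names, the text nodes are given as a flat list in document order, and the four range parameters mirror the start node and offset and the end node and offset described for Fig. 6.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextNode:
    data: str


def replace_range(nodes: List[TextNode], start: int, start_offset: int,
                  end: int, end_offset: int, replacement: str) -> None:
    """Replace the text between (start, start_offset) and (end, end_offset).

    start and end are indexes into nodes; end_offset is exclusive.
    """
    if start == end:
        node = nodes[start]
        node.data = node.data[:start_offset] + replacement + node.data[end_offset:]
        return
    # The replacement goes into the start node; the rest of the range is removed.
    nodes[start].data = nodes[start].data[:start_offset] + replacement
    for i in range(start + 1, end):
        nodes[i].data = ""
    nodes[end].data = nodes[end].data[end_offset:]


nodes = [TextNode("b"), TextNode("aaa")]
replace_range(nodes, 1, 0, 1, 3, "abc")
print([n.data for n in nodes])
```

Because the edit is expressed against nodes and offsets, the surrounding markup is never touched, which is the property the embodiment relies on to avoid misalignment.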
As a further preferred embodiment, the target modification text may be the suggested modification text; alternatively, in response to a user input instruction, the target modification text is obtained from that instruction. This embodiment thus provides two modification modes. In the first, the suggested modification text is used directly as the target modification text, which reduces the user's workload and makes search and replacement fully automatic. In the second, the user provides input: the input instruction contains the modification text, and the replacement is performed in response to that instruction, which gives precise text replacement.
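The two modes reduce to a small selection rule, sketched here with a hypothetical function name and parameters. Treating an empty string from the user as a deliberate choice, not as missing input, is an assumption of this sketch.

```python
from typing import Optional


def choose_target_text(suggested: str, user_input: Optional[str] = None) -> str:
    """Return the user's text when one was entered, otherwise the suggestion."""
    if user_input is not None:
        return user_input
    return suggested


print(choose_target_text("abc"))          # automatic mode
print(choose_target_text("abc", "abd"))   # user-input mode
```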
Correspondingly, referring to Fig. 7, an embodiment of the invention also provides a text replacement device based on the HTML format, which comprises an extraction module 101, a proofreading module 102 and a replacement module 103, wherein:
the extraction module 101 is configured to extract text from the HTML DOM tree;
the proofreading module 102 is configured to perform word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type;
the replacement module 103 is configured to determine the DOM position corresponding to the problem text based on the proofreading data, and to replace the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
As a preferred embodiment, the replacement module 103 replacing the problem text through a simulated mouse operation to obtain the target modification text specifically comprises:
the replacement module 103 modifying the text node of the problem text into a DOM node so as to simulate a mouse operation, and replacing the problem text with the target modification text.
As a preferred embodiment, the target modification text is the suggested modification text;
or, the replacement module 103 is further configured to respond to a user input instruction and obtain the target modification text from the input instruction.
As a preferred embodiment, the replacement module 103 determining the DOM position corresponding to the problem text specifically comprises: the replacement module 103 determining the DOM position corresponding to the problem text through a depth-first search algorithm.
Correspondingly, an embodiment of the invention also provides a terminal device, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the HTML-format-based text replacement method when executing the computer program.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the terminal and connects the parts of the entire terminal through various interfaces and lines.
The memory may be used to store the computer program, and the processor implements the various functions of the terminal by running or executing the computer program stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the terminal (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
Correspondingly, an embodiment of the invention also provides a computer-readable storage medium, which comprises a stored computer program, wherein, when the computer program runs, the device where the computer-readable storage medium is located is controlled to execute the HTML-format-based text replacement method.
If the modules integrated in the HTML-format-based text replacement device are implemented as software functional units and sold or used as a separate product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method of the above embodiment may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The foregoing embodiments are provided to illustrate the general principles of the present invention and are not to be construed as limiting its scope. Any modifications, equivalent substitutions, improvements and the like made by those skilled in the art without departing from the spirit and principles of the present invention are intended to fall within its scope.

Claims (10)

1. A text replacement method based on the HTML format, comprising:
extracting text from the HTML DOM tree;
performing word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type;
determining the DOM position corresponding to the problem text based on the proofreading data; and replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
2. The HTML-format-based text replacement method of claim 1, wherein replacing the problem text through a simulated mouse operation to obtain the target modification text specifically comprises:
modifying the text node of the problem text into a DOM node so as to simulate a mouse operation, and replacing the problem text with the target modification text.
3. The HTML-format-based text replacement method of claim 2, wherein the target modification text is the suggested modification text;
or, in response to a user input instruction, the target modification text is obtained from the input instruction.
4. The HTML-format-based text replacement method of any one of claims 1 to 3, wherein determining the DOM position corresponding to the problem text specifically comprises: determining the DOM position corresponding to the problem text through a depth-first search algorithm.
5. A text replacement device based on the HTML format, characterized by comprising an extraction module, a proofreading module and a replacement module, wherein:
the extraction module is used for extracting text from the HTML DOM tree;
the proofreading module is used for performing word segmentation and proofreading on the text to obtain proofreading data, the proofreading data comprising problem text information, a problem text position, suggested modification text and a problem type;
the replacement module is used for determining the DOM position corresponding to the problem text based on the proofreading data, and for replacing the problem text through a simulated mouse operation according to the proofreading data and the DOM position to obtain target modification text.
6. The HTML-format-based text replacement device of claim 5, wherein the replacement module replacing the problem text through a simulated mouse operation to obtain the target modification text specifically comprises:
the replacement module modifying the text node of the problem text into a DOM node so as to simulate a mouse operation, and replacing the problem text with the target modification text.
7. The HTML-format-based text replacement device of claim 5, wherein the target modification text is the suggested modification text;
or, the replacement module is further used for responding to a user input instruction and obtaining the target modification text from the input instruction.
8. The HTML-format-based text replacement device of any one of claims 5 to 7, wherein the replacement module determining the DOM position corresponding to the problem text specifically comprises: the replacement module determining the DOM position corresponding to the problem text through a depth-first search algorithm.
9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the HTML-format-based text replacement method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer-readable storage medium is located to perform the HTML-format-based text replacement method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211663274.XA CN116384346A (en) 2022-12-23 2022-12-23 Text replacement method, device, terminal and medium based on HTML format

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211663274.XA CN116384346A (en) 2022-12-23 2022-12-23 Text replacement method, device, terminal and medium based on HTML format

Publications (1)

Publication Number Publication Date
CN116384346A true CN116384346A (en) 2023-07-04

Family

ID=86962138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211663274.XA Pending CN116384346A (en) 2022-12-23 2022-12-23 Text replacement method, device, terminal and medium based on HTML format

Country Status (1)

Country Link
CN (1) CN116384346A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination