CN113779235B - Word document outline recognition processing method and device - Google Patents

Word document outline recognition processing method and device

Info

Publication number
CN113779235B
Authority
CN
China
Prior art keywords
title
directory
word
label
word file
Legal status
Active
Application number
CN202111070726.9A
Other languages
Chinese (zh)
Other versions
CN113779235A (en)
Inventor
麦天骥
Current Assignee
BEIJING LEDICT TECHNOLOGY CO LTD
Original Assignee
BEIJING LEDICT TECHNOLOGY CO LTD

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Abstract

The invention discloses a Word document outline recognition processing method and device. A Word file is obtained, stored and parsed locally, and converted into an HTML code file; all title tags (h-tags) in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each title in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate help pages that can be browsed and edited.

Description

Word document outline recognition processing method and device
Technical Field
The invention relates to the technical field of Word document processing, and in particular to a Word document outline recognition processing method and device.
Background
Word is a word processor application developed by Microsoft Corporation and is a component of the Office suite. Microsoft Office Word can be used to create and edit the text and graphics in letters, reports, web pages, or emails. Compared with WordPad and Notepad, it is more powerful and full-featured, and it supports inserting pictures, multimedia, artistic effects, and the like. Word documents are widely used across industries and bring great convenience to office work.
At present, as informatization work continues to advance, the relevant departments operate a variety of application systems, and Word documents are processed and displayed through these systems. In administrative departments in particular, Word documents uploaded by users need to be processed and displayed in an optimized form. To improve the user experience, each application system is designed with a help function that assists in processing the Word documents uploaded by users. However, the help functions of the individual systems differ from one another and are not unified, which makes them cumbersome for users and creates a heavy development workload. Although Word itself can process title directories, that capability cannot be embedded into a specific application system. Word documents generally have an outline, so quickly recognizing that outline to make the document structure easy to grasp is of practical significance.
Disclosure of Invention
Therefore, the invention provides a Word document outline recognition processing method and device that can perform outline recognition on a Word document within a help system and generate a help page, facilitating the display and processing of the Word document.
To achieve the above object, the invention provides the following technical solution: a Word document outline recognition processing method, comprising the following steps:
obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
looping over all title tags in the HTML code file in JavaScript, traversing all title tags of the HTML code file using a recursive algorithm, and organizing the title tags into tree-structured data;
and generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data.
As a preferred scheme of the Word document outline recognition processing method, the Word file is stored on a local server, the Word file is converted into an HTML code file on the local server, and the generated HTML code file is returned to a front-end device for displaying the Word file.
As a preferred scheme of the Word document outline recognition processing method, the display interface of the front-end device comprises a directory window and a rich text editor window, wherein the directory window is used for displaying the title directory data, and the rich text editor window is used for displaying the Word file content corresponding to the HTML code.
As a preferred scheme of the Word document outline recognition processing method, when the Word file content in the rich text editor window changes, regeneration of the title directory for the changed content is triggered.
As a preferred scheme of the Word document outline recognition processing method, the title directory before the change to the Word file content in the rich text editor window is compared with the title directory after the change;
if a title tag has been deleted from the title directory after the change, the primary key corresponding to the deleted title tag is deleted;
if a title tag has been newly added to the title directory after the change, a new primary key is created for the newly added title tag;
if a title tag exists in the title directory both before and after the change, the primary key of that title tag continues to be used in the title directory after the change.
As a preferred scheme of the Word document outline recognition processing method, the title directory data generation procedure includes:
judging whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
judging whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory under the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished;
if the level of the current title is not greater than the parent level, inserting a directory at the parent level, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished.
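The level judgments above can be sketched as a single pass over the title tags. This is a minimal, illustrative sketch rather than the patent's own code: it assumes each title has already been reduced to a hypothetical `{ level, text }` record, and it uses an explicit stack (an iterative stand-in for recursion) to track the chain of parent directories.

```javascript
// Build a nested title directory from a flat list of { level, text } records.
// The record shape and the function name are assumptions made for illustration.
function buildDirectory(titles) {
  const root = { level: 0, text: 'root', children: [] };
  const parents = [root]; // chain from the root down to the current directory

  for (const title of titles) {
    const node = { level: title.level, text: title.text, children: [] };
    // Not deeper than the current directory: climb back to the parent level.
    while (parents[parents.length - 1].level >= node.level) {
      parents.pop();
    }
    // Level 1 lands directly under the root; deeper levels become sub-directories.
    parents[parents.length - 1].children.push(node);
    parents.push(node);
  }
  return root.children;
}

const directory = buildDirectory([
  { level: 1, text: 'Overview' },
  { level: 2, text: 'Install' },
  { level: 2, text: 'Configure' },
  { level: 1, text: 'FAQ' },
]);
console.log(JSON.stringify(directory, null, 2));
```

In this sketch "Install" and "Configure" end up as sub-directories of "Overview", while "FAQ" returns to the top level because its level is not greater than that of the directory before it.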
The invention also provides a Word document outline recognition processing device, which comprises:
a Word file processing module, used for obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
a title tag acquisition module, used for looping over all title tags in the HTML code file in JavaScript;
a title tag traversal module, used for traversing all title tags of the HTML code file using a recursive algorithm and organizing the title tags into tree-structured data;
a title directory generation module, used for generating title directory data corresponding to the Word file from the tree-structured data;
and a linkage processing module, used for presetting a unique primary key for each title in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data.
As a preferred scheme of the Word document outline recognition processing device, the Word file is stored on a local server, the Word file is converted into an HTML code file on the local server, and the generated HTML code file is returned to a front-end device for displaying the Word file;
the display interface of the front-end device comprises a directory window and a rich text editor window, wherein the directory window is used for displaying the title directory data, and the rich text editor window is used for displaying the Word file content corresponding to the HTML code.
As a preferred scheme of the Word document outline recognition processing device, the device further comprises a title directory update module, used for triggering regeneration of the title directory after the Word file content in the rich text editor window changes;
and a title directory comparison module, used for comparing the title directory before the change to the Word file content in the rich text editor window with the title directory after the change;
if a title tag has been deleted from the title directory after the change, the primary key corresponding to the deleted title tag is deleted;
if a title tag has been newly added to the title directory after the change, a new primary key is created for the newly added title tag;
if a title tag exists in the title directory both before and after the change, the primary key of that title tag continues to be used in the title directory after the change.
As a preferred scheme of the Word document outline recognition processing device, the title directory generation module is configured for:
judging whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
judging whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory under the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished;
if the level of the current title is not greater than the parent level, inserting a directory at the parent level, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished.
The invention has the following advantages: a Word file is obtained, stored and parsed locally, and converted into an HTML code file; all title tags in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each title in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate help pages that can be browsed and edited.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely exemplary, and that other drawings can be derived from the drawings provided without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are shown only for purposes of illustration and description and are not intended to limit the scope of the invention, which is defined by the claims. Any structural modification, change in proportion, or adjustment of size that does not affect the efficacy or the achievable purpose of the invention should still fall within the scope of the invention.
FIG. 1 is a schematic flow chart of a Word document outline recognition processing method provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a technical route of a Word document outline recognition processing method provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram showing a Word document outline recognition processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a Word document outline recognition processing device provided in an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following description of specific embodiments. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Example 1
Referring to FIG. 1, FIG. 2 and FIG. 3, a Word document outline recognition processing method is provided, which includes the following steps:
S1, obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
S2, looping over all title tags in the HTML code file in JavaScript, traversing all title tags of the HTML code file using a recursive algorithm, and organizing the title tags into tree-structured data;
S3, generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data.
In this embodiment, the Word file is saved to a local server, the local server converts the Word file into an HTML code file, and the generated HTML code file is returned to a front-end device for displaying the Word file. The display interface of the front-end device comprises a directory window and a rich text editor window, wherein the directory window is used for displaying the title directory data, and the rich text editor window is used for displaying the Word file content corresponding to the HTML code.
Specifically, the Word file uploaded by the user is stored on a local server configured for the application system, the conversion of the Word file into the HTML code file is executed on that local server, and the resulting HTML code file is returned to the front-end device for display, which improves processing efficiency.
Specifically, one implementation code of step S1 is as follows:
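A minimal sketch of step S1 in Node.js, offered as an illustration rather than the patent's own code. The function name `saveAndConvert`, its parameters, and the injected `convertToHtml` callback are all hypothetical; the converter is injected because the text does not name a Word-to-HTML conversion library.

```javascript
const fs = require('fs');
const path = require('path');

// Save an uploaded Word file on the server's local disk, then convert it to HTML.
// `convertToHtml` is a hypothetical, injected function (savedPath -> HTML string).
function saveAndConvert(uploadName, bytes, storageDir, convertToHtml) {
  fs.mkdirSync(storageDir, { recursive: true });
  // basename() drops any directory part supplied by the client.
  const savedPath = path.join(storageDir, path.basename(uploadName));
  fs.writeFileSync(savedPath, bytes);
  const html = convertToHtml(savedPath);
  // The HTML string is what would be returned to the front-end device.
  return { savedPath, html };
}
```

The sketch is synchronous to keep it short; a real converter would more likely be asynchronous, in which case the function would return a promise instead.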
In this embodiment, the generated HTML code file is returned to the front-end device, all titles (h-tags) in the HTML code are then looped over in JavaScript, and a recursive algorithm organizes all titles of the current document into tree-structured data. One implementation is as follows:
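As an illustrative stand-in for the front-end step, the sketch below loops over the title tags of an HTML string and stamps each one with a primary key. It is a sketch under assumptions, not the patent's code: the function name `keyTitleTags`, the `data-title-key` attribute, and the counter-based key format are all hypothetical. A regular expression is used only so that the example runs outside a browser; in a page, `document.querySelectorAll('h1, h2, h3, h4, h5, h6')` would be the more natural way to collect the tags.

```javascript
// Loop over every <h1>..<h6> in an HTML string, give each a unique key,
// and return both the keyed HTML and a flat list of title records.
function keyTitleTags(html) {
  let counter = 0;
  const titles = [];
  const keyedHtml = html.replace(
    /<h([1-6])\b([^>]*)>([\s\S]*?)<\/h\1>/gi,
    (whole, level, attrs, inner) => {
      counter += 1;
      const key = 'title-' + counter; // hypothetical key format
      titles.push({
        key,
        level: Number(level),
        text: inner.replace(/<[^>]+>/g, '').trim(), // strip inline markup
      });
      return `<h${level}${attrs} data-title-key="${key}">${inner}</h${level}>`;
    }
  );
  return { keyedHtml, titles };
}

const sample = '<h1>Guide</h1><p>Intro</p><h2 class="s">Step <b>one</b></h2>';
console.log(keyTitleTags(sample));
```

The flat `titles` list keeps the tag level of each title, which is the input the tree-building step needs, and the key written into the HTML is what later ties a directory entry to its title in the content.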
In this embodiment, after the Word file content in the rich text editor window changes, regeneration of the title directory for the changed content is triggered.
The title directory before the change to the Word file content in the rich text editor window is compared with the title directory after the change;
if a title tag has been deleted from the title directory after the change, the primary key corresponding to the deleted title tag is deleted;
if a title tag has been newly added to the title directory after the change, a new primary key is created for the newly added title tag;
if a title tag exists in the title directory both before and after the change, the primary key of that title tag continues to be used in the title directory after the change.
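The three comparison rules can be sketched as one reconciliation function. This is an illustrative sketch only: the text does not say how a title is matched before and after an edit, so the code assumes that a surviving title still carries its old key (for example in an attribute) while a newly typed title has none. The names `reconcileKeys` and `createKey` are hypothetical, and `createKey` is assumed to return keys that were never used before.

```javascript
// Reconcile primary keys after an edit.
// Assumption: surviving titles keep their old key; new titles have no key yet.
function reconcileKeys(oldTitles, newTitles, createKey) {
  const oldKeys = new Set(oldTitles.map((t) => t.key));
  const used = new Set();

  const titles = newTitles.map((t) => {
    // Keep the key only if it existed before and has not been reused,
    // e.g. by a copy-pasted title that duplicated the attribute.
    const keep = t.key !== undefined && oldKeys.has(t.key) && !used.has(t.key);
    const key = keep ? t.key : createKey();
    used.add(key);
    return { ...t, key };
  });

  // Keys whose titles disappeared from the document are deleted.
  const deletedKeys = [...oldKeys].filter((k) => !used.has(k));
  return { titles, deletedKeys };
}

let nextId = 100;
const result = reconcileKeys(
  [{ key: 'k1', text: 'Intro' }, { key: 'k2', text: 'Setup' }],
  [{ key: 'k1', text: 'Introduction' }, { text: 'Usage' }],
  () => 'k' + (nextId += 1)
);
console.log(result.titles, result.deletedKeys);
```

In the usage example the renamed title keeps `k1`, the new "Usage" title receives a fresh key, and `k2` is reported as deleted because no title in the edited document carries it.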
In this embodiment, the title directory data generation procedure includes:
judging whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
judging whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory under the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished;
if the level of the current title is not greater than the parent level, inserting a directory at the parent level, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished.
Specifically, one implementation code for title directory data generation is as follows:
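A hedged sketch of one way the recursive generation could look, not the patent's own listing. It assumes the titles are available as a flat array of hypothetical `{ level, text }` records in document order; the function names are likewise illustrative.

```javascript
// Recursively group a flat list of { level, text } title records into a tree.
// Returns the children found under `parentLevel` and the index where it stopped.
function collectChildren(titles, start, parentLevel) {
  const children = [];
  let i = start;
  while (i < titles.length && titles[i].level > parentLevel) {
    const title = titles[i];
    // Everything deeper than this title belongs to its sub-directory.
    const nested = collectChildren(titles, i + 1, title.level);
    children.push({
      level: title.level,
      text: title.text,
      children: nested.children,
    });
    i = nested.next;
  }
  // A title at or above the parent level ends this directory.
  return { children, next: i };
}

function generateTitleDirectory(titles) {
  return collectChildren(titles, 0, 0).children;
}

console.log(
  JSON.stringify(
    generateTitleDirectory([
      { level: 1, text: 'Overview' },
      { level: 2, text: 'Install' },
      { level: 1, text: 'FAQ' },
    ])
  )
);
```

Each recursive call handles one directory: it consumes titles while they are deeper than the parent level and hands control back as soon as a title at or above that level appears.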
Referring to FIG. 3, a general help system is designed based on the technical scheme of the invention. It is an online management system with a B/S (browser/server) structure that can generate help pages, update notes, operation guides, and the like through online editing. Each system only needs to reference a single line of JS code to implement the help function.
As a general help system, it supports quickly generating help pages from Word files. The help page uses a unified layout with the outline on the left and the content on the right. Because much existing help material already exists as Word documents, the system supports uploading Word documents.
After a Word document is uploaded, its outline is recognized online and displayed on the left as a tree menu, while the corresponding Word content is displayed on the right. Clicking an entry in the left directory locates the corresponding content on the right, and editing the content on the right subsequently updates the outline on the left.
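The left-to-right linkage can be sketched as a menu renderer plus a lookup by primary key. This is an illustration under assumptions rather than the system's actual code: the node shape `{ key, text, children }`, the `data-target` and `data-title-key` attribute names, and both function names are hypothetical, and keys are assumed to be simple identifiers that need no escaping inside a CSS selector.

```javascript
// Escape text so that title content cannot inject markup into the menu.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render directory nodes ({ key, text, children }) as a nested list.
function renderDirectory(nodes) {
  if (nodes.length === 0) return '';
  const items = nodes.map(
    (n) =>
      `<li data-target="${escapeHtml(n.key)}">${escapeHtml(n.text)}` +
      `${renderDirectory(n.children)}</li>`
  );
  return `<ul>${items.join('')}</ul>`;
}

// Scroll the content pane to the title that carries the clicked entry's key.
// `contentRoot` is any object with querySelector, such as the editor's root element.
function locateTitle(contentRoot, key) {
  const target = contentRoot.querySelector(`[data-title-key="${key}"]`);
  if (target) target.scrollIntoView();
  return Boolean(target);
}
```

In a browser, a click handler on the menu would read the clicked entry's `data-target` value and pass it to `locateTitle` together with the editor's root element; `querySelector` and `scrollIntoView` are standard DOM methods.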
In summary, a Word file is obtained, stored and parsed locally, and converted into an HTML code file; all title tags in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each title in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. After the Word file content in the rich text editor window changes, regeneration of the title directory for the changed content is triggered. The title directory before the change is compared with the title directory after the change: if a title tag has been deleted from the title directory after the change, the primary key corresponding to the deleted title tag is deleted; if a title tag has been newly added, a new primary key is created for it; if a title tag exists in the title directory both before and after the change, its primary key continues to be used in the title directory after the change. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate help pages that can be browsed and edited.
Example 2
Referring to FIG. 4, the present invention further provides a Word document outline recognition processing device, including:
a Word file processing module 1, used for obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
a title tag acquisition module 2, used for looping over all title tags in the HTML code file in JavaScript;
a title tag traversal module 3, used for traversing all title tags of the HTML code file using a recursive algorithm and organizing the title tags into tree-structured data;
a title directory generation module 4, used for generating title directory data corresponding to the Word file from the tree-structured data;
and a linkage processing module 5, used for presetting a unique primary key for each title in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data.
In this embodiment, the Word file is saved to a local server, the Word file is converted into an HTML code file on the local server, and the generated HTML code file is returned to a front-end device for displaying the Word file;
the display interface of the front-end device comprises a directory window and a rich text editor window, wherein the directory window is used for displaying the title directory data, and the rich text editor window is used for displaying the Word file content corresponding to the HTML code.
In this embodiment, the device further includes a title directory update module 6, used for triggering regeneration of the title directory after the Word file content in the rich text editor window changes;
and a title directory comparison module 7, used for comparing the title directory before the change to the Word file content in the rich text editor window with the title directory after the change;
if a title tag has been deleted from the title directory after the change, the primary key corresponding to the deleted title tag is deleted;
if a title tag has been newly added to the title directory after the change, a new primary key is created for the newly added title tag;
if a title tag exists in the title directory both before and after the change, the primary key of that title tag continues to be used in the title directory after the change.
In this embodiment, the title directory generation module 4 is configured for:
judging whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
judging whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory under the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished;
if the level of the current title is not greater than the parent level, inserting a directory at the parent level, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished.
It should be noted that the information exchange and execution processes between the modules/units of the above device are based on the same concept as the method embodiment in Embodiment 1 of the present application, and they produce the same technical effects. For details, refer to the description of the method embodiment above, which is not repeated here.
Example 3
Embodiment 3 of the present invention provides a computer-readable storage medium storing program code for the Word document outline recognition processing method, the program code including instructions for executing the Word document outline recognition processing method of Embodiment 1 or any possible implementation thereof.
The computer-readable storage medium can be any available medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
Example 4
Embodiment 4 of the present invention provides an electronic device comprising a processor coupled to a storage medium; when the processor executes instructions in the storage medium, the electronic device is caused to execute the Word document outline recognition processing method of Embodiment 1 or any possible implementation thereof.
Specifically, the processor may be implemented in hardware or software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor that operates by reading software code stored in a memory. The memory may be integrated in the processor or may be located outside the processor and exist separately.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, or microwave).
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in a different order than shown or described, the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (2)

1. A Word document outline recognition processing method, characterized by comprising the following steps:
obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
looping over all title tags in the HTML code file in JavaScript, traversing all title tags of the HTML code file with a recursive algorithm, and organizing the title tags into tree-structured data;
generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title of the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data;
storing the Word file on a local server, converting the Word file into an HTML code file on the local server, and returning the generated HTML code file to a front-end device for displaying the Word file;
the display interface of the front-end device comprises a catalog window and a rich text editor window, wherein the catalog window displays the title directory data and the rich text editor window displays the Word file content corresponding to the HTML code;
when the Word file content in the rich text editor window changes, regenerating the title directory from the changed Word file content;
comparing the title directory from before the change to the Word file content in the rich text editor window with the title directory from after the change;
if a title tag has been deleted from the title directory after the change, deleting the primary key corresponding to the deleted title tag;
if the title directory after the change contains a newly added title tag, creating a new primary key for the newly added title tag;
if a title tag exists in the title directory both before and after the change, continuing to use the primary key of that title tag in the title directory after the change;
the title directory data is generated by the following procedure:
determining whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
determining whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory of the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal is complete;
if the level of the current title is not greater than the parent level, inserting a parent-level directory, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal is complete.
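The title directory generation procedure of claim 1 can be illustrated with a short JavaScript sketch. It is not the patented implementation, only one way the level comparison might be coded. It assumes that "title tags" are the HTML elements h1 to h6, that the tag level is the digit in the tag name, and that a regular expression is adequate for pulling them out of a simple HTML string. The names extractHeadings, attachChildren and buildOutline are hypothetical.

```javascript
// Hypothetical helper: collect { level, text } for every h1..h6 element in an HTML string.
// Assumption: a regular expression is sufficient for simple, well-formed markup.
function extractHeadings(html) {
  const headings = [];
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  for (const match of html.matchAll(pattern)) {
    headings.push({
      level: Number(match[1]),
      // Strip any inline markup nested inside the heading.
      text: match[2].replace(/<[^>]+>/g, "").trim(),
    });
  }
  return headings;
}

// Recursively attach every heading whose level is greater than parentLevel
// to `children`. Returns the index of the first heading that does not belong
// under this parent, so the caller can continue at the parent level.
function attachChildren(headings, start, parentLevel, children) {
  let i = start;
  while (i < headings.length && headings[i].level > parentLevel) {
    const node = { level: headings[i].level, text: headings[i].text, children: [] };
    children.push(node);
    // Deeper headings that follow become sub-directories of this node.
    i = attachChildren(headings, i + 1, node.level, node.children);
  }
  return i;
}

// Build the tree-structured directory data from a flat heading list.
function buildOutline(headings) {
  const roots = [];
  attachChildren(headings, 0, 0, roots);
  return roots;
}

// Usage: print the nested directory for a small sample document.
const sampleHtml = "<h1>Overview</h1><h2>Scope</h2><h1>Method</h1>";
console.log(JSON.stringify(buildOutline(extractHeadings(sampleHtml)), null, 2));
```

In this sketch a heading with a greater level than its predecessor becomes a sub-directory, and a heading with an equal or smaller level ends the recursion and is inserted at the matching ancestor level. A document that skips a level (for example h1 followed directly by h3) simply nests the deeper heading under the nearest shallower one.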
2. A Word document outline recognition processing device, comprising:
a Word file processing module, used for obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;
a title tag acquisition module, used for looping over all title tags in the HTML code file in JavaScript;
a title tag traversal module, used for traversing all title tags of the HTML code file with a recursive algorithm and organizing the title tags into tree-structured data;
a title directory generation module, used for generating title directory data corresponding to the Word file from the tree-structured data;
a linkage processing module, used for presetting a unique primary key for each title of the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data;
storing the Word file on a local server, converting the Word file into an HTML code file on the local server, and returning the generated HTML code file to a front-end device for displaying the Word file;
the display interface of the front-end device comprises a catalog window and a rich text editor window, wherein the catalog window displays the title directory data and the rich text editor window displays the Word file content corresponding to the HTML code;
a title directory updating module, used for regenerating the title directory after the Word file content in the rich text editor window changes;
a title directory comparison module, used for comparing the title directory from before the change to the Word file content in the rich text editor window with the title directory from after the change;
if a title tag has been deleted from the title directory after the change, deleting the primary key corresponding to the deleted title tag;
if the title directory after the change contains a newly added title tag, creating a new primary key for the newly added title tag;
if a title tag exists in the title directory both before and after the change, continuing to use the primary key of that title tag in the title directory after the change;
the title directory generation module operates as follows:
determining whether the tag level of the title is equal to 1:
if the tag level of the title is equal to 1, inserting a parent directory; if the tag level of the title is not equal to 1, continuing to traverse the tag levels of the remaining titles;
determining whether the level of the current title is greater than the parent level:
if the level of the current title is greater than the parent level, inserting a sub-directory of the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal is complete;
if the level of the current title is not greater than the parent level, inserting a parent-level directory, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal is complete.
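The comparison of the title directory before and after an edit, as recited in both claims, can be sketched in JavaScript as follows. The claims do not state how a title tag is recognized as "the same" across an edit, so this sketch assumes that two titles match when their level and text are identical. The function names signature and reconcileKeys, and the caller-supplied createKey generator, are hypothetical.

```javascript
// Assumption: a title is "the same" before and after an edit
// when both its level and its text are unchanged.
function signature(heading) {
  return `${heading.level}:${heading.text}`;
}

// previous: array of { key, level, text } from before the edit.
// current:  array of { level, text } from after the edit.
// createKey: hypothetical caller-supplied function returning a fresh unique key.
function reconcileKeys(previous, current, createKey) {
  // Map each signature to the keys that have not been reused yet,
  // so repeated identical titles are matched in document order.
  const available = new Map();
  for (const heading of previous) {
    const sig = signature(heading);
    if (!available.has(sig)) available.set(sig, []);
    available.get(sig).push(heading.key);
  }

  const created = [];
  const next = current.map((heading) => {
    const queue = available.get(signature(heading));
    if (queue && queue.length > 0) {
      // Title exists before and after the change: keep its primary key.
      return { ...heading, key: queue.shift() };
    }
    // Newly added title: create a new primary key.
    const key = createKey();
    created.push(key);
    return { ...heading, key };
  });

  // Keys left over belong to titles that were deleted by the edit.
  const deleted = [...available.values()].flat();
  return { next, created, deleted };
}
```

Under the matching assumption used here, renaming a title is treated as one deletion plus one addition, so the renamed title receives a new key. An implementation that tracks the key on the title element itself (for instance as an attribute) would avoid that, but the claims do not specify either approach.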
CN202111070726.9A 2021-09-13 2021-09-13 Word document outline recognition processing method and device Active CN113779235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111070726.9A CN113779235B (en) 2021-09-13 2021-09-13 Word document outline recognition processing method and device

Publications (2)

Publication Number Publication Date
CN113779235A CN113779235A (en) 2021-12-10
CN113779235B CN113779235B (en) 2024-02-02

Family

ID=78843368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111070726.9A Active CN113779235B (en) 2021-09-13 2021-09-13 Word document outline recognition processing method and device

Country Status (1)

Country Link
CN (1) CN113779235B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117763206A (en) * 2024-02-20 2024-03-26 暗物智能科技(广州)有限公司 Knowledge tree generation method and device, electronic equipment and storage medium

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5758361A (en) * 1996-03-20 1998-05-26 Sun Microsystems, Inc. Document editor for linear and space efficient representation of hierarchical documents
CN102855257A (en) * 2011-06-30 2013-01-02 北大方正集团有限公司 Catalogue processing method and catalogue processing device
CN103049543A (en) * 2012-12-26 2013-04-17 福建天晴数码有限公司 Method and tool for updating multi-branch configuration file
CN103902632A (en) * 2012-12-31 2014-07-02 华为技术有限公司 File system building method and device in key-value storage system, and electronic device
CN104462045A (en) * 2014-12-15 2015-03-25 北京信息科技大学 Method and device for processing documents
CN105630748A (en) * 2014-10-31 2016-06-01 富士通株式会社 Information processing device and information processing method
CN106033404A (en) * 2015-03-20 2016-10-19 广州金山移动科技有限公司 Chapter skipping method and device
CN107153544A (en) * 2017-05-09 2017-09-12 合肥汉腾信息技术有限公司 A kind of Worksheet self-defining method and device
CN108563729A (en) * 2018-04-04 2018-09-21 福州大学 A kind of bidding website acceptance of the bid information extraction method based on dom tree
CN109145054A (en) * 2018-08-02 2019-01-04 力当高(上海)智能科技有限公司 A kind of method of managing customer end data
CN109815435A (en) * 2019-01-24 2019-05-28 中国人民解放军战略支援部队航天工程大学 A kind of Website page generation method, device and electronic equipment
CN110442822A (en) * 2019-08-02 2019-11-12 腾讯科技(深圳)有限公司 A kind of small routine content displaying method, device, equipment and storage medium
CN111274760A (en) * 2020-01-09 2020-06-12 北京字节跳动网络技术有限公司 Rich text data processing method and device, electronic equipment and computer storage medium
CN111338548A (en) * 2020-03-06 2020-06-26 深圳光大同创新材料有限公司 Method, device and storage medium for browsing and displaying directories and files in split screens
CN111460083A (en) * 2020-03-31 2020-07-28 北京百度网讯科技有限公司 Document title tree construction method and device, electronic equipment and storage medium
US10776434B1 (en) * 2016-11-16 2020-09-15 First American Financial Corporation System and method for document data extraction, data indexing, data searching and data filtering
CN112632437A (en) * 2020-11-27 2021-04-09 中国银联股份有限公司 Webpage generating method and device and computer readable storage medium
CN112668282A (en) * 2020-12-28 2021-04-16 山东鲁能软件技术有限公司 Method and system for converting format of equipment procedure document
CN113282793A (en) * 2021-04-01 2021-08-20 南京航空航天大学 Web table data semantic extraction and RDF construction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786775B (en) * 2014-12-23 2018-11-16 珠海金山办公软件有限公司 Document schem drawing generating method and system
CN110795916A (en) * 2019-09-27 2020-02-14 北京浪潮数据技术有限公司 Side bar display method and system of document system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
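(Java) Converting Word to HTML and extracting the Word directory structure tree to generate a left-hand tree in the HTML page; 脚穿草鞋; 《https://blog.csdn.net/today_/article/details/107901405》; 1-8 *
Implementing a recursive tree directory structure in JavaScript; 个人不完美; 《https://zhuanlan.zhihu.com/47844638》; 1-3 *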

Also Published As

Publication number Publication date
CN113779235A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant