CN113779235B - Word document outline recognition processing method and device

Info

Publication number: CN113779235B
Application number: CN202111070726.9A
Authority: CN
Inventors: 麦天骥
Original assignee: BEIJING LEDICT TECHNOLOGY CO LTD
Current assignee: BEIJING LEDICT TECHNOLOGY CO LTD
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2024-02-02
Anticipated expiration: 2041-09-13
Also published as: CN113779235A

Abstract

The invention discloses a Word document outline recognition processing method and device. A Word file is obtained, stored and parsed locally, and converted into an HTML code file. All heading tags in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data. Title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each heading in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate a help page that can be browsed and edited.

Description

Word document outline recognition processing method and device

Technical Field

The invention relates to the technical field of Word document processing, in particular to a Word document outline identification processing method and device.

Background

Word is a word processor application developed by Microsoft Corporation and is a component of the Office suite. Microsoft Office Word can be used to create and edit the text and graphics in letters, reports, web pages, or emails. Compared with WordPad and Notepad, it is more powerful and more comprehensive, and it supports inserting pictures, multimedia, artistic effects, and the like. Word documents are widely used across industries and bring great convenience to office work.

At present, as informatization work continues to advance, the relevant departments operate a variety of application systems, and Word documents are processed and displayed through these systems. In administrative departments in particular, Word documents uploaded by users need to be processed and displayed in an optimized form. To improve the user experience, each application system is designed with a help function that assists in processing the Word documents uploaded by users. However, the help functions of the various systems differ from one another and are not unified, which makes them cumbersome for users and creates a heavy development workload. Although Word itself can process title directories, that capability cannot be embedded in a specific application system. Word documents generally have an outline, so quickly recognizing that outline, and thereby making it easier to grasp, is of practical significance.

Disclosure of Invention

Therefore, the invention provides a Word document outline recognition processing method and device that perform outline recognition on a Word document in a help system and generate a help page, which facilitates displaying the Word document.

To achieve the above object, the present invention provides the following technical solution: a Word document outline recognition processing method comprising the following steps:

obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;

looping over all heading tags in the HTML code file in JavaScript, traversing all heading tags of the HTML code file with a recursive algorithm, and organizing the heading tags into tree-structured data;

and generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each heading in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data.

As a preferable scheme of the Word document outline identification processing method, the Word file is stored in a local server, the Word file is converted into an HTML code file in the local server, and the generated HTML code file is returned to front-end equipment for displaying the Word file.

As a preferable scheme of the Word document outline identification processing method, the display interface of the front-end device comprises a catalog window and a rich text editor window, wherein the catalog window is used for displaying the title catalog data, and the rich text editor window is used for displaying Word file content corresponding to the HTML codes.

As a preferable scheme of the Word document outline recognition processing method, when the Word file content of the rich text editor window changes, generation of the title directory is re-triggered for the changed content.

As a preferred scheme of the Word document outline recognition processing method, the title directory before the Word file content of the rich text editor window changes is compared with the title directory after the change;

if a heading tag has been deleted from the title directory after the change, the primary key corresponding to the deleted heading tag is deleted;

if the title directory after the change contains a newly added heading tag, a new primary key is created for the newly added heading tag;

if a heading tag exists in the title directory both before and after the change, its primary key continues to be used in the title directory after the change.
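The three comparison cases above can be sketched in JavaScript as follows. This is a minimal illustration, not the patented implementation. The text does not say how a heading is recognized as "the same" before and after an edit, so the sketch assumes that a surviving heading still carries its key (for example in an attribute on the element) while a newly typed heading carries none. The names `reconcileKeys` and `makeKeyGenerator` and the `{ text, key }` shape are hypothetical.

```javascript
// Hypothetical shapes: `previous` and `current` are flat heading lists.
// Assumption: a heading that survived the edit still carries its `key`;
// a newly added heading has no `key` yet.
function reconcileKeys(previous, current, createKey) {
  const previousKeys = new Set(previous.map((heading) => heading.key));
  const seen = new Set();

  const updated = current.map((heading) => {
    const reusable = heading.key !== undefined
      && previousKeys.has(heading.key)
      && !seen.has(heading.key);
    // Retained heading: keep its key. New or duplicated heading: issue a fresh key.
    const key = reusable ? heading.key : createKey();
    seen.add(key);
    return { ...heading, key };
  });

  // Keys that no longer appear belong to deleted headings and are dropped.
  const removedKeys = [...previousKeys].filter((key) => !seen.has(key));
  return { updated, removedKeys };
}

// Example key generator (an assumption; any scheme that yields unique values works).
function makeKeyGenerator(prefix = 'h') {
  let counter = 0;
  return () => `${prefix}-${++counter}`;
}
```

Checking `seen` also covers a copied and pasted heading that carries a duplicate key: the second copy receives a fresh key, so keys stay unique.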

As a preferred scheme of the Word document outline recognition processing method, the procedure for generating the title directory data includes:

judging whether the tag level of the heading is equal to 1:

if the tag level of the heading is equal to 1, inserting a parent directory; if the tag level of the heading is not equal to 1, continuing to traverse the tag levels of the remaining headings;

judging whether the level of the current heading is greater than the parent level:

if the level of the current heading is greater than the parent level, inserting a sub-directory under the current directory, continuing to traverse the tag levels of the remaining headings, and repeating the judging process until the traversal is finished;

if the level of the current heading is not greater than the parent level, inserting a directory at the parent level, continuing to traverse the tag levels of the remaining headings, and repeating the judging process until the traversal is finished.
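One possible reading of these level comparisons is the recursive JavaScript sketch below. It is an illustration under stated assumptions rather than the patented code. The input is assumed to be a flat list of `{ level, text }` objects in document order, and the "level equals 1" test is generalized by comparing against a virtual root at level 0. The function name `buildOutline` and the node shape are hypothetical.

```javascript
// Hypothetical input shape: a flat, document-ordered list of headings.
// `level` is the h-tag number (1 for <h1>, 2 for <h2>, ...).
function buildOutline(headings) {
  let index = 0;

  // Collects consecutive headings deeper than `parentLevel` as siblings.
  // Each inserted node recurses to gather the headings deeper than itself.
  function collect(parentLevel) {
    const nodes = [];
    while (index < headings.length && headings[index].level > parentLevel) {
      const { level, text } = headings[index];
      index += 1;
      // Deeper headings that follow become children (sub-directories).
      nodes.push({ level, text, children: collect(level) });
    }
    // A heading that is not deeper than `parentLevel` ends this group and is
    // inserted by an enclosing call, that is, at a parent level.
    return nodes;
  }

  return collect(0);
}
```

For the levels 1, 2, 3, 2, 1, the result has two top-level entries. The first has two level-2 children, and the first of those holds the level-3 heading. A document that skips a level (an `h1` followed directly by an `h3`) still nests, because only the relative comparison matters.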

The invention also provides a Word document outline identification processing device, which comprises:

the Word file processing module is used for obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;

the heading tag acquisition module is used for looping over all heading tags in the HTML code file in JavaScript;

the heading tag traversing module is used for traversing all heading tags of the HTML code file with a recursive algorithm and organizing the heading tags into tree-structured data;

the title directory generation module is used for generating title directory data corresponding to the Word file from the tree-structured data;

and the linkage processing module is used for presetting a unique primary key for each heading in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data.

As a preferable scheme of the Word document outline recognition processing device, the Word file is stored on a local server, the Word file is converted into an HTML code file on the local server, and the generated HTML code file is returned to a front-end device for displaying the Word file;

the display interface of the front-end equipment comprises a catalog window and a rich text editor window, wherein the catalog window is used for displaying the title catalog data, and the rich text editor window is used for displaying Word file contents corresponding to the HTML codes.

As a preferred scheme of the Word document outline recognition processing device, the device further comprises a title directory updating module, which is used for re-triggering generation of the title directory after the Word file content of the rich text editor window changes;

the title directory comparison module is used for comparing the title directory before the Word file content of the rich text editor window is changed with the title directory after the Word file content of the rich text editor window is changed;

As a preferable mode of the Word document outline identification processing device, the title directory generation module is:

judging whether the label level of the title is equal to 1:

The invention has the following advantages: a Word file is obtained, stored and parsed locally, and converted into an HTML code file; all heading tags in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each heading in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate a help page that can be browsed and edited.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.

The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.

FIG. 1 is a schematic flow chart of a Word document outline recognition processing method provided in an embodiment of the present invention;

FIG. 2 is a schematic diagram of a technical route of a Word document outline recognition processing method provided in an embodiment of the present invention;

FIG. 3 is a schematic diagram showing a Word document outline recognition processing method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a Word document outline recognition processing device provided in an embodiment of the present invention.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description. The embodiments described are some, but not all, of the embodiments of the invention. All other embodiments that those skilled in the art can obtain from these embodiments without inventive effort are intended to fall within the scope of the invention.

Example 1

Referring to FIGS. 1, 2 and 3, a Word document outline recognition processing method is provided, which includes the following steps:

S1, obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;

S2, looping over all heading tags in the HTML code file in JavaScript, traversing all heading tags of the HTML code file with a recursive algorithm, and organizing the heading tags into tree-structured data;

S3, generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each heading in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data.

In this embodiment, the Word file is saved to a local server, the local server converts the Word file into an HTML code file, and the generated HTML code file is returned to a front-end device for displaying the Word file. The display interface of the front-end equipment comprises a catalog window and a rich text editor window, wherein the catalog window is used for displaying the title catalog data, and the rich text editor window is used for displaying Word file contents corresponding to the HTML codes.

Specifically, the Word file uploaded by the user is stored on a local server configured by the application system, the step of converting the Word file into the HTML code file is executed on the local server, and the generated result of the HTML code file is returned to the front-end equipment for display, so that the processing efficiency is improved.

Specifically, one implementation code of step S1 is as follows:

In this embodiment, after the generated HTML code file is returned to the front-end device, all headings (h-tags) of the HTML code are looped over in JavaScript, and a recursive algorithm organizes all headings of the current document into tree-structured data. One implementation code is as follows:
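The original code listing does not appear in this text. As a stand-in for the looping step only, the following minimal sketch collects the h-tags of an HTML string into a flat, ordered list that a recursive tree builder could then consume. It assumes the HTML arrives as a string, and it uses a regular expression in place of a browser DOM query so that it runs outside a browser. The function name `extractHeadings` and the `{ level, text }` shape are hypothetical.

```javascript
// Simplification: a regular expression stands in for a DOM query such as
// document.querySelectorAll('h1,h2,h3,h4,h5,h6'), so the sketch runs without a browser.
// It is adequate for well-formed converter output, not for arbitrary HTML.
function extractHeadings(html) {
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  const headings = [];
  for (const match of html.matchAll(pattern)) {
    headings.push({
      level: Number(match[1]),
      // Strip inline markup such as <span> or <strong> inside the heading.
      text: match[2].replace(/<[^>]*>/g, '').trim(),
    });
  }
  return headings;
}
```

In a browser, the same list could be read directly from the heading elements of the rendered document. The regular expression only keeps the example self-contained.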

In this embodiment, after the Word file content of the rich text editor window changes, generation of the title directory is re-triggered for the changed content.

The title directory before the Word file content of the rich text editor window changes is compared with the title directory after the change;

In this embodiment, the title directory data generation procedure includes:

judging whether the label level of the title is equal to 1:

Specifically, one implementation code for title directory data generation is as follows:

Referring to FIG. 3, based on the technical scheme of the present invention, a general help system is designed. It is an online management system with a B/S (browser/server) structure and can generate the corresponding help pages, update notes, operation guides, and the like through online editing. Each system only needs to reference a single line of JS code to implement the help function.

As a general help system, it supports quickly generating help pages from Word files. The help page has a unified structure, with the outline on the left and the content on the right. Because much existing help material already exists in Word format, the system supports uploading Word documents.

After a Word document is uploaded, its outline is recognized online and displayed on the left as a tree menu. At the same time, the corresponding Word content is displayed on the right. Clicking an entry in the left-hand directory locates the corresponding content on the right, and editing headings in the right-hand content subsequently updates the left-hand outline.
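The linkage between the left-hand directory and the right-hand content can be sketched as below, assuming each heading's unique primary key is written onto the heading element as its `id` and each directory entry links to `#key`. This is an illustrative sketch, not the patented code. The functions `escapeHtml`, `renderCatalog` and `tagHeadings` and the `{ key, text, children }` node shape are hypothetical.

```javascript
// Escapes text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Renders the outline tree as a nested list whose links point at the heading keys.
function renderCatalog(nodes) {
  if (nodes.length === 0) return '';
  const items = nodes.map((node) => {
    const link = `<a href="#${escapeHtml(node.key)}">${escapeHtml(node.text)}</a>`;
    return `<li>${link}${renderCatalog(node.children)}</li>`;
  });
  return `<ul>${items.join('')}</ul>`;
}

// Writes each key onto its heading as an id, in document order, so the links resolve.
// Any id already present on a heading is replaced.
function tagHeadings(html, keys) {
  let position = 0;
  return html.replace(/<h([1-6])\b([^>]*)>/gi, (tag, level, attrs) => {
    const key = keys[position];
    position += 1;
    if (key === undefined) return tag; // More headings than keys: leave untouched.
    const rest = attrs.replace(/\sid\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/i, '');
    return `<h${level} id="${escapeHtml(key)}"${rest}>`;
  });
}
```

When both fragments are rendered in the same page, the browser's built-in fragment navigation scrolls to the heading whose `id` matches the clicked link. If the editor content lives in a separate frame, a click handler that calls `scrollIntoView` on the matching element would be needed instead.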

In summary, a Word file is obtained, stored and parsed locally, and converted into an HTML code file; all heading tags in the HTML code file are looped over in JavaScript and traversed with a recursive algorithm, which organizes them into tree-structured data; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each heading in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. After the Word file content of the rich text editor window changes, generation of the title directory is re-triggered for the changed content. The title directory before the change is compared with the title directory after the change: if a heading tag has been deleted, the primary key corresponding to the deleted heading tag is deleted; if a heading tag has been newly added, a new primary key is created for it; and if a heading tag exists in the title directory both before and after the change, its primary key continues to be used in the title directory after the change. The invention can perform outline recognition on a Word document, link the directory with the document content, make the outline of the Word document easier to grasp, and can be integrated into an application system to quickly generate a help page that can be browsed and edited.

Example 2

Referring to fig. 4, the present invention further provides a Word document outline recognition processing device, including:

the Word file processing module 1 is used for acquiring a Word file, locally storing and analyzing the Word file, and converting the Word file into an HTML code file;

the heading tag acquisition module 2 is used for looping over all heading tags in the HTML code file in JavaScript;

the title tag traversing module 3 is used for traversing all title tags of the HTML code file by using a recursive algorithm and arranging the title tags into tree structure data;

a title directory generation module 4, configured to generate title directory data corresponding to the Word file according to the tree structure data;

and the linkage processing module 5 is used for presetting a unique primary key for each heading in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data.

In this embodiment, the Word file is saved to a local server, the Word file is converted into an HTML code file at the local server, and the generated HTML code file is returned to a front-end device for displaying the Word file;

In this embodiment, the system further includes a title directory update module 6, configured to, after the Word file content of the rich text editor window changes, re-trigger generation of a title directory after the Word file content changes;

the title directory comparison module 7 is used for comparing the title directory before the Word file content of the rich text editor window is changed with the title directory after the Word file content of the rich text editor window is changed;

In this embodiment, the title directory generating module 4:

judging whether the label level of the title is equal to 1:

It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned apparatus is based on the same concept as the method embodiment in embodiment 1 of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.

Example 3

Embodiment 3 of the present invention provides a computer-readable storage medium in which program code of a Word document outline identification processing method is stored, the program code including instructions for executing the Word document outline identification processing method of embodiment 1 or any possible implementation thereof.

A computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.

Example 4

Embodiment 4 of the present invention provides an electronic device, where the electronic device includes a processor, and the processor is coupled to a storage medium, and when the processor executes instructions in the storage medium, the processor causes the electronic device to execute a Word document outline identification processing method of embodiment 1 or any possible implementation manner thereof.

Specifically, the processor may be implemented by hardware or software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor that reads software code stored in a memory; the memory may be integrated in the processor or may reside outside the processor as a separate component.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device, and may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by computing devices, so that they can be stored in a memory device and executed by the computing devices. In some cases the steps shown or described may be performed in a different order from that shown or described, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. A Word document outline recognition processing method, characterized by comprising the following steps:

generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each heading in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data;

storing the Word file to a local server, converting the Word file into an HTML code file at the local server, and returning the generated HTML code file to front-end equipment for displaying the Word file;

the display interface of the front-end equipment comprises a catalog window and a rich text editor window, wherein the catalog window is used for displaying the title catalog data, and the rich text editor window is used for displaying Word file contents corresponding to the HTML codes;

when the Word file content of the rich text editor window changes, re-triggering and generating a title directory after the Word file content changes;

if a heading tag exists in the title directory both before and after the Word file content of the rich text editor window changes, continuing to use the primary key of that heading tag in the title directory after the change;

the title directory data generation procedure comprises the following steps:

judging whether the label level of the title is equal to 1:

2. A Word document outline recognition processing device, comprising:

the linkage processing module is used for presetting a unique primary key for each heading in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data;

the title directory updating module is used for re-triggering and generating a title directory after the Word file content of the rich text editor window is changed;

the title directory generation module is used for:

judging whether the label level of the title is equal to 1: