CN112307716A - Document content export method, export device, electronic equipment and storage medium - Google Patents

Publication number
CN112307716A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910676712.8A
Other languages
Chinese (zh)
Inventor
欧振羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd filed Critical Beijing Kingsoft Office Software Inc
Priority to CN201910676712.8A priority Critical patent/CN112307716A/en
Publication of CN112307716A publication Critical patent/CN112307716A/en

Abstract

The embodiments of the invention provide a document content export method, an export device, electronic equipment and a storage medium. The method comprises: acquiring first content from a document to be processed, wherein the first content has a preset identifier in the document to be processed, and the first content comprises text content, formula content, or picture content; determining second content based on the first content and the document symbols before and after the first content, wherein the second content is a sentence or paragraph that includes the first content, and a document symbol identifies the end of a sentence or paragraph; and exporting the second content. The embodiments of the invention can solve the technical problem that existing document content export methods can only export the content that carries an identifier.

Description

Document content export method, export device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of office software technologies, and in particular, to a document content export method, an export apparatus, an electronic device, and a storage medium.
Background
In daily work, people reading a document often need to mark its important content, for example by highlighting or underlining it, and then export the marked content to a new document so that it can be looked up quickly and conveniently.
In the prior art, when a user invokes the function for exporting key content, the system usually exports only the content that carries an identifier. The specific process is as follows: the system searches the document in reading order for content that carries an identifier; because the identifiers are all preset, such content can be located directly. The system then exports the marked content and generates a new document containing it. For example, if the marked content in the document is a keyword, the system exports a keyword; if the marked content is a sentence, the system exports a sentence.
However, existing methods for exporting key document content can export only the content that carries an identifier, that is, only what is marked is exported. The exported content is therefore detached from its context, which makes it hard to understand and gives the user a poor reading experience.
Disclosure of Invention
Embodiments of the present invention provide a document content export method, an export apparatus, an electronic device, and a storage medium, so as to solve a technical problem that an existing document content export method can only export a content with an identifier. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a document content export method, where the method includes:
acquiring first content from a document to be processed, wherein the first content has a preset identifier in the document to be processed, and the first content comprises: text content, formula content, or picture content;
determining second content based on the first content and the document symbols before and after the first content, wherein the second content is a sentence or a paragraph including the first content, and a document symbol is used for identifying the end of a sentence or a paragraph;
exporting the second content.
Optionally, the step of determining the second content based on the first content and the document symbols before and after the first content includes:
determining the second content based on the first content and, on each side of the first content, the document symbol separated from the first content by the fewest characters.
Optionally, before the determining the second content based on the first content and the document symbols before and after the first content, the method further includes:
acquiring a first selection result of a user for each preset selection item in a first selection interface, wherein the first selection interface is provided with selection items for different export modes, and the export modes comprise: exporting the whole sentence and exporting the whole paragraph;
the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
according to the export mode determined by the first selection result, determining, in the document to be processed, a first symbol in the text located before the first content and a second symbol in the text located after the first content;
determining content between the first symbol and the second symbol as the second content.
Optionally, the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content includes:
in the document to be processed, taking the first character of the first content as a first starting search point, and searching forward (toward the beginning of the document) for the document symbol separated from the first content by the fewest characters, as the first symbol;
in the document to be processed, taking the last character of the first content as a second starting search point, and searching backward (toward the end of the document) for the document symbol separated from the first content by the fewest characters, as the second symbol;
determining a content between the first symbol and the second symbol as the second content.
Optionally, before the obtaining the first content from the document to be processed, the method further includes:
acquiring a second selection result of a user for each preset selection item in a second selection interface, wherein the second selection interface is provided with selection items for different preset identifications;
the step of exporting the second content comprises:
exporting a plurality of second contents according to the different preset identifiers determined by the second selection result, wherein the plurality of second contents comprise the second contents corresponding to the first contents that carry the different preset identifiers.
Optionally, before the exporting the second content, the method further comprises:
acquiring a third selection result of a user for each preset selection item in a third selection interface, wherein the third selection interface is provided with a selection item for choosing whether to export the directory of the document to be processed;
the step of exporting the second content comprises:
acquiring, from preset storage structure information of the document to be processed, a directory of the document to be processed and node identifiers corresponding to the directory levels in the directory, wherein one node identifier is used for identifying one directory level;
determining the node identifier that is located before the second content in the document to be processed and is separated from the second content by the fewest characters;
determining a correspondence between the second content and a directory level according to the determined node identifier;
and adding the second content to the directory level corresponding to the second content according to the correspondence, and exporting the second content and the directory together.
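The directory steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the node identifiers are modelled as hypothetical `(offset, title)` pairs sorted by character offset, each second content as an `(offset, text)` pair, and the function name `attach_to_directory` is invented here; the patent does not specify how the storage structure information is laid out.

```python
from bisect import bisect_right

def attach_to_directory(nodes, second_contents):
    """Group exported contents under the nearest preceding directory node.

    nodes: list of (offset, title) pairs sorted by offset; a hypothetical
        stand-in for the node identifiers in the storage structure.
    second_contents: list of (offset, text) pairs.
    """
    offsets = [offset for offset, _ in nodes]
    grouped = {title: [] for _, title in nodes}
    for offset, text in second_contents:
        # Index of the last node located at or before this content.
        k = bisect_right(offsets, offset) - 1
        if k >= 0:  # content before the first node is skipped in this sketch
            grouped[nodes[k][1]].append(text)
    return grouped

nodes = [(0, "1 Overview"), (120, "2 Method"), (400, "3 Results")]
contents = [(150, "The method has three steps."), (410, "Accuracy improved.")]
outline = attach_to_directory(nodes, contents)
```

Because the nodes are sorted by offset, the nearest preceding node is found with a binary search rather than a scan back through the document text.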
Optionally, the step of obtaining the first content from the document to be processed includes:
acquiring a plurality of first contents from the document to be processed;
the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
determining the second content corresponding to the first content according to the first contents and the document symbols before and after the first contents respectively;
when a plurality of identical second contents exist, retaining one of the plurality of identical second contents and deleting the other of the plurality of identical second contents;
when the plurality of identical second contents do not exist, the step of deriving the second contents is performed.
In a second aspect, an embodiment of the present invention provides a document content exporting apparatus, where the apparatus includes:
a first obtaining module, configured to obtain first content from a to-be-processed document, where the first content has a preset identifier in the to-be-processed document, and the first content includes: text content, formula content, or picture content;
a determining module, configured to determine second content based on the first content and a document symbol of a preceding or following text of the first content, where the second content is a sentence or a paragraph that includes the first content, and the document symbol is used to identify a sentence or a paragraph end;
and the export module is used for exporting the second content.
Optionally, the determining module is specifically configured to:
determining the second content based on the first content and, on each side of the first content, the document symbol separated from the first content by the fewest characters.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain a first selection result of a user for each preset selection item in a first selection interface, where the first selection interface is provided with selection items for different export modes, and the export modes include: exporting the whole sentence and exporting the whole paragraph;
the determining module comprises:
a first determining submodule, configured to determine, in the to-be-processed document, a first symbol of text content located before the first content and a second symbol of text content located after the first content according to the derivation manner determined by the first selection result;
a second determining sub-module, configured to determine that content between the first symbol and the second symbol is the second content.
Optionally, the determining module includes:
a first searching submodule, configured to, in the to-be-processed document, take the first character of the first content as a first starting search point, search forward for the document symbol separated from the first content by the fewest characters, and use that document symbol as the first symbol;
a second searching submodule, configured to, in the to-be-processed document, take the last character of the first content as a second starting search point, search backward for the document symbol separated from the first content by the fewest characters, and use that document symbol as the second symbol;
a third determining sub-module, configured to determine content between the first symbol and the second symbol as the second content.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring a second selection result of the user for each preset selection item in a second selection interface, and the second selection interface is provided with selection items for different preset identifications;
the export module is specifically configured to:
exporting a plurality of second contents according to the different preset identifiers determined by the second selection result, wherein the plurality of second contents comprise the second contents corresponding to the first contents that carry the different preset identifiers.
Optionally, the apparatus further comprises:
the fourth obtaining module is used for obtaining a third selection result of the user for each preset selection item in a third selection interface, and the third selection interface is provided with a selection item for choosing whether to export the directory of the document to be processed;
the export module includes:
the acquisition submodule is used for acquiring a directory of the document to be processed and node identifications corresponding to directory hierarchies in the directory from preset storage structure information of the document to be processed, wherein one node identification is used for identifying one directory hierarchy;
a fourth determining submodule, configured to determine the node identifier that is located before the second content in the to-be-processed document and is separated from the second content by the fewest characters;
a fifth determining submodule, configured to determine, according to the determined node identifier, a correspondence between the second content and the directory hierarchy;
and the adding submodule is used for adding the second content to a directory hierarchy corresponding to the second content according to the corresponding relation and exporting the second content and the directory together.
Optionally, the first obtaining module is specifically configured to:
acquiring a plurality of first contents from the document to be processed;
the determining module includes:
a sixth determining sub-module, configured to determine, according to the plurality of first contents and the document symbols before and after each of the first contents, the second content corresponding to each first content;
a processing sub-module, configured to, when a plurality of identical second contents exist, retain one of the plurality of identical second contents and delete the others;
a triggering sub-module, configured to trigger the export module to perform the step of exporting the second content when no plurality of identical second contents exists.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement the method steps of the document content export method provided by the first aspect of the embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is executed by a processor to perform the method steps of the document content export method provided in the first aspect of the embodiment of the present invention.
According to the document content export method, the export device, the electronic device and the storage medium provided by the embodiment of the invention, the first content is obtained from the document to be processed, the second content is determined based on the first content and the document symbols before and after the first content, the second content being a sentence or paragraph including the first content, and finally the second content is exported. With this method, the sentence or paragraph including the first content can be determined as the second content and exported, that is, the first content is exported together with its context, which solves the technical problem that existing document content export methods can only export the content that carries an identifier. The new document generated after exporting the second content is therefore more complete, so the user can understand the content without going back to the original document, which improves the user's reading experience. Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a document content export method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another method for exporting document contents according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating step S102 in a document content export method according to an embodiment of the present invention;
FIG. 4 is a third flowchart illustrating a document content exporting method according to an embodiment of the present invention;
FIG. 5 is a fourth flowchart illustrating a document content exporting method according to an embodiment of the present invention;
FIG. 6 is a fifth flowchart illustrating a document content export method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a document content export apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a document content export apparatus according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating a first structure of a determination module in a document content export apparatus according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a second structure of a determination module in a document content export apparatus according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a third structure of a document content exporting apparatus according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a fourth structure of a document content exporting apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an export module in a document content export apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram illustrating a third structure of a determination module in a document content export apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the present invention first provides a document content export method, which may be applied to a terminal device, for example an electronic device with information processing capability such as a computer, a tablet computer, or a smart phone.
As shown in fig. 1, an embodiment of the present invention provides a document content export method, which may include the following steps:
S101, acquiring first content from a document to be processed.
In the embodiment of the present invention, the document to be processed may be a Word document, a PPT (PowerPoint) document, or a PDF (Portable Document Format) document. Any type of document in which the user can add a preset identifier to content can have its second content exported by the document content export method provided by the embodiment of the invention.
The first content has a preset identifier in the document to be processed. The preset identifier may include highlighting, underlining, italics, bolding, and the like, as long as it is an identifier in the identifier library of the document to be processed. Of course, implementations of this feature other than the examples given here also fall within the protection scope of the embodiment of the present invention.
The first content is content that the user considers relatively important while reading the document to be processed, and may include text content, formula content, or picture content.
S102, determining second content based on the first content and the document symbols before and after the first content.
In an embodiment of the present invention, the second content may be a sentence or a paragraph including the first content. A sentence here refers to a complete sentence, whose end is usually a punctuation mark such as a period, question mark, or exclamation mark; such a mark is a sentence-ending symbol. A paragraph refers to a whole paragraph, whose end is usually a paragraph symbol; in a Word document, for example, the paragraph symbol is the carriage return.
The document symbol is used to identify the end of a sentence or paragraph, and thus the document symbol herein may be an end-of-sentence symbol such as a period, question mark or exclamation mark, or the document symbol may be a paragraph symbol.
As an optional implementation manner of the embodiment of the present invention, step S102 in the embodiment of the present invention may specifically include:
determining the second content based on the first content and, on each side of the first content, the document symbol separated from the first content by the fewest characters.
In the embodiment of the present invention, the context of the first content refers to the content located before the first content and the content located after the first content in the document to be processed. When the document symbol is a sentence-ending symbol such as a period, question mark, or exclamation mark, the document symbol that is located before the first content and is separated from it by the fewest characters is the sentence-ending symbol of the sentence preceding the sentence in which the first content is located, and the document symbol that is located after the first content and is separated from it by the fewest characters is the sentence-ending symbol of the sentence in which the first content is located. The second content can then be determined from the sentence-ending symbol of the preceding sentence and the sentence-ending symbol of the sentence in which the first content is located.
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 2, before S102 in the embodiment of the present invention, the method for exporting document content provided by the embodiment of the present invention further includes:
S201, acquiring a first selection result of the user for each preset selection item in the first selection interface.
In the embodiment of the present invention, the electronic device may present a first selection interface to the user. The first selection interface is provided with selection items for different export modes, which may include exporting the whole sentence and exporting the whole paragraph. The user can choose an export mode according to how well the first content can be understood on its own. For example, when the user thinks that the first content can only be understood through the paragraph that includes it, the user may select the item for exporting the whole paragraph; when the user thinks that the sentence including the first content is enough to understand it, the user may select the item for exporting the whole sentence.
After the electronic device obtains the first selection result of the user for each preset selection item in the first selection interface, as shown in fig. 2, S102 in the embodiment of the present invention may specifically include:
S10211, determining, in the document to be processed, a first symbol in the text before the first content and a second symbol in the text after the first content according to the export mode determined by the first selection result.
When the first selection result obtained by the electronic device is to export the whole paragraph, the paragraph symbol in the text located before the first content may be determined in the document to be processed as the first symbol, and the paragraph symbol in the text located after the first content may be determined as the second symbol. When the first selection result obtained by the electronic device is to export the whole sentence, the sentence-ending symbol in the text located before the first content may be determined in the document to be processed as the first symbol, and the sentence-ending symbol in the text located after the first content may be determined as the second symbol.
S10212, determining the content between the first symbol and the second symbol as the second content.
The paragraph including the first content lies between the paragraph symbol in the text before the first content and the paragraph symbol in the text after the first content; the sentence including the first content lies between the sentence-ending symbol in the text before the first content and the sentence-ending symbol in the text after the first content.
The export modes may also include exporting only the first content. When the user considers that the first content can be understood on its own, without its context, the user may select the item for exporting only the first content. After obtaining a first selection result of exporting only the first content, the electronic device exports the first content directly.
With this technical scheme, the user is offered selection items for different export modes and can choose according to actual need, and the electronic device exports content that matches the export mode the user chose. The technical scheme can therefore better meet the user's requirements.
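The three export modes described above can be sketched as follows. This is a minimal illustration, assuming the document is available as a plain string, that the first content is given by hypothetical `start` and `end` character offsets, and that a line break stands in for the paragraph symbol; the mode names and the function `export_span` are invented for this sketch and are not part of the patent.

```python
# Hypothetical symbol sets: sentence-ending punctuation (Western and
# Chinese forms) and a line break standing in for the paragraph symbol.
SYMBOLS = {
    "sentence": set(".?!。？！"),
    "paragraph": {"\n"},
}

def export_span(text, start, end, mode):
    """Return the content to export for the marked span text[start:end]."""
    if mode == "first_only":
        return text[start:end]
    symbols = SYMBOLS[mode]
    # First symbol: the closest symbol before the first content (-1 if none).
    first = max(text.rfind(s, 0, start) for s in symbols)
    # Second symbol: the closest symbol after the first content.
    hits = [i for i in (text.find(s, end) for s in symbols) if i != -1]
    second = min(hits) if hits else len(text) - 1
    return text[first + 1 : second + 1].strip()

doc = "Intro line. More intro.\nKey idea here. It matters!\nEnd."
start = doc.index("idea")
end = start + len("idea")
sentence = export_span(doc, start, end, "sentence")
paragraph = export_span(doc, start, end, "paragraph")
only = export_span(doc, start, end, "first_only")
```

In this sketch the sentence mode ignores paragraph breaks, so a sentence with no ending punctuation would run into the next paragraph; a fuller version might treat the paragraph symbol as a boundary in both modes.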
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 3, step S102 in the embodiment of the present invention may specifically include:
S10221, in the document to be processed, taking the first character of the first content as the first starting search point, and searching forward for the document symbol separated from the first content by the fewest characters, as the first symbol.
When the second content is a sentence, the first character of the first content may be taken as the first starting search point in the document to be processed, and the search proceeds forward (toward the beginning of the document) for the nearest sentence-ending symbol. The symbol found is the sentence-ending symbol of the sentence preceding the sentence in which the first content is located, and it is the first symbol.
S10222, in the document to be processed, taking the last character of the first content as the second starting search point, and searching backward for the document symbol separated from the first content by the fewest characters, as the second symbol.
In the document to be processed, the last character of the first content may be taken as the second starting search point, and the search proceeds backward (toward the end of the document) for the nearest sentence-ending symbol. The symbol found is the sentence-ending symbol of the sentence in which the first content is located, and it is the second symbol.
S10223, determining the content between the first symbol and the second symbol as the second content.
The content between the sentence-ending symbol of the preceding sentence and the sentence-ending symbol of the sentence in which the first content is located is the sentence in which the first content is located, that is, the second content. Through steps S10221 to S10223, the second content can be determined quickly.
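Steps S10221 to S10223 can be sketched as two character-by-character scans. This is an illustrative sketch only: the function names and the `SENTENCE_END` set are assumptions, the document is treated as a plain string, and the second scan starts at the last character itself so that a marked span that already ends with the punctuation is handled.

```python
# Hypothetical set of sentence-ending symbols.
SENTENCE_END = set(".?!。？！")

def find_first_symbol(text, first_char):
    """Scan toward the start of the document from the first character
    of the first content; return the symbol's index, or -1 if none."""
    i = first_char - 1
    while i >= 0 and text[i] not in SENTENCE_END:
        i -= 1
    return i

def find_second_symbol(text, last_char):
    """Scan toward the end of the document from the last character of
    the first content; return the symbol's index, or the index of the
    final character if no symbol is found."""
    i = last_char
    while i < len(text) and text[i] not in SENTENCE_END:
        i += 1
    return min(i, len(text) - 1)

def second_content(text, first_char, last_char):
    """Content between the first symbol and the second symbol."""
    first = find_first_symbol(text, first_char)
    second = find_second_symbol(text, last_char)
    return text[first + 1 : second + 1].strip()

text = "Cats sleep a lot. Dogs bark loudly! Birds sing."
first_char = text.index("bark")
last_char = first_char + len("bark") - 1
result = second_content(text, first_char, last_char)
```

Each scan stops at the first symbol it meets, which is by construction the symbol separated from the first content by the fewest characters on that side.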
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 4, step S101 in the embodiment of the present invention may specifically include:
S1011, acquiring a plurality of first contents from the document to be processed.
Since the user may need to add the preset identifier to the plurality of first contents when reading the document to be processed, the plurality of first contents may be acquired.
When multiple first contents are acquired, as shown in fig. 4, step S102 in the embodiment of the present invention may specifically include:
S10231, determining the second content corresponding to each first content according to the plurality of first contents and the document symbols before and after each first content.
First, second content corresponding to each first content may be determined, and a method of specifically determining the second content may refer to steps S10221 to S10223 shown in fig. 3, where the second content may be a sentence or a paragraph including the first content.
S10232, when there are a plurality of identical second contents, one of the plurality of identical second contents is retained, and the other of the plurality of identical second contents is deleted.
When the second content is a sentence, a plurality of first contents may be located in the same sentence in the document to be processed, so that the second contents corresponding to these first contents are all the same sentence. Whether identical second contents exist may therefore be determined; if they do, one of the identical second contents may be retained and the others deleted, so as to ensure that no two identical second contents appear among the exported second contents.
S10233, when there are no plurality of identical second contents, a step of deriving the second contents is performed.
If no identical second contents exist among all the second contents, all the second contents may be exported.
With this technical solution, whether identical second contents exist is determined, and if so, one of them is retained and the others are deleted. This avoids the situation in which a plurality of first contents located in one sentence or one paragraph yield the same second content several times, so that the finally exported new document contains several identical second contents. The reading experience of the user can therefore be further improved.
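The de-duplication in steps S10231 to S10233 can be sketched as follows. This is a minimal illustration that assumes the second contents are held as a list of strings in document order; the function name `dedupe_second_contents` is hypothetical, and keeping the first occurrence is a choice of this sketch, since the embodiment only requires that one copy be retained.

```python
def dedupe_second_contents(second_contents):
    """Keep one copy of each identical second content, preserving order."""
    seen = set()
    unique = []
    for content in second_contents:
        if content not in seen:
            # First time this sentence or paragraph is seen: retain it.
            seen.add(content)
            unique.append(content)
        # Later identical copies are skipped, i.e. deleted from the export.
    return unique
```

If the list contains no identical entries, it is returned unchanged, which corresponds to proceeding directly to the export step.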
Referring to fig. 1, S103, the second content is exported.
When only one second content exists, the second content is exported and a new document is generated. When a plurality of second contents exist, all the second contents are exported, a line break (carriage return) is added after each second content, and a new document is generated.
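As a small sketch of the export step, the text of the new document could be assembled as below. The function name `build_export_text` is hypothetical, and representing the line break as "\n" is an assumption; the actual symbol depends on the target document format.

```python
def build_export_text(second_contents):
    """Assemble the text of the new document from the second contents."""
    if len(second_contents) == 1:
        # A single second content is exported as it is.
        return second_contents[0]
    # With several second contents, a line break follows each of them.
    return "".join(content + "\n" for content in second_contents)
```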
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 5, before S101 in the embodiment of the present invention, the method for exporting document content provided by the embodiment of the present invention further includes:
S301, acquiring a second selection result of the user for each preset selection item in the second selection interface.
In the embodiment of the present invention, the electronic device may present a second selection interface to the user, where the second selection interface is provided with selection items for different preset identifiers. For example, highlight, underline, italic, bold, and the like may be set as preset selection items. The preset selection items may be set according to the labeling habits of the user and the document type of the document to be processed. For example, when the document to be processed is a PDF document, since the preset identifiers available in a PDF document are only highlight, underline, strikethrough, and the like, these three preset identifiers may be set as three selection items in the second selection interface. For another example, when the document to be processed is a Word document, many preset identifiers are available, but to make the first content stand out from the other content in the document to be processed, the user generally chooses highlighting or changing the color of the document content; therefore, for a Word document, these two preset identifiers may be set as two selection items in the second selection interface.
After the electronic device obtains the selection items of the user for different preset identifiers, as shown in fig. 5, S103 in the embodiment of the present invention may specifically include:
S10311, exporting a plurality of second contents according to the different preset identifiers determined by the second selection result.
In an embodiment of the present invention, the plurality of second contents includes the second contents corresponding to first contents with different preset identifiers.
When the user only selects one selection item of the preset identification, the electronic equipment only needs to export the second content corresponding to the first content with the preset identification; when the user selects a plurality of preset-identified selection items, for example, the user selects highlighting and bolding simultaneously, the electronic device needs to determine the second content corresponding to the highlighted first content and the second content corresponding to the bolded first content, and export both the second contents.
In general, when the document to be processed covers several aspects of knowledge, the user adds a different preset identifier to each aspect. In this case, the above technical solution provides the user with selection items for a plurality of different preset identifiers. When the user needs knowledge of only one aspect, the user may select the preset identifier corresponding to that aspect, and the electronic device exports only that knowledge. When the user needs knowledge of several aspects, the user may select several preset identifiers so that the electronic device exports all of them to one new document; alternatively, the user may select a different preset identifier each time and export a new document each time, so that knowledge of different aspects is exported to different new documents. The user can therefore choose according to actual needs, which further improves the reading experience.
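Selecting first contents by preset identifier can be sketched as a simple filter. The `FirstContent` record, its field names, and the identifier strings such as "highlight" are hypothetical and only serve the illustration; the embodiment does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstContent:
    start: int        # offset of the first character in the document
    end: int          # offset just past the last character
    identifier: str   # preset identifier, e.g. "highlight" or "bold"


def select_by_identifier(first_contents, selected_identifiers):
    """Keep only the first contents whose identifier the user selected."""
    selected = set(selected_identifiers)
    return [fc for fc in first_contents if fc.identifier in selected]
```

The second content would then be determined only for the first contents returned by the filter, so that one call with several identifiers yields a single combined export and repeated calls with one identifier each yield separate exports.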
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 6, before S103 in the embodiment of the present invention, the method for exporting document content provided by the embodiment of the present invention further includes:
S401, acquiring a third selection result of the user for each preset selection item in the third selection interface.
In the embodiment of the present invention, the electronic device may present a third selection interface to the user, where the third selection interface is provided with a selection item for whether to export the directory of the document to be processed. The preset selection items may generally be set as two items, yes and no, for the user to choose from.
When the obtained third selection result of the user for the third selection interface is yes, that is, the user wishes to export the directory of the document to be processed, as shown in fig. 6, step S103 in the embodiment of the present invention may specifically include:
S10321, obtaining a directory of the document to be processed, and the node identifiers corresponding to each directory hierarchy in the directory, from the preset storage structure information of the document to be processed.
The preset storage structure information usually includes information such as the document content, the directory, and the document attributes, so the directory of the document to be processed and the node identifier corresponding to each directory hierarchy in the directory can be obtained from the preset storage structure information of the document to be processed. In an embodiment of the present invention, one node identifier is used to identify one directory hierarchy; for example, for a Word document, the node identifier may be the paragraph symbol at the end of each directory hierarchy.
S10322, determining the node identifier that is located before the second content in the document to be processed and has the fewest words between it and the second content.
The specific determination method may be that a node identifier with the minimum number of words between the node identifier and the second content is searched forward from the first character of the second content, and the searched node identifier is determined as the node identifier corresponding to the directory hierarchy where the second content is located. When there are a plurality of second contents, the node identifier corresponding to each second content may be determined separately according to the above method.
S10323, determining a correspondence between the second content and the directory hierarchy according to the determined node identifier.
According to the node identifier corresponding to each second content, the directory hierarchy corresponding to each second content can be determined, and then the corresponding relation between each second content and the directory hierarchy can be determined.
S10324, adding the second content to the directory hierarchy corresponding to the second content according to the correspondence, and exporting the second content and the directory together.
Each second content can be added to the directory hierarchy corresponding to each second content according to the corresponding relationship between the second content and the directory hierarchy, and the second content and the directory can be exported together.
With this technical solution, the user is provided with a selection item for whether to export the directory and can choose according to his or her own needs. If the user considers that the document to be processed contains little content and the directory is not needed, the user selects no, and the electronic device exports only the second content. If the user expects that the second content cannot be fully understood when read in the new document generated after export, or wants to see the content before and after the second content in the document to be processed, the user selects yes, and the electronic device exports the directory together with the second content. The user can then quickly locate the second content in the document to be processed according to the directory and review the surrounding content, which makes the second content easier to understand. This technical solution can therefore further meet the needs of users.
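Steps S10321 to S10324 can be sketched as follows, assuming that each directory entry is already available as a pair of (offset of its node identifier, title) sorted by offset, and each second content as a pair of (offset of its first character, text). These structures, the function name `export_with_directory`, and the indentation used in the output are hypothetical. The standard library function `bisect_left` locates the nearest node identifier that lies strictly before the second content.

```python
from bisect import bisect_left


def export_with_directory(headings, second_contents):
    """Place each second content under the directory entry that precedes it."""
    offsets = [offset for offset, _ in headings]
    grouped = [[] for _ in headings]
    before_any_heading = []
    for start, text in second_contents:
        # Index of the last node identifier located before the second content.
        index = bisect_left(offsets, start) - 1
        if index >= 0:
            grouped[index].append(text)
        else:
            # No directory entry precedes this content; keep it at the top.
            before_any_heading.append(text)
    lines = list(before_any_heading)
    for (_, title), texts in zip(headings, grouped):
        lines.append(title)
        lines.extend("  " + t for t in texts)
    return "\n".join(lines)
```

Keeping contents that precede every directory entry at the top of the export is a design choice of this sketch, made so that no second content is silently dropped.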
According to the document content export method provided by the embodiment of the invention, the first content is obtained from the document to be processed, the second content is determined based on the first content and the document symbol before and after the first content, the second content is a sentence or paragraph including the first content, and finally the second content is exported. By the document exporting method, the sentence or paragraph comprising the first content can be determined as the second content, and the second content is exported, namely, the first content is exported and the context of the first content is exported at the same time, so that the technical problem that the existing document content exporting method can only export the content with the mark is solved. Therefore, the document export method of the embodiment of the invention leads the content of the new document generated after exporting the second content to be more comprehensive, facilitates the user to understand the content on the basis, does not need to re-look over the original document, and improves the reading experience of the user.
A specific embodiment of the document content exporting apparatus provided by an embodiment of the present invention corresponds to the flow shown in fig. 1. Referring to fig. 7, fig. 7 is a schematic structural diagram of a document content exporting apparatus according to an embodiment of the present invention, including:
a first obtaining module 501, configured to obtain first content from a to-be-processed document, where the first content has a preset identifier in the to-be-processed document, and the first content includes: text content, formula content, or picture content.
A determining module 502, configured to determine, based on the first content and a document symbol of a preceding or following text of the first content, a second content, where the second content is a sentence or a paragraph that includes the first content, and the document symbol is used to identify an end of the sentence or the paragraph.
A derivation module 503, configured to derive the second content.
As an optional implementation manner of the embodiment of the present invention, the determining module 502 is specifically configured to:
the second content is determined based on the first content and the document symbol having the least number of words between the context of the first content and the first content.
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 8, the document content exporting apparatus provided in the embodiment of the present invention may further include:
a second obtaining module 601, configured to obtain a first selection result of a user for each preset selection item in a first selection interface, where the first selection interface is provided with selection items for different derivation manners, and the derivation manners include: exporting the whole sentence and exporting the whole paragraph.
As shown in fig. 9, the determining module 502 may include:
the first determining sub-module 5021 is configured to determine, according to the derivation manner determined by the first selection result, a first symbol of the text content located before the first content and a second symbol of the text content located after the first content in the document to be processed.
The second determining submodule 5022 is used for determining that the content between the first symbol and the second symbol is the second content.
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 10, the determining module 502 may include:
the first searching sub-module 5023 is configured to, in the document to be processed, take the first character of the first content as a first starting search point, search forward for the document symbol with the fewest words between it and the first content, and take that document symbol as the first symbol.
The second search sub-module 5024 is configured to search backward a document symbol with the least number of words with the first content as a second symbol, with the last character of the first content as a second starting search point, in the document to be processed.
A third determining submodule 5025 is used for determining the content between the first symbol and the second symbol as the second content.
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 11, the document content exporting apparatus provided in the embodiment of the present invention may further include:
a third obtaining module 701, configured to obtain a second selection result of the user for each preset selection item in a second selection interface, where the second selection interface is provided with selection items for different preset identifiers.
The derivation module is specifically configured to:
deriving a plurality of second contents according to the different preset identifiers determined by the second selection result, where the plurality of second contents includes the second contents corresponding to first contents with different preset identifiers.
As an optional implementation manner of the embodiment of the present invention, as shown in fig. 12, the document content exporting apparatus provided in the embodiment of the present invention may further include:
a fourth obtaining module 801, configured to obtain a third selection result of the user for each preset selection item in a third selection interface, where a selection item for whether to export a directory of the to-be-processed document is set in the third selection interface.
As shown in fig. 13, the derivation module 503 may include:
the obtaining sub-module 5031 is configured to obtain, from the preset storage structure information of the to-be-processed document, a directory of the to-be-processed document and node identifiers corresponding to directory hierarchies in the directory, where one node identifier is used to identify one directory hierarchy.
A fourth determining sub-module 5032, configured to determine the node identifier that is located before the second content in the document to be processed and has the least number of words with the second content.
The fifth determining sub-module 5033 is configured to determine, according to the determined node identifier, a correspondence between the second content and the directory hierarchy.
The adding sub-module 5034 is configured to add the second content to a directory hierarchy corresponding to the second content according to the correspondence, and export the second content and the directory together.
As an optional implementation manner of the embodiment of the present invention, the first obtaining module 501 is specifically configured to: a plurality of first contents are obtained from a document to be processed.
As shown in fig. 14, the determining module 502 may include:
the sixth determining sub-module 5026 is configured to determine each of the second contents corresponding to the first contents according to the plurality of first contents and the document symbols before and after the first contents, respectively.
The processing sub-module 5027 is configured to, when a plurality of identical second contents exist, retain one of the plurality of identical second contents and delete the other of the plurality of identical second contents.
The triggering submodule 5028 is configured to trigger the deriving module 503 to perform the step of deriving the second content when there are no multiple identical second contents.
The document content exporting apparatus provided by the embodiment of the present invention obtains a first content from a document to be processed, determines a second content based on the first content and a document symbol before and after the first content, where the second content is a sentence or paragraph including the first content, and finally exports the second content. By the document exporting method, the sentence or paragraph comprising the first content can be determined as the second content, and the second content is exported, namely, the first content is exported and the context of the first content is exported at the same time, so that the technical problem that the existing document content exporting method can only export the content with the mark is solved. Therefore, the document export method of the embodiment of the invention leads the content of the new document generated after exporting the second content to be more comprehensive, facilitates the user to understand the content on the basis, does not need to re-look over the original document, and improves the reading experience of the user.
An embodiment of the present invention further provides an electronic device, as shown in fig. 15, including a processor 901, a communication interface 902, a memory 903 and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 complete mutual communication through the communication bus 904.
A memory 903 for storing computer programs.
The processor 901 is configured to implement the following steps when executing the program stored in the memory 903:
acquiring first content from a document to be processed, wherein the first content has a preset identifier in the document to be processed, and the first content comprises: text content, formula content, or picture content.
And determining second content based on the first content and the document symbol of the front and back of the first content, wherein the second content is a sentence or paragraph comprising the first content, and the document symbol is used for identifying the end of the sentence or paragraph.
The second content is derived.
According to the electronic device provided by the embodiment of the invention, the first content is obtained from the document to be processed, the second content is determined based on the first content and the document symbol before and after the first content, the second content is a sentence or paragraph including the first content, and finally the second content is exported. By the document exporting method, the sentence or paragraph comprising the first content can be determined as the second content, and the second content is exported, namely, the first content is exported and the context of the first content is exported at the same time, so that the technical problem that the existing document content exporting method can only export the content with the mark is solved. Therefore, the document export method of the embodiment of the invention leads the content of the new document generated after exporting the second content to be more comprehensive, facilitates the user to understand the content on the basis, does not need to re-look over the original document, and improves the reading experience of the user.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is enabled to execute the document content export method described in any of the above embodiments.
For the apparatus/electronic device/storage medium embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
It should be noted that the apparatus, the electronic device, and the storage medium according to the embodiments of the present invention are respectively an apparatus, an electronic device, and a storage medium to which the above document content export method is applied, and all embodiments of the above document content export method are applicable to the apparatus, the electronic device, and the storage medium, and can achieve the same or similar beneficial effects.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A method for exporting document contents, the method comprising:
acquiring first content from a document to be processed, wherein the first content has a preset identifier in the document to be processed, and the first content comprises: text content, formula content, or picture content;
determining second content based on the first content and a document symbol of a preceding and following text of the first content, wherein the second content is a sentence or a paragraph including the first content, and the document symbol is used for identifying the end of the sentence or the paragraph;
exporting the second content.
2. The method of claim 1, wherein the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
determining the second content based on the first content and the document symbol having the least number of words between the context of the first content and the first content.
3. The method of claim 1 or 2, wherein prior to said determining second content based on said first content and a document notation of a preceding or following text of said first content, the method further comprises:
acquiring a first selection result of a user for each preset selection item in a first selection interface, wherein the first selection interface is provided with selection items for different derivation modes, and the derivation modes comprise: exporting the whole sentence and exporting the whole paragraph;
the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
according to the derivation mode determined by the first selection result, determining a first symbol of text content located before the first content and determining a second symbol of text content located after the first content in the document to be processed;
determining content between the first symbol and the second symbol as the second content.
4. The method of claim 3, wherein the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
in the document to be processed, taking the first character of the first content as a first initial searching point, and searching a document symbol with the least word number with the first content forward as the first symbol;
in the document to be processed, taking the last character of the first content as a second initial searching point, and searching a document symbol with the least word number with the first content backwards as the second symbol;
determining a content between the first symbol and the second symbol as the second content.
5. The method according to claim 1, wherein before the obtaining the first content from the document to be processed, the method further comprises:
acquiring a second selection result of a user for each preset selection item in a second selection interface, wherein the second selection interface is provided with selection items for different preset identifications;
the step of deriving the second content comprises:
deriving a plurality of second contents according to the different preset identifications determined by the second selection result, wherein the plurality of second contents comprise the second contents corresponding to first contents with different preset identifications.
6. The method of claim 1, wherein prior to said deriving said second content, said method further comprises:
acquiring a third selection result of a user for each preset selection item in a third selection interface, wherein the third selection interface is provided with a selection item for whether to export the directory of the document to be processed;
the step of deriving the second content comprises:
acquiring a directory of the document to be processed and node identifications corresponding to directory hierarchies in the directory from preset storage structure information of the document to be processed, wherein one node identification is used for identifying one directory hierarchy;
determining a node identifier which is positioned in the document to be processed and is before the second content and has the least word number with the second content;
determining a corresponding relation between the second content and the directory hierarchy according to the determined node identifier;
and adding the second content to a directory hierarchy corresponding to the second content according to the corresponding relation, and exporting the second content and the directory together.
7. The method according to claim 1, wherein the step of obtaining the first content from the document to be processed comprises:
acquiring a plurality of first contents from the document to be processed;
the step of determining the second content based on the first content and the document symbol of the preceding and following text of the first content comprises:
determining the second content corresponding to each first content according to the plurality of first contents and the document symbols before and after each first content respectively;
when a plurality of identical second contents exist, retaining one of the plurality of identical second contents and deleting the other of the plurality of identical second contents;
when the plurality of identical second contents do not exist, the step of deriving the second contents is performed.
8. An apparatus for exporting content of a document, the apparatus comprising:
a first obtaining module, configured to obtain first content from a to-be-processed document, where the first content has a preset identifier in the to-be-processed document, and the first content includes: text content, formula content, or picture content;
a determining module, configured to determine second content based on the first content and a document symbol of a preceding or following text of the first content, where the second content is a sentence or a paragraph that includes the first content, and the document symbol is used to identify a sentence or a paragraph end;
and the export module is used for exporting the second content.
9. The apparatus of claim 8, wherein the determining module is specifically configured to:
determining the second content based on the first content and the document symbol having the least number of words between the context of the first content and the first content.
10. The apparatus of claim 8 or 9, further comprising:
a second obtaining module, configured to obtain a first selection result of a user for each preset selection item in a first selection interface, where the first selection interface is provided with selection items for different derivation manners, and the derivation manner includes: exporting the whole sentence content and exporting the whole piece of content;
the determining module includes:
a first determining submodule, configured to determine, in the to-be-processed document, a first symbol of text content located before the first content and a second symbol of text content located after the first content according to the derivation manner determined by the first selection result;
a second determining sub-module, configured to determine that content between the first symbol and the second symbol is the second content.
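The selection logic of claim 10 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the mode names, the symbol sets, and the function names are all assumptions, since the claims do not enumerate which document symbols end a sentence or a paragraph. The sketch also chooses to keep the closing symbol as part of the exported text.

```python
# Hypothetical symbol sets; the claims do not list the actual symbols.
SENTENCE_END = {".", "!", "?", "。", "！", "？"}
PARAGRAPH_END = {"\n"}


def symbols_for_mode(mode: str) -> set[str]:
    """Return the document symbols that close a unit for the selected export mode."""
    if mode == "sentence":
        return SENTENCE_END
    if mode == "paragraph":
        return PARAGRAPH_END
    raise ValueError(f"unknown export mode: {mode!r}")


def second_content(doc: str, start: int, end: int, mode: str) -> str:
    """Expand doc[start:end] (the first content) to its enclosing sentence or paragraph."""
    symbols = symbols_for_mode(mode)
    # First symbol: nearest symbol before the first content (-1 if there is none).
    first = max(doc.rfind(s, 0, start) for s in symbols)
    # Second symbol: nearest symbol at or after the end of the first content.
    hits = [i for i in (doc.find(s, end) for s in symbols) if i != -1]
    second = min(hits) if hits else len(doc) - 1
    # Assumption: the closing symbol is exported together with the text.
    return doc[first + 1 : second + 1].strip()
```

With the same marked span, switching the mode from "sentence" to "paragraph" only changes which symbols bound the search, so the rest of the pipeline can stay unchanged.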
11. The apparatus of claim 10, wherein the determining module comprises:
a first searching submodule, configured to search forward in the to-be-processed document, with the first content as a first starting search point, for the document symbol separated from the first content by the fewest words, and use that document symbol as the first symbol;
a second searching submodule, configured to search backward in the to-be-processed document, with the last character of the first content as a second starting search point, for the document symbol separated from the first content by the fewest words, and use that document symbol as the second symbol;
a third determining sub-module, configured to determine content between the first symbol and the second symbol as the second content.
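The two searches of claim 11 can be illustrated with explicit character scans. This is a sketch under stated assumptions: "forward" is read here as scanning toward the start of the document and "backward" as scanning toward its end, distance is counted in characters, and the function names and the symbol set passed in are hypothetical.

```python
def search_forward(doc: str, start: int, symbols: set[str]) -> int:
    """Scan from the first content toward the start of the document.

    Returns the index of the nearest document symbol, or -1 if none precedes it.
    """
    i = start - 1
    while i >= 0 and doc[i] not in symbols:
        i -= 1
    return i


def search_backward(doc: str, last: int, symbols: set[str]) -> int:
    """Scan from the last character of the first content toward the end of the document.

    Returns the index of the nearest document symbol, or len(doc) if none follows.
    """
    i = last + 1
    while i < len(doc) and doc[i] not in symbols:
        i += 1
    return i


def content_between(doc: str, first_symbol: int, second_symbol: int) -> str:
    """Return the text after the first symbol up to and including the second symbol."""
    # Assumption: the closing symbol is kept so the exported sentence stays punctuated.
    return doc[first_symbol + 1 : second_symbol + 1].strip()
```

Because each scan stops at the first symbol it meets, the symbol it returns is by construction the one closest to the first content in that direction.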
12. The apparatus of claim 8, further comprising:
a third obtaining module, configured to obtain a second selection result of the user for each preset selection item in a second selection interface, where the second selection interface is provided with selection items for different preset identifiers;
the export module is specifically configured to:
export a plurality of second contents according to the different preset identifiers determined by the second selection result, where the plurality of second contents include the second contents corresponding to first contents having the different preset identifiers.
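A minimal sketch of the identifier-based export in claim 12 follows. The identifier names ("highlight", "underline", "bold"), the `Marked` record, and the function name are illustrative assumptions; the claims only require that second contents be exported according to the identifiers the user selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marked:
    identifier: str       # hypothetical identifier name, e.g. "highlight"
    second_content: str   # sentence or paragraph already determined for the mark


def export_by_identifier(items: list[Marked], selected: list[str]) -> dict[str, list[str]]:
    """Group second contents by identifier, keeping only the identifiers the user selected."""
    result: dict[str, list[str]] = {name: [] for name in selected}
    for item in items:
        if item.identifier in result:
            result[item.identifier].append(item.second_content)
    return result
```

Grouping by identifier keeps the export for each selected identifier separate, and document order is preserved within each group.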
13. The apparatus of claim 8, further comprising:
a fourth obtaining module, configured to obtain a third selection result of the user for each preset selection item in a third selection interface, where the third selection interface is provided with a selection item for choosing whether to export the directory of the to-be-processed document;
the export module includes:
an obtaining submodule, configured to obtain, from preset storage structure information of the to-be-processed document, a directory of the to-be-processed document and the node identifiers corresponding to the directory hierarchies in the directory, where each node identifier identifies one directory hierarchy;
a fourth determining submodule, configured to determine the node identifier that is located before the second content in the to-be-processed document and is separated from the second content by the fewest words;
a fifth determining submodule, configured to determine, according to the determined node identifier, a correspondence between the second content and the directory hierarchy;
an adding submodule, configured to add the second content to the directory hierarchy corresponding to the second content according to the correspondence, and export the second content together with the directory.
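The directory mapping of claim 13 amounts to a nearest-preceding-node lookup. The sketch below assumes each node identifier and each second content is known by a character offset in the document; the `Node` record, the offsets, and the indented text rendering are hypothetical, as the claims do not describe the storage structure or the export layout.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    offset: int   # assumed position of the node identifier in the document
    title: str    # directory entry text
    level: int    # directory hierarchy level, 1 = top level
    contents: list[str] = field(default_factory=list)


def attach_to_directory(nodes: list[Node], contents: list[tuple[int, str]]) -> list[Node]:
    """Attach each (offset, text) to the nearest node located before it.

    `nodes` must be sorted by offset.
    """
    offsets = [n.offset for n in nodes]
    for pos, text in contents:
        idx = bisect_right(offsets, pos) - 1
        if idx >= 0:  # content before the first node has no hierarchy; skipped in this sketch
            nodes[idx].contents.append(text)
    return nodes


def render(nodes: list[Node]) -> str:
    """Render the directory with the attached second contents as indented text."""
    lines = []
    for n in nodes:
        indent = "  " * (n.level - 1)
        lines.append(f"{indent}{n.title}")
        lines.extend(f"{indent}  - {t}" for t in n.contents)
    return "\n".join(lines)
```

Binary search over sorted node offsets finds the closest preceding node in logarithmic time per item, which matters only for long documents; a linear scan would serve equally well for short ones.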
14. The apparatus of claim 8, wherein the first obtaining module is specifically configured to:
obtain a plurality of first contents from the to-be-processed document;
the determining module includes:
a sixth determining sub-module, configured to determine, according to the plurality of first contents and the document symbols in the text preceding and following each first content, the second content corresponding to each first content;
a processing sub-module, configured to, when a plurality of identical second contents exist, retain one of the plurality of identical second contents and delete the other of the plurality of identical second contents;
a triggering sub-module, configured to trigger the export module to perform the step of exporting the second content when no plurality of identical second contents exists.
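Identical second contents arise, for example, when two marked words fall in the same sentence. The de-duplication of claim 14 can be sketched as below; the function name is an assumption, and keeping the first occurrence is one possible choice, since the claim only requires that one copy be retained.

```python
def deduplicate(second_contents: list[str]) -> list[str]:
    """Keep one copy of each identical second content, preserving document order."""
    # dict preserves insertion order, so the first occurrence of each text is retained.
    return list(dict.fromkeys(second_contents))
```

When the input contains no duplicates the list comes back unchanged, so the export step can always run on the returned list.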
15. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 7 when executing the program stored in the memory.
16. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN201910676712.8A 2019-07-25 2019-07-25 Document content export method, export device, electronic equipment and storage medium Pending CN112307716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910676712.8A CN112307716A (en) 2019-07-25 2019-07-25 Document content export method, export device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910676712.8A CN112307716A (en) 2019-07-25 2019-07-25 Document content export method, export device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112307716A true CN112307716A (en) 2021-02-02

Family

ID=74329193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910676712.8A Pending CN112307716A (en) 2019-07-25 2019-07-25 Document content export method, export device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112307716A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022184065A1 (en) * 2021-03-01 2022-09-09 北京字跳网络技术有限公司 Electronic document processing method and apparatus, terminal, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010107327A1 (en) * 2009-03-20 2010-09-23 Syl Research Limited Natural language processing method and system
CN107203498A (en) * 2016-03-18 2017-09-26 北京京东尚科信息技术有限公司 A kind of method, system and its user terminal and server for creating e-book
CN107967249A (en) * 2017-12-25 2018-04-27 重庆宝力优特科技有限公司 A kind of word storage method and device
CN109299214A (en) * 2018-11-09 2019-02-01 医渡云(北京)技术有限公司 Text information extracting method, device, medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN106682219B (en) Associated document acquisition method and device
US20210073463A1 (en) Human-computer interaction method and apparatus thereof
CN105868166B (en) Regular expression generation method and system
WO2016202279A1 (en) Interface interaction method and apparatus
WO2019242164A1 (en) Document management method and apparatus, computer device and storage medium
US20140089841A1 (en) Device and method for providing application interface based on writing input
CN106970758B (en) Electronic document operation processing method and device and electronic equipment
US8756520B2 (en) Individual information element access for unopened objects
CN112307716A (en) Document content export method, export device, electronic equipment and storage medium
CN111414727A (en) Method and device for editing header and footer of PDF (Portable document Format) document and electronic equipment
CN108304117A (en) A kind of list fills in floating based reminding method, device, electronic equipment and storage medium
CN111553130A (en) Chapter title style conversion method and device, electronic equipment and storage medium
CN114047855B (en) Form editing method and device and terminal equipment
CN114282499A (en) Document generation method and device with customized chart, electronic equipment and medium
CN107885862B (en) Image display method and device
CN111563364B (en) Chapter title style conversion method and device, electronic equipment and storage medium
CN113407073A (en) Information display method and electronic equipment
CN111444716A (en) Title word segmentation method, terminal and computer readable storage medium
CN112579937A (en) Character highlight display method and device
US9990420B2 (en) Method of searching and generating a relevant search string
CN112783400B (en) Document content selection method and device, electronic equipment and storage medium
CN110929048A (en) Bookmark generation method and device, electronic equipment and storage medium
CN112100122B (en) Method and device for storing picture
US20210149967A1 (en) Document management apparatus, document management system, and non-transitory computer readable medium storing program
JP7464462B2 (en) Human-computer interaction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination