WO2016201807A1

WO2016201807A1 - Document processing method and device, and computer storage medium

Info

Publication number: WO2016201807A1
Application number: PCT/CN2015/090053
Authority: WO
Inventors: 黄珏
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-06-16
Filing date: 2015-09-18
Publication date: 2016-12-22
Also published as: CN106326198A

Abstract

A document processing method comprises the following steps: acquiring a term in an XML document (S10); determining whether the acquired term matches a term in a preset termbase (S20); and if so, inserting, at a corresponding position of the XML document, link information of the term matching a term in the preset termbase (S30).

Description

Document processing method, device and computer storage medium

Technical field

The present invention relates to the field of automation technologies, and in particular, to a document processing method, apparatus, and computer storage medium.

Background

XML (Extensible Markup Language) is an extensible markup language and a simple data storage language. Like HTML, it is derived from the Standard Generalized Markup Language (SGML). It is often used to simplify the storage and sharing of data, and it is a powerful tool for processing structured document information.

To keep pace with the rapid growth of XML product documentation, most companies have established independent termbases to manage and reuse the terms that appear in their XML documents.

Currently, so that terms can be used successfully when an XML document is executed, the author of each new XML document must manually look up every term in the termbase, identify the correct entry, and create the corresponding link information in the XML document.

Summary of the invention

Embodiments of the present invention are directed to a document processing method and apparatus capable of automatically inserting link information of a term in an XML document.

A document processing method provided by an embodiment of the present invention includes the steps of: obtaining a term in an XML document; determining whether the obtained term matches a term in a preset termbase; and, if the obtained term matches a term in the preset termbase, inserting the link information of the matching term at the corresponding position of the XML document.

Optionally, after the step of obtaining the term in the XML document and before the step of determining whether the obtained term matches a term in the preset termbase, the method further comprises: determining whether the obtained term matches a term in a blacklist; and, if the determination result is no, performing the step of determining whether the obtained term matches a term in the preset termbase.

Optionally, the step of obtaining the term in the XML document includes: obtaining English content in the XML document; determining whether the English content contains an uppercase letter other than its initial letter; and, if the English content contains an uppercase letter other than its initial letter, determining that the English content is a term.

Optionally, the step of determining whether the obtained term matches a term in the preset termbase comprises: determining whether the obtained term matches a term in a whitelist; if the obtained term matches a term in the whitelist, inserting the link information of the matching whitelist term at the corresponding position of the XML document; if the obtained term does not match a term in the whitelist, determining whether the obtained term matches a term in a local index; if the obtained term matches a term in the local index, inserting the link information of the matching local-index term at the corresponding position of the XML document; if the obtained term does not match a term in the local index, determining whether the obtained term matches a term in a remote termbase; and, if the obtained term matches a term in the remote termbase, inserting the link information of the matching remote-termbase term at the corresponding position of the XML document.

Optionally, the step of inserting, if the obtained term matches a term in the preset termbase, the link information of the matching term at the corresponding position of the XML document includes: displaying a selection interface when the number of terms in the preset termbase that match the obtained term is at least one; and receiving a selection command triggered by the user on the selection interface, and inserting the link information of the term at the corresponding position of the XML document according to the selection command.

An embodiment of the present invention further provides an apparatus for using a term, the apparatus comprising: an obtaining module configured to obtain a term in an XML document; a first determining module configured to determine whether the obtained term matches a term in a preset termbase, the preset termbase including a whitelist, a local index, and a remote termbase; and a processing module configured to insert, if the obtained term matches a term in the preset termbase, the link information of the matching term at the corresponding position of the XML document.

Optionally, the apparatus further includes a second determining module configured to determine whether the obtained term matches a term in a blacklist; the first determining module is then configured to determine, if the determination result of the second determining module is no, whether the obtained term matches a term in the preset termbase.

Optionally, the obtaining module includes: an obtaining unit configured to obtain English content in the XML document; a determining unit configured to determine whether the English content contains an uppercase letter other than its initial letter; and a unit configured to determine that the English content is the term when the result of the determining unit is that the English content contains an uppercase letter other than its initial letter.

Optionally, the first determining module includes: a first determining unit configured to determine whether the obtained term matches a term in the whitelist, the processing module being configured to insert, if the obtained term matches a term in the whitelist, the link information of the matching whitelist term at the corresponding position of the XML document; a second determining unit configured to determine, when the obtained term does not match a term in the whitelist, whether the obtained term matches a term in the local index, the processing module inserting, if the obtained term matches a term in the local index, the link information of the matching local-index term at the corresponding position of the XML document; and a third determining unit configured to determine, when the obtained term does not match a term in the local index, whether the obtained term matches a term in the remote termbase, the processing module inserting, if the obtained term matches a term in the remote termbase, the link information of the matching remote-termbase term at the corresponding position of the XML document.

Optionally, the processing module includes: a display unit configured to display a selection interface when the first determining module determines that the number of terms in the preset termbase matching the obtained term is at least one; and a unit configured to receive a selection command triggered by the user on the selection interface and to insert the link information of the term at the corresponding position of the XML document according to the selection command, for use of the term.

The embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer executable instructions, and the computer executable instructions are used to perform at least one of the foregoing methods.

In the embodiments of the present invention, a term is obtained from the XML document and it is determined whether the obtained term matches a term in the preset termbase; if it does, the link information of the matching term is inserted at the corresponding position of the XML document. In this way, the embodiments can obtain terms from the XML document automatically, without human participation, search the preset termbase for terms matching the obtained term, and insert the link information of the matching term at the corresponding position of the XML document. This saves XML document authors the time needed to manually find, identify, and create link information in XML documents, and avoids the errors that manual operation invites. For example, when a termbase contains many entries with the same name, that is, one term name with several different interpretations, an author needs a long time to pick the correct entry and can easily choose the wrong link information.

Description of the drawings

FIG. 1 is a schematic flowchart of a first embodiment of a document processing method according to the present invention;

FIG. 2 is a schematic flowchart of a second embodiment of a document processing method according to the present invention;

FIG. 3 is a schematic flowchart of obtaining the term in an XML document according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a third embodiment of a document processing method according to the present invention;

FIG. 5 is a schematic flowchart of inserting link information of a term that matches a term in a preset termbase at a corresponding position of the XML document according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a selection interface in FIG. 5;

FIG. 7 is a schematic structural diagram of a first embodiment of a document processing apparatus according to the present invention;

FIG. 8 is a schematic structural diagram of a second embodiment of a document processing apparatus according to the present invention;

FIG. 9 is a schematic structural diagram of the obtaining module in FIG. 7;

FIG. 10 is a schematic structural diagram of the first determining module in FIG. 7;

FIG. 11 is a schematic structural diagram of the processing module in FIG. 7;

[Correct according to Rule 91 06.01.2016]
FIG. 12 is a schematic structural diagram of another document processing apparatus.

Detailed description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The embodiment of the invention provides a document processing method.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart diagram of a first embodiment of a document processing method according to the present invention.

In this embodiment, the document processing method includes:

Step S10, obtaining the term in the XML document;

The user enters the local index file path or the path of the XML file to be processed in the corresponding input box of the software of this embodiment. Alternatively, the XML file to be processed can be opened directly in the software. The software locates the XML document to be processed from the path entered by the user, reads its content, and automatically searches it for terms. In the present invention, a term may be a technical term and/or an abbreviation. Once the terms in the XML document have been obtained, the process proceeds to step S20.

Step S20, determining whether the obtained term matches the term in the preset termbase;

In this embodiment, each entry in the preset termbase includes, but is not limited to, the name, address, English full name, and Chinese full name of a term. In a specific implementation, different fields may be set for different terms. The term obtained from the XML document in step S10 is checked against the preset termbase, for example by judging whether the obtained term is identical to the content of the name column in the preset termbase. If the term matches a term in the preset termbase, the process proceeds to step S30; otherwise, the process proceeds to step S40.
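The name-column comparison described for step S20 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TermEntry` class, its field names, the `find_by_name` helper, and the sample entries are all hypothetical, chosen only to mirror the four fields listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TermEntry:
    """One termbase row; field names are illustrative, not taken from the source."""
    name: str                # the term as written in documents
    address: str             # location of the entry, later used as link information
    english_full_name: str
    chinese_full_name: str


def find_by_name(term: str, termbase: List[TermEntry]) -> Optional[TermEntry]:
    """Step S20 (sketch): return the first entry whose name column equals the term."""
    for entry in termbase:
        if entry.name == term:
            return entry
    return None


# Hypothetical sample data.
termbase = [
    TermEntry("LTE", "terms/lte", "Long Term Evolution", "长期演进"),
    TermEntry("QoS", "terms/qos", "Quality of Service", "服务质量"),
]

match = find_by_name("QoS", termbase)
print(match.address if match else "no match: proceed to step S40")
```

An exact string comparison is the simplest reading of "the same as the content in the name column"; a case-insensitive or normalized comparison would be a different design choice that the source does not describe.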

Step S30, inserting link information of a term matching the term in the preset termbase in the corresponding position of the XML document;

According to the judgment result of step S20, when the term matches a term in the preset termbase, the link information corresponding to the matching term is inserted at the position of that term in the XML document. When the XML document is subsequently used, the attributes of the term, namely its name, address, English full name, and Chinese full name, can be read through the inserted link information, so that the term can be used to perform the corresponding operation or display the corresponding content.
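One way to picture step S30 is to wrap the matched term in a link element at the place where it occurs. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<termlink>` element and its `href` attribute are assumptions made for illustration, since the source does not specify the markup of the link information. For brevity it handles only the first occurrence of the term in an element's leading text and matches plain substrings.

```python
import xml.etree.ElementTree as ET


def insert_link(element: ET.Element, term: str, address: str) -> bool:
    """Wrap the first occurrence of `term` in element.text in a <termlink> child.

    <termlink> and href are hypothetical names; the real link markup is not
    given in the source.
    """
    text = element.text or ""
    start = text.find(term)
    if start < 0:
        return False
    link = ET.Element("termlink", {"href": address})
    link.text = term
    link.tail = text[start + len(term):]  # text after the term follows the link
    element.text = text[:start]           # text before the term stays in place
    element.insert(0, link)
    return True


root = ET.fromstring("<p>The cell uses LTE for the uplink.</p>")
insert_link(root, "LTE", "terms/lte")
print(ET.tostring(root, encoding="unicode"))
```

A fuller version would also walk the `tail` text of child elements and respect word boundaries, so that a term is not linked inside a longer word.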

Step S40, displaying prompt information.

According to the judgment result of step S20, when the term does not match a term in the preset termbase, prompt information may be displayed immediately, or it may be displayed after the whole XML document has been processed; the prompt information may also be omitted.

In this embodiment, a term is obtained from the XML document and it is determined whether the obtained term matches a term in the preset termbase; if it does, the link information of the term is inserted at the corresponding position of the XML document for use of the term. In this way, terms are obtained from the XML document automatically, without human intervention; the preset termbase is searched for terms matching the obtained term; and the link information of the matching term is inserted at the corresponding position of the XML document. This saves XML document authors the time needed to manually find, identify, and create link information in XML documents, and avoids the errors that manual operation invites. For example, when a termbase contains many entries with the same name, that is, one term name with several different interpretations, an author needs a long time to pick the correct entry and can easily choose the wrong link information.

Please refer to FIG. 2. FIG. 2 is a schematic flowchart diagram of a second embodiment of a document processing method according to the present invention.

Based on the first embodiment, between step S10 and step S20, the method may further include:

Step S50, determining whether the obtained term matches the term in the blacklist;

In this embodiment, a blacklist may be established locally or in a server; the blacklist contains terms that do not need to be judged. In a specific implementation, the blacklist may also be omitted. After the term is obtained in step S10, and before step S20 determines whether the obtained term matches a term in the preset termbase, it may be judged whether the obtained term matches a term in the blacklist. If the obtained term matches a term in the blacklist, the process proceeds to step S60; otherwise, the process proceeds to step S20.

Step S60, performing no processing.

If the result of the determination in step S50 is that the obtained term matches a term in the blacklist, no processing is performed on that term; the process may also return to step S50 to determine whether the next term matches a term in the blacklist.
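The control flow of the second embodiment, with the blacklist check placed in front of the termbase check, can be summarized in a few lines. The sketch below is an assumption-laden illustration: `route_term` is a hypothetical helper, the blacklist is modelled as a set, and the termbase as a mapping from term name to link address.

```python
from typing import Dict, Set


def route_term(term: str, blacklist: Set[str], termbase: Dict[str, str]) -> str:
    """Return the step that handles the term: 'S60', 'S30' or 'S40'."""
    if term in blacklist:   # step S50 answers yes: step S60, no processing
        return "S60"
    if term in termbase:    # step S20 answers yes: step S30, insert link information
        return "S30"
    return "S40"            # no match: optionally display prompt information


# Hypothetical sample data.
blacklist = {"README"}
termbase = {"LTE": "terms/lte"}

for term in ["README", "LTE", "VoIP"]:
    print(term, "->", route_term(term, blacklist, termbase))
```

Checking the blacklist first means a blacklisted word never reaches the termbase lookup, which is the point of inserting step S50 between steps S10 and S20.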

Please refer to FIG. 3. FIG. 3 is a schematic flowchart of the step of obtaining the term in the XML document in FIG. 1.

Based on the first embodiment, step S10 includes:

Step S11, obtaining English content in the XML document;

The user enters the local index file path or the path of the XML document to be processed in the corresponding input box of the software of this embodiment. Alternatively, the XML document to be processed can be opened directly in the software. The user then clicks the start-processing button, and, on the start command triggered by the user, the software begins to read the XML document node by node.

XML documents generally fall into two types: pure English documents, and mixed documents that combine English with other scripts, such as Chinese-English documents. Before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. For a pure English document, the English content is extracted by splitting on space characters. For a mixed document, the English content must first be identified, for example by reading the content of the mixed document item by item and judging whether each item read is English content; the judgment may also be made segment by segment according to punctuation marks.

In a specific implementation, the position of the English content may be recorded as it is read. The position need not be recorded, for example when the XML document is processed line by line or sentence by sentence. When the English content in the XML document has been obtained, the process proceeds to step S12.
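The two extraction paths of step S11 can be sketched for a single text node as follows. Several things here are assumptions rather than details from the source: a node is treated as "pure English" when it contains only ASCII characters, English content in mixed text is taken to be runs of Latin letters, and the `english_content` helper records a character offset for each word so that a link could later be inserted at the right position.

```python
import re
import string
from typing import List, Tuple

LATIN_RUN = re.compile(r"[A-Za-z]+")


def english_content(text: str) -> List[Tuple[int, str]]:
    """Step S11 (sketch): return (offset, word) pairs for the English content."""
    found = []
    if text.isascii():
        # Pure English: words are delimited by whitespace; strip punctuation.
        for m in re.finditer(r"\S+", text):
            word = m.group().strip(string.punctuation)
            if word:
                found.append((m.start() + m.group().index(word), word))
    else:
        # Mixed text: keep only runs of Latin letters, whatever surrounds them.
        for m in LATIN_RUN.finditer(text):
            found.append((m.start(), m.group()))
    return found


print(english_content("The cell uses LTE."))
print(english_content("基站支持LTE和QoS"))
```

Recording offsets is optional in the source; it is included here because it makes the later insertion step straightforward.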

Step S12, determining whether there is an uppercase letter in the English content other than the initial letter;

According to the English content obtained in step S11, it is determined whether the English content contains an uppercase letter other than its initial letter. If it does, the process proceeds to step S13; otherwise, the process proceeds to step S14.

Step S13, if the English content contains an uppercase letter other than its initial letter, determining that the English content is a term;

Terms in XML documents generally contain at least two consecutive uppercase English letters. According to the judgment result of step S12, if the English content contains an uppercase letter other than its initial letter, the English content is determined to be a term. The subsequent content of the XML document is then read, or the subsequent English content already read is then judged.

Step S14, determining that the English content is not a term.

According to the judgment result of step S12, if the English content contains no uppercase letter other than its initial letter, the English content is determined not to be a term. The subsequent content of the XML document is then read, or the subsequent English content already read is then judged.

This embodiment exploits the typical form of terms in XML documents, namely that a term contains at least one uppercase letter other than its first letter, to identify the terms in the XML document. The type of the XML document is determined first. If the XML document is a pure English document, the English content is delimited using the spaces between English words; if the XML document is a mixed document, each piece of content read is judged to determine whether it is English content. The English content identified in this way is extracted, thereby obtaining the English content in the XML document.
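The uppercase test of steps S12 to S14 reduces to a one-line predicate. The function name `is_term` is hypothetical; the rule itself, an uppercase letter anywhere after the first character, follows the description above.

```python
def is_term(word: str) -> bool:
    """Steps S12-S14 (sketch): a word is a term if any letter after the first is uppercase."""
    return any(ch.isupper() for ch in word[1:])


for word in ["LTE", "QoS", "The", "cell"]:
    print(word, is_term(word))
```

Under this rule an ordinary capitalized word such as "The" is not a term, because only its initial letter is uppercase, while abbreviations such as "LTE" and mixed-case names such as "QoS" are.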

Please refer to FIG. 4. FIG. 4 is a schematic flowchart diagram of a third embodiment of a document processing method according to the present invention. Based on the first embodiment of the document processing method of the present invention, step S20 includes:

Step S21, determining whether the obtained term matches the term in the whitelist;

In this embodiment, a whitelist, a local index, and a remote termbase may be established, each located either in the local terminal or in a server. Optionally, the whitelist and the local index are located in the local terminal and may be subsets of the remote termbase. In practice, the whitelist, the local index, and the remote termbase may also be three disjoint termbases, each containing different terms. In other implementations, the user can create two termbases, or multiple termbases, according to actual needs. The term obtained from the XML document in step S10 is checked against the whitelist. If the obtained term matches a term in the whitelist, the process proceeds to step S30; otherwise, the process proceeds to step S22.

In the specific implementation, before performing this step, it may be determined whether the obtained term matches the term in the blacklist, and if the judgment result is that the obtained term does not match the term in the blacklist, the step is performed.

Step S22, determining whether the obtained term matches the term in the local index;

If the result of the determination in step S21 is that the obtained term does not match a term in the whitelist, it is judged whether the obtained term matches a term in the local index, which contains commonly used terms. If the obtained term matches a term in the local index, the process proceeds to step S30; otherwise, the process proceeds to step S23.

Step S23, determining whether the obtained term matches the term in the remote termbase;

If the result of the determination in step S22 is that the obtained term does not match a term in the local index, it is judged whether the obtained term matches a term in the remote termbase. The remote termbase can be located on a remote server or in a local database. If the obtained term matches a term in the remote termbase, the process proceeds to step S30; otherwise, the process proceeds to step S24.

Step S24, no processing is performed;

If the result of the determination in step S23 is that the obtained term does not match the term in the remote termbase, no processing is performed or the subsequent content of the XML document is continuously processed.

In this embodiment, the whitelist and the local index are subsets of the remote termbase. Alternatively, the whitelist, the local index, and the remote termbase may have no intersection and each contain different terms, that is, all terms are distributed among the whitelist, the local index, and the remote termbase as needed. In a specific implementation, when it is determined in step S22 or step S23 that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, for example asking whether to add the term to the whitelist, or offering to edit the whitelist, the local index, or the remote termbase. In other implementations, only a local index and a remote termbase may be established.

Step S30, inserting link information of the term matching the term in the preset termbase in the corresponding position of the XML document.

If the result of the determination in step S21 is that the obtained term matches a term in the whitelist, step S30 includes inserting the link information of the matching whitelist term at the corresponding position of the XML document. If the result of the determination in step S22 is that the obtained term matches a term in the local index, step S30 includes inserting the link information of the matching local-index term at the corresponding position of the XML document. If the result of the determination in step S23 is that the obtained term matches a term in the remote termbase, step S30 includes inserting the link information of the matching remote-termbase term at the corresponding position of the XML document.

The invention distributes the terms across three lists: the whitelist, the local index, and the remote termbase. Terms whose link information has already been determined are placed in the whitelist, and commonly used terms are placed in the local index. Checking the obtained term against the whitelist, the local index, and the remote termbase in turn improves lookup efficiency, and to some extent avoids the long search times that arise when an ever-growing number of terms is kept in a single termbase.
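The tiered lookup of steps S21 to S24 can be sketched as a chain of checks that stops at the first hit. In this illustration the whitelist and local index are modelled as in-memory mappings from term name to link address, and the remote termbase as a callable, so that the slowest lookup runs only when the two local tiers miss. The function `tiered_lookup`, the data shapes, and the sample values are assumptions, not details from the source.

```python
from typing import Callable, Dict, Optional, Tuple


def tiered_lookup(
    term: str,
    whitelist: Dict[str, str],
    local_index: Dict[str, str],
    remote_lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Steps S21-S24 (sketch): return (source, link address), or None if nothing matches."""
    if term in whitelist:              # step S21
        return "whitelist", whitelist[term]
    if term in local_index:            # step S22
        return "local index", local_index[term]
    address = remote_lookup(term)      # step S23, reached only after two misses
    if address is not None:
        return "remote termbase", address
    return None                        # step S24, no processing


# Hypothetical sample data; a dict stands in for the remote termbase.
whitelist = {"LTE": "terms/lte"}
local_index = {"QoS": "terms/qos"}
remote = {"VoIP": "terms/voip"}

for term in ["LTE", "QoS", "VoIP", "XYZ"]:
    print(term, "->", tiered_lookup(term, whitelist, local_index, remote.get))
```

Returning the source alongside the address lets a caller decide, for example, whether to prompt the user to add a local-index or remote hit to the whitelist.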

Please refer to FIG. 5. FIG. 5 is a schematic flowchart of the step in FIG. 1 of inserting, if the obtained term matches a term in the preset termbase, the link information of the matching term at the corresponding position of the XML document.

Step S31, displaying a selection interface when the number of terms in the preset termbase that match the obtained term is at least one;

[Correct according to Rule 91 06.01.2016]
If it is determined in step S20 that at least one term in the preset termbase matches the obtained term, the selection interface is displayed, as shown in FIG. 6, which is a schematic diagram of the selection interface. The selection interface includes the content of the term read, the English full name and Chinese full name of each matching entry, the description information of the matching entry, the selection items, an edit-whitelist button, and a confirm button. The description information displayed changes with the selection item the user chooses in the selection interface, and the user can enter the whitelist editing interface through the edit-whitelist button. In a specific implementation, function buttons may be added to or removed from the selection interface, or different interfaces may be configured, according to user settings; for example, edit-blacklist and edit-local-index buttons may be added, each opening the corresponding editing interface. In a specific implementation, if the obtained term matches a term in the whitelist, the link information of the matching whitelist term may be inserted directly at the corresponding position of the XML document without displaying the selection interface. In other implementations, the selection interface may be displayed only when at least two terms in the preset termbase match the obtained term, which reduces user operations to some extent. After the selection command triggered by the user on the selection interface is received, the process proceeds to step S32.

Step S32, receiving a selection command triggered by the user on the selection interface, and inserting link information of the term in a corresponding position of the XML document according to the selection command.

After step S31, the user triggers a selection command on the selection interface; the terminal receives the selection command and inserts the link information of the term at the corresponding position of the XML document accordingly, so that the XML document can use the term through the link information when it is executed. Alternatively, the current term can be skipped in response to a "skip" selection command, after which processing continues with the subsequent content of the XML document, or the editing interface of the whitelist, the local index, or the remote database can be entered through the corresponding edit button.

In this embodiment, the selection interface is displayed when at least one term in the preset termbase matches the obtained term. Through the selection interface, the user can view the information related to each matching term, which helps the user identify the correct link information and speeds up recognition of the term, and can open the corresponding termbase for editing. This greatly reduces the time the user spends selecting the correct term and makes it convenient to adjust the preset termbase during use.
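The decision logic of steps S31 and S32 can be separated from any real user interface by passing the selection in as a callback. In the sketch below, `resolve_match`, the `select` callback, and the `min_to_prompt` parameter are hypothetical: `select` stands in for the selection interface and returns the index of the chosen entry, or `None` for a "skip" command, while `min_to_prompt` models the choice between prompting for at least one match and prompting only for at least two.

```python
from typing import Callable, Dict, List, Optional

Entry = Dict[str, str]


def resolve_match(
    candidates: List[Entry],
    select: Callable[[List[Entry]], Optional[int]],
    min_to_prompt: int = 1,
) -> Optional[Entry]:
    """Steps S31-S32 (sketch): pick the entry whose link information is inserted."""
    if not candidates:
        return None                    # nothing matched, nothing to insert
    if len(candidates) < min_to_prompt:
        return candidates[0]           # too few matches to ask: insert directly
    choice = select(candidates)        # selection interface; None means "skip"
    return None if choice is None else candidates[choice]


# Hypothetical entries sharing one term name with different interpretations.
candidates = [
    {"name": "AP", "english_full_name": "Access Point", "address": "terms/ap-1"},
    {"name": "AP", "english_full_name": "Application Processor", "address": "terms/ap-2"},
]

chosen = resolve_match(candidates, select=lambda entries: 1)
print(chosen["address"] if chosen else "skipped")
```

With `min_to_prompt=2`, a term that has exactly one match is linked without interrupting the user, which corresponds to the variant that displays the interface only for two or more matches.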

The embodiment of the invention further provides a document processing apparatus.

Please refer to FIG. 7. FIG. 7 is a schematic diagram of functional modules of a first embodiment of a document processing apparatus according to the present invention.

In this embodiment, the document processing apparatus includes: an obtaining module 10, a first determining module 20, and a processing module 30.

The obtaining module 10 is configured to obtain terms in an XML document.

The user enters the local index file path or the path of the XML file to be processed in the corresponding input box of the software of this embodiment. Alternatively, the XML file to be processed can be opened directly in the software. The software locates the XML document to be processed from the path entered by the user, reads its content, and automatically searches it for terms. In the present invention, a term may be a technical term and/or an abbreviation. The terms in the XML document are thereby obtained.

The first determining module 20 is configured to determine whether the obtained term matches the term in the preset termbase, and the preset termbase includes a whitelist, a local index, and a remote termbase.

In this embodiment, the preset termbase includes, but is not limited to, a name column, an address column, an English full name column, and a Chinese full name column. In a specific implementation, different fields may be set for different terms. The term obtained from the XML document by the obtaining module 10 is checked against the preset termbase, for example by judging whether the obtained term is identical to the content of the name column in the preset termbase.

The processing module 30 is configured to insert link information of a term matching the term in the preset termbase at a corresponding position of the XML document if the obtained term matches the term in the preset termbase.

According to the judgment result of the first determining module 20, when the term matches a term in the preset termbase, the link information corresponding to the matching term is inserted at the position of that term in the XML document. When the XML document is subsequently used, the attributes of the term, namely its name, address, English full name, and Chinese full name, can be read through the link information, so that the term can be used to perform the corresponding operation or display the corresponding content. In a specific implementation, when the first determining module 20 determines that the term does not match a term in the preset termbase, prompt information may be displayed immediately, or it may be displayed after the XML document has been processed; the prompt information may also be omitted.

Please refer to FIG. 8. FIG. 8 is a schematic diagram of functional modules of a second embodiment of a document processing apparatus according to the present invention. Based on the first embodiment of the document processing apparatus of the present invention, the apparatus may further include:

The second determining module 40 is configured to determine whether the obtained term matches the blacklist.

In this embodiment, a blacklist may be established locally or in a server, and the blacklist includes terms that do not need to be judged. In the specific implementation, the blacklist may not be established. After the term is obtained, it may be judged whether the obtained term matches the term in the blacklist before judging whether the obtained term matches the term in the preset termbase.

If the judgment result of the second judging module 40 is that the obtained term does not match any term in the blacklist, the first judging module 20 judges whether the obtained term matches a term in the preset termbase. That is, the first judging module 20 is configured to perform this determination when the judgment result of the second judging module 40 is negative.

If the judgment result of the second judging module 40 is that the obtained term matches the term in the blacklist, no processing is performed, and it is of course possible to continue to judge whether the next term matches the term in the blacklist.
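The blacklist check that precedes the termbase lookup can be sketched as a simple filter. The function name `terms_to_check` and the sample blacklist entries are hypothetical.

```python
def terms_to_check(terms, blacklist):
    """Yield only the terms that are not blacklisted."""
    for term in terms:
        if term in blacklist:
            continue  # blacklisted: no processing, move on to the next term
        yield term


BLACKLIST = {"OK", "README"}  # hypothetical terms that never need a link
candidates = ["XML", "OK", "SGML"]
print(list(terms_to_check(candidates, BLACKLIST)))  # ['XML', 'SGML']
```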

Please refer to FIG. 9. FIG. 9 is a schematic structural diagram of the obtaining module in FIG.

Based on the first embodiment of the document processing apparatus of the present invention, the obtaining module 10 includes:

The obtaining unit 11 is configured to obtain the English content in the XML document.

The user enters the path of the local index file or of the XML document to be processed in the corresponding input box of the software of the present invention. Alternatively, the XML document to be processed can be opened directly from the software. After entering the path, the user clicks the start processing button, and the embodiment begins to traverse the XML document node by node in response to the start command triggered by the user. XML documents are generally of two types: pure English documents, and documents that mix English with other text, such as mixed Chinese and English documents. Before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. For a pure English document, the English content is extracted by splitting on the space character. For a mixed document, the English content within it is identified, for example by reading the content of the mixed document piece by piece and judging whether each piece read is English content; the judgment may also be made according to punctuation marks. In a specific implementation, the position of the English content may be recorded when it is read from the XML document. The position need not be recorded, for example when the XML document is processed line by line or sentence by sentence.

The judging unit 12 is configured to determine whether the English content contains an uppercase letter other than the initial letter.

According to the English content in the XML document obtained by the obtaining unit 11, it is determined whether the English content contains an uppercase letter other than the initial letter.

The determining unit 13 is configured to determine that the English content is a term when the judging unit 12 finds an uppercase English letter other than the initial letter in the English content.

Terms in an XML document generally contain at least two consecutive uppercase English letters. According to the judgment result of the judging unit 12, when the English content contains an uppercase letter other than the initial letter, the English content is determined to be a term. If the judgment result of the judging unit 12 is that the English content contains no uppercase letter other than the initial letter, the English content is determined not to be a term, and processing continues with the subsequent content of the XML document or with the next piece of English content read.
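The uppercase rule can be expressed in a few lines. The function name `is_term` is hypothetical; the rule itself is the one stated above (an uppercase letter anywhere after the initial letter).

```python
def is_term(word: str) -> bool:
    """Return True if any character after the first one is an uppercase letter."""
    return any(ch.isupper() for ch in word[1:])


for word in ["XML", "Document", "termbase", "JavaScript"]:
    print(word, is_term(word))
```

Note that this rule also flags mixed-case words such as "JavaScript", so the subsequent termbase lookup acts as the real filter: a candidate that matches nothing in the termbase receives no link.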

Please refer to FIG. 10. FIG. 10 is a schematic structural block diagram of the first judging module of FIG. 7.

Based on the first embodiment of the document processing apparatus of the present invention, the first determining module 20 includes:

The first determining unit 21 is configured to determine whether the obtained term matches the term in the whitelist. If the term obtained matches a term in the whitelist, the processing module 30 inserts link information for the term that matches the term in the whitelist at the corresponding location of the XML document.

The whitelist, the local index, and the remote termbase may each be established locally or on a server, that is, each may reside in the local terminal or in the server. Optionally, the whitelist and the local index reside in the local terminal, and both may be subsets of the remote termbase. In practice, the whitelist, the local index, and the remote termbase may also be three termbases with no intersection, each containing different terms. In other implementations, the user can create two termbases, or multiple termbases, according to actual needs. According to the term in the XML document obtained by the obtaining module 10, it is judged whether the obtained term matches a term in the whitelist. If the obtained term matches a term in the whitelist, the processing module 30 inserts link information of the term matching the term in the whitelist at the corresponding position of the XML document.

The second determining unit 22 is configured to determine whether the obtained term matches the term in the local index when the obtained term does not match the term in the whitelist. If the term obtained matches a term in the local index, the processing module 30 inserts link information for the term that matches the term in the local index at the corresponding location of the XML document.

If the result of the determination by the first judging unit 21 is that the obtained term does not match any term in the whitelist, it is judged whether the obtained term matches a term in the local index, which contains commonly used terms. If the obtained term matches a term in the local index, the processing module 30 inserts link information of the term matching the term in the local index at the corresponding position of the XML document.

The third determining unit 23 is configured to determine whether the obtained term matches a term in the remote termbase when the obtained term does not match any term in the local index. The processing module 30 is configured to insert link information of the term matching the term in the remote termbase at the corresponding position of the XML document if the obtained term matches a term in the remote termbase.

According to the judgment result of the second judging unit 22, when the obtained term does not match any term in the local index, it is judged whether the obtained term matches a term in the remote termbase. The remote termbase may be located in a remote server or in a local database. If the obtained term matches a term in the remote termbase, the processing module 30 inserts link information of the matching term at the corresponding position of the XML document. If the obtained term does not match any term in the remote termbase, no processing is performed, or processing continues with the subsequent content of the XML document.
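The three-tier lookup order (whitelist, then local index, then remote termbase) can be sketched as follows. The dictionaries, the `remote_stub` function standing in for a server query, and all link values are hypothetical; the embodiment does not define how the remote termbase is accessed.

```python
from typing import Callable, Mapping, Optional, Tuple


def lookup(
    term: str,
    whitelist: Mapping[str, str],
    local_index: Mapping[str, str],
    remote_lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (source, link) from the first tier that knows the term, else None."""
    if term in whitelist:
        return "whitelist", whitelist[term]
    if term in local_index:
        return "local index", local_index[term]
    link = remote_lookup(term)  # consulted only when both local tiers miss
    if link is not None:
        return "remote termbase", link
    return None


WHITELIST = {"XML": "terms/xml.html"}
LOCAL_INDEX = {"DTD": "terms/dtd.html"}
remote_calls = []


def remote_stub(term):
    """Stand-in for a query to a remote termbase; records each call."""
    remote_calls.append(term)
    return {"SGML": "remote/sgml.html"}.get(term)


for t in ["XML", "DTD", "SGML", "FOO"]:
    print(t, lookup(t, WHITELIST, LOCAL_INDEX, remote_stub))
print(remote_calls)  # only the terms that missed both local tiers
```

Because the remote query runs only after both local tiers miss, frequently used terms never trigger it, which is the search-efficiency benefit the embodiment describes.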

In this embodiment, the whitelist and the local index are subsets of the remote termbase. Of course, the whitelist, the local index, and the remote termbase may also have no intersection and each include different terms; that is, the terms are distributed among the whitelist, the local index, and the remote termbase as needed. In a specific implementation, when the second judging unit 22 or the third judging unit 23 determines that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, such as whether to add the term to the whitelist, or whether to edit the whitelist, the local index, or the remote termbase. In other implementations, only the local index and the remote termbase may be established.

In the embodiment of the present invention, the terms are placed in three libraries: a whitelist, a local index, and a remote termbase. Terms whose link information has been confirmed are placed in the whitelist, and commonly used terms are placed in the local index. The obtained term is then checked against the whitelist, the local index, and the remote termbase in turn, which improves search efficiency. This avoids, to some extent, the long search times that would result if all terms were kept in a single termbase as the number of terms grows.

Please refer to FIG. 11. FIG. 11 is a schematic structural diagram of the processing module of FIG.

Based on the first embodiment of the document processing apparatus of the present invention, the processing module 30 includes:

The display unit 31 is configured to display a selection interface when the first determining module finds at least one match between the obtained term and the preset termbase.

If the first judging module 20 determines that the obtained term matches at least one term in the preset termbase, the selection interface is displayed. FIG. 6 is a schematic diagram of the selection interface. The selection interface includes the content of the term read, the English full name and Chinese full name of the matching content, the description information of the matching content, the selection items, an edit whitelist button, and a confirm button. The description information displayed corresponds to the selection item that the user selects in the selection interface, and the user can enter the whitelist editing interface through the edit whitelist button. In a specific implementation, function buttons may be added to or removed from the selection interface, or different interfaces may be set, according to user settings; for example, edit blacklist and edit local index buttons may be added, each opening the corresponding editing interface. In a specific implementation, if the obtained term matches a term in the whitelist, the link information of the matching term may be inserted directly at the corresponding position of the XML document without displaying the selection interface. In other implementations, the selection interface may be displayed only when the obtained term matches at least two terms in the preset termbase, which reduces user operations to some extent.

The processing unit 32 is configured to receive a selection command triggered by the user on the selection interface, and insert the link information of the term in the corresponding position of the XML document according to the selection command.

The terminal receives the selection command triggered by the user on the selection interface displayed by the display unit 31, and inserts the link information of the term at the corresponding position of the XML document according to the selection command, so that the XML document can use the term through the link information during execution. Of course, the current term may also be skipped according to a "skip" selection command, after which the subsequent content of the XML document is processed; or the editing interface of the whitelist, the local index, or the remote database may be entered through the corresponding edit button.
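The handling of the selection command can be sketched with the user interface abstracted as a callback. The function `resolve_match` and the `choose` callback are hypothetical; `choose` stands in for the selection interface and returns the index of the chosen match, or `None` for a "skip" command.

```python
from typing import Callable, Optional, Sequence


def resolve_match(
    term: str,
    matches: Sequence[str],
    choose: Callable[[str, Sequence[str]], Optional[int]],
) -> Optional[str]:
    """Return the match selected by the user, or None if nothing is inserted."""
    if not matches:
        return None              # no match: the selection interface is not shown
    index = choose(term, matches)
    if index is None:
        return None              # "skip": continue with the rest of the document
    return matches[index]


# Hypothetical matches for one abbreviation with two candidate expansions.
candidates = ["Access Point", "Application Processor"]
print(resolve_match("AP", candidates, lambda term, options: 1))
print(resolve_match("AP", candidates, lambda term, options: None))
```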

In the embodiment of the present invention, the selection interface is displayed when the obtained term matches at least one term in the preset termbase. Through the selection interface, the user can view information about the matching terms, which helps the user identify the correct link information and speeds up term recognition, reducing the time spent picking the right term. The user can also enter the corresponding termbase for editing through the selection interface, which makes it convenient to adjust the preset termbase during use.

The embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer executable instructions, and the computer executable instructions are used to perform at least one of the foregoing methods, for example, at least one of the methods shown in FIG. 1 to FIG. 5.

The computer storage medium may include a storage medium such as a hard disk, an optical disk, a magnetic disk, or a flash disk, and may be a non-transitory storage medium.

The embodiment of the present invention further provides a document processing apparatus. As shown in FIG. 12, the apparatus includes a processor 42, a storage medium 44, and at least one external communication interface 41; the processor 42, the storage medium 44, and the external communication interface 41 are all connected via a bus 43. The processor 42 can be a processing component such as a microprocessor, a central processing unit, a digital signal processor, or a programmable logic array. Computer-executable instructions are stored on the storage medium 44; the processor 42, by executing the computer-executable instructions stored in the storage medium 44, may implement any of the above methods.

The above are only the preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and modifications made in accordance with the principles of the present invention are understood to fall within the scope of the present invention.

Claims

A document processing method, the method comprising the following steps:

Obtaining a term in an Extensible Markup Language (XML) document;

Determining whether the obtained term matches the term in the preset termbase;

If the obtained term matches a term in the preset termbase, link information of the matching term is inserted at the corresponding position of the XML document.

The method of claim 1, wherein after the step of obtaining the term in the XML document and before the step of determining whether the obtained term matches a term in the preset termbase, the method further comprises:

Determining whether the obtained term matches the term in the blacklist;

If the result of the determination is no, the step of determining whether the obtained term matches a term in the preset termbase is performed.

The method of claim 1, wherein the step of obtaining the term in the XML document comprises:

Obtaining the English content in the XML document;

Determining whether there is an uppercase letter in addition to the initial letter in the English content;

If the English content contains an uppercase letter other than the initial letter, the English content is determined to be a term.

The method according to any one of claims 1 to 3, wherein the step of determining whether the obtained term matches a term in the preset termbase comprises:

Determining whether the obtained term matches the term in the whitelist;

If the obtained term matches a term in the whitelist, link information of the term matching the term in the whitelist is inserted at the corresponding position of the XML document;

If the obtained term does not match the term in the whitelist, it is judged whether the obtained term matches the term in the local index;

If the obtained term matches the term in the local index, the link information of the term matching the term in the local index is inserted at the corresponding position of the XML document;

If the obtained term does not match the term in the local index, it is determined whether the obtained term matches the term in the remote termbase;

If the obtained term matches a term in the remote termbase, link information of the term matching the term in the remote termbase is inserted at the corresponding position of the XML document.

The method of claim 1, wherein the step of inserting, if the obtained term matches a term in the preset termbase, link information of the matching term at the corresponding position of the XML document comprises:

Displaying a selection interface when the obtained term matches at least one term in the preset termbase;

Receiving a selection command triggered by the user on the selection interface, and inserting link information of the term at the corresponding position of the XML document according to the selection command.

A document processing apparatus, the apparatus comprising:

an obtaining module, configured to obtain terms in an XML document;

a first determining module, configured to determine whether the obtained term matches a term in a preset termbase, where the preset termbase includes a whitelist, a local index, and a remote termbase;

a processing module, configured to insert link information of a term that matches a term in the preset termbase at the corresponding position of the XML document if the obtained term matches a term in the preset termbase.

The apparatus of claim 6, wherein the apparatus further comprises:

a second determining module, configured to determine whether the obtained term matches the blacklist;

The first determining module is configured to determine whether the obtained term matches the term in the preset termbase if the determination result of the second determining module is negative.

The apparatus of claim 6, wherein the obtaining module comprises:

an obtaining unit, configured to obtain English content in the XML document;

a judging unit, configured to determine whether the English content contains an uppercase letter other than the initial letter;

a determining unit, configured to determine that the English content is a term when the judging unit finds an uppercase English letter other than the initial letter in the English content.

The apparatus of any one of claims 6 to 8, wherein the first determining module comprises:

a first determining unit, configured to determine whether the obtained term matches a term in the whitelist, wherein if the obtained term matches a term in the whitelist, the processing module inserts link information of the term matching the term in the whitelist at the corresponding position of the XML document;

a second determining unit configured to determine, when the obtained term does not match the term in the whitelist, whether the obtained term matches the term in the local index;

The processing module is configured to insert link information of a term that matches a term in the local index at a corresponding position of the XML document if the term obtained matches a term in the local index;

a third determining unit, configured to determine, when the obtained term does not match a term in the local index, whether the obtained term matches a term in the remote termbase, wherein if the obtained term matches a term in the remote termbase, the processing module inserts link information of the term matching the term in the remote termbase at the corresponding position of the XML document.

The apparatus of claim 6, wherein the processing module comprises:

a display unit, configured to display a selection interface when the first determining module finds at least one match between the obtained term and the preset termbase;

a processing unit, configured to receive a selection command triggered by the user on the selection interface, and insert the link information of the term at the corresponding position of the XML document according to the selection command.

A computer storage medium having stored therein computer executable instructions for performing at least one of the methods of claims 1 to 5.