WO2016201807A1 - Document processing method and device, and computer storage medium - Google Patents

Info

Publication number
WO2016201807A1
Authority
WO
WIPO (PCT)
Prior art keywords
term
matches
termbase
xml document
preset
Prior art date
Application number
PCT/CN2015/090053
Other languages
French (fr)
Chinese (zh)
Inventor
黄珏
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016201807A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting

Definitions

  • The present invention relates to the field of automation technologies, and in particular to a document processing method, a document processing apparatus, and a computer storage medium.
  • Extensible Markup Language (XML) is a simple data storage language. Like HTML, it is derived from the Standard Generalized Markup Language (SGML). It is often used to simplify the storage and sharing of data and is a powerful tool for processing structured document information.
  • Embodiments of the present invention are directed to a document processing method and apparatus capable of automatically inserting link information of a term in an XML document.
  • A document processing method provided by an embodiment of the present invention includes the steps of: obtaining a term in an XML document; determining whether the obtained term matches a term in a preset termbase; and, if the obtained term matches a term in the preset termbase, inserting link information of the matching term at the corresponding position of the XML document.
  • Before the step of determining whether the obtained term matches a term in the preset termbase, the method further comprises: determining whether the obtained term matches a term in a blacklist; and, if the determination result is no, performing the step of determining whether the obtained term matches a term in the preset termbase.
  • The step of obtaining the term in the XML document includes: obtaining English content in the XML document; determining whether the English content contains an uppercase letter other than its initial letter; and, if it does, determining that the English content is a term.
  • The step of determining whether the obtained term matches a term in the preset termbase comprises: determining whether the obtained term matches a term in the whitelist; if it does, inserting link information of the matching whitelist term at the corresponding position of the XML document; if it does not, determining whether the obtained term matches a term in the local index; if the obtained term matches a term in the local index, inserting link information of the matching local-index term at the corresponding position of the XML document; if it does not, determining whether the obtained term matches a term in the remote termbase; and, if the obtained term matches a term in the remote termbase, inserting link information of the matching remote-termbase term at the corresponding position of the XML document.
  • The step of inserting the link information of the matching term at the corresponding position of the XML document includes: displaying a selection interface when the number of terms in the preset termbase that match the obtained term is at least one; receiving a selection command triggered by the user on the selection interface; and inserting the link information of the term at the corresponding position of the XML document according to the selection command.
  • An embodiment of the present invention further provides a document processing apparatus, the apparatus comprising: an obtaining module configured to obtain a term in an XML document; a first determining module configured to determine whether the obtained term matches a term in a preset termbase, the preset termbase including a whitelist, a local index, and a remote termbase; and a processing module configured to insert link information of the matching term at the corresponding position of the XML document if the obtained term matches a term in the preset termbase.
  • The apparatus further includes a second determining module configured to determine whether the obtained term matches a term in the blacklist; the first determining module is configured to determine whether the obtained term matches a term in the preset termbase if the determination result of the second determining module is no.
  • The obtaining module includes: an obtaining unit configured to obtain English content in the XML document; a judging unit configured to judge whether the English content contains an uppercase letter other than its initial letter; and a determining unit configured to determine that the English content is the term when the judging unit finds such an uppercase letter.
  • The first determining module includes: a first determining unit configured to determine whether the obtained term matches a term in the whitelist, the processing module being configured to insert link information of the matching whitelist term at the corresponding position of the XML document if it does; a second determining unit configured to determine, when the obtained term does not match a term in the whitelist, whether the obtained term matches a term in the local index, the processing module inserting link information of the matching local-index term at the corresponding position of the XML document if it does; and a third determining unit configured to determine, when the obtained term does not match a term in the local index, whether the obtained term matches a term in the remote termbase, the processing module inserting link information of the matching remote-termbase term at the corresponding position of the XML document if it does.
  • The processing module includes: a display unit configured to display a selection interface when the first determining module finds at least one term in the preset termbase that matches the obtained term; and a further unit configured to receive a selection command triggered by the user on the selection interface and to insert the link information of the term at the corresponding position of the XML document according to the selection command.
  • the embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer executable instructions, and the computer executable instructions are used to perform at least one of the foregoing methods.
  • The embodiment of the present invention obtains a term in the XML document and determines whether the obtained term matches a term in the preset termbase; if it does, link information of the matching term is inserted at the corresponding position of the XML document.
  • The embodiment of the present invention can thus obtain terms from the XML document automatically, without human participation, search the preset termbase for terms matching the obtained term, and insert link information of the matching term at the corresponding position of the XML document.
  • FIG. 1 is a schematic flowchart of a first embodiment of a document processing method according to the present invention
  • FIG. 2 is a schematic flowchart diagram of a second embodiment of a document processing method according to the present invention.
  • FIG. 3 is a schematic flowchart of obtaining the term in an XML document according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart diagram of a third embodiment of a document processing method according to the present invention.
  • FIG. 5 is a schematic flowchart of inserting link information of a term that matches a term in a preset termbase at the corresponding position of the XML document according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of a selection interface in FIG. 5;
  • FIG. 7 is a schematic structural diagram of a first embodiment of a document processing apparatus according to the present invention;
  • FIG. 8 is a schematic structural diagram of a second embodiment of a document processing apparatus according to the present invention;
  • FIG. 9 is a schematic structural diagram of the obtaining module in FIG. 7;
  • FIG. 10 is a schematic structural diagram of the first determining module in FIG. 7;
  • FIG. 11 is a schematic structural diagram of the processing module in FIG. 7;
  • FIG. 12 is a schematic structural diagram of another document processing apparatus.
  • the embodiment of the invention provides a document processing method.
  • FIG. 1 is a schematic flowchart diagram of a first embodiment of a document processing method according to the present invention.
  • the document processing method includes:
  • Step S10 obtaining the term in the XML document
  • the user inputs the local index file path or the path of the XML file to be processed in the corresponding input box of the software of the embodiment of the present invention.
  • the XML file to be processed can be opened by the software of the embodiment of the present invention.
  • the software of the present invention finds the corresponding XML document to be processed according to the path of the XML file to be processed input by the user, reads the content in the XML document, and automatically searches for terms in the XML document.
  • In the present invention, a term includes a term proper and/or an abbreviation. After the terms in the XML document are obtained, the process proceeds to step S20.
  • Step S20 determining whether the obtained term matches the term in the preset termbase
  • the preset term library in this embodiment includes, but is not limited to, the name, address, English full name and Chinese full name of the term.
  • In a specific implementation, different contents may be set for different terms. Based on the term obtained from the XML document in step S10, it is determined whether the term matches a term in the preset termbase; in this embodiment, for example, it is determined whether the obtained term is the same as the content of the name column in the preset termbase. If the term matches a term in the preset termbase, the process proceeds to step S30; otherwise, the process proceeds to step S40.
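  • The name-column comparison of step S20 can be sketched as follows. The record layout mirrors the four columns named in this embodiment; the Python names `TermEntry` and `find_by_name`, the exact-equality comparison, and the sample data are illustrative assumptions rather than part of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TermEntry:
    """One termbase record (hypothetical layout based on the four columns)."""
    name: str               # name column, compared against the obtained term
    address: str            # address column, used later as link information
    english_full_name: str
    chinese_full_name: str


def find_by_name(term: str, termbase: List[TermEntry]) -> List[TermEntry]:
    """Return every entry whose name column equals the obtained term.

    A list is returned because one term name may have several entries.
    """
    return [entry for entry in termbase if entry.name == term]
```

  • An empty result corresponds to the branch to step S40; a non-empty result corresponds to the branch to step S30.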
  • Step S30, inserting link information of the term matching the term in the preset termbase at the corresponding position of the XML document;
  • If the determination result of step S20 is that the term matches a term in the preset termbase, the link information of the matching term is inserted at the position in the XML document where the term appears.
  • The attributes of the term, namely its name, address, English full name, and Chinese full name, can then be read through the inserted link information, so that the term can be used to perform the corresponding operation or display the corresponding content.
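  • One way the insertion of step S30 could look is sketched below with Python's standard `xml.etree.ElementTree`. The disclosure does not specify the form of the link information, so the `termlink` element and its `href` attribute are hypothetical; the sketch also only handles the first occurrence of the term in each element's leading text and uses plain substring matching.

```python
import xml.etree.ElementTree as ET


def insert_term_link(root: ET.Element, term: str, address: str) -> int:
    """Wrap the first occurrence of `term` in each element's text in a link.

    Returns the number of links inserted. Text that follows a child element
    (the child's `tail`) is not searched in this simplified sketch.
    """
    inserted = 0
    # Snapshot the elements so newly created links are not visited again.
    for elem in list(root.iter()):
        if elem.tag == "termlink":
            continue  # do not nest links inside existing links
        text = elem.text or ""
        pos = text.find(term)
        if pos < 0:
            continue
        link = ET.Element("termlink", {"href": address})  # hypothetical markup
        link.text = term
        link.tail = text[pos + len(term):]  # text after the term
        elem.text = text[:pos]              # text before the term
        elem.insert(0, link)                # link precedes any existing children
        inserted += 1
    return inserted
```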
  • Step S40, displaying prompt information.
  • The prompt information may be displayed as soon as the term is found not to match any term in the preset termbase, or after the whole XML document has been processed; alternatively, no prompt information need be displayed.
  • The embodiment of the present invention obtains the term in the XML document and determines whether the obtained term matches a term in the preset termbase; if it does, the link information of the term is inserted at the corresponding position of the XML document so that the term can be used.
  • The present invention can automatically obtain terms from an XML document without human intervention, look up terms in the preset termbase that match the obtained term, and insert link information of the matching terms at the corresponding position of the XML document. This saves authors of XML documents the time needed to find, identify, and create link information manually, and avoids the mistakes that manual operation invites. For example, when a termbase contains many duplicate terms, that is, one term name with several different interpretations and several pieces of link information, an author may take a long time to pick the correct entry and can easily pick the wrong one.
  • FIG. 2 is a schematic flowchart diagram of a second embodiment of a document processing method according to the present invention.
  • Between step S10 and step S20, the method may further include:
  • Step S50, determining whether the obtained term matches a term in the blacklist;
  • A blacklist may be established locally or in a server; the blacklist contains terms that do not need to be judged. In a specific implementation, the blacklist may also be omitted. If the obtained term does not match any term in the blacklist, the process proceeds to step S20.
  • Step S60, performing no processing.
  • If the determination result of step S50 is that the obtained term matches a term in the blacklist, no processing is performed on that term; it is of course also possible to return to step S50 and determine whether the next term matches a term in the blacklist.
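  • The blacklist check of steps S50 and S60 amounts to a filter placed in front of the termbase lookup. A minimal sketch follows; the function name `filter_blacklisted` and the use of an in-memory set are assumptions, and the blacklist could equally be held on a server.

```python
from typing import Iterable, List


def filter_blacklisted(terms: Iterable[str], blacklist: Iterable[str]) -> List[str]:
    """Drop blacklisted terms (step S60: no processing) and keep the rest.

    The returned terms are the ones that would go on to step S20.
    """
    blocked = set(blacklist)  # set membership keeps each check cheap
    return [term for term in terms if term not in blocked]
```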
  • FIG. 3 is a schematic flowchart of the step of obtaining the term in the XML document.
  • step S10 includes:
  • Step S11 obtaining English content in the XML document
  • the user inputs the local index file path or the path of the XML document to be processed in the corresponding input box of the software in the embodiment of the present invention.
  • the XML document to be processed can also be opened by the software of the present invention.
  • XML documents are generally divided into two types: one is a pure English document; the other is a mixture of English and other types of text, such as a mixed document in Chinese and English.
  • Before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. For a pure English document, the English content is extracted by splitting on the space character. For a mixed document, the English content must first be identified: for example, the content of the mixed document is read one by one and each piece is judged as English content or not; the judgment may also be made step by step according to punctuation marks. In a specific implementation, the position of the English content may be recorded as it is read; when the XML document is processed line by line or sentence by sentence, however, the position need not be recorded. Once the English content in the XML document has been obtained, the process proceeds to step S12.
  • Step S12, determining whether there is an uppercase letter in the English content other than the initial letter;
  • Based on the English content obtained in step S11, it is determined whether the English content contains an uppercase letter other than its initial letter. If it does, the process proceeds to step S13; otherwise, the process proceeds to step S14.
  • Step S13, if there is an uppercase letter other than the initial letter in the English content, determining that the English content is a term;
  • If the determination result of step S12 is that the English content contains an uppercase letter other than its initial letter, the English content is determined to be a term. The subsequent content of the XML document is then read, or the next piece of English content already read is judged.
  • Step S14, determining that the English content is not a term.
  • If the determination result of step S12 is that the English content contains no uppercase letter other than its initial letter, the English content is determined not to be a term. The subsequent content of the XML document is then read, or the next piece of English content already read is judged.
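  • The test in steps S12 to S14 can be written in a few lines. In the sketch below the function name `is_term` is an assumption, and only the ASCII letters A to Z are treated as uppercase, since the step operates on English content.

```python
def is_term(english_content: str) -> bool:
    """Return True if any character after the first is an uppercase A-Z.

    "XML" and "iPhone" qualify; "Document" does not, because its only
    uppercase letter is the initial letter.
    """
    return any("A" <= ch <= "Z" for ch in english_content[1:])
```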
  • The embodiment of the present invention exploits the typical form of terms in an XML document: a piece of English content is judged to be a term when it contains at least one uppercase letter besides its initial letter.
  • The type of the XML document being read is determined first. If the XML document is a pure English document, the English content is delimited using the spaces between English words; if the XML document is a mixed document, each piece of content read is checked to see whether it is English content. The English content found is extracted, thereby obtaining the English content in the XML document.
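  • A rough sketch of step S11 is given below. It assumes the text has already been pulled out of the XML document as a string, treats a string consisting only of ASCII characters as a pure English document, and uses a regular expression to pick out runs of English letters in mixed text; the function names and the decision to strip surrounding punctuation are assumptions.

```python
import re
import string
from typing import List, Tuple

ENGLISH_RUN = re.compile(r"[A-Za-z]+")  # a run of English letters


def extract_english(text: str) -> List[str]:
    """Return the pieces of English content found in `text`."""
    if text.isascii():  # pure English document (str.isascii needs Python 3.7+)
        # Split on whitespace, then drop punctuation attached to each word.
        words = (word.strip(string.punctuation) for word in text.split())
        return [word for word in words if word]
    # Mixed document: keep only the runs of English letters.
    return ENGLISH_RUN.findall(text)


def extract_english_with_positions(text: str) -> List[Tuple[int, str]]:
    """Variant that also records where each piece of English content starts."""
    return [(m.start(), m.group()) for m in ENGLISH_RUN.finditer(text)]
```

  • Recording positions is optional, as noted in the description of step S11; the second function is only needed when the insertion step works from offsets rather than line by line.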
  • FIG. 4 is a schematic flowchart diagram of a third embodiment of a document processing method according to the present invention. Based on the first embodiment of the document processing method of the present invention, step S20 includes:
  • Step S21 determining whether the obtained term matches the term in the whitelist
  • In this embodiment, a whitelist, a local index, and a remote termbase may be established; each of them may be located in the local terminal or in a server.
  • For example, the whitelist and the local index are located in the local terminal, and the whitelist and the local index may be subsets of the remote termbase.
  • The whitelist, the local index, and the remote termbase may also be three termbases with no intersection, each containing different terms.
  • The user can create two termbases, or more, according to actual needs. Based on the term obtained from the XML document in step S10, it is determined whether the obtained term matches a term in the whitelist. If it does, the process proceeds to step S30; if it does not, the process proceeds to step S22.
  • Before this step is performed, it may first be determined whether the obtained term matches a term in the blacklist; this step is then performed only if the obtained term does not match any term in the blacklist.
  • Step S22 determining whether the obtained term matches the term in the local index
  • If the determination result of step S21 is that the obtained term does not match a term in the whitelist, it is determined whether the obtained term matches a term in the local index, which contains commonly used terms. If the obtained term matches a term in the local index, the process proceeds to step S30; otherwise, the process proceeds to step S23.
  • Step S23, determining whether the obtained term matches a term in the remote termbase;
  • If the determination result of step S22 is that the obtained term does not match a term in the local index, it is determined whether the obtained term matches a term in the remote termbase. The remote termbase may be located on a remote server or in a local database. If the obtained term matches a term in the remote termbase, the process proceeds to step S30; otherwise, the process proceeds to step S24.
  • Step S24, performing no processing.
  • If the determination result of step S23 is that the obtained term does not match a term in the remote termbase, no processing is performed on the term, or processing continues with the subsequent content of the XML document.
  • In this embodiment, the whitelist and the local index are subsets of the remote termbase.
  • Alternatively, the whitelist, the local index, and the remote termbase may have no intersection and each contain different terms; that is, all the terms are distributed among the whitelist, the local index, and the remote termbase as needed.
  • When it is determined in step S22 or step S23 that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, for example asking whether to add the term to the whitelist or offering to edit the whitelist, the local index, and the remote termbase. In other implementations, only the local index and the remote termbase may be established.
  • Step S30 inserting link information of the term matching the term in the preset termbase in the corresponding position of the XML document.
  • If the determination result of step S21 is that the obtained term matches a term in the whitelist, step S30 includes inserting link information of the matching whitelist term at the corresponding position of the XML document. If the determination result of step S22 is that the obtained term matches a term in the local index, step S30 includes inserting link information of the matching local-index term at the corresponding position of the XML document. If the determination result of step S23 is that the obtained term matches a term in the remote termbase, step S30 includes inserting link information of the matching remote-termbase term at the corresponding position of the XML document.
  • The invention distributes terms across three lists, namely the whitelist, the local index, and the remote termbase: terms whose link information has been confirmed are placed in the whitelist, and commonly used terms are placed in the local index. Checking the obtained term against the whitelist, the local index, and the remote termbase in turn improves lookup efficiency. To some extent this avoids the long search times that would arise if a growing number of terms were all kept in a single termbase.
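  • The cascade of steps S21 to S24 can be sketched as below. The two local stores are modelled as dictionaries mapping a term name to its link information, and the remote termbase as a callable that is only invoked when both local lookups miss; these representations and the name `tiered_lookup` are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


def tiered_lookup(
    term: str,
    whitelist: Dict[str, str],
    local_index: Dict[str, str],
    remote_lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (source, link information) for the first tier that matches."""
    if term in whitelist:                      # step S21
        return ("whitelist", whitelist[term])
    if term in local_index:                    # step S22
        return ("local_index", local_index[term])
    link = remote_lookup(term)                 # step S23, the costly tier
    if link is not None:
        return ("remote_termbase", link)
    return None                                # step S24: no processing
```

  • Because the remote lookup is reached only after both local tiers miss, terms with confirmed links and commonly used terms never trigger a remote query.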
  • FIG. 5 is a schematic flowchart of the step in FIG. 1 of inserting, if the obtained term matches a term in the preset termbase, link information of the matching term at the corresponding position of the XML document.
  • Step S31, displaying a selection interface when the number of terms in the preset termbase that match the obtained term is at least one;
  • When at least one term in the preset termbase matches the obtained term, the selection interface is displayed, as shown in FIG. 6, which is a schematic diagram of the selection interface.
  • The selection interface includes the content of the term read, the English full name and Chinese full name of each matching entry, the description information of the matching entry, the selection items, an edit-whitelist button, and a confirm button.
  • The description information shown corresponds to the selection item the user has chosen in the selection interface, and the user can enter the whitelist editing interface through the edit-whitelist button.
  • Function buttons may also be added to or removed from the selection interface, or a different interface may be configured, according to user settings; for example, edit-blacklist and edit-local-index buttons may be added, each opening the corresponding editing interface.
  • When the obtained term matches a term in the whitelist, the link information of the matching whitelist term may be inserted directly at the corresponding position of the XML document without displaying the selection interface, which reduces user operations to some extent.
  • Step S32 receiving a selection command triggered by the user on the selection interface, and inserting link information of the term in a corresponding position of the XML document according to the selection command.
  • After the selection interface is displayed in step S31, the user triggers a selection command on the selection interface; the terminal receives the selection command and inserts the link information of the term at the corresponding position of the XML document according to the selection command.
  • When the XML document is executed, it uses the term through the inserted link information.
  • In this embodiment, the selection interface is displayed when at least one term in the preset termbase matches the obtained term.
  • Through the selection interface, the user can view the information related to each matching entry, which helps the user identify the correct link information and speeds up recognition of the term; the user can also open the corresponding termbase from the selection interface for editing. This can greatly reduce the time users spend selecting the correct term and makes it convenient to adjust the preset termbase during use.
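  • The decision logic around the selection interface (steps S31 and S32) might be sketched as follows. The interface itself is replaced by a `choose` callback that receives the matching entries and returns the index the user picked, or `None` if nothing was picked; that callback, the dictionary layout of an entry, and the rule of skipping the interface only for a single whitelist match are assumptions.

```python
from typing import Callable, Dict, List, Optional

Candidate = Dict[str, str]  # hypothetical entry with at least an "address" key


def resolve_link(
    candidates: List[Candidate],
    from_whitelist: bool,
    choose: Callable[[List[Candidate]], Optional[int]],
) -> Optional[str]:
    """Return the link information to insert, or None if nothing is inserted."""
    if not candidates:
        return None  # no match: nothing to select from
    if from_whitelist and len(candidates) == 1:
        # A confirmed whitelist entry is inserted without showing the interface.
        return candidates[0]["address"]
    index = choose(candidates)  # stands in for the selection interface
    if index is None or not 0 <= index < len(candidates):
        return None  # the user cancelled or the selection was invalid
    return candidates[index]["address"]
```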
  • the embodiment of the invention further provides a document processing apparatus.
  • FIG. 7 is a schematic diagram of functional modules of a first embodiment of a document processing apparatus according to the present invention.
  • the document processing apparatus includes: an obtaining module 10, a first determining module 20, and a processing module 30.
  • Obtaining module 10 configured to obtain terms in an XML document
  • the user inputs the local index file path or the path of the XML file to be processed in the corresponding input box of the software in the embodiment of the present invention.
  • the XML file to be processed can also be opened by the software of the present invention.
  • the software of the present invention finds the corresponding XML document to be processed according to the path of the XML file to be processed input by the user, reads the content in the XML document, and automatically searches for terms in the XML document.
  • In the present invention, a term includes a term proper and/or an abbreviation; the terms in the XML document are thereby obtained.
  • the first determining module 20 is configured to determine whether the obtained term matches the term in the preset termbase, and the preset termbase includes a whitelist, a local index, and a remote termbase.
  • The preset termbase in this embodiment includes, but is not limited to, a name column, an address column, an English full name column, and a Chinese full name column. In a specific implementation, different contents may be set for different terms. Based on the term obtained from the XML document by the obtaining module 10, it is determined whether the term matches a term in the preset termbase; in this embodiment, for example, it is determined whether the obtained term is the same as the content of the name column in the preset termbase.
  • the processing module 30 is configured to insert link information of a term matching the term in the preset termbase at a corresponding position of the XML document if the obtained term matches the term in the preset termbase.
  • When the term matches a term in the preset termbase, the link information of the matching term is inserted at the position in the XML document where the term appears.
  • The attributes of the term, namely its name, address, English full name, and Chinese full name, can then be read through the link information, so that the term can be used to perform the corresponding operation or display the corresponding content.
  • Prompt information may be displayed as soon as the term is found not to match any term in the preset termbase, or after the XML document has been processed; alternatively, no prompt information need be displayed.
  • The embodiment of the present invention obtains the term in the XML document and determines whether the obtained term matches a term in the preset termbase; if it does, the link information of the term is inserted at the corresponding position of the XML document so that the term can be used.
  • The present invention can automatically obtain terms from an XML document without human intervention, look up terms in the preset termbase that match the obtained term, and insert link information of the matching terms at the corresponding position of the XML document. This saves authors of XML documents the time needed to find, identify, and create link information manually, and avoids the mistakes that manual operation invites. For example, when a termbase contains many duplicate terms, that is, one term name with several different interpretations and several pieces of link information, an author may take a long time to pick the correct entry and can easily pick the wrong one.
  • FIG. 8 is a schematic diagram of functional modules of a second embodiment of a document processing apparatus according to the present invention. Based on the first embodiment of the document processing apparatus of the present invention, the apparatus may further include:
  • the second determining module 40 is configured to determine whether the obtained term matches the blacklist.
  • a blacklist may be established locally or in a server, and the blacklist includes terms that do not need to be judged. In the specific implementation, the blacklist may not be established. After the term is obtained, it may be judged whether the obtained term matches the term in the blacklist before judging whether the obtained term matches the term in the preset termbase.
  • The first determining module 20 is configured to determine whether the obtained term matches a term in the preset termbase if the determination result of the second determining module 40 is that the obtained term does not match a term in the blacklist.
  • If the determination result of the second determining module 40 is that the obtained term matches a term in the blacklist, no processing is performed; it is of course also possible to continue by determining whether the next term matches a term in the blacklist.
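  • How the four modules of the second apparatus embodiment cooperate can be illustrated with a small composition sketch. Each module is modelled as a plain callable held by a dataclass; the class name, field names, and signatures are assumptions, and the reference numerals in the comments only indicate which module each field stands for.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class DocumentProcessingApparatus:
    obtain: Callable[[str], Iterable[str]]    # obtaining module 10
    in_blacklist: Callable[[str], bool]       # second determining module 40
    match: Callable[[str], Optional[str]]     # first determining module 20
    insert: Callable[[str, str], None]        # processing module 30

    def process(self, document_text: str) -> List[str]:
        """Run the modules in order and return the terms that were linked."""
        linked = []
        for term in self.obtain(document_text):
            if self.in_blacklist(term):
                continue  # blacklisted: no processing, move to the next term
            link = self.match(term)
            if link is not None:
                self.insert(term, link)
                linked.append(term)
        return linked
```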
  • FIG. 9 is a schematic structural diagram of the obtaining module.
  • the obtaining module 10 includes:
  • the obtaining unit 11 is configured to obtain the English content in the XML document.
  • the user inputs the local index file path or the path of the XML document to be processed in the corresponding input box of the software of the present invention.
  • the XML document to be processed can also be opened by the software of the present invention.
  • XML documents are generally divided into two types: one is a pure English document; the other is a mixture of English and other types of text, such as a mixed document in Chinese and English.
  • before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. If it is a pure English document, the English content is extracted by splitting on space characters. If it is a mixed document, the English content in it is identified, for example by reading the content of the mixed document piece by piece and judging whether each piece is English content; the judgment can also be made sentence by sentence according to punctuation marks. In a specific implementation, the position of the English content may be recorded as it is read; it need not be recorded, for example when the XML document is processed line by line or sentence by sentence.
  • the judging unit 12 is configured to determine whether the English content contains an uppercase letter other than its initial letter.
  • the determining unit 13 is configured to determine that the English content is the term when the judgment result of the judging unit 12 is that the English content contains an uppercase letter other than its initial letter.
  • if the judgment result of the judging unit 12 is that the English content contains an uppercase letter other than its initial letter, the English content is determined to be a term. If the judgment result is that it contains no uppercase letter other than its initial letter, the English content is determined not to be a term. The apparatus then continues to read the subsequent content of the XML document, or continues to judge the subsequent English content already read.
  • the embodiment of the present invention identifies terms in the XML document by their form: a term contains at least one uppercase letter in addition to its first letter.
  • the type of the XML document is determined first. If the XML document is a pure English document, the English content is read using the fact that English words are separated by spaces; if the XML document is a mixed document, it is judged whether the content read is English content, and the English content is extracted as it is read, thereby obtaining the English content in the XML document.
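  • the uppercase-letter rule applied by the judging unit 12 and the determining unit 13 can be written in a few lines. The sketch below is illustrative only; the function name `is_term` and the restriction to the ASCII letters A to Z are assumptions.

```python
def is_term(word):
    """Return True if any letter after the first one is an uppercase A-Z."""
    return any("A" <= ch <= "Z" for ch in word[1:])


if __name__ == "__main__":
    for word in ["XML", "Extensible", "eNodeB", "document"]:
        print(word, is_term(word))
```

  • note that this heuristic accepts mixed-case words such as "eNodeB" and rejects capitalized ordinary words such as "Extensible"; it cannot recognize a term written entirely in lowercase.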
  • FIG. 10 is a schematic structural block diagram of the first judging module of FIG. 7.
  • the first determining module 20 includes:
  • the first determining unit 21 is configured to determine whether the obtained term matches the term in the whitelist. If the term obtained matches a term in the whitelist, the processing module 30 inserts link information for the term that matches the term in the whitelist at the corresponding location of the XML document.
  • the whitelist, the local index, and the remote termbase may each be established in the local terminal or in a server; optionally, the whitelist and the local index are located in the local terminal. The whitelist and the local index may be subsets of the remote termbase.
  • the whitelist, the local index, and the remote termbase may also be three termbases with no intersection, that is, each contains different terms.
  • the user can create two termbases, or multiple termbases, according to actual needs.
  • the second determining unit 22 is configured to determine whether the obtained term matches the term in the local index when the obtained term does not match the term in the whitelist. If the term obtained matches a term in the local index, the processing module 30 inserts link information for the term that matches the term in the local index at the corresponding location of the XML document.
  • the third determining unit 23 is configured to determine whether the obtained term matches the term in the remote termbase when the obtained term does not match the term in the local index.
  • the processing module 30 is configured to insert, at the corresponding position of the XML document, link information of the term that matches a term in the remote termbase if the obtained term matches a term in the remote termbase. If the obtained term does not match any term in the remote termbase, no processing is performed, or processing continues with the subsequent content of the XML document.
  • the remote termbase may be located in a remote server. It may also be located in a local database.
  • in this embodiment, the whitelist and the local index are subsets of the remote termbase.
  • alternatively, the whitelist, the local index, and the remote termbase may have no intersection and each include different terms, that is, all terms are distributed among the whitelist, the local index, and the remote termbase as needed. When the second determining unit 22 or the third determining unit 23 determines that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, such as whether to add the term to the whitelist or whether to edit the whitelist, the local index, or the remote termbase. In other implementations, only the local index and the remote termbase may be built.
  • the terms are placed in three libraries: a whitelist, a local index, and a remote termbase. Terms whose link information has already been determined are placed in the whitelist and commonly used terms are placed in the local index; the obtained term is then checked against the whitelist, the local index, and the remote termbase in turn, which improves search efficiency. To some extent this avoids the long search times that arise when a growing number of terms are all kept in a single termbase.
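  • the whitelist, local index, remote termbase order can be sketched as a tiered lookup that stops at the first hit. This is a minimal sketch under stated assumptions: each tier is modeled as an in-memory dict mapping a term name to its link information, the function name `find_link` is hypothetical, and a real remote termbase would be reached over a network rather than held in a dict.

```python
def find_link(term, whitelist, local_index, remote_termbase):
    """Return (tier_name, link) for the first tier containing the term, else None."""
    tiers = (
        ("whitelist", whitelist),
        ("local index", local_index),
        ("remote termbase", remote_termbase),
    )
    for name, termbase in tiers:
        if term in termbase:
            return name, termbase[term]
    return None


if __name__ == "__main__":
    # All entries and link values below are made up for illustration.
    whitelist = {"XML": "terms.xml#xml"}
    local_index = {"SGML": "terms.xml#sgml"}
    remote = {"HTML": "terms.xml#html", "XML": "terms.xml#xml"}
    for term in ["XML", "SGML", "HTML", "PDF"]:
        print(term, find_link(term, whitelist, local_index, remote))
```

  • because the cheapest tiers are consulted first, the remote termbase is only queried for terms that are neither confirmed nor commonly used.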
  • FIG. 11 is a schematic structural diagram of the processing module of FIG. 7.
  • the processing module 30 includes:
  • the display unit 31 is configured to display a selection interface when the judgment result of the first determining module is that at least one term in the preset termbase matches the obtained term.
  • the selection interface is displayed, as shown in FIG. 6, which is a schematic diagram of the selection interface.
  • the selection interface includes the content of the read term, the English full name and the Chinese full name of the matching content, the description information of the matching content, the selection item, the edit white list button, and the determination button.
  • the description information shown for the matching content changes with the option the user selects in the selection interface, and the user may enter the whitelist editing interface through the edit-whitelist button.
  • the selection interface may also add or reduce function buttons or set different interfaces according to user settings, such as adding an edit blacklist, editing a local index, and the like, and then entering a corresponding editing interface through corresponding buttons.
  • when the obtained term matches a term in the whitelist, the link information of the matching term may be inserted directly at the corresponding position of the XML document without displaying the selection interface, which reduces user operations to some extent.
  • the processing unit 32 is configured to receive a selection command triggered by the user on the selection interface, and insert the link information of the term in the corresponding position of the XML document according to the selection command.
  • the terminal receives a selection command and inserts link information of the term at the corresponding position of the XML document according to the selection command, so that the term can be used through the link information when the XML document is executed. The process may also be skipped according to a "skip" selection command, in which case processing continues with the subsequent content of the XML document, or the editing interface of the whitelist, the local index, or the remote termbase may be entered through the corresponding edit button.
  • in this embodiment, the selection interface is displayed when at least one term in the preset termbase matches the obtained term.
  • through the selection interface the user can view information about each match, which helps the user identify the correct link information for the term and speeds up recognition of the term. The user can also enter the corresponding termbase from the selection interface for editing. This reduces the time users spend picking the right entry and makes it convenient to adjust the preset termbase during use.
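  • the interaction between the display unit 31 and the processing unit 32 can be sketched without any graphical interface by passing the user's choice in as a callback. Everything named here is hypothetical: `choose_link`, the `select` callback standing in for the selection interface, and the dictionary keys `full_name` and `link`.

```python
def choose_link(term, candidates, select):
    """Return the link chosen for `term`, or None when nothing is inserted.

    candidates: list of dicts with hypothetical keys "full_name" and "link".
    select: callable standing in for the selection interface; it returns the
            index of the chosen candidate, or None for "skip".
    """
    if not candidates:      # no match: the selection interface is not shown
        return None
    choice = select(term, candidates)
    if choice is None:      # the user chose "skip"
        return None
    return candidates[choice]["link"]


if __name__ == "__main__":
    # Made-up entries: one term name with two different interpretations.
    matches = [
        {"full_name": "Access Point", "link": "terms.xml#ap-1"},
        {"full_name": "Application Processor", "link": "terms.xml#ap-2"},
    ]
    print(choose_link("AP", matches, lambda term, options: 1))
```

  • keeping the choice behind a callback lets the same logic serve a dialog, a command-line prompt, or an automatic rule such as always accepting a single whitelist match.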
  • the embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform at least one of the foregoing methods, for example at least one of the methods shown in FIG. 1 to FIG. 5.
  • the computer storage medium may include a storage medium such as a hard disk, an optical disk, a magnetic disk, or a flash disk, and may be a non-transitory storage medium.
  • the apparatus includes a processor 42, a storage medium 44, and at least one external communication interface 41; the processor 42, the storage medium 44, and the external communication interface 41 are all connected via a bus 43.
  • the processor 42 can be a processing component such as a microprocessor, a central processing unit, a digital signal processor, or a programmable logic array.
  • Computer-executable instructions are stored on the storage medium 44; the processor 42 executing the computer-executable instructions stored in the storage medium 44 may implement any of the above methods.

Abstract

A document processing method comprises the following steps: acquiring a term in an XML document (S10); determining whether the acquired term matches a term in a preset termbase (S20); and if so, inserting, at a corresponding position of the XML document, link information of the term matching a term in the preset termbase (S30).

Description

Document processing method, device and computer storage medium
Technical field
The present invention relates to the field of automation technologies, and in particular to a document processing method, a document processing apparatus, and a computer storage medium.
Background
XML (Extensible Markup Language) is a simple data storage language. Like HTML, it derives from the Standard Generalized Markup Language (SGML). It is commonly used to simplify the storage and sharing of data and is a powerful tool for processing structured document information.
To keep up with the rapid development of XML documents for their products, most enterprises have established independent termbases to manage the terms in XML documents and the use of those terms.
At present, so that terms can be used smoothly when an XML document is executed, the document author must, every time an XML document is produced, manually search the termbase, identify the right entries, and create the link information in the XML document.
Summary of the invention
Embodiments of the present invention aim to provide a document processing method and apparatus that can automatically insert link information for terms into an XML document.
An embodiment of the present invention provides a document processing method comprising the following steps: obtaining a term in an XML document; determining whether the obtained term matches a term in a preset termbase; and, if the obtained term matches a term in the preset termbase, inserting link information of the matching term at the corresponding position of the XML document.
Optionally, after the step of obtaining the term in the XML document and before the step of determining whether the obtained term matches a term in the preset termbase, the method further comprises: determining whether the obtained term matches a term in a blacklist; and, if it does not, performing the step of determining whether the obtained term matches a term in the preset termbase.
Optionally, the step of obtaining the term in the XML document comprises: obtaining English content in the XML document; determining whether the English content contains an uppercase letter other than its initial letter; and, if it does, determining that the English content is a term.
Optionally, the step of determining whether the obtained term matches a term in the preset termbase comprises: determining whether the obtained term matches a term in a whitelist; if it does, inserting link information of the matching whitelist term at the corresponding position of the XML document; if it does not, determining whether the obtained term matches a term in a local index; if it does, inserting link information of the matching local-index term at the corresponding position of the XML document; if it does not, determining whether the obtained term matches a term in a remote termbase; and, if it does, inserting link information of the matching remote-termbase term at the corresponding position of the XML document.
Optionally, the step of inserting, at the corresponding position of the XML document, link information of the term matching a term in the preset termbase comprises: displaying a selection interface when at least one term in the preset termbase matches the obtained term; and receiving a selection command triggered by the user on the selection interface and inserting the link information of the term at the corresponding position of the XML document according to the selection command.
An embodiment of the present invention further provides an apparatus for using terms, comprising: an obtaining module configured to obtain a term in an XML document; a first judging module configured to determine whether the obtained term matches a term in a preset termbase, the preset termbase comprising a whitelist, a local index, and a remote termbase; and a processing module configured to insert, at the corresponding position of the XML document, link information of the matching term if the obtained term matches a term in the preset termbase.
Optionally, the apparatus further comprises a second judging module configured to determine whether the obtained term matches the blacklist; the first judging module is configured to determine whether the obtained term matches a term in the preset termbase if the judgment result of the second judging module is negative.
Optionally, the obtaining module comprises: an obtaining unit configured to obtain English content in the XML document; a judging unit configured to determine whether the English content contains an uppercase letter other than its initial letter; and a determining unit configured to determine that the English content is the term when the judgment result of the judging unit is that the English content contains an uppercase letter other than its initial letter.
Optionally, the first judging module comprises: a first judging unit configured to determine whether the obtained term matches a term in the whitelist, the processing module being configured to insert link information of the matching whitelist term at the corresponding position of the XML document if it does; a second judging unit configured to determine, when the obtained term does not match any term in the whitelist, whether it matches a term in the local index, the processing module inserting link information of the matching local-index term at the corresponding position of the XML document if it does; and a third judging unit configured to determine, when the obtained term does not match any term in the local index, whether it matches a term in the remote termbase, the processing module inserting link information of the matching remote-termbase term at the corresponding position of the XML document if it does.
Optionally, the processing module comprises: a display unit configured to display a selection interface when the judgment result of the first judging module is that at least one term in the preset termbase matches the obtained term; and a processing unit configured to receive a selection command triggered by the user on the selection interface and to insert the link information of the term at the corresponding position of the XML document according to the selection command, so that the term can be used.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform at least one of the foregoing methods.
Embodiments of the present invention obtain a term in an XML document, determine whether the obtained term matches a term in a preset termbase, and, if it does, insert link information of the matching term at the corresponding position of the XML document. In this way, terms are obtained from the XML document automatically and without human participation, matching terms are looked up in the preset termbase, and link information for the matching terms is inserted at the corresponding positions of the XML document. This saves XML document authors the time otherwise spent manually searching the termbase, identifying entries, and creating link information in the XML document. It also avoids mistakes that are easy to make during manual work. For example, when a termbase contains many entries with the same name, that is, one term name with several different interpretations, an author can spend a long time picking the correct entry, and a term name with several pieces of link information easily leads to linking errors.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a first embodiment of the document processing method of the present invention;
FIG. 2 is a schematic flowchart of a second embodiment of the document processing method of the present invention;
FIG. 3 is a schematic flowchart of obtaining the term in an XML document according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the document processing method of the present invention;
FIG. 5 is a schematic flowchart of inserting, at the corresponding position of the XML document, link information of the term matching a term in the preset termbase according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the selection interface in FIG. 5;
FIG. 7 is a schematic structural diagram of a first embodiment of the document processing apparatus of the present invention;
FIG. 8 is a schematic structural diagram of a second embodiment of the document processing apparatus of the present invention;
FIG. 9 is a schematic structural diagram of the obtaining module in FIG. 7;
FIG. 10 is a schematic structural diagram of the first judging module in FIG. 7;
FIG. 11 is a schematic structural diagram of the processing module in FIG. 7;
[Corrected under Rule 91, 06.01.2016]
FIG. 12 is a schematic structural diagram of another document processing apparatus provided by an embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
An embodiment of the present invention provides a document processing method.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the document processing method of the present invention.
In this embodiment, the document processing method includes:
Step S10: obtain the term in the XML document.
The user enters the path of the local index file or of the XML file to be processed in the corresponding input box of the software of this embodiment; the XML file to be processed can of course also be opened from within the software. Using the path entered by the user, the software locates the XML document to be processed, reads its content, and automatically searches it for terms. In the present invention, "term" covers terms and/or abbreviations. Once the terms in the XML document have been obtained, the method proceeds to step S20.
Step S20: determine whether the obtained term matches a term in the preset termbase.
In this embodiment the preset termbase includes, but is not limited to, each term's name, address, English full name, and Chinese full name; in a specific implementation, different content may be set for different terms. For each term obtained from the XML document in step S10, it is determined whether the term matches a term in the preset termbase, in this embodiment for example by judging whether the obtained term is identical to an entry in the name column of the preset termbase. If the term matches a term in the preset termbase, the method proceeds to step S30; otherwise it proceeds to step S40.
Step S30: insert, at the corresponding position of the XML document, link information of the term matching a term in the preset termbase.
When the judgment in step S20 is that the term matches a term in the preset termbase, link information for that term is inserted at the position in the XML document where the matching term occurs. When the XML document is used later, the attributes of the term, that is, its name, address, English full name, Chinese full name, and so on, can be read through the inserted link information, so that the term can be used to perform the corresponding operation or display the corresponding content.
Step S40: display prompt information.
When the judgment in step S20 is that the term does not match any term in the preset termbase, prompt information may be displayed immediately, or it may be displayed after the XML document has been processed; the prompt information may also be omitted.
This embodiment obtains a term in the XML document, determines whether the obtained term matches a term in the preset termbase, and, if it does, inserts the link information of the term at the corresponding position of the XML document so that the term can be used. In this way, terms are obtained from the XML document automatically and without human participation, matching terms are looked up in the preset termbase, and link information for the matching terms is inserted at the corresponding positions of the XML document. This saves XML document authors the time otherwise spent manually searching the termbase, identifying entries, and creating link information in the XML document. It also avoids mistakes that are easy to make during manual work. For example, when a termbase contains many entries with the same name, that is, one term name with several different interpretations, an author can spend a long time picking the correct entry, and a term name with several pieces of link information easily leads to linking errors.
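Steps S10 to S30 can be sketched end to end with Python's standard `xml.etree.ElementTree` module. This is a rough illustration under several assumptions, not the patented implementation: the source does not specify the form of the link information, so the sketch wraps each matched term in a hypothetical `<termref href="...">` element; the termbase is a plain dict from term name to a made-up link value; and only the text that directly follows an element's opening tag is processed (tail text after child elements is left untouched).

```python
import re
import xml.etree.ElementTree as ET

WORD = re.compile(r"[A-Za-z]+")  # runs of ASCII letters


def is_term(word):
    """S10 heuristic: some letter after the first one is uppercase."""
    return any(ch.isupper() for ch in word[1:])


def link_terms_in_text(elem, termbase):
    """Wrap termbase terms found in elem.text in hypothetical <termref> children."""
    text = elem.text or ""
    refs = []
    head = text
    pos = 0
    for match in WORD.finditer(text):
        word = match.group()
        if not is_term(word) or word not in termbase:  # S10 and S20
            continue
        plain = text[pos:match.start()]
        if refs:
            refs[-1].tail = plain   # text between two linked terms
        else:
            head = plain            # text before the first linked term
        ref = ET.Element("termref", {"href": termbase[word]})  # S30
        ref.text = word
        refs.append(ref)
        pos = match.end()
    if refs:
        refs[-1].tail = text[pos:]
        elem.text = head
        for index, ref in enumerate(refs):
            elem.insert(index, ref)  # new nodes precede existing children
    return len(refs)


def link_terms(root, termbase):
    """Process every element; return the number of links inserted."""
    # Snapshot the elements first so inserted <termref> nodes are not revisited.
    return sum(link_terms_in_text(elem, termbase) for elem in list(root.iter()))


if __name__ == "__main__":
    root = ET.fromstring("<p>Use XML and SGML here.</p>")
    link_terms(root, {"XML": "terms.xml#xml"})  # hypothetical termbase entry
    print(ET.tostring(root, encoding="unicode"))
```

In this example "SGML" passes the step S10 heuristic but has no termbase entry, so it is left unchanged, which corresponds to the step S40 branch without a prompt.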
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a second embodiment of the document processing method of the present invention.
Based on the first embodiment, the following may also be included between step S10 and step S20:
Step S50: determine whether the obtained term matches a term in the blacklist.
In this embodiment a blacklist may be established locally or on a server; the blacklist contains terms that do not need to be judged. In a specific implementation the blacklist may also be omitted. After the term is obtained in step S10, and before step S20 determines whether the obtained term matches a term in the preset termbase, it may first be determined whether the obtained term matches a term in the blacklist. If it does, the method proceeds to step S60; if it does not, the method proceeds to step S20.
Step S60: perform no processing.
If the judgment in step S50 is that the obtained term matches a term in the blacklist, no processing is performed on that term; the method may of course return to step S50 and go on to judge whether the next term matches a term in the blacklist.
请参照图3,图3为图1中获得XML文档中的所述术语的步骤的细化流程示意图。Please refer to FIG. 3. FIG. 3 is a schematic flowchart of the steps of obtaining the term in the XML document in FIG.
基于第一实施例,步骤S10包括:Based on the first embodiment, step S10 includes:
步骤S11,获得所述XML文档中的英文内容;Step S11, obtaining English content in the XML document;
The user enters the path of the local index file or the path of the XML document to be processed in the corresponding input box of the software of this embodiment; the XML document to be processed can also be opened directly through the software. After entering the path, the user clicks the start-processing button, and, in response to this user-triggered start command, the embodiment begins reading the XML document node by node. XML documents generally fall into two types: pure English documents, and documents that mix English with other scripts, such as mixed Chinese-English documents. Before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. If it is a pure English document, the English content is extracted by splitting on space characters. If it is a mixed document, the English content in it is identified, for example by reading the content of the mixed document item by item and judging whether each item read is English; the judgment can also be made sentence by sentence according to punctuation marks. In a specific implementation, when English content is read from the XML document, its position may also be recorded; the position need not be recorded, for example, when the XML document is processed line by line or sentence by sentence. Once the English content in the XML document has been obtained, the process proceeds to step S12.
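As an illustration only, the extraction in step S11 can be sketched in Python. The sketch assumes that "English content" means a maximal run of ASCII letters (optionally followed by digits or hyphens), and it uses one regular expression for both document types, which simplifies the two branches described above. The names `ENGLISH_RUN`, `extract_english`, and `is_pure_english` are hypothetical and do not come from the embodiment.

```python
import re

# Assumption: an English item is a run of ASCII letters that may contain
# digits or hyphens after its first letter.
ENGLISH_RUN = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_pure_english(text):
    """Return True when the text contains only ASCII characters."""
    return all(ord(ch) < 128 for ch in text)


def extract_english(text):
    """Return (offset, word) pairs for every English run in the text.

    Recording the offset corresponds to the optional step of recording
    the position of the English content that was read.
    """
    return [(m.start(), m.group()) for m in ENGLISH_RUN.finditer(text)]
```

The recorded offsets are only needed if link information is later inserted by position; an implementation that rewrites the document line by line could drop them.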
Step S12: determine whether the English content contains an uppercase letter other than its first letter;
Based on the English content obtained from the XML document in step S11, it is determined whether the English content contains an uppercase letter other than its first letter. If it does, the process proceeds to step S13; otherwise, the process proceeds to step S14.
Step S13: if the English content contains an uppercase letter other than its first letter, determine that the English content is a term;
A term in the XML document generally consists of at least two consecutive uppercase English letters. When the result of step S12 is that the English content contains an uppercase letter other than its first letter, the English content is determined to be a term. The subsequent content of the XML document is then read, or the subsequent English content that has already been read is judged.
Step S14: determine that the English content is not a term.
When the result of step S12 is that the English content contains no uppercase letter other than its first letter, the English content is determined not to be a term. The subsequent content of the XML document is then read, or the subsequent English content that has already been read is judged.
This embodiment identifies terms in the XML document by exploiting the fact that a term in an XML document contains at least one uppercase letter besides its first letter. The type of the XML document is determined first: if it is a pure English document, the English content is delimited using the spaces between English words; if it is a mixed document, each piece of content read is judged to be English or not. When English content is read, it is extracted, thereby obtaining the English content in the XML document.
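The test applied in steps S12 to S14 can be sketched as a single predicate. This is a minimal sketch that treats one English word as the unit being judged; the function name `is_term` is hypothetical.

```python
def is_term(word):
    """Return True if the word has an uppercase letter after its first letter.

    Steps S12 to S14: an ordinary capitalised word such as "The" is not a
    term, while "LTE" or "eNodeB" is.
    """
    return any(ch.isupper() for ch in word[1:])
```

Note that this heuristic will also accept mixed-case product names and camel-case identifiers, which is one reason the embodiment allows a blacklist of words that should not be looked up.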
Please refer to FIG. 4. FIG. 4 is a flowchart of a third embodiment of the document processing method of the present invention. Based on the first embodiment of the document processing method, step S20 includes:
Step S21: determine whether the obtained term matches a term in the whitelist;
In this embodiment, a whitelist, a local index, and a remote termbase can be established; each of them may reside on the local terminal or on a server. Optionally, the whitelist and the local index reside on the local terminal. The whitelist and the local index may be subsets of the remote termbase; in a specific implementation, the whitelist, the local index, and the remote termbase may instead be three disjoint termbases, each containing different terms. In other implementations, the user can establish only two of these termbases, or more termbases, according to actual needs. For the term obtained from the XML document in step S10, it is determined whether it matches a term in the whitelist. If it does, the process proceeds to step S30; if it does not, the process proceeds to step S22.
In a specific implementation, before this step is performed, it may first be determined whether the obtained term matches a term in the blacklist; this step is performed only if the obtained term does not match any term in the blacklist.
Step S22: determine whether the obtained term matches a term in the local index;
If the result of step S21 is that the obtained term does not match any term in the whitelist, it is determined whether the obtained term matches a term in the local index, which contains commonly used terms. If it does, the process proceeds to step S30; if it does not, the process proceeds to step S23.
Step S23: determine whether the obtained term matches a term in the remote termbase;
When the result of step S22 is that the obtained term does not match any term in the local index, it is determined whether the obtained term matches a term in the remote termbase. The remote termbase may be located on a remote server or in a local database. If the obtained term matches a term in the remote termbase, the process proceeds to step S30; if it does not, the process proceeds to step S24.
Step S24: perform no processing;
If the result of step S23 is that the obtained term does not match any term in the remote termbase, no processing is performed for the term, or processing continues with the subsequent content of the XML document.
In this embodiment, the whitelist and the local index are subsets of the remote termbase. Alternatively, the whitelist, the local index, and the remote termbase may be disjoint, each containing different terms; that is, all terms are distributed among the whitelist, the local index, and the remote termbase as needed. In a specific implementation, when step S22 or step S23 determines that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, for example asking whether to add the term to the whitelist, or offering to edit the whitelist, the local index, and the remote termbase. In other implementations, only the local index and the remote termbase may be established.
Step S30: insert, at the corresponding position of the XML document, the link information of the term that matches a term in the preset termbase.
If the result of step S21 is that the obtained term matches a term in the whitelist, step S30 includes inserting, at the corresponding position of the XML document, the link information of the term that matches the term in the whitelist. If the result of step S22 is that the obtained term matches a term in the local index, step S30 includes inserting, at the corresponding position of the XML document, the link information of the term that matches the term in the local index. If the result of step S23 is that the obtained term matches a term in the remote termbase, step S30 includes inserting, at the corresponding position of the XML document, the link information of the term that matches the term in the remote termbase.
The present invention distributes terms across three stores: the whitelist, the local index, and the remote termbase. Terms whose link information is already certain are placed in the whitelist, and commonly used terms are placed in the local index; the obtained term is then checked against the whitelist, the local index, and the remote termbase in turn, which improves lookup efficiency. To some extent, this avoids the long lookup times that would result from keeping an ever-growing number of terms in a single termbase.
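The tiered lookup of steps S21 to S24 can be sketched as follows. This is only a sketch under stated assumptions: the whitelist and local index are modelled as in-memory dictionaries that map a term name to a list of entries, and the remote termbase is modelled as a callable. The names `find_matches` and `remote_lookup` and the tier labels are hypothetical.

```python
def find_matches(term, whitelist, local_index, remote_lookup):
    """Return (tier, entries) for the first tier that matches the term.

    whitelist, local_index: dicts mapping a term name to a list of entries.
    remote_lookup: callable taking a term name and returning a list of
    entries; it stands in for a query to the remote termbase.
    Returns (None, []) when no tier matches (step S24).
    """
    # Steps S21 and S22: check the cheap local stores first.
    for tier, table in (("whitelist", whitelist), ("local_index", local_index)):
        entries = table.get(term)
        if entries:
            return tier, entries
    # Step S23: only query the remote termbase when both local stores miss.
    entries = remote_lookup(term)
    if entries:
        return "remote_termbase", entries
    return None, []
```

Because the remote query runs only after both local stores miss, frequently used terms never incur a remote lookup, which is the efficiency gain the paragraph above describes.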
Please refer to FIG. 5. FIG. 5 is a flowchart of the step in FIG. 1 of inserting, at the corresponding position of the XML document, the link information of the term that matches a term in the preset termbase, if the obtained term matches a term in the preset termbase.
Step S31: when the number of terms in the preset termbase that match the obtained term is at least one, display a selection interface;
[Corrected under Rule 91, 06.01.2016]
If step S20 determines that at least one term in the preset termbase matches the obtained term, a selection interface is displayed, as shown in FIG. 6; FIG. 6 may be a schematic diagram of the selection interface in the matching method described in FIG. 5. The selection interface includes the content of the term that was read, the English full name and Chinese full name of the matching entry, descriptive information about the matching entry, selection items, an edit-whitelist button, a confirm button, and so on. The descriptive information shown follows the selection item that the user picks in the selection interface, and the user can enter the whitelist editing interface through the edit-whitelist button. In a specific implementation, function buttons can be added to or removed from the selection interface, or different interfaces can be configured, according to user settings; for example, buttons such as edit-blacklist or edit-local-index can be added, each leading to the corresponding editing interface. In a specific implementation, if the obtained term matches a term in the whitelist, the selection interface need not be displayed, and the link information of the matching term is inserted directly at the corresponding position of the XML document. In other implementations, the selection interface may be displayed only when step S20 determines that at least two terms in the preset termbase match the obtained term, which reduces user operations to some extent. After a selection command triggered by the user in the selection interface is received, the process proceeds to step S32.
Step S32: receive the selection command triggered by the user in the selection interface, and insert the link information of the term at the corresponding position of the XML document according to the selection command.
Following step S31, the user triggers a selection command in the selection interface; the terminal receives the selection command and, according to it, inserts the link information of the term at the corresponding position of the XML document, so that the term can be used through the link information when the XML document is executed. Alternatively, in response to a "skip" selection command, the current item is skipped and processing continues with the subsequent content of the XML document; or, in response to the corresponding edit button, the editing interface of the whitelist, local index, or remote database is entered.
In this embodiment, the selection interface is displayed when at least one term in the preset termbase matches the obtained term. Through the selection interface the user can view the information related to each match, which helps the user identify the correct link information for the term and speeds up recognition of the term; the user can also enter the corresponding termbase from the selection interface to edit it. This greatly reduces the time the user spends picking the correct term, and makes it convenient to adjust the preset termbase during use.
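The decision logic of steps S31 and S32, including the two optional variations (inserting whitelist matches directly, and prompting only when there are at least two matches), might be sketched as below. The selection interface itself is abstracted as a callback; `resolve_match`, `choose`, and `min_matches_for_prompt` are hypothetical names introduced for this sketch.

```python
def resolve_match(term, tier, entries, choose, min_matches_for_prompt=1):
    """Pick the entry whose link information should be inserted.

    tier: which store produced the entries, e.g. "whitelist".
    choose: callable (term, entries) -> entry or None; it stands in for
        the selection interface, and None models the "skip" command.
    min_matches_for_prompt: 1 reproduces step S31 as described; 2 gives
        the variation that prompts only when the match is ambiguous.
    """
    if not entries:
        return None
    # Optional variation: a whitelist match is inserted without prompting.
    if tier == "whitelist" or len(entries) < min_matches_for_prompt:
        return entries[0]
    return choose(term, entries)
```

Keeping the user interface behind a callback means the same logic can be driven by a dialog, a command-line prompt, or a scripted choice in tests.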
An embodiment of the present invention further provides a document processing apparatus.
Please refer to FIG. 7. FIG. 7 is a schematic diagram of the functional modules of a first embodiment of the document processing apparatus of the present invention.
In this embodiment, the document processing apparatus includes an obtaining module 10, a first determining module 20, and a processing module 30.
The obtaining module 10 is configured to obtain terms in an XML document;
The user enters the path of the local index file or the path of the XML file to be processed in the corresponding input box of the software of this embodiment; the XML file to be processed can also be opened directly through the software. Using the path entered by the user, the software locates the XML document to be processed, reads its content, and automatically searches for terms in it, thereby obtaining the terms in the XML document. In the present invention, "term" covers terms and/or abbreviations.
The first determining module 20 is configured to determine whether the obtained term matches a term in the preset termbase, where the preset termbase includes a whitelist, a local index, and a remote termbase.
In this embodiment, the preset termbase includes, but is not limited to, a name column, an address column, an English full name column, and a Chinese full name column; in a specific implementation, different content can be set for different terms. For the term obtained from the XML document by the obtaining module 10, it is determined whether the term matches a term in the preset termbase; in this embodiment, for example, by checking whether the obtained term is identical to an entry in the name column of the preset termbase.
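One possible in-memory shape for a termbase record with the four columns named above, together with the exact name comparison, is sketched here. The class name `TermEntry`, its field names, and `match_by_name` are hypothetical; the embodiment names the columns but not a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """One termbase row: name, address, English and Chinese full names."""
    name: str
    address: str
    english_full_name: str
    chinese_full_name: str


def match_by_name(term, termbase):
    """Return every entry whose name column is identical to the term.

    More than one entry can be returned, since one term name may carry
    several interpretations.
    """
    return [entry for entry in termbase if entry.name == term]
```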
The processing module 30 is configured to insert, at the corresponding position of the XML document, the link information of the term that matches a term in the preset termbase, if the obtained term matches a term in the preset termbase.
According to the result of the first determining module 20, when the term matches a term in the preset termbase, the link information corresponding to the term is inserted at the position of that term in the XML document. When the XML document is subsequently used, the attributes of the term (its name, address, English full name, Chinese full name, and so on) can be read through the link information, so that the term can be used to perform the corresponding operation or display the corresponding content. In a specific implementation, prompt information may be displayed when the first determining module 20 finds that the term does not match any term in the preset termbase, or after the XML document has been processed; the prompt information may also be omitted.
This embodiment obtains the term in the XML document, determines whether the obtained term matches a term in the preset termbase, and, if it does, inserts the link information of the term at the corresponding position of the XML document so that the term can be used. In this way, the present invention obtains terms from the XML document automatically, without human involvement, looks up matching terms in the preset termbase, and inserts the link information of the matching term at the corresponding position of the XML document. This saves the XML document author the time otherwise spent manually searching the termbase, distinguishing entries, and creating link information in the XML document, and avoids the mistakes that are easily made in manual operation. For example, when the termbase contains many duplicate term names, that is, one term name has several different interpretations, the author would spend a long time picking the correct entry, and a term name with multiple pieces of link information easily leads to errors in the result.
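The insertion performed by the processing module can be illustrated with a string-level sketch. The embodiment does not specify what the link information looks like, so the `termlink` element and its `href` attribute below are purely hypothetical, as is the function name `insert_links`. The sketch also assumes that `text` is the character data of one node and contains no markup characters of its own.

```python
from xml.sax.saxutils import escape, quoteattr


def insert_links(text, hits):
    """Wrap each matched term in a hypothetical <termlink> element.

    text: character data of one XML node (assumed free of markup).
    hits: list of (offset, term, address) tuples, where offset is the
        recorded position of the term and address is the value of the
        termbase address column.
    """
    out = text
    # Work from the end of the text so earlier offsets stay valid.
    for offset, term, address in sorted(hits, reverse=True):
        link = "<termlink href=%s>%s</termlink>" % (quoteattr(address), escape(term))
        out = out[:offset] + link + out[offset + len(term):]
    return out
```

A production implementation would more likely edit the parsed document tree than splice strings, but the right-to-left replacement order shown here applies to either approach whenever positions are recorded up front.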
Please refer to FIG. 8. FIG. 8 is a schematic diagram of the functional modules of a second embodiment of the document processing apparatus of the present invention. Based on the first embodiment of the document processing apparatus, the apparatus may further include:
A second determining module 40, configured to determine whether the obtained term matches the blacklist.
In this embodiment, a blacklist can be established locally or on a server; the blacklist contains terms that do not need to be checked. In a specific implementation, the blacklist may also be omitted. After the term is obtained, and before determining whether it matches a term in the preset termbase, it may first be determined whether the obtained term matches a term in the blacklist.
If the second determining module 40 finds that the obtained term does not match any term in the blacklist, the first determining module 20 determines whether the obtained term matches a term in the preset termbase. That is, the first determining module 20 is configured to make this determination when the result of the second determining module 40 is that the obtained term does not match any term in the blacklist.
If the second determining module 40 finds that the obtained term matches a term in the blacklist, no processing is performed for that term; the apparatus may also go on to determine whether the next term matches a term in the blacklist.
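The blacklist gate of the second determining module 40 amounts to a filter applied before any termbase lookup. A minimal sketch, assuming the blacklist is held as a set of exact term names; the function name `filter_blacklisted` is hypothetical.

```python
def filter_blacklisted(terms, blacklist):
    """Drop terms found in the blacklist, preserving document order.

    Only the terms returned here are passed on to the first determining
    module for matching against the preset termbase.
    """
    return [term for term in terms if term not in blacklist]
```

For example, a capitalised word such as "OK" satisfies the uppercase heuristic but is not a technical term, so placing it in the blacklist avoids a pointless lookup.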
Please refer to FIG. 9. FIG. 9 is a schematic structural diagram of the obtaining module in FIG. 7.
Based on the first embodiment of the document processing apparatus of the present invention, the obtaining module 10 includes:
An obtaining unit 11, configured to obtain the English content in the XML document.
The user enters the path of the local index file or the path of the XML document to be processed in the corresponding input box of the software of the present invention; the XML document to be processed can also be opened directly through the software. After entering the path, the user clicks the start-processing button, and, in response to this user-triggered start command, the embodiment begins reading the XML document node by node. XML documents generally fall into two types: pure English documents, and documents that mix English with other scripts, such as mixed Chinese-English documents. Before the English content is read, it is determined whether the XML document to be processed is a pure English document or a mixed document. If it is a pure English document, the English content is extracted by splitting on space characters. If it is a mixed document, the English content in it is identified, for example by reading the content of the mixed document item by item and judging whether each item read is English; the judgment can also be made sentence by sentence according to punctuation marks. In a specific implementation, when English content is read from the XML document, its position may also be recorded; the position need not be recorded, for example, when the XML document is processed line by line or sentence by sentence.
A determining unit 12, configured to determine whether the English content contains an uppercase letter other than its first letter.
Based on the English content obtained from the XML document by the obtaining unit 11, it is determined whether the English content contains an uppercase letter other than its first letter.
A confirming unit 13, configured to determine that the English content is the term when the result of the determining unit 12 is that the English content contains an uppercase English letter other than its first letter.
A term in the XML document generally consists of at least two consecutive uppercase English letters. When the result of the determining unit 12 is that the English content contains an uppercase letter other than its first letter, the English content is determined to be a term. If the result of the determining unit 12 is that the English content contains no uppercase letter other than its first letter, the English content is determined not to be a term. The subsequent content of the XML document is then read, or the subsequent English content that has already been read is judged.
This embodiment identifies terms in the XML document by exploiting the fact that a term in an XML document contains at least one uppercase letter besides its first letter. The type of the XML document is determined first: if it is a pure English document, the English content is delimited using the spaces between English words; if it is a mixed document, each piece of content read is judged to be English or not. When English content is read, it is extracted, thereby obtaining the English content in the XML document.
请参照图10,图10为图7中第一判断模块的一种结构块示意图。 Please refer to FIG. 10. FIG. 10 is a schematic structural block diagram of the first judging module of FIG. 7.
基于本发明文档处理装置第一实施例,第一判断模块20包括:Based on the first embodiment of the document processing apparatus of the present invention, the first determining module 20 includes:
第一判断单元21,配置为判断获得的所述术语与白名单中的术语是否匹配。如果获得的所述术语与白名单中的术语匹配,则处理模块30在所述XML文档相应位置插入与白名单中术语匹配的术语的链接信息。The first determining unit 21 is configured to determine whether the obtained term matches the term in the whitelist. If the term obtained matches a term in the whitelist, the processing module 30 inserts link information for the term that matches the term in the whitelist at the corresponding location of the XML document.
本实施例可以在本地或者服务器中建立白名单、本地索引和远程术语库,所述白名单、本地索引和远程术语库可以位于本地终端中也可以位于服务器中,可选地所述白名单、本地索引位于本地终端中,其中,白名单、本地索引可以为远程术语库的子集,具体实施中白名单、本地索引和远程术语库也可以是没有交集的三个术语库,即各自包括不同的术语。在更多的实施中用户可以根据实际需要建立其中两个术语库,或者多个术语库。根据获取模块10获得XML文档中的术语,判断获得的所述术语与白名单中的术语是否匹配。如果获得的所述术语与白名单中的术语匹配,则处理模块30在所述XML文档相应位置插入与白名单中术语匹配的术语的链接信息。The whitelist, the local index, and the remote termbase may be established in the local or the server, and the whitelist, the local index, and the remote termbase may be located in the local terminal or in the server, optionally the whitelist, The local index is located in the local terminal, where the whitelist and the local index may be a subset of the remote termbase. In practice, the whitelist, the local index, and the remote termbase may also be three termbases without intersections, that is, each includes different terms of. In more implementations, the user can create two termbases, or multiple termbases, according to actual needs. According to the term in the XML document obtained by the obtaining module 10, it is judged whether the obtained term matches the term in the white list. If the term obtained matches a term in the whitelist, the processing module 30 inserts link information for the term that matches the term in the whitelist at the corresponding location of the XML document.
第二判断单元22,配置为在获得的所述术语与白名单中的术语不匹配时,判断获得的所述术语与本地索引中的术语是否匹配。如果获得的所述术语与本地索引中的术语匹配,则处理模块30在所述XML文档相应位置插入与本地索引中术语匹配的术语的链接信息。The second determining unit 22 is configured to determine whether the obtained term matches the term in the local index when the obtained term does not match the term in the whitelist. If the term obtained matches a term in the local index, the processing module 30 inserts link information for the term that matches the term in the local index at the corresponding location of the XML document.
如果第一判断单元21的判断结果为获得的所述术语与白名单中的术语不匹配,则判断获得的所述术语与本地索引中的术语是否匹配,本地索引中包括常用的术语。如果获得的所述术语与本地索引中的术语匹配,则所述处理模块在所述XML文档相应位置插入与本地索引中术语匹配的术语的链接信息。If the result of the determination by the first judging unit 21 is that the obtained term does not match the term in the white list, it is judged whether the obtained term matches the term in the local index, and the commonly used term is included in the local index. If the term obtained matches a term in the local index, the processing module inserts link information for the term that matches the term in the local index at the corresponding location of the XML document.
The third determining unit 23 is configured to determine, when the obtained term does not match any term in the local index, whether the obtained term matches a term in the remote termbase. The processing module 30 is configured to insert, at the corresponding position of the XML document, the link information of the matching term if the obtained term matches a term in the remote termbase.
When the second determining unit 22 finds that the obtained term does not match any term in the local index, it is determined whether the obtained term matches a term in the remote termbase. The remote termbase may be located on a remote server or in a local database. If the obtained term matches a term in the remote termbase, the processing module 30 inserts the link information of the matching term at the corresponding position of the XML document. If the obtained term does not match any term in the remote termbase, either no processing is performed or processing continues with the subsequent content of the XML document.
In this embodiment, the whitelist and the local index are subsets of the remote termbase. Alternatively, the whitelist, the local index, and the remote termbase may have no intersection and each contain different terms; that is, all terms are distributed among the whitelist, the local index, and the remote termbase as needed. In a specific implementation, when the second determining unit 22 or the third determining unit 23 determines that the obtained term matches a term in the local index or the remote termbase, prompt information may also be displayed, for example asking whether to add the term to the whitelist, or offering to edit the whitelist, the local index, and the remote termbase. In other implementations, only the local index and the remote termbase may be established.
In this embodiment of the present invention, terms are kept in three stores: a whitelist, a local index, and a remote termbase. Terms whose link information has already been determined are placed in the whitelist, and commonly used terms are placed in the local index. The obtained term is then checked against the whitelist, the local index, and the remote termbase in turn, which improves lookup efficiency. To some extent, this avoids the long lookup times that would result if an ever-growing number of terms were kept in a single termbase.
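The tiered lookup described above can be sketched as follows. This is only an illustrative sketch, not the implementation disclosed in the application: the dictionaries, the `query_remote_termbase` stand-in, the `find_link` function, and the example URLs are all hypothetical names and data chosen for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stores mapping a term to its link information (here a URL).
WHITELIST: Dict[str, str] = {"XML": "https://example.com/terms/xml"}
LOCAL_INDEX: Dict[str, str] = {"SGML": "https://example.com/terms/sgml"}


def query_remote_termbase(term: str) -> Optional[str]:
    """Stand-in for a remote termbase query (assumption: a real one
    might contact a server or a database instead of a dict)."""
    remote = {"HTML": "https://example.com/terms/html"}
    return remote.get(term)


def find_link(
    term: str,
    whitelist: Dict[str, str] = WHITELIST,
    local_index: Dict[str, str] = LOCAL_INDEX,
    remote_lookup: Callable[[str], Optional[str]] = query_remote_termbase,
) -> Optional[Tuple[str, str]]:
    """Check the whitelist, then the local index, then the remote termbase.

    Returns (store_name, link) for the first store that matches,
    or None when no store contains the term.
    """
    if term in whitelist:
        return "whitelist", whitelist[term]
    if term in local_index:
        return "local_index", local_index[term]
    link = remote_lookup(term)
    if link is not None:
        return "remote_termbase", link
    return None
```

Checking the cheapest store first means the remote query is only issued for terms that neither the whitelist nor the local index can resolve, which is the efficiency argument made in the paragraph above.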
Referring to FIG. 11, FIG. 11 is a schematic structural diagram of the processing module in FIG. 7.
Based on the first embodiment of the document processing apparatus of the present invention, the processing module 30 includes:
The display unit 31 is configured to display a selection interface when the first determining module determines that the obtained term matches at least one term in the preset termbase.
If the first determining module 20 determines that the obtained term matches at least one term in the preset termbase, a selection interface is displayed, as shown in FIG. 6, which is a schematic diagram of the selection interface. The selection interface includes the content of the term that was read, the full English and Chinese names of the matching content, description information for the matching content, selection items, an edit-whitelist button, a confirm button, and so on. The description information changes to describe whichever selection item the user chooses in the selection interface, and the user can open the whitelist editing interface through the edit-whitelist button. In a specific implementation, function buttons may be added to or removed from the selection interface, or different interfaces may be configured, according to the user's settings; for example, buttons such as edit blacklist or edit local index may be added, each of which opens the corresponding editing interface. In a specific implementation, if the obtained term is determined to match a term in the whitelist, the selection interface may be skipped and the link information of the matching whitelist term inserted directly at the corresponding position of the XML document. In other implementations, the selection interface may be displayed only when the obtained term matches at least two terms in the preset termbase, which reduces user operations to some extent.
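The decision of whether to show the selection interface can be sketched as a small predicate. This is a hedged illustration only: the function name `should_show_selection`, the store labels, and the `min_matches` parameter are assumptions introduced for the example and do not appear in the application.

```python
from typing import List


def should_show_selection(source: str, matches: List[str], min_matches: int = 1) -> bool:
    """Decide whether the user should be asked to choose among matches.

    source      -- hypothetical label of the store that produced the matches
                   ("whitelist", "local_index", or "remote_termbase")
    matches     -- candidate entries that matched the obtained term
    min_matches -- 1 models the "at least one match" variant; 2 models the
                   variant that prompts only when the match is ambiguous
    """
    if source == "whitelist":
        # Whitelist matches may be inserted directly, without prompting.
        return False
    return len(matches) >= min_matches
```

With `min_matches=2`, a single unambiguous match from the local index or the remote termbase would not open the interface, which corresponds to the variant described as reducing user operations.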
The processing unit 32 is configured to receive a selection command triggered by the user in the selection interface, and to insert the link information of the term at the corresponding position of the XML document according to the selection command.
When the user triggers a selection command in the selection interface shown by the display unit 31, the terminal receives the command and inserts the link information of the term at the corresponding position of the XML document accordingly, so that the XML document can use the term through the link information when it is executed. Alternatively, a "skip" selection command skips the current term and processing continues with the subsequent content of the XML document, or the corresponding edit button opens the editing interface for the whitelist, the local index, or the remote database.
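One way to picture the insert and skip commands is the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The application does not specify what the inserted link information looks like, so the `<link href="...">` markup, the command strings `"insert"` and `"skip"`, and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def insert_term_link(elem: ET.Element, term: str, href: str) -> bool:
    """Wrap the first occurrence of `term` in elem.text with a <link> child.

    Returns True if the term was found and linked, False otherwise.
    """
    text = elem.text or ""
    pos = text.find(term)
    if pos < 0:
        return False
    link = ET.Element("link", {"href": href})  # hypothetical link markup
    link.text = term
    link.tail = text[pos + len(term):]  # text that followed the term
    elem.text = text[:pos]              # text that preceded the term
    elem.insert(0, link)
    return True


def apply_selection(elem: ET.Element, term: str, command: str,
                    href: Optional[str] = None) -> bool:
    """Apply a selection command: "insert" adds the link at the term's
    position; any other command (such as "skip") leaves the element as is."""
    if command == "insert" and href is not None:
        return insert_term_link(elem, term, href)
    return False
```

For example, applying `"insert"` to `<p>Uses XML data.</p>` for the term `XML` yields a paragraph whose text is split around a new `<link>` child, while `"skip"` leaves the paragraph unchanged so that processing can move on to later content.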
In this embodiment of the present invention, the selection interface is displayed when the obtained term is determined to match at least one term in the preset termbase. Through the selection interface, the user can view information about the entries that match the term, which helps the user identify the correct link information for the term and speeds up recognition of the term. The user can also open the corresponding termbase from the selection interface and edit it. This reduces the time the user spends picking the correct term and makes it easy to adjust the preset termbase during use.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform at least one of the foregoing methods, for example at least one of the methods shown in FIG. 1 to FIG. 5.
The computer storage medium may include a storage medium such as a hard disk, an optical disc, a magnetic disk, or a flash drive, and is optionally a non-transitory storage medium.
An embodiment of the present invention further provides a document processing apparatus. As shown in FIG. 12, the apparatus includes a processor 42, a storage medium 44, and at least one external communication interface 41; the processor 42, the storage medium 44, and the external communication interface 41 are all connected through a bus 43. The processor 42 may be an electronic component with processing capability, such as a microprocessor, a central processing unit, a digital signal processor, or a programmable logic array. Computer-executable instructions are stored on the storage medium 44; by executing the computer-executable instructions stored in the storage medium 44, the processor 42 can implement any one of the foregoing methods.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any modification made in accordance with the principles of the present invention shall be understood as falling within the protection scope of the present invention.

Claims (11)

  1. A document processing method, the method comprising the following steps:
    obtaining a term in an Extensible Markup Language (XML) document;
    determining whether the obtained term matches a term in a preset termbase;
    if the obtained term matches a term in the preset termbase, inserting, at the corresponding position of the XML document, link information of the term in the preset termbase that matches the obtained term.
  2. The method according to claim 1, wherein after the step of obtaining a term in the XML document and before the step of determining whether the obtained term matches a term in the preset termbase, the method further comprises:
    determining whether the obtained term matches a term in a blacklist;
    if the result of the determination is no, performing the step of determining whether the obtained term matches a term in the preset termbase.
  3. The method according to claim 1, wherein the step of obtaining a term in the XML document comprises:
    obtaining English content in the XML document;
    determining whether the English content contains an uppercase letter other than its initial letter;
    if the English content contains an uppercase letter other than its initial letter, determining that the English content is a term.
  4. The method according to any one of claims 1 to 3, wherein the step of determining whether the obtained term matches a term in the preset termbase comprises:
    determining whether the obtained term matches a term in a whitelist;
    if the obtained term matches a term in the whitelist, inserting, at the corresponding position of the XML document, link information of the term in the whitelist that matches the obtained term;
    if the obtained term does not match any term in the whitelist, determining whether the obtained term matches a term in a local index;
    if the obtained term matches a term in the local index, inserting, at the corresponding position of the XML document, link information of the term in the local index that matches the obtained term;
    if the obtained term does not match any term in the local index, determining whether the obtained term matches a term in a remote termbase;
    if the obtained term matches a term in the remote termbase, inserting, at the corresponding position of the XML document, link information of the term in the remote termbase that matches the obtained term.
  5. The method according to claim 1, wherein the step of inserting, if the obtained term matches a term in the preset termbase, link information of the matching term at the corresponding position of the XML document comprises:
    displaying a selection interface when the obtained term matches at least one term in the preset termbase;
    receiving a selection command triggered by a user in the selection interface, and inserting the link information of the term at the corresponding position of the XML document according to the selection command.
  6. A document processing apparatus, the apparatus comprising:
    an obtaining module, configured to obtain a term in an XML document;
    a first determining module, configured to determine whether the obtained term matches a term in a preset termbase, the preset termbase comprising a whitelist, a local index, and a remote termbase;
    a processing module, configured to insert, at the corresponding position of the XML document, link information of the term in the preset termbase that matches the obtained term if the obtained term matches a term in the preset termbase.
  7. The apparatus according to claim 6, wherein the apparatus further comprises:
    a second determining module, configured to determine whether the obtained term matches a blacklist;
    the first determining module being configured to determine whether the obtained term matches a term in the preset termbase if the determination result of the second determining module is no.
  8. The apparatus according to claim 6, wherein the obtaining module comprises:
    an obtaining unit, configured to obtain English content in the XML document;
    a determining unit, configured to determine whether the English content contains an uppercase letter other than its initial letter;
    a confirming unit, configured to determine that the English content is the term when the determining unit finds that the English content contains an uppercase English letter other than its initial letter.
  9. The apparatus according to any one of claims 6 to 8, wherein the first determining module comprises:
    a first determining unit, configured to determine whether the obtained term matches a term in the whitelist, wherein if the obtained term matches a term in the whitelist, the processing module inserts, at the corresponding position of the XML document, link information of the term in the whitelist that matches the obtained term;
    a second determining unit, configured to determine, when the obtained term does not match any term in the whitelist, whether the obtained term matches a term in the local index;
    the processing module being configured to insert, at the corresponding position of the XML document, link information of the term in the local index that matches the obtained term if the obtained term matches a term in the local index;
    a third determining unit, configured to determine, when the obtained term does not match any term in the local index, whether the obtained term matches a term in the remote termbase, wherein if the obtained term matches a term in the remote termbase, the processing module inserts, at the corresponding position of the XML document, link information of the term in the remote termbase that matches the obtained term.
  10. The apparatus according to claim 6, wherein the processing module comprises:
    a display unit, configured to display a selection interface when the first determining module determines that the obtained term matches at least one term in the preset termbase;
    a processing unit, configured to receive a selection command triggered by a user in the selection interface, and to insert the link information of the term at the corresponding position of the XML document according to the selection command.
  11. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform at least one of the methods according to claims 1 to 5.
PCT/CN2015/090053 2015-06-16 2015-09-18 Document processing method and device, and computer storage medium WO2016201807A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510334408.7A CN106326198A (en) 2015-06-16 2015-06-16 Method and device for document processing
CN201510334408.7 2015-06-16

Publications (1)

Publication Number Publication Date
WO2016201807A1 (en) 2016-12-22

Family

ID=57544889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/090053 WO2016201807A1 (en) 2015-06-16 2015-09-18 Document processing method and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN106326198A (en)
WO (1) WO2016201807A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1848107A (en) * 2005-04-12 2006-10-18 国际商业机器公司 System and method for providing a transient dictionary that travels with an original electronic document
CN101004762A (en) * 2007-01-10 2007-07-25 张百川 Network web page system of a dynamic multidimensional Internet
CN101458690A (en) * 2007-12-14 2009-06-17 北京龙拓互动广告有限公司 Advertisement publishing method and advertisement server
CN101311934A (en) * 2008-06-30 2008-11-26 腾讯科技(深圳)有限公司 Medium player based key words content issue method and system

Also Published As

Publication number Publication date
CN106326198A (en) 2017-01-11

Similar Documents

Publication Publication Date Title
US7149971B2 (en) Method, apparatus, and system for providing multi-language character strings within a computer
JP2005092271A (en) Question-answering method and question-answering device
JP2000067065A (en) Method for identifying document image and record medium
CN103049098A (en) Device and method for input method shifting
CN106484699B (en) Method and device for generating database query field
JP2005107597A (en) Device and method for searching for similar sentence and program
CN104731364A (en) Input method and input method system
CN107077515B (en) Display control device, display control method, and display control medium
WO2019109514A1 (en) Datasheet backup method, device, electronic apparatus and medium
JP6003263B2 (en) Minutes creation support apparatus, minutes creation support system, minutes creation support method, and program
JP2004302678A (en) Database search path display method
JP2005182460A (en) Information processor, annotation processing method, information processing program, and recording medium having information processing program stored therein
CN109753557B (en) Answer output method, device, equipment and storage medium of question-answering system
CN111444208A (en) Data updating method and related equipment
WO2016201807A1 (en) Document processing method and device, and computer storage medium
CN115713572A (en) Text image generation method and device, electronic equipment and readable storage medium
US20170011020A1 (en) Automated processing of transcripts, transcript designations, and/or video clip load files
US20130174029A1 (en) Method and apparatus for analyzing a document
CN114968345A (en) Code processing method, system, computing device and storage medium
CN109815320B (en) Answer generation method, device, equipment and storage medium of question-answering system
JP2006344053A (en) Patent specification preparation support program
US10445320B2 (en) Document search apparatus, non-transitory computer readable medium, and document search method
JP2005115457A (en) Method of retrieving document file
JP2002140338A (en) Device and method for supporting construction of dictionary
JP2013077084A (en) Sentence example dictionary generation program and sentence example dictionary generation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15895380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15895380

Country of ref document: EP

Kind code of ref document: A1