CN109670092A - XML document proofreading method and device - Google Patents

Publication number
CN109670092A
Authority
CN
China
Prior art keywords
xml document
document
check
xml
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910013644.7A
Other languages
Chinese (zh)
Inventor
王盛华
尹真
王德刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Renhe Huizhi Information Technology Co Ltd
Original Assignee
Beijing Renhe Huizhi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Renhe Huizhi Information Technology Co Ltd
Priority to CN201910013644.7A
Publication of CN109670092A

Abstract

The disclosure provides an XML document proofreading method and device, relating to the technical field of document proofreading. The method and device proofread the reference identifier of each document element in an XML document according to an established association relation; proofread the format of the XML document according to a pre-established format specification; proofread the word segmentation of the XML document according to a pre-established dictionary; proofread the document information of the XML document according to a pre-established standard database; and proofread the body text of the XML document according to a pre-built proofreading model. This achieves intelligent proofreading of XML documents and improves work efficiency.

Description

XML document proofreading method and device
Technical field
The disclosure relates to the technical field of document proofreading, and in particular to an XML document proofreading method and device.
Background
In the publication of many articles, the traditional "three reviews and three proofreadings" system is still in use. In the vast majority of cases the manuscript is printed on paper, an editor or proofreader reads the article through from beginning to end, and corrections are annotated with the proofreading marks defined by the national standard. This consumes a great deal of time and energy, and efficiency is low.
Summary of the invention
In view of this, the disclosure provides an XML document proofreading method and device.
The XML document proofreading method provided by the disclosure comprises:
Proofreading the reference identifier of each document element in an XML document according to an established association relation, where the association relation includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifier of that document element.
Proofreading the format of the XML document according to a pre-established format specification.
Proofreading the word segmentation of the XML document according to a pre-established dictionary.
Proofreading the document information of the XML document according to a pre-established standard database.
Proofreading the body text of the XML document according to a pre-built proofreading model.
Further, the step of proofreading the reference identifier of each document element in the XML document according to the established association relation comprises:
According to the correspondence, in the association relation, between each document element in the XML document and the reference identifier of that document element, detecting for each reference identifier in the XML document whether the document element corresponding to that reference identifier exists in the XML document; if no corresponding document element exists, issuing a proofreading prompt.
According to the citation order, in the association relation, of the reference identifier of each document element in the XML document, detecting for each document element in the XML document whether the citation order of its reference identifier is correct; if it is not correct, issuing a proofreading prompt.
Further, the document elements include text, figures, tables, formulas, and references; after detecting, for each document element in the XML document, whether the citation order of its reference identifier is correct, the method further comprises:
Performing a digital object unique identifier (DOI) reverse-resolution query on each reference in the XML document to find the DOI of each reference.
Performing a duplicate check on the DOIs of the references to determine whether any DOI is repeated among all the references in the XML document; if a repeated DOI exists, issuing a proofreading prompt.
Further, the format specification includes relationship detection, matching detection, and non-null item detection; the step of proofreading the format of the XML document according to the pre-established format specification comprises:
Scanning the XML metadata of the XML document according to the pre-established format specification.
Determining whether at least one of a null item, a relationship detection anomaly, and a matching detection anomaly exists; if so, issuing a proofreading prompt.
Further, the dictionary includes a word segmentation dictionary, a wrong-character/word dictionary, and a sensitive/forbidden-word dictionary; the step of proofreading the word segmentation of the XML document according to the pre-established dictionary comprises:
Scanning the XML document based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/forbidden-word dictionary.
Determining whether a wrong character/word and/or a sensitive/forbidden word exists; if so, issuing a proofreading prompt.
Further, after scanning the XML document based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/forbidden-word dictionary, the method further comprises:
Determining whether a grammatical error exists; if so, issuing a proofreading prompt.
Further, after a proofreading prompt is issued, the method further comprises:
According to a set condition and based on the proofreading prompt, either modifying the XML document, or setting the proofreading prompt to be ignored for this document or not shown again.
Further, the document information includes author affiliation and funding project information;
The step of proofreading the document information according to the pre-established standard database comprises:
Performing similarity detection on the author affiliation and funding project information of the XML document based on the standard database, in which multiple standardized affiliations and funding project information entries are stored in advance.
Selecting from the standard database the information with the highest similarity to the author affiliation and funding project information of the XML document.
Issuing a proofreading prompt for the author affiliation and funding project information of the XML document according to the information with the highest similarity.
Further, the proofreading model is built by the following steps:
For each XML document, retaining the record of edits and modifications made to the XML document, associating the record with the XML document, and using the result as a sample.
Training on multiple samples to obtain the proofreading model, which is used to proofread the body text of the XML document.
The disclosure also provides an XML document proofreading device, including a proofreading module and a storage module.
The storage module stores the pre-established format specification, the pre-established dictionary, the pre-established standard database, and the pre-built proofreading model.
The proofreading module is configured to proofread the reference identifier of each document element in the XML document according to the established association relation; the association relation includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifier of that document element.
The proofreading module is configured to proofread the format of the XML document according to the pre-established format specification.
The proofreading module is configured to proofread the word segmentation of the XML document according to the pre-established dictionary.
The proofreading module is configured to proofread the document information of the XML document according to the pre-established standard database.
The proofreading module is configured to proofread the body text of the XML document according to the pre-built proofreading model.
The XML document proofreading method and device provided by the disclosure proofread the reference identifier of each document element in the XML document according to the established association relation, proofread the format of the XML document according to the pre-established format specification, proofread the word segmentation of the XML document according to the pre-established dictionary, proofread the document information of the XML document according to the pre-established standard database, and proofread the body text of the XML document according to the pre-built proofreading model. This achieves intelligent proofreading of XML documents; compared with manual proofreading, it saves time, improves work efficiency, and reduces labor cost.
To make the above objects, features, and advantages of the disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The following drawings show only some embodiments of the disclosure and should not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is the block diagram of electronic equipment provided by the disclosure.
Fig. 2 is a kind of block diagram of XML document verifying unit provided by the disclosure.
Fig. 3 is a kind of flow diagram of XML document proofreading method provided by the disclosure.
Fig. 4 is another flow diagram of XML document proofreading method provided by the disclosure.
Fig. 5 is a kind of application schematic diagram of XML document proofreading method provided by the disclosure.
Fig. 6 is another application schematic diagram of XML document proofreading method provided by the disclosure.
Fig. 7 is another flow diagram of XML document proofreading method provided by the disclosure.
Fig. 8 is another flow diagram of XML document proofreading method provided by the disclosure.
Fig. 9 is another flow diagram of XML document proofreading method provided by the disclosure.
Fig. 10 is another application schematic diagram of XML document proofreading method provided by the disclosure.
Fig. 11 is another application schematic diagram of XML document proofreading method provided by the disclosure.
Fig. 12 is another flow diagram of XML document proofreading method provided by the disclosure.
Fig. 13 is another flow diagram of XML document proofreading method provided by the disclosure.
Reference numerals: 100 - electronic device; 10 - XML document proofreading device; 11 - proofreading module; 12 - storage module; 20 - memory; 30 - processor; 40 - communication unit.
Detailed description of the embodiments
The technical solutions of the disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the disclosure, not all of them. The components of the disclosure, as generally described and shown in the drawings, can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of the disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the disclosure.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.
In the publication of many articles, the traditional "three reviews and three proofreadings" system is still in use. In the vast majority of cases the manuscript is printed on paper, an editor or proofreader reads the article through from beginning to end, and corrections are annotated with the proofreading marks defined by the national standard.
Although many journals can currently run the PDF file of a journal article through the "Dark Horse" (Heima) proofreading system to check its text content, that check is directed only at wrong characters and words. Editors and proofreaders still have to modify and correct the PDF file directly, relying on the proofreading knowledge in their own heads. Labor cost is high, time and energy are consumed, and omissions occur easily, so the proofreading is incomplete.
Based on the above research, the disclosure provides an XML document proofreading method and device to address the above problems.
Referring to Fig. 1, the XML document proofreading method provided by the disclosure is applied to, and executed by, the electronic device 100 shown in Fig. 1. In the disclosure, the electronic device 100 may be, but is not limited to, a personal computer (PC), a laptop, a personal digital assistant (PDA), a server, or another electronic device with processing capability.
The electronic device 100 includes the XML document proofreading device 10 shown in Fig. 2, a memory 20, a processor 30, and a communication unit 40. The memory 20, the processor 30, and the communication unit 40 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines. The XML document proofreading device 10 includes at least one software function module that can be stored in the memory 20 in the form of software or firmware, and the processor 30 runs the software programs and modules stored in the memory 20 to execute various functional applications and data processing.
The memory 20 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 30 may be an integrated circuit chip with signal processing capability. The processor 30 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like.
The communication unit 40 is used to establish a communication connection between the electronic device 100 and other external devices over a network, and to transmit data over the network.
Referring to Fig. 3, Fig. 3 is a flow diagram of the XML document proofreading method provided by the disclosure. The detailed flow of the method shown in Fig. 3 is described below.
Step S10: proofread the reference identifier of each document element in the XML document according to the established association relation.
The association relation includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifier of that document element.
In the disclosure, an unstructured document (such as a Word document) is first fragmented: it is divided according to its document elements into document data blocks, with each document element in the unstructured document corresponding to one document data block. The divided document is then structured: each document data block is converted into Extensible Markup Language (XML) data to obtain the XML document. At the same time, for each document data block, an association with the block's reference identifier is established according to the citation order of that reference identifier. The resulting association relation comprises the correspondence between each document element in the XML document and its reference identifier, together with the citation order of that reference identifier, so that a document element can be located precisely through its reference identifier.
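The association relation described above can be pictured as a small data structure. The following is a minimal sketch only, not the disclosure's implementation: the tag and attribute names (`xref`, `rid`, `id`), the `Association` class, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical schema: cited elements carry an "id" attribute and each
# citation in the text is an <xref rid="..."/> pointing at such an id.
SAMPLE = """<article>
  <p>See <xref rid="fig1"/> and <xref rid="tab1"/>, then <xref rid="fig1"/> again.</p>
  <fig id="fig1"/>
  <table id="tab1"/>
</article>"""

@dataclass
class Association:
    targets: dict = field(default_factory=dict)         # element id -> tag name
    citation_order: list = field(default_factory=list)  # ids in first-citation order

def build_association(xml_text):
    """Collect element/identifier correspondences and the first-citation order."""
    root = ET.fromstring(xml_text)
    assoc = Association()
    for elem in root.iter():  # iter() walks the tree in document order
        if "id" in elem.attrib:
            assoc.targets[elem.get("id")] = elem.tag
        if elem.tag == "xref":
            rid = elem.get("rid")
            if rid not in assoc.citation_order:
                assoc.citation_order.append(rid)
    return assoc

assoc = build_association(SAMPLE)
```

With both parts recorded in one place, the existence check and the order check of steps S11 and S12 become simple lookups against `targets` and `citation_order`.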
Further, referring to Fig. 4, the document elements include text, figures, tables, formulas, and references; the step of proofreading the reference identifier of each document element in the XML document according to the established association relation includes step S11 and step S12.
Step S11: according to the correspondence, in the association relation, between each document element in the XML document and the reference identifier of that document element, detect for each reference identifier in the XML document whether the document element corresponding to that reference identifier exists in the XML document.
Specifically, according to the correspondence between each document element in the XML document and its reference identifier, each reference identifier in the XML document is scanned; for each reference identifier, the method detects whether its corresponding document element exists in the XML document, and issues a proofreading prompt if it does not.
For example, referring to Fig. 5, which is a schematic diagram of the method in a practical application, the reference identifiers in Fig. 5 are "[16]", "Table 1", and "Fig. 1", as indicated by the arrows. In the disclosure, the document element corresponding to reference identifier "[16]" is a reference, the one corresponding to "Table 1" is a table, and the one corresponding to "Fig. 1" is a figure. To check reference identifier "[16]", the method queries whether the corresponding reference exists in the XML document. If it exists, the method moves on to the next reference identifier; if not, it issues a proofreading prompt indicating that no reference corresponding to "[16]" exists in the XML document. Likewise, to check reference identifier "Fig. 1", the method queries whether the corresponding figure exists in the XML document. If not, it issues a proofreading prompt indicating that no figure corresponding to "Fig. 1" exists in the XML document; if so, it moves on to the next reference identifier. The process continues until all reference identifiers in the XML document have been checked.
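The existence check of step S11 can be sketched as follows. This is an illustrative sketch under assumptions: the marker shapes recognized by `MARKER_RE`, the function name `find_dangling_markers`, and the sample sentence are hypothetical and not taken from the disclosure.

```python
import re

# Assumed marker shapes: "[n]" for references, "Table n" and "Fig. n".
MARKER_RE = re.compile(r"\[\d+\]|Table \d+|Fig\. \d+")

def find_dangling_markers(text, existing_elements):
    """Return one proofreading prompt per marker that has no document element."""
    prompts = []
    for match in MARKER_RE.finditer(text):
        marker = match.group(0)
        if marker not in existing_elements:
            prompts.append(f"No document element found for reference identifier {marker}")
    return prompts

body = "As shown in Fig. 1 and Table 1, the result agrees with [16]."
present = {"Fig. 1", "[16]"}  # elements that actually exist in the document
prompts = find_dangling_markers(body, present)
```

Here only "Table 1" has no matching element, so it is the only identifier that produces a prompt.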
Step S12: according to the citation order, in the association relation, of the reference identifier of each document element in the XML document, detect for each document element in the XML document whether the citation order of its reference identifier is correct; if it is not correct, issue a proofreading prompt.
Specifically, according to the citation order of the reference identifier of each document element in the association relation, each reference identifier in the XML document is scanned; for each document element, the method detects whether the citation order of its reference identifier is correct, and issues a proofreading prompt if it is not.
For example, referring to Fig. 6, which is a schematic diagram of the method in a practical application, the reference identifiers in Fig. 6 are [5], [6-8], and [9,10], as indicated by the arrows; reference identifier [6-8] comprises reference identifiers [6], [7], and [8]. When [6-8] is checked, if reference identifier [6] is cited for the first time at the position of [6-8] indicated by the arrow in Fig. 6, but reference identifier [7] was already cited before that point, the citation order of [7] is incorrect and a proofreading prompt is issued. If the order is correct, the method moves on to check the citation order of the next reference identifier, until the citation order of every reference identifier in the XML document has been checked. As another example, with reference identifiers Fig. 1, Fig. 2, and Fig. 3, if Fig. 2 is cited before Fig. 1 is first cited, the citation order of Fig. 2 is incorrect and a proofreading prompt is issued; if the order is correct, the method moves on to the next reference identifier, until the citation order of every reference identifier in the XML document has been checked.
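The order check of step S12 can be sketched for numeric reference markers as follows. The rule used here is one reading of the example above, stated as an assumption: a reference number is flagged if, at its first citation, the preceding number has not yet been cited. The names `expand_marker` and `find_out_of_order` are hypothetical.

```python
import re

# One item is "n" or "n-m"; a marker is a comma-separated list of items.
ITEM = r"\d+(?:\s*-\s*\d+)?"
MARKER_RE = re.compile(rf"\[{ITEM}(?:\s*,\s*{ITEM})*\]")

def expand_marker(marker):
    """Expand '[6-8]' to [6, 7, 8] and '[9,10]' to [9, 10]."""
    numbers = []
    for part in marker.strip("[]").split(","):
        if "-" in part:
            low, high = part.split("-")
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(part))
    return numbers

def find_out_of_order(text):
    """Return reference numbers first cited before their predecessor."""
    seen = set()
    out_of_order = []
    for match in MARKER_RE.finditer(text):
        for number in expand_marker(match.group(0)):
            if number in seen:
                continue  # only the first citation of a number matters
            if number > 1 and number - 1 not in seen:
                out_of_order.append(number)
            seen.add(number)
    return out_of_order

body = "Early work [1], later [3], and others [2,4-5]."
out_of_order = find_out_of_order(body)
```

In the sample sentence, [3] is cited before [2] has appeared, so it is the one number reported; the same idea applies to figure and table numbering.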
Further, referring to Fig. 7, the document elements include text, figures, tables, formulas, and references; after detecting, for each document element in the XML document, whether the citation order of its reference identifier is correct according to the citation order in the association relation, the method further includes steps S13 and S14.
Step S13: perform a digital object unique identifier (DOI) reverse-resolution query on each reference in the XML document to find the DOI of each reference.
Each reference has exactly one digital object unique identifier (Digital Object Unique Identifier, DOI), and the reference corresponding to a DOI can be looked up by that DOI. After the citation order of the reference identifier of each document element in the XML document has been checked, a DOI reverse-resolution query is performed on each reference in the XML document to find its DOI.
For each reference in the XML document, the DOI reverse-resolution query uses information such as the reference's title, authors, and publication time to find its DOI. For example, a DOI reverse-resolution query on reference [X] finds that its DOI is xxxxxxxx. Once the DOI of reference [X] has been found, a user who wants to check reference [X] only needs to click its DOI to jump to the corresponding website and view the reference.
Step S14: perform a duplicate check on the DOIs of the references to determine whether any DOI is repeated among all the references in the XML document. If a repeated DOI exists, a proofreading prompt is issued.
Specifically, after the DOI reverse-resolution query has found the DOI of each reference in the XML document, the DOIs are checked for duplicates: the method determines whether any DOI is repeated among all the references in the XML document and, if so, issues a proofreading prompt.
For example, if the DOIs found for reference [X] and reference [Y] are identical, the references are determined to be miscited, and a proofreading prompt is issued.
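The duplicate check of step S14 can be sketched as a grouping of reference labels by DOI. This is a minimal sketch assuming the lookup of step S13 has already produced a label-to-DOI mapping; the function name, the labels, and the DOI strings are invented placeholders. DOIs are compared case-insensitively, which is how DOI names are defined to compare.

```python
from collections import defaultdict

def find_duplicate_dois(ref_dois):
    """ref_dois maps a reference label to its DOI (None if none was found).

    Returns a dict of normalized DOI -> labels, for DOIs shared by
    more than one reference.
    """
    by_doi = defaultdict(list)
    for label, doi in ref_dois.items():
        if doi:  # skip references whose DOI lookup found nothing
            by_doi[doi.strip().lower()].append(label)
    return {doi: labels for doi, labels in by_doi.items() if len(labels) > 1}

# Placeholder DOIs for illustration only.
ref_dois = {
    "[3]": "10.1000/example.1",
    "[5]": "10.1000/example.2",
    "[7]": "10.1000/EXAMPLE.1",
    "[9]": None,
}
duplicates = find_duplicate_dois(ref_dois)
```

Each entry in `duplicates` corresponds to one proofreading prompt naming the references that share a DOI.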
Further, referring again to Fig. 3, the XML document proofreading method further includes step S20.
Step S20: proofread the format of the XML document according to the pre-established format specification.
Optionally, in the disclosure, the XML document is a journal document to be proofread; the format specification is established in advance according to journal publishing standards and includes relationship detection, matching detection, and non-null item detection.
Further, referring to Fig. 8, the step of proofreading the format of the XML document according to the pre-established format specification includes steps S21 and S22.
Step S21: scan the XML metadata of the XML document according to the pre-established format specification.
After the format specification has been established according to journal publishing standards, the XML metadata of the XML document is scanned according to it. In the disclosure, the XML document is a journal document to be proofread, so it contains document data such as the abstract, keywords, acceptance date, received date, and body text. Within the XML document, the XML metadata is all data other than the body text, for example the abstract, keywords, acceptance date, and received date.
Step S22: determine whether at least one of a null item, a relationship detection anomaly, and a matching detection anomaly exists. If at least one of them exists, a proofreading prompt is issued.
In the disclosure, non-null item detection means that XML metadata must not be null. That is, for XML metadata in the XML document such as the abstract, keywords, acceptance date, received date, and revision date, if any item is detected to be null, a null item is determined to exist and a proofreading prompt is issued. For example, if the abstract of the XML document is detected to be null, i.e., there is no abstract, a null item is determined to exist and a proofreading prompt is issued; likewise, if the received date of the XML document is detected to be null, i.e., there is no received date, a null item is determined to exist and a proofreading prompt is issued.
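Non-null item detection can be sketched as a scan over a list of required metadata tags. The tag names in `REQUIRED`, the function name, and the sample metadata are assumptions for illustration; the disclosure does not specify an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names for the metadata items that must not be empty.
REQUIRED = ["abstract", "keywords", "received", "revised", "accepted"]

def find_empty_items(xml_text):
    """Return the required tags that are missing or contain no text."""
    root = ET.fromstring(xml_text)
    empty = []
    for tag in REQUIRED:
        node = root.find(f".//{tag}")
        # A missing element and an element with only whitespace both count as null.
        if node is None or not "".join(node.itertext()).strip():
            empty.append(tag)
    return empty

SAMPLE = """<meta>
  <abstract>Some abstract text.</abstract>
  <keywords/>
  <received>2018-10-01</received>
  <accepted>2018-12-01</accepted>
</meta>"""
empty_items = find_empty_items(SAMPLE)
```

In the sample, `keywords` is present but empty and `revised` is absent, so both would trigger a proofreading prompt.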
A journal article carries document data such as the acceptance date, received date, revision date, start page, and end page. The relationship detection therefore includes checks such as acceptance date > revision date > received date and end page > start page in the XML document. That is, the received date must not be later than the revision date, the revision date must not be later than the acceptance date, and the end page must not be smaller than the start page. If the received date is later than the revision date, and/or the revision date is later than the acceptance date, and/or the end page is smaller than the start page, a relationship detection anomaly is determined and a proofreading prompt is issued.
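The relationship detection amounts to a few ordering comparisons. The sketch below assumes the dates and page numbers have already been parsed from the XML metadata; the function name, parameter names, and sample values are hypothetical.

```python
from datetime import date

def check_relations(received, revised, accepted, first_page, last_page):
    """Return one prompt per violated ordering constraint."""
    prompts = []
    if received > revised:
        prompts.append("received date is later than revision date")
    if revised > accepted:
        prompts.append("revision date is later than acceptance date")
    if last_page < first_page:
        prompts.append("end page is smaller than start page")
    return prompts

prompts = check_relations(
    received=date(2018, 10, 1),
    revised=date(2018, 9, 15),   # earlier than the received date: anomaly
    accepted=date(2018, 12, 1),
    first_page=120,
    last_page=112,               # smaller than the start page: anomaly
)
```

Equal dates or equal page numbers are accepted, since the text only forbids a value being "later than" or "smaller than" its counterpart.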
In the disclosure, the matching detection includes the correspondence between the Chinese and English author names, the correspondence between the Chinese and English titles, the correspondence between the Chinese and English keywords, and so on. If any of these is detected not to correspond, a matching detection anomaly is determined and a proofreading prompt is issued. For example, if the Chinese and English author names are detected not to correspond, a matching detection anomaly is determined and a proofreading prompt is issued.
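The disclosure does not say how correspondence is decided. As a deliberately simple stand-in, the sketch below only checks that each bilingual field has the same number of Chinese and English items; the function name, field names, and sample values are hypothetical, and a real check would likely also compare content (for example, names against their romanization).

```python
def find_unpaired_fields(bilingual):
    """bilingual maps a field name to (Chinese items, English items).

    Assumption: a count mismatch is treated as a matching anomaly.
    """
    prompts = []
    for name, (zh_items, en_items) in bilingual.items():
        if len(zh_items) != len(en_items):
            prompts.append(
                f"{name}: {len(zh_items)} Chinese item(s) but {len(en_items)} English item(s)"
            )
    return prompts

record = {
    "authors": (["张三", "李四"], ["Zhang San", "Li Si"]),
    "title": (["XML文档校对"], ["XML document proofreading"]),
    "keywords": (["校对", "XML", "引用"], ["proofreading", "XML"]),
}
prompts = find_unpaired_fields(record)
```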
In the disclosure, after the XML metadata of the XML document is scanned according to the pre-established format specification, a proofreading prompt is issued as long as any one of a null item, a relationship detection anomaly, or a matching detection anomaly is detected.
Optionally, in the disclosure, proofreading the format of the XML document according to the pre-established format specification further includes comparing the online-edition and print-edition sizes of the figures and tables in the XML document; if the online-edition size of a figure or table is inconsistent with its print-edition size, a proofreading prompt is issued.
Further, referring again to Fig. 3, the XML document proofreading method further includes step S30.
Step S30: proofread the word segmentation of the XML document according to the pre-established dictionary.
The pre-established dictionary includes a word segmentation dictionary, a wrong-character/word dictionary, and a sensitive/forbidden-word dictionary. The dictionary is updated according to a preset condition in order to extend or maintain the wrong-character/word dictionary and the sensitive/forbidden-word dictionary. For example, when the preset condition is a preset period, the dictionary is updated once per period.
Further, referring to Fig. 9, the step of proofreading the word segmentation of the XML document according to the pre-established dictionary includes steps S31 and S32.
Step S31: be based on the participle dictionary, wrong word/dictionary and sensitivity/disabling dictionary, to the XML document into Row Scanning Detction.
Step S32: wrong word/word and/or sensitivity/stop word are judged whether there is, and if it exists, then carry out check and correction prompt.
Wherein, be based on the participle dictionary, wrong word/dictionary and sensitivity/disabling dictionary, to the XML document into After row Scanning Detction, wrong word/word or sensitivity/stop word are one of if it exists, then carry out check and correction prompt.
For example, please referring to Figure 10, Figure 10 is a kind of schematic diagram of this method in practical applications;In Figure 10, base In the preparatory participle dictionary, wrong word/dictionary and sensitivity/disabling dictionary, " Provice " that detection obtains arrow meaning goes out Therefore existing misspelling has carried out check and correction prompt thereunder.
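Steps S31 and S32 can be sketched as a dictionary lookup over the tokens of the text. The two dictionaries below are tiny hypothetical stand-ins, and a regular expression over Latin letters is used in place of the word segmentation dictionary; Chinese text would need a real word segmenter, which this sketch does not attempt.

```python
import re

# Hypothetical wrong-word dictionary: misspelling -> suggested correction.
WRONG_WORDS = {"provice": "province"}
# Hypothetical sensitive/forbidden word dictionary.
SENSITIVE_WORDS = {"forbiddenterm"}


def scan_text(text):
    """Return (offset, word, kind, suggestion) tuples for flagged words."""
    prompts = []
    # Assumption: a simple regex tokenizer replaces dictionary-based
    # word segmentation for this English-only illustration.
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in WRONG_WORDS:
            prompts.append((match.start(), match.group(), "wrong word", WRONG_WORDS[word]))
        elif word in SENSITIVE_WORDS:
            prompts.append((match.start(), match.group(), "sensitive word", None))
    return prompts
```

The character offset in each tuple is what would let an editor place the prompt directly below the flagged word, as in the "Provice" case.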
Further, after the XML document has been scanned based on the word segmentation dictionary, the wrong character/word dictionary, and the sensitive/forbidden word dictionary, the method further includes:
Determining whether a grammatical error exists; if so, issuing a proofreading prompt.
That is, the XML document is scanned based on the word segmentation dictionary, the wrong character/word dictionary, and the sensitive/forbidden word dictionary, and it is determined whether a grammatical error exists; if so, a proofreading prompt is issued.
For example, refer to Fig. 11, which is a schematic diagram of the method in a practical application. In Fig. 11, based on the pre-established word segmentation dictionary, wrong character/word dictionary, and sensitive/forbidden word dictionary, the word "completion" indicated by the arrow is detected as a grammatical error, and a proofreading prompt is therefore displayed below it.
Further, after the proofreading prompt is issued, the method further includes the following step.
According to a set condition and based on the proofreading prompt, modify the XML document, or set the proofreading prompt to be ignored for this document or not shown again.
The set condition includes options such as ignore for this document, do not show again, suggested modification, and specified modification. Based on the proofreading prompt, the user can modify the content of the XML document, or set the proofreading prompt to be ignored for this document or not shown again.
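The difference between the two suppression options can be made concrete with a small sketch: one option silences a prompt in a single document, the other silences it everywhere. The class `PromptSettings`, its method names, and the rule-name strings are hypothetical and only illustrate the distinction.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSettings:
    """Hypothetical store for the user's prompt-suppression choices."""
    ignored_in_document: set = field(default_factory=set)  # (doc_id, rule) pairs
    never_show: set = field(default_factory=set)            # rule names

    def ignore_for_document(self, doc_id, rule):
        # "Ignore for this document": suppress the rule in one document only.
        self.ignored_in_document.add((doc_id, rule))

    def do_not_show_again(self, rule):
        # "Do not show again": suppress the rule in every document.
        self.never_show.add(rule)

    def should_show(self, doc_id, rule):
        return (rule not in self.never_show
                and (doc_id, rule) not in self.ignored_in_document)
```

The "suggested modification" and "specified modification" options would instead change the document content and are not modeled here.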
Further, referring again to Fig. 3, the XML document proofreading method further includes step S40.
Step S40: proofread the document information of the XML document according to the pre-established standard database.
The document information includes author affiliation and funding project information.
Further, referring to Fig. 12, the step of proofreading the document information according to the pre-established standard database includes steps S41 to S43.
Step S41: based on the standard database, perform similarity detection on the author affiliation and funding project information of the XML document.
The standard database stores in advance multiple standardized affiliations and funding project entries. These entries come from accumulated proofreading results or are provided by official bodies, and they are stored in the database after confirmation by a proofreader. They serve as the reference standard against which the author affiliation and funding project information of the XML document are checked. Optionally, in the present disclosure, the standard database can be updated according to a preset condition, so that the standardized affiliations and funding project entries in the standard database are extended or maintained.
Step S42: select from the standard database the information with the highest similarity to the author affiliation and funding project information of the XML document.
Because the standard database stores multiple standardized affiliations and funding project entries in advance, when similarity detection is performed on the author affiliation and funding project information of the XML document, the entry with the highest similarity to that information is selected from the standard database.
Step S43: issue a proofreading prompt for the author affiliation and funding project information of the XML document according to the information with the highest similarity.
After the information with the highest similarity to the author affiliation and funding project information of the XML document has been selected from the standard database, it is displayed together with a proofreading prompt. Based on the displayed information and the prompt, the user can choose to replace the author affiliation and funding project information of the XML document with it, to modify them manually, or to leave them unchanged.
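Steps S41 and S42 amount to a nearest-match search over the standard database. The disclosure does not name a similarity measure, so the sketch below assumes the character-level ratio from Python's standard `difflib.SequenceMatcher`; the function name `most_similar` and the sample entries are hypothetical.

```python
from difflib import SequenceMatcher


def most_similar(candidate, standard_entries):
    """Return (best_entry, ratio) from standard_entries, or (None, 0.0)."""
    best, best_ratio = None, 0.0
    for entry in standard_entries:
        # Assumption: similarity is a case-insensitive character-level
        # ratio in [0, 1]; the disclosure does not specify the measure.
        ratio = SequenceMatcher(None, candidate.lower(), entry.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = entry, ratio
    return best, best_ratio
```

For step S43, the returned entry would be shown to the user next to the original text, leaving the choice between replacing, editing manually, or keeping the original.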
Further, referring again to Fig. 3, the XML document proofreading method further includes step S50.
Step S50: proofread the body text of the XML document according to the pre-built proofreading model.
Further, referring to Fig. 13, the proofreading model is built through the following steps.
Step S70: for each XML document, retain the record of edits and modifications to the XML document, associate the record with the XML document, and use it as a sample.
For each XML document, the record of the user's edits and modifications to the body text of the document is retained and associated with the document to serve as a sample.
Step S71: train on multiple samples to obtain the proofreading model, which is used to proofread the body text of the XML document.
For each XML document, the record of the user's edits and modifications to the body text of the document is retained and associated with the document as a sample. Training data is obtained by continuously accumulating such samples, and training on this data yields the proofreading model. Once the proofreading model has been obtained, it can be used to proofread the body text of an XML document.
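The disclosure does not state what kind of model is trained in step S71. Purely as an illustration of learning from accumulated edit records, the sketch below swaps in a very simple technique: a frequency table of past corrections. The names `train` and `suggest` and the word-pair sample format are assumptions, not the disclosed method.

```python
from collections import Counter, defaultdict


def train(samples):
    """Build a correction table from (original, corrected) word pairs.

    Each pair is a hypothetical simplification of one retained
    edit/modification record.
    """
    table = defaultdict(Counter)
    for original, corrected in samples:
        if original != corrected:  # unchanged text carries no correction
            table[original][corrected] += 1
    return table


def suggest(model, word):
    """Return the most frequent past correction for `word`, or None."""
    if word in model:
        return model[word].most_common(1)[0][0]
    return None
```

As more edit records accumulate, the table grows, which mirrors the idea of continuously accumulating samples before retraining.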
The XML document proofreading method provided by the present disclosure uses structured processing and fragmentation processing to establish an association relationship. It proofreads the reference identifiers of the document elements in the XML document according to the established association relationship, proofreads the format of the XML document according to the pre-established format specification, proofreads the word segmentation of the XML document according to the pre-established dictionary, proofreads the document information of the XML document according to the pre-established standard database, and proofreads the body text of the XML document according to the pre-built proofreading model. This achieves intelligent proofreading of the XML document, improves working efficiency, and reduces cost.
Referring again to Fig. 2, the present disclosure provides an XML document proofreading device 10, which includes a proofreading module 11 and a storage module 12.
The storage module 12 stores the pre-established format specification, the pre-established dictionary, the pre-established standard database, and the pre-built proofreading model.
The proofreading module 11 is configured to proofread the reference identifiers of the document elements in the XML document according to the established association relationship. The association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements.
The proofreading module 11 is configured to proofread the format of the XML document according to the pre-established format specification.
The proofreading module 11 is configured to proofread the word segmentation of the XML document according to the pre-established dictionary.
The proofreading module 11 is configured to proofread the document information of the XML document according to the pre-established standard database.
The proofreading module 11 is configured to proofread the body text of the XML document according to the pre-built proofreading model.
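The division of labor between the two modules can be sketched as one object that holds the stored resources and another that runs a sequence of checks against them. All class, field, and function names below are hypothetical; the disclosure describes the modules functionally and does not prescribe this structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StorageModule:
    """Holds the pre-established resources (all fields are placeholders)."""
    format_spec: dict = field(default_factory=dict)
    dictionaries: dict = field(default_factory=dict)
    standard_database: list = field(default_factory=list)
    model: object = None


@dataclass
class ProofreadingModule:
    """Runs each registered check and collects the resulting prompts."""
    storage: StorageModule
    checks: List[Callable[[str, StorageModule], List[str]]] = field(default_factory=list)

    def proofread(self, xml_text: str) -> List[str]:
        prompts = []
        for check in self.checks:
            prompts.extend(check(xml_text, self.storage))
        return prompts
```

Each of the five proofreading functions listed above would be registered as one entry in `checks`, so that new checks can be added without changing the module itself.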
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the XML document proofreading device 10 described above may refer to the corresponding process in the foregoing method and is not repeated here.
In summary, the XML document proofreading method and device provided by the present disclosure use structured processing and fragmentation processing to establish an association relationship. They proofread the reference identifiers of the document elements in the XML document according to the established association relationship, proofread the format of the XML document according to the pre-established format specification, proofread the word segmentation of the XML document according to the pre-established dictionary, proofread the document information of the XML document according to the pre-established standard database, and proofread the body text of the XML document according to the pre-built proofreading model, thereby achieving intelligent proofreading of the XML document. When applied to the proofreading of journal articles, the XML document proofreading method and device provided by the present disclosure can greatly improve working efficiency, save time, and reduce cost.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed device and method may also be implemented in other ways. The device and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present disclosure may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc. It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The above are merely optional embodiments of the present disclosure and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (10)

1. An XML document proofreading method, characterized in that the method comprises:
proofreading the reference identifiers of the document elements in an XML document according to an established association relationship, wherein the association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements;
proofreading the format of the XML document according to a pre-established format specification;
proofreading the word segmentation of the XML document according to a pre-established dictionary;
proofreading the document information of the XML document according to a pre-established standard database; and
proofreading the body text of the XML document according to a pre-built proofreading model.
2. The XML document proofreading method according to claim 1, characterized in that the step of proofreading the reference identifiers of the document elements in the XML document according to the established association relationship comprises:
according to the correspondence, in the association relationship, between each document element in the XML document and the reference identifier of that document element, detecting for each reference identifier in the XML document whether a document element corresponding to that reference identifier exists in the XML document, and if no corresponding document element exists, issuing a proofreading prompt;
according to the citation order, in the association relationship, of the reference identifiers of the document elements in the XML document, detecting for each document element in the XML document whether the citation order of the reference identifier of that document element is correct, and if it is not correct, issuing a proofreading prompt.
3. The XML document proofreading method according to claim 2, characterized in that the document elements include text, figures, tables, formulas, and references; and after detecting, according to the citation order, in the association relationship, of the reference identifiers of the document elements in the XML document, whether the citation order of the reference identifier of each document element in the XML document is correct, the method further comprises:
performing a digital object identifier (DOI) reverse lookup on each reference in the XML document to find the DOI of each reference;
performing duplicate checking on the DOI of each reference to determine whether the DOIs of all references in the XML document contain duplicates, and if a duplicate DOI exists, issuing a proofreading prompt.
4. The XML document proofreading method according to claim 1, characterized in that the format specification includes relationship detection, matching detection, and non-null item detection; and the step of proofreading the format of the XML document according to the pre-established format specification comprises:
scanning the XML metadata of the XML document according to the pre-established format specification;
determining whether at least one of a null-value item, a relationship detection anomaly, and a matching detection anomaly exists, and if so, issuing a proofreading prompt.
5. The XML document proofreading method according to claim 1, characterized in that the dictionary includes a word segmentation dictionary, a wrong character/word dictionary, and a sensitive/forbidden word dictionary; and the step of proofreading the word segmentation of the XML document according to the pre-established dictionary comprises:
scanning the XML document based on the word segmentation dictionary, the wrong character/word dictionary, and the sensitive/forbidden word dictionary;
determining whether a wrong character/word and/or a sensitive/forbidden word exists, and if so, issuing a proofreading prompt.
6. The XML document proofreading method according to claim 5, characterized in that after scanning the XML document based on the word segmentation dictionary, the wrong character/word dictionary, and the sensitive/forbidden word dictionary, the method further comprises:
determining whether a grammatical error exists, and if so, issuing a proofreading prompt.
7. The XML document proofreading method according to any one of claims 2 to 6, characterized in that after the proofreading prompt is issued, the method further comprises:
according to a set condition and based on the proofreading prompt, modifying the XML document, or setting the proofreading prompt to be ignored for this document or not shown again.
8. The XML document proofreading method according to claim 1, characterized in that the document information includes author affiliation and funding project information;
the step of proofreading the document information according to the pre-established standard database comprises:
based on the standard database, performing similarity detection on the author affiliation and funding project information of the XML document, wherein the standard database stores in advance multiple standardized affiliations and funding project entries;
selecting from the standard database the information with the highest similarity to the author affiliation and funding project information of the XML document;
issuing a proofreading prompt for the author affiliation and funding project information of the XML document according to the information with the highest similarity.
9. The XML document proofreading method according to claim 1, characterized in that the proofreading model is built through the following steps:
for each XML document, retaining the record of edits and modifications to the XML document, associating the record with the XML document, and using it as a sample;
training on multiple samples to obtain the proofreading model, which is used to proofread the body text of the XML document.
10. An XML document proofreading device, characterized by comprising a proofreading module and a storage module;
the storage module stores a pre-established format specification, a pre-established dictionary, a pre-established standard database, and a pre-built proofreading model;
the proofreading module is configured to proofread the reference identifiers of the document elements in an XML document according to an established association relationship, wherein the association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements;
the proofreading module is configured to proofread the format of the XML document according to the pre-established format specification;
the proofreading module is configured to proofread the word segmentation of the XML document according to the pre-established dictionary;
the proofreading module is configured to proofread the document information of the XML document according to the pre-established standard database;
the proofreading module is configured to proofread the body text of the XML document according to the pre-built proofreading model.
CN201910013644.7A 2019-01-07 2019-01-07 XML document proofreading method and device Pending CN109670092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910013644.7A CN109670092A (en) 2019-01-07 2019-01-07 XML document proofreading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910013644.7A CN109670092A (en) 2019-01-07 2019-01-07 XML document proofreading method and device

Publications (1)

Publication Number Publication Date
CN109670092A true CN109670092A (en) 2019-04-23

Family

ID=66150213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910013644.7A Pending CN109670092A (en) 2019-01-07 2019-01-07 XML document proofreading method and device

Country Status (1)

Country Link
CN (1) CN109670092A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334333A (en) * 2019-06-18 2019-10-15 中国平安财产保险股份有限公司 A kind of information amending method and relevant apparatus
CN110990593A (en) * 2019-12-17 2020-04-10 北大方正集团有限公司 Method and device for detecting reference falling space
CN113986968A (en) * 2021-10-22 2022-01-28 广西电网有限责任公司 Scheme intelligent proofreading method based on electric power standard standardization datamation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101952802A (en) * 2007-06-21 2011-01-19 汤姆森路透社全球资源公司 Method and system for author and publisher's checking list of references
CN102799569A (en) * 2011-05-27 2012-11-28 汉王科技股份有限公司 Method and device for checking electronic publication (EPUB) document
CN105095184A (en) * 2015-06-11 2015-11-25 周连惠 Method for spelling and grammar proofreading of text document
CN106326193A (en) * 2015-06-18 2017-01-11 北京大学 Footnote identification method and footnote and footnote citation association method in fixed-layout document
CN106970749A (en) * 2017-02-06 2017-07-21 广东小天才科技有限公司 A kind of writing method and device based on mobile terminal
CN107463666A (en) * 2017-08-02 2017-12-12 成都德尔塔信息科技有限公司 A kind of filtering sensitive words method based on content of text
CN108052490A (en) * 2017-12-29 2018-05-18 北京仁和汇智信息技术有限公司 A kind of online methodology of composition of XML papers and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
侯修洲等: "基于VBA的Word文档XML结构化标记方法", 《编辑学报》 *
侯修洲等: "基于逻辑原则的科技论文自动校对方法", 《中国科技期刊研究》 *
卓利艳: "字词级中文文本自动校对的方法研究", 《万方数据库》 *
张涛: "中文文本自动校对系统设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334333A (en) * 2019-06-18 2019-10-15 中国平安财产保险股份有限公司 A kind of information amending method and relevant apparatus
CN110334333B (en) * 2019-06-18 2023-08-25 中国平安财产保险股份有限公司 Information modification method and related device
CN110990593A (en) * 2019-12-17 2020-04-10 北大方正集团有限公司 Method and device for detecting reference falling space
CN110990593B (en) * 2019-12-17 2023-09-19 新方正控股发展有限责任公司 Citation falling empty detection method and device
CN113986968A (en) * 2021-10-22 2022-01-28 广西电网有限责任公司 Scheme intelligent proofreading method based on electric power standard standardization datamation
CN113986968B (en) * 2021-10-22 2022-09-16 广西电网有限责任公司 Scheme intelligent proofreading method based on electric power standard standardization datamation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190423