CN109670092A - XML document proofreading method and device - Google Patents
Publication number: CN109670092A
Application number: CN201910013644.7A
Authority: CN (China)
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present disclosure provides an XML document proofreading method and device, and relates to the technical field of document proofreading. In the method and device, the citation markers (reference identifiers) of the document elements in an XML document are proofread according to an established association; the format of the XML document is proofread according to a pre-established format specification; the word segmentation of the XML document is proofread according to a pre-established dictionary; the document information of the XML document is proofread according to a pre-established standard database; and the body text of the XML document is proofread according to a pre-built proofreading model. Intelligent proofreading of the XML document is thereby achieved and work efficiency is improved.
Description
Technical field
The present disclosure relates to the technical field of document proofreading, and in particular to an XML document proofreading method and device.
Background
In the publication of many articles, the traditional "three reviews and three proofreadings" system is still followed. In the vast majority of cases, however, the paper manuscript is printed out, an editor or proofreader reads the article from beginning to end, and corrections are annotated with the proofreader's marks defined by the national standard. This consumes a great deal of time and effort and is inefficient.
Summary of the invention
In view of this, the present disclosure provides an XML document proofreading method and device.
The XML document proofreading method provided by the present disclosure comprises the following steps.
The citation markers of the document elements in an XML document are proofread according to an established association. The association includes the correspondence between each document element in the XML document and the citation marker of that document element, and the citation order of the citation markers of the document elements.
The format of the XML document is proofread according to a pre-established format specification.
The word segmentation of the XML document is proofread according to a pre-established dictionary.
The document information of the XML document is proofread according to a pre-established standard database.
The body text of the XML document is proofread according to a pre-built proofreading model.
Further, the step of proofreading the citation markers of the document elements in the XML document according to the established association includes:
According to the correspondence in the association between each document element in the XML document and its citation marker, detecting, for each citation marker in the XML document, whether the XML document contains the document element corresponding to that citation marker; if no corresponding document element exists, a proofreading prompt is given.
According to the citation order in the association of the citation markers of the document elements in the XML document, detecting, for each document element in the XML document, whether the citation order of its citation marker is correct; if it is not correct, a proofreading prompt is given.
Further, the document elements include text, figures, tables, formulas, and references. After detecting, according to the citation order in the association, whether the citation order of the citation marker of each document element in the XML document is correct, the method further includes:
Performing a digital object unique identifier (DOI) reverse-resolution query for each reference in the XML document to find the DOI of each reference.
Checking the DOIs of the references for duplicates, that is, determining whether any DOI is repeated among all the references in the XML document; if a duplicate DOI exists, a proofreading prompt is given.
Further, the format specification includes relation detection, matching detection, and non-null item detection. The step of proofreading the format of the XML document according to the pre-established format specification includes:
Scanning the XML metadata of the XML document according to the pre-established format specification.
Determining whether at least one of a null item, a relation detection anomaly, and a matching detection anomaly exists; if so, a proofreading prompt is given.
Further, the dictionary includes a word segmentation dictionary, a wrong-character/word dictionary, and a sensitive/banned-word dictionary. The step of proofreading the word segmentation of the XML document according to the pre-established dictionary includes:
Scanning the XML document based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/banned-word dictionary.
Determining whether a wrong character or word and/or a sensitive or banned word exists; if so, a proofreading prompt is given.
Further, after the XML document is scanned based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/banned-word dictionary, the method further includes:
Determining whether a grammatical error exists; if so, a proofreading prompt is given.
Further, after a proofreading prompt is given, the method further includes:
According to a set condition and based on the proofreading prompt, modifying the XML document, or setting the proofreading prompt to be ignored in this document or not shown again.
Further, the document information includes author affiliation and funding project information.
The step of proofreading the document information according to the pre-established standard database includes:
Performing similarity detection on the author affiliation and funding project information of the XML document based on the standard database, in which multiple standardized affiliations and funding project information entries are stored in advance.
Selecting from the standard database the entries with the highest similarity to the author affiliation and funding project information of the XML document.
Giving a proofreading prompt for the author affiliation and funding project information of the XML document according to the entries with the highest similarity.
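The similarity step above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not name a similarity measure, so the sketch substitutes the sequence-matching ratio from Python's standard `difflib` module, and the entries in `STANDARD_AFFILIATIONS`, the function name `suggest_standard`, and the `cutoff` value are all hypothetical.

```python
import difflib

# Hypothetical stand-in for the pre-established standard database.
STANDARD_AFFILIATIONS = [
    "Department of Physics, Example University",
    "Institute of Chemistry, Example Academy of Sciences",
]


def suggest_standard(value, standard_entries, cutoff=0.6):
    """Return the most similar standard entry, or None if no prompt is needed.

    None is returned when the value already equals a standard entry or when
    no entry reaches the (assumed) similarity cutoff.
    """
    matches = difflib.get_close_matches(value, standard_entries, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return None if best == value else best


suggestion = suggest_standard(
    "Departmant of Physics, Example Universty", STANDARD_AFFILIATIONS
)
if suggestion is not None:
    print("Proofreading prompt: did you mean", repr(suggestion))
```

A production system would likely need a measure suited to Chinese institution names and a tuned threshold; the ratio used here is only a convenient placeholder.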
Further, the proofreading model is built through the following steps:
For each XML document, the record of edits and modifications made to the XML document is retained, and the record is associated with the XML document and used as a sample.
Multiple samples are used for training to obtain the proofreading model, which is then used to proofread the body text of the XML document.
The present disclosure also provides an XML document proofreading device, including a proofreading module and a storage module.
The storage module stores the pre-established format specification, the pre-established dictionary, the pre-established standard database, and the pre-built proofreading model.
The proofreading module is configured to proofread the citation markers of the document elements in an XML document according to an established association. The association includes the correspondence between each document element in the XML document and the citation marker of that document element, and the citation order of the citation markers of the document elements.
The proofreading module is configured to proofread the format of the XML document according to the pre-established format specification.
The proofreading module is configured to proofread the word segmentation of the XML document according to the pre-established dictionary.
The proofreading module is configured to proofread the document information of the XML document according to the pre-established standard database.
The proofreading module is configured to proofread the body text of the XML document according to the pre-built proofreading model.
With the XML document proofreading method and device provided by the present disclosure, the citation markers of the document elements in an XML document are proofread according to an established association, the format of the XML document is proofread according to a pre-established format specification, the word segmentation is proofread according to a pre-established dictionary, the document information is proofread according to a pre-established standard database, and the body text is proofread according to a pre-built proofreading model. Intelligent proofreading of the XML document is thereby achieved. Compared with manual proofreading, this saves time, improves work efficiency, and reduces labor costs.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the electronic device provided by the present disclosure.
Fig. 2 is a block diagram of an XML document proofreading device provided by the present disclosure.
Fig. 3 is a flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 4 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 5 is a schematic diagram of an application of the XML document proofreading method provided by the present disclosure.
Fig. 6 is a schematic diagram of another application of the XML document proofreading method provided by the present disclosure.
Fig. 7 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 8 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 9 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 10 is a schematic diagram of another application of the XML document proofreading method provided by the present disclosure.
Fig. 11 is a schematic diagram of another application of the XML document proofreading method provided by the present disclosure.
Fig. 12 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Fig. 13 is another flow diagram of the XML document proofreading method provided by the present disclosure.
Reference numerals: 100 - electronic device; 10 - XML document proofreading device; 11 - proofreading module; 12 - storage module; 20 - memory; 30 - processor; 40 - communication unit.
Detailed description of embodiments
The technical solutions of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present disclosure. The components of the present disclosure that are generally described and shown in the drawings can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present disclosure without creative work fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the publication of many articles, the traditional "three reviews and three proofreadings" system is still followed. In the vast majority of cases, the paper manuscript is printed out, an editor or proofreader reads the article from beginning to end, and corrections are annotated with the proofreader's marks defined by the national standard.
Many journals can currently use the "Dark Horse" (Heima) proofreading system to check the text content of the PDF files of journal articles, but this covers only wrong characters and words. Editors and proofreaders still have to modify and correct the PDF file directly, relying on the proofreading knowledge in their own heads. Labor costs are high, time and energy are consumed, and omissions occur easily, so the proofreading is not comprehensive.
Based on the above research, the present disclosure provides an XML document proofreading method and device to address these problems.
Referring to Fig. 1, the XML document proofreading method provided by the present disclosure is applied to the electronic device 100 shown in Fig. 1 and is executed by the electronic device 100. In the present disclosure, the electronic device 100 may be, but is not limited to, a personal computer (PC), a laptop, a personal digital assistant (PDA), a server, or another electronic device with processing capability.
The electronic device 100 includes the XML document proofreading device 10 shown in Fig. 2, a memory 20, a processor 30, and a communication unit 40. The memory 20, the processor 30, and the communication unit 40 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines. The XML document proofreading device 10 includes at least one software function module that can be stored in the memory 20 in the form of software or firmware. The processor 30 runs the software programs and modules stored in the memory 20, thereby executing various functional applications and data processing.
The memory 20 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 30 may be an integrated circuit chip with signal processing capability. The processor 30 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like.
The communication unit 40 is used to establish a communication connection between the electronic device 100 and other external devices over a network, and to transmit data over that network.
Referring to Fig. 3, Fig. 3 is a flow diagram of the XML document proofreading method provided by the present disclosure. The specific flow of the method shown in Fig. 3 is described in detail below.
Step S10: the citation markers of the document elements in the XML document are proofread according to an established association.
The association includes the correspondence between each document element in the XML document and the citation marker of that document element, and the citation order of the citation markers of the document elements.
In the present disclosure, an unstructured document (such as a Word document) is first fragmented: it is divided according to its document elements into document data blocks, with each document element corresponding to one document data block. The divided document is then structured: each document data block is converted into Extensible Markup Language (XML) data, which yields the XML document. At the same time, for each document data block, an association between the block and its citation marker is established according to the citation order of that marker. This yields the association, namely the correspondence between each document element in the XML document and its citation marker together with the citation order of the markers, so that a document element can be located precisely from its citation marker.
Further, referring to Fig. 4, the document elements include text, figures, tables, formulas, and references. The step of proofreading the citation markers of the document elements in the XML document according to the established association includes step S11 and step S12.
Step S11: according to the correspondence in the association between each document element in the XML document and its citation marker, it is detected, for each citation marker in the XML document, whether the XML document contains the document element corresponding to that citation marker.
Specifically, each citation marker in the XML document is scanned according to this correspondence to detect whether its corresponding document element exists in the XML document. If no corresponding document element exists, a proofreading prompt is given.
For example, referring to Fig. 5, which is a schematic diagram of the method in a practical application, the citation markers in Fig. 5 are "[16]", "Table 1", and "Fig. 1", as indicated by the arrows. In the present disclosure, the document element corresponding to the marker [16] is a reference, the document element corresponding to the marker "Table 1" is a table, and the document element corresponding to the marker "Fig. 1" is a figure. The marker [16] is checked by querying whether the XML document contains the reference corresponding to [16]. If it does, detection moves on to the next citation marker; if it does not, a proofreading prompt is given, indicating that the XML document contains no reference corresponding to the marker [16]. The marker "Fig. 1" is checked in the same way: if the XML document contains no figure corresponding to it, a proofreading prompt is given to that effect; if it does, detection moves on to the next citation marker, until all citation markers in the XML document have been checked.
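The existence check of step S11 can be sketched as follows. The disclosure does not specify the XML schema, so this sketch assumes a hypothetical layout in which every in-text marker is an `<xref rid="...">` element and every document element carries an `id` attribute; the element names, attribute names, and the function `find_dangling_references` are illustrative only.

```python
import xml.etree.ElementTree as ET


def find_dangling_references(xml_text):
    """Return the rid of every <xref> marker whose target element is missing."""
    root = ET.fromstring(xml_text)
    # Collect the ids of all document elements (figures, tables, references...).
    known_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    dangling = []
    for xref in root.iter("xref"):
        rid = xref.get("rid")
        if rid not in known_ids:
            dangling.append(rid)  # this marker would trigger a proofreading prompt
    return dangling


# Hypothetical document: [16] and Table 1 resolve, Fig. 1 has no target.
SAMPLE = """<article>
  <body>
    <p>See <xref rid="ref16">[16]</xref>, <xref rid="tab1">Table 1</xref>
       and <xref rid="fig1">Fig. 1</xref>.</p>
    <table id="tab1"/>
  </body>
  <back><ref id="ref16"/></back>
</article>"""

for rid in find_dangling_references(SAMPLE):
    print("Proofreading prompt: no document element found for marker", rid)
```

Collecting the ids into a set first keeps the check to a single pass over the markers, which matters little for one article but is convenient when many documents are scanned.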
Step S12: according to the citation order in the association of the citation markers of the document elements in the XML document, it is detected, for each document element in the XML document, whether the citation order of its citation marker is correct. If it is not correct, a proofreading prompt is given.
Specifically, each citation marker in the XML document is scanned according to this citation order to detect whether the citation order of the marker of each document element is correct. If it is not correct, a proofreading prompt is given.
For example, referring to Fig. 6, which is a schematic diagram of the method in a practical application, the citation markers in Fig. 6 are [5], [6-8], and [9,10], as indicated by the arrows. The marker [6-8] comprises the markers [6], [7], and [8]. When the marker [6-8] is checked, if the marker [6] is cited for the first time at the position of [6-8] indicated by the arrow in Fig. 6, while the marker [7] has already been cited before [6] is cited, the citation order of the marker [7] is incorrect and a proofreading prompt is given. If the order is correct, detection moves on to the citation order of the next marker, until the citation order of all citation markers in the XML document has been checked. As another example, if the citation markers are Fig. 1, Fig. 2, and Fig. 3, and Fig. 2 is cited before Fig. 1 is cited for the first time, the citation order of Fig. 2 is incorrect and a proofreading prompt is given. If the order is correct, detection moves on to the citation order of the next marker, until the citation order of all citation markers in the XML document has been checked.
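The order check of step S12 can be sketched for numeric reference markers as follows. This is an assumption-laden illustration: it works on plain text rather than on the association built from the XML document, it treats a marker as out of order when it is first cited while a smaller marker that appears somewhere in the text has not yet been cited, and the function names `expand_marker` and `out_of_order_citations` are hypothetical.

```python
import re


def expand_marker(marker):
    """Expand '[6-8]' into [6, 7, 8] and '[9,10]' into [9, 10]."""
    numbers = []
    for part in marker.strip("[]").split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(part))
    return numbers


def out_of_order_citations(text):
    """Return reference numbers first cited before a smaller, later-cited number."""
    markers = re.findall(r"\[\d+(?:\s*[-,]\s*\d+)*\]", text)
    sequence = [n for m in markers for n in expand_marker(m)]
    all_cited = set(sequence)
    seen = set()
    flagged = []
    for n in sequence:
        if n in seen:
            continue  # only the first citation of each number matters
        if any(m < n and m not in seen for m in all_cited):
            flagged.append(n)
        seen.add(n)
    return flagged


# Mirrors the example above: [7] is cited before [6] is first cited.
print(out_of_order_citations("A [5]. B [7]. C [6-8]. D [9,10]."))
```

The same idea carries over to figure and table markers by replacing the regular expression with one that extracts the number from labels such as "Fig. 2".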
Further, referring to Fig. 7, the document elements include text, figures, tables, formulas, and references. After detecting, according to the citation order in the association, whether the citation order of the citation marker of each document element in the XML document is correct, the method further includes steps S13 and S14.
Step S13: a digital object unique identifier (DOI) reverse-resolution query is performed for each reference in the XML document to find the DOI of each reference.
Each reference has exactly one digital object unique identifier (DOI), and the reference corresponding to a DOI can be looked up from that DOI. After the citation order of the citation marker of each document element in the XML document has been checked, a DOI reverse-resolution query is performed for each reference in the XML document to find its DOI.
For each reference in the XML document, the DOI reverse-resolution query is performed according to information such as the title, authors, and publication date of the reference. For example, a reverse-resolution query for reference [X] finds the DOI xxxxxxxx. Once the DOI of reference [X] has been found, a user who wants to view reference [X] only needs to click its DOI to jump to the corresponding website and view the reference.
Step S14: the DOIs of the references are checked for duplicates, that is, it is determined whether any DOI is repeated among all the references in the XML document. If a duplicate DOI exists, a proofreading prompt is given.
Specifically, after the DOI reverse-resolution query has been performed for each reference in the XML document and the DOI of each reference has been found, the DOIs are checked for duplicates. If the DOIs of the references in the XML document contain a duplicate, a proofreading prompt is given.
For example, if the DOIs found for reference [X] and reference [Y] are identical, the references are judged to have been cited in error and a proofreading prompt is given.
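The duplicate check of step S14 can be sketched as follows. The sketch assumes the reverse-resolution query of step S13 has already produced a mapping from reference label to DOI (the lookup itself, which needs a network service, is not shown), and the function name `find_duplicate_dois` and the sample DOIs are hypothetical. DOI names are case-insensitive, so the sketch compares them in lower case.

```python
from collections import defaultdict


def find_duplicate_dois(ref_dois):
    """Group reference labels by DOI and keep only DOIs shared by several labels."""
    groups = defaultdict(list)
    for label, doi in ref_dois.items():
        if doi:  # skip references for which no DOI was found
            groups[doi.strip().lower()].append(label)
    return {doi: labels for doi, labels in groups.items() if len(labels) > 1}


# Hypothetical lookup results: [1] and [3] resolve to the same DOI.
resolved = {
    "[1]": "10.1000/abc",
    "[2]": "10.1000/xyz",
    "[3]": "10.1000/ABC",
    "[4]": None,
}
for doi, labels in find_duplicate_dois(resolved).items():
    print("Proofreading prompt:", ", ".join(labels), "share DOI", doi)
```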
Further, referring again to Fig. 3, the XML document proofreading method further includes step S20.
Step S20: the format of the XML document is proofread according to a pre-established format specification.
Optionally, in the present disclosure, the XML document is a journal document to be proofread, and the format specification is established in advance according to the journal publishing standard. The format specification includes relation detection, matching detection, and non-null item detection.
Further, referring to Fig. 8, the step of proofreading the format of the XML document according to the pre-established format specification includes steps S21 and S22.
Step S21: the XML metadata of the XML document is scanned according to the pre-established format specification.
After the format specification has been established according to the journal publishing standard, the XML metadata of the XML document is scanned against it. In the present disclosure, the XML document is a journal document to be proofread, so it contains document data such as the abstract, keywords, acceptance date, received date, and body text. Within the XML document, the XML metadata is the data other than the body text, for example the abstract, keywords, acceptance date, and received date.
Step S22: it is determined whether at least one of a null item, a relation detection anomaly, and a matching detection anomaly exists. If at least one of them exists, a proofreading prompt is given.
In the present disclosure, non-null item detection means that the XML metadata must not be empty. For XML metadata in the XML document such as the abstract, keywords, acceptance date, received date, and revision date, if any item is detected to be empty, a null item is judged to exist and a proofreading prompt is given. For example, if the abstract of the XML document is detected to be empty, that is, there is no abstract, a null item is judged to exist and a proofreading prompt is given. Likewise, if the received date of the XML document is detected to be empty, that is, there is no received date, a null item is judged to exist and a proofreading prompt is given.
A journal article has document data such as an acceptance date, a received date, a revision date, a start page, and an end page. The relation detection therefore includes the relations acceptance date > revision date > received date and end page > start page in the XML document. That is, the received date must not be later than the revision date, the revision date must not be later than the acceptance date, and the end page must not be smaller than the start page. If the received date in the XML document is later than the revision date, and/or the revision date is later than the acceptance date, and/or the end page is smaller than the start page, a relation detection anomaly is judged to exist and a proofreading prompt is given.
In the present disclosure, the matching detection includes checking that the Chinese and English author names correspond, that the Chinese and English titles correspond, that the multiple keywords correspond, and so on. If any of these is detected not to correspond, a matching detection anomaly is judged to exist and a proofreading prompt is given. For example, if the Chinese and English author names are detected not to correspond, a matching detection anomaly is judged to exist and a proofreading prompt is given.
In the present disclosure, after the XML metadata of the XML document has been scanned according to the pre-established format specification, a proofreading prompt is given as long as any one of a null item, a relation detection anomaly, or a matching detection anomaly is detected.
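The three format checks can be sketched together as follows. The sketch assumes the XML metadata has already been extracted into a plain dictionary; the field names, the function `check_metadata`, and the sample values are hypothetical, and the matching check is reduced to comparing the number of Chinese and English entries, which is only a rough stand-in for the correspondence check described above.

```python
from datetime import date

# Hypothetical field names for metadata that must not be empty.
REQUIRED_FIELDS = ("abstract", "keywords_zh", "received", "revised", "accepted")


def check_metadata(meta):
    """Return a list of proofreading prompts for one article's metadata."""
    prompts = []
    # Non-null item detection.
    for field in REQUIRED_FIELDS:
        if not meta.get(field):
            prompts.append(f"empty field: {field}")
    # Relation detection: received <= revised <= accepted, first page <= last page.
    received, revised, accepted = (
        meta.get(k) for k in ("received", "revised", "accepted")
    )
    if received and revised and received > revised:
        prompts.append("received date is later than revised date")
    if revised and accepted and revised > accepted:
        prompts.append("revised date is later than accepted date")
    first, last = meta.get("first_page"), meta.get("last_page")
    if first is not None and last is not None and last < first:
        prompts.append("last page is smaller than first page")
    # Matching detection (simplified): same number of Chinese and English entries.
    for zh_key, en_key in (("authors_zh", "authors_en"), ("keywords_zh", "keywords_en")):
        if len(meta.get(zh_key, [])) != len(meta.get(en_key, [])):
            prompts.append(f"{zh_key} and {en_key} do not match")
    return prompts


sample = {
    "abstract": "",
    "keywords_zh": ["jiaodui"],
    "keywords_en": ["proofreading", "XML"],
    "authors_zh": ["Zhang San"],
    "authors_en": ["San Zhang"],
    "received": date(2018, 5, 1),
    "revised": date(2018, 4, 1),
    "accepted": date(2018, 6, 1),
    "first_page": 10,
    "last_page": 18,
}
for prompt in check_metadata(sample):
    print("Proofreading prompt:", prompt)
```

Returning every prompt rather than stopping at the first one lets an editor see all format problems of an article in a single pass.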
Optionally, in the present disclosure, proofreading the format of the XML document according to the pre-established format specification further includes comparing the online-version and print-version sizes of the figures and tables in the XML document. If the online-version size of a figure or table is inconsistent with its print-version size, a proofreading prompt is given.
Further, referring again to Fig. 3, the XML document proofreading method further includes step S30.
Step S30: the word segmentation of the XML document is proofread according to a pre-established dictionary.
The pre-established dictionary includes a word segmentation dictionary, a wrong-character/word dictionary, and a sensitive/banned-word dictionary. The pre-established dictionary is updated according to a preset condition, so as to extend or maintain the wrong-character/word dictionary and the sensitive/banned-word dictionary. For example, when the preset condition is a preset period, the pre-established dictionary is updated at that period to extend or maintain the wrong-character/word dictionary and the sensitive/banned-word dictionary.
Further, referring to Fig. 9, the step of proofreading the word segmentation of the XML document according to the pre-established dictionary includes steps S31 and S32.
Step S31: the XML document is scanned based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/banned-word dictionary.
Step S32: it is determined whether a wrong character or word and/or a sensitive or banned word exists. If so, a proofreading prompt is given.
That is, after the XML document has been scanned based on the word segmentation dictionary, the wrong-character/word dictionary, and the sensitive/banned-word dictionary, a proofreading prompt is given if either a wrong character or word or a sensitive or banned word is found.
For example, please referring to Figure 10, Figure 10 is a kind of schematic diagram of this method in practical applications;In Figure 10, base
In the preparatory participle dictionary, wrong word/dictionary and sensitivity/disabling dictionary, " Provice " that detection obtains arrow meaning goes out
Therefore existing misspelling has carried out check and correction prompt thereunder.
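Steps S31 and S32 can be illustrated with the following sketch. It assumes English text, uses a simple regular-expression tokenizer in place of the word-segmentation lexicon, and uses small hypothetical lexicons; the names `WRONG_WORDS`, `SENSITIVE_WORDS` and `scan_text` are not from the disclosure.

```python
import re

# Hypothetical lexicons; real ones would be maintained as described above.
WRONG_WORDS = {"Provice": "Province", "recieve": "receive"}
SENSITIVE_WORDS = {"forbiddenword"}

def scan_text(text, wrong_words=WRONG_WORDS, sensitive_words=SENSITIVE_WORDS):
    """Return a list of (position, word, kind, suggestion) proofreading prompts."""
    prompts = []
    # A regex tokenizer stands in for segmentation based on the word-segmentation lexicon.
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        if word in wrong_words:
            prompts.append((match.start(), word, "wrong-word", wrong_words[word]))
        elif word.lower() in sensitive_words:
            prompts.append((match.start(), word, "sensitive", None))
    return prompts

prompts = scan_text("Hubei Provice, China")
```

Each returned tuple carries the character offset of the flagged word, so a user interface could place the prompt directly below it, as in Fig. 10.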
Further, after the XML document has been scanned based on the word-segmentation lexicon, the wrong-character/word lexicon, and the sensitive/forbidden-word lexicon, the method further includes:
Determining whether a grammatical error exists; if so, issuing a proofreading prompt.
That is, the XML document is scanned based on the word-segmentation lexicon, the wrong-character/word lexicon, and the sensitive/forbidden-word lexicon to determine whether a grammatical error exists, and a proofreading prompt is issued if one is found.
For example, refer to Fig. 11, which is a schematic diagram of the method in practical use. In Fig. 11, based on the pre-established word-segmentation lexicon, wrong-character/word lexicon, and sensitive/forbidden-word lexicon, the word "completion" indicated by the arrow is detected as a grammatical error, and a proofreading prompt is therefore displayed below it.
Further, after the proofreading prompt is issued, the method further includes the following step.
According to a set condition and based on the proofreading prompt, modify the XML document, or set the proofreading prompt to be ignored in this document or not to be shown again.
The set condition includes options such as ignore in this document, do not prompt again, suggested modification, and specified modification. Based on the proofreading prompt, the user can modify the content of the XML document, or set the proofreading prompt to be ignored in this document or not to be shown again.
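The four options can be sketched as a small dispatcher. The action names `ignore_document`, `never_prompt`, `suggested` and `custom`, the `PromptState` container, and the use of plain string replacement are assumptions made for this example only.

```python
from dataclasses import dataclass, field

@dataclass
class PromptState:
    ignored_in_document: set = field(default_factory=set)  # "ignore in this document"
    never_prompt: set = field(default_factory=set)         # "do not prompt again"

def handle_prompt(text, word, action, state, replacement=None):
    """Apply one user choice to a prompt about `word` and return the resulting text."""
    if action == "ignore_document":
        state.ignored_in_document.add(word)
    elif action == "never_prompt":
        state.never_prompt.add(word)
    elif action in ("suggested", "custom"):
        # For both modification options the caller supplies the text to substitute.
        if replacement is None:
            raise ValueError("a replacement is required for a modification")
        # str.replace substitutes every occurrence of `word` in the text.
        text = text.replace(word, replacement)
    else:
        raise ValueError(f"unknown action: {action}")
    return text

def should_prompt(word, state):
    """A word is prompted only if the user has not suppressed it."""
    return word not in state.ignored_in_document and word not in state.never_prompt

state = PromptState()
fixed = handle_prompt("Hubei Provice", "Provice", "suggested", state, "Province")
handle_prompt("see Smith et al", "al", "ignore_document", state)
```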
Further, referring again to Fig. 3, the XML document proofreading method further includes step S40.
Step S40: proofread the document information of the XML document according to the pre-established standard database.
The document information includes the author affiliation and the funding project information.
Further, referring to Fig. 12, the step of proofreading the document information according to the pre-established standard database includes steps S41 to S43.
Step S41: based on the standard database, perform similarity detection on the author affiliation and funding project information of the XML document.
The standard database stores in advance a plurality of standardized affiliations and funding project information entries. These entries are obtained from accumulated proofreading results or are provided by official organizations, and are entered into the database after confirmation by a proofreader. They serve as the reference standard against which the author affiliation and funding project information of the XML document are checked. Optionally, in the present disclosure, the standard database may be updated according to a preset condition, so as to add to or maintain the standardized affiliations and funding project information in the standard database.
Step S42: select, from the standard database, the information with the highest similarity to the author affiliation and funding project information of the XML document.
Because the standard database stores a plurality of standardized affiliations and funding project information entries in advance, when similarity detection is performed on the author affiliation and funding project information of the XML document, the entry with the highest similarity to them can be selected from the standard database.
Step S43: issue a proofreading prompt for the author affiliation and funding project information of the XML document according to the information with the highest similarity.
After the information with the highest similarity to the author affiliation and funding project information of the XML document has been selected from the standard database, that information is displayed and a proofreading prompt is issued. Based on the displayed information and the proofreading prompt, the user can choose to overwrite the author affiliation and funding project information of the XML document with the displayed information, to modify them manually, or to leave them unchanged.
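The disclosure does not name a particular similarity measure. As one possible sketch of steps S41 to S43, the following uses the ratio computed by Python's `difflib.SequenceMatcher`; the database contents and the function `most_similar` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical contents of the standard database.
STANDARD_AFFILIATIONS = [
    "School of Computer Science, Wuhan University",
    "Department of Physics, Tsinghua University",
]

def most_similar(value, standard_entries):
    """Return (best_entry, ratio) for the standard entry most similar to `value`."""
    best_entry, best_ratio = None, 0.0
    for entry in standard_entries:
        # Case is ignored so that capitalization differences do not lower the score.
        ratio = SequenceMatcher(None, value.lower(), entry.lower()).ratio()
        if ratio > best_ratio:
            best_entry, best_ratio = entry, ratio
    return best_entry, best_ratio

candidate, score = most_similar("School of Computer Sience, Wuhan Univ.", STANDARD_AFFILIATIONS)
# The candidate is displayed with the prompt; the user may accept it,
# edit the text manually, or leave the text unchanged.
```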
Further, referring again to Fig. 3, the XML document proofreading method further includes step S50.
Step S50: proofread the body text of the XML document according to the pre-built proofreading model.
Further, referring to Fig. 13, the proofreading model is built by the following steps.
Step S70: for each XML document, retain the record of edits and modifications made to the XML document, associate the record with the XML document, and use it as a sample.
That is, for each XML document, the record of the user's edits and modifications to the body text of the document is retained and associated with the document to form a sample.
Step S71: train on a plurality of samples to obtain the proofreading model, which is used to proofread the body text of the XML document.
As described for step S70, each sample consists of the record of a user's edits and modifications to the body text of an XML document, associated with that document. Training data is obtained by continuously accumulating such samples, and the proofreading model is obtained by training on this data. Once the proofreading model has been obtained, it can be used to proofread the body text of XML documents.
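The disclosure does not specify the type of model or the training procedure. Purely as a stand-in, the sketch below treats each sample as a pair of original and revised body text, learns the most frequent one-word replacement for each edited word, and uses that table to produce suggestions. The functions `train` and `proofread` and the sample sentences are assumptions for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

def train(samples):
    """Build a replacement table from (original, revised) body-text pairs."""
    counts = defaultdict(Counter)
    for original, revised in samples:
        a, b = original.split(), revised.split()
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
            # Only one-to-one word replacements are learned in this sketch.
            if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                counts[a[i1]][b[j1]] += 1
    # Keep the most frequent correction for each edited word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def proofread(text, model):
    """Return (word, suggestion) prompts for words the model has seen corrected."""
    return [(w, model[w]) for w in text.split() if w in model]

samples = [
    ("the datas are shown", "the data are shown"),
    ("these datas were collected", "these data were collected"),
]
model = train(samples)
suggestions = proofread("all datas are listed", model)
```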
The XML document proofreading method provided by the present disclosure uses structured processing and fragmentation processing to establish an association relationship. It proofreads the reference identifiers of the document elements in the XML document according to the established association relationship, proofreads the format of the XML document according to the pre-established format specification, proofreads the segmented words of the XML document according to the pre-established lexicon, proofreads the document information of the XML document according to the pre-established standard database, and proofreads the body text of the XML document according to the pre-built proofreading model. This achieves intelligent proofreading of the XML document, improves working efficiency, and reduces cost.
Referring again to Fig. 2, the present disclosure provides an XML document proofreading device 10, which includes a proofreading module 11 and a storage module 12.
The storage module 12 stores the pre-established format specification, the pre-established lexicon, the pre-established standard database, and the pre-built proofreading model.
The proofreading module 11 is configured to proofread the reference identifiers of the document elements in the XML document according to the established association relationship. The association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements.
The proofreading module 11 is configured to proofread the format of the XML document according to the pre-established format specification.
The proofreading module 11 is configured to proofread the segmented words of the XML document according to the pre-established lexicon.
The proofreading module 11 is configured to proofread the document information of the XML document according to the pre-established standard database.
The proofreading module 11 is configured to proofread the body text of the XML document according to the pre-built proofreading model.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the XML document proofreading device 10 described above can be understood with reference to the corresponding process in the foregoing method, and is not repeated here.
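One way to picture the division of labor between the two modules is the following sketch, in which the storage module holds the pre-established resources and the proofreading module runs a list of checks against them. The class and attribute names, and the single example check, are hypothetical and are not an implementation taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class StorageModule:
    """Holds the pre-established resources (storage module 12 in the description)."""
    format_spec: Dict[str, object] = field(default_factory=dict)
    lexicons: Dict[str, set] = field(default_factory=dict)
    standard_database: List[str] = field(default_factory=list)
    model: Dict[str, str] = field(default_factory=dict)

@dataclass
class ProofreadingModule:
    """Runs each registered check against a document (proofreading module 11)."""
    storage: StorageModule
    checks: List[Callable[[str, StorageModule], List[str]]] = field(default_factory=list)

    def proofread(self, document: str) -> List[str]:
        prompts: List[str] = []
        for check in self.checks:
            prompts.extend(check(document, self.storage))
        return prompts

def lexicon_check(document, storage):
    """Hypothetical check: flag words found in the wrong-word lexicon."""
    wrong = storage.lexicons.get("wrong_words", set())
    return [f"wrong word: {w}" for w in document.split() if w in wrong]

storage = StorageModule(lexicons={"wrong_words": {"Provice"}})
device = ProofreadingModule(storage=storage, checks=[lexicon_check])
prompts = device.proofread("Hubei Provice")
```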
In summary, the XML document proofreading method and device provided by the present disclosure use structured processing and fragmentation processing to establish an association relationship. They proofread the reference identifiers of the document elements in the XML document according to the established association relationship, proofread the format of the XML document according to the pre-established format specification, proofread the segmented words of the XML document according to the pre-established lexicon, proofread the document information of the XML document according to the pre-established standard database, and proofread the body text of the XML document according to the pre-built proofreading model, thereby achieving intelligent proofreading of the XML document. When the XML document proofreading method and device provided by the present disclosure are applied to the proofreading of journal articles, working efficiency is greatly improved, time is saved, and cost is reduced.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed device and method may also be implemented in other ways. The device and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and each combination of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present disclosure may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The foregoing describes merely optional embodiments of the present disclosure and is not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.
Claims (10)
1. An XML document proofreading method, characterized in that the method comprises:
proofreading the reference identifiers of the document elements in an XML document according to an established association relationship, wherein the association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements;
proofreading the format of the XML document according to a pre-established format specification;
proofreading the segmented words of the XML document according to a pre-established lexicon;
proofreading the document information of the XML document according to a pre-established standard database; and
proofreading the body text of the XML document according to a pre-built proofreading model.
2. The XML document proofreading method according to claim 1, characterized in that the step of proofreading the reference identifiers of the document elements in the XML document according to the established association relationship comprises:
detecting, for each reference identifier in the XML document and according to the correspondence in the association relationship between each document element in the XML document and the reference identifier of that document element, whether a document element corresponding to the reference identifier exists in the XML document, and issuing a proofreading prompt if no corresponding document element exists; and
detecting, for each document element in the XML document and according to the citation order in the association relationship of the reference identifiers of the document elements in the XML document, whether the citation order of the reference identifier of the document element is correct, and issuing a proofreading prompt if it is incorrect.
3. The XML document proofreading method according to claim 2, characterized in that the document elements include text, figures, tables, formulas, and references; and after detecting, for each document element in the XML document and according to the citation order in the association relationship of the reference identifiers of the document elements in the XML document, whether the citation order of the reference identifier of the document element is correct, the method further comprises:
performing a digital object identifier (DOI) resolution query for each reference in the XML document to look up the DOI of each reference; and
performing a duplicate check on the DOIs of the references to determine whether any DOI is repeated among all references in the XML document, and issuing a proofreading prompt if a duplicate DOI exists.
4. The XML document proofreading method according to claim 1, characterized in that the format specification includes relationship detection, matching detection, and non-null item detection; and the step of proofreading the format of the XML document according to the pre-established format specification comprises:
scanning the XML metadata of the XML document according to the pre-established format specification; and
determining whether at least one of a null-value item, a relationship-detection anomaly, and a matching-detection anomaly exists, and if so, issuing a proofreading prompt.
5. The XML document proofreading method according to claim 1, characterized in that the lexicon includes a word-segmentation lexicon, a wrong-character/word lexicon, and a sensitive/forbidden-word lexicon; and the step of proofreading the segmented words of the XML document according to the pre-established lexicon comprises:
scanning the XML document based on the word-segmentation lexicon, the wrong-character/word lexicon, and the sensitive/forbidden-word lexicon; and
determining whether a wrong character/word and/or a sensitive/forbidden word exists, and if so, issuing a proofreading prompt.
6. The XML document proofreading method according to claim 5, characterized in that after the XML document is scanned based on the word-segmentation lexicon, the wrong-character/word lexicon, and the sensitive/forbidden-word lexicon, the method further comprises:
determining whether a grammatical error exists, and if so, issuing a proofreading prompt.
7. The XML document proofreading method according to any one of claims 2 to 6, characterized in that after the proofreading prompt is issued, the method further comprises:
according to a set condition and based on the proofreading prompt, modifying the XML document, or setting the proofreading prompt to be ignored in this document or not to be shown again.
8. The XML document proofreading method according to claim 1, characterized in that the document information includes the author affiliation and the funding project information;
the step of proofreading the document information according to the pre-established standard database comprises:
performing, based on the standard database, similarity detection on the author affiliation and funding project information of the XML document, wherein the standard database stores in advance a plurality of standardized affiliations and funding project information entries;
selecting, from the standard database, the information with the highest similarity to the author affiliation and funding project information of the XML document; and
issuing a proofreading prompt for the author affiliation and funding project information of the XML document according to the information with the highest similarity.
9. The XML document proofreading method according to claim 1, characterized in that the proofreading model is built by the following steps:
for each XML document, retaining the record of edits and modifications made to the XML document, associating the record with the XML document, and using it as a sample; and
training on a plurality of samples to obtain the proofreading model, so as to proofread the body text of the XML document.
10. An XML document proofreading device, characterized by comprising a proofreading module and a storage module;
the storage module stores a pre-established format specification, a pre-established lexicon, a pre-established standard database, and a pre-built proofreading model;
the proofreading module is configured to proofread the reference identifiers of the document elements in an XML document according to an established association relationship, wherein the association relationship includes the correspondence between each document element in the XML document and the reference identifier of that document element, and the citation order of the reference identifiers of the document elements;
the proofreading module is configured to proofread the format of the XML document according to the pre-established format specification;
the proofreading module is configured to proofread the segmented words of the XML document according to the pre-established lexicon;
the proofreading module is configured to proofread the document information of the XML document according to the pre-established standard database; and
the proofreading module is configured to proofread the body text of the XML document according to the pre-built proofreading model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013644.7A CN109670092A (en) | 2019-01-07 | 2019-01-07 | XML document proofreading method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013644.7A CN109670092A (en) | 2019-01-07 | 2019-01-07 | XML document proofreading method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109670092A (en) | 2019-04-23 |
Family
ID=66150213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910013644.7A Pending CN109670092A (en) | 2019-01-07 | 2019-01-07 | XML document proofreading method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670092A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334333A (en) * | 2019-06-18 | 2019-10-15 | 中国平安财产保险股份有限公司 | A kind of information amending method and relevant apparatus |
CN110990593A (en) * | 2019-12-17 | 2020-04-10 | 北大方正集团有限公司 | Method and device for detecting reference falling space |
CN113986968A (en) * | 2021-10-22 | 2022-01-28 | 广西电网有限责任公司 | Scheme intelligent proofreading method based on electric power standard standardization datamation |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101952802A (en) * | 2007-06-21 | 2011-01-19 | 汤姆森路透社全球资源公司 | Method and system for author and publisher's checking list of references |
CN102799569A (en) * | 2011-05-27 | 2012-11-28 | 汉王科技股份有限公司 | Method and device for checking electronic publication (EPUB) document |
CN105095184A (en) * | 2015-06-11 | 2015-11-25 | 周连惠 | Method for spelling and grammar proofreading of text document |
CN106326193A (en) * | 2015-06-18 | 2017-01-11 | 北京大学 | Footnote identification method and footnote and footnote citation association method in fixed-layout document |
CN106970749A (en) * | 2017-02-06 | 2017-07-21 | 广东小天才科技有限公司 | A kind of writing method and device based on mobile terminal |
CN107463666A (en) * | 2017-08-02 | 2017-12-12 | 成都德尔塔信息科技有限公司 | A kind of filtering sensitive words method based on content of text |
CN108052490A (en) * | 2017-12-29 | 2018-05-18 | 北京仁和汇智信息技术有限公司 | A kind of online methodology of composition of XML papers and device |
Non-Patent Citations (4)
Title |
---|
侯修洲等: "基于VBA的Word文档XML结构化标记方法", 《编辑学报》 * |
侯修洲等: "基于逻辑原则的科技论文自动校对方法", 《中国科技期刊研究》 * |
卓利艳: "字词级中文文本自动校对的方法研究", 《万方数据库》 * |
张涛: "中文文本自动校对系统设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334333A (en) * | 2019-06-18 | 2019-10-15 | 中国平安财产保险股份有限公司 | A kind of information amending method and relevant apparatus |
CN110334333B (en) * | 2019-06-18 | 2023-08-25 | 中国平安财产保险股份有限公司 | Information modification method and related device |
CN110990593A (en) * | 2019-12-17 | 2020-04-10 | 北大方正集团有限公司 | Method and device for detecting reference falling space |
CN110990593B (en) * | 2019-12-17 | 2023-09-19 | 新方正控股发展有限责任公司 | Citation falling empty detection method and device |
CN113986968A (en) * | 2021-10-22 | 2022-01-28 | 广西电网有限责任公司 | Scheme intelligent proofreading method based on electric power standard standardization datamation |
CN113986968B (en) * | 2021-10-22 | 2022-09-16 | 广西电网有限责任公司 | Scheme intelligent proofreading method based on electric power standard standardization datamation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10049096B2 (en) | System and method of template creation for a data extraction tool | |
EP3318978A1 (en) | System and method for semantic analysis of speech | |
US20030140311A1 (en) | Method for content mining of semi-structured documents | |
US20060285746A1 (en) | Computer assisted document analysis | |
CN109670092A (en) | XML document proofreading method and device | |
Kiefer | Assessing the Quality of Unstructured Data: An Initial Overview. | |
CN116244410B (en) | Index data analysis method and system based on knowledge graph and natural language | |
CN108153728B (en) | Keyword determination method and device | |
CN107423738B (en) | Test paper subject positioning method and device based on template matching | |
CN106372232B (en) | Information mining method and device based on artificial intelligence | |
JP5629976B2 (en) | Patent specification evaluation / creation work support apparatus, method and program | |
CN114911999A (en) | Name matching method and device | |
CN110688315A (en) | Interface code detection report generation method, electronic device, and storage medium | |
CN111158973B (en) | Web application dynamic evolution monitoring method | |
US20090327210A1 (en) | Advanced book page classification engine and index page extraction | |
US9530070B2 (en) | Text parsing in complex graphical images | |
CN110309258B (en) | Input checking method, server and computer readable storage medium | |
CN114462383B (en) | Method, system, storage medium and equipment for obtaining design specification of building drawing | |
CN106528506B (en) | A kind of data processing method based on XML tag, device and terminal device | |
CN114220113A (en) | Paper quality detection method, device and equipment | |
CN114154480A (en) | Information extraction method, device, equipment and storage medium | |
CN109213830B (en) | Document retrieval system for professional technical documents | |
KR101945234B1 (en) | Method for Searching Semiconductor Parts Using Algorithm of Eliminating Last Alphabet | |
CN108255887B (en) | Method and device for verifying industry text | |
CN113722421A (en) | Contract auditing method and system and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190423 |