CN116702702B - Automatic typesetting method and system based on XML

Info

Publication number: CN116702702B
Application number: CN202310397252.1A
Authority: CN
Inventors: 肖辉; 万捷; 彭干; 程成
Original assignee: Artron Art Group Co ltd; Beijing Artron Art Printing Co ltd
Current assignee: Artron Art Group Co ltd; Beijing Artron Art Printing Co ltd
Priority date: 2023-04-14
Filing date: 2023-04-14
Publication date: 2024-02-13
Anticipated expiration: 2043-04-14
Also published as: CN116702702A

Abstract

The invention relates to the technical field of automatic typesetting, and in particular to an automatic typesetting method and system based on XML. The system comprises: an importing module for importing XML format data; an analysis module for analyzing the imported XML format data, which comprises a classifying unit for classifying the data types of the imported XML format data, a recognition unit for recognizing the tags of the classified text data, and a checking unit for checking the secondary judgment results of the tags; a reorganization module for structurally reorganizing the verified tags; a modeling module for creating a tag style template; a typesetting module for importing the reorganized data into the tag style template for typesetting; an adjusting module for adjusting the layout of the typeset data; and an export module for exporting the file after the layout adjustment. The invention effectively improves typesetting efficiency.

Description

Automatic typesetting method and system based on XML

Technical Field

The invention relates to the technical field of automatic typesetting, in particular to an automatic typesetting method and system based on XML.

Background

Online editing platforms and content systems have substantial printing and publishing needs. In the traditional workflow, the platform exports the relevant data, editors organize it, and typesetting staff lay it out and output the print files. This workflow has many intermediate steps, is time-consuming and labor-intensive, is error-prone, and is inefficient.

Chinese patent publication No. CN110032720A discloses an XML-based visual report typesetting and automatic generation method and system, comprising: designing an XML report template format, which maps directly to the report batch production program; automatically generating an XML report template in a visual manner, implemented as an online page application; automatically extracting a mappable report content template file from the XML report template; automatically backfilling the XML report template after the content is replaced; and generating reports based on the XML report template. That scheme, however, does not accurately analyze the XML data, and therefore suffers from low typesetting precision and low typesetting efficiency.

Disclosure of Invention

Therefore, the invention provides an automatic typesetting method and system based on XML, which are used for solving the problems of inaccurate typesetting data analysis, low typesetting precision and low typesetting efficiency in the prior art.

To achieve the above object, in one aspect, the present invention provides an automatic typesetting system based on XML, including:

the importing module is used for importing XML format data;

the analysis module is used for analyzing the imported XML format data, is connected with the importing module, and comprises a classifying unit, a recognizing unit and a checking unit, wherein the classifying unit is used for classifying the imported XML format data into text data and picture data; the recognizing unit is connected with the classifying unit and is used for carrying out tag recognition on the classified text data; when carrying out tag recognition, the recognizing unit matches each tag keyword against the content of each paragraph of the text data and calculates the tag matching degree P of each paragraph; after the calculation is finished, the recognizing unit adjusts the tag matching degree P according to whether the same keyword appears in the paragraph; after the adjustment is finished, the recognizing unit corrects the adjusted tag matching degree P' according to the number of identical keywords appearing in the paragraph; after the correction is finished, the recognizing unit carries out primary judgment on the tag of the paragraph according to the corrected tag matching degree P", and carries out secondary judgment, according to the paragraph word count, on each tag that was successfully matched in the primary judgment; and the checking unit is used for checking the secondary judgment result of the tag, and when checking, checks the secondary judgment result of the paragraph according to the number of tags corresponding to the same paragraph;

the reorganization module is used for carrying out structural reorganization on each label after verification and is connected with the analysis module;

the modeling module is used for creating a label style template which is connected with the reorganization module;

the typesetting module is used for importing the data after the label structure is recombined into a label style template for typesetting and is connected with the recombination module;

the adjustment module is used for carrying out layout adjustment on typeset data, is connected with the typesetting module, and is also used for adjusting the dynamic header format when carrying out adjustment so that the header formats of all pages after adjustment are the same, and creating an index label and a reference label;

and the export module is used for exporting the file after layout adjustment and is connected with the adjustment module.

Further, when calculating the tag matching degree P of each paragraph, the identifying unit sets P = (P1 + P2 + … + Pn)/n, where n is the number of similar keywords in the paragraph, n ≥ 1, Pi is the matching degree of the i-th similar keyword in the paragraph, Pi = L/L0, i = 1, 2, …, n, L is the number of words of the similar keyword, L ≥ 2, and L0 is the number of words of the tag keyword.
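The averaging rule above can be illustrated with a minimal Python sketch. The function name `tag_matching_degree` and the sample strings are hypothetical and not part of the patent; the sketch assumes that the similar keywords of a paragraph have already been found and that the "number of words" corresponds to the character count returned by `len`.

```python
def tag_matching_degree(similar_keywords, tag_keyword):
    """Compute P = (P1 + ... + Pn) / n, where Pi = L / L0.

    similar_keywords: similar keywords already found in one paragraph
    tag_keyword: the preset tag keyword they are compared against
    """
    l0 = len(tag_keyword)  # L0: length of the tag keyword
    # Keep only keywords with L >= 2, as the text requires.
    degrees = [len(k) / l0 for k in similar_keywords if len(k) >= 2]
    if not degrees:
        # Assumption: the source requires n >= 1; when no similar
        # keyword is present this sketch reports a degree of 0.
        return 0.0
    return sum(degrees) / len(degrees)


# Hypothetical 4-character tag keyword with two similar keywords found.
p = tag_matching_degree(["AB", "ABCD"], "ABCD")
print(p)  # (2/4 + 4/4) / 2 = 0.75
```

How similar keywords are detected in the first place is not specified at this point in the text, so it is left outside the sketch.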

Further, the identification unit adjusts the tag matching degree P according to whether the same keyword appears in the paragraph when adjusting the tag matching degree P, wherein,

when the same keyword appears in the paragraph, the identification unit selects an adjustment coefficient t to adjust the tag matching degree P so as to increase the tag matching degree, where 1 < t < 1.2; the adjusted tag matching degree is P', and P' = P × t is set;

when the same keyword does not appear in the paragraph, the recognition unit does not make an adjustment.
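As a small illustration of this adjustment step, the following Python sketch applies the coefficient t only when an identical keyword is present. The function name and the default t = 1.1 are assumptions chosen for the example; the text only requires 1 < t < 1.2.

```python
def adjust_matching_degree(p, has_identical_keyword, t=1.1):
    """Return P' = P * t if an identical keyword appears, otherwise P."""
    if not 1 < t < 1.2:
        raise ValueError("t must satisfy 1 < t < 1.2")
    # Without an identical keyword the degree is passed through unchanged.
    return p * t if has_identical_keyword else p


print(adjust_matching_degree(0.75, False))  # 0.75, no adjustment
```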

Further, when correcting the adjusted tag matching degree P', the identification unit compares the number S of identical keywords appearing in the paragraph with a preset number S0 of identical keywords, and corrects the adjusted tag matching degree P' according to the comparison result, wherein,

when 1 < S ≤ S0, the identification unit selects a first correction coefficient g1 to correct the adjusted tag matching degree P' so as to increase the tag matching degree, where 1 < g1 < 1.1;

when S > S0, the identification unit selects a second correction coefficient g2 to correct the adjusted tag matching degree P' so as to increase the tag matching degree, and sets g2 = g1 + g1 × (S - S0)/S;

when the i-th correction coefficient gi (i = 1, 2) is selected to correct the adjusted tag matching degree P', the corrected tag matching degree is P", and P" = P' × gi is set.
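The two correction cases can be sketched in Python as follows. The function names and the default g1 = 1.05 are hypothetical. The text does not define a correction for S ≤ 1, so the sketch assumes a neutral coefficient of 1 in that case, and it assumes S0 ≥ 1.

```python
def correction_coefficient(s, s0, g1=1.05):
    """Select g1 or g2 from the identical-keyword count S and the preset S0."""
    if not 1 < g1 < 1.1:
        raise ValueError("g1 must satisfy 1 < g1 < 1.1")
    if s <= 1:
        # Assumption: no correction is described for S <= 1.
        return 1.0
    if s <= s0:
        return g1  # first correction coefficient
    # Second correction coefficient: g2 = g1 + g1 * (S - S0) / S
    return g1 + g1 * (s - s0) / s


def correct_matching_degree(p_adjusted, s, s0, g1=1.05):
    """Return the corrected degree: the adjusted degree times g_i."""
    return p_adjusted * correction_coefficient(s, s0, g1)
```

Because (S - S0)/S is positive whenever S > S0, g2 is always larger than g1, so paragraphs that repeat the keyword more often than the preset count receive the stronger boost.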

Further, when judging the tag of the paragraph according to the corrected tag matching degree P", the identification unit compares the corrected tag matching degree P" with the preset tag matching degree P0, and performs the primary judgment of the tag of the paragraph according to the comparison result, wherein,

when P" ≥ P0, the identification unit judges that the tag is successfully matched, and takes the successfully matched tag as the tag of the paragraph;

when P" < P0, the identification unit judges that the tag matching fails.

Further, when performing the secondary tag judgment, the identification unit compares the word count Z of the paragraph whose tag was successfully matched with the preset tag paragraph word counts, and performs the secondary judgment on that paragraph according to the comparison result, wherein,

when Z < Z1 or Z > Z2, the identification unit judges that the successfully matched tag cannot be used as the tag of the paragraph, and performs the primary tag judgment on the paragraph again;

when Z1 ≤ Z ≤ Z2, the identification unit judges that the successfully matched tag is used as the tag of the paragraph;

wherein Z1 is the first preset tag paragraph word count, Z2 is the second preset tag paragraph word count, and Z1 < Z2.
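The primary and secondary judgments can be combined into one decision, sketched below in Python. The function name, the three outcome strings and the sample numbers are hypothetical; the sketch only reports that a re-judgment is needed and does not model how the repeated primary judgment selects another tag.

```python
def judge_tag(p_corrected, p0, word_count, z1, z2):
    """Apply the primary judgment, then the secondary word-count judgment."""
    if not z1 < z2:
        raise ValueError("Z1 must be smaller than Z2")
    # Primary judgment: the corrected matching degree must reach P0.
    if p_corrected < p0:
        return "match_failed"
    # Secondary judgment: the paragraph word count Z must lie in [Z1, Z2].
    if word_count < z1 or word_count > z2:
        return "rejudge"  # the primary judgment is to be performed again
    return "accepted"


# Hypothetical presets for one tag: P0 = 0.8, Z1 = 2, Z2 = 30.
print(judge_tag(0.87, 0.8, 12, 2, 30))  # accepted
```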

Further, when verifying the secondary tag judgment result, the verification unit verifies the secondary judgment result of the paragraph according to the number of tags corresponding to the same paragraph, wherein,

when a plurality of labels exist in the same paragraph, the verification unit judges that verification fails, sorts the labels of the paragraph according to the matching degree from large to small, and takes the label with the largest matching degree as the label of the paragraph;

when a single label exists in the same paragraph, the verification unit judges that verification is successful.
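A minimal Python sketch of this verification step follows. The function name `verify_tags` and the dictionary input (tag name mapped to matching degree) are assumptions made for illustration.

```python
def verify_tags(candidates):
    """Verify the tags that passed the secondary judgment for one paragraph.

    candidates: dict mapping tag name -> matching degree.
    Returns (passed, tag): passed is True only when a single tag exists;
    tag is the one with the largest matching degree.
    """
    if not candidates:
        raise ValueError("paragraph has no candidate tag")
    # Sort by matching degree from large to small.
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    passed = len(ranked) == 1
    return passed, ranked[0][0]


print(verify_tags({"title": 0.82, "abstract": 0.95}))  # (False, 'abstract')
```

Even when verification fails, the paragraph still ends up with exactly one tag, namely the best-scoring candidate.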

Further, when structurally reorganizing the verified tags, the reorganization module matches each successfully verified tag name against the tag names in the preset tag structure, and reorganizes the tags according to the matching result, wherein,

when the successfully verified tag name matches a tag name in the preset tag structure, the reorganization module reorganizes the tag structure according to the preset tag structure;

when the successfully verified tag name fails to match any tag name in the preset tag structure, the reorganization module performs tag judgment again on the paragraph corresponding to the unmatched tag; during this re-judgment, tags that have already been selected are no longer used, and the process continues until the tag name of the paragraph successfully matches a tag name in the preset tag structure.
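The reorganization loop might look like the following Python sketch. Everything named here is hypothetical: `reorganize`, the `rejudge` callback, and the stub data. The sketch interprets "reorganizing according to the preset tag structure" as grouping paragraphs under the preset tag names in their preset order, and it assumes a paragraph is left unassigned if the re-judgment runs out of tags; the text describes neither detail.

```python
def reorganize(paragraph_tags, preset_structure, rejudge):
    """Group paragraphs under the tag names of a preset tag structure.

    paragraph_tags: list of (paragraph, verified_tag) pairs
    preset_structure: ordered list of allowed tag names
    rejudge: callable (paragraph, excluded_tags) -> new tag or None;
             it must never return a tag contained in excluded_tags
    """
    result = {name: [] for name in preset_structure}
    unassigned = []
    for paragraph, tag in paragraph_tags:
        excluded = set()
        # Re-judge until the tag is part of the preset structure.
        while tag is not None and tag not in result:
            excluded.add(tag)  # a tag that failed is not used again
            tag = rejudge(paragraph, excluded)
        if tag is None:
            unassigned.append(paragraph)
        else:
            result[tag].append(paragraph)
    return result, unassigned


def rejudge_stub(paragraph, excluded):
    """Stand-in for the recognition unit: next-best tag per paragraph."""
    ranked = {"p2": ["c", "author"]}
    for tag in ranked.get(paragraph, []):
        if tag not in excluded:
            return tag
    return None


result, unassigned = reorganize(
    [("p1", "title"), ("p2", "c")],
    ["title", "author", "abstract"],
    rejudge_stub,
)
print(result)  # {'title': ['p1'], 'author': ['p2'], 'abstract': []}
```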

Further, the label style template comprises a paragraph style, a character style, an object style and a table style corresponding to the label.
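One possible in-memory shape for such a template is sketched below in Python. The class name, field names and sample values are all hypothetical placeholders; the text only states that a template holds a paragraph style, a character style, an object style and a table style for a tag.

```python
from dataclasses import dataclass, field


@dataclass
class TagStyleTemplate:
    """Styles applied to content carrying a given tag (illustrative only)."""
    tag_name: str
    paragraph_style: dict = field(default_factory=dict)
    character_style: dict = field(default_factory=dict)
    object_style: dict = field(default_factory=dict)
    table_style: dict = field(default_factory=dict)


# Placeholder values; a real template would come from the modeling module.
title_template = TagStyleTemplate(
    tag_name="title",
    paragraph_style={"alignment": "left"},
    character_style={"font_size": "small-4"},
)
print(title_template.paragraph_style["alignment"])  # left
```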

On the other hand, the invention also provides an automatic typesetting method based on XML, which comprises the following steps,

step S1, importing XML format data to be typeset through an importing module;

step S2, analyzing the imported XML format data through an analysis module to identify tags of the XML format data;

step S3, carrying out structural reorganization on each identified tag through a reorganization module;

step S4, creating a tag style template through a modeling module;

step S5, importing the data after the tag structure reorganization into the tag style template through a typesetting module for typesetting;

step S6, performing layout adjustment on the typeset data through an adjustment module;

and step S7, exporting the document with the adjusted layout through an export module.

Compared with the prior art, the invention has the following beneficial effects: the system is applied to automatic typesetting; the analysis module analyzes the imported XML format data to identify its tags, which effectively ensures the accuracy of data analysis and improves typesetting efficiency; and the reorganization module structurally reorganizes the verified tags, which effectively ensures the accuracy of tag structure reorganization and further improves typesetting efficiency.

In particular, in this embodiment the matching degree of each similar keyword in a paragraph is calculated as the ratio of the number of words of the similar keyword to the number of words of the tag keyword, and these matching degrees are averaged over the number of similar keywords in the paragraph to obtain the tag matching degree P of each paragraph, which effectively ensures the accuracy of data analysis and improves typesetting efficiency.

Especially, when the identification unit adjusts the tag matching degree P, the identification unit adjusts the tag matching degree P according to whether the same keyword appears in the paragraph, if the same keyword appears in the paragraph, the identification unit selects the adjustment coefficient t to adjust the tag matching degree P so as to increase the tag matching degree, and if the same keyword does not appear in the paragraph, the identification unit does not adjust, thereby effectively ensuring the accuracy of data analysis and further improving typesetting efficiency.

In particular, when correcting the adjusted tag matching degree P', the identification unit compares the number S of identical keywords appearing in the paragraph with the preset number S0 of identical keywords. If S is greater than 1 and less than or equal to S0, it selects the first correction coefficient g1 to correct P' so as to increase the tag matching degree; if S is greater than S0, it selects the second correction coefficient g2 to correct P' so as to further increase the tag matching degree. This effectively ensures the accuracy of data analysis and improves typesetting efficiency.

Particularly, when the identification unit in this embodiment determines the label of the paragraph according to the corrected label matching degree P ", the corrected label matching degree P" is compared with the preset label matching degree P0, if the corrected label matching degree P "is greater than or equal to the preset label matching degree P0, the identification unit determines that the label matching is successful, and uses the successfully matched label as the label of the paragraph, if the corrected label matching degree P" is less than the preset label matching degree P0, the identification unit determines that the label matching is failed, thereby effectively ensuring the accuracy of data analysis and improving typesetting efficiency.

Especially, in this embodiment, different tags are provided with different preset tag paragraph word counts. When performing the secondary tag judgment, the identification unit compares the word count Z of the paragraph whose tag was successfully matched with the preset tag paragraph word counts. If Z is smaller than the first preset tag paragraph word count Z1 or larger than the second preset tag paragraph word count Z2, where Z1 is smaller than Z2, the identification unit judges that the successfully matched tag cannot be used as the tag of the paragraph and performs the primary tag judgment on the paragraph again; if Z lies between Z1 and Z2 inclusive, the identification unit judges that the successfully matched tag is used as the tag of the paragraph. This effectively ensures the accuracy of data analysis and improves typesetting efficiency.

In particular, when the verification unit verifies the label judgment result, the verification unit verifies the label judgment result according to the number of labels corresponding to the same paragraph, if a plurality of labels exist in the same paragraph, the verification unit performs matching degree sequencing on the labels, the label with the highest matching degree is used as the label of the paragraph, and if a single label exists in the same paragraph, the verification is successful, so that the accuracy of data analysis is effectively ensured, and the typesetting efficiency is improved.

Especially, when structurally reorganizing the verified tags, the reorganization module matches each successfully verified tag name against the tag names in the preset tag structure. If the match succeeds, the reorganization module reorganizes the tag structure according to the preset tag structure; if the match fails, the reorganization module performs tag judgment again on the paragraph corresponding to the unmatched tag, no longer using the tags already selected, until the tag name of the paragraph successfully matches a tag name in the preset tag structure. This effectively ensures the precision of tag structure reorganization and improves typesetting efficiency.

Drawings

Fig. 1 is a schematic diagram of an automatic typesetting system based on XML in an embodiment of the present invention;

Fig. 2 is a flow chart of an automatic typesetting method based on XML in an embodiment of the present invention.

Detailed Description

In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted" and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.

Referring to fig. 1, an automatic typesetting system based on XML according to an embodiment of the present invention includes:

the importing module is used for importing XML format data;

the modeling module is used for creating a label style template which is connected with the reorganization module, wherein the label style template comprises paragraph styles, character styles, object styles, table styles and the like corresponding to labels;

Specifically, the system is applied to automatic typesetting. The analysis module analyzes the imported XML format data and identifies its tags, which effectively ensures the accuracy of data analysis and improves typesetting efficiency; the reorganization module structurally reorganizes the verified tags, which effectively ensures the accuracy of tag structure reorganization and further improves typesetting efficiency. The tags in this embodiment include an author tag, a text title tag, an abstract tag, a reference tag, and the like. When classifying data types, data in JPG format is classified as picture data and all other data is classified as text data. The tag keywords are preset and are the key characters in the tag text; for example, the tag keyword of the author tag is "author". Paragraphs are delimited by the identification unit according to punctuation marks during recognition: the text content between two periods in the text data is taken as one paragraph. An identical keyword is a keyword whose characters and number of words are the same as those of the tag keyword, with at least 2 connected words. When the typesetting module performs typesetting in this embodiment, the picture data and the text data after tag structure reorganization are typeset according to preset application rules and a formula processing mode, and the result is generated and stored. The preset application rules are alignment rules, for example: first-level and second-level titles in small-4 Song typeface, left-aligned; paragraphs in small-4 Song typeface, left-aligned. The index tag allows tagged content to be looked up, and the reference tag allows it to be cited.
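The data classification and paragraph splitting described above can be sketched with Python's standard `xml.etree.ElementTree` module. The element names in the sample document, the function names, and the rule "element text ending in .jpg or .jpeg is a picture reference" are assumptions for illustration; the text only says that JPG data is picture data and that a paragraph is the text between two periods.

```python
import re
import xml.etree.ElementTree as ET


def classify(xml_text):
    """Split element contents into text data and picture data."""
    root = ET.fromstring(xml_text)
    text_data, picture_data = [], []
    for elem in root.iter():
        content = (elem.text or "").strip()
        if not content:
            continue  # skip container elements without own text
        if content.lower().endswith((".jpg", ".jpeg")):
            picture_data.append(content)
        else:
            text_data.append(content)
    return text_data, picture_data


def split_paragraphs(text):
    """Treat the text between two periods as one paragraph.

    Both the ASCII period and the Chinese full stop are used as
    delimiters; a real implementation would need to protect decimals.
    """
    parts = re.split(r"[.。]", text)
    return [part.strip() for part in parts if part.strip()]


sample = (
    "<doc><title>Sample title</title><img>cover.jpg</img>"
    "<body>First sentence. Second sentence.</body></doc>"
)
texts, pictures = classify(sample)
print(pictures)                    # ['cover.jpg']
print(split_paragraphs(texts[1]))  # ['First sentence', 'Second sentence']
```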

Specifically, when calculating the tag matching degree P of each paragraph, the identifying unit sets P = (P1 + P2 + … + Pn)/n, where n is the number of similar keywords in the paragraph, n ≥ 1, Pi is the matching degree of the i-th similar keyword in the paragraph, Pi = L/L0, i = 1, 2, …, n, L is the number of words of the similar keyword, L ≥ 2, and L0 is the number of words of the tag keyword.

Specifically, in this embodiment the matching degree of each similar keyword in a paragraph is calculated as the ratio of the number of words of the similar keyword to the number of words of the tag keyword, and these matching degrees are averaged over the number of similar keywords in the paragraph to obtain the tag matching degree P of each paragraph, which effectively ensures the accuracy of data analysis and improves typesetting efficiency. The similar keywords described in this embodiment are keywords that are the same as or similar to the tag keyword and have at least 2 connected words; for example, "title", "text mark", "text question" and "text title" are all similar keywords of the tag keyword "text title".

Specifically, when the identification unit adjusts the tag matching degree P, the identification unit adjusts the tag matching degree P according to whether the same keyword appears in the paragraph, wherein,

when the same keyword appears in the paragraph, the identification unit selects an adjustment coefficient t to adjust the tag matching degree P so as to increase the tag matching degree, where 1 < t < 1.2; the adjusted tag matching degree is P', and P' = P × t is set.

Specifically, when the identification unit adjusts the tag matching degree P, the identification unit adjusts the tag matching degree P according to whether the same keyword appears in the paragraph, if the same keyword appears in the paragraph, the identification unit selects the adjustment coefficient t to adjust the tag matching degree P so as to increase the tag matching degree, and if the same keyword does not appear in the paragraph, the identification unit does not adjust, thereby effectively ensuring the accuracy of data analysis and further improving typesetting efficiency.

Specifically, when correcting the adjusted tag matching degree P', the identification unit compares the number S of identical keywords appearing in the paragraph with a preset number S0 of identical keywords, and corrects the adjusted tag matching degree P' according to the comparison result.

Specifically, if S is greater than 1 and less than or equal to S0, the identification unit selects the first correction coefficient g1 to correct P' so as to increase the tag matching degree; if S is greater than S0, it selects the second correction coefficient g2 to correct P' so as to further increase the tag matching degree. This effectively ensures the accuracy of data analysis and improves typesetting efficiency.

Specifically, when judging the tag of the paragraph according to the corrected tag matching degree P", the identification unit compares the corrected tag matching degree P" with the preset tag matching degree P0, and performs the primary judgment of the tag of the paragraph according to the comparison result, wherein,

when P" < P0, the identification unit judges that the tag matching fails.

Specifically, when the identification unit determines the label of the paragraph according to the corrected label matching degree P ", the identification unit compares the corrected label matching degree P" with the preset label matching degree P0, if the corrected label matching degree P "is greater than or equal to the preset label matching degree P0, the identification unit determines that the label matching is successful, and uses the successfully matched label as the label of the paragraph, and if the corrected label matching degree P" is less than the preset label matching degree P0, the identification unit determines that the label matching is failed, thereby effectively ensuring the accuracy of data analysis and improving typesetting efficiency.
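To show how the calculation, adjustment, correction and primary judgment chain together, here is a worked Python sketch. All numbers (keyword lengths, t = 1.1, g1 = 1.05, S0 = 3, P0 = 0.8) and the function name are hypothetical; the sketch assumes that any identical keyword (S ≥ 1) triggers the adjustment and that no correction applies when S ≤ 1.

```python
def score_paragraph(similar_lengths, l0, s, s0, t=1.1, g1=1.05):
    """Return (P, P', P'') for one paragraph and one tag keyword.

    similar_lengths: lengths L of the similar keywords found (each >= 2)
    l0: length L0 of the tag keyword
    s: number S of identical keywords; s0: preset number S0
    """
    # P: average of Pi = L / L0 over the similar keywords.
    p = sum(length / l0 for length in similar_lengths) / len(similar_lengths)
    # P': boosted by t only when an identical keyword appears.
    p_adjusted = p * t if s >= 1 else p
    # Correction coefficient: none for S <= 1, g1 up to S0, g2 beyond.
    if s <= 1:
        g = 1.0
    elif s <= s0:
        g = g1
    else:
        g = g1 + g1 * (s - s0) / s
    return p, p_adjusted, p_adjusted * g


p, p_adj, p_corr = score_paragraph([2, 4], l0=4, s=2, s0=3)
P0 = 0.8  # hypothetical preset tag matching degree
print(p)             # 0.75
print(p_corr >= P0)  # True: 0.75 * 1.1 * 1.05 is about 0.866
```

In this example the raw degree 0.75 would fail a threshold of 0.8 on its own; the match succeeds only because two identical keywords raise it through t and g1.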

Specifically, when performing the secondary tag judgment, the identification unit compares the word count Z of the paragraph whose tag was successfully matched with the preset tag paragraph word counts, and performs the secondary judgment on that paragraph according to the comparison result.

Specifically, in this embodiment, different tags are provided with different preset tag paragraph word counts. If Z is smaller than the first preset tag paragraph word count Z1 or larger than the second preset tag paragraph word count Z2, where Z1 is smaller than Z2, the identification unit judges that the successfully matched tag cannot be used as the tag of the paragraph and performs the primary tag judgment on the paragraph again; if Z lies between Z1 and Z2 inclusive, the identification unit judges that the successfully matched tag is used as the tag of the paragraph. This effectively ensures the accuracy of data analysis and improves typesetting efficiency.

Specifically, when verifying the secondary tag judgment result, the verification unit verifies the secondary judgment result of the paragraph according to the number of tags corresponding to the same paragraph.

Specifically, when the verification unit verifies the label judgment result, the verification unit verifies the label judgment result according to the number of labels corresponding to the same paragraph, if a plurality of labels exist in the same paragraph, the verification unit performs matching degree sequencing on the labels, the label with the highest matching degree is used as the label of the paragraph, and if a single label exists in the same paragraph, the verification is successful, so that the accuracy of data analysis is effectively ensured, and the typesetting efficiency is improved.

Specifically, when structurally reorganizing the verified tags, the reorganization module matches each successfully verified tag name against the tag names in the preset tag structure, and reorganizes the tags according to the matching result.

Specifically, if the successfully verified tag name matches a tag name in the preset tag structure, the reorganization module reorganizes the tag structure according to the preset tag structure; if the match fails, the reorganization module performs tag judgment again on the paragraph corresponding to the unmatched tag, no longer using the tags already selected, until the tag name of the paragraph successfully matches a tag name in the preset tag structure. This effectively ensures the precision of tag structure reorganization and improves typesetting efficiency.

For example, suppose the successfully verified tag names are a, b, c, d, e and f. When these are matched against the tag names in the preset tag structure and c is not a tag name in the preset tag structure, the matching of c fails. Tag judgment is then performed again on the paragraph corresponding to c, and c is excluded from this re-judgment; this is repeated until the tag name of the paragraph successfully matches a tag name in the preset tag structure.

Referring to fig. 2, an automatic typesetting method based on XML according to an embodiment of the present invention includes,

step S1, importing XML format data to be typeset through an importing module;

step S4, creating a tag style template through a modeling module;

step S6, performing layout adjustment on the typeset data through an adjustment module;

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims

1. An automatic typesetting system based on XML, the system comprising:

the importing module is used for importing XML format data;

the analysis module is used for analyzing the imported XML format data, is connected with the importing module, and comprises a classifying unit, an identification unit and a verification unit, wherein the classifying unit is used for classifying the imported XML format data into text data and picture data; the identification unit is connected with the classifying unit and is used for identifying tags of the classified text data, wherein, when identifying tags, the identification unit matches each tag keyword against the content of each paragraph of the text data and calculates the tag matching degree P of each paragraph; after the calculation is completed, the identification unit adjusts the tag matching degree P according to whether the same tag keyword appears in the paragraph; after the adjustment is completed, the identification unit corrects the adjusted tag matching degree P' according to the number of same tag keywords appearing in the paragraph; after the correction is completed, the identification unit performs a primary judgment on the tag of each paragraph according to the corrected tag matching degree P", and performs a secondary judgment, according to the paragraph word number, on the tags of the paragraphs whose tags were successfully matched in the primary judgment; and the verification unit is used for verifying the secondary tag judgment results, wherein, when verifying, the verification unit verifies the secondary tag judgment result of a paragraph according to the number of tags corresponding to the same paragraph;

the export module is used for exporting the file after layout adjustment and is connected with the adjustment module;

when the identification unit calculates the tag matching degree P of each paragraph, P = (P1 + P2 + … + Pn)/n is set, where n is the number of similar keywords in the paragraph, n ≥ 1; Pi is the matching degree of the i-th similar keyword in the paragraph, Pi = L/L0, i = 1, 2, …, n; L is the word number of the similar keyword, L ≥ 2; and L0 is the word number of the tag keyword;

the similar keywords are keywords that are identical or similar to the tag keyword over at least 2 consecutive words;
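By way of non-limiting illustration, the calculation of P can be sketched as follows. The function name `matching_degree` and its inputs are assumptions for this sketch: the word numbers L of the similar keywords found in a paragraph are taken as given, since how similar keywords are located is not detailed here, and returning 0.0 when no similar keyword qualifies is likewise an assumption.

```python
def matching_degree(similar_word_numbers, keyword_word_number):
    """Return P = (P1 + ... + Pn) / n, with Pi = L / L0.

    similar_word_numbers -- word number L of each similar keyword found
                            in the paragraph (hypothetical input)
    keyword_word_number  -- word number L0 of the tag keyword
    """
    if keyword_word_number <= 0:
        raise ValueError("L0 must be positive")
    # Only runs of at least 2 words count as similar keywords (L >= 2).
    valid = [L for L in similar_word_numbers if L >= 2]
    if not valid:
        return 0.0  # assumption: no similar keyword means no match
    return sum(L / keyword_word_number for L in valid) / len(valid)


# Two similar keywords of 2 and 4 words against a 4-word tag keyword:
# P1 = 2/4, P2 = 4/4, so P = (0.5 + 1.0) / 2.
p = matching_degree([2, 4], 4)
```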

when adjusting the tag matching degree P, the identification unit adjusts it according to whether the same tag keyword appears in the paragraph, wherein,

when the same tag keyword appears in the paragraph, the identification unit selects an adjustment coefficient t to adjust the tag matching degree P so as to increase the tag matching degree, where 1 < t < 1.2, and the adjusted tag matching degree is set to P' = P × t;

when the same tag keyword does not appear in the paragraph, the identification unit makes no adjustment;
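By way of non-limiting illustration, the adjustment step can be sketched as follows. The function name `adjust_matching_degree`, the use of an exact substring test to decide whether the same tag keyword appears, and the default value t = 1.1 are assumptions for this sketch; the claim only requires 1 < t < 1.2.

```python
def adjust_matching_degree(p, paragraph, tag_keyword, t=1.1):
    """Return P' = P * t if the tag keyword appears verbatim, else P."""
    if not 1.0 < t < 1.2:
        raise ValueError("t must satisfy 1 < t < 1.2")
    if tag_keyword in paragraph:
        return p * t  # same tag keyword present: raise the matching degree
    return p  # keyword absent: no adjustment


adjusted = adjust_matching_degree(0.75, "Author biography and notes", "biography")
unchanged = adjust_matching_degree(0.75, "Plain body text", "biography")
```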

when correcting the adjusted tag matching degree P', the identification unit compares the number S of same tag keywords appearing in the paragraph with a preset number S0 of same tag keywords, and corrects the adjusted tag matching degree P' according to the comparison result, wherein,

when the i-th correction coefficient gi (i = 1, 2) is selected to correct the adjusted tag matching degree P', the corrected tag matching degree is set to P" = P' × gi;

when judging the tag of the paragraph according to the corrected tag matching degree P", the identification unit compares the corrected tag matching degree P" with a preset tag matching degree P0 and performs the primary judgment on the tag of the paragraph according to the comparison result, wherein,

when P" < P0, the identification unit judges that the tag matching fails;
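By way of non-limiting illustration, the correction and primary judgment can be sketched as follows. The function names are assumptions for this sketch. The rule that selects a correction coefficient gi from the comparison of S with S0 is not reproduced here, so the coefficient is passed in by the caller; treating a corrected matching degree not less than P0 as a successful match is an assumption inferred from the failure condition.

```python
def correct_matching_degree(p_adjusted, g):
    """Return the corrected matching degree: P' multiplied by g."""
    return p_adjusted * g


def primary_judgment(p_corrected, p0):
    """Return True if the tag match succeeds, False if it fails.

    Matching fails when the corrected degree is below P0; success
    otherwise is an assumption of this sketch.
    """
    return p_corrected >= p0


# Illustrative numbers only: P' = 0.8 with hypothetical coefficients.
passes = primary_judgment(correct_matching_degree(0.8, 1.05), p0=0.8)
fails = primary_judgment(correct_matching_degree(0.8, 0.9), p0=0.8)
```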

when performing the secondary tag judgment, the identification unit compares the word number Z of each paragraph whose tag was successfully matched with the word number of each preset tag paragraph, and performs the secondary tag judgment on the paragraphs whose tags were successfully matched in the primary tag judgment according to the comparison result,

2. The automatic typesetting system based on XML as recited in claim 1, wherein, when verifying the secondary tag judgment result of a paragraph, the verification unit verifies the result according to the number of tags corresponding to the same paragraph, wherein,

3. The automatic typesetting system based on XML as recited in claim 2, wherein, when the reorganization module reorganizes the structure of the verified tags, the successfully verified tag names are matched against the tag names in the preset tag structure, and the tags are reorganized according to the matching result,

4. The automatic typesetting system based on XML as recited in claim 3, wherein the tag style template includes the paragraph styles, character styles, object styles and form styles corresponding to the tags.

5. A typesetting method applied to the automatic typesetting system based on XML according to any one of claims 1 to 4, comprising,

step S1, importing XML format data to be typeset through an importing module;

step S4, creating a label style template through a modeling module;

step S6, performing layout adjustment on typeset data through an adjustment module;