CN111259202A - Document structured data embedding method and system - Google Patents

Document structured data embedding method and system

Info

Publication number
CN111259202A
CN111259202A CN202010024636.5A CN202010024636A CN111259202A CN 111259202 A CN111259202 A CN 111259202A CN 202010024636 A CN202010024636 A CN 202010024636A CN 111259202 A CN111259202 A CN 111259202A
Authority
CN
China
Prior art keywords
document
data
structured data
structured
template
Prior art date
Legal status
Granted
Application number
CN202010024636.5A
Other languages
Chinese (zh)
Other versions
CN111259202B (en)
Inventor
杨建庆
谢现举
孙双魁
张春花
罗江怡
吴淼
彭梦姚
罗潮霞
刘杰
白有为
刘康
王海燕
孙永花
李晶晶
周彪
魏清
李祥花
田蕾
Current Assignee
Xining Ningguang Engineering Consultation Co ltd
PowerChina Qinghai Electric Power Engineering Co Ltd
Original Assignee
Xining Ningguang Engineering Consultation Co ltd
PowerChina Qinghai Electric Power Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xining Ningguang Engineering Consultation Co ltd, PowerChina Qinghai Electric Power Engineering Co Ltd
Priority to CN202010024636.5A
Publication of CN111259202A
Application granted
Publication of CN111259202B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the field of computer knowledge management, and in particular to a method and a system for embedding structured data in a document. Documents produced by the method remain suitable for manual reading, understanding, use, and filing, while the structured data embedded in them can be collected and processed automatically, and the degree of standardization and the data precision of the documents can be effectively controlled.

Description

Document structured data embedding method and system
Technical Field
The invention relates to the field of computer knowledge management, and in particular to a method and a system for embedding structured data in a document.
Background
With the continued development of paperless, computer-based office work, paper documents in the engineering field have largely been replaced by electronic documents, chiefly because electronic documents are convenient to read, use, transmit, and archive. In practice, however, documents differ in storage format and form of expression because professional fields, software, and management and usage practices differ. Document editing, management, and normalization are currently done by hand. Manual work is subjective and the resulting documents are unstructured, which makes automatic knowledge discovery and extraction difficult.
Take engineering project technical documents as an example. Each professional document contains many data parameters; traditional professional documents differ from field to field in layout and in how technical indices are expressed; and their data precision requirements are not uniform. Although such documents suit professional habits, manual control can hardly keep their degree of normalization and their data precision in check. Because the documents are varied and unstructured, semantic analysis and extraction can find keywords effectively during data collection, but it struggles to extract accurately the parameters, data, and descriptive information that engineering work requires. Since the quality of engineering project documents directly affects construction quality and safety, manual control, extraction, and conversion cannot meet the engineering field's requirements for documents.
Disclosure of Invention
In view of the shortcomings of the prior art, the application provides a document structured data embedding method and system. They address the inability of the prior art to satisfy, at the same time, the traditional usage requirements of engineering documents, document standardization control, precision control of key index parameters, automatic extraction of document data, and knowledge discovery and extraction.
To achieve this purpose, the invention adopts the following technical solution:
A document structured data embedding method, comprising the following steps:
(1) constructing a document structured frame template: in accordance with the professional document standardization standard, the document is divided by professional field, document type, and topic into a frame structure of topic chapters, title paragraphs, and description areas; structured data tags and extensible semi-structured data tags are associated with the frame according to the correspondence between the document frame structure and the target content, forming a document structured frame template that can be referenced repeatedly;
(2) editing and managing documents through the document structured frame template: a template is selected according to the professional field, document type, and topic and preloaded into a document editor; the document is edited and managed through user interaction, and through the same interaction the structured data tags and extensible semi-structured data tags corresponding to the document frame structure are edited and managed dynamically in the format determined by the template;
(3) structured data acquisition: the document data edited in the document editor is extracted and converted into XML structure data and document attribute fields according to the association between the document structured frame template and the corresponding structured data tags and extensible semi-structured data tags;
(4) embedding the structured document data: the document edited by the user within the template frame in step (2) is taken as the document body; the XML structure data and document attribute fields obtained in step (3) are packaged in the embedded body format; the embedding position of the structure data in the target document is determined, in accordance with the storage format characteristics of the target document file, by embedding point reliability verification carried out in advance; and the packaged structure is embedded into the target format file at that position to obtain the document embedded with structured data;
(5) extracting structured data from a document embedded with structured data: the document is read, the structure data is extracted from it according to the characteristics of the embedded form, and the related attribute information of the document is obtained; the corresponding template is matched by the template attribute value; and, using the frame structure provided by the template as the guide, the structured data and extensible semi-structured data are extracted from the structure data.
Further, the title paragraph includes, but is not limited to, content scope, table style, text font, text indentation, and line spacing.
Further, the structured data tags are used to mark the explicit descriptions and key index parameters related to the subject content, title paragraphs, and description areas in the document.
Further, the extensible semi-structured data tags are used to mark descriptions and key index parameters in the document that relate to the subject content, title paragraphs, and description areas but cannot be specified explicitly in advance.
Further, the document attribute fields include, but are not limited to, document title, document subject classification, template information, author, document version number, document summary, keywords, digital signer information, and digital signature.
The application also provides a document structured data embedding system comprising a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a template library, and a data extraction and conversion interface;
the template generator consists of a document structure body extraction module, a template matching module, a document frame generation module, and a structured field generation module;
the document editor consists of a document loading frame module, a document editing and displaying unit module, a structured data labeling editing and displaying unit module, a structured document generating module, and a document type universal editor interface, and supports editing, structured data embedding, display, and modification for newly created documents, documents with embedded structured data, and original unstructured documents;
the structured data collector consists of a data acquisition module, a document attribute and structured data extraction module, and an XML structural data generation module, and extracts the data provided by the document editor and converts it into XML structure data and document attribute fields according to the association between the document frame and the corresponding structured data tags and extensible semi-structured data tags;
the data authentication processor uses digital signatures to sign the XML structure data and to verify its signature, ensuring that the XML structure data cannot be tampered with or interfered with without authorization and that it meets the requirements of integrity and usability;
the structured data controller consists of a to-be-embedded structural body data packaging module, a data verification state confirmation and extraction module, and a matching file type and embedding position control module, and is used to package the structured data body to be embedded, to package data in the form required by the data extraction and conversion interface, and to determine the storage position of the structured data body to be embedded in the target file;
the data extraction and conversion interface interfaces with a data application system and performs data conversion;
the template library consists of a special template library and a general template library, wherein the special template library is a relational database storing special document templates that conform to the classification standard of the professional field and target specific work requirements, and the general template library is a relational database storing general templates that conform to the classification standard of the professional field and target professional characteristics.
Further, the template generator retrieves the corresponding template from the template library either through the document attributes of a document with embedded structured data or, for an original unstructured document or a newly created document, according to its professional field, document type, and topic, and associates structured data tags and extensible semi-structured data tags according to the correspondence between the document frame structure and the target content.
Further, the special template library is a set of structured documents and structured tags in which the structured data frame is the main part and the extensible semi-structured data frame is the auxiliary part.
Further, the general template library is a set of structured document frames and structured tags in which the extensible semi-structured data frame is the main part.
Further, the data application system is an office automation system and/or a business system.
Compared with the prior art, the invention has the following effects:
(1) Structured data that is related to and consistent with the document can be extracted from a document processed by the method without changing the document's original appearance, so the document remains convenient for manual use while also supporting knowledge discovery and knowledge extraction. Compared with an existing XML document, the document is structured yet does not change the habits of manual reading and use.
(2) The document with embedded structured data can be read and displayed normally, without any additional processing, by the software associated with the existing document format type.
(3) The structured data body of the document is difficult to tamper with, which gives the document data authenticity and non-repudiation.
In summary, after processing by the data embedding system and method, the document still meets the requirements of manual reading, understanding, use, and filing, while the structured and semi-structured data embedded in it can be extracted in a structured manner. The automatic collection and processing of documents embedded with structured data is thereby realized, and the degree of standardization and the data precision of the documents can be effectively controlled.
Drawings
FIG. 1 is a flowchart of the document structured data embedding method provided by the present application;
FIG. 2 is a schematic structural diagram of the document structured data embedding system provided by the present application;
FIG. 3 is a schematic diagram of the embedded body format according to an embodiment of the present application;
FIG. 4 is a flowchart for extracting and processing structured data from a document embedded with structured data according to Example 2 of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings and examples, which are used for illustrating the present invention and are not intended to limit the scope of the present invention.
Example 1
FIG. 1 is a flowchart of a document structured data embedding method according to an embodiment of the present application. The method is applicable to servers, personal computers, and smart terminals (smartphones and tablet computers).
FIG. 2 is a schematic structural diagram of a document structured data embedding system according to an embodiment of the present application. The system includes a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a data extraction and conversion interface, and a template library.
As shown in fig. 1, the method comprises the steps of:
S1, constructing a document structured frame template. The key to this step is that the template must satisfy the frame structure requirements (chapters, title paragraphs, and description areas) of the relevant professional field, document type, and topic, and that structured data tags and extensible semi-structured data tags are established according to the correspondence between the document frame structure and the target content. The template thereby helps establish standardized documents, makes structured data easy to extract, standardizes the key descriptions and index parameters of the document, and can be reused.
Compared with a traditional document template, the document structured frame template sets out specific writing requirements for the document subject content, title paragraphs, and description areas. These requirements also serve, in the form of structured tags and attributes, as control indices during document editing, standardizing the edited content.
The structured tags mark the explicit descriptions and key index parameters related to the subject content, title paragraphs, and description areas in the document. They are typically part of the writing requirements, for example the key descriptions and index parameters commonly found in civil engineering technical documents, such as the project overview, site location, site area, and foundation pit excavation volume.
The extensible semi-structured tags mark descriptions and key index parameters that relate to the subject content, title paragraphs, and description areas but cannot be specified explicitly in advance, for example the particular characteristics of the geological conditions and the related index parameters. The data precision of the document's key descriptions and index parameters is predetermined by the corresponding tag attributes of the template, so that the data can be extracted and controlled while the document is edited.
Preferably, taking a professional document for a new project as an example, a document structured frame template is constructed as described above, and the finished template is stored in the template library shown in FIG. 2 according to the classification standard.
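As an illustration only, the frame template described in S1 could be modeled as a small tree of sections carrying tags. The patent does not specify a data model, so every class name, field name, and sample value below (Tag, Section, FrameTemplate, "civil-site-001", and so on) is a hypothetical choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """One data tag attached to a region of the document frame (hypothetical model)."""
    name: str
    extensible: bool = False        # True for an extensible semi-structured tag
    unit: Optional[str] = None      # e.g. "m2"
    decimals: Optional[int] = None  # required decimal places, if the template fixes them


@dataclass
class Section:
    """A chapter, title paragraph, or description area of the frame."""
    title: str
    writing_requirement: str = ""
    tags: List[Tag] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


@dataclass
class FrameTemplate:
    template_id: str
    professional_field: str
    document_type: str
    topic: str
    sections: List[Section] = field(default_factory=list)

    def all_tags(self) -> List[Tag]:
        """Collect every tag in the frame in depth-first document order."""
        found: List[Tag] = []
        stack = list(reversed(self.sections))
        while stack:
            section = stack.pop()
            found.extend(section.tags)
            stack.extend(reversed(section.children))
        return found


# Sample template loosely based on the civil engineering example in the text.
template = FrameTemplate(
    template_id="civil-site-001",
    professional_field="civil engineering",
    document_type="technical document",
    topic="site description",
    sections=[
        Section(
            title="Project overview",
            tags=[Tag("site_location"), Tag("site_area", unit="m2", decimals=2)],
            children=[
                Section(title="Foundation pit",
                        tags=[Tag("excavation_volume", unit="m3", decimals=1)]),
            ],
        ),
        Section(title="Geological conditions",
                tags=[Tag("geology_note", extensible=True)]),
    ],
)
```

Keeping the precision requirement on the tag itself mirrors the statement that data precision is predetermined by the tag attributes of the template.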
S2, editing and managing the document through the document frame template. This step is carried out by the template generator and the document editor, which retrieve the document structured frame template and perform the corresponding document editing and structured data collection.
As shown in FIG. 2, preferably, taking a special template as an example, the user selects, through the template matching module of the template generator, the template in the special template library that corresponds to the professional field, document type, and topic of the document being created. The template is passed to the document frame generation module, which extracts the frame structure in the template together with the associated structured tags and extensible semi-structured tags, generates a tagged template that the document editor can identify, control, and locate, and sends the tagged template to the document loading frame module of the document editor.
The document is edited and managed through user interaction, and through that interaction the structured data tags and extensible semi-structured data tags corresponding to the document frame structure are edited and managed dynamically in the format determined by the template. This is carried out by the document editor.
The document loading frame module loads the tagged template and distributes it to the structured data labeling editing and displaying unit module.
The structured data labeling editing and displaying unit module starts the document editing and displaying unit module and, through user interaction, dynamically collects the key descriptions and index parameters specified by the tags and their attributes, so that document content editing is completed while the requirements of document standardization control and key index parameter precision control are met.
When the document editing and displaying unit module starts, it notifies the document type universal editor interface of the document type, calls the document editing component for that document type, and stays synchronized with the structured data labeling editing and displaying unit module to complete document processing and storage.
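The precision control mentioned in S2 can be pictured as a check applied when the user enters a value for a tag. The sketch below assumes that precision is expressed as a required number of decimal places; the patent does not say how precision is encoded, and the function name check_precision is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def check_precision(raw: str, decimals: int) -> Decimal:
    """Return the parsed value if it has exactly `decimals` decimal places.

    Raises ValueError otherwise, so an editor could reject the input
    before it is stored under the tag.
    """
    try:
        value = Decimal(raw)
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}")
    if not value.is_finite():
        raise ValueError(f"not a finite number: {raw!r}")
    # For a finite Decimal the exponent is an int; "1250.50" has exponent -2.
    if -value.as_tuple().exponent != decimals:
        raise ValueError(f"{raw!r} does not have exactly {decimals} decimal places")
    return value


site_area = check_precision("1250.50", 2)
```

Decimal is used instead of float so that trailing zeros, which carry the stated precision, are preserved.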
S3, structured data acquisition. In this step the structured data collector extracts the structured data and converts it into XML structure data and document attribute fields according to the association between the document frame structure and the corresponding structured data tags and extensible semi-structured data tags.
After the user issues a document save instruction, the document editor sends the collected document attributes and structured data to the structured data collector, where the data acquisition module receives them. The document attribute and structured data extraction module is then started; it extracts the document attributes and structured data from the received data and sends the extracted document attributes to the structured data controller.
The XML structural data generation module converts the attributes and structured data extracted by the document attribute and structured data extraction module into XML structure data and document attribute fields according to the association between the document frame structure and the corresponding structured data tags and extensible semi-structured data tags.
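A minimal sketch of the conversion to XML structure data, using Python's standard xml.etree.ElementTree. The patent does not publish an XML schema, so the element and attribute names (structure, templateId, attributes, structured, semiStructured, tag, field) and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_xml(template_id: str,
              attributes: Dict[str, str],
              structured: Dict[str, str],
              semi_structured: Dict[str, str]) -> bytes:
    """Serialize collected tag values into one XML structure (hypothetical layout)."""
    root = ET.Element("structure", {"templateId": template_id})

    attrs_el = ET.SubElement(root, "attributes")
    for name, value in attributes.items():
        ET.SubElement(attrs_el, "field", {"name": name}).text = value

    data_el = ET.SubElement(root, "structured")
    for name, value in structured.items():
        ET.SubElement(data_el, "tag", {"name": name}).text = value

    ext_el = ET.SubElement(root, "semiStructured")
    for name, value in semi_structured.items():
        ET.SubElement(ext_el, "tag", {"name": name}).text = value

    return ET.tostring(root, encoding="utf-8")


xml_bytes = build_xml(
    template_id="civil-site-001",
    attributes={"title": "Site description", "author": "engineer-01"},
    structured={"site_area": "1250.50", "site_location": "Xining"},
    semi_structured={"geology_note": "Collapsible loess in the upper layer"},
)
```

Keeping structured and semi-structured values in separate containers lets a consumer treat the fixed tags as a stable schema and the extensible ones as free-form additions.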
The data authentication processor digitally signs the XML structure data output by the structured data collector to ensure that the XML structure data meets the requirements of authenticity and non-repudiation. After signing, it sends the signed XML structure data and the signature status information to the structured data controller.
S4, embedding the document structured data. This step is carried out by the structured data controller and the document editor. The signed XML structure data and the document attribute fields are packaged in the embedded body format shown in FIG. 3. In accordance with the storage format characteristics of the target document file, and without affecting or interfering with that storage format, the embedding position of the structured data body in the target document is determined by embedding point reliability verification carried out in advance, and the structured data body is embedded into the target format file to obtain the document embedded with structured data.
The data verification state confirmation and extraction module receives the signature verification information and the XML structure data from the data authentication processor; preferably, it extracts the XML structure data once the signature verification information indicates that the XML structure data has been signed by the current user.
The to-be-embedded structural body data packaging module receives the document attribute fields from the structured data collector and packages them, together with the XML structure data verified by the data verification state confirmation and extraction module, in the embedded body format shown in FIG. 3.
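The embedded body format of FIG. 3 is not reproduced in the text, so the following is only one possible packaging, not the patent's format. It assumes an envelope delimited by two invented markers (BEGIN and END) around a Base64-encoded JSON object that carries the attribute fields, the XML structure data, and the signature.

```python
import base64
import json
from typing import Dict, Tuple

# Hypothetical delimiters; "%" is not in the Base64 alphabet, so the
# markers cannot occur inside the encoded payload.
BEGIN = b"%%SDE-BEGIN%%"
END = b"%%SDE-END%%"


def pack(attributes: Dict[str, str], xml_bytes: bytes, signature: str) -> bytes:
    """Wrap attribute fields, XML structure data, and signature into one embedded body."""
    envelope = {
        "attributes": attributes,
        "xml": base64.b64encode(xml_bytes).decode("ascii"),
        "signature": signature,
    }
    payload = base64.b64encode(json.dumps(envelope, ensure_ascii=False).encode("utf-8"))
    return BEGIN + payload + END


def unpack(body: bytes) -> Tuple[Dict[str, str], bytes, str]:
    """Reverse of pack(); raises ValueError if the markers are missing."""
    if not (body.startswith(BEGIN) and body.endswith(END)):
        raise ValueError("not an embedded body")
    payload = body[len(BEGIN):len(body) - len(END)]
    envelope = json.loads(base64.b64decode(payload))
    return envelope["attributes"], base64.b64decode(envelope["xml"]), envelope["signature"]


body = pack({"title": "Site description", "template_id": "civil-site-001"},
            b"<structure templateId='civil-site-001'/>", "demo-signature")
```

Encoding the payload as Base64 keeps the body free of braces and backslashes, which matters when the host format, such as RTF, gives those characters special meaning.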
The matching file type and embedding position control module looks up, according to the storage format characteristics of the target document file, the embedding position configuration determined in advance by embedding point reliability verification, obtains the pre-embedding position of the structured data body in the target document, and sends the position information, together with the embedded body packaged by the to-be-embedded structural body data packaging module, to the document editor.
After the structured data controller sends the embedding position information and the embedded body to the document editor, the structured document generating module of the document editor receives them.
On receiving the embedding position information and the embedded body from the structured data controller, the structured document generating module first retrieves the document processed by the document editing and displaying unit module, and then locates the embedding point in the target format document using the received position information. Taking an RTF (Rich Text Format) document as an example, the document body is enclosed by markers, so the tail of the document, that is, the region outside the document body, is used as the embedding position, and the storage format of the target file is neither affected nor interfered with. Finally, once positioning is complete, the received embedded body is embedded as a whole at the embedding point of the target document. After these operations, the document save operation is executed to generate the document embedded with structured data.
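The RTF example can be sketched as appending the packaged body after the closing brace of the outer RTF group. This follows the description's statement that the tail of the document lies outside the document body; whether a given RTF reader tolerates trailing bytes is exactly what the embedding point reliability verification would have to establish. The function names and the sample embedded body are hypothetical.

```python
RTF_PREFIX = b"{\\rtf"


def embed_at_tail(rtf: bytes, embedded_body: bytes) -> bytes:
    """Append an already packaged embedded body after the RTF document group."""
    document = rtf.rstrip()
    if not document.startswith(RTF_PREFIX) or not document.endswith(b"}"):
        raise ValueError("input does not look like a complete RTF document")
    # The document body stays byte-for-byte intact; the embedded body goes
    # on its own line after the closing brace of the outer group.
    return document + b"\n" + embedded_body


def body_unchanged(original: bytes, embedded: bytes) -> bool:
    """Simple sanity check: the original document is a prefix of the result."""
    return embedded.startswith(original.rstrip())


source = b"{\\rtf1\\ansi Site area: 1250.50 m2.}"
result = embed_at_tail(source, b"%%SDE-BEGIN%%cGF5bG9hZA==%%SDE-END%%")
```

Other storage formats would need their own embedding point, which is why the description keeps the position in per-format configuration.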
S5, extracting the structured data from the document embedded with structured data. This step is carried out in sequence by the template generator, the template library, the document editor, the structured data collector, the data authentication processor, the structured data controller, and the data extraction and conversion interface. The document embedded with structured data is read, the structure data is extracted from it according to the embedded body format characteristics shown in FIG. 3, and the related attribute information of the document is obtained; the corresponding template is matched by the template ID attribute value; and the structured data and extensible semi-structured data in the structure data are extracted using the frame structure provided by the template as the guide.
Example 2
FIG. 4 is a flowchart for extracting and processing structured data from a document embedded with structured data according to an embodiment of the present application. The specific steps are as follows:
S51, the user opens a created document embedded with structured data through the document structured data embedding system of FIG. 2.
Preferably, when a user reads, through the document structured data embedding system, a document embedded with structured data that was created by the embedding step S4, the document is first opened and read by the template generator of the system.
When the template generator receives the user's instruction to open the selected document embedded with structured data, the document structure body extraction module opens and reads the document, extracts the structure data from it according to the embedded body format characteristics shown in FIG. 3, obtains the related attribute information of the document, and at the same time removes the structure from the document data to recover the source document as it was before the structure data was embedded. It then starts the template matching module.
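The separation performed in S51 can be sketched as locating the embedded body and returning it alongside the recovered source document. The delimiters BEGIN and END are invented for this example and stand in for whatever features the embedded body format of FIG. 3 actually provides.

```python
from typing import Optional, Tuple

# Hypothetical delimiters of the embedded body.
BEGIN = b"%%SDE-BEGIN%%"
END = b"%%SDE-END%%"


def split_document(data: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (source_document, embedded_body); the body is None if absent."""
    start = data.rfind(BEGIN)
    if start == -1:
        return data, None  # an ordinary document without embedded data
    end = data.find(END, start)
    if end == -1:
        raise ValueError("embedded body is truncated")
    stop = end + len(END)
    embedded_body = data[start:stop]
    # Drop the body and any whitespace that separated it from the document.
    source = (data[:start] + data[stop:]).rstrip()
    return source, embedded_body


document = b"{\\rtf1\\ansi Site area: 1250.50 m2.}\n" + BEGIN + b"cGF5bG9hZA==" + END
source_document, embedded = split_document(document)
```

Returning the unmodified input when no body is found lets the same entry point open original unstructured documents as well.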
S52, matching the structured frame template.
The template matching module extracts, from the obtained document attribute information, the information on the document structured frame template used by the document, and retrieves the corresponding document structured frame template from the template library according to the template ID attribute value.
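Template matching by ID amounts to a keyed lookup. The description says the template libraries are relational databases; the sketch below substitutes in-memory dictionaries for them, and the lookup order (special library first, then general) as well as the attribute key "template_id" are assumptions.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the special and general template libraries.
SPECIAL_TEMPLATES: Dict[str, dict] = {
    "civil-site-001": {"topic": "site description", "tags": ["site_location", "site_area"]},
}
GENERAL_TEMPLATES: Dict[str, dict] = {
    "general-report-001": {"topic": "general report", "tags": []},
}


def match_template(document_attributes: dict) -> Optional[dict]:
    """Find the frame template named by the document's template ID attribute."""
    template_id = document_attributes.get("template_id")
    if template_id is None:
        return None
    if template_id in SPECIAL_TEMPLATES:
        return SPECIAL_TEMPLATES[template_id]
    return GENERAL_TEMPLATES.get(template_id)


matched = match_template({"title": "Site description", "template_id": "civil-site-001"})
```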
S53, creating a tagged data set.
The document frame generation module combines the document structured frame template and the structure data, according to the association among the document frame structure, the structured tags, and the extensible semi-structured tags, into a tagged data set, populated with data, that the document editor can identify, control, and locate. The tagged data set and the source document are then sent to the document editor.
S54, displaying and extracting the document structured data.
After the document editor receives the populated tagged data set and the source document from the template generator, the document loading frame module loads them and distributes the tagged data set to the structured data labeling editing and displaying unit module and the source document data to the document editing and displaying unit module.
The structured data labeling editing and displaying unit module extracts the document structured frame, tag values, and related attributes from the tagged data set and, in synchronization with the document editing and displaying unit module, marks the structured tag information and extensible semi-structured tag information in the document interface by means of separate columns and highlighting. At the same time, it sends the document attributes and structured data to the structured data collector.
S55, displaying the document content.
The document editing and displaying unit module displays the source document in the document interface in its original format and stays synchronized with the structured data labeling editing and displaying unit module.
S56, extracting xml structure data.
And after the structured data collector receives the document attribute and the structured data sent by the document editor, extracting and converting the document attribute and the document attribute into xml structural data and document attribute fields according to the incidence relation between the document frame structure and the corresponding structured data label and extensible semi-structured data label. Since the above process is consistent with the structured data acquisition S3, it will not be described in detail.
S57, verifying the digital signature of the xml structure data.
The data authentication processor verifies the digital signature of the xml structure data sent by the structured data collector. Verification is performed by matching the signer information provided in the document attribute fields against the signer digital certificates approved by the system. When the signer is approved by the system and the digital signature matches the xml structure data, the signature status information is set to verified. This procedure ensures the authenticity and non-repudiation of the xml structure data. After signature verification is completed, the xml structure data and the signature status information are sent to the structured data controller.
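The control flow of S57 (look up the signer, then check the signature against the data) can be sketched as below. This sketch swaps in HMAC-SHA256 with a shared key per signer as a stand-in, because Python's standard library has no certificate-based signatures; unlike the certificate scheme the text describes, HMAC does not provide non-repudiation. The signer registry, key, and status strings are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry of system-approved signers. A real deployment would
# hold digital certificates here, not shared secret keys.
APPROVED_SIGNERS = {"reviewer-01": b"demo-key-not-for-production"}


def sign(xml_text: str, signer: str) -> str:
    """Produce a hex signature over the xml structure data (HMAC stand-in)."""
    key = APPROVED_SIGNERS[signer]
    return hmac.new(key, xml_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(xml_text: str, signer: str, signature: str) -> str:
    """Return a signature status string: verified, mismatch, or unknown_signer."""
    key = APPROVED_SIGNERS.get(signer)
    if key is None:
        return "unknown_signer"  # signer is not approved by the system
    expected = hmac.new(key, xml_text.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return "verified"
    return "mismatch"
```

Both conditions from the text appear as separate branches: an unapproved signer and a signature that does not match the data yield different status values.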
S58, signature status confirmation.
After the structured data controller receives the xml structure data and the signature status information sent by the data authentication processor, the data verification state confirmation and extraction module decides, based on the signature status information, whether to proceed to the next operation.
Preferably, after the xml structure data passes signature status verification, the data verification state confirmation and extraction module extracts the xml structure data and sends it to the data extraction and conversion interface.
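A minimal Python sketch of the gate described in S58 follows. The status value `"verified"` and the `send` callback standing in for the data extraction and conversion interface are assumptions for illustration.

```python
from typing import Callable


def forward_if_verified(xml_text: str,
                        signature_status: str,
                        send: Callable[[str], None]) -> bool:
    """Forward the xml structure data only when the signature status is verified.

    Returns True if the data was forwarded, False if processing stopped here.
    """
    if signature_status != "verified":  # assumed status value
        return False
    send(xml_text)
    return True
```

Data whose status is anything other than verified never reaches the downstream interface, which matches the continue-or-stop decision described above.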
S59, extracting and converting the data.
After the data extraction and conversion interface receives the xml structure data and the document attribute information sent by the structured data controller, it uses serialization to interface with a data application system and convert the data for it, thereby enabling knowledge discovery and extraction.
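One possible form of the serialization step in S59 is to flatten the xml structure data into records and emit JSON for a downstream system. The patent does not name a serialization format; JSON, the record fields, and the xml element names used here are assumptions.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_records(xml_text: str) -> list[dict]:
    """Flatten each labeled value under a <section> element into one record."""
    root = ET.fromstring(xml_text)
    records = []
    for section in root.iter("section"):
        for label in section:
            records.append({
                "section": section.get("title"),
                "kind": label.tag,           # e.g. structured / semiStructured
                "name": label.get("name"),
                "value": label.text,
            })
    return records


def serialize(xml_text: str) -> str:
    """Serialize the records as JSON for a data application system."""
    return json.dumps(xml_to_records(xml_text), ensure_ascii=False)
```

A flat record list is easy for an office automation or business system to load into a table, at the cost of dropping any deeper nesting the xml might carry.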
The invention provides a document structured data embedding method and system. The system consists of a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a data extraction and conversion interface, and a template library. Documents generated with embedded structured data can be used on various terminals, and documents embedded with structured data can be acquired and processed automatically. The degree of document standardization and the required data precision can be controlled effectively, and the manual workload can be greatly reduced.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A method for embedding structured data in a document, comprising the steps of:
(1) constructing a document structured frame template: in accordance with professional document standards, dividing a document, by professional field, document type, and topic, into a frame structure of topic chapters, title paragraphs, and description areas, and associating structured data labels and extensible semi-structured data labels according to the correspondence between the document frame structure and the target content, to form a reusable document structured frame template;
(2) editing and managing documents through a document structured framework template, selecting a corresponding template according to the professional field, the document type and the theme, preloading the template into a document editor, completing the editing and management of the documents in a user interaction mode, and dynamically completing the editing and management of a structured data label and an extensible semi-structured data label corresponding to the document structured framework structure according to a format determined by the template through the interaction mode;
(3) structured data acquisition, namely extracting and converting document data edited by a document editor into xml structural data and document attribute fields according to the incidence relation between a document structured frame template and corresponding structured data labels and extensible semi-structured data labels;
(4) embedding structured document data: taking the document edited by the user according to the template frame, obtained from the process of editing and managing documents through the document structured frame template, as the document body; encapsulating the xml structure data and document attribute fields obtained in the structured data acquisition stage in the embedded body format; determining in advance, through embedding point reliability verification based on the storage format characteristics of the target document file, the embedding position of the structure data in the target document; and embedding the encapsulated data at that position in the target format file to obtain the document embedded with the structured data;
(5) extracting structured data from a document embedded with the structured data: reading the document embedded with the structured data, extracting the structure data in the document according to the characteristics of the embedded form, and acquiring the related attribute information of the document; matching the corresponding template according to the template attribute values; and, using the frame structure provided by the template as the feature, extracting the structured data and the extensible semi-structured data from the structure data.
2. The document structured data embedding method according to claim 1, wherein: the title paragraph includes, but is not limited to, content scope, table style, text font, text indentation, line spacing.
3. The document structured data embedding method according to claim 1, wherein: the structured data tag is used for marking the explicit description and key index parameters related to the subject content, the title paragraph and the description area in the document.
4. The document structured data embedding method according to claim 1, wherein: the extensible semi-structured data tag is used for marking descriptions and key index parameters that relate to the subject content, the title paragraph and the description area but are not explicit.
5. The document structured data embedding method according to claim 1, wherein: the document attribute fields include, but are not limited to, document title, document subject classification, template information, author, document version number, document summary, keywords, digital signer information, and digital signature.
6. A document structured data embedding system, characterized by: the system comprises a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a template library and a data extraction and conversion interface;
the template generator consists of a document structure body extraction module, a template matching module, a document frame generation module and a structured field generation module;
the document editor consists of a document loading frame module, a document editing and displaying unit module, a structured data labeling editing and displaying unit module, a structured document generating module and a document type universal editor interface, and realizes the editing and structured data embedding, displaying and modifying operations of a newly-built document, an embedded structured data document and an original unstructured document;
the structured data collector consists of a data acquisition module, a document attribute and structured data extraction module, and an xml structure data generation module, and extracts and converts the data provided by the document editor into xml structure data and document attribute fields according to the association between the document frame and its corresponding structured data labels and extensible semi-structured data labels;
the data authentication processor uses digital signatures to sign the xml structure data and to verify its signature, ensuring that the xml structure data cannot be tampered with or interfered with without authorization and that it meets the requirements on integrity and usability;
the structured data controller consists of a module for encapsulating the structure data to be embedded, a data verification state confirmation and extraction module, and a file type matching and embedding position control module, and is used to encapsulate the structured data body to be embedded, to package data so that it meets the requirements of the data extraction and conversion interface, and to determine the storage position of the structured data body to be embedded in the target file;
the data extraction and conversion interface implements interfacing and data conversion with a data application system;
the template library consists of a special template library and a general template library, wherein the special template library is a relational database storing special document templates that conform to the classification standard of the professional field and target specific work requirements, and the general template library is a relational database storing general templates that conform to the classification standard of the professional field and target professional characteristics.
7. The document structured data embedding system of claim 6, wherein: the template generator extracts corresponding templates from the template library through the document attributes of the embedded structured data document or according to the professional fields, document types and themes of the original unstructured document and the newly-built document, and associates the structured data tags and the extensible semi-structured data tags according to the corresponding relation between the document frame structure and the target content.
8. The document structured data embedding system of claim 6, wherein: the special template library is a structured document and a structured label set which take a structured data frame as a main body and an extensible semi-structured data frame as an auxiliary body.
9. The document structured data embedding system of claim 6, wherein: the general template library is a structured document frame and a structured label set which take an extensible semi-structured data frame as a main body.
10. The document structured data embedding system of claim 6, wherein: the data application system is an office automation system and/or a business system.
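Steps (4) and (5) of claim 1 can be illustrated, under strong assumptions, for a ZIP-based target format: the xml payload is added as an extra part of the package and read back later. The part path `customXml/structuredData.xml` is hypothetical, the occupancy check is only a trivial stand-in for the embedding point reliability verification, and a real word-processor format may require additional manifest entries that this sketch omits.

```python
import io
import zipfile

# Hypothetical embedding position inside a ZIP-based document package.
EMBED_PATH = "customXml/structuredData.xml"


def embed(package: bytes, xml_text: str) -> bytes:
    """Return a copy of the package with the xml payload added as a new part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Stand-in for embedding point verification: refuse to overwrite.
        if EMBED_PATH in src.namelist():
            raise ValueError("embedding point already occupied")
        for item in src.infolist():
            dst.writestr(item, src.read(item.filename))  # copy original parts
        dst.writestr(EMBED_PATH, xml_text)
    return out.getvalue()


def extract(package: bytes) -> str:
    """Read the embedded xml payload back out of the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        return src.read(EMBED_PATH).decode("utf-8")
```

Because the original parts are copied unchanged, the document body stays readable by tools that ignore the extra part.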

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010024636.5A CN111259202B (en) 2020-01-10 2020-01-10 Document structured data embedding method and system

Publications (2)

Publication Number Publication Date
CN111259202A true CN111259202A (en) 2020-06-09
CN111259202B CN111259202B (en) 2023-08-04

Family

ID=70946913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010024636.5A Active CN111259202B (en) 2020-01-10 2020-01-10 Document structured data embedding method and system

Country Status (1)

Country Link
CN (1) CN111259202B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293523A (en) * 1999-04-05 2000-10-20 Mitsubishi Electric Corp Method and device for processing structured document
US20090150364A1 (en) * 1999-07-16 2009-06-11 Oracle International Corporation Automatic generation of document summaries through use of structured text
JP2005234837A (en) * 2004-02-19 2005-09-02 Fujitsu Ltd Structured document processing method, structured document processing system and its program
CN102855243A (en) * 2011-06-28 2013-01-02 北大方正集团有限公司 Method and device for extracting document structure
CN102982010A (en) * 2011-09-02 2013-03-20 北大方正集团有限公司 Method and device for abstracting document structure
CN102646125A (en) * 2012-02-28 2012-08-22 中国标准化研究院 Structured digital content extraction and reorganization method
CN104572744A (en) * 2013-10-23 2015-04-29 北大方正集团有限公司 Structured document generating method and device
US20180329873A1 (en) * 2015-04-08 2018-11-15 Google Inc. Automated data extraction system based on historical or related data
CN108021632A (en) * 2017-11-23 2018-05-11 中国移动通信集团河南有限公司 Unstructured data and the mutual conversion process method of structural data
CN110442851A (en) * 2019-07-23 2019-11-12 南京国睿信维软件有限公司 The method of power editor is independently limited based on the document automated modular of Office Word and multiple terminals
CN110618983A (en) * 2019-08-15 2019-12-27 复旦大学 JSON document structure-based industrial big data multidimensional analysis and visualization method

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
乐小虬等: "DPaper:一种面向语义出版的结构化论文写作工具设计与实现", 《现代图书情报技术》, no. 11, 15 November 2016 (2016-11-15) *
李宁等: "软件结构化文档编制工具的研究与实现", 《微计算机信息》, no. 36, 25 December 2007 (2007-12-25) *
杨晶等: "一种基于XML的非结构化数据转换方法", 《计算机科学》, 15 November 2017 (2017-11-15) *
熊志刚等: "基于AbiWord的结构化电子病历系统研究", 《中国数字医学》, no. 02, 15 February 2017 (2017-02-15) *
蒋悦等: "基于文档树的XML文件转换", 《计算机工程》, no. 21, 5 November 2003 (2003-11-05) *
许斗等: "XML的半结构化数据表示方法及其在医学文档处理中的应用", 《计算机工程》, no. 01, 20 January 2002 (2002-01-20) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859886A (en) * 2020-06-22 2020-10-30 远光软件股份有限公司 Document generation method and device based on product prototype interface
CN111859886B (en) * 2020-06-22 2024-02-02 远光软件股份有限公司 Document generation method and device based on product prototype interface
CN112488642A (en) * 2020-11-20 2021-03-12 中国电建集团华东勘测设计研究院有限公司 Cloud file management method based on structured tags and taking object as core
CN112488642B (en) * 2020-11-20 2024-03-12 中国电建集团华东勘测设计研究院有限公司 Cloud file management method based on structured labels and taking object as core
CN113191719A (en) * 2021-04-01 2021-07-30 北京优易惠技术有限公司 File processing method and system in bidding field
CN113435167A (en) * 2021-07-21 2021-09-24 北京国基科技股份有限公司 Document generation method and device for assisting technical state file
CN113723071A (en) * 2021-08-31 2021-11-30 重庆富民银行股份有限公司 Electronic file checking method, system, storage medium and equipment
CN113723071B (en) * 2021-08-31 2023-05-09 重庆富民银行股份有限公司 Electronic archive verification method, system, storage medium and equipment
CN114297998A (en) * 2021-11-19 2022-04-08 嘉兴恒创电力设计研究院有限公司 One-key mapping method and system for intensive management of multi-professional drawings of power transmission line
CN114741717A (en) * 2022-06-14 2022-07-12 合肥高维数据技术有限公司 Hidden information embedding and extracting method based on OOXML document
CN114741717B (en) * 2022-06-14 2022-09-06 合肥高维数据技术有限公司 Hidden information embedding and extracting method based on OOXML document
CN115688733A (en) * 2022-12-29 2023-02-03 南方电网科学研究院有限责任公司 Method and system for writing standard document

Also Published As

Publication number Publication date
CN111259202B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN111259202B (en) Document structured data embedding method and system
US10810365B2 (en) Workflow system and method for creating, distributing and publishing content
US11627001B2 (en) Collaborative document editing
US7996767B2 (en) System and method for generating electronic patent application files
CN106682219B (en) Associated document acquisition method and device
TWI237191B (en) Method of extracting a section of a page from a portable document format file, system for extracting a section of a page of a portable document format file, and computer readable medium containing executable instructions
US9122886B2 (en) Track changes permissions
CN1858786B (en) Electronic file formatting annotate and comment system and method
US9230356B2 (en) Document collaboration effects
US8924424B2 (en) Metadata record generation
US20170220858A1 (en) Optical recognition of tables
US9720886B2 (en) System and method for dynamic linking between graphic documents and comment data bases
US20200012709A1 (en) Automatic document generation systems and methods
CN114330233A (en) Method for realizing correlation between electronic form content and file through file bottom
CN110211581B (en) Laboratory automatic voice recognition recording identification system and method
US8719690B2 (en) Method and system for automatic data aggregation
CN110471892B (en) Revit file data collection method and related device
CN112783482B (en) Visual form generation method, device, equipment and storage medium
US20070220439A1 (en) Information Management Device
CN111079375B (en) Information sorting method and device, computer storage medium and terminal
US7730105B2 (en) Time sharing managing apparatus, document creating apparatus, document reading apparatus, time sharing managing method, document creating method, and document reading method
CN116467402A (en) Work package document management method and system
CN117829128A (en) Intelligent manufacturing standard extraction system
CN114218468A (en) Spreadjs-based data capture display method
CN117688345A (en) Data service method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant