CN110941616A - Method, device and equipment for generating associated data in publication and storage medium - Google Patents

Info

Publication number
CN110941616A
Authority
CN
China
Prior art keywords
entity object
isli
finished product
tree
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911171719.0A
Other languages
Chinese (zh)
Other versions
CN110941616B (en)
Inventor
杨燕菲
杨雷鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201911171719.0A
Publication of CN110941616A
Application granted
Publication of CN110941616B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Abstract

The invention provides a method, a device, equipment and a storage medium for generating associated data in a publication. The method comprises: extracting, from document data, a content entity object tree and layout object reference data corresponding to the content entity objects; constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; determining a finished product entity object tree according to the ISLI association tree; and outputting ISLI association information, which comprises the content entity objects, the finished product entity objects and the ISLI association nodes. The method automatically generates the associated data between the document data and the content entity objects, which makes generation efficient and the data easy to maintain and manage.

Description

Method, device and equipment for generating associated data in publication and storage medium
Technical Field
The embodiment of the invention relates to computer technology, in particular to a method, a device, equipment and a storage medium for generating associated data in a publication.
Background
Content-centric composite publications distributed in multiple formats are becoming the trend in the printing and publishing industry. To adapt to this trend and make maximum use of resources, there is an urgent need to establish International Standard Link Identifier (ISLI) associations between content entity objects and finished product entity objects.
In the publishing industry, a content entity object is data that conveys information to readers, while a finished product entity object is data that presents a content entity object in a particular display form. In the prior art, the association between a content entity object and a finished product entity object is usually established manually, by adding to the content entity object the position data of that object within the finished product entity object. This approach is tedious, time-consuming and labor-intensive, and the resulting data is difficult to modify and maintain.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for generating associated data in a publication, so as to improve the efficiency of generating the associated data.
In a first aspect, the present invention provides a method for generating associated data in a publication, the method comprising:
extracting a content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; the ISLI association tree comprises at least one ISLI association node, and the ISLI association node is used for expressing the positioning relation between the content entity object and the finished product entity object;
determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
and outputting ISLI associated information, wherein the ISLI associated information comprises a content entity object, a finished product entity object and an ISLI associated node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extracting the content entity object tree and the layout object reference data corresponding to the content entity object from the document data specifically includes:
typesetting the document data to obtain a content entity object tree and layout object reference data corresponding to the content entity object;
and constructing object information of the content entity object aiming at each content entity object.
Optionally, the object information of the content entity object includes:
content entity object identification, content entity type and content entity hierarchy.
Optionally, the constructing of an International Standard Link Identifier (ISLI) association tree according to the layout object reference data corresponding to the content entity object specifically includes:
constructing ISLI association nodes; each content entity object corresponds to one ISLI association node;
for each ISLI association node, obtaining layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI associated node includes:
the content entity object, the object information of the content entity object, the layout object reference data, the large sample area list and the finished product entity object identification.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type and a large sample object interval.
Optionally, determining a finished product entity object tree according to the ISLI association tree specifically includes:
initializing the finished product entity object tree;
creating a finished product entity object list according to the large sample area list associated with the ISLI associated node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished entity object, inserting the finished entity object into a finished entity object tree.
Optionally, after inserting the finished entity object into the finished entity object tree for each finished entity object, further comprising:
acquiring an updated finished product entity object identifier of a finished product entity object;
and replacing the finished product entity object identifier in the ISLI associated node with the updated finished product entity object identifier.
Optionally, the object information of the finished product entity object includes: finished product entity object identification and a large sample area.
Optionally, creating a finished product entity object list according to the large sample area list associated with the ISLI associated node specifically includes:
constructing a finished product entity object for each large sample area.
Optionally, the outputting the ISLI related information specifically includes:
for each content entity object, obtaining object information of the content entity object;
aiming at each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI associated node, obtaining object information of the ISLI associated node;
and outputting the ISLI association information.
In a second aspect, the present invention provides an apparatus for generating associated data in a publication, the apparatus comprising:
the extraction module is used for extracting the content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
the building module is used for constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; the ISLI association tree comprises at least one ISLI association node, and the ISLI association node is used for expressing the positioning relation between the content entity object and the finished product entity object;
the building module is also used for determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
and the output module is used for outputting the ISLI associated information, wherein the ISLI associated information comprises a content entity object, a finished product entity object and an ISLI associated node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extraction module is specifically configured to:
typesetting the document data to obtain a content entity object tree and layout object reference data corresponding to the content entity object;
and constructing object information of the content entity object aiming at each content entity object.
Optionally, the object information of the content entity object includes:
content entity object identification, content entity type and content entity hierarchy.
Optionally, the building module is specifically configured to:
constructing ISLI association nodes; each content entity object corresponds to one ISLI association node;
for each ISLI association node, obtaining layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI associated node includes:
the content entity object, the object information of the content entity object, the layout object reference data, the large sample area list and the finished product entity object identification.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type and a large sample object interval.
Optionally, the building module is specifically configured to:
initializing the finished product entity object tree;
creating a finished product entity object list according to the large sample area list associated with the ISLI associated node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished product entity object, inserting the finished product entity object into the finished product entity object tree.
Optionally, the building module is specifically configured to:
acquiring an updated finished product entity object identifier of a finished product entity object;
and replacing the finished product entity object identifier in the ISLI associated node with the updated finished product entity object identifier.
Optionally, the object information of the finished product entity object includes: finished product entity object identification and a large sample area.
Optionally, the building module is specifically configured to:
constructing a finished product entity object for each large sample area.
Optionally, the output module is specifically configured to:
for each content entity object, obtaining object information of the content entity object;
aiming at each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI associated node, obtaining node information of the ISLI associated node;
and outputting the ISLI association information.
In a third aspect, the present invention provides an apparatus comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being adapted to perform, when the program is executed, the method of generating associated data in a publication according to the first aspect or any of its optional implementations.
In a fourth aspect, the present invention provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of generating associated data in a publication according to the first aspect or any of its optional implementations.
The invention provides a method, a device, equipment and a storage medium for generating associated data in a publication. The method obtains a content entity object tree and layout object reference data corresponding to the content entity objects, constructs an ISLI association tree according to the layout object reference data, and determines a finished product entity object tree according to the ISLI association tree. In this way the association data between the finished product entity objects and the content entity objects is generated automatically, which makes generation efficient and the data easy to maintain and manage.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for generating associated data in a publication according to an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of a content entity object provided by the present invention;
FIG. 3 is a diagram of a content entity object provided by the present invention;
FIG. 4 is a schematic structural diagram illustrating an associated data generating apparatus in a publication according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of the structure of an apparatus according to an exemplary embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, the association between a content entity object and a finished product entity object is usually established manually, that is, the position data of the content entity object within the finished product entity object is added to the content entity object by hand. This approach is tedious, time-consuming and labor-intensive, and the resulting data is difficult to modify and maintain. The embodiment of the invention provides a method, a device, equipment and a storage medium for generating associated data, which are used for improving the efficiency of generating the associated data.
Fig. 1 is a flow chart illustrating a method for generating associated data in a publication according to an exemplary embodiment of the present invention. As shown in fig. 1, the method for generating associated data in a publication provided by the present invention includes the following steps:
S110, extracting the content entity object tree and the layout object reference data corresponding to the content entity object from the document data.
More specifically, the document data refers to data in which a content entity object is presented in a certain display form. The document data may present the content entity object in the form of a web page, an electronic book, a paper book, and the like. The document data may be based on XML, or may be in other formats.
The content entity object tree comprises at least one content entity object. A content entity object is data that conveys some information to the reader. The content entity object may include:
(1.1) the entirety of the publication.
(1.2) a base layer component of the publication, which refers to a basic segment in the document data, that is, an identifiable data entity object based on the typesetting domain, such as a single chapter, section, title, paragraph, illustration, table, audio or video clip, and the like.
(1.3) an application layer component of the publication, which refers to an application segment in the document data, that is, an identifiable resource entity object based on a certain application field, such as test questions and answers in a textbook, or entries and definitions in a reference book.
Extracting the content entity object tree from the document data specifically includes the following steps:
S111, typesetting the XML-format document data to generate an XML-format content entity object tree and the layout object reference data corresponding to the content entity object.
S112, finding content entity objects on the XML-format content entity object tree according to the base-layer query condition, creating the object information of each content entity object, and recording it in the content entity object information list.
S113, finding content entity objects on the XML-format content entity object tree according to the application-layer query condition, creating the object information of each content entity object, and recording it in the content entity object information list.
In S112 and S113, the object information of the content entity object includes the following contents:
(2.1) content entity object identification, used for uniquely identifying the content entity object.
(2.2) content entity type, which refers to the layout type corresponding to the content entity object, for example: a block, an article, a paragraph, a segment of text, a table or a cell, etc.
(2.3) content entity hierarchy, which indicates whether the content entity object is base layer data or application layer data.
Content entity object information is now illustrated by way of example. FIG. 2 is a schematic diagram of a content entity object provided by the present invention. As shown in FIG. 2, a travel manual includes a plurality of chapters, each corresponding to a tourist attraction. Each chapter includes a plurality of sections that introduce the overview, history, scenic spots, traffic and other aspects of the attraction, and each section is composed of a title and paragraphs. FIG. 3 is a schematic diagram of a content entity object provided by the present invention. As shown in FIG. 3, the travel manual may also include multimedia interactive effects; for example, when the reader scans a traffic route with a reading tool, navigation can be started in a map tool. Object information of some of the content entity objects of the travel manual is shown in Table 1 below.
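Steps S112 and S113 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the tag names, the `id` attribute and the two tag sets standing in for the base-layer and application-layer query conditions are all assumptions, since the description does not specify the XML schema or the query conditions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical query conditions: the description does not name the tags.
BASE_LAYER_TAGS = {"chapter", "section", "title", "paragraph"}
APPLICATION_LAYER_TAGS = {"navigation"}


@dataclass
class ContentEntityInfo:
    object_id: str    # (2.1) content entity object identification
    entity_type: str  # (2.2) content entity type (here simply the tag name)
    hierarchy: str    # (2.3) "base" or "application"


def collect_content_entities(xml_text):
    """Walk the XML content tree twice: base layer (S112), then application layer (S113)."""
    root = ET.fromstring(xml_text)
    infos = []
    for hierarchy, tags in (("base", BASE_LAYER_TAGS),
                            ("application", APPLICATION_LAYER_TAGS)):
        for element in root.iter():  # document order
            if element.tag in tags:
                infos.append(
                    ContentEntityInfo(element.get("id", ""), element.tag, hierarchy)
                )
    return infos


# Hypothetical fragment of the travel manual used as an example in the text.
SAMPLE = """<book id="B1">
  <chapter id="C1">
    <section id="S1">
      <title id="T1">Traffic</title>
      <paragraph id="P1">Take <navigation id="N1">bus 5</navigation>.</paragraph>
    </section>
  </chapter>
</book>"""

entities = collect_content_entities(SAMPLE)
```

Running the two queries as separate passes mirrors the order of S112 and S113, so base-layer records are listed before application-layer records in the information list.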
TABLE 1 object information of a part of the content entity objects of a travel manual
(Table 1 is reproduced as an image in the original publication: Figure BDA0002288895440000071)
S120, constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data.
More specifically, constructing the ISLI association tree according to the layout object reference data includes the following steps:
S121, constructing ISLI association nodes. That is, the ISLI association tree is initialized by constructing one ISLI association node for each content entity object.
S122, for each ISLI association node, obtaining the layout object reference data of the ISLI association node. That is, the ISLI association nodes in the ISLI association tree are traversed to obtain the layout object reference data of each node.
S123, determining the large sample area list of the ISLI association node according to the layout object reference data, wherein the large sample area list comprises at least one large sample area.
The object information of the large sample area includes:
(3.1) large sample area identifier, used for uniquely identifying the area position of the content entity object.
(3.2) page index, used for representing the typesetting page where the content entity object is located.
(3.3) page area, used for representing the area of the content entity object in the typesetting page.
(3.4) page type, for example: page, block, column, partition within a column, row, or segment within a row.
(3.5) large sample object interval, which represents the interval where the content entity object is located, wherein each area in the typesetting page comprises a plurality of intervals.
The object information of the large sample area is illustrated with reference to FIG. 3. The area position data sets of the content entity objects "navigation N1" and "navigation N3" each include only one large sample area, whereas the content entity object "navigation N2" occupies two text lines and therefore includes two large sample areas. The object information of the large sample areas of the three content entity objects is shown in Table 2 below.
TABLE 2 Large sample area list
(Table 2 is reproduced as an image in the original publication: Figure BDA0002288895440000081)
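The five fields (3.1) to (3.5) and the one-area-per-text-line behaviour described for "navigation N1" and "navigation N2" can be sketched as follows. The layout reference format (a list of dictionaries with `page`, `box` and `interval` keys), the rectangle representation and the identifier scheme are assumptions made for illustration; the actual values of Table 2 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LargeSampleArea:
    area_id: str            # (3.1) large sample area identifier
    page_index: int         # (3.2) typesetting page holding the content entity object
    page_area: tuple        # (3.3) assumed rectangle: (x, y, width, height)
    page_type: str          # (3.4) e.g. "page", "block", "column", "row"
    object_interval: tuple  # (3.5) assumed (start, end) interval inside the area


def areas_from_layout_reference(entity_id, layout_refs):
    """Create one large sample area per layout reference entry.

    `layout_refs` is a hypothetical format: one dict per text line occupied
    by the content entity object.
    """
    return [
        LargeSampleArea(
            area_id=f"{entity_id}-A{i}",
            page_index=ref["page"],
            page_area=ref["box"],
            page_type=ref.get("type", "row"),
            object_interval=ref["interval"],
        )
        for i, ref in enumerate(layout_refs, start=1)
    ]


# "navigation N1" fits on one text line; "navigation N2" spans two lines.
n1_areas = areas_from_layout_reference(
    "N1", [{"page": 3, "box": (72, 200, 300, 14), "interval": (10, 25)}]
)
n2_areas = areas_from_layout_reference(
    "N2",
    [
        {"page": 3, "box": (72, 214, 300, 14), "interval": (40, 60)},
        {"page": 3, "box": (72, 228, 300, 14), "interval": (0, 12)},
    ],
)
```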
In this embodiment, the object information of the ISLI-associated node includes:
(4.1) the content entity object, represented by the content entity object identifier, which specifies the content entity object corresponding to the ISLI association node.
(4.2) the object information of the content entity object, which specifies the object information of the content entity object corresponding to the ISLI association node.
(4.3) the layout object reference data, which specifies the layout object corresponding to the ISLI association node.
(4.4) the large sample area list, which specifies the large sample area list associated with the ISLI association node.
(4.5) the finished product entity object identification, which specifies the finished product entity object associated with the ISLI association node.
The object information of the finished product entity object comprises a finished product entity object identifier and a large sample area.
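Steps S121 to S123 and the node fields (4.1) to (4.5) can be sketched as a small recursive builder. The nested-dictionary content tree, the `layout_refs` mapping and the choice to let the ISLI association tree mirror the content tree are assumptions for illustration; the description only requires one ISLI association node per content entity object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IsliNode:
    content_entity_id: str                                  # (4.1)
    content_entity_info: dict                               # (4.2)
    layout_reference: list                                  # (4.3)
    large_sample_areas: list = field(default_factory=list)  # (4.4)
    finished_entity_id: Optional[str] = None                # (4.5) filled in later
    children: list = field(default_factory=list)


def build_isli_tree(content_tree, layout_refs):
    """Build one ISLI association node per content entity object.

    content_tree: {"id": ..., "type": ..., "children": [...]} (hypothetical shape)
    layout_refs:  content entity id -> list of layout reference entries (hypothetical)
    """
    refs = layout_refs.get(content_tree["id"], [])
    node = IsliNode(                                   # S121
        content_entity_id=content_tree["id"],
        content_entity_info={"type": content_tree["type"]},
        layout_reference=refs,                         # S122
        # S123: in this sketch each reference entry becomes one large sample area
        large_sample_areas=[dict(ref) for ref in refs],
    )
    node.children = [
        build_isli_tree(child, layout_refs)
        for child in content_tree.get("children", [])
    ]
    return node


content_tree = {
    "id": "S1",
    "type": "section",
    "children": [{"id": "N2", "type": "navigation"}],
}
layout_refs = {"N2": [{"page": 3, "line": 7}, {"page": 3, "line": 8}]}
isli_root = build_isli_tree(content_tree, layout_refs)
```

The `finished_entity_id` field starts empty because the finished product entity object is only determined in the next step.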
S130, determining a finished product entity object tree according to the ISLI association tree.
More specifically, determining the finished product entity object tree according to the ISLI association tree includes the following steps:
S131, initializing the finished product entity object tree.
Here, initialization means constructing the subtree nodes of the region position data object tree from the document data.
S132, creating a finished product entity object list according to the large sample area list associated with each ISLI association node.
The finished product entity object list comprises at least one finished product entity object. The ISLI association nodes in the ISLI association tree are traversed, and for each ISLI association node the associated large sample area list is determined. Each large sample area in the list is then traversed and a finished product entity object is constructed for it, which yields the finished product entity object list.
S133, for each finished product entity object, inserting the finished product entity object into the finished product entity object tree.
That is, each finished product entity object in the finished product entity object list is traversed and inserted into the finished product entity object tree.
S134, after the finished product entity object is inserted into the finished product entity object tree, obtaining the updated finished product entity object identifier of the finished product entity object.
If a finished product entity object already exists at the insertion position on the finished product entity object tree, the identifier of the finished product entity object to be inserted is unified with the identifier of the finished product entity object already at that position, which gives the updated finished product entity object identifier.
S135, replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
That is, once the finished product entity object identifier has been updated, the identifier stored in the ISLI association node is replaced with the updated one.
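Steps S131 to S135 can be sketched as follows. Several points are assumptions rather than details from the description: the tree is modelled as two levels (page index, then area rectangle), "unifying" identifiers is interpreted as reusing the identifier of the object already at the insertion position, and each node keeps a list of finished product identifiers because a node may own more than one large sample area.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class FinishedEntity:
    entity_id: str  # finished product entity object identifier
    area: tuple     # (page_index, box) locating the large sample area


class FinishedEntityTree:
    """Assumed two-level shape: page index -> {box -> FinishedEntity}."""

    def __init__(self):
        self.pages = {}  # S131: start with an empty tree

    def insert(self, entity):
        """Insert the entity (S133) and return its updated identifier (S134)."""
        page, box = entity.area
        slot = self.pages.setdefault(page, {})
        existing = slot.get(box)
        if existing is not None:
            # An object already occupies this position: unify the identifiers
            # by reusing the existing one.
            return existing.entity_id
        slot[box] = entity
        return entity.entity_id


def determine_finished_tree(isli_nodes):
    """isli_nodes: list of dicts with an "areas" list (hypothetical format)."""
    tree = FinishedEntityTree()
    ids = count(1)
    for node in isli_nodes:
        node["finished_ids"] = []
        for area in node["areas"]:  # S132: one finished entity per large sample area
            entity = FinishedEntity(f"F{next(ids)}", area)
            updated_id = tree.insert(entity)
            node["finished_ids"].append(updated_id)  # S135: store the updated id
    return tree


nodes = [
    {"id": "N1", "areas": [(3, (72, 200, 300, 14))]},
    {"id": "N9", "areas": [(3, (72, 200, 300, 14)), (3, (72, 214, 300, 14))]},
]
finished_tree = determine_finished_tree(nodes)
```

In this example the first area of the hypothetical node "N9" coincides with the area of "N1", so both nodes end up pointing at the same finished product entity object.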
S140, outputting the ISLI association information.
More specifically, the ISLI association information includes the content entity objects, the finished product entity objects and the ISLI association nodes. Outputting the ISLI association information specifically includes:
S141, for each content entity object, obtaining the object information of the content entity object.
That is, the content entity objects on the content entity object tree are traversed, and the object information corresponding to each content entity object is output to the XML content resource.
S142, for each finished product entity object, obtaining the object information of the finished product entity object.
That is, the finished product entity objects in the finished product entity object tree are traversed, and the object information of each finished product entity object is output to the finished product resource.
S143, for each ISLI association node, obtaining the node information of the ISLI association node.
That is, the ISLI association nodes on the ISLI association tree are traversed to obtain the node information of each ISLI association node.
S144, outputting the ISLI association information.
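The output step S141 to S144 can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names are hypothetical, and for brevity this sketch writes all three kinds of information into a single XML document, whereas the description outputs content entity information to the XML content resource and finished product entity information to the finished product resource.

```python
import xml.etree.ElementTree as ET


def export_isli_information(content_entities, finished_entities, isli_nodes):
    """Serialize the three parts of the ISLI association information to XML text.

    All inputs are lists of plain dicts (a hypothetical in-memory format).
    """
    root = ET.Element("isliAssociation")

    contents = ET.SubElement(root, "contentEntities")
    for entity in content_entities:  # S141
        ET.SubElement(
            contents,
            "contentEntity",
            id=entity["id"],
            type=entity["type"],
            hierarchy=entity["hierarchy"],
        )

    finished = ET.SubElement(root, "finishedEntities")
    for entity in finished_entities:  # S142
        ET.SubElement(
            finished, "finishedEntity", id=entity["id"], page=str(entity["page"])
        )

    links = ET.SubElement(root, "isliNodes")
    for node in isli_nodes:  # S143
        ET.SubElement(
            links, "isliNode", source=node["content_id"], target=node["finished_id"]
        )

    return ET.tostring(root, encoding="unicode")  # S144


xml_text = export_isli_information(
    [{"id": "N1", "type": "text segment", "hierarchy": "application"}],
    [{"id": "F1", "page": 3}],
    [{"content_id": "N1", "finished_id": "F1"}],
)
```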
The method for generating associated data in a publication provided by this embodiment obtains a content entity object tree and layout object reference data corresponding to the content entity objects, constructs an ISLI association tree according to the layout object reference data, and determines a finished product entity object tree according to the ISLI association tree. In this way the association data between the finished product entity objects and the content entity objects is generated automatically, which makes generation efficient and the data easy to maintain and manage.
FIG. 4 is a schematic structural diagram of an apparatus for generating associated data in a publication according to an exemplary embodiment of the present invention. As shown in FIG. 4, the apparatus 200 for generating associated data in a publication provided by the present invention includes:
an extracting module 210, configured to extract a content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
a building module 220, configured to construct an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; the ISLI association tree comprises at least one ISLI association node, and the ISLI association node is used for expressing the positioning relation between the content entity object and the finished product entity object;
the building module 220 is further configured to determine a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
an output module 230, configured to output ISLI association information, where the ISLI association information includes a content entity object, a finished product entity object, and an ISLI association node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extracting module 210 is specifically configured to:
typesetting the document data to obtain a content entity object tree and layout object reference data corresponding to the content entity object;
and constructing object information of the content entity object aiming at each content entity object.
Optionally, the object information of the content entity object includes:
a content entity object identifier, a content entity type, and a content entity hierarchy.
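The object information listed above maps naturally onto a small tree node type. The sketch below is one possible representation, not the one used in the embodiment: the class name `ContentEntity`, the field names, and the example type values ("publication", "chapter", "figure") are assumptions, and the hierarchy is modelled simply as depth in the tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ContentEntity:
    entity_id: str      # content entity object identifier
    entity_type: str    # content entity type (example values are assumed)
    level: int          # content entity hierarchy, modelled as tree depth
    children: List["ContentEntity"] = field(default_factory=list)

    def add(self, entity_id: str, entity_type: str) -> "ContentEntity":
        """Attach a child one level below this entity and return it."""
        child = ContentEntity(entity_id, entity_type, self.level + 1)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["ContentEntity"]:
        """Yield this entity and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


book = ContentEntity("C0", "publication", 0)
chapter = book.add("C1", "chapter")
chapter.add("C2", "figure")
levels = [(e.entity_id, e.level) for e in book.walk()]
```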
Optionally, the building module 220 is specifically configured to:
building the ISLI association nodes; wherein each content entity object corresponds to one ISLI association node;
for each ISLI association node, obtaining the layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI association node includes:
the referenced content entity object, the object information of the content entity object, the layout object reference data, the large sample area list, and the finished product entity object identifier.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type, and a large sample object interval.
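How a large sample area list might be derived from layout object reference data can be sketched as follows. The text does not specify the derivation, so everything here is an assumption for illustration: each layout reference is taken to carry a page index, a bounding box, and an object index; references on the same page are merged into one area whose page area is the union of the boxes; and the page type is fixed to the placeholder value "body".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); coordinate convention assumed


@dataclass
class LayoutRef:
    page_index: int
    box: Box
    object_index: int


@dataclass
class LargeSampleArea:
    area_id: str                      # large sample area identifier
    page_index: int                   # page index
    page_area: Box                    # page area
    page_type: str                    # page type (placeholder value here)
    object_interval: Tuple[int, int]  # large sample object interval


def large_sample_areas(node_id: str, refs: List[LayoutRef]) -> List[LargeSampleArea]:
    """Group layout references by page and merge each group into one area."""
    by_page: Dict[int, List[LayoutRef]] = {}
    for ref in refs:
        by_page.setdefault(ref.page_index, []).append(ref)
    areas = []
    for page in sorted(by_page):
        group = by_page[page]
        # Union of the bounding boxes of all references on this page.
        box = (min(r.box[0] for r in group), min(r.box[1] for r in group),
               max(r.box[2] for r in group), max(r.box[3] for r in group))
        interval = (min(r.object_index for r in group),
                    max(r.object_index for r in group))
        areas.append(LargeSampleArea(f"{node_id}-A{page}", page, box, "body", interval))
    return areas


refs = [LayoutRef(0, (10, 10, 100, 50), 0),
        LayoutRef(0, (10, 60, 100, 90), 1),
        LayoutRef(1, (10, 10, 100, 40), 2)]
areas = large_sample_areas("N1", refs)
```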
Optionally, the building module 220 is specifically configured to:
initializing the finished product entity object tree;
creating a finished product entity object list according to the large sample area list of the ISLI association node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished entity object, inserting the finished entity object into a finished entity object tree.
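The three steps above (initialize, create the list, insert) can be sketched as follows. The shape of the finished product entity object tree is not described in the text, so the sketch assumes a simple two-level tree keyed by page index; the names `FinishedProduct`, `FinishedProductTree`, `build_tree`, and the temporary identifier scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); coordinate convention assumed


@dataclass
class FinishedProduct:
    product_id: str   # finished product entity object identifier
    page_index: int
    area: Box         # area of the page occupied by the object


@dataclass
class FinishedProductTree:
    # Assumed shape: root -> page index -> finished product entity objects.
    pages: Dict[int, List[FinishedProduct]] = field(default_factory=dict)

    def insert(self, product: FinishedProduct) -> None:
        self.pages.setdefault(product.page_index, []).append(product)


def build_tree(area_lists: Dict[str, List[Tuple[int, Box]]]) -> FinishedProductTree:
    tree = FinishedProductTree()                      # initialize the tree
    for node_id, areas in area_lists.items():
        # One finished product entity object per large sample area.
        products = [FinishedProduct(f"{node_id}-FP{i}", page, box)
                    for i, (page, box) in enumerate(areas)]
        for product in products:                      # insert each object
            tree.insert(product)
    return tree


# Hypothetical large sample area lists for two ISLI association nodes.
area_lists = {
    "N1": [(0, (10, 10, 100, 90)), (1, (10, 10, 100, 40))],
    "N2": [(1, (10, 50, 100, 120))],
}
tree = build_tree(area_lists)
```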
Optionally, the building module 220 is specifically configured to:
acquiring an updated finished product entity object identifier of a finished product entity object;
and replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
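The identifier update can be illustrated with a small sketch. It assumes, purely for the example, that finished product entity objects receive temporary identifiers before insertion and final identifiers of the form `FP0001`, `FP0002`, ... afterwards, and that each ISLI association node stores its references in a `product_ids` list; none of these details come from the text.

```python
from typing import Dict, List


def apply_updated_ids(tree_order: List[str], nodes: List[dict]) -> Dict[str, str]:
    """Assign updated identifiers in tree order and rewrite node references."""
    # Hypothetical numbering scheme: FP0001, FP0002, ... in tree order.
    mapping = {old: f"FP{i:04d}" for i, old in enumerate(tree_order, start=1)}
    for node in nodes:
        # Replace each stored identifier with its updated value, if any.
        node["product_ids"] = [mapping.get(pid, pid) for pid in node["product_ids"]]
    return mapping


nodes = [{"content_id": "C1", "product_ids": ["tmp-b", "tmp-a"]},
         {"content_id": "C2", "product_ids": ["tmp-c"]}]
mapping = apply_updated_ids(["tmp-a", "tmp-b", "tmp-c"], nodes)
```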
Optionally, the object information of the finished product entity object includes: a finished product entity object identifier and a full-page area.
Optionally, the building module 220 is specifically configured to:
constructing a finished product entity object for each large sample area.
Optionally, the output module 230 is specifically configured to:
for each content entity object, obtaining object information of the content entity object;
for each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI association node, obtaining object information of the ISLI association node;
and outputting the ISLI association information.
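The output step can be sketched as collecting the three kinds of object information and serializing them together. The text does not name an output format, so JSON is an assumption here, as are the class names, the field names, and the key names of the resulting record.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ContentInfo:
    entity_id: str
    entity_type: str
    level: int


@dataclass
class ProductInfo:
    product_id: str
    page_area: List[int]


@dataclass
class NodeInfo:
    content_id: str
    product_ids: List[str]


def isli_association_info(contents: List[ContentInfo],
                          products: List[ProductInfo],
                          nodes: List[NodeInfo]) -> str:
    """Gather the object information of every object and emit one JSON record."""
    record = {
        "content_entities": [asdict(c) for c in contents],
        "finished_products": [asdict(p) for p in products],
        "isli_nodes": [asdict(n) for n in nodes],
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


text = isli_association_info(
    [ContentInfo("C1", "chapter", 1)],
    [ProductInfo("FP0001", [10, 10, 100, 90])],
    [NodeInfo("C1", ["FP0001"])],
)
parsed = json.loads(text)
```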
Fig. 5 is a schematic structural diagram of a device according to an exemplary embodiment of the present invention. As shown in Fig. 5, the device 300 provided in this embodiment includes a processor 310 and a memory 320.
The memory 320 is configured to store computer-executable instructions;
the processor 310 is configured to execute the computer-executable instructions stored in the memory to implement the steps of the method for generating associated data in a publication in the above-described embodiments. For details, reference may be made to the relevant description in the foregoing embodiments of the method for generating associated data in a publication.
Optionally, the memory 320 may be separate from or integrated with the processor 310.
When the memory 320 is provided separately, the device further includes a bus 330 for connecting the memory 320 and the processor 310.
An embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the method for generating associated data in a publication performed by the above device is implemented.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. A method for generating associated data in a publication, the method comprising:
extracting a content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one of the content entity objects;
establishing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; the ISLI association tree comprises at least one ISLI association node, and the ISLI association node is used for expressing the positioning relation between the content entity object and the finished product entity object;
determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree includes at least one of the finished product entity objects;
and outputting ISLI association information, wherein the ISLI association information comprises the content entity object, the finished product entity object and the ISLI association node.
2. The method of claim 1, wherein the content entity object comprises: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
3. The method according to claim 2, wherein the extracting a content entity object tree and layout object reference data corresponding to the content entity object from the document data specifically comprises:
typesetting the document data to obtain the content entity object tree and layout object reference data corresponding to the content entity object;
and, for each content entity object, constructing object information of the content entity object.
4. The method of claim 3, wherein the object information of the content entity object comprises:
a content entity object identifier, a content entity type, and a content entity hierarchy.
5. The method according to any one of claims 1 to 4, wherein the constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data corresponding to the content entity object specifically comprises:
constructing the ISLI association node; wherein each content entity object corresponds to one of the ISLI association nodes;
for each ISLI association node, obtaining layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
6. The method of claim 5, wherein the object information of the ISLI association node comprises:
the referenced content entity object, the object information of the content entity object, the layout object reference data, the large sample area list, and the finished product entity object identifier.
7. The method according to claim 5, wherein the object information of the large sample area comprises:
a large sample area identifier, a page index, a page area, a page type, and a large sample object interval.
8. The method according to any one of claims 1 to 4, wherein the determining a finished product entity object tree according to the ISLI association tree specifically comprises:
initializing the finished product entity object tree;
creating a finished product entity object list according to the large sample area list of the ISLI association node; wherein the finished product entity object list comprises at least one finished product entity object;
for each of the finished product entity objects, inserting the finished product entity object into the finished product entity object tree.
9. The method of claim 8, wherein after the inserting, for each of the finished product entity objects, the finished product entity object into the finished product entity object tree, the method further comprises:
acquiring an updated finished product entity object identifier of the finished product entity object;
and replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
10. The method of claim 8, wherein the object information of the finished product entity object comprises: a finished product entity object identifier and a full-page area.
11. The method according to claim 8, wherein the creating a finished product entity object list according to the large sample area list of the ISLI association node specifically comprises:
constructing the finished product entity object for each large sample area.
12. The method according to any one of claims 1 to 4, wherein the outputting of the ISLI association information specifically includes:
for each content entity object, obtaining object information of the content entity object;
for each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI association node, obtaining node information of the ISLI association node;
and outputting the ISLI association information.
13. An apparatus for generating associated data in a publication, the apparatus comprising:
the extraction module is used for extracting the content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one of the content entity objects;
the building module is used for building an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; the ISLI association tree comprises at least one ISLI association node, and the ISLI association node is used for expressing the positioning relation between the content entity object and the finished product entity object;
the building module is also used for determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
and the output module is used for outputting ISLI association information, wherein the ISLI association information comprises the content entity object, the finished product entity object and the ISLI association node.
14. An apparatus, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured to execute the method for generating associated data in a publication as claimed in any one of claims 1 to 12 when the program is executed.
15. A computer-readable storage medium characterized by comprising instructions which, when executed on a computer, cause the computer to execute the method for generating associated data in a publication as claimed in any one of claims 1 to 12.
CN201911171719.0A 2019-11-26 2019-11-26 Method, device and equipment for generating associated data in publication and storage medium Active CN110941616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911171719.0A CN110941616B (en) 2019-11-26 2019-11-26 Method, device and equipment for generating associated data in publication and storage medium

Publications (2)

Publication Number Publication Date
CN110941616A true CN110941616A (en) 2020-03-31
CN110941616B CN110941616B (en) 2023-03-14

Family

ID=69908802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911171719.0A Active CN110941616B (en) 2019-11-26 2019-11-26 Method, device and equipment for generating associated data in publication and storage medium

Country Status (1)

Country Link
CN (1) CN110941616B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567303A (en) * 2010-12-24 2012-07-11 北京大学 Typesetting method and device for variable official document data
CN104239305A (en) * 2013-06-07 2014-12-24 阿里巴巴集团控股有限公司 Electronic document generating and displaying method and apparatus
US9130832B1 (en) * 2014-10-09 2015-09-08 Splunk, Inc. Creating entity definition from a file
US9146954B1 (en) * 2014-10-09 2015-09-29 Splunk, Inc. Creating entity definition from a search result set
CN106610929A (en) * 2015-10-26 2017-05-03 北大方正集团有限公司 Method and device for typesetting digital publishing structured content file
CN108829758A (en) * 2018-05-28 2018-11-16 郑州悉知信息科技股份有限公司 A kind of Website construction method and apparatus
CN109670160A (en) * 2017-10-13 2019-04-23 北大方正集团有限公司 The typesetting processing method and device of file

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
方婷云: "基于XML的社科期刊自适应排版技术研究", 《中国优秀博硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632010A (en) * 2020-12-29 2021-04-09 深圳市天朗时代科技有限公司 File storage method, device and equipment of ISLI metadata and readable storage medium
CN112632010B (en) * 2020-12-29 2024-03-19 深圳市天朗时代科技有限公司 File storage method, device and equipment of ISLI metadata and readable storage medium
CN112766937A (en) * 2021-04-07 2021-05-07 中国科学院成都文献情报中心 Knowledge work organization and processing system and method

Also Published As

Publication number Publication date
CN110941616B (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230628

Address after: 3007, Hengqin International Financial Center Building, No. 58 Huajin Street, Hengqin New District, Zhuhai City, Guangdong Province, 519030

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.
