Disclosure of Invention
Embodiments of the invention provide a method, an apparatus, a device and a storage medium for generating associated data in a publication, which improve the efficiency of generating the associated data.
In a first aspect, the present invention provides a method for generating associated data in a publication, the method comprising:
extracting a content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
establishing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; wherein the ISLI association tree comprises at least one ISLI association node, and the ISLI association node expresses the positioning relation between the content entity object and a finished product entity object;
determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
and outputting ISLI association information, wherein the ISLI association information comprises the content entity object, the finished product entity object and the ISLI association node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extracting, from the document data, of the content entity object tree and the layout object reference data corresponding to the content entity object specifically includes:
typesetting the document data to obtain the content entity object tree and the layout object reference data corresponding to the content entity object;
and constructing, for each content entity object, object information of the content entity object.
Optionally, the object information of the content entity object includes:
content entity object identification, content entity type and content entity hierarchy.
Optionally, the establishing of the International Standard Link Identifier (ISLI) association tree according to the layout object reference data corresponding to the content entity object specifically includes:
building ISLI association nodes, wherein each content entity object corresponds to one ISLI association node;
for each ISLI association node, obtaining the layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI association node includes:
the content entity object, the object information of the content entity object, the layout object reference data, the large sample area list and the finished product entity object identifier.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type and a large sample object interval.
Optionally, determining a finished product entity object tree according to the ISLI association tree specifically includes:
initializing the finished product entity object tree;
creating a finished product entity object list according to a large sample area list associated with the ISLI associated node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished entity object, inserting the finished entity object into a finished entity object tree.
Optionally, after the inserting of each finished product entity object into the finished product entity object tree, the method further comprises:
acquiring an updated finished product entity object identifier of the finished product entity object;
and replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
Optionally, the object information of the finished product entity object includes: a finished product entity object identifier and a large sample area.
Optionally, the creating of the finished product entity object list according to the large sample area list associated with the ISLI association node specifically includes:
constructing a finished product entity object for each large sample area.
Optionally, the outputting of the ISLI association information specifically includes:
for each content entity object, obtaining the object information of the content entity object;
for each finished product entity object, obtaining the object information of the finished product entity object;
for each ISLI association node, obtaining the object information of the ISLI association node;
and outputting the ISLI association information.
In a second aspect, the present invention provides an apparatus for generating associated data in a publication, the apparatus comprising:
the extraction module is used for extracting the content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
the building module is used for building an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; wherein the ISLI association tree comprises at least one ISLI association node, and the ISLI association node expresses the positioning relation between the content entity object and a finished product entity object;
the building module is also used for determining a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
and the output module is used for outputting the ISLI associated information, wherein the ISLI associated information comprises a content entity object, a finished product entity object and an ISLI associated node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extraction module is specifically configured to:
typesetting the document data to obtain a content entity object tree and layout object reference data corresponding to the content entity object;
for each content entity object, object information of the content entity object is constructed.
Optionally, the object information of the content entity object includes:
content entity object identification, content entity type and content entity hierarchy.
Optionally, the building module is specifically configured to perform:
building ISLI association nodes, wherein each content entity object corresponds to one ISLI association node;
for each ISLI association node, obtaining the layout object reference data of the ISLI association node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI association node includes:
the content entity object, the object information of the content entity object, the layout object reference data, the large sample area list and the finished product entity object identifier.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type and a large sample object interval.
Optionally, the building module is specifically configured to perform:
initializing the finished product entity object tree;
creating a finished product entity object list according to the large sample area list associated with the ISLI associated node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished entity object, inserting the finished entity object into the finished entity object tree.
Optionally, the building module is further configured to perform:
acquiring an updated finished product entity object identifier of a finished product entity object;
and replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
Optionally, the object information of the finished product entity object includes: a finished product entity object identifier and a large sample area.
Optionally, the building module is specifically configured to perform:
constructing a finished product entity object for each large sample area.
Optionally, the output module is specifically configured to:
for each content entity object, obtaining object information of the content entity object;
for each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI association node, obtaining object information of the ISLI association node;
and outputting the ISLI association information.
In a third aspect, the present invention provides an apparatus comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, wherein, when the program is executed, the processor performs the method of generating associated data in a publication according to the first aspect or any optional implementation thereof.
In a fourth aspect, the present invention provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of generating associated data in a publication according to the first aspect or any optional implementation thereof.
The invention provides a method, an apparatus, a device and a storage medium for generating associated data in a publication. The method includes obtaining a content entity object tree and layout object reference data corresponding to the content entity object, constructing an ISLI association tree according to the layout object reference data, and determining a finished product entity object tree according to the ISLI association tree. With this method, the association data between the finished product entity object and the content entity object can be generated automatically, so that generation is efficient and maintenance and management are convenient.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, the association relationship between a content entity object and a finished product entity object is usually established manually; that is, position data of the content entity object within the finished product entity object is added to the content entity object by hand. This approach is tedious, time-consuming and labor-intensive, and the resulting data is difficult to modify and complex to maintain. The embodiments of the invention provide a method, an apparatus, a device and a storage medium for generating associated data, which improve the efficiency of generating the associated data.
Fig. 1 is a flow chart illustrating a method for generating associated data in a publication according to an exemplary embodiment of the present invention. As shown in fig. 1, the method for generating associated data in a publication provided by the present invention includes the following steps:
S110, extracting a content entity object tree and layout object reference data corresponding to the content entity object from the document data.
More specifically, the document data refers to data in which a content entity object is presented in a certain display form. The document data may present the content entity object in the form of a web page, an electronic book, a paper book, or the like. The document data may be based on XML, or may be in other formats.
The content entity object tree comprises at least one content entity object. A content entity object is data that conveys some information to the reader. The content entity object may include:
(1.1) the entirety of the publication.
(1.2) a base layer component of the publication, which is a basic segment in the document data, namely an identifiable data entity object based on the typesetting domain, such as a single chapter, section, title, paragraph, illustration, table, audio clip or video clip.
(1.3) an application layer component of the publication, which is an application segment in the document data, namely an identifiable resource entity object based on a certain application field, such as test questions and answers in a textbook, or entries and definitions in a reference book.
Extracting the content entity object tree from the document data specifically includes the following steps:
S111, typesetting the XML-based document data to generate an XML-based content entity object tree and the layout object reference data corresponding to the content entity object.
S112, finding content entity objects on the XML-format content entity object tree according to a base-layer query condition, creating the object information of each content entity object found, and recording it in a content entity object information list.
S113, finding content entity objects on the XML-format content entity object tree according to an application-layer query condition, creating the object information of each content entity object found, and recording it in the content entity object information list.
In S112 and S113, the object information of the content entity object includes the following:
(2.1) a content entity object identifier, which uniquely identifies the content entity object.
(2.2) a content entity type, which is the layout type corresponding to the content entity object, for example a block, an article, a paragraph, a segment of text, a table or a cell.
(2.3) a content entity hierarchy, which indicates whether the content entity object is base layer data or application layer data.
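Steps S112 and S113 can be illustrated with a minimal Python sketch. It assumes the content entity object tree is available as XML text and that the query conditions are simple sets of tag names; the tag sets, the `id` attribute, the `ContentEntityInfo` class and the sample document are all hypothetical and are not defined by this description.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical query conditions: tag names treated as base layer or
# application layer components. The description does not fix these.
BASE_LAYER_TAGS = {"chapter", "section", "title", "paragraph"}
APPLICATION_LAYER_TAGS = {"navigation"}


@dataclass
class ContentEntityInfo:
    object_id: str    # (2.1) content entity object identifier
    entity_type: str  # (2.2) layout type, here simply the tag name
    hierarchy: str    # (2.3) "base" or "application"


def collect_content_entities(xml_text):
    """Walk the XML tree and record object information (S112 and S113)."""
    root = ET.fromstring(xml_text)
    infos = []
    for element in root.iter():
        if element.tag in BASE_LAYER_TAGS:
            hierarchy = "base"
        elif element.tag in APPLICATION_LAYER_TAGS:
            hierarchy = "application"
        else:
            continue  # element matches neither query condition
        infos.append(ContentEntityInfo(element.get("id", ""), element.tag, hierarchy))
    return infos


# Made-up sample document, loosely modelled on the travel manual example.
SAMPLE = """
<manual id="M1">
  <chapter id="C1">
    <section id="S1">
      <title id="T1">Traffic</title>
      <paragraph id="P1">Take bus 5. <navigation id="N1">Route A</navigation></paragraph>
    </section>
  </chapter>
</manual>
"""

content_entity_infos = collect_content_entities(SAMPLE)
for info in content_entity_infos:
    print(info)
```

The sketch merges the two queries into one traversal for brevity; running them as two separate passes, as S112 and S113 are worded, would fill the same list.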
Content entity object information is now illustrated by way of example. Fig. 2 is a schematic diagram of a content entity object provided by the present invention. As shown in fig. 2, a travel manual includes a plurality of chapters, each of which corresponds to a tourist attraction. Each chapter includes a plurality of sections that introduce aspects of the tourist attraction such as its overview, history, scenic spots and traffic, and each section is composed of a title and a paragraph. Fig. 3 is a schematic diagram of a content entity object provided by the present invention. As shown in fig. 3, the travel manual may further include multimedia interactive effects; for example, when the reader scans a traffic route with a recognition tool, navigation may be started in a map tool. Object information of some of the content entity objects of the travel manual is shown in Table 1 below.
TABLE 1 object information of a part of the content entity objects of a travel manual
S120, constructing an International Standard Link Identifier (ISLI) association tree according to the layout object reference data.
More specifically, constructing the ISLI association tree according to the layout object reference data includes the following steps:
S121, building ISLI association nodes. The ISLI association tree is initialized; that is, for each content entity object, an ISLI association node corresponding to the content entity object is constructed.
S122, for each ISLI association node, obtaining the layout object reference data of the ISLI association node. That is, the ISLI association nodes in the ISLI association tree are traversed to obtain the layout object reference data of each node.
S123, determining a large sample area list of the ISLI association node according to the layout object reference data. The large sample area list comprises at least one large sample area.
The object information of the large sample area includes:
(3.1) a large sample area identifier, which uniquely identifies the area position of the content entity object.
(3.2) a page index, which indicates the typesetting page where the content entity object is located.
(3.3) a page area, which indicates the area of the content entity object in the typesetting page.
(3.4) a page type, for example: page, block, column, partition within a column, row, or segment within a row.
(3.5) a large sample object interval, which indicates the interval occupied by the content entity object, where each area in the typesetting page comprises a plurality of intervals.
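The five fields above can be pictured as a small record type. The following Python sketch is only an illustration: the field types, the box and interval encodings, the shape of the `fragments` input and the identifier scheme are assumptions, since the description does not specify how layout object reference data is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LargeSampleArea:
    area_id: str            # (3.1) large sample area identifier
    page_index: int         # (3.2) typesetting page holding the object
    page_area: tuple        # (3.3) assumed (x, y, width, height) box
    page_type: str          # (3.4) e.g. "row" or "segment within a row"
    object_interval: tuple  # (3.5) assumed (start, end) offsets in the area


def areas_from_layout_reference(entity_id, fragments):
    """Create one large sample area per laid-out fragment of an entity.

    `fragments` is a hypothetical shape for layout object reference data:
    a list of dicts with the keys "page", "box", "type" and "interval".
    """
    return [
        LargeSampleArea(
            area_id=f"{entity_id}-A{number}",
            page_index=fragment["page"],
            page_area=fragment["box"],
            page_type=fragment["type"],
            object_interval=fragment["interval"],
        )
        for number, fragment in enumerate(fragments, start=1)
    ]


# Made-up layout data for an entity "E1" that wraps across two text lines.
e1_fragments = [
    {"page": 3, "box": (120, 400, 80, 12), "type": "segment within a row", "interval": (14, 20)},
    {"page": 3, "box": (40, 414, 60, 12), "type": "segment within a row", "interval": (0, 5)},
]
e1_areas = areas_from_layout_reference("E1", e1_fragments)
print([area.area_id for area in e1_areas])
```

An entity that wraps across two text lines yields two records, while an entity that fits on one line yields a single record.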
The object information of the large sample area is illustrated with reference to fig. 3. The area position data sets of the content entity objects "navigation N1" and "navigation N3" each include only one large sample area. Because the content entity object "navigation N2" occupies two text lines, it includes two large sample areas. The object information of the large sample areas of the three content entity objects is shown in Table 2 below.
TABLE 2 Large sample area list
In this embodiment, the object information of the ISLI association node includes:
(4.1) the content entity object, represented by the content entity object identifier, which specifies the content entity object corresponding to the ISLI association node.
(4.2) the object information of the content entity object, which specifies the object information of the content entity object corresponding to the ISLI association node.
(4.3) the layout object reference data, which specifies the layout object corresponding to the ISLI association node.
(4.4) the large sample area list, which specifies the large sample area list associated with the ISLI association node.
(4.5) the finished product entity object identifier, which specifies the finished product entity object associated with the ISLI association node.
The object information of the finished product entity object comprises a finished product entity object identifier and a large sample area.
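Steps S121 to S123 and the node fields (4.1) to (4.5) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `ContentEntity` and `IsliNode` classes, the dictionary used for layout object reference data, and the area identifiers are hypothetical, and the three steps are folded into one recursive pass rather than three separate traversals.

```python
from dataclasses import dataclass, field


@dataclass
class ContentEntity:
    object_id: str
    entity_type: str
    hierarchy: str
    children: list = field(default_factory=list)


@dataclass
class IsliNode:
    content_entity_id: str                                   # (4.1)
    content_entity_info: dict                                # (4.2)
    layout_reference: list = field(default_factory=list)     # (4.3)
    area_list: list = field(default_factory=list)            # (4.4)
    finished_entity_ids: list = field(default_factory=list)  # (4.5), filled in later
    children: list = field(default_factory=list)


def build_isli_tree(entity, layout_refs):
    """Mirror the content entity tree with one ISLI node per entity."""
    # S121: one ISLI association node per content entity object.
    node = IsliNode(
        content_entity_id=entity.object_id,
        content_entity_info={"type": entity.entity_type, "hierarchy": entity.hierarchy},
    )
    # S122: look up the layout object reference data for this node.
    node.layout_reference = layout_refs.get(entity.object_id, [])
    # S123: derive one (hypothetically named) area per layout fragment.
    node.area_list = [
        f"{entity.object_id}-A{n}" for n, _ in enumerate(node.layout_reference, start=1)
    ]
    node.children = [build_isli_tree(child, layout_refs) for child in entity.children]
    return node


# Made-up data: a paragraph containing two navigation entities.
paragraph = ContentEntity("P1", "paragraph", "base", [
    ContentEntity("N1", "segment of text", "application"),
    ContentEntity("N2", "segment of text", "application"),
])
layout_refs = {"P1": ["line 1", "line 2", "line 3"], "N1": ["line 1"], "N2": ["line 2", "line 3"]}
isli_tree = build_isli_tree(paragraph, layout_refs)
print([(child.content_entity_id, child.area_list) for child in isli_tree.children])
```

Field (4.5) is kept as a list here on the assumption that a node with several large sample areas may relate to several finished product entity objects; the description itself only names a finished product entity object identifier.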
S130, determining a finished product entity object tree according to the ISLI association tree.
More specifically, determining the finished product entity object tree according to the ISLI association tree includes the following steps:
S131, initializing the finished product entity object tree.
Here, initialization constructs the tree and subtree nodes of the region position data objects according to the document data.
S132, creating a finished product entity object list according to the large sample area list associated with the ISLI association node.
The finished product entity object list comprises at least one finished product entity object. The ISLI association nodes in the ISLI association tree are traversed, and for each ISLI association node, the large sample area list associated with it is determined. Each large sample area in the list is then traversed and a finished product entity object is constructed for it, which yields the finished product entity object list.
S133, for each finished product entity object, inserting the finished product entity object into the finished product entity object tree.
Each finished product entity object in the finished product entity object list is traversed and inserted into the finished product entity object tree.
S134, after the finished product entity object is inserted into the finished product entity object tree, obtaining the updated finished product entity object identifier of the finished product entity object.
If a finished product entity object already exists at the insertion position in the finished product entity object tree, the identifier of the finished product entity object to be inserted is unified with the identifier of the finished product entity object at that position, which gives the updated finished product entity object identifier.
S135, replacing the finished product entity object identifier in the ISLI association node with the updated finished product entity object identifier.
After the finished product entity object identifier is updated, the finished product entity object identifier in the ISLI association node is replaced with the updated one.
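The identifier unification in S133 to S135 can be sketched in Python. The sketch assumes that an insertion position is the pair of page index and page box, that the tree is a two-level mapping from page to box, and that unification keeps the identifier already in the tree; none of these choices is stated in the description, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FinishedEntity:
    object_id: str  # finished product entity object identifier
    area: tuple     # simplified large sample area: (page_index, box)


@dataclass
class IsliNode:
    content_entity_id: str
    area_list: list
    finished_entity_ids: list = field(default_factory=list)


class FinishedEntityTree:
    """Assumed shape: page index -> {box -> finished entity}."""

    def __init__(self):
        self.pages = {}  # S131: start from an empty tree

    def insert(self, entity):
        """Insert an entity and return its identifier after unification."""
        page_index, box = entity.area
        slot = self.pages.setdefault(page_index, {})
        existing = slot.get(box)
        if existing is not None:
            # S134: position already occupied, reuse the existing identifier.
            return existing.object_id
        slot[box] = entity
        return entity.object_id


def build_finished_tree(isli_nodes):
    tree = FinishedEntityTree()
    for node in isli_nodes:
        # S132: one finished product entity object per large sample area.
        entities = [
            FinishedEntity(f"F-{node.content_entity_id}-{n}", area)
            for n, area in enumerate(node.area_list, start=1)
        ]
        # S133 and S135: insert, then store the (possibly updated) identifiers.
        node.finished_entity_ids = [tree.insert(entity) for entity in entities]
    return tree


# Made-up data: two content entities laid out at the same position on page 3.
shared_area = (3, (0, 0, 100, 12))
nodes = [
    IsliNode("P1", [shared_area]),
    IsliNode("N1", [shared_area, (3, (0, 14, 100, 12))]),
]
finished_tree = build_finished_tree(nodes)
print(nodes[0].finished_entity_ids, nodes[1].finished_entity_ids)
```

Because both nodes reference the same position, the second node ends up pointing at the identifier created for the first, so the finished product entity object tree holds one object per position.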
S140, outputting the ISLI association information.
More specifically, the ISLI association information includes the content entity object, the finished product entity object and the ISLI association node. Outputting the ISLI association information specifically includes:
S141, for each content entity object, obtaining the object information of the content entity object.
The content entity objects on the content entity object tree are traversed, and the object information corresponding to each content entity object is output to the XML content resource.
S142, for each finished product entity object, obtaining the object information of the finished product entity object.
The finished product entity objects in the finished product entity object tree are traversed, and the object information of each finished product entity object is output to the finished product resource.
S143, for each ISLI association node, obtaining the node information of the ISLI association node.
The ISLI association nodes on the ISLI association tree are traversed to obtain the node information of each ISLI association node.
S144, outputting the ISLI association information.
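One possible way to picture S141 to S144 is to serialize the three kinds of records into a single XML document. The Python sketch below is an assumption-laden illustration: the element names, the flat dictionaries used as records, and the choice of one combined XML output are hypothetical, since the description does not define an output schema.

```python
import xml.etree.ElementTree as ET


def export_isli_information(content_entities, finished_entities, isli_nodes):
    """Serialize the three record lists into one XML string.

    Each record is a flat dict; its items become XML attributes.
    """
    root = ET.Element("isliAssociationInformation")
    groups = (
        ("contentResource", "contentEntity", content_entities),      # S141
        ("finishedResource", "finishedEntity", finished_entities),   # S142
        ("isliAssociations", "isliNode", isli_nodes),                # S143
    )
    for group_tag, item_tag, records in groups:
        group = ET.SubElement(root, group_tag)
        for record in records:
            attributes = {key: str(value) for key, value in record.items()}
            ET.SubElement(group, item_tag, attributes)
    return ET.tostring(root, encoding="unicode")  # S144


# Made-up records for a single navigation entity.
exported_xml = export_isli_information(
    content_entities=[{"id": "N1", "type": "segment of text", "hierarchy": "application"}],
    finished_entities=[{"id": "F1", "page": 3}],
    isli_nodes=[{"content": "N1", "finished": "F1"}],
)
print(exported_xml)
```

Keeping the ISLI association nodes in their own group means the content resource and the finished product resource can be updated independently, with the nodes carrying only the identifiers that link them.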
The method for generating associated data in a publication provided by this embodiment includes obtaining a content entity object tree and layout object reference data corresponding to the content entity object, constructing an ISLI association tree according to the layout object reference data, and determining a finished product entity object tree according to the ISLI association tree. With this method, the association data between the finished product entity object and the content entity object can be generated automatically, so that generation is efficient and maintenance and management are convenient.
Fig. 4 is a schematic structural diagram of an apparatus for generating associated data in a publication according to an exemplary embodiment of the present invention. As shown in fig. 4, the apparatus 200 for generating associated data provided by the present invention includes:
an extracting module 210, configured to extract a content entity object tree and layout object reference data corresponding to the content entity object from the document data; wherein the content entity object tree comprises at least one content entity object;
the building module 220 is configured to build an International Standard Link Identifier (ISLI) association tree according to the layout object reference data; wherein the ISLI association tree comprises at least one ISLI association node, and the ISLI association node expresses the positioning relation between the content entity object and a finished product entity object;
the building module 220 is further configured to determine a finished product entity object tree according to the ISLI association tree; wherein the finished product entity object tree comprises at least one finished product entity object;
the output module 230 is configured to output ISLI associated information, where the ISLI associated information includes a content entity object, a finished product entity object, and an ISLI associated node.
Optionally, the content entity object includes: the overall content of the publication, a base layer component of the publication, or an application layer component of the publication.
Optionally, the extracting module 210 is specifically configured to:
typesetting the document data to obtain a content entity object tree and layout object reference data corresponding to the content entity object;
for each content entity object, object information of the content entity object is constructed.
Optionally, the object information of the content entity object includes:
content entity object identification, content entity type and content entity hierarchy.
Optionally, the building module 220 is specifically configured to:
building an ISLI association node; each content entity object corresponds to one ISLI association node;
aiming at each ISLI associated node, obtaining layout object reference data of the ISLI associated node;
determining a large sample area list of the ISLI association node according to the layout object reference data; wherein the large sample area list comprises at least one large sample area.
Optionally, the object information of the ISLI association node includes:
the content entity object, the object information of the content entity object, the layout object reference data, the large sample area list and the finished product entity object identifier.
Optionally, the object information of the large sample area includes:
a large sample area identifier, a page index, a page area, a page type and a large sample object interval.
Optionally, the building module 220 is specifically configured to:
initializing the finished product entity object tree;
creating a finished product entity object list according to a large sample area list associated with the ISLI associated node; wherein the finished product entity object list comprises at least one finished product entity object;
for each finished entity object, inserting the finished entity object into a finished entity object tree.
Optionally, the building module 220 is specifically configured to:
acquiring an updated finished product entity object identifier of a finished product entity object;
and replacing the finished product entity object identifier in the ISLI associated node with the updated finished product entity object identifier.
Optionally, the object information of the finished product entity object includes: a finished product entity object identifier and a large sample area.
Optionally, the building module 220 is specifically configured to perform:
constructing a finished product entity object for each large sample area.
Optionally, the output module 230 is specifically configured to:
for each content entity object, obtaining object information of the content entity object;
aiming at each finished product entity object, obtaining object information of the finished product entity object;
for each ISLI associated node, obtaining object information of the ISLI associated node;
and outputting the ISLI association information.
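The division of work among the extraction module 210, the building module 220 and the output module 230 can be sketched as three small Python classes. This is a deliberately flattened illustration: the trees are reduced to dictionaries, the document is assumed to be already typeset into a mapping from content entity identifier to position labels, and every class and method name is hypothetical.

```python
class ExtractionModule:  # corresponds to module 210
    def extract(self, document):
        # Assumed input: {content entity id: [position label, ...]}.
        entities = list(document)
        layout_refs = dict(document)
        return entities, layout_refs


class BuildingModule:  # corresponds to module 220
    def build_isli_tree(self, entities, layout_refs):
        # One ISLI node per content entity, holding its areas.
        return {
            entity_id: {"areas": list(layout_refs[entity_id]), "finished_ids": []}
            for entity_id in entities
        }

    def build_finished_tree(self, isli_tree):
        finished = {}  # position label -> finished product entity identifier
        for node in isli_tree.values():
            for area in node["areas"]:
                # Reuse the identifier if this position is already present.
                finished_id = finished.setdefault(area, f"F{len(finished) + 1}")
                node["finished_ids"].append(finished_id)
        return finished


class OutputModule:  # corresponds to module 230
    def output(self, entities, finished, isli_tree):
        return {
            "content_entities": entities,
            "finished_entities": finished,
            "isli_nodes": isli_tree,
        }


class AssociatedDataApparatus:  # corresponds to apparatus 200
    def __init__(self):
        self.extraction = ExtractionModule()
        self.building = BuildingModule()
        self.output_module = OutputModule()

    def run(self, document):
        entities, layout_refs = self.extraction.extract(document)
        isli_tree = self.building.build_isli_tree(entities, layout_refs)
        finished = self.building.build_finished_tree(isli_tree)
        return self.output_module.output(entities, finished, isli_tree)


# Made-up document: N2 spans two rows and shares the first row with N1.
result = AssociatedDataApparatus().run({"N1": ["p3-row5"], "N2": ["p3-row5", "p3-row6"]})
print(result["isli_nodes"]["N2"]["finished_ids"])
```

The building module appears once but exposes two operations, matching the description in which the same module both builds the ISLI association tree and determines the finished product entity object tree.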
Fig. 5 is a schematic structural diagram of an apparatus according to an exemplary embodiment of the present invention, and as shown in fig. 5, an apparatus 300 provided in this embodiment includes: a processor 310 and a memory 320.
A memory 320 for storing computer-executable instructions;
the processor 310 is configured to execute the computer-executable instructions stored in the memory to implement the steps of the method for generating associated data in a publication in the above-described embodiments. For details, reference may be made to the description of the foregoing embodiments of the method for generating associated data in a publication.
Alternatively, the memory 320 may be separate or integrated with the processor 310.
When the memory 320 is provided separately, the device further includes a bus 330 for connecting the memory 320 and the processor 310.
The embodiment of the present invention further provides a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the method for generating associated data in a publication performed by the above device is implemented.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be performed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.