CN111241212B

CN111241212B - Knowledge graph construction method and device, storage medium and electronic equipment

Info

Publication number: CN111241212B
Application number: CN202010066621.5A
Authority: CN
Inventors: 李慧; 许蕾; 郝吉芳; 杨卓士; 商晓健; 王炳乾
Original assignee: BOE Technology Group Co Ltd
Current assignee: BOE Technology Group Co Ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2023-10-24
Anticipated expiration: 2040-01-20
Also published as: CN111241212A; WO2021147786A1

Abstract

The disclosure belongs to the technical field of knowledge graph construction, and relates to a construction method and device of a knowledge graph in the art field, a storage medium and electronic equipment. The method comprises the following steps: carrying out first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data; performing second preprocessing on unstructured data and semi-structured data in the internal artistic data source and the external artistic data source to obtain second structured data; carrying out fusion processing on the first structured data and the second structured data to generate fused artistic data, wherein the fused artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity; and generating an artistic triple according to the artistic entity and the artistic relationship, and generating an art-domain knowledge graph according to the artistic triple.

Description

Knowledge graph construction method and device, storage medium and electronic equipment

Technical Field

The disclosure relates to the technical field of knowledge graph construction, in particular to a construction method of a knowledge graph in an art field, a construction device of the knowledge graph in the art field, a computer readable storage medium and electronic equipment.

Background

A knowledge graph, also called a scientific knowledge map, uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among knowledge resources. It comprises a series of graphs showing how knowledge develops and how it is structured, and offers a way to better organize, manage, and understand the massive amount of information on the Internet. The knowledge graph is also a prototype of next-generation search engines, making search more semantic and intelligent. Knowledge graphs currently fall into two types: general knowledge graphs and domain knowledge graphs. A domain knowledge graph, also called an industry knowledge graph or a vertical knowledge graph, is generally oriented to a specific domain and corresponds to an industry knowledge base built on semantic technology. Because a domain knowledge graph is built from industry data, its data schema is stricter and richer, and it places higher demands on the depth and accuracy of domain knowledge.

However, existing methods for constructing domain knowledge graphs have several problems: methods developed for English professional-domain knowledge graphs cannot be fully applied to Chinese professional-domain knowledge graphs; existing methods struggle to balance the scale and the accuracy of the acquired professional knowledge; and they have difficulty fusing domain knowledge acquired from multiple data sources.

In view of this, there is a need in the art for a new method and apparatus for constructing art-domain knowledge graphs.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The disclosure aims to provide a construction method for an art-domain knowledge graph, a corresponding construction apparatus, a computer-readable storage medium, and an electronic device, so as to overcome, at least to some extent, problems caused by the limitations of the related art, such as the small scale, low accuracy, and incomplete knowledge of existing knowledge graphs.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the present disclosure, there is provided a method for constructing an art domain knowledge graph, the method comprising: carrying out first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data; performing second preprocessing on unstructured data and semi-structured data in the internal artistic data source and the external artistic data source to obtain second structured data; carrying out fusion processing on the first structured data and the second structured data to generate fusion artistic data; wherein the fusion artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity; and generating an artistic triplet according to the artistic entity and the artistic relationship, and generating an artistic domain knowledge graph according to the artistic triplet.

In an exemplary embodiment of the present disclosure, the first preprocessing of the structured data in the internal artistic data source and the external artistic data source to generate first structured data includes: carrying out data cleaning on the structured data in the internal artistic data source and the external artistic data source; performing repeatability test on the data cleaning result of the structured data to generate repeatability test data; and generating a data dictionary and an error correction dictionary according to the repeatability test data, and generating first structured data based on the data dictionary.

In an exemplary embodiment of the present disclosure, performing data cleaning on the structured data in the internal and external artistic data sources includes: performing single-value attribute determination on the structured data in the internal artistic data source and the external artistic data source to obtain single-value structured data; acquiring a first structured entity and a first structured relationship in the single-value structured data, and counting the results of the single-value attribute determination to obtain a multi-value data table; if the multi-value data table does not contain multi-value data, taking the first structured entity and the first structured relationship as the data cleaning result; and if the multi-value data table contains multi-value data, obtaining a second structured entity and a second structured relationship according to the multi-value data table as the data cleaning result.

In an exemplary embodiment of the disclosure, obtaining the second structured entity and the second structured relationship according to the multi-value data table as the data cleaning result includes: updating the data dictionary or the error correction dictionary according to the multi-value data table; and obtaining the second structured entity and the second structured relationship as the data cleaning result according to the update result of the data dictionary or the error correction dictionary.

In an exemplary embodiment of the disclosure, the performing a repeatability test on the data cleansing result of the structured data, generating repeatability test data includes: performing repeatability test on the artwork entity on the data cleaning result of the original structured data to generate an artwork repeatability test result; if the artwork repeatability test results are the same, carrying out artist entity repeatability test on the data cleaning results to generate artist repeatability test results; if the artist repeatability test results are the same, carrying out repeatability test on the creation time entity on the data cleaning result to generate creation time repeatability test results; if the results of the repeated test of the creation time are the same, determining that the artwork entity is a repeated artwork; and carrying out fusion processing on the repeated artworks, and generating repeatability test data according to the fusion processing result passing the verification.

In an exemplary embodiment of the present disclosure, the method further comprises: if the artist repeatability test results are different or the creation time repeatability test results are different, determining that the artwork entity is a same-named artwork (a different artwork that happens to share the name); and performing de-duplication processing on the same-named artwork, and generating the repeatability test data according to the de-duplication result.

In one exemplary embodiment of the present disclosure, the first structured data includes target artwork data, target artist data, and target artistic institution data; the fusing processing is performed on the first structured data and the second structured data to generate fused artistic data, which includes: carrying out fusion processing on the reference artist data in the second structured data and the target artist data to generate fusion artist data; carrying out fusion processing on the reference artwork data in the second structured data and the target artwork data to generate fusion artwork data; and carrying out fusion processing on the reference artistic institution data and the target artistic institution data in the second structured data to generate fusion artistic institution data.

In an exemplary embodiment of the disclosure, the fusing the reference artist data in the second structured data with the target artist data to generate fused artist data includes: performing vector conversion on the reference artist data and the target artist data in the second structured data according to the word vector model to obtain an artist word vector sequence; calculating artist similarity vectors among the artist word vector sequences, and carrying out weighted calculation according to first weights of the artist similarity vectors; obtaining artist similarity according to the weighted calculation result, and judging whether the artist similarity is larger than a first threshold value or not; and carrying out fusion processing on the reference artist data and the target artist data corresponding to the artist similarity larger than the first threshold value to generate fusion artist data.
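The weighted-similarity fusion test above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the word-vector model is replaced by hand-written toy vectors, cosine similarity is assumed as the per-attribute similarity measure, the weighted sum is assumed to be normalized by the total weight, and the attribute names (`name`, `nationality`, `birth_year`), weights, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def weighted_similarity(
    ref: Dict[str, List[float]],
    tgt: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Weight the per-attribute similarities (the 'similarity vector') into one score."""
    sims = {attr: cosine(ref[attr], tgt[attr]) for attr in weights}
    total = sum(weights.values())  # assumed normalization so scores stay in [-1, 1]
    return sum(weights[attr] * sims[attr] for attr in weights) / total


def should_fuse(ref, tgt, weights, threshold) -> bool:
    """Fuse the reference and target artist only if the score exceeds the first threshold."""
    return weighted_similarity(ref, tgt, weights) > threshold
```

With name and nationality vectors identical, birth-year vectors orthogonal, and weights 0.5/0.3/0.2, the score is 0.8, so a hypothetical first threshold of 0.75 would trigger fusion while 0.9 would not.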

In an exemplary embodiment of the disclosure, the fusing the reference artwork data in the second structured data and the target artwork data to generate fused artwork data includes: performing vector conversion on the reference artwork data and the target artwork data in the second structured data according to the word vector model to obtain an artwork word vector sequence; calculating artwork similarity vectors among the artwork word vector sequences, and carrying out weighted calculation according to second weights of the artwork similarity vectors; obtaining the similarity of the artwork according to the weighted calculation result, and judging whether the similarity of the artwork is larger than a second threshold value or not; and carrying out fusion processing on the reference artwork data and the target artwork data corresponding to the artwork similarity larger than the second threshold value to generate fusion artwork data.

In an exemplary embodiment of the disclosure, fusing the reference artistic institution data in the second structured data with the target artistic institution data to generate fused artistic institution data includes: performing vector conversion on the reference artistic institution data and the target artistic institution data in the second structured data according to the word vector model to obtain an artistic institution word vector sequence; calculating artistic institution similarity vectors among the artistic institution word vector sequences, and performing a weighted calculation according to third weights of the artistic institution similarity vectors; obtaining the artistic institution similarity according to the weighted calculation result, and determining whether the artistic institution similarity is greater than a third threshold; and fusing the reference artistic institution data and the target artistic institution data whose artistic institution similarity is greater than the third threshold, to generate fused artistic institution data.
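Because the artist, artwork, and art-institution branches differ only in their weights and thresholds, one way to organize them is a single matching routine driven by a per-type configuration. This is a hedged sketch: the `FUSION_CONFIG` table, its attribute names and numbers, and the helper `match_pairs` are all hypothetical, and the vectors are assumed to have been precomputed by some word-vector model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Record = Dict[str, Vector]  # attribute name -> word vector (precomputed, assumed)

# Hypothetical first/second/third weights and thresholds, one entry per entity type.
FUSION_CONFIG: Dict[str, Tuple[Dict[str, float], float]] = {
    "artist": ({"name": 0.6, "nationality": 0.4}, 0.8),
    "artwork": ({"title": 0.5, "artist": 0.3, "year": 0.2}, 0.85),
    "institution": ({"name": 0.7, "city": 0.3}, 0.8),
}


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_pairs(entity_type: str, refs: List[Record], targets: List[Record]) -> List[Tuple[int, int]]:
    """Return (ref index, target index) pairs whose weighted similarity exceeds the type's threshold."""
    weights, threshold = FUSION_CONFIG[entity_type]
    total = sum(weights.values())
    pairs: List[Tuple[int, int]] = []
    for i, ref in enumerate(refs):
        for j, tgt in enumerate(targets):
            score = sum(w * _cosine(ref[a], tgt[a]) for a, w in weights.items()) / total
            if score > threshold:
                pairs.append((i, j))  # candidate pair to be fused
    return pairs
```

The nested loop compares every reference record with every target record, which is only practical for small sets; a real system would likely add blocking or an approximate nearest-neighbour index, which the disclosure does not discuss.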

According to an aspect of the present disclosure, there is provided a construction apparatus of an art domain knowledge graph, the apparatus including: a data processing module configured to perform first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data; a data analysis module configured to perform second preprocessing on unstructured data and semi-structured data in the internal art data source and the external art data source to obtain second structured data; a data fusion module configured to fuse the first structured data with the second structured data to generate fused artistic data, wherein the fused artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity; and a graph generation module configured to generate an artistic triple according to the artistic entity and the artistic relationship, and to generate an art-domain knowledge graph according to the artistic triple.

According to one aspect of the present disclosure, there is provided an electronic device including: a processor and a memory; the memory stores computer readable instructions, which when executed by the processor, implement the method for constructing an artistic field knowledge graph according to any of the above-described exemplary embodiments.

According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of constructing an art domain knowledge graph in any of the above-described exemplary embodiments.

As can be seen from the above technical solutions, the method for constructing an artistic field knowledge graph, the apparatus for constructing an artistic field knowledge graph, the computer storage medium, and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:

In the method and apparatus provided by the exemplary embodiments of the disclosure, on the one hand, fusing the data from the external data source with the normalized data greatly increases the scale of entity knowledge in the art domain and improves the accuracy of knowledge acquisition in the art domain; on the other hand, generating the art-domain knowledge graph from artistic triples improves the relevance between entities in the knowledge graph and the comprehensiveness of knowledge-graph search, so that query intent is understood more accurately and retrieval accuracy is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

FIG. 1 schematically illustrates a flowchart of a method for constructing an art domain knowledge graph in an exemplary embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow diagram of a method of generating first structured data in an exemplary embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow diagram of a method of data cleansing in an exemplary embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart of a method of obtaining data cleansing results in an exemplary embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow diagram of a method of generating repeatability test data in an exemplary embodiment of the disclosure;

FIG. 6 schematically illustrates a flow diagram of another method of generating repeatability test data in an exemplary embodiment of the disclosure;

FIG. 7 schematically illustrates a flow diagram of a method of generating fused artistic data in an exemplary embodiment of the present disclosure;

FIG. 8 schematically illustrates a flow diagram of a method of obtaining fused artist data in an exemplary embodiment of the present disclosure;

FIG. 9 schematically illustrates a flow diagram of a method of obtaining fused artwork data in an exemplary embodiment of the present disclosure;

FIG. 10 schematically illustrates a flow diagram of a method of obtaining fused artistic institution data in an exemplary embodiment of the present disclosure;

FIG. 11 schematically illustrates a flowchart of an art domain knowledge graph construction method of an application scenario in an exemplary embodiment of the present disclosure;

FIG. 12 schematically illustrates a flowchart of a method for performing first preprocessing of data in an application scenario in an exemplary embodiment of the present disclosure;

FIG. 13 schematically illustrates a flow diagram of a method for data cleansing in an application scenario in an exemplary embodiment of the present disclosure;

FIG. 14 schematically illustrates a flowchart of a processing method used when a painting is duplicated in an application scenario in an exemplary embodiment of the present disclosure;

FIG. 15 schematically illustrates a flowchart of a method for generating fused artistic data in an application scenario in an exemplary embodiment of the present disclosure;

FIG. 16 schematically illustrates an interface schematic of an art domain knowledge graph visualized under an application scenario in an exemplary embodiment of the present disclosure;

FIG. 17 schematically illustrates a scene diagram of an art domain knowledge graph application in an art encyclopedia in an exemplary embodiment of the present disclosure;

FIG. 18 schematically illustrates a scene diagram of an art domain knowledge graph applied in a knowledge graph in an exemplary embodiment of the present disclosure;

FIG. 19 schematically illustrates a scene diagram of an art knowledge graph application in an art knowledge question and answer in an exemplary embodiment of the disclosure;

FIG. 20 schematically illustrates a scene diagram of an art domain knowledge graph application in an art knowledge overview in an exemplary embodiment of the disclosure;

FIG. 21 schematically illustrates a structural diagram of a construction apparatus of an art domain knowledge graph in an exemplary embodiment of the present disclosure;

FIG. 22 schematically illustrates an electronic device for implementing a construction method of an art domain knowledge graph in an exemplary embodiment of the present disclosure;

FIG. 23 schematically illustrates a computer-readable storage medium for implementing a construction method of an art domain knowledge graph in an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first" and "second" and the like are used merely as labels, and are not intended to limit the number of their objects.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.

To address the problems in the related art, the disclosure provides a method for constructing an art-domain knowledge graph. FIG. 1 shows a flowchart of the method; as shown in FIG. 1, the method comprises at least the following steps:

S110: Perform first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data.

S120: Perform second preprocessing on the unstructured data and semi-structured data in the internal artistic data source and the external artistic data source to obtain second structured data.

S130: Fuse the first structured data and the second structured data to generate fused artistic data, wherein the fused artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity.

S140: Generate artistic triples according to the artistic entities and artistic relationships, and generate an art-domain knowledge graph according to the artistic triples.
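As an illustration of step S140 only, the sketch below assumes that S130 has produced a set of entity names and a list of (head, relation, tail) relations, turns them into artistic triples, and stores the triples as an adjacency list. The `FusedArtData` container, the relation names, and the rule of discarding relations whose endpoints are not known entities are assumptions made for the example; the disclosure does not specify a storage format for the graph.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class FusedArtData:
    """Hypothetical container for the output of S130."""
    entities: Set[str]
    relations: List[Triple]


def to_triples(data: FusedArtData) -> List[Triple]:
    """S140, part 1: keep relations whose head and tail are both known entities (assumed rule)."""
    return [(h, r, t) for h, r, t in data.relations if h in data.entities and t in data.entities]


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """S140, part 2: adjacency list head -> [(relation, tail), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for h, r, t in dict.fromkeys(triples):  # drop duplicate triples, keep first-seen order
        graph[h].append((r, t))
    return dict(graph)
```

A production system would more likely load the triples into a graph database or an RDF store; the adjacency list only shows how entities and relations become graph edges.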

In the exemplary embodiment of the disclosure, on the one hand, fusing the data from the external data source with the normalized data greatly increases the scale of entity knowledge in the art domain and improves the accuracy of knowledge acquisition in the art domain; on the other hand, generating the art-domain knowledge graph from artistic triples improves the relevance between entities in the knowledge graph and the comprehensiveness of knowledge-graph search, so that query intent is understood more accurately and retrieval accuracy is improved.

The steps of the method for constructing the art-domain knowledge graph are described in detail below.

In step S110, first preprocessing is performed on the structured data in the internal artistic data source and the external artistic data source to generate the first structured data.

In an exemplary embodiment of the present disclosure, the internal artistic data source and the external artistic data source are distinguished by where the artistic data is acquired from. For example, the internal artistic data source may mainly contain manually processed structured data, while the external artistic data source may contain data crawled from public Internet data, mainly semi-structured data. However, the internal artistic data source may also include unstructured and semi-structured data, and the external artistic data source may also include structured and unstructured data; the structured data is therefore collected from both the internal and the external data sources.

After the structured data is loaded, first preprocessing may be performed on it to obtain the first structured data. In an alternative embodiment, FIG. 2 shows a flow diagram of a method of generating first structured data; as shown in FIG. 2, the method comprises at least the following steps: in step S210, data cleaning is performed on the structured data in the internal art data source and the external art data source.

Specifically, in an alternative embodiment, FIG. 3 shows a schematic flow chart of a method for performing data cleaning on structured data; as shown in FIG. 3, the method at least includes the following steps: in step S310, single-value attribute determination is performed on the structured data in the internal art data source and the external art data source to obtain single-value structured data. A single-value attribute is an attribute that should take exactly one specific value. For example, single-value attribute determination may check whether the creator of a painting is exactly one person: if the creator recorded for a painting has two values, 梵高 and 梵谷 (two Chinese transliterations of "Van Gogh"), no single-value structured data is obtained; if the creator of a painting is recorded only as 梵高, single-value structured data stating that the painting's creator is 梵高 is obtained. Besides the creator, single-value attributes may cover the painting, creation time, genre, nationality, etc., which is not particularly limited in the present exemplary embodiment.
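A minimal sketch of the single-value attribute determination in step S310, assuming each record is a Python dict and a multi-valued field is stored as a list; the function name `split_single_valued` and the field names are hypothetical. Records in which any designated single-value attribute carries more than one distinct value are routed to the multi-value data table used in step S320.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]
MultiRow = Tuple[int, str, List[str]]  # (record index, attribute, distinct values)


def split_single_valued(
    records: List[Record], single_value_attrs: Set[str]
) -> Tuple[List[Record], List[MultiRow]]:
    """Return (records that pass the single-value test, rows of the multi-value data table)."""
    passed: List[Record] = []
    table: List[MultiRow] = []
    for idx, rec in enumerate(records):
        ok = True
        for attr in sorted(single_value_attrs):
            raw = rec.get(attr)
            values = raw if isinstance(raw, list) else [raw]
            # Distinct, non-missing values recorded for this attribute.
            distinct = sorted({v for v in values if v is not None})
            if len(distinct) > 1:
                table.append((idx, attr, distinct))  # fails: record it for later review
                ok = False
        if ok:
            passed.append(rec)
    return passed, table
```

Repeated identical values (for example, the same creator listed twice) still pass, because only distinct values are counted.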

In step S320, a first structured entity and a first structured relationship in the single-value structured data are acquired, and a multi-value data table is obtained by counting the results of the single-value attribute determination. The corresponding first structured entities and first structured relationships may be extracted from the acquired single-value structured data. For example, the first structured entities may include artist entities, artwork entities, creation time entities, genre entities, nationality entities, and so on; for an artist entity, the corresponding structured relationships may include its relationships to the artworks it created, the creation times of those artworks, the genre it formed, and its nationality. In addition, the structured data that fails the single-value attribute determination can be counted to obtain the multi-value data table.
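A hedged sketch of how first structured entities and relationships might be extracted from single-value records. The record layout (`title`, `creator`, `year`, `genre`, `nationality`) and relation names such as `created` are illustrative assumptions that follow the artist-centred relationships listed above; they are not a schema given in the disclosure.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]         # (entity type, entity name)
Relation = Tuple[str, str, str]  # (head, relation, tail)


def extract_entities_relations(records: List[Dict[str, str]]) -> Tuple[Set[Entity], List[Relation]]:
    """Map flat single-value artwork records to typed entities and relations (hypothetical mapping)."""
    entities: Set[Entity] = set()
    relations: List[Relation] = []
    for rec in records:
        work = rec["title"]
        entities.add(("artwork", work))
        if rec.get("year"):
            entities.add(("creation_time", rec["year"]))
            relations.append((work, "created_in", rec["year"]))
        artist = rec.get("creator")
        if not artist:
            continue  # no artist: nothing else to attach
        entities.add(("artist", artist))
        relations.append((artist, "created", work))
        # Artist-level attributes become their own entities linked to the artist.
        for attr in ("genre", "nationality"):
            if rec.get(attr):
                entities.add((attr, rec[attr]))
                relations.append((artist, attr, rec[attr]))
    return entities, relations
```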

In step S330, if the multi-value data table contains no multi-value data, the first structured entity and the first structured relationship are used as the data cleaning result. When no multi-value data has been counted, or when all multi-value data in the table has been updated, a further audit may be performed. If the audit is manual and passes, the obtained first structured entity and first structured relationship are directly determined as the data cleaning result. The audit may also be performed automatically according to preset rules, which simplifies the audit process, saves labor costs, and improves audit accuracy.

It should be noted that, in the embodiments of the present invention, the audit step may be performed automatically according to custom rules or directly by manual operation; the two auditing modes are interchangeable.

In step S340, if the multi-value data table contains multi-value data, the second structured entity and the second structured relationship are obtained according to the multi-value data table as the data cleaning result.

Further, in an alternative embodiment, FIG. 4 shows a flow chart of a method for obtaining a data cleaning result; as shown in FIG. 4, the method at least includes the following steps: in step S410, the data dictionary and the error correction dictionary are updated according to the multi-value data table. For example, the multi-value data table may record two values, 梵高 and 梵谷, for the creator of a painting; if further manual verification finds that one is an alias of the other, the alias can be replaced with the standard name and the error correction dictionary updated accordingly. The verification may also be performed automatically according to preset rules, which simplifies the audit process, saves labor costs, and improves audit accuracy. The error correction dictionary may be a database that records, for data that has been added or modified, its source, its relationships with other data, its usage, its format, and so on; correspondingly, the data dictionary may be a database that records the same kind of information for standard data, such as its format and content.
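The alias handling in step S410 can be sketched with two plain in-memory structures. This assumes, for illustration only, that the data dictionary is a set of standard names and the error correction dictionary maps each alias to its standard name; `update_dictionaries` and `apply_correction` are hypothetical helpers, not database operations from the disclosure.

```python
from typing import Dict, List, Set


def update_dictionaries(data_dict: Set[str], correction: Dict[str, str], alias: str, canonical: str) -> None:
    """Record an audited 'alias -> standard name' decision in both dictionaries."""
    canonical = correction.get(canonical, canonical)  # follow an existing mapping, if any
    correction[alias] = canonical
    # Re-point older aliases that used to resolve to `alias` (values only; keys unchanged).
    for key, value in correction.items():
        if value == alias:
            correction[key] = canonical
    data_dict.discard(alias)
    data_dict.add(canonical)


def apply_correction(values: List[str], correction: Dict[str, str]) -> List[str]:
    """Replace aliases by standard names and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(correction.get(v, v) for v in values))
```

Re-pointing older aliases keeps every entry one hop from its standard name, so a lookup never has to follow a chain.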

In step S420, the second structured entity and the second structured relationship are obtained as the data cleaning result according to the update result of the data dictionary or the error correction dictionary. While the multi-value data table is not empty, the update results of the data dictionary and the error correction dictionary continue to be checked until all multi-value data in the multi-value data table has been updated; the resulting second structured entity and second structured relationship are then used as the data cleaning result.
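The iteration in step S420, which continues until the multi-value data table is empty, might look like the sketch below. The `audit` callback is a hypothetical stand-in for the manual or rule-based review, and the rule that rejected values are recorded as aliases of the chosen value is an assumption: it suits cases like 梵高/梵谷 but would be wrong for genuinely multi-valued data such as co-authored works.

```python
from typing import Callable, Dict, List, Tuple

MultiRow = Tuple[int, str, List[str]]  # (record index, attribute, conflicting values)


def resolve_multi_values(
    records: List[Dict[str, object]],
    table: List[MultiRow],
    correction: Dict[str, str],
    audit: Callable[[str, List[str]], str],
) -> List[Dict[str, object]]:
    """Resolve every row of the multi-value table in place and return the records."""
    pending = list(table)
    while pending:  # repeat until the multi-value data table is empty
        idx, attr, values = pending.pop()
        # First try the error correction dictionary alone.
        normalized = list(dict.fromkeys(correction.get(v, v) for v in values))
        if len(normalized) == 1:
            chosen = normalized[0]
        else:
            chosen = audit(attr, normalized)  # hypothetical review step
            for v in normalized:
                if v != chosen:
                    correction[v] = chosen  # assumption: rejected values are aliases
        records[idx][attr] = chosen
    return records
```

Because learned aliases are written back to `correction`, later rows with the same conflict are resolved without calling `audit` again.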

In the present exemplary embodiment, the data cleaning result is generated from the structured entities and structured relationships resolved through the multi-value data table, which makes art-domain knowledge easy to update, with a simple update process and high accuracy.

In step S220, a repeatability test is performed on the data cleaning result of the structured data to generate repeatability test data. In an alternative embodiment, the structured entities include an artwork entity, an artist entity, and a creation time entity, and FIG. 5 shows a flow chart of a method of generating repeatability test data; as shown in FIG. 5, the method at least includes the following steps: in step S510, a repeatability test of the artwork entity is performed on the data cleaning result of the structured data to generate an artwork repeatability test result. The test may obtain the names of the artworks and check whether they are consistent. For example, when two artworks both named "Mona Lisa" are checked, the repeatability test result is "same"; when a painting named "Mona Lisa" is checked against a painting named "Girl with a Pearl Earring", the result is "different". Note that the repeatability test determines whether two entities are substantively identical: for example, if the author of an artwork is recorded once by full name and once by short name, the author is substantively the same, and the repeatability test result should be "same".
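Because the repeatability test compares substance rather than spelling, a name comparison is usually preceded by normalization. The sketch below assumes Unicode NFKC normalization, whitespace trimming, case folding, and an alias table keyed by normalized names; these choices and the helper names `normalize_name` and `names_match` are assumptions, not part of the disclosure.

```python
import unicodedata
from typing import Dict


def normalize_name(name: str, aliases: Dict[str, str]) -> str:
    """Canonical key: NFKC (e.g. full-width to half-width), trimmed, case-folded, alias resolved."""
    key = unicodedata.normalize("NFKC", name).strip().casefold()
    return aliases.get(key, key)


def names_match(a: str, b: str, aliases: Dict[str, str]) -> bool:
    """Repeatability test on one dimension: do two names denote the same entity?"""
    return normalize_name(a, aliases) == normalize_name(b, aliases)
```

Here the alias table plays the role of the full-name/short-name mapping mentioned above; in practice it could be filled from the error correction dictionary.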

In step S520, if the artwork repeatability test result is "same", a repeatability test of the artist entity is performed on the data cleaning result to generate an artist repeatability test result. For example, when two paintings are both named "Mona Lisa", their artwork repeatability test result is "same". However, the two paintings may have been reworked by later artists, may come from different museums, or may be different paintings for other reasons, so a further determination is needed, namely a repeatability test of the artist entity who created the artwork.

In step S530, if the artist repeatability test result is "same", a repeatability test of the creation time entity is performed on the data cleaning result to generate a creation time repeatability test result. For example, when the artists of the two paintings named "Mona Lisa" are also the same, the artist repeatability test result is "same", and the creation time is then tested.

In step S540, if the creation time repeatability test results are the same, it is determined that the artwork entity is a repeated artwork. For example, when the artist and the creation time corresponding to the two paintings named "Mona Lisa" are the same, it may be determined that the result of the creation time repeatability test is the same. Thus, the artwork may be determined to be a repeated artwork based on the results of the repeatability tests in these three dimensions.

In step S550, the repeated artworks are fused, and repeatability test data are generated from the fusion results that pass an audit. When two artwork entities are found to be repeated artworks, they can be fused and the fusion result submitted for manual audit. The manual audit can further check whether the repeatability test results in other dimensions are also the same; if the fusion result passes the audit, repeatability test data for the artwork are generated, and the data dictionary may be updated from them.
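A minimal sketch of the fusion-then-audit step, assuming each artwork is a flat dict of fields. The merge policy (fill empty fields from the second record, keep the first record's value on disagreement and report it as a conflict) and the `approve` callback standing in for the audit are assumptions, not the method described in the source.

```python
def fuse_records(a, b):
    """Merge two records judged to be the same artwork (hypothetical policy).

    Empty fields in `a` are filled from `b`; non-empty fields that disagree
    keep the value from `a` and are reported as conflicts for the audit.
    """
    fused = dict(a)
    conflicts = {}
    for key, value in b.items():
        if value in (None, ""):
            continue
        if fused.get(key) in (None, ""):
            fused[key] = value
        elif fused[key] != value:
            conflicts[key] = (fused[key], value)
    return fused, conflicts


def commit_if_approved(data_dictionary, entry_key, fused, conflicts, approve):
    """Write the fused record only if the audit callback approves it."""
    if approve(fused, conflicts):
        data_dictionary[entry_key] = fused
        return True
    return False
```

Reporting conflicts instead of silently overwriting leaves the final decision to the audit, which matches the text's requirement that only audited fusion results produce repeatability test data.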

In the present exemplary embodiment, repeated paintings are identified by checks along three dimensions and then fused as entities. This updates the data dictionary, keeps it complete and its knowledge current, and reduces the workload of checking the same data repeatedly.

Besides repeated artworks, repeatability test data may also be generated from same-name artworks. In an alternative embodiment, fig. 6 shows a flow chart of another method of generating repeatability test data; as shown in fig. 6, the method comprises at least the following steps. In step S610, if the artist repeatability test result or the creation time repeatability test result is "different", the artwork entity is determined to be a same-name artwork. That is, once the artwork repeatability test result is "same", it is further judged whether the artist and creation time repeatability test results are also the same. For example, two paintings both named "Self-Portrait" may have been created by two different painters, so the artist repeatability test result is "different" and the two paintings are treated as same-name paintings. Likewise, when the artwork repeatability test result is "same", the creation time can be tested to determine whether the artwork is a same-name artwork.
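The three-dimension test of steps S510 to S540 and S610 can be sketched as a cascade. This is a sketch under assumptions: the `Artwork` dataclass, the alias table that maps short names to full names, and the string labels are all hypothetical, and creation times are assumed to be normalized already.

```python
from dataclasses import dataclass


@dataclass
class Artwork:
    title: str
    artist: str
    created: str  # creation time, assumed already normalized


def canonical(name, aliases):
    """Map a short name or alias to its full name (hypothetical alias table)."""
    return aliases.get(name, name)


def repeatability_check(a, b, aliases=None):
    """Test title, then artist, then creation time.

    Returns "different" if the titles differ, "same_name" if the titles match
    but the artist or creation time differs, and "repeated" if all three match.
    """
    aliases = aliases or {}
    if canonical(a.title, aliases) != canonical(b.title, aliases):
        return "different"
    if canonical(a.artist, aliases) != canonical(b.artist, aliases):
        return "same_name"
    if a.created != b.created:
        return "same_name"
    return "repeated"
```

Resolving aliases before comparing is what lets a full name and a short name of the same author count as "same", as the text requires.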

In step S620, the same-name artworks are deduplicated, and repeatability test data are generated from the deduplication result. For example, when two "Self-Portrait" paintings are same-name paintings, deduplication keeps them apart, i.e., they are recorded as two separate data dictionary entries. To ensure that the data dictionary is updated accurately, a manual audit may be performed, and only same-name artworks that pass the audit generate repeatability test data used to update the data dictionary. Alternatively, the audit can be performed automatically according to preset rules, which shortens the audit process, saves labor cost, and improves audit accuracy.
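One way to record same-name artworks as separate dictionary entries is to give each a disambiguated label. In this sketch the label format "Title (artist, time)" and the function name are assumptions for illustration; titles that are unique keep their plain name.

```python
from collections import defaultdict


def disambiguate_same_name(artworks):
    """Return {label: artwork}, splitting same-name works into separate entries.

    artworks: dicts with "title", "artist" and "created" keys (assumed schema).
    """
    by_title = defaultdict(list)
    for art in artworks:
        by_title[art["title"]].append(art)

    entries = {}
    for title, group in by_title.items():
        distinct = {(a["artist"], a["created"]) for a in group}
        for art in group:
            if len(distinct) > 1:
                # Same title, different artist or time: label by both.
                label = f'{title} ({art["artist"]}, {art["created"]})'
            else:
                label = title
            entries.setdefault(label, art)
    return entries
```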

In the present exemplary embodiment, artworks with the same name are checked along two further dimensions (artist and creation time), and same-name paintings are deduplicated into separate entities to update the data dictionary. This avoids inaccurate knowledge judgments and an outdated data dictionary caused by checking too few dimensions, and keeps the data dictionary accurate.

In step S230, a data dictionary and an error correction dictionary are generated from the repeatability test data, and the first structured data are obtained based on the data dictionary. Specifically, the first structured data comprise the data dictionary together with attribute data that the data dictionary does not contain. After the repeatability test data are generated, artwork names or other information may still be incorrect, so a further manual review may be performed; once it passes, the corresponding data dictionary or error correction dictionary is generated. Alternatively, the audit can be performed automatically according to preset rules, which shortens the audit process, saves labor cost, and improves audit accuracy. Before target art data are generated from the data dictionary or the error correction dictionary, formatting may still be inconsistent, for example whether the parts of a foreign name are separated by "·" or "-", or whether the parts of a creation time are separated by "." or "-". The normalization of such data is therefore checked: a data dictionary or error correction dictionary that conforms to the storage specification of the art field database is used directly as the target art data, and one that does not conform is corrected first and then used as the target art data.
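A minimal normalization sketch using Python's `re` module. The canonical forms chosen here ("·" inside transliterated foreign names, "-" inside dates) are assumptions, since the text only says both variants occur; a real rule set would also need exceptions, for example for hyphenated given names.

```python
import re

# Assumed storage spec: "·" between the parts of a foreign name,
# "-" between the parts of a date. Both choices are illustrative.
NAME_SEP = re.compile(r"\s*[·・•‧-]\s*")
DATE_SEP = re.compile(r"(?<=\d)[./](?=\d)")


def normalize_name(name):
    return NAME_SEP.sub("·", name.strip())


def normalize_date(date):
    return DATE_SEP.sub("-", date.strip())


def to_target_entry(entry):
    """Return (entry in storage-spec form, whether a correction was needed)."""
    fixed = dict(entry,
                 artist=normalize_name(entry["artist"]),
                 created=normalize_date(entry["created"]))
    return fixed, fixed != entry
```

Entries that already conform come back unchanged with `False`; corrected entries come back with `True`, mirroring the "use directly" versus "correct, then use" branches in the text.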

In the present exemplary embodiment, the first preprocessing of the structured data generates the corresponding target art data with a simple and accurate procedure, reduces manual workload, and is highly practical.

In step S120, the unstructured data and the semi-structured data in the internal artistic data source and the external artistic data source are subjected to a second preprocessing to obtain second structured data.

The second preprocessing may be a data cleaning step, specifically art data consistency checking, missing value processing, invalid value processing, duplicate data judgment, and the like; custom cleaning code may also be configured or embedded according to the processing requirements of the art data.

In the exemplary embodiment of the present disclosure, the data in the internal art data source may be only about 60% complete, so the internal data can be expanded by fusing in data from the external art data source and using it to fill gaps, which improves the completeness of the internal data. Specifically, semi-structured data may be crawled from public data on the Internet and converted into second structured data used for gap filling.

The semi-structured data may be parsed using preset rules and preset regular expressions. For example, from "Da Vinci's works include the Mona Lisa, The Last Supper, the Virgin of the Rocks, etc.", the rule "[author]'s works include [works]" can be constructed; from "Da Vinci's masterpiece, the Mona Lisa, reflects his exquisite artistry", the rule "[author]'s masterpiece [work]" can be constructed; from "The Mona Lisa is an oil painting created by the Italian painter Da Vinci", the rule "[work] is an oil painting created by ... painter [author]" can be constructed; and from "The Mona Lisa represents the highest artistic achievement of Da Vinci", the rule "[work] represents the highest artistic achievement of [author]" can be constructed. Parsing the semi-structured data with such manually constructed preset rules yields structured data. In addition, for text such as "Da Vinci, April 1452, Italian, representative works include the Mona Lisa, The Last Supper, the Virgin of the Rocks, etc.", a regular expression can be constructed that fills the text up to the second comma (after the name) into the birth date, the text between the second and third commas into the nationality, and the text after the third comma into the representative works. Second preprocessing of the semi-structured data with such regular expressions therefore also yields second structured data.
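A minimal sketch of rule- and regex-based parsing with Python's `re` module. The patterns below are hypothetical English renderings of the rules described above (the originals target Chinese text), and the relation name "author" and the field names are assumptions.

```python
import re


def split_works(text):
    """Split a list of titles on commas and 'and'."""
    return [p.strip() for p in re.split(r",\s*|\s+and\s+", text) if p.strip()]


# Hypothetical rule templates modeled on the example sentences.
RULES = [
    re.compile(r"^(?P<author>.+?)'s works include (?P<works>.+?)(?:,?\s*etc)?$"),
    re.compile(r"^(?P<work>.+?) is an oil painting created by (?:the )?(?:\w+ )?painter (?P<author>.+)$"),
    re.compile(r"^(?P<work>.+?) represents the highest artistic achievement of (?P<author>.+)$"),
]

# Comma-delimited biography: name, birth date, nationality, representative works.
BIO = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<birth>[^,]+),\s*(?P<nationality>[^,]+),\s*"
    r"(?:representative works include\s+)?(?P<works>.+?)(?:,?\s*etc)?$"
)


def apply_rules(sentence):
    """Return (work, 'author', author) triples from the first matching rule."""
    sentence = sentence.strip().rstrip(".")
    for rule in RULES:
        m = rule.match(sentence)
        if m:
            g = m.groupdict()
            works = split_works(g["works"]) if "works" in g else [g["work"]]
            return [(w, "author", g["author"].strip()) for w in works]
    return []


def parse_bio(text):
    """Fill birth date, nationality and works from the comma positions."""
    m = BIO.match(text.strip().rstrip("."))
    if m is None:
        return None
    return {
        "name": m["name"].strip(),
        "birth": m["birth"].strip(),
        "nationality": m["nationality"].strip(),
        "works": split_works(m["works"]),
    }
```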

In step S130, the first structured data and the second structured data are fused to generate fused artistic data, where the fused artistic data comprise artistic entities and the artistic relationships corresponding to those entities.

In an exemplary embodiment of the present disclosure, the first structured data include target artwork data, target artist data, and target art institution data. Fig. 7 shows a flow chart of a method of generating fused artistic data; as shown in fig. 7, the method comprises at least the following steps:

In step S710, the reference artist data in the second structured data and the target artist data are fused to generate fused artist data.

In an alternative embodiment, fig. 8 shows a flow chart of a method of obtaining fused artist data; as shown in fig. 8, the method comprises at least the following steps. In step S810, the reference artist data in the second structured data and the target artist data are converted into vectors with a word vector model to obtain artist word vector sequences. The word vector model may be a Word2Vec model, based on the Word2Vec tool released by Google in 2013, which can be regarded as an important application of deep learning in natural language processing. Although Word2Vec uses only a shallow, three-layer neural network, it achieves very good results. Word2Vec represents each word as a word vector, i.e., it digitizes words so that a computer can process them, and the resulting vectors capture semantic information. Word2Vec has two concrete variants: the Continuous Bag of Words model (CBOW) and the Skip-gram model. The CBOW model predicts a target word from its context, while the Skip-gram model predicts the context from a given input word; in both cases the model is first built and trained, and the embedded word vectors are then obtained from it. Preferably, the word sequences are vectorized with the Skip-gram model. With Skip-gram word vectors, each word in the word space is uniquely represented by a 300-dimensional real vector, so the reference artist data and the target artist data are each represented as an n × 300 matrix, where n is the number of words in the sequence; these matrices are the corresponding artist word vector sequences.
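The sketch below illustrates two pieces of this step, under stated assumptions: how Skip-gram derives (center, context) training pairs from a token sequence, and how a token sequence becomes an n × 300 matrix. The function names are hypothetical, and a plain dict of vectors stands in for a trained model; in practice one could train one with gensim, e.g. `Word2Vec(sentences, vector_size=300, sg=1)` (gensim 4.x parameter names). Mapping unknown tokens to zero vectors is also an assumption.

```python
import numpy as np

DIM = 300  # embedding size used in the text


def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip-gram learns to predict context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs += [(center, tokens[j]) for j in range(lo, hi) if j != i]
    return pairs


def to_sequence_matrix(tokens, embeddings, dim=DIM):
    """Stack one dim-sized vector per token -> array of shape (len(tokens), dim).

    `embeddings` is a stand-in for a trained model's vectors; unknown tokens
    become zero vectors (an assumption).
    """
    rows = [embeddings.get(t, np.zeros(dim)) for t in tokens]
    return np.vstack(rows) if rows else np.zeros((0, dim))
```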

In step S820, artist similarity vectors between the artist word vector sequences are calculated, and a weighted calculation is performed with the first weights of the artist similarity vectors. Artist similarity may have several dimensions, such as the artist's nationality and the artist's genre, so an artist similarity vector is first calculated between the word vector sequences of each dimension of the artist.

For example, the two artist word vector sequences in the same dimension may differ in length, so they can be fed into a twin (Siamese) long short-term memory network model (twin LSTM), which accommodates variable-length sequence pairs. The twin LSTM model consists of two identical neural network models that share weights. The reference artist word vector sequence and the target artist word vector sequence are input into the two networks respectively, and the artist similarity component between them is evaluated by calculating the distance between the two encoded sequences, mainly the Manhattan distance. The artist similarity vector may also be calculated with other algorithms, which is not particularly limited in the present exemplary embodiment.
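A forward-pass sketch of a twin LSTM in NumPy, assuming both sequences are encoded by one shared-weight LSTM and the last hidden state represents each sequence. Turning the Manhattan (L1) distance into a score via exp(-distance) follows the Manhattan LSTM (MaLSTM) formulation; the source only says the distance is mainly Manhattan, so that mapping is an assumption. The weights here are random and untrained, so the scores are meaningful only after training on matching and non-matching pairs, which is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMEncoder:
    """Single-layer LSTM; the last hidden state encodes the whole sequence."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Rows stack the input, forget, output and candidate gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, seq):
        H = self.hidden_dim
        h, c = np.zeros(H), np.zeros(H)
        for x in seq:  # one row per word vector; any sequence length works
            z = self.W @ x + self.U @ h + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
            g = np.tanh(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
        return h


def twin_similarity(encoder, seq_a, seq_b):
    """Encode both sequences with the SAME encoder (shared weights) and map
    the L1 distance of the encodings to a score in (0, 1]."""
    h_a, h_b = encoder.encode(seq_a), encoder.encode(seq_b)
    return float(np.exp(-np.abs(h_a - h_b).sum()))
```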

After the artist similarity components of each dimension are obtained, they are weighted with preset first weights to obtain a weighted calculation result. For example, if the dimensions of the artist similarity components are genre and nationality, the first weight may be set to 0.4 for genre and 0.6 for nationality; the genre component is multiplied by 0.4, the nationality component by 0.6, and the two products are summed to obtain the result.
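The weighting step is a plain weighted sum. This sketch uses the example first weights from the text (0.4 for genre, 0.6 for nationality); the dimension keys and the function name are illustrative.

```python
ARTIST_WEIGHTS = {"genre": 0.4, "nationality": 0.6}  # example first weights


def weighted_similarity(components, weights):
    """Sum of weight * component over the weighted dimensions.

    Raises KeyError if a weighted dimension has no similarity component.
    """
    missing = weights.keys() - components.keys()
    if missing:
        raise KeyError(f"missing similarity components: {sorted(missing)}")
    return sum(weights[d] * components[d] for d in weights)
```

For example, a genre component of 0.5 and a nationality component of 1.0 give 0.4 × 0.5 + 0.6 × 1.0 = 0.8.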

In step S830, the artist similarity is obtained from the weighted calculation result, and it is determined whether the artist similarity is greater than a first threshold. Since the artist similarity is the weighted combination of the per-dimension components, the first threshold can be set according to the overall range of the artist similarity. For example, if the first threshold is set to 1, a weighted result of 0.8 means the artist similarity is below the first threshold, and a weighted result of 1.2 means it is above the first threshold.

In step S840, the reference artist data and the target artist data corresponding to the artist similarity greater than the first threshold are subjected to fusion processing, and fused artist data is generated. When the judgment result is that the artist similarity is larger than the first threshold, it can be determined that the reference artist data and the target artist data point to the same artist, so that fusion processing is performed on the reference artist data and the target artist data, and fusion artist data is obtained.
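A sketch of steps S830 and S840 together: fuse only when the similarity is strictly greater than the threshold (1 in the text's example). Which record wins on overlapping fields is not specified in the source; this sketch assumes the target (internal) record is kept and the reference record only fills missing fields.

```python
def fuse_if_similar(reference, target, similarity, threshold=1.0):
    """Return fused artist data if similarity > threshold, else None.

    Assumed policy: keep the target's values and fill missing or None
    fields from the reference record.
    """
    if similarity <= threshold:
        return None
    fused = dict(target)
    for key, value in reference.items():
        if fused.get(key) is None:
            fused[key] = value
    return fused
```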

In the present exemplary embodiment, the reference artist data and the target artist data meeting the preset conditions may be fused to obtain the fused artist data by calculating the similarity vector of each dimension corresponding to the reference artist data and the target artist data, so that the calculation mode is simple, the fusion accuracy is high, and the accuracy of artist data acquisition is improved.

In step S720, the reference artwork data in the second structured data and the target artwork data are fused to generate fused artwork data.

In an alternative embodiment, fig. 9 shows a flow chart of a method of obtaining fused artwork data; as shown in fig. 9, the method comprises at least the following steps. In step S910, the reference artwork data in the second structured data and the target artwork data are converted into vectors with a word vector model to obtain artwork word vector sequences. The word vector model may be a Word2Vec model, which represents each word as a word vector so that a computer can process it, with the resulting vectors capturing semantic information. Word2Vec has two concrete variants, the Continuous Bag of Words model (CBOW) and the Skip-gram model; preferably, the word sequences are vectorized with the Skip-gram model. Each word in the word space is then uniquely represented by a 300-dimensional real vector, and the reference artwork data and the target artwork data are each represented as an n × 300 matrix, where n is the number of words in the sequence, giving the corresponding artwork word vector sequences.

In step S920, artwork similarity vectors between the artwork word vector sequences are calculated, and a weighted calculation is performed with the second weights of the artwork similarity vectors. Artwork similarity may have several dimensions, such as the genre of the artwork, its creation time, and the art institution that holds it, so an artwork similarity vector is first calculated between the word vector sequences of each dimension of the artwork.

For example, the two artwork word vector sequences in the same dimension may differ in length, so they can be fed into a twin long short-term memory network model (twin LSTM), which accommodates variable-length sequence pairs. The reference artwork word vector sequence and the target artwork word vector sequence are input into the two weight-sharing networks respectively, and the artwork similarity component between them is evaluated by calculating the distance between the two encoded sequences, mainly the Manhattan distance. The artwork similarity vector may also be calculated with other algorithms, which is not particularly limited in the present exemplary embodiment.

After the artwork similarity components of each dimension are obtained, they are weighted with preset second weights to obtain a weighted calculation result. For example, the second weight may be set to 0.4 for genre, 0.3 for creation time, and 0.3 for the holding art institution; the genre component is multiplied by 0.4, the creation time component by 0.3, and the art institution component by 0.3, and the products are summed to obtain the result.
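Written as a formula, the artwork weighting in this step is the following, where $s_k$ is the similarity component of dimension $k$, $w_k$ its second weight, and $T_2$ is our label for the second threshold used in step S930 (the symbols are our notation, not the source's):

```latex
S_{\text{artwork}} = \sum_{k} w_k\, s_k
  = 0.4\, s_{\text{genre}} + 0.3\, s_{\text{time}} + 0.3\, s_{\text{institution}},
\qquad \text{fuse if } S_{\text{artwork}} > T_2
```

For instance, components $s_{\text{genre}} = 1$, $s_{\text{time}} = 0.5$, $s_{\text{institution}} = 0$ give $0.4 + 0.15 + 0 = 0.55$.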

In step S930, the artwork similarity is obtained from the weighted calculation result, and it is determined whether the artwork similarity is greater than a second threshold. Since the artwork similarity is the weighted combination of the per-dimension components, the second threshold can be set according to the overall range of the artwork similarity. For example, if the second threshold is set to 2, a weighted result of 0.8 means the artwork similarity is below the second threshold, and a weighted result of 3.2 means it is above the second threshold.

In step S940, the reference artwork data and the target artwork data corresponding to the artwork similarity greater than the second threshold are fused to generate fused artwork data. When the result of the determination is that the similarity of the artwork is greater than the second threshold, it may be determined that the reference artwork data and the target artwork data point to the same artwork, so that fusion processing is performed on the reference artwork data and the target artwork data, and fused artwork data is obtained.

In the present exemplary embodiment, the reference artwork data and the target artwork data meeting the preset conditions may be fused to obtain the fused artwork data by calculating the similarity vector of each dimension corresponding to the reference artwork data and the target artwork data, so that the calculation mode is simple, the fusion accuracy is high, and the accuracy of artwork data acquisition is improved.

In step S730, the reference art institution data in the second structured data and the target art institution data are fused to generate fused art institution data.

In an alternative embodiment, fig. 10 shows a flow chart of a method of obtaining fused art institution data; as shown in fig. 10, the method comprises at least the following steps. In step S1010, the reference art institution data in the second structured data and the target art institution data are converted into vectors with a word vector model to obtain art institution word vector sequences. The word vector model may be a Word2Vec model, which represents each word as a word vector so that a computer can process it, with the resulting vectors capturing semantic information. Word2Vec has two concrete variants, the Continuous Bag of Words model (CBOW) and the Skip-gram model; preferably, the word sequences are vectorized with the Skip-gram model. Each word in the word space is then uniquely represented by a 300-dimensional real vector, and the reference art institution data and the target art institution data are each represented as an n × 300 matrix, where n is the number of words in the sequence, giving the corresponding art institution word vector sequences.

In step S1020, art institution similarity vectors between the art institution word vector sequences are calculated, and a weighted calculation is performed with the third weights of the art institution similarity vectors. Art institution similarity may have several dimensions, such as the country where the institution is located, its founding time, and the number of works in its collection, so an art institution similarity vector is first calculated between the word vector sequences of each dimension of the art institution.

For example, the two art institution word vector sequences in the same dimension may differ in length, so they can be fed into a twin long short-term memory network model (twin LSTM), which accommodates variable-length sequence pairs. The reference art institution word vector sequence and the target art institution word vector sequence are input into the two weight-sharing networks respectively, and the art institution similarity component between them is evaluated by calculating the distance between the two encoded sequences, mainly the Manhattan distance. The art institution similarity vector may also be calculated with other algorithms, which is not particularly limited in the present exemplary embodiment.

After the art institution similarity components of each dimension are obtained, they are weighted with preset third weights to obtain a weighted calculation result. For example, if the dimensions are the country where the institution is located, its founding time, and the number of works in its collection, the third weight may be set to 0.5 for country, 0.2 for founding time, and 0.3 for collection size; the country component is multiplied by 0.5, the founding time component by 0.2, and the collection size component by 0.3, and the products are summed to obtain the result.

In step S1030, the art institution similarity is obtained from the weighted calculation result, and it is determined whether the art institution similarity is greater than a third threshold. Since the art institution similarity is the weighted combination of the per-dimension components, the third threshold can be set according to the overall range of the art institution similarity. For example, if the third threshold is set to 3, a weighted result of 0.8 means the art institution similarity is below the third threshold, and a weighted result of 3.2 means it is above the third threshold.

In step S1040, the reference art institution data and the target art institution data whose art institution similarity is greater than the third threshold are fused to generate fused art institution data. When the art institution similarity is greater than the third threshold, the reference art institution data and the target art institution data are determined to refer to the same art institution, so they are fused to obtain the fused art institution data.

In the present exemplary embodiment, the reference art institution data and the target art institution data meeting the preset condition are fused into fused art institution data by calculating the similarity vector of each corresponding dimension; the calculation is simple, the fusion is accurate, and the accuracy of the acquired art institution data is improved.

In step S140, an artistic triplet is generated according to the artistic entity and the artistic relationship, and an artistic field knowledge graph is generated according to the artistic triplet.

In the exemplary embodiment of the present disclosure, the artistic entities extracted from the fused artistic data may include artists, artworks, art institutions, and the like; if the fused artistic data contain other artistic entities, these may also be used to generate the art-domain knowledge graph.

A knowledge graph, also called a mapping knowledge domain, is a family of graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the relationships among them. By combining the theories and methods of applied mathematics, graphics, information visualization, information science, and other disciplines with bibliometric citation analysis, co-occurrence analysis, and similar methods, it uses visual graphs to display the core structure, development history, frontier fields, and overall knowledge architecture of a discipline, providing practical and valuable references for disciplinary research and achieving multidisciplinary integration. A knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their relationships in symbolic form. Its basic units are entity-relation-entity triples and entity-attribute-value pairs, and entities are connected to one another through relations to form a networked knowledge structure.

Therefore, once the entity relations among artists, artworks, and art institutions have been extracted, an association model of the art-domain knowledge graph can be constructed, and a visual art-domain knowledge graph can be drawn with a drawing program.
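A minimal sketch of assembling art triples into a graph structure: an adjacency list for entity-relation-entity triples and a per-entity map for attribute key-value pairs, mirroring the two basic units described above. The relation names in the usage note are hypothetical; a drawing or graph library could consume this structure for visualization.

```python
from collections import defaultdict


def build_graph(triples, attributes=()):
    """Build an adjacency list from (head, relation, tail) triples.

    attributes: optional (entity, key, value) tuples stored as key-value pairs.
    Returns (graph, properties) as plain dicts.
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # tail entities become nodes too
    props = defaultdict(dict)
    for entity, key, value in attributes:
        props[entity][key] = value
    return dict(graph), dict(props)
```

For example, the triples ("Mona Lisa", "created_by", "Leonardo da Vinci") and ("Mona Lisa", "held_by", "Louvre") yield three nodes, with two outgoing edges from "Mona Lisa".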

The method for constructing the knowledge graph of the art field in the embodiment of the disclosure is described in detail below in connection with an application scenario.

Fig. 11 is a flowchart illustrating a method for constructing an artistic domain knowledge graph in an application scene, as shown in fig. 11, in step S1110, structured data in an internal data source, that is, original structured data, is loaded. In addition, structured data in external data sources may be loaded to enrich the original structured data source.

In step S1111, a first preprocessing process such as data cleansing and error correction processing is performed on the obtained original structured data. Specifically, fig. 12 shows a flowchart of a method for performing the first preprocessing of data in the application scenario, as shown in fig. 12, in step S1210, the original data, specifically, the structured data in the internal data source and the external data source, is loaded as the original structured data.

In step S1211, the original structured data is subjected to data cleansing. Specifically, fig. 13 is a flow chart illustrating a method for performing data cleansing on original structured data in an application scenario, as shown in fig. 13, in step S1310, the original structured data, specifically, structured data in an internal data source and an external data source, is loaded as the original structured data.

In step S1311, single-value attribute determination processing is performed on the original structured data.

In step S1312, the first structured entity and the first structured relationship in the single-value structured data are obtained according to the result of the single-value attribute determination processing, and the multi-value data that do not satisfy the single-value attribute are counted, so as to obtain the corresponding multi-value data table.

In step S1313, it is determined whether the multi-value data table still contains multi-value data that has not been processed. When the multi-value data table is empty, the first structured entity and the first structured relationship are output; when it is not empty, the multi-value data may be audited to update the error correction dictionary or the data dictionary.
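As an illustration of steps S1311 to S1313, the single-value attribute check can be sketched in Python as follows. This is a minimal sketch under assumptions not stated in the disclosure: records are (entity, attribute, value) tuples, and the set of attributes treated as single-valued (`SINGLE_VALUE_ATTRS`) and the function name `single_value_check` are hypothetical.

```python
from collections import defaultdict

# Hypothetical: attributes assumed to allow exactly one value per entity.
SINGLE_VALUE_ATTRS = {"birth_year", "death_year", "nationality"}


def single_value_check(records):
    """Split (entity, attribute, value) records into clean data and a multi-value table.

    Returns (clean, multi_value_table). Both map (entity, attribute) to a sorted
    list of distinct values; a key lands in multi_value_table only when a
    single-value attribute carries more than one distinct value.
    """
    values = defaultdict(set)
    for entity, attribute, value in records:
        values[(entity, attribute)].add(value)

    clean, multi_value_table = {}, {}
    for key, vals in values.items():
        if key[1] in SINGLE_VALUE_ATTRS and len(vals) > 1:
            multi_value_table[key] = sorted(vals)  # conflicting values, sent to manual audit
        else:
            clean[key] = sorted(vals)
    return clean, multi_value_table


records = [
    ("Leonardo da Vinci", "birth_year", "1452"),
    ("Leonardo da Vinci", "birth_year", "1453"),  # conflicting source
    ("Claude Monet", "nationality", "France"),
    ("Claude Monet", "genre", "Impressionism"),
]
clean, multi = single_value_check(records)
```

An empty `multi` corresponds to outputting the first structured entities and relationships directly; otherwise the listed conflicts would be reviewed and used to update the error correction dictionary or the data dictionary.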

After the original structured data has been cleansed, in step S1212, a repeatability test may be performed on the data cleansing result. For paintings whose artwork repeatability test results are the same (that is, paintings with the same title), Fig. 14 shows a flowchart of the corresponding processing method. As shown in Fig. 14, in step S1410, the paintings with the same artwork repeatability test result are obtained.

In step S1411, an artist repeatability test is performed, and an artist repeatability test result is generated.

In step S1412, if the artist repeatability test results are the same, a creation time repeatability test is further performed to generate a creation time repeatability test result.

In step S1413, if the creation time repeatability test results are the same, the two paintings are determined to be duplicates.

In step S1414, data fusion is performed on the two duplicate paintings to obtain the corresponding data fusion result.

In step S1415, the data fusion processing result is manually checked, and a manual check result is obtained.

In step S1416, when the manual audit is passed, a data dictionary update is generated.

In step S1417, when the creating artist or the creation time of the paintings with the same name differs, the two paintings are determined to be distinct works that share the same name.

In step S1418, de-duplication is performed on the same-name works, and the de-duplication results are manually checked to confirm their accuracy.
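The cascade of steps S1410 to S1418 (title, then artist, then creation time) can be sketched as a pair classifier. This is a hedged sketch: the `Artwork` fields, the exact-string comparisons, and the choice to keep same-name works and only flag them for manual review are assumptions rather than the disclosed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artwork:
    title: str
    artist: str
    year: str  # creation time, kept as a string for simplicity


def classify_pair(a: Artwork, b: Artwork) -> str:
    """Apply the title -> artist -> creation-time cascade to two records."""
    if a.title != b.title:
        return "distinct"    # artwork repeatability test fails
    if a.artist == b.artist and a.year == b.year:
        return "duplicate"   # all three tests agree: fuse the records
    return "same_name"       # same title, different artist or creation time


def deduplicate(works):
    """Collapse duplicates and collect same-name pairs for manual review."""
    kept, same_name_pairs = [], []
    for work in works:
        verdicts = [(k, classify_pair(k, work)) for k in kept]
        if any(v == "duplicate" for _, v in verdicts):
            continue  # a real system would merge the two records' attributes here
        same_name_pairs += [(k, work) for k, v in verdicts if v == "same_name"]
        kept.append(work)
    return kept, same_name_pairs
```

For example, two identical "Sunflowers" records collapse into one, while a "Sunflowers" record with a different year is kept and reported as a same-name pair.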

In step S1213, the data cleansing results, including those that have been de-duplicated or fused and those that were not duplicates, are manually checked for errors such as incorrect painting names and artist names.

In step S1214, a corresponding data dictionary or error correction dictionary may be generated according to the error correction processing result.

In step S1215, because the naming conventions in the generated data dictionary or error correction dictionary may be inconsistent with the storage conventions of the database, a further data normalization step may be performed. The normalized entries are then added to the data dictionary or the error correction dictionary.

In step S1112, target art data, that is, updated data, may be generated from the generated data dictionary or error correction dictionary.
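Applying the error correction dictionary together with the naming normalization of step S1215 might look like the following minimal sketch. The normalization rule (Unicode NFKC, collapsed whitespace, case-insensitive lookup) and the helper names are assumptions chosen for illustration; the disclosure does not specify the naming convention.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Assumed naming convention: NFKC form, single spaces, no leading/trailing blanks."""
    name = unicodedata.normalize("NFKC", name)
    return re.sub(r"\s+", " ", name).strip()


def build_correction_dict(audited_pairs):
    """Build an error correction dictionary from reviewer-audited (wrong, correct) pairs.

    Keys are normalised and case-folded so lookups ignore spacing and case.
    """
    return {normalize_name(wrong).casefold(): normalize_name(right)
            for wrong, right in audited_pairs}


def correct(name: str, correction_dict: dict) -> str:
    """Return the corrected name, or the normalised input if no correction is known."""
    normalized = normalize_name(name)
    return correction_dict.get(normalized.casefold(), normalized)
```

Case-folding only the dictionary keys keeps the stored correct form intact while making lookups tolerant of the spacing and capitalization variants typical of crawled data.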

In step S1113, data fusion may be performed on the first structured data and the second structured data. The second structured data may be obtained by crawling semi-structured data from an external data source, converting it into structured data, and storing it in a MySQL database. Fig. 15 is a flowchart of a method for generating fused art data in the application scenario. As shown in Fig. 15, in step S1510, semi-structured data is crawled from an external data source. The external data source may be a public Internet data source or another data source, which is not limited in this exemplary embodiment.

In step S1511, the semi-structured data is subjected to a second preprocessing according to a preset rule and a regular expression, so as to obtain structured data.

In step S1512, further, normalization processing may be performed on the obtained structured data to generate second structured data.
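Steps S1511 and S1512 convert crawled semi-structured text into structured records using preset rules and regular expressions. The sketch below assumes an infobox-style `Label: value` page layout; the label-to-field rules, field names, and year extraction are hypothetical examples, not the rules used in the disclosure.

```python
import re

# Hypothetical preset rules: page label -> field of the structured artist table.
FIELD_RULES = {"Born": "birth", "Died": "death", "Nationality": "nationality", "Movement": "genre"}
LINE_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*[:：]\s*(.+?)\s*$")  # "Label: value" (ASCII or full-width colon)
YEAR_RE = re.compile(r"\b(\d{3,4})\b")                         # first 3- or 4-digit number


def parse_infobox(text):
    """Convert 'Label: value' lines from a crawled page into one structured record."""
    record = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        field = FIELD_RULES.get(match.group(1))
        if field is None:
            continue  # label not covered by the preset rules
        value = match.group(2)
        if field in ("birth", "death"):
            year = YEAR_RE.search(value)
            record[f"{field}_year"] = year.group(1) if year else None
        else:
            record[field] = value
    return record
```

Reducing dates to a year here is one possible normalization choice; a record produced this way would then be written to the MySQL table as second structured data.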

In step S1513, the Word2Vec algorithm is used to calculate the artist similarity vector for each dimension of the reference artist data and the target artist data.

In step S1514, the artist similarity vector is weighted according to the first weight, and the corresponding artist similarity is generated.

In step S1515, the artist similarity is compared with a first threshold.

In step S1516, when the artist similarity is greater than the first threshold, the reference artist data and the target artist data are fused, generating fused artist data.
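Steps S1513 to S1516 compare a reference artist with a target artist dimension by dimension and combine the per-dimension similarities with the first weight. A minimal sketch, assuming each dimension (field) is represented by the average of its Word2Vec token vectors and compared by cosine similarity; the toy `EMBEDDINGS` table stands in for a trained model, and the weights and `FIRST_THRESHOLD` value are hypothetical.

```python
import math

# Toy stand-in for trained Word2Vec vectors (hypothetical values).
EMBEDDINGS = {
    "leonardo": [1.0, 0.0], "da": [0.0, 1.0], "vinci": [1.0, 1.0],
    "italy": [0.5, 0.5], "italian": [0.5, 0.6], "france": [0.0, -1.0],
}
FIRST_WEIGHTS = {"name": 0.7, "nationality": 0.3}  # hypothetical first weights, summing to 1
FIRST_THRESHOLD = 0.85                             # hypothetical first threshold


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def field_vector(text, embeddings):
    """Average the vectors of the known tokens in one field; None if no token is known."""
    vecs = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def weighted_similarity(reference, target, weights, embeddings):
    """Weighted sum of per-field cosine similarities (fields without vectors contribute 0)."""
    total = 0.0
    for field, weight in weights.items():
        a = field_vector(reference.get(field, ""), embeddings)
        b = field_vector(target.get(field, ""), embeddings)
        if a is not None and b is not None:
            total += weight * cosine(a, b)
    return total


def should_fuse(reference, target):
    return weighted_similarity(reference, target, FIRST_WEIGHTS, EMBEDDINGS) > FIRST_THRESHOLD
```

With a real model, the lookup table could be replaced by the vectors of a trained Word2Vec model (for example, gensim's `KeyedVectors`, indexed as `model.wv[token]`). The same routine, with the second or third weights and thresholds, would apply to artworks (steps S1517 to S1520) and art institutions (steps S1521 to S1524).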

In step S1517, the Word2Vec algorithm is used to calculate the artwork similarity vectors for each dimension of the reference artwork and the target artwork data, respectively.

In step S1518, the artwork similarity vector is weighted according to the second weight, and the corresponding artwork similarity is generated.

In step S1519, the artwork similarity is compared with a second threshold.

In step S1520, when the artwork similarity is greater than the second threshold, the reference artwork data and the target artwork data are fused to generate fused artwork data.

In step S1521, the Word2Vec algorithm is used to calculate the art institution similarity vector for each dimension of the reference art institution data and the target art institution data.

In step S1522, the art institution similarity vector is weighted according to the third weight to generate the corresponding art institution similarity.

In step S1523, the art institution similarity is compared with a third threshold.

In step S1524, when the art institution similarity is greater than the third threshold, the reference art institution data and the target art institution data are fused to generate fused art institution data.
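Once a pair passes its threshold, the two records still have to be merged. The disclosure does not state a merge policy, so the sketch below assumes one: values already present in the target (internal) record are kept, empty or missing fields are filled from the reference record, and list-valued fields such as aliases are unioned. `fuse_records` and `fuse_matches` are hypothetical names.

```python
def fuse_records(reference, target):
    """Merge a reference record into a target record judged to be the same entity.

    Assumed policy: non-empty target values win, empty or missing target fields
    are filled from the reference, and list-valued fields are unioned.
    """
    fused = dict(target)  # shallow copy so the caller's record is not modified
    for key, value in reference.items():
        current = fused.get(key)
        if isinstance(current, list) and isinstance(value, list):
            fused[key] = sorted(set(current) | set(value))
        elif current in (None, "", []):
            fused[key] = value
    return fused


def fuse_matches(pairs, similarity, threshold):
    """Fuse only the (reference, target) pairs whose similarity exceeds the threshold."""
    return [fuse_records(ref, tgt) for ref, tgt in pairs if similarity(ref, tgt) > threshold]
```

Because the policy only depends on dictionaries, the same function could serve artist, artwork, and art institution records alike.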

In step S1114, fused artist data, fused artwork data, and fused artistic institution data may be obtained.

In addition, after the fused data is obtained, in step S1115, art data from the unmatched external data source may be extracted to evaluate the fused data. In this scenario, the main evaluation criteria are the accuracy and completeness of the fused data.
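If accuracy is read as the share of fused matches that are correct and completeness as the share of true matches that were found (an interpretation, since the disclosure does not define the metrics), the evaluation of step S1115 reduces to precision and recall over matched pairs, as in this minimal sketch. The pair format and function name are hypothetical.

```python
def evaluate_fusion(predicted_pairs, gold_pairs):
    """Return (accuracy, completeness) of fused matches against held-out gold matches.

    accuracy     = correct matches / predicted matches  (precision)
    completeness = correct matches / gold matches       (recall)
    """
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    correct = len(predicted & gold)
    accuracy = correct / len(predicted) if predicted else 0.0
    completeness = correct / len(gold) if gold else 0.0
    return accuracy, completeness
```

Each pair would be (reference id, target id), with the gold pairs labeled from the held-out external data.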

In step S1116, the art entities in the fused art data and the art relationships corresponding to them are extracted to design the database schema. The schema includes schema objects such as tables, columns, data types, views, stored procedures, relationships, primary keys, and foreign keys. The database schema may be represented by a visual graph showing the art entities and the relationships among them.

In step S1117, the art domain knowledge graph composed of the generated art triples of artists, artworks, and art institutions is obtained and stored as a whole in a graph database, such as Neo4j. Fig. 16 is a schematic interface diagram of a visual art knowledge graph. As shown in Fig. 16, the art entities include artists, artworks, and art institutions. The entities related to an artist may include nationality, year of death, place of birth, year and month of birth, genre, and the like, and the attributes of an artist include an English name and aliases; the entities related to an artwork may include year of creation, medium, category, subject, and the like, and the attributes of an artwork may include a unique identifier (ID), aliases, and dimensions; the attribute of an art institution is its English name.
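Turning fused records into triples and preparing them for Neo4j could be sketched as below. The record field names and relation labels are hypothetical, and because Cypher does not allow relationship types to be passed as parameters, the sketch stores the relation name as a `type` property on a generic `REL` relationship, which is a simplification rather than the schema of Fig. 16.

```python
# Hypothetical relation fields copied from an artist record into triples.
ARTIST_RELATION_FIELDS = ("nationality", "genre", "birth_place")

CYPHER_MERGE_TRIPLE = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:REL {type: $relation}]->(t)"
)


def artist_triples(artist):
    """Turn one fused artist record into (head, relation, tail) triples."""
    name = artist["name"]
    triples = [(name, field, artist[field])
               for field in ARTIST_RELATION_FIELDS if artist.get(field)]
    triples += [(name, "created", title) for title in artist.get("works", [])]
    return triples


def to_cypher(triples):
    """Pair each triple with the parameterised MERGE statement that stores it."""
    return [(CYPHER_MERGE_TRIPLE, {"head": h, "relation": r, "tail": t})
            for h, r, t in triples]
```

With the official `neo4j` Python driver, each `(query, params)` pair could be executed with `session.run(query, params)`; that step is omitted so the sketch runs without a database. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes and edges.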

The art domain knowledge graph can be applied to art encyclopedias, art knowledge graph display, art knowledge question answering, and art knowledge overviews. Fig. 17 is a schematic diagram of an application scenario in an art encyclopedia. As shown in Fig. 17, after a user initiates a search, "Da Vinci" can be identified through art entity recognition using the THULAC word segmentation package and the data dictionary, and knowledge related to Da Vinci is presented. Fig. 18 is a schematic diagram of an application scenario in knowledge graph display. As shown in Fig. 18, the knowledge graph is rendered and displayed visually with the ECharts charting component. Fig. 19 is a schematic diagram of an application scenario in art knowledge question answering. As shown in Fig. 19, the user's question is segmented with the THULAC word segmentation package, and a visual knowledge graph corresponding to the question is generated from the result of matching against preset rules or regular expressions. Fig. 20 is a schematic diagram of an application scenario in an art knowledge overview. As shown in Fig. 20, a corresponding art knowledge overview can be generated using the data dictionary.
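Entity recognition against the data dictionary, as in the encyclopedia scenario, can be approximated by forward maximum matching over segmented tokens. This sketch substitutes plain dictionary matching for the THULAC-based pipeline; the tokens would normally come from a word segmenter, and the dictionary contents and `max_len` window are hypothetical.

```python
def recognize_entities(tokens, dictionary, sep=" ", max_len=6):
    """Forward maximum matching of token n-grams against an entity dictionary.

    tokens: segmented query (words, or single characters for unsegmented Chinese).
    dictionary: surface form -> entity type.
    Returns the matched (surface form, entity type) pairs from left to right.
    """
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest window first
            candidate = sep.join(tokens[i:i + n])
            if candidate in dictionary:
                found.append((candidate, dictionary[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


ART_DICTIONARY = {"Da Vinci": "artist", "Mona Lisa": "artwork", "Louvre": "institution"}
```

For Chinese input, passing single characters with `sep=""` lets the same function match dictionary entries such as 达芬奇 directly; the recognized entities would then select the subgraph to display.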

In the exemplary embodiments of the disclosure, on one hand, data fusion is performed on the data from external data sources and the normalized data, which greatly increases the scale of entity knowledge in the art domain and improves the accuracy of knowledge acquisition in the art domain; on the other hand, generating the art domain knowledge graph from the art entities and art relationships improves the relevance of entities in the knowledge graph and the comprehensiveness of knowledge graph search, so that query intent is understood more accurately and retrieval accuracy is improved.

It should be noted that while the implementations of the above exemplary embodiments describe the steps of the methods in this disclosure in a particular order, this does not require or imply that the steps must be performed in that particular order or that all of the steps must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

In addition, in the exemplary embodiment of the disclosure, a device for constructing a knowledge graph in an art field is also provided. Fig. 21 shows a schematic structural diagram of an apparatus for constructing an art domain knowledge graph, and as shown in fig. 21, an apparatus 2100 for constructing an art domain knowledge graph may include: a data processing module 2110, a data parsing module 2120, a data fusion module 2130 and a map generation module 2140.

Wherein:

a data processing module 2110 configured to perform first preprocessing on the structured data in the internal art data source and the external art data source to generate first structured data; a data parsing module 2120 configured to perform second preprocessing on the unstructured data and semi-structured data in the internal art data source and the external art data source to obtain second structured data; a data fusion module 2130 configured to fuse the first structured data with the second structured data to generate fused art data, wherein the fused art data comprises art entities and the art relationships corresponding to the art entities; and a map generation module 2140 configured to generate art triples according to the art entities and the art relationships, and to generate an art domain knowledge graph according to the art triples.
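The four modules of apparatus 2100 can be read as a pipeline of four stages. The following sketch wires them as injectable callables; the class name and the toy stages in the usage example are hypothetical and only show the data flow from first and second structured data to triples.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ArtKnowledgeGraphBuilder:
    """Hypothetical wiring of modules 2110 to 2140 as injectable stages."""
    process: Callable[[list], list]             # data processing module 2110: first preprocessing
    parse: Callable[[list], list]               # data parsing module 2120: second preprocessing
    fuse: Callable[[list, list], list]          # data fusion module 2130
    to_triples: Callable[[list], List[Triple]]  # map generation module 2140

    def build(self, structured: list, unstructured: list) -> List[Triple]:
        first = self.process(structured)   # first structured data
        second = self.parse(unstructured)  # second structured data
        fused = self.fuse(first, second)   # fused art data
        return self.to_triples(fused)      # art triples for the knowledge graph


builder = ArtKnowledgeGraphBuilder(
    process=lambda rows: [r.strip() for r in rows],
    parse=lambda pages: [p.split(":")[1].strip() for p in pages],
    fuse=lambda first, second: sorted(set(first) | set(second)),
    to_triples=lambda names: [(n, "type", "Artist") for n in names],
)
triples = builder.build([" Claude Monet "], ["Artist: Leonardo da Vinci"])
```

Injecting the stages keeps each module replaceable, which matches the note that the modules may be merged or further divided.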

The specific details of each module of the device for constructing the art domain knowledge graph have been described in the corresponding method for constructing the art domain knowledge graph, and are therefore not repeated here.

It should be noted that although several modules or units of the construction apparatus 2100 for the art domain knowledge graph are mentioned in the above detailed description, such division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied. Each module or unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more modules or units may be integrated in one module or unit. The modules or units can be realized in the form of hardware or in the form of software functional modules or units, and the specific hardware can be a CPU, a microprocessor, a GPU, an FPGA, a single-chip microcomputer, or the like.

Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In addition, in an exemplary embodiment of the present disclosure, there is also provided an electronic device capable of implementing the above method. The electronic device includes a processor and a memory for storing executable instructions of the processor, and the processor is configured to perform the above-described art domain knowledge graph construction method by executing the executable instructions.

An electronic device 2200 according to such an embodiment of the present invention is described below with reference to fig. 22. The electronic device 2200 shown in fig. 22 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in FIG. 22, the electronic device 2200 is in the form of a general purpose computing device. Components of electronic device 2200 may include, but are not limited to: the at least one processing unit 2210, the at least one storage unit 2220, a bus 2230 connecting different system components (including the storage unit 2220 and the processing unit 2210), and a display unit 2240.

Wherein the storage unit stores program code that is executable by the processing unit 2210 such that the processing unit 2210 performs the steps according to various exemplary embodiments of the invention described in the "exemplary method" section of the present specification.

The storage unit 2220 may include a readable medium in the form of a volatile storage unit, such as a Random Access Memory (RAM) 2221 and/or a cache memory unit 2222, and may further include a Read Only Memory (ROM) 2223.

The storage unit 2220 may also include a program/utility 2224 having a set (at least one) of program modules 2225, such program modules 2225 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 2230 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 2200 may also communicate with one or more external devices 2400 (e.g., keyboard, pointing device, Bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 2200, and/or any device (e.g., router, modem, etc.) that enables the electronic device 2200 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 2250. Also, electronic device 2200 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 2260. As shown, network adapter 2260 communicates with other modules of electronic device 2200 over bus 2230. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 2200, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.


In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program capable of implementing the method described above in the present specification. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.

Referring to fig. 23, a program product 2300 for implementing the above-described method according to an embodiment of the invention is described, which may employ a portable compact disc read-only memory (CD-ROM) and comprise program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method for constructing an art domain knowledge graph, characterized by comprising the following steps:

carrying out first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data;

performing second preprocessing on unstructured data and semi-structured data in the internal artistic data source and the external artistic data source to obtain second structured data;

carrying out fusion processing on the first structured data and the second structured data to generate fused artistic data; wherein the fused artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity;

generating an artistic triplet according to the artistic entity and the artistic relationship, and generating an artistic domain knowledge graph according to the artistic triplet;

the first preprocessing is performed on the structured data in the internal artistic data source and the external artistic data source to generate first structured data, which comprises the following steps:

carrying out data cleaning on the structured data in the internal artistic data source and the external artistic data source;

performing repeatability test on the data cleaning results of the structured data in the internal artistic data source and the external artistic data source to generate repeatability test data;

and generating a data dictionary and an error correction dictionary according to the repeatability test data, and obtaining first structured data based on the data dictionary.

2. The method for constructing a knowledge graph in an art field according to claim 1, wherein the data cleaning of the structured data in the internal art data source and the external art data source comprises:

carrying out single-value attribute judgment processing on the structured data in the internal artistic data source and the external artistic data source to obtain single-value structured data;

acquiring a first structuring entity and a first structuring relation in the single-value structuring data, and counting the result of single-value attribute judging processing to obtain a multi-value data table;

if the multi-value data table does not contain multi-value data, taking the first structuring entity and the first structuring relation as a data cleaning result;

and if the multi-value data table contains multi-value data, obtaining a second structuring entity and a second structuring relation according to the multi-value data table to serve as a data cleaning result.

3. The method for constructing an artistic field knowledge graph according to claim 2, wherein said obtaining a second structured entity and a second structured relation according to said multi-value data table as a data cleaning result comprises:

updating a data dictionary or an error correction dictionary according to the multi-value data table;

and obtaining a second structuring entity and a second structuring relation as a data cleaning result according to the updated data dictionary or the updated result of the error correction dictionary.

4. The method for constructing a knowledge graph in an artistic field according to claim 3, wherein the performing a repeatability test on the data cleaning result of the structured data to generate repeatability test data includes:

performing repeatability test on artwork entities on the data cleaning results of the structured data in the internal art data source and the external art data source to generate artwork repeatability test results;

if the artwork repeatability test results are the same, carrying out artist entity repeatability test on the data cleaning results to generate artist repeatability test results;

if the artist repeatability test results are the same, carrying out repeatability test on the creation time entity on the data cleaning result to generate creation time repeatability test results;

if the results of the repeated test of the creation time are the same, determining that the artwork entity is a repeated artwork;

and carrying out fusion processing on the repeated artworks, and generating repeatability test data according to the fusion processing result passing the verification.

5. The method for constructing an art domain knowledge graph according to claim 4, further comprising:

if the artist repeatability test results are different or the creation time repeatability test results are different, determining that the artwork entity is a same-name artwork;

and carrying out de-duplication processing on the same-name artwork, and generating the repeatability test data according to a de-duplication processing result.

6. The method of claim 1, wherein the first structured data includes target artwork data, target artist data, and target art organization data;

wherein carrying out fusion processing on the first structured data and the second structured data to generate fused artistic data comprises:

carrying out fusion processing on the reference artist data in the second structured data and the target artist data to generate fusion artist data;

carrying out fusion processing on the reference artwork data in the second structured data and the target artwork data to generate fusion artwork data;

and carrying out fusion processing on the reference artistic institution data and the target artistic institution data in the second structured data to generate fusion artistic institution data.

7. The method for constructing an artistic field knowledge graph according to claim 6, wherein the fusing the reference artist data in the second structured data and the target artist data to generate fused artist data includes:

performing vector conversion on the reference artist data and the target artist data in the second structured data according to the word vector model to obtain an artist word vector sequence;

calculating artist similarity vectors among the artist word vector sequences, and carrying out weighted calculation according to first weights of the artist data similarity vectors;

obtaining artist similarity according to the weighted calculation result, and judging whether the artist similarity is larger than a first threshold value or not;

and carrying out fusion processing on the reference artist data and the target artist data corresponding to the artist similarity larger than the first threshold value to generate fusion artist data.

8. The method for constructing a knowledge graph in an artistic field according to claim 6, wherein the fusing the reference artwork data in the second structured data and the target artwork data to generate fused artwork data includes:

performing vector conversion on the reference artwork data and the target artwork data in the second structured data according to the word vector model to obtain an artwork word vector sequence;

calculating artwork similarity vectors among the artwork word vector sequences, and carrying out weighted calculation according to second weights of the artwork similarity vectors;

obtaining the similarity of the artwork according to the weighted calculation result, and judging whether the similarity of the artwork is larger than a second threshold value or not;

and carrying out fusion processing on the reference artwork data and the target artwork data corresponding to the artwork similarity larger than the second threshold value to generate fusion artwork data.

9. The method for constructing a knowledge graph in an artistic field according to claim 6, wherein the fusing the reference artistic organization data and the target artistic organization data in the second structured data to generate fused artistic organization data includes:

performing vector conversion on the reference artistic organization data and the target artistic organization data in the second structured data according to the word vector model to obtain an artistic organization word vector sequence;

calculating artistic institution similarity vectors among the artistic organization word vector sequences, and carrying out weighted calculation according to the third weights of the artistic institution similarity vectors;

obtaining the similarity of the artistic institutions according to the weighted calculation result, and judging whether the similarity of the artistic institutions is larger than a third threshold value or not;

and carrying out fusion processing on the reference artistic institution data and the target artistic institution data corresponding to the artistic institution similarity which is larger than the third threshold value, and generating fusion artistic institution data.

10. A device for constructing an art domain knowledge graph, characterized by comprising:

the data processing module is configured to perform first preprocessing on the structured data in the internal artistic data source and the external artistic data source to generate first structured data;

the data parsing module is configured to perform second preprocessing on unstructured data and semi-structured data in the internal art data source and the external art data source to obtain second structured data;

the data fusion module is configured to fuse the first structured data with the second structured data to generate fused art data; wherein the fused artistic data comprises an artistic entity and an artistic relationship corresponding to the artistic entity;

the map generation module is configured to generate an artistic triplet according to the artistic entity and the artistic relation and generate an artistic domain knowledge map according to the artistic triplet.

11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of constructing an art field knowledge graph according to any one of claims 1-9.

12. An electronic device, comprising:

a processor;

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the method of constructing an art knowledge graph of any one of claims 1-9 via execution of the executable instructions.