WO2020253355A1 - Entity fusion method and device, electronic device, and storage medium - Google Patents

Entity fusion method and device, electronic device, and storage medium

Info

Publication number
WO2020253355A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
attribute
similarity
phrase
data
Prior art date
Application number
PCT/CN2020/085909
Other languages
English (en)
Chinese (zh)
Inventor
郝吉芳
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Publication of WO2020253355A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/237 - Lexical tools
    • G06F 40/247 - Thesauruses; Synonyms
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 - Named entity recognition

Definitions

  • the embodiments of the present disclosure relate to methods and devices for entity fusion, electronic devices, and non-transitory computer-readable storage media.
  • Knowledge graph is a structured semantic knowledge base, which is used to quickly describe concepts and their relationships in the physical world.
  • The knowledge graph effectively processes and integrates the data of intricate documents, transforms it into simple and clear triples of "entity, relationship, entity", and aggregates a large amount of knowledge to achieve rapid response and reasoning.
  • An embodiment of the present disclosure provides a method for entity fusion, including: acquiring data of multiple entities, where the multiple entities include a first entity and a second entity; extracting, from the acquired data of the multiple entities, multiple attribute groups corresponding one-to-one to the multiple entities, where each attribute group includes at least one attribute and each attribute is expressed by a phrase or a short sentence; converting each phrase or short sentence into a vector to obtain the attribute vector of the attribute it expresses; calculating the attribute similarity of the corresponding attributes of the first entity and the second entity based on the attribute vectors of those corresponding attributes; and realizing the fusion of the first entity and the second entity based on the attribute similarity of their corresponding attributes.
  • Converting the phrase or short sentence into a vector to obtain the attribute vector of the attribute it expresses includes: segmenting the phrase or short sentence into one or more words; converting the one or more words into one or more word vectors; and obtaining, from the one or more word vectors corresponding to the phrase or short sentence, the attribute vector of the attribute expressed by that phrase or short sentence.
  • Obtaining the attribute vector from the one or more word vectors corresponding to the phrase or short sentence includes: determining the weight of each of the one or more word vectors; and computing a weighted average of the one or more word vectors according to those weights to obtain the attribute vector of the attribute expressed by the phrase or short sentence.
  • Realizing the fusion of the first entity and the second entity based on the attribute similarity of their corresponding attributes includes: obtaining the entity similarity between the first entity and the second entity according to the attribute similarity of the corresponding attributes; and comparing the entity similarity with a predetermined threshold. When the entity similarity is higher than or equal to the predetermined threshold, it is determined to merge the first entity and the second entity; when the entity similarity is lower than the predetermined threshold, it is determined not to merge them.
  • Obtaining the entity similarity between the first entity and the second entity according to the attribute similarity of their corresponding attributes includes: assigning corresponding weights to the corresponding attributes of the first entity and the second entity; and computing a weighted average of the attribute similarities of those corresponding attributes to obtain the entity similarity between the first entity and the second entity.
  • Before converting the phrase or short sentence into a vector, the method further includes: determining whether the attribute expressed by the phrase or short sentence exists in a synonym dictionary, and, in response to the attribute being present in the synonym dictionary, calculating the attribute similarity corresponding to that attribute based on the synonym dictionary.
  • Obtaining data of multiple entities includes: obtaining the data of the multiple entities from multiple data sources, where the multiple data sources include a first data source and a second data source that are different from each other; the data of the first entity comes from the first data source, and the data of the second entity comes from the second data source.
  • the plurality of entities are art entities in the art field.
  • the art entity includes an artwork, an artist, or an art institution.
  • the data of the multiple entities includes structured data and/or unstructured data.
  • An embodiment of the present disclosure provides a device for entity fusion, including: an acquisition device configured to acquire data of multiple entities, where the multiple entities include a first entity and a second entity; an extraction device configured to extract, from the acquired data of the multiple entities, multiple attribute groups corresponding one-to-one to the multiple entities, where each attribute group includes at least one attribute and each attribute is expressed by a phrase or a short sentence; a converter configured to convert each phrase or short sentence into a vector to obtain the attribute vector of the attribute it expresses; and a fusion device configured to calculate the attribute similarity of the corresponding attributes of the first entity and the second entity based on the attribute vectors of those corresponding attributes, and to realize the fusion of the first entity and the second entity based on that attribute similarity.
  • The converter is further configured to: segment the phrase or short sentence into one or more words; convert the one or more words into one or more word vectors; and obtain, from the one or more word vectors corresponding to the phrase or short sentence, the attribute vector of the attribute it expresses.
  • The converter is further configured to: determine the weight of each of the one or more word vectors; and compute a weighted average of the one or more word vectors according to those weights to obtain the attribute vector of the attribute expressed by the phrase or short sentence.
  • The fusion device is further configured to: obtain the entity similarity between the first entity and the second entity according to the attribute similarity of their corresponding attributes; and compare the entity similarity with a predetermined threshold, merging the first entity and the second entity when the entity similarity is higher than or equal to the predetermined threshold, and not merging them when the entity similarity is lower than the predetermined threshold.
  • The fusion device is further configured to: assign corresponding weights to the corresponding attributes of the first entity and the second entity; and compute a weighted average of the attribute similarities of those corresponding attributes to obtain the entity similarity between the first entity and the second entity.
  • The fusion device is further configured to: before converting the phrase or short sentence into a vector, determine whether the attribute expressed by the phrase or short sentence exists in the synonym dictionary, and, when the attribute is present in the synonym dictionary, calculate the attribute similarity corresponding to that attribute based on the synonym dictionary.
  • The acquisition device is further configured to acquire the data of the multiple entities from multiple data sources, where the multiple data sources include a first data source and a second data source that are different from each other; the data of the first entity comes from the first data source, and the data of the second entity comes from the second data source.
  • the multiple entities are art entities in the art field.
  • An embodiment of the present disclosure provides an electronic device, including: a memory for storing a computer-readable program; and a processor for running the computer-readable program, where the processor, when executing the computer-readable program, implements the steps of the method provided according to any one of the above embodiments.
  • An embodiment of the present disclosure provides a non-transitory computer-readable storage medium that stores computer-readable instructions, where the computer-readable instructions implement the steps of the method provided in any one of the foregoing embodiments when the computer-readable instructions are executed by a processor.
  • FIG. 1A illustrates an example of structured data of an entity to which the entity fusion method provided by an embodiment of the present disclosure can be applied.
  • FIG. 1B illustrates an example of semi-structured data of an entity to which the entity fusion method provided by an embodiment of the present disclosure can be applied.
  • FIG. 2 illustrates a flowchart of a method for entity fusion provided by an embodiment of the present disclosure.
  • Fig. 3 illustrates a flowchart of another method for entity fusion provided by an embodiment of the present disclosure.
  • Fig. 4 illustrates a flowchart of a method for calculating attribute similarity based on a synonym dictionary.
  • Fig. 5 illustrates a block diagram of a device for entity fusion provided by an embodiment of the present disclosure.
  • Fig. 6 illustrates an exemplary block diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 7 illustrates a block diagram of an example system including an electronic device provided by an embodiment of the present disclosure.
  • FIG. 8 illustrates a schematic diagram of a non-transitory storage medium provided by at least one embodiment of the present disclosure.
  • Entity fusion can be carried out, for example, by establishing a linear programming model, by classifying descriptions of synonymous entities with an SVM (support vector machine), by reducing entities that share the same name, or by comparing ambiguous items.
  • art websites provide structured data (for example, relational databases) and semi-structured data (for example, XML, JSON, encyclopedia, etc.) related to art entities such as artworks, artists, and art institutions.
  • FIG. 1A schematically shows structured data from a website.
  • the structured data is related to the painting "Mona Lisa".
  • the structured data is logically expressed and realized by a two-dimensional table structure, and strictly follows the data format and length specifications.
  • the two-dimensional table corresponding to the structured data has a fixed data structure pattern including the title of the work, the introduction of the work, the year, the content subject, the price of the electronic version, the original author, etc.
  • FIG. 1B schematically shows semi-structured data from another website.
  • the semi-structured data is also related to the painting "Mona Lisa".
  • Although semi-structured data is also a kind of structured data, it does not adopt the form of a two-dimensional table to express and implement the related data.
  • the semi-structured data includes Chinese name, foreign name, specifications, author, painting type, creation year, current collection place, material and other data information.
  • the embodiments of the present disclosure provide a method and device for entity fusion, an electronic device, and a non-transitory computer-readable storage medium.
  • The method for entity fusion includes: acquiring data of multiple entities, where the multiple entities include a first entity and a second entity; extracting, from the acquired data of the multiple entities, multiple attribute groups corresponding one-to-one to the multiple entities, where each attribute group includes at least one attribute and each attribute is expressed by a phrase or a short sentence; converting each phrase or short sentence into a vector to obtain the attribute vector of the attribute it expresses; calculating the attribute similarity of the corresponding attributes of the first entity and the second entity based on the attribute vectors of those corresponding attributes; and realizing the fusion of the first entity and the second entity based on the attribute similarity of their corresponding attributes.
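  • By way of illustration only, the overall flow summarized above can be sketched as the following Python outline; the helper names extract_attributes, to_attribute_vector and attribute_similarity are hypothetical stand-ins for the steps detailed below, and the threshold value is illustrative rather than specified by the disclosure.

```python
def fuse(entity_a_data, entity_b_data, threshold=0.8):
    # Extract {attribute name: phrase or short sentence} for each entity.
    attrs_a = extract_attributes(entity_a_data)
    attrs_b = extract_attributes(entity_b_data)
    sims = {}
    for name in attrs_a.keys() & attrs_b.keys():         # corresponding attributes shared by both entities
        vec_a = to_attribute_vector(attrs_a[name])       # phrase -> word vectors -> weighted average
        vec_b = to_attribute_vector(attrs_b[name])
        sims[name] = attribute_similarity(vec_a, vec_b)  # e.g. cosine similarity of attribute vectors
    entity_sim = sum(sims.values()) / len(sims) if sims else 0.0  # unweighted average for simplicity
    return entity_sim >= threshold  # True: judge the two entities to be the same and merge them
```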
  • FIG. 2 shows a flowchart of a method 200 for entity fusion according to an embodiment of the present disclosure.
  • the method 200 may be executed by an electronic device, and may be implemented by software, firmware, hardware, or a combination thereof.
  • The method 200 is shown as a set of steps 201-205 and is not limited to the order shown for performing those steps. The method 200 is described below in conjunction with the example art entities shown in FIGS. 1A and 1B.
  • In step 201, data of multiple entities is obtained.
  • the plurality of entities includes a first entity and a second entity.
  • the first entity may be the first artwork entity shown in FIG. 1A
  • the second entity may be the second artwork entity shown in FIG. 1B.
  • step S201 includes: acquiring data of multiple entities from multiple data sources.
  • the multiple data sources include different first data sources and second data sources, the data of the first entity comes from the first data source, and the data of the second entity comes from the second data source.
  • the data source may be a news website, an encyclopedia website, or any website containing data related to the entity (for example, a website containing novel data).
  • the data source can also be a public or private database.
  • For example, the data of the first artwork entity shown in FIG. 1A comes from a private database specially constructed for art entities and entered and reviewed by experts, while the data of the second artwork entity shown in FIG. 1B comes from a public encyclopedia website.
  • related data of art entities can come from various websites in related fields (such as the art field).
  • web crawler technology can be used to crawl related websites to collect related data of art entities.
  • the entity may be an art entity in the art field.
  • The art entity may include artworks (such as paintings, sculptures, and antiques), artists (such as painters, sculptors, and musicians), and art institutions (such as art galleries and museums).
  • entity data can include structured and semi-structured data.
  • When the acquired data includes semi-structured data, the semi-structured data can be structured.
  • the web page parsing function of the web crawler can be used to structure the semi-structured data.
  • structured processing can also be done by creating a dedicated dictionary.
  • an art field dictionary can be created.
  • the art domain dictionary can contain correct expressions of each artistic entity and/or the attributes of each artistic entity.
  • the semi-structured data shown in FIG. 1B can be expressed by, for example, a two-dimensional table structure similar to the two-dimensional table shown in FIG. 1A after being structured, and the two-dimensional table structure includes Chinese name, foreign name, painting type, current collection location, specifications, author, creation year, material and other items.
  • The name attributes of the second artwork entity shown in FIG. 1B can be processed to adopt expressions consistent with the attributes of the first artwork entity shown in FIG. 1A; for example, the attribute "Chinese name" of the second artwork entity shown in FIG. 1B is replaced with "work name", "author" is replaced with "original author", and so on.
  • all the data may be preprocessed.
  • The preprocessing may include normalizing the grammar and the data expression form (such as the expression form of attribute data), or any other applicable processing, so as to normalize the expression form of the data.
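  • As a non-authoritative illustration of such normalization, the sketch below maps attribute names from one data source onto the naming used by another; the alias table is assumed for this example and would in practice come from a domain dictionary such as the art field dictionary described above.

```python
# Hypothetical alias table: attribute names as they appear in FIG. 1B mapped to the names used in FIG. 1A.
ATTRIBUTE_ALIASES = {
    "中文名": "作品名称",    # "Chinese name"  -> "work name"
    "作者": "原作作者",      # "author"        -> "original author"
    "创作年代": "年代",      # "creation year" -> "year"
}

def normalize_attribute_names(raw_attributes):
    """Return the attribute dict with source-specific names replaced by the normalized names."""
    return {ATTRIBUTE_ALIASES.get(name, name): value for name, value in raw_attributes.items()}
```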
  • In step 202, multiple attribute groups corresponding one-to-one to the multiple entities are extracted from the acquired data of the multiple entities.
  • the attributes of the entity can be extracted from the structured data according to the fixed structure mode of the structured data.
  • For example, the two-dimensional table always contains attribute entries such as the name of the work, the introduction of the work, the year, the content and subject matter, the price of the electronic version, and the original author; therefore, the various attributes, or the attributes required for subsequent processing, can be extracted from such structured data.
  • each attribute group includes at least one attribute.
  • the attribute group corresponding to the entity may include attributes such as the name of the work, the introduction of the work, the year, the content subject, the price of the electronic version, and the original author.
  • Attribute group means a collection of attributes of an entity.
  • Attributes refer to the properties, relationships, and other related information that can represent an entity.
  • the attributes of entities can be expressed in phrases or short sentences.
  • art entities can include artworks, artists, and art institutions.
  • the attribute group corresponding to the artwork can include attributes such as the author, English name, creation year, creation medium, collection location, and size of the artwork.
  • the attribute group corresponding to the artist can include the artist's English name, nationality, place of birth, year of birth, year of death, genre, and representative works.
  • the attribute group corresponding to the art institution may include the English name, location, and representative artwork of the art institution.
  • Since the attributes of artistic entities are usually used to identify the name of an object (for example, a person, object, institution, or place) and usually include a time, the attributes of artistic entities can often be expressed by phrases or short sentences.
  • the attributes of the painting such as the title, content and subject matter, original author, and time, are all expressed in phrases or short sentences.
  • The attributes corresponding to the first entity and the attributes corresponding to the second entity may all be the same or similar, or at least some of the attributes corresponding to the first entity may differ from the attributes corresponding to the second entity, that is, belong to the first entity but not to the second entity.
  • the second artwork entity shown in FIG. 1B includes material attributes, while the first artwork entity shown in FIG. 1A does not include material attributes.
  • the "attribute corresponding to the first entity” means the attribute extracted from the data of the first entity
  • the "attribute corresponding to the second entity” means the attribute extracted from the data of the second entity.
  • In step 203, the phrase or short sentence is converted into a vector to obtain an attribute vector of the attribute expressed by the phrase or short sentence.
  • the word embedding technology converts words into vectors, that is, maps words to a semantic space to obtain vectors. In this semantic space, similar words correspond to similar word vectors, so the similarity between different words can be calculated.
  • a neural network can be used to map words to a vector space.
  • word vectors are used as input features of deep learning models.
  • word vector technology can be divided into word vector technology based on statistical methods and word vector technology based on language models.
  • word vector technology based on statistical methods can be divided into word vector technology based on co-occurrence matrix and word vector technology based on singular value decomposition.
  • In word vector technology based on a language model, the word vector is generated by training a neural network language model (NNLM), and the word vector is an incidental output of the language model.
  • word vector technology based on language models includes word vector technology based on word2vec.
  • word2vec is implemented with the help of a neural network, using the skip-gram and continuous bag-of-words (CBOW) models.
  • The skip-gram model uses a word as input to predict the context around it, while the CBOW model uses the context of a word as input to predict the word itself.
  • word embedding tools can be used to convert phrases or short sentences into vectors.
  • Word embedding tools can include word2vec (words into vectors) tools, GloVe (Global Vectors for Word Representation, global vectors for word representation) tools, Embedding Layer tools, and so on.
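  • For illustration, the sketch below converts the words of an attribute phrase into word vectors with the gensim implementation of word2vec (gensim 4.x API assumed); the model file name and the example phrase are hypothetical.

```python
import jieba
from gensim.models import KeyedVectors

# Hypothetical pre-trained word vectors (see the training sketch later in this description).
wv = KeyedVectors.load("art_corpus.w2v")

phrase = "达芬奇 文艺复兴 油画 人物肖像画"            # e.g. the "content subject" attribute of FIG. 1A
words = [w for w in jieba.lcut(phrase) if w.strip()]  # segment the phrase into words
word_vectors = [wv[w] for w in words if w in wv]      # skip out-of-vocabulary words
```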
  • For the first artwork entity shown in FIG. 1A, the phrases or short sentences expressing its 6 attributes can be respectively converted into 6 corresponding attribute vectors, so that the 6 attributes correspond to the 6 attribute vectors one-to-one.
  • the second artwork entity "Mona Lisa” shown in Figure 1B since the second artwork entity has 8 attributes, namely: Chinese name, foreign name, painting type, current collection place, specification, author , Creation time, material, so after vector conversion, 8 attribute vectors corresponding to the second artwork entity can be obtained.
  • In step 204, the attribute similarity of the corresponding attributes of the first entity and the second entity is calculated based on the attribute vectors of those corresponding attributes.
  • the corresponding attribute may indicate attributes with the same or similar attribute names.
  • The first artwork entity in FIG. 1A has 6 attributes: work name, work introduction, year, content subject matter, electronic version price, and original author, while the second artwork entity in FIG. 1B has 8 attributes: Chinese name, foreign name, painting type, current collection place, specification, author, creation year, and material. Since the attribute "work name" in FIG. 1A and the attribute "Chinese name" in FIG. 1B actually point to the same attribute, the "work name" attribute and the "Chinese name" attribute are corresponding attributes.
  • The first artwork entity and the second artwork entity together involve a total of 10 attributes, namely: work name/Chinese name; year/creation year; content subject/painting type; original author/author; work introduction; electronic version price; foreign name; current collection place; specification; and material. Therefore, in the example shown in FIGS. 1A and 1B, the attribute similarity of each corresponding attribute can be calculated based on the attribute vectors of the corresponding attributes of the first artwork entity and the second artwork entity. Alternatively, the attribute similarities of all 10 attributes (including the corresponding attributes) can be calculated based on the attribute vectors of the 10 attributes of the first artwork entity and the second artwork entity.
  • When an entity lacks an attribute, that attribute may be set to a null value. For example, before determining the similarity of the attributes of the two entities, it may be determined in advance whether any of the corresponding attributes of the two entities is null. Exemplarily, if a given attribute of both entities is a null value, or the attribute of one of the entities is a null value, then the attribute similarity of that attribute is 0.
  • a similarity calculation method may be used to calculate the similarity of the attribute vectors of one or more corresponding attributes, and then calculate the attribute similarity of each corresponding attribute. Similarity calculation methods include but are not limited to angle cosine similarity calculation, distance calculation and other methods.
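  • As one concrete, illustrative choice, the angle cosine similarity of two attribute vectors can be computed as below, returning 0 when either attribute is null, consistent with the null-value handling described above.

```python
import numpy as np

def attribute_similarity(vec_a, vec_b):
    """Cosine similarity of two attribute vectors; a missing (null) attribute yields similarity 0."""
    if vec_a is None or vec_b is None:
        return 0.0
    a, b = np.asarray(vec_a, dtype=float), np.asarray(vec_b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```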
  • In step 205, the fusion of the first entity and the second entity is realized based on the attribute similarity of the corresponding attributes of the first entity and the second entity.
  • Step 205 may include: obtaining the entity similarity between the first entity and the second entity according to the attribute similarity of their corresponding attributes; and comparing the entity similarity with a predetermined threshold. When the entity similarity is higher than or equal to the predetermined threshold, it is determined to merge the first entity and the second entity; when the entity similarity is lower than the predetermined threshold, it is determined not to merge the first entity and the second entity.
  • Entity fusion means judging whether the entities are the same. For example, "performing the fusion of the first entity and the second entity" means judging that the first entity and the second entity are the same entity, and "not merging the first entity and the second entity" means judging that the first entity and the second entity are different entities.
  • The fusion of the first entity and the second entity can also be realized based on the attribute similarity of all attributes (including the corresponding attributes) of the first entity and the second entity, thereby improving the accuracy of fusing the first entity and the second entity.
  • Obtaining the entity similarity between the first entity and the second entity may include: assigning corresponding weights to the corresponding attributes of the first entity and the second entity; and computing a weighted average of the attribute similarities of those corresponding attributes to obtain the entity similarity between the first entity and the second entity.
  • the attribute similarity of each corresponding attribute of the first entity and the second entity may be weighted and averaged to obtain the entity similarity between the first entity and the second entity.
  • the obtained entity similarity is compared with a predetermined threshold to determine whether to perform entity fusion.
  • By assigning different weights to different attributes, and assigning greater weights to attributes that contribute more to characterizing the entity, the accuracy of the entity similarity judgment can be improved. For example, if the entity similarity is higher than or equal to the predetermined threshold, it can be determined that the first entity and the second entity are the same entity, and the first entity and the second entity can thus be merged. Conversely, if the entity similarity is lower than the predetermined threshold, it can be determined that the first entity and the second entity are not the same entity, and they are therefore not merged.
  • The attributes of the entity may be assigned corresponding weights in advance, for example, based on expert experience. Since a weight assigned in this way depends on the experience and knowledge of the expert, it can more credibly and scientifically indicate the importance of each attribute in characterizing the entity, thereby further improving the accuracy of the entity similarity judgment and, in turn, the effect of entity fusion.
  • the attribute similarity of each corresponding attribute of the first entity and the second entity may be directly arithmetic averaged to obtain the entity similarity between the first entity and the second entity .
  • For an artwork entity, attributes such as author, creation year, and current collection place may uniquely identify the entity and thus represent it more accurately; therefore, a relatively high weight can be assigned to attributes such as author, creation year, and current collection place, so that the entity similarity is more accurate.
  • Other attributes of the artwork entity such as material, size, subject matter, etc., may be attributes shared by the artwork entity and other entities. Therefore, relatively low weights can be assigned to these attributes such as material, size, and subject matter.
  • the attributes of the entity can be classified into, for example, name category, time category, other categories, etc., and different weights are assigned to each category. For example, you can assign a relatively high weight to the name category.
  • the predetermined threshold may be inductively derived from the known similarity between the same entities and the similarity between different entities. For example, if the similarity between the same entities is higher, the predetermined threshold may be higher; if the similarity between the same entities is lower, the predetermined threshold may also be lower.
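  • A minimal sketch of this weighted decision step follows; the weight table and the threshold value are assumptions chosen for illustration (e.g. expert-assigned weights favouring identifying attributes such as author and creation year), not values fixed by the disclosure.

```python
# Hypothetical expert-assigned weights: identifying attributes get larger weights.
ATTRIBUTE_WEIGHTS = {"author": 3.0, "creation_year": 2.0, "collection_place": 2.0,
                     "material": 1.0, "size": 1.0, "subject": 1.0}
THRESHOLD = 0.8  # illustrative predetermined threshold

def entity_similarity(attribute_sims, weights=ATTRIBUTE_WEIGHTS):
    """Weighted average of per-attribute similarities -> entity similarity."""
    total_weight = sum(weights.get(name, 1.0) for name in attribute_sims)
    return sum(sim * weights.get(name, 1.0) for name, sim in attribute_sims.items()) / total_weight

def should_fuse(attribute_sims, threshold=THRESHOLD):
    """Merge the two entities only when the entity similarity reaches the predetermined threshold."""
    return entity_similarity(attribute_sims) >= threshold
```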
  • In some other embodiments, classification methods in machine learning (such as logistic regression, naive Bayes, or SVM) or clustering methods (such as K-means clustering or DBSCAN clustering) can also be used to determine whether two entities are the same.
  • The attributes of the entity are described by phrases or short sentences, and similarity is judged by mapping the attributes of the entity to corresponding word vectors, which reduces the computational complexity of entity fusion and improves the efficiency of knowledge graph construction.
  • Experimental tests show that using the entity fusion method according to the embodiment of the present disclosure, the accuracy of entity fusion can reach 87.6%.
  • Step S203 includes: segmenting the phrase or short sentence into one or more words; converting the one or more words into one or more word vectors; and obtaining, from the one or more word vectors corresponding to the phrase or short sentence, the attribute vector of the attribute expressed by the phrase or short sentence.
  • Obtaining the attribute vector of the attribute expressed by the phrase or short sentence includes: determining the weight of each of the one or more word vectors; and computing a weighted average of the one or more word vectors according to those weights to obtain the attribute vector of the attribute expressed by the phrase or short sentence.
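  • A possible implementation of this weighted averaging is sketched below; the weights passed in are whatever per-word weights the embodiment chooses, and uniform weights are used when none are given.

```python
import numpy as np

def attribute_vector(word_vectors, weights=None):
    """Weighted average of the word vectors of a phrase or short sentence -> attribute vector."""
    vectors = np.asarray(word_vectors, dtype=np.float32)
    if weights is None:
        weights = np.ones(len(vectors), dtype=np.float32)  # fall back to a plain average
    weights = np.asarray(weights, dtype=np.float32)
    return (vectors * weights[:, None]).sum(axis=0) / weights.sum()
```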
  • FIG. 3 shows a flowchart of yet another method 300 for entity fusion provided by an embodiment of the present disclosure.
  • the method 300 can also be executed by an electronic device, and can be implemented by software, firmware, hardware, or a combination thereof.
  • the method 300 is shown as a set of steps 301-308 and is not limited to the order of operations shown to perform the steps.
  • In step 301, the attributes of the entity expressed through phrases or short sentences are extracted from the collected data.
  • Attributes expressed by fewer than a certain number of words can be selected from the attributes of the entity.
  • The attributes that can characterize art entities are usually expressed by phrases or short sentences containing a limited number of words, so by selecting some attributes instead of all attributes, the amount of data to be processed can be reduced, thereby increasing the speed of entity fusion.
  • In step 302, corresponding attribute pairs of two entities from different data sources are determined.
  • the same or similar attributes shared by two entities can be searched as corresponding attribute pairs.
  • A synonym dictionary is a simple means of judging the similarity of words. Therefore, in some embodiments, when the two attributes in a corresponding attribute pair are both single words, the similarity of the words can be judged by using the synonym dictionary, and thereby the similarity of the corresponding attributes of that pair. In step 303, it can be determined whether the two attributes in the corresponding attribute pair are expressed by single words and whether they exist in the synonym dictionary.
  • step 303 can be implemented by searching for word items matching these two attributes in the thesaurus. Since the calculation complexity of the judgment made by the synonym dictionary is low, it will help speed up the judgment of attribute similarity and increase the speed of entity fusion.
  • Otherwise, if the two attributes are not single words present in the synonym dictionary, step S304 is executed.
  • In step S304, the phrase or short sentence used to express the attribute is segmented into one or more words.
  • the segmentation of a phrase or a short sentence can be achieved, for example, by using a word segmentation tool to segment the phrase or short sentence.
  • one or more meaningful words can also be extracted from short sentences by means such as named entity recognition.
  • A word embedding tool (such as a word2vec tool) can be used to convert each word into a corresponding word vector.
  • the word2vec tool is a model for generating word vectors based on neural networks.
  • the word2vec tool can quickly and effectively map each word to a vector based on a given corpus through an optimized training model, that is, express it into a vector form for natural language processing.
  • Word2vec relies on skip-grams or continuous bag-of-words (CBOW) to build neural word embeddings, and get the word vector of each word with a fixed dimension.
  • The generated word vectors can be used to represent the relationships between words. Because the word2vec tool also retains contextual information while vectorizing words, it helps to judge similarity from a semantic point of view and is therefore better suited to judging the similarity of entity attributes expressed through phrases or short sentences.
  • the word2vec tool may be trained in advance using training data.
  • The training data can be news data (12 gigabytes), encyclopedia data (20 gigabytes), novel data (90 gigabytes), or any other data containing entity-related information.
  • the converted word vector may be a 64-dimensional or 128-dimensional vector.
  • The word vectors obtained from this training are likewise 64-dimensional or 128-dimensional vectors.
  • any available trained word2vec tool can also be used, such as the word2vec tool that has been disclosed.
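  • A training sketch with gensim (4.x API) is given below; the toy corpus, file name and hyper-parameters are placeholders for the much larger news/encyclopedia/novel corpora and the 64- or 128-dimensional setting mentioned above.

```python
import jieba
from gensim.models import Word2Vec

corpus = ["蒙娜丽莎是达芬奇创作的一幅油画", "卢浮宫博物馆收藏了蒙娜丽莎"]  # toy stand-in corpus
sentences = [jieba.lcut(line) for line in corpus]

model = Word2Vec(sentences, vector_size=128, window=5, min_count=1, sg=1)  # 128-dimensional skip-gram vectors
model.wv.save("art_corpus.w2v")  # hypothetical file name reused by the loading sketch above
```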
  • each word vector is weighted and averaged to obtain the attribute vector of the attribute expressed by the phrase or short sentence.
  • In this way, the proportion of meaningful word vectors in the attribute vector can be strengthened, thereby improving the accuracy with which the attribute vector expresses its corresponding attribute in the vector space, and in turn the accuracy of the entity similarity judgment.
  • the weight of the word vector can be set according to empirical values or experimental values.
  • Words with specific meanings in phrases or short sentences, such as person names, place names, organization names, and proper nouns, or word types with obvious patterns, such as times and currency names, may be assigned relatively high weights. For example, the "content subject" attribute of the first artwork entity in FIG. 1A includes the four words Da Vinci, Renaissance, oil painting, and portrait painting; among these, the person name "Da Vinci" can be given the highest weight, the proper term "Renaissance" the second highest weight, and the generic terms "oil painting" and "portrait painting" lower weights.
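  • One way to realize such word-type-dependent weights is sketched below using part-of-speech tags from jieba.posseg; the tag-to-weight table is an assumption made for this example.

```python
import jieba.posseg as pseg

# Assumed weights per jieba POS tag: nr = person name, ns = place name, nt = organization,
# nz = other proper noun, t = time word; everything else gets a default weight of 1.0.
POS_WEIGHTS = {"nr": 3.0, "ns": 2.5, "nt": 2.5, "nz": 2.0, "t": 2.0}

def word_weights(phrase):
    """Return (word, weight) pairs for a phrase, weighting names and proper nouns more heavily."""
    return [(pair.word, POS_WEIGHTS.get(pair.flag, 1.0)) for pair in pseg.cut(phrase)]

print(word_weights("达芬奇 文艺复兴 油画 人物肖像画"))
```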
  • In step 307, for two entities from different data sources, the attribute similarity is calculated based on the attribute vectors of the two attributes in each corresponding attribute pair of the two entities.
  • In some embodiments, before converting the phrase or short sentence into a vector, the method further includes: determining whether the attribute expressed by the phrase or short sentence exists in the synonym dictionary, and, in response to the attribute being present in the synonym dictionary, calculating the attribute similarity corresponding to that attribute based on the synonym dictionary.
  • When it is determined in step 303 that the attributes in the corresponding attribute pair are single words and exist in the synonym dictionary, the attribute similarity is calculated based on the synonym dictionary in step 308.
  • FIG. 4 schematically shows a flowchart of a method 400 for calculating attribute similarity based on a thesaurus.
  • In step 401, a synonym dictionary is obtained.
  • the synonym dictionary may be an existing synonym dictionary that can be downloaded from the Internet.
  • the synonym dictionary may also be a dictionary specially created based on the type of knowledge graph to be constructed.
  • In step 402, the two attributes of the corresponding attribute pair are encoded using the synonym dictionary.
  • Step 402 includes finding all the meaning items and their codes for each attribute.
  • Sense items are all the possible meanings of a word; that is, sense items represent the entries listed under the same item in the synonym dictionary according to their meaning.
  • In step 403, using the codes (numbers) of the sense items, the similarity of two sense items is calculated according to their semantic distance.
  • Taking the synonym word forest dictionary as an example, it can be judged at which level the two sense items, as leaf nodes in the synonym word forest, branch apart (that is, at which level the codes of the two sense items differ). Judging level by level from the first level, if the two sense items have the same code at a given branch level, the final coefficient corresponding to that level is 1, that is, the running product is multiplied by 1.
  • If the codes differ at a branch level, the coefficient corresponding to that level is obtained, and that coefficient multiplied by an adjustment parameter is taken as the final coefficient for that level; finally, the final coefficients of all branch levels are multiplied together to calculate the similarity of the two sense items.
  • the adjustment parameter is used to control the similarity of the sense items between 0 and 1.
  • In step 404, the similarities of all sense-item pairs of the two attributes are calculated respectively, and from them the similarity of the two attributes is obtained as the attribute similarity of the corresponding attribute pair.
  • Each sense item pair includes two sense items, one sense item is the sense item of one of the two words, and the other sense item is the sense item of the other of the two words.
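  • A minimal sketch of this level-by-level comparison is given below for synonym-word-forest-style sense codes (e.g. "Bo21A01"); the level boundaries, per-level coefficients, the adjustment parameter, and the use of the maximum over sense-item pairs are illustrative assumptions rather than values specified by the disclosure.

```python
# Assumed 5-level split of a code and assumed coefficients for the level where codes first differ.
LEVEL_SLICES = [slice(0, 1), slice(1, 2), slice(2, 4), slice(4, 5), slice(5, 7)]
LEVEL_COEFS = [0.1, 0.3, 0.5, 0.7, 0.9]
ADJUSTMENT = 0.9  # keeps the sense-item similarity within (0, 1)

def sense_similarity(code_a, code_b):
    """Compare two sense-item codes level by level and return their similarity."""
    similarity = 1.0
    for level, coef in zip(LEVEL_SLICES, LEVEL_COEFS):
        if code_a[level] == code_b[level]:
            similarity *= 1.0                          # same number at this level: multiply by 1
        else:
            return similarity * coef * ADJUSTMENT      # first level where the two sense items branch apart
    return similarity                                  # identical codes

def attribute_similarity_from_dictionary(codes_a, codes_b):
    """Attribute similarity of two single-word attributes, aggregated over all sense-item pairs."""
    return max(sense_similarity(a, b) for a in codes_a for b in codes_b)
```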
  • the step of calculating the attribute similarity using synonym word forest is optional.
  • For example, the word2vec tool can be used directly to convert words into word vectors when extracting entity attributes, without making any judgment against the synonym word forest.
  • the efficiency of entity fusion can be improved while the complexity is small, and the construction efficiency of the knowledge graph can be improved.
  • Fig. 5 shows a structural block diagram of a device 500 for entity fusion according to an embodiment of the present disclosure.
  • the device 500 may be used to implement various embodiments of the entity fusion method described above.
  • the device 500 for entity fusion includes an acquisition device 511, an extraction device 512, a converter 513, and a fusion device 514.
  • the acquiring device 511 may be configured to acquire data of multiple entities.
  • the obtaining means 511 may be implemented by a web crawler.
  • the plurality of entities includes a first entity and a second entity.
  • the multiple entities may be art entities in the art field.
  • the first entity may be the first artwork entity shown in FIG. 1A
  • the second entity may be the second artwork entity shown in FIG. 1B.
  • the acquiring device 511 is also configured to acquire data of multiple entities from multiple data sources.
  • the multiple data sources include different first data sources and second data sources, the data of the first entity comes from the first data source, and the data of the second entity comes from the second data source.
  • the data source is any website or database connected to the network 530 that contains entity-related information.
  • the data source includes, but is not limited to, a news website 521, an encyclopedia website 522, a novel website 523, and the like.
  • the obtaining device 511 may be connected to a data source via a network 530, for example, in a wired or wireless manner, so as to collect data from the data source.
  • the acquired data may include structured or semi-structured data related to the entity.
  • the extraction device 512 may be configured to extract multiple attribute groups corresponding to multiple entities one-to-one from the acquired data of multiple entities.
  • each attribute group includes at least one attribute, and each attribute is expressed by a phrase or a short sentence.
  • the extraction device 512 may be configured to extract various attributes of the entity from a two-dimensional table of structured data.
  • the extraction device 512 may be configured to perform structured processing on semi-structured data.
  • The extraction device 512 may also preprocess the data (structured data or semi-structured data) to normalize its grammar and data representation.
  • the converter 513 is configured to convert the phrase or short sentence used to express the attribute of the entity into a vector to obtain the attribute vector of the attribute expressed by the phrase or the short sentence.
  • The converter 513 is further configured to: segment a phrase or short sentence into one or more words; use a word embedding tool (such as the word2vec tool) to convert the one or more words into one or more word vectors; and obtain, from the one or more word vectors, the attribute vector of the corresponding attribute of the phrase or short sentence.
  • When obtaining the attribute vector of the corresponding attribute of the phrase or short sentence from the one or more word vectors, the converter 513 is further configured to: determine the weight of each of the one or more word vectors; and compute a weighted average of the one or more word vectors according to those weights to obtain the attribute vector of the attribute expressed by the phrase or short sentence.
  • the fuser 514 is configured to calculate the attribute similarity of the corresponding attributes of the first entity and the second entity based on the attribute vector of the corresponding attributes of the first entity and the second entity, and based on the attributes of the corresponding attributes of the first entity and the second entity Similarity, to achieve the integration of the first entity and the second entity.
  • the fusion unit 514 may include a similarity calculation module 5141 and a fusion decision module 5142.
  • The similarity calculation module 5141 may be configured to calculate the entity similarity between the first entity and the second entity from the attribute similarity of the corresponding attributes of the first entity and the second entity, using similarity calculation methods (such as cosine similarity or distance-based calculation).
  • The similarity calculation module 5141 may also be configured to assign corresponding weights to the corresponding attributes of the first entity and the second entity, and to compute a weighted average of the attribute similarities of those corresponding attributes to obtain the entity similarity between the first entity and the second entity.
  • the weight of the attributes of the entity may be assigned in advance based on expert experience. For another example, the attribute weight can also be specified by the user.
  • The fusion decision module 5142 may determine whether to perform the fusion of the first entity and the second entity by comparing the entity similarity with a predetermined threshold. For example, the fusion decision module 5142 compares the entity similarity with the predetermined threshold; when the entity similarity is higher than or equal to the predetermined threshold, it is determined to merge the first entity and the second entity, and when the entity similarity is lower than the predetermined threshold, it is determined not to perform the fusion of the first entity and the second entity.
  • the predetermined threshold may be inductively derived from the known similarity between the same entity and the similarity between different entities.
  • the predetermined threshold may be determined based on empirical values or experimental values.
  • The similarity calculation module 5141 and/or the fusion decision module 5142 may include code and programs stored in a memory; a processor can execute the code and programs to implement some or all of the functions of the similarity calculation module 5141 and/or the fusion decision module 5142 as described above.
  • the similarity calculation module 5141 and/or the fusion decision module 5142 may be dedicated hardware devices to implement some or all of the functions of the similarity calculation module 5141 and/or the fusion decision module 5142 as described above.
  • the similarity calculation module 5141 and/or the fusion decision module 5142 may be a circuit board or a combination of multiple circuit boards to implement the functions described above.
  • The one circuit board or the combination of multiple circuit boards may include: (1) one or more processors; (2) one or more non-transitory computer-readable memories connected to the processors; and (3) processor-executable firmware stored in the memories.
  • The fusion device 514 is also configured to determine whether the attribute expressed by the phrase or short sentence exists in the synonym dictionary before converting the phrase or short sentence into a vector, and, in response to the attribute being present in the synonym dictionary, to calculate the attribute similarity corresponding to that attribute based on the synonym dictionary.
  • The fusion device 514 can also be configured to use machine learning classification methods (such as logistic regression, naive Bayes, or SVM) or clustering methods (such as K-means clustering or DBSCAN clustering) to determine whether two entities are the same entity and whether they need to be merged.
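  • Purely as an illustration of this alternative decision step, the sketch below trains a scikit-learn logistic-regression classifier on per-attribute similarity vectors of labelled entity pairs; the training data shown is toy data invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: each row holds the attribute similarities of one labelled entity pair.
X = np.array([[0.95, 0.90, 0.88],
              [0.20, 0.35, 0.10],
              [0.85, 0.92, 0.80],
              [0.30, 0.15, 0.25]])
y = np.array([1, 0, 1, 0])  # 1 = same entity (merge), 0 = different entities

classifier = LogisticRegression().fit(X, y)
print(classifier.predict([[0.90, 0.85, 0.87]]))  # decide whether a new pair should be fused
```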
  • Fig. 6 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure.
  • the electronic device 60 may include a memory 61 and a processor 62.
  • the memory 61 may be used to store a computer-readable program.
  • the processor 62 may be used to run a computer-readable program, and when the computer-readable program is run by the processor 62, the method for entity fusion according to any of the foregoing embodiments can be executed.
  • the processor 62 may be a central processing unit (CPU), a tensor processor (TPU) and other devices with data processing capabilities and/or program execution capabilities, and may control other components in the electronic device 60 to perform desired functions .
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the memory 61 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 62 may run the computer-readable program to implement various functions of the electronic device 60.
  • data transmission between the memory 61 and the processor 62 may be implemented through a network or a bus system.
  • the memory 61 and the processor 62 may directly or indirectly communicate with each other.
  • the memory 61 may also include data of multiple entities.
  • FIG. 7 illustrates an example system 600 of an electronic device provided by an embodiment of the present disclosure.
  • The system 600 includes an example electronic device 610 that represents one or more systems and/or devices capable of implementing the various entity fusion technologies described herein.
  • the electronic device 610 may be a server of a service provider, a device associated with a client (eg, client device), a system on a chip, and/or any other suitable computing device or computing system.
  • the aforementioned device 500 for entity fusion may take the form of an electronic device 610.
  • the device 500 for entity fusion may be implemented as a computer program in the form of an entity fusion application 616.
  • the electronic device 610 includes a processing system 611, one or more computer-readable media 612, and one or more I/O interfaces 613 (ie, input/output interfaces) that are communicatively coupled to each other.
  • the electronic device 610 may also include a system bus or other data and command transmission systems to couple various components to each other.
  • the system bus may include any one of different bus structures or a combination of multiple bus structures in different bus structures.
  • The bus structure may include a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus using any of a variety of bus architectures.
  • the bus structure also includes control and data lines.
  • the processing system 611 represents an element that uses hardware to perform one or more operations. Therefore, the processing system 611 includes hardware elements 614 that can be configured as processors, functional blocks, and the like.
  • the processing system 611 may include other logic devices implemented as an application specific integrated circuit in hardware or formed using one or more semiconductors.
  • the hardware element 614 is not limited by the material it is formed of or the processing mechanism employed therein.
  • the processor may be composed of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)).
  • the instructions executable by the processor may be electronically executable instructions.
  • the computer readable medium 612 includes a memory/storage device 615.
  • Memory/storage 615 represents the memory/storage capacity associated with one or more computer-readable media.
  • the memory/storage device 615 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), flash memory, optical disks, magnetic disks, etc.).
  • the memory/storage device 615 may include fixed media (for example, RAM, ROM, fixed hard disk drive, etc.) and removable media (for example, flash memory, removable hard disk drive, optical disk, etc.).
  • the computer-readable medium 612 may be configured in various other ways as described further below.
  • One or more I/O interfaces 613 may support input/output data flow between the electronic device 610 and other components (eg, user interface, etc.).
  • One or more I/O interfaces 613 represent elements that allow a user to input commands and information to the electronic device 610, and may also allow various input/output devices to present information to the user and/or other components or devices.
  • Examples of input devices include keyboards, cursor control devices (e.g., mice), microphones (e.g., for voice input), scanners, touch functions (e.g., capacitive or other sensors configured to detect physical touch), cameras (which, for example, may use visible or invisible wavelengths such as infrared to detect movements that do not involve touch as gestures), and so on.
  • Examples of output devices include display devices (for example, monitors or projectors), speakers, printers, network cards, haptic response devices, and so on. Therefore, the electronic device 610 may be configured in various ways described further below to support user interaction.
  • The electronic device 610 also includes an entity fusion application 616.
  • The entity fusion application 616 may be a software instance of the device 500 for entity fusion described with reference to FIG. 5, and can cooperate with other elements in the electronic device 610 to implement the technology described herein.
  • the electronic device 610 may include a communication port, and the communication port is used to connect to a network that implements data communication.
  • the electronic device 610 can send and receive information and data through the communication port.
  • the software/hardware elements or program modules described in the present disclosure include routines, programs, objects, elements, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the terms "module”, “function” and “component” used in the present disclosure generally mean software, firmware, hardware, or a combination thereof.
  • the features of the technologies described in this disclosure are platform-independent, which means that these technologies can be implemented on various computing platforms with various processors.
  • the described modules and technologies can be implemented by being stored on some form of computer-readable medium or transmitted across some form of computer-readable medium.
  • the computer-readable medium may include various media that can be accessed by the electronic device 610.
  • the computer-readable medium may include a "computer-readable storage medium" and a "computer-readable signal medium". In contrast to mere signal transmission, carrier waves, or signals per se, a "computer-readable storage medium" refers to a medium and/or device capable of persistently storing information, and/or a tangible storage device.
  • the hardware element 614 and the computer-readable medium 612 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware.
  • the hardware element 614 may be used to implement the technology described herein.
  • the hardware element 614 may include an integrated circuit or a system on a chip, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or components of other hardware devices.
  • the hardware element 614 may serve as a processing device that executes program tasks defined by instructions, modules, and/or logic embodied by the hardware element, and as a hardware device that stores instructions for execution; for example, the hardware element 614 may be a computer-readable storage medium.
  • the software, hardware or program modules and other program modules may be implemented as one or more instructions and/or logic on some form of computer-readable storage medium and/or embodied by one or more hardware elements 614.
  • the electronic device 610 may be configured to implement specific instructions and/or functions corresponding to software and/or hardware modules. Therefore, the electronic device 610 may be implemented at least partially in hardware, for example, by using a computer-readable storage medium and/or hardware element 614 of the processing system.
  • the instructions and/or functions may be executed/operated by one or more articles (for example, one or more electronic devices 610 and/or processing system 611) to implement the techniques, modules, and examples described herein.
  • the electronic device 610 may adopt various different configurations.
  • the electronic device 610 may be implemented as a computer-type device including a personal computer, a desktop computer, a multi-screen computer, a laptop computer, a netbook, and the like.
  • the electronic device 610 may also be implemented as a mobile-type device, including mobile devices such as a mobile phone, a portable music player, a portable game device, a tablet computer, a multi-screen computer, and the like.
  • the electronic device 610 may also be implemented as a television-type device, including devices having or connected to a generally larger screen in a casual viewing environment, such as televisions, set-top boxes, game consoles, and the like.
  • the technology described herein may be supported by various configurations of the electronic device 610 and is not limited to specific examples of the technology described herein. As shown in FIG. 7, the functions can also be implemented in whole or in part on the "cloud" 620 through the use of a distributed system, for example, through the platform 622.
  • the cloud 620 includes and/or represents a platform 622 for resources 624.
  • the platform 622 abstracts the underlying functions of the hardware (for example, server) and software resources of the cloud 620.
  • the resource 624 may include applications and/or data that can be used when performing computer processing on a server remote from the electronic device 610.
  • Resources 624 may also include services provided through the Internet and/or through networks such as cellular or Wi-Fi networks.
  • the platform 622 can abstract resources and functions to connect the electronic device 610 with other electronic devices.
  • the platform 622 can also be used to abstract the scaling of resources. Therefore, in an interconnected device embodiment, the implementation of the functions described herein may be distributed throughout the system 600. For example, the functions may be implemented partially on the electronic device 610 and partially through the platform 622 that abstracts the functions of the cloud 620.
  • FIG. 8 is a schematic diagram of a non-transitory storage medium provided by at least one embodiment of the present disclosure.
  • one or more computer-readable instructions 801 may be stored non-transitorily on the non-transitory computer-readable storage medium 800.
  • when the computer-readable instructions 801 are executed by a computer, one or more steps of the method for entity fusion according to any of the above embodiments can be performed.
  • the non-transitory computer-readable storage medium 800 may be applied to the aforementioned apparatus 500 for entity fusion and the electronic device 60.
  • the non-transitory computer-readable storage medium 800 may include the memory 61 in the electronic device 60.
  • for a description of the non-transitory computer-readable storage medium 800, reference may be made to the description of the memory in the embodiment of the electronic device 60, and repeated details are not described again.
  • each functional module may be implemented in a single module, implemented in multiple modules, or implemented as a part of other functional modules without departing from the present disclosure.
  • functionality described as being performed by a single module may be performed by multiple different modules. Therefore, references to specific functional modules are regarded only as references to appropriate modules for providing the described functionality, rather than as indicating a strict logical or physical structure or organization. Accordingly, the present disclosure may be implemented in a single module, or may be physically and functionally distributed between different modules and circuits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are an entity fusion method and device, an electronic device, and a storage medium. The method comprises: acquiring data of a plurality of entities (201); extracting, from the acquired data of the plurality of entities, a plurality of attribute groups in one-to-one correspondence with the plurality of entities (202); converting a phrase or short sentence into a vector so as to obtain an attribute vector of an attribute expressed by the phrase or short sentence (203); calculating the attribute similarity between corresponding attributes of a first entity and a second entity on the basis of the attribute vectors of the corresponding attributes (204); and fusing the first entity and the second entity on the basis of the attribute similarity between the corresponding attributes (205).
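As a hedged illustration of how the five steps enumerated above (201-205) might chain together, the toy run below exercises the hypothetical EntityFusionApp class sketched earlier in the description; the word vectors, attribute values, and threshold are invented for this example only and do not come from the disclosure.

```python
# Toy walkthrough of steps 201-205 using the hypothetical EntityFusionApp
# sketched earlier in the description; all data below are invented.
import numpy as np

toy_vectors = {
    "wooden":   np.array([0.7, 0.3, 0.0]),
    "acoustic": np.array([0.9, 0.1, 0.0]),
    "guitar":   np.array([0.8, 0.2, 0.1]),
}
app = EntityFusionApp(toy_vectors, threshold=0.8)  # assumes the class sketch above is defined

# Steps 201-202: data of two entities and their extracted attribute groups.
first_entity  = {"material": "wooden", "type": "acoustic guitar"}
second_entity = {"material": "wooden", "type": "guitar"}

# Steps 203-204: phrase-valued attributes become vectors; similarity per corresponding attribute.
print(app.attribute_similarity(first_entity, second_entity))
# e.g. {'material': 1.0, 'type': ~0.99} with these toy vectors

# Step 205: fuse the two entities when their attributes are similar enough.
print(app.fuse(first_entity, second_entity))  # True under the assumed 0.8 threshold
```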
PCT/CN2020/085909 2019-06-20 2020-04-21 Procédé et dispositif de fusion d'entités, dispositif électronique et support de stockage WO2020253355A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910536514.1A CN110222200A (zh) 2019-06-20 2019-06-20 用于实体融合的方法和设备
CN201910536514.1 2019-06-20

Publications (1)

Publication Number Publication Date
WO2020253355A1 true WO2020253355A1 (fr) 2020-12-24

Family

ID=67814301

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085909 WO2020253355A1 (fr) 2019-06-20 2020-04-21 Procédé et dispositif de fusion d'entités, dispositif électronique et support de stockage

Country Status (2)

Country Link
CN (1) CN110222200A (fr)
WO (1) WO2020253355A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118194214A (zh) * 2024-05-20 2024-06-14 江西博微新技术有限公司 一种输电立体巡检方法、系统、计算机及存储介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222200A (zh) * 2019-06-20 2019-09-10 京东方科技集团股份有限公司 用于实体融合的方法和设备
CN110909170B (zh) * 2019-10-12 2022-09-23 百度在线网络技术(北京)有限公司 兴趣点知识图谱构建方法、装置、电子设备及存储介质
CN111241212B (zh) * 2020-01-20 2023-10-24 京东方科技集团股份有限公司 知识图谱的构建方法及装置、存储介质、电子设备
CN111597788B (zh) * 2020-05-18 2023-11-14 腾讯科技(深圳)有限公司 基于实体对齐的属性融合方法、装置、设备及存储介质
CN111522968B (zh) * 2020-06-22 2023-09-08 中国银行股份有限公司 知识图谱融合方法及装置
CN111897968A (zh) * 2020-07-20 2020-11-06 国网浙江省电力有限公司嘉兴供电公司 一种工业信息安全知识图谱构建方法和系统
CN113705236B (zh) * 2021-04-02 2024-06-11 腾讯科技(深圳)有限公司 实体比较方法、装置、设备及计算机可读存储介质
CN113609838B (zh) * 2021-07-14 2024-05-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) 文档信息抽取及图谱化方法和系统
CN113760995A (zh) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 一种实体链接方法及系统、设备和存储介质
CN114139547B (zh) * 2021-11-25 2023-07-04 北京中科闻歌科技股份有限公司 知识融合方法、装置、设备、系统及介质
CN114169966B (zh) * 2021-12-08 2022-08-05 海南港航控股有限公司 一种用张量提取货物订单元数据的方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014055155A1 (fr) * 2012-10-01 2014-04-10 Recommind, Inc. Analyse de pertinence de document dans des systèmes d'apprentissage machine
CN104699818A (zh) * 2015-03-25 2015-06-10 武汉大学 一种多源异构的多属性poi融合方法
CN108572947A (zh) * 2017-03-13 2018-09-25 腾讯科技(深圳)有限公司 一种数据融合方法及装置
CN108647318A (zh) * 2018-05-10 2018-10-12 北京航空航天大学 一种基于多源数据的知识融合方法
CN110222200A (zh) * 2019-06-20 2019-09-10 京东方科技集团股份有限公司 用于实体融合的方法和设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110128552A (ko) * 2010-05-24 2011-11-30 임춘성 컨설팅 지식융합 방법 및 그 시스템
CN105893481B (zh) * 2016-03-29 2019-01-29 国家计算机网络与信息安全管理中心 一种基于马尔可夫聚类的实体间关系消解方法
CN108804544A (zh) * 2018-05-17 2018-11-13 深圳市小蛙数据科技有限公司 互联网影视多源数据融合方法和装置

Also Published As

Publication number Publication date
CN110222200A (zh) 2019-09-10

Similar Documents

Publication Publication Date Title
WO2020253355A1 (fr) Procédé et dispositif de fusion d'entités, dispositif électronique et support de stockage
WO2022007823A1 (fr) Procédé et dispositif de traitement de données de texte
US10915577B2 (en) Constructing enterprise-specific knowledge graphs
RU2678716C1 (ru) Использование автоэнкодеров для обучения классификаторов текстов на естественном языке
US10394854B2 (en) Inferring entity attribute values
KR102354716B1 (ko) 딥 러닝 모델을 이용한 상황 의존 검색 기법
JP5936698B2 (ja) 単語意味関係抽出装置
WO2020062770A1 (fr) Procédé et appareil de construction de dictionnaire de domaine et dispositif et support d'enregistrement
CN106776673B (zh) 多媒体文档概括
KR20200094627A (ko) 텍스트 관련도를 확정하기 위한 방법, 장치, 기기 및 매체
WO2019041521A1 (fr) Appareil et procédé d'extraction de mot-clé d'utilisateur et support de mémoire lisible par ordinateur
US9483460B2 (en) Automated formation of specialized dictionaries
CN112632226B (zh) 基于法律知识图谱的语义搜索方法、装置和电子设备
CN111539197A (zh) 文本匹配方法和装置以及计算机系统和可读存储介质
CN111737997A (zh) 一种文本相似度确定方法、设备及储存介质
CN112100401B (zh) 面向科技服务的知识图谱构建方法、装置、设备及存储介质
CN110134965B (zh) 用于信息处理的方法、装置、设备和计算机可读存储介质
EP3385868A1 (fr) Désambiguïsation de titres dans une taxonomie de réseau social
KR20190118744A (ko) 딥러닝 기반의 지식 구조 생성 방법을 활용한 의료 문헌 구절 검색 방법 및 시스템
CN116821195B (zh) 一种基于数据库自动生成应用的方法
TWI640877B (zh) 語意分析裝置、方法及其電腦程式產品
JP7343649B2 (ja) 埋め込み類似度に基づく商品検索方法、コンピュータ装置、およびコンピュータプログラム
WO2022228127A1 (fr) Procédé et appareil de traitement de texte d'élément, dispositif électronique et support de stockage
JP2021163477A (ja) 画像処理方法及び装置、電子機器、コンピュータ可読記憶媒体並びにコンピュータプログラム
WO2015159702A1 (fr) Système d'extraction d'informations partielles

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20826686

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20826686

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20826686

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.07.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20826686

Country of ref document: EP

Kind code of ref document: A1