CN114490884A - Method and device for determining entity association relationship, electronic equipment and storage medium - Google Patents


Publication number
CN114490884A
Authority
CN
China
Prior art keywords
entities
commodity
determining
entity
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111576489.3A
Other languages
Chinese (zh)
Other versions
CN114490884B (en)
Inventor
陈凤娇
刘金宝
曹雪智
张富峥
武威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202111576489.3A
Publication of CN114490884A
Application granted
Publication of CN114490884B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for determining an entity association relationship, electronic equipment, and a storage medium, and relates to the technical field of data processing. It aims to provide a method for determining entity association relationships that does not depend on user behavior data and in which the entities determined to be associated have strong semantic relevance. The method comprises the following steps: carrying out entity identification on a commodity title to obtain a plurality of entities in the commodity title; respectively acquiring respective semantic vectors of the plurality of entities; and determining a target association relationship between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.

Description

Method and device for determining entity association relationship, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for determining an entity association relationship, an electronic device, and a storage medium.
Background
Knowledge graphs have been widely used, and their basic component, the entity, has gained increasing attention. An entity is a unique identifier of a thing and an important hub connecting unstructured text and structured knowledge. Determining the association relationships between entities has therefore become an important task.
In the related art, the association relationship between entities is usually determined based on user behavior. For example, if a user searches for one word for tomato and then clicks on a commodity described by a different word for tomato, the two words are considered possible synonyms. Alternatively, words with similar semantics are generated directly by a machine translation model. However, the user-behavior-based method requires rich user behavior data and depends on the quality of the query terms input by the user, and the results generated by the machine-translation-model-based method suffer from severe semantic drift.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for determining an entity association relationship, so as to overcome the foregoing problems or at least partially solve the foregoing problems.
In a first aspect of the embodiments of the present invention, a method for determining an entity association relationship is provided, where the method includes:
carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
respectively acquiring respective semantic vectors of the plurality of entities;
and determining a target association relation between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
Optionally, determining a target association relationship between every two entities includes:
splicing (concatenating) the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
determining that the target association relationship between every two entities is a synonym relationship in the case that both determination results are that an association exists between the two entities;
and determining that the target association relationship between every two entities is a hypernym-hyponym relationship in the case that one determination result is that an association exists between the two entities and the other determination result is that no association exists between them.
Optionally, the method further comprises:
determining, for a plurality of commodity titles, a plurality of association relationships between every two entities;
aggregating the plurality of association relationships, including the target association relationship, to determine the association relationship between every two entities;
the method further comprises the following steps:
and completing the commodity knowledge graph based on the determined association relationship.
Optionally, after determining the association relationship between each two entities, the method further includes:
acquiring the context, in the commodity title, of each entity having the association relationship;
storing, in a template, the contexts whose occurrence frequency is higher than a preset frequency together with the association relationship between every two entities;
acquiring a new commodity title, and matching the new commodity title against the stored template;
and determining, according to the matching result, the entity in the new commodity title that has an association relationship with the entity in the template.
Optionally, before the entity identification is carried out on the commodity title, the method includes:
acquiring a preset filter word, wherein the preset filter word indicates that the commodity type is a combined commodity;
and filtering out the candidate commodity titles containing the preset filter word to obtain the commodity title.
Optionally, the method further comprises:
acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are the titles of two commodities that belong to the same category and whose texts overlap;
obtaining pictures of the two commodities and comparing their similarity;
and determining, in the case that the similarity is higher than a preset threshold, that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym (same-level word) relationship.
Optionally, the method further comprises:
acquiring a query word;
determining a rewritten word of the query word based on the determined association relationship;
and carrying out a commodity query based on the query word and its rewritten word.
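The query rewriting steps above can be sketched as follows. This is only a minimal illustration, not the implementation of the application: the names `rewrite_query`, `search`, `rewrite_table`, and `catalog` are hypothetical, and matching by substring stands in for whatever commodity retrieval is actually used.

```python
def rewrite_query(query, rewrite_table):
    """Return the query word followed by its rewritten words (synonyms or hypernyms)."""
    # rewrite_table is a hypothetical mapping built from the determined association relationships.
    return [query] + [word for word in rewrite_table.get(query, []) if word != query]


def search(query, rewrite_table, catalog):
    """Return every title in the catalog that contains the query word or one of its rewritten words."""
    terms = rewrite_query(query, rewrite_table)
    return [title for title in catalog if any(term in title for term in terms)]


rewrite_table = {"selenium sand melon": ["watermelon"]}
catalog = ["fresh watermelon 5kg", "selenium sand melon", "red apple"]
results = search("selenium sand melon", rewrite_table, catalog)
```

Querying with both the original word and its rewritten word recalls the generic watermelon listing in addition to the exact match.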
In a second aspect of the embodiments of the present invention, there is provided an apparatus for determining an entity association relationship, where the apparatus includes:
an entity identification module, configured to carry out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
a semantic vector acquisition module, configured to acquire respective semantic vectors of the plurality of entities respectively;
and an association relationship determining module, configured to determine a target association relationship between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
In a third aspect of the embodiments of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the method for determining an entity association relationship according to the first aspect of the embodiments of the present invention is implemented.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for determining entity association relationship according to the first aspect of the embodiments of the present invention.
The embodiment of the invention has the following advantages:
In this embodiment, entity identification may be carried out on a commodity title to obtain a plurality of entities in the commodity title; respective semantic vectors of the plurality of entities are acquired; and a target association relationship between every two entities is determined according to the respective semantic vectors of every two entities in the plurality of entities. Because the semantic relevance between entities in the same commodity title is strong, determining the target association relationship between entities in a commodity title ensures that the entities determined to have the target association relationship are strongly related semantically. In addition, commodity titles are not affected by user behavior data, so the method for determining entity association relationships from commodity titles can cover new commodity titles and low-traffic long-tail commodity titles.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating steps of a method for determining entity associations according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a process of determining entity relationships according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for determining an entity association relationship in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
To solve the problems of the related-art methods for determining entity association relationships, such as the need for rich user behavior data and severe semantic drift, the applicant proposes the following: the semantic relevance between the entities in a commodity title is strong, so the association relationship between the entities can be determined by mining commodity titles.
Referring to FIG. 1, which shows a flowchart of the steps of a method for determining an entity association relationship in an embodiment of the present invention, the method may specifically include the following steps:
Step S11: carry out entity identification on the commodity title to obtain a plurality of entities in the commodity title.
The commodities that co-occur in the title of a combined commodity are usually unrelated, so the semantic relevance between the entities identified from such a title is weak. Examples are a fruit-cut commodity title such as 'watermelon + apple + juicy peach' and a gift-box commodity title such as 'peanut, cashew nut, almond kernel, walnut kernel yearly gift box'.
Therefore, before entity identification is carried out, the commodity titles need to be screened, either manually or by means of preset filter words. Optionally, as an embodiment, screening the commodity titles with preset filter words includes: before entity identification is carried out on the commodity title, acquiring candidate commodity titles and acquiring preset filter words, wherein a preset filter word indicates that the commodity type is a combined commodity; and filtering out the candidate commodity titles containing a preset filter word to obtain the commodity title.
A preset filter word, such as 'fruit cut', 'gift box', 'gift bag', or 'mixed package', indicates that the commodity type is a combined commodity. Each candidate commodity title is checked for preset filter words, because the commodities that co-occur in a title containing a preset filter word are usually unrelated. The candidate commodity titles containing a preset filter word are filtered out, leaving the commodity titles that contain none.
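The filtering step can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `filter_titles` and the constant `FILTER_WORDS` are hypothetical, the filter words are the English renderings of the examples in the text, and plain substring matching is assumed.

```python
# Hypothetical preset filter words, taken from the examples in the text.
FILTER_WORDS = ("fruit cut", "gift box", "gift bag", "mixed package")


def filter_titles(candidate_titles, filter_words=FILTER_WORDS):
    """Keep only the candidate titles that contain none of the preset filter words."""
    kept = []
    for title in candidate_titles:
        lowered = title.lower()
        # A title containing any filter word is treated as a combined commodity and dropped.
        if not any(word in lowered for word in filter_words):
            kept.append(title)
    return kept


candidates = [
    "peanut, cashew nut, almond kernel, walnut kernel yearly gift box",
    "meal magic brief lightweight thermos cup portable straight water cup",
]
titles = filter_titles(candidates)
```

Only the thermos cup title survives and is passed on to entity identification.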
In this way, the entities in the commodity titles used for mining association relationships are ensured to have strong semantic relevance, and correspondingly the entities whose association relationships are determined from those titles also have strong semantic relevance.
Entity identification extracts entity information from text data. Carrying out entity identification on a commodity title usually identifies a plurality of entities such as commodities, brands, and categories. For example, for the title 'meal magic brief lightweight thermos cup portable straight water cup', the entities 'meal magic', 'thermos cup', and 'water cup' may be identified. The embodiment of the present invention does not limit the entity identification method.
Step S12: respectively acquire the respective semantic vectors of the plurality of entities.
After the entities are identified, the semantic vector of each entity can be obtained through a pre-trained model. The pre-trained model may be a BERT (Bidirectional Encoder Representations from Transformers) model or another model, which is not limited here. The position of each entity can be marked with a different marker, and the whole commodity title is then input into the pre-trained model, so that the obtained semantic vectors of the entities are more accurate.
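Marking entity positions before encoding can be sketched as below. The marker format `[E1] ... [/E1]` and the function name `mark_entities` are assumptions made for illustration; the text only says that different markers are used. The sketch marks the first occurrence of each entity, does not handle overlapping entities, and leaves the encoding of the marked title by the pre-trained model out of scope.

```python
def mark_entities(title, entities):
    """Wrap the first occurrence of each entity in a distinct marker pair."""
    spans = []
    for index, entity in enumerate(entities, start=1):
        start = title.find(entity)
        if start != -1:
            spans.append((start, start + len(entity), index))
    marked = title
    # Insert markers from right to left so that earlier character offsets stay valid.
    for start, end, index in sorted(spans, reverse=True):
        marked = (
            f"{marked[:start]}[E{index}] {marked[start:end]} [/E{index}]{marked[end:]}"
        )
    return marked


title = "meal magic brief lightweight thermos cup portable straight water cup"
marked_title = mark_entities(title, ["meal magic", "thermos cup", "water cup"])
```

The marked title would then be fed to the pre-trained model, and the vectors at the marked positions taken as the entity semantic vectors.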
Step S13: determine a target association relationship between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
If a target association relationship exists between two entities, one entity is a rewritten word of the other, where a rewritten word of a word is a synonym or hypernym of that word. The semantic vectors of any two of the plurality of entities are spliced one after the other, and a pre-trained multilayer fully connected network predicts from the spliced vector whether the two entities are rewritten words. The order of the two entities is then reversed, the vectors are spliced again, and the pre-trained multilayer fully connected network predicts again whether the two entities are rewritten words. The target association relationship between the two entities can be determined from the two prediction results.
A multilayer fully connected network has strong expressive power. By classifying the spliced semantic vectors with the pre-trained multilayer fully connected network, it can be predicted whether the entities corresponding to the two semantic vectors are rewritten words. The multilayer fully connected network may be obtained by supervised or self-supervised training, which is not limited by the embodiment of the present invention.
Alternatively, the association relationship between the entities can also be predicted by another deep learning model, an entity relationship recognition model, or the like.
Optionally, as an embodiment, determining a target association relationship between every two entities includes: splicing the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector; determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector; determining that the target association relationship between every two entities is a synonym relationship in the case that both determination results are that an association exists; and determining that the target association relationship between every two entities is a hypernym-hyponym relationship in the case that one determination result is that an association exists and the other determination result is that no association exists.
The semantic vectors of every two entities are spliced in forward order to obtain the first spliced vector and in reverse order to obtain the second spliced vector, where forward order and reverse order simply denote two opposite orders. For example, when entity A and entity B are spliced in forward order and in reverse order, the first spliced vector and the second spliced vector are A-B and B-A in either assignment, as long as the two differ.
Whether an association exists between every two entities is determined from the first spliced vector and, separately, from the second spliced vector, using a deep learning model such as the pre-trained multilayer fully connected network. If both determination results are that an association exists, the target association relationship between the two entities is a synonym relationship. If both determination results are that no association exists, the two entities have no association relationship. If one determination result is that an association exists and the other is that no association exists, the target association relationship is a hypernym-hyponym relationship. Which entity is the hypernym and which is the hyponym depends on the model used to determine whether an association exists. If the model determines whether the entity of the preceding semantic vector in the spliced vector is a hypernym or synonym of the entity of the following semantic vector, then, when only one determination result indicates an association, the entity corresponding to the preceding semantic vector in the spliced vector for which the association exists is the hypernym of the entity corresponding to the following semantic vector.
For example, suppose the first spliced vector obtained from entity A and entity B is A-B and the second spliced vector is B-A:
if A and B are associated in the first spliced vector A-B, so that A is a hypernym or synonym of B, and B and A are also associated in the second spliced vector B-A, so that B is a hypernym or synonym of A, then entity A and entity B can only be synonyms of each other;
if no association is found in either the first spliced vector A-B or the second spliced vector B-A, so that neither A nor B is a hypernym or synonym of the other, then entity A and entity B are not associated;
and if A and B are associated in the first spliced vector A-B, so that A is a hypernym or synonym of B, while B and A are not associated in the second spliced vector B-A, so that B is not a hypernym or synonym of A, then entity A is the hypernym of entity B.
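The decision logic of the three cases above can be sketched as follows. The trained multilayer fully connected network is not reproduced here; `toy_predictor` is a hypothetical stand-in that treats each semantic vector as a two-number `[topic, generality]` pair, and the function name `classify_pair` and the returned labels are likewise assumptions made for illustration.

```python
def classify_pair(vec_a, vec_b, is_rewrite):
    """Combine the forward (A-B) and reverse (B-A) predictions into one relationship label."""
    first = list(vec_a) + list(vec_b)    # first spliced vector A-B
    second = list(vec_b) + list(vec_a)   # second spliced vector B-A
    a_covers_b = is_rewrite(first)       # True if A is a hypernym or synonym of B
    b_covers_a = is_rewrite(second)      # True if B is a hypernym or synonym of A
    if a_covers_b and b_covers_a:
        return "synonym"
    if a_covers_b:
        return "a_is_hypernym_of_b"
    if b_covers_a:
        return "b_is_hypernym_of_a"
    return "none"


def toy_predictor(spliced):
    """Hypothetical stand-in for the trained network: same topic and front at least as general."""
    half = len(spliced) // 2
    front, back = spliced[:half], spliced[half:]
    return front[0] == back[0] and front[1] >= back[1]


cake, pastry, moon_cake, cup = [1.0, 2.0], [1.0, 2.0], [1.0, 1.0], [5.0, 1.0]
relation = classify_pair(cake, moon_cake, toy_predictor)
```

Because the predictor is asymmetric, running it on both splicing orders is what separates synonyms (both directions hold) from hypernym-hyponym pairs (only one direction holds).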
Alternatively, the target association relationship between every two entities can be directly identified through a pre-trained entity relationship identification model.
With the technical solution of this embodiment of the application, entity identification can be carried out on a commodity title to obtain a plurality of entities in the commodity title; respective semantic vectors of the plurality of entities are acquired; and a target association relationship between every two entities is determined according to the respective semantic vectors of every two entities in the plurality of entities. Because the semantic relevance between entities in the same commodity title is strong, determining the target association relationship between entities in a commodity title ensures that the entities determined to have the target association relationship are strongly related semantically. In addition, commodity titles are not affected by user behavior data, so the method for determining entity association relationships from commodity titles can cover new commodity titles and low-traffic long-tail commodity titles.
Optionally, as an embodiment, the method for determining an entity association relationship further includes: determining, for a plurality of commodity titles, a plurality of association relationships between every two entities; aggregating the plurality of association relationships, including the target association relationship, to determine the association relationship between every two entities; and completing the commodity knowledge graph based on the determined association relationship.
For each commodity title, the association relationship between every two entities of the plurality of entities in that title may be determined; for a plurality of commodity titles, this is done for each title. To make the resulting association relationships more accurate and to exclude the effect of a small number of atypical commodity titles, the association relationships determined from a plurality of commodity titles can be aggregated.
To determine whether an association relationship exists between entity A and entity B, a plurality of commodity titles containing entity A or entity B are obtained, the target association relationships involving entity A and those involving entity B in these titles are determined together with their related scores, and the mean of the related scores is compared with a preset score to determine the association relationship between entity A and entity B. For example, to predict whether the entity 'cake' is the hypernym of the entity 'moon cake', the commodity titles 'special product of traditional cake of green bean cake', 'traditional cake of five kernels moon cake', and 'traditional cake of yolk moon cake' are obtained. They yield the association relationships cake-green bean cake (hypernym), with a related score of 0.2, and cake-moon cake (hypernym), with a related score of 0.9. The mean related score over the three association relationships is about 0.67 (consistent with scores of 0.2, 0.9, and 0.9 for the three titles), which is greater than the preset score of 0.5, so the entity 'cake' is determined to be the hypernym of the entity 'moon cake'.
Optionally, the association relationships may instead be aggregated (clustered) directly: the proportion that the association relationships in an aggregated cluster make up of all the corresponding association relationships is determined, and when that proportion is higher than a preset proportion, the association relationship of the cluster is determined to be the association relationship between the two entities.
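Both aggregation strategies can be sketched briefly. The function names, the per-title scores 0.2, 0.9, and 0.9 (chosen to reproduce the mean of about 0.67 in the example), and the preset proportion of 0.6 are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean


def aggregate_by_mean(scores, preset_score=0.5):
    """Accept the relationship when the mean related score exceeds the preset score."""
    return mean(scores) > preset_score


def aggregate_by_proportion(relations, preset_proportion=0.6):
    """Return the dominant relationship label if its share exceeds the preset proportion."""
    if not relations:
        return None
    label, count = Counter(relations).most_common(1)[0]
    return label if count / len(relations) > preset_proportion else None


# Hypothetical per-title scores for 'cake' being a hypernym, one per commodity title.
scores = [0.2, 0.9, 0.9]
is_hypernym = aggregate_by_mean(scores)
dominant = aggregate_by_proportion(["hypernym", "hypernym", "synonym"])
```

Either rule damps the effect of a single atypical title: one low score or one stray label is outvoted by the rest.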
After the association relationship between two entities is determined, the commodity knowledge graph or another corresponding knowledge graph may be completed based on the determined association relationship.
With the technical solution of this embodiment of the application, using a plurality of commodity titles eliminates the influence of incorrect association relationships determined from a small number of atypical commodity titles, ensures the accuracy of the confirmed association relationship between every two entities, and allows the commodity knowledge graph to be completed based on correct association relationships.
Optionally, as an embodiment, after the association relationship between every two entities is determined, the method further includes: acquiring the context, in the commodity title, of each entity having the association relationship; storing, in a template, the contexts whose occurrence frequency is higher than a preset frequency together with the association relationship between every two entities; acquiring a new commodity title, and matching the new commodity title against the stored template; and determining, according to the matching result, the entity in the new commodity title that has an association relationship with the entity in the template.
After the association relationship between two entities is determined, the two entities and their association relationship are stored in a template. When a new commodity title is later obtained, the rewritten word of an entity in the new title can be obtained simply by looking up the other entity corresponding to that entity in the template.
An entity may have multiple rewritten words for different association relationships. Optionally, a plurality of commodity titles corresponding to the determined association relationship may be obtained, the contexts of the two entities in these titles may be obtained, and the contexts whose occurrence frequency is higher than the preset frequency may be stored in the template together with the association relationship between the two entities. In this way, when the rewritten word of an entity in a new commodity title is acquired, the context of the entity is matched as well, so that the rewritten word appropriate to that context is determined and the rewritten word is acquired more accurately.
With the technical solution of this embodiment of the application, after the entity association relationship is determined, every two entities and their association relationship are stored in the template, together with the contexts whose occurrence frequency is higher than the preset frequency. When the rewritten word of an entity in a new commodity title is to be determined, only the stored template needs to be queried, which is more efficient.
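One possible reading of the template mechanism is sketched below. The text does not specify how a context is represented or matched, so this sketch assumes a context is a single word that co-occurs with the entity and that matching is by substring; `build_templates`, `match_title`, and the relationship label `"hypernym"` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_templates(observations, preset_frequency=1):
    """Keep (context, related entity, relationship) entries seen more often than the preset frequency.

    observations: iterable of (entity, related_entity, relationship, context) tuples.
    """
    templates = defaultdict(list)
    for (entity, related, relationship, context), count in Counter(observations).items():
        if count > preset_frequency:
            templates[entity].append((context, related, relationship))
    return dict(templates)


def match_title(title, templates):
    """Return (entity, related entity, relationship) for every template matched by the new title."""
    matches = []
    for entity, entries in templates.items():
        if entity not in title:
            continue
        for context, related, relationship in entries:
            if context in title:  # the stored context must also appear in the new title
                matches.append((entity, related, relationship))
    return matches


observations = [
    ("moon cake", "cake", "hypernym", "traditional"),
    ("moon cake", "cake", "hypernym", "traditional"),
    ("moon cake", "cake", "hypernym", "special"),
]
templates = build_templates(observations)
found = match_title("lotus paste moon cake traditional snack", templates)
```

The context "special" is seen only once and is therefore not stored, so a title containing only that context produces no match.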
Optionally, as an embodiment, the method for determining an entity association relationship further includes: acquiring a pair of commodity titles and commodity entities in the pair of commodity titles, wherein the pair of commodity titles are titles of two commodities with the same view, and text superposition exists; obtaining pictures of the two commodities and comparing the similarity; and under the condition that the similarity is higher than a preset threshold value, determining that the association relation between the commodity entities in the pair of commodity titles is a homonym relation.
Some commodities with shorter titles and lacking co-occurrence entities, such as fresh fruit and vegetable commodities, have only a single main entity, and the information provided by the titles of the commodities is insufficient, but the image correlation of the commodities is high, and the incidence relation of the commodity entities can be judged through the image correlation. For example, the commodity title "selenium watermelon sand" only contains a single main entity "selenium watermelon sand" and does not have multiple entities, so that the association relationship between every two entities cannot be mined based on the commodity title. In this regard, the applicant has conceived that the association relationship of the entities may be determined based on the similarity of the pictures of the commodities included in the pair or the plurality of commodity titles.
And determining the incidence relation of the commodity entities according to the image similarity, and comparing the similarity of the images of the commodities in the same category in order to reduce the calculation amount and improve the accuracy. The method comprises the steps of firstly obtaining a plurality of commodity categories, aggregating commodities of the same category, and then identifying commodity entities in commodity titles under each category.
The method comprises the steps of obtaining a pair of product titles with the same category and overlapped texts of the product titles and product entities in the pair of product titles. And acquiring pictures of the two commodities, wherein the pictures are usually main commodities. And acquiring the representation vectors of the pictures of the two commodities, and calculating the similarity of the pictures of the two commodities according to the representation vectors of the pictures of the two commodities. The representation vector of the picture may be obtained through a pre-trained picture representation model, which is not limited in this embodiment of the present invention.
When the similarity is higher than a preset threshold, the two commodity entities are determined to be homonyms, that is, same-level words. Here, homonyms include synonyms and are words corresponding to nodes under the same hypernym. For example, selenium sand melon and stone-seam melon are both hyponyms of watermelon, so selenium sand melon and stone-seam melon are same-level words.
After the hypernym of an entity and a same-level word of that entity are determined, the hypernym of the entity can also be taken as the hypernym of its same-level word. For example, after it is determined that the hypernym of selenium sand melon is watermelon and that selenium sand melon and stone-seam melon are same-level words, the hypernym of stone-seam melon can be determined to be watermelon.
Optionally, the association relationships of multiple pairs of commodity entities are determined from multiple pairs of commodity titles and the commodity entities in them, and these association relationships are aggregated to determine the final association relationship of the commodity entities. For the method of determining association relationships from multiple pairs of commodity titles, reference may be made to the method described above for determining the association relationship between entities from a plurality of commodity titles, which is not described in detail here.
Referring to fig. 2, a schematic diagram of determining entity relationships from pictures of commodities is shown. The commodities are selenium sand melon and stone-seam melon, which belong to the same category, fruit. The titles and pictures corresponding to the commodities are acquired, and entity identification is performed on the two commodity titles to identify the commodity entity selenium sand melon and the commodity entity stone-seam melon. Picture representation vectors of the two commodities are acquired, and the similarity of the two representation vectors is calculated. When the similarity of the representation vectors of the two pictures exceeds a preset threshold, the two entities are homonyms, that is, same-level words. By querying the hypernym lexicon, it is known that watermelon is the hypernym of stone-seam melon, so the hypernym of selenium sand melon is determined to be watermelon, and the rewritten word generated for selenium sand melon is therefore watermelon.
By adopting the technical scheme of this embodiment of the application, the association relationship of commodity entities is determined by using commodity picture information in addition to commodity titles, which solves the problem that some commodity titles are too short and contain only a single main entity.
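As a minimal sketch of the threshold test, the snippet below compares two picture representation vectors. The embodiment does not name a similarity measure, so cosine similarity is an assumption, as are the function names and the default threshold of 0.9; the vectors themselves would come from whatever pre-trained picture representation model is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def are_same_level(vec_a, vec_b, threshold=0.9):
    """True when picture similarity is higher than the preset threshold."""
    return cosine_similarity(vec_a, vec_b) > threshold
```

The threshold value is illustrative only; in practice it would be tuned on labeled pairs.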
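The hypernym inference in this paragraph can be sketched as a single propagation step over same-level pairs. The data shapes (a dict of entity to hypernym set, and a list of same-level pairs) and the name `propagate_hypernyms` are assumptions made for illustration.

```python
def propagate_hypernyms(hypernyms, same_level_pairs):
    """Copy known hypernyms across each same-level pair (one hop only).

    `hypernyms` maps an entity to the set of its known hypernyms.
    `same_level_pairs` is a list of (entity_a, entity_b) tuples.
    """
    result = {entity: set(parents) for entity, parents in hypernyms.items()}
    for a, b in same_level_pairs:
        # Read from the original mapping so the result is order independent.
        parents_a = set(hypernyms.get(a, ()))
        parents_b = set(hypernyms.get(b, ()))
        result.setdefault(a, set()).update(parents_b)
        result.setdefault(b, set()).update(parents_a)
    return result
```

With the watermelon example, knowing that selenium sand melon has the hypernym watermelon and that it is a same-level word of stone-seam melon yields watermelon as a hypernym of stone-seam melon.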
Optionally, as an embodiment, the method for determining an entity association relationship further includes: acquiring a query word; determining a rewritten word of the query word based on the determined association relationship; and performing a commodity query based on the query word and its rewritten word.
When a commodity query is performed with a query word, the query word is acquired, its rewritten words are determined through the association relationships in the knowledge graph, the commodity graph, or the stored templates, and the commodity query is performed based on both the query word and the rewritten words. This can improve the recall rate of commodity queries.
Optionally, the entities in each commodity title are rewritten in advance through the association relationships in the knowledge graph, the commodity graph, or the stored templates, and the rewritten commodity titles of each commodity title are stored. When a query is performed with a query word, the query is run over both the original commodity titles and the rewritten commodity titles, and when a rewritten commodity title is hit, the original commodity title corresponding to it is recalled.
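A minimal sketch of query-time rewriting follows. The association store is modeled as a plain dict from a word to its associated words, and retrieval is modeled as substring matching over titles; both are simplifying assumptions, and the function names are hypothetical.

```python
def rewrite_query(query, associations):
    """Return the rewritten words for a query word, without duplicates."""
    rewrites = []
    for word in associations.get(query, []):
        if word != query and word not in rewrites:
            rewrites.append(word)
    return rewrites


def search(query, associations, titles):
    """Return titles matching the query word or any of its rewritten words."""
    terms = [query] + rewrite_query(query, associations)
    return [title for title in titles if any(term in title for term in terms)]
```

Searching with both the query word and its rewritten words returns titles that the query word alone would miss, which is the recall gain described above.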
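The index-time alternative can be sketched as follows: each title is rewritten ahead of time and every rewritten form points back to its original title. The dict-based index, the substring matching, and the names `build_rewritten_index` and `recall` are assumptions for illustration only.

```python
def build_rewritten_index(titles, associations):
    """Map every stored title form (original or rewritten) to its originals."""
    index = {}
    for title in titles:
        index.setdefault(title, set()).add(title)
        for entity, related_words in associations.items():
            if entity in title:
                for word in related_words:
                    rewritten = title.replace(entity, word)
                    index.setdefault(rewritten, set()).add(title)
    return index


def recall(query, index):
    """Return the original titles whose stored forms contain the query."""
    hits = set()
    for stored_title, originals in index.items():
        if query in stored_title:
            hits |= originals
    return sorted(hits)
```

Compared with rewriting the query, this moves the rewriting cost to indexing time, at the price of storing several forms per title.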
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Fig. 3 is a schematic structural diagram of an apparatus for determining an entity association relationship according to an embodiment of the present invention, and as shown in fig. 3, the apparatus for determining an entity association relationship includes an entity identification module, a semantic vector acquisition module, and an association relationship determination module, where:
the entity identification module is used for carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
a semantic vector acquisition module, configured to acquire respective semantic vectors of the plurality of entities respectively;
and the association relation determining module is used for determining a target association relation between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
Optionally, as an embodiment, the association relation determining module includes:
the splicing unit is used for splicing (concatenating) the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
the association determining unit is used for determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
the first relation determining unit is used for determining that the target association relation between every two entities is a synonym relation when both determination results indicate that an association exists between every two entities;
and the second relation determining unit is used for determining that the target association relation between every two entities is a hypernym-hyponym relation when one determination result indicates that an association exists between every two entities and the other indicates that no association exists.
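The decision logic of these units can be sketched as below. The association classifier is passed in as a callable named `has_association`, which is a hypothetical placeholder for whatever trained model produces the yes/no result from a spliced vector; the label strings are also illustrative.

```python
def splice(u, v):
    """Concatenate two semantic vectors in the given order."""
    return list(u) + list(v)


def classify_pair(vec_a, vec_b, has_association):
    """Decide the relation between two entities from both splicing orders.

    `has_association` is an assumed callable: spliced vector -> bool.
    """
    forward = has_association(splice(vec_a, vec_b))  # first spliced vector
    reverse = has_association(splice(vec_b, vec_a))  # second spliced vector
    if forward and reverse:
        return "synonym"
    if forward != reverse:
        return "hypernym-hyponym"
    return "none"
```

The intuition is that a synonym relation is symmetric, so it should hold in both orders, whereas a hypernym-hyponym relation is directional and holds in only one.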
Optionally, as an embodiment, the apparatus further includes:
a module used for determining, for a plurality of commodity titles, a plurality of association relations between every two entities;
the aggregation module is used for aggregating the plurality of association relations, including the target association relation, to determine the association relation between every two entities;
the device further comprises:
and the graph completion module is used for completing the commodity graph based on the determined association relation.
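One possible reading of the aggregation step is a majority vote over the relations observed for the same entity pair across many titles. The embodiment here does not state the aggregation rule, so majority voting, the input shape, and the name `aggregate_relations` are assumptions.

```python
from collections import Counter


def aggregate_relations(observations):
    """Pick the most frequent relation observed for each entity pair.

    `observations` is a list of ((entity_a, entity_b), relation) tuples,
    one per commodity title in which the pair was classified.
    """
    votes = {}
    for pair, relation in observations:
        votes.setdefault(pair, Counter())[relation] += 1
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}
```

The aggregated result could then be written into the commodity graph as an edge between the two entities.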
Optionally, as an embodiment, the apparatus further includes:
the context acquisition module is used for acquiring, after the association relation between every two entities is determined, the context in the commodity title of each pair of entities having the association relation;
the storage module is used for storing, in a template, the contexts whose occurrence frequency is higher than a preset frequency together with the association relation between every two entities;
the matching module is used for acquiring a new commodity title and matching the new commodity title with the stored template;
and the entity determining module is used for determining, from the new commodity title and according to the matching result, the entities that have the association relation recorded in the template.
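The template mechanism can be sketched under a strong simplifying assumption: the "context" of an entity pair is taken to be the text lying between the two entities in the title. The function names, the input shapes, and the example titles used to exercise the code are all hypothetical.

```python
from collections import Counter


def extract_context(title, first, second):
    """Return the text between two entities, or None if not found in order."""
    start = title.find(first)
    if start < 0:
        return None
    end = title.find(second, start + len(first))
    if end < 0:
        return None
    return title[start + len(first):end]


def build_templates(labeled, min_count):
    """Keep (context, relation) pairs seen more than `min_count` times.

    `labeled` is a list of (title, first_entity, second_entity, relation).
    """
    counts = Counter()
    for title, first, second, relation in labeled:
        context = extract_context(title, first, second)
        if context is not None:
            counts[(context, relation)] += 1
    return {key for key, n in counts.items() if n > min_count}


def match_templates(title, entities, templates):
    """Find entity pairs in a new title whose context matches a template."""
    found = []
    for first in entities:
        for second in entities:
            if first == second:
                continue
            context = extract_context(title, first, second)
            for template_context, relation in templates:
                if context == template_context:
                    found.append((first, second, relation))
    return found
```

A frequent context such as an opening parenthesis between two names would, under this sketch, let a relation be read off a new title without running the vector-based classifier again.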
Optionally, as an embodiment, the apparatus further includes:
the filter word acquisition module is used for acquiring a preset filter word before entity identification is performed on the commodity title, wherein the preset filter word indicates that the commodity type is a combined commodity;
and the filtering module is used for filtering out the candidate commodity titles that contain the preset filter word to obtain the commodity title.
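A minimal sketch of the filtering step follows, assuming that "filtering" means discarding candidate titles that contain any preset filter word and that containment is tested by substring match. The function name and the example filter word are hypothetical.

```python
def filter_titles(candidate_titles, filter_words):
    """Drop candidate titles that contain any preset filter word."""
    return [
        title
        for title in candidate_titles
        if not any(word in title for word in filter_words)
    ]
```

Removing combined commodities first avoids mining relations between entities that co-occur in a title only because they are sold together.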
Optionally, as an embodiment, the apparatus further includes:
the commodity entity acquisition module is used for acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are the titles of two commodities of the same category and have overlapping text;
the similarity comparison module is used for acquiring pictures of the two commodities and comparing their similarity;
and the homonym module is used for determining that the association relation between the commodity entities in the pair of commodity titles is a homonym (same-level word) relation when the similarity is higher than a preset threshold.
Optionally, as an embodiment, the apparatus further includes:
the query term module is used for acquiring query terms;
a rewrite word module for determining a rewrite word of the query word based on the determined association relationship;
and the commodity query module is used for carrying out commodity query based on the query words and the rewriting words thereof.
By adopting the technical scheme of this embodiment of the application, entity identification can be performed on a commodity title to obtain a plurality of entities in the commodity title; respective semantic vectors of the plurality of entities are acquired; and a target association relation between every two entities is determined according to the respective semantic vectors of every two entities in the plurality of entities. Because entities in the same commodity title are strongly related semantically, determining the target association relation between entities in a commodity title ensures that the entities determined to have the target association relation are strongly related semantically. In addition, commodity titles are not affected by user behavior data, so the method of determining entity association relations from commodity titles can cover new commodity titles and long-tail commodity titles with low traffic.
It should be noted that the device embodiments are similar to the method embodiments and are therefore described briefly; for relevant details, reference may be made to the method embodiments.
An embodiment of the present invention further provides an electronic device, including: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method for determining entity association relationship according to any of the above embodiments.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed, the method for determining an entity association relationship in any of the above embodiments is implemented.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, apparatus, electronic device, and storage medium for determining entity association relationships provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for determining entity associations, the method comprising:
carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
respectively acquiring respective semantic vectors of the plurality of entities;
and determining a target association relation between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
2. The method of claim 1, wherein determining a target association between two entities comprises:
splicing the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
determining that the target association relation between every two entities is a synonym relation when both determination results indicate that an association exists between every two entities;
and determining that the target association relation between every two entities is a hypernym-hyponym relation when one determination result indicates that an association exists between every two entities and the other indicates that no association exists.
3. The method of claim 1, further comprising:
determining, for a plurality of commodity titles, a plurality of association relations between every two entities;
aggregating the plurality of association relations, including the target association relation, to determine the association relation between every two entities;
the method further comprises the following steps:
and completing the commodity graph based on the determined association relation.
4. The method of claim 3, further comprising, after determining the association relation between every two entities:
acquiring the context in the commodity title of each pair of entities having the association relation;
storing, in a template, the contexts whose occurrence frequency is higher than a preset frequency together with the association relation between every two entities;
acquiring a new commodity title, and matching the new commodity title with the stored template;
and according to the matching result, determining the entity which has the association relation with the entity in the template from the new commodity title.
5. The method of claim 1, further comprising, prior to performing entity identification on the commodity title:
acquiring a preset filter word, wherein the preset filter word indicates that the commodity type is a combined commodity;
and filtering out the candidate commodity titles that contain the preset filter word to obtain the commodity title.
6. The method of claim 1, further comprising:
acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are the titles of two commodities of the same category and have overlapping text;
obtaining pictures of the two commodities and comparing their similarity;
and determining that the association relation between the commodity entities in the pair of commodity titles is a homonym (same-level word) relation when the similarity is higher than a preset threshold.
7. The method according to any one of claims 1-6, further comprising:
acquiring a query word;
determining a rewritten word of the query word based on the determined association relation;
and performing a commodity query based on the query word and its rewritten word.
8. An apparatus for determining entity associations, the apparatus comprising:
the entity identification module is used for carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
a semantic vector acquisition module, configured to acquire respective semantic vectors of the plurality of entities respectively;
and the association relation determining module is used for determining a target association relation between every two entities according to the respective semantic vectors of every two entities in the plurality of entities.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for determining entity associations according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method of determining entity associations according to any one of claims 1 to 7.
CN202111576489.3A 2021-12-21 2021-12-21 Method, device, electronic equipment and storage medium for determining entity association relation Active CN114490884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111576489.3A CN114490884B (en) 2021-12-21 2021-12-21 Method, device, electronic equipment and storage medium for determining entity association relation

Publications (2)

Publication Number Publication Date
CN114490884A true CN114490884A (en) 2022-05-13
CN114490884B CN114490884B (en) 2023-06-06

Family

ID=81494204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111576489.3A Active CN114490884B (en) 2021-12-21 2021-12-21 Method, device, electronic equipment and storage medium for determining entity association relation

Country Status (1)

Country Link
CN (1) CN114490884B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086356A (en) * 2018-07-18 2018-12-25 哈尔滨工业大学 The incorrect link relationship diagnosis of extensive knowledge mapping and modification method
CN109885691A (en) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 Knowledge mapping complementing method, device, computer equipment and storage medium
CN110598001A (en) * 2019-08-05 2019-12-20 平安科技(深圳)有限公司 Method, device and storage medium for extracting association entity relationship
CN111428046A (en) * 2020-03-18 2020-07-17 浙江网新恩普软件有限公司 Knowledge graph generation method based on bidirectional LSTM deep neural network
CN112035672A (en) * 2020-07-23 2020-12-04 深圳技术大学 Knowledge graph complementing method, device, equipment and storage medium
CN112507715A (en) * 2020-11-30 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining incidence relation between entities
CN112528037A (en) * 2020-12-04 2021-03-19 北京百度网讯科技有限公司 Edge relation prediction method, device, equipment and storage medium based on knowledge graph
CN112612961A (en) * 2020-12-28 2021-04-06 完美世界(北京)软件科技发展有限公司 Information searching method and device, storage medium and computer equipment
CN112668719A (en) * 2020-11-06 2021-04-16 北京工业大学 Knowledge graph construction method based on engineering capacity improvement
CN113392312A (en) * 2020-03-12 2021-09-14 阿里巴巴集团控股有限公司 Information processing method and system and electronic equipment
CN113468298A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Commodity title processing method and device, electronic equipment and computer-readable storage medium
CN113806561A (en) * 2021-10-11 2021-12-17 中国人民解放军国防科技大学 Knowledge graph fact complementing method based on entity attributes

Also Published As

Publication number Publication date
CN114490884B (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant