CN114490884B - Method, device, electronic equipment and storage medium for determining entity association relation
- Publication number
- CN114490884B (CN202111576489.3A)
- Authority
- CN
- China
- Prior art keywords
- entities
- commodity
- determining
- word
- association
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method, a device, electronic equipment and a storage medium for determining an entity association relationship, and relates to the technical field of data processing. The method comprises the following steps: performing entity identification on a commodity title to obtain a plurality of entities in the commodity title; acquiring a semantic vector of each entity; and determining a target association relationship between every two entities according to the semantic vectors of every two entities in the plurality of entities.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for determining an entity association relationship.
Background
Knowledge graphs are widely applied, and their basic constituent unit, the entity, is becoming increasingly important. An entity is the unique identifier of a thing and an important hub connecting unstructured text with structured knowledge. Determining the association relationships between entities has therefore become an important task.
In the related art, association relationships between entities are usually determined by finding similar entities based on user behavior. For example, a user searches with one word for tomato and clicks on a commodity named with a different word for tomato, so the two words are considered possible synonyms. Alternatively, words with similar semantics are generated directly by a machine translation model. However, the method based on user behavior requires rich user behavior data and depends on the quality of the query words entered by users, while the results generated by the method based on a machine translation model suffer from severe semantic drift.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for determining an entity association relationship, so as to overcome or at least partially solve the foregoing problems.
In a first aspect of an embodiment of the present invention, a method for determining an entity association relationship is provided, where the method includes:
performing entity identification on a commodity title to obtain a plurality of entities in the commodity title;
acquiring a semantic vector of each entity;
and determining the target association relationship between every two entities according to the semantic vectors of every two entities in the plurality of entities.
Optionally, determining the target association relationship between every two entities includes:
concatenating the semantic vectors of every two entities in forward order to obtain a first spliced vector, and concatenating the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
when both determination results indicate that the two entities are associated, determining that the target association relationship between the two entities is a synonym relationship;
and when one determination result indicates that the two entities are associated and the other indicates that they are not, determining that the target association relationship between the two entities is a hypernym-hyponym relationship.
Optionally, the method further comprises:
determining association relationships among a plurality of entities for a plurality of commodity titles;
aggregating a plurality of association relationships, including the target association relationship, to determine the association relationship between every two entities;
the method further comprises:
and completing the commodity graph based on the determined association relationship.
Optionally, after determining the association relationship between every two entities, the method further includes:
acquiring the context of each entity with association relationship in the commodity title;
storing, in a template, each context whose occurrence frequency is higher than a preset frequency together with the association relationship between the two entities;
acquiring a new commodity title, and matching the new commodity title with the stored template;
and determining the entity with the association relation with the entity in the template from the new commodity title according to the matching result.
Optionally, before the entity identification is performed on the commodity title, the method comprises:
acquiring a preset filter word, wherein the preset filter word represents that the commodity type is a combined commodity;
and filtering the candidate commodity titles containing the preset filtering words to obtain the commodity titles.
Optionally, the method further comprises:
acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are the titles of two commodities under the same category and their texts overlap;
acquiring pictures of the two commodities and comparing their similarity;
and when the similarity is higher than a preset threshold, determining that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym relationship.
Optionally, the method further comprises:
acquiring query words;
determining a rewritten word of the query word based on the determined association relationship;
and performing a commodity query based on the query word and its rewritten word.
In a second aspect of an embodiment of the present invention, there is provided an apparatus for determining an entity association relationship, where the apparatus includes:
the entity identification module is used for carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
the semantic vector acquisition module is used for respectively acquiring the semantic vectors of the entities;
and the association relation determining module is used for determining the target association relation between every two entities according to the semantic vectors of every two entities in the plurality of entities.
In a third aspect of the embodiment of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for determining an entity association relationship according to the first aspect of the embodiment of the present invention when the processor executes the computer program.
In a fourth aspect of the embodiment of the present invention, there is provided a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the method for determining an entity association relationship according to the first aspect of the embodiment of the present invention.
The embodiment of the invention has the following advantages:
In this embodiment, entity identification may be performed on a commodity title to obtain a plurality of entities in the commodity title; a semantic vector of each entity is acquired; and the target association relationship between every two entities is determined according to the semantic vectors of every two entities in the plurality of entities. Because the entities in a commodity title are strongly associated semantically, determining the target association relationship among the entities of a commodity title ensures that entities found to have a target association relationship are strongly associated semantically. In addition, commodity titles do not depend on user behavior data, so a method that determines entity association relationships from commodity titles can also cover new commodity titles and low-traffic, long-tail commodity titles.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a method for determining entity association in an embodiment of the present invention;
FIG. 2 is a flow chart of determining entity relationships in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of an apparatus for determining an entity association relationship in an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
To address the problems of related-art methods for determining entity association relationships, such as the need for rich user behavior data and severe semantic drift, the applicant proposes the following: the entities in a commodity title are strongly associated semantically, so the association relationships between entities can be determined by mining commodity titles.
FIG. 1 is a flowchart of the steps of a method for determining an entity association relationship in an embodiment of the present invention. As shown in FIG. 1, the method may specifically include the following steps:
Step S11: performing entity identification on the commodity title to obtain a plurality of entities in the commodity title.
Commodities that co-occur in the title of a combined (bundled) commodity are generally unrelated, so the semantic association among the entities identified from such a title is weak. Examples are a cut-fruit commodity title such as "watermelon, apple and juicy peach" and a gift-box commodity title such as "peanut, cashew nut, peach kernel and walnut kernel New Year gift box".
Therefore, before entity identification is performed, the commodity titles need to be screened, either manually or by means of preset filter words. Optionally, as an embodiment, screening the commodity titles by preset filter words includes: before entity identification is performed, acquiring candidate commodity titles and preset filter words, wherein a preset filter word indicates that the commodity type is a combined commodity; and filtering out the candidate commodity titles that contain a preset filter word to obtain the commodity titles.
A preset filter word indicates that the commodity type is a combined commodity, for example cut fruit, gift box, gift bag or mixed package. Each candidate commodity title is checked for preset filter words, because the commodities that co-occur in a title containing a preset filter word are generally unrelated. Candidate commodity titles that contain a preset filter word are filtered out, leaving the commodity titles that contain no preset filter word.
This guarantees that the entities in the commodity titles used for mining association relationships are strongly associated semantically, and correspondingly that the entities found to have an association relationship through those commodity titles are also strongly associated semantically.
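The screening by preset filter words can be sketched as follows. This is a minimal illustration only: the function name `filter_titles`, the English filter words and the substring-matching rule are assumptions made for the example, and the embodiment does not prescribe how a filter word is matched against a title.

```python
# Hypothetical filter words marking combined commodities (examples adapted from the text).
PRESET_FILTER_WORDS = ("cut fruit", "gift box", "gift bag", "mixed package")


def filter_titles(candidate_titles, filter_words=PRESET_FILTER_WORDS):
    """Keep only the candidate titles that contain none of the filter words.

    Matching is a case-insensitive substring test, which is an assumption
    of this sketch rather than something the description specifies.
    """
    kept = []
    for title in candidate_titles:
        lowered = title.lower()
        if not any(word in lowered for word in filter_words):
            kept.append(title)
    return kept


candidates = [
    "peanut cashew walnut kernel New Year gift box",
    "simple lightweight thermos cup portable straight water cup",
]
commodity_titles = filter_titles(candidates)
print(commodity_titles)
```

Only the thermos cup title survives, since the first title contains the filter word "gift box".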
Entity identification extracts entity information from text data. Performing entity identification on a commodity title usually yields several entities, such as the product, the brand and the category. For example, the entities "meal magician", "thermos cup" and "water cup" may be identified from the title "meal magician brief lightweight thermos cup portable straight body water cup". The embodiment of the invention does not limit the entity identification method.
Step S12: acquiring a semantic vector of each entity.
After the entities are identified, the semantic vector of each entity may be obtained through a pre-trained model. The pre-trained model may be a BERT (Bidirectional Encoder Representations from Transformers, a language representation model) model or another model, which is not limited in the embodiments of the invention. Different labels can be used to mark the position of each entity, and the entire commodity title is then input into the pre-trained model, so that the resulting semantic vectors of the entities are more accurate.
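One possible way to mark entity positions with different labels before the title is fed to the pre-trained model is sketched below. The marker format `[E1]...[/E1]`, the function name `mark_entities` and the first-occurrence matching are all assumptions of this sketch; the description does not specify the labels, and the encoding step itself (for example with BERT) is omitted.

```python
def mark_entities(title, entities):
    """Wrap the first occurrence of each entity in numbered marker tokens.

    The [E1]...[/E1] marker format is hypothetical. Entities are assumed
    not to overlap inside the title.
    """
    spans = []
    for entity in entities:
        start = title.find(entity)
        if start != -1:
            spans.append((start, start + len(entity)))

    pieces = []
    cursor = 0
    for index, (start, end) in enumerate(sorted(spans), start=1):
        pieces.append(title[cursor:start])                      # text before the entity
        pieces.append(f"[E{index}]{title[start:end]}[/E{index}]")  # marked entity
        cursor = end
    pieces.append(title[cursor:])                               # trailing text
    return "".join(pieces)


marked_title = mark_entities(
    "lightweight thermos cup portable water cup",
    ["thermos cup", "water cup"],
)
print(marked_title)
```

The marked title would then be passed as a whole to the pre-trained model, and the vectors at the marked positions taken as the semantic vectors of the entities.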
Step S13: determining the target association relationship between every two entities according to the semantic vectors of every two entities in the plurality of entities.
If a target association relationship exists between two entities, the two entities are in a rewritten-word relationship, where a rewritten word of a word is its synonym or its hypernym. The semantic vectors of any two of the plurality of entities are concatenated one after the other, and a pre-trained multi-layer fully connected network predicts whether the two entities are rewritten words. The order of the two entities is then reversed, the vectors are concatenated again, and the pre-trained multi-layer fully connected network again predicts whether the two entities are rewritten words. The target association relationship between the two entities can be determined from the two prediction results.
A multi-layer fully connected network has strong expressive power. By classifying the concatenated semantic vectors with the pre-trained multi-layer fully connected network, it can be predicted whether the entities corresponding to the two semantic vectors are rewritten words. The multi-layer fully connected network may be trained with supervision or with self-supervision, which is not limited in the embodiments of the invention.
Alternatively, the association relationship between entities may also be predicted by another deep learning model, an entity relationship recognition model, or the like.
Optionally, as an embodiment, determining the target association relationship between every two entities includes: concatenating the semantic vectors of every two entities in forward order to obtain a first spliced vector, and concatenating them in reverse order to obtain a second spliced vector; determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector; when both determination results indicate that the two entities are associated, determining that the target association relationship between the two entities is a synonym relationship; and when one determination result indicates that the two entities are associated and the other indicates that they are not, determining that the target association relationship between the two entities is a hypernym-hyponym relationship.
The semantic vectors of every two entities are concatenated in forward order to obtain the first spliced vector and in reverse order to obtain the second spliced vector, where forward and reverse simply denote two opposite orders. For example, for entities A and B, the first and second spliced vectors may be A-B and B-A respectively, or B-A and A-B respectively.
Whether the two entities are associated is determined once from the first spliced vector and once from the second spliced vector, using a pre-trained model such as a multi-layer fully connected network or another deep learning model. When both determination results indicate an association, the target association relationship between the two entities is a synonym relationship. When both determination results indicate no association, no association relationship exists between the two entities. When one determination result indicates an association and the other does not, the target association relationship between the two entities is a hypernym-hyponym relationship. Which entity is the hypernym (upper word) and which is the hyponym (lower word) depends on what the model determines. If the model determines whether the entity of the preceding semantic vector in a spliced vector is a hypernym or synonym of the entity of the following semantic vector, then, in the spliced vector for which an association was found, the entity corresponding to the preceding semantic vector is the hypernym of the entity corresponding to the following semantic vector.
For example, the first spliced vector obtained from entity A and entity B is A-B, and the second spliced vector is B-A;
if A and B are associated in the first spliced vector A-B, so that A is a hypernym or synonym of B, and B and A are also associated in the second spliced vector B-A, so that B is a hypernym or synonym of A, then entity A and entity B can only be synonyms;
if no association is found in either the first spliced vector A-B or the second spliced vector B-A, so that neither of A and B is a hypernym or synonym of the other, then entity A and entity B have no association relationship;
if A and B are associated in the first spliced vector A-B, so that A is a hypernym or synonym of B, but B and A are not associated in the second spliced vector B-A, so that B is not a hypernym or synonym of A, then entity A is the hypernym of entity B.
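The decision rule over the two spliced vectors can be sketched as follows. The pre-trained classifier is replaced here by a hypothetical stand-in, `toy_classifier`, which treats the first component of each semantic vector as a made-up "generality" score; in the embodiment this role is played by a pre-trained multi-layer fully connected network or another model. The function names and return labels are likewise assumptions of this sketch.

```python
from typing import Callable, List, Sequence


def concat(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Concatenate two semantic vectors into one spliced vector."""
    return list(first) + list(second)


def decide_relation(
    vec_a: Sequence[float],
    vec_b: Sequence[float],
    is_associated: Callable[[List[float]], bool],
) -> str:
    """Apply the forward/reverse rule.

    is_associated(spliced) is assumed to answer: is the entity of the
    preceding vector a hypernym or synonym of the entity of the following one?
    """
    forward = is_associated(concat(vec_a, vec_b))  # first spliced vector A-B
    reverse = is_associated(concat(vec_b, vec_a))  # second spliced vector B-A
    if forward and reverse:
        return "synonym"
    if forward:
        return "A is hypernym of B"
    if reverse:
        return "B is hypernym of A"
    return "none"


def toy_classifier(spliced: List[float]) -> bool:
    """Hypothetical stand-in for the trained network, for illustration only."""
    half = len(spliced) // 2
    return spliced[0] >= spliced[half]


relation = decide_relation([2.0, 0.1], [1.0, 0.3], toy_classifier)
print(relation)
```

Running the classifier on both orders is what lets a single binary model separate synonyms (associated in both directions) from hypernym-hyponym pairs (associated in one direction only).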
Optionally, the target association relationship between every two entities can be directly identified through a pre-trained entity relationship identification model.
With the technical solution of this embodiment of the application, entity identification can be performed on a commodity title to obtain a plurality of entities in the commodity title; a semantic vector of each entity is acquired; and the target association relationship between every two entities is determined according to the semantic vectors of every two entities in the plurality of entities. Because the entities in a commodity title are strongly associated semantically, determining the target association relationship among the entities of a commodity title ensures that entities found to have a target association relationship are strongly associated semantically. In addition, commodity titles do not depend on user behavior data, so a method that determines entity association relationships from commodity titles can also cover new commodity titles and low-traffic, long-tail commodity titles.
Optionally, as an embodiment, the method for determining the entity association relationship further includes: determining association relationships among a plurality of entities for a plurality of commodity titles; aggregating a plurality of association relationships, including the target association relationship, to determine the association relationship between every two entities; and completing the commodity graph based on the determined association relationship.
For each commodity title, the association relationship between every two entities among the plurality of entities in that title may be determined, and this may be repeated for a plurality of commodity titles. To make the obtained association relationships more accurate, the association relationships determined from the plurality of commodity titles can be aggregated, which excludes the idiosyncrasies of a small number of commodity titles.
To determine whether an association relationship exists between entity A and entity B, a plurality of commodity titles containing entity A or entity B are acquired, and the target association relationships of entity A and of entity B in those titles are obtained. A correlation score is determined between each of these target association relationships and the candidate association relationship between entity A and entity B; the scores are averaged, and the average is compared with a preset score to determine the association relationship between entity A and entity B. For example, to predict whether the entity "cake" is a hypernym of the entity "moon cake", the commodity titles "cake traditional mung bean cake specialty", "cake traditional five-kernel moon cake" and "cake traditional yolk moon cake" are acquired, giving the association relationships cake-mung bean cake (hypernym), cake-moon cake (hypernym) and cake-moon cake (hypernym). The correlation score of cake-mung bean cake (hypernym) is 0.2 and the correlation score of each cake-moon cake (hypernym) relationship is 0.9, so the average correlation score of the three association relationships is (0.2 + 0.9 + 0.9) / 3 ≈ 0.67, which is larger than the preset score of 0.5; the entity "cake" is therefore a hypernym of the entity "moon cake".
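The averaging step of the cake and moon cake example can be reproduced with a short sketch. The function name `aggregate_by_mean` is hypothetical, and how each individual correlation score is produced is outside the sketch; only the scores 0.2, 0.9 and 0.9 and the preset score 0.5 come from the example.

```python
from statistics import mean


def aggregate_by_mean(correlation_scores, preset_score=0.5):
    """Average the correlation scores and compare the mean with the preset score."""
    average = mean(correlation_scores)
    return average, average > preset_score


# Scores from the example: cake-mung bean cake 0.2, cake-moon cake 0.9 (twice).
average, cake_is_hypernym = aggregate_by_mean([0.2, 0.9, 0.9])
print(round(average, 2), cake_is_hypernym)
```

Averaging over several titles means that a single low-scoring title, such as the mung bean cake title here, does not overturn the relationship supported by the others.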
Optionally, the association relationships may be aggregated directly into clusters, and the proportion that the association relationships in a cluster make up of all the corresponding association relationships is determined; if the proportion is higher than a preset proportion, the association relationship in that cluster is determined to be the association relationship of the two entities.
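One reading of this proportion-based aggregation is a vote over the relationship labels obtained for the same entity pair from different titles, sketched below. This interpretation, the function name `aggregate_by_proportion` and the preset proportion of 0.6 are assumptions of the sketch; the description does not fix how the clusters are formed.

```python
from collections import Counter


def aggregate_by_proportion(relation_labels, preset_proportion=0.6):
    """Return the most frequent label if its share exceeds the preset proportion.

    relation_labels holds one label per commodity title for the same entity
    pair. None is returned when no label is frequent enough or the list is empty.
    """
    if not relation_labels:
        return None
    label, count = Counter(relation_labels).most_common(1)[0]
    if count / len(relation_labels) > preset_proportion:
        return label
    return None


final_relation = aggregate_by_proportion(
    ["hypernym", "hypernym", "synonym", "hypernym"]
)
print(final_relation)
```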
After the association relationship between every two entities is determined, the commodity graph or the corresponding knowledge graph can be completed based on the determined association relationship.
With the technical solution of this embodiment of the application, using a plurality of commodity titles eliminates the influence of wrong association relationships determined from a small number of unusual commodity titles, ensures the accuracy of the association relationship confirmed for each pair of entities, and allows the commodity graph to be completed based on the correct association relationships.
Optionally, as an embodiment, after determining the association relationship between every two entities, the method further includes: acquiring the context, in the commodity title, of each entity that has an association relationship; storing, in a template, each context whose occurrence frequency is higher than a preset frequency together with the association relationship between the two entities; acquiring a new commodity title and matching it against the stored template; and determining, from the new commodity title and according to the matching result, the entity that has an association relationship with the entity in the template.
After the association relationship between two entities is determined, the two entities and their association relationship are stored in a template. When a new commodity title is acquired, the rewritten word of an entity in the new commodity title can be obtained simply by looking up, in the template, the other entity that corresponds to it.
One entity may have multiple rewritten words under different association relationships. Optionally, a plurality of commodity titles corresponding to a determined association relationship may be acquired, the contexts of the two entities in those commodity titles may be obtained, and each context whose occurrence frequency is higher than the preset frequency may be stored in the template together with the association relationship between the two entities. When the rewritten word of an entity in a new commodity title is acquired, the context of the entity is matched at the same time, so that the rewritten word corresponding to the entity in that context is determined and the rewritten word is obtained more accurately.
With the technical solution of this embodiment of the application, after the entity association relationship is determined, each pair of entities and their association relationship are stored in the template, together with each context whose occurrence frequency is higher than the preset frequency. When the rewritten word of an entity in a new commodity title is to be determined, only the stored template needs to be queried, which is highly efficient.
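A template store with context matching might look like the sketch below. The data layout (entity, context, rewritten word, relationship), the function names, the substring test for context matching and the toy observations are all invented for illustration; the description does not specify how a context is represented or matched.

```python
from collections import Counter, defaultdict


def build_templates(observations, preset_frequency=1):
    """Keep (entity, context) entries seen more often than preset_frequency.

    observations: iterable of (entity, context, rewritten_word, relation)
    tuples, one per occurrence in a commodity title (hypothetical layout).
    """
    templates = defaultdict(dict)
    for (entity, context, rewritten, relation), count in Counter(observations).items():
        if count > preset_frequency:
            templates[entity][context] = (rewritten, relation)
    return templates


def lookup_rewrite(templates, new_title, entity):
    """Return (rewritten_word, relation) for the first stored context found in the title."""
    for context, result in templates.get(entity, {}).items():
        if context in new_title:
            return result
    return None


# Invented toy observations; the third context occurs only once and is dropped.
observations = (
    [("apple", "fresh", "fruit", "hypernym")] * 2
    + [("apple", "phone case", "digital accessory", "hypernym")] * 2
    + [("apple", "pie", "dessert", "hypernym")]
)
templates = build_templates(observations)
rewrite = lookup_rewrite(templates, "fresh red apple 5kg", "apple")
print(rewrite)
```

Keying the lookup on both the entity and its context is what allows one entity to map to different rewritten words in different kinds of titles.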
Optionally, as an embodiment, the method for determining the entity association relationship further includes: acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are the titles of two commodities under the same category and their texts overlap; acquiring pictures of the two commodities and comparing their similarity; and when the similarity is higher than a preset threshold, determining that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym relationship.
Some commodities have short titles that lack co-occurring entities. Fresh fruit and vegetable commodities, for example, often have only a single main entity, so the commodity title provides insufficient information. The pictures of such commodities, however, are highly correlated, and the association relationship between commodity entities can be judged through picture correlation. For example, the commodity title "fresh selenium sand melon" contains only the single main entity "selenium sand melon" and no other entities, so the association relationship between every two entities cannot be mined from the commodity title alone. In this regard, the applicant proposes that the association relationship of the entities may be determined based on the similarity of the pictures of the commodities corresponding to a pair of, or multiple, commodity titles.
When the association relationship of commodity entities is determined from picture similarity, the similarity comparison is performed on pictures of commodities under the same category, to reduce the amount of computation and improve accuracy. A plurality of commodity categories are first acquired so that similar target commodities are grouped together, and the commodity entities in the commodity titles under each category are then identified.
A pair of commodity titles that are under the same category and whose texts overlap is acquired, together with the commodity entities in the pair of commodity titles. Pictures of the two commodities are acquired; these are usually the main pictures of the commodities. The representation vectors of the pictures of the two commodities are obtained, and the similarity of the pictures is calculated from these representation vectors. The representation vector of a picture can be obtained through a pre-trained picture representation model, which is not limited in the embodiments of the invention.
When the similarity is higher than a preset threshold, the two commodity entities are determined to be co-hyponyms. Co-hyponyms include synonyms and are words corresponding to nodes under the same hypernym. For example, selenium sand melon and stone-slit melon are both hyponyms of watermelon, so selenium sand melon and stone-slit melon are co-hyponyms of each other.
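The similarity comparison can be sketched with cosine similarity over the picture representation vectors. Cosine similarity, the threshold 0.9, the function names and the toy vectors are assumptions of this sketch: the description states only that a similarity is calculated from the representation vectors and compared with a preset threshold, and the picture representation model itself is omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def are_cohyponyms(picture_vec_1, picture_vec_2, preset_threshold=0.9):
    """Treat two commodity entities as co-hyponyms when their pictures are similar enough."""
    return cosine_similarity(picture_vec_1, picture_vec_2) > preset_threshold


# Toy picture vectors standing in for the output of a picture representation model.
similar = are_cohyponyms([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])
print(similar)
```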
Once the hypernym of an entity and a co-hyponym of that entity are known, the hypernym of the entity may also be determined to be the hypernym of the co-hyponym. For example, after it is determined that the hypernym of selenium sand melon is watermelon and that selenium sand melon and stone-slit melon are co-hyponyms, the hypernym of stone-slit melon can be determined to be watermelon.
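This propagation of a hypernym across co-hyponym pairs can be sketched as follows. The function name `propagate_hypernyms` and the dictionary layout are hypothetical, and keeping a single hypernym per entity is a simplification made for the example.

```python
def propagate_hypernyms(hypernym_of, cohyponym_pairs):
    """Return a copy of hypernym_of extended through co-hyponym pairs.

    hypernym_of maps an entity to its known hypernym. For each co-hyponym
    pair in which only one side has a known hypernym, the other side
    inherits it. The loop repeats until nothing changes, so chains of
    co-hyponym pairs are also covered.
    """
    result = dict(hypernym_of)
    changed = True
    while changed:
        changed = False
        for first, second in cohyponym_pairs:
            for known, unknown in ((first, second), (second, first)):
                if known in result and unknown not in result:
                    result[unknown] = result[known]
                    changed = True
    return result


hypernyms = propagate_hypernyms(
    {"selenium sand melon": "watermelon"},
    [("selenium sand melon", "stone-slit melon")],
)
print(hypernyms["stone-slit melon"])
```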
Optionally, the association relationships of a plurality of pairs of commodity entities are determined from a plurality of pairs of commodity titles and the commodity entities in them, and these association relationships are aggregated to determine the final association relationship of the commodity entities. For the method of determining the association relationships of the plurality of pairs of commodity entities, reference may be made to the method of determining the association relationships between entities from a plurality of commodity titles, which is not described in detail here.
FIG. 2 shows a schematic diagram of determining entity relationships from commodity pictures. The commodities are a selenium sand melon and a stone-slit melon, both belonging to the fruit category. The titles and pictures corresponding to the commodities are acquired, and entity identification is performed on each of the two commodity titles to identify the commodity entity "selenium sand melon" and the commodity entity "stone-slit melon". The picture representation vectors of the two commodities are obtained and their similarity is calculated. When the similarity exceeds the preset threshold, the two commodity entities are co-hyponyms. The hypernym lexicon is then queried to find that watermelon is the hypernym of stone-slit melon, so the hypernym of selenium sand melon is determined to be watermelon, and watermelon is generated as the rewritten word of selenium sand melon.
With the technical solution of this embodiment of the application, the association relationship of commodity entities is determined using the information in commodity pictures in addition to commodity titles, which solves the problem that some commodity titles are too short and contain only a single main entity.
Optionally, as an embodiment, the method for determining the entity association relationship further includes: acquiring query words; determining a rewritten word of the query word based on the determined association relationship; and carrying out commodity inquiry based on the inquiry words and the rewritten words thereof.
When a commodity is queried using a query word, the query word is acquired, its rewritten word is determined through the association relationships in the knowledge graph, the commodity graph, or the stored templates, and the commodity query is performed based on both the query word and the rewritten word. This improves the recall rate when commodities are queried.
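Query rewriting of this kind can be sketched as expanding the query word with its synonyms and hypernym and then matching titles against every resulting term. The substring matching, the dictionary layout, and the function names below are assumptions made for illustration; a real system would query a search index.

```python
from typing import Dict, List


def rewrite_query(query: str,
                  synonyms: Dict[str, List[str]],
                  hypernyms: Dict[str, str]) -> List[str]:
    """Return the query word followed by its rewritten words (synonyms, hypernym)."""
    terms = [query]
    for word in synonyms.get(query, []):
        if word not in terms:
            terms.append(word)
    hypernym = hypernyms.get(query)
    if hypernym is not None and hypernym not in terms:
        terms.append(hypernym)
    return terms


def search(query: str, titles: List[str],
           synonyms: Dict[str, List[str]],
           hypernyms: Dict[str, str]) -> List[str]:
    """Recall every title that contains the query word or one of its rewritten words."""
    terms = rewrite_query(query, synonyms, hypernyms)
    return [title for title in titles if any(term in title for term in terms)]


titles = ["sweet watermelon 5kg", "Ningxia selenium sand melon", "red apple"]
hits = search("selenium sand melon", titles, {}, {"selenium sand melon": "watermelon"})
```

With the rewritten word included, the title that only mentions watermelon is recalled as well, which is the recall improvement the paragraph describes.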
Optionally, the entities in each commodity title are rewritten in advance through the association relationships in the knowledge graph, the commodity graph, or the stored templates, and the rewritten commodity titles of each commodity title are stored. When a query is made with a query word, both the original commodity titles and the rewritten commodity titles are searched; when a rewritten commodity title is hit, the original commodity title corresponding to it is recalled.
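The alternative of rewriting titles ahead of time can be sketched as building a list of searchable texts, each pointing back to its original title. The storage layout, the substring matching, and the names are hypothetical choices for this sketch.

```python
from typing import Dict, List, Tuple


def build_rewritten_index(titles: List[str],
                          rewrites: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (searchable text, original title) pairs for originals and rewrites."""
    index: List[Tuple[str, str]] = []
    for title in titles:
        index.append((title, title))  # the original title is searchable as-is
        for entity, alternatives in rewrites.items():
            if entity in title:
                for alternative in alternatives:
                    index.append((title.replace(entity, alternative), title))
    return index


def recall(query: str, index: List[Tuple[str, str]]) -> List[str]:
    """Recall original titles whose original or rewritten text contains the query."""
    hits: List[str] = []
    for text, original in index:
        if query in text and original not in hits:
            hits.append(original)
    return hits


index = build_rewritten_index(["selenium sand melon 5kg"],
                              {"selenium sand melon": ["watermelon"]})
recalled = recall("watermelon", index)
```

Compared with rewriting the query at search time, this moves the rewriting cost to indexing time at the expense of storing several texts per commodity.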
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Fig. 3 is a schematic structural diagram of a device for determining an association relationship between entities according to an embodiment of the present invention, where, as shown in fig. 3, the device for determining an association relationship between entities includes an entity identification module, a semantic vector acquisition module, and an association relationship determination module, where:
the entity identification module is used for carrying out entity identification on the commodity title to obtain a plurality of entities in the commodity title;
the semantic vector acquisition module is used for respectively acquiring the semantic vectors of the entities;
and the association relation determining module is used for determining the target association relation between every two entities according to the semantic vectors of every two entities in the plurality of entities.
Optionally, as an embodiment, the association determining module includes:
the splicing unit is used for splicing the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
the association determining unit is used for determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
the first relation determining unit is used for determining that the target association relationship between every two entities is a synonym relationship when both determination results are that an association exists between the two entities;
and the second relation determining unit is used for determining that the target association relationship between every two entities is a hypernym-hyponym relationship when one determination result is that an association exists between the two entities and the other determination result is that no association exists between them.
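The decision logic of these units can be sketched as follows. The classifier that judges a spliced vector is passed in as a function because the text here does not specify it; `is_associated`, the relation labels, and the toy classifier are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Sequence

Classifier = Callable[[List[float]], bool]


def determine_relation(vec_a: Sequence[float], vec_b: Sequence[float],
                       is_associated: Classifier) -> Optional[str]:
    """Classify both splicing orders and combine the two determination results."""
    forward = list(vec_a) + list(vec_b)   # first spliced vector (forward order)
    reverse = list(vec_b) + list(vec_a)   # second spliced vector (reverse order)
    forward_hit = is_associated(forward)
    reverse_hit = is_associated(reverse)
    if forward_hit and reverse_hit:
        return "synonym"            # associated in both directions
    if forward_hit != reverse_hit:
        return "hypernym-hyponym"   # associated in exactly one direction
    return None                     # no association in either direction


def toy_classifier(spliced: List[float]) -> bool:
    """Hypothetical stand-in: compares the first component of each half."""
    half = len(spliced) // 2
    return spliced[0] <= spliced[half]


relation_same = determine_relation([1.0], [1.0], toy_classifier)
relation_directed = determine_relation([1.0], [2.0], toy_classifier)
```

Running the same classifier on both orders is what lets a single binary judgment separate a symmetric relation from a directional one.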
Optionally, as an embodiment, the apparatus further includes:
the multiple association relationship determining module is used for determining, for a plurality of commodity titles, the association relationships between every two entities in each commodity title;
the aggregation module is used for aggregating a plurality of association relationships including the target association relationship and determining the association relationship between every two entities;
the apparatus further comprises:
and the graph completion module is used for completing the commodity graph based on the determined association relationship.
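One way to picture the aggregation step is a majority vote over the relations observed for the same entity pair in different titles. Majority voting is an assumption of this sketch; the text here only says that the association relationships are aggregated.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Observation = Tuple[str, str, str]  # (entity_a, entity_b, relation)


def aggregate_relations(observations: Iterable[Observation]) -> Dict[Tuple[str, str], str]:
    """Keep, for each ordered entity pair, the relation observed most often."""
    votes: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for entity_a, entity_b, relation in observations:
        votes[(entity_a, entity_b)][relation] += 1
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}


observed = [
    ("tomato", "love apple", "synonym"),
    ("tomato", "love apple", "synonym"),
    ("tomato", "love apple", "hypernym-hyponym"),
    ("selenium sand melon", "watermelon", "hypernym-hyponym"),
]
aggregated = aggregate_relations(observed)
```

The pairs are kept ordered because a hypernym-hyponym relation is directional; the aggregated result could then be written into the commodity graph.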
Optionally, as an embodiment, the apparatus further includes:
the context acquisition module is used for acquiring the context of each of the two entities with the association relationship in the commodity title after determining the association relationship between the two entities;
the storage module is used for storing the context with the occurrence frequency higher than the preset frequency and the association relation between every two entities in the template;
the matching module is used for acquiring a new commodity title and matching the new commodity title with the stored template;
and the entity determining module is used for determining the entity with the association relation with the entity in the template from the new commodity title according to the matching result.
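The template mechanism can be sketched by taking the text between two related entities as their context, keeping contexts that occur more often than a preset frequency, and applying them to entities recognised in a new title. Treating the context as the in-between text, and supplying the new title's entities from a separate recogniser, are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def context_between(title: str, first: str, second: str) -> Optional[str]:
    """Text between `first` and a later occurrence of `second`, or None."""
    start = title.find(first)
    if start < 0:
        return None
    end = title.find(second, start + len(first))
    if end < 0:
        return None
    return title[start + len(first):end]


def build_templates(examples: List[Tuple[str, str, str, str]],
                    min_count: int) -> Dict[str, str]:
    """Store contexts whose occurrence count is higher than `min_count`."""
    counts: Counter = Counter()
    for title, first, second, relation in examples:
        context = context_between(title, first, second)
        if context is not None:
            counts[(context, relation)] += 1
    # If one context passes the threshold with two relations, the later one wins.
    return {ctx: rel for (ctx, rel), n in counts.items() if n > min_count}


def match_title(title: str, entities: List[str],
                templates: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Find entity pairs in a new title whose context matches a stored template."""
    found = []
    for first in entities:
        for second in entities:
            if first != second:
                context = context_between(title, first, second)
                if context in templates:
                    found.append((first, second, templates[context]))
    return found


examples = [
    ("tomato also called love apple fresh", "tomato", "love apple", "synonym"),
    ("potato also called spud 1kg", "potato", "spud", "synonym"),
]
templates = build_templates(examples, min_count=1)
matches = match_title("eggplant also called aubergine 500g",
                      ["eggplant", "aubergine"], templates)
```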
Optionally, as an embodiment, the apparatus further includes:
the filter word acquisition module is used for acquiring a preset filter word before entity identification is performed on the commodity title, wherein the preset filter word indicates that the commodity type is a combined commodity;
and the filtering module is used for filtering the candidate commodity titles containing the preset filtering words to obtain the commodity titles.
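The filtering step reduces to dropping candidate titles that contain any preset filter word. A minimal sketch; the example filter words and the function name are made up for illustration.

```python
from typing import Iterable, List


def filter_titles(candidates: Iterable[str], filter_words: Iterable[str]) -> List[str]:
    """Drop candidate titles that contain any preset filter word."""
    words = list(filter_words)
    return [title for title in candidates
            if not any(word in title for word in words)]


kept = filter_titles(
    ["apple and pear gift set", "red apple 1kg"],
    ["gift set", "combo"],  # hypothetical words marking combined commodities
)
```

Removing combined commodities first avoids pairing entities that merely appear together in a bundle title.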
Optionally, as an embodiment, the apparatus further includes:
the commodity entity acquisition module is used for acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are titles of two commodities under the same category and have overlapping text;
the similarity comparison module is used for acquiring pictures of the two commodities and comparing their similarity;
and the co-hyponym module is used for determining that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym relationship when the similarity is higher than a preset threshold.
Optionally, as an embodiment, the apparatus further includes:
the query term module is used for acquiring query terms;
the rewritten word module is used for determining rewritten words of the query words based on the determined association relation;
and the commodity query module is used for querying the commodity based on the query word and the rewritten word.
By adopting the technical scheme of the embodiment of the application, entity identification can be performed on a commodity title to obtain a plurality of entities in the commodity title; semantic vectors of the entities are acquired respectively; and the target association relationship between every two entities is determined according to the semantic vectors of every two entities in the plurality of entities. Because the entities within a commodity title are strongly semantically related, determining the target association relationship among the entities of the same commodity title ensures that the entities found to have the target association relationship are strongly semantically related. In addition, commodity titles are not affected by user behavior data, so the method for determining entity association relationships from commodity titles can cover new commodity titles and long-tail commodity titles with low traffic.
It should be noted that the device embodiment is similar to the method embodiment, so it is described more briefly; for the relevant details, refer to the method embodiment.
The embodiment of the invention also provides an electronic device, which comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the method for determining entity association relationships according to any one of the embodiments described above is implemented.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program implements the method for determining entity association relationships according to any one of the embodiments described above.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The method, apparatus, electronic device, and storage medium for determining entity association relationships provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present application.
Claims (8)
1. A method for determining an association of entities, the method comprising:
performing entity identification on a commodity title comprising at least two entities to obtain a plurality of entities in the commodity title; respectively acquiring semantic vectors of the entities; determining a target association relationship between every two entities according to semantic vectors of every two entities in the plurality of entities;
for a plurality of commodity titles, determining the association relationships between every two entities in each commodity title;
aggregating a plurality of association relationships including the target association relationship, and determining the aggregated association relationship between every two entities;
acquiring a first query word;
determining a rewritten word of the first query word based on the determined aggregated association relationship; the rewritten word of the first query word is a synonym or hypernym that has an aggregated association relationship with the first query word;
for commodity titles comprising only a single entity, acquiring a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are titles of two commodities under the same category and have overlapping text; acquiring pictures of the two commodities and comparing their similarity; when the similarity is higher than a preset threshold, determining that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym relationship;
acquiring a second query word;
determining a rewritten word of the second query word based on the determined association relationship between the commodity entities in the pair of commodity titles; the rewritten word of the second query word is a hypernym of a word that has a co-hyponym relationship with the second query word;
and carrying out commodity inquiry based on the first inquiry word and the rewritten word thereof and the second inquiry word and the rewritten word thereof.
2. The method of claim 1, wherein determining the target association between the two entities comprises:
splicing the semantic vectors of every two entities in forward order to obtain a first spliced vector, and splicing the semantic vectors of every two entities in reverse order to obtain a second spliced vector;
determining whether an association exists between every two entities according to the first spliced vector, and determining whether an association exists between every two entities according to the second spliced vector;
when both determination results are that an association exists between the two entities, determining that the target association relationship between the two entities is a synonym relationship;
and when one determination result is that an association exists between the two entities and the other determination result is that no association exists between them, determining that the target association relationship between the two entities is a hypernym-hyponym relationship.
3. The method according to claim 1, wherein the method further comprises:
and completing the commodity graph based on the determined aggregated association relationship.
4. The method of claim 3, further comprising, after determining the aggregated association between the two entities:
acquiring the context of each entity with the aggregated association relationship in the commodity title;
storing, in a template, the context whose occurrence frequency is higher than a preset frequency together with the aggregated association relationship between every two entities;
acquiring a new commodity title, and matching the new commodity title with the stored template;
and determining the entity with the aggregated association relationship with the entity in the template from the new commodity title according to the matching result.
5. The method of claim 1, comprising, prior to entity identification of the commodity title:
acquiring a preset filter word, wherein the preset filter word represents that the commodity type is a combined commodity;
and filtering the candidate commodity titles containing the preset filtering words to obtain the commodity titles.
6. An apparatus for determining an association of entities, the apparatus comprising:
the entity identification module is used for performing entity identification on a commodity title comprising at least two entities to obtain a plurality of entities in the commodity title;
the semantic vector acquisition module is used for respectively acquiring the semantic vectors of the entities;
the association relation determining module is used for determining the target association relation between every two entities according to the semantic vectors of every two entities in the plurality of entities;
the multiple association relationship determining module is used for determining, for a plurality of commodity titles, the association relationships between every two entities in each commodity title;
the aggregation module is used for aggregating a plurality of association relationships including the target association relationship and determining the aggregated association relationship between every two entities;
the first query term module is used for acquiring a first query term;
the first rewritten word module is used for determining a rewritten word of the first query word based on the determined aggregated association relationship; the rewritten word of the first query word is a synonym or hypernym that has an aggregated association relationship with the first query word;
the commodity entity acquisition module is used for acquiring, for commodity titles comprising only a single entity, a pair of commodity titles and the commodity entities in the pair of commodity titles, wherein the pair of commodity titles are titles of two commodities under the same category and have overlapping text; acquiring pictures of the two commodities and comparing their similarity; and, when the similarity is higher than a preset threshold, determining that the association relationship between the commodity entities in the pair of commodity titles is a co-hyponym relationship;
the second query term module is used for acquiring a second query term;
the second rewritten word module is used for determining a rewritten word of the second query word based on the determined association relationship between the commodity entities in the pair of commodity titles; the rewritten word of the second query word is a hypernym of a word that has a co-hyponym relationship with the second query word;
and the commodity query module is used for querying the commodity based on the first query word and the rewritten word thereof and the second query word and the rewritten word thereof.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of determining entity associations according to any of claims 1 to 5 when the computer program is executed by the processor.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of determining an entity association as claimed in any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111576489.3A CN114490884B (en) | 2021-12-21 | 2021-12-21 | Method, device, electronic equipment and storage medium for determining entity association relation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114490884A (en) | 2022-05-13 |
CN114490884B (en) | 2023-06-06 |
Family
ID=81494204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111576489.3A Active CN114490884B (en) | 2021-12-21 | 2021-12-21 | Method, device, electronic equipment and storage medium for determining entity association relation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114490884B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||