CN114116914A - Entity retrieval method and device based on semantic tag and electronic equipment - Google Patents

Entity retrieval method and device based on semantic tag and electronic equipment

Info

Publication number
CN114116914A
Authority
CN
China
Prior art keywords
entity
reference semantic
determining
semantic
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111260744.3A
Other languages
Chinese (zh)
Inventor
陈子平
朱嘉琪
卢佳俊
柴春光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111260744.3A priority Critical patent/CN114116914A/en
Publication of CN114116914A publication Critical patent/CN114116914A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an entity retrieval method, apparatus, electronic device and storage medium based on semantic tags, which relate to the technical field of computers, and in particular to the technical field of artificial intelligence such as big data processing, knowledge graph, natural language processing, etc. The specific implementation scheme is as follows: acquiring an entity retrieval request, wherein the retrieval request comprises a target semantic tag; acquiring a plurality of candidate entities from the entity library according to the matching degree of a reference semantic label corresponding to each entity in the entity library and a target semantic label; and determining the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic label. Therefore, in the entity retrieval process, the obtained candidate entities are sequentially displayed according to the association degree between the entities and the reference semantic tags, so that not only is the accuracy and reliability of entity retrieval improved, but also a user can quickly and accurately determine the target entities from the retrieval result.

Description

Entity retrieval method and device based on semantic tag and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as big data processing, knowledge graphs, and natural language processing, and more specifically to a semantic tag-based entity retrieval method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development and improvement of artificial intelligence technology, a plurality of entities related to semantic tags can be searched from a large amount of data information according to the semantic tags in the entity searching process. In the related art, semantic tags corresponding to entities are usually implemented based on a big data mining technology. However, in the mining process, semantic tags with low relevance to the entities are extremely easy to introduce, so that the accuracy of the entities retrieved according to the semantic tags is possibly low. Therefore, how to improve the accuracy and reliability of the entity search is an important research direction.
Disclosure of Invention
The disclosure provides an entity retrieval method and device based on semantic labels, electronic equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a semantic tag-based entity retrieval method, including:
acquiring an entity retrieval request, wherein the retrieval request comprises a target semantic tag;
acquiring a plurality of candidate entities from an entity library according to the matching degree of a reference semantic label corresponding to each entity in the entity library and the target semantic label;
and determining the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic label.
According to a second aspect of the present disclosure, there is provided a semantic tag-based entity retrieval apparatus, including:
a first acquisition module, configured to acquire an entity retrieval request, wherein the retrieval request comprises a target semantic tag;
a second acquisition module, configured to acquire a plurality of candidate entities from an entity library according to the matching degree between the reference semantic tag corresponding to each entity in the entity library and the target semantic tag;
and a first determining module, configured to determine the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic tag.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the semantic tag based entity retrieval method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute the semantic tag based entity retrieval method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the steps of the semantic tag based entity retrieval method according to the first aspect.
The entity retrieval method, the entity retrieval device, the electronic equipment and the storage medium based on the semantic tags have the following beneficial effects:
in the embodiment of the disclosure, an entity retrieval request including a target semantic tag is firstly obtained, then a plurality of candidate entities are obtained from an entity library according to the matching degree of a reference semantic tag corresponding to each entity in the entity library and the target semantic tag, and finally the display sequence of the candidate entities in a retrieval result is determined according to the association degree between each candidate entity and the corresponding reference semantic tag. Therefore, in the entity retrieval process, the obtained candidate entities are sequentially displayed according to the association degree between the entities and the reference semantic tags, so that not only is the accuracy and reliability of entity retrieval improved, but also a user can quickly and accurately determine the target entities from the retrieval result.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flowchart illustrating a semantic tag-based entity retrieval method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a semantic tag-based entity retrieval method according to yet another embodiment of the present disclosure;
FIG. 2a is a schematic illustration of a knowledge graph provided in accordance with an embodiment of the present disclosure;
FIG. 2b is a schematic diagram of a concept graph provided in accordance with yet another embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a semantic tag-based entity retrieval method according to yet another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an entity retrieval apparatus based on semantic tags according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the semantic tag-based entity retrieval method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the disclosure relate to the technical field of artificial intelligence, such as big data processing, knowledge graphs, and natural language processing.
Artificial intelligence (AI) is a new technical science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Big data processing technology collects a large amount of data through various channels and mines and analyzes the data in depth using cloud computing technology, so that rules and characteristics in the data can be found in time and the value of the data can be summarized. Big data processing technology is very important for understanding data characteristics and predicting development trends.
A knowledge graph is essentially a semantic network, and is a graph-based data structure, consisting of nodes and edges. In the knowledge graph, each node represents an entity existing in the real world, and each edge is a relationship between the entities. Generally, a knowledge graph is a relationship network obtained by connecting all kinds of information together, and provides the ability to analyze problems from the perspective of relationships.
Natural language processing is the use of computers to process, understand, and use human languages (such as Chinese and English). It is an interdisciplinary field between computer science and linguistics and is often referred to as computational linguistics. Natural language is a fundamental feature that distinguishes humans from other animals, and human thought is inseparable from language. Natural language processing therefore embodies one of the highest goals of artificial intelligence: only when a computer can process natural language can a machine be said to achieve real intelligence.
Fig. 1 is a schematic flowchart of an entity retrieval method based on semantic tags according to an embodiment of the present disclosure.
It should be noted that the semantic tag-based entity retrieval method in this embodiment is executed by a semantic tag-based entity retrieval device. The device may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and the like.
As shown in fig. 1, the entity retrieval method based on semantic tags includes:
S101: Acquire an entity retrieval request, wherein the retrieval request comprises a target semantic tag.
In the present disclosure, entities may be things that exist objectively and are distinguishable from each other. Such as a concept, thing, person, or event, etc. For example, the entity may be "XX cartoon," "XX company," "XX employee," etc., and the disclosure is not limited thereto.
In the present disclosure, the semantic tag may represent attribute information of the entity, and thus, entity retrieval may be performed based on the semantic tag. For example, the semantic tag may be "comedy movie", and based on the semantic tag, "comedy movie a", "comedy movie B", etc. may be retrieved.
The entity retrieval request may be a request that a user enters at a client in order to obtain the entities related to a semantic tag. For example, the entity retrieval request may be "comedy action movie", "animation from 2021", and the like.
The target semantic tag may be a tag, contained in the retrieval request, that relates to attribute information of the entity. The number of target semantic tags may be one or more, which is not limited in the present disclosure.
For example, if the retrieval request is "animation from 2021", the corresponding target semantic tags may be "animation" and "2021".
S102: Acquire a plurality of candidate entities from the entity library according to the matching degree between the reference semantic tag corresponding to each entity in the entity library and the target semantic tag.
The reference semantic tag corresponding to an entity may be a semantic tag that is stored in the entity library and reflects attribute information of the entity. It should be noted that one entity may correspond to one or more reference semantic tags, which is not limited in the present disclosure.
A candidate entity may be an entity in the entity library for which the matching degree between its reference semantic tag and the target semantic tag is greater than a preset threshold. Alternatively, the entities in the entity library may be sorted by matching degree from high to low, and the n entities with the highest matching degrees may be used as candidate entities.
Optionally, the matching degree between the reference semantic tag corresponding to each entity in the entity library and the target semantic tag may be determined by a pre-trained matching degree network model. The network structure of the matching degree network model may be a convolutional neural network, a Siamese neural network, and the like, which is not limited in this disclosure.
Alternatively, the matching degree may be determined according to the Euclidean distance, the cosine similarity, and the like between the reference semantic tag corresponding to each entity and the target semantic tag.
When one entity corresponds to a plurality of reference semantic tags, the matching degree between the target semantic tag and each reference semantic tag may be calculated first, and the maximum of these values may be taken as the matching degree between the entity's reference semantic tags and the target semantic tag. Alternatively, the average of the matching degrees between each reference semantic tag and the target semantic tag may be used instead, and the like, which is not limited in this disclosure.
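As an illustration of step S102, the following is a minimal Python sketch under stated assumptions: tags are assumed to be already represented as numeric vectors (the disclosure does not say how), cosine similarity is used as the matching degree, and the function names, the toy entity library, and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def entity_matching_degree(target_vec, reference_vecs, aggregate="max"):
    """Aggregate per-tag matching degrees into one value per entity (max or mean)."""
    scores = [cosine_similarity(target_vec, r) for r in reference_vecs]
    if not scores:
        return 0.0
    if aggregate == "max":
        return max(scores)
    return sum(scores) / len(scores)


def select_candidates(target_vec, entity_library, threshold=None, top_n=None,
                      aggregate="max"):
    """Keep entities above a threshold and/or the top-n entities by matching degree."""
    scored = [(name, entity_matching_degree(target_vec, vecs, aggregate))
              for name, vecs in entity_library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return scored


# Hypothetical toy library: entity name -> vectors of its reference semantic tags.
entity_library = {
    "movie A": [[1.0, 0.0], [0.6, 0.8]],
    "movie B": [[0.0, 1.0]],
}
candidates = select_candidates([1.0, 0.0], entity_library, threshold=0.5)
```

With the maximum aggregation, one well-matching tag is enough for an entity to become a candidate; the mean aggregation penalizes entities whose other tags match poorly.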
S103: Determine the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic tag.
The association degree between the candidate entity and the corresponding reference semantic tag can reflect the degree of correlation between the reference semantic tag and the candidate entity, and the higher the association degree is, the greater the correlation is. That is, the candidate entity with higher relevance may be closer to the entity that the user wants to acquire. Therefore, in the present disclosure, the display order of the multiple candidate entities in the search result may be determined according to the association degree between each candidate entity and the corresponding reference semantic tag.
For example, the candidate entities corresponding to each association degree may be sequentially displayed according to the sequence of the association degree between each candidate entity and the corresponding reference semantic tag from high to low. That is, the candidate entity corresponding to the maximum association degree is placed at the first position, and so on.
Alternatively, each candidate entity may be classified as strongly related, weakly related, or unrelated to its corresponding reference semantic tag according to the association degree between them. The strongly related candidate entities are then displayed at the top of the retrieval result, the weakly related candidate entities are displayed after the strongly related ones, and the unrelated candidate entities are not displayed.
Optionally, if the association degree between the candidate entity and the corresponding reference semantic tag is greater than a first threshold, the two are strongly related. If the association degree lies between the first threshold and a second threshold, the two are weakly related. If the association degree is smaller than the second threshold, the two are unrelated.
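The threshold-based ordering of step S103 can be sketched as follows. This is only an illustrative Python sketch: the threshold values, the function names, and the treatment of values exactly equal to a threshold (counted as weakly related here) are assumptions, since the disclosure does not specify them.

```python
def relation(association, first_threshold, second_threshold):
    """Classify an association degree as strong, weak, or unrelated."""
    if association > first_threshold:
        return "strong"
    if association >= second_threshold:
        # Assumption: boundary values fall into the weak band.
        return "weak"
    return "unrelated"


def display_order(associations, first_threshold=0.6, second_threshold=0.2):
    """Return (entity, relation) pairs in display order, dropping unrelated entities."""
    labelled = [(name, score, relation(score, first_threshold, second_threshold))
                for name, score in associations.items()]
    kept = [item for item in labelled if item[2] != "unrelated"]
    # Sorting by descending association degree puts strong entities before weak ones.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(name, label) for name, _, label in kept]


# Hypothetical association degrees for four candidate entities.
order = display_order({"A": 0.9, "B": 0.1, "C": 0.4, "D": 0.7})
```

Because the bands are defined by thresholds on the same score, a single descending sort is enough to show strongly related entities first.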
In the embodiment of the disclosure, an entity retrieval request including a target semantic tag is firstly obtained, then a plurality of candidate entities are obtained from an entity library according to the matching degree of a reference semantic tag corresponding to each entity in the entity library and the target semantic tag, and finally the display sequence of the candidate entities in a retrieval result is determined according to the association degree between each candidate entity and the corresponding reference semantic tag. Therefore, in the entity retrieval process, the obtained candidate entities are sequentially displayed according to the association degree between the entities and the reference semantic tags, so that not only is the accuracy and reliability of entity retrieval improved, but also a user can quickly and accurately determine the target entities from the retrieval result.
Through the analysis, the display sequence of the candidate entities in the retrieval result can be determined according to the association degree between each candidate entity and the corresponding reference semantic label. The process of determining the association between an entity and a corresponding reference semantic label is described in detail below with reference to fig. 2 to 3.
Fig. 2 is a flowchart illustrating an entity retrieval method based on semantic tags according to another embodiment of the present disclosure. As shown in fig. 2, the entity retrieval method based on semantic tags includes:
S201: Acquire a reference corpus set.
The reference corpus may be a full-network corpus, that is, all corpora that can be obtained from the network.
Optionally, after the reference corpus set is obtained, it may be preprocessed to improve the quality of the reference corpora and reduce their number.
Optionally, the preprocessing may comprise sentence segmentation, high-quality sentence filtering, negation filtering, entity linking, and the like, which is not limited by this disclosure.
Sentence segmentation splits a complete paragraph or text into a plurality of sentences, which provides the basis for later counting how often an entity and a reference semantic tag occur together. For example, in the sentence "movie A is a comedy movie", the entity "movie A" and the reference semantic tag "comedy movie" co-occur once.
High-quality sentence filtering removes meaningless corpora from the reference corpus set. For example, the meaningless corpora may be colloquial filler, exclamations, and the like, which is not limited by this disclosure.
Negation filtering removes corpora that express a negative meaning from the reference corpus set. For example, if a reference corpus is "movie A is not a comedy movie", it may be filtered out when calculating the co-occurrence times of "movie A" and "comedy movie", which improves the accuracy of the co-occurrence times determined for entities and their corresponding reference semantic tags.
Entity linking associates the entities mentioned in the reference corpora with the entities in the entity library.
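The preprocessing steps can be sketched roughly as follows. This is a simplified Python illustration, not the disclosed implementation: the negation cue list, the filler word list, the minimum sentence length, and all function names are hypothetical, and entity linking is omitted because it depends on the entity library.

```python
import re

# Hypothetical cue lists; a real system would need language-specific resources.
NEGATION_CUES = ("is not", "isn't", "are not", "aren't", "never")
FILLER_WORDS = {"wow", "haha", "oh", "hmm"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def is_low_quality(sentence):
    """Treat very short sentences or pure filler as meaningless."""
    words = re.findall(r"\w+", sentence.lower())
    return len(words) < 3 or all(w in FILLER_WORDS for w in words)


def has_negation(sentence):
    """Detect a negative statement by simple cue matching."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in NEGATION_CUES)


def preprocess(text):
    """Sentence segmentation, then quality and negation filtering."""
    return [s for s in split_sentences(text)
            if not is_low_quality(s) and not has_negation(s)]


corpus = preprocess(
    "Movie A is a comedy movie. Wow! Movie A is not a horror movie."
)
```

Only the first sentence survives: the exclamation is dropped as filler and the third sentence is dropped because it negates the tag it mentions.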
S202: Determine the co-occurrence amount, in the reference corpus set, of each entity in the entity library and its corresponding reference semantic tag.
The co-occurrence amount characterizes how often an entity and its corresponding reference semantic tag occur together in the reference corpus set.
It can be understood that, in the present disclosure, the correlation between the entity and the corresponding reference semantic tag can be reflected by the co-occurrence amount of the entity and the corresponding reference semantic tag in the reference corpus set, and the larger the co-occurrence amount is, the larger the correlation between the entity and the corresponding reference semantic tag is; the smaller the co-occurrence, the less correlation between the entity and the corresponding reference semantic label.
Optionally, the co-occurrence number of each entity and the corresponding reference semantic tag in the reference corpus set and the occurrence number of the entity in the reference corpus set may be determined first, and then the co-occurrence amount is determined according to the co-occurrence number and the occurrence number.
Wherein, the formula for calculating the co-occurrence amount can be as follows:
$$\mathrm{NC}(\mathrm{Tag}_i, \mathrm{Entity}_j) = \frac{\mathrm{CO}(\mathrm{Tag}_i, \mathrm{Entity}_j)}{\mathrm{O}(\mathrm{Entity}_j)}$$
where $\mathrm{Tag}_i$ is the i-th reference semantic tag, $\mathrm{Entity}_j$ is the j-th entity in the entity library, $\mathrm{NC}(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the co-occurrence amount of the j-th entity and its corresponding i-th reference semantic tag in the reference corpus set, $\mathrm{CO}(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the number of co-occurrences of the j-th entity and its corresponding i-th reference semantic tag in the reference corpus set, and $\mathrm{O}(\mathrm{Entity}_j)$ is the number of occurrences of the j-th entity in the reference corpus set.
For example, if the entity is "movie a", the corresponding reference semantic tag is "comedy movie", the frequency of occurrence of "movie a" in the reference corpus is Y, and the frequency of occurrence of "movie a" and "comedy movie" in the same reference corpus is X, the co-occurrence amount of the entity and the corresponding reference semantic tag is X/Y.
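The X/Y ratio above can be sketched in a few lines of Python. The substring matching, the function name, and the toy sentences are assumptions made for illustration; a real system would count occurrences on entity-linked, preprocessed corpora.

```python
def co_occurrence_amount(sentences, entity, tag):
    """NC = (sentences containing both entity and tag) / (sentences containing entity)."""
    occurrences = sum(1 for s in sentences if entity in s)
    co_occurrences = sum(1 for s in sentences if entity in s and tag in s)
    if occurrences == 0:
        return 0.0  # Assumption: an entity that never occurs gets amount 0.
    return co_occurrences / occurrences


# Hypothetical reference corpora, one sentence per item.
sentences = [
    "movie A is a comedy movie",
    "movie A was released last year",
    "movie A is a popular comedy movie",
    "movie B is a comedy movie",
]
nc = co_occurrence_amount(sentences, "movie A", "comedy movie")
```

Here "movie A" occurs in three sentences and co-occurs with "comedy movie" in two of them, so the co-occurrence amount is 2/3.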
Optionally, when a plurality of reference semantic tags corresponding to one entity belong to the same tag cluster, the co-occurrence frequency of each entity and each reference semantic tag in the tag cluster in the reference corpus set may be determined, and then the co-occurrence amount is determined according to the co-occurrence frequency of each reference semantic tag in the entity and tag cluster and the occurrence frequency of the entity, so that the determined co-occurrence amount is more accurate.
A tag cluster may be a set of multiple similar reference semantic tags. For example, synonymous tags such as "bicycle" and "bike" belong to the same tag cluster, as do "love" and "romance".
Optionally, synonymy clustering may be performed on the reference semantic tags included in the entity library based on the semantic similarity model to form a tag cluster. Alternatively, synonymy clustering may be performed on the reference semantic tags included in the entity library by a clustering algorithm to form a tag cluster. The present disclosure is not limited thereto.
For example, the entity is "TV drama A", and the corresponding reference semantic tags include "love" and "romance", which belong to the same tag cluster. If, in the reference corpus set, "TV drama A" and "love" occur in the same reference corpus X1 times, and "TV drama A" and "romance" occur in the same reference corpus X2 times, then the number of co-occurrences of "TV drama A" with the tag cluster containing "love" and "romance" is X1 + X2.
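The cluster-level count X1 + X2 can be sketched as follows. This Python sketch is an assumption-laden illustration: the synonym pair, the sentences, and the function name are hypothetical, and per-tag counts are simply summed as in the example, so a sentence containing two tags of the same cluster is counted twice.

```python
def cluster_co_occurrence(sentences, entity, tag_cluster):
    """Return (summed co-occurrence count, co-occurrence amount) for a tag cluster."""
    occurrences = sum(1 for s in sentences if entity in s)
    # X1 + X2 + ...: one co-occurrence count per tag in the cluster, then summed.
    co_count = sum(
        sum(1 for s in sentences if entity in s and tag in s)
        for tag in tag_cluster
    )
    amount = co_count / occurrences if occurrences else 0.0
    return co_count, amount


# Hypothetical corpora and a hypothetical synonym cluster.
sentences = [
    "TV drama A is a love story",
    "TV drama A is a romance series",
    "TV drama A aired in 2020",
    "TV drama A won an award",
]
count, amount = cluster_co_occurrence(sentences, "TV drama A", ["love", "romance"])
```

Each synonym co-occurs with the entity once, so the cluster count is 2 and, with four occurrences of the entity, the cluster-level co-occurrence amount is 0.5.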
S203: Determine the matching degree between each entity and the corresponding reference semantic tag based on the preset concept graph and the preset knowledge graph.
The preset knowledge graph stores SPO triples, namely <entity, attribute, attribute value>. As shown in FIG. 2a, for example, entity 1 is "animation A"; the SPO triple <animation A, character, Conan> indicates that "Conan" is a character in "animation A", and the SPO triple <animation A, type, animation> indicates that "animation A" is an animation.
The preset concept graph stores concept triples <concept 1, isA, concept 2>, which indicate that concept 1 is an instance of concept 2. As shown in FIG. 2b, for example, <Li Bai, isA, person> indicates that "Li Bai" is an instance of "person", and <Detective Conan, isA, animation> indicates that "Detective Conan" is an instance of "animation".
Optionally, the preset concept graph may be traversed first to obtain the upper concept corresponding to each reference semantic tag, then the preset knowledge graph is traversed to determine an instance corresponding to the upper concept, and finally the matching degree between the entity and the corresponding reference semantic tag is determined according to the matching degree between the instance and the reference semantic tag.
The matching degree between the entity and the corresponding reference semantic label can be determined according to the following discrimination formula:
$$\delta(\mathrm{Tag}_i, \mathrm{Entity}_j) = \begin{cases} 0, & \mathrm{concept}(\mathrm{Tag}_i) \in \text{concept graph} \;\cap\; \mathrm{value}(\mathrm{Entity}_j, \mathrm{concept}(\mathrm{Tag}_i)) \neq \mathrm{Tag}_i \\ 1, & \text{otherwise} \end{cases}$$
where $\delta(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the matching degree between the j-th entity in the entity library and the corresponding i-th reference semantic tag, $\mathrm{concept}(\mathrm{Tag}_i)$ is the upper concept of the reference semantic tag $\mathrm{Tag}_i$, $\mathrm{concept}(\mathrm{Tag}_i) \in \text{concept graph}$ means that the upper concept of $\mathrm{Tag}_i$ is present in the concept graph, $\cap$ denotes "and", and $\mathrm{value}(\mathrm{Entity}_j, \mathrm{concept}(\mathrm{Tag}_i))$ is the instance recorded in the knowledge graph for $\mathrm{Entity}_j$ under the upper concept of $\mathrm{Tag}_i$. That is, when the upper concept of $\mathrm{Tag}_i$ is present in the concept graph and $\mathrm{Tag}_i$ is not equal to the instance of that concept in the knowledge graph, the matching degree between the entity and the corresponding reference semantic tag is 0; otherwise, the matching degree is 1.
For example, suppose the entity Entity_j in the entity library is "movie A" and the corresponding reference semantic tag Tag_i is "animation". In the concept graph, the upper concept of "animation" is "type"; in the knowledge graph, the type of "movie A" is "movie". Since "movie" is not equal to "animation", the matching degree between "movie A" and "animation" is 0.
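The discrimination rule can be sketched in Python as below. The dictionary layouts for the concept graph and the knowledge graph, the function name, and the toy data are assumptions for illustration. The sketch also assumes that an entity with no recorded value for the upper concept is not treated as a contradiction (matching degree 1); the disclosure does not state how that case is handled.

```python
# Hypothetical concept graph: reference semantic tag -> its upper concept.
concept_graph = {"animation": "type", "movie": "type"}

# Hypothetical knowledge graph: (entity, attribute) -> attribute value.
knowledge_graph = {
    ("movie A", "type"): "movie",
    ("animation A", "type"): "animation",
}


def matching_degree(entity, tag, concept_graph, knowledge_graph):
    """Return 0 when the knowledge graph contradicts the tag, otherwise 1."""
    upper_concept = concept_graph.get(tag)
    if upper_concept is None:
        return 1  # Upper concept not in the concept graph: the "otherwise" branch.
    value = knowledge_graph.get((entity, upper_concept))
    if value is None:
        return 1  # Assumption: a missing attribute is not a contradiction.
    return 0 if value != tag else 1


delta_movie = matching_degree("movie A", "animation", concept_graph, knowledge_graph)
delta_anim = matching_degree("animation A", "animation", concept_graph, knowledge_graph)
```

The tag "animation" is rejected for "movie A" because the knowledge graph records its type as "movie", while the same tag is accepted for "animation A".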
S204: Determine the association degree between each entity and the corresponding reference semantic tag according to the co-occurrence amount and the matching degree of each entity and the corresponding reference semantic tag.
The calculation formula of the association between the entity and the corresponding reference semantic label may be:
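$$\mathrm{Score}(\mathrm{Tag}_i, \mathrm{Entity}_j) = \mathrm{NC}(\mathrm{Tag}_i, \mathrm{Entity}_j) \cdot \delta(\mathrm{Tag}_i, \mathrm{Entity}_j)$$
where $\mathrm{Score}(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the association degree between the j-th entity in the entity library and the corresponding i-th reference semantic tag.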
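A short Python sketch of this product is given below; the function name and the numeric inputs are hypothetical and only illustrate how the 0/1 matching degree gates the co-occurrence amount.

```python
def association_degree(co_occurrence_amount, matching_degree):
    """Score = NC * delta: a matching degree of 0 zeroes out the association."""
    return co_occurrence_amount * matching_degree


# Hypothetical (entity, tag) pairs with made-up NC and delta values.
scores = {
    ("movie A", "comedy movie"): association_degree(0.8, 1),
    ("movie A", "animation"): association_degree(0.3, 0),
}
ranked_tags = sorted(scores, key=scores.get, reverse=True)
```

Even though "animation" co-occurs with "movie A" in some corpora, its association degree is 0 because the knowledge-graph check rejects it.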
In the embodiment of the disclosure, a reference corpus set is obtained first, then the co-occurrence amount of each entity and the corresponding reference semantic tag in the entity library in the reference corpus set is determined, then the matching degree between each entity and the corresponding reference semantic tag is determined based on a preset concept graph and a preset knowledge graph, and finally the association degree between each entity and the corresponding reference semantic tag is determined according to the co-occurrence amount and the matching degree of each entity and the corresponding reference semantic tag. Therefore, the association degree between each entity and the corresponding reference semantic label is determined according to a large number of reference corpora, knowledge maps and concept maps, so that low association between the entities in the entity library and the reference semantic labels is avoided, and the accuracy of the entity retrieval result is guaranteed.
FIG. 3 is a flowchart illustrating a semantic tag-based entity retrieval method according to yet another embodiment of the present disclosure. As shown in FIG. 3, the entity retrieval method based on semantic tags includes:
S301: Acquire a reference corpus set.
S302: Determine the co-occurrence amount, in the reference corpus set, of each entity in the entity library and its corresponding reference semantic tag.
S303: Determine the matching degree between each entity and the corresponding reference semantic tag based on the preset concept graph and the preset knowledge graph.
For the specific implementation of steps S301 to S303, reference may be made to the detailed steps in other embodiments of the present disclosure, which are not described herein again.
S304: When any entity corresponds to a plurality of reference semantic tags, determine the co-occurrence probability, among the same entities in the entity library, of every two reference semantic tags in the plurality of reference semantic tags.
It can be understood that the co-occurrence probability of two reference semantic tags among the same entities in the entity library reflects whether the two tags are mutually exclusive. If two reference semantic tags are mutually exclusive, they rarely appear on the same entity, and the correlation between them is low.
Optionally, when any entity corresponds to a plurality of reference semantic tags, the co-occurrence frequency of every two reference semantic tags among the same entities in the entity library and the number of entities in the entity library that include each reference semantic tag may be determined first. The co-occurrence probability of the two reference semantic tags is then determined from that co-occurrence frequency and those entity counts.
The calculation formula of the co-occurrence probability of the two reference semantic tags between the same entities in the entity library is as follows:
[Formula image not reproduced in the source text.]
where $P(\mathrm{Tag}_x, \mathrm{Tag}_y)$ is the co-occurrence probability of the reference semantic tags $\mathrm{Tag}_x$ and $\mathrm{Tag}_y$ among the same entities in the entity library, $\mathrm{CO}(\mathrm{Tag}_x, \mathrm{Tag}_y)$ is the co-occurrence frequency of $\mathrm{Tag}_x$ and $\mathrm{Tag}_y$ among the same entities in the entity library, $N_{\mathrm{Tag}_x}$ is the number of entities in the entity library that include $\mathrm{Tag}_x$, and $N_{\mathrm{Tag}_y}$ is the number of entities in the entity library that include $\mathrm{Tag}_y$.
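The counting step described above can be sketched in Python. The quantities CO and N follow the definitions in the text, but the formula that combines them is not reproduced in this text, so the Jaccard-style normalization below is an assumption standing in for it. The function name, the dictionary layout, and the sample data are also hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def tag_cooccurrence_probability(
    entity_tags: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Estimate P(Tag_x, Tag_y) for every tag pair seen on at least one entity."""
    tag_count: Counter = Counter()   # N_Tag: number of entities that include each tag
    pair_count: Counter = Counter()  # CO: number of entities that include both tags
    for tags in entity_tags.values():
        tag_count.update(tags)
        # Sort so that each unordered pair is always stored under the same key.
        for x, y in combinations(sorted(tags), 2):
            pair_count[(x, y)] += 1

    prob = {}
    for (x, y), co in pair_count.items():
        # ASSUMPTION: Jaccard-style normalization; the source formula is not available.
        prob[(x, y)] = co / (tag_count[x] + tag_count[y] - co)
    return prob


# Hypothetical entity library: entity name -> set of reference semantic tags.
entity_tags = {
    "A": {"t1", "t2"},
    "B": {"t1", "t2"},
    "C": {"t1", "t3"},
    "D": {"t2"},
}
prob = tag_cooccurrence_probability(entity_tags)
```

Pairs that never share an entity, such as `("t2", "t3")` here, are simply absent from the result. A caller can treat a missing pair as probability zero, which matches the reading that such tags are mutually exclusive.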
S305: Determine a correlation coefficient between each reference semantic tag and the entity according to the co-occurrence probability, among the same entities in the entity library, of that reference semantic tag and each other reference semantic tag in the plurality of reference semantic tags, and according to the number of the plurality of reference semantic tags.
The calculation formula of the correlation coefficient between each reference semantic label and any entity is as follows:
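[Formula image not reproduced in the source text.]
where $\mathrm{coeff}(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the correlation coefficient between the reference semantic tag $\mathrm{Tag}_i$ and $\mathrm{Entity}_j$, $\mathrm{Set}_j$ is the set of reference semantic tags corresponding to $\mathrm{Entity}_j$ in the entity library, and $P(\mathrm{Tag}_i, \mathrm{Tag}_k)$ is the co-occurrence probability of $\mathrm{Tag}_i$ and $\mathrm{Tag}_k$, two tags of $\mathrm{Entity}_j$.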
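A minimal sketch of one way to compute such a coefficient follows. The text only states that the coefficient depends on the pairwise co-occurrence probabilities and on the number of tags in Set_j. Averaging the probabilities over the other tags of the entity is therefore an assumption, as are the function name, the neutral value for single-tag entities, and the sample numbers.

```python
from typing import Dict, Set, Tuple


def correlation_coefficient(
    tag: str,
    entity_tag_set: Set[str],
    pair_prob: Dict[Tuple[str, str], float],
) -> float:
    """Assumed form: mean co-occurrence probability of `tag` with the entity's other tags."""
    others = [t for t in entity_tag_set if t != tag]
    if not others:
        # ASSUMPTION: an entity with a single tag gets a neutral coefficient of 1.0.
        return 1.0
    total = 0.0
    for k in others:
        key = tuple(sorted((tag, k)))      # pairs are stored under a sorted key
        total += pair_prob.get(key, 0.0)   # unseen pairs count as probability 0
    return total / len(others)


# Hypothetical pairwise probabilities and tag set for one entity.
pair_prob = {("action", "thriller"): 0.5, ("action", "romance"): 0.25}
tags = {"action", "thriller", "romance"}
coeffs = {t: correlation_coefficient(t, tags, pair_prob) for t in tags}
```

Under this assumed form, a tag that rarely co-occurs with the entity's other tags ("romance" in the sample) receives the smallest coefficient, so it contributes less to the final association degree.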
S306: Determine the association degree between each entity and its corresponding reference semantic tag according to the correlation coefficient between each reference semantic tag and the entity, the co-occurrence amount, and the matching degree.
The association degree between an entity and its corresponding reference semantic tag may be calculated as:
$$\mathrm{Score}(\mathrm{Tag}_i, \mathrm{Entity}_j) = \mathrm{NC}(\mathrm{Tag}_i, \mathrm{Entity}_j) \cdot \delta(\mathrm{Tag}_i, \mathrm{Entity}_j) \cdot \mathrm{coeff}(\mathrm{Tag}_i, \mathrm{Entity}_j)$$
where $\mathrm{Score}(\mathrm{Tag}_i, \mathrm{Entity}_j)$ is the association degree between the j-th entity in the entity library and its corresponding i-th reference semantic tag.
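The three-factor product can be illustrated with a short Python sketch. The container type, the tag names, and all numeric values are hypothetical and chosen only to show how the correlation coefficient can change the relative standing of two tags of the same entity.

```python
from typing import NamedTuple


class TagEvidence(NamedTuple):
    nc: float     # co-occurrence amount NC(Tag_i, Entity_j)
    delta: float  # matching degree delta(Tag_i, Entity_j)
    coeff: float  # correlation coefficient coeff(Tag_i, Entity_j)


def score(e: TagEvidence) -> float:
    """Score(Tag_i, Entity_j) = NC * delta * coeff."""
    return e.nc * e.delta * e.coeff


# Hypothetical evidence for two reference semantic tags of one entity.
evidence = {
    "sci-fi film": TagEvidence(nc=0.5, delta=1.0, coeff=0.5),
    "period drama": TagEvidence(nc=0.75, delta=1.0, coeff=0.125),
}
scores = {tag: score(e) for tag, e in evidence.items()}
ranked_tags = sorted(scores, key=scores.get, reverse=True)
```

In this sample, "period drama" has the larger co-occurrence amount, but its low correlation coefficient places it below "sci-fi film" once all three factors are multiplied.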
In this embodiment of the disclosure, the co-occurrence amount, in the reference corpus set, of each entity in the entity library and its corresponding reference semantic tag is determined first. The matching degree between each entity and its corresponding reference semantic tag is then determined based on a preset concept graph and a preset knowledge graph. Next, the correlation coefficient between each reference semantic tag and the entity is determined according to the co-occurrence probability of that tag and each other reference semantic tag among the same entities in the entity library and according to the number of the plurality of reference semantic tags. Finally, the association degree between each entity and its corresponding reference semantic tag is determined according to the correlation coefficient, the co-occurrence amount, and the matching degree. Because the association degree is derived from a large reference corpus, the knowledge graph, the concept graph, and the co-occurrence probability of every two reference semantic tags on the same entity, weak associations between entities and their reference semantic tags are further reduced, which further helps ensure the accuracy of the entity retrieval result.
FIG. 4 is a schematic structural diagram of a semantic tag-based entity retrieval apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, the semantic tag-based entity retrieval apparatus 400 includes:
A first obtaining module 410, configured to obtain an entity retrieval request, where the retrieval request includes a target semantic tag.
A second obtaining module 420, configured to obtain a plurality of candidate entities from the entity library according to the matching degree between the reference semantic tag corresponding to each entity in the entity library and the target semantic tag.
A first determining module 430, configured to determine the display order of the plurality of candidate entities in the retrieval result according to the association degree between each candidate entity and its corresponding reference semantic tag.
In some embodiments of the present disclosure, the apparatus further includes:
A third obtaining module, configured to obtain the reference corpus set.
A second determining module, configured to determine the co-occurrence amount, in the reference corpus set, of each entity in the entity library and its corresponding reference semantic tag, where the co-occurrence amount characterizes the number of times the entity and its corresponding reference semantic tag occur together in the reference corpus set.
A third determining module, configured to determine the matching degree between each entity and its corresponding reference semantic tag based on the preset concept graph and knowledge graph.
A fourth determining module, configured to determine the association degree between each entity and its corresponding reference semantic tag according to the co-occurrence amount and the matching degree of that entity and tag.
In some embodiments of the present disclosure, the third determining module is specifically configured to:
traverse the preset concept graph to obtain the superior concept corresponding to each reference semantic tag;
traverse the preset knowledge graph to determine the instances corresponding to the superior concept; and
determine the matching degree between the entity and its corresponding reference semantic tag according to the matching degree between the instances and the reference semantic tag.
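The two traversals above can be sketched with plain dictionaries standing in for the concept graph and the knowledge graph. The text does not specify how the final matching degree is computed from the instances, so the binary membership test below is an assumption. The graph layout, the function name, and the sample data are likewise hypothetical.

```python
from typing import Dict, Set


def matching_degree(
    entity: str,
    tag: str,
    concept_graph: Dict[str, Set[str]],
    knowledge_graph: Dict[str, Set[str]],
) -> float:
    """Return 1.0 if `entity` is an instance of a superior concept of `tag`, else 0.0."""
    # Step 1: look up the superior concepts of the tag in the concept graph.
    superior_concepts = concept_graph.get(tag, set())
    # Step 2: collect the instances of those concepts from the knowledge graph.
    instances: Set[str] = set()
    for concept in superior_concepts:
        instances |= knowledge_graph.get(concept, set())
    # Step 3 (ASSUMPTION): binary matching degree based on membership.
    return 1.0 if entity in instances else 0.0


# Hypothetical graphs: tag -> superior concepts, concept -> instances.
concept_graph = {"sci-fi film": {"film"}}
knowledge_graph = {
    "film": {"Interstellar", "Inception"},
    "person": {"Christopher Nolan"},
}
film_match = matching_degree("Interstellar", "sci-fi film", concept_graph, knowledge_graph)
person_match = matching_degree("Christopher Nolan", "sci-fi film", concept_graph, knowledge_graph)
```

With this reading, an entity of the wrong type (a person carrying a film tag) gets a matching degree of zero, which in turn zeroes its association degree in the product formula.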
In some embodiments of the present disclosure, the fourth determining module includes:
A first determining unit, configured to determine, when any entity corresponds to a plurality of reference semantic tags, the co-occurrence probability, among the same entities in the entity library, of every two reference semantic tags in the plurality of reference semantic tags.
A second determining unit, configured to determine a correlation coefficient between each reference semantic tag and the entity according to the co-occurrence probability, among the same entities in the entity library, of that reference semantic tag and each other reference semantic tag in the plurality of reference semantic tags, and according to the number of the plurality of reference semantic tags.
A third determining unit, configured to determine the association degree between each entity and its corresponding reference semantic tag according to the correlation coefficient between each reference semantic tag and the entity, the co-occurrence amount, and the matching degree.
In some embodiments of the present disclosure, the first determining unit is specifically configured to:
determine the co-occurrence frequency of every two reference semantic tags among the same entities in the entity library and the number of entities corresponding to each reference semantic tag in the entity library; and
determine the co-occurrence probability of the two reference semantic tags according to their co-occurrence frequency and the number of entities corresponding to each reference semantic tag in the entity library.
In some embodiments of the present disclosure, the second determining module includes:
A fourth determining unit, configured to determine the number of co-occurrences of each entity and its corresponding reference semantic tag in the reference corpus set and the number of occurrences of the entity in the reference corpus set.
A fifth determining unit, configured to determine the co-occurrence amount according to the number of co-occurrences and the number of occurrences.
In some embodiments of the present disclosure, the fourth determining unit is specifically configured to:
determine, when a plurality of reference semantic tags belong to the same tag cluster, the number of co-occurrences of each entity and each reference semantic tag in the tag cluster in the reference corpus set.
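The following Python sketch shows one way the two counts could be combined. The text says only that the co-occurrence amount is determined from the number of co-occurrences and the number of occurrences, so the ratio used here is an assumption. Substring matching, the treatment of a tag cluster as "any tag in the cluster appears", the function name, and the sample corpus are also illustrative choices.

```python
from typing import Iterable, List


def cooccurrence_amount(entity: str, cluster_tags: Iterable[str], corpus: List[str]) -> float:
    """Assumed form: (# texts with the entity and a cluster tag) / (# texts with the entity)."""
    tags = list(cluster_tags)
    # Number of reference texts that mention the entity (naive substring match).
    occurrences = sum(1 for text in corpus if entity in text)
    if occurrences == 0:
        return 0.0
    # A text counts as a co-occurrence if it mentions the entity and ANY tag of the cluster.
    co_occurrences = sum(
        1 for text in corpus if entity in text and any(tag in text for tag in tags)
    )
    return co_occurrences / occurrences


# Hypothetical reference corpus.
corpus = [
    "Interstellar is a science fiction film",
    "Interstellar is a sci-fi movie",
    "Interstellar premiered in 2014",
    "A classic sci-fi movie",
]
single = cooccurrence_amount("Interstellar", ["science fiction film"], corpus)
clustered = cooccurrence_amount("Interstellar", ["science fiction film", "sci-fi movie"], corpus)
```

Counting over the whole cluster lets synonymous tags pool their evidence: in the sample, the single tag matches one of the three texts that mention the entity, while the two-tag cluster matches two of them.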
It should be noted that the foregoing explanation of the entity retrieving method based on semantic tags is also applicable to the entity retrieving apparatus based on semantic tags in this embodiment, and is not repeated here.
The device in the embodiment of the disclosure firstly obtains an entity retrieval request including a target semantic tag, then obtains a plurality of candidate entities from an entity library according to the matching degree of a reference semantic tag corresponding to each entity in the entity library and the target semantic tag, and finally determines the display sequence of the plurality of candidate entities in a retrieval result according to the association degree between each candidate entity and the corresponding reference semantic tag. Therefore, in the entity retrieval process, the obtained candidate entities are sequentially displayed according to the association degree between the entities and the reference semantic tags, so that not only is the accuracy and reliability of entity retrieval improved, but also a user can quickly and accurately determine the target entities from the retrieval result.
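The retrieval flow summarized above (obtain the request, select candidates by tag match, order by association degree) can be sketched as follows. This is a minimal illustration under stated assumptions: tags are matched by exact equality, an entity with several matching tags is ranked by its highest association degree, and the function names, data layout, and sample scores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def exact_match(reference_tag: str, target_tag: str) -> bool:
    # ASSUMPTION: the simplest possible matching rule; the disclosure uses a matching degree.
    return reference_tag == target_tag


def retrieve(
    target_tag: str,
    entity_tags: Dict[str, List[str]],
    association: Dict[Tuple[str, str], float],
    tag_match: Callable[[str, str], bool] = exact_match,
) -> List[str]:
    """Return candidate entities ordered by association degree, highest first."""
    candidates = []
    for entity, tags in entity_tags.items():
        matched = [t for t in tags if tag_match(t, target_tag)]
        if matched:
            # Association degrees are keyed by (tag, entity); missing pairs default to 0.
            best = max(association.get((t, entity), 0.0) for t in matched)
            candidates.append((entity, best))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [entity for entity, _ in candidates]


# Hypothetical entity library and precomputed association degrees.
entity_tags = {
    "Inception": ["sci-fi film", "thriller"],
    "Interstellar": ["sci-fi film"],
    "Titanic": ["romance film"],
}
association = {
    ("sci-fi film", "Inception"): 0.5,
    ("sci-fi film", "Interstellar"): 0.75,
    ("romance film", "Titanic"): 0.875,
}
results = retrieve("sci-fi film", entity_tags, association)
```

The association degrees are computed offline from the corpus and graphs, so the online step reduces to a filter and a sort over the candidates.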
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disk; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the various methods and processes described above, such as the semantic tag-based entity retrieval method. For example, in some embodiments, the semantic tag-based entity retrieval method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the semantic tag-based entity retrieval method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the semantic tag-based entity retrieval method in any other suitable manner (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
In the embodiment of the disclosure, an entity retrieval request including a target semantic tag is firstly obtained, then a plurality of candidate entities are obtained from an entity library according to the matching degree of a reference semantic tag corresponding to each entity in the entity library and the target semantic tag, and finally the display sequence of the candidate entities in a retrieval result is determined according to the association degree between each candidate entity and the corresponding reference semantic tag. Therefore, in the entity retrieval process, the obtained candidate entities are sequentially displayed according to the association degree between the entities and the reference semantic tags, so that not only is the accuracy and reliability of entity retrieval improved, but also a user can quickly and accurately determine the target entities from the retrieval result.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A semantic tag-based entity retrieval method comprises the following steps:
acquiring an entity retrieval request, wherein the retrieval request comprises a target semantic tag;
acquiring a plurality of candidate entities from an entity library according to the matching degree of a reference semantic label corresponding to each entity in the entity library and the target semantic label;
and determining the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic label.
2. The method of claim 1, further comprising:
acquiring a reference corpus set;
determining a co-occurrence quantity of each entity and a corresponding reference semantic tag in the entity library in the reference corpus, wherein the co-occurrence quantity is used for representing the occurrence times of the entity and the corresponding reference semantic tag in the reference corpus;
determining the matching degree between each entity and the corresponding reference semantic label based on a preset concept graph and a preset knowledge graph;
and determining the association degree between each entity and the corresponding reference semantic label according to the co-occurrence amount and the matching degree of each entity and the corresponding reference semantic label.
3. The method of claim 2, wherein the determining the matching degree between each entity and the corresponding reference semantic tag based on the preset concept graph and knowledge graph comprises:
traversing the preset concept graph to obtain a superior concept corresponding to each reference semantic label;
traversing the preset knowledge graph to determine an instance corresponding to the superior concept;
and determining the matching degree between the entity and the corresponding reference semantic label according to the matching degree between the instance and the reference semantic label.
4. The method of claim 2 or 3, wherein the determining the association between each entity and the corresponding reference semantic tag comprises:
under the condition that any entity corresponds to a plurality of reference semantic tags, determining the co-occurrence probability of every two reference semantic tags in the plurality of reference semantic tags among the same entities in the entity library;
determining a correlation coefficient between each reference semantic label and any one entity according to the co-occurrence probability of each reference semantic label and another reference semantic label in the plurality of reference semantic labels between the same entity in the entity library and the number of the plurality of reference semantic labels;
and determining the association degree between each entity and the corresponding reference semantic label according to the correlation coefficient, the co-occurrence amount and the matching degree between each reference semantic label and any entity.
5. The method of claim 4, wherein the determining a co-occurrence probability of each two of the plurality of reference semantic tags between the same entity in the entity library comprises:
determining the co-occurrence frequency of every two reference semantic tags between the same entities in the entity library and the number of the entities corresponding to each reference semantic tag in the entity library;
and determining the co-occurrence probability of the two reference semantic labels according to the co-occurrence frequency of the two reference semantic labels and the number of the entities corresponding to each reference semantic label in the entity library.
6. The method of claim 2 or 3, wherein the determining the co-occurrence of each entity in the entity library and the corresponding reference semantic tag in the reference corpus comprises:
determining the co-occurrence frequency of each entity and the corresponding reference semantic tag in the reference corpus set and the occurrence frequency of the entity in the reference corpus set;
and determining the co-occurrence quantity according to the co-occurrence times and the occurrence times.
7. The method of claim 6, wherein the determining a number of co-occurrences of each of the entities with a corresponding reference semantic tag in the reference corpus comprises:
and under the condition that a plurality of reference semantic labels belong to the same label cluster, determining the co-occurrence frequency of each entity and each reference semantic label in the label cluster in the reference corpus set.
8. An entity retrieval apparatus based on semantic tags, comprising:
a first acquisition module, configured to acquire an entity retrieval request, wherein the retrieval request comprises a target semantic tag;
the second acquisition module is used for acquiring a plurality of candidate entities from the entity library according to the matching degree of the reference semantic label corresponding to each entity in the entity library and the target semantic label;
and the first determining module is used for determining the display sequence of the candidate entities in the retrieval result according to the association degree between each candidate entity and the corresponding reference semantic label.
9. The apparatus of claim 8, further comprising:
the third acquisition module is used for acquiring a reference corpus set;
a second determining module, configured to determine a co-occurrence amount of each entity and a corresponding reference semantic tag in the entity library in the reference corpus, where the co-occurrence amount is used to characterize the occurrence times of the entity and the corresponding reference semantic tag in the reference corpus;
the third determining module is used for determining the matching degree between each entity and the corresponding reference semantic label based on a preset concept graph and a preset knowledge graph;
and the fourth determining module is used for determining the association degree between each entity and the corresponding reference semantic label according to the co-occurrence amount and the matching degree of each entity and the corresponding reference semantic label.
10. The apparatus of claim 9, wherein the third determining module is specifically configured to:
traversing the preset concept graph to obtain a superior concept corresponding to each reference semantic label;
traversing the preset knowledge graph to determine an instance corresponding to the superior concept;
and determining the matching degree between the entity and the corresponding reference semantic label according to the matching degree between the instance and the reference semantic label.
11. The apparatus of claim 9 or 10, wherein the fourth determining module comprises:
a first determining unit, configured to determine, when any entity corresponds to multiple reference semantic tags, a co-occurrence probability of every two reference semantic tags in the multiple reference semantic tags among the same entity in the entity library;
a second determining unit, configured to determine a correlation coefficient between each of the plurality of reference semantic tags and any one of the entities according to a co-occurrence probability between each of the plurality of reference semantic tags and another reference semantic tag in the same entity in the entity library and the number of the plurality of reference semantic tags;
and the third determining unit is used for determining the association degree between each entity and the corresponding reference semantic label according to the correlation coefficient, the co-occurrence amount and the matching degree between each reference semantic label and any entity.
12. The apparatus of claim 11, wherein the first determining unit is specifically configured to:
determining the co-occurrence frequency of every two reference semantic tags between the same entities in the entity library and the number of the entities corresponding to each reference semantic tag in the entity library;
and determining the co-occurrence probability of the two reference semantic labels according to the co-occurrence frequency of the two reference semantic labels and the number of the entities corresponding to each reference semantic label in the entity library.
13. The apparatus of claim 9 or 10, wherein the second determining module comprises:
a fourth determining unit, configured to determine the co-occurrence frequency of each entity and the corresponding reference semantic tag in the reference corpus and the occurrence frequency of the entity in the reference corpus;
and the fifth determining unit is used for determining the co-occurrence quantity according to the co-occurrence times and the occurrence times.
14. The apparatus of claim 13, wherein the fourth determining unit is specifically configured to:
and under the condition that a plurality of reference semantic labels belong to the same label cluster, determining the co-occurrence frequency of each entity and each reference semantic label in the label cluster in the reference corpus set.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising computer instructions which, when executed by a processor, carry out the steps of the method of any one of claims 1 to 7.
CN202111260744.3A 2021-10-27 2021-10-27 Entity retrieval method and device based on semantic tag and electronic equipment Pending CN114116914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111260744.3A CN114116914A (en) 2021-10-27 2021-10-27 Entity retrieval method and device based on semantic tag and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111260744.3A CN114116914A (en) 2021-10-27 2021-10-27 Entity retrieval method and device based on semantic tag and electronic equipment

Publications (1)

Publication Number Publication Date
CN114116914A true CN114116914A (en) 2022-03-01

Family

ID=80377452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111260744.3A Pending CN114116914A (en) 2021-10-27 2021-10-27 Entity retrieval method and device based on semantic tag and electronic equipment

Country Status (1)

Country Link
CN (1) CN114116914A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741385A (en) * 2022-03-10 2022-07-12 北京元年科技股份有限公司 Method, device and equipment for generating general data relation structure and readable storage medium

Similar Documents

Publication Publication Date Title
US11301637B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
CN112749344B (en) Information recommendation method, device, electronic equipment, storage medium and program product
CN113590645B (en) Searching method, searching device, electronic equipment and storage medium
CN113033194B (en) Training method, device, equipment and storage medium for semantic representation graph model
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN113660541B (en) Method and device for generating abstract of news video
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
CN114330335A (en) Keyword extraction method, device, equipment and storage medium
CN112560461A (en) News clue generation method and device, electronic equipment and storage medium
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
CN114116914A (en) Entity retrieval method and device based on semantic tag and electronic equipment
CN112560425A (en) Template generation method and device, electronic equipment and storage medium
US20230004715A1 (en) Method and apparatus for constructing object relationship network, and electronic device
CN114201607B (en) Information processing method and device
CN113553410B (en) Long document processing method, processing device, electronic equipment and storage medium
CN116227569A (en) Performance evaluation method and device for pre-training language model and interpretability method
CN112818221B (en) Entity heat determining method and device, electronic equipment and storage medium
CN112818167B (en) Entity retrieval method, entity retrieval device, electronic equipment and computer readable storage medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN113536751A (en) Processing method and device of table data, electronic equipment and storage medium
CN112528644A (en) Entity mounting method, device, equipment and storage medium
CN112784600A (en) Information sorting method and device, electronic equipment and storage medium
CN117764044A (en) Document dividing method and device, electronic equipment and storage medium
CN115858729A (en) Multi-type knowledge retrieval and statistics method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination