CN110147421B - Target entity linking method, device, equipment and storage medium - Google Patents

Target entity linking method, device, equipment and storage medium

Publication number
CN110147421B
Authority
CN
China
Prior art keywords
information
text information
word
entity text
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910388403.0A
Other languages
Chinese (zh)
Other versions
CN110147421A (en)
Inventor
吴坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910388403.0A priority Critical patent/CN110147421B/en
Publication of CN110147421A publication Critical patent/CN110147421A/en
Application granted granted Critical
Publication of CN110147421B publication Critical patent/CN110147421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a target entity linking method, a device, equipment and a storage medium, wherein the method comprises the following steps: performing multi-dimensional text analysis processing on the target entity text information to obtain multi-dimensional text information comprising word information and word weight information; determining candidate entity text information from a preset entity library based on the word information, wherein the preset entity library comprises word information and word weight information of the entity text information; inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information; and using the associated entity text information as link entity text information of the target entity text information. By the technical scheme, the representation capability of the entity text information can be improved, the accuracy of the determined link entity text information is improved, and the entity link of the target entity can be successfully realized based on the link entity text information.

Description

Target entity linking method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for linking a target entity.
Background
A POI (Point of interest) is a representation of geographic information collected in a geographic information system, and may be a building, a business, a mailbox, a bus station, or the like. The attribute information of each POI entity may generally include entity text information and address information. The POI entity link refers to a process of linking POI entity text information in the address text to entity text information in a POI entity library so as to acquire accurate address information, and has wide application prospects in the fields of natural language processing, information retrieval and the like.
Most existing POI entity linking technologies compute text similarity and then rank candidates. Specifically, keywords are constructed from the word segmentation information of the target entity text information; related entity text information is recalled through the keywords; the related entity text information is sorted from high to low by its text similarity to the target entity text information; and the top-ranked related entity text information is selected as the link entity text information of the target entity text information, from which the address information of the target entity text information is obtained. However, the existing scheme considers only the text similarity between pieces of entity text information and cannot accurately judge whether they correspond to the same entity. This causes linking errors, leaves the problem of entity ambiguity unsolved, and results in low accuracy. Therefore, there is a need to provide a more reliable or efficient solution.
Disclosure of Invention
The application provides a target entity linking method, a device, equipment and a storage medium, which can improve the representation capability of entity text information, further improve the accuracy of the determined linked entity text information, and successfully realize the entity linking of a target entity based on the linked entity text information.
In one aspect, the present application provides a target entity linking method, where the method includes:
performing multi-dimensional text analysis processing on target entity text information to obtain multi-dimensional text information, wherein the multi-dimensional text information comprises word information and word weight information;
determining candidate entity text information of the target entity text information from a preset entity library based on the word information, wherein the preset entity library comprises word information and word weight information of the entity text information;
inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information;
and taking the associated entity text information as link entity text information of the target entity text information.
Another aspect provides a target entity linking apparatus, including:
the multidimensional text analysis processing module is used for carrying out multidimensional text analysis processing on target entity text information to obtain multidimensional text information, and the multidimensional text information comprises word information and word weight information;
the candidate entity text information determining module is used for determining candidate entity text information of the target entity text information from a preset entity library based on the word information, wherein the preset entity library comprises the word information and the word weight information of the entity text information;
the semantic association module is used for inputting the word information and the word weight information of the target entity text information and the word information and the word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information;
and the link entity text information determining module is used for taking the associated entity text information as link entity text information of the target entity text information.
Another aspect provides a target entity linking device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the target entity linking method as described above.
Another aspect provides a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the target entity linking method as described above.
The target entity linking method, device, equipment and storage medium provided by the application have the following technical effects:
Multi-dimensional text analysis processing is performed on the target entity text information to obtain multi-dimensional text information that represents the target entity text information from more dimensions. Candidate entity text information is then screened out from a preset entity library based on the word information in the multi-dimensional text information. Next, the word information and word weight information of the target entity text information and of the candidate entity text information are input into a semantic association model for semantic association. Because the word weights are combined during semantic association, both the strength of association between pieces of entity text information and the importance of each feature within each piece of entity text information are taken into account. As a result, the link entity text information of the target entity text information can be accurately determined, and the target entity text information is successfully linked to the preset entity library. With the technical scheme provided by the embodiments of the specification, the representation capability of the entity text information can be greatly improved, the accuracy of the determined link entity text information is further improved, and the entity link of the target entity can be successfully realized based on the link entity text information.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an entity linking system provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a target entity linking method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for performing multidimensional text analysis processing on target entity text information to obtain multidimensional text information according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another method for performing multidimensional text analysis processing on target entity text information to obtain multidimensional text information according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a semantic association model training method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a method for inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an example of obtaining association scores of target entity text information and candidate entity text information by performing semantic association based on a semantic association model according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of another target entity linking method provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a scenario of entity linking provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a target entity linking apparatus according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of an entity linking system according to an embodiment of the present disclosure, and as shown in fig. 1, the system at least includes an entity library building module and an entity linking module.
Specifically, in this embodiment of the present disclosure, the system may include a server that operates independently, or a distributed server, or a server cluster composed of multiple servers.
Specifically, in this embodiment of the present specification, the entity library construction module may be configured to construct an entity library (i.e., a preset entity library) that includes a text index and a spatial index of a large amount of entity text information. The spatial index may index a large amount of entity text information (POI entity text information) against the address information of that entity text information; the address information may include, but is not limited to, road network data, town data, village data, and gateway database data. The text index may index a large amount of entity text information against its word information; the word information may include, but is not limited to, the full name, aliases, keywords after word segmentation, pinyin, synonyms, and error correction words of the entity text information. In addition, the text index may further index the entity text information against its word weights, word role information, word hierarchy information, and word function information.
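The text index described above can be pictured as an inverted index from words to entities. The following is a minimal sketch under stated assumptions, not the structure used by the application: the class names `EntityRecord` and `EntityLibrary`, the weight values, and the placeholder addresses are all hypothetical, and the spatial index is reduced to a plain address string per entity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """One entry of a hypothetical preset entity library."""
    name: str                                   # entity text information
    address: str                                # address information (stand-in for the spatial index)
    words: dict = field(default_factory=dict)   # word -> word weight (illustrative values)


class EntityLibrary:
    """Toy entity library with an inverted text index (word -> entity ids)."""

    def __init__(self):
        self.records = {}
        self.text_index = defaultdict(set)

    def add(self, entity_id, record):
        self.records[entity_id] = record
        for word in record.words:
            self.text_index[word].add(entity_id)

    def recall(self, query_words):
        """Return ids of entities that share at least one word with the query."""
        hits = set()
        for word in query_words:
            hits |= self.text_index.get(word, set())
        return hits


library = EntityLibrary()
library.add(1, EntityRecord(
    "China Technology Transaction Building", "address A",
    {"china": 0.5, "technology": 0.7, "transaction": 0.8, "building": 0.4}))
library.add(2, EntityRecord(
    "China Bank Building", "address B",
    {"china": 0.5, "bank": 0.9, "building": 0.4}))

candidates = library.recall(["technology", "transaction"])
```

Storing the word weights next to the words keeps everything a later scoring step needs in one lookup, which is the property the text index is described as providing.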
In practical applications, the entity library may further include a ranking index of a large amount of entity text information and its ranking information. Specifically, the ranking information of the entity text information may include a ranking determined based on the information reflecting the popularity of the entity text information, such as the search amount and the like of the entity text information, and generally, the higher the popularity of the entity text information is, the higher the ranking of the entity text information is.
Specifically, in the embodiment of the present specification, the entity linking module may be configured to perform multidimensional text analysis processing on target entity text information; then, recalling candidate entity text information of the target entity text information based on the information after the multidimensional text analysis processing and text index information in the entity library; then, entity association is carried out by combining a semantic association model, and associated entity text information of the target entity text information is determined from the candidate entity text information; then, the spatial index information in the entity library can be combined to carry out address matching verification on the associated entity text information and the target entity text information; after the verification is passed, the associated entity text information can be used as link entity text information of the target entity text information, and then the entity link of the target entity text information is realized based on the link entity text information.
A specific embodiment of a target entity linking method according to the present application is described below. FIG. 2 is a schematic flowchart of a target entity linking method according to an embodiment of the present application. This specification provides the method operation steps as described in the embodiments or the flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In actual system or server product execution, the steps may be executed sequentially or in parallel (e.g., in parallel processor or multithreaded processing environments) according to the embodiments or methods shown in the figures. Specifically, as shown in FIG. 2, the method may include:
S201: performing multi-dimensional text analysis processing on the target entity text information to obtain multi-dimensional text information.
In the embodiment of the present specification, the target entity text information may be text information of a certain target entity; in an embodiment of this specification, the multi-dimensional text information may include word information and word weight information. Specifically, the word information at least includes one of the following: original word information, pinyin information, synonym information and error correction word information;
when the word information includes original word information, pinyin information, synonym information, and error correction word information, as shown in fig. 3, performing multidimensional text analysis processing on the target entity text information to obtain multidimensional text information may include:
S2011: performing word segmentation processing on the target entity text information to obtain original word information of the target entity text information.
In the embodiment of the present specification, word segmentation processing may be performed on target entity text information by using a natural language processing algorithm, and the obtained multiple words after word segmentation may be used as original word information of the target entity text information.
In a specific embodiment, for example, the target entity text information is "China Technology Transaction Building"; correspondingly, the original word information after word segmentation processing may include four words: "China", "technology", "transaction", and "building".
In addition, it should be noted that, in some embodiments, the original word information after the word segmentation processing may further include single characters.
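The specification does not name a particular segmentation algorithm. As one possible illustration, the sketch below uses forward maximum matching against a small dictionary; the function name `forward_max_match`, the dictionary contents, and the Chinese string (inferred from the pinyin given in this description) are assumptions made for the example.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word
    starting at the current position, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary; a real system would use a much larger lexicon.
VOCAB = {"中国", "技术", "交易", "大厦"}

segments = forward_max_match("中国技术交易大厦", VOCAB)
```

The single-character fallback mirrors the note above that the original word information may also contain single characters.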
S2013: taking the pinyin information of the original word information as the pinyin information of the target entity text information.
In the embodiment of the present specification, the pinyin information of each character in the original word information may be obtained to obtain the pinyin information of the target entity text information.
In a specific embodiment, for example, the original word information of the target entity text information is "China", "technology", "transaction", and "building"; accordingly, the pinyin information of the target entity text information may include: zhongguo, jishu, jiaoyi, dasha.
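A minimal sketch of the per-character pinyin lookup follows. The table `CHAR_PINYIN` is a hypothetical eight-entry stand-in for a full character-to-pinyin dictionary, and the Chinese words are inferred from the pinyin listed above.

```python
# Hypothetical lookup table covering only the characters of this example.
CHAR_PINYIN = {
    "中": "zhong", "国": "guo",
    "技": "ji", "术": "shu",
    "交": "jiao", "易": "yi",
    "大": "da", "厦": "sha",
}


def to_pinyin(word, table=CHAR_PINYIN):
    """Concatenate the pinyin of each character; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in word)


original_words = ["中国", "技术", "交易", "大厦"]
pinyin_info = [to_pinyin(word) for word in original_words]
```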
S2015: performing synonymy conversion processing on the original word information to obtain synonym information of the target entity text information.
In the embodiments of the present specification, synonym information of the target entity text information may be obtained by combining a synonym transformation model. Specifically, the original word information may be input into a predetermined synonymy conversion model to perform synonymy conversion processing, and words in the original word information are converted into synonyms, so as to obtain synonym information of the target entity text information.
Specifically, the synonymy transformation model is determined by adopting the following method:
1) acquiring word pair data to be trained;
2) performing synonymy transformation training on a second deep learning model based on the word pair data to obtain the synonymy transformation model.
Specifically, the word pair data to be trained may include a plurality of word pairs labeled as having the same or different semantics.
The second deep learning model in the embodiments of the present specification may include, but is not limited to, a convolutional neural network, a logistic regression, a recurrent neural network, or the like.
S2017: performing error correction processing on the original word information to obtain error correction word information of the target entity text information.
In the embodiment of the present specification, the performing error correction processing on the original word information may include, but is not limited to, performing error correction by combining pinyin of a word in the original word information or performing error correction on font of the word in the original word information.
In a specific embodiment, taking error correction based on the pinyin of a word in the original word information as an example, error correction can be performed based on certain rules. Specifically, the pinyin string input by the user can be checked against rules refined from pinyin entries to obtain the corresponding error correction result. For example, error correction processing is performed on huanglongcun to obtain the error correction word: huanglongcun.
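The rule set used for pinyin error correction is not spelled out in this description. As a stand-in, the sketch below corrects a pinyin string by picking its closest entry in a known pinyin lexicon with Python's standard `difflib.get_close_matches`. This is a similarity lookup rather than the rule check described above; the lexicon, the cutoff, and the misspelled input `huanlongcun` are invented for illustration.

```python
import difflib

# Hypothetical lexicon of valid pinyin strings for known entity words.
PINYIN_LEXICON = ["huanglongcun", "zhongguancun"]


def correct_pinyin(pinyin, lexicon=PINYIN_LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the input unchanged when
    nothing in the lexicon is similar enough."""
    matches = difflib.get_close_matches(pinyin, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else pinyin


corrected = correct_pinyin("huanlongcun")   # invented misspelling (missing "g")
unchanged = correct_pinyin("beijing")       # no close entry, returned as-is
```

Returning the input unchanged when no close match exists keeps the step safe to apply to every word, including words that are already correct or simply unknown to the lexicon.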
S2019: determining word weights of words in the original word information, the pinyin information, the synonym information and the error correction word information based on a word weight recognition model.
In an embodiment of the present specification, the word weight recognition model includes a model obtained by training based on word information labeled with word weights. Specifically, a large amount of word information marked with word weights can be collected; and performing word weight recognition training on the third deep learning model based on the word information marked with the word weight to obtain a word weight recognition model. Subsequently, the word weight of the word in the word information can be obtained by inputting the word information into the word weight recognition model.
The third deep learning model in the embodiments of the present specification may include, but is not limited to, a convolutional neural network, a logistic regression, a recurrent neural network, or the like.
Specifically, the word information marked with word weight may include a plurality of words marked with word weight, and specifically, the word weight of a word may represent indelibility of the word in the entity text information. Specifically, the indelibility of a word in the entity text information may reflect the role of the word when the entity text information is distinguished from other entity text information.
In a specific embodiment, assume the word information includes three words: "Zhongguancun", "Street", and "No. 46". The words "Zhongguancun" and "Street" typically occur more often than "No. 46", so if word weights were determined in the existing word-frequency manner, "Zhongguancun" and "Street" would receive larger word weights than "No. 46". However, when the indelibility of words in the entity text information is considered, "No. 46" plays a larger role than "Zhongguancun" and "Street" in distinguishing the entity text information "Zhongguancun Street No. 46" from other entity text information. Specifically, in the embodiment of the present specification, the word weights of "Zhongguancun", "Street", and "No. 46" may be denoted as 0.66, 0.77, and 0.94, respectively.
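To see why such weights matter, the sketch below compares a plain word-overlap score with a weight-aware one, using the three weights quoted above (0.66, 0.77, 0.94). The scoring functions are illustrative only and are not the semantic association model of the application; the romanized words and the neighbouring address "No. 48" are assumptions for the example.

```python
def plain_overlap(target_words, candidate_words):
    """Fraction of target words that also appear in the candidate."""
    target = set(target_words)
    if not target:
        return 0.0
    return len(target & set(candidate_words)) / len(target)


def weighted_overlap(target_weights, candidate_words):
    """Share of the target's total word weight covered by the candidate."""
    total = sum(target_weights.values())
    if total == 0:
        return 0.0
    shared = sum(w for word, w in target_weights.items() if word in candidate_words)
    return shared / total


# Word weights quoted in the description for the three example words.
TARGET = {"Zhongguancun": 0.66, "Street": 0.77, "No. 46": 0.94}

# Hypothetical neighbouring entity that differs only in the house number.
neighbour = ["Zhongguancun", "Street", "No. 48"]

unweighted = plain_overlap(TARGET, neighbour)    # 2 of 3 words match
weighted = weighted_overlap(TARGET, neighbour)   # (0.66 + 0.77) / 2.37
```

Because the house number carries the largest weight, the mismatch costs more under the weighted score (about 0.60) than under the plain overlap (about 0.67), so the neighbouring address is pushed further from the target.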
In the embodiment of the specification, the target entity text information can be better represented by acquiring the multi-dimensional word information and the word weight representing the indelibility of the words in the word information in the entity text information, so that the accuracy of the subsequently determined link entity text information is improved.
In some embodiments, the multi-dimensional text information may further include at least one of: word role information, word hierarchy information, and word function information. Specifically, as shown in FIG. 4, when the multi-dimensional text information includes the word role information, the word hierarchy information, and the word function information, performing multidimensional text analysis processing on the target entity text information to obtain multidimensional text information may include:
S20111: determining word role information of the target entity text information based on the part of speech of the words in the target entity text information.
In the embodiment of the present specification, different words in the target entity text information often have different parts of speech. Specifically, the part of speech of a word may refer to the characteristics of the word that serve as the basis for classifying it into a word class.
In a specific embodiment, for example, the target entity text information is "China Technology Transaction Building". Among its four words "China", "technology", "transaction", and "building", the word role of "China" is a country name, the word role of "technology" is a business noun, the word role of "transaction" is a business verb, and the word role of "building" is a category word.
S20113: determining word level information of the target entity text information based on the structural relationship among the words in the target entity text information.
In the embodiment of the present specification, a certain structural relationship often exists between words in the target entity text information, for example, a master-slave relationship. Correspondingly, master-slave hierarchical recognition may be performed on the structural relationship between words to determine the word level information of the target entity text information. For example, in "China Technology Transaction Building, Block B", "Block B" is a subordinate component of "China Technology Transaction Building"; correspondingly, "China Technology Transaction Building" may be the master level and "Block B" the slave level.
S20115: performing function analysis on words in the target entity text information to obtain word function information of the target entity text information.
In this embodiment of the present specification, performing function analysis on the words in the target entity text information may include mapping those words to preset functional groups. Specifically, the preset functional groups may include: core words, that is, words that locate the target entity text information, such as unique proper nouns, quantities, and directions; category words, that is, words that indicate the service or category of the target entity text information; additional words, that is, additional information that supplements the target entity text information (such as branch, main points, etc.); and other words, that is, words in the target entity text information other than core words, category words, and additional words.
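The mapping to preset functional groups can be sketched as a lookup with a few simple rules. The word lists below are hypothetical and far smaller than anything a real system would need, and treating any word that contains a digit as a quantity (and therefore a core word) is an assumption made for this example.

```python
# Hypothetical word lists for each preset functional group.
CATEGORY_WORDS = {"building", "hotel", "school"}
ADDITIONAL_WORDS = {"branch"}
CORE_PROPER_NOUNS = {"zhongguancun", "china"}
DIRECTION_WORDS = {"east", "west", "south", "north"}


def function_group(word):
    """Map a word to one of: core, category, additional, other."""
    w = word.lower()
    if w in CATEGORY_WORDS:
        return "category"
    if w in ADDITIONAL_WORDS:
        return "additional"
    # Proper nouns, directions, and quantities count as core words here.
    if w in CORE_PROPER_NOUNS or w in DIRECTION_WORDS or any(ch.isdigit() for ch in w):
        return "core"
    return "other"


groups = {w: function_group(w)
          for w in ["Zhongguancun", "No. 46", "building", "branch", "of"]}
```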
In the embodiment of the specification, the word role information, the word level information and the word function information of the target entity text information are acquired, so that the target entity text information can be represented from more dimensions, the target entity text information can be better represented, and the accuracy of the subsequently determined link entity text information is improved.
S203: determining candidate entity text information of the target entity text information from a preset entity library based on the word information.
In an embodiment of the present specification, the preset entity library may include word information and word weight information of entity text information. Specifically, for the step of determining the word information and the word weights of the entity text information in the preset entity library, reference may be made to the steps described above for determining the word information and the word weights of the target entity text information, which are not repeated here.
In practical application, the word information and the word weight in the preset entity library are included in the text index in the preset entity library, that is, the word information and the word weight correspond to the entity text information.
Specifically, determining candidate entity text information of the target entity text information from a preset entity library based on the word information may include:
1) Determining the text correlation between the target entity text information and the entity text information in the preset entity library based on the word information of the target entity text information and the word information of the entity text information in the preset entity library.
In this embodiment of the present specification, the text correlation may be a specific value, quantified by a preset rule, that reflects the degree or trend of textual relatedness between the target entity text information and the entity text information in the preset entity library. The more closely the target entity text information and the entity text information in the preset entity library are related textually, the higher the text correlation and the larger the specific value; conversely, the less closely they are related, the lower the text correlation and the smaller the specific value.
In this embodiment of the present specification, determining the text correlation between the target entity text information and the entity text information in the preset entity library may include, but is not limited to, using the BM25 algorithm. Specifically, each word in the word information of the target entity text information (where the word information may include, but is not limited to, at least one of original word information, pinyin information, synonym information, and error correction word information) may be regarded as $q_i$; the word information of a piece of entity text information in the preset entity library (where this word information may include the entity text information) is regarded as $d$; a relevance score between each $q_i$ and $d$ is calculated; and finally, the relevance scores of the $q_i$ with respect to $d$ are weighted and summed to obtain the relevance score (namely, the text correlation) between the target entity text information and the entity text information $d$.
The general formula of the BM25 algorithm is as follows:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
wherein $Q$ represents the target entity text information, $q_i$ represents the i-th word in the target entity text information, and $n$ is the number of words in $Q$; $d$ represents the word information of any entity text information in the preset entity library; $W_i$ represents the weight of the i-th word in the target entity text information; and $R(q_i, d)$ represents the relevance score of the i-th word with respect to the word information $d$ of the entity text information.
In the embodiments of this specification, $W_i$ may be determined in conjunction with a term frequency-inverse document frequency (TF-IDF) algorithm.
In the embodiments of this specification, the relevance score $R(q_i, d)$ between each word $q_i$ in the target entity text information and the entity text information in the preset entity library may be calculated with the following two formulas:
$$R(q_i, d) = \frac{f_i \,(k_1 + 1)}{f_i + K} \cdot \frac{qf_i \,(k_2 + 1)}{qf_i + k_2}$$
$$K = k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
wherein $k_1$, $k_2$ and $b$ are adjustment factors, which are usually set according to the actual application, e.g. $k_1 = 2$, $k_2 = 10$, $b = 0.75$; $f_i$ is the frequency of occurrence of $q_i$ in $d$; $qf_i$ is the frequency of occurrence of $q_i$ in the target entity text information; $dl$ is the length of the word information $d$ of the entity text information in the preset entity library; and $avgdl$ is the length of the word information of the entity text information in the preset entity library.
In the embodiment of the present specification, because entity text information is mostly short text and the lengths of the word information of different entity text information differ little, dividing the length dl of the word information d of the entity text information in the preset entity library by avgdl (the average length of the word information of all entity text information), as in the original formula, has little meaning. The avgdl parameter in the existing formula is therefore set directly to the length of the word information of the entity text information in the preset entity library, which simplifies the calculation process and improves the calculation efficiency.
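The scoring described above can be sketched as follows. This is a minimal illustration rather than the implementation of this embodiment: the function names `idf_weights` and `bm25_score` are hypothetical, the smoothed IDF form used for the word weight is an assumption (the text only says the weight may be determined in conjunction with TF-IDF), and the default values of `k1`, `k2` and `b` are illustrative. The sum runs over distinct query words, with repeated words handled through the query frequency term.

```python
import math
from collections import Counter


def idf_weights(query_words, corpus):
    """Assumed weighting: a smoothed inverse document frequency per query word."""
    n = len(corpus)
    weights = {}
    for word in set(query_words):
        df = sum(1 for doc in corpus if word in doc)
        weights[word] = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
    return weights


def bm25_score(query_words, doc_words, weights, k1=2.0, k2=10.0, b=0.75):
    """Weighted sum of per-word relevance scores R(q_i, d)."""
    dl = len(doc_words)
    if dl == 0:
        return 0.0
    f = Counter(doc_words)    # f_i: frequency of q_i in d
    qf = Counter(query_words)  # qf_i: frequency of q_i in the target text
    avgdl = dl                 # simplification described above: avgdl set to dl
    big_k = k1 * (1 - b + b * dl / avgdl)  # reduces to k1 when avgdl == dl
    score = 0.0
    for word in set(query_words):
        fi, qfi = f[word], qf[word]
        r = (fi * (k1 + 1) / (fi + big_k)) * (qfi * (k2 + 1) / (qfi + k2))
        score += weights.get(word, 0.0) * r
    return score


corpus = [["peking", "university"], ["peking", "hotel"], ["tsinghua", "university"]]
query = ["peking", "university"]
w = idf_weights(query, corpus)
scores = [bm25_score(query, doc, w) for doc in corpus]
```

With avgdl set equal to dl, the length normalisation term drops out and K is simply k1, which is the efficiency gain the paragraph above refers to.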
2) Determining candidate entity text information of the target entity text information from a preset entity library based on the text correlation.
In this embodiment of the present specification, after obtaining the text relevance, entity text information whose text relevance to word information of the target entity text information is greater than or equal to a first preset threshold may be used as candidate entity text information of the target entity text information.
In other embodiments, after the text relevance is obtained, the entity text information may be sorted in descending order of text relevance, and the entity text information ranked within the first preset number of top positions is used as the candidate entity text information of the target entity text information.
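The two selection strategies just described (a threshold on the relevance value, or keeping a preset number of top-ranked entities) might look like the sketch below. The function names and the dictionary layout mapping an entity identifier to its relevance value are assumptions made for illustration.

```python
def select_by_threshold(scores, threshold):
    """Keep entities whose relevance is greater than or equal to the threshold."""
    return [entity_id for entity_id, s in scores.items() if s >= threshold]


def select_top_n(scores, n):
    """Sort by relevance in descending order and keep the first n entities."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [entity_id for entity_id, _ in ranked[:n]]


relevance = {"a": 0.9, "b": 0.4, "c": 0.7}
by_threshold = select_by_threshold(relevance, 0.7)
top_two = select_top_n(relevance, 2)
```

The same two helpers apply unchanged to the semantic correlation and address correlation values discussed later, since only the source of the score differs.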
In another embodiment, the candidate entity text information may be selected in combination with semantic relevance, and specifically, the determining the candidate entity text information of the target entity text information from a preset entity library based on the word information may include:
1) Determining semantic correlation between the word information of the target entity text information and the word information of the entity text information in the preset entity library.
In this embodiment of the present specification, when determining semantic relevance, the word information of the target entity text information and the entity text information in the preset entity library may include, but is not limited to, at least one of original word information, pinyin information, synonym information, and error-correcting word information. Specifically, the step of determining semantic relevance may be as follows:
1) determining word vectors of word information of target entity text information and entity text information in a preset entity library;
Specifically, the manner of determining the word vectors here may include, but is not limited to, using a Word2vec model.
2) And calculating the similarity between the word vectors.
In the embodiments of the present specification, semantic relevance between words is characterized by similarity between word vectors of word information. Specifically, the similarity between word vectors herein may include, but is not limited to, cosine distance, euclidean distance, manhattan distance, etc. between word vectors.
In an embodiment of the present specification, the semantic correlation reflects the degree or trend of semantic relatedness between the word information of the target entity text information and the word information of the entity text information in the preset entity library. The more closely the two are related semantically, the higher the similarity between the word vectors and the higher the semantic correlation; conversely, the less closely the two are related semantically, the lower the similarity between the word vectors and the lower the semantic correlation.
In addition, when the word information includes multi-dimensional information (i.e., multiple kinds of word information), a word vector may be obtained for the word information of each dimension, and the word vectors of the multiple dimensions may then be weighted and averaged. The similarity between the weighted-average word vector of the target entity text information and the weighted-average word vector of the entity text information in the preset entity library is then calculated, so as to obtain the semantic correlation between the target entity text information and the entity text information in the preset entity library.
2) Determining candidate entity text information of the target entity text information from a preset entity library based on the semantic correlation.
In this embodiment of the present specification, after obtaining the semantic relevance, entity text information whose semantic relevance with word information of the target entity text information is greater than or equal to a second preset threshold may be used as candidate entity text information of the target entity text information.
In other embodiments, after the semantic relevance is obtained, the entity text information may be sorted in descending order of semantic relevance, and the entity text information ranked within the second preset number of top positions may be used as candidate entity text information of the target entity text information.
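The semantic screening above can be illustrated with a short sketch, assuming that a word vector per dimension of word information (for example original words and pinyin) is already available from some embedding model. The vectors, the dimension weights and the function names are hypothetical, and cosine similarity is used as one of the similarity measures the text allows.

```python
import math


def weighted_average(vectors, weights):
    """Weighted average of equally sized vectors, one vector per dimension."""
    total = sum(weights)
    size = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(size)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-dimension vectors: [original-word vector, pinyin vector].
dimension_weights = [3.0, 1.0]
target_vec = weighted_average([[1.0, 0.0], [0.0, 1.0]], dimension_weights)
entity_vec = weighted_average([[1.0, 0.0], [1.0, 0.0]], dimension_weights)
semantic_correlation = cosine_similarity(target_vec, entity_vec)
is_candidate = semantic_correlation >= 0.8  # 0.8 is an assumed second preset threshold
```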
In practical applications, a piece of entity text information often corresponds to a piece of address information, and correspondingly, in other embodiments, the candidate entity text information may be selected in combination with the address information. Correspondingly, the preset entity library may further include address information of entity text information, and the method may further include:
1) Acquiring the address information of the target entity text information.
2) And determining the correlation of the address information of the target entity text information and the candidate entity text information.
In this embodiment of the present specification, the correlation between address information may be a specific value, quantified by a preset rule, that reflects the degree or trend of matching between the address information of the target entity text information and the address information in the preset entity library. The better the match between the address information, the greater the correlation between the address information and the larger the specific value; conversely, the worse the match between the address information, the smaller the correlation between the address information and the smaller the specific value.
Specifically, the address information may include, but is not limited to, road network data, town data, village data, door address database data, and the like. Determining the correlation between address information in the embodiments of the present specification may include comparing, item by item, the road network data, town data, village data, door address database data, and the like in the two pieces of address information: when an item is the same in both, 1 is added to the specific value corresponding to the correlation of the address information; when the item differs, the specific value remains unchanged.
3) Determining a first target candidate entity text information of the target entity text information from the candidate entity text information based on the correlation of the address information.
In this embodiment of the present specification, after the correlation of the address information is obtained, entity text information corresponding to address information whose correlation with the address information of the target entity text information is greater than or equal to a third preset threshold may be used as the first target candidate entity text information of the target entity text information.
In addition, it should be noted that, in practical applications, the address information in the preset entity library is included in the spatial index in the preset entity library, that is, the address information corresponds to the entity text information.
In other embodiments, after the correlation of the address information is obtained, the entity text information may be sorted in descending order of the correlation of the address information, and the entity text information ranked within the third preset number of top positions may be used as the first target candidate entity text information of the target entity text information.
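The item-by-item address comparison described above can be sketched as below. The field names, the dictionary representation of an address and the function names are assumptions for illustration; the counting rule (add 1 for each item that is the same, leave the value unchanged otherwise) follows the text.

```python
ADDRESS_FIELDS = ("road_network", "town", "village", "door_address")


def address_correlation(addr_a, addr_b, fields=ADDRESS_FIELDS):
    """Count the address items that are present and identical in both addresses."""
    value = 0
    for field in fields:
        a = addr_a.get(field)
        b = addr_b.get(field)
        if a is not None and a == b:
            value += 1
    return value


def filter_by_address(target_addr, candidates, threshold):
    """Keep candidates whose address correlation reaches the preset threshold."""
    return [entity_id for entity_id, addr in candidates.items()
            if address_correlation(target_addr, addr) >= threshold]


target_addr = {"road_network": "xx Road", "town": "T1",
               "village": "V1", "door_address": "No. 1"}
candidates = {
    "e1": {"road_network": "xx Road", "town": "T1",
           "village": "V2", "door_address": "No. 1"},
    "e2": {"road_network": "yy Road", "town": "T1"},
}
first_target_candidates = filter_by_address(target_addr, candidates, threshold=2)
```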
In other embodiments, the preset entity library further includes ranking information of the entity text information, generated based on the heat information of the entity text information. The ranking information of a piece of entity text information often represents the importance degree of that entity text information, so the candidate entity text information can be selected in combination with the ranking information, which prevents candidate entities with higher importance degrees from being filtered out. Specifically, the method may further include:
1) Obtaining ranking information of the first target candidate entity text information.
Specifically, the determination of the ranking information of the text information of the first target candidate entity may be combined with the above related steps, and is not described herein again.
2) Determining second target candidate entity text information of the target entity text information from the first target candidate entity text information based on the ranking information;
In this embodiment of the specification, after the ranking information is obtained, the entity text information may be sorted by ranking from front to back, and the entity text information ranked within the fourth preset number of top positions is used as the second target candidate entity text information of the target entity text information.
In addition, it should be noted that, in practical applications, the ranking information in the preset entity library is included in the ranking index in the preset entity library, that is, the ranking information corresponds to the entity text information.
In the embodiment of the present specification, the first preset threshold, the second preset threshold, the third preset threshold, and the first, second, third and fourth preset numbers of positions may be set in combination with practical applications.
In other embodiments, before determining candidate entity text information of the target entity text information from a preset entity library based on the word information, a part of entity text information may be recalled from the preset entity library in combination with a fuzzy matching algorithm and the word information, so as to reduce the subsequent amount of computation.
Accordingly, the determining of the candidate entity text information of the target entity text information from the preset entity library based on the word information may include determining the candidate entity text information of the target entity text information from a part of entity text information based on the word information.
In the embodiment of the specification, the candidate entity text information of the target entity text information is screened in a multi-angle mode through text correlation, semantic correlation, address information correlation and ranking information, so that the correlation between the screened candidate entity text information and the target entity text information can be greatly improved, and the accuracy of the subsequently determined link entity text information is further ensured.
In addition, it should be noted that, in practical applications, the above-mentioned schemes for determining candidate entity text information are not limited to the processing sequence described above. For example, the scheme based on ranking information may be performed before the scheme based on address information to determine candidate entity text information; multiple schemes may also be combined to determine candidate entity text information.
S205: inputting the word information and the word weight information of the target entity text information and of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
In this embodiment of the present specification, a semantic association model that can perform semantic association on two pieces of entity text information (i.e., identify whether two pieces of entity text information are the same entity text information) may be trained in advance, and in this embodiment of the present specification, the semantic association model may include a model obtained by training based on word information with word weights. The first deep learning model used in the training process of the semantic association model in the embodiments of the present specification may include, but is not limited to, logistic regression, deep semantic matching model (MatchPyramid), and the like. Specifically, taking a deep semantic matching model (MatchPyramid) as an example, as shown in fig. 5, the semantic association model is determined by adopting the following method:
S2051: positive sample data is acquired.
In this embodiment, the positive sample data may include entity text information in a same-cluster relationship and/or entity text information in a main-sub relationship. Specifically, entity text information in a same-cluster relationship may include different text expressions of the same entity text information, such as "Beida" and "Peking University". Entity text information in a main-sub relationship may include text information of different levels corresponding to the same entity text information, such as "Beida Library" (sub) and "Beida Library" (main).
S2053: negative sample data is acquired.
In an embodiment of the present specification, the negative sample data may include entity text information with an address information error and/or two different pieces of entity text information whose correlation satisfies a preset condition.
Specifically, the entity text information whose correlation between two different entity text information satisfies the preset condition may include that the correlation degree of the two different entity text information is greater than the set degree.
S2055: performing multi-dimensional text analysis processing on the positive sample data and the negative sample data respectively to obtain multi-dimensional text information of the positive sample data and the negative sample data, wherein the multi-dimensional text information comprises word information and word weight information.
Specifically, the multi-dimensional text analysis processing may refer to the above-mentioned related steps of multi-dimensional text analysis on the target entity text information, and is not described herein again.
S2057: inputting the word information and the word weight information of the positive sample data and the word information and the word weight information of the negative sample data into a first deep learning model for semantic association training to obtain the semantic association model.
In a particular embodiment, the first deep learning model may include an association matrix construction layer, a convolutional layer, a pooling layer, and a multilayer perceptron, wherein:
the association matrix construction layer may be used for constructing an association matrix based on the word information of two association objects (in the positive sample data and the negative sample data);
the convolutional layer may be used for determining corresponding feature vectors based on the association matrix constructed by the association matrix construction layer;
the pooling layer may be used for performing dimension reduction processing on the feature vectors output by the convolutional layer;
the multilayer perceptron may be used for performing similarity fitting processing on the feature vectors after the dimension reduction processing of the pooling layer, in combination with the word weights, to obtain an association score representing the degree of association between the two association objects.
Specifically, in the process of training the semantic association model, the training data (the word information and word weights of the positive sample data, and the word information and word weights of the negative sample data) are input into the first deep learning model. The association matrix construction layer assigns values to the association matrix according to whether the two pieces of entity text information in the positive sample data are consistent, where consistency is assigned 1 and inconsistency is assigned 0. After processing by the convolutional layer and the pooling layer, the feature vector of each word in the word information of the entity text information can be obtained. Then, in the multilayer perceptron, similarity fitting processing is performed on the word information of the two pieces of entity text information based on the feature vectors; that is, the similarity between the feature vectors of the two pieces of word information is calculated (the similarity here may include, but is not limited to, cosine distance, Euclidean distance, Manhattan distance, etc.) in combination with the word weight of the word corresponding to each feature vector. The output association score is the probability p (p is a number between 0 and 1) that the training data is positive sample data. With the sample labels y of the positive sample data and the negative sample data set to 1 and 0 respectively, the loss between the sample label y and the probability p is defined as (y-p)^2, and the error during training is obtained according to (y-p)^2. Each threshold is then updated using a gradient descent method and the model is trained again. The updated thresholds make the error between the probability p output by the model in the next round and the sample label y smaller; when the error is less than a certain value, the current first deep learning model can be used as the semantic association model.
In the embodiment of the present specification, establishing the semantic association model with word weights ensures that both the strength of association between association objects and the importance of each feature (each word in the word information) within each association object are taken into account.
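The training loop described above (squared loss between label y and output probability p, parameters updated by gradient descent until the error falls below a set value) can be sketched with a deliberately small stand-in model. A single logistic unit replaces the convolutional network here, which is an assumption made only to keep the example short; the feature vectors, learning rate, tolerance and function names are likewise hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Probability p that the sample is a positive sample."""
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def squared_loss(params, samples):
    """Mean of (y - p)^2 over all samples."""
    return sum((y - predict(params, x)) ** 2 for x, y in samples) / len(samples)


def train(samples, lr=0.5, tol=1e-2, max_steps=10000):
    """Gradient descent on the squared loss; stops once the loss is below tol."""
    size = len(samples[0][0])
    w, b = [0.0] * size, 0.0
    n = len(samples)
    for _ in range(max_steps):
        if squared_loss((w, b), samples) < tol:
            break
        grad_w, grad_b = [0.0] * size, 0.0
        for x, y in samples:
            p = predict((w, b), x)
            g = -2.0 * (y - p) * p * (1.0 - p)  # derivative of (y - p)^2 w.r.t. the pre-activation
            for i in range(size):
                grad_w[i] += g * x[i]
            grad_b += g
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical weighted match features; label 1 = positive sample, 0 = negative sample.
samples = [([1.0, 0.9], 1), ([0.9, 1.0], 1), ([0.1, 0.0], 0), ([0.0, 0.2], 0)]
initial_loss = squared_loss(([0.0, 0.0], 0.0), samples)
params = train(samples)
final_loss = squared_loss(params, samples)
```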
Specifically, as shown in fig. 6, inputting the word information and the word weight information of the target entity text information and the word weight information of the candidate entity text information into a semantic association model for semantic association, so as to obtain associated entity text information of the target entity text information, which may include:
S2059: in the association matrix construction layer of the semantic association model, constructing an association matrix based on the word information of the target entity text information and the candidate entity text information.
S20511: in the convolutional layer of the semantic association model, determining feature vectors of the target entity text information and the candidate entity text information based on the association matrix.
S20513: in the pooling layer of the semantic association model, performing dimension reduction processing on the feature vectors of the target entity text information and the candidate entity text information.
S20515: in the multilayer perceptron of the semantic association model, performing similarity fitting processing on the dimension-reduced feature vector and the word weights of the target entity text information and the dimension-reduced feature vector and the word weights of the candidate entity text information to obtain an association score representing the association degree between the target entity text information and the candidate entity text information.
S20517: determining associated entity text information of the target entity text information from the candidate entity text information based on the association score.
In this embodiment of the present specification, the candidate entity text information with the highest association score may be used as the associated entity text information of the target entity text information.
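Two parts of the inference flow above lend themselves to a short sketch: building a 0/1 association matrix from the words of the two texts, and choosing the candidate with the highest association score. The scoring function below is a hypothetical stand-in, not the trained convolutional model: it simply returns the weight-normalised share of target words that have a match in the candidate. All names and sample words are assumptions.

```python
def build_association_matrix(words_a, words_b):
    """Entry [i][j] is 1 when word i of the first text equals word j of the second."""
    return [[1 if a == b else 0 for b in words_b] for a in words_a]


def stand_in_score(words_a, weights_a, words_b):
    """Stand-in for the convolution, pooling and perceptron stages (assumption)."""
    total = sum(weights_a)
    if total == 0:
        return 0.0
    matrix = build_association_matrix(words_a, words_b)
    matched = sum(w for row, w in zip(matrix, weights_a) if any(row))
    return matched / total


def pick_associated_entity(target_words, target_weights, candidates):
    """Return the candidate identifier with the highest association score."""
    best_id, best_score = None, float("-inf")
    for entity_id, words in candidates.items():
        score = stand_in_score(target_words, target_weights, words)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id, best_score


target_words = ["peking", "university", "library"]
target_weights = [0.5, 0.3, 0.2]
candidate_words = {
    "e1": ["peking", "university"],
    "e2": ["tsinghua", "university", "library"],
}
best_id, best_score = pick_associated_entity(target_words, target_weights, candidate_words)
```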
In a specific embodiment, as shown in fig. 7, fig. 7 is a schematic diagram of an example of obtaining association scores of target entity text information and candidate entity text information by performing semantic association based on a semantic association model according to the embodiment of the present application.
Furthermore, it should be noted that fig. 7 is only an example of the first deep learning model, and in practical applications, the first deep learning model may include more layers, for example, the convolutional layer is three layers.
In addition, it should be noted that when the word information includes word information of multiple dimensions, similarity fitting processing may be performed on the word information of each dimension, and the similarity corresponding to each dimension is calculated by combining the word weight of each word in the word information of the dimension, and then the multiple dimensions are weighted and averaged to obtain a final association score.
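The final averaging step across dimensions can be written as a small helper. The dimension names, the per-dimension similarity values and the dimension weights below are hypothetical; the text does not specify how the weights are chosen.

```python
def combine_dimension_scores(scores, weights):
    """Weighted average of per-dimension similarity scores."""
    total = sum(weights[name] for name in scores)
    if total == 0:
        raise ValueError("dimension weights must not sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total


dimension_scores = {"original": 0.9, "pinyin": 0.6}
dimension_weights = {"original": 3.0, "pinyin": 1.0}
association_score = combine_dimension_scores(dimension_scores, dimension_weights)
```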
In this embodiment of the present specification, inputting the word information and the word weight information of the target entity text information and of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information may include: inputting the word information and the word weight information of the target entity text information and the first target candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information; or, inputting the word information and the word weight information of the target entity text information and the second target candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
In other embodiments, the preset entity library may further include word role information, word hierarchy information, and word function information of the entity text information. In practical applications, the word role information, the word hierarchy information and the word function information in the preset entity library are included in a text index in the preset entity library, that is, the word role information, the word hierarchy information and the word function information all correspond to the entity text information.
Correspondingly, inputting the word information and the word weight information of the target entity text information and the word weight information of the candidate entity text information into a semantic association model for semantic association, and obtaining associated entity text information of the target entity text information may include: and inputting the word information, the word weight information, the word role information, the word hierarchy information and the word function information of the target entity text information and the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
Specifically, the word information, the word weight information, the word role information, the word hierarchy information, and the word function information correspond to feature vectors of different dimensions of the entity text information. Similarity fitting processing may therefore be performed on the feature vectors of each dimension, and the results of the multiple dimensions may then be weighted and averaged to serve as the final association score.
In the embodiment of the description, when semantic association processing is performed on two entity text messages, the strength of association between the entity text messages and the representation of the importance of each feature in each entity text message are considered in combination with word weights, and in addition, the entity text messages are represented by information of multiple dimensions, so that the representation capability of feature vectors of the entity text messages on the entity text messages can be greatly improved, and the accuracy of subsequently determining and linking the entity text messages is improved.
S207: taking the associated entity text information as link entity text information of the target entity text information.
In this embodiment of the present specification, after determining the associated entity text information, the associated entity text information may be directly used as link entity text information of the target entity text information. In the embodiment of the present specification, the link entity text information of the target entity text information and the target entity text information are the same entity text information.
In other embodiments, to better ensure the accuracy of the link entity text information, before the associated entity text information is used as the link entity text information of the target entity text information, as shown in fig. 8, the method may further include:
S209: performing matching verification on the address information of the target entity text information and the address information of the associated entity text information.
In this embodiment, the address information of the entity text information may include address information corresponding to a certain entity. Correspondingly, when the address information of the associated entity text information matches the address information of the target entity text information, the step of taking the associated entity text information as the link entity text information of the target entity text information is executed.
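The verification gate of S209 can be sketched as follows. The text does not specify how two addresses are judged to match, so the rule used here (equality after lower-casing and removing whitespace) is an assumption, as are the function names and the dictionary that stands in for the spatial index.

```python
def normalize(address):
    """Assumed normalisation: lower-case and strip all whitespace."""
    return "".join(address.lower().split())


def verify_and_link(target_address, associated_id, address_index):
    """Return the associated entity id only if its stored address matches."""
    stored = address_index.get(associated_id)
    if stored is None:
        return None
    if normalize(stored) == normalize(target_address):
        return associated_id
    return None


address_index = {"e1": "No. 1, XX Road"}
linked = verify_and_link("no. 1,  xx road", "e1", address_index)
rejected = verify_and_link("No. 2, XX Road", "e1", address_index)
```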
As can be seen from the technical solutions provided by the embodiments of the present specification, the embodiments of the present specification perform multidimensional text analysis processing on target entity text information to obtain multidimensional text information that can represent the target entity text information from more dimensions; then, candidate entity text information is screened out from a preset entity library based on word information in the multi-dimensional text information; then, the word information and the word weight information of the target entity text information and the candidate entity text information are input into a semantic association model for semantic association, and the strength of association among the entity text information and the representation of the importance degree of each feature in each entity text information are considered simultaneously by combining the word weight when the semantic association is carried out, so that the associated entity text information of the target entity text information can be accurately determined; and then, matching verification is carried out through the target entity text information and the address information of the associated entity text information, so that the accuracy of determining the link entity text information of the target entity text information can be better ensured, and the target entity text information is successfully linked to a preset entity library. By using the technical scheme provided by the embodiment of the specification, the representation capability of the entity text information can be greatly improved, the accuracy of the determined link entity text information is further improved, and the entity link of the target entity can be successfully realized based on the link entity text information.
A scenario in a logistics business in which logistics information filled in by a user needs to be linked with entities in an entity library is described below with reference to fig. 9. Specifically, the logistics information filled in by the user includes "China xxxxx Company, Block A, China xxxxx Mansion, No. xx, xx Road, xx District, Beijing", where "China xxxxx Mansion Block A" is the target entity text information and "No. xx, xx Road, xx District, Beijing" is the address information. Correspondingly, multi-dimensional text analysis processing may be performed on "China xxxxx Mansion Block A" to obtain its multi-dimensional text information; then, candidate entity text information of "China xxxxx Mansion Block A" is recalled from the entity library based on the word information obtained from the multi-dimensional text analysis processing; then, entity association is performed in combination with the semantic association model, and the associated entity text information of "China xxxxx Mansion Block A" is determined from the candidate entity text information; then, matching verification may be performed between the address information of the associated entity text information and the address information "No. xx, xx Road, xx District, Beijing" of "China xxxxx Mansion Block A" in combination with the spatial index in the entity library. After the verification passes, the associated entity text information may be used as the link entity text information of "China xxxxx Mansion Block A", so that entity linking of "China xxxxx Mansion Block A" is realized based on the link entity text information and the target entity text information is successfully linked to the entity library; further, the coordinates of the link entity text information in the entity library can be obtained, so that the user's logistics information is accurately located.
In addition, the detailed information related to the entity text information, such as the coordinates of the entity text information in the entity library, may be stored in the entity library, or may be stored in another database.
An embodiment of the present application further provides a target entity linking apparatus, as shown in fig. 10, the apparatus includes:
the multidimensional text analysis processing module 1010 may be configured to perform multidimensional text analysis processing on target entity text information to obtain multidimensional text information, where the multidimensional text information includes word information and word weight information;
a candidate entity text information determining module 1020, configured to determine candidate entity text information of the target entity text information from a preset entity library based on the word information, where the preset entity library includes word information and word weight information of the entity text information;
a semantic association module 1030, configured to input the word information and the word weight information of the target entity text information and the word information and the word weight information of the candidate entity text information into a semantic association model for semantic association, so as to obtain associated entity text information of the target entity text information;
the link entity text information determining module 1040 may be configured to use the associated entity text information as link entity text information of the target entity text information.
In some embodiments, the semantic association module 1030 may include:
the incidence matrix construction unit is used for constructing an incidence matrix based on the word information of the target entity text information and the candidate entity text information in an incidence matrix construction layer of the semantic association model;
a feature vector determination unit, configured to determine, in a convolutional layer of the semantic association model, feature vectors of the target entity text information and the candidate entity text information based on the incidence matrix;
the dimension reduction processing unit is used for carrying out dimension reduction processing on the feature vectors of the target entity text information and the candidate entity text information in the pooling layer of the semantic association model;
a similarity fitting processing unit, configured to perform similarity fitting processing on the feature vector and the word weight of the target entity text information and the feature vector and the word weight of the candidate entity text information after the dimension reduction processing in a multilayer perceptron of the semantic association model, to obtain an association score representing an association degree between the target entity text information and the candidate entity text information;
and the associated entity text information determining unit is used for determining associated entity text information of the target entity text information from the candidate entity text information based on the association score.
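As an illustrative, non-limiting sketch, the units of the semantic association module 1030 described above can be approximated in plain Python as follows. The exact-match incidence matrix, the 2x2 kernel, the single pooled feature, and the hand-set perceptron weights `W1`, `B1`, `W2`, `B2` are all assumptions made for illustration; in the embodiments these parameters would be learned, and the feature vectors may have more dimensions.

```python
import math

def incidence_matrix(target_words, candidate_words):
    """Entry (i, j) is 1.0 when target word i equals candidate word j."""
    return [[1.0 if t == c else 0.0 for c in candidate_words] for t in target_words]

def pad_to(matrix, min_rows, min_cols):
    """Zero-pad the matrix so that the convolution kernel always fits."""
    cols = max(min_cols, len(matrix[0]) if matrix else 0)
    padded = [row + [0.0] * (cols - len(row)) for row in matrix]
    while len(padded) < min_rows:
        padded.append([0.0] * cols)
    return padded

def conv2d_valid(matrix, kernel):
    """Plain 'valid' 2D convolution (no kernel flip, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(matrix[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(matrix[0]) - kw + 1)]
            for r in range(len(matrix) - kh + 1)]

def global_max_pool(feature_map):
    """Dimension reduction: keep only the strongest activation."""
    return max(max(row) for row in feature_map)

def weighted_overlap(matrix, target_weights, candidate_weights):
    """Combine matches with the word weights of both entity texts."""
    total = sum(matrix[i][j] * target_weights[i] * candidate_weights[j]
                for i in range(len(target_weights))
                for j in range(len(candidate_weights)))
    return total / max(1, len(target_weights))

def mlp_score(features, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output."""
    hidden = [max(0.0, sum(f * w for f, w in zip(features, ws)) + b)
              for ws, b in zip(w1, b1)]
    return 1.0 / (1.0 + math.exp(-(sum(h * w for h, w in zip(hidden, w2)) + b2)))

# Hypothetical, hand-set parameters; a trained model would learn these.
KERNEL = [[1.0, 0.0], [0.0, 1.0]]  # responds to two consecutive matching words
W1, B1 = [[1.0, 1.0], [0.5, -0.5]], [0.0, 0.0]
W2, B2 = [1.0, 0.5], -1.0

def association_score(t_words, t_weights, c_words, c_weights):
    matrix = incidence_matrix(t_words, c_words)
    pooled = global_max_pool(conv2d_valid(pad_to(matrix, 2, 2), KERNEL))
    features = [pooled, weighted_overlap(matrix, t_weights, c_weights)]
    return mlp_score(features, W1, B1, W2, B2)
```

The candidate with the highest association score would then be taken as the associated entity text information; any threshold on the score is left open here.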
In some embodiments, the semantic association model is determined by the following units:
a positive sampling data acquisition unit, configured to acquire positive sampling data, where the positive sampling data includes entity text information in a same-cluster relation and/or entity text information in a main-sub relation;
a negative sampling data acquisition unit, configured to acquire negative sampling data, where the negative sampling data includes entity text information with wrong address information and/or entity text information of two different entities whose correlation meets a preset condition;
a multidimensional text analysis processing unit, configured to perform multidimensional text analysis processing on the positive sampling data and the negative sampling data respectively to obtain multidimensional text information of the positive sampling data and the negative sampling data, where the multidimensional text information includes word information and word weight information;
and the semantic association training unit is used for inputting the word information and the word weight information of the positive sampling data and the word information and the word weight information of the negative sampling data into a first deep learning model for semantic association training to obtain the semantic association model.
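The construction of positive and negative sampling data described above can be sketched as follows. This is only an assumed illustration: the dictionary fields `cluster` and `parent`, the word-level Jaccard correlation, and the threshold of 0.3 standing in for the "preset condition" are hypothetical, and negative samples based on wrong address information are omitted for brevity.

```python
from itertools import combinations

def word_jaccard(a, b):
    """Word-level Jaccard correlation between two entity names."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def build_training_pairs(entities, threshold=0.3):
    """Return (name_a, name_b, label) triples; label 1 is positive, 0 is negative."""
    pairs = []
    for a, b in combinations(entities, 2):
        same_cluster = a["cluster"] is not None and a["cluster"] == b["cluster"]
        main_sub = a.get("parent") == b["name"] or b.get("parent") == a["name"]
        if same_cluster or main_sub:
            # Same-cluster or main-sub relation: positive sample.
            pairs.append((a["name"], b["name"], 1))
        elif word_jaccard(a["name"], b["name"]) >= threshold:
            # Different entities that look alike: hard negative sample.
            pairs.append((a["name"], b["name"], 0))
    return pairs
```

Pairs of unrelated entities whose names share no words are skipped, since under this assumption only similar-looking different entities are informative negatives.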
In some embodiments, the word information includes at least one of: original word information, pinyin information, synonym information and error correction word information;
when the word information includes original word information, pinyin information, synonym information, and error correction word information, the multidimensional text analysis processing module 1010 includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the target entity text information to obtain original word information of the target entity text information;
a pinyin information determination unit, configured to use pinyin information of the original word information as pinyin information of the target entity text information;
a synonymy conversion processing unit, configured to perform synonymy conversion processing on the original word information to obtain synonymy information of the target entity text information;
the error correction processing unit is used for carrying out error correction processing on the original word information to obtain error correction word information of the target entity text information;
the word weight determining unit is used for determining word weights of words in the original word information, the pinyin information, the synonym information and the error correction word information based on a word weight recognition model, and the word weight recognition model comprises a model obtained by training based on word information marked with word weights;
the word weight of a word characterizes how indispensable the word is in the entity text information.
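A minimal sketch of the multidimensional text analysis processing described above is given below. It assumes the input is already separated by spaces (a real system would use a word segmenter), and the lookup tables `PINYIN`, `SYNONYMS`, `CORRECTIONS`, and `WORD_WEIGHTS` are hypothetical stand-ins for the pinyin conversion, synonym conversion, error correction, and trained word weight recognition model.

```python
# Hypothetical lookup tables standing in for trained components.
PINYIN = {"中国": "zhongguo", "大厦": "dasha"}
SYNONYMS = {"大厦": ["大楼"]}
CORRECTIONS = {"大夏": "大厦"}
WORD_WEIGHTS = {"中国": 0.2, "大厦": 0.5}
DEFAULT_WEIGHT = 1.0

def analyze(text):
    """Return original, pinyin, synonym and corrected words plus word weights."""
    original = text.split()  # assumption: whitespace-delimited input
    pinyin, synonyms, corrected, weights = [], [], [], {}
    for word in original:
        weight = WORD_WEIGHTS.get(word, DEFAULT_WEIGHT)
        weights[word] = weight
        if word in PINYIN:
            pinyin.append(PINYIN[word])
            weights[PINYIN[word]] = weight  # derived forms inherit the weight
        for synonym in SYNONYMS.get(word, []):
            synonyms.append(synonym)
            weights[synonym] = weight
        if word in CORRECTIONS:
            fixed = CORRECTIONS[word]
            corrected.append(fixed)
            weights[fixed] = WORD_WEIGHTS.get(fixed, DEFAULT_WEIGHT)
    return {"original": original, "pinyin": pinyin, "synonyms": synonyms,
            "corrected": corrected, "weights": weights}
```

In this sketch a generic word such as "中国" receives a low weight, while a distinguishing word that is absent from the table keeps the default weight.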
In some embodiments, the candidate entity textual information determination module 1020 includes:
a text correlation determination unit, configured to determine text correlation between the target entity text information and the entity text information in the preset entity library based on the word information of the target entity text information and the word information of the entity text information in the preset entity library;
a first candidate entity text information determining unit, configured to determine candidate entity text information of the target entity text information from a preset entity library based on the text correlation.
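The text-correlation-based candidate determination above can be illustrated with an inverted index and a word-overlap measure. The Jaccard measure, the `top_k` cut-off, and the function names are assumptions for illustration; the embodiments do not prescribe a specific correlation measure here.

```python
from collections import defaultdict

def build_inverted_index(library):
    """library maps entity id -> list of words; the index maps word -> entity ids."""
    index = defaultdict(set)
    for entity_id, words in library.items():
        for word in words:
            index[word].add(entity_id)
    return index

def text_correlation(words_a, words_b):
    """Jaccard overlap between two word lists."""
    sa, sb = set(words_a), set(words_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def recall_candidates(target_words, library, index, top_k=3):
    """Return up to top_k entity ids sharing words with the target, best first."""
    hits = set()
    for word in target_words:
        hits |= index.get(word, set())
    scored = sorted(((text_correlation(target_words, library[e]), e) for e in hits),
                    reverse=True)
    return [entity_id for _, entity_id in scored[:top_k]]
```

Only entities that share at least one word with the target are scored, which keeps the recall step cheap relative to the semantic association model applied afterwards.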
In some embodiments, the candidate entity textual information determination module 1020 includes:
a semantic correlation determining unit, configured to determine semantic correlation between the target entity text information and word information of the entity text information in the preset entity library;
a second candidate entity text information determining unit, configured to determine candidate entity text information of the target entity text information from a preset entity library based on the semantic correlation.
In some embodiments, the preset entity library may further include address information of entity text information, and the candidate entity text information determining module 1020 further includes:
an address information obtaining unit, configured to obtain address information of the target entity text information;
a correlation determination unit configured to determine a correlation between the target entity text information and address information of candidate entity text information;
a third candidate entity text information determining unit configured to determine a first target candidate entity text information of the target entity text information from the candidate entity text information based on the correlation of the address information;
correspondingly, the semantic association module 1030 is specifically configured to input the word information and the word weight information of the target entity text information and the first target candidate entity text information into a semantic association model for semantic association, so as to obtain associated entity text information of the target entity text information.
In some embodiments, the preset entity library may further include ranking information of entity text information, and the candidate entity text information determining module 1020 further includes:
the ranking information acquisition unit is used for acquiring ranking information of the first target candidate entity text information;
a fourth candidate entity text information determining unit configured to determine second target candidate entity text information of the target entity text information from the first target candidate entity text information based on the ranking information;
correspondingly, the semantic association module 1030 is specifically configured to input the word information and the word weight information of the target entity text information and the second target candidate entity text information into a semantic association model for semantic association, so as to obtain associated entity text information of the target entity text information.
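The two narrowing steps above (address correlation, then ranking information) can be sketched as follows. The token-overlap address correlation, the threshold of 0.5, and the assumption that ranking information is a numeric `rank_score` where higher is better are all hypothetical choices for illustration.

```python
def address_correlation(address_a, address_b):
    """Token-level Jaccard overlap between two address strings."""
    ta, tb = set(address_a.lower().split()), set(address_b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def filter_by_address(target_address, candidates, threshold=0.5):
    """First target candidates: those whose address correlates with the target."""
    return [c for c in candidates
            if address_correlation(target_address, c["address"]) >= threshold]

def keep_top_ranked(candidates, n=2):
    """Second target candidates: the n best by ranking information."""
    return sorted(candidates, key=lambda c: c["rank_score"], reverse=True)[:n]
```

The remaining candidates would then be passed, together with the target entity text information, to the semantic association model.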
In some embodiments, the multi-dimensional textual information may further include at least one of: word role information, word level information and word function information;
when the multi-dimensional text information includes the word role information, the word level information, and the word function information, the multidimensional text analysis processing module 1010 may include:
the word role information determining unit is used for determining word role information of the target entity text information based on the part of speech of the words in the target entity text information;
the word level information determining unit is used for determining word level information of the target entity text information based on the structural relationship among words in the target entity text information;
and the function analysis unit is used for carrying out function analysis on the words in the target entity text information to obtain word function information of the target entity text information.
In some embodiments, the preset entity library may further include word role information, word hierarchy information, and word function information of the entity text information;
the semantic association module 1030 may be specifically configured to input word information, word weight information, word role information, word hierarchy information, and word function information of the target entity text information and the candidate entity text information into a semantic association model for semantic association, so as to obtain associated entity text information of the target entity text information.
In some embodiments, the apparatus further comprises:
the matching verification module is used for performing matching verification on the target entity text information and the address information of the associated entity text information;
correspondingly, the link entity text information determining module 1040 is further configured to, when the address information of the associated entity text information matches the address information of the target entity text information, use the associated entity text information as the link entity text information of the target entity text information.
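A minimal sketch of the matching verification step is shown below. It assumes that address matching can be approximated by comparing normalized address strings; the embodiments may instead rely on a spatial index in the entity library, which is not modelled here. The names `normalize` and `verify_link` are hypothetical.

```python
def normalize(address):
    """Lowercase the address and drop everything except letters and digits."""
    return "".join(ch for ch in address.lower() if ch.isalnum())

def verify_link(target_address, associated_entity):
    """Return the associated entity as the link entity only if addresses match."""
    a = normalize(target_address)
    b = normalize(associated_entity["address"])
    matched = bool(a) and bool(b) and (a in b or b in a)
    return associated_entity if matched else None
```

When the verification fails, the function returns `None`, meaning that no link entity text information is determined for the target entity text information in this sketch.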
The apparatus embodiment and the method embodiments are based on the same application concept.
The embodiment of the present application provides a target entity linking device, where the target entity linking device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the target entity linking method provided in the foregoing method embodiment.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required by functions, and the like; the data storage area may store data created according to use of the device, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The method provided by the embodiments of the present application can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking running on a server as an example, fig. 11 is a block diagram of a hardware structure of a server for the target entity linking method provided in the embodiment of the present application. As shown in fig. 11, the server 1100 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1110 (the processor 1110 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1130 for storing data, and one or more storage media 1120 (e.g., one or more mass storage devices) for storing applications 1123 or data 1122. The memory 1130 and the storage medium 1120 may be transient storage or persistent storage. The program stored in the storage medium 1120 may include one or more modules, each of which may include a series of instruction operations for the server. Further, the central processing unit 1110 may be configured to communicate with the storage medium 1120 and to execute, on the server 1100, the series of instruction operations in the storage medium 1120. The server 1100 may also include one or more power supplies 1160, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1140, and/or one or more operating systems 1121, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input/output interface 1140 may be used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the server 1100. In one example, the input/output interface 1140 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices via a base station so as to communicate with the internet. In one example, the input/output interface 1140 may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1100 may also include more or fewer components than shown in FIG. 11, or have a different configuration than shown in FIG. 11.
Embodiments of the present application further provide a storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set, or a set of instructions related to implementing a target entity linking method in the method embodiments, where the at least one instruction, the at least one program, the code set, or the set of instructions are loaded and executed by the processor to implement the target entity linking method provided in the method embodiments.
Optionally, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
As can be seen from the embodiments of the target entity linking method, device, apparatus, or storage medium provided by the present application, in the present application, multidimensional text information that can represent target entity text information from more dimensions is obtained by performing multidimensional text analysis processing on target entity text information; then, candidate entity text information is screened out from a preset entity library based on word information in the multi-dimensional text information; then, the word information and the word weight information of the target entity text information and the candidate entity text information are input into a semantic association model for semantic association, and the strength of association among the entity text information and the representation of the importance degree of each feature in each entity text information are considered simultaneously by combining the word weight when the semantic association is carried out, so that the associated entity text information of the target entity text information can be accurately determined; and then, matching verification is carried out through the target entity text information and the address information of the associated entity text information, so that the accuracy of determining the link entity text information of the target entity text information can be better ensured, and the target entity text information is successfully linked to a preset entity library. By using the technical scheme provided by the embodiment of the specification, the representation capability of the entity text information can be greatly improved, the accuracy of the determined link entity text information is further improved, and the entity link of the target entity can be successfully realized based on the link entity text information.
It should be noted that the sequence of the embodiments of the present application is only for description and does not represent the advantages or disadvantages of the embodiments. Specific embodiments of the present specification have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method for linking target entities, the method comprising:
performing multi-dimensional text analysis processing on target entity text information to obtain multi-dimensional text information, wherein the multi-dimensional text information comprises word information and word weight information, and the word weight information represents how much the word information contributes to distinguishing the target entity text information from other entity text information;
determining candidate entity text information of the target entity text information from a preset entity library based on the word information, wherein the preset entity library comprises word information and word weight information of the entity text information;
inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information;
and taking the associated entity text information as link entity text information of the target entity text information.
2. The method according to claim 1, wherein the inputting of the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information comprises:
in an incidence matrix construction layer of the semantic association model, constructing an incidence matrix based on the word information of the target entity text information and the candidate entity text information;
determining feature vectors of the target entity text information and the candidate entity text information based on the incidence matrix in a convolution layer of the semantic association model;
in the pooling layer of the semantic association model, performing dimension reduction processing on the feature vectors of the target entity text information and the candidate entity text information;
performing similarity fitting processing on the feature vector and the word weight of the target entity text information and the feature vector and the word weight of the candidate entity text information after the dimension reduction processing in a multilayer perceptron of the semantic association model to obtain an association score representing the association degree between the target entity text information and the candidate entity text information;
and determining associated entity text information of the target entity text information from the candidate entity text information based on the association score.
3. The method of claim 1, wherein the semantic association model is determined by:
acquiring positive sampling data, wherein the positive sampling data comprises entity text information of a same cluster relation and/or entity text information of a main sub-relation;
acquiring negative sampling data, wherein the negative sampling data comprises entity text information with wrong address information and/or entity text information with correlation between two different entity text information meeting preset conditions;
respectively carrying out multi-dimensional text analysis processing on the positive sampling data and the negative sampling data to obtain multi-dimensional text information of the positive sampling data and the negative sampling data, wherein the multi-dimensional text information comprises word information and word weight information;
and inputting the word information and the word weight information of the positive sampling data and the word information and the word weight information of the negative sampling data into a first deep learning model for semantic association training to obtain the semantic association model.
4. The method of claim 1, wherein the word information comprises at least one of: original word information, pinyin information, synonym information and error correction word information;
when the word information includes original word information, pinyin information, synonym information and error correction word information, performing multidimensional text analysis processing on the target entity text information to obtain multidimensional text information includes:
performing word segmentation processing on the target entity text information to obtain original word information of the target entity text information;
taking the pinyin information of the original word information as the pinyin information of the target entity text information;
synonymy conversion processing is carried out on the original word information to obtain synonymy information of the target entity text information;
carrying out error correction processing on the original word information to obtain error correction word information of the target entity text information;
determining word weights of words in the original word information, the pinyin information, the synonym information and the error-correcting word information based on a word weight recognition model, wherein the word weight recognition model comprises a model obtained by training based on word information marked with word weights.
5. The method of claim 1, wherein the determining candidate entity text information for the target entity text information from a preset entity library based on the word information comprises:
determining text correlation between the target entity text information and the entity text information in the preset entity library based on the word information of the target entity text information and the word information of the entity text information in the preset entity library;
determining candidate entity text information of the target entity text information from a preset entity library based on the text correlation.
6. The method of claim 1, wherein the determining candidate entity text information for the target entity text information from a preset entity library based on the word information comprises:
determining semantic correlation between the target entity text information and word information of the entity text information in the preset entity library;
determining candidate entity text information of the target entity text information from a preset entity library based on the semantic correlation.
7. The method of claim 1, wherein the preset entity library further comprises address information of entity text information, and the method further comprises:
acquiring address information of the target entity text information;
determining the correlation of the address information of the target entity text information and the candidate entity text information;
determining first target candidate entity text information of the target entity text information from the candidate entity text information based on the correlation of the address information;
correspondingly, the step of inputting the word information and the word weight information of the target entity text information and the word information and the word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information includes:
and inputting the word information and the word weight information of the target entity text information and the first target candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
8. The method of claim 7, wherein the preset entity library further comprises ranking information of entity text information, the method further comprising:
obtaining ranking information of the first target candidate entity text information;
determining second target candidate entity text information of the target entity text information from the first target candidate entity text information based on the ranking information;
correspondingly, the step of inputting the word information and the word weight information of the target entity text information and the word information and the word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information includes:
and inputting the word information and the word weight information of the target entity text information and the second target candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
9. The method of claim 1, wherein the multi-dimensional text information further comprises at least one of: word role information, word level information and word function information;
when the multi-dimensional text information includes the word role information, the word level information and the word function information, the performing of multi-dimensional text analysis processing on the target entity text information to obtain the multi-dimensional text information comprises:
determining word role information of the target entity text information based on the part of speech of the words in the target entity text information;
determining word level information of the target entity text information based on a structural relationship among words in the target entity text information;
and performing function analysis on words in the target entity text information to obtain word function information of the target entity text information.
10. The method according to claim 9, wherein the preset entity library further includes word role information, word hierarchy information, and word function information of entity text information;
the step of inputting the word information and the word weight information of the target entity text information and the word information and the word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information comprises the following step:
and inputting the word information, the word weight information, the word role information, the word hierarchy information and the word function information of the target entity text information and the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information.
11. The method of claim 1, wherein prior to using the associated entity textual information as the linking entity textual information for the target entity textual information, the method further comprises:
matching and verifying the target entity text information and the address information of the associated entity text information;
and when the address information of the associated entity text information is matched with the address information of the target entity text information, executing the step of taking the associated entity text information as the link entity text information of the target entity text information.
12. A target entity linking apparatus, the apparatus comprising:
the multidimensional text analysis processing module is used for performing multidimensional text analysis processing on target entity text information to obtain multidimensional text information, wherein the multidimensional text information comprises word information and word weight information, and the word weight information represents how much the word information contributes to distinguishing the target entity text information from other entity text information;
the candidate entity text information determining module is used for determining candidate entity text information of the target entity text information from a preset entity library based on the word information, wherein the preset entity library comprises the word information and the word weight information of the entity text information;
the semantic association module is used for inputting the word information and the word weight information of the target entity text information and the word weight information of the candidate entity text information into a semantic association model for semantic association to obtain associated entity text information of the target entity text information;
and the link entity text information determining module is used for taking the associated entity text information as link entity text information of the target entity text information.
13. A target entity linking device, comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the target entity linking method according to any one of claims 1 to 11.
14. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the target entity linking method according to any one of claims 1 to 11.
CN201910388403.0A 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium Active CN110147421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388403.0A CN110147421B (en) 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110147421A CN110147421A (en) 2019-08-20
CN110147421B CN110147421B (en) 2022-06-21

Family

ID=67595091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388403.0A Active CN110147421B (en) 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110147421B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929035B (en) * 2019-11-27 2022-09-30 中国传媒大学 Information prediction method and system for film and television works
CN111523326B (en) 2020-04-23 2023-03-17 北京百度网讯科技有限公司 Entity chain finger method, device, equipment and storage medium
CN111737430B (en) * 2020-06-16 2024-04-05 北京百度网讯科技有限公司 Entity linking method, device, equipment and storage medium
CN112115235A (en) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 Entity attribute data query and configuration method, device and server
US20220300799A1 (en) * 2021-03-16 2022-09-22 International Business Machines Corporation Neuro-Symbolic Approach for Entity Linking
CN114970491B (en) * 2022-08-02 2022-10-04 深圳市城市公共安全技术研究院有限公司 Text connectivity judgment method and device, electronic equipment and storage medium
CN117014382B (en) * 2023-10-07 2023-12-29 北京中科网芯科技有限公司 Data stream processing system and method based on convergence and distribution equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361115A (en) * 2014-12-01 2015-02-18 北京奇虎科技有限公司 Entry weight definition method and device based on co-clicking
CN106156196A (en) * 2015-04-22 2016-11-23 富士通株式会社 Extract the apparatus and method of text feature
CN107480125A (en) * 2017-07-05 2017-12-15 重庆邮电大学 A kind of relational links method of knowledge based collection of illustrative plates
CN108509517A (en) * 2018-03-09 2018-09-07 东南大学 A kind of streaming topic evolution tracking towards real-time news content
CN109241294A (en) * 2018-08-29 2019-01-18 国信优易数据有限公司 A kind of entity link method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482876B (en) * 2008-12-11 2011-11-09 南京大学 Weight-based link multi-attribute entity recognition method
JP6101563B2 (en) * 2013-05-20 2017-03-22 株式会社日立製作所 Information structuring system
EP3144822A1 (en) * 2015-09-21 2017-03-22 Tata Consultancy Services Limited Tagging text snippets
CN106649853A (en) * 2016-12-30 2017-05-10 儒安科技有限公司 Short text clustering method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Veysel Yücesoy et al. Effect of cooccurance weighting to English word embeddings.《Signal Processing and Communications Applications Conference》.2017, *
马芳 et al. 中文科技期刊论文多标签分类研究.《图书情报导刊》.2019, pp. 26-32. *

Also Published As

Publication number Publication date
CN110147421A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110147421B (en) Target entity linking method, device, equipment and storage medium
CN109815308B (en) Method and device for determining intention recognition model and method and device for searching intention recognition
CN107436875B (en) Text classification method and device
US10120861B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
CN106709040B (en) Application search method and server
CN110609902A (en) Text processing method and device based on fusion knowledge graph
CN105069103A (en) Method and system for APP search engine to utilize client comment
CN112115232A (en) Data error correction method and device and server
CN114329225B (en) Search method, device, equipment and storage medium based on search statement
CN110442702A (en) Searching method, device, readable storage medium storing program for executing and electronic equipment
CN111488468A (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN116848490A (en) Document analysis using model intersection
US10198497B2 (en) Search term clustering
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN110147494A (en) Information search method, device, storage medium and electronic equipment
CN113988157A (en) Semantic retrieval network training method and device, electronic equipment and storage medium
CN114328800A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN113434767A (en) UGC text content mining method, system, device and storage medium
CN110019714A (en) More intent query method, apparatus, equipment and storage medium based on historical results
CN111858860B (en) Search information processing method and system, server and computer readable medium
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN116644148A (en) Keyword recognition method and device, electronic equipment and storage medium
CN113807102B (en) Method, device, equipment and computer storage medium for establishing semantic representation model
CN111339287B (en) Abstract generation method and device
CN110781283B (en) Chain brand word stock generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant