CN110147421A - Target entity linking method, apparatus, device, and storage medium - Google Patents


Info

Publication number
CN110147421A
Authority
CN
China
Prior art keywords
information
text information
word
entity text
entity
Legal status
Granted
Application number
CN201910388403.0A
Other languages
Chinese (zh)
Other versions
CN110147421B (en)
Inventor
吴坤
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910388403.0A
Publication of CN110147421A
Application granted
Publication of CN110147421B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a target entity linking method, apparatus, device, and storage medium. The method includes: performing multi-dimensional text analysis on target entity text to obtain multi-dimensional text information that includes word information and word weight information; determining candidate entity texts from a preset entity library based on the word information, the preset entity library containing the word information and word weight information of entity texts; inputting the word information and word weight information of the target entity text and of the candidate entity texts into a semantic relationship model for semantic association, to obtain an associated entity text; and using the associated entity text as the linked entity text of the target entity text. The technical solution of this application improves the ability to represent entity text and thereby the accuracy of the determined linked entity text, on the basis of which the entity linking of the target entity can be carried out successfully.

Description

Target entity linking method, apparatus, device, and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a target entity linking method, apparatus, device, and storage medium.
Background
A POI (point of interest) is a form of geographic information collected in a geographic information system (GIS); it can be a single building, a business, a mailbox, a bus stop, and so on. The attribute information of each POI entity generally includes entity text and address information. POI entity linking is the process of linking the POI entity text found in an address text to an entity text in a POI entity library, thereby obtaining accurate address information. It is widely used in natural language processing, information retrieval, and related fields.
Most existing POI entity linking techniques compute text similarity and then rank candidates. Specifically, keywords are built from the word segmentation of the target entity text; related entity texts are recalled using those keywords; the related entity texts are ranked in descending order of text similarity to the target entity text; and the top-ranked related entity text is chosen as the linked entity text of the target entity text, from which the address information of the target entity text is obtained. However, because this scheme considers only the textual similarity between entity texts, it often cannot reliably judge whether two entity texts refer to the same entity. This leads to linking errors, leaves entity ambiguity unresolved, and yields low accuracy. A more reliable or more effective scheme is therefore needed.
Summary of the invention
This application provides a target entity linking method, apparatus, device, and storage medium that improve the ability to represent entity text and thereby the accuracy of the determined linked entity text, on the basis of which the entity linking of the target entity can be carried out successfully.
In one aspect, this application provides a target entity linking method, comprising:
performing multi-dimensional text analysis on target entity text to obtain multi-dimensional text information, the multi-dimensional text information including word information and word weight information;
determining candidate entity texts for the target entity text from a preset entity library based on the word information, the preset entity library including the word information and word weight information of entity texts;
inputting the word information and word weight information of the target entity text and the word information and word weight information of the candidate entity texts into a semantic relationship model for semantic association, to obtain an associated entity text of the target entity text; and
using the associated entity text as the linked entity text of the target entity text.
In another aspect, a target entity linking apparatus is provided, comprising:
a multi-dimensional text analysis module, configured to perform multi-dimensional text analysis on target entity text to obtain multi-dimensional text information including word information and word weight information;
a candidate entity text determination module, configured to determine candidate entity texts for the target entity text from a preset entity library based on the word information, the preset entity library including the word information and word weight information of entity texts;
a semantic association module, configured to input the word information and word weight information of the target entity text and of the candidate entity texts into a semantic relationship model for semantic association, to obtain an associated entity text of the target entity text; and
a linked entity text determination module, configured to use the associated entity text as the linked entity text of the target entity text.
In another aspect, a target entity linking device is provided, comprising a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the target entity linking method described above.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the target entity linking method described above.
The target entity linking method, apparatus, device, and storage medium provided by this application have the following technical effects:
This application performs multi-dimensional text analysis on the target entity text to obtain multi-dimensional text information that characterizes the target entity text from several dimensions. Candidate entity texts are then selected from the preset entity library based on the word information in the multi-dimensional text information. Next, the word information and word weight information of the target entity text and of the candidate entity texts are input into a semantic relationship model for semantic association. Because word weights are incorporated during semantic association, the association accounts both for the strength of the relationship between entity texts and for the importance of each feature within each entity text. As a result, the linked entity text of the target entity text can be determined accurately, and the target entity text can be linked successfully to the preset entity library. The technical solution provided by the embodiments of this specification greatly improves the ability to represent entity text and thereby the accuracy of the determined linked entity text, on the basis of which the entity linking of the target entity can be carried out successfully.
Brief description of the drawings
To describe the technical solutions and advantages of the embodiments of this application or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an entity linking system according to an embodiment of this application;
Fig. 2 is a flowchart of a target entity linking method according to an embodiment of this application;
Fig. 3 is a flowchart of a method for performing multi-dimensional text analysis on target entity text to obtain multi-dimensional text information according to an embodiment of this application;
Fig. 4 is a flowchart of another method for performing multi-dimensional text analysis on target entity text to obtain multi-dimensional text information according to an embodiment of this application;
Fig. 5 is a flowchart of a semantic relationship model training method according to an embodiment of this application;
Fig. 6 is a flowchart of a method for inputting the word information and word weight information of the target entity text and of the candidate entity text into the semantic relationship model for semantic association, to obtain the associated entity text of the target entity text, according to an embodiment of this application;
Fig. 7 is a schematic diagram of an example in which semantic association is performed with the semantic relationship model to obtain a relationship score between the target entity text and a candidate entity text, according to an embodiment of this application;
Fig. 8 is a flowchart of another target entity linking method according to an embodiment of this application;
Fig. 9 is a schematic diagram of an entity linking scenario according to an embodiment of this application;
Fig. 10 is a structural diagram of a target entity linking apparatus according to an embodiment of this application;
Fig. 11 is a structural diagram of a server of this application.
Detailed description of embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, system, product, or server that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an entity linking system according to an embodiment of this application. As shown in Fig. 1, the system may include at least an entity library construction module and an entity linking module.
Specifically, in the embodiments of this specification, the system may be a standalone server, a distributed server, or a server cluster composed of multiple servers.
Specifically, in the embodiments of this specification, the entity library construction module may be used to build an entity library (i.e., the preset entity library) containing a text index and a spatial index over a large number of entity texts. The spatial index may cover a large number of entity texts (POI entity texts) and their address information; the address information may include, but is not limited to, road network data, township data, village data, and house-number address data. The text index may index a large number of entity texts by their word information, which may include, but is not limited to, the full name of the entity text, aliases, keywords after segmentation, pinyin, synonyms, and error-corrected terms. The text index may additionally index entity texts by their word weights, word roles, word levels, and word functions.
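The text index described above behaves like an inverted index from index terms to entities. The following Python sketch illustrates that idea under simplifying assumptions: each entity is represented by a flat list of terms (original words, pinyin, synonyms and so on), the entity IDs `p1`/`p2` and their term lists are hypothetical, and word weights, roles and the spatial index are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_text_index(entities: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each index term (original word, pinyin, synonym, ...) to the IDs of entities containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity_id, terms in entities.items():
        for term in terms:
            index[term].add(entity_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], query_terms: Iterable[str]) -> Set[str]:
    """Return the IDs of all entities that share at least one term with the query."""
    hits: Set[str] = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return hits


# Hypothetical entity records: entity ID -> index terms.
ENTITIES = {
    "p1": ["中国", "技术", "交易", "大厦", "zhongguo", "jishu", "jiaoyi", "dasha"],
    "p2": ["中关村", "大街", "46号"],
}
INDEX = build_text_index(ENTITIES)
print(sorted(lookup(INDEX, ["dasha", "46号"])))  # ['p1', 'p2']
```

Indexing pinyin and synonyms alongside the original words lets a query written in any of those forms reach the same entity.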
In practical applications, the entity library may also include a ranking index of entity texts and their ranking information. The ranking information of an entity text may be a ranking determined from information that reflects the entity text's popularity, such as its search volume and number of likes; generally, the more popular an entity text, the higher it ranks.
Specifically, in the embodiments of this specification, the entity linking module may be used to: perform multi-dimensional text analysis on the target entity text; recall candidate entity texts for the target entity text based on the analysis result and the text index information in the entity library; perform entity association with the semantic relationship model to determine the associated entity text of the target entity text from the candidates; and then use the spatial index information in the entity library to verify that the address of the associated entity text matches the address of the target entity text. Once verification passes, the associated entity text can be used as the linked entity text of the target entity text, and the entity linking of the target entity text is completed on that basis.
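The module's flow can be summarized as a sketch in which each stage is passed in as a function. The function `link_entity`, its parameters and the score `threshold` are hypothetical; the source does not specify how the best candidate is selected or whether a minimum score is applied.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def link_entity(
    target: str,
    analyze: Callable[[str], Dict[str, Any]],
    recall: Callable[[Dict[str, Any]], Sequence[str]],
    relate: Callable[[Dict[str, Any], str], float],
    verify_address: Callable[[str, str], bool],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the linked entity text for `target`, or None if no candidate survives."""
    features = analyze(target)      # multi-dimensional text analysis
    candidates = recall(features)   # candidate recall via the text index
    if not candidates:
        return None
    # Semantic association: keep the candidate with the highest relationship score.
    best_score, best = max((relate(features, c), c) for c in candidates)
    if best_score < threshold:      # hypothetical minimum score
        return None
    # Address verification against the spatial index.
    return best if verify_address(target, best) else None


scores = {"中国技术交易大厦": 0.9, "中国大厦": 0.4}
print(link_entity(
    "中国技术交易大厦B座",
    analyze=lambda t: {"text": t},
    recall=lambda f: list(scores),
    relate=lambda f, c: scores[c],
    verify_address=lambda t, c: True,
))  # 中国技术交易大厦
```

Passing the stages as callables keeps the control flow visible while leaving each model or index implementation open.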
A specific embodiment of the target entity linking method of this application is introduced below. Fig. 2 is a flowchart of a target entity linking method according to an embodiment of this application. This specification presents the method steps described in the embodiments or flowcharts, but more or fewer steps may be included based on routine or non-inventive work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual system or server product executes the method, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the order shown in the embodiments or drawings. Specifically, as shown in Fig. 2, the method may include:
S201: Perform multi-dimensional text analysis on the target entity text to obtain multi-dimensional text information.
In the embodiments of this specification, the target entity text may be the text of a given target entity, and the multi-dimensional text information may include word information and word weight information. The word information includes at least one of the following: original word information, pinyin information, synonym information, and error-corrected word information.
When the word information includes original word information, pinyin information, synonym information, and error-corrected word information, as shown in Fig. 3, performing multi-dimensional text analysis on the target entity text to obtain multi-dimensional text information may include:
S2011: Perform word segmentation on the target entity text to obtain the original word information of the target entity text.
In the embodiments of this specification, the target entity text may be segmented with a natural language processing algorithm, and the resulting words may be used as the original word information of the target entity text.
In a specific embodiment, if the target entity text is "China Technology Exchange Building" (中国技术交易大厦), the original word information after segmentation may consist of four words: 中国 (China), 技术 (technology), 交易 (exchange), and 大厦 (building).
It should also be noted that in some embodiments the original word information after segmentation may also include single characters.
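The source does not name a segmentation algorithm. As a stand-in, the sketch below uses forward maximum matching against a small hypothetical lexicon; characters that match nothing fall back to single-character tokens, consistent with the note that the segmentation result may contain single characters.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; unknown characters become single-character tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real system would use a full segmenter and dictionary.
LEXICON = {"中国", "技术", "交易", "大厦"}
print(forward_max_match("中国技术交易大厦", LEXICON))  # ['中国', '技术', '交易', '大厦']
```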
S2013: Use the pinyin of the original words as the pinyin information of the target entity text.
In the embodiments of this specification, the pinyin of each word in the original word information can be obtained, yielding the pinyin information of the target entity text.
In a specific embodiment, if the original words of the target entity text are 中国 (China), 技术 (technology), 交易 (exchange), and 大厦 (building), the pinyin information of the target entity text may include: zhongguo, jishu, jiaoyi, dasha.
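A minimal sketch of S2013, assuming a hypothetical word-to-pinyin table; a production system would use a full pinyin conversion library instead.

```python
from typing import Dict, List

# Hypothetical word-to-pinyin table standing in for a full pinyin converter.
WORD_PINYIN: Dict[str, str] = {
    "中国": "zhongguo", "技术": "jishu", "交易": "jiaoyi", "大厦": "dasha",
}


def pinyin_info(words: List[str], table: Dict[str, str]) -> List[str]:
    """Return the pinyin of each original word; words missing from the table are kept unchanged."""
    return [table.get(w, w) for w in words]


print(pinyin_info(["中国", "技术", "交易", "大厦"], WORD_PINYIN))
# ['zhongguo', 'jishu', 'jiaoyi', 'dasha']
```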
S2015: Perform synonym conversion on the original words to obtain the synonym information of the target entity text.
In the embodiments of this specification, the synonym information of the target entity text can be obtained with a synonym conversion model. Specifically, the original word information may be input into a predetermined synonym conversion model, which converts words in the original word information into their synonyms, yielding the synonym information of the target entity text.
Specifically, the synonym conversion model may be determined as follows:
1) Obtain word-pair data for training;
2) Train a second deep learning model for synonym conversion on the word-pair data to obtain the synonym conversion model.
Specifically, the word-pair training data may include multiple word pairs, each labeled as synonymous or non-synonymous.
In the embodiments of this specification, the second deep learning model may include, but is not limited to, a convolutional neural network, logistic regression, or a recurrent neural network.
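The trained synonym conversion model is not specified in detail. The sketch below only illustrates the input/output contract of S2015, substituting a hypothetical lookup table (`SYNONYMS`) for the model; the listed pairs are illustrative and not taken from the source.

```python
from typing import Dict, List

# Hypothetical synonym table standing in for the trained synonym conversion model.
SYNONYMS: Dict[str, List[str]] = {"大厦": ["大楼"], "交易": ["贸易"]}


def synonym_info(words: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Collect, in word order, the synonyms of every original word that has any."""
    result: List[str] = []
    for w in words:
        result.extend(table.get(w, []))
    return result


print(synonym_info(["中国", "技术", "交易", "大厦"], SYNONYMS))  # ['贸易', '大楼']
```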
S2017: correction process is carried out to the former word information, obtains the error correction term letter of the target entity text information Breath.
In this specification embodiment, carrying out correction process to former word information can include but is not limited to combine in former word information The font that the phonetic of word carries out word in error correction or former word information carries out error correction.
In a specific embodiment, it by taking the phonetic for combining word in former word information carries out error correction as an example, can be based on Certain rule carries out error correction, specifically, can be by the rule of refinement phonetic entry, to the pinyin string executing rule of user's input It examines to obtain corresponding error correction result.Specifically, for example carrying out correction process to village, Huanglong, error correction term is obtained: the village Huang Long.
S2019: Determine the word weights of the words in the original word information, pinyin information, synonym information, and error-corrected word information with a word weight recognition model.
In the embodiments of this specification, the word weight recognition model is a model trained on word information labeled with word weights. Specifically, a large amount of word information labeled with word weights can be collected, and a third deep learning model can be trained on it for word weight recognition, yielding the word weight recognition model. Afterwards, inputting a piece of word information into the word weight recognition model yields the word weights of the words it contains.
In the embodiments of this specification, the third deep learning model may include, but is not limited to, a convolutional neural network, logistic regression, or a recurrent neural network.
Specifically, the word information labeled with word weights may include multiple words, each labeled with a word weight. The word weight of a word characterizes how indispensable the word is within the entity text, that is, how much the word contributes to distinguishing that entity text from other entity texts.
In a specific embodiment, suppose the word information contains three words: "Zhongguancun", "Street", and "No. 46". "Zhongguancun" and "Street" occur more often than "No. 46", so if word weights were determined from word frequency in the conventional way, "Zhongguancun" and "Street" would receive larger weights than "No. 46". Considered instead in terms of indispensability within the entity text, that is, how much each of "Zhongguancun", "Street", and "No. 46" contributes to distinguishing the entity text "No. 46 Zhongguancun Street" from other entity texts, "No. 46" contributes more than "Zhongguancun" and "Street". Accordingly, in the embodiments of this specification, the word weights of "Zhongguancun", "Street", and "No. 46" may be labeled 0.66, 0.77, and 0.94, respectively.
In the embodiments of this specification, obtaining multi-dimensional word information together with word weights that characterize how indispensable each word is within the entity text represents the target entity text better, which in turn improves the accuracy of the linked entity text determined later.
In some embodiments, the multi-dimensional text information may further include at least one of the following: word role information, word level information, and word function information. Specifically, as shown in Fig. 4, when the multi-dimensional text information includes word role information, word level information, and word function information, performing multi-dimensional text analysis on the target entity text to obtain multi-dimensional text information may include:
S20111: Determine the word role information of the target entity text based on the parts of speech of the words in the target entity text.
In the embodiments of this specification, different words in the target entity text often have different parts of speech; the part of speech of a word is a classification of words according to their characteristics.
In a specific embodiment, if the target entity text is "China Technology Exchange Building", then among its four words "China", "technology", "exchange", and "building", the word role of "China" is country name, that of "technology" is business noun, that of "exchange" is business verb, and that of "building" is category word.
S20113: Determine the word level information of the target entity text based on the structural relationships between the words in the target entity text.
In the embodiments of this specification, the words in a target entity text often have structural relationships, such as a main-subordinate relationship. Main-subordinate level recognition can be performed on words with such relationships to determine the word level information of the target entity text. For example, in "China Technology Exchange Building, Block B", "Block B" is a subordinate part of "China Technology Exchange Building"; accordingly, "China Technology Exchange Building" is at the main level and "Block B" is at the subordinate level.
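A rule-based sketch of main/subordinate level recognition, assuming subordinate units appear as a trailing alphanumeric code followed by a unit character such as 座 (block) or 层 (floor). The pattern is hypothetical; the source does not describe how levels are recognized.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a trailing alphanumeric code followed by a unit character
# (座 block, 栋 building, 楼 tower, 层 floor, 室 room) marks a subordinate level.
SUBORDINATE_SUFFIX = re.compile(r"[A-Za-z0-9]+[座栋楼层室]$")


def split_levels(name: str) -> Tuple[str, Optional[str]]:
    """Split an entity name into (main level, subordinate level); subordinate is None if absent."""
    m = SUBORDINATE_SUFFIX.search(name)
    if m is None or m.start() == 0:
        return name, None
    return name[:m.start()], m.group(0)


print(split_levels("中国技术交易大厦B座"))  # ('中国技术交易大厦', 'B座')
```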
S20115: Perform functional analysis on the words in the target entity text to obtain the word function information of the target entity text.
In the embodiments of this specification, functional analysis of the words in the target entity text may include mapping the words of the entity text into preset function groups. The preset function groups may include: core words, i.e., words such as proper nouns, numbers, and directions that pin down the uniqueness of the target entity text; classifier words, i.e., words that indicate what business or category the target entity belongs to; adjunct words, i.e., words that supply additional information about the target entity text (such as branch or main store); and other words, i.e., words in the target entity text other than core, classifier, and adjunct words.
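The mapping into function groups can be sketched with lookup sets plus a pattern for numbered addresses. The word lists and the `\d+号` pattern are hypothetical placeholders; the source does not say how words are assigned to groups.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicons standing in for whatever resources the real system uses.
CORE_WORDS: Set[str] = {"中关村", "中国"}
CLASSIFIER_WORDS: Set[str] = {"酒店", "大厦", "医院", "银行"}
ADJUNCT_WORDS: Set[str] = {"分店", "总店"}
NUMBER_PATTERN = re.compile(r"\d+号")  # numbered addresses such as 46号 count as core words


def function_groups(words: List[str]) -> Dict[str, List[str]]:
    """Assign each word to exactly one preset function group, checked in priority order."""
    groups: Dict[str, List[str]] = {"core": [], "classifier": [], "adjunct": [], "other": []}
    for w in words:
        if w in CORE_WORDS or NUMBER_PATTERN.fullmatch(w):
            groups["core"].append(w)
        elif w in CLASSIFIER_WORDS:
            groups["classifier"].append(w)
        elif w in ADJUNCT_WORDS:
            groups["adjunct"].append(w)
        else:
            groups["other"].append(w)
    return groups


print(function_groups(["中关村", "46号", "酒店", "分店", "附近"]))
```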
In this specification embodiment, by the word Role Information, word hierarchical information and the word that obtain target entity text information Functional information can characterize target entity text information from more dimensions, can preferably characterize target entity text envelope Breath, and then improve the accuracy of the subsequent link entity text information determined.
S203: the candidate entity text of the target entity text information is determined from default entity library based on the word information This information.
In this specification embodiment, the default entity library may include the word information and word weight letter of entity text information Breath.Specifically, the word information of entity text information and the determination step of word weight may refer to trade company's target in default entity library The word information of entity text information and the determination step of word weight, details are not described herein.
In practical applications, the word information and word weight preset in entity library are contained in the text in default entity library In index, i.e. word information and word weight is corresponding with entity text information.
Specifically, determining the candidate entity texts of the target entity text from the preset entity library based on the word information may include:
1) Determine the text relevance between the target entity text and each entity text in the preset entity library, based on the word information of the target entity text and the word information of the entity texts in the library.
In the embodiments of this specification, the text relevance may be a specific value, quantified by a preset rule, that reflects the degree or tendency of textual correlation between the target entity text and an entity text in the preset entity library. The stronger the textual correlation between them, the larger the text relevance value; conversely, the weaker the correlation, the smaller the value.
In the embodiments of this specification, the text relevance between the target entity text and an entity text in the preset entity library may be determined with, but is not limited to, the BM25 algorithm. Specifically, each word in the word information of the target entity text is treated as a query term q_i (the word information here may include, but is not limited to, at least one of the original words, pinyin, synonyms and error-corrected words). The word information of an entity text in the preset entity library (which may include the full name of the entity text) is treated as a document d, and a relevance score is computed between each q_i and d. Finally, the relevance scores of the q_i with respect to d are weighted and summed to obtain the relevance score of the target entity text with respect to the entity text d, i.e., the text relevance.
The general form of the BM25 score is:

$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$

where Q denotes the target entity text and q_i the i-th of its n words; d denotes the word information of an entity text in the preset entity library; W_i denotes the weight of the i-th word of the target entity text; and R(q_i, d) denotes the relevance score between the i-th word and the word information d.
In the embodiments of this specification, W_i may be determined with the term frequency-inverse document frequency (TF-IDF) algorithm.
In the embodiments of this specification, the relevance score R(q_i, d) between each word q_i of the target entity text and an entity text d in the preset entity library may be computed with the following two formulas:

$$R(q_i, d) = \frac{f_i\,(k_1 + 1)}{f_i + K} \cdot \frac{qf_i\,(k_2 + 1)}{qf_i + k_2}$$

$$K = k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$

where k_1, k_2 and b are tuning factors, usually set according to the application, for example k_1 = 2, k_2 = 10, b = 0.75; f_i is the frequency of q_i in d; qf_i is the frequency of q_i in the target entity text; dl is the length of the word information d of the entity text; and avgdl is the average length of the word information of the entity texts in the preset entity library.
In the embodiments of this specification, entity texts are mostly short texts, so the lengths of their word information differ little, and normalizing dl by avgdl (the average length of the word information of all entity texts) adds little. The avgdl parameter in the formula is therefore set directly to dl, the length of the word information of the entity text being scored. Then dl/avgdl = 1 and K reduces to k_1, which simplifies the computation and improves efficiency.
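A minimal sketch of the BM25 scoring above, assuming the word information is already tokenized into lists of words. The helper `idf_weights` uses a smoothed IDF as the per-word weight W_i; this is an assumed stand-in for the TF-IDF weighting, not necessarily the exact variant used in the specification. Passing `avgdl=None` applies the short-text simplification (K = k_1).

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed IDF per word, used here as W_i (an assumed weighting choice)."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return {t: math.log((n_docs + 1) / (n + 1)) + 1.0 for t, n in df.items()}


def bm25_score(query_terms, doc_terms, weights, k1=2.0, k2=10.0, b=0.75, avgdl=None):
    """Score(Q, d) = sum_i W_i * R(q_i, d), summed over distinct query words.

    With avgdl=None the simplification is applied: avgdl is set to dl,
    so dl / avgdl = 1 and K = k1.
    """
    dl = len(doc_terms)
    if avgdl is None or avgdl == 0:
        K = k1
    else:
        K = k1 * (1 - b + b * dl / avgdl)
    f = Counter(doc_terms)     # f_i: frequency of q_i in d
    qf = Counter(query_terms)  # qf_i: frequency of q_i in the target entity text
    score = 0.0
    for q, qfi in qf.items():
        fi = f[q]
        if fi == 0:
            continue  # R(q_i, d) = 0 when q_i does not occur in d
        r = (fi * (k1 + 1) / (fi + K)) * (qfi * (k2 + 1) / (qfi + k2))
        score += weights.get(q, 0.0) * r
    return score
```

For example, with `library = [["peking", "university", "library"], ["tsinghua", "university"]]` and `W = idf_weights(library)`, the query `["peking", "university"]` scores higher against the first entry, because "peking" carries a larger IDF weight than the shared word "university".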
2) Determine the candidate entity texts of the target entity text from the preset entity library based on the text relevance.
In the embodiments of this specification, once the text relevance is obtained, the entity texts whose text relevance to the word information of the target entity text is greater than or equal to a first preset threshold may be taken as candidate entity texts of the target entity text.
In other embodiments, once the text relevance is obtained, the entity texts may be sorted by text relevance in descending order, and the top-ranked entity texts, up to a first preset number, taken as candidate entity texts of the target entity text.
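Both selection rules can be sketched in a few lines, assuming the relevance scores are held in a dict keyed by entity text. The function names are hypothetical; the same helpers apply unchanged to semantic relevance with the second preset threshold or second preset number.

```python
def select_by_threshold(scores, threshold):
    """Keep entity texts whose relevance is >= threshold (the preset threshold)."""
    return [e for e, s in scores.items() if s >= threshold]


def select_top_n(scores, n):
    """Sort by relevance in descending order and keep the first n (the preset number)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [e for e, _ in ranked[:n]]
```

A threshold keeps a variable number of candidates, while top-N bounds the work passed to the later, more expensive semantic association step.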
The scheme above selects candidate entity texts by text relevance. In other embodiments, semantic relevance may also be used to select candidate entity texts. Specifically, determining the candidate entity texts of the target entity text from the preset entity library based on the word information may include:
1) Determine the semantic relevance between the word information of the target entity text and that of each entity text in the preset entity library.
In the embodiments of this specification, when determining semantic relevance, the word information of the target entity text and of the entity texts in the preset entity library may include, but is not limited to, at least one of the original words, pinyin, synonyms and error-corrected words. Specifically, the semantic relevance may be determined as follows:
a) Determine the word vectors of the word information of the target entity text and of each entity text in the preset entity library.
Specifically, the word vectors may be computed with, but not limited to, a Word2vec model.
b) Compute the similarity between the word vectors.
In the embodiments of this specification, the similarity between the word vectors of the word information characterizes the semantic relevance between the words. Specifically, the similarity between word vectors may be derived from, but is not limited to, the cosine distance, Euclidean distance or Manhattan distance between them.
In the embodiments of this specification, the semantic relevance reflects the degree or tendency of semantic correlation between the target entity text and an entity text in the preset entity library. The stronger the semantic correlation between them, the greater the similarity between their word vectors and the larger the semantic relevance; conversely, the weaker the semantic correlation, the smaller the similarity and the smaller the semantic relevance.
In addition, when the word information spans several dimensions (i.e., several kinds of word information), a word vector may be obtained for the word information of each dimension and the vectors of the dimensions combined by a weighted average. The similarity between the target entity text and an entity text in the preset entity library is then computed from their weighted-average vectors, yielding the semantic relevance between the two.
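A minimal sketch of the multi-dimension semantic relevance, assuming pre-computed word vectors (in practice from a Word2vec model) are supplied as a plain dict, and assuming each dimension's vector is the mean of its word vectors; the specification does not say how word vectors are pooled within a dimension. Cosine similarity is used as the similarity measure. All function names and dimension keys are hypothetical.

```python
import math


def mean_vector(words, vectors, dim):
    """Average the vectors of the known words; zero vector if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def combined_vector(dim_words, vectors, dim_weights, dim):
    """Weighted average of per-dimension vectors (original words, pinyin, ...)."""
    total = sum(dim_weights[d] for d in dim_words)
    out = [0.0] * dim
    if total == 0:
        return out
    for d, words in dim_words.items():
        v = mean_vector(words, vectors, dim)
        w = dim_weights[d] / total
        out = [o + w * x for o, x in zip(out, v)]
    return out


def cosine_similarity(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def semantic_relevance(target_dims, cand_dims, vectors, dim_weights, dim):
    """Cosine similarity between the weighted-average vectors of two entity texts."""
    t = combined_vector(target_dims, vectors, dim_weights, dim)
    c = combined_vector(cand_dims, vectors, dim_weights, dim)
    return cosine_similarity(t, c)
```

Here `target_dims` maps a dimension name such as `"original"` or `"synonym"` to that dimension's word list, and `dim_weights` maps the same names to their weights.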
2) Determine the candidate entity texts of the target entity text from the preset entity library based on the semantic relevance.
In the embodiments of this specification, once the semantic relevance is obtained, the entity texts whose semantic relevance to the word information of the target entity text is greater than or equal to a second preset threshold may be taken as candidate entity texts of the target entity text.
In other embodiments, once the semantic relevance is obtained, the entity texts may be sorted by semantic relevance in descending order, and the top-ranked entity texts, up to a second preset number, taken as candidate entity texts of the target entity text.
In practice, an entity text usually corresponds to an address. Accordingly, in other embodiments, address information may also be used to select candidate entity texts. In that case, the preset entity library may further include the address information of each entity text, and the method may further include:
1) address information of the target entity text information is obtained.
2) correlation of the address information of the target entity text information and candidate entity text information is determined.
In the embodiments of this specification, the address correlation may be a specific value, quantified by a preset rule, that reflects the degree or tendency of matching between the address information of the target entity text and that of an entity text in the preset entity library. The better the addresses match, the larger the address correlation value; conversely, the worse they match, the smaller the value.
Specifically, the address information may include, but is not limited to, road network data, town data, village data and door-address (house number) database data. In the embodiments of this specification, the address correlation may be computed by comparing the two addresses item by item (road network data, town data, village data, door-address data and so on): the value increases by 1 for each item that is identical and stays unchanged for each item that differs.
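A minimal sketch of this item-by-item comparison, assuming each address has already been parsed into components. The field names in `ADDRESS_FIELDS` are hypothetical, and treating a missing component (`None`) as a non-match is an assumption.

```python
# Hypothetical field names for the address components listed above.
ADDRESS_FIELDS = ("road_network", "town", "village", "door_address")


def address_correlation(addr_a, addr_b, fields=ADDRESS_FIELDS):
    """Add 1 for every address component that is present and identical in both."""
    score = 0
    for field in fields:
        a, b = addr_a.get(field), addr_b.get(field)
        if a is not None and a == b:
            score += 1
    return score
```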
3) Determine first target candidate entity texts of the target entity text from the candidate entity texts based on the address correlation.
In the embodiments of this specification, once the address correlation is obtained, the entity texts whose address correlation with the address of the target entity text is greater than or equal to a third preset threshold may be taken as the first target candidate entity texts of the target entity text.
Note that, in practice, the address information in the preset entity library is stored in the library's spatial index, i.e., each address is associated with an entity text.
In other embodiments, once the address correlation is obtained, the entity texts may be sorted by address correlation in descending order, and the top-ranked entity texts, up to a third preset number, taken as the first target candidate entity texts of the target entity text.
In other embodiments, the preset entity library further includes ranking information of the entity texts, generated from their popularity information. The ranking of an entity text usually reflects its importance, so the ranking information may also be used when selecting candidate entity texts, to avoid filtering out highly important candidate entities. Specifically, the method may further include:
1) ranking information of the first object candidate entity text information is obtained.
Specifically, the ranking information of the first target candidate entity texts may be determined with the related steps described above, which are not repeated here.
2) Determine second target candidate entity texts of the target entity text from the first target candidate entity texts based on the ranking information.
In the embodiments of this specification, once the ranking information is obtained, the entity texts may be sorted by rank from first to last, and the top-ranked entity texts, up to a fourth preset number, taken as the second target candidate entity texts of the target entity text.
Note that, in practice, the ranking information in the preset entity library is stored in the library's ranking index, i.e., each ranking is associated with an entity text.
In the embodiments of this specification, the first, second and third preset thresholds and the first, second, third and fourth preset numbers may be set according to the application.
In other embodiments, before the candidate entity texts of the target entity text are determined from the preset entity library based on the word information, a fuzzy matching algorithm may be combined with the word information to recall a subset of entity texts from the library, reducing the subsequent computation.
Accordingly, determining the candidate entity texts of the target entity text from the preset entity library based on the word information may include determining them from that recalled subset of entity texts based on the word information.
In the embodiments of this specification, screening the candidate entity texts of the target entity text from multiple angles, namely text relevance, semantic relevance, address correlation and ranking information, can greatly improve the relevance between the selected candidate entity texts and the target entity text, and thereby ensures the accuracy of the link entity text determined later.
Note also that the schemes above for determining candidate entity texts are not restricted to the order in which they are described. In practice, for example, the ranking-based scheme may be applied before the address-based scheme, and several schemes may also be combined to determine the candidate entity texts.
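Because the schemes can be combined in any order, one way to organize them is as interchangeable screening stages. The sketch below is a minimal illustration under stated assumptions: candidates are dicts with hypothetical `address` and `rank` keys, and a smaller rank number is assumed to mean a more important entity.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]
Stage = Callable[[List[Candidate]], List[Candidate]]


def run_stages(candidates: List[Candidate], stages: List[Stage]) -> List[Candidate]:
    """Apply the screening stages one after another, in the given order."""
    for stage in stages:
        candidates = stage(candidates)
    return candidates


def address_stage(target_address: Dict[str, Any], threshold: int) -> Stage:
    """Keep candidates whose address correlation with the target is >= threshold."""
    def correlation(address: Dict[str, Any]) -> int:
        # +1 for each component present in the target and identical in the candidate.
        return sum(1 for k, v in target_address.items() if v is not None and address.get(k) == v)

    def stage(cands: List[Candidate]) -> List[Candidate]:
        return [c for c in cands if correlation(c["address"]) >= threshold]
    return stage


def ranking_stage(top_n: int) -> Stage:
    """Keep the top_n candidates, assuming a smaller rank means more important."""
    def stage(cands: List[Candidate]) -> List[Candidate]:
        return sorted(cands, key=lambda c: c["rank"])[:top_n]
    return stage
```

Each stage discards candidates, so the order changes the result. For example, ranking first can remove a candidate that would have passed the address check, which is why the order is a tuning decision rather than a fixed rule.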
S205: Input the word information and word weight information of the target entity text, together with the word information and word weight information of the candidate entity texts, into a semantic association model for semantic association, obtaining the associated entity text of the target entity text.
In the embodiments of this specification, a semantic association model that performs semantic association on two entity texts (i.e., identifies whether the two entity texts refer to the same entity) may be trained in advance. The semantic association model may be a model trained on word information to which word weights have been assigned. The first deep learning model used to train it may include, but is not limited to, logistic regression or the deep semantic matching model MatchPyramid. Taking MatchPyramid as an example, as shown in Fig. 5, the semantic association model may be determined as follows:
S2051: Obtain positive sample data.
In the embodiments of this specification, the positive samples may include entity texts in a same-cluster relationship and/or entity texts in a parent-child relationship. Specifically, same-cluster entity texts are different textual forms of the same entity, such as the abbreviation "Beida" and the full name "Peking University". Parent-child entity texts are texts at different levels of the same entity, such as "Peking University Library" (child) and "Peking University" (parent).
S2053: Obtain negative sample data.
In the embodiments of this specification, the negative samples may include entity texts with incorrect address information and/or pairs of different entity texts whose mutual correlation satisfies a preset condition.
Specifically, a pair of different entity texts whose correlation satisfies the preset condition may be a pair whose degree of correlation exceeds a set level, i.e., texts of different entities that nevertheless look alike.
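A minimal sketch of assembling the sample pairs, assuming the surface forms of each entity are already grouped by a hypothetical entity id. Character-bigram Jaccard similarity is a stand-in for the unspecified "degree of correlation", and `min_similarity` plays the role of the set level; negatives from incorrect addresses are not covered here.

```python
from itertools import combinations


def bigrams(text):
    """Set of adjacent character pairs in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0


def build_samples(entities, parent_child, min_similarity=0.5):
    """entities: {entity_id: [surface texts]}; parent_child: [(child_text, parent_text)].

    Positives: surface texts of the same entity (same cluster) and parent-child pairs.
    Negatives: texts of different entities whose bigram Jaccard similarity
    exceeds min_similarity (look-alike but distinct entities).
    """
    positives = []
    for texts in entities.values():
        positives.extend(combinations(texts, 2))
    positives.extend(parent_child)
    negatives = []
    for texts_a, texts_b in combinations(list(entities.values()), 2):
        for a in texts_a:
            for b in texts_b:
                if jaccard(a, b) > min_similarity:
                    negatives.append((a, b))
    return positives, negatives
```

Look-alike negatives are the useful ones: pairs that share little text are already separated by the candidate screening, so the association model gains most from learning to reject near-misses.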
S2055: Perform multi-dimensional text analysis on the positive samples and the negative samples respectively, obtaining their multi-dimensional text information, which includes word information and word weight information.
Specifically, the multi-dimensional text analysis here may follow the steps described above for the target entity text, which are not repeated here.
S2057: Input the word information and word weight information of the positive samples and of the negative samples into the first deep learning model for semantic association training, obtaining the semantic association model.
In a specific embodiment, the first deep learning model may include an association matrix construction layer, a convolutional layer, a pooling layer and a multilayer perceptron.
The association matrix construction layer builds an association matrix from the word information of the two objects being associated (the two entity texts of a positive or negative sample);
the convolutional layer determines the corresponding feature vectors from the association matrix built by the association matrix construction layer;
the pooling layer reduces the dimensionality of the feature vectors output by the convolutional layer;
the multilayer perceptron performs similarity fitting on the pooled feature vectors, taking the word weights into account, to obtain a relationship score that characterizes the degree of association between the two objects.
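The data flow through these layers can be illustrated with a deliberately simplified, untrained stand-in. It builds the 0/1 exact-match association matrix used in training, replaces the learned convolution and pooling with row-wise max pooling, and replaces the multilayer perceptron with a word-weighted average. This is not MatchPyramid itself; the function names and the default weight of 1.0 for unweighted words are assumptions.

```python
def association_matrix(target_words, cand_words):
    """Cell (i, j) is 1 if the i-th target word equals the j-th candidate word, else 0."""
    return [[1 if t == c else 0 for c in cand_words] for t in target_words]


def weighted_match_score(target_words, cand_words, weights):
    """Word-weighted share of target words that find a match in the candidate.

    Row-wise max pooling stands in for the convolution and pooling layers;
    the weighted average stands in for the multilayer perceptron.
    """
    matrix = association_matrix(target_words, cand_words)
    pooled = [max(row) if row else 0 for row in matrix]
    total = sum(weights.get(t, 1.0) for t in target_words)
    if total == 0:
        return 0.0
    return sum(weights.get(t, 1.0) * p for t, p in zip(target_words, pooled)) / total
```

The word weights matter here: missing a heavily weighted core word lowers the score far more than missing a low-weight adjunct word.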
Specifically, during training of the semantic association model, the training data (the word information and word weights of the positive samples and of the negative samples) are fed into the first deep learning model. The association matrix construction layer fills in the association matrix according to whether the words of the two entity texts in a sample are identical: 1 if they are identical, 0 otherwise. After the convolutional and pooling layers, a feature vector is obtained for each word in the word information of each entity text. The multilayer perceptron then performs similarity fitting between the word information of the two entity texts based on these feature vectors, i.e., it computes the similarity between the feature vectors of the two pieces of word information (which may include, but is not limited to, cosine distance, Euclidean distance or Manhattan distance), taking into account the word weight of the word corresponding to each feature vector. The output relationship score is the probability p (a number between 0 and 1) that the training sample is a positive sample. With the sample label y set to 1 for positive samples and 0 for negative samples, the loss between y and p is defined as (y - p)^2, from which the training error is computed. Gradient descent is then used to update the model parameters and training is repeated; the updated parameters reduce the error between the model output p and the label y in the next round. When the error falls below a given value, the current first deep learning model is taken as the semantic association model.
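The loss and stopping rule can be shown on a toy scale. The sketch below replaces the deep model with a single logistic unit over hand-made pair features (for example a weighted match score and a vector similarity, both hypothetical here), trains it by per-sample gradient descent on (y - p)^2, and stops once the mean loss falls below a tolerance. Only the training loop mirrors the text; the model is an assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability p that the pair with features x is a positive sample."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, lr=0.5, tol=0.05, max_epochs=5000):
    """Fit w, b by per-sample gradient descent on the loss (y - p) ** 2.

    samples: list of (feature_vector, label), label 1 (positive) or 0 (negative).
    Stops when the mean loss seen during an epoch drops below tol.
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in samples:
            p = predict(w, b, x)
            total_loss += (y - p) ** 2
            # Derivative of (y - p)^2 w.r.t. z, where p = sigmoid(z): -2 (y - p) p (1 - p)
            g = -2.0 * (y - p) * p * (1.0 - p)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
        if total_loss / len(samples) < tol:
            break
    return w, b
```

Squared error is kept here only because the text specifies it; with a sigmoid output, cross-entropy is the more common choice because its gradient does not vanish when p saturates.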
In the embodiments of this specification, incorporating word weights when building the semantic association model ensures that the model accounts both for the strength of association between the associated objects and for the importance of each feature (each word in the word information) within each object.
Specifically, as shown in Fig. 6, inputting the word information and word weight information of the target entity text and of the candidate entity texts into the semantic association model for semantic association, to obtain the associated entity text of the target entity text, may include:
S2059: In the association matrix construction layer of the semantic association model, build an association matrix from the word information of the target entity text and of the candidate entity text.
S20511: In the convolutional layer of the semantic association model, determine the feature vectors of the target entity text and of the candidate entity text from the association matrix.
S20513: In the pooling layer of the semantic association model, reduce the dimensionality of the feature vectors of the target entity text and of the candidate entity text.
S20515: In the multilayer perceptron of the semantic association model, perform similarity fitting on the reduced feature vectors and word weights of the target entity text and of the candidate entity text, obtaining a relationship score that characterizes the degree of association between the target entity text and the candidate entity text.
S20517: Determine the associated entity text of the target entity text from the candidate entity texts based on the relationship scores.
In the embodiments of this specification, the candidate entity text with the highest relationship score may be taken as the associated entity text of the target entity text.
In a specific embodiment, Fig. 7 is a schematic diagram of an example, according to an embodiment of this application, of performing semantic association with the semantic association model to obtain the relationship score between the target entity text and a candidate entity text.
Note that Fig. 7 shows only one example of the first deep learning model; in practice, the first deep learning model may include more layers, for example three convolutional layers.
Furthermore, when the word information spans multiple dimensions, similarity fitting may be performed on the word information of each dimension, weighted by the word weight of each word in that dimension, to obtain a similarity for each dimension; the per-dimension similarities are then combined by a weighted average into the final relationship score.
In the embodiments of this specification, inputting the word information and word weight information of the target entity text and of the candidate entity texts into the semantic association model for semantic association, to obtain the associated entity text of the target entity text, may include: inputting the word information and word weight information of the target entity text and of the first target candidate entity texts into the semantic association model; or inputting the word information and word weight information of the target entity text and of the second target candidate entity texts into the semantic association model.
In other embodiments, the preset entity library may also include the word role information, word level information and word function information of each entity text. In practice, the word role information, word level information and word function information in the preset entity library are stored in the library's text index, i.e., each is associated with an entity text.
Accordingly, the semantic association step may include inputting the word information, word weight information, word role information, word level information and word function information of the target entity text and of the candidate entity texts into the semantic association model for semantic association, obtaining the associated entity text of the target entity text.
Specifically, since the word information, word weight information, word role information, word level information and word function information correspond to feature vectors for different dimensions of an entity text, similarity fitting may be performed on the feature vectors of each dimension, and the results for the dimensions combined by a weighted average into the final relationship score.
In the embodiments of this specification, incorporating word weights when two entity texts undergo semantic association accounts both for the strength of association between the entity texts and for the importance of each feature within each entity text. In addition, characterizing entity texts with information from multiple dimensions greatly strengthens how well their feature vectors represent them, which in turn improves the accuracy of the link entity text determined later.
S207: Take the associated entity text as the link entity text of the target entity text.
In the embodiments of this specification, once the associated entity text is determined, it may be taken directly as the link entity text of the target entity text. The link entity text and the target entity text then refer to the same entity.
In other embodiments, to better ensure the accuracy of the link entity text, the method may further include the following step, as shown in Fig. 8, before the associated entity text is taken as the link entity text of the target entity text:
S209: Verify that the address information of the target entity text matches that of the associated entity text.
In the embodiments of this specification, the address information of an entity text may include the address corresponding to the entity. Accordingly, when the address information of the associated entity text matches that of the target entity text, the step of taking the associated entity text as the link entity text of the target entity text is performed.
As the technical solutions provided by the above embodiments show, multi-dimensional text analysis of the target entity text yields multi-dimensional text information that characterizes it along several dimensions. Candidate entity texts are then selected from the preset entity library based on the word information in the multi-dimensional text information. Next, the word information and word weight information of the target entity text and of the candidate entity texts are input into the semantic association model for semantic association. Because word weights are incorporated during semantic association, both the strength of association between entity texts and the importance of each feature within each entity text are taken into account, so the associated entity text of the target entity text can be determined accurately. In addition, verifying that the address information of the target entity text matches that of the associated entity text further ensures the accuracy of the link entity text determined for the target entity text, so that the target entity text is successfully linked to the preset entity library. The technical solutions provided by the embodiments of this specification can greatly improve the representation of entity texts and the accuracy of the determined link entity text, and can successfully achieve entity linking of the target entity based on the link entity text.
The following describes, with reference to Fig. 9, a scenario in a logistics service in which entities in the logistics information filled in by a user are linked to entities in the entity library. Suppose the logistics information filled in by the user is "xx Company, Floor xx, Building A, China xxxxx Mansion, No. xx, xx Road, xx District, Beijing", where "China xxxxx Mansion Building A" is the target entity text and "No. xx, xx Road, xx District, Beijing" is the address information. Multi-dimensional text analysis of "China xxxxx Mansion Building A" yields its multi-dimensional text information. Candidate entity texts for "China xxxxx Mansion Building A" are then recalled from the entity library based on the word information produced by that analysis. Next, entity association is performed with the semantic association model to determine the associated entity text of "China xxxxx Mansion Building A" from the candidate entity texts. Then, using the spatial index in the entity library, the address information of the associated entity text is verified against the address "No. xx, xx Road, xx District, Beijing" of "China xxxxx Mansion Building A". Once verification passes, the associated entity text can be taken as the link entity text of "China xxxxx Mansion Building A", and entity linking of "China xxxxx Mansion Building A" is achieved based on the link entity text, i.e., the target entity text is successfully linked to the entity library. The coordinates of the link entity text can then be obtained from the entity library, enabling accurate positioning of the user's logistics information.
It should also be noted that detailed information related to the entity text information in the entity library, such as its coordinates, may be stored in the entity library itself or in another database.
An embodiment of the present application further provides a target entity linking apparatus. As shown in Fig. 10, the apparatus includes:
a multi-dimensional text analysis module 1010, configured to perform multi-dimensional text analysis on target entity text information to obtain multi-dimensional text information, where the multi-dimensional text information includes word information and word weight information;
a candidate entity text information determining module 1020, configured to determine, based on the word information, candidate entity text information of the target entity text information from a preset entity library, where the preset entity library includes word information and word weight information of entity text information;
a semantic association module 1030, configured to input the word information and word weight information of the target entity text information, together with the word information and word weight information of the candidate entity text information, into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information;
a linked entity text information determining module 1040, configured to take the associated entity text information as the linked entity text information of the target entity text information.
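Read together, modules 1010 to 1040 form a linear pipeline. The following Python sketch shows only that data flow; every function name, the whitespace tokenizer, the uniform word weights, the overlap score standing in for the semantic association model, and the 0.5 acceptance threshold are assumptions for illustration, not the models described in this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntityText:
    """Hypothetical record holding an entity text with its word and word weight information."""
    name: str
    words: List[str] = field(default_factory=list)
    weights: Dict[str, float] = field(default_factory=dict)


def analyze(text: str) -> EntityText:
    # Module 1010 stand-in: whitespace tokenisation and uniform word weights.
    words = text.lower().split()
    return EntityText(text, words, {w: 1.0 for w in words})


def recall_candidates(target: EntityText, library: List[EntityText]) -> List[EntityText]:
    # Module 1020 stand-in: keep library entries sharing at least one word with the target.
    return [e for e in library if set(e.words) & set(target.words)]


def score_association(a: EntityText, b: EntityText) -> float:
    # Module 1030 stand-in: weighted word overlap in [0, 1] instead of a trained model.
    shared = set(a.words) & set(b.words)
    num = sum(a.weights[w] + b.weights[w] for w in shared)
    den = sum(a.weights.values()) + sum(b.weights.values())
    return num / den if den else 0.0


def link_entity(text: str, library: List[EntityText], threshold: float = 0.5) -> Optional[EntityText]:
    # Module 1040: the best-scoring candidate, if good enough, becomes the linked entity.
    target = analyze(text)
    scored = [(score_association(target, c), c) for c in recall_candidates(target, library)]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None
```

With a library built from "China Tower Building A", "China Tower Building B" and "Sun Plaza", the query "china tower building a" links to the first entry, while a query sharing no word with the library returns None. Keeping each module a separate function mirrors the apparatus structure, so any stage can be replaced by a trained model without touching the others.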
In some embodiments, the semantic association module 1030 may include:
an association matrix construction unit, configured to construct, in the association matrix construction layer of the semantic association model, an association matrix based on the word information of the target entity text information and of the candidate entity text information;
a feature vector determining unit, configured to determine, in the convolutional layer of the semantic association model, feature vectors of the target entity text information and of the candidate entity text information based on the association matrix;
a dimensionality reduction unit, configured to reduce, in the pooling layer of the semantic association model, the dimensionality of the feature vectors of the target entity text information and of the candidate entity text information;
a similarity fitting unit, configured to perform, in the multilayer perceptron of the semantic association model, similarity fitting on the dimensionality-reduced feature vector and word weights of the target entity text information and the feature vector and word weights of the candidate entity text information, to obtain an association score characterizing the degree of association between the target entity text information and the candidate entity text information;
an associated entity text information determining unit, configured to determine, based on the association score, the associated entity text information of the target entity text information from the candidate entity text information.
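The five units above amount to a small interaction-based matching network. The pure-Python sketch below walks through one forward pass under stated assumptions: the association matrix uses exact word matches, the two 2x2 kernels and all perceptron weights are hand-set rather than trained, and word weights enter the perceptron as weighted-coverage features. None of these parameter values come from this specification.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Hypothetical, hand-set parameters; the model in this specification is trained.
KERNELS: List[Matrix] = [
    [[1.0, 0.0], [0.0, 1.0]],      # diagonal: rewards runs of consecutive matching words
    [[0.25, 0.25], [0.25, 0.25]],  # box: rewards any local cluster of matches
]
HIDDEN_W = [[1.0, 1.0, 0.0, 0.0],  # hidden unit 0 reads the pooled convolution features
            [0.0, 0.0, 2.0, 2.0]]  # hidden unit 1 reads the word-weight coverage features
HIDDEN_B = [-0.5, -1.0]
OUT_W, OUT_B = [1.5, 2.0], -2.0


def association_matrix(a: List[str], b: List[str]) -> Matrix:
    """Cell (i, j) is 1.0 when word i of text a equals word j of text b, zero-padded to at least 2x2."""
    rows, cols = max(len(a), 2), max(len(b), 2)
    return [[1.0 if i < len(a) and j < len(b) and a[i] == b[j] else 0.0
             for j in range(cols)] for i in range(rows)]


def conv2d_relu(m: Matrix, k: Matrix) -> Matrix:
    """Valid 2x2 convolution (cross-correlation, as in CNN layers) followed by ReLU."""
    return [[max(0.0, sum(k[di][dj] * m[i + di][j + dj] for di in range(2) for dj in range(2)))
             for j in range(len(m[0]) - 1)] for i in range(len(m) - 1)]


def max_pool(fmap: Matrix) -> float:
    """Global max pooling: reduces a feature map to a single value."""
    return max(max(row) for row in fmap)


def coverage(words: List[str], weights: Dict[str, float], other: List[str]) -> float:
    """Share of this text's total word weight that also appears in the other text."""
    total = sum(weights.get(w, 0.0) for w in words)
    hit = sum(weights.get(w, 0.0) for w in words if w in other)
    return hit / total if total else 0.0


def association_score(a: List[str], wa: Dict[str, float], b: List[str], wb: Dict[str, float]) -> float:
    m = association_matrix(a, b)
    features = [max_pool(conv2d_relu(m, k)) for k in KERNELS]
    features += [coverage(a, wa, b), coverage(b, wb, a)]
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + bias)
              for row, bias in zip(HIDDEN_W, HIDDEN_B)]
    logit = sum(w * h for w, h in zip(OUT_W, hidden)) + OUT_B
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> association score in (0, 1)


def pick_associated(target: Tuple[List[str], Dict[str, float]],
                    candidates: Dict[str, Tuple[List[str], Dict[str, float]]]) -> str:
    """Return the id of the highest-scoring candidate (the associated entity text)."""
    words, weights = target
    return max(candidates, key=lambda cid: association_score(words, weights, *candidates[cid]))
```

With the target ["china", "tower", "building", "a"], an identical candidate scores above a candidate ending in "b", which in turn scores far above an unrelated candidate, and `pick_associated` returns the identical one. In a trained model the kernels and perceptron weights would be learned from labelled positive and negative pairs.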
In some embodiments, the semantic association model is obtained using the following units:
a positive sample acquisition unit, configured to acquire positive sample data, where the positive sample data includes entity text information in a same-cluster relationship and/or entity text information in a parent-child relationship;
a negative sample acquisition unit, configured to acquire negative sample data, where the negative sample data includes entity text information with incorrect address information and/or pairs of distinct entity text information whose mutual correlation satisfies a preset condition;
a multi-dimensional text analysis unit, configured to perform multi-dimensional text analysis on the positive sample data and the negative sample data respectively, to obtain multi-dimensional text information of the positive sample data and of the negative sample data, where the multi-dimensional text information includes word information and word weight information;
a semantic association training unit, configured to input the word information and word weight information of the positive sample data and of the negative sample data into a first deep learning model for semantic association training, to obtain the semantic association model.
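As a sketch of how the positive and negative sample data might be assembled, the Python code below labels entity pairs under stated assumptions: the `Entity` fields, the use of word-level Jaccard similarity as the "correlation" in the preset condition, and the 0.5 threshold are hypothetical. Pairs in the same cluster or in a parent-child relationship are positives; distinct entities with highly similar names are hard negatives; and each entity paired with another entity's address supplies a wrong-address negative.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple

Sample = Tuple[Tuple[str, str], Tuple[str, str], int]  # ((name, address), (name, address), label)


class Entity(NamedTuple):
    # Hypothetical library record; the field names are illustrative only.
    eid: str
    name: str
    cluster: str            # same cluster = same real-world place under different names
    parent: Optional[str]   # eid of the parent entity, e.g. the mansion for one of its buildings
    address: str


def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_samples(entities: List[Entity], hard_negative_threshold: float = 0.5) -> List[Sample]:
    samples: List[Sample] = []
    for a, b in combinations(entities, 2):
        pa, pb = (a.name, a.address), (b.name, b.address)
        if a.cluster == b.cluster or a.parent == b.eid or b.parent == a.eid:
            samples.append((pa, pb, 1))   # positive: same-cluster or parent-child relationship
        elif jaccard(a.name, b.name) >= hard_negative_threshold:
            samples.append((pa, pb, 0))   # negative: distinct entities with highly similar text
    addresses = [e.address for e in entities]
    for i, e in enumerate(entities):
        # Negative: the same name paired with another entity's (wrong) address.
        wrong = next((x for x in addresses[i + 1:] + addresses[:i] if x != e.address), None)
        if wrong is not None:
            samples.append(((e.name, e.address), (e.name, wrong), 0))
    return samples
```

Hard negatives such as "China Tower Building A" versus "China Tower Building B" are the informative ones: they share most words, so the model has to learn that the single differing word decides the match.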
In some embodiments, the word information includes at least one of the following: original word information, pinyin information, synonym information, and error-corrected word information.
When the word information includes original word information, pinyin information, synonym information and error-corrected word information, the multi-dimensional text analysis module 1010 includes:
a word segmentation unit, configured to perform word segmentation on the target entity text information to obtain the original word information of the target entity text information;
a pinyin information determining unit, configured to take the pinyin of the original word information as the pinyin information of the target entity text information;
a synonym conversion unit, configured to perform synonym conversion on the original word information to obtain the synonym information of the target entity text information;
an error correction unit, configured to perform error correction on the original word information to obtain the error-corrected word information of the target entity text information;
a word weight determining unit, configured to determine, based on a word weight recognition model, the word weights of the words in the original word information, pinyin information, synonym information and error-corrected word information, where the word weight recognition model is a model trained on word information annotated with word weights.
The word weight of a word characterizes how indispensable that word is within the entity text information.
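The units above can be sketched with toy dictionaries. In the Python below, segmentation uses forward maximum matching (a classic dictionary-based method swapped in for illustration), while the vocabulary, pinyin table, synonym table, typo table and word weights are all hypothetical stand-ins for a real segmenter, pinyin converter, synonym and error-correction resources, and the trained word weight recognition model.

```python
from typing import Dict, List

# Toy resources (hypothetical): a real system would use a segmenter, a pinyin
# library, synonym and confusion dictionaries, and a trained word weight model.
VOCAB = {"中国", "大厦", "大夏", "A座"}
PINYIN = {"中国": "zhongguo", "大厦": "dasha", "大夏": "daxia", "A座": "azuo"}
SYNONYMS = {"大厦": "大楼", "A座": "A栋"}
CORRECTIONS = {"大夏": "大厦"}  # common typo -> intended word
WEIGHTS = {"中国": 0.3, "大厦": 0.6, "大夏": 0.6, "大楼": 0.6, "A座": 0.9, "A栋": 0.9}


def segment(text: str) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary word."""
    max_len = max(len(w) for w in VOCAB)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:  # single characters are the fallback
                words.append(piece)
                i += size
                break
    return words


def word_weight(word: str) -> float:
    # Stand-in for the word weight recognition model: how indispensable the word is.
    return WEIGHTS.get(word, 0.5)


def analyze(text: str) -> Dict[str, object]:
    original = segment(text)
    synonyms = [SYNONYMS.get(w, w) for w in original]
    corrected = [CORRECTIONS.get(w, w) for w in original]
    return {
        "original": original,
        "pinyin": [PINYIN.get(w, w) for w in original],
        "synonym": synonyms,
        "corrected": corrected,
        # Weights for every word form produced above.
        "weights": {w: word_weight(w) for w in original + synonyms + corrected},
    }
```

For the mistyped input 中国大夏A座, the original words are 中国 / 大夏 / A座, the error-corrected words replace 大夏 with 大厦, and A座, the word that distinguishes one building from another, carries the highest weight.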
In some embodiments, the candidate entity text information determining module 1020 includes:
a text relevance determining unit, configured to determine the text relevance between the target entity text information and the entity text information in the preset entity library, based on the word information of the target entity text information and the word information of the entity text information in the preset entity library;
a first candidate entity text information determining unit, configured to determine, based on the text relevance, the candidate entity text information of the target entity text information from the preset entity library.
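One simple way to realise the text relevance computation is an inverted index plus a weighted overlap score. The Python sketch below assumes each entity's word information is a word-to-weight dictionary; the scoring rule (matched weight, capped by the smaller of the two weights, divided by the target's total weight) and `top_k` are illustrative choices rather than the formula used in this specification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WordWeights = Dict[str, float]  # word -> word weight


def build_index(library: Dict[str, WordWeights]) -> Dict[str, List[str]]:
    # Inverted index: word -> ids of library entities containing that word.
    index: Dict[str, List[str]] = defaultdict(list)
    for eid, words in library.items():
        for w in words:
            index[w].append(eid)
    return index


def text_relevance(target: WordWeights, entity: WordWeights) -> float:
    # Fraction of the target's word weight matched by the entity, so matching an
    # indispensable word counts more than matching a filler word.
    total = sum(target.values())
    shared = sum(min(target[w], entity[w]) for w in target if w in entity)
    return shared / total if total else 0.0


def recall_by_text(target: WordWeights, library: Dict[str, WordWeights],
                   index: Dict[str, List[str]], top_k: int = 3) -> List[Tuple[str, float]]:
    # Only entities sharing at least one word with the target are scored.
    hits = {eid for w in target for eid in index.get(w, [])}
    scored = [(eid, text_relevance(target, library[eid])) for eid in hits]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Because only entities reached through the index are scored, recall stays cheap even for a large entity library.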
In some embodiments, the candidate entity text information determining module 1020 includes:
a semantic relevance determining unit, configured to determine the semantic relevance between the word information of the target entity text information and the word information of the entity text information in the preset entity library;
a second candidate entity text information determining unit, configured to determine, based on the semantic relevance, the candidate entity text information of the target entity text information from the preset entity library.
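Semantic relevance can catch candidates that share no literal word with the target, such as "mansion" versus "tower". The sketch below assumes tiny hand-made 3-dimensional word embeddings, represents each text as the word-weighted average of its word vectors, and keeps candidates whose cosine similarity reaches a hypothetical threshold of 0.8.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical 3-d word embeddings; a real system would use trained vectors.
EMBEDDINGS: Dict[str, Vector] = {
    "mansion": (0.9, 0.1, 0.0),
    "tower": (0.85, 0.15, 0.05),
    "plaza": (0.2, 0.9, 0.1),
    "china": (0.1, 0.1, 0.9),
}


def text_vector(weights: Dict[str, float]) -> List[float]:
    # Word-weighted average of word vectors; words without an embedding are skipped.
    vec, total = [0.0, 0.0, 0.0], 0.0
    for word, w in weights.items():
        if word in EMBEDDINGS:
            vec = [v + w * e for v, e in zip(vec, EMBEDDINGS[word])]
            total += w
    return [v / total for v in vec] if total else vec


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def recall_by_semantics(target: Dict[str, float], library: Dict[str, Dict[str, float]],
                        top_k: int = 2, min_sim: float = 0.8) -> List[Tuple[str, float]]:
    tv = text_vector(target)
    scored = [(eid, cosine(tv, text_vector(ws))) for eid, ws in library.items()]
    kept = [pair for pair in scored if pair[1] >= min_sim]
    kept.sort(key=lambda pair: -pair[1])
    return kept[:top_k]
```

With these vectors, the target {china, mansion} keeps the candidate {china, tower} (cosine of about 0.997) and drops {china, plaza} (roughly 0.52), even though "mansion" never appears in the first candidate.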
In some embodiments, the preset entity library may further include address information of entity text information, and the candidate entity text information determining module 1020 further includes:
an address information acquisition unit, configured to acquire the address information of the target entity text information;
a correlation determining unit, configured to determine the correlation between the address information of the target entity text information and the address information of the candidate entity text information;
a third candidate entity text information determining unit, configured to determine, based on the address information correlation, first target candidate entity text information of the target entity text information from the candidate entity text information.
Correspondingly, the semantic association module 1030 is specifically configured to input the word information and word weight information of the target entity text information and of the first target candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
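The address correlation step narrows the candidates before the more expensive semantic association. The sketch below assumes addresses have already been parsed into city, district, road and number components; the level weights, the rule that a mismatch stops the comparison at finer levels, and the 0.65 threshold are hypothetical.

```python
from typing import Dict, List

Address = Dict[str, str]  # e.g. {"city": "Beijing", "district": "...", "road": "...", "number": "..."}

# Hypothetical level weights, coarse to fine: disagreement at the city level
# rules a candidate out, disagreement at the house-number level barely matters.
LEVEL_WEIGHTS = [("city", 0.4), ("district", 0.3), ("road", 0.2), ("number", 0.1)]


def address_correlation(a: Address, b: Address) -> float:
    score = 0.0
    for level, weight in LEVEL_WEIGHTS:
        if level not in a or level not in b:
            continue  # a missing component neither adds nor subtracts
        if a[level] != b[level]:
            break  # finer components under a mismatched level are not comparable
        score += weight
    return score


def filter_by_address(target: Address, candidates: Dict[str, Address],
                      threshold: float = 0.65) -> List[str]:
    """Keep the candidates whose address correlation reaches the threshold."""
    return [cid for cid, addr in candidates.items()
            if address_correlation(target, addr) >= threshold]
```

The surviving ids play the role of the first target candidate entity text information that is passed on to semantic association.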
In some embodiments, the preset entity library may further include ranking information of entity text information, and the candidate entity text information determining module 1020 further includes:
a ranking information acquisition unit, configured to acquire the ranking information of the first target candidate entity text information;
a fourth candidate entity text information determining unit, configured to determine, based on the ranking information, second target candidate entity text information of the target entity text information from the first target candidate entity text information.
Correspondingly, the semantic association module 1030 is specifically configured to input the word information and word weight information of the target entity text information and of the second target candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
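The ranking step then keeps only the most prominent first target candidates. In this sketch the ranking value is assumed to be a popularity figure such as a visit count (the specification does not say what the ranking information contains), and `keep` is a hypothetical cut-off.

```python
import heapq
from typing import Dict, List


def top_by_ranking(candidates: List[str], ranking: Dict[str, float], keep: int = 2) -> List[str]:
    """Keep the `keep` candidates with the highest ranking value.

    Candidates missing from `ranking` rank lowest; ties keep their input order,
    because heapq.nlargest behaves like a stable sort in descending order.
    """
    return heapq.nlargest(keep, candidates, key=lambda cid: ranking.get(cid, 0.0))
```

With ranking values 120, 950 and 40 for c1, c4 and c7, the two highest-ranked candidates, c4 and c1, are kept as second target candidates.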
In some embodiments, the multi-dimensional text information may further include at least one of the following: word role information, word hierarchy information, and word function information.
When the multi-dimensional text information includes word role information, word hierarchy information and word function information, the multi-dimensional text analysis module 1010 may include:
a word role information determining unit, configured to determine the word role information of the target entity text information based on the parts of speech of the words in the target entity text information;
a word hierarchy information determining unit, configured to determine the word hierarchy information of the target entity text information based on the structural relationships between the words in the target entity text information;
a function analysis unit, configured to perform function analysis on the words in the target entity text information to obtain the word function information of the target entity text information.
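The following Python sketch illustrates the three extra dimensions with toy lexicons in place of a part-of-speech tagger and structure parser. The role labels, the function labels and the rule that each sub-unit word opens a deeper level are assumptions for illustration only.

```python
from typing import Dict, List

# Toy lexicons standing in for a part-of-speech tagger (hypothetical).
REGION_WORDS = {"china", "beijing"}
TYPE_WORDS = {"mansion", "tower", "plaza"}
SUBUNIT_WORDS = {"building", "floor", "unit"}
FUNCTION_OF_ROLE = {"region": "locating", "name": "identifying",
                    "type": "categorizing", "subunit": "locating"}


def word_role(word: str) -> str:
    """Word role from a toy part-of-speech lookup; unknown words are treated as proper names."""
    w = word.lower()
    if w in REGION_WORDS:
        return "region"
    if w in TYPE_WORDS:
        return "type"
    if w in SUBUNIT_WORDS:
        return "subunit"
    return "name"


def analyze_structure(words: List[str]) -> List[Dict[str, object]]:
    """Attach role, hierarchy level and function to each word."""
    result: List[Dict[str, object]] = []
    level = 1
    for word in words:
        role = word_role(word)
        if role == "subunit":
            level += 1  # "Building A" sits one level below the mansion, "Floor 3" below the building
        result.append({"word": word, "role": role, "level": level,
                       "function": FUNCTION_OF_ROLE[role]})
    return result
```

For China / xxxxx / Mansion / Building / A, the first three words sit at level 1 (the mansion) and "Building A" at level 2, which lets a model distinguish a match on the building from a match on the whole mansion.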
In some embodiments, the preset entity library may further include word role information, word hierarchy information and word function information of entity text information.
The semantic association module 1030 may then be specifically configured to input the word information, word weight information, word role information, word hierarchy information and word function information of the target entity text information and of the candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
In some embodiments, the apparatus further includes:
a matching verification module, configured to perform matching verification on the address information of the target entity text information and of the associated entity text information.
Correspondingly, the linked entity text information determining module 1040 is further configured to take the associated entity text information as the linked entity text information of the target entity text information when the address information of the associated entity text information matches the address information of the target entity text information.
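In the logistics scenario of Fig. 9, the entity library's spatial index is used for this verification. A minimal stand-in, assuming the target address has already been geocoded to latitude and longitude (geocoding is not shown) and using a hypothetical 500 m tolerance, compares the two coordinates by great-circle (haversine) distance.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))  # clamp guards rounding


def addresses_match(target_coord: LatLon, entity_coord: LatLon,
                    max_distance_m: float = 500.0) -> bool:
    """Matching verification: accept the associated entity if it lies close enough to the target."""
    return haversine_m(target_coord, entity_coord) <= max_distance_m
```

One thousandth of a degree of latitude is roughly 111 m, so such a pair passes, while points 0.1 degree apart (about 11 km) fail. A production system would query the spatial index for nearby entities rather than compare one pair at a time.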
The apparatus embodiments and the method embodiments are based on the same inventive concept.
An embodiment of the present application provides a target entity linking device. The device includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the target entity linking method provided in the foregoing method embodiments.
The memory may be used to store software programs and modules, and the processor performs various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system, application programs required by functions, and the like; the data storage area may store data created through use of the device, and the like. In addition, the memory may include high-speed random access memory, and may further include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory may further include a memory controller to provide the processor with access to the memory.
The method embodiments provided by the embodiments of the present application may be executed in a mobile terminal, a computer terminal, a server, or a similar computing apparatus. Taking execution on a server as an example, Fig. 11 is a hardware block diagram of a server running the target entity linking method provided by an embodiment of the present application. As shown in Fig. 11, the server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1110 (the processor 1110 may include, but is not limited to, a processing apparatus such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA), a memory 1130 for storing data, and one or more storage media 1120 (for example, one or more mass storage devices) storing application programs 1123 or data 1122. The memory 1130 and the storage medium 1120 may provide transient or persistent storage. A program stored in the storage medium 1120 may include one or more modules, and each module may include a series of instruction operations for the server. Further, the central processing unit 1110 may be configured to communicate with the storage medium 1120 and to execute, on the server 1100, the series of instruction operations in the storage medium 1120. The server 1100 may further include one or more power supplies 1160, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1140, and/or one or more operating systems 1121, such as Windows Server™, Mac OS X™, Unix™, Linux™ or FreeBSD™.
The input/output interface 1140 may be used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the server 1100. In one example, the input/output interface 1140 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, the input/output interface 1140 may be a radio frequency (RF) module used to communicate with the Internet wirelessly.
A person of ordinary skill in the art will understand that the structure shown in Fig. 11 is merely illustrative and does not limit the structure of the above electronic apparatus. For example, the server 1100 may include more or fewer components than shown in Fig. 11, or have a configuration different from that shown in Fig. 11.
An embodiment of the present application further provides a storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set or an instruction set for implementing the target entity linking method of the method embodiments. The at least one instruction, at least one program, code set or instruction set is loaded and executed by a processor to implement the target entity linking method provided in the foregoing method embodiments.
Optionally, in this embodiment, the storage medium may be located on at least one of multiple network servers in a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
As can be seen from the above embodiments of the target entity linking method, apparatus, device and storage medium provided by the present application, the present application performs multi-dimensional text analysis on the target entity text information to obtain multi-dimensional text information that characterizes it from several dimensions; selects candidate entity text information from the preset entity library based on the word information in the multi-dimensional text information; and inputs the word information and word weight information of the target entity text information and of the candidate entity text information into the semantic association model for semantic association. Incorporating word weights during semantic association takes into account both the strength of association between entity texts and the importance of each feature within each entity text, so the associated entity text information of the target entity text information can be determined accurately. Matching verification of the address information of the target entity text information and of the associated entity text information further ensures the accuracy of the linked entity text information, so that the target entity text information is successfully linked to the preset entity library. The technical solution provided by the embodiments of this specification thus substantially improves the ability to characterize entity text information and the accuracy of the linked entity text information, on the basis of which entity linking of the target entity can be achieved.
It should be noted that the order of the above embodiments of the present application is for description only and does not indicate the relative merits of the embodiments. Specific embodiments of this specification have been described above; other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus, device and storage medium embodiments are substantially similar to the method embodiments and are therefore described briefly; for related details, refer to the description of the method embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (14)

1. A target entity linking method, characterized in that the method comprises:
performing multi-dimensional text analysis on target entity text information to obtain multi-dimensional text information, the multi-dimensional text information comprising word information and word weight information;
determining, based on the word information, candidate entity text information of the target entity text information from a preset entity library, the preset entity library comprising word information and word weight information of entity text information;
inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information; and
taking the associated entity text information as linked entity text information of the target entity text information.
2. The method according to claim 1, wherein the inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information comprises:
in an association matrix construction layer of the semantic association model, constructing an association matrix based on the word information of the target entity text information and of the candidate entity text information;
in a convolutional layer of the semantic association model, determining feature vectors of the target entity text information and of the candidate entity text information based on the association matrix;
in a pooling layer of the semantic association model, performing dimensionality reduction on the feature vectors of the target entity text information and of the candidate entity text information;
in a multilayer perceptron of the semantic association model, performing similarity fitting on the dimensionality-reduced feature vector and word weights of the target entity text information and the feature vector and word weights of the candidate entity text information, to obtain an association score characterizing the degree of association between the target entity text information and the candidate entity text information; and
determining, based on the association score, the associated entity text information of the target entity text information from the candidate entity text information.
3. The method according to claim 1, wherein the semantic association model is obtained in the following manner:
acquiring positive sample data, the positive sample data comprising entity text information in a same-cluster relationship and/or entity text information in a parent-child relationship;
acquiring negative sample data, the negative sample data comprising entity text information with incorrect address information and/or pairs of distinct entity text information whose mutual correlation satisfies a preset condition;
performing multi-dimensional text analysis on the positive sample data and the negative sample data respectively, to obtain multi-dimensional text information of the positive sample data and of the negative sample data, the multi-dimensional text information comprising word information and word weight information; and
inputting the word information and word weight information of the positive sample data and the word information and word weight information of the negative sample data into a first deep learning model for semantic association training, to obtain the semantic association model.
4. The method according to claim 1, wherein the word information comprises at least one of the following: original word information, pinyin information, synonym information, and error-corrected word information;
when the word information comprises original word information, pinyin information, synonym information and error-corrected word information, the performing multi-dimensional text analysis on target entity text information to obtain multi-dimensional text information comprises:
performing word segmentation on the target entity text information to obtain the original word information of the target entity text information;
taking the pinyin of the original word information as the pinyin information of the target entity text information;
performing synonym conversion on the original word information to obtain the synonym information of the target entity text information;
performing error correction on the original word information to obtain the error-corrected word information of the target entity text information; and
determining, based on a word weight recognition model, word weights of the words in the original word information, pinyin information, synonym information and error-corrected word information, the word weight recognition model comprising a model trained on word information annotated with word weights.
5. The method according to claim 1, wherein the determining, based on the word information, candidate entity text information of the target entity text information from a preset entity library comprises:
determining text relevance between the target entity text information and entity text information in the preset entity library, based on the word information of the target entity text information and the word information of the entity text information in the preset entity library; and
determining, based on the text relevance, the candidate entity text information of the target entity text information from the preset entity library.
6. The method according to claim 1, wherein the determining, based on the word information, candidate entity text information of the target entity text information from a preset entity library comprises:
determining semantic relevance between the word information of the target entity text information and the word information of entity text information in the preset entity library; and
determining, based on the semantic relevance, the candidate entity text information of the target entity text information from the preset entity library.
7. The method according to claim 1, wherein the preset entity library further comprises address information of entity text information, and the method further comprises:
acquiring address information of the target entity text information;
determining correlation between the address information of the target entity text information and the address information of the candidate entity text information;
determining, based on the address information correlation, first target candidate entity text information of the target entity text information from the candidate entity text information;
correspondingly, the inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information comprises:
inputting the word information and word weight information of the target entity text information and of the first target candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
8. The method according to claim 7, wherein the preset entity library further comprises ranking information of entity text information, and the method further comprises:
acquiring ranking information of the first target candidate entity text information;
determining, based on the ranking information, second target candidate entity text information of the target entity text information from the first target candidate entity text information;
correspondingly, the inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information comprises:
inputting the word information and word weight information of the target entity text information and of the second target candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
9. The method according to claim 1, wherein the multi-dimensional text information further comprises at least one of the following: word role information, word hierarchy information and word function information;
when the multi-dimensional text information comprises word role information, word hierarchy information and word function information, the performing multi-dimensional text analysis on target entity text information to obtain multi-dimensional text information comprises:
determining the word role information of the target entity text information based on the parts of speech of the words in the target entity text information;
determining the word hierarchy information of the target entity text information based on the structural relationships between the words in the target entity text information; and
performing function analysis on the words in the target entity text information to obtain the word function information of the target entity text information.
10. The method according to claim 9, wherein the preset entity library further comprises word role information, word hierarchy information and word function information of entity text information;
the inputting the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic association model for semantic association, to obtain associated entity text information of the target entity text information comprises:
inputting the word information, word weight information, word role information, word hierarchy information and word function information of the target entity text information and of the candidate entity text information into the semantic association model for semantic association, to obtain the associated entity text information of the target entity text information.
11. The method according to claim 1, wherein before the taking the associated entity text information as linked entity text information of the target entity text information, the method further comprises:
performing matching verification on the address information of the target entity text information and of the associated entity text information; and
when the address information of the associated entity text information matches the address information of the target entity text information, performing the step of taking the associated entity text information as the linked entity text information of the target entity text information.
12. A target entity linking apparatus, wherein the apparatus comprises:
a multi-dimensional text analysis processing module, configured to perform multi-dimensional text analysis processing on target entity text information to obtain multi-dimensional text information, the multi-dimensional text information comprising word information and word weight information;
a candidate entity text information determining module, configured to determine, based on the word information, candidate entity text information of the target entity text information from a preset entity library, the preset entity library comprising word information and word weight information of entity text information;
a semantic association module, configured to input the word information and word weight information of the target entity text information and the word information and word weight information of the candidate entity text information into a semantic relationship model for semantic association, to obtain associated entity text information of the target entity text information;
a link entity text information determining module, configured to take the associated entity text information as link entity text information of the target entity text information.
13. A target entity linking device, wherein the device comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the target entity linking method according to any one of claims 1 to 11.
14. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the target entity linking method according to any one of claims 1 to 11.
CN201910388403.0A 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium Active CN110147421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388403.0A CN110147421B (en) 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910388403.0A CN110147421B (en) 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110147421A true CN110147421A (en) 2019-08-20
CN110147421B CN110147421B (en) 2022-06-21

Family

ID=67595091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388403.0A Active CN110147421B (en) 2019-05-10 2019-05-10 Target entity linking method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110147421B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482876A (en) * 2008-12-11 2009-07-15 南京大学 Weight-based link multi-attribute entity recognition method
US20140344274A1 (en) * 2013-05-20 2014-11-20 Hitachi, Ltd. Information structuring system
CN104361115A (en) * 2014-12-01 2015-02-18 北京奇虎科技有限公司 Entry weight definition method and device based on co-clicking
CN106156196A (en) * 2015-04-22 2016-11-23 富士通株式会社 Apparatus and method for extracting text features
US20170083484A1 (en) * 2015-09-21 2017-03-23 Tata Consultancy Services Limited Tagging text snippets
CN106649853A (en) * 2016-12-30 2017-05-10 儒安科技有限公司 Short text clustering method based on deep learning
CN107480125A (en) * 2017-07-05 2017-12-15 重庆邮电大学 A relation linking method based on a knowledge graph
CN108509517A (en) * 2018-03-09 2018-09-07 东南大学 A streaming topic evolution tracking method for real-time news content
CN109241294A (en) * 2018-08-29 2019-01-18 国信优易数据有限公司 An entity linking method and device

Non-Patent Citations (5)

Title
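VEYSEL YÜCESOY et al.: "Effect of cooccurance weighting to English word embeddings", SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 29 June 2017 (2017-06-29), pages 1 - 4 *
朱颢东: "Research on Several Core Technologies in Text Mining" (文本挖掘中若干核心技术研究), 北京理工大学出版社, 31 March 2017, pages 17 - 18 *
程童凌: "Entity Relation Discovery and Semantic Annotation Based on Wiki-like Encyclopedia Knowledge Resources" (基于维基类百科知识资源的实体关系发现和语义标注), 中国优秀博硕士学位论文全文数据库(硕士)信息科技辑 *
程童凌: "Entity Relation Discovery and Semantic Annotation Based on Wiki-like Encyclopedia Knowledge Resources" (基于维基类百科知识资源的实体关系发现和语义标注), 中国优秀博硕士学位论文全文数据库(硕士)信息科技辑, 15 March 2016 (2016-03-15), pages 138 - 7770 *
马芳 et al.: "Research on Multi-label Classification of Chinese Sci-tech Journal Papers" (中文科技期刊论文多标签分类研究), 图书情报导刊, 25 February 2019 (2019-02-25), pages 26 - 32 *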

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN110929035A (en) * 2019-11-27 2020-03-27 中国传媒大学 Information prediction method and system for film and television works
CN110929035B (en) * 2019-11-27 2022-09-30 中国传媒大学 Information prediction method and system for film and television works
CN111523326A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Entity linking method, device, equipment and storage medium
CN111523326B (en) * 2020-04-23 2023-03-17 北京百度网讯科技有限公司 Entity linking method, device, equipment and storage medium
US11704492B2 (en) 2020-04-23 2023-07-18 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, electronic device, and storage medium for entity linking by determining a linking probability based on splicing of embedding vectors of a target and a reference text
CN111737430A (en) * 2020-06-16 2020-10-02 北京百度网讯科技有限公司 Entity linking method, device, equipment and storage medium
CN111737430B (en) * 2020-06-16 2024-04-05 北京百度网讯科技有限公司 Entity linking method, device, equipment and storage medium
CN112115235A (en) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 Entity attribute data query and configuration method, device and server
WO2022194086A1 (en) * 2021-03-16 2022-09-22 International Business Machines Corporation A neuro-symbolic approach for entity linking
WO2023098658A1 (en) * 2022-08-02 2023-06-08 深圳市城市公共安全技术研究院有限公司 Text cohesion determination method and apparatus, and electronic device and storage medium
CN117014382A (en) * 2023-10-07 2023-11-07 北京中科网芯科技有限公司 Data stream processing system and method based on convergence and distribution equipment
CN117014382B (en) * 2023-10-07 2023-12-29 北京中科网芯科技有限公司 Data stream processing system and method based on convergence and distribution equipment

Also Published As

Publication number Publication date
CN110147421B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN110147421A (en) Target entity linking method, device, equipment and storage medium
CN110147551B (en) Multi-category entity recognition model training, entity recognition method, server and terminal
Wang et al. SMOTETomek-based resampling for personality recognition
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
Karatzoglou et al. A Seq2Seq learning approach for modeling semantic trajectories and predicting the next location
CN110309514A (en) Semantic recognition method and device
CN105701120B (en) Method and apparatus for determining semantic matching degree
CN110457675A (en) Prediction model training method, device, storage medium and computer equipment
CN108509463A (en) Question answering method and device
CN109408622A (en) Sentence processing method, device, equipment and storage medium
CN110781204B (en) Method, device, equipment and storage medium for determining identification information of a target object
Ziyadi et al. Example-based named entity recognition
CN108304373A (en) Semantic dictionary construction method, device, storage medium and electronic device
CN111222847B (en) Open source community developer recommendation method based on deep learning and unsupervised clustering
CN108986907A (en) Automatic telemedicine triage method based on the KNN algorithm
CN110032650B (en) Training sample data generation method and device and electronic equipment
CN110489521A (en) Text categories detection method, device, electronic equipment and computer-readable medium
CN110377739A (en) Text sentiment classification method, readable storage medium and electronic equipment
CN107644051A (en) System and method for grouping similar entities
CN110874528A (en) Text similarity obtaining method and device
CN113297351A (en) Text data labeling method and device, electronic equipment and storage medium
CN112380421A (en) Resume searching method and device, electronic equipment and computer storage medium
CN115409111A (en) Training method of named entity recognition model and named entity recognition method
CN113869065B (en) Emotion classification method and system based on 'word-phrase' attention mechanism
Jaiswal et al. Genetic approach based bug triage for sequencing the instance and features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant