CN113220899A - Intellectual property identity identification method based on academic talent information intellectual map - Google Patents
- Publication number
- CN113220899A
- Authority
- CN
- China
- Prior art keywords
- information
- intellectual property
- entity
- entities
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
Abstract
The invention discloses an intellectual property identity identification method based on an academic talent information knowledge graph. The method targets intellectual property information from across the web, such as invention patents, papers and software copyrights, identifies the authors of that information, and thereby completes the construction of an academic talent knowledge base for the big data industry. The invention helps to discover high-quality talent early and to build an effective reserve of it.
Description
Technical Field
The invention relates to an intellectual property identity identification method based on an academic talent information knowledge graph. The method mainly targets intellectual property information from across the web, such as invention patents, papers and software copyrights, and identifies the authors of that information. It thereby completes the construction of an academic talent knowledge base for the big data industry, which helps to discover high-quality talent early and to build an effective reserve of it.
Background
Traditional entity linking methods comprise three processes: mention identification, candidate entity generation and candidate entity ranking. Mention identification mostly relies on entity recognition technology. Candidate entity generation generally extracts information from a knowledge base and builds a dictionary associated with each entity; candidates are then generated by simple string matching against that dictionary, which can produce a large number of candidate entities. The knowledge base is usually Wikipedia, so the data is quite limited, and pure dictionary matching yields too many candidate entities, which wastes resources, adds distractors and lowers accuracy.
Mainstream candidate entity ranking methods are based on similarity comparison: the context similarity between the entity mention extracted from the text and each candidate entity retrieved from the knowledge base is computed, and the candidate with the highest similarity is selected as the link target. Most similarity computations use machine learning methods based on manually defined rules. For example, some add context features (keywords) and page-structure features (such as page redirects and anchor text) of the candidate entity's Wikipedia page; some use entity popularity to assist disambiguation; others consider the category relatedness between entities (link relationships, co-occurrence probability and so on). Methods based on manually defined rules are severely limited and cannot capture all rule information. In addition, the context information of candidate entities taken from Wikipedia or other encyclopedias is incomplete and disorganized, which hinders accurate identification.
Entity linking methods based on deep learning are also becoming more popular. Unlike traditional methods, they do not require manually defined features. Examples include learning document representations of entities with a deep neural network (DNN) and obtaining category representations with a CNN; embedding the mention, the entity and the context, extracting features with a CNN, and then computing the similarity between mention and entity for linking; or using the BERT pre-trained language model to analyze the context of the entity mention and the related information of the candidate entities, improving entity linking through better semantic analysis. However, deep learning methods require large amounts of data and place high demands on machine performance, especially in big data scenarios. They also depend entirely on the corpus, so a method that relies on deep learning alone may perform worse when the corpus is biased.
Disclosure of Invention
The invention aims to provide an intellectual property identity identification method based on an academic talent information knowledge graph.
To solve the above technical problems, the invention adopts the following technical scheme: an intellectual property identity identification method based on an academic talent information knowledge graph, comprising the following steps:
(1) a crawler acquires talent information data comprising names, resumes and intellectual property information, and a talent information knowledge graph is built on neo4j from this information; the knowledge graph is stored as triples E = <sub, rela, obj> and comprises the attribute information of the entities and the relationships among the entities;
(2) mention identification: for the intellectual property information M to be linked, the structured formal feature information and the text information of the intellectual property in M are obtained directly using regular-expression rules, where M = (M_1, M_2, …, M_n) and each M_i is an obtained mention;
(3) candidate entity generation: the intellectual property information corresponding to the entities in the knowledge graph is divided into 4 subject categories (liberal arts, science, agriculture and medicine), and a subject classification model based on word2vec and TextCNN is built to determine the category;
the process of candidate entity generation is as follows:
first, for the intellectual property information M to be linked, a fuzzy query on the knowledge graph is performed according to the mention to obtain the set of possible entities D = (D_1, D_2, …, D_n);
second, the category of the intellectual property information M is obtained with the trained subject classification model: M_type = TextCNN(M);
finally, each entity in the set D is input into the trained subject classification model to obtain its category, and the final candidate entities are H = (H_1, H_2, …, H_k), with {H_i ∈ D, H_i,type = M_type, i = 1, 2, …, k}, where k is the number of candidate entities remaining after category filtering;
(4) candidate entity ranking: entities are ranked using two kinds of features, the formal feature F_formal and the semantic feature F_semantic; for each candidate entity H_i, all of its information can be obtained from the knowledge graph: in addition to the structured formal feature information (G_i1, G_i2, …, G_in) obtained directly from the graph, it includes the semantic feature G_i,n+1 based on the intellectual property content, so the full information of candidate entity H_i is G_i = (G_i1, G_i2, …, G_in, G_i,n+1);
the specific candidate entity ranking process is as follows:
the weight of each feature is determined with the AHP (analytic hierarchy process) method as W_i = (W_i1, W_i2, …, W_in, W_i,n+1);
formal feature F_formal: for each candidate entity H_i, all information is obtained from the knowledge graph, the text content of the intellectual property information is removed, and only the structured formal feature information G_i' = (G_i1, G_i2, …, G_in) is kept; for the intellectual property information M to be linked, the formal feature is determined by computing the matching degree S_k of M and G_i' [formula not reproduced in the source text], where M_j is the structured formal feature information of the intellectual property information M to be linked, i.e. F_formal = S_k;
semantic feature F_semantic: the value of the semantic feature F_semantic between the intellectual property information M to be linked and the candidate entity H_i is measured by similarity, which is computed with a character-based CBA (char + bi_lstm + attention) network; specifically, the semantic feature G_i,n+1 of the candidate entity H_i and the text information M_n of the intellectual property information M to be linked are each passed through a bi_lstm + attention layer, a softmax layer then outputs the similarity probabilities [y1, y2], and the similarity probability y1 is taken as the value of F_semantic;
ranking: the final similarity between the candidate entity H_i and the intellectual property information M to be linked is F_i = F_semantic + F_formal, and the entity finally linked is the one corresponding to max(F_i).
Preferably, the intellectual property information includes information on articles and patents; the attribute information of an entity comprises a person's basic information, the institution they graduated from, major, employer and intellectual property information; the relationships among the entities comprise collaboration relationships and alumni relationships.
Preferably, the structured formal feature information includes author information, organization information, collaborator information and journal information.
Preferably, in step (3), the construction of the word2vec-based TextCNN subject classification model includes the following specific steps:
first, text data of intellectual property is collected and labeled, the labels being the 4 categories liberal arts (0), science (1), agriculture (2) and medicine (3);
second, the text is segmented into words and stop words are removed;
third, a word2vec model is trained and used for embedding to obtain n×k-dimensional vectors, and the class labels are converted to one-hot encoding;
finally, the embedded vectors and the labels are input into a TextCNN network for training to obtain the subject classification model, whose structure comprises convolution and pooling layers, a concatenation layer, a flatten layer, a dropout layer, a fully connected layer and a softmax layer.
Preferably, in the step (4), the AHP solving process first constructs an application example through an expert system, obtains an importance judgment matrix of each feature information through example analysis, and then calculates the weight by using SPSSAU software.
Preferably, the character-based CBA network similarity computation in step (4) includes the following specific steps:
first, data is collected and labeled in the form (s1, s2, label), where label is 0 if s1 is similar to s2 and 1 otherwise;
second, character-level word2vec vector representations are obtained;
third, the data is input into the CBA network for training to obtain a trained CBA model;
finally, the candidate entity H_i to be identified and the intellectual property information M to be linked are input into the trained CBA model, and the output [y1, y2] of the softmax layer is obtained, where y1 is the similarity.
The invention has the beneficial effects that:
1. Traditional entity linking relies directly on features extracted from entity context text such as Wikipedia, which cannot effectively express the degree of association between a mention and a candidate entity.
2. Traditional candidate entity generation matches directly without any screening, which increases the workload and the number of distractors; it also ignores the fact that, when semantic information is matched directly, most prediction errors are caused by type errors.
3. Weighing the strengths and weaknesses of similarity based on traditional manually defined rules and similarity based on deep learning, the invention integrates knowledge-graph-based formal features and intellectual-property-based semantic features in the candidate entity ranking process, and ranks candidate entities by similarity using the statistics-based AHP method.
4. In computing the intellectual-property-based semantic feature, traditional word2vec-based cosine similarity has low accuracy; an improved Siamese network, CBA, is therefore adopted, which is character-based, adds an attention mechanism, and takes the similarity probability output by the network as the similarity value, improving accuracy.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of a TextCNN network structure according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a CBA network structure according to an embodiment of the present invention.
Detailed Description
The intellectual property identity identification method based on the academic talent information knowledge graph comprises the following steps:
1. A crawler acquires talent information data, including names, resumes and intellectual property information, where the intellectual property information includes articles, patents and the like. A talent information knowledge graph is built on neo4j from this information. The knowledge graph is stored as triples E = <sub, rela, obj> and comprises attribute information of the entities (talents) and relationships among the entities; the attribute information includes a person's basic information, the institution they graduated from, major, employer and intellectual property information, and the relationships include collaboration relationships and alumni relationships.
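The triple form E = <sub, rela, obj> can be illustrated with a minimal in-memory sketch. This is not the neo4j graph itself: the `Triple` type, the relation names and all sample data below are hypothetical, chosen only to show how attributes and an alumni relationship could be read off such triples.

```python
from collections import namedtuple

# Hypothetical triple type mirroring E = <sub, rela, obj>.
Triple = namedtuple("Triple", ["sub", "rela", "obj"])

# Illustrative data only; names, relations and values are invented for this sketch.
triples = [
    Triple("Zhang San", "graduated_from", "University A"),
    Triple("Zhang San", "works_at", "Institute B"),
    Triple("Zhang San", "authored", "Paper X"),
    Triple("Li Si", "graduated_from", "University A"),
    Triple("Zhang San", "collaborates_with", "Li Si"),
]


def attributes_of(entity, store):
    """Collect (relation, object) pairs whose subject is the given entity."""
    return [(t.rela, t.obj) for t in store if t.sub == entity]


def alumni_pairs(store):
    """Derive alumni relationships: people sharing a graduated_from object."""
    by_school = {}
    for t in store:
        if t.rela == "graduated_from":
            by_school.setdefault(t.obj, []).append(t.sub)
    pairs = []
    for people in by_school.values():
        for i in range(len(people)):
            for j in range(i + 1, len(people)):
                pairs.append((people[i], people[j]))
    return pairs
```

In a real deployment these lookups would presumably be graph queries rather than list scans; the sketch only shows the shape of the data.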
2. Mention identification: because intellectual property information follows a standardized format, the structured formal feature information and the intellectual property content of the information M to be linked can be obtained directly using regular-expression rules, where M = (M_1, M_2, …, M_n). The structured formal feature information includes author information, organization information, collaborator information and journal information.
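As a rough illustration of rule-based extraction from a standardized record, the sketch below pulls the structured fields and the text content out of a record with labeled lines. The record layout, the field names and the convention that the first listed author is the author and the rest are collaborators are all assumptions made for this example; real sources would need their own patterns.

```python
import re

# Hypothetical record layout; invented for this sketch.
record = (
    "Title: A Study of Entity Linking\n"
    "Authors: Zhang San; Li Si\n"
    "Organization: University A\n"
    "Journal: Journal of Examples\n"
    "Abstract: This paper studies entity linking on a talent knowledge graph."
)

# One labeled field per line: "<Name>: <value>".
FIELD_PATTERN = re.compile(
    r"^(Title|Authors|Organization|Journal|Abstract):\s*(.+)$", re.MULTILINE
)


def extract_mentions(text):
    """Return the structured formal features and the text content of a record."""
    fields = {name.lower(): value.strip() for name, value in FIELD_PATTERN.findall(text)}
    authors = [a.strip() for a in fields.get("authors", "").split(";") if a.strip()]
    return {
        "author": authors[0] if authors else None,   # assumed: first author is the mention
        "collaborators": authors[1:],
        "organization": fields.get("organization"),
        "journal": fields.get("journal"),
        "text": fields.get("abstract"),
    }


M = extract_mentions(record)
```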
3. Candidate entity generation: conventional methods usually obtain candidate entities by querying features directly in the graph or by matching entities with similarity measures such as edit distance. However, earlier results showed that more than half of the samples the model predicted incorrectly involved a type mismatch. This embodiment therefore provides a type-based candidate entity generation method that yields fewer but more precise candidates, improving accuracy and reducing computation.
Specifically, to avoid omissions caused by overly fine type divisions, this embodiment divides the intellectual property information corresponding to the entities in the knowledge graph into 4 subject categories (liberal arts, science, agriculture and medicine) and builds a subject classification model based on word2vec and TextCNN.
Candidate entity generation first retrieves candidates according to the mention and then screens them further by subject category to reduce computation. The specific steps are as follows:
first, for the intellectual property information M to be linked, a fuzzy query on the knowledge graph is performed according to the mention to obtain the set of possible entities D = (D_1, D_2, …, D_n); this query can be expressed directly as a neo4j query statement, so it is fast;
second, the category of the intellectual property information M is obtained with the trained subject classification model: M_type = TextCNN(M);
finally, each entity in the set D is input into the trained subject classification model to obtain its category, and the final candidate entities are H = (H_1, H_2, …, H_k), with {H_i ∈ D, H_i,type = M_type, i = 1, 2, …, k}, where k is the number of candidate entities remaining after category filtering.
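The three steps above (fuzzy query, classify M, keep entities of the same category) can be sketched as follows. Everything here is a stand-in: the `ENTITIES` dictionary replaces the neo4j graph, substring matching replaces the fuzzy query, and `classify` is a trivial keyword rule used only as a placeholder for the trained TextCNN model.

```python
# Hypothetical in-memory stand-in for the knowledge graph.
ENTITIES = {
    "e1": {"name": "Zhang San", "ip_text": "deep learning for crop disease images"},
    "e2": {"name": "Zhang San", "ip_text": "clinical trial of a new drug"},
    "e3": {"name": "Zhang Sanfeng", "ip_text": "drug dosage and patient outcomes"},
    "e4": {"name": "Li Si", "ip_text": "clinical study of drug resistance"},
}


def fuzzy_query(mention):
    """Stand-in for the graph's fuzzy query: substring match on the name."""
    return [eid for eid, e in ENTITIES.items() if mention in e["name"]]


def classify(text):
    """Placeholder for the trained subject classifier (keyword rule, illustration only)."""
    medical = {"drug", "clinical", "patient"}
    return "medicine" if medical & set(text.split()) else "other"


def generate_candidates(mention, m_text):
    """Return the set H: entities in D whose category equals M_type."""
    d = fuzzy_query(mention)       # possible entity set D
    m_type = classify(m_text)      # M_type
    return [eid for eid in d if classify(ENTITIES[eid]["ip_text"]) == m_type]


H = generate_candidates("Zhang San", "a clinical evaluation of drug safety")
```

In this toy data the fuzzy query returns three entities and the category filter discards the one whose intellectual property text falls in a different category, which is the "fewer but more precise" effect the embodiment describes.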
The construction of the word2vec-based TextCNN subject classification model comprises the following specific steps:
first, text data of intellectual property is collected and labeled, the labels being the 4 categories liberal arts (0), science (1), agriculture (2) and medicine (3);
second, the text is segmented into words and stop words are removed;
third, a word2vec model is trained and used for embedding to obtain n×k-dimensional vectors, and the class labels are converted to one-hot encoding;
finally, the embedded vectors and the labels are input into a TextCNN network for training to obtain the subject classification model, whose structure comprises convolution and pooling layers, a concatenation layer, a flatten layer, a dropout layer, a fully connected layer and a softmax layer.
Fig. 1 shows a specific structure of the TextCNN network.
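To make the layer sequence concrete, here is a dependency-free sketch of a TextCNN forward pass at inference time (so dropout is omitted) together with one-hot label encoding. The filter sizes, weights and the toy 3×2 embedded input are arbitrary assumptions; a real model would be trained in a deep learning framework, and this is not the network of Fig. 1.

```python
import math


def one_hot(label, num_classes=4):
    """Encode a class index (0..3) as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conv_max_pool(embedded, kernel):
    """Slide one filter of height h over an n x k matrix, apply ReLU, max-pool over time."""
    h = len(kernel)
    width = len(kernel[0])
    best = 0.0  # ReLU floor: the max of ReLU outputs is never below 0
    for start in range(len(embedded) - h + 1):
        s = sum(kernel[r][c] * embedded[start + r][c]
                for r in range(h) for c in range(width))
        best = max(best, s)
    return best


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(embedded, kernels, dense_w, dense_b):
    """Convolution + pooling per filter, concatenation, fully connected layer, softmax."""
    pooled = [conv_max_pool(embedded, k) for k in kernels]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


# Toy input: n = 3 tokens, k = 2 embedding dimensions (values are arbitrary).
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # filter spanning 2 tokens
    [[1.0, 1.0]],              # filter spanning 1 token
]
dense_w = [[0.5, 0.1], [0.1, 0.5], [0.2, 0.2], [0.0, 0.3]]
dense_b = [0.0, 0.0, 0.0, 0.0]
probs = textcnn_forward(embedded, kernels, dense_w, dense_b)
```

The output `probs` has one entry per subject category; the predicted category is the index of the largest entry.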
4. Candidate entity ranking: most traditional candidate entity ranking methods are based on similarity, for example ranking by manually defined features between the mention and the target entity, or directly computing the similarity between the context information and the user information with a deep learning method. Considering the strengths and weaknesses of both, this embodiment combines the two, fuses the structural information of the knowledge graph, and ranks entities using two kinds of features: the formal feature F_formal and the semantic feature F_semantic.
For each candidate entity H_i, all of its information can be obtained from the knowledge graph: in addition to the structured formal feature information (G_i1, G_i2, …, G_in) obtained directly from the graph, it includes the semantic feature G_i,n+1 based on the intellectual property content, so the full information of candidate entity H_i is G_i = (G_i1, G_i2, …, G_in, G_i,n+1).
The specific candidate entity ranking process is as follows:
1) The weight of each feature is determined with the AHP (analytic hierarchy process) method as W_i = (W_i1, W_i2, …, W_in, W_i,n+1). Specifically, an application example is first constructed, an importance judgment matrix of the feature information is obtained by analyzing the example, and the weights are then computed with the SPSSAU software.
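For readers without SPSSAU, the weight computation can be approximated by hand. The sketch below uses the row geometric-mean method, a common approximation of the AHP principal-eigenvector weights, and also reports Saaty's consistency ratio. The 3×3 judgment matrix is invented for illustration, and the choice of the geometric-mean method is an assumption; the source does not say which algorithm the software applies.

```python
import math

# Saaty's random consistency index for matrix sizes 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Row geometric-mean approximation of the AHP priority vector."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with lambda_max estimated from (A w)_i / w_i."""
    n = len(matrix)
    if RANDOM_INDEX[n] == 0.0:
        return 0.0
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgment matrix for three features (e.g. author, organization, journal).
judgment = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(judgment)
cr = consistency_ratio(judgment, weights)
```

A consistency ratio below 0.1 is the usual threshold for accepting a judgment matrix; the example matrix is perfectly consistent, so its ratio is essentially zero.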
2) Formal feature F_formal: for each candidate entity H_i, all information is obtained from the knowledge graph, the text content of the intellectual property information is removed, and only the structured formal feature information G_i' = (G_i1, G_i2, …, G_in) is kept. For the intellectual property information M to be linked, the formal feature is determined by computing the matching degree S_k of M and G_i' [formula not reproduced in the source text], where M_j is the structured formal feature information of the intellectual property information M to be linked, i.e. F_formal = S_k.
3) Semantic feature F_semantic: the value of the semantic feature F_semantic between the intellectual property information M to be linked and the candidate entity H_i is measured by similarity, which is computed with a character-based CBA (char + bi_lstm + attention) network. Specifically, the semantic feature G_i,n+1 of the candidate entity H_i and the text information M_n of the intellectual property information M to be linked are input into the CBA network to obtain the similarity probabilities [y1, y2], and the similarity probability y1 is taken as the value of F_semantic.
As shown in fig. 2, the character-based CBA network construction process includes the following steps:
first, data is collected and labeled in the form of s1, s2, label, which is 0 if s1 is similar to s2, otherwise it is 1;
second, character-level word2vec vector representations are obtained;
and finally, inputting the data and the label into a CBA network for training to obtain a trained CBA model.
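The sketch below illustrates only the parts of such a network that are easy to show without a framework: character-level tokenization, attention pooling over a sequence of hidden states, and a two-way softmax that yields [y1, y2]. The hidden states stand in for bi-LSTM outputs, and the dot-product attention with a query vector, the concatenation of the two pooled vectors and all weights are assumptions made for this example; the source does not specify these details.

```python
import math


def to_chars(text):
    """Character-level tokens, so no word segmentation step is needed."""
    return [ch for ch in text if not ch.isspace()]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Weight each time step by softmax(state . query) and sum the states."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    alphas = softmax(scores)
    dim = len(states[0])
    return [sum(a * state[d] for a, state in zip(alphas, states)) for d in range(dim)]


def similarity_probs(vec_a, vec_b, weights, bias):
    """Concatenate the two pooled vectors, apply a linear layer, return [y1, y2]."""
    features = vec_a + vec_b
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy hidden states standing in for bi-LSTM outputs of the two inputs.
states_a = [[1.0, 0.0], [0.0, 1.0]]
states_b = [[0.5, 0.5], [0.5, 0.5]]
query = [0.0, 0.0]  # a zero query gives uniform attention weights
pooled_a = attention_pool(states_a, query)
pooled_b = attention_pool(states_b, query)
y1, y2 = similarity_probs(
    pooled_a, pooled_b,
    weights=[[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    bias=[0.0, 0.0],
)
```

Here y1 corresponds to class 0 ("similar") in the labeling scheme above, which is why it can be read directly as the similarity value.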
4) Ranking: the final similarity between the candidate entity H_i and the intellectual property information M to be linked is F_i = F_semantic + F_formal, and the entity finally linked is the one corresponding to max(F_i).
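The final selection is a simple argmax over F_i. A minimal sketch, assuming F_semantic and F_formal have already been computed for each candidate (the entity ids and scores below are invented):

```python
def rank_candidates(scores):
    """scores maps entity id -> (f_semantic, f_formal); rank by F_i = sum, descending."""
    totals = {eid: f_sem + f_form for eid, (f_sem, f_form) in scores.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked, totals


# Hypothetical precomputed feature values for two candidates.
scores = {"e2": (0.91, 0.40), "e3": (0.35, 0.55)}
ranked, totals = rank_candidates(scores)
linked = ranked[0]  # entity corresponding to max(F_i)
```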
The embodiment has the following technical characteristics:
1. The embodiment draws on the data accumulated by the big data industry and provides an entity linking model that fuses the structural information of the knowledge graph, improving the convenience, richness and effectiveness of the data.
2. The type-based candidate entity generation method reduces computation, reduces distractors and improves accuracy.
3. In the candidate entity ranking process, knowledge-graph-based formal features and intellectual-property-based semantic features are integrated, and candidate entities are ranked by similarity using the statistics-based AHP method.
4. An improved Siamese network, CBA, is adopted. It is character-based, adds an attention mechanism, and takes the similarity probability output by the network as the similarity value. This replaces the traditional cosine similarity computed directly on word2vec vectors, avoids interference from word segmentation, increases semantic relevance and improves accuracy.
The above-described embodiments of the present invention do not limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (6)
1. An intellectual property identity identification method based on an academic talent information knowledge graph, comprising the following steps:
(1) a crawler acquires talent information data comprising names, resumes and intellectual property information, and a talent information knowledge graph is built on neo4j from this information; the knowledge graph is stored as triples E = <sub, rela, obj> and comprises the attribute information of the entities and the relationships among the entities;
(2) mention identification: for the intellectual property information M to be linked, the structured formal feature information and the text information of the intellectual property in M are obtained directly using regular-expression rules, where M = (M_1, M_2, …, M_n) and each M_i is an obtained mention;
(3) candidate entity generation: the intellectual property information corresponding to the entities in the knowledge graph is divided into 4 subject categories (liberal arts, science, agriculture and medicine), and a subject classification model based on word2vec and TextCNN is built to determine the category;
the process of candidate entity generation is as follows:
first, for the intellectual property information M to be linked, a fuzzy query on the knowledge graph is performed according to the mention to obtain the set of possible entities D = (D_1, D_2, …, D_n);
second, the category of the intellectual property information M is obtained with the trained subject classification model: M_type = TextCNN(M);
finally, each entity in the set D is input into the trained subject classification model to obtain its category, and the final candidate entities are H = (H_1, H_2, …, H_k), with {H_i ∈ D, H_i,type = M_type, i = 1, 2, …, k}, where k is the number of candidate entities remaining after category filtering;
(4) candidate entity ranking: entities are ranked using two kinds of features, the formal feature F_formal and the semantic feature F_semantic; for each candidate entity H_i, all of its information can be obtained from the knowledge graph: in addition to the structured formal feature information (G_i1, G_i2, …, G_in) obtained directly from the graph, it includes the semantic feature G_i,n+1 based on the intellectual property content, so the full information of candidate entity H_i is G_i = (G_i1, G_i2, …, G_in, G_i,n+1);
the specific candidate entity ranking process is as follows:
the weight of each feature is determined with the AHP (analytic hierarchy process) method as W_i = (W_i1, W_i2, …, W_in, W_i,n+1);
formal feature F_formal: for each candidate entity H_i, all information is obtained from the knowledge graph, the text content of the intellectual property information is removed, and only the structured formal feature information G_i' = (G_i1, G_i2, …, G_in) is kept; for the intellectual property information M to be linked, the formal feature is determined by computing the matching degree S_k of M and G_i' [formula not reproduced in the source text], where M_j is the structured formal feature information of the intellectual property information M to be linked, i.e. F_formal = S_k;
semantic feature F_semantic: the value of the semantic feature F_semantic between the intellectual property information M to be linked and the candidate entity H_i is measured by similarity, which is computed with a character-based CBA (char + bi-lstm + attention) network; specifically, the semantic feature G_i,n+1 of the candidate entity H_i and the text information M_n of the intellectual property information M to be linked are each passed through a bi-lstm + attention layer, a softmax layer then outputs the similarity probabilities [y1, y2], and the similarity probability y1 is taken as the value of F_semantic;
ranking: the final similarity between the candidate entity H_i and the intellectual property information M to be linked is F_i = F_semantic + F_formal, and the entity finally linked is the one corresponding to max(F_i).
2. The intellectual property identity recognition method of claim 1, wherein: the intellectual property information comprises paper and patent information; the attribute information of the entity comprises basic personal information, graduating institution, major, employer and intellectual property information; the relationships among the entities comprise collaboration relationships and alumni relationships.
3. The intellectual property identity recognition method of claim 1, wherein: the structural formal feature information comprises author information, organization information, collaborator information and journal information.
4. The intellectual property identity recognition method of claim 1, wherein: in the step (3), the construction of the TextCNN subject classification model based on word2vec comprises the following specific steps:
firstly, collecting and labeling intellectual property text data, wherein the labeled categories are 4 classes: liberal arts (0), science (1), agriculture (2) and medicine (3);
secondly, performing word segmentation and removing stop words;
thirdly, training a word2vec model and performing embedding to obtain n×k-dimensional vectors, and converting the class labels into one-hot encoded form;
and finally, inputting the embedded vectors and the labels into a TextCNN network for training to obtain a subject classification model, wherein the subject classification model structure comprises convolution and pooling layers, a concatenation layer, a flatten layer, a dropout layer, a fully connected layer and a softmax layer.
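The preprocessing part of these steps (stop-word removal and one-hot label conversion) can be sketched as below. This is a minimal illustration, not the claimed implementation: the text is assumed to be already segmented into tokens, the stop-word set is hypothetical, and the word2vec and TextCNN training stages are left out because they depend on a library the claim does not name.

```python
from typing import Iterable, List, Set

NUM_CLASSES = 4  # the claim labels four subject classes as 0..3

def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop tokens that appear in the stop-word set, keeping the original order."""
    return [t for t in tokens if t not in stop_words]

def one_hot(label: int, num_classes: int = NUM_CLASSES) -> List[int]:
    """Convert an integer class label into a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Illustrative usage: whitespace splitting stands in for a real word segmenter.
tokens = remove_stop_words("the graph of the entity".split(), {"the", "of"})
target = one_hot(2)
```

Validating the label range up front avoids silently producing an all-zero target vector, which would otherwise only show up as a poorly trained classifier.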
5. The intellectual property identity recognition method of claim 1, wherein: in the step (4), the AHP solving process first constructs application examples through an expert system, obtains an importance judgment matrix for the feature information through analysis of the examples, and then calculates the weights using SPSSAU software.
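For readers without access to SPSSAU, the weight calculation from a judgment matrix can be approximated as follows. This sketch swaps in a plain power iteration for the principal eigenvector plus Saaty's consistency ratio; it is not the procedure the claim names, and the function name, iteration count, and example matrix are assumptions made for illustration.

```python
from typing import List, Tuple

# Saaty's random consistency index for judgment matrices of order 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix: List[List[float]], iterations: int = 100) -> Tuple[List[float], float]:
    """Return (weights, consistency_ratio) for a pairwise judgment matrix of order <= 9."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize to sum to 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue from the ratios (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Illustrative, perfectly consistent matrix built from weights 0.6 : 0.3 : 0.1.
judgment = [[1.0, 2.0, 6.0],
            [0.5, 1.0, 3.0],
            [1 / 6, 1 / 3, 1.0]]
weights, consistency_ratio = ahp_weights(judgment)
```

A consistency ratio below 0.1 is the usual threshold for accepting a judgment matrix; above that, the pairwise judgments are normally revised before the weights are used.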
6. The intellectual property identity recognition method of claim 1, wherein: the character-based CBA network similarity solving method in the step (4) comprises the following specific steps:
first, data is collected and labeled in the form (s1, s2, label), where label is 0 if s1 is similar to s2 and 1 otherwise;
secondly, obtaining character-based word2vec vector representations;
thirdly, inputting the data into a CBA network for training to obtain a trained CBA model;
and finally, inputting the candidate entity Hi to be identified and the intellectual property information M to be linked into the trained CBA model, and acquiring an output result [ y1, y2] of the softmax layer, wherein y1 is the similarity.
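The last stage of this pipeline, attention pooling followed by a two-way softmax, can be sketched in plain Python. The sketch makes several assumptions: the Bi-LSTM is not implemented and its hidden states are treated as given, the attention is a simple dot product against a hypothetical context vector, and the claim does not say how the two encoded texts are combined before the softmax. Because similar pairs are labeled 0, the first softmax output is read as y1.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden: List[List[float]], context: List[float]) -> List[float]:
    """Weight each hidden state by softmax(dot(h_t, context)) and sum them into one vector."""
    scores = [sum(h_k * c_k for h_k, c_k in zip(h, context)) for h in hidden]
    alphas = softmax(scores)
    dim = len(hidden[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]

# Illustrative values: two hidden states standing in for Bi-LSTM outputs.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
y1, y2 = softmax([2.0, 0.0])  # y1 is the probability of class 0 ("similar")
```

With a zero context vector every hidden state receives the same attention weight, so the pooled vector is just the mean of the hidden states; a trained context vector would shift the weight toward the more informative characters.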
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110506792.XA CN113220899A (en) | 2021-05-10 | 2021-05-10 | Intellectual property identity identification method based on academic talent information intellectual map |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113220899A (en) | 2021-08-06 |
Family
ID=77094162
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110506792.XA Withdrawn CN113220899A (en) | 2021-05-10 | 2021-05-10 | Intellectual property identity identification method based on academic talent information intellectual map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113220899A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913514A (en) * | 2021-12-23 | 2022-08-16 | 号百信息服务有限公司 | Intelligent abnormal vehicle moving identification system |
CN115170353A (en) * | 2022-07-12 | 2022-10-11 | 朗动信息咨询(上海)有限公司 | Intellectual property achievement transformation analysis and evaluation system based on big data processing |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268643A (en) * | 2018-01-22 | 2018-07-10 | 北京邮电大学 | A kind of Deep Semantics matching entities link method based on more granularity LSTM networks |
CN108920556A (en) * | 2018-06-20 | 2018-11-30 | 华东师范大学 | Recommendation expert method based on subject knowledge map |
CN110990590A (en) * | 2019-12-20 | 2020-04-10 | 北京大学 | Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning |
CN112131404A (en) * | 2020-09-19 | 2020-12-25 | 哈尔滨工程大学 | Entity alignment method in four-risk one-gold domain knowledge graph |
CN112131275A (en) * | 2020-09-23 | 2020-12-25 | 中国科学技术大学智慧城市研究院(芜湖) | Enterprise portrait construction method of holographic city big data model and knowledge graph |
CN112330183A (en) * | 2020-11-18 | 2021-02-05 | 布瑞克农业大数据科技集团有限公司 | Method and system for constructing big data portrait of agricultural enterprise |
CN112380865A (en) * | 2020-11-10 | 2021-02-19 | 北京小米松果电子有限公司 | Method, device and storage medium for identifying entity in text |
Non-Patent Citations (1)
Title |
---|
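Luo Angen: "Research on Entity Linking Algorithms Incorporating Knowledge Graphs" (《融合知识图谱的实体链接的算法研究》), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series * |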
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106776711B (en) | Chinese medical knowledge map construction method based on deep learning | |
WO2018196561A1 (en) | Label information generating method and device for application and storage medium | |
CN111813950B (en) | Building field knowledge graph construction method based on neural network self-adaptive optimization tuning | |
CN110990590A (en) | Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning | |
CN110134757A (en) | A kind of event argument roles abstracting method based on bull attention mechanism | |
CN112733866B (en) | Network construction method for improving text description correctness of controllable image | |
CN113515632B (en) | Text classification method based on graph path knowledge extraction | |
CN111858940B (en) | Multi-head attention-based legal case similarity calculation method and system | |
CN111159485B (en) | Tail entity linking method, device, server and storage medium | |
CN112069408A (en) | Recommendation system and method for fusion relation extraction | |
CN111061939B (en) | Scientific research academic news keyword matching recommendation method based on deep learning | |
CN113806563A (en) | Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material | |
CN111858896B (en) | Knowledge base question-answering method based on deep learning | |
CN115438674B (en) | Entity data processing method, entity linking method, entity data processing device, entity linking device and computer equipment | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN113220899A (en) | Intellectual property identity identification method based on academic talent information intellectual map | |
CN117236338B (en) | Named entity recognition model of dense entity text and training method thereof | |
CN116975256B (en) | Method and system for processing multisource information in construction process of underground factory building of pumped storage power station | |
CN112256904A (en) | Image retrieval method based on visual description sentences | |
CN107391565A (en) | A kind of across language hierarchy taxonomic hierarchies matching process based on topic model | |
CN114997288A (en) | Design resource association method | |
CN114238653A (en) | Method for establishing, complementing and intelligently asking and answering knowledge graph of programming education | |
CN116108191A (en) | Deep learning model recommendation method based on knowledge graph | |
CN117371481A (en) | Neural network model retrieval method based on meta learning | |
CN112417322A (en) | Type discrimination method and system for interest point name text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210806 |