CN112507707A - Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things - Google Patents

Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things

Info

Publication number
CN112507707A
Authority
CN
China
Prior art keywords
chinese
english
word
sub
fields
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011408521.2A
Other languages
Chinese (zh)
Inventor
高昇宇
皮一晨
朱红
周冬旭
张玮亚
刘少君
胡年超
李存斌
王其清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Application filed by Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202011408521.2A
Publication of CN112507707A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for analyzing and judging the degree of correlation between innovative technologies in different fields of the power Internet of Things, and belongs to the technical field of data processing methods for management. The method comprises: dividing the power Internet of Things into 8 sub-fields, obtaining documents through retrieval, and extracting the titles, abstracts, keywords and publication years of the documents as literature data; extracting the abstract sentences that contain keywords as input to the spaCy tool, training an entity recognition model, and traversing each sentence of the abstracts with the model to recognize entities and obtain the key technical terms of the power Internet of Things; mapping the Chinese and English literature data to a Chinese-English bilingual word embedding matrix using a word embedding model, constructing a co-occurrence matrix of key technical terms and sub-fields, calculating the two-dimensional mutual information of any two sub-fields, and finally judging the strength of association between the innovative technologies of any two sub-fields according to the two-dimensional mutual information. The method provides a reliable data source for judging the degree of association between innovative technologies of the power Internet of Things in different fields.

Description

Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things
Technical Field
The invention relates to a method for analyzing and judging the cooperative relationships among innovative technologies in different sub-fields of the power Internet of Things, and belongs to the technical field of data processing methods suitable for management.
Background
The power Internet of Things is a cyber-physical system, and its construction is also a process of innovatively applying Internet of Things technologies in the power system. Studying the technical coupling points and the collaborative innovation relationships between Internet of Things technologies and the power system helps to find key technical breakthrough points for the power Internet of Things and to develop efficient innovation paths.
At present, research on the coupling and collaboration of power system and Internet of Things innovative technologies focuses on the development of Internet of Things technology. However, because the power Internet of Things is a cyber-physical system whose technical innovation covers the construction of both the power system and the Internet of Things, the known research cannot provide an effective and reliable analytical basis for judging the development direction of innovative technologies of the power Internet of Things.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an effective and reliable data basis for judging the development direction of innovative technologies of the power Internet of Things.
The technical solution provided by the invention is a correlation degree analysis and judgment method for innovative technologies in different fields of the power Internet of Things, comprising the following steps:
Step 1, dividing the field of the power Internet of Things and collecting literature data, specifically as follows:
dividing the power Internet of Things into 8 sub-fields: the power source side, the grid side, the load side, the storage side, the Internet of Things sensing layer, the network layer, the computing layer and the application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, wherein each search formula comprises a plurality of search terms; searching CNKI and the Web of Science core database for each sub-field according to the search formulas to obtain Chinese documents and English documents respectively; and extracting the titles, abstracts, keywords and publication years of the Chinese and English documents as Chinese literature data and English literature data, which together form the Chinese and English literature data;
Step 2, obtaining the key technical terms of the power Internet of Things, specifically as follows:
Step 2.1, for each document, extracting the sentences of its abstract that contain the document's keywords, and using the extracted sentences as input to the spaCy tool to train an entity recognition model;
Step 2.2, traversing every sentence of every abstract in the Chinese and English literature data with the entity recognition model to perform entity recognition; if a recognized entity is in the same sentence as a keyword of the document, taking the entity as a key technical term of the power Internet of Things; and counting the number of occurrences of every key technical term in the Chinese and English literature data;
Step 3, uniformly vectorizing the Chinese and English literature data of the power Internet of Things, and mapping them to a Chinese-English bilingual word embedding matrix using a word embedding model, specifically as follows:
Step 3.1, defining a custom Chinese-English translation anchor file that records one-to-one correspondences between common Chinese words and their English translations; obtaining additional correspondences by calling Baidu Translate to translate into English the search terms of the Chinese search formulas and the Chinese key technical terms obtained in step 2.2; and adding these correspondences to the anchor file;
Step 3.2, segmenting the Chinese and English literature data with Chinese and English word segmentation tools respectively to obtain a Chinese word sequence and an English word sequence; training word2vec on each word sequence separately to obtain a vector for every word, the word vectors forming a Chinese literature word embedding matrix and an English literature word embedding matrix respectively, where the size of each matrix is the number of words in the corresponding literature data multiplied by the common word vector dimension d;
Step 3.3, constructing a bilingual word vector mapping model from Chinese to English, as shown in formula (1):
$$W^{*} = \underset{W \in M_d(\mathbb{R})}{\arg\min}\; \lVert W S - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert W S - T \rVert_F$ between the Chinese literature word embedding matrix S and the English literature word embedding matrix T; $\lVert \cdot \rVert_F$ denotes the Frobenius norm; and the solution is the optimal weight matrix $W^{*}$. The Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model;
Step 4, constructing a co-occurrence matrix of key technical terms and sub-fields, specifically as follows:
Step 4.1, dividing the search terms into 8 classes according to the 8 sub-fields, and taking from the Chinese-English bilingual word embedding matrix obtained in step 3 the word vector corresponding to each search term as the search term word vector v;
Step 4.2, selecting from the Chinese-English bilingual word embedding matrix obtained in step 3 the word vector corresponding to each key technical term as the key technical term word vector u; calculating the cosine similarity D(u, v) between u and v; and setting a similarity threshold of 0.3, such that when D(u, v) > 0.3 the key technical term belongs to the sub-field corresponding to the search term word vector v;
Step 4.3, obtaining the membership relationship between key technical terms and sub-fields according to step 4.2; taking the total number of occurrences in the literature data of all key technical terms belonging to a sub-field as the co-occurrence count of key technical terms and that sub-field; and constructing the co-occurrence matrix of key technical terms and sub-fields, divided by publication year of the documents;
Step 5, calculating the mutual information of any two sub-fields, specifically as follows:
Step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating the one-dimensional information entropies $H(x_1)$ and $H(x_2)$ of the two sub-fields according to formula (2):
[Formula (2), given as an equation image in the original publication]
where x is a sub-field and $c_i$ is the co-occurrence count of key technical terms for the i-th sub-field (i = 1, 2, ..., 8);
Step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3):
[Formula (3), given as an equation image in the original publication]
where $c_1$ and $c_2$ are the co-occurrence counts of key technical terms for the two sub-fields $x_1$ and $x_2$ respectively;
the two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4):
$$H(x_1) + H(x_2) - H(x_1, x_2) \qquad (4)$$
and the degree of correlation between the innovative technologies of the two sub-fields $x_1$ and $x_2$ is judged according to the two-dimensional mutual information.
The beneficial effects of the invention are as follows. The power Internet of Things spans sub-fields of both the power system and the Internet of Things, and most existing literature-based research on its innovative technologies relies on bibliometric statistical methods. There is therefore a lack of data analysis of the key technologies of the power Internet of Things, and of their relationships to the relevant sub-fields, as reflected in the content of the scientific literature. The invention starts from analysis of the text of power Internet of Things literature: it mines the key technical terms contained in the literature of each sub-field, establishes the membership relationship between technical terms and sub-fields, and counts the co-occurrences of key technical terms and sub-fields, thereby providing a more reliable data source for judging the degree of cooperative association between innovative technologies of the power Internet of Things in different fields.
Drawings
The method for analyzing and judging the association degree of innovative technologies in different fields of the power internet of things is further described with reference to the accompanying drawings.
Fig. 1 shows the distribution of the Chinese literature word embedding matrix in a two-dimensional plane.
Fig. 2 shows the distribution of the English literature word embedding matrix in a two-dimensional plane.
Fig. 3 shows the distribution of the Chinese-English bilingual word embedding matrix in a two-dimensional plane.
Fig. 4 shows the mutual information between three pairs of sub-fields: source-load, source-storage and grid-storage.
Detailed Description
Examples
The correlation degree analysis and judgment method for innovative technologies in different fields of the power Internet of Things according to this embodiment comprises the following steps:
Step 1, dividing the field of the power Internet of Things and collecting literature data, specifically as follows:
dividing the power Internet of Things into 8 sub-fields: the power source side, the grid side, the load side, the storage side, the Internet of Things sensing layer, the network layer, the computing layer and the application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, wherein each search formula comprises a plurality of search terms; searching CNKI (China National Knowledge Infrastructure) and the Web of Science core database for each sub-field according to the search formulas to obtain Chinese documents and English documents respectively; and extracting the titles, abstracts, keywords and publication years of the documents (both Chinese and English) as literature data (both Chinese literature data and English literature data). Part of the Chinese and English search formulas used in this embodiment is shown in Table 1.
TABLE 1
[Table 1, given as images in the original publication: partial Chinese and English literature search formulas]
The number of documents retrieved in this embodiment is shown in Table 2.
TABLE 2
[Table 2, given as an image in the original publication: number of documents retrieved]
Step 2, obtaining the key technical terms of the power Internet of Things, specifically as follows:
Step 2.1, for each document, extracting the sentences of its abstract that contain the document's keywords, and using the extracted sentences as input to the spaCy tool to train an entity recognition model; spaCy is an open-source NLP tool for word segmentation, entity recognition and part-of-speech tagging that supports custom training of entity recognition models;
Step 2.2, traversing every sentence of every abstract in the literature data with the entity recognition model to perform entity recognition; if a recognized entity is in the same sentence as a keyword of the document, taking the entity as a key technical term of the power Internet of Things; and counting the number of occurrences of every key technical term in the literature data. The key technical terms with the highest occurrence counts are shown in Table 3.
TABLE 3
[Table 3, given as an image in the original publication: most frequent key technical terms]
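The sentence filtering and term counting of step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `recognize` callable is a hypothetical stand-in for the trained spaCy entity recognition model, the sentence splitter is deliberately naive, the record layout (`abstract`, `keywords`) is assumed, and only occurrences inside keyword-bearing sentences are counted.

```python
import re
from collections import Counter

def split_sentences(abstract):
    """Naive splitter on Chinese and English sentence-ending punctuation."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", abstract)
    return [p.strip() for p in parts if p.strip()]

def keyword_sentences(abstract, keywords):
    """Keep only the sentences that contain at least one document keyword."""
    return [s for s in split_sentences(abstract)
            if any(k.lower() in s.lower() for k in keywords)]

def count_key_terms(documents, recognize):
    """Count entities that share a sentence with a keyword of their document.

    documents: list of dicts with 'abstract' and 'keywords' (assumed layout).
    recognize: hypothetical callable mapping a sentence to a list of entities.
    """
    counts = Counter()
    for doc in documents:
        for sentence in keyword_sentences(doc["abstract"], doc["keywords"]):
            for entity in recognize(sentence):
                counts[entity] += 1
    return counts

# Toy data: a dictionary lookup stands in for the trained recognizer.
docs = [{"abstract": "Edge computing reduces latency in smart grid. "
                     "The weather was fine.",
         "keywords": ["smart grid"]}]
vocabulary = ["edge computing", "weather"]
toy_recognize = lambda s: [t for t in vocabulary if t in s.lower()]
term_counts = count_key_terms(docs, toy_recognize)
```

In the toy data, "weather" is recognized as an entity but is not counted, because its sentence contains no keyword of the document.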
Step 3, uniformly vectorizing the Chinese and English literature data of the power Internet of Things, and mapping them to a Chinese-English bilingual word embedding matrix using a word embedding model. To prevent differences between the Chinese and English literature data from affecting the judgment of which sub-field a key technical term belongs to, a multilingual word embedding technique from natural language processing is used to vectorize the Chinese and English literature data, yielding Chinese and English literature word embedding matrices distributed in the same vector space, which makes it convenient to establish the membership relationship between key technical terms and sub-fields. The specific steps are as follows:
Step 3.1, defining a custom Chinese-English translation anchor file that records one-to-one correspondences between common Chinese words and their English translations; obtaining additional correspondences by calling Baidu Translate to translate into English the search terms of the Chinese search formulas and the Chinese key technical terms obtained in step 2.2; and adding these correspondences to the anchor file;
Step 3.2, segmenting the literature data (both Chinese and English) with Chinese and English word segmentation tools (such as jieba and nltk) respectively to obtain the Chinese word sequence and the English word sequence; training word2vec on each sequence separately to obtain a vector for every word (the word2vec model represents words as multidimensional vectors, so that text is mapped to a word embedding matrix formed by those vectors), the word vectors forming a Chinese literature word embedding matrix and an English literature word embedding matrix respectively, where the size of each matrix is the number of words in the corresponding literature data (Chinese or English) multiplied by the word vector dimension d. A word vector represents a word as a vector, and its dimension is the number of elements in that vector. The word vector dimension is set to 300 in this embodiment. Fig. 1 and Fig. 2 show the distributions of the Chinese literature word embedding matrix and the English literature word embedding matrix in a two-dimensional plane, respectively.
Step 3.3, constructing a bilingual word vector mapping model from Chinese to English, as shown in formula (1):
$$W^{*} = \underset{W \in M_d(\mathbb{R})}{\arg\min}\; \lVert W S - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert W S - T \rVert_F$ between the Chinese literature word embedding matrix S and the English literature word embedding matrix T; $\lVert \cdot \rVert_F$ denotes the Frobenius norm; and the solution is the optimal weight matrix $W^{*}$. The Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model.
The optimization goal of the model is to solve for a weight matrix W that minimizes the distance $\lVert W S - T \rVert_F$ between the Chinese literature word embedding matrix and the English literature word embedding matrix, thereby unifying the vector spaces of the two matrices. The model can be converted into a Procrustes problem and solved iteratively using singular value decomposition and gradient descent to obtain the optimal $W^{*}$. The translation anchor file provides one-to-one correspondences for a subset of Chinese and English reference words, and the distance between any Chinese word vector and any English word vector in the embedding matrices can be calculated indirectly from the distance of each word vector to the reference words in its own language. Through the weight matrix $W^{*}$, the Chinese literature word embedding matrix can therefore be mapped into the same vector space as the English literature word embedding matrix, so that the word vectors of the two matrices become mutually comparable and together form the Chinese-English bilingual word embedding matrix, which contains the word vectors for all word sequences of the Chinese and English literature data. The distribution of the Chinese-English bilingual word embedding matrix in a two-dimensional plane is shown in Fig. 3.
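The Procrustes formulation of step 3.3 can be illustrated with a short NumPy sketch. It rests on assumptions that the text does not state: W is restricted to orthogonal matrices (the classical orthogonal Procrustes problem, which has a closed-form SVD solution), and S and T hold only the anchor word pairs, arranged as columns of d × n arrays so that the product WS is defined. The function name and the synthetic data are hypothetical, and the gradient descent iteration mentioned in the text is not shown.

```python
import numpy as np

def orthogonal_procrustes_map(S, T):
    """Return the orthogonal W that minimizes ||W S - T||_F.

    S, T: arrays of shape (d, n); column j holds the source and target
    vectors of the j-th anchor pair (assumed layout).
    """
    # The minimizer is U @ Vt, where U, Vt come from the SVD of T S^T.
    U, _, Vt = np.linalg.svd(T @ S.T)
    return U @ Vt

# Synthetic check: hide a known orthogonal map Q and try to recover it.
rng = np.random.default_rng(0)
d, n = 4, 50
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
S = rng.normal(size=(d, n))                   # stand-in "Chinese" anchor vectors
T = Q @ S                                     # stand-in "English" anchor vectors
W = orthogonal_procrustes_map(S, T)
residual = np.linalg.norm(W @ S - T)          # Frobenius norm by default
```

Once W is estimated from the anchor pairs, every Chinese word vector s could be mapped as W @ s and compared directly with English word vectors. With noisy real embeddings the residual would not be zero as it is in this synthetic case.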
Step 4, constructing a co-occurrence matrix of key technical terms and sub-fields, specifically as follows:
Step 4.1, extracting the search terms from the search formulas in Table 1, classifying them into 8 classes according to the 8 sub-fields, and taking from the Chinese-English bilingual word embedding matrix obtained in step 3 the word vector corresponding to each search term as the search term word vector v;
Step 4.2, selecting from the Chinese-English bilingual word embedding matrix obtained in step 3 the word vector corresponding to each key technical term as the key technical term word vector u; calculating the cosine similarity D(u, v) between u and v; and setting a similarity threshold of 0.3, such that when D(u, v) > 0.3 the key technical term belongs to the sub-field corresponding to the search term word vector v;
Step 4.3, obtaining the membership relationship between each key technical term and the 8 sub-fields according to step 4.2; taking the total number of occurrences in the literature data of all key technical terms belonging to a sub-field as the co-occurrence count of key technical terms and that sub-field (term-field co-occurrence for short); and, dividing by the publication year of the documents in which each key technical term appears, constructing the co-occurrence matrix of key technical terms and the 8 sub-fields, as shown in Table 4.
TABLE 4
[Table 4, given as images in the original publication: co-occurrence matrix of key technical terms and the 8 sub-fields by year]
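The membership test of step 4.2 and the yearly counting of step 4.3 can be sketched in plain Python. The 0.3 threshold comes from the text; everything else is an assumption made for illustration: the two-dimensional toy vectors, the sub-field labels, the data layout, and the rule that a term may belong to several sub-fields when more than one search term exceeds the threshold.

```python
import math
from collections import defaultdict

THRESHOLD = 0.3  # similarity threshold stated in step 4.2

def cosine(u, v):
    """Cosine similarity D(u, v) of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_subfields(term_vec, search_vecs):
    """Sub-fields having at least one search term with D(u, v) > THRESHOLD.

    search_vecs: {subfield: [search term vector, ...]} (assumed layout).
    """
    return {field for field, vecs in search_vecs.items()
            if any(cosine(term_vec, v) > THRESHOLD for v in vecs)}

def cooccurrence(term_counts, term_vecs, search_vecs):
    """Build {year: {subfield: count}} from {(term, year): occurrences}."""
    table = defaultdict(lambda: defaultdict(int))
    for (term, year), n in term_counts.items():
        for field in assign_subfields(term_vecs[term], search_vecs):
            table[year][field] += n
    return table

# Hypothetical toy data in a 2-dimensional embedding space.
search_vecs = {"source": [[1.0, 0.0]], "load": [[0.0, 1.0]]}
term_vecs = {"photovoltaic": [0.9, 0.1],
             "demand response": [0.2, 0.98],
             "storage inverter": [0.7, 0.7]}
term_counts = {("photovoltaic", 2010): 5, ("demand response", 2010): 3,
               ("storage inverter", 2010): 2, ("photovoltaic", 2011): 4}
table = cooccurrence(term_counts, term_vecs, search_vecs)
```

Here "storage inverter" is similar to both search vectors, so its two occurrences count toward both sub-fields in 2010. Whether the embodiment allows such multiple membership is not stated, so this behaviour should be read as one possible interpretation.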
Step 5, calculating the mutual information of any two sub-fields.
Step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating the one-dimensional information entropies $H(x_1)$ and $H(x_2)$ of the two sub-fields according to formula (2):
[Formula (2), given as an equation image in the original publication]
where x is a sub-field and $c_i$ is the co-occurrence count of key technical terms for the i-th sub-field (i = 1, 2, ..., 8); for example, the one-dimensional information entropy of the power source side sub-field in 2010 is
[Worked example for formula (2), given as an equation image in the original publication]
Step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3):
[Formula (3), given as an equation image in the original publication]
where $c_1$ and $c_2$ are the co-occurrence counts of key technical terms for the two sub-fields $x_1$ and $x_2$ respectively;
the two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4):
$$H(x_1) + H(x_2) - H(x_1, x_2) \qquad (4)$$
Fig. 4 shows the calculated mutual information between the three pairs of sub-fields source-load, source-storage and grid-storage. Table 5 shows the two-dimensional average mutual information of the 8 sub-fields, obtained by summing and averaging the two-dimensional mutual information of each pair of sub-fields over 2010-2019.
TABLE 5 (mbit)
[Table 5, given as an image in the original publication: two-dimensional average mutual information between the 8 sub-fields]
According to the two-dimensional mutual information calculated above, the degree of correlation between the innovative technologies of any two sub-fields $x_1$ and $x_2$ can be judged.
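The identity in formula (4) can be illustrated with a short sketch. Formulas (2) and (3) survive only as images, so how the embodiment turns the co-occurrence counts $c_i$ into probabilities is not reproduced here. The code instead assumes the standard Shannon entropy in bits over a hypothetical joint count table for two discrete variables, purely to show how the three entropies combine.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def mutual_information(joint):
    """H(x1) + H(x2) - H(x1, x2) for a 2-D table of counts n[a][b].

    joint is a hypothetical joint count table; its rows index the states
    of x1 and its columns index the states of x2.
    """
    row_totals = [sum(row) for row in joint]
    col_totals = [sum(col) for col in zip(*joint)]
    cells = [c for row in joint for c in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)

# Two extreme toy tables.
independent = [[10, 10], [10, 10]]  # x1 tells nothing about x2
dependent = [[20, 0], [0, 20]]      # x1 determines x2 exactly
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

The independent table yields zero mutual information and the fully dependent table yields one bit, which matches the reading of formula (4) as a measure of how strongly two sub-fields are associated. Table 5 reports its values in mbit, so a result computed this way would be scaled by 1000 for comparison.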
The above describes only a preferred embodiment of the invention, and the invention is not limited thereto. All equivalents and modifications of the inventive concept and its technical solutions are intended to fall within the scope of protection of the invention.

Claims (1)

1. A correlation degree analysis and judgment method for innovative technologies in different fields of the power Internet of Things, characterized by comprising the following steps:
Step 1, dividing the field of the power Internet of Things and collecting literature data, specifically as follows:
dividing the power Internet of Things into 8 sub-fields: the power source side, the grid side, the load side, the storage side, the Internet of Things sensing layer, the network layer, the computing layer and the application layer; constructing Chinese and English literature search formulas for each sub-field according to its definition, wherein each search formula comprises a plurality of search terms; searching CNKI and the Web of Science core database for each sub-field according to the search formulas to obtain Chinese documents and English documents respectively; and extracting the titles, abstracts, keywords and publication years of the Chinese and English documents as Chinese literature data and English literature data, which together form the Chinese and English literature data;
Step 2, obtaining the key technical terms of the power Internet of Things, specifically as follows:
Step 2.1, for each document, extracting the sentences of its abstract that contain the document's keywords, and using the extracted sentences as input to the spaCy tool to train an entity recognition model;
Step 2.2, traversing every sentence of every abstract in the Chinese and English literature data with the entity recognition model to perform entity recognition, and, if a recognized entity is in the same sentence as a keyword of the document, taking the entity as a key technical term of the power Internet of Things;
Step 3, uniformly vectorizing the Chinese and English literature data of the power Internet of Things, and mapping them to a Chinese-English bilingual word embedding matrix using a word embedding model, specifically as follows:
Step 3.1, defining a custom Chinese-English translation anchor file that records one-to-one correspondences between common Chinese words and their English translations; obtaining additional correspondences by calling Baidu Translate to translate into English the search terms of the Chinese search formulas and the Chinese key technical terms obtained in step 2.2; and adding these correspondences to the anchor file;
Step 3.2, segmenting the Chinese and English literature data with Chinese and English word segmentation tools respectively to obtain a Chinese word sequence and an English word sequence; training word2vec on each word sequence separately to obtain a vector for every word, the word vectors forming a Chinese literature word embedding matrix and an English literature word embedding matrix respectively, where the size of each matrix is the number of words in the corresponding literature data multiplied by the common word vector dimension d;
Step 3.3, constructing a bilingual word vector mapping model from Chinese to English, as shown in formula (1):
$$W^{*} = \underset{W \in M_d(\mathbb{R})}{\arg\min}\; \lVert W S - T \rVert_F \qquad (1)$$
where d denotes the word vector dimension; $M_d(\mathbb{R})$ denotes the set of d × d real matrices; S and T denote the Chinese literature word embedding matrix and the English literature word embedding matrix respectively; W is a weight matrix; argmin denotes minimizing the distance $\lVert W S - T \rVert_F$ between the Chinese literature word embedding matrix S and the English literature word embedding matrix T; $\lVert \cdot \rVert_F$ denotes the Frobenius norm; and the solution is the optimal weight matrix $W^{*}$. The Chinese-English bilingual word embedding matrix is obtained through this bilingual word vector mapping model;
Step 4, constructing a co-occurrence matrix of the key technical terms and the sub-fields, which comprises the following specific steps:
step 4.1, dividing the search word into 8 types according to 8 sub-fields, and taking a word vector corresponding to the search word as a word vector v of the search word according to the Chinese and English literature bilingual word embedding matrix obtained in the step 3;
step 4.2, selecting the word vector of each key technical term from the Chinese-English bilingual word embedding matrix obtained in step 3 as the word vector u of that key technical term, calculating the cosine similarity D(u, v) between the word vector u of the key technical term and the word vector v of a search word, and setting the similarity threshold to 0.3, wherein when D(u, v) > 0.3 the key technical term belongs to the sub-field corresponding to the search word with word vector v;
step 4.3, obtaining the membership relationship between the key technical terms and the sub-fields according to step 4.2, taking the total number of occurrences in the literature data of all key technical terms belonging to a sub-field as the co-occurrence count of the key technical terms and that sub-field, and constructing the co-occurrence matrix of the key technical terms and the sub-fields, divided by the publication year of the literature;
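Steps 4.2 and 4.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the two sub-field labels (`platform`, `perception`), the example terms, the vectors, and the yearly counts are all hypothetical; only the cosine similarity and the 0.3 threshold come from the text.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity D(u, v); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_subfields(term_vecs, search_vecs, search_subfield, threshold=0.3):
    """Step 4.2: a term joins the sub-field of every search word with D(u, v) > threshold."""
    membership = defaultdict(set)
    for term, u in term_vecs.items():
        for word, v in search_vecs.items():
            if cosine(u, v) > threshold:
                membership[search_subfield[word]].add(term)
    return membership


def cooccurrence_by_year(membership, term_counts):
    """Step 4.3: sum yearly occurrences of each sub-field's terms.

    term_counts maps (term, year) to the number of occurrences in that year's literature.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for subfield, terms in membership.items():
        for (term, year), n in term_counts.items():
            if term in terms:
                counts[subfield][year] += n
    return counts


# Hypothetical toy data (2-dimensional vectors, two of the 8 sub-fields).
term_vecs = {"edge computing": [1.0, 0.1], "smart meter": [0.1, 1.0]}
search_vecs = {"cloud": [1.0, 0.0], "sensing": [0.0, 1.0]}
search_subfield = {"cloud": "platform", "sensing": "perception"}
term_counts = {
    ("edge computing", 2019): 3,
    ("edge computing", 2020): 5,
    ("smart meter", 2020): 2,
}

membership = assign_subfields(term_vecs, search_vecs, search_subfield)
counts = cooccurrence_by_year(membership, term_counts)
```

Under this reading a key technical term may belong to more than one sub-field when it exceeds the threshold for search words of several sub-fields; the source does not state how such cases are resolved.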
step 5, calculating the mutual information of any two sub-fields, specifically as follows:
step 5.1, for any two sub-fields $x_1$ and $x_2$, calculating the one-dimensional information entropies $H(x_1)$ and $H(x_2)$ of the two sub-fields respectively according to formula (2).
Figure RE-FDA0002942335730000021
where x is a sub-field and $c_i$ ($i = 1, 2, \ldots, 8$) is the number of co-occurrences of the key technical terms of the i-th sub-field;
step 5.2, calculating the two-dimensional information entropy $H(x_1, x_2)$ of the two sub-fields $x_1$ and $x_2$ according to formula (3),
Figure RE-FDA0002942335730000022
where $c_1$ and $c_2$ are the numbers of co-occurrences of the key technical terms of the two sub-fields $x_1$ and $x_2$ respectively,
the two-dimensional mutual information of the two sub-fields $x_1$ and $x_2$ is then calculated by formula (4),
$$H(x_1) + H(x_2) - H(x_1, x_2) \tag{4}$$
and the degree of correlation between the innovative technologies of any two sub-fields $x_1$ and $x_2$ is judged according to the two-dimensional mutual information.
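Formula (4) can be illustrated with a short sketch. Formulas (2) and (3) are given only as images in the source, so the entropy estimates below are an assumption: standard Shannon entropies (base 2) computed from a joint count table `joint[a][b]` for two discrete variables associated with sub-fields x1 and x2. How such a table would be derived from the co-occurrence matrix is not specified here, and the function names and toy tables are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mutual_information(joint):
    """Formula (4): H(x1) + H(x2) - H(x1, x2) for a joint count table joint[a][b]."""
    row_totals = [sum(row) for row in joint]        # marginal counts for x1
    col_totals = [sum(col) for col in zip(*joint)]  # marginal counts for x2
    flat = [c for row in joint for c in row]        # joint counts
    return entropy(row_totals) + entropy(col_totals) - entropy(flat)


# Independent toy table: knowing one variable says nothing about the other.
mi_independent = mutual_information([[1, 1], [1, 1]])

# Fully dependent toy table: each variable determines the other.
mi_dependent = mutual_information([[2, 0], [0, 2]])
```

A larger value indicates a stronger statistical dependence between the two sub-fields; a value of zero corresponds to independence under the assumed joint table.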
CN202011408521.2A 2020-12-04 2020-12-04 Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things Pending CN112507707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011408521.2A CN112507707A (en) 2020-12-04 2020-12-04 Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011408521.2A CN112507707A (en) 2020-12-04 2020-12-04 Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things

Publications (1)

Publication Number Publication Date
CN112507707A true CN112507707A (en) 2021-03-16

Family

ID=74971709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011408521.2A Pending CN112507707A (en) 2020-12-04 2020-12-04 Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things

Country Status (1)

Country Link
CN (1) CN112507707A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177420A (en) * 2021-04-29 2021-07-27 同方知网(北京)技术有限公司 Chinese-English bilingual dictionary construction method based on academic literature

Similar Documents

Publication Publication Date Title
CN112069408B (en) Recommendation system and method for fusion relation extraction
Xie et al. A novel text mining approach for scholar information extraction from web content in Chinese
CN106547739A (en) A kind of text semantic similarity analysis method
CN113268569B (en) Semantic-based related word searching method and device, electronic equipment and storage medium
CN111666766A (en) Data processing method, device and equipment
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
CN111241410A (en) Industry news recommendation method and terminal
CN114443855A (en) Knowledge graph cross-language alignment method based on graph representation learning
CN111581943A (en) Chinese-over-bilingual multi-document news viewpoint sentence identification method based on sentence association graph
CN115309915A (en) Knowledge graph construction method, device, equipment and storage medium
Yang et al. Improving word representations with document labels
Alian et al. Arabic sentence similarity based on similarity features and machine learning
CN112507707A (en) Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things
Ayetiran An index-based joint multilingual/cross-lingual text categorization using topic expansion via BabelNet
US20220207240A1 (en) System and method for analyzing similarity of natural language data
CN112597273A (en) Power distribution automation chart generation method based on NL2SQL technology
CN112270189A (en) Question type analysis node generation method, question type analysis node generation system and storage medium
Mohnot et al. Hybrid approach for Part of Speech Tagger for Hindi language
Abimbola et al. A noun-centric keyphrase extraction model: Graph-based approach
Aejas et al. Named entity recognition for cultural heritage preservation
Chang et al. Incorporating word embedding into cross-lingual topic modeling
Wei et al. Integrating visual word embeddings into translation language model for keyword spotting on historical Mongolian document images
Kumari et al. An Extractive Approach for Automated Summarization of Indian Languages using Clustering Techniques.
CN107402914B (en) Deep learning system and method for natural language
Pham Sensitive keyword detection on textual product data: an approximate dictionary matching and context-score approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination