CN114154498B - Innovative evaluation method based on science and technology big data text content - Google Patents


Info

Publication number
CN114154498B
Authority
CN
China
Prior art keywords
text content
word
text
mth
big data
Legal status
Active
Application number
CN202111489894.1A
Other languages
Chinese (zh)
Other versions
CN114154498A (en)
Inventor
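Liu Yezheng (刘业政)
Chen Hang (陈航)
Jiang Yuanchun (姜元春)
Qian Yang (钱洋)
Sun Jianshan (孙见山)
Chai Yidong (柴一栋)
Wang Jicheng (王继成)
Yuan Kun (袁昆)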
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
2021-12-08
Filing date
2021-12-08
Publication date
2024-02-20
Application filed by Hefei University of Technology
Priority to CN202111489894.1A
Publication of CN114154498A
Application granted
Publication of CN114154498B
Legal status: Active (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an innovation evaluation method based on the text content of science and technology big data. The method comprises the following steps:

1. Acquire, preprocess and segment the text content of the science and technology big data.
2. Process the segmented text with a TF-IDF model and construct document word vectors for the science and technology big data to be evaluated.
3. Reduce the dimension of the document word vectors by principal component analysis (PCA).
4. Compute the pairwise similarity of all documents within time window M, expressed as the cosine between their word vectors.
5. For each text, sort its similarity values in descending order and select the L values most similar to the text. Take the smallest of these L values to represent the text's innovation, and normalize it to obtain an innovation score.

The invention can effectively evaluate the innovation of science and technology big data and improve evaluation accuracy. It thereby lays a foundation for evaluating and screening valuable science and technology big data.

Description

Innovative evaluation method based on science and technology big data text content
Technical Field
The invention relates to the field of value evaluation of science and technology big data, and in particular to a text-content-based method for evaluating the innovation of science and technology big data.
Background
In recent years, network and communication technologies have developed rapidly, and data related to people's life and production has grown explosively. Modern society has entered the big data era. Science and technology big data is an information resource that reflects the state and process of human scientific and technological activity. It can help people gain insight into new ideas, discover new laws, invent new technologies and develop new products. In other words, science and technology big data is as valuable as other ordinary data. At the same time, owing to its own characteristics, its value is driven mainly by technological innovation. Innovation is therefore an indispensable characteristic of science and technology big data and the fundamental feature that distinguishes it from other data.
Science and technology big data includes scientific papers, patents, software copyrights, standards and specifications, policy recommendations and the like. It contains a large amount of unstructured data, typified by text content. Research on value and innovation has focused mostly on scientific papers and invention patents. Researchers describe data value by establishing value evaluation index systems and applying traditional econometric models, but the quality of unstructured data such as text content is hard to measure this way. Scholars also measure the quality of text content with traditional text analysis methods, such as word frequency analysis and co-word analysis, and with topic models typified by LDA. These approaches, however, rarely characterize the innovation of the text content.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an innovation evaluation method based on the text content of science and technology big data. The method evaluates the innovation of such data effectively and accurately, laying a foundation for evaluating and screening valuable science and technology big data.
To achieve this aim, the invention adopts the following technical scheme:
The innovation evaluation method based on the text content of science and technology big data is characterized by comprising the following steps:
Step 1, acquire the text content set of the science and technology big data to be evaluated. Preprocess it by removing duplicate records and records with missing values. Then divide the preprocessed text content by generation time to obtain the time-stamped text content $\{d_1, d_2, \dots, d_m, \dots, d_M\}$, where $d_m$ denotes the m-th text content and M denotes the number of texts in the text content set;
Step 2, apply word segmentation, stop-word removal and merging of repeated words to the m-th text content $d_m$. This gives the segmented m-th text content $d'_m = \{w_1^m, w_2^m, \dots, w_n^m, \dots, w_{N_m}^m\}$, where $w_n^m$ denotes the n-th word of $d'_m$ and $N_m$ denotes the total number of distinct words in $d'_m$. All segmented words of the M text contents together form the corpus D;
Step 3, process the segmented text content with a TF-IDF model, extract the keywords of the science and technology big data to be evaluated, and construct the document word vectors;
Step 3.1, use formula (1) to compute the TF-IDF value $T_{nm}$ of the n-th word $w_n^m$ of the segmented m-th text content $d'_m$ in corpus D:

$$T_{nm} = \mathrm{TF}_{nm} \times \mathrm{IDF}_{nm} \qquad (1)$$

In formula (1), $\mathrm{TF}_{nm}$ denotes the word frequency of the n-th word $w_n^m$ in the segmented m-th text content $d'_m$. $\mathrm{IDF}_{nm}$ denotes the inverse document frequency of $w_n^m$ in corpus D;
Step 3.2, construct the document word vectors of the science and technology big data to be evaluated;
Merge the words repeated across the text contents in corpus D to obtain the merged corpus $D' = \{t_1, t_2, \dots, t_p, \dots, t_P\}$. Here $t_p$ denotes the p-th word and P denotes the total number of words in D'. Use formula (2) to compute the importance $X_{pm}$ of the p-th word $t_p$ in the segmented m-th text content $d'_m$:

$$X_{pm} = \begin{cases} T_{nm}, & t_p = w_n^m \in d'_m \\ 0, & t_p \notin d'_m \end{cases} \qquad (2)$$

This gives the word vector $X_m = (X_{1m}, X_{2m}, \dots, X_{pm}, \dots, X_{Pm})^\top$ of $d'_m$. Collecting these gives the word vectors $X = (X_1, X_2, \dots, X_m, \dots, X_M)$ of all documents within time window M, which form a sparse matrix;
step 4, performing dimension reduction on the sparse matrix X by using a principal component analysis method;
Step 4.1, zero-center each row of the sparse matrix X (subtract the row mean from every element of the row) to obtain the matrix H;
Step 4.2, compute the covariance matrix $C = \frac{1}{M} H H^\top$;
Step 4.3, compute the eigenvalues of the covariance matrix C and the corresponding unit orthogonal eigenvectors. Arrange the eigenvectors as the rows of a matrix P in descending order of their eigenvalues;
Step 4.4, multiply the matrix formed by the first k rows of P by the matrix H to obtain the reduced matrix $Y = (Y_1, Y_2, \dots, Y_m, \dots, Y_M)$. Here $Y_m = (Y_{1m}, Y_{2m}, \dots, Y_{km})^\top$ denotes the m-th reduced document word vector, $Y_{km}$ denotes its k-th dimension value, and $k < P$;
Step 5, within time window M, compute the cosine similarity $\cos\langle Y_m, Y_z\rangle$ between the m-th reduced document word vector $Y_m$ and the z-th reduced document word vector $Y_z$. This value represents the similarity $\mathrm{sim}\langle d_m, d_z\rangle$ between the m-th text content $d_m$ and the z-th text content $d_z$. The result is the set of similarities between $d_m$ and the other text contents, $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$;
Step 6, sort the similarities in the text similarity set $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$ in descending order and select the first L values, i.e. the L largest similarities. Take the L-th similarity value as the innovation of the m-th text content $d_m$, and normalize it to obtain the innovation score of $d_m$;
Step 7, compute the innovation scores of all M text contents within time window M following steps 5 and 6. Sort the text contents in descending order of score, which completes the innovation evaluation of the text content set of the science and technology big data to be evaluated.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention acquires the text content data of science and technology big data and preprocesses it. Records with missing values are removed, only the most recent record is kept among duplicates, and the data to be evaluated are grouped into time windows by year. The text content is segmented with jieba using an existing, mature stop-word list; stop words are deleted and meaningless words are removed. This greatly improves the quality of the text data and the efficiency of the subsequent text analysis;
2. The invention processes the segmented text data with a TF-IDF model to extract keywords. It computes a TF-IDF value for every segmented word of each piece of science and technology big data under evaluation, capturing the key topics and information of each piece within the time window. Document word vectors are then built from the segmented words and their TF-IDF values, in preparation for measuring the similarity between texts;
3. The invention builds a sparse matrix from all document word vectors within each time window and compresses it with principal component analysis (PCA), reducing the dimension of the document word vectors. This addresses the difficulty of computing distances between sample points in a high-dimensional space and prepares for the similarity calculation between document word vectors. It also improves the accuracy of the subsequent similarity measurement and helps characterize accurately the innovation of all science and technology big data in the field within the time window;
4. The invention expresses the similarity between documents as the cosine similarity between their reduced document word vectors. Finally, following the nearest neighbour (KNN) idea, each piece of science and technology big data is compared with all other pieces. The smallest value among its several highest similarities represents its innovation. The innovation and value of science and technology big data can thus be evaluated, and innovative, valuable data can be screened more effectively.
Drawings
FIG. 1 is a flow chart of the innovation evaluation method based on the text content of science and technology big data according to the present invention.
Detailed Description
In this embodiment, the innovation evaluation method based on the text content of science and technology big data works as follows:
- It extracts the keywords of the text content under evaluation with a TF-IDF model.
- It builds document word vectors from the keywords and their TF-IDF values, forming a sparse matrix.
- It reduces the dimension of the document word vectors by row compression of the sparse matrix with principal component analysis (PCA).
- It computes the cosine similarity between the reduced document word vectors to represent the similarity between documents.
- Finally, following the nearest neighbour (KNN) idea, it uses the smallest of the L highest similarities between each text and all other texts to represent that text's innovation.

Specifically, as shown in FIG. 1, the method comprises the following steps:
Step 1, acquire the text content set of the science and technology big data to be evaluated. Preprocess it by removing duplicate records and records with missing values. Then divide the preprocessed text content by generation time to obtain the time-stamped text content $\{d_1, d_2, \dots, d_m, \dots, d_M\}$, where $d_m$ denotes the m-th text content and M denotes the number of texts in the text content set;
Step 2, apply word segmentation, stop-word removal and merging of repeated words to the m-th text content $d_m$. This gives the segmented m-th text content $d'_m = \{w_1^m, w_2^m, \dots, w_n^m, \dots, w_{N_m}^m\}$, where $w_n^m$ denotes the n-th word of $d'_m$ and $N_m$ denotes the total number of distinct words in $d'_m$. All segmented words of the M text contents together form the corpus D;
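Steps 1 and 2 can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation:
- The record layout `(doc_id, year, text)` is hypothetical.
- Duplicates are detected by identical text.
- The time window is taken to be the calendar year, as described in beneficial effect 1.

The patent names jieba for segmentation. Here, whitespace splitting stands in so the sketch runs with the standard library only; with jieba installed, `jieba.lcut(text)` would supply the token list instead.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (doc_id, year, text); the patent fixes no schema.
def preprocess(records):
    """Step 1 sketch: drop records with missing values, keep only the most recent
    copy of duplicated texts, and group the remaining texts into yearly windows."""
    latest = {}
    for doc_id, year, text in records:
        if doc_id is None or year is None or not text:
            continue  # a record with a missing value is discarded
        key = text.strip()
        if key not in latest or year > latest[key][1]:
            latest[key] = (doc_id, year, key)
    windows = defaultdict(list)
    for doc_id, year, text in latest.values():
        windows[year].append((doc_id, text))
    return dict(windows)


def segment(text, stop_words):
    """Step 2 sketch: tokenize, drop stop words and merge repeated words.
    Whitespace splitting stands in for a Chinese segmenter such as jieba; the
    Counter keeps each distinct word once together with its occurrence count."""
    return Counter(w for w in text.lower().split() if w not in stop_words)
```

The `Counter` returned by `segment` plays the role of $d'_m$: each distinct word appears once, and its occurrence count is kept for the term-frequency step.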
Step 3, process the segmented text content with a TF-IDF model, extract the keywords of the science and technology big data to be evaluated, and construct the document word vectors;
TF-IDF (Term Frequency-Inverse Document Frequency) is used to evaluate the importance of words to text in a document set or corpus, and consists of two parts: TF and IDF.
Step 3.1: Term frequency (TF) is the frequency with which a word occurs in a text sample. Let $d_m$ be a particular text sample and $w_n^m$ its n-th word. If a word is repeated, the position of its first occurrence is used and later repetitions are not counted as separate words. The word frequency $\mathrm{TF}_{nm}$ of $w_n^m$ in this text sample is the ratio of the word's number of occurrences to the total number of occurrences of all words in the text, as given by formula (1):

$$\mathrm{TF}_{nm} = \frac{c_{nm}}{\sum_{n'=1}^{N_m} c_{n'm}} \qquad (1)$$

where $c_{nm}$ is the number of occurrences of $w_n^m$ in $d_m$.
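Formula (1) can be sketched as below. The sketch assumes the segmented text is held as a `Counter` that maps each distinct word to its number of occurrences. That representation is an illustrative choice, not one fixed by the patent.

```python
from collections import Counter

def term_frequency(counts: Counter) -> dict:
    """Formula (1) sketch: TF of each distinct word = its occurrence count
    divided by the total number of word occurrences in the same text."""
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty text has no term frequencies
    return {word: c / total for word, c in counts.items()}
```

For a text in which "topic" occurs twice and "model" and "graph" once each, "topic" receives a TF of 0.5.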
step 3.2: inverse Document Frequency (IDF) is used to evaluate the popularity of terms with a corpus. In this embodiment, the corpus D is the text content data of all the large-tech data after word segmentation under each time window,IDF value +.>All of D may be used to include +.>Text number->The sum total number of samples N is expressed as:
Step 3.3: use formula (3) to compute the TF-IDF value $T_{nm}$ of the n-th word $w_n^m$ of the segmented m-th text content $d'_m$ in corpus D:

$$T_{nm} = \mathrm{TF}_{nm} \times \mathrm{IDF}_{nm} \qquad (3)$$

In formula (3), $\mathrm{TF}_{nm}$ denotes the word frequency of the n-th word $w_n^m$ in the segmented m-th text content $d'_m$. $\mathrm{IDF}_{nm}$ denotes the inverse document frequency of $w_n^m$ in corpus D;
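Formula (3) can be sketched end to end for one time window. The sketch combines the ratio-based TF with an ASSUMED unsmoothed IDF of $\log(N / n_w)$, and represents each segmented text as a `Counter` of word occurrences. Both choices are illustrative.

```python
import math
from collections import Counter

def tfidf_per_document(doc_counts):
    """Formula (3) sketch: T_nm = TF_nm * IDF_nm for every distinct word of every text.
    doc_counts: list of Counters (word -> occurrences), one per segmented text.
    ASSUMPTION: IDF uses the unsmoothed form log(N / n_w)."""
    n_docs = len(doc_counts)
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())  # each distinct word counted once per text
    result = []
    for counts in doc_counts:
        total = sum(counts.values())
        result.append({w: (c / total) * math.log(n_docs / doc_freq[w])
                       for w, c in counts.items()})
    return result
```

Each returned dictionary holds the TF-IDF values $T_{nm}$ of one text. These values are the entries from which the document word vectors are built.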
Step 3.4: construct the document word vectors of the science and technology big data to be evaluated;
Merge the words repeated across the text contents in corpus D to obtain the merged corpus $D' = \{t_1, t_2, \dots, t_p, \dots, t_P\}$. Here $t_p$ denotes the p-th word and P denotes the total number of words in D'. Use formula (4) to compute the importance $X_{pm}$ of the p-th word $t_p$ in the segmented m-th text content $d'_m$:

$$X_{pm} = \begin{cases} T_{nm}, & t_p = w_n^m \in d'_m \\ 0, & t_p \notin d'_m \end{cases} \qquad (4)$$

This gives the word vector $X_m = (X_{1m}, X_{2m}, \dots, X_{pm}, \dots, X_{Pm})^\top$ of $d'_m$. Collecting these gives the word vectors $X = (X_1, X_2, \dots, X_m, \dots, X_M)$ of all documents within time window M, which form a sparse matrix;
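The construction of the P × M matrix X can be sketched as follows. Rows are words of the merged corpus D' and columns are documents. A word absent from a text gets 0, which is why X is sparse.

The sketch makes two illustrative assumptions:
- Plain nested lists are used for readability. A realistic vocabulary would call for sparse storage such as `scipy.sparse.csr_matrix`.
- The vocabulary is sorted alphabetically. This is an arbitrary but deterministic ordering of $t_1, \dots, t_P$.

```python
def build_word_matrix(tfidf_docs):
    """Formula (4) sketch: merge the vocabularies into D' = {t_1, ..., t_P} and build
    the P x M matrix X with X[p][m] = TF-IDF of t_p in text m, or 0.0 if absent.
    tfidf_docs: list of dicts (word -> TF-IDF value), one per text."""
    vocab = sorted({w for doc in tfidf_docs for w in doc})  # merged corpus D'
    matrix = [[doc.get(w, 0.0) for doc in tfidf_docs] for w in vocab]
    return vocab, matrix
```

Column m of the returned matrix is the word vector $X_m$ of text m.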
step 4, performing dimension reduction on the sparse matrix X by using a principal component analysis method;
Step 4.1: use formula (5) to zero-center each row of the sparse matrix X and obtain the matrix H:

$$H = (h_{pm})_{P \times M}, \qquad h_{pm} = X_{pm} - \bar{X}_p \qquad (5)$$

In formula (5), $\bar{X}_p = \frac{1}{M}\sum_{m=1}^{M} X_{pm}$ is the mean of the p-th row of X, so that $\sum_{m=1}^{M} h_{pm} = 0$.
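Formula (5) can be sketched as follows. The matrix is held as a list of rows (one row per word of D'), which is an illustrative representation.

```python
def center_rows(x):
    """Formula (5) sketch: subtract each row's mean so that every row of H sums to zero.
    x: P x M matrix as a list of rows (one row per word of D')."""
    h = []
    for row in x:
        mean = sum(row) / len(row)  # mean of the p-th row of X
        h.append([v - mean for v in row])
    return h
```

Centering each word's row removes its average TF-IDF level across the window. What remains reflects only how the texts differ in their use of that word.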
Step 4.2: use formula (6) to compute the covariance matrix:

$$C = \frac{1}{M} H H^\top \qquad (6)$$

In formula (6), the covariance matrix C is a real symmetric matrix with P rows and P columns. Its diagonal elements are the variances of the rows of H. The element in row i, column j equals the element in row j, column i, and represents the covariance between row i and row j of H.
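Formula (6) can be sketched as below, assuming the $1/M$ normalization. Scaling by $1/(M-1)$ instead would change the eigenvalues but not the eigenvectors, so the later projection would be unaffected.

```python
def covariance(h):
    """Formula (6) sketch: C = (1/M) H H^T for a row-centred P x M matrix H.
    C is P x P and symmetric; C[i][i] is the variance of row i of H."""
    p = len(h)
    m = len(h[0])
    return [[sum(h[i][k] * h[j][k] for k in range(m)) / m for j in range(p)]
            for i in range(p)]
```

The triple loop is O(P²M) and only suits small illustrations. With real vocabularies of many thousands of words, P × P covariance matrices become expensive. Library PCA routines often work on the data matrix directly (for example via SVD) for that reason.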
Step 4.3: use formula (7) to compute the eigenvalues of the covariance matrix C and the corresponding unit orthogonal eigenvectors. Then use formula (8) to arrange the eigenvectors as the rows of a matrix P in descending order of their eigenvalues.
Find the eigenvalues of C and arrange them in descending order as $\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_P$. Since C is a real symmetric P × P matrix, there exist P unit orthogonal eigenvectors $e_1, e_2, e_3, \dots, e_P$ corresponding in turn to $\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_P$. Arranging them as columns gives the matrix $E = (e_1, e_2, e_3, \dots, e_P)$, which diagonalizes C:

$$E^\top C E = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_P) \qquad (7)$$

$$\mathbf{P} = E^\top \qquad (8)$$
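Formulas (7) and (8) can be sketched in pure Python. The patent does not say how the eigenvectors are computed, so this sketch swaps in the cyclic Jacobi rotation method. In practice a library routine such as `numpy.linalg.eigh` would normally be used; note that it returns eigenvalues in ascending order, so they would need reversing. The tolerance and sweep limit are illustrative defaults.

```python
import math

def principal_axes(c, tol=1e-12, max_sweeps=100):
    """Formulas (7)-(8) sketch: diagonalise the symmetric matrix C with cyclic Jacobi
    rotations (E^T C E = diag(lambda)), then return the eigenvalues in descending
    order and the matrix P whose rows are the matching unit eigenvectors."""
    n = len(c)
    a = [row[:] for row in c]  # working copy that converges to diag(lambda)
    e = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
                for k in range(n):  # E <- E J accumulates eigenvectors as columns
                    ekp, ekq = e[k][p], e[k][q]
                    e[k][p], e[k][q] = cs * ekp - sn * ekq, sn * ekp + cs * ekq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    eigenvalues = [a[i][i] for i in order]
    p_matrix = [[e[k][i] for k in range(n)] for i in order]  # row = column i of E
    return eigenvalues, p_matrix
```

For the 2 × 2 matrix [[2, 1], [1, 2]], the eigenvalues come out as 3 and 1. The first row of P is the direction (1, 1)/√2.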
Step 4.4: use formula (9) to multiply the matrix formed by the first k rows of P by the matrix H, obtaining the reduced matrix $Y = (Y_1, Y_2, \dots, Y_m, \dots, Y_M)$:

$$Y = \mathbf{P}_k H \qquad (9)$$

Here $\mathbf{P}_k$ is the $k \times P$ matrix formed by the first k rows of P, and $Y_m = (Y_{1m}, Y_{2m}, \dots, Y_{km})^\top$ denotes the m-th reduced document word vector. $Y_{km}$ denotes its k-th dimension value, and $k < P$;
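Formula (9) can be sketched as below. The patent only requires $k < P$. Choosing k by the share of variance retained (the cumulative sum of the leading eigenvalues) is common practice, but that rule is an assumption here and is left to the caller.

```python
def project(p_matrix, h, k):
    """Formula (9) sketch: Y = P_k H, where P_k holds the first k rows of P.
    Column m of the k x M result is the reduced document word vector Y_m."""
    if not 1 <= k <= len(p_matrix):
        raise ValueError("k must satisfy 1 <= k <= number of rows of P")
    p_k = p_matrix[:k]
    n_rows, n_cols = len(h), len(h[0])
    return [[sum(row[i] * h[i][col] for i in range(n_rows)) for col in range(n_cols)]
            for row in p_k]
```

Each reduced vector now has k coordinates instead of P, so the similarity computations in step 5 scale with k rather than with the vocabulary size.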
Step 5, within time window M, compute the cosine similarity $\cos\langle Y_m, Y_z\rangle$ between the m-th reduced document word vector $Y_m$ and the z-th reduced document word vector $Y_z$. This value represents the similarity $\mathrm{sim}\langle d_m, d_z\rangle$ between the m-th text content $d_m$ and the z-th text content $d_z$. The result is the set of similarities between $d_m$ and the other text contents, $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$;
Step 5.1: use formula (10) to compute the similarity between different document word vectors in the k-dimensional vector space:

$$\mathrm{sim}\langle d_m, d_z\rangle = \cos\langle Y_m, Y_z\rangle = \frac{\sum_{j=1}^{k} Y_{jm} Y_{jz}}{\sqrt{\sum_{j=1}^{k} Y_{jm}^2}\,\sqrt{\sum_{j=1}^{k} Y_{jz}^2}} \qquad (10)$$

In formula (10), $Y_{jm}$ denotes the j-th dimension value of the m-th reduced document word vector, and $Y_{jz}$ denotes the j-th dimension value of the z-th reduced document word vector. The indices satisfy $m, z = 1, 2, \dots, M$, $m \neq z$, and $j = 1, 2, \dots, k$;
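Formula (10) can be sketched as follows, with the reduced matrix Y stored as a list of k rows so that document m is column m. Returning 0.0 when either vector has zero length is an assumption. The patent does not address that degenerate case.

```python
import math

def cosine_similarity(y, m, z):
    """Formula (10) sketch: cosine between columns m and z of the k x M matrix Y."""
    dot = sum(row[m] * row[z] for row in y)
    norm_m = math.sqrt(sum(row[m] ** 2 for row in y))
    norm_z = math.sqrt(sum(row[z] ** 2 for row in y))
    if norm_m == 0.0 or norm_z == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_m * norm_z)
```

Because H was centred before projection, reduced coordinates can be negative. The similarity can therefore range over [-1, 1] rather than [0, 1].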
Step 5.2: compute the cosine similarity between each document word vector $Y_m$ and all other document word vectors. This forms M sets of M − 1 elements each, namely:

$$\{\mathrm{sim}\langle d_1, d_m\rangle \mid 1 < m \le M\},\ \{\mathrm{sim}\langle d_2, d_m\rangle \mid 1 \le m \le M,\ m \neq 2\},\ \dots,\ \{\mathrm{sim}\langle d_M, d_m\rangle \mid 1 \le m < M\}$$
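Step 5.2 can be sketched as below. It builds the M similarity sets, each with M − 1 entries. Here the reduced vectors are passed as a list of per-document vectors $Y_m$, an illustrative layout. Zero-length vectors are assumed to have similarity 0.

```python
import math

def similarity_sets(vectors):
    """Step 5.2 sketch: for each text m, collect sim<d_m, d_z> for all z != m,
    giving M lists of M - 1 values. vectors: list of reduced vectors Y_m."""
    def cos(u, v):
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        if nu == 0.0 or nv == 0.0:
            return 0.0  # assumption for zero-length vectors
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    count = len(vectors)
    return [[cos(vectors[m], vectors[z]) for z in range(count) if z != m]
            for m in range(count)]
```

Since cosine similarity is symmetric, a larger implementation could compute each pair once and fill both sets. That halves the work.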
Step 6, sort the similarities in the text similarity set $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$ in descending order and select the first L values, i.e. the L largest similarities. Take the L-th similarity value as the innovation of the m-th text content $d_m$, and normalize it to obtain the innovation score of $d_m$;
Step 6.1: sort the cosine similarity values in each set in descending order and select the L values with the highest similarity to the text. The smallest of these L values represents the innovation of the text. The innovation of document $d_j$ ($j = 1, 2, 3, \dots, M$) can be expressed as formula (11):

$$\mathrm{sim}_L(d_j), \qquad j = 1, 2, 3, \dots, M \qquad (11)$$

where $\mathrm{sim}_l(d_j)$, $l = 1, 2, \dots, L$, denotes the l-th largest value in the similarity set of $d_j$;
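Formula (11) can be sketched as below: sort one text's similarity set in descending order and return the L-th value. The reading that a lower value indicates a more novel text is presumably the intent: even the text's L-th nearest neighbour is then relatively dissimilar. The validation of L is an added assumption.

```python
def innovation_value(similarities, top_l):
    """Formula (11) sketch: sort one text's similarities in descending order,
    keep the L nearest neighbours, and return the L-th (smallest) of them."""
    if not 1 <= top_l <= len(similarities):
        raise ValueError("L must lie between 1 and M - 1")
    return sorted(similarities, reverse=True)[top_l - 1]
```

Using the L-th neighbour rather than the single nearest one makes the measure less sensitive to one near-duplicate text, in the spirit of the KNN idea mentioned above.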
Step 6.2: standardize the innovation results of the science and technology big data and convert them to percentages as shown in formula (12). This gives the innovation score of document $d_j$ ($j = 1, 2, 3, \dots, M$) within time window M. In formula (12), $\mathrm{sim}_{\max}$ is the maximum value of $\mathrm{sim}_l(d_j)$ over $j = 1, 2, 3, \dots, M$ and $l = 1, 2, 3, \dots, L$;
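A normalization for step 6.2 can be sketched as follows. The exact expression of formula (12) is not recoverable from this text, so the mapping below is HYPOTHETICAL, not the patent's. It is a min-max rescaling to a 0-100 percentage in which the text with the lowest L-th-neighbour similarity scores 100 and the text with the highest scores 0.

```python
def innovation_scores(values):
    """HYPOTHETICAL stand-in for formula (12): rescale each text's L-th-neighbour
    similarity to a 0-100 score, least similar text = 100, most similar = 0."""
    sim_max, sim_min = max(values), min(values)
    if sim_max == sim_min:
        return [100.0 for _ in values]  # all texts equally novel under this scheme
    return [100.0 * (sim_max - v) / (sim_max - sim_min) for v in values]
```

Any monotonically decreasing mapping of similarity would yield the same ranking in step 7. The choice matters only for how the percentages are read.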
Step 7, compute the innovation scores of all M document word vectors within time window M following steps 5 and 6. Sort them in descending order, which completes the innovation evaluation of the text content set of the science and technology big data to be evaluated.
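Step 7 can be sketched as a descending sort of the texts in one time window by score. The document identifiers are hypothetical. Python's sort is stable, so texts with equal scores keep their original order.

```python
def rank_by_innovation(doc_ids, scores):
    """Step 7 sketch: order the texts of one time window by innovation score,
    highest first. Returns (doc_id, score) pairs."""
    return sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
```

The head of the returned list contains the candidate texts for screening as the most innovative within the window.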

Claims (1)

1. An innovation evaluation method based on the text content of science and technology big data, characterized by comprising the following steps:
Step 1, acquire the text content set of the science and technology big data to be evaluated. Preprocess it by removing duplicate records and records with missing values. Then divide the preprocessed text content by generation time to obtain the time-stamped text content $\{d_1, d_2, \dots, d_m, \dots, d_M\}$, where $d_m$ denotes the m-th text content and M denotes the number of texts in the text content set;
Step 2, apply word segmentation, stop-word removal and merging of repeated words to the m-th text content $d_m$. This gives the segmented m-th text content $d'_m = \{w_1^m, w_2^m, \dots, w_n^m, \dots, w_{N_m}^m\}$, where $w_n^m$ denotes the n-th word of $d'_m$ and $N_m$ denotes the total number of distinct words in $d'_m$. All segmented words of the M text contents together form the corpus D;
Step 3, process the segmented text content with a TF-IDF model, extract the keywords of the science and technology big data to be evaluated, and construct the document word vectors;
Step 3.1, use formula (1) to compute the TF-IDF value $T_{nm}$ of the n-th word $w_n^m$ of the segmented m-th text content $d'_m$ in corpus D:

$$T_{nm} = \mathrm{TF}_{nm} \times \mathrm{IDF}_{nm} \qquad (1)$$

In formula (1), $\mathrm{TF}_{nm}$ denotes the word frequency of the n-th word $w_n^m$ in the segmented m-th text content $d'_m$. $\mathrm{IDF}_{nm}$ denotes the inverse document frequency of $w_n^m$ in corpus D;
Step 3.2, construct the document word vectors of the science and technology big data to be evaluated;
Merge the words repeated across the text contents in corpus D to obtain the merged corpus $D' = \{t_1, t_2, \dots, t_p, \dots, t_P\}$. Here $t_p$ denotes the p-th word and P denotes the total number of words in D'. Use formula (2) to compute the importance $X_{pm}$ of the p-th word $t_p$ in the segmented m-th text content $d'_m$:

$$X_{pm} = \begin{cases} T_{nm}, & t_p = w_n^m \in d'_m \\ 0, & t_p \notin d'_m \end{cases} \qquad (2)$$

This gives the word vector $X_m = (X_{1m}, X_{2m}, \dots, X_{pm}, \dots, X_{Pm})^\top$ of $d'_m$. Collecting these gives the word vectors $X = (X_1, X_2, \dots, X_m, \dots, X_M)$ of all documents within time window M, which form a sparse matrix;
step 4, performing dimension reduction on the sparse matrix X by using a principal component analysis method;
Step 4.1, zero-center each row of the sparse matrix X (subtract the row mean from every element of the row) to obtain the matrix H;
Step 4.2, compute the covariance matrix $C = \frac{1}{M} H H^\top$;
Step 4.3, compute the eigenvalues of the covariance matrix C and the corresponding unit orthogonal eigenvectors. Arrange the eigenvectors as the rows of a matrix P in descending order of their eigenvalues;
Step 4.4, multiply the matrix formed by the first k rows of P by the matrix H to obtain the reduced matrix $Y = (Y_1, Y_2, \dots, Y_m, \dots, Y_M)$. Here $Y_m = (Y_{1m}, Y_{2m}, \dots, Y_{km})^\top$ denotes the m-th reduced document word vector, $Y_{km}$ denotes its k-th dimension value, and $k < P$;
Step 5, within time window M, compute the cosine similarity $\cos\langle Y_m, Y_z\rangle$ between the m-th reduced document word vector $Y_m$ and the z-th reduced document word vector $Y_z$. This value represents the similarity $\mathrm{sim}\langle d_m, d_z\rangle$ between the m-th text content $d_m$ and the z-th text content $d_z$. The result is the set of similarities between $d_m$ and the other text contents, $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$;
Step 6, sort the similarities in the text similarity set $\{\mathrm{sim}\langle d_m, d_z\rangle \mid z = 1, 2, \dots, M,\ z \neq m\}$ in descending order and select the first L values, i.e. the L largest similarities. Take the L-th similarity value as the innovation of the m-th text content $d_m$, and normalize it to obtain the innovation score of $d_m$;
Step 7, compute the innovation scores of all M text contents within time window M following steps 5 and 6. Sort the text contents in descending order of score, which completes the innovation evaluation of the text content set of the science and technology big data to be evaluated.
CN202111489894.1A 2021-12-08 2021-12-08 Innovative evaluation method based on science and technology big data text content Active CN114154498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111489894.1A CN114154498B (en) 2021-12-08 2021-12-08 Innovative evaluation method based on science and technology big data text content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111489894.1A CN114154498B (en) 2021-12-08 2021-12-08 Innovative evaluation method based on science and technology big data text content

Publications (2)

Publication Number Publication Date
CN114154498A (en) 2022-03-08
CN114154498B (en) 2024-02-20

Family

ID=80453329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111489894.1A Active CN114154498B (en) 2021-12-08 2021-12-08 Innovative evaluation method based on science and technology big data text content

Country Status (1)

Country Link
CN (1) CN114154498B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885675A (en) * 2019-02-25 2019-06-14 Hefei University of Technology (合肥工业大学) Text sub-topic discovery method based on improved LDA
CN110597949A (en) * 2019-08-01 2019-12-20 Hubei University of Technology (湖北工业大学) Court similar case recommendation model based on word vectors and word frequency
CN111104794A (en) * 2019-12-25 2020-05-05 Tongfang Knowledge Network (Beijing) Technology Co., Ltd. (同方知网(北京)技术有限公司) Text similarity matching method based on subject words

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7720675B2 (en) * 2003-10-27 2010-05-18 Educational Testing Service Method and system for determining text coherence


Non-Patent Citations (1)

Title
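A scientific paper evaluation method based on text mining and bibliometrics (一种文本挖掘和文献计量的科技论文评估方法); Wang Lijun (王莉军); Yao Changqing (姚长青); Liu Zhihui (刘志辉); Information Science (情报科学); 2019-05-01 (No. 05); full text *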

Also Published As

Publication number Publication date
CN114154498A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
Goëau et al. Lifeclef bird identification task 2016: The arrival of deep learning
CN111401040B (en) Keyword extraction method suitable for word text
CN104966097A (en) Complex character recognition method based on deep learning
CN105843850B (en) Search optimization method and device
CN108804595B (en) Short text representation method based on word2vec
CN110765254A (en) Multi-document question-answering system model integrating multi-view answer reordering
CN110046264A (en) A kind of automatic classification method towards mobile phone document
CN112051986B (en) Code search recommendation device and method based on open source knowledge
Wolf et al. Computerized paleography: tools for historical manuscripts
CN108647729A (en) A kind of user&#39;s portrait acquisition methods
CN113688635B (en) Class case recommendation method based on semantic similarity
CN111813933A (en) Automatic identification method for technical field in technical atlas
CN103745242A (en) Cross-equipment biometric feature recognition method
CN109344248B (en) Academic topic life cycle analysis method based on scientific and technological literature abstract clustering
CN113342950B (en) Answer selection method and system based on semantic association
CN111984790B (en) Entity relation extraction method
CN114154498B (en) Innovative evaluation method based on science and technology big data text content
CN111242131B (en) Method, storage medium and device for identifying images in intelligent paper reading
CN116682015A (en) Feature decoupling-based cross-domain small sample radar one-dimensional image target recognition method
CN111221915B (en) Online learning resource quality analysis method based on CWK-means
CN114722183A (en) Knowledge pushing method and system for scientific research tasks
CN113657106B (en) Feature selection method based on normalized word frequency weight
Bria et al. Deep Transfer Learning for writer identification in medieval books
CN105404899A (en) Image classification method based on multi-directional context information and sparse coding model
CN115544361A (en) Frame for predicting change of attention point of window similarity analysis and analysis method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant