CN112650847B - Technological research hotspot theme prediction method - Google Patents


Info

Publication number: CN112650847B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201910961978.7A
Other languages: Chinese (zh)
Other versions: CN112650847A
Inventors: 谢能付 (Xie Nengfu), 郝心宁 (Hao Xinning), 熊炜 (Xiong Wei), 徐倩 (Xu Qian), 吴蕾 (Wu Lei), 梁晓贺 (Liang Xiaohe), 吴赛赛 (Wu Saisai)
Current assignee: Agricultural Information Institute of CAAS (as listed; may be inaccurate)
Original assignee: Agricultural Information Institute of CAAS
Priority date: 2019-10-11 (an assumption, not a legal conclusion)
Filing date: 2019-10-11
Publication date: 2023-05-09
Application filed by Agricultural Information Institute of CAAS (2019-10-11)
Priority to CN201910961978.7A (2019-10-11)
Publication of CN112650847A (2021-04-13)
Application granted
Publication of CN112650847B (2023-05-09)
Legal status: Active



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for predicting hot topics in scientific and technological research. Discipline literature is preprocessed according to a topic vocabulary for the research field of the topic to be predicted, yielding word-segmented documents for each year, and these are converted into a binary vector matrix. The binary vector matrix is processed with a frequent itemset mining algorithm to obtain a frequent topic set, which is filtered to obtain a hot topic set. The hot topic set is converted into time series data, a plurality of prediction models are trained on the time series data, and a topic prediction model is obtained from them by weighting. The topic prediction model then predicts the occurrence frequency of the topic to be predicted. Because words are filtered against a domain topic vocabulary, the method captures the characteristics of the research field well; because hot topics in the field are identified with a frequent itemset algorithm, hot topics in future periods can be predicted accurately.

Description

Technological research hotspot theme prediction method
Technical Field
The invention relates to the field of information processing, and in particular to a method for predicting hot topics in scientific and technological research.
Background
Most prior-art methods identify hot scientific topics by clustering, and some prediction methods rely only on high-frequency keywords. Such approaches cannot effectively predict hot scientific topics for a future period, so the accuracy of hot topic prediction is low.
Disclosure of Invention
The purpose of the invention is to provide a method for predicting hot topics in scientific and technological research that can accurately predict the occurrence frequency of hot topics.
To achieve the above object, the present invention provides the following solution:
A method for predicting hot topics in scientific and technological research, the method comprising:
determining, according to the topic to be predicted, a database for the corresponding scientific and technological research field, the database comprising discipline literature, network resources and expert knowledge;
constructing from the database a topic vocabulary in the form of a one-dimensional row vector;
preprocessing each year's discipline literature according to the topic vocabulary to obtain word-segmented discipline literature documents for the corresponding years;
using the topic vocabulary, obtaining a binary vector for each year according to whether the words of the topic vocabulary occur in that year's word-segmented documents, the binary vectors of all years forming a binary vector matrix;
processing the binary vector matrix with a frequent itemset mining algorithm to obtain a frequent topic set;
filtering the frequent topic set to obtain a hot topic set;
converting the hot topic set into time series data;
training a plurality of prediction models on the time series data and combining them by weighting to obtain a topic prediction model; and
predicting the occurrence frequency of the topic to be predicted with the topic prediction model.
Optionally, preprocessing each year's discipline literature according to the topic vocabulary to obtain the word-segmented discipline literature documents of the corresponding years specifically includes:
performing the following on each year's scientific literature:
splitting the scientific literature into sentences to obtain a corresponding sentence set; and
word-segmenting the sentence set according to the topic vocabulary to form the word-segmented discipline literature document of the corresponding year.
Optionally, if a word in the word-segmented discipline literature document appears in the topic vocabulary, it is marked 1, otherwise 0, forming the binary vector of the corresponding year.
Optionally, processing the binary vector matrix with a frequent itemset mining algorithm to obtain a frequent topic set specifically includes:
taking the binary vectors of each year's word-segmented documents as transactions, sorting the topic words in descending order of support, and deleting infrequent 1-itemsets to obtain an updated transaction dataset;
converting the transaction dataset into a transaction linked-list group, in which each transaction linked list stores the transactions that share the same head element;
updating the transaction linked-list group in ascending order of the support of the head elements to obtain an updated transaction linked-list group;
mining the updated transaction linked-list group to obtain the frequent topic set of the corresponding year; and
using the frequent topic set of the most recent year as the reference, counting for each topic word the number of years in which it occurs, keeping it if that number exceeds a threshold and deleting it otherwise, to obtain the frequent hot topic set.
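The year-count filter in the last step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's code: each year's frequent topic set is a set of topic-word tuples, "exceeds a threshold" is read as a strict `>`, and the names `filter_by_year_count`, `frequent_by_year` and `min_years` are hypothetical.

```python
def filter_by_year_count(frequent_by_year, min_years):
    """Keep topics of the most recent year's frequent set that appear as
    frequent topics in more than `min_years` years (hypothetical helper)."""
    last_year = max(frequent_by_year)              # most recent year is the reference
    reference = frequent_by_year[last_year]
    kept = set()
    for topic in reference:
        # Count the years whose frequent topic set contains this topic.
        years_present = sum(1 for topics in frequent_by_year.values() if topic in topics)
        if years_present > min_years:              # strict "exceeds" (an assumption)
            kept.add(topic)
    return kept


frequent_by_year = {
    2015: {("gene", "expression"), ("feed", "intak")},
    2016: {("gene", "expression"), ("cow", "dairi")},
    2017: {("gene", "expression"), ("cow", "dairi"), ("feed", "intak")},
}
print(filter_by_year_count(frequent_by_year, 2))   # → {('gene', 'expression')}
```

Here only ("gene", "expression") is frequent in all three years, so it alone survives a threshold of 2.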
Optionally, updating the transaction linked-list group in ascending order of the support of the head elements to obtain the updated transaction linked-list group specifically includes:
recursively scanning a transaction linked list to find frequent itemsets;
deleting that transaction linked list from the transaction linked-list group, and creating a transaction linked-list group that takes the head element of the transaction linked list as its prefix; and
merging the transaction linked-list group with the group prefixed by the head element of the transaction linked list to obtain the updated transaction linked-list group.
Optionally, filtering the frequent topic set to obtain a hot topic set specifically includes:
splitting the frequent topic set into related topic words;
constructing related topic phrases from the related topic words;
processing each related topic phrase as follows:
deleting repeated topic words to obtain a topic phrase without repetition;
deleting the sub-topic words of each topic word in the non-repeating topic phrase to obtain a hot topic phrase; and
forming the hot topic set from the hot topic phrases.
Optionally, converting the hot topic set into time series data specifically includes:
forming a vector set from the frequencies of the topics in the hot topic set in each year; and
arranging the values in the vector set by year in ascending order to form the time series data.
Optionally, the topic prediction model is: $\mathrm{freq}(X) = w_1 M_1(X) + w_2 M_2(X) + \cdots + w_j M_j(X) + \cdots + w_J M_J(X)$;
where X denotes the topic to be predicted, $\mathrm{freq}(X)$ denotes its occurrence frequency, $M_j(X)$ denotes the value predicted for X by model $M_j$, $w_j$ denotes the weight of the prediction of model $M_j$, and $j = 1, 2, \ldots, J$.
Based on the specific embodiments provided, the invention achieves the following technical effects. Discipline literature is preprocessed according to a topic vocabulary for the research field of the topic to be predicted to obtain word-segmented documents for each year, which are converted into a binary vector matrix; a frequent itemset mining algorithm yields a frequent topic set, which is filtered into a hot topic set; the hot topic set is converted into time series data, on which a plurality of prediction models are trained and combined by weighting into a topic prediction model that predicts the occurrence frequency of the topic to be predicted. Filtering words against the domain topic vocabulary captures the characteristics of the research field well, and identifying hot topics with a frequent itemset algorithm allows hot topics in future periods to be predicted accurately.
Drawings
To explain the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method for predicting hot topics in scientific and technological research according to the present invention;
FIG. 2 shows the binary vector matrix of the present invention;
FIG. 3 shows the time series data of the present invention;
FIG. 4 is a graph of the topic prediction values of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
The purpose of the invention is to provide a method for predicting hot topics in scientific and technological research that improves the accuracy of hot topic prediction.
To make the above objects, features and advantages of the invention more readily apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
Examples
As shown in FIG. 1, the method for predicting hot topics in scientific and technological research of the present invention comprises:
Step 101: determining, according to the topic to be predicted, a database for the corresponding scientific and technological research field, the database comprising discipline literature, network resources and expert knowledge;
Step 102: constructing from the database a topic vocabulary in the form of a one-dimensional row vector;
Step 103: preprocessing each year's discipline literature according to the topic vocabulary to obtain word-segmented discipline literature documents for the corresponding years;
Step 104: using the topic vocabulary, obtaining a binary vector for each year according to whether the words of the topic vocabulary occur in that year's word-segmented documents, the binary vectors of all years forming a binary vector matrix;
Step 105: processing the binary vector matrix with a frequent itemset mining algorithm to obtain a frequent topic set;
Step 106: filtering the frequent topic set to obtain a hot topic set;
Step 107: converting the hot topic set into time series data;
Step 108: training a plurality of prediction models on the time series data and combining them by weighting to obtain a topic prediction model;
Step 109: predicting the occurrence frequency of the topic to be predicted with the topic prediction model.
When collecting discipline literature A for the research field, the following requirements should be met: 1) the collection is the minimal set of literature that covers the research topics of the field; 2) each document contains at least a title, an abstract and keywords; 3) the data cover at least 10 years of literature.
Preprocessing each year's discipline literature according to the topic vocabulary to obtain the word-segmented discipline literature documents of the corresponding years specifically comprises:
performing the following on each year's scientific literature:
splitting the scientific literature into sentences to obtain a corresponding sentence set;
word-segmenting the sentence set according to the topic vocabulary to form the word-segmented discipline literature document of the corresponding year.
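The two preprocessing steps can be sketched in Python as follows. This is a minimal illustration under loud assumptions, not the patent's segmenter: the text is English, sentences end at ".", "!" or "?" followed by whitespace, vocabulary terms are lowercase with multi-word terms joined by "_" (as in "gene_expression" in the embodiment), matching is greedy longest-first, and tokens that match no term are dropped. The function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment(sentence, vocabulary):
    """Greedy longest-match segmentation against a vocabulary of lowercase terms.
    Multi-word terms use '_' between words; unmatched tokens are discarded."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    max_len = max(term.count("_") + 1 for term in vocabulary)
    result, i = [], 0
    while i < len(tokens):
        # Try the longest candidate term starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in vocabulary:
                result.append(candidate)
                i += n
                break
        else:
            i += 1  # no vocabulary term starts here; skip this token
    return result


def segment_document(text, vocabulary):
    """One list of vocabulary terms per sentence of the document."""
    return [segment(s, vocabulary) for s in split_sentences(text)]


vocab = {"gene_expression", "gene", "cow", "dairy", "feed_intake"}
text = "Gene expression in dairy cows was measured. Feed intake increased!"
print(segment_document(text, vocab))  # → [['gene_expression', 'dairy'], ['feed_intake']]
```

Note that "cows" is not matched because the toy vocabulary only contains "cow"; a real pipeline would normally stem or lemmatize tokens first.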
Each word in the topic vocabulary is then marked 1 if it occurs in the word-segmented document and 0 otherwise, yielding the binary vector of the corresponding year.
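A minimal Python sketch of the binary marking, assuming (as in FIG. 2 of the embodiment) that each row is a sentence and each column a vocabulary term. The alphabetical column order and the name `binary_matrix` are assumptions for illustration only.

```python
def binary_matrix(segmented_sentences, vocabulary):
    """Rows = sentences, columns = vocabulary terms in a fixed order;
    an entry is 1 if the term occurs in the sentence, else 0."""
    columns = sorted(vocabulary)                     # fixed column order (an assumption)
    index = {term: j for j, term in enumerate(columns)}
    matrix = []
    for tokens in segmented_sentences:
        row = [0] * len(columns)
        for term in tokens:
            if term in index:                        # ignore tokens outside the vocabulary
                row[index[term]] = 1
        matrix.append(row)
    return columns, matrix


columns, matrix = binary_matrix([["gene_expression", "dairy"], ["cow"]],
                                {"cow", "dairy", "gene_expression"})
print(columns)  # → ['cow', 'dairy', 'gene_expression']
print(matrix)   # → [[0, 1, 1], [1, 0, 0]]
```

Building one such matrix per year gives the yearly binary vectors that the later steps consume.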
In a specific implementation, the frequent topic set is obtained as follows:
taking the binary vectors of each year's word-segmented documents as transactions, sorting the topic words in descending order of support, and deleting infrequent 1-itemsets to obtain an updated transaction dataset;
converting the transaction dataset into a transaction linked-list group, in which each transaction linked list stores the transactions that share the same head element;
updating the transaction linked-list group in ascending order of the support of the head elements to obtain an updated transaction linked-list group;
mining the updated transaction linked-list group to obtain the frequent topic set of the corresponding year;
using the most recent frequent topic set as the reference, counting for each topic word the number of years in which it occurs, keeping it if that number exceeds a threshold and deleting it otherwise, to obtain the frequent hot topic set.
The updated transaction linked-list group is obtained as follows: recursively scan a transaction linked list to find frequent itemsets; delete that transaction linked list from the transaction linked-list group, and create a transaction linked-list group that takes the head element of the transaction linked list as its prefix; then merge the transaction linked-list group with the group prefixed by that head element to obtain the updated transaction linked-list group.
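The transaction linked-list procedure above follows the pattern of Borgelt's RElim (recursive elimination) algorithm, which the embodiment names in its fifth step. The Python sketch below is one minimal reading of it, not the patent's implementation. Assumptions: each transaction is the set of topic words marked 1 in one sentence row; items are ranked by ascending support so that the rarest item heads each list (the text sorts by descending support, but any fixed order yields the same frequent itemsets); Python lists of `(suffix, weight)` pairs stand in for linked lists; and the names `relim` and `_eliminate` are hypothetical.

```python
from collections import Counter


def relim(transactions, min_support):
    """Frequent itemset mining by recursive elimination.
    Returns {frozenset(itemset): support} for all itemsets with support >= min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in counts.items() if c >= min_support}
    # Fixed item order: ascending support, ties broken by name.
    order = sorted(frequent, key=lambda i: (counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Initial group of lists: head item -> [(remaining items, weight), ...];
    # infrequent 1-itemsets are dropped here.
    db = {}
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent), key=rank.__getitem__)
        if items:
            db.setdefault(items[0], []).append((tuple(items[1:]), 1))
    result = {}
    _eliminate(db, (), min_support, rank, result)
    return result


def _eliminate(db, prefix, min_support, rank, result):
    while db:
        item = min(db, key=rank.__getitem__)      # list whose head item ranks lowest
        entries = db.pop(item)                    # delete this list from the group
        support = sum(w for _, w in entries)
        if support >= min_support:
            itemset = prefix + (item,)
            result[frozenset(itemset)] = support
            # New group prefixed by `item`: suffixes regrouped by their own head.
            cond = {}
            for suffix, w in entries:
                if suffix:
                    cond.setdefault(suffix[0], []).append((suffix[1:], w))
            _eliminate(cond, itemset, min_support, rank, result)
        # Merge: move every suffix into the list of its next head item.
        for suffix, w in entries:
            if suffix:
                db.setdefault(suffix[0], []).append((suffix[1:], w))


sentences = [{"gene", "expression"}, {"gene", "expression", "cow"},
             {"cow", "dairy"}, {"gene", "expression"}, {"cow", "dairy", "feed"}]
for itemset, support in sorted(relim(sentences, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a minimum support of 2, "feed" is removed up front and six frequent itemsets remain, including {"gene", "expression"} with support 3 and {"cow", "dairy"} with support 2.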
The hot topic set is obtained as follows:
splitting the frequent topic set into related topic words;
constructing related topic phrases from the related topic words;
processing each related topic phrase as follows:
deleting repeated topic words to obtain a topic phrase without repetition;
deleting the sub-topic words of each topic word in the non-repeating topic phrase to obtain a hot topic phrase;
forming the hot topic set from the hot topic phrases.
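The text does not define "sub-topic" precisely. Under one plausible reading, assumed here, a sub-topic is a topic whose words form a proper subset of another topic's words, so the filtering reduces to removing duplicates and keeping only maximal topics. A minimal Python sketch; `remove_subtopics` is a hypothetical name.

```python
def remove_subtopics(frequent_topics):
    """Deduplicate topics (word order ignored) and drop every topic whose
    word set is a proper subset of another topic's word set."""
    unique = {frozenset(t) for t in frequent_topics}          # delete repeated topics
    return {t for t in unique if not any(t < other for other in unique)}


topics = [("animal", "associ", "behavior"), ("animal", "behavior"),
          ("gene", "expression"), ("expression", "gene"), ("gene",)]
print(sorted(sorted(t) for t in remove_subtopics(topics)))
# → [['animal', 'associ', 'behavior'], ['expression', 'gene']]
```

Comparing sets with `<` tests for a proper subset, so two identical topics never remove each other; they are merged by the preceding deduplication instead.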
To convert the hot topic set into time series data, a vector set is formed from the frequencies of the topics in the hot topic set in each year;
the values in the vector set are then arranged by year in ascending order to form the time series data.
For a topic consisting of a single topic word, its frequency in a given year is taken directly from that year's binary vector matrix as the number of rows in which the word's entry is 1, i.e., the number of sentences in that year in which the topic word occurs. For a topic composed of several topic words, its frequency in a given year is the number of sentences in which all of its topic words occur together, again computed directly from the binary vector matrix. The topic vector is written (F0, F1, …, Fi, …, Fn), where F0 denotes the frequency of the topic in the starting year, Fi its frequency in the starting year + i, and Fn its frequency in the ending year; that is, the components are ordered by year in ascending order.
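The frequency rule above can be sketched in Python as follows. Assumptions for illustration: each year has a `(columns, matrix)` pair with sentences as rows and vocabulary terms as columns, and the names `topic_frequency` and `topic_time_series` are hypothetical.

```python
def topic_frequency(columns, matrix, topic_words):
    """Number of sentences (rows) in which every word of the topic is marked 1."""
    idx = [columns.index(w) for w in topic_words]
    return sum(1 for row in matrix if all(row[j] == 1 for j in idx))


def topic_time_series(yearly, topic_words):
    """yearly: {year: (columns, matrix)}. Returns (F0, ..., Fn) in ascending year order."""
    series = []
    for year in sorted(yearly):
        columns, matrix = yearly[year]
        series.append(topic_frequency(columns, matrix, topic_words))
    return tuple(series)


cols = ["cow", "dairy", "gene"]
yearly = {
    2017: (cols, [[1, 1, 1], [1, 1, 0]]),
    2016: (cols, [[1, 1, 0], [1, 0, 1], [0, 0, 1]]),
}
print(topic_time_series(yearly, ["cow", "dairy"]))  # → (1, 2)
print(topic_time_series(yearly, ["gene"]))          # → (2, 1)
```

A single-word topic simply sums its column; a multi-word topic counts only rows where all of its columns are 1.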
The occurrence frequency of the topic to be predicted is then predicted with the prediction model $\mathrm{freq}(X) = w_1 M_1(X) + w_2 M_2(X) + \cdots + w_j M_j(X) + \cdots + w_J M_J(X)$,
where X denotes the topic to be predicted, $\mathrm{freq}(X)$ denotes its occurrence frequency, $M_j(X)$ denotes the value predicted for X by model $M_j$, $w_j$ denotes the weight of the prediction of model $M_j$, and $j = 1, 2, \ldots, J$.
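The weighted combination can be sketched in Python as follows. The embodiment trains linear regression, a support vector machine, RBF regression and an RBF neural network; to keep the sketch self-contained, three simple stand-in forecasters are swapped in here (a least-squares linear trend, the last observed value, and a mean of the last k values). These stand-ins and all function names are assumptions for illustration, not the patent's models.

```python
def linear_trend(series):
    """Least-squares line through (t, F_t), t = 0..n-1; returns its value at t = n."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return y_mean + slope * (n - t_mean)


def last_value(series):
    """Naive forecast: next value equals the last observed value."""
    return float(series[-1])


def moving_average(series, k=3):
    """Mean of the last k observations."""
    window = series[-k:]
    return sum(window) / len(window)


def ensemble_predict(series, models, weights):
    """freq(X) = w_1*M_1(X) + ... + w_J*M_J(X) for forecasters M_j."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    return sum(w * m(series) for m, w in zip(models, weights))


series = (10, 12, 14, 16)                       # F0..F3 for one topic
models = [linear_trend, last_value, moving_average]
print(ensemble_predict(series, models, [1 / 3] * 3))
```

With equal weights, as the embodiment uses (1/4 for its four models), the forecasts 18, 16 and 14 average to about 16. Unequal weights could favour models that performed better on held-out years.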
Potential hot topics can also be predicted with the method described above. A potential hot topic is a hot word that is not in RT but may become hot in the future; predicting it helps reveal how the discipline's hot spots will develop in the next year.
First, a time series vector X = (x1, x2, …, xn) is derived from the word's yearly occurrence frequency in the dataset, and its correlation with the time series vectors of all topics in RT is computed. The invention measures the correlation between two time series vectors with the Pearson correlation coefficient, as follows.
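Assume there are two vectors $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$. The Pearson correlation coefficient between X and Y is
$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
where the covariance between X and Y is defined as
$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
and the variances are defined as
$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$$
so the correlation coefficient can be written as
$$r_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the means of X and Y.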
The correlation coefficient always lies between -1 and +1. The closer the value is to +1, the stronger the positive correlation between the two variables; the closer it is to -1, the stronger the negative correlation.
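A minimal Python sketch of the correlation step: compute the Pearson coefficient of the candidate word's series against each topic series and keep the best match. The names `pearson` and `most_correlated`, and the representation of RT as a dict of name to series, are assumptions. (Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.)

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)          # the 1/n factors cancel


def most_correlated(candidate, topic_series):
    """Name of the topic whose time series correlates most strongly (positively)."""
    return max(topic_series, key=lambda name: pearson(candidate, topic_series[name]))


rt = {"a": [3, 2, 1], "b": [2, 3, 5]}
print(most_correlated([1, 2, 4], rt))  # → b
```

Series "b" is the candidate shifted by +1, so its coefficient is 1, while "a" is negatively correlated.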
Based on these results, the invention takes the topic TP with the highest correlation with the potential hot topic as the basis of calculation and predicts the occurrence frequency of the potential hot topic with $F = TP_R \times P_L / TP_L$, where $TP_R$ is the predicted value of TP for the next year, $TP_L$ is the last component of TP's time series vector (i.e., TP's frequency in the last year of the collected data), and $P_L$ is the frequency of the topic to be predicted in that last year.
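The scaling formula is a one-liner; the Python sketch below (hypothetical name `predict_potential`) just guards against a zero denominator. It reads the formula as applying TP's predicted growth ratio $TP_R / TP_L$ to the potential topic's last-year frequency.

```python
def predict_potential(tp_next, tp_last, p_last):
    """F = TP_R * P_L / TP_L: scale the potential topic's last-year frequency
    by the growth ratio predicted for its most-correlated hot topic TP."""
    if tp_last == 0:
        raise ValueError("TP_L must be non-zero")
    return tp_next * p_last / tp_last


print(round(predict_potential(612, 536, 146), 1))  # → 166.7
```

With the embodiment's figures ($TP_R = 612$, $TP_L = 536$, $P_L = 146$), the sketch returns about 166.7.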
The following is a specific embodiment of this solution:
Step 1: construct the topic vocabulary Dict for the specific discipline from discipline literature, network resources and expert knowledge.
Step 2: taking English literature as an example, select 36 journals representative of the animal genetics and breeding field and collect 73,990 documents published from 2000 to 2017.
Step 3: filter out irrelevant documents, leaving 71,990 documents; split each article into sentences and word-segment them according to the topic vocabulary Dict to form a sentence document set.
Step 4: convert the result of Step 3 into the binary vector matrix of the documents. Each sentence is one record: a word that appears in the sentence is marked 1, otherwise 0, as shown in FIG. 2, where each row is a sentence and each column is a word.
Step 5: on the basis of Step 4, identify hot topics with the frequent itemset mining algorithm RElim. The sentence vectors of all documents are taken as transactions and a minimum support threshold MinSupport is set. Under this threshold, the discipline's hot topic set St = {"animal_association_behavior", "animal_behavio", "concentrate_plasma", "cow_dairy", "feed_intak", … "gene_expression"} is obtained.
Step 6: on the basis of Step 5, filter out repeated topics, finally forming the hot topic set St' = {"animal_association_behavior", "concentrate_plasma", "cow_dairy", "feed_intak", … "gene_expression"}.
Step 7: generate the time series data shown in FIG. 3: for each topic in St', its frequencies by year form a collection.
Step 8: build the hot topic prediction model. Linear regression, a support vector machine, radial basis function regression and a radial basis function neural network are each trained on the time series data, and each model is given a weight of 1/4. The predicted value is $\mathrm{freq}(\mathit{TopicWord}) = w_1 M_1 + w_2 M_2 + w_3 M_3 + w_4 M_4$, with every $w_i = 1/4$.
Step 9: predict the hot topics, i.e., use freq(TopicWord) to compute the likely occurrence frequencies of the hot topic set St' and thereby learn how these hot topics will change.
Step 10: predict potential hot topics. For a hot word Wordp that is not in St', a user may want to learn its future hot-spot status for business needs. According to formula 1 (the Pearson correlation coefficient), the topic "concentrate_plasma" is found to be most correlated with the predicted term "gene_expression". The occurrence frequency of "gene_expression" was 536 in 2017 and its predicted value for 2018 is 612; the occurrence frequency of "concentrate_plasma" was 146 in 2017, so its predicted value for 2018 is 146 × 612 / 536 ≈ 167. This indicates that attention to the topic "concentrate_plasma" is increasing.
The principles and embodiments of the invention have been explained with specific examples, which are intended only to help understand the method of the invention and its core ideas. A person of ordinary skill in the art may, following the ideas of the invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (8)

1. A method for predicting hot topics in scientific and technological research, characterized in that the method comprises:
determining, according to the topic to be predicted, a database for the corresponding technical research field, the database comprising discipline literature, network resources and expert knowledge;
constructing from the database a topic vocabulary in the form of a one-dimensional row vector;
preprocessing each year's discipline literature according to the topic vocabulary to obtain word-segmented discipline literature documents for the corresponding years;
using the topic vocabulary, obtaining a binary vector for each year according to whether the words of the topic vocabulary occur in that year's word-segmented documents, the binary vectors of all years forming a binary vector matrix;
processing the binary vector matrix with a frequent itemset mining algorithm to obtain a frequent topic set;
filtering the frequent topic set to obtain a hot topic set;
converting the hot topic set into time series data;
training a plurality of prediction models on the time series data and combining them by weighting to obtain a topic prediction model; and
predicting the occurrence frequency of the topic to be predicted with the topic prediction model.
2. The method for predicting hot topics in scientific and technological research according to claim 1, wherein preprocessing each year's discipline literature according to the topic vocabulary to obtain the word-segmented discipline literature documents of the corresponding years specifically comprises:
performing the following on each year's scientific literature:
splitting the scientific literature into sentences to obtain a corresponding sentence set; and
word-segmenting the sentence set according to the topic vocabulary to form the word-segmented discipline literature document of the corresponding year.
3. The method for predicting hot topics in scientific and technological research according to claim 1, wherein if a word in the word-segmented discipline literature document appears in the topic vocabulary, it is marked 1, otherwise 0, forming the binary vector of the corresponding year.
4. The method for predicting hot topics in scientific and technological research according to claim 1, wherein processing the binary vector matrix with a frequent itemset mining algorithm to obtain a frequent topic set specifically comprises:
taking the binary vectors of each year's word-segmented documents as transactions, sorting the topic words in descending order of support, and deleting infrequent 1-itemsets to obtain an updated transaction dataset;
converting the transaction dataset into a transaction linked-list group, in which each transaction linked list stores the transactions that share the same head element;
updating the transaction linked-list group in ascending order of the support of the head elements to obtain an updated transaction linked-list group;
mining the updated transaction linked-list group to obtain the frequent topic set of the corresponding year; and
using the frequent topic set of the most recent year as the reference, counting for each topic word the number of years in which it occurs, keeping it if that number exceeds a threshold and deleting it otherwise, to obtain the frequent hot topic set.
5. The method for predicting hot topics in scientific and technological research according to claim 4, wherein updating the transaction linked-list group in ascending order of the support of the head elements to obtain the updated transaction linked-list group specifically comprises:
recursively scanning a transaction linked list to find frequent itemsets;
deleting that transaction linked list from the transaction linked-list group, and creating a transaction linked-list group that takes the head element of the transaction linked list as its prefix; and
merging the transaction linked-list group with the group prefixed by the head element of the transaction linked list to obtain the updated transaction linked-list group.
6. The method for predicting hot topics in scientific and technological research according to claim 1, wherein filtering the frequent topic set to obtain a hot topic set specifically comprises:
splitting the frequent topic set into related topic words;
constructing related topic phrases from the related topic words;
processing each related topic phrase as follows:
deleting repeated topic words to obtain a topic phrase without repetition;
deleting the sub-topic words of each topic word in the non-repeating topic phrase to obtain a hot topic phrase; and
forming the hot topic set from the hot topic phrases.
7. The method for predicting hot topics in scientific and technological research according to claim 1, wherein converting the hot topic set into time series data specifically comprises:
forming a vector set from the frequencies of the topics in the hot topic set in each year; and
arranging the values in the vector set by year in ascending order to form the time series data.
8. The method for predicting hot topics in scientific and technological research according to claim 1, wherein the topic prediction model is: $\mathrm{freq}(X) = w_1 M_1(X) + w_2 M_2(X) + \cdots + w_j M_j(X) + \cdots + w_J M_J(X)$;
where X denotes the topic to be predicted, $\mathrm{freq}(X)$ denotes its occurrence frequency, $M_j(X)$ denotes the value predicted for X by model $M_j$, $w_j$ denotes the weight of the prediction of model $M_j$, and $j = 1, 2, \ldots, J$.

Priority Applications (1)

Application Number: CN201910961978.7A; Priority Date: 2019-10-11; Filing Date: 2019-10-11; Title: Technological research hotspot theme prediction method; Publication: CN112650847B

Applications Claiming Priority (1)

Application Number: CN201910961978.7A; Priority Date: 2019-10-11; Filing Date: 2019-10-11; Title: Technological research hotspot theme prediction method; Publication: CN112650847B

Publications (2)

Publication Number Publication Date
CN112650847A CN112650847A (en) 2021-04-13
CN112650847B true CN112650847B (en) 2023-05-09

Family

ID=75343740

Country Status (1)

Country Link
CN (1) CN112650847B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164540A (en) * 2013-04-15 2013-06-19 Wuhan University Patent hotspot discovery and trend analysis method
CN106021222A (en) * 2016-05-09 2016-10-12 Zhejiang A&F University Analysis method and device for the theme evolution of scientific research literature
CN106682172A (en) * 2016-12-28 2017-05-17 Jiangsu University Keyword-based method for recommending literature research hotspots
CN107038156A (en) * 2017-04-28 2017-08-11 Beijing Qingbo Big Data Technology Co., Ltd. Big-data-based method for predicting public opinion hotspots
CN107193797A (en) * 2017-04-26 2017-09-22 Tianjin University Hot topic detection and trend prediction method for Chinese microblogs
CN107885727A (en) * 2017-11-13 2018-04-06 Chengdu Lanjing Information Technology Co., Ltd. Social hotspot discovery method based on a machine learning model
CN107992976A (en) * 2017-12-15 2018-05-04 Communication University of China System and method for predicting the early-stage development trend of hot topics
CN108959378A (en) * 2018-05-28 2018-12-07 Tianjin University Visual analysis method for literature hotspots
CN110188263A (en) * 2019-05-29 2019-08-30 Electric Power Research Institute of State Grid Shandong Electric Power Company Scientific research hotspot prediction method and system for heterogeneous time spans

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
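Hot Topic-Aware Retweet Prediction with Masked Self-attentive Model; Renfeng Ma et al.; SIGIR '19; pp. 525-534 *
Analysis of research hotspot topics in foreign agricultural research projects based on text mining; Nie Xiuping et al.; Acta Agriculturae Jiangxi; Vol. 30, No. 7; pp. 102-106 *
Prediction of hotspot trends in the animal genetics and breeding discipline based on machine learning methods; Nie Xiuping et al.; Agricultural Outlook; Vol. 16, No. 1; pp. 101-105 *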

Also Published As

Publication number Publication date
CN112650847A (en) 2021-04-13

Similar Documents

Publication Publication Date Title
Abualigah et al. Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering
Luan et al. Scientific information extraction with semi-supervised neural tagging
US8019699B2 (en) Machine learning system
Rahmawati et al. Word2vec semantic representation in multilabel classification for Indonesian news article
CN111949759A (en) Method and system for retrieving medical record text similarity and computer equipment
CN108717411B (en) Questionnaire design auxiliary system based on big data
US11481560B2 (en) Information processing device, information processing method, and program
CN110390014B (en) Theme mining method and device and storage medium
CN110377695B (en) Public opinion theme data clustering method and device and storage medium
CN110674301A (en) Emotional tendency prediction method, device and system and storage medium
CN111125315B (en) Technical trend prediction method and system
Bhutada et al. Semantic latent dirichlet allocation for automatic topic extraction
CN112765961A (en) Fact verification method and system based on entity graph neural network inference
Alsaidi et al. English poems categorization using text mining and rough set theory
Luan Information extraction from scientific literature for method recommendation
Liu et al. Sent2Span: span detection for PICO extraction in the biomedical text without span annotations
Sun et al. Twitter part-of-speech tagging using pre-classification Hidden Markov model
CN111125329B (en) Text information screening method, device and equipment
CN112650847B (en) Technological research hotspot theme prediction method
CN111563361A (en) Text label extraction method and device and storage medium
CN110609997B (en) Method and device for generating abstract of text
Yoon et al. Efficient implementation of associative classifiers for document classification
CN107729509A (en) Document similarity determination method based on implicit high-dimensional distributed feature representation
CN113495964A (en) Method, device and equipment for screening triples and readable storage medium
Parsafard et al. Text classification based on discriminative-semantic features and variance of fuzzy similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant