CN111339286B - Method for exploring mechanism research conditions based on theme visualization - Google Patents

Method for exploring mechanism research conditions based on theme visualization

Info

Publication number
CN111339286B
CN111339286B CN202010092905.1A CN202010092905A CN111339286B CN 111339286 B CN111339286 B CN 111339286B CN 202010092905 A CN202010092905 A CN 202010092905A CN 111339286 B CN111339286 B CN 111339286B
Authority
CN
China
Prior art keywords
topic
word
document
research
idf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010092905.1A
Other languages
Chinese (zh)
Other versions
CN111339286A (en)
Inventor
秦红星
曹鑫霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Southern Wanfang Data Co.,Ltd.
Original Assignee
Sichuan Chaoyihong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Chaoyihong Technology Co ltd
Priority to CN202010092905.1A
Publication of CN111339286A
Application granted
Publication of CN111339286B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for exploring the research status of an institution based on topic visualization, and belongs to the technical field of visualization. The method comprises the following steps: S1, acquiring and preprocessing research data, namely determining the institution to be studied, acquiring SCI academic literature data of that institution, extracting the required research fields, and preprocessing the resulting research corpus; S2, processing the selected corpus with TF-IDF feature extraction and LDA topic-model text mining, extracting hot research topics and their topic words, and clustering the academic literature by topic; and S3, presenting the clustered topics together with other dimensions of the academic literature data in visual form, and analyzing the results from multiple dimensions. The invention helps to grasp and track the development of an institution's current research, so that researchers can better capture the frontiers and hot spots of a discipline and avoid duplicated research.

Description

Method for exploring mechanism research conditions based on theme visualization
Technical Field
The invention belongs to the technical field of visualization, and relates to a method for exploring the research status of an institution based on topic visualization.
Background
In recent years the number of scientific researchers has grown rapidly, and with the wide application of computer networks and information technology, academic literature has become increasingly massive, diverse and immediate, so that the development trend of research hot spots can no longer be tracked and processed manually. Visual analysis is an emerging technology that has grown out of information visualization and scientific visualization. It is an effective means for people to understand and interpret large-scale, complex situations, and uses visualization algorithms to build graphical models that display multi-dimensional or high-dimensional data. Combined with human-computer interaction, a visual model also supports dynamic, multi-angle analysis.
Topic-model-based analysis of literature hot spots is an important method for exploring the state of research in a field. It is mainly performed by analyzing the academic literature or patents published in that field, the academic literature being an important embodiment of the field's research development. At present, research on literature analysis mostly improves the topic model and then displays multi-scale information related to the field's topic model in visual form, or designs interactive operations on the multi-scale information of the topic model.
The academic literature published by an institution carries its research results across various disciplines. Scientific research today tends to be multipolar, and research topics are numerous, varied and disorganized. There are many researchers, and each research institution has a different emphasis. By modeling the topics of an institution's academic literature and combining them with information from several other dimensions, visual analysis makes it possible to understand and track the institution's current research status and development.
Disclosure of Invention
In view of the above, the present invention aims to provide a method for exploring the research status of an institution based on topic visualization, addressing the problem that existing topic-model visual analysis systems do not examine the research status of a particular institution.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a method for exploring the research status of an institution based on topic visualization, the method comprising the following steps:
S1: acquisition and processing of research data:
determining the institution to be studied, and acquiring SCI academic literature data of that institution;
extracting the required research fields;
preprocessing the extracted fields;
S2: processing the selected corpus using TF-IDF feature extraction and LDA topic model analysis:
extracting TF-IDF features from the preprocessed data, and establishing a feature vector space model of the whole corpus;
establishing a topic model with the LDA algorithm from the feature vector space model generated from the corpus, solving the topic model by Gibbs sampling, and outputting and storing a topic-word matrix;
performing cluster analysis on the output topic-word matrix, and storing and outputting the clustering result;
S3: presenting the clustered topics and other dimensions of the academic literature in visual form, and analyzing the results from multiple dimensions:
using a theme river graph, a word cloud and a line chart to display, respectively, the change of topic strength over time, the research field represented by each topic, and the change in how often each topic is cited;
using a tree diagram and a bar chart to display, respectively, the hierarchical structure under each topic and the weight of each branch institution in the academic influence of the topic;
using a scatter plot and a line chart to display, respectively, the change in strength and the research trend of different topics within each branch institution, and thereby identifying disciplinary strengths.
Optionally, in step S1, the institution to be studied is determined on the basis of SCI academic literature from the last five years whose author addresses include the institution.
Optionally, in step S1, regular-expression matching is used, and the retained data includes the title, authors, time, citation frequency, keywords and abstract of each document; the corpus includes keywords, topics and article abstracts.
Optionally, in step S1, the preprocessing includes cleaning and denoising, English text tokenization, stop-word removal and stemming.
Optionally, in step S2, the TF-IDF algorithm is used to extract features and generate a text vector space, which specifically includes the following operations:
TF represents the number of times a word appears in a document, and IDF reflects how many documents in the document set contain the word; multiplying TF by IDF gives the importance of a specific word to a document. This importance is calculated for all dimensions (words) of each document, generating the TF-IDF feature vector of each document:
Feature-Vector = {f_1, f_2, f_3, …, f_n}    (1)
In formula (1), each TF-IDF feature of a document is calculated as:
f_i = tf(w_i, d_i) * idf(w_i, D)    (2)
In formula (2), the tf value is calculated as:
tf(w_i, d_i) = n_i / Σ_k n_k
where n_i is the number of times the word w_i appears in document d_i, and the denominator Σ_k n_k is the total number of occurrences of all words in document d_i;
the idf value in formula (2) is calculated as:
idf(w_i, D) = log( |D| / |{d_i ∈ D : w_i ∈ d_i}| )
where |D| is the total number of documents in the document set D, d_i is a particular document, and w_i is a particular word, namely a feature;
the TF-IDF feature vectors of all documents in the set constitute a (word, tfidf) matrix, which is the feature vector space model of the documents.
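As a worked illustration of formula (2), consider a word that appears 3 times in a 100-word document and occurs in 10 of 1000 documents. These counts are hypothetical, and the sketch assumes the usual definitions (tf as a relative frequency, idf as the natural logarithm of the inverse document fraction), which the text does not spell out:

```latex
\mathrm{tf}(w_i, d_i) = \frac{3}{100} = 0.03, \qquad
\mathrm{idf}(w_i, D) = \ln\frac{1000}{10} = \ln 100 \approx 4.605, \qquad
f_i = 0.03 \times 4.605 \approx 0.138
```

A word that occurs in every document would get idf = ln 1 = 0 under this definition, so it contributes nothing to the feature vector however often it appears.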
Optionally, in step S2, the topic-word matrix is obtained with the LDA algorithm, based on the bag-of-words model, which specifically includes the following operations:
LDA assumes that documents are produced from a mixture of topics, each document being generated as follows:
generating the length N of the document from a Poisson distribution with global parameter β;
generating the topic distribution θ of the current document from a Dirichlet distribution with global parameter α;
for each of the N words of the current document: generating a topic index z_n from a multinomial distribution with parameter θ, and generating a word w_n from a multinomial distribution parameterized jointly by θ and z.
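The generative process above can be written compactly as follows. This is a sketch in standard LDA notation rather than the patent's own formula: it assumes the word is drawn from the word distribution of the selected topic, written here as \varphi_{z_n}, which corresponds to the per-topic word distribution φ_k estimated during training.

```latex
\begin{aligned}
N      &\sim \mathrm{Poisson}(\beta) \\
\theta &\sim \mathrm{Dirichlet}(\alpha) \\
z_n    &\sim \mathrm{Multinomial}(\theta), \qquad n = 1, \dots, N \\
w_n    &\sim \mathrm{Multinomial}(\varphi_{z_n})
\end{aligned}
```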
Training process (Gibbs sampling):
randomly assigning a topic number z to each word w of each document;
counting the number of occurrences of each word w under each topic z_i, and the number of words w assigned to topic z_i in each document n;
excluding, in turn, the topic assignment z_i of the current word w, and estimating from the topic assignments of all other words the probability that the current word w is assigned to each topic z_1, z_2, …, z_k, i.e. calculating p(z_i | z_-i, d, w); after obtaining the probability distribution of the current word over all topics z_1, z_2, …, z_k, resampling a new topic z_1 for the word; updating the topic of the next word in the same way, and continuing until the topic distribution θ_n of each document and the word distribution φ_k of each topic converge;
finally, outputting the estimated parameters θ_n and φ_k, and obtaining the topic z_k,n of each word.
Training the LDA model by Gibbs sampling yields a topic-word co-occurrence frequency matrix.
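The text does not give the closed form of p(z_i | z_-i, d, w). For reference, the update commonly used in collapsed Gibbs sampling for LDA is sketched below. The symbols are assumptions introduced here: n_{d,k}^{-i} is the number of words in document d assigned to topic k, n_{k,w}^{-i} is the number of times word w is assigned to topic k, n_k^{-i} is the total number of words assigned to topic k (all excluding the current word), V is the vocabulary size, and η is a symmetric Dirichlet prior on the topic-word distributions.

```latex
p(z_i = k \mid z_{-i}, d, w) \;\propto\;
\left(n_{d,k}^{-i} + \alpha\right)
\frac{n_{k,w}^{-i} + \eta}{n_{k}^{-i} + V\eta},
\qquad
\theta_{d,k} = \frac{n_{d,k} + \alpha}{\sum_{k'} n_{d,k'} + K\alpha},
\qquad
\varphi_{k,w} = \frac{n_{k,w} + \eta}{n_{k} + V\eta}
```

Here K denotes the number of topics. The count matrix n_{k,w} plays the role of the topic-word co-occurrence frequency matrix mentioned above.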
Optionally, in step S2, cluster analysis is performed on the obtained LDA topic model to obtain academic literature clustering data, which includes the academic documents contained in each topic cluster and the keywords contained in each topic.
Optionally, in step S3, the information of other dimensions includes: time, author, citation frequency, and the branch institution to which the author belongs.
Optionally, in step S3, the results are analyzed from multiple dimensions using the D3.js visual analysis technique, which specifically includes the following operations:
a theme river graph, a word cloud and a line chart are used to display, respectively, the change of topic strength over time, the research field represented by each topic, and the change in each topic's citation frequency, giving an overview of the institution's research as a whole; a tree diagram and a bar chart display, respectively, the hierarchical structure under each topic and the weight of each branch institution in the topic's academic influence, revealing the relationship between topics and branch institutions; a scatter plot and a line chart display, respectively, the change in strength and the research trend of different topics within each branch institution, identifying disciplinary strengths.
The invention has the beneficial effects that:
The topic-visualization-based exploration of institutional research status builds on data visualization and combines visual analysis with topic modeling. It displays the relationships within the data intuitively through visual symbols, deepening the user's understanding of the patterns contained in the literature data. The research object of the invention is the SCI academic literature of a specific institution, so the data is reasonably representative. A time dimension is added to the topic elements obtained by topic modeling, so that the evolution of the institution's research content over time can be analyzed; the frequency of each topic is analyzed to obtain the institution's level of attention to the research field that the topic represents, so that the future research trend of that field can be analyzed and explained effectively; and the research status of each branch institution is analyzed from multiple dimensions, helping researchers to follow the development of a discipline in time, make timely decisions and avoid duplicated research.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of the topic-visualization-based exploration of institutional research status according to the present invention;
FIG. 2 is a theme river graph of an institution according to an example of the present invention;
FIG. 3 shows word clouds representing the individual topics according to an example of the present invention;
FIG. 4 is a partial line chart showing the citation frequency and the number of publications of each topic according to an example of the present invention;
FIG. 5 is a scatter plot showing the annual research status of the topics in one branch institution according to an example of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other, different embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention in a schematic way, and the following embodiments and the features in the embodiments may be combined with each other provided they do not conflict.
Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.
As shown in FIG. 1, the present invention provides a method for exploring the research status of an institution based on topic visualization, which comprises the following steps:
S1: acquiring and processing research data;
S2: processing the selected corpus using TF-IDF feature extraction and LDA topic model analysis;
S3: presenting the clustered topics and other dimensions of the academic literature in visual form, and analyzing the results from multiple dimensions.
In this embodiment, data is acquired through academic literature retrieval on Web of Science for the institution to be studied: scientific literature from the last five years, including academic papers, journal articles and the like, whose author addresses include the institution, 2975 documents in total.
In this embodiment, the 2975 records in unstructured plain-text format are preprocessed, and the required research fields are extracted by regular-expression matching. The retained data comprises the title, authors, time, citation frequency, keywords and abstract of each document. The corpus includes keywords, topics and article abstracts.
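The field extraction can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the plain-text export uses two-letter field tags at the start of a line, indented continuation lines, and an `ER` line closing each record. The tag-to-field mapping `FIELD_TAGS` and the function name `parse_records` are hypothetical.

```python
import re

# Hypothetical mapping from two-letter export tags to the fields named in the
# text; the exact tag set of the export is an assumption.
FIELD_TAGS = {"TI": "title", "AU": "authors", "PY": "year",
              "TC": "times_cited", "DE": "keywords", "AB": "abstract"}

# A tagged line starts with a two-character tag followed by a space.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) (.*)$")

def parse_records(text):
    """Split a tagged plain-text export into one dict of value lists per record."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if line.strip() == "ER":              # assumed end-of-record marker
            if current:
                records.append(current)
            current, tag = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            if tag in FIELD_TAGS:             # keep only the required fields
                current.setdefault(FIELD_TAGS[tag], []).append(value)
        elif tag in FIELD_TAGS and line.startswith(" ") and line.strip():
            # indented line: continuation of the previous retained field
            current[FIELD_TAGS[tag]].append(line.strip())
    return records
```

Each field is kept as a list of lines, so multi-line titles or abstracts can be joined with a space and author lists stay itemized.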
In this embodiment, text preprocessing operations such as English tokenization, stop-word filtering and stemming are applied to the three dimensions of information (keywords, topics and article abstracts), yielding a denoised corpus.
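A minimal sketch of this preprocessing chain is shown below. The stop-word list is a tiny illustrative sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer; both are assumptions, since the text does not name the tools used.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
              "the", "to", "we", "with"}

def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lower-case, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())   # cleaning + tokenization
    return [crude_stem(t) for t in tokens
            if t not in STOP_WORDS and len(t) > 1]
```

For example, `preprocess("We are clustering the topics of documents.")` yields `["cluster", "topic", "document"]`.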
In this embodiment, the TF-IDF algorithm is used to extract and generate features for each long text sample in the denoised corpus. TF-IDF is a statistics-based text feature weighting method commonly used to evaluate how important a word is to a particular document in a document set. It consists of two parts: TF and IDF.
TF represents the number of times a word appears in a document. Let n_i be the number of times the word w_i appears in document d_i, and let the denominator be the total number of occurrences of all words in document d_i; the TF value of the word is then calculated as follows:
tf(w_i, d_i) = n_i / Σ_k n_k
IDF reflects how many documents in the document set contain the word. Let |D| be the total number of documents in the document set D, d_i a particular document and w_i a particular word; the IDF value of the word is then calculated as follows:
idf(w_i, D) = log( |D| / |{d_i ∈ D : w_i ∈ d_i}| )
Multiplying TF by IDF gives the importance of a specific word to a document; the TF-IDF value of a word for a document d is calculated as follows:
f_i = tf(w_i, d_i) * idf(w_i, D)
A document has a plurality of features, each feature being a word. The importance is calculated for all dimensions of each document, and the TF-IDF feature vector of each document is generated as follows:
Feature-Vector = {f_1, f_2, f_3, …, f_n}
After TF-IDF features have been extracted for all documents in the corpus, the TF-IDF feature vectors of all documents form a (word, tfidf) matrix, which is the feature vector space model of the corpus.
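The construction of the feature vector space can be sketched as below. It assumes tf is the relative frequency of a word in a document and idf is the natural logarithm of the number of documents divided by the number of documents containing the word; the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one weight vector per doc)."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # document frequency: number of documents that contain each word
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vec = []
        for w in vocab:
            tf = counts[w] / total if total else 0.0   # relative frequency
            idf = math.log(n_docs / df[w])             # assumed unsmoothed idf
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors
```

With this unsmoothed idf, a word present in every document receives weight 0; library implementations often add smoothing terms, which would change the numbers but not the structure of the matrix.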
In this embodiment, LDA topic modeling is then performed on the feature vector space model obtained for the corpus. The topic model is trained by Gibbs sampling, as follows:
randomly assigning a topic number z to each word w of each document;
counting the number of occurrences of each word w under each topic z_i, and the number of words w assigned to topic z_i in each document n;
excluding, in turn, the topic assignment z_i of the current word w, and estimating from the topic assignments of all other words the probability that the current word w is assigned to each topic z_1, z_2, …, z_k, i.e. calculating p(z_i | z_-i, d, w) (the Gibbs updating rule). After the probability distribution of the current word over all topics z_1, z_2, …, z_k has been obtained, a new topic z_1 is resampled for the word. The topic of the next word is updated in the same way, and the process continues until the topic distribution θ_n of each document and the word distribution φ_k of each topic converge.
Finally, the estimated parameters θ_n and φ_k are output, and the topic z_k,n of each word is also obtained.
Training the LDA model by Gibbs sampling yields a topic-word co-occurrence frequency matrix, and this matrix is the LDA model.
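The training loop can be sketched as a collapsed Gibbs sampler. This is an illustrative sketch under stated assumptions, not the patent's code: documents are lists of integer word ids, `eta` is an assumed symmetric prior on the topic-word distributions, and a fixed number of sweeps (`n_iter`) replaces an explicit convergence test.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """docs: list of lists of word ids. Returns (theta, phi, topic-word counts)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total words per topic
    z = []
    # Random initial topic for every word, with the matching counts
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Exclude the current word's assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + eta)
                           / (n_k[t] + vocab_size * eta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(n_kw[t][w] + eta) / (n_k[t] + vocab_size * eta)
            for w in range(vocab_size)] for t in range(n_topics)]
    return theta, phi, n_kw
```

The returned `n_kw` is the topic-word count matrix, while `theta` and `phi` are the smoothed document-topic and topic-word distributions derived from the final counts.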
The LDA model is evaluated by perplexity: the number of topics is varied over a certain range, and the number of topics corresponding to the minimum perplexity is selected as the optimal number of clustering topics.
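A minimal sketch of this selection step follows. It assumes perplexity is the exponential of the negative average log-likelihood per word, with the probability of a word in a document taken as the sum over topics of theta[d][k] * phi[k][w]; the function names are hypothetical.

```python
import math

def perplexity(docs, theta, phi):
    """docs: lists of word ids; theta: doc-topic rows; phi: topic-word rows."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # probability of word w in document d under the mixture of topics
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def best_topic_count(perplexity_by_k):
    """perplexity_by_k: dict {number of topics: perplexity}. Returns the argmin."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

In use, one model would be trained per candidate topic count, its perplexity recorded in the dictionary, and `best_topic_count` applied to the result.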
In this embodiment, cluster analysis is performed on the obtained LDA topic model to obtain academic literature clustering data, which includes the academic documents contained in each topic cluster and the keywords contained in each topic.
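The text does not specify the clustering rule. One simple possibility, shown here purely as an assumption, is to assign each document to its highest-probability topic and to take the highest-weight words of each topic as its keywords.

```python
def cluster_documents(theta):
    """Group document indices by their highest-probability topic (assumed rule)."""
    clusters = {}
    for d, row in enumerate(theta):
        k = max(range(len(row)), key=row.__getitem__)
        clusters.setdefault(k, []).append(d)
    return clusters

def topic_keywords(phi, vocab, top_n=10):
    """Return the top_n highest-weight words of each topic."""
    return [[vocab[w] for w in sorted(range(len(row)), key=lambda w: -row[w])[:top_n]]
            for row in phi]
```

For instance, `cluster_documents([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])` returns `{0: [0, 2], 1: [1]}`.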
In this embodiment, the clustering result is fused with information of other dimensions, which includes: time, author, citation frequency, and the branch institution to which the author belongs.
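The fusion step can be sketched as attaching each document's topic label to its metadata record and serializing the result as JSON, a format that a D3.js front end can load. The record keys used in the usage note are hypothetical.

```python
import json

def fuse(records, doc_topics):
    """records: metadata dicts; doc_topics: topic id per document, same order."""
    fused = []
    for rec, topic in zip(records, doc_topics):
        row = dict(rec)          # copy so the input metadata is not mutated
        row["topic"] = topic
        fused.append(row)
    return fused

def to_json(rows):
    """Serialize fused rows for the visualization layer."""
    return json.dumps(rows, ensure_ascii=False)
```

A record such as `{"year": 2018, "branch": "A", "times_cited": 3}` paired with topic 1 becomes the same record with an added `"topic": 1` entry.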
In this embodiment, the D3.js visual analysis technique is finally used to analyze the results from multiple dimensions, as follows:
As shown in FIG. 2, the theme river graph shows the change in strength of the different topics. The horizontal axis is the time axis, with time advancing from left to right. Different topics are rendered in different colors, and the width of each band represents the development of the topic at different times: the wider the band, the stronger the topic. Moving the mouse over a topic highlights it.
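The data behind such a graph can be prepared as one row of per-topic values per year. The sketch below assumes that topic strength is measured by the number of documents assigned to the topic in that year, which the text does not state; the row keys `year` and `topic` are hypothetical.

```python
from collections import defaultdict

def topic_strength_by_year(rows, n_topics):
    """rows: dicts with 'year' and 'topic'. Returns {year: [count per topic]}."""
    series = defaultdict(lambda: [0] * n_topics)
    for row in rows:
        series[row["year"]][row["topic"]] += 1
    return dict(sorted(series.items()))   # years in ascending order
```

Each list gives the band widths for one position on the time axis, in topic order.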
As shown in FIG. 3, the word cloud shows the keywords of each topic, that is, the research content of each topic; the size of a word represents its importance within the topic.
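One way to derive word sizes from a topic's word weights is a linear mapping onto a font-size range, sketched below. The pixel bounds and the output keys `text` and `size` are illustrative assumptions.

```python
def word_cloud_entries(topic_row, vocab, top_n=30, min_px=12, max_px=60):
    """Map the top_n words of one topic to font sizes by linear scaling."""
    ranked = sorted(zip(vocab, topic_row), key=lambda pair: -pair[1])[:top_n]
    if not ranked:
        return []
    hi, lo = ranked[0][1], ranked[-1][1]
    span = (hi - lo) or 1.0              # avoid division by zero if all equal
    return [{"text": word,
             "size": round(min_px + (p - lo) / span * (max_px - min_px))}
            for word, p in ranked]
```

The most important word receives `max_px` and the least important of the selected words receives `min_px`.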
As shown in FIG. 4, the line chart shows the annual number of publications for each topic and how often the topic is cited, from which the research trend and academic influence of a topic at the institution can be predicted.
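The two series of the line chart can be aggregated from the fused records as sketched below, assuming each record carries hypothetical `topic`, `year` and `times_cited` keys.

```python
from collections import defaultdict

def yearly_output_and_citations(rows):
    """Returns {topic: {year: {"papers": n, "citations": c}}}."""
    stats = defaultdict(lambda: defaultdict(lambda: {"papers": 0, "citations": 0}))
    for r in rows:
        cell = stats[r["topic"]][r["year"]]
        cell["papers"] += 1                     # annual publication count
        cell["citations"] += r["times_cited"]   # summed citation frequency
    return {t: dict(sorted(years.items())) for t, years in sorted(stats.items())}
```

Plotting `papers` and `citations` against the year for one topic gives the two lines described for that topic.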
As shown in FIG. 5, the scatter plot shows how the research on each topic in each branch institution changes over time; different topics are represented by different colors, and the size of each circle represents the amount of research on that topic.
The method for exploring the research status of an institution based on topic visualization rests on the theory of data visualization and topic modeling. First, the corpus under study is preprocessed, and a text vector space is generated by feature extraction. Topic modeling then yields the academic literature clustering result. The clustering result and topic keywords are combined with other dimensions of the academic literature data, such as publication time, branch institution and author, and visual elements are used to analyze, present and predict the relationships within the data. This makes it easier to grasp and track the development of the institution's research, helps researchers capture the frontiers and hot spots of a discipline, and avoids duplicated research.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (1)

1. A method for exploring the research status of an institution based on topic visualization, characterized in that the method comprises the following steps:
S1: acquisition and processing of research data:
determining the institution to be studied, and acquiring SCI academic literature data of that institution;
extracting the required research fields;
preprocessing the extracted fields;
S2: processing the selected corpus using TF-IDF feature extraction and LDA topic model analysis:
extracting TF-IDF features from the preprocessed data, and establishing a feature vector space model of the whole corpus;
establishing a topic model with the LDA algorithm from the feature vector space model generated from the corpus, solving the topic model by Gibbs sampling, and outputting and storing a topic-word matrix;
performing cluster analysis on the output topic-word matrix, and storing and outputting the clustering result;
S3: presenting the clustered topics and other dimensions of the academic literature in visual form, and analyzing the results from multiple dimensions:
using a theme river graph, a word cloud and a line chart to display, respectively, the change of topic strength over time, the research field represented by each topic, and the change in how often each topic is cited;
using a tree diagram and a bar chart to display, respectively, the hierarchical structure under each topic and the weight of each branch institution in the academic influence of the topic;
using a scatter plot and a line chart to display, respectively, the change in strength and the research trend of different topics within each branch institution, and thereby identifying disciplinary strengths;
in step S1, the institution to be studied is determined on the basis of SCI academic literature from the last five years whose author addresses include the institution;
in step S1, regular-expression matching is used, and the retained data includes the title, authors, time, citation frequency, keywords and abstract of each document; the corpus includes keywords, topics and article abstracts;
in step S1, the preprocessing includes cleaning and denoising, English text tokenization, stop-word removal and stemming;
in step S2, the TF-IDF algorithm is used to extract features and generate a text vector space, which specifically includes the following operations:
TF represents the number of times a word appears in a document, and IDF reflects how many documents in the document set contain the word; multiplying TF by IDF gives the importance of a specific word to a document; this importance is calculated for all dimensions (words) of each document, generating the TF-IDF feature vector of each document:
Feature-Vector = {f_1, f_2, f_3, …, f_n}    (1)
in formula (1), each TF-IDF feature of a document is calculated as:
f_i = tf(w_i, d_i) * idf(w_i, D)    (2)
in formula (2), the tf value is calculated as:
tf(w_i, d_i) = n_i / Σ_k n_k
where n_i is the number of times the word w_i appears in document d_i, and the denominator Σ_k n_k is the total number of occurrences of all words in document d_i;
the idf value in formula (2) is calculated as:
idf(w_i, D) = log( |D| / |{d_i ∈ D : w_i ∈ d_i}| )
where |D| is the total number of documents in the document set D, d_i is a particular document, and w_i is a particular word, namely a feature;
the TF-IDF feature vectors of all documents in the set form a (word, tfidf) matrix, which is the feature vector space model of the documents;
in step S2, the topic-word matrix is obtained with the LDA algorithm, based on the bag-of-words model, which specifically includes the following operations:
LDA assumes that documents are produced from a mixture of topics, each document being generated as follows:
generating the length N of the document from a Poisson distribution with global parameter β;
generating the topic distribution θ of the current document from a Dirichlet distribution with global parameter α;
for each of the N words of the current document: generating a topic index z_n from a multinomial distribution with parameter θ, and generating a word w_n from a multinomial distribution parameterized jointly by θ and z;
the training process uses Gibbs sampling:
randomly assigning a topic number z to each word w of each document;
counting, for each topic z_i, the number of occurrences of word w under it, and, for each document n, the number of words w assigned to topic z_i;
each time, excluding the topic assignment z_i of the current word w and estimating, from the topic assignments of all other words, the probability that the current word w is assigned to each topic z_1, z_2, …, z_k, i.e. calculating p(z_i | z_{-i}, d, w); after obtaining the probability distribution of the current word over all topics z_1, z_2, …, z_k, resampling a new topic z_1 for the word; repeatedly updating the topic of the next word in the same way until the topic distribution θ_n under each document and the word distribution φ_k under each topic converge;
finally, outputting the parameters to be estimated, θ_n and φ_k, and obtaining the topic z_{k,n} of each word;
training the LDA model by Gibbs sampling yields the topic-word co-occurrence frequency matrix;
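The sampling loop described above corresponds to what is usually called collapsed Gibbs sampling. The sketch below is one possible reading, not the patented implementation: the function name `gibbs_lda` is hypothetical, the word-side Dirichlet smoothing parameter is named `eta` (an assumption, chosen to avoid confusion with the Poisson parameter β in the text), and a fixed iteration count stands in for the convergence check.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words of doc d on topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # count of word w under topic k
    topic_total = [0] * n_topics                              # all words under topic k

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Randomly assign a topic to every word and accumulate the counts.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, z in zip(doc, assign[d]):
            update(d, w, z, +1)

    for _ in range(n_iter):  # fixed sweep count instead of a convergence test
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Exclude the current word's assignment from the counts.
                update(d, w, assign[d][i], -1)
                # Unnormalized p(z_i = k | z_-i, d, w) for every topic k.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(n_topics)
                ]
                # Resample a topic for this word and restore the counts.
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, assign[d][i], +1)

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + eta) / (topic_total[k] + vocab_size * eta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, topic_word, assign

# Hypothetical toy corpus of word ids over a 6-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 3, 4]]
theta, phi, topic_word, assign = gibbs_lda(docs, n_topics=2, vocab_size=6, n_iter=50)
```

Here `topic_word` plays the role of the topic-word co-occurrence frequency matrix, while `theta` and `phi` are the smoothed estimates of the per-document topic distribution and per-topic word distribution.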
in step S2, cluster analysis is performed on the obtained LDA topic model to obtain academic document clustering data, which includes the academic documents contained in each topic cluster and the keywords contained in each topic;
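The text does not specify how the clusters are derived from the topic model. One simple possibility, shown here purely as an assumption, is to assign each document to its highest-probability topic and to take the highest-probability words of each topic as its keywords. The function name `cluster_by_topic` and the sample values are hypothetical.

```python
def cluster_by_topic(theta, phi, vocab, top_n=3):
    """Group documents by dominant topic and list top keywords per topic.

    theta: per-document topic distributions (one row per document)
    phi:   per-topic word distributions (one row per topic)
    """
    clusters = {k: [] for k in range(len(phi))}
    for doc_id, dist in enumerate(theta):
        # Assumed rule: a document belongs to its most probable topic.
        best = max(range(len(dist)), key=dist.__getitem__)
        clusters[best].append(doc_id)

    keywords = {}
    for k, dist in enumerate(phi):
        ranked = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)
        keywords[k] = [vocab[w] for w in ranked[:top_n]]
    return clusters, keywords

# Hypothetical distributions for three documents and two topics.
vocab = ["topic", "model", "citation", "author"]
theta = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
phi = [[0.5, 0.3, 0.1, 0.1], [0.05, 0.05, 0.5, 0.4]]
clusters, keywords = cluster_by_topic(theta, phi, vocab, top_n=2)
# clusters == {0: [0, 2], 1: [1]}
# keywords == {0: ["topic", "model"], 1: ["citation", "author"]}
```

A hard assignment like this discards the mixed-membership information in θ; a threshold on θ would allow a document to appear in several topic clusters.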
in step S3, the information of the other dimensions includes: time, author, citation frequency, and the branch office to which the author belongs;
in step S3, the D3.js visual analysis technique is used to analyze the result from multiple dimensions, which specifically includes the following operations:
a river diagram (stream graph), a word cloud and a line chart respectively display the change in topic strength over time, the research field represented by each topic, and the change in the citation frequency of the topics, so as to understand the overall research profile of the organization; a tree diagram and a bar chart respectively show the hierarchical structure under each topic and the weight of each branch office's academic influence on the topic, so as to understand the relationship between topics and branch offices; a scatter plot and a line chart respectively show the intensity changes and research trends of different topics under each branch office, so as to identify disciplinary strengths.
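As an illustration of how data for the river diagram might be prepared before it is handed to D3.js, the sketch below aggregates topic strength per year. It rests on an assumption not stated in this section: topic strength in a year is taken to be the mean of the per-document topic proportions θ over the documents published that year. The function name `topic_strength_by_year` and the sample data are hypothetical.

```python
import json
from collections import defaultdict

def topic_strength_by_year(theta, years):
    """Average the per-document topic proportions for each publication year."""
    grouped = defaultdict(list)
    for dist, year in zip(theta, years):
        grouped[year].append(dist)

    records = []
    for year in sorted(grouped):
        rows = grouped[year]
        n_topics = len(rows[0])
        # Assumed definition: strength of topic k = mean of theta[d][k] in that year.
        mean = [sum(r[k] for r in rows) / len(rows) for k in range(n_topics)]
        records.append({"year": year, "strength": mean})
    return records

# Hypothetical topic proportions and publication years for three documents.
theta = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
years = [2018, 2018, 2019]
records = topic_strength_by_year(theta, years)
payload = json.dumps(records)  # JSON text that a D3.js page could load
```

The same grouping could be keyed by branch office instead of year to feed the bar chart and scatter plot views.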
CN202010092905.1A 2020-02-14 2020-02-14 Method for exploring mechanism research conditions based on theme visualization Active CN111339286B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092905.1A CN111339286B (en) 2020-02-14 2020-02-14 Method for exploring mechanism research conditions based on theme visualization

Publications (2)

Publication Number Publication Date
CN111339286A (en) 2020-06-26
CN111339286B (en) 2024-02-09

Family

ID=71183452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092905.1A Active CN111339286B (en) 2020-02-14 2020-02-14 Method for exploring mechanism research conditions based on theme visualization

Country Status (1)

Country Link
CN (1) CN111339286B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069314B (en) * 2020-08-25 2022-05-24 清华大学 Specific field situation analysis system based on scientific and technical literature data
CN113537609A (en) * 2021-07-26 2021-10-22 北京清博智能科技有限公司 Policy hotspot prediction method based on text intelligent mining
CN113342942B (en) * 2021-08-02 2021-11-16 平安科技(深圳)有限公司 Corpus automatic acquisition method and device, computer equipment and storage medium
CN116775029B (en) * 2023-07-25 2024-03-05 四川大学 Concept design information modeling method and system based on multiple participants

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142918A (en) * 2014-07-31 2014-11-12 天津大学 Short text clustering and hotspot theme extraction method based on TF-IDF characteristics
CN105956130A (en) * 2016-05-09 2016-09-21 浙江农林大学 Multi-information fusion scientific research literature theme discovering and tracking method and system thereof
CN106021222A (en) * 2016-05-09 2016-10-12 浙江农林大学 Analysis method and device for scientific research literature theme evolution
CN106777043A (en) * 2016-12-09 2017-05-31 宁波大学 A kind of academic resources acquisition methods based on LDA
CN108519971A (en) * 2018-03-23 2018-09-11 中国传媒大学 A kind of across languages theme of news similarity comparison methods based on Parallel Corpus
CN108536762A (en) * 2018-03-21 2018-09-14 上海蔚界信息科技有限公司 A kind of high-volume text data automatically analyzes scheme

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8245135B2 (en) * 2009-09-08 2012-08-14 International Business Machines Corporation Producing a visual summarization of text documents

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Performance of using LDA for Chinese news text classification; Xiaojun Wu; 《IEEEXplore》; 20150625; full text *
A survey of research on topic model visualization; Sun Guochao et al.; 《情报工程》; 20151215 (No. 06); full text *

Also Published As

Publication number Publication date
CN111339286A (en) 2020-06-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240104

Address after: No. 1610, 16th Floor, Building 1, No. 188, Section 2, Renmin North Road, Jinniu District, Chengdu City, Sichuan Province, 610000

Applicant after: Sichuan Chaoyihong Technology Co.,Ltd.

Address before: 400065 Chongqing Nan'an District huangjuezhen pass Chongwen Road No. 2

Applicant before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240419

Address after: Room 216, No. 2 Tengfei 1st Street, Zhongxin Guangzhou Knowledge City, Huangpu District, Guangzhou City, Guangdong Province, 510000

Patentee after: Guangzhou Southern Wanfang Data Co.,Ltd.

Country or region after: China

Address before: No. 1610, 16th Floor, Building 1, No. 188, Section 2, Renmin North Road, Jinniu District, Chengdu City, Sichuan Province, 610000

Patentee before: Sichuan Chaoyihong Technology Co.,Ltd.

Country or region before: China