CN109783526A - A research hotspot analysis method based on expert paper big data - Google Patents

Info

Publication number
CN109783526A
Authority
CN
China
Prior art keywords
data
paper
word frequency
expert
vocabulary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910029926.6A
Other languages
Chinese (zh)
Inventor
黄翼
吴硕贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tansi Architectural Engineering Consulting (guangzhou) Co Ltd
South China University of Technology SCUT
Original Assignee
Tansi Architectural Engineering Consulting (guangzhou) Co Ltd
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tansi Architectural Engineering Consulting (guangzhou) Co Ltd and South China University of Technology SCUT

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a research hotspot analysis method based on expert paper big data, comprising the following steps: S1, search for papers by keyword using a knowledge database as the data source, and crawl the publicly available data for each paper: title, publication date, author, and data source; S2, perform word segmentation on the downloaded paper titles, delete structural words such as conjunctions, prepositions, and pronouns, remove verbs and adjectives, and retain only nouns to obtain a hotspot vocabulary list; S3, match the hotspot vocabulary list against a corpus of everyday words used in professional papers and delete the matched everyday words to obtain a specialized vocabulary list; S4, perform word frequency analysis on the specialized vocabulary list, sort the words in descending order of frequency, and select the top 100 entries to obtain a summary word-frequency table of specialized vocabulary; S5, on the basis of the summary word-frequency table, add time data and list word-frequency sub-tables by year to obtain the trend of research focus along the time axis.

Description

A research hotspot analysis method based on expert paper big data
Technical field
The invention relates to the technical field of data processing, and in particular to a research hotspot analysis method based on expert paper big data.
Background art
CNKI (Hownet) has its own search engine, which supports searching by title, author, keyword, and similar fields; search results can be exported and shared with software such as NoteExpress. CiteSpace, a Java application for analyzing and visualizing co-citation networks, can analyze the development and structure of scientific knowledge; once titles and other data have been exported from CNKI, it can perform knowledge-mapping analyses such as keyword analysis and author-relationship analysis.
However, the CNKI search engine requires manual point-and-click selection for every operation, which is laborious for data analysis. The amount of data that can be downloaded by manual clicking is limited, and labor costs usually rule out downloading the full data set, so the data can hardly support an overall, comprehensive analysis. Moreover, its analysis functions are limited to remembering and sorting the keywords that users enter; there is no deeper analysis function. CiteSpace likewise depends on manual operation, so its efficiency is low; incomplete data can also introduce errors into the analysis, and it cannot solve domain-specific analysis problems.
Summary of the invention
In view of the deficiencies of the prior art, the object of the invention is to provide a research hotspot analysis method based on expert paper big data.
The object of the invention can be achieved through the following technical solution:
A research hotspot analysis method based on expert paper big data, comprising the following steps:
S1, search for papers by keyword using a knowledge database as the data source, and crawl the publicly available data for each paper: title, publication date, author, and data source;
S2, perform word segmentation on the downloaded paper titles, delete structural words such as conjunctions, prepositions, and pronouns, remove verbs and adjectives, and retain only nouns to obtain a hotspot vocabulary list;
S3, match the hotspot vocabulary list against a corpus of everyday words used in professional papers and delete the matched everyday words to obtain a specialized vocabulary list;
S4, perform word frequency analysis on the specialized vocabulary list, sort the words in descending order of frequency, and select the top 100 entries to obtain a summary word-frequency table of specialized vocabulary;
S5, on the basis of the summary word-frequency table, add time data and list word-frequency sub-tables by year to obtain the trend of research focus along the time axis.
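Steps S3 and S4 amount to a set-difference filter followed by a frequency count. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `top_terms`, the sample nouns, and the tiny everyday-word set are all hypothetical, and the input is assumed to be the noun list already produced by step S2.

```python
from collections import Counter

def top_terms(nouns, everyday_words, top_n=100):
    """Drop everyday words (S3), then rank the rest by frequency (S4)."""
    # S3: exclusion by matching against the everyday-word corpus.
    specialized = [word for word in nouns if word not in everyday_words]
    # S4: count occurrences and keep the top_n most frequent terms,
    # sorted in descending order of frequency.
    return Counter(specialized).most_common(top_n)

# Hypothetical nouns extracted from paper titles.
nouns = ["acoustics", "research", "hall", "acoustics",
         "analysis", "hall", "acoustics"]
# Hypothetical everyday-word corpus.
everyday = {"research", "analysis"}

summary_table = top_terms(nouns, everyday)
print(summary_table)
```

The default `top_n=100` mirrors the top 100 entries named in step S4; a real everyday-word corpus would be loaded from a curated file rather than written inline.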
Further, in the keyword-based paper search that uses an open online database as the data source, either a single keyword is used, or the keyword is supplemented by its closest synonym or near-synonym.
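When a second, synonymous keyword is searched, the two result sets can overlap, and the embodiment notes that duplicated records must not be counted twice. A minimal sketch of such a merge is shown here. The record fields (`title`, `published`, `author`, `source`) and the choice of title, publication date, and source as the identity key are assumptions made for illustration; the patent does not specify how duplicates are detected.

```python
def merge_search_results(*result_sets):
    """Merge records from several keyword searches, keeping each paper once."""
    seen = set()
    merged = []
    for records in result_sets:
        for rec in records:
            # Assumed identity key: title + publication date + source.
            key = (rec["title"].strip(), rec["published"], rec["source"])
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

# Hypothetical results for a keyword and for its closest synonym.
primary = [
    {"title": "Hall acoustics survey", "published": "2015-03",
     "author": "A", "source": "Journal X"},
    {"title": "Noise control in schools", "published": "2016-07",
     "author": "B", "source": "Journal Y"},
]
synonym = [
    {"title": "Hall acoustics survey", "published": "2015-03",
     "author": "A", "source": "Journal X"},
    {"title": "Sound insulation of walls", "published": "2017-01",
     "author": "C", "source": "Journal X"},
]

merged = merge_search_results(primary, synonym)
print(len(merged))
```

Keeping the first occurrence preserves the order of the primary keyword's results, so later word counts are unaffected by how many synonyms were searched.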
Further, the knowledge database is the CNKI (Hownet) database or another open online database containing professional paper data.
Further, the word-frequency sub-tables listed by year in step S5, and the resulting trend of research focus along the time axis, can be presented visually as charts.
Further, the charts are drawn using ECharts.
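As an illustration of how yearly word-frequency data might be handed to ECharts, the sketch below builds a line-chart option object in Python and serializes it to JSON for the front end. The function name `trend_chart_option`, the chart title, and the sample counts are hypothetical; only standard ECharts option fields (`title`, `tooltip`, `legend`, `xAxis`, `yAxis`, `series`) are used, and the patent does not describe the chart configuration itself.

```python
import json

def trend_chart_option(yearly_counts, terms):
    """Build an ECharts line-chart option: one series per term, one point per year."""
    years = sorted(yearly_counts)
    return {
        "title": {"text": "Hotspot term frequency by year"},
        "tooltip": {"trigger": "axis"},
        "legend": {"data": list(terms)},
        "xAxis": {"type": "category", "data": [str(y) for y in years]},
        "yAxis": {"type": "value"},
        "series": [
            {
                "name": term,
                "type": "line",
                # A term missing from a year's table counts as zero.
                "data": [yearly_counts[y].get(term, 0) for y in years],
            }
            for term in terms
        ],
    }

# Hypothetical per-year word-frequency sub-tables.
yearly_counts = {
    2016: {"acoustics": 4, "noise": 1},
    2017: {"acoustics": 6},
    2018: {"acoustics": 5, "noise": 3},
}

option = trend_chart_option(yearly_counts, ["acoustics", "noise"])
option_json = json.dumps(option, ensure_ascii=False)
print(option_json)
```

The resulting JSON can be passed to `setOption` on an ECharts instance in the browser; `ensure_ascii=False` keeps Chinese terms readable in the serialized output.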
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The research hotspot analysis method based on expert paper big data provided by the invention uses dual-keyword search, takes paper titles as source data, and applies word segmentation, word-frequency ranking, and exclusion by matching against a corpus of everyday words used in professional papers, thereby obtaining an effective specialized vocabulary list. The word-frequency data obtained from big-data analysis of paper text draws on a very large sample, has high fidelity and accuracy, and represents the views of the expert community in the field. It can compensate for the small sample sizes and insufficiently random sampling of conventional methods, and provides reference data for further subjective evaluation and analysis of expert views.
2. The research hotspot analysis method based on expert paper big data provided by the invention can analyze the contribution of experts who have long worked in the research area, the attention that various journals pay to the field, and data such as citation and attention levels. These data help researchers quickly understand and grasp the state of research in the industry, so the method is highly efficient.
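One simple reading of "expert contribution" and "journal attention" is a count of papers per author and per source over the crawled records. The sketch below follows that reading; it is an assumption, since the patent does not define either measure. The record layout and the use of ";" to separate co-authors are likewise hypothetical.

```python
from collections import Counter

def contribution_counts(records):
    """Count papers per author and per source across the crawled records."""
    authors = Counter()
    sources = Counter()
    for rec in records:
        # Assumed convention: co-authors are separated by ";".
        for name in rec["author"].split(";"):
            name = name.strip()
            if name:
                authors[name] += 1
        sources[rec["source"]] += 1
    return authors, sources

# Hypothetical crawled records.
records = [
    {"author": "Li; Wang", "source": "Journal X"},
    {"author": "Li", "source": "Journal Y"},
    {"author": "Zhang;Li", "source": "Journal X"},
]

authors, sources = contribution_counts(records)
print(authors.most_common(1), sources.most_common(1))
```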
Brief description of the drawings
Fig. 1 is a flowchart of the research hotspot analysis method based on expert paper big data according to the invention.
Fig. 2 is a flowchart of the research hotspot analysis method based on expert paper big data in the embodiment of the invention.
Specific embodiments
The invention is described in further detail below with reference to the embodiment and the accompanying drawings, but embodiments of the invention are not limited thereto.
Embodiment:
As shown in Fig. 1, this embodiment provides a research hotspot analysis method based on expert paper big data, comprising the following steps:
S1, search for papers by keyword using a knowledge database as the data source, and crawl the publicly available data for each paper: title, publication date, author, and data source;
S2, perform word segmentation on the downloaded paper titles, delete structural words such as conjunctions, prepositions, and pronouns, remove verbs and adjectives, and retain only nouns to obtain a hotspot vocabulary list;
S3, match the hotspot vocabulary list against a corpus of everyday words used in professional papers and delete the matched everyday words to obtain a specialized vocabulary list;
S4, perform word frequency analysis on the specialized vocabulary list, sort the words in descending order of frequency, and select the top 100 entries to obtain a summary word-frequency table of specialized vocabulary;
S5, on the basis of the summary word-frequency table, add time data and list word-frequency sub-tables by year to obtain the trend of research focus along the time axis.
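Step S5 can be pictured as regrouping the same term counts by publication year. The sketch below is one possible reading under stated assumptions: each paper has already been reduced to a `(year, terms)` pair, and only terms that appear in the summary table are counted. The function name `yearly_tables` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def yearly_tables(papers, summary_terms):
    """Build one word-frequency sub-table per publication year."""
    keep = set(summary_terms)
    tables = defaultdict(Counter)
    for year, terms in papers:
        # Count only terms that made it into the summary table.
        tables[year].update(t for t in terms if t in keep)
    return dict(tables)

# Hypothetical (year, specialized terms from one title) pairs.
papers = [
    (2016, ["acoustics", "hall"]),
    (2016, ["acoustics"]),
    (2017, ["hall", "noise"]),
    (2018, ["acoustics", "noise", "noise"]),
]
summary_terms = ["acoustics", "noise"]

tables = yearly_tables(papers, summary_terms)
# Trend of one term along the time axis.
noise_trend = [tables[year]["noise"] for year in sorted(tables)]
print(noise_trend)
```

Reading one term across the sorted years gives the time-axis trend that the step describes; a `Counter` returns zero for years in which the term does not occur.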
The above method is described in detail below with reference to a specific embodiment. As shown in Fig. 2, it comprises the following steps:
S1, enter a keyword on the web page; a single keyword may be entered, optionally together with its closest synonym or near-synonym. Clicking the search button starts the background data acquisition program, which downloads from the CNKI (Hownet) database the list of paper titles related to the keyword. Because the two keywords may return the same records, duplicated records must not be counted twice. Therefore, after the data for the two keywords are downloaded separately, the results should be checked for duplicates and the duplicates removed.
S2, start the word segmentation program and segment the list of paper titles to obtain a word list;
S3, start the part-of-speech analysis program, delete words tagged as conjunctions, prepositions, pronouns, verbs, adjectives, and the like, and retain only nouns to obtain the hotspot vocabulary list;
S4, start the matching program for the corpus of everyday words used in professional papers, exclude the everyday words from the word-frequency share list of the hotspot vocabulary list, and sort by word frequency in descending order to obtain the specialized vocabulary list;
S5, distinguish the paper titles by data source to obtain a total data list and per-source data lists;
S6, start the charting program and, from the total data list and the per-source data lists, draw analysis charts such as the hotspot source distribution chart, hotspot share chart, hotspot time-trend chart, expert contribution chart, and journal contribution cloud chart.
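The part-of-speech filter in step S3 of this embodiment can be sketched as follows. The sketch assumes that a segmenter and tagger have already produced `(word, tag)` pairs using ICTCLAS-style tags, in which noun tags begin with "n" (the open-source jieba tagger follows this convention). The tags shown are written by hand for illustration and are not actual tagger output, and the patent does not name the segmentation or tagging program it uses.

```python
def keep_nouns(tagged_tokens):
    """Keep only words whose part-of-speech tag marks them as nouns."""
    # Assumption: ICTCLAS-style tags, where noun tags start with "n"
    # (n, nr, ns, nt, nz). Conjunctions (c), prepositions (p),
    # pronouns (r), verbs (v, vn), and adjectives (a) are dropped.
    return [word for word, tag in tagged_tokens if tag.startswith("n")]

# Hand-tagged tokens for one hypothetical paper title.
tagged_title = [
    ("基于", "p"),
    ("专家", "n"),
    ("论文", "n"),
    ("大数据", "n"),
    ("的", "uj"),
    ("研究", "vn"),
    ("热点", "n"),
    ("分析", "vn"),
    ("方法", "n"),
]

hotspot_words = keep_nouns(tagged_title)
print(hotspot_words)
```

A whitelist on noun tags is used here instead of deleting each unwanted part of speech in turn; both give the same result when every token carries a tag, and the whitelist also drops particles and punctuation.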
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the scope disclosed by the invention and according to its technical solution and inventive concept, falls within the scope of protection of the invention.

Claims (5)

1. A research hotspot analysis method based on expert paper big data, characterized in that the method comprises the following steps:
S1, search for papers by keyword using a knowledge database as the data source, and crawl the publicly available data for each paper: title, publication date, author, and data source;
S2, perform word segmentation on the downloaded paper titles, delete structural words such as conjunctions, prepositions, and pronouns, remove verbs and adjectives, and retain only nouns to obtain a hotspot vocabulary list;
S3, match the hotspot vocabulary list against a corpus of everyday words used in professional papers and delete the matched everyday words to obtain a specialized vocabulary list;
S4, perform word frequency analysis on the specialized vocabulary list, sort the words in descending order of frequency, and select the top 100 entries to obtain a summary word-frequency table of specialized vocabulary;
S5, on the basis of the summary word-frequency table, add time data and list word-frequency sub-tables by year to obtain the trend of research focus along the time axis.
2. The research hotspot analysis method based on expert paper big data according to claim 1, characterized in that: in the keyword-based paper search using the knowledge database as the data source, either a single keyword is used, or the keyword is supplemented by its closest synonym or near-synonym.
3. The research hotspot analysis method based on expert paper big data according to claim 1, characterized in that: the knowledge database is the CNKI (Hownet) database or another open online database containing professional paper data.
4. The research hotspot analysis method based on expert paper big data according to claim 1, characterized in that: the word-frequency sub-tables listed by year in step S5, and the resulting trend of research focus along the time axis, can be presented visually as charts.
5. The research hotspot analysis method based on expert paper big data according to claim 4, characterized in that: the charts are drawn using ECharts.
CN201910029926.6A (priority date 2018-12-28, filing date 2019-01-10): A research hotspot analysis method based on expert paper big data. Pending. Published as CN109783526A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811616957 2018-12-28
CN2018116169573 2018-12-28

Publications (1)

Publication Number Publication Date
CN109783526A true CN109783526A (en) 2019-05-21

Family

ID=66500439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910029926.6A Pending CN109783526A (en) 2018-12-28 2019-01-10 A kind of research hotspot analysis method based on expert's paper big data

Country Status (1)

Country Link
CN (1) CN109783526A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136337A (en) * 2013-02-01 2013-06-05 北京邮电大学 Distributed knowledge data mining device and mining method used for complex network
CN103164540A (en) * 2013-04-15 2013-06-19 武汉大学 Patent hotspot discovery and trend analysis method
CN103440329A (en) * 2013-09-04 2013-12-11 北京邮电大学 Authoritative author and high-quality paper recommending system and recommending method
CN104965931A (en) * 2015-07-30 2015-10-07 成都布林特信息技术有限公司 Big data based public opinion analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姜婷婷,等: "基于NEViewer的国内外图书情报领域研究热点对比分析", 《情报科学》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563172A (en) * 2020-05-07 2020-08-21 上海宝藤生物医药科技股份有限公司 Academic hotspot trend prediction method and device based on dynamic knowledge graph construction
CN111563172B (en) * 2020-05-07 2024-02-06 上海宝藤生物医药科技股份有限公司 Academic hot spot trend prediction method and device based on dynamic knowledge graph construction
CN111966791A (en) * 2020-09-03 2020-11-20 深圳市小满科技有限公司 Extraction method and retrieval method of customs data product words
CN111966791B (en) * 2020-09-03 2024-04-19 深圳市小满科技有限公司 Method for extracting and retrieving customs data product words
CN112883148A (en) * 2021-01-15 2021-06-01 上海柏观数据科技有限公司 Subject talent evaluation control method and device based on research trend matching
CN113094496A (en) * 2021-04-29 2021-07-09 中国科学院西北生态环境资源研究院 Entry library-based method for designing ranking list of hot words in periodicals
CN113094496B (en) * 2021-04-29 2022-06-17 中国科学院西北生态环境资源研究院 Entry library-based method for designing ranking list of hot words in periodicals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190521