CN109783526A - A research hotspot analysis method based on big data from expert papers - Google Patents
A research hotspot analysis method based on big data from expert papers
- Publication number
- CN109783526A (application number CN201910029926.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- paper
- word frequency
- expert
- vocabulary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a research hotspot analysis method based on big data from expert papers, comprising the following steps: S1, searching for papers by keyword with a knowledge database as the data source, and crawling public data, namely paper title, publication date, author and data source; S2, segmenting the downloaded paper titles into words, deleting structural words such as conjunctions, prepositions and pronouns, removing verbs and adjectives, and retaining only nouns, to obtain a hotspot word list; S3, deleting the everyday words from the hotspot word list by matching against a corpus of everyday words used in professional papers, to obtain a specialized vocabulary list; S4, performing word frequency analysis on the specialized vocabulary list, sorting by word frequency in descending order, and selecting the top 100 entries to obtain a master word frequency table of specialized vocabulary; S5, on the basis of the master word frequency table, adding time data and listing word frequency sub-tables by year, to obtain the trend of research focus over time.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a research hotspot analysis method based on big data from expert papers.
Background technique
CNKI (Hownet) has its own search engine, which can search by title, author, keyword and similar fields; the search results can also be exported and shared with software such as NoteExpress. CiteSpace, a Java application for analyzing and visualizing co-citation networks, can analyze the development and structural relationships of scientific knowledge. Once bibliographic records have been exported from CNKI or a similar source, it can carry out knowledge-mapping analyses such as keyword analysis and author relationship analysis.
However, the drawback of the built-in search engine of CNKI (Hownet) is that every operation requires manual clicking and selection, which is very laborious for data analysis. The amount of data that can be downloaded by manual clicking is limited, and labor costs usually do not allow the complete data set to be downloaded, so it is difficult to present an overall, comprehensive analysis. Moreover, its analysis functions are limited to remembering and sorting the keywords entered by users; there is no deeper analysis. CiteSpace still relies on manual operation and is therefore inefficient; incomplete data also introduces errors into the analysis, and it cannot address domain-specific analysis problems.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a research hotspot analysis method based on big data from expert papers.
The object of the present invention can be achieved through the following technical solution:
A research hotspot analysis method based on big data from expert papers, the method comprising the following steps:
S1, searching for papers by keyword with a knowledge database as the data source, and crawling public data, namely paper title, publication date, author and data source;
S2, segmenting the downloaded paper titles into words, deleting structural words such as conjunctions, prepositions and pronouns, removing verbs and adjectives, and retaining only nouns, to obtain a hotspot word list;
S3, deleting the everyday words from the hotspot word list by matching against a corpus of everyday words used in professional papers, to obtain a specialized vocabulary list;
S4, performing word frequency analysis on the specialized vocabulary list, sorting by word frequency in descending order, and selecting the top 100 entries to obtain a master word frequency table of specialized vocabulary;
S5, on the basis of the master word frequency table, adding time data and listing word frequency sub-tables by year, to obtain the trend of research focus over time.
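Steps S2 to S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the titles have already been segmented and part-of-speech tagged by some external tool (the patent does not name one), that tags beginning with "n" mark nouns, and that the everyday-word corpus is available as a plain set. All function names and sample data are hypothetical.

```python
from collections import Counter

def hotspot_words(tagged_titles):
    """S2: keep only nouns from titles given as lists of (word, pos_tag) pairs."""
    # Assumption: noun tags start with "n"; every other part of speech is dropped.
    return [word for title in tagged_titles
            for word, tag in title if tag.startswith("n")]

def specialized_words(words, everyday_words):
    """S3: exclude words that appear in the everyday-word corpus."""
    return [w for w in words if w not in everyday_words]

def frequency_table(words, top_n=100):
    """S4: count word frequencies and return the top_n in descending order."""
    return Counter(words).most_common(top_n)

# Hypothetical, pre-tagged sample titles.
titles = [
    [("deep", "a"), ("learning", "n"), ("method", "n"), ("for", "p"), ("segmentation", "n")],
    [("segmentation", "n"), ("study", "n"), ("based", "v"), ("on", "p"), ("learning", "n")],
    [("segmentation", "n"), ("analysis", "n")],
]
everyday = {"method", "study", "analysis"}

table = frequency_table(specialized_words(hotspot_words(titles), everyday))
print(table)
```

Keeping each step as a separate function mirrors the S2, S3 and S4 split, so a different tagger or corpus can be swapped in without touching the counting logic.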
Further, in the keyword-based paper search with an open online database as the data source, either a single keyword is used, or the search additionally includes one synonym or near-synonym that is most similar to that keyword.
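As the embodiment notes, records retrieved by both keywords must not be counted twice. A minimal sketch of merging the two result lists is given here; it assumes each record is a dictionary with `title`, `author` and `year` fields and that these three together identify a paper. The field names and the matching rule are illustrative assumptions, not details given in the patent.

```python
def merge_results(results_a, results_b):
    """Merge two search result lists, keeping the first copy of each paper."""
    seen = set()
    merged = []
    for record in results_a + results_b:
        # Assumption: normalized title, author and year together identify a paper.
        key = (record["title"].strip().lower(), record["author"], record["year"])
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged

# Hypothetical results for a keyword and its closest synonym.
primary = [
    {"title": "Image segmentation survey", "author": "Li", "year": 2017},
    {"title": "Graph learning", "author": "Wang", "year": 2018},
]
synonym = [
    {"title": "image segmentation survey ", "author": "Li", "year": 2017},
    {"title": "Semantic parsing", "author": "Zhao", "year": 2018},
]
merged = merge_results(primary, synonym)
print(len(merged))
```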
Further, the knowledge database is the CNKI (Hownet) database or another open online database containing professional paper data.
Further, the trend of research focus over time, obtained in step S5 by listing word frequency sub-tables by year, can be presented visually by drawing charts.
Further, the charts are drawn using ECharts.
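One way the yearly sub-tables might be handed to ECharts is sketched below. The code builds an ECharts `option` object as JSON; the keys used (`title`, `tooltip`, `legend`, `xAxis`, `yAxis`, `series`) are standard ECharts configuration, while the function name, the input layout and the sample counts are hypothetical. Rendering would then happen in the browser by passing the JSON to `setOption` on a chart instance.

```python
import json

def trend_option(yearly_counts, words):
    """Build an ECharts line-chart option from {year: {word: count}}."""
    years = sorted(yearly_counts)
    return {
        "title": {"text": "Research focus trend"},
        "tooltip": {"trigger": "axis"},
        "legend": {"data": words},
        "xAxis": {"type": "category", "data": [str(y) for y in years]},
        "yAxis": {"type": "value"},
        "series": [
            # One line per word; years where the word is absent count as 0.
            {"name": w, "type": "line",
             "data": [yearly_counts[y].get(w, 0) for y in years]}
            for w in words
        ],
    }

# Hypothetical yearly word counts.
sample = {2017: {"segmentation": 3, "learning": 1},
          2018: {"segmentation": 5},
          2016: {"learning": 2}}
option = trend_option(sample, ["segmentation", "learning"])
print(json.dumps(option, indent=2))
```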
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The research hotspot analysis method based on big data from expert papers provided by the invention uses dual-keyword search and takes paper titles as source data; through word segmentation, word frequency sorting, and exclusion by matching against the corpus of everyday words in professional papers, it obtains an effective specialized vocabulary list. The word frequency data obtained from big-data analysis of paper texts has a very large sample size and relatively high fidelity and accuracy, and represents the views of the industry's expert community. It can make up for the small samples and insufficient sampling randomness of conventional methods, and can provide a data reference for further subjective evaluation and analysis of expert opinions.
2. The research hotspot analysis method based on big data from expert papers provided by the invention can analyze the contribution of experts who have long worked in the field, the attention that various journals pay to the field, and data such as citation and attention levels. These data can help researchers quickly understand and grasp the state of research in the industry, so the method is highly efficient.
Detailed description of the invention
Fig. 1 is a flow chart of the research hotspot analysis method based on big data from expert papers according to the present invention.
Fig. 2 is a flow chart of the research hotspot analysis method based on big data from expert papers in the embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment:
As shown in Fig. 1, this embodiment provides a research hotspot analysis method based on big data from expert papers, comprising the following steps:
S1, searching for papers by keyword with a knowledge database as the data source, and crawling public data, namely paper title, publication date, author and data source;
S2, segmenting the downloaded paper titles into words, deleting structural words such as conjunctions, prepositions and pronouns, removing verbs and adjectives, and retaining only nouns, to obtain a hotspot word list;
S3, deleting the everyday words from the hotspot word list by matching against a corpus of everyday words used in professional papers, to obtain a specialized vocabulary list;
S4, performing word frequency analysis on the specialized vocabulary list, sorting by word frequency in descending order, and selecting the top 100 entries to obtain a master word frequency table of specialized vocabulary;
S5, on the basis of the master word frequency table, adding time data and listing word frequency sub-tables by year, to obtain the trend of research focus over time.
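Step S5 can be illustrated with a short sketch that splits word counts by publication year, restricted to the words in the master table. It assumes each paper has already been reduced to a `(year, words)` pair by the earlier steps; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def yearly_tables(records, master_words):
    """S5: build one word frequency sub-table per publication year."""
    tables = defaultdict(Counter)
    for year, words in records:
        for w in words:
            # Only words that made it into the master table are tracked.
            if w in master_words:
                tables[year][w] += 1
    # Each sub-table is sorted by descending frequency, like the master table.
    return {year: counter.most_common() for year, counter in sorted(tables.items())}

# Hypothetical (year, specialized words) pairs, one per paper title.
records = [
    (2017, ["segmentation", "learning"]),
    (2018, ["segmentation", "network"]),
    (2018, ["segmentation", "learning"]),
]
master = {"segmentation", "learning"}
tables = yearly_tables(records, master)
print(tables)
```

Reading the count of one word across the sub-tables in year order gives its trend line.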
The above method is described in detail below with reference to a specific embodiment which, as shown in Fig. 2, comprises the following steps:
S1, a keyword is entered in the web page; either a single keyword is entered, or one synonym or near-synonym most similar to that keyword is entered in addition. Clicking the search button starts the background data acquisition program, which obtains the list of paper titles related to the keyword downloaded from the CNKI (Hownet) database. Because two keywords may retrieve the same records, the overlapping part must not be counted twice. Therefore, after the data for the two keywords have been downloaded separately, the results should be checked for duplicates and the duplicates removed.
S2, the word segmentation program is started and segments the paper title list to obtain a word list;
S3, the part-of-speech analysis program is started; words tagged as conjunctions, prepositions, pronouns, verbs, adjectives and the like are deleted and only nouns are retained, giving the hotspot word list;
S4, the matching program for the corpus of everyday words in professional papers is started; everyday words are excluded from the word frequency share list of the hotspot word list, and the remaining words are sorted by word frequency in descending order to obtain the specialized vocabulary list;
S5, paper titles are separated by data source, giving a total data list and a list for each data source;
S6, the charting program is started and, from the total data list and the per-source lists, draws analysis charts such as the hotspot source distribution chart, hotspot share chart, hotspot time trend chart, expert contribution chart and journal contribution cloud chart.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the scope disclosed by the present invention and according to its technical solution and inventive concept, falls within the scope of protection of the present invention.
Claims (5)
1. A research hotspot analysis method based on big data from expert papers, characterized in that the method comprises the following steps:
S1, searching for papers by keyword with a knowledge database as the data source, and crawling public data, namely paper title, publication date, author and data source;
S2, segmenting the downloaded paper titles into words, deleting structural words such as conjunctions, prepositions and pronouns, removing verbs and adjectives, and retaining only nouns, to obtain a hotspot word list;
S3, deleting the everyday words from the hotspot word list by matching against a corpus of everyday words used in professional papers, to obtain a specialized vocabulary list;
S4, performing word frequency analysis on the specialized vocabulary list, sorting by word frequency in descending order, and selecting the top 100 entries to obtain a master word frequency table of specialized vocabulary;
S5, on the basis of the master word frequency table, adding time data and listing word frequency sub-tables by year, to obtain the trend of research focus over time.
2. The research hotspot analysis method based on big data from expert papers according to claim 1, characterized in that: in the keyword-based paper search with a knowledge database as the data source, either a single keyword is used, or the search additionally includes one synonym or near-synonym that is most similar to that keyword.
3. The research hotspot analysis method based on big data from expert papers according to claim 1, characterized in that: the knowledge database is the CNKI (Hownet) database or another open online database containing professional paper data.
4. The research hotspot analysis method based on big data from expert papers according to claim 1, characterized in that: the trend of research focus over time, obtained in step S5 by listing word frequency sub-tables by year, can be presented visually by drawing charts.
5. The research hotspot analysis method based on big data from expert papers according to claim 4, characterized in that: the charts are drawn using ECharts.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811616957 | 2018-12-28 | ||
CN2018116169573 | 2018-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109783526A (en) | 2019-05-21 |
Family
ID=66500439
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910029926.6A Pending CN109783526A (en) | 2018-12-28 | 2019-01-10 | A research hotspot analysis method based on big data from expert papers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109783526A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563172A (en) * | 2020-05-07 | 2020-08-21 | 上海宝藤生物医药科技股份有限公司 | Academic hotspot trend prediction method and device based on dynamic knowledge graph construction |
CN111966791A (en) * | 2020-09-03 | 2020-11-20 | 深圳市小满科技有限公司 | Extraction method and retrieval method of customs data product words |
CN112883148A (en) * | 2021-01-15 | 2021-06-01 | 上海柏观数据科技有限公司 | Subject talent evaluation control method and device based on research trend matching |
CN113094496A (en) * | 2021-04-29 | 2021-07-09 | 中国科学院西北生态环境资源研究院 | Entry library-based method for designing ranking list of hot words in periodicals |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136337A (en) * | 2013-02-01 | 2013-06-05 | 北京邮电大学 | Distributed knowledge data mining device and mining method used for complex network |
CN103164540A (en) * | 2013-04-15 | 2013-06-19 | 武汉大学 | Patent hotspot discovery and trend analysis method |
CN103440329A (en) * | 2013-09-04 | 2013-12-11 | 北京邮电大学 | Authoritative author and high-quality paper recommending system and recommending method |
CN104965931A (en) * | 2015-07-30 | 2015-10-07 | 成都布林特信息技术有限公司 | Big data based public opinion analysis method |
Non-Patent Citations (1)
Title |
---|
姜婷婷,等: "基于NEViewer的国内外图书情报领域研究热点对比分析", 《情报科学》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563172A (en) * | 2020-05-07 | 2020-08-21 | 上海宝藤生物医药科技股份有限公司 | Academic hotspot trend prediction method and device based on dynamic knowledge graph construction |
CN111563172B (en) * | 2020-05-07 | 2024-02-06 | 上海宝藤生物医药科技股份有限公司 | Academic hot spot trend prediction method and device based on dynamic knowledge graph construction |
CN111966791A (en) * | 2020-09-03 | 2020-11-20 | 深圳市小满科技有限公司 | Extraction method and retrieval method of customs data product words |
CN111966791B (en) * | 2020-09-03 | 2024-04-19 | 深圳市小满科技有限公司 | Method for extracting and retrieving customs data product words |
CN112883148A (en) * | 2021-01-15 | 2021-06-01 | 上海柏观数据科技有限公司 | Subject talent evaluation control method and device based on research trend matching |
CN113094496A (en) * | 2021-04-29 | 2021-07-09 | 中国科学院西北生态环境资源研究院 | Entry library-based method for designing ranking list of hot words in periodicals |
CN113094496B (en) * | 2021-04-29 | 2022-06-17 | 中国科学院西北生态环境资源研究院 | Entry library-based method for designing ranking list of hot words in periodicals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190521 |