WO2021035975A1 - Method and apparatus for predicting hot-topic subject on basis of multiple evaluation dimensions, terminal, and medium - Google Patents

Info

Publication number
WO2021035975A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
performance data
data
hot
research
Prior art date
Application number
PCT/CN2019/117967
Other languages
French (fr)
Chinese (zh)
Inventor
田欣
赵燕
普丽娜
胡寅骏
张嘉锐
Original Assignee
上海科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海科技发展有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Definitions

  • This application relates to the technical field of research disciplines, and in particular to a method, device, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions.
  • the purpose of this application is to provide a method, device, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions, in order to solve the problem that hot subjects cannot be predicted accurately and efficiently in the prior art.
  • the first aspect of this application provides a hot subject prediction method based on multiple evaluation dimensions, which includes: obtaining performance data of multiple research disciplines based on at least one evaluation dimension in a historical period of time; and constructing a time recurrent neural network model using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subjects corresponding to the next time node of the historical period.
  • the evaluation dimension includes the paper publication status, which covers the quantity of papers and/or the quality of papers; wherein the quality of papers includes any one or a combination of: the frequency with which papers are cited by other documents after publication, the number of papers included in top journals, the frequency with which papers are reported by media platforms after publication, and the evaluation that papers receive from media platforms after publication.
  • the method further includes: obtaining performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical time period; assigning values to missing performance data using the average value of the data at adjacent time nodes; and using the performance data with the missing values filled in as the model input data.
  • the method further includes: obtaining performance data of multiple research disciplines based on at least one evaluation dimension in a historical time period; and normalizing the acquired performance data, or the performance data with the missing values filled in, before using it as the model input data.
  • the type of the time recurrent neural network model includes an LSTM neural network model that uses a gradient descent algorithm as a model optimizer.
  • the research subject is selected from subject phrases in the vocabulary of the scientific and technological knowledge organization system.
  • the second aspect of this application provides a hot subject prediction device based on multiple evaluation dimensions, which includes: a data acquisition module for acquiring performance data of multiple research subjects based on at least one evaluation dimension in a historical period; and a hot subject prediction module for constructing a time recurrent neural network model using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each research subject based on the performance data, which is used to predict the hot subjects corresponding to the next time node of the historical period.
  • the third aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the hot subject prediction method based on multiple evaluation dimensions is realized.
  • a fourth aspect of the present application provides an electronic terminal, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the terminal executes the hot subject prediction method based on multiple evaluation dimensions.
  • the hot subject prediction method, device, terminal, and medium based on multiple evaluation dimensions of this application have the following beneficial effects: the hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses neural network algorithms to predict hot subjects; on the one hand it organizes the past academic development history, and on the other hand it discovers future academic development trends; it collects information on multiple dimensions, such as scientific research funding, information or media releases, paper publication, and patent application status, to improve the accuracy of the prediction results.
  • FIG. 1 shows a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
  • FIG. 2 shows a schematic diagram of the model structure of the LSTM neural network model in an embodiment of this application.
  • FIG. 3 is a schematic diagram showing the structure of the forget gate when transmitting between the units of the hidden layer of the LSTM neural network model in an embodiment of this application.
  • FIG. 4 shows a schematic flowchart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of this application.
  • FIG. 5 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of this application.
  • FIG. 6 shows a schematic structural diagram of a hot subject prediction device based on multiple evaluation dimensions in an embodiment of this application.
  • FIG. 7 shows a schematic structural diagram of an electronic terminal in an embodiment of this application.
  • the terms "installed", "connected", "fixed" and similar terms should be understood in a broad sense: for example, a connection can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be the internal communication between two components.
  • Hot disciplines are the research disciplines that attract high attention and have high research value. Investment institutions, the media, the financial sector, and universities constantly follow changes in scientific research hotspots, hoping to find a method that accurately predicts hot subjects so that they can position themselves in future hot subjects in advance. In the existing technology, however, predictions about future hot subjects are usually made only by an individual expert or a small elite group. Such predictions are strongly subjective and limited, and cannot accurately predict changes in hot subjects.
  • the present invention proposes corresponding solutions to effectively solve these problems in the prior art.
  • the present invention provides a method, device, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions, which aims to use the performance data of each research subject over a historical time period, in multiple dimensions such as the distribution of research funding, information or media releases, paper publications, or patent applications, to predict the distribution of hot subjects at the next time node, so that future hot subjects can be predicted accurately and efficiently.
  • FIG. 1 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application.
  • the hardware device may be a controller, such as an ARM (Advanced RISC Machines) controller, FPGA (Field Programmable Gate Array) controller, SoC (System on Chip) controller, DSP (Digital Signal Processing) controller, or MCU (Microcontroller Unit) controller; the hardware device may also be computer equipment including a memory, a storage controller, one or more processing units (CPU), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control equipment, and external ports; the computer equipment includes, but is not limited to, personal computers such as desktop computers, laptops, tablets, smart phones, smart TVs, and personal digital assistants (PDAs); the hardware device may also be a server, and the server may be arranged on one or more physical servers according to various factors such as function and load. It can be composed
  • the hot subject prediction method based on multiple evaluation dimensions includes step S101 and step S102.
  • step S101: obtain performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period of time.
  • the research disciplines include but are not limited to: engineering, science, agronomy, medicine, military science, management, philosophy, economics, education, literature, history, art, and so on.
  • Each category of discipline contains several lower-level disciplines, such as mathematics, physics, and chemistry. Because research disciplines are numerous, they are not all listed here.
  • the research subject is selected from subject phrases in the vocabulary of the scientific and technological knowledge organization system.
  • subject phrase concept_group of the "Science and Technology Knowledge Organization System (STKOS)" vocabulary can be used to represent the research subject.
  • the performance data of the research discipline based on at least one evaluation dimension in a historical time period is used to reflect the performance of the research discipline in at least one evaluation dimension in a past time period.
  • this past time period is adjustable and can be changed according to the actual research project; it can be measured in years (for example, the past 10 years), in months (for example, the past 5 months), or even in hours, which is not limited in this embodiment.
  • the evaluation dimensions include, but are not limited to: scientific research funding distribution, information or media release status, paper publication status, and patent application status. It should be noted that the evaluation dimensions used in this embodiment can be any one of these four, any combination of two or three of them, or all four, which is not limited in this embodiment.
  • scientific research funds generally refer to the various expenses used to develop scientific and technological undertakings; they are usually allocated by governments, enterprises, non-governmental organizations, foundations, and the like, domestic or foreign, through entrustment or the screening of application reports, and are used to solve specific scientific and technical problems.
  • the information or media release status refers to the release status based on the information or media platform.
  • information platforms include, but are not limited to, technology information platforms, financial information platforms, security information platforms, lifestyle information platforms, entertainment information platforms, sports information platforms, regional information platforms, shopping information platforms, and health information platforms.
  • Media platforms include public media platforms and/or self-media platforms. Public media platforms are media platforms that release information on behalf of the government, such as public broadcasting platforms, public television platforms, and public network platforms. Self-media platforms are media platforms through which the general public releases information over the Internet, such as Weibo platforms, blog platforms, Tieba forum platforms, WeChat platforms (including Moments, Official Accounts, Mini Programs, etc.), Alipay platforms (including life accounts, life circles, mini programs, etc.), and the MSN platform; they are not all listed here.
  • the publication situation of the paper refers to the publication situation of related papers in each research discipline, and the publication situation of the paper referred to in this application can be multi-dimensional.
  • the number of papers can be used to reflect the publication of the papers.
  • the quality of the papers can also be used to reflect the publication status of the papers, for example: the frequency with which papers related to the research subject are cited by other documents after publication, the number of such papers published in top journals (such as "Nature", "Science", "Cell", etc.), the frequency with which such papers are reported by media platforms after publication, and the evaluation that such papers receive from media platforms after publication.
  • the patent application status refers to the issuance of related patents in various research disciplines.
  • the patents referred to in this application can be invention patents, utility model patents, or design patents, which is not limited in this embodiment.
  • the status of patent applications can be reflected by statistics on many aspects, such as: the number of patent applications, the number of granted patents, the percentage of granted patents, the percentage of invention patents, patent quality evaluation scores, the remaining term of granted patents, the commercialization status of patented achievements, patent licensing, etc., which is not limited in this embodiment.
  • the following uses the historical period from 2000 to the present as an example to illustrate how to obtain the performance data of each research discipline in the evaluation dimensions of scientific research funding distribution, information or media release status, paper publication status, and patent application status.
  • the frequency data, etc. are used to measure the social attention of each research discipline; web crawlers are used to crawl data such as the number of articles published in domestic and foreign journals or conferences, or the number included in top journals, for the keyword groups of each research discipline from 2000 to the present; web crawlers are also used to crawl data such as the number of patent applications in each discipline from 2000 to the present.
  • step S102: a time recurrent neural network model is constructed using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subjects corresponding to the next time node of the historical period.
  • the time recurrent neural network model is an LSTM neural network model, which adds a mechanism for carrying information across multiple time steps.
  • the normalized four-dimensional feature data of each research subject over the past time period (t-n to t-1) is used as the input of the model; the output of the model is the proportion of each subject at time t, and the hot subjects are finally predicted according to these subject proportions.
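The input/output arrangement described above (features from t-n to t-1 in, subject proportions at t out) can be sketched as a sliding-window transformation. This is a minimal illustration only: the function names `make_windows` and `top_hot_subjects`, the window length `n`, and the "top k by proportion" ranking rule are assumptions made for the example and are not specified in the application.

```python
from typing import Dict, List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    targets: Sequence[object],
    n: int,
) -> List[Tuple[List[Sequence[float]], object]]:
    """Pair each window of n past feature vectors (t-n .. t-1) with the target at t.

    features[t] is the normalized feature vector at time step t (hypothetical layout);
    targets[t] is the subject proportion data observed at time step t.
    """
    samples = []
    for t in range(n, len(features)):
        samples.append((list(features[t - n:t]), targets[t]))
    return samples


def top_hot_subjects(proportions: Dict[str, float], k: int) -> List[str]:
    """Rank subjects by predicted proportion and return the k largest (assumed rule)."""
    ranked = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    feats = [[1.0], [2.0], [3.0], [4.0]]
    targs = [10, 20, 30, 40]
    print(make_windows(feats, targs, n=2))
    print(top_hot_subjects({"physics": 0.2, "chemistry": 0.5, "mathematics": 0.3}, k=2))
```

Each sample produced this way could be fed to a sequence model; the ranking helper only shows one plausible way to turn predicted proportions into a list of hot subjects.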
  • the LSTM neural network model is a modification of the RNN neural network: on the basis of the RNN, memory units are added to the neural units in the hidden layer, so that the memory information along the time series can be controlled.
  • through several controllable gates (forget gate, input gate, candidate gate, output gate), the degree to which previous and current information is remembered or forgotten can be controlled, so that the RNN network has a long-term memory function.
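To make the role of the four gates concrete, the following sketch implements one time step of a scalar LSTM cell using the standard textbook equations. It is illustrative only: the application does not give the gate equations, and the parameter names (`wf`, `uf`, `bf`, and so on) are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, squashing z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell (standard formulation, assumed here).

    x: current input; h_prev: previous hidden state; c_prev: previous cell (memory) state.
    p: weights and biases for the forget (f), input (i), candidate (g), and output (o) gates.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much old memory to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much new information to admit
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate gate: proposed new memory content
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much memory to expose
    c = f * c_prev + i * g                                   # updated cell state
    h = o * math.tanh(c)                                     # updated hidden state
    return h, c


if __name__ == "__main__":
    keys = ("wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo")
    params = {k: 0.0 for k in keys}
    params["bf"] = 20.0   # forget gate saturated open: old memory is kept
    params["bi"] = -20.0  # input gate saturated closed: new input is ignored
    print(lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=params))
```

With the forget gate held open and the input gate held closed, the cell state passes through almost unchanged, which is the "long-term memory" behavior the text describes; flipping the forget-gate bias to a large negative value erases it instead.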
  • the LSTM neural network model adopts a gradient descent algorithm as the model optimizer.
  • the gradient descent algorithm is an iterative method that can be used to solve least squares problems (including linear and nonlinear).
  • when solving for the minimum of the loss function, the gradient descent method can be applied iteratively, step by step, to obtain the minimized loss function and the corresponding model parameter values.
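As a small worked example of gradient descent on a least squares problem, the sketch below fits a straight line by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the learning rate, and the step count are illustrative assumptions; the application does not specify optimizer settings.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float],
             lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Points lying exactly on y = 2x + 1.
    w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(round(w, 4), round(b, 4))
```

Training an LSTM follows the same idea, with the gradients obtained by backpropagation through time rather than written out by hand.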
  • the LSTM neural network model uses a loss function to measure the gap between the predicted value output by the neural network model and the actual value.
  • the loss function used in the LSTM neural network model includes a classification loss function and/or a regression loss function.
  • the classification loss function includes, but is not limited to: log loss, focal loss, relative entropy loss, Hinge loss function, etc.;
  • the regression loss function includes but is not limited to: Mean Square Error loss function, Mean Absolute Error loss function, Log-cosh loss function, etc.; because these loss functions are well known, they are not described further here.
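For reference, the three regression losses named above can be written in a few lines each. This is a plain-Python sketch using their standard definitions; the function names are chosen for the example.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error: average of squared differences."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


def mae(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def log_cosh(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Log-cosh loss: roughly quadratic for small errors, roughly linear for large ones."""
    return sum(math.log(math.cosh(p - a)) for p, a in zip(pred, actual)) / len(pred)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]
    observed = [1.0, 2.0, 5.0]
    print(mse(predicted, observed), mae(predicted, observed), log_cosh(predicted, observed))
```

The log-cosh value never exceeds the mean absolute error on the same data, which is one reason it is sometimes preferred when outliers are present.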
  • the model structure of the LSTM neural network model is shown in FIG. 2, and the model includes an input layer, a number of hidden layers, and an output layer.
  • the reason why LSTM neural network has "memory" is that there are connections between networks at different "points in time”, rather than the presence of feedforward or feedback in the network at a single point in time, that is, between the hidden layers as shown in Figure 2.
  • The structure of the forget gate during transmission between the units of the hidden layer of the LSTM neural network model is shown in FIG. 3.
  • the figure lists time points 1 to 7, and each time point corresponds to an input layer, a hidden layer, and an output layer.
  • Each hidden layer neural unit has multiple forgotten gate valve nodes, wherein the valve node 31 marked " ⁇ " represents an open valve, and the valve node 32 marked “—” represents a closed valve.
  • the memory function of the LSTM neural network model is realized by these valve nodes: when a valve is open, the previous model training results are associated with the current model calculation; when a valve is closed, the previous calculation results no longer affect the current calculation.
  • the black solid neural unit represents the neural unit that carries information.
  • the valve node of the hidden layer neural unit at time point 1 that is connected to the input layer neural unit at time point 1 is open, so the input layer neural unit at time point 1 transmits information to the hidden layer neural unit at time point 1.
  • the valve node of the hidden layer neural unit at time point 1 that is connected to the output layer neural unit at time point 1 is closed, so the hidden layer neural unit at time point 1 cannot transmit information to the output layer neural unit at time point 1; by analogy, the information distribution in FIG. 3 is formed.
  • the acquired performance data of multiple research disciplines based on at least one evaluation dimension in a historical period is divided into a training set and a test set; the training set is used to train the LSTM neural network model, and the test set is then used to test and adjust the accuracy of the LSTM neural network model.
  • the hot subject prediction scheme based on multiple evaluation dimensions uses neural network algorithms to predict hot subjects; on the one hand it sorts out the past academic development history, and on the other hand it discovers future academic development trends; it collects information on multiple dimensions, such as research funding, information or media releases, paper publication, and patent applications, to improve the accuracy of the prediction results.
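One simple way to divide time-ordered samples into a training set and a test set is to cut the sequence at a fixed fraction, so that the test set contains only the most recent samples. The application does not say how the split is made; the chronological cut, the 80/20 default, and the name `chronological_split` are assumptions for this sketch.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split time-ordered samples: earliest part for training, latest part for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


if __name__ == "__main__":
    years = list(range(2000, 2010))
    train, test = chronological_split(years)
    print(train, test)
```

Keeping the test samples strictly later than the training samples avoids evaluating the model on periods it has effectively already seen.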
  • FIG. 4 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application.
  • the hot subject prediction method includes step S401 and step S402.
  • step S401: obtain performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period of time. It should be noted that the implementation of step S401 in this embodiment is similar to that of step S101 in the above embodiment, so it will not be repeated.
  • step S402: the average value of the data at adjacent time nodes is used to assign values to missing performance data, and the performance data with the missing values filled in is used as the model input data.
  • the acquired performance data of multiple research disciplines based on at least one evaluation dimension in a historical period is preliminarily processed and stored in the database.
  • for missing data, the average value of the data at adjacent time nodes is used to fill the gap; for example, the average value of the data for the adjacent years.
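The filling rule described above can be sketched as follows, with `None` marking a missing value in a yearly series. The function name `fill_missing` and the handling of edge cases (a gap at either end of the series, or a gap next to another gap) are assumptions; the application only states that the average of adjacent time nodes is used.

```python
from typing import List, Optional, Sequence


def fill_missing(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the average of its adjacent observed values.

    Assumption: if only one neighbour is observed (for example at the start or end of
    the series), that neighbour's value is used; if neither is observed, the gap stays.
    """
    filled: List[Optional[float]] = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = values[i - 1] if i > 0 else None
        right = values[i + 1] if i + 1 < len(values) else None
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


if __name__ == "__main__":
    # Hypothetical yearly counts with one missing year.
    print(fill_missing([10.0, None, 14.0]))
```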
  • FIG. 5 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application.
  • the hot subject prediction method includes step S501 and step S502.
  • step S501: obtain performance data of multiple research disciplines based on at least one evaluation dimension in a historical period of time. It should be noted that the implementation of step S501 in this embodiment is similar to the implementation of step S101 in the above embodiment, so it will not be repeated here.
  • step S502: the acquired performance data, or the performance data with the missing values filled in, is normalized and then used as the model input data.
  • Normalization is a dimensionless processing method that converts absolute values into relative values, which helps ensure the accuracy of the prediction.
  • the normalization method is, for example, to divide the total number of papers related to each research discipline in a given year by the total number of papers published in that year; another example is to divide the scientific research funding received by each research discipline in a given year by the total research funding for that year, and so on.
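The share-of-total normalization in these examples can be sketched as below for a single year. The name `normalize_by_total` is hypothetical, and the sketch assumes the yearly total is the sum over the tracked disciplines; if the total also covers papers or funding outside those disciplines, the denominator would be supplied separately.

```python
from typing import Dict


def normalize_by_total(values_by_subject: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute per-subject values for one year into shares of the yearly total."""
    total = sum(values_by_subject.values())
    if total == 0:
        # No activity recorded in this year: report a zero share for every subject.
        return {subject: 0.0 for subject in values_by_subject}
    return {subject: value / total for subject, value in values_by_subject.items()}


if __name__ == "__main__":
    # Hypothetical paper counts for one year.
    print(normalize_by_total({"physics": 30, "chemistry": 70}))
```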
  • the hot subject prediction device includes a data acquisition module 61 and a hot subject prediction module 62.
  • the data acquisition module 61 is used to acquire performance data of multiple research disciplines based on at least one evaluation dimension in a historical period;
  • the hot subject prediction module 62 is used to construct a time recurrent neural network model using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each of the research subjects based on the performance data, which is used to predict the hot subjects corresponding to the next time node of the historical period.
  • the implementation manner of the hot subject prediction device based on multiple evaluation dimensions provided in this embodiment is similar to the implementation manner of the hot subject prediction method based on multiple evaluation dimensions provided in the above embodiment, so it will not be repeated.
  • the division of the various modules of the above device is only a division of logical functions, and may be fully or partially integrated into one physical entity during actual implementation, or may be physically separated.
  • these modules can all be implemented in the form of software called by processing elements; they can also be implemented in the form of hardware; some modules can be implemented in the form of calling software by processing elements, and some of the modules can be implemented in the form of hardware.
  • the data acquisition module may be a separately established processing element, or it may be integrated in a chip of the above-mentioned device for implementation.
  • it may also be stored in the memory of the above-mentioned device in the form of program code, and a processing element of the above-mentioned device calls and executes the functions of the above data acquisition module.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated together or implemented independently.
  • the processing element described here may be an integrated circuit with signal processing capabilities.
  • each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs), etc.
  • the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU for short) or other processors that can call program codes.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC for short).
  • FIG. 7 shows a schematic structural diagram of an electronic terminal provided by an embodiment of the present application.
  • the electronic terminal provided in this example includes: a processor 71 and a memory 72; the memory 72 is connected to the processor 71 through a system bus, over which they communicate with each other; the memory 72 is used to store computer programs, and the processor 71 is used to run the computer programs so that the electronic terminal executes each step of the hot subject prediction method based on multiple evaluation dimensions.
  • the aforementioned system bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the system bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used to realize the communication between the database access device and other devices (such as the client, the read-write library and the read-only library).
  • the memory may include random access memory (Random Access Memory, RAM for short), and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method for predicting a hot subject based on multiple evaluation dimensions is realized.
  • a person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by hardware related to a computer program.
  • the aforementioned computer program can be stored in a computer-readable storage medium. When the program is executed, it executes the steps including the foregoing method embodiments; and the foregoing storage medium includes: ROM, RAM, magnetic disk, or optical disk and other media that can store program codes.
  • this application provides methods, devices, terminals, and media for predicting hot subjects based on multiple evaluation dimensions.
  • the hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses neural network algorithms to predict hot subjects.
  • on the one hand it organizes past academic development history, and on the other hand it discovers future academic development trends; it collects information in multiple dimensions, such as scientific research funding, information or media releases, paper publications, and patent applications, to improve the accuracy of the prediction results. Therefore, this application effectively overcomes various shortcomings in the prior art and has high industrial value.

Abstract

Provided in the present application are a method and apparatus for predicting a hot-topic subject on the basis of multiple evaluation dimensions, a terminal, and a medium. The method comprises: acquiring performance data of a plurality of research subjects based on at least one evaluation dimension in a historical period of time; and using the acquired performance data as model input data to construct a time recurrent neural network model, wherein the time recurrent neural network model outputs performance data-based subject proportion data of each research subject, and is used to predict a hot-topic subject corresponding to a next time node of the historical period of time. The scheme for predicting a hot-topic subject on the basis of multiple evaluation dimensions provided in the present invention uses a neural network algorithm to predict a hot-topic subject. On the one hand, past academic development history is organized, and on the other hand, future academic development trends are discovered. Multi-dimensional information such as scientific research funding status, information or media release status, paper publication status and patent application status is collected, and the accuracy of the prediction results is improved.

Description

Method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions
Technical field
This application relates to the technical field of research disciplines, and in particular to a method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions.
Background
As governments and society pay increasing attention to scientific research, investment institutions, the media, public finance bodies, and universities closely follow changes in research hotspots and hope to find an accurate prediction method that reveals the next research hotspot in advance. However, no method currently exists that predicts changes in research hotspots both accurately and efficiently, and this has become a problem in urgent need of a solution in this field.
Summary of the application
In view of the above shortcomings of the prior art, the purpose of this application is to provide a method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions, so as to solve the problem that hot subjects cannot be predicted accurately and efficiently in the prior art.
To achieve the above and other related purposes, a first aspect of this application provides a method for predicting hot subjects based on multiple evaluation dimensions, comprising: acquiring performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period; and using the acquired performance data as model input data to construct a time recurrent neural network model, wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subject corresponding to the next time node after the historical period.
In some embodiments of the first aspect of this application, the evaluation dimension includes paper publication status, which includes paper quantity and/or paper quality; the paper quality includes any one or a combination of: the frequency with which papers are cited by other documents, the number of papers included in top journals, the frequency with which papers are reported by media platforms after publication, and the evaluations resulting from such media reports.
In some embodiments of the first aspect of this application, the method further comprises: acquiring performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period; and assigning values to missing performance data using the average of the data at adjacent time nodes, and using the performance data with the missing values filled in as the model input data.
In some embodiments of the first aspect of this application, the method further comprises: acquiring performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period; and normalizing the acquired performance data, or the performance data with missing values filled in, before using it as the model input data.
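The application does not say which normalization is used. As a minimal sketch, assuming min-max scaling to the range [0, 1] applied to one evaluation dimension of one research discipline, the step could look as follows; the function name `min_max_normalize` and the handling of a constant series are assumptions made for illustration.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale a series to [0, 1] (assumed scheme; the application does not specify one)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no relative information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: yearly paper counts for one discipline.
print(min_max_normalize([10.0, 20.0, 30.0]))
```

Scaling each dimension separately keeps a dimension with large raw values, such as funding amounts, from dominating one with small raw values, such as patent counts.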
In some embodiments of the first aspect of this application, the type of the time recurrent neural network model includes an LSTM neural network model that uses a gradient descent algorithm as the model optimizer.
In some embodiments of the first aspect of this application, the research disciplines are selected from the subject phrases of the vocabulary of the Science and Technology Knowledge Organization System.
To achieve the above and other related purposes, a second aspect of this application provides an apparatus for predicting hot subjects based on multiple evaluation dimensions, comprising: a data acquisition module, configured to acquire performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period; and a hot subject prediction module, configured to use the acquired performance data as model input data to construct a time recurrent neural network model, wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subject corresponding to the next time node after the historical period.
To achieve the above and other related purposes, a third aspect of this application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method for predicting hot subjects based on multiple evaluation dimensions is implemented.
To achieve the above and other related purposes, a fourth aspect of this application provides an electronic terminal, comprising a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the terminal executes the method for predicting hot subjects based on multiple evaluation dimensions.
As described above, the method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions of this application have the following beneficial effects: the prediction scheme provided by the present invention uses a neural network algorithm to predict hot subjects; on the one hand it organizes past academic development history, and on the other hand it discovers future academic development trends; it collects information in multiple dimensions, such as scientific research funding, information or media releases, paper publications, and patent applications, to improve the accuracy of the prediction results.
Description of the drawings
FIG. 1 is a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
FIG. 2 is a schematic diagram of the model structure of the LSTM neural network model in an embodiment of this application.
FIG. 3 is a schematic diagram of the structure of the forget gates through which information passes between the units of the hidden layer of the LSTM neural network model in an embodiment of this application.
FIG. 4 is a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
FIG. 5 is a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
FIG. 6 is a schematic structural diagram of an apparatus for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
FIG. 7 is a schematic structural diagram of an electronic terminal in an embodiment of this application.
Detailed description
The following describes implementations of this application through specific examples; those skilled in the art can readily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of this application. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with each other.
It should be noted that the following description refers to the accompanying drawings, which describe several embodiments of this application. It should be understood that other embodiments may also be used, and that changes in mechanical composition, structure, electrical configuration, and operation may be made without departing from the spirit and scope of this application. The following detailed description should not be considered limiting, and the scope of the embodiments of this application is limited only by the claims of the published patent. The terms used here serve only to describe specific embodiments and are not intended to limit this application. Spatially relative terms, such as "upper", "lower", "left", "right", "below", "beneath", "lower part", "above", and "upper part", may be used in the text to ease description of the relationship between one element or feature shown in the figures and another element or feature.
In this application, unless expressly specified and limited otherwise, terms such as "installed", "connected", "joined", "fixed", and "held" should be understood broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific circumstances.
Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, operations, elements, components, items, types, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, operations, elements, components, items, types, and/or groups. The terms "or" and "and/or" as used here are to be interpreted as inclusive, meaning any one or any combination. Therefore, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition arises only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.
A hot subject is a research discipline that receives a high level of attention and has high research value. Investment institutions, the media, public finance bodies, and universities constantly follow changes in research hotspots, and all hope to find a method that accurately predicts hot subjects so that they can position themselves for future hot subjects in advance. In the prior art, however, predictions about future hot subjects are usually made only by an individual expert or by certain elite groups. Such predictions are strongly subjective and limited, and cannot accurately predict changes in hot subjects.
In view of this, the present invention proposes a corresponding solution to these problems in the prior art. The present invention provides a method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions. The aim is to predict the distribution of hot subjects at the next time node from the performance data of each research discipline in a historical period across multiple dimensions, such as the distribution of scientific research funding, information or media releases, paper publications, or patent applications, so that future hot subjects can be predicted accurately and efficiently.
FIG. 1 shows a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.
It should be noted that the method for predicting hot subjects based on multiple evaluation dimensions in this application can be applied to many types of hardware device. Specifically, the hardware device may be a controller, such as an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processing) controller, or an MCU (Microcontroller Unit) controller. The hardware device may also be a computer device comprising components such as a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, an RF circuit, an audio circuit, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports; such computer devices include, but are not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smartphones, smart TVs, and personal digital assistants (PDAs). The hardware device may also be a server, which may be deployed on one or more physical servers depending on factors such as function and load, or may consist of a distributed or centralized server cluster; this embodiment imposes no limitation.
In this embodiment, the method for predicting hot subjects based on multiple evaluation dimensions includes step S101 and step S102.
In step S101, performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period is acquired.
Optionally, the research disciplines include but are not limited to: engineering, science, agronomy, medicine, military science, management, philosophy, economics, education, literature, history, art, and so on. Each category of discipline is further divided into several lower-level disciplines; for example, science includes mathematics, physics, chemistry, and so on. Because there are many categories of research discipline, they are not listed one by one here.
Optionally, the research disciplines are selected from the subject phrases of the vocabulary of the Science and Technology Knowledge Organization System. Specifically, the concept_group subject phrases of the "Science and Technology Knowledge Organization System (STKOS)" vocabulary can be used to represent the research disciplines.
Optionally, the performance data of a research discipline based on at least one evaluation dimension in a historical period reflects how that discipline performed on at least one evaluation dimension over a past period. It should be noted that this past period is adjustable and can change as the actual research project changes. In addition, the period can be measured in years (for example, the past 10 years), in months (for example, the past 5 months), or even in hours; this embodiment imposes no limitation.
Optionally, the evaluation dimensions include but are not limited to: distribution of scientific research funding, information or media releases, paper publications, and patent applications. It should be noted that the evaluation dimensions used in this embodiment may be any one of these four dimensions, any combination of two or three of them, or all four; this embodiment imposes no limitation.
Here, scientific research funding refers generally to all expenditure on the development of science and technology. It is usually allocated by governments, enterprises, non-governmental organizations, foundations, and the like, either by commission or by screening application reports; it includes domestic and foreign funds and is used to solve specific scientific and technical problems.
Information or media releases refers to releases on information platforms or media platforms. Information platforms include, but are not limited to, platforms for technology, finance, security, lifestyle, entertainment, sports, regional news, shopping, health, tourism, and education. Media platforms include public media platforms and/or self-media platforms. Public media platforms are those that release information on behalf of official bodies, such as public broadcasting, public television, and public network platforms. Self-media platforms are those through which members of the general public release information over the Internet and similar channels, such as Weibo, blog platforms, Tieba forums, WeChat (including Moments, Official Accounts, Mini Programs, etc.), Alipay (including Life Accounts, Life Circles, Mini Programs, etc.), MSN, and so on; they are not listed one by one here.
Paper publication status refers to the publication of papers related to each research discipline, and it can be measured in several ways. On the one hand, the number of papers can be used: for example, the more papers are published in relation to a research discipline, the better that discipline's publication status. On the other hand, the quality of papers can be used: for example, the frequency with which papers related to the discipline are cited by other documents after publication, the number of such papers published in top journals (such as Nature, Science, and Cell), the frequency with which such papers are reported by media platforms after publication, the evaluations resulting from such media reports, and so on.
Patent application status refers to the status of patents related to each research discipline. The patents referred to in this application may be invention patents, utility model patents, or design patents; this embodiment imposes no limitation. In addition, patent application status can be measured in many ways, for example: number of patent applications, number of granted patents, proportion of granted patents, proportion of invention patents, patent quality assessment scores, remaining term of granted patents, commercialization of patented results, licensing of patents, and so on; this embodiment imposes no limitation.
Taking the period from 2000 to the present as the historical period of this embodiment, the following explains how to acquire the performance data of each research discipline from 2000 to the present on the evaluation dimensions of research funding distribution, information or media releases, paper publications, and patent applications.
A web crawler is used to collect, for the period from 2000 to the present: data on how domestic and foreign research funding has been distributed across the research disciplines; data such as the frequency with which the keyword groups of each research discipline appear in domestic and foreign news items or articles, which is used to measure the social attention each discipline receives; data such as the number of papers matching each discipline's keyword groups published in domestic and foreign journals or conferences, or included in top journals; and data such as the number of patent applications in each research discipline.
In step S102, the acquired performance data is used as model input data to construct a time recurrent neural network model, wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subject corresponding to the next time node after the historical period.
Optionally, the time recurrent neural network model is an LSTM neural network model, which adds a way of carrying information across multiple time steps. The normalized four-dimensional feature data of each research discipline over the past period (t-n to t-1) serves as the model input, and the model output is the predicted proportion of each discipline at time t; the predicted hot subjects are then obtained from these proportions.
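As an illustration only, the pairing of a past window (t-n to t-1) with the target at time t can be sketched in Python. The function names `make_windows`, `to_proportions`, and `hottest`, and the rule of dividing each score by the total, are assumptions made for this sketch; the application states only that the model outputs a proportion for each discipline at time t.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[List[float]], List[float]]


def make_windows(series: List[List[float]], n: int) -> List[Window]:
    """Pair every run of n consecutive time steps with the step that follows it."""
    return [(series[t - n:t], series[t]) for t in range(n, len(series))]


def to_proportions(scores: Sequence[float]) -> List[float]:
    """Scale non-negative per-discipline scores so that they sum to 1 (assumed rule)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def hottest(names: Sequence[str], proportions: Sequence[float]) -> str:
    """Return the discipline with the largest predicted proportion."""
    return max(zip(names, proportions), key=lambda pair: pair[1])[0]


# Hypothetical per-discipline scores for time t.
shares = to_proportions([1.0, 1.0, 2.0])
print(shares, hottest(["physics", "chemistry", "biology"], shares))
```

Each element of `series` stands for the feature vector of one time node; with four evaluation dimensions it would hold four normalized values.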
Specifically, the LSTM neural network model is a variant of the RNN neural network: on the basis of an RNN, memory cells are added to the neural units of the hidden layer so that the information remembered along the time series becomes controllable. Each time information passes between the units of the hidden layer, it goes through several controllable gates (forget gate, input gate, candidate gate, output gate), which control how much of the previous and current information is remembered or forgotten. This gives the RNN network a long-term memory function.
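For reference, the four gates named above are commonly written as follows in the standard textbook formulation of an LSTM cell. The application does not give these equations, so the symbols are assumptions: x_t is the input at time step t, h_t the hidden state, c_t the memory cell, W and b the learned weights and biases, σ the logistic sigmoid, and ⊙ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

When f_t is close to 1 the previous cell content c_{t-1} is kept almost unchanged, and when it is close to 0 that content is discarded, which is how the degree of remembering and forgetting is controlled.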
Optionally, the LSTM neural network model uses a gradient descent algorithm as the model optimizer. Gradient descent is an iterative method that can be used to solve least squares problems, both linear and nonlinear. To find the minimum of the loss function, gradient descent iterates step by step until it arrives at the minimized loss and the corresponding model parameter values.
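The iteration can be illustrated on the simplest least squares problem, fitting a line y = w·x + b. This is a minimal sketch rather than the optimizer of the application; the function name `gradient_descent`, the learning rate, and the step count are assumptions chosen for the example.

```python
from typing import Sequence, Tuple


def gradient_descent(xs: Sequence[float], ys: Sequence[float],
                     lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y = w * x + b by minimizing the mean squared error with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The points lie exactly on y = 2x + 1, so the fit should approach w = 2, b = 1.
w, b = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 4), round(b, 4))
```

Training a neural network applies the same update rule to every weight, with the gradients obtained by backpropagation through time instead of a closed-form derivative.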
Optionally, the LSTM neural network model uses a loss function to measure the gap between the predicted values output by the model and the actual values. The loss function used by the LSTM neural network model includes a classification loss function and/or a regression loss function. Classification loss functions include, but are not limited to: log loss, focal loss, relative entropy loss, hinge loss, and so on. Regression loss functions include, but are not limited to: mean squared error, mean absolute error, log-cosh loss, and so on. Because these loss functions are already known, they are not described further.
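The three regression losses named above have simple definitions, sketched here in plain Python for two equal-length sequences of predicted and actual values. The function names are chosen for this example and are not taken from the application.

```python
import math
from typing import Sequence


def mean_squared_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Average of the squared differences; penalizes large errors heavily."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


def mean_absolute_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Average of the absolute differences; less sensitive to outliers."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def log_cosh_loss(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Average of log(cosh(error)); close to squared error for small errors
    and close to absolute error for large ones."""
    return sum(math.log(math.cosh(p - a)) for p, a in zip(pred, actual)) / len(pred)


predicted = [1.0, 2.0]
observed = [1.0, 4.0]
print(mean_squared_error(predicted, observed), mean_absolute_error(predicted, observed))
```

Since the model output here is a proportion per discipline, a regression loss of this kind is one plausible choice, although the application leaves the choice open.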
In one embodiment, the model structure of the LSTM neural network model is as shown in FIG. 2; the model includes an input layer, several hidden layers, and an output layer. The LSTM neural network has "memory" because the networks at different "time points" are connected to each other, rather than there being only feedforward or feedback within the network at a single time point. This is shown in FIG. 2 by the dashed arrows connecting the hidden layers, where each dashed arrow represents a connection that jumps between neural units along the sequence of time steps.
The structure of the forget gates through which information passes between the units of the hidden layer of the LSTM neural network model is shown in FIG. 3. The figure lists time points 1 to 7, and above each time point there is a corresponding input layer, hidden layer, and output layer. Each hidden layer neural unit has several forget-gate valve nodes: a valve node 31 marked "○" is an open valve, and a valve node 32 marked with a dash is a closed valve. The memory function of the LSTM neural network model is implemented by these valve nodes. When a valve is open, the earlier model training results feed into the current model computation; when a valve is closed, the earlier results no longer affect the current computation.
Specifically, a solid black neural unit is one that carries information. In FIG. 3, the valve node connecting the hidden layer neural unit at time point 1 to the input layer neural unit at time point 1 is open, so the input layer neural unit at time point 1 passes its information to the hidden layer neural unit at time point 1. However, the valve node connecting the hidden layer neural unit at time point 1 to the output layer neural unit at time point 1 is closed, so the hidden layer neural unit at time point 1 cannot pass its information to the output layer neural unit at time point 1. Continuing in the same way yields the distribution of information shown in FIG. 3.
In some optional implementations, the acquired performance data of the multiple research subjects over a historical period, based on at least one evaluation dimension, is divided into a training set and a test set. The training set is used to train the LSTM neural network model, and the test set is then used to test the model's accuracy and adjust the model.
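The text does not say how the data is divided. One common choice for time-series data, assumed here purely for illustration, is a chronological split in which the earliest records form the training set and the most recent ones form the test set. The function name `split_chronologically` and the `(year, value)` record layout are hypothetical.

```python
def split_chronologically(records, train_fraction=0.8):
    """Split (year, value) records into an earlier training set and a
    later test set. Assumes a chronological split; the source does not
    specify the splitting rule."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to split")

    ordered = sorted(records, key=lambda record: record[0])
    # Keep at least one record on each side of the cut.
    cut = min(max(1, int(len(ordered) * train_fraction)), len(ordered) - 1)
    return ordered[:cut], ordered[cut:]
```

Splitting by time rather than at random keeps later observations out of the training set, so the test set measures how well the model extrapolates forward.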
Therefore, the hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses a neural network algorithm to predict hot subjects, both organizing the past history of academic development and discovering future academic trends. It collects information in multiple dimensions, such as research funding, news or media coverage, paper publication, and patent applications, to improve the accuracy of the prediction results.
FIG. 4 is a schematic flowchart of a hot subject prediction method based on multiple evaluation dimensions according to an embodiment of the present application. The method includes steps S401 and S402.
In step S401, performance data of multiple research subjects over a historical period, based on at least one evaluation dimension, is acquired. Step S401 of this embodiment is implemented similarly to step S101 of the embodiment above, so it is not described again.
In step S402, missing performance data is assigned the average of the data at the adjacent time nodes, and the performance data with the missing values filled in is used as the model input data.
Data is often missing during collection. For example, the paper publication data for research subject A may have been collected for 2001 and 2003 while the 2002 data is missing. This leaves the data source incomplete, which in turn degrades the accuracy of the model's final predictions.
Therefore, in this embodiment, the acquired performance data of the multiple research subjects over a historical period, based on at least one evaluation dimension, is preliminarily processed and stored in a database. Missing values are filled with the average of the data at the adjacent time nodes, for example the average of the data for the adjacent years. Filling missing values this way not only makes the data complete but also keeps the filled-in values from being so abrupt that they distort the predictions.
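The filling rule described in step S402 can be sketched as follows. The text only covers the simple case of a single gap between two observed values; how consecutive gaps and gaps at either end of the series are handled is an assumption of this sketch (the nearest observed neighbours are used, and an edge gap copies its single neighbour). The function name `fill_missing_with_neighbor_mean` is hypothetical.

```python
from typing import List, Optional


def fill_missing_with_neighbor_mean(series: List[Optional[float]]) -> List[float]:
    """Replace each None with the mean of the nearest observed values
    before and after it. Edge gaps copy the single available neighbour
    (an assumption; the source only describes interior gaps)."""
    filled = list(series)
    for index, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed value to the left, if any.
        before = next((series[j] for j in range(index - 1, -1, -1)
                       if series[j] is not None), None)
        # Nearest observed value to the right, if any.
        after = next((series[j] for j in range(index + 1, len(series))
                      if series[j] is not None), None)
        neighbors = [v for v in (before, after) if v is not None]
        if not neighbors:
            raise ValueError("series contains no observed values")
        filled[index] = sum(neighbors) / len(neighbors)
    return filled
```

For the example in the text, a series with observations for 2001 and 2003 and a gap for 2002 would receive the mean of the two observed years as its 2002 value.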
FIG. 5 is a schematic flowchart of a hot subject prediction method based on multiple evaluation dimensions according to an embodiment of the present application. The method includes steps S501 and S502.
In step S501, performance data of multiple research subjects over a historical period, based on at least one evaluation dimension, is acquired. Step S501 of this embodiment is implemented similarly to step S101 of the embodiment above, so it is not described again.
In step S502, the acquired performance data, or the performance data with missing values filled in, is normalized before being used as the model input data. Normalization is a way of making quantities dimensionless: it converts the absolute values of a physical system into relative values, which helps ensure prediction accuracy.
Specifically, normalization may, for example, divide the total frequency of papers related to each research subject by the total number of papers published in that year; as another example, it may divide the research funding each research subject received in a given year by the total research funding for that year; and so on.
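The per-year normalization described above can be sketched as follows. The nested-dictionary layout and the function name `normalize_by_year_total` are hypothetical, and the sketch assumes that the yearly total is the sum over the subjects being tracked; the text may intend a broader total, such as all papers published that year.

```python
def normalize_by_year_total(values_by_year):
    """Convert absolute values into per-year shares.

    values_by_year: {year: {subject: absolute value}}
    Returns {year: {subject: value / total for that year}}.
    Assumes the yearly total is the sum over the tracked subjects.
    """
    shares = {}
    for year, by_subject in values_by_year.items():
        total = sum(by_subject.values())
        if total == 0:
            # No activity recorded that year: every share is zero.
            shares[year] = {subject: 0.0 for subject in by_subject}
        else:
            shares[year] = {subject: value / total
                            for subject, value in by_subject.items()}
    return shares
```

Because each value is divided by a total in the same unit, the same function applies equally to paper counts and to funding amounts, and the resulting shares are comparable across dimensions.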
FIG. 6 is a schematic structural diagram of a hot subject prediction apparatus based on multiple evaluation dimensions according to an embodiment of the present application. The apparatus includes a data acquisition module 61 and a hot subject prediction module 62.
The data acquisition module 61 is configured to acquire performance data of multiple research subjects over a historical period, based on at least one evaluation dimension. The hot subject prediction module 62 is configured to construct a time-recurrent neural network model using the acquired performance data as model input data. The time-recurrent neural network model outputs, for each research subject, subject proportion data based on the performance data, which is used to predict the hot subjects at the next time node corresponding to the historical period.
The hot subject prediction apparatus based on multiple evaluation dimensions provided in this embodiment is implemented similarly to the hot subject prediction method based on multiple evaluation dimensions provided in the embodiments above, so it is not described again.
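This passage does not state how hot subjects are chosen from the subject proportion data that the model outputs. One simple reading, assumed here purely for illustration, is to rank the subjects by their predicted proportion for the next time node and take the top few. The function name `rank_hot_subjects`, the parameter `top_k`, and the sample subject names in the usage note are hypothetical.

```python
def rank_hot_subjects(predicted_shares, top_k=3):
    """Return the top_k subjects ordered by predicted proportion.

    predicted_shares: {subject: predicted proportion at the next node}
    Ties are broken alphabetically so the result is deterministic.
    The top-k rule is an assumption, not the application's stated rule.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    ranked = sorted(predicted_shares.items(),
                    key=lambda item: (-item[1], item[0]))
    return [subject for subject, _ in ranked[:top_k]]
```

For instance, calling `rank_hot_subjects({"materials": 0.25, "genomics": 0.40, "robotics": 0.35}, top_k=2)` selects the two subjects with the largest predicted proportions.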
It should also be understood that the division of the above apparatus into modules is only a division of logical functions. In an actual implementation the modules may be wholly or partly integrated into one physical entity, or they may be physically separate. The modules may all be implemented as software invoked by a processing element, or all as hardware; alternatively, some modules may be implemented as software invoked by a processing element and others as hardware. For example, the data acquisition module may be a separately provided processing element, or it may be integrated into a chip of the apparatus; it may also be stored as program code in the memory of the apparatus and invoked by a processing element of the apparatus to perform the functions of the data acquisition module. The other modules are implemented similarly. All or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method, or each of the above modules, may be carried out by integrated logic circuits in the hardware of the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, the modules may be integrated together and implemented as a system-on-a-chip (SOC).
FIG. 7 is a schematic structural diagram of yet another electronic terminal provided by an embodiment of the present application. The electronic terminal provided in this example includes a processor 71 and a memory 72. The memory 72 is connected to the processor 71 through a system bus, over which the two communicate. The memory 72 stores a computer program, and the processor 71 runs the computer program so that the electronic terminal performs the steps of the hot subject prediction method based on multiple evaluation dimensions described above.
The system bus mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the database access apparatus and other devices (for example, a client, a read-write database, and a read-only database). The memory may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory.
The processor described above may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In one embodiment, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the hot subject prediction method based on multiple evaluation dimensions.
A person of ordinary skill in the art will understand that all or some of the steps of the method embodiments above may be carried out by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments above. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
In summary, the present application provides a method, apparatus, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions. The hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses a neural network algorithm to predict hot subjects, both organizing the past history of academic development and discovering future academic trends. It collects information in multiple dimensions, such as research funding, news or media coverage, paper publication, and patent applications, to improve the accuracy of the prediction results. The present application therefore effectively overcomes various shortcomings of the prior art and has high industrial value.
The embodiments above merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone skilled in the art may modify or change the embodiments above without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.

Claims (10)

  1. A hot subject prediction method based on multiple evaluation dimensions, characterized by comprising:
    acquiring performance data of multiple research subjects over a historical period, based on at least one evaluation dimension;
    constructing a time-recurrent neural network model using the acquired performance data as model input data, wherein the time-recurrent neural network model outputs, for each of the research subjects, subject proportion data based on the performance data, for predicting the hot subjects at the next time node corresponding to the historical period.
  2. The method according to claim 1, wherein the evaluation dimensions include any one or a combination of: research funding distribution, news or media coverage, paper publication, and patent applications.
  3. The method according to claim 2, wherein:
    the evaluation dimensions include paper publication, and the paper publication includes the number of papers and/or the quality of papers;
    wherein the quality of papers includes any one or a combination of: how often the papers are cited by other documents, how many of the papers are included in top journals, how often the papers are reported by media platforms after publication, and the evaluations resulting from the media platforms' reports on the papers after publication.
  4. The method according to claim 1, wherein the method further comprises:
    acquiring performance data of multiple research subjects over a historical period, based on at least one evaluation dimension;
    assigning missing performance data the average of the data at the adjacent time nodes, and using the performance data with the missing values filled in as the model input data.
  5. The method according to claim 1 or 3, wherein the method further comprises:
    acquiring performance data of multiple research subjects over a historical period, based on at least one evaluation dimension;
    normalizing the acquired performance data, or the performance data with missing values filled in, before using it as the model input data.
  6. The method according to claim 1, wherein the types of the time-recurrent neural network model include an LSTM neural network model that uses a gradient descent algorithm as the model optimizer.
  7. The method according to claim 1, wherein the research subjects are selected from the subject terms in a vocabulary of a scientific and technological knowledge organization system.
  8. A hot subject prediction apparatus based on multiple evaluation dimensions, characterized by comprising:
    a data acquisition module, configured to acquire performance data of multiple research subjects over a historical period, based on at least one evaluation dimension;
    a hot subject prediction module, configured to construct a time-recurrent neural network model using the acquired performance data as model input data, wherein the time-recurrent neural network model outputs, for each of the research subjects, subject proportion data based on the performance data, for predicting the hot subjects at the next time node corresponding to the historical period.
  9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the hot subject prediction method based on multiple evaluation dimensions according to any one of claims 1 to 7.
  10. An electronic terminal, characterized by comprising a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program stored in the memory, so that the terminal performs the hot subject prediction method based on multiple evaluation dimensions according to any one of claims 1 to 7.
PCT/CN2019/117967 2019-08-23 2019-11-13 Method and apparatus for predicting hot-topic subject on basis of multiple evaluation dimensions, terminal, and medium WO2021035975A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910785088.5 2019-08-23
CN201910785088.5A CN110705821A (en) 2019-08-23 2019-08-23 Hotspot subject prediction method, device, terminal and medium based on multiple evaluation dimensions

Publications (1)

Publication Number Publication Date
WO2021035975A1 true WO2021035975A1 (en) 2021-03-04

Family

ID=69194029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117967 WO2021035975A1 (en) 2019-08-23 2019-11-13 Method and apparatus for predicting hot-topic subject on basis of multiple evaluation dimensions, terminal, and medium

Country Status (2)

Country Link
CN (1) CN110705821A (en)
WO (1) WO2021035975A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343082A (en) * 2021-05-25 2021-09-03 北京字节跳动网络技术有限公司 Hot field prediction model generation method and device, storage medium and equipment
CN113837807A (en) * 2021-09-27 2021-12-24 北京奇艺世纪科技有限公司 Heat prediction method and device, electronic equipment and readable storage medium
CN114710413A (en) * 2022-03-31 2022-07-05 中国农业银行股份有限公司 Method and device for predicting network state of bank outlets

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989070B (en) * 2020-06-17 2022-07-19 浙江大学 Core periodical quantitative evaluation system and method based on computer system
CN113239071B (en) * 2021-07-08 2022-02-11 北京邮电大学 Retrieval query method and system for scientific and technological resource subject and research topic information
CN114742328A (en) * 2022-06-13 2022-07-12 北京邮电大学 Method and device for predicting field trend of science and technology conference and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089152A1 (en) * 2016-09-02 2018-03-29 Digital Genius Limited Message text labelling
CN109214562A (en) * 2018-08-24 2019-01-15 国网山东省电力公司电力科学研究院 A kind of power grid scientific research hotspot prediction and method for pushing based on RNN
CN109359824A (en) * 2018-09-25 2019-02-19 浙江理工大学 A kind of evaluation method of the academic level of subject

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373830A (en) * 2015-12-11 2016-03-02 中国科学院上海高等研究院 Prediction method and system for error back propagation neural network and server
CN109213869B (en) * 2017-06-29 2021-08-13 中国科学技术大学 Hot spot technology prediction method based on multi-source data

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343082A (en) * 2021-05-25 2021-09-03 北京字节跳动网络技术有限公司 Hot field prediction model generation method and device, storage medium and equipment
CN113837807A (en) * 2021-09-27 2021-12-24 北京奇艺世纪科技有限公司 Heat prediction method and device, electronic equipment and readable storage medium
CN113837807B (en) * 2021-09-27 2023-07-21 北京奇艺世纪科技有限公司 Heat prediction method, heat prediction device, electronic equipment and readable storage medium
CN114710413A (en) * 2022-03-31 2022-07-05 中国农业银行股份有限公司 Method and device for predicting network state of bank outlets

Also Published As

Publication number Publication date
CN110705821A (en) 2020-01-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19943408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19943408

Country of ref document: EP

Kind code of ref document: A1