WO2021035975A1

WO2021035975A1 - Method and apparatus for predicting hot-topic subject on basis of multiple evaluation dimensions, terminal, and medium

Info

Publication number: WO2021035975A1
Application number: PCT/CN2019/117967
Authority: WO
Inventors: 田欣; 赵燕; 普丽娜; 胡寅骏; 张嘉锐
Original assignee: 上海科技发展有限公司
Priority date: 2019-08-23
Filing date: 2019-11-13
Publication date: 2021-03-04
Also published as: CN110705821A

Abstract

Provided in the present application are a method and apparatus for predicting a hot-topic subject on the basis of multiple evaluation dimensions, a terminal, and a medium. The method comprises: acquiring performance data of a plurality of research subjects based on at least one evaluation dimension in a historical period of time; and using the acquired performance data as model input data to construct a time recurrent neural network model, wherein the time recurrent neural network model outputs performance data-based subject proportion data of each research subject, and is used to predict a hot-topic subject corresponding to a next time node of the historical period of time. The scheme for predicting a hot-topic subject on the basis of multiple evaluation dimensions provided in the present invention uses a neural network algorithm to predict a hot-topic subject. On the one hand, past academic development history is organized, and on the other hand, future academic development trends are discovered. Multi-dimensional information such as scientific research funding status, information or media release status, paper publication status and patent application status is collected, and the accuracy of the prediction results is improved.

Description

Forecasting methods, devices, terminals, and media for hot subjects based on multiple evaluation dimensions

Technical field

This application relates to the technical field of research disciplines, in particular to the prediction methods, devices, terminals, and media of hot disciplines based on multiple evaluation dimensions.

Background art

As the government and society pay more attention to scientific research, various investment institutions, media, finance and universities are always paying attention to the changes in scientific research hotspots, hoping to find an accurate prediction method to know the next scientific research hotspot in advance. However, there is currently no method that can accurately and efficiently predict changes in scientific research hotspots, which has become an urgent problem in this field.

Summary of the application

In view of the above-mentioned shortcomings of the prior art, the purpose of this application is to provide a method, device, terminal, and medium for predicting hot subjects based on multiple evaluation dimensions, so as to solve the problem that hot subjects cannot be predicted accurately and efficiently in the prior art.

To achieve the above and other related purposes, the first aspect of this application provides a hot subject prediction method based on multiple evaluation dimensions, which includes: obtaining performance data of multiple research disciplines based on at least one evaluation dimension in a historical period of time; and constructing a time recurrent neural network model using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subject at the next time node following the historical period.

In some embodiments of the first aspect of the application, the evaluation dimension includes the publication status of papers, which includes the quantity of papers and/or the quality of papers; wherein the quality of papers includes any one or a combination of: the frequency with which the papers are cited by other documents, the number of papers included in top journals, the frequency with which the papers are reported by media platforms after publication, and the evaluation given to the papers by media platforms after publication.

In some embodiments of the first aspect of the present application, the method further includes: after obtaining performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical time period, using the average value of the data of adjacent time nodes to assign values to missing performance data, and using the performance data after filling in the missing data as the input data of the model.

In some embodiments of the first aspect of the present application, the method further includes: after obtaining performance data of multiple research disciplines based on at least one evaluation dimension in a historical time period, normalizing the acquired performance data, or the performance data after the missing data has been filled in, and then using it as the input data of the model.

In some embodiments of the first aspect of the present application, the type of the time recurrent neural network model includes an LSTM neural network model that uses a gradient descent algorithm as a model optimizer.

In some embodiments of the first aspect of the present application, the research subject is selected from subject phrases in the vocabulary of the scientific and technological knowledge organization system.

To achieve the above and other related purposes, the second aspect of this application provides a hot subject prediction device based on multiple evaluation dimensions, which includes: a data acquisition module for acquiring performance data of multiple research subjects based on at least one evaluation dimension in a historical period of time; and a hot subject prediction module for constructing a time recurrent neural network model using the acquired performance data as the model input data; wherein the time recurrent neural network model outputs subject proportion data of each research subject based on the performance data, which is used to predict the hot subjects at the next time node following the historical period.

To achieve the above and other related purposes, the third aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the hot subject prediction method based on multiple evaluation dimensions.

To achieve the above and other related purposes, the fourth aspect of the present application provides an electronic terminal, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the terminal executes the hot subject prediction method based on multiple evaluation dimensions.

As mentioned above, the hot subject prediction method, device, terminal, and medium based on multiple evaluation dimensions of this application have the following beneficial effects: the hot subject prediction scheme provided by the present invention uses a neural network algorithm to predict hot subjects. On the one hand, it organizes the past history of academic development; on the other hand, it discovers future academic development trends. It collects information in multiple dimensions, such as scientific research funding, information or media releases, paper publication, and patent application status, which improves the accuracy of the prediction results.

Description of the drawings

FIG. 1 shows a schematic flowchart of a method for predicting hot subjects based on multiple evaluation dimensions in an embodiment of this application.

FIG. 2 shows a schematic diagram of the model structure of the LSTM neural network model in an embodiment of this application.

FIG. 3 is a schematic diagram showing the structure of the forget gate when transmitting between the units of the hidden layer of the LSTM neural network model in an embodiment of this application.

FIG. 4 shows a schematic flowchart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of this application.

FIG. 5 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of this application.

FIG. 6 shows a schematic structural diagram of a hot subject prediction device based on multiple evaluation dimensions in an embodiment of this application.

FIG. 7 shows a schematic structural diagram of an electronic terminal in an embodiment of this application.

Detailed description

The following describes the implementation of the present application through specific examples, and those skilled in the art can easily understand the other advantages and effects of the present application from the content disclosed in this specification. This application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed based on different viewpoints and applications without departing from the spirit of the application. It should be noted that, where there is no conflict, the following embodiments and the features in the embodiments can be combined with each other.

It should be noted that the following description refers to the accompanying drawings, which describe several embodiments of the present application. It should be understood that other embodiments can also be used, and mechanical, structural, electrical, and operational changes can be made without departing from the spirit and scope of the application. The following detailed description should not be considered restrictive, and the scope of the embodiments of the present application is limited only by the claims of the published patent. The terms used here are only for describing specific embodiments, and are not intended to limit the application. Space-related terms, such as "upper", "lower", "left", "right", "below", and "above", can be used in the text for ease of explaining the relationship between one element or feature shown in the figure and another element or feature.

In this application, unless expressly stipulated and limited otherwise, the terms "installed", "connected", "fixed" and similar terms should be understood in a broad sense. For example, a connection can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be the internal communication between two components. For those of ordinary skill in the art, the specific meanings of the above-mentioned terms in this application can be understood according to specific circumstances.

Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to also include the plural forms, unless the context dictates to the contrary. It should be further understood that the terms "comprising" and "including" indicate the presence of the described features, operations, elements, components, items, types, and/or groups, but do not exclude the existence, appearance, or addition of one or more other features, operations, elements, components, items, types, and/or groups. The terms "or" and "and/or" used herein are interpreted as inclusive, meaning any one or any combination. Therefore, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.

Hot disciplines refer to research disciplines that attract high attention and have high research value. Investment institutions, media, finance and universities constantly follow the changes in scientific research hotspots, hoping to find a method that accurately predicts hot topics so that they can position themselves in future hot subjects in advance. However, in the existing technology, predictions about future hot subjects are usually made only by individual experts or small elite groups. These predictions are strongly subjective and limited, and cannot accurately predict the changes in hot subjects.

In view of this, the present invention proposes corresponding solutions to these problems in the prior art. The present invention provides a prediction method, device, terminal, and medium for hot subjects based on multiple evaluation dimensions, which use the performance data of each research subject over a historical time period in multiple dimensions, such as the distribution of research funding, information or media releases, paper publications, and patent applications, to predict the distribution of hot subjects at the next time node, so that future hot subjects can be predicted accurately and efficiently.

FIG. 1 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application.

It should be noted that the hot subject prediction method based on multiple evaluation dimensions in this application can be applied to various types of hardware devices. Specifically, the hardware device may be a controller, such as an ARM (Advanced RISC Machines) controller, FPGA (Field Programmable Gate Array) controller, SoC (System on Chip) controller, DSP (Digital Signal Processing) controller, or MCU (Microcontroller Unit) controller; the hardware device may also be computer equipment that includes components such as a memory, a storage controller, one or more processing units (CPU), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control equipment, and external ports; the computer equipment includes, but is not limited to, personal computers such as desktop computers, laptops, tablets, smart phones, smart TVs, and personal digital assistants (PDAs for short); the hardware device may also be a server, which may be deployed on one or more physical servers according to factors such as function and load, or may be composed of a distributed or centralized server cluster, which is not limited in this embodiment.

In this embodiment, the hot subject prediction method based on multiple evaluation dimensions includes step S101 and step S102.

In step S101, performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period of time is obtained.

Optionally, the research disciplines include but are not limited to: engineering disciplines, science disciplines, agronomy disciplines, medical disciplines, military disciplines, management disciplines, philosophy disciplines, economics disciplines, education disciplines, literature disciplines, history disciplines, art disciplines, and so on. Each category contains disciplines at several levels, such as mathematics, physics, chemistry, and so on. Because there are many types of research disciplines, they are not listed exhaustively here.

Optionally, the research subject is selected from subject phrases in the vocabulary of the scientific and technological knowledge organization system. Specifically, the subject phrase concept_group of the "Science and Technology Knowledge Organization System (STKOS)" vocabulary can be used to represent the research subject.

Optionally, the performance data of the research discipline based on at least one evaluation dimension in a historical time period reflects the performance of the research discipline in at least one evaluation dimension over a past time period. It should be noted that this past time period is adjustable and can be changed according to the needs of the actual research project; in addition, it can be measured in years (for example, the past 10 years), in months (for example, the past 5 months), or even in hours, which is not limited in this embodiment.

Optionally, the evaluation dimensions include, but are not limited to: scientific research funding distribution, information or media release status, paper publication status, or patent application status, etc. It should be noted that the evaluation dimensions used in this embodiment can include any one of the four evaluation dimensions, any combination of two or three of them, or all four, which is not limited in this embodiment.

The scientific research funds generally refer to the various expenses for the development of scientific and technological undertakings, which are usually allocated by governments, enterprises, non-governmental organizations, foundations, etc. through commissioning or the screening of application reports; they include domestic or foreign funds and are used to solve specific scientific and technical problems.

The information or media release status refers to the release status on information or media platforms. Information platforms include, but are not limited to, technology information platforms, financial information platforms, security information platforms, lifestyle information platforms, entertainment information platforms, sports information platforms, regional information platforms, shopping information platforms, health information platforms, tourism information platforms, education information platforms, etc. Media platforms include public media platforms and/or self-media platforms. Public media platforms are media platforms that release information on behalf of the government, such as public broadcasting platforms, public television platforms, public network platforms, etc. Self-media platforms are media platforms through which the general public releases information over the Internet, such as Weibo platforms, blog platforms, Tieba forum platforms, WeChat platforms (including Moments, Official Accounts, Mini Programs, etc.), Alipay platforms (including life accounts, life circles, mini programs, etc.), the MSN platform, etc., which are not listed exhaustively here.

The paper publication status refers to the publication of papers related to each research discipline, and in this application it can be measured in multiple dimensions. On the one hand, the number of papers can reflect the publication status: for example, the more papers related to a research discipline, the better its publication status. On the other hand, the quality of the papers can reflect the publication status, for example: the frequency with which papers related to the research discipline are cited by other documents after publication; the number of such papers published in top journals (such as "Nature", "Science", "Cell", etc.); the frequency with which such papers are reported by media platforms after publication; and the evaluation given by media platforms after publication.

The patent application status refers to the status of patents related to each research discipline. The patents referred to in this application can be invention patents, utility model patents or design patents, which is not limited in this embodiment. In addition, the patent application status can be reflected by statistics from many aspects, such as: the number of patent applications, the number of granted patents, the percentage of granted patents, the percentage of invention patents, patent quality evaluation scores, the remaining term of granted patents, the commercialization status of patented results, patent licensing, etc., which is not limited in this embodiment.

The following takes the historical period from 2000 to the present as an example to illustrate how to obtain the performance data of the various research disciplines based on the evaluation dimensions of scientific research funding distribution, information or media release status, paper publication status, and patent application status.

A web crawler is used to crawl the distribution of investment amounts across research disciplines at home and abroad from 2000 to the present; to crawl the frequency with which the keyword groups of each research discipline appear in domestic and foreign news or articles from 2000 to the present, which measures the social attention received by each discipline; to crawl the number of articles published in domestic and foreign journals or conferences, or the number included in top journals, for the keyword groups of each research discipline from 2000 to the present; and to crawl data such as the number of patent applications in each discipline from 2000 to the present.
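The crawled figures need some container before they reach the model. The source does not describe one, so the following is only a minimal sketch under assumptions: the dimension names, the `build_table` helper, and the sample values are hypothetical placeholders, and entries that were not collected are left as `None`.

```python
from collections import defaultdict

# Hypothetical names for the four evaluation dimensions described in the text.
DIMENSIONS = ("funding", "media", "papers", "patents")

def build_table(records):
    """Group (discipline, year, dimension, value) records into
    table[discipline][year] = {dimension: value, or None if not collected}."""
    table = defaultdict(dict)
    for discipline, year, dimension, value in records:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        # Create the row for this year on first sight, with every dimension unset.
        row = table[discipline].setdefault(year, dict.fromkeys(DIMENSIONS))
        row[dimension] = value
    return {d: dict(years) for d, years in table.items()}

# Placeholder values for illustration only, not real statistics.
records = [
    ("physics", 2000, "papers", 120),
    ("physics", 2000, "patents", 8),
    ("chemistry", 2000, "papers", 95),
]
table = build_table(records)
```

Keeping uncollected entries as explicit `None` values makes the gaps visible to any later missing-value handling.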

In step S102, a time recurrent neural network model is constructed using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, which is used to predict the hot subjects at the next time node following the historical period.

Optionally, the time recurrent neural network model is an LSTM neural network model, which adds a way of carrying information across multiple time steps. The normalized four-dimensional feature data of each research subject over the past time period (t-n to t-1) is used as the model input, and the model output is the proportion of each subject predicted for time t; the hot subjects are then predicted from these subject proportions.
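The input/output arrangement described here can be sketched as follows. This is an illustration under assumptions rather than the applicant's implementation: `make_windows` and `hottest` are hypothetical helper names, the window length `n` is arbitrary, and all numbers are placeholders.

```python
def make_windows(series, n):
    """Split a time-ordered list of feature vectors into (input, target) pairs.

    Each input holds n consecutive steps (t-n .. t-1); the target is step t.
    """
    pairs = []
    for t in range(n, len(series)):
        pairs.append((series[t - n:t], series[t]))
    return pairs

def hottest(proportions):
    """Return the discipline with the largest predicted proportion."""
    return max(proportions, key=proportions.get)

# Placeholder yearly feature vectors for one discipline (illustrative only).
series = [[0.1, 0.2], [0.2, 0.2], [0.3, 0.1], [0.4, 0.3]]
pairs = make_windows(series, n=2)

# Placeholder model output: predicted subject proportions at time t.
predicted = {"physics": 0.25, "chemistry": 0.40, "biology": 0.35}
hot = hottest(predicted)
```

Taking the single largest proportion is one possible reading of "hot subject"; a top-k selection would work the same way.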

Specifically, the LSTM neural network model is a modification of the RNN neural network: memory units are added to the neural units in the hidden layer of the RNN so that memory information along the time series can be controlled. Each time information passes between the units of the hidden layer, it goes through several controllable gates (forget gate, input gate, candidate gate, output gate), which control how much previous and current information is remembered or forgotten, giving the RNN a long-term memory function.
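A single time step of the four gates can be written out directly. The sketch below follows the commonly used LSTM formulation in plain Python; the source gives no equations, so the parameter layout (`params` mapping each gate name to a weight matrix and bias) and the helper names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, vec):
    """Compute weights @ vec + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps each gate name to (weights, bias)."""
    z = h_prev + x  # concatenate previous hidden state and current input
    f = [sigmoid(a) for a in affine(*params["forget"], z)]       # how much old memory to keep
    i = [sigmoid(a) for a in affine(*params["input"], z)]        # how much new content to admit
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # proposed new content
    o = [sigmoid(a) for a in affine(*params["output"], z)]       # how much memory to expose
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Tiny example: hidden size 1, input size 1, all weights and biases zero.
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```

With all-zero parameters every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved, which makes the role of the forget gate easy to see.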

Optionally, the LSTM neural network model adopts a gradient descent algorithm as the model optimizer. Gradient descent is an iterative method that can be used to solve least squares problems (both linear and nonlinear). When seeking the minimum of the loss function, gradient descent iterates step by step until it obtains the minimized loss and the corresponding model parameter values.
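The iterative idea can be shown on a linear least squares problem, which is far simpler than training an LSTM but uses the same update rule. This is a sketch only; the learning rate, step count, and data are arbitrary choices for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
```

On this data the parameters approach w = 2 and b = 1. In practice a deep learning framework would supply the optimizer and compute the gradients automatically.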

Optionally, the LSTM neural network model uses a loss function to measure the gap between the predicted value output by the neural network model and the actual value. The loss function used in the LSTM neural network model includes a classification loss function and/or a regression loss function. The classification loss function includes, but is not limited to: log loss, focal loss, relative entropy loss, hinge loss, etc.; the regression loss function includes, but is not limited to: mean squared error loss, mean absolute error loss, log-cosh loss, etc. Because these loss functions are well known, they are not described further here.
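For reference, the three regression losses named above can be written in a few lines each. The sketch uses their standard definitions; the predicted and actual proportion vectors are placeholder values.

```python
import math

def mse(pred, actual):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)

def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def log_cosh(pred, actual):
    """Mean of log(cosh(error)); close to error**2 / 2 for small errors."""
    return sum(math.log(math.cosh(p - a)) for p, a in zip(pred, actual)) / len(pred)

# Placeholder predicted and actual subject proportions.
pred = [0.5, 0.3, 0.2]
actual = [0.4, 0.4, 0.2]
losses = {"mse": mse(pred, actual), "mae": mae(pred, actual),
          "log_cosh": log_cosh(pred, actual)}
```

Since the model output is a set of proportions, a regression loss of this kind is a natural fit, although the source leaves the choice open.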

In an embodiment, the model structure of the LSTM neural network model is shown in FIG. 2, and the model includes an input layer, a number of hidden layers, and an output layer. The reason the LSTM neural network has "memory" is that there are connections between the networks at different points in time, rather than only feedforward or feedback connections within the network at a single point in time; that is, the hidden layers in Figure 2 are connected by dashed arrows, which represent the jump connections between neural units along the time-step sequence.

The structure of the forget gate during transmission between the units of the hidden layer of the LSTM neural network model is shown in Figure 3. The figure lists time points 1 to 7, and each time point corresponds to an input layer, a hidden layer, and an output layer. Each hidden layer neural unit has multiple forget gate valve nodes, wherein the valve node 31 marked "○" represents an open valve, and the valve node 32 marked "—" represents a closed valve. The memory function of the LSTM neural network model is realized by these valve nodes: when a valve is open, the previous model training results are associated with the current model calculation; when a valve is closed, the previous calculation results no longer affect the current calculation.

Specifically, the solid black neural units represent neural units that carry information. In Figure 3, the valve node connecting the hidden layer neural unit at time point 1 to the input layer neural unit at time point 1 is open, so the input layer neural unit at time point 1 transmits information to the hidden layer neural unit at time point 1. However, the valve node connecting the hidden layer neural unit at time point 1 to the output layer neural unit at time point 1 is closed, so the hidden layer neural unit at time point 1 cannot transmit information to the output layer neural unit at time point 1. Applying the same reasoning to the remaining time points yields the information distribution shown in Figure 3.

In some optional implementations, the acquired performance data of multiple research disciplines based on at least one evaluation dimension in a historical period is divided into a training set and a test set; the training set is used to train the LSTM neural network model, and the test set is then used to test and tune the accuracy of the LSTM neural network model.
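The source does not say how the split is made. For time-series data a common choice is to keep the samples in time order and hold out the most recent ones for testing; the sketch below assumes that approach, and the 80/20 ratio and the `chronological_split` name are illustrative assumptions.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples, keeping the most recent ones for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

years = list(range(2000, 2020))  # 20 yearly samples used as stand-ins
train, test = chronological_split(years)
```

Splitting by time rather than at random avoids training on data from later than the test period, which would make the measured accuracy look better than it is.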

Therefore, the hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses a neural network algorithm to predict hot subjects. On the one hand, it organizes the past history of academic development; on the other hand, it discovers future academic development trends. It collects information in multiple dimensions, such as research funding, information or media releases, paper publication, and patent applications, which improves the accuracy of the prediction results.

FIG. 4 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application. The hot subject prediction method includes step S401 and step S402.

In step S401, performance data of a plurality of research disciplines based on at least one evaluation dimension in a historical period of time is obtained. It should be noted that the implementation of step S401 in this embodiment is similar to that of step S101 in the above embodiment, so it is not repeated here.

In step S402, the average value of the data of adjacent time nodes is used to assign values to the missing performance data, and the performance data after filling in the missing data is used as the model input data.

Data may be missing during data collection. For example, the paper publication data of research discipline A for 2001 and 2003 is collected, but the data for 2002 is missing. This makes the data source incomplete, which in turn reduces the accuracy of the final model prediction.

Therefore, in this embodiment, the acquired performance data of multiple research disciplines based on at least one evaluation dimension in a historical period is preliminarily processed and stored in a database. Missing values are filled with the average value of the data of adjacent time nodes, for example the average value of the data of the adjacent years. Filling missing values in this way not only makes the data complete, but also prevents the supplementary data from being too abrupt and distorting the predicted value.
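The filling rule can be sketched as follows. The source covers only a single gap between two known neighbours; using the nearest known values for longer gaps, and copying the single neighbour at either end of the series, are assumptions added here so that the helper handles every input.

```python
def fill_missing(values):
    """Replace None entries with the mean of the nearest known neighbours.

    Gaps at either end, which have only one known neighbour, copy that neighbour.
    """
    filled = list(values)
    for idx, value in enumerate(values):
        if value is not None:
            continue
        # Nearest known value before and after the gap, if any.
        before = next((v for v in reversed(values[:idx]) if v is not None), None)
        after = next((v for v in values[idx + 1:] if v is not None), None)
        known = [v for v in (before, after) if v is not None]
        if known:
            filled[idx] = sum(known) / len(known)
    return filled

papers = [100, None, 140]  # placeholder counts for 2001, 2002, 2003
filled = fill_missing(papers)
```

For the example in the text, the missing 2002 value becomes the average of the 2001 and 2003 values.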

FIG. 5 shows a schematic flow chart of a hot subject prediction method based on multiple evaluation dimensions in an embodiment of the present application. The hot subject prediction method includes step S501 and step S502.

In step S501, performance data of multiple research disciplines based on at least one evaluation dimension in a historical period of time is obtained. It should be noted that the implementation of step S501 in this embodiment is similar to the implementation of step S101 in the above embodiment, so it is not repeated here.

In step S502, the acquired performance data or the performance data after filling the missing data are normalized and then used as the model input data. Normalization processing is a non-dimensional processing method that turns the absolute value of the physical system value into a certain relative value relationship to ensure the accuracy of the prediction.

Specifically, one normalization method is to divide the total number of papers related to each research discipline in a year by the total number of papers published in that year; another is to divide the scientific research funding received by each research discipline in a certain year by the total research funding of that year, and so on.
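Dividing by the yearly total can be sketched as below. The sketch assumes the yearly total is the sum over the disciplines being compared, which the source does not state explicitly; `normalize_by_total` and the funding figures are hypothetical.

```python
def normalize_by_total(values_by_discipline):
    """Divide each discipline's value by the total across all disciplines."""
    total = sum(values_by_discipline.values())
    if total == 0:
        # Nothing recorded for this year: report zero shares instead of dividing by zero.
        return {name: 0.0 for name in values_by_discipline}
    return {name: value / total for name, value in values_by_discipline.items()}

# Placeholder funding amounts for a single year (arbitrary units).
funding_2010 = {"physics": 30.0, "chemistry": 50.0, "biology": 20.0}
shares = normalize_by_total(funding_2010)
```

After this step each dimension is expressed as a share between 0 and 1, so dimensions with very different units, such as funding amounts and paper counts, become comparable.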

FIG. 6 shows a schematic structural diagram of a hot subject prediction device based on multiple evaluation dimensions in an embodiment of the present application. The hot subject prediction device includes a data acquisition module 61 and a hot subject prediction module 62.

The data acquisition module 61 is used to acquire performance data of multiple research disciplines based on at least one evaluation dimension in a historical period; the hot subject prediction module 62 is used to construct a time recurrent neural network model using the acquired performance data as model input data; wherein the time recurrent neural network model outputs subject proportion data of each of the research subjects based on the performance data, which is used to predict the hot subjects at the next time node following the historical period.

It should be noted that the implementation manner of the hot subject prediction device based on multiple evaluation dimensions provided in this embodiment is similar to the implementation manner of the hot subject prediction method based on multiple evaluation dimensions provided in the above embodiment, so it will not be repeated.
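The two-module division can be mirrored in software as two small classes. This is a structural sketch only: the class and method names are hypothetical, the data source is an injected callable rather than a crawler, and `share_of_latest` is a stand-in for the trained model, not an LSTM.

```python
class DataAcquisitionModule:
    """Counterpart of module 61: returns performance data from an injected source."""

    def __init__(self, source):
        self._source = source  # hypothetical callable that returns the data

    def acquire(self):
        return self._source()


class HotSubjectPredictionModule:
    """Counterpart of module 62: turns performance data into subject proportions."""

    def __init__(self, model):
        self._model = model  # hypothetical callable standing in for the trained model

    def predict(self, data):
        proportions = self._model(data)
        return max(proportions, key=proportions.get), proportions


def share_of_latest(data):
    """Stand-in model: each discipline's share of the most recent values."""
    latest = {name: values[-1] for name, values in data.items()}
    total = sum(latest.values())
    return {name: value / total for name, value in latest.items()}


# Placeholder data: two disciplines with two time steps each.
acquisition = DataAcquisitionModule(lambda: {"physics": [10, 30], "chemistry": [20, 10]})
prediction = HotSubjectPredictionModule(share_of_latest)
hot, proportions = prediction.predict(acquisition.acquire())
```

Injecting the source and the model keeps the two modules independent, which matches the statement that the module division is a logical one.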

In addition, it should be understood that the division of the above device into modules is only a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separated. These modules can all be implemented in the form of software called by processing elements; they can all be implemented in the form of hardware; or some modules can be implemented as software called by processing elements while others are implemented in hardware. For example, the data acquisition module may be a separately established processing element, or it may be integrated in a chip of the above-mentioned device. It may also be stored in the memory of the above-mentioned device in the form of program code, and a processing element of the above-mentioned device calls it and executes the functions of the data acquisition module. The implementation of the other modules is similar. In addition, all or part of these modules can be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASIC for short), one or more microprocessors (digital signal processors, DSP for short), or one or more field programmable gate arrays (FPGA for short). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU for short) or another processor that can call program code. For another example, these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC for short).

FIG. 7 shows a schematic structural diagram of still another electronic terminal provided by an embodiment of the present application. The electronic terminal provided in this example includes a processor 71 and a memory 72. The memory 72 is connected to the processor 71 through a system bus, over which the two communicate with each other; the memory 72 is used to store a computer program, and the processor 71 is used to run the computer program so that the electronic terminal executes each step of the hot subject prediction method based on multiple evaluation dimensions.

The aforementioned system bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The system bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to realize communication between the database access device and other devices (such as the client, the read-write library, and the read-only library). The memory may include random access memory (RAM for short), and may also include non-volatile memory, such as at least one disk memory.

The above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU for short), a network processor (NP for short), etc.; it may also be a digital signal processor (DSP for short), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In an embodiment, the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method for predicting a hot subject based on multiple evaluation dimensions is realized.

A person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by hardware under the control of a computer program. The aforementioned computer program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the foregoing method embodiments; the foregoing storage medium includes ROM, RAM, magnetic disk, optical disk, and other media that can store program code.

In summary, this application provides methods, devices, terminals, and media for predicting hot subjects based on multiple evaluation dimensions. The hot subject prediction scheme based on multiple evaluation dimensions provided by the present invention uses a neural network algorithm to predict hot subjects. On the one hand, it organizes past academic development history; on the other hand, it discovers future academic development trends. It collects information in multiple dimensions, such as scientific research funding status, information or media release status, paper publication status, and patent application status, thereby improving the accuracy of the prediction results. Therefore, this application effectively overcomes various shortcomings in the prior art and has high industrial value.

The foregoing embodiments only exemplarily illustrate the principles and effects of the present application, and are not used to limit the present application. Anyone familiar with this technology can modify or change the above-mentioned embodiments without departing from the spirit and scope of this application. Therefore, all equivalent modifications or changes made by persons with ordinary knowledge in the technical field without departing from the spirit and technical ideas disclosed in this application should still be covered by the claims of this application.

Claims

1. A method for predicting hot subjects based on multiple evaluation dimensions, characterized in that it comprises:

acquiring performance data of multiple research disciplines based on at least one evaluation dimension in a historical period;

constructing a time recurrent neural network model using the acquired performance data as model input data, wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, and the subject proportion data is used to predict the hot subject corresponding to the next time node of the historical period.

2. The method according to claim 1, wherein the evaluation dimensions include any one or a combination of: scientific research funding status, information or media release status, paper publication status, and patent application status.

3. The method according to claim 2, wherein:

the evaluation dimensions include the paper publication status, and the paper publication status includes the number of papers and/or the quality of the papers;

wherein the quality of the papers includes any one or a combination of: the frequency with which a paper is cited by other documents, the number of papers included in top journals, the frequency with which a paper is reported by media platforms after publication, and the evaluation the paper receives as a result of being reported by media platforms after publication.

4. The method according to claim 1, wherein the method further comprises:

acquiring performance data of multiple research disciplines based on at least one evaluation dimension in a historical period;

assigning values to missing performance data using the average of the data at adjacent time nodes, and using the performance data with the missing data filled in as the model input data.

5. The method according to claim 1 or 3, wherein the method further comprises:

acquiring performance data of multiple research disciplines based on at least one evaluation dimension in a historical period;

normalizing the acquired performance data, or the performance data with the missing data filled in, and then using it as the model input data.

6. The method according to claim 1, wherein the type of the time recurrent neural network model includes an LSTM neural network model using a gradient descent algorithm as the model optimizer.

7. The method according to claim 1, wherein the research disciplines are selected from subject phrases in a vocabulary of a scientific and technological knowledge organization system.

8. A device for predicting hot subjects based on multiple evaluation dimensions, characterized in that it comprises:

a data acquisition module, used to acquire performance data of multiple research disciplines based on at least one evaluation dimension in a historical period;

a hot subject prediction module, used to construct a time recurrent neural network model using the acquired performance data as model input data, wherein the time recurrent neural network model outputs subject proportion data of each research discipline based on the performance data, and the subject proportion data is used to predict the hot subject corresponding to the next time node of the historical period.

9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for predicting hot subjects based on multiple evaluation dimensions according to any one of claims 1 to 7.

10. An electronic terminal, characterized by comprising: a processor and a memory;

the memory is used to store a computer program;

the processor is configured to execute the computer program stored in the memory, so that the terminal executes the method for predicting hot subjects based on multiple evaluation dimensions according to any one of claims 1 to 7.