CN113343125B - Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system - Google Patents

Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system

Info

Publication number
CN113343125B
Authority
CN
China
Prior art keywords
information
scientific research
academic
heterogeneous
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732872.7A
Other languages
Chinese (zh)
Other versions
CN113343125A (en)
Inventor
张凯
王楚豫
叶保留
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110732872.7A priority Critical patent/CN113343125B/en
Priority to PCT/CN2021/104396 priority patent/WO2023272748A1/en
Publication of CN113343125A publication Critical patent/CN113343125A/en
Application granted granted Critical
Publication of CN113343125B publication Critical patent/CN113343125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a heterogeneous scientific research information integration method and system for accurate academic recommendation. The method comprises the following steps: preprocessing original scientific research information, encoding heterogeneous academic literature information into vector form, and constructing random heterogeneous scientific research information; encoding the academic information feature vectors and the heterogeneous scientific research information, and constructing an academic coding mapping model; using the generator in the mapping model to convert heterogeneous academic information into complete scientific research feature vectors; encoding author information into sparse one-hot vectors, and labeling the degree of correlation between the one-hot vectors and the scientific research feature vectors; training a collaborative filtering recommendation model with the one-hot vectors, the scientific research feature vectors, and their degrees of correlation to generate a recommendation set related to an author; and fusing the two recommendation results to complete the accurate recommendation of academic literature. Through comparison of feature content and collaborative filtering over authors, the invention achieves completion and fusion of heterogeneous scientific research information and accurate recommendation of academic content.

Description

Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
Technical Field
The invention relates to academic data recommendation, and in particular to a heterogeneous scientific research information integration method and system for accurate academic recommendation.
Background
With the rapid development of internet technology, scientific journals and publications have gradually moved from paper to electronic media stored online, where they can be searched through scientific databases and similar services, which is a great convenience for researchers. However, academic data is usually produced by separate individuals or organizations, so its structure varies widely and its quality is uneven, and heterogeneous academic data is difficult to locate accurately. The records for the same author can differ considerably across websites and database versions, and uploaded data is missing information to varying degrees. The result is a mass of heterogeneous academic data that makes scientific research retrieval and related academic recommendation very difficult. Integrating and fusing heterogeneous academic data into a unified academic information structure is therefore the key to academic recommendation.
Unlike traditional data, academic data often requires finer classification to achieve accurate recommendation: even for research content in the same field, different users often have different preferences. For example, for a beginner in a field, review and popular-science literature helps to quickly build an understanding of the field, whereas research specialists in the field need more advanced and more specialized literature. Accurate recommendation based on academic data therefore requires not only accurate classification and positioning of the literature data but also recommendation that takes the characteristics of the user into account.
Existing scientific research information recommendation methods usually recommend from a single information source and do not adequately address users' practical need to retrieve heterogeneous information.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a heterogeneous scientific research information integration method and system for accurate academic recommendation. It processes heterogeneous scientific research information using generative adversarial learning and collaborative filtering, completes the details of information of differing quality, and builds a comparison information base from existing information, so as to achieve accurate, intelligent recommendation of scientific research content and scholars.
To achieve the above purpose, the invention adopts the following technical scheme:
According to a first aspect of the invention, a heterogeneous scientific research information integration method for accurate academic recommendation is provided, comprising the following steps:
S1, preprocessing heterogeneous original scientific research information, wherein the preprocessing comprises: encoding each piece of heterogeneous academic literature information into vector form, and extracting semantically related academic feature vectors for encoding training; and randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;
S2, encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and constructing an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used to obtain the mapping from heterogeneous scientific research information to complete feature vectors;
S3, using the generator part of the generative adversarial network to convert heterogeneous academic information into academic feature vectors, completing scientific research information that is heterogeneous or has missing items, and generating complete scientific research information feature vectors;
S4, encoding author information into sparse one-hot vectors, and labeling the degree of correlation between the author one-hot vectors and the scientific research information feature vectors;
S5, training a collaborative filtering recommendation model using the author one-hot vectors, the scientific research information feature vectors, and the degrees of correlation between them, and generating a recommendation set related to the author;
S6, fusing the author-literature correlation recommendation based on collaborative filtering with the literature-literature correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic literature.
In some embodiments, step S1 includes: preprocessing the original information of each scientific research paper from a specific source, and extracting the following indexes to construct the original information:
[author 1, author 2, ……, author n, research area, paper name, paper keywords, publication time, paper abstract]
The original information with complete index data is taken as the label; copies in which a random subset of fields has been emptied serve as training data, and the labels are placed in one-to-one correspondence with the training data to form the original data training set.
In some embodiments, encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing in step S2 includes: mapping each word in the original data training set to a word vector $V_i$, and stacking the mapped vectors of all the original features to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$. Accordingly, this training data has a corresponding label, and one group of scientific research data generates multiple groups of data in the training set.
In some embodiments, the academic coding mapping model in step S2 is a generative adversarial model trained by generative adversarial learning, in which the generator and the discriminator extract information through convolution layers, each layer being calculated as follows:
$$x_j^{l} = f\Big(\sum_{i=1}^{m} w_{ij}^{l}\, x_i^{l-1} + b^{l}\Big)$$
where $m$ is the output vector dimension of the layer $l-1$ network, $x_i^{l-1}$ is the $i$-th output parameter of the layer $l-1$ network, $x_j^{l}$ is the $j$-th output parameter of the layer $l$ network, $w_{ij}^{l}$ is a parameter of the layer $l$ network, $b^{l}$ is the bias of layer $l$, and $f$ is the selected activation function;
the calculation target formula for the generative adversarial model is as follows:
$$A = \arg\min_{G}\big(\arg\max_{D} \mathrm{Value}(G, D)\big)$$
$$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$$
where G is the generator, D is the discriminator, and E is the cross-entropy error.
In some embodiments, step S3 includes: taking the generator G of the generative adversarial network out on its own, selecting the current complete information as a data dictionary, inputting it into the generator G item by item, and constructing a vector dictionary set $\mathrm{Dict} = \{V_1, V_2, V_3, \ldots, V_n\}$;
information completion: for a heterogeneous input S with missing items, constructing a vector matrix M by mapping the words in S to word vectors and stacking them, obtaining the feature vector $V = G(M)$, computing the Euclidean inner product of $V$ with each vector in Dict in turn, selecting the nearest $V_k$, looking up the original data for that content, and completing the missing content;
similarity matching recommendation: obtaining the several candidates with the highest similarity by the same Euclidean inner product computation used in information completion, and recommending them to the user.
In some embodiments, step S4 includes:
sparsely encoding the author information into a one-hot vector R, wherein the i-th element of the one-hot vector R of author i is 1 and the remaining elements are 0;
for the scientific research content related to an author, constructing an information matrix M by mapping words to word vectors, and further constructing the feature vector $V = G(M)$, where G is the generator of the generative adversarial network;
evaluating the relevance of author u to scientific research content l and using it as the training label, which completes the training set of the collaborative filtering recommendation model.
In some embodiments, the collaborative filtering recommendation model used in step S5 includes the following calculation process:
$$h_{i\text{-}relu} = \max(0, h_i)$$
where $W_i^T$ is the weight matrix of the i-th layer, $b_i$ is the offset of the i-th layer, $h_i$ is the output of the i-th layer, and $h_{i\text{-}relu}$ is the output of the i-th layer after the relu layer, if a relu layer is present;
at the last layer of the network, after the softmax layer, the final output vector $y_{ul}$ is obtained.
In step S5, the objective function of the collaborative filtering recommendation model is as follows:
where the network output $y_{ul}$ is the network's prediction of the degree of correlation between author u and scientific research content l; the higher the degree of correlation, the closer the relationship between the scientific research content and the author.
In some embodiments, generating the recommendation set related to the author in step S5 includes: for an author the user is interested in, selecting the k pieces of scientific research content most relevant to that author by comparing candidate scientific research content, and then tracing back through the original information to the authors, fields, and keywords of that content, thereby generating the recommendation set related to the author.
According to a second aspect of the invention, a heterogeneous scientific research information integration system for accurate academic recommendation is provided, comprising:
a scientific research information preprocessing module, configured to preprocess heterogeneous original scientific research information, including: encoding each piece of heterogeneous academic literature information into vector form, and extracting semantically related academic feature vectors for encoding training; and randomly emptying items of the academic feature vectors to construct random heterogeneous scientific research information;
a scientific research information coding module, configured to encode the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and to construct an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used to obtain the mapping from heterogeneous scientific research information to complete feature vectors;
an academic coding mapping model application module, configured to convert heterogeneous academic information into academic feature vectors using the generator part of the generative adversarial network, complete scientific research information that is heterogeneous or has missing items, and generate complete scientific research information feature vectors;
an author information preprocessing module, configured to encode author information into sparse one-hot vectors and label the degree of correlation between the author one-hot vectors and the scientific research information feature vectors;
a collaborative filtering recommendation model training module, configured to train a collaborative filtering recommendation model using the author one-hot vectors, the scientific research information feature vectors, and the degrees of correlation between them, and to generate a recommendation set related to authors;
and an integrated recommendation model application module, configured to fuse the author-literature correlation recommendation based on collaborative filtering with the literature-literature correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic literature.
The invention has the following beneficial effects: based on generative adversarial networks, collaborative filtering, feature mapping, and deep neural networks, it provides a heterogeneous scientific research information integration method for accurate academic recommendation; through comparison of feature content and collaborative filtering over authors, it achieves functions such as completion and fusion of heterogeneous scientific research information and accurate recommendation of academic content, and improves the recommendation effect and user experience for scientific research information.
Drawings
Fig. 1 is a schematic flow chart of the heterogeneous scientific research information integration method for accurate academic recommendation provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the training process of the generative adversarial network according to an embodiment of the invention;
Fig. 3 shows the recommendation process based on scientific research content according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the generator provided by an embodiment of the invention;
Fig. 5 is a schematic diagram of the collaborative filtering network according to an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the embodiments and drawings, which are not intended to limit the scope of the invention. In the following description, academic information and scientific research information are used interchangeably.
Referring to Fig. 1, in one embodiment, a heterogeneous scientific research information integration method for accurate academic recommendation includes the following steps:
S1, preprocessing the scientific research information and constructing the original data training set.
The original information of each scientific research paper from a specific source is preprocessed, and the following indexes are selected to construct the original information, in the following format:
[author 1, author 2, ……, author n, research area, paper name, paper keywords, publication time, paper abstract]
These features are selected as the original features. To obtain a good training effect, the training set must undergo a certain amount of processing in which some items of information are deleted, for example:
F_1 = [author 1, author 2, ……, NULL, research area, paper name, publication time, paper abstract]
F_2 = [author 1, author 2, ……, author n, NULL, paper name, publication time, paper abstract]
The above data are obtained by randomly "attacking" parts of the original information so that some information is missing. The data with missing information and the corresponding original data together form the original data training set.
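The construction of the training set in step S1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the field names, the use of `None` for NULL, the number of damaged copies per record, and the helper names `attack_record` and `build_training_set` are all hypothetical.

```python
import random

def attack_record(record, n_missing=1, rng=None):
    """Return a copy of `record` with `n_missing` randomly chosen fields set to None."""
    rng = rng or random.Random()
    damaged = dict(record)
    for field in rng.sample(sorted(record), n_missing):
        damaged[field] = None  # None plays the role of NULL
    return damaged

def build_training_set(records, copies_per_record=3, seed=0):
    """Pair each damaged copy (training input) with its complete original (label)."""
    rng = random.Random(seed)
    pairs = []
    for record in records:
        for _ in range(copies_per_record):
            pairs.append((attack_record(record, 1, rng), record))
    return pairs

# Hypothetical example record with illustrative field names.
paper = {
    "authors": ["author 1", "author 2"],
    "research_area": "information retrieval",
    "paper_name": "An example paper",
    "keywords": ["recommendation"],
    "publication_time": "2021",
    "abstract": "An example abstract.",
}
training_set = build_training_set([paper])
```

Each complete record yields several damaged inputs that share the same label, which matches the statement that one group of scientific research data produces multiple groups of training data.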
S2, encoding the scientific research information.
The original data training set is segmented by feature encoding. Using an existing, mature mapping method, each word in the original data training set is mapped to a word vector $\mathrm{Vector}_i$, and the mapped vectors of all the original features are stacked to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$. Accordingly, this training data has a corresponding label: the academic information feature vector obtained by processing the original information. The label is the training target to be learned for the corresponding training data $M_k$, and it is used to correct the output of the network. Because pieces of scientific research information differ in content and in amount of content, the preprocessed matrices differ in size, so scientific research information with less content is padded to unify the input format for subsequent processing.
This completes the conversion from original input information to matrices; one group of scientific research data can generate multiple groups of data in the training set.
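The encoding and padding of step S2 can be illustrated as follows. The sketch assumes a toy hash-based embedding in place of the "existing, mature mapping method" (which the text does not name), a hypothetical vector dimension `DIM`, and a hypothetical fixed row count `MAX_WORDS`; truncating long records is also an assumption.

```python
import hashlib

DIM = 4        # hypothetical word-vector dimension
MAX_WORDS = 6  # hypothetical fixed number of rows after padding

def word_vector(word):
    """Toy stand-in for a pretrained embedding: hash the word into DIM floats in [0, 1)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [digest[i] / 256.0 for i in range(DIM)]

def encode_record(fields):
    """Map every word of every non-missing field to a vector and stack the vectors."""
    words = [w for text in fields if text is not None for w in text.split()]
    matrix = [word_vector(w) for w in words[:MAX_WORDS]]
    # Pad short records with zero rows so every input has the same shape.
    while len(matrix) < MAX_WORDS:
        matrix.append([0.0] * DIM)
    return matrix

# The second field is missing (None), as in an "attacked" training input.
M_k = encode_record(["author 1", None, "graph recommendation", "2021"])
```

Zero padding is one simple way to unify the input format; the text only states that shorter records are filled, not how.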
S3, training the academic recommendation model.
The training set processed in step S2 is input into the academic recommendation model as training data. The model at this stage is trained by deep-learning-based generative adversarial learning, in which the generator and the discriminator extract information through convolution layers, each layer being calculated as follows:
$$x_j^{l} = f\Big(\sum_{i=1}^{m} w_{ij}^{l}\, x_i^{l-1} + b^{l}\Big)$$
where $l$ is the layer index, $m$ is the output vector dimension of the layer $l-1$ network, $x_i^{l-1}$ is the $i$-th output parameter of the layer $l-1$ network, $x_j^{l}$ is the $j$-th output parameter of the layer $l$ network, $w_{ij}^{l}$ is a parameter of the layer $l$ network and also the object of iterative optimization, $b^{l}$ is the bias of layer $l$, and $f$ is the selected activation function; in this model the softmax activation function is selected. During training, the error is calculated as follows:
$$A = \arg\min_{G}\big(\arg\max_{D} \mathrm{Value}(G, D)\big)$$
$$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$$
where G is the generator, D is the discriminator, E is the cross-entropy error, A is the optimization target, Value(G, D) is the back-propagation error calculated by this formula when the generator is G and the discriminator is D, and z is the hidden variable input to the generator, a learnable bias variable related to the network structure of the generator. $E_{x \sim P_{data}(x)}$ indicates that the error is calculated using cross entropy when x obeys its original distribution $P_{data}(x)$; similarly, $E_{z \sim P_{data}(z)}$ indicates that the error is calculated using cross entropy when z obeys the distribution $P_{data}(z)$.
The input of the model is the items of the training set described above, and for each item the output is a vector V. By continuously adjusting the parameters, the model aims to make the discriminator judge the output $V = G(M)$ to have the greatest similarity to the label; the generator and the discriminator are iterated until training converges. Fig. 2 is a schematic diagram of the system architecture of this step, and Fig. 4 is a schematic diagram of the structure of the generator model.
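The per-layer computation described in step S3 (a weighted sum of the previous layer's outputs plus a bias, passed through the softmax activation) can be sketched as below. The text calls these convolution layers; this sketch only implements the weighted-sum-plus-bias form that the symbol definitions describe, uses one bias per output as an assumption, and uses made-up weights. The function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def layer_forward(prev_outputs, weights, biases, activation=softmax):
    """Compute one layer; weights[i][j] links output i of layer l-1 to output j of layer l."""
    m = len(prev_outputs)  # output dimension of layer l-1
    pre_activation = [
        sum(weights[i][j] * prev_outputs[i] for i in range(m)) + biases[j]
        for j in range(len(biases))
    ]
    return activation(pre_activation)

# Illustrative values only: two inputs, two outputs.
out = layer_forward([1.0, 2.0], [[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1])
```

Because softmax normalizes its inputs, the layer output sums to 1, and the larger pre-activation value receives the larger share.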
S4, after the trained model is obtained, the generator G is taken out on its own, the current complete scientific research original information is selected as a data dictionary, and each item is processed and input into the generator in turn to construct a vector dictionary set $\mathrm{Dict} = \{V_1, V_2, V_3, \ldots, V_n\}$ for subsequent comparison. The network is used in two modes: 1) Information completion: for a heterogeneous input S with missing items, a vector matrix M is constructed in the manner described above and $V = G(M)$ is obtained; its Euclidean inner product with each vector in Dict is then computed in turn, the nearest $V_k$ is selected, the original data for that content is looked up, and the missing content is completed. 2) Similarity matching recommendation: in the same manner, the several candidates with the highest similarity are obtained and recommended to the user. The process is shown in Fig. 3.
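Both modes of step S4 reduce to ranking the dictionary vectors by inner product with the query vector. The sketch below assumes the generator outputs are already available as plain lists (the vectors shown are invented), and the names `top_matches` and `complete_record` are hypothetical.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_matches(query, vector_dict, k=1):
    """Return the k keys whose vectors have the largest inner product with `query`."""
    ranked = sorted(vector_dict,
                    key=lambda key: inner_product(query, vector_dict[key]),
                    reverse=True)
    return ranked[:k]

def complete_record(damaged, query, vector_dict, originals):
    """Fill the None fields of `damaged` from the original record of the best match."""
    best = top_matches(query, vector_dict, k=1)[0]
    return {f: (originals[best][f] if v is None else v) for f, v in damaged.items()}

# Invented dictionary vectors standing in for the generator outputs V_1..V_n.
dict_vectors = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.8, 0.3],
    "paper_c": [0.0, 0.2, 0.9],
}
originals = {
    "paper_a": {"paper_name": "A", "research_area": "databases"},
    "paper_b": {"paper_name": "B", "research_area": "networks"},
    "paper_c": {"paper_name": "C", "research_area": "vision"},
}
query = [0.2, 0.9, 0.2]  # stands in for G(M) of the damaged input
completed = complete_record({"paper_name": "B", "research_area": None},
                            query, dict_vectors, originals)
candidates = top_matches(query, dict_vectors, k=2)
```

Mode 1 uses only the single best match to fill gaps, while mode 2 returns the top few matches as recommendations. Ranking by raw inner product favors vectors with large norms, so this sketch implicitly assumes the generator outputs have comparable magnitudes.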
S5, preprocessing the author related data required by the collaborative filtering recommendation model.
The author is the most critical piece of information in scientific research information, yet the author name is only a small part of the original information defined above, so it is difficult for the recommendation model to generate suitable recommendations for a user when only a single author name is known. The invention therefore uses collaborative filtering to construct an author-based recommendation model. Fig. 5 shows a schematic diagram of the collaborative filtering network architecture in an embodiment of the invention.
The data preprocessing steps are as follows:
The author information is sparsely encoded into a one-hot vector. For author i, the encoding is:
R=[0,0,0,……0,0,1,0,0,0……0,0]
The i-th element of the one-hot vector R of author i is 1, and the remaining elements are 0. Meanwhile, for the scientific research content related to the author, an author-related scientific research feature vector $V = G(M)$ is constructed according to steps S1 to S4 using the generator G obtained in step S4.
In addition, the relevance of author u to scientific research content l needs to be evaluated manually and used as the training label; methods such as interviews and statistical observation can be used to label the degree of correlation between the author vector and the academic literature feature vector. This completes the training set of the collaborative filtering network.
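The preprocessing in step S5 can be sketched as building (one-hot author vector, content feature vector, relevance label) triples. The author identifiers, feature vectors, and relevance scores below are invented for illustration, and `build_cf_training_set` is a hypothetical helper name.

```python
def one_hot(index, size):
    """Sparse one-hot vector R: element `index` is 1, all others are 0."""
    vector = [0] * size
    vector[index] = 1
    return vector

authors = ["author_a", "author_b", "author_c"]  # hypothetical author ids
author_index = {name: i for i, name in enumerate(authors)}

def build_cf_training_set(ratings, content_vectors):
    """`ratings` maps (author, content id) to a manually assigned relevance score."""
    samples = []
    for (author, content_id), relevance in ratings.items():
        r = one_hot(author_index[author], len(authors))
        samples.append((r, content_vectors[content_id], relevance))
    return samples

# Stand-ins for generator outputs G(M) and for manually labeled relevance.
content_vectors = {"paper_1": [0.2, 0.7], "paper_2": [0.9, 0.1]}
ratings = {("author_b", "paper_1"): 1.0, ("author_b", "paper_2"): 0.0}
cf_training_set = build_cf_training_set(ratings, content_vectors)
```

The one-hot vector grows with the number of authors, which is why the text describes it as sparse.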
S6, training the collaborative filtering recommendation model using the training set.
The training network consists of several fully connected layers and relu layers, calculated as follows:
$$h_{i\text{-}relu} = \max(0, h_i)$$
where $W_i^T$ is the weight matrix of the i-th layer, $b_i$ is the linear offset of the i-th layer, $h_i$ is the output of the i-th layer, and $h_{i\text{-}relu}$ is the output of the i-th layer after the relu layer, if a relu layer is present.
At the last layer of the network, after the softmax layer, the final output $y_{ul}$ is obtained. The loss function of the network is as follows:
The network output $y_{ul}$ is the network's prediction of the degree of correlation between author u and scientific research content l; the higher the degree of correlation, the closer the relationship between the scientific research content and the author. For an author the user is interested in, the k pieces of scientific research content most relevant to that author are selected by comparing candidate scientific research content, and the related information of that content, such as authors, field, and keywords, is then traced back through the original information, thereby generating the recommendation set related to the author.
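A forward pass through fully connected and relu layers with a final softmax, followed by top-k selection, can be sketched as below. Several choices are assumptions rather than details from the text: the network input is the concatenation of the author one-hot vector and the content feature vector, each fully connected layer computes `W^T x + b`, the softmax has two classes with the second read as "relevant", and the weights are random and untrained. All function names are hypothetical.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer W^T x + b, with weights[i][j] linking input i to output j."""
    return [sum(weights[i][j] * x[i] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def relu(h):
    return [max(0.0, v) for v in h]

def softmax(h):
    peak = max(h)
    exps = [math.exp(v - peak) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a placeholder for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def predict_relevance(author_one_hot, content_vector, layers):
    """Hidden layers use relu; the last layer uses softmax."""
    h = list(author_one_hot) + list(content_vector)  # assumed input: concatenation
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))[1]  # probability of the "relevant" class

def top_k_contents(author_one_hot, candidates, layers, k):
    """Rank candidate contents by predicted relevance and keep the best k."""
    scored = sorted(candidates.items(),
                    key=lambda item: predict_relevance(author_one_hot, item[1], layers),
                    reverse=True)
    return [content_id for content_id, _ in scored[:k]]

rng = random.Random(0)
layers = [init_layer(5, 4, rng), init_layer(4, 2, rng)]  # untrained, illustrative
candidates = {"paper_1": [0.2, 0.7], "paper_2": [0.9, 0.1], "paper_3": [0.5, 0.5]}
recommended = top_k_contents([0, 1, 0], candidates, layers, k=2)
```

With trained weights, the identifiers returned by `top_k_contents` would be traced back to their original records to recover authors, fields, and keywords.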
S7, using the integrated recommendation system: for scientific research content the user is interested in, similar research content is recommended using the method of steps S1-S4; for researchers the user is interested in (that is, literature authors), recommendations of researchers and content are generated using the method of steps S5-S6. The overall flow is illustrated in Fig. 1. On this basis, the invention can recommend related authors and related literature respectively.
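The text does not specify how the two recommendation results are fused. As one possible illustration only, the sketch below interleaves the two ranked lists and removes duplicates; the strategy, the `limit` parameter, and the function name `fuse_recommendations` are all assumptions.

```python
from itertools import zip_longest

def fuse_recommendations(content_based, author_based, limit=5):
    """Interleave two ranked lists, dropping duplicates and keeping first-seen order."""
    fused = []
    for pair in zip_longest(content_based, author_based):
        for item in pair:
            if item is not None and item not in fused:
                fused.append(item)
    return fused[:limit]

# Illustrative inputs: one list from similarity matching, one from collaborative filtering.
final = fuse_recommendations(["paper_b", "paper_c", "paper_a"],
                             ["paper_c", "paper_d"])
```

Interleaving gives both sources equal weight at each rank; a weighted score combination would be an equally plausible alternative.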
Based on the same technical concept as the method embodiment, another embodiment of the invention provides a heterogeneous scientific research information integration system for accurate academic recommendation, comprising:
a scientific research information preprocessing module, configured to preprocess heterogeneous original scientific research information, including: encoding each piece of heterogeneous academic literature information into vector form, and extracting semantically related academic feature vectors for encoding training; and randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;
a scientific research information coding module, configured to encode the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and to construct an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used to obtain the mapping from heterogeneous scientific research information to complete feature vectors;
an academic coding mapping model application module, configured to convert heterogeneous academic information into academic feature vectors using the generator part of the generative adversarial network, complete scientific research information that is heterogeneous or has missing items, and generate complete scientific research information feature vectors;
an author information preprocessing module, configured to encode author information into sparse one-hot vectors and label the degree of correlation between the author one-hot vectors and the scientific research feature vectors;
a collaborative filtering recommendation model training module, configured to train a collaborative filtering recommendation model using the author one-hot vectors, the scientific research information feature vectors, and the degrees of correlation between them, and to generate a recommendation set related to authors;
and an integrated recommendation model application module, configured to fuse the author-literature correlation recommendation based on collaborative filtering with the literature-literature correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic literature.
In one embodiment, the scientific research information preprocessing module preprocesses the original information of each scientific research paper from a specific source and extracts the following indexes to construct the original information:
[author 1, author 2, ..., author n, research area, paper title, paper keywords, publication time, paper abstract]
The original information with complete index data is used as the label; a random subset of its fields is emptied to produce the training data, and the labels correspond one-to-one with the training data to form the original data training set.
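By way of illustration only, the construction of a label and its training data may be sketched as follows. The field names, the blanking probability `blank_prob`, and the sample record are assumptions made for this example and are not taken from the embodiment.

```python
import random

def make_training_pair(record, blank_prob=0.3, rng=None):
    """Return (training_data, label).

    The label is the complete record; the training data is a copy in which
    each field is emptied (set to None) with probability blank_prob.
    blank_prob is a hypothetical parameter chosen for this sketch.
    """
    rng = rng or random.Random()
    label = dict(record)
    training_data = {
        field: (None if rng.random() < blank_prob else value)
        for field, value in record.items()
    }
    return training_data, label

# Made-up record following the index list above.
record = {
    "authors": ["author 1", "author 2"],
    "research_area": "recommender systems",
    "title": "An example paper title",
    "keywords": "recommendation, embedding",
    "publication_time": "2021",
    "abstract": "An example abstract.",
}

training_data, label = make_training_pair(record, rng=random.Random(0))
```

Each call yields one label/training-data pair; repeating the call over all complete records gives a training set with one-to-one correspondence.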
In one embodiment, the scientific research information encoding module encodes the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing as follows: each word in the original data training set is mapped into a word vector $V_i$; all information of the original features is mapped in this way and then stacked to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$. Correspondingly, each piece of training data has a label. One such pair is generated for each group of scientific research data, yielding the multiple groups of data in the training set.
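A minimal sketch of the word-to-vector mapping and stacking is given below. The embodiment does not specify the word embedding, so `word_vector` is a hypothetical stand-in that hashes each word to a few floats; the vector width `DIM` and the all-zero row used for an emptied field are likewise assumptions of this example.

```python
import hashlib

DIM = 4  # assumed word-vector width, chosen only for this sketch

def word_vector(word, dim=DIM):
    """Toy stand-in for a trained embedding: hash the word to dim floats in [0, 1)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 256.0 for i in range(dim)]

def build_matrix(fields, dim=DIM):
    """Map every word of every field to a vector and stack the rows into M_k.

    An emptied field (None) contributes one all-zero row (an assumption).
    """
    rows = []
    for value in fields:
        if value is None:
            rows.append([0.0] * dim)
        else:
            rows.extend(word_vector(w, dim) for w in value.split())
    return rows

# Three fields, the second one emptied.
M_k = build_matrix(["graph neural network", None, "paper recommendation"])
```

Applying `build_matrix` to the complete record in the same way would give the matrix that serves as the label for this training sample.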
In some embodiments, the academic coding mapping model used in the scientific research information coding module is a generative adversarial model trained by adversarial learning; the generator and the discriminator both extract information through convolution layers, where each layer is calculated as follows:
where the symbols denote, in order, the output of the layer after calculation, the layer output that serves as its input, the parameters of the layer network, and the bias of the layer, and f is the selected activation function;
the objective of the generative adversarial model is as follows:
$A = \arg\min_{G} \left( \arg\max_{D} \mathrm{Value}(G, D) \right)$
$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$
where G is the generator, D is the discriminator, and E is the cross-entropy error.
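For illustration, the value function can be estimated from a batch of discriminator outputs. The sketch below evaluates the expression exactly as it is written above; the widely used formulation of generative adversarial training takes $\log(1 - D(G(z)))$ in the second term instead. The function name `value` and the sample scores are assumptions of this example.

```python
import math

def value(d_real, d_fake):
    """Batch estimate of Value(D, G) as the formula is written in the text:
    mean(log D(x)) + mean(1 - log D(G(z))).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1]
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1]
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(1.0 - math.log(p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Made-up discriminator scores.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
v = value(d_real, d_fake)
```

During training the discriminator parameters would be updated to increase this quantity and the generator parameters to decrease it, which is the arg min over G of the arg max over D stated in the objective.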
In some implementations, the academic coding mapping model application module converts heterogeneous academic information into academic feature vectors as follows: the generator G of the generative adversarial network is taken out separately, the currently complete information is selected as a data dictionary and input into the generator G item by item, and a vector dictionary set $Dict = \{V_1, V_2, V_3, \ldots, V_n\}$ is constructed;
Information completion: for a piece of heterogeneous scientific research original information S with missing items, a vector matrix M is constructed by mapping the words in S into word vectors and stacking them, and a feature vector is obtained from M by the generator G; the Euclidean inner product of this feature vector with each vector in Dict is computed in turn, the nearest $V_k$ is selected, the original data corresponding to it is looked up, and the missing content is completed from it;
Similarity matching recommendation: using the same Euclidean inner product computation as in information completion, the several candidates with the highest similarity are obtained and recommended to the user.
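The dictionary lookup used for both completion and similarity matching may be sketched as follows. In this example the vectors in `dictionary` are written by hand, whereas in the embodiment they would be produced by the generator G; treating the largest inner product as "nearest" is an assumption, and all identifiers and records are hypothetical.

```python
def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k(query, dictionary, k=1):
    """Return the k dictionary keys whose vectors score highest against query."""
    ranked = sorted(dictionary, key=lambda key: inner(query, dictionary[key]), reverse=True)
    return ranked[:k]

def complete(record, query, dictionary, originals):
    """Fill the emptied fields of record from the original data of the best match."""
    best = top_k(query, dictionary, k=1)[0]
    return {f: (v if v is not None else originals[best][f]) for f, v in record.items()}

# Hand-written stand-ins for generator outputs and their original records.
dictionary = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
originals = {
    "p1": {"title": "Query optimization", "research_area": "databases"},
    "p2": {"title": "Image segmentation", "research_area": "computer vision"},
    "p3": {"title": "Learned indexes", "research_area": "machine learning"},
}

query = [0.9, 0.1]  # feature vector of the incomplete record
incomplete = {"title": "Query planning notes", "research_area": None}

completed = complete(incomplete, query, dictionary, originals)
candidates = top_k(query, dictionary, k=2)  # similarity matching recommendation
```

The same ranking serves both purposes: the first entry supplies the missing fields, and the first k entries form the candidate list shown to the user.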
In some implementations, the author information preprocessing module includes:
the one-hot vector construction unit, used for sparsely encoding author information into a one-hot vector R, where the i-th element of the one-hot vector R of author i is 1 and the remaining elements are 0;
the author-related feature vector construction unit, which, for scientific research content related to an author, constructs an information matrix M by mapping words into word vectors and then constructs a feature vector from M using G, where G is the generator of the generative adversarial network;
the collaborative filtering recommendation training set construction unit, which takes the evaluated relevance of author u to scientific research content l as the training label, thereby obtaining the training set of the collaborative filtering recommendation model.
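As a hedged sketch of these three units, the following example builds one-hot author vectors (the sparse 0/1 vector R), pairs them with content feature vectors, and attaches a relevance label. The author names, the placeholder feature vectors (which the generator G would supply in the embodiment), and the 0/1 relevance values are all assumptions of this example.

```python
def one_hot(index, size):
    """Vector of length size with a 1 at position index and 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec

authors = ["author_a", "author_b", "author_c"]
R = {name: one_hot(i, len(authors)) for i, name in enumerate(authors)}

# Placeholder feature vectors; in the embodiment these come from the generator G.
features = {"paper_1": [0.2, 0.8], "paper_2": [0.9, 0.1]}

# Hypothetical relevance labels for (author u, content l) pairs.
relevance = {
    ("author_a", "paper_1"): 1,
    ("author_a", "paper_2"): 0,
    ("author_b", "paper_2"): 1,
}

# Each training sample: (one-hot author vector, content feature vector, label).
training_set = [(R[u], features[l], y) for (u, l), y in relevance.items()]
```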
In some embodiments, the collaborative filtering recommendation model used by the collaborative filtering recommendation model training module includes the following calculation:
$h_{i\text{-relu}} = \max(0, h_i)$
at the last layer of the network, after the softmax layer, the final output vector $y_{ul}$ is obtained.
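The forward calculation can be illustrated with a small fully connected network. This is a sketch under stated assumptions: each layer is taken to be an affine map (weight matrix times input plus offset), the input is taken to be the author one-hot vector concatenated with the content feature vector, and the two output entries are read as "not relevant" and "relevant". The weights are made up.

```python
import math

def affine(W, b, x):
    """h = W x + b for a weight matrix W (list of rows) and offset b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(h):
    """h_relu = max(0, h), applied element-wise."""
    return [max(0.0, v) for v in h]

def softmax(h):
    """Numerically stable softmax."""
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Apply each (W, b) layer, with ReLU after every layer except the last,
    then softmax on the last layer's output."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = affine(W, b, h)
        if i < len(layers) - 1:
            h = relu(h)
    return softmax(h)

# Assumed input layout: one-hot author vector followed by content features.
x = [1, 0, 0] + [0.2, 0.8]
layers = [
    ([[0.5, 0.0, 0.0, 1.0, -1.0], [0.0, 0.5, 0.0, -1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
y_ul = forward(x, layers)
```

The entries of `y_ul` are non-negative and sum to one, so under the reading assumed here the second entry can be used directly as the predicted relevance of author u to content l.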
The objective function of the collaborative filtering recommendation model in the collaborative filtering recommendation model training module is as follows:
where the network output $y_{ul}$ denotes the network's prediction of the relevance of author u to scientific research content l; the higher the relevance, the closer the relationship between the scientific research content and the author.
In some implementations, the collaborative filtering recommendation model training module generates the recommendation set related to an author as follows: for an author the user is interested in, the k items of scientific research content most relevant to that author are selected by comparing the candidate scientific research content; the authors, fields, and keywords of that content are then traced back through the original information to generate the recommendation set related to the author.
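A minimal sketch of this selection and trace-back step follows. The relevance scores stand in for the model's predictions for one author, and the paper identifiers and original records are hypothetical.

```python
def recommend_for_author(scores, originals, k=2):
    """Pick the k highest-scoring items of content for one author and trace
    back their authors, field, and keywords from the original information."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [
        {
            "paper": pid,
            "authors": originals[pid]["authors"],
            "research_area": originals[pid]["research_area"],
            "keywords": originals[pid]["keywords"],
        }
        for pid in top
    ]

# Made-up predicted relevance of each candidate to the author of interest.
scores = {"paper_1": 0.91, "paper_2": 0.15, "paper_3": 0.64}
originals = {
    "paper_1": {"authors": ["author_a"], "research_area": "recommender systems", "keywords": "collaborative filtering"},
    "paper_2": {"authors": ["author_b"], "research_area": "computer vision", "keywords": "segmentation"},
    "paper_3": {"authors": ["author_a", "author_c"], "research_area": "information retrieval", "keywords": "ranking"},
}

recommendation_set = recommend_for_author(scores, originals, k=2)
```

Because the traced-back records include co-authors, the same result can be used to suggest related researchers as well as related content.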
It should be understood that the heterogeneous scientific research information integration system for academic accurate recommendation provided in this embodiment may implement all the technical solutions in the foregoing method embodiments, and the functions of each functional module may be specifically implemented according to the methods in the foregoing method embodiments, and specific implementation processes that are not described in detail in this embodiment may refer to relevant descriptions in the foregoing embodiments, which are not repeated herein.
According to another embodiment of the present invention, there is provided a computer apparatus including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs when executed by the processors implement the steps in the method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (9)

1. A heterogeneous scientific research information integration method oriented to accurate academic recommendation, characterized by comprising the following steps:
S1, preprocessing heterogeneous original scientific research information, including: encoding each item of heterogeneous academic literature information into vector form, and extracting semantically related academic feature vectors for encoding training; randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;
S2, encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and constructing an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used for obtaining the mapping from heterogeneous scientific research information to complete feature vectors;
S3, using the generator part of the generative adversarial network to convert heterogeneous academic information into academic information feature vectors, completing scientific research information that is heterogeneous or has missing items, and generating complete scientific research information feature vectors;
S4, encoding author information into sparse one-hot vectors, and labeling the degree of correlation between the author information one-hot vectors and the scientific research information feature vectors;
S5, training a collaborative filtering recommendation model using the author information one-hot vectors, the scientific research information feature vectors, and the degree of correlation between them, and generating a recommendation set related to the author;
S6, fusing the author-document correlation recommendation based on collaborative filtering with the document-document correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic documents.
2. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 1, wherein the step S1 includes: preprocessing the original information of each scientific research paper from a specific source, and extracting the following indexes to construct the original information:
[author 1, author 2, ..., author n, research area, paper title, paper keywords, publication time, paper abstract]
The original information with complete index data is used as the label; a random subset of its fields is emptied to produce the training data, and the labels correspond one-to-one with the training data to form the original data training set.
3. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 2, wherein encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing in step S2 comprises: mapping each word in the original data training set into a word vector $V_i$; mapping all information of the original features in this way and then stacking it to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$. Correspondingly, each piece of training data has a label. One such pair is generated for each group of scientific research data, yielding the multiple groups of data in the training set.
4. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 3, wherein the academic coding mapping model in step S2 is a generative adversarial model trained by adversarial learning, and the generator and the discriminator both extract information through convolution layers, where each layer is calculated as follows:
where m denotes the dimension of the output vector of the layer-l network; the remaining symbols denote, in order, the i-th output parameter of the layer-l network, the j-th output parameter of the layer-l network, the parameters of the layer-l network, and the bias of layer l; and f is the selected activation function;
the objective of the generative adversarial model is as follows:
$A = \arg\min_{G} \left( \arg\max_{D} \mathrm{Value}(G, D) \right)$
$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$
where G is the generator, D is the discriminator, E is the cross-entropy error, A is the optimization target, Value(G, D) is the back-propagation error computed when the generator is G and the discriminator is D, and z is the latent variable input to the generator; $E_{x \sim P_{data}(x)}$ denotes that the error is computed with cross entropy when x follows its original distribution $P_{data}(x)$; similarly, $E_{z \sim P_{data}(z)}$ denotes that the error is computed with cross entropy when z follows the distribution $P_{data}(z)$.
5. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 1, wherein the step S3 includes: taking out the generator G of the generative adversarial network separately, selecting the currently complete information as a data dictionary, inputting it into the generator G item by item, and constructing a vector dictionary set $Dict = \{V_1, V_2, V_3, \ldots, V_n\}$;
Information completion: for a piece of heterogeneous scientific research original information S with missing items, a vector matrix M is constructed by mapping the words in the original information S into word vectors and stacking them, and a feature vector is obtained from M by the generator G; the Euclidean inner product of this feature vector with each vector in Dict is computed in turn, the nearest $V_k$ is selected, the original data corresponding to it is looked up, and the missing content is completed from it;
Similarity matching recommendation: using the same Euclidean inner product computation as in information completion, the several candidates with the highest similarity are obtained and recommended to the user.
6. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 1, wherein the step S4 includes:
sparsely encoding the author information into a one-hot vector R, where the i-th element of the one-hot vector R of author i is 1 and the remaining elements are 0;
for scientific research content related to an author, constructing an information matrix M by mapping words into word vectors and stacking them, and then constructing a feature vector from M using G, where G is the generator of the generative adversarial network;
taking the evaluated relevance of author u to scientific research content l as the training label, whereby the training set of the collaborative filtering recommendation model is constructed.
7. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 6, wherein the collaborative filtering recommendation model used in step S5 comprises the following calculation process:
$h_{i\text{-relu}} = \max(0, h_i)$
in which the weight matrix of the i-th layer is used, $b_i$ denotes the linear offset of the i-th layer, $h_i$ denotes the output of the i-th layer, and $h_{i\text{-relu}}$ denotes the output of the i-th layer after the ReLU layer, if a ReLU layer is present;
at the last layer of the network, after the softmax layer, the final output vector $y_{ul}$ is obtained.
In step S5, the objective function of the collaborative filtering recommendation model is as follows:
where the network output $y_{ul}$ denotes the network's prediction of the relevance of author u to scientific research content l; the higher the relevance, the closer the relationship between the scientific research content and the author.
8. The heterogeneous scientific research information integration method oriented to accurate academic recommendation according to claim 7, wherein generating the recommendation set related to the author in step S5 includes: for an author the user is interested in, selecting the k items of scientific research content most relevant to that author by comparing the candidate scientific research content, and then tracing back the authors, fields, and keywords of that content through the original information to generate the recommendation set related to the author.
9. A heterogeneous scientific research information integration system oriented to accurate academic recommendation, characterized by comprising:
the scientific research information preprocessing module, used for preprocessing heterogeneous original scientific research information, including: encoding each item of heterogeneous academic literature information into vector form, and extracting semantically related academic feature vectors for encoding training; randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;
the scientific research information coding module, used for encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing and constructing an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used for obtaining the mapping from heterogeneous scientific research information to complete feature vectors;
the academic coding mapping model application module, used for converting heterogeneous academic information into academic feature vectors by means of the generator part of the generative adversarial network, completing scientific research information that is heterogeneous or has missing items, and generating complete scientific research information feature vectors;
the author information preprocessing module, used for encoding author information into sparse one-hot vectors and labeling the degree of correlation between the author information one-hot vectors and the scientific research information feature vectors;
the collaborative filtering recommendation model training module, used for training a collaborative filtering recommendation model with the author information one-hot vectors, the scientific research information feature vectors, and the degree of correlation between them, and for generating a recommendation set related to authors;
and the integrated recommendation model application module, used for fusing the author-document correlation recommendation based on collaborative filtering with the document-document correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic documents.
CN202110732872.7A 2021-06-30 2021-06-30 Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system Active CN113343125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110732872.7A CN113343125B (en) 2021-06-30 2021-06-30 Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
PCT/CN2021/104396 WO2023272748A1 (en) 2021-06-30 2021-07-03 Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732872.7A CN113343125B (en) 2021-06-30 2021-06-30 Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system

Publications (2)

Publication Number Publication Date
CN113343125A CN113343125A (en) 2021-09-03
CN113343125B true CN113343125B (en) 2023-08-22

Family

ID=77481815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732872.7A Active CN113343125B (en) 2021-06-30 2021-06-30 Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system

Country Status (2)

Country Link
CN (1) CN113343125B (en)
WO (1) WO2023272748A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578884B (en) * 2023-07-07 2023-10-31 北京邮电大学 Scientific research team identification method and device based on heterogeneous information network representation learning
CN117076658B (en) * 2023-08-22 2024-05-03 南京朗拓科技投资有限公司 Quotation recommendation method, device and terminal based on information entropy
CN116909991B (en) * 2023-09-12 2023-12-12 中国人民解放军总医院第六医学中心 NLP-based scientific research archive management method and system
CN117556118B (en) * 2024-01-11 2024-04-16 中国科学技术信息研究所 Visual recommendation system and method based on scientific research big data prediction

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336793A (en) * 2013-06-09 2013-10-02 中国科学院计算技术研究所 Personalized paper recommendation method and system thereof
CN108132961A (en) * 2017-11-06 2018-06-08 浙江工业大学 A kind of bibliography based on reference prediction recommends method
CN112069290A (en) * 2020-07-27 2020-12-11 中国科学院计算机网络信息中心 Academic paper recommendation method based on local structure of graph and semantic similarity of text
CN112214687A (en) * 2020-09-29 2021-01-12 华南师范大学 Paper recommendation method, system and medium for temporal perception academic information
CN112632397A (en) * 2021-01-04 2021-04-09 同方知网(北京)技术有限公司 Personalized recommendation method based on multi-type academic achievement portrait and mixed recommendation strategy

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132601A1 (en) * 2014-11-12 2016-05-12 Microsoft Technology Licensing Hybrid Explanations In Collaborative Filter Based Recommendation System
CN112862015A (en) * 2021-04-01 2021-05-28 北京理工大学 Paper classification method and system based on hypergraph neural network

Also Published As

Publication number Publication date
CN113343125A (en) 2021-09-03
WO2023272748A1 (en) 2023-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant