CN113343125B

CN113343125B - Heterogeneous scientific research information integration method and system for accurate academic recommendation

Info

Publication number: CN113343125B
Application number: CN202110732872.7A
Authority: CN
Inventors: 张凯; 王楚豫; 叶保留
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2023-08-22
Anticipated expiration: 2041-06-30
Also published as: CN113343125A; WO2023272748A1

Abstract

The invention discloses a heterogeneous scientific research information integration method and system for accurate academic recommendation. The method comprises the following steps: preprocessing original scientific research information, encoding heterogeneous academic literature information as vectors, and constructing random heterogeneous scientific research information; encoding the academic information feature vectors and the heterogeneous scientific research information, and constructing an academic coding mapping model; using the generator of the mapping model to convert heterogeneous academic information into complete scientific research feature vectors; encoding author information as sparse one-hot vectors and labeling the degree of correlation between the one-hot vectors and the scientific research feature vectors; training a collaborative filtering recommendation model with the one-hot vectors, the scientific research feature vectors, and the degree of correlation between them, to generate a recommendation set related to an author; and fusing the two recommendation results to complete the accurate recommendation of academic literature. Through comparison of feature content and author-based collaborative filtering, the invention achieves completion and fusion of heterogeneous scientific research information and accurate recommendation of academic content.

Description

Heterogeneous scientific research information integration method and system for accurate academic recommendation

Technical Field

The invention relates to academic data recommendation, and in particular to a heterogeneous scientific research information integration method and system for accurate academic recommendation.

Background

With the rapid development of internet technology, scientific journals and publications have gradually moved from paper to electronic media stored online, and can be searched online through scientific databases and similar services, which is a great convenience for researchers. However, because academic data is usually produced by separate individuals or organizations, its structure differs greatly between sources and its quality is uneven, so heterogeneous academic data is difficult to locate accurately. Records for the same author can differ considerably across websites and database versions, and uploaded data is missing information to varying degrees. The result is a massive body of heterogeneous academic data that makes scientific retrieval and academic recommendation very difficult. Integrating and fusing heterogeneous academic data into a unified academic information structure is therefore key to academic recommendation.

Unlike conventional data, academic data often requires finer classification to achieve accurate recommendation: even for research content in the same field, different users often have different preferences. For example, review and popular-science literature helps a beginner quickly understand a field, whereas specialists in that field need more advanced and more specialized literature. Accurate recommendation based on academic data therefore requires not only accurate classification and positioning of the literature, but also recommendation that takes the characteristics of the user into account.

Existing scientific research information recommendation methods usually recommend from a single information source and do not adequately address users' objective need to retrieve heterogeneous information.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a heterogeneous scientific research information integration method and system for accurate academic recommendation. Heterogeneous scientific research information is processed using generative adversarial learning and collaborative filtering, details of information of differing quality are completed, and a comparison information base is built from existing information, so as to achieve accurate, intelligent recommendation of scientific research content and scholars.

In order to achieve the above purpose, the invention adopts the following technical scheme:

According to a first aspect of the invention, a heterogeneous scientific research information integration method for accurate academic recommendation is provided, comprising the following steps:

S1, preprocessing heterogeneous original scientific research information, the preprocessing comprising: encoding each item of heterogeneous academic literature information as a vector and extracting semantically related academic feature vectors for encoding training; and randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;

S2, encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and constructing an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used to obtain the mapping from heterogeneous scientific research information to complete feature vectors;

S3, using the generator of the generative adversarial network to convert heterogeneous academic information into academic feature vectors, completing scientific research information that is heterogeneous and has missing items, and generating complete scientific research information feature vectors;

S4, encoding author information as sparse one-hot vectors, and labeling the degree of correlation between the author one-hot vectors and the scientific research information feature vectors;

S5, training a collaborative filtering recommendation model using the author one-hot vectors, the scientific research information feature vectors, and the degree of correlation between them, the model being used to generate a recommendation set related to an author;

S6, fusing the author-literature correlation recommendation based on collaborative filtering with the literature-literature correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic literature.

In some embodiments, the step S1 includes: preprocessing the original information of each scientific research paper of a specific source, and extracting the following indexes to construct the original information:

[author 1, author 2, ……, author n, research area, paper name, paper keywords, publication time, paper abstract]

The original information whose index data is complete is used as the label; copies in which a random subset of the fields has been emptied are used as training data; and the labels, in one-to-one correspondence with the training data, form the original data training set.

In some embodiments, encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing in step S2 comprises: mapping each word in the original data training set to a word vector $\mathrm{Vector}_i$; mapping all information of the original features and stacking the results to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$; correspondingly, each such item of training data has a label; and, for one group of scientific research data, generating multiple groups of data in the training set.

In some embodiments, the academic coding mapping model in step S2 is a generative adversarial model trained by generative adversarial learning, and the generator and the discriminator both extract information through convolution layers, where each layer is computed as follows:

$x_i^{l} = f\left(\sum_{j=1}^{m} w_{ij}^{l}\, x_j^{l-1} + b^{l}\right)$

where m denotes the output vector dimension of the layer l-1 network, $x_i^{l}$ denotes the i-th output of the layer l network, $x_j^{l-1}$ denotes the j-th output of the layer l-1 network, $w_{ij}^{l}$ is a parameter of the layer l network, $b^{l}$ is the bias of layer l, and f is the selected activation function;

the objective of the generative adversarial model is calculated as follows:

$A = \arg\min_G \left(\arg\max_D \mathrm{Value}(G, D)\right)$

$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$

where G is the generator, D is the discriminator, and E is the cross-entropy error.

In some embodiments, the step S3 includes: taking out the generator G of the generative adversarial network on its own, selecting the currently complete information as a data dictionary, inputting it into the generator G item by item, and constructing a vector dictionary set $Dict = \{V_1, V_2, V_3, \ldots, V_n\}$;

information completion: for a heterogeneous input S with missing items, a vector matrix M is constructed by mapping the words in S to word vectors and stacking them, and a feature vector $V = G(M)$ is obtained; the Euclidean inner product of $V$ with each vector in Dict is computed in turn, the nearest $V_k$ is selected, the original data of that entry is looked up, and the missing content is completed from it;

similarity matching recommendation: the several candidates with the highest similarity are obtained using the same Euclidean inner product computation as in information completion, and are recommended to the user.

In some embodiments, the step S4 includes:

sparsely encoding the author information as a one-hot vector R, wherein the i-th element of the one-hot vector R of author i is 1 and the remaining elements are 0;

for the scientific research content related to an author, constructing an information matrix M by mapping words to word vectors, and then constructing the feature vector $V = G(M)$, where G is the generator of the generative adversarial network;

evaluating the relevance of author u to scientific research content l and using it as the training label, which completes the training set of the collaborative filtering recommendation model.

In some embodiments, the collaborative filtering recommendation model used in step S5 includes the following calculation process:

$h_i = W_i^T h_{i-1} + b_i$

$h_{i\text{-relu}} = \max(0, h_i)$

where $W_i^T$ denotes the weight matrix of the i-th layer, $b_i$ denotes the bias of the i-th layer, $h_{i-1}$ denotes the input to the i-th layer (the output of the preceding layer), $h_i$ denotes the output of the i-th layer, and $h_{i\text{-relu}}$ denotes the output of the i-th layer after the relu layer, if a relu layer is present;

at the last layer of the network, after the softmax layer, the final output $y_{ul}$ is obtained;

In step S5, the objective function of the collaborative filtering recommendation model is as follows:

where the network output $y_{ul}$ denotes the network's prediction of the relevance of author u to scientific research content l; the higher the relevance, the closer the relationship between the scientific research content and the author.

In some implementations, generating the set of recommendations related to the author in step S5 includes: for authors interested by users, k scientific research contents most relevant to the authors are selected through comparison of candidate scientific research contents, and then related information of the authors, the fields and the keywords of the scientific research contents is traced back through the original information, so that a recommendation set relevant to the authors is generated.

According to a second aspect of the invention, a heterogeneous scientific research information integration system for accurate academic recommendation is provided, comprising:

a scientific research information preprocessing module, used for preprocessing heterogeneous original scientific research information, including: encoding each item of heterogeneous academic literature information as a vector and extracting semantically related academic feature vectors for encoding training; and randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;

a scientific research information coding module, used for encoding the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing, and constructing an academic coding mapping model, wherein the academic coding mapping model is a generative adversarial network built from multiple convolution layers and is used to obtain the mapping from heterogeneous scientific research information to complete feature vectors;

an academic coding mapping model application module, used for converting heterogeneous academic information into academic feature vectors using the generator of the generative adversarial network, completing scientific research information that is heterogeneous and has missing items, and generating complete scientific research information feature vectors;

an author information preprocessing module, used for encoding author information as sparse one-hot vectors and labeling the degree of correlation between the author one-hot vectors and the scientific research information feature vectors;

a collaborative filtering recommendation model training module, used for training a collaborative filtering recommendation model with the author one-hot vectors, the scientific research information feature vectors, and the degree of correlation between them, the model being used to generate a recommendation set related to an author;

an integrated recommendation model application module, used for fusing the author-literature correlation recommendation based on collaborative filtering with the literature-literature correlation recommendation based on the generative adversarial network to obtain a final recommendation result, completing the accurate recommendation of academic literature.

The invention has the following beneficial effects: based on a generative adversarial network, collaborative filtering, feature mapping, and deep neural networks, it provides a heterogeneous scientific research information integration method for accurate academic recommendation. By comparing feature content and applying author-based collaborative filtering, it completes and fuses heterogeneous scientific research information and accurately recommends academic content, improving both the recommendation effect and the user experience for scientific research information.

Drawings

Fig. 1 is a schematic flow chart of the heterogeneous scientific research information integration method for accurate academic recommendation provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of the training process of the generative adversarial network provided by an embodiment of the invention;

Fig. 3 is a schematic diagram of the recommendation process based on scientific research content provided by an embodiment of the invention;

Fig. 4 is a schematic diagram of the generator provided by an embodiment of the invention;

Fig. 5 is a schematic diagram of the collaborative filtering network provided by an embodiment of the invention.

Detailed Description

The invention is further described below with reference to embodiments and the accompanying drawings, which are not intended to limit the scope of the invention. In the following description, academic information and scientific research information are used interchangeably.

Referring to fig. 1, in one embodiment, a heterogeneous scientific research information integration method for academic accurate recommendation includes the following steps:

S1, preprocessing the scientific research information and constructing the original data training set.

The original information of each scientific research paper from a specific source is preprocessed, and the following indexes are selected to construct the original information, whose format is as follows:

[author 1, author 2, ……, author n, research area, paper name, paper keywords, publication time, paper abstract]

These features are selected as the original features. To obtain a good training effect, the training set must be processed so that some items of information are deleted, for example:

F ₁ = [author 1, author 2, ……, NULL, research area, paper name, publication time, paper abstract]

F ₂ = [author 1, author 2, ……, author n, NULL, paper name, publication time, paper abstract]

The above data are obtained by randomly "attacking" parts of the original information so that some information is missing. The data with missing information and the corresponding complete original information together form the original data training set.
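The construction of training pairs described in step S1 can be sketched in Python as follows. This is a minimal illustration, assuming each paper is held as a dictionary and that whole fields (rather than individual authors) are emptied; the field names and the functions `blank_random_fields` and `build_training_set` are hypothetical, and the text does not state how many items are deleted per record.

```python
import random

# Hypothetical field names standing in for the indexes listed in the text.
FIELDS = ["authors", "research_area", "paper_name",
          "keywords", "publication_time", "abstract"]


def blank_random_fields(record, n_blank, rng):
    """Return a copy of `record` with `n_blank` randomly chosen fields set to None."""
    damaged = dict(record)
    for field in rng.sample(FIELDS, n_blank):
        damaged[field] = None
    return damaged


def build_training_set(records, variants_per_record=3, n_blank=1, seed=0):
    """Pair several damaged copies (inputs) with each complete record (label)."""
    rng = random.Random(seed)
    pairs = []
    for record in records:
        for _ in range(variants_per_record):
            pairs.append((blank_random_fields(record, n_blank, rng), record))
    return pairs


paper = {
    "authors": ["author 1", "author 2"],
    "research_area": "recommender systems",
    "paper_name": "An example paper",
    "keywords": ["GAN", "collaborative filtering"],
    "publication_time": "2021",
    "abstract": "An example abstract.",
}
pairs = build_training_set([paper])
```

Each element of `pairs` is a (training data, label) tuple, so one complete record yields several training items, in line with the one-to-one correspondence between labels and training data described above.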

S2, encoding scientific research information.

The original data training set is segmented using feature encoding. Using an existing, mature mapping method, each word in the original data training set is mapped to a word vector $\mathrm{Vector}_i$; all information of the original features is mapped and then stacked to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$. Correspondingly, this training data has a label: the academic information feature vector obtained by processing the complete original information. The label is the training target to be learned for the corresponding training data $M_k$, and it is used to correct the output of the network. Because scientific research records differ in content and length, the raw matrices differ in size; records with less content are therefore padded so that the input format is unified for subsequent processing.

This completes the conversion from the original input information to a matrix, and multiple groups of data in the training set can be generated for one group of scientific research data.
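The mapping, stacking, and padding in step S2 can be sketched as below. The text only refers to "existing and mature mapping modes" for word vectors, so `toy_word_vector` is a hash-based stand-in introduced purely for illustration, not a trained embedding; `DIM`, `encode_record`, zero-padding as the fill value, and truncation at `max_words` are likewise assumptions.

```python
import hashlib

DIM = 4  # hypothetical embedding width


def toy_word_vector(word, dim=DIM):
    """Deterministic stand-in for a trained word embedding (illustration only)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def encode_record(fields, max_words, dim=DIM):
    """Map each word to a vector, stack the vectors, and zero-pad to `max_words` rows.

    Empty (None) fields contribute no rows, so records with missing items
    produce shorter matrices that are then padded to the common size.
    """
    words = [w for text in fields if text for w in text.split()]
    rows = [toy_word_vector(w, dim) for w in words[:max_words]]
    rows += [[0.0] * dim for _ in range(max_words - len(rows))]
    return rows


# A record whose research area is missing (None).
M_k = encode_record(["Zhang Kai", None, "GAN based recommendation"], max_words=8)
```

Here `M_k` has a fixed shape of 8 rows by `DIM` columns regardless of how many fields are present, which is the property the padding step is meant to provide.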

S3, training the academic recommendation model.

The training set processed in step S2 is input into the academic recommendation model. The model at this stage is trained by deep-learning-based generative adversarial learning, and the generator and the discriminator both extract information through convolution layers, where each layer is computed as follows:

$x_i^{l} = f\left(\sum_{j=1}^{m} w_{ij}^{l}\, x_j^{l-1} + b^{l}\right)$

where l denotes the layer index, m denotes the output vector dimension of the layer l-1 network, $x_i^{l}$ denotes the i-th output of the layer l network, $x_j^{l-1}$ denotes the j-th output of the layer l-1 network, $w_{ij}^{l}$ is a parameter of the layer l network and the object of iterative optimization, $b^{l}$ is the bias of layer l, and f is the selected activation function; in this model the softmax activation function is selected. In the training process, the error is calculated as follows:

$A = \arg\min_G \left(\arg\max_D \mathrm{Value}(G, D)\right)$

$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$

where G is the generator, D is the discriminator, E is the cross-entropy error, A is the optimization target, and Value(G, D) is the back-propagation error calculated by the formula when the generator is G and the discriminator is D. z is the hidden variable input to the generator, a learnable bias variable related to the network structure of the generator. $E_{x \sim P_{data}(x)}$ indicates that the error is calculated using cross entropy when x obeys its original distribution $P_{data}(x)$; similarly, $E_{z \sim P_{data}(z)}$ indicates that the error is calculated using cross entropy when z obeys the distribution $P_{data}(z)$.
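As a numerical illustration of the value function, the sketch below evaluates it on batches of discriminator outputs. Two variants are shown because the second term as printed here is 1 − log D(G(z)), whereas the GAN literature usually writes log(1 − D(G(z))); which form was intended is not stated in the text. The function names and the sample probabilities are hypothetical.

```python
import math


def value_as_written(d_real, d_fake):
    """Value(D, G) as printed in the text: E[log D(x)] + E[1 - log D(G(z))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(1.0 - math.log(p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def value_common_form(d_real, d_fake):
    """Form usually given in the GAN literature: E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# d_real: discriminator outputs D(x) on real (label) vectors.
# d_fake: discriminator outputs D(G(z)) on generated vectors.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
written = value_as_written(d_real, d_fake)
common = value_common_form(d_real, d_fake)
```

In the common form the value rises as the discriminator separates real from generated vectors more confidently, which is what the inner maximization over D exploits and the outer minimization over G counteracts.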

The input of the model is the items of the training set described above, and for each item the output is a vector V. The purpose of the model is, by continuously adjusting the parameters, to make the discriminator judge the output $V = G(M)$ and the corresponding label to have the greatest similarity; the generator and the discriminator are iterated continuously until training converges. Fig. 2 is a schematic diagram of the system architecture of this step, and Fig. 4 is a schematic diagram of the structure of the generator model.
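The per-layer computation used in step S3 (a weighted sum of the previous layer's outputs plus a bias, passed through the activation f, here softmax) can be sketched as follows. The text describes convolution layers; this sketch shows only the dense weighted-sum form implied by the symbol definitions, with a single scalar bias per layer as an assumption. The weights and inputs are hypothetical values.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def layer_forward(prev_outputs, weights, bias, activation=softmax):
    """Compute one layer: activation(weighted sum of previous outputs + bias).

    prev_outputs: the m outputs of layer l-1
    weights:      one row of m weights per output unit of layer l
    bias:         the bias of layer l (a single scalar here, an assumption)
    """
    pre_activation = [
        sum(w * x for w, x in zip(row, prev_outputs)) + bias
        for row in weights
    ]
    return activation(pre_activation)


x_prev = [1.0, 2.0]                 # outputs of layer l-1 (m = 2)
W = [[0.5, 0.5], [0.5, 0.5]]        # hypothetical layer-l parameters
out = layer_forward(x_prev, W, bias=0.1)
```

With identical weight rows both pre-activations are equal, so the softmax output is uniform; in training, the parameters in `W` are the quantities adjusted iteratively.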

S4, after the trained model is obtained, the generator G is taken out on its own; the current, complete original scientific research information is selected as a data dictionary, processed, and input into the generator item by item to construct a vector dictionary set $Dict = \{V_1, V_2, V_3, \ldots, V_n\}$ for subsequent comparison. The network can be used in two modes: 1) Information completion: for a heterogeneous input S with missing items, a vector matrix M is constructed in the manner described above and the feature vector $V = G(M)$ is obtained; the Euclidean inner product of $V$ with each vector in Dict is then computed in turn, the nearest $V_k$ is selected, the original data of that entry is looked up, and the missing content is completed from it. 2) Similarity matching recommendation: in the same manner, the several candidates with the highest similarity are obtained and recommended to the user; this process is shown in Fig. 3.
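The two modes of use in step S4 can be sketched as a dictionary lookup by inner product. This is a minimal illustration: `stand_in_generator` replaces the trained generator G with a fixed table of hypothetical vectors, the query vector stands for the generator output on the incomplete input, and the largest inner product is treated as "nearest", which assumes vectors of comparable norm.

```python
def inner_product(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_dict(complete_records, generator):
    """Run every complete record through the generator: list of (vector, record)."""
    return [(generator(record), record) for record in complete_records]


def top_k(query_vector, vector_dict, k=1):
    """Return the k entries whose vectors have the largest inner product with the query."""
    ranked = sorted(vector_dict,
                    key=lambda entry: inner_product(query_vector, entry[0]),
                    reverse=True)
    return ranked[:k]


def complete(partial, query_vector, vector_dict):
    """Fill the None fields of `partial` from the best-matching complete record."""
    _, best = top_k(query_vector, vector_dict, k=1)[0]
    return {key: (value if value is not None else best[key])
            for key, value in partial.items()}


# Hypothetical generator outputs for two complete records.
VECTORS = {"Paper A": [1.0, 0.0], "Paper B": [0.0, 1.0]}


def stand_in_generator(record):
    return VECTORS[record["paper_name"]]


complete_records = [
    {"paper_name": "Paper A", "research_area": "networks"},
    {"paper_name": "Paper B", "research_area": "vision"},
]
vector_dict = build_dict(complete_records, stand_in_generator)

partial = {"paper_name": "Paper A (preprint)", "research_area": None}
query_vector = [0.9, 0.1]  # stands for the generator output on the partial record
filled = complete(partial, query_vector, vector_dict)
```

Calling `top_k` with `k` greater than 1 gives the similarity matching mode: the same ranking is used, but several candidates are returned for recommendation instead of a single entry for completion.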

S5, preprocessing the author related data required by the collaborative filtering recommendation model.

The author is the most critical item in scientific research information; however, because the author name is only a small part of the original information defined above, it is difficult for the recommendation model to generate suitable recommendations for a user when only a single author name is known. The invention therefore uses collaborative filtering to construct an author-based recommendation model. Fig. 5 shows a schematic diagram of the collaborative filtering network architecture in an embodiment of the invention.

The data preprocessing steps are as follows:

The author information is sparsely encoded as a one-hot vector; for author i, the encoding is as follows:

$R = [0, 0, 0, \ldots, 0, 0, 1, 0, 0, 0, \ldots, 0, 0]$

The i-th element of the one-hot vector R of author i is 1, and the remaining elements are 0. Meanwhile, for the scientific research content related to the author, an author-related scientific research feature vector is constructed according to steps S1 to S4, using the generator G obtained in step S4.

In addition, the relevance of author u to scientific research content l needs to be evaluated manually and used as the training label; methods such as interviews and statistical observation can be used to mark the degree of correlation between the author vector and the academic literature feature vector. This completes the training set of the collaborative filtering network.
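The preprocessing in step S5 can be sketched as follows. The layout of a training item as a (one-hot vector, feature vector, relevance) triple is an assumption, since the training set definition did not survive in the text; `one_hot`, `build_cf_training_set`, and the sample identifiers and scores are hypothetical.

```python
def one_hot(index, size):
    """Sparse one-hot code R: 1 at position `index`, 0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vector = [0] * size
    vector[index] = 1
    return vector


def build_cf_training_set(author_ids, ratings, feature_vectors):
    """Build (R_u, V_l, relevance) triples from manually labelled relevance scores.

    ratings:         {(author_id, content_id): relevance}
    feature_vectors: {content_id: feature vector produced by the generator}
    """
    index_of = {author: i for i, author in enumerate(author_ids)}
    return [
        (one_hot(index_of[author], len(author_ids)), feature_vectors[content], score)
        for (author, content), score in ratings.items()
    ]


authors = ["u0", "u1", "u2"]
features = {"p1": [0.1, 0.9]}            # hypothetical generator output for paper p1
ratings = {("u1", "p1"): 0.8}            # hypothetical manual relevance label
training_set = build_cf_training_set(authors, ratings, features)
```

Each triple pairs one author with one piece of scientific research content, so an author who is related to several papers contributes several training items.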

S6, training the collaborative filtering recommendation model using the training set.

The training network consists of several fully connected layers and relu layers, calculated as follows:

$h_i = W_i^T h_{i-1} + b_i$

$h_{i\text{-relu}} = \max(0, h_i)$

where $W_i^T$ denotes the weight matrix of the i-th layer, $b_i$ denotes the linear bias of the i-th layer, $h_{i-1}$ denotes the input to the i-th layer (the output of the preceding layer), $h_i$ denotes the output of the i-th layer, and $h_{i\text{-relu}}$ denotes the output of the i-th layer after the relu layer, if a relu layer is present.

At the last layer of the network, after the softmax layer, the final output $y_{ul}$ is obtained. The loss function of the network is as follows:

The network output $y_{ul}$ denotes the network's prediction of the relevance of author u to scientific research content l; the higher the relevance, the closer the relationship between the scientific research content and the author. For an author the user is interested in, the k items of scientific research content most relevant to that author are selected by comparing the candidate content; related information such as the authors, field, and keywords of that content is then traced back through the original information, generating a recommendation set related to the author.
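The forward pass and top-k selection in step S6 can be sketched as below. Several points are assumptions not given in the text: the one-hot vector and feature vector are concatenated as the network input, the final softmax has two outputs with index 1 read as the relevance score, and the weights are hand-picked rather than trained. All function names are hypothetical.

```python
import math


def dense(inputs, weights, bias):
    """Fully connected layer: weighted sum plus bias, one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(author_one_hot, feature_vector, layers):
    """Dense + ReLU for hidden layers, dense + softmax for the last layer."""
    h = list(author_one_hot) + list(feature_vector)  # concatenation is an assumption
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


def top_contents(author_one_hot, candidates, layers, k):
    """Rank candidate contents by predicted relevance (softmax index 1) and keep k."""
    scored = sorted(candidates,
                    key=lambda cid: predict(author_one_hot, candidates[cid], layers)[1],
                    reverse=True)
    return scored[:k]


# Hand-picked weights for illustration only (2 authors + 2 features -> 2 -> 2).
layers = [
    ([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0]),
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]),
]
candidates = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
best = top_contents([1, 0], candidates, layers, k=1)
```

The identifiers returned by `top_contents` would then be looked up in the original information to recover authors, field, and keywords for the recommendation set.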

S7, using the integrated recommendation system. For scientific research content the user is interested in, similar research content is recommended using the method of steps S1 to S4; for researchers the user is interested in (i.e. literature authors), recommendations of researchers and content are generated using the method of steps S5 to S6. The overall flow is shown in Fig. 1. On this basis, the invention can recommend related authors and related literature respectively.
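The routing in step S7 can be sketched as a small dispatcher, together with one possible way to fuse the two ranked lists. The text does not specify the fusion rule, so the round-robin interleaving in `fuse` is purely an assumption for illustration; `integrated_recommend` and the recommender callables are hypothetical names.

```python
from itertools import zip_longest


def integrated_recommend(query, content_recommender, author_recommender, k=5):
    """Route a ("content", value) or ("author", value) query to the matching recommender."""
    kind, value = query
    if kind == "content":
        return content_recommender(value, k)   # steps S1 to S4
    if kind == "author":
        return author_recommender(value, k)    # steps S5 to S6
    raise ValueError("query kind must be 'content' or 'author'")


def fuse(content_based, author_based, k):
    """Interleave two ranked lists and drop duplicates (assumed fusion rule)."""
    merged = []
    for pair in zip_longest(content_based, author_based):
        for item in pair:
            if item is not None and item not in merged:
                merged.append(item)
    return merged[:k]


fused = fuse(["p1", "p2", "p3"], ["p2", "p4"], k=4)
routed = integrated_recommend(("author", "u1"),
                              content_recommender=lambda value, k: ["p1"][:k],
                              author_recommender=lambda value, k: ["p2", "p4"][:k],
                              k=1)
```

Interleaving keeps the top items of both lists near the front; a weighted score combination would be an equally plausible choice if relevance scores from both models were on a comparable scale.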

Based on the same technical concept as the method embodiment, another embodiment of the invention provides a heterogeneous scientific research information integration system for accurate academic recommendation, including:

a scientific research information preprocessing module, used for preprocessing heterogeneous original scientific research information, including: encoding each item of heterogeneous academic literature information as a vector and extracting semantically related academic feature vectors for encoding training; and randomly emptying some items of the academic feature vectors to construct random heterogeneous scientific research information;

an author information preprocessing module, used for encoding author information as sparse one-hot vectors and labeling the degree of correlation between the author one-hot vectors and the scientific research feature vectors;

In one embodiment, the scientific research information preprocessing module preprocesses the original information of each scientific research paper from a specific source, and extracts the following indexes to construct the original information:

[author 1, author 2, ……, author n, research area, paper name, paper keywords, publication time, paper abstract]

The original information whose index data is complete is used as the label; copies in which a random subset of the fields has been emptied are used as training data; and the labels, in one-to-one correspondence with the training data, form the original data training set.

In one embodiment, the scientific research information coding module encodes the academic information feature vectors and the heterogeneous scientific research information obtained by preprocessing by: mapping each word in the original data training set to a word vector $\mathrm{Vector}_i$; mapping all information of the original features and stacking the results to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$; correspondingly, each such item of training data has a label; and, for one group of scientific research data, generating multiple groups of data in the training set.

In some embodiments, the academic coding mapping model used in the scientific research information coding module is a generative adversarial model trained by generative adversarial learning, and the generator and the discriminator both extract information through convolution layers, where each layer is computed as follows:

$x_i^{l} = f\left(\sum_{j=1}^{m} w_{ij}^{l}\, x_j^{l-1} + b^{l}\right)$

where $x_i^{l}$ is the output of the layer after calculation, $x_j^{l-1}$ is the output of the preceding layer, $w_{ij}^{l}$ is a parameter of the layer network, $b^{l}$ is the bias of the layer, and f is the selected activation function;

the objective of the generative adversarial model is calculated as follows:

$A = \arg\min_G \left(\arg\max_D \mathrm{Value}(G, D)\right)$

$\mathrm{Value}(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$

wherein G is a generator, D is a discriminator, and E is a cross entropy error.

In some implementations, the academic coding mapping model application module converts heterogeneous academic information into academic feature vectors as follows: the generator G of the generative adversarial network is taken out on its own, the currently complete information is selected as a data dictionary and input into the generator G item by item, and a vector dictionary set $Dict = \{V_1, V_2, V_3, \ldots, V_n\}$ is constructed;

information completion: for heterogeneous original scientific research information S with missing items, a vector matrix M is constructed by mapping the words in S to word vectors and stacking them, and a feature vector $V = G(M)$ is obtained; the Euclidean inner product of $V$ with each vector in Dict is computed in turn, the nearest $V_k$ is selected, the original data of that entry is looked up, and the missing content is completed from it;

In some implementations, the author information preprocessing module includes:

a one-hot vector construction unit, used for sparsely encoding the author information as a one-hot vector R, wherein the i-th element of the one-hot vector R of author i is 1 and the remaining elements are 0;

an author-related feature vector construction unit, used for constructing, for the scientific research content related to an author, an information matrix M by mapping words to word vectors, and then constructing the feature vector $V = G(M)$, where G is the generator of the generative adversarial network;

a collaborative filtering recommendation training set construction unit, used for evaluating the relevance of author u to scientific research content l and using it as the training label, thereby obtaining the training set of the collaborative filtering recommendation model.

In some embodiments, the collaborative filtering recommendation model used by the collaborative filtering recommendation model training module includes the following calculation:

$h_{i\text{-relu}} = \max(0, h_i)$

The objective function of the collaborative filtering recommendation model in the collaborative filtering recommendation model training module is as follows:

In some implementations, the collaborative filtering recommendation model training module generating the set of recommendations related to the author includes: for authors interested by users, k scientific research contents most relevant to the authors are selected through comparison of candidate scientific research contents, and then related information of the authors, the fields and the keywords of the scientific research contents is traced back through the original information, so that a recommendation set relevant to the authors is generated.

It should be understood that the heterogeneous scientific research information integration system for academic accurate recommendation provided in this embodiment may implement all the technical solutions in the foregoing method embodiments, and the functions of each functional module may be specifically implemented according to the methods in the foregoing method embodiments, and specific implementation processes that are not described in detail in this embodiment may refer to relevant descriptions in the foregoing embodiments, which are not repeated herein.

According to another embodiment of the present invention, there is provided a computer apparatus including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs when executed by the processors implement the steps in the method embodiments.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims

1. The heterogeneous scientific research information integration method for academic accurate recommendation is characterized by comprising the following steps of:

S3, using the generator of the generative adversarial network to convert heterogeneous academic information into academic information feature vectors, completing scientific research information that is heterogeneous or has missing items, and generating complete scientific research information feature vectors;

S4, encoding the author information into sparse one-hot vectors, and labeling the degree of correlation between the one-hot vectors of the author information and the scientific research information feature vectors;

S5, training a collaborative filtering recommendation model using the one-hot vectors of the author information, the scientific research information feature vectors, and the degree of correlation between them, and generating a recommendation set related to the author;

and S6, fusing the author-document correlation recommendation based on collaborative filtering with the document-document correlation recommendation based on the generative adversarial network to obtain the final recommendation result, thereby completing the accurate recommendation of academic documents.
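By way of non-limiting illustration, the encoding of author information into sparse one-hot vectors in step S4 may be sketched in Python as follows. The author identifiers and the helper names `build_vocabulary` and `one_hot` are hypothetical and are not taken from the original disclosure.

```python
def build_vocabulary(authors):
    """Assign each distinct author a fixed index, in order of first appearance."""
    vocabulary = {}
    for name in authors:
        vocabulary.setdefault(name, len(vocabulary))
    return vocabulary


def one_hot(author, vocabulary):
    """Return a vector with a single 1 at the author's index and 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[author]] = 1
    return vector


# Hypothetical author list with one repeated entry.
vocabulary = build_vocabulary(["author_a", "author_b", "author_a", "author_c"])
u = one_hot("author_b", vocabulary)
```

For a large author set, storing only the index `vocabulary[author]` is an equivalent and more compact representation of the same vector.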

2. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 1, wherein the step S1 includes: preprocessing the original information of each scientific research paper of a specific source, and extracting the following indexes to construct the original information:

3. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 2, wherein encoding the academic information feature vector and the preprocessed heterogeneous scientific research information in step S2 comprises: mapping each word in the raw training data into a word vector $V_i$; mapping all information of the original features in this way and stacking the results to obtain a preprocessed matrix $M_k = [\mathrm{Vector}_1, \mathrm{Vector}_2, \mathrm{Vector}_3, \mathrm{Vector}_4, \ldots, \mathrm{Vector}_n]$; correspondingly, each training datum has a label, and for one group of scientific research data, multiple groups of data in the training set are generated.
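By way of non-limiting illustration, the stacking of word vectors into the preprocessed matrix may be sketched in Python as follows. The embedding table, its two-dimensional vectors, and the function name `build_matrix` are hypothetical; mapping unknown words to a zero vector is an assumption of this sketch and is not stated in the original disclosure.

```python
def build_matrix(tokens, embeddings, dim):
    """Map every token to its word vector and stack the vectors row by row.
    Tokens absent from the table become zero vectors (sketch assumption)."""
    zero = [0.0] * dim
    return [list(embeddings.get(token, zero)) for token in tokens]


# Hypothetical word-vector table with two-dimensional vectors.
embeddings = {
    "graph": [0.1, 0.3],
    "neural": [0.7, 0.2],
    "network": [0.5, 0.5],
}

# Each row of M_k corresponds to one Vector_i of the claim.
M_k = build_matrix(["graph", "neural", "network", "survey"], embeddings, dim=2)
```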

4. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 3, wherein the academic coding mapping model in step S2 is a generative adversarial model trained by adversarial learning, in which the generator and the discriminator extract information through convolution layers, and each layer is calculated as follows:

where m represents the dimension of the output vector of the l-th layer network, and the remaining symbols represent, respectively, the i-th output parameter of the l-th layer network, the j-th output parameter of the l-th layer network, the parameter of the l-th layer network, and the bias of the l-th layer; f is the selected activation function;

$A = \arg\min_G \left( \arg\max_D \mathrm{Value}(G, D) \right)$

$\mathrm{Value}(G, D) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{data}(z)}[1 - \log D(G(z))]$

wherein G is the generator, D is the discriminator, E is the cross-entropy error, A is the optimization target, Value(G, D) is the back-propagation error calculated when the generator is G and the discriminator is D, and z is the hidden variable input to the generator; $E_{x \sim P_{data}(x)}$ indicates that the error is calculated using cross entropy when x obeys its original distribution $P_{data}(x)$; similarly, $E_{z \sim P_{data}(z)}$ indicates that the error is calculated using cross entropy when z obeys the distribution $P_{data}(z)$.
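By way of non-limiting illustration, a sample-based estimate of an adversarial value function may be sketched in Python as follows. The sketch uses the conventional form in which the second term is log(1 - D(G(z))), which is an assumption of this sketch rather than a literal reading of the bracketed expression in the claim. The discriminator outputs are hypothetical numbers in (0, 1), and the function name `gan_value` is likewise hypothetical.

```python
import math


def gan_value(d_real, d_fake):
    """Estimate mean(log D(x)) + mean(log(1 - D(G(z)))) from samples.
    d_real: discriminator outputs on real samples x.
    d_fake: discriminator outputs on generated samples G(z)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated samples well.
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# A discriminator that cannot tell them apart (outputs 0.5 everywhere).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator seeks to raise this value and the generator seeks to lower it, so `confident` is larger than `fooled`.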

5. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 1, wherein the step S3 includes: taking out the generator G of the generative adversarial network separately, selecting the currently complete information as a data dictionary, inputting that information into the generator G item by item, and constructing a vector dictionary set $\mathrm{Dict} = \{V_1, V_2, V_3, \ldots, V_n\}$;

information completion: for a piece of heterogeneous original scientific research information S with missing items, a vector matrix M is constructed by mapping the words in S into word vectors and stacking them, and its feature vector is obtained; the Euclidean inner product of this feature vector with each vector in Dict is computed in turn, the nearest $V_k$ is selected, the original data of that content is found, and the missing content is completed;
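By way of non-limiting illustration, the information completion step may be sketched in Python as follows. The feature vectors are hypothetical stand-ins for generator outputs, the record fields and the function name `complete` are hypothetical, and interpreting "nearest" as "largest inner product" is an assumption of this sketch.

```python
def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def complete(record, query_vec, dict_vecs, dict_records):
    """Find the dictionary vector V_k with the largest inner product with
    query_vec, then fill the record's missing (None) fields from entry k."""
    k = max(range(len(dict_vecs)), key=lambda i: inner(query_vec, dict_vecs[i]))
    donor = dict_records[k]
    filled = {
        field: (value if value is not None else donor.get(field))
        for field, value in record.items()
    }
    return k, filled


# Hypothetical vector dictionary and the complete records behind it.
dict_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
dict_records = [
    {"title": "T0", "field": "physics"},
    {"title": "T1", "field": "biology"},
    {"title": "T2", "field": "chemistry"},
]

incomplete = {"title": "Draft", "field": None}  # the field item is missing
k, filled = complete(incomplete, [0.5, 0.9], dict_vecs, dict_records)
```

A raw inner product favors vectors of large norm; normalizing the vectors first would turn the comparison into cosine similarity, which is a design choice outside what the claim states.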

6. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 1, wherein the step S4 includes:

for the scientific research content related to an author, an information matrix M is constructed by mapping words into word vectors and stacking them, and the feature vector is then constructed by passing M through G, wherein G is the generator of the generative adversarial network;

7. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 6, wherein the collaborative filtering recommendation model used in step S5 comprises the following calculation process:

$h_{i\text{-relu}} = \max(0, h_i)$

wherein the weight matrix and $b_i$ represent, respectively, the weight matrix and the linear offset of the i-th layer, $h_i$ represents the output of the i-th layer, and $h_{i\text{-relu}}$ represents the output of the i-th layer after passing through the ReLU layer, if a ReLU layer is present;
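By way of non-limiting illustration, one layer of such a model may be sketched in Python as follows. The affine form (weight matrix times input, plus offset) is an assumption of this sketch based on the symbol definitions above; the numeric weights and the function names `relu` and `layer` are hypothetical.

```python
def relu(h):
    """Element-wise max(0, h_i)."""
    return [max(0.0, x) for x in h]


def layer(weights, bias, inputs, use_relu=True):
    """Compute h_i as an affine map of the inputs, then apply ReLU if present."""
    h = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]
    return relu(h) if use_relu else h


weights = [[1.0, -2.0], [0.5, 0.5]]  # hypothetical weight matrix of the layer
bias = [0.0, 1.0]                    # hypothetical linear offset b_i
inputs = [1.0, 1.0]                  # output of the previous layer

h_linear = layer(weights, bias, inputs, use_relu=False)
h_relu = layer(weights, bias, inputs)
```

The negative first component of `h_linear` is clipped to zero in `h_relu`, while the positive second component passes through unchanged.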

8. The academic accurate recommendation-oriented heterogeneous scientific research information integration method according to claim 7, wherein generating the recommendation set related to the author in the step S5 includes: for an author in whom the user is interested, the k items of scientific research content most relevant to that author are selected by comparing the candidate scientific research content, and the related author, field, and keyword information of that content is then traced back through the original information, thereby generating the recommendation set related to the author.

9. A heterogeneous scientific research information integration system for academic accurate recommendation, characterized by comprising: