CN112732889A - Scholar retrieval method and device based on cooperative network

Scholar retrieval method and device based on cooperative network

Info

Publication number
CN112732889A
CN112732889A
Authority
CN
China
Prior art keywords
scholar
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011420372.1A
Other languages
Chinese (zh)
Inventor
张道枫
李微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202011420372.1A priority Critical patent/CN112732889A/en
Publication of CN112732889A publication Critical patent/CN112732889A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a scholar retrieval method based on a cooperative network, which comprises the following steps: step one: building a scholar cooperative network; step two: optimizing the scholar cooperative network structure; step three: training a scholar node representation model; step four: establishing a scholar vector index; step five: scholar retrieval. The scheme realizes a retrieval model based on word embedding; compared with the traditional keyword retrieval mode, the word-embedding-based retrieval model can make full use of semantic information and improve the recall rate of retrieval.

Description

Scholar retrieval method and device based on cooperative network
Technical Field
The invention relates to a retrieval method and a retrieval device, in particular to a scholar retrieval method based on a cooperative network, and belongs to the technical field of information retrieval.
Background
Innovation is the source of economic growth for modern countries and regions, and a central problem of regional innovation is talent introduction. Scholars in universities undertake scientific research projects and apply high-value scientific and technological achievements to production, which can promote industrial upgrading and improve competitiveness. How to accurately find scholars in massive information has become the key to talent introduction.
Scholar retrieval plays an important role in talent introduction. Scholars are experts in one or more leading-edge fields who master the latest scientific research trends, possess extensive interpersonal relationships, and can provide guidance for the research, development, and production of enterprises. By analyzing the domain characteristics of scholars' scientific research data and constructing a scholar retrieval model, governments and enterprises can locate scholars and complete talent introduction work.
Scholar retrieval has received extensive attention and intensive research from experts in various fields, and the research results have been successfully applied to systems such as AMiner and encyclopedia-style sites. However, existing systems and models only consider scholar attributes or paper information. The cooperation relationships among scholars also carry a large amount of information, and applying them to scholar retrieval can improve retrieval accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a scholar retrieval method based on a cooperative network. The technical scheme provides scholar retrieval based on word embedding: a word embedding model can learn semantic information, and adding network topology attributes to the model can improve the precision of the retrieval model. The scholar retrieval process includes scholar node representation and scholar vector retrieval. First, a scholar cooperative network is constructed, network topology information is added to a word embedding model, and a scholar node representation vector is obtained, consisting of a text vector and a node vector. The text vector is generated from the text of the current node and its neighboring nodes, and the node vector is generated from the neighboring nodes. Then the scholar vectors are retrieved using a product quantization model, and the relevant scholars are returned. Testing shows that, compared with traditional models, the word-embedding-based retrieval model can improve the recall ratio while maintaining the precision ratio.
In order to achieve the above object, the technical solution of the present invention is as follows: a scholar retrieval method based on a cooperative network, the method comprising the following steps (a pipeline sketch follows the list):
step one: building a scholar cooperative network;
step two: optimizing the scholar cooperative network structure;
step three: training a scholar node representation model;
step four: establishing a scholar vector index;
step five: scholar retrieval.
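The five steps can be organized as a single pipeline, sketched below in Python; every function and attribute name in this sketch (read_scholar_data, build_cooperation_network, and so on) is a hypothetical placeholder for the components detailed in the embodiments, not an API defined by the invention.

```python
# Hedged pipeline sketch of the five steps; all names are illustrative placeholders.
def scholar_retrieval_pipeline(db, query, top_n=10):
    scholars, outputs = read_scholar_data(db)             # step one: read data
    network = build_cooperation_network(outputs, len(scholars))
    network = optimize_network_structure(network)         # step two
    model = train_node_representation(network)            # step three
    index = build_vector_index(model.scholar_vectors)     # step four
    q = model.encode_text(query)                          # step five: retrieval
    return index.search(q, top_n)
```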
As an improvement of the invention, step one, building the scholar cooperative network, specifically comprises the following. Scholar data and scholar cooperative output data are read from a database, comprising: scholar data (ID, author, organization, age, job title, profile);
scholar cooperative output data (ID, title, author, organization, abstract, journal, year, keywords). After the data is read, it is preprocessed and the scholar cooperative network is constructed. The scholar cooperative network takes scholars as nodes and the quantity of cooperative outputs between scholars as edge weights. The construction process mainly uses a network toolkit: the cooperative outputs of the scholars, such as papers and patents, are input first; the adjacency matrix of scholar cooperation is initialized according to the number of scholars counted from the input; the participants of each paper or project are processed in a loop to update the adjacency matrix; and the adjacency matrix finally output is the scholar cooperative network.
As an improvement of the invention, step two, the optimization of the scholar cooperative network structure, is as follows:
Step 1) logarithmically normalize the weights of the edges.
Step 2) calculate the pairwise topological similarity, text similarity, and path distance between scholars.
Step 3) calculate the pairwise total similarity between scholars and sort by similarity.
Step 4) select the scholar pairs in the top 10 percent of similarity and add cooperation relationships between them.
Step 5) select the scholar pairs in the bottom 10 percent of similarity and delete the cooperation relationships between them.
Step 6) update the scholar cooperative network and its adjacency matrix.
The edge weights are normalized in step 1) to narrow the gap in the number of cooperations between different scholars: the more cooperations, the higher the similarity of the scholars and the smaller the distance between them should be. The calculation formula is shown in formula (1):
p_ij = 1/ln(p_ij + 1)   (1)
where p_ij is the weight of the edge.
Topology-based similarity represents the similarity between nodes in the graph in terms of topological structure: if two nodes have common neighbor nodes, they are more likely to be similar and to have a cooperation relationship. The calculation formula is shown in formula (2):
topoSim(u,v) = 2*|N(u)∩N(v)|/(d(u)+d(v))   (2)
where N(u) is the set of collaborators of scholar u, N(u)∩N(v) is the set of common collaborators of scholars u and v, d(u) is the degree of scholar u, and d(v) is the degree of scholar v.
Text similarity represents the similarity of papers between scholars: the more similar the papers published by two scholars, the more potential there is for a cooperation relationship. The calculation formula is shown in formula (3), the cosine similarity of the text vectors:
textSim(u,v) = (x_u · x_v)/(|x_u|*|x_v|)   (3)
where x_u is the text vector of scholar u; the text information is encoded with the BERT model.
The scholar similarity combines the topological structure of the network, node attribute information, and the path distance between scholars. The calculation formula is shown in formula (4):
authorSim(u,v) = textSim(u,v)*topoSim(u,v)/dist(u,v)   (4)
where authorSim(u,v) is the similarity between scholars u and v and dist(u,v) is the shortest path distance between them. When computing the shortest path between scholars u and v, if an edge already exists between the two scholar nodes, it should be deleted before the shortest path is computed.
The algorithm deletes the cooperation relationships between scholars with low similarity and adds cooperation relationships between scholars with high similarity, so the optimized cooperative network is closer to a real scholar community. Better results can be achieved in the tests of the scholar representation model and the scholar community discovery model.
As an improvement of the invention, step three, training the scholar node representation model, specifically comprises the following:
The CANE model fuses text context information into the network embedding vectors and improves the model's ability to represent nodes in the network. However, that model focuses on mining the network topology information of the nodes, and the text information is only used as a supplement to the network information. The CANE model is therefore suitable for network-related tasks such as community discovery and link prediction, but difficult to apply to information retrieval. In addition, CANE adopts a bag-of-words model and ignores word order information.
Using the BERT model and drawing on the idea of the CANE model, a scholar node representation model suitable for information retrieval is provided. Network topology information of the scholars is added to the original word embedding model to obtain scholar representation vectors that represent the scholars more accurately. Let the scholar cooperative network be G = (V, E), where each vertex represents a scholar and each edge e = (u, v) represents the relationship between scholar u and scholar v. The input of the model is {Tu, Tv, Su, Sv}, representing the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, respectively. The calculation flow of the model is as follows:
(1) encode the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, where the text information of scholars u and v is encoded with BERT and the initial encodings of the scholar nodes are randomly generated;
(2) send each encoding into a convolution layer to obtain the convolved matrices;
(3) concatenate the convolved matrices to obtain a matrix T;
(4) encode the matrix T with self-attention to obtain a matrix M; the Transformer can learn the implicit information between texts, between nodes, and between texts and nodes;
(5) finally, multiply the matrix M by the respective weights to output u_t, v_t, u_s and v_s, which represent the text vector of scholar u, the text vector of scholar v, the network structure vector of scholar u, and the network structure vector of scholar v, respectively.
Unlike the CANE model, in which text embedding and node embedding are learned separately, the node representation model implemented here can learn text information and network structure information at the same time through a self-attention layer, so the text and network structure information can be mapped into the same vector space. In the model training process, scholar pairs (u, v) are input and the model is updated. In addition, special data, namely empty node pairs, such as (scholar u, empty node) and (scholar v, empty node), are added to the training input so that a scholar can also be encoded independently.
Second, the loss function of the CANE model focuses more on the network information of the nodes than on the text information. The model implemented here places the emphasis of the optimization objective on the text information. The overall objective of the model is shown in formula (5):
ε = Σ_{e∈E} L(e)   (5)
The scholar vector is composed of a paper information vector and a network structure vector. L_s(e) and L_t(e) are the optimization objective based on the network structure and the optimization objective based on the text, respectively, combined as shown in formula (6):
L(e) = L_s(e) + L_t(e)   (6)
For the network structure objective L_s(e), neighboring nodes are assumed to have similar structures. Assuming there is an edge between scholar nodes u and v, with v_s the structure vector of node v, the specific formula is shown in formula (7):
L_s(e) = w_{u,v} * log p(v_s | u_s)   (7)
For the text objective L_t(e), the model takes scholar u and paper p as input to predict whether u wrote paper p, so that the node representation of a scholar becomes closer to its text representation; since the papers of the current scholar and those of the collaborators have a certain similarity, the specific formulas are shown in formulas (8)-(10):
L_p(e) = w_{u,v} * log p(v_t | v_s)   (8)
L_tt(e) = w_{u,v} * log p(v_t | u_t)   (9)
L_t(e) = α*L_tt(e) + β*L_p(e)   (10)
where v_t and u_t are the text representations of scholars v and u, respectively. The loss function implemented here pays more attention to the text information of the scholars than to the network information; in actual tests, the scholar node representations of this model achieve better results in retrieval.
As an improvement of the invention, step four, the establishment of the scholar vector index, is as follows:
Scholar retrieval is mainly realized through vector retrieval. The time complexity of computing the vector similarities by cosine values is O(n) = n*D, where n is the number of vectors and D is the dimension of the vectors; the larger n and D, the higher the complexity. Building a vector index compresses the vectors and accelerates retrieval.
(1) Scholar node vector indexing;
The scholar vectors are indexed with the product quantization method integrated in Milvus. The principle of the algorithm is to divide the original vector space into different subspaces, cluster each subspace, represent the original vector by the cluster centers of the subspaces, and obtain the similarity of the original vectors by computing the similarities within the different subspaces.
The main calculation processes of model training are vector clustering, distance calculation between cluster centers, and mapping of the original vectors to the cluster centers. Vector clustering adopts the Kmeans algorithm; according to the principle of Kmeans, the time complexity of clustering is shown in formula (11):
O(n) = l*n*k*d   (11)
where l is the number of iterations, n is the number of vectors, k is the number of cluster centers, and d is the dimension of the vectors.
From the vector distance calculation formula, the time complexity of computing the distances between cluster centers is O(n) = k*k*d, and the time complexity of mapping the original vectors to their cluster centers is O(n) = n*k*d. The training uses m subspaces in total, each of dimension d = D/m, where D is the dimension of the original vector. Therefore, the total complexity of training is shown in formula (12):
O(n) = m*(l*n*k*D/m) + m*(k*k*D/m) + m*(n*k*D/m)
     = D*k*(n*l + k + n)   (12)
As shown in formula (12), when the number of iterations l and the number of cluster centers k of the clustering model are constant, the time complexity simplifies to O(n) = D*n, so the time complexity of product quantization model training is only related to the vector dimension and the number of vectors. After training, the clustering models of the different subspaces are saved, and the distances between cluster centers are stored in a table for lookup during vector retrieval.
As an improvement of the invention, step five, scholar vector retrieval, is as follows:
After the scholar vector index is established, the user inputs a retrieval subject and the system returns the relevant scholars. The specific retrieval steps are:
(1) vectorize the user's retrieval subject through the scholar representation model and output a retrieval vector Q;
(2) read the scholar vector index and the trained cluster centers;
(3) divide the retrieval vector Q into m subspaces and find the cluster center corresponding to each subspace;
(4) look up, in the index table, the approximate distances between the retrieval vector Q and the indexed vectors in the m subspaces;
(5) add the approximate distances of all subspaces to obtain the approximate distance between the retrieval vector Q and each scholar to be retrieved;
(6) sort the approximate distances of all scholars and return the top n results.
From the above, the time complexity of product quantization retrieval is O(n) = n*m; when D is much larger than m, the time complexity of product quantization is much smaller than that of directly computing cosine values.
The device comprises a data storage module, a background management module, and a user retrieval module. The data storage module stores the data required by the system: scholar vector data for scholar retrieval, and unstructured data for displaying scholars, institutions, and outputs. According to the data characteristics, storage is divided into two components: a Milvus component that stores the vector indexes, and an ES component that stores the unstructured data. The background management module preprocesses the collected scholar and scientific research data, constructs the scholar cooperative network, trains the model, and builds the data index. The data preprocessing component reads new data from the database; its main work includes data cleaning and scholar cooperative network construction. The model training component trains the scholar node representation model and generates the scholar vectors, and the index component passes the scholar vectors through the product quantization model to generate the vector index. The user retrieval module provides scholar retrieval and information display: the retrieval component is responsible for similarity calculation and ranking, and the visualization component is responsible for information display and user interaction.
Compared with the prior art, the invention has the following advantages:
1) The scheme realizes a retrieval model based on word embedding; compared with the traditional keyword retrieval mode, it can make full use of semantic information and improve the recall rate of retrieval.
2) The scheme realizes a scholar retrieval model based on the cooperative network; a traditional text-based scholar retrieval model considers only text information and ignores the cooperation relationships between scholars, so the cooperative-network-based model achieves a better effect.
3) The scheme adopts a product quantization algorithm to build the vector index, which occupies less space and retrieves faster than traditional vector retrieval.
Drawings
FIG. 1 is a schematic diagram of the algorithm process;
FIG. 2 is a flow chart of scholar cooperative network construction;
FIG. 3 is a partial view of a scholar cooperative network;
FIG. 4 is a schematic diagram of the scholar node representation model;
FIG. 5 is a flow chart of vector index training;
FIG. 6 is a pseudocode diagram of the product quantization algorithm;
FIG. 7 is a pseudocode diagram of the product quantization similarity calculation algorithm;
FIG. 8 is a functional block diagram of the system;
FIG. 9 is a system implementation class diagram;
FIG. 10 is a comparison diagram of scholar node representation models;
FIG. 11 is a comparison of P-R curves of different query models;
FIG. 12 is a comparison diagram of the indexes of each model under query;
FIG. 13 is a comparison diagram of model retrieval times;
Detailed Description
For the purposes of promoting an understanding and appreciation of the invention, reference will now be made in detail to the present embodiments of the invention.
Example 1: A scholar retrieval method based on a cooperative network comprises the following steps:
Step one: build the scholar cooperative network.
Step two: optimize the scholar cooperative network structure.
Step three: train the scholar node representation model.
Step four: establish the scholar vector index.
Step five: scholar retrieval.
The schematic diagram of the algorithm process is shown in fig. 1.
Each step is described in detail below.
The method comprises the following steps: the student cooperative network is constructed as follows:
reading scholars data and scholars cooperative output data from a database, comprising:
scholars data (ID, author, organization, age, job title, profile);
scholars collaboratively producing data (ID, title, author, organization, abstract, journal, year, keyword); after the data is read, the data needs to be preprocessed, and a student cooperation network is constructed. The student cooperation network takes the students as nodes, and the quantity of cooperative output of the students is the weight of edges. The network tool kit is mainly used in the building process of the student cooperative network, and the building process is shown in fig. 2.
As can be seen from FIG. 2, the cooperative outcome of the learner, such as data of articles, patents, etc., is first entered. And initializing an adjacency matrix of student cooperation according to the input statistic student number. The participants in each paper or project are then processed in a loop to update the adjacency matrix. And finally outputting the adjacency matrix, namely the learner cooperative network. The effect of the student cooperative network construction is shown in fig. 3.
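A minimal Python sketch of this construction follows; the input format (one list of scholar IDs per paper, patent, or project) and the function name are assumptions made for illustration.

```python
import numpy as np

def build_cooperation_network(outputs, num_scholars):
    # outputs: list of cooperative outputs, each a list of scholar IDs
    # (integers in [0, num_scholars)) who co-produced it.
    adj = np.zeros((num_scholars, num_scholars), dtype=np.int32)
    for authors in outputs:             # loop over papers, patents, projects
        for i, u in enumerate(authors):
            for v in authors[i + 1:]:   # every unordered co-author pair
                adj[u, v] += 1          # edge weight = number of cooperations
                adj[v, u] += 1
    return adj                          # adjacency matrix = cooperative network

# Example: three outputs among four scholars.
print(build_cooperation_network([[0, 1], [0, 1, 2], [2, 3]], 4))
```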
Step two: the scholar cooperative network structure is optimized as follows:
The scholar cooperative network generated from the raw data suffers from missing data, noise, and similar problems, so the constructed network cannot reflect the real scholar community. The network structure of the original cooperative network is therefore optimized according to a node similarity algorithm: cooperation relationships are added between scholars with high similarity and deleted between scholars with low similarity. The algorithm combines the topological structure of the network, node attribute information, and the path distance between scholars. The specific process is as follows:
Step 1) logarithmically normalize the weights of the edges.
Step 2) calculate the pairwise topological similarity, text similarity, and path distance between scholars.
Step 3) calculate the pairwise total similarity between scholars and sort by similarity.
Step 4) select the scholar pairs in the top 10 percent of similarity and add cooperation relationships between them.
Step 5) select the scholar pairs in the bottom 10 percent of similarity and delete the cooperation relationships between them.
Step 6) update the scholar cooperative network and its adjacency matrix.
The edge weights are normalized in step 1) to narrow the gap in the number of cooperations between different scholars: the more cooperations, the higher the similarity of the scholars and the smaller the distance between them should be. The calculation formula is shown in formula (1):
p_ij = 1/ln(p_ij + 1)   (1)
where p_ij is the weight of the edge.
Topology-based similarity represents the similarity between nodes in the graph in terms of topological structure. If two nodes have common neighbor nodes, they are more likely to be similar and to have a cooperation relationship. The calculation formula is shown in formula (2):
topoSim(u,v) = 2*|N(u)∩N(v)|/(d(u)+d(v))   (2)
where N(u) is the set of collaborators of scholar u, N(u)∩N(v) is the set of common collaborators of scholars u and v, d(u) is the degree of scholar u, and d(v) is the degree of scholar v.
Text similarity represents the similarity of papers between scholars: the more similar the papers published by two scholars, the more potential there is for a cooperation relationship. The calculation formula is shown in formula (3), the cosine similarity of the text vectors:
textSim(u,v) = (x_u · x_v)/(|x_u|*|x_v|)   (3)
where x_u is the text vector of scholar u; the text information is encoded with the BERT model.
The scholar similarity combines the topological structure of the network, node attribute information, and the path distance between scholars. The calculation formula is shown in formula (4):
authorSim(u,v) = textSim(u,v)*topoSim(u,v)/dist(u,v)   (4)
where authorSim(u,v) is the similarity between scholars u and v and dist(u,v) is the shortest path distance between them. When computing the shortest path between scholars u and v, if an edge already exists between the two scholar nodes, it should be deleted before the shortest path is computed.
The algorithm deletes the cooperation relationships between scholars with low similarity and adds cooperation relationships between scholars with high similarity, so the optimized cooperative network is closer to a real scholar community. Better results can be achieved in the tests of the scholar representation model and the scholar community discovery model.
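The following Python sketch implements formulas (1)-(4) under the stated assumptions (in particular, that formula (3) is the cosine similarity of the BERT text vectors); the networkx graph interface and the function names are illustrative choices, and the handling of an existing direct edge follows the deletion rule above.

```python
import numpy as np
import networkx as nx

def normalize_weight(p):
    # Formula (1): log-normalized edge weight.
    return 1.0 / np.log(p + 1)

def topo_sim(G, u, v):
    # Formula (2): common-neighbour similarity.
    nu, nv = set(G[u]), set(G[v])
    return 2 * len(nu & nv) / (G.degree(u) + G.degree(v))

def text_sim(x_u, x_v):
    # Formula (3), assumed to be cosine similarity of BERT text vectors.
    return float(x_u @ x_v / (np.linalg.norm(x_u) * np.linalg.norm(x_v)))

def author_sim(G, u, v, x):
    # Formula (4); an existing direct edge is removed before the
    # shortest-path distance is computed, as required above.
    H = G.copy()
    if H.has_edge(u, v):
        H.remove_edge(u, v)
    try:
        dist = nx.shortest_path_length(H, u, v)
    except nx.NetworkXNoPath:
        return 0.0                      # unreachable pair: no similarity
    return text_sim(x[u], x[v]) * topo_sim(G, u, v) / dist
```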
Step three: the scholar node representation model is trained as follows:
The CANE model fuses text context information into the network embedding vectors and improves the model's ability to represent nodes in the network. However, that model focuses on mining the network topology information of the nodes, and the text information is only used as a supplement to the network information. The CANE model is therefore suitable for network-related tasks such as community discovery and link prediction, but difficult to apply to information retrieval. In addition, CANE adopts a bag-of-words model and ignores word order information.
Using the BERT model and drawing on the idea of the CANE model, a scholar node representation model suitable for information retrieval is provided. Network topology information of the scholars is added to the original word embedding model to obtain scholar representation vectors that represent the scholars more accurately. Assume a scholar cooperative network G = (V, E), where each vertex represents a scholar and each edge e = (u, v) represents the relationship between scholar u and scholar v. The model is shown in FIG. 4; its input is {Tu, Tv, Su, Sv}, representing the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, respectively. The calculation flow of the model is as follows:
(1) Encode the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, where the text information of scholars u and v is encoded with BERT and the initial encodings of the scholar nodes are randomly generated.
(2) Send each encoding into a convolution layer to obtain the convolved matrices.
(3) Concatenate the convolved matrices to obtain a matrix T.
(4) Encode the matrix T with self-attention to obtain a matrix M; the Transformer can learn the implicit information between texts, between nodes, and between texts and nodes.
(5) Finally, multiply the matrix M by the respective weights to output u_t, v_t, u_s and v_s, which represent the text vector of scholar u, the text vector of scholar v, the network structure vector of scholar u, and the network structure vector of scholar v, respectively.
Unlike the CANE model, in which text embedding and node embedding are learned separately, the node representation model implemented here can learn text information and network structure information at the same time through a self-attention layer, so the text and network structure information can be mapped into the same vector space. In the model training process, scholar pairs (u, v) are input and the model is updated. In addition, special data, namely empty node pairs, such as (scholar u, empty node) and (scholar v, empty node), are added to the training input so that a scholar can also be encoded independently.
Second, the loss function of the CANE model focuses more on the network information of the nodes than on the text information. The model implemented here places the emphasis of the optimization objective on the text information. The overall objective of the model is shown in formula (5):
ε = Σ_{e∈E} L(e)   (5)
The scholar vector is composed of a paper information vector and a network structure vector. L_s(e) and L_t(e) are the optimization objective based on the network structure and the optimization objective based on the text, respectively, combined as shown in formula (6):
L(e) = L_s(e) + L_t(e)   (6)
For the network structure objective L_s(e), neighboring nodes are assumed to have similar structures. Assuming there is an edge between scholar nodes u and v, with v_s the structure vector of node v, the specific formula is shown in formula (7):
L_s(e) = w_{u,v} * log p(v_s | u_s)   (7)
For the text objective L_t(e), the model takes scholar u and paper p as input to predict whether u wrote paper p. The objective is to make the node representation of a scholar closer to its text representation; since the papers of the current scholar and those of the collaborators have a certain similarity, the specific formulas are shown in formulas (8)-(10):
L_p(e) = w_{u,v} * log p(v_t | v_s)   (8)
L_tt(e) = w_{u,v} * log p(v_t | u_t)   (9)
L_t(e) = α*L_tt(e) + β*L_p(e)   (10)
where v_t and u_t are the text representations of scholars v and u, respectively. The loss function implemented here pays more attention to the text information of the scholars than to the network information; in actual tests, the scholar node representations of this model achieve better results in retrieval.
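A condensed PyTorch sketch of the forward pass (1)-(5) follows; the layer sizes, the use of nn.MultiheadAttention for the self-attention step, and the pooling of M into the four output vectors are illustrative assumptions, and the BERT token encodings are taken as precomputed inputs.

```python
import torch
import torch.nn as nn

class ScholarNodeModel(nn.Module):
    # Sketch of steps (1)-(5): encode, convolve, concatenate into T,
    # self-attend into M, project to (u_t, v_t, u_s, v_s).
    def __init__(self, num_scholars, dim=128):
        super().__init__()
        self.node_emb = nn.Embedding(num_scholars, dim)  # random initial node codes
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)                   # the output weights

    def forward(self, t_u, t_v, s_u, s_v):
        # t_u, t_v: BERT encodings (batch, seq, dim); s_u, s_v: node IDs (batch,)
        conv = lambda t: self.conv(t.transpose(1, 2)).transpose(1, 2)  # step (2)
        T = torch.cat([conv(t_u), conv(t_v),
                       self.node_emb(s_u).unsqueeze(1),
                       self.node_emb(s_v).unsqueeze(1)], dim=1)        # step (3)
        M, _ = self.attn(T, T, T)                                      # step (4)
        M = self.out(M)                                                # step (5)
        n_u, n_v = t_u.size(1), t_v.size(1)
        u_t = M[:, :n_u].mean(dim=1)             # scholar u text vector
        v_t = M[:, n_u:n_u + n_v].mean(dim=1)    # scholar v text vector
        u_s, v_s = M[:, -2], M[:, -1]            # network structure vectors
        return u_t, v_t, u_s, v_s
```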
Step four: the scholar vector index is established as follows:
Scholar retrieval is mainly realized through vector retrieval. The time complexity of computing the vector similarities by cosine values is O(n) = n*D, where n is the number of vectors and D is the dimension of the vectors; the larger n and D, the higher the complexity. Building a vector index compresses the vectors and accelerates retrieval.
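For reference, the O(n*D) brute-force cosine baseline that the index is meant to replace can be written in a few lines of numpy (a sketch):

```python
import numpy as np

def brute_force_search(q, vectors, top_n=10):
    # O(n*D): cosine similarity of query q (D,) against all n vectors (n, D).
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:top_n]   # indices of the top_n most similar scholars
```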
(1) Scholar node vector indexing
The scholar vectors are indexed with the product quantization method integrated in Milvus. The principle of the algorithm is to divide the original vector space into different subspaces, cluster each subspace, represent the original vector by the cluster centers of the subspaces, and obtain the similarity of the original vectors by computing the similarities within the different subspaces.
The vector index training process is shown in FIG. 5. The main calculation processes of model training are vector clustering, distance calculation between cluster centers, and mapping of the original vectors to the cluster centers. Vector clustering adopts the Kmeans algorithm; according to the principle of Kmeans, the time complexity of clustering is shown in formula (11):
O(n) = l*n*k*d   (11)
where l is the number of iterations, n is the number of vectors, k is the number of cluster centers, and d is the dimension of the vectors.
From the vector distance calculation formula, the time complexity of computing the distances between cluster centers is O(n) = k*k*d, and the time complexity of mapping the original vectors to their cluster centers is O(n) = n*k*d. The training uses m subspaces in total, each of dimension d = D/m, where D is the dimension of the original vector. The total complexity of training is therefore shown in formula (12):
O(n) = m*(l*n*k*D/m) + m*(k*k*D/m) + m*(n*k*D/m)
     = D*k*(n*l + k + n)   (12)
As shown in formula (12), when the number of iterations l and the number of cluster centers k of the clustering model are constant, the time complexity simplifies to O(n) = D*n, so the time complexity of product quantization model training is only related to the vector dimension and the number of vectors. After training, the clustering models of the different subspaces are saved, and the distances between cluster centers are stored in a table for lookup during vector retrieval.
Algorithm 1: the pseudocode of the product quantization algorithm is shown in FIG. 6.
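Since FIG. 6 is not reproduced here, the following numpy/scikit-learn sketch follows the training description above: the D-dimensional vectors are split into m subspaces, Kmeans is run in each, and the center-to-center distance tables are precomputed. The parameter values and the use of scikit-learn's KMeans in place of the Milvus-integrated implementation are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(vectors, m=8, k=256):
    # vectors: (n, D) scholar vectors; D is assumed divisible by m.
    n, D = vectors.shape
    d = D // m                          # each subspace has dimension d = D/m
    models, codes, tables = [], [], []
    for i in range(m):
        sub = vectors[:, i * d:(i + 1) * d]
        km = KMeans(n_clusters=k, n_init=4).fit(sub)   # cluster the subspace
        models.append(km)
        codes.append(km.labels_)        # map each vector to its cluster center
        c = km.cluster_centers_
        # k x k table of squared distances between cluster centers
        tables.append(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1))
    return models, np.stack(codes, axis=1), tables     # codes: (n, m)
```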
Step five: scholar vector retrieval is performed as follows:
After the scholar vector index is established, the user inputs a retrieval subject and the system returns the relevant scholars. The specific retrieval steps are:
(1) Vectorize the user's retrieval subject through the scholar representation model and output a retrieval vector Q.
(2) Read the scholar vector index and the trained cluster centers.
(3) Divide the retrieval vector Q into m subspaces and find the cluster center corresponding to each subspace.
(4) Look up, in the index table, the approximate distances between the retrieval vector Q and the indexed vectors in the m subspaces.
(5) The sum of the approximate distances of all subspaces is the approximate distance between the retrieval vector Q and each scholar to be retrieved.
(6) Sort the approximate distances of all scholars and return the top n results.
From the above, the time complexity of product quantization retrieval is O(n) = n*m; when D is much larger than m, the time complexity of product quantization is much smaller than that of directly computing cosine values.
Algorithm 2: the pseudocode of the product quantization similarity calculation algorithm is shown in FIG. 7.
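Likewise, in place of the FIG. 7 pseudocode, the following sketch implements retrieval steps (1)-(6) on top of the train_pq output above; the use of the symmetric (center-to-center) table-lookup distance is an assumption about the variant intended.

```python
import numpy as np

def pq_search(q, models, codes, tables, top_n=10):
    # Quantize query q per subspace, accumulate table-lookup distances
    # to every indexed scholar, sort, and return the top n.
    m = len(models)
    d = q.shape[0] // m
    q_codes = [models[i].predict(q[i * d:(i + 1) * d].reshape(1, -1))[0]
               for i in range(m)]                      # step (3)
    dist = np.zeros(codes.shape[0])
    for i in range(m):                                 # steps (4)-(5)
        dist += tables[i][q_codes[i], codes[:, i]]
    return np.argsort(dist)[:top_n]                    # step (6)
```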
Example 2: Referring to FIG. 8, a scholar retrieval device based on a cooperative network is shown. The overall functional structure of the system can be divided, by function, into a data storage module, a background management module, and a user retrieval module. This section explains the functional design of the three modules.
1. Data storage module. It stores the data required by the system: scholar vector data for scholar retrieval, and unstructured data for displaying scholars, institutions, and outputs. According to the data characteristics, storage is divided into two components: a Milvus component that stores the vector indexes, and an ES component that stores the unstructured data.
2. Background management module. It preprocesses the collected scholar and scientific research data, constructs the scholar cooperative network, trains the model, and builds the data index. The data preprocessing component reads new data from the database; its main work includes data cleaning and scholar cooperative network construction. The model training component trains the scholar node representation model and generates the scholar vectors. The index component passes the scholar vectors through the product quantization model to generate the vector index.
3. User retrieval module. Its main functions are scholar retrieval and information display. The retrieval component is responsible for similarity calculation and ranking, and the visualization component is responsible for information display and user interaction.
The specific functions of each class are described below:
The View class parses data, renders it into visual pages, and presents it to the user.
The Controller class receives user request parameters, parses them, calls the relevant services in the system, and returns the results to the View class.
The User class encapsulates user data and controls user login, registration, and permissions.
The TextPreprocess class performs data preprocessing, such as data consistency checks and scholar cooperative network construction.
The Model class is mainly responsible for training and saving the model.
The Index class handles the versioning and loading of the model.
The Timer class handles the system's timed tasks, such as model training and updating.
The Connect class is responsible for linking to and reading data, e.g. connections to databases such as MySQL and ES.
The Index class is also responsible for loading the model and the index.
The Retrieval class is the core of retrieval; it is mainly responsible for similarity calculation and result ranking.
Experiment design and result analysis:
In order to demonstrate the advancement of the retrieval method based on the scholar cooperative network, comparison experiments were designed to compare the experimental effects.
Test environment:
1. Hardware environment
CPU model: Intel(R) Core(TM) i5-6500 CPU
Memory capacity: 16.0GB
Hard disk information: 240GB SSD
2. Software environment
Operating system: Microsoft Windows 10
Databases: Milvus, MySQL, ES
Development tool: PyCharm (Python IDE)
Programming language: Python 3.6
Browser: Chrome 75.0.3770.100
3. Test tools and data
Test tools: pytest, BurnTest, JMeter, etc.
Cora dataset: 2277 nodes, 2277 texts, 5214 edges
Dataset collected by the system: 6,000 scholars, 161 universities, and 80,000 scholarly outputs (treatises, patents, projects, etc.)
System algorithm tests:
The system algorithm tests aim to verify the advancement and feasibility of the algorithms. The main algorithms tested are the scholar node representation model and the word-embedding-based scholar retrieval model.
(1) Scholar node representation model test
The scholar node vector representation model test verifies whether the model represents scholars more accurately. The dataset for this test is the dataset collected by the system. In a word embedding model, words with similar meanings are distributed relatively close together in space; therefore, the quality of the model can be judged from the spatial distribution of the scholars in the vector space. The silhouette coefficient is one such evaluation method; it combines the two factors of cohesion and separation to evaluate the model. The silhouette coefficient of a vector i in a cluster is calculated as shown in formula (11):
S(i) = (b(i) - a(i))/max{a(i), b(i)}   (11)
where a(i) is the average distance from vector i to the other points in the same cluster, and b(i) is the minimum average distance from vector i to the points of any other cluster. The silhouette coefficient lies in [-1, 1]; a value close to 1 means that both cohesion and separation are relatively good. The average of the silhouette coefficients of all vectors is the evaluation index of the model.
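The average silhouette coefficient of formula (11) over all vectors can be computed directly with scikit-learn, as sketched below; the random stand-in data and the use of discipline labels as cluster assignments are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

vectors = np.random.rand(100, 64)            # stand-in scholar vectors (n, dim)
labels = np.random.randint(0, 5, size=100)   # stand-in cluster (discipline) labels
# Mean S(i) over all vectors, in [-1, 1]; higher is better.
print(silhouette_score(vectors, labels))
```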
In comparison experiments with other models, Table 1 shows that the scholar node representation model implemented here obtains a higher silhouette coefficient at different vector dimensions, indicating that its clustering effect is better than that of the other models.
TABLE 1. Silhouette coefficient comparison
The vector space is reduced to two dimensions by the TSNE algorithm so that the model effect can be observed more intuitively. The specific effect is shown in FIG. 10: scholars of different disciplines are distributed in clusters in the vector space, forming a clustering effect automatically. Compared with the BERT model, under the same topic the scholars gather better, scholars of different disciplines overlap less, and the discrimination is larger. The model implemented here adds node information on the basis of word embedding, and scholars in the same discipline have more cooperation relationships; therefore scholars in the same discipline are spatially closer in the vector representation, and the clustering effect is better.
(2) Scholar retrieval model based on word embedding;
The word-embedding-based scholar retrieval model test verifies whether the model can guarantee retrieval precision and speed compared with the traditional retrieval mode. The test data is the dataset collected by the system. In a single query test, results can be classified into four categories: retrieved and relevant (RR), retrieved and not relevant (RN), not retrieved but relevant (NR), and not retrieved and not relevant (NN). The precision ratio and recall ratio of a retrieval result are then defined as shown in formulas (12) and (13):
precision = RR/(RR + RN)   (12)
recall = RR/(RR + NR)   (13)
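A minimal sketch of formulas (12) and (13) for a single query, with the retrieved and relevant results given as ID sets:

```python
def precision_recall(retrieved, relevant):
    # RR = retrieved and relevant; formulas (12) and (13).
    rr = len(retrieved & relevant)
    precision = rr / len(retrieved) if retrieved else 0.0   # RR / (RR + RN)
    recall = rr / len(relevant) if relevant else 0.0        # RR / (RR + NR)
    return precision, recall

print(precision_recall({1, 2, 3, 4}, {2, 4, 5}))  # -> (0.5, 0.666...)
```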
On the basis of the given evaluation indexes, comparison experiments between the word-embedding-based scholar retrieval model and the traditional retrieval model were designed. Neither precision nor recall alone reflects retrieval performance, while a PR curve integrates precision and recall to evaluate the model and represents retrieval performance more accurately. The test results are shown in FIG. 11. The experimental results show that, at almost equal precision ratios, the word-embedding-based retrieval mode is superior or equal to the traditional retrieval mode under different conditions, and it maintains a higher precision ratio as the recall ratio increases. Overall, the upper curve in the PR graph represents the superior system.
P@5, P@10, and P@20 were calculated for each query result, where P@n is the precision of the first n returned results and R-pre is the recall; the experimental results are shown in FIG. 12. The results show that the word-embedding-based retrieval mode guarantees high accuracy for different numbers of retrieval results, and its recall ratio is superior to the other models.
The performance of the different models was verified at different data scales, and the test results are shown in FIG. 13. Although the retrieval time of the word-embedding-based retrieval model is slightly higher than that of the other models, it can still meet users' requirements.
Comparison of the three experiments shows that the word-embedding-based retrieval mode meets the precision requirement while improving retrieval recall, at the cost of slightly slower retrieval. The traditional retrieval model adopts keyword matching with a bag-of-words model and therefore cannot retrieve data that is related but worded differently. The word-embedding-based scholar retrieval model makes full use of semantic information, can retrieve related data, and improves the recall ratio of the retrieval model.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; all equivalent substitutions or replacements made on the basis of the above technical solutions fall within the scope of the present invention.

Claims (7)

1. A scholar retrieval method based on a cooperative network, the method comprising the following steps:
step one: building a scholar cooperative network;
step two: optimizing the scholar cooperative network structure;
step three: training a scholar node representation model;
step four: establishing a scholar vector index;
step five: scholar retrieval.
2. The scholar retrieval method based on a cooperative network as claimed in claim 1, wherein step one, building the scholar cooperative network, specifically comprises: reading scholar data and scholar cooperative output data from a database, the scholar data comprising ID, author, organization, age, job title, and profile;
the scholar cooperative output data comprising ID, title, author, organization, abstract, journal, year, and keywords; after the data is read, it is preprocessed and the scholar cooperative network is constructed; the scholar cooperative network takes scholars as nodes and the quantity of cooperative outputs between scholars as edge weights; the construction process mainly uses a network toolkit: the cooperative outputs of the scholars are input first, the adjacency matrix of scholar cooperation is initialized according to the number of scholars counted from the input, the participants of each paper or project are then processed in a loop to update the adjacency matrix, and the adjacency matrix finally output is the scholar cooperative network.
3. The scholar retrieval method based on a cooperative network as claimed in claim 1, wherein step two, the optimization of the scholar cooperative network structure, is as follows:
step 1) logarithmically normalizing the weights of the edges,
step 2) calculating the pairwise topological similarity, text similarity, and path distance between scholars,
step 3) calculating the pairwise total similarity between scholars and sorting by similarity,
step 4) selecting the scholar pairs in the top 10 percent of similarity and adding cooperation relationships between them,
step 5) selecting the scholar pairs in the bottom 10 percent of similarity and deleting the cooperation relationships between them,
step 6) updating the scholar cooperative network and its adjacency matrix,
wherein the edge weights are normalized in step 1) to narrow the gap in the number of cooperations between different scholars: the more cooperations, the higher the similarity of the scholars and the smaller the distance between them should be; the calculation formula is shown in formula (1):
p_ij = 1/ln(p_ij + 1)   (1)
where p_ij is the weight of the edge;
the topology-based similarity represents the similarity between nodes in the graph in terms of topological structure: if two nodes have common neighbor nodes, they are more likely to be similar and to have a cooperation relationship; the calculation formula is shown in formula (2):
topoSim(u,v) = 2*|N(u)∩N(v)|/(d(u)+d(v))   (2)
where N(u) is the set of collaborators of scholar u, N(u)∩N(v) is the set of common collaborators of scholars u and v, d(u) is the degree of scholar u, and d(v) is the degree of scholar v;
the text similarity represents the similarity of papers between scholars: the more similar the papers published by two scholars, the more potential there is for a cooperation relationship; the calculation formula is shown in formula (3), the cosine similarity of the text vectors:
textSim(u,v) = (x_u · x_v)/(|x_u|*|x_v|)   (3)
where x_u is the text vector of scholar u, the text information being encoded with the BERT model;
the scholar similarity combines the topological structure of the network, node attribute information, and the path distance between scholars; the calculation formula is shown in formula (4):
authorSim(u,v) = textSim(u,v)*topoSim(u,v)/dist(u,v)   (4)
where authorSim(u,v) is the similarity between scholars u and v and dist(u,v) is the shortest path distance between them; when computing the shortest path between scholars u and v, if an edge already exists between the two scholar nodes, it should be deleted before the shortest path is computed.
4. The scholar retrieval method based on a cooperative network as claimed in claim 1, wherein step three, training the scholar node representation model, specifically comprises:
adding network topology information of the scholars to the original word embedding model to obtain scholar representation vectors that represent the scholars more accurately; letting the scholar cooperative network be G = (V, E), where each vertex represents a scholar and each edge e = (u, v) represents the relationship between scholar u and scholar v; the input of the model being {Tu, Tv, Su, Sv}, representing the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, respectively; the calculation flow of the model being as follows:
(1) encode the text information of scholar u, the text information of scholar v, the node of scholar u, and the node of scholar v, where the text information of scholars u and v is encoded with BERT and the initial encodings of the scholar nodes are randomly generated;
(2) send each encoding into a convolution layer to obtain the convolved matrices;
(3) concatenate the convolved matrices to obtain a matrix T;
(4) encode the matrix T with self-attention to obtain a matrix M; the Transformer can learn the implicit information between texts, between nodes, and between texts and nodes;
(5) finally, multiply the matrix M by the respective weights to output u_t, v_t, u_s and v_s, which represent the text vector of scholar u, the text vector of scholar v, the network structure vector of scholar u, and the network structure vector of scholar v, respectively;
the model places the emphasis of the optimization objective on the text information; the optimization objective of the model is shown in formula (5):
ε = Σ_{e∈E} L(e)   (5)
the scholar vector is composed of a paper information vector and a network structure vector; L_s(e) and L_t(e) are the optimization objective based on the network structure and the optimization objective based on the text, respectively, combined as shown in formula (6):
L(e) = L_s(e) + L_t(e)   (6)
for the network structure objective L_s(e), neighboring nodes have similar structures; assuming there is an edge between scholar nodes u and v, with v_s the structure vector of node v, the specific formula is shown in formula (7):
L_s(e) = w_{u,v} * log p(v_s | u_s)   (7)
for the text objective L_t(e), the model takes scholar u and paper p as input to predict whether u wrote paper p; the specific formulas are shown in formulas (8)-(10):
L_p(e) = w_{u,v} * log p(v_t | v_s)   (8)
L_tt(e) = w_{u,v} * log p(v_t | u_t)   (9)
L_t(e) = α*L_tt(e) + β*L_p(e)   (10)
where v_t and u_t are the text representations of scholars v and u, respectively.
5. The scholar retrieval method based on a cooperative network as claimed in claim 1, wherein step four, the establishment of the scholar vector index, is as follows:
scholar retrieval is mainly realized through vector retrieval; the time complexity of computing the vector similarities by cosine values is O(n) = n*D, where n is the number of vectors and D is the dimension of the vectors, and the larger n and D, the higher the complexity; building a vector index compresses the vectors and accelerates retrieval;
(1) scholar node vector indexing;
the scholar vectors are indexed with the product quantization method integrated in Milvus; the principle of the algorithm is to divide the original vector space into different subspaces, cluster each subspace, represent the original vector by the cluster centers of the subspaces, and obtain the similarity of the original vectors by computing the similarities within the different subspaces;
the main calculation processes of model training are vector clustering, distance calculation between cluster centers, and mapping of the original vectors to the cluster centers; vector clustering adopts the Kmeans algorithm, and according to the principle of Kmeans the time complexity of clustering is shown in formula (11):
O(n) = l*n*k*d   (11)
where l is the number of iterations, n is the number of vectors, k is the number of cluster centers, and d is the dimension of the vectors;
from the vector distance calculation formula, the time complexity of computing the distances between cluster centers is O(n) = k*k*d, and the time complexity of mapping the original vectors to their cluster centers is O(n) = n*k*d; the training uses m subspaces in total, each of dimension d = D/m, where D is the dimension of the original vector; therefore, the total complexity of training is shown in formula (12):
O(n) = m*(l*n*k*D/m) + m*(k*k*D/m) + m*(n*k*D/m)
     = D*k*(n*l + k + n)   (12)
as shown in formula (12), when the number of iterations l and the number of cluster centers k of the clustering model are constant, the time complexity simplifies to O(n) = D*n, so the time complexity of product quantization model training is only related to the vector dimension and the number of vectors; after training, the clustering models of the different subspaces are saved, and the distances between cluster centers are stored in a table for lookup during vector retrieval.
6. The learner retrieval method based on a cooperative network as claimed in claim 1, wherein in step five the learner vector retrieval is performed as follows:
after the learner vector index has been established, the user inputs a retrieval topic and the system returns the relevant learners; the specific retrieval steps are as follows:
(1) the user's retrieval topic is vectorized by the learner representation model, outputting a retrieval vector Q;
(2) the learner vector index and the trained cluster centers are loaded;
(3) the retrieval vector Q is divided into m subspaces, and the cluster center corresponding to each subspace is found;
(4) the approximate distances between the retrieval vector Q and the indexed learner vectors in the m subspaces are obtained by looking up the precomputed distance table;
(5) the approximate distances of all subspaces are summed to obtain the approximate distance between the retrieval vector Q and each learner to be retrieved;
(6) the approximate distances of all learners are sorted and the top n results are returned;
from the above, product quantization retrieval costs roughly $O(n \cdot m)$ table look-ups per query (plus the cost of quantizing the query itself), so when D is much larger than m the time complexity of product quantization is much smaller than the direct computation of cosine values, as the sketch below illustrates.
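Continuing the training sketch above (and reusing its codebooks, codes and tables), the table-lookup search of steps (1) to (6) can be sketched as follows; the symmetric-distance variant shown, in which the query is itself quantized per subspace, follows the steps as written, and all names are illustrative:

import numpy as np

def pq_search(query, codebooks, codes, tables, topn=10):
    m = len(codebooks)
    d = codebooks[0].shape[1]
    dist = np.zeros(codes.shape[0], dtype=np.float32)
    for i in range(m):
        sub_q = query[i*d:(i+1)*d]
        # step (3): nearest cluster center of subspace i for the query
        q_code = ((codebooks[i] - sub_q) ** 2).sum(axis=1).argmin()
        # steps (4)-(5): one O(1) table look-up per indexed vector
        dist += tables[i][q_code, codes[:, i]]
    return np.argsort(dist)[:topn]               # step (6): top-n learners

query = np.random.rand(64).astype(np.float32)
print(pq_search(query, codebooks, codes, tables, topn=5))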
7. A device for realizing the learner retrieval method based on a cooperative network according to any one of claims 1 to 6, characterized in that the device comprises a data storage module, a background management module and a user retrieval module;
the data storage module stores the data required by the system, namely the learner vector data used for learner retrieval and the unstructured data used to display learners, institutions and research achievements; according to the data characteristics it is divided into two parts: a Milvus component that stores the vector index, and an ES component that stores the unstructured data;
the background management module preprocesses the collected learner and scientific-research data, constructs the learner cooperation network, trains the model and builds the data index; the data preprocessing component reads new data from the database, with data cleaning and construction of the learner cooperation network as its main work; the model training component trains the learner node representation model and generates the learner vectors; the indexing component passes the learner vectors through the product quantization model to generate the vector index;
the main functions of the user retrieval module are learner retrieval and information display; the retrieval component is responsible for similarity calculation and ranking, and the visualization component is responsible for information display and user interaction.
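A schematic wiring of the three modules might look as follows; every class and method name is a hypothetical placeholder standing in for the Milvus/ES-backed components, not the patent's actual interfaces:

from dataclasses import dataclass, field

@dataclass
class DataStorage:
    vector_index: dict = field(default_factory=dict)   # stands in for Milvus
    documents: dict = field(default_factory=dict)      # stands in for ES

@dataclass
class BackgroundManager:
    store: DataStorage
    def refresh(self, raw_records):
        cleaned = [r for r in raw_records if r.get("title")]  # data cleaning
        # ... build the cooperation network, train the representation model,
        # run product quantization, then publish the refreshed index:
        self.store.vector_index = {r["id"]: r["vector"] for r in cleaned}
        self.store.documents = {r["id"]: r for r in cleaned}

@dataclass
class UserSearch:
    store: DataStorage
    def query(self, q_vector, topn=10):
        # similarity calculation and ranking (a brute-force distance scan
        # stands in here for the product-quantization index)
        ranked = sorted(self.store.vector_index.items(),
                        key=lambda kv: sum((a - b) ** 2
                                           for a, b in zip(kv[1], q_vector)))
        return [self.store.documents[i] for i, _ in ranked[:topn]]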
CN202011420372.1A 2020-12-07 2020-12-07 Student retrieval method and device based on cooperative network Pending CN112732889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011420372.1A CN112732889A (en) 2020-12-07 2020-12-07 Student retrieval method and device based on cooperative network

Publications (1)

Publication Number Publication Date
CN112732889A true CN112732889A (en) 2021-04-30

Family

ID=75598317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011420372.1A Pending CN112732889A (en) 2020-12-07 2020-12-07 Student retrieval method and device based on cooperative network

Country Status (1)

Country Link
CN (1) CN112732889A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110295903A1 (en) * 2010-05-28 2011-12-01 Drexel University System and method for automatically generating systematic reviews of a scientific field
WO2017210949A1 (en) * 2016-06-06 2017-12-14 北京大学深圳研究生院 Cross-media retrieval method
CN110717043A (en) * 2019-09-29 2020-01-21 三螺旋大数据科技(昆山)有限公司 Academic team construction method based on network representation learning training
CN111078873A (en) * 2019-11-22 2020-04-28 北京市科学技术情报研究所 Domain expert selection method based on citation network and scientific research cooperation network
CN110929044A (en) * 2019-12-03 2020-03-27 山西大学 Community detection method and device for academic cooperation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S RAO CHINTALAPUDI et al.: "A survey on community detection algorithms in large scale real world networks", 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 4 May 2015 (2015-05-04), pages 1323 - 1327 *
ZHANG Yujie et al.: "Research on group recommendation systems and their applications", Chinese Journal of Computers, vol. 39, no. 4, 30 April 2016 (2016-04-30), pages 745 - 764 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420328A (en) * 2021-06-23 2021-09-21 鹤壁国立光电科技股份有限公司 Big data batch sharing exchange system
CN113420328B (en) * 2021-06-23 2023-04-28 鹤壁国立光电科技股份有限公司 Big data batch sharing exchange system

Similar Documents

Publication Publication Date Title
CN109271506A (en) A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning
CN109885672A (en) A kind of question and answer mode intelligent retrieval system and method towards online education
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN111444348A (en) Method, system and medium for constructing and applying knowledge graph architecture
CN110737805B (en) Method and device for processing graph model data and terminal equipment
CN114238653B (en) Method for constructing programming education knowledge graph, completing and intelligently asking and answering
CN115526590B (en) Efficient person post matching and re-pushing method combining expert knowledge and algorithm
CN115982338B (en) Domain knowledge graph question-answering method and system based on query path sorting
CN116975256B (en) Method and system for processing multisource information in construction process of underground factory building of pumped storage power station
CN117743315B (en) Method for providing high-quality data for multi-mode large model system
CN112131261A (en) Community query method and device based on community network and computer equipment
CN113988071A (en) Intelligent dialogue method and device based on financial knowledge graph and electronic equipment
CN114329181A (en) Question recommendation method and device and electronic equipment
CN113066358B (en) Science teaching auxiliary system
CN114511085A (en) Entity attribute value identification method, apparatus, device, medium, and program product
CN112396092B (en) Crowdsourcing developer recommendation method and device
CN117875412A (en) Method for constructing computer education knowledge graph based on knowledge graph
CN118035440A (en) Enterprise associated archive management target knowledge feature recommendation method
CN112732889A (en) Student retrieval method and device based on cooperative network
CN117391497A (en) News manuscript quality subjective and objective scoring consistency evaluation method and system
CN117494760A (en) Semantic tag-rich data augmentation method based on ultra-large-scale language model
CN115660695A (en) Customer service personnel label portrait construction method and device, electronic equipment and storage medium
CN112200474A (en) Teaching quality evaluation method, terminal device and computer readable storage medium
Yu et al. The application of data mining technology in employment analysis of university graduates
CN117993876B (en) Resume evaluation system, method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination