CN117314266B - Novel intelligent scientific and technological talent evaluation method based on hypergraph attention mechanism

Info

Publication number
CN117314266B
Authority
CN
China
Prior art keywords: talent, data, talents, scientific, hypergraph
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311623569.9A
Other languages
Chinese (zh)
Other versions
CN117314266A (en
Inventor
邹赛
周予
刘耀徽
陈镜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University
Original Assignee
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University
Priority to CN202311623569.9A
Publication of CN117314266A
Application granted
Publication of CN117314266B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Abstract

The invention relates to the technical field of scientific and technological talent evaluation and discloses a novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism, comprising the following steps: S1, constructing a scientific and technological talent knowledge hypergraph; S2, introducing an attention mechanism to design a talent classification evaluation network model; S3, constructing an intelligent talent recommendation model. The method can automatically fill gaps in talent information data and supports dynamic evaluation and accurate recommendation of talents across multiple dimensions, spaces, and angles, with results that can be displayed and compared. It thereby improves the scientific rigor, professionalism, and objectivity of talent evaluation and supports employers in the intelligent management and use of talent data.

Description

Novel intelligent scientific and technological talent evaluation method based on hypergraph attention mechanism
Technical Field
The invention relates to the technical field of scientific and technological talent evaluation, and in particular to a novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism.
Background
Talent evaluation is an important part of building the basic talent system and deepening innovation, and it matters for cultivating high-level talents, producing high-quality research results, and creating a good environment for innovation. However, building a scientific and effective talent evaluation system requires solving three major problems:
(1) Talent data is fragmented, and a dynamic, diversified database of scientific and technological talents is lacking
Current talent information is fragmented and is not held in a diversified, dynamic database, so talent evaluation lacks scientific rigor, objectivity, authenticity, and systematic coverage. A comprehensive, multi-temporal-spatial, multi-dimensional dynamic database that spans a wide range of categories and captures the complex hierarchical structure of talents therefore needs to be established through multi-source data analysis and core talent evaluation indicators.
(2) Evaluation data is incomplete, and the indicators of the evaluation system are difficult to quantify
Traditional talent evaluation takes a single form and lacks clear direction. It cannot reflect the multi-objective, dynamic, traceable, job-fit, and displayable aspects of talent evaluation, so the evaluation results are inconsistent with the actual situation of the person evaluated.
(3) Evaluation results are one-dimensional, and accurate recommendation of talents is difficult
Current evaluation results are produced by simple data statistics, the evaluation data supports only limited comparison, and the results mainly show basic indicators such as evaluation scores, grades, and participation rates. Without multi-dimensional interactive analysis of the results, personalized and accurate recommendations cannot be provided to employers.
Disclosure of Invention
The invention aims to provide a novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism, which solves the problems set forth in the Background section.
To achieve the above purpose, the invention provides a novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism, comprising the following steps:
S1, constructing a scientific and technological talent knowledge hypergraph: extracting data from multi-source talent big data to form a talent database, preprocessing the data in the talent database, and then designing the structure and model of the knowledge graph;
S2, introducing an attention mechanism to design a talent classification evaluation network model: on the basis of the knowledge hypergraph constructed in S1, introducing an attention mechanism for learning and identification, and establishing the talent classification evaluation network model;
S3, constructing an intelligent talent recommendation model: on the basis of the classification evaluation network model of S2, constructing the intelligent recommendation model using a knowledge-graph-based collaborative filtering recommendation algorithm.
Preferably, step S1 comprises the following steps:
S101, acquiring scientific and technological talent big data;
S102, preprocessing the scientific and technological talent big data;
S103, determining the basic hypergraph structure based on the entities, relations, and attributes in the talent data.
Preferably, step S101 includes the following steps:
S1011, determining the data input sources: the sources include talent networks, academic institutions, research centers, and professional social networks, and talent information is obtained with respect to the innovation value, capability, and contribution of the talents;
S1012, determining the acquisition technologies: Kafka is used to acquire streaming data, parallel crawlers are used to collect unstructured web data, and Sqoop is used to extract structured data from traditional databases;
S1013, determining the big data storage technologies: the big data is stored in the HDFS distributed file system and the HBase column-oriented database.
Preferably, step S102 includes the steps of:
S1021, data cleaning: filling or deleting missing values to ensure data integrity; for numerical features, missing values are filled with the mean, median, or mode; for time series or otherwise ordered data, missing values are filled with the previous or the next observation; if the data are correlated, missing values are estimated by interpolation; rows containing missing values are deleted when missing values are few or unimportant for the analysis task; a feature is deleted entirely if most of its values are missing; and if several consecutive data points are missing, the consecutive data segment is deleted;
S1022, multi-modal data transformation: creating new features or transforming existing ones; new features are created from domain knowledge to capture latent patterns in the data, the most relevant features are selected, feature values are scaled to similar ranges, and finally categorical variables are converted to numbers or one-hot codes.
Preferably, step S103 includes the steps of:
S1031, analyzing the multiple attributes of scientific and technological talents in research, including their academic level, their reputation and visibility in academia, their work experience and professional history in research projects, the research papers they have published in academic journals and conferences, their patent applications and technical innovations, the influence of their research results on society and industry, and their ability to collaborate in multidisciplinary and cross-domain teams;
S1032, on the basis of the extracted multi-attribute talent information, extracting and integrating entities, relations, and attributes using NER, NLTK, Stanford NLP, and GATE: identifying the entities in the database, establishing the relations among the entities, extracting the important attribute information, and constructing the scientific and technological talent information;
S1033, following the principles of knowledge graph construction, abstracting the multidimensional attributes of the talent information into nodes of the knowledge graph, connecting the nodes according to the evaluation mechanisms of different periods, and associating the evaluation results of different periods as hyperedges; the scientific and technological talent hypergraph is defined as $G$, composed of a node set $V$, an edge set $E$, and a hyperedge set $\mathcal{E}$, so that the hypergraph $G$ is expressed as:
$$G = (V, E, \mathcal{E})$$
wherein each edge $e \in E$ is assigned a weight $w(e)$ according to the attribute bias of the evaluation mechanisms of different periods, and each hyperedge $\varepsilon \in \mathcal{E}$ is assigned a weight $w(\varepsilon)$ according to the bias of the evaluation results of different periods; $w(e)$ and $w(\varepsilon)$ represent the importance of the connection relationship in the whole hypergraph; after the weights are introduced, the hypergraph $G$ is expressed as:
$$G = (V, E, \mathcal{E}, W_E, W_{\mathcal{E}})$$
$W_E$ and $W_{\mathcal{E}}$ are the diagonal weight matrices of the edges and hyperedges respectively,
$$W_E = \mathrm{diag}\big(w(e_1), \dots, w(e_{|E|})\big)$$
$$W_{\mathcal{E}} = \mathrm{diag}\big(w(\varepsilon_1), \dots, w(\varepsilon_{|\mathcal{E}|})\big)$$
wherein $|\cdot|$ denotes the number of elements in a set;
the structure of the hypergraph is described by an incidence matrix $H$:
$$H(v, \varepsilon) = \begin{cases} 1, & v \in \varepsilon \\ 0, & v \notin \varepsilon \end{cases}$$
Preferably, step S2 includes:
S201, introducing graph embedding: using a graph convolutional neural network and one-hot encoding, converting each node of the hypergraph and its associated information into vector form, and expressing the resulting vectors as a matrix that serves as the network input;
S202, feeding the input vector matrix into a trainable embedding layer, mapping each element of the input data into a high-dimensional vector space, and using CTransR to learn embeddings of the nodes and hyperedges, with the scoring function defined as:
$$f_r(h, t) = \lVert h_{r,c} + r_c - t_{r,c} \rVert_2^2 + \alpha \lVert r_c - r \rVert_2^2$$
wherein $h_{r,c}$ and $t_{r,c}$ are the entity embeddings learned under the corresponding hyperedge: $h_{r,c}$ is the head-entity embedding vector of the triplet for the specific relation $r$ under the specific entity pair $(h, t)$, and $t_{r,c}$ is the corresponding tail-entity embedding vector; $r_c$ is the hyperedge relation embedding vector learned for the entity-pair cluster $c$; $\lVert h_{r,c} + r_c - t_{r,c} \rVert_2^2$ and $\lVert r_c - r \rVert_2^2$ are the first and second norm terms respectively; the second norm term constrains the similarity between the cluster relation vector $r_c$ and the original relation vector $r$; and $\alpha$ adjusts the influence of this constraint on the scoring function;
S203, performing a convolution operation on the output of the embedding layer with a 1D convolution kernel to further extract feature information;
S204, applying the attention mechanism to the output of step S203, with the feature weights computed using cosine similarity; in the $l$-th layer, the weighted embedding vector of the $i$-th node is calculated from the hyperedge embedding vectors $e_i^{l}$ obtained from the embedding layer:
$$z_i^{l} = W^{l} e_i^{l}$$
where $W^{l}$ is the hyperedge weight of the $l$-th layer of the attention graph neural network and $e_i^{l}$ is the hyperedge embedding vector of the $i$-th node in the $l$-th layer;
from the weighted embedding vector of the $i$-th node, the cosine similarity with the $j$-th node in its first-order neighbor set $N_i$ is calculated:
$$s_{ij}^{l} = \frac{z_i^{l} \cdot z_j^{l}}{\lVert z_i^{l} \rVert \, \lVert z_j^{l} \rVert}$$
where $z_j^{l}$ is the weighted embedding vector of the $j$-th node in the $l$-th layer;
the attention coefficient of each first-order neighbor node is then obtained:
$$a_{ij}^{l} = \frac{\exp(s_{ij}^{l})}{\sum_{k \in N_i} \exp(s_{ik}^{l})}$$
where $s_{ik}^{l}$ is the cosine similarity between the $i$-th node and its $k$-th neighbor node in the $l$-th layer;
the embedding of node $i$ in the $(l+1)$-th layer is obtained as:
$$h_i^{l+1} = \sigma\Big( \sum_{j \in N_i} a_{ij}^{l} z_j^{l} \Big)$$
wherein $\sigma$ is the sigmoid function and $h_i^{l+1}$ is the embedding of node $i$ in the $(l+1)$-th layer;
S205, reducing the dimensionality of the attention layer output with max pooling and extracting representative feature vectors;
S206, feeding the pooling layer output into a fully connected layer and mapping the extracted feature vectors to the target space with a LeakyReLU activation function to obtain the classifier output;
S207, measuring the error between the prediction and the sample label with the cross-entropy loss function for model optimization and training; for a training sample $x$ with true probability distribution $p(x)$ and predicted probability distribution $q(x)$, the loss function is:
$$L = -\sum_{c} p_c(x) \log q_c(x)$$
wherein $c$ is the sample class;
S208, updating the weight parameters of the network with an optimizer to minimize the loss function, and training the network continuously on the training set until the best performance is reached;
S209, determining the root-cause hyperparameters that affect model classification using fishbone analysis: when the model's talent classification accuracy is insufficient, the output attributes are judged by fishbone analysis to locate problems in the parameter settings and to derive hyperparameter-setting decisions that improve the overall classification performance of the model; the threshold for trunk-class attributes is set to 0.6 and the threshold for branch-class attributes to 0.8, and data exceeding its threshold is included in the root cause,
$$F(v) = \begin{cases} 1, & v(R) > 0.6 \ \text{or}\ v(B) > 0.8 \\ 0, & \text{otherwise} \end{cases}$$
wherein $F$ denotes the fishbone analysis, $R$ a root-cause attribute, $B$ a branch attribute, and $v$ the attribute value; if the value of a root-cause attribute exceeds the threshold of 0.6, the attribute is marked $F = 1$, a root attribute; if the value is below 0.6, it is marked $F = 0$, a non-root attribute. This judgment identifies the root hyperparameters that influence the classification capability of the model; they are taken as the adjustment targets of the network model, and the relevant network parameters are changed accordingly to achieve optimal classification evaluation.
Preferably, step S3 includes:
S301, obtaining the feature vectors of the different scientific and technological talents from the talent features in the knowledge hypergraph and constructing a talent matrix $T$, and at the same time constructing a post demand matrix $D$ (the construction formulas of $T$ and $D$ and the definitions of their entries are not reproduced in this text);
S302, after the talent matrix $T$ and the employer demand matrix $D$ are obtained, calculating the similarity between the talent matrix and the demand matrix with the Jaccard similarity coefficient:
$$J(T, D) = \frac{|T \cap D|}{|T \cup D|}$$
S303, using the K-nearest-neighbor algorithm, selecting the K talents with the highest similarity to the post requirements to form the nearest-neighbor set U;
S304, using the Top-N method, recommending the first N talents from the nearest-neighbor set U to the target post, thereby realizing intelligent talent recommendation.
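Steps S302 to S304 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each talent and each post is reduced to a set of binary features (the entries of the talent and demand matrices are not given in this text), and the talent names, feature labels, and the values of `k` and `n` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(post_requirements, talents, k=3, n=2):
    """Rank talents by similarity to the post, keep the K nearest, return the top N."""
    scored = sorted(
        ((jaccard(features, post_requirements), name)
         for name, features in talents.items()),
        key=lambda pair: (-pair[0], pair[1]),  # highest similarity first, name breaks ties
    )
    nearest_neighbors = scored[:k]             # nearest-neighbor set U (S303)
    return [name for _, name in nearest_neighbors[:n]]  # Top-N recommendation (S304)

# Hypothetical talents and post requirements.
talents = {
    "A": {"ml", "nlp", "patents"},
    "B": {"ml", "graphs"},
    "C": {"chemistry"},
    "D": {"ml", "nlp"},
}
post = {"ml", "nlp"}
recommended = recommend(post, talents, k=3, n=2)
```

With this data, talent D matches the post exactly and A shares two of three features, so they are the two names returned.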
Therefore, the novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism has the following beneficial effects:
(1) Through multi-source data analysis and core talent evaluation indicators, the invention establishes a comprehensive, multi-temporal-spatial, multi-dimensional dynamic database that spans a wide range of categories and captures the complex hierarchical structure of talents;
(2) The invention uses hypergraph technology to extract the hidden multi-element, multi-dimensional relationships of talents, selects key indicators using an attention mechanism, and constructs an intelligent talent classification evaluation model, so that the evaluation data is more comprehensive and the indicators of the evaluation system are quantified;
(3) On the basis of the hypergraph-attention-based intelligent evaluation, the invention uses the Jaccard similarity coefficient to measure the similarity between talents and posts and, by means of a collaborative filtering recommendation algorithm, obtains the talent information that best matches the needs of institutions and employers, thereby achieving accurate recommendation.
The technical scheme of the invention is described in further detail below with reference to the drawings and embodiments.
Drawings
FIG. 1 is a flow chart of a method for intelligent evaluation of talents based on hypergraph attention mechanisms.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which this invention belongs. The terms "first," "second," and the like do not denote any order, quantity, or importance, but are used only to distinguish one element from another. The word "comprising" or "comprises" means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "disposed," "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or a communication between two elements. "Upper," "lower," "left," "right," and the like indicate only relative positional relationships, which may change when the absolute position of the described object changes.
Examples
As shown in FIG. 1, the novel intelligent evaluation method for scientific and technological talents based on a hypergraph attention mechanism comprises the following steps:
S1: Construct a scientific and technological talent knowledge hypergraph. First, to address the scattered, fragmented, incomplete, and unsystematic state of current talent information, crawler, Kafka, and Sqoop technologies are used to extract the multi-source talent big data generated by platforms such as talent networks, academic institutions, and research centers, forming a talent database. Second, the data in the talent database is cleaned and preprocessed using Hive. Finally, the structure and model of the knowledge graph are designed on the basis of the preprocessed data.
Step S1 comprises the following steps:
S101: Scientific and technological talent big data acquisition
S1011: Determine the data input sources. The sources include talent networks, academic institutions, research centers, and professional social networks, and talent information is obtained with respect to the innovation value, capability, contribution, and other aspects of the talents.
S1012: Determine the acquisition technologies. Streaming data is acquired with Kafka; unstructured web data is collected with large-scale parallel crawlers; structured data in traditional databases is extracted with Sqoop.
S1013: Determine the big data storage technologies. The big data is stored in the HDFS distributed file system and the HBase column-oriented database.
S102: Scientific and technological talent big data preprocessing
S1021: Data cleaning: fill or delete missing values to ensure data integrity. For numerical features, missing values are filled with the mean, median, or mode; for time series or otherwise ordered data, missing values are filled with the previous or the next observation; if the data are correlated, missing values are estimated by interpolation (such as linear, polynomial, or spline interpolation). Rows containing missing values are deleted when missing values are few or unimportant for the analysis task; if most of the values of a feature are missing, the whole feature is deleted; if several consecutive data points are missing, the consecutive data segment is deleted.
S1022: the multi-modal data is transformed to create new features or to transform existing features to improve the performance of the analysis. Creating new features according to domain knowledge to capture potential patterns of data; secondly, selecting the most relevant features to reduce the dimension and improve the model efficiency; scaling the feature values to a similar range to prevent some features from affecting the model too much; finally, the classified variables are converted into numbers or single-heat codes so as to be convenient for model processing.
S103: determining hypergraph basic structure based on entity, relationship and attribute in scientific and technological talent data
S1031: multiple attributes of scientific talents in scientific research (personal characteristics, academic background, working experience, professional fields and the like) are analyzed. The system comprises the academic level of the talents of science and technology, reputation and awareness in academia, working experience and professional history in scientific research projects, application and technical innovation of research papers and patents published in academia journals and conferences, influence degree of research achievements on society and industry, and cooperation capability of the talents of science and technology in multi-disciplines and cross-field teams.
S1032: according to the extracted talent multiple attribute information, the extraction and integration of entities, relations and attributes are carried out by using NER, NLTK, stanford NLP, GATE and other technologies, the entities (such as names, mechanism names, academic fields and the like) in a database are accurately identified, the relations (such as cooperation relations, teacher relations and the like) among the entities are established, and important attribute information (such as research achievements, academic backgrounds, project experiences and the like) is extracted, so that high-quality and highly-structured scientific talent information is constructed.
S1033: and (3) abstracting the multidimensional attribute of the scientific and technological talent information into nodes in the knowledge graph by combining a knowledge graph construction principle, connecting the nodes according to evaluation mechanisms in different periods, and associating evaluation results in different periods as superedges. Defining a science and technology talent hypergraph asBy node set->And (2) side set->And hyperedge set->Composition, thus hypergraph->Can be expressed as:
(1)
wherein each edgeAssigning weights to each edge according to evaluation mechanism attribute bias of different periodsEach overrun->According to the bias of the evaluation results in different periods, weight is distributed to each superside。/>And->For indicating the importance of the connection relationship throughout the hypergraph. Hypergraph after weight introduction ++>Can be expressed as:
(2)
and->Weight diagonal matrix respectively representing edges and superedges, namely:
(3)
(4)
wherein the method comprises the steps ofRepresenting the number of elements in the collection.
The structure of hypergraph can use an incidence matrixDescription of:
(5)
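A small sketch of the hypergraph data structure follows. It assumes the conventional 0/1 incidence matrix (one row per node, one column per hyperedge) and a diagonal hyperedge weight matrix; the node names, hyperedges, and weights are hypothetical examples, not values from the patent.

```python
def incidence_matrix(nodes, hyperedges):
    """H[i][j] = 1 if node i belongs to hyperedge j, otherwise 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in nodes]

def diagonal(weights):
    """Diagonal weight matrix with the given weights on the main diagonal."""
    n = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical attribute nodes; each hyperedge groups the attributes that
# one evaluation period's result relates to each other.
nodes = ["academic_level", "reputation", "papers", "patents"]
hyperedges = [
    {"academic_level", "papers"},
    {"reputation", "papers", "patents"},
]
hyperedge_weights = [0.4, 0.6]

H = incidence_matrix(nodes, hyperedges)
W_h = diagonal(hyperedge_weights)
```

Unlike an ordinary edge, a hyperedge column in `H` may contain more than two ones, which is what lets one evaluation result connect several attributes at once.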
s2: the attention-introducing mechanism designs a scientific talent classification evaluation network model. The scientific and technological talents have rich and diversified characteristics and capabilities, the subjectivity of traditional scientific and technological talents is relatively strong, once indexes are not easy to adjust and index weight distribution is determined according to multiple experiences, talent evaluation results obtained by using the index system are often inconsistent with the real conditions of evaluation objects. In order to comprehensively and accurately evaluate the technology talents, based on the technology talent knowledge hypergraph constructed by the S1, a attention mechanism is introduced to automatically learn and identify the relative importance of different features in talent evaluation, key features are given higher weight, and finally a dynamic multi-target technology talent classification evaluation model is established.
The step S2 comprises the following steps:
s201: the graph embedding technology is introduced, a graph convolutional neural network (Graph Convolutional Network, GCN) and one-hot coding are used, each node and associated information in the hypergraph are converted into vector forms, and the obtained vectors are expressed in a matrix form and are used as network inputs.
S202: embedding an input vector matrix into a trainable embedding layer, mapping each element in input data into a high-dimensional vector space, and enhancing the expression capability; and (3) performing embedded learning on the nodes and the supersides by using CTransR, and defining a scoring function to obtain reasonable embedded vectors:
(6)
wherein the method comprises the steps ofAnd->For embedded entities learned under corresponding supersides, assume the superside relationship represented by the entity clusters in the same group ++>Has similar characteristics, and the relationship expressed in different groups is +.>There may be a large difference; thus, for each group of entity clusters +.>Learning out-of-limit embedding alone>。/>To be in specific entity pair->The following is about a specific relationship>Is embedded with a vector of triplet head entity,>to be in specific entity pair->The following is about a specific relationship>Is embedded with a vector by the triplet tail entity,>for entity pair->The learned superedges embed the relationship vectors. />And->Representing a first norm and a second norm, respectively. />Relation vector for constraint clustering>Is +.>The similarity between the clusters can ensure that the same relationship expressed by different clusters still has a certain degree of similarity. />For adjusting the influence of the constraints on the scoring function.
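The scoring function can be illustrated numerically. The sketch below assumes the squared-L2 form used in the published CTransR model, a translation error plus an α-weighted penalty that keeps the cluster relation vector close to the original relation vector; the vectors and the value of `alpha` are hypothetical, and a lower score means a more plausible triplet.

```python
def ctransr_score(h_rc, r_c, t_rc, r, alpha=0.1):
    """Score a triplet: ||h_rc + r_c - t_rc||^2 + alpha * ||r_c - r||^2."""
    translation_error = sum((h + rc - t) ** 2 for h, rc, t in zip(h_rc, r_c, t_rc))
    cluster_constraint = sum((rc - ro) ** 2 for rc, ro in zip(r_c, r))
    return translation_error + alpha * cluster_constraint

# Hypothetical 2-dimensional embeddings.
head = [1.0, 0.0]
tail = [1.0, 1.0]
cluster_relation = [0.0, 1.0]    # r_c: relation vector of one entity-pair cluster
original_relation = [0.0, 0.0]   # r: relation vector shared by all clusters

perfect_fit = ctransr_score(head, cluster_relation, tail, cluster_relation)
penalized = ctransr_score(head, cluster_relation, tail, original_relation, alpha=0.5)
```

Here `head + cluster_relation` equals `tail`, so the translation error is zero and only the constraint term contributes to `penalized`.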
S203: performing convolution operation on the output of the embedded layer by using a 1D convolution check, and further extracting characteristic information;
s204: performing attention mechanism operation on the S203 output, and calculating the characteristic weight by using cosine similarity to improve the evaluation accuracy; in the first placeIn the layer, the embedding vector of each superside obtained according to the embedding layer +.>Calculate +.>Weighted embedding vector of individual nodes:
(7)
representing the +.f in the attention mechanism diagram neural network>The hyperedge weight of the layer,/>Indicate->Layer->Embedding vectors by the superedges of the individual nodes;
according to the firstWeighted embedding vector calculation of individual nodesNode set->Middle->Cosine similarity of individual nodes:
(8)
is->Layer->Weighting the embedded vectors by the individual nodes;
further obtaining the attention coefficient of the first-order neighbor node:
(9)
is->Layer->Personal node and->Cosine similarity among neighboring nodes;
thereby can be obtainedNode in layer->Is embedded in, namely:
(10)
wherein the method comprises the steps ofFor sigmoid function, +.>Is->Node in layer->Is embedded in the memory; .
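One attention step for a single node can be sketched as follows. The sketch takes already-weighted embedding vectors as input, scores each first-order neighbor by cosine similarity, and applies a sigmoid to the attention-weighted sum. Normalizing the similarities with a softmax is an assumption borrowed from standard graph attention networks, and the vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Normalize scores into positive coefficients that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_update(node_vec, neighbor_vecs):
    """Return (attention coefficients, next-layer embedding) for one node."""
    similarities = [cosine(node_vec, n) for n in neighbor_vecs]
    coeffs = softmax(similarities)
    aggregated = [
        sum(c * n[d] for c, n in zip(coeffs, neighbor_vecs))
        for d in range(len(node_vec))
    ]
    return coeffs, [sigmoid(a) for a in aggregated]

# Hypothetical weighted embeddings: the first neighbor points the same way
# as the node, the second is orthogonal to it.
coeffs, new_embedding = attention_update([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The neighbor that is more similar to the node receives the larger coefficient and so contributes more to the updated embedding.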
S205: Perform dimensionality reduction on the output of the attention layer using max pooling, extracting the most representative feature vectors;
S206: Feed the output of the pooling layer into the fully connected layer and map the extracted feature vectors to the target space using the LeakyReLU activation function to obtain the classifier output;
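Steps S205 and S206 can be pictured with the following small Python sketch. The window size, the negative slope of 0.01, and the weight values are illustrative assumptions; the patent does not state them.

```python
def max_pool_1d(values, size):
    """Non-overlapping max pooling over a 1D sequence."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def leaky_relu(x, slope=0.01):
    """LeakyReLU: identity for positive inputs, small slope for negative ones."""
    return x if x > 0 else slope * x


def dense_leaky_relu(features, weights, biases, slope=0.01):
    """Fully connected layer followed by LeakyReLU.

    weights holds one row of input weights per output unit.
    """
    return [
        leaky_relu(sum(w * f for w, f in zip(row, features)) + b, slope)
        for row, b in zip(weights, biases)
    ]


# Hypothetical attention-layer output, pooled with a window of 2.
pooled = max_pool_1d([0.2, 0.9, -0.4, 0.1, 0.7, 0.3], size=2)
# Two output units with hypothetical weights.
outputs = dense_leaky_relu(
    pooled, weights=[[1.0, 0.0, 1.0], [-1.0, 0.0, 0.0]], biases=[0.0, 0.0]
)
```

Max pooling keeps only the strongest response in each window, and LeakyReLU keeps a small gradient for negative pre-activations instead of zeroing them.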
S207: The error between the prediction and the sample label is measured with the cross-entropy loss function, which is used for model optimization and training. For a training sample $x_i$ with true probability distribution $p(x_i)$ and predicted probability distribution $q(x_i)$, the loss function is:
$$H(p, q) = -\sum_{i=1}^{n} p(x_i)\,\log q(x_i) \tag{11}$$
where $n$ is the number of sample classes;
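As a worked illustration of the cross-entropy loss in S207, the following Python sketch computes the loss for one sample. The clipping constant `eps` and the example distributions are assumptions added for numerical safety and illustration.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between a true distribution p and a prediction q.

    Predicted probabilities are clipped at eps to avoid log(0).
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


# One-hot label for class 1 and a hypothetical classifier output.
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])
```

With a one-hot label, the loss reduces to the negative logarithm of the probability assigned to the correct class, so a more confident correct prediction gives a smaller loss.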
S208: Update the weight parameters of the network with an optimizer so as to minimize the loss function, and train the network iteratively on the training set until the best performance is reached.
S209: Determine the root-cause hyperparameters that affect model classification using fishbone analysis. When the model's talent classification accuracy is insufficient, fishbone analysis is applied to the output attributes to identify which parameter settings are at fault, yielding a hyperparameter-setting decision that improves the overall classification performance of the model. The backbone-class attribute threshold is set to 0.6 and the branch-class attribute threshold to 0.8; an attribute whose value exceeds its threshold is included as a root cause, namely:
(12)
where the symbols in formula (12) denote, respectively, the fishbone analysis method, the root-cause attribute, the branch attribute, and the attribute value. If the value of a root-cause attribute exceeds the threshold 0.6, the attribute is marked as a determined root-cause attribute; if it is smaller than 0.6, it is marked as a non-root-cause attribute. This judgment clearly identifies the root-cause hyperparameters that affect the classification ability of the model. They are taken as the main adjustment target of the network model, and the related parameters in the network are changed accordingly to achieve optimal classification and evaluation.
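The threshold rule of S209 can be sketched as a simple filter. Only the two thresholds (0.6 and 0.8) come from the text; the hyperparameter names, the attribute values, and the way each value would be measured are hypothetical placeholders.

```python
BACKBONE_THRESHOLD = 0.6  # backbone-class attribute threshold from the text
BRANCH_THRESHOLD = 0.8    # branch-class attribute threshold from the text


def is_root_cause(kind, value):
    """Return True if an attribute value exceeds the threshold for its class."""
    threshold = BACKBONE_THRESHOLD if kind == "backbone" else BRANCH_THRESHOLD
    return value > threshold


def select_root_causes(attributes):
    """Keep the names of attributes judged to be root causes.

    attributes: iterable of (name, kind, value), kind is "backbone" or "branch".
    """
    return [name for name, kind, value in attributes if is_root_cause(kind, value)]


# Hypothetical attribute values for four hyperparameters.
candidates = [
    ("learning_rate", "backbone", 0.72),
    ("batch_size", "backbone", 0.41),
    ("dropout", "branch", 0.85),
    ("kernel_size", "branch", 0.65),
]
roots = select_root_causes(candidates)
```

The hyperparameters returned by `select_root_causes` would be the ones to adjust first when retraining the model.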
S3: Construct an intelligent scientific and technological talent recommendation model. Different employers have different requirements for scientific and technological talents, but traditional talent evaluation standards are one-dimensional, so employer requirements and talents are mismatched and talent is wasted. To fully realize the value of scientific and technological talents, a knowledge-graph-based collaborative filtering recommendation algorithm is used to build an intelligent talent recommendation model on top of the evaluation model. First, talent features are extracted from the scientific and technological talent knowledge hypergraph; then post requirement features are extracted and the similarity between different talents and different post requirements is calculated; finally, talents that match the requirements are recommended to the employer.
Step S3 comprises the following steps:
S301: Based on the talent features in the scientific and technological talent knowledge hypergraph, obtain the feature vectors of the different talents and construct the talent matrix $Talent_x$; at the same time, construct the post requirement matrix $Talent_{req}$. $Talent_x$ is constructed as follows:
$$Talent_x = [s_1, s_2, s_3, \ldots, s_i] \tag{13}$$
wherein,
(14)
$Talent_{req}$ is constructed as follows:
$$Talent_{req} = [r_1, r_2, r_3, \ldots, r_i] \tag{15}$$
wherein,
(16)
S302: After the talent matrix $Talent_x$ and the employer requirement matrix $Talent_{req}$ have been obtained, the Jaccard similarity coefficient is used to calculate the similarity between the talent matrix and the requirement matrix:
(17)
S303: Using the K-nearest-neighbor algorithm, select the K scientific and technological talents with the highest similarity to the post requirements to form the nearest-neighbor set U.
S304: Using the Top-N method, recommend the top N talents from the nearest-neighbor set U to the target post, thereby realizing intelligent recommendation of scientific and technological talents.
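Steps S302 to S304 can be sketched together in a few lines of Python. The sketch assumes that each talent and each post requirement is represented as a set of feature tags, so that the Jaccard coefficient is the size of the intersection divided by the size of the union; the talent names, the tags, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two collections of feature tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(talents, requirement, k, n):
    """Rank talents by similarity to a requirement, keep K, return the top N.

    talents:     dict mapping talent name -> set of feature tags
    requirement: set of feature tags describing the post
    """
    ranked = sorted(
        talents, key=lambda name: jaccard(talents[name], requirement), reverse=True
    )
    nearest = ranked[:k]  # nearest-neighbor set U
    return nearest[:n]    # Top-N recommendation drawn from U


talents = {
    "talent_a": {"nlp", "graph_learning", "patents"},
    "talent_b": {"nlp"},
    "talent_c": {"biology"},
}
requirement = {"nlp", "graph_learning"}
top = recommend(talents, requirement, k=2, n=1)
```

Here `talent_a` shares two of three tags with the requirement, `talent_b` one of two, and `talent_c` none, so `talent_a` is recommended first.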
Therefore, the novel intelligent scientific and technological talent evaluation method based on the hypergraph attention mechanism can automatically fill in missing talent information data and achieve all-round dynamic evaluation and accurate recommendation of talents from multi-dimensional, multi-temporal-spatial, multi-angle, empirical and comparative perspectives, thereby improving the scientific rigor, professionalism and objectivity of talent evaluation and providing strong support for employers' intelligent management and use of talent data.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently substituted, and that such modifications or substitutions do not cause the modified technical solution to depart from the spirit and scope of the technical solution of the invention.

Claims (6)

1. A novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism, characterized in that the method comprises the following steps:
S1, constructing a scientific and technological talent knowledge hypergraph: extracting data from multi-source talent big data to form a talent database, preprocessing the data in the talent database, and, after preprocessing, designing the structure and model of the knowledge graph;
S2, introducing an attention mechanism to design a scientific and technological talent classification and evaluation network model: on the basis of the talent knowledge hypergraph established in S1, introducing an attention mechanism for learning and recognition, and establishing the talent classification and evaluation network model;
S3, constructing an intelligent scientific and technological talent recommendation model: on the basis of the talent classification and evaluation network model of S2, constructing the intelligent talent recommendation model using a knowledge-graph-based collaborative filtering recommendation algorithm;
step S2 comprises the following steps:
S201, introducing graph embedding: using a graph convolutional neural network and one-hot encoding, converting each node in the hypergraph and its associated information into vector form, and expressing the resulting vectors as a matrix that serves as the network input;
S202, feeding the input vector matrix into a trainable embedding layer, mapping each element of the input data into a high-dimensional vector space, and using CTransR to learn embeddings of the nodes and hyperedges, the scoring function being defined as:
wherein $h$ and $t$ are the entity embeddings learned under the corresponding hyperedges, $h_{r,c}$ is the triplet head-entity embedding vector for a specific relation $r$ under a specific entity pair $c$, $t_{r,c}$ is the triplet tail-entity embedding vector for the relation $r$ under the entity pair $c$, $r_c$ is the hyperedge relation embedding vector learned for the entity pair $c$, $L_1$ and $L_2$ denote the first norm and the second norm respectively, the constraint term constrains the similarity between the clustered relation vector $r_c$ and the original relation vector $r$, and $\beta$ adjusts the influence of the constraint on the scoring function;
S203, performing a convolution operation on the output of the embedding layer using a 1D convolution kernel to further extract feature information;
S204, applying an attention mechanism to the output of step S203 and computing the feature weights using cosine similarity: in layer $l$, the weighted embedding vector of the $i$-th node is calculated from the embedding vector $v$ of each hyperedge obtained from the embedding layer and its initial weight:
wherein the quantities in the formula are the hyperedge weight of layer $l$ in the attention-based graph neural network and the hyperedge embedding vector of the $i$-th node in layer $l$;
calculating, from the weighted embedding vector of the $i$-th node, the cosine similarity to the $j$-th node in the first-order neighbor node set:
wherein the quantity in the formula is the weighted embedding vector of the $j$-th node in layer $l$;
further obtaining the attention coefficient of each first-order neighbor node:
wherein the quantity in the formula is the cosine similarity between the $i$-th node and the $k$-th neighbor node in layer $l$;
obtaining the embedding of node $i$ in layer $l+1$:
wherein $\sigma$ is the sigmoid function and the remaining quantity is the embedding of node $j$ in layer $l$;
S205, performing dimensionality reduction on the output of the attention layer using max pooling and extracting representative feature vectors;
S206, feeding the output of the pooling layer into the fully connected layer and mapping the extracted feature vectors to the target space using the LeakyReLU activation function to obtain the classifier output;
S207, measuring the error between the prediction and the sample label using the cross-entropy loss function for model optimization and training, wherein for a training sample $x_i$ with true probability distribution $p(x_i)$ and predicted probability distribution $q(x_i)$ the loss function is:
$$H(p, q) = -\sum_{i=1}^{n} p(x_i)\,\log q(x_i)$$
wherein $n$ is the number of sample classes;
S208, updating the weight parameters of the network with an optimizer so as to minimize the loss function, and training the network iteratively on the training set until the best performance is reached;
S209, determining the root-cause hyperparameters that affect model classification using fishbone analysis: when the model's talent classification accuracy is insufficient, applying fishbone analysis to the output attributes to identify which parameter settings are at fault, obtaining a hyperparameter-setting decision, and improving the overall classification performance of the model; the backbone-class attribute threshold is set to 0.6 and the branch-class attribute threshold to 0.8,
wherein fishbone denotes the fishbone analysis method, cause denotes the root-cause attribute, subsubau denotes the branch attribute, and value denotes the attribute value; if the value of a root-cause attribute exceeds the threshold 0.6, the attribute is expressed as defined; if the value of the root-cause attribute is smaller than 0.6, the attribute is expressed as index; this judgment determines the root-cause hyperparameters that affect the classification ability of the model, which are taken as the adjustment target of the network model, and the related parameters in the network are changed accordingly to achieve optimal classification and evaluation.
2. The novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism, characterized in that step S1 comprises the following steps:
S101, acquiring scientific and technological talent big data;
S102, preprocessing the scientific and technological talent big data;
S103, determining the basic hypergraph structure based on the entities, relations and attributes in the scientific and technological talent data.
3. The novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism according to claim 2, characterized in that step S101 comprises the following steps:
S1011, determining the data input sources: the data input sources include talent networks, academic institutions, research centers and professional social networks, and talent information is obtained in terms of the innovation value, ability and contribution of scientific and technological talents;
S1012, determining the acquisition technology: Kafka is used to acquire streaming data, a parallel crawler is used to acquire unstructured network data, and Sqoop is used to extract structured data from traditional databases;
S1013, determining the big data storage technology: the HDFS distributed file system and the HBase column-oriented database are used to store the big data.
4. The novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism according to claim 3, characterized in that step S102 comprises the following steps:
S1021, data cleaning: filling in or deleting missing data values to ensure data integrity; for a numerical feature, filling missing values with the mean, median or mode; for time-series or ordered data, filling missing values with the preceding or following observation; if the data are correlated, estimating missing values by interpolation; where few values are missing or they are unimportant for the analysis task, deleting the entire row containing the missing value; if the data of a feature are missing values, deleting the entire feature; and if several consecutive data points are missing, deleting that continuous data segment;
S1022, multi-modal data conversion: creating new features or transforming existing features; creating new features according to domain knowledge, capturing latent patterns in the data, selecting the most relevant features, scaling feature values to similar ranges, and finally converting categorical variables into numbers or one-hot codes.
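As an illustration of the cleaning and conversion operations listed in S1021 and S1022, the following Python sketch shows mean filling, forward filling, row deletion, min-max scaling and one-hot encoding on small lists. It uses `None` to mark a missing value; the function names and sample values are hypothetical, and a production pipeline would more likely rely on a data-frame library.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace missing (None) entries of a numerical feature with the mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    m = mean(observed)
    return [m if v is None else v for v in values]


def forward_fill(values):
    """Replace missing entries of ordered data with the previous observation."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def drop_incomplete_rows(rows):
    """Delete every row that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def min_max_scale(values):
    """Scale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Convert categorical labels into one-hot codes (sorted vocabulary)."""
    vocab = sorted(set(labels))
    return [[1 if label == v else 0 for v in vocab] for label in labels]


papers = fill_with_mean([1.0, None, 3.0])
yearly = forward_fill([5, None, None, 7])
scaled = min_max_scale([10, 20, 30])
degrees = one_hot(["phd", "msc", "phd"])
```

Which strategy applies depends on the feature: mean filling suits numerical features, forward filling suits ordered data, and row deletion suits cases where few values are missing.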
5. The novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism, characterized in that step S103 comprises the following steps:
S1031, analyzing the multiple attributes of scientific and technological talents in scientific research, including their academic level, their reputation and recognition in academia, their work experience and career history in research projects, their research papers published in academic journals and conferences, patent applications and technological innovations, the degree of influence of their research results on society and industry, and their ability to collaborate in multidisciplinary and cross-domain teams;
S1032, based on the extracted multiple-attribute talent information, extracting and integrating entities, relations and attributes using NER, NLTK, StanfordNLP and GATE: identifying the entities in the database, establishing the relations among the entities, and extracting important attribute information to construct the scientific and technological talent information;
S1033, according to knowledge-graph construction principles, abstracting the multi-dimensional attributes of the talent information into nodes of the knowledge graph, connecting the nodes according to the evaluation mechanisms of different periods, and associating the evaluation results of different periods as hyperedges; the scientific and technological talent hypergraph is defined as consisting of a node set, an edge set $\varepsilon$ and a hyperedge set, and is therefore expressed as:
wherein each edge $e \in \varepsilon$ is assigned a weight $\omega(e)$ according to the attribute bias of the evaluation mechanisms of different periods, and each hyperedge is assigned a weight according to the bias of the evaluation results of different periods; $\omega(e)$ and the hyperedge weight represent the importance of the connection relations in the whole hypergraph; after the weights are introduced, the hypergraph is expressed as:
$W_1$ and $W_2$ denote the diagonal weight matrices of the edges and the hyperedges, respectively,
wherein the cardinality notation denotes the number of elements in a set;
the structure of the hypergraph is described by an incidence matrix $H(v, e)$:
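The incidence (association) matrix mentioned above can be illustrated with a short Python sketch. It assumes the usual definition, in which the entry for node $v$ and hyperedge $e$ is 1 when $v$ belongs to $e$ and 0 otherwise. The node names, the hyperedges, the weights, and the weighted-degree helper are hypothetical additions for illustration.

```python
def incidence_matrix(nodes, hyperedges):
    """Build H with H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in nodes]


def vertex_degrees(H, hyperedge_weights):
    """Weighted degree of each node: sum of the weights of its hyperedges."""
    return [sum(w * h for w, h in zip(hyperedge_weights, row)) for row in H]


# Hypothetical talent hypergraph: one person linked to outputs and awards.
nodes = ["alice", "paper_1", "award_1", "project_1"]
hyperedges = [
    {"alice", "paper_1", "project_1"},  # e.g. one evaluation period
    {"alice", "award_1"},               # e.g. another evaluation period
]
H = incidence_matrix(nodes, hyperedges)
degrees = vertex_degrees(H, hyperedge_weights=[0.5, 2.0])
```

Each column of `H` lists the members of one hyperedge, so a single hyperedge can connect any number of nodes, unlike an ordinary graph edge.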
6. The novel intelligent scientific and technological talent evaluation method based on a hypergraph attention mechanism, characterized in that step S3 comprises the following steps:
S301, based on the talent features in the scientific and technological talent knowledge hypergraph, obtaining the feature vectors of the different talents and constructing the talent matrix $Talent_x$, and at the same time constructing the post requirement matrix $Talent_{req}$, $Talent_x$ being constructed as follows:
$$Talent_x = [s_1, s_2, s_3, \ldots, s_i]$$
wherein,
$Talent_{req}$ is constructed as follows:
$$Talent_{req} = [r_1, r_2, r_3, \ldots, r_i]$$
wherein,
S302, after the talent matrix $Talent_x$ and the employer requirement matrix $Talent_{req}$ have been obtained, calculating the similarity between the talent matrix and the requirement matrix using the Jaccard similarity coefficient:
S303, using the K-nearest-neighbor algorithm, selecting the K scientific and technological talents with the highest similarity to the post requirements to form the nearest-neighbor set U;
S304, using the Top-N method, recommending the top N talents from the nearest-neighbor set U to the target post, thereby realizing intelligent recommendation of scientific and technological talents.
CN202311623569.9A 2023-11-30 2023-11-30 Novel intelligent scientific and technological talent evaluation method based on hypergraph attention mechanism Active CN117314266B (en)

Publications (2)

Publication Number Publication Date
CN117314266A CN117314266A (en) 2023-12-29
CN117314266B true CN117314266B (en) 2024-02-06

Family

ID=89260819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311623569.9A Active CN117314266B (en) 2023-11-30 2023-11-30 Novel intelligent scientific and technological talent evaluation method based on hypergraph attention mechanism

Country Status (1)

Country Link
CN (1) CN117314266B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant