CN117076484B

CN117076484B - Human resource data analysis method based on time sequence knowledge graph

Info

Publication number: CN117076484B
Application number: CN202311129301.XA
Authority: CN
Inventors: 杜鹏
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2024-04-19
Anticipated expiration: 2043-09-04
Also published as: CN117076484A

Abstract

The invention discloses a human resource data analysis method based on a time sequence knowledge graph, comprising the following steps: S101: establishing a human resource time sequence data knowledge graph, which comprises nodes, edges and attributes of the nodes; S102: synchronizing analysis fact data, that is, synchronizing changed fact data into the human resource time sequence data knowledge graph; S103: synchronizing analysis index data, that is, synchronizing added index data into the human resource time sequence data knowledge graph; S104: establishing data analysis methods, including statistics, retrieval and recommendation, and obtaining a data analysis result. By updating incrementally, the method adapts to changing requirements and data sources, markedly improves the adaptability and flexibility of human resource data statistics, retrieval and recommendation analysis, addresses the effective management of human resource analysis data, keeps analysis fact data and analysis index data synchronously updated, and adapts well to indexes, measures and fact data that change and are updated over time.

Description

Human resource data analysis method based on time sequence knowledge graph

Technical Field

The application belongs to the technical field of data processing, and particularly relates to a human resource data analysis method based on a time sequence knowledge graph.

Background

The models commonly used for analyzing human resource data are the relational model and the multidimensional data set. Data analysis techniques based on different data model representations each have their own characteristics and areas of applicability.

Data analysis based on the relational model models indexes and measures with an entity-relationship model. Indexes are stored in hierarchical index tables and measures in fact tables, and fact tables are related to index tables through primary key/foreign key relationships. The analysis is generally implemented in SQL, with multi-table JOINs relating fact tables to index tables. This approach can be built directly on existing transaction data, so the cost of data synchronization is low, and SQL access can adapt to flexible, changing analysis requirements. However, the implicit relationship between the time measure and other measures is weakened; because the analysis relies on multi-table joins, system performance drops markedly when the data volume or the amount of computation is large; and because primary key/foreign key modeling fixes the schema, changing the table structure is costly when a change in analysis requirements changes an index or measure.

Data analysis based on a multidimensional data set uses the modeling method of a traditional data warehouse. Indexes and measures are modeled as the dimensions and measures of the multidimensional data set, and a dedicated ETL tool periodically extracts data into the data warehouse to form the multidimensional data set and the corresponding dimension and fact data. The analysis is implemented in a dedicated multidimensional query language designed for large-scale multidimensional data analysis, so it is easy to operate and performs well. However, indexes and measures must be determined at modeling time, the model's dimensions are fixed, and data synchronization is generally done by full regeneration, so the approach handles time-varying data and requirement-driven index changes poorly.

Representative methods for information retrieval based on a knowledge graph include retrieval based on entity links and retrieval based on relationship paths.

The retrieval based on entity links is based on the identification information among the entities, links the same entities together to form an identification chain, and realizes the link retrieval of the entities. Since an entity may have rich attribute information, such as parts of speech, classification, attribute values, etc., the entity attribute information may be directly utilized for retrieval.

Retrieval based on relationship paths starts from the connection paths between relationships: a relationship path is converted into a query statement, which then retrieves the attribute information of the entities. Analyzing the relationship path yields a series of constraints, and querying with these constraints realizes the information retrieval.

The commonly used sequence recommendation models are recommendation models based on a Markov chain and recommendation models based on deep learning.

A recommendation model based on a Markov chain computes the probability of transitioning from one state to another. If each state transition in the random process depends on the n preceding states, the model is an n-order model; when n is 1, each state transition depends only on the previous state. A first-order Markov conditional probability transition matrix can capture sequence features and makes effective use of the order information in time sequence data. Methods using a high-order Markov chain construct a probability transition matrix to compute the transition probabilities of a sequence; they can capture the hidden regularities of short-term sequences and can alleviate the data sparsity and cold-start user problems to some extent. However, when the sequence length is not fixed, a conventional Markov chain model with a fixed order cannot accurately model the hidden regularities of the sequence. In addition, the transition matrix is sparse, and the data structure used to store it has to be designed for the specific usage scenario.
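As a minimal sketch of the first-order case described in this background passage, the following Python estimates a transition matrix from state sequences and stores only the observed transitions in nested dictionaries, which is one possible answer to the sparsity concern. The function names and the post histories are hypothetical and are not taken from the application.

```python
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each (previous state -> next state) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize each row; only observed transitions are stored (sparse).
    transitions = {}
    for prev, row in counts.items():
        total = sum(row.values())
        transitions[prev] = {nxt: n / total for nxt, n in row.items()}
    return transitions


def most_likely_next(transitions, state):
    """Return the most probable next state, or None if the state was never left."""
    row = transitions.get(state)
    return max(row, key=row.get) if row else None


# Hypothetical post histories, for illustration only.
histories = [
    ["team leader", "department manager", "branch manager"],
    ["team leader", "department manager"],
    ["team leader", "branch manager"],
]
P = fit_first_order(histories)
suggestion = most_likely_next(P, "team leader")
```

Because the order is fixed at 1, the sketch ignores how long each post was held, which is the limitation the application goes on to discuss.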

A recommendation model based on deep learning may use a recurrent neural network to learn temporal dependencies in time sequence data. Such models often face the vanishing gradient problem and are relatively weak at modeling long-term dependencies in a sequence. Although LSTM and GRU mitigate this, the items of a sequence depend on one another, which makes parallel computation difficult and easily produces spurious dependencies. A model based on a convolutional neural network makes no strict sequential assumption, which reduces the strong order dependence, but the limited filter size makes it hard to learn long-term dependencies in the sequence. A model based on a graph convolutional neural network can learn and capture complex transition relationships between sequence items through the graph structure and is highly interpretable, but the complexity of the graph makes its computational cost relatively high. Existing deep learning recommendation models consider the order in which posts were held but do not include the key factor of time in post, so their results deviate from the actual needs of personnel-post recommendation.

Disclosure of Invention

In view of this, some embodiments disclose a human resources time series data analysis method based on a knowledge graph, the method comprising:

S101: establishing a human resource time sequence data knowledge graph, wherein the human resource time sequence data knowledge graph comprises nodes, edges and attributes of the nodes; the nodes comprise basic indexes, calculation indexes and analysis objects, and the edges are the relationships among the nodes;

S102: synchronizing analysis fact data, that is, synchronizing changed fact data into the human resource time sequence data knowledge graph;

S103: synchronizing analysis index data, that is, synchronizing added index data into the human resource time sequence data knowledge graph;

S104: establishing a data analysis method and obtaining a data analysis result.

Further, in the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, the basic index refers to an index classification that cannot be decomposed further.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, the calculation index refers to an index classification formed by combining a plurality of basic index classifications.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, the edges include the calculation relationship between calculation indexes and basic indexes, the calculation relationship between calculation indexes and analysis objects, and the belonging relationship between analysis objects and basic indexes.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, for the time relationship between an analysis object and a basic index, the effective time range is represented by a start time and an end time.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, synchronizing the analysis fact data specifically includes:

Adding an analysis object: adding a node for the analysis object to the human resource time sequence data knowledge graph, running the calculation logic of all basic indexes on the added analysis object, and adding edges to establish relationships with the basic indexes and calculation indexes whose conditions are met;

Modifying an analysis object: synchronizing the modified information to the corresponding node in the human resource time sequence data knowledge graph, running the calculation logic of all basic indexes on the analysis object, and adding or deleting the relationships between the analysis object and the basic indexes and calculation indexes;

Deleting an analysis object: deleting the corresponding node from the human resource time sequence data knowledge graph, and deleting all edges connected to that node at the same time.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, synchronizing the analysis index data specifically includes:

adding a basic index: adding a node of the basic index type to the human resource time sequence data knowledge graph, running the calculation logic of the added basic index on all analysis objects in the human resource time sequence data knowledge graph, and adding edges according to the calculation result;

adding a calculation index: adding a node of the calculation index type to the human resource time sequence data knowledge graph.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, the established data analysis method includes a data analysis method based on basic indexes and a data analysis method based on calculation indexes.

In the human resource time sequence data analysis method based on the knowledge graph disclosed in some embodiments, the data analysis method based on basic indexes includes:

taking the basic index as a starting point, and acquiring all analysis object entity nodes pointed to by the basic index;

calculating the statistical result of the basic index by counting.

The data analysis method based on calculation indexes includes:

taking the calculation index as a starting point, and obtaining the calculation indexes or basic indexes pointed to by the calculation index;

calculating the analysis object sets corresponding to all the basic indexes pointed to;

combining all the results to obtain the analysis object set corresponding to the calculation index;

counting or performing related calculations on the analysis object set to obtain a data analysis result based on the calculation index.

The established data analysis method further includes predicting future values of the calculation index or basic index time sequence of a specific analysis object.

The human resource time sequence data analysis method disclosed in the embodiments of the application performs data analysis on the established human resource time sequence data knowledge graph and can dynamically keep analysis fact data and analysis index data synchronously updated, which effectively improves the flexibility and effectiveness of data modeling. The method adapts well to indexes, measures and fact data that change and are updated at any time, in particular in scenarios where indexes change with requirements and human resource business data change at any time. By updating incrementally, it adapts to changing requirements and data sources, markedly improves the adaptability and flexibility of human resource data analysis, and has good application prospects in the technical field of human resource time sequence data analysis.

Drawings

FIG. 1 is a flow chart of the human resource time sequence data analysis method based on a knowledge graph;

FIG. 2 is a schematic diagram of the human resource time sequence knowledge graph of embodiment 2;

FIG. 3 is a schematic diagram of the time sequence knowledge graph of cadres, posts, units and assessments of embodiment 3;

FIG. 4 is a schematic diagram of the information retrieval result of the human resource time sequence knowledge graph of embodiment 4;

FIG. 5 is a schematic diagram of the post relationships in the cadre selection time sequence knowledge graph of embodiment 5;

FIG. 6 is a heterogeneous graph of the relationships among cadres, posts and attributes of embodiment 5.

Detailed Description

As the word "embodiment" is used herein, any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. Performance index testing in the examples of the present application, unless otherwise specified, was performed using conventional testing methods in the art. It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure.

Unless otherwise defined, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; other test methods and techniques not specifically mentioned in the present application are those commonly used by those skilled in the art.

The terms "substantially" and "about" are used herein to describe small fluctuations. For example, they may refer to less than or equal to ±5%, such as less than or equal to ±2%, such as less than or equal to ±1%, such as less than or equal to ±0.5%, such as less than or equal to ±0.2%, such as less than or equal to ±0.1%, such as less than or equal to ±0.05%. Numerical data presented or represented herein in a range format is used only for convenience and brevity and should therefore be interpreted flexibly to include not only the numerical values explicitly recited as the limits of the range, but also to include all the individual numerical values or sub-ranges encompassed within that range. For example, a numerical range of "1 to 5%" should be interpreted to include not only the explicitly recited values of 1% to 5%, but also include individual values and sub-ranges within the indicated range. Thus, individual values, such as 2%, 3.5% and 4%, and subranges, such as 1% to 3%, 2% to 4% and 3% to 5%, etc., are included in this numerical range. The same principle applies to ranges reciting only one numerical value. Moreover, such an interpretation applies regardless of the breadth of the range or the characteristics being described.

In this document, including the claims, conjunctions such as "comprising", "including", "carrying", "having", "containing" and "involving" are to be construed as open-ended, i.e., to mean "including, but not limited to". Only the conjunctions "consisting of …" and "composed of …" are closed conjunctions.

Numerous specific details are set forth in the following examples in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In the examples, some methods, means, instruments, devices, etc. well known to those skilled in the art are not described in detail in order to highlight the gist of the present application.

On the premise of no conflict, the technical features disclosed by the embodiment of the application can be combined at will, and the obtained technical scheme belongs to the disclosure of the embodiment of the application.

Further exemplary details of the technology are described below in connection with the embodiments and FIGS. 1, 2, 3, 4, 5 and 6.

In some embodiments, some examples disclose a human resource time series data analysis method based on a knowledge graph, the method comprising:

Generally, the human resource time sequence data knowledge graph comprises nodes, edges and attributes, wherein the nodes comprise node elements such as basic indexes, calculation indexes and analysis objects.

In general, an analysis object is a member of the human resource management system and is the most basic measure. As an alternative embodiment, attributes of the analysis object, such as age or income, may be added to extend the measures, so that the average age or total income of the analysis objects can be measured in calculation.

Generally, a basic index represents a basic classification that business analysis needs to apply to the analysis objects, one that cannot be decomposed into other index classifications. For example, "Han nationality" represents a basic classification of human resource management objects from the ethnic perspective. In general, the computation logic f_t is held in each node of the basic index type and describes the features of the analysis objects with which this node can be associated.

In general, a calculation index represents a classification that business analysis needs to apply to the analysis objects, which may be formed by combining a plurality of basic classifications, where the combination can be expressed as a calculation. For example, "minority nationality" represents the combination of all non-Han nationalities. In general, the computation logic f_c is held in each node of the calculation index type and describes the basic indexes with which this index can establish a relationship.

In general, an edge refers to a relationship between nodes in a human resource time series data knowledge graph, for example, the edge includes a calculation relationship between a calculation index and a base index, a calculation relationship between a calculation index and an analysis object, and a belonging relationship between the analysis object and the base index.

Generally, the process of establishing the human resource time sequence data knowledge graph includes: analyzing the fact data and the index data; determining the node content and the relationships between nodes; determining the attributes of the nodes; determining how the relationships between nodes are expressed; and constructing the human resource time sequence data knowledge graph from the determined nodes, node relationships and attribute content.

As an alternative embodiment, for a relationship that involves a time factor, the valid time range is represented by a start time and an end time. For example, the valid time range of the relationship may be represented by the start time and end time of an attribute, or by the start time and end time of a level.

As an alternative embodiment, for the time relationship between an analysis object and a calculation index: if every basic index constituting the calculation index has a time relationship with the analysis object, the effective time range is represented by the intersection of their start and end times; if only some of the basic indexes constituting the calculation index have a time relationship with the analysis object and the others do not, the effective time range is represented by the union of the start and end times.
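The intersection case can be illustrated with a short Python sketch. It assumes each time relationship is a (start, end) pair of dates and that an end of None means the relationship is still valid; both conventions, the function name and the sample dates are assumptions made for illustration. The union case is left out because the text does not spell out its exact semantics.

```python
from datetime import date


def intersect(ranges):
    """Overlap of (start, end) ranges; end=None means the relation is still valid."""
    if not ranges:
        return None
    # The overlap starts at the latest start and ends at the earliest end.
    start = max(s for s, _ in ranges)
    ends = [e for _, e in ranges if e is not None]
    end = min(ends) if ends else None
    if end is not None and start > end:
        return None  # the ranges never hold at the same time
    return (start, end)


# Hypothetical ranges: a job level held for a period and a unit membership
# that is still open-ended.
level_range = (date(2015, 12, 1), date(2018, 6, 1))
unit_range = (date(2016, 1, 1), None)
combined = intersect([level_range, unit_range])
disjoint = intersect([(date(2010, 1, 1), date(2012, 1, 1)),
                      (date(2013, 1, 1), date(2014, 1, 1))])
```

Here `combined` covers only the period in which both relationships hold, and `disjoint` is None because the two ranges never overlap.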

As an alternative embodiment, synchronizing the analysis fact data specifically includes:

As an alternative embodiment, synchronizing the analysis index data specifically includes:

Adding a calculation index: adding a node of the calculation index type to the human resource time sequence data knowledge graph.

As an alternative embodiment, the established data analysis method includes a data analysis method based on a base index and a data analysis method based on a calculation index.

As an alternative embodiment, the data analysis method based on the basic index includes: taking the basic index as a starting point and acquiring all analysis object entity nodes pointed to by the basic index; and counting them to obtain the statistical result for all analysis object entities corresponding to the basic index. For example, the number of "Han nationality" personnel is obtained by counting, and the average age of "Han nationality" personnel is obtained by computing over the corresponding attribute of the node list. For a basic index with a time sequence relationship, the calculation is realized by filtering on the time carried by the relationship.

As an alternative embodiment, the data analysis method based on the calculation index includes:

Calculating the analysis object sets corresponding to all the corresponding basic indexes;

Counting or performing related calculations on the analysis object set to obtain a data analysis result based on the calculation index. For example, to obtain the average age of "department-level cadre" personnel, first obtain all department-level cadre categories, then obtain the personnel set of each level, merge the personnel sets of all department-level cadre categories into the set of all "department-level cadres", and then compute the average age.

As an alternative embodiment, the multi-index retrieval method includes:

Setting search information;

information retrieval is carried out from a human resource library;

and carrying out structural information display on the screening item by utilizing the knowledge graph.

As an alternative embodiment, the matching recommendation method based on the specific object multi-index includes:

circularly acquiring a sequence set strung by basic indexes or calculation indexes according to time sequence by taking a specific analysis object entity node as a starting point;

constructing a calculation index or basic index time sequence matrix of the object set;

inputting a sample sequence in the sequence set into a sequence prediction model based on a graph neural network, training to obtain a corresponding index prediction result, and optimizing the index prediction model and parameters according to the matching degree;

and inputting the target object index sequence into a sequence prediction model, and predicting the optimal result of the index updating option of the target object through the matching degree.
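The first two steps above (collecting the time-ordered index sequence of each object, then building a time sequence matrix) can be sketched in Python as follows. The edge layout (person, index, start date), the integer encoding with -1 padding and all sample data are assumptions made for illustration; the graph neural network model of the later steps is not shown.

```python
from datetime import date


def index_sequence(edges, person):
    """Time-ordered list of index names attached to one analysis object."""
    timed = [(start, idx) for p, idx, start in edges
             if p == person and start is not None]
    return [idx for _, idx in sorted(timed)]


def sequence_matrix(sequences, vocabulary):
    """Encode each sequence as a row of integer ids, right-padded with -1."""
    ids = {name: i for i, name in enumerate(vocabulary)}
    width = max((len(s) for s in sequences), default=0)
    return [[ids[x] for x in s] + [-1] * (width - len(s)) for s in sequences]


# Hypothetical timed edges: (analysis object, index, start date).
edges = [
    ("Wang Wu", "department manager", date(2018, 5, 1)),
    ("Wang Wu", "team leader", date(2013, 2, 1)),
    ("Li Si", "team leader", date(2015, 3, 1)),
]
vocabulary = ["team leader", "department manager"]
sequences = [index_sequence(edges, p) for p in ("Wang Wu", "Li Si")]
matrix = sequence_matrix(sequences, vocabulary)
```

The padded rows give every object a fixed-width encoding, which is one common way to feed variable-length sequences to a sequence model.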

Example 1

In embodiment 1, the construction of the human resource time sequence knowledge graph is realized. G = (C, B, O, R, T, P) represents the human resource analysis data knowledge graph, where C, B and O are each a class of nodes in G: B represents basic indexes, C represents calculation indexes and O represents analysis objects; R is the set of edges in G, representing the relationships between indexes and objects; T is the time relationship; and P is the attributes of C, B and O.

1. Construction of time sequence knowledge graph

(1) O: the analysis object, mainly each person in the human resource management system. It is the most basic measure; attributes of the analysis object, such as age and income, can be added to extend the measure content, so that the average age or total income of the analysis objects can be measured in calculation.

(2) B: the basic index, which represents a basic classification that business analysis needs to apply to the analysis objects and that cannot be decomposed further. For example, "Han nationality" represents a basic classification of human resource management objects from the ethnic perspective. The computation logic f_t is stored in each node of type B and describes the features of the analysis objects that can establish a relationship with this node.

(3) C: the calculation index, which represents a classification that business analysis needs to apply to the analysis objects. The classification may be formed by combining a plurality of basic classifications, and the combination can be expressed as a calculation. For example, "minority nationality" represents the combination of all non-Han nationalities. The computation logic f_c is stored in each node of type C and describes the basic indexes with which this index can establish a relationship.

(4) R: the relationships between nodes in G. One is the relationship between C and B, indicating that C is calculated from B; the other is the relationship between B and O, indicating that O belongs to the class B.

(5) T: time. The time range is represented by the attributes startDate and endDate; for example, a job level carries startDate and endDate, indicating when it takes effect and when it ends.

(6) P: the attributes of O, B and C, representing feature attributes of an analysis object, basic index or calculation index; for example, the feature classification of the basic index "Miao nationality" is "minority nationality".
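One possible in-memory representation of G = (C, B, O, R, T, P) is sketched below in Python. The class and field names are hypothetical; the application does not prescribe a storage layout. The computation logic f_t and f_c is modeled as plain predicates, and the time T is carried on the edges as optional start and end dates.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass
class AnalysisObject:  # O: a person, with attributes P
    name: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class BaseIndex:  # B: holds f_t, a predicate over analysis objects
    name: str
    f_t: Callable[[AnalysisObject], bool]
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class CalcIndex:  # C: holds f_c, a predicate over basic indexes
    name: str
    f_c: Callable[[BaseIndex], bool]
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:  # R, with the time range T as optional start/end dates
    source: str
    target: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None


@dataclass
class Graph:
    objects: Dict[str, AnalysisObject] = field(default_factory=dict)
    base: Dict[str, BaseIndex] = field(default_factory=dict)
    calc: Dict[str, CalcIndex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)


han = BaseIndex("Han nationality", lambda o: o.props.get("ethnicity") == "Han")
miao = BaseIndex("Miao nationality", lambda o: o.props.get("ethnicity") == "Miao",
                 props={"classification": "minority nationality"})
minority = CalcIndex(
    "minority nationality",
    lambda b: b.props.get("classification") == "minority nationality",
)
g = Graph(base={b.name: b for b in (han, miao)}, calc={minority.name: minority})
# Materialize the C -> B relations by running f_c over every basic index.
g.edges += [Edge(minority.name, b.name) for b in g.base.values() if minority.f_c(b)]
```

Keeping f_t and f_c on the nodes is what allows new objects and new indexes to be linked incrementally, without rebuilding the graph.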

2. Analysis of factual data synchronization

Analysis fact data is typically stored in the databases of business systems, usually relational databases. The data may change at any time, and after a change it needs to be synchronized into the analysis knowledge graph promptly so that the graph reflects the latest data.

Adding a person: add a node of type O to the knowledge graph to represent the new person, run the calculation logic of all basic indexes on the object, and add edges to establish relationships with the indexes whose conditions are met;

Modifying a person: synchronize the information to the corresponding node in the knowledge graph, run the calculation logic of all basic indexes on the object, and bring the relationships between the basic indexes and the object up to date by adding or deleting them;

Deleting a person: delete the corresponding node from the knowledge graph, and delete all edges connected to the node at the same time.
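The three synchronization operations can be sketched as follows. This is a simplified illustration with hypothetical names: people are attribute dictionaries, each basic index is a predicate f_t, and an edge is a (person, index) pair without time information. On modification the sketch rebuilds the person's edges instead of diffing them, which reaches the same end state as adding and deleting individual relationships.

```python
class Graph:
    def __init__(self, base_indexes):
        # base_indexes: index name -> f_t predicate over a person's attributes
        self.base_indexes = dict(base_indexes)
        self.people = {}    # O nodes: name -> attribute dict
        self.edges = set()  # (person, basic index) belonging relations

    def _relink(self, name):
        # Drop the person's old edges, then re-run every f_t.
        self.edges = {e for e in self.edges if e[0] != name}
        for index, f_t in self.base_indexes.items():
            if f_t(self.people[name]):
                self.edges.add((name, index))

    def add_person(self, name, attrs):
        self.people[name] = dict(attrs)
        self._relink(name)

    def modify_person(self, name, **changes):
        self.people[name].update(changes)
        self._relink(name)

    def delete_person(self, name):
        # Remove the node together with every edge connected to it.
        del self.people[name]
        self.edges = {e for e in self.edges if e[0] != name}


g = Graph({
    "Han nationality": lambda p: p.get("ethnicity") == "Han",
    "Buyi nationality": lambda p: p.get("ethnicity") == "Buyi",
})
g.add_person("Zhang San", {"ethnicity": "Han"})
after_add = set(g.edges)
g.modify_person("Zhang San", ethnicity="Buyi")  # hypothetical correction
after_modify = set(g.edges)
g.delete_person("Zhang San")
after_delete = set(g.edges)
```

Each operation touches only the edges of the affected person, which is the incremental behavior the application relies on.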

3. Analysis index data synchronization

Analysis index data generally represents a business concern, that is, a way of classifying the managed data objects; a new analysis requirement is generally met by adding an index.

Basic index addition: add a node of type B to the knowledge graph to represent the new basic index, run the calculation logic of the new index on all analysis objects in the current knowledge graph, add edges according to the calculation results, and establish the relationships and assign their times;

Calculation index addition: add a node of type C to the knowledge graph to represent the new calculation index.
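A minimal Python sketch of index synchronization follows. The data layout is hypothetical: edges are (person, index, start date) triples, and the start date passed in stands for the "assigning time" step. Adding a calculation index only registers the node, as the text describes.

```python
people = {
    "Zhang San": {"ethnicity": "Han"},
    "Li Si": {"ethnicity": "Han"},
    "Wang Wu": {"ethnicity": "Buyi"},
}
base_indexes = {}  # B nodes: index name -> f_t predicate
calc_indexes = {}  # C nodes: index name -> f_c predicate
edges = set()      # (person, basic index, start date)


def add_base_index(name, f_t, start_date=None):
    """Add a B node and run its logic on every existing analysis object."""
    base_indexes[name] = f_t
    for person, attrs in people.items():
        if f_t(attrs):
            edges.add((person, name, start_date))


def add_calc_index(name, f_c):
    """Add a C node; no edges to analysis objects are created here."""
    calc_indexes[name] = f_c


add_base_index("Buyi nationality", lambda p: p["ethnicity"] == "Buyi")
edges_after_base = set(edges)
add_calc_index("minority nationality", lambda index_name: index_name != "Han nationality")
edges_after_calc = set(edges)
```

Adding a basic index costs one pass over the analysis objects, while adding a calculation index costs nothing up front because its members are resolved at query time.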

Example 2

In embodiment 2, a data statistics calculation model for the human resource time sequence knowledge graph is realized, and data statistics calculation methods are established to meet the data statistics requirements of different scenarios.

Basic index operation: to calculate a basic index, take the index as the starting point and obtain all entity nodes it points to. Counting these nodes gives the statistical result of the index, for example the number of Han nationality staff; computing over the corresponding attribute of the node list gives measures such as the average age of Han nationality staff. For a basic index with a time sequence relationship, the calculation is realized by filtering on the time carried by the relationship.
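The basic index operation can be illustrated with the following Python sketch. The edge layout (person, index, start, end), the level name "level A" and the ages are hypothetical; an edge without a start date is treated as always valid, which is an assumption of this sketch.

```python
from datetime import date

ages = {"Zhang San": 40, "Li Si": 50, "Wang Wu": 45}  # hypothetical ages
edges = [
    # (person, basic index, start, end); None means no time restriction
    ("Zhang San", "Han nationality", None, None),
    ("Li Si", "Han nationality", None, None),
    ("Wang Wu", "Buyi nationality", None, None),
    ("Wang Wu", "level A", date(2010, 10, 1), date(2015, 12, 1)),
]


def valid(start, end, at):
    """True if the relation holds at time `at` (no time filter when at is None)."""
    if at is None or start is None:
        return True
    return start <= at and (end is None or at <= end)


def members(index, at=None):
    """All analysis objects attached to a basic index, optionally at a point in time."""
    return {p for p, idx, start, end in edges if idx == index and valid(start, end, at)}


han = members("Han nationality")
han_count = len(han)
han_average_age = sum(ages[p] for p in han) / han_count
level_2012 = members("level A", at=date(2012, 1, 1))
level_2016 = members("level A", at=date(2016, 1, 1))
```

Counting gives the statistic, aggregating an attribute gives the measure, and the `at` argument implements the time filter on the relationship.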

Calculation index operation: to calculate a calculation index, first take it as the starting point and obtain all the calculation indexes or basic indexes it points to, and compute the personnel sets corresponding to the basic indexes; then merge all results to obtain the personnel set corresponding to the calculation index; finally count or otherwise compute over the personnel set to obtain the result. For example, to calculate the average age of "minority nationality" personnel, first obtain all minority nationalities, then obtain the personnel set of each nationality, merge them into the set of all "minority nationality" personnel, and compute the average age.
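Because a calculation index may point to basic indexes or to other calculation indexes, the operation can be written as a small recursion. In this sketch the dictionaries, the "all staff" index and the ages are hypothetical, and the index graph is assumed to be acyclic.

```python
# C -> (C or B) relations: which indexes a calculation index points to.
points_to = {
    "minority nationality": ["Buyi nationality", "Miao nationality"],
    "all staff": ["Han nationality", "minority nationality"],  # hypothetical
}
# B -> O relations: the personnel set of each basic index.
belongs = {
    "Han nationality": {"Zhang San", "Li Si"},
    "Buyi nationality": {"Wang Wu"},
    "Miao nationality": set(),
}
ages = {"Zhang San": 40, "Li Si": 50, "Wang Wu": 45}  # hypothetical ages


def resolve(index):
    """Personnel set of a basic index, or the merged sets of a calculation index."""
    if index in belongs:
        return set(belongs[index])
    result = set()
    for child in points_to[index]:
        result |= resolve(child)  # merge the sets of all pointed-to indexes
    return result


minority = resolve("minority nationality")
minority_average_age = sum(ages[p] for p in minority) / len(minority)
everyone = resolve("all staff")
```

The calculation index itself stores no personnel set, so adding a new basic index to `points_to` changes the result without any further synchronization.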

Multi-index statistics: result = count({o | o has a relation with b_1} ∪ {o | o has a relation with b_2} ∩ {o | t lies within the period from startdate1 to enddate2}).

Multi-index metrics: result = f({o | o has a relation with b_1} ∪ {o | o has a relation with b_2} ∩ {o | t lies within the period from startdate1 to enddate2}, p), where f is a calculation over the attribute p and may be a sum, max, min or avg operation, among others.
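The two formulas can be sketched in Python with ordinary set operations. The sketch assumes the union of the index sets is taken first and the result is then intersected with the set of objects valid in the time window; the formulas do not state the precedence, so this reading is an assumption, as are the function names and sample data.

```python
from statistics import mean


def multi_index(index_sets, in_window):
    """Union of the per-index object sets, restricted to the time window."""
    return set().union(*index_sets) & in_window


def metric(objects, attribute, f):
    """Apply an aggregate f (sum, max, min, mean, ...) to one attribute."""
    return f([attribute[o] for o in objects])


# Hypothetical inputs.
b_1 = {"Zhang San", "Li Si"}
b_2 = {"Wang Wu"}
in_window = {"Li Si", "Wang Wu", "Zhao Qi"}
ages = {"Zhang San": 40, "Li Si": 50, "Wang Wu": 45, "Zhao Qi": 55}

selected = multi_index([b_1, b_2], in_window)
count_result = len(selected)
max_age = metric(selected, ages, max)
avg_age = metric(selected, ages, mean)
```

Passing the aggregate as a parameter keeps one code path for sum, max, min and avg, mirroring the role of f in the formula.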

As shown in FIG. 2, a specific example is given. The analysis objects in the human resource data knowledge graph are Zhang San, Li Si, Wang Wu and Zhao Qi; the basic indexes are Han nationality, Buyi nationality, zhen-ji and Ju-ji; the calculation indexes are minority nationality and management cadres; the node attributes are entity description and type; and the edges include the following relations: ethnic group affiliation, cadre affiliation, level affiliation, appointment time and removal time. For example, Zhang San and Li Si are of Han nationality; Wang Wu is of Buyi nationality, which belongs to the minority nationalities; Wang Wu was appointed to one level in October 2010 and removed from it in December 2015, and was appointed to another level in December 2015 and removed from it in June 2018; Zhao Qi was appointed in November 2016 and removed in March 2021.

In embodiment 2, index statistics of human resource data is implemented, and the number of "han nationality" staff is counted, and the number of "han nationality" nodes in the node list can be obtained by performing operation on the time of entering the nodes in the han nationality "to obtain the number of 2.

In embodiment 2, index measurement of the human resource data is also implemented. The maximum tenure at the local level is obtained by a calculation over the in-degree and time span of the local-level cadre node in the node list, and the result is 4.4 years.

Example 3

In embodiment 3, a data retrieval model for the human resource time sequence knowledge graph is implemented, and a data retrieval calculation method is established to meet data retrieval requirements in different scenarios.

Multi-index retrieval: calculation result result＝select({o,where o has relation with b_1}∪{o,where o has relation with b_2}∩{o,when t in the duration from startdate1 to enddate2}).

As shown in fig. 3, a specific example is given. The analysis objects included in the human resource time sequence data knowledge graph are Li Si, Wang Wu and Zhao Qi; the calculation index is management cadres; the basic indexes are first branch company, second branch company, third branch company, team, department manager, branch company manager and assessment result; the node attributes are production line experience, rewards and professional skills; and the edges include the following relations: cadre affiliation, job level affiliation, assessment affiliation, appointment time for a post or level, removal time, excellent assessment, period of holding a job title, and the like. For example, the work units of Li Si and Wang Wu were affiliated to the first branch company; the work units of Wang Wu and Zhao Qi were affiliated to the second branch company; and the work units of Zhao Qi were affiliated to the third branch company and the main company. Wang Wu served as group leader and as deputy department manager in the first branch company: he was appointed group leader in February 2013 and removed in May 2018, was appointed deputy manager in May 2018 and removed in July 2022, and was rated excellent in the 2019, 2020 and 2021 assessments. Zhao Qi's appointments as manager start in February 2003; the listed dates are May 2004 (department post in the second branch company), June 2006 (deputy manager of the second branch company), January 2013 (deputy manager of the third branch company), September 2016 (deputy manager of the third branch company), and month 52 of 2016 to March 2022 (deputy manager of the main company). Zhao Qi's assessment results for 2019, 2020 and 2021 were competent, competent and excellent, respectively.

In embodiment 3, information retrieval of human resource data is implemented. To retrieve, within the Wei Jian system, the list of deputy-level entity objects rated excellent for three consecutive years, an aggregation operation is performed on the in-degree and time of the second branch company, deputy manager and excellent-assessment nodes in the node list. This yields the deputy manager cadres of the second branch company who were rated excellent for three consecutive years. As shown in fig. 4, the corresponding analysis entity object is Wang Wu.
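A retrieval of this kind can be sketched as a filter over unit, post and assessment relations. The tables below are illustrative assumptions loosely following the Fig. 3 example, and `has_consecutive_excellent` and `select` are hypothetical helper names rather than anything defined in the source.

```python
# Illustrative records loosely following the Fig. 3 example; all values are assumptions.
unit_of = {
    "Li Si": {"first branch"},
    "Wang Wu": {"first branch", "second branch"},
    "Zhao Qi": {"second branch", "third branch"},
}
post_of = {
    "Li Si": {"group leader"},
    "Wang Wu": {"deputy manager"},
    "Zhao Qi": {"deputy manager"},
}
ratings = {
    "Wang Wu": {2019: "excellent", 2020: "excellent", 2021: "excellent"},
    "Zhao Qi": {2019: "competent", 2020: "competent", 2021: "excellent"},
}


def has_consecutive_excellent(by_year, run=3):
    """True if `by_year` contains `run` consecutive calendar years rated excellent."""
    streak, previous = 0, None
    for year in sorted(by_year):
        if by_year[year] == "excellent":
            # Extend the streak only when this year directly follows the last excellent year.
            streak = streak + 1 if previous == year - 1 else 1
            previous = year
            if streak >= run:
                return True
        else:
            streak, previous = 0, None
    return False


def select(unit, post, run=3):
    """Persons in `unit` holding `post` with `run` consecutive excellent ratings."""
    return sorted(
        o for o in ratings
        if unit in unit_of.get(o, set())
        and post in post_of.get(o, set())
        and has_consecutive_excellent(ratings[o], run)
    )


result = select("second branch", "deputy manager")
```

With these sample records the filter returns only Wang Wu, which agrees with the result stated for fig. 4.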

Example 4

In example 4, a method is provided for performing data analysis on a human resource time sequence knowledge graph expressed as G = (C, B, O, R, T, P), where the analysis object O represents a person, the calculation index C represents a post, T represents the time the person holds the post, and P represents the attributes of the person and the post. A heterogeneous graph of persons, posts and attributes is constructed.

Several time sequences are generated for each person object o(i) by random walk. These sequences imply the association between personnel and posts. The distributional hypothesis is applied: the personnel requirements of a post can be reflected by the posts that precede and follow it in the tenure sequence. A method based on BERT and a pre-trained model uses the personnel and post feature representations to predict context nodes. The feature vector of each post is randomly initialized and continuously optimized during model training until the context nodes can be predicted accurately. At that point, the node feature vectors already encode the association structure of the graph. Because the calculation indexes of the heterogeneous network are of multiple types and the edges vary over time, a random walk along paths based on the post sequence is adopted, which can generate semantically meaningful heterogeneous neighbors for the various node types. Given the tenure sequence, a random walk starting from a node of type c(j) only visits a node of type c(j+1) in the next step. The random walk guided by the meta-path generates node sequences on the network, which are input into the model for training.
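The constraint that a walk at c(j) may only move to c(j+1) can be sketched as follows. This is a simplified illustration, not the patented procedure: the tenure sequences and post names are hypothetical, successors are drawn uniformly from observed transitions, and the model training step (BERT or otherwise) is not shown.

```python
import random
from collections import defaultdict

# Hypothetical tenure sequences: each person's posts in chronological order.
tenures = {
    "o1": ["clerk", "section deputy", "section head"],
    "o2": ["clerk", "section head", "division deputy"],
    "o3": ["section deputy", "section head", "division deputy"],
}


def build_successors(sequences):
    """Map each post c(j) to the posts observed directly after it, c(j+1)."""
    successors = defaultdict(list)
    for posts in sequences.values():
        for current, following in zip(posts, posts[1:]):
            successors[current].append(following)
    return successors


def guided_walk(start, successors, max_len, rng):
    """Walk from `start`, at each step moving only to an observed successor post."""
    walk = [start]
    while len(walk) < max_len and successors.get(walk[-1]):
        walk.append(rng.choice(successors[walk[-1]]))
    return walk


successors = build_successors(tenures)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [
    guided_walk(posts[0], successors, 4, rng)
    for posts in tenures.values()
    for _ in range(2)
]
```

The resulting `walks` play the role of the node sequences that would be fed to the sequence model for training.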

The tenure time sequence G of person o(i) is expressed as: G(o(i)): <o(i), c(1), t(1)> -> <o(i), c(2), t(2)> -> ... -> <o(i), c(j), t(k)>, where i is the personnel ID, and j and k are the sequence numbers of the person's posts and tenure times, respectively.
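Building such a sequence from unordered tenure records amounts to filtering by person and sorting by time. The record type `Tenure`, the sample records and the `render` helper below are assumptions introduced only to illustrate the notation.

```python
from datetime import date
from typing import NamedTuple


class Tenure(NamedTuple):
    person: str  # o(i)
    post: str    # c(j)
    start: date  # t(k)


# Hypothetical, unordered tenure records.
records = [
    Tenure("o1", "section head", date(2018, 5, 1)),
    Tenure("o2", "clerk", date(2011, 9, 1)),
    Tenure("o1", "clerk", date(2010, 3, 1)),
    Tenure("o1", "section deputy", date(2013, 2, 1)),
]


def tenure_sequence(person, all_records):
    """Return G(o(i)) as a chronologically ordered list of (o(i), c, t) triples."""
    own = [r for r in all_records if r.person == person]
    return sorted(own, key=lambda r: r.start)


def render(sequence):
    """Format the sequence in the <o, c, t> -> <o, c, t> notation."""
    return " -> ".join(
        f"<{r.person}, {r.post}, {r.start.isoformat()}>" for r in sequence
    )


seq = tenure_sequence("o1", records)
text = render(seq)
```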

Because both personnel and posts are nodes in the graph, the above process yields feature vectors for both, and person-post relation analysis can then be performed. The core of person-post relation analysis is computing the similarity between the feature vectors of personnel nodes and post nodes. The degree of person-post matching is quantified by the cosine similarity or inner product of the corresponding vectors. For post recommendation, a classification model is pre-trained on the feature vectors using person-post relation data, so that potential person-post links in the network, that is, post fit, can be judged.
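The matching step can be illustrated with a small pure-Python sketch. The vectors below are made-up stand-ins for learned feature vectors, and `rank_posts` is a hypothetical helper; the classification model mentioned above is not shown.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    return inner_product(u, v) / norm if norm else 0.0


# Hypothetical learned feature vectors for one person and two posts.
person_vecs = {"Zhang San": [0.9, 0.1, 0.3]}
post_vecs = {"post A": [0.8, 0.2, 0.4], "post B": [-0.5, 0.9, 0.0]}


def rank_posts(person):
    """Order posts by descending cosine similarity to the person's vector."""
    scores = {
        post: cosine_similarity(person_vecs[person], vec)
        for post, vec in post_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_posts("Zhang San")
```

Cosine similarity ignores vector length, while the inner product also rewards larger norms; which one suits person-post matching depends on how the vectors were trained.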

As shown in fig. 5, a specific example is given. The analysis objects included in the human resource time sequence knowledge graph are Liu Yi, Qian Er, Zhang San, Li Si, Wang Wu, Sun Liu and Zhao Qi; the calculation indexes are posts, including organization associate department, Willow Town auxiliary place, Wei Jian principal place, state-owned enterprise general manager, district straight unit auxiliary office, and the like; the basic indexes comprise the organization department, the state education commission, the Wei Jian commission, Willow Town, district straight units and the like, as well as minor departments, major departments and the like; and the edges include the timing relationships between calculation indexes.

In embodiment 4, a person-post relationship heterogeneous graph is constructed from the human resource time sequence knowledge graph so that information recommendation can be implemented through contrastive learning. For example, the heterogeneous graph is N = {person, post, tenure time}, where a person is w = (workerid, properties), with attribute values including characteristic information such as age, education, ethnicity and political affiliation; a post is s = (roleid, properties), with attribute values including characteristic information such as category and level; and a tenure time is t = (workerid, roleid, duration), where duration is converted from begindate and enddate.
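The three record types can be sketched as Python dataclasses. The class names, the sample values and the choice of years as the unit for `duration` are assumptions; the source only states that duration is derived from begindate and enddate.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Worker:
    workerid: str
    properties: dict = field(default_factory=dict)  # e.g. age, education, ethnicity


@dataclass
class Role:
    roleid: str
    properties: dict = field(default_factory=dict)  # e.g. category, level


@dataclass
class TenureEdge:
    workerid: str
    roleid: str
    begindate: date
    enddate: date

    @property
    def duration(self):
        """Tenure length in years (assumed unit), derived from begindate and enddate."""
        return (self.enddate - self.begindate).days / 365.25


# Hypothetical sample data.
worker = Worker("w1", {"ethnicity": "Han"})
role = Role("r1", {"category": "management"})
edge = TenureEdge("w1", "r1", date(2015, 1, 1), date(2018, 1, 1))
graph = {"persons": [worker], "posts": [role], "tenures": [edge]}
```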

As shown in FIG. 6, a person-post relationship heterogeneous graph is constructed using part of the data of FIG. 5. It contains the person objects Zhang San, Li Si and Wang Wu; the post objects include a group principal, Ma Zhen auxiliary departments, state-owned enterprise general manager, regional straight unit auxiliary departments and the like; the attribute of a person is ethnicity, and the attribute of a post is category.

According to the human resource time sequence data analysis method of the embodiments of the invention, data analysis is carried out on the established human resource time sequence data knowledge graph, and the synchronous update of analysis fact data and analysis index data can be realized dynamically. This effectively improves the flexibility and effectiveness of data modeling and gives the method stronger adaptability to changes and updates of indexes, measures and fact data over time. In particular, for scenarios in which indexes change on demand and human resource business data change at any time, the knowledge-graph-based method adapts to the requirements and changes of the data sources through incremental updating, which remarkably improves the adaptability and flexibility of human resource time sequence data analysis. The method therefore has good application prospects in the technical field of human resource time sequence data analysis.

The technical details disclosed in the technical scheme and embodiments of the application merely illustrate the inventive concept of the application and do not limit its technical scheme. All conventional changes, substitutions or combinations of the disclosed technical details share the same inventive concept as the application and fall within the scope of the claims of the application.

Claims

1. The human resource data analysis method based on the time sequence knowledge graph is characterized by comprising the following steps of:

S101: establishing a human resource time sequence data knowledge graph, wherein the knowledge graph comprises nodes, edges and attributes of the nodes; the nodes comprise basic indexes, calculation indexes and analysis objects; a basic index is an index classification that cannot be decomposed, a calculation index is an index classification formed by combining a plurality of basic indexes, and an analysis object is a member of the human resource management system; the edges are the relations between the nodes, and comprise the affiliation relation between an analysis object and a basic index, the calculation relation between a calculation index and a basic index, and the calculation relation between a calculation index and an analysis object;

S102: synchronously analyzing the fact data, and synchronizing the changed fact data to the time sequence data knowledge graph of the human resources;

S103: synchronously analyzing index data, and synchronizing the added index data into the time sequence data knowledge graph of the human resources;

S104: establishing a data analysis method and obtaining a data analysis result; specifically, the data analysis method is a matching recommendation method based on multiple indexes of a specific object, and comprises the following steps:

inputting the target object index sequence into a sequence prediction model, and predicting the optimal result of the index updating option of the target object through the matching degree;

Performing data analysis on a human resource time sequence knowledge graph represented by G = (C, B, O, R, T, P), wherein the analysis object O represents a person, the calculation index C represents a post, T represents the time of the person in the post, P represents the attributes of the person and the post, B represents a basic index, and R represents a relation between nodes;

Constructing a heterogeneous graph of the person post and the attribute;

generating several time sequences of the person object o(i) by random walk;

using a method based on BERT and a pre-training model, in which the personnel and post feature representations are used to predict the context nodes;

given a tenure sequence, starting a random walk from a node of type c(j) and accessing only a node of type c(j+1) in the next step;

generating a node sequence on the network by using the random walk guided by the meta-path, and inputting the node sequence into a model for training;

the tenure time sequence G of person o(i) is expressed as: G(o(i)): <o(i), c(1), t(1)> -> <o(i), c(2), t(2)> -> ... -> <o(i), c(j), t(k)>; wherein i is the personnel ID, and j and k are the sequence numbers of the person's posts and tenure times, respectively;

and obtaining the characteristic representation vector of the personnel and the posts by using the process, and analyzing the personnel post relation.

2. The time series knowledge graph-based human resource data analysis method according to claim 1, wherein the time relation between the analysis object and the base index and the time relation between the analysis object and the calculation index represent an effective time range by a start time and an end time.

3. The human resource data analysis method based on the time series knowledge graph according to claim 1, wherein the step S102 specifically includes:

4. The human resources data analysis method based on time series knowledge graph according to claim 1, wherein the step S103 specifically includes:

5. The human resources data analysis method based on the time series knowledge graph according to claim 1, wherein in the step S104, the data analysis method includes a data analysis method based on a basic index and a data analysis method based on a calculation index.

6. The time series knowledge graph-based human resource data analysis method according to claim 5, wherein the basic index-based data analysis method comprises:

and calculating the statistical result of the basic index by counting.

7. The time-series knowledge graph-based human resource data analysis method according to claim 5, wherein the calculation index-based data analysis method comprises: