CN116644184B - Human resource information management system based on data clustering

Info

Publication number: CN116644184B
Application number: CN202310933469.XA
Authority: CN
Inventors: 竹甜钿
Original assignee: Zhejiang Houxue Network Technology Co ltd
Current assignee: Zhejiang Houxue Network Technology Co ltd
Priority date: 2023-07-27
Filing date: 2023-07-27
Publication date: 2023-10-20
Anticipated expiration: 2043-07-27
Also published as: CN116644184A

Abstract

The invention relates to the technical field of data processing, and in particular to a human resource information management system based on data clustering. The system comprises: a data acquisition module, used for acquiring the keyword area of each dimension of the human resource information and setting the corresponding dimension weights; a comprehensive advantage analysis module, used for acquiring the non-repeated texts in the keyword areas and acquiring the comprehensive advantage parameter of each dimension in each piece of human resource information; a dimension advantage analysis module, used for acquiring the same-dimension advantage parameter of each dimension according to the hierarchical weights of the same dimension across the human resource information; and a resource information management module, used for performing cluster management on the human resource information by combining the comprehensive advantage parameter of each dimension in each piece of human resource information, the same-dimension advantage parameter, and the dimension weight. By analyzing the text information of the different dimensions of the human resource information together with the advantage within each single dimension, the invention clusters the human resource information more accurately and improves the efficiency of managing it.

Description

Human resource information management system based on data clustering

Technical Field

The invention relates to the technical field of data processing, in particular to a human resource information management system based on data clustering.

Background

Human resource management is the collective term for a series of management activities, such as planning, recruiting, training, evaluating, and motivating human resources, carried out by an enterprise or organization. Talent recruitment is mainly decided by screening the resumes submitted by job seekers, and talent training mainly relies on related training content obtained from the internet. Because the capability levels of job seekers are uneven, assigning posts and hiring job seekers is inefficient, so managing the human resource information of an enterprise is very important.

In the prior art, a latent semantic model is used to obtain word vectors for the different types of keywords in the human resource information; the word vectors of repeated keywords and of non-repeated keywords are each weighted and summed according to their summation factors to obtain a feature vector of the human resource information, and the information is managed based on that feature vector. However, human resource information contains several different types of information, and repeated information may exist across those types. The repeated information prevents the feature vector from representing the human resource information accurately, which makes the clustering result inaccurate and in turn reduces the efficiency of managing the human resource information.

Disclosure of Invention

In order to solve the technical problem that repeated information exists in different types of content of human resource information, so that a clustering result is inaccurate, the invention aims to provide a human resource information management system based on data clustering, and the adopted technical scheme is as follows:

the invention provides a human resource information management system based on data clustering, which comprises:

the data acquisition module is used for dividing each piece of human resource information into at least two dimensions according to different types of preset keywords, and one dimension corresponds to one type of preset keyword; setting dimension weights of all dimensions of each piece of human resource information; acquiring a keyword area of each dimension according to the position of a preset keyword in the human resource information;

the comprehensive advantage analysis module is used for acquiring the non-repeated texts in the keyword areas; and acquiring the comprehensive advantage parameter of each dimension in each piece of human resource information according to how disordered the distribution of the non-repeated texts is in the keyword areas of any dimension and any other dimension, and according to the difference in the numbers of non-repeated texts;

the dimension advantage analysis module is used for acquiring the hierarchical weight of each dimension of each piece of human resource information; according to the hierarchical weight of any one dimension of any one piece of human resource information and the hierarchical weight of the corresponding dimension of other human resource information, the same-dimension advantage parameter of each dimension of each piece of human resource information is obtained;

and the resource information management module is used for performing cluster management on the human resource information by combining the comprehensive advantage parameter of each dimension of each piece of human resource information, the same-dimension advantage parameter, and the dimension weight.

Further, the method for acquiring the keyword region comprises the following steps:

traversing all texts in each piece of human resource information; if no text identical to a given type of preset keyword exists in the human resource information, the keyword area of the dimension corresponding to that type of preset keyword contains no text, and the tag value of that dimension is set to a first tag value;

if a text identical to a given type of preset keyword exists in the human resource information, that text is taken as the keyword text of the dimension corresponding to that type of preset keyword, and the tag value of that dimension is set to a second tag value; in each piece of human resource information, for two adjacent keyword texts, all texts in the area from the first text of the earlier keyword text up to the first text of the later keyword text are taken as the keyword area of the dimension corresponding to the earlier keyword text; and all texts in the area from the first text of the last keyword text to the end of the human resource information are taken as the keyword area of the dimension corresponding to the last keyword text.

Further, the method for acquiring the comprehensive advantage parameter comprises the following steps:

for any two dimensions of each piece of human resource information, substituting the probabilities of the non-repeated texts in the keyword area of one dimension, and then those in the keyword area of the other dimension, into the information entropy formula to obtain the text confusion degree of each of the two dimensions; substituting the probabilities of the non-repeated texts in the two keyword areas taken together into the information entropy formula to obtain the joint text confusion degree of the two dimensions; taking the ratio of the sum of the two text confusion degrees to the joint text confusion degree as the joint influence parameter of the two dimensions;

counting the number of non-repeated texts in each keyword area;

taking the sum of the numbers of non-repeated texts in the keyword areas of any two dimensions as the numerator, and the sum of the absolute value of the difference between those numbers and a preset constant as the denominator, the resulting ratio being the quantity adjustment value of the two dimensions; using the quantity adjustment value as a weight to adjust the joint influence parameter, thereby obtaining the quantity influence parameter of the two dimensions;

when the tag value of a dimension of the human resource information is the first tag value, the comprehensive advantage parameter of that dimension is 0; for any dimension whose tag value is the second tag value, applying a negatively correlated, normalized mapping to the mean of the quantity influence parameters between that dimension and every other dimension to obtain the comprehensive advantage parameter of that dimension.

Further, the method for acquiring the same-dimension advantage parameter comprises the following steps:

taking the difference between the hierarchical weight of any dimension of each piece of human resource information and the mean of the hierarchical weights of the same dimension in the other pieces of human resource information as the same-dimension advantage parameter of that dimension of that piece of human resource information.
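The difference-from-the-mean rule above can be sketched as follows. This is a minimal illustration only: the function name `same_dimension_advantage` and the convention of returning 0 when there is no other resume to compare against are assumptions, not part of the original text.

```python
from typing import List


def same_dimension_advantage(weights: List[float]) -> List[float]:
    """weights[i] is the hierarchical weight of one fixed dimension in resume i.

    Returns, for each resume, its weight minus the mean weight of the
    same dimension over all OTHER resumes.
    """
    n = len(weights)
    if n < 2:
        # Assumed convention: with nothing to compare against, the advantage is 0.
        return [0.0] * n
    total = sum(weights)
    return [w - (total - w) / (n - 1) for w in weights]


# Hypothetical hierarchical weights of one dimension in three resumes.
advantages = same_dimension_advantage([0.5, 0.3, 0.1])
```

A positive value means the resume is stronger than the others in that dimension, and a negative value means it is weaker.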

Further, the method for performing cluster management on the human resource information by combining the comprehensive advantage parameter, the same-dimension advantage parameter, and the dimension weight of each dimension of each piece of human resource information comprises the following steps:

normalizing the comprehensive advantage parameter and the same-dimension advantage parameter respectively to obtain a normalized comprehensive advantage parameter and a normalized same-dimension advantage parameter;

taking a preset first adjustment coefficient as the weight of the normalized comprehensive advantage parameter and a preset second adjustment coefficient as the weight of the normalized same-dimension advantage parameter, and computing the weighted sum of the two normalized parameters for each dimension of each piece of human resource information to obtain the weighted influence parameter of that dimension; multiplying the weighted influence parameter of each dimension of each piece of human resource information by the dimension weight of that dimension to obtain the final dimension value of that dimension;

accumulating the final dimension values of all dimensions of each piece of human resource information to obtain the clustering parameter of that piece of human resource information;

clustering the clustering parameters by using a clustering algorithm to obtain a preset number of clusters; and displaying the human resource information corresponding to the clustering parameters in the different clusters.
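The steps above can be sketched as follows, under stated assumptions: min-max normalization over all resumes and dimensions, adjustment coefficients of 0.5 each, and a small one-dimensional k-means stand in for the unspecified normalization, coefficients, and clustering algorithm. All function names are hypothetical.

```python
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Min-max normalization to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def clustering_parameters(Z: List[List[float]], T: List[List[float]],
                          w: List[float], a: float = 0.5, b: float = 0.5) -> List[float]:
    """Z[r][m] / T[r][m]: comprehensive / same-dimension advantage of
    dimension m in resume r; w[m]: dimension weight; a, b: adjustment
    coefficients (0.5 is an assumed default)."""
    M = len(w)
    zn = min_max([z for row in Z for z in row])
    tn = min_max([t for row in T for t in row])
    params = []
    for r in range(len(Z)):
        total = 0.0
        for m in range(M):
            weighted = a * zn[r * M + m] + b * tn[r * M + m]  # weighted influence parameter
            total += weighted * w[m]                          # final dimension value
        params.append(total)                                  # clustering parameter
    return params


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[int]:
    """Tiny 1-D k-means with deterministic, evenly spaced initial centers."""
    srt = sorted(values)
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels


# Hypothetical values for two resumes with two dimensions each.
params = clustering_parameters([[1.0, 0.0], [0.0, 1.0]],
                               [[0.0, 0.0], [1.0, 1.0]],
                               w=[0.5, 0.5])
labels = kmeans_1d([0.1, 0.12, 0.9, 0.95], k=2)
```

Because each resume is reduced to a single clustering parameter, the clustering itself runs on one-dimensional data regardless of how many dimensions a resume has.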

Further, the method for acquiring the non-repeated text comprises the following steps:

and removing repeated texts in each keyword area by using a text de-duplication algorithm to obtain non-repeated texts in the corresponding keyword area.

Further, the method for acquiring the hierarchical weight comprises the following steps:

when the tag value of a dimension of the human resource information is the first tag value, the hierarchical weight of that dimension is 0;

and, based on the preset screening content of each dimension, analyzing the texts in the keyword areas of the same dimension across all the human resource information by using the analytic hierarchy process to obtain the hierarchical weight of each dimension of each piece of human resource information.

The invention has the following beneficial effects:

In the embodiment of the invention, human resource information is clustered on the basis of the text information of each dimension, so obtaining the keyword area of each dimension is the prerequisite for cluster analysis. Repeated information across the dimensions, that is, across the keyword areas of different types, provides no effective information for clustering and degrades the clustering effect. The disorder of the distribution of non-repeated texts in the keyword areas of different dimensions of the same piece of human resource information reflects how the dimensions influence one another's information; because differences in the numbers of non-repeated texts between dimensions may also affect the clustering effect, the influence between dimensions is adjusted by that difference, so that the comprehensive advantage parameter reflects the advantage information of a dimension more accurately. The hierarchical weight represents how well the content of a dimension in the human resource information matches the enterprise's related content; comparing the hierarchical weights of the same dimension across all human resource information lets the same-dimension advantage parameter represent the advantage of a single dimension more accurately. The comprehensive advantage parameter, obtained from the mutual influence of repeated content among the dimensions of the same resume, measures the comprehensive capability of each dimension in the resume; the same-dimension advantage parameter represents the importance of the same dimension across different human resource information; and the dimension weight measures how important the information of each dimension is to the enterprise's related content. Combining these three factors characterizes the human resource information accurately, so that clustering based on these characteristics avoids the curse of dimensionality, increases the accuracy of human resource information clustering, and further improves the efficiency of managing the human resource information.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a system block diagram of a human resource information management system based on data clustering according to an embodiment of the present invention.

Detailed Description

To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features, and effects of the human resource information management system based on data clustering according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The invention addresses the following specific scenario: when human resource information is managed, it often needs to be classified, but when it is clustered with the existing K-means clustering algorithm, the curse of dimensionality easily arises because the information contains too many different types of content, making the clustering result inaccurate. The invention quantifies resume information according to the amount of information carried and the advantage characteristics of personal ability, and on that basis combines the different types of information in a resume to generate a clustering parameter for each resume, thereby achieving cluster management of human resource information.

The following specifically describes a specific scheme of the human resource information management system based on data clustering provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a system block diagram of a human resource information management system based on data clustering according to an embodiment of the present invention is shown, where the system includes: the system comprises a data acquisition module 101, a comprehensive advantage analysis module 102, a dimension advantage analysis module 103 and a resource information management module 104.

The data acquisition module 101 is configured to divide each piece of human resource information into at least two dimensions according to preset keywords in different categories, where one dimension corresponds to one type of preset keyword; setting dimension weights of all dimensions of each piece of human resource information; and acquiring a keyword area of each dimension according to the position of the preset keyword in the human resource information.

Talent recruitment has an extremely important effect on bringing fresh blood into businesses and companies. Recruitment is mainly decided by screening the resumes submitted by job seekers. For a company, the capability levels of job seekers are uneven and job seekers are numerous, so assigning posts and hiring job seekers is inefficient; classifying resumes is therefore very important for an enterprise's human resource information management.

The resume information collected by enterprises is unstructured and chaotic, and clustering is easily affected by redundant, irrelevant information, which increases the amount of computation in the overall analysis of each resume and in obtaining its clustering parameter. Multidimensional information therefore needs to be extracted from each resume so that the resume information has structured, uniform characteristics for analysis. The embodiment of the invention performs cluster analysis on resumes that follow a fixed template format.

Specifically, in the embodiment of the invention, the n resumes received by an enterprise in one day are analyzed. Using existing techniques, the title information in each resume in a resume database can be acquired through a text detection algorithm, and by counting the contents of the title information an enterprise can define specific preset keywords according to its own needs, such as educational experience, work experience, and project experience. In the embodiment of the invention, the enterprise defines M preset keywords, each corresponding to one dimension of a resume, so each resume has M dimensions.

Because an enterprise attaches different importance to each preset keyword when recruiting, that is, values each to a different degree, a dimension weight needs to be set for each dimension of each resume. The specific dimension weights can be determined from research by the enterprise's human resources department, and implementers can set them according to actual conditions. In the invention, the preset keywords corresponding to all dimensions are treated as equally important for enterprise recruitment, so the dimension weights of all dimensions in a resume are set to be equal, namely $w_1 = \cdots = w_m = \cdots = w_M$, where $w_1$ is the dimension weight of the 1st dimension in the resume, $w_m$ is the dimension weight of the $m$-th dimension in the resume, and $M$ is the number of dimensions in each resume.

Clustering the resume is based on text information corresponding to each dimension, and the keyword area of the dimension in the resume is the premise of performing cluster analysis on the resume.

Preferably, the method for acquiring the keyword area comprises the following steps: traversing all texts in each piece of human resource information; if no text identical to a given type of preset keyword exists in the human resource information, the keyword area of the dimension corresponding to that type of preset keyword contains no text, and the tag value of that dimension is set to a first tag value; if a text identical to a given type of preset keyword exists in the human resource information, that text is taken as the keyword text of the dimension corresponding to that type of preset keyword, and the tag value of that dimension is set to a second tag value; in each piece of human resource information, for two adjacent keyword texts, all texts in the area from the first text of the earlier keyword text up to the first text of the later keyword text are taken as the keyword area of the dimension corresponding to the earlier keyword text; and all texts in the area from the first text of the last keyword text to the end of the human resource information are taken as the keyword area of the dimension corresponding to the last keyword text.

As one example, the number of categories of preset keywords equals the number of dimensions in a resume. If no text identical to a given type of preset keyword can be found in the resume, the tag value of the dimension corresponding to that type of preset keyword is set to the first tag value, and the keyword area of that dimension contains no text information; if an identical text is found, the tag value of the corresponding dimension is set to the second tag value, and a keyword area exists for that dimension in the resume. In the embodiment of the invention, the first tag value and the second tag value are set to 0 and 1 respectively, and implementers may choose other tag values. For convenience of description, the text in the resume is replaced by digits. Suppose the text information in the resume is 12135121356789, where the keyword text of the 1st dimension is 1, the keyword text of the 2nd dimension is 4, and the keyword text of the 3rd dimension is 5. Then the keyword area of the 1st dimension, corresponding to keyword text 1, is 121351213, and the tag value of that dimension is the second tag value 1; no text exists in the keyword area of the 2nd dimension, corresponding to keyword text 4, and the tag value of that dimension is the first tag value 0; the keyword area of the 3rd dimension, corresponding to keyword text 5, is 56789, and the tag value of that dimension is the second tag value 1. Because each preset keyword is chosen as a brief description of some part of the resume content, keywords cannot be adjacent texts or the last text of the resume, so every keyword found in the resume has a corresponding keyword area.

So far, the keyword area of each dimension of each resume is obtained.
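The keyword-area rule can be sketched as follows. This is only an illustration under assumptions: each keyword is located at its first occurrence in the resume text, the resume is a single string, and the function name `keyword_regions` and the sample resume are hypothetical.

```python
from typing import Dict, List, Tuple


def keyword_regions(text: str, keywords: List[str]) -> Dict[str, Tuple[int, str]]:
    """Return {keyword: (tag, area)} for one resume.

    tag is 0 (first tag value) when the keyword is absent, otherwise 1
    (second tag value); area is the text belonging to that keyword.
    """
    # Assumption: a keyword is anchored at its first occurrence in the text.
    hits = sorted((text.find(k), k) for k in keywords if k in text)
    result = {k: (0, "") for k in keywords}  # default: absent, empty area
    for idx, (start, kw) in enumerate(hits):
        # An area runs up to the start of the next keyword text,
        # or to the end of the resume for the last keyword text.
        end = hits[idx + 1][0] if idx + 1 < len(hits) else len(text)
        result[kw] = (1, text[start:end])
    return result


resume = "Education: BSc in Statistics. Work Experience: 3 years as analyst."
regions = keyword_regions(
    resume, ["Education", "Project Experience", "Work Experience"]
)
```

Here "Project Experience" does not occur in the resume, so its dimension receives tag value 0 and an empty keyword area, while the other two dimensions receive tag value 1.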

The comprehensive advantage analysis module 102 is configured to obtain a text that is not repeated in the keyword region; and acquiring comprehensive advantage parameters of each dimension in each piece of human resource information according to the distribution confusion condition of the non-repeated text in the keyword area of any dimension and any other dimension and the difference of the number of the non-repeated text.

The more information a resume carries, the better it helps the enterprise understand the job seeker; and the more advantages a resume has over other resumes in its different dimensions, the better it shows the gap between job seekers. Such a resume is a good resume from the standpoint of human resource management.

The information carried in a resume may contain repetitions, and information repeated across the dimensions of a resume degrades the clustering effect, so the non-repeated texts in the keyword areas need to be analyzed. The non-repeated texts are acquired as follows: repeated texts in each keyword area are removed with a text de-duplication algorithm to obtain the non-repeated texts of that keyword area. In the embodiment of the invention, the minimum hash (MinHash) algorithm is selected to de-duplicate the texts of each keyword area. The MinHash algorithm is well known to those skilled in the art and is not described here.
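A minimal sketch of MinHash-based de-duplication is shown below. The text does not specify how texts are shingled, how many hash functions are used, or what similarity threshold counts as a repeat, so character 3-grams, 64 salted BLAKE2b hashes, and a threshold of 0.9 are all assumptions, as are the function names.

```python
import hashlib
from typing import List, Set

NUM_HASHES = 64  # signature length (assumed value)


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of a text; a text shorter than k yields itself."""
    return {text[i:i + k] for i in range(len(text) - k + 1)} or {text}


def signature(text: str) -> List[int]:
    """MinHash signature: per salted hash, the minimum value over all shingles."""
    grams = shingles(text)
    sig = []
    for salt in range(NUM_HASHES):
        salt_bytes = salt.to_bytes(8, "big")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8,
                                salt=salt_bytes).digest(), "big")
            for g in grams
        ))
    return sig


def similarity(a: List[int], b: List[int]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(texts: List[str], threshold: float = 0.9) -> List[str]:
    """Keep a text only if it is not a (near-)repeat of an already kept text."""
    kept, kept_sigs = [], []
    for t in texts:
        sig = signature(t)
        if all(similarity(sig, s) < threshold for s in kept_sigs):
            kept.append(t)
            kept_sigs.append(sig)
    return kept


unique = deduplicate(["python developer", "managed sales team", "python developer"])
```

Identical texts always produce identical signatures, so exact repeats are removed; the threshold controls how aggressively near-repeats are dropped as well.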

When an enterprise clusters resumes, it wants the information carried in each resume to be as comprehensive and detailed as possible. However, because job seekers' personal histories differ, a large amount of repeated information may exist among the contents of different dimensions of a resume, so that certain dimensions cannot provide additional effective information during clustering and the dimensions easily interfere with one another.

Repeated texts in a keyword area easily prevent a dimension from providing additional effective information during resume clustering. The disorder of the distribution of non-repeated texts in the keyword areas of different dimensions of the same resume reflects how the dimensions influence one another's information. Differences in the numbers of non-repeated texts between dimensions may also affect the clustering effect, so the influence between dimensions is adjusted by those differences, allowing the comprehensive advantage parameter to reflect the advantage information of a dimension more accurately.

Preferably, the method for acquiring the comprehensive advantage parameter comprises the following steps: for any two dimensions of each piece of human resource information, substituting the probabilities of the non-repeated texts in the keyword area of one dimension, and then those in the keyword area of the other dimension, into the information entropy formula to obtain the text confusion degree of each of the two dimensions; substituting the probabilities of the non-repeated texts in the two keyword areas taken together into the information entropy formula to obtain the joint text confusion degree of the two dimensions; taking the ratio of the sum of the two text confusion degrees to the joint text confusion degree as the joint influence parameter of the two dimensions; counting the number of non-repeated texts in each keyword area; taking the sum of the numbers of non-repeated texts in the keyword areas of any two dimensions as the numerator, and the sum of the absolute value of the difference between those numbers and a preset constant as the denominator, the resulting ratio being the quantity adjustment value of the two dimensions; using the quantity adjustment value as a weight to adjust the joint influence parameter, thereby obtaining the quantity influence parameter of the two dimensions; when the tag value of a dimension of the human resource information is the first tag value, the comprehensive advantage parameter of that dimension is 0; and, for any dimension whose tag value is the second tag value, applying a negatively correlated, normalized mapping to the mean of the quantity influence parameters between that dimension and every other dimension to obtain the comprehensive advantage parameter of that dimension.

As an example, with the texts in a resume again replaced by numbers for ease of understanding, assume the texts in the keyword area of the $m$-th dimension are 1, 2, 3, 4, 5 and the texts in the keyword area of the $j$-th dimension are 2, 3, 4, 5, 6. The numbers of non-repeated texts in the keyword areas of the $m$-th and $j$-th dimensions are then 5 and 5, respectively; the non-repeated texts across the two keyword areas are 1, 2, 3, 4, 5, 6, so their number is 6. The more repeated texts the two keyword areas share, the smaller the joint text confusion degree of the two dimensions and the larger the joint influence parameter, indicating that the $j$-th dimension has a greater effect on the $m$-th dimension. When the $m$-th dimension exists and the $j$-th dimension does not, the information entropy of the non-repeated texts of the two keyword areas equals that of the keyword area of the $m$-th dimension, and the joint influence parameter $L_{m,j}$ of the two dimensions equals 1; when the text contents of the two keyword areas are identical, $L_{m,j}$ equals 2. When the numbers of non-repeated texts in the two keyword areas differ greatly, the influence on the resume as a whole is small even if the joint influence parameter is large. For example, if the number of non-repeated texts $n_m$ in the area of the $m$-th dimension is far larger than the number $n_j$ in the area of the $j$-th dimension, then even if the non-repeated texts of the keyword area of the $j$-th dimension are a subset of those of the $m$-th dimension, it cannot be concluded that the $j$-th dimension strongly influences the $m$-th dimension. Therefore, on the basis of the joint influence parameter $L_{m,j}$, the difference between the numbers of non-repeated texts in the two areas is used as a constraint to obtain the quantity influence parameter $Y_{m,j}$ of the two dimensions. The smaller the difference between the numbers of non-repeated texts in the areas of the $j$-th and $m$-th dimensions, the larger $Y_{m,j}$, the more seriously the $j$-th dimension influences the $m$-th dimension, and the weaker the comprehensiveness of the $m$-th dimension.

It should be noted that when the tag value of the $m$-th dimension of a resume is the first tag value 0, the $m$-th preset keyword has no identical text in the resume, that is, the $m$-th dimension has no keyword area and therefore no comprehensive advantage, so its comprehensive advantage parameter is set to 0. For an $m$-th dimension whose tag value is the second tag value 1, the quantity influence parameters between the $m$-th dimension and the other dimensions of the resume are acquired, and a negatively correlated, normalized mapping is applied to their mean: the more seriously the other dimensions influence the content of the $m$-th dimension, the worse its comprehensiveness, so after the negatively correlated mapping a larger comprehensive advantage parameter $Z_m$ indicates a stronger comprehensive capability of the $m$-th dimension.

The comprehensive advantage parameter of the m-th dimension of the resume is obtained by combining the distribution confusion of the non-repeated texts in the keyword regions of the m-th dimension and the other dimensions with the difference in the numbers of non-repeated texts. The calculation formula of the comprehensive advantage parameter is as follows:

$$Z_m = \exp\left(-\frac{1}{M-1}\sum_{i=1,\,i\neq m}^{M} Y_{m,i}\right),\qquad Y_{m,i} = \frac{R_{m,i}}{\left|a_m - a_i\right| + \varepsilon},\qquad R_{m,i} = \frac{\sum_{k=1}^{a_m} p_{m,k}\log_2 p_{m,k} + \sum_{k=1}^{a_i} p_{i,k}\log_2 p_{i,k}}{\sum_{k=1}^{a_{m,i}} p_{(m,i),k}\log_2 p_{(m,i),k}}$$

where $Z_m$ is the comprehensive advantage parameter of the m-th dimension of each resume; $M$ is the number of dimensions in each resume, namely the number of categories of preset keywords; $Y_{m,i}$ is the number influence parameter of the m-th dimension and the i-th dimension of each resume, where $i \in [1, M]$ and $i \neq m$; $a_m$ is the number of non-repeated texts in the keyword region of the m-th dimension; $a_i$ is the number of non-repeated texts in the keyword region of the i-th dimension; $R_{m,i}$ is the joint influence parameter of the i-th dimension on the m-th dimension; $p_{m,k}$ is the occurrence probability of the k-th non-repeated text in the keyword region of the m-th dimension; $p_{i,k}$ is the occurrence probability of the k-th non-repeated text in the keyword region of the i-th dimension; $a_{m,i}$ is the number of non-repeated texts in the two keyword regions corresponding to the m-th and i-th dimensions; $p_{(m,i),k}$ is the occurrence probability of the k-th non-repeated text in the two keyword regions corresponding to the m-th and i-th dimensions; $\varepsilon$ is a preset constant with an empirical value of 1, which prevents the formula from becoming meaningless; $\log_2$ is the logarithm to base 2; $|\cdot|$ is the absolute value function; and $e$, the base of $\exp$, is the natural constant.

It should be noted that the more content the keyword regions of the m-th dimension and the i-th dimension of the resume share, the smaller the joint text confusion of the two dimensions is relative to the sum of their individual text confusions, and therefore the larger the joint influence parameter of the two dimensions. A difference in the numbers of non-repeated texts between the two keyword regions changes how strongly their text contents affect each other, and hence affects the clustering of the whole resume, so the number adjustment value is used to adjust the joint influence parameter and improve its accuracy. The more repeated text the keyword regions of the other dimensions share with the m-th dimension, the larger the number influence parameters, the more seriously the content of the m-th dimension is affected by the other dimensions, and the worse its comprehensive degree. After the negative correlation mapping, a larger comprehensive advantage parameter of the m-th dimension indicates stronger comprehensive capability: the remaining dimensions affect the m-th dimension less through repeated text, and the m-th dimension provides more information during clustering. In the joint influence parameter, the numerator and the denominator are both information entropy expressions, so their negative signs cancel.

The comprehensive advantage parameter of every dimension of each resume is obtained in the same way as for the m-th dimension.
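The calculation described above can be sketched in Python. This is a minimal illustration, not the patented implementation, and it rests on assumptions: occurrence probabilities are relative frequencies, the joint influence parameter is the sum of the two region entropies divided by the entropy of the combined regions, the number influence parameter divides that ratio by the absolute count difference plus the constant 1, and the negative correlation mapping is `exp(-mean)`. All function names are hypothetical, and the handling of a zero joint entropy is a placeholder the text does not specify.

```python
import math
from collections import Counter


def entropy(texts):
    """Shannon entropy (base 2) of the relative frequencies of the texts."""
    if not texts:
        return 0.0
    total = len(texts)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(texts).values())


def joint_influence(region_m, region_i):
    """Assumed form: (H_m + H_i) / H_joint over the concatenated regions."""
    joint = entropy(region_m + region_i)
    if joint == 0.0:
        return 0.0  # assumption: the text does not cover this degenerate case
    return (entropy(region_m) + entropy(region_i)) / joint


def comprehensive_advantage(regions, m, eps=1.0):
    """Comprehensive advantage of dimension m; regions holds one text list per dimension."""
    if not regions[m]:
        return 0.0  # first label value: dimension m has no keyword region
    a_m = len(set(regions[m]))
    influences = []
    for i, region_i in enumerate(regions):
        if i == m:
            continue
        a_i = len(set(region_i))
        influences.append(joint_influence(regions[m], region_i)
                          / (abs(a_m - a_i) + eps))
    mean = sum(influences) / len(influences) if influences else 0.0
    return math.exp(-mean)  # negative correlation, normalized mapping


# Texts replaced by numbers, as in the example; the third dimension is absent.
regions = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], []]
z0 = comprehensive_advantage(regions, 0)
```

With identical regions `joint_influence` evaluates to 2, and with an absent second region it evaluates to 1, matching the two boundary cases stated in the description.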

So far, each dimension of each resume has a corresponding comprehensive advantage parameter.

A dimension dominance analysis module 103, configured to obtain a hierarchical weight of each dimension of each piece of human resource information; and acquiring the same-dimensional advantage parameters of each dimension of each piece of human resource information according to the hierarchical weight of any dimension of any piece of human resource information and the hierarchical weight of the corresponding dimension of other human resource information.

When an enterprise screens resumes for professional talent, it is important to identify resumes in which the capability of a particular dimension is outstanding. The hierarchical weight of each dimension in each resume is therefore obtained and used as a measure of personal capability within the same dimension across resumes.

Preferably, the method for acquiring the hierarchical weight is as follows: when the label value of a dimension of the human resource information is the first label value, the hierarchical weight of that dimension is 0; otherwise, based on the preset screening content of each dimension, the texts in the keyword regions of the same dimension of all the human resource information are analyzed using the analytic hierarchy process to obtain the hierarchical weight of each dimension of each piece of human resource information.

As an example, take the keyword "working experience". The keyword region of each dimension in a resume corresponds to a content requirement of the enterprise's post recruitment, and the preset screening content of each dimension is that content requirement. Suppose the content of the keyword region of the "working experience" dimension in the 1st resume is "has 3 years of working experience in the service industry", and the content of the keyword region of the same dimension in the 2nd resume states a given number of years of working experience in the computer industry. Assuming that the preset screening content of the "working experience" dimension in the enterprise's post recruitment requirements is "computer", the analytic hierarchy process is applied with respect to this preset screening content to obtain the weights of the 1st and 2nd resumes relative to the post recruitment requirements, namely the hierarchical weights. The larger the hierarchical weight, the better the fit, so the hierarchical weight of the 1st resume in the "working experience" dimension is lower than that of the 2nd resume. The analytic hierarchy process is a technique well known to those skilled in the art and is not described here.
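The text leaves the analytic hierarchy process to the skilled reader. As one possible illustration, the sketch below derives hierarchical weights for a single dimension from a pairwise comparison matrix using the row geometric mean approximation, a common shortcut for the analytic hierarchy process that the text itself does not prescribe. The function name and the comparison value 3 (the 2nd resume judged three times as suitable as the 1st for the "computer" requirement) are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean method."""
    size = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# pairwise[a][b] states how much better resume a fits the requirement than resume b.
pairwise = [
    [1.0, 1.0 / 3.0],  # 1st resume: service industry experience
    [3.0, 1.0],        # 2nd resume: computer industry experience
]
weights = ahp_weights(pairwise)
```

For this consistent two-by-two matrix the 1st resume receives the lower weight, in line with the example above.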

The same-dimension advantage parameter of a dimension in a resume is obtained by comparing the hierarchical weight of that dimension with the hierarchical weights of the same dimension in the other resumes. Because the hierarchical weight reflects how well the content of the dimension matches the enterprise's recruitment content, this comparison presents the advantage of the dimension in the resume more accurately.

Preferably, the method for acquiring the same-dimension dominant parameter comprises the following steps: and taking the difference value between the hierarchical weight of any dimension of each piece of human resource information and the average value of the hierarchical weights of the same dimension of other human resource information as the same-dimension dominant parameter of the corresponding dimension of each piece of human resource information.

The calculation formula of the same-dimensional dominance parameter of the m dimension of the nth resume is as follows:

$$T_{n,m} = w_{n,m} - \frac{1}{N-1}\sum_{j=1,\,j\neq n}^{N} w_{j,m}$$

where $T_{n,m}$ is the same-dimension advantage parameter of the m-th dimension of the n-th resume; $w_{n,m}$ is the hierarchical weight of the m-th dimension of the n-th resume; $w_{j,m}$ is the hierarchical weight of the m-th dimension of the j-th resume, where $j \in [1, N]$ and $j \neq n$; and $N$ is the total number of resumes acquired by the company in one day.

It should be noted that, if the capability of a certain dimension in a resume is very prominent, the analytic hierarchy process yields a larger hierarchical weight for that dimension. To compare the advantage of the same dimension across multiple resumes, the average of the hierarchical weights of that dimension in the other resumes is calculated. When the information of a dimension of the n-th resume is more prominent than the information of the corresponding dimension of the other resumes, the same-dimension advantage parameter of that dimension is positive and larger; when it is less prominent, the same-dimension advantage parameter is negative and smaller.

And obtaining the same-dimensional dominant parameters of each dimension of each resume according to the calculation method of the same-dimensional dominant parameters of the m dimension of the nth resume.
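The difference-from-the-mean calculation described above can be sketched as follows. The function name and the sample hierarchical weights are hypothetical, and the sketch assumes at least two resumes so that the mean over the other resumes is defined.

```python
def same_dimension_advantage(weights, n):
    """Hierarchical weight of resume n minus the mean weight of all other resumes.

    weights holds the hierarchical weights of one dimension, one entry per resume.
    """
    others = [w for j, w in enumerate(weights) if j != n]
    return weights[n] - sum(others) / len(others)


# Hypothetical hierarchical weights of one dimension for three resumes.
dimension_weights = [0.2, 0.5, 0.8]
advantages = [same_dimension_advantage(dimension_weights, n)
              for n in range(len(dimension_weights))]
```

The resume with the largest hierarchical weight gets a positive value and the one with the smallest gets a negative value, as the note above describes.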

So far, each dimension of each resume has corresponding homodimensional dominant parameters.

The resource information management module 104 is configured to perform cluster management on the human resource information by combining the comprehensive advantage parameter of any dimension of each piece of human resource information, the same-dimension advantage parameter and the dimension weight.

The comprehensive advantage parameter, obtained from how repeated content in different dimensions of the same resume affects each dimension, measures the comprehensive capability of each dimension in the resume. The same-dimension advantage parameter represents the importance of the same dimension across different resumes, and the dimension weight measures how important the information of each dimension is for enterprise recruitment. Any dimension of a resume is analyzed by combining these three factors to obtain its final dimension value, which represents how important the information of the dimension is with respect to the enterprise's requirements.

Preferably, the method for acquiring the final dimension value of each dimension of each resume is as follows: respectively carrying out normalization processing on the comprehensive advantage parameter and the same-dimension advantage parameter to sequentially obtain a normalized comprehensive advantage parameter and a normalized single-dimension advantage parameter; taking a preset first adjustment coefficient as the weight of the normalized comprehensive advantage parameter, taking a preset second adjustment coefficient as the weight of the normalized single-dimensional advantage parameter, and carrying out weighted summation on the normalized comprehensive advantage parameter and the normalized single-dimensional advantage parameter of each dimension of each piece of human resource information to obtain the weighted influence parameter of each dimension of each piece of human resource information; and multiplying the weighted influence parameter of each dimension of each piece of human resource information by the dimension weight of the corresponding dimension to obtain the final dimension value of each dimension of each piece of human resource information.

It should be noted that the preset first adjustment coefficient α and the preset second adjustment coefficient β sum to 1. Their specific values are set according to the actual situation of the enterprise, as follows: if the enterprise values comprehensive capability, α is set larger and β smaller; if the enterprise values outstanding capability in a single field, α is set smaller and β larger; if the enterprise values comprehensive capability and outstanding single-field capability equally, α and β are both set to 0.5. In the embodiment of the invention, α and β take the empirical values 0.5 and 0.5 respectively. In the embodiment of the invention, a normalization function is used to normalize the comprehensive advantage parameter and the same-dimension advantage parameter separately; in other embodiments, other normalization methods, such as function transformation or max-min normalization, may be selected, and no limitation is imposed here.

The dimension weight is adjusted by combining the comprehensive advantage parameter and the same-dimension advantage parameter of each dimension of each resume to obtain the final dimension value of each dimension of each resume. The calculation formula of the final dimension value is as follows:

$$F_{n,m} = Q_{n,m}\times W_{n,m},\qquad W_{n,m} = \alpha\cdot\mathrm{norm}(Z_{n,m}) + \beta\cdot\mathrm{norm}(T_{n,m})$$

where $F_{n,m}$ is the final dimension value of the m-th dimension of the n-th resume; $Z_{n,m}$ is the comprehensive advantage parameter of the m-th dimension of the n-th resume; $T_{n,m}$ is the same-dimension advantage parameter of the m-th dimension of the n-th resume; $Q_{n,m}$ is the dimension weight of the m-th dimension of the n-th resume; $\alpha$ is the preset first adjustment coefficient; $\beta$ is the preset second adjustment coefficient; $W_{n,m}$ is the weighted influence parameter of the m-th dimension of the n-th resume; and $\mathrm{norm}$ is the normalization function.

It should be noted that the preset first adjustment coefficient and the preset second adjustment coefficient are set according to how much the enterprise values comprehensive capability and single-field advantage, and the normalized comprehensive advantage parameter and the normalized single-dimensional advantage parameter are then weighted and summed, so that the weighted influence parameter of each dimension in the resume reflects the recruitment requirements of the enterprise. The dimension weight of the dimension is adjusted based on the weighted influence parameter, so that the final dimension value representing the dimension of the resume better matches the recruitment requirements of the enterprise.
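The weighted combination described above can be sketched as follows for a single dimension across several resumes. The sketch assumes max-min normalization over the resumes, one of the alternatives the description allows; the function names, the sample values and the dimension weight of 0.5 are hypothetical.

```python
def min_max(values):
    """Max-min normalization to [0, 1]; all-equal inputs map to 0 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_dimension_values(comprehensive, same_dim, dim_weight,
                           alpha=0.5, beta=0.5):
    """Final value of one dimension for every resume.

    comprehensive and same_dim hold that dimension's comprehensive advantage
    and same-dimension advantage parameters, one entry per resume.
    """
    z = min_max(comprehensive)
    t = min_max(same_dim)
    return [dim_weight * (alpha * zi + beta * ti) for zi, ti in zip(z, t)]


# Hypothetical parameters of one dimension for three resumes.
final_values = final_dimension_values(
    comprehensive=[0.2, 0.4, 0.6],
    same_dim=[-0.45, 0.0, 0.45],
    dim_weight=0.5,
)
```

Raising `alpha` relative to `beta` favours resumes with broad, non-overlapping content, while raising `beta` favours resumes that stand out in a single dimension.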

The final dimension value presents the importance of the dimension information with respect to enterprise requirements. The clustering parameter obtained from the final dimension values takes into account both the comprehensive advantage and the single-dimension advantage of each dimension in the resume, which improves how accurately the clustering parameter represents the features of the resume. The clustering parameter is obtained as follows: the final dimension values of all dimensions of each piece of human resource information are accumulated to obtain the clustering parameter of that piece of human resource information. The calculation formula of the clustering parameter of each resume is as follows:

$$C_n = \sum_{m=1}^{M} F_{n,m}$$

where $C_n$ is the clustering parameter of the n-th resume; $F_{n,m}$ is the final dimension value of the m-th dimension of the n-th resume; and $M$ is the number of dimensions of each resume.

It should be noted that, when every dimension of a resume has a corresponding keyword, the larger the final dimension value of each dimension, the more pronounced the clustering feature of the resume; each dimension of the resume then has an advantage, and the resume carries more information.

The clustering parameters are clustered using a clustering algorithm to obtain a preset number of clusters, and the human resource information corresponding to the clustering parameters in the different clusters is displayed. In the embodiment of the invention, the clustering parameters are clustered using the K-means clustering algorithm with K taking the empirical value 3, which an implementer can set according to actual conditions. The clustering parameters are thus divided into 3 categories, that is, the resumes corresponding to the clustering parameters in each cluster form one category, and the resumes of each category are transmitted to a display module in the human resource information management system for display.

The K-means clustering algorithm is a well-known technique for those skilled in the art, and will not be described herein.
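As a rough illustration of the last two steps, the sketch below sums the final dimension values of each resume into a clustering parameter and then groups the resulting one-dimensional values with a plain K-means loop, K = 3. The function names and sample data are hypothetical, the evenly spaced initial centroids are an assumption chosen to keep the run deterministic, and the sketch requires at least K resumes.

```python
def clustering_parameters(final_values_per_resume):
    """Sum the final dimension values of each resume."""
    return [sum(values) for values in final_values_per_resume]


def kmeans_1d(values, k=3, iterations=100):
    """Plain K-means on scalars; returns (labels, centroids)."""
    ordered = sorted(values)
    # Assumption: start from evenly spaced points of the sorted data.
    centroids = [ordered[(2 * c + 1) * len(ordered) // (2 * k)]
                 for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: mean of each cluster; empty clusters keep their centroid.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members
                           else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical final dimension values: six resumes, two dimensions each.
final_values_per_resume = [
    [0.1, 0.1], [0.1, 0.2],
    [1.0, 1.0], [1.1, 1.0],
    [2.5, 2.5], [2.4, 2.6],
]
params = clustering_parameters(final_values_per_resume)
labels, centroids = kmeans_1d(params, k=3)
```

Resumes sharing a label would be passed to the display module as one category.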

The present invention has been completed.

In summary, in the embodiment of the present invention, the data acquisition module is configured to acquire a keyword area of each dimension of the human resource information, and set a corresponding dimension weight; the comprehensive advantage analysis module is used for acquiring non-repeated texts in the keyword area and acquiring comprehensive advantage parameters of each dimension in each piece of human resource information; the dimension advantage analysis module is used for acquiring the same-dimension advantage parameters of each dimension according to the hierarchy weights of the same dimension of the human resource information; and clustering management is carried out on the human resource information by combining comprehensive advantage parameters of any dimension in each piece of human resource information, the same-dimension advantage parameters and the dimension weight. According to the invention, text information of different dimensionalities of the human resource information and the advantages of the same dimensionality are analyzed, so that the human resource information is clustered more accurately, and the management efficiency of the human resource information is improved.

It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. A human resource information management system based on data clustering, the system comprising:

the resource information management module is used for carrying out cluster management on the human resource information by combining the comprehensive advantage parameter of any dimension of each human resource information, the same-dimension advantage parameter and the dimension weight;

the keyword region acquisition method comprises the following steps:

if the same text exists in the human resource information in each type of preset keywords, the text is used as the keyword text of the corresponding dimension of the preset keywords of the corresponding type in the human resource information, and the label value of the corresponding dimension of the preset keywords of the corresponding type is set as a second label value; in each piece of human resource information, all texts in the area from the first keyword text in the front keyword text in the two adjacent keyword texts to the first keyword text in the rear keyword text are used as keyword areas of corresponding dimensions of the front keyword text; all texts in the region from the first keyword text to the end of the human resource information in the last keyword text are used as the keyword region of the corresponding dimension of the last keyword text;

The method for acquiring the comprehensive advantage parameters comprises the following steps:

counting the number of the non-repeated texts in each keyword area;

When the label value of the dimension of the human resource information is a first label value, the comprehensive advantage parameter of the corresponding dimension is 0; carrying out negative correlation and normalized mapping on the average value of the quantity influence parameters of any dimension with the label value being the second label value and any other dimension to obtain the comprehensive advantage parameters of the corresponding dimension;

the method for acquiring the same-dimensional dominant parameters comprises the following steps:

2. The human resource information management system based on data clustering according to claim 1, wherein the method for clustering management of human resource information by combining the comprehensive dominance parameter, the homodimensional dominance parameter and the dimensional weight of any dimension of each piece of human resource information is as follows:

3. The human resource information management system based on data clustering as claimed in claim 1, wherein the non-repeated text obtaining method comprises:

4. The human resource information management system based on data clustering as claimed in claim 1, wherein the hierarchical weight acquisition method comprises: