CN111666377A - Talent portrait construction method and system based on big data modeling - Google Patents

Talent portrait construction method and system based on big data modeling

Info

Publication number
CN111666377A
CN111666377A (application CN202010493764.4A)
Authority
CN
China
Prior art keywords
data
talent
sample
grade
honor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010493764.4A
Other languages
Chinese (zh)
Inventor
杨灵运
杨文峰
张昌福
邓生雄
张磊
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Casicloud Technology Co ltd
Original Assignee
Guizhou Casicloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Casicloud Technology Co ltd filed Critical Guizhou Casicloud Technology Co ltd
Priority to CN202010493764.4A
Publication of CN111666377A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics

Abstract

The invention discloses a talent portrait construction method and system based on big data modeling. The method comprises: obtaining sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model from the talent data set; acquiring voice data of the interviewer and performing textualization processing on the voice data to obtain text data; screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set; and constructing the talent portrait of the image set by using the sample talent data model. The invention overcomes the drawback of existing schemes, in which personnel information is obtained from resumes whose authenticity cannot be verified, so the authenticity of the acquired information is limited, and it can effectively improve the accuracy of talent portraits.

Description

Talent portrait construction method and system based on big data modeling
Technical Field
The invention belongs to the field of big data, relates to a talent portrait technology, and particularly relates to a talent portrait construction method and system based on big data modeling.
Background
A user portrait is a tagged user model abstracted from information such as a user's social attributes, living habits and consumption behaviors. The core work of constructing a user portrait is to tag users in real time, and these tags are highly refined feature identifiers obtained by analyzing the user's information.
The significance of talent portraits is widely recognized and agreed upon. In practice, however, different industries and enterprises differ in cultural background, strategic requirements, concerns and methodologies, so talent portrait projects are applied in different ways under different organizational cultures. The accumulation of data in the field of human resources also differs, and differences in organizational management modes lead to individualized differences in how talent portraits are understood.
A correct and efficient talent portrait construction method and system can help an enterprise accurately identify effective talents and save manpower and material resources. However, because talent portraits are influenced by numerous factors, current talent portrait methods are not accurate or efficient enough, so the personnel obtained through them may not meet the enterprise's requirements and may cause inestimable losses. The present scheme is provided to address the above drawbacks.
Disclosure of Invention
The invention aims to provide a talent portrait construction method and system based on big data modeling.
The technical problem to be solved by the invention is as follows:
(1) How to model with big data: sample talent data is acquired, talent weight refinement is performed on the sample talent data to obtain a talent data set, and a sample talent data model is constructed from the talent data set; the sample talent data model obtained by processing sample data in advance provides strong data support for the subsequent construction of talent portraits and can effectively improve the accuracy and efficiency of talent portraits;
(2) How to construct talent portraits from the acquired data: voice data of the interviewee is acquired and converted into text data, the text data is screened to obtain a screening data set, the screening data set is matched with the talent data set to obtain an image set, and the talent portrait of the image set is constructed by using the sample talent data model; this overcomes the drawback of existing schemes, in which personnel information is obtained from resumes whose authenticity cannot be verified, so the authenticity of the acquired information is limited, and it can effectively improve the accuracy of talent portraits.
The purpose of the invention can be realized by the following technical scheme:
a talent portrait construction method based on big data modeling comprises the following steps:
s1: acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
s2: acquiring voice data of interviewers, and performing textualization processing on the voice data to obtain text data;
s3: screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
s4: and constructing the talent portrait of the portrait set by utilizing the sample talent data model.
Further, the talent weight refining of the sample talent data to obtain a talent data set includes:
s21: acquiring talent weight phrases in the sample talent data, and performing phrase division on the sample talent data to obtain a divided data set, wherein the divided data set comprises a sample college information set, a sample top-100 enterprise set and a sample honor set;
s22: performing grade division on the sample college information set to obtain a college grade set, performing score marking on the college grade set by using a preset score algorithm to obtain a college score set, and combining the college grade set and the college score set to obtain sample college data, the colleges in the college grade set being marked as Gi, i = 1, 2, 3, …, n;
s23: performing grade division on the sample top-100 enterprise set to obtain an enterprise grade set, performing score marking on the sample top-100 enterprise set by using the preset score algorithm to obtain an enterprise score set, and combining the enterprise grade set and the enterprise score set to obtain sample enterprise data, the enterprises in the enterprise grade set being marked as Qi, i = 1, 2, 3, …, n;
s24: performing grade division on the sample honor set to obtain an honor grade set, performing score marking on the sample honor set by using the preset score algorithm to obtain an honor score set, and combining the honor grade set and the honor score set to obtain sample honor data, the honor items in the honor grade set being marked as Ri, i = 1, 2, 3, …, n;
s25: combining the sample college data, the sample enterprise data and the sample honor data to obtain the talent data set.
Further, performing textualization processing on the voice data to obtain text data includes:
s31, sampling the voice data by using a preset sampling rate and sampling bit depth to obtain a sampling data set;
s32, quantizing the sampling data set to obtain a quantized data set;
s33, pre-emphasis is carried out on the quantized data set to obtain a first feature set;
s34, performing framing and windowing on the first feature set to obtain a second feature set;
s35, performing discrete Fourier transform processing on the second feature set to obtain a third feature set;
and S36, performing textualization processing on the third feature set by using a text feature coefficient algorithm to obtain text data.
Further, matching the screening data set with the talent data set to obtain an image set comprises:
s41: carrying out phrase division on the text data to obtain a phrase data set, and marking phrases in the phrase data set as Wi, i = 1, 2, 3, …, n;
s42: screening the phrase data set according to preset keywords to obtain a screened data set, wherein the screened data set comprises education data, working data and prize winning data, and marking screened phrases in the screened data set as Wij, i = 1, 2, 3, …, n, j = 1, 2, 3, …, n;
s43: comparing the screening data set with the talent data set, and storing phrases with the same comparison result to obtain matching data;
s44: and combining the matched data to obtain an image set.
Further, constructing the talent portrait of the portrait set by using the sample talent data model includes:
s51, matching the image set with the sample talent data model to obtain education data scores, work data scores and prize winning data scores corresponding to the education data, the work data and the prize winning data in the image set;
s52, matching the education data score, the work data score and the prize winning data score with a preset score grade to obtain the education grade data, the work grade data and the prize winning grade data;
and S53, dividing and combining the education grade data, the work grade data and the prize winning grade data according to a preset proportion to obtain the talent portrait of the interviewee.
A talent portrait construction system based on big data modeling comprises a sample data processing module, a voice data processing module, a text data processing module and a portrait construction module;
the sample data processing module is used for acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
the voice data processing module is used for acquiring voice data of interviewers and performing textualization processing on the voice data to obtain text data;
the text data processing module is used for screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
the portrait construction module is used for constructing the talent portrait of the portrait set by utilizing the sample talent data model.
The invention has the beneficial effects that:
(1) In one aspect of the invention, sample talent data is acquired, talent weight refinement is performed on the sample talent data to obtain a talent data set, and a sample talent data model is constructed from the talent data set; the sample talent data model obtained by processing the sample data in advance provides strong data support for the subsequent construction of talent portraits and can effectively improve the accuracy and efficiency of talent portraits;
(2) In another aspect, the voice data of the interviewer is acquired and converted into text data, the text data is screened to obtain a screening data set, the screening data set is matched with the talent data set to obtain an image set, and the talent portrait of the image set is constructed by using the sample talent data model; this overcomes the drawback of existing schemes, in which personnel information is obtained from resumes whose authenticity cannot be verified, so the authenticity of the acquired information is limited, and it can effectively improve the accuracy of talent portraits.
Drawings
In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is a flow chart of a talent portrait construction method based on big data modeling according to the present invention.
Detailed Description
As shown in FIG. 1, a talent portrait construction method based on big data modeling includes:
s1: acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
s2: acquiring voice data of interviewers, and performing textualization processing on the voice data to obtain text data;
s3: screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
s4: and constructing the talent portrait of the portrait set by utilizing the sample talent data model.
Performing talent weight refinement on the sample talent data to obtain the talent data set includes:
s21: acquiring talent weight phrases in the sample talent data, and performing phrase division on the sample talent data to obtain a divided data set, wherein the divided data set comprises a sample college information set, a sample top-100 enterprise set and a sample honor set;
s22: performing grade division on the sample college information set to obtain a college grade set, performing score marking on the college grade set by using a preset score algorithm to obtain a college score set, and combining the college grade set and the college score set to obtain sample college data, the colleges in the college grade set being marked as Gi, i = 1, 2, 3, …, n;
s23: performing grade division on the sample top-100 enterprise set to obtain an enterprise grade set, performing score marking on the sample top-100 enterprise set by using the preset score algorithm to obtain an enterprise score set, and combining the enterprise grade set and the enterprise score set to obtain sample enterprise data, the enterprises in the enterprise grade set being marked as Qi, i = 1, 2, 3, …, n;
s24: performing grade division on the sample honor set to obtain an honor grade set, performing score marking on the sample honor set by using the preset score algorithm to obtain an honor score set, and combining the honor grade set and the honor score set to obtain sample honor data, the honor items in the honor grade set being marked as Ri, i = 1, 2, 3, …, n;
s25: combining the sample college data, the sample enterprise data and the sample honor data to obtain the talent data set.
In the embodiment of the invention, the sample college information set may be a ranking list of the top five hundred colleges, the sample top-100 enterprise set may be a ranking list of the top five hundred enterprises, and the sample honor set may be national-level, provincial-level and city-level honor certificates. When the sample college information set, the sample top-100 enterprise set and the sample honor set are graded, they are divided into grades in blocks of 100 ranks: for example, the interval [1,100) is the first grade, [100,200) is the second grade, [200,300) is the third grade, [300,400) is the fourth grade and [400,500] is the fifth grade, and the grades decrease from the first grade to the fifth grade;
scores are then assigned to each college in the sample college information set, each enterprise in the sample top-100 enterprise set and each honor in the sample honor set by using the preset score algorithm. The colleges are scored in decreasing order from front to back, for example the college ranked first gets 500 points and the college ranked second gets 499 points; the enterprises are likewise scored in decreasing order, for example the enterprise ranked first gets 500 points and the enterprise ranked second gets 499 points; and the honors are scored in decreasing order, for example the honor ranked first gets 500 points, the honor ranked second gets 499 points, and so on;
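For illustration only, the following Python sketch shows one way the grading and scoring just described could be implemented. All names are hypothetical and not part of the patent; the block size of 100 ranks and the 500-point starting score follow the example values above, and the grade boundaries are approximated with simple integer division.

```python
def grade_and_score(ranked_names, block_size=100, top_score=500):
    """Assign each ranked entry a grade per block of `block_size` ranks
    (grade 1 is best) and a score decreasing by one point per rank."""
    result = []
    for rank, name in enumerate(ranked_names, start=1):
        grade = (rank - 1) // block_size + 1  # approximates the [1,100), [100,200), ... intervals above
        score = top_score - (rank - 1)        # rank 1 -> 500 points, rank 2 -> 499 points, ...
        result.append({"name": name, "rank": rank, "grade": grade, "score": score})
    return result

# Illustrative stand-ins for the ranked college, enterprise and honor lists.
talent_data_set = {
    "college":    grade_and_score(["College A", "College B"]),
    "enterprise": grade_and_score(["Enterprise A", "Enterprise B"]),
    "honor":      grade_and_score(["National Award A", "Provincial Award B"]),
}
```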
performing the text processing on the voice data to obtain text data includes:
s31, sampling the voice data by using a preset sampling rate and sampling digit to obtain a sampling data set;
s32, quantizing the sampling data set to obtain a quantized data set;
s33, pre-emphasis is carried out on the quantized data set to obtain a first feature set;
s34, performing framing and windowing on the first feature set to obtain a second feature set;
s35, performing discrete Fourier transform processing on the second feature set to obtain a third feature set;
and S36, performing textualization processing on the third feature set by using a text feature coefficient algorithm to obtain text data.
In the embodiment of the invention, the preset sampling rate and sampling bit depth are 16 kHz and 16 bits, respectively. Sampling converts the continuous sound waveform in the voice data into discrete data points, the data points are stored as amplitude values, and the amplitude values are quantized into integers; quantization here is used in the digital-signal-processing sense;
the pre-emphasis is used to boost the energy of the high-frequency part of the sound; strengthening the high-frequency energy makes the high-frequency formants better usable and thereby improves recognition accuracy. Pre-emphasis can be realized with a first-order high-pass filter: in the time domain, if the input signal is x[n], where n indexes the samples, the filter is expressed as y[n] = x[n] - μ·x[n-1], where the preset coefficient μ takes a value between 0.9 and 1.0, usually 0.97;
framing means grouping N sampling points into one observation unit. In general N is 256 or 512, covering roughly 20-30 ms. To avoid excessive change between two adjacent frames, an overlap region of M sampling points is placed between them, where M is usually 1/2 or 1/3 of N. The sampling frequency adopted for speech recognition is 8 kHz or 16 kHz; at 8 kHz, a 256-sample frame corresponds to a duration of 256/8000 × 1000 = 32 ms;
windowing addresses the fact that everyday sound is generally a non-stationary signal whose statistical characteristics are not constant, although the signal can be considered stationary over a short period of time. A window is described by three parameters: window length, offset and shape. Each windowed sound segment is called a frame, the number of milliseconds in each frame is called the frame length, and the distance between the left boundaries of two adjacent frames is called the frame shift;
the discrete Fourier transform transforms the signal from the time domain to the frequency domain so that its spectral structure and variation can be studied. A fast Fourier transform is applied to each framed and windowed frame of the voice data to obtain the spectrum of each frame, and the squared magnitude of the spectrum gives the power spectrum of the voice signal;
the text characteristic coefficient algorithm is a Mel-frequency cepstral coefficient (MFCC) algorithm.
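The acoustic front end described above can be sketched in a few lines of NumPy. This is only an illustration under the example parameters in the text (8 kHz sampling, 256-sample frames, half-frame overlap, μ = 0.97, Hamming window); the final MFCC step is indicated only by a comment, and all names are assumptions, not the patent's implementation.

```python
import numpy as np

def speech_frontend(signal, mu=0.97, frame_len=256, hop=128):
    """Pre-emphasis, framing, windowing and power spectrum for one utterance."""
    signal = np.asarray(signal, dtype=float)

    # Pre-emphasis: y[n] = x[n] - mu * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])

    # Framing: frame_len samples per frame, adjacent frames overlap by frame_len - hop samples
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] for i in range(n_frames)])

    # Windowing with a Hamming window to smooth the frame boundaries
    frames *= np.hamming(frame_len)

    # Discrete Fourier transform of each frame, then squared magnitude -> power spectrum
    spectrum = np.fft.rfft(frames, n=frame_len)
    power = np.abs(spectrum) ** 2 / frame_len
    return power  # a mel filter bank, log and DCT (the MFCC step) would follow here

# One second of audio at 8 kHz: each 256-sample frame spans 256 / 8000 * 1000 = 32 ms.
power_spectrum = speech_frontend(np.zeros(8000))
```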
Matching the screening data set with the talent data set to obtain an image set includes:
s41: carrying out phrase division on the text data to obtain a phrase data set, and marking phrases in the phrase data set as Wi, i = 1, 2, 3, …, n;
s42: screening the phrase data set according to preset keywords to obtain a screened data set, wherein the screened data set comprises education data, working data and prize winning data, and marking screened phrases in the screened data set as Wij, i = 1, 2, 3, …, n, j = 1, 2, 3, …, n;
s43: comparing the screening data set with the talent data set, and storing phrases with the same comparison result to obtain matching data;
s44: and combining the matched data to obtain an image set.
In the embodiment of the invention, named entities in the text data are extracted, including but not limited to the interviewee's schools, companies, award names, activities participated in, and the like, to obtain a phrase data set, and the phrase data set is screened and matched according to preset college keywords, enterprise keywords and honor keywords, so as to obtain the image set of the interviewee.
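A minimal sketch of this screening and matching step might look as follows. The keyword lists are hypothetical placeholders for the preset college, enterprise and honor keywords, and the `talent_data_set` structure is the hypothetical one used in the earlier sketch, not the patent's actual data.

```python
# Hypothetical preset keyword lists (stand-ins for the college, enterprise and honor keywords).
PRESET_KEYWORDS = {
    "education": ["university", "college", "institute"],
    "work":      ["company", "enterprise", "group"],
    "award":     ["award", "prize", "honor"],
}

def screen_and_match(phrases, talent_data_set):
    """Screen phrases by preset keywords, then keep only those that also appear
    in the corresponding part of the talent data set (the matching data)."""
    known = {
        "education": {entry["name"] for entry in talent_data_set["college"]},
        "work":      {entry["name"] for entry in talent_data_set["enterprise"]},
        "award":     {entry["name"] for entry in talent_data_set["honor"]},
    }
    image_set = {"education": [], "work": [], "award": []}
    for phrase in phrases:
        for category, keywords in PRESET_KEYWORDS.items():
            if any(k in phrase.lower() for k in keywords) and phrase in known[category]:
                image_set[category].append(phrase)  # phrase found in both sets -> keep it
    return image_set

# Example call with hypothetical interview phrases:
# image_set = screen_and_match(["College A", "Enterprise A"], talent_data_set)
```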
Constructing the talent portrait of the interviewer by using the portrait set includes the following steps:
s51, matching the image set with the sample talent data model to obtain education data scores, work data scores and prize winning data scores corresponding to the education data, the work data and the prize winning data in the image set;
s52, matching the education data score, the work data score and the prize winning data score with a preset score grade to obtain the education grade data, the work grade data and the prize winning grade data;
and S53, dividing and combining the education grade data, the work grade data and the prize winning grade data according to a preset proportion to obtain the talent portrait of the interviewee.
In the embodiment of the invention, the college keywords, enterprise keywords and honor keywords in the portrait set are matched against the sample talent data model to obtain the score and grade of the school the interviewee studied at, the score and grade of the enterprise the interviewee worked at, and the score and grade of the honors won. The talent score of the interviewee is obtained from the school score, the enterprise score and the honor score, and the talent grade of the interviewee is obtained from the school grade, the enterprise grade and the honor grade; the talent portrait of the interviewee is obtained from the talent score and the talent grade.
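The score and grade lookup and the weighted combination can be sketched as follows. The 0.4/0.4/0.2 proportions are purely illustrative stand-ins for the "preset proportion" mentioned in the text, and the data structures are the hypothetical ones from the earlier sketches.

```python
PRESET_PROPORTION = {"education": 0.4, "work": 0.4, "award": 0.2}  # illustrative values only

def build_talent_portrait(image_set, talent_data_set, weights=PRESET_PROPORTION):
    """Look up matched entries in the sample talent data model, take the best
    score and grade per category, and combine them with the preset proportion."""
    lookup = {
        "education": {e["name"]: e for e in talent_data_set["college"]},
        "work":      {e["name"]: e for e in talent_data_set["enterprise"]},
        "award":     {e["name"]: e for e in talent_data_set["honor"]},
    }
    portrait, talent_score = {}, 0.0
    for category, names in image_set.items():
        entries = [lookup[category][n] for n in names if n in lookup[category]]
        score = max((e["score"] for e in entries), default=0)
        grade = min((e["grade"] for e in entries), default=None)  # grade 1 is the best grade
        portrait[category] = {"score": score, "grade": grade}
        talent_score += weights[category] * score
    portrait["talent_score"] = round(talent_score, 2)
    return portrait
```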
A talent portrait construction system based on big data modeling comprises a sample data processing module, a voice data processing module, a text data processing module and a portrait construction module;
the sample data processing module is used for acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
the voice data processing module is used for acquiring voice data of interviewers and performing textualization processing on the voice data to obtain text data;
the text data processing module is used for screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
the portrait construction module is used for constructing the talent portrait of the portrait set by utilizing the sample talent data model.
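As a purely illustrative sketch of how the four modules could be wired together, the dataclass below maps each module onto a pluggable callable; the names and signatures are assumptions made for illustration, not an implementation prescribed by the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class TalentPortraitSystem:
    """Four-module layout described above; each field is a pluggable processing step."""
    refine_weights:   Callable[[Any], Dict]        # sample data processing module (S1)
    speech_to_text:   Callable[[Any], str]         # voice data processing module (S2)
    screen_and_match: Callable[[str, Dict], Dict]  # text data processing module (S3)
    build_portrait:   Callable[[Dict, Dict], Dict] # portrait construction module (S4)

    def run(self, sample_talent_data, voice_data) -> Dict:
        talent_data_set = self.refine_weights(sample_talent_data)   # S1
        text = self.speech_to_text(voice_data)                      # S2
        image_set = self.screen_and_match(text, talent_data_set)    # S3
        return self.build_portrait(image_set, talent_data_set)      # S4
```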
The working steps of the embodiment of the invention comprise:
acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set; wherein, will sample talent data carries out talent weight and refines, obtains the talent data set and includes: acquiring talent weight phrases in the sample talent data, and performing phrase division on the sample talent data to obtain a divided data set, wherein the divided data set comprises a sample college information set, a sample hundred-strength enterprise set and a sample honor set; performing grade division on the sample college information set to obtain a college grade set, performing score marking on the college grade set by using a preset score algorithm to obtain a college score set, combining the college grade set and the college score set to obtain sample college data, and marking colleges in the college grade set as Gi, i is 1,2,3 … n; carrying out grade division on the sample hundred-strength enterprise set to obtain an enterprise grade set, carrying out grade marking on the sample hundred-strength enterprise set by using a preset grade algorithm to obtain an enterprise grade set, combining the enterprise grade set and the enterprise grade set to obtain sample enterprise data, and marking enterprises in the enterprise grade set as Qi, wherein i is 1,2, and 3 … n; grade division is carried out on the sample honor sets to obtain the honor grade sets, score marking is carried out on the sample honor sets by utilizing a preset score algorithm to obtain the honor score sets, the honor grade sets and the honor score sets are combined to obtain sample honor data, and the honor items in the honor grade sets are marked as Ri, i is 1,2 and 3 … n; and combining the sample college data, the sample enterprise data and the sample honor data to obtain a talent data set.
Voice data of the interviewer is acquired, and textualization processing is performed on the voice data to obtain text data. Performing textualization processing on the voice data to obtain text data includes: sampling the voice data with a preset sampling rate and sampling bit depth to obtain a sampling data set; quantizing the sampling data set to obtain a quantized data set; pre-emphasizing the quantized data set to obtain a first feature set; framing and windowing the first feature set to obtain a second feature set; performing discrete Fourier transform processing on the second feature set to obtain a third feature set; and performing textualization processing on the third feature set with a text feature coefficient algorithm to obtain the text data.
In the embodiment of the invention, the preset sampling rate and sampling bit depth are 16 kHz and 16 bits, respectively. Sampling converts the continuous sound waveform in the voice data into discrete data points, the data points are stored as amplitude values, and the amplitude values are quantized into integers; quantization here is used in the digital-signal-processing sense;
the pre-emphasis is used to boost the energy of the high-frequency part of the sound; strengthening the high-frequency energy makes the high-frequency formants better usable and thereby improves recognition accuracy. Pre-emphasis can be realized with a first-order high-pass filter: in the time domain, if the input signal is x[n], where n indexes the samples, the filter is expressed as y[n] = x[n] - μ·x[n-1], where the preset coefficient μ takes a value between 0.9 and 1.0, usually 0.97;
framing means grouping N sampling points into one observation unit. In general N is 256 or 512, covering roughly 20-30 ms. To avoid excessive change between two adjacent frames, an overlap region of M sampling points is placed between them, where M is usually 1/2 or 1/3 of N. The sampling frequency adopted for speech recognition is 8 kHz or 16 kHz; at 8 kHz, a 256-sample frame corresponds to a duration of 256/8000 × 1000 = 32 ms;
windowing addresses the fact that everyday sound is generally a non-stationary signal whose statistical characteristics are not constant, although the signal can be considered stationary over a short period of time. A window is described by three parameters: window length, offset and shape. Each windowed sound segment is called a frame, the number of milliseconds in each frame is called the frame length, and the distance between the left boundaries of two adjacent frames is called the frame shift;
the discrete Fourier transform transforms the signal from the time domain to the frequency domain so that its spectral structure and variation can be studied. A fast Fourier transform is applied to each framed and windowed frame of the voice data to obtain the spectrum of each frame, and the squared magnitude of the spectrum gives the power spectrum of the voice signal;
the text characteristic coefficient algorithm is a Mel-frequency cepstral coefficient (MFCC) algorithm.
The text data is screened to obtain a screening data set, and the screening data set is matched with the talent data set to obtain an image set. Matching the screening data set with the talent data set to obtain the image set includes: performing phrase division on the text data to obtain a phrase data set, the phrases in the phrase data set being marked as Wi, i = 1, 2, 3, …, n; screening the phrase data set according to preset keywords to obtain the screening data set, wherein the screening data set comprises education data, work data and prize winning data, the screened phrases in the screening data set being marked as Wij, i = 1, 2, 3, …, n, j = 1, 2, 3, …, n; comparing the screening data set with the talent data set and storing the phrases that are the same in the comparison to obtain matching data; and combining the matching data to obtain the image set.
The talent portrait of the portrait set is constructed by using the sample talent data model. Constructing the talent portrait of the interviewer by using the portrait set includes: matching the image set with the sample talent data model to obtain the education data score, work data score and prize winning data score corresponding to the education data, work data and prize winning data in the image set; matching the education data score, the work data score and the prize winning data score with preset score grades to obtain education grade data, work grade data and prize winning grade data; and dividing and combining the education grade data, the work grade data and the prize winning grade data according to a preset proportion to obtain the talent portrait of the interviewee.
A talent portrait construction system based on big data modeling comprises a sample data processing module, a voice data processing module, a text data processing module and a portrait construction module; the sample data processing module is used for acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set; the voice data processing module is used for acquiring voice data of interviewers and performing textualization processing on the voice data to obtain text data; the text data processing module is used for screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set; the portrait construction module is used for constructing the talent portrait of the image set by using the sample talent data model;
in one aspect of the invention, sample talent data is acquired, talent weight refinement is performed on the sample talent data to obtain a talent data set, and a sample talent data model is constructed from the talent data set; the sample talent data model obtained by processing the sample data in advance provides strong data support for the subsequent construction of talent portraits and can effectively improve the accuracy and efficiency of talent portraits;
in another aspect, the voice data of the interviewer is acquired and converted into text data, the text data is screened to obtain a screening data set, the screening data set is matched with the talent data set to obtain an image set, and the talent portrait of the image set is constructed by using the sample talent data model; this overcomes the drawback of existing schemes, in which personnel information is obtained from resumes whose authenticity cannot be verified, so the authenticity of the acquired information is limited, and it can effectively improve the accuracy of talent portraits.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.

Claims (6)

1. A talent portrait construction method based on big data modeling is characterized by comprising the following steps:
s1: acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
s2: acquiring voice data of interviewers, and performing textualization processing on the voice data to obtain text data;
s3: screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
s4: and constructing the talent portrait of the portrait set by utilizing the sample talent data model.
2. The method for constructing a talent portrait based on big data modeling as claimed in claim 1, wherein said performing talent weight refinement on the sample talent data to obtain a talent data set comprises:
s21: acquiring talent weight phrases in the sample talent data, and performing phrase division on the sample talent data to obtain a divided data set, wherein the divided data set comprises a sample college information set, a sample top-100 enterprise set and a sample honor set;
s22: performing grade division on the sample college information set to obtain a college grade set, performing score marking on the college grade set by using a preset score algorithm to obtain a college score set, and combining the college grade set and the college score set to obtain sample college data, the colleges in the college grade set being marked as Gi, i = 1, 2, 3, …, n;
s23: performing grade division on the sample top-100 enterprise set to obtain an enterprise grade set, performing score marking on the sample top-100 enterprise set by using the preset score algorithm to obtain an enterprise score set, and combining the enterprise grade set and the enterprise score set to obtain sample enterprise data, the enterprises in the enterprise grade set being marked as Qi, i = 1, 2, 3, …, n;
s24: performing grade division on the sample honor set to obtain an honor grade set, performing score marking on the sample honor set by using the preset score algorithm to obtain an honor score set, and combining the honor grade set and the honor score set to obtain sample honor data, the honor items in the honor grade set being marked as Ri, i = 1, 2, 3, …, n;
s25: combining the sample college data, the sample enterprise data and the sample honor data to obtain the talent data set.
3. The method for constructing a talent portrait based on big data modeling as claimed in claim 1, wherein said performing textualization processing on the voice data to obtain text data comprises:
s31, sampling the voice data by using a preset sampling rate and sampling bit depth to obtain a sampling data set;
s32, quantizing the sampling data set to obtain a quantized data set;
s33, pre-emphasis is carried out on the quantized data set to obtain a first feature set;
s34, performing framing and windowing on the first feature set to obtain a second feature set;
s35, performing discrete Fourier transform processing on the second feature set to obtain a third feature set;
and S36, performing textualization processing on the third feature set by using a text feature coefficient algorithm to obtain text data.
4. The method for constructing a talent portrait based on big data modeling as claimed in claim 1, wherein said matching the screening data set with the talent data set to obtain an image set comprises:
s41: carrying out phrase division on the text data to obtain a phrase data set, and marking phrases in the phrase data set as Wi, i = 1, 2, 3, …, n;
s42: screening the phrase data set according to preset keywords to obtain a screened data set, wherein the screened data set comprises education data, working data and prize winning data, and marking screened phrases in the screened data set as Wij, i = 1, 2, 3, …, n, j = 1, 2, 3, …, n;
s43: comparing the screening data set with the talent data set, and storing phrases with the same comparison result to obtain matching data;
s44: and combining the matched data to obtain an image set.
5. The method for constructing a talent portrait based on big data modeling as claimed in claim 1, wherein said constructing the talent portrait of the portrait set by using the sample talent data model comprises:
s51, matching the image set with the sample talent data model to obtain education data scores, work data scores and prize winning data scores corresponding to the education data, the work data and the prize winning data in the image set;
s52, matching the education data score, the work data score and the prize winning data score with a preset score grade to obtain the education grade data, the work grade data and the prize winning grade data;
and S53, dividing and combining the education grade data, the work grade data and the prize winning grade data according to a preset proportion to obtain the talent portrait of the interviewee.
6. A talent portrait construction system based on big data modeling is characterized by comprising a sample data processing module, a voice data processing module, a text data processing module and a portrait construction module;
the sample data processing module is used for acquiring sample talent data, performing talent weight refinement on the sample talent data to obtain a talent data set, and constructing a sample talent data model by using the talent data set;
the voice data processing module is used for acquiring voice data of interviewers and performing textualization processing on the voice data to obtain text data;
the text data processing module is used for screening the text data to obtain a screening data set, and matching the screening data set with the talent data set to obtain an image set;
the portrait construction module is used for constructing the talent portrait of the portrait set by utilizing the sample talent data model.
CN202010493764.4A 2020-06-03 2020-06-03 Talent portrait construction method and system based on big data modeling Pending CN111666377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010493764.4A CN111666377A (en) 2020-06-03 2020-06-03 Talent portrait construction method and system based on big data modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010493764.4A CN111666377A (en) 2020-06-03 2020-06-03 Talent portrait construction method and system based on big data modeling

Publications (1)

Publication Number Publication Date
CN111666377A true CN111666377A (en) 2020-09-15

Family

ID=72385648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010493764.4A Pending CN111666377A (en) 2020-06-03 2020-06-03 Talent portrait construction method and system based on big data modeling

Country Status (1)

Country Link
CN (1) CN111666377A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2973297A2 (en) * 2013-03-15 2016-01-20 Fem, Inc. Media content discovery and character organization techniques
US20160034585A1 (en) * 2014-08-01 2016-02-04 Yahoo!, Inc. Automatically generated comparison polls
CN106920557A (en) * 2015-12-24 2017-07-04 中国电信股份有限公司 A kind of distribution method for recognizing sound-groove and device based on wavelet transformation
CN106897402A (en) * 2017-02-13 2017-06-27 山大地纬软件股份有限公司 The method and user's portrait maker of user's portrait are built based on social security data
CN107229729A (en) * 2017-06-07 2017-10-03 北京幸福圈科技有限公司 A kind of efficiency economy business modular system based on artificial intelligence assistant
CN108062657A (en) * 2017-11-30 2018-05-22 朱学松 Method and system are interviewed in personnel recruitment
CN109726253A (en) * 2018-12-21 2019-05-07 义橙网络科技(上海)有限公司 Construction method, device, equipment and the medium of talent's map and talent's portrait
CN110414917A (en) * 2019-06-21 2019-11-05 东华大学 Recruitment recommended method based on talent's portrait
CN110473546A (en) * 2019-07-08 2019-11-19 华为技术有限公司 A kind of media file recommendation method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200915