WO2019108133A1 - Skills management platform - Google Patents

Skills management platform

Info

Publication number
WO2019108133A1
Authority
WO
WIPO (PCT)
Prior art keywords
individual
score
job
data set
data
Prior art date
Application number
PCT/SG2018/050583
Other languages
English (en)
Inventor
Jussi KEPPO
Zhengzhi YANG
Vishnu PRATAP
Original Assignee
X0Pa Ai Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by X0Pa Ai Pte Ltd filed Critical X0Pa Ai Pte Ltd
Priority to SG11201907551YA priority Critical patent/SG11201907551YA/en
Priority to GBGB1909943.1A priority patent/GB201909943D0/en
Publication of WO2019108133A1 publication Critical patent/WO2019108133A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398Performance of employee with respect to a job function
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1053Employment or hiring

Definitions

  • the invention relates to talent management.
  • it relates to computer based techniques for calculating scores for an employee or potential employee to determine their suitability for an available job position, and a platform to support such techniques.
  • a method of determining suitability of an individual for a job position comprises: identifying a plurality of individual characteristics from an individual profile data set; retrieving a data model built based on a job demographic profile data set, wherein the job demographic profile data set comprises historical data associated with the plurality of individual characteristics of the job position; inputting the identified plurality of individual characteristics into the data model; and computing a score for the individual based on the input into the data model, wherein the score is used to determine the suitability of the individual for the job position.
  • a method of determining suitability of an individual for a job position comprises: identifying a job demographic profile data set, wherein the job demographic profile data set comprises historical data associated with a plurality of individual characteristics of the job position; and building a data model based on the job demographic profile data set, wherein input of the identified plurality of individual characteristics into the data model allows for computation of a score for the individual, wherein the score is used to determine the suitability of the individual for the job position.
  • a method of determining suitability of an individual for a job position comprises: identifying a plurality of individual characteristics from an individual profile data set; identifying a job demographic profile data set, wherein the job demographic profile data set comprises historical data associated with the plurality of individual characteristics of the job position; building a data model based on the job demographic profile data set; inputting the identified plurality of individual characteristics into the data model; and computing a score for the individual based on the input into the data model, wherein the score is used to determine the suitability of the individual for the job position.
  • the methods described above and herein are computer implemented methods, and the steps described herein may be performed by one or more processors, or computing devices.
  • the method comprises performing feature engineering to generate a modified individual characteristic from at least one of the plurality of individual characteristics.
  • the method comprises identifying a plurality of requirements from a job description for the job position; wherein building the data model comprises building a plurality of data models, each data model corresponding to a requirement of the plurality of requirements and quantifying the relevance of the individual characteristic to the requirement.
  • the method comprises matching each identified individual characteristic to the corresponding data model; inputting each identified individual characteristic to the corresponding data model to generate a plurality of individual data model scores; and computing a final score by summing the plurality of individual data model scores.
  • computing the final score comprises applying a non-linear transformation to each individual data model score to obtain a plurality of transformed individual data model scores; summing the plurality of transformed individual data model scores to obtain an overall score; and applying a second non-linear transformation to the overall score to obtain the score.
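The two-stage transformation described above can be sketched as follows. The patent does not specify which non-linear transformation is used; a logistic (sigmoid) function is assumed here, and all names are illustrative:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def final_score(model_scores: list[float]) -> float:
    # First non-linear transformation of each individual data model score
    transformed = [sigmoid(s) for s in model_scores]
    # Sum the transformed scores to obtain the overall score
    overall = sum(transformed)
    # Second non-linear transformation of the overall score yields the final score
    return sigmoid(overall)
```

The first transformation bounds each model's contribution so no single data model dominates the sum; the second maps the overall score back onto a common (0, 1) scale.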
  • the data model is built by natural language processing methods.
  • the natural language processing method may be any one selected from word embedding of the individual characteristic, topic modelling of the individual characteristic, and term frequency-inverse document frequency of the individual characteristic.
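Of the listed options, term frequency-inverse document frequency is simple enough to sketch in pure Python. Whitespace tokenisation and cosine similarity as the comparison are assumptions for illustration, not details fixed by the claims:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Term frequency-inverse document frequency vectors for a small corpus."""
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    df = Counter(t for doc in tokenised for t in set(doc))  # document frequency
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "python machine learning engineer",     # e.g. an individual characteristic
    "java backend engineer",                # a requirement from one job description
    "python data scientist machine learning",  # a requirement from another
]
vecs = tfidf_vectors(docs)
```

On this toy corpus, document 0 scores closer to document 2 (shared "python", "machine", "learning") than to document 1, which is the kind of relevance signal the data model would quantify.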
  • the individual characteristic is an experience level of the individual and the natural language processing method employed includes word embeddings and Latent Dirichlet allocation.
  • the method comprises applying rule-based filtering based on the match made. More preferably, the rule-based filtering includes a location filter.
  • the score is a relevance score and indicates the relevance of the individual profile to the job description.
  • the job demographic profile data set comprises an employment history of every person in the database.
  • a deep neural network algorithm is implemented to train the data model.
  • the data model is optimised through a weighted cost function.
  • the data model is built by blending two or more statistical models.
  • the score is a loyalty score and indicates the probability that the individual leaves the job position at a point in time or over a time period.
  • the loyalty score indicates the likelihood of the individual staying in the job position beyond a certain time period after being hired for the job position.
  • the job demographic profile data set comprises a time period a person spends in a first job position before moving to a second job position, wherein the second job position is a promotion from the first job position.
  • the promotion of a person may be used as a proxy for the productivity of the person in the organisation.
  • the second job position is preferably in the same organisation as the first job position, but may be in another organisation. This has an advantage in that it does not require the organisation to provide sensitive commercial information.
  • the job demographic profile data set comprises key performance indicators of an organisation.
  • the score is a productivity score and is computed based on a selected reference.
  • the job demographic profile data set comprises a salary level of a job in the market, past salary offers to a plurality of candidates and an acceptance rate of past salary offers.
  • the score is an acceptability score and indicates the probability of the individual accepting a job offer at a given salary level.
  • the method comprises building a causal data model to identify a causal relationship between at least one of the plurality of individual characteristics and the computed score; and computing a second score indicating the elasticity of the score.
  • the method comprises generating an alert when a change in the at least one of the plurality of individual characteristics causes the score to fall below or rise above a threshold level.
  • the method comprises merging a plurality of data models to generate a merged data set to build a second causal data model.
  • building the second causal data model comprises calculating a propensity score for an individual in the merged data set; matching the propensity scores for all individuals in the merged data set to generate a propensity score matched (PSM) data set comprising pairs of individuals, each pair comprising a first individual who has received treatment, and a second individual who is a control; and performing statistical analysis to determine the impact of treatment.
  • the statistical analysis may be a paired t-test method.
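The matching and paired t-test steps might be sketched as follows, assuming the propensity scores have already been computed and using greedy nearest-neighbour matching (the claims do not fix the matching strategy, so this is an illustrative choice):

```python
import math

def match_pairs(treated, controls):
    """Greedy nearest-neighbour matching on the propensity score.
    Each element is a (propensity_score, outcome) tuple."""
    pairs, available = [], list(controls)
    for p_t, y_t in treated:
        best = min(available, key=lambda c: abs(c[0] - p_t))
        available.remove(best)               # each control is matched at most once
        pairs.append((y_t, best[1]))         # (treated outcome, control outcome)
    return pairs

def paired_t_statistic(pairs):
    """t statistic for the mean treated-minus-control outcome difference."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical data: outcome could be, e.g., a productivity measure
treated = [(0.80, 5.1), (0.60, 4.9), (0.40, 5.3)]
controls = [(0.78, 4.2), (0.55, 4.1), (0.42, 4.5), (0.20, 3.9)]
t_stat = paired_t_statistic(match_pairs(treated, controls))
```

The resulting t statistic would then be compared against the t distribution with n − 1 degrees of freedom to judge whether the treatment effect is significant.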
  • a first data set corresponding to a loyalty score is merged with a second data set corresponding to a productivity score to build the second causal data model.
  • the job demographic profile data set comprises the plurality of individual characteristics in people who have previously qualified for a similar or identical job position in a similar or identical industry.
  • computing the score comprises applying a cosine similarity function to determine the similarity between the individual and the job demographic profile data set. More preferably, a weighted cosine similarity function is applied to give a higher weightage to a job profile which appears more frequently in the job demographic profile data set.
  • the score is a similarity score and indicates the similarity of the individual to a stereotypical job profile in the job demographic profile data set.
  • the weighted cosine similarity function may be as follows:
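The claimed formula itself is not reproduced in this excerpt. As a generic illustration only, a weighted cosine similarity over feature vectors, with weights that could for instance reflect how frequently a job profile appears in the job demographic profile data set, might look like this (all names and the weighting scheme are assumptions):

```python
import math

def weighted_cosine(u, v, w):
    """Cosine similarity between feature vectors u and v under per-feature weights w."""
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * ui * ui for wi, ui in zip(w, u)))
    norm_v = math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

individual = [1.0, 0.0, 1.0, 1.0]   # feature vector of the individual
stereotype = [1.0, 1.0, 1.0, 0.0]   # stereotypical job profile vector
weights = [2.0, 1.0, 1.0, 1.0]      # higher weight on a more frequent feature/profile
similarity = weighted_cosine(individual, stereotype, weights)
```

With equal weights this reduces to the ordinary cosine similarity; raising a weight makes agreement on that feature count for more.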
  • the job profile demographic data set is identified from a database comprising an organisation data set and a market data set.
  • the method comprises subsetting the organisation data set and the market data set to form the job demographic profile data set to build the data model.
  • the method comprises computing a hireability score by combining the relevance score and one or more additional scores (such as the loyalty score, productivity score and acceptability score). More preferably, the computed scores and/or second score are combined by applying any one selected from a linear function, a step-like function, and a logistic function.
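A minimal sketch of the logistic-function option for combining the scores; the weights and score names here are illustrative assumptions, not values from the patent:

```python
import math

def hireability(scores, weights):
    """Combine the relevance score with additional scores via a logistic function."""
    combined = sum(weights[k] * scores[k] for k in scores)
    return 1.0 / (1.0 + math.exp(-combined))

# Hypothetical per-score weights; the relevance score is weighted most heavily here
scores = {"relevance": 0.9, "loyalty": 0.7, "productivity": 0.8, "acceptability": 0.6}
weights = {"relevance": 2.0, "loyalty": 1.0, "productivity": 1.0, "acceptability": 0.5}
h = hireability(scores, weights)
```

The logistic function keeps the hireability score in (0, 1) regardless of the weights, which makes thresholding and ranking across candidates straightforward.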
  • the method comprises allocating each requirement of the job description into one of the following groups: an essential component, a variable component and an optional component.
  • computing the score comprises at least one of the following:
  • the method comprises receiving an individual profile from a user terminal.
  • the method comprises parsing the individual profile into the individual profile data set.
  • the method comprises scraping the Internet to build an individual online data set to form part of the individual profile data set.
  • the method comprises checking for errors in the individual profile; and penalising the computed score for errors found.
  • checking for errors is by a symmetric spelling correction method.
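A symmetric spelling correction method of the SymSpell family works by indexing one-character deletions of dictionary words, so that a misspelling within edit distance 1 shares a delete key with its correction. A minimal sketch, with a hypothetical penalty applied per error found:

```python
def deletes(word):
    """All strings obtainable by deleting exactly one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def build_index(dictionary):
    """Map every dictionary word and its one-character deletes back to the word."""
    index = {}
    for w in dictionary:
        for key in deletes(w) | {w}:
            index.setdefault(key, set()).add(w)
    return index

def corrections(word, index):
    """Candidate corrections within edit distance 1, found via shared delete keys."""
    found = set()
    for key in deletes(word) | {word}:
        found |= index.get(key, set())
    return found

dictionary = ["management", "experience", "python"]
index = build_index(dictionary)
misspellings = [w for w in ["managment", "python", "experiance"]
                if w not in dictionary and corrections(w, index)]
penalised_score = 10.0 - 0.5 * len(misspellings)  # hypothetical penalty per misspelling
```

Because both query and dictionary sides generate only deletions, this covers insertions, deletions and substitutions at edit distance 1 without computing a full edit-distance matrix per word pair.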
  • the method comprises adjusting the computing step of the score.
  • the method comprises ranking the individual based on the computed score relative to a plurality of other candidates.
  • the method comprises removing an individual from consideration if the score is below a qualifying score.
  • a non-transitory computer readable medium comprising instructions that when executed cause at least one computing device to perform the method according to the above methods.
  • a system for determining suitability of an individual for a job position comprises at least one processor; a non-transitory computer readable medium comprising instructions that when executed cause the at least one processor to perform the method according to the above methods.
  • the system comprises at least one of the following: the database, wherein the database comprises the market data set and/or the organisation data set; a user terminal for submitting the individual profile; and a parser tool to convert the individual profile into the individual profile data set.
  • the talent management platform provides organisations with the ability to manage their human resource requirements, from the initial recruiting process to the management of employees in the organisation.
  • the platform allows organisations to simplify and improve their recruitment process by utilising available information to find the most suitable candidates for an available job position, and may be used to predict the loyalty and productivity of an employee to find a most suitable fit for the organisation.
  • the platform can also be used to determine the likelihood that a candidate will accept a job offer and the suitability of the candidates for a job or task, i.e. whether the organisation should hire a particular candidate.
  • Various aspects of the platform show how salary, promotion, learning initiatives and other characteristics of the organisation affect an employee's productivity and loyalty, a potential employee's likelihood of accepting a job offer, and the forecasted productivity and loyalty, and thus the hireability of the job candidate.
  • the talent management platform may be further configured to generate recommendations or a retention strategy for the organisation to identify best performing employees with a high likelihood of leaving and to retain them.
  • the talent management platform allows the organisation to use their resources more effectively to hire and retain their employees, providing savings in time and costs.
  • FIG. 1 shows an overview of an embodiment of the invention.
  • FIG. 2 shows a schematic diagram of a database.
  • FIGS. 3A and 3B show embodiments of the invention.
  • FIG. 4 shows an embodiment of the invention.
  • FIG. 5 shows an embodiment of the invention.
  • FIG. 6 shows an embodiment of the invention.
  • FIG. 7 shows an embodiment of the invention.
  • FIG. 8 shows a system of an embodiment of the invention.
  • FIG. 9 shows an embodiment of the invention.
  • FIG. 10 shows an embodiment of the invention.
  • Figure 1 shows an overview of the workflow to use the talent management platform.
  • Data is obtained from a user of the talent management platform in block 105, i.e. the user is an organisation looking to hire or retain an employee.
  • the data may also be provided by an individual looking to apply for a job in the organisation (or by a recruiter), and could be in the form of a physical curriculum vitae (or a scanned copy), or the individual could enter the data into a webform.
  • the data obtained is then prepared, sorted, and filtered in block 110.
  • the data is analysed and predictions or forecasts are made based on the data in block 115.
  • the prediction or forecasts are combined into solutions for the user for further interpretation in block 120.
  • FIG. 8 shows an example of a system 800 which may be used as the platform.
  • the system 800, shown schematically in Figure 8, comprises a client terminal 805 and a server terminal 815 connected via a network 810, for example a wide area network like the Internet or a local area network like an intranet.
  • Multiple client terminals 805 and server terminals 815 may be used as required. Examples of devices usable as the client terminal 805 include personal computers, mobile devices and the like.
  • the server 815 generally comprises a processor 820, a memory 825, a user input device 830, an output device 835 to render the results and solutions in a readable format (e.g. text file, tabular file, graphs), and a network interface 840.
  • the server 815 may further contain control logic 845 and a database 200. Different embodiments described herein may be performed by different server terminals 815 and/or client terminals 805, hence each server terminal 815 need not contain the same components.
  • a server terminal 815 may be a database server 815 while another is an application server 815.
  • Figure 9 shows an overview of the workflow 900 for a user to use the system 800.
  • the user first accesses the system 800, which recognises the user in block 910, for example with a user identification and password system.
  • the required data is prepared for analysis by data checking and assembling in block 915, and data appending, updating and subsetting in block 920.
  • the data from the various sources need not come from a single database or be in a single data structure, thus there may be a need to connect the data and update it in the form of a time series.
  • analysis and modelling of the data takes place as described above.
  • the system 800 determines if there are additional functionalities, for example other predictive solutions or value, which may be obtained from the user inputs and data sets. If no additional functionalities are required, the system 800 assembles and visualises output into the output device for the user to view (block 935) and subsequently exits (block 940). If present or required, the additional functionalities and the specified data are prepared (block 945) with further data analysis and modelling (block 950). The system 800 then loops back to block 930, and if there are no other functionalities, creates the output (block 935) and exits (block 940).
  • Figure 10 shows a method 1000 to be performed by the system 800 to aid the recruitment of a job candidate.
  • interested candidates will submit their curriculum vitae. The candidate could be an existing employee in the organisation, an internal reference of an external candidate, an external candidate, or referred by an external human resource recruiter, or the curriculum vitae could come from an existing database of curricula vitae for identical or similar positions.
  • the candidate's curriculum vitae is submitted to the system (block 1005) either through an organisation's job portal or human resource management system, or as an Excel document, an entry into an online form, a Portable Document Format file, a text file, a Hyper-Text Markup Language file, an image file, or other readable format.
  • the system 800 analyses the candidate's curriculum vitae (block 1010) and determines if it meets a first hiring threshold (block 1020).
  • analysing the candidate's curriculum vitae includes identifying whether the curriculum vitae contains a plurality of qualification parameters.
  • the qualification parameters typically include at least one of the following: educational and/or professional qualifications, a specific skill set, work experience, expected income, expected employment benefits, and ability to travel or relocate.
  • the qualification parameters are typically provided or set by the hiring organisation as a minimum set of requirements for a job; alternatively, the system 800 may suggest the qualification parameters based on the market data 210 and/or organisation data 205.
  • the second set of qualification parameters are divided into intelligence quotient (IQ) factors, skill factors, and emotional quotient (EQ) factors.
  • a score is determined for each of the intelligence quotient (IQ) factors, skill factors, and emotional quotient (EQ) factors (block 1015).
  • Intelligence quotient (IQ) factor and emotional quotient (EQ) factor scores are obtained by having the candidate take a customised online test, while the skill factor score is obtained by matching the candidate's curriculum vitae against the job description skill features.
  • the different factors can also be weighted to value certain factors more. This can be determined by the organisation with the open job position, or could be identified from the organisation and/or market data.
  • the final score comprises 10% to 60% weightage of each of the three scores, such that the total is 100%.
  • the organisation may determine they wish to employ a weightage of 40% IQ factor, 25% skill factor, and 35% EQ factor; using these weights, a total score is obtained.
  • the weightage of each factor is usually provided by the hiring organisation, or could use a default weightage, or could be based on market data. It is subsequently determined if the candidate meets the first hiring threshold (block 1020), this could be with respect to the total score, or for each individual score.
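The weighted combination in the example above (40% IQ, 25% skill, 35% EQ) can be sketched as a simple weighted sum; the threshold value below is a hypothetical placeholder, not one stated in the text:

```python
def total_score(iq, skill, eq, w_iq=0.40, w_skill=0.25, w_eq=0.35):
    """Weighted total of the three factor scores; the weights must sum to 1 (100%)."""
    assert abs(w_iq + w_skill + w_eq - 1.0) < 1e-9
    return w_iq * iq + w_skill * skill + w_eq * eq

# Example: candidate factor scores on a 0-100 scale
total = total_score(iq=80.0, skill=90.0, eq=70.0)  # 0.40*80 + 0.25*90 + 0.35*70
meets_first_threshold = total >= 75.0              # hypothetical first hiring threshold
```

As the text notes, the threshold check could equally be applied per factor rather than to the total.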
  • if the first hiring threshold is not met, the candidate is notified of non-selection for the next round (block 1025), and may additionally be added into a database for future opportunities. If the first hiring threshold is met, a second pass analysis is conducted (block 1030) by computing the matching, loyalty, productivity and hireability scores as described by the methods described herein. This allows the hiring organisation to take into consideration more factors than are present in the candidate's curriculum vitae and decreases the guesswork involved in existing recruitment processes.
  • it is then determined whether the scores meet a second threshold (block 1035); if not, the candidate is informed of the non-selection.
  • the candidate is then invited for further interviews. This may include a video interview (block 1040) and a psychometric test and/or gamification (block 1045), after which it is determined if the candidate meets a third hiring threshold (block 1050).
  • Gamification refers to conducting programmes, courses or selection in a game-like manner, for example a hackathon where developers write code based on a real-life scenario or problem in order to get the job.
  • the candidate is subsequently invited for a final round of interview (block 1055) or informed of non-selection (block 1025).
  • the number of interviews may be reduced or increased as required. For example, there could be additional interview rounds, or an organisation may wish to conduct the final interview directly after the second threshold test. This depends on the requirements of the hiring organisation and may be readily varied without deviating from the methods described.
  • the two stage analysis may be useful if there are a large number of candidates for a job; if there are a smaller number of candidates, the first threshold analysis may be combined with the second threshold analysis.
  • the method 1000 provides an advantage over existing methods, which rely only on the personal data provided in the individual's curriculum vitae; such methods may not provide the best fit for both the individual and the organisation, and may cause the organisation to miss applicants who possess other qualification factors that are not immediately identifiable through the curriculum vitae.
  • the system 800 and method 1000 allow the organisation to consider more qualification factors in a candidate to increase the diversity and range of abilities present in the organisation, which is not readily done by conventional methods.
  • instructions for the methods described herein can be provided to be used directly by the human resource department of an organisation, or provided to the organisation as a third party service, for example as a software as a service model via a client terminal 805.
  • Application programming interfaces may be used to allow the system 800 to interact with the organisation's existing human resource management system or human capital management system, or a separate database 200 containing the necessary organisation data.
  • the method (300) comprises
  • the data model used in the methods herein can be prepared and provided to an organisation to be used with the scoring method, and need not be performed by the same organisation.
  • the market and/or organisation data set will most likely contain more characteristics (or features) than the individual characteristics available, or only certain individual characteristics will be relevant for a particular method. Thus, only those individual characteristics relevant to the method will be identified in the individual profile data set. Further, only the historical data (i.e. the market and/or organisation data) associated with the identified individual characteristics will be identified in the job demographic profile data set. By "associated with", it is meant that the relevant historical data could be directly or indirectly related to the individual characteristic.
  • the method (350) comprises:
  • the method comprises:
  • the method may further comprise any one of the following:
  • a database 200 contains data to be used by a system 700 or talent management platform. There are generally three categories of data in the database 200 as shown in Figure 2 - organisation data 205, market (or third party) data 210 and individual profile data 215.
  • the database 200 referred to herein is not limited to a single physical device, and can refer to several physical devices located in different locations.
  • the organisation data 205 may be stored by the organisation itself and separate from the remaining data.
  • the data may be in structured (e.g. data entered into fields of a form) or unstructured (e.g. no pre-defined data model or is not organised in a pre-defined manner) form, and is converted into machine readable format.
  • the system or platform may allow a user (be it the organisation recruiting or the individual applying for the job) to enter the required information in specified fields.
  • the system converts the unstructured data into a machine readable format by parsing. Natural language processing systems may be used to parse the data into a suitable data set, as explained further below.
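A minimal sketch of parsing unstructured curriculum vitae text into a structured individual profile data set, using simple regular expressions rather than a full natural language processing pipeline; the field names and patterns are illustrative assumptions:

```python
import re

def parse_cv(text):
    """Pull an email address, years of experience and a skills line from free text."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    years = re.search(r"(\d+)\s*\+?\s*years?", text, re.IGNORECASE)
    skills = re.search(r"Skills:\s*(.+)", text)
    return {
        "email": email.group(0) if email else "",
        "experience_years": int(years.group(1)) if years else None,
        "skills": [s.strip() for s in skills.group(1).split(",")] if skills else [],
    }

cv_text = "Jane Doe\njane@example.com\n5 years of experience\nSkills: Python, SQL, NLP"
record = parse_cv(cv_text)
```

In practice a parser tool of this kind would feed its structured output into the tables described later for the market and individual data.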
  • organisation data 205 refers to data on the organisation including information of the company such as its size, industry, revenue, profits, description of jobs in the organisation, and human resource data of the organisation's employees.
  • the human resource data can be sourced from an organisation's Human Resource Management System or Human Capital Management System and includes information such as the employee's age, ethnicity, education, job skills and experience, performance review, employment tenure for various job positions in the organisation (including the mean and median values), and the salary of various job positions in the organisation.
  • the organisation data set may be provided by an organisation which uses the method, system or platform described herein. However, not every organisation may be willing to share such data or have it readily on hand; the organisation data set is not essential to the database 200 and methods, but allows for better results.
  • a job description including the requirements the person should possess will be provided by the organisation.
  • This job description may then be used to create a job profile data set and is part of the organisation data 205.
  • the job description may be provided as a hard copy which is converted into a machine readable format, or submitted through a webform.
  • Market data 210 includes the salary and length of stay for a particular job position (from which the average of each can be derived), covering both jobs present in multiple industries and industry-specific jobs.
  • the market data 210 may be obtained from surveys of employees and/or employers, third party providers or from public data, for example government employment statistics.
  • the market data 210 may be obtained for particular jobs, industries, countries, and regions, and the data set used can be modified accordingly as desired and required.
  • the market data 210 varies greatly depending on the job and industry. For example, some jobs may require skills that are highly transferable between industries and/or jobs which are highly mobile and need not be restricted by geographic region. On the other hand, some jobs may be highly specific for various reasons, including country-specific requirements or a specific skill set.
  • Individual data 215 refers to data on the candidates applying for a job or employees of an organisation.
  • the individual data 215 includes personal individual data 220 or information typically obtained in the individual's curriculum vitae or employment records such as age, ethnicity, education, skills and work experience. These may be some examples of individual characteristics that may be identified and used in the analysis methods described herein. For an employee, this individual data 220 is also part of the organisation data as stated above.
  • the individual data 220 could also have been stored in an organisation's existing database of potential candidates for previous job openings, or come from a general submission by the candidate.
  • the individual data 215 can be provided in the form of a hard copy (e.g. the curriculum vitae), a Portable Document Format file, a text file, a Hyper-Text Mark-up Language file, or other readable format, in which case it will first be processed and converted into a machine readable form.
  • the individual could be asked to provide such information directly into a user interface (or webform) which captures the individual data 215 in a structured format.
  • the talent management platform sources information from the Internet to establish the individual's Internet footprint 225, i.e. scraping public data to build the individual's online data set.
  • web scraping techniques include application programming interface links, text pattern matching, HTTP programming, HTML parsing, DOM parsing, vertical aggregation, semantic annotation recognition, and machine learning.
  • the individual's email and/or name can be used to find information relevant to the individual, and used to compile an individual online profile.
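Of the web scraping techniques listed, HTML parsing can be sketched with the Python standard library. The class="skill" attribute below is a hypothetical page structure used for illustration, not one the patent specifies:

```python
from html.parser import HTMLParser

class ProfileParser(HTMLParser):
    """Collect the text of elements carrying a (hypothetical) class="skill" attribute."""
    def __init__(self):
        super().__init__()
        self.in_skill = False
        self.skills = []

    def handle_starttag(self, tag, attrs):
        if ("class", "skill") in attrs:
            self.in_skill = True

    def handle_endtag(self, tag):
        self.in_skill = False

    def handle_data(self, data):
        if self.in_skill and data.strip():
            self.skills.append(data.strip())

# A static snippet stands in for a fetched public profile page
page = '<ul><li class="skill">Python</li><li class="skill">SQL</li><li>Other</li></ul>'
parser = ProfileParser()
parser.feed(page)
```

A real pipeline would fetch pages keyed on the individual's email and/or name and merge the extracted fields into the individual online data set.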
  • the individual characteristics include any "push" or "pull" factors that may induce an employee to leave or stay with the organisation.
  • some individual characteristics include the age, education qualifications, work experience, gender, ethnicity, marital status, salary of the employee (the salary here is taken to be the financial remuneration to the employee including any fixed or variable wage component, like bonuses, commissions, and options), work benefits (e.g. leave, opportunity for professional development, flexible work arrangements), commuting distance and time from home, work satisfaction level, relationships with colleagues, relationships with clients, work environment, diversity and other characteristics of the workforce in the firm.
  • the individual data includes the data on the individual's current job, and previous job/s.
  • the individual data may also include the individual's current and previous company and industry. Every individual is different, and may value the different factors or benefits provided differently.
  • the individual data may be provided by the individual applying for the position, or by the organisation to analyse the suitability of a candidate (be it an external or internal candidate), through a user terminal either locally on a device or through a webform.
  • the individual data is received by a receiving module, processed by an extraction module to extract the relevant data, and by a sorting and filtering module to sort and filter the data. For example, information regarding an individual can be found from the individual's LinkedIn account, including the number of social media contacts, the people or groups the individual follows, the groups the individual joins, and the recommendations the individual has.
  • the market data 210 is in the form of free text and hence unstructured; the data is converted into a semi-structured format using a parser tool, which is further converted into a structured format in the form of tables. Further data cleaning is performed to remove null values, unnecessary punctuation and outliers. Feature engineering is also performed to derive features that are useful when computing the similarity scores. This may include deriving completely new features or creating dummy variables for existing features.
  • Feature engineering is the process of using both domain knowledge and statistical behaviour of the data to create features that enhance the machine learning predictive capability.
  • feature engineering combining several raw features to generate a new meaningful feature is one approach.
  • An example of feature engineering would be using a feature like the experience (in number of years) of a person to derive the seniority of the person. Usually in a job requirement, the experience is represented as a range (like at least 3 years, 3-5 years, or mid-senior level).
  • This engineered feature is termed a modified individual characteristic.
  • the appropriate ranges may be determined using inputs from domain experts.
  • Another advantage is that grouping experiences helps in reducing the skewness in the data. For example, if there is a relatively high number of candidates with experience of less than 5 years, feature engineering as described will reduce the skewness of the data set. It will be appreciated that other job requirements may be similarly engineered to generate other modified individual characteristics.
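  • As an illustrative sketch of this range-based feature engineering (the bucket boundaries below are hypothetical placeholders; in practice the appropriate ranges would be set with input from domain experts):

```python
def seniority_bucket(years_of_experience):
    """Map raw years of experience to a coarse seniority level.

    Grouping experience into ranges reduces skewness when, e.g., many
    candidates have fewer than 5 years of experience. The boundaries
    used here are illustrative assumptions only.
    """
    if years_of_experience < 3:
        return "junior"
    elif years_of_experience < 8:
        return "mid"
    return "senior"
```

The engineered seniority value is then used as a modified individual characteristic in place of the raw year count.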
  • When feature engineering is done appropriately, it increases the predictive power through an iterative process, and provides a more meaningful interpretation of the machine learning algorithms by creating features from raw data that are more interpretable and help to facilitate the machine learning process.
  • the market data 210 may be analysed to determine trends and/or patterns, to allow the system 100 to identify a job demographic profile data set by a data analysis engine.
  • the job demographic profile may include information such as the average salary for a particular job, additional factors can be added including age, education, gender, location (e.g. city, state, country, region, and continent), industry etc. to define the job demographic profile more accurately or broadly as required.
  • a job demographic profile may establish that individuals in a certain age group have a high tendency to change jobs or companies every two years.
  • job description features such as education, past experiences, skills and certifications are also considered to identify the right candidate in the hiring process.
  • Feature engineering may also be performed on the market data set. For example, based on the job industry mentioned in the job description, we compare how close this job industry is to the job industry of the candidate. For example, if the job is in the internet industry and the candidate is from the information technology industry, the job might still be relevant to the candidate.
  • An organisation-specific job demographic profile data set can be created based on the market and organisation data, and is a subset of the job demographic profile. For example, the average tenure of an employee in a specific job may be higher than in other similar organisations, which could reflect better human resource management policies in the organisation or other factors that may not be immediately apparent from numerical data. In another example, the average tenure of an employee in a specific job could be shorter than the industry average but could be attributable to a job rotation or promotion policy within the organisation. This information may not be readily available outside the organisation, but could provide a more accurate reflection of the organisation's policies and strategies.
  • the organisation may also provide a job description containing the requirements for the available job position, some non-limiting examples of these requirements include the education level, number of years of experience, salary range, technical skills required, travel frequency, specific experience in managing a team or project.
  • the job description may be provided as structured or unstructured data, and may be converted to a machine readable format to form the job profile data set 207 as described for the individual profile data set.
  • the requirements for the position may typically be divided into three types: essential, variable, and optional.
  • Essential requirements are those which the organisation considers to be necessary for the position
  • variable requirements are those which the organisation may accept a wider range than indicated
  • the optional requirements are those which are good to have for the candidates.
  • the job profile data set comprises an essential component, a variable component, and an optional component, each corresponding to the requirements of the position.
  • the method may further check for errors in the individual profile data set 215, and penalise the score for the errors found in the check. This is because frequent misspellings in the curriculum vitae, or submission, reflect inadequate preparation or carelessness, and imply a higher likelihood that the person is unfit for the job.
  • a fast spelling check on the submitted individual data is conducted, and a measurement of the spelling errors is provided to penalise the overall score.
  • the spelling check may be done by string comparison using a symmetric spelling correction method for fast string checking and editing, in which the edit count is used to measure the number of spelling errors.
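  • A minimal sketch of counting spelling edits, using plain Levenshtein edit distance (the symmetric-deletion method mentioned above is an efficiency optimisation of the same edit-count idea; this sketch only illustrates the measurement):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def spelling_error_measure(words, dictionary):
    """Total edit count from each word to its nearest dictionary word;
    a larger total implies a larger penalty on the overall score."""
    return sum(min(edit_distance(w, d) for d in dictionary) for w in words)
```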
  • NLP: natural language processing
  • the method (400) comprises:
  • building the data model comprises building a plurality of data models, each data model corresponding to a requirement of the plurality of requirements and quantifying the relevance of the individual characteristic to the requirement (block 420);
  • the resume is parsed and different approaches are adopted for each section.
  • Each individual's profile is segmented into several sections with different weightage for each as described below.
  • the data on the individual is compiled into an individual profile data set 215, from which a plurality of individual characteristics are identified by a processor 820 in block 305.
  • the processor 820 identifies a job demographic profile data set, wherein the job demographic profile data set comprises historical data associated with the plurality of individual characteristics of the job position in block 310.
  • a plurality of requirements are further identified from a job description for the job position and can be compiled into a job profile data set 207.
  • the processor 820 builds multiple data models, each data model corresponding to a specific requirement and quantifying the relevance of the individual characteristic to that requirement.
  • the processor 820 further matches and inputs each individual characteristic to the corresponding data model to compute an individual score for each corresponding data model, and combines the individual scores from the corresponding data models to compute a final score.
  • Different models may be used to analyse each section, after which the scores from each section are combined.
  • For location, we use both keywords and geographical distance to quantify the proximity.
  • For experience level, we use a nonlinear transformation of the experience so that we do not set a strict range as the requirement.
  • LDA: Latent Dirichlet Allocation
  • For experience description we use Latent Dirichlet Allocation (LDA) topic modelling to see the similarity between the individual and the job requirement.
  • For skills or other key requirements, we use customized word embeddings to compare the numerical representations of words/phrases; the model is trained on a large corpus of text from job descriptions and resume elaborations.
  • For experience level, each job has a certain requirement on the experience level, and there is usually an upper and lower bound expected by recruiters. Instead of strict cutoffs, a non-linear transformation is implemented to give a variable tolerance bandwidth to each experience level.
  • the experience level requirement is from the job description, and the individual's total experience is calculated from the employment history.
  • This model compares the person's total experience to the required experience level in the job description. Instead of a strict bound (e.g. 5-10 years, in which applicants must have experience between 5 and 10 years, or perhaps between 4 and 11 years with a 1-year tolerance), the score is calculated based on how much the personal experience deviates from the required value, and the deviation is transformed nonlinearly such that even if the individual falls short of the requirement, the individual may still be considered for the job. However, if the difference from the required range is too large, the model will penalise the individual heavily.
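  • A minimal sketch of such a soft range check (the Gaussian-style decay and the tolerance value are illustrative assumptions, not the exact transformation used):

```python
import math

def experience_score(years, lower, upper, tolerance=2.0):
    """Score of 1.0 inside the required experience range, decaying
    smoothly as the candidate's experience deviates from it, so a near
    miss is still considered while a large deviation is penalised
    heavily. The decay shape and tolerance are illustrative."""
    if lower <= years <= upper:
        return 1.0
    deviation = (lower - years) if years < lower else (years - upper)
    return math.exp(-(deviation / tolerance) ** 2)
```

For a 5-10 year requirement, a candidate with 4 years still scores reasonably, while one with 1 year is penalised heavily.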
  • Fine-tuning based on users' behaviour: with the focus on different sections for relevance comparison, the users' decisions are collected to understand which section is more critical, and the model is fine-tuned to give a better overall matching score.
  • Rule-based filtering: identify the key words/phrases which are required based on the job description, and keep those profiles which have fulfilled most of the requirements.
  • Geo-location measurement: using the Google API, retrieve the longitude and latitude of each location to calculate the geographical distance between two places.
  • Nonlinear smoothing: through a nonlinear function with mean, lower bound and upper bound values provided, fit the experience tolerance value with a variable buffer based on the variance of the bounds; for more junior levels, where the difference between the bounds and the mean value is small, the derived tolerance is low, while for more senior jobs the tolerance is larger.
  • Word embeddings: map words/phrases to a higher-dimensional vector space to quantify their meaning, and calculate the similarity by comparing the word vectors.
  • Topic modelling: Latent Dirichlet Allocation and/or probabilistic latent semantic analysis are utilized to retrieve the topic information from text paragraphs and summarize the topics.
  • Term frequency-inverse document frequency (tf-idf): a numerical statistic that reflects how important a word is to a document.
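  • A self-contained sketch of tf-idf weighting with cosine similarity over tokenised documents (a production system would typically use a library implementation; this only illustrates the statistic):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """tf-idf for a list of tokenised documents: term frequency within
    the document times log(N / document frequency) across documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A resume sharing terms with a job description then scores higher than an unrelated one.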
  • Each individual characteristic may be a numerical value or text-based.
  • the numerical value may be matched to a fixed value or range in the job profile data set 207, for example 3 years of experience matched to a requirement of at least 2 years of experience.
  • word embedding and/or topic modelling is performed instead.
  • a person's curriculum vitae may contain long lines of text in the work experience, activities, and award sections. However, since it is unlikely for every person to use the same word/s to describe the same item, this will often lead to errors and inaccurate results. In addition, job titles and awards may differ between organisations thus leading to errors as well.
  • Word embedding maps a word or a phrase in the sentence to a higher-dimensional vector space to quantify the meaning of the word or phrase. Subsequently, the similarity between the word vectors is calculated and added to the scoring function.
  • Latent Dirichlet Allocation and/or probabilistic latent semantic analysis methods are utilised to retrieve the topic information from text paragraphs to summarise the topics and calculate the similarities with the requirements of the job profile. This provides a more accurate result in determining how relevant a person's history (work experience, education levels and skills) is to the job requirements, and removes or reduces variances due to differences between organisations.
  • Each section of the job requirements may utilise one or more of the natural language processing models or methods, and is targeted to calculate the relevance for each section inside the job requirement based on the individual characteristics.
  • the method 300 may further apply rule-based filtering. This could be based on the match made between the individual characteristics and the job profile requirements (i.e the number of matches and/or quality of the match).
  • One of the rules used could be a location filter. This could be by country, state, city, or geographical distance, and is dependent in part on the job and the individual. For example, in a small country the geographical distance may be more relevant, since the distance between two points is unlikely to be significant. On the other hand, in a large country with many states and cities, people may have less inclination to relocate. Thus, the filter is dependent on the country in question.
  • the location measurement may be performed using the Google Maps API (or other similar programmes) to retrieve the coordinates of a location (longitude and latitude) and measure the geographical distance between two locations. The method may also take transport links between the two locations into account when determining the location filter.
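  • Given the coordinates returned by a geocoding service, the distance itself is a standard great-circle (haversine) computation; a sketch:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given as
    (latitude, longitude) in decimal degrees."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

For example, Singapore to Kuala Lumpur comes out at roughly 300 km, which can then be thresholded or scored by the location filter.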
  • the matching of the individual characteristic to a component of the job profile data set may have several areas to evaluate, so for each area it could be viewed as a rule-based filtering. However, it is not a simple yes or no filtering. For the above example of location filter, it also has the geospatial analysis to understand the physical distance to perform the filtering.
  • the individual data model will provide an individual data model score for each requirement (or section) in the job description.
  • a different non-linear transformation may be applied to each score, and all the transformed scores are summed up to get an overall score.
  • a final non-linear transformation is performed on the overall score to obtain a final score.
  • the nonlinear transformation for the individual data model score varies across the different scores, and it is not a fixed formula.
  • a benchmark formula is used as a starting point, and with more validations and decisions available, reinforcement learning is used to train the nonlinear transformation formula for each individual requirement (or section).
  • the benchmark transformation for word embedding may be as follows:
  • the method may subtract 70 from the must-have skill word-embedding score, as 70 is set as the benchmark (the numerical value provided is purely for illustrative purposes and may be any other suitable value). If the difference is negative, a continuously increasing order is taken to amplify the negative score, such that if the difference becomes too negative, no matter how well the individual scores for the other sections, the individual would not be able to obtain a positive score, in other words would not be qualified for the job position.
  • a similar non-linear transformation may be applied to each section, but with slight differences to account for the difference in the individual characteristic in the section (i.e. the non-linear transformation applied to each section is similar but unique to that section).
  • the final matching score is obtained by another transformation which is 100 / (1 + exp(- overall score)).
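  • A hedged sketch of this score combination (the benchmark of 70, the power-law amplification of shortfalls, and the scaling factor are illustrative placeholders; only the final sigmoid form 100 / (1 + exp(-overall score)) is taken from the description above):

```python
import math

def final_matching_score(section_scores, benchmarks):
    """Combine per-section scores into a final 0-100 matching score.

    Each section score is compared to its benchmark; shortfalls are
    amplified nonlinearly so that a large deficit on a must-have
    section cannot be offset by strong scores elsewhere."""
    overall = 0.0
    for section, score in section_scores.items():
        diff = score - benchmarks[section]
        if diff < 0:
            diff = -((-diff) ** 1.5)  # amplify negative differences
        overall += diff / 10.0        # illustrative scaling
    return 100.0 / (1.0 + math.exp(-overall))
```

A candidate far below the must-have benchmark is squashed towards 0 regardless of other sections, while a candidate above all benchmarks approaches 100.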
  • the score calculated is termed the relevance score, and has been used and evaluated both internally and externally by recruiters/managers. With more than 50 jobs tested before the fine-tuning, 65% of the people recommended as relevant for a job position using a base model (with tf-idf) were validated as being truly relevant by expert recruiters and by the hiring organisation. For example, out of a shortlist of 20 candidates, 13 candidates were found to be a good match (in other words, the relevance score achieved at least 65% accuracy for those jobs). For the ensemble matching algorithm, using the method, models and final matching score described, an accuracy of 70-75% has been achieved. The accuracy will further increase with additional fine-tuning, and if the percentage of good resumes is higher for each job.
  • the loyalty score reflects the probability or likelihood that the individual leaves the organisation, and could be provided at a point in time, for example the likelihood that the individual leaves in a year, or as a plot over a time interval, for example the likelihood that the individual leaves over a whole year, which may vary due to various reasons. For example, there may be a higher tendency for employees to leave after a calendar year or financial year so that the employee receives an annual bonus, or the organisation could award a loyalty bonus for every five years of service, so that an employee is more likely to leave only after a certain financial bonus is received. This is similarly applicable to other benefits described herein.
  • the method (300) comprises:
  • the processor (820) may identify and compile the job demographic profile data set from a database (200) comprising an organisation data set (205) and a market data set (210).
  • the method (300) may further comprise subsetting the organisation data set and the market data set to form a job demographic profile data set.
  • the data model is termed a loyalty model and may be considered a predictive model.
  • the profile information (the individual's past employment history) is used as base features for building the loyalty model described here.
  • the employment history of the individual has the factual target variable, namely whether the individual left the job within the first year after joining; it is from this that the target variable values are derived, and this provides the foundation for supervised learning.
  • the employment history of every individual is extracted and formatted as the job demographic profile data set. With the common seasonality discovered in people's resignation behaviour, a time period (for example one year) is chosen as the time horizon to calculate the loyalty score.
  • the predictive loyalty model utilises a supervised learning strategy and converts the loyalty problem into a binary classification problem.
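  • Deriving the binary target from the employment history can be sketched as follows (the field layout is a hypothetical representation; the one-year horizon matches the description above):

```python
from datetime import date

def left_within_horizon(start, end, horizon_days=365):
    """Binary target for the loyalty model: 1 if the individual left
    the job within the time horizon after joining, 0 otherwise.
    A still-ongoing job (end is None) counts as staying."""
    if end is None:
        return 0
    return 1 if (end - start).days <= horizon_days else 0
```

Applying this to every job in every individual's employment history yields the labels used for supervised learning.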
  • the loyalty model may utilise linear models, for example continuous linear models and/or over time horizon linear models.
  • the model does not include such features as a single feature in the modelling process.
  • the training data is re-sampled to have a more balanced distribution for such features so the bias in the historical data is not carried forward to the future prediction.
  • the loyalty model may implement a deep neural network algorithm to have a fast model training process and accurate prediction results.
  • the loyalty model utilises more than 300 different features, such that tree-based methods such as Random Forest or Gradient Boosting Machines (GBM) would take too long to train, as there are too many combinatorial possibilities to compare the entropy change.
  • the deep neural network algorithm is able to nonlinearly transform the 300 different features into significantly fewer features, based on the number of hidden layers and the number of nodes in each layer, and converges much faster.
  • Because a deep neural network is able to fit hard-to-guess nonlinear statistical behaviour by the intrinsic nature of the algorithm, it can provide better model fitting results with proper overfitting prevention.
  • two or more models may be blended to provide a better fit or faster fitting. Examples of models that may be used include continuous linear models, over time horizon linear models, Random Forest and Neural Networks models.
  • the model is optimized through a weighted cost function, which gives different importance to false positive and false negative prediction.
  • the model training process needs to minimise a cost function for convergence; e.g. if accuracy is chosen, the cost function would be the number of incorrect matches. However, for our situation it is not appropriate to choose accuracy, because on average only 20-30% of people leave within the first year.
  • the model might give very high false negative values if it predicts that almost everyone is not going to leave. Instead, we use a customized F1-score cost function so that we balance the false positive and false negative predictions.
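  • The effect is easy to see in a sketch: with only 20-30% leavers, a model that predicts "nobody leaves" achieves roughly 75% accuracy yet an F1 score of zero, because it finds no true positives:

```python
def f1_score(tp, fp, fn):
    """F1 score from confusion counts: the harmonic mean of precision
    and recall, balancing false positives against false negatives even
    when the classes are imbalanced."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

On 100 people with 25 leavers, the "nobody leaves" model gives tp=0, fp=0, fn=25, so its F1 score is 0.0 despite 75% accuracy.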
  • the loyalty model may be considered a predictive model; when the target variable is known, the model is able to outperform normal human judgement.
  • the loyalty model allows organisations to hire individuals who are more likely to stay with the organisation beyond a certain time period and minimises the organisation's costs in hiring and retraining people to fill the available positions caused by a high turnover in the employees.
  • the productivity score measures how productive an employee is or how productive a potential employee would be. For the former, this can be obtained at least in part from the organisation data based on the performance review of the employee, including input from a supervisor or manager. For a potential employee, this could be from the candidate's past work experience, education, or a personal recommendation and this could be the individual characteristic identified in the individual profile data set, and how these translate to the future productivity.
  • the productivity score is computed relative to a reference value.
  • a default reference value could be set as the best performing individual in an organisation or industry, but it will be apparent that other reference values may be used as desired.
  • a productivity score for an employee or potential employee can also be obtained by determining or computing the productivity score by reference to a productivity threshold.
  • the productivity threshold may be any suitable reference point in the organisation or industry.
  • the reference could be the top performing individual in a job or task in the organisation or industry.
  • An alternate reference could be the mean performance of the job or task.
  • the productivity score is computed based on a selected reference.
  • the productivity score can be determined based on the organisation data alone and/or with the market data. This could involve analysing the work performance review of an employee and empirical scores provided by a supervisor.
  • the productivity score may also be determined for potential employees based on the existing work done, but may be less precise due to the limited information available for a specific industry or job type.
  • the productivity score of a job candidate can also reflect the predicted future performance of the individual, and can be based on different key performance indicators.
  • the method (300) used to calculate the productivity score is similar to that used for the loyalty score.
  • the difference lies in the individual characteristics identified, the training set used from the job demographic profile data set, and the type of statistical models used.
  • the job demographic profile data set may comprise the time period a person (or all people in the organisation and/or industry) spends in a first job position in an organisation before the person is promoted to a second job position in the organisation (i.e. the second job position is at a higher level than the first job position).
  • the second job position may also be in another organisation, i.e. the person leaves for a higher-level job at another organisation. If the key performance indicators of an organisation are available, these may be used instead, or in combination with the time period. Feature engineering may also be done to generate modified individual characteristics, as per the loyalty model.
  • KPIs: Key Performance Indicators
  • Work summary information: the average promotion time for individuals in a particular industry is used to determine whether the person got promoted quicker than his/her peers in the same industry and/or job position.
  • the dataset used for building the model is the same as the one mentioned above for building the loyalty model.
  • the profile information is used as base features (or individual characteristics) for building the performance model.
  • feature engineering is performed to derive new features from the existing base features, for example the frequency of job changes, the education status during work, and the similarity of the education background to each job.
  • the work summary contains information on job title changes within any particular company. This is considered a promotion within the company and is used as a proxy for the performance of the individual; all information that is not a promotion is excluded from the analysis.
  • the time taken for an individual to get promoted is compared with the average rate of promotion based on the industry and seniority of an individual. This is the expected promotion for an individual. It is a binary indicator where 1 means that the individual is promoted quicker than his/her peers and 0 otherwise. Thus, an individual who is promoted more quickly would be expected to have a better performance than another individual in the same position.
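  • A sketch of deriving this binary indicator from a work summary (the tuple layout `(job_title, months_in_title)` and the use of the first title change are hypothetical representations of the data):

```python
def promotion_indicator(work_summary, industry_average_months):
    """1 if the first title change (treated as a promotion) arrived
    quicker than the industry/seniority average, 0 otherwise."""
    months = 0
    for i, (title, duration) in enumerate(work_summary):
        months += duration
        if i + 1 < len(work_summary) and work_summary[i + 1][0] != title:
            return 1 if months < industry_average_months else 0
    return 0  # no promotion observed in the history
```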
  • the performance model is a predictive model; when the target variable is known, the model is able to outperform normal human judgement. In addition, it is possible to use the feedback provided by the organisation on the actual performance indicator of the individual who eventually took the job position to fine-tune the predictive model.
  • a method 400 to compute and use an acceptability score for a job candidate, who can be an external candidate or an internal candidate within the organisation seeking a transfer.
  • the acceptability score reflects the likelihood of a job candidate accepting a job offer from the organisation and depends in part on the individual characteristics in the individual profile data set, and the job demographic profile. For example, the acceptability model predicts the probability of the candidate accepting the job offer based on his/her current salary, expected salary and the offered salary. In general, a job candidate is less likely to accept a job offer which is below the market average and vice versa.
  • the individual characteristics used for the acceptability score could be same or similar to that used for the loyalty score or productivity score.
  • some individual characteristics could include the salary of the employee (the salary here is taken to be the financial compensation to the employee including any fixed or variable wage component, like bonuses, commissions, and options), work benefits (e.g. leave, opportunity for professional development, flexible work arrangements), commuting distance and time from home, work satisfaction level, relationships with colleagues, relationships with clients, work environment, diversity and other characteristics of the workforce in the firm.
  • the individual data includes the data on the individual's current job, and previous job/s.
  • the industry data may also include the individual's current and previous company and industry.
  • the data on the new (i.e. the hiring) organisation or company and industry may be used in the analysis.
  • information on the current salary, the expected salary, whether an offer was given to the candidate (yes/no), whether the offer was accepted by the candidate (yes/no), the reason for rejection if the offer was rejected by the candidate, and the salary offered to the candidate is collected to form a training set to build a salary model that predicts whether a new hire would accept the offer at a particular offered salary.
  • the target outcome is dichotomous (offer accepted or not), and hence linear models for categorical outcomes may be used.
  • the outcome is a probability score giving the likelihood of the candidate to accept the job offer, and is termed an acceptability score.
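  • A minimal sketch of such a model in logistic form (the weights and feature construction below are hypothetical placeholders; in practice they would be fitted on the collected offer/acceptance training set):

```python
import math

def acceptability_score(offered, current, expected, market_average):
    """Illustrative logistic acceptability model: probability that a
    candidate accepts, driven by how the offered salary compares to
    the current salary, expected salary, and market average.
    All weights are hypothetical placeholders, not fitted values."""
    x = (2.0 * (offered - expected) / expected
         + 1.0 * (offered - current) / current
         + 0.5 * (offered - market_average) / market_average)
    return 1.0 / (1.0 + math.exp(-x))
```

As the description says, an offer below the market average and below expectations pushes the probability down, and vice versa.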
  • the acceptability score can be used to perform causal analysis like the loyalty score and productivity score. For example, it can be determined using a causal data model whether the time interval between the last interview date and job offer date has an impact on whether a job offer is accepted. In another example, another causal model could be about the number of rounds of negotiation after the job is offered and how that impacts the acceptability.
  • the loyalty model, performance model and benchmark models described thus far provide predictions at an individual level, and may be further utilised with causal models to determine the impact of treatment on outcome on a group level, for example the impact of performance on loyalty once the individual loyalty and performance scores have been obtained.
  • For causal analysis, the interactive impact is important. The reasons and the impact of those factors depend not only on the candidates' information, but also on the details of the job. Therefore, some job information is needed for the feature engineering and modelling.
  • the method (600), shown in Figure 6, comprises:
  • identifying a job demographic profile data set, wherein the job demographic profile data set comprises historical data associated with the plurality of individual characteristics of the job position (block 610);
  • wherein the final score is used to determine the suitability of the individual for the job position (block 625);
  • the method 600 determines how the change in an individual characteristic will affect the loyalty score, i.e. the elasticity of the loyalty score.
  • although the loyalty model may show how an individual characteristic affects the loyalty score, it is not the correct way of measuring the impact of a treatment on an outcome, since the sample is biased.
  • recommendations can be provided on how to improve the loyalty score of the individual.
  • this provides a retention strategy for the employee when the loyalty score falls below the retention threshold. For example, it can be determined how an increase in the salary, immediately or in the future, will affect the loyalty score of the employee.
  • Other benefits or incentives may also be provided, for example a promotion to the employee, additional leave benefits or a study award.
  • a second score may be computed to reflect the elasticity of the loyalty score.
  • the method 600 may further comprise generating an alert when a change in at least one of the individual characteristics causes the loyalty score to fall below (or rise above) a retention threshold.
  • a retention threshold: this is generally only applicable to an existing employee of the organisation.
  • the alert can be sent to a supervisor or human resource department.
  • the retention threshold may be determined from the organisation and/or market data to determine the probability that an employee with a certain loyalty score leaves. This serves to highlight to a supervisor or human resource department that an employee has a high likelihood to leave, and additional action may be required. It could also be used to indicate that measures taken are effective in reducing the likelihood of the employee leaving.
  • a causal relationship between at least one of the individual characteristics and the loyalty score is identified and analysed to determine how the individual characteristic affects the loyalty score.
  • the identification and analysis can be done by a causal relationship identification module.
  • the analysis of the causal relationship can be done using at least one of the Rubin causal model, the method of instrumental variables, and the difference-in-differences method. For example, the effects of changes in salary (or financial remuneration), a promotion, and/or the provision of training (or skills development) are analysed to determine how the change will affect the individual's loyalty score.
  • residual error = expected − predicted.
  • the residual errors from a time series can have temporal structure like trends, bias, and seasonality. Any temporal structure in the time series of residual forecast errors is useful as a diagnostic as it suggests information that could be incorporated into the predictive model. An ideal model would leave no structure in the residual error, just random fluctuations that cannot be modelled.
  • Structure in the residual error can also be modelled directly. There may be complex signals in the residual error that are difficult to incorporate directly into the model. Instead, a model of the residual error time series may be created to predict the expected error of the model. The predicted error can then be subtracted from the model prediction, in turn providing an additional lift in performance.
  • a simple and effective model of residual error is an autoregression. This is where some number of lagged error values are used to predict the error at the next time step. These lag errors are combined in a linear regression model, much like an autoregression model of the direct time series observations.
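The autoregressive residual-error model described above can be sketched as follows. This is a minimal NumPy illustration on synthetic data; the series, the lag order of two, and the carry-over coefficient are made-up assumptions for the example, not values from the patent.

```python
import numpy as np

# Synthetic residual series with temporal structure: each error partly
# carries over from the previous step (the kind of structure an ideal
# model would not leave behind).
rng = np.random.default_rng(0)
n = 100
residuals = rng.normal(0.0, 1.0, n)
for t in range(1, n):
    residuals[t] += 0.6 * residuals[t - 1]

# Lag-2 autoregression of the error, fitted by ordinary least squares:
# rows of X are [e(t-1), e(t-2), 1] and y is e(t).
lags = 2
X = np.column_stack([residuals[lags - 1:n - 1], residuals[:n - lags], np.ones(n - lags)])
y = residuals[lags:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast of the next residual from the latest lags;
# this predicted error is what would be used to adjust the model's
# next prediction.
next_error = np.array([residuals[-1], residuals[-2], 1.0]) @ coef
```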
  • the retention strategy provided to improve the loyalty score may be varied to reflect the productivity of the employee, and to preferably retain employees with high productivity scores.
  • Counterfactual analysis can also show different scenarios to retain employees with different productivity scores and the costs associated to raise their loyalty scores by the same extent or to cross a certain threshold, like the retention threshold.
  • the system may also be used to show how changes in the salary of the employee or a promotion given to the employee may change the productivity of the employee.
  • a method (700) to determine the causal effect of performance on loyalty is shown in Figure 7. The loyalty model (705) and performance model (710) are built and may be merged to produce one dataset that contains the treatment as well as the outcome (block 715).
  • the propensity score is calculated (720), one-to-one matching is performed, and statistical analysis is done on the matched data (730).
  • the method (700) comprises: merging a plurality of the data models (for example the loyalty model and the performance model explained above) to generate a second causal data model.
  • the method (700) may further comprise retrieving a plurality of data models from a database (200).
  • the second causal data model may be built by calculating a propensity score for an individual in the merged data set; matching the propensity scores for all individuals in the merged data set to generate a propensity score matching (PSM) data set comprising pairs of individuals, each pair comprising a first individual who has received treatment, and a second individual who is a control; and performing statistical analysis to determine the impact of treatment.
  • Propensity score matching is used to estimate the impact of promotion (the treatment, in this case) on loyalty (the outcome). PSM is generally used in observational studies where random assignment of treatment to subjects is not possible. PSM removes selection bias between the treatment and control groups. The propensity score is simply the probability of receiving treatment, given covariates.
  • the propensity score for person i, πᵢ = P(A = 1 | Xᵢ), is a function of the covariates and is indexed by i because person i has a unique set of covariates Xᵢ. It is the probability of treatment given that person's particular set of covariates.
  • the propensity score is matched to achieve balance.
  • the match can be made either on the entire set of covariates by taking the distance between them or by simply matching on the propensity score.
  • in a randomised experiment, the probability of treatment given covariates would not depend on the covariates, and hence (with equal allocation) it would simply be 0.5.
  • the propensity score is unknown.
  • the propensity score depends on observational data i.e. A and X, both of which are available in the data set. So the propensity score is estimated by treating the treatment as the outcome. Since the treatment is binary, we have used logistic regression to estimate the propensity scores, i.e. calculate the propensity score (block 720).
  • the model is fitted to get the predicted probabilities or fitted values for each subject.
  • the propensity score is a probability but it is unknown for an observational study.
  • the model is fitted to estimate the probability that a person receives the treatment. This is done by using all the features of a person as input and the treatment as the outcome (usually binary). Hence a classification model is built to estimate the probabilities. Once the probabilities are obtained, distance measures like nearest neighbours can be used to determine pairs of treated and control subjects.
  • the propensity score is a scalar: each person has a single value, a number between zero and one. This greatly simplifies the matching problem, as only one variable needs to be matched as opposed to a whole set of variables. Essentially, the propensity score summarises all the covariates (X), and matching is then done on that summary.
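The propensity-score estimation step (block 720) might be sketched as follows on made-up data. The covariates, sample size, and the plain gradient-descent logistic regression are illustrative assumptions; in practice a library implementation of logistic regression would normally be used.

```python
import numpy as np

# Hypothetical population: 3 covariates per person, and a treatment whose
# assignment genuinely depends on those covariates (as in observational data).
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
true_w = np.array([0.8, -0.5, 0.3])
treatment = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

# Logistic regression with the treatment as the outcome, fitted by
# gradient descent on the log-loss.
Xb = np.column_stack([X, np.ones(n)])   # add an intercept column
w = np.zeros(4)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(Xb @ w)))
    w -= 0.1 * Xb.T @ (p - treatment) / n

# Fitted values: one propensity score in (0, 1) per person.
propensity = 1 / (1 + np.exp(-(Xb @ w)))
```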
  • the propensity score calculation is an intermediate step that is used to identify pairs of individuals having the same characteristics except that one receives the treatment whereas the other is in control group.
  • matching is then performed. Matching usually causes a reduction in the sample size, because for each person in the treatment group we try to find a person in the control group whose propensity score matches (block 725).
  • the result is a dataset comprising pairs of individuals who are the same in their characteristics, except that one in the pair is given the treatment and the other in the pair is the control.
  • a paired t-test is performed on the outcome to determine the mean impact of the treatment on the treated (block 730).
  • the null hypothesis assumes that the true mean difference between the paired samples (i.e. the group that got promoted and the one that did not) is zero. Under this model, all observable differences are explained by random variation.
  • the alternative hypothesis assumes that the true mean difference between the paired samples is not equal to zero.
  • statistical significance is determined by looking at the p-value. If the p-value is < 0.05, the null hypothesis (as stated above) can be rejected and the alternative hypothesis (as stated above) accepted. Thus, this shows whether there is a statistically significant difference (or impact) of promotion (in the form of the performance model as explained above) on loyalty (i.e. whether the impact is random or not).
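Blocks 725 and 730, nearest-neighbour matching on the propensity score followed by a paired t-test on the outcome, might look like this on made-up data. For brevity the sketch matches with replacement (a control may be reused), whereas the text notes that matching without replacement typically reduces the sample size; the effect size of 2 is invented for the example.

```python
import numpy as np

# Hypothetical data: treatment is more likely at high propensity scores,
# and the outcome carries a true treatment effect of 2.
rng = np.random.default_rng(2)
n = 400
propensity = rng.random(n)
treated = rng.random(n) < propensity
outcome = 2.0 * treated + propensity + rng.normal(0, 1, n)

t_idx = np.flatnonzero(treated)
c_idx = np.flatnonzero(~treated)

# 1:1 nearest-neighbour matching: for each treated person, the control
# with the closest propensity score.
pairs = [(i, c_idx[np.argmin(np.abs(propensity[c_idx] - propensity[i]))])
         for i in t_idx]
diff = np.array([outcome[i] - outcome[j] for i, j in pairs])

# Paired t-test: under H0 the mean difference of the pairs is zero.
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
```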
  • Figure 5 illustrates another embodiment of the invention.
  • the individual is compared against the market or industry data, in particular a pool of profiles (or people) who are working and/or have worked in a similar job and/or industry, to find the similarity of the individual compared to the industry norm.
  • a similarity score is determined by how similar a person is to people who have ever been qualified for this type of job position in the database. This distinguishes a potential applicant based on common traits found in a group of individuals working in the same job function/industry, i.e. whether the candidate has a stereotypical profile in the industry or is an outlier; being more similar to the stereotypical profile adds more credibility to the profile. In many situations, an applicant might not be a good job fit even though they might seem to be based on their job profile/CV. By using the power of big data, the applicant is matched against a pool of profiles of people who have ever worked in a similar job/industry to find their similarity with this pool of candidates.
  • Figure 5 illustrates further how the system and method 500 works.
  • the processor 820 identifies a plurality of individual characteristics from the individual profile data set 215.
  • the market data set 210 is subset by the processor 820 to generate a job demographic profile data set, which will be used as a benchmark for the applicant's profile.
  • the profile information is used as base features for calculating the Benchmark score.
  • feature engineering is performed to derive new features from the existing base features, for example the frequency of job changes, education status during work, and the similarity of the education background to each job.
  • a different job demographic profile data set is obtained for each job position. Inputting the identified individual characteristics allows a similarity score between the applicant and the industry profile for the job to be computed.
  • the similarity score, which has a value between 0 and 1 inclusive, may be computed using weighted cosine similarity.
  • Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
  • the cosine of 0° is 1, and it is less than 1 for any other angle in the interval (0, 0.5π]. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
  • the cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1].
  • the name derives from the term "direction cosine”: in this case, unit vectors are maximally “similar” if they're parallel and maximally “dissimilar” if they're orthogonal (perpendicular). This is analogous to the cosine, which is unity (maximum value) when the segments subtend a zero angle and zero (uncorrelated) when the segments are perpendicular.
  • the cosine of two non-zero vectors can be derived from the Euclidean dot product formula A · B = ‖A‖‖B‖ cos θ, giving similarity = cos θ = (A · B) / (‖A‖‖B‖).
  • this formula takes in the prepared training array (from the base dataset) and the test array (an incoming new CV) and outputs the similarity score for each item in the test set against each item in the training dataset (the similarity score matrix).
  • a weight matrix is generated for each item in the training dataset by calculating its average similarity score against all other items in the training dataset. This is multiplied with the similarity score matrix, and the product is normalised by the summation of weights in the weight matrix. The final result is an array of scores, one for each item in the test dataset.
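The weighted cosine similarity computation described in the two bullets above can be sketched as follows. The array shapes and feature values are illustrative only; real profiles would be encoded as much longer feature vectors.

```python
import numpy as np

def cosine_matrix(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

rng = np.random.default_rng(3)
train = rng.random((5, 4))   # base dataset: profiles for the job position
test = rng.random((2, 4))    # incoming new CVs

# Similarity score matrix: each test item against each training item.
sim = cosine_matrix(test, train)

# Weight of each training item: its average similarity to the other
# training items (more stereotypical profiles get larger weights).
train_sim = cosine_matrix(train, train)
np.fill_diagonal(train_sim, 0.0)
weights = train_sim.sum(axis=1) / (len(train) - 1)

# Weighted scores, normalised by the sum of the weights:
# one value in [0, 1] per test profile.
scores = (sim * weights).sum(axis=1) / weights.sum()
```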
  • the similarity score may be used alone or in conjunction with the scores obtained from the other methods described herein in order to determine the suitability of the candidate and in ranking multiple candidates.
  • the similarity score may be viewed as a distance score between two profiles - the individual applying for the job position and the stereotypical profile of people working in the job position.
  • the overall scoring function is a hybrid of the above-mentioned models, and further utilises domain knowledge and user decisions to refine the final scoring function.
  • the relevance score (and the data models) used may be combined with one of the loyalty score/model, productivity score (performance model) and the acceptability score (salary model).
  • all the three predictive models are combined together with the relevance score to generate a hybrid model.
  • the job requirements may be grouped into an essential component, a variable component and an optional component as explained above.
  • the method may apply a step-like function to the essential component to simulate the importance of the essential component of the requirements. For a person's skills that match the optional component, a smaller weighting is given in computing their contribution to the score.
  • a logistic function is used to approximate the step function behaviour.
  • the linear function has a constant coefficient, so the increase is proportional.
  • the logistic function is an S-shape function which shows smooth but sharp change within a small range, and on two ends of the domain values, the change is insignificant.
  • the appropriate function is chosen based on the statistical behaviour to obtain the required results.
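A sketch of the step-like weighting described above: a steep logistic term approximates a step function for the essential component, while the optional component contributes linearly with a smaller weight. The steepness, midpoint, and weights are illustrative assumptions, not values from the description.

```python
import numpy as np

def logistic(x, k=20.0, x0=0.5):
    # S-shaped: near 0 well below x0, near 1 well above it, with a
    # smooth but sharp change within a small range around x0.
    return 1 / (1 + np.exp(-k * (x - x0)))

def candidate_score(essential_match, optional_match, w_opt=0.2):
    # Essential requirements pass through the step-like logistic;
    # optional skills add a smaller, proportional contribution.
    return logistic(essential_match) + w_opt * optional_match

full = candidate_score(0.9, 0.5)  # meets the essential requirements
weak = candidate_score(0.2, 1.0)  # strong optional skills cannot compensate
```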
  • the method may be further optimised to fine-tune the overall score calculation, by using machine learning models with the scores from each section as the input and the users' (hiring side) decision as the final output.
  • the user's decision is collected to understand which section (or individual characteristic) is more critical, and the model is fine-tuned for more accurate results.
  • the bias would be reduced through a large user base, and the final relevance calculation would be more accurate.
  • Table 1 shows an example of different statistical models and functions that may be used to compute the relevance score in the various models described herein.
  • the different statistical models may be blended to obtain each of the relevance, loyalty, and productivity scores.
  • a way to obtain the models to compute the relevance score is by using stacked generalisation.
  • Another way is to use blending (or stacked ensembling) to obtain the relevance score.
  • the combined models allow the system and method to compute the loyalty score, which reflects the likelihood that the individual will leave at a point in time or over a period of time.
  • the loyalty score can be obtained using linear models (categorical), or machine learning models like random forests and neural networks, as shown in Table 1. More than one algorithm may be used to develop a model to analyse part of the scope, and the outputs of the models are combined to form the overall score.
  • Model selection is an iterative process. In order to select the right model, the relationship between the features and the target variable is checked. One method is to plot a single feature against the target variable and check whether the relation is linear. Alternatively, the correlation coefficient between the two variables may be determined. The correlation coefficient varies between -1 and +1, with values near -1 or +1 indicating linearity. Since repeating this process for each feature can be tedious, a linear model may be built instead to check the accuracy. If the outcome is not good, it could mean either that the features are not good or that the relation between features and target is non-linear. The training data may subsequently be used to build a non-linear model like an SVM, a neural network or a random forest.
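The correlation check described above can be illustrated on synthetic data: a genuinely linear feature-target relation yields a coefficient close to ±1, while a non-linear (sinusoidal) relation yields a coefficient near 0 even though the feature is informative. The data is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
feature = rng.normal(size=200)

# A linear relation (plus noise) and a non-linear one.
linear_target = 3.0 * feature + rng.normal(0, 0.5, 200)
nonlinear_target = np.sin(3.0 * feature) + rng.normal(0, 0.1, 200)

# Pearson correlation coefficient between the feature and each target.
r_linear = np.corrcoef(feature, linear_target)[0, 1]
r_nonlinear = np.corrcoef(feature, nonlinear_target)[0, 1]
```

A low coefficient despite a known dependency is exactly the case where a linear model would fit poorly and a non-linear model (SVM, neural network, random forest) should be tried.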
  • the scoring is based on the output of statistical models. For categorical outcomes the models give out a probability value, so if a person has a loyalty of 60%, the probability of that person staying in the company for the next 12 months is 0.60. Multiple models are used to check the consistency of the model outcome across different machine learning models. Another reason for using a selection of models is that models differ in how good they are at finding the relation between the features and the outcome. If the relation between a feature and an outcome is linear then a linear model will suffice, but if the relation is non-linear, then using a linear model to fit the data would lead to inaccurate results; models like neural networks are better at capturing non-linear relationships. Blending of the models may be done as required, for example for the ensemble matching (or relevance) score: multiple models are run and the scores from each of these models are combined, either through voting or through weighting, to come up with one score.
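Combining several model scores into one through voting or weighting, as described above, can be sketched as follows. The scores, the weights, and the 0.5 voting cut-off are made-up examples; in practice the weights might come from each model's validation performance.

```python
import numpy as np

# Probability-of-staying scores for one person from three models,
# e.g. a linear model, a random forest, and a neural network.
scores = np.array([0.62, 0.55, 0.70])

# Weighting: a weighted average of the probabilities.
weights = np.array([0.2, 0.3, 0.5])
blended = float(scores @ weights / weights.sum())

# Voting: each model votes "loyal" if its probability exceeds 0.5,
# and the majority decides.
votes = (scores > 0.5).astype(int)
majority = int(votes.sum() > len(votes) / 2)
```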
  • the methods described herein are provided on a talent management platform, and serve to change and/or enhance the way recruitment is done by optimising the six KPIs of hiring.
  • the methods, models, scores, and system described herein help the recruitment process to be much more efficient with better hiring quality.
  • the workflow design of the platform provides an effective and interactive way for stakeholders to manage the information and all relevant processes.
  • the insights, and the ease provided through modelling and automation, make the method/model particularly useful.
  • the models change the way people do the hiring and the screening of candidates. They combine different unique elements in an unconventional way to evaluate every candidate from several perspectives.
  • the workflow design and the method provide a secure and fast data-access and analytics solution.
  • the methods described herein apply specific techniques to provide more accurate results, for example by correcting for skewed data sets, and bias in the models.
  • Modules may comprise either software modules (for example code embodied on a computer readable medium) or hardware implemented modules which are configured or arranged in a certain manner to perform certain operations.
  • such modules may be implemented using one or more computer systems or one or more processors.
  • a hardware implemented module may be implemented mechanically or electronically.
  • a hardware implemented module may comprise dedicated circuitry or logic that is permanently configured (like a field programmable gate array or an application specific integrated circuit), or programmable logic or circuitry.
  • modules or software can be used to practice certain aspects of the invention.
  • software-as-a-service (SaaS) models or application service provider (ASP) models may be employed as software application delivery models to communicate software applications to clients or other users.
  • Such software applications can be downloaded through an Internet connection, for example, and operated either independently (e.g., downloaded to a laptop or desktop computer system) or through a third-party service provider (e.g., accessed through a third-party web site).
  • cloud computing techniques may be employed in connection with various embodiments of the invention.
  • a "module" may include software, firmware, hardware, or any reasonable combination thereof.
  • the processes associated with the present embodiments may be executed by programmable equipment, such as computers.
  • Software or other sets of instructions that may be employed to cause programmable equipment to execute the processes may be stored in any storage device, such as a computer system (non-volatile) memory.
  • some of the processes may be programmed when the computer system is manufactured or via a computer-readable memory storage medium.
  • a “computer,” “computer system,” “computing apparatus,” “component,” or “computer processor” may be, for example and without limitation, a processor, microcomputer, minicomputer, server, mainframe, laptop, personal data assistant (PDA), wireless e-mail device, smart phone, mobile phone, electronic tablet, cellular phone, pager, processor, fax machine, scanner, or any other programmable device or computer apparatus configured to transmit, process, and/or receive data.
  • Computer systems and computer- based devices disclosed herein may include memory for storing certain software applications used in obtaining, processing, and communicating information. It can be appreciated that such memory may be internal or external with respect to operation of the disclosed embodiments.
  • the memory may also include any means for storing software, including a hard disk, an optical disk, floppy disk, ROM (read only memory), RAM (random access memory), PROM (programmable ROM), EEPROM (electrically erasable PROM) and/or other computer-readable memory media.
  • a "host,” “engine,” “loader,” “filter,” “platform,” or “component” may include various computers or computer systems, or may include a reasonable combination of software, firmware, and/or hardware.
  • a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to perform a given function or functions. Except where such substitution would not be operative to practice embodiments of the present invention, such substitution is within the scope of the present invention.
  • Any of the servers described herein, for example may be replaced by a "server farm" or other grouping of networked servers (e.g., a group of server blades) that are located and configured for cooperative functions. It can be appreciated that a server farm may serve to distribute workload between/among individual components of the farm and may expedite computing processes by harnessing the collective and cooperative power of multiple servers.
  • Such server farms may employ load-balancing software that accomplishes tasks such as, for example, tracking demand for processing power from different machines, prioritizing and scheduling tasks based on network demand, and/or providing backup contingency in the event of component failure or reduction in operability.
  • Examples of assembly languages include ARM, MIPS, and x86; examples of high level languages include Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal; and examples of scripting languages include Bourne script, JavaScript, Python, Ruby, PHP, and Perl.
  • Various embodiments may be employed in a Lotus Notes environment, for example.
  • Such software may be stored on any type of suitable computer-readable medium or media such as, for example, a magnetic or optical storage medium.
  • Various embodiments of the systems and methods described herein may employ one or more electronic computer networks to promote communication among different components, transfer data, or to share resources and information.
  • Such computer networks can be classified according to the hardware and software technology that is used to interconnect the devices in the network, such as optical fiber, Ethernet, wireless LAN, HomePNA, power line communication or G.hn.
  • the computer networks may also be embodied as one or more of the following types of networks: local area network (LAN); metropolitan area network (MAN); wide area network (WAN); virtual private network (VPN); storage area network (SAN); or global area network (GAN), among other network varieties.
  • an application server may be a server that hosts an API to expose business logic and business processes for use by other applications.
  • Examples of application servers include J2EE or Java EE 5 application servers including WebSphere Application Server.
  • Other examples include WebSphere Application Server Community Edition (IBM), Sybase Enterprise Application Server (Sybase Inc), WebLogic Server (BEA), JBoss (Red Hat), JRun (Adobe Systems), Apache Geronimo (Apache Software Foundation), Oracle OC4J (Oracle Corporation), Sun Java System Application Server (Sun Microsystems), and SAP Netweaver AS (ABAP/Java).
  • application servers may be provided in accordance with the .NET framework, including the Windows Communication Foundation, .NET Remoting, ADO.NET, and ASP.NET among several other components.
  • the application servers may mainly serve web-based applications, while other servers can perform as session initiation protocol servers, for instance, or work with telephony networks.
  • Specifications for enterprise application integration and service-oriented architecture can be designed to connect many different computer network elements. Such specifications include Business Application Programming Interface, Web Services Interoperability, and Java EE Connector Architecture.
  • Embodiments of the methods and systems described herein may divide functions between separate CPUs, creating a multiprocessing configuration. For example, multiprocessor and multi-core (multiple CPUs on a single integrated circuit) computer systems with co-processing capabilities may be employed. Also, multitasking may be employed as a computer processing technique to handle simultaneous execution of multiple computer programs.
  • the computer systems, data storage media, or modules described herein may be configured and/or programmed to include one or more of the above-described electronic, computer-based elements and components, or computer architecture.
  • these elements and components may be particularly configured to execute the various rules, algorithms, programs, processes, and method steps described herein.
  • the description is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention.

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and system for determining the suitability of an individual for a job position are disclosed. The method comprises: identifying a plurality of individual characteristics from an individual profile data set; retrieving a data model built from a job demographic profile data set, the job demographic profile data comprising historical data associated with the plurality of individual characteristics for the job position; providing the identified plurality of individual characteristics as input into the data model; and computing a score for the individual based on the input into the data model, the score being used to determine the suitability of the individual for the job position.
PCT/SG2018/050583 2017-11-30 2018-11-29 Talent management platform WO2019108133A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11201907551YA SG11201907551YA (en) 2017-11-30 2018-11-29 Talent management platform
GBGB1909943.1A GB201909943D0 (en) 2017-11-30 2018-11-29 Talent management platform

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10201709959Q 2017-11-30
SG10201709959Q 2017-11-30

Publications (1)

Publication Number Publication Date
WO2019108133A1 true WO2019108133A1 (fr) 2019-06-06

Family

ID=66665165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2018/050583 WO2019108133A1 (fr) Talent management platform

Country Status (3)

Country Link
GB (1) GB201909943D0 (fr)
SG (1) SG11201907551YA (fr)
WO (1) WO2019108133A1 (fr)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751587A (zh) * 2019-07-23 2020-02-04 福建奇点时空数字科技有限公司 A construction data mining method based on an artificial neural network
CN111598462A (zh) * 2020-05-19 2020-08-28 厦门大学 A resume screening method for campus recruitment
CN112036641A (zh) * 2020-08-31 2020-12-04 中国平安人寿保险股份有限公司 Artificial-intelligence-based retention prediction method, apparatus, computer device and medium
CN112541013A (zh) * 2020-01-02 2021-03-23 北京融信数联科技有限公司 Method for analysing the job-hopping frequency of recent graduates based on mobile signalling big data
WO2021202407A1 (fr) * 2020-03-30 2021-10-07 Eightfold AI Inc. Computing platform implementing a many-to-many labour marketplace
US20220092547A1 (en) * 2020-09-18 2022-03-24 Eightfold AI Inc. System, method, and computer program for processing compensation data
US20220138698A1 (en) * 2020-10-29 2022-05-05 Accenture Global Solutions Limited Utilizing machine learning models for making predictions
US11544626B2 (en) * 2021-06-01 2023-01-03 Alireza ADELI-NADJAFI Methods and systems for classifying resources to niche models
CN115578080A (zh) * 2022-12-08 2023-01-06 长沙软工信息科技有限公司 Workload verification method based on an information system cost benchmark library
US11562032B1 (en) 2022-02-08 2023-01-24 My Job Matcher, Inc. Apparatus and methods for updating a user profile based on a user file
US11836633B2 (en) * 2020-09-08 2023-12-05 Vettery, Inc. Generating realistic counterfactuals with residual generative adversarial nets
US11847616B2 (en) 2022-05-13 2023-12-19 Stynt Inc. Apparatus for wage index classification
EP4312170A1 (fr) * 2022-07-28 2024-01-31 Gojob Device and method for matching a user with an optimal job offer

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120123956A1 (en) * 2010-11-12 2012-05-17 International Business Machines Corporation Systems and methods for matching candidates with positions based on historical assignment data
US20140122355A1 (en) * 2012-10-26 2014-05-01 Bright Media Corporation Identifying candidates for job openings using a scoring function based on features in resumes and job descriptions
US20150006422A1 (en) * 2013-07-01 2015-01-01 Eharmony, Inc. Systems and methods for online employment matching


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751587A (zh) * 2019-07-23 2020-02-04 福建奇点时空数字科技有限公司 A construction data mining method based on an artificial neural network
CN112541013B (zh) * 2020-01-02 2021-12-28 北京融信数联科技有限公司 Method for analysing the job-hopping frequency of recent graduates based on mobile signalling big data
CN112541013A (zh) * 2020-01-02 2021-03-23 北京融信数联科技有限公司 Method for analysing the job-hopping frequency of recent graduates based on mobile signalling big data
WO2021202407A1 (fr) * 2020-03-30 2021-10-07 Eightfold AI Inc. Computing platform implementing a many-to-many labour marketplace
CN111598462B (zh) * 2020-05-19 2022-07-12 厦门大学 A resume screening method for campus recruitment
CN111598462A (zh) * 2020-05-19 2020-08-28 厦门大学 A resume screening method for campus recruitment
CN112036641B (zh) * 2020-08-31 2024-05-14 中国平安人寿保险股份有限公司 Artificial-intelligence-based retention prediction method, apparatus, computer device and medium
CN112036641A (zh) * 2020-08-31 2020-12-04 中国平安人寿保险股份有限公司 Artificial-intelligence-based retention prediction method, apparatus, computer device and medium
US11836633B2 (en) * 2020-09-08 2023-12-05 Vettery, Inc. Generating realistic counterfactuals with residual generative adversarial nets
US20220092547A1 (en) * 2020-09-18 2022-03-24 Eightfold AI Inc. System, method, and computer program for processing compensation data
US20220138698A1 (en) * 2020-10-29 2022-05-05 Accenture Global Solutions Limited Utilizing machine learning models for making predictions
US11514403B2 (en) * 2020-10-29 2022-11-29 Accenture Global Solutions Limited Utilizing machine learning models for making predictions
US11544626B2 (en) * 2021-06-01 2023-01-03 Alireza ADELI-NADJAFI Methods and systems for classifying resources to niche models
US11860953B2 (en) 2022-02-08 2024-01-02 My Job Matcher, Inc. Apparatus and methods for updating a user profile based on a user file
US11562032B1 (en) 2022-02-08 2023-01-24 My Job Matcher, Inc. Apparatus and methods for updating a user profile based on a user file
US11847616B2 (en) 2022-05-13 2023-12-19 Stynt Inc. Apparatus for wage index classification
EP4312170A1 (fr) * 2022-07-28 2024-01-31 Gojob Device and method for matching a user with an optimal job offer
CN115578080A (zh) * 2022-12-08 2023-01-06 长沙软工信息科技有限公司 Workload verification method based on an information-system cost benchmark library

Also Published As

Publication number Publication date
SG11201907551YA (en) 2019-09-27
GB201909943D0 (en) 2019-08-28

Similar Documents

Publication Publication Date Title
WO2019108133A1 (fr) Skills management platform
Khayer et al. Cloud computing adoption and its impact on SMEs’ performance for cloud supported operations: A dual-stage analytical approach
Sterling et al. The confidence gap predicts the gender pay gap among STEM graduates
Cameron et al. A simple tool to predict admission at the time of triage
US11727328B2 (en) Machine learning systems and methods for predictive engagement
Fedushko et al. Real-time high-load infrastructure transaction status output prediction using operational intelligence and big data technologies
US20210383308A1 (en) Machine learning systems for remote role evaluation and methods for using same
US20220067665A1 (en) Three-party recruiting and matching process involving a candidate, referrer, and hiring entity
US20160292248A1 (en) Methods, systems, and articles of manufacture for the management and identification of causal knowledge
AU2017258946A1 (en) Automatic interview question recommendation and analysis
US20130110567A1 (en) Human capital assessment and ranking system
US20140108103A1 (en) Systems and methods to control work progress for content transformation based on natural language processing and/or machine learning
US20190066020A1 (en) Multi-Variable Assessment Systems and Methods that Evaluate and Predict Entrepreneurial Behavior
Ragone et al. On peer review in computer science: Analysis of its effectiveness and suggestions for improvement
US20210383261A1 (en) Machine learning systems for collaboration prediction and methods for using same
Thaheem et al. A survey on usage and diffusion of project risk management techniques and software tools in the construction industry
Poluru et al. Applications of Domain-Specific Predictive Analytics Applied to Big Data
Wilson Comparative analysis in public management: reflections on the experience of a major research programme
US20200327475A1 (en) Systems and Methods for Maximizing Employee Return on Investment
Langan et al. Benchmarking factor selection and sensitivity: a case study with nursing courses
US20170024830A1 (en) Enhanced employee turnover rate
Lamola Moulding information systems components and agitations for the adoption of enterprise application architecture for supply chain management
US20230297964A1 (en) Pay equity framework
Dunleavy et al. Measuring adverse impact in employee selection decisions
Samoilenko et al. An approach to modelling complex ICT4D investment problems: towards a solution-oriented framework and data analytics methodology

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18883788
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: PCT application non-entry in European phase
Ref document number: 18883788
Country of ref document: EP
Kind code of ref document: A1