CN117592450A - Panoramic archive generation method and system based on employee information integration - Google Patents

Info

Publication number
CN117592450A
Authority
CN
China
Prior art keywords
information
data
archive
panoramic
category
Legal status
Withdrawn
Application number
CN202311345854.9A
Other languages
Chinese (zh)
Inventor
冯天健
周明
张靖
马永
薛晓茹
徐道磊
唐轶轩
周婕
张子健
张迪
郑皓文
时雨农
查伟伟
Current Assignee
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Application filed by Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Priority to CN202311345854.9A
Publication of CN117592450A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of human resource management and provides a panoramic archive generation method and system based on employee information integration. The method comprises: receiving a data packet imported from internal and external data sources of an enterprise, and parsing the data packet to obtain employee information resource texts; establishing an employee information classification model, traversing the input employee information resource texts one by one, identifying and classifying employee information data based on a keyword database of an archive information list, determining the archive category of the employee, and generating an archive information item architecture tree corresponding to the archive information list; and loading a pre-configured module database, screening out the corresponding template based on the archive category and the archive information item architecture tree, writing the information into it according to its format, and automatically generating a panoramic archive. By automating and integrating employee information, the invention provides more comprehensive, higher-quality employee archives.

Description

Panoramic archive generation method and system based on employee information integration
Technical Field
The invention relates to the field of human resource management, in particular to a panoramic archive generation method and system based on employee information integration.
Background
In enterprise management, maintaining accurate and up-to-date employee information is critical to supporting human resources and salary management activities. As technology has developed, a number of employee information management methods and systems have emerged; however, they generally have limitations, including the following problems:
1. Information dispersion: enterprises traditionally use Human Resource Information Systems (HRIS) or similar systems to manage employee information, typically basic information such as name, contact details, work experience, training records and performance data. However, this information is often scattered across different departments and databases, making it difficult to integrate and access comprehensively; the result is fragmented, inconsistent information that is hard to maintain.
2. Poor timeliness: traditional employee information management usually relies on manual maintenance, so employee information may change without being updated in time, which undermines the accuracy of human resource decisions.
3. Data quality problems: during data collection and maintenance, employee information comes from a variety of sources, including internal databases, external recruitment websites, employee handbooks and online forms, and exists in different formats such as text, spreadsheets and database records. In addition, different departments may use different software and tools to manage employee information, so the information is dispersed across different databases and systems. The data quality problems caused by this diversity of sources and dispersion of data are a common challenge.
4. Manual integration and access: traditional employee information management typically relies on manual integration and data access. Employee information usually has to be integrated manually from multiple sources, and accessing it may require lengthy searches and data collation. Multi-angle queries are lacking, so it is difficult to view employee information from different perspectives, such as payroll history, training records and performance assessments.
5. Poor support for multi-scenario applications: conventional employee information management systems often cannot effectively support multiple application scenarios, such as salary adjustment and talent reporting, and require data to be exported and processed repeatedly.
Therefore, the prior art is usually built around single-purpose employee information systems, which struggle to meet diverse enterprise needs, cannot resolve data dispersion, manual processing, data inconsistency and access difficulties, and neglect the need for information integration and multi-angle queries. There is thus a need for a comprehensive, multi-angle employee information management method and system that uses a panoramic archive approach to meet the challenges of modern enterprise management and to improve the efficiency and accuracy of employee information management.
Disclosure of Invention
Embodiments of the invention aim to provide a panoramic archive generation method and system based on employee information integration that provide more comprehensive, higher-quality employee archives through automation and integration of employee information, solve various problems of traditional employee information management, give enterprises better data access, information integration and compliance, and help improve management efficiency and decision quality.
In order to achieve the above purpose, the embodiment of the present invention provides the following technical solutions.
In a first aspect, the present invention provides a panoramic archive generation method based on employee information integration, comprising the following steps:
receiving a data packet imported from internal and external data sources of the enterprise, and parsing the data packet to obtain employee information resource texts;
establishing an employee information classification model, traversing the input employee information resource texts one by one, identifying and classifying employee information data based on a keyword database of an archive information list, determining the archive category of the employee, and generating an archive information item architecture tree corresponding to the archive information list;
loading a pre-configured module database, screening out the corresponding template based on the archive category and the archive information item architecture tree, writing the information in according to the template format, and automatically generating a panoramic archive.
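The template step above is stated only at a high level. The following is a minimal Python sketch of template screening and filling, assuming that the pre-configured module database maps each archive category to a text template whose placeholders name information items of the archive information item architecture tree. The category names, template texts and the function generate_panoramic_archive are hypothetical illustrations, not the patent's implementation.

```python
from string import Template

# Hypothetical pre-configured module database: archive category -> template.
# Each placeholder names an information item of the architecture tree.
MODULE_DATABASE = {
    "technical_staff": Template(
        "Panoramic archive (technical staff)\n"
        "Name: $name\n"
        "Work experience: $work_experience\n"
        "Education: $education\n"
    ),
    "management_staff": Template(
        "Panoramic archive (management staff)\n"
        "Name: $name\n"
        "Positions held: $work_experience\n"
    ),
}


def generate_panoramic_archive(category: str, items: dict[str, str]) -> str:
    """Screen the template for the archive category and write the items into it."""
    template = MODULE_DATABASE.get(category)
    if template is None:
        raise KeyError(f"no template configured for category {category!r}")
    # safe_substitute leaves unknown placeholders in place instead of raising,
    # so a missing information item stays visible in the generated archive.
    return template.safe_substitute(items)
```

For example, generate_panoramic_archive("technical_staff", {"name": "Li Ming"}) still produces an archive, with $work_experience and $education left unfilled for later completion.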
As a further aspect of the invention, the panoramic archive generation method based on employee information integration further comprises calling the panoramic archive, which comprises the following steps:
acquiring an archive call request input at the user side, the request comprising the unique identifier of an employee and the specific items required by the call;
retrieving the panoramic archive data of the corresponding employee from the panoramic archive resource database according to the employee's unique identifier, and screening and sorting the data according to the call requirement;
identifying keyword data related to the call requirement in the screened panoramic archive data, including but not limited to dates and project names;
screening out the panoramic archive resource data that meets the call requirement based on the keyword data, and generating the corresponding panoramic archive text;
exporting the generated employee panoramic archive text for the user to review or process further.
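A minimal Python sketch of the calling steps, assuming the panoramic archive resource database can be modeled as an in-memory mapping from employee ID to dated records, each carrying an information item and a project name. ARCHIVE_DB, its field names and call_archive are hypothetical; dates and project names serve as the keyword data mentioned above.

```python
from datetime import date

# Hypothetical panoramic archive resource database: employee ID -> records.
ARCHIVE_DB = {
    "E001": [
        {"item": "training", "date": date(2022, 5, 10), "project": "Grid safety course"},
        {"item": "work_experience", "date": date(2019, 3, 1), "project": "Substation upgrade"},
        {"item": "training", "date": date(2021, 9, 2), "project": "Data analysis basics"},
    ],
}


def call_archive(employee_id, item, keyword=None, since=None):
    """Retrieve one employee's records for an item, screen, sort and render text."""
    records = ARCHIVE_DB.get(employee_id, [])
    # Screen by the specific item required by the call.
    selected = [r for r in records if r["item"] == item]
    # Screen by keyword data: project-name substring and/or earliest date.
    if keyword is not None:
        selected = [r for r in selected if keyword.lower() in r["project"].lower()]
    if since is not None:
        selected = [r for r in selected if r["date"] >= since]
    selected.sort(key=lambda r: r["date"])  # chronological order
    lines = [f"{r['date'].isoformat()}  {r['project']}" for r in selected]
    return f"Employee {employee_id} - {item}\n" + "\n".join(lines)
```

An unknown employee ID simply yields a header with no records, which leaves the decision of how to report missing archives to the caller.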
As a further aspect of the invention, parsing the data packet to obtain employee information resource texts comprises: identifying the format of each text file, table file or database file contained in the data packet; parsing the data packet with the loaded text parser, table processing library and database connection tool; and extracting the employee information data and converting it into structured text. The extracted employee information data includes basic personal information, work experience and educational background, and is converted into structured text in XML or JSON format.
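A minimal Python sketch of the parsing step using only the standard library, assuming a data packet is modeled as a mapping from file name to file content, that text files hold "key: value" lines with blank lines between employees, and that JSON files hold an array of records. Database files, which the method handles with a database connection tool, are skipped in this sketch. All names are hypothetical.

```python
import csv
import io
import json
import os


def parse_text(content: str) -> list[dict]:
    """Parse 'key: value' lines; a blank line separates employee records."""
    records, current = [], {}
    for line in content.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")  # a line without ':' gets an empty value
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def parse_table(content: str) -> list[dict]:
    """Parse a CSV table whose header row names the fields."""
    return list(csv.DictReader(io.StringIO(content)))


# The file extension identifies the format (an assumption of this sketch).
PARSERS = {".txt": parse_text, ".csv": parse_table, ".json": json.loads}


def parse_packet(packet: dict[str, str]) -> str:
    """Extract employee records from every supported file and emit JSON text."""
    records: list[dict] = []
    for name, content in packet.items():
        parser = PARSERS.get(os.path.splitext(name)[1].lower())
        if parser is None:
            continue  # unsupported here, e.g. database files
        records.extend(parser(content))
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Emitting XML instead of JSON would only change the final serialization step; the per-format parsers stay the same.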
As a further aspect of the invention, establishing the employee information classification model comprises the following steps:
Step 1, data acquisition and processing:
collecting employee information data, removing duplicate items and missing values, converting the data into document data in a standardized format, and labeling each document with its archive category to form the sample data; the employee information data includes personal information, work experience and educational background, and may come from an internal enterprise database, an external data source or information provided by users;
Step 2, keyword and phrase identification:
automatically identifying and extracting keywords and phrases in the document data with natural language processing (NLP) techniques, based on the established keyword and phrase database;
Step 3, data set division:
converting the document data into a data set represented by computer-processable feature vectors, and dividing the data set into a training set and a test set;
Step 4, model training and testing:
constructing a classification model with the naive Bayes algorithm and training it on the training set so that it learns to assign document data to different archive categories; evaluating the model's recall on the test set, and tuning the model parameters according to the evaluation results;
Step 5, generation of the archive information item architecture tree:
classifying the employee information texts with the trained classification model to determine the archive category of each text; based on the classification results, generating an information item architecture tree for each archive category, comprising different information items: personal information, work experience and educational background;
Step 6, model deployment:
deploying the trained model to the panoramic archive generation system to classify employee information automatically.
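Steps 1 to 3 can be sketched in Python under clearly stated assumptions: the keyword and phrase database is a hypothetical list KEYWORDS, plain case-insensitive substring matching stands in for the NLP-based extraction, and each document becomes a binary vector recording which keywords it contains. The 25% test ratio and fixed seed are illustrative choices, not values from the source.

```python
import random

# Hypothetical keyword and phrase database.
KEYWORDS = ["bachelor", "master", "university", "project", "engineer",
            "training", "certificate", "phone", "address"]


def preprocess(samples):
    """Step 1: drop samples with missing text or label, then remove duplicates."""
    seen, cleaned = set(), []
    for text, label in samples:
        if not text or not label:
            continue  # missing value
        key = (text.strip().lower(), label)
        if key in seen:
            continue  # duplicate item
        seen.add(key)
        cleaned.append((text.strip(), label))
    return cleaned


def to_features(text):
    """Steps 2-3: binary vector marking which keywords occur in the text."""
    lowered = text.lower()
    return [1 if kw in lowered else 0 for kw in KEYWORDS]


def split(samples, test_ratio=0.25, seed=0):
    """Step 3: shuffle reproducibly and divide into training and test sets."""
    data = [(to_features(text), label) for text, label in samples]
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_ratio)
    return data[n_test:], data[:n_test]
```

Seeding a private random.Random instance keeps the split reproducible without touching the global random state.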
As a further aspect of the invention, constructing the classification model with the naive Bayes algorithm further comprises calculating the prior probability of each category, as follows:
preparing a training data set of classified documents, comprising the document data and the category label of each document;
counting the documents in each category of the training data set to obtain the document frequency of each category;
counting the total number of documents in the training data set, and calculating the prior probability of each category by the formula:
P(C)=N/N_total
where P(C) is the prior probability of category C, N is the number of documents belonging to category C, and N_total is the total number of documents;
the resulting prior probability of each category is used to predict the category of new documents.
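The prior formula P(C) = N / N_total translates directly into code. A minimal Python sketch, assuming the category labels of the training documents are available as a list (the function name class_priors is hypothetical):

```python
from collections import Counter


def class_priors(labels):
    """P(C) = N / N_total, where N counts the training documents labeled C."""
    n_total = len(labels)
    if n_total == 0:
        raise ValueError("the training data set is empty")
    counts = Counter(labels)  # document frequency of each category
    return {category: n / n_total for category, n in counts.items()}
```

For example, class_priors(["education", "education", "work", "personal"]) gives 0.5 for education and 0.25 for each of the other two categories.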
As a further aspect of the invention, predicting the category of a new document from the prior probabilities further comprises calculating the feature conditional probability P(x|C), i.e. the probability of observing feature x given category C, as follows:
using the training data set of classified documents, which comprises the document data, the category label of each document and the features appearing in each document, counting the number of occurrences of each feature under each category;
for a given category C, denoting the frequency of feature x under category C by N(x, C); if the document data contains M different features, a frequency is obtained for every combination of category and feature;
calculating the relative frequency of feature x in each category C to obtain the conditional probability of each feature in each category:
P(x|C)=N(x,C)/N(C)
where P(x|C) is the conditional probability of feature x under category C, N(x, C) is the frequency of feature x under category C, and N(C) is the total number of documents under category C.
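A minimal Python sketch of P(x|C) = N(x, C) / N(C), assuming each training document is represented by the set of its features, so that N(x, C) counts the documents of category C that contain x and the ratio stays between 0 and 1 (if N(x, C) counted every occurrence of x, dividing by the document count N(C) could exceed 1). The alpha parameter is an addition not found in the source: it applies add-alpha (Laplace) smoothing, (N(x, C) + alpha) / (N(C) + 2 * alpha), which avoids zero probabilities for features never seen in a category; alpha = 0 reproduces the formula above exactly.

```python
from collections import Counter, defaultdict


def conditional_probs(documents, labels, alpha=0.0):
    """P(x|C) = N(x, C) / N(C) for every feature x and category C.

    documents: one feature set per training document.
    alpha: optional add-alpha smoothing (not in the source); 0 gives the plain ratio.
    """
    n_c = Counter(labels)         # N(C): number of documents per category
    n_xc = defaultdict(Counter)   # N(x, C): documents of category C containing x
    vocabulary = set()
    for features, label in zip(documents, labels):
        for x in set(features):   # count each feature at most once per document
            n_xc[label][x] += 1
            vocabulary.add(x)
    return {
        c: {x: (n_xc[c][x] + alpha) / (n_c[c] + 2 * alpha) for x in vocabulary}
        for c in n_c
    }
```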
As a further aspect of the invention, predicting the category of a new document with the naive Bayes classification model comprises the following steps:
preparing the trained naive Bayes classification model, comprising the category prior probabilities P(C) and the feature conditional probabilities P(x|C) calculated from the training data;
preparing the new document, applying the same text preprocessing as for the training data, and converting it into a feature vector;
for each category C, calculating the posterior probability P(C|X) by Bayes' theorem, where X is the feature vector of the new document:
P(C|X)=P(X|C)*P(C)/P(X)
where P(C|X) is the posterior probability of category C given X; P(X|C) is the conditional probability of X under category C, which under the naive independence assumption is the product of the conditional probabilities P(x|C) of the individual features x in X obtained from the training data; P(C) is the prior probability of category C, obtained from the training data; and P(X) is the marginal probability of X, computed as the sum of P(X|C)*P(C) over all categories. Because P(X) is the same for every category, it does not change which category has the highest posterior probability;
after the posterior probability of each category has been calculated, selecting the category with the highest posterior probability as the predicted category of the new document, namely:
predicted category = argmax_C P(C|X).
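A minimal Python sketch of the prediction step, assuming priors maps each category to P(C) and cond_probs maps each category to a dictionary of P(x|C). The likelihood P(X|C) is taken as the product of P(x|C) over the features present in the new document, and features never seen in training are ignored; this present-features-only product is a common simplification, whereas a full Bernoulli model would also multiply (1 - P(x|C)) for absent features. The function name predict is hypothetical.

```python
import math


def predict(features, priors, cond_probs):
    """Return (predicted category, posterior P(C|X) for every category)."""
    vocabulary = set().union(*(probs.keys() for probs in cond_probs.values()))
    observed = [x for x in set(features) if x in vocabulary]  # drop unseen features
    # Numerator of Bayes' theorem: P(X|C) * P(C), with P(X|C) = prod of P(x|C).
    joint = {
        c: prior * math.prod(cond_probs[c].get(x, 0.0) for x in observed)
        for c, prior in priors.items()
    }
    evidence = sum(joint.values())  # P(X) = sum over C of P(X|C) * P(C)
    if evidence == 0:
        raise ValueError("every category has zero probability; consider smoothing")
    posteriors = {c: p / evidence for c, p in joint.items()}
    return max(posteriors, key=posteriors.get), posteriors  # argmax_C P(C|X)
```

With many features the product can underflow to zero; summing logarithms of the probabilities and comparing the log scores is the usual remedy and leaves the argmax unchanged.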
As a further aspect of the invention, generating the archive information item architecture tree corresponding to the archive information list comprises:
defining an archive information list for employee information integration, containing the information items to be integrated;
organizing the information items in the archive information list hierarchically to create the archive information item architecture tree, and assigning each information item in the tree a unique tree-structure label.
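A minimal Python sketch of building the architecture tree, assuming the archive information list is given as nested dictionaries and that the unique tree-structure label is a dotted path such as 1.2.1 (the source does not specify the label scheme). ARCHIVE_INFO_LIST, its item names, build_tree and flatten are hypothetical.

```python
# Hypothetical archive information list: nested information items.
ARCHIVE_INFO_LIST = {
    "Personal information": {"Name": {}, "Contact": {"Phone": {}, "Email": {}}},
    "Work experience": {"Positions": {}, "Projects": {}},
    "Educational background": {"Degrees": {}},
}


def build_tree(items, prefix=""):
    """Assign a unique dotted label (e.g. '1.2.1') to every information item."""
    tree = []
    for index, (name, children) in enumerate(items.items(), start=1):
        label = f"{prefix}{index}"
        tree.append({"label": label, "name": name,
                     "children": build_tree(children, prefix=f"{label}.")})
    return tree


def flatten(tree):
    """Map each unique label to its information item name (depth-first)."""
    mapping = {}
    for node in tree:
        mapping[node["label"]] = node["name"]
        mapping.update(flatten(node["children"]))
    return mapping
```

Dotted labels encode each item's position in the hierarchy, so a label also identifies all of its ancestors (for example, 1.2.1 lies under 1.2 and 1).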
As a further aspect of the invention, when the panoramic archive is generated, the employee information is mapped to the corresponding information items in the information item architecture tree and stored in the panoramic archive resource database, creating a complete panoramic archive; the user side can then query and retrieve the panoramic archive data of a given employee using the employee's unique identifier and the specific items required.
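A minimal Python sketch of mapping and storage, assuming an in-memory dictionary stands in for the panoramic archive resource database and that employee records use the same field names as the information items. TREE_ITEMS, store_employee and query are hypothetical; returning the unmapped fields for review is an illustrative design choice.

```python
# Hypothetical leaf items of the information item architecture tree:
# unique tree label -> information item name.
TREE_ITEMS = {"1.1": "Name", "1.2": "Phone", "2.1": "Positions", "3.1": "Degrees"}

# Stand-in for the panoramic archive resource database:
# employee ID -> {tree label: value}.
ARCHIVE_RESOURCE_DB: dict[str, dict[str, str]] = {}


def store_employee(employee_id, record):
    """Map a record onto tree labels, store it, and return fields left unmapped."""
    by_name = {name: label for label, name in TREE_ITEMS.items()}
    mapped = {by_name[k]: v for k, v in record.items() if k in by_name}
    ARCHIVE_RESOURCE_DB[employee_id] = mapped
    return sorted(k for k in record if k not in by_name)


def query(employee_id, label):
    """Retrieve one information item of one employee by its tree label."""
    return ARCHIVE_RESOURCE_DB.get(employee_id, {}).get(label)
```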
In a second aspect, the present invention further provides a panoramic archive generation system based on employee information integration, comprising:
a data import module, used to receive data packets from internal and external data sources of the enterprise and to parse the data packets to obtain employee information resource texts;
an employee information classification model, used to automatically identify and classify employee information data, determine the archive category of the employee, and generate an archive information item architecture tree from the information items defined in the archive information list;
a pre-configured module database, containing predefined templates and formats, used to screen templates and automatically generate the panoramic archive according to the archive category and the archive information item architecture tree;
a panoramic archive resource database, used to store the generated panoramic archive data, with employee information stored according to the structure of the information item architecture tree;
a panoramic archive calling module, allowing the user side to retrieve and organize panoramic archive data according to the employee's unique identifier and the specific items required by the call, and generating panoramic archive text that meets the call requirement for the user to review.
As a further aspect of the invention, the system further comprises a text parser, a table processing library and a database connection tool, used respectively to parse the text files, table files and database files contained in the imported data packets, and to extract employee information data and convert it into structured text.
As a further aspect of the invention, the system further comprises a naive Bayes classification model for automatically classifying employee information; the category prior probabilities and feature conditional probabilities are calculated from the training data set of classified documents and used to predict the category of new documents.
In a third aspect, another embodiment of the present invention provides a panoramic archive generation device based on employee information integration, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; the memory stores at least one executable instruction that causes the processor to perform the operations of the panoramic archive generation method based on employee information integration according to the first aspect.
In a fourth aspect, another embodiment of the present invention provides a storage medium storing at least one executable instruction that causes a processor to perform the operations of the panoramic archive generation method based on employee information integration according to the first aspect.
Compared with the prior art, the panoramic archive generation method and system based on employee information integration provided by embodiments of the invention have the following beneficial effects:
1. Automated information integration: the method and system automatically parse and integrate employee information from different data sources without manual intervention, greatly improving the efficiency of information integration and reducing the time and labor cost of manual processing.
2. Data accuracy: by automatically classifying and structuring employee information, the method and system help reduce data entry errors and information inconsistency, so the generated panoramic archive maintains high quality and accuracy.
3. Information traceability: the generated panoramic archive lets users easily trace the history and changes of employee information, which is important for auditing and compliance. Enterprises can also define the archive information list and information item architecture tree according to their specific needs, making the system highly customizable and adaptable to different organizations rather than imposing a fixed information specification.
4. Fast data retrieval: the panoramic archive calling function lets users easily search and retrieve employee information by unique employee identifier and specific items, improving the speed and efficiency of data access and retrieval.
5. Panoramic view: the generated panoramic archive provides a comprehensive view of employee information, including personal information, work experience and educational background, helping managers understand employees better and supporting better-informed business decisions such as recruitment, performance assessment and employee training.
In summary, the method and system improve the efficiency and quality of employee information management, promote the consistency, traceability and security of information, and provide enterprises with better data access and decision support tools.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 shows a flowchart of a panoramic archive generation method based on employee information integration provided by the invention;
FIG. 2 shows a flowchart of calling the panoramic archive in the panoramic archive generation method based on employee information integration according to an embodiment of the present invention;
FIG. 3 shows a system architecture diagram of the panoramic archive generation system based on employee information integration according to an embodiment of the present invention.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments herein to enable those skilled in the art to practice them. Portions and features of some embodiments may be included in, or substituted for, those of others. The scope of the embodiments herein includes the full scope of the claims, as well as all available equivalents of the claims.
The terms "first," "second," and the like herein are used merely to distinguish one element from another element and do not require or imply any actual relationship or order between the elements. Indeed the first element could also be termed a second element and vice versa. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a structure, system, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such structure, system, or apparatus.
To address the various problems of traditional employee information management, the panoramic archive generation method and system based on employee information integration give enterprises a better scheme for data access, information integration and compliance, and provide more comprehensive, higher-quality employee archives through automation and integration of employee information.
Referring to FIG. 1, an embodiment of the present invention provides a panoramic archive generation method based on employee information integration, comprising the following steps:
Step S10: receiving a data packet imported from internal and external data sources of the enterprise, and parsing the data packet to obtain employee information resource texts.
Parsing the data packet to obtain employee information resource texts comprises: identifying the format of each text file, table file or database file contained in the data packet; parsing the data packet with the loaded text parser, table processing library and database connection tool; and extracting the employee information data and converting it into structured text. The extracted employee information data includes basic personal information, work experience and educational background, and is converted into structured text in XML or JSON format.
Step S20: establishing an employee information classification model, traversing the input employee information resource texts one by one, identifying and classifying employee information data based on a keyword database of the archive information list, determining the archive category of the employee, and generating an archive information item architecture tree corresponding to the archive information list.
Step S30: loading a pre-configured module database, screening out the corresponding template based on the archive category and the archive information item architecture tree, writing the information in according to the template format, and automatically generating the panoramic archive.
In step S10 of this embodiment, the system receives a data packet imported from internal and external data sources of the enterprise. The data packet may contain files in various formats, such as text files, table files or database files, and contains employee information resource text such as basic personal information, work experience and educational background. The data packet is parsed with a text parser, a table processing library and a database connection tool: the different file formats are identified, and the employee information data is extracted and converted into structured text, typically in XML or JSON format.
In step S20, the employee information classification model is built as follows. In the data collection and processing stage, employee information data is collected from the enterprise's internal database, external data sources or users. Duplicate items and missing values are removed, the data is converted into document data in a standardized format, and each document is labeled with its archive category. Keyword and phrase recognition follows: based on the established keyword and phrase database, the system uses natural language processing (NLP) techniques to automatically identify and extract keywords and phrases from the document data, which helps distinguish the different parts of the employee information more finely. Next, the document data is converted into a data set of computer-processable feature vectors, and the data set is divided into a training set and a test set for training and evaluating the classification model. In model training and testing, a naive Bayes algorithm is used to build a classification model that learns to assign document data to the different archive categories. The model is trained on the training set and evaluated on the test set, and the system can adjust the model parameters according to the evaluation results.
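The cleaning, keyword recognition and feature-vector steps described above can be sketched as follows. The sketch makes several assumptions. The keyword list `KEYWORDS` is a hypothetical stand-in for the keyword and phrase database. Matching is naive case-insensitive substring counting rather than a full NLP pipeline, so for example "master" would also match inside "headmaster". Records are dictionaries of strings. All function names are illustrative.

```python
from collections import Counter

# Hypothetical keyword and phrase database; real entries would come from the
# archive information list maintained by the enterprise.
KEYWORDS = ["bachelor", "master", "university", "project", "engineer",
            "internship", "full-time", "part-time"]


def clean(records: list[dict], required: tuple[str, ...]) -> list[dict]:
    """Remove exact duplicates and records with a missing or empty required field."""
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen or any(not rec.get(field) for field in required):
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def extract_keywords(text: str, vocabulary: list[str] = KEYWORDS) -> Counter:
    """Count each keyword or phrase by naive case-insensitive substring matching."""
    lowered = text.lower()
    return Counter({kw: lowered.count(kw) for kw in vocabulary if kw in lowered})


def to_feature_vector(text: str, vocabulary: list[str] = KEYWORDS) -> list[int]:
    """Represent a document as keyword counts in a fixed vocabulary order."""
    counts = extract_keywords(text, vocabulary)
    return [counts[kw] for kw in vocabulary]


docs = clean(
    [{"id": "E1", "text": "Bachelor degree, then master at university"},
     {"id": "E1", "text": "Bachelor degree, then master at university"},  # duplicate
     {"id": "E2", "text": ""}],                                            # missing value
    required=("id", "text"),
)
print([to_feature_vector(d["text"]) for d in docs])  # [[1, 1, 1, 0, 0, 0, 0, 0]]
```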
Finally, the archive information item architecture tree is formed. The trained classification model classifies the employee information texts, and an information item architecture tree is generated for each archive category from the classification results. The tree contains information items such as personal information, work experience and educational background, and is used to organize and store the different parts of the employee information.
In step S30, archive generation consists of loading the pre-configured module database and generating the panoramic archive. The system first loads the pre-configured module database, which contains the templates needed to generate employee information archives for the different archive categories. To generate a panoramic archive, the system selects the applicable template based on the archive category and the archive information item architecture tree, and writes the data into the template in the specified format. This automatically produces a panoramic archive whose information items are organized according to the predefined information item architecture tree.
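Template selection and filling can be sketched with Python's standard `string.Template`. This is a minimal illustration under stated assumptions. `TEMPLATE_DB` is a hypothetical stand-in for the pre-configured module database with one template per archive category. The architecture tree is simplified to a flat mapping from labels to information items. The function and placeholder names are illustrative.

```python
from string import Template

# Hypothetical pre-configured module database: one template per archive category.
TEMPLATE_DB = {
    "full-time employee": Template(
        "Panoramic archive [$category]\nEmployee: $name ($employee_id)\n$items"
    ),
}


def render_items(tree: dict[str, str], items: dict[str, list[str]]) -> str:
    """Write each information item's entries in the order of its tree label."""
    lines = []
    for label, item in sorted(tree.items()):  # lexical label order; fine for labels "1".."9"
        for value in items.get(item, []):
            lines.append(f"{label} {item}: {value}")
    return "\n".join(lines)


def generate_archive(category: str, tree: dict[str, str], employee: dict) -> str:
    """Select the template for the archive category and fill it in."""
    template = TEMPLATE_DB.get(category)
    if template is None:
        raise KeyError(f"no template for archive category {category!r}")
    return template.substitute(
        category=category,
        name=employee["name"],
        employee_id=employee["employee_id"],
        items=render_items(tree, employee["items"]),
    )


tree = {"1": "personal information", "2": "work experience", "3": "educational background"}
employee = {
    "name": "Li Lei",
    "employee_id": "E001",
    "items": {"work experience": ["2015-2020 engineer"], "educational background": ["BSc"]},
}
print(generate_archive("full-time employee", tree, employee))
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which is one simple way to make sure every generated archive conforms to its template.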
The panoramic archive generation method based on employee information integration thus automates the integration of employee information and the generation of, and access to, high-quality panoramic archives. This helps improve the efficiency of data management and decision support within the enterprise.
In some embodiments, as shown in Fig. 2, the panoramic archive generation method based on employee information integration further includes invoking the panoramic archive. Invoking the panoramic archive comprises the following steps:
step S101: obtain the archive access request entered at the user terminal. The request has two main parts:
Employee unique identifier: the user provides the employee's unique identifier, typically a unique identification number or employee ID, so that the employee can be identified precisely;
Specific item information to be invoked: the user specifies the item information to be retrieved, such as the employee's educational background, work experience, or items within a specific date range.
Step S102, according to the unique identifier of the staff, the panoramic archive data of the corresponding staff is retrieved from a panoramic archive resource database, and screening and sorting are carried out according to the calling requirement;
step S103, identifying keyword data related to the calling requirement, including but not limited to date and project name, from the screened panoramic archive data;
step S104, based on the keyword data, panoramic archive resource data meeting the calling requirement is screened out, and a corresponding panoramic archive text is generated;
step S105, the generated employee panoramic archive text is exported so that the user can review or further process the employee panoramic archive text.
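Steps S102 to S105 can be sketched as follows. The sketch assumes a hypothetical in-memory resource database keyed by employee identifier, in which each entry carries an information item name, a date and a text. Keyword identification (S103) is simplified to matching the item name and an optional date range. The `Entry` class and `invoke_archive` function are illustrative names, not from the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    item: str    # information item, e.g. "work experience"
    start: date  # date used for range filtering and sorting
    text: str


def invoke_archive(db: dict[str, list[Entry]], employee_id: str, item: str,
                   date_from: Optional[date] = None, date_to: Optional[date] = None) -> str:
    """Retrieve one employee's entries, filter by item and date range, sort, and export as text."""
    entries = db.get(employee_id)
    if entries is None:
        raise KeyError(f"unknown employee identifier {employee_id!r}")
    selected = [
        e for e in entries
        if e.item == item
        and (date_from is None or e.start >= date_from)
        and (date_to is None or e.start <= date_to)
    ]
    selected.sort(key=lambda e: e.start)
    return "\n".join(f"{e.start.isoformat()} {e.text}" for e in selected)


# Hypothetical panoramic archive resource database keyed by employee identifier.
db = {
    "E001": [
        Entry("work experience", date(2018, 3, 1), "Project B lead"),
        Entry("work experience", date(2015, 7, 1), "Project A engineer"),
        Entry("educational background", date(2011, 9, 1), "BSc"),
    ]
}
print(invoke_archive(db, "E001", "work experience", date_from=date(2014, 1, 1)))
```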
In this embodiment, relevant information is retrieved from the panoramic archive resources and arranged according to the user's request. This lets users quickly access the employee information they need and improves the availability and accessibility of the information.
In the embodiment of the invention, an employee information classification model is established, which comprises the following steps:
step 1, data acquisition and processing:
collect employee information data, remove duplicate items and missing values, convert the data into document data in a standardized format, and label each document with its category; the labeled documents serve as sample data. The employee information data includes personal information, work experience and educational background, and comes from the enterprise's internal database, external data sources, or information provided by users;
step 2, keyword and phrase identification:
automatically identifying and extracting keywords and phrases in the document data by utilizing Natural Language Processing (NLP) technology based on the established keyword and phrase database;
step 3, data set division:
convert the document data into a data set of computer-processable feature vectors, and divide the data set into a training set and a test set;
step 4, model training and testing:
build a classification model with a naive Bayes algorithm and train it on the training set so that it learns to assign document data to the different archive categories; evaluate the model's recall on the test set, and adjust the model parameters according to the evaluation results;
Step 5, generating a file information item architecture tree:
classifying the employee information texts by using the trained classification model, and determining the archive category of each text; based on the classification result, generating an information item architecture tree of each archive category, wherein the information item architecture tree comprises different information items: personal information, work experience, educational background;
step 6, deploying a model:
the trained model is deployed to a panoramic archive generation system for automatic classification of employee information.
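Steps 3 and 4 can be illustrated by a reproducible train/test split and the per-class recall used for evaluation. This is a minimal sketch. The 25% test ratio, the seed and the function names are assumptions, and the classifier itself is left out. Recall for a class is the fraction of that class's test documents that the model assigns to it.

```python
import random


def train_test_split(samples: list, test_ratio: float = 0.25, seed: int = 42):
    """Shuffle a copy of the samples reproducibly and split off a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def recall_per_class(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Recall = correctly predicted documents of a class / all documents of that class."""
    result = {}
    for label in sorted(set(y_true)):
        actual = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in actual if y_pred[i] == label)
        result[label] = hits / len(actual)
    return result


train, test = train_test_split(list(range(8)))
print(len(train), len(test))  # 6 2
print(recall_per_class(["intern", "intern", "full-time", "full-time"],
                       ["intern", "full-time", "full-time", "full-time"]))
# {'full-time': 1.0, 'intern': 0.5}
```

One plausible form of "parameter adjustment" is to retrain with different settings, such as a smoothing constant, and keep the setting with the best test recall. The source does not specify which parameters are tuned.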
In this embodiment, building the classification model with the naive Bayes algorithm further includes calculating the category prior probabilities through the following steps:
preparing a training data set containing classified documents, wherein the training data set comprises document data and category labels corresponding to the document data;
counting the training data set, and calculating the number of documents in each category to obtain the document frequency of each category;
calculating the total number of documents in the training data set, and calculating the prior probability of each category using the formula:
P(C)=N/N_total
where P(C) is the prior probability of category C; N is the number of documents belonging to category C; N_total is the total number of documents;
the obtained prior probability of each category is used for predicting the category of the new document.
The workflow for calculating the category prior probabilities when building the naive Bayes classification model is as follows:
(1) Preparing a training data set: first, a training dataset containing classified documents is collected. Each document should be associated with its corresponding category label, and the training dataset includes document data and the category label to which the document data corresponds.
(2) Counting the number of documents: counting the training data set, and calculating the number of documents in each category to determine the document frequency of each category; the number of documents indicates how many documents are in each category.
(3) Calculating the total document number: the total number of documents in the entire training dataset, i.e. the sum of the number of documents in all categories, is calculated.
(4) Calculating the category prior probability: the prior probability P(C) is calculated for each category C. It represents the probability that a document belongs to category C, and is calculated with the following formula:
P(C)=N/N_total
where P(C) is the prior probability of category C, N is the number of documents belonging to category C, and N_total is the total number of documents.
Example: suppose an employee archive classification model is being built in which the document categories are employee types (e.g., full-time employees, part-time employees and interns), and a training data set with the following composition has been prepared:
100 documents are labeled "full-time employee".
50 documents are labeled "part-time employee".
30 documents are labeled "intern".
Now, the prior probability for each category needs to be calculated.
1. For the "full-time employee" category:
N("full-time employee") = 100 (100 documents belong to the full-time employee category)
N_total (total number of documents) = 100 + 50 + 30 = 180
Prior probability: P("full-time employee") = 100/180 ≈ 0.5556
2. For the "part-time employee" category:
N("part-time employee") = 50
N_total (total number of documents) = 180
Prior probability: P("part-time employee") = 50/180 ≈ 0.2778
3. For the "intern" category:
N("intern") = 30
N_total (total number of documents) = 180
Prior probability: P("intern") = 30/180 ≈ 0.1667
The prior probability of each category has now been calculated. To predict the category of a new document, these priors are combined, through Bayes' theorem, with the likelihood of the new document's feature vector to give the posterior probability of each category. The category with the highest posterior probability is the predicted category, which allows the model to classify new documents automatically.
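The prior calculation P(C) = N/N_total can be checked with a short Python sketch that reproduces the example above. The function name `class_priors` is illustrative. The label list simply recreates the 100/50/30 document counts.

```python
from collections import Counter


def class_priors(labels: list[str]) -> dict[str, float]:
    """P(C) = N / N_total, where N counts the training documents labelled C."""
    counts = Counter(labels)
    n_total = sum(counts.values())
    return {c: n / n_total for c, n in counts.items()}


labels = ["full-time employee"] * 100 + ["part-time employee"] * 50 + ["intern"] * 30
priors = class_priors(labels)
for category, p in priors.items():
    print(f"{category}: {p:.4f}")
# full-time employee: 0.5556
# part-time employee: 0.2778
# intern: 0.1667
```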
In this embodiment, predicting the category of a new document from the prior probability of each category further involves calculating the feature conditional probability P(x|C), i.e., the probability of observing feature x given category C, as follows:
Count the documents in the training data set of classified documents, and calculate the number of occurrences of each feature under each category; the training data set includes the document data, the category label of each document, and the features appearing in each document;
For a given category C, count the frequency of each feature x: if the document data contains M different features, denote by N(x, C) the frequency of feature x under category C, which gives one frequency for each category-feature pair;
Calculate the relative frequency P(x|C) of each feature x in each category C to obtain the conditional probability of each feature in each category:
P(x|C)=N(x,C)/N(C)
where P(x|C) is the conditional probability of feature x under category C; N(x, C) is the frequency of feature x under category C; N(C) is the total number of documents under category C.
For example, suppose that for the employee archive classification model a frequency matrix has been built that contains the number of occurrences of each feature under each category. Assume three categories, A, B and C, and four features, X, Y, Z and W. Part of the frequency matrix is shown below:
Feature   A    B    C
X         20   15   10
Y         12   18   9
Z         8    7    5
W         6    4    3
Next, the feature conditional probabilities are calculated. For feature X under category A, with the denominator N(A) taken as the total number of feature occurrences in category A (20 + 12 + 8 + 6 = 46):
P(X|A) = N(X,A)/N(A) = 20/46 ≈ 0.4348
This is the probability of observing feature X under category A. The conditional probabilities of the other features under the other categories are calculated in the same way. The naive Bayes classification model combines these conditional probabilities with the features observed in a new document to calculate the posterior probability of each category and so determine the most likely category.
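The conditional probabilities for the frequency matrix above can be computed with the Python sketch below. In the sketch, the denominator is the total feature count in each category, as in the worked example. This is the multinomial form of naive Bayes and differs from the document-count definition of N(C) given earlier. The optional `alpha` parameter adds Laplace smoothing. Smoothing is a common addition that the source does not mention, and it is included only as an assumption. With `alpha = 0`, the function reproduces the plain relative frequency.

```python
# Frequency matrix from the example: FREQ[category][feature] = N(x, C).
FREQ = {
    "A": {"X": 20, "Y": 12, "Z": 8, "W": 6},
    "B": {"X": 15, "Y": 18, "Z": 7, "W": 4},
    "C": {"X": 10, "Y": 9, "Z": 5, "W": 3},
}


def conditional_probs(freq: dict[str, dict[str, int]],
                      alpha: float = 0.0) -> dict[str, dict[str, float]]:
    """P(x|C) = (N(x,C) + alpha) / (sum of feature counts in C + alpha * vocabulary size).

    With alpha = 0 this is the plain relative frequency used in the worked example;
    alpha = 1 gives Laplace smoothing, which keeps unseen features from scoring zero.
    """
    probs = {}
    for category, counts in freq.items():
        total = sum(counts.values())
        vocab_size = len(counts)
        probs[category] = {
            x: (n + alpha) / (total + alpha * vocab_size) for x, n in counts.items()
        }
    return probs


p = conditional_probs(FREQ)
print(f"P(X|A) = {p['A']['X']:.4f}")  # P(X|A) = 0.4348
```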
In this embodiment, predicting the category of a new document with the naive Bayes classification model comprises the following steps:
prepare a trained naive Bayes classification model, including the category prior probabilities P(C) and the feature conditional probabilities P(x|C) calculated from the training data;
prepare the new document, apply the same text preprocessing as was applied to the training data, and convert it into a feature vector;
for each category C, calculate the posterior probability P(C|X) using Bayes' theorem, where X is the feature vector of the new document:
P(C|X)=P(X|C)*P(C)/P(X)
where P(C|X) is the posterior probability of category C given feature vector X; P(X|C) is the conditional probability of X under category C, obtained from the training data; P(C) is the prior probability of category C, obtained from the training data; and P(X) is the marginal probability of X, calculated as the sum of P(X|C)*P(C) over all categories. Because P(X) is the same for every category, it does not change which category has the highest posterior probability;
After the posterior probability of each category has been calculated, the category with the highest posterior probability is selected as the predicted category of the new document, namely:
predicted category = argmax_C P(C|X).
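The posterior calculation and the argmax can be sketched in Python. In this sketch, P(X|C) is taken as the product of the per-feature conditional probabilities raised to their counts, which is the naive independence assumption. The computation is done in log space, and P(X) is obtained as the sum of P(X|C)*P(C) over all categories using the log-sum-exp trick. The priors, conditional probabilities and feature names are made-up illustration values, and all conditional probabilities must be positive (for example, after smoothing).

```python
import math


def posterior(x_counts: dict[str, int], priors: dict[str, float],
              cond: dict[str, dict[str, float]]) -> dict[str, float]:
    """P(C|X) = P(X|C) * P(C) / P(X), with P(X) = sum over C of P(X|C) * P(C).

    Naive independence: log P(X|C) = sum of count(x) * log P(x|C). Working in log
    space avoids underflow; all P(x|C) must be > 0 (e.g. after smoothing).
    """
    log_joint = {}
    for c, prior in priors.items():
        log_joint[c] = math.log(prior) + sum(
            k * math.log(cond[c][x]) for x, k in x_counts.items()
        )
    # log P(X) via the log-sum-exp trick for numerical stability.
    m = max(log_joint.values())
    log_px = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {c: math.exp(v - log_px) for c, v in log_joint.items()}


def predict(x_counts, priors, cond) -> str:
    """Predicted category = argmax over C of P(C|X)."""
    post = posterior(x_counts, priors, cond)
    return max(post, key=post.get)


priors = {"A": 0.5, "B": 0.5}
cond = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}
print(posterior({"x": 1}, priors, cond))  # A ≈ 0.727, B ≈ 0.273
print(predict({"x": 1}, priors, cond))    # A
```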
For example, assume that a naive Bayes classification model has been trained to classify news articles into two categories, sports and science, and that a new sports news article needs to be classified:
1. The article is preprocessed, including word segmentation and stop-word removal.
2. Features are extracted from the article, for example word frequencies in a bag-of-words representation.
3. The naive Bayes model then uses Bayes' theorem to calculate the posterior probabilities that the article belongs to the sports and science categories.
4. Finally, the category with the highest posterior probability is selected as the classification result: if P(sports|X) > P(science|X), the article is classified as sports news.
This process assigns the new document to the most likely category so that the new document can be correctly classified.
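The sports/science example can be run end to end with the Python sketch below. It is a minimal multinomial naive Bayes with Laplace smoothing. The four training sentences and the short stop-word list are invented for illustration, and tokenization is a simple regular expression standing in for word segmentation. P(X) is omitted because it does not affect the argmax.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller list (and a
# Chinese word segmenter for Chinese text).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}


def preprocess(text: str) -> list[str]:
    """Lower-case, split into word tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def train(docs: list[tuple[str, str]]):
    """Count documents per category and word occurrences per category."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in doc_counts}
    for text, label in docs:
        word_counts[label].update(preprocess(text))
    vocab = set().union(*word_counts.values())
    return doc_counts, word_counts, vocab, len(docs)


def classify(text: str, model, alpha: float = 1.0) -> str:
    """Return argmax_C of log P(C) + sum log P(word|C), with Laplace smoothing."""
    doc_counts, word_counts, vocab, n_docs = model
    scores = {}
    for c, n_c in doc_counts.items():
        total = sum(word_counts[c].values())
        score = math.log(n_c / n_docs)
        for tok in preprocess(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][tok] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


model = train([
    ("The team won the football match in the final minute", "sports"),
    ("The striker scored two goals and the coach praised the team", "sports"),
    ("Scientists observed a new exoplanet with the space telescope", "science"),
    ("The laboratory experiment measured quantum effects in the crystal", "science"),
])
print(classify("The coach said the team will win the next match", model))  # sports
print(classify("Astronomers used a telescope to study the exoplanet", model))  # science
```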
In some embodiments, generating the archive information item architecture tree corresponding to the archive information list comprises:
defining the archive information list for employee information integration, which contains the information items to be integrated;
organizing the information items in the archive information list hierarchically to create the archive information item architecture tree, and assigning each information item in the tree a unique tree-structure label.
When the panoramic archive is generated, the employee information is mapped to the corresponding information items of the information item architecture tree and stored in the panoramic archive resource database, creating a complete panoramic archive. The user terminal can then query and retrieve the corresponding employee's panoramic archive data using the employee unique identifier and the specific item information required.
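The tree construction and mapping steps can be sketched as follows. `INFO_LIST` is a hypothetical archive information list with two sub-items per information item. The dotted labels ("2", "2.1") are one possible choice of unique tree-structure label, and the resource database is a plain dictionary keyed by employee identifier. All names are illustrative.

```python
# Hypothetical archive information list: information items and their sub-items.
INFO_LIST = {
    "personal information": ["name", "date of birth"],
    "work experience": ["employer", "position"],
    "educational background": ["school", "degree"],
}


def build_tree(info_list: dict[str, list[str]]) -> dict[str, str]:
    """Assign each item and sub-item a unique hierarchical label such as '2' or '2.1'."""
    tree = {}
    for i, (item, sub_items) in enumerate(info_list.items(), start=1):
        tree[str(i)] = item
        for j, sub in enumerate(sub_items, start=1):
            tree[f"{i}.{j}"] = f"{item}/{sub}"
    return tree


def store_archive(db: dict, employee_id: str, tree: dict[str, str],
                  values: dict[str, str]) -> None:
    """Map values keyed by 'item/sub-item' path onto tree labels and store them.

    An unknown path raises KeyError, so data that does not fit the tree is not silently dropped.
    """
    path_to_label = {path: label for label, path in tree.items()}
    db[employee_id] = {path_to_label[path]: v for path, v in values.items()}


tree = build_tree(INFO_LIST)
archive_db = {}
store_archive(archive_db, "E001", tree,
              {"personal information/name": "Li Lei", "educational background/degree": "BSc"})
print(archive_db["E001"])  # {'1.1': 'Li Lei', '3.2': 'BSc'}
```

A query by employee identifier and item then reduces to a lookup such as `archive_db["E001"]["3.2"]`.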
The panoramic archive generation method based on employee information integration provided by the invention manages and uses employee information from the enterprise's internal and external data sources, and has the following advantages:
1. Panoramic archive generation: the method converts employee information from the enterprise's internal and external data sources into panoramic archives covering personal information, work experience, educational background and more. The archives are organized in a structured manner, so the information is easy to access and manage.
2. Automatic classification and archive generation: an employee information classification model automatically identifies and classifies employee information based on the keyword database. The archive information item architecture tree is then generated automatically from the structure of the archive information list, which automates archive generation and reduces manual work.
3. Automatic template application: the method selects a suitable template according to the archive category and the information item architecture tree and fills the data into it, so that the generated panoramic archives follow consistent standards and formats.
4. Intelligent archive invocation: in some embodiments, the method also provides an invocation function for the panoramic archive that lets users query and retrieve panoramic archive data by employee unique identifier and specific item information. This makes the system more interactive and employee information easier to access.
5. Naive Bayes classification model: the classification model is built with a naive Bayes algorithm, which enables new documents to be classified automatically.
6. Efficient employee information integration: the invention improves the integration and management efficiency of employee information, and is suitable for various enterprises, especially for human resources and employee file management.
In summary, the panoramic archive generation method based on employee information integration greatly improves the information management efficiency of enterprises, reduces manual work, ensures the consistency and the integrity of data, and provides highly accessible employee information resources. This is of great value to both the operation and management of the enterprise.
It should be understood that although the steps are described in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, some steps of this embodiment may include multiple sub-steps or stages. These need not be performed at the same time and may be performed at different times. Nor must they be performed sequentially: they may be performed in turn or alternately with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 3, a panoramic archive generation system based on employee information integration is provided, comprising:
Data import module: receives data packets from external data sources within the enterprise and parses them to obtain the employee information resource text, which provides the raw data for subsequent processing.
Employee information classification model: automatically identifies and classifies employee information data, determines the employee's archive category, and generates the archive information item architecture tree from the information items defined in the archive information list. It uses a naive Bayes classification model to assign employee information to the different archive categories, and a predefined keyword database to identify and tag key information, which makes subsequent archive generation more targeted.
Pre-configured module database: contains predefined templates and formats; templates are selected from it according to the archive category and the archive information item architecture tree, and the panoramic archive is generated automatically.
Panoramic archive resource database: stores the generated panoramic archive data, with the employee information organized according to the structure of the information item architecture tree;
Panoramic archive invocation module: allows the user terminal to retrieve and arrange panoramic archive data by employee unique identifier and the specific item information required, and generates panoramic archive text that meets the invocation requirement for the user to review.
The panoramic archive generation system based on employee information integration further comprises a text analyzer, a table processing library and a database connection tool. These parse, respectively, the text files, table files and database files contained in the imported data packets, and extract the employee information data and convert it into structured text.
The panoramic archive generation system based on employee information integration further comprises a naive Bayes classification model for the automatic classification of employee information. The category prior probabilities and the feature conditional probabilities are calculated from the training data set of classified documents and used to predict the category of new documents.
In this embodiment, the panoramic archive generation system based on employee information integration provides enterprises and organizations with an employee information management solution. The system receives, integrates, classifies and stores employee information, and then automatically generates panoramic archives according to a predefined information architecture. Because the system performs the steps of the panoramic archive generation method based on employee information integration described above, its operation is not described again in detail here.
In one embodiment, a computer device is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, and when the processor executes the instructions, it implements the steps of the panoramic archive generation method based on employee information integration described above.
In one embodiment, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a computer to perform the steps of a method for generating a panoramic archive based on employee information integration.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program consisting of computer instructions that, when executed, cause the relevant hardware to perform the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory.
The non-volatile memory may include read-only memory, magnetic tape, floppy disk, flash memory, optical memory, etc. Volatile memory can include random access memory or external cache memory. By way of illustration, and not limitation, RAM can take many forms, such as static random access memory or dynamic random access memory.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. A panoramic archive generation method based on employee information integration is characterized by comprising the following steps:
receiving a data packet imported by an external data source in an enterprise, and analyzing the data packet to obtain employee information resource text;
establishing an employee information classification model, traversing the input employee information resource text one text at a time, identifying and classifying employee information data based on the keyword database of the archive information list, determining the employee's archive category, and generating the archive information item architecture tree corresponding to the archive information list;
loading a pre-configured module database, selecting the corresponding template based on the archive category and the archive information item architecture tree, writing the data in the template's format, and automatically generating a panoramic archive.
2. The panoramic archive generation method based on employee information integration according to claim 1, further comprising invoking the panoramic archive, wherein invoking the panoramic archive comprises the following steps:
obtaining an archive invocation request entered at the user terminal, the request comprising the employee unique identifier and the specific item information required by the invocation;
according to the unique identifier of the employee, the panoramic archive data of the corresponding employee is retrieved from the panoramic archive resource database, and screening and sorting are carried out according to the calling requirement;
identifying keyword data related to the calling requirement in the screened panoramic archive data, including but not limited to date and project name;
screening out panoramic archive resource data meeting the calling requirement based on the keyword data, and generating a corresponding panoramic archive text;
the generated employee panoramic archive text is exported for review or further processing by the user.
3. The panoramic archive generation method based on employee information integration according to claim 2, wherein parsing the data packet to obtain the employee information resource text comprises: identifying whether the data packet contains text files, table files or database files; and parsing the data packet with the loaded text analyzer, table processing library and database connection tool, extracting the employee information data, and converting it into structured text.
4. A method for generating a panoramic archive based on employee information integration as recited in claim 1, wherein creating an employee information classification model comprises the steps of:
step 1) data acquisition and processing:
collecting employee information data, removing duplicate items and missing values, converting the data into document data in a standardized format, and labeling each document with its category, the labeled documents serving as sample data; wherein the employee information data includes personal information, work experience and educational background, and comes from the enterprise's internal database, external data sources, or information provided by users;
step 2) keyword and phrase recognition:
automatically identifying and extracting keywords and phrases in the document data by using a natural language processing technology based on the established keyword and phrase database;
step 3) data set division:
converting the document data into a data set of computer-processable feature vectors, and dividing the data set into a training set and a test set;
step 4) model training and testing:
building a classification model with a naive Bayes algorithm and training it on the training set so that it learns to assign document data to the different archive categories; evaluating the model's recall on the test set, and adjusting the model parameters according to the evaluation results;
Step 5) generating a file information item architecture tree:
classifying the employee information texts by using the trained classification model, and determining the archive category of each text; based on the classification result, generating an information item architecture tree of each archive category, wherein the information item architecture tree comprises different information items: personal information, work experience, educational background;
step 6) deployment model:
the trained model is deployed to a panoramic archive generation system for automatic classification of employee information.
5. The panoramic archive generation method based on employee information integration according to claim 4, wherein building the classification model with the naive Bayes algorithm further comprises calculating the category prior probabilities through the following steps:
preparing a training data set containing classified documents, wherein the training data set comprises document data and category labels corresponding to the document data;
counting the training data set, and calculating the number of documents in each category to obtain the document frequency of each category;
calculating the total document number in the training data set, and calculating the prior probability of the category through a formula:
P(C)=N/N_total
where P(C) is the prior probability of category C; N is the number of documents belonging to category C; N_total is the total number of documents;
The obtained prior probability of each category is used for predicting the category of the new document.
6. The panoramic archive generation method based on employee information integration according to claim 5, wherein predicting the category of the new document from the obtained prior probability of each category further comprises calculating the feature conditional probability P(x|C), i.e., the probability of observing feature x given category C, through the following steps:
counting the documents in the training data set according to the training data set of the classified documents, and calculating the occurrence times of each feature under each category; the training data set comprises document data, category labels corresponding to the document data and features appearing in the document;
given class C, the frequency of feature x; if the document data has M different characteristics, the document data is expressed as: n (x, C), the frequency of the feature x under category C, generating a frequency for each category and each feature;
calculating the relative frequency P(x|C) of feature x within each category C to obtain the conditional probability of each feature in each category:
P(x|C) = N(x, C) / N(C)
wherein P(x|C) is the conditional probability of feature x under category C; N(x, C) is the frequency of feature x under category C; and N(C) is the total number of documents in category C.
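A minimal Python sketch of the conditional-probability step of claim 6 follows. Because N(C) counts documents, it assumes that N(x, C) counts the category-C documents containing feature x (each feature counted once per document), which keeps P(x|C) between 0 and 1; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def feature_conditionals(documents):
    """Estimate P(x|C) = N(x, C) / N(C).

    documents: iterable of (features, category) pairs.
    N(x, C) is counted here as the number of category-C documents that
    contain feature x (an assumption), and N(C) is the number of
    documents in category C.
    """
    docs_per_class = Counter()
    feature_counts = defaultdict(Counter)  # category -> feature -> N(x, C)
    for features, category in documents:
        docs_per_class[category] += 1
        for x in set(features):  # count each feature once per document
            feature_counts[category][x] += 1
    return {
        c: {x: n / docs_per_class[c] for x, n in feature_counts[c].items()}
        for c in docs_per_class
    }


# Hypothetical tokenised training documents.
training = [
    (["name", "birth_date"], "personal"),
    (["company", "position"], "work"),
    (["company", "start_date"], "work"),
]
cond = feature_conditionals(training)
print(cond["work"]["company"])   # 1.0
print(cond["work"]["position"])  # 0.5
```

The claim applies no smoothing, so a feature never seen with a category gets probability 0 here; practical implementations often add Laplace smoothing at this point.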
7. The method for generating a panoramic archive based on employee information integration according to claim 6, wherein predicting the category of the new document with the naive Bayes classification model comprises the steps of:
preparing the trained naive Bayes classification model, including the prior probability P(C) of each category and the conditional probability P(x|C) of each feature, both calculated from the training data;
preparing the new document, applying to it the same text preprocessing used for the training data, and converting it into a feature vector;
for each category C, calculating the posterior probability P(C|X) with Bayes' theorem, where X is the feature vector of the new document:
P(C|X) = P(X|C) * P(C) / P(X)
where P(C|X) is the posterior probability of category C given features X; P(X|C) is the conditional probability of features X under category C, obtained from the training data (under the naive independence assumption, the product of the conditional probabilities P(x|C) of the individual features in X); P(C) is the prior probability of category C, obtained from the training data; and P(X) is the marginal probability of features X, calculated by summing P(X|C) * P(C) over all categories. Because P(X) is the same for every category, it does not change which category has the highest posterior probability;
after the posterior probability of each category is calculated, selecting the category with the highest posterior probability as the predicted category of the new document, namely:
predicted category = argmax_C P(C|X).
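The prediction step of claim 7 can be sketched in Python as below, assuming the priors and conditional probabilities are already available as dictionaries. The function name, the `unseen` floor for features never observed with a category, and the sample parameters are all hypothetical; the sketch multiplies P(x|C) only over the features present in X, a simplification of a full Bernoulli model, which would also account for absent features.

```python
import math


def predict(priors, conditionals, features, unseen=1e-6):
    """Return (predicted category, posterior P(C|X) for every category).

    P(X|C) is approximated as the product of P(x|C) over the features x
    present in X (naive independence). `unseen` is a hypothetical floor
    used when P(x|C) is zero or missing, so that a single unseen feature
    does not force P(X|C) to zero. All priors must be positive.
    """
    log_joint = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for x in features:
            p = conditionals.get(c, {}).get(x, 0.0)
            score += math.log(p if p > 0 else unseen)
        log_joint[c] = score  # log(P(X|C) * P(C))

    # P(X) = sum over C of P(X|C) * P(C), computed stably with log-sum-exp.
    top = max(log_joint.values())
    log_px = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    posteriors = {c: math.exp(v - log_px) for c, v in log_joint.items()}

    best = max(posteriors, key=posteriors.get)  # argmax_C P(C|X)
    return best, posteriors


# Hypothetical model parameters estimated from training data.
priors = {"personal": 0.2, "work": 0.6, "education": 0.2}
conditionals = {
    "personal": {"name": 1.0, "birth_date": 1.0},
    "work": {"company": 1.0, "position": 0.5},
    "education": {"school": 1.0, "degree": 1.0},
}
best, posteriors = predict(priors, conditionals, ["company", "position"])
print(best)  # work
```

Working in log space avoids floating-point underflow when many feature probabilities are multiplied. Dividing by P(X) only rescales the scores, so the argmax is the same whether or not the posteriors are normalized.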
8. The method for generating a panoramic archive based on employee information integration according to claim 7, wherein generating the archive information item architecture tree corresponding to the archive information list comprises:
defining an archive information list for employee information integration, the archive information list comprising the information items to be integrated;
organizing the information items in the archive information list hierarchically to create the archive information item architecture tree, and assigning each information item in the tree a unique tree-structure label.
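One way to build the labelled tree of claim 8 is sketched below in Python. It assumes the archive information list is given as a nested dictionary and uses dotted sibling positions ("1", "1.1", "2.1", ...) as the unique tree-structure labels; the claim does not specify a labelling scheme, so this scheme, the class and function names, and the sample items are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InfoItem:
    name: str
    label: str  # unique tree-structure label, e.g. "2.1"
    children: list["InfoItem"] = field(default_factory=list)


def build_tree(info_list: dict, prefix: str = "") -> list[InfoItem]:
    """Build labelled nodes from a nested archive information list.

    info_list maps each item name to a (possibly empty) dict of sub-items.
    A label is the dotted path of sibling positions, so it is unique in the tree.
    """
    nodes = []
    for i, (name, sub_items) in enumerate(info_list.items(), start=1):
        label = f"{prefix}{i}"
        nodes.append(InfoItem(name, label, build_tree(sub_items, label + ".")))
    return nodes


def iter_items(nodes):
    """Yield (label, name) pairs in depth-first order."""
    for node in nodes:
        yield node.label, node.name
        yield from iter_items(node.children)


# Hypothetical archive information list.
archive_list = {
    "Personal information": {"Name": {}, "Date of birth": {}},
    "Work experience": {"Positions held": {}},
    "Educational background": {},
}
for label, name in iter_items(build_tree(archive_list)):
    print(label, name)
```

This prints the items as `1 Personal information`, `1.1 Name`, `1.2 Date of birth`, `2 Work experience`, `2.1 Positions held`, `3 Educational background`. Dotted labels also encode each item's ancestry, so a parent can be recovered from a child's label.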
9. The method for generating a panoramic archive based on employee information integration according to claim 8, wherein, when the panoramic archive is generated, the employee information is mapped to the corresponding information items in the information item architecture tree and stored in the panoramic archive resource database, creating a complete panoramic archive from which the user side can query and retrieve a given employee's panoramic archive data using the employee's unique identifier and the specific items required.
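The mapping, storage and query described in claim 9 can be sketched in Python with an in-memory dictionary standing in for the panoramic archive resource database (an assumption; the patent does not name a storage technology). The item-to-label mapping, function names, employee identifier and record values are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from information-item names to tree-structure labels.
ITEM_LABELS = {"Name": "1.1", "Date of birth": "1.2", "Positions held": "2.1"}

# In-memory stand-in for the panoramic archive resource database:
# unique employee identifier -> {tree label: value}.
archive_db: dict[str, dict[str, object]] = defaultdict(dict)


def store_employee(employee_id: str, record: dict) -> None:
    """Map each field of an employee record to its tree label and store it."""
    for item, value in record.items():
        label = ITEM_LABELS.get(item)
        if label is not None:  # fields outside the archive list are skipped
            archive_db[employee_id][label] = value


def query(employee_id: str, items: list[str]) -> dict:
    """Return the requested items of one employee's panoramic archive."""
    archive = archive_db.get(employee_id, {})
    return {item: archive.get(ITEM_LABELS[item]) for item in items if item in ITEM_LABELS}


store_employee("E001", {"Name": "Zhang San", "Positions held": ["Engineer"], "Hobby": "chess"})
print(query("E001", ["Name", "Positions held"]))
# {'Name': 'Zhang San', 'Positions held': ['Engineer']}
```

Keying stored values by tree label rather than by display name means an item can be renamed in the archive information list without migrating stored data, as long as its label is unchanged.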
10. A panoramic archive generation system based on employee information integration, for performing the panoramic archive generation method based on employee information integration according to any one of claims 1 to 9, characterized in that the system comprises:
a data import module, configured to receive data packets from external data sources within the enterprise and parse the data packets to obtain employee information resource texts;
an employee information classification model, configured to automatically identify and classify employee information data, determine the archive category of each employee, and generate the archive information item architecture tree according to the information items defined in the archive information list;
a pre-configured module database, containing predefined templates and formats used to screen information and automatically generate the panoramic archive according to the archive categories and the archive information item architecture trees;
a panoramic archive resource database, configured to store the generated panoramic archive data, the employee information being stored according to the structure of the information item architecture tree;
a panoramic archive retrieval module, configured to allow the user side to search for and assemble panoramic archive data according to the employee's unique identifier and the specific items requested, and to generate panoramic archive text that meets the request for the user to review.
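The modules of claim 10 can be wired together as in the Python sketch below. It is hypothetical throughout: packets are assumed to be JSON lists, a keyword stub stands in for the trained naive Bayes classifier, `string.Template` stands in for the pre-configured template database, and a dictionary stands in for the resource database.

```python
import json
from string import Template

# Hypothetical pre-configured output template.
ARCHIVE_TEMPLATE = Template("Panoramic archive of employee $employee_id\n$body")


def import_packet(raw: bytes) -> list[dict]:
    """Data import module: parse a data packet, assumed here to be a JSON list."""
    return json.loads(raw.decode("utf-8"))


class PanoramicArchiveSystem:
    """Hypothetical wiring of the modules in claim 10."""

    def __init__(self, classify):
        self.classify = classify  # classification model: text -> archive category
        self.db: dict[str, dict[str, list[str]]] = {}  # resource database

    def ingest(self, raw: bytes) -> None:
        for record in import_packet(raw):
            category = self.classify(record["text"])
            per_employee = self.db.setdefault(record["employee_id"], {})
            per_employee.setdefault(category, []).append(record["text"])

    def retrieve(self, employee_id: str, categories: list[str]) -> str:
        """Retrieval module: assemble the requested categories into archive text."""
        archive = self.db.get(employee_id, {})
        body = "\n".join(
            f"[{category}] {text}"
            for category in categories
            for text in archive.get(category, [])
        )
        return ARCHIVE_TEMPLATE.substitute(employee_id=employee_id, body=body)


def keyword_classifier(text: str) -> str:
    """Keyword stub standing in for the trained naive Bayes classifier."""
    return "education" if "university" in text.lower() else "work"


packet = json.dumps([
    {"employee_id": "E001", "text": "Graduated from X University"},
    {"employee_id": "E001", "text": "Joined as an engineer in 2015"},
]).encode("utf-8")

system = PanoramicArchiveSystem(keyword_classifier)
system.ingest(packet)
print(system.retrieve("E001", ["education", "work"]))
```

Passing the classifier in as a callable keeps the ingestion path independent of the model, so the stub can be replaced by a trained classifier without changing the other modules.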
CN202311345854.9A 2023-10-17 2023-10-17 Panoramic archive generation method and system based on employee information integration Withdrawn CN117592450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311345854.9A CN117592450A (en) 2023-10-17 2023-10-17 Panoramic archive generation method and system based on employee information integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311345854.9A CN117592450A (en) 2023-10-17 2023-10-17 Panoramic archive generation method and system based on employee information integration

Publications (1)

Publication Number Publication Date
CN117592450A true CN117592450A (en) 2024-02-23

Family

ID=89912232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311345854.9A Withdrawn CN117592450A (en) 2023-10-17 2023-10-17 Panoramic archive generation method and system based on employee information integration

Country Status (1)

Country Link
CN (1) CN117592450A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118069586A (en) * 2024-04-17 2024-05-24 南通点耐特智能科技有限公司 Employee profile information transmission method

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20240223