CN112036692A - Analysis method and analysis system for flow condition of personnel among mechanisms - Google Patents

Info

Publication number
CN112036692A
Authority
CN
China
Prior art keywords
personnel
information
person
calculation
data
Prior art date
Legal status
Granted
Application number
CN202010735795.6A
Other languages
Chinese (zh)
Other versions
CN112036692B (en)
Inventor
杨万征
蔡超
程国艮
Current Assignee
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd
Priority to CN202010735795.6A
Publication of CN112036692A
Application granted
Publication of CN112036692B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Theoretical Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of data processing and discloses an analysis method and an analysis system for the flow of personnel between institutions. The method acquires industrial and commercial (business registration) information, patents, periodicals, and papers; processes each kind of information and extracts the industrial and commercial information; performs a preliminary calculation based on the affiliated institution; performs a calculation based on industrial and commercial relationships; and, based on content similarity calculated along a timeline, finally determines a unique id for each person. The invention can effectively process the documents published by technicians and judges whether documents share the same author according to their degree of semantic similarity, which is more accurate than using document classification numbers alone. By standardizing the names of affiliated enterprises and performing enterprise-level calculation based on the industrial and commercial information, the number of initial groups can be reduced simply and efficiently. By calculating along the timeline for each person, the person's career can be traced to obtain a personal history.

Description

Analysis method and analysis system for flow condition of personnel among mechanisms
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an analysis method and an analysis system for the flow of personnel between institutions.
Background
At present, ordinary staff turnover is normal for an enterprise, but the movement of core personnel can affect the enterprise significantly. At the mild end, the loss of core personnel delays project schedules; at the severe end, the enterprise is forced to change direction because its core business cannot move forward. Conversely, recruiting highly capable people can set an enterprise on the right course, so that it advances with irresistible momentum, like a tiger that has grown wings.
Enterprises applying to list on the science and technology innovation board are subject to requirements concerning their directors, senior managers, and core technicians over the last two years. Core technicians, as that board defines them, generally include a company's technical leads, R&D leads, key members of R&D departments, the inventors or designers of major intellectual property and non-patented technology, and the drafters of major technical standards. For science and technology enterprises, therefore, the movement of core personnel, and of core technicians in particular, has a very considerable influence.
Moreover, what makes an employee most valuable is experience accumulated over many years. Outstanding core technicians in particular rely on rich experience to handle critical control work and respond to emergencies. That experience is an intangible asset that cannot be left behind in the enterprise through a work handover, and the valuation of an individual is likewise based on it.
Therefore, to fully evaluate an enterprise's current situation and potential, it is necessary to know how its core personnel have changed, and the past experience and overall capability of those personnel.
Existing technical approaches to understanding an enterprise's overall strength, and the changes and past experience of its core personnel, are often restricted to a narrow scope or suffer from low precision and low credibility. For example, many websites that provide industrial and commercial information services currently process information about directors, because that information is relatively easy to obtain: in many places enterprises are required to file detailed or uniquely identifying information about their directors, so a technical service website can resolve the data using a unique identity number, mobile phone number, or mailbox. Besides directors, however, core technicians are also an important part of an enterprise's overall strength. Their public exposure is usually very low: a technician is often only a name in papers, patents, periodicals, and the like; duplicate names are common; and no unique identity information for an author is registered when a document is filed. Data calculation for technicians is therefore comparatively difficult. Some document search services use the author together with the author's company as a joint identifier, but this has a serious problem: if a core technician has worked at several organizations, several identifiers appear that cannot be associated with one another, so a query for that technician reveals nothing about his or her past, which makes analysis inconvenient.
In the prior art, the simplest method is to use a combined author-institution identifier as the unique person id. This method, however, severs the record of a person's movement between enterprises, so the person's history cannot be analyzed.
Some systems instead label the documents an author has published, calculate the similarity of the resulting label sets to decide whether two authors are the same person, and assign a unique id accordingly. Label systems, however, tend to be standardized and coarse, so restricting the field by label still leaves a wide scope, and a large number of people with duplicate names remain within the same field. For example, within a single IPC group there are still many inventors named "Zhang San".
Through the above analysis, the problems and defects of the prior art are as follows: (1) the prior art does not judge and identify the movement of an enterprise's core personnel on the basis of big data, so it cannot effectively help in assessing an enterprise's technical risk;
(2) the public exposure of technicians is low, their movement is not revealed through public disclosures or news, and it is difficult to capture and verify;
(3) document authors cannot register a unique id, and duplicate author names are common;
(4) the number of technicians in China grows year by year, so a great many people share the same name even within a narrowly defined field;
(5) in enterprises, universities, and other institutions with high staff mobility, duplicate names remain common even within a single institution;
The difficulty in solving the above problems and defects is as follows:
To solve the first problem, each person's history must be traced along a timeline and the companies the person previously worked for must be put in order. Because conclusions about personnel movement can affect personal reputation, evidence must be presented together with every conclusion; the complex scenarios that arise in practice must be fully considered, and every calculation step must be explainable and supported by specific evidence.
To solve the second problem, a large number of professional histories of enterprise technicians must be collected. Because the exposure of technicians is low and relevant news is rarely reported, their work experience and even their professional networks must be sorted out one by one.
To solve the third, fourth, and fifth problems, a rigorous scientific attitude is required: before a conclusion is given, the analysis must proceed from several angles at once and take the complex scenarios of reality into account, and no conclusion may be drawn hastily from a single evaluation dimension.
The significance of solving the above problems and defects is as follows:
In view of the above problems, the invention provides a method for analyzing the flow of personnel between institutions. It automatically analyzes the basic data of an enterprise's core technicians, without being limited to directors, and calculates over each core technician's past data to obtain his or her career history.
Solving the above problems also makes the following possible:
accurately evaluating the market value of an enterprise;
measuring the overall strength of an enterprise's personnel;
giving early warning of the departure of core technicians;
precisely targeted talent recruitment, and the like.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an analysis method and an analysis system for the flow of personnel between institutions.
The invention is realized as follows. A method for analyzing the flow of personnel between institutions comprises the following steps:
first, collecting personnel-related data, including but not limited to industrial and commercial information, patents, periodicals, and papers;
second, processing, calculating over, merging, and splitting the collected data according to the personnel list;
and third, calculating over each person's past information to obtain his or her career history.
The method specifically comprises the following steps:
step one, acquiring information such as industrial and commercial information, patents, periodicals, and papers;
step two, performing feature processing on the information;
step three, performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors, and the like;
step four, performing preliminary calculation, merging, and splitting based on the affiliated institution;
step five, performing calculation, merging, and splitting based on industrial and commercial relationships;
and step six, finally determining the unique id of each person based on content similarity calculated along the timeline.
Further, in the present invention,
acquiring the information in step one comprises: dividing the basic data into two categories, namely industrial and commercial data and literature data, where documents, pictures, or video data that carry author and affiliated-institution information are used as the data set;
the feature processing of step two comprises:
first, performing data calculation on the acquired industrial and commercial information to obtain the parent-subsidiary relations between enterprises;
second, standardizing the names of persons and institutions in the document data;
third, extracting features such as keywords, domain vocabulary, and technical terms from the document data;
fourth, extracting semantic features of the document data;
the hierarchical clustering of step three comprises:
first, clustering by standardized personnel name;
second, clustering by standardized company name;
the preliminary calculation, merging, and splitting based on the affiliated institution in step four comprises:
first, extracting the set C of all documents of persons named "A" at company "B";
second, extracting semantic features of the documents in set C;
third, clustering according to the semantic features of set C;
fourth, determining personnel ids according to the clustering result;
the calculation based on industrial and commercial relationships in step five comprises: performing data calculation according to the obtained parent-subsidiary relations between enterprises and
the obtained personnel ids;
where an employee named "A" exists in enterprise 1 and an employee named "A" also exists in enterprise 2, verification is performed:
the two persons' document sets are compared semantically, and the two are determined to be the same person when a preset threshold is met.
The content similarity calculation along the timeline in step six comprises:
calculating the time difference between the last collaboration time of employee "A" at enterprise 1 and the appearance of each employee "A" who emerges later; calculating the semantic similarity between the documents published by employee "A" at enterprise 1 and the documents published by each later employee "A"; obtaining a transition probability between the two from the time difference and the average semantic similarity; and, if the transition probability exceeds a preset threshold, judging the two to be the same person.
Obtaining the unique id of the person in step six comprises:
lining up each person and the associated companies along the timeline to obtain the person's transfers between companies, that is, the employee's career history; and determining the uniqueness of each employee and assigning the employee a unique identity id.
In the present invention, papers, periodicals, patents, trademarks, books, and the like are used as data sets. More generally, documents, pictures, or video data that carry author and affiliated-institution information can be used as data sets. Semantic similarity between document sets is calculated using a neural network algorithm.
Another object of the present invention is to provide a system for analyzing the flow of personnel between institutions, comprising: an information acquisition and extraction module, used for acquiring industrial and commercial information, patents, periodicals, and papers, processing each kind of information, and extracting the industrial and commercial information;
an institution calculation module, used for preliminary calculation based on the affiliated institution;
an industrial and commercial relationship calculation module, used for calculation based on industrial and commercial relationships;
a document semantic extraction module, used for calculating the semantic features of documents;
and a personnel career record acquisition module, used for calculating content similarity along the timeline and finally determining the unique id of each person.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
step one, acquiring information such as industrial and commercial information, patents, periodicals, and papers;
step two, performing feature processing on the information;
step three, performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors, and the like;
step four, performing preliminary calculation, merging, and splitting based on the affiliated institution;
step five, performing calculation, merging, and splitting based on industrial and commercial relationships;
and step six, finally determining the unique id of each person based on content similarity calculated along the timeline.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
step one, acquiring information such as industrial and commercial information, patents, periodicals, and papers;
step two, performing feature processing on the information;
step three, performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors, and the like;
step four, performing preliminary calculation, merging, and splitting based on the affiliated institution;
step five, performing calculation, merging, and splitting based on industrial and commercial relationships;
and step six, finally determining the unique id of each person based on content similarity calculated along the timeline.
Taking all of the above technical schemes together, the invention has the following advantages and positive effects:
The invention judges and identifies the movement of an enterprise's core personnel on the basis of patent big data. It judges whether two records refer to the same person using the semantic similarity between document sets, and determines each person's career history through comprehensive data analysis.
With this scheme, identifying and monitoring the movement of core technicians helps in understanding an enterprise's technical risk.
Compared with the prior art, the invention has the following advantages:
The system can effectively process the documents published by technicians and judges whether documents share the same author according to their degree of semantic similarity, which is more accurate than using document classification numbers alone.
By standardizing the names of affiliated enterprises and performing enterprise-level calculation based on the industrial and commercial information, the number of initial groups can be reduced simply and efficiently.
By calculating along the timeline for each person, the person's career can be traced to obtain a personal history.
(1) With this technical scheme, the career history of core technicians can finally be obtained. An outside observer can learn where an enterprise's technicians came from, and by comprehensively analyzing where its technicians come from and go to, can assess the enterprise's overall technical strength and likely future direction.
(2) As described in the claims, the whole calculation process is flexible and highly interpretable; every calculation result can be substantiated, which provides a basis for all higher-level analysis.
(3) Compared with the prior art, this technical scheme can calculate similarity within more finely subdivided fields, and content-based similarity calculation makes person disambiguation more accurate.
(4) Compared with the prior art, this technical scheme has more evaluation dimensions. Besides content similarity and enterprise relationships, in actual operation it also measures dimensions such as city and collaboration relationships, so that all results can be interpreted.
(5) Effect of the scheme. Using this technical scheme, personnel history analysis was carried out on 46000+ companies; 17000+ of them showed obvious personnel movement, involving 30000+ people. Verification of the data recovered clear talent-flow records for companies whose personnel movements had been reported in past news, such as three-country health, singer shares, Jiangsu, warship chip manufacturing and the like.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the analysis of the flow of personnel between institutions provided by the embodiment of the invention.
Fig. 2 is a flow chart of the preliminary calculation based on the affiliated institution according to the embodiment of the present invention.
Fig. 3 is a flow chart of the calculation based on enterprise relationships according to the embodiment of the present invention.
Fig. 4 is a flow chart of the content similarity calculation along a timeline according to the embodiment of the present invention.
Fig. 5 is a diagram illustrating the result of the timeline-based calculation according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method and a system for analyzing the flow of personnel between institutions, which are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the method for analyzing the flow of personnel between institutions comprises the following core steps: acquiring basic information (industrial and commercial information, patents, journals, papers, and the like); processing each kind of information; extracting the industrial and commercial information; performing a preliminary calculation based on the affiliated institution; performing a calculation based on industrial and commercial relationships; performing a content similarity calculation along a timeline; and finally determining the unique id of each person.
Those of ordinary skill in the art may also implement the method provided herein with other steps; FIG. 1 shows only one specific example.
The invention is further described below with reference to specific embodiments.
(1) Acquiring and grouping the basic information. How the basic information is collected is outside the scope of this patent; at this step the system simply divides the underlying data into two broad categories, industrial and commercial data and literature data.
(2) Extracting the industrial and commercial information. Data calculation is performed on the acquired industrial and commercial information to obtain the parent-subsidiary relations between enterprises.
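The parent-subsidiary lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the industrial and commercial records have already been reduced to a hypothetical `parent_of` mapping from each subsidiary to its direct parent, and the function names `build_family_roots` and `same_family` are invented for this example.

```python
def build_family_roots(parent_of):
    """Map every enterprise to the top-level enterprise of its corporate family.

    parent_of is an assumed input: {subsidiary_name: direct_parent_name}.
    """
    def root(name):
        seen = set()
        # Walk upward until an enterprise with no recorded parent is reached.
        # The `seen` set guards against accidental cycles in the input data.
        while name in parent_of and name not in seen:
            seen.add(name)
            name = parent_of[name]
        return name

    names = set(parent_of) | set(parent_of.values())
    return {name: root(name) for name in names}


# Hypothetical example data, following the "enterprise 1 / enterprise 1_1" naming of the text.
parent_of = {
    "Enterprise 1_1": "Enterprise 1",
    "Enterprise 1_1_a": "Enterprise 1_1",
}
roots = build_family_roots(parent_of)


def same_family(a, b):
    """True when two enterprises share the same top-level parent."""
    return roots.get(a, a) == roots.get(b, b)


print(roots["Enterprise 1_1_a"])
print(same_family("Enterprise 1", "Enterprise 1_1"))
```

Resolving each enterprise to a family root keeps the later merge step a simple equality check, at the cost of treating every member of a group as equally related.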
(3) Preliminary calculation based on the affiliated institution. Although most documents are published without a unique identifier for the author, most do state the author's affiliated institution. This step clusters documents using the author name and the affiliated institution name as a joint identifier, which yields a number of data groups. Although uploading a document requires the name of the affiliated institution, that name is not checked strictly and is not required to match the industrial and commercial registration exactly, so institution names must be normalized before the data calculation. In addition, although the probability is limited, duplicate names can still occur within the same enterprise, so a semantic representation of the documents must be obtained for a secondary check that ensures the accuracy of the whole system. The overall structure diagram is shown in fig. 3.
As shown in fig. 2, the preliminary calculation based on the affiliated institution consists of the following steps:
① extracting all relevant documents of the person "A" to be checked;
② standardizing the names of the affiliated enterprises in the documents;
③ clustering with the standardized enterprise name and the standardized personnel name as the joint identifier;
④ converting the documents in each cluster into semantic vectors;
⑤ clustering within each data block according to the semantic vectors;
⑥ determining the personnel id according to the clustering result.
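The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the patent computes semantic vectors with a neural network, whereas the `embed` function here is a bag-of-words stand-in so the example stays self-contained; the greedy threshold clustering, the threshold value 0.3, and the record fields (`doc_id`, `author`, `org`, `text`) are likewise hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    # Stand-in for a semantic vector: raw word counts (an assumption,
    # not the neural-network representation the text refers to).
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_block(docs, threshold):
    """Greedy clustering: a document joins the first cluster that already
    contains a sufficiently similar document, otherwise it starts a new one."""
    clusters = []
    for doc in docs:
        vec = embed(doc["text"])
        for cluster in clusters:
            if any(cosine(vec, embed(d["text"])) >= threshold for d in cluster):
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def assign_person_ids(docs, threshold=0.3):
    # Steps 3 to 6: block by (author, organization), cluster inside each block,
    # and give every cluster its own provisional person id.
    blocks = defaultdict(list)
    for doc in docs:
        blocks[(doc["author"], doc["org"])].append(doc)
    ids = {}
    for (author, org), block in sorted(blocks.items()):
        for index, cluster in enumerate(cluster_block(block, threshold)):
            for doc in cluster:
                ids[doc["doc_id"]] = f"{author}|{org}|{index}"
    return ids


docs = [
    {"doc_id": "p1", "author": "A", "org": "B", "text": "lithium battery electrode coating method"},
    {"doc_id": "p2", "author": "A", "org": "B", "text": "battery electrode coating device"},
    {"doc_id": "p3", "author": "A", "org": "B", "text": "image recognition neural network training"},
]
ids = assign_person_ids(docs)
print(ids)
```

In this toy data the two battery documents fall into one cluster and the image-recognition document into another, which mirrors the secondary check for two different people named "A" inside the same enterprise.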
(4) Calculation based on industrial and commercial relationships. Data calculation is performed using the parent-subsidiary relations between enterprises obtained in step (2) and the personnel ids obtained in step (3). As an enterprise develops and its headcount grows, internal transfers, department mergers, department splits, and similar changes mean that the same person may hold posts in several branches of an enterprise or group. For example, if a document published by enterprise 1 lists a person "A" and a document published by enterprise 1_1, a branch of enterprise 1, also lists a person "A", it is likely that "A" has worked at both enterprise 1 and enterprise 1_1 and published documents at each; such cases need to be merged.
As shown in fig. 3, there is an employee named "A" in enterprise 1 and an employee named "A" in enterprise 2, and enterprise 2 is a branch of enterprise 1, so the two employees are very likely the same person. To verify that they are indeed the same person, the two persons' document sets are compared semantically, and if a predetermined threshold is met, the two are determined to be the same person.
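The merge check described above might look like the following sketch. It is hedged in several ways: the bag-of-words `embed` stands in for the neural semantic representation, the relation test only looks at direct parent-subsidiary links, and the threshold 0.4, the candidate dictionaries, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def embed(text):
    # Bag-of-words stand-in for a semantic vector (assumption).
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_similarity(docs_a, docs_b):
    """Mean pairwise similarity between two document sets."""
    pairs = list(product(docs_a, docs_b))
    if not pairs:
        return 0.0
    return sum(cosine(embed(a), embed(b)) for a, b in pairs) / len(pairs)


def should_merge(cand_a, cand_b, parent_of, threshold=0.4):
    """Merge two same-named candidates only if their enterprises are directly
    related and their document sets are similar enough."""
    related = (parent_of.get(cand_a["org"]) == cand_b["org"]
               or parent_of.get(cand_b["org"]) == cand_a["org"])
    if cand_a["name"] != cand_b["name"] or not related:
        return False
    return average_similarity(cand_a["docs"], cand_b["docs"]) >= threshold


# Hypothetical data: enterprise 2 is a branch of enterprise 1.
parent_of = {"Enterprise 2": "Enterprise 1"}
a_at_1 = {"name": "A", "org": "Enterprise 1",
          "docs": ["battery electrode coating method", "battery electrode drying device"]}
a_at_2 = {"name": "A", "org": "Enterprise 2",
          "docs": ["electrode coating device for battery"]}
a_elsewhere = {"name": "A", "org": "Enterprise 9",
               "docs": ["electrode coating device for battery"]}

print(should_merge(a_at_1, a_at_2, parent_of))
print(should_merge(a_at_1, a_elsewhere, parent_of))
```

Requiring both the corporate relation and the similarity threshold keeps the merge explainable: each positive decision can cite the registered relation and the similarity score as evidence.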
(5) Content similarity calculation along the timeline. Step 4 yields a set of per-person data groups attributed to different enterprises. As shown in fig. 4, the approximate start and end of an employee's time at a company (or company family) can be inferred from the earliest and latest document dates in the data group, and the possibility of a job change can be inferred from the order in which the groups appear. For example, in fig. 4, employee "A" in enterprise 3 is most likely the same person who left enterprise 1.
On this basis, each employee "A" who appears after the last collaboration of employee "A" at enterprise 1 may be the same employee "A" after a job change. The time difference between the last collaboration time of employee "A" at enterprise 1 and the appearance of each later employee "A" is therefore calculated, together with the semantic similarity between the documents published by employee "A" at enterprise 1 and those published by each later employee "A". A transition probability between the two is obtained from the time difference and the average semantic similarity, and if it exceeds a preset threshold, the two are judged to be the same employee.
In actual operation, additional dimensions such as the location of the enterprise, the person's collaboration relationships, and the time span before and after the move are added as well, and the judgment is made using this multidimensional information.
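The text does not give a formula for the transition probability, so the following is only one plausible sketch: it assumes the score is the average semantic similarity multiplied by a weight that halves for every `half_life_days` of gap between the last document at the old enterprise and the first document at the new one. The function name, the half-life of 365 days, and the treatment of overlapping periods are all assumptions.

```python
from datetime import date


def transition_probability(last_seen, first_seen, avg_similarity, half_life_days=365.0):
    """Combine a time gap and an average semantic similarity into one score.

    last_seen:  date of the last document of "A" at the earlier enterprise
    first_seen: date of the first document of the later "A"
    avg_similarity: average semantic similarity of the two document sets, in [0, 1]
    """
    gap_days = (first_seen - last_seen).days
    if gap_days < 0:
        # Assumption: a candidate that appears before the earlier record ends
        # is not treated as a job change by this sketch.
        return 0.0
    time_weight = 0.5 ** (gap_days / half_life_days)
    return time_weight * avg_similarity


# Hypothetical example: six months between the two records, similarity 0.8.
p = transition_probability(date(2015, 6, 1), date(2015, 12, 1), 0.8)
is_same_person = p > 0.5  # 0.5 is an illustrative threshold, not a value from the text
print(round(p, 3), is_same_person)
```

A multiplicative form means that either a long gap or a low similarity is enough to reject a match; the additional dimensions mentioned above (location, collaborators) could enter as further factors in the same way.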
(6) Obtaining the unique id of the person. After step 5, each person and the associated companies have been lined up along the timeline, giving the person's transfers between companies, that is, the employee's career history. At this point the uniqueness of each employee can be determined and each employee is given a unique identification id.
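Once the earlier steps have produced pairs of clusters judged to be the same person, assigning ids and assembling a career history is mostly bookkeeping. The sketch below assumes hypothetical cluster keys such as `"A@Enterprise 1"` and uses a union-find structure to propagate the pairwise decisions; the id format `person-N` and the record fields are invented for illustration.

```python
from datetime import date


def assign_unique_ids(cluster_keys, same_person_pairs):
    """Give every cluster an id so that clusters linked by a same-person
    decision (directly or transitively) share the same id."""
    parent = {key: key for key in cluster_keys}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in same_person_pairs:
        parent[find(a)] = find(b)

    root_to_id = {}
    ids = {}
    for key in cluster_keys:
        root = find(key)
        if root not in root_to_id:
            root_to_id[root] = f"person-{len(root_to_id) + 1}"
        ids[key] = root_to_id[root]
    return ids


def career_history(stints):
    """Order one person's stints by the date of the first document at each enterprise."""
    ordered = sorted(stints, key=lambda s: s["first_doc"])
    return [(s["org"], s["first_doc"].year, s["last_doc"].year) for s in ordered]


keys = ["A@Enterprise 1", "A@Enterprise 3", "A@Enterprise 7"]
ids = assign_unique_ids(keys, [("A@Enterprise 1", "A@Enterprise 3")])

history = career_history([
    {"org": "Enterprise 3", "first_doc": date(2016, 3, 1), "last_doc": date(2019, 8, 1)},
    {"org": "Enterprise 1", "first_doc": date(2010, 5, 1), "last_doc": date(2015, 6, 1)},
])
print(ids)
print(history)
```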
In the present invention, papers, periodicals, patents, trademarks, books, and the like are used as data sets. More generally, documents, pictures, or video data that carry author and affiliated-institution information can be used as data sets. Semantic similarity between document sets is calculated using a neural network algorithm.
The invention is further described with reference to specific examples.
Example (b): calculating the occupation history of the core technical personnel based on the patent data and the enterprise business information, and searching a core technical personnel name which the user wants to view by the user, such as: "Zhang Yang".
All patent information issued by the name staff is acquired by searching in a patent data set, and 4803 patent inventors are found in total to "Zhang Yang".
Normalization operations such as cleaning, completion and typo correction are performed on all enterprise names in the patent set.
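A minimal sketch of such a normalization pass is shown below. The lookup tables `TYPO_FIXES` and `FULL_NAMES` and the company names in them are hypothetical placeholders; a real system would presumably derive them from the business information. The three stages mirror cleaning, typo correction and completion.

```python
import re
import unicodedata

# Hypothetical lookup tables, for illustration only.
TYPO_FIXES = {"Technolgy": "Technology"}
FULL_NAMES = {"Acme Technology": "Acme Technology Co., Ltd."}


def normalize_company_name(raw):
    """Clean, correct and complete a raw enterprise name."""
    # Cleaning: fold full-width characters and collapse whitespace.
    name = unicodedata.normalize("NFKC", raw)
    name = re.sub(r"\s+", " ", name).strip()
    # Typo correction from a known-misspellings table.
    for wrong, right in TYPO_FIXES.items():
        name = name.replace(wrong, right)
    # Completion: map an abbreviated name to its full registered form.
    return FULL_NAMES.get(name, name)
```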
Clustering is performed according to the normalized enterprise names, and each cluster can be regarded as an independent individual named Zhang Yang. Clustering shows that an employee named Zhang Yang appears in 849 enterprises.
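The grouping described here can be sketched as a simple bucketing of records by inventor name and normalized enterprise name. The record layout `(inventor_name, normalized_company, doc_id)` is an assumption made for illustration.

```python
from collections import defaultdict


def cluster_by_company(records):
    """Group document ids by (inventor name, normalized company name).

    records: iterable of (inventor_name, normalized_company, doc_id) tuples.
    Each resulting key is treated as one candidate individual.
    """
    clusters = defaultdict(list)
    for name, company, doc_id in records:
        clusters[(name, company)].append(doc_id)
    return dict(clusters)
```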
Whether superior-subordinate relationships exist between these companies is determined from the business information. If so, the average semantic similarity between the document sets is used to judge whether the clusters belong to the same person, and if they do, they are merged.
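The check above can be sketched as a filter over pairs of same-named clusters: only pairs whose companies are affiliated are compared, and only those whose similarity meets a threshold are proposed for merging. The representation of affiliations as a set of `frozenset` pairs, the `similarity` callable and the threshold value are assumptions of this example.

```python
from itertools import combinations


def merge_candidates(clusters, affiliated, similarity, threshold=0.8):
    """Return company pairs whose same-named clusters should be merged.

    clusters: {company_name: document_set} for one personal name.
    affiliated: set of frozenset({a, b}) for companies in a
                superior-subordinate relationship (hypothetical layout).
    similarity: callable giving the average semantic similarity of two sets.
    """
    pairs = []
    for a, b in combinations(sorted(clusters), 2):
        if frozenset((a, b)) not in affiliated:
            continue  # unrelated companies are handled by the timeline step
        if similarity(clusters[a], clusters[b]) >= threshold:
            pairs.append((a, b))
    return pairs
```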
Calculations are performed based on the timeline. The result is shown in fig. 5, where each line represents the time span of one clustered individual named Zhang Yang at the enterprise concerned. The earliest occurrence of Zhang Yang in the patent data is in 1991; some individuals named Zhang Yang served for a long time at their enterprise, and others only briefly.
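The time span of each cluster, as plotted in a chart of this kind, can be derived from the publication years of its documents. The sketch below assumes a mapping from cluster key to a list of years; this layout and the function name are illustrative only.

```python
def cluster_time_spans(cluster_years):
    """Compute (key, first_year, last_year) per cluster, ordered by start.

    cluster_years: {cluster_key: [publication years of its documents]}.
    Clusters with no dated documents are skipped.
    """
    spans = [
        (key, min(years), max(years))
        for key, years in cluster_years.items()
        if years
    ]
    return sorted(spans, key=lambda s: (s[1], s[2]))
```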
Whether two clusters are the same person is judged by calculating the time span between each cluster and the clusters that appear after it, together with the average semantic similarity between them; clusters judged to be the same person are represented by the same id.
Using this technical scheme, personnel history analysis was carried out on more than 46,000 companies. More than 17,000 of them showed obvious personnel flow, involving more than 30,000 people. Verification of the results confirmed that they include the well-documented talent flows of several companies previously reported in the news, such as three-country health, singer shares, Jiangsu, warship chip manufacturing and the like.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and is not to be construed as limiting its scope; the invention is intended to cover all modifications, equivalents and improvements that fall within its spirit and scope as defined by the appended claims.

Claims (11)

1. A method for analyzing the flow of personnel between institutions, characterized by comprising the following steps:
collecting personnel-related data, including but not limited to: business information, patents, periodicals, articles;
performing feature processing, calculation, merging and splitting on the collected data according to a personnel list;
and calculating the experience information of the personnel to obtain their career history.
2. The method for analyzing the flow of personnel between institutions as claimed in claim 1, wherein the method specifically comprises:
step one, acquiring business information, patents, periodicals and papers;
step two, performing feature processing on the information;
step three, performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors and the like;
step four, performing preliminary calculation, merging and splitting based on the institution to which the personnel belong;
step five, performing calculation, merging and splitting based on business relationships;
and step six, finally determining the unique id of each person based on content similarity calculation along the timeline.
3. The method for analyzing the flow of personnel between institutions as claimed in claim 2,
wherein processing the information in step one comprises: dividing the basic data into two categories, namely business data and document data; and using documents, pictures or video data that carry information on the author and the institution to which the author belongs as data sets.
4. The method for analyzing the flow of personnel between institutions as claimed in claim 2, wherein the feature processing of the information in step two comprises:
performing data calculation on the acquired business information to obtain the superior-subordinate relationships between enterprises;
standardizing the names of persons and institutions in the document data;
extracting features such as keywords, domain vocabulary and technical terms from the document data;
and performing semantic feature extraction on the document data.
5. The method for analyzing the flow of personnel between institutions as claimed in claim 2, wherein the hierarchical clustering in step three comprises:
clustering according to the standardized personnel names;
and clustering according to the standardized company names.
6. The method for analyzing the flow of personnel between institutions as claimed in claim 2, wherein performing the preliminary calculation, merging and splitting based on the institution in step four comprises:
extracting the set C of all documents of the person named A at company B;
extracting semantic features of the documents in set C;
clustering according to the semantic features of set C;
and determining the personnel id according to the clustering result.
7. The method for analyzing the flow of personnel between institutions as claimed in claim 2, wherein the calculation based on business relationships in step five comprises: performing data calculation according to the obtained superior-subordinate relationships of the enterprises and the obtained personnel ids;
and verifying the employee named A at enterprise 1 against the employee named A at enterprise 2 by comparing the semantics of the two persons' document sets, and determining that they are the same person if the predetermined threshold is met.
8. The method for analyzing the flow of personnel between institutions as claimed in claim 2, wherein the content similarity calculation along the timeline in step six comprises:
calculating the time difference between the last collaboration time of employee A at enterprise 1 and each employee A who appears later, calculating the semantic similarity between the documents published by employee A at enterprise 1 and the documents published by each later employee A, obtaining the transition probability between the two from the time difference and the average semantic similarity, and judging the two to be the same employee if the transition probability exceeds a preset threshold;
and obtaining the unique id of the person in step six comprises:
arranging each person and the associated companies on the timeline to obtain the transfer relationships of each person between companies, that is, the employee's career history; and determining the uniqueness of each employee and assigning the employee a unique identification id.
9. An analysis system for the flow of personnel between institutions, comprising:
an information acquisition and extraction module, used for acquiring business information, patents, periodicals and papers, processing each kind of information separately, and extracting the business information;
an institution calculation module, used for preliminary calculation based on the institution to which the personnel belong;
a business relationship calculation module, used for calculation based on business relationships;
a document semantic extraction module, used for calculating document semantic features;
and a personnel career history acquisition module, used for calculating content similarity along the timeline and finally determining the unique id of each person.
10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
acquiring business information, patents, periodicals and papers;
performing feature processing on the information;
performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors and the like;
performing preliminary calculation, merging and splitting based on the institution to which the personnel belong;
performing calculation, merging and splitting based on business relationships;
and finally determining the unique id of each person based on content similarity calculation along the timeline.
11. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring business information, patents, periodicals and papers;
performing feature processing on the information;
performing hierarchical clustering based on personnel names, standardized company names, document language feature vectors and the like;
performing preliminary calculation, merging and splitting based on the institution to which the personnel belong;
performing calculation, merging and splitting based on business relationships;
and finally determining the unique id of each person based on content similarity calculation along the timeline.
CN202010735795.6A 2020-07-28 2020-07-28 Analysis method and analysis system for flow condition of personnel between institutions Active CN112036692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010735795.6A CN112036692B (en) 2020-07-28 2020-07-28 Analysis method and analysis system for flow condition of personnel between institutions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010735795.6A CN112036692B (en) 2020-07-28 2020-07-28 Analysis method and analysis system for flow condition of personnel between institutions

Publications (2)

Publication Number Publication Date
CN112036692A (en) 2020-12-04
CN112036692B (en) 2024-06-07

Family

ID=73583318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010735795.6A Active CN112036692B (en) 2020-07-28 2020-07-28 Analysis method and analysis system for flow condition of personnel between institutions

Country Status (1)

Country Link
CN (1) CN112036692B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724085A (en) * 2021-07-24 2021-11-30 北京华彬立成科技有限公司 Mining method and device for early project information, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111973A (en) * 2014-06-17 2014-10-22 中国科学院计算技术研究所 Scholar name duplication disambiguation method and system
CN105069560A (en) * 2015-07-30 2015-11-18 中国科学院软件研究所 Resume information extraction and characteristic identification analysis system and method based on knowledge base and rule base
CN105653590A (en) * 2015-12-21 2016-06-08 青岛智能产业技术研究院 Name duplication disambiguation method of Chinese literature authors
CN110020433A (en) * 2019-04-01 2019-07-16 中科天玑数据科技股份有限公司 A kind of industrial and commercial senior executive's name disambiguation method based on enterprise's incidence relation
CN111105327A (en) * 2019-12-19 2020-05-05 清华大学 Calculation method for cross-border flow of scholars
CN111221873A (en) * 2019-12-31 2020-06-02 成都数联铭品科技有限公司 Inter-enterprise homonym identification method and system based on associated network

Also Published As

Publication number Publication date
CN112036692B (en) 2024-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant