CN113886779A - Method for identifying person identity, storage medium and computer program product - Google Patents


Info

Publication number
CN113886779A
Authority
CN
China
Prior art keywords
information
person
identification
group organization
identifications
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111161578.1A
Other languages
Chinese (zh)
Inventor
杨镓毓
方涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fangjianghu Technology Co Ltd
Original Assignee
Beijing Fangjianghu Technology Co Ltd
Application filed by Beijing Fangjianghu Technology Co Ltd
Priority to CN202111161578.1A
Publication of CN113886779A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication

Abstract

Embodiments of the present disclosure provide a method for identifying a person's identity, a storage medium, and a computer program product. The method comprises the following steps: determining at least two identical target person identifications from a predetermined set of person identifications; determining, from the association relationship information of each of the at least two target person identifications, the group organization information corresponding to that identification; and inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model, wherein the recognition result indicates whether the persons indicated by the at least two target person identifications are the same person. By using a recognition model built on LightGBM to decide whether the persons indicated by two target person identifications are the same person, the embodiments can improve the accuracy of identity recognition.

Description

Method for identifying person identity, storage medium and computer program product
Technical Field
The present disclosure relates to the field of big data technology, and in particular, to a method, a storage medium, and a computer program product for identifying a person identity.
Background
In the prior art, electronic devices such as computers often distinguish objects by means of identifications. For example, to identify persons, different person identifications are often used to mark different persons.
However, with big data now widely applied, data is acquired through increasingly diverse channels, and data obtained from different channels may actually correspond to the same object. For example, in the current market, company data can be collected from public official channels and offered through information query services. However, because the publicly available data cannot completely cover the identity information of all natural persons, identifying company personnel (that is, determining whether identically identified personnel of different companies are the same natural person) has become a core problem for these companies and for all users of company data.
Therefore, how to improve the accuracy of identity recognition is a problem that needs to be addressed.
Disclosure of Invention
The embodiment of the disclosure provides a person identity identification method, a storage medium and a computer program product, so as to improve the accuracy of identity identification.
According to a first aspect of the embodiments of the present disclosure, a method for identifying a person identity is provided, which includes:
determining at least two identical target personnel identifications from a predetermined personnel identification set, wherein the at least two target personnel identifications have different association relation information;
determining, from the association relationship information of each of the at least two target person identifications, the group organization information corresponding to that identification, wherein the group organization information is information of a group organization in which the person indicated by the person identification is located;
and inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model, wherein the recognition result indicates whether the persons indicated by the at least two target person identifications are the same person.
Optionally, in the method according to any embodiment of the present disclosure, the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model includes:
in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first quantity threshold, selecting, from the determined group organization information, a number of pieces of group organization information equal to a preset second quantity threshold;
and inputting the selected group organization information into the recognition model pre-constructed based on the light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model.
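By way of non-limiting illustration, the threshold-based selection described above can be sketched as follows. The function name `select_group_info`, its parameters, and the use of seeded random sampling are assumptions made for this sketch; the disclosure does not specify how the pieces of group organization information are chosen.

```python
import random


def select_group_info(group_infos, first_threshold, second_threshold, seed=0):
    """Down-sample group organization information when there is too much of it.

    If the number of items reaches first_threshold, keep second_threshold of
    them (random sampling is an assumption of this sketch); otherwise keep all.
    """
    if len(group_infos) >= first_threshold:
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        keep = min(second_threshold, len(group_infos))
        return rng.sample(list(group_infos), keep)
    return list(group_infos)


many = select_group_info(list(range(10)), first_threshold=5, second_threshold=3)
few = select_group_info([1, 2], first_threshold=5, second_threshold=3)
```

Capping the number of inputs in this way keeps the feature size bounded for persons associated with very many group organizations.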
Optionally, in the method according to any embodiment of the present disclosure, the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model includes:
in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first quantity threshold, determining the areas in which the group organizations indicated by the determined group organization information are located, to obtain an area set;
for each area in the area set, selecting at least one piece of group organization information from the group organization information of the group organizations located in that area;
and inputting the selected group organization information into the recognition model pre-constructed based on the light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model.
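By way of non-limiting illustration, the area-based selection described above can be sketched as follows. The record layout (dictionaries with `name` and `region` keys), the function name `select_per_region`, and the choice of taking the first records of each area are assumptions of this sketch.

```python
from collections import defaultdict


def select_per_region(group_infos, first_threshold, per_region=1):
    """Pick at least one group organization record from every area.

    Selection only happens when the number of records reaches
    first_threshold; below that, all records are kept unchanged.
    """
    if len(group_infos) < first_threshold:
        return list(group_infos)
    by_region = defaultdict(list)
    for info in group_infos:
        by_region[info["region"]].append(info)  # build the area set
    selected = []
    for region in sorted(by_region):  # sorted only for a deterministic order
        selected.extend(by_region[region][:per_region])
    return selected


companies = [
    {"name": "A", "region": "Beijing"},
    {"name": "B", "region": "Beijing"},
    {"name": "C", "region": "Shanghai"},
]
picked = select_per_region(companies, first_threshold=3)
```

Selecting per area keeps every area represented in the model input, which plain down-sampling would not guarantee.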
Optionally, in the method according to any embodiment of the present disclosure, the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model includes:
inputting the determined group organization information into the recognition model pre-constructed based on the light gradient boosting machine (LightGBM), determining, via the recognition model, the Euclidean distance between the group organization information corresponding to the at least two target person identifications, and generating a recognition result based on the Euclidean distance.
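By way of non-limiting illustration, the Euclidean distance between two pieces of group organization information can be computed once each piece is encoded as a numeric vector. The vector encoding and the function name `euclidean_distance` are assumptions of this sketch; the disclosure does not state how the information is vectorized or how the distance enters the recognition model.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical encodings of the group organization information of two
# identically named persons.
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

A small distance suggests similar group organization information, which may then serve as evidence that the two identifications indicate the same person.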
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
inputting the recognition model and the determined group organization information to a pre-trained interpretation model, and generating interpretation information of the recognition model for the recognition result via the interpretation model.
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
acquiring a sample information set, wherein the sample information in the sample information set comprises group organization information and pre-labeled label information corresponding to the group organization information, and the label information represents whether two personnel identifications corresponding to the group organization information indicate the same personnel;
determining a training sample set from the sample information set;
and taking the group organization information included in the training samples in the training sample set as input data, taking the label information corresponding to the input data as expected output data, and training to obtain the recognition model.
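By way of non-limiting illustration, the training step described above can be sketched with the scikit-learn style interface of the LightGBM Python package. The two hand-crafted features, the record keys `partners` and `region`, the function names, and the hyperparameter values are all assumptions of this sketch; a practical feature set would be far richer.

```python
def pair_features(info_a, info_b):
    """Hypothetical features describing a pair of company records."""
    shared_partners = len(set(info_a["partners"]) & set(info_b["partners"]))
    same_region = 1.0 if info_a["region"] == info_b["region"] else 0.0
    return [float(shared_partners), same_region]


def train_recognition_model(feature_rows, labels):
    """Fit a LightGBM binary classifier on labeled pair features.

    labels[i] is 1 when the two person identifications behind
    feature_rows[i] indicate the same person, and 0 otherwise.
    """
    # Imported lazily so the feature helper works without LightGBM installed.
    from lightgbm import LGBMClassifier

    model = LGBMClassifier(n_estimators=100, learning_rate=0.1, num_leaves=31)
    model.fit(feature_rows, labels)
    return model


row = pair_features(
    {"partners": ["x", "y"], "region": "Beijing"},
    {"partners": ["y"], "region": "Beijing"},
)
```

Here `row` plays the role of the input data for one training sample, and the pre-labeled label information supplies the expected output.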
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
determining a verification sample set from the sample information set;
and calculating, based on the verification sample set, at least one of the following for the recognition model: the area under the receiver operating characteristic curve (AUC), the accuracy, the precision, and the recall.
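By way of non-limiting illustration, the evaluation metrics named above can be computed on a verification sample set as follows. The function names are assumptions of this sketch, and the AUC is computed by its pairwise-ranking definition rather than by any particular library routine.

```python
def classification_metrics(labels, predictions):
    """Accuracy, precision and recall for binary labels (1 = same person)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```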
Optionally, in the method according to any embodiment of the present disclosure, the association relationship information is represented by nodes or edges in a predetermined knowledge graph, and the person identifier is represented by nodes in the knowledge graph; and
the determining at least two identical target person identifications from a predetermined set of person identifications includes:
determining at least two identical target person identifications from a set of person identifications characterized by nodes in the knowledge-graph.
Optionally, in the method of any embodiment of the present disclosure, the method further includes:
marking a plurality of target personnel identifications representing the same personnel in the knowledge graph by adopting the same clustering identification based on the identification result;
and calculating, based on the cluster identifications with which the target person identifications are marked in the knowledge graph, at least one of the accuracy, the comprehensiveness, and the precision of the recognition result for each cluster identification.
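By way of non-limiting illustration, marking the target person identifications of the same person with one cluster identification can be sketched with a union-find structure. Treating "same person" as transitive across pairs, the node identifiers, and the function name `assign_cluster_ids` are assumptions of this sketch.

```python
def assign_cluster_ids(node_ids, same_person_pairs):
    """Give every person node a cluster id shared by all nodes judged identical.

    same_person_pairs lists the node pairs that the recognition model
    judged to be the same person.
    """
    parent = {node: node for node in node_ids}

    def find(node):
        # Walk to the root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in same_person_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters
    return {node: find(node) for node in node_ids}


clusters = assign_cluster_ids(
    ["p1", "p2", "p3", "p4"],
    [("p1", "p2"), ("p2", "p3")],
)
```

Metrics can then be computed per cluster identification by comparing each cluster with labeled ground truth, where such labels are available.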
Optionally, in the method of any embodiment of the present disclosure, the group organization information includes company information; and
the determining group organization information respectively corresponding to the at least two target person identifications from the association relationship information of the at least two target person identifications respectively comprises:
determining, from the association relationship information of each of the at least two target person identifications, the company information corresponding to that identification; and
the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model comprises:
inputting at least one of the following items into the recognition model pre-constructed based on the light gradient boosting machine (LightGBM), and generating a recognition result via the recognition model:
a relationship feature between the determined at least two pieces of company information;
company information having an association relationship with the determined at least two pieces of company information;
and person information having an association relationship with the determined at least two pieces of company information.
Optionally, in the method of any embodiment of the present disclosure, the dimension of the feature data used for training the recognition model is greater than or equal to 100.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying a person, including:
a first determining unit configured to determine at least two identical target person identifications from a predetermined person identification set, wherein the at least two target person identifications have different association relationship information;
a second determining unit configured to determine, from the association relationship information of each of the at least two target person identifications, the group organization information corresponding to that identification, wherein the group organization information is information of a group organization in which the person indicated by the person identification is located;
a first input unit configured to input the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model, wherein the recognition result indicates whether the persons indicated by the at least two target person identifications are the same person.
Optionally, in the apparatus of any embodiment of the present disclosure, the first input unit includes:
a first selecting subunit configured to select, in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first quantity threshold, a number of pieces of group organization information equal to a preset second quantity threshold from the determined group organization information;
a first input subunit configured to input the selected group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model.
Optionally, in the apparatus of any embodiment of the present disclosure, the first input unit includes:
a first determining subunit, configured to determine, in response to that the number of the group organizations indicated by the determined group organization information is greater than or equal to a preset first number threshold, an area where the group organizations indicated by the determined group organization information are located, resulting in an area set;
a second selecting subunit configured to select, for an area in the area set, at least one group organization information from the group organization information of the group organization located in the area;
a second input subunit configured to input the selected group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model.
Optionally, in the apparatus of any embodiment of the present disclosure, the first input unit includes:
a third input subunit configured to input the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), determine, via the recognition model, the Euclidean distance between the group organization information corresponding to the at least two target person identifications, and generate a recognition result based on the Euclidean distance.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
a second input subunit configured to input the recognition model and the determined group organization information to a pre-trained interpretation model via which interpretation information of the recognition model for the recognition result is generated.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
an acquisition unit configured to acquire a sample information set, wherein the sample information in the sample information set comprises group organization information and pre-labeled label information corresponding to the group organization information, and the label information represents whether two person identifications corresponding to the group organization information indicate the same person;
a third determining unit configured to determine a training sample set from the sample information set;
a training unit configured to train the recognition model by taking the group organization information included in the training samples in the training sample set as input data and taking the label information corresponding to the input data as expected output data.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
a fourth determination unit configured to determine a set of verification samples from the set of sample information;
a first calculating unit configured to calculate, based on the verification sample set, at least one of the area under the receiver operating characteristic curve (AUC), the accuracy, and the recall of the recognition model.
Optionally, in the apparatus according to any embodiment of the present disclosure, the association relationship information is represented by nodes or edges in a predetermined knowledge graph, and the person identifier is represented by nodes in the knowledge graph; and
the first determination unit includes:
a second determining subunit configured to determine at least two identical target person identifications from the set of person identifications characterized by the nodes in the knowledge-graph.
Optionally, in the apparatus of any embodiment of the present disclosure, the apparatus further includes:
a marking unit configured to mark a plurality of target person identifications representing the same person in the knowledge graph by using the same cluster identification based on the recognition result;
a second calculating unit configured to calculate, based on the cluster identifications with which the target person identifications are marked in the knowledge graph, at least one of the accuracy, the comprehensiveness, and the precision of the recognition result for each cluster identification.
Optionally, in the apparatus of any embodiment of the present disclosure, the group organization information includes company information; and
the second determination unit includes:
a third determining subunit configured to determine, from the association relationship information of each of the at least two target person identifications, the company information corresponding to that identification; and
the first input unit includes:
a third input subunit configured to input at least one of the following items into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model:
a relationship feature between the determined at least two pieces of company information;
company information having an association relationship with the determined at least two pieces of company information;
and person information having an association relationship with the determined at least two pieces of company information.
Optionally, in the apparatus of any embodiment of the present disclosure, a dimension of the feature data used for training the recognition model is greater than or equal to 100.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory for storing a computer program;
a processor configured to execute the computer program stored in the memory, and when the computer program is executed, the method of any embodiment of the method for identifying a person identity of the first aspect of the present disclosure is implemented.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of the embodiments of the method for identifying a person identity in the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program comprising computer readable code which, when run on an apparatus, causes a processor in the apparatus to execute instructions for implementing the steps in the method of any of the embodiments of the method for identification of identity of a person as described in the first aspect above.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a computer program product having a computer program stored thereon, the computer program, when executed by a processor, implementing the method according to any one of the embodiments of the method for identifying a person identity according to the first aspect described above.
Based on the method for identifying a person identity, the storage medium, and the computer program product provided by the above embodiments of the present disclosure, at least two identical target person identifications may be determined from a predetermined set of person identifications, wherein the at least two target person identifications have different association relationship information. Group organization information corresponding to each of the at least two target person identifications is then determined from the association relationship information of those identifications, wherein the group organization information is information of a group organization in which the person indicated by the person identification is located. Finally, the determined group organization information is input into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and a recognition result is generated via the recognition model, wherein the recognition result indicates whether the persons indicated by the at least two target person identifications are the same person. In the embodiments of the present disclosure, the recognition model pre-constructed based on LightGBM can thus identify whether the persons indicated by two target person identifications are the same person; choosing LightGBM can improve representation capability and performance, thereby improving the accuracy of identity recognition.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a first embodiment of a method for identifying an identity of a person of the present disclosure.
Fig. 2 is a flowchart of a second embodiment of the method for identifying the identity of a person of the present disclosure.
Fig. 3A to fig. 3F are schematic application scenarios of an embodiment of the method for identifying an identity of a person of the present disclosure.
Fig. 4 is a schematic structural diagram of an embodiment of an apparatus for identifying an identity of a person of the present disclosure.
Fig. 5 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and means that three kinds of relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to at least one of a terminal device, a computer system, and a server, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with at least one electronic device of a terminal device, computer system, and server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
At least one of the terminal device, the computer system, and the server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Referring to fig. 1, a flow 100 of a first embodiment of a method of identifying a person's identity according to the present disclosure is shown. The personnel identity identification method comprises the following steps:
101. Determine at least two identical target person identifications from a predetermined set of person identifications.
In this embodiment, an executing entity (for example, a server, a terminal device, a person identification recognition device, etc.) of the person identification recognition method may obtain a predetermined person identification set from other electronic devices or locally through a wired connection manner or a wireless connection manner. Then, at least two identical target person identifications are determined from the set of person identifications.
The person identifier set may include a plurality of person identifiers. A person identifier may be any information that can be used to identify a person; for example, it may be the person's name. The set of person identifiers may comprise two or more identical person identifiers. The predetermined set of person identifications may be a set of various predetermined person identifications. In practice, the set of person identifications may be stored. As an example, the set of person identifications may include, but is not limited to: person identifications (e.g., names) of staff members of all or some of the companies within an area (e.g., a country), obtained from an official channel.
Different persons can be characterized here by the same person identification. For example, two natural persons have the same name, and if the name is used as the person identifier of the natural person, the two natural persons can be characterized by the same person identifier (i.e., name).
In practice, the person identifiers in the person identifier set may be represented by nodes in a knowledge graph or in other forms.
The at least two target person identifications have different association relationship information. The association relationship information may include any stored information that is associated with the person identification. For example, the association relationship information may include, but is not limited to: group organization information of the group organization to which the person indicated by the person identification belongs, as well as the person's age, sex, native place, and the like.
Here, if the person identifiers in the person identifier set are represented by nodes in a knowledge graph, the association relationship information of the person identifiers may be represented by nodes or edges in the knowledge graph or another knowledge graph; if the person identifier in the person identifier set is stored in the two-dimensional table or the database, the association relationship information of the person identifier may be stored in the two-dimensional table or the database in association with the person identifier.
As an example, the foregoing step 101 may specifically include the following: from a predetermined set of person names, determining two person names that are identical (i.e., the two character strings are the same) but belong to different companies. For example, Zhang San affiliated with company A and Zhang San affiliated with company B.
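By way of non-limiting illustration, the example of step 101 above can be sketched as follows. The record layout (dictionaries with `name` and `company` keys) and the function name `find_candidate_pairs` are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import combinations


def find_candidate_pairs(records):
    """Return pairs of records with the same name but different companies."""
    by_name = defaultdict(list)
    for record in records:
        by_name[record["name"]].append(record)
    pairs = []
    for same_name in by_name.values():
        for a, b in combinations(same_name, 2):
            if a["company"] != b["company"]:
                pairs.append((a, b))
    return pairs


records = [
    {"name": "Zhang San", "company": "Company A"},
    {"name": "Zhang San", "company": "Company B"},
    {"name": "Li Si", "company": "Company A"},
]
pairs = find_candidate_pairs(records)
```

Each returned pair is a candidate for the later recognition step, which decides whether the two records refer to the same natural person.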
102. Determine, from the association relationship information of each of the at least two target person identifications, the group organization information corresponding to that identification.
In this embodiment, the executing entity may determine group organization information corresponding to the at least two target person identifiers from the association relationship information of the at least two target person identifiers, respectively.
The group organization information may include stored information of a group organization that has an association relationship with a person identifier (including a target person identifier). The group organization information is information of a group organization in which the person indicated by the person identifier is located. For example, the group organization information may include at least one of the following items of company information: relationship features between the determined at least two pieces of company information, company information having an association relationship with the determined at least two pieces of company information, and person information having an association relationship with the determined at least two pieces of company information. Optionally, the group organization information may further include information on the person's city, family, and the like.
In practice, the same natural person often belongs to two or more group organizations (for example, the same natural person is a shareholder of two companies). However, in application scenarios involving massive data, it may not be possible to collect and store the group organization information of all group organizations to which the same natural person belongs. Accordingly, the group organization information corresponding to a determined person identifier need not cover all group organizations to which the person indicated by that person identifier belongs.
103, inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine, and generating a recognition result via the recognition model.
In this embodiment, the execution body may input the determined group organization information into a recognition model pre-constructed based on a Light Gradient Boosting Machine (LightGBM), and generate a recognition result via the recognition model.
The recognition result represents whether the persons indicated by the at least two target person identifiers are the same person. The recognition model may be used to recognize whether the persons indicated by the at least two target person identifiers are the same person.
For example, the execution body may sequentially input two or more pieces of group organization information among the determined group organization information into the recognition model pre-constructed based on the light gradient boosting machine, and generate a recognition result via the recognition model.
Here, in terms of model selection, a light gradient boosting machine, which is an ensemble tree model, is selected. Compared with methods such as deep neural networks, an ensemble tree model can handle nonlinear problems more directly and appropriately. Furthermore, in this embodiment, the light gradient boosting machine is selected to improve representation capability and performance, thereby improving the accuracy of identity recognition.
In practice, the recognition model may be obtained by supervised training based on a pre-acquired sample information set. The sample information in the sample information set includes group organization information and pre-annotated label information corresponding to the group organization information, and the label information represents whether the two person identifiers corresponding to the group organization information indicate the same person.
The method for identifying person identity provided in the above embodiment of the present disclosure may determine at least two identical target person identifiers from a predetermined person identifier set, where the at least two target person identifiers have different association relationship information; then determine, from the association relationship information that each of the at least two target person identifiers has, the group organization information corresponding to each of them, where the group organization information is information of a group organization in which the person indicated by the person identifier is located; and finally input the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine and generate a recognition result via the recognition model, where the recognition result represents whether the persons indicated by the at least two target person identifiers are the same person. Selecting the light gradient boosting machine can thus improve representation capability and performance, and thereby the accuracy of identity recognition.
In some optional implementations of the present embodiment, the execution body may perform 103 above in the following manner:
first, when the number of group organizations indicated by the determined group organization information is greater than or equal to a preset first number threshold, a number of pieces of group organization information equal to a preset second number threshold is selected from the determined group organization information. The first number threshold and the second number threshold may each be a predetermined value, and the first number threshold may be greater than the second number threshold.
Then, the selected group organization information is input into the recognition model pre-constructed based on the light gradient boosting machine, and a recognition result is generated via the recognition model.
It is to be understood that, in the above alternative implementation manner, in the case that the number of the group organizations indicated by the determined group organization information is too large (i.e. the number of the group organizations indicated by the determined group organization information is greater than or equal to the preset first number threshold), part of the group organization information may be selected from the group organization information to be used as the input of the recognition model. In this way, when the number of group organizations is too large, the input data of the recognition model can be reduced by limiting the number of group organization information to the second number threshold, and the speed of identity recognition can be increased.
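As a minimal sketch of the threshold logic above, the selection may look as follows. The present disclosure does not state how the pieces of group organization information are chosen, so the random sampling, the fixed seed, and the function name `select_org_info` are assumptions for illustration only.

```python
import random

def select_org_info(org_infos, first_threshold, second_threshold, seed=0):
    """Limit the model input to `second_threshold` items when there are too many.

    Assumes first_threshold > second_threshold, as the description permits.
    Random sampling is an illustrative choice, not one stated in the source.
    """
    if len(org_infos) >= first_threshold:
        return random.Random(seed).sample(list(org_infos), second_threshold)
    # Below the first threshold, all determined information is used unchanged.
    return list(org_infos)

picked = select_org_info(list(range(50)), first_threshold=20, second_threshold=10)
unchanged = select_org_info([1, 2, 3], first_threshold=20, second_threshold=10)
```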
In some optional implementations of this embodiment, the executing body may also execute 103 in the following manner:
first, when the number of group organizations indicated by the determined group organization information is greater than or equal to a preset first number threshold, determining an area where the group organization indicated by the determined group organization information is located, and obtaining an area set. The area where the group organization is located may be an activity area or a registration area of the group organization (e.g., a company), and the like.
Then, for each area in the area set, at least one piece of group organization information is selected from the group organization information of the group organizations located in that area.
Finally, the selected group organization information is input into the recognition model pre-constructed based on the light gradient boosting machine, and a recognition result is generated via the recognition model.
It is to be understood that, in the above alternative implementation, when the number of group organizations indicated by the determined group organization information is too large (i.e., greater than or equal to the preset first number threshold), sampling may preferentially be performed by group across different areas (e.g., cities), and part of the group organization information may be selected as the input of the recognition model. In this way, when the number of group organizations is too large, grouped sampling reduces the input data of the recognition model, so that the speed of identity recognition can be improved.
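The area-grouped sampling described above may be sketched as follows. The dictionary layout with a `"region"` key, the city names, the `per_region` parameter, and the use of random sampling within each area are all assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_by_region(org_infos, first_threshold, per_region=1, seed=0):
    """Pick at least one organization per area when there are too many organizations."""
    if len(org_infos) < first_threshold:
        return list(org_infos)
    by_region = defaultdict(list)
    for info in org_infos:
        by_region[info["region"]].append(info)  # builds the area set
    rng = random.Random(seed)
    selected = []
    for region in sorted(by_region):
        group = by_region[region]
        selected.extend(rng.sample(group, min(per_region, len(group))))
    return selected

# Hypothetical organizations spread over three areas.
orgs = [
    {"name": "C1", "region": "Beijing"},
    {"name": "C2", "region": "Beijing"},
    {"name": "C3", "region": "Beijing"},
    {"name": "C4", "region": "Shanghai"},
    {"name": "C5", "region": "Shanghai"},
    {"name": "C6", "region": "Shenzhen"},
]
sampled = sample_by_region(orgs, first_threshold=5)
```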
In some optional implementations of this embodiment, the executing body may further execute 103 by:
inputting the determined group organization information into the recognition model pre-constructed based on the light gradient boosting machine, determining, via the recognition model, the Euclidean distance between the group organization information corresponding to each of the at least two target person identifiers, and generating a recognition result based on the Euclidean distance.
It can be understood that, in the above alternative implementation, the Euclidean distance between pieces of group organization information may be calculated using the recognition model, and a recognition result may be generated based on the Euclidean distance. Thus, where the person identifier set is a set of person identifiers represented by nodes in a knowledge graph, graph-domain features may be converted into features in a Euclidean space, and identity recognition may then be implemented.
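By way of a non-limiting illustration, the Euclidean distance between two feature vectors derived from group organization information may be computed as sketched below. The function name, the example vectors, and the decision threshold are assumptions; the present disclosure does not specify how the recognition result is derived from the distance.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical feature vectors for the two target person identifiers.
features_a = [0.0, 0.0]
features_b = [3.0, 4.0]
distance = euclidean_distance(features_a, features_b)

# Hypothetical rule: treat a small distance as "same person".
HYPOTHETICAL_THRESHOLD = 10.0
same_person = distance <= HYPOTHETICAL_THRESHOLD
```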
In some optional implementations of this embodiment, the executing body may execute the following steps:
the recognition model and the determined group organization information are input to a pre-trained interpretation model, and interpretation information of the recognition model for the recognition result is generated via the interpretation model.
The interpretation model may be a SHAP (SHapley Additive exPlanations) model, a LIME (Local Interpretable Model-agnostic Explanations) model, or the like.
It can be understood that, in the prior art, models often lack interpretability and cannot be fully applied in actual business because the reasons for a prediction result cannot be given simply and intuitively. In the above optional implementation, the recognition result generated by the recognition model may be interpreted by the interpretation model to describe the main influencing factors, so that the influence of multicollinearity on feature effects can be eliminated.
In some optional implementations of this embodiment, the executing body may execute the following steps:
first, a sample information set is obtained. The sample information in the sample information set comprises group organization information and label information which is labeled in advance and corresponds to the group organization information. The tag information represents whether two person identifications corresponding to the group organization information indicate the same person.
And then, determining a training sample set from the sample information set. Here, the sample information may be selected from the sample information set according to a certain strategy or randomly, so as to obtain the training sample set.
Then, the group organization information included in the training samples in the training sample set is used as input data, label information corresponding to the input data is used as expected output data, and the recognition model is obtained through training.
It is understood that in the above alternative implementation, the recognition model may be trained in a supervised manner.
In some application scenarios in the above optional implementation manners, the execution main body may further perform the following steps:
first, a verification sample set is determined from the sample information set. Here, the sample information may be selected from the sample information set according to a certain policy or randomly, so as to obtain the verification sample set. For example, the sample information set may be randomly divided into two parts, one part of the sample information set is used as a training sample set, and the other part of the sample information set is used as a verification sample set. Here, the number of sample information in both the training sample set and the verification sample set may be arbitrarily determined. For example, the ratio of the number of sample information in the training sample set to the number of sample information in the validation sample set may be 4 to 1.
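A random 4-to-1 division of the sample information set, as in the example above, may be sketched as follows. The function name `split_samples`, its parameters, and the fixed seed are assumptions for illustration.

```python
import random

def split_samples(samples, train_parts=4, val_parts=1, seed=0):
    """Randomly divide samples into training and verification sets (4:1 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]

train_set, val_set = split_samples(range(100))
```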
Then, based on the verification sample set, at least one of the area under the receiver operating characteristic (ROC) curve, the accuracy, the precision, and the recall of the recognition model is calculated.
It can be understood that, in the above application scenario, the recognition results of the recognition model may be evaluated by calculating at least one of the area under the ROC curve, the accuracy, the precision, and the recall of the recognition model, so as to ensure that the applied recognition model recognizes with high accuracy.
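The four evaluation quantities named above may be computed on a verification sample set as sketched below, using only the standard definitions. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def roc_auc(y_true, scores):
    """Area under the ROC curve as the fraction of correctly ordered
    positive/negative pairs (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy verification data (hypothetical).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```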
In some optional implementations of the present embodiment, the group organization information includes company information. On this basis, the execution body may perform 102 in the following manner: determining company information corresponding to each of the at least two target person identifiers from the association relationship information that each of the at least two target person identifiers has.
Further, the executing agent may execute 103 as follows:
inputting at least one of the following items into the recognition model pre-constructed based on the light gradient boosting machine, and generating a recognition result via the recognition model:
First, relationship features between the determined at least two pieces of company information, such as a person's time of participation in the two companies, the investment amount, and relationship information between branches of the two companies.
Second, features of each piece of company information having an association relationship (e.g., an investment relationship) with the determined at least two pieces of company information, for example, the sum of paid-in capital and the average number of shareholders of the companies indicated by the respective pieces of company information.
Third, person information having an association relationship with the determined at least two pieces of company information, for example, the sum of the reciprocals of the degrees of the nodes in the knowledge graph that represent the respective pieces of person information (each of which may be represented by one node in the knowledge graph).
It is understood that in the above alternative implementation manner, the recognition result may be generated based on at least one item, so that the accuracy of identity recognition may be further improved.
In some optional implementations of the present embodiment, the dimension of the feature data used for training the recognition model is greater than or equal to 100.
As an example, based on Recursive Feature Elimination (RFE), a subset of features (e.g., 105-dimensional features) may be selected from multi-dimensional features (e.g., 345-dimensional features) for input into the recognition model.
It can be understood that in the above alternative implementation manner, the identification can be realized by selecting features with more dimensions, so that the accuracy of the identification is further improved.
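As a minimal sketch of recursive feature elimination from 345 to 105 dimensions, the loop below removes a fixed fraction of the least important remaining features in each round. The 10% removal ratio, the function name `rfe_fixed_ratio`, and the toy importance function are assumptions; in practice the importance scores would come from retraining the model on the remaining features in each round.

```python
def rfe_fixed_ratio(features, importance_fn, target_count, drop_ratio=0.1):
    """Recursively drop the least important features until `target_count` remain.

    `importance_fn(selected)` must return a dict mapping feature -> importance.
    """
    selected = list(features)
    history = [len(selected)]  # number of features after each round
    while len(selected) > target_count:
        scores = importance_fn(selected)
        n_drop = max(1, int(len(selected) * drop_ratio))
        n_drop = min(n_drop, len(selected) - target_count)  # do not overshoot
        ranked = sorted(selected, key=lambda f: scores[f])  # least important first
        dropped = set(ranked[:n_drop])
        selected = [f for f in selected if f not in dropped]
        history.append(len(selected))
    return selected, history

# Toy importance (hypothetical): a feature's index is its importance.
toy_importance = lambda feats: {f: f for f in feats}
selected, history = rfe_fixed_ratio(list(range(345)), toy_importance, target_count=105)
```

Removing a fraction rather than a fixed count means large steps while many features remain and progressively smaller steps near the target size.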
With further reference to fig. 2, fig. 2 is a flow chart of a second embodiment of a method of identifying a person identity of the present disclosure. The process 200 of the method for identifying the person identity includes:
201, determining at least two identical target person identifiers from a set of person identifiers represented by nodes in a knowledge graph.
In this embodiment, an executing entity (e.g., a server, a terminal device, a person identity recognition device, etc.) of the person identity recognition method may obtain a set of person identifiers represented by nodes in the knowledge graph from other electronic devices or locally through a wired connection manner or a wireless connection manner. Then, at least two identical target person identifications are determined from the set of person identifications.
The person identifier set may include a plurality of person identifiers. The person identification may be any information that may be used to identify a person. The person identification may be the name of the person, for example.
Different persons can be characterized here by the same person identification. For example, two natural persons have the same name, and if the name is used as the person identifier of the natural person, the two natural persons can be characterized by the same person identifier (i.e., name).
The predetermined set of person identifiers may be a set of various predetermined person identifiers. In practice, the set of person identifiers may be stored. As an example, the set of person identifiers may include, but is not limited to: person identifiers (e.g., names) of staff members of all or some of the companies in a certain area (e.g., the country) obtained from an official channel. The person identifiers in the person identifier set can be represented by nodes in the knowledge graph. Edges in the knowledge graph may represent relationship information or attribute information having an association relationship with a node.
Here, the knowledge graph may include at least two nodes referring to the same person identifier, but the association relationship information of different nodes is usually different. Thus, the executive may determine at least two identical target person identifications from a set of person identifications (e.g., names) characterized by nodes in the knowledge-graph.
In this embodiment, the association relationship information is represented by nodes or edges in a predetermined knowledge graph, and the person identifier is represented by the nodes in the knowledge graph.
202, determining group organization information corresponding to the at least two target person identifications respectively from the association relationship information of the at least two target person identifications respectively.
In this embodiment, the execution body may determine group organization information corresponding to each of the at least two target person identifiers from the association relationship information that each of the at least two target person identifiers has. The group organization information is information of a group organization in which the person indicated by the person identifier is located.
In this embodiment, 202 is substantially the same as 102 in the corresponding embodiment of fig. 1, and is not described herein again.
203, inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine, and generating a recognition result via the recognition model.
In this embodiment, the execution body may input the determined group organization information into a recognition model pre-constructed based on the light gradient boosting machine, and generate a recognition result via the recognition model. The recognition result represents whether the persons indicated by the at least two target person identifiers are the same person.
In this embodiment, step 203 is substantially the same as step 103 in the embodiment corresponding to fig. 1, and is not described here again.
It should be noted that, besides the above-mentioned contents, the embodiment of the present application may further include the same or similar features and effects as the embodiment corresponding to fig. 1, and details are not repeated herein.
As can be seen from FIG. 2, in this embodiment, the process 200 of the method for identifying person identity may determine at least two identical target person identifiers from a set of person identifiers represented by nodes in a knowledge graph, so that whether the persons indicated by the person identifiers represented by the nodes in the knowledge graph are the same person may be recognized based on a light gradient boosting machine. Therefore, the relationship information between the nodes and/or edges of the graph structure in the knowledge graph can be abstracted into feature data for the recognition model constructed based on the light gradient boosting machine, so that the accuracy of identity recognition in big-data application scenarios is improved.
In some optional implementation manners of this embodiment, the executing main body may further perform the following steps:
firstly, based on the recognition result, the same cluster identifier is adopted to mark a plurality of target personnel identifiers representing the same personnel in the knowledge graph.
Then, for each cluster identifier with which target person identifiers are marked in the knowledge graph, at least one of the precision, the recall (comprehensiveness), and the accuracy of the recognition result for that cluster identifier can be calculated.
Here, after obtaining the recognition result via the recognition model described above, a plurality of person identifiers representing the same person in the knowledge graph may be labeled with the same cluster identifier. In this way, for each cluster identity, a set of community organizations indicated by the predicted community organization information for that cluster identity (hereinafter referred to as predicted community organization set) and a set of community organizations indicated by the real community organization information for that cluster identity (hereinafter referred to as real community organization set) may be derived. Wherein, the set of group organizations indicated by the real group organization information can be obtained based on the label information in the sample information set.
As an example, the precision P, the recall (comprehensiveness) R, and the accuracy A of the recognition result for a cluster identifier may be calculated using the following formulas:
P=TP÷(TP+FP)
R=TP÷(TP+FN)
A=TP÷(TP+FP+FN)
wherein TP represents the number of group organizations in the intersection of the predicted group organization set and the real group organization set, i.e., the number of correct predictions; FP represents the number of group organizations in the difference set of the predicted group organization set minus the real group organization set, i.e., the number of wrong predictions; FN represents the number of group organizations in the difference set of the real group organization set minus the predicted group organization set, i.e., the number of missed predictions.
It can be understood that, in the above alternative implementation, the clustering result may be measured based on at least one of the precision, the recall, and the accuracy of the recognition result for each cluster identifier in the knowledge graph, so as to learn whether the recognition result contains no wrong relationships, whether it contains all relationships, and how accurate it is overall. Therefore, evaluation of the recognition result based on precision, recall, and accuracy is realized.
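The formulas P = TP ÷ (TP + FP), R = TP ÷ (TP + FN), and A = TP ÷ (TP + FP + FN) may be evaluated directly on the predicted and real group organization sets of one cluster identifier, as sketched below. The function name, the zero-denominator handling, and the example sets are assumptions for illustration.

```python
def cluster_metrics(predicted, real):
    """Return (P, R, A) for one cluster identifier from two sets of organizations."""
    tp = len(predicted & real)   # correctly predicted organizations
    fp = len(predicted - real)   # wrongly predicted organizations
    fn = len(real - predicted)   # missed organizations
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    a = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return p, r, a

# Hypothetical sets: two organizations agree, one is wrong, one is missed.
predicted_set = {"C1", "C2", "C3"}
real_set = {"C1", "C2", "C4"}
P, R, A = cluster_metrics(predicted_set, real_set)
```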
The following exemplifies the present embodiment by taking the group organization information as the company information:
referring to fig. 3A to 3F, fig. 3A to 3F are schematic application scenarios of an embodiment of the identity recognition method of the present disclosure.
In fig. 3A: first, data processing preparation may be performed.
In the application scenario, the related data mainly comprises two parts:
The first part is the raw data. All enterprise and personnel data related to the real estate industry can be acquired through third-party purchase, for example: basic company information, key personnel, shareholder information, change information, basic person information, and the like. Because data in the enterprise repository is inherently relational, the data may be presented as a knowledge graph. The knowledge graph may contain nodes and edges. Each node may be a company node (characterizing a company) or a person node (characterizing a person), and edges characterize relationships such as legal representative, investment, and employment. Additional information may be attached to both company nodes and person nodes.
The second part is a sample information set. The sample information set can be divided into a training sample set, a test sample set, and a verification sample set. Here, the sample information set may be obtained by interface calls or the like in order to construct the recognition model; for all the companies in which a same-named person appears, any two companies are selected as a combination. If, in such a "company C1 - person P - company C2" triple, the same-named persons of the two companies are the same natural person, the triple is a positive example (label 1) of the prediction problem; otherwise, it is a negative example (label 0). Here, "company C1 - person P - company C2" indicates that, in the knowledge graph, the company node characterizing company C1 and the company node characterizing company C2 each have an association relationship with a person node characterizing person P. However, in the knowledge graph, the person node characterizing person P that is associated with the company node of company C1 and the person node characterizing person P that is associated with the company node of company C2 are not one and the same person node. That is, two person nodes are used in the knowledge graph to represent, respectively, the person P associated with the company node of company C1 and the person P associated with the company node of company C2.
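The construction of labeled "company C1 - person P - company C2" triples may be sketched as follows. The function name and the `company_to_person_id` mapping, which stands in for pre-annotated ground truth about which natural person each same-named occurrence refers to, are assumptions for illustration.

```python
from itertools import combinations

def build_pair_samples(name, company_to_person_id):
    """Pair up all companies in which `name` appears and label each pair.

    Label 1: both occurrences are the same natural person; label 0: they are not.
    """
    samples = []
    for c1, c2 in combinations(sorted(company_to_person_id), 2):
        same = company_to_person_id[c1] == company_to_person_id[c2]
        samples.append((c1, name, c2, 1 if same else 0))
    return samples

# Hypothetical ground truth: P in C1 and C2 is one person, P in C3 is another.
truth = {"C1": "natural_person_1", "C2": "natural_person_1", "C3": "natural_person_2"}
samples = build_pair_samples("P", truth)
```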
Thereafter, a metapath feature engineering step may be performed. Here, based on metapath graph feature engineering, the graph-structure problem can be abstracted and converted into a Euclidean-space problem. A metapath is a path, defined between objects of different types, that contains relationship information.
Specifically, graph feature engineering may be performed based on the following three kinds of metapath, whose features are input into the recognition model so as to abstract the graph-structure data into a Euclidean feature space: "company C1 - company C2", "company C1 - company C' - company C2", and "company C1 - person P' - company C2".
Here, company C1 and company C2 each contain a person P. In this application scenario, what needs to be determined is whether person P in company C1 and person P in company C2 are the same natural person. If they are the same natural person, person P in company C1 and person P in company C2 can be aggregated together in the knowledge graph (also called the enterprise relationship graph) and given the same aggregation-segmentation id (i.e., the cluster identifier described above). In addition, domain knowledge can be incorporated into the input data of the recognition model through feature engineering, so that the graph structure is converted into a Euclidean-space problem and can be learned better. Such an approach may be preferable where the data size is limited; where samples are more abundant, end-to-end learning based on Graph Neural Networks (GNNs) may be preferable.
By way of example, please refer to FIG. 3B, which is a schematic diagram of the metapath characterized by "company C1 - company C' - company C2". In FIG. 3B, companies 1 to 3 within the dashed box form the metapath characterized by "company C1 - company C' - company C2".
Further, "company C1 - company C2" can be used to process features of the relationship between company C1 and company C2 through the person, such as person P's time of participation in company C1 and company C2, the investment amount, and the branch relationship between company C1 and company C2; it can also be used to process features between company C1 and company C2 themselves, such as the Jaccard similarity of the word-segmented company names.
"company C1 - company C' - company C2" can be used to process and aggregate features of the multiple intermediate companies C', such as the sum of paid-in capital and the average number of shareholders of the multiple companies C'; it can also be used to process features of the relationship between the companies at the two ends (i.e., company C1 and company C2) and the intermediate company C', such as equity proportion information.
"company C1 - person P' - company C2" can be used to process and aggregate features of the intermediate persons P', such as the sum of the reciprocals of the degrees (i.e., the number of edges in the graph) of the person nodes; it can also be used to process features of the relationship between the companies at the two ends (i.e., company C1 and company C2) and the intermediate person P', such as the type of position held by the intermediate person P' in company C1 and company C2.
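Two of the features mentioned above, the Jaccard similarity of word-segmented company names and the sum of the reciprocals of person-node degrees, may be computed as sketched below. The function names, the tokenized example names, and the example degrees are assumptions for illustration.

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sum_reciprocal_degree(degrees):
    """Sum of 1/degree over intermediate person nodes (degree = number of edges)."""
    return sum(1.0 / d for d in degrees if d > 0)

# Hypothetical word-segmented company names.
name_1 = ["xxx", "technology", "ltd"]
name_2 = ["yyy", "technology", "ltd"]
similarity = jaccard_similarity(name_1, name_2)

# Hypothetical degrees of three intermediate person nodes P'.
reciprocal_sum = sum_reciprocal_degree([1, 2, 4])
```

Weighting by reciprocal degree gives less weight to intermediate persons who are connected to many companies.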
In this example, 345-dimensional features may be created, where only 4 typical features are chosen as shown in the following table:
Typical feature | Example value
Company 1 | XXX Science and Technology Ltd
Company 2 | YYY Technology Co Ltd
Person of the same name | First of all
Investment by Company 1 in Company 2 | 1 million yuan
Company name Jaccard similarity | 0.5
Number of companies associated with both companies | 3
Number of person names associated with both companies | 2
An ensemble tree model (e.g., LightGBM) may then be used for recognition, with SHAP used as the interpretation model to provide an interpretation of the model predictions.
Here, in terms of model selection, an ensemble tree model is selected. Compared with methods such as deep neural networks, an ensemble tree model can handle nonlinear problems more directly and appropriately, and LightGBM, which has better representation capability and performance, is selected here. In terms of interpretability, the prediction results of LightGBM are explained using SHAP (SHapley Additive exPlanations) to illustrate the main influencing factors; this method can eliminate the influence of multicollinearity on feature effects.
Here, the process and steps of modeling the recognition model mainly include: data preprocessing, Exploratory Data Analysis (EDA), pipeline model (pipeline) construction, feature selection, Bayesian optimization parameter adjustment and model evaluation. Wherein:
In terms of feature selection: on the basis of Recursive Feature Elimination (RFE), a method that removes a fixed ratio of features in each iteration, rather than a fixed number, is proposed, so that convergence is faster in the early stage and feature selection is finer in the later stage. Finally, 105-dimensional features are selected from the 345-dimensional features for the final model. The variation of the number of features and of the AUC (area under the ROC curve) index during the RFE iterations is shown in FIG. 3C.
In terms of model evaluation: model evaluation is performed at both the local and global levels, and the stability of the model in the time dimension is evaluated. Wherein:
The local level concerns whether the relationship prediction for "company C1 - person P - company C2" is correct, that is, whether the recognition result is correct. Specifically, the evaluation can be made using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, and the like. The area under the ROC curve (AUC) reached 0.9768, and the ROC curve is shown in FIG. 3D.
The global level evaluates whether the clustering result is accurate and whether it is comprehensive, mainly through the following two defined indices: global precision and global recall. The global precision represents the proportion of natural persons obtained from the clustering result that contain no erroneous relationship. The global recall represents the proportion of actual natural persons (determined based on the label information) for which all relationships are found. For model stability in the time dimension, the decay of model effectiveness is re-evaluated using data from later periods.
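One possible reading of the two global indices is sketched below: a predicted cluster "contains no erroneous relationship" if all of its members belong to a single actual natural person, and an actual natural person has "all relationships found" if all of that person's occurrences fall into a single predicted cluster. This interpretation, the function name, and the example clusters are assumptions and are not definitions given by the present disclosure.

```python
def global_precision_recall(predicted_clusters, true_clusters):
    """Global precision and recall over clusters of company occurrences."""
    pred = [set(c) for c in predicted_clusters]
    true = [set(c) for c in true_clusters]
    # Predicted natural persons whose members all belong to one actual person.
    clean = sum(1 for p in pred if any(p <= t for t in true))
    # Actual natural persons whose occurrences all sit in one predicted cluster.
    complete = sum(1 for t in true if any(t <= p for p in pred))
    return clean / len(pred), complete / len(true)

# Hypothetical clustering of one same-named person across four companies.
predicted = [{"C1", "C2", "C3"}, {"C4"}]
actual = [{"C1", "C2"}, {"C3", "C4"}]
global_precision, global_recall = global_precision_recall(predicted, actual)
```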
Subsequently, a SHAP interpretation model can also be adopted to generate interpretation information of the recognition model for the recognition result.
In particular, the prediction of the LightGBM model can be explained with the SHAP method, which can exclude the effect of multicollinearity on feature effects, in order to show the main influencing factors. The interpretation information may reflect: factors with a positive influence on the model prediction, i.e., reasons why the relationship holds, where the longer a factor's bar, the higher its importance; and factors with a negative influence, i.e., reasons why the relationship does not hold.
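To illustrate how such interpretation information might be organized, the sketch below takes per-feature contribution values for one prediction (for example SHAP values, assumed to have been computed elsewhere) and separates them into positive and negative factors ranked by magnitude. The function name and the feature names in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Factor = Tuple[str, float]


def split_factors(
    contributions: Dict[str, float], top_k: int = 3
) -> Tuple[List[Factor], List[Factor]]:
    """Split per-feature contributions into positive and negative factors.

    Positive factors support "the relationship holds"; negative factors
    support "the relationship does not hold". Both lists are ordered from
    the largest to the smallest magnitude.
    """
    positive = sorted(
        ((name, v) for name, v in contributions.items() if v > 0),
        key=lambda kv: -kv[1],
    )
    negative = sorted(
        ((name, v) for name, v in contributions.items() if v < 0),
        key=lambda kv: kv[1],
    )
    return positive[:top_k], negative[:top_k]


# Hypothetical contribution values for a single "C1 - P - C2" prediction.
example = {
    "shared_shareholders": 1.2,
    "same_registered_city": 0.4,
    "industry_mismatch": -0.7,
    "name_frequency": -0.1,
}
reasons_for, reasons_against = split_factors(example)
```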
Then, aggregation and segmentation of enterprise personnel is performed to obtain a clustering result.
In particular, aggregation and segmentation may be performed on the connected components formed by the relationships that pass a threshold, and a confidence model may be trained to provide several confidence measures.
After the threshold is selected, records of the same-named person in different companies that belong to the same natural person are aggregated by computing connected components, and different connected components are separated and assigned different aggregate split ids. For example, suppose a person with the same name P appears in four companies C1, C2, C3 and C4, and the prediction results are as shown in the following table:
Company   Company   Prediction
C1        C2        1
C1        C3        1
C1        C4        0
C2        C3        0
C2        C4        0
C3        C4        0
Then the persons in C1, C2 and C3 should be assigned the same aggregate split id, while the person in C4 is assigned a separate aggregate split id. As an example, the relationship among C1, C2, C3 and C4 may be as shown in fig. 3E.
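The aggregation by connected components can be sketched with a small union-find structure. The function name `assign_split_ids` and the integer id scheme are assumptions of this example; the input reproduces the table above, where a prediction of 1 means the two records are taken to be the same natural person.

```python
from typing import Dict, Sequence, Tuple


def assign_split_ids(
    companies: Sequence[str],
    predictions: Dict[Tuple[str, str], int],
) -> Dict[str, int]:
    """Assign one aggregate split id per connected component."""
    parent = {c: c for c in companies}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), same in predictions.items():
        if same == 1:
            parent[find(a)] = find(b)  # merge the two components

    ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for c in companies:
        root = find(c)
        ids.setdefault(root, len(ids))
        result[c] = ids[root]
    return result


table = {
    ("C1", "C2"): 1, ("C1", "C3"): 1, ("C1", "C4"): 0,
    ("C2", "C3"): 0, ("C2", "C4"): 0, ("C3", "C4"): 0,
}
split_ids = assign_split_ids(["C1", "C2", "C3", "C4"], table)
# split_ids == {"C1": 0, "C2": 0, "C3": 0, "C4": 1}
```

Note that C2 and C3 end up in the same component even though their direct prediction is 0, because both are connected to C1.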
Next, confidence modeling may be performed.
In particular, the model is evaluated at the local and global levels, and its stability in the time dimension is also evaluated. The local level concerns whether the prediction for a relationship "company C1 - person P - company C2" is correct, while the global level concerns whether the aggregation and segmentation result is accurate and complete.
Three confidence measures, "accuracy", "comprehensiveness" and "overall accuracy", are defined to assess the aggregation and segmentation result respectively: whether it contains no erroneous relationship; whether it contains all relationships; and a combined measure of both. Confidence scores can be estimated better by training sub-models.
Here, after obtaining the recognition result via the recognition model described above, a plurality of person identifiers representing the same person in the knowledge graph may be labeled with the same cluster identifier. In this way, for each cluster identity, a set of predicted companies for that cluster identity (hereinafter referred to as a set of predicted companies), and a set of real companies for that cluster identity (hereinafter referred to as a set of real companies) may be obtained. Wherein, the set of real companies can be obtained based on the labels in the sample information set.
For example, the following formulas can be used to calculate the three confidence measures, accuracy P, comprehensiveness R and overall accuracy A:
P=TP÷(TP+FP)
R=TP÷(TP+FN)
A=TP÷(TP+FP+FN)
where TP is the number of companies in the intersection of the predicted company set and the real company set, i.e., the number of correct predictions; FP is the number of companies in the difference set of the predicted company set minus the real company set, i.e., the number of erroneous predictions; FN is the number of companies in the difference set of the real company set minus the predicted company set, i.e., the number of omitted predictions.
It should be noted that the real natural person of an aggregate split id is first determined as the real natural person whose company set has the largest intersection with the companies included in that aggregate split id; if two or more real natural persons have equally large intersections, the one whose company set has the smallest difference set outside the intersection is taken.
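The three formulas and the matching rule can be worked through in a short sketch. The function name `confidence` is an assumption, and the tie-break is interpreted here as "fewest real companies outside the predicted set", which is one reading of the smallest difference set; both sets are assumed to be non-empty.

```python
from typing import Dict, Iterable, Set


def confidence(
    predicted: Set[str], real_persons: Iterable[Set[str]]
) -> Dict[str, float]:
    """Compute P, R and A for one aggregate split id."""
    predicted = set(predicted)
    candidates = [set(r) for r in real_persons]
    # Largest intersection first; on ties, fewest companies outside it
    # (assumed interpretation of the tie-break rule).
    real = max(
        candidates,
        key=lambda r: (len(predicted & r), -len(r - predicted)),
    )
    tp = len(predicted & real)  # correctly predicted companies
    fp = len(predicted - real)  # predicted but not real
    fn = len(real - predicted)  # real but not predicted
    return {
        "P": tp / (tp + fp),
        "R": tp / (tp + fn),
        "A": tp / (tp + fp + fn),
    }


scores = confidence(
    {"C1", "C2", "C3"},
    [{"C1", "C2", "C4"}, {"C1", "C2"}, {"C5"}],
)
```

In this usage example the first two real persons both share two companies with the predicted set, so the tie-break selects {C1, C2}, giving TP = 2, FP = 1 and FN = 0.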
Illustratively, three calculation examples may be given by fig. 3F, where the solid boxes represent the predicted company set and the dashed boxes represent the real company set.
Furthermore, these three confidences can be learned and predicted by means of sub-models. For example, the smaller the average prediction probability within a group, the lower the accuracy confidence, i.e., there may be errors; the greater the maximum prediction probability between groups, the lower the comprehensiveness confidence, i.e., there may be omissions. The feature engineering and model selection of the sub-models are not described in detail here; they can be readily implemented by following the above ideas.
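The two signals mentioned above could be derived as input features for such a sub-model roughly as follows. This is only a sketch: the function name, the feature names `mean_intra` and `max_inter`, the default values for missing pairs and single-company groups, and the probabilities in the usage example are all assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def group_features(
    group: Set[str],
    others: Set[str],
    prob: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Average in-group and maximum cross-group prediction probability."""

    def p(a: str, b: str) -> float:
        # Pair probabilities are stored once per unordered pair.
        return prob.get((a, b), prob.get((b, a), 0.0))

    intra = [p(a, b) for a, b in combinations(sorted(group), 2)]
    inter = [p(a, b) for a in group for b in others]
    return {
        # A single-company group has no internal pairs; 1.0 is an assumed default.
        "mean_intra": sum(intra) / len(intra) if intra else 1.0,
        "max_inter": max(inter, default=0.0),
    }


pair_prob = {
    ("C1", "C2"): 0.9, ("C1", "C3"): 0.8, ("C2", "C3"): 0.4,
    ("C1", "C4"): 0.3, ("C2", "C4"): 0.1, ("C3", "C4"): 0.2,
}
feats = group_features({"C1", "C2", "C3"}, {"C4"}, pair_prob)
```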
Finally, the model may be deployed online, and model monitoring and iteration may be performed.
Specifically, a JPMML model can be used to perform offline prediction on new samples with distributed model deployment. In addition, after the model goes online, the verification data needs to be updated regularly, the decay of the model evaluation indexes needs to be monitored, and the model is iteratively updated if necessary.
In this application scenario, the accuracy of identifying persons across companies can be improved, the results can be interpreted, and the person identification problem can be handled at different confidence levels; relationships in which same-named persons in different companies are in fact the same person are supplemented by using partially available, relatively accurate natural person identity information. Meanwhile, information such as enterprise node attributes, graph structure association relationships and interaction features is taken into account. The selected model has nonlinear fitting capability and a degree of interpretability. The result processing provides both an explanation and a confidence for each prediction, which facilitates differentiated handling.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a device for identifying a person, where the embodiment of the device corresponds to the embodiments of the methods shown in fig. 1, fig. 2, and fig. 3A to fig. 3F, and the embodiment of the device may include the same or corresponding features as the embodiments of the methods shown in fig. 1, fig. 2, and fig. 3A to fig. 3F, in addition to the features described below, and produce the same or corresponding effects as the embodiments of the methods shown in fig. 1, fig. 2, and fig. 3A to fig. 3F. The device can be applied to various electronic equipment.
As shown in fig. 4, the apparatus 400 for identifying the identity of a person in this embodiment includes: a first determining unit 401, a second determining unit 402, and a first input unit 403. The first determining unit 401 is configured to determine at least two identical target person identifiers from a predetermined person identifier set, where the at least two target person identifiers have different association relationship information; the second determining unit 402 is configured to determine group organization information corresponding to the at least two target person identifiers respectively from the association relationship information that the at least two target person identifiers have, where the group organization information is information of a group organization in which a person indicated by the person identifier is located; the first input unit 403 is configured to input the determined group organization information to a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model, wherein the recognition result represents whether the persons indicated by the at least two target person identifiers are the same person.
In this embodiment, the first determination unit 401 of the apparatus 400 for identifying a person identity may determine at least two identical target person identities from a predetermined set of person identities. The at least two target person identifications have different association relation information.
In this embodiment, the second determining unit 402 may determine group organization information corresponding to the at least two target person identifiers from the association relationship information of the at least two target person identifiers, respectively. Wherein the group organization information is information of a group organization in which the person indicated by the person identifier is located.
In this embodiment, the first input unit 403 may input the determined group organization information to a recognition model that is constructed in advance based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model. The recognition result represents whether the persons indicated by the at least two target person identifiers are the same person.
In some optional implementations of the present embodiment, the first input unit 403 includes:
a first selecting subunit (not shown in the figure) configured to select, in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first number threshold, a preset second number threshold of group organization information from the determined group organization information;
a first input subunit (not shown in the figure) configured to input the selected group organization information to a recognition model that is constructed in advance based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model.
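The selection performed by the first selecting subunit can be illustrated as follows. The disclosure does not state here how the group organization information is chosen, so random sampling with a fixed seed is an assumption of this sketch, as are the function and parameter names.

```python
import random
from typing import List, Sequence


def cap_organizations(
    orgs: Sequence[str],
    first_threshold: int,
    second_threshold: int,
    seed: int = 0,
) -> List[str]:
    """Keep at most `second_threshold` organizations for very common names.

    If fewer than `first_threshold` organizations are present, all of them
    are kept unchanged.
    """
    if len(orgs) < first_threshold:
        return list(orgs)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return rng.sample(list(orgs), min(second_threshold, len(orgs)))
```

Capping the input in this way bounds the number of company pairs that have to be scored for a frequently occurring name.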
In some optional implementations of the present embodiment, the first input unit 403 includes:
a first determining subunit (not shown in the figure), configured to determine, in response to that the number of the group organizations indicated by the determined group organization information is greater than or equal to a preset first number threshold, an area where the group organizations indicated by the determined group organization information are located, resulting in an area set;
a second selecting subunit (not shown in the figure), configured to select, for an area in the area set, at least one group organization information from the group organization information of the group organization located in the area;
a second input subunit (not shown in the figure) configured to input the selected group organization information to a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model.
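The region-based selection can be sketched in a similar way. The input format (pairs of organization id and region), the choice of the first organization encountered in each region, and the names used are assumptions of this example.

```python
from typing import Dict, List, Sequence, Tuple


def select_per_region(
    orgs: Sequence[Tuple[str, str]],
    first_threshold: int,
    per_region: int = 1,
) -> List[str]:
    """Select at least one organization from every region.

    `orgs` holds (organization id, region) pairs. Below the threshold,
    all organizations are kept.
    """
    if len(orgs) < first_threshold:
        return [org_id for org_id, _ in orgs]
    by_region: Dict[str, List[str]] = {}
    for org_id, region in orgs:
        by_region.setdefault(region, []).append(org_id)
    selected: List[str] = []
    for members in by_region.values():
        selected.extend(members[:per_region])
    return selected


sample = [
    ("C1", "Beijing"), ("C2", "Beijing"), ("C3", "Shanghai"),
    ("C4", "Chengdu"), ("C5", "Shanghai"),
]
chosen = select_per_region(sample, first_threshold=4)
```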
In some optional implementations of the present embodiment, the first input unit 403 includes:
a third input subunit (not shown in the figure) configured to input the determined group organization information to a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), determine a Euclidean distance between the group organization information corresponding to the at least two target person identifiers via the recognition model, and generate a recognition result based on the Euclidean distance.
In some optional implementations of this embodiment, the apparatus 400 further includes:
a second input unit (not shown in the figure) configured to input the recognition model and the determined group organization information to a pre-trained interpretation model, and generate interpretation information of the recognition model for the recognition result via the interpretation model.
In some optional implementations of this embodiment, the apparatus 400 further includes:
an obtaining unit (not shown in the figure) configured to obtain a sample information set, where the sample information in the sample information set includes group organization information and pre-labeled tag information corresponding to the group organization information, and the tag information represents whether two personnel identifications corresponding to the group organization information indicate the same personnel;
a third determining unit (not shown in the figure) configured to determine a training sample set from the sample information set;
and a training unit (not shown in the figure) configured to train the group organization information included in the training samples in the training sample set as input data, and the label information corresponding to the input data as expected output data to obtain the recognition model.
In some optional implementations of this embodiment, the apparatus 400 further includes:
a fourth determining unit (not shown in the figure) configured to determine a verification sample set from the sample information sets;
a first calculating unit (not shown in the figure) configured to calculate, based on the verification sample set, at least one of the area under the receiver operating characteristic (ROC) curve, the accuracy and the recall of the recognition model.
In some optional implementation manners of this embodiment, the association relationship information is represented by nodes or edges in a predetermined knowledge graph, and the person identifier is represented by the nodes in the knowledge graph; and
the first determination unit 401 includes:
a second determining subunit (not shown in the figure) configured to determine at least two identical target person identifications from the set of person identifications characterized by the nodes in the knowledge-graph.
In some optional implementations of this embodiment, the apparatus 400 further includes:
a marking unit (not shown in the figure) configured to mark a plurality of target person identifiers representing the same person in the knowledge graph by using the same cluster identifier based on the recognition result;
and a second calculating unit (not shown in the figure) configured to calculate, based on each cluster identifier with which target person identifiers in the knowledge graph are marked, at least one of the accuracy, the comprehensiveness and the overall accuracy of the recognition result for that cluster identifier.
In some optional implementations of this embodiment, the group organization information includes company information; and
the second determining unit 402 includes:
a third determining subunit (not shown in the figure), configured to determine company information corresponding to the at least two target person identifiers respectively from the association relationship information that the at least two target person identifiers have; and
the first input unit 403 includes:
a third input subunit (not shown in the figure) configured to input at least one of the following to a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generate a recognition result via the recognition model:
a relationship feature between the determined at least two pieces of company information;
company information having an association relationship with the determined at least two pieces of company information;
and person information having an association relationship with the determined at least two pieces of company information.
In some optional implementations of the present embodiment, the dimension of the feature data used for training the recognition model is greater than or equal to 100.
In the apparatus 400 for identifying the identity of a person provided in the above embodiments of the present disclosure, the first determining unit 401 may determine at least two identical target person identifiers from a predetermined person identifier set, wherein the at least two target person identifiers have different association relationship information. The second determining unit 402 may then determine group organization information corresponding to the at least two target person identifiers respectively from the association relationship information of the at least two target person identifiers, wherein the group organization information is information of a group organization in which the person indicated by the person identifier is located. Finally, the first input unit 403 may input the determined group organization information to a recognition model pre-constructed based on a light gradient boosting machine (LightGBM) and generate a recognition result via the recognition model, wherein the recognition result represents whether the persons indicated by the at least two target person identifiers are the same person. Therefore, in the embodiments of the disclosure, whether the persons indicated by the target person identifiers are the same person can be identified by a recognition model pre-constructed based on a light gradient boosting machine; choosing the light gradient boosting machine improves representation capability and performance, and thereby the accuracy of identity identification.
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 5. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them, which stand-alone device may communicate with the first device and the second device to receive the acquired input signals therefrom.
FIG. 5 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 5, the electronic device 5 includes one or more processors 501 and memory 502.
The processor 501 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
Memory 502 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 501 to implement the method for identifying a person identity and/or other desired functions of the various embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device may further include: an input device 503 and an output device 504, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is a first device or a second device, the input device 503 may be the microphone or the microphone array described above for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input means 503 may be a communication network connector for receiving the acquired input signals from the first device and the second device.
The input device 503 may also include, for example, a keyboard, a mouse, and the like. The output device 504 may output various information to the outside, including the determined distance information, direction information, and the like. The output devices 504 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 5, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of identification of a person's identity according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
Program code for carrying out operations of embodiments of the present disclosure may be written for the computer program product in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (13)

1. A method for identifying a person, the method comprising:
determining at least two identical target person identifications from a predetermined person identification set, wherein the at least two target person identifications have different association relationship information;
determining group organization information respectively corresponding to the at least two target person identifications from the association relationship information of the at least two target person identifications, wherein the group organization information is information of a group organization in which the person indicated by the person identification is located;
and inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result through the recognition model, wherein the recognition result represents whether the persons indicated by the at least two target person identifications are the same person.
2. The method of claim 1, wherein the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result through the recognition model, comprises:
in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first number threshold, selecting a preset second number threshold of pieces of group organization information from the determined group organization information;
and inputting the selected group organization information into the recognition model pre-constructed based on the light gradient boosting machine, and generating a recognition result through the recognition model.
3. The method of claim 1, wherein the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result through the recognition model, comprises:
in response to the number of group organizations indicated by the determined group organization information being greater than or equal to a preset first number threshold, determining the areas where the group organizations indicated by the determined group organization information are located, to obtain an area set;
for each area in the area set, selecting at least one piece of group organization information from the group organization information of the group organizations located in the area;
and inputting the selected group organization information into the recognition model pre-constructed based on the light gradient boosting machine, and generating a recognition result through the recognition model.
4. The method according to one of claims 1 to 3, wherein the inputting the determined group organization information into a recognition model pre-constructed based on a light gradient boosting machine (LightGBM), and generating a recognition result through the recognition model, comprises:
inputting the determined group organization information into the recognition model pre-constructed based on the light gradient boosting machine, determining a Euclidean distance between the group organization information corresponding to the at least two target person identifications through the recognition model, and generating a recognition result based on the Euclidean distance.
5. The method according to one of claims 1 to 4, characterized in that the method further comprises:
inputting the recognition model and the determined group organization information to a pre-trained interpretation model, and generating interpretation information of the recognition model for the recognition result via the interpretation model.
6. The method according to one of claims 1 to 5, characterized in that the method further comprises:
acquiring a sample information set, wherein the sample information in the sample information set comprises group organization information and pre-labeled label information corresponding to the group organization information, and the label information represents whether two personnel identifications corresponding to the group organization information indicate the same personnel;
determining a training sample set from the sample information set;
and taking the group organization information included in the training samples in the training sample set as input data, taking the label information corresponding to the input data as expected output data, and training to obtain the recognition model.
7. The method of claim 6, further comprising:
determining a verification sample set from the sample information set;
and calculating, based on the verification sample set, at least one of the area under the receiver operating characteristic (ROC) curve, the accuracy, the precision and the recall of the recognition model.
8. The method according to one of claims 1 to 7, characterized in that the association relationship information is characterized by nodes or edges in a predetermined knowledge graph, and the person identification is characterized by nodes in the knowledge graph; and
the determining at least two identical target person identifications from a predetermined set of person identifications includes:
determining at least two identical target person identifications from a set of person identifications characterized by nodes in the knowledge-graph.
9. The method of claim 8, further comprising:
marking a plurality of target personnel identifications representing the same personnel in the knowledge graph by adopting the same clustering identification based on the identification result;
and calculating, based on each cluster identification with which target person identifications in the knowledge graph are marked, at least one of the accuracy, the comprehensiveness and the overall accuracy of the recognition result for that cluster identification.
10. The method of any one of claims 1-9, wherein the group organization information includes company information; and
the determining group organization information respectively corresponding to the at least two target person identifications from the association relationship information of the at least two target person identifications respectively comprises:
determining company information corresponding to the at least two target personnel identifications respectively from the incidence relation information of the at least two target personnel identifications respectively; and
the step of inputting the determined group organization information into a pre-constructed identification model based on the lightweight gradient elevator, and generating an identification result through the identification model comprises the following steps:
inputting at least one of the following items into a pre-constructed identification model based on the lightweight gradient elevator, and generating an identification result through the identification model:
a relationship characteristic between the determined at least two corporate information;
company information having an association relationship with the determined at least two company information;
and the person information has an association relation with the determined at least two company information.
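The three kinds of model input listed in claim 10 can be illustrated with a small feature-building sketch. Everything here is an assumption made for illustration: the claims do not specify concrete features, so set overlap (Jaccard similarity) stands in for the "relationship feature", and the function names, lookup tables, and identifiers are hypothetical. The resulting vector is the sort of input that could then be passed to a gradient-boosting model.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(companies, table):
    """Collect everything linked to any of the given companies."""
    out = set()
    for company in companies:
        out |= table.get(company, set())
    return out


def pair_features(companies_a, companies_b, company_links, company_people):
    """Build a feature vector for two person identifications.

    companies_a / companies_b: companies tied to each identification.
    company_links: company -> associated companies (assumed lookup table).
    company_people: company -> associated persons (assumed lookup table).
    """
    linked_a = neighbours(companies_a, company_links)
    linked_b = neighbours(companies_b, company_links)
    people_a = neighbours(companies_a, company_people)
    people_b = neighbours(companies_b, company_people)
    return [
        float(len(companies_a & companies_b)),  # shared companies
        jaccard(companies_a, companies_b),      # relationship between companies
        jaccard(linked_a, linked_b),            # overlap of associated companies
        jaccard(people_a, people_b),            # overlap of associated persons
    ]


# Hypothetical data for two identically named person identifications.
company_links = {"c1": {"c4"}, "c2": {"c4", "c5"}, "c3": {"c5"}}
company_people = {"c1": {"u1"}, "c2": {"u2"}, "c3": {"u2", "u3"}}
features = pair_features({"c1", "c2"}, {"c2", "c3"}, company_links, company_people)
```

A production feature set would be much wider than these four values; the sketch only shows how the three categories of company-related information could each contribute columns.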
11. The method according to one of claims 1 to 10, characterized in that the dimension of the feature data used for training the recognition model is greater than or equal to 100.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the method of any one of the preceding claims 1 to 11.
13. A computer program product having a computer program stored thereon, characterized in that the computer program, when executed by a processor, carries out the method of any one of the preceding claims 1 to 11.
CN202111161578.1A 2021-09-30 2021-09-30 Method for identifying person identity, storage medium and computer program product Pending CN113886779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111161578.1A CN113886779A (en) 2021-09-30 2021-09-30 Method for identifying person identity, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111161578.1A CN113886779A (en) 2021-09-30 2021-09-30 Method for identifying person identity, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN113886779A true CN113886779A (en) 2022-01-04

Family

ID=79004898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111161578.1A Pending CN113886779A (en) 2021-09-30 2021-09-30 Method for identifying person identity, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN113886779A (en)

Similar Documents

Publication Publication Date Title
US20220179620A1 (en) System and method for enriching datasets while learning
US20170109657A1 (en) Machine Learning-Based Model for Identifying Executions of a Business Process
US9195910B2 (en) System and method for classification with effective use of manual data input and crowdsourcing
CN109670267B (en) Data processing method and device
US20170109676A1 (en) Generation of Candidate Sequences Using Links Between Nonconsecutively Performed Steps of a Business Process
CN111222976B (en) Risk prediction method and device based on network map data of two parties and electronic equipment
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
US20170109639A1 (en) General Model for Linking Between Nonconsecutively Performed Steps in Business Processes
US20170109638A1 (en) Ensemble-Based Identification of Executions of a Business Process
CN113554175B (en) Knowledge graph construction method and device, readable storage medium and terminal equipment
CN112818162A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
US20170109640A1 (en) Generation of Candidate Sequences Using Crowd-Based Seeds of Commonly-Performed Steps of a Business Process
CN113656699B (en) User feature vector determining method, related equipment and medium
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
US20170109637A1 (en) Crowd-Based Model for Identifying Nonconsecutive Executions of a Business Process
US20170109670A1 (en) Crowd-Based Patterns for Identifying Executions of Business Processes
US20210124748A1 (en) System and a method for resource data classification and management
CN112070559A (en) State acquisition method and device, electronic equipment and storage medium
CN114491084B (en) Self-encoder-based relation network information mining method, device and equipment
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN113886779A (en) Method for identifying person identity, storage medium and computer program product
CN112989217B (en) System for managing human veins
CN111400413B (en) Method and system for determining category of knowledge points in knowledge base
CN113468604A (en) Big data privacy information analysis method and system based on artificial intelligence
CN109919811B (en) Insurance agent culture scheme generation method based on big data and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination