CN111611531A - Personnel relationship analysis method and device and electronic equipment - Google Patents

Personnel relationship analysis method and device and electronic equipment

Info

Publication number
CN111611531A
Authority
CN
China
Prior art keywords
node
probability
window
vector
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010430279.2A
Other languages
Chinese (zh)
Other versions
CN111611531B (en
Inventor
陆韵
李冰
沈俊青
孙云
江易
舒塘皓
郑申俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Chinaoly Technology Co ltd
Original Assignee
Hangzhou Chinaoly Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Chinaoly Technology Co ltd
Priority to CN202010430279.2A
Publication of CN111611531A
Application granted
Publication of CN111611531B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a personnel relationship analysis method, a device and electronic equipment, relating to the technical field of relationship analysis. The method comprises: obtaining behavior information between people and calculating edge weights from the behavior information; using the edge weights to determine the selection probability of different paths given the previous node, and randomly sampling to obtain a node sequence; for the node sequence, using a log-likelihood function to maximize the probability that the node sequence occurs, thereby obtaining the optimal embedded vectors; clustering the vector model with a density clustering algorithm to obtain the embedded vectors belonging to the same cluster; establishing a bounding circle for each cluster based on the embedded vectors and calculating its radius value; and calculating the personnel affinity value from the radius value and the edge weight. The invention effectively improves the reliability of the calculated personnel affinity value.

Description

Personnel relationship analysis method and device and electronic equipment
Technical Field
The present invention relates to the field of relational analysis technologies, and in particular, to a method and an apparatus for analyzing a person relationship, and an electronic device.
Background
In the prior art, the affinity between persons is usually calculated with a graph algorithm or a graph mining technique, typically by taking a weighted sum of the number of times certain behaviors occur between two people. However, communities differ in activity level, so the same behavior occurring the same number of times can carry different meaning in different communities: a relationship in an inactive community should have a higher affinity than one in an active community, and plain weighted summation therefore gives inaccurate results.
Disclosure of Invention
The invention aims to provide a personnel relationship analysis method, a personnel relationship analysis device and electronic equipment, which can effectively improve the reliability of calculating the affinity value of a person.
In a first aspect, the present invention provides a method for analyzing a person relationship, including:
acquiring behavior information between people, and calculating an edge weight according to the behavior information;
using the edge weight to determine the selection probability of different paths given the previous node, randomly sampling to obtain a node sequence, and, for the node sequence, using a log-likelihood function to maximize the probability that the node sequence occurs, thereby obtaining an optimal embedded vector;
clustering the vector model based on a density clustering algorithm to obtain embedded vectors of the same cluster in the vector model;
establishing a bounding circle of the same cluster based on the embedding vector, and calculating a radius value of the bounding circle;
and calculating to obtain a personnel affinity value according to the radius value and the edge weight.
Further, the step of calculating an edge weight according to the behavior information includes:
calculating the edge weight according to the following equation:
Figure BDA0002500280270000021
where the quantity shown in Figure BDA0002500280270000022 is the frequency of a behavior between two persons, $k_r$ is the behavior weight, and $r$ is the behavior category.
Further, the step of determining, by the edge weight, a selection probability of different paths under the condition that the previous node is determined, obtaining a node sequence by random sampling, and maximizing a probability of occurrence of the node sequence by using a log-likelihood function for the node sequence, thereby obtaining an optimal embedding vector, includes:
establishing a personnel relationship graph according to the edge weight, and defining the personnel relationship graph as $G(V, E)$, where $V$ is the node set and $v_i \in V$ represents the $i$-th person in the graph, and $E$ is the edge set and $e(v_i, v_j) \in E$ represents the edge between nodes $v_i$ and $v_j$ in the graph, whose actual meaning is the relationship between the $i$-th person and the $j$-th person;
taking a node $s_1$ in the node set as the starting point, calculating the sampling probability from the edge weights, and performing a random walk to generate a node sequence, denoted $S = \{s_1, s_2, \ldots, s_n\}$, $s_i \in V$, using the probability formula:
Figure BDA0002500280270000023
where $s_t$ and $s_{t-1}$ denote the nodes at the current time and the previous time, respectively; the formula gives the probability that the current node is $u$ given that the node at the previous time is $v$;
traversing the sequence $S$ with a window of length $2w+1$, each iteration yielding a window $W(i)$ centered on a node $s_i$, where the center node $s_i \in S$ and the window $W(i) = S[i-w : i+w]$;
Establishing a probability formula based on the window W (i), wherein the probability formula is calculated by the following formula:
Figure BDA0002500280270000031
where $s_i$ is the window center node, $s_k$ is any node in window $W(i)$ other than $s_i$, and $P(s_k \mid s_i)$ is a softmax function representing the probability that $s_i$ and $s_k$ appear in the same window, calculated by the following formula:
Figure BDA0002500280270000032
where $v_j$ denotes any node in $V$, and $f(s_i)$ denotes the embedded vector of the input node $s_i$.
Further, the step of determining, by the edge weight, a selection probability of different paths under the condition that the previous node is determined, obtaining a node sequence by random sampling, and maximizing a probability of occurrence of the node sequence by using a log-likelihood function for the node sequence, thereby obtaining an optimal embedding vector, further includes:
using a log-likelihood function as an objective function for the probability formula, and summing to obtain a total objective function value of all windows in the sequence so as to maximize the probability of occurrence of the node sequence, wherein the calculation formula of the total objective function value is as follows:
Figure BDA0002500280270000033
where $P(W(i) \mid s_i)$ is the probability function within a single window, and $S$ is the window node sequence;
converting the mapping node of the total objective function value into a mapping vector;
and learning the mapping vector to obtain the vector model.
Further, the step of creating a bounding circle of the same cluster based on the embedding vector includes:
clustering the embedding result by using a DBSCAN method, and dividing all nodes into a plurality of clusters;
randomly shuffling all embedded points in each cluster, selecting the first two points, and constructing a minimum enclosing circle with the midpoint of the two points as the circle center and half of the Euclidean distance between them as the radius;
sampling the remaining points without replacement, and calculating the distance from each sampled point to the circle center;
judging whether the sampled point is inside the circle: if so, continuing to sample; if not, reconstructing the minimum enclosing circle with the line connecting the new sampled point and the point farthest from it as the diameter;
after repeated iteration, when the number of consecutive sampled points falling inside the circle exceeds a preset number N, the minimum enclosing circle of the cluster is considered to have been obtained.
Further, the step of calculating the personal affinity value according to the radius value and the edge weight includes:
updating the relationship affinity values among all the persons according to the following formula:
Figure BDA0002500280270000041
where $A_{i,j}$ is the finally calculated personnel affinity value, $m_{i,j}$ is the edge weight, and $R_l$ is the radius of the minimum enclosing circle of the $l$-th cluster.
In a second aspect, the present invention provides a personnel relationship analysis apparatus, including: an obtaining unit, configured to obtain behavior information between people and calculate an edge weight according to the behavior information;
an embedded vector unit, configured to use the edge weight to determine the selection probability of different paths given the previous node, randomly sample to obtain a node sequence, and, for the node sequence, use a log-likelihood function to maximize the probability that the node sequence occurs, thereby obtaining the optimal embedded vector;
a clustering unit, configured to cluster the vector model based on a density clustering algorithm to obtain embedded vectors of the same cluster in the vector model;
a bounding circle establishing unit, configured to establish a bounding circle of the same cluster based on the embedded vectors, and calculate a radius value of the bounding circle;
and an affinity value calculating unit, configured to calculate the personnel affinity value according to the radius value and the edge weight.
Further, the embedded vector unit is further configured to:
establishing a personnel relationship graph according to the edge weight, and defining the personnel relationship graph as $G(V, E)$, where $V$ is the node set and $v_i \in V$ represents the $i$-th person in the graph, and $E$ is the edge set and $e(v_i, v_j) \in E$ represents the edge between nodes $v_i$ and $v_j$ in the graph, whose actual meaning is the relationship between the $i$-th person and the $j$-th person;
taking a node $v_i$ in the node set as the starting point, calculating the sampling probability from the edge weights, and performing a random walk to generate a node sequence, denoted
Figure BDA0002500280270000051
The probability formula is adopted as follows:
Figure BDA0002500280270000052
where $s_t$ and $s_{t-1}$ denote the nodes at the current time and the previous time, respectively; the formula gives the probability that the current node is $u$ given that the node at the previous time is $v$;
traversing the sequence $S$ with a window of length $2w+1$, each iteration yielding a window $W(i)$ centered on a node $s_i$, where the center node $s_i \in S$ and the window $W(i) = S[i-w : i+w]$;
Establishing a probability formula based on the window W (i), wherein the probability formula is calculated by the following formula:
Figure BDA0002500280270000053
where $s_i$ is the window center node, $u_k$ is any node in window $W(i)$ other than $s_i$, and $P(u_k \mid s_i)$ is a softmax function representing the probability that $s_i$ and $u_k$ appear in the same window, calculated by the following formula:
Figure BDA0002500280270000054
where $v_j$ denotes any node in $V$, and the function $f$ gives the embedded vector of the input node.
Using a log-likelihood function as an objective function for the probability formula, and summing to obtain a total objective function value of all windows in the sequence so as to maximize the probability of occurrence of the node sequence, wherein the calculation formula of the total objective function value is as follows:
Figure BDA0002500280270000055
where $P(W(i) \mid s_i)$ is the probability function within a single window, and $S$ is the window node sequence;
converting the mapping node of the total objective function value into a mapping vector;
and learning the mapping vector to obtain the vector model.
In a third aspect, the present invention provides an electronic device, comprising a processor and a memory, wherein the memory stores computer-executable instructions capable of being executed by the processor, and the processor executes the computer-executable instructions to implement the steps of the person relationship analysis method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the steps of the method for human relationship analysis according to the first aspect.
The embodiment of the invention has the following beneficial effects:
The invention provides a personnel relationship analysis method, a personnel relationship analysis device and electronic equipment. Behavior information between people is acquired and edge weights are calculated from it; an objective function is established from the edge weights and trained to obtain a vector model; the vector model is clustered with a density clustering algorithm to obtain the embedded vectors of each cluster; a bounding circle enclosing the embedded points of each cluster is established and its radius value is calculated; finally, the personnel affinity value is calculated from the radius value and the edge weight. Because the affinity value depends on both the edge weight and the radius of the bounding circle of the cluster, the method avoids the inaccuracy of plain weighted summation, which ignores that relationships in an inactive community should have a higher affinity than those in an active community, and thereby effectively improves the reliability of the calculated personnel affinity value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a personnel relationship analysis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of establishing an objective function according to edge weights according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a personnel relationship analysis apparatus according to a second embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 301-obtaining unit; 302-embedded vector unit; 303-clustering unit; 304-bounding circle establishing unit; 305-affinity value calculation unit; 400-processor; 401-memory; 402-bus; 403-communication interface.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, communities differ in activity level, so the same behavior occurring the same number of times can carry different meaning in different communities: a relationship in an inactive community should have a higher affinity than one in an active community, and weighted summation therefore gives inaccurate results. To address this problem, the invention provides a personnel relationship analysis method, a personnel relationship analysis device and electronic equipment.
To facilitate understanding of the embodiment, a detailed description will be given of a method for analyzing a person relationship disclosed in the embodiment of the present invention.
Embodiment one:
Referring to the flowchart of a personnel relationship analysis method shown in Fig. 1, the method may be executed by an electronic device such as a computer or a processor, and mainly includes steps S101 to S105:
Step S101, acquiring behavior information between people, and calculating an edge weight according to the behavior information.
In one particular embodiment, the behavior information may include behavior data such as train, same flight, same internet, stay, same temporary, same guard, etc.
Step S102, using the edge weight to determine the selection probability of different paths given the previous node, and randomly sampling to obtain a node sequence; for the node sequence, using a log-likelihood function to maximize the probability that the node sequence occurs, thereby obtaining the optimal embedded vector.
Step S103, clustering the vector model based on a density clustering algorithm to obtain embedded vectors of the same cluster in the vector model.
Step S104, establishing a bounding circle of the same cluster based on the embedded vectors, and calculating the radius value of the bounding circle.
In a specific embodiment, when a bounding circle is established, the embedded points of each cluster are first randomly shuffled; the midpoint of the first two embedded points is taken as the circle center and half of the Euclidean distance between them as the radius. The bounding circle established at this point is the initial minimum enclosing circle.
Step S105, calculating the personnel affinity value according to the radius value and the edge weight.
In the manner provided by this embodiment, the edge weight is calculated after behavior information between people is obtained; an objective function is established from the edge weight and trained to obtain a vector model; the vector model is clustered with a density clustering algorithm to obtain the embedded vectors; a bounding circle enclosing the embedded points of each cluster is established; and the personnel affinity value is finally calculated from the radius value of the bounding circle and the edge weight. This avoids the inaccuracy of plain weighted summation, which ignores that relationships in an inactive community should have a higher affinity than those in an active community, and effectively improves the reliability of the personnel affinity value.
In specific implementation, the step of calculating the edge weight according to the behavior information includes:
the edge weight is calculated according to the following equation (1):
Figure BDA0002500280270000091
where the quantity shown in Figure BDA0002500280270000092 is the frequency of a behavior between two persons, $k_r$ is the behavior weight, and $r$ is the behavior category.
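The image for equation (1) is not reproduced in this text. As a minimal sketch, assuming the edge weight is a weighted sum of per-category behavior frequencies (as the Background describes), the calculation could look like the following. The category names, the weight values and the `edge_weights` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical behavior weights k_r, one per behavior category r (illustrative values).
BEHAVIOR_WEIGHTS = {"same_train": 1.0, "same_flight": 2.0, "same_stay": 3.0}


def edge_weights(records, weights=BEHAVIOR_WEIGHTS):
    """Return {(i, j): m_ij}, assuming m_ij = sum over r of k_r * frequency_r(i, j).

    records: iterable of (person_a, person_b, category), one tuple per observed event.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b, category in records:
        pair = tuple(sorted((a, b)))  # treat the relationship as undirected
        counts[pair][category] += 1
    return {
        pair: sum(weights[r] * n for r, n in per_category.items())
        for pair, per_category in counts.items()
    }


records = [
    ("A", "B", "same_train"),
    ("B", "A", "same_train"),
    ("A", "B", "same_flight"),
    ("B", "C", "same_stay"),
]
m = edge_weights(records)
```

Here the pair (A, B) shares two train trips and one flight, so its weight is the sum of those three weighted events.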
In specific implementation, referring to the flowchart of establishing an objective function according to edge weights shown in Fig. 2, the step of using the edge weight to determine the selection probability of different paths given the previous node, randomly sampling to obtain a node sequence, and using a log-likelihood function to maximize the probability of the node sequence so as to obtain the optimal embedded vector includes the following steps S201 to S203:
Step S201, establishing a personnel relationship graph according to the edge weight, and defining the personnel relationship graph as $G(V, E)$, where $V$ is the node set and $v_i \in V$ represents the $i$-th person in the graph, and $E$ is the edge set and $e(v_i, v_j) \in E$ represents the edge between nodes $v_i$ and $v_j$ in the graph, whose actual meaning is the relationship between the $i$-th person and the $j$-th person.
Step S202, taking a node $s_1$ in the node set as the starting point, calculating the sampling probability from the edge weights, and performing a random walk to generate a node sequence, denoted $S = \{s_1, s_2, \ldots, s_n\}$, $s_i \in V$, using probability formula (2):
Figure BDA0002500280270000093
where $s_t$ and $s_{t-1}$ denote the nodes at the current time and the previous time, respectively; the formula gives the probability that the current node is $u$ given that the node at the previous time is $v$;
traversing the sequence $S$ with a window of length $2w+1$, each iteration yielding a window $W(i)$ centered on a node $s_i$, where the center node $s_i \in S$ and the window $W(i) = S[i-w : i+w]$.
Step S203, establishing a probability formula based on the window $W(i)$, given by formula (3):
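Step S202 can be sketched as follows. Because the image for formula (2) is not reproduced here, the sketch assumes the transition probability from $v$ to $u$ is the edge weight between them divided by the total weight of the edges leaving $v$, which is a common choice for weighted random walks and not necessarily the patent's exact formula. The graph, the function names and the truncation of windows at the ends of the sequence are illustrative assumptions.

```python
import random


def transition_probs(graph, v):
    """P(s_t = u | s_{t-1} = v), assumed proportional to the edge weight m_vu."""
    total = sum(graph[v].values())
    return {u: weight / total for u, weight in graph[v].items()}


def random_walk(graph, start, length, rng):
    """Generate a node sequence S = [s_1, ..., s_n] by a weighted random walk."""
    walk = [start]
    while len(walk) < length and graph[walk[-1]]:
        probs = transition_probs(graph, walk[-1])
        nodes = list(probs)
        step = rng.choices(nodes, weights=[probs[u] for u in nodes], k=1)[0]
        walk.append(step)
    return walk


def windows(sequence, w):
    """Yield (center, context) pairs; windows are truncated at the sequence ends."""
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - w), min(len(sequence), i + w + 1)
        yield center, sequence[lo:i] + sequence[i + 1:hi]


# Hypothetical symmetric graph: graph[v][u] is the edge weight between v and u.
graph = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 3.0},
    "C": {"A": 1.0, "B": 3.0},
}
walk = random_walk(graph, "A", length=6, rng=random.Random(0))
pairs = list(windows(walk, w=1))
```

Each element of `pairs` holds a center node and the other nodes of its window, which is the input needed by the window probability of step S203.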
Figure BDA0002500280270000101
where $s_i$ is the window center node, $u_k$ is any node in window $W(i)$ other than $s_i$, and $P(u_k \mid s_i)$ is a softmax function representing the probability that $s_i$ and $u_k$ appear in the same window, calculated by the following formula (4):
Figure BDA0002500280270000102
where $v_j$ denotes any node in $V$, and the function $f$ gives the embedded vector of the input node.
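The softmax described for formula (4) can be illustrated with a short sketch. It assumes the score between two nodes is the dot product of their embedded vectors, normalized over all nodes $v_j \in V$; the formula image itself is not reproduced here, so this is the usual skip-gram form rather than a confirmed transcription. The embedding values and function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cooccurrence_prob(emb, center, context):
    """P(context | center) as a softmax over every node in the embedding table."""
    scores = {v: dot(emb[v], emb[center]) for v in emb}
    top = max(scores.values())  # subtract the maximum for numerical stability
    denom = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[context] - top) / denom


# Hypothetical 2-dimensional embedded vectors f(v) for three nodes.
emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [-1.0, 0.0]}
p = {v: cooccurrence_prob(emb, "A", v) for v in emb}
```

Nodes whose embedded vectors point in a similar direction to the center node receive a higher probability, so `p["B"]` exceeds `p["C"]` in this example.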
In a specific implementation, a node $v_i$ is first selected from the node set $V$, and a random walk starting from $v_i$ generates a node sequence $S$ of length $l$. Let $i = 1$; a node $s_i$ in the sequence is selected, and a window $W(i)$ with window parameter $w$ centered on $s_i$ is generated, from which the objective function is obtained. It is then judged whether $i$ is greater than or equal to $l$: if $i$ is less than $l$, another node $s_i$ in the sequence is selected; if $i$ is greater than or equal to $l$, the objective function is trained to obtain the vector model.
In specific implementation, the step of using the edge weight to determine the selection probability of different paths given the previous node, randomly sampling to obtain a node sequence, and using a log-likelihood function to maximize the probability of the node sequence so as to obtain the optimal embedded vector further includes the following steps a to c:
step a, using a log-likelihood function as an objective function for the probability formula, and summing to obtain a total objective function value of all windows in the sequence, so as to maximize the probability of occurrence of the node sequence, wherein a calculation formula (5) of the total objective function value is as follows:
Figure BDA0002500280270000103
where $P(W(i) \mid s_i)$ is the probability function within a single window, and $S$ is the window node sequence;
step b, converting the mapping node of the total objective function value into a mapping vector;
step c, learning the mapping vector to obtain a vector model.
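The images for formulas (3) to (5) are not reproduced in this text. Under the assumption that they follow the standard skip-gram form implied by the descriptions above (a product of softmax terms within each window and a sum of logarithms over all windows), the objective can be written as the following sketch; the symbol $\mathcal{L}$ is introduced here only for illustration.

```latex
\begin{align}
P\bigl(W(i) \mid s_i\bigr) &= \prod_{u_k \in W(i) \setminus \{s_i\}} P(u_k \mid s_i), \\
P(u_k \mid s_i) &= \frac{\exp\bigl(f(u_k) \cdot f(s_i)\bigr)}{\sum_{v_j \in V} \exp\bigl(f(v_j) \cdot f(s_i)\bigr)}, \\
\mathcal{L} &= \sum_{s_i \in S} \log P\bigl(W(i) \mid s_i\bigr)
  = \sum_{s_i \in S} \sum_{u_k \in W(i) \setminus \{s_i\}}
    \Bigl[ f(u_k) \cdot f(s_i) - \log \sum_{v_j \in V} \exp\bigl(f(v_j) \cdot f(s_i)\bigr) \Bigr].
\end{align}
```

Maximizing $\mathcal{L}$ with respect to the embedded vectors $f(\cdot)$ raises the probability of the sampled node sequences, which matches the stated goal of steps a to c.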
In specific implementation, the step of establishing the bounding circle of the same cluster based on the embedded vectors includes the following steps 1 to 4:
Step 1, clustering the embedding results using the DBSCAN method, and dividing all nodes into a plurality of clusters.
Step 2, randomly shuffling all embedded points in each cluster, selecting the first two points, and constructing a minimum enclosing circle with the midpoint of the two points as the circle center and half of the Euclidean distance between them as the radius.
Step 3, sampling the remaining points without replacement, and calculating the distance from each sampled point to the circle center.
Step 4, judging whether the sampled point is inside the circle: if so, continuing to sample; if not, reconstructing the minimum enclosing circle with the line connecting the new sampled point and the point farthest from it as the diameter;
after repeated iteration, when the number of consecutive sampled points falling inside the circle exceeds a preset number N, the minimum enclosing circle of the cluster is considered to have been obtained.
In a specific embodiment, the circle is expanded based on the remaining nodes until they all lie within it, as follows: the remaining nodes are sampled without replacement and the distance from each sampled point to the circle center is calculated; if this distance is greater than or equal to the radius of the bounding circle, the radius value is increased and the bounding circle continues to expand. In addition, a value N is selected as the minimum number of consecutive hits: once randomly selected remaining nodes fall inside the circle N consecutive times, construction of the bounding circle is complete.
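Steps 2 to 4 can be sketched for a single cluster as follows. The sketch assumes that the "point farthest from the new sample" is chosen among the points already sampled, which the text does not state explicitly; the function names are hypothetical. Note that a circle rebuilt from a diameter does not necessarily contain every earlier point, so the result is a heuristic approximation rather than an exact minimum enclosing circle.

```python
import math
import random


def circle_from_diameter(a, b):
    """Circle whose diameter is the segment from a to b."""
    center = tuple((x + y) / 2 for x, y in zip(a, b))
    return center, math.dist(a, b) / 2


def approx_enclosing_circle(points, n_consecutive, rng):
    """Return (center, radius) using the sampling heuristic of steps 2 to 4."""
    pts = list(points)
    if not pts:
        raise ValueError("cluster must contain at least one point")
    rng.shuffle(pts)  # step 2: random order
    if len(pts) == 1:
        return tuple(pts[0]), 0.0
    seen = pts[:2]
    center, radius = circle_from_diameter(seen[0], seen[1])
    hits = 0
    for p in pts[2:]:  # step 3: sampling without replacement
        if math.dist(p, center) <= radius:
            hits += 1
            if hits > n_consecutive:  # enough consecutive points inside the circle
                break
        else:
            hits = 0
            # Step 4: rebuild from the new point and the farthest point seen so far.
            farthest = max(seen, key=lambda q: math.dist(p, q))
            center, radius = circle_from_diameter(p, farthest)
        seen.append(p)
    return center, radius


# Hypothetical cluster of collinear embedded points.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
center, radius = approx_enclosing_circle(cluster, n_consecutive=10, rng=random.Random(0))
```

With `n_consecutive` larger than the cluster size every point is visited; a smaller value stops early, trading accuracy for speed as the description suggests.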
In specific implementation, the step of calculating the personnel affinity value according to the radius value and the edge weight comprises the following steps:
the personal affinity value is calculated according to the following equation (6):
Figure BDA0002500280270000111
where $A$ is the personnel affinity value, $m_{i,j}$ is the edge weight, and $r$ is the radius value.
Embodiment two:
Referring to Fig. 3, a schematic diagram of a personnel relationship analysis apparatus, the apparatus includes:
An obtaining unit 301, configured to obtain behavior information between people, and calculate an edge weight according to the behavior information.
An embedded vector unit 302, configured to use the edge weight to determine the selection probability of different paths given the previous node, randomly sample to obtain a node sequence, and, for the node sequence, use a log-likelihood function to maximize the probability that the node sequence occurs, thereby obtaining an optimal embedded vector.
A clustering unit 303, configured to cluster the vector model based on a density clustering algorithm to obtain embedded vectors of the same cluster in the vector model.
A bounding circle establishing unit 304, configured to establish a bounding circle of the same cluster based on the embedded vectors and calculate a radius value of the bounding circle.
An affinity value calculation unit 305, configured to calculate the personnel affinity value according to the radius value and the edge weight.
In the device provided by this embodiment, the edge weight is calculated after behavior information between people is acquired; the sampling probability is calculated from the edge weight to obtain a sampled sequence; a conditional probability function is calculated for each window in the sequence and the objective function is established; the vector model is obtained by training the objective function; the vector model is clustered with a density clustering algorithm to obtain the embedded vectors; a bounding circle enclosing the embedded points of each cluster is established; and the personnel affinity value is finally calculated from the radius value of the bounding circle and the edge weight.
In practical implementation, the obtaining unit 301 is further configured to calculate the edge weight according to the following equation (7):
Figure BDA0002500280270000121
wherein
Figure BDA0002500280270000122
is the frequency of the behavior between two persons, k_r is the behavior weight, and r is the behavior category.
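Equation (7) survives here only as an image placeholder, so its exact form cannot be confirmed from the text. The following is a minimal sketch that assumes the edge weight is a weighted sum of behavior frequencies over behavior categories; the function name `edge_weights`, the dictionary layout, and the sample categories are hypothetical and are not taken from the patent.

```python
def edge_weights(behavior_counts, behavior_weights):
    """Compute one edge weight per pair of persons.

    behavior_counts:  {(person_i, person_j): {category_r: frequency}}
    behavior_weights: {category_r: k_r}
    ASSUMPTION: equation (7) is a weighted sum over categories r.
    """
    weights = {}
    for pair, counts in behavior_counts.items():
        # Sum k_r * frequency over every behavior category r for this pair.
        weights[pair] = sum(behavior_weights[r] * n for r, n in counts.items())
    return weights


# Hypothetical sample data: two behavior categories with different weights.
counts = {("a", "b"): {"call": 3, "travel": 1}, ("b", "c"): {"call": 2}}
k = {"call": 1.0, "travel": 2.5}
m = edge_weights(counts, k)
```

If equation (7) normalizes or transforms the sum in some other way, only the body of the loop would change.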
In particular implementation, the embedded vector unit 302 is further configured to:
establishing a personnel relationship graph according to the edge weights, and defining the graph as G = (V, E), wherein V is the node set and node v_i ∈ V represents the i-th person in the graph, E is the edge set, and e(v_i, v_j) ∈ E represents the edge between nodes v_i and v_j in the graph, whose practical meaning is the relationship between the i-th person and the j-th person;
taking a node s_1 in the node set as the starting point, calculating the sampling probability based on the edge weights, and performing a random walk to generate a node sequence, denoted S = {s_1, s_2, ..., s_n}, s_i ∈ V, wherein the sampling probability is given by formula (8):
Figure BDA0002500280270000131
wherein s_t and s_{t-1} respectively denote the nodes at the current moment and the previous moment, and the formula gives the probability that the current node is u given that the node at the previous moment is v;
traversing the sequence S with a window length of 2w+1, each iteration yielding a window W(i) centered on a node s_i, wherein the center node s_i ∈ S and the window W(i) = S[i-w : i+w];
establishing a probability formula based on the window W(i), as shown in formula (9):
Figure BDA0002500280270000132
wherein s_i is the window center node, u_k is any node in the window W(i) other than s_i, and P(u_k | s_i) is a softmax function representing the probability that s_i and u_k appear in the same window, calculated according to formula (10):
Figure BDA0002500280270000133
wherein v_j represents any node in V, and the function f returns the embedded vector of the input node.
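The walk, window, and softmax steps above can be sketched as follows. Formulas (8) to (10) are available only as image placeholders, so this sketch assumes that (8) picks the next node with probability proportional to the edge weight and that (10) is a softmax over inner products of embedded vectors, as in common skip-gram style graph embeddings. Truncating windows at the ends of the sequence is also an assumption. All function names are hypothetical.

```python
import math
import random


def random_walk(graph, start, length, rng):
    """graph: {node: {neighbor: edge_weight}}.
    ASSUMPTION for formula (8): the next node is drawn with probability
    proportional to the weight of the edge leaving the previous node."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: stop the walk early
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def windows(seq, w):
    """Yield (center, context) pairs for windows of length 2w+1.
    Windows are truncated at the sequence ends (an assumption)."""
    for i, center in enumerate(seq):
        lo, hi = max(0, i - w), min(len(seq), i + w + 1)
        yield center, seq[lo:i] + seq[i + 1:hi]


def softmax_prob(f, u, s):
    """P(u | s) as a softmax over all nodes in f (the node set V).
    ASSUMPTION for formula (10): scores are inner products f(v) . f(s)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = {v: dot(f[v], f[s]) for v in f}
    top = max(scores.values())  # subtract the max for numerical stability
    denom = sum(math.exp(sc - top) for sc in scores.values())
    return math.exp(scores[u] - top) / denom


# Hypothetical three-person graph and two-dimensional embedded vectors.
graph = {"a": {"b": 5.5}, "b": {"a": 5.5, "c": 2.0}, "c": {"b": 2.0}}
walk = random_walk(graph, "a", 6, random.Random(0))
pairs = list(windows(["a", "b", "c"], 1))
f = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
p_b_given_a = softmax_prob(f, "b", "a")
```

Passing the random generator in explicitly keeps the walk reproducible, which is convenient when comparing embeddings across runs.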
In particular implementation, the embedded vector unit 302 is further configured to:
using a log-likelihood function as the objective function for the probability formula, and summing over all windows in the sequence to obtain the total objective function value, so as to maximize the probability of occurrence of the node sequence, wherein the total objective function value is calculated by formula (11):
Figure BDA0002500280270000141
wherein P(W(i) | s_i) is the probability function within a single window, and S is the window node sequence.
The mapping nodes of the total objective function are then converted into mapping vectors.
The mapping vectors are learned to obtain the vector model.
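As an illustration of the objective described above, the following sketch evaluates a total log-likelihood over all windows of a node sequence. Formulas (9) and (11) are only image placeholders in this text, so the sketch assumes that the window probability factorizes into a product of P(u_k | s_i) over the context nodes and that P is a softmax over inner products of embedded vectors. The names `log_softmax` and `total_log_likelihood` are hypothetical, and the training step that adjusts the vectors to maximize this value is not shown.

```python
import math


def log_softmax(f, u, s):
    """log P(u | s), with P a softmax over inner products (an assumption)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(f[v], f[s]) for v in f]
    top = max(scores)
    log_denom = top + math.log(sum(math.exp(x - top) for x in scores))
    return dot(f[u], f[s]) - log_denom


def total_log_likelihood(seq, f, w):
    """Sum of log P(u_k | s_i) over every window W(i) of length 2w+1.
    ASSUMPTION: P(W(i) | s_i) is the product of P(u_k | s_i) over the
    context nodes, so its logarithm is a sum."""
    total = 0.0
    for i, center in enumerate(seq):
        lo, hi = max(0, i - w), min(len(seq), i + w + 1)
        for k in range(lo, hi):
            if k != i:
                total += log_softmax(f, seq[k], center)
    return total


# Hypothetical embedded vectors: "a" and "b" are close, "c" is far away.
f = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
zero = {v: [0.0, 0.0] for v in f}
seq = ["a", "b", "a", "b"]
fitted = total_log_likelihood(seq, f, 1)
baseline = total_log_likelihood(seq, zero, 1)
```

With these sample vectors, the sequence that alternates between the two nearby nodes scores higher than it does under the all-zero embedding, which is the direction in which training would push the vectors.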
In practical implementation, the bounding circle establishing unit 304 is further configured to:
clustering the embedding result using the DBSCAN method, and dividing all nodes into a plurality of clusters;
randomly shuffling all embedded points in each cluster, selecting the first two points, and constructing an initial minimum enclosing circle with the midpoint of the two points as the circle center and half of the Euclidean distance between them as the radius;
sampling the remaining points without replacement, and calculating the distance from each sampled point to the circle center;
judging whether the sampled point is inside the circle; if so, continuing to sample; if not, reconstructing the minimum enclosing circle with the line segment connecting the new sampled point and the point farthest from it as the diameter;
after repeated iteration, when the number of consecutive sampled points falling inside the circle exceeds a preset number N, the minimum enclosing circle of the cluster is considered to have been obtained.
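The enclosing-circle steps above can be sketched as follows for a single cluster. The sketch assumes two-dimensional embedded points, and it assumes that "the point farthest from the new sampled point" is searched among the points already sampled, which the text does not state. The function name `approx_enclosing_circle` and the parameter `n_required` (standing for the preset number N) are hypothetical. The DBSCAN clustering step is not shown.

```python
import math
import random


def approx_enclosing_circle(points, n_required, rng):
    """Heuristic enclosing circle for one cluster of 2-D points.
    Returns (center, radius)."""
    pts = list(points)
    rng.shuffle(pts)  # random order, then sample without replacement
    if len(pts) == 1:
        return pts[0], 0.0

    def circle_from_diameter(p, q):
        center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        return center, math.dist(p, q) / 2

    # Initial circle: the first two points define the diameter.
    center, radius = circle_from_diameter(pts[0], pts[1])
    seen = pts[:2]
    streak = 0  # consecutive sampled points that fell inside the circle
    for p in pts[2:]:
        if math.dist(p, center) <= radius + 1e-12:
            streak += 1
            if streak > n_required:
                break  # enough consecutive hits: accept the circle
        else:
            # ASSUMPTION: the farthest point is taken from those seen so far.
            farthest = max(seen, key=lambda q: math.dist(p, q))
            center, radius = circle_from_diameter(p, farthest)
            streak = 0
        seen.append(p)
    return center, radius


# Hypothetical cluster whose two extreme points are (0, 0) and (2, 0).
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, -0.1)]
center, radius = approx_enclosing_circle(cluster, 10, random.Random(1))
```

Because the circle is rebuilt from a single diameter each time, the result is an approximation: for clusters whose true minimum enclosing circle is determined by three boundary points, some points may fall slightly outside it.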
In practical implementation, the affinity value calculating unit 305 is further configured to update the affinity values of the relationships between all the persons according to the following equation (12):
Figure BDA0002500280270000142
wherein A_{i,j} is the finally calculated personnel affinity value, m_{i,j} is the edge weight, and R_l is the minimum bounding circle radius of the l-th cluster.
An embodiment of the present invention further provides an electronic device comprising a processor and a memory, wherein the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the steps of the personnel relationship analysis method of the foregoing embodiment.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device includes: a processor 400, a memory 401, a bus 402 and a communication interface 403, wherein the processor 400, the communication interface 403 and the memory 401 are connected through the bus 402; the processor 400 is used to execute executable modules, such as computer programs, stored in the memory 401.
The memory 401 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. A communication connection between the network element of this system and at least one other network element is implemented through at least one communication interface 403 (wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
Bus 402 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 401 is configured to store a program, and the processor 400 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 400.
The processor 400 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 400 or by instructions in the form of software. The processor 400 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 401, and the processor 400 reads the information in the memory 401 and completes the steps of the above method in combination with its hardware.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by the processor 400, performs the steps of the personnel relationship analysis method of the foregoing embodiment.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A person relationship analysis method, comprising:
acquiring behavior information between persons, and calculating edge weights according to the behavior information;
determining, from the edge weights, the selection probability of each path given the previous node, randomly sampling to obtain a node sequence, and maximizing the probability of occurrence of the node sequence using a log-likelihood function, thereby obtaining the optimal embedded vectors;
clustering the vector model based on a density clustering algorithm to obtain embedded vectors of the same cluster in the vector model;
establishing a bounding circle of the same cluster based on the embedding vector, and calculating a radius value of the bounding circle;
and calculating to obtain a personnel affinity value according to the radius value and the edge weight.
2. The method of claim 1, wherein the step of calculating the edge weight according to the behavior information comprises:
calculating the edge weight according to the following equation:
Figure FDA0002500280260000011
wherein
Figure FDA0002500280260000012
is the frequency of the behavior between two persons, k_r is the behavior weight, and r is the behavior category.
3. The method according to claim 1, wherein the step of determining, from the edge weights, the selection probability of each path given the previous node, randomly sampling to obtain a node sequence, and maximizing the probability of occurrence of the node sequence using a log-likelihood function, thereby obtaining the optimal embedded vectors, comprises:
establishing a personnel relationship graph according to the edge weights, and defining the personnel relationship graph as G = (V, E), wherein V is the node set and node v_i ∈ V represents the i-th person in the personnel relationship graph, E is the edge set, and e(v_i, v_j) ∈ E represents the edge between nodes v_i and v_j in the personnel relationship graph, whose practical meaning is the relationship between the i-th person and the j-th person;
taking a node s_1 in the node set as the starting point, calculating the sampling probability based on the edge weights, and performing a random walk to generate a node sequence, denoted S = {s_1, s_2, ..., s_n}, s_i ∈ V, wherein the sampling probability is given by the following formula:
Figure FDA0002500280260000021
wherein s_t and s_{t-1} respectively denote the nodes at the current moment and the previous moment, and the formula gives the probability that the current node is u given that the node at the previous moment is v;
traversing the sequence S with a window length of 2w+1, each iteration yielding a window W(i) centered on a node s_i, wherein the center node s_i ∈ S and the window W(i) = S[i-w : i+w];
establishing a probability formula based on the window W(i), wherein the probability formula is as follows:
Figure FDA0002500280260000022
wherein s_i is the window center node, s_k is any node in the window W(i) other than s_i, and P(s_k | s_i) is a softmax function representing the probability that s_i and s_k appear in the same window, calculated by the following formula:
Figure FDA0002500280260000023
wherein v_j denotes any node in V, and f(s_i) denotes the embedded vector of the input node s_i.
4. The method according to claim 3, wherein the step of determining, from the edge weights, the selection probability of each path given the previous node, randomly sampling to obtain a node sequence, and maximizing the probability of occurrence of the node sequence using a log-likelihood function, thereby obtaining the optimal embedded vectors, further comprises:
using a log-likelihood function as an objective function for the probability formula, and summing to obtain a total objective function value of all windows in the sequence so as to maximize the probability of occurrence of the node sequence, wherein the calculation formula of the total objective function value is as follows:
Figure FDA0002500280260000031
wherein P(W(i) | s_i) is the probability function within a single window, and S is the window node sequence;
converting the mapping node of the total objective function value into a mapping vector;
and learning the mapping vector to obtain the vector model.
5. The method of claim 1, wherein the step of creating a bounding circle of the same cluster based on the embedding vector comprises:
clustering the embedding result by using a DBSCAN method, and dividing all nodes into a plurality of clusters;
randomly shuffling all embedded points in each cluster, selecting the first two points, and constructing an initial minimum enclosing circle with the midpoint of the two points as the circle center and half of the Euclidean distance between them as the radius;
sampling the remaining points without replacement, and calculating the distance from each sampled point to the circle center;
judging whether the sampling point is in the circle, if so, continuing to sample, and if not, reconstructing the minimum enclosing circle by taking the connecting line of the new sampling point and another point farthest from the new sampling point as the diameter;
after repeated iteration, when the number of times that the sampling points continuously appear in the circle is larger than the number N which is set in advance, the minimum enclosing circle of the cluster is considered to be obtained.
6. The method of claim 1, wherein the step of calculating the personal affinity value based on the radius value and the edge weight comprises:
updating the relationship affinity values among all the persons according to the following formula:
Figure FDA0002500280260000032
wherein A_{i,j} is the finally calculated personnel affinity value, m_{i,j} is the edge weight, and R_l is the minimum bounding circle radius of the l-th cluster.
7. A person relationship analysis apparatus, comprising:
an acquiring unit, configured to acquire behavior information between persons and calculate edge weights according to the behavior information;
an embedded vector unit, configured to determine, from the edge weights, the selection probability of each path given the previous node, randomly sample to obtain a node sequence, and maximize the probability of occurrence of the node sequence using a log-likelihood function, thereby obtaining the optimal embedded vectors;
a clustering unit, configured to cluster the vector model based on a density clustering algorithm to obtain the embedded vectors belonging to the same cluster in the vector model;
a bounding circle establishing unit, configured to establish a bounding circle of the same cluster based on the embedded vectors, and calculate the radius value of the bounding circle;
and an affinity value calculating unit, configured to calculate the personnel affinity value according to the radius value and the edge weight.
8. The apparatus of claim 7, wherein the embedded vector unit is further configured to:
establishing a personnel relationship graph according to the edge weights, and defining the personnel relationship graph as G = (V, E), wherein V is the node set and node v_i ∈ V represents the i-th person in the personnel relationship graph, E is the edge set, and e(v_i, v_j) ∈ E represents the edge between nodes v_i and v_j in the personnel relationship graph, whose practical meaning is the relationship between the i-th person and the j-th person;
taking a node v_i in the node set as the starting point, calculating the sampling probability based on the edge weights, and performing a random walk to generate a node sequence, denoted
Figure FDA0002500280260000041
wherein the sampling probability is given by the following formula:
Figure FDA0002500280260000042
wherein s_t and s_{t-1} respectively denote the nodes at the current moment and the previous moment, and the formula gives the probability that the current node is u given that the node at the previous moment is v;
traversing the sequence S with a window length of 2w+1, each iteration yielding a window W(i) centered on a node s_i, wherein the center node s_i ∈ S and the window W(i) = S[i-w : i+w];
establishing a probability formula based on the window W(i), wherein the probability formula is as follows:
Figure FDA0002500280260000051
wherein s_i is the window center node, u_k is any node in the window W(i) other than s_i, and P(u_k | s_i) is a softmax function representing the probability that s_i and u_k appear in the same window, calculated by the following formula:
Figure FDA0002500280260000052
wherein v_j represents any node in V, and the function f returns the embedded vector of the input node;
using a log-likelihood function as an objective function for the probability formula, and summing to obtain a total objective function value of all windows in the sequence so as to maximize the probability of occurrence of the node sequence, wherein the calculation formula of the total objective function value is as follows:
Figure FDA0002500280260000053
wherein P(W(i) | s_i) is the probability function within a single window, and S is the window node sequence;
converting the mapping node of the total objective function value into a mapping vector;
and learning the mapping vector to obtain the vector model.
9. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to perform the steps of the personnel relationship analysis method according to any one of claims 1 to 6.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the personnel relationship analysis method according to any one of claims 1 to 6.
CN202010430279.2A 2020-05-20 2020-05-20 Personnel relationship analysis method and device and electronic equipment Active CN111611531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430279.2A CN111611531B (en) 2020-05-20 2020-05-20 Personnel relationship analysis method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN111611531A true CN111611531A (en) 2020-09-01
CN111611531B CN111611531B (en) 2023-11-21

Family

ID=72205690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430279.2A Active CN111611531B (en) 2020-05-20 2020-05-20 Personnel relationship analysis method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111611531B (en)


Also Published As

Publication number Publication date
CN111611531B (en) 2023-11-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 310000 room 1408, building 2, Caizhi Shunfeng innovation center, No. 99, housheng street, Gongshu District, Hangzhou City, Zhejiang Province
Applicant after: HANGZHOU CHINAOLY TECHNOLOGY CO.,LTD.
Address before: 2 / F, building A04, 9 Jiusheng Road, Jianggan District, Hangzhou City, Zhejiang Province 310000
Applicant before: HANGZHOU CHINAOLY TECHNOLOGY CO.,LTD.
GR01 Patent grant