CN107992598B - Method for mining social relation of group based on video material - Google Patents


Info

Publication number
CN107992598B
CN107992598B (application CN201711327006.XA)
Authority
CN
China
Prior art keywords
person
parameter
frame
persons
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711327006.XA
Other languages
Chinese (zh)
Other versions
CN107992598A (en)
Inventor
Li Daqing (李大庆)
Zhang Yunxuan (张云轩)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201711327006.XA
Publication of CN107992598A
Application granted
Publication of CN107992598B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval characterised by using metadata automatically derived from the content
    • G06F16/7837: Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784: Retrieval characterised by using metadata automatically derived from the content, the detected or recognised objects being people
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for mining the social relations of a group from video material, comprising the following steps: first, preprocess the person images in the video; second, compute the correlation between every two persons, draw the social-relation graph, and mine the group relations corresponding to its characteristic subgraphs; third, analyze and predict the group relations; fourth, analyze the actions, expressions and states of individual persons. Through these steps, data are mined from video material, whose content is far richer than conventional data sources, addressing the practical problem of complex multi-person relationships in video, so that relationships among multiple people in various scenes can be quantified and evaluated effectively. The invention supports future real-time analysis of real-life scenes and provides strong methodological support for analyzing person relationships in complex multi-person scenes.

Description

Method for mining social relation of group based on video material
Technical field:
The invention provides a method for mining the social relations of a group from video material. The method involves face detection and recognition, network structure analysis, social-relationship computation and the like, and belongs to the field of data mining.
Background art:
Social relationship mining is a rapidly developing interdisciplinary field that deeply fuses computer science and social science. In general, the relationships between people are difficult to obtain through direct survey instruments such as questionnaires and must instead be mined indirectly by existing big-data means. Social relationship mining has wide applications in science and technology management, business intelligence, emergency management, counter-terrorism and riot prevention, among other areas.
"Social relationships are important to finding meaning in life." Individuals gain various benefits by establishing and maintaining contact with others (Compton, 2005; Lee & Robbins, 1998). For example, identification with various kinds of community membership is related to well-being: college peers (Connolly, White, Stevens and Burnstein, 1987), work/employment (Haughey, 1993) and minority populations (Branscombe, Schmitt, & Harvey, 1999). Our goal, therefore, is to mine the deeper-level social relationships of groups from existing data resources (Daniel L. Wann, Matthew Brasher, Paula J. Waddill, Sagan Ladd).
At present, social relationship mining mainly draws on the following sources: traditional questionnaires, online social network analysis, mobile-phone communication networks, text data mining, e-mail network data, travel information, trajectory information, and the like.
Regarding mobile-phone communication data, the literature (Nathan Eagle, Alex (Sandy) Pentland, and David Lazer, 2009) indicates that, using mobile-phone call logs, 95% of friendships can be inferred accurately from the observational data alone, since friends exhibit distinctive temporal and spatial patterns in their calls. Conversely, these behavioral patterns can also predict individual-level characteristics such as job satisfaction. By collecting communication information including location and proximity data from the phones (call records, Bluetooth devices within five meters, phone numbers, application usage, phone status, and so on) and comparing the resulting behavioral social network with self-reported relationships from the same group, even friendship and satisfaction among individuals can be analyzed.
Regarding travel information and trajectories, GeoLife (Yu Zheng, Xing Xie and Wei-Ying Ma, 2010), as an example, provides three key application scenarios: 1) sharing life experience based on GPS trajectories; 2) generic travel recommendation, e.g., the most interesting locations, locations within a particular region, and travel sequences given by travel experts; 3) personalized friend and location recommendation. In the location graph there are two types of nodes, users and locations; a directed edge between two locations indicates that at least some users visited the two locations consecutively in a journey, and an edge from a user to a location records the number of times the user visited that location. From this one can infer how many times two users have visited the same locations in the real world, and hence infer the relationship between the two people.
In terms of text data: with the popularity of Web 2.0 applications, more and more Web users actively publish text information online (Qiaozhu Mei, Deng Cai, Duo Zhang, ChengXiang Zhai, 2008). These users also often form social networks in various ways, reflecting the simultaneous growth of textual information and network structures (e.g., social networks). Taking blogs as an example, blog articles cover broad topics and diverse discussions, alongside a fast-growing friendship network among bloggers. Researchers regularly publish papers, from which one obtains not only textual information but, naturally, the network of collaborators as well; for instance, two researchers who often collaborate are probably studying the same topic and therefore likely working in the same research area. Geographically sensitive events (e.g., Hurricane Katrina) support further inference, e.g., bloggers living in nearby locations tend to write on similar topics.
Online social network analysis is often combined with the text mining described above, e.g., obtaining deeper information from forum posts, comments and the like, covering e-mail, Twitter, blogs, microblogs, forums and other social networking systems. One paper (Christopher P. Diehl, Jaime Montemayor, Mike Pekala, 2009) discusses, for example, how a mountaineering enthusiast might use currently available technology to find connections between bloggers and readers: starting from the bloggers the enthusiast follows, the hope is to find bloggers whose style and personality resemble the enthusiast's own. Current methods use only informational and structural cues to identify similar people, so the burden of finding other mountaineering bloggers still rests on the active searcher, who must socialize and take part in related activities. The article therefore envisions a social search engine that analyzes these digital social artifacts and presents a timeline describing the relationship between each pair of blogs; the timelines highlight the periods during which a relationship was particularly active, and the searcher can review the results and adjust the suggested period to cover the time span of the relationship being sought. Such a social search engine looks for and distinguishes different social signals in the language and interaction styles of the blogs, and these signatures are used to rank the mountaineering bloggers and single out those whose interaction style matches what the active searcher expects.
In summary, most existing research and algorithms perform indirect social relationship mining and analysis on data types such as quantitative features or linguistic content, but the level of social connection embodied in such data is shallow: most of it reflects the long-term static accumulation of relationships (e.g., address-book data), the similarity of life patterns (e.g., trajectory data), or social relations organized around a single interest (e.g., microblogs or blogs), and it can hardly reveal the deep characteristics of social relations in groups with multiple types, interests and behavior patterns. People with different interests and habits are often close friends nonetheless, exhibiting rich group activities and relationship types. How to mine the dynamic composition and the leading nodes of multi-person relationships within such activities is the key problem in mining the deep content of social relations. Group conversation and discussion in particular are primary social activities, and how to mine social relationships from them is the main subject of the invention.
The invention mainly targets video data containing group conversations and performs crowd relationship mining based on face recognition; from the analysis results it can discover potential social sub-networks within the crowd and the holders of the right of speech within them. Specifically, a correlation index between every two persons is formed from various dynamic parameters of their faces; on this basis a correlation network is constructed and its network characteristics analyzed, and the relationships among the persons are predicted by jointly analyzing each social graph and each characteristic-subgraph structure, yielding a deep relationship network of the people appearing in the video, for example identifying a group of actively interacting people, or the relationship between the holder of the right of speech and the others. Compared with numeric, textual and other forms of data, video image data carries a richer and more immediate connotation, and the mined social relations are multidimensional, embodying the deep structure of crowd interaction.
Disclosure of Invention
(I) Object of the invention
The object of the invention is to provide a method for mining the social relations of a group from video material, capable of effectively quantifying and evaluating the relationships among multiple persons in a video.
The theoretical basis of the invention is as follows: the strength of a social relationship between people manifests itself as a degree of behavioral correlation; through social influence, a person's behavior can lead his friends to behave in similar ways (Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian, 2008). The strength and type of the social relationship between two persons can therefore be judged from the dynamic correlation between their faces during conversation, building up to social network mining among multiple persons. On this basis, active small groups within the crowd are found from the occurrence of characteristic subgraphs in each network, and the complex coordination relationships among multiple people are deduced. Because the line segments between persons are directed, the dominant person in a group discussion or conversation can be identified, and active versus passive relations between persons distinguished.
(II) Technical solution
The technical solution of the invention is: construct and analyze a group social-relation graph from video material. First, face-detection preprocessing is performed; the preprocessing region is divided into an upper and a lower part, namely hair and face, and the relevant pixels are extracted, stained, counted and analyzed. Second, the correlation between every two persons is computed, and the results determine the lines drawn in the social graph, where the line between two people carries statistical information such as significance and direction. Third, the group relationships are mined, analyzed and predicted, including the mining of each characteristic subgraph and of key persons; a group relationship may, for example, be leader and subordinate, opposing camps, friendship, or a particularly close relationship. Finally, for a single person, the given series of parameters is assigned and ranked according to the social graphs of the different group relationships, from which each person's characteristics can be judged, including action, language, degree of participation and degree of dominance.
The method of the invention for mining the social relations of a group from video material comprises the following steps:
Step one, preprocessing the person images in the video;
The invention is based on video material data that meet the following requirements:
(a) face information: the faces must satisfy the face-detection conditions, being neither back views nor at an undetectable rotation angle for the whole video; intermittently detected faces are usable;
(b) time information: the length, definition and frame count of the video, where the definition must meet the basic face-detection resolution;
First, face detection is performed on every frame of the video; the detected square face contour is expanded to include the hair region, and a small rectangle is drawn inside each of the two parts, its size and position determined by the size of the detected face contour;
Second, the pixels inside each small rectangle are averaged, and the resulting means of the three color channels serve as the reference values c_b, c_g, c_r of the pixel proportion model, so that reference values are produced automatically. The rationale is that hair and face pixels show characteristic color proportions in many scenes: the three channels keep a relatively stable ratio across different scenes and under different lighting. That ratio, however, is built on the reference values c_b, c_g, c_r, which differ strongly under different lighting in different environments; previously the reference values had to be fixed in a prior test and the experiment run in the same environment before hair and face could be stained. Staining means coloring the pixels that fit the proportion model red and purple (replaceable colors), representing hair and face respectively. The automatic small-rectangle mechanism therefore guarantees that the target pixels are stained automatically in every frame even when the scene changes with the lighting;
Third, we now have a face image with the hair stained red and the face stained purple. The stained pixels are counted and the change compared frame by frame: if the red count first rises and then falls, and the change exceeds a threshold m, one basic action is judged to be complete; likewise, if the purple count rises and then falls with the change exceeding the threshold m, one basic action is judged to be complete;
Fourth, a change value δh is credited to the Hs parameter of a person who completes one basic action. Two further variables affecting Hs are added. One is the offset of the face detection box, used to detect large face displacements: if the offset of the box coordinates over n consecutive frames follows the given rule and the change reaches a set threshold s, the person is judged to have completed one basic action. The other is the number of frames for which a vanished detection box persists, used to handle the interference caused by intermittent face detection; the solution is: if the box disappears for no more than m frames, its original continuity is kept, and if it stays missing beyond m frames, its strength is attenuated and only a small change value δh is credited to the Hs parameter, giving a decreasing trend;
Finally, the total change A_r(d) (explained in step two) of the activity coefficient Hs of each person in each frame is stored in a two-dimensional array, to be called when the correlations are computed;
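As a concrete illustration, the following is a minimal sketch of this preprocessing loop for a single tracked face, assuming OpenCV's Haar-cascade detector. The patent does not fix the threshold m or the change value δh, so the constants below are illustrative, and the hair region (handled analogously to the face region) is omitted for brevity.

    import cv2
    import numpy as np

    M = 400          # illustrative pixel-count threshold m for one basic action
    DELTA_H = 1.0    # illustrative per-action change value delta-h

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def sample_reference(patch):
        # Mean B, G, R of the small inner rectangle -> reference values c_b, c_g, c_r.
        return patch.reshape(-1, 3).mean(axis=0)

    def count_model_pixels(region, ref, tol=0.15):
        # "Stain": count pixels whose colour ratios match the reference model.
        b, g, r = (region[..., i].astype(float) + 1e-6 for i in range(3))
        cb, cg, cr = ref + 1e-6
        mask = (np.abs(b / g - cb / cg) < tol) & (np.abs(r / g - cr / cg) < tol)
        return int(mask.sum())

    def activity_series(frames):
        # Produces A_r(d): the per-frame activity increment of one person.
        counts, A = [], []
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.1, 5)
            delta = 0.0
            if len(faces) > 0:
                x, y, w, h = faces[0]
                face = frame[y:y + h, x:x + w]
                # Small rectangle inside the face sets the reference automatically.
                ref = sample_reference(face[h // 3:2 * h // 3, w // 3:2 * w // 3])
                counts.append(count_model_pixels(face, ref))
                # A rise followed by a fall exceeding m counts as one basic action.
                if len(counts) >= 3 and counts[-2] > counts[-3] \
                        and counts[-2] - counts[-1] > M:
                    delta = DELTA_H
            A.append(delta)
        return A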
Step two, computing the correlation between every two persons, drawing the social-relation graph, and mining the group relations corresponding to the characteristic subgraphs;
1. Activity value of a single person.
Every k frames of the video form one time interval, and d denotes the d-th frame within the k-frame interval;

A_r(d) = Σ_n δh

meaning the accumulated change value after the person completes n basic actions within frame d, where δh denotes the different change values corresponding to the different basic actions, r is the person's number and d the frame index;

A_r^{y,arv} = { Σ_{d=1}^{k} A_r(d) } / k

meaning the average of the A_r(d) values over the y-th time interval, starting at d = 1 and ending at d = k, where y denotes the y-th interval, r is the person's number and d the frame index;
2. Computing the correlation between every two persons.
A cross-correlation is computed between every pair of persons:

F_τ^{y,(l,r)} = { Σ_{d=1}^{k−τ} |(A_l^y(d) − A_l^{y,arv})(A_r^y(d+τ) − A_r^{y,arv})| } / (k − τ),  for τ > 0;

F_τ^{y,(l,r)} = { Σ_{d=1}^{k+τ} |(A_l^y(d−τ) − A_l^{y,arv})(A_r^y(d) − A_r^{y,arv})| } / (k + τ),  for τ < 0;

F_τ^{y,(l,r)} (τ < 0) ≡ F_{−τ}^{y,(r,l)} (−τ > 0)

F_τ^{y,(l,r)} is the average of the summed correlation values of the two persons (l, r) at time lag τ within the y-th time interval; the sign of τ determines the direction, with τ > 0 and τ < 0 corresponding to the two opposite arrow directions;

F_max^{y,(l,r)} = max( F_τ^{y,(l,r)} ),  −k < τ < k

F_max^{y,(l,r)} means that, over the lag range τ ∈ (−k, k) within the y-th interval, the maximum of the pairwise correlation averages F_τ^{y,(l,r)} obtained for the different values of τ is selected, and the corresponding τ is kept to decide the arrow direction (see FIG. 2 and FIG. 3);
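Under the definitions above, the following sketch computes F_τ^{y,(l,r)} and F_max^{y,(l,r)} for one interval from the stored activity series; the function name and array layout are illustrative, not taken from the patent.

    import numpy as np

    def f_max(A_l, A_r, k):
        # A_l, A_r: activity series A_l^y(d), A_r^y(d) over one k-frame interval.
        A_l = np.asarray(A_l, dtype=float)[:k]
        A_r = np.asarray(A_r, dtype=float)[:k]
        m_l, m_r = A_l.mean(), A_r.mean()        # A_l^{y,arv}, A_r^{y,arv}
        best_F, best_tau = -np.inf, 0
        for tau in range(-(k - 1), k):
            if tau >= 0:                         # person l leads person r
                prod = np.abs((A_l[:k - tau] - m_l) * (A_r[tau:] - m_r))
                F = prod.sum() / (k - tau)
            else:                                # person r leads person l
                prod = np.abs((A_l[-tau:] - m_l) * (A_r[:k + tau] - m_r))
                F = prod.sum() / (k + tau)
            if F > best_F:
                best_F, best_tau = F, tau        # keep tau to orient the arrow
        return best_F, best_tau

The sign of the returned lag then decides the arrow between l and r, matching FIG. 2 and FIG. 3.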
3. Drawing the social-relation graph (social graph for short).
The maximum correlation coefficients F_max^{y,(l,r)} between all pairs of persons in the group are sorted and a selection condition is set; lines are drawn between all pairs (l, r) meeting the condition, yielding the social graph of the y-th time interval. The social graph is the basis of group relationship mining;
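The patent only states that a selection condition is set over the sorted F_max values; the sketch below uses a top-m cutoff as one plausible such condition (top_m here is a hypothetical edge budget, unrelated to the pixel threshold m above).

    def social_graph_edges(F_max_by_pair, top_m=6):
        # F_max_by_pair: dict mapping a pair (l, r) -> (F_max, tau) for one interval.
        ranked = sorted(F_max_by_pair.items(),
                        key=lambda item: item[1][0], reverse=True)
        edges = []
        for (l, r), (F, tau) in ranked[:top_m]:
            # tau >= 0: arrow from l to r; tau < 0: arrow from r to l.
            edges.append((l, r) if tau >= 0 else (r, l))
        return edges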
4. Mining the groups corresponding to the characteristic subgraphs.
The social graph can be decomposed into characteristic subgraphs such as ray vertices, triangles, stars, quadrilaterals and pentagons; the main characteristic subgraphs are introduced below (a detection sketch follows the list):
the ray-vertex structure (FIG. 4), i.e. two rays sharing the same vertex; it corresponds to the Tn parameter;
the triangle structure (FIG. 5 and FIG. 6), i.e. lines among three persons forming a triangle; it corresponds to the Tr parameter;
the star structure (FIG. 7), an extension of the ray-vertex structure in which several rays meet at one point; it corresponds to the Tn parameter;
the straight-line structure (FIG. 8), a line between two persons; it is universal and appears in every scene. It includes the chained straight-line connection (FIG. 9), in which the lines among all the persons ultimately form a single polyline; it corresponds to the Lt parameter;
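A minimal sketch of detecting these subgraphs in one interval's social graph, treating the drawn lines as an undirected edge set; the helper name is illustrative.

    from itertools import combinations

    def characteristic_subgraphs(edges, people):
        # Build the undirected adjacency implied by the drawn lines.
        adj = {p: set() for p in people}
        for l, r in edges:
            adj[l].add(r)
            adj[r].add(l)
        # Triangle: three persons pairwise connected (Tr parameter).
        triangles = [tri for tri in combinations(people, 3)
                     if all(b in adj[a] for a, b in combinations(tri, 2))]
        # Ray vertex / star: a shared vertex on two or more lines (Tn parameter).
        stars = {p: sorted(adj[p]) for p in people if len(adj[p]) >= 2}
        return triangles, stars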
The groups corresponding to the characteristic subgraphs are analyzed together, preparing for the analysis and prediction of the group relations.
Step three, analyzing and predicting the group relations;
1. Defining the relevant variables:
We define nine variables, Te, Tr, Tn, Lt, Ct, Hs, R, Ji and Jt, for the subsequent calculation; Ct is a matrix and the other parameters are one-dimensional arrays, with the indices of the matrix and arrays corresponding to the numbers l, r, etc. of the persons in the video;
1) the Te parameter reflects speaking efficiency; its initial value is 0, and each time a star appears, if a triangle not containing the person appears within a limited number of frames afterwards, the person's Te parameter increases by the change δn; in the other case, each time the person appears at the vertex of a star, the number of arrows pointing outward from the person is counted and the Te value increases by the corresponding amount of δn;
2) the Tr parameter reflects the degree of participation; it starts at 0, and each time a triangle appears, the Tr value of each person in the triangle changes by δn;
3) the Tn parameter corresponds to the number of speaking events; its initial value is 0, and each time a star appears, the Tn value of the person at its vertex changes by δn;
4) the Hs parameter corresponds to the frequency and amplitude of actions and is not fully independent of the speaking count Tn; Hs = Σ_{d=1}^{f} A_r(d), where f is the total number of frames of the video; values are assigned according to the basic actions completed in each preprocessed frame, and whenever the criterion is met the Hs parameter increases by the change A_r(d);
5) the parameter R = Te/Tn; its intuitive meaning is the rate of effective speech; its role is similar to, but not identical with, that of Te;
6) Ji and Jt are arrow-direction parameters, changed by δn for each person according to the direction of each line segment: if an arrow points outward, as in FIG. 10, the Jt parameter changes by δn; if it points inward, as in FIG. 11, the Ji parameter changes by δn; note that Jt also feeds into Te: when a person satisfies the condition of appearing at the vertex of a star, Te = Te + Jt(d), where Jt(d) denotes the Jt value at frame d and d is the frame number;
7) the Ct and Lt parameters reflect the degree of interaction between two persons in the group; the Lt parameter counts how many times two persons are connected, changing by δn whenever a connection occurs; the Ct matrix represents the degree of speech interaction of two persons, computed by extracting sequences of consecutive stars, regarding two persons who appear consecutively in such a sequence as one interaction, storing it at the matrix position with the corresponding numbers, e.g. (l, r), and then sorting the matrix values to screen out the pairs with strong or weak interaction within the group;
In summary, the parameters can be grouped into three series, used respectively to judge the degree of group interaction, dominance, and the frequency and amplitude of actions; a bookkeeping sketch follows;
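One interval's bookkeeping might look like the following sketch. The increment δn, like δh above, is left unspecified by the patent; the dictionary layout, the omission of the "limited frame" window for Te, and the simplified Ct update shown here are illustrative assumptions rather than the patent's exact rules.

    def update_parameters(P, Ct, triangles, stars, directed_edges, delta_n=1.0):
        # P: per-person accumulators, e.g. P['Tn'][r]; Ct: pairwise interaction matrix.
        for tri in triangles:
            for p in tri:
                P['Tr'][p] += delta_n                 # participation
        for p in stars:
            P['Tn'][p] += delta_n                     # one more speaking event
        for l, r in directed_edges:                   # arrow drawn from l to r
            P['Jt'][l] += delta_n                     # outward arrow at l
            P['Ji'][r] += delta_n                     # inward arrow at r
            P['Te'][l] += delta_n                     # outward arrows also raise Te
            P['Lt'][l] += delta_n
            P['Lt'][r] += delta_n
            Ct[l][r] += delta_n                       # pairwise speech interaction
        for p in P['Tn']:                             # R = Te / Tn, effective speech rate
            P['R'][p] = P['Te'][p] / P['Tn'][p] if P['Tn'][p] else 0.0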
2. Analysis method for group relation mining.
First, the frequency distributions of Ji and Jt are analyzed to uncover the more dominant and the more passive persons in the group relations; second, the Tn and Hs frequency distributions rank each person's speaking count and action frequency; third, the frequency distributions of Te, Tr and R are analyzed to rank each person's speaking effectiveness and degree of participation; fourth, the Lt frequency distribution ranks the strength of the relationship between each pair of persons; finally, the Ct matrix data are analyzed to rank the degree of speech coordination and interaction between persons;
3. Mining and predicting the group relations.
Combining all of the above information, we predict the group relations in the four-person example scene (persons numbered left to right):
Group interaction: persons 1 and 2 hold few and simple conversations, but with good effect, a high degree of interaction and good coordination; person 2 is more closely related to persons 3 and 4 than to person 1; persons 2 and 4 are most often in a talking state, with person 2 the more active and person 4 second; person 3 shows high participation and high speech effectiveness but little speech activity, so we may infer that his speech content is engaging rather than dull; moreover, the interactive cooperation between persons 3 and 4 is very good, indicating that the two have a history of communication and a potential tacit understanding about their relationship or the topic;
Dominance: persons 2 and 4 rank clearly higher, with person 4 the highest; person 4 holds the advantage in conversation, with high speaking efficiency and good participation, so his predicted dominance is stronger than person 2's; person 2 shows a certain contradiction, exhibiting both dominance and passivity;
Actions: persons 1 and 2 act more; we infer that persons 3 and 4 make fewer head movements, and also that person 1, although participating little, is not detached from the group;
We attempt to reconstruct the measured four-person scene in words: good communication takes place between persons 1 and 2, between 2 and 3, and between 3 and 4, with many close conversations, and the four people as a whole are closely related. One possibility is a high-spirited chat among friends, since in terms of intimacy most of the relationships can be judged closer than those between strangers; another possibility is that persons 1 and 2 stand in opposition to the others but, judging from status and posture, enjoy high prestige, which would fit a meeting or negotiation scene;
In summary, most of the predictions match the actual situation. In the real scene, person 1 is the mother, person 2 the daughter, person 3 the son and person 4 the father (from left to right); the four were visiting an elder in hospital and chatting in a corridor, the conversational atmosphere was humorous, and the coordination among the persons conforms to the inferences above; the final significance-drawn lines are given in FIG. 12 and FIG. 13.
Step four, analyzing the actions, expressions and states of a single person:
From the group relation analysis and prediction, individual action expression and character traits can be extracted further;
1. Single-person action expression.
First, judge whether the person is active and whether there are many head movements (such as nodding) or limb movements; second, analyze the person's speaking count (the value gives the specific number) and whether the person is talkative; third, judge whether the person's speaking efficiency is high, whether the actions are effective and whether they influence others, i.e. whether the speech serves its function; finally, judge whether the person's dominance is high or low and whether the behavior is active or passive;
In the example, person 1 is on the whole in a listening state and speaks little; person 2 speaks a great deal and is active in both language and action, perhaps speaking vividly and agreeably or summarizing the discussion to general approval; person 3 also joins the conversation from time to time and voices agreement, but is comparatively quiet, with slightly low activity; person 4 speaks slightly less;
2. Predicting the state of a single person:
Person 4 is most likely a highly respected figure or an elder, authoritative and powerful, with person 1 following him, possibly strong in ability but weaker in verbal expression; persons 2 and 3 may belong with the first two, or have slightly lower prestige and a lower posture; person 2 is better at enlivening the atmosphere, speaking much but not idly, perhaps voicing views or adjusting the mood; moreover, person 2 displays a more contradictory character than person 3 and more likely has a strong personality or sharply divided inner and outer emotions;
In conclusion, most of the predictions match the actual situation. In the real scene, person 1 is the mother, person 2 the daughter, person 3 the son and person 4 the father; the four were visiting an elder in hospital and chatting in a corridor, the conversational atmosphere was humorous, and the coordination among the persons conforms to the inferences above; the final significance-drawn lines are given in FIG. 12 and FIG. 13.
The method retains only the information required for the calculation; this belongs to known technology and is not repeated in the invention;
Through the above steps, data are mined from video material whose content is far richer than other data types, and a wider range of video material can be analyzed and processed, for example in scenes as different as friends chatting, work discussions and business negotiations; real-time computation is supported, and the material is converted into data a computer can recognize and process; comprehensive analysis across more dimensions records changes in the group relations, addresses the problem of relationship variability, and yields finer-grained, deeper crowd interaction relationships; the practical problem of complex multi-person relationships in video is thereby solved, and relationships among multiple people in various scenes are quantified and evaluated effectively; the invention supports future real-time analysis of real-life scenes and provides strong methodological support for analyzing person relationships in complex multi-person scenes.
(III) Advantages and effects
Compared with traditional methods, the analysis method adopted by the invention has the following advantages:
(a) Universality: the method breaks through the previous restriction to numeric or textual data and analyzes video material, whose content is much richer. Changes of scene, lighting and the like in the video, such as camera-lens switching, do not limit its use; parameters need not be reset after a scene change, and extraction is automatic.
(b) Functionality: the mined social relations can reflect the social relations present in different types of group activity (such as friends chatting, work discussion and business negotiation); they are finer-grained and reflect more deeply how people interact in different environments.
(c) Real-time performance: the algorithm is simple and supports real-time computation. Obscure, abstract and complex interpersonal relationships, which would otherwise need human thinking to understand and analyze, are converted into signals a computer can recognize; by recognizing each characteristic subgraph the computer can judge the implicit relationship network of the persons in the video.
(d) Multidimensionality: the group relations obtained from images are more comprehensive and richer; they are analyzed jointly across more dimensions, changes of relations within the group are recorded, and the problem of relationship variability is addressed.
In conclusion, this new method provides powerful methodological support for machine-vision analysis in computer image processing.
Description of the drawings:
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a directional diagram of arrows in the social graph of the present invention.
FIG. 3 is a directional diagram of arrows in the social graph of the present invention.
FIG. 4 is a ray vertex structure of the social graph of the present invention.
FIG. 5 is a triangular structure of the "social graph" of the present invention.
FIG. 6 is a triangular structure of the "social graph" of the present invention.
FIG. 7 is a star structure of the social graph of the present invention.
FIG. 8 is a line structure of the "social graph" according to the present invention.
FIG. 9 is a schematic straight line structure of the social graph according to the present invention.
FIG. 10 shows the "social graph" according to the present invention with the arrows pointing outwards.
FIG. 11 shows the "social graph" according to the present invention with the arrows pointing inward.
FIG. 12 is a final saliency map of the social graph of the present invention.
FIG. 13 is a final saliency map of the social graph of the present invention.
Detailed Description
To make the technical problems addressed by the invention and its technical solution clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses a method for mining social relationships among people from video data; as shown in FIG. 1, the specific steps are as follows:
Step one, preprocessing the person images in the video;
The invention is based on video material data that meet the following requirements:
(a) face information: the faces must satisfy the face-detection conditions, being neither back views nor at an undetectable rotation angle for the whole video; intermittently detected faces are usable;
(b) time information: the length, definition and frame count of the video, where the definition must meet the basic face-detection resolution;
First, face detection is performed on every frame of the video; the detected square face contour is expanded to include the hair region, and a small rectangle is drawn inside each of the two parts, its size and position determined by the size of the detected face contour;
Second, the pixels inside each small rectangle are averaged, and the resulting means of the three color channels serve as the reference values c_b, c_g, c_r of the pixel proportion model, so that reference values are produced automatically. The rationale is that hair and face pixels show characteristic color proportions in many scenes: the three channels keep a relatively stable ratio across different scenes and under different lighting. That ratio, however, is built on the reference values c_b, c_g, c_r, which differ strongly under different lighting in different environments; previously the reference values had to be fixed in a prior test and the experiment run in the same environment before hair and face could be stained. Staining means coloring the pixels that fit the proportion model red and purple (replaceable colors), representing hair and face respectively. The automatic small-rectangle mechanism therefore guarantees that the target pixels are stained automatically in every frame even when the scene changes with the lighting;
Third, we now have a face image with the hair stained red and the face stained purple. The stained pixels are counted and the change compared frame by frame: if the red count first rises and then falls, and the change exceeds a threshold m, one basic action is judged to be complete; likewise, if the purple count rises and then falls with the change exceeding the threshold m, one basic action is judged to be complete;
Fourth, a change value δh is credited to the Hs parameter of a person who completes one basic action. Two further variables affecting Hs are added. One is the offset of the face detection box, used to detect large face displacements: if the offset of the box coordinates over n consecutive frames follows the given rule and the change reaches a set threshold s, the person is judged to have completed one basic action. The other is the number of frames for which a vanished detection box persists, used to handle the interference caused by intermittent face detection; the solution is: if the box disappears for no more than m frames, its original continuity is kept, and if it stays missing beyond m frames, its strength is attenuated and only a small change value δh is credited to the Hs parameter, giving a decreasing trend;
Finally, the total change A_r(d) (explained in step two) of the activity coefficient Hs of each person in each frame is stored in a two-dimensional array, to be called when the correlations are computed;
Step two, computing the correlation between every two persons, drawing the social-relation graph, and mining the group relations corresponding to the characteristic subgraphs;
1. Activity value of a single person.
Every k frames of the video form one time interval, and d denotes the d-th frame within the k-frame interval;

A_r(d) = Σ_n δh

meaning the accumulated change value after the person completes n basic actions within frame d, where δh denotes the different change values corresponding to the different basic actions, r is the person's number and d the frame index;

A_r^{y,arv} = { Σ_{d=1}^{k} A_r(d) } / k

meaning the average of the A_r(d) values over the y-th time interval, starting at d = 1 and ending at d = k, where y denotes the y-th interval, r is the person's number and d the frame index;
2. Computing the correlation between every two persons.
A cross-correlation is computed between every pair of persons:

F_τ^{y,(l,r)} = { Σ_{d=1}^{k−τ} |(A_l^y(d) − A_l^{y,arv})(A_r^y(d+τ) − A_r^{y,arv})| } / (k − τ),  for τ > 0;

F_τ^{y,(l,r)} = { Σ_{d=1}^{k+τ} |(A_l^y(d−τ) − A_l^{y,arv})(A_r^y(d) − A_r^{y,arv})| } / (k + τ),  for τ < 0;

F_τ^{y,(l,r)} (τ < 0) ≡ F_{−τ}^{y,(r,l)} (−τ > 0)

F_τ^{y,(l,r)} is the average of the summed correlation values of the two persons (l, r) at time lag τ within the y-th time interval; the sign of τ determines the direction, with τ > 0 and τ < 0 corresponding to the two opposite arrow directions;

F_max^{y,(l,r)} = max( F_τ^{y,(l,r)} ),  −k < τ < k

F_max^{y,(l,r)} means that, over the lag range τ ∈ (−k, k) within the y-th interval, the maximum of the pairwise correlation averages F_τ^{y,(l,r)} obtained for the different values of τ is selected, and the corresponding τ is kept to decide the arrow direction (see FIG. 2 and FIG. 3);
3. Drawing the social-relation graph (social graph for short).
The maximum correlation coefficients F_max^{y,(l,r)} between all pairs of persons in the group are sorted and a selection condition is set; lines are drawn between all pairs (l, r) meeting the condition, yielding the social graph of the y-th time interval. The social graph is the basis of group relationship mining;
4. Mining the groups corresponding to the characteristic subgraphs.
The social graph can be decomposed into characteristic subgraphs such as ray vertices, triangles, stars, quadrilaterals and pentagons; the main characteristic subgraphs are introduced below:
the ray-vertex structure (FIG. 4), i.e. two rays sharing the same vertex; it corresponds to the Tn parameter;
the triangle structure (FIG. 5 and FIG. 6), i.e. lines among three persons forming a triangle; it corresponds to the Tr parameter;
the star structure (FIG. 7), an extension of the ray-vertex structure in which several rays meet at one point; it corresponds to the Tn parameter;
the straight-line structure (FIG. 8), a line between two persons; it is universal and appears in every scene. It includes the chained straight-line connection (FIG. 9), in which the lines among all the persons ultimately form a single polyline; it corresponds to the Lt parameter;
The groups corresponding to the characteristic subgraphs are analyzed together, preparing for the analysis and prediction of the group relations.
Step three, analyzing and predicting the group relations;
1. Defining the relevant variables:
We define nine variables, Te, Tr, Tn, Lt, Ct, Hs, R, Ji and Jt, for the subsequent calculation; Ct is a matrix and the other parameters are one-dimensional arrays, with the indices of the matrix and arrays corresponding to the numbers l, r, etc. of the persons in the video;
1) the Te parameter reflects speaking efficiency; its initial value is 0, and each time a star appears, if a triangle not containing the person appears within a limited number of frames afterwards, the person's Te parameter increases by the change δn; in the other case, each time the person appears at the vertex of a star, the number of arrows pointing outward from the person is counted and the Te value increases by the corresponding amount of δn;
2) the Tr parameter reflects the degree of participation; it starts at 0, and each time a triangle appears, the Tr value of each person in the triangle changes by δn;
3) the Tn parameter corresponds to the number of speaking events; its initial value is 0, and each time a star appears, the Tn value of the person at its vertex changes by δn;
4) the Hs parameter corresponds to the frequency and amplitude of actions and is not fully independent of the speaking count Tn; Hs = Σ_{d=1}^{f} A_r(d), where f is the total number of frames of the video; values are assigned according to the basic actions completed in each preprocessed frame, and whenever the criterion is met the Hs parameter increases by the change A_r(d);
5) the parameter R = Te/Tn; its intuitive meaning is the rate of effective speech; its role is similar to, but not identical with, that of Te;
6) Ji and Jt are arrow-direction parameters, changed by δn for each person according to the direction of each line segment: if an arrow points outward, as in FIG. 10, the Jt parameter changes by δn; if it points inward, as in FIG. 11, the Ji parameter changes by δn; note that Jt also feeds into Te: when a person satisfies the condition of appearing at the vertex of a star, Te = Te + Jt(d), where Jt(d) denotes the Jt value at frame d and d is the frame number;
7) the Ct and Lt parameters reflect the degree of interaction between two persons in the group; the Lt parameter counts how many times two persons are connected, changing by δn whenever a connection occurs; the Ct matrix represents the degree of speech interaction of two persons, computed by extracting sequences of consecutive stars, regarding two persons who appear consecutively in such a sequence as one interaction, storing it at the matrix position with the corresponding numbers, e.g. (l, r), and then sorting the matrix values to screen out the pairs with strong or weak interaction within the group;
In summary, the parameters can be grouped into three series, used respectively to judge the degree of group interaction, dominance, and the frequency and amplitude of actions;
2. Analysis method for group relation mining.
First, the frequency distributions of Ji and Jt are analyzed to uncover the more dominant and the more passive persons in the group relations; second, the Tn and Hs frequency distributions rank each person's speaking count and action frequency; third, the frequency distributions of Te, Tr and R are analyzed to rank each person's speaking effectiveness and degree of participation; fourth, the Lt frequency distribution ranks the strength of the relationship between each pair of persons; finally, the Ct matrix data are analyzed to rank the degree of speech coordination and interaction between persons;
3. Mining and predicting the group relations.
Combining all of the above information, we predict the group relations in the four-person example scene (persons numbered left to right):
Group interaction: persons 1 and 2 hold few and simple conversations, but with good effect, a high degree of interaction and good coordination; person 2 is more closely related to persons 3 and 4 than to person 1; persons 2 and 4 are most often in a talking state, with person 2 the more active and person 4 second; person 3 shows high participation and high speech effectiveness but little speech activity, so we may infer that his speech content is engaging rather than dull; moreover, the interactive cooperation between persons 3 and 4 is very good, indicating that the two have a history of communication and a potential tacit understanding about their relationship or the topic;
Dominance: persons 2 and 4 rank clearly higher, with person 4 the highest; person 4 holds the advantage in conversation, with high speaking efficiency and good participation, so his predicted dominance is stronger than person 2's; person 2 shows a certain contradiction, exhibiting both dominance and passivity;
Actions: persons 1 and 2 act more; we infer that persons 3 and 4 make fewer head movements, and also that person 1, although participating little, is not detached from the group;
We attempt to reconstruct the measured four-person scene in words: good communication takes place between persons 1 and 2, between 2 and 3, and between 3 and 4, with many close conversations, and the four people as a whole are closely related. One possibility is a high-spirited chat among friends, since in terms of intimacy most of the relationships can be judged closer than those between strangers; another possibility is that persons 1 and 2 stand in opposition to the others but, judging from status and posture, enjoy high prestige, which would fit a meeting or negotiation scene;
In summary, most of the predictions match the actual situation. In the real scene, person 1 is the mother, person 2 the daughter, person 3 the son and person 4 the father (from left to right); the four were visiting an elder in hospital and chatting in a corridor, the conversational atmosphere was humorous, and the coordination among the persons conforms to the inferences above; the final significance-drawn lines are given in FIG. 12 and FIG. 13.
Step four, analyzing the actions, expressions and states of a single person:
From the group relation analysis and prediction, individual action expression and character traits can be extracted further;
1. Single-person action expression.
First, judge whether the person is active and whether there are many head movements (such as nodding) or limb movements; second, analyze the person's speaking count (the value gives the specific number) and whether the person is talkative; third, judge whether the person's speaking efficiency is high, whether the actions are effective and whether they influence others, i.e. whether the speech serves its function; finally, judge whether the person's dominance is high or low and whether the behavior is active or passive;
In the example, person 1 is on the whole in a listening state and speaks little; person 2 speaks a great deal and is active in both language and action, perhaps speaking vividly and agreeably or summarizing the discussion to general approval; person 3 also joins the conversation from time to time and voices agreement, but is comparatively quiet, with slightly low activity; person 4 speaks slightly less;
2. Predicting the state of a single person:
Person 4 is most likely a highly respected figure or an elder, authoritative and powerful, with person 1 following him, possibly strong in ability but weaker in verbal expression; persons 2 and 3 may belong with the first two, or have slightly lower prestige and a lower posture; person 2 is better at enlivening the atmosphere, speaking much but not idly, perhaps voicing views or adjusting the mood; moreover, person 2 displays a more contradictory character than person 3 and more likely has a strong personality or sharply divided inner and outer emotions;
In conclusion, most of the predictions match the actual situation. In the real scene, person 1 is the mother, person 2 the daughter, person 3 the son and person 4 the father; the four were visiting an elder in hospital and chatting in a corridor, the conversational atmosphere was humorous, and the coordination among the persons conforms to the inferences above; the final significance-drawn lines are given in FIG. 12 and FIG. 13.
Parts of the invention not described in detail belong to techniques well known to those skilled in the art.
The above describes only some of the embodiments of the invention, but the scope of protection of the invention is not limited to them; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the invention falls within the scope of the invention.

Claims (1)

1. A method for mining social relations of groups based on video materials is characterized in that: the method comprises the following steps:
firstly, preprocessing a figure image in a video;
based on the respective video material data satisfying the following requirements:
(a) face information: the face detection condition is met, the back shadow and the whole course can not be at the rotation angle which can not be detected, and the face which is intermittent can be used;
(b) time information: the length, definition and frame number of the video; wherein the sharpness satisfies a basic face detection resolution;
firstly, carrying out face detection on each frame in a video, carrying out capacity expansion on the obtained square face outline, adding hair parts, and respectively drawing small rectangles in the two parts, wherein the size and the position of each small rectangle are determined by the size of the outline detected by the face;
secondly, averaging each pixel in the small rectangle to obtain the average value of the three colors as the reference value c of the pixel proportion modelb、cg、cr(ii) a Dyeing hair and face; dyeing refers to dyeing the spots according with the proportional model into red and purple which respectively represent hair and face;
thirdly, obtaining a human face with red hair and purple face; counting the number of pixels, and comparing the change condition of each frame, namely judging that one basic action is finished if the red color is increased and then decreased and the change exceeds a certain threshold value m; if the purple color is increased and then decreased and the change exceeds a certain threshold value m, judging that one basic action is finished;
fourthly, giving a change value delta h for the Hs parameter of the character completing the basic action once; simultaneously adding another two variables influencing the parameter Hs, wherein one variable is the offset of a face detection frame, namely a dete frame, and is used for detecting large-amplitude face displacement, and if the change condition of the offset of continuous n frames of coordinates meets a given rule and the variable quantity reaches a set threshold value s, judging that the person completes one basic action; another variable influencing the parameter Hs is the number of frames lasting the disappeared gate frame, which is used to solve the problem of interference caused by the intermittent human face detection frame; the solution is as follows: the frame within m frames disappears and keeps the original continuity, if the frame within m frames still disappears continuously, the strength of the frame is reduced, and a small change value delta h of the Hs parameter is given to show a decreasing trend;
finally, the total variation A_r(d) of the activity coefficient Hs of each person in each frame is calculated and stored in a two-dimensional array, to be called in the subsequent calculations (a sketch of this detection is given below);
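The rise-then-fall rule of step one can be illustrated in code. Below is a minimal Python sketch, assuming the per-frame counts of dyed pixels (red for hair, purple for face) have already been extracted; the names detect_basic_actions, pixel_counts and delta_h are illustrative, not taken from the patent.

```python
def detect_basic_actions(pixel_counts, m):
    """Count 'basic actions' in a per-frame series of dyed-pixel counts:
    one action is recorded whenever the count rises by more than the
    threshold m and then falls back by more than m (rise-then-fall rule).
    Returns the frame indices at which each action completed."""
    actions = []
    base = pixel_counts[0]   # running minimum since the last action
    peak = base              # running maximum since the rise began
    armed = False            # True once the count has risen by more than m
    for d in range(1, len(pixel_counts)):
        c = pixel_counts[d]
        if not armed:
            base = min(base, c)
            if c - base > m:          # the count has risen beyond m ...
                armed, peak = True, c
        else:
            peak = max(peak, c)
            if peak - c > m:          # ... and fallen back beyond m
                actions.append(d)
                base, armed = c, False
    return actions

def activity_series(pixel_counts, m, n_frames, delta_h=1.0):
    """Accumulate the activity series A_r(d): each completed basic action
    of person r at frame d contributes a change value delta_h."""
    A = [0.0] * n_frames
    for d in detect_basic_actions(pixel_counts, m):
        A[d] += delta_h
    return A
```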
step two, calculating the correlation between every two persons, drawing the social relation graph, and mining the group relations corresponding to the characteristic subgraphs;
2.1 single person activity value;
every k frames of the video are taken as one time interval, and d denotes the d-th frame within a k-frame interval;
$$A_r(d)=\sum_{n}\delta h$$
its meaning is that, at frame d, the person has accumulated the change values of n completed basic actions; δh denotes the change value corresponding to each kind of basic action, r is the person number, and d denotes the d-th frame;
$$\bar{A}_r^{\,y}=\frac{1}{k}\sum_{d=1}^{k}A_r(d)$$
meaning the average value of the A_r(d) parameter over the y-th time interval, starting at d = 1 and ending at d = k; y denotes the y-th interval, r is the person number, and d denotes the d-th frame;
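As a small illustration of 2.1, the following Python sketch computes the per-interval averages of the activity series; the array layout and the function name interval_means are assumptions made for the example, not part of the patent.

```python
import numpy as np

def interval_means(A, k):
    """A[r, d] holds the activity change A_r(d) of person r at frame d
    (the two-dimensional array stored in step one). Returns an array of
    shape (persons, intervals) with the mean of A_r(d) over each
    k-frame interval, i.e. the per-interval averages of 2.1."""
    n_persons, n_frames = A.shape
    n_intervals = n_frames // k
    trimmed = A[:, :n_intervals * k]          # drop the incomplete tail
    return trimmed.reshape(n_persons, n_intervals, k).mean(axis=2)
```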
2.2 calculating the correlation between every two persons;
correlation calculation, namely cross-correlation, is performed between every two persons (l, r); within the y-th time interval, for a time difference τ:
when τ ≥ 0:
$$C_{l,r}^{\,y}(\tau)=\frac{1}{k-\tau}\sum_{d=1}^{k-\tau}\left(A_l(d)-\bar{A}_l^{\,y}\right)\left(A_r(d+\tau)-\bar{A}_r^{\,y}\right)$$
when τ < 0:
$$C_{l,r}^{\,y}(\tau)=\frac{1}{k+\tau}\sum_{d=1-\tau}^{k}\left(A_l(d)-\bar{A}_l^{\,y}\right)\left(A_r(d+\tau)-\bar{A}_r^{\,y}\right)$$
$C_{l,r}^{\,y}(\tau)$ means the average of the summed correlation values of the two persons (l, r) at time difference τ in the y-th time interval; the sign of τ is used to judge directivity: τ > 0 and τ < 0 correspond to the two opposite arrow directions;
$$C_{l,r}^{\,y,\max}=\max_{\tau\in(-k,\,k)}C_{l,r}^{\,y}(\tau)$$
indicating that, within the y-th time period, the time difference τ ∈ (−k, k) is scanned and different values of τ yield different $C_{l,r}^{\,y}(\tau)$; the maximum of the averaged correlation values of the two persons is screened out, and the τ value corresponding to this maximum is retained to judge the direction of the arrow;
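To make the lag scan concrete, here is a minimal Python sketch of the pairwise cross-correlation under the reconstruction above (mean-centered products averaged over the overlapping frames); the function cross_correlation and its return convention are illustrative assumptions.

```python
import numpy as np

def cross_correlation(a_l, a_r):
    """a_l, a_r: the k-frame activity series A_l(d), A_r(d) of persons
    l and r within one interval. Scans every lag tau in (-k, k) and
    returns (best_tau, best_value); the sign of best_tau is what the
    method uses to decide the arrow direction between l and r."""
    k = len(a_l)
    x = np.asarray(a_l, dtype=float) - np.mean(a_l)
    y = np.asarray(a_r, dtype=float) - np.mean(a_r)
    best_tau, best_val = 0, -np.inf
    for tau in range(-(k - 1), k):
        if tau >= 0:
            v = np.mean(x[:k - tau] * y[tau:])    # pair x[d] with y[d + tau]
        else:
            v = np.mean(x[-tau:] * y[:k + tau])
        if v > best_val:
            best_tau, best_val = tau, v
    return best_tau, best_val
```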
2.3 drawing the social relationship graph, called the social graph for short;
the maximum correlation coefficients $C_{l,r}^{\,y,\max}$ between all pairs of persons in the group are sorted and a selection condition is set; lines are drawn between all pairs (l, r) that satisfy the condition, yielding the social graph for the y-th time interval; the social graph is the basis of group relationship mining;
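A minimal sketch of this selection step, assuming the per-pair results of the previous snippet are collected in a dict; the threshold-based selection condition and the leader-to-follower arrow convention are assumptions made for the example.

```python
def build_social_graph(max_corr, threshold):
    """max_corr: dict mapping a pair (l, r) with l < r to the tuple
    (best_tau, best_value) returned by cross_correlation(). Sorts the
    pairs by peak correlation, keeps those passing the selection
    condition, and orients each edge by the sign of best_tau."""
    ranked = sorted(max_corr.items(), key=lambda kv: -kv[1][1])
    edges = []
    for (l, r), (tau, val) in ranked:
        if val >= threshold:
            # tau >= 0: arrow drawn from l to r; tau < 0: from r to l
            edges.append((l, r) if tau >= 0 else (r, l))
    return edges
```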
2.4 mining the groups corresponding to the characteristic subgraphs;
the social graph can be decomposed into characteristic subgraphs: ray-vertex, triangle, star, quadrilateral and pentagon structures; the characteristic subgraphs are as follows:
the ray-vertex structure: two rays share the same vertex; it corresponds to the Tn parameter;
the triangle structure: the lines between three persons form a triangle; it corresponds to the Tr parameter;
the star structure: an extension of the ray-vertex structure, in which several rays intersect at one point; it corresponds to the Tn parameter;
the line structure: a connecting line between two persons; it is universal and appears in every scene; the line structure includes sequential straight-line connection, meaning that the connecting lines between all the persons finally form a single polyline; it corresponds to the Lt parameter;
the groups corresponding to the characteristic subgraphs are analyzed comprehensively, preparing for the analysis and prediction of the group relations (a mining sketch is given below);
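The subgraph mining of 2.4 can be sketched in Python, treating the social graph as an undirected adjacency structure; mine_subgraphs and the degree thresholds for ray vertices and star centres are illustrative choices, not taken from the patent.

```python
from itertools import combinations

def mine_subgraphs(edges, n_persons):
    """edges: oriented pairs (l, r) from build_social_graph(), viewed
    here as undirected lines. Finds the triangle structures and the
    ray-vertex / star vertices named in 2.4."""
    adj = {p: set() for p in range(n_persons)}
    for l, r in edges:
        adj[l].add(r)
        adj[r].add(l)
    triangles = [(a, b, c)
                 for a, b, c in combinations(range(n_persons), 3)
                 if b in adj[a] and c in adj[a] and c in adj[b]]
    ray_vertices = [p for p in adj if len(adj[p]) >= 2]  # two rays, one vertex
    star_centres = [p for p in adj if len(adj[p]) >= 3]  # several rays, one point
    return triangles, ray_vertices, star_centres
```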
step three, analyzing and predicting the group relations;
3.1 defining the relevant variables:
nine variables Te, Tr, Tn, Lt, Ct, Hs, R, Ji and Jt are defined for the subsequent calculation; Ct is a matrix and the other parameters are one-dimensional arrays; the indices within the matrix and the arrays correspond to the person numbers l and r in the video;
1) the Te parameter reflects speaking efficiency; its initial value is 0; each time a star appears, if a triangle not containing the person appears within a limited number of frames afterwards, the person's Te parameter increases by a variation δn; in the other case, each time the person appears at a star vertex, the number of the person's outward-pointing arrows is counted and the corresponding Te value increases by δn;
2) the Tr parameter reflects the degree of participation; its initial value is 0; each time a triangle appears, the Tr value of each person in the triangle changes by δn;
3) the Tn parameter corresponds to the number of speaking turns; its initial value is 0; each time a star appears, the Tn value of the person at its vertex changes by δn;
4) the Hs parameter corresponds to the frequency and amplitude of actions and is not completely independent of the speaking count Tn;
$$Hs_r=\sum_{d=1}^{f}A_r(d)$$
where f is the total number of frames of the video; values are assigned according to the basic actions completed in each preprocessed frame: whenever the criterion is met, the Hs parameter increases by the variation A_r(d);
5) the parameter R = Te/Tn; its intuitive meaning is the effective rate of speech;
6) Ji and Jt are arrow-directivity parameters; the Ji and Jt of each person change by δn according to the directivity of each line segment: if the arrow points outwards, the Jt parameter changes by δn; if the arrow points inwards, the Ji parameter changes by δn; Jt also feeds into the Te parameter: when a person satisfies the condition of appearing at a star vertex, Te = Te + Jt(d), where Jt(d) denotes the Jt value at frame d and d denotes the frame number;
7) the Ct and Lt parameters reflect the degree of interaction between two persons in the group; the Lt parameter counts the number of connections between two persons, its value changing by δn whenever a connection occurs; the Ct matrix represents the degree of speech interaction between two persons; it is calculated by extracting the sequence of consecutive stars, treating any two persons appearing consecutively in that sequence as one interaction and accumulating it at the matrix position (l, r); the matrix values are then sorted to screen out the strongly and weakly interacting person pairs in the group;
these parameters can be grouped into three series, used respectively to judge the degree of group interaction, the dominance, and the frequency and amplitude of actions; a sketch of the parameter updates follows below;
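The accumulators of 3.1 can be collected in a small class. The following Python sketch fixes δn = 1 and simplifies the trigger conditions (for instance, it omits the limited-frame window for Te), so all names and update rules here are illustrative rather than the patent's exact procedure.

```python
import numpy as np

DELTA_N = 1.0  # illustrative value of the variation delta-n

class GroupParams:
    """Per-person accumulators Te, Tr, Tn, Ji, Jt and the pairwise
    Lt / Ct tables of section 3.1 (simplified)."""
    def __init__(self, n_persons):
        self.Te = np.zeros(n_persons)               # speaking efficiency
        self.Tr = np.zeros(n_persons)               # participation (triangles)
        self.Tn = np.zeros(n_persons)               # speaking turns (star vertices)
        self.Ji = np.zeros(n_persons)               # inward-pointing arrows
        self.Jt = np.zeros(n_persons)               # outward-pointing arrows
        self.Lt = np.zeros((n_persons, n_persons))  # pairwise connection counts
        self.Ct = np.zeros((n_persons, n_persons))  # pairwise speech interactions

    def on_edge(self, l, r):
        """An oriented line from l to r: outward arrow at l, inward at r."""
        self.Lt[l, r] += DELTA_N
        self.Jt[l] += DELTA_N
        self.Ji[r] += DELTA_N

    def on_triangle(self, a, b, c):
        for p in (a, b, c):
            self.Tr[p] += DELTA_N

    def on_star(self, centre):
        self.Tn[centre] += DELTA_N
        self.Te[centre] += self.Jt[centre]          # Te = Te + Jt at a star vertex

    def effective_rate(self):
        """R = Te / Tn, guarding against persons who never spoke."""
        return np.divide(self.Te, self.Tn,
                         out=np.zeros_like(self.Te), where=self.Tn > 0)

def speech_interactions(params, star_centres):
    """Ct: two persons appearing consecutively in the star sequence
    count as one speech interaction."""
    for l, r in zip(star_centres, star_centres[1:]):
        if l != r:
            params.Ct[l, r] += 1
```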
3.2 the group relation mining and analysis method;
first, the frequency distributions of Ji and Jt are analyzed to mine the strongly dominant persons and the more passive persons in the group relations; second, the Tn and Hs frequency distributions are used to rank each person's number of speaking turns and frequency of actions; third, the frequency distributions of Te, Tr and R are analyzed to rank each person's speaking effectiveness and degree of participation; fourth, the strength of each pairwise relationship is ranked through the Lt frequency distribution; finally, the Ct matrix data is analyzed to rank the degree of speech coordination and interaction between the persons;
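All of these orderings reduce to ranking persons by a one-dimensional parameter array; a generic helper such as the hypothetical rank_persons below is enough.

```python
def rank_persons(values, names=None):
    """Rank persons by a 1-D parameter array (e.g. Tn, Hs, Te or R),
    highest first; returns (person, value) pairs."""
    order = sorted(range(len(values)), key=lambda p: -values[p])
    return [((names[p] if names else p), values[p]) for p in order]

# e.g. rank_persons(params.Jt) surfaces the dominant persons,
#      rank_persons(params.Ji) the more passive ones.
```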
3.3 mining and predicting the group relations;
combining all the information above, the group relations of the four-person scene are predicted from left to right:
group interaction: persons No. 1 and No. 2 have few conversations, and those are simple but effective, with a high degree of interaction and coordination; person No. 2 is more closely related to persons No. 3 and No. 4 than person No. 1 is; persons No. 2 and No. 4 are more often in a talking state, with No. 2 the more active and No. 4 second; person No. 3 shows high participation and a strong speaking effect but little speaking activity, from which it can be inferred that the speech content is engaging rather than dull; in addition, the interaction between persons No. 3 and No. 4 is very good, indicating that the two have been through a process of communication and share a potential tacit understanding of their relationship and topics;
dominance: persons No. 2 and No. 4 rank significantly higher, with person No. 4 the highest; person No. 4 has the advantage in conversation, high speaking efficiency and good participation, so his dominance is predicted to be stronger than that of person No. 2; person No. 2 shows a certain contradiction, exhibiting both dominance and passivity;
actions: persons No. 1 and No. 2 move more, while persons No. 3 and No. 4 show fewer head movements; it can thus be inferred that person No. 1 has low participation but is not detached from the group;
an attempt is made to describe the measured four-person scene in language: good communication takes place between persons 1 and 2 and between persons 3 and 4, with fairly close conversation, and the four-person relationship as a whole is close; one reading is a high-spirited chat among friends, in which, judging by intimacy, most of the pairwise relationships are friendships rather than relations between strangers; the other reading is that persons No. 1 and No. 2 stand on the opposite side from the others but, judging from status and posture, hold the higher position, suggesting a negotiation or meeting scene;
most of the prediction results conform to the actual situation: in the real scene, No. 1 is the mother, No. 2 the daughter, No. 3 the son and No. 4 the father; the four are visiting an elderly person in a hospital and chatting in a corridor, the conversational atmosphere is humorous, and the coordination among the persons conforms to the foregoing inference hypotheses; the final salient connection lines are given;
step four, analyzing the action expression and the state of a single person:
the action expression and character traits of a single person can be further extracted from the analysis and prediction of the group relations;
4.1 single-person action expression;
first, judging whether the person is active and whether there are many head movements; second, analyzing the person's number of speaking turns and whether the person is talkative; third, judging whether the person's speaking efficiency is high, whether the actions are effective, whether the person influences others and whether the speech plays a role; finally, judging whether the person's dominance is high or low and whether the behavior is active or passive;
person No. 1 is on the whole in a listening state, with a small amount of speech; person No. 2 speaks the most and is the most active in language and action, assured in voice, confident in content, often giving summarizing remarks and dominating the exchange; person No. 3 also joins the conversation from time to time and draws responses, but is relatively quiet, with slightly low activity; person No. 4 speaks somewhat less;
4.2 prediction of the single-person state:
person No. 4 is the most highly regarded figure and the elder, authoritative and powerful; person No. 1 comes next, strong but less given to verbal expression; persons No. 2 and No. 3 are figures of slightly lower standing and lower posture, subordinate to the first two; among them, No. 2 is the better at enlivening the atmosphere, speaking much but not idly, both voicing opinions and regulating the mood; in addition, person No. 2 shows a more contradictory character than person No. 3, more individualistic, with polarized inner and outer emotions;
most of the prediction results conform to the actual situation; in the real scene, No. 1 is the mother, No. 2 the daughter, No. 3 the son and No. 4 the father; the four are visiting an elderly person in a hospital and chatting in a corridor, the conversational atmosphere is humorous, and the coordination among the persons conforms to the foregoing inference hypotheses; the final salient connection lines are given;
throughout the face detection, the pixel-average calculation and the graph drawing, only the information required by the calculation is retained.
CN201711327006.XA 2017-12-13 2017-12-13 Method for mining social relation of group based on video material Active CN107992598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711327006.XA CN107992598B (en) 2017-12-13 2017-12-13 Method for mining social relation of group based on video material

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711327006.XA CN107992598B (en) 2017-12-13 2017-12-13 Method for mining social relation of group based on video material

Publications (2)

Publication Number Publication Date
CN107992598A CN107992598A (en) 2018-05-04
CN107992598B true CN107992598B (en) 2022-03-15

Family

ID=62037761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711327006.XA Active CN107992598B (en) 2017-12-13 2017-12-13 Method for mining social relation of group based on video material

Country Status (1)

Country Link
CN (1) CN107992598B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428549A (en) * 2019-10-31 2020-07-17 深圳市睿洋图志科技有限公司 Social moving image big data-based character information analysis method and system
US20210191975A1 (en) * 2019-12-20 2021-06-24 Juwei Lu Methods and systems for managing image collection
CN112433655B (en) * 2020-12-04 2021-09-07 武汉迈异信息科技有限公司 Information flow interaction processing method based on cloud computing and cloud computing verification interaction center

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778224A (en) * 2015-03-26 2015-07-15 南京邮电大学 Target object social relation identification method based on video semantics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8284990B2 (en) * 2008-05-21 2012-10-09 Honeywell International Inc. Social network construction based on data association
US20100036875A1 (en) * 2008-08-07 2010-02-11 Honeywell International Inc. system for automatic social network construction from image data
US9159362B2 (en) * 2013-03-15 2015-10-13 Disney Enterprises, Inc. Method and system for detecting and recognizing social interactions in a video


Also Published As

Publication number Publication date
CN107992598A (en) 2018-05-04

Similar Documents

Publication Publication Date Title
Duffy et al. Gendered visibility on social media: Navigating Instagram’s authenticity bind
US20220067946A1 (en) Video background subtraction using depth
Deeb-Swihart et al. Selfie-presentation in everyday life: A large-scale characterization of selfie contexts on instagram
US11019284B1 (en) Media effect application
Holmes Digital diplomacy and international change management
US11081142B2 (en) Messenger MSQRD—mask indexing
US10742574B2 (en) Method and device for implementing instant communication
US8606721B1 (en) Implicit social graph edge strengths
CN107992598B (en) Method for mining social relation of group based on video material
US20150370830A1 (en) Ranking and selecting images for display from a set of images
DE202017105871U1 (en) Actionable suggestions for activities
US11861940B2 (en) Human emotion recognition in images or video
CN103984775A (en) Friend recommending method and equipment
Yu et al. Inferring user profile attributes from multidimensional mobile phone sensory data
US11641445B2 (en) Personalized automatic video cropping
Sarker Mobile data science: Towards understanding data-driven intelligent mobile applications
WO2015131306A1 (en) Dynamics of tie strength from social interaction
DE202017105869U1 (en) Contextual automatic grouping
CN110209704B (en) User matching method and device
CN109635138B (en) Social relationship establishing method and system based on similar appearances
Perez et al. Familiar Strangers detection in online social networks
Shruti The “smart” women: How South Asian women negotiate their social and cultural space through mobile technology
CN111182323A (en) Image processing method, device, client and medium
Kim et al. SpinRadar: a spontaneous service provision middleware for place-aware social interactions
Zein et al. Identifying Circles of Relations from Smartphone Photo Gallery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant