CN110348285B - Social relationship identification method and device based on semantic enhanced network - Google Patents


Info

Publication number
CN110348285B
CN110348285B CN201910434462.7A
Authority
CN
China
Prior art keywords
semantic
features
face
original image
feature
Prior art date
Legal status
Active
Application number
CN201910434462.7A
Other languages
Chinese (zh)
Other versions
CN110348285A (en)
Inventor
闫海滨
宋朝辉
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201910434462.7A priority Critical patent/CN110348285B/en
Publication of CN110348285A publication Critical patent/CN110348285A/en
Application granted granted Critical
Publication of CN110348285B publication Critical patent/CN110348285B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The invention discloses a social relationship identification method and device based on a semantic enhancement network. The method comprises the following steps: acquiring an original image, the original image comprising a plurality of face sub-images; extracting the face features in each face sub-image; extracting global features from the original image using a pre-trained semantic enhancement network; and identifying, according to the face features and the global features, the social relationships of the people corresponding to the faces in the original image, thereby improving the accuracy of social relationship identification.

Description

Social relationship identification method and device based on semantic enhanced network
Technical Field
The invention relates to the technical field of computer vision, in particular to a social relationship identification method and device based on a semantic enhanced network.
Background
Automatic social relationship identification aims to identify, from a given multi-person picture, the social relationships among the people in it. Social relationship recognition has rich applications in intelligent systems, social networks, and similar areas. Although some recent research has attempted this goal, accurately identifying social relationships remains a challenging task.
Existing social relationship identification techniques fall into three categories: face-based, attribute-based, and environment-based methods. Face-based methods attempt to infer social relationships from a pair of face images. However, a pair of face images cannot fully and unambiguously represent a social relationship. For example, if two people in a photo of a sporting event wear similar clothes, they are likely to be trusted partners; conversely, if they wear different clothing, they are likely to be competitors. Because the face-based approach considers only a small amount of information, it struggles with social relationship recognition. Attribute-based methods extract representative attributes from the image, such as clothing, age, and expression, and then use these attributes to guide social relationship identification. However, the attributes affecting social relationships are diverse: chalk and a blackboard in a picture often indicate a student-teacher relationship, while a sofa and a television often indicate family or friends. Such diverse attributes are difficult to describe exhaustively and cannot adapt to every scene. Environment-based methods segment a multi-person image into several regions with different meanings, such as faces, flowers, and screens, and then infer the interpersonal relationships from the segmented regions. But social relationships are abstract and subtle, and segmenting the regions independently destroys the inherent relevance of the original scene. All three techniques therefore suffer from low identification accuracy.
Disclosure of Invention
In view of the above, the present invention provides a social relationship identification method and apparatus based on a semantic enhanced network, which can improve the identification accuracy of social relationships.
Based on the above purpose, the invention provides a social relationship identification method based on a semantic enhanced network, which comprises the following steps:
acquiring an original image; the original image comprises a plurality of face subimages;
extracting the face features in each face subimage;
extracting global features from the original image by adopting a pre-trained semantic enhancement network;
and according to the human face features and the global features, identifying the social relations of the people corresponding to the human faces in the original image.
Further, the extracting the face features in each face sub-image specifically includes:
and processing each face subimage by adopting a convolutional neural network sharing weight to obtain the face characteristics in each face subimage.
Further, the semantic enhancement network comprises a feature extractor and a semantic enhancement structure;
extracting global features from the original image by adopting a pre-trained semantic enhancement network, which specifically comprises the following steps:
extracting environmental features from the original image by using the feature extractor;
converting the original image into semantic information;
extracting semantic features from the semantic information by using the semantic enhancement structure;
and combining the environmental features and the semantic features to obtain the global features.
Further, the semantic enhancement structure comprises a first convolutional layer, a second convolutional layer and a third convolutional layer;
the extracting semantic features from the semantic information by using the semantic enhancement structure specifically includes:
performing primary processing on the semantic information by adopting a first convolution layer;
converting the preliminarily processed semantic information into a first characteristic by adopting a second convolution layer;
converting the preliminarily processed semantic information into a second feature by adopting a third convolution layer;
and combining the first characteristic and the second characteristic to obtain the semantic characteristic.
Further, the combining the first feature and the second feature to obtain the semantic feature specifically includes:
and calculating the square of the sum of the first feature and the second feature, and taking the calculation result as the semantic feature.
Further, the first convolutional layer includes one convolutional layer, the second convolutional layer includes one convolutional layer, and the third convolutional layer includes six convolutional layers.
Further, the convolution kernel of the first convolution layer is 7 × 7, the convolution kernel of the second convolution layer is 3 × 3, and the convolution kernels of the six convolution layers of the third convolution layer are 1 × 1, 3 × 3, 1 × 1, 3 × 3, and 1 × 1, respectively; the step size of each convolutional layer is 1.
Further, the combining the environmental features and the semantic features to obtain the global features specifically includes:
combining the environmental features and the semantic features by adopting a residual learning algorithm to obtain combined features, and extracting the global features from the combined features;
the residual error learning algorithm is as follows:
F = M1 + M1 × N1
wherein F is the combined feature, M1 is the environmental feature, and N1 is the semantic feature.
Further, the loss function of the semantic enhancement network is:
Loss = -w_label × (x_label - log Σ_{j=1}^{N} exp(x_j))
wherein x is the recognition result of the social relationship, label is the actual category of the original image, w_label is the weight of label, x_label is the value corresponding to the actual category in the recognition result, N is the number of categories, and x_j is the value corresponding to the j-th category in the recognition result.
The embodiment of the invention also provides a social relationship recognition device based on the semantic enhanced network, which can realize all the processes of the social relationship recognition method based on the semantic enhanced network, and the device comprises:
the image acquisition module is used for acquiring an original image; the original image comprises a plurality of face subimages;
the face feature extraction module is used for extracting the face features in each face subimage;
the global feature extraction module is used for extracting global features from the original image by adopting a pre-trained semantic enhancement network; and the number of the first and second groups,
and the identification module is used for identifying the social relationship of the people corresponding to the faces in the original image according to the face features and the global features.
From the above, the social relationship recognition method and device based on the semantic enhancement network provided by the invention can extract the face features of each face sub-image in the original image, extract the global features from the original image using the pre-trained semantic enhancement network, and then recognize the social relationships of the multiple people in the original image according to the face features and the global features, thereby effectively improving the recognition accuracy of social relationships.
Drawings
FIG. 1 is a schematic flow chart of a social relationship identification method based on a semantic enhanced network according to an embodiment of the present invention;
fig. 2 is a schematic diagram of step S2 in the social relationship identification method based on semantic enhanced network according to the embodiment of the present invention;
FIG. 3 is a schematic diagram of semantic feature extraction in the social relationship identification method based on the semantic enhanced network according to the embodiment of the present invention;
fig. 4 is a schematic diagram of step S3 in the social relationship identification method based on semantic enhanced network according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of a social relationship identification method based on a semantic enhanced network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a social relationship identifying apparatus based on a semantic enhanced network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Referring to fig. 1, a schematic flow chart of a social relationship identification method based on a semantic enhanced network according to an embodiment of the present invention is shown, where the method includes:
s1, acquiring an original image; the original image comprises a plurality of face sub-images.
In this embodiment, the original image is a multi-person image, and the face area of each person in the original image is a corresponding face sub-image, so that the original image has a plurality of face sub-images.
And S2, extracting the face features in each face sub-image.
Specifically, step S2 includes:
and processing each face subimage by adopting a convolutional neural network sharing weight to obtain the face characteristics in each face subimage.
In this embodiment, a face detector is used to obtain the plurality of face sub-images in the original image; for example, as shown in fig. 2, the original image 21 contains two people, yielding two face sub-images 22 and 23. A twin (Siamese) network is then employed to extract facial features from each face sub-image. Specifically, the twin network processes each face sub-image through a set of convolutional neural networks that share weights. As shown in fig. 2, the two face sub-images 22 and 23 are passed through a weight-sharing convolutional neural network 24 to extract the corresponding face features 25. The main part of the twin network is a 34-layer residual network (ResNet-34). The face features provide information from the face and aid the subsequent identification of social relationships.
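The weight-sharing idea can be sketched as follows; the linear layer, feature sizes, and random inputs are illustrative stand-ins for the ResNet-34 backbone and real face crops, not the patent's actual implementation:

```python
import numpy as np

# Sketch of the twin (Siamese) idea: the SAME parameters embed every face
# sub-image, so all faces land in a comparable feature space. A single
# linear layer + ReLU stands in for the ResNet-34 backbone; sizes are
# hypothetical.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 512))   # one weight matrix shared by both faces
b = np.zeros(128)

def embed(face_vec):
    # Map a flattened face sub-image to a 128-d face feature (ReLU).
    return np.maximum(0.0, W @ face_vec + b)

face_a = rng.standard_normal(512)     # stand-in for face sub-image 22
face_b = rng.standard_normal(512)     # stand-in for face sub-image 23
feat_a, feat_b = embed(face_a), embed(face_b)
```

Because the weights are shared, re-embedding the same face always yields the identical feature vector, which is what makes the two face features directly comparable downstream.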
And S3, extracting global features from the original image by adopting a pre-trained semantic enhancement network.
In particular, the semantic enhancement network includes a feature extractor and a semantic enhancement structure.
Step S3 specifically includes:
extracting environmental features from the original image by using the feature extractor;
converting the original image into semantic information;
extracting semantic features from the semantic information by using the semantic enhancement structure;
and combining the environmental features and the semantic features to obtain the global features.
In this embodiment, the feature extractor extracts the environmental features directly from the original image; these are the core environmental information distilled from the rich original image and help the neural network understand the correlation between the background in which a social relationship occurs and the objects present. A semantic segmentation network (DeepLab) converts the original image into rich semantic information. The semantic enhancement structure then extracts semantic features from this semantic information; the semantic features highlight representative objects in the environment and help the neural network better judge the social relationships of a specific scene. In addition, the semantic features not only effectively provide information about the representative objects in the environment but also retain some object features with small influence, further helping the neural network understand the relevance among objects in the environment.
In order to better utilize rich semantic information to help the identification of social relationships, the semantic enhancement structure adopts a plug-and-play design idea to process the semantic information.
Further, the semantic enhancement structure comprises a first convolutional layer, a second convolutional layer and a third convolutional layer;
the method for extracting semantic features from the semantic information by adopting the pre-trained semantic enhancement network specifically comprises the following steps:
performing primary processing on the semantic information by adopting a first convolution layer;
converting the preliminarily processed semantic information into a first characteristic by adopting a second convolution layer;
converting the preliminarily processed semantic information into a second feature by adopting a third convolution layer;
and combining the first characteristic and the second characteristic to obtain the semantic characteristic.
Wherein the first convolutional layer comprises one convolutional layer, the second convolutional layer comprises one convolutional layer, and the third convolutional layer comprises six convolutional layers.
It should be noted that, as shown in fig. 3, a convolutional layer with a larger convolution kernel (the first convolutional layer) first performs preliminary processing on the semantic information 32, and then convolutional layers of two different depths (the second and third convolutional layers) continue processing the preliminary result. The second convolutional layer, which comprises a single convolutional layer, converts the preliminarily processed semantic information into a coarse feature expression, namely the first feature. The third convolutional layer, which comprises six convolutional layers, converts the preliminarily processed semantic information into a fine feature expression, namely the second feature. Finally, an activation function combines the first feature and the second feature and outputs the semantic feature N1. Combining information at different levels can significantly improve the performance of a neural network. In this embodiment, the semantic enhancement structure converts rich semantic information into readily usable semantic features while preserving the objects highlighted in the semantic information and their inherent relevance.
Specifically, each convolution layer in the semantic enhancement structure is followed by one batch normalization layer (BN) and an activation layer, as shown in fig. 3, the step size of each convolution layer is 1, the convolution kernel of the first convolution layer is 7 × 7, the convolution kernel of the second convolution layer is 3 × 3, and the convolution kernels of the six convolution layers of the third convolution layer are 1 × 1, 3 × 3, 1 × 1, 3 × 3, and 1 × 1, respectively. For convolution layers with convolution kernel 7 × 7, the padding value is set to 3; for the convolution layer with convolution kernel of 3 x 3, the filling value is set to 1, so that the output feature of the semantic enhancement structure is consistent with the input feature in size, and the semantic enhancement structure can be conveniently and quickly inserted into any network layer.
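The size-preservation claim can be checked with the standard convolution output-size formula; the 224-pixel input below is an illustrative assumption, not a size stated in the patent:

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# With stride 1, the padding values quoted in the text keep the feature map
# size unchanged, which is what lets the semantic enhancement structure be
# inserted into an existing network without any reshaping.
assert conv_out_size(224, kernel=7, padding=3) == 224  # first conv layer
assert conv_out_size(224, kernel=3, padding=1) == 224  # 3x3 conv layers
assert conv_out_size(224, kernel=1, padding=0) == 224  # 1x1 conv layers
```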
Further, the combining the first feature and the second feature to obtain the semantic feature specifically includes:
and calculating the square of the first feature sum and the second feature sum, and taking the calculation result as the semantic feature.
It should be noted that the first feature and the second feature are combined by addition, and a squaring operation is then applied as the activation function to obtain the semantic feature N1.
Further, the combining the environmental features and the semantic features to obtain the global features specifically includes:
combining the environmental features and the semantic features by adopting a residual learning algorithm to obtain combined features, and extracting the global features from the combined features;
the residual error learning algorithm is as follows:
F = M1 + M1 × N1
wherein F is the combined feature, M1 is the environmental feature, and N1 is the semantic feature.
To maintain the stability and effectiveness of the features, this embodiment combines the environmental features and the semantic features with a residual learning algorithm. The resulting combined feature F is passed through a subsequent convolutional neural network to extract refined global features. For any multi-person image, the semantic enhancement network captures the background environment, the representative objects, and the internal associations between objects, producing a global feature that facilitates the subsequent identification of social relationships.
For example, as shown in fig. 4, the feature extractor 31 extracts the environmental feature M1 from the original image 21, the semantic enhancement structure 33 extracts the semantic feature N1 from the semantic information 32, the environmental feature M1 and the semantic feature N1 are combined to obtain the combined feature F, and a convolutional neural network 34 extracts the global feature 35 from the combined feature F.
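The residual combination F = M1 + M1 * N1 can be sketched directly; the 2×2 feature maps below are illustrative stand-ins for real feature tensors:

```python
import numpy as np

# Residual combination from the text: the environmental feature passes
# through unchanged (the identity term), while the semantic feature
# modulates it multiplicatively.
M1 = np.array([[1.0, 2.0], [3.0, 4.0]])   # environmental feature
N1 = np.array([[0.0, 0.5], [1.0, 0.25]])  # semantic feature

F = M1 + M1 * N1  # element-wise residual combination
```

Where the semantic feature is zero, the combined feature simply falls back to the environmental feature, which is the stability property the residual form is meant to provide.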
And S4, identifying the social relationship of the people corresponding to the faces in the original image according to the face features and the global features.
In this embodiment, based on the face features and the global features, a classifier completes the identification of the social relationships. The classifier is implemented with fully connected layers using the ReLU function as the activation function; a plurality of classifiers that do not share parameters are typically used to handle the different social relationship categories separately, yielding the social relationship results for the multiple people in the original image, such as dominant, trusting, friendly, and warm.
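A hedged sketch of the per-category classifier heads; the feature size, head output size, and random parameters are assumptions chosen for illustration, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

# Each social relationship category gets its own fully connected layer with
# its own (non-shared) parameters, applied to the same fused feature vector
# (face features + global features concatenated upstream).
categories = ["dominant", "trusting", "friendly", "warm"]
feature = rng.standard_normal(16)          # stand-in for the fused features
heads = {c: (rng.standard_normal((2, 16)), np.zeros(2)) for c in categories}

# One independent score vector per relationship category.
scores = {c: relu(W @ feature + b) for c, (W, b) in heads.items()}
```

Keeping the heads parameter-independent lets each relationship category be judged on its own, which matches the description of classifiers that do not share parameters.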
For example, as shown in fig. 5, after the two face sub-images 22 and 23 in the original image 21 are acquired, a weight-sharing convolutional neural network 24 extracts the face features 25 from the face sub-images 22 and 23. At the same time, the feature extractor 31 extracts the environmental feature M1 from the original image 21, the semantic enhancement structure 33 extracts the semantic feature N1 from the semantic information 32, and the environmental feature M1 and the semantic feature N1 are combined into the combined feature F, from which a convolutional neural network 34 extracts the global feature 35. Finally, the classifier 51 recognizes the social relationships of the people in the original image 21 based on the face features 25 and the global features 35, and outputs the recognition result.
Further, an adaptive moment estimation (Adam) optimizer is employed when training the semantic enhancement network. The loss function is defined as a weighted cross-entropy function to account for data imbalance. The 34-layer residual network loads pre-trained ImageNet parameters for transfer learning; the learning rate of the convolutional layers is set to 1e-4 and that of the fully connected layers to 1e-3. The loss function is:
Loss = -w_label × (x_label - log Σ_{j=1}^{N} exp(x_j))
In the formula, x is the recognition result of the social relationship, label is the actual category of the original image, w_label is the weight of label, x_label is the value corresponding to the actual category in the recognition result, N is the number of categories, and x_j is the value corresponding to the j-th category in the recognition result.
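This loss is the standard weighted softmax cross-entropy; a minimal pure-Python sketch under that reading, where `label` is assumed to index the categories:

```python
import math

def weighted_cross_entropy(x, label, weights):
    """Weighted softmax cross-entropy:
    loss = -w_label * (x_label - log(sum_j exp(x_j))).
    x: raw scores per category; label: index of the true category;
    weights: per-category weights compensating for class imbalance."""
    log_sum_exp = math.log(sum(math.exp(v) for v in x))
    return -weights[label] * (x[label] - log_sum_exp)

# With uniform scores over N categories, the loss reduces to
# w_label * log(N) -- here 2 * log(4) for a doubly weighted true class.
x = [0.0, 0.0, 0.0, 0.0]
loss = weighted_cross_entropy(x, label=1, weights=[1.0, 2.0, 1.0, 1.0])
```

Raising `w_label` for a rare relationship category scales its gradient contribution up, which is the usual way a weighted cross-entropy counteracts an imbalanced training set.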
The social relationship recognition method based on the semantic enhancement network can extract the face features of each face sub-image in the original image, extract the global features from the original image using the pre-trained semantic enhancement network, and then recognize the social relationships of the multiple people in the original image according to the face features and the global features, thereby effectively improving the recognition accuracy of social relationships.
Correspondingly, the invention also provides a social relationship recognition device based on the semantic enhanced network, which can realize all the processes of the social relationship recognition method based on the semantic enhanced network.
Referring to fig. 6, it is a schematic structural diagram of a social relationship identifying apparatus based on a semantic enhanced network according to an embodiment of the present invention, where the apparatus includes:
the image acquisition module 1 is used for acquiring an original image; the original image comprises a plurality of face subimages;
the face feature extraction module 2 is used for extracting the face features in each face subimage;
the global feature extraction module 3 is used for extracting global features from the original image by adopting a pre-trained semantic enhancement network; and the number of the first and second groups,
and the recognition module 4 is configured to recognize the social relationship of the people corresponding to the multiple faces in the original image according to the face features and the global features.
The social relationship recognition device based on the semantic enhancement network can extract the face features of each face sub-image in the original image, extract the global features from the original image using the pre-trained semantic enhancement network, and then recognize the social relationships of the multiple people in the original image according to the face features and the global features, thereby effectively improving the recognition accuracy of social relationships.
Those of ordinary skill in the art will understand: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A social relationship recognition method based on a semantic enhanced network is characterized by comprising the following steps:
acquiring an original image; the original image comprises a plurality of face subimages;
extracting the face features in each face subimage;
extracting global features from the original image by adopting a pre-trained semantic enhancement network;
according to the face features and the global features, identifying social relations of people corresponding to a plurality of faces in the original image;
the semantic enhancement network comprises a feature extractor and a semantic enhancement structure;
the extracting of the global features from the original image by adopting the pre-trained semantic enhancement network specifically comprises the following steps:
extracting environmental features from the original image by using the feature extractor;
converting the original image into semantic information;
extracting semantic features from the semantic information by using the semantic enhancement structure;
and combining the environmental features and the semantic features to obtain the global features.
2. The social relationship recognition method based on the semantic enhancement network according to claim 1, wherein the extracting the face features in each face sub-image specifically comprises:
and processing each face subimage by adopting a convolutional neural network sharing weight to obtain the face characteristics in each face subimage.
3. The social relationship recognition method based on the semantic enhancement network according to claim 1, wherein the semantic enhancement structure comprises a first convolutional layer, a second convolutional layer, and a third convolutional layer;
and the extracting semantic features from the semantic information by using the semantic enhancement structure specifically comprises:
preliminarily processing the semantic information by using the first convolutional layer;
converting the preliminarily processed semantic information into a first feature by using the second convolutional layer;
converting the preliminarily processed semantic information into a second feature by using the third convolutional layer; and
combining the first feature and the second feature to obtain the semantic features.
4. The social relationship recognition method based on the semantic enhancement network according to claim 3, wherein the combining the first feature and the second feature to obtain the semantic features specifically comprises:
calculating the square of the sum of the first feature and the second feature, and taking the calculation result as the semantic features.
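On one reading of the (machine-translated) claim 4, the two features are combined by squaring their element-wise sum. A minimal sketch of that interpretation; the element-wise form is an assumption, since the claim does not specify the tensor layout:

```python
# One reading of claim 4: the semantic feature is the element-wise
# square of the sum of the first and second features. The element-wise
# interpretation is an assumption made for illustration.

def combine_semantic(first, second):
    return [(a + b) ** 2 for a, b in zip(first, second)]
```

For example, `combine_semantic([1.0, 2.0], [3.0, -2.0])` gives `[16.0, 0.0]`: each pair is summed first, then squared.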
5. The semantic enhanced network-based social relationship recognition method of claim 3, wherein the first convolutional layer comprises one convolutional layer, the second convolutional layer comprises one convolutional layer, and the third convolutional layer comprises six convolutional layers.
6. The semantic enhancement network-based social relationship recognition method according to claim 5, wherein the convolution kernels of the first convolution layer are 7 × 7, the convolution kernels of the second convolution layer are 3 × 3, and the convolution kernels of the six convolution layers of the third convolution layer are 1 × 1, 3 × 3, 1 × 1, 3 × 3, and 1 × 1, respectively; the step size of each convolutional layer is 1.
7. The social relationship recognition method based on the semantic enhancement network according to claim 1, wherein the combining the environmental features and the semantic features to obtain the global features specifically comprises:
combining the environmental features and the semantic features by using a residual learning algorithm to obtain a combined feature, and extracting the global features from the combined feature;
wherein the residual learning algorithm is:
F = M1 + M1 * N1
where F is the combined feature, M1 is the environmental feature, and N1 is the semantic feature.
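The residual combination F = M1 + M1 * N1 of claim 7 can be written element-wise in a few lines. The element-wise product is an assumption for illustration; in the actual network M1 and N1 would be feature maps of matching shape:

```python
# Element-wise sketch of the residual combination in claim 7:
# F = M1 + M1 * N1, where M1 is the environmental feature and
# N1 is the semantic feature. When N1 is all zeros, F reduces to
# M1, so the semantic branch acts as a residual correction.

def residual_combine(m1, n1):
    return [m + m * n for m, n in zip(m1, n1)]
```

Note the identity-preserving property: `residual_combine(m1, [0.0]*len(m1))` returns `m1` unchanged, which is the usual motivation for residual formulations.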
8. The social relationship recognition method based on the semantic enhancement network according to claim 1, wherein the loss function of the semantic enhancement network is:
loss = -w_label * log( exp(x_label) / Σ_{j=1}^{N} exp(x_j) )
where x is the recognition result of the social relationship, label is the actual category of the original image, w_label is the weight of label, x_label is the value corresponding to the actual category in the recognition result, N is the number of categories, and x_j is the value corresponding to the j-th category in the recognition result.
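The loss described in claim 8 is a per-class-weighted softmax cross-entropy. Reconstructing it from the variable definitions in the claim (the original formula is an image placeholder in this text), a minimal sketch is:

```python
import math

# Weighted softmax cross-entropy matching the description in claim 8:
#   loss = -w_label * log( exp(x_label) / sum_j exp(x_j) )
# x is the raw score vector over the N categories, label is the index
# of the true category, and weights[label] is the class weight w_label.
# The exact formula is reconstructed from the claim's variable
# definitions and should be read as an interpretation.

def weighted_ce_loss(x, label, weights):
    denom = sum(math.exp(v) for v in x)
    return -weights[label] * math.log(math.exp(x[label]) / denom)
```

With uniform scores over two classes and unit weight, the loss is log 2 ≈ 0.693; doubling the class weight doubles the loss, which is how the weighting counteracts class imbalance.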
9. A social relationship recognition device based on a semantic enhancement network, capable of implementing the social relationship recognition method based on the semantic enhancement network according to any one of claims 1 to 8, wherein the device comprises:
an image acquisition module, configured to acquire an original image, wherein the original image comprises a plurality of face sub-images;
a face feature extraction module, configured to extract the face features in each face sub-image;
a global feature extraction module, configured to extract global features from the original image by adopting a pre-trained semantic enhancement network; and
an identification module, configured to identify, according to the face features and the global features, the social relationships of the persons corresponding to the plurality of faces in the original image.
CN201910434462.7A 2019-05-23 2019-05-23 Social relationship identification method and device based on semantic enhanced network Active CN110348285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910434462.7A CN110348285B (en) 2019-05-23 2019-05-23 Social relationship identification method and device based on semantic enhanced network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910434462.7A CN110348285B (en) 2019-05-23 2019-05-23 Social relationship identification method and device based on semantic enhanced network

Publications (2)

Publication Number Publication Date
CN110348285A CN110348285A (en) 2019-10-18
CN110348285B true CN110348285B (en) 2021-07-20

Family

ID=68174719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910434462.7A Active CN110348285B (en) 2019-05-23 2019-05-23 Social relationship identification method and device based on semantic enhanced network

Country Status (1)

Country Link
CN (1) CN110348285B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000300A1 (en) * 2015-07-02 2017-01-05 Xiaoou Tang Methods and systems for social relation identification
CN105005774B (en) * 2015-07-28 2019-02-19 中国科学院自动化研究所 A kind of recognition methods of face kinship and device based on convolutional neural networks
CN106951858A (en) * 2017-03-17 2017-07-14 中国人民解放军国防科学技术大学 A kind of recognition methods of personage's affiliation and device based on depth convolutional network
CN107247949B (en) * 2017-08-02 2020-06-19 智慧眼科技股份有限公司 Face recognition method and device based on deep learning and electronic equipment
CN109344759A (en) * 2018-06-12 2019-02-15 北京理工大学 A kind of relatives' recognition methods based on angle loss neural network

Also Published As

Publication number Publication date
CN110348285A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
US20210232813A1 (en) Person re-identification method combining reverse attention and multi-scale deep supervision
Deng et al. Image aesthetic assessment: An experimental survey
US8571332B2 (en) Methods, systems, and media for automatically classifying face images
US8712157B2 (en) Image quality assessment
Jin et al. Deep image aesthetics classification using inception modules and fine-tuning connected layer
CN112818931A (en) Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
US20100021066A1 (en) Information processing apparatus and method, program, and recording medium
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
Manyam et al. Two faces are better than one: Face recognition in group photographs
CN105956051A (en) Information finding method, device and system
CN105956631A (en) On-line progressive image classification method facing electronic image base
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN106649629A (en) System connecting books with electronic resources
CN106897695A (en) A kind of image recognizing and processing equipment, system and method
CN112200844A (en) Method, device, electronic equipment and medium for generating image
Tawhid et al. A gender recognition system from facial image
US8270731B2 (en) Image classification using range information
Lee et al. Property-specific aesthetic assessment with unsupervised aesthetic property discovery
US11631277B2 (en) Change-aware person identification
CN111108508A (en) Facial emotion recognition method, intelligent device and computer-readable storage medium
CN110826534A (en) Face key point detection method and system based on local principal component analysis
CN114375466A (en) Video scoring method and device, storage medium and electronic equipment
CN111339884B (en) Image recognition method, related device and apparatus
US11036970B2 (en) Hybrid deep learning method for gender classification
CN110348285B (en) Social relationship identification method and device based on semantic enhanced network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant