CN111178146A - Method and device for identifying anchor based on face features - Google Patents


Info

Publication number: CN111178146A
Authority: CN (China)
Prior art keywords: anchor, network model, sample, face, face features
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN201911243502.6A
Other languages: Chinese (zh)
Inventors: 张菁, 姚嘉诚, 卓力, 李晨豪, 王立元
Current and original assignee: Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN201911243502.6A
Publication of CN111178146A


Classifications

    • G06V 20/52 — Scenes; surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F 18/2136 — Pattern recognition; feature extraction based on sparsity criteria, e.g. with an overcomplete basis
    • G06F 18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/2411 — Pattern recognition; classification based on the proximity to a decision surface, e.g. support vector machines
    • G06V 40/168 — Human faces; feature extraction; face representation
    • G06V 40/172 — Human faces; classification, e.g. identification


Abstract

An embodiment of the invention provides a method and a device for identifying an anchor based on face features, wherein the method comprises the following steps: capturing a frame image from the live video of an anchor to be identified, inputting the frame image into a pre-trained neural network model, and obtaining the face features of the anchor to be identified from the output of the neural network model; storing the face features into a pre-generated hash bucket according to a locality-sensitive hashing method, and taking at least one feature that falls into the same hash bucket as the face features as the face features of sample anchors to be matched; and calculating the similarity between the face features of each sample anchor and the face features of the anchor to be identified, and taking the sample anchor with the highest similarity as the identity of the anchor to be identified. The embodiment of the invention meets the requirement of identifying an anchor under live-broadcast conditions.

Description

Method and device for identifying anchor based on face features
Technical Field
The invention relates to the technical field of video surveillance, and in particular to a method and a device for identifying an anchor based on face features.
Background
Webcasting, i.e. the internet live-broadcast service, is a new form of internet audio-visual programming led by an anchor, which usually broadcasts news or entertainment programmes to the public in real time in the form of video, audio, images and text. By June 2019, the number of webcast users in China had reached 433 million, accounting for 50.7% of all internet users; among them, game live-streaming users numbered 205 million, or 24.0% of all internet users.
While webcasting attracts more and more users, the temptation of economic gain and lax supervision have also bred various kinds of disorder in the live-broadcast industry. Some anchors attract viewers by any means available for their own benefit; their conduct falls mainly into two categories. One is outright illegal behaviour, such as obscene sexual performance, invasion of privacy, copyright infringement, fraud, incitement, and profaning national symbols; the other is behaviour that, while not illegal, is vulgar and corrodes public morals, such as violent or vulgar speech, flaunting wealth, illegal advertising, and animal abuse. It is not that these anchors do not know that their content violates the regulations; rather, many of them gamble that even if their account is banned by one platform, they can switch to another platform and continue broadcasting, so such behaviour is repeatedly prohibited yet never stops.
In order to curb the spread of such objectionable content on the internet, cross-platform correlation identification of offending anchors is required to clean up the internet environment.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for identifying an anchor based on face features, which overcome the above problems or at least partially solve them.
In a first aspect, an embodiment of the present invention provides a method for identifying an anchor based on face features, comprising:
capturing a frame image from the live video of an anchor to be identified, inputting the frame image into a pre-trained neural network model, and obtaining the face features of the anchor to be identified from the output of the neural network model;
storing the face features into a pre-generated hash bucket according to a locality-sensitive hashing method, and taking at least one feature that falls into the same hash bucket as the face features as the face features of sample anchors to be matched;
and calculating the similarity between the face features of each sample anchor and the face features of the anchor to be identified, and taking the sample anchor with the highest similarity as the identity of the anchor to be identified.
Further, the neural network model comprises a detection network model and a recognition network model;
correspondingly, inputting the frame image into a pre-trained neural network model and taking the output of the neural network model as the face features of the anchor to be identified comprises:
inputting the frame image into the detection network model and outputting a recognition-frame vector;
cropping a face image from the frame image according to the recognition-frame vector;
inputting the face image into the recognition network model and outputting the face features;
wherein the recognition-frame vector is used to represent the position of a recognition frame in the frame image and the probability that the pattern inside the recognition frame is recognized as a face.
Further, the detection network model is a multi-task convolutional neural network; correspondingly, inputting the frame image into the detection network model to obtain the recognition-frame vector specifically comprises:
scaling the frame image into a plurality of images with different scaling ratios;
and inputting the images of different scales into the multi-task convolutional neural network to obtain the recognition-frame vector of each image.
Further, cropping a face image from the frame image according to the recognition-frame vector, inputting it into the recognition network model and outputting the face features is specifically:
cropping a face image from the frame image according to the recognition-frame vector with the highest probability of being recognized as a face, inputting the face image into the recognition network model, and outputting the face features.
Further, the training method of the recognition network model comprises:
constructing a plurality of triplets, wherein each triplet comprises a reference sample, a positive sample and a negative sample; the reference sample and the positive sample are two different sample face images of the same person, and the negative sample is a sample face image of another person;
inputting the triplets into the recognition network model, and outputting the feature vectors of the reference sample, the positive sample and the negative sample of each triplet;
calculating the cosine similarity between the feature vector of the reference sample and that of the positive sample, and the cosine similarity between the feature vector of the reference sample and that of the negative sample;
and determining the triplet loss from the two cosine similarities, and optimizing the recognition network model according to the triplet loss.
Further, before the triplets are input into the recognition network model, the method may further comprise:
adding a fully connected layer and a softmax layer at the end of the recognition network model;
selecting a single sample face image each time to train the recognition network model, and deleting the fully connected layer and the softmax layer when the number of training iterations reaches a preset threshold or the loss of the recognition network model falls below a preset requirement;
wherein selecting a single sample face image each time to train the recognition network model specifically comprises: taking the single sample face image as the input of the recognition network model, taking the cross-entropy function as the loss function, and calculating the loss of the single face image in the recognition network model and the parameters that need to be updated.
Further, calculating the similarity between the face features of the sample anchor and the face features of the anchor to be identified specifically comprises:
performing a cosine-similarity measurement between the face features of the sample anchor and the face features of the anchor to be identified to obtain the similarity between them.
In a second aspect, an embodiment of the present invention provides an apparatus for identifying an anchor based on face features, comprising:
a face-feature extraction module, configured to capture a frame image from the live video of an anchor to be identified, input the frame image into a pre-trained neural network model, and obtain the face features of the anchor to be identified from the output of the neural network model;
a mapping module, configured to store the face features into a pre-generated hash bucket according to a locality-sensitive hashing method, and take at least one feature that falls into the same hash bucket as the face features as the face features of sample anchors to be matched;
and a similarity calculation module, configured to calculate the similarity between the face features of each sample anchor and the face features of the anchor to be identified, and take the sample anchor with the highest similarity as the identity of the anchor to be identified.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the method and device for identifying an anchor based on face features, face features, which do not change easily with the live-broadcast environment, are obtained as the identification information; a small number of sample anchors similar to the anchor to be identified are first found by the locality-sensitive hashing method, and the true identity is then determined from these sample anchors according to the similarity. This improves identification efficiency and meets the requirement of identifying anchors under live-broadcast conditions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for identifying an anchor based on face features according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of the detection network model during training according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for identifying an anchor based on face features according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for identifying an anchor based on face features according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for identifying an anchor based on face features according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
s101, capturing a frame image from a live video of a anchor to be recognized, inputting the frame image to a pre-trained neural network model, and obtaining the human face characteristics of the anchor to be recognized according to the output of the neural network model.
It should be noted that, considering that an anchor may wear different clothes during live broadcasting, the embodiment of the present invention uses face features, which vary little, as the features of the anchor. The face features of the anchor are obtained through a pre-trained neural network model. It can be understood that the neural network model is pre-trained by taking sample live videos as samples and the face features of the anchors in the sample live videos as labels. The specific training process may be: inputting a sample live video into the neural network model, outputting the recognized face features of the anchor, taking the similarity between the recognized face features and the labels as the loss, and adjusting the quantities that need to be updated in the neural network model according to the loss until the loss is smaller than a preset threshold or the number of training iterations reaches a preset value.
S102: storing the face features into a pre-generated hash bucket according to a locality-sensitive hashing method, and taking at least one feature that falls into the same hash bucket as the face features as the face features of sample anchors to be matched.
Considering the large volume of anchor live-video data and the high computational complexity of correlation retrieval, the embodiment of the present invention uses a locality-sensitive hashing method to map the face features of the sample anchors, computed by the same neural network model, into hash values, and then maps the face features of the anchor to be identified with the same locality-sensitive hashing method. Based on the principle that similar features have a high probability of being assigned to the same hash bucket after hash mapping, the features in the same bucket are taken as the face features of the sample anchors to be matched. This greatly reduces the amount of subsequent similarity computation and meets the requirement of identifying anchors under live-broadcast conditions.
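The bucketing step above can be sketched with a random-hyperplane locality-sensitive hash (an illustrative choice: the patent does not specify the hash family, and the feature dimension, key length and anchor count below are assumptions):

```python
import numpy as np

def lsh_bucket(feature, hyperplanes):
    """Map a feature vector to a bucket key: one sign bit per random hyperplane.
    Similar vectors tend to fall on the same side of each hyperplane, so they
    tend to share a bucket key."""
    bits = (hyperplanes @ feature) > 0
    return ''.join('1' if b else '0' for b in bits)

rng = np.random.default_rng(0)
dim, n_bits = 128, 16                      # assumed feature dimension and key length
hyperplanes = rng.standard_normal((n_bits, dim))

# Index the face features of the sample anchors into hash buckets.
buckets = {}                               # bucket key -> list of sample-anchor ids
for anchor_id in range(100):
    feat = rng.standard_normal(dim)
    buckets.setdefault(lsh_bucket(feat, hyperplanes), []).append(anchor_id)

# The anchor to be identified is only compared against anchors in its own bucket.
query = rng.standard_normal(dim)
candidates = buckets.get(lsh_bucket(query, hyperplanes), [])
```

Only the candidates in the matching bucket enter the subsequent similarity computation, which is the source of the claimed reduction in computational load.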
S103: calculating the similarity between the features to be matched and the face features of the anchor to be identified, and taking the sample anchor corresponding to the feature vector with the highest similarity as the identity of the anchor to be identified.
It should be noted that the present invention finds the sample anchor corresponding to the feature vector with the highest similarity by directly calculating the similarity between vectors. The embodiment of the present invention does not restrict the specific similarity measure; for example, the similarity may be expressed by the Euclidean distance, the cosine distance, and the like.
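A minimal sketch of this matching step, using cosine similarity as the example measure (the function names are illustrative, not the patent's):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query_feat, sample_feats):
    """Return (index, similarity) of the sample anchor most similar to the query."""
    sims = [cosine_similarity(query_feat, f) for f in sample_feats]
    best = int(np.argmax(sims))
    return best, sims[best]
```

The sample anchor at the returned index is taken as the identity of the anchor to be identified.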
It should be noted that, in the embodiment of the present invention, by acquiring face features, which do not change easily with the live-broadcast environment, as the identification information, a small number of sample anchors similar to the anchor to be identified are first found by the locality-sensitive hashing method, and the true identity is then determined from these sample anchors according to the similarity, which improves identification efficiency and meets the requirement of identifying anchors under live-broadcast conditions.
As an alternative embodiment, the embodiment of the present invention further comprises: constructing a banned-anchor database from the face features of anchors that have been banned for violations on each platform, and sending alarm information to the live platform where the anchor to be identified is located when the identified anchor is found in the banned-anchor database.
It should be noted that, by constructing the banned-anchor database, each identified anchor is compared with the face features of the sample anchors in the banned-anchor database. If the comparison succeeds, the identified anchor is a banned anchor and the platform needs to ban the account, thereby realizing cross-platform correlation identification of anchors and cleaning up the internet environment.
On the basis of the above embodiments, as an alternative embodiment, the neural network model comprises a detection network model and a recognition network model;
correspondingly, inputting the frame image into a pre-trained neural network model and taking the output of the neural network model as the face features of the anchor to be identified comprises steps S201, S202 and S203, specifically:
S201: inputting the frame image into the detection network model and outputting a recognition-frame vector;
S202: cropping a face image from the frame image according to the recognition-frame vector;
S203: inputting the face image into the recognition network model and outputting the face features;
wherein the recognition-frame vector is used to represent the position of a recognition frame in the frame image and the probability that the pattern inside the recognition frame is recognized as a face.
It should be noted that, if feature extraction were performed on all pixels of an image, the extracted feature vectors would be affected by the anchor's clothing and the live background, making it difficult to extract information effective for anchor identification. The embodiment of the present invention first determines the recognition frame containing the face in the frame image and then obtains the face features from the pattern inside the recognition frame; only the small portion of pixels containing the face needs feature extraction, so accurate information about the anchor's face is obtained efficiently.
It can be understood that the detection network model of the embodiment of the present invention is trained by taking sample images as samples (the image need not be a video frame; it may be any image containing a face pattern) and taking the position of the recognition frame of the face in the sample image and the probability that the pattern inside the recognition frame is recognized as a face as the sample labels.
Specifically, the data set used to train the detection network model is the face detection data provided by the WIDER FACE database, which contains 32,203 pictures with a total of 93,703 labeled faces. The detection network is trained with the samples in this data set; the specific training process is as follows:
(1) To unify the criteria, the pixel values of all training sample images are normalized to between [-1, 1], with the calculation formula:

x' = (x - 127.5) / 127.5

where x denotes the pixel value of any pixel in the image and x' its normalized value;
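This normalization can be sketched in one function (assuming the standard mapping of 8-bit pixel values onto [-1, 1]):

```python
import numpy as np

def normalize_pixels(img):
    """Map uint8 pixel values from [0, 255] into [-1, 1]: x' = (x - 127.5) / 127.5."""
    return (img.astype(np.float32) - 127.5) / 127.5
```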
(2) The weights of the detection network are randomly initialized from a normal distribution with variance 0.02, and the batch size is set to 128;
(3) A training image is input into the detection network, which outputs a recognition-frame vector representing the probability that the input sample is a face and the position of the recognition frame. In the embodiment of the present invention the recognition-frame vector is a 6-dimensional feature, of which the first two dimensions represent the face probability and the last four the position of the recognition frame;
(4) The loss of the image in the detection network and the quantities that need to be updated are calculated.
(5) For the problem of detecting whether a face exists, i.e. the first 2 dimensions of the 6-dimensional feature vector output by the network (the part representing the probability of being recognized as a face), the cross-entropy loss function is used:

L_i^det = -( y_i^det · log(p_i) + (1 - y_i^det) · log(1 - p_i) )

where y_i^det ∈ {0, 1} denotes the true label of sample i and p_i denotes the probability of being recognized as a face.
For determining the position of the face bounding box in the image, i.e. the last 4 dimensions of the 6-dimensional feature vector, the Euclidean distance is used as the loss function:

L_i^box = || ŷ_i^box - y_i^box ||_2^2

where ŷ_i^box denotes the bounding-box coordinates predicted by the network and y_i^box denotes the ground-truth bounding-box coordinates.
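The two loss terms can be sketched per sample as follows (scalar versions for illustration only; in training they would be averaged over the batch):

```python
import numpy as np

def face_cls_loss(p, y):
    """Cross-entropy for the face / not-face decision: p is the predicted face
    probability, y is the true label (0 or 1)."""
    eps = 1e-12                                   # guard against log(0)
    return float(-(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def bbox_loss(pred_box, true_box):
    """Squared Euclidean distance between predicted and ground-truth
    bounding-box coordinates (4 values each)."""
    d = np.asarray(pred_box, dtype=float) - np.asarray(true_box, dtype=float)
    return float(d @ d)
```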
Fig. 2 is an architecture diagram of the detection network model during training according to an embodiment of the present invention. As shown in Fig. 2, the upper limit of the input image size is 2000 × 1100 pixels and the lower limit is 250 × 130 pixels. The network outputs a set of feature vectors, which can also be understood as a 6-channel feature map: whereas an ordinary RGB image has three color channels, with each pixel represented by 3 values, each pixel of this feature map is represented by 6 values, i.e. a 6-dimensional vector containing the coordinate information of a recognition frame and a face probability. Since the structure of the detection network is known, the 56 × 56 region of the original image to which each such pixel corresponds can be deduced. For the whole feature map, positions whose face probability exceeds a certain threshold can be found simply by traversing the 5th and 6th channels, while the 1st to 4th channels hold the coordinates of the recognition frame.
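The traversal described above can be sketched as follows (assuming, per Fig. 2, that channels 1-4 hold the recognition-frame coordinates and channels 5-6 hold a pair of not-face / face scores; applying a softmax over the two scores is an assumption of this sketch):

```python
import numpy as np

def find_faces(feature_map, threshold=0.6):
    """Scan an H x W x 6 feature map (0-indexed: channels 0-3 = box coordinates,
    channels 4-5 = not-face / face scores) and return every position whose face
    probability exceeds the threshold."""
    hits = []
    h, w, _ = feature_map.shape
    for r in range(h):
        for c in range(w):
            vec = feature_map[r, c]
            scores = np.exp(vec[4:6] - vec[4:6].max())   # numerically stable softmax
            p_face = float(scores[1] / scores.sum())
            if p_face > threshold:
                hits.append((r, c, vec[:4].copy(), p_face))
    return hits
```

Each hit then maps back to a 56 × 56 region of the original image, as described above.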
On the basis of the above embodiments, as an optional embodiment, the recognition network model of the embodiment of the present invention is trained by taking the face images cropped from sample images as samples and the face features of those sample images as sample labels.
It should be noted that the recognition network model of the embodiment of the present invention is mainly used for extracting the depth features (i.e. the face features) of the anchor's face image. The recognition network model extracts the depth features of the face image based on ResNet-50. ResNet (Residual Neural Network) is easy to optimize and can gain accuracy from considerably increased depth; its internal residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks. ResNet-50 is a residual network with 50 convolutional layers.
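The skip connection that this paragraph attributes to ResNet can be illustrated with a toy residual block (a conceptual sketch only, not the actual ResNet-50 architecture):

```python
import numpy as np

def residual_block(x, weight1, weight2):
    """y = ReLU(F(x) + x): the identity shortcut lets the input bypass the
    two-layer transform F, which is what eases optimization as depth grows."""
    relu = lambda z: np.maximum(z, 0.0)
    fx = weight2 @ relu(weight1 @ x)   # F(x): a small two-layer transform
    return relu(fx + x)                # add the shortcut before the final ReLU
```

With F(x) near zero the block reduces to a (rectified) identity, so stacking many such blocks does not degrade the signal the way plain deep stacks can.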
On the basis of the above embodiments, as an optional embodiment, the detection network model of the embodiment of the present invention is a multi-task convolutional neural network. Most machine learning tasks are single-task learning, i.e. only one task is learned at a time, whereas a multi-task convolutional neural network can learn several tasks at once; here, the two tasks of locating the recognition frame and detecting the face probability are integrated into one network.
On the basis of the foregoing embodiment, as an optional embodiment, inputting the frame image into the detection network model to obtain the recognition-frame vector specifically comprises:
scaling the frame image into a plurality of images with different scaling ratios;
and inputting the images of different scales into the multi-task convolutional neural network to obtain the recognition-frame vector of each image.
It should be noted that, to cope with the inconsistent scale of anchor face images, the invention first applies scale transformations to the original image to generate an image pyramid composed of several images of different scales. For example, the original frame image is repeatedly scaled by a fixed ratio (e.g. √2/2): with the upper limit of the image size set to 2000 × 1100 pixels and the lower limit to 250 × 130 pixels, the image is reduced or enlarged by this ratio as long as the length and width of the scaled image remain within the limits, generating a plurality of images of different scales.
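The pyramid construction can be sketched as follows (the √2/2 scaling ratio and the exact stopping rule are assumptions consistent with the limits stated above):

```python
import math

def pyramid_scales(width, height, factor=math.sqrt(2) / 2,
                   max_size=(2000, 1100), min_size=(250, 130)):
    """Return the scale ratios of the image pyramid: start within the upper
    size limit and keep multiplying by `factor` while the scaled image stays
    at or above the lower limit."""
    scales = []
    scale = min(max_size[0] / width, max_size[1] / height, 1.0)
    while width * scale >= min_size[0] and height * scale >= min_size[1]:
        scales.append(scale)
        scale *= factor
    return scales
```

Each ratio in the returned list produces one level of the pyramid, and every level is fed to the detection network.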
Then, exploiting the ability of the multi-task convolutional neural network to handle several tasks, the generated image pyramid is input into the multi-task convolutional neural network, and an H × W × 6 feature map is obtained through successive convolution and pooling operations, where H is the height of the feature map and W its width, both determined by the size of the input image. Each 1 × 1 × 6 feature vector corresponds to one region of the input image; its first two dimensions represent the probability of being recognized as a face, and its last four represent the position of the bounding box.
On the basis of the foregoing embodiments, as an optional embodiment, cropping a face image from the frame image according to the recognition-frame vector and inputting it into the recognition network model to output the face features is specifically:
cropping a face image from the frame image according to the recognition-frame vector with the highest probability of being recognized as a face, inputting the face image into the recognition network model, and outputting the face features.
It should be noted that, because the multi-task convolutional neural network produces recognition-frame vectors for each image, the embodiment of the present invention crops the face image from the corresponding image using the recognition-frame vector with the highest probability of being recognized as a face, inputs it into the recognition network model, and outputs the face features.
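A sketch of selecting the best recognition-frame vector and cropping the face (the 6-dimensional layout below — two probabilities followed by x1, y1, x2, y2 — is an assumed ordering for illustration):

```python
import numpy as np

def crop_best_face(frame, box_vectors):
    """From a list of 6-dim recognition-frame vectors (p_not_face, p_face,
    x1, y1, x2, y2), keep the one with the highest face probability and crop
    the corresponding patch from the frame."""
    best = max(box_vectors, key=lambda v: v[1])
    x1, y1, x2, y2 = (int(round(c)) for c in best[2:6])
    return frame[y1:y2, x1:x2]
```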
On the basis of the above embodiments, as an optional embodiment, the training method of the recognition network model comprises:
constructing a plurality of triplets, wherein each triplet comprises a reference sample, a positive sample and a negative sample; the reference sample and the positive sample are two different sample face images of the same person, and the negative sample is a sample face image of another person;
inputting the triplets into the recognition network model, and outputting the feature vectors of the reference sample, the positive sample and the negative sample of each triplet;
calculating the cosine similarity between the feature vector of the reference sample and that of the positive sample, and the cosine similarity between the feature vector of the reference sample and that of the negative sample;
and determining the triplet loss from the two cosine similarities, and optimizing the recognition network model according to the triplet loss.
It should be noted that the training method for the recognition network model in the embodiment of the present invention is based on the idea of a Siamese (twin) network. By inputting triplets into the recognition network model and measuring both the similarity between different face images of the same person and the dissimilarity between face images of different persons, the model learns to characterize the face features more discriminatively.
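A minimal sketch of a triplet loss built on the two cosine similarities described above; the hinge form and the margin value of 0.2 are assumptions for illustration, since the text only specifies that the loss is determined from the two similarities:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity: penalize the model
    unless the anchor-positive similarity exceeds the anchor-negative
    similarity by at least `margin` (margin value is an assumption)."""
    return max(0.0, cos_sim(anchor, negative) - cos_sim(anchor, positive) + margin)

a = np.array([1.0, 0.0])
# Positive identical to the anchor, negative orthogonal: loss vanishes.
easy = triplet_loss(a, positive=np.array([1.0, 0.0]), negative=np.array([0.0, 1.0]))
# Positive and negative swapped: the hinge is fully violated.
hard = triplet_loss(a, positive=np.array([0.0, 1.0]), negative=np.array([1.0, 0.0]))
```

Minimizing this loss pulls same-person embeddings together and pushes different-person embeddings apart, which is the stated goal of the triplet training.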
On the basis of the foregoing embodiments, as an optional embodiment, the inputting the triplet into the recognition network model further includes:
adding a full connection layer and a softmax layer at the end of the identification network model;
selecting a single sample human face image to train the recognition network model each time, and deleting the full connection layer and the softmax layer when the training times reach a preset threshold value or the loss of the recognition network model is less than a preset requirement;
wherein, selecting a single sample human face image each time to train the recognition network model specifically comprises: and calculating the loss of the single facial image in the recognition network model and the parameters needing to be updated by taking the single sample facial image as the input of the recognition network model and taking the cross entropy function as a loss function.
In the embodiment of the present invention, before the recognition network model is trained with triplets, it is first trained with single sample face images; this lowers the training difficulty while allowing the parameters of the recognition network model to be better optimized. As a result, when training with triplets, fewer triplets are needed to obtain a neural network model with a higher recognition rate.
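The single-sample pre-training stage described above reduces to ordinary softmax classification with a cross-entropy loss; a minimal sketch (the three-class logits are illustrative values):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    """Cross-entropy loss of a single sample: negative log of the
    softmax probability assigned to the true identity class."""
    return float(-np.log(softmax(logits)[label]))

# One training sample whose true identity is class 0.
loss = cross_entropy(np.array([2.0, 0.5, 0.1]), label=0)
```

Once this loss stops improving (or the iteration budget is reached), the temporary fully connected and softmax layers are discarded and only the convolutional parameters are kept, as the text describes.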
As an alternative embodiment, the training process for identifying the network model is as follows:
1) normalizing each pixel of the training samples to the range [-1, 1];
2) initializing the weights by sampling from a normal distribution, and setting the batch size to 128;
3) adding a full connection layer and a softmax layer at the tail end of the identification network model, taking a single normalized sample as input, taking cross entropy loss as a loss function, and calculating the loss of the sample in the identification network model and parameters needing to be updated;
4) after step 3) has been repeated, deleting the fully connected layer and the softmax layer, retaining the trained convolutional layer parameters, and inputting a triplet (P, A, N) consisting of three image samples into the recognition network model;
5) calculating the loss of the triplet (P, A, N) in the recognition network model and the parameters to be updated; if the loss is greater than a preset threshold, repeating step 4); if the loss is less than the preset threshold, or if the number of iterations of step 4) reaches the preset count, stopping training.
It can be understood that one sample in the triplet serves as the reference (anchor), and the other two are a face image of the same person (positive) and a face image of a different person (negative). Each channel outputs a feature vector; the cosine similarity between the feature vector of the reference and that of each of the other two samples is calculated, and the recognition network model is optimized through the triplet loss.
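The pixel normalization in step 1) of the process above can be sketched as follows, assuming 8-bit input images:

```python
import numpy as np

def normalize_pixels(img):
    """Map 8-bit pixel values from [0, 255] to [-1, 1], as in
    step 1) of the training process (assumes uint8 input)."""
    return img.astype(np.float32) / 127.5 - 1.0

out = normalize_pixels(np.array([[0, 128, 255]], dtype=np.uint8))
```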
On the basis of the foregoing embodiments, as an optional embodiment, the storing the face features into a hash bucket generated in advance according to a locality sensitive hashing method specifically includes:
selecting a certain number of random mapping functions to map the face features;
the number of the mapping functions is the dimension of the hash value; the mapping functions satisfy the following conditions:
if the distance d(x, y) between two vectors is less than d1, then the probability that f(x) = f(y) is at least p1;
if the distance d(x, y) between two vectors is greater than d2, then the probability that f(x) = f(y) is at most p2;
where d1 << d2, p1 >> p2, and x and y denote the two vectors.
It should be noted that, given the large data volume of anchor live-broadcast image and voice information and the high computational complexity of associative retrieval, the hashing method selected in the embodiment of the present invention must, for the practical application of image retrieval, keep the similarity distances invariant. A locality sensitive hashing method is therefore adopted to map the face features.
First, for a face feature H = (H1, H2, ..., HN), where N denotes the dimension of the feature vector, N' random mapping functions f = (f1, f2, ..., fN') are selected to map H. The mapping function is:
H(V) = sign(V · R)
In the above formula, R is a random vector, and the number N' of mapping functions is the dimensionality of the mapped fusion features, which can be adjusted by setting the value of N'. Meanwhile, the mapping function f satisfies the following two requirements:
if the distance d(x, y) between two vectors is less than d1, then the probability that f(x) = f(y) is at least p1;
if the distance d(x, y) between two vectors is greater than d2, then the probability that f(x) = f(y) is at most p2;
where d1 << d2, p1 >> p2, and x and y denote the two vectors.
For the mapping function, the probability that two vectors receive the same hash bit is
P[H(x) = H(y)] = 1 − θ(x, y)/π
where θ(x, y) = arccos( (x · y) / (‖x‖ ‖y‖) ) is the angle between the two vectors. Because this collision probability decreases monotonically with the angle, the relative ordering of the cosine distances between vectors is unchanged after the mapping: if cos θ(x, y) ≥ cos θ(x, z), then P[H(x) = H(y)] ≥ P[H(x) = H(z)].
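A minimal sketch of the random-hyperplane mapping H(V) = sign(V · R), with illustrative dimensions N = 128 and N' = 16, showing that nearby vectors tend to agree on more hash bits than unrelated ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(v, R):
    """H(V) = sign(V . R): each column of R is a random hyperplane
    normal; the resulting N'-bit sign pattern is the bucket key."""
    return tuple((v @ R > 0).astype(int))

N, N_prime = 128, 16                       # illustrative dimensions
R = rng.standard_normal((N, N_prime))      # random projection matrix
v = rng.standard_normal(N)
near = v + 0.01 * rng.standard_normal(N)   # small perturbation of v
far = rng.standard_normal(N)               # unrelated vector

agree_near = sum(a == b for a, b in zip(lsh_hash(v, R), lsh_hash(near, R)))
agree_far = sum(a == b for a, b in zip(lsh_hash(v, R), lsh_hash(far, R)))
```

Because each bit collides with probability 1 − θ/π, a small angle (the `near` vector) preserves nearly all bits, while an unrelated vector agrees on roughly half of them, which is what sends similar faces into the same bucket.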
on the basis of the foregoing embodiments, as an optional embodiment, the calculating the similarity between the face feature of the sample anchor and the face feature of the anchor to be recognized specifically includes:
and performing cosine similarity measurement on the face features of the sample anchor and the face features of the anchor to be recognized to obtain the similarity between the face features of the sample anchor and the face features of the anchor to be recognized.
It should be noted that the face features of the sample anchors are mapped into the hash buckets in advance by the same locality sensitive hashing method. Therefore, after the hash operation is performed on the face features of the anchor to be identified, cosine similarity is measured between the face features of the anchor to be identified and the face features of each sample anchor in the same hash bucket, yielding the similarity between the face features of each sample anchor and those of the anchor to be identified.
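A sketch of this within-bucket matching step (the bucket contents and identifier names are illustrative):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query, bucket):
    """Compare the query feature only against the sample-anchor
    features that fell into the same hash bucket and return the
    identifier of the most similar one."""
    return max(bucket, key=lambda k: cos_sim(query, bucket[k]))

bucket = {"sample_anchor_a": np.array([1.0, 0.0, 0.0]),
          "sample_anchor_b": np.array([0.6, 0.8, 0.0])}
query = np.array([0.9, 0.1, 0.0])
match = best_match(query, bucket)
```

Restricting the comparison to one bucket is what keeps retrieval cheap: only a handful of candidates are scored instead of the full sample-anchor library.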
Fig. 3 is a schematic flow chart of a method for identifying an anchor based on face features according to another embodiment of the present invention; as shown in fig. 3, the method includes:
acquiring a certain number of sample anchors, which may be anchors banned by the various platforms (referred to as banned anchors); for each banned anchor, extracting a live broadcast image and determining the face position in the image through the detection network model; cropping the face image according to the face position in the image, extracting face features from the face image through the recognition network model, and storing the face features by a locality sensitive hashing method to obtain a plurality of hash buckets, each hash bucket holding the face features of at least one sample anchor;
for the anchor to be identified, extracting a live broadcast image and determining the face position in the image; cropping the face image according to the face position in the image, extracting face features from the face image through the recognition network model, and storing the face features by the locality sensitive hashing method;
querying the hash bucket in which the face features of the anchor to be identified fall, taking the sample anchors in the same hash bucket as the candidates to be matched, calculating the cosine similarity between the face features of each candidate sample anchor and the face features of the anchor to be identified to obtain their similarity, and taking the sample anchor with the highest similarity as the identity of the anchor to be identified.
Fig. 4 is a schematic structural diagram of a device for identifying an anchor based on face features according to an embodiment of the present invention; as shown in fig. 4, the device includes: a face feature extraction module 401, a mapping module 402, and a similarity calculation module 403, wherein:
the human face feature extraction module 401 is configured to intercept a frame image from a live video of an anchor to be identified, input the frame image to a pre-trained neural network model, and obtain a human face feature of the anchor to be identified according to an output of the neural network model;
the mapping module 402 is used for storing the face features into a pre-generated hash bucket according to a locality sensitive hash method, and taking at least one feature in the same hash bucket with the face features as the face features of a sample anchor to be matched;
and the similarity calculation module 403 is configured to calculate similarity between the face features of the sample anchor and the face features of the anchor to be recognized, and use the sample anchor with the highest similarity as the anchor to be recognized.
The apparatus for identifying an anchor based on face features according to the embodiments of the present invention specifically executes the flows of the above-mentioned method embodiments; for details, please refer to the contents of the above-mentioned methods for identifying an anchor based on face features, which are not repeated herein. The device provided by the embodiment of the present invention acquires face features, which do not readily change with the live broadcast environment, as identification information; it first finds a small number of sample anchors similar to the anchor to be identified through a locality sensitive hashing method, and then determines the true anchor from this small set according to similarity, thereby improving identification efficiency and meeting the requirement of identifying anchors under live broadcast conditions.
Fig. 5 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention; as shown in fig. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke a computer program stored in the memory 530 and executable on the processor 510 to perform the method for identifying an anchor based on face features provided by the above embodiments, including, for example: capturing a frame image from a live video of the anchor to be identified, inputting the frame image to a pre-trained neural network model, and obtaining the face features of the anchor to be identified from the output of the neural network model; storing the face features into a pre-generated hash bucket according to a locality sensitive hashing method, and taking at least one feature in the same hash bucket as the face features as the face features of a sample anchor to be matched; and calculating the similarity between the face features of the sample anchor and the face features of the anchor to be identified, and taking the sample anchor with the highest similarity as the identity of the anchor to be identified.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or make a contribution to the prior art, or may be implemented in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method for identifying an anchor based on face features provided in the foregoing embodiments, including: capturing a frame image from a live video of the anchor to be identified, inputting the frame image to a pre-trained neural network model, and obtaining the face features of the anchor to be identified from the output of the neural network model; storing the face features into a pre-generated hash bucket according to a locality sensitive hashing method, and taking at least one feature in the same hash bucket as the face features as the face features of a sample anchor to be matched; and calculating the similarity between the face features of the sample anchor and the face features of the anchor to be identified, and taking the sample anchor with the highest similarity as the identity of the anchor to be identified.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for identifying an anchor based on face features, characterized by comprising the following steps:
intercepting a frame image from a live video of an anchor to be identified, inputting the frame image to a pre-trained neural network model, and obtaining the human face characteristics of the anchor to be identified according to the output of the neural network model;
storing the face features into a pre-generated hash bucket according to a locality sensitive hash method, and taking at least one feature which is in the same hash bucket with the face features as the face features of a sample anchor to be matched;
and calculating the similarity between the face features of the sample anchor and the face features of the anchor to be recognized, and taking the sample anchor with the highest similarity as the anchor to be recognized.
2. The method of claim 1, wherein the neural network model comprises a detection network model and a recognition network model;
correspondingly, the inputting the frame image into a pre-trained neural network model, and taking the output of the neural network model as the face feature of the anchor to be recognized includes:
inputting the frame image into the detection network model and outputting a recognition frame vector;
intercepting a face image from the frame image according to the identification frame vector;
inputting the face image into the recognition network model, and outputting face features;
and the identification frame vector is used for representing the position of an identification frame in the frame image and the probability of identifying the pattern in the identification frame as a human face.
3. The method for identifying an anchor based on human face features of claim 2, wherein the detection network model is a multitask convolutional neural network; correspondingly, the inputting the frame image into the detection network model to obtain the identification frame vector specifically includes:
zooming the frame image into a plurality of images with different zooming ratios;
and inputting the images with different scales into a multitask convolutional neural network to obtain the identification frame vector of each image.
4. The anchor recognition method based on human face features according to claim 3, wherein the human face image is captured from a frame image according to the recognition frame vector, and is input to the recognition network model to output human face features, specifically:
and intercepting a face image from the frame image according to the recognition frame vector with the highest probability of being recognized as the face, inputting the face image into the recognition network model, and outputting the face features.
5. The method for identifying an anchor based on human face features according to any one of claims 2 to 4, wherein the training method for the recognition network model comprises the following steps:
constructing a plurality of triples, wherein each triplet comprises a reference sample, a positive sample and a negative sample, the reference sample and the positive sample are two different sample face images of one person, and the negative sample is a sample face image of the other person;
inputting the triples into the identification network model, and outputting feature vectors of a reference, a positive sample and a negative sample in the triples;
calculating cosine similarity between the characteristic vector of the reference and the characteristic vector of the positive sample and cosine similarity between the characteristic vector of the reference and the characteristic vector of the negative sample;
and determining triple losses according to the two cosine similarities, and optimizing the identification network model according to the triple losses.
6. The method of claim 5, wherein the inputting the triplet into the recognition network model further comprises:
adding a full connection layer and a softmax layer at the end of the identification network model;
selecting a single sample human face image to train the recognition network model each time, and deleting the full connection layer and the softmax layer when the training times reach a preset threshold value or the loss of the recognition network model is less than a preset requirement;
wherein, selecting a single sample human face image each time to train the recognition network model specifically comprises: and calculating the loss of the single facial image in the recognition network model and the parameters needing to be updated by taking the single sample facial image as the input of the recognition network model and taking the cross entropy function as a loss function.
7. The method according to claim 6, wherein the calculating of the similarity between the face features of the sample anchor and the face features of the anchor to be recognized specifically comprises:
and performing cosine similarity measurement on the face features of the sample anchor and the face features of the anchor to be recognized to obtain the similarity between the face features of the sample anchor and the face features of the anchor to be recognized.
8. An apparatus for identifying an anchor based on human face features, comprising:
the human face feature extraction module is used for intercepting frame images from a live video of an anchor to be identified, inputting the frame images to a pre-trained neural network model, and obtaining the human face features of the anchor to be identified according to the output of the neural network model;
the mapping module is used for storing the face features into a pre-generated hash bucket according to a locality sensitive hash method, and taking at least one feature which is in the same hash bucket with the face features as the face features of a sample anchor to be matched;
and the similarity calculation module is used for calculating the similarity between the face features of the sample anchor and the face features of the anchor to be recognized, and taking the sample anchor with the highest similarity as the anchor to be recognized.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the method for identifying an anchor based on facial features of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for identifying an anchor based on facial features of any one of claims 1 to 7.
CN201911243502.6A 2019-12-06 2019-12-06 Method and device for identifying anchor based on face features Pending CN111178146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911243502.6A CN111178146A (en) 2019-12-06 2019-12-06 Method and device for identifying anchor based on face features

Publications (1)

Publication Number Publication Date
CN111178146A true CN111178146A (en) 2020-05-19

Family

ID=70650189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911243502.6A Pending CN111178146A (en) 2019-12-06 2019-12-06 Method and device for identifying anchor based on face features

Country Status (1)

Country Link
CN (1) CN111178146A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522989A (en) * 2020-07-06 2020-08-11 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for image retrieval
CN113255488A (en) * 2021-05-13 2021-08-13 广州繁星互娱信息科技有限公司 Anchor searching method and device, computer equipment and storage medium
CN113408412A (en) * 2021-06-18 2021-09-17 北京工业大学 Behavior identification method, system, equipment and storage medium of webcast anchor
CN114302157A (en) * 2021-12-23 2022-04-08 广州津虹网络传媒有限公司 Attribute tag identification and multicast event detection method, device, equipment and medium
WO2022148378A1 (en) * 2021-01-05 2022-07-14 百果园技术(新加坡)有限公司 Rule-violating user processing method and apparatus, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682092A (en) * 2016-11-29 2017-05-17 深圳市华尊科技股份有限公司 Target retrieval method and terminal
CN108446674A (en) * 2018-04-28 2018-08-24 平安科技(深圳)有限公司 Electronic device, personal identification method and storage medium based on facial image and voiceprint
CN109492624A (en) * 2018-12-29 2019-03-19 北京灵汐科技有限公司 The training method and its device of a kind of face identification method, Feature Selection Model
CN109934115A (en) * 2019-02-18 2019-06-25 苏州市科远软件技术开发有限公司 Construction method, face identification method and the electronic equipment of human face recognition model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination