CN111666908A - Interest portrait generation method, device and equipment for video user and storage medium


Info

Publication number
CN111666908A
Authority
CN
China
Prior art keywords
video
face
user
target
real
Prior art date
Legal status
Granted
Application number
CN202010526685.9A
Other languages
Chinese (zh)
Other versions
CN111666908B (en)
Inventor
眭哲豪
卢江虎
项伟
Current Assignee
Bigo Technology Pte Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN202010526685.9A
Publication of CN111666908A
Application granted
Publication of CN111666908B
Legal status: Active (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G06F16/7837 - Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784 - Retrieval using metadata automatically derived from the content, the detected or recognised objects being people
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the invention disclose a method, apparatus, device and storage medium for generating an interest portrait of a video user. The method comprises: intercepting, from each video frame of a target video, the real face regions actually shot by the video user; clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video; and generating portrait information for each real face region in the representative face set, then fusing the portrait information of the regions to generate the interest portrait information of the video user in the target video. The technical scheme of these embodiments generates a video-level user interest portrait, eliminates the interference of non-face pictures in the target video, and improves both the comprehensiveness and the accuracy of the interest portrait information generated for the video user in the target video.

Description

Interest portrait generation method, device and equipment for video user and storage medium
Technical Field
Embodiments of the invention relate to the technical field of video processing, and in particular to a method, apparatus, device and storage medium for generating an interest portrait of a video user.
Background
In the current era of information explosion, short videos have become an important channel through which people acquire information. Recommending suitable short-video works to each user improves the user experience and raises the daily activity and retention of users of a short-video product. Accurate recommendation for each user, however, requires inferring the short-video fields the user is relatively interested in from basic attribute information of the video user, such as age and gender, contained in the videos the user has historically uploaded, and then recommending other related short videos in those fields. Accurately generating the interest portrait information of a short-video user, including age and gender, is therefore important in the information recommendation field.
At present, a single video frame deemed representative of the user information contained in a video is usually selected from the videos a user has uploaded in the past, and the face region in that frame is directly identified by a deep-learning convolutional neural network, so that the portrait information composed of the age, gender and so on of the user contained in the frame is judged to be the interest portrait information of the video user represented by the video.
In the prior art, then, user portrait recognition is performed only on one specific video frame of a video, and the recognized portrait information is used as the interest portrait information of the video user represented by the video, while the other video frames are never analyzed. The resulting interest portrait information is therefore one-sided, and the interference affecting interest portrait recognition across different video frames further lowers the accuracy with which the video user's interest portrait is recognized.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for generating an interest portrait of a video user, which are used for improving the comprehensiveness and accuracy of interest portrait information of the video user.
In a first aspect, an embodiment of the present invention provides a method for generating an interest portrait of a video user, where the method includes:
intercepting, from each video frame of a target video, the real face regions actually shot by the video user;
clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video;
and generating portrait information for each real face region in the representative face set, and fusing the portrait information of the real face regions to generate the interest portrait information of the video user in the target video.
In a second aspect, an embodiment of the present invention provides an interest portrait generation apparatus for a video user, where the apparatus includes:
a real face intercepting module, configured to intercept, from each video frame of the target video, the real face regions actually shot by the video user;
a representative face clustering module, configured to cluster the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video;
and a portrait generation module, configured to generate portrait information for each real face region in the representative face set, and to fuse the portrait information of the real face regions to generate the interest portrait information of the video user in the target video.
In a third aspect, an embodiment of the present invention provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method for generating an interest portrait of a video user according to any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a method for generating an interest portrait of a video user according to any embodiment of the present invention.
The method, apparatus, device and storage medium for generating an interest portrait of a video user provided by the embodiments of the present invention first intercept, from each video frame of a target video, the real face regions actually shot by the video user, eliminating the interference of non-face pictures in the target video. The real face regions in the target video are then clustered by their face representative features to obtain a representative face set of the video user in the target video, whose real face regions can comprehensively represent the video user's range of interest. Portrait information is subsequently generated for each real face region in the representative face set and fused to generate the interest portrait information of the video user in the target video. A video-level user interest portrait is thereby generated, the one-sidedness of taking the user portrait information of one specific video frame as the video's user interest portrait information is avoided, the interference of non-face pictures in the target video is eliminated, and the comprehensiveness and accuracy of the interest portrait information generated for the video user in the target video are improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1A is a flowchart illustrating a method for generating an interest portrait of a video user according to a first embodiment of the present invention;
FIG. 1B is a schematic diagram of a process for generating an interest portrait of a video user according to the first embodiment of the present invention;
FIG. 2A is a flowchart illustrating a method for generating an interest portrait of a video user according to a second embodiment of the present invention;
FIG. 2B is a schematic diagram illustrating the process of intercepting a real face region in the method according to the second embodiment of the present invention;
FIG. 3A is a flowchart illustrating a method for generating an interest portrait of a video user according to a third embodiment of the present invention;
FIG. 3B is a schematic diagram illustrating the process of generating a user-level interest portrait according to the third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an interest portrait generation apparatus for a video user according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Example one
Fig. 1A is a flowchart of a method for generating an interest portrait of a video user according to a first embodiment of the present invention; the method is applicable to any situation in which video-level interest portrait information of a video user is generated. The method may be executed by the interest portrait generation apparatus for a video user provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and integrated in a device that executes the method, and the device may be a background server responsible for uploading and storing video data.
Specifically, referring to Fig. 1A, the method may include the following steps:
and S110, respectively intercepting real face areas actually shot by the video user in each video frame of the target video.
Specifically, in the short-video recommendation field it is generally necessary to determine basic attribute information of each video user, such as age and gender, and to screen the short videos the user is interested in for accurate recommendation according to that information. Since video users usually do not fill in real attribute information when registering an account, accurately generating a user interest portrait containing the attribute information, such as age and gender, that the video user is interested in is important in the short-video recommendation field.
Optionally, a video user may be a user who has registered an account in the short-video software. Because a video user usually captures pictures related to himself or herself when actually shooting a video, pictures of the video user can be expected to appear in the videos the user has actually shot in the past. This embodiment therefore analyzes the face pictures contained in the videos historically shot by the video user to calculate a user interest portrait containing basic attribute information, such as age and gender, that the video user is interested in. The target video in this embodiment may be any video actually shot by the video user in the past and then uploaded to the video server for publication on the Internet.
In this embodiment, the user interest portrait containing basic attribute information such as age and gender mainly refers to the face data appearing in the videos actually shot by the video user. When analyzing the basic attribute information contained in a target video, the target video is first selected from the videos actually shot and uploaded by the video user in the past, decoded, and sampled at a frame-per-second rate to obtain a series of decoded video frames. Each video frame of the target video is then analyzed so that the real face region actually shot by the video user is intercepted from each frame. For a co-shot video of the video user, for example, only the picture region shot by the video user in each video frame needs to be intercepted for the corresponding real face region, while the picture regions shot by other users in each frame are not intercepted. Portrait analysis is subsequently performed on the real face regions actually shot by the video user in each video frame, which eliminates the interference of other users' shots and of non-real faces and improves the accuracy of the video user's interest portrait information.
S120: clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video.
The face representative features in this embodiment are features of a real face region that represent the accurate shape of key points such as the nose, mouth and eyes, for example a low-dimensional embedding feature vector representing the actual face shape.
Specifically, after the real face regions actually shot by the video user are intercepted from each video frame of the target video, several different people may have appeared on camera, so the intercepted real face regions may be face pictures of several different users whose face representative features differ. This embodiment therefore first performs feature extraction on each real face region intercepted from the target video with an existing feature-extraction network to obtain the face representative feature of each region. The similarity between the face representative features of the real face regions is then analyzed with an existing clustering algorithm to cluster the real face regions in the target video; in this embodiment a density-based clustering algorithm (DBSCAN) can be applied to the face representative features, yielding several face clustering results in which the real face regions within each result are similar faces, that is, each result can represent the face information of one user. Because the target video was actually shot by the video user, the face that appears most often in the target video can be taken as the main character of the target video, namely the video user; even if that face is not the video user, the video user is evidently interested in videos containing such faces. After the face clustering results are obtained, the result containing the largest number of similar real face regions is therefore selected, its real face regions are taken as face information representing the video user's interest, and it is used as the representative face set of the video user in the target video. The representative face set contains the real face regions the video user is interested in together with the face representative feature of each region; a sketch of this clustering step follows.
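As a concrete illustration of this step, the following Python sketch clusters pre-extracted face representative features with scikit-learn's DBSCAN and keeps the largest cluster as the representative face set. The function name and the eps/min_samples values are illustrative assumptions, not values specified by this disclosure.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def representative_face_set(face_regions, embeddings, eps=0.5, min_samples=2):
    """Cluster real face regions by the similarity of their face
    representative features and return the largest cluster."""
    X = np.asarray(embeddings, dtype=float)
    if X.size == 0:
        return [], X
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    valid = labels[labels >= 0]            # drop DBSCAN noise points (-1)
    if valid.size == 0:
        return [], X[:0]
    largest = np.bincount(valid).argmax()  # cluster with the most faces
    keep = labels == largest
    return [r for r, k in zip(face_regions, keep) if k], X[keep]
```

The eps radius would in practice be tuned to the embedding network, since DBSCAN's notion of density depends directly on the feature scale.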
S130: generating portrait information for each real face region in the representative face set, fusing the portrait information of the regions, and generating the interest portrait information of the video user in the target video.
Specifically, after the representative face set of the video user in the target video is obtained, the real face regions it contains can represent the real face information the video user is interested in. This embodiment therefore analyzes the face representative feature of each real face region in the representative face set to determine the portrait information, such as age and gender, represented by each region, and then fuses the portrait information of all the regions to obtain the interest portrait information of the video user in the target video. Because the portrait information of every real face region in the representative face set is analyzed and fused, the one-sidedness of analyzing a video-level user interest portrait from a single specific video frame is avoided, and the comprehensiveness and accuracy of the interest portrait information generated for the video user in the target video are improved.
For example, as shown in Fig. 1B, in order to accurately generate the portrait information of each real face region in the representative face set, this embodiment pre-constructs, by training a convolutional neural network model, a portrait generation model capable of accurately generating the portrait information of a real face region. Generating the portrait information of each real face region in the representative face set may then specifically include: for each real face region in the representative face set, inputting the face representative feature of the region into the pre-constructed portrait generation model to obtain the sub-portrait information of the region under each portrait dimension, and combining the sub-portrait information into the portrait information of the region.
Specifically, after the representative face set of the video user in the target video is obtained, the face representative features of the real face regions in the set may be input in turn into the pre-constructed portrait generation model, whose trained network parameters and structure analyze each feature in the different portrait dimensions, yielding the sub-portrait information of each real face region in each portrait dimension; a portrait dimension may be any attribute contained in a user portrait, such as age or gender. The sub-portrait information of each real face region under the portrait dimensions is then combined into the portrait information of that region.
For example, to ensure the accuracy of the sub-portrait information generated in each portrait dimension, this embodiment may set a corresponding portrait sub-model for each portrait dimension in the pre-constructed portrait generation model, for example an age sub-model for the user's age and a gender sub-model for the user's gender. The face representative feature of each real face region in the representative face set is then input in turn into each portrait sub-model, and each sub-model analyzes and outputs the sub-portrait information of the region in its portrait dimension.
In this embodiment, the type of each portrait sub-model may be chosen according to the characteristics of its portrait dimension; for example, the age sub-model is a regression model and the gender sub-model is a binary classification model, as in the sketch below.
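Purely as an illustration, such a portrait generation model with one sub-model (head) per portrait dimension could be sketched in PyTorch as follows. The shared layer, the 128-dimensional input and all sizes are assumptions; the disclosure only specifies a regression age sub-model and a binary-classification gender sub-model.

```python
import torch
import torch.nn as nn

class PortraitModel(nn.Module):
    """Maps a face representative feature to sub-portrait information
    under each portrait dimension (age, gender)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.age_head = nn.Linear(64, 1)     # regression sub-model: age
        self.gender_head = nn.Linear(64, 2)  # binary sub-model: gender logits

    def forward(self, face_feature):
        h = self.shared(face_feature)
        age = self.age_head(h).squeeze(-1)
        gender = self.gender_head(h).softmax(dim=-1)
        return {"age": age, "gender": gender}
```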
In addition, after the sub-portrait information of the real face regions in each portrait dimension is obtained, the fusion of the portrait information may, for each portrait dimension, perform a mean analysis of the sub-portrait information of the real face regions in the representative face set under that dimension and determine the interest sub-portrait information of the video user under that dimension in the target video; the interest sub-portrait information of the video user under all portrait dimensions is then combined into the interest portrait information of the video user in the target video.
Specifically, for each portrait dimension, the portrait sub-model of that dimension outputs the sub-portrait information of each real face region, for example the age indicated by the video user in each region. A mean analysis of the sub-portrait information of the real face regions in the representative face set under that dimension is then performed, and the sub-portrait mean, for example the mean age over the regions, is taken as the interest sub-portrait information of the video user under that dimension in the target video. Performing this mean analysis for every portrait dimension determines the interest sub-portrait information of the video user under each dimension, and combining these yields the interest portrait information of the video user in the target video; a sketch of this fusion follows.
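A minimal sketch of this fusion step, assuming each real face region already carries per-dimension sub-portrait information; averaging the predicted ages and the gender probabilities is one plausible reading of the mean analysis described above.

```python
import numpy as np

def fuse_portraits(sub_portraits):
    """sub_portraits: one dict per real face region in the representative
    face set, e.g. {"age": 24.3, "gender": [0.1, 0.9]}."""
    ages = np.array([p["age"] for p in sub_portraits])
    genders = np.array([p["gender"] for p in sub_portraits])
    return {
        "age": float(ages.mean()),                    # interest sub-portrait: age
        "gender": int(genders.mean(axis=0).argmax())  # interest sub-portrait: gender
    }
```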
In the technical scheme provided by this embodiment, the real face regions actually shot by the video user are first intercepted from each video frame of the target video, eliminating the interference of non-face pictures. The real face regions are then clustered by their face representative features to obtain the representative face set of the video user in the target video, whose real face regions comprehensively represent the video user's range of interest. Portrait information is subsequently generated for each real face region in the representative face set and fused to produce the interest portrait information of the video user in the target video. A video-level user interest portrait is thereby generated, the one-sidedness of treating the user portrait information of one specific video frame as the video's user interest portrait information is avoided, the interference of non-face pictures in the target video is eliminated, and the comprehensiveness and accuracy of the interest portrait information generated for the video user are improved.
Example two
Fig. 2A is a flowchart of a method for generating an interest portrait of a video user according to a second embodiment of the present invention, and Fig. 2B is a schematic diagram of the process of intercepting a real face region in that method. This embodiment is optimized on the basis of the first embodiment. Specifically, as shown in Fig. 2A, this embodiment explains in detail the process of intercepting the real face region from each video frame of the target video.
Optionally, as shown in Fig. 2A, the present embodiment may include the following steps:
S210: intercepting, from each video frame of the target video, the frame region actually shot by the video user as a corresponding new video frame.
Optionally, when actually shooting a video, the video user may merge the newly shot video with other videos and upload the result as a single co-shot video, so pictures shot by other users are likely to appear in the video frames of the target video. To guarantee the correlation between the intercepted real face regions and the video user, this embodiment first determines the picture region actually shot by the video user in each video frame of the target video, intercepts that frame region from each video frame as a corresponding new video frame, and only then intercepts the real face regions from the new video frames, avoiding interference from video pictures shot by other users.
For example, the interception of new video frames from the target video may specifically include: if the target video is a co-shot video, determining the actual shooting area of the video user in the target video, and intercepting the frame picture within the actual shooting area from each video frame as a corresponding new video frame; if the target video is not a co-shot video, taking each video frame of the target video as a corresponding new video frame.
Specifically, pictures shot by other users exist only in co-shot videos, so after the target video is acquired this embodiment first determines whether it is a co-shot video. If it is, pictures shot by other users exist in its video frames; the actual shooting area of the video user is therefore determined, and the frame picture within that area is intercepted from each video frame as a corresponding new video frame. If the target video is not a co-shot video, no pictures shot by other users exist in its frames; the frames are left unprocessed, and each original video frame is directly used as a corresponding new video frame.
The step of determining whether the target video is a co-shot video may specifically include: if the size ratio of the video frames in the target video conforms to a preset co-shot video ratio, determining that the target video is a co-shot video; otherwise, performing straight-line detection on the middle region of each video frame in the target video with an edge detection operator, and, if a straight line exceeding a preset length is detected in each video frame, determining that the target video is a co-shot video.
Specifically, a co-shot video is obtained by merging several different videos actually shot by different users, and a video a user actually shoots matches the size of the terminal display screen, so the size ratios of co-shot and non-co-shot videos differ markedly: a non-co-shot video is adapted to the display screen and is closer to a rectangle, while most co-shot videos spliced from several non-co-shot videos are closer to a square. The size ratio of most co-shot videos therefore conforms to a preset co-shot video ratio. If the size ratio of the video frames in the target video conforms to the preset co-shot ratio, the target video is determined to be a co-shot video; otherwise, each video frame of the target video is converted to a gray-scale image and an edge detection operator (Sobel) is applied to perform straight-line detection on each frame. Because a vertical straight line necessarily runs through the middle of a co-shot video, the middle region of each frame after line detection is selected, a corresponding image dilation operation is applied to it, and the length of the straight line detected in the middle region is measured. If a straight line is detected in the middle region of each video frame and it exceeds a preset length, that line is the splice line of a co-shot video and the target video is determined to be a co-shot video. If the size ratio of the video frames does not conform to the preset co-shot ratio, no straight line is detected, or the detected line does not exceed the preset length, the target video is treated as a non-co-shot video; a single-frame sketch of this test follows.
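The following OpenCV sketch illustrates this co-shot test for a single frame. The aspect-ratio threshold, the Sobel settings, the width of the middle strip and the line-length criterion are all illustrative assumptions.

```python
import cv2
import numpy as np

def is_coshot_frame(frame, ratio_thresh=0.9, min_line_frac=0.6):
    """Return True if the frame looks like a co-shot (spliced) video frame."""
    h, w = frame.shape[:2]
    # Near-square frames match the preset co-shot size ratio directly.
    if min(h, w) / max(h, w) > ratio_thresh:
        return True

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)  # vertical edges
    mid = edges[:, int(w * 0.45):int(w * 0.55)]        # middle strip
    mid = cv2.dilate(mid, np.ones((5, 5), np.uint8))   # image dilation

    # A column with many edge pixels marks a long vertical splice line.
    col_hits = (mid > 128).sum(axis=0)
    return int(col_hits.max()) > h * min_line_frac
```

A video would then be classified by applying this test to its sampled frames and requiring agreement across them, as described above.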
S220: intercepting corresponding face target regions from each new video frame, and screening the corresponding real face regions out of the face target regions.
Optionally, after the frame region actually shot by the video user is intercepted from each video frame of the target video as a corresponding new video frame, face picture analysis of each new video frame allows the face target regions actually shot by the video user to be intercepted from each new frame. Because a video user may add an animated-character special effect or a facial-expression special effect after actually shooting a video, some unreal face regions carrying such effects may exist among the intercepted face target regions. The corresponding real face regions therefore need to be screened out of the face target regions, and face representative features are subsequently extracted only from the real face regions, which eliminates the influence of unreal faces on the generation of the video user's interest portrait and guarantees its accuracy.
For example, for the identification of the face target regions in each new video frame, a face detection model that can accurately identify face regions in an image may be pre-constructed. As shown in Fig. 2B, intercepting the corresponding face target regions from each new video frame may then specifically include: inputting each new video frame into the pre-constructed face detection model, and marking the initial face regions in the new video frame together with the face confidence of each initial face region; and taking the initial face regions whose face confidence exceeds a preset confidence threshold in each new video frame as the corresponding face target regions, and intercepting them at the marked positions.
Specifically, each new video frame may be input into the pre-constructed face detection model, which identifies whether corresponding face regions exist in the frame. The model may mark each initial face region with a rectangular box and record the coordinate position and face confidence of each initial face region. If the face confidence of an initial face region exceeds the preset confidence threshold, the region accurately contains a face picture; such regions are therefore taken as the corresponding face target regions and intercepted at the rectangular-box positions marked in each new video frame, yielding the face target regions present in each new frame.
Further, after the corresponding face target regions are intercepted from each new video frame, a pre-constructed face classification model may be used to judge whether each face target region is a real face region. For this screening, each face target region is input into the pre-constructed face classification model to obtain its real-face score, and the face target regions whose real-face score exceeds a preset real threshold are taken as the corresponding real face regions; a sketch of the two-stage screening follows.
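A minimal sketch of the two-stage screening. `detector` and `classifier` stand in for the pre-constructed face detection and face classification models; their interfaces (`detect`, `real_score`) and both thresholds are assumptions made for illustration.

```python
def real_face_regions(new_frame, detector, classifier,
                      conf_thresh=0.8, real_thresh=0.5):
    """Intercept face target regions and screen out the real face regions."""
    regions = []
    # Stage 1: keep initial face regions whose confidence clears the threshold.
    for box, confidence in detector.detect(new_frame):
        if confidence < conf_thresh:
            continue
        x1, y1, x2, y2 = box
        face = new_frame[y1:y2, x1:x2]  # intercept at the marked position

        # Stage 2: drop animated/special-effect faces via the real-face score.
        if classifier.real_score(face) > real_thresh:
            regions.append(face)
    return regions
```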
S230: clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video.
S240: generating portrait information for each real face region in the representative face set, fusing the portrait information of the regions, and generating the interest portrait information of the video user in the target video.
In the technical scheme provided by this embodiment, the frame region actually shot by the video user is intercepted from each video frame of the target video as a corresponding new video frame, the corresponding face target regions are intercepted from each new frame, and the corresponding real face regions are screened out of the face target regions, excluding the influence of pictures shot by other users and of non-real faces on the generation of the video user's portrait. The real face regions in the target video are then clustered by their face representative features to obtain the representative face set of the video user, whose real face regions comprehensively represent the video user's range of interest. Portrait information is subsequently generated for each real face region in the representative face set and fused into the interest portrait information of the video user in the target video, so that a video-level user interest portrait is generated and the comprehensiveness and accuracy of the interest portrait information are improved.
Example three
Fig. 3A is a flowchart of a method for generating an interest portrait of a video user according to a third embodiment of the present invention, and Fig. 3B is a schematic diagram of the process of generating a user-level interest portrait according to this embodiment. This embodiment is optimized on the basis of the preceding embodiments. Specifically, as shown in Fig. 3A, this embodiment explains in detail the generation of the interest portrait information of the video user at the user level.
Optionally, as shown in Fig. 3A, the present embodiment may include the following steps:
S310: intercepting, from each video frame of the target video, the real face regions actually shot by the video user.
S320: clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video.
S330: taking each video actually shot and uploaded by the video user in the historical video library in turn as the target video, and determining the representative face set of the video user in each video.
Optionally, analyzing the interest portrait information of a video user at the user level first requires analyzing the portrait information in the large number of videos the user has actually shot and uploaded to the historical video library, so that the interest portrait information at the level of many videos can subsequently be analyzed comprehensively to obtain the user-level interest portrait information. Each video in the historical video library actually shot and uploaded by the video user is therefore taken in turn as the target video of this embodiment, and the representative face set of the video user in each video is determined with the video-level interest portrait generation method provided above. The representative face sets of all the videos are subsequently merged and analyzed again, guaranteeing the comprehensiveness of the user-level interest portrait information.
S340: re-clustering the real face regions of the video user after the representative face sets of the videos are merged, to obtain a comprehensive representative face set of the video user.
Optionally, after the representative face set of the video user in each video is determined, the representative face sets are merged. Because different videos shot by the video user may show different users, dissimilar real face regions may exist in the merged overall representative face set. The similarity between the face representative features of the real face regions in the merged set is therefore analyzed with an existing clustering algorithm to re-cluster the regions; the real face regions within each face clustering result are similar faces, that is, each result can represent the face information of one user. Since the videos were actually shot by the video user, the face appearing most often in the merged representative face set can be taken to represent the video pictures the user is interested in. After the face clustering results are obtained, the result containing the largest number of similar real face regions is therefore selected, its real face regions are taken as the face information the video user is interested in, and it is used as the comprehensive representative face set of the video user, which contains the real face regions of interest together with the face representative feature of each region. A sketch of this aggregation follows.
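A minimal sketch of this user-level aggregation, reusing the `representative_face_set` helper sketched in the first embodiment; the per-video data layout is an assumption.

```python
def comprehensive_face_set(per_video_sets):
    """per_video_sets: one (regions, embeddings) pair per video in the
    user's historical video library."""
    all_regions, all_embeddings = [], []
    for regions, embeddings in per_video_sets:
        all_regions.extend(regions)
        all_embeddings.extend(list(embeddings))
    # Re-cluster the merged regions; the largest cluster becomes the
    # comprehensive representative face set of the video user.
    return representative_face_set(all_regions, all_embeddings)
```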
S350: generating portrait information for each real face region in the comprehensive representative face set, fusing the portrait information of the regions, and generating the interest portrait information of the video user.
Optionally, after the comprehensive representative face set of the video user is obtained, its real face regions can all represent real face information the video user is interested in. This embodiment therefore analyzes the face representative feature of each real face region in the comprehensive representative face set to determine the portrait information, such as age and gender, represented by each region, and then fuses the portrait information of all the regions to obtain the interest portrait information of the video user. Analyzing and fusing the portrait information of every real face region in the comprehensive representative face set avoids the one-sidedness of analyzing a video-level user interest portrait from a single specific video frame, and improves the comprehensiveness and accuracy of the generated interest portrait information.
In the technical scheme provided by this embodiment, each video in the historical video library actually shot and uploaded by the video user is taken in turn as the target video, the representative face set of the video user in each video is determined, and the real face regions of the merged representative face sets are re-clustered into a comprehensive representative face set of the video user. Portrait information is then generated for each real face region in the comprehensive representative face set and fused into the interest portrait information of the video user, so that user-level interest portrait information is generated on top of the video-level interest portrait information, improving the comprehensiveness and accuracy of the video user's interest portrait.
Example four
Fig. 4 is a schematic structural diagram of an interest portrait generation apparatus for a video user according to a fourth embodiment of the present invention. Specifically, as shown in Fig. 4, the apparatus may include:
a real face intercepting module 410, configured to intercept, from each video frame of the target video, the real face regions actually shot by the video user;
a representative face clustering module 420, configured to cluster the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video;
and a portrait generation module 430, configured to generate portrait information for each real face region in the representative face set, and to fuse the portrait information of the regions to generate the interest portrait information of the video user in the target video.
In the technical scheme provided by this embodiment, the real face regions actually shot by the video user are first intercepted from each video frame of the target video, eliminating the interference of non-face pictures. The real face regions are then clustered by their face representative features to obtain the representative face set of the video user in the target video, whose real face regions comprehensively represent the video user's range of interest. Portrait information is subsequently generated for each real face region in the representative face set and fused to produce the interest portrait information of the video user in the target video. A video-level user interest portrait is thereby generated, the one-sidedness of treating the user portrait information of one specific video frame as the video's user interest portrait information is avoided, the interference of non-face pictures in the target video is eliminated, and the comprehensiveness and accuracy of the interest portrait information generated for the video user are improved.
The interest portrait generation apparatus for a video user provided by this embodiment can execute the interest portrait generation method for a video user provided by any embodiment of the present invention, and has the corresponding functions and beneficial effects.
Example five
Fig. 5 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention. As shown in Fig. 5, the apparatus includes a processor 50, a storage device 51 and a communication device 52; the number of processors 50 in the apparatus may be one or more, and one processor 50 is taken as an example in Fig. 5; the processor 50, the storage device 51 and the communication device 52 in the apparatus may be connected by a bus or in another manner, and connection by a bus is taken as the example in Fig. 5.
The device provided by the embodiment can be used for executing the interest portrait generation method of the video user provided by any embodiment, and has corresponding functions and beneficial effects.
Example six
The sixth embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the method for generating an interest portrait of a video user in any of the above embodiments. The method specifically comprises the following steps:
intercepting, from each video frame of a target video, the real face regions actually shot by the video user;
clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video;
and generating portrait information for each real face region in the representative face set, and fusing the portrait information of the real face regions to generate the interest portrait information of the video user in the target video.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the method for generating an interest representation of a video user provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the above-mentioned interest representation generation apparatus for video users, the included units and modules are only divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An interest portrait generation method for a video user, comprising:
intercepting, from each video frame of a target video, the real face regions actually shot by the video user;
clustering the real face regions in the target video by their face representative features to obtain a representative face set of the video user in the target video;
and generating portrait information for each real face region in the representative face set, and fusing the portrait information of the real face regions to generate the interest portrait information of the video user in the target video.
2. The method of claim 1, wherein generating the portrait information of each real face region in the representative face set comprises:
for each real face region in the representative face set, inputting the face representative feature of the real face region into a pre-constructed portrait generation model to obtain sub-portrait information of the real face region under each portrait dimension, and combining the sub-portrait information into the portrait information of the real face region.
3. The method of claim 2, wherein fusing the portrait information of each real face region to generate the interest portrait information of the video user in the target video comprises:
for each portrait dimension, performing a mean analysis of the sub-portrait information of each real face region in the representative face set under the portrait dimension, and determining the interest sub-portrait information of the video user under the portrait dimension in the target video;
and combining the interest sub-portrait information of the video user under each portrait dimension in the target video to obtain the interest portrait information of the video user in the target video.
4. The method of claim 1, wherein intercepting, from each video frame of the target video, the real face regions actually shot by the video user comprises:
intercepting, from each video frame of the target video, the frame region actually shot by the video user as a corresponding new video frame;
and intercepting corresponding face target regions from each new video frame, and screening the corresponding real face regions out of the face target regions.
5. The method of claim 4, wherein intercepting, from each video frame of the target video, the frame region actually shot by the video user as a corresponding new video frame comprises:
if the target video is a co-shot video, determining an actual shooting area of the video user in the target video, and intercepting the frame picture within the actual shooting area from each video frame as a corresponding new video frame;
and if the target video is not a co-shot video, taking each video frame of the target video as a corresponding new video frame.
6. The method of claim 5, further comprising, before cropping the picture area actually shot by the video user from each video frame of the target video:
if the size ratio of the video frames in the target video matches the preset ratio of a snap-shot video, determining that the target video is a snap-shot video; otherwise, performing straight-line detection on the middle region of each video frame of the target video using an edge detection operator; and
if a straight line is detected in each video frame of the target video and the straight line exceeds a preset length, determining that the target video is a snap-shot video.
7. The method of claim 4, wherein cropping the corresponding face target regions from each new video frame comprises:
inputting each new video frame into a pre-constructed face detection model, and marking the initial face regions in the new video frame and the face confidence of each initial face region; and
taking, in each new video frame, the initial face regions whose face confidence exceeds a preset confidence threshold as the corresponding face target regions, and cropping them according to the marked positions of the face target regions.
8. The method of claim 4, wherein screening the corresponding real face regions out of the face target regions comprises:
inputting each face target region into a pre-constructed face classification model to obtain the real-face score of the face target region; and
taking the face target regions whose real-face score exceeds a preset reality threshold as the corresponding real face regions.
9. The method according to any one of claims 1-8, further comprising:
taking each video in a historical video library actually shot and uploaded by the video user in turn as the target video, and determining the representative face set of the video user in each video;
merging the representative face sets of the videos and re-clustering the real face regions of the video user therein, to obtain a comprehensive representative face set of the video user; and
generating portrait information for each real face region in the comprehensive representative face set, fusing the portrait information of the real face regions, and generating the interest portrait information of the video user.
10. An interest portrait generation device for a video user, comprising:
a real face cropping module, configured to crop, from each video frame of a target video, the real face regions actually shot by a video user;
a representative face clustering module, configured to cluster the real face regions in the target video using the face representative features of the real face regions, to obtain a representative face set of the video user in the target video; and
a portrait generation module, configured to generate portrait information for each real face region in the representative face set, fuse the portrait information of the real face regions, and generate interest portrait information of the video user in the target video.
11. An apparatus, comprising:
one or more processors; and
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the interest portrait generation method for a video user according to any one of claims 1-9.
12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the interest portrait generation method for a video user according to any one of claims 1-9.
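
By way of illustration, the clustering recited in claims 1 and 9 can be realised with an off-the-shelf density-based algorithm over face embeddings. The Python sketch below is a minimal example, assuming the face representative features are fixed-length embedding vectors; the choice of scikit-learn's DBSCAN and the eps and min_samples values are assumptions, not part of the claims.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def representative_face_set(face_regions, embeddings, eps=0.5):
        """Cluster face embeddings; keep one representative crop per cluster."""
        X = np.asarray(embeddings)
        labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
        representatives = []
        for label in set(labels) - {-1}:  # label -1 marks unclustered noise
            members = np.where(labels == label)[0]
            centroid = X[members].mean(axis=0)
            # The member closest to the cluster centroid serves as the
            # representative face for this cluster.
            distances = np.linalg.norm(X[members] - centroid, axis=1)
            representatives.append(face_regions[members[np.argmin(distances)]])
        return representatives

For the comprehensive representative face set of claim 9, the same function can be applied to the face regions and embeddings pooled from every video in the user's historical library.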
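
The mean-value analysis of claim 3 amounts to averaging per-dimension score vectors across the representative faces. A minimal sketch, assuming each face's sub-portrait information is a dict mapping a portrait dimension to a probability vector (the dimension name below is hypothetical):

    import numpy as np

    def fuse_portraits(portraits):
        """Average the sub-portrait vectors of all faces, dimension by dimension."""
        return {dim: np.mean([p[dim] for p in portraits], axis=0)
                for dim in portraits[0]}

    # Two faces scored on a hypothetical "age_band" dimension:
    # fuse_portraits([{"age_band": [0.1, 0.9]}, {"age_band": [0.3, 0.7]}])
    # -> {"age_band": array([0.2, 0.8])}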
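
Claim 6's two-stage test, a frame-size ratio check followed by a search for a long straight divider in the middle of every frame, maps naturally onto an edge detector plus a Hough line transform. The OpenCV sketch below is only illustrative: the candidate aspect ratio, the width of the "middle region", and the line-length threshold are assumed values standing in for the claim's preset parameters.

    import cv2
    import numpy as np

    SNAP_RATIO = 32 / 9          # assumed preset ratio for a side-by-side video
    MIN_LINE_FRACTION = 0.6      # assumed preset length, as a fraction of height

    def is_snap_shot_video(frames):
        h, w = frames[0].shape[:2]
        if abs(w / h - SNAP_RATIO) < 0.05:            # stage 1: size-ratio test
            return True
        for frame in frames:                          # stage 2: divider-line test
            strip = frame[:, int(w * 0.4):int(w * 0.6)]     # middle region only
            edges = cv2.Canny(cv2.cvtColor(strip, cv2.COLOR_BGR2GRAY), 50, 150)
            lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                                    minLineLength=int(h * MIN_LINE_FRACTION),
                                    maxLineGap=10)
            if lines is None:    # one frame without a long line ends the test
                return False
        return True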
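
Claims 7 and 8 cascade two filters: a detector whose confidence gates which boxes are cropped, and a classifier whose real-face score weeds out posters, cartoons, and other faces that were not actually shot. The control flow can be sketched as follows; the detector and classifier objects are hypothetical stand-ins (the patent names no concrete networks), and the threshold values are assumptions.

    def crop_real_faces(new_frames, detector, classifier,
                        conf_threshold=0.9, real_threshold=0.5):
        """Two-stage filter: detection confidence (claim 7), then realness (claim 8)."""
        real_faces = []
        for frame in new_frames:
            # detector.detect is assumed to yield ((x, y, w, h), confidence) pairs.
            for (x, y, w, h), confidence in detector.detect(frame):
                if confidence <= conf_threshold:
                    continue                         # drop low-confidence boxes
                candidate = frame[y:y + h, x:x + w]  # crop at the marked position
                if classifier.real_face_score(candidate) > real_threshold:
                    real_faces.append(candidate)
        return real_faces
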
CN202010526685.9A 2020-06-09 2020-06-09 Method, device, equipment and storage medium for generating interest portraits of video users Active CN111666908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010526685.9A CN111666908B (en) 2020-06-09 2020-06-09 Method, device, equipment and storage medium for generating interest portraits of video users

Publications (2)

Publication Number Publication Date
CN111666908A true CN111666908A (en) 2020-09-15
CN111666908B CN111666908B (en) 2023-05-16

Family

ID=72386670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010526685.9A Active CN111666908B (en) 2020-06-09 2020-06-09 Method, device, equipment and storage medium for generating interest portraits of video users

Country Status (1)

Country Link
CN (1) CN111666908B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407418A * 2016-09-23 2017-02-15 TCL Corp. Face-recognition-based personalized video recommendation method and recommendation system
CN108446385A * 2018-03-21 2018-08-24 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating information
CN108471544A * 2018-03-28 2018-08-31 Beijing QIYI Century Science & Technology Co., Ltd. Method and device for constructing a video user portrait
CN109190449A * 2018-07-09 2019-01-11 Beijing Dajia Internet Information Technology Co., Ltd. Age recognition method and device, electronic equipment and storage medium
CN110135257A * 2019-04-12 2019-08-16 Shenzhen OneConnect Smart Technology Co., Ltd. Service recommendation data generation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111666908B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN108875676B (en) Living body detection method, device and system
CN106897658B (en) Method and device for identifying human face living body
CN109815924B (en) Expression recognition method, device and system
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
US10318797B2 (en) Image processing apparatus and image processing method
CN110569721A (en) Recognition model training method, image recognition method, device, equipment and medium
KR20180122926A (en) Method for providing learning service and apparatus thereof
CN108875542B (en) Face recognition method, device and system and computer storage medium
US20160004904A1 (en) Facial tracking with classifiers
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN111242097A (en) Face recognition method and device, computer readable medium and electronic equipment
CN113779308B (en) Short video detection and multi-classification method, device and storage medium
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
CN111814620A (en) Face image quality evaluation model establishing method, optimization method, medium and device
CN111738120B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN109241299B (en) Multimedia resource searching method, device, storage medium and equipment
CN112149570B (en) Multi-person living body detection method, device, electronic equipment and storage medium
CN112084812A (en) Image processing method, image processing device, computer equipment and storage medium
CN111079587B (en) Face recognition method and device, computer equipment and readable storage medium
KR101961462B1 (en) Object recognition method and the device thereof
CN113705310A (en) Feature learning method, target object identification method and corresponding device
CN115223022B (en) Image processing method, device, storage medium and equipment
CN111666908B (en) Method, device, equipment and storage medium for generating interest portraits of video users
CN112101479B (en) Hair style identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231008

Address after: #15-31A, Mapletree Business City, 30 Pasir Panjang Road, Singapore

Patentee after: Bigo Technology Pte. Ltd.

Address before: 5/F-13/F, West Tower, Building C, 274 Xingtai Road, Shiqiao Street, Panyu District, Guangzhou, Guangdong 510000

Patentee before: GUANGZHOU BAIGUOYUAN INFORMATION TECHNOLOGY Co.,Ltd.
