CN111598600A - Multimedia information pushing method and system and terminal equipment - Google Patents

Multimedia information pushing method and system and terminal equipment

Info

Publication number
CN111598600A
Authority
CN
China
Prior art keywords
attribute
image
face image
multimedia information
analysis result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910132779.5A
Other languages
Chinese (zh)
Inventor
方三勇
邱翰
王进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rainbow Software Co ltd
ArcSoft Corp Ltd
Original Assignee
Rainbow Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rainbow Software Co ltd
Priority to CN201910132779.5A
Publication of CN111598600A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0252 Targeted advertisements based on events or environment, e.g. weather or festivals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • H04N21/25891 Management of end-user data being end-user preferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178 Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Abstract

The invention discloses a multimedia information pushing method, a multimedia information pushing system, and a terminal device. The method comprises: acquiring a face image; analyzing attributes of the face image and outputting an analysis result; classifying the face image according to the analysis result and determining a target category; and selecting and pushing multimedia information corresponding to the target category. By classifying the actual receiving objects based on face attribute analysis and accurately pushing multimedia information to different types of audience groups according to the classification result, the method solves the technical problem that multimedia information cannot be updated in real time and pushed accurately according to the actual receiving objects.

Description

Multimedia information pushing method and system and terminal equipment
Technical Field
The invention relates to an information processing technology, in particular to a multimedia information pushing method, a multimedia information pushing system and terminal equipment.
Background
The electronic player is a common information pushing tool for merchants. Examples include electronic players installed in elevators, public transportation vehicles, shopping guide machines, treadmills, and the like; electronic players in public places; and the display screens of electronic devices such as tablet computers, desktop computers, and mobile phones.
At present, however, merchants push information based on big data or offline research: the attributes of the target objects (including age, sex, demographic proportions, etc.) are analyzed in advance, a suitable poster or video is selected according to the analysis result, and the information is pushed to a specific area with periodic updates.
In this conventional manner, information coverage is narrow and the audience population is limited. When the target objects in the delivery area change, the merchant cannot promptly learn that the intended target no longer matches the actual receiving objects, and cannot update the information in real time according to the actual receiving objects in front of the electronic player, so the expected push effect is not achieved.
Disclosure of Invention
The embodiment of the invention provides a multimedia information pushing method, a multimedia information pushing system and terminal equipment, which at least solve the technical problem that the multimedia information cannot be updated in real time and pushed accurately according to an actual receiving object.
According to an aspect of the embodiments of the present invention, a multimedia information pushing method is provided, including: acquiring a face image; analyzing the attribute of the face image and outputting an analysis result; classifying the face images according to the analysis result, and determining the target category; and selecting and pushing the multimedia information corresponding to the target category.
Further, the multimedia information pushing method obtains the face image through an image capturing device, wherein the image capturing device is an independent camera device or a camera device integrated on the electronic equipment.
Further, the acquiring the face image includes: detecting whether the currently acquired image contains a human face; marking a detection frame for an image containing a human face; carrying out quality evaluation on the face image of the mark detection frame; and acquiring the face image with qualified quality evaluation.
Further, the attribute of the face image includes at least one of: gender, age group, race, specific age, expression.
Further, when the attribute of the face image includes a race attribute, the race attribute of the face image is analyzed, the face image is divided into a plurality of race categories according to different races, and if the face image further includes other attributes except for the race attribute, the other attributes of the face image are analyzed under each race category.
Further, when the attribute of the facial image includes an expression attribute, the expression attribute includes a degree of attention to the multimedia information.
Further, the attributes of the face image are analyzed using a multi-task classification model.
Further, when the number of attribute items of the face image is N, the multi-task classification model includes N input layers, M subgroup layers, and N output branches, where each attribute has a corresponding input layer, subgroup layer, and output branch, each output branch outputs the analysis result of one attribute, and N and M are integers greater than 1.
Further, when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and N output branches, where the N output branches share the 1 input layer and the at least 1 subgroup layer, each output branch outputs the analysis result of one attribute, and N is an integer greater than 1.
Further, when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and M output branches, where the M output branches share the 1 input layer and the at least 1 subgroup layer; one of the M output branches first outputs the analysis result of one attribute (denoted the A attribute) of the N attributes, then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classification under the A attribute, the number of classifications under the A attribute is (M-1), and N and M are integers greater than 1.
Further, each output branch of the (M-1) output branches comprises (N-1) output layers.
Further, the analyzing the attributes of the face image and outputting the analysis result includes: positioning feature points of the face image to obtain a first image; adjusting the first image according to a standard image to obtain a second image; inputting the second image into the multitask classification model, and analyzing the attribute of the second image; and outputting the analysis result.
Further, the feature points include at least one of: eyes, nose, mouth, eyebrows.
Further, feature point positioning is performed on the face image using an ASM (Active Shape Model) algorithm.
Further, the adjusting the first image according to the standard image, and obtaining the second image includes: aligning feature points in the first image with feature points in the standard image; obtaining the second image having the same size as the standard image.
Further, the standard image is a face image with feature points marked in advance, and the number of the feature points in the first image is the same as that of the feature points in the standard image.
Further, the first image is adjusted using affine transformation.
Further, training the multi-task classification model, the training comprising: acquiring a large number of sample images, and manually labeling a part of the sample images to obtain a labeling result; inputting the sample images and the labeling result into the multi-task classification model; analyzing the attributes of the sample images to obtain a sample analysis result; comparing the sample analysis result with the labeling result to determine a loss function; and updating the multi-task classification model according to the loss function.
Further, comparing the sample analysis result with the labeling result to determine the loss function comprises: comparing each of a plurality of attribute values in the sample analysis result with the corresponding attribute value in the labeling result to obtain a loss function for each attribute value; and weighting and summing the loss functions of the attribute values to obtain the overall loss function of the sample image.
Further, updating the multi-task classification model according to the loss function includes: adjusting the weight values of each layer in the multi-task classification model according to the loss function until the output of the model is consistent with the labeling result, or the difference reaches a minimum and no longer changes, at which point the update is complete.
According to another aspect of the embodiments of the present invention, there is also provided a multimedia information pushing system, including: an image capturing device configured to acquire a face image; the image analysis device is configured to analyze the attribute of the face image, output an analysis result, classify the face image according to the analysis result and determine the target category; the playing device is configured to push the corresponding multimedia information according to the target type.
Further, the image capturing apparatus is a separate camera or a camera integrated with the playback apparatus in one device.
Further, the image analysis device comprises a multi-task classification model, wherein when the number of attribute items of the face image is N, the multi-task classification model comprises N input layers, M subgroup layers, and N output branches, where each attribute has a corresponding input layer, subgroup layer, and output branch, and each output branch outputs the analysis result of one attribute; or, the multi-task classification model comprises 1 input layer, at least 1 subgroup layer, and N output branches, where the N output branches share the input layer and subgroup layer(s), and each output branch outputs the analysis result of one attribute; or, the multi-task classification model comprises 1 input layer, at least 1 subgroup layer, and M output branches, where the M output branches share the input layer and subgroup layer(s), one of the M output branches first outputs the analysis result of one attribute (denoted the A attribute) of the N attributes, then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classification under the A attribute, and the number of classifications under the A attribute is (M-1); N and M are integers greater than 1.
According to another aspect of the embodiments of the present invention, there is also provided a terminal device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute any one of the above multimedia information pushing methods via executing the executable instructions.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium comprising a stored program, wherein when the program runs, the device on which the storage medium is located is controlled to execute any one of the above multimedia information pushing methods.
In the embodiment of the invention, a face image is obtained; the attributes of the face image are analyzed and an analysis result is output; the face image is classified according to the analysis result and a target category is determined; and multimedia information corresponding to the target category is selected and pushed. By classifying the actual receiving objects based on face attribute analysis and accurately pushing multimedia information to different types of audience groups according to the classification result, the embodiment solves the technical problem that multimedia information cannot be updated in real time and pushed accurately according to the actual receiving objects.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart illustrating an alternative multimedia information push method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative method of obtaining a face image according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an alternative method of face attribute analysis according to an embodiment of the present invention;
FIG. 4 is a flow diagram of an alternative method of multi-tasking classification model training in accordance with an embodiment of the invention;
FIG. 5 is a diagram illustrating an alternative multimedia information push system according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention classifies the actual receiving objects based on the analysis of face attributes, and accurately pushes multimedia information to different types of audience groups according to the classification result. For enterprises, accurate multimedia information pushing can greatly improve efficiency and save cost and resources; for users, it can provide suitable information, shield them from junk-information interference, and improve their acceptance of the information.
The embodiment of the invention can be applied to multimedia information pushing in places with a camera system and video image playing equipment, such as elevators, subways, buses, taxis, supermarkets, markets and the like, and can also be applied to terminal equipment with the camera system and the video image playing equipment, such as mobile phones, tablet computers, desktop computers, running machines, shopping guide machines and the like.
The embodiment of the invention can be applied to pushing various multimedia information such as commercial advertisements, education information, broadcast information, entertainment programs and the like aiming at different types of audience groups.
An alternative multimedia information pushing method according to an embodiment of the present invention is described below. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Referring to fig. 1, a flow chart of an alternative multimedia information pushing method according to an embodiment of the invention is shown. As shown in fig. 1, the method comprises the steps of:
s10: acquiring a face image;
s12: analyzing the attribute of the face image and outputting an analysis result;
s14: classifying the face images according to the analysis result, and determining the target category;
s16: selecting and pushing the multimedia information corresponding to the target category.
In the embodiment of the invention, through the above steps, a face image is obtained, its attributes are analyzed, the analysis result is used to classify the face image and determine the target category, and the multimedia information corresponding to the target category is selected and pushed. In this way the actual receiving object is analyzed in real time based on face attributes, and the multimedia information is updated in real time accordingly, achieving accurate multimedia information pushing.
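As a rough, self-contained illustration of how steps S10-S16 chain together, the Python sketch below wires the four steps into a single push cycle. Everything in it (the function names, the fake capture result, the media catalogue) is a hypothetical stand-in rather than part of the disclosed system; the real components are described in the sections that follow.

```python
from collections import Counter

# Hypothetical stand-ins for the components described below; a real system
# would use a camera, a trained multi-task CNN, and a playback device.
def acquire_face_images():
    return [{"gender": 1, "age_group": 3}]           # S10: pretend one face was captured

def analyze_attributes(face):
    return (face["gender"], face["age_group"])       # S12: pretend analysis result

MEDIA_LIBRARY = {(1, 3): "skincare_ad.mp4"}          # attribute tag -> media item

def push_once():
    faces = acquire_face_images()                    # S10
    results = [analyze_attributes(f) for f in faces] # S12
    target, _ = Counter(results).most_common(1)[0]   # S14: majority category
    media = MEDIA_LIBRARY.get(target)                # S16
    if media:
        print("pushing", media)

push_once()
```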
The above steps will be described in detail below.
Step S10, acquiring a face image;
optionally, in the embodiment of the present invention, the face image may be obtained by an image capturing device, where the image capturing device may be an independent image capturing device or an image capturing device integrated on an electronic device, for example, a monitoring probe in a public place such as an elevator, a subway, a bus, a supermarket, a shopping mall, or a camera carried by an electronic device with a video image playing function such as a mobile phone, a tablet computer, a desktop computer, a shopping guide machine, a treadmill, or the like.
Optionally, in the embodiment of the present invention, the face image is a grayscale image or a color image. Preferably, when the face image is a color image, the recognition rate can be effectively improved by using the color information, and the accuracy of face attribute analysis is improved. For example, when the face image is a color image, the skin color of the face may be used as a parameter for analyzing the ethnic attribute.
Optionally, in the embodiment of the present invention, the face image is a planar image, and the apparent attribute of the face image is analyzed to obtain an analysis result. For example, a human face that is actually 40 years old may have an apparent age of 20-30 years. Similarly, sex, race, etc. are also analyzed as apparent attributes.
Optionally, in the embodiment of the present invention, the video frame image may be acquired by the image capturing device every predetermined number of frames, so as to reduce the acquiring frequency of the video frame image and optimize the computing resource.
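A minimal sketch of such frame-stride sampling, assuming OpenCV as the capture backend (the patent does not name one); the stride value and camera index are arbitrary assumptions:

```python
import cv2

FRAME_STRIDE = 15  # assumed value: process one frame out of every 15

cap = cv2.VideoCapture(0)          # 0 = default camera; the index is an assumption
frame_index = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame_index += 1
    if frame_index % FRAME_STRIDE != 0:
        continue                   # skip frames to reduce acquisition frequency
    # ... hand `frame` to face detection (step S10) ...
cap.release()
```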
Step S12, analyzing the attribute of the face image to obtain an analysis result;
optionally, in this embodiment of the present invention, the attribute of the face image includes at least one of: gender, age group, race, specific age, expression.
Optionally, in the embodiment of the present invention, when the attribute of the face image includes an expression attribute, the expression attribute includes a degree of attention to the multimedia information. The attention degree of the multimedia information can be determined by various modes such as the eye direction, attention time, face inclination angle and the like of the face.
Because a conventional electronic player generally cannot reflect, in real time, how much attention the target object pays to the multimedia information being delivered, pushing continues even when the target object is not interested. On one hand this wastes resources and weakens the influence of the multimedia information; on the other, it annoys the target object. By analyzing the actual receiving object's degree of attention to the multimedia information, it can be determined whether the object is receiving the information: if so, pushing continues; if the object no longer pays attention, the pushed multimedia information is stopped or updated. Before a new push round begins, the attention measure is refreshed. In addition, the number of actual receiving objects can be determined by analyzing their degree of attention, which further supports analysis of the multimedia information's popularity.
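As one hedged illustration of how a degree-of-attention measure might combine face angle and attention time (two of the cues the text mentions), the toy tracker below counts a face as engaged once its yaw angle stays within a threshold for a minimum dwell time. The thresholds and the yaw-based criterion are assumptions for illustration only:

```python
import time

YAW_LIMIT_DEG = 25.0    # assumed: a face turned more than this is "not attending"
MIN_DWELL_SEC = 2.0     # assumed: minimum continuous attention to count as engaged

class AttentionTracker:
    """Toy tracker: a face counts as attending while its yaw stays small."""
    def __init__(self):
        self.attend_start = None

    def update(self, yaw_deg, now=None):
        now = time.monotonic() if now is None else now
        if abs(yaw_deg) <= YAW_LIMIT_DEG:
            if self.attend_start is None:
                self.attend_start = now          # attention span begins
            return (now - self.attend_start) >= MIN_DWELL_SEC  # engaged?
        self.attend_start = None                 # looked away: reset the span
        return False

tracker = AttentionTracker()
print(tracker.update(10.0, now=0.0))   # False: just started attending
print(tracker.update(12.0, now=3.0))   # True: has attended for 3 s
print(tracker.update(40.0, now=4.0))   # False: looked away; push may be updated
```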
Optionally, in the embodiment of the present invention, the attributes of the face image may be analyzed using a multi-task classification model. The multi-task classification model may include an input layer, a subgroup layer, and an output branch, where the input layer receives the face image, the subgroup layer analyzes its attributes, and the output branch outputs the analysis result.
When the number of attribute items of the face image is N, the multi-task classification model may take various forms. The first structure: the model comprises N input layers, M subgroup layers, and N output branches, where each attribute has its corresponding input layer, subgroup layer, and output branch, and each output branch outputs the analysis result of one attribute; preferably, analysis accuracy can be improved when M is greater than or equal to N. The second structure: the model comprises 1 input layer, at least 1 subgroup layer, and N output branches, where the N output branches share the input layer and subgroup layer(s), and each output branch outputs the analysis result of one attribute. The third structure: the model comprises 1 input layer, at least 1 subgroup layer, and M output branches, where the M output branches share the input layer and subgroup layer(s); one of the M output branches first outputs the analysis result of one of the N attributes (denoted the A attribute), and then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classification under the A attribute, the number of classifications under the A attribute being (M-1). Here N and M are integers greater than 1. In this third structure the (M-1) output branches can also take various forms; for example, each of the (M-1) output branches may include (N-1) output layers, each outputting the analysis result of one attribute. The third structure has a better detection rate than the first and second structures. For simplicity, only three exemplary structures are listed here; those skilled in the art can also construct multi-task classification models with other structures to analyze the attributes of the face image and obtain the analysis result.
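The following PyTorch sketch shows one possible reading of the third structure, assuming N = 4 attributes (gender, race, age group, specific age) and M = 4 output branches (one race classifier plus one branch per race, so each race branch carries N - 1 = 3 output layers). All layer sizes are arbitrary assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Shared input layer plus subgroup layer (conv + activation); sizes assumed."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())

    def forward(self, x):
        return self.features(x)  # -> (batch, 32 * 4 * 4)

class AttributeBranch(nn.Module):
    """One of the (M-1) race branches, with (N-1) = 3 output layers."""
    def __init__(self, in_dim=32 * 4 * 4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.gender = nn.Linear(64, 2)     # output layer 1
        self.age_group = nn.Linear(64, 7)  # output layer 2
        self.age_value = nn.Linear(64, 1)  # output layer 3

    def forward(self, h):
        z = self.fc(h)
        return self.gender(z), self.age_group(z), self.age_value(z)

class MultiTaskFaceModel(nn.Module):
    def __init__(self, n_races=3):
        super().__init__()
        self.trunk = SharedTrunk()
        self.race_head = nn.Linear(32 * 4 * 4, n_races)  # the "A attribute" branch
        self.race_branches = nn.ModuleList(AttributeBranch() for _ in range(n_races))

    def forward(self, x):
        h = self.trunk(x)
        race_logits = self.race_head(h)
        race = race_logits.argmax(dim=1)  # route each sample to one race branch
        outs = [self.race_branches[int(r)](h[i:i + 1]) for i, r in enumerate(race)]
        return race_logits, outs

model = MultiTaskFaceModel()
race_logits, outs = model(torch.randn(2, 3, 64, 64))
print(race_logits.shape, len(outs))  # torch.Size([2, 3]) 2
```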
Optionally, in the embodiment of the present invention, for example, when the attributes of the face image include the race attribute (i.e., the A attribute), the race attribute is analyzed first and the face image is assigned to one of several race categories; if the face image has other attributes besides race, those attributes are then analyzed within each race category. Since, in practice, people of different races differ considerably in other apparent attributes such as apparent age and gender, analyzing the race attribute first, dividing the face images into race categories, and then analyzing the remaining attributes within each category can improve the accuracy of the analysis result.
Optionally, in the embodiment of the present invention, taking as an example the case where the attributes of the face image include the four attributes gender, race, age group, and specific age, the gender, age group, and race are divided into classes and labels are set. For example, race is divided into (yellow, white, black), gender into (male, female), and age into 7 age groups (infant 0-5, child 6-15, youth 16-25, adult 26-35, middle-aged 36-48, older middle-aged 49-60, elderly 60+), with labels set as race (0,1,2), gender (0,1), and age group (0,1,2,3,4,5,6), respectively. The race, gender, age group, and specific age of each face image are analyzed to obtain an analysis result; for example, if the attributes of a face image are black, female, youth 16-25, with a specific age of 23, the analysis result is (2, 1, 2, 23).
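A small sketch of this label scheme; the mappings mirror the example above, and the helper names are hypothetical:

```python
# Tuple order (race, gender, age_group, age) follows the example result (2, 1, 2, 23).
RACE = {"yellow": 0, "white": 1, "black": 2}
GENDER = {"male": 0, "female": 1}
# "60+" is modeled as (61, 150); age 60 itself falls in the 49-60 band here (an assumption).
AGE_GROUPS = [(0, 5), (6, 15), (16, 25), (26, 35), (36, 48), (49, 60), (61, 150)]

def age_group_label(age):
    for label, (lo, hi) in enumerate(AGE_GROUPS):
        if lo <= age <= hi:
            return label
    raise ValueError("age out of range")

def encode(race, gender, age):
    return (RACE[race], GENDER[gender], age_group_label(age), age)

print(encode("black", "female", 23))  # (2, 1, 2, 23), as in the text's example
```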
Step S14, classifying the face images according to the analysis result and determining the target category;
optionally, in the embodiment of the present invention, again taking the four attributes gender, race, age group, and specific age as an example, the face image may be classified according to the gender and age-group labels in the analysis result; race and specific age can serve as auxiliary attributes to improve classification precision in different application scenarios.
Optionally, in the embodiment of the present invention, the category with the highest proportion may be determined as the target category. For example, according to the gender and age group classification, if the label with the highest ratio in the face image is (1, 3), the target class is determined to be (1, 3), corresponding to an adult female.
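A minimal sketch of picking the highest-proportion category, assuming analysis results in the (race, gender, age group, specific age) tuple form used above:

```python
from collections import Counter

# Assumed helper: pick the (gender, age_group) pair with the highest proportion
# among the faces currently in front of the player.
def target_category(analysis_results):
    pairs = [(r[1], r[2]) for r in analysis_results]  # (gender, age_group)
    return Counter(pairs).most_common(1)[0][0]

faces = [(2, 1, 3, 30), (0, 1, 3, 28), (1, 0, 4, 40)]  # (race, gender, age_group, age)
print(target_category(faces))  # (1, 3) -> adult female, as in the text's example
```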
Step S16, selecting and pushing the multimedia information corresponding to the target category;
optionally, in the embodiment of the present invention, the multimedia information may be classified according to preset face attributes, and the multimedia information corresponding to the target category is then selected and pushed. Specifically, each piece of multimedia information may carry at least one attribute tag, and a piece of multimedia information may be pushed when the target category matches one of its attribute tags. For example, if the target category is adult female (1, 3), the attribute tags of multimedia information related to beauty and skin care may be set to (1, 3) and (1, 4); since the tag (1, 3) matches the target category, that multimedia information is selected and pushed.
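A matching sketch of tag-based selection; the catalogue contents are invented for illustration:

```python
# Assumed media catalogue: each item carries one or more (gender, age_group) tags.
CATALOGUE = [
    {"name": "skincare_ad.mp4", "tags": {(1, 3), (1, 4)}},
    {"name": "game_console_ad.mp4", "tags": {(0, 2), (0, 3)}},
]

def select_media(target_category):
    return [item["name"] for item in CATALOGUE
            if target_category in item["tags"]]

print(select_media((1, 3)))  # ['skincare_ad.mp4']
```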
Optionally, the multimedia information includes various forms such as pictures, video, audio, text, etc.
Optionally, in the embodiment of the present invention, the multimedia information corresponding to the target category may be pushed through a playing device, where the playing device may be a video and/or audio player, a display screen, and the like that are independent or integrated on an electronic device, for example, in public places such as an elevator, a subway, a bus, a supermarket, a shopping mall, and the like, or a video and/or audio player, a display screen, and the like that are provided on a mobile phone, a tablet computer, a desktop computer, a shopping guide machine, a treadmill, and the like.
Through the steps, the actual receiving object can be analyzed in real time based on the analysis of the face attribute, the multimedia information can be updated in real time, and accurate multimedia information pushing is achieved.
The key steps S10 and S12 among the above steps will be specifically analyzed below.
Referring to fig. 2, a flow chart of an alternative method for obtaining a face image according to an embodiment of the present invention is shown. As shown in fig. 2, the method comprises the steps of:
step S20, detecting whether the current acquired image contains a human face, and marking a detection frame for the image containing the human face;
in the embodiment of the invention, the purpose of the step is to eliminate the images without human faces, and screen out the images containing human faces by marking the detection frame, so as to reduce the workload of human face image analysis and classification and improve the pushing efficiency of multimedia information.
Step S22, carrying out quality evaluation on the face image of the mark detection frame;
in the embodiment of the invention, a face image that is blurred, at a large angle, small in size, severely offset from the face detection frame, or insufficiently illuminated can be judged as unqualified. A face whose evaluation result is unqualified is discarded and its detection stops; for a face whose evaluation result is qualified, the next step is executed.
Step S26, acquiring a face image with qualified quality evaluation;
through the steps, the images acquired by the image capturing device can be preliminarily screened, and the images without human faces and the human face images with unqualified quality evaluation are eliminated, so that the workload of analyzing and classifying the human face images is reduced, and the pushing efficiency of multimedia information is improved.
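The sketch below illustrates one way steps S20-S26 might look in code, using an OpenCV Haar cascade as a stand-in detector (the patent does not name a detector) and assumed thresholds for size, blur, and illumination:

```python
import cv2

MIN_FACE_SIDE = 80        # assumed: reject faces smaller than 80x80 pixels
BLUR_THRESHOLD = 100.0    # assumed: variance of Laplacian below this is blurry
DARK_THRESHOLD = 40.0     # assumed: mean gray level below this is under-lit

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def screen_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    qualified = []
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):  # S20: mark boxes
        if w < MIN_FACE_SIDE or h < MIN_FACE_SIDE:                # S22: too small
            continue
        face = gray[y:y + h, x:x + w]
        if cv2.Laplacian(face, cv2.CV_64F).var() < BLUR_THRESHOLD:  # blurry
            continue
        if face.mean() < DARK_THRESHOLD:                          # under-lit
            continue
        qualified.append((x, y, w, h))                            # S26: keep
    return qualified
```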
Referring to fig. 3, a flow chart of an alternative face attribute analysis method according to an embodiment of the present invention is shown. As shown in fig. 3, the method comprises the steps of:
step S30, positioning the characteristic points of the face image to obtain a first image;
optionally, in the embodiment of the present invention, an ASM algorithm is used for locating the feature points, where the feature points include at least one of the following: eyes, nose, mouth, eyebrows, and the image with the characteristic point positioned is used as the first image.
Step S32, adjusting the first image according to the standard image to obtain a second image;
optionally, in this embodiment of the present invention, in step S32, adjusting the first image according to the standard image, and obtaining the second image includes: aligning the feature points in the first image with the feature points in the standard image; a second image of the same size as the standard image is obtained. The standard image is a face image with characteristic points marked in advance and is used as an alignment standard; the number of feature points in the first image is the same as the number of feature points in the standard image.
Optionally, in the embodiment of the present invention, the first image is adjusted by affine transformation (Affine Transform). An affine transformation is a spatial rectangular-coordinate transformation, a linear mapping from two-dimensional coordinates to two-dimensional coordinates, which preserves the "straightness" of two-dimensional figures (straight lines remain straight lines after transformation) and their "parallelism" (the relative positional relationships within the figure remain unchanged before and after transformation). Adjusting the first image by affine transformation mainly means aligning it with the feature points in the standard image through a series of transformations such as translation, scaling, flipping, and rotation, and obtaining a second image of the same size as the standard image.
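As an illustration of such alignment, the sketch below maps two assumed eye positions onto fixed coordinates in a hypothetical 128 x 128 standard image using OpenCV; real alignment would use the full set of ASM feature points rather than two:

```python
import cv2
import numpy as np

STD_SIZE = 128
STD_EYES = np.float32([[44, 52], [84, 52]])  # assumed standard eye positions

def align_face(first_image, left_eye, right_eye):
    src = np.float32([left_eye, right_eye])
    # Similarity transform (rotation + uniform scale + translation); two point
    # pairs are the minimal input for a partial affine estimate.
    matrix, _ = cv2.estimateAffinePartial2D(src, STD_EYES)
    return cv2.warpAffine(first_image, matrix, (STD_SIZE, STD_SIZE))

img = np.zeros((240, 320, 3), dtype=np.uint8)    # stand-in "first image"
second_image = align_face(img, left_eye=(120, 100), right_eye=(190, 110))
print(second_image.shape)  # (128, 128, 3): same size as the standard image
```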
Step S34, inputting the second image into the multi-task classification model, and analyzing the attribute of the second image;
optionally, in an embodiment of the present invention, the multi-task classification model may be a Convolutional Neural Network (CNN). Convolutional neural networks are used primarily to recognize two-dimensional patterns with invariance to displacement, scaling, and other forms of distortion. The feature detection layers of a convolutional neural network learn from training data, so explicit feature extraction can be avoided. Because neurons on the same feature map share weights, the network can learn in parallel; this local weight sharing reduces the network's complexity, makes its layout closer to an actual biological neural network, and gives it unique advantages in speech recognition and image processing. In addition, images as multidimensional input vectors can be fed directly into the network, avoiding the complexity of data reconstruction during feature extraction and classification.
Optionally, in the embodiment of the present invention, the convolutional neural network comprises an input layer, subgroup layers, and output branches, where a subgroup layer comprises a convolutional layer and an activation layer, and an output branch comprises a fully connected layer and an output layer. Except for the input and output layers, the input of each intermediate layer is the output of the previous layer, and its output is the input of the next layer. Of course, those skilled in the art will appreciate that the internal structure of the input layer, subgroup layers, and output branches can be constructed according to actual needs; for example, a subgroup layer may also include a pooling layer and a fully connected layer, and is not limited to the above structure.
Optionally, in the embodiment of the present invention, the multi-task classification model adopts the third structure. Taking as an example the case where the face-image attributes comprise the four attributes gender, race, age group, and specific age, and the multi-task classification model is a convolutional neural network, the network comprises one input layer, at least one subgroup layer, and 4 output branches. One of the 4 output branches is a race classifier, and the other 3 output branches correspond to the different classifications under the race attribute (e.g., yellow, white, and black races). The second image first passes through the input layer, convolutional layer, activation layer, and race classifier of the convolutional neural network to determine its race attribute; once the race attribute is determined, the corresponding branch among the other 3 output branches is selected to analyze the three attributes gender, age group, and specific age. For example, if the second image passes through the input layer, convolutional layer, activation layer, and race classifier and its race attribute is determined to be the yellow race, the output branch corresponding to the yellow race is selected to further analyze the gender, age group, and specific age of the second image.
And step S36, outputting the analysis result.
Optionally, in the embodiment of the present invention, again taking the four attributes gender, race, age group, and specific age as an example, the gender, age group, race, and specific age of each second image are analyzed to obtain an analysis result; for example, if the attributes of a second image are analyzed as black, female, youth 16-25, with a specific age of 23, the analysis result is (2, 1, 2, 23).
Through the steps, the apparent attribute of the face image can be obtained, and the analysis of multiple attributes of the face image through one multi-task classification model can be realized.
In an embodiment of the present invention, the multimedia information pushing method may further include training the multi-task classification model in advance. Referring to fig. 4, a flowchart of an alternative method for training a multi-tasking classification model according to an embodiment of the invention is shown. As shown in fig. 4, the method comprises the steps of:
step S40, acquiring a large number of sample images, and manually labeling part of the sample images to obtain a labeling result;
optionally, in the embodiment of the present invention, in order to obtain a better multi-task classification model, sample images can be collected in a large number of different environments, for example different scenes, different illumination, different resolutions, and different adornments, and the predetermined face-image attributes should be uniformly distributed across the samples. For example, if the preset face-image attributes include two genders (male, female), 7 age groups (infant 0-5, child 6-15, youth 16-25, adult 26-35, middle-aged 36-48, older middle-aged 49-60, elderly 60+), and 3 races (yellow, white, black), the sample images should cover all of these attributes, be distributed uniformly, and include a sufficiently large number of samples per attribute (for example, more than 5000).
Optionally, in the embodiment of the present invention, the sample images may be subjected to transformations such as translation, rotation, scaling, and the like, so as to expand the number of the sample images and enhance the robustness of the multi-task classification model.
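A brief sketch of such augmentation using torchvision's RandomAffine; the parameter ranges are assumptions, not values from the patent:

```python
import torch
from torchvision import transforms

# Random translation, rotation, and scaling, as the text describes.
augment = transforms.RandomAffine(
    degrees=10,              # rotate within +/- 10 degrees
    translate=(0.05, 0.05),  # shift up to 5% in each direction
    scale=(0.9, 1.1))        # zoom between 90% and 110%

sample = torch.rand(3, 64, 64)                  # one fake RGB sample image
expanded = [augment(sample) for _ in range(4)]  # four augmented variants
print(len(expanded), expanded[0].shape)         # 4 torch.Size([3, 64, 64])
```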
Optionally, in the embodiment of the present invention, in order to reduce the calculation amount of the subsequent step, the sample image may be uniformly cropped into an image of a specific size through a series of transformations such as eye point positioning, translation, rotation, and scaling.
Optionally, in the embodiment of the present invention, a part of sample images with better quality in a large number of sample images may be selected to be manually labeled to serve as a training basis of the multi-task classification model, so that a rough multi-task classification model may be trained first, and then the model is used to label the remaining unlabeled sample images, thereby gradually enhancing the robustness of the multi-task classification model and reducing the workload of manual labeling.
Step S42, inputting the sample image and the marking result into a multi-task classification model;
optionally, in an embodiment of the present invention, the multi-task classification model may be a Convolutional Neural Network (CNN) comprising an input layer, subgroup layers, and output branches, where a subgroup layer comprises a convolutional layer and an activation layer, and an output branch comprises a fully connected layer and an output layer. Except for the input and output layers, the input of each intermediate layer is the output of the previous layer, and its output is the input of the next layer. In step S42, the sample images and the labeling results are input to the input layer of the convolutional neural network.
Optionally, in an embodiment of the present invention, the multi-task classification model adopts the third structure: it includes 1 input layer, at least 1 subgroup layer, and M output branches, where the M output branches share the input layer and subgroup layer(s); one of the M output branches first outputs the analysis result of one of the N attributes (denoted the A attribute), and then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classification under the A attribute, the number of classifications under the A attribute being (M-1). Taking as an example the case where the face-image attributes comprise the four attributes gender, race, age group, and specific age, and the multi-task classification model is a convolutional neural network, the network comprises one input layer, at least one subgroup layer, and 4 output branches, where a subgroup layer comprises a convolutional layer and an activation layer, and an output branch comprises a fully connected layer and an output layer. The 4 output branches share one input layer and at least one subgroup layer; with the race attribute as the A attribute, one of the 4 output branches is a race classifier, and the other 3 output branches correspond to the different classifications under the race attribute (e.g., yellow, white, and black races). First, sample images of all skin colors (e.g., including yellow, white, and black) are input to train the race classifier. Then, keeping all the shared-layer weights and the fully-connected-layer parameters of the race classifier unchanged, sample images of the yellow race are input to train the output branch corresponding to the yellow race. Next, keeping all the shared-layer weights, the fully-connected-layer parameters of the race classifier, and the fully-connected-layer parameters of the yellow-race branch unchanged, sample images of the white race are input to train the output branch corresponding to the white race. Finally, keeping all the shared-layer weights, the fully-connected-layer parameters of the race classifier, and the fully-connected-layer parameters of the yellow-race and white-race branches unchanged, sample images of the black race are input to train the output branch corresponding to the black race. Of course, following the principle of this example, those skilled in the art can make reasonable variations (including attribute replacement and order adjustment) without creative effort; for example, when the gender attribute serves as the A attribute, there are 2 output branches corresponding to the different classifications under the gender attribute (e.g., male and female), and training proceeds similarly. For brevity, this is not repeated here.
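Continuing the MultiTaskFaceModel sketch from the analysis section above (same Python session), and assuming branch index 0 is the yellow-race branch, staged training of this kind reduces, in PyTorch terms, to freezing the already-trained parameters before each stage:

```python
import torch

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

model = MultiTaskFaceModel()              # defined in the earlier sketch

# Stage 1 would train model.trunk + model.race_head on all-skin-tone samples.
# Stage 2: freeze the shared trunk and race classifier, train only branch 0.
set_trainable(model.trunk, False)
set_trainable(model.race_head, False)
for i, branch in enumerate(model.race_branches):
    set_trainable(branch, i == 0)         # 0 = yellow-race branch (assumed index)

# Optimize only the parameters that remain trainable; later stages repeat
# this pattern with i == 1 (white race) and i == 2 (black race).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01)
```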
Step S44, analyzing the attribute of the sample image to obtain a sample analysis result;
optionally, in the embodiment of the present invention, the convolutional layers of the convolutional neural network extract data features of the sample image via the configured stride, convolution kernel size, and number of convolution kernels; the activation layer applies ReLU to perform a nonlinear transformation on the feature maps output by the convolutional layer; the fully connected layer connects all feature maps output by the activation layer, maps the feature space to the label space through a weighted linear transformation, and is followed by a ReLU activation function; the output layer performs classification and regression on the feature maps output by the fully connected layer to obtain output values, for example using a softmax function as the output-layer function for age group, gender, and race, and a Euclidean (L2) loss for age-value regression. In this way the attributes of the sample image are analyzed, and the sample analysis result is obtained from the output values of the convolutional neural network. Of course, those skilled in the art will appreciate that in other embodiments the convolutional neural network may further include a pooling layer that downsamples the feature map output by the previous layer with a configured stride and pooling size.
Step S46, comparing the sample analysis result with the marking result, and determining a loss function (cost function);
optionally, in the embodiment of the present invention, each attribute value in the sample analysis result is compared with the corresponding attribute value in the labeling result to obtain a loss function for each attribute, and the per-attribute loss functions are then weighted and summed to obtain the overall loss function of the sample image. A conventional neural network model typically has only one attribute, and its loss function compares the value of that single attribute with the labeling result. The multi-task classification model adopted in the embodiment of the invention can consider multiple attributes simultaneously, so that the overall error across all attributes is minimized; combinations of different attributes can thus be supported. For example, let the loss functions of age group, gender, and race be L_Age, L_Gender, and L_Race, respectively. When the two attributes age group and gender need to be output, the overall loss function of the multi-task classification model is L_All = a × L_Age + b × L_Gender; when the three attributes age group, gender, and race need to be output, the overall loss function is L_All = a × L_Age + b × L_Gender + c × L_Race. If a specific-age-value attribute is added, the overall loss function becomes L_All = a × L_Age + b × L_Gender + c × L_Race + d × L_AgeValue.
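A sketch of this weighted sum in PyTorch, with cross-entropy standing in for the classification losses and mean-squared error for the Euclidean age-value loss; the weights a, b, c, d are arbitrary assumptions:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
mse = nn.MSELoss()          # stands in for the Euclidean loss on the age value
a, b, c, d = 1.0, 1.0, 1.0, 0.5  # assumed weights

def overall_loss(age_logits, gender_logits, race_logits, age_value,
                 age_label, gender_label, race_label, age_value_label):
    # L_All = a*L_Age + b*L_Gender + c*L_Race + d*L_AgeValue
    return (a * ce(age_logits, age_label)
            + b * ce(gender_logits, gender_label)
            + c * ce(race_logits, race_label)
            + d * mse(age_value, age_value_label))

loss = overall_loss(
    torch.randn(2, 7, requires_grad=True),  # age-group logits (7 classes)
    torch.randn(2, 2, requires_grad=True),  # gender logits
    torch.randn(2, 3, requires_grad=True),  # race logits
    torch.randn(2, 1, requires_grad=True),  # predicted age value
    torch.tensor([3, 2]), torch.tensor([1, 0]),
    torch.tensor([0, 2]), torch.randn(2, 1))
loss.backward()  # gradients flow back through all four attribute heads
```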
In step S48, the multitask classification model is updated according to the loss function.
Optionally, in the embodiment of the present invention, the weight values of each layer in the convolutional neural network are adjusted according to the loss function so that the difference between the network's output and the labeling result becomes smaller and smaller; the update is complete when the output of the convolutional neural network is consistent with the labeling result, or the difference reaches a minimum and no longer changes, finally yielding the required convolutional neural network.
Through the steps, the multitask classification model with better robustness can be obtained, and various attributes can be analyzed.
According to another aspect of the embodiment of the invention, a multimedia information pushing system is also provided. Fig. 5 is a schematic diagram of an alternative multimedia information push system according to an embodiment of the present invention, including: an image capturing device configured to acquire a face image; the image analysis device is configured to analyze the attribute of the face image, output an analysis result, classify the face image according to the analysis result and determine the target category; the playing device is configured to push the corresponding multimedia information according to the target type.
Alternatively, the image capturing device is a separate camera or a camera integrated with the playback device in one apparatus.
Optionally, the image analysis apparatus is further configured to: detecting whether the currently acquired image contains a human face; marking a detection frame for an image containing a human face; carrying out quality evaluation on the face image of the mark detection frame; and acquiring the face image with qualified quality evaluation.
Optionally, the attribute of the face image includes at least one of: gender, age group, race, specific age, expression.
Optionally, when the attribute of the face image includes a race attribute, the race attribute of the face image is analyzed first, and the face image is divided into a plurality of race categories according to different races, and if the face image further includes other attributes except for the race attribute, the other attributes of the face image are analyzed under each race category.
Optionally, when the attribute of the face image includes an expression attribute, the expression attribute includes a degree of attention to the multimedia information.
Optionally, the image analysis device includes a multi-task classification model. When the number of attribute items of the face image is N, the multi-task classification model includes N input layers, M subgroup layers, and N output branches, where each attribute has a corresponding input layer, subgroup layer, and output branch, and each output branch outputs the analysis result of one attribute; preferably, analysis accuracy can be improved when M is greater than or equal to N. Alternatively, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and N output branches, where the N output branches share the input layer and subgroup layer(s), and each output branch outputs the analysis result of one attribute. Alternatively, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and M output branches, where the M output branches share the input layer and subgroup layer(s); one of the M output branches first outputs the analysis result of one of the N attributes (denoted the A attribute), and then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classification under the A attribute, the number of classifications under the A attribute being (M-1). N and M are integers greater than 1.
Optionally, each output branch of the (M-1) output branches comprises (N-1) output layers.
Optionally, the image analysis apparatus is further configured to: positioning feature points of the face image to obtain a first image; adjusting the first image according to a standard image to obtain a second image; inputting the second image into the multitask classification model, and analyzing the attribute of the second image; and outputting the analysis result.
Optionally, the feature points include at least one of: eyes, nose, mouth, eyebrows.
Optionally, feature point positioning is performed on the face image by using an ASM algorithm.
Optionally, the image analysis apparatus is further configured to: aligning feature points in the first image with feature points in the standard image; obtaining the second image having the same size as the standard image.
Optionally, the standard image is a face image with feature points marked in advance, and the number of the feature points in the first image is the same as the number of the feature points in the standard image.
Optionally, the first image is adjusted by affine transformation.
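Combining the preceding paragraphs, the alignment itself can be sketched as a least-squares affine fit between the matched feature points of the first image and the pre-marked standard image, followed by a warp to the standard size; the function and parameter names here are illustrative.

```python
import cv2
import numpy as np

def align_to_standard(first_image, first_pts, standard_pts, standard_size):
    """Estimate an affine transform mapping the first image's feature points
    onto the standard image's, then warp to obtain the second image."""
    src = np.asarray(first_pts, dtype=np.float32)
    dst = np.asarray(standard_pts, dtype=np.float32)
    M, _ = cv2.estimateAffinePartial2D(src, dst)   # least-squares affine fit
    w, h = standard_size
    return cv2.warpAffine(first_image, M, (w, h))  # same size as the standard image
```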
Optionally, the image analysis apparatus is further configured to train the multi-task classification model, the training including: acquiring a large number of sample images and manually labeling a portion of them to obtain a labeling result; inputting the sample images and the labeling result into the multi-task classification model; analyzing the attributes of the sample images to obtain a sample analysis result; comparing the sample analysis result with the labeling result to determine a loss function; and updating the multi-task classification model according to the loss function.
Optionally, comparing the sample analysis result with the labeling result and determining the loss function includes: comparing a plurality of attribute values in the sample analysis result with the corresponding attribute values in the labeling result, respectively, to obtain a loss function for each attribute value; and weighting and summing the loss functions of the attribute values to obtain the overall loss function of the sample image.
Optionally, updating the multi-task classification model according to the loss function includes: adjusting the weight values of each layer in the multi-task classification model according to the loss function until the output of the model is consistent with the labeling result, or until the difference between them reaches a minimum and no longer changes, at which point the update is complete.
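A minimal PyTorch sketch of this training loop, reusing the MultiTaskFaceNet model sketched earlier: each attribute gets its own classification loss, the losses are weighted and summed into the overall loss, and the layer weights are updated by backpropagation. The loss type, the per-attribute weights, and the optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
attr_weights = [1.0, 0.5, 0.5]                 # illustrative per-attribute weights
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def training_step(images, labels_per_attr):
    """One update: per-attribute losses, weighted sum, backpropagation."""
    outputs = model(images)                    # one analysis result per attribute
    losses = [criterion(out, lbl)              # compare with the labeling result
              for out, lbl in zip(outputs, labels_per_attr)]
    total = sum(w * l for w, l in zip(attr_weights, losses))  # overall loss
    optimizer.zero_grad()
    total.backward()                           # gradients for every layer
    optimizer.step()                           # adjust the layer weight values
    return total.item()
```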
According to another aspect of the embodiments of the present invention, there is also provided a terminal device, including: a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following steps: acquiring a face image; analyzing the attributes of the face image and outputting an analysis result; classifying the face image according to the analysis result and determining a target category; and selecting and pushing the multimedia information corresponding to the target category.
Optionally, the face image is acquired by an image capturing device, where the image capturing device is a separate camera device or a camera device integrated on an electronic device.
Optionally, when the processor executes the program, the following steps may be further implemented: detecting whether the currently captured image contains a human face; marking a detection frame on an image containing a face; performing quality evaluation on the face image within the marked detection frame; and acquiring the face image whose quality evaluation is qualified.
Optionally, the attribute of the face image includes at least one of: gender, age group, race, specific age, expression.
Optionally, when the attributes of the face image include a race attribute, the race attribute of the face image is analyzed first and the face image is assigned to one of a plurality of race categories according to race; if the face image has attributes other than the race attribute, those remaining attributes are then analyzed within each race category.
Optionally, when the attribute of the face image includes an expression attribute, the expression attribute includes a degree of attention to the multimedia information.
Optionally, the attribute of the face image is analyzed by using a multi-task classification model.
Optionally, when the number of attribute items of the face image is N, the multi-task classification model includes N input layers, M subgroup layers, and N output branches, where each attribute has a corresponding input layer, subgroup layer, and output branch, each output branch outputs the analysis result of one attribute, and N and M are integers greater than 1.
Optionally, when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and N output branches, where the N output branches share the 1 input layer and the at least 1 subgroup layer, each output branch outputs the analysis result of one attribute, and N is an integer greater than 1.
Optionally, when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and M output branches, where the 1 input layer and the at least 1 subgroup layer are shared by the M output branches; one of the M output branches first outputs the analysis result of one of the N attributes (denoted as attribute A), then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classifications under attribute A, the number of classifications under attribute A is (M-1), and N and M are integers greater than 1.
Optionally, each output branch of the (M-1) output branches comprises (N-1) output layers.
Optionally, the analyzing the attribute of the face image, and outputting an analysis result includes: positioning feature points of the face image to obtain a first image; adjusting the first image according to a standard image to obtain a second image; inputting the second image into the multitask classification model, and analyzing the attribute of the second image; and outputting the analysis result.
Optionally, the feature points include at least one of: eyes, nose, mouth, eyebrows.
Optionally, feature point positioning is performed on the face image by using an ASM algorithm.
Optionally, when the processor executes the program, the following steps may be further implemented: aligning feature points in the first image with feature points in the standard image; obtaining the second image having the same size as the standard image.
Optionally, the standard image is a face image with feature points marked in advance, and the number of the feature points in the first image is the same as the number of the feature points in the standard image.
Optionally, the first image is adjusted by affine transformation.
Optionally, when the processor executes the program, the processor may further train the multi-task classification model, the training including: acquiring a large number of sample images and manually labeling a portion of them to obtain a labeling result; inputting the sample images and the labeling result into the multi-task classification model; analyzing the attributes of the sample images to obtain a sample analysis result; comparing the sample analysis result with the labeling result to determine a loss function; and updating the multi-task classification model according to the loss function.
Optionally, comparing the sample analysis result with the labeling result and determining the loss function includes: comparing a plurality of attribute values in the sample analysis result with the corresponding attribute values in the labeling result, respectively, to obtain a loss function for each attribute value; and weighting and summing the loss functions of the attribute values to obtain the overall loss function of the sample image.
Optionally, updating the multi-task classification model according to the loss function includes: adjusting the weight values of each layer in the multi-task classification model according to the loss function until the output of the model is consistent with the labeling result, or until the difference between them reaches a minimum and no longer changes, at which point the update is complete.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute any one of the multimedia information pushing methods described above.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the following method steps: acquiring a face image; analyzing the attributes of the face image and outputting an analysis result; classifying the face image according to the analysis result and determining a target category; and selecting and pushing the multimedia information corresponding to the target category.
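Tying the four method steps together, a hedged end-to-end sketch follows; `analyze` and `catalog` are caller-supplied stand-ins (for example, the multi-task model and a category-to-content table), and the category key built from gender and age group is purely illustrative.

```python
def push_multimedia(frame, analyze, catalog):
    """Acquire -> analyze -> classify -> push, following the method steps.
    `analyze` maps a face image to an attribute-result dict; `catalog`
    maps a target category to multimedia items. Both are hypothetical."""
    face = acquire_qualified_face(frame)       # acquire a face image (sketch above)
    if face is None:
        return None                            # no qualified face; push nothing
    results = analyze(face)                    # analysis result per attribute
    category = "{gender}-{age_group}".format(**results)  # determine target category
    return catalog.get(category)               # select the corresponding multimedia
```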
Those skilled in the art will appreciate that, without creative effort, the above method for analyzing the attributes of face images can also be applied to the attributes of human body images, such as clothing, hair style, and stature. The multi-task classification model may likewise be implemented with techniques other than convolutional neural networks.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a logical functional division; in actual implementation, there may be other divisions: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also fall within the protection scope of the present invention.

Claims (25)

1. A multimedia information pushing method is characterized by comprising the following steps:
acquiring a face image;
analyzing the attribute of the face image and outputting an analysis result;
classifying the face images according to the analysis result, and determining the target category;
and selecting and pushing the multimedia information corresponding to the target category.
2. The multimedia information pushing method according to claim 1, wherein the face image is obtained by an image capturing device, wherein the image capturing device is a separate camera device or a camera device integrated with an electronic device.
3. The multimedia information pushing method according to claim 1, wherein the obtaining the face image comprises:
detecting whether the currently acquired image contains a human face;
marking a detection frame on an image containing a human face;
performing quality evaluation on the face image within the marked detection frame;
and acquiring the face image with qualified quality evaluation.
4. The multimedia information pushing method as claimed in claim 1, wherein the attribute of the face image includes at least one of: gender, age group, race, specific age, expression.
5. The method as claimed in claim 4, wherein when the attributes of the face image include a race attribute, the race attribute of the face image is analyzed first and the face image is divided into a plurality of race categories according to race; if the face image further includes attributes other than the race attribute, those remaining attributes are analyzed within each race category.
6. The method as claimed in claim 4, wherein when the attribute of the face image includes an expression attribute, the expression attribute includes a degree of attention to the multimedia information.
7. The multimedia information pushing method as claimed in claim 1, wherein the attributes of the face image are analyzed using a multi-task classification model.
8. The multimedia information pushing method according to claim 7, wherein when the number of attribute items of the face image is N, the multitask classification model includes N input layers, M subgroup layers, and N output branches, where each attribute has a corresponding input layer, subgroup layer, and output branch, each output branch outputs an analysis result of an attribute, and N and M are integers greater than 1.
9. The multimedia information pushing method according to claim 7, wherein when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and N output branches, wherein the N output branches share the 1 input layer and the at least 1 subgroup layer, each output branch outputs the analysis result of one attribute, and N is an integer greater than 1.
10. The method as claimed in claim 7, wherein when the number of attribute items of the face image is N, the multi-task classification model includes 1 input layer, at least 1 subgroup layer, and M output branches, wherein the 1 input layer and the at least 1 subgroup layer are shared by the M output branches; one of the M output branches first outputs the analysis result of one of the N attributes (denoted as attribute A), then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classifications under attribute A, the number of classifications under attribute A is (M-1), and N and M are integers greater than 1.
11. The multimedia information pushing method according to claim 10, wherein each of the (M-1) output branches comprises (N-1) output layers, and each output layer outputs an analysis result of an attribute.
12. The method of claim 1, wherein analyzing the attributes of the face image and outputting the analysis result comprises:
positioning feature points of the face image to obtain a first image;
adjusting the first image according to a standard image to obtain a second image;
inputting the second image into the multitask classification model, and analyzing the attribute of the second image;
and outputting the analysis result.
13. The multimedia information pushing method according to claim 12, wherein the feature point includes at least one of: eyes, nose, mouth, eyebrows.
14. The method as claimed in claim 12, wherein feature points of the face image are located using an ASM algorithm.
15. The method as claimed in claim 12, wherein adjusting the first image according to the standard image to obtain the second image comprises:
aligning feature points in the first image with feature points in the standard image;
obtaining the second image having the same size as the standard image.
16. The method as claimed in claim 12, wherein the standard image is a face image with pre-marked feature points, and the number of feature points in the first image is the same as the number of feature points in the standard image.
17. The multimedia information pushing method as claimed in claim 12, wherein the first image is adjusted by affine transformation.
18. The method of claim 7, further comprising training the multitask classification model, wherein the training comprises:
acquiring a large number of sample images, and manually labeling a portion of the sample images to obtain a labeling result;
inputting the sample image and the labeling result into a multi-task classification model;
analyzing the attribute of the sample image to obtain a sample analysis result;
comparing the sample analysis result with the labeling result to determine a loss function;
and updating the multitask classification model according to the loss function.
19. The method of claim 18, wherein comparing the sample analysis result with the labeling result and determining the loss function comprises:
comparing a plurality of attribute values in the sample analysis result with the corresponding attribute values in the labeling result, respectively, to obtain a loss function for each attribute value;
and weighting and summing the loss functions of the attribute values to obtain the overall loss function of the sample image.
20. The method of claim 18, wherein updating the multitask classification model according to the loss function comprises:
adjusting the weight values of each layer in the multi-task classification model according to the loss function until the output of the model is consistent with the labeling result, or until the difference between them reaches a minimum and no longer changes, at which point the update is complete.
21. A multimedia information push system, comprising:
an image capturing device configured to acquire a face image;
the image analysis device is configured to analyze the attribute of the face image, output an analysis result, classify the face image according to the analysis result and determine the target category;
the playback device is configured to push the corresponding multimedia information according to the target category.
22. The multimedia information pushing system of claim 21, wherein the image capturing device is a stand-alone camera or a camera integrated with the playback device in a single apparatus.
23. The multimedia information pushing system according to claim 21, wherein the image analysis device comprises a multitask classification model, wherein when the number of attribute items of the face image is N,
the multi-task classification model comprises N input layers, M subgroup layers, and N output branches, wherein each attribute has a corresponding input layer, subgroup layer, and output branch, and each output branch outputs the analysis result of one attribute;
alternatively,
the multi-task classification model comprises 1 input layer, at least 1 subgroup layer, and N output branches, wherein the N output branches share the 1 input layer and the at least 1 subgroup layer, and each output branch outputs the analysis result of one attribute; alternatively,
the multi-task classification model comprises 1 input layer, at least 1 subgroup layer, and M output branches, wherein the 1 input layer and the at least 1 subgroup layer are shared by the M output branches; one of the M output branches first outputs the analysis result of one of the N attributes (denoted as attribute A), and then each of the remaining (M-1) output branches outputs the analysis results of the remaining attributes according to the classifications under attribute A, the number of classifications under attribute A being (M-1); N and M are integers greater than 1.
24. A terminal device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the multimedia information push method of any of claims 1 to 20 via execution of the executable instructions.
25. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the multimedia information pushing method according to any one of claims 1 to 20.
CN201910132779.5A 2019-02-21 2019-02-21 Multimedia information pushing method and system and terminal equipment Pending CN111598600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910132779.5A CN111598600A (en) 2019-02-21 2019-02-21 Multimedia information pushing method and system and terminal equipment


Publications (1)

Publication Number Publication Date
CN111598600A true CN111598600A (en) 2020-08-28

Family

ID=72192008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910132779.5A Pending CN111598600A (en) 2019-02-21 2019-02-21 Multimedia information pushing method and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN111598600A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129644A (en) * 2011-03-08 2011-07-20 北京理工大学 Intelligent advertising system having functions of audience characteristic perception and counting
US9317785B1 (en) * 2014-04-21 2016-04-19 Video Mining Corporation Method and system for determining ethnicity category of facial images based on multi-level primary and auxiliary classifiers
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
CN109359499A (en) * 2017-07-26 2019-02-19 虹软科技股份有限公司 A kind of method and apparatus for face classifier
CN107742107A (en) * 2017-10-20 2018-02-27 北京达佳互联信息技术有限公司 Facial image sorting technique, device and server
CN107798560A (en) * 2017-10-23 2018-03-13 武汉科技大学 A kind of retail shop's individual character advertisement intelligent method for pushing and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184314A (en) * 2020-09-29 2021-01-05 福州东方智慧网络科技有限公司 Popularization method based on equipment side visual interaction
CN114866693A (en) * 2022-04-15 2022-08-05 苏州清睿智能科技股份有限公司 Information interaction method and device based on intelligent terminal
CN114866693B (en) * 2022-04-15 2024-01-05 苏州清睿智能科技股份有限公司 Information interaction method and device based on intelligent terminal

Similar Documents

Publication Publication Date Title
US10776970B2 (en) Method and apparatus for processing video image and computer readable medium
US10657652B2 (en) Image matting using deep learning
US10366313B2 (en) Activation layers for deep learning networks
Lebreton et al. GBVS360, BMS360, ProSal: Extending existing saliency prediction models from 2D to omnidirectional images
Ma et al. A saliency prior context model for real-time object tracking
Han et al. Two-stage learning to predict human eye fixations via SDAEs
US20200334867A1 (en) Face synthesis
WO2018166288A1 (en) Information presentation method and device
WO2020078119A1 (en) Method, device and system for simulating user wearing clothing and accessories
CN106537390B (en) Identify the presentation style of education video
CN108830237B (en) Facial expression recognition method
Farinella et al. Face re-identification for digital signage applications
Li et al. Convolutional neural net bagging for online visual tracking
CN111881901A (en) Screenshot content detection method and device and computer-readable storage medium
Zhang et al. Deformable object tracking with spatiotemporal segmentation in big vision surveillance
Yu et al. AI-based targeted advertising system
CN111598600A (en) Multimedia information pushing method and system and terminal equipment
Lienhard et al. How to predict the global instantaneous feeling induced by a facial picture?
Liu et al. RGB-D action recognition using linear coding
Liang et al. Fixation prediction for advertising images: Dataset and benchmark
CN113762257A (en) Identification method and device for marks in makeup brand images
Gautam et al. Perceptive advertising using standardised facial features
JP6995262B1 (en) Learning systems, learning methods, and programs
Rao et al. Generating affective maps for images
Yang et al. Student Classroom Behavior Detection Based on YOLOv7+ BRA and Multi-model Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination