WO2020147430A1 - Image identification-based product display method, device, apparatus, and medium - Google Patents


Info

Publication number
WO2020147430A1
WO2020147430A1 (PCT/CN2019/120988; CN 2019120988 W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognized
target
images
video data
Prior art date
Application number
PCT/CN2019/120988
Other languages
French (fr)
Chinese (zh)
Inventor
罗琳耀
徐国强
邱寒
Original Assignee
深圳壹账通智能科技有限公司
Priority date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020147430A1 publication Critical patent/WO2020147430A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 — Commerce
    • G06Q 30/06 — Buying, selling or leasing transactions

Definitions

  • This application relates to the field of intelligent decision-making, and in particular to a method, device, equipment, and medium for displaying commodities based on image recognition.
  • The embodiments of the present application provide a method, device, equipment, and medium for displaying commodities based on image recognition, to solve the problem that displayed commodities are insufficiently attractive.
  • A product display method based on image recognition, including:
  • obtaining target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
  • using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
  • if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, to obtain the single-frame emotion of each image to be recognized;
  • A product display device based on image recognition, including:
  • a data acquisition module, configured to acquire target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
  • an image cluster set acquisition module, configured to use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, to acquire the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
  • a single-frame emotion determination module, configured to, if the number of customers is greater than a preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized;
  • a target emotion obtaining module, configured to obtain the target emotion of the customer corresponding to each image cluster set based on the single-frame emotion of at least one frame of image to be recognized;
  • a final emotion obtaining module, configured to obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;
  • a target display product obtaining module, configured to obtain the target display product according to the final emotion corresponding to the product to be displayed.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the following steps are implemented:
  • obtaining target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
  • using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
  • if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, to obtain the single-frame emotion of each image to be recognized;
  • One or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • obtaining target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
  • using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
  • if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, to obtain the single-frame emotion of each image to be recognized;
  • FIG. 1 is a schematic diagram of the application environment of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 2 is a flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 3 is another flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 4 is another flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 5 is another flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 6 is another flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 7 is another flowchart of the image recognition-based product display method in an embodiment of the present application;
  • FIG. 8 is a functional block diagram of the image recognition-based product display device in an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • The product display method based on image recognition provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the client communicates with the server through a network.
  • The method is applied on the server: the target video data corresponding to each product to be displayed is analyzed and recognized, the emotion of each customer in the target video data is obtained, and the customers' final emotion toward the product to be displayed is determined from those per-customer emotions.
  • The target display products are then determined according to the final emotion corresponding to each product to be displayed, so that the target display products are products customers pay more attention to, improving the attractiveness of the displayed products and encouraging customers to purchase them.
  • The client can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device.
  • The server can be implemented as an independent server or as a server cluster composed of multiple servers.
  • In an embodiment, a product display method based on image recognition is provided.
  • The method is described by taking its application to the server in FIG. 1 as an example, and specifically includes the following steps:
  • S10: Obtain target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized.
  • The target video data is the video data obtained by filtering the initial video data corresponding to the commodity to be displayed; specifically, it is video data that meets certain conditions, for example, video data matching the attribute information of the product to be displayed.
  • The attribute information may include a suitable age and a suitable gender.
  • The collected initial video data corresponding to the commodity to be displayed is filtered by the suitable age and suitable gender to obtain the target video data.
  • An image to be recognized is an image retained by this screening according to the suitable age and suitable gender.
  • The initial video data is the video data corresponding to each commodity to be displayed, collected by a video collection tool.
  • A video collection tool is configured in advance for each area where a commodity to be displayed is located.
  • The video collection tool is used to collect images or video data.
  • When the video collection tool detects that a customer appears within its collection range, it automatically triggers and collects image or video data of the customer.
  • The video collection tool is specifically a camera, through which the initial video data within the collection range corresponding to each commodity to be displayed can be collected in real time. Since each collection tool corresponds to one product to be displayed, the initial video data of each product's display area is collected through its camera, and that initial video data is filtered to obtain the target video data corresponding to the product.
  • The collected initial video data carries a product identifier corresponding to the product to be displayed, so that the corresponding target video data can subsequently be located through the product identifier.
  • For example, if the initial video data collected by the video collection tool carries product identifier A, that initial video data corresponds to product identifier A, and filtering it yields the target video data corresponding to product identifier A.
  • The product identifier is a unique identifier used to distinguish different products to be displayed.
  • The product identifier may consist of at least one of numbers, letters, words, or symbols.
  • For example, the product identifier may be the serial number of the product to be displayed.
  • S20: Use the face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized.
  • The face detection model is a pre-trained model used to detect whether each frame of the image to be recognized contains a human face region.
  • The number of customers is the number of distinct customers appearing in the target video data.
  • An image cluster set is the set formed by clustering the images to be recognized that correspond to the same customer.
  • The server is connected to a database over the network, and the face detection model is stored in the database. The target video data contains images to be recognized corresponding to different customers; the face detection model recognizes each image to be recognized and picks out the face images, where a face image is an image containing a customer's face region.
  • Specifically, the server inputs each image to be recognized in the acquired target video data into the face detection model, which performs face recognition on each image to determine whether it is a face image. The images to be recognized that correspond to the same face are then clustered together to obtain the image cluster set for each customer, and the number of customers in the target video data is determined from the number of image cluster sets.
  • More concretely, a feature extraction algorithm extracts the facial features of the face image in each image to be recognized, and feature similarity is computed between the facial features of different images. If the feature similarity is greater than a preset threshold, the corresponding face images belong to the same customer, so their images to be recognized are clustered into one set; the number of customers then equals the number of image cluster sets. The preset threshold is the value, set in advance, at which the similarity is judged high enough to belong to the same customer's face.
  • Feature extraction algorithms include, but are not limited to, CNN (convolutional neural network) algorithms.
  • For example, a CNN algorithm can be used to extract the facial features of the face image in each image to be recognized.
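The clustering step just described, grouping images whose facial-feature similarity exceeds a preset threshold, can be sketched as follows. The greedy strategy, the cosine-similarity measure, and the 0.8 threshold are illustrative assumptions; the application fixes only the "similarity greater than a preset threshold means the same customer" rule:

```python
import numpy as np

def cluster_faces(features, threshold=0.8):
    """Greedily group face feature vectors: each image joins the first
    existing cluster whose representative exceeds the similarity
    threshold, otherwise it starts a new cluster (a new customer)."""
    clusters = []  # each cluster: list of frame indices for one customer
    reps = []      # one unit-norm representative feature per cluster
    for i, f in enumerate(features):
        f = f / np.linalg.norm(f)
        for cluster, rep in zip(clusters, reps):
            if float(f @ rep) > threshold:  # cosine similarity of unit vectors
                cluster.append(i)
                break
        else:
            clusters.append([i])
            reps.append(f)
    return clusters

# Two near-identical face features and one distinct face.
feats = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
clusters = cluster_faces(feats)
print(len(clusters))  # 2 -- the number of customers
```

The number of image cluster sets returned is exactly the customer count used in the rest of the method.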
  • The micro-expression recognition model is a model that captures local features of the customer's face in the image to be recognized, identifies each target facial action unit of the face according to those local features, and then determines the emotion from the recognized target facial action units.
  • A single-frame emotion is the emotion obtained by recognizing one image to be recognized with the micro-expression recognition model and interpreting the recognized target facial action units.
  • The micro-expression recognition model may be a deep-learning neural network model, a classification-based local recognition model, or a local emotion recognition model based on local binary patterns (LBP).
  • In this embodiment, the micro-expression recognition model is a classification-based local recognition model.
  • The micro-expression recognition model is trained in advance: a large amount of training image data is collected, containing positive and negative samples of each facial action unit, and a classification algorithm is trained on this data to obtain the micro-expression recognition model.
  • Specifically, the training image data may be trained with an SVM classification algorithm to obtain SVM classifiers corresponding to N facial action units.
  • The micro-expression recognition model is formed from the N SVM classifiers; the more SVM classifiers obtained, the more accurately the formed model recognizes emotions.
  • The preset number is a preconfigured value, and it is the same for every commodity to be displayed.
  • When the server determines that the number of customers is greater than the preset number, it uses the pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized.
  • Recognizing the images to be recognized in each image cluster set with the pre-trained micro-expression recognition model specifically includes the following steps.
  • The server first performs face key-point detection and feature extraction on each image to be recognized to obtain its local features, and then inputs the local features of the image to be recognized into the pre-trained micro-expression recognition model.
  • The micro-expression recognition model includes N SVM classifiers, each of which recognizes one local feature of the image to be recognized; the N classifiers output N probability values, and each facial action unit whose probability value is greater than the preset probability threshold is determined to be a target facial action unit of the image to be recognized.
  • The preset probability threshold is a preconfigured value.
  • A target facial action unit is a facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized with the micro-expression recognition model.
  • For example, the micro-expression recognition model includes 54 SVM classifiers, and a facial action unit numbering table is established in which each facial action unit is represented by a predetermined number:
  • AU1 means the inner brow is raised;
  • AU2 means the outer brow is raised;
  • AU5 means the upper eyelid is raised;
  • AU26 means the jaw drops.
  • Each facial action unit has its own trained SVM classifier.
  • For example, the SVM classifier corresponding to the inner brow outputs the probability that a local feature belongs to the inner-brow-raised action unit,
  • and the SVM classifier corresponding to the outer brow outputs the probability that a local feature belongs to the outer-brow-raised action unit.
  • The probability value is a value between 0 and 1. If an output probability value is 0.6 and the preset probability threshold is 0.5, then since 0.6 is greater than 0.5, the facial action unit corresponding to that probability is taken as a target facial action unit of the image to be recognized.
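The thresholding step can be sketched directly; the classifier probabilities below are hypothetical values standing in for the outputs of the N SVM classifiers:

```python
def select_target_aus(au_probabilities, prob_threshold=0.5):
    """Keep every facial action unit whose classifier probability
    exceeds the preset probability threshold."""
    return [au for au, p in au_probabilities.items() if p > prob_threshold]

# Hypothetical SVM classifier outputs for one image to be recognized.
probs = {"AU1": 0.6, "AU2": 0.3, "AU5": 0.72, "AU26": 0.1}
print(select_target_aus(probs))  # ['AU1', 'AU5']
```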
  • The server can then determine the single-frame emotion of each image to be recognized from its target facial action units: the evaluation table is queried with the target facial action units of each image to obtain that image's single-frame emotion.
  • That is, the 54 SVM classifiers recognize the local features and determine all target facial action units of the image to be recognized, and the evaluation table is looked up with all of those units to determine the image's single-frame emotion, improving the accuracy of the obtained customer emotion.
  • The evaluation table is a pre-configured table: one or more facial action units combine to form different emotions, the facial action unit combination corresponding to each emotion is obtained in advance, and each combination is stored in association with its emotion to form the evaluation table.
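A minimal sketch of such an evaluation table and its lookup is shown below. The AU-to-emotion combinations here are illustrative (loosely modeled on common FACS pairings), not the application's actual table, and the "calm" fallback for unmatched combinations is an assumption:

```python
# Toy evaluation table: a combination of target AUs maps to an emotion.
EVALUATION_TABLE = {
    frozenset({"AU6", "AU12"}): "joy",
    frozenset({"AU1", "AU2", "AU5", "AU26"}): "surprise",
    frozenset({"AU1", "AU4", "AU15"}): "disappointment",
}

def single_frame_emotion(target_aus, default="calm"):
    """Query the evaluation table with the full set of target facial
    action units recognized in one image to be recognized."""
    return EVALUATION_TABLE.get(frozenset(target_aus), default)

print(single_frame_emotion(["AU1", "AU2", "AU5", "AU26"]))  # surprise
print(single_frame_emotion(["AU9"]))                        # calm (no match)
```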
  • S40: Obtain the target emotion of the customer corresponding to each image cluster set based on the single-frame emotion of at least one frame of image to be recognized.
  • The target emotion is the emotion determined from the single-frame emotions of the images to be recognized in an image cluster set. Understandably, an image cluster set contains the images to be recognized of a single customer, and that customer's target emotion is determined from the single-frame emotions of those images.
  • Specifically, the server obtains the single-frame emotion of each image to be recognized in the image cluster set and analyzes them to obtain the target emotion of the corresponding customer: if all single-frame emotions in the set are the same, that emotion is taken as the target emotion; if at least two differ, the single-frame emotion that occurs most often in the set is taken as the target emotion.
  • The target emotions of the customers corresponding to each image cluster set in the target video data of a product to be displayed are acquired in turn, i.e., each customer's emotion toward the product is acquired.
  • For example, if the target video data of product A includes 100 image cluster sets, acquiring the target emotion of the customer for each set yields the target emotions of 100 customers toward product A.
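The per-customer aggregation above is a majority vote over single-frame emotions, which can be sketched as (how ties between equally frequent single-frame emotions are broken is not specified by the application; here the first-encountered emotion wins):

```python
from collections import Counter

def target_emotion(single_frame_emotions):
    """Majority vote over the single-frame emotions of the images to be
    recognized in one customer's image cluster set."""
    counts = Counter(single_frame_emotions)
    return counts.most_common(1)[0][0]

print(target_emotion(["joy", "joy", "calm"]))  # joy
```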
  • S50: Obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.
  • The final emotion is the emotion attributed to the product to be displayed, obtained by quantitatively analyzing all customers' target emotions toward the product.
  • Specifically, the server judges whether the target emotions of the customers corresponding to all image cluster sets are the same. If they differ, the number of customers holding each target emotion is counted according to the number of customers and the target emotion of each image cluster set, and the target emotion with the largest count is taken as the final emotion. For example, if the target video data of product A corresponds to 100 customers (and therefore 100 image cluster sets), of which 50 target emotions are joy, 30 are calm, and 20 are indifferent, the most frequent target emotion (joy) is taken as product A's final emotion.
  • If at least two target emotions are tied for the largest count, the final emotion is determined by the emotion categories of the tied target emotions, preferably taking a positive target emotion as the final emotion. For example, with 100 customers, if 50 target emotions are joy and 50 are indifferent, joy is taken as the final emotion of the product to be displayed. If the tied target emotions are all negative, then, since the final emotion of a displayed target product should be positive, any of the tied target emotions may be selected as the final emotion.
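Step S50 can be sketched as a count with a positive-preference tie-break. Which emotions count as "positive" is an assumption for illustration; the application only states that a positive emotion is preferred among tied candidates:

```python
from collections import Counter

# Assumed positive-emotion set (illustrative, not fixed by the application).
POSITIVE = {"joy", "surprise"}

def final_emotion(customer_target_emotions):
    """Pick the most common target emotion across all customers;
    on a tie, prefer a positive emotion, else take any tied emotion."""
    counts = Counter(customer_target_emotions)
    top = max(counts.values())
    tied = [e for e, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    for e in tied:
        if e in POSITIVE:
            return e
    return tied[0]  # all tied emotions negative: any may be chosen

print(final_emotion(["joy"] * 50 + ["calm"] * 30 + ["disgust"] * 20))  # joy
print(final_emotion(["joy"] * 50 + ["calm"] * 50))                     # joy
```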
  • The target display products are the displayable products obtained from the products to be displayed according to the final emotion corresponding to each product to be displayed.
  • The server obtains the target display products from the products to be displayed according to the final emotion of each, specifically including the following steps:
  • (1) Determine whether each product to be displayed is a displayable product according to its final emotion. Specifically, displayable preset emotions are configured in advance, and the final emotion of each product to be displayed is matched against the preset emotions; if a final emotion matches a preset emotion successfully, the corresponding product is a displayable product. This step prevents a product whose final emotion does not match the preset emotions from becoming a target display product; generally, the preset emotions are positive emotions, so products associated with negative emotions are not taken as target display products.
  • For example, if the final emotion corresponding to product A is joy, most customers are interested in the product; joy is matched against the preset emotions, and if the match succeeds, product A is determined to be a displayable product.
  • If the final emotion corresponding to product B is disgust, anger, or disappointment, most customers do not like the product; matching B's final emotion against the preset emotions fails, so B is not taken as a display product.
  • The emotion ranking table is a preset table in which more positive emotions are ranked higher, for example: happiness, joy, surprise, calm, disgust, anger, disappointment.
  • (2) Sort the displayable products by the emotion ranking table and take the first preset number of them as the target display products. For example, suppose the final emotions of the displayable products are obtained as follows:
  • the final emotion corresponding to product A is joy;
  • the final emotion corresponding to product B is joy;
  • the final emotion corresponding to product C is surprise;
  • the final emotion corresponding to product D is calm.
  • Sorting the displayable products by the emotion ranking table gives the order A, B, C, D,
  • and the first preset number of products to be displayed are taken as the target display products, i.e., the first three products A, B, and C.
  • When the products to be displayed are sorted by the emotion ranking table and a tie occurs, first determine whether the target display products can be obtained within the preset number. If they cannot, determine, for each tied product, the amount of target video data matching its final emotion (i.e., the count of the majority target emotion), rank the tied final emotions by that count, and then take the preset number of target display products.
  • For example, suppose the final emotions of the displayable products are obtained as follows:
  • the final emotion corresponding to product A is joy;
  • the final emotion corresponding to product B is joy;
  • the final emotion corresponding to product C is calm;
  • the final emotion corresponding to product D is calm.
  • Sorting by the emotion ranking table gives A, B, then C tied with D,
  • so the target display products cannot be determined within the preset number.
  • If the count of C's majority target emotion is 50 and the count of D's is 60, then D is ranked ahead of C because 60 is greater than 50,
  • and the preset number of products to be displayed are taken as the target display products, i.e., products A, B, and D.
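The ranking-and-selection step, including the tie-break by majority-emotion count, can be sketched as a single sort. The EMOTION_RANK ordering below is an illustrative stand-in for the application's emotion ranking table:

```python
# Illustrative emotion ranking table (more positive emotions rank higher);
# the application's exact ordering is not reproduced here.
EMOTION_RANK = {"joy": 0, "surprise": 1, "calm": 2,
                "disgust": 3, "anger": 4, "disappointment": 5}

def pick_display_products(products, top_n=3):
    """products: list of (name, final_emotion, majority_emotion_count).
    Sort by emotion rank; break ties by the larger majority count."""
    ordered = sorted(products, key=lambda p: (EMOTION_RANK[p[1]], -p[2]))
    return [name for name, _, _ in ordered[:top_n]]

# A and B are joy; C and D tie on calm, but D's count (60) beats C's (50).
items = [("A", "joy", 80), ("B", "joy", 70),
         ("C", "calm", 50), ("D", "calm", 60)]
print(pick_display_products(items))  # ['A', 'B', 'D']
```

Encoding both criteria in one sort key avoids a separate tie-detection pass while producing the same ordering the text describes.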
  • In this embodiment, the target video data of each product to be displayed is obtained, and the face detection model performs face recognition and clustering on the images to be recognized in the target video data, obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer,
  • so that the target display products can subsequently be determined from the target emotions of a sufficient number of customers, improving the accuracy of the target display products. If the number of customers is greater than the preset number, the pre-trained micro-expression recognition model recognizes the images to be recognized in each image cluster set and obtains the single-frame emotion of each image, realizing the recognition of customer emotions.
  • The target emotion of the customer corresponding to each image cluster set is then obtained, to determine whether the product to be displayed is one the customers are interested in.
  • Finally, the target display products are obtained according to the final emotions corresponding to the products to be displayed, which improves the accuracy of the target display products and makes them products that most customers are interested in.
  • In an embodiment, in step S10, obtaining the target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized, specifically includes the following steps:
  • S11: Obtain the initial video data of the commodity to be displayed, where the initial video data includes at least two frames of initial video images.
  • The initial video data is the video data corresponding to each commodity to be displayed, collected by the video collection tool.
  • Specifically, a video collection tool collects the initial video data corresponding to each product to be displayed.
  • The initial video data includes at least two frames of initial video images and carries the product identifier of the corresponding product to be displayed.
  • The initial video data can subsequently be analyzed to obtain the final emotion corresponding to each commodity to be displayed.
  • S12: Obtain the attribute information of the commodity to be displayed, where the attribute information includes a suitable age and a suitable gender.
  • The suitable age is the age range for which the commodity to be displayed is intended.
  • The suitable gender is the gender for which the commodity to be displayed is intended.
  • The attribute information corresponding to each commodity to be displayed is stored in the database.
  • The server searches the database for each product to be displayed and obtains the attribute information corresponding to it.
  • The attribute information includes the suitable age and suitable gender.
  • For example, if a product to be displayed is clothing, the attribute information corresponding to the clothing includes a suitable age of 20-24 and a suitable gender of female;
  • if the product to be displayed is a cosmetic, the suitable age in its attribute information is 25-30 and the suitable gender is male.
  • The commodities to be displayed are not specifically limited in this embodiment.
  • S13 Screen the initial video image with a suitable age and a suitable gender, obtain an image to be recognized, and form target video data based on at least two frames of the image to be recognized.
  • a pre-trained classifier is used to identify at least two frames of initial video images in the initial video data, and the target age and target gender corresponding to each initial video image are obtained.
  • the server matches the target age with the suitable age and the target gender with the suitable gender, determines the initial video images that successfully match both the suitable age and the suitable gender as images to be recognized, deletes the initial video images that fail to match, and forms the target video data based on at least two frames of the images to be recognized.
  • the target age refers to the age obtained by recognizing the initial video image through a pre-trained classifier.
  • the target gender refers to the gender obtained by recognizing the initial video image through a pre-trained classifier.
  • in steps S11-S13, the initial video images are screened according to the suitable age and suitable gender corresponding to the product to be displayed to obtain the images to be recognized, and the target video data is formed based on at least two frames of images to be recognized, so that the target video data better matches the product to be displayed and the accuracy of the target display product is improved.
  • the method for displaying goods based on image recognition further includes:
  • Super-Resolution refers to reconstructing a corresponding high-resolution image from the acquired low-resolution image.
  • the initial video image in the initial video data is a low-resolution image.
  • the image to be determined refers to the high-resolution image obtained by converting the initial video image.
  • the server obtains the initial video data.
  • the initial video data includes at least two frames of initial video images.
  • the initial video images are in low-resolution (LR) space.
  • the feature maps in the low-resolution space are extracted through the ESPCN algorithm, and an efficient sub-pixel convolutional layer enlarges the initial video image from low resolution to high resolution: the final low-resolution feature map is upscaled to a high-resolution feature map, the high-resolution image corresponding to each initial video image is obtained from that feature map, and the high-resolution image is used as the image to be determined.
  • the core concept of the ESPCN algorithm is a sub-pixel convolutional layer.
  • the input is a low-resolution image (that is, the initial video image).
  • the feature image obtained is the same size as the input image, but has r² feature channels, where r is the target magnification of the image.
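The sub-pixel rearrangement at the heart of ESPCN can be sketched in NumPy. This is an illustrative reimplementation of the channel-to-space shuffle (a feature map with C·r² channels becomes an image with C channels and r-times the spatial size), not the patent's implementation.

```python
import numpy as np

def pixel_shuffle(feature_map, r):
    """Rearrange a (C*r*r, H, W) low-resolution feature map into a
    (C, H*r, W*r) high-resolution image, as in ESPCN's sub-pixel layer."""
    c_r2, h, w = feature_map.shape
    c = c_r2 // (r * r)
    # (C*r*r, H, W) -> (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = feature_map.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)
```

With r = 2, a 4-channel 2x3 feature map becomes a single-channel 4x6 image; no values are lost, only rearranged.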
  • in step S13, the initial video images are screened by the suitable age and the suitable gender to obtain the images to be recognized, which specifically includes the following steps:
  • S131 Use a pre-trained classifier to identify at least two frames of images to be determined, and obtain the target age and target gender corresponding to each image to be determined.
  • the pre-trained classifier includes a gender classifier and an age classifier, and the image to be determined is recognized through the gender classifier and the age classifier respectively to obtain the target age and target gender corresponding to the image to be determined.
  • the target gender refers to the gender obtained by recognizing the image to be determined through the gender classifier.
  • the target age refers to the age obtained by recognizing the image to be determined by the age classifier.
  • the training image data contains face images of different ages and different genders, and each face image in the training image data is annotated with age and gender.
  • the annotated training image data is input into a deep neural network and trained through the deep neural network; the deep neural network includes at least two convolutional layers.
  • the predicted age is compared with the annotated age to adjust the weights and biases of each layer in the deep neural network until the model converges, and the age classifier is obtained.
  • the predicted gender is compared with the annotated gender to adjust the weights and biases of each layer in the deep neural network until the model converges, and the gender classifier is obtained.
  • a pre-trained gender classifier is used to identify the image to be determined.
  • the image to be determined is an image containing the customer's face.
  • face key point detection and feature extraction are performed on the image to be determined to obtain the facial features.
  • the extracted facial features are input into the pre-trained gender classifier, which recognizes them to obtain the target gender corresponding to the image to be determined.
  • the extracted facial features are also input into the pre-trained age classifier, which classifies them to obtain the target age corresponding to the image to be determined.
  • the pre-trained gender classifier and age classifier are used to estimate the gender and age of the customer on the image to be determined, so as to improve the accuracy of obtaining the target gender and target age.
  • S132 Match the target age with the appropriate age, and match the target gender with the appropriate gender.
  • the appropriate age may be an age group, for example, 20-24 years old.
  • the server matches the target age with the appropriate age, mainly to determine whether the target age is within the appropriate age range.
  • the suitable gender is female and male, and the identified target gender is matched with the suitable gender.
  • if the server determines that the target age is within the suitable age range and the target gender is successfully matched with the suitable gender, the image to be determined corresponding to that target age and target gender is used as the image to be recognized.
  • in steps S131-S132, a pre-trained classifier is used to identify at least two frames of images to be determined, and the target age and target gender corresponding to each image to be determined are obtained, so that the target age and target gender are determined by the classifier and the speed of acquiring the target display product is improved.
  • the image to be determined whose target age and target gender match successfully is used as the image to be recognized, so that the acquired image to be recognized matches the attribute information of the product to be displayed, the obtained target display product is more accurate, and the acquisition accuracy of the target display product is improved.
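The matching logic of S131-S132 amounts to a per-frame age-range and gender check. A minimal sketch, where the function name and the `(frame_id, age, gender)` tuple layout are assumptions:

```python
def select_images_to_recognize(predictions, suitable_age, suitable_gender):
    """Keep only the frames whose predicted (target) age falls within the
    suitable age range and whose predicted gender equals the suitable
    gender; frames that fail either match are discarded.

    `predictions` is a list of (frame_id, target_age, target_gender).
    `suitable_age` is an inclusive (low, high) range."""
    low, high = suitable_age
    return [
        frame_id
        for frame_id, age, gender in predictions
        if low <= age <= high and gender == suitable_gender
    ]
```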
  • step S20, that is, using a face detection model to perform face recognition and clustering on at least two frames of images to be recognized to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, specifically includes the following steps:
  • S21 Use a face detection model to perform face recognition on at least two frames of images to be recognized, and obtain a face image corresponding to each image to be recognized;
  • the server obtains the target video data, uses a face detection model to perform face recognition on each frame of the image to be recognized in the target video data, and obtains a face image corresponding to each image to be recognized in the target video data.
  • face recognition means that for any given frame of image, a certain strategy is used to search it to determine whether the image contains a face.
  • the face detection model is a pre-trained model used to detect whether each frame of the image to be recognized contains a human face area.
  • the server inputs each frame of the image to be recognized into the face detection model and detects whether each frame contains a human face; if an image to be recognized contains a human face, the face image corresponding to that image to be recognized in the target video data is acquired.
  • S22 Cluster the face images corresponding to the image to be recognized, and obtain at least two image cluster sets, and each image cluster set includes at least one frame of the image to be recognized.
  • the server clusters the acquired face images corresponding to the images to be recognized, groups together the face images containing the same customer, and obtains at least two image cluster sets, where each image cluster set includes at least one frame of the image to be recognized.
  • specifically, a feature extraction algorithm is used to extract the facial features of the face image corresponding to each image to be recognized, and the feature similarity between the facial features is calculated. If the feature similarity is greater than a preset threshold, the face images belong to the same customer, and the images to be recognized corresponding to that customer's face images are clustered to obtain the image cluster set corresponding to each customer. That is, one customer corresponds to one image cluster set, and each image cluster set includes at least one frame of the image to be recognized.
  • the number of image cluster sets corresponding to each commodity to be displayed is counted, and the number of image cluster sets is taken as the number of customers corresponding to the target video data.
  • in steps S21-S23, the face detection model is used to perform face recognition on at least two frames of images to be recognized, and the face image corresponding to each image to be recognized is obtained, so as to determine whether each image to be recognized contains a face and avoid clustering images that contain no face, improving the acquisition speed of the subsequent image cluster sets.
  • the face images corresponding to the images to be recognized are clustered to obtain at least two image cluster sets, and the number of customers corresponding to the target video data is obtained from the number of image cluster sets, so as to determine the number of customers and ensure the accuracy of that count.
  • each image to be recognized corresponds to a time stamp, which refers to the time at which the image to be recognized was collected.
  • step S22 that is, clustering the face images corresponding to the image to be recognized, and obtaining at least two image cluster sets, specifically includes the following steps:
  • S221 According to the time mark, use the first recognized face image in at least two frames of images to be recognized as a reference image.
  • the reference image refers to the face image recognized for the first time from the image to be recognized.
  • the server obtains the time stamps corresponding to at least two frames of images to be recognized, and according to the time stamps, first determines the first recognized face image in the at least two frames of images to be recognized, and uses the face image as the reference image. By determining the reference image, the acquisition speed of the image cluster set can be improved.
  • a similarity algorithm is used to calculate the feature similarity between the reference image and the remaining images to be recognized except for the reference image in the at least two frames of images to be recognized to obtain the feature similarity.
  • the similarity algorithm may be Euclidean distance algorithm, Manhattan distance algorithm, Minkowski distance algorithm, or cosine similarity algorithm.
  • the cosine similarity algorithm is used to calculate the characteristic similarity between the reference image and the remaining images to be recognized, which can speed up the acquisition of image clustering sets and improve the acquisition efficiency of target display products.
  • the preset threshold is a preset value. If the server determines that the feature similarity between the reference image and a remaining image to be recognized is greater than the preset threshold, the reference image and that remaining image are considered to match successfully and to show the same customer, and the image to be recognized whose feature similarity is greater than the preset threshold is attributed to the same image cluster set as the reference image.
  • for example, if the feature similarity between reference image 1 and remaining image 2 to be recognized is 80%, the feature similarity between reference image 1 and remaining image 3 to be recognized is 99%, and the preset threshold is 90%, then the feature similarity between reference image 1 and remaining image 3 is greater than the preset threshold, and reference image 1 and remaining image 3 are attributed to the same image cluster set.
  • S224 If the feature similarity is not greater than the preset threshold, then, according to the time stamp, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold is updated to be the new reference image, and the step of sequentially calculating, in time-stamp order, the feature similarity between the reference image and the remaining images to be recognized with the similarity algorithm is repeated until the clustering of the at least two frames of images to be recognized is completed, forming at least two image cluster sets.
  • the server determines that the feature similarity between the reference image and the remaining images to be recognized is not greater than the preset threshold, it is considered that the reference image and the remaining images to be recognized have failed to match, and the customers corresponding to the reference image and the customers corresponding to the remaining images to be recognized are different.
  • the first image among the remaining images to be identified whose feature similarity is not greater than the preset threshold is updated as a new reference image.
  • for example, if the feature similarity between reference image 1 and remaining image 2 to be recognized is 80% and the preset threshold is 90%, then the feature similarity between reference image 1 and remaining image 2 is not greater than the preset threshold, and, according to the time stamp, remaining image 2 is updated to be the new reference image.
  • in steps S221-S224, according to the time stamp, the first recognized face image among the at least two frames of images to be recognized is used as the reference image, and the similarity algorithm is used to calculate the feature similarity between the reference image and the remaining images to be recognized, in order to determine whether the reference image and the remaining images show the same customer. If the feature similarity is greater than the preset threshold, the image to be recognized whose feature similarity is greater than the preset threshold is attributed to the same image cluster set as the reference image, so that the images to be recognized of the same customer are clustered together.
  • the similarity calculation is repeated until the clustering of the at least two frames of images to be recognized is completed, forming at least two image cluster sets, so that the images to be recognized of each customer are grouped and each customer's target emotion toward the product to be displayed can subsequently be determined.
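The sequential clustering of S221-S224 can be sketched as follows. This is a simplified illustration that walks the faces in time order and compares each one against existing cluster references with cosine similarity; the data layout and names are assumptions, and the number of customers falls out as the number of clusters.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_faces(features, threshold):
    """Sequentially cluster time-ordered (frame_id, feature_vector) pairs:
    a face joins the first cluster whose reference it matches above the
    threshold; otherwise it becomes the reference of a new cluster."""
    clusters = []
    for frame_id, vec in features:
        for cluster in clusters:
            if cosine_similarity(cluster["reference"], vec) > threshold:
                cluster["frames"].append(frame_id)
                break
        else:
            clusters.append({"reference": vec, "frames": [frame_id]})
    return clusters
```

The number of customers corresponding to the target video data is then `len(cluster_faces(features, threshold))`.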
  • step S30, that is, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized, specifically includes the following steps:
  • S31 Use the face key point algorithm to perform face recognition on the images to be recognized in each image cluster set, and obtain the face key points corresponding to each image to be recognized.
  • the face key point algorithm can be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded-Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm.
  • the ERT algorithm is a regression-based method, expressed as Ŝ^(t+1) = Ŝ^(t) + r_t(I, Ŝ^(t)), where Ŝ^(t+1) is the shape, i.e. the coordinates of the feature points, of the image to be recognized obtained at iteration t+1, t is the cascade level, Ŝ^(t) is the currently predicted shape, I is the image to be recognized input to the regressor, and r_t is the regressor at level t.
  • each regressor is composed of many regression trees obtained through training, and the face key points corresponding to each image to be recognized are obtained through the regression trees.
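The cascade update Ŝ^(t+1) = Ŝ^(t) + r_t(I, Ŝ^(t)) can be illustrated with plain functions standing in for the trained regressors. A hypothetical sketch, not the trained ERT model itself:

```python
def ert_cascade(image, regressors, initial_shape):
    """Run one pass through an ERT-style cascade: each regressor r_t maps
    (image, current shape) to a shape increment, and the shape estimate is
    refined additively, S(t+1) = S(t) + r_t(I, S(t))."""
    shape = list(initial_shape)
    for r_t in regressors:
        increment = r_t(image, shape)
        shape = [s + d for s, d in zip(shape, increment)]
    return shape
```

In the real algorithm each `r_t` is an ensemble of regression trees learned from annotated faces; here simple lambdas stand in to show the additive refinement.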
  • S32 Use a feature extraction algorithm to perform feature extraction on the face key points corresponding to each image to be recognized, and obtain local features corresponding to the face key points.
  • the feature extraction algorithm can be the CNN (Convolutional Neural Network, convolutional neural network) algorithm.
  • the CNN algorithm is used to extract the local features of the face key points corresponding to the image to be recognized; specifically, the local features are extracted according to the locations of the facial action units.
  • the CNN algorithm is a feed-forward neural network, and its artificial neurons can respond to a part of the surrounding units within the coverage area, and can quickly and efficiently perform image processing.
  • a pre-trained convolutional neural network is used to quickly extract local features corresponding to key points of a human face.
  • the face key points corresponding to each image to be recognized are convolved with several convolution kernels, and the result of the convolution is the local feature corresponding to the face key points: y = f(Σ_i Σ_j w_ij · x + b), where y is the output local feature, x is a two-dimensional input vector of size (M, N) formed from the coordinates of the L face key points, w_ij is a convolution kernel of size I×J, b is the bias, the output has size M×N, and f is the activation function.
  • each convolution kernel is convolved with the face key points of the input image to be recognized in the previous layer, and each convolution kernel yields a corresponding local feature; because the convolution kernels share weights, the number of parameters is greatly reduced, which greatly improves the training speed of the network.
  • the local features corresponding to the facial action unit can be obtained.
  • AU1, AU2, AU5, and AU26 are the local features corresponding to raised inner eyebrows, raised outer eyebrows, raised upper eyelids, and opened lower jaw.
  • the convolutional neural network is used to extract the local features of the key points of the face in the image to be recognized, so as to subsequently determine the target facial action unit based on the local features, and determine the customer's emotions based on the recognized target facial action unit.
  • the use of convolutional neural network for recognition is faster and the recognition accuracy is higher.
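The local-feature convolution y = f(w·x + b) can be sketched as a plain valid 2D convolution in NumPy. This illustrates the formula only, not the trained network described above; the activation choice is an assumption.

```python
import numpy as np

def conv2d_feature(x, w, b, f=np.tanh):
    """Valid 2D convolution followed by bias and activation,
    y = f(sum_ij w_ij * x_patch + b), one kernel -> one local feature map."""
    m, n = x.shape
    i_k, j_k = w.shape
    out = np.empty((m - i_k + 1, n - j_k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel with one input patch.
            out[i, j] = np.sum(w * x[i:i + i_k, j:j + j_k]) + b
    return f(out)
```

Because the same kernel `w` slides over the whole input, its weights are shared across all output positions, which is the parameter saving the text mentions.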
  • S33 Use a pre-trained classifier to recognize local features, and obtain a target facial action unit corresponding to each local feature.
  • the local features are recognized by each SVM classifier in the pre-trained micro-expression recognition model, where the number of SVM classifiers equals the number of recognizable facial action units; that is, if 54 facial action units can be recognized, there are 54 pre-trained SVM classifiers.
  • each classifier outputs a probability value, the obtained probability values are compared with a preset probability threshold, the facial action units whose probability value is greater than the preset probability threshold are used as the target facial action units corresponding to the local feature, and all target facial action units corresponding to the local features are acquired.
  • the evaluation table is a pre-configured table.
  • the evaluation table stores the correspondence between facial action unit combinations and emotions; for example, the combination of AU12, AU6, and AU7 corresponds to joy, and the combination of AU9, AU10, AU17, and AU24 corresponds to disgust.
  • the server searches the evaluation table through the target facial action unit corresponding to each local feature, obtains the combination that matches the target facial action unit, and uses the emotion corresponding to the combination as the single frame emotion corresponding to the image to be recognized.
  • the face key point algorithm is used to perform face recognition on the images to be recognized in each image cluster set to obtain the face key points corresponding to each image to be recognized, providing technical support for the subsequent extraction of local features and improving the accuracy of local feature extraction; the feature extraction algorithm is used to extract features from the face key points so as to quickly obtain the corresponding local features, making the subsequently extracted target facial action units more accurate; and the pre-trained classifier is used to recognize the local features so as to quickly obtain the target facial action unit corresponding to each local feature and realize the determination of the target facial action units.
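The evaluation-table lookup can be sketched as a set-containment check. The two table rows below are the combinations mentioned in the text; the function name and the `frozenset` keying (which makes the lookup order-independent) are assumptions.

```python
# Sketch of the evaluation table: facial action unit combinations -> emotions.
EVALUATION_TABLE = {
    frozenset({"AU6", "AU7", "AU12"}): "joy",
    frozenset({"AU9", "AU10", "AU17", "AU24"}): "disgust",
}

def single_frame_emotion(target_aus):
    """Return the emotion whose AU combination is contained in the target
    facial action units detected for one frame, or None if no row matches."""
    detected = set(target_aus)
    for combo, emotion in EVALUATION_TABLE.items():
        if combo <= detected:  # every AU of the combination was detected
            return emotion
    return None
```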
  • a product display device based on image recognition is provided, and the product display device based on image recognition has a one-to-one correspondence with the product display method based on image recognition in the foregoing embodiment.
  • the image recognition-based product display device includes a data acquisition module 10, an image cluster set acquisition module 20, a single-frame emotion determination module 30, a target emotion acquisition module 40, a final emotion acquisition module 50, and a target display product acquisition module 60.
  • the detailed description of each functional module is as follows:
  • the data acquisition module 10 is configured to acquire target video data of the commodity to be displayed, and the target video data includes at least two frames of images to be recognized.
  • the image clustering set acquisition module 20 is used to use the face detection model to perform face recognition and clustering on at least two frames of images to be recognized, to acquire the number of customers corresponding to the target video data and the image clustering set corresponding to each customer.
  • An image cluster set includes at least one frame to be recognized.
  • the single frame emotion determination module 30 is configured to use a pre-trained micro-expression recognition model to recognize images to be recognized in each image cluster set if the number of customers is greater than the preset number, and obtain a single frame of each image to be recognized Frame emotions.
  • the target emotion obtaining module 40 is configured to obtain the target emotion of the customer corresponding to the image cluster set based on the single frame emotion of at least one frame to be recognized.
  • the final emotion obtaining module 50 is used to obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.
  • the target display product obtaining module 60 is configured to obtain the target display product according to the final emotion corresponding to the product to be displayed.
  • the data acquisition module 10 includes an initial video data acquisition unit 11, an attribute information determination unit 12 and a target video data formation unit 13.
  • the initial video data acquiring unit 11 is configured to acquire initial video data of the commodity to be displayed, and the initial video data includes at least two frames of initial video images.
  • the attribute information determining unit 12 is used to obtain attribute information of the commodity to be displayed, and the attribute information includes a suitable age and a suitable gender.
  • the target video data forming unit 13 is configured to filter the initial video images with a suitable age and a suitable gender, obtain the image to be recognized, and form target video data based on at least two frames of the image to be recognized.
  • before the target video data forming unit, the commodity display device based on image recognition further includes an image resolution conversion unit.
  • the image resolution conversion unit is used to process at least two frames of initial video images by using super-resolution technology, obtain high-resolution images corresponding to the at least two frames of initial video images, and use the high-resolution images as the images to be determined.
  • the target video data forming unit includes a target age and target gender determination subunit, a matching subunit, and a target image determination subunit.
  • the target age and target gender determination subunit is used to identify at least two frames of images to be determined using a pre-trained classifier, and obtain the target age and target gender corresponding to each image to be determined.
  • the matching subunit is used to match the target age with the appropriate age, and match the target gender with the appropriate gender.
  • the to-be-recognized image determination subunit is used to, if the target age is successfully matched with the suitable age and the target gender is successfully matched with the suitable gender, use the image to be determined corresponding to that target age and target gender as the image to be recognized.
  • the image cluster set acquisition module 20 includes a face image acquisition unit, an image cluster set acquisition unit, and a customer number determination unit.
  • the face image acquisition unit is configured to use a face detection model to perform face recognition on at least two frames of images to be recognized, and obtain a face image corresponding to each image to be recognized.
  • the image clustering set obtaining unit is configured to cluster the face images corresponding to the image to be recognized to obtain at least two image cluster sets, and each image cluster set includes at least one frame of the image to be recognized.
  • the customer number determining unit is used to obtain the number of customers corresponding to the target video data according to the number of image clustering sets.
  • each image to be recognized corresponds to a time stamp.
  • the image cluster set acquisition unit includes a reference image determination subunit, a feature similarity calculation unit, a first image cluster set determination subunit, and a second image cluster set determination subunit.
  • the reference image determination subunit is used to use the first recognized face image in at least two frames of images to be recognized as the reference image according to the time stamp.
  • the feature similarity calculation unit is used to calculate the feature similarity between the reference image and the remaining images to be recognized by sequentially adopting a similarity algorithm according to the time mark.
  • the first image cluster set determining subunit is configured to, if the feature similarity is greater than the preset threshold, attribute the image to be recognized and the reference image with the feature similarity greater than the preset threshold to the same image cluster set.
  • the second image cluster set determining subunit is used to, if the feature similarity is not greater than the preset threshold, update the first image among the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be the new reference image according to the time stamp, and to repeat the step of sequentially calculating, with the similarity algorithm and in time-stamp order, the feature similarity between the reference image and the remaining images to be recognized, until the clustering of the at least two frames of images to be recognized is completed to form at least two image cluster sets.
  • the single frame emotion determination module 30 includes a face key point acquisition unit, a local feature extraction unit, a target facial action unit acquisition unit, and a single frame emotion acquisition unit.
  • the face key point acquisition unit is used to perform face recognition on the image to be recognized in each image clustering set by using the face key point algorithm, and obtain the face key point corresponding to each image to be recognized.
  • the local feature extraction unit is used to perform feature extraction on the key points of the face corresponding to each image to be recognized by using a feature extraction algorithm to obtain the local features corresponding to the key points of the face.
  • the target facial action unit acquisition unit is used to recognize the local features using a pre-trained classifier, and acquire the target facial action unit corresponding to each local feature.
  • the single frame emotion acquisition unit is used to look up the evaluation table based on the target facial action unit corresponding to each local feature, and acquire the single frame emotion of each image to be recognized.
  • the various modules in the above-mentioned image recognition-based product display device can be implemented in whole or in part by software, hardware and their combination.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer device includes a processor, memory, network interface, and database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store the face detection model and the attribute information of the commodity to be displayed.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • a computer device including a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor.
  • the processor executes the computer-readable instructions to implement the steps of the product display method based on image recognition, for example, steps S10 to S60 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7.
  • when the processor executes the computer-readable instructions, the functions of the modules/units in the commodity display apparatus based on image recognition in the foregoing embodiments are implemented, for example, the functions of modules 10 to 50 shown in FIG. 8. To avoid repetition, details are not repeated here.
  • one or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the product display method based on image recognition in the above method embodiments, for example, steps S10 to S60 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7.
  • alternatively, when the computer-readable instructions are executed by the one or more processors, the functions of the modules/units in the commodity display device based on image recognition in the above embodiments are implemented, for example, the functions of modules 10 to 60 shown in FIG. 8. To avoid repetition, details are not repeated here.
  • the readable storage medium in this embodiment includes a nonvolatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

An image-identification-based product display method, device, apparatus, and medium. The method comprises: performing, by means of a face detection model, face identification and clustering on at least two images to be identified in target video data of a product to be displayed, and acquiring the number of customers corresponding to the target video data and an image cluster set corresponding to each customer; if the number of customers is greater than a preset number, identifying the images to be identified in each image cluster set by means of a pre-trained micro-expression identification model, and acquiring the single-frame emotion of each image (S30); acquiring, on the basis of the single-frame emotions of the images to be identified, the target emotion of the customer corresponding to each image cluster set (S40); acquiring a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set (S50); and acquiring a target display product according to the final emotion corresponding to the product to be displayed (S60). The method solves the problem of insufficient appeal of displayed products.

Description

Commodity display method, device, equipment and medium based on image recognition
This application is based on, and claims priority to, the Chinese invention application filed on January 17, 2019 with application number 201910042541.3, titled "Commodity display method, device, equipment and medium based on image recognition".
Technical Field
This application relates to the field of intelligent decision-making, and in particular to a method, device, equipment, and medium for displaying commodities based on image recognition.
Background
At present, many merchants launch new products when releasing new lines or at the change of seasons, and need to promote these products in order to attract customers to purchase them. Typically, a merchant selects some of the products to be launched for display so that customers can browse them. However, because the customers' preferences are unknown, the products are chosen either according to an individual's personal taste or by the company's internal staff. As a result, the displayed products may not match customers' tastes or needs, which reduces the attractiveness of the displayed products and fails to attract customers into the store to purchase.
Summary
The embodiments of the present application provide a method, device, equipment, and medium for displaying commodities based on image recognition, so as to solve the problem of insufficient attractiveness of displayed commodities.
A commodity display method based on image recognition, including:
acquiring target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and acquiring the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and acquiring a single-frame emotion of each image to be recognized;
acquiring, based on the single-frame emotion of at least one frame of image to be recognized, a target emotion of the customer corresponding to the image cluster set;
acquiring a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;
acquiring a target display commodity according to the final emotion corresponding to the commodity to be displayed.
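The claimed steps can be sketched end to end as a small pipeline. This is an illustrative sketch, not the patent's implementation: `detect_and_cluster` and `recognize_emotion` stand in for the face detection/clustering model and the micro-expression recognition model, and the names, the majority-vote reductions, and the `preferred_emotions` selection rule are all assumptions made for the example.

```python
from collections import Counter

def select_display_products(videos_by_product, detect_and_cluster, recognize_emotion,
                            preset_customer_count, preferred_emotions=("happy", "surprised")):
    """Hypothetical sketch of the claimed flow: for each candidate product, cluster
    video frames by customer, derive each customer's target emotion by majority vote
    over that customer's frames, reduce the per-customer target emotions to one final
    emotion per product, and keep products whose final emotion is a preferred one."""
    selected = []
    for product_id, frames in videos_by_product.items():
        clusters = detect_and_cluster(frames)          # {customer_id: [frame, ...]}
        if len(clusters) <= preset_customer_count:     # too few viewers to judge
            continue
        target_emotions = [
            Counter(recognize_emotion(f) for f in cluster_frames).most_common(1)[0][0]
            for cluster_frames in clusters.values()
        ]
        final_emotion = Counter(target_emotions).most_common(1)[0][0]
        if final_emotion in preferred_emotions:
            selected.append(product_id)
    return selected
```

With stub models, a product whose viewers mostly look happy is selected while one whose viewers look sad is not.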
A commodity display device based on image recognition, including:
a data acquisition module, configured to acquire target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
an image cluster set acquisition module, configured to use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and to acquire the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
a single-frame emotion determination module, configured to, if the number of customers is greater than a preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and to acquire a single-frame emotion of each image to be recognized;
a target emotion acquisition module, configured to acquire, based on the single-frame emotion of at least one frame of image to be recognized, a target emotion of the customer corresponding to the image cluster set;
a final emotion acquisition module, configured to acquire a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;
a target display commodity acquisition module, configured to acquire a target display commodity according to the final emotion corresponding to the commodity to be displayed.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
acquiring target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and acquiring the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and acquiring a single-frame emotion of each image to be recognized;
acquiring, based on the single-frame emotion of at least one frame of image to be recognized, a target emotion of the customer corresponding to the image cluster set;
acquiring a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;
acquiring a target display commodity according to the final emotion corresponding to the commodity to be displayed.
One or more readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
acquiring target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;
using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and acquiring the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;
if the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and acquiring a single-frame emotion of each image to be recognized;
acquiring, based on the single-frame emotion of at least one frame of image to be recognized, a target emotion of the customer corresponding to the image cluster set;
acquiring a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;
acquiring a target display commodity according to the final emotion corresponding to the commodity to be displayed.
The details of one or more embodiments of the present application are set forth in the drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 2 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 3 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 4 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 5 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 6 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 7 is a flowchart of a commodity display method based on image recognition in an embodiment of the present application;
FIG. 8 is a functional block diagram of a commodity display device based on image recognition in an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The commodity display method based on image recognition provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which a client communicates with a server through a network. The method is applied on the server: the target video data corresponding to a commodity to be displayed is analyzed and recognized to obtain the emotion of each customer in the target video data; the final emotion toward the commodity to be displayed is determined according to each customer's emotion; and the target display commodity is determined according to the final emotion corresponding to each commodity to be displayed. In this way, the target display commodity is one that customers pay more attention to, which improves the attractiveness of the target display commodity and attracts customers to purchase the displayed commodities. The client may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a commodity display method based on image recognition is provided. The method is described by taking its application on the server in FIG. 1 as an example, and specifically includes the following steps:
S10: Acquire target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized.
Here, the target video data refers to video data obtained by filtering the initial video data corresponding to the commodity to be displayed, and may specifically be video data that meets certain conditions, for example, video data that satisfies the attribute information of the commodity to be displayed. The attribute information may include a suitable age and a suitable gender. Understandably, the target video data is obtained by filtering the collected initial video data corresponding to the commodity to be displayed by suitable age and suitable gender, and an image to be recognized is an image that has passed this filtering. The initial video data is the video data corresponding to each commodity to be displayed, collected by a video collection tool.
Specifically, a video collection tool is configured in advance for the area where each commodity to be displayed is located. The video collection tool is used to collect images or video data; when it detects that a customer appears within its collection range, it is automatically triggered and collects image or video data of the customer. The video collection tool is specifically a camera, through which the initial video data within the collection range corresponding to each commodity to be displayed can be collected in real time. Since each collection tool corresponds to one commodity to be displayed, each camera collects the initial video data of the area where that commodity is located, and the initial video data is filtered to obtain the target video data corresponding to each commodity to be displayed. The collected initial video data carries the commodity identifier of the corresponding commodity to be displayed, so that the corresponding target video data can subsequently be determined through the commodity identifier. For example, if the initial video data collected by the video collection tool includes commodity identifier A, then that initial video data corresponds to commodity identifier A, and it is filtered to obtain the target video data corresponding to commodity identifier A. The commodity identifier is a unique identifier used to distinguish different commodities to be displayed. Optionally, it may consist of at least one of numbers, letters, characters, or symbols; for example, it may be the number or serial number of the commodity to be displayed.
S20: Use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and acquire the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized.
Here, the face detection model is a pre-trained model used to detect whether each frame of image to be recognized contains a human face area. The number of customers is the number of distinct customers determined in the target video data. An image cluster set is a set formed by clustering the images to be recognized that correspond to the same customer.
Specifically, the server is connected to a database that stores the face detection model, and the target video data contains images to be recognized corresponding to different customers. The face detection model is used to recognize the images to be recognized and obtain the face images they contain, where a face image is an image of a customer's face area. The server inputs each image to be recognized in the acquired target video data into the face detection model, performs face recognition on each image through the model, and determines whether each image to be recognized is a face image. If an image to be recognized is a face image, the same face images are clustered, that is, the images to be recognized corresponding to the same face image are clustered to obtain the image cluster set corresponding to each customer, and the number of customers in the target video data is determined according to the number of image cluster sets.
More specifically, a feature extraction algorithm is used to extract the facial features of the face image corresponding to each image to be recognized, and a feature similarity is computed between the facial features of the images to be recognized. If the feature similarity is greater than a preset threshold, the corresponding face images belong to the same customer; the images to be recognized corresponding to the same customer's face images are clustered to obtain each customer's image cluster set, and the number of customers in the target video data is determined according to the number of image cluster sets. The preset threshold is a value set in advance for evaluating whether the similarity is high enough to judge two faces to be the same customer's. Feature extraction algorithms include, but are not limited to, CNN (Convolutional Neural Network) algorithms; for example, a CNN can be used to extract the facial features of the face image corresponding to each image to be recognized.
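The similarity-based clustering just described can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes cosine similarity over pre-extracted feature vectors and a greedy first-match assignment strategy, and the threshold value is an arbitrary example.

```python
import math

def cluster_faces(face_features, threshold=0.8):
    """Greedy clustering sketch: assign each face feature vector to the first
    existing cluster whose representative is similar enough (cosine similarity
    above the preset threshold), otherwise start a new cluster. The number of
    resulting clusters is the number of distinct customers."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    clusters = []  # each entry: (representative_feature, [frame_indices])
    for idx, feat in enumerate(face_features):
        feat = normalize(feat)
        for rep, members in clusters:
            if sum(a * b for a, b in zip(rep, feat)) > threshold:
                members.append(idx)
                break
        else:
            clusters.append((feat, [idx]))
    return [members for _, members in clusters]
```

Two nearly identical feature vectors land in one cluster (one customer), while a dissimilar vector opens a new cluster.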
S30: If the number of customers is greater than the preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and acquire the single-frame emotion of each image to be recognized.
Here, the micro-expression recognition model is a model that captures local features of the customer's face in the image to be recognized, identifies each target facial action unit of the face according to the local features, and then determines the emotion according to the identified target facial action units. A single-frame emotion is the emotion determined, according to the identified target facial action units, by recognizing an image to be recognized with the micro-expression recognition model.
The micro-expression recognition model may be a neural network recognition model based on deep learning, a classification-based local recognition model, or a local emotion recognition model based on Local Binary Patterns (LBP). In this embodiment, the micro-expression recognition model is a classification-based local recognition model. When the model is trained in advance, a large amount of training image data is collected; the training image data contains positive samples and negative samples of each facial action unit, and the model is obtained by training on this data with a classification algorithm. In this embodiment, the training image data may be trained with an SVM classification algorithm to obtain SVM classifiers corresponding to N facial action units, for example, 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units. The more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, the micro-expression recognition model is formed from the N SVM classifiers, and the more SVM classifiers obtained, the more accurate the emotions recognized by the resulting model.
The preset number is a value set in advance and is the same for every commodity to be displayed. When the server determines that the number of customers is greater than the preset number, it uses the pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and acquires the single-frame emotion corresponding to each image. This specifically includes the following steps: the server first performs face key point detection and feature extraction on the image to be recognized to obtain the corresponding local features, and then inputs these local features into the pre-trained micro-expression recognition model. The model includes N SVM classifiers; each SVM classifier recognizes a local feature of the image to be recognized, all the input local features are recognized by the N classifiers, and each classifier outputs a probability value for its facial action unit. When a probability value is greater than a preset probability threshold, that facial action unit is determined to be a target facial action unit of the image to be recognized. The preset probability threshold is a value set in advance, and a target facial action unit is a facial Action Unit (AU) obtained by recognizing the image to be recognized with the micro-expression recognition model.
In this embodiment, the micro-expression recognition model contains 54 SVM classifiers, and a facial action unit number mapping table is established in which each facial action unit is represented by a predetermined number, for example, AU1 for inner brow raiser, AU2 for outer brow raiser, AU5 for upper lid raiser, and AU26 for jaw drop. Each facial action unit has a trained corresponding SVM classifier; for example, the classifier for inner brow raiser outputs the probability that a local feature belongs to inner brow raiser, and the classifier for outer brow raiser outputs the probability that a local feature belongs to outer brow raiser. The probability value may specifically be a value between 0 and 1. If an output probability value is 0.6 and the preset probability threshold is 0.5, then since 0.6 is greater than 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized.
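The per-AU thresholding step described above can be sketched as follows. The classifier objects here are hypothetical stand-ins for the trained SVM classifiers (any callable returning a probability in [0, 1]); only the threshold rule mirrors the text.

```python
def detect_action_units(local_features, au_classifiers, prob_threshold=0.5):
    """Sketch of the per-AU classification step: each facial action unit (AU)
    has its own binary classifier, and an AU whose predicted probability for
    the input local features exceeds the preset probability threshold becomes
    a target AU of the frame. `au_classifiers` is an assumed mapping
    {au_id: callable(local_features) -> probability}."""
    target_aus = set()
    for au_id, classifier in au_classifiers.items():
        if classifier(local_features) > prob_threshold:
            target_aus.add(au_id)
    return target_aus
```

With the example from the text, a classifier output of 0.6 against a threshold of 0.5 marks that AU as a target facial action unit.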
Specifically, the server can determine the single-frame emotion of each image to be recognized according to its target facial action units, that is, it queries an evaluation table with the target facial action units corresponding to each image to obtain the single-frame emotion of that image. As in the above embodiment, the 54 SVM classifiers recognize the local features to determine all target facial action units of the image to be recognized, and the evaluation table is looked up according to all target facial action units to determine the single-frame emotion of the image, improving the accuracy of acquiring the customer's single-frame emotion. The evaluation table is a pre-configured table: one or more facial action units form different emotions, the facial action unit combination corresponding to each emotion is obtained in advance, and each facial action unit combination is stored in association with its corresponding emotion to form the evaluation table.
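An evaluation table of this kind can be sketched as a mapping from AU combinations to emotions. The table entries, the partial-overlap fallback, and the "neutral" default below are illustrative assumptions; the patent only specifies that AU combinations are stored in association with emotions.

```python
# Hypothetical evaluation table: each emotion is keyed by a combination of AUs.
EVALUATION_TABLE = {
    frozenset({"AU6", "AU12"}): "happy",
    frozenset({"AU1", "AU2", "AU5", "AU26"}): "surprised",
    frozenset({"AU1", "AU4", "AU15"}): "sad",
}

def lookup_single_frame_emotion(target_aus, table=EVALUATION_TABLE, default="neutral"):
    """Return the emotion whose stored AU combination matches the detected
    target AUs: an exact match wins, otherwise the entry with the largest
    overlap (an assumed fallback rule), else the default."""
    key = frozenset(target_aus)
    if key in table:
        return table[key]
    best, best_overlap = default, 0
    for combo, emotion in table.items():
        overlap = len(combo & key)
        if overlap > best_overlap:
            best, best_overlap = emotion, overlap
    return best
```

An exact AU combination returns its stored emotion directly; a partial detection still resolves to the closest entry.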
S40: Based on the single-frame emotion of at least one frame of image to be recognized, obtain the target emotion of the customer corresponding to the image cluster set.
The target emotion is the emotion determined from the single-frame emotion of each image to be recognized in an image cluster set. Understandably, an image cluster set contains the images to be recognized of one customer, so that customer's target emotion is determined from the single-frame emotion of each of his or her images to be recognized.
Specifically, the server obtains the single-frame emotion corresponding to each image to be recognized in an image cluster set and analyzes those emotions to obtain the target emotion of the customer corresponding to that set. Understandably, it is judged whether the single-frame emotions of all images to be recognized in the set are the same; if they are, that single-frame emotion is taken as the target emotion. If at least two single-frame emotions differ, the single-frame emotion occurring most often in the set is determined and taken as the target emotion. Specifically, the number of occurrences of each single-frame emotion in the image cluster set is counted, and the single-frame emotion with the largest count is taken as the target emotion of the customer corresponding to the set. In this embodiment, the target emotion of the customer corresponding to each image cluster set in the target video data of a product to be displayed is obtained in turn, that is, each customer's emotion toward that product is obtained. For example, if the target video data corresponding to product A contains 100 image cluster sets, the target emotion of the customer corresponding to each set is obtained, i.e., the target emotions of 100 customers toward product A are obtained.
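The majority-vote determination of a customer's target emotion described above can be sketched as (Python; the function name is an assumption):

```python
from collections import Counter

def target_emotion(frame_emotions):
    """Majority vote over the single-frame emotions of one image cluster set:
    if all frames agree, that emotion wins trivially; otherwise the most
    frequent single-frame emotion becomes the customer's target emotion."""
    counts = Counter(frame_emotions)
    emotion, _ = counts.most_common(1)[0]
    return emotion

frames = ["joy", "joy", "calm", "joy"]
print(target_emotion(frames))  # joy (3 of 4 frames)
```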
S50: Obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.
The final emotion is the emotion corresponding to a product to be displayed, obtained through a quantitative analysis of each customer's target emotion toward that product.
Specifically, the server judges, from the target emotion of the customer corresponding to each image cluster set, whether the target emotions of all sets are the same. If they differ, then according to the number of customers and the target emotion of the customer corresponding to each set, the count of each target emotion is tallied, and the target emotion with the largest count is taken as the final emotion. For example, if the target video data of product A corresponds to 100 customers, i.e., 100 image cluster sets, and among the 100 target emotions 50 are joy, 30 are calm, and 20 are indifference, then the most frequent target emotion (joy) is taken as the final emotion of product A. If at least two target emotions are tied for the largest count, the final emotion is determined from the emotion categories of the tied target emotions; preferably, a positive target emotion is taken as the final emotion. For example, if the target video data of product A corresponds to 100 customers, i.e., 100 image cluster sets, and among the 100 target emotions 50 are joy and 50 are indifference, joy is taken as the final emotion of the product to be displayed. If at least two target emotions are tied for the largest count and all of them are negative, then, since the final emotion of a product ultimately selected for display should be positive, any one of the tied target emotions may be selected as the final emotion.
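A minimal sketch of the final-emotion tally with the positive-emotion tie-break described above (Python; the set of positive emotions and the function name are assumptions for illustration):

```python
from collections import Counter

POSITIVE = {"joy", "happiness", "surprise"}  # assumed positive-emotion set

def final_emotion(customer_emotions):
    """Tally the customers' target emotions; the most frequent one is the
    final emotion. On a tie, prefer a positive emotion; if all tied
    emotions are negative, any of them may be returned."""
    counts = Counter(customer_emotions)
    top = max(counts.values())
    tied = [e for e, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0]
    for e in tied:
        if e in POSITIVE:
            return e
    return tied[0]

print(final_emotion(["joy"] * 50 + ["calm"] * 30 + ["indifference"] * 20))  # joy
print(final_emotion(["joy"] * 50 + ["indifference"] * 50))                  # joy
```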
S60: Obtain the target display products according to the final emotion corresponding to each product to be displayed.
The target display products are the displayable products selected from the products to be displayed according to the final emotion corresponding to each product to be displayed.
Specifically, the server's obtaining of the target display products from the products to be displayed, according to the final emotion corresponding to each product, includes the following steps:
(1) First determine, from its final emotion, whether each product to be displayed is a displayable product. Specifically, preset emotions permitting display are configured in advance, and the final emotion of each product to be displayed is matched against the preset emotions; if the match succeeds, the product corresponding to that final emotion is a displayable product. This step prevents products whose final emotion does not match the preset emotions from becoming target display products. Generally, the preset emotions are positive emotions, so products whose final emotion is negative are excluded from the target display products. For example, if the final emotion of product A is joy, most customers are interested in it; joy is matched against the preset emotions, and if the match succeeds, product A is determined to be displayable. As another example, if the final emotion of product B is disgust, anger, or disappointment, most customers dislike it; the match between product B's final emotion and the preset emotions fails, and B cannot serve as a display product.
(2) Determine the number of displayable products. If that number exceeds a preset value, query an emotion ranking table with the final emotion of each displayable product, sort the products to be displayed by their final emotions, and take the top preset-value products as the target display products. The emotion ranking table is a preset table in which more positive emotions rank higher, for example in the order joy, happiness, surprise, calm, disgust, anger, and disappointment. For example, if there are 4 displayable products and the preset value is 3, the final emotions of the displayable products are obtained: product A's final emotion is joy, B's is happiness, C's is surprise, and D's is calm. Sorting by the emotion ranking table gives A, B, C, D, and the top preset-value products to be displayed, i.e., the first three (A, B, and C), are taken as the target display products.
Further, if sorting the products to be displayed by their final emotions according to the emotion ranking table produces a tie, first judge whether the target display products can be obtained within the preset value. If they cannot, determine, for each tied product, the number of matches in its target video data for the tied final emotion (i.e., the maximum count of that target emotion), determine the priority among the tied final emotions from those numbers, and obtain the preset-value target display products. For example, if there are 4 displayable products and the preset value is 3, and the final emotion of product A is joy, B's is happiness, C's is calm, and D's is calm, then sorting by the emotion ranking table gives A, B, C, D with C and D tied, and the target display products cannot be obtained within the preset value. The number of matches for C's final emotion in C's target video data is obtained, say 50; the number of matches for D's final emotion in D's target video data is then determined, say 60. The product D with count 60 then takes priority over the product C with count 50, and the preset-value products to be displayed, namely A, B, and D, are taken as the target display products. This step resolves ties among final emotions and so speeds up the determination of the target display products.
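The ranking and tie-breaking in this step can be sketched as (Python; the emotion ranking table follows the order given above, while the tuple layout and function name are assumptions):

```python
# Assumed emotion ranking table: lower rank = more positive, displayed first.
EMOTION_RANK = {"joy": 0, "happiness": 1, "surprise": 2, "calm": 3,
                "disgust": 4, "anger": 5, "disappointment": 6}

def pick_display_products(products, preset=3):
    """products: list of (name, final_emotion, matching_count) tuples, where
    matching_count is how many customers' target emotions matched the final
    emotion. Sort by the ranking table; break ties by the larger count."""
    ranked = sorted(products, key=lambda p: (EMOTION_RANK[p[1]], -p[2]))
    return [name for name, _, _ in ranked[:preset]]

candidates = [("A", "joy", 80), ("B", "happiness", 70),
              ("C", "calm", 50), ("D", "calm", 60)]
print(pick_display_products(candidates))  # ['A', 'B', 'D'] -- D beats C on the tie
```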
In steps S10-S60, the target video data of a product to be displayed is obtained, and a face detection model performs face recognition and clustering on the images to be recognized in the target video data, obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, so that the target display products can subsequently be determined from the target emotions of a certain number of customers, improving the accuracy of the target display products. If the number of customers exceeds a preset number, a pre-trained micro-expression recognition model recognizes the images to be recognized in each image cluster set and obtains the single-frame emotion of each image, thereby recognizing the customers' emotions. Based on the single-frame emotion of at least one frame of image to be recognized, the target emotion of the customer corresponding to the image cluster set is obtained, to determine whether the product to be displayed interests that customer. From the number of customers and the target emotion of the customer corresponding to each image cluster set, the final emotion of a certain number of customers toward the product to be displayed is obtained, and the target display products are obtained from the final emotions corresponding to the products to be displayed, improving the accuracy of the target display products so that the target products are products most customers are interested in.
In one embodiment, as shown in FIG. 3, step S10 of obtaining the target video data of the product to be displayed, the target video data containing at least two frames of images to be recognized, specifically includes the following steps:
S11: Obtain the initial video data of the product to be displayed, the initial video data including at least two frames of initial video images.
The initial video data is the video data corresponding to each product to be displayed, collected by a video capture tool.
Specifically, a video capture tool collects the initial video data corresponding to each product to be displayed. The initial video data includes at least two frames of initial video images and carries the product identifier of the corresponding product to be displayed, from which the initial video data corresponding to each product can be determined. Collecting the initial video data of each product to be displayed allows it to be analyzed subsequently to obtain the final emotion corresponding to that product.
S12: Obtain the attribute information of the product to be displayed, the attribute information including a suitable age and a suitable gender.
The suitable age is the age range the product to be displayed is intended for, and the suitable gender is the gender the product to be displayed is intended for.
Specifically, the database stores the attribute information corresponding to each product to be displayed. The server searches the database for each product to be displayed and obtains its attribute information, which includes the suitable age and suitable gender. For example, for a clothing product, the attribute information may specify a suitable age of 20-24 and a suitable gender of female; as another example, for a cosmetics product, a suitable age of 25-30 and a suitable gender of male. The products to be displayed are not specifically limited in this embodiment.
S13: Screen the initial video images by the suitable age and suitable gender to obtain the images to be recognized, and form the target video data from at least two frames of images to be recognized.
Specifically, pre-trained classifiers recognize the at least two frames of initial video images in the initial video data and obtain the target age and target gender corresponding to each initial video image. The server matches the target age against the suitable age and the target gender against the suitable gender, determines the initial video images that match both the suitable age and the suitable gender as images to be recognized, deletes the initial video images that fail the match, and forms the target video data from at least two frames of images to be recognized. The target age is the age obtained by recognizing an initial video image with the pre-trained classifiers; the target gender is the gender so obtained.
In steps S11-S13, the initial video images are screened by the suitable age and suitable gender of the product to be displayed to obtain the images to be recognized, and the target video data is formed from at least two frames of images to be recognized, so that the target video data better matches the product to be displayed and the accuracy of the target display products is improved.
In one embodiment, before step S13, i.e., before the step of screening the initial video images by the suitable age and suitable gender, the image-recognition-based product display method further includes:
(1) Process the at least two frames of initial video images with super-resolution technology, obtain the high-resolution images corresponding to them, and take the high-resolution images as images to be determined.
Super-resolution (SR) technology reconstructs a corresponding high-resolution image from an acquired low-resolution image. Normally, the initial video images in the initial video data are low-resolution images. An image to be determined is an initial video image converted into a high-resolution image.
Specifically, the server obtains the initial video data, which includes at least two frames of initial video images in low-resolution (LR) space. The ESPCN algorithm extracts feature maps in the low-resolution space and, through an efficient sub-pixel convolution layer, enlarges the initial video images from low resolution to high resolution, upgrading the final low-resolution feature maps to high-resolution feature maps, from which the high-resolution image corresponding to each initial video image is obtained and taken as an image to be determined. The core concept of ESPCN is the sub-pixel convolution layer: the input is a low-resolution image (the initial video image), and after the convolution layers the resulting feature maps have the same spatial size as the input image but r^2 feature channels (r being the target magnification of the image). The r^2 channels of each pixel are rearranged into an r x r region, corresponding to an r x r sub-block of the high-resolution image, so that a feature map of size r^2 x H x W is rearranged into a high-resolution feature map of size 1 x rH x rW, from which the high-resolution image corresponding to each frame of initial video image is obtained and taken as an image to be determined.
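The r^2-channel rearrangement described above can be sketched with NumPy (a minimal illustration of the sub-pixel rearrangement only, not the full ESPCN network; function name assumed):

```python
import numpy as np

def pixel_shuffle(features, r):
    """Rearrange an (r^2, H, W) feature map into a (1, r*H, r*W) map,
    mapping each pixel's r^2 channels onto an r x r sub-block of the
    high-resolution output -- the sub-pixel convolution rearrangement."""
    c, h, w = features.shape
    assert c == r * r
    x = features.reshape(r, r, h, w)  # split channels into (dy, dx)
    x = x.transpose(2, 0, 3, 1)       # -> (H, r, W, r)
    return x.reshape(1, h * r, w * r)

lr = np.arange(4 * 2 * 2).reshape(4, 2, 2)  # r = 2, H = W = 2
hr = pixel_shuffle(lr, 2)
print(hr.shape)  # (1, 4, 4)
```

Output pixel (y*r+dy, x*r+dx) comes from channel dy*r+dx at input position (y, x), which is the r x r sub-block mapping described in the paragraph above.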
As shown in FIG. 4, step S13 of screening the initial video images by the suitable age and suitable gender to obtain the images to be recognized specifically includes the following steps:
S131: Recognize the at least two frames of images to be determined with pre-trained classifiers, and obtain the target age and target gender corresponding to each image to be determined.
The pre-trained classifiers include a gender classifier and an age classifier, which recognize an image to be determined separately to obtain the target age and target gender corresponding to it. The target gender is the gender obtained by recognizing the image to be determined with the gender classifier; the target age is the age obtained by recognizing it with the age classifier.
When training the gender classifier and the age classifier, a large quantity of training image data containing face images of different ages and genders is first obtained, and each face image in the training image data is annotated with age and gender. The annotated training image data is input into a deep neural network, which includes at least two convolutional layers, and trained on it: the predicted age is compared with the annotated age to adjust the weights and biases of each layer of the deep neural network until the model converges, yielding the age classifier; the predicted gender is compared with the annotated gender to adjust the weights and biases of each layer until the model converges, yielding the gender classifier.
Specifically, the pre-trained gender classifier recognizes an image to be determined, which is an image containing the customer's face: facial key-point detection and feature extraction are performed on it to obtain facial features. The extracted facial features are then input into the pre-trained gender classifier, which recognizes them to obtain the target gender corresponding to the image to be determined, and into the pre-trained age classifier, which classifies them to obtain the target age corresponding to the image. Estimating the customer's gender and age from the image to be determined with pre-trained classifiers improves the accuracy of the obtained target gender and target age.
S132: Match the target age against the suitable age, and match the target gender against the suitable gender.
Specifically, the suitable age may be an age range, for example 20-24. Matching the target age against the suitable age chiefly means judging whether the target age falls within the suitable age range. The suitable gender is either female or male, and the recognized target gender is matched against it.
S133: If the target age matches the suitable age and the target gender matches the suitable gender, take the image to be determined corresponding to that target age and target gender as an image to be recognized.
Specifically, if the server judges that the target age falls within the suitable age range and the target gender matches the suitable gender, it takes the image to be determined corresponding to that target age and target gender as an image to be recognized.
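The matching in steps S132-S133 can be sketched as (Python; the function name and the (low, high) age-range representation are assumptions for illustration):

```python
def matches_product(target_age, target_gender, suitable_age, suitable_gender):
    """suitable_age is an inclusive (low, high) range; an image to be
    determined is kept as an image to be recognized only when both the
    predicted age and predicted gender match the product's attributes."""
    low, high = suitable_age
    return low <= target_age <= high and target_gender == suitable_gender

# Clothing product assumed suitable for women aged 20-24:
print(matches_product(22, "female", (20, 24), "female"))  # True: image kept
print(matches_product(30, "female", (20, 24), "female"))  # False: image discarded
```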
In steps S131-S133, pre-trained classifiers recognize the at least two frames of images to be determined and obtain the target age and target gender corresponding to each, so that the target age and target gender are determined by the classifiers, speeding up the acquisition of the target display products. If the target age matches the suitable age and the target gender matches the suitable gender, the image to be determined corresponding to that target age and target gender is taken as an image to be recognized, so that the acquired images to be recognized correspond to the attribute information of the product to be displayed and the product's appeal to the matching population is increased; analyzing the images to be recognized makes the obtained target display products more accurate, improving the accuracy of their acquisition.
In one embodiment, as shown in FIG. 5, step S20 of performing face recognition and clustering on the at least two frames of images to be recognized with a face detection model, and obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, specifically includes the following steps:
S21: Perform face recognition on the at least two frames of images to be recognized with a face detection model, and obtain the face image corresponding to each image to be recognized.
Specifically, the server obtains the target video data and uses the face detection model to perform face recognition on each frame of image to be recognized in the target video data, obtaining the face image corresponding to each such image. Face recognition here means searching any given frame of image with a certain strategy to determine whether the image contains a face. Further, the face detection model is a pre-trained model for detecting whether each frame of image to be recognized contains a human facial region. Specifically, the server inputs each frame of image to be recognized into the face detection model and detects whether it contains a face; if it does, the face image corresponding to that image in the target video data is obtained.
S22: Cluster the face images corresponding to the images to be recognized, and obtain at least two image cluster sets, each image cluster set including at least one frame of image to be recognized.
Specifically, the server clusters the obtained face images corresponding to the images to be recognized, grouping the face images of the same customer, and obtains at least two image cluster sets, each including at least one frame of image to be recognized. More specifically, a feature extraction algorithm extracts the facial features of the face image corresponding to each image to be recognized, and feature similarity is computed between the facial features of the face images; if the similarity exceeds a preset threshold, the face images belong to the same customer, and the images to be recognized corresponding to that customer's face images are clustered together to obtain the image cluster set corresponding to each customer. That is, one customer corresponds to one image cluster set, and each image cluster set includes at least one frame of image to be recognized.
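A minimal sketch of the similarity-threshold clustering described above (Python; the greedy strategy, the threshold value, and the use of cosine similarity are illustrative assumptions, as this application does not fix a specific clustering algorithm):

```python
def cosine_similarity(a, b):
    """Cosine similarity between two facial feature vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def cluster_faces(face_features, threshold=0.8):
    """Greedy sketch: each face joins the first cluster whose representative
    feature is similar enough (above the preset threshold), otherwise it
    starts a new cluster. Returns clusters of frame indices; one cluster
    corresponds to one customer."""
    clusters = []  # list of (representative_feature, [frame indices])
    for idx, feat in enumerate(face_features):
        for rep, members in clusters:
            if cosine_similarity(rep, feat) > threshold:
                members.append(idx)
                break
        else:
            clusters.append((feat, [idx]))
    return [members for _, members in clusters]

feats = [[1.0, 0.0], [0.98, 0.1], [0.0, 1.0]]  # frames 0 and 1: same customer
print(cluster_faces(feats))       # [[0, 1], [2]]
print(len(cluster_faces(feats)))  # 2 -- the customer count of step S23
```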
S23: Obtain the number of customers corresponding to the target video data from the number of image cluster sets.
Specifically, the number of image cluster sets corresponding to each product to be displayed is counted, and the number of image cluster sets is taken as the number of customers corresponding to the target video data.
步骤S21-S23,采用人脸检测模型对至少两帧待识别图像进行人脸识别,获取每一待识别图像对应的人脸图像,以实现确定待识别图像是否为人脸图像,避免不包含人脸的待识别图像进行聚类,提高后续图像聚类集合的获取速度。对待识别图像对应的人脸图像进行聚类,获取至少两个图像聚类集合,根据图像聚类集合的数量,获取目标视频数据对应的客户数量,以实现客户数量的确定,保证客户数量获取的准确性。Steps S21-S23: Use the face detection model to perform face recognition on at least two frames of images to be recognized, and obtain a face image corresponding to each image to be recognized, so as to determine whether the image to be recognized is a face image and avoid not containing a face The images to be recognized are clustered to improve the acquisition speed of subsequent image clustering sets. Cluster the face images corresponding to the image to be recognized, obtain at least two image cluster sets, and obtain the number of customers corresponding to the target video data according to the number of image cluster sets, so as to determine the number of customers and ensure that the number of customers is obtained accuracy.
In one embodiment, each image to be recognized corresponds to a time stamp, the time stamp being the time at which that image to be recognized was captured.
As shown in FIG. 6, step S22, namely clustering the face images corresponding to the images to be recognized to obtain at least two image cluster sets, specifically includes the following steps:
S221: According to the time stamps, use the face image recognized first among the at least two frames of images to be recognized as the reference image.
Here, the reference image is the face image recognized first from the images to be recognized.
Specifically, the server obtains the time stamps corresponding to the at least two frames of images to be recognized and, according to the time stamps, first determines the face image recognized first among them, using that face image as the reference image. Determining a reference image speeds up the acquisition of the image cluster sets.
S222: According to the time stamps, successively use a similarity algorithm to compute the feature similarity between the reference image and the remaining images to be recognized.
Specifically, according to the time stamps, a similarity algorithm is used to compute the feature similarity between the reference image and each remaining image to be recognized, i.e., each of the at least two frames of images to be recognized other than the reference image. The similarity algorithm may be the Euclidean distance algorithm, the Manhattan distance algorithm, the Minkowski distance algorithm, or the cosine similarity algorithm. In this embodiment, the cosine similarity algorithm is used to compute the feature similarity between the reference image and the remaining images to be recognized, which speeds up the acquisition of the image cluster sets and improves the efficiency of obtaining the target display commodity.
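The cosine similarity named above compares two face feature vectors by the angle between them. A minimal sketch (the feature extraction that produces the vectors is assumed, not shown):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Feature similarity as used in S222: the cosine of the angle between
    two feature vectors; 1.0 means identical direction, 0.0 orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Any of the other listed metrics (Euclidean, Manhattan, Minkowski) could be substituted; only the thresholding logic downstream depends on the choice.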
S223: If the feature similarity is greater than the preset threshold, assign the image to be recognized whose feature similarity is greater than the preset threshold and the reference image to the same image cluster set.
Specifically, the preset threshold is a preconfigured value. If the server determines that the feature similarity between the reference image and a remaining image to be recognized is greater than the preset threshold, the reference image and that remaining image are considered to match successfully, i.e., to be images of the same customer, and the image to be recognized whose feature similarity is greater than the preset threshold is assigned to the same image cluster set as the reference image. For example, if the feature similarity between reference image 1 and remaining image 2 is 80%, the feature similarity between reference image 1 and remaining image 3 is 99%, and the preset threshold is 90%, then the feature similarity corresponding to remaining image 3 is greater than the preset threshold, so reference image 1 and remaining image 3 are assigned to the same image cluster set.
S224: If the feature similarity is not greater than the preset threshold, update, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold as the new reference image, and repeat the step of successively computing, according to the time stamps, the feature similarity between the reference image and the remaining images to be recognized, until the at least two frames of images to be recognized have all been clustered, forming at least two image cluster sets.
Specifically, if the server determines that the feature similarity between the reference image and a remaining image to be recognized is not greater than the preset threshold, the match is considered to have failed, and the customer corresponding to the reference image and the customer corresponding to that remaining image are not the same customer; according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold is then updated as the new reference image. For example, if the feature similarity between reference image 1 and remaining image 2 is 80% and the preset threshold is 90%, the feature similarity is not greater than the preset threshold, so remaining image 2 is updated as the new reference image. The step of successively computing, according to the time stamps, the feature similarity between the reference image and the remaining images to be recognized is repeated until all of the images to be recognized have been clustered, forming at least two image cluster sets.
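The whole S221-S224 loop can be sketched as follows. This is an illustrative reading of the procedure, assuming the features are already ordered by time stamp and that `similarity` is any metric (e.g. the cosine similarity from S222):

```python
from typing import Callable, List
import numpy as np

def cluster_by_reference(
    features: List[np.ndarray],          # face features, sorted by time stamp
    similarity: Callable[[np.ndarray, np.ndarray], float],
    threshold: float,
) -> List[List[int]]:
    """Sketch of S221-S224: the earliest unassigned face becomes the reference
    image; every later face whose similarity to it exceeds the threshold joins
    its cluster (S223); the faces that fail stay behind, and the first of them
    becomes the next reference image (S224)."""
    clusters: List[List[int]] = []
    remaining = list(range(len(features)))
    while remaining:
        ref = remaining.pop(0)               # S221: earliest face is the reference
        cluster = [ref]
        still_remaining = []
        for idx in remaining:                # S222: compare reference with the rest
            if similarity(features[ref], features[idx]) > threshold:
                cluster.append(idx)          # S223: same customer
            else:
                still_remaining.append(idx)  # S224: left for the next reference
        clusters.append(cluster)
        remaining = still_remaining
    return clusters
```

The returned list of index lists corresponds to the image cluster sets, one per customer; its length gives the customer count of step S23.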
In steps S221-S224, according to the time stamps, the face image recognized first among the at least two frames of images to be recognized is used as the reference image, and a similarity algorithm is successively used to compute the feature similarity between the reference image and the remaining images to be recognized, so as to determine whether the reference image and each remaining image belong to the same customer. If the feature similarity is greater than the preset threshold, the image to be recognized whose feature similarity is greater than the preset threshold and the reference image are assigned to the same image cluster set, so that the images of the same customer are clustered together. If the feature similarity is not greater than the preset threshold, the first of the remaining images whose feature similarity is not greater than the preset threshold is updated, according to the time stamps, as the new reference image, and the similarity computation step is repeated until all images to be recognized have been clustered, forming at least two image cluster sets, so that the images of each customer are clustered together for subsequently determining each customer's target emotion toward the commodity to be displayed.
In one implementation, as shown in FIG. 7, step S30, namely using the pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized, specifically includes the following steps:
S31: Use a face key point algorithm to perform face recognition on the images to be recognized in each image cluster set, and obtain the face key points corresponding to each image to be recognized.
The face key point algorithm may be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (scale-invariant feature transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm. In this embodiment, the ERT algorithm is used to perform face recognition on the images to be recognized in each image cluster set, so as to obtain the face key points corresponding to each image to be recognized. The ERT algorithm is a regression-based method and can be expressed by the formula:

S^(t+1) = S^(t) + r_t(I, S^(t))

where S^(t) is the predicted shape, i.e., the coordinates of the feature points of the image to be recognized at the t-th cascade level; S^(t+1) is the shape of the feature points obtained at the (t+1)-th iteration; t is the cascade index; I is the image to be recognized input to the regressor; and r_t is the regressor at level t. Each regressor is composed of many regression trees obtained through training, and the face key points corresponding to each image to be recognized are obtained through the regression trees.
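The cascaded update in the ERT formula can be sketched as a simple additive loop. The toy regressors below are hypothetical stand-ins for trained ensembles of regression trees; only the update rule S^(t+1) = S^(t) + r_t(I, S^(t)) is taken from the text:

```python
from typing import Callable, List
import numpy as np

def run_cascade(
    image: np.ndarray,
    initial_shape: np.ndarray,           # (L, 2) initial key point coordinates
    regressors: List[Callable[[np.ndarray, np.ndarray], np.ndarray]],
) -> np.ndarray:
    """Cascaded regression sketch: each level t applies one additive
    refinement r_t(I, S) to the current shape estimate."""
    shape = initial_shape
    for r_t in regressors:               # one update per cascade level
        shape = shape + r_t(image, shape)
    return shape
```

In a real ERT implementation each `r_t` would be learned by gradient boosting over shape-indexed pixel features; here the structure of the cascade is all that is shown.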
S32: Use a feature extraction algorithm to perform feature extraction on the face key points corresponding to each image to be recognized, and obtain the local features corresponding to the face key points.
The feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm, which extracts the local features at the face key point locations of the image to be recognized; specifically, the local features are extracted according to the locations corresponding to the facial action units. A CNN is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage, allowing fast and efficient image processing. In this embodiment, a pre-trained convolutional neural network is used to quickly extract the local features corresponding to the face key points.
Specifically, the face key points corresponding to each image to be recognized are subjected to a convolution operation with several convolution kernels, and the convolution results are the local features corresponding to the face key points. The convolution operation is performed according to the formula

y_(m,n) = f( Σ_i Σ_j w_(i,j) · x_(m+i, n+j) + b_(m,n) )

to obtain the local features, where y is the output local feature; x is a two-dimensional input of size (M, N) formed from the coordinates of the L face key points; w_(i,j) is a convolution kernel of size I×J; b is the bias, of size M×N; and the activation function is denoted by f. Each convolution kernel is convolved with the face key points of the input image to be recognized from the previous layer, and each convolution kernel yields a corresponding local feature. Because the weights within a convolution kernel are shared, the number of parameters is greatly reduced, which greatly improves the training speed of the network.
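The convolution formula above can be sketched directly in Python. This is a minimal single-channel version; the choice of ReLU as the activation f is an assumption, since the patent does not fix it:

```python
import numpy as np

def conv2d_feature(x: np.ndarray, w: np.ndarray, b: float = 0.0,
                   f=lambda z: np.maximum(z, 0.0)) -> np.ndarray:
    """Sketch of y = f(sum_ij w_ij * x_(m+i,n+j) + b): slide an I x J kernel
    over the 2-D input (cross-correlation, as CNN layers compute it) and
    apply the activation f at every output position."""
    M, N = x.shape
    I, J = w.shape
    out = np.empty((M - I + 1, N - J + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(w * x[m:m + I, n:n + J]) + b
    return f(out)
```

A production CNN would of course use an optimized library layer rather than explicit loops; the sketch only makes the per-position weighted sum of the formula concrete.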
Further, after the face key points are input into the preset convolutional neural network for recognition, the local features corresponding to the facial action units are obtained, for example AU1, AU2, AU5, and AU26, i.e., the local features corresponding to the inner brow raiser, outer brow raiser, upper lid raiser, and jaw drop. In this embodiment, the convolutional neural network is used to extract the local features of the face key points in the image to be recognized, so that the target facial action units can subsequently be determined from the local features and the customer's emotion determined from the recognized target facial action units. In this proposal, compared with the LBP-TOP operator, recognition with a convolutional neural network is faster and more accurate.
S33: Use pre-trained classifiers to recognize the local features, and obtain the target facial action unit corresponding to each local feature.
Specifically, the local features are recognized by the SVM classifiers in the pre-trained micro-expression recognition model, the number of SVM classifiers being equal to the number of recognizable facial action units: if 54 facial action units can be recognized, then 54 SVM classifiers are pre-trained. By inputting the local features into the corresponding SVM classifiers, probability values are obtained and compared with a preset probability threshold; the facial action units whose probability values are greater than the preset probability threshold are taken as the target facial action units corresponding to the local features, thereby obtaining all target facial action units corresponding to the local features.
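The per-AU thresholding described above can be sketched as follows. The classifier callables are hypothetical stand-ins for the 54 trained SVMs with probability output; only the "one classifier per action unit, keep AUs above the probability threshold" logic is taken from the text:

```python
from typing import Callable, Dict, Sequence, Set

def detect_action_units(
    local_features: Dict[str, Sequence[float]],
    classifiers: Dict[str, Callable[[Sequence[float]], float]],
    prob_threshold: float = 0.5,
) -> Set[str]:
    """S33 sketch: each pre-trained classifier scores its own facial action
    unit on the matching local feature; AUs whose probability exceeds the
    preset threshold become the target facial action units."""
    targets = set()
    for au, clf in classifiers.items():
        if au in local_features and clf(local_features[au]) > prob_threshold:
            targets.add(au)
    return targets
```

With scikit-learn, for instance, each callable could wrap `SVC(probability=True).predict_proba`; the threshold value itself is a tuning choice not fixed by the patent.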
S34: Based on the target facial action units corresponding to the local features, look up the evaluation table to obtain the single-frame emotion of each image to be recognized.
The evaluation table is a preconfigured table storing the correspondence between combinations of facial action units and emotions; for example, the combination of AU12, AU6, and AU7 corresponds to the emotion of joy, and the combination of AU9, AU10, AU17, and AU24 corresponds to the emotion of disgust. Using the target facial action units corresponding to the local features, the server looks up the evaluation table, obtains the combination matching the target facial action units, and takes the emotion corresponding to that combination as the single-frame emotion of the image to be recognized.
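The evaluation-table lookup can be sketched as a dictionary keyed by AU combinations. Only the two combinations named in the text are listed; treating a combination as matched when it is a subset of the detected AUs, and falling back to "neutral", are both illustrative assumptions, not specified by the patent:

```python
from typing import Dict, FrozenSet, Set

# Hypothetical evaluation table: AU combination -> emotion.
EVALUATION_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"AU6", "AU7", "AU12"}): "joy",
    frozenset({"AU9", "AU10", "AU17", "AU24"}): "disgust",
}

def single_frame_emotion(target_aus: Set[str]) -> str:
    """S34 sketch: return the emotion whose AU combination is contained in
    the detected target facial action units; 'neutral' if none matches."""
    for combo, emotion in EVALUATION_TABLE.items():
        if combo <= target_aus:          # subset test: combination matched
            return emotion
    return "neutral"
```

A deployed table would cover many more combinations and might need a tie-breaking rule when several combinations match the same frame.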
In steps S31-S34, the face key point algorithm is used to perform face recognition on the images to be recognized in each image cluster set and obtain the face key points corresponding to each image, providing a technical basis for the subsequent extraction of local features and improving the accuracy of local feature extraction; the feature extraction algorithm is used to quickly extract the local features corresponding to the face key points, so that the subsequently extracted target facial action units are more accurate; and the pre-trained classifiers are used to recognize the local features and quickly obtain the target facial action unit corresponding to each local feature, thereby determining the target facial action units. Based on the target facial action units corresponding to the local features, the evaluation table is looked up to obtain the single-frame emotion of each image to be recognized, so that the emotion of each customer can be determined from the single-frame emotions and the target display commodity determined from each customer's emotion, improving the accuracy of the target display commodity.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In one embodiment, a product display device based on image recognition is provided, which corresponds one-to-one with the image-recognition-based product display method of the above embodiments. As shown in FIG. 8, the image-recognition-based product display device includes a data acquisition module 10, an image cluster set acquisition module 20, a single-frame emotion determination module 30, a target emotion acquisition module 40, a final emotion acquisition module 50, and a target display commodity acquisition module 60. The functional modules are described in detail as follows:
The data acquisition module 10 is configured to acquire the target video data of the commodity to be displayed, the target video data including at least two frames of images to be recognized.
The image cluster set acquisition module 20 is configured to use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and to acquire the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, each image cluster set including at least one frame of the images to be recognized.
The single-frame emotion determination module 30 is configured to, if the number of customers is greater than a preset number, use the pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized.
The target emotion acquisition module 40 is configured to acquire the target emotion of the customer corresponding to an image cluster set based on the single-frame emotions of its at least one frame of images to be recognized.
The final emotion acquisition module 50 is configured to acquire the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.
The target display commodity acquisition module 60 is configured to acquire the target display commodity according to the final emotion corresponding to the commodity to be displayed.
In one embodiment, the data acquisition module 10 includes an initial video data acquisition unit 11, an attribute information determination unit 12, and a target video data formation unit 13.
The initial video data acquisition unit 11 is configured to acquire the initial video data of the commodity to be displayed, the initial video data including at least two frames of initial video images.
The attribute information determination unit 12 is configured to acquire the attribute information of the commodity to be displayed, the attribute information including a suitable age and a suitable gender.
The target video data formation unit 13 is configured to filter the initial video images by the suitable age and suitable gender to obtain the images to be recognized, and to form the target video data based on at least two frames of images to be recognized.
In one embodiment, before the target video data formation unit, the image-recognition-based product display device further includes an image resolution conversion unit.
The image resolution conversion unit is configured to process the at least two frames of initial video images using super-resolution technology, acquire high-resolution images corresponding to the at least two frames of initial video images, and use the high-resolution images as the images to be determined.
The target video data formation unit includes a target age and target gender determination subunit, a matching subunit, and an image-to-be-recognized determination subunit.
The target age and target gender determination subunit is configured to use a pre-trained classifier to recognize the at least two frames of images to be determined and acquire the target age and target gender corresponding to each image to be determined.
The matching subunit is configured to match the target age with the suitable age and the target gender with the suitable gender.
The image-to-be-recognized determination subunit is configured to, if the target age matches the suitable age successfully and the target gender matches the suitable gender successfully, use the image to be determined corresponding to that target age and target gender as an image to be recognized.
In one embodiment, the image cluster set acquisition module 20 includes a face image acquisition unit, an image cluster set acquisition unit, and a customer number determination unit.
The face image acquisition unit is configured to use the face detection model to perform face recognition on the at least two frames of images to be recognized and acquire the face image corresponding to each image to be recognized.
The image cluster set acquisition unit is configured to cluster the face images corresponding to the images to be recognized and acquire at least two image cluster sets, each image cluster set including at least one frame of the images to be recognized.
The customer number determination unit is configured to acquire the number of customers corresponding to the target video data according to the number of image cluster sets.
In one embodiment, each image to be recognized corresponds to a time stamp.
The image cluster set acquisition unit includes a reference image determination subunit, a feature similarity calculation subunit, a first image cluster set determination subunit, and a second image cluster set determination subunit.
The reference image determination subunit is configured to use, according to the time stamps, the face image recognized first among the at least two frames of images to be recognized as the reference image.
The feature similarity calculation subunit is configured to successively compute, according to the time stamps, the feature similarity between the reference image and the remaining images to be recognized using a similarity algorithm.
The first image cluster set determination subunit is configured to, if the feature similarity is greater than the preset threshold, assign the image to be recognized whose feature similarity is greater than the preset threshold and the reference image to the same image cluster set.
The second image cluster set determination subunit is configured to, if the feature similarity is not greater than the preset threshold, update, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold as the new reference image, and repeat the step of successively computing the feature similarity between the reference image and the remaining images to be recognized until the at least two frames of images to be recognized have all been clustered, forming at least two image cluster sets.
In one embodiment, the single-frame emotion determination module 30 includes a face key point acquisition unit, a local feature extraction unit, a target facial action unit acquisition unit, and a single-frame emotion acquisition unit.
The face key point acquisition unit is configured to use the face key point algorithm to perform face recognition on the images to be recognized in each image cluster set and acquire the face key points corresponding to each image to be recognized.
The local feature extraction unit is configured to use the feature extraction algorithm to perform feature extraction on the face key points corresponding to each image to be recognized and acquire the local features corresponding to the face key points.
The target facial action unit acquisition unit is configured to use the pre-trained classifiers to recognize the local features and acquire the target facial action unit corresponding to each local feature.
The single-frame emotion acquisition unit is configured to look up the evaluation table based on the target facial action unit corresponding to each local feature and acquire the single-frame emotion of each image to be recognized.
For the specific limitations of the image-recognition-based product display device, reference may be made to the limitations of the image-recognition-based product display method above, which are not repeated here. The modules of the above device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, the processor of a computer device, or stored in software form in the memory of a computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores the face detection model, the attribute information of the commodities to be displayed, and the like. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a product display method based on image recognition.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the image-recognition-based product display method of the above embodiments are implemented, for example, steps S10 to S60 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7. When the processor executes the computer-readable instructions, the functions of the modules/units of the image-recognition-based product display device of the above embodiments are implemented, for example, the functions of modules 10 to 60 shown in FIG. 8. To avoid repetition, they are not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform steps that implement the product display method based on image recognition in the foregoing method embodiments, for example, steps S10 to S60 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 7. The computer-readable instructions, when executed by a processor, also implement the functions of the modules/units in the product display device based on image recognition in the foregoing embodiments, for example, the functions of modules 10 to 60 shown in FIG. 8. To avoid repetition, details are not described here again. The readable storage media in this embodiment include non-volatile readable storage media and volatile readable storage media.
A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a non-volatile readable storage medium or in a volatile readable storage medium, and when executed, may include the processes of the foregoing method embodiments. Any reference to the memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the foregoing functional units and modules is used only as an example. In practical applications, the foregoing functions may be allocated to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The foregoing embodiments are used only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features thereof may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A product display method based on image recognition, comprising:
    acquiring target video data of a product to be displayed, the target video data comprising at least two frames of images to be recognized;
    performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, each image cluster set comprising at least one frame of image to be recognized;
    if the number of customers is greater than a preset number, recognizing the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized;
    obtaining a target emotion of the customer corresponding to the image cluster set based on the single-frame emotion of at least one frame of image to be recognized;
    obtaining a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set; and
    obtaining a target display product according to the final emotion corresponding to the product to be displayed.
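The emotion-aggregation steps of claim 1 can be sketched as follows. This is an illustrative Python sketch only: the claim does not fix how single-frame emotions are combined into a target emotion, or target emotions into a final emotion, so the majority-vote rule, the function names, and the `preset_count` threshold below are all assumptions.

```python
from collections import Counter

def target_emotion(frame_emotions):
    # Per-customer target emotion: here, the most frequent single-frame
    # emotion across that customer's image cluster set (assumed rule).
    return Counter(frame_emotions).most_common(1)[0][0]

def final_emotion(clusters_emotions, preset_count=1):
    # Final emotion across customers, taken here as the majority of the
    # per-customer target emotions; per claim 1 it is only computed when
    # the number of customers exceeds the preset number.
    if len(clusters_emotions) <= preset_count:
        return None
    targets = [target_emotion(frames) for frames in clusters_emotions]
    return Counter(targets).most_common(1)[0][0]

clusters = [
    ["happy", "happy", "neutral"],   # customer 1's single-frame emotions
    ["happy", "surprised"],          # customer 2
    ["neutral", "neutral"],          # customer 3
]
print(final_emotion(clusters))  # majority of per-customer targets -> "happy"
```

The target display product would then be selected by comparing this final emotion across candidate products, a step the claims leave to the implementation.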
  2. The product display method based on image recognition according to claim 1, wherein the acquiring target video data of the product to be displayed, the target video data comprising at least two frames of images to be recognized, comprises:
    acquiring initial video data of the product to be displayed, the initial video data comprising at least two frames of initial video images;
    acquiring attribute information of the product to be displayed, the attribute information comprising a suitable age and a suitable gender; and
    screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of images to be recognized.
  3. The product display method based on image recognition according to claim 2, wherein before the step of screening the initial video images by using the suitable age and the suitable gender, the method further comprises:
    processing the at least two frames of initial video images by using a super-resolution technique, obtaining high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;
    wherein the screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized comprises:
    recognizing the at least two frames of images to be determined by using a pre-trained classifier, to obtain a target age and a target gender corresponding to each image to be determined;
    matching the target age with the suitable age, and matching the target gender with the suitable gender; and
    if the target age successfully matches the suitable age and the target gender successfully matches the suitable gender, using the image to be determined corresponding to the target age and the target gender as an image to be recognized.
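The screening step of claim 3 reduces to a filter over per-frame classifier outputs. The sketch below is illustrative only: the `age`/`gender` dictionary keys stand in for whatever the pre-trained classifier actually returns, and treating the "suitable age" as an inclusive range is an assumption, since the claim only speaks of a successful match.

```python
def filter_frames(frames, suitable_age_range, suitable_gender):
    # Keep only frames whose predicted age falls within the product's
    # suitable age range and whose predicted gender matches, as in claim 3.
    lo, hi = suitable_age_range
    return [f for f in frames
            if lo <= f["age"] <= hi and f["gender"] == suitable_gender]

frames = [
    {"age": 25, "gender": "female"},
    {"age": 60, "gender": "female"},
    {"age": 30, "gender": "male"},
]
print(len(filter_frames(frames, (18, 40), "female")))  # only the first frame passes -> 1
```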
  4. The product display method based on image recognition according to claim 1, wherein the performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, comprises:
    performing face recognition on the at least two frames of images to be recognized by using the face detection model, to obtain a face image corresponding to each image to be recognized;
    clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, each image cluster set comprising at least one frame of image to be recognized; and
    obtaining the number of customers corresponding to the target video data according to the number of image cluster sets.
  5. The product display method based on image recognition according to claim 4, wherein each image to be recognized corresponds to a time stamp;
    the clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, comprises:
    using, according to the time stamps, the face image first recognized in the at least two frames of images to be recognized as a reference image;
    calculating, according to the time stamps, a feature similarity between the reference image and each remaining image to be recognized in turn by using a similarity algorithm;
    if the feature similarity is greater than a preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold and the reference image to the same image cluster set; and
    if the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold as a new reference image, and repeating the step of calculating the feature similarity between the reference image and each remaining image to be recognized in turn by using the similarity algorithm, until the clustering of the at least two frames of images to be recognized is completed, forming at least two image cluster sets.
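The greedy, time-ordered clustering loop of claim 5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: faces are represented by small feature vectors, cosine similarity stands in for the unspecified similarity algorithm, and the threshold value is arbitrary.

```python
def cosine_sim(a, b):
    # Cosine similarity between two feature vectors (assumed measure;
    # the claim only requires "a similarity algorithm").
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def cluster_by_reference(features, threshold=0.9):
    # As in claim 5: the earliest remaining face (by time-stamp order)
    # becomes the reference; later faces above the threshold join its
    # cluster, and the first below-threshold leftover becomes the new
    # reference on the next pass, until every face is clustered.
    clusters = []
    remaining = list(features)
    while remaining:
        ref, rest = remaining[0], remaining[1:]
        cluster, leftover = [ref], []
        for f in rest:
            (cluster if cosine_sim(ref, f) > threshold else leftover).append(f)
        clusters.append(cluster)
        remaining = leftover
    return clusters

faces = [(1.0, 0.0), (0.98, 0.05), (0.0, 1.0), (0.05, 0.99)]
print(len(cluster_by_reference(faces)))  # two distinct customers -> 2
```

The number of clusters returned then gives the customer count of claim 4.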
  6. The product display method based on image recognition according to claim 1, wherein the recognizing the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized, comprises:
    performing face recognition on the images to be recognized in each image cluster set by using a face key point algorithm, to obtain face key points corresponding to each image to be recognized;
    performing feature extraction on the face key points corresponding to each image to be recognized by using a feature extraction algorithm, to obtain local features corresponding to the face key points;
    recognizing the local features by using a pre-trained classifier, to obtain a target facial action unit corresponding to each local feature; and
    searching an evaluation table based on the target facial action unit corresponding to each local feature, to obtain the single-frame emotion of each image to be recognized.
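The final lookup step of claim 6 maps detected facial action units to an emotion via an evaluation table. The sketch below is illustrative only: the AU labels follow the Facial Action Coding System convention, but the table contents, the subset-matching rule, and the `"neutral"` fallback are all assumptions, as the application does not publish its evaluation table here.

```python
# Hypothetical evaluation table: each emotion is keyed by the set of
# facial action units (AUs) that must all be present.
EVALUATION_TABLE = {
    frozenset({"AU6", "AU12"}): "happy",
    frozenset({"AU1", "AU4", "AU15"}): "sad",
    frozenset({"AU4", "AU5", "AU7", "AU23"}): "angry",
}

def single_frame_emotion(action_units):
    # Return the emotion whose AU pattern is fully contained in the
    # detected units; fall back to "neutral" when nothing matches.
    detected = set(action_units)
    for pattern, emotion in EVALUATION_TABLE.items():
        if pattern <= detected:
            return emotion
    return "neutral"

print(single_frame_emotion(["AU6", "AU12", "AU25"]))  # matches the happy pattern -> "happy"
```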
  7. A product display device based on image recognition, comprising:
    a data acquisition module, configured to acquire target video data of a product to be displayed, the target video data comprising at least two frames of images to be recognized;
    an image cluster set acquisition module, configured to perform face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, each image cluster set comprising at least one frame of image to be recognized;
    a single-frame emotion determination module, configured to, if the number of customers is greater than a preset number, recognize the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized;
    a target emotion acquisition module, configured to obtain a target emotion of the customer corresponding to the image cluster set based on the single-frame emotion of at least one frame of image to be recognized;
    a final emotion acquisition module, configured to obtain a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set; and
    a target display product acquisition module, configured to obtain a target display product according to the final emotion corresponding to the product to be displayed.
  8. The product display device based on image recognition according to claim 7, wherein the data acquisition module comprises:
    an initial video data acquisition unit, configured to acquire initial video data of the product to be displayed, the initial video data comprising at least two frames of initial video images;
    an attribute information determination unit, configured to acquire attribute information of the product to be displayed, the attribute information comprising a suitable age and a suitable gender; and
    a target video data forming unit, configured to screen the initial video images by using the suitable age and the suitable gender to obtain images to be recognized, and form the target video data based on at least two frames of images to be recognized.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring target video data of a product to be displayed, the target video data comprising at least two frames of images to be recognized;
    performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, each image cluster set comprising at least one frame of image to be recognized;
    if the number of customers is greater than a preset number, recognizing the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized;
    obtaining a target emotion of the customer corresponding to the image cluster set based on the single-frame emotion of at least one frame of image to be recognized;
    obtaining a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set; and
    obtaining a target display product according to the final emotion corresponding to the product to be displayed.
  10. The computer device according to claim 9, wherein the acquiring target video data of the product to be displayed, the target video data comprising at least two frames of images to be recognized, comprises:
    acquiring initial video data of the product to be displayed, the initial video data comprising at least two frames of initial video images;
    acquiring attribute information of the product to be displayed, the attribute information comprising a suitable age and a suitable gender; and
    screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of images to be recognized.
  11. The computer device according to claim 10, wherein before the step of screening the initial video images by using the suitable age and the suitable gender, the processor, when executing the computer-readable instructions, further implements the following steps:
    processing the at least two frames of initial video images by using a super-resolution technique, obtaining high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;
    wherein the screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized comprises:
    recognizing the at least two frames of images to be determined by using a pre-trained classifier, to obtain a target age and a target gender corresponding to each image to be determined;
    matching the target age with the suitable age, and matching the target gender with the suitable gender; and
    if the target age successfully matches the suitable age and the target gender successfully matches the suitable gender, using the image to be determined corresponding to the target age and the target gender as an image to be recognized.
  12. The computer device according to claim 9, wherein the performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, comprises:
    performing face recognition on the at least two frames of images to be recognized by using the face detection model, to obtain a face image corresponding to each image to be recognized;
    clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, each image cluster set comprising at least one frame of image to be recognized; and
    obtaining the number of customers corresponding to the target video data according to the number of image cluster sets.
  13. The computer device according to claim 12, wherein each image to be recognized corresponds to a time stamp;
    the clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, comprises:
    using, according to the time stamps, the face image first recognized in the at least two frames of images to be recognized as a reference image;
    calculating, according to the time stamps, a feature similarity between the reference image and each remaining image to be recognized in turn by using a similarity algorithm;
    if the feature similarity is greater than a preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold and the reference image to the same image cluster set; and
    if the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold as a new reference image, and repeating the step of calculating the feature similarity between the reference image and each remaining image to be recognized in turn by using the similarity algorithm, until the clustering of the at least two frames of images to be recognized is completed, forming at least two image cluster sets.
  14. The computer device according to claim 9, wherein the recognizing the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized, comprises:
    performing face recognition on the images to be recognized in each image cluster set by using a face key point algorithm, to obtain face key points corresponding to each image to be recognized;
    performing feature extraction on the face key points corresponding to each image to be recognized by using a feature extraction algorithm, to obtain local features corresponding to the face key points;
    recognizing the local features by using a pre-trained classifier, to obtain a target facial action unit corresponding to each local feature; and
    searching an evaluation table based on the target facial action unit corresponding to each local feature, to obtain the single-frame emotion of each image to be recognized.
  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring target video data of a product to be displayed, the target video data comprising at least two frames of images to be recognized;
    performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, each image cluster set comprising at least one frame of image to be recognized;
    if the number of customers is greater than a preset number, recognizing the images to be recognized in each image cluster set by using a pre-trained micro-expression recognition model, to obtain a single-frame emotion of each image to be recognized;
    obtaining a target emotion of the customer corresponding to the image cluster set based on the single-frame emotion of at least one frame of image to be recognized;
    obtaining a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set; and
    obtaining a target display product according to the final emotion corresponding to the product to be displayed.
  16. The readable storage medium according to claim 15, wherein the acquiring target video data of the product to be displayed, the target video data comprising at least two frames of images to be recognized, comprises:
    acquiring initial video data of the product to be displayed, the initial video data comprising at least two frames of initial video images;
    acquiring attribute information of the product to be displayed, the attribute information comprising a suitable age and a suitable gender; and
    screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of images to be recognized.
  17. The readable storage medium according to claim 16, wherein before the step of screening the initial video images by using the suitable age and the suitable gender, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    processing the at least two frames of initial video images by using a super-resolution technique, obtaining high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;
    wherein the screening the initial video images by using the suitable age and the suitable gender to obtain images to be recognized comprises:
    recognizing the at least two frames of images to be determined by using a pre-trained classifier, to obtain a target age and a target gender corresponding to each image to be determined;
    matching the target age with the suitable age, and matching the target gender with the suitable gender; and
    if the target age successfully matches the suitable age and the target gender successfully matches the suitable gender, using the image to be determined corresponding to the target age and the target gender as an image to be recognized.
  18. The readable storage medium according to claim 15, wherein the performing face recognition and clustering on the at least two frames of images to be recognized by using a face detection model, to obtain the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, comprises:
    performing face recognition on the at least two frames of images to be recognized by using the face detection model, to obtain a face image corresponding to each image to be recognized;
    clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, each image cluster set comprising at least one frame of image to be recognized; and
    obtaining the number of customers corresponding to the target video data according to the number of image cluster sets.
  19. The readable storage medium according to claim 18, wherein each image to be recognized corresponds to a time stamp;
    the clustering the face images corresponding to the images to be recognized, to obtain at least two image cluster sets, comprises:
    using, according to the time stamps, the face image first recognized in the at least two frames of images to be recognized as a reference image;
    calculating, according to the time stamps, a feature similarity between the reference image and each remaining image to be recognized in turn by using a similarity algorithm;
    if the feature similarity is greater than a preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold and the reference image to the same image cluster set; and
    if the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold as a new reference image, and repeating the step of calculating the feature similarity between the reference image and each remaining image to be recognized in turn by using the similarity algorithm, until the clustering of the at least two frames of images to be recognized is completed, forming at least two image cluster sets.
  20. The readable storage medium according to claim 19, wherein the recognizing, using a pre-trained micro-expression recognition model, of the images to be recognized in each of the image cluster sets to obtain the single-frame emotion of each image to be recognized comprises:
    performing face recognition on the images to be recognized in each of the image cluster sets using a facial key-point algorithm to obtain the facial key points corresponding to each image to be recognized;
    performing feature extraction on the facial key points corresponding to each image to be recognized using a feature extraction algorithm to obtain the local features corresponding to the facial key points;
    recognizing the local features using a pre-trained classifier to obtain the target facial action unit corresponding to each local feature;
    looking up an evaluation table based on the target facial action unit corresponding to each local feature to obtain the single-frame emotion of each image to be recognized.
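The final step of claim 20, looking up an evaluation table keyed by target facial action units, might look like the sketch below. The table entries are illustrative FACS-style AU combinations, not the application's actual evaluation table, and the fall-back to "neutral" is an assumption since the claim does not specify behaviour when no entry matches.

```python
from typing import Dict, FrozenSet, List

# Hypothetical evaluation table mapping sets of facial action units (FACS-style
# AU numbers, as produced by the pre-trained classifier) to an emotion label.
EVALUATION_TABLE: Dict[FrozenSet[int], str] = {
    frozenset({6, 12}): "happy",        # cheek raiser + lip corner puller
    frozenset({1, 4, 15}): "sad",       # inner brow raiser + brow lowerer + lip corner depressor
    frozenset({4, 5, 7, 23}): "angry",  # brow lowerer + upper lid raiser + lid tightener + lip tightener
}

def lookup_emotion(action_units: List[int]) -> str:
    """Return the single-frame emotion for a set of detected action units.

    The claim only says "look up an evaluation table"; here we choose the
    entry whose AU set is fully contained in the detected units with the
    largest overlap, falling back to "neutral" when nothing matches."""
    detected = set(action_units)
    best_label, best_overlap = "neutral", 0
    for au_set, label in EVALUATION_TABLE.items():
        overlap = len(au_set & detected)
        if au_set <= detected and overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label
```

The earlier steps of the claim (key-point detection, local-feature extraction, AU classification) would feed this function; each is a separate pre-trained model whose details the publication does not disclose.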
PCT/CN2019/120988 2019-01-17 2019-11-26 Image identification-based product display method, device, apparatus, and medium WO2020147430A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910042541.3A CN109815873A (en) 2019-01-17 2019-01-17 Merchandise display method, apparatus, equipment and medium based on image recognition
CN201910042541.3 2019-01-17

Publications (1)

Publication Number Publication Date
WO2020147430A1 true WO2020147430A1 (en) 2020-07-23

Family ID: 66604505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120988 WO2020147430A1 (en) 2019-01-17 2019-11-26 Image identification-based product display method, device, apparatus, and medium

Country Status (2)

Country Link
CN (1) CN109815873A (en)
WO (1) WO2020147430A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815873A (en) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 Merchandise display method, apparatus, equipment and medium based on image recognition
CN110650306B (en) * 2019-09-03 2022-04-15 平安科技(深圳)有限公司 Method and device for adding expression in video chat, computer equipment and storage medium
CN110598790A (en) * 2019-09-12 2019-12-20 北京达佳互联信息技术有限公司 Image identification method and device, electronic equipment and storage medium
CN111144241B (en) * 2019-12-13 2023-06-20 深圳奇迹智慧网络有限公司 Target identification method and device based on image verification and computer equipment
CN111062786B (en) * 2019-12-25 2023-05-23 创新奇智(青岛)科技有限公司 Model updating method based on establishment of commodity appearance characteristic mapping table
CN111291623A (en) * 2020-01-15 2020-06-16 浙江连信科技有限公司 Heart physiological characteristic prediction method and device based on face information
CN111310602A (en) * 2020-01-20 2020-06-19 北京正和恒基滨水生态环境治理股份有限公司 System and method for analyzing attention of exhibit based on emotion recognition
CN111563503A (en) * 2020-05-09 2020-08-21 南宁市第三中学 Minority culture symbol identification method
CN114153342A (en) * 2020-08-18 2022-03-08 深圳市万普拉斯科技有限公司 Visual information display method and device, computer equipment and storage medium
CN112363624B (en) * 2020-11-16 2022-09-09 新之航传媒科技集团有限公司 Interactive exhibition hall system based on emotion analysis
CN113269035A (en) * 2021-04-12 2021-08-17 北京爱奇艺科技有限公司 Image processing method, device, equipment and storage medium
CN113177603B (en) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN113762156B (en) * 2021-09-08 2023-10-24 北京优酷科技有限公司 Video data processing method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018143630A1 (en) * 2017-02-01 2018-08-09 삼성전자 주식회사 Device and method for recommending product
CN108509941A (en) * 2018-04-20 2018-09-07 北京京东金融科技控股有限公司 Emotional information generation method and device
CN108858245A (en) * 2018-08-20 2018-11-23 深圳威琳懋生物科技有限公司 A kind of shopping guide robot
CN109048934A (en) * 2018-08-20 2018-12-21 深圳威琳懋生物科技有限公司 A kind of intelligent shopping guide robot system
CN109191190A (en) * 2018-08-20 2019-01-11 深圳威琳懋生物科技有限公司 A kind of control method and computer readable storage medium of shopping guide robot
CN109815873A (en) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 Merchandise display method, apparatus, equipment and medium based on image recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106843028B (en) * 2016-12-04 2019-07-02 上海如晶新材料科技有限公司 A kind of Intelligent clothes cabinet comprising advertisement and its intelligent screening

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215653A (en) * 2020-10-12 2021-01-12 珠海格力电器股份有限公司 Commodity recommendation method, commodity recommendation device, server and storage medium
CN112434711A (en) * 2020-11-27 2021-03-02 杭州海康威视数字技术股份有限公司 Data management method and device and electronic equipment
CN112434711B (en) * 2020-11-27 2023-10-13 杭州海康威视数字技术股份有限公司 Data management method and device and electronic equipment
CN113705329A (en) * 2021-07-07 2021-11-26 浙江大华技术股份有限公司 Re-recognition method, training method of target re-recognition network and related equipment

Also Published As

Publication number Publication date
CN109815873A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
WO2020147430A1 (en) Image identification-based product display method, device, apparatus, and medium
Kao et al. Visual aesthetic quality assessment with a regression model
Othmani et al. Age estimation from faces using deep learning: A comparative analysis
Fabian Benitez-Quiroz et al. Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild
CN109002766B (en) Expression recognition method and device
US9262445B2 (en) Image ranking based on attribute correlation
Abd El Meguid et al. Fully automated recognition of spontaneous facial expressions in videos using random forest classifiers
US10282431B1 (en) Image similarity-based group browsing
WO2021003938A1 (en) Image classification method and apparatus, computer device and storage medium
Liu et al. Age classification using convolutional neural networks with the multi-class focal loss
CN110175298B (en) User matching method
CN107145857A (en) Face character recognition methods, device and method for establishing model
Islam et al. A CNN based approach for garments texture design classification
Sumi et al. Human gender detection from facial images using convolution neural network
Fahira et al. Classical machine learning classification for javanese traditional food image
Vasavi et al. Age detection in a surveillance video using deep learning technique
Verma et al. Estimation of sex through morphometric landmark indices in facial images with strength of evidence in logistic regression analysis
Dhanashree et al. Fingernail analysis for early detection and diagnosis of diseases using machine learning techniques
Nihar et al. Plant disease detection through the implementation of diversified and modified neural network algorithms
Milke et al. Development of a coffee wilt disease identification model using deep learning
Bashir et al. A comprehensive review on apple leaf disease detection
CN112580527A (en) Facial expression recognition method based on convolution long-term and short-term memory network
Wang et al. A study of convolutional sparse feature learning for human age estimate
Lakshmi et al. Rice Classification and Quality Analysis using Deep Neural Network
Moran Classifying emotion using convolutional neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19910035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 08.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19910035

Country of ref document: EP

Kind code of ref document: A1