WO2020147430A1

WO2020147430A1 - Image identification-based product display method, device, apparatus, and medium

Info

Publication number: WO2020147430A1
Application number: PCT/CN2019/120988
Authority: WO
Inventors: 罗琳耀 (Luo Linyao); 徐国强 (Xu Guoqiang); 邱寒 (Qiu Han)
Original assignee: 深圳壹账通智能科技有限公司 (OneConnect Smart Technology Co., Ltd., Shenzhen)
Priority date: 2019-01-17
Filing date: 2019-11-26
Publication date: 2020-07-23
Also published as: CN109815873A

Abstract

An image identification-based product display method, device, apparatus, and medium. The method comprises: performing, by means of a face detection model, face identification and clustering on at least two images to be identified in target video data of a product to be displayed, and acquiring the number of customers corresponding to the target video data and an image cluster set corresponding to each customer; if the number of customers is greater than a preset number, identifying the images to be identified in each image cluster set by means of a pre-trained micro-expression identification model, and acquiring a single-frame emotion for each image (S30); acquiring, on the basis of the single-frame emotions of the images to be identified, the target emotion of the customer corresponding to each image cluster set (S40); acquiring a final emotion according to the number of customers and the target emotions of the customers corresponding to the image cluster sets (S50); and acquiring a target display product according to the final emotion corresponding to each product to be displayed (S60). The invention addresses the problem that displayed products are insufficiently appealing.

Description

Commodity display method, device, equipment and medium based on image recognition

This application is based on, and claims priority to, the Chinese invention application No. 201910042541.3, filed on January 17, 2019, titled "Image recognition-based commodity display method, device, equipment and medium".

Technical field

This application relates to the field of intelligent decision-making, and in particular to a method, device, equipment and medium for displaying commodities based on image recognition.

Background

At present, merchants launch new products when a new product line or a new season arrives, and they need to promote these products to attract customers to buy them. Usually a merchant selects some products to display for customers to browse. Because the merchant does not know the customers' preferences, the products to display are chosen either from a personal perspective or by internal staff of the company. As a result, the displayed products may not match what customers want or need, which reduces their attractiveness and fails to attract customers to buy.

Summary of the invention

The embodiments of the present application provide a method, device, equipment, and medium for displaying commodities based on image recognition, to solve the problem that displayed commodities are insufficiently attractive.

A product display method based on image recognition, including:

Acquiring target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

Using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtaining the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;

If the number of customers is greater than a preset number, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set, and obtaining a single-frame emotion for each image to be recognized;

Obtaining, based on the single-frame emotion of the at least one frame of image to be recognized, the target emotion of the customer corresponding to each image cluster set;

Obtaining a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;

Obtaining a target display product according to the final emotion corresponding to each commodity to be displayed.

A product display device based on image recognition, including:

A data acquisition module, configured to acquire target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

An image cluster set acquisition module, configured to use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and to acquire the number of customers corresponding to the target video data and an image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized;

A single-frame emotion determination module, configured to, if the number of customers is greater than a preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain a single-frame emotion for each image to be recognized;

A target emotion obtaining module, configured to obtain, based on the single-frame emotion of the at least one frame of image to be recognized, the target emotion of the customer corresponding to each image cluster set;

A final emotion obtaining module, configured to obtain a final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;

A target display product obtaining module, configured to obtain a target display product according to the final emotion corresponding to each commodity to be displayed.

A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:

One or more readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:

The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application environment of a commodity display method based on image recognition in an embodiment of the present application;

FIG. 2 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 3 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 4 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 5 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 6 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 7 is a flowchart of a method for displaying products based on image recognition in an embodiment of the present application;

FIG. 8 is a functional block diagram of a product display device based on image recognition in an embodiment of the present application;

FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.

The image recognition-based product display method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which a client communicates with a server through a network. The method runs on the server: it analyzes and recognizes the target video data corresponding to each product to be displayed, obtains the emotion of each customer in the target video data, determines from these emotions the customers' final emotion toward the product, and determines the target display products according to the final emotion of each product to be displayed. As a result, the target display products are the products that customers pay more attention to, which makes them more attractive and encourages customers to purchase them. The client can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a product display method based on image recognition is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:

S10: Obtain target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized.

The target video data is video data obtained by filtering the initial video data corresponding to the commodity to be displayed, specifically video data that meets certain conditions, for example, the attribute information of the product to be displayed. The attribute information may include a suitable age and a suitable gender. That is, the collected initial video data corresponding to the commodity to be displayed is filtered by suitable age and suitable gender to obtain the target video data, and an image to be recognized is an image that passes this age and gender screening. The initial video data is the video data collected for each commodity to be displayed by a video collection tool.

Specifically, a video collection tool is configured in advance for the area where each commodity to be displayed is located. The video collection tool collects images or video data: when it detects that a customer has entered its collection range, it is triggered automatically and collects images or video data of the customer. The video collection tool is specifically a camera, which collects in real time the initial video data within the collection range of each commodity to be displayed. Because each collection tool corresponds to one product to be displayed, each camera collects the initial video data of its product's area, and this initial video data is filtered to obtain the target video data corresponding to that product. The collected initial video data carries the product identifier of the corresponding product to be displayed, so that the corresponding target video data can later be determined through the product identifier. For example, if the initial video data collected by a video collection tool carries product identifier A, it is the initial video data corresponding to product identifier A, and filtering it yields the target video data corresponding to product identifier A. The product identifier is a unique identifier used to distinguish different products to be displayed. Optionally, it may consist of at least one of numbers, letters, words, or symbols; for example, it may be the serial number of the product to be displayed.
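The filtering of initial video data into target video data can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Frame` and `ProductAttributes` types, their field names, and the premise that an estimated age and gender are already attached to each frame are all hypothetical, since the text does not say how age and gender are obtained.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Frame:
    product_id: str  # product identifier carried by the initial video data
    age: int         # hypothetical: estimated age of the customer in the frame
    gender: str      # hypothetical: estimated gender label, e.g. "F" or "M"


@dataclass
class ProductAttributes:
    min_age: int              # lower bound of the product's suitable age range
    max_age: int              # upper bound of the product's suitable age range
    genders: FrozenSet[str]   # suitable genders for the product


def target_video_data(initial: List[Frame], product_id: str,
                      attrs: ProductAttributes) -> List[Frame]:
    """Keep the frames of one product whose customer fits its suitable age and gender."""
    return [
        f for f in initial
        if f.product_id == product_id
        and attrs.min_age <= f.age <= attrs.max_age
        and f.gender in attrs.genders
    ]


frames = [Frame("A", 25, "F"), Frame("A", 60, "F"), Frame("B", 25, "F"), Frame("A", 30, "M")]
attrs_a = ProductAttributes(18, 40, frozenset({"F"}))
kept = target_video_data(frames, "A", attrs_a)
print(len(kept))  # 1
```

Only the first frame survives: the second is outside the age range, the third belongs to product B, and the fourth has an unsuitable gender.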

S20: Use the face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, where each image cluster set includes at least one frame of image to be recognized.

The face detection model is a pre-trained model used to detect whether each frame of image to be recognized contains a face region. The number of customers is the number of distinct customers identified in the target video data. An image cluster set is the set formed by clustering together the images to be recognized that correspond to the same customer.

Specifically, the server is connected to a database over the network, and the face detection model is stored in the database. The target video data contains images to be recognized corresponding to different customers, and the face detection model is used to find the images that contain a face, where a face image is an image of a customer's face region. The server inputs each image to be recognized in the acquired target video data into the face detection model, which performs face recognition on each image to determine whether it is a face image. The face images are then clustered by identity, that is, the images to be recognized that show the same face are grouped together, yielding the image cluster set corresponding to each customer. The number of customers in the target video data is determined by the number of image cluster sets.

More specifically, a feature extraction algorithm extracts the facial features of the face image in each image to be recognized, and the feature similarity between the facial features of the images is calculated. If the feature similarity is greater than a preset threshold, the corresponding face images belong to the same customer, and the images to be recognized containing them are clustered together to obtain the image cluster set corresponding to each customer. The number of customers in the target video data is then determined by the number of image cluster sets. The preset threshold is a value set in advance for judging whether the similarity is high enough to conclude that two faces belong to the same customer. Feature extraction algorithms include, but are not limited to, CNN (convolutional neural network) algorithms; for example, a CNN can be used to extract the facial features of the face image in each image to be recognized.
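The threshold-based clustering above can be sketched in Python as follows. This is one possible reading, with assumptions labeled: cosine similarity as the feature-similarity measure, a greedy single pass that compares each face with the first face of each existing cluster, the threshold value 0.8, and the toy two-dimensional vectors standing in for CNN facial features are all hypothetical choices not specified in the text.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_faces(features: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Group frame indices whose face features are more similar than `threshold`.

    Each new face is compared with the first face of every existing cluster
    (an assumption; the text only requires similarity above a preset threshold).
    """
    clusters: List[List[int]] = []
    for i, feat in enumerate(features):
        for cluster in clusters:
            if cosine_similarity(feat, features[cluster[0]]) > threshold:
                cluster.append(i)
                break
        else:  # no sufficiently similar cluster: treat this face as a new customer
            clusters.append([i])
    return clusters


faces = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
image_cluster_sets = cluster_faces(faces)
number_of_customers = len(image_cluster_sets)
print(image_cluster_sets, number_of_customers)  # [[0, 1], [2, 3]] 2
```

A greedy pass is order-dependent; a production system might prefer a proper clustering algorithm, but the sketch keeps the "similarity above a preset threshold means same customer" rule visible.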

S30: If the number of customers is greater than the preset number, the pre-trained micro-expression recognition model is used to recognize the images to be recognized in each image cluster set, and the single frame emotion of each image to be recognized is obtained.

The micro-expression recognition model is a model that captures local features of the customer's face in the image to be recognized, identifies each target facial action unit of the face from these local features, and then determines the emotion from the recognized target facial action units. A single-frame emotion is the emotion determined for one image to be recognized from the target facial action units that the micro-expression recognition model identifies in it.

The micro-expression recognition model may be a deep-learning-based neural network recognition model, a classification-based local recognition model, or a local emotion recognition model based on local binary patterns (LBP). In this embodiment, it is a classification-based local recognition model. To train it, a large amount of training image data is collected in advance, containing positive and negative samples for each facial action unit, and a classification algorithm is trained on this data to obtain the micro-expression recognition model. In this embodiment, an SVM classification algorithm may be trained on the training image data to obtain N SVM classifiers, one for each of N facial action units, for example 39 SVM classifiers for 39 facial action units, or 54 SVM classifiers for 54 facial action units. The more facial action units whose positive and negative samples are included in the training image data, the more SVM classifiers are obtained. The micro-expression recognition model is formed from the N SVM classifiers, and the more SVM classifiers it contains, the more accurately it recognizes emotions.

The preset number is a value set in advance and is the same for every commodity to be displayed. When the server determines that the number of customers is greater than the preset number, it uses the pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtains the single-frame emotion of each image. This recognition specifically includes the following steps. The server first performs facial key-point detection and feature extraction on each image to be recognized to obtain its local features, and then inputs these local features into the pre-trained micro-expression recognition model. The model includes N SVM classifiers, each of which recognizes one local feature of the image, so that all input local features are recognized by the N SVM classifiers. When the probability value that an SVM classifier outputs for its facial action unit is greater than a preset probability threshold, that facial action unit is determined to be a target facial action unit of the image to be recognized. The preset probability threshold is a value set in advance. A target facial action unit is a facial action unit (Action Unit, AU) that the micro-expression recognition model recognizes in the image to be recognized.

In this embodiment, the micro-expression recognition model includes 54 SVM classifiers, and a facial action unit number mapping table is established in which each facial action unit is represented by a predetermined number. For example, AU1 denotes the inner brow raised, AU2 the outer brow raised, AU5 the upper eyelid raised, and AU26 the jaw dropped. An SVM classifier is trained for each facial action unit. For example, the SVM classifier for the inner brow outputs the probability that the local feature of the inner-brow region belongs to an inner-brow raise, the SVM classifier for the outer brow outputs the probability that the local feature of the outer-brow region belongs to an outer-brow raise, and so on. The probability value lies between 0 and 1. If an output probability value is 0.6 and the preset probability threshold is 0.5, then because 0.6 is greater than 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized.
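The per-AU thresholding can be sketched as below. This is an illustrative sketch only: each "classifier" is any callable returning a probability in [0, 1] (in practice it might wrap a trained SVM's probability output), the stand-in lambdas with fixed outputs are hypothetical, and keying local features by AU name is an assumption about how features are routed to classifiers.

```python
from typing import Callable, Dict, Sequence, Set

# A classifier maps a local feature vector to the probability that its AU is present.
Classifier = Callable[[Sequence[float]], float]


def target_action_units(local_features: Dict[str, Sequence[float]],
                        classifiers: Dict[str, Classifier],
                        threshold: float = 0.5) -> Set[str]:
    """Return the AUs whose classifier probability exceeds the preset probability threshold."""
    targets = set()
    for au, classifier in classifiers.items():
        probability = classifier(local_features[au])
        if probability > threshold:
            targets.add(au)
    return targets


# Stand-in classifiers with fixed outputs (hypothetical; real ones would be trained SVMs).
classifiers = {
    "AU1": lambda feat: 0.6,  # inner brow raised
    "AU2": lambda feat: 0.3,  # outer brow raised
    "AU5": lambda feat: 0.5,  # upper eyelid raised; 0.5 is not above 0.5, so rejected
}
features = {"AU1": [0.2, 0.4], "AU2": [0.1, 0.0], "AU5": [0.3, 0.3]}
print(sorted(target_action_units(features, classifiers)))  # ['AU1']
```

The comparison is strict (`>`), matching the text's "greater than the preset probability threshold"; the worked example's 0.6 against 0.5 is reproduced by AU1.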

Specifically, the server determines the single-frame emotion of each image to be recognized from its target facial action units, that is, it queries an evaluation table with the target facial action units of the image and obtains the corresponding single-frame emotion. As in the above embodiment, the 54 SVM classifiers recognize the local features, all target facial action units of the image are determined, and the evaluation table is looked up with all of these units to determine the single-frame emotion, which improves the accuracy of the customer's single-frame emotions. The evaluation table is configured in advance: each emotion is made up of one or more facial action units, the facial action unit combination corresponding to each emotion is obtained in advance, and each combination is stored in association with its emotion to form the evaluation table.
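An evaluation-table lookup might look like the sketch below. The table contents are hypothetical, loosely based on commonly cited FACS/EMFACS prototypes (AU6+AU12 for joy, AU1+AU2+AU5+AU26 for surprise, AU9+AU15+AU16 for disgust); the text does not publish its table. The matching rule (pick the largest stored combination contained in the detected AUs) and the fallback emotion `"calm"` are also assumptions, since the text only says the table is queried with the target facial action units.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical evaluation table: AU combination -> emotion.
EVALUATION_TABLE: Dict[FrozenSet[str], str] = {
    frozenset({"AU6", "AU12"}): "joy",
    frozenset({"AU1", "AU2", "AU5", "AU26"}): "surprise",
    frozenset({"AU9", "AU15", "AU16"}): "disgust",
}


def single_frame_emotion(target_aus: Set[str],
                         table: Dict[FrozenSet[str], str] = EVALUATION_TABLE,
                         default: str = "calm") -> str:
    """Return the emotion whose AU combination is the largest subset of the detected AUs."""
    matches = [combo for combo in table if combo <= target_aus]
    if not matches:
        return default  # hypothetical fallback when no stored combination matches
    return table[max(matches, key=len)]


print(single_frame_emotion({"AU6", "AU12", "AU25"}))       # joy
print(single_frame_emotion({"AU1", "AU2", "AU5", "AU26"}))  # surprise
print(single_frame_emotion(set()))                          # calm
```

Subset matching tolerates extra detected AUs that are not part of any stored combination; an exact-match lookup would be the stricter alternative.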

S40: Obtain the target emotion of the customer corresponding to each image cluster set based on the single-frame emotion of the at least one frame of image to be recognized.

The target emotion is the emotion determined from the single-frame emotions of the images to be recognized in an image cluster set. Because an image cluster set holds the images to be recognized of one customer, the customer's target emotion is determined from the single-frame emotions of that customer's images.

Specifically, the server obtains the single-frame emotion of each image to be recognized in the image cluster set and analyzes them to obtain the target emotion of the corresponding customer. The server judges whether all single-frame emotions in the image cluster set are the same; if they are, that emotion is taken as the target emotion. If at least two single-frame emotions differ, the server counts how many times each single-frame emotion occurs in the image cluster set and takes the most frequent one as the target emotion of the customer. In this embodiment, the target emotions of the customers corresponding to all image cluster sets in the target video data of a product to be displayed are acquired in turn, that is, each customer's emotion toward the product is acquired. For example, if the target video data corresponding to product A includes 100 image cluster sets, the target emotion of the customer corresponding to each set is acquired, giving the target emotions of 100 customers toward product A.
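The per-customer majority vote can be sketched with Python's `collections.Counter`. This is a minimal sketch: representing each image cluster set as a list of single-frame emotion labels keyed by a hypothetical customer name is an assumption, and ties between equally frequent single-frame emotions are not covered by the text, so the sketch simply keeps the first-encountered one.

```python
from collections import Counter
from typing import Dict, List


def target_emotion(single_frame_emotions: List[str]) -> str:
    """Most frequent single-frame emotion in one image cluster set.

    If all emotions are the same, that emotion is returned. For ties,
    Counter.most_common keeps the first-encountered emotion (an arbitrary choice).
    """
    return Counter(single_frame_emotions).most_common(1)[0][0]


def target_emotions_per_customer(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each customer (image cluster set) to that customer's target emotion."""
    return {customer: target_emotion(emotions) for customer, emotions in clusters.items()}


clusters = {
    "customer_1": ["joy", "joy", "calm"],
    "customer_2": ["calm", "calm", "calm"],
}
print(target_emotions_per_customer(clusters))
# {'customer_1': 'joy', 'customer_2': 'calm'}
```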

S50: Obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.

The final emotion is the emotion toward a product to be displayed obtained by quantitatively analyzing the customers' target emotions toward that product.

Specifically, based on the target emotion of the customer corresponding to each image cluster set, the server judges whether all of the target emotions are the same; if they are, that emotion is the final emotion. If they are not, the server counts, over all customers, how many times each target emotion occurs and takes the most frequent target emotion as the final emotion. For example, suppose the number of customers corresponding to the target video data of product A is 100, so there are also 100 image cluster sets. If, among the target emotions of the 100 image cluster sets, 50 are joy, 30 are calm, and 20 are indifference, the most frequent target emotion (joy) is taken as the final emotion of product A. If at least two target emotions are tied for the largest count, the final emotion is determined according to the category of the tied target emotions, and a positive emotion is preferably taken as the final emotion. For example, if among the target emotions of the 100 image cluster sets of product A, 50 are joy and 50 are indifference, joy is taken as the final emotion of the product to be displayed. If at least two target emotions are tied for the largest count and all of them are negative, any one of them can be selected as the final emotion, because the final emotion of a product that is eventually displayed as a target display product should be positive anyway.
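The aggregation into a final emotion, including the tie rule, can be sketched as follows. The set of positive emotions and the `preset_number` gate parameter are hypothetical: the text names joy as positive and requires more customers than a preset number before S30, but lists neither value. When no tied emotion is positive, the sketch takes the first one, which the text permits when all tied emotions are negative.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical split: emotions treated as positive when breaking ties.
POSITIVE_EMOTIONS = {"joy", "surprise"}


def final_emotion(target_emotions: Iterable[str], preset_number: int = 0) -> Optional[str]:
    """Most frequent target emotion, preferring a positive one when counts tie.

    Returns None unless there are more customers than `preset_number`
    (a hypothetical stand-in for the S30 gate on the number of customers).
    """
    emotions = list(target_emotions)
    if len(emotions) <= preset_number:
        return None
    counts = Counter(emotions)
    top = max(counts.values())
    tied = [emotion for emotion, count in counts.items() if count == top]
    positives = [emotion for emotion in tied if emotion in POSITIVE_EMOTIONS]
    # If no tied emotion is positive, take the first one.
    return positives[0] if positives else tied[0]


print(final_emotion(["joy"] * 50 + ["calm"] * 30 + ["indifference"] * 20))  # joy
print(final_emotion(["indifference"] * 50 + ["joy"] * 50))                 # joy
```

Both calls reproduce the two worked examples for product A in the text.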

S60: Obtain the target display product according to the final emotion corresponding to the product to be displayed.

The target display product is a product to be displayed that is selected for display according to the final emotion corresponding to each product to be displayed.

Specifically, the server obtains the target display products from the products to be displayed according to the final emotion of each product, which includes the following steps:

(1) First determine, according to its final emotion, whether each commodity to be displayed can be displayed. Specifically, preset emotions that qualify for display are set in advance, and the final emotion of each product to be displayed is matched against them. If the final emotion matches a preset emotion, the corresponding product is a product that can be displayed. This step prevents a product whose final emotion does not match a preset emotion from becoming a target display product. Generally, the preset emotions are positive emotions, so products associated with negative emotions are not taken as target display products. For example, if the final emotion of product A is joy, most customers are interested in product A; joy is matched against the preset emotions, and if the match succeeds, product A is determined to be a product that can be displayed. As another example, if the final emotion of product B is disgust, anger, or disappointment, most customers do not like product B; matching its final emotion against the preset emotions fails, so B is not displayed.

(2) Determine the number of products that can be displayed. If it is greater than a preset value, query the emotion ranking table with the final emotion of each displayable product, sort the products by their final emotions, and take the first preset-value products as the target display products. The emotion ranking table is a preset table in which more positive emotions rank higher, for example in the order joy, surprise, calm, disgust, anger, disappointment. For example, suppose 4 products can be displayed and the preset value is 3. The final emotions of the displayable products are joy for product A, joy for product B, surprise for product C, and calm for product D. Sorting by the emotion ranking table gives A, B, C, D, and the first three products, A, B, and C, are taken as the target display products.

Further, if ties occur when the products to be displayed are sorted by final emotion according to the emotion ranking table, first determine whether the target display products can still be determined within the preset value. If they cannot, then for each tied product, count how many target emotions in its target video data match its final emotion (that is, the largest target-emotion count), set the priority among the tied final emotions according to these counts, and obtain the preset value of target display products. For example, suppose there are 4 displayable products to be displayed and the preset value is 3, and the final emotions obtained are delight for product A, joy for product B, calm for product C and calm for product D. Sorting by the emotion ranking table gives A, B, C and D with C tied with D, so the target display products cannot be determined within the preset value. The number of target emotions in product C's target video data that match C's final emotion is then obtained, say 50, and likewise for product D, say 60. Because D's count of 60 exceeds C's count of 50, D takes priority over C, and the preset value of products, namely A, B and D, are taken as the target display products. This step resolves ties between final emotions and thus speeds up the determination of the target display products.
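The tie-breaking rule can be sketched by adding the match count as a secondary sort key. This is an illustrative sketch, not the claimed implementation: the names are hypothetical, and the count is applied to every product rather than only when a tie straddles the cutoff, which selects the same set of products.

```python
# Hypothetical emotion ranking table: a lower index means a more positive emotion.
EMOTION_RANKING = ["delight", "joy", "surprise", "calm", "disgust", "anger", "disappointment"]


def select_with_tie_break(final_emotions, match_counts, preset_value):
    """Rank products by final emotion, breaking ties by match count (higher first).

    final_emotions: dict product ID -> final emotion label.
    match_counts: dict product ID -> number of target emotions in that product's
        target video data equal to its final emotion (missing IDs count as 0).
    """
    rank = {emotion: index for index, emotion in enumerate(EMOTION_RANKING)}

    def sort_key(product):
        # Primary key: emotion rank; secondary key: negated count, so larger counts win.
        return (rank[final_emotions[product]], -match_counts.get(product, 0))

    return sorted(final_emotions, key=sort_key)[:preset_value]


emotions = {"A": "delight", "B": "joy", "C": "calm", "D": "calm"}
print(select_with_tie_break(emotions, {"C": 50, "D": 60}, 3))  # ['A', 'B', 'D']
```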

In steps S10-S60, the target video data of the product to be displayed is obtained, and the face detection model performs face recognition and clustering on the images to be recognized in the target video data to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, so that the target display product can later be determined from the target emotions of a sufficient number of customers, improving its accuracy. If the number of customers is greater than the preset number, the pre-trained micro-expression recognition model recognizes the images to be recognized in each image cluster set and obtains the single-frame emotion of each image, thereby recognizing customer emotions. Based on the single-frame emotions of at least one frame of image to be recognized, the target emotion of the customer corresponding to each image cluster set is obtained, indicating whether the product to be displayed interests that customer. The final emotion of the customers toward the product to be displayed is obtained from the number of customers and the target emotions of the customers corresponding to each image cluster set, and the target display products are obtained from the final emotions of the products to be displayed. This improves the accuracy of the target display products, so that they are the products most customers are interested in.

In one embodiment, as shown in FIG. 3, step S10, obtaining the target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized, specifically includes the following steps:

S11: Obtain initial video data of the commodity to be displayed, where the initial video data includes at least two frames of initial video images.

Here, the initial video data refers to the video data, collected by a video capture tool, that corresponds to each commodity to be displayed.

Specifically, a video capture tool collects the initial video data corresponding to each product to be displayed. The initial video data includes at least two frames of initial video images and carries the product identifier of the corresponding product to be displayed, from which the initial video data of each commodity to be displayed is determined. Collecting initial video data for each commodity allows it to be analyzed later to obtain the final emotion corresponding to each commodity to be displayed.

S12: Obtain attribute information of the commodity to be displayed, and the attribute information includes a suitable age and a suitable gender.

Here, the suitable age refers to the age targeted by the commodity to be displayed, and the suitable gender refers to the gender targeted by the commodity to be displayed.

Specifically, the attribute information corresponding to each commodity to be displayed is stored in a database. The server queries the database for each product to be displayed and obtains its attribute information, which includes the suitable age and the suitable gender. For example, if a product to be displayed is clothing, its attribute information may specify a suitable age of 20-24 and a suitable gender of female. As another example, if the product to be displayed is cosmetics, the suitable age in its attribute information may be 25-30 and the suitable gender male. The commodities to be displayed are not specifically limited in this embodiment.

S13: Screen the initial video images by the suitable age and suitable gender to obtain images to be recognized, and form the target video data from at least two frames of images to be recognized.

Specifically, a pre-trained classifier recognizes the at least two frames of initial video images in the initial video data to obtain the target age and target gender corresponding to each initial video image. The server matches the target age against the suitable age and the target gender against the suitable gender. An initial video image that matches both is determined to be an image to be recognized, an initial video image that fails to match is deleted, and the target video data is formed from the at least two frames of images to be recognized. Here, the target age and the target gender are the age and gender obtained by recognizing the initial video image with the pre-trained classifier.

In steps S11-S13, the initial video images are screened by the suitable age and suitable gender of the product to be displayed to obtain the images to be recognized, and the target video data is formed from at least two frames of images to be recognized. This matches the target video data more closely to the product to be displayed and improves the accuracy of the target display products.

In one embodiment, before step S13, that is, before the step of screening the initial video images by the suitable age and suitable gender, the image recognition-based commodity display method further includes:

(1) Use super-resolution technology to process at least two frames of initial video images, obtain high-resolution images corresponding to at least two frames of initial video images, and use the high-resolution images as images to be determined.

Here, super-resolution (SR) technology refers to reconstructing a high-resolution image from an acquired low-resolution image. The initial video images in the initial video data are generally low-resolution images. An image to be determined is the high-resolution image obtained by converting an initial video image.

Specifically, the server obtains the initial video data, which includes at least two frames of initial video images in the low-resolution (LR) space. The ESPCN algorithm extracts feature maps in the low-resolution space, and its efficient sub-pixel convolutional layer upscales the final low-resolution feature maps into a high-resolution feature map, from which the high-resolution image corresponding to each initial video image is obtained and used as the image to be determined. The core concept of the ESPCN algorithm is the sub-pixel convolutional layer. The input is a low-resolution image (the initial video image). After two convolutional layers, the resulting feature map has the same spatial size as the input image but r^2 feature channels, where r is the target upscaling factor. The r^2 channels of each pixel are rearranged into an r x r region corresponding to an r x r sub-block of the high-resolution image, so that a feature map of size r^2 x H x W is rearranged into a high-resolution feature map of size 1 x rH x rW. The high-resolution image corresponding to each frame of initial video image is obtained from this high-resolution feature map and used as the image to be determined.
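The rearrangement performed by the sub-pixel convolutional layer (often called pixel shuffle) can be sketched in plain Python as below. Feature maps are nested lists, and the channel ordering (channel `dy * r + dx` fills offset `(dy, dx)` of each r x r block) is an assumption that follows the convention of PyTorch's `torch.nn.PixelShuffle`. The learned convolutions that precede this step are not shown.

```python
def pixel_shuffle(features, r):
    """Rearrange an r^2 x H x W feature map into a 1 x rH x rW map (returned as rH x rW).

    features: list of r * r channels, each an H x W list of lists.
    Channel dy * r + dx supplies the value at offset (dy, dx) of every r x r block.
    """
    if len(features) != r * r:
        raise ValueError("expected r^2 channels")
    h, w = len(features[0]), len(features[0][0])
    out = [[0.0] * (r * w) for _ in range(r * h)]
    for y in range(h):
        for x in range(w):
            # Spread the r^2 channel values of pixel (y, x) over one r x r output block.
            for dy in range(r):
                for dx in range(r):
                    out[y * r + dy][x * r + dx] = features[dy * r + dx][y][x]
    return out


channels = [[[1]], [[2]], [[3]], [[4]]]  # r^2 = 4 channels, each 1 x 1
print(pixel_shuffle(channels, 2))  # [[1, 2], [3, 4]]
```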

As shown in FIG. 4, step S13, screening the initial video images by the suitable age and suitable gender to obtain the images to be recognized, specifically includes the following steps:

S131: Use a pre-trained classifier to identify at least two frames of images to be determined, and obtain the target age and target gender corresponding to each image to be determined.

Here, the pre-trained classifier includes a gender classifier and an age classifier, which recognize the image to be determined to obtain its target gender and target age, respectively. The target gender is the gender obtained by recognizing the image to be determined with the gender classifier, and the target age is the age obtained by recognizing it with the age classifier.

To train the gender classifier and the age classifier, a large amount of training image data is first obtained. The training image data contains face images of different ages and genders, and each face image is annotated with its age and gender. The annotated training image data is input into a deep neural network that includes at least two convolutional layers. The predicted age is compared with the annotated age to adjust the weights and biases of each layer of the network until the model converges, yielding the age classifier; likewise, the predicted gender is compared with the annotated gender to adjust the weights and biases until the model converges, yielding the gender classifier.

Specifically, the pre-trained gender classifier recognizes the image to be determined, which is an image containing the customer's face. Face key point detection and feature extraction are performed on this image to obtain facial features. The extracted facial features are input into the pre-trained gender classifier, which recognizes them to obtain the target gender of the image to be determined, and into the pre-trained age classifier, which classifies them to obtain the target age. Estimating the customer's gender and age with the pre-trained gender and age classifiers improves the accuracy of the target gender and target age.

S132: Match the target age with the appropriate age, and match the target gender with the appropriate gender.

Specifically, the suitable age may be an age range, for example 20-24 years old. The server matches the target age against the suitable age mainly by determining whether the target age falls within the suitable age range. The suitable gender is either female or male, and the recognized target gender is matched against it.

S133: If the target age is successfully matched with the appropriate age, and the target gender is successfully matched with the appropriate gender, the image to be determined corresponding to the target age and the target gender is used as the image to be recognized.

Specifically, if the server determines that the target age falls within the suitable age range and the target gender matches the suitable gender, it uses the image to be determined corresponding to that target age and target gender as an image to be recognized.
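The matching in steps S132-S133 amounts to a range check plus an equality check. The sketch below assumes the classifiers' outputs are already available as `(image_id, target_age, target_gender)` tuples; the class `SuitableAttributes` and the function `screen_images` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SuitableAttributes:
    """Hypothetical container for a product's suitable age range and suitable gender."""
    min_age: int
    max_age: int
    gender: str


def screen_images(predictions, attributes):
    """Return IDs of images whose target age is in range and target gender matches.

    predictions: list of (image_id, target_age, target_gender) tuples, assumed to be
        the outputs of the pre-trained age and gender classifiers.
    """
    return [
        image_id
        for image_id, target_age, target_gender in predictions
        if attributes.min_age <= target_age <= attributes.max_age
        and target_gender == attributes.gender
    ]


clothing = SuitableAttributes(min_age=20, max_age=24, gender="female")
frames = [("f1", 22, "female"), ("f2", 30, "female"), ("f3", 21, "male")]
print(screen_images(frames, clothing))  # ['f1']
```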

In steps S131-S133, a pre-trained classifier recognizes the at least two frames of images to be determined to obtain the target age and target gender of each, which speeds up acquisition of the target display products. If the target age matches the suitable age and the target gender matches the suitable gender, the corresponding image to be determined is used as an image to be recognized, so that the images to be recognized correspond to the attribute information of the product to be displayed. This targets the crowd that matches the product to be displayed, and analyzing these images to be recognized makes the obtained target display products more accurate.

In one embodiment, as shown in FIG. 5, step S20, using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, specifically includes the following steps:

S21: Use a face detection model to perform face recognition on at least two frames of images to be recognized, and obtain a face image corresponding to each image to be recognized.

Specifically, the server obtains the target video data and uses the face detection model to perform face recognition on each frame of image to be recognized, obtaining the face image corresponding to each image to be recognized in the target video data. Here, face recognition means searching any given frame with a certain strategy to determine whether it contains a face. The face detection model is a pre-trained model that detects whether each frame of image to be recognized contains a face region. Specifically, the server inputs each frame of image to be recognized into the face detection model, which detects whether that frame contains a face; if it does, the face image corresponding to that image to be recognized is acquired.

S22: Cluster the face images corresponding to the image to be recognized, and obtain at least two image cluster sets, and each image cluster set includes at least one frame of the image to be recognized.

Specifically, the server clusters the acquired face images so that face images of the same customer are grouped together, obtaining at least two image cluster sets, each of which includes at least one frame of image to be recognized. More specifically, a feature extraction algorithm extracts the facial features of the face image corresponding to each image to be recognized, and the feature similarity between these facial features is calculated. If the feature similarity is greater than a preset threshold, the face images belong to the same customer, and the images to be recognized corresponding to that customer's face images are clustered together to obtain the image cluster set for each customer. That is, each customer corresponds to one image cluster set, and each image cluster set includes at least one frame of image to be recognized.

S23: Obtain the number of customers corresponding to the target video data according to the number of image clustering sets.

Specifically, the number of image cluster sets corresponding to each commodity to be displayed is counted, and the number of image cluster sets is taken as the number of customers corresponding to the target video data.

In steps S21-S23, the face detection model performs face recognition on the at least two frames of images to be recognized to obtain the face image corresponding to each. This determines whether each image to be recognized contains a face and avoids clustering images without faces, which speeds up the subsequent acquisition of image cluster sets. The face images are then clustered into at least two image cluster sets, and the number of customers corresponding to the target video data is obtained from the number of image cluster sets, so that the number of customers is determined accurately.

In one embodiment, each image to be recognized carries a time mark, which is the time at which that image was captured.

As shown in FIG. 6, in step S22, that is, clustering the face images corresponding to the image to be recognized, and obtaining at least two image cluster sets, specifically includes the following steps:

S221: According to the time mark, use the first recognized face image in at least two frames of images to be recognized as a reference image.

Among them, the reference image refers to the face image recognized for the first time from the image to be recognized.

Specifically, the server obtains the time marks of the at least two frames of images to be recognized, determines from them the first face image recognized among these images, and uses that face image as the reference image. Determining a reference image speeds up the acquisition of image cluster sets.

S222: According to the time mark, a similarity algorithm is successively used to calculate the feature similarity between the reference image and the remaining images to be recognized.

Specifically, in time-mark order, a similarity algorithm calculates the feature similarity between the reference image and each remaining image to be recognized, that is, each of the at least two frames of images to be recognized other than the reference image. The similarity algorithm may be the Euclidean distance, Manhattan distance, Minkowski distance or cosine similarity algorithm. In this embodiment, the cosine similarity algorithm is used, which speeds up the acquisition of image cluster sets and improves the efficiency of obtaining the target display products.
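The cosine similarity used here is the dot product of two feature vectors divided by the product of their Euclidean norms. A minimal sketch in plain Python follows; how the face features themselves are extracted is outside its scope, and the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [3.0, 4.0]))  # 0.6
```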

S223: If the feature similarity is greater than the preset threshold, attribute the image to be recognized and the reference image whose feature similarity is greater than the preset threshold to the same image cluster set.

Specifically, the preset threshold is a preset value. If the server determines that the feature similarity between the reference image and a remaining image to be recognized is greater than the preset threshold, the two images are considered to match, that is, they are images of the same customer, and that image is attributed to the same image cluster set as the reference image. For example, if the feature similarity between reference image 1 and remaining image 2 is 80%, the feature similarity between reference image 1 and remaining image 3 is 99%, and the preset threshold is 90%, then only the similarity between reference image 1 and image 3 exceeds the threshold, so reference image 1 and image 3 are attributed to the same image cluster set.

S224: If the feature similarity is not greater than the preset threshold, update, according to the time marks, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be the new reference image, and repeat the step of calculating, in time-mark order, the feature similarity between the reference image and the remaining images to be recognized with the similarity algorithm, until all of the at least two frames of images to be recognized have been clustered, forming at least two image cluster sets.

Specifically, if the server determines that the feature similarity between the reference image and a remaining image to be recognized is not greater than the preset threshold, the match is considered to have failed, that is, the two images belong to different customers. According to the time marks, the first of the remaining images whose feature similarity is not greater than the preset threshold is then updated to be the new reference image. For example, if the feature similarity between reference image 1 and remaining image 2 is 80% and the preset threshold is 90%, the similarity is not greater than the threshold, so image 2 becomes the new reference image according to the time marks. The step of calculating, in time-mark order, the feature similarity between the reference image and the remaining images to be recognized is repeated until all of the at least two frames of images to be recognized have been clustered, forming at least two image cluster sets.
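Steps S221-S224 describe a reference-image (leader-style) clustering. The sketch below assumes each image to be recognized has already been reduced to a `(time_mark, feature_vector)` pair; the function name and toy vectors are hypothetical and chosen to mirror the 80%/99% example with a 0.9 threshold. Following S23, the number of clusters is taken as the number of customers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero, equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cluster_by_reference(frames, threshold):
    """Group face features into per-customer clusters of time marks.

    frames: list of (time_mark, feature_vector) pairs.
    """
    pending = sorted(frames, key=lambda frame: frame[0])  # order by time mark
    clusters = []
    while pending:
        reference_time, reference_feature = pending[0]  # earliest pending frame
        cluster, unmatched = [reference_time], []
        for time_mark, feature in pending[1:]:
            if cosine_similarity(reference_feature, feature) > threshold:
                cluster.append(time_mark)
            else:
                unmatched.append((time_mark, feature))
        clusters.append(cluster)
        pending = unmatched  # the first unmatched frame becomes the next reference
    return clusters


frames = [(1, [1.0, 0.0]), (2, [0.8, 0.6]), (3, [1.0, 0.1])]
clusters = cluster_by_reference(frames, threshold=0.9)
print(clusters, "customers:", len(clusters))  # [[1, 3], [2]] customers: 2
```

Because each frame is compared only with the current reference image, the result can depend on capture order; this is a property of the reference-image scheme rather than of the sketch.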

In steps S221-S224, the first face image recognized among the at least two frames of images to be recognized, according to the time marks, is used as the reference image, and the similarity algorithm calculates in turn the feature similarity between the reference image and the remaining images to determine whether they belong to the same customer. Images whose feature similarity exceeds the preset threshold are attributed to the same image cluster set as the reference image, so that images of the same customer are clustered together. Otherwise, the first remaining image whose feature similarity is not greater than the threshold becomes the new reference image, and the calculation is repeated in time-mark order until all images to be recognized have been clustered into at least two image cluster sets. Clustering each customer's images in this way allows each customer's target emotion toward the product to be displayed to be determined later.

In one embodiment, as shown in FIG. 7, step S30, using a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized, specifically includes the following steps:

S31: Use the face key point algorithm to perform face recognition on the images to be recognized in each image cluster set, and obtain the face key points corresponding to each image to be recognized.

Here, the face key point algorithm may be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm or the HOG (Histogram of Oriented Gradients) algorithm. In this embodiment, the ERT algorithm performs face recognition on the images to be recognized in each image cluster set to obtain the face key points corresponding to each image to be recognized. The ERT algorithm is a regression-based method expressed as follows:

$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t\left(I, \hat{S}^{(t)}\right)$$

where $\hat{S}^{(t+1)}$ is the shape, i.e. the feature point coordinates, of the image to be recognized obtained at iteration t+1; t is the cascade index; $\hat{S}^{(t)}$ is the predicted shape (feature point coordinates) of the image at the current iteration; I is the image to be recognized input to the regressor; and $r_t$ is the regressor at level t. Each regressor consists of many regression trees obtained through training, and the face key points corresponding to each image to be recognized are obtained through these regression trees.
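The cascade update of the ERT algorithm can be illustrated with a short sketch. Real ERT regressors are ensembles of gradient-boosted regression trees learned from annotated faces; here each regressor r_t is replaced by a hypothetical toy function that moves every landmark halfway toward a fixed target, purely to show how the predicted increments accumulate over the cascade.

```python
def run_cascade(image, initial_shape, regressors):
    """Apply S^(t+1) = S^(t) + r_t(I, S^(t)) for each regressor r_t in order.

    initial_shape: list of (x, y) landmark coordinates.
    regressors: callables (image, shape) -> list of (dx, dy) increments.
    """
    shape = list(initial_shape)
    for regressor in regressors:
        increments = regressor(image, shape)
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, increments)]
    return shape


def make_halfway_regressor(target_shape):
    """Hypothetical stand-in for a trained regressor: predicts half the remaining offset."""
    def regressor(image, shape):
        return [((tx - x) / 2, (ty - y) / 2) for (x, y), (tx, ty) in zip(shape, target_shape)]
    return regressor


target = [(10.0, 20.0)]
cascade = [make_halfway_regressor(target) for _ in range(3)]
print(run_cascade(image=None, initial_shape=[(0.0, 0.0)], regressors=cascade))  # [(8.75, 17.5)]
```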

S32: Use a feature extraction algorithm to perform feature extraction on the face key points corresponding to each image to be recognized, and obtain local features corresponding to the face key points.

Here, the feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm, which extracts the local features of the face key points corresponding to the image to be recognized; specifically, local features are extracted according to the locations of the facial action units. A CNN is a feed-forward neural network whose artificial neurons respond to surrounding units within their receptive field, allowing fast and efficient image processing. In this embodiment, a pre-trained convolutional neural network quickly extracts the local features corresponding to the face key points.

Specifically, the face key points corresponding to each image to be recognized are convolved with several convolution kernels, and the result of the convolution is the local feature corresponding to the face key points. The convolution is performed with the formula

$$y_{m,n} = f\left(\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij}\, x_{m+i-1,\,n+j-1} + b_{m,n}\right)$$

to obtain the local features, where y is the output local feature; x is a two-dimensional input of size (M, N) formed by the coordinates of the L face key points (zero-padded so that y has the same M x N size as the bias); $w_{ij}$ is a convolution kernel of size I x J; b is the bias, of size M x N; and f is the activation function. Each convolution kernel is convolved with the face key points of the input image to be recognized from the previous layer and produces a corresponding local feature. Because each kernel shares its weights across all positions, the number of parameters is greatly reduced, which greatly speeds up training of the network.
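A direct plain-Python sketch of one convolution step as described: each output y[m][n] is f applied to the sum of w[i][j] * x[m + i][n + j] over the I x J kernel plus the bias b[m][n]. Two assumptions are not fixed by the text: entries outside x are treated as zero so that y keeps the M x N size of the bias, and ReLU is used as the activation f.

```python
def conv2d_same(x, w, b, f):
    """y[m][n] = f(sum_ij w[i][j] * x[m + i][n + j] + b[m][n]), 0-based indices.

    x: M x N input (e.g. stacked key-point coordinates); entries outside x count as 0.
    w: I x J convolution kernel; b: M x N bias; f: activation function.
    """
    M, N = len(x), len(x[0])
    I, J = len(w), len(w[0])
    y = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            total = b[m][n]
            for i in range(I):
                for j in range(J):
                    if m + i < M and n + j < N:  # zero padding past the bottom/right edge
                        total += w[i][j] * x[m + i][n + j]
            y[m][n] = f(total)
    return y


def relu(value):
    return max(0.0, value)


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.0, 0.0], [0.0, 0.0]]
print(conv2d_same(x, w, b, relu))  # [[5.0, 2.0], [3.0, 4.0]]
```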

Further, after the face key points are input into the preset convolutional neural network for recognition, the local features corresponding to the facial action units are obtained. For example, AU1, AU2, AU5 and AU26 are the local features corresponding to inner brow raise, outer brow raise, upper eyelid raise and jaw drop, respectively. In this embodiment, the convolutional neural network extracts the local features of the face key points in the image to be recognized so that the target facial action units can subsequently be determined from the local features and the customer's emotion determined from the recognized target facial action units. In this scheme, recognition with a convolutional neural network is faster and more accurate than with the LBP-TOP operator.

S33: Use a pre-trained classifier to recognize local features, and obtain a target facial action unit corresponding to each local feature.

Specifically, the local features are recognized by the SVM classifiers in the pre-trained micro-expression recognition model. The number of SVM classifiers equals the number of recognizable facial action units; if 54 facial action units can be recognized, there are 54 pre-trained SVM classifiers. Each local feature is input into the corresponding SVM classifier to obtain a probability value, which is compared with a preset probability threshold. Each facial action unit whose probability value is greater than the preset probability threshold is taken as a target facial action unit corresponding to the local feature, and all target facial action units corresponding to the local features are acquired.
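The per-action-unit thresholding can be sketched as follows, assuming each action unit's classifier has been wrapped to return a probability (for example, an SVM with probability calibration). The classifier callables and the probabilities they return are hypothetical placeholders, not trained models.

```python
def detect_target_action_units(local_features, classifiers, probability_threshold):
    """Return, sorted, the action units whose classifier probability exceeds the threshold.

    local_features: dict AU name -> local feature for that action unit.
    classifiers: dict AU name -> callable(feature) -> probability in [0, 1],
        standing in for one pre-trained SVM per action unit.
    """
    return sorted(
        au
        for au, classify in classifiers.items()
        if classify(local_features[au]) > probability_threshold
    )


# Toy classifiers that return fixed probabilities (hypothetical values).
toy_classifiers = {
    "AU6": lambda feature: 0.91,
    "AU12": lambda feature: 0.87,
    "AU9": lambda feature: 0.12,
}
features = {"AU6": None, "AU12": None, "AU9": None}
print(detect_target_action_units(features, toy_classifiers, 0.5))  # ['AU12', 'AU6']
```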

S34: Based on the target facial action unit corresponding to each local feature, look up the evaluation table to obtain the single frame emotion of each image to be recognized.

Here, the evaluation table is a pre-configured table that stores the correspondence between combinations of facial action units and emotions. For example, the combination AU12, AU6 and AU7 corresponds to joy, and the combination AU9, AU10, AU17 and AU24 corresponds to disgust. The server looks up the evaluation table with the target facial action units corresponding to each local feature, obtains the combination that matches the target facial action units, and uses the emotion of that combination as the single-frame emotion of the image to be recognized.
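The evaluation table lookup can be sketched with the two example combinations from the text. The matching rule, under which a combination matches when every one of its action units has been detected and the first matching row wins, is an assumption; the text does not state how partial or multiple matches are handled.

```python
# Evaluation table built from the example combinations: AU combination -> emotion.
EVALUATION_TABLE = [
    (frozenset({"AU6", "AU7", "AU12"}), "joy"),
    (frozenset({"AU9", "AU10", "AU17", "AU24"}), "disgust"),
]


def single_frame_emotion(target_action_units, table=EVALUATION_TABLE):
    """Return the emotion of the first combination fully present in the detected AUs.

    Matching rule (assumed): every AU of a combination must have been detected.
    Returns None when no combination matches.
    """
    detected = set(target_action_units)
    for combination, emotion in table:
        if combination <= detected:
            return emotion
    return None


print(single_frame_emotion(["AU6", "AU7", "AU12", "AU25"]))  # joy
print(single_frame_emotion(["AU9", "AU10"]))  # None
```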

In steps S31-S34, the face key point algorithm performs face recognition on the images to be recognized in each image cluster set to obtain the face key points of each image, which supports the subsequent extraction of local features and improves its accuracy. The feature extraction algorithm extracts features from the face key points to quickly obtain the corresponding local features, making the subsequently extracted target facial action units more accurate. The pre-trained classifier recognizes the local features to quickly obtain the target facial action units corresponding to each local feature. Finally, the evaluation table is looked up with these target facial action units to obtain the single-frame emotion of each image to be recognized, so that each customer's emotion can be determined from the single-frame emotions and the target display products determined from each customer's emotion, improving the accuracy of the target display products.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.

In one embodiment, an image recognition-based product display device is provided, which corresponds one-to-one to the image recognition-based product display method in the foregoing embodiments. As shown in FIG. 8, the image recognition-based product display device includes a data acquisition module 10, an image clustering set acquisition module 20, a single frame emotion determination module 30, a target emotion obtaining module 40, a final emotion obtaining module 50 and a target display product obtaining module 60. The functional modules are described in detail as follows:

The data acquisition module 10 is configured to acquire target video data of the commodity to be displayed, and the target video data includes at least two frames of images to be recognized.

The image cluster set acquisition module 20 is configured to use a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and to acquire the number of customers corresponding to the target video data and the image cluster set corresponding to each customer. Each image cluster set includes at least one frame of image to be recognized.

The single-frame emotion determination module 30 is configured to, if the number of customers is greater than the preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized.

The target emotion obtaining module 40 is configured to obtain the target emotion of the customer corresponding to each image cluster set based on the single-frame emotions of the at least one frame of image to be recognized in that set.

The final emotion obtaining module 50 is used to obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set.
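The passage does not state how modules 40 and 50 combine emotions, so the following is only a minimal sketch under an assumed rule: a customer's target emotion is the most frequent single-frame emotion in that customer's image cluster set, and the final emotion is the target emotion shared by the most customers, reported together with its share of the customer count. The function names and the majority-vote rule are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def target_emotion(single_frame_emotions: List[str]) -> str:
    """Hypothetical rule: a customer's target emotion is the most frequent
    single-frame emotion in that customer's image cluster set."""
    if not single_frame_emotions:
        raise ValueError("an image cluster set holds at least one frame")
    # Counter.most_common lists equal counts in first-seen order (Python 3.7+)
    return Counter(single_frame_emotions).most_common(1)[0][0]


def final_emotion(targets: Dict[int, str], num_customers: int) -> Tuple[str, float]:
    """Hypothetical rule: the final emotion is the target emotion shared by the
    most customers, returned with its share of the number of customers."""
    if num_customers == 0 or num_customers != len(targets):
        raise ValueError("expected one target emotion per customer")
    emotion, count = Counter(targets.values()).most_common(1)[0]
    return emotion, count / num_customers


# One list of single-frame emotions per image cluster set (i.e. per customer)
clusters = {
    0: ["happy", "happy", "neutral"],
    1: ["neutral"],
    2: ["happy", "surprised", "happy"],
}
targets = {cid: target_emotion(frames) for cid, frames in clusters.items()}
print(targets)                                # {0: 'happy', 1: 'neutral', 2: 'happy'}
print(final_emotion(targets, len(clusters)))  # ('happy', 0.6666666666666666)
```

Returning the share alongside the emotion keeps the number of customers visible to whatever rule later maps the final emotion to a product.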

The target display product obtaining module 60 is configured to obtain the target display product according to the final emotion corresponding to the product to be displayed.
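The way modules 10 to 60 hand data to one another can be sketched as a single function whose stages are passed in as callables. This is an illustrative skeleton, not the application's implementation: every stage function is a hypothetical stand-in, and returning `None` when the number of customers does not exceed the preset number is an assumption, because this passage does not say what happens in that case.

```python
from typing import Callable, Dict, List, Optional, Sequence

Frame = object  # placeholder for a decoded video frame


def choose_display_product(
    frames: Sequence[Frame],
    cluster_faces: Callable[[Sequence[Frame]], Dict[int, List[Frame]]],
    frame_emotion: Callable[[Frame], str],
    target_emotion: Callable[[List[str]], str],
    final_emotion: Callable[[Dict[int, str], int], str],
    pick_product: Callable[[str], str],
    preset_number: int,
) -> Optional[str]:
    if len(frames) < 2:
        raise ValueError("target video data must contain at least two frames")
    clusters = cluster_faces(frames)          # module 20: one cluster set per customer
    num_customers = len(clusters)
    if num_customers <= preset_number:        # module 30 runs only above the preset number;
        return None                           # the other branch is not specified here
    targets = {
        cid: target_emotion([frame_emotion(f) for f in imgs])  # modules 30 and 40
        for cid, imgs in clusters.items()
    }
    final = final_emotion(targets, num_customers)               # module 50
    return pick_product(final)                                  # module 60


# Toy stages: frames are strings whose first letter identifies the customer
result = choose_display_product(
    ["a1", "a2", "b1"],
    cluster_faces=lambda fs: {0: [f for f in fs if f[0] == "a"], 1: [f for f in fs if f[0] == "b"]},
    frame_emotion=lambda f: "happy",
    target_emotion=lambda es: max(set(es), key=es.count),
    final_emotion=lambda t, n: max(set(t.values()), key=list(t.values()).count),
    pick_product=lambda e: {"happy": "product A"}.get(e, "product B"),
    preset_number=1,
)
print(result)  # product A
```

Injecting the stages keeps the control flow (cluster, gate on the customer count, per-customer emotion, aggregate, select) testable independently of any particular face detection or micro-expression model.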

In an embodiment, the data acquisition module 10 includes an initial video data acquisition unit 11, an attribute information determination unit 12 and a target video data formation unit 13.

The initial video data acquiring unit 11 is configured to acquire initial video data of the commodity to be displayed, and the initial video data includes at least two frames of initial video images.

The attribute information determining unit 12 is used to obtain attribute information of the commodity to be displayed, and the attribute information includes a suitable age and a suitable gender.

The target video data forming unit 13 is configured to filter the initial video images by the suitable age and the suitable gender to obtain the images to be recognized, and to form the target video data from at least two frames of the images to be recognized.

In an embodiment, the image recognition-based commodity display device further includes an image resolution conversion unit, which operates before the target video data forming unit.

The image resolution conversion unit is used to process at least two frames of initial video images by using super-resolution technology, obtain high-resolution images corresponding to the at least two frames of initial video images, and use the high-resolution images as the images to be determined.

The target video data forming unit includes a target age and target gender determination subunit, a matching subunit, and a to-be-recognized image determination subunit.

The target age and target gender determination subunit is used to identify at least two frames of images to be determined using a pre-trained classifier, and obtain the target age and target gender corresponding to each image to be determined.

The matching subunit is used to match the target age with the appropriate age, and match the target gender with the appropriate gender.

The to-be-recognized image determination subunit is configured to, if the target age is successfully matched with the suitable age and the target gender is successfully matched with the suitable gender, use the image to be determined that corresponds to that target age and target gender as an image to be recognized.
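The matching performed by the subunits above can be sketched as a simple filter over classifier outputs. The sketch assumes, hypothetically, that the "suitable age" attribute is stored as an inclusive age range and the "suitable gender" as a set of accepted labels; the classifier itself is not modelled, and its predictions are passed in as `(image_id, target_age, target_gender)` tuples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ProductAttributes:
    min_age: int              # hypothetical: suitable age as an inclusive range
    max_age: int
    genders: FrozenSet[str]   # hypothetical: suitable gender(s), e.g. {"female"}


def screen_images(
    predictions: Iterable[Tuple[str, int, str]], attrs: ProductAttributes
) -> List[str]:
    """predictions: (image_id, target_age, target_gender) produced by a
    pre-trained classifier for each image to be determined."""
    kept = []
    for image_id, age, gender in predictions:
        age_ok = attrs.min_age <= age <= attrs.max_age   # match target age with suitable age
        gender_ok = gender in attrs.genders              # match target gender with suitable gender
        if age_ok and gender_ok:
            kept.append(image_id)                        # becomes an image to be recognized
    return kept


attrs = ProductAttributes(min_age=18, max_age=35, genders=frozenset({"female"}))
preds = [("f1", 25, "female"), ("f2", 40, "female"), ("f3", 22, "male")]
print(screen_images(preds, attrs))  # ['f1']
```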

In an embodiment, the image cluster set acquisition module 20 includes a face image acquisition unit, an image cluster set acquisition unit, and a customer number determination unit.

The face image acquisition unit is configured to use a face detection model to perform face recognition on at least two frames of images to be recognized, and obtain a face image corresponding to each image to be recognized.

The image cluster set acquisition unit is configured to cluster the face images corresponding to the images to be recognized to obtain at least two image cluster sets, each image cluster set including at least one frame of image to be recognized.

The customer number determining unit is used to obtain the number of customers corresponding to the target video data according to the number of image clustering sets.

In one embodiment, each image to be recognized corresponds to a time stamp.

The image cluster set acquisition unit includes a reference image determination subunit, a feature similarity calculation subunit, a first image cluster set determination subunit, and a second image cluster set determination subunit.

The reference image determination subunit is configured to take, according to the time stamps, the first recognized face image among the at least two frames of images to be recognized as the reference image.

The feature similarity calculation subunit is configured to calculate, in time-stamp order and using a similarity algorithm, the feature similarity between the reference image and each of the remaining images to be recognized.

The first image cluster set determination subunit is configured to, if the feature similarity is greater than the preset threshold, assign the image to be recognized whose feature similarity is greater than the preset threshold to the same image cluster set as the reference image.

The second image cluster set determination subunit is configured to, if the feature similarity is not greater than the preset threshold, update the first image (according to the time stamps) among the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be the new reference image, and to repeat the step of calculating, in time-stamp order using the similarity algorithm, the feature similarity between the reference image and the remaining images to be recognized, until all of the at least two frames of images to be recognized have been clustered, forming at least two image cluster sets.
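The reference-image clustering described by these subunits can be sketched as follows. The application does not name the similarity algorithm or the face feature representation; this sketch assumes each face is already represented by a numeric feature vector and uses cosine similarity as a stand-in. The number of customers is then the number of resulting cluster sets.

```python
import math
from typing import List, Sequence, Tuple

Face = Tuple[float, str, Sequence[float]]  # (time_stamp, image_id, feature_vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity algorithm; any feature similarity measure could be swapped in
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_reference(faces: List[Face], threshold: float) -> List[List[str]]:
    """Return one list of image ids per image cluster set."""
    remaining = sorted(faces, key=lambda f: f[0])   # process in time-stamp order
    clusters: List[List[str]] = []
    while remaining:
        ref = remaining[0]                          # earliest unassigned face is the reference
        cluster = [ref[1]]
        unmatched = []
        for face in remaining[1:]:
            if cosine_similarity(ref[2], face[2]) > threshold:
                cluster.append(face[1])             # same customer as the reference
            else:
                unmatched.append(face)              # still in time order for the next round
        clusters.append(cluster)
        remaining = unmatched                       # first unmatched face becomes the new reference
    return clusters


faces = [
    (0.0, "f0", [1.0, 0.0]),
    (1.0, "f1", [0.0, 1.0]),
    (2.0, "f2", [0.9, 0.1]),
    (3.0, "f3", [0.1, 0.9]),
]
sets = cluster_by_reference(faces, threshold=0.8)
print(sets)       # [['f0', 'f2'], ['f1', 'f3']]
print(len(sets))  # 2, i.e. the number of customers
```

Because each round removes at least the reference image from the remaining list, the loop always terminates, and every image ends up in exactly one cluster set.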

In an embodiment, the single frame emotion determination module 30 includes a face key point acquisition unit, a local feature extraction unit, a target facial action unit acquisition unit, and a single frame emotion acquisition unit.

The face key point acquisition unit is used to perform face recognition on the image to be recognized in each image clustering set by using the face key point algorithm, and obtain the face key point corresponding to each image to be recognized.

The local feature extraction unit is used to perform feature extraction on the key points of the face corresponding to each image to be recognized by using a feature extraction algorithm to obtain the local features corresponding to the key points of the face.
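The feature extraction algorithm is not specified in this passage. As a deliberately simple stand-in (practical systems often use descriptors such as LBP or HOG instead), the sketch below takes a small square patch of pixel intensities around each face key point and flattens it into a local feature vector. The patch radius and the grayscale list-of-rows image format are assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def local_features(
    image: Image, keypoints: Sequence[Tuple[int, int]], radius: int = 1
) -> List[List[float]]:
    """Return one feature vector per (row, col) key point: the flattened
    (2*radius+1) x (2*radius+1) patch around it, clamped at the image border."""
    h, w = len(image), len(image[0])
    feats = []
    for r, c in keypoints:
        patch = []
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr = min(max(r + dr, 0), h - 1)  # clamp row to image bounds
                cc = min(max(c + dc, 0), w - 1)  # clamp column to image bounds
                patch.append(image[rr][cc])
        feats.append(patch)
    return feats


img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(local_features(img, [(1, 1), (0, 0)]))
# [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 1, 0, 0, 1, 3, 3, 4]]
```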

The target facial action unit acquisition unit is used to recognize the local features using a pre-trained classifier, and acquire the target facial action unit corresponding to each local feature.
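The type of pre-trained classifier is not stated here. To show only the interface (fit on labelled local features, then map a new local feature to an action unit label), the sketch below swaps in a nearest-centroid classifier written with the standard library; the class name, the training data, and the AU labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class NearestCentroidAU:
    """Stand-in for the pre-trained classifier: a nearest-centroid model that
    maps a local feature vector to a facial action unit label such as "AU12"."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, features: Sequence[Sequence[float]], labels: Sequence[str]) -> "NearestCentroidAU":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v                      # accumulate per-label feature sums
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
        return self

    def predict(self, feature: Sequence[float]) -> str:
        if not self.centroids:
            raise RuntimeError("call fit() first")
        # label whose centroid is closest in Euclidean distance (math.dist, Python 3.8+)
        return min(self.centroids, key=lambda y: math.dist(feature, self.centroids[y]))


clf = NearestCentroidAU().fit(
    [[0, 0], [0, 1], [10, 10], [10, 11]], ["AU4", "AU4", "AU12", "AU12"]
)
print(clf.predict([9, 9]))  # AU12
print(clf.predict([1, 0]))  # AU4
```

In practice any trained model with a comparable predict step (for example an SVM) could take this class's place without changing the surrounding flow.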

The single frame emotion acquisition unit is used to look up the evaluation table based on the target facial action unit corresponding to each local feature, and acquire the single frame emotion of each image to be recognized.
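The contents of the evaluation table are not given in this passage. The sketch below fills it with AU combinations that loosely follow commonly cited EMFACS emotion prototypes, purely for illustration, and assumes a lookup rule that picks the most specific table entry whose action units were all detected, falling back to "neutral". Both the table and the rule are assumptions.

```python
from typing import Iterable

# Hypothetical evaluation table: AU combinations loosely following commonly
# cited EMFACS prototypes; the application's actual table is not given here.
EVALUATION_TABLE = {
    frozenset({"AU6", "AU12"}): "happiness",
    frozenset({"AU1", "AU4", "AU15"}): "sadness",
    frozenset({"AU1", "AU2", "AU5", "AU26"}): "surprise",
    frozenset({"AU4", "AU5", "AU7", "AU23"}): "anger",
    frozenset({"AU9", "AU15", "AU16"}): "disgust",
}


def single_frame_emotion(action_units: Iterable[str], default: str = "neutral") -> str:
    detected = frozenset(action_units)
    # keep table entries whose AUs were all detected in this frame
    matches = [combo for combo in EVALUATION_TABLE if combo <= detected]
    if not matches:
        return default
    # prefer the most specific (largest) matching combination
    return EVALUATION_TABLE[max(matches, key=len)]


print(single_frame_emotion(["AU6", "AU12", "AU25"]))       # happiness
print(single_frame_emotion(["AU4"]))                       # neutral
print(single_frame_emotion(["AU1", "AU2", "AU5", "AU26"])) # surprise
```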

For the specific definition of the image recognition-based product display device, refer to the definition of the image recognition-based product display method above; details are not repeated here. Each module of the device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores the face detection model and the attribute information of the commodity to be displayed. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement an image recognition-based product display method.

In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the steps of the image recognition-based product display method in the above embodiment, for example, steps S10 to S60 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7. Alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units of the image recognition-based commodity display device in the foregoing embodiment, for example, the functions of modules 10 to 60 shown in FIG. 8. To avoid repetition, details are not repeated here.

In an embodiment, one or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the image recognition-based product display method in the above method embodiment, for example, steps S10 to S60 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7. Alternatively, when the computer-readable instructions are executed by a processor, the functions of the modules/units of the image recognition-based commodity display device in the above embodiment are implemented, for example, the functions of modules 10 to 60 shown in FIG. 8. To avoid repetition, details are not repeated here. The readable storage media in this embodiment include non-volatile readable storage media and volatile readable storage media.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a non-volatile readable storage medium or in a volatile readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and brevity of description, the above division into functional units and modules is used only as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of their technical features. Such modifications or replacements, which do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, shall fall within the scope of protection of this application.

Claims

1. A product display method based on image recognition, characterized in that it comprises:

Acquiring target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

Using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, each image cluster set including at least one frame of image to be recognized;

If the number of customers is greater than the preset number, a pre-trained micro-expression recognition model is used to recognize the images to be recognized in each of the image clustering sets, and the single frame emotion of each image to be recognized is obtained;

Obtaining the target emotion of the customer corresponding to the image cluster set based on the single frame emotion of at least one frame of the image to be recognized;

Obtaining the final emotion according to the number of customers and the target emotion of the customer corresponding to each of the image clustering sets;

Obtain the target display product according to the final emotion corresponding to the product to be displayed.
2. The product display method based on image recognition according to claim 1, wherein the acquiring of target video data of the commodity to be displayed, the target video data including at least two frames of images to be recognized, comprises:

Acquiring initial video data of the commodity to be displayed, where the initial video data includes at least two frames of initial video images;

Acquiring attribute information of the commodity to be displayed, where the attribute information includes a suitable age and a suitable gender;

Screening the initial video images using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of the images to be recognized.
3. The product display method based on image recognition according to claim 2, characterized in that, before the step of screening the initial video images using the suitable age and the suitable gender, the product display method based on image recognition further comprises:

Processing at least two frames of the initial video images using super-resolution technology to obtain high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;

The screening of the initial video images using the suitable age and the suitable gender to obtain the images to be recognized comprises:

Recognizing at least two frames of the to-be-determined images using a pre-trained classifier, and obtaining the target age and target gender corresponding to each of the to-be-determined images;

Matching the target age with the appropriate age, and matching the target gender with the appropriate gender;

If the target age is successfully matched with the suitable age, and the target gender is successfully matched with the suitable gender, the image to be determined corresponding to the target age and the target gender is used as the image to be recognized.
4. The product display method based on image recognition according to claim 1, wherein the performing, using a face detection model, of face recognition and clustering on the at least two frames of images to be recognized to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer comprises:

Using a face detection model to perform face recognition on at least two frames of the to-be-recognized images, and obtain a face image corresponding to each of the to-be-recognized images;

Clustering the face images corresponding to the image to be recognized to obtain at least two image cluster sets, each image cluster set including at least one frame of the image to be recognized;

According to the number of the image clustering sets, the number of customers corresponding to the target video data is obtained.
5. The product display method based on image recognition according to claim 4, wherein each of the images to be recognized corresponds to a time stamp;

The clustering of the face images corresponding to the image to be recognized to obtain at least two image cluster sets includes:

Taking, according to the time stamps, the first recognized face image among the at least two frames of images to be recognized as a reference image;

Calculating, in time-stamp order using a similarity algorithm, the feature similarity between the reference image and each of the remaining images to be recognized;

If the feature similarity is greater than the preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold to the same image cluster set as the reference image;

If the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first image among the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be a new reference image, and repeating the step of calculating, in time-stamp order using the similarity algorithm, the feature similarity between the reference image and the remaining images to be recognized, until all of the at least two frames of images to be recognized have been clustered, forming at least two image cluster sets.
6. The product display method based on image recognition according to claim 1, wherein the recognizing, using a pre-trained micro-expression recognition model, of the images to be recognized in each image cluster set to obtain the single-frame emotion of each image to be recognized comprises:

Using a face key point algorithm to perform face recognition on the images to be recognized in each of the image clustering sets, and obtain the face key points corresponding to each of the images to be recognized;

Using a feature extraction algorithm to perform feature extraction on the face key points corresponding to each of the images to be recognized, and obtain the local features corresponding to the face key points;

Use a pre-trained classifier to recognize the local features, and obtain a target facial action unit corresponding to each of the local features;

Based on the target facial action unit corresponding to each of the local features, the evaluation table is searched to obtain the single frame emotion of each of the images to be recognized.
7. A product display device based on image recognition, characterized in that it comprises:

A data acquisition module for acquiring target video data of a commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

An image cluster set acquisition module, configured to perform, using a face detection model, face recognition and clustering on the at least two frames of images to be recognized to acquire the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, each image cluster set including at least one frame of image to be recognized;

A single-frame emotion determination module, configured to, if the number of customers is greater than the preset number, use a pre-trained micro-expression recognition model to recognize the images to be recognized in each image cluster set and obtain the single-frame emotion of each image to be recognized;

A target emotion obtaining module, configured to obtain the target emotion of the customer corresponding to the image cluster set based on the single frame emotion of at least one frame of the image to be recognized;

The final emotion obtaining module is used to obtain the final emotion according to the number of customers and the target emotion of the customer corresponding to each image cluster set;

The target display product obtaining module is configured to obtain the target display product according to the final emotion corresponding to the product to be displayed.
8. The product display device based on image recognition of claim 7, wherein the data acquisition module comprises:

An initial video data acquiring unit, configured to acquire initial video data of a commodity to be displayed, the initial video data including at least two frames of initial video images;

The attribute information determining unit is configured to obtain attribute information of the commodity to be displayed, where the attribute information includes a suitable age and a suitable gender;

The target video data forming unit is configured to filter the initial video image by using the suitable age and the suitable gender, obtain the image to be recognized, and form target video data based on at least two frames of the image to be recognized.
9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:

Acquiring target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

Using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, each image cluster set including at least one frame of image to be recognized;

If the number of customers is greater than the preset number, a pre-trained micro-expression recognition model is used to recognize the images to be recognized in each of the image clustering sets, and the single frame emotion of each image to be recognized is obtained;

Obtaining the target emotion of the customer corresponding to the image cluster set based on the single frame emotion of at least one frame of the image to be recognized;

Obtaining the final emotion according to the number of customers and the target emotion of the customer corresponding to each of the image clustering sets;

Obtain the target display product according to the final emotion corresponding to the product to be displayed.
10. The computer device according to claim 9, wherein the acquiring of target video data of the commodity to be displayed, the target video data including at least two frames of images to be recognized, comprises:

Acquiring initial video data of the commodity to be displayed, where the initial video data includes at least two frames of initial video images;

Acquiring attribute information of the commodity to be displayed, where the attribute information includes a suitable age and a suitable gender;

Screening the initial video images using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of the images to be recognized.
11. The computer device according to claim 10, wherein, before the step of screening the initial video images using the suitable age and the suitable gender, the processor, when executing the computer-readable instructions, further implements the following steps:

Processing at least two frames of the initial video images using super-resolution technology to obtain high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;

The screening of the initial video images using the suitable age and the suitable gender to obtain the images to be recognized comprises:

Recognizing at least two frames of the to-be-determined images using a pre-trained classifier, and obtaining the target age and target gender corresponding to each of the to-be-determined images;

Matching the target age with the appropriate age, and matching the target gender with the appropriate gender;

If the target age is successfully matched with the suitable age, and the target gender is successfully matched with the suitable gender, the image to be determined corresponding to the target age and the target gender is used as the image to be recognized.
12. The computer device according to claim 9, wherein the performing, using a face detection model, of face recognition and clustering on the at least two frames of images to be recognized to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer comprises:

Using a face detection model to perform face recognition on at least two frames of the to-be-recognized images, and obtain a face image corresponding to each of the to-be-recognized images;

Clustering the face images corresponding to the image to be recognized to obtain at least two image cluster sets, each image cluster set including at least one frame of the image to be recognized;

According to the number of the image clustering sets, the number of customers corresponding to the target video data is obtained.
13. The computer device according to claim 12, wherein each of the images to be recognized corresponds to a time stamp;

The clustering of the face images corresponding to the image to be recognized to obtain at least two image cluster sets includes:

Taking, according to the time stamps, the first recognized face image among the at least two frames of images to be recognized as a reference image;

Calculating, in time-stamp order using a similarity algorithm, the feature similarity between the reference image and each of the remaining images to be recognized;

If the feature similarity is greater than the preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold to the same image cluster set as the reference image;

If the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first image among the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be a new reference image, and repeating the step of calculating, in time-stamp order using the similarity algorithm, the feature similarity between the reference image and the remaining images to be recognized, until all of the at least two frames of images to be recognized have been clustered, forming at least two image cluster sets.
14. The computer device according to claim 9, wherein the recognizing, using a pre-trained micro-expression recognition model, of the images to be recognized in each image cluster set to obtain the single-frame emotion of each image to be recognized comprises:

Using a face key point algorithm to perform face recognition on the images to be recognized in each of the image clustering sets, and obtain the face key points corresponding to each of the images to be recognized;

Using a feature extraction algorithm to perform feature extraction on the face key points corresponding to each of the images to be recognized, and obtain the local features corresponding to the face key points;

Use a pre-trained classifier to recognize the local features, and obtain a target facial action unit corresponding to each of the local features;

Based on the target facial action unit corresponding to each of the local features, the evaluation table is searched to obtain the single frame emotion of each of the images to be recognized.
15. One or more readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:

Acquiring target video data of the commodity to be displayed, where the target video data includes at least two frames of images to be recognized;

Using a face detection model to perform face recognition and clustering on the at least two frames of images to be recognized, and obtaining the number of customers corresponding to the target video data and the image cluster set corresponding to each customer, each image cluster set including at least one frame of image to be recognized;

If the number of customers is greater than the preset number, a pre-trained micro-expression recognition model is used to recognize the images to be recognized in each of the image clustering sets, and the single frame emotion of each image to be recognized is obtained;

Obtaining the target emotion of the customer corresponding to the image cluster set based on the single frame emotion of at least one frame of the image to be recognized;

Obtaining the final emotion according to the number of customers and the target emotion of the customer corresponding to each of the image clustering sets;

Obtain the target display product according to the final emotion corresponding to the product to be displayed.
16. The readable storage medium according to claim 15, wherein the acquiring of target video data of a commodity to be displayed, the target video data including at least two frames of images to be recognized, comprises:

Acquiring initial video data of the commodity to be displayed, where the initial video data includes at least two frames of initial video images;

Acquiring attribute information of the commodity to be displayed, where the attribute information includes a suitable age and a suitable gender;

Screening the initial video images using the suitable age and the suitable gender to obtain images to be recognized, and forming the target video data based on at least two frames of the images to be recognized.
17. The readable storage medium according to claim 16, wherein, before the step of screening the initial video images using the suitable age and the suitable gender, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:

Processing at least two frames of the initial video images using super-resolution technology to obtain high-resolution images corresponding to the at least two frames of initial video images, and using the high-resolution images as images to be determined;

The screening of the initial video images using the suitable age and the suitable gender to obtain the images to be recognized comprises:

Recognizing the at least two frames of images to be determined using a pre-trained classifier to obtain the target age and the target gender corresponding to each image to be determined;

Matching the target age against the suitable age, and matching the target gender against the suitable gender;

If the target age successfully matches the suitable age and the target gender successfully matches the suitable gender, using the image to be determined corresponding to that target age and target gender as an image to be recognized.
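The screening flow of the two claims above can be sketched as follows. This is a minimal illustration under stated assumptions: the suitable age is stored as an inclusive age range, and the super-resolution model and the pre-trained age/gender classifier are passed in as callables. AttributeInfo, screen_frames, super_resolve and classify are hypothetical names, not a specific library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class AttributeInfo:
    """Attribute information of the commodity (age as a range is an assumption)."""
    min_age: int
    max_age: int
    gender: str


def screen_frames(
    frames: Sequence[object],
    super_resolve: Callable[[object], object],
    classify: Callable[[object], Tuple[int, str]],
    attrs: AttributeInfo,
) -> List[object]:
    """Return the images to be recognized, in their original frame order."""
    kept = []
    for frame in frames:
        candidate = super_resolve(frame)   # high-resolution image to be determined
        age, gender = classify(candidate)  # target age and target gender
        age_ok = attrs.min_age <= age <= attrs.max_age
        gender_ok = gender == attrs.gender
        if age_ok and gender_ok:           # both attributes must match
            kept.append(candidate)         # becomes an image to be recognized
    return kept
```

Passing the models as callables keeps the matching rule testable on its own; per claim 16, the caller would still need at least two kept frames to form the target video data.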
The readable storage medium according to claim 15, wherein performing face recognition and clustering on the at least two frames of images to be recognized using the face detection model to obtain the number of customers corresponding to the target video data and the image cluster set corresponding to each customer comprises:

Performing face recognition on the at least two frames of images to be recognized using the face detection model to obtain a face image corresponding to each image to be recognized;

Clustering the face images corresponding to the images to be recognized to obtain at least two image cluster sets, each image cluster set including at least one frame of the images to be recognized;

Obtaining the number of customers corresponding to the target video data according to the number of image cluster sets.
19. The readable storage medium according to claim 18, wherein each image to be recognized corresponds to a time stamp;

The clustering of the face images corresponding to the images to be recognized to obtain at least two image cluster sets comprises:

Taking, according to the time stamps, the first image among the at least two frames of images to be recognized in which a face is recognized as a reference image;

Calculating, in time-stamp order, the feature similarity between the reference image and each of the remaining images to be recognized using a similarity algorithm;

If the feature similarity is greater than a preset threshold, assigning the image to be recognized whose feature similarity is greater than the preset threshold to the same image cluster set as the reference image;

If the feature similarity is not greater than the preset threshold, updating, according to the time stamps, the first of the remaining images to be recognized whose feature similarity is not greater than the preset threshold to be a new reference image, and repeating the step of calculating, in time-stamp order, the feature similarity between the reference image and the remaining images to be recognized using the similarity algorithm, until all of the at least two frames of images to be recognized have been clustered, thereby forming at least two image cluster sets.
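The time-stamp-ordered clustering described above behaves like a sequential "leader" clustering. The sketch below assumes each image to be recognized has already been reduced to a face feature vector and uses cosine similarity as the similarity algorithm; the claims name neither, and Frame, cosine_similarity and cluster_by_reference are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    timestamp: float
    feature: Sequence[float]  # assumed face feature vector for this image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_reference(frames: List[Frame], threshold: float) -> List[List[Frame]]:
    """Group frames into image cluster sets, one reference image at a time."""
    remaining = sorted(frames, key=lambda f: f.timestamp)  # time-stamp order
    clusters: List[List[Frame]] = []
    while remaining:
        reference = remaining[0]  # earliest frame not yet clustered
        cluster, leftover = [reference], []
        for frame in remaining[1:]:
            if cosine_similarity(reference.feature, frame.feature) > threshold:
                cluster.append(frame)   # same customer as the reference
            else:
                leftover.append(frame)  # stays in time order
        clusters.append(cluster)
        remaining = leftover  # its first frame becomes the new reference
    return clusters
```

Under the preceding claim, the number of customers would then be len(clusters). Each frame is compared only with its cluster's reference image, not with every cluster member, which is what keeps the procedure a single pass per reference.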
The readable storage medium according to claim 19, wherein recognizing the images to be recognized in each image cluster set using the pre-trained micro-expression recognition model to obtain the single-frame emotion of each image to be recognized comprises:

Performing face recognition on the images to be recognized in each image cluster set using a face key point algorithm to obtain the face key points corresponding to each image to be recognized;

Performing feature extraction on the face key points corresponding to each image to be recognized using a feature extraction algorithm to obtain the local features corresponding to the face key points;

Recognizing the local features using a pre-trained classifier to obtain the target facial action unit corresponding to each local feature;

Looking up the evaluation table based on the target facial action unit corresponding to each local feature to obtain the single-frame emotion of each image to be recognized.
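The final lookup step can be sketched as a mapping from sets of facial action units (AUs) to emotions. The contents of the evaluation table are not given in these claims, so the table below is hypothetical: it uses commonly cited EMFACS-style AU prototypes, and the "most specific match wins, otherwise neutral" rule is likewise an assumption.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical evaluation table: emotion -> AUs that must all be present.
# These prototype combinations are illustrative, not the claimed table.
EVALUATION_TABLE: Dict[str, FrozenSet[int]] = {
    "happiness": frozenset({6, 12}),
    "sadness": frozenset({1, 4, 15}),
    "surprise": frozenset({1, 2, 5, 26}),
    "anger": frozenset({4, 5, 7, 23}),
    "disgust": frozenset({9, 15, 16}),
}


def single_frame_emotion(action_units: Iterable[int]) -> str:
    """Map the target AUs detected in one image to a single-frame emotion."""
    detected = set(action_units)
    best, best_size = "neutral", 0
    for emotion, required in EVALUATION_TABLE.items():
        # An entry matches when all of its AUs were detected; the largest
        # (most specific) matching entry wins.
        if required <= detected and len(required) > best_size:
            best, best_size = emotion, len(required)
    return best
```

Requiring the whole AU set to be present avoids labelling a frame from a single ambiguous unit; frames with no matching entry fall back to "neutral".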