CN116645244A - Dining guiding system and method based on computer vision - Google Patents

Dining guiding system and method based on computer vision Download PDF

Info

Publication number
CN116645244A
CN116645244A
Authority
CN
China
Prior art keywords
restaurant
user
visual
representing
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310613188.6A
Other languages
Chinese (zh)
Inventor
左智彬
宋添光
胡棵
左凌风
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bozz Technology Co ltd
Original Assignee
Shenzhen Bozz Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bozz Technology Co ltd filed Critical Shenzhen Bozz Technology Co ltd
Priority to CN202310613188.6A priority Critical patent/CN116645244A/en
Publication of CN116645244A publication Critical patent/CN116645244A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 Indoor scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dining guidance system and method based on computer vision. Restaurant information is acquired and preprocessed into a data set. Visual features of multiple images from each view of a restaurant are extracted with a pre-trained deep learning model, and the visual feature vectors corresponding to the restaurant's beverages, food, interior views and exterior views are spliced to construct a visual feature matrix. User-related weights are projected onto the visual feature vector, which is reduced in dimension through an embedding matrix to obtain a low-dimensional feature vector. The user's predicted score for the restaurant is obtained, the restaurant's visual features are modeled to obtain restaurant visual information, the user's preference for the restaurant is predicted based on the restaurant visual information and the visual feature matrix, and dining guidance is completed according to that preference, so that personalized dining guidance is effectively provided to the user.

Description

Dining guiding system and method based on computer vision
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a dining guiding system and method based on computer vision.
Background
Vision is one of the most important channels through which humans perceive information; it plays an irreplaceable role in daily life, and visual information often influences the choices people make. Restaurant guidance algorithms typically rely on user-side information, restaurant-side information and the like. Existing guidance systems that do incorporate visual features generally represent an item's visual appearance with the feature vector of a single picture, or with the average- or max-pooled feature vectors of all pictures. In real life, however, most items carry multi-view image information. For restaurant guidance, using only a restaurant's menu pictures captures its food information but lacks visual information about its interior and exterior environments and its beverages, and pooling all of a restaurant's images with an average or maximum cannot fully exploit the differences between image categories, which lowers the accuracy of the dining guidance offered to the user.
Disclosure of Invention
In view of the above, the invention provides a dining guidance system and method based on computer vision that can effectively provide personalized dining guidance to the user, improving both the accuracy of the guidance and the user's dining experience. To solve the technical problems above, the invention is realized by the following technical scheme.
In a first aspect, the present invention provides a computer vision based dining guidance system comprising:
the data acquisition unit is used for acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking the image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
the feature extraction unit is used for extracting visual features of a plurality of images in each view angle of the restaurant through a pre-trained deep learning model, and aggregating the visual features of all the images in each category into a visual feature vector through average pooling;
the characteristic splicing unit is used for splicing the visual characteristic vectors corresponding to the beverage, food, internal scene and external scene of the restaurant to construct a visual characteristic matrix, wherein the visual characteristic matrix represents the multi-view visual characteristics of the restaurant;
the model construction unit is used for projecting the user-related weights of the visual feature matrix onto the visual feature vector, reducing the dimension of the restaurant visual feature vector through an embedding matrix to obtain a low-dimensional feature vector, and combining the low-dimensional feature vector with the restaurant's implicit factor to construct a prediction model;
the dining guiding unit is used for obtaining the prediction score of the user on the restaurant, introducing the hierarchical attention to model the visual characteristics of the restaurant to obtain the visual information of the restaurant, predicting the preference of the user on the restaurant based on the visual information of the restaurant and the visual characteristic matrix, and guiding according to the preference of the user on the restaurant to complete dining guiding.
As a further preferred aspect of the above-described technical solution, the hierarchical attention comprises an upper category attention layer and a lower image attention layer. Each image category in the image attention layer takes several image visual features as input; by computing image-level attention, all visual features under each image category are integrated into one vector representing that category's visual features, which expresses the restaurant's visual information under the category. The category attention layer then integrates the visual feature vectors of all images into the restaurant's visual information expression.
As a further preferable mode of the technical scheme, obtaining the user's predicted score for the restaurant and introducing hierarchical attention to model the restaurant's visual features to obtain restaurant visual information comprises the following steps:
modeling user preferences among different images under the same image category by adding an image-level attention mechanism, whose expression is

$$u_{ict} = \mathrm{ReLU}(W_{\alpha} f_{ict} + b_{\alpha}), \qquad \alpha_{ict} = \frac{\exp(u_{ict}^{\top} u_{\alpha})}{\sum_{t'} \exp(u_{ict'}^{\top} u_{\alpha})}, \qquad f_{ic} = \sum_{t} \alpha_{ict} f_{ict}$$

where $W_{\alpha}$ denotes a transformation matrix and $b_{\alpha}$ a bias term. For any restaurant $i$, $f_{ict}$ denotes the visual feature of the $t$-th image belonging to category $c$; a fully connected layer with ReLU as the activation function maps $f_{ict}$ to the implicit representation $u_{ict}$. A jointly learned context vector $u_{\alpha}$ measures the importance of each image's visual feature: the inner product of $u_{ict}$ and $u_{\alpha}$ gives the score of $u_{ict}$, i.e. its importance under the current category, and $f_{ic}$ denotes the attention-weighted sum of all visual features of the category.
As a further preferable mode of the technical scheme, the expression of the category-level attention mechanism is

$$u_{ic} = \mathrm{ReLU}(W_{\beta} f_{ic} + b_{\beta}), \qquad \beta_{ic} = \frac{\exp(u_{ic}^{\top} u_{\beta})}{\sum_{c'} \exp(u_{ic'}^{\top} u_{\beta})}, \qquad f_{i} = \sum_{c} \beta_{ic} f_{ic}$$

where $W_{\beta}$ denotes a transformation matrix, $b_{\beta}$ a bias term, $f_{i}$ the restaurant visual feature obtained through the hierarchical attention layers, and $u_{\beta}$ the category-level context vector. The obtained restaurant visual feature is used as part of the restaurant's latent factors to perform collaborative filtering with the user's latent factors.
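The two attention levels described above can be sketched in NumPy as follows. The dimensions, random initialization, and helper names (`attention_pool`, `softmax`) are illustrative assumptions, not the patent's actual implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(feats, W, b, u_ctx):
    """Attend over a set of feature vectors and return their weighted sum.

    feats: (n, d) visual features; W: (d, d) transform; b: (d,) bias;
    u_ctx: (d,) learned context vector. All shapes are toy assumptions.
    """
    hidden = np.maximum(0.0, feats @ W.T + b)  # u = ReLU(W f + b)
    weights = softmax(hidden @ u_ctx)          # attention over the set
    return weights @ feats                     # weighted sum of features

rng = np.random.default_rng(0)
d = 8
# Toy restaurant: 4 categories (drinks, food, interior, exterior), 3 images each.
categories = [rng.normal(size=(3, d)) for _ in range(4)]

W_a, b_a, u_a = rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d)
W_b, b_b, u_b = rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d)

# Image-level attention: one vector f_ic per category.
f_ic = np.stack([attention_pool(f, W_a, b_a, u_a) for f in categories])
# Category-level attention: one visual vector f_i for the restaurant.
f_i = attention_pool(f_ic, W_b, b_b, u_b)
print(f_i.shape)  # (8,)
```

The same pooling function serves both levels because each level is the same operation over a different set of inputs (images within a category, then the category vectors themselves).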
As a further preferred aspect of the above technical solution, projecting the user-related weights of the visual feature matrix onto the visual feature vector, and reducing the dimension of the restaurant visual feature vector through the embedding matrix to obtain a low-dimensional feature vector, comprises:
mapping the real scoring vectors of the user and the restaurant to their respective implicit factors, learned in a low-dimensional space through multi-layer fully connected networks, and enhancing the restaurant-side information with visual features by adding the visual factor $f_i$ to the restaurant implicit factor $p_i$;
cosine similarity between the user-side features and the restaurant-side features is used as the model's prediction function, whose expression is

$$\hat{y}_{u,i} = \cos(q_u,\; p_i + f_i) = \frac{q_u^{\top}(p_i + f_i)}{\lVert q_u \rVert\,\lVert p_i + f_i \rVert}$$

where $\hat{y}_{u,i}$ denotes the model's predicted score of user $u$ for restaurant $i$, $p_i$ and $q_u$ denote the implicit scoring factors of restaurant $i$ and user $u$ respectively, and $f_i$ denotes the restaurant visual factor learned by the attention module. The restaurant side is represented by the sum of $p_i$ and $f_i$, and the user's preference for the restaurant is predicted by the cosine similarity between the restaurant factor $(p_i + f_i)$ and the user scoring factor $q_u$. The implicit scoring vectors $p_i$ and $q_u$ of the restaurant and the user are mapped to the low-dimensional space by two multi-layer fully connected networks, $FC_{it}$ and $FC_{us}$, whose inputs are the scoring vectors $r_i$ and $r_u$ extracted from the user-restaurant interaction matrix $R$: $r_i$ denotes the vector of all users' scores for the $i$-th restaurant, and $r_u$ denotes user $u$'s scoring vector over all restaurants. A multi-layer fully connected network is likewise used to reduce the dimension of the restaurant visual factor $f_i$.
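As a rough sketch of the prediction function cos(q_u, p_i + f_i) described above, with toy 3-dimensional factors (the values and dimension are assumptions for illustration):

```python
import numpy as np

def predict_score(q_u, p_i, f_i):
    """Predicted score = cosine similarity between the user factor q_u
    and the restaurant factor (p_i + f_i)."""
    r = p_i + f_i
    return float(q_u @ r / (np.linalg.norm(q_u) * np.linalg.norm(r)))

q_u = np.array([1.0, 0.0, 1.0])   # user implicit scoring factor (toy values)
p_i = np.array([0.5, 0.2, 0.5])   # restaurant implicit factor
f_i = np.array([0.5, -0.2, 0.5])  # restaurant visual factor from attention

print(round(predict_score(q_u, p_i, f_i), 4))  # → 1.0
```

Here the visual factor cancels the implicit factor's second component, so the restaurant factor aligns perfectly with the user factor and the score reaches the cosine maximum of 1.0.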
As a further preferable aspect of the above-described aspect, the model predicting the user's preference for a restaurant by calculating the cosine similarity between the restaurant factor $(p_i+f_i)$ and the user scoring factor $q_u$ comprises:
presetting that one vector in an $n$-dimensional space represents one user; for two selected users $u$ and $v$, the cosine similarity between the two user vectors is

$$\mathrm{sim}(u,v) = \cos(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}^{\top}\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}$$

The rating-scale difference between different users is balanced by subtracting each user's mean score, giving

$$\mathrm{sim}(u,v) = \frac{\sum_{c \in L_{u,v}} (R_{u,c}-\bar{R}_u)(R_{v,c}-\bar{R}_v)}{\sqrt{\sum_{c \in L_u}(R_{u,c}-\bar{R}_u)^2}\;\sqrt{\sum_{c \in L_v}(R_{v,c}-\bar{R}_v)^2}}$$

where $L_{u,v}$ denotes the set of restaurants rated by both user $u$ and user $v$, $L_u$ and $L_v$ denote the sets of restaurants rated individually by users $u$ and $v$, $R_{u,c}$ and $R_{v,c}$ denote the scores of users $u$ and $v$ for restaurant $c$, and $\bar{R}_u$, $\bar{R}_v$ denote the mean scores of the restaurants rated by users $u$ and $v$ respectively.
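A minimal, pure-Python sketch of the mean-centered similarity above; the dict-based data layout is an assumption for illustration:

```python
import math

def mean_centered_sim(ratings_u, ratings_v):
    """Similarity of two users with each user's mean rating subtracted.

    ratings_u / ratings_v: dicts restaurant -> score. The numerator runs
    over co-rated restaurants L_{u,v}; each norm runs over that user's
    own rated set, matching the formula in the text.
    """
    mu = sum(ratings_u.values()) / len(ratings_u)
    mv = sum(ratings_v.values()) / len(ratings_v)
    common = ratings_u.keys() & ratings_v.keys()
    num = sum((ratings_u[c] - mu) * (ratings_v[c] - mv) for c in common)
    den_u = math.sqrt(sum((s - mu) ** 2 for s in ratings_u.values()))
    den_v = math.sqrt(sum((s - mv) ** 2 for s in ratings_v.values()))
    return num / (den_u * den_v) if den_u and den_v else 0.0

u = {"a": 5, "b": 3, "c": 4}  # user u's ratings (toy data)
v = {"a": 4, "b": 2, "d": 3}  # user v: same taste, shifted one star down
print(round(mean_centered_sim(u, v), 3))  # → 1.0
```

Because v's ratings are u's shifted down by one star, the raw scores differ everywhere, yet mean-centering reveals the identical preference pattern and the similarity is 1.0.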
As a further preferable aspect of the above technical solution, the restaurant score prediction model has the expression

$$\hat{y}_{u,i} = F(u, i \mid \theta)$$

where $\hat{y}_{u,i}$ denotes the predicted score of user $u$ for restaurant $i$, $\theta$ denotes the model parameters, and $F$ denotes the function mapping parameters to predicted scores. Two latent factors $q_u$ and $p_i$ represent user $u$ and restaurant $i$, and are expressed as

$$q_u = FC_{us}(r_u), \qquad p_i = FC_{it}(r_i)$$

where $FC_{us}$ and $FC_{it}$ denote the fully connected networks of user $u$ and restaurant $i$ respectively, and $\sigma$ denotes a single-layer fully connected network used in place of $r_i$ to keep it consistent with $f_i$. The cosine similarity between $q_u$ and $p_i$ is used to calculate the score that user $u$ is predicted to give restaurant $i$.
As a further preferred aspect of the above technical solution, splicing the visual feature vectors corresponding to the beverage, the food, the internal view and the external view of the restaurant to construct the visual feature matrix includes:
the expression of the restaurant guidance model with multi-view visual information is

$$\hat{y}_{u,i} = \alpha + \beta_u + \beta_i + \gamma_u^{\top}\gamma_i + \theta_u^{\top} E\, F_i\, \omega_u$$

where $\alpha$ denotes the global offset, $\beta_u$ and $\beta_i$ denote the bias terms of the user and the restaurant, $\gamma_u$ and $\gamma_i$ denote the latent factor vectors of the user and the restaurant respectively, $\theta_u$ denotes the visual factor of user $u$, $E$ denotes the embedding matrix, the matrix $F_i$ represents the visual features of restaurant $i$, and $\omega_u$ denotes a $4 \times 1$ weight vector corresponding to the four visual views of the restaurant. $A$ denotes the overall bias weight of the multi-view visual feature and $W$ denotes the category visual preference matrix of all users; the product of $A$ and $W$ represents the users' overall preference for the preset restaurant's multi-view visual appearance;
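Under the assumption that the model combines bias terms, latent factors, and an embedded, per-user-weighted fusion of the four visual views (a VBPR-style form consistent with the symbols defined above), a toy scoring sketch is:

```python
import numpy as np

def multiview_score(alpha, beta_u, beta_i, gamma_u, gamma_i, theta_u, E, F_i, omega_u):
    """Score = global offset + user/restaurant biases + latent interaction
    + user visual factor applied to the embedded, view-weighted features."""
    visual = theta_u @ (E @ (F_i @ omega_u))  # fuse 4 views, embed, project
    return alpha + beta_u + beta_i + gamma_u @ gamma_i + visual

rng = np.random.default_rng(1)
d_vis, d_emb, d_lat = 16, 6, 5                 # toy dimensions (assumed)
F_i = rng.normal(size=(d_vis, 4))              # columns: drinks, food, interior, exterior
omega_u = np.array([0.1, 0.5, 0.3, 0.1])       # this user weights food most
E = rng.normal(size=(d_emb, d_vis))            # shared embedding matrix
theta_u = rng.normal(size=d_emb)               # user visual factor
gamma_u, gamma_i = rng.normal(size=d_lat), rng.normal(size=d_lat)

s = multiview_score(0.1, 0.05, -0.02, gamma_u, gamma_i, theta_u, E, F_i, omega_u)
print(np.isfinite(s))  # a single real-valued score
```

The per-user weight vector `omega_u` is what lets the same four-view feature matrix produce different visual contributions for different users.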
training the model with the maximum a posteriori estimation of Bayesian analysis, presetting a training set $D_S$ consisting of triplets $(u,i,j)$ with the expression

$$D_S = \{(u,i,j) \mid u \in U,\; i \in I_u^{+},\; j \in I \setminus I_u^{+}\}$$

where $u$ denotes the user, $i$ a restaurant the user has visited, $j$ an unknown restaurant, and $I_u^{+}$ the set of all positive-sample restaurants of user $u$. Matrix factorization is used to predict user preferences; the predicted value is denoted $\hat{y}_{uij}$, and the prediction model's expression is

$$\hat{y}_{uij} = \hat{y}_{ui} - \hat{y}_{uj} = (\beta_i - \beta_j) + \gamma_u^{\top}\gamma_{ij} + \theta_u^{\top} E\, F_{ij}\, \omega_u$$

where $\gamma_{ij}$ denotes the difference between $\gamma_i$ and $\gamma_j$, and $F_{ij}$ denotes the difference between $F_i$ and $F_j$. The expression of the personalized ranking optimization criterion $C$ is

$$C = \sum_{(u,i,j)\in D_S} \ln \sigma(\hat{y}_{uij}) - \lambda_{\theta}\lVert\theta\rVert^{2}$$

where $\sigma$ denotes the sigmoid function and $\lambda_{\theta}$ denotes the regularization hyper-parameter to be tuned.
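The pairwise ranking criterion C can be illustrated with a small sketch; the toy scores and the dict-based data layout are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_criterion(scores, triplets, lam, theta_sq_norm):
    """Personalized ranking criterion C = sum ln sigma(y_ui - y_uj) - lam*||theta||^2.

    scores: dict (user, restaurant) -> predicted score y_ui;
    triplets: iterable of (u, i, j) with i a visited and j an unknown restaurant.
    """
    c = sum(np.log(sigmoid(scores[(u, i)] - scores[(u, j)])) for u, i, j in triplets)
    return c - lam * theta_sq_norm

# Toy scores: each visited restaurant already outranks the unknown one.
scores = {("u1", "a"): 2.0, ("u1", "z"): 0.5,
          ("u2", "b"): 1.0, ("u2", "z"): -1.0}
triplets = [("u1", "a", "z"), ("u2", "b", "z")]
c = bpr_criterion(scores, triplets, lam=0.01, theta_sq_norm=3.0)
print(c < 0.0)  # → True: ln sigma(.) is always negative
```

Maximizing C pushes each visited restaurant's score above the sampled unknown restaurant's score while the regularizer keeps the parameters small.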
As a further preferred aspect of the above technical solution, acquiring restaurant information and performing data preprocessing on the restaurant information to obtain a data set, includes:
expressing restaurants and their attributes as semantic vectors, and calculating the user's preference vector by computing the weights of the different attribute values of the restaurants the user has historically liked and integrating over those restaurants. Presetting $D_{ul}$ to represent the set of restaurants the user historically liked, and for a restaurant $d \in D_{ul}$, letting $h$ denote the restaurant entity corresponding to restaurant $d$ in the knowledge graph, the user dining attribute triplet expression is

$$B_u = \{(h,r,t) \mid h \in D_{ul}\}$$

where $(h, r, t)$ denotes a triplet with restaurant entity $h$ as the head entity, $r$ denotes an attribute of the restaurant entity, and $t$ denotes an attribute value;
calculating the weights of the different attribute features of the restaurants in the user's liked history as the user's attention to those attributes; the weight of attribute $r_i$ of restaurant $d$ is calculated as

$$W_{r_i} = \frac{\exp\!\left(h^{\top} M_h\, r_i\right)}{\sum_{(h,r,t)\in B_u} \exp\!\left(h^{\top} M_h\, r\right)}$$

where $h$ and $r_i$ respectively denote the entity vector and attribute-value vector corresponding to restaurant $d$ obtained by training, and $M_h$ denotes the projection matrix generated by the entity $h$ and its relation;
calculating the user's dining interest vector based on the attribute weights and attribute-value vectors: after computing the weight of each attribute in the user's restaurant attribute triplets, all attribute-value vectors of the user's liked restaurants are weighted and summed to represent the user's dining preference vector, whose expression is

$$p_u = \sum_{i} W_{r_i}\, t_i$$

where $t_i$ denotes the restaurant attribute-value vector generated by training and $W_{r_i}$ denotes a certain attribute weight of the restaurant.
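A small NumPy sketch of the attribute attention and weighted sum above; the projection-matrix name `M` and all dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dining_preference(h, M, attrs, values):
    """Weight each attribute by attention exp(h^T M r) normalized over all
    attributes, then return the weighted sum of the value vectors t_i."""
    logits = np.array([h @ (M @ r) for r in attrs])
    w = softmax(logits)            # W_{r_i}: attention to each attribute
    return w @ np.stack(values)    # p_u = sum_i W_{r_i} * t_i

rng = np.random.default_rng(2)
d = 6
h = rng.normal(size=d)                          # restaurant entity embedding
M = rng.normal(size=(d, d))                     # projection matrix (assumed name)
attrs = [rng.normal(size=d) for _ in range(3)]  # attribute embeddings r_i
values = [rng.normal(size=d) for _ in range(3)] # attribute-value vectors t_i

p_u = dining_preference(h, M, attrs, values)
print(p_u.shape)  # (6,)
```

The softmax normalization guarantees the attribute weights sum to one, so the preference vector stays in the convex hull of the attribute-value vectors.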
In a second aspect, the invention also provides a dining guiding method based on computer vision, which comprises the following specific steps:
acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
extracting visual features of a plurality of images in each view angle of a restaurant through a pre-trained deep learning model, and aggregating the visual features of all images in each category into a visual feature vector through average pooling;
splicing visual feature vectors corresponding to beverages, foods, internal scenes and external scenes of the restaurant to construct a visual feature matrix, wherein the visual feature matrix represents multi-view visual features of the restaurant;
projecting the user-related weights of the visual feature matrix onto the visual feature vector, reducing the dimension of the restaurant visual feature vector through an embedding matrix to obtain a low-dimensional feature vector, and combining the low-dimensional feature vector with the restaurant's implicit factor to construct a prediction model;
the method comprises the steps of obtaining a prediction score of a user on a restaurant, introducing hierarchical attention to model visual features of the restaurant to obtain restaurant visual information, predicting preference of the user on the restaurant based on the restaurant visual features and a visual feature matrix, and guiding according to the preference of the user on the restaurant to complete dining guiding.
The invention provides a dining guidance system and method based on computer vision. Restaurant information is acquired and preprocessed into a data set, and the image categories in the data set are labeled as beverages, food, interior views and exterior views. Visual features of multiple images from each view of a restaurant are extracted with a pre-trained deep learning model, and the visual features of all images in each category are aggregated into one visual feature vector by average pooling. The visual feature vectors corresponding to the restaurant's beverages, food, interior views and exterior views are spliced to construct a visual feature matrix. User-related weights are projected onto the visual feature vector, the restaurant visual feature vector is reduced in dimension through an embedding matrix to obtain a low-dimensional feature vector, and the low-dimensional feature vector is combined with the restaurant's implicit factor to construct a prediction model. The user's predicted score for the restaurant is obtained, hierarchical attention is introduced to model the restaurant's visual features and obtain restaurant visual information, the user's preference for the restaurant is predicted based on the restaurant visual information and the visual feature matrix, and dining guidance is completed according to that preference, improving both the accuracy of the guidance and the user's dining experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a block diagram of a dining guidance system based on computer vision provided by the invention;
fig. 2 is a flowchart of a dining guiding method based on computer vision provided by the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1, the present invention provides a dining guiding system based on computer vision, comprising:
the data acquisition unit is used for acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking the image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
the feature extraction unit is used for extracting visual features of a plurality of images in each view angle of the restaurant through a pre-trained deep learning model, and aggregating the visual features of all the images in each category into a visual feature vector through average pooling;
the characteristic splicing unit is used for splicing the visual characteristic vectors corresponding to the beverage, food, internal scene and external scene of the restaurant to construct a visual characteristic matrix, wherein the visual characteristic matrix represents the multi-view visual characteristics of the restaurant;
the model construction unit is used for projecting the user-related weights of the visual feature matrix onto the visual feature vector, reducing the dimension of the restaurant visual feature vector through an embedding matrix to obtain a low-dimensional feature vector, and combining the low-dimensional feature vector with the restaurant's implicit factor to construct a prediction model;
the dining guiding unit is used for obtaining the prediction score of the user on the restaurant, introducing the hierarchical attention to model the visual characteristics of the restaurant to obtain the visual information of the restaurant, predicting the preference of the user on the restaurant based on the visual information of the restaurant and the visual characteristic matrix, and guiding according to the preference of the user on the restaurant to complete dining guiding.
In this embodiment, the hierarchical attention comprises an upper category attention layer and a lower image attention layer. Each image category in the image attention layer takes several image visual features as input; by computing image-level attention, all visual features under each image category are integrated into one vector representing that category's visual features, which expresses the restaurant's visual information under the category, and the category attention layer then integrates the visual feature vectors of all images into the restaurant's visual information expression. Obtaining the user's predicted score for the restaurant and introducing hierarchical attention to model the restaurant's visual features to obtain restaurant visual information comprises: modeling user preferences among different images under the same image category by adding an image-level attention mechanism, whose expression is

$$u_{ict} = \mathrm{ReLU}(W_{\alpha} f_{ict} + b_{\alpha}), \qquad \alpha_{ict} = \frac{\exp(u_{ict}^{\top} u_{\alpha})}{\sum_{t'} \exp(u_{ict'}^{\top} u_{\alpha})}, \qquad f_{ic} = \sum_{t} \alpha_{ict} f_{ict}$$

where $W_{\alpha}$ denotes a transformation matrix and $b_{\alpha}$ a bias term. For any restaurant $i$, $f_{ict}$ denotes the visual feature of the $t$-th image belonging to category $c$; a fully connected layer with ReLU as the activation function maps $f_{ict}$ to the implicit representation $u_{ict}$, the jointly learned context vector $u_{\alpha}$ measures the importance of each image's visual feature through its inner product with $u_{ict}$, which gives the score of $u_{ict}$, i.e. its importance under the current category, and $f_{ic}$ denotes the attention-weighted sum of all visual features of the category.
It should be noted that a scoring matrix $R \in \mathbb{R}^{M \times N}$ is preset to represent the user-restaurant scoring matrix, where $M$ and $N$ denote the numbers of users and restaurants respectively, and $u$ and $i$ index users and restaurants. $R_{u,i}$ takes the numerical rating of restaurant $i$ by user $u$; if user $u$ has not interacted with restaurant $i$, then $R_{u,i} = 0$. The column vector and row vector $r_u$ and $r_i$ are extracted from the scoring matrix: $r_u$ represents user $u$'s scores for all restaurants, $r_i$ represents all users' scores for restaurant $i$, and $f_i$ denotes the visual feature of restaurant $i$ learned by the hierarchical attention module. The model's input is the user-restaurant scoring matrix and the multi-view visual features corresponding to each restaurant; its output is the user's predicted score for the restaurant, i.e. the cosine similarity between the user-side and restaurant-side hidden vectors. In the image attention layer, each image category, such as food, drinks, restaurant interior views and restaurant exterior views, has several image visual features as input; by computing image-level attention, the model integrates all visual features under each image category into one vector representing that category's visual information, which expresses the restaurant's visual information under the category. At the category attention layer, the model calculates the importance of the different categories, integrates the visual feature vectors of all images into the restaurant's visual feature expression, combines the obtained restaurant visual information with the deep matrix factorization network, and predicts the user's preference for the restaurant.
It should be understood that if a restaurant has $t$ image categories, each with a different number of images, then after the images' visual features are extracted, the feature vectors are clustered into $k$ clusters by computing Euclidean distances and the visual feature vector closest to each cluster center is retained, so that $k$ visual feature vectors remain after each image category is sampled. With this strategy, every image category contains a fixed number of images for all restaurants, redundant information is excluded, and the visual feature of each restaurant has a fixed size. The images' visual features are extracted with a deep convolutional network and integrated into a collaborative filtering framework, and the multi-view visual features are fused through user weights: the user-related weights reflect each user's personalized restaurant visual preferences and are distinct and independent across users. The model is applied to two real restaurant review data sets to provide personalized guidance for users. Images of different categories serve as auxiliary information for guidance, and a self-attention mechanism can analyze, layer by layer, the user's visual attention distribution within a single image, between images, and between image categories, thereby improving the accuracy of the dining guidance offered to the user.
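The clustering-based sampling described above can be sketched as plain k-means over Euclidean distance, keeping the actual feature closest to each center. The passage does not name a specific clustering algorithm, so the k-means details here are an assumption:

```python
import numpy as np

def sample_k_representatives(feats, k, iters=10, seed=0):
    """Cluster feature vectors into k clusters (Euclidean k-means) and
    keep the real feature vector nearest to each cluster center."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest center, then recompute means.
        d = np.linalg.norm(feats[:, None] - centers[None, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(axis=0)
    # Keep the original feature closest to each final center.
    d = np.linalg.norm(feats[:, None] - centers[None, :], axis=2)
    return np.stack([feats[d[:, c].argmin()] for c in range(k)])

rng = np.random.default_rng(3)
# A category with 12 images of 8-dim features, downsampled to k = 3.
feats = rng.normal(size=(12, 8))
kept = sample_k_representatives(feats, k=3)
print(kept.shape)  # (3, 8)
```

Returning nearest-to-center features rather than the centers themselves keeps every retained vector an actual extracted image feature, which matches the retention step described in the text.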
Optionally, the expression of the category-level attention mechanism is

$$u_{ic} = \mathrm{ReLU}(W_{\beta} f_{ic} + b_{\beta}), \qquad \beta_{ic} = \frac{\exp(u_{ic}^{\top} u_{\beta})}{\sum_{c'} \exp(u_{ic'}^{\top} u_{\beta})}, \qquad f_{i} = \sum_{c} \beta_{ic} f_{ic}$$

where $W_{\beta}$ denotes a transformation matrix, $b_{\beta}$ a bias term, $f_{i}$ the restaurant visual feature obtained through the hierarchical attention layers, and $u_{\beta}$ the category-level context vector. The obtained restaurant visual feature is used as part of the restaurant's latent factors to perform collaborative filtering with the user's latent factors.
In this embodiment, projecting the user-related weights of the visual feature matrix onto the visual feature vector and reducing the dimension of the restaurant visual feature vector through the embedding matrix to obtain a low-dimensional feature vector includes: mapping the real scoring vectors of the user and the restaurant, through multi-layer fully connected networks, to implicit factors of the user and the restaurant learned in a low-dimensional space, and enhancing the restaurant-side information with visual features by adding the visual factor f_i to the restaurant implicit factor p_i; the cosine similarity between the user-side features and the restaurant-side features is used as the prediction function of the model, whose expression is

R̂_u,i = cos(p_i + f_i, q_u) = ((p_i + f_i) · q_u) / (‖p_i + f_i‖ · ‖q_u‖),

wherein R̂_u,i represents the model's predicted score of user u for restaurant i, p_i and q_u respectively represent the implicit scoring factors of restaurant i and user u, and f_i represents the restaurant visual factor learned by the attention module. The restaurant side is expressed as the sum of p_i and f_i, and the model predicts the user's preference for the restaurant by calculating the cosine similarity between the restaurant factor (p_i + f_i) and the user scoring factor q_u. The implicit scoring vectors p_i and q_u of the restaurant and the user are mapped to the low-dimensional space by two multi-layer fully connected networks FC_it and FC_us respectively, whose inputs are the scoring vectors r_i and r_u extracted from the user-restaurant interaction matrix R: r_i represents the vector composed of all users' scores for the i-th restaurant, and r_u represents the scoring vector of user u for all restaurants. A multi-layer fully connected network is also used to reduce the dimension of the restaurant visual factor f_i.
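The prediction function itself is compact enough to show directly; the following numpy sketch (function name assumed) computes the cosine similarity between the restaurant-side vector p_i + f_i and the user-side vector q_u:

```python
import numpy as np

def predict_score(p_i, f_i, q_u):
    """Prediction function of the model: cosine similarity between the
    restaurant-side vector (p_i + f_i) and the user-side vector q_u.
    All three arguments are 1-D arrays of the same dimension."""
    r = p_i + f_i  # restaurant implicit factor enhanced by the visual factor
    return float(r @ q_u / (np.linalg.norm(r) * np.linalg.norm(q_u)))
```

In the full model, p_i, q_u, and f_i would come from the fully connected networks described above; here they are plain vectors so the scoring step can be seen in isolation.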
The model calculating the cosine similarity between the restaurant factor (p_i + f_i) and the user scoring factor q_u to predict the user's preference for the restaurant includes: presetting one vector in an n-dimensional space to represent one user, selecting two users u and v, and calculating the similarity between the two user vectors with the cosine expression

sim(u, v) = Σ_{c∈L_u,v} R_u,c · R_v,c / (√(Σ_{c∈L_u} R_u,c²) · √(Σ_{c∈L_v} R_v,c²)),

and balancing the rating differences between different users by subtracting each user's mean score, giving

sim(u, v) = Σ_{c∈L_u,v} (R_u,c − R̄_u)(R_v,c − R̄_v) / (√(Σ_{c∈L_u} (R_u,c − R̄_u)²) · √(Σ_{c∈L_v} (R_v,c − R̄_v)²)),

wherein L_u,v represents the set of restaurants scored by both user u and user v, L_u and L_v represent the sets of restaurants rated individually by users u and v, R_u,c and R_v,c represent the scores of user u and user v for restaurant c, and R̄_u and R̄_v respectively represent the average scores given by user u and user v to the restaurants they have rated, which enhances the timeliness and effectiveness of the dining guidance system.
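The mean-centered similarity above can be sketched in a few lines of numpy; this is an illustrative implementation (function name and the 0-means-unrated convention are assumptions) over a user-restaurant rating matrix:

```python
import numpy as np

def adjusted_cosine(R, u, v):
    """Similarity between users u and v over the restaurants both have
    rated, after subtracting each user's own mean rating. R is a
    (users, restaurants) array where 0 marks an unrated restaurant."""
    ru, rv = R[u], R[v]
    both = (ru > 0) & (rv > 0)          # restaurants rated by both users
    if not both.any():
        return 0.0
    mu = ru[ru > 0].mean()              # mean over restaurants u rated
    mv = rv[rv > 0].mean()
    du, dv = ru[both] - mu, rv[both] - mv
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return float(du @ dv / denom) if denom else 0.0
```

Subtracting each user's own mean keeps a generous rater and a strict rater comparable, which is the "balancing the rating differences" step in the text.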
Alternatively, the expression of the restaurant score prediction model is

R̂_u,i = F(u, i | θ),

wherein R̂_u,i represents the predicted score of user u for restaurant i, θ represents the model parameters, and F represents the function mapping the parameters to predicted scores. Two latent factors q_u and p_i represent user u and restaurant i, expressed as

q_u = FC_us(r_u), p_i = FC_it(σ(r_i)),

wherein FC_us and FC_it respectively represent the fully connected layers of user u and restaurant i, and σ represents a single-layer fully connected network that transforms r_i so that its dimension is consistent with f_i; the cosine similarity between q_u and p_i is used to calculate the predicted score of user u for restaurant i.
In this embodiment, splicing the visual feature vectors corresponding to the beverage, food, internal view and external view of the restaurant to construct the visual feature matrix includes: the expression of the restaurant guidance model for multi-view visual information is

R̂_u,i = α + β_u + β_i + γ_uᵀ γ_i + θ_uᵀ E (F_iᵀ ω_u),

wherein α represents the global offset, β_u and β_i represent the deviation terms of the user and the restaurant, γ_u and γ_i represent the latent factor vectors of the user and the restaurant respectively, θ_u represents the visual factor of user u, E represents the embedding matrix, the matrix F_i is used to represent the visual features of restaurant i, and ω_u represents a 4 × 1 weight vector corresponding to the four visual views of the restaurant; A represents the overall bias weight of the multi-view visual features, W represents the category visual preference matrix of all users, and the product of A and W represents the users' overall preference for the multi-view visual appearance of the preset restaurants. The model is trained by using maximum a posteriori estimation of Bayesian analysis, wherein the preset training set D_s composed of triplets (u, i, j) has the expression

D_s = {(u, i, j) | u ∈ U, i ∈ I_u⁺, j ∈ I \ I_u⁺},

wherein u represents the user, i represents a restaurant the user has visited, j represents an unknown restaurant, and I_u⁺ represents the set of all positive-sample restaurants of user u. Matrix factorization is used to predict the user's preference, denoted R̂_u,i,j, and the expression of the prediction model is

R̂_u,i,j = R̂_u,i − R̂_u,j = β_i − β_j + γ_uᵀ γ_ij + θ_uᵀ E (F_ijᵀ ω_u),

wherein γ_ij represents the difference between γ_i and γ_j, and F_ij represents the difference between F_i and F_j. The expression of the personalized ranking optimization criterion C is

C = Σ_{(u,i,j)∈D_s} ln σ(R̂_u,i,j) − λ_θ ‖θ‖²,

wherein σ represents the sigmoid function and λ_θ represents a regularization hyper-parameter to be adjusted.
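The personalized ranking criterion is the standard BPR objective, which can be sketched in numpy; this is an illustrative fragment (function names assumed) computing the criterion from the score differences of (u, i, j) triplets:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_criterion(scores_pos, scores_neg, params, lam=0.01):
    """BPR-style personalized ranking criterion: the sum of
    ln sigma(x_ui - x_uj) over (u, i, j) triplets, minus an L2
    regularizer over the model parameter arrays in `params`."""
    diff = scores_pos - scores_neg          # x_uij for each sampled triplet
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return float(np.sum(np.log(sigmoid(diff))) - reg)
```

Training maximizes this criterion, so the model is pushed to score a visited restaurant i above an unknown restaurant j for the same user, while the λ term keeps the parameters small.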
It should be noted that obtaining restaurant information and performing data preprocessing on the restaurant information to obtain a data set includes: expressing restaurants and their attributes as semantic vectors, and calculating the user's preference vector by computing the weights of the different attribute values of the restaurants and integrating over the restaurants the user has historically liked. Let D_ul represent the set of restaurants the user has historically liked; for a restaurant d ∈ D_ul, h represents the restaurant entity corresponding to restaurant d in the knowledge graph, and the user dining-attribute triplet expression is

B_u = {(h, r, t) | h ∈ D_ul},

wherein (h, r, t) represents a triplet with restaurant entity h as the head entity, r represents an attribute of the restaurant entity, and t represents the attribute value. The weights of the different attribute features of the restaurants the user has historically liked are calculated as the user's attention to the different attributes; the weight of attribute r_i of restaurant d is calculated as

W_ri = exp((W_rh · h)ᵀ r_i) / Σ_j exp((W_rh · h)ᵀ r_j),

wherein h and r_i respectively represent the entity vector and attribute-value vector corresponding to restaurant d obtained by training, and W_rh represents a projection matrix generated by the entity h and its relation. A user dining-interest vector is then calculated based on the attribute weights and attribute-value vectors: after the weight of each attribute in the user's restaurant-attribute triplets is calculated, all attribute-value vectors of the user's liked restaurants are weighted and summed to represent the user's dining preference vector, expressed as

u = Σ_i W_ri · t_i,

wherein t_i represents the restaurant attribute-value vector generated by training and W_ri represents the weight of a certain attribute of the restaurant. Model training is performed with the user's implicit feedback data and the semantic information of the restaurant attribute knowledge layer, so as to obtain more accurate user and restaurant feature representations and improve the personalized restaurant guidance effect.
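The attribute-weighted preference vector can be sketched as follows; this numpy fragment is illustrative (the function name and the softmax normalization of the attention scores are assumptions consistent with the weight formula above):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def user_preference(h, attr_vecs, W_rh):
    """Weight each attribute-value vector of a liked restaurant by its
    attention score against the projected restaurant entity embedding,
    then sum. h: (d,) entity vector; attr_vecs: (m, d) attribute-value
    vectors; W_rh: (d, d) projection matrix."""
    scores = attr_vecs @ (W_rh @ h)   # one attention score per attribute
    w = softmax(scores)               # attribute weights W_ri
    return w @ attr_vecs              # (d,) dining preference vector
```

Summing this over all restaurants in D_ul would give the user's overall dining-interest vector described in the text.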
Referring to fig. 2, the invention also provides a dining guiding method based on computer vision, which comprises the following specific steps:
s1: acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
s2: extracting visual features of a plurality of images in each view angle of a restaurant through a pre-trained deep learning model, and aggregating the visual features of all images in each category into a visual feature vector through average pooling;
s3: splicing visual feature vectors corresponding to beverages, foods, internal scenes and external scenes of the restaurant to construct a visual feature matrix, wherein the visual feature matrix represents multi-view visual features of the restaurant;
s4: projecting the weight related to the visual feature matrix and the user to the visual feature vector, reducing the dimension of the restaurant visual feature vector through the embedded matrix to obtain a low-dimension feature vector, and combining the low-dimension feature vector with an implicit factor of the restaurant to construct a prediction model;
s5: obtaining the prediction score of the user for the restaurant, introducing hierarchical attention to model the visual features of the restaurant to obtain restaurant visual information, predicting the preference of the user for the restaurant based on the restaurant visual information and the visual feature matrix, and guiding according to the preference of the user for the restaurant to complete dining guiding.
In this embodiment, a pre-trained neural network model is used to extract the visual features of each image. For example, if a restaurant has three interior-view images (i_1, i_2, i_3), then for each image a visual feature of a specific dimension, such as 4096, is extracted by the pre-trained neural network model, and the visual feature vectors of the three images represent the visual information of the restaurant's interior view. The visual feature vectors of the restaurant's food, beverage, interior view and exterior view are then spliced to form a visual feature matrix representing the multi-view visual features of the restaurant. The visual feature matrix is projected onto a feature vector through the user-related weights ω_u = {ω_1, ω_2, ω_3, ω_4}, which reflect user u's personalized visual preferences for the restaurant; the feature vector is reduced in dimension through the embedding matrix to obtain a low-dimensional feature vector, which is combined with the restaurant's implicit factor and can serve as the input of a matrix factorization model for personalized prediction, thereby improving the robustness of model training and the effectiveness of personalized dining guidance for the user.
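The splice-weight-project chain in this embodiment can be sketched end to end; the numpy fragment below is illustrative (function name and small dimensions assumed; a real pipeline would use 4096-dimensional CNN features):

```python
import numpy as np

def restaurant_visual_factor(cat_features, omega_u, E):
    """cat_features: list of 4 arrays, each (n_c, d), holding per-image
    features for beverage, food, interior and exterior views. Mean-pool
    each category, stack into a (4, d) visual feature matrix, weight the
    rows by the user's 4-dim preference omega_u, then project to a low
    dimension with the embedding matrix E of shape (d, k)."""
    F_i = np.stack([f.mean(axis=0) for f in cat_features])  # (4, d) matrix
    fused = omega_u @ F_i                                   # (d,) weighted fusion
    return fused @ E                                        # (k,) low-dim factor
```

The returned low-dimensional vector is what gets added to the restaurant's implicit factor before the cosine-similarity prediction step.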
Any particular values in the examples shown and described herein are to be construed as merely illustrative and not as limitations; other exemplary embodiments may therefore have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The above examples merely represent a few embodiments of the present invention; although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the invention.

Claims (10)

1. A computer vision-based dining guidance system, comprising:
the data acquisition unit is used for acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking the image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
the feature extraction unit is used for extracting visual features of a plurality of images in each view angle of the restaurant through a pre-trained deep learning model, and aggregating the visual features of all the images in each category into a visual feature vector through average pooling;
the feature splicing unit is used for splicing the visual feature vectors corresponding to the beverage, food, internal scene and external scene of the restaurant to construct a visual feature matrix, wherein the visual feature matrix represents the multi-view visual features of the restaurant;
the model construction unit is used for projecting the weight related to the visual feature matrix and the user to the visual feature vector, reducing the dimension of the restaurant visual feature vector through the embedded matrix to obtain a low-dimension feature vector, and combining the low-dimension feature vector with the implicit factor of the restaurant to construct a prediction model;
the dining guiding unit is used for obtaining the prediction score of the user on the restaurant, introducing the hierarchical attention to model the visual characteristics of the restaurant to obtain the visual information of the restaurant, predicting the preference of the user on the restaurant based on the visual information of the restaurant and the visual characteristic matrix, and guiding according to the preference of the user on the restaurant to complete dining guiding.
2. The computer vision-based dining guidance system of claim 1, wherein the hierarchical attention comprises an upper category attention layer and a lower image attention layer; each image category in the image attention layer receives a plurality of image visual features as input; by calculating the attention of the image attention layer, all visual features under each image category are integrated, in attention form, into one vector representing the visual features of that image category, the vector constituting the visual information representation of the restaurant under the image category; and the visual feature vectors of all image categories are integrated by the category attention layer into the visual information representation of the restaurant.
3. The computer vision-based dining guidance system of claim 1, wherein obtaining a predicted score of a restaurant by a user, introducing hierarchical attention to model visual features of the restaurant to obtain restaurant visual information, comprises:
modeling user preferences among different images under the same image category, and adding an image-level attention mechanism whose expression is

u_ict = ReLU(W_α · f_ict + b_α), α_ict = exp(u_ictᵀ u_α) / Σ_t exp(u_ictᵀ u_α), f_ic = Σ_t α_ict · f_ict,

wherein W_α represents a transformation matrix and b_α represents a bias term; for any restaurant i, f_ict represents the visual feature of the t-th image belonging to category c; a fully connected layer maps f_ict to the implicit expression u_ict, using ReLU as the activation function; a jointly learned context vector u_α is used to measure the importance of the visual features of each image, the inner product of u_ict and u_α giving the score of u_ict, which represents its importance under the current category; and f_ic represents the weighted-sum representation of all visual features of the category.
4. The computer vision based dining guidance system of claim 3, wherein the expression of the category-level attention mechanism is

u_ic = ReLU(W_β · f_ic + b_β), β_ic = exp(u_icᵀ u_β) / Σ_c exp(u_icᵀ u_β), f_i = Σ_c β_ic · f_ic,

wherein W_β represents a transformation matrix, b_β represents the bias term, f_i represents the restaurant visual feature obtained through the hierarchical attention layers, and u_β represents the category-level context vector; the obtained restaurant visual feature is used as part of the restaurant latent factors to perform collaborative filtering with the latent factors of the user.
5. The computer vision-based dining guidance system of claim 1, wherein projecting the weights of the visual feature matrix associated with the user onto the visual feature vector and dimension-reducing the restaurant visual feature vector by the embedding matrix to obtain a low-dimensional feature vector, comprises:
mapping the respective real scoring vectors of the user and the restaurant, through multi-layer fully connected networks, to implicit factors of the user and the restaurant learned in a low-dimensional space, and enhancing the restaurant-side information with visual features by adding the visual factor f_i to the restaurant implicit factor p_i;
using the cosine similarity between the user-side features and the restaurant-side features as the prediction function of the model, whose expression is

R̂_u,i = cos(p_i + f_i, q_u) = ((p_i + f_i) · q_u) / (‖p_i + f_i‖ · ‖q_u‖),

wherein R̂_u,i represents the model's predicted score of user u for restaurant i, p_i and q_u respectively represent the implicit scoring factors of restaurant i and user u, and f_i represents the restaurant visual factor learned by the attention module; the restaurant side is expressed as the sum of p_i and f_i, and the model predicts the user's preference for the restaurant by calculating the cosine similarity between the restaurant factor (p_i + f_i) and the user scoring factor q_u; the implicit scoring vectors p_i and q_u of the restaurant and the user are mapped to the low-dimensional space by two multi-layer fully connected networks FC_it and FC_us respectively, whose inputs are the scoring vectors r_i and r_u extracted from the user-restaurant interaction matrix R, r_i representing the vector composed of all users' scores for the i-th restaurant and r_u representing the scoring vector of user u for all restaurants; a multi-layer fully connected network is also used to reduce the dimension of the restaurant visual factor f_i.
6. The computer vision based dining guidance system of claim 5, wherein the model calculating the cosine similarity between the restaurant factor (p_i + f_i) and the user scoring factor q_u to predict the user's preference for the restaurant comprises:
presetting one vector in an n-dimensional space to represent one user, selecting two users u and v, and calculating the similarity between the two user vectors with the cosine expression

sim(u, v) = Σ_{c∈L_u,v} R_u,c · R_v,c / (√(Σ_{c∈L_u} R_u,c²) · √(Σ_{c∈L_v} R_v,c²)),

and balancing the rating differences between different users by subtracting each user's mean score, giving

sim(u, v) = Σ_{c∈L_u,v} (R_u,c − R̄_u)(R_v,c − R̄_v) / (√(Σ_{c∈L_u} (R_u,c − R̄_u)²) · √(Σ_{c∈L_v} (R_v,c − R̄_v)²)),

wherein L_u,v represents the set of restaurants scored by both user u and user v, L_u and L_v represent the sets of restaurants rated individually by users u and v, R_u,c and R_v,c represent the scores of user u and user v for restaurant c, and R̄_u and R̄_v respectively represent the average scores of the restaurants rated by user u and user v.
7. The computer vision based dining guidance system of claim 6, further comprising:
the restaurant score prediction model has the expression of Wherein (1)>Representing the predicted score of user u for restaurant i, < >>Representing model parameters, F representing a function mapping parameters to prediction scores, two potential factors q u 、p i Representing user u and restaurant i, the two potential factors are expressed as +.>Wherein (1)>And->Representing the fully connected layers of user u and restaurant i, respectively, σ represents a single layer fully connected network for displacing r i To make it and f i Keep consistent, q u And p i The cosine similarity between them is used to calculate the score that user u predicts for restaurant i.
8. The computer vision-based dining guide system of claim 1, wherein stitching the visual feature vectors corresponding to the beverage, food, interior and exterior views of the restaurant to construct a visual feature matrix comprises:
the expression of the restaurant guidance model for multi-view visual information is

R̂_u,i = α + β_u + β_i + γ_uᵀ γ_i + θ_uᵀ E (F_iᵀ ω_u),

wherein α represents the global offset, β_u and β_i represent the deviation terms of the user and the restaurant, γ_u and γ_i represent the latent factor vectors of the user and the restaurant respectively, θ_u represents the visual factor of user u, E represents the embedding matrix, the matrix F_i is used to represent the visual features of restaurant i, and ω_u represents a 4 × 1 weight vector corresponding to the four visual views of the restaurant; A represents the overall bias weight of the multi-view visual features, W represents the category visual preference matrix of all users, and the product of A and W represents the users' overall preference for the multi-view visual appearance of the preset restaurants;
training the model by using maximum a posteriori estimation of Bayesian analysis, wherein the preset training set D_s composed of triplets (u, i, j) has the expression

D_s = {(u, i, j) | u ∈ U, i ∈ I_u⁺, j ∈ I \ I_u⁺},

wherein u represents the user, i represents a restaurant the user has visited, j represents an unknown restaurant, and I_u⁺ represents the set of all positive-sample restaurants of user u; matrix factorization is used to predict the user's preference, denoted R̂_u,i,j, and the expression of the prediction model is

R̂_u,i,j = R̂_u,i − R̂_u,j = β_i − β_j + γ_uᵀ γ_ij + θ_uᵀ E (F_ijᵀ ω_u),

wherein γ_ij represents the difference between γ_i and γ_j, and F_ij represents the difference between F_i and F_j; the expression of the personalized ranking optimization criterion C is

C = Σ_{(u,i,j)∈D_s} ln σ(R̂_u,i,j) − λ_θ ‖θ‖²,

wherein σ represents the sigmoid function and λ_θ represents a regularization hyper-parameter to be adjusted.
9. The computer vision-based dining guidance system of claim 1, wherein obtaining restaurant information and data preprocessing the restaurant information to obtain a data set comprises:
expressing restaurants and their attributes as semantic vectors, and calculating the user's preference vector by computing the weights of the different attribute values of the restaurants and integrating over the restaurants the user has historically liked; presetting D_ul to represent the set of restaurants the user has historically liked, wherein for a restaurant d ∈ D_ul, h represents the restaurant entity corresponding to restaurant d in the knowledge graph, and the user dining-attribute triplet expression is

B_u = {(h, r, t) | h ∈ D_ul},

wherein (h, r, t) represents a triplet with restaurant entity h as the head entity, r represents an attribute of the restaurant entity, and t represents the attribute value;
calculating the weights of the different attribute features of the restaurants the user has historically liked as the user's attention to the different attributes, the weight of attribute r_i of restaurant d being calculated as

W_ri = exp((W_rh · h)ᵀ r_i) / Σ_j exp((W_rh · h)ᵀ r_j),

wherein h and r_i respectively represent the entity vector and attribute-value vector corresponding to restaurant d obtained by training, and W_rh represents a projection matrix generated by the entity h and its relation;
calculating a user dining-interest vector based on the attribute weights and attribute-value vectors: after the weight of each attribute in the user's restaurant-attribute triplets is calculated, all attribute-value vectors of the user's liked restaurants are weighted and summed to represent the user's dining preference vector, expressed as

u = Σ_i W_ri · t_i,

wherein t_i represents the restaurant attribute-value vector generated by training and W_ri represents the weight of a certain attribute of the restaurant.
10. A computer vision based dining guiding method using the computer vision based dining guiding system according to any one of claims 1-9, characterized by the following specific steps:
acquiring restaurant information, carrying out data preprocessing on the restaurant information to obtain a data set, and marking image categories in the data set as beverages, foods, internal scenes and external scenes, wherein the restaurant information comprises user ratings, user comments, restaurant images, restaurant positions and restaurant business information;
extracting visual features of a plurality of images in each view angle of a restaurant through a pre-trained deep learning model, and aggregating the visual features of all images in each category into a visual feature vector through average pooling;
splicing visual feature vectors corresponding to beverages, foods, internal scenes and external scenes of the restaurant to construct a visual feature matrix, wherein the visual feature matrix represents multi-view visual features of the restaurant;
projecting the weight related to the visual feature matrix and the user to the visual feature vector, reducing the dimension of the restaurant visual feature vector through the embedded matrix to obtain a low-dimension feature vector, and combining the low-dimension feature vector with an implicit factor of the restaurant to construct a prediction model;
obtaining a prediction score of a user for a restaurant, introducing hierarchical attention to model the visual features of the restaurant to obtain restaurant visual information, predicting the preference of the user for the restaurant based on the restaurant visual information and the visual feature matrix, and guiding according to the preference of the user for the restaurant to complete dining guiding.
CN202310613188.6A 2023-05-29 2023-05-29 Dining guiding system and method based on computer vision Pending CN116645244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310613188.6A CN116645244A (en) 2023-05-29 2023-05-29 Dining guiding system and method based on computer vision


Publications (1)

Publication Number Publication Date
CN116645244A true CN116645244A (en) 2023-08-25

Family

ID=87614848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310613188.6A Pending CN116645244A (en) 2023-05-29 2023-05-29 Dining guiding system and method based on computer vision

Country Status (1)

Country Link
CN (1) CN116645244A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050256756A1 (en) * 2004-05-17 2005-11-17 Lam Chuck P System and method for utilizing social networks for collaborative filtering
US20150187024A1 (en) * 2013-12-27 2015-07-02 Telefonica Digital España, S.L.U. System and Method for Socially Aware Recommendations Based on Implicit User Feedback
CN108171535A (en) * 2017-12-13 2018-06-15 天津科技大学 A kind of personalized dining room proposed algorithm based on multiple features
CN110119479A (en) * 2019-05-16 2019-08-13 苏州大学 A kind of restaurant recommendation method, apparatus, equipment and readable storage medium storing program for executing
US20210110306A1 (en) * 2019-10-14 2021-04-15 Visa International Service Association Meta-transfer learning via contextual invariants for cross-domain recommendation
KR102371787B1 (en) * 2021-01-25 2022-03-10 강병우 System for providing customized dietary management service
WO2022143482A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Recommendation method, recommendation network, and related device
CN115221390A (en) * 2021-04-15 2022-10-21 天津科技大学 Mixed group restaurant recommendation fusing user preferences and trust relationships


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHU, Peipei: "Recommendation Algorithm Based on Users' Indirect Trust and Gaussian Filling", Computer Science, vol. 46, no. 11, pages 178-184 *
LI, Chao: "Research on a Personalized Food Recommendation Method Based on Knowledge Graphs", China Master's Theses Full-text Database, Engineering Science and Technology, no. 1, pages 024-806 *
LUO, Haihua: "Research on Restaurant Recommendation System Algorithms Based on Multi-view Visual Information", China Master's Theses Full-text Database, Information Science and Technology, no. 10 *

Similar Documents

Publication Publication Date Title
US20220222920A1 (en) Content processing method and apparatus, computer device, and storage medium
CN111061946B (en) Method, device, electronic equipment and storage medium for recommending scenerized content
US7860347B2 (en) Image-based face search
CN110795571B (en) Cultural travel resource recommendation method based on deep learning and knowledge graph
CN112926396A (en) Action identification method based on double-current convolution attention
CN111209475B (en) Interest point recommendation method and device based on space-time sequence and social embedded ranking
Caicedo et al. Collaborative personalization of image enhancement
CN113590965B (en) Video recommendation method integrating knowledge graph and emotion analysis
CN112948625B (en) Film recommendation method based on attribute heterogeneous information network embedding
CN114662015A (en) Interest point recommendation method and system based on deep reinforcement learning
CN112712127A (en) Image emotion polarity classification method combined with graph convolution neural network
CN109886281A (en) One kind is transfinited learning machine color image recognition method based on quaternary number
CN115712780A (en) Information pushing method and device based on cloud computing and big data
CN107506419B (en) Recommendation method based on heterogeneous context sensing
CN114357307B (en) News recommendation method based on multidimensional features
CN111159543B (en) Personalized tourist place recommendation method based on multi-level visual similarity
CN116645501A (en) Unbiased scene graph generation method based on candidate predicate relation deviation
CN116645244A (en) Dining guiding system and method based on computer vision
CN114417166B (en) Continuous interest point recommendation method based on behavior sequence and dynamic social influence
CN116958740A (en) Zero sample target detection method based on semantic perception and self-adaptive contrast learning
CN115205768B (en) Video classification method based on resolution self-adaptive network
CN113536109B (en) Interest point recommendation method based on neural network and mobile context
CN113095084B (en) Semantic service matching method and device in Internet of things and storage medium
Rawat et al. Photography and exploration of tourist locations based on optimal foraging theory
CN114357306A (en) Course recommendation method based on meta-relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination