CN116070020A - Food material recommendation method, equipment and storage medium based on knowledge graph - Google Patents


Info

Publication number
CN116070020A
CN202211737579.0A · CN116070020A
Authority
CN
China
Prior art keywords
real-time, text data, data, voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211737579.0A
Other languages
Chinese (zh)
Inventor
曾谁飞
孔令磊
张景瑞
李敏
刘卫强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Refrigerator Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Refrigerator Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Refrigerator Co Ltd, Haier Smart Home Co Ltd filed Critical Qingdao Haier Refrigerator Co Ltd
Priority to CN202211737579.0A
Publication of CN116070020A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention discloses a food material recommendation method based on a knowledge graph, which comprises the following steps: acquiring real-time audio and video data, real-time text data and historical comment text data; preprocessing the acquired data to obtain effective real-time voice, real-time text, real-time video and historical comment text data; transcribing the effective real-time voice data into voice text data and the effective real-time video data into image text data; acquiring a real-time vector matrix according to the voice text data, the real-time text data and the image text data; acquiring a historical vector matrix corresponding to the historical comment text data; and, based on the real-time vector matrix and the historical vector matrix, fusing the real-time semantic similarity and the historical semantic similarity to generate and output a food material recommendation list. The method effectively improves the accuracy of food material content recommendation and improves the user experience.

Description

Food material recommendation method, equipment and storage medium based on knowledge graph
Technical Field
The invention relates to the technical field of computers, in particular to a food material recommendation method, equipment and a storage medium based on a knowledge graph.
Background
With the wide application of natural language understanding, multi-modal deep learning, recommendation systems, knowledge graphs and multi-source heterogeneous big data technology in intelligent electrical appliances and refrigeration equipment, problems such as single-source semantic similarity extraction for food materials and sparse data persist. These problems in turn reduce the accuracy of food material content recommendation and thereby degrade the user experience.
At present, in the fields of intelligent electrical appliances and refrigeration equipment, content recommendation methods for food materials, health knowledge and the like adopt traditional collaborative filtering, content-based recommendation and fusion recommendation algorithms. Although most of these are recommendation algorithms operating at the semantic level, they still suffer from intrinsic data sparsity or cold-start problems, and therefore cannot extract optimal text semantic representation information. A method that recommends food materials and dietary health and nutrition knowledge through an integrated fusion of deep learning, multi-source heterogeneous data, multi-modal cognition and knowledge graphs has not yet been proposed in the industry.
Disclosure of Invention
The invention aims to provide a food material recommendation method, equipment and storage medium based on a knowledge graph.
The invention provides a food material recommendation method based on a knowledge graph, which comprises the following steps:
acquiring real-time audio and video data, acquiring real-time text data and acquiring historical comment text data; preprocessing the real-time audio and video data, the real-time text data and the historical comment text data to obtain effective real-time voice data, real-time text data, real-time video data and historical comment text data; transcribing the effective real-time voice data into voice text data; transcribing the effective real-time video data into image text data; acquiring a real-time vector matrix according to the voice text data, the real-time text data and the image text data; acquiring a historical vector matrix corresponding to the historical comment text data; based on the real-time vector matrix and the historical vector matrix, fusing the real-time semantic similarity and the historical semantic similarity to generate a food material recommendation list; and outputting the food material recommendation list.
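For illustration only, the claimed sequence of steps can be sketched as a pipeline of stub functions; every name and return value below is a hypothetical placeholder standing in for the models described in the disclosure (ASR, lip reading, knowledge-graph lookup, similarity fusion), not part of the patent itself:

```python
# Illustrative end-to-end sketch of the claimed method; all bodies
# are hypothetical stubs, not the patent's actual models.

def preprocess(audio_video, text, history):
    # cleaning / voice-video separation would happen here
    return {"speech": audio_video, "video": audio_video,
            "text": text, "history": history}

def speech_to_text(speech):          # speech-recognition model stub
    return f"asr({speech})"

def video_to_text(video):            # lip-reading model stub
    return f"lipread({video})"

def to_vector_matrix(*texts):        # KG entity vectors stub
    return [len(t) for t in texts]

def fuse_and_rank(rt_matrix, hist_matrix):   # similarity-fusion stub
    return ["beef", "tomato", "cabbage"]     # placeholder ranking

def recommend(audio_video, text, history):
    d = preprocess(audio_video, text, history)
    rt = to_vector_matrix(speech_to_text(d["speech"]),
                          d["text"], video_to_text(d["video"]))
    hist = to_vector_matrix(d["history"])
    return fuse_and_rank(rt, hist)

ranked = recommend("clip.mp4", "what vegetables are in the fridge", "likes beef")
```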
As a further improvement of the present invention, "preprocessing the real-time audio and video data, the real-time text data and the historical comment text data to obtain effective real-time voice data, real-time text data, real-time video data and historical comment text data" specifically includes: performing data cleaning, format analysis, format conversion and data storage on the real-time audio and video data, the real-time text data and the historical comment text data to obtain effective audio and video data, real-time text data and historical comment text data; performing voice and video separation on the effective audio and video data by means of a script or a third-party tool to obtain the real-time voice data and the real-time video data; preprocessing the real-time voice and video data, including framing and windowing the real-time voice data, and clipping and framing the real-time video data; and preprocessing the real-time text data and the historical comment text data, including word segmentation, stop-word removal and de-duplication.
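The text-side preprocessing named here (word segmentation, stop-word removal, de-duplication) may be sketched minimally as follows; the whitespace tokenizer and stop-word list are stand-ins, since real Chinese text would need a segmenter such as jieba:

```python
# Minimal sketch of segmentation, stop-word removal and de-duplication.
# STOP_WORDS and the whitespace split are illustrative only.

STOP_WORDS = {"the", "a", "of", "in"}

def preprocess_text(sentences):
    seen, cleaned = set(), []
    for s in sentences:
        tokens = [t for t in s.lower().split() if t not in STOP_WORDS]
        key = tuple(tokens)
        if key and key not in seen:      # drop exact duplicates
            seen.add(key)
            cleaned.append(tokens)
    return cleaned

docs = ["the beef in the fridge", "Beef in THE fridge", "a fresh tomato"]
out = preprocess_text(docs)
```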
As a further improvement of the present invention, "transcribing the effective voice data into voice text data" specifically includes: performing feature extraction on the effective voice data to obtain voice features; inputting the voice features into a multi-channel multi-size deep recurrent convolutional network model for speech recognition to obtain first voice text data through transcription; outputting the alignment relation between the voice features and the first voice text data based on a connectionist temporal classification (CTC) method to obtain second voice text data; acquiring key features of the second voice text data, or weight information of the key features, based on an attention mechanism; and combining the second voice text data with the key features or their weight information through a fully connected layer, and calculating scores through a classification function to obtain the voice text data.
As a further improvement of the present invention, "performing feature extraction on the effective voice data" specifically includes: extracting features from the effective voice data to obtain Mel-frequency cepstral coefficient (MFCC) features.
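A compact, self-contained sketch of MFCC extraction (frame, window, power spectrum, mel filterbank, log, DCT-II) is shown below; the sampling rate, frame sizes and filter counts are common defaults, not values given in the patent:

```python
import numpy as np

# Sketch of MFCC extraction: frame -> window -> power spectrum ->
# mel filterbank -> log -> DCT-II. Parameters are typical defaults.

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # frame and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i*hop:i*hop+n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel filterbank
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m-1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m-1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2*n + 1) / (2*n_mels)))
    return logmel @ dct.T

feats = mfcc(np.sin(2 * np.pi * 440 * np.arange(16000) / 16000))
print(feats.shape)   # (61, 13)
```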
As a further improvement of the present invention, "transcribing the effective real-time video data into image text data" specifically includes: inputting the real-time video data into a 3D deep recurrent convolutional neural network for calculation to obtain image features; inputting the image features into a multi-channel multi-size temporal convolutional network for transcription to obtain first image text data; outputting the alignment relation between the image features and the first image text data based on a connectionist temporal classification (CTC) method to obtain second image text data; and combining the second image text data through the fully connected layer, and calculating a score through a classification function to obtain the image text data.
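The 3D network in this claim extracts spatio-temporal features from video frames; a toy single-kernel 3D convolution over a (time, height, width) clip illustrates the basic operation (shapes and kernel values are invented):

```python
import numpy as np

# Toy 3D convolution: one kernel slides over (time, height, width).
# This only illustrates the spatio-temporal feature operation, not
# the patent's full network.

def conv3d(video, kernel):
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

clip = np.ones((8, 16, 16))              # 8 frames of a 16x16 "lip region"
feat = conv3d(clip, np.ones((3, 3, 3)) / 27.0)   # averaging kernel
print(feat.shape)   # (6, 14, 14)
```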
As a further improvement of the present invention, "acquiring a real-time vector matrix according to the voice text data, the real-time text data and the image text data" specifically includes: performing entity extraction and entity alignment on the text data to obtain a plurality of normalized entities; querying a food material knowledge graph based on each entity to obtain a corresponding entity vector; and inputting the entity vectors into an attention mechanism for calculation to obtain the real-time vector matrix.
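The lookup-then-attend step can be sketched as below; the miniature food knowledge graph, its 4-dimensional embeddings, and the query vector are all invented for illustration:

```python
import numpy as np

# Entities are looked up in a (hypothetical) food knowledge graph,
# then dot-product attention pools their vectors into one row of
# the real-time vector matrix.

FOOD_KG = {                      # entity -> embedding (invented)
    "beef":    np.array([0.9, 0.1, 0.0, 0.2]),
    "tomato":  np.array([0.1, 0.8, 0.3, 0.0]),
    "cabbage": np.array([0.0, 0.3, 0.9, 0.1]),
}

def attention_pool(entities, query):
    vecs = np.stack([FOOD_KG[e] for e in entities if e in FOOD_KG])
    scores = vecs @ query                             # dot-product attention
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax
    return weights @ vecs                             # weighted sum

q = np.array([1.0, 0.0, 0.0, 0.0])   # e.g. a "meat-like" query direction
pooled = attention_pool(["beef", "tomato"], q)
```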
As a further improvement of the invention, the extracted entities and the food material knowledge graph are represented as triples; the entity alignment may be achieved using knowledge representation learning.
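Triple-based representation learning such as TransE (one common choice, named here as an assumption since the patent does not specify the model) scores a (head, relation, tail) triple by how well head + relation ≈ tail; the 2-d vectors below are hand-picked for clarity, not learned:

```python
import numpy as np

# TransE-style triple scoring: lower distance = more plausible triple.
# Vectors are hand-constructed toy values.

def transe_score(h, r, t):
    return float(np.linalg.norm(h + r - t))

beef      = np.array([1.0, 0.0])
meat      = np.array([0.0, 1.0])
is_a      = np.array([-1.0, 1.0])     # relation "is_a"
vegetable = np.array([5.0, 5.0])

good = transe_score(beef, is_a, meat)        # (beef, is_a, meat): plausible
bad  = transe_score(beef, is_a, vegetable)   # implausible: large distance
```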
As a further improvement of the present invention, "acquiring the historical vector matrix corresponding to the historical comment text data" specifically includes: performing entity extraction and entity alignment on the historical comment text data to obtain a plurality of normalized historical comment entities; querying a food material knowledge graph based on each historical comment entity to obtain a corresponding historical comment entity vector; and inputting the historical comment entity vectors into an attention mechanism for calculation to obtain the historical vector matrix.
As a further improvement of the present invention, the historical comment text data includes user comment data, the user's click counts on food materials, and food material browsing information.
As a further improvement of the present invention, "fusing the real-time semantic similarity and the historical semantic similarity to generate a food material recommendation list based on the real-time vector matrix and the historical vector matrix" specifically includes: obtaining the maximum real-time semantic similarity according to the real-time vector matrix; obtaining the maximum historical semantic similarity according to the historical vector matrix; and fusing the real-time semantic similarity and the historical semantic similarity into an objective optimization function in matrix form for calculation to generate the food material recommendation list.
As a further improvement of the present invention, "obtaining the maximum real-time semantic similarity according to the real-time vector matrix" specifically includes: obtaining a plurality of food material vector matrices to be recommended according to the food material knowledge graph; and calculating the semantic similarity between each food material vector matrix to be recommended and the real-time vector matrix to obtain the maximum semantic similarity value.
As a further improvement of the present invention, "obtaining the maximum historical semantic similarity according to the historical vector matrix" specifically includes: obtaining a plurality of food material vector matrices to be recommended according to the food material knowledge graph; and calculating the semantic similarity between each food material vector matrix to be recommended and the historical vector matrix to obtain the maximum semantic similarity value.
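The two similarity computations and their fusion can be sketched with cosine similarity and a weighted sum; the candidate vectors and the 0.6/0.4 fusion weights are assumptions for illustration, not values from the patent:

```python
import numpy as np

# Rank candidate food materials by a weighted fusion of their cosine
# similarity to the real-time vector and to the historical vector.

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(candidates, realtime_vec, history_vec, alpha=0.6):
    fused = {
        name: alpha * cos(v, realtime_vec) + (1 - alpha) * cos(v, history_vec)
        for name, v in candidates.items()
    }
    return sorted(fused, key=fused.get, reverse=True)

cands = {"beef":   np.array([1.0, 0.0]),
         "tomato": np.array([0.0, 1.0]),
         "salmon": np.array([0.7, 0.7])}
ranked = recommend(cands, np.array([1.0, 0.1]), np.array([0.9, 0.2]))
```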
As a further improvement of the present invention, "outputting the food material recommendation list" specifically includes: converting the generated food material recommendation list into voice for local output, and/or converting it into voice for output to a client terminal, and/or converting it into text for output to the client terminal, and/or converting it into an image for output to the client terminal.
As a further improvement of the present invention, "acquiring real-time audio and video data, acquiring real-time text data, and acquiring historical comment text data" specifically includes: acquiring the real-time audio and video data collected by an audio/video acquisition device and/or the real-time audio and video data transmitted from the client terminal; acquiring the real-time text data collected by a text acquisition device and/or the real-time text data transmitted from the client terminal; and acquiring internally stored historical comment text data, and/or externally stored historical comment text data, and/or historical comment text data transmitted from the client terminal.
As a further improvement of the present invention, "transcribing the voice data into voice text data" further includes: acquiring configuration data stored in an external cache, and performing the multi-channel multi-size deep recurrent convolutional neural network model calculation and text transcription on the voice data based on the configuration data.
The invention also provides an electrical apparatus, comprising: a memory for storing executable instructions; and a processor for implementing the above food material recommendation method based on a knowledge graph when executing the executable instructions stored in the memory.
The invention also provides a refrigerator, comprising: a memory for storing executable instructions; and a processor for implementing the above food material recommendation method based on a knowledge graph when executing the executable instructions stored in the memory.
The invention also provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the above food material recommendation method based on a knowledge graph.
The beneficial effects of the invention are as follows: the method innovates comprehensively across different data source channels, different data types, multi-source heterogeneous data, fused features and diversified output modalities, so that user text semantic feature information is mined from the user's historical comments on food materials, the user's food material preferences, the food materials the user is interested in, and the like, and an optimal food material list is recommended by comprehensively considering multiple factors such as semantics and the user's food material push evaluation scores.
The real-time multi-source heterogeneous data and the historical data are used as training data sets to predict the optimal food materials required by users, and rich semantic feature information with complementarity and relevance is finally obtained. That is, by considering real-time and historical data with semantic relevance and complementarity, such as the user's food material preferences and comments, the invention not only mines the real-time and historical semantic associations between users and food materials, but also mines the implicit semantic features among the food materials themselves. The invention thus effectively combines a deep fusion model of multi-source heterogeneous data, multi-modal cognition, knowledge graphs and natural language understanding, and fully utilizes real-time and historical data, thereby improving the accuracy of food material content recommendation and improving the user experience.
Drawings
Fig. 1 is a block diagram of a model related to a food recommendation method based on a knowledge graph according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of steps of a food recommendation method based on a knowledge graph according to an embodiment of the invention.
FIG. 3 is a schematic diagram of steps for obtaining real-time audio/video data, real-time text data, and historical comment text data according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a preprocessing step performed on the real-time audio/video data, the real-time text data and the historical comment text data according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a step of converting the real-time voice data into voice text data according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a step of transferring the real-time video data into image text data according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of a step of obtaining a real-time vector matrix based on the text data according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of a step of generating a food recommendation list based on real-time semantic similarity and historical semantic similarity according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the specific embodiments shown in the drawings. These embodiments are not intended to limit the invention, and structural, methodological, or functional modifications of these embodiments that may be made by one of ordinary skill in the art are included within the scope of the invention.
It should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The embodiment of the invention relates to a food material recommendation method based on a knowledge graph. Although the present application provides the method operation steps described in the following embodiments or in Fig. 1, for steps with no necessary logical causal relationship, the method is not limited, on the basis of routine or non-inventive labor, to the order of execution provided in the embodiments of the present application.
Fig. 1 is a block diagram of a model related to a food material recommendation method based on a knowledge graph, and fig. 2 is a schematic diagram of steps of the food material recommendation method based on the knowledge graph, which includes:
s1: and acquiring real-time audio and video data, real-time text data and historical comment text data.
S2: and preprocessing the real-time audio and video data, the real-time text data and the historical comment text data to obtain real-time voice data, real-time text data, real-time video data and historical comment text data.
S3: and converting the real-time voice data into voice text data.
S4: and converting the real-time video data into image text data.
S5: and obtaining a real-time vector matrix according to the voice text data, the real-time text data and the image text data.
S6: and obtaining a history vector matrix according to the history comment text data.
S7: and based on the real-time vector matrix and the historical vector matrix, fusing the real-time semantic similarity and the historical semantic similarity to generate a food material recommendation list.
S8: and outputting the food material recommendation list.
The method provided by the invention can be used in intelligent electronic equipment to realize functions such as real-time interaction with the user or message pushing based on the user's real-time audio and video input. In this embodiment, an intelligent refrigerator is taken as an example, and the method is described with reference to a pre-trained deep learning model. Based on the user's audio and video input, the intelligent refrigerator predicts the user's food material scores according to the user's real-time audio and video data, real-time text data and food-material-related historical comment text data, obtains a recommended food material or food material list according to the score results, and calculates from that result information the food material name or list to be output.
As shown in Fig. 3, step S1 specifically includes:
S11: acquiring the real-time audio and video data collected by the acquisition device, and/or acquiring the real-time audio and video data transmitted from the client terminal.
S12: acquiring the real-time text data collected by the acquisition device, and/or acquiring the real-time text data transmitted from the client terminal.
S13: acquiring internally stored historical comment text data, and/or acquiring externally stored historical comment text data, and/or acquiring historical comment text data transmitted from the client terminal.
The real-time audio and video data refers to real-time voice and real-time video data; it is preprocessed and separated to generate real-time voice data and real-time video data. Of course, the real-time voice data and the video data can also be collected independently through corresponding acquisition devices. The real-time voice refers to a query or instruction statement currently uttered by a user to the intelligent electronic equipment, or to a client terminal device in communication connection with it; it can also be voice information from the user collected by a voice acquisition device. In the present embodiment, the user may ask questions such as "what vegetables are in the refrigerator today" or "which beef food materials are in the refrigerator today", or issue a command such as "delete all food materials". The real-time video data is a real-time video image captured by the intelligent electronic device, or by the client terminal device in communication connection with it. For example, in this embodiment, a camera built into the intelligent refrigerator captures a face image of the user, and a lip-region feature image is extracted from the face image to recognize the text content corresponding to the image, for example recognizing the image text data "what vegetables are in the refrigerator today".
The historical comment text data refers to the user's past comments on food materials, click counts on related food materials, food-material-related browsing information and the like accumulated during normal use; this data contains the user's preferences for, interest in, or comments on food materials, and may further include historical comment text data on food materials entered directly by the user. Specifically, in this embodiment, the user's food material history comment text data may further include: commands issued by the user in the past, or text data related to food material questions, that contain information related to the current real-time audio-video or real-time text data, as well as explanatory text information provided by the user about food materials during past use. The acquired historical comment text data can serve as part of the data set of the pre-training and prediction model, effectively supplementing the single semantic representation of the real-time audio-video and real-time text data and enriching the semantic features.
In the present embodiment, audio and video acquisition devices such as a camera and a microphone arranged in the intelligent refrigerator can capture the user's real-time audio and video; when the user needs to interact with the intelligent refrigerator during use, the user can speak to it directly. Real-time user audio and video data can also be obtained from a client terminal connected to the intelligent refrigerator via a wireless communication protocol. The client terminal is an electronic device with an information sending function, for example a mobile phone, tablet computer, smart camera, smart watch or an app; during use, the user simply speaks to the client terminal or uses the refrigerator's built-in camera, and the client terminal transmits the collected audio and video to the intelligent refrigerator via a wireless channel such as Wi-Fi or Bluetooth. Multi-channel acquisition of real-time audio and video is thus not limited to speaking directly to the intelligent refrigerator: whenever the user wants to interact, real-time voice can be sent through whichever channel is convenient, which significantly improves ease of use. In other embodiments of the present invention, one or more of the above real-time audio/video acquisition methods may be used, or the real-time audio/video data may be acquired through other channels based on the prior art; the present invention is not particularly limited in this regard.
In this embodiment, as described in step S12, the real-time text information exchanged between the intelligent refrigerator and the user during interaction can be collected through a variety of real-time text acquisition channels such as a mobile phone, tablet, app, applet, official account, web page or customer service, or through direct input by the user.
As described in step S13, in the present embodiment, the historical comment text data stored in the internal memory of the smart refrigerator may be read. The storage space of the intelligent refrigerator can be further expanded by adding an external storage device. Historical comment text data stored on a client terminal such as a mobile phone or tablet computer, or on an application server side, can also be obtained. These multiple acquisition channels for historical comment text data greatly increase the available volume of historical comment text, thereby improving the accuracy of subsequent speech recognition and video image recognition. In other embodiments of the present invention, one or more of the above methods for obtaining historical comment text data may be used, or the data may be obtained through other channels based on the prior art; the present invention is not particularly limited in this regard.
Further, in this embodiment, the intelligent refrigerator is configured with an external cache, and at least part of the historical audio/video data is stored in it. As service time increases, the historical comment text data grows; storing part of it in the external cache saves the refrigerator's internal storage space, and when the neural network calculation is performed, the historical comment text data is read directly from the external cache, which improves algorithm efficiency.
Specifically, in this embodiment, a Redis component is used as the external cache. Redis is a currently widely used distributed cache system with a key/value storage structure that can serve as a database, cache and message-queue broker. Other external caches, such as Memcached, may also be employed in other embodiments of the invention; the invention is not limited in this regard.
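Fetching model configuration from the external cache might look as follows; a plain dict stands in for a live Redis connection (with redis-py, the same `get`/`set` calls would be made on a `redis.Redis()` client), and the configuration keys and values are hypothetical:

```python
import json

# A dict-backed stand-in for a Redis client: only get/set are needed
# for this sketch. Config keys/values are invented for illustration.

class FakeRedis:
    def __init__(self):
        self._store = {}
    def set(self, key, value):
        self._store[key] = value
    def get(self, key):
        return self._store.get(key)

cache = FakeRedis()
cache.set("asr:config", json.dumps({"channels": 4, "kernel_sizes": [3, 5, 7]}))

def load_asr_config(cache):
    # fall back to defaults if the cache has no entry
    raw = cache.get("asr:config")
    return json.loads(raw) if raw else {"channels": 1, "kernel_sizes": [3]}

cfg = load_asr_config(cache)
```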
In summary, in step S11 to step S13, real-time audio and video data, real-time text data and user food history comment data can be flexibly obtained through multiple channels, so that the user experience is improved, the data volume is ensured, and the algorithm efficiency is effectively improved.
As shown in fig. 4, in step S2, it specifically includes the steps of:
s21: and carrying out data cleaning on the real-time audio and video data and the historical comment text data to obtain effective audio and video data and effective historical comment text data.
S22: and performing voice and video separation on the real-time audio and video data to obtain the real-time voice data and the real-time video data.
S23: preprocessing the real-time voice and video data, including: and framing and windowing the real-time voice data, and cutting and framing the real-time video data.
S24: preprocessing the real-time text data and the historical comment text data, including: segmentation, stop word removal and duplication removal.
In step S21, the data cleaning of the real-time audio and video data and the historical comment text data specifically includes:
A certain amount of real-time audio/video data and user food material historical comment text data is obtained and, illustratively, imported into a data cleaning model in the form of files for processing. To prevent import failures, data that does not meet the file import format is subjected to data format analysis and data format conversion. Irrelevant data, repeated data, abnormal values and missing values in the data set are deleted or otherwise handled, and information irrelevant to classification is preliminarily screened out. After the audio/video data is cleaned, the cleaned data is output and stored in a specified format to obtain effective audio/video data and effective user food material historical comment text data.
In step S22, a script or a third party audio/video separation tool is used to perform audio and video separation on the effective audio/video data, thereby obtaining audio data and video data.
In the embodiment of the invention, the python language can be adopted to write the audio and video separation script, or an audio and video separation tool of a third party can be adopted to separate the input audio and video data, so as to realize the separation of voice and video and obtain the classified voice and video data.
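As a minimal sketch of such a separation script, the ffmpeg command-line tool (one common third-party choice, not mandated by the text) could be invoked from Python; the file paths and codec settings below are purely illustrative assumptions:

```python
# Sketch of script-based audio/video separation, assuming the ffmpeg CLI is
# available on the system; paths and codec parameters are illustrative only.
def build_separation_commands(av_path, audio_out, video_out):
    """Build ffmpeg argument lists that split one A/V file into an
    audio-only track and a video-only (silent) track."""
    extract_audio = [
        "ffmpeg", "-i", av_path,
        "-vn",                                   # drop the video stream
        "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        audio_out,                               # 16 kHz mono WAV for speech work
    ]
    extract_video = [
        "ffmpeg", "-i", av_path,
        "-an",                                   # drop the audio stream
        "-vcodec", "copy",                       # copy frames without re-encoding
        video_out,
    ]
    return extract_audio, extract_video

audio_cmd, video_cmd = build_separation_commands("input.mp4", "speech.wav", "silent.mp4")
# Each list can then be executed with subprocess.run(...).
```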
In step S23, the classified voice data is segmented according to a designated time period or sampling number to complete the framing processing and obtain voice signal data. Through the effect of a window function, the voice signal, which originally contains noise, is made to exhibit signal reinforcement and periodicity; this completes the windowing processing and facilitates better subsequent extraction of voice characteristic parameters. Step S23 further includes cropping the effective video data to generate multiple frames of pictures. Specifically, a script may be written to load the video data and read the video information, then decode the video according to that information and determine how many pictures are displayed per second, thereby obtaining single-frame image information, where the single-frame image information includes the width and height of each frame; finally the video is stored as a plurality of pictures. Thus, through the processing of step S23, effective voice data and image data can be obtained. Other video framing methods, such as third-party video cropping tools, may also be employed in other embodiments of the present invention, and the invention is not limited in this regard.
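The framing-and-windowing part of step S23 can be sketched as follows; the 25 ms frame length and 10 ms shift at 16 kHz are conventional illustrative values, not parameters fixed by the text:

```python
import numpy as np

# Minimal sketch of the framing and windowing step; frame length and shift
# (25 ms / 10 ms at 16 kHz) are illustrative assumptions.
def frame_and_window(signal, frame_len=400, frame_shift=160):
    """Split a 1-D speech signal into overlapping frames and apply a
    Hamming window to each frame to reduce spectral leakage."""
    n_frames = 1 + max(0, len(signal) - frame_len) // frame_shift
    frames = np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len]
        for i in range(n_frames)
    ])
    return frames * np.hamming(frame_len)   # element-wise windowing

speech = np.random.randn(16000)             # one second of synthetic 16 kHz audio
frames = frame_and_window(speech)
# frames.shape -> (98, 400): 98 overlapping, windowed frames
```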
In step S24, the text data is preprocessed, and further, in this embodiment, the real-time text data and the historical comment text data may be subjected to multiple text preprocessing methods such as word segmentation, word deactivation, word duplication removal, word frequency statistics, etc. by a script running manner or a third party tool, and specific preprocessing contents may be selected according to actual requirements, which is not particularly limited in the present invention.
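A toy sketch of the word segmentation, stop-word removal and de-duplication mentioned above; the whitespace tokenizer and the stop-word list are illustrative stand-ins (a real system for Chinese text would use a segmenter such as jieba):

```python
# Minimal sketch of step S24: segmentation, stop-word removal, de-duplication.
# Tokenizer and stop-word list are illustrative assumptions only.
STOP_WORDS = {"the", "a", "of", "and", "is"}

def preprocess(texts):
    seen, cleaned = set(), []
    for text in texts:
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        key = " ".join(tokens)
        if key and key not in seen:          # drop duplicate comments
            seen.add(key)
            cleaned.append(tokens)
    return cleaned

comments = ["The apple is fresh", "the apple is fresh", "Milk and eggs"]
print(preprocess(comments))   # [['apple', 'fresh'], ['milk', 'eggs']]
```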
As shown in fig. 5, in step S3, the clock specifically includes:
s31: and extracting the effective voice data characteristics to obtain voice characteristics.
S32: and inputting the voice characteristics into a voice recognition multichannel multi-size deep cyclic convolutional neural network model to be transcribed so as to obtain first voice text data.
S33: and outputting the alignment relation between the voice characteristic and the first voice text data based on a connection time sequence classification method so as to obtain second voice text data.
And S34, acquiring key features of the second voice text data or weight information of the key features based on an attention mechanism.
S35: and combining the second voice text data and key features or weight information of the key features through a full-connection layer, and calculating scores through a classification function to obtain the voice text data.
In step S31, extracting the valid voice data features specifically includes:
extracting the voice data features and obtaining the Mel frequency cepstrum coefficient features (Mel-scale Frequency Cepstral Coefficients, MFCC for short). The MFCC is a distinguishing component in the voice signal, and is a cepstrum parameter extracted in the Mel scale frequency domain, wherein the Mel scale describes the nonlinear characteristic of the human ear frequency, and the parameter of the MFCC considers the sensitivity of the human ear to different frequencies, so that the MFCC is particularly suitable for voice recognition and speaker recognition.
In the embodiment of the invention, the characteristic parameters such as the perceptual linear prediction characteristic (Perceptual Linear Predictive, abbreviated as PLP) or the linear prediction coefficient characteristic (Linear Predictive Coding, abbreviated as LPC) of the voice data can be obtained through different algorithm steps to replace the MFCC characteristic, and the method is not particularly limited in this respect, and can be specifically adjusted according to the actual application scene and the adopted model parameters.
The specific algorithm steps involved in the above steps may refer to the current state of the art, and the specific contents are not specifically described herein.
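The non-linear frequency mapping at the heart of MFCC extraction can be illustrated with the standard Mel-scale formula; the 26-filter, 0–8 kHz configuration below is a common illustrative choice, not a value specified by the text:

```python
import numpy as np

# Sketch of the Mel-scale mapping underlying MFCC extraction: roughly linear
# below 1 kHz and logarithmic above, mirroring human-ear frequency sensitivity.
def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Centre frequencies for a bank of 26 triangular Mel filters over 0-8 kHz,
# the usual first stage before the DCT that yields the cepstral coefficients.
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 26 + 2)
hz_points = mel_to_hz(mel_points)   # evenly spaced in Mel, crowded at low Hz
```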
In step S32, text content transcription is performed on the valid voice data through a network model in the automatic voice recognition technology, so as to obtain the first voice text data.
In this embodiment, a multi-channel multi-size deep cyclic convolutional neural network model is constructed by increasing the width of the network to realize the task of converting speech into text. The model is composed of multiple layers of deep cyclic convolutional networks; such a model generally comprises several convolutional layers and several fully-connected layers, with various nonlinear operations and pooling operations in between. It is mainly used for processing data with a grid structure, so the model can use filters to capture contours among adjacent pixels. In addition, the model first extracts feature values from the voice and then computes on those feature values rather than on the raw voice data. Therefore, compared with the traditional recurrent neural network, the deep cyclic convolutional neural network model has a small calculation amount and easily describes local features; weight sharing and pooling layers endow the model with better time-domain or frequency-domain invariance, and its deeper nonlinear structure gives it strong characterization capability. In addition, the multi-channel multi-size design can extract voice characteristics from different perspectives, acquiring more voice characteristic information and achieving better voice recognition precision.
Specifically, in the present embodiment, in step S32, the multi-channel multi-size deep cyclic convolutional neural network used is composed of 3×3 convolutional layers with 32 channels and one max-pooling layer.
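A shape-level sketch of that layer stack (3×3 convolutions producing 32 channels, followed by one max-pooling layer); the weights are random placeholders, since only the tensor shapes are being illustrated, not a trained model:

```python
import numpy as np

# Shape-level sketch of "3x3 convolutions, 32 channels, one max-pooling layer";
# random weights, illustrative input size.
def conv3x3(x, weights):
    """Valid 3x3 convolution: x is (H, W), weights is (32, 3, 3)."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((32, h, w))
    for k in range(32):
        for i in range(h):
            for j in range(w):
                out[k, i, j] = np.sum(x[i:i+3, j:j+3] * weights[k])
    return out

def maxpool2x2(x):
    """2x2 max pooling on a (C, H, W) feature volume."""
    c, h, w = x.shape
    return x[:, :h//2*2, :w//2*2].reshape(c, h//2, 2, w//2, 2).max(axis=(2, 4))

feat = conv3x3(np.random.randn(40, 40), np.random.randn(32, 3, 3))
pooled = maxpool2x2(feat)   # feat: (32, 38, 38) -> pooled: (32, 19, 19)
```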
In step S33, the alignment of the input speech feature sequence and the output speech text feature sequence is obtained by using the connection timing classification method (Connectionist temporal classification, CTC).
In this embodiment, it is difficult to construct a precise mapping between the effective speech data and the text of the first speech text data, which increases the difficulty of subsequent speech recognition. To solve this problem, the connection time sequence classification method is adopted. This method is generally used after a convolutional network model and is a complete end-to-end acoustic model training method: the data does not need to be aligned in advance, only one input sequence and one output sequence are required for training, no frame-by-frame alignment or labeling is needed, and the probability of the predicted sequence can be output directly. From this predicted probability, the most likely text output result can be obtained, yielding the second voice text data.
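The decoding rule that lets CTC avoid pre-aligned data can be shown with the greedy (best-path) collapse step — merge repeated labels, then delete the blank symbol; note this sketch covers decoding only, while the full CTC loss also sums over all alignments during training:

```python
# Sketch of the CTC best-path collapse rule: merge repeats, then drop blanks.
BLANK = "-"

def ctc_collapse(path):
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

print(ctc_collapse("--hh-e-ll-llo--"))   # "hello"
```

Because many frame-level paths collapse to the same text, the network can be trained against the text sequence alone, without frame-by-frame labels.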
Further, in step S34, the attention mechanism can guide the deep convolutional neural network to pay attention to the more critical feature information and suppress non-critical feature information. By introducing the attention mechanism, local key features or weight information of the second voice text data can be obtained, thereby further reducing irregular misalignment of sequences during model training.
In step S35, according to the second voice text data and the weight information of its key features, the second voice text data is given its own weight information through a model in which a self-attention mechanism is fused with a fully-connected layer, so that the internal weight information of the text semantic features is better obtained and the importance of different parts of the text semantic feature information is enhanced; finally, the voice text data is obtained by calculating scores through a classification function.
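The self-attention, fully-connected and classification-function combination of steps S34–S35 can be sketched as below; all weight matrices are random placeholders and the sequence length, feature dimension and class count are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of self-attention + fully-connected layer + softmax scoring
# (steps S34-S35); all weights are random placeholders.
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over a (seq_len, dim) feature matrix."""
    d = x.shape[-1]
    weights = softmax(x @ x.T / np.sqrt(d))   # (seq, seq) attention weights
    return weights @ x                        # features re-weighted by relevance

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 16))       # 6 text positions, 16-dim features
w_fc = rng.standard_normal((16, 4))           # fully-connected layer -> 4 classes
logits = self_attention(features).mean(axis=0) @ w_fc
class_probs = softmax(logits)                 # classification-function scores
```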
As shown in fig. 6, in step S4, it specifically includes:
s41: and inputting the real-time video data into a 3D deep cyclic convolutional neural network for calculation to obtain image characteristics.
S42: and inputting the image characteristics into a multichannel multi-size time convolution network for transcription to obtain first image text data.
S43: and outputting the alignment relation between the image features and the first image text data based on a connection time sequence classification method to obtain second image text data.
S44: and combining the second image text data through the full connection layer, and calculating a score through a classification function to obtain the image text data.
In step S41, considering that the image features of a speaker and the sentences to be recognized may be complex, with different sentence lengths, different pause positions, different word structures, and various correlations among the image features, in this embodiment the real-time video data is input into a 3D convolutional neural network model. By adding time-dimension information, more expressive features can be extracted. The 3D convolutional neural network model can capture the correlation information among multiple pictures: it takes continuous multi-frame images as input and, through the added dimension, captures the motion information in the input frames, so that the image features can be better obtained.
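The effect of the added time dimension can be seen in a shape-level sketch of a 3D convolution, where one kernel spans several consecutive frames; the kernel and clip sizes are illustrative and the weights random:

```python
import numpy as np

# Shape-level sketch of a 3D convolution: a single kernel spans 3 consecutive
# frames, mixing motion information across time. Sizes are illustrative.
def conv3d_valid(clip, kernel):
    """clip: (T, H, W) frame stack; kernel: (t, h, w). Valid correlation."""
    t, h, w = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for a in range(out.shape[0]):
        for b in range(out.shape[1]):
            for c in range(out.shape[2]):
                out[a, b, c] = np.sum(clip[a:a+t, b:b+h, c:c+w] * kernel)
    return out

clip = np.random.randn(8, 12, 12)              # 8 consecutive video frames
feat = conv3d_valid(clip, np.random.randn(3, 3, 3))
# feat.shape -> (6, 10, 10): each output mixes information from 3 frames
```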
In step S42, the result generated by the 3D deep cyclic convolutional neural network model in step S41 is input into the multi-channel multi-size time deep cyclic convolutional neural network model, and after calculation with multiple channels and multiple convolution kernels, feature maps equal in number to the convolution kernels are output; for example, for a convolutional layer with a 3-channel input and 2 convolution kernels, 2 feature maps are output after the convolution calculation. Considering a sentence-level video image recognition method, this embodiment adopts two steps, pinyin sequence recognition (LipPic to Pinyin, P2P) and Chinese character sequence recognition (Pinyin to Chinese Character, P2CC), to implement video image lip recognition as a Chinese lip-reading method. Specifically, the time-series image features generated by the multi-channel multi-size time deep cyclic convolutional neural network model are mapped into a pinyin sequence of pinyin sentences, the pinyin sequence is translated into a Chinese character sequence of Chinese character sentences, and finally the first image text data is obtained. Of course, other Chinese lip recognition methods are not particularly limited here; as long as the conversion of video images into corresponding text data can be realized, they fall within the protection scope of the invention.
In steps S43 and S44, in the same manner as the voice data processing described above, the connection time sequence classification method is also adopted to realize the mapping relationship between the effective video data and the text of the first image text data, so as to obtain the second image text data. The second image text data is then given its own weight information and/or associated weight information through a model in which a self-attention mechanism is fused with a fully-connected layer, so that the internal and/or associated weight information of the text semantic features of the image text data is better obtained and the importance of different parts of the text semantic feature information is enhanced; finally, the image text data is obtained by calculating scores through a classification function. The specific processing procedure is the same as the voice data processing steps above and will not be repeated here.
As shown in fig. 7, in step S5, it specifically includes:
s51: and performing entity extraction and entity alignment processing on the text data to obtain a plurality of normalized entities.
S52: and based on each entity query food material knowledge graph, obtaining a corresponding entity vector.
S53: and inputting the entity vector into an attention mechanism for calculation to obtain the real-time vector matrix.
In step S51, entity extraction, also called named entity recognition (Named Entity Recognition, NER), mainly aims at recognizing the text span of a named entity and classifying it into predefined categories, which generally include three major classes: entity, time and number. Entity extraction mainly extracts atomic information elements from text data, such as person names, organization/institution names, geographical locations, events/dates and numeric values. The entity is the most basic element of the knowledge graph, and the completeness, accuracy and recall of entity extraction directly affect the quality of the knowledge graph.
In the embodiment of the invention, entity extraction is performed on the text data converted from the acquired multi-source heterogeneous data. Since different knowledge graphs emphasize different aspects when collecting knowledge, descriptions of the same entity differ across knowledge graphs. To determine whether two or more entities from different information sources point to the same object in the real world, entity alignment processing must be performed on the extraction results: an alignment relationship is constructed over the multiple entities representing the same object, and the information they contain is fused and aggregated. Entity alignment is a special cross-network relationship; according to its characteristics, entity alignment can specifically be realized by a method based on knowledge representation learning, thereby obtaining standardized entities that satisfy the requirements, such as food-material-related entity information.
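At its simplest, the alignment idea — mapping different surface forms of the same real-world food material to one canonical entity — can be sketched with an alias table; the table entries are invented for illustration and a representation-learning method would learn such correspondences rather than enumerate them:

```python
# Toy sketch of entity alignment: surface forms from different sources that
# denote the same food material map to one canonical entity. The alias table
# is an invented illustration.
ALIASES = {
    "tomato": "tomato",
    "tomatoes": "tomato",
    "roma tomato": "tomato",
    "scallion": "green onion",
    "spring onion": "green onion",
}

def align_entities(mentions):
    """Return de-duplicated canonical entities for a list of raw mentions."""
    canonical = []
    for m in mentions:
        entity = ALIASES.get(m.strip().lower(), m.strip().lower())
        if entity not in canonical:
            canonical.append(entity)
    return canonical

print(align_entities(["Tomatoes", "roma tomato", "Spring Onion"]))
# ['tomato', 'green onion']
```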
In step S52, the knowledge graph is essentially a knowledge base called a semantic network, i.e., a knowledge base with a graph structure. It is a relatively general formalized description framework for semantic knowledge, in which nodes represent semantic symbols and edges represent the semantic relationships between symbols.
In the embodiment of the invention, the food material entity vector corresponding to each extracted entity is obtained from the pre-constructed food material knowledge graph. Specifically, according to the knowledge representation method in the knowledge graph, the relations between real-world entities are described by entity-relation-entity triples, which together form a networked knowledge structure. Distributed representation learning on the knowledge graph represents the entities and relations as low-dimensional vectors that contain their semantic relations, from which the corresponding food material entity vectors are obtained. This knowledge representation method better reflects multi-granularity, multi-level semantic relations such as entities, categories, attributes and relations, thereby enriching the semantic information in the text data.
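One well-known way to embed such triples (a TransE-style model, named here as an assumption since the text does not fix the embedding method) scores a triple (h, r, t) by how well h + r ≈ t holds in the vector space; the vectors below are hand-made toys:

```python
import numpy as np

# Sketch of entity-relation-entity triple scoring with a TransE-style
# distributed representation: a plausible triple satisfies h + r ~= t.
# Vectors are hand-made toy values, not learned embeddings.
def transe_score(h, r, t):
    """Lower score = more plausible triple under the h + r ~= t assumption."""
    return np.linalg.norm(h + r - t)

tomato    = np.array([1.0, 0.0, 0.0])
vegetable = np.array([1.0, 1.0, 0.0])
is_a      = np.array([0.0, 1.0, 0.0])   # relation embedding

good = transe_score(tomato, is_a, vegetable)    # ~0: plausible triple
bad  = transe_score(vegetable, is_a, tomato)    # larger: implausible triple
```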
In step S53, in order to attend to the correlated semantic information between different entities (or within an entity itself) after the relevant entities are extracted from the food material knowledge graph, and to the complementary relationships in the related food material text data, the entity vectors generated in step S52 are input into the attention mechanism. Guided by the entity features in the multi-hop neighborhood of a given entity or node, different attention weights are assigned to the entities — that is, different entities receive different attention according to actual requirements — and information is then gathered from the corresponding entities according to these weights to obtain the real-time vector matrix. In this way, based on the food material knowledge graph, the vectors of the different extracted entities are fused and aggregated, deep semantics in the various text data are fully mined, the shortcoming of single features in voice and video data is compensated, the semantic representation capability of the text data is enriched, and the subsequent text classification capability is optimized.
Also, in step S6, the operation steps specifically included are similar to those of step S5 described above:
specifically, in the embodiment of the invention, entity extraction and entity alignment processing are also performed on the user food material history comment text data so as to obtain normalized food material history comment entities; obtaining corresponding historical comment entity vectors according to the entity query food material knowledge graph; and inputting the historical comment entity vector to an attention mechanism for calculation processing to obtain the historical comment vector matrix.
The user food material historical comment data includes user historical comments, user clicks on food materials, browsing information and the like. This data carries rich semantic features such as the user's food material preferences and interests. Through the processing of the attention mechanism, the complementary relationships of the user's food material texts and the correlated semantic information of each entity under the knowledge graph can be obtained, thereby alleviating the problem of data sparsity.
As shown in fig. 8, in step S7, it specifically includes:
s71: and obtaining the maximum real-time semantic similarity according to the real-time vector matrix.
S72: and obtaining the maximum historical semantic similarity according to the historical vector matrix.
S73: and fusing the real-time semantic similarity and the historical semantic similarity into a matrix-form target optimization function to predict scores so as to generate a food material recommendation list.
The similarity calculation mainly includes cosine similarity, Manhattan distance similarity and Euclidean distance similarity; the specific calculation method is not particularly limited by the present invention. Spatial distance similarity can reflect the semantic similarity of food materials, so that semantic relevance can be better characterized when food materials are subsequently recommended.
In steps S71 and S72, according to the learning method of the food material knowledge graph, vector representations of all entities and relations in the field of the recommended food material object are obtained, and the entity representation of the recommended food material object is selected from the entity vector matrix. The vector representation of the recommended food material object incorporates the related entity knowledge of the whole field, so it contains the contextual semantic knowledge of the recommended food material object.
In this embodiment, according to the real-time vector matrix, the cosine or Manhattan distance method is used to calculate the real-time semantic similarity of any two food materials, so as to obtain the maximum real-time semantic similarity. Similarly, based on the history vector matrix, the historical semantic similarity of any two food materials can be calculated and the maximum historical semantic similarity obtained. To keep the value ranges of the semantic similarities consistent, normalization is performed, and the normalized result is the final similarity of the feature vectors of the two food materials.
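The pairwise cosine similarity and maximum-similarity lookup can be sketched as follows; the food material vectors are toy data, and the self-similarity diagonal is masked out before taking the maximum:

```python
import numpy as np

# Sketch of steps S71/S72: pairwise cosine similarity between food material
# vectors, then the maximum over distinct pairs. Vectors are toy data.
def cosine_sim_matrix(vectors):
    """Cosine similarity between every pair of rows in `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

foods = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
sim = cosine_sim_matrix(foods)
np.fill_diagonal(sim, -np.inf)        # ignore each vector's self-similarity
max_sim = sim.max()                   # most similar distinct food-material pair
```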
In step S73, the real-time semantic similarity and the historical semantic similarity calculated in steps S71 and S72 are fused into a matrix-form objective optimization function to obtain prediction scores. Based on the principle that a higher prediction score indicates stronger user interest, a threshold or threshold range is set, and the food materials whose prediction scores satisfy it are recommended to the user — for example, the top N food materials scoring above the threshold. Specifically, the food material with the highest score may be recommended to the user, and the specific recommendation information can be adjusted according to specific requirements.
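The fusion-and-threshold logic of step S73 can be sketched as below; the fusion weight, threshold and score values are illustrative assumptions, not quantities fixed by the method:

```python
# Sketch of step S73: fuse real-time and historical scores, then return the
# top-N food materials above a threshold. alpha and threshold are assumptions.
def recommend(real_time_scores, history_scores, alpha=0.6, threshold=0.5, top_n=3):
    """Weighted fusion of the two score sources, thresholded and ranked."""
    fused = {
        food: alpha * real_time_scores[food] + (1 - alpha) * history_scores[food]
        for food in real_time_scores
    }
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [food for food, score in ranked if score > threshold][:top_n]

rt   = {"tomato": 0.9, "milk": 0.4, "beef": 0.7, "egg": 0.2}
hist = {"tomato": 0.8, "milk": 0.9, "beef": 0.3, "egg": 0.1}
print(recommend(rt, hist))   # ['tomato', 'milk', 'beef']
```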
In summary, the food material recommendation method based on the knowledge graph provided by the invention is obtained through the above steps in sequence. Real-time audio/video data, real-time text data and historical food material comment data are obtained; the real-time audio/video data is cleaned and then separated into voice and video, generating effective voice data and effective video data respectively; these are converted into corresponding text data, all of which serve as part of the data set for the pre-training and prediction models, so that text semantic features are obtained more comprehensively.
In addition, a multi-channel multi-size deep cyclic convolutional network model fused with the connection time sequence classification method and an attention mechanism, together with a time-based deep cyclic convolutional neural network model, are constructed to mine richer high-level semantic feature information. Finally, a real-time vector matrix and a history vector matrix are obtained from the various text data; real-time semantic similarity and historical semantic similarity are calculated from them and fused into a matrix-form objective optimization function to obtain prediction scores, generating a food material recommendation list that is output in various ways. The whole model structure has good semantic representation capability for text data, reflects good complementarity and relevance in the semantic features, and improves the accuracy of the food material recommendation method.
In step S8, it specifically includes:
converting the generated food material recommendation list into voice for output, and/or
Converting the generated food material recommendation list into voice to be transmitted to a client terminal for output, and/or
Converting the generated food material recommendation list into text for output, and/or
Converting the generated food material recommendation list into text for transmission to a client terminal for output, and/or
Converting the generated food material recommendation list into an image to output, and/or
And converting the generated food material recommendation list into an image, and transmitting the image to a client terminal for outputting.
As described in step S8, in this embodiment, after the food material recommendation list is obtained through the above steps, it may be converted into voice and broadcast by a sound playing device built into the intelligent refrigerator, converted into text and displayed directly on a display device of the intelligent refrigerator, or converted into an image and displayed directly on the large screen of the intelligent refrigerator. The result information can also be transmitted to a client terminal for output, where the client terminal is an electronic device with an information receiving function: for example, the voice can be sent to a mobile phone, smart speaker or Bluetooth headset for broadcasting, or the recommendation list can be transmitted as text or image, by short message, e-mail or similar means, to a client terminal such as a mobile phone or tablet computer, or to application software installed on it, for the user to review. This realizes a multi-channel, multi-type output mode for the classification result information, so that the user is no longer limited to obtaining the relevant information only near the intelligent refrigerator. Combined with the multi-channel, multi-type real-time voice acquisition mode provided by the invention, the user can interact with the intelligent refrigerator directly and remotely, which is extremely convenient and greatly improves the user experience.
In other embodiments of the present invention, only one or more of the above-described classification result information output modes may be used, or food recommendation list information may be output through other channels based on the prior art, which is not particularly limited in the present invention.
In summary, the knowledge-graph-based food material recommendation method provided by the invention acquires real-time audio/video data, real-time text data and historical user food material comment data through multiple channels. After data processing, the audio/video data is converted into corresponding voice text data and image text data. Once the text is generated, text semantic features are fully extracted through the multi-channel multi-size deep cyclic convolutional neural network model and the multi-channel multi-size time deep cyclic convolutional neural network model; real-time and historical semantic similarities are then calculated from the text data and fused into a matrix-form objective optimization function to obtain prediction scores, generating a food material recommendation list whose results are output through multiple channels.
Based on the same inventive concept, the present invention also provides an electrical apparatus, comprising:
a memory for storing executable instructions;
And the processor is used for realizing the food material recommendation method based on the knowledge graph when the executable instructions stored in the memory are operated.
Based on the same inventive concept, the present invention also provides a refrigerator including:
a memory for storing executable instructions;
and the processor is used for realizing the food material recommendation method based on the knowledge graph when the executable instructions stored in the memory are operated.
Based on the same inventive concept, the invention also provides a computer readable storage medium which stores executable instructions which when executed by a processor realize the food recommendation method based on the knowledge graph.
It should be understood that although the present description is set out in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is for the sake of clarity only. Those skilled in the art should regard the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.
The above detailed description is only an illustration of feasible embodiments of the invention and is not intended to limit the scope of the invention; all equivalent embodiments or modifications that do not depart from the spirit of the invention shall fall within its scope.

Claims (18)

1. The food material recommending method based on the knowledge graph is characterized by comprising the following steps of:
acquiring real-time audio and video data, acquiring real-time text data and acquiring historical comment text data;
preprocessing the real-time audio and video data, the real-time text data and the historical comment text data to obtain real-time voice data, real-time text data, real-time video data and historical comment text data;
transferring the effective real-time voice data into voice text data;
transferring the effective real-time video data into image text data;
acquiring a real-time vector matrix according to the voice text data, the real-time text data and the image text data;
acquiring a history vector matrix corresponding to the history comment text data;
based on the real-time vector matrix and the history vector matrix, fusing the real-time semantic similarity and the history semantic similarity to generate a food material recommendation list;
and outputting the food material recommendation list.
2. The knowledge-graph-based food recommendation method according to claim 1, wherein the preprocessing the real-time audio/video data, the real-time text data and the historical comment text data to obtain real-time voice data, real-time text data, real-time video data and historical comment text data specifically comprises:
performing data cleaning, format parsing, format conversion and data storage on the real-time audio and video data, the real-time text data and the historical comment text data to obtain valid real-time audio and video data, real-time text data and historical comment text data;
separating the valid audio and video data into voice and video by using a script or a third-party tool to obtain the real-time voice data and the real-time video data;
preprocessing the real-time voice data and the real-time video data, including: framing and windowing the real-time voice data, and clipping and framing the real-time video data;
preprocessing the real-time text data and the historical comment text data, including: word segmentation, stop-word removal and deduplication.
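The text preprocessing of claim 2 (segmentation, stop-word removal, deduplication) can be sketched as follows. The whitespace tokenizer and the stop-word list are illustrative assumptions, not the patent's actual preprocessing.

```python
# Minimal sketch of claim-2 text preprocessing. The stop-word list is a
# toy example; a real system would use a language-appropriate segmenter.

STOP_WORDS = {"the", "a", "of", "and"}

def preprocess_text(sentence):
    tokens = sentence.lower().split()                    # naive segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    seen, deduped = set(), []
    for t in tokens:                                     # order-preserving dedup
        if t not in seen:
            seen.add(t)
            deduped.append(t)
    return deduped

print(preprocess_text("the fresh tomato and the fresh egg"))
# -> ['fresh', 'tomato', 'egg']
```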
3. The knowledge-graph-based food recommendation method according to claim 1, wherein the "transcribing the valid voice data into voice text data" specifically includes:
extracting features from the valid voice data to obtain voice features;
inputting the voice features into a multichannel multi-size deep recurrent convolutional network model for voice recognition to obtain first voice text data through transcription;
outputting the alignment relation between the voice features and the first voice text data based on a connectionist temporal classification method to obtain second voice text data;
acquiring key features of the second voice text data, or weight information of the key features, based on an attention mechanism;
combining the second voice text data with the key features or the weight information of the key features through a fully connected layer, and computing scores through a classification function to obtain the voice text data.
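The connectionist temporal classification (CTC) alignment referenced in claim 3 can be illustrated with the standard greedy decoding rule: collapse consecutive repeated labels, then drop blanks. This is generic textbook CTC, not the patented network; the per-frame labels here are hypothetical.

```python
# Greedy CTC decoding sketch: collapse consecutive repeats, then remove
# the blank symbol. In a real recognizer the frame labels would be the
# argmax of the network's per-frame label distributions.

BLANK = "-"

def ctc_collapse(frame_labels):
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_collapse(list("--hh-e-ll-ll-oo-")))  # -> "hello"
```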
4. The knowledge-graph-based food recommendation method according to claim 3, wherein the extracting the valid voice data features specifically comprises:
extracting features from the valid voice data to obtain Mel-frequency cepstral coefficient (MFCC) features.
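For reference, a textbook MFCC computation (framing, Hamming windowing, power spectrum, mel filterbank, log, DCT-II) can be sketched in NumPy. The frame length, hop size, and filter counts below are common illustrative defaults, not values stated in the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13):
    # 1. framing + Hamming windowing (the claim-2 preprocessing step)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. triangular mel filterbank
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # 4. log mel energies, then DCT-II to get cepstral coefficients
    feats = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return feats @ dct.T  # shape: (n_frames, n_ceps)
```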
5. The knowledge-graph-based food recommendation method according to claim 1, wherein the transferring the effective real-time video data into image text data specifically comprises:
inputting the real-time video data into a 3D deep recurrent convolutional neural network for calculation to obtain image features;
inputting the image features into a multichannel multi-size temporal convolutional network for transcription to obtain first image text data;
outputting the alignment relation between the image features and the first image text data based on a connectionist temporal classification method to obtain second image text data;
combining the second image text data through a fully connected layer, and computing scores through a classification function to obtain the image text data.
6. The knowledge-graph-based food recommendation method according to claim 1, wherein the step of obtaining a real-time vector matrix from the voice text data, the real-time text data and the image text data specifically comprises:
performing entity extraction and entity alignment on the voice text data, the real-time text data and the image text data to obtain a plurality of normalized entities;
querying a food material knowledge graph with each entity to obtain a corresponding entity vector;
inputting the entity vectors into an attention mechanism for calculation to obtain the real-time vector matrix.
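The attention step of claim 6 can be illustrated as softmax-weighted pooling of the entity vectors into one real-time vector. The scoring scheme here (dot product against the mean of the vectors as query) is an illustrative assumption; the patent does not specify the attention formulation.

```python
import numpy as np

def attention_pool(entity_vectors):
    """Pool entity vectors (one row per normalized entity) into one vector."""
    E = np.asarray(entity_vectors, dtype=float)   # (n_entities, dim)
    query = E.mean(axis=0)                        # hypothetical query vector
    scores = E @ query                            # one score per entity
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax attention weights
    return weights @ E                            # weighted sum, shape (dim,)
```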
7. The knowledge-graph-based food recommendation method of claim 6, further comprising:
representing the results of the entity extraction and the food material knowledge graph as triples;
achieving the entity alignment through knowledge-graph representation learning.
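The triple representation named in claim 7 is the standard (head, relation, tail) form. A minimal sketch of a food material knowledge graph stored this way, with a tiny query helper, follows; the facts shown are invented examples, not the patent's graph.

```python
# A food material knowledge graph as (head, relation, tail) triples.
# The relations and facts below are illustrative placeholders.

TRIPLES = [
    ("tomato", "pairs_with", "egg"),
    ("tomato", "category", "vegetable"),
    ("egg", "category", "protein"),
]

def query(head, relation):
    """Return all tails matching a (head, relation) pattern."""
    return [t for h, r, t in TRIPLES if h == head and r == relation]

print(query("tomato", "pairs_with"))  # -> ['egg']
```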
8. The knowledge-graph-based food recommendation method according to claim 1, wherein the step of obtaining the history comment vector matrix corresponding to the history comment text data specifically includes:
performing entity extraction and entity alignment on the historical comment text data to obtain a plurality of normalized historical comment entities;
querying a food material knowledge graph with each historical comment entity to obtain a corresponding historical comment entity vector;
inputting the historical comment entity vectors into an attention mechanism for calculation to obtain the historical comment vector matrix.
9. The knowledge-graph-based food recommendation method of claim 8, further comprising:
the historical comment text data comprises user comment data, the user's click-through rate on food materials, and food material browsing information.
10. The knowledge-graph-based food recommendation method according to claim 1, wherein the fusing of real-time semantic similarity and historical semantic similarity to generate a food recommendation list based on the real-time vector matrix and the historical vector matrix specifically comprises:
obtaining a maximum real-time semantic similarity according to the real-time vector matrix;
obtaining a maximum historical semantic similarity according to the historical vector matrix;
fusing the real-time semantic similarity and the historical semantic similarity into an objective optimization function in matrix form for calculation to generate the food material recommendation list.
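The fusion step of claim 10 can be sketched with cosine similarity: for each candidate food material, take the maximum similarity against the real-time matrix rows and against the historical matrix rows, then fuse the two scores into a ranking. The equal weighting (`alpha = 0.5`) and the use of cosine similarity are assumptions for illustration; the patent's objective function is not specified at this level.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

def max_similarity(candidate, matrix):
    # maximum semantic similarity of a candidate against any matrix row
    return max(cosine(candidate, row) for row in matrix)

def fused_ranking(candidates, realtime_mat, history_mat, alpha=0.5):
    """Rank candidate food materials by fused real-time + historical score."""
    scores = {
        name: alpha * max_similarity(vec, realtime_mat)
              + (1 - alpha) * max_similarity(vec, history_mat)
        for name, vec in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```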
11. The knowledge-graph-based food recommendation method according to claim 10, wherein the obtaining the maximum real-time semantic similarity according to the real-time vector matrix specifically comprises:
obtaining a plurality of candidate food material vector matrices according to the food material knowledge graph;
calculating the semantic similarity between each candidate food material vector matrix and the real-time vector matrix, and taking the maximum semantic similarity value.
12. The knowledge-graph-based food recommendation method according to claim 10, wherein the step of obtaining the maximum historical semantic similarity according to the historical vector matrix comprises:
obtaining a plurality of candidate food material vector matrices according to the food material knowledge graph;
calculating the semantic similarity between each candidate food material vector matrix and the historical vector matrix, and taking the maximum semantic similarity value.
13. The knowledge-graph-based food material recommendation method according to claim 1, wherein the outputting the food material recommendation list specifically comprises:
converting the generated food material recommendation list into voice for output, and/or
converting the generated food material recommendation list into voice and transmitting it to a client terminal for output, and/or
converting the generated food material recommendation list into text for output, and/or
converting the generated food material recommendation list into text and transmitting it to a client terminal for output, and/or
converting the generated food material recommendation list into an image for output, and/or
converting the generated food material recommendation list into an image and transmitting it to a client terminal for output.
14. The knowledge-graph-based food recommendation method according to claim 1, wherein the step of acquiring real-time audio/video data, acquiring real-time text data, and acquiring historical comment text data specifically comprises:
acquiring the real-time audio and video data acquired by a voice acquisition device, and/or
acquiring the real-time audio and video data transmitted from a client terminal;
acquiring the real-time text data acquired by a text acquisition device, and/or
acquiring the real-time text data transmitted from a client terminal;
acquiring the internally stored historical comment text data, and/or
acquiring the externally stored historical comment text data, and/or
acquiring the historical comment text data transmitted from a client terminal.
15. The knowledge-graph-based food recommendation method according to claim 1, wherein the "transcribing the valid real-time voice data into voice text data" further comprises:
acquiring configuration data stored in an external cache, and performing the multichannel multi-size deep recurrent convolutional neural network model calculation and text transcription on the voice data based on the configuration data.
16. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the knowledge-graph-based food recommendation method of any one of claims 1 to 15 when executing the executable instructions stored in the memory.
17. A refrigerator, comprising:
a memory for storing executable instructions;
a processor for implementing the knowledge-graph-based food recommendation method of any one of claims 1 to 15 when executing the executable instructions stored in the memory.
18. A computer readable storage medium storing executable instructions, which when executed by a processor implement the knowledge-graph based food recommendation method of any one of claims 1 to 15.
CN202211737579.0A 2022-12-31 2022-12-31 Food material recommendation method, equipment and storage medium based on knowledge graph Pending CN116070020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211737579.0A CN116070020A (en) 2022-12-31 2022-12-31 Food material recommendation method, equipment and storage medium based on knowledge graph


Publications (1)

Publication Number Publication Date
CN116070020A true CN116070020A (en) 2023-05-05

Family

ID=86172698


Country Status (1)

Country Link
CN (1) CN116070020A (en)

Similar Documents

Publication Publication Date Title
CN110460872B (en) Information display method, device and equipment for live video and storage medium
US11950020B2 (en) Methods and apparatus for displaying, compressing and/or indexing information relating to a meeting
CN109801648B (en) Message popup voice editing method and device, computer equipment and storage medium
US10586528B2 (en) Domain-specific speech recognizers in a digital medium environment
CN109509470A (en) Voice interactive method, device, computer readable storage medium and terminal device
CN110517689A (en) A kind of voice data processing method, device and storage medium
WO2023222088A1 (en) Voice recognition and classification method and apparatus
CN111372141B (en) Expression image generation method and device and electronic equipment
CN113421547B (en) Voice processing method and related equipment
CN111883107B (en) Speech synthesis and feature extraction model training method, device, medium and equipment
WO2023222089A1 (en) Item classification method and apparatus based on deep learning
CN110910898B (en) Voice information processing method and device
CN116108176A (en) Text classification method, equipment and storage medium based on multi-modal deep learning
WO2023222090A1 (en) Information pushing method and apparatus based on deep learning
CN116186258A (en) Text classification method, equipment and storage medium based on multi-mode knowledge graph
CN113688231A (en) Abstract extraction method and device of answer text, electronic equipment and medium
CN111695360A (en) Semantic analysis method and device, electronic equipment and storage medium
CN116070020A (en) Food material recommendation method, equipment and storage medium based on knowledge graph
CN112506405B (en) Artificial intelligent voice large screen command method based on Internet supervision field
CN114171000A (en) Audio recognition method based on acoustic model and language model
CN114138960A (en) User intention identification method, device, equipment and medium
CN112581937A (en) Method and device for acquiring voice instruction
Sartiukova et al. Remote Voice Control of Computer Based on Convolutional Neural Network
US20220108685A1 (en) Method and server for training a neural network to generate a textual output sequence
CN112052333B (en) Text classification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination