CN113010701A - Video-centered fused media content recommendation method and device

Info

Publication number
CN113010701A
Authority
CN
China
Prior art keywords
information
content
video
data
content metadata
Prior art date
Legal status
Pending
Application number
CN202110214235.0A
Other languages
Chinese (zh)
Inventor
郑叔亮
Current Assignee
Beijing Star Times Software Technology Co., Ltd.
Original Assignee
Beijing Star Times Software Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Star Times Software Technology Co., Ltd.
Priority to CN202110214235.0A
Publication of CN113010701A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F16/436 Filtering using biological or physiological data of a human being, e.g. blood pressure, facial expression, gestures
    • G06F16/45 Clustering; Classification
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physiology (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video-centered fused media content recommendation method and device in the field of internet technology. The method uses video as the hub of a converged media system to deliver comprehensive content of many forms to the user efficiently, applies artificial intelligence to analyze and mine the entity elements and style elements contained in video content, and completes accurate commodity recommendation by dynamically associating marketable commodities. The method comprises the following steps: integrating content databases of multi-format multimedia and aggregating content metadata information to construct a content metadata database; labeling each piece of content metadata information in the content metadata database, and forming a fusion media knowledge graph from the content metadata information and the labels; capturing the entity elements and style elements in the current key frame in real time while a video is playing, and searching the fusion media knowledge graph for the target element information the user is attending to; and screening candidate commodities from a commodity library according to the target element information and recommending them to the user.

Description

Video-centered fused media content recommendation method and device
Technical Field
The invention relates to the technical field of the internet, in particular to a method and a device for recommending fused media content centered on video.
Background
Converged media is a media form that functionally integrates multiple kinds of media content: it makes full use of the internet as a carrier to comprehensively integrate media that share common ground and complement one another, such as broadcasting, television and newspapers, in terms of manpower, content and publicity, producing a new kind of media with integrated resources, compatible content, mutually reinforcing publicity and shared benefits.
An important technical basis for the development of converged media is the internet. With the continuous development of the hardware of terminal devices such as personal computers and mobile phones, more and more people choose to watch the television programs offered by video websites on such devices. A video website is a website that, supported by the relevant technical platform, lets internet users publish, browse and share video works online, such as Youku, LeTV and iQIYI; a video website will generally also launch its own video client application (also called a video client) specifically for playing its video works on terminal devices such as mobile phones or personal computers, for example the Youku video client and the iQIYI video client. The rise of short video in recent years has set off a new wave of popularity, for example Douyin and Kuaishou. Both media organizations and individual creators can publish their own content through a short-video platform to gain attention and followers, and monetize the traffic through advertising and e-commerce.
In the converged media format, therefore, video content is the core, and the core business model it contains is e-commerce transactions driven by traffic conversion. This takes two steps: first, attract more users to watch and follow through the propagation of converged media content (mainly video), i.e., the traffic; second, recommend commodities (physical or virtual) to the user in an appropriate way while the user consumes the content, and close the transaction through e-commerce, i.e., the successful marketing conversion of the content.
Electronic commerce likewise owes its rise to the internet and has undergone enormous development, producing an ecosystem and economy of extremely large scale. As e-commerce keeps developing, more and more users choose to shop online. A user can conveniently select the goods needed by accessing an e-commerce website through a browser. In many cases the e-commerce website recommends goods to the user: for example, after the user purchases a certain kind of goods, it may recommend similar or related goods; it may also recommend newly shelved goods, discounted goods, hot-selling goods, and the like. At present, e-commerce websites generally recommend commodities based on sales rankings, users' evaluation scores of commodities, or analysis of the user's other behavior data on the site. This shopping-mall-style mode clearly lags behind, which is why new modes such as social e-commerce, live-streaming e-commerce and short-video e-commerce have arisen. Now that the new form of converged media has been defined and is developing, with the capability of diversified media forms to collaborate stereoscopically, propagate rapidly and cover widely, converged media is bound to become an important guiding force for electronic commerce.
Among the technical solutions already disclosed, there are several methods that use object recognition technology to associate video content with commodities to be promoted and recommend them, such as "Commodity recommendation method and system based on video content" (application No. 201510093789.4) and "Video-based commodity recommendation method and apparatus" (application No. 201610511072.1). The purpose of these methods is to let the user see recommendations for the same physical goods that appear while watching a video, thereby improving the user experience and raising the probability of sales conversion. The feasibility of these solutions is basically confirmed, but the following problems remain:
1. they are based on video alone and aim only at the association and recommendation of physical commodities; this does not suit the current trend toward the fusion of media content, cannot mine more related content with the video as the center, has limited bandwidth in the recommendation process, cannot form a content matrix and network, and cannot achieve a fission-like (viral) recommendation effect;
2. the claimed improvement in user experience is questionable: because recommendation is driven purely by commodities matched from the video images, the user is likely to be disturbed by a clutter of commodity information; many of the commodities are not actually wanted, and recommending them frequently is equivalent to repeatedly playing advertisements the user does not want to see, giving the user a bad experience.
Disclosure of Invention
The invention aims to provide a video-centered converged media content recommendation method that uses video as the hub of a converged media system to deliver comprehensive content of many forms to the user efficiently, applies artificial intelligence to analyze and mine the entity elements and style elements contained in video content, and completes accurate commodity recommendation by dynamically associating marketable commodities.
In order to achieve the above object, a first aspect of the present invention provides a video-centric converged media content recommendation method, comprising:
integrating a content database of multi-format multimedia, and aggregating content metadata information to construct a content metadata database;
labeling each content metadata information in the content metadata base, and forming a fusion media knowledge graph based on the content metadata information and the labels;
capturing entity elements and style elements in a current key frame during video playing in real time, and searching target element information concerned by a user from the fusion media knowledge graph;
and screening candidate commodities from a commodity library according to the target element information, and recommending the candidate commodities to the user.
Preferably, the method for integrating the content database of the multi-format multimedia and aggregating the content metadata information to construct the content metadata database comprises the following steps:
aggregating content metadata information in a multimedia content database in a video format, a multimedia content database in a picture format and a multimedia content database in an article format to construct a content metadata database;
the content metadata information includes content titles and content descriptions of video data, picture data, and article data.
Preferably, the method for tagging each piece of content metadata information in the content metadata base includes:
performing classification, word segmentation and part-of-speech tagging on each piece of content metadata information in the content metadata database, and retaining the key information words therein;
randomly distributing the key information words corresponding to all the content metadata information into a plurality of groups of data;
taking one group of data, manually labeling it with a preset label system to serve as the initial training data set, and setting an initial training model;
training on each label in the initial training data set to obtain a label classification model;
running the label classification model over the other groups of data until the labeling accuracy reaches the accuracy threshold, at which point the label classification model is output, and otherwise continuing to optimize the label classification model;
and labeling each piece of content metadata information in the content metadata database with the output label classification model.
Further, the method for continuously optimizing the label classification model comprises the following steps:
optimizing the label classification model by expanding the initial training data set and/or adjusting the initial training model.
Preferably, before capturing the entity elements and style elements in the current key frame in real time during video playing, one or more of the following steps are also included:
analyzing the video data in the multimedia content database, and identifying the entity element information and style element information of its key frame images to expand into the fusion media knowledge graph;
analyzing the picture data in the multimedia content database, and identifying the entity element information and style element information of the pictures to expand into the fusion media knowledge graph;
analyzing the article data in the multimedia content database, and identifying the key word entity information in the article data to expand into the fusion media knowledge graph.
Preferably, the method for identifying the entity element information and the style element information in the key frame images and pictures comprises:
detecting human faces and human body postures in the key frame images and pictures to identify faces and torso limbs;
searching an actor database based on the face recognition result to obtain the actor's identity information, and recognizing the actor's dress style based on the torso-limb recognition result to obtain the actor's clothing information;
identifying objects other than human faces and bodies in the key frame images and pictures to obtain commodity information of those objects;
recognizing characters in the key frame images and pictures to obtain text information;
and expanding one or more of the identity information, clothing information, commodity information and text information into the fusion media knowledge graph.
Further, the method for identifying the key word entity information in the article data comprises:
splitting the article data into clauses according to sentence separators, and assigning each clause a sentence identifier with a unique ID;
associating each clause's object data with the sentence identifiers of the preceding and following clauses, and expanding them into the fusion media knowledge graph;
segmenting each clause into words, removing noise words, then assigning each retained word a word identifier with a unique ID, and expanding the words into the fusion media knowledge graph;
and assigning the article an identifier with a unique ID according to the content attributes of the article data, and expanding the article's object data, the association between the article and its clauses, and the association between the clauses and their words into the fusion media knowledge graph.
Preferably, the method for searching the fusion media knowledge graph for the target element information the user is attending to comprises:
converting the entity elements and the style elements into feature vectors, and retrieving related element information from the fusion media knowledge graph;
dynamically adjusting the intensity value of the element information based on the occurrence frequency of the element information within the preset key frame interval number;
and when the intensity value of the element information reaches an intensity threshold value, outputting the corresponding element information as target element information.
Preferably, the method for screening out candidate commodities from a commodity library according to the target element information and recommending the candidate commodities to the user comprises the following steps:
if the target element information comprises actor information, searching commodities related to the actor information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
if the target element information comprises clothing information, searching commodities related to the clothing information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
if the target element information comprises commodity information of the object, searching commodities related to the commodity information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
and carrying out correlation scoring on the candidate commodities in the candidate commodity list, and screening the candidate commodity with the highest score to recommend to the user.
Optionally, after recommending the candidate product to the user, the method further includes:
and dynamically adjusting the intensity value of the element information according to the interaction data of the user aiming at the recommended candidate commodity.
Compared with the prior art, the method for recommending the converged media content by taking the video as the center has the following beneficial effects:
the invention provides a video-centered fused media content recommendation method, which comprises the steps of integrating a multi-format multimedia content database to form a basic content warehouse, then performing aggregation processing on content metadata information to construct a content metadata database, labeling each content metadata information by adopting a preset label system, forming a fused media knowledge graph based on the content metadata information and the labeled labels, capturing entity elements and style elements in key frames during video playing in real time when a user watches videos, dynamically retrieving and matching related contents in the fused media knowledge graph based on a feature vector formed by the entity elements and the style elements of each key frame, wherein the related contents comprise videos, pictures and articles, searching target element information concerned by the user, and finally taking the target element information as a commodity library input to the search rear end, and finding out the candidate goods which are most suitable for recommendation, and recommending the candidate goods to the user. Therefore, the invention has the following beneficial effects:
1. a fused knowledge graph of multi-format multimedia content is established, realizing dynamic collaboration of converged media content, so that more content can be recommended to the user and content marketing becomes more effective;
2. compared with keyword matching, semantic content recommendation based on the knowledge graph achieves stronger generalization and more accurate recommendation, with better efficiency;
3. with video as the hub, comprehensive content of many forms is spread to users efficiently; artificial intelligence is used to analyze and mine the entity elements and style elements contained in the video content, and accurate content and/or commodity recommendation is completed by dynamically associating marketable commodities.
A second aspect of the present invention provides a video-centric converged media content recommendation apparatus, which is applied to the video-centric converged media content recommendation method according to the foregoing technical solution, and the apparatus includes:
the aggregation unit is used for integrating the content database of the multi-format multimedia and aggregating the content metadata information to construct a content metadata database;
the map unit is used for labeling each content metadata information in the content metadata base and forming a fusion media knowledge map based on the content metadata information and the label;
the retrieval unit is used for capturing entity elements and style elements in the current key frame in real time when the video is played, and searching target element information concerned by the user from the fusion media knowledge graph;
and the recommending unit is used for screening out target commodity information from the commodity library according to the target element information and recommending the target commodity information to the user.
Compared with the prior art, the video-centered converged media content recommendation device provided by the invention has the same beneficial effects as the video-centered converged media content recommendation method provided by the technical scheme, and the details are not repeated herein.
A third aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the steps of the above-mentioned video-centric converged media content recommendation method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as those of the video-centered converged media content recommendation method provided by the technical scheme, and are not repeated herein.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating a video-centric converged media content recommendation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating module interactions of a video-centric converged media content recommendation method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of adding key-frame analysis and recognition during video encoding according to the first embodiment of the present invention;
FIG. 4 is a schematic flowchart of adding key-frame analysis and recognition as a filter in the middle of video transcoding according to an embodiment of the present invention;
FIG. 5 shows the logical structure of an online interactive video platform according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides a video-centered converged media content recommendation method, comprising: integrating a content database of multi-format multimedia, and aggregating content metadata information to construct a content metadata database; labeling each piece of content metadata information in the content metadata database, and forming a fusion media knowledge graph based on the content metadata information and the labels; capturing the entity elements and style elements in the current key frame in real time while a video is playing, and searching the fusion media knowledge graph for the target element information the user is attending to; and screening candidate commodities from the commodity library according to the target element information, and recommending the candidate commodities to the user.
In this video-centered fused media content recommendation method, a basic content warehouse is formed by integrating a content database of multi-format multimedia; the content metadata information in it is then aggregated to construct a content metadata database; each piece of content metadata information is labeled with a preset label system, and a fusion media knowledge graph is formed from the content metadata information and the assigned labels. While a user watches a video, the entity elements and style elements in the key frames being played are captured in real time; based on the feature vector formed by each key frame's entity and style elements, related content in the fusion media knowledge graph, including videos, pictures and articles, is dynamically retrieved and matched, and the target element information the user is attending to is found. Finally, the target element information is fed as input to a commodity library at the search back end, the candidate commodities most suitable for recommendation are found, and the candidate commodities are recommended to the user. This embodiment therefore has the following beneficial effects:
1. a fused knowledge graph of multi-format multimedia content is established, realizing dynamic collaboration of converged media content, so that more content can be recommended to the user and content marketing becomes more effective;
2. compared with keyword matching, semantic content recommendation based on the knowledge graph achieves stronger generalization and more accurate recommendation, with better efficiency;
3. with video as the hub, comprehensive content of many forms is spread to users efficiently; artificial intelligence is used to analyze and mine the entity elements and style elements contained in the video content, and accurate content and/or commodity recommendation is completed by dynamically associating marketable commodities.
In the above embodiment, the method for integrating the content database of the multi-format multimedia and aggregating the content metadata information to construct the content metadata database includes:
aggregating content metadata information in a multimedia content database in a video format, a multimedia content database in a picture format and a multimedia content database in an article format to construct a content metadata database; the content metadata information includes content titles and content descriptions of video data, picture data, and article data.
In specific implementation, the data formats of the multimedia content databases include a video format, a picture format and an article format: the video-format database stores many pieces of video data, the picture-format database stores many pieces of picture data, and the article-format database stores many pieces of article data. The main function of the content metadata information is to describe the main content and title of each piece of video, picture and article data. By uniformly integrating the content metadata information across the multi-format multimedia content databases, data associations can be interconnected, dynamic collaboration of converged media content is realized, and more content can be recommended to the user.
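By way of illustration only, the Python sketch below shows one way the three per-format databases could be merged into a unified content metadata database. The schema (content_id, media_type, title, description) and the function name are assumptions made for this sketch; the patent does not prescribe any concrete data structures.

```python
from dataclasses import dataclass, field

@dataclass
class ContentMetadata:
    content_id: str      # globally unique ID of the source item (assumed)
    media_type: str      # "video", "picture", or "article"
    title: str           # content title
    description: str     # content description
    tags: list = field(default_factory=list)  # filled in by the labeling step

def aggregate_metadata(video_db, picture_db, article_db):
    """Merge per-format content databases into one content metadata database."""
    metadata_db = []
    for media_type, db in (("video", video_db),
                           ("picture", picture_db),
                           ("article", article_db)):
        for item in db:  # each item assumed to expose id/title/description
            metadata_db.append(ContentMetadata(
                content_id=f"{media_type}:{item['id']}",
                media_type=media_type,
                title=item["title"],
                description=item["description"],
            ))
    return metadata_db
```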
In the above embodiment, the method for tagging each piece of content metadata information in the content metadata base includes:
performing classification, word segmentation and part-of-speech tagging on each piece of content metadata information in the content metadata database, and retaining the key information words therein; randomly distributing the key information words corresponding to all the content metadata information into a plurality of groups of data; taking one group of data, manually labeling it with a preset label system to serve as the initial training data set, and setting an initial training model; training on each label in the initial training data set to obtain a label classification model; running the label classification model over the other groups of data until the labeling accuracy reaches the accuracy threshold, at which point the label classification model is output, and otherwise continuing to optimize the label classification model; and labeling each piece of content metadata information in the content metadata database with the output label classification model.
In the above embodiment, the method for continuously optimizing the label classification model includes: the label classification model is optimized by expanding the initial training data set and/or adjusting the initial training model.
It should be noted that any data carries some metadata; with proper classification and processing of this metadata, a series of labels describing the content can be formed, and these labels serve as the data foundation for constructing the knowledge graph. This embodiment adopts a two-level classification label system; typical examples of its content classification and metadata are shown in Table 1-1 below:
[Table 1-1: typical content classification and metadata examples for the two-level label system; the table body is provided only as images in the original document.]
With a label system designed around this classification structure and metadata information, labeling can be carried out in a targeted and more accurate way according to the internal characteristics of the content database. The title and description of the content are the primary sources for label identification, and the labels are a pre-designed label set under the secondary classification. The labeling process can be accomplished by the following iterative procedure:
1. perform classification, word segmentation and part-of-speech tagging on each piece of content metadata information in the content metadata database, retaining the key information words, namely the nouns, adjectives and verbs;
2. form the key information words of the title and description, together with the other metadata information, into the content's initial metadata information vector;
3. randomly divide the content data of each secondary classification into 10 equal groups, take group 1 and label it manually to form the initial training data set, and set the initial training model (an SVM);
4. for each label in the initial training data set, train on the training data set with the initial training model to obtain a label classification model;
5. take a group of data, label it automatically with the label classification model, manually check the labeling results, and compute the label classification model's current labeling accuracy on that group;
6. if the training data has reached m groups (say m = 4), go to step 7; otherwise, if the current labeling accuracy is not higher than 90% and has dropped since the previous evaluation, rebuild the training data set from the verified data together with the previous groups and return to step 4 to retrain the label classification model; if the current labeling accuracy is higher than 90% and has not dropped, take the next group of data and repeat step 5;
7. if the current labeling accuracy is not ideal (not higher than 90%, or falling as the training set grows), the initial training model needs to be adjusted into a hybrid: logistic regression and decision tree models are added and the several models are organized by ensemble learning into a new initial training model, after which return to step 4 to retrain the label classification model; if the current labeling accuracy is ideal, go directly to step 8;
8. automatically label the remaining groups of data with the satisfactorily trained label classification model.
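The iterative procedure above can be condensed into the following Python sketch, using scikit-learn's TfidfVectorizer and LinearSVC as the initial SVM pipeline. The manual_label callback stands in for the human labeling and checking of steps 3 and 5, and the handling of steps 6-7 is deliberately simplified; this is a sketch under those assumptions, not the patented implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

def train_label_classifier(groups, manual_label, threshold=0.90, m=4):
    """Compressed sketch of steps 3-8 for one label dimension.

    groups:       the 10 random groups; each group is a list of
                  key-information-word strings
    manual_label: callback standing in for the human labeling/checking of
                  steps 3 and 5; returns the gold label list for a group
    """
    texts = list(groups[0])                    # step 3: group 1 ...
    labels = list(manual_label(groups[0]))     # ... manually labeled
    model = make_pipeline(TfidfVectorizer(), LinearSVC())  # initial SVM
    prev_acc = 0.0
    for i, group in enumerate(groups[1:], start=1):
        model.fit(texts, labels)               # step 4: (re)train
        gold = list(manual_label(group))       # step 5: manual check
        acc = accuracy_score(gold, model.predict(group))
        if i >= m and acc <= threshold:
            # step 7 (simplified): here the initial model would be replaced
            # by an ensemble of SVM + logistic regression + decision tree
            break
        if acc <= threshold or acc < prev_acc:
            # step 6: fold the verified group back into the training set
            texts.extend(group)
            labels.extend(gold)
        prev_acc = acc
    return model                               # step 8 labels the rest
```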
The fusion media knowledge graph in this embodiment is the most central data structure and the hub of several important processes of the method. Fig. 2 shows the relationships among the core modules of this solution, from which the importance of the converged media knowledge graph can be seen. Table 1-2 lists the node models, i.e., the entity object information models, contained in the fusion media knowledge graph, and Table 1-3 lists the edge models, i.e., the object relationship information models. The prefixes of the main labels mean: V - video, P - picture, T - text, H - human, O - object, R - relation.
[Table 1-2: node models (entity object information models) of the fusion media knowledge graph; the table body is provided only as images in the original document.]
[Table 1-3: edge models (object relationship information models) of the fusion media knowledge graph; the table body is provided only as images in the original document.]
In the above embodiment, before capturing the entity elements and style elements in the current key frame in real time during video playing, one or more of the following steps are also included:
analyzing the video data in the multimedia content database, and identifying the entity element information and style element information of its key frame images to expand into the fusion media knowledge graph; analyzing the picture data in the multimedia content database, and identifying the entity element information and style element information of the pictures to expand into the fusion media knowledge graph; analyzing the article data in the multimedia content database, and identifying the key word entity information in the article data to expand into the fusion media knowledge graph.
The method for identifying the entity element information and style element information in the key frame images and pictures comprises:
detecting human faces and human body postures in the key frame images and pictures, identifying faces and torso limbs; searching an actor database based on the face recognition result to obtain the actor's identity information, and recognizing the actor's dress style based on the torso-limb recognition result to obtain the actor's clothing information; identifying objects other than human faces and bodies in the key frame images and pictures to obtain commodity information of those objects; recognizing characters in the key frame images and pictures to obtain text information; and expanding one or more of the identity information, clothing information, commodity information and text information into the fusion media knowledge graph.
In specific implementation, the content of the key frame images and pictures is analyzed, the entity elements and style elements in the video key frames and pictures are identified, and this information is stored into the fusion media knowledge graph. The specific process is as follows:
1. first, run face and body-pose detection on the key frame image or picture to find whether a human face or a human torso and limbs are present; if so, go to step 2, otherwise go to step 3;
2. recognize the faces in the key frame images and pictures and compare the results against the actor library to obtain actor information; recognize the actors' clothing styles in the key frame images and pictures to obtain clothing information such as color, style, fabric and overall look, and store the actor information and clothing information into the fusion media knowledge graph;
3. detect and identify the objects in the key frame images and pictures; an exact object recognition result is not required, only shape classification, color classification and rough article-category probabilities, which are stored into the fusion media knowledge graph;
4. if characters such as subtitles or signs exist in regions of the key frame images and pictures outside the areas detected in steps 2 and 3, the text can be further extracted with OCR technology; the text is then word-segmented and part-of-speech tagged, words with actual meaning, mainly nouns, verbs and adjectives, are retained, and this key word information is stored into the fusion media knowledge graph.
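A minimal sketch of this per-frame analysis is given below, covering only steps 1 and 4 with off-the-shelf tools: a stock OpenCV Haar cascade for face detection and Tesseract (via pytesseract) for OCR. The garment-style and object-recognition stages of steps 2 and 3 need heavier models and are omitted here, and lang="chi_sim+eng" assumes the corresponding Tesseract language data is installed.

```python
import cv2
import pytesseract  # assumes the Tesseract OCR binary is installed locally

# Haar cascade shipped with OpenCV; a production pipeline would use stronger
# detectors plus pose estimation and DeepFashion-style garment models.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_key_frame(image_bgr):
    """Return the entity/style elements recoverable by this simple pipeline."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Step 1: face detection; each box would then be matched against the
    # actor database and cropped for garment-style recognition (step 2).
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Step 4: OCR on-screen text such as subtitles and signs; POS filtering
    # down to nouns/verbs/adjectives is omitted for brevity.
    text = pytesseract.image_to_string(gray, lang="chi_sim+eng")
    return {
        "faces": [tuple(int(v) for v in box) for box in faces],  # (x, y, w, h)
        "text_words": [w for w in text.split() if w.strip()],
    }
```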
The face detection and recognition and limb detection involved in the above embodiment are supported by very mature and robust algorithms; garment and clothing-style recognition has in recent years been supported by models and algorithms such as DeepFashion; and the algorithms for contour detection, color and texture recognition, and rough classification based on object contour and texture are even more mature. Chinese OCR, word segmentation and part-of-speech tagging technologies are also mature and can be applied directly in this solution. In other words, these relatively mature techniques are used here to extract from the image the information that substantially helps the overall technical solution.
The method for identifying the key word entity information in the article data comprises:
splitting the article data into clauses according to sentence separators, and assigning each clause a sentence identifier with a unique ID; associating each clause's object data with the sentence identifiers of the preceding and following clauses, and expanding them into the fusion media knowledge graph; segmenting each clause into words, removing noise words, then assigning each retained word a word identifier with a unique ID, and expanding the words into the fusion media knowledge graph; and assigning the article an identifier with a unique ID according to the content attributes of the article data, and expanding the article's object data, the association between the article and its clauses, and the association between the clauses and their words into the fusion media knowledge graph.
In specific implementation, the content of the article data is analyzed, the key word entity information of the article data is identified, and this information is stored into the fusion media knowledge graph. The specific process is as follows:
1. take each piece of article data as the basic processing unit and split the article into sentences according to sentence delimiters such as line breaks, periods, exclamation marks, question marks and semicolons; generate a globally unique ID for each sentence from its content, associate the sentence object data with the IDs of the preceding and following sentences, and store them into the fusion media knowledge graph;
2. for each split sentence, perform word segmentation and part-of-speech tag each word;
3. keep only the words whose part-of-speech tags carry actual meaning, namely nouns, verbs and adjectives, and discard the remaining word objects; for each retained word, generate a globally unique ID from its text content and store the word object data into the fusion media knowledge graph;
4. generate a globally unique ID from the article's basic attributes, and store the article object data, the article-sentence associations and the sentence-word associations into the fusion media knowledge graph.
In specific implementation, the Chinese word segmentation and part-of-speech tagging involved in the above process have very mature technical solutions and open-source software, such as the jieba Chinese word segmenter and the Stanford NLP library.
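Under the assumption that jieba is used for segmentation and MD5 content hashes serve as the globally unique IDs (the patent specifies neither), the article-to-graph construction of steps 1-4 might look like this:

```python
import hashlib
import re
import jieba.posseg as pseg  # jieba's word segmentation with POS tags

KEEP_POS = ("n", "v", "a")   # jieba tag prefixes for nouns, verbs, adjectives

def global_id(text):
    """Content-derived globally unique ID (MD5 here is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def article_to_graph(article_text, article_attrs):
    """Split an article into sentence and word nodes plus their edges."""
    sentences = [s.strip() for s in re.split(r"[\n。！？；]", article_text)
                 if s.strip()]
    article_node = global_id(repr(article_attrs))
    nodes = {article_node: {"type": "T_ARTICLE", "attrs": article_attrs}}
    edges = []
    prev_sid = None
    for sent in sentences:
        sid = global_id(sent)
        nodes[sid] = {"type": "sentence", "text": sent}
        edges.append((article_node, sid, "article-sentence"))
        if prev_sid is not None:          # adjacency between clauses
            edges.append((prev_sid, sid, "next-sentence"))
        prev_sid = sid
        for pair in pseg.cut(sent):       # segmentation + POS tagging
            if pair.flag.startswith(KEEP_POS):  # keep n/v/a, drop noise words
                wid = global_id(pair.word)
                nodes.setdefault(wid, {"type": "word", "text": pair.word})
                edges.append((sid, wid, "sentence-word"))
    return nodes, edges
```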
In the above embodiment, the method for searching the fusion media knowledge graph for the target element information the user is attending to comprises:
converting the entity elements and style elements into feature vectors and retrieving related element information from the fusion media knowledge graph; dynamically adjusting the intensity value of each piece of element information based on how often it appears within a preset number of key frame intervals; and when the intensity value of a piece of element information reaches the intensity threshold, outputting that element information as target element information.
In specific implementation, while a user watches a piece of video content, the key frame images are presented at the user terminal in time order. Each presented image can be regarded as one exposure of the content information related to that image to the user, so over time the effect of some information on the user accumulates and is reinforced, while other information decays. The same or similar information, acting repeatedly over a long period, produces a noticeable positive effect on the user; conversely, the same information, if it does not act for a long time, weakens and loses its positive effect. Based on this principle, recommended content can be generated with the fusion media knowledge graph to good effect. An example follows:
1. during video playing, the terminal obtains the information contained in each key frame, such as actors' identity information, clothing information such as color and style, commodity information such as the shape, color and classification probability of the main objects, and key text information. These information elements are organized into an information object, and each information element in the information object is assigned an intensity value. When an information element is first created, its intensity value is 0;
2. as video playback progresses, the intensity value is incremented by 1 each time an information element in an information object reappears. Note that if the user operates the terminal to fast-forward, rewind, or jump to a playback position before or after the current point, this rule still applies;
3. as playback continues, if an information element is not reinforced within a certain number X of key frame intervals, it enters a decay stage. The decay model can be quantified by a half-life: with half-life Y, the intensity value of the information element halves every Y key frames. If an element that has entered the decay stage reappears in some key frame, its intensity is again incremented by 1 and it exits the decay stage;
4. using the converged media information collaboration model, the information elements of the main video's key frames can be used as input to retrieve, match and output more related information elements. These peripheral information elements essentially derive from the various contents in the fusion media knowledge graph; some of them reinforce the intensity of the main video's information elements, some weaken it, and some act independently, i.e., their intensity must be computed separately;
5. when the intensity of any information element exceeds a preset threshold, the content recommendation mechanism is triggered: with that information element as input, the content with the highest relevance in the fusion media knowledge graph, including videos, pictures and articles, is retrieved as the recommended content.
The parameters of this embodiment behave as follows: the parameter X expresses the tolerance for an information element's validity; the larger X is, the longer an information element is expected to survive. The parameter Y expresses how quickly the information element fades once the tolerance is exceeded; the smaller Y is, the faster the element is expected to fade. If the total number of information elements in a piece of content is large, a smaller X may be set, and conversely a larger X; if the information elements change frequently, a smaller Y may be set, and conversely a larger Y. The parameters X and Y can accordingly be set by two different methods:
1. Static rules: set according to the video's type and duration. Different types of video have different durations and different kinds and transition frequencies of information elements. Typically, a TV drama episode lasts 30 minutes to 1 hour, with many kinds of information elements and a low transition frequency; a film lasts 1.5 to 3 hours, with many kinds of information elements and a high transition frequency; a variety show lasts 1 to 2 hours, with few or concentrated information elements and a low transition frequency; a self-media video lasts 5 to 10 minutes, with few or concentrated information elements and a low transition frequency; a short video lasts 15 seconds to 1 minute, with few or concentrated information elements and a low transition frequency. Therefore, for a TV drama a smaller X and a larger Y can be set; for a film, a smaller X and a smaller Y; for variety shows, self-media video and short video, a larger X and a larger Y. Of course, the sizes of X and Y are relative to the video's total duration, and the absolute values for different videos must be tied to that duration. For example, with a key frame interval of 5 seconds, a TV drama may set X = 12, Y = 60; a film may set X = 12, Y = 24; a variety show may set X = 60, Y = 60; a self-media video may set X = 12, Y = 12; and a short video may set X = 1, 2 or 3 and Y = 1, 2 or 3.
2. Dynamic rules: set according to the kinds and turnover rate of the information elements. The same basic principle is followed, but the adjustment strategy is computed at runtime. Suppose the key frame interval is S and the video's total duration is T; then the total number of key frame intervals is N = T/S. The total number of distinct information elements in the video is known in advance, denoted I, and the number of distinct information elements seen so far is recorded as each key frame plays, so at any key frame the proportion of information elements that have already appeared is known; call this proportion p. As playback progresses, p increases monotonically from 0 to 1. If the current key frame interval is the nth, the density of information elements over the playback so far can be represented by p × I/N, which dynamically balances the information element change frequency. If I is large, X may be set to max(1, N/100); if I is small, X may be set to max(3, N/20). If p × I/N is large, Y may be set to max(1, N/100); if p × I/N is small, Y may be set to max(3, N/20). These values are not absolute; more intervals can certainly be divided, or the settings can even be given by a continuous function.
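The reinforcement/decay mechanics of steps 1-3 can be captured in a few lines of Python. The sketch below is one reading of the model; applying the half-life as a per-frame factor of 0.5^(1/Y) is an implementation assumption that yields the stated halving every Y key frames.

```python
class InformationElementModel:
    """Sketch of the reinforcement / half-life decay model described above.

    x: key-frame intervals an element may go unseen before decay begins
    y: half-life in key frames once decay has begun
    (e.g., x=12, y=60 for a TV drama with 5-second key-frame intervals)
    """

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.intensity = {}   # element -> current intensity value
        self.last_seen = {}   # element -> key-frame index of last appearance

    def on_key_frame(self, frame_index, elements_in_frame):
        # Reinforcement: +1 per reappearance; a decaying element that
        # reappears exits the decay stage automatically via last_seen.
        for e in elements_in_frame:
            self.intensity[e] = self.intensity.get(e, 0.0) + 1.0
            self.last_seen[e] = frame_index
        # Decay: elements unseen for more than x intervals halve every
        # y key frames (applied here as a per-frame factor of 0.5 ** (1/y)).
        for e, seen in self.last_seen.items():
            if frame_index - seen > self.x:
                self.intensity[e] *= 0.5 ** (1.0 / self.y)

    def triggered(self, threshold):
        """Elements whose intensity has crossed the recommendation threshold."""
        return [e for e, v in self.intensity.items() if v >= threshold]
```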
In the above embodiment, the method for screening candidate commodities from the commodity library according to the target element information and recommending the candidate commodities to the user includes:
if the target element information comprises actor information, searching commodities related to the actor information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list; if the target element information comprises clothing information, searching commodities related to the clothing information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list; if the target element information comprises commodity information of the object, searching commodities related to the commodity information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list; and carrying out correlation scoring on the candidate commodities in the candidate commodity list, screening the candidate commodity with the highest score, and recommending the candidate commodity to the user.
The commodity recommendation process in the above embodiment is specifically as follows:
1. take the information elements obtained through the converged media information collaboration model that are suitable for commodity recommendation (hereinafter "input information elements"); if they contain the identity information of actors, celebrities and the like, feed this key-person information to the commodity library and check whether the library holds commodities endorsed by, or closely related to, those key persons; if so, put them into the candidate commodity list as candidate commodities;
2. if the input information elements contain clothing-related information such as color, style and fabric, use it as input to retrieve matching clothing commodity information in the commodity library, and put the results into the candidate commodity list;
3. if the input information elements contain object shape, color and classification-probability information, use it as input to retrieve commodity information with matching attributes in the commodity library, and put the results into the candidate commodity list;
4. if the input information elements contain text information, use it as keywords to search the commodity library, and put the results into the candidate commodity list.
Every retrieval in the above steps carries a result relevance score. The candidate commodity list is comprehensively ranked by relevance score, with higher scores ranked higher, and the results are then recommended to the user.
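As an illustration, the four screening steps and the final relevance ranking might be orchestrated as follows; the element key names and the product_db.query interface are stand-ins, since the patent does not define the commodity library's API.

```python
def screen_candidates(input_elements, product_db, top_k=5):
    """Sketch of steps 1-4 plus the final ranking.

    input_elements: dict that may hold "person", "garment", "object" and
                    "text" element information (key names are assumptions)
    product_db:     stand-in search interface whose query(kind, value)
                    returns [(product, relevance_score), ...]
    """
    candidate_list = []
    for kind in ("person", "garment", "object", "text"):
        value = input_elements.get(kind)
        if value:   # each present element type queries the commodity library
            candidate_list.extend(product_db.query(kind, value))
    # Comprehensive ranking: higher relevance score ranks earlier.
    candidate_list.sort(key=lambda hit: hit[1], reverse=True)
    return [product for product, _score in candidate_list[:top_k]]
```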
Preferably, the above embodiment further includes, after recommending the candidate product to the user: and dynamically adjusting the intensity value of the element information according to the interaction data of the user aiming at the recommended candidate commodity.
In specific implementation, after content and commodities are recommended to the user, the system collects and aggregates user interaction data, obtaining the impressions (IMP), click-through rate (CTR) and purchase conversion rate (CVR) of each piece of related content and commodity information. Combined, these can be called the user's acceptance degree A of the recommendation results, normalized so that A lies between 0 and 1. A is applied back from the recommended content and commodities onto the information elements: if A exceeds a certain threshold, the corresponding information element is considered reinforced, its intensity value increasing by the proportion A, and if the element has entered its decay period, it exits the decay period. Clearly this process directly adjusts and optimizes content recommendation and indirectly affects the result of commodity recommendation. How A is computed must be decided by the actual deployment scenario: whichever factors (IMP, CTR, CVR) matter more receive the larger weights; this embodiment does not restrict this.
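One hypothetical way to combine IMP, CTR and CVR into the acceptance degree A, and to feed it back onto an information element (reusing the InformationElementModel sketched earlier), is shown below. The weights, impression cap and threshold are purely illustrative, since the text explicitly leaves them to the deployment scenario.

```python
def acceptance_degree(imp, ctr, cvr, weights=(0.2, 0.4, 0.4), imp_cap=1000):
    """Combine IMP/CTR/CVR into an acceptance degree A in [0, 1].

    The weights and the impression cap used for normalization are
    assumptions made for this sketch.
    """
    w_imp, w_ctr, w_cvr = weights
    return w_imp * min(imp / imp_cap, 1.0) + w_ctr * ctr + w_cvr * cvr

def feed_back(model, element, a, threshold=0.5):
    """If A exceeds the threshold, reinforce the element's intensity by the
    proportion A (one reading of 'increasing by the proportion A')."""
    if a > threshold and element in model.intensity:
        model.intensity[element] *= 1.0 + a
```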
In specific implementation, the function of retrieving related content through information elements is supported by the fusion media knowledge graph. The process is as follows:
1. in basic metadata of videos, pictures, ARTICLEs (i.e., nodes of V _ beginning and P _ beginning and T _ art tag) of entity object data (node data), fuzzy matching is performed on input information element data (character string), and content data on matching is put into a candidate recommended content set.
2. In attribute data of people, clothes and articles of entity object data, fuzzy matching is carried out on input information element data (character strings), and the matched data is put into a related object set.
3. And finding pictures, ARTICLEs and LABEL data associated with the related LABEL set according to R _ [ PIC, ARTICLE, LABEL ] _ [ PERSON, GARMENT, THING ] in the object relation data, putting the pictures and the ARTICLE data in the candidate recommended content set (ignoring if repeated), and putting the LABEL data in the related LABEL set.
4. In the text of the tag (T _ LABEL) of the entity object data, fuzzy matching is performed on the input information element data (character string), and the tag data on the matching is put into the relevant tag set (ignored if there is duplication).
5. And finding VIDEO, picture and ARTICLE data associated with the related LABEL set according to R _ [ VIDEO, PIC, ARTICLE ] _ LABEL in the object relation data (side data), and putting the VIDEO, picture and ARTICLE data into the candidate recommended content set (ignoring if repeated).
6. For each content node in the candidate recommended content set, all attribute data are taken as a feature set (marked as B) and a feature set (marked as A) of the main video to perform distance calculation, and the smaller the distance, the higher the representation correlation is, and the higher the ranking is. This results in an ordered list of candidate recommended content. The distance calculation can be realized by the following method:
a) First, compute the inclusion similarity of B in A:

S(A, B) = |A ∩ B| / |A|

i.e., the number of elements in the intersection of A and B divided by the number of elements of A. Every candidate content thus obtains an inclusion similarity with the main video; the higher the similarity, the higher the rank.
b) For candidate contents with equal inclusion similarity, the Ochiai coefficient can be used as a secondary sort:

K(A, B) = |A ∩ B| / √(|A| · |B|)

The higher the value of K, the higher the rank.
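Putting steps 1-5 together, the candidate retrieval can be sketched as follows; fuzzy_match and neighbors are hypothetical helpers standing in for the knowledge graph's actual query interface, and the node kind names merely echo the prefixes above.

def retrieve_candidates(graph, element_text):
    # Step 1: fuzzy-match the basic metadata of video, picture and article nodes.
    candidates = set(graph.fuzzy_match(("VIDEO", "PIC", "ARTICLE"), element_text))
    # Step 2: fuzzy-match the attribute data of person, garment and thing nodes.
    objects = set(graph.fuzzy_match(("PERSON", "GARMENT", "THING"), element_text))
    labels = set()
    # Step 3: follow R_[PIC, ARTICLE, LABEL]_[PERSON, GARMENT, THING] edges.
    for obj in objects:
        for node in graph.neighbors(obj, ("PIC", "ARTICLE", "LABEL")):
            (labels if node.kind == "LABEL" else candidates).add(node)
    # Step 4: fuzzy-match the label text.
    labels |= set(graph.fuzzy_match(("LABEL",), element_text))
    # Step 5: follow R_[VIDEO, PIC, ARTICLE]_LABEL edges; sets ignore duplicates.
    for label in labels:
        candidates |= set(graph.neighbors(label, ("VIDEO", "PIC", "ARTICLE")))
    return candidates

The two measures of step 6 then order that set; features is an assumed per-node attribute holding the node's attribute data as a set.

import math

def inclusion_similarity(a, b):
    # S(A, B) = |A ∩ B| / |A|
    return len(a & b) / len(a) if a else 0.0

def ochiai(a, b):
    # K(A, B) = |A ∩ B| / sqrt(|A| * |B|)
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def rank_candidates(main_features, candidates):
    # Primary sort by inclusion similarity, ties broken by the Ochiai coefficient.
    return sorted(candidates,
                  key=lambda c: (inclusion_similarity(main_features, c.features),
                                 ochiai(main_features, c.features)),
                  reverse=True)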
For the fusion media information collaboration model in the above embodiment, the following should be explained:
the fusion media information collaboration model is used to quantify the impact of related media content on the strength value of the main video key frame content information element. This effect includes three cases: strengthen, weaken, not correlate. The reinforcing effect is to increase the intensity value of the corresponding information element, the weakening effect is to decrease the intensity value of the corresponding information element, and the irrelevant effect is to initialize a new information element to calculate the intensity value. For example, the accumulated information elements and intensities of the main video are { a: a, B: B, C: C, D: D, E: E, F: F }, the newly obtained key frame contains information elements [ a, B, C ], and then the updated accumulated information element intensities are { a: a +1, B: B +1, C: C +1, D: D, E: E, F: F }, and the content related to the key frame has a news report whose information elements are [ B, C, D, G ]. Where B and C have a strong influence on the main video, D has a weak influence, and G has no influence because it is not present in the current main video. The cumulative information element strength after being updated by the news report is { A: a +1, B: B +2, C: C +2, D: D-1, E: E, F: F, G:0 }.
Weakening occurs because content from different sources may carry different value orientations for the same information element. For example, if the main video is a movie starring a certain star, the star appears in most key-frame information elements; but if a related news article is a negative report about that star, the corresponding intensity value is weakened. Likewise, if an object is emphasized in a key frame but is exposed as having quality problems in an article or video, its intensity value is reduced.
The fusion media information collaboration model is likewise constructed as a knowledge graph; specifically, strength-influence attributes are added to the relevant relation objects of Tables 1-3. For example, an attribute is added to the ARTICLE-PERSON relation (R_ARTICLE_PERSON): a strength-influence value of +1 indicates strengthening, with a strengthening magnitude of 1. The attribute information after adding the collaboration information is shown in Tables 1-4.
[Tables 1-4: the relation objects of Tables 1-3 extended with strength-influence attributes; reproduced only as an image in the original.]
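Since Tables 1-4 survive only as an image, the shape of such an extended relation object can be suggested by a hypothetical serialization; all field names and ID formats here are illustrative assumptions rather than the table's actual schema.

r_article_person = {
    "relation": "R_ARTICLE_PERSON",
    "article_id": "T_ART_000123",    # assumed ID format
    "person_id": "T_PERSON_000456",  # assumed ID format
    "strength_influence": +1,        # +1 = strengthening, -1 = weakening
    "strength_value": 1,             # magnitude of the reinforcement
}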
The working process of the fusion media information collaboration model, based on the fusion media knowledge graph model, is as follows (an illustrative sketch is given after the list):
1. Initialize the information element list according to the basic metadata and tags of the main video, with all initial intensities set to 0;
2. The main video starts playing. With the global ID of the current key frame denoted f, find in the knowledge graph the relation data R_FRAME_PERSON, R_FRAME_GARMENT and R_FRAME_THING whose key-frame global ID is f, look up the person, garment and thing data they contain, and update the intensity values of the main video's information element list according to the strength-influence values;
3. According to the person, garment and thing data of the key frame, search the knowledge graph for R_[PIC, ARTICLE, LABEL]_PERSON, R_[PIC, ARTICLE, LABEL]_GARMENT and R_[PIC, ARTICLE, LABEL]_THING respectively, locate the picture, ARTICLE and LABEL nodes therein, extract information elements from their attribute data, and update the intensity values of the main video's information element list in combination with the strength-influence values in the relation data;
4. If the intensity value of a current information element of the main video exceeds a preset threshold, trigger the content recommendation and commodity recommendation processes for that information element;
5. Return to step 2 and continue until the main video finishes playing or the client actively exits.
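A compact sketch of this loop follows; frame_relations, content_relations and extract_elements are hypothetical helpers for the graph queries named in steps 2 and 3, and the threshold handling mirrors step 4.

OBJECT_KINDS = ("PERSON", "GARMENT", "THING")

def on_key_frame(graph, intensities, frame_id, threshold, recommend):
    # Step 2: objects appearing in the frame via R_FRAME_[PERSON|GARMENT|THING].
    objects = []
    for kind in OBJECT_KINDS:
        for edge in graph.frame_relations(frame_id, kind):
            objects.append(edge.target)
            intensities[edge.target.element] = (
                intensities.get(edge.target.element, 0) + edge.strength_influence)
    # Step 3: pictures, ARTICLEs and LABELs related to those objects feed back too.
    for obj in objects:
        for edge in graph.content_relations(obj, ("PIC", "ARTICLE", "LABEL")):
            for element in graph.extract_elements(edge.target):
                intensities[element] = (
                    intensities.get(element, 0) + edge.strength_influence)
    # Step 4: sufficiently strong elements trigger content and commodity recommendation.
    for element, value in intensities.items():
        if value > threshold:
            recommend(element)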
To improve the overall operating efficiency of the system, the process of extracting video key frames for visual-information and text-information analysis and recognition can be combined with the video codec. This avoids decoding the encoded or transcoded video a second time: the intermediate image result of encoding/decoding serves directly as the input to the analysis and recognition process, which saves considerable computing resources. There are two scenarios:
1. Encoding an original video signal;
2. Transcoding an encoded video file, i.e., decoding it and then re-encoding it into another format.
the first scenario is shown in fig. 3, where the video encoder performs intra prediction, and the result of the original image signal preprocessing is analyzed and identified by the encoder. This has several advantages:
a. Intra-frame prediction presupposes judging whether the current frame is a key frame, so the key frame object can be obtained directly, without disassembling and parsing the code-stream protocol;
b. The data preprocessed by the encoder is regular, generally raw image data in YUV format, which is convenient to analyze: Y is the luminance information, well suited to analyzing faces, object edge contours, character OCR and the like in the image; U and V are the chrominance information, which facilitates analyzing the image's color style (see the sketch after this list);
c. The memory space already allocated by the encoder can be reused directly, saving computing resources.
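As an illustration of point b, the following sketch splits a planar YUV 4:2:0 frame into its luma and chroma planes, assuming the common I420 byte layout; the Y plane can be fed directly to a grayscale face detector or OCR engine, while U and V feed a color-style classifier.

import numpy as np

def split_yuv420(frame_bytes, width, height):
    # Planar I420: a full-resolution Y plane followed by quarter-resolution U and V.
    y_size, uv_size = width * height, (width * height) // 4
    y = np.frombuffer(frame_bytes, np.uint8, y_size).reshape(height, width)
    u = np.frombuffer(frame_bytes, np.uint8, uv_size, y_size)
    v = np.frombuffer(frame_bytes, np.uint8, uv_size, y_size + uv_size)
    return (y,
            u.reshape(height // 2, width // 2),
            v.reshape(height // 2, width // 2))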
In the second scenario, shown in Fig. 4, there is usually an image filter in the middle of the transcoding process to process the decoded images. The key frame analysis and recognition module can be implemented as an extended function of this filter and hooked into the overall process. The decoder outputs key frame information so that the module can distinguish key frames. The advantage of this scheme is that it fits a typical transcoding procedure without modifying the internal structure of the encoder.
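The filter hook can be pictured in a few lines; decoder, encoder and analyze are placeholders, since the patent fixes the insertion point (between decoding and re-encoding) but does not prescribe an API.

def transcode(decoder, encoder, analyze):
    for frame in decoder:          # decoded, uncompressed frames
        if frame.is_key_frame:     # key-frame flag output by the decoder
            analyze(frame)         # extended filter: analysis and recognition run here
        encoder.write(frame)       # the normal transcoding path is unchanged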
In summary, integrating the image analysis module into video encoding and transcoding optimizes the whole system, saves computing resources, and improves the efficiency of the overall process. The rigid, hard-coded association between content and commodities is abandoned in favor of core-element analysis and recognition, so that commodity information can be associated dynamically, enabling dynamic updates of the commodity library and dynamic association for commodity recommendation. Based on the model of dynamic fusion-media collaboration, user preferences and the opportunities for successful commodity conversion are analyzed and mined, bringing a better commodity recommendation experience.
Fig. 5 shows the logical structure of an online interactive video platform using the solution of this embodiment, in which content metadata extraction & tagging, video encoding and analysis processing, visual & text information recognition, and video content injection are processes of the content production and preparation stage. Once the fusion media knowledge graph is ready, online service can be provided to users: a user initiates a video playback request through a client, and the video streaming media server delivers the video stream to the client; during playback, the main video and key frame information are sent to the back-end content recommendation engine, which then requests content collaboration and content recommendation data from the fusion media knowledge graph.
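The end-to-end flow of Fig. 5 can be summarized in a sketch; every interface here (open, render, on_key_frame, recommend_for) is an illustrative placeholder for the platform's actual components.

def play_session(client, streaming_server, engine, graph):
    stream = streaming_server.open(client.requested_video)   # video stream to the client
    for frame in stream:
        client.render(frame)
        if frame.is_key_frame:
            # Main video and key frame info go to the back-end recommendation
            # engine, which queries the fusion media knowledge graph for
            # content collaboration and content recommendation data.
            elements = engine.on_key_frame(client.requested_video, frame.global_id, graph)
            if elements:
                client.show(engine.recommend_for(elements, graph))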
Embodiment 2
This embodiment provides a video-centered fused media content recommendation device, comprising:
an aggregation unit, configured to integrate the content databases of multi-format multimedia and aggregate content metadata information to construct a content metadata database;
a graph unit, configured to label each piece of content metadata information in the content metadata database and form a fusion media knowledge graph based on the content metadata information and the labels;
a retrieval unit, configured to capture, in real time, entity elements and style elements in the current key frame during video playback, and to search the fusion media knowledge graph for the target element information concerned by the user;
and a recommendation unit, configured to screen target commodity information from the commodity library according to the target element information and recommend it to the user.
Compared with the prior art, the video-centered fused media content recommendation device provided by this embodiment of the invention has the same beneficial effects as the video-centered fused media content recommendation method provided by Embodiment 1, which are not repeated here.
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the above video-centered fused media content recommendation method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the video-centered fused media content recommendation method provided by the above technical solution, and are not repeated here.
Those skilled in the art will understand that all or part of the steps of the method of the invention may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, carries out the steps of the method of the embodiment. The storage medium may be a ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (11)

1. A video-centered fused media content recommendation method, comprising:
integrating a content database of multi-format multimedia, and aggregating content metadata information to construct a content metadata database;
labeling each piece of content metadata information in the content metadata database, and forming a fusion media knowledge graph based on the content metadata information and the labels;
capturing, in real time, entity elements and style elements in a current key frame during video playback, and searching the fusion media knowledge graph for target element information concerned by a user;
and screening candidate commodities from a commodity library according to the target element information, and recommending the candidate commodities to the user.
2. The method of claim 1, wherein the method of integrating a content database of multi-format multimedia, aggregating content metadata information and constructing a content metadata database comprises:
aggregating content metadata information in a multimedia content database in a video format, a multimedia content database in a picture format and a multimedia content database in an article format to construct a content metadata database;
the content metadata information includes content titles and content descriptions of video data, picture data, and article data.
3. The method according to claim 2, wherein the method of labeling each piece of content metadata information in the content metadata database comprises:
performing classification, word segmentation and part-of-speech tagging on each piece of content metadata information in the content metadata database, and retaining the key information words therein;
randomly distributing the key information words corresponding to all the content metadata information into a plurality of data groups;
taking one data group and labeling it manually with a preset label system to form an initial training data set, and setting an initial training model;
training on each label in the initial training data set to obtain a label classification model;
applying the label classification model to the other data groups; outputting the label classification model when the labeling accuracy reaches an accuracy threshold, and otherwise continuing to optimize the label classification model;
and labeling each piece of content metadata information in the content metadata database with the output label classification model.
4. The method of claim 3, wherein the method for continuous optimization of the label classification model comprises:
optimizing the label classification model by expanding the initial training data set and/or adjusting the initial training model.
5. The method of claim 1, wherein capturing the entity elements and style elements in the current key frame in real-time during video playback further comprises one or more of:
analyzing video data in the multimedia content database, and identifying entity element information and style element information of the key frame images to expand into the fusion media knowledge graph;
analyzing picture data in the multimedia content database, and identifying entity element information and style element information of the pictures to expand into the fusion media knowledge graph;
analyzing article data in the multimedia content database, and identifying key textual entity information in the article data to expand into the fusion media knowledge graph.
6. The method of claim 5, wherein the identification method of the entity element information and the style element information in the key frame image and the picture comprises:
detecting human faces and human body postures in the key frame images and pictures to identify the faces and torso limbs;
searching an actor database based on the face recognition result to obtain identity information of the actor, and recognizing the actor's dress style based on the torso-limb recognition result to obtain clothing information of the actor;
identifying objects other than human faces and human body postures in the key frame images and pictures to obtain commodity information of the objects;
recognizing characters in the key frame images and pictures to obtain text information;
and expanding one or more of the identity information, the clothing information, the commodity information and the text information into the fusion media knowledge graph.
7. The method of claim 6, wherein the method for identifying key textual entity information in the article data comprises:
dividing the article data into sentences according to sentence separators, and assigning each sentence a unique sentence ID;
associating the object data of each sentence with the sentence IDs of the preceding and following sentences, and expanding the object data into the fusion media knowledge graph;
segmenting each sentence into words, removing noise words, assigning each segmented word a unique word ID, and expanding the segmented words into the fusion media knowledge graph;
and assigning the article data a unique article ID according to its content attributes, and expanding the object data of the article data, the associations between the article data and the sentences, and the associations between the sentences and the segmented words into the fusion media knowledge graph.
8. The method of claim 1, wherein the method of searching the fusion media knowledge graph for the target element information concerned by the user comprises:
converting the entity elements and the style elements into feature vectors, and retrieving related element information from the fusion media knowledge graph;
dynamically adjusting the intensity value of the element information based on the occurrence frequency of the element information within a preset number of key frame intervals;
and outputting the corresponding element information as the target element information when the intensity value of the element information reaches an intensity threshold.
9. The method of claim 1, wherein the method of screening candidate commodities from a commodity library according to the target element information and recommending to a user comprises:
if the target element information comprises actor information, searching commodities related to the actor information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
if the target element information comprises clothing information, searching commodities related to the clothing information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
if the target element information comprises commodity information of the object, searching commodities related to the commodity information from a commodity library as candidate commodities and putting the candidate commodities into a candidate commodity list;
and carrying out correlation scoring on the candidate commodities in the candidate commodity list, and screening the candidate commodity with the highest score to recommend to the user.
10. The method of claim 8, further comprising, after recommending the candidate good to the user:
and dynamically adjusting the intensity value of the element information according to the interaction data of the user aiming at the recommended candidate commodity.
11. A video-centered fused media content recommendation device, comprising:
an aggregation unit, configured to integrate the content databases of multi-format multimedia and aggregate content metadata information to construct a content metadata database;
a graph unit, configured to label each piece of content metadata information in the content metadata database and form a fusion media knowledge graph based on the content metadata information and the labels;
a retrieval unit, configured to capture, in real time, entity elements and style elements in the current key frame during video playback, and to search the fusion media knowledge graph for the target element information concerned by the user;
and a recommendation unit, configured to screen target commodity information from the commodity library according to the target element information and recommend it to the user.