CN115052193B - Video recommendation method, system, device and storage medium

Video recommendation method, system, device and storage medium

Info

Publication number
CN115052193B
CN115052193B (application CN202210575753.XA)
Authority
CN
China
Prior art keywords
classification result
information
emotion
emotion classification
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210575753.XA
Other languages
Chinese (zh)
Other versions
CN115052193A (en)
Inventor
郝德禄
肖冠正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iMusic Culture and Technology Co Ltd
Original Assignee
iMusic Culture and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iMusic Culture and Technology Co Ltd filed Critical iMusic Culture and Technology Co Ltd
Priority to CN202210575753.XA priority Critical patent/CN115052193B/en
Publication of CN115052193A publication Critical patent/CN115052193A/en
Application granted granted Critical
Publication of CN115052193B publication Critical patent/CN115052193B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4665Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video recommendation method, system, device and storage medium. The method performs matching between eyeball information and preset rules to obtain a first emotion classification result; because this result is determined from eyeball information, its accuracy is high. Face information is extracted from the context information, and the face information and the context information are input into a network model to obtain a second emotion classification result; the emotion type is then obtained by analyzing the first and second emotion classification results together, so that the result based on eyeball information and the result based on context information complement each other and the accuracy of the emotion type is further improved. Finally, a target video is determined and recommended according to the emotion type, providing targeted video recommendation that relieves and improves the user's current emotion. The method has strong applicability and can be widely applied in the technical field of artificial intelligence.

Description

Video recommendation method, system, device and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a video recommendation method, a video recommendation system, a video recommendation device and a storage medium.
Background
Existing video pushing does not classify by user state; videos are pushed in large batches. The video categories clicked by a user are analyzed to infer the user's interests, and once a background processor has learned these interests, similar videos are pushed at a certain frequency. For example, if a user clicks on movie-related short videos many times, the back-end data side detects this and pushes related movie-related short videos.
However, this pushing method cannot adapt to specific situations. In particular, it cannot recommend according to the user's current emotion, so the following may occur: when the user feels bored, boring knowledge-based videos are recommended, making the user even more bored; when the user is agitated, exciting videos are recommended, making the user even more agitated; when the user's mood is low, negative or frightening videos are pushed, which further worsens the user's mood. A solution is therefore needed.
Disclosure of Invention
In view of the above, the present invention aims to provide a video recommendation method, system, device and storage medium for performing targeted video recommendation.
The technical scheme adopted by the embodiment of the invention is as follows:
the video recommendation method comprises the following steps:
acquiring user data; the user data comprises eyeball information and context information, wherein the context information consists of background and face information;
matching processing is carried out according to the eyeball information and a preset rule, and a first emotion classification result is obtained;
extracting the face information from the context information, and inputting the face information and the context information into a network model to obtain a second emotion classification result;
analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types;
and determining the target video according to the emotion type and recommending the target video.
Further, the matching processing is performed according to the eyeball information and a preset rule to obtain a first emotion classification result, including:
generating a scanned image of the eyeball information;
analyzing the scanned image to obtain pupil states;
matching the pupil state with a preset rule to obtain a first emotion classification result;
the preset rules comprise that pupil constriction represents negative emotion, pupil enlargement represents excitement or fear, and pupil unchanged represents boring.
Further, before the inputting the face information and the context information into the network model, the method further includes:
performing first preprocessing on the face information to obtain first preprocessed face information;
performing second preprocessing on the context information to obtain context information after the second preprocessing;
the first preprocessing and the second preprocessing include cropping and scaling, and the size of the face information is smaller than the size of the context information.
Further, the extracting the face information from the context information, inputting the face information and the context information into a network model, and obtaining a second emotion classification result includes:
inputting the context information into a context RNN to obtain a context feature;
inputting the face information to a face RNN, and obtaining a second emotion classification result through the face RNN according to the face information and the context characteristics;
the network model comprises the context RNN and the face RNN, and the context RNN have a cascade relation with an attention mechanism.
Further, the face RNN includes a plurality of CNN units and LSTM units; the obtaining, by the face RNN, a second emotion classification result according to the face information and the contextual feature includes:
encoding the facial information through the CNN unit, and obtaining an LSTM context vector from the encoding result and an attention-mechanism-based attention operation over the context features;
and outputting a second emotion classification result through the LSTM unit according to the context vector.
Further, the analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types includes:
when the first emotion classification result is the same as the second emotion classification result, taking the first emotion classification result or the second emotion classification result as an emotion type;
or, alternatively,
and when the first emotion classification result is different from the second emotion classification result, determining a target emotion classification result with higher priority from the first emotion classification result and the second emotion classification result according to the preset priority as an emotion type.
Further, the determining the target video according to the emotion type and recommending the target video comprises the following steps:
acquiring video resources;
classifying the video resources to obtain videos of different video types;
and determining the video of the video type opposite to the emotion type as a target video and recommending the target video.
The embodiment of the invention also provides a video recommendation system, which comprises:
the acquisition module is used for acquiring user data; the user data comprises eyeball information and context information, wherein the context information consists of background and face information;
the processing module is used for carrying out matching processing according to the eyeball information and a preset rule to obtain a first emotion classification result;
the classification module is used for extracting the face information from the context information, inputting the face information and the context information into a network model, and obtaining a second emotion classification result;
the analysis module is used for analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types;
and the recommending module is used for determining the target video according to the emotion type and recommending the target video.
The embodiment of the invention also provides a video recommending device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the method.
Embodiments of the present invention also provide a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the method.
The beneficial effects of the invention are as follows: the user data comprises eyeball information and context information, and the context information consists of background and face information; the eyeball information is matched against preset rules to obtain a first emotion classification result, and determining this result from eyeball information gives high accuracy; the face information is extracted from the context information, and the face information and the context information are input into a network model to obtain a second emotion classification result; the emotion type is obtained by analyzing the first and second emotion classification results together, so that the result based on eyeball information and the result based on context information complement each other and the accuracy of the emotion type is further improved; the target video is determined and recommended according to the emotion type, providing targeted recommendation that relieves and improves the user's current emotion, with strong applicability.
Drawings
FIG. 1 is a flowchart illustrating steps of a video recommendation method according to the present invention;
FIG. 2 is a schematic diagram of a video recommendation method according to an embodiment of the invention.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As shown in fig. 1, an embodiment of the present invention provides a video recommendation method, including steps S100 to S500:
s100, acquiring user data.
In the embodiment of the invention, the user data comprises eyeball information and context information, and the context information consists of background and face information. Research shows that environmental information, including the surrounding environment and the human body, provides additional cues for identifying emotion more accurately, so the embodiment of the invention collects context information. When user data is acquired, the user's permission is first requested; after permission is obtained, the camera of the user terminal is turned on and one or more pictures, or a video clip of a certain length, is captured, thereby obtaining the user data. In an exemplary embodiment of the invention, taking a captured video clip as an example, a picture sequence formed by a plurality of pictures can be obtained for subsequent processing.
And S200, performing matching processing according to eyeball information and preset rules to obtain a first emotion classification result.
Optionally, step S200 includes steps S210-S230.
S210, generating a scanning image of eyeball information.
Optionally, real-time scanning and capture are performed on the user data collected in real time (or just collected), and a scanned image corresponding to the eyeball information is generated.
S220, analyzing the scanned image to obtain pupil states.
Optionally, the scanned image may be analyzed by a pre-built eyeball analysis model or by expression analysis software to determine the pupil state. The pupil state includes, but is not limited to, pupil movement to the left, pupil movement to the right, pupil constriction, pupil dilation, or no pupil change.
And S230, matching the pupil state with a preset rule to obtain a first emotion classification result.
Optionally, the preset rules include, but are not limited to: pupil constriction characterizes negative emotion (e.g., aversion, dislike, low spirits, sadness, distress, frustration, etc.); pupil dilation characterizes excitement (e.g., pleasure, fondness, excitement, etc.) or fear (when encountering a pleasurable stimulus the pupil enlarges automatically, and when the subject is panicked or agitated the pupil can enlarge to four times its usual size); an unchanged pupil characterizes boredom (e.g., indifference, absence of thought, etc.); pupil movement to the left characterizes recalling; and pupil movement to the right characterizes thinking. In some embodiments, recalling and thinking may each be categorized as boredom, negative emotion, excitement, or fear, and may be set according to different circumstances. It should be noted that these preset rules apply to roughly ninety percent of people, so the analysis accuracy is high.
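As an illustration only, a minimal sketch of the rule matching in step S230 might look as follows; the pupil-state and emotion labels are hypothetical names chosen for the example, not identifiers from the patent.

```python
# Minimal sketch of step S230: map a detected pupil state to the first emotion
# classification result. Labels are illustrative, not taken from the patent.
PRESET_RULES = {
    "constricted": "negative",          # pupil constriction -> negative emotion
    "dilated": "excited_or_fear",       # pupil dilation -> excitement or fear
    "unchanged": "boring",              # unchanged pupil -> boredom
    "moved_left": "recalling",          # pupil moved to the left -> recalling
    "moved_right": "thinking",          # pupil moved to the right -> thinking
}

def first_emotion_classification(pupil_state: str) -> str:
    """Return the first emotion classification result for a detected pupil state."""
    return PRESET_RULES.get(pupil_state, "boring")  # fall back to a neutral label
```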
Optionally, steps S240 to S250 are further included after step S200 and before step S300:
s240, performing first preprocessing on the face information to obtain the face information after the first preprocessing.
In the embodiment of the present invention, the face information is a face stream, which contains the face information detected in each original frame of the video clip, that is, a sequence of face information. The first preprocessing specifically crops the face information and scales it to a first size, giving the face information after the first preprocessing; the first size is 128×128 here, and other sizes may be used in other embodiments.
S250, performing second preprocessing on the context information to obtain the context information after the second preprocessing.
In the embodiment of the present invention, the context information serves as a context stream, which contains each original frame of the video clip. The second preprocessing specifically center-crops the context information and scales it to a second size, giving the context information after the second preprocessing; the second size is 224×224 here, and other sizes may be used in other embodiments.
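For illustration, the two preprocessing steps could be sketched as below; the use of OpenCV and the bounding-box form of the detected face are assumptions of the sketch, not requirements of the patent.

```python
import cv2  # assumed dependency; any image library with crop/resize would work

FACE_SIZE = (128, 128)     # first size, used for the face stream
CONTEXT_SIZE = (224, 224)  # second size, used for the context stream

def preprocess_face(frame, face_box):
    """First preprocessing: crop the detected face region and scale it to 128x128."""
    x, y, w, h = face_box
    face = frame[y:y + h, x:x + w]
    return cv2.resize(face, FACE_SIZE)

def preprocess_context(frame):
    """Second preprocessing: center-crop the full frame and scale it to 224x224."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return cv2.resize(frame[top:top + side, left:left + side], CONTEXT_SIZE)
```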
S300, extracting face information from the context information, and inputting the face information and the context information into the network model to obtain a second emotion classification result.
Optionally, the face information is extracted separately from the context information. As shown in Fig. 2, the network model optionally adopts a CACA-RNN, i.e., a context-aware, cascade attention-based RNN. The model uses a cascade structure and is composed of two neural networks, a face RNN and a context RNN; the face RNN and the context RNN are cascaded through an attention mechanism, and the attention mechanism can locate the relevant context information in the context RNN.
Step S300 includes steps S310-S320:
s310, inputting the context information into the context RNN to obtain the context characteristics.
S320, inputting the face information into a face RNN, and obtaining a second emotion classification result through the face RNN according to the face information and the context characteristics.
Optionally, the context RNN has the same structure as the face RNN, each comprising a plurality of CNN units and LSTM units. Specifically, the context information is input into the context RNN: the CNN units in the context RNN process the input context information, i.e., the images of the different frames (t = 1 to 4), and feed the processing results to the LSTM units in the context RNN, each of which outputs a corresponding context feature. The face information, i.e., the images containing face information that correspond to the different frames (t = 1 to 4), is input to the face RNN; the face information is encoded by the CNN units of the face RNN, the LSTM context vector fed to the LSTM units of the face RNN is obtained from the encoding result and an attention operation over the context features based on the attention mechanism, and the LSTM units then output the second emotion classification result according to the context vector. The smiling-face output shown in Fig. 2 indicates that the second emotion classification result is excitement (e.g., pleasure, fondness, excitement, etc.).
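As a rough illustration of this cascade, a PyTorch-style sketch is given below. The CNN backbone, layer sizes, attention score form, and four-class output are assumptions made for the example; the patent does not fix these details, and the feedback of the previous prediction y_{i-1} described in the formulas below is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class CascadeEmotionRNN(nn.Module):
    """Hypothetical sketch of the cascaded context RNN / face RNN with attention."""

    def __init__(self, hidden=256, num_classes=4):
        super().__init__()
        self.face_cnn = models.resnet18(weights=None)      # CNN unit per face frame
        self.face_cnn.fc = nn.Identity()
        self.context_cnn = models.resnet18(weights=None)   # CNN unit per context frame
        self.context_cnn.fc = nn.Identity()
        self.context_rnn = nn.LSTM(512, hidden, batch_first=True)          # context RNN
        self.face_rnn = nn.LSTM(512 + hidden, hidden, batch_first=True)    # face RNN
        self.attn = nn.Linear(hidden, hidden)               # score() of the attention
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, context_frames, face_frames):
        # context_frames: (B, T, 3, 224, 224); face_frames: (B, T, 3, 128, 128)
        B, T = context_frames.shape[:2]
        ctx_feats = self.context_cnn(context_frames.flatten(0, 1)).view(B, T, -1)
        ctx_hidden, _ = self.context_rnn(ctx_feats)         # s^c_t for t = 1..T

        face_feats = self.face_cnn(face_frames.flatten(0, 1)).view(B, T, -1)
        h = ctx_hidden.new_zeros(B, ctx_hidden.size(-1))
        state = None
        for i in range(T):
            # attention: score the previous face-RNN state against every context state
            scores = torch.bmm(ctx_hidden, self.attn(h).unsqueeze(-1)).squeeze(-1)
            alpha = F.softmax(scores, dim=-1)                               # (B, T)
            c_i = torch.bmm(alpha.unsqueeze(1), ctx_hidden).squeeze(1)      # context vector
            step_in = torch.cat([face_feats[:, i], c_i], dim=-1).unsqueeze(1)
            out, state = self.face_rnn(step_in, state)
            h = out[:, -1]
        return self.classifier(h)    # logits of the second emotion classification result
```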
In the embodiment of the invention, the face RNN uses the conditional probability

$$p(y_i \mid y_1, \dots, y_{i-1}, x^f_{1:i}) = h(y_{i-1}, s^f_i, c_i)$$

where $y_i$ is the prediction generated at output time $i$, $h(\cdot)$ is a nonlinear function, $x^f_{1:i}$ is the sequence of facial information read by the face RNN in time steps 1 to $i$, and $s^f_i$ is the hidden state of the face RNN, expressed as:

$$s^f_i = f(s^f_{i-1}, y_{i-1}, c_i)$$

where $f(\cdot)$ is a nonlinear function, $s^f_{i-1}$ is the hidden state of the face RNN at output time $i-1$, and $c_i$ is the LSTM context vector, expressed as:

$$c_i = \sum_{t=1}^{T} \operatorname{softmax}\big(\operatorname{score}(s^f_{i-1}, s^c_t)\big)\, s^c_t$$

where $T$ is the total number of time steps, $\operatorname{score}(\cdot)$ is the score function, and $s^c_t$ is the hidden state underlying the LSTM context vector at time $t$, expressed as:

$$s^c_t = f(s^c_{t-1}, x^c_t)$$

where $s^c_{t-1}$ is that hidden state at time $t-1$ and $x^c_t$ is the context feature at time $t$; the context RNN reads the context feature sequence $x^c_{1:T}$, so the LSTM context vector attends to the context feature extracted at each time $t$.
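To make the context-vector formula concrete, a small numeric sketch with a dot-product score function follows; the choice of score function is an assumption, since the description leaves score(·) unspecified.

```python
import numpy as np

def lstm_context_vector(s_f_prev, s_c):
    """Compute c_i = sum_t softmax(score(s^f_{i-1}, s^c_t)) * s^c_t with a dot-product score."""
    scores = s_c @ s_f_prev                  # score(s^f_{i-1}, s^c_t) for t = 1..T
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # softmax over the T time steps
    return alpha @ s_c                       # attention-weighted sum of context hidden states

# Toy usage: T = 3 context hidden states of dimension 4.
s_c = np.random.randn(3, 4)
print(lstm_context_vector(np.random.randn(4), s_c).shape)   # (4,)
```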
S400, analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types.
Optionally, step S400 includes step S410 or S420:
s410, when the first emotion classification result is the same as the second emotion classification result, taking the first emotion classification result or the second emotion classification result as the emotion type.
Specifically, when the first emotion classification result is the same as the second emotion classification result, for example both are negative emotions (such as aversion, dislike, low spirits, sadness, or depression), one of the two results is taken as the emotion type, and the emotion type obtained is a negative emotion.
S420, when the first emotion classification result is different from the second emotion classification result, determining a target emotion classification result with higher priority from the first emotion classification result and the second emotion classification result according to the preset priority as an emotion type.
Specifically, when the first emotion classification result is different from the second emotion classification result, the target emotion classification result may be determined as the emotion type according to a preset priority. For example, the preset priority may be one of the following (a small fusion sketch follows these examples):
1) The priority of the first emotion classification result is higher than that of the second emotion classification result, and the first emotion classification result is determined to be a target emotion classification result and is used as an emotion type;
2) The priority of the second emotion classification result is higher than that of the first emotion classification result, and the second emotion classification result is determined to be a target emotion classification result and is used as an emotion type;
3) Setting the priority from high to low as: negative emotion, excitement or fear, boredom. For example, if the first emotion classification result is boredom and the second emotion classification result is a negative emotion, the second emotion classification result, i.e., the negative emotion, is taken as the emotion type.
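For illustration, the fusion of steps S410 and S420 could be sketched as follows; the labels and the particular ordering follow example 3) above and are otherwise assumptions of the sketch.

```python
# Priority from high to low, as in example 3): negative emotion, excitement/fear, boredom.
PRIORITY = ["negative", "excited_or_fear", "boring"]

def fuse_emotion(first_result: str, second_result: str) -> str:
    """Steps S410/S420: fuse the two classification results into the emotion type."""
    if first_result == second_result:
        return first_result                       # S410: identical results
    # S420: otherwise take the result with the higher preset priority (smaller index)
    return min(first_result, second_result, key=PRIORITY.index)
```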
S500, determining a target video according to the emotion type and recommending the target video.
Optionally, step S500 includes steps S510-S530:
s510, acquiring video resources.
Specifically, the user's terminal, or the software, applet, or web page used by the user, may acquire video resources from the network.
S520, classifying the video resources to obtain videos of different video types.
Specifically, the videos in the video resources can be classified by identifying their labels, or the video resources can be processed by an artificial intelligence algorithm to identify each video's type, thereby obtaining videos of different video types.
And S530, determining the video of the video type opposite to the emotion type as a target video and recommending the target video.
It should be noted that the video type opposite to the emotion type refers to a video type that can help the user improve the current emotion type; the video of that opposite video type is determined as the target video and recommended to the user, and after the user clicks on it, the user's emotion is improved and relaxed. For example (an illustrative mapping sketch follows this list):
when the emotion type is boredom, the corresponding recommended video types are novel and interesting, so that the user no longer feels bored and can relax;
when the emotion type is excitement, the corresponding recommended video types are calm and soothing, so that the user can calm down;
when the emotion type is panic, the corresponding recommended video types are comforting, warm, and positive, soothing the user's panic;
when the emotion type is sadness or distress, the corresponding recommended video types are pleasant, relaxing, and funny, so that the negative emotion is relieved after the user watches the recommended video;
when the emotion type is frustration, the corresponding recommended video types are pleasant, relaxing, funny, and inspiring, so that after watching the recommended video the user can relax, rebuild confidence, regain vitality, and face life positively.
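A minimal illustration of this mapping and the recommendation in step S530 is sketched below; the type labels and the video-record structure are hypothetical choices made for the example.

```python
# Hypothetical mapping from detected emotion type to the "opposite" recommended video types.
OPPOSITE_VIDEO_TYPES = {
    "boring": {"novel", "interesting"},
    "excited": {"calm", "soothing"},
    "panic": {"comforting", "warm", "positive"},
    "sad": {"pleasant", "relaxing", "funny"},
    "frustrated": {"pleasant", "relaxing", "funny", "inspiring"},
}

def recommend_target_videos(videos, emotion_type, top_k=10):
    """Step S530: pick videos whose type opposes the detected emotion type."""
    wanted = OPPOSITE_VIDEO_TYPES.get(emotion_type, set())
    return [v for v in videos if v.get("type") in wanted][:top_k]
```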
The embodiment of the invention also provides a video recommendation system, which comprises:
the acquisition module is used for acquiring user data; the user data comprises eyeball information and context information, wherein the context information comprises background and face information;
the processing module is used for carrying out matching processing according to eyeball information and preset rules to obtain a first emotion classification result;
the classification module is used for extracting facial information from the context information, inputting the facial information and the context information into the network model and obtaining a second emotion classification result;
the analysis module is used for analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types;
and the recommending module is used for determining the target video according to the emotion type and recommending the target video.
The content of the method embodiment is applicable to the system embodiment; the functions specifically realized by the system embodiment, and the beneficial effects achieved, are the same as those of the method embodiment.
The embodiment of the invention also provides a video recommending device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the video recommending method of the previous embodiment. The video recommendation device of the embodiment of the invention comprises, but is not limited to, any intelligent terminal such as a mobile phone, a tablet personal computer, a vehicle-mounted computer and the like.
The content of the method embodiment is applicable to the device embodiment; the functions specifically realized by the device embodiment, and the beneficial effects achieved, are the same as those of the method embodiment.
The embodiment of the invention also provides a computer readable storage medium, in which at least one instruction, at least one section of program, code set or instruction set is stored, and the at least one instruction, the at least one section of program, code set or instruction set is loaded and executed by a processor to implement the video recommendation method of the foregoing embodiment.
Embodiments of the present invention also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the video recommendation method of the foregoing embodiment.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form. The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (8)

1. The video recommendation method is characterized by comprising the following steps:
acquiring user data; the user data comprises eyeball information and context information, wherein the context information consists of background and face information;
matching processing is carried out according to the eyeball information and a preset rule, and a first emotion classification result is obtained;
extracting the face information from the context information, and inputting the face information and the context information into a CACA-RNN model to obtain a second emotion classification result; the CACA-RNN model is a context-aware, cascade attention-based RNN, and before the face information and the context information are input into the CACA-RNN model, the method further comprises:
performing first preprocessing on the face information to obtain first preprocessed face information;
performing second preprocessing on the context information to obtain context information after the second preprocessing;
the first preprocessing and the second preprocessing comprise clipping and scaling, and the size of the face information is smaller than that of the context information;
analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types;
determining and recommending the target video according to the emotion type, wherein the method comprises the following steps:
acquiring video resources;
classifying the video resources to obtain videos of different video types;
and determining the video of the video type opposite to the emotion type as a target video and recommending the target video.
2. The video recommendation method of claim 1, wherein: the matching processing is performed according to the eyeball information and a preset rule to obtain a first emotion classification result, which comprises the following steps:
generating a scanned image of the eyeball information;
analyzing the scanned image to obtain pupil states;
matching the pupil state with a preset rule to obtain a first emotion classification result;
the preset rules comprise that pupil constriction represents negative emotion, pupil dilation represents excitement or fear, and an unchanged pupil represents boredom.
3. The video recommendation method according to claim 1 or 2, wherein: the extracting the face information from the context information, inputting the face information and the context information into a network model, and obtaining a second emotion classification result, including:
inputting the context information into a context RNN to obtain a context feature;
inputting the face information to a face RNN, and obtaining a second emotion classification result through the face RNN according to the face information and the context characteristics;
the network model comprises the context RNN and the face RNN, and the face RNN and the context RNN are in a cascade relationship with an attention mechanism.
4. The video recommendation method of claim 3, wherein: the face RNN comprises a plurality of CNN units and LSTM units; the obtaining, by the face RNN, a second emotion classification result according to the face information and the contextual feature includes:
encoding the facial information through the CNN unit, and obtaining an LSTM context vector from the encoding result and an attention-mechanism-based attention operation over the context features;
and outputting a second emotion classification result through the LSTM unit according to the context vector.
5. The video recommendation method of claim 1, wherein: analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types, including:
when the first emotion classification result is the same as the second emotion classification result, taking the first emotion classification result or the second emotion classification result as an emotion type;
or, alternatively,
and when the first emotion classification result is different from the second emotion classification result, determining a target emotion classification result with higher priority from the first emotion classification result and the second emotion classification result according to the preset priority as an emotion type.
6. A video recommendation system, characterized in that the video recommendation method according to any one of claims 1-5 is applied, comprising:
the acquisition module is used for acquiring user data; the user data comprises eyeball information and context information, wherein the context information consists of background and face information;
the processing module is used for carrying out matching processing according to the eyeball information and a preset rule to obtain a first emotion classification result;
the classification module is used for extracting the face information from the context information, inputting the face information and the context information into a CACA-RNN model, and obtaining a second emotion classification result; the CACA-RNN model is a context-aware, cascade attention-based RNN;
the analysis module is used for analyzing according to the first emotion classification result and the second emotion classification result to obtain emotion types;
and the recommending module is used for determining the target video according to the emotion type and recommending the target video.
7. A video recommendation device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set or instruction set, the at least one instruction, at least one program, code set or instruction set being loaded and executed by the processor to implement the method of any of claims 1-5.
8. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the method of any of claims 1-5.
CN202210575753.XA 2022-05-25 2022-05-25 Video recommendation method, system, device and storage medium Active CN115052193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210575753.XA CN115052193B (en) 2022-05-25 2022-05-25 Video recommendation method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210575753.XA CN115052193B (en) 2022-05-25 2022-05-25 Video recommendation method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN115052193A CN115052193A (en) 2022-09-13
CN115052193B true CN115052193B (en) 2023-07-18

Family

ID=83159023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210575753.XA Active CN115052193B (en) 2022-05-25 2022-05-25 Video recommendation method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN115052193B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021217912A1 (en) * 2020-04-28 2021-11-04 深圳壹账通智能科技有限公司 Facial recognition-based information generation method and apparatus, computer device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423707A (en) * 2017-07-25 2017-12-01 深圳帕罗人工智能科技有限公司 A kind of face Emotion identification method based under complex environment
CN109376621A (en) * 2018-09-30 2019-02-22 北京七鑫易维信息技术有限公司 A kind of sample data generation method, device and robot
CN109376304A (en) * 2018-11-30 2019-02-22 维沃移动通信有限公司 A kind of information recommendation method and device
CN111339847B (en) * 2020-02-14 2023-04-14 福建帝视信息科技有限公司 Face emotion recognition method based on graph convolution neural network
KR102504722B1 (en) * 2020-06-24 2023-02-28 영남대학교 산학협력단 Learning apparatus and method for creating emotion expression video and apparatus and method for emotion expression video creation
CN113723359A (en) * 2021-09-16 2021-11-30 未鲲(上海)科技服务有限公司 User emotion recognition method and device, computer equipment and readable storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021217912A1 (en) * 2020-04-28 2021-11-04 深圳壹账通智能科技有限公司 Facial recognition-based information generation method and apparatus, computer device and storage medium

Also Published As

Publication number Publication date
CN115052193A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
US11393133B2 (en) Emoji manipulation using machine learning
Hussain et al. Automatic understanding of image and video advertisements
US20170098122A1 (en) Analysis of image content with associated manipulation of expression presentation
US11073899B2 (en) Multidevice multimodal emotion services monitoring
CN108197592B (en) Information acquisition method and device
US11430561B2 (en) Remote computing analysis for cognitive state data metrics
CN110719525A (en) Bullet screen expression package generation method, electronic equipment and readable storage medium
US20220101146A1 (en) Neural network training with bias mitigation
CN110351580B (en) Television program topic recommendation method and system based on non-negative matrix factorization
CN111339420A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114567693B (en) Video generation method and device and electronic equipment
CN113900522A (en) Interaction method and device of virtual image
CN115052193B (en) Video recommendation method, system, device and storage medium
CN113573128A (en) Audio processing method, device, terminal and storage medium
CN113761281B (en) Virtual resource processing method, device, medium and electronic equipment
Pijani et al. Inferring attributes with picture metadata embeddings
WO2022002865A1 (en) A system and a method for personalized content presentation
Balfaqih A Hybrid Movies Recommendation System Based on Demographics and Facial Expression Analysis using Machine Learning.
CN111859165A (en) Real-time personalized information flow recommendation method based on user behaviors
CN113407772A (en) Video recommendation model generation method, video recommendation method and device
Gunes et al. 16 automatic analysis of social emotions
CN115878835B (en) Cartoon background music matching method, device and storage medium
CN117873976A (en) Video tag generation method and device, electronic equipment and nonvolatile storage medium
Lee Efficient Deep Learning-Driven Systems for Real-Time Video Expression Recognition
CN113515636A (en) Text data processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant