Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are used only to distinguish different devices, modules, or units; they are not intended to limit those devices, modules, or units to being different devices, modules, or units, nor to limit the order or interdependence of the functions they perform.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
In view of the above technical problem, an embodiment of the present disclosure provides a video recommendation method, which may be executed by a user terminal. As shown in Fig. 1, the method may include:
Step S110: when it is detected that the user is watching a video, obtaining sensory information of the user.
A sense organ is an organ that receives external stimuli and includes body parts such as the eyes, ears, nose, and mouth. The information corresponding to each sense organ may be called sensory information. Sensory information includes both information corresponding to a sense organ (for example, information corresponding to the eyes or the mouth) and information emitted through a sense organ (for example, voice information emitted through the mouth). The sensory information acquired when the user watches a video refers to the sensory information collected from the user while the user watches the content of the currently played video.
Step S120: sending the sensory information to a server, receiving related information of a video to be recommended returned by the server, and displaying the related information of the video to be recommended to the user.
Wherein the video to be recommended is determined based on the sensory information.
The sensory information may reflect the viewing state of the user, including the user's viewing feelings about the currently watched video, such as sadness, happiness, or being moved. Based on the user's viewing feelings, it may be determined whether the user likes the currently played video, and the video to be recommended is then determined accordingly.
There may be one or more videos to be recommended. The related information of a video to be recommended refers to information that reflects the characteristics of the video, such as the theme of the video, the synopsis of the video, or the cover of the video.
It is to be understood that if it is determined, based on the user's viewing feelings, that the user likes the video, the video to be recommended may be a video related to it, for example, a video of the same type as the currently played video; if it is determined that the user does not like the video, the video to be recommended may be an unrelated video, for example, a video of a different type from the currently played video.
According to the scheme in the embodiment of the present disclosure, the sensory information of the user is acquired while the user watches a video. This sensory information reflects the viewing state of the user, including the user's feelings about the currently watched video. Because the video to be recommended is determined based on the user's sensory information, the user's viewing feelings are taken into account, so the recommendation is more accurate and better matches the user's preferences.
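The flow of steps S110 and S120 on the user terminal can be sketched as follows. This is an illustration only: the function names (`on_video_watch_detected`, `send_to_server`), the `sensory_info` fields, and the stand-in server logic are hypothetical assumptions, not part of the disclosure.

```python
# Hypothetical sketch of steps S110-S120 on the user terminal: forward
# the sensory information to the server and return what comes back for
# display. All names here are illustrative assumptions.

def on_video_watch_detected(sensory_info, send_to_server):
    """Step S110/S120: send sensory information to the server and
    return the related information of the videos to be recommended."""
    response = send_to_server({"sensory_info": sensory_info})
    return response["recommended_videos"]

def stand_in_server(request):
    # Toy server: a smiling face is read as liking the current video,
    # so a same-type (pet) video is recommended; otherwise a video of
    # a different type is suggested.
    liked = request["sensory_info"].get("smiling", False)
    if liked:
        videos = [{"title": "More pet clips", "type": "pet"}]
    else:
        videos = [{"title": "Trending news", "type": "news"}]
    return {"recommended_videos": videos}

recs = on_video_watch_detected({"smiling": True}, stand_in_server)
```

In a real embodiment the terminal would then render the returned theme, synopsis, and cover to the user.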
In an embodiment of the present disclosure, the sensory information includes at least one of facial images or voice information.
Based on the foregoing description, the facial expression of the user can be derived from the sense organs shown in a facial image, and the expression reflects the user's current viewing experience. For example, whether the user is happy can be determined from the curvature of the eyes. The user's viewing experience can therefore be inferred from the facial image.
Voice information refers to the sound emitted by the user through a sense organ (for example, the mouth). The user's viewing experience can be reflected by the voice information; for example, the tone of the voice can indicate how the user feels about the video.
It is understood that, in the solution of the present disclosure, the facial image and the voice information may also be combined to reflect the user's viewing experience, which is within the protection scope of the present disclosure.
In the embodiment of the disclosure, the relevant information of the video to be recommended is determined by the server in the following way:
determining a current viewing mood of the user based on the sensory information;
and determining the relevant information of the video to be recommended based on the current watching mood.
As an alternative, determining the current viewing mood of the user based on the sensory information may include: determining sensory features based on the sensory information, and determining the current viewing mood of the user based on the sensory features. The sensory features include at least one of expression features or voice features; that is, when the sensory information is a facial image, the sensory features determined from it are expression features, and when the sensory information is voice information, the sensory features determined from it are voice features.
The facial image contains the user's sense organs. While the user watches the currently played video, the user's expression features can be determined from the facial image, and these expression features reflect the user's viewing mood toward the currently played video.
The number of acquired facial images may be one or more. A facial image is an image captured while the user watches the currently played video. It should be noted that determining the user's expression features from the facial image may be implemented using image processing technology, which is not described here again.
The facial image can be acquired by an image capture device, which may be the user terminal playing the current video or another image capture device.
The voice information refers to the user's voice collected by an audio capture device, which may be the user terminal playing the current video or another audio capture device. The voice information is collected while the user watches the currently played video. Because it carries the tone the user expresses while watching, the user's voice features can be determined from the voice information, and these voice features reflect the user's viewing mood.
Determining the user's voice features based on the voice information may be implemented using voice processing technology, which is not described here again.
As an alternative, the image capturing device may be a camera of the user terminal, and the audio capturing device may be a microphone of the user terminal.
In the scheme of the present disclosure, the viewing mood of the user may be determined from sensory features according to actual requirements. For example, if the sensory feature of only one sense organ is obtained, the user's viewing mood can be determined from that single sensory feature; if the sensory features of multiple sense organs are obtained simultaneously, the user's viewing mood can be determined from all of them.
In an embodiment of the present disclosure, determining a current viewing mood of a user based on sensory information includes:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
The mood classification model is trained in advance and can determine the current viewing mood of the user based on the sensory information: the sensory information is input into the model, the model outputs a probability value for each mood, and the mood corresponding to the maximum probability value is taken as the current viewing mood.
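This inference step can be sketched minimally as follows. The mood set and the raw model scores are illustrative assumptions; the disclosure does not fix either.

```python
import math

# Sketch of the inference described above: the model outputs one
# probability per mood, and the mood with the maximum probability is
# taken as the current viewing mood. MOODS and the scores below are
# illustrative assumptions.

MOODS = ["happy", "sad", "moved", "neutral"]

def softmax(logits):
    # Numerically stable softmax over the raw model scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_mood(logits):
    probs = softmax(logits)
    # Take the mood corresponding to the maximum probability value.
    return MOODS[probs.index(max(probs))]

mood = predict_mood([2.5, 0.1, 0.3, 1.0])  # highest score -> "happy"
```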
As an alternative, in the training process of the model, the training samples are pieces of sensory information carrying mood labels, where a mood label represents the mood annotation result corresponding to each piece of sensory information. An initial neural network model is trained based on the training samples until its loss function converges, and the initial neural network model at the end of training is used as the mood classification model.
The input of the initial neural network model is a training sample, the output is a predicted mood corresponding to each sensory information in the training sample, and the value of the loss function represents the difference between the predicted mood and a corresponding mood labeling result.
In the above example, the initial neural network model may be a commonly used neural network model, such as a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, or an LSTM (Long Short-Term Memory) network model. The loss function may be a classification loss function (for example, when the classification layer of the neural network model is a softmax layer, the loss function may be a commonly used classification loss function corresponding to the softmax layer).
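The train-until-convergence procedure described above can be sketched as follows. This is a hedged illustration: a real embodiment would use a CNN/RNN/LSTM on images or audio, while this sketch uses a minimal linear softmax classifier with cross-entropy loss, and the toy features (mouth curvature, tear level) and mood indices are invented for the example.

```python
import math

# Illustrative stand-in for "train until the loss function converges":
# a linear softmax classifier trained with cross-entropy by gradient
# descent, stopping when the loss change falls below a tolerance.

def train(samples, n_feats, n_moods, lr=0.5, tol=1e-6, max_epochs=5000):
    w = [[0.0] * n_feats for _ in range(n_moods)]
    prev_loss = float("inf")
    for _ in range(max_epochs):
        loss = 0.0
        grad = [[0.0] * n_feats for _ in range(n_moods)]
        for x, y in samples:
            logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            probs = [e / z for e in exps]
            loss -= math.log(probs[y])  # cross-entropy vs. mood label
            for k in range(n_moods):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_feats):
                    grad[k][j] += err * x[j]
        for k in range(n_moods):
            for j in range(n_feats):
                w[k][j] -= lr * grad[k][j] / len(samples)
        if abs(prev_loss - loss) < tol:  # loss function has converged
            break
        prev_loss = loss
    return w

def predict(w, x):
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return logits.index(max(logits))

# Toy labeled samples: [mouth_curvature, tear_level] -> 0="happy", 1="sad"
data = [([0.9, 0.0], 0), ([0.8, 0.1], 0), ([0.1, 0.9], 1), ([0.0, 0.8], 1)]
weights = train(data, n_feats=2, n_moods=2)
pred = predict(weights, [0.85, 0.05])  # a clearly "happy" feature vector
```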
In the solution of the present disclosure, when the sensory information includes the facial image and the voice information, the mood classification model may include a first classification model and a second classification model, wherein, based on the sensory information, the current viewing mood of the user is determined through a pre-trained mood classification model, including:
determining a first viewing mood of the user through a first classification model based on the facial image;
determining a second viewing mood of the user through a second classification model based on the voice information;
determining a current viewing mood of the user based on the first viewing mood and the second viewing mood.
The input of the first classification model is a facial image; the model outputs a probability value for each mood, and the mood corresponding to the maximum probability value is taken as the first viewing mood. Similarly, the input of the second classification model is voice information; the model outputs a probability value for each mood, and the mood corresponding to the maximum probability value is taken as the second viewing mood.
As an alternative, the acquired facial image may be processed by the trained first classification model to determine the viewing mood corresponding to the facial image. Specifically, feature extraction may be performed on the facial image by the first classification model to obtain expression features, and the viewing mood corresponding to the facial image is determined based on the expression features.
In the training process of this model, the training samples are facial images carrying mood labels, where a mood label represents the mood annotation result corresponding to each facial image. An initial neural network model is trained based on the training samples until its loss function converges, and the initial neural network model at the end of training is used as the first classification model.
The input of the initial neural network model is a training sample, the output of the initial neural network model is a predicted mood corresponding to each face image in the training sample, and the value of the loss function represents the difference between the predicted mood and a corresponding mood labeling result.
In the above example, the initial neural network model may be a commonly used neural network model, such as a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, or an LSTM (Long Short-Term Memory) network model. The loss function may be a classification loss function (for example, when the classification layer of the neural network model is a softmax layer, the loss function may be a commonly used classification loss function corresponding to the softmax layer).
As an alternative, the acquired voice information may be processed by the trained second classification model to determine the viewing mood corresponding to the voice information. Specifically, feature extraction may be performed on the voice information by the second classification model to obtain voice features, and the viewing mood corresponding to the voice information is determined based on the voice features.
In the training process of this model, the training samples are pieces of voice information carrying mood labels, where a mood label represents the mood annotation result corresponding to each piece of voice information. An initial neural network model is trained based on the training samples until its loss function converges, and the initial neural network model at the end of training is used as the second classification model.
The input of the initial neural network model is a training sample, the output is a predicted mood corresponding to each voice message in the training sample, and the value of the loss function represents the difference between the predicted mood and a corresponding mood labeling result.
In the above example, the initial neural network model may be a commonly used neural network model, such as a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, or an LSTM (Long Short-Term Memory) network model. The loss function may be a classification loss function (for example, when the classification layer of the neural network model is a softmax layer, the loss function may be a commonly used classification loss function corresponding to the softmax layer).
In an embodiment of the present disclosure, before obtaining sensory information of a user, the method further includes:
displaying prompt information, wherein the prompt information is used to ask whether the user allows the user's sensory information to be acquired;
the sensory information acquisition determination operation of the user is received based on the prompt information, and the sensory information of the user is acquired when the user is detected to watch the video.
To protect the user's privacy, prompt information may be displayed before the sensory information is collected, and the user may decide, based on the prompt, whether to allow the collection. If collection is allowed, the user's sensory information is acquired while the user watches the currently played video; if collection is not allowed, the sensory information is not acquired.
The prompt information may be a voice message or a text message; the specific form of the prompt information is not limited in this disclosure.
In an embodiment of the present disclosure, the prompt information includes at least one of first prompt information or second prompt information. The first prompt information is used to ask whether the user allows the facial image to be acquired, and the second prompt information is used to ask whether the user allows the voice information to be acquired.
If the user's facial image is to be acquired, the prompt information is the first prompt information, for example, asking whether access to an image capture device (for example, a camera) of the user terminal is allowed; if the user's voice information is to be acquired, the prompt information is the second prompt information, for example, asking whether access to an audio capture device (for example, a microphone) of the user terminal is allowed.
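The consent flow above can be sketched as follows; the function names and prompt strings are hypothetical illustrations, and a real terminal would use the platform's own permission dialogs.

```python
# Hypothetical sketch of the consent flow: each kind of sensory
# information is collected only if the user grants permission for the
# corresponding capture device.

def collect_sensory_info(ask_user, capture_face, capture_voice):
    info = {}
    # First prompt information: access to the camera for facial images?
    if ask_user("Allow access to the camera to capture facial images?"):
        info["facial_image"] = capture_face()
    # Second prompt information: access to the microphone for voice?
    if ask_user("Allow access to the microphone to capture voice?"):
        info["voice"] = capture_voice()
    return info

# Simulated user who grants only the camera prompt:
info = collect_sensory_info(lambda q: "camera" in q,
                            lambda: "face-bytes",
                            lambda: "voice-bytes")
```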
In an embodiment of the present disclosure, the method further includes:
acquiring user information of the user, wherein the user information includes at least one of user attribute information or historical video-watching behavior information;
the video to be recommended is determined based on the sensory information and the user information.
The user attribute information refers to the user's personal information, such as age, gender, and location. The historical video-watching behavior information refers to information about the videos the user has watched, such as the types of videos the user likes to watch and information about recently watched videos.
The user attribute information can reflect the types of videos the user likes, so when determining the video to be recommended, the user attribute information may be combined with the user's sensory information to make the determination more accurate.
As an example, suppose the content of the currently played video is a cute corgi, and the acquired facial image shows the user smiling fondly. From the smiling expression, it can be inferred that the user's current viewing mood is liking. Based on the current viewing mood and the video type (pet videos) of the currently played video, it can be inferred that the user likes pet videos and probably likes pets, so the video to be recommended may be another pet video; when the user wants to watch it, the video to be recommended can be played through the user terminal.
As another example, suppose the content of the currently played video is about a hero's struggle, and in the acquired facial image tears are running down the user's face. From this moved expression, it can be determined that the user likes touching videos, and it can be inferred that the user's current viewing mood is liking. Based on the current viewing mood and the video type (touching videos) of the currently played video, it can be inferred that the user is easily moved and emotionally sensitive, so the video to be recommended may be another touching video; when the user wants to watch it, the video to be recommended can be played through the user terminal.
Based on the same principle as the video recommendation method shown in Fig. 1, an embodiment of the present disclosure also provides a video recommendation method, which may be executed by a server. As shown in Fig. 2, the method includes:
Step S210: receiving sensory information of a user, wherein the sensory information is acquired by the user's terminal when it is detected that the user is watching a video;
Step S220: determining related information of a video to be recommended based on the sensory information;
Step S230: sending the related information of the video to be recommended to the user terminal, so that the user terminal displays the related information of the video to be recommended to the user.
This method differs from the method shown in Fig. 1 only in its execution body; the principle is the same. For a detailed functional description of this video recommendation method, reference may be made to the description of the corresponding video recommendation method above, and details are not repeated here.
In the embodiment of the disclosure, the determining the relevant information of the video to be recommended based on the sensory information includes:
determining a current viewing mood of the user based on the sensory information;
and determining related information of the video to be recommended based on the current watching mood, wherein the sensory information comprises at least one of facial images or voice information.
In an embodiment of the present disclosure, determining a current viewing mood of a user based on sensory information includes:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
In the embodiment of the present disclosure, if the sensory information includes a facial image and voice information, the mood classification model includes a first classification model and a second classification model, and based on the sensory information, the current watching mood of the user is determined through a pre-trained mood classification model, including:
determining a first viewing mood of the user through a first classification model based on the facial image;
determining a second viewing mood of the user through a second classification model based on the voice information;
determining a current viewing mood of the user based on the first viewing mood and the second viewing mood.
In an embodiment of the present disclosure, determining a current viewing mood of a user based on a first viewing mood and a second viewing mood includes:
determining a first weight corresponding to the first viewing mood based on the first viewing mood;
determining a second weight corresponding to the second viewing mood based on the second viewing mood;
and determining the current watching mood of the user based on the first weight, the first watching mood, the second weight and the second watching mood.
For example, if the first viewing mood better reflects the user's actual viewing mood, the first weight may be configured to be greater than the second weight, with the sum of the first weight and the second weight being 1.
As an example, the first weight is 0.7 and the second weight is 0.3.
In the embodiment of the disclosure, determining the relevant information of the video to be recommended based on the current watching mood includes:
determining a video type of a video being watched by a user;
and acquiring related information of the video to be recommended corresponding to the video type based on the current watching mood.
When determining the video to be recommended, the video type of the currently played video and the viewing mood may be considered together. For example, the viewing moods may include a liking mood, indicating that the user likes to watch the currently played video, and a disliking mood, indicating that the user does not. When the viewing mood is the liking mood, a video of the same type as the currently played video may be determined as the video to be recommended; when the viewing mood is the disliking mood, a video of a different type may be determined as the video to be recommended.
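This selection rule can be sketched as follows; the catalog entries and mood strings are hypothetical stand-ins for the server's actual video library.

```python
# Sketch of the mood/type selection rule above: a liking mood keeps the
# current video type, a disliking mood switches to a different type.
# CATALOG is an invented stand-in for the server's video library.

CATALOG = [
    {"title": "Corgi compilation", "type": "pet"},
    {"title": "Puppy tricks", "type": "pet"},
    {"title": "Cooking basics", "type": "food"},
]

def videos_to_recommend(current_type, viewing_mood):
    if viewing_mood == "like":
        # Same type as the currently played video.
        return [v for v in CATALOG if v["type"] == current_type]
    # Disliking mood: recommend videos of a different type.
    return [v for v in CATALOG if v["type"] != current_type]

recs = videos_to_recommend("pet", "like")
```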
Based on the same principle as the video recommendation method shown in Fig. 1, an embodiment of the present disclosure further provides a video recommendation apparatus 30. As shown in Fig. 3, the apparatus 30 may include a sensory information acquisition module 310 and a related information acquisition module 320, wherein:
a sensory information acquisition module 310, configured to acquire sensory information of a user when it is detected that the user is watching a video;
the relevant information acquisition module 320 is used for sending the sensory information to the server, receiving the relevant information of the video to be recommended returned by the server, and displaying the relevant information of the video to be recommended to the user;
wherein the video to be recommended is determined based on the sensory information.
With the above video recommendation apparatus, the sensory information of the user is acquired when it is detected that the user is watching a video. This sensory information reflects the viewing state of the user, including the user's feelings about the currently watched video. Because the video to be recommended is determined based on the user's sensory information, the user's viewing feelings are taken into account, so the recommendation is more accurate and better matches the user's preferences.
Optionally, the sensory information comprises at least one of facial images or voice information.
Optionally, the relevant information of the video to be recommended is determined by the server in the following manner:
determining a current viewing mood of the user based on the sensory information;
and determining the relevant information of the video to be recommended based on the current watching mood.
Optionally, the related information obtaining module, when determining the current viewing mood of the user based on the sensory information, is specifically configured to:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
Optionally, the apparatus further includes a prompt module configured to display prompt information before the sensory information of the user is acquired, wherein the prompt information is used to ask whether the user allows the user's sensory information to be acquired; the user's confirmation operation for sensory information acquisition is received based on the prompt information, and the sensory information of the user is acquired when it is detected that the user is watching a video.
Optionally, the prompt information includes at least one of first prompt information or second prompt information, the first prompt information being used to ask whether the user allows the facial image to be acquired, and the second prompt information being used to ask whether the user allows the voice information to be acquired.
Optionally, the apparatus further comprises:
the system comprises a user information acquisition module, a video acquisition module and a video display module, wherein the user information acquisition module is used for acquiring user information of a user, and the user information comprises at least one item of user attribute information and historical behavior information of watching videos; the video to be recommended is determined based on the sensory information and the user information.
Based on the same principle as the video recommendation method shown in Fig. 2, an embodiment of the present disclosure also provides a video recommendation apparatus 40. As shown in Fig. 4, the apparatus 40 may include a sensory information determination module 410, a related information determination module 420, and an information processing module 430, wherein:
a sensory information determination module 410, configured to receive sensory information of a user, where the sensory information is acquired by the user's terminal when it is detected that the user is watching a video;
the related information determining module 420 is configured to determine related information of a video to be recommended based on the sensory information;
and the information processing module 430 is configured to send the relevant information of the video to be recommended to the user terminal, so that the user terminal displays the relevant information of the video to be recommended to the user.
With the above video recommendation apparatus, the sensory information of the user is acquired when it is detected that the user is watching a video. This sensory information reflects the viewing state of the user, including the user's feelings about the currently watched video. Because the video to be recommended is determined based on the user's sensory information, the user's viewing feelings are taken into account, so the recommendation is more accurate and better matches the user's preferences.
Optionally, when determining the relevant information of the video to be recommended based on the sensory information, the relevant information determination module is specifically configured to:
determining a current viewing mood of the user based on the sensory information;
and determining related information of the video to be recommended based on the current watching mood, wherein the sensory information comprises at least one of facial images or voice information.
Optionally, the related information determining module, when determining the current viewing mood of the user based on the sensory information, is specifically configured to:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
Optionally, if the sensory information includes a facial image and voice information, the mood classification model includes a first classification model and a second classification model, and the related information determining module is configured to, when determining the current viewing mood of the user based on the sensory information and through a pre-trained mood classification model, specifically:
determining a first viewing mood of the user through a first classification model based on the facial image;
determining a second viewing mood of the user through a second classification model based on the voice information;
determining a current viewing mood of the user based on the first viewing mood and the second viewing mood.
Optionally, the related information determining module, when determining the current viewing mood of the user based on the first viewing mood and the second viewing mood, is specifically configured to:
determining a first weight corresponding to the first viewing mood based on the first viewing mood;
determining a second weight corresponding to the second viewing mood based on the second viewing mood;
and determining the current watching mood of the user based on the first weight, the first watching mood, the second weight and the second watching mood.
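As a non-limiting illustration, the weighted determination described above may be sketched in Python as follows. The mood labels, the weight values, and the fusion-by-weighted-vote scheme are assumptions made for illustration only; the disclosure does not fix a particular weighting scheme.

```python
# Illustrative sketch of determining the current viewing mood from the
# first (face-based) and second (voice-based) viewing moods.
# The weight tables below are hypothetical: each weight is determined
# from the mood itself, as described in the embodiment.
FACE_WEIGHTS = {"happy": 0.7, "sad": 0.6, "neutral": 0.5}
VOICE_WEIGHTS = {"happy": 0.3, "sad": 0.4, "neutral": 0.5}

def fuse_moods(first_mood: str, second_mood: str) -> str:
    """Combine the face-based and voice-based moods into a current mood."""
    # Determine a weight for each viewing mood based on the mood itself.
    w1 = FACE_WEIGHTS.get(first_mood, 0.5)
    w2 = VOICE_WEIGHTS.get(second_mood, 0.5)
    # Accumulate weighted votes per mood label and pick the largest.
    scores: dict[str, float] = {}
    scores[first_mood] = scores.get(first_mood, 0.0) + w1
    scores[second_mood] = scores.get(second_mood, 0.0) + w2
    return max(scores, key=scores.get)
```

In this sketch, when both modalities agree, the shared mood accumulates both weights; when they disagree, the modality whose weight is larger for its predicted mood prevails.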
Optionally, when determining the related information of the video to be recommended based on the current viewing mood, the related information determining module is specifically configured to:
determining a video type of a video being watched by a user;
and acquiring related information of the video to be recommended corresponding to the video type based on the current watching mood.
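As a non-limiting illustration, the mapping from the video type and the current watching mood to the related information of the video to be recommended may be sketched as a simple lookup. The catalogue contents and the function name are hypothetical; a real system would query a recommendation service rather than a static table.

```python
# Hypothetical catalogue mapping (video_type, viewing_mood) to related
# information of candidate videos to be recommended.
CATALOGUE = {
    ("comedy", "happy"): ["more stand-up clips"],
    ("comedy", "sad"): ["light-hearted sketches"],
    ("drama", "sad"): ["uplifting short films"],
}

def recommend(video_type: str, viewing_mood: str) -> list:
    """Return related information of videos to be recommended, given the
    type of the video being watched and the user's current mood."""
    return CATALOGUE.get((video_type, viewing_mood), [])
```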
The apparatus of the embodiment of the present disclosure may execute the video recommendation method shown in fig. 2, and its implementation principles are similar. The actions executed by the modules in the video recommendation apparatus in the embodiments of the present disclosure correspond to the steps in the video recommendation method in the embodiments of the present disclosure. For a detailed functional description of the modules of the video recommendation apparatus, reference may be made to the description of the corresponding video recommendation method shown above; details are not repeated here.
Based on the same principle as the method in the embodiments of the present disclosure, reference is made to fig. 5, which shows a schematic structural diagram of an electronic device 600 (e.g., the terminal device in fig. 1 or the server in fig. 2) suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle terminal (e.g., a car navigation terminal), and stationary terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 5 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
The electronic device includes a memory and a processor, wherein the processor may be referred to as the processing device 601 hereinafter, and the memory may include at least one of a Read Only Memory (ROM) 602, a Random Access Memory (RAM) 603, and a storage device 608 hereinafter, as specifically shown below:
as shown in fig. 5, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 600 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: when it is detected that a user watches a video, obtain sensory information of the user; send the sensory information to a server, receive relevant information of a video to be recommended returned by the server, and display the relevant information of the video to be recommended to the user; wherein the video to be recommended is determined based on the sensory information.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module or unit does not, in some cases, constitute a limitation of the module or unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a video recommendation method, comprising:
when a user is detected to watch a video, obtaining sensory information of the user;
sending the sensory information to a server, receiving relevant information of a video to be recommended returned by the server, and displaying the relevant information of the video to be recommended to the user;
wherein the video to be recommended is determined based on the sensory information.
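As a non-limiting illustration, the terminal-side flow of example one may be sketched as follows. The three callables are hypothetical hooks standing in for the camera/microphone APIs, the network layer, and the UI layer; the disclosure does not prescribe these interfaces.

```python
def on_video_watched(capture_sensory_info, send_to_server, display):
    """Terminal-side flow of example one: obtain sensory information when
    the user is detected to watch a video, send it to the server, and
    display the returned related information of the video to be
    recommended. All three parameters are hypothetical callables."""
    sensory_info = capture_sensory_info()        # facial image and/or voice
    related_info = send_to_server(sensory_info)  # server determines recommendation
    display(related_info)                        # show it to the user
    return related_info
```

A usage example with stub hooks: `on_video_watched(lambda: {"face": ...}, lambda info: ["video A"], print)` would display and return `["video A"]`.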
According to one or more embodiments of the present disclosure, the sensory information includes at least one of facial image or voice information.
According to one or more embodiments of the present disclosure, the information related to the video to be recommended is determined by the server by:
determining a current viewing mood of the user based on the sensory information;
and determining the related information of the video to be recommended based on the current watching mood.
According to one or more embodiments of the present disclosure, the determining the current viewing mood of the user based on the sensory information includes:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
According to one or more embodiments of the present disclosure, before the obtaining sensory information of the user, the method further includes:
displaying prompt information, wherein the prompt information is used for prompting the user whether to allow acquisition of the sensory information of the user;
and when receiving a sensory information acquisition determination operation of a user based on the prompt information and detecting that the user watches the video, acquiring the sensory information of the user.
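As a non-limiting illustration, the consent check described above may be sketched as a simple gate: sensory information is acquired only when the user has confirmed the prompt and the user is detected to be watching a video. The flag and function names are illustrative.

```python
def acquire_sensory_info(user_allowed: bool, is_watching: bool, capture):
    """Acquire sensory information only when the user has confirmed the
    prompt (user_allowed) AND the user is detected to be watching a
    video (is_watching). `capture` is a hypothetical acquisition hook."""
    if user_allowed and is_watching:
        return capture()
    return None  # no acquisition without consent and an active viewing
```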
According to one or more embodiments of the present disclosure, the prompt information includes at least one of first prompt information for prompting a user whether to allow acquisition of a facial image of the user or second prompt information for prompting a user whether to allow acquisition of voice information of the user.
In accordance with one or more embodiments of the present disclosure, the method further comprises:
acquiring user information of the user, wherein the user information comprises at least one of user attribute information or historical behavior information of watching videos;
the video to be recommended is determined based on the sensory information and the user information.
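As a non-limiting illustration, combining the sensory information with the user information might be sketched as a toy scoring function; the tag-based scoring scheme is purely an assumption for illustration and is not specified by the disclosure.

```python
def score_candidate(candidate_tags, viewing_mood, user_history_tags):
    """Toy score combining the mood signal derived from sensory
    information with historical viewing behavior: a candidate scores
    higher when it shares tags with videos the user watched before and
    when it suits the current viewing mood."""
    history_overlap = len(set(candidate_tags) & set(user_history_tags))
    mood_bonus = 1 if viewing_mood in candidate_tags else 0
    return history_overlap + mood_bonus
```

Candidates would then be ranked by this score and the highest-scoring video offered as the video to be recommended.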
According to one or more embodiments of the present disclosure, [ example two ] there is provided a video recommendation method, the method comprising:
receiving sensory information of a user, wherein the sensory information is acquired when the user terminal of the user detects that the user watches a video;
determining related information of a video to be recommended based on the sensory information;
and sending the relevant information of the video to be recommended to the user terminal so that the user terminal displays the relevant information of the video to be recommended to the user.
According to one or more embodiments of the present disclosure, the determining, based on the sensory information, related information of a video to be recommended includes:
determining a current viewing mood of the user based on the sensory information;
and determining related information of the video to be recommended based on the current watching mood, wherein the sensory information comprises at least one of facial images or voice information.
According to one or more embodiments of the present disclosure, the determining the current viewing mood of the user based on the sensory information includes:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
According to one or more embodiments of the present disclosure, if the sensory information includes a facial image and voice information, the mood classification model includes a first classification model and a second classification model, and determining the current viewing mood of the user through a pre-trained mood classification model based on the sensory information includes:
determining, by the first classification model, a first viewing mood of the user based on the facial image;
determining, by the second classification model, a second viewing mood of the user based on the voice information;
determining a current viewing mood of the user based on the first viewing mood and the second viewing mood.
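As a non-limiting illustration, the two-model arrangement above may be sketched as follows. The `MoodClassifier` wrapper is a hypothetical stand-in for the pre-trained first and second classification models, and the fusion function is supplied by the caller.

```python
class MoodClassifier:
    """Hypothetical stand-in for a pre-trained mood classification model."""
    def __init__(self, label_fn):
        self._label_fn = label_fn
    def predict(self, sample):
        return self._label_fn(sample)

def classify_viewing_mood(facial_image, voice_info,
                          first_model, second_model, fuse):
    """Run the first model on the facial image and the second model on
    the voice information, then fuse the first and second viewing moods
    into the current viewing mood."""
    first_mood = first_model.predict(facial_image)
    second_mood = second_model.predict(voice_info)
    return fuse(first_mood, second_mood)
```

In practice, `fuse` would be the weighted combination of the two moods described in the preceding embodiment.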
According to one or more embodiments of the present disclosure, the determining the current viewing mood of the user based on the first viewing mood and the second viewing mood comprises:
determining a first weight corresponding to the first viewing mood based on the first viewing mood;
determining a second weight corresponding to the second viewing mood based on the second viewing mood;
determining the current viewing mood of the user based on the first weight, the first viewing mood, the second weight, and the second viewing mood.
According to one or more embodiments of the present disclosure, the determining, based on the current viewing mood, the relevant information of the video to be recommended includes:
determining a video type of a video being viewed by the user;
and acquiring related information of the video to be recommended corresponding to the video type based on the current watching mood.
According to one or more embodiments of the present disclosure, [ example three ] there is provided a video recommendation apparatus including:
the sensory information acquisition module is used for acquiring the sensory information of the user when the user is detected to watch the video;
the relevant information acquisition module is used for sending the sensory information to a server, receiving relevant information of a video to be recommended returned by the server, and displaying the relevant information of the video to be recommended to the user;
wherein the video to be recommended is determined based on the sensory information.
According to one or more embodiments of the present disclosure, the sensory information includes at least one of facial image or voice information.
In accordance with one or more embodiments of the present disclosure,
the relevant information of the video to be recommended is determined by the server in the following way:
determining a current viewing mood of the user based on the sensory information;
and determining the related information of the video to be recommended based on the current watching mood.
In accordance with one or more embodiments of the present disclosure,
the related information obtaining module is specifically configured to, when determining the current viewing mood of the user based on the sensory information:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
According to one or more embodiments of the present disclosure, before the sensory information of the user is acquired, a prompt module is further included, configured to display a prompt message, where the prompt message is used to prompt the user whether to allow the sensory information of the user to be acquired; and when receiving a sensory information acquisition determination operation of a user based on the prompt information and detecting that the user watches the video, acquiring the sensory information of the user.
According to one or more embodiments of the present disclosure, the prompt information includes at least one of first prompt information for prompting a user whether to allow acquisition of a facial image of the user or second prompt information for prompting a user whether to allow acquisition of voice information of the user.
According to one or more embodiments of the present disclosure, the apparatus further comprises:
the user information acquisition module is used for acquiring user information of the user, wherein the user information comprises at least one item of user attribute information and historical behavior information of watching videos; the video to be recommended is determined based on the sensory information and the user information.
According to one or more embodiments of the present disclosure, [ example four ] there is provided a video recommendation apparatus comprising:
the system comprises a sensory information determining module, a video processing module and a video processing module, wherein the sensory information determining module is used for receiving sensory information of a user, and the sensory information is acquired when the user terminal of the user detects that the user watches videos;
the relevant information determining module is used for determining relevant information of the video to be recommended based on the sensory information;
and the information processing module is used for sending the relevant information of the video to be recommended to the user terminal so that the user terminal can display the relevant information of the video to be recommended to the user.
According to one or more embodiments of the present disclosure, when determining the relevant information of the video to be recommended based on the sensory information, the relevant information determination module is specifically configured to:
determining a current viewing mood of the user based on the sensory information;
and determining related information of the video to be recommended based on the current watching mood, wherein the sensory information comprises at least one of facial images or voice information.
According to one or more embodiments of the present disclosure, the related information determining module, when determining the current viewing mood of the user based on the sensory information, is specifically configured to:
and determining the current watching mood of the user through a pre-trained mood classification model based on the sensory information.
According to one or more embodiments of the present disclosure, if the sensory information includes a facial image and voice information, the mood classification model includes a first classification model and a second classification model, and the related information determining module is specifically configured to, when determining the current viewing mood of the user through a pre-trained mood classification model based on the sensory information:
determining, by the first classification model, a first viewing mood of the user based on the facial image;
determining, by the second classification model, a second viewing mood of the user based on the voice information;
determining a current viewing mood of the user based on the first viewing mood and the second viewing mood.
According to one or more embodiments of the present disclosure, the related information determining module, when determining the current viewing mood of the user based on the first viewing mood and the second viewing mood, is specifically configured to:
determining a first weight corresponding to the first viewing mood based on the first viewing mood;
determining a second weight corresponding to the second viewing mood based on the second viewing mood;
determining the current viewing mood of the user based on the first weight, the first viewing mood, the second weight, and the second viewing mood.
According to one or more embodiments of the present disclosure, when determining the related information of the video to be recommended based on the current viewing mood, the related information determining module is specifically configured to:
determining a video type of a video being viewed by the user;
and acquiring related information of the video to be recommended corresponding to the video type based on the current watching mood.
The foregoing description is only illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.