CN113722542A - Video recommendation method and display device

Video recommendation method and display device

Info

Publication number
CN113722542A
Authority
CN
China
Prior art keywords
video
information
target
content
display
Prior art date
Legal status
Pending
Application number
CN202111016128.3A
Other languages
Chinese (zh)
Inventor
赵明
史小龙
黄山山
王洁
Current Assignee
Qingdao Jukanyun Technology Co ltd
Original Assignee
Qingdao Jukanyun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Jukanyun Technology Co ltd filed Critical Qingdao Jukanyun Technology Co ltd
Priority to CN202111016128.3A priority Critical patent/CN113722542A/en
Publication of CN113722542A publication Critical patent/CN113722542A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval using metadata automatically derived from the content
    • G06F 16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a video recommendation method and a display device that match long videos related to a short video by combining multi-dimensional information, such as the keyword information in the short video and the content information of different kinds of tags, instead of matching long videos using only the keyword information in the short video's title. Compared with matching long videos on a single kind of keyword information, matching on multi-dimensional information covers the content more comprehensively, so the matched long-video content correlates more strongly with the short-video content. As a result, the long videos the display device recommends to the user better meet the user's viewing needs, and the recommendation results are more accurate.

Description

Video recommendation method and display device
Technical Field
The present application relates to the field of display technologies, and in particular, to a video recommendation method and a display device.
Background
Because short videos offer simple content, short playing times, and quick viewing, more and more end users favor watching short videos to obtain information. However, the core content provided by a display device is still long videos, which have long playing times and rich content, so recommending related long videos to the user on the display device, based on the short videos the user has watched, has become a mainstream video recommendation approach.
At present, short videos are produced quickly and are highly time-sensitive, and it is difficult to label the large volume of newly generated short videos with complete content or description information in a short time; most short videos have only title information to refer to. Consequently, when recommending long videos based on a short video, related long videos can be obtained only through the short video's title information. However, to attract viewers, the titles of some short videos do not match their actual content, so the content of long videos recommended from such titles deviates greatly from the short-video content, and the long-video recommendation results are inaccurate.
Disclosure of Invention
The application provides a video recommendation method and a display device, aiming to solve the problem that the long videos a display device recommends as related to a short video deviate greatly from the short video's content.
In one aspect, the present application provides a video recommendation method including the following steps: acquiring first information of a first video; acquiring, using the first information, a target second video that matches the first video; and controlling the display to display presentation information of the target second video, thereby recommending the target second video to the user.
With this video recommendation method, long videos related to a short video can be matched by combining multi-dimensional information, such as the keyword information in the short video and the content information of different kinds of tags, rather than matching long videos using only the keyword information in the short video's title. Compared with matching long videos on a single kind of keyword information, matching on multi-dimensional information covers the content more comprehensively, and the matched long-video content correlates more strongly with the short-video content. As a result, the long videos the display device recommends to the user better meet the user's viewing needs, and the recommendation results are more accurate.
It should be understood that, for ease of description, the first video may be referred to as the short video and the second video as the long video.
In another aspect, the present application further provides a display device, including: a display; and a controller connected to the display, the controller configured to: acquire first information of a first video, where the first information represents different kinds of information in the first video and includes at least keywords and tag content, and the video duration of the first video is less than or equal to a preset duration; acquire, using the first information, a target second video that matches the first video, where the video duration of the target second video is greater than the preset duration; and control the display to display presentation information of the target second video, thereby recommending the target second video to the user, where the presentation information represents a title and/or a thumbnail of the target second video.
The video recommendation method in the present application can be applied to this display device; accordingly, the beneficial effects achievable by the display device are the same as those achievable by the video recommendation method and are not repeated here.
Drawings
To more clearly illustrate the embodiments of the present application or the implementations in the related art, the drawings required for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 illustrates a schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments;
FIG. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
FIG. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 illustrates a schematic diagram of a display device 200 displaying a movie-like short video according to some embodiments;
FIG. 6 illustrates a schematic diagram of a display device 200 displaying a list of recommended long videos for a movie-like short video, according to some embodiments;
FIG. 7 illustrates a flow diagram of a video recommendation method according to some embodiments;
FIG. 8 illustrates a schematic diagram of a display device 200 displaying advertisement-like short videos, according to some embodiments;
FIG. 9 illustrates a schematic diagram of a display device 200 displaying a target long video title according to some embodiments;
FIG. 10 illustrates a schematic diagram of a display device 200 displaying a target long video thumbnail according to some embodiments;
FIG. 11 illustrates a schematic diagram of the display apparatus 200 simultaneously displaying a target long video title and a thumbnail according to some embodiments.
Detailed Description
To make the purpose and embodiments of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 illustrates a schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments. As shown in fig. 1, a user may operate the display apparatus 200 through the smart device 300 or the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller, and communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication, and other short-distance communication methods; the remote controller controls the display device 200 wirelessly or by wire. The user may input user instructions through keys on the remote controller, voice input, control panel input, etc., to control the display apparatus 200.
In some embodiments, the smart device 300 (e.g., mobile terminal, tablet, computer, laptop, etc.) may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the display device 200 may also be controlled in manners other than through the control apparatus 100 and the smart device 300; for example, a module configured inside the display device 200 may directly receive the user's voice commands, or a voice control device provided outside the display device 200 may receive them.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display device 200. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers.
Fig. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200.
Fig. 3 illustrates a hardware configuration block diagram of a display device 200 according to some embodiments.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments the controller comprises a processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output from the controller and displays video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
In some embodiments, the display 260 may be a liquid crystal display, an OLED display, or a projection display, and may also be a projection device with a projection screen.
In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near field communication protocol chips, and an infrared receiver. Through the communicator 220, the display device 200 may establish the sending and receiving of control signals and data signals with the external control apparatus 100 or the server 400.
In some embodiments, the user interface may be configured to receive control signals from the control apparatus 100 (e.g., an infrared remote control).
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the tuner-demodulator 210 receives broadcast television signals by wired or wireless reception, and demodulates audio/video signals, as well as EPG data signals, from a plurality of wireless or wired broadcast television signals.
In some embodiments, the controller 250 and the tuner-demodulator 210 may be located in different separate devices; that is, the tuner-demodulator 210 may also be in an external device of the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any selectable object, such as a hyperlink, an icon, or another actionable control. The operations related to the selected object include displaying the page, document, or image linked by a hyperlink, or running the program corresponding to an icon.
In some embodiments, the controller comprises at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), Random Access Memory (RAM), Read-Only Memory (ROM), first to nth interfaces for input/output, and a communication bus.
The CPU executes the operating system and application program instructions stored in memory, and executes various application programs, data, and content according to the interaction instructions received from external input, so as to finally display and play various audio and video content. The CPU may include a plurality of processors, e.g., a main processor and one or more sub-processors.
In some embodiments, the graphics processor generates various graphics objects, such as icons, operation menus, and graphics displayed for user input instructions. The graphics processor includes an arithmetic unit, which performs operations on the various interactive instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit for display on the display.
In some embodiments, the video processor receives an external video signal and performs video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to the standard codec protocol of the input signal, to obtain a signal that can be directly displayed or played on the display device 200.
In some embodiments, the video processor includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like. The image synthesis module superimposes and mixes the GUI signal, input by the user or generated by the graphics generator, with the scaled video image, to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some embodiments, a system of a display device may include a Kernel (Kernel), a command parser (shell), a file system, and an application program. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.
Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.
In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an Application Programming Interface (API) and a programming framework for applications. The application framework layer includes a number of predefined functions and acts as a processing center that directs the actions of the applications in the application layer. Through the API, an application can access system resources and obtain system services during execution.
As shown in fig. 4, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager manages the lifecycle of the various applications as well as general navigation fallback functions, such as controlling the exit, opening, and back operations of applications. The window manager manages all window programs, for example obtaining the display screen size, judging whether there is a status bar, locking the screen, capturing the screen, and controlling changes of the display window (for example, shrinking the display window, or displaying a shake or a distortion).
In some embodiments, the system runtime library layer provides support for the upper framework layer; when the framework layer is used, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions the framework layer needs.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint, temperature, and pressure sensors), and power driver.
The display device 200 may provide a video playing platform for the user, and the playable videos include short videos with condensed content and long videos with more comprehensive content. Because its content is condensed, a short video's duration is usually short, e.g., 10 s or 20 s; a long video's content is comprehensive, so its duration is relatively long, and common long videos include television dramas, variety shows, and the like.
Because short videos offer simple content, short playing times, and quick viewing, more and more end users favor watching short videos to obtain information. However, the core content provided by the display device 200 is still long videos with long playing times and rich content, so recommending related long videos to the user on the display device 200, based on the short videos the user has watched, has become a mainstream video recommendation approach. Moreover, recommending long videos on the display device 200 also helps improve the click conversion rate and user retention rate of long videos on the video platform.
At present, short videos are produced quickly and are highly time-sensitive, and it is difficult to label the large volume of short videos generated in a short time with complete content or description information, so most short videos have only title information. Referring to fig. 5, taking a movie-genre short video as an example, when playing the short video, the display device 200 simultaneously displays the short video's title, "Took the subway home after work on Monday and found there was no key," on the short video's display interface.
When the display device 200 recommends long videos based on a short video, since only the short video's title information is available for reference, the display device 200 can likewise obtain related long videos only through that title information. Referring to fig. 6, still taking the above movie-genre short video as an example, the related long videos obtained by the display device 200 through its title content may be displayed as a list on the right side of the short video's display interface, so as to recommend them to the user. A long video also has a title, but it is typically shorter than a short video's, so the long video's title can be displayed directly in the list for the user to view and select.
However, to attract viewers, the titles of some short videos do not match their actual content; some even use false titles to attract users' attention and induce them to click and watch. The content of long videos obtained through such a short video's title is generally uncorrelated with the short video's content, so recommending these long videos to the user cannot satisfy the user's viewing needs.
To improve the content correlation between the obtained long videos and the short video, an embodiment of the present application provides a video recommendation method, which may be applied to the foregoing display device 200 and is executed by the controller 250. As shown in fig. 7, the method includes:
step S101, first information of a first video is obtained.
To distinguish short videos from long videos, a preset duration may be set in the embodiments of the present application. If a video's duration is greater than the preset duration, it is determined to be a long video; if its duration is less than or equal to the preset duration, it is determined to be a short video. For example, if the preset duration is 30 s, the duration of video A is 28 s, and the duration of video B is 50 s, then video A is a short video and video B is a long video.
In order to facilitate distinguishing between the first video and the second video, in the embodiment of the present application, the first video may also be referred to as a short video, and the second video may also be referred to as a long video.
The first information of the short video may be derived not only from the title but also from the video pictures, for example, text in a picture, subtitles, persons in a picture, items in a picture, and the style or genre of a picture. The title, on-screen text, subtitles, and the like are content with definite, directed meaning; such content can be segmented into words to determine the short video's keywords. Persons, items, styles, and genres in the pictures are different kinds of tags: the specific content of a person tag may be a particular person, such as an actor, director, celebrity, or host; the specific content of an item tag may be an item, such as a television, table, basketball, or football; the specific content of a style tag may be a style, such as melancholy, horror, cheerful, or relaxed; and the specific content of a genre tag may be a video genre, such as news, movie, sports, or advertisement. The keywords and the tag content are different kinds of information of the short video.
The keywords are obtained by segmenting text such as the title, on-screen text, or subtitles. A title is already text and can be segmented directly. For a picture, the text on it is either scene text within the picture or subtitles on the picture; referring to fig. 8, taking an advertisement video as an example, a video frame contains the scene advertisement text "Your nutritious breakfast" and the subtitle text "XXX provides you with high-quality breakfast; welcome to the store for a tasting." To acquire such text, a sampling interval is first determined according to the video's playing duration, and then one frame is captured from the video at every sampling interval; that is, the short video is sampled at equal intervals, finally yielding a plurality of sampled images. The text content on each sampled image is then recognized.
A sampled image may also contain some distracting text. Therefore, only standard text may be recognized during text acquisition. Standard text refers to characters meeting the recognition requirements, which may be defined on the angle and font of the characters; for example, the character angle cannot exceed a preset angle, and the font must be one or more of several specified fonts. If the preset angle is 30 degrees, only text with a character angle less than or equal to 30 degrees is recognized during text recognition; or, if the preset fonts are regular script, Song typeface, and boldface, only text in those fonts is recognized.
In addition, in the embodiments of the present application, on-screen text in the short video may be recognized using OCR (Optical Character Recognition).
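As an illustration, the equal-interval sampling and OCR recognition described above might look like the following minimal Python sketch. OpenCV and pytesseract are assumed library choices (the embodiments name only OCR in general), the Chinese language pack is an assumption, and the angle and font checks on standard text are omitted:

```python
import cv2
import pytesseract

def sample_frame_texts(video_path: str, num_samples: int = 8) -> list[str]:
    """Sample frames at equal intervals and run OCR on each one."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    interval = max(total // num_samples, 1)  # sampling interval in frames
    texts = []
    for idx in range(0, total, interval):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        # Recognize on-screen text (scene text and subtitles) in the frame;
        # "chi_sim" assumes the simplified-Chinese tessdata pack is installed.
        text = pytesseract.image_to_string(frame, lang="chi_sim").strip()
        if text:
            texts.append(text)
    cap.release()
    return texts
```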
In some embodiments, a sampled image may contain no text at all, in which case no standard text can be obtained from it. If standard text cannot be recognized in any of the sampled images, the text in step S101 is obtained only from the title.
After the text is obtained, it is segmented into words, yielding a number of nouns, verbs, adjectives, prepositions, and so on. However, some nouns, verbs, adjectives, or prepositions have no practical meaning or an extremely low probability of occurrence and cannot serve as representative keywords, such as "of," "ground," "you," and "I." Therefore, in step S101, the candidate words obtained from word segmentation need to be filtered.
There are various ways to filter the candidate words, such as filtering by term frequency-inverse document frequency (TF-IDF), filtering by part-of-speech weight, or directly removing stop words.
When filtering with TF-IDF, step S101 computes the TF-IDF of all candidate words and compares each candidate's TF-IDF with a preset frequency, filtering out candidates whose TF-IDF is below the preset frequency. For example, if the preset frequency is 0.6 and candidate 1 has TF-IDF 0.3, candidate 2 has TF-IDF 0.4, and candidate 3 has TF-IDF 0.7, then candidates 1 and 2 are filtered out, and the remaining candidate 3 is a keyword.
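The sketch below implements one common TF-IDF variant against a small reference corpus; the smoothing in the idf term and the 0.6 threshold mirror the example above, and the exact TF-IDF formulation is an assumption, since the embodiments do not fix one:

```python
import math

def tfidf_filter(candidates, documents, threshold=0.6):
    """Keep candidates whose TF-IDF in the current text meets the threshold.

    documents: list of token lists; the last entry is the current text.
    """
    doc = documents[-1]
    n_docs = len(documents)
    kept = []
    for word in candidates:
        tf = doc.count(word) / max(len(doc), 1)      # term frequency
        df = sum(1 for d in documents if word in d)  # document frequency
        idf = math.log(n_docs / (1 + df)) + 1        # smoothed inverse df
        if tf * idf >= threshold:
            kept.append(word)
    return kept
```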
When filtering by part-of-speech weight, step S101 first determines the part of speech of each candidate word, such as noun, verb, or adjective, and then computes the weight of each part of speech in the text, where the part-of-speech weight can be understood as the probability of that part of speech appearing in the text. Candidate words of different parts of speech contribute differently to the content: nouns and verbs carry clearer directed meaning, so they contribute more, while adjectives, prepositions, and the like carry no obvious directed meaning, so they contribute less. To screen out candidates with small content contributions, the weight of each part of speech can be compared with a preset weight, filtering out candidates whose part-of-speech weight is below the preset weight. For example, if the preset weight is 0.5, the noun weight in the text is 0.8, the verb weight is 0.6, and the adjective weight is 0.3, then by comparison the adjectives are filtered out, and the remaining noun and verb candidates are keywords.
When filtering stop words, step S101 first determines which words are stop words, such as "of," "ground," "get," "I," "you," and "is"; these words have no directed meaning and contribute nothing to the text content. When stop words appear among the candidate words, they can be filtered out directly, and the remaining candidates are keywords.
It should be noted that the keyword screening methods exemplified in the embodiments of the present application may be used alone or in any combination of two or more. For example, candidates with TF-IDF greater than or equal to the preset frequency are screened first, and then keywords with part-of-speech weight greater than or equal to the preset weight are selected from them; or non-stop-word candidates are screened first, and then keywords with TF-IDF greater than or equal to the preset frequency are selected from them; or a first set of non-stop-word candidates is screened, a second set with TF-IDF greater than or equal to the preset frequency is selected from the first, and keywords with part-of-speech weight greater than or equal to the preset weight are finally selected from the second. The screening methods can be combined in many forms, which are not enumerated here; one possible chaining is sketched below.
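A minimal sketch of the last combination above: stop-word removal, then the TF-IDF threshold (reusing tfidf_filter from the previous sketch), then a part-of-speech weight threshold. The stop-word list, the per-part-of-speech weights, and both thresholds are illustrative assumptions:

```python
STOP_WORDS = {"的", "地", "得", "我", "你", "是"}            # example stop words
POS_WEIGHTS = {"noun": 0.8, "verb": 0.6, "adjective": 0.3}  # example weights

def select_keywords(tagged_candidates, documents,
                    tfidf_threshold=0.6, pos_threshold=0.5):
    """tagged_candidates: list of (word, part_of_speech) pairs."""
    # 1. Drop stop words.
    words = [w for w, _ in tagged_candidates if w not in STOP_WORDS]
    # 2. Keep words whose TF-IDF meets the preset frequency.
    words = tfidf_filter(words, documents, tfidf_threshold)
    # 3. Keep words whose part-of-speech weight meets the preset weight.
    pos_of = dict(tagged_candidates)
    return [w for w in words if POS_WEIGHTS.get(pos_of[w], 0.0) >= pos_threshold]
```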
The tag content is obtained by recognizing different kinds of content in the short video's pictures. In step S101, face recognition can detect whether persons appear in the short video, and a pre-trained recognition model or the like can identify who each person is. Items in the short video can be identified by an object detection algorithm such as YOLO (You Only Look Once). The style of the short video can be judged by recognizing its color features; for example, if black features appear frequently or in a high proportion, the short video can be classified in the horror style. The genre of the short video can be determined by recognizing its scene information; for example, if the scene information includes grass, people, and a football, the short video can be classified as sports.
In some cases, the recognized persons, items, and the like may not be strongly correlated with the short video's content. For example, a football appears a few times in a food-genre short video; although the football is recognized in the short video, it can be considered uncorrelated with the short video's content, and if the football were also used as tag content for matching and recommending long videos, the obtained long-video content would likewise correlate poorly with the short-video content. To avoid this, in step S101, after candidate content such as persons or items is recognized, the appearance time or appearance count of each person or item in the short video may be determined, and candidates with short appearance times or low appearance counts are filtered out by comparison with a preset time or preset count; that is, invalid content is filtered out. An appearance time greater than or equal to the preset time can be regarded as long, and an appearance count greater than or equal to the preset count can be regarded as frequent.
For example, if the preset time is 10 s, person m appears in the short video for a total of 12 s, and person n appears for a total of 6 s, then by comparison person n is filtered out as content with a short appearance time, and person m is kept as usable tag content. In addition, the preset time should be determined according to the short video's specific duration: if the short video is short, the preset time should also be short; if the short video is long, the preset time should also be long.
For another example, if the preset count is 10, item x appears 8 times in the short video, and item y appears 12 times, then by comparison item x is filtered out as content with a low appearance count, and item y is kept as usable tag content.
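A minimal sketch of this invalid-content filter, under the assumption that upstream recognition already aggregates each person or item into a total appearance time and count; the 10 s and 10-occurrence thresholds mirror the examples above, and combining the two criteria with "or" is one possible reading:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str            # a recognized person or item
    total_seconds: float  # total appearance time in the short video
    occurrences: int      # number of appearances

def filter_tag_content(detections, preset_seconds=10.0, preset_count=10):
    """Keep only content that appears long enough or often enough."""
    return [d.label for d in detections
            if d.total_seconds >= preset_seconds or d.occurrences >= preset_count]
```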
In addition, some persons or items in short-video pictures have obvious period characteristics. Accordingly, in step S101, by recognizing the clothing of on-screen persons and the like, the clothing's features can be judged and the period to which the short video's content belongs can be determined, such as ancient or modern. Therefore, when determining the tag content or tag type, a period tag of the short video and its content can also be determined.
It should be noted that in step S101, one or more kinds of tag content may be determined according to the specific content in the short video's pictures. For example, if only persons appear in the short video, only its person tag content can be determined; if the short video also has obvious color features, its style tag content can also be determined; and if it also has obvious costume features, its period tag content can further be determined.
In the embodiments of the present application, acquiring the short video's keywords, various tag contents, and so on increases the information dimensions available for matching long videos. Compared with the current practice of matching long videos only through the short video's title, this multi-dimensional matching information makes the content correlation between the matched long videos and the short video higher.
And step S102, acquiring a target second video matched with the first video by using the first information.
In the embodiments of the present application, a long video likewise has second information such as keywords and tag content, determined in the same manner as the first information, which is not repeated here. The ways of matching long videos include at least multi-dimensional information matching and information vector matching.
Multi-dimensional information matching matches the first information against the second information: if the long video's keywords match the short video's keywords and the long video's tag content matches the short video's tag content, the long video can be determined to match the short video.
However, a short video may have more than one keyword and more than one piece of tag content, and so may a long video; if all keywords and/or all tag content of the short and long videos were required to be identical, few or no long videos would likely match. Therefore, after the second information of a long video is acquired in step S102, if at least one item of the second information matches the corresponding kind of information in the first information, the long video may be determined to be a target long video. For example, if some keywords in the second information match keywords in the first information, the long video corresponding to the second information may be determined to be a target long video; or, if some tag content in the second information matches tag content in the first information, the long video corresponding to the second information may likewise be determined to be a target long video.
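The sketch below treats the first and second information as keyword and tag sets and applies the at-least-one-match rule; the dictionary field names are assumptions for illustration:

```python
def is_target_long_video(first_info: dict, second_info: dict) -> bool:
    """True if any keyword or any tag content of the long video matches."""
    keyword_hit = bool(set(first_info["keywords"]) & set(second_info["keywords"]))
    tag_hit = bool(set(first_info["tags"]) & set(second_info["tags"]))
    return keyword_hit or tag_hit
```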
For information vector matching, the first information of the short video is turned into a first vector, and the second information of the candidate long videos is turned into a number of second vectors. In step S102, the first vector and the second vectors may be obtained separately using an embedding computation. The first vector and the second vectors are then stored in a search engine (such as the Milvus vector search engine), and the second vectors that match or correlate strongly with the first vector are retrieved in the search engine; the long videos corresponding to those second vectors are the target long videos matching the short video.
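As an engine-free illustration of the retrieval step, the sketch below ranks candidate vectors by brute-force cosine similarity, which reproduces the top-k behaviour a vector search engine such as Milvus provides on small candidate sets; the embedding step itself is assumed to have already produced the numpy arrays:

```python
import numpy as np

def top_k_matches(first_vector: np.ndarray,
                  second_vectors: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k long-video vectors most similar to the short video."""
    a = first_vector / np.linalg.norm(first_vector)
    b = second_vectors / np.linalg.norm(second_vectors, axis=1, keepdims=True)
    sims = b @ a                       # cosine similarity per candidate
    return np.argsort(sims)[::-1][:k]  # best match first
```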
And step S103, controlling the display to display the display information of the target second video, and further recommending the target second video to the user.
In the embodiments of the present application, the presentation information of a target long video may be its title; referring to fig. 9, a video list containing target long-video titles, e.g., XXX, SS, and YYY, is displayed on the right side of the current display page. The presentation information may also be the target long video's thumbnail; referring to fig. 10, a video list containing target long-video thumbnails is displayed on the right side of the current display page, where a thumbnail may be the first frame or some other frame of the target long video. To make the target long videos easy to view and select, the presentation information may further include both the title and the thumbnail; referring to fig. 11, a video list of target long-video thumbnails is displayed at the bottom of the current display page, with each target long video's title displayed on its thumbnail.
In a recommendation list of long videos, the target long videos ranked near the top are generally strongly correlated with the short video. In step S103, the correlation between the target long video corresponding to the second information and the short video corresponding to the first information may be determined according to the number of items of second information matching the first information; the more matches, the higher the correlation, which is proportional to the match count. For example, target long videos a, b, and c all match short video D. The keywords of target long video a match short video D's keywords in 2 items and its tag content matches in 3 items, so target long video a has 5 information matches with short video D; if target long video b has 6 information matches with short video D and target long video c has 2, then by comparison target long video b has the highest correlation, target long video a the second highest, and target long video c the lowest. In step S103, the target long videos are sorted by correlation to obtain the target long-video list containing the presentation information.
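A minimal sketch of this correlation ranking, counting keyword and tag matches per target and sorting in descending order, consistent with the worked example (b: 6, a: 5, c: 2); field names are illustrative:

```python
def rank_target_long_videos(first_info: dict, targets: list[dict]) -> list[dict]:
    """Sort target long videos by their information match count, highest first."""
    def match_count(second_info: dict) -> int:
        return (len(set(first_info["keywords"]) & set(second_info["keywords"]))
                + len(set(first_info["tags"]) & set(second_info["tags"])))
    return sorted(targets, key=match_count, reverse=True)
```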
In the foregoing embodiments, whether the first information matches the second information may be determined by judging whether at least one item of the second information is identical to the corresponding kind of information in the first information; for example, if the sports genre in the second information is identical to the sports genre in the first information, the second information may be determined to match the first information.
Alternatively, whether the first information matches the second information may be determined by judging whether at least one item of the second information is similar to the corresponding kind of information in the first information. When judging similarity, the similarity may be compared with a preset similarity; if the similarity is greater than or equal to the preset similarity, the two items of information may be determined to match. For example, if the preset similarity is 70% and the similarity between keyword F in the second information and keyword P in the first information is 75%, keyword F may be determined to match keyword P, and thus the second information matches the first information.
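A small sketch of the similarity test with the 70% preset from the example; difflib's ratio is an assumed stand-in for whatever string or embedding similarity an implementation actually uses:

```python
from difflib import SequenceMatcher

def info_matches(a: str, b: str, preset_similarity: float = 0.70) -> bool:
    """True when two pieces of information reach the preset similarity."""
    return SequenceMatcher(None, a, b).ratio() >= preset_similarity
```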
As noted above, in some cases, to attract users to click on short videos and thereby increase their click-through rate, the titles of some short videos do not match their actual content. The keywords extracted from such titles therefore also correlate poorly with the short video's content and are unhelpful for matching long videos. To avoid this, in some embodiments, the similarity between each keyword of the short video and each piece of tag content may be determined, and keywords or tag contents whose similarity is below a preset similarity are filtered out.
A keyword and a piece of tag content can form a keyword-tag association, and the similarity of the association is the similarity between the keyword and the tag content. For example, suppose the short video has keywords H, T, and Q and the tag content is the genre "entertainment"; the associations formed are H-entertainment, T-entertainment, and Q-entertainment. If the preset similarity is 40%, the H-entertainment similarity is 50%, the T-entertainment similarity is 60%, and the Q-entertainment similarity is 10%, then by comparison keyword Q in the Q-entertainment association is filtered out, and the short video keeps only keywords H and T.
Or, suppose the short video has keyword H and the tag contents are the genres "entertainment" and "news"; the associations formed are H-entertainment and H-news. If the preset similarity is 40%, the H-entertainment similarity is 50%, and the H-news similarity is 30%, then by comparison the news tag content in the H-news association is filtered out, and the short video keeps only the entertainment tag content.
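A sketch of the association filter on the keyword side, under the 40% preset from the examples; the similarity function is passed in (it could be the difflib-based check above or a learned measure), which is an assumption:

```python
def filter_keywords_by_association(keywords, tag_contents,
                                   similarity, preset=0.40):
    """Keep a keyword only if it is similar enough to some tag content."""
    return [kw for kw in keywords
            if any(similarity(kw, tag) >= preset for tag in tag_contents)]
```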
To improve the matching quality of the obtained long videos, in some embodiments, a knowledge graph can be used to further screen the long videos. A knowledge graph encodes the relationships and connections among many kinds of knowledge, formed from a large amount of prior knowledge. After the short video's person tag content is acquired, the content related to that person can be queried directly in the knowledge graph, and related long videos are then acquired according to the content obtained from the knowledge graph. For example, if a host in the short video often appears in news-genre videos, the weight of the news genre can be increased, and more news-genre long videos can be recommended to the user.
Or, if the person tag content of the short video contains two or more persons, any two persons can be directly associated to establish a person relationship; other content associated with that person relationship is then queried in the knowledge graph, and related long videos are acquired according to the content obtained from the knowledge graph. For example, if the persons in the short video include two actors who have appeared together in a program, the content of the programs in which both actors participated can be found in the knowledge graph, and the corresponding long videos are then obtained according to that program content.
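A toy sketch of the lookup: the knowledge graph is reduced to a mapping from a person, or a sorted pair of persons, to related program content, from which long videos would then be fetched. The mapping and names are purely illustrative; a production system would query a real knowledge graph:

```python
# Hypothetical graph: single persons or person pairs -> related program content.
KNOWLEDGE_GRAPH = {
    ("host_A",): ["news"],                       # host_A often appears in news
    ("actor_B", "actor_C"): ["variety_show_X"],  # programs both appeared in
}

def related_program_content(persons: list[str]) -> list[str]:
    """Query content associated with one person or a pair of persons."""
    key = tuple(sorted(persons))
    return KNOWLEDGE_GRAPH.get(key, [])
```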
To improve the relevance of tag content in the short video, in some embodiments, a clustering model may be trained on historical tag data, and the tag contents recognized from the short video are classified with the clustering model; the more tag contents fall in the same class, the higher the correlation between that class's tag contents and the short video's content. In addition, in this process, the number of tag contents in each class can be compared with a preset number, so that the tag contents in the smaller classes are filtered out. For example, suppose the tag contents recognized from the short video are the person "Wang," the item "blackboard," the genre "education," and the style "melancholy"; after classification by the clustering model, "Wang," "blackboard," and "education" fall into a first class and "melancholy" into a second class. The first class has 3 tag contents and the second class has 1, so "melancholy" is filtered out, and only tag contents such as the person "Wang," the item "blackboard," and the genre "education" remain.
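A minimal sketch of the clustering-based filter: tag contents are embedded, clustered, and tags in clusters below a preset size are dropped (the lone style tag in the example above). KMeans, the embedding callable, and the cluster count are assumed choices:

```python
from collections import Counter
from sklearn.cluster import KMeans

def filter_tags_by_cluster(tags, embed, n_clusters=2, preset_size=2):
    """tags: tag strings; embed: callable mapping a tag to a feature vector."""
    vectors = [embed(t) for t in tags]
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    sizes = Counter(labels)
    # Keep only tags whose cluster reaches the preset size.
    return [t for t, label in zip(tags, labels) if sizes[label] >= preset_size]
```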
In some embodiments, to obtain long videos related to a short video, a recommendation list of target long videos may also be obtained with a two-tower deep learning model, such as a DSSM (Deep Structured Semantic Model). Taking the DSSM model as an example, it consists of two deep neural networks that operate independently and then fuse their respective results into one output vector. Applied to the embodiments of the present application, the first deep neural network can be regarded as the short-video query model, whose inputs are features of each dimension, such as the short video's title and tag content; the second deep neural network can be regarded as the long-video candidate model, whose inputs are the recognized dimension features of the long videos. Embedding computations are then performed on the short-video and long-video dimension features respectively to obtain the recommendation list of target long videos. After the target long videos are recommended to the user, the user's clicks on them can be collected and each target long video's click-through rate counted. The click-through rate is used in the loss function, which minimizes the non-click rate. Finally, the DSSM model is built over the dimension features and the loss function, the short-video query model and the long-video candidate model are combined, and the model is trained iteratively. When the trained DSSM model is applied, information such as the short video's title and tag content is input into the model, and the DSSM model outputs the corresponding long-video recommendation list.
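In the spirit of that description, a minimal two-tower sketch in PyTorch: one tower embeds the short-video features, the other the long-video features, and their dot product scores the pair. The dimensions, layer sizes, and the use of pre-extracted feature vectors are all assumptions, and the click-based loss is only indicated in a comment:

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, short_dim: int, long_dim: int, embed_dim: int = 64):
        super().__init__()
        self.short_tower = nn.Sequential(   # short-video query model
            nn.Linear(short_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.long_tower = nn.Sequential(    # long-video candidate model
            nn.Linear(long_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, short_feats: torch.Tensor, long_feats: torch.Tensor):
        s = self.short_tower(short_feats)   # short-video embedding
        l = self.long_tower(long_feats)     # long-video embedding
        return (s * l).sum(dim=-1)          # relevance score per pair

# Training would minimize a click-based loss, e.g. binary cross-entropy of the
# score against click/no-click labels, as the description suggests.
model = TwoTower(short_dim=300, long_dim=300)
scores = model(torch.randn(4, 300), torch.randn(4, 300))
```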
In the foregoing embodiments, when long videos are recommended based on short videos, not every short video should be used as a reference: short video content is updated quickly and is highly time-sensitive, and short videos from before a certain period usually no longer arouse the user's interest. Based on this, in some embodiments, the short videos recently viewed by the user may be collected from a user behavior log of the display device 200, where "recently" may be defined as the period closest to the current collection time, such as the last week, 6 days, or 5 days.
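A minimal sketch of this recency filter, assuming a hypothetical log record format with a video_id field and a viewed_at timestamp:

    from datetime import datetime, timedelta

    def recent_short_videos(log, window_days=7, now=None):
        # Keep only the short videos viewed within the recency window.
        now = now or datetime.now()
        cutoff = now - timedelta(days=window_days)
        return [e["video_id"] for e in log if e["viewed_at"] >= cutoff]

    log = [
        {"video_id": "sv_1", "viewed_at": datetime.now() - timedelta(days=2)},
        {"video_id": "sv_2", "viewed_at": datetime.now() - timedelta(days=30)},
    ]
    print(recent_short_videos(log))  # ['sv_1']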
In the foregoing embodiments, the corresponding long videos may be obtained and recommended according to the content of a single short video, or according to the content of multiple short videos; the specific acquisition manner can refer to the acquisition manners in the foregoing embodiments and is not repeated here.
As can be seen from the foregoing, the video recommendation method in the embodiments of the present application can be applied to the display device 200 and executed by the controller 250. The method matches long videos related to a short video by combining multi-dimensional information, such as the keywords in the short video and the content of its different label types, rather than matching long videos using only the keywords in the short video title. Compared with matching on a single kind of keyword information, matching on multi-dimensional information covers more content, and the matched long video content correlates more closely with the short video content. As a result, the long videos recommended to the user by the display device 200 better meet the user's viewing needs, and the recommendation results are more accurate.
In addition, since short videos and long videos belong to two different content domains, the method in the embodiments of the present application can also enhance the accuracy of cross-domain video recommendation.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, comprising:
a display;
a controller connected to the display, the controller configured to:
acquiring first information of a first video; the first information represents different kinds of information in the first video and at least comprises keywords and label contents; the video duration of the first video is less than or equal to a preset duration;
acquiring a target second video matched with the first video by using the first information; the video duration of the target second video is greater than the preset duration;
controlling the display to display presentation information of the target second video, so as to recommend the target second video to a user; the presentation information represents a title and/or a thumbnail of the target second video.
2. The display device of claim 1, wherein the controller is further configured to:
performing word segmentation processing on a video title of a first video to obtain a plurality of words to be selected;
calculating the term frequency-inverse document frequency (TF-IDF) of each word to be selected;
and filtering out, from all the words to be selected, the words whose TF-IDF is smaller than a preset frequency, thereby obtaining as keywords the words whose TF-IDF is greater than or equal to the preset frequency.
3. The display device of claim 2, wherein the controller is further configured to:
calculating the part-of-speech weight of each word to be selected;
and filtering out, from all the words to be selected, the words whose part-of-speech weight is smaller than a preset weight, thereby obtaining as keywords the words whose part-of-speech weight is greater than or equal to the preset weight.
4. The display device of claim 1, wherein the controller is further configured to:
identifying different types of content to be selected appearing in the first video; the types of the content to be selected at least comprise persons and articles;
determining the appearance duration of each content to be selected in the first video;
and determining the content to be selected whose appearance duration is greater than or equal to a preset time as the label content; the type of the content to be selected is the type of the label content.
5. The display device according to any one of claims 1-4, wherein the controller is further configured to:
calculating the similarity between each keyword and each type of label content to obtain a plurality of keyword-label content similarities;
and if a keyword-label content similarity is smaller than a preset similarity, filtering out the corresponding keyword or label content in that keyword-label content relationship.
6. The display device according to any one of claims 2-4, wherein the controller is further configured to:
performing equal-interval frame sampling on the first video to obtain a plurality of sampled images;
respectively identifying the standard text in each sampled image, wherein standard text refers to characters whose font and angle meet the recognition requirement;
and performing word segmentation processing on the standard text to obtain a plurality of words to be selected.
7. The display device of claim 1, wherein the controller is further configured to:
acquiring second information of a second video to be recommended; the second information represents different kinds of information in the second video and at least comprises keywords and label contents;
and if at least one kind of information in the second information is matched with the corresponding kind of information in the first information, determining that the second video corresponding to the second information is the target second video.
8. The display device of claim 1, wherein the controller is further configured to:
acquiring second information of a second video to be recommended;
generating a first vector and a second vector from the first information and the second information, respectively, by using an embedding calculation;
searching a target second vector matched with the first vector by utilizing a search engine;
and determining a second video corresponding to the target second vector as a target second video.
9. The display device of claim 7, wherein the controller is further configured to:
determining the matching number of the information in the second information and the corresponding information in the first information;
determining the relevancy of a target second video corresponding to the second information according to the matching number; wherein the degree of correlation is proportional to the number of matches;
sorting all the target second videos by relevancy to form a target second video list, in which the title of each target second video is displayed;
and controlling the display to display the target second video list, so as to recommend the target second videos to the user.
10. A method for video recommendation, the method comprising:
acquiring first information of a first video; the first information represents different kinds of information in the first video and at least comprises keywords and label contents; the video duration of the first video is less than or equal to a preset duration;
acquiring a target second video matched with the first video by using the first information; the video duration of the target second video is greater than the preset duration;
controlling a display to display presentation information of the target second video, so as to recommend the target second video to a user; the presentation information represents a title and/or a thumbnail of the target second video.
CN202111016128.3A 2021-08-31 2021-08-31 Video recommendation method and display device Pending CN113722542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111016128.3A CN113722542A (en) 2021-08-31 2021-08-31 Video recommendation method and display device

Publications (1)

Publication Number Publication Date
CN113722542A (en) 2021-11-30

Family

ID=78680087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111016128.3A Pending CN113722542A (en) 2021-08-31 2021-08-31 Video recommendation method and display device

Country Status (1)

Country Link
CN (1) CN113722542A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164471A (en) * 2011-12-15 2013-06-19 盛乐信息技术(上海)有限公司 Recommendation method and system of video text labels
US20190007711A1 (en) * 2017-07-02 2019-01-03 Comigo Ltd. Named Entity Disambiguation for providing TV content enrichment
CN109040775A (en) * 2018-08-24 2018-12-18 深圳创维-Rgb电子有限公司 Video correlating method, device and computer readable storage medium
US20210349940A1 (en) * 2019-06-17 2021-11-11 Tencent Technology (Shenzhen) Company Limited Video clip positioning method and apparatus, computer device, and storage medium
CN111314732A (en) * 2020-03-19 2020-06-19 青岛聚看云科技有限公司 Method for determining video label, server and storage medium
CN111831854A (en) * 2020-06-03 2020-10-27 北京百度网讯科技有限公司 Video tag generation method and device, electronic equipment and storage medium
CN111767814A (en) * 2020-06-19 2020-10-13 北京奇艺世纪科技有限公司 Video determination method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Liang; WANG Jingfu; WANG Na; LI Xia: "Mobile Video Recommendation Strategy Based on the DNN Algorithm", Chinese Journal of Computers, no. 08 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114861064A (en) * 2022-05-27 2022-08-05 深圳集智数字科技有限公司 Object recommendation method and device based on double-tower model
WO2024067276A1 (en) * 2022-09-30 2024-04-04 华为技术有限公司 Video tag determination method and apparatus, device and medium

Similar Documents

Publication Publication Date Title
CN110737840A (en) Voice control method and display device
CN113139856B (en) Movie and television member package recommendation method and device
CN112000820A (en) Media asset recommendation method and display device
CN111625716B (en) Media asset recommendation method, server and display device
CN112182196A (en) Service equipment applied to multi-turn conversation and multi-turn conversation method
CN112804567B (en) Display equipment, server and video recommendation method
CN113722542A (en) Video recommendation method and display device
CN114118064A (en) Display device, text error correction method and server
CN111914134A (en) Association recommendation method, intelligent device and service device
WO2022100283A1 (en) Display device, control triggering method and scrolling text detection method
WO2022012299A1 (en) Display device and person recognition and presentation method
CN114186137A (en) Server and media asset mixing recommendation method
CN112689177A (en) Method for realizing rapid interaction and display equipment
CN113490057B (en) Display device and media asset recommendation method
CN113468351A (en) Intelligent device and image processing method
CN113593559B (en) Content display method, display equipment and server
CN115695844A (en) Display device, server and media asset content recommendation method
CN113658598A (en) Voice interaction method of display equipment and display equipment
CN113038217A (en) Display device, server and response language generation method
CN115150673B (en) Display equipment and media asset display method
CN111950288A (en) Entity labeling method in named entity recognition and intelligent equipment
CN114296842B (en) Display device and scroll text detection method
CN113076427B (en) Media resource searching method, display equipment and server
CN114168765A (en) Server and media asset tag acquisition method
CN115866292B (en) Server, display device and screenshot identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination