WO2016192501A1

WO2016192501A1 - Video search method and apparatus

Info

Publication number: WO2016192501A1
Application number: PCT/CN2016/080770
Authority: WO
Inventors: 周茂林; 张衎; 付贤会
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-05-29
Filing date: 2016-04-29
Publication date: 2016-12-08
Also published as: CN106294454A

Abstract

A video search method and apparatus, which are applied to a server. The method comprises: acquiring a first correspondence between keyframes of a video and keywords; and receiving a keyword sent by a terminal, and searching for a keyframe of a corresponding video according to the keyword sent by the terminal and the first correspondence.

Description

Video search method and device

Technical field

The invention relates to, but is not limited to, the field of video technology.

Background art

In the related art, picture recognition technology on mobile phones already has many mature applications. For example, a mobile phone may hold a large number of photos, and organizing them one by one is troublesome. Some applications can automatically scan the user's photo albums and find the desired photos based on keywords, which makes life more convenient.

However, in the field of video, users have no convenient and quick way to search for a desired video, which causes inconvenience in everyday life. For example, a user is watching a favorite movie on a mobile phone or tablet but has to close it because something comes up; when the user returns, complicated operations on the phone or tablet are needed to search again for the movie that was being watched. As another example, when a user sees a poster showing a scene from a movie and wants to watch that movie, the user has to search for the movie on the poster through complicated operations on a mobile phone or computer, and finding the movie video is difficult.

In the above scenarios, searching for a video is troublesome and difficult for the user, and the user experience is poor. How to let users search for videos conveniently and quickly has therefore become an urgent problem in the video field.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiment of the invention provides a video search method and device, and a video playing method and device, which enable a user to search for a video conveniently and quickly.

An embodiment of the present invention provides a video search method, which is applied to a server, and includes the following steps:

Acquiring a first correspondence between key frames of a video and keywords;

Receiving a keyword sent by a terminal, and searching for the key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.

Optionally, after the key frame is found, the method further includes:

Sending video information corresponding to the found key frame to the terminal.

Optionally, before receiving the keyword sent by the terminal, the method further includes: acquiring a second correspondence between the video information and the key frames of the video;

After searching for the key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence, and before sending the video information corresponding to the found key frame to the terminal, the method further includes:

Searching for the video information corresponding to the key frame according to the found key frame and the second correspondence.

Optionally, the received keyword includes: a picture keyword, where the picture keyword is a keyword related to the video picture obtained by performing image recognition on a picture including a video picture.

Optionally, acquiring the first correspondence between the key frames of the video and the keywords includes:

Acquiring a key frame of the video;

Performing the image recognition on a key frame of the video to acquire a keyword of the key frame;

Establishing a first correspondence between a key frame of the video and a keyword.

Optionally, the keyword of the key frame further includes at least one of: text in the key frame, the main subject content of the key frame, and the proportion of the key frame occupied by that main content.

Optionally, before receiving the picture keyword sent by the terminal, the method further includes:

Obtaining time information of a key frame of the video in the video;

Establishing a third correspondence between key frames and time information of the video;

After the searching for the key frame of the corresponding video, the method further includes:

Finding corresponding time information according to the found key frame and the third correspondence relationship;

The found time information is sent to the terminal.

Optionally, the video information includes: video content information or video resource location information.

The embodiment of the invention further provides a video search method, which is applied to the terminal, and includes the following steps:

Acquiring a keyword;

The keyword is sent to a server for the server to find a key frame of the video according to the keyword.

Optionally, the step of acquiring a keyword includes:

Acquiring a picture containing a video picture;

Performing image recognition on the picture to obtain a keyword corresponding to the picture.

Optionally, after the keyword is sent to the server, the method further includes:

Receiving video information sent by the server;

The video is played according to the video information.

Optionally, after the keyword is sent to the server, the method further includes: receiving time information sent by the server;

The step of performing video playback according to the video information includes:

The video is played according to the video information and the time information.

Optionally, after the keyword is sent to the server, the method further includes:

Receiving video information sent by the server;

Transmitting the video information to a playback device, so that the playback device plays the video according to the video information.

Optionally, the method further includes:

Receiving time information sent by the server;

Transmitting the time information to the playback device, so that the playback device plays the video according to the time information and the video information.

The embodiment of the invention further provides a video search device, which is applied to a server, and includes:

a first acquisition module and a first search module;

The first acquiring module is configured to acquire a first correspondence between a key frame of the video and the keyword;

The first search module is configured to receive a keyword sent by the terminal, and search for a key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.

Optionally, the video search device further includes: a first sending module;

The first sending module is configured to send video information corresponding to the found key frame to the terminal.

The embodiment of the invention further provides a video search device, which is applied to the terminal, and includes:

a second acquisition module and a second transmission module;

The second obtaining module is configured to acquire a keyword;

The second sending module is configured to send the keyword to the server, so that the server searches for a key frame of the video according to the keyword.

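Optionally, the second acquiring module is configured to acquire the keyword by acquiring a picture that includes a video picture and performing image recognition on the picture to obtain the keyword corresponding to the picture.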

The embodiments of the present invention provide a video search method and device. The video search method includes: acquiring a first correspondence between key frames of a video and keywords; and receiving a keyword sent by a terminal, and searching for the key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence. With this method, the user terminal only needs to send the keyword to the server, and the server automatically finds the key frame of the corresponding video according to that keyword and the first correspondence. Since a key frame of a video can represent the video, the corresponding video can thereby be found. The user only needs to acquire a keyword and send it to the server, so the operation is simple and fast, the difficulty of video search is reduced, and the user experience is improved.

An optional implementation of the video search solution of the embodiments of the present invention is based on image recognition technology: the user only needs to acquire a picture containing a video picture to quickly obtain the corresponding video information, so the operation is simple and fast. With this optional implementation, the user does not need to memorize keyword information for searching the video, which reduces the difficulty of video search and improves the user experience.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a first video search method according to Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a second video search method according to Embodiment 1 of the present invention;

FIG. 3 is a schematic flowchart of a third video search method according to Embodiment 1 of the present invention;

FIG. 4 is a schematic flowchart of a fourth video search method according to Embodiment 1 of the present invention;

FIG. 5 is a schematic flowchart of a video search method according to Embodiment 2 of the present invention;

FIG. 6 is a schematic flowchart of a video playing method according to Embodiment 2 of the present invention;

FIG. 7 is a schematic flowchart of another video playing method according to Embodiment 2 of the present invention;

FIG. 8 is a schematic flowchart of video search and playback according to Embodiment 3 of the present invention;

FIG. 9 is a schematic flowchart of another video search and playback according to Embodiment 3 of the present invention;

FIG. 10 is a schematic structural diagram of a first video search apparatus according to Embodiment 4 of the present invention;

FIG. 11 is a schematic structural diagram of a second video search apparatus according to Embodiment 4 of the present invention;

FIG. 12 is a schematic structural diagram of a third video search apparatus according to Embodiment 4 of the present invention.

Preferred embodiment of the invention

The embodiments of the present invention are described below in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the features in the embodiments and the embodiments in the present application may be arbitrarily combined with each other.

The embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1:

In view of the problem in the related art, this embodiment provides a video search method applied to the server side. As shown in FIG. 1, the method includes the following steps:

Step 101: Acquire a first correspondence between a key frame of the video and a keyword.

In this embodiment, the first correspondence may be obtained in several ways. For example, the correspondence between the key frames of the video and the keywords may be established by another device and then obtained by the server from that device, or the server may itself establish the correspondence between the key frames of the video and the keywords.

In this embodiment, a key frame of a video is a frame picture; for example, it may be an independent and complete frame picture. Within a GOP (Group of Pictures), the subsequent video frames depend on the key frame. Therefore, in this embodiment a key frame of a video can represent the video, and finding a key frame of a video identifies the corresponding video.
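
To make the role of the key frame concrete, the following minimal sketch picks the key frames out of a frame sequence. It assumes that every frame has already been labelled with a frame type ("I" for an independently decodable frame, "P" or "B" for dependent frames) and a timestamp, and that each GOP starts with a single I frame. The `Frame` type and the `extract_key_frames` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """One video frame (hypothetical structure for illustration)."""
    frame_type: str      # "I" = independent key frame, "P"/"B" = dependent frames
    timestamp_s: float   # position of the frame within the video, in seconds


def extract_key_frames(frames: List[Frame]) -> List[Frame]:
    """Return only the independently decodable frames (assumed one per GOP)."""
    return [f for f in frames if f.frame_type == "I"]


# Two GOPs: each starts with an I frame followed by dependent frames.
frames = [
    Frame("I", 0.0), Frame("P", 0.04), Frame("B", 0.08),
    Frame("I", 2.0), Frame("P", 2.04),
]
key_frames = extract_key_frames(frames)
```

Keeping the timestamp next to each key frame also makes it available later, when time information for a key frame is needed.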

The process of establishing a first correspondence between the keywords of the video and the video key frame by the server in this embodiment may include:

Acquiring the key frames of the video;

For each video, the method in this embodiment may acquire the key frames of that video, perform image recognition on all of the key frames to obtain and save the keywords of each key frame, and finally establish the correspondence between the key frames and the keywords.

In this embodiment, image recognition of a key frame is based on the knowledge base content and the image recognition method; different knowledge base contents and image recognition methods may yield different keywords. In knowledge engineering, a knowledge base is a structured, easy-to-operate, easy-to-use, comprehensive, and organized cluster of knowledge: a collection of interrelated pieces of knowledge that, to meet the needs of solving problems in one or more fields, is stored, organized, managed, and used in computer memory using one or more knowledge representation methods.

Optionally, the keywords of a key frame in this embodiment may include at least one of: text in the key frame, the main subject content of the key frame, and the proportion of the key frame occupied by that main content. For example, the keywords of a key frame may be the text information in the key frame, the main content, and the proportion of the image occupied by the main content; this set of key information can be used to identify a picture.
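
As an illustration of how these three kinds of key information might be combined, the sketch below computes the proportion of the frame covered by the main content and builds a keyword set from it. It assumes that the recognized text, the subject label, and a bounding box for the subject have already been produced by some image recognition step, which is not shown. The function names, the `kind:value` string format, and the rounding of the proportion to one decimal place are all assumptions made for this example.

```python
from typing import Set, Tuple


def subject_ratio(subject_box: Tuple[int, int, int, int],
                  frame_size: Tuple[int, int]) -> float:
    """Fraction of the frame area covered by the subject's bounding box.

    subject_box is (left, top, right, bottom) in pixels; frame_size is (width, height).
    """
    left, top, right, bottom = subject_box
    width, height = frame_size
    return ((right - left) * (bottom - top)) / (width * height)


def frame_keywords(text: str, subject: str, ratio: float) -> Set[str]:
    """Combine text, main content, and proportion into one keyword set.

    The ratio is rounded to one decimal place (an assumption) so that small
    differences between a screenshot and the original key frame still match.
    """
    return {f"text:{text}", f"subject:{subject}", f"ratio:{round(ratio, 1)}"}


# Hypothetical recognition results for a 1920x1080 key frame.
ratio = subject_ratio((480, 216, 1440, 864), (1920, 1080))
keywords = frame_keywords("THE END", "lighthouse", ratio)
```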

In this embodiment, the first correspondence between key frames of the video and keywords may take the form of a keyword index of the key frames, in which the index value is a keyword and the index object is a key frame.
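
A keyword index of this kind can be sketched as an inverted index that maps each keyword to the set of key frames carrying it. The frame identifiers and keyword strings below are hypothetical sample data, and `build_keyword_index` is an illustrative name rather than part of the described method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_keyword_index(frame_keywords: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Build the first correspondence as an inverted index: keyword -> key frame IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for frame_id, keywords in frame_keywords.items():
        for keyword in keywords:
            index[keyword].add(frame_id)
    return dict(index)


# Keywords assumed to have been obtained by image recognition of each key frame.
recognized = {
    "movieA/kf-0001": ["subject:lighthouse", "text:THE END"],
    "movieB/kf-0042": ["subject:lighthouse", "text:CHAPTER 2"],
}
keyword_index = build_keyword_index(recognized)
```

Because one keyword can map to several key frames, a single query keyword may return key frames from more than one video.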

In this embodiment, the keyword sent by the terminal may be a picture keyword, where a picture keyword is a keyword related to a video picture that is obtained by performing image recognition on a picture containing the video picture. For example, the user terminal may itself perform image recognition on a picture containing a video picture (such as a screenshot of the video or a photograph of the video picture) to obtain the keywords. Alternatively, another device may perform the image recognition on such a picture to obtain the keywords, and the terminal then obtains the keywords from that device and forwards them to the server.

Optionally, in this embodiment, the image recognition method used on the terminal side must be consistent with the image recognition method that the server side applies to key frames. Otherwise, the recognized keyword content will differ, and the video cannot be matched accurately.

Step 102: Receive a keyword sent by the terminal, and search for a key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.

Optionally, the picture keyword sent by the terminal is received, and the key frame of the video corresponding to the picture keyword is searched for according to the picture keyword and the first correspondence. The picture keyword is a keyword related to the video picture that the terminal obtains by performing image recognition on a picture containing the video picture.

Since the key frame of the video can characterize the video, the key frame of the video can be found to know the corresponding video.

In this embodiment, one or more key frames corresponding to the keyword may be found. The found key frames may be key frames of a single video or key frames of a group of videos, and the group of videos may be the set of videos most strongly associated with the keyword.
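
The lookup in step 102 can be sketched as follows. The text does not specify how the strength of association is measured, so this example simply ranks key frames by how many of the query keywords they match; that ranking rule, the function name `find_key_frames`, and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def find_key_frames(keyword_index: Dict[str, Set[str]],
                    keywords: Iterable[str]) -> List[str]:
    """Return key frame IDs matching any keyword, best match first.

    Ranking by the number of matched keywords is an assumption; ties are
    broken by frame ID so the result order is deterministic.
    """
    hits: Counter = Counter()
    for keyword in set(keywords):
        for frame_id in keyword_index.get(keyword, ()):
            hits[frame_id] += 1
    return sorted(hits, key=lambda frame_id: (-hits[frame_id], frame_id))


keyword_index = {
    "subject:lighthouse": {"movieA/kf-0001", "movieB/kf-0042"},
    "text:THE END": {"movieA/kf-0001"},
}
matches = find_key_frames(keyword_index, ["subject:lighthouse", "text:THE END"])
no_matches = find_key_frames(keyword_index, ["text:unknown"])
```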

With the video search method of this embodiment, the user terminal only needs to obtain a picture containing a video picture (for example, by taking a screenshot or a photograph of the video picture), perform image recognition on the picture to obtain keywords related to the video picture, and send the keywords to the server. The server automatically finds the key frame of the corresponding video according to the keywords sent by the terminal and the stored correspondence. Since a key frame can represent the video, finding the key frame of the video amounts to finding the video. The user only needs to obtain a picture containing the video picture to quickly obtain the corresponding video information, so the operation is simple and fast. In addition, with the method of this embodiment the user does not need to memorize keyword information for searching the video, which reduces the difficulty of video search and improves the user experience.

In order to enable the terminal to play the found video, as shown in FIG. 2, the embodiment further provides a video search method, which is applied to the server side, and includes:

Step 201: Establish a first correspondence between a key frame of the video and the keyword.

For the process of establishing the first correspondence in this step, reference may be made to the related description above. For example, a keyword index of a key frame is established, the index value is a keyword, and the index object is a key frame of the video.

Step 202: Receive a keyword sent by the terminal, and search for a key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.

Optionally, the keyword sent by the terminal includes a picture keyword, where the picture keyword is a keyword related to the video picture obtained by performing image recognition on a picture containing the video picture. For example, the terminal sends keywords related to the video picture that were obtained by image recognition of a picture containing the video picture.

After receiving the picture keyword, the key frame of the corresponding video is retrieved in the keyword index of the key frame according to the picture keyword.

Step 203: Send video information corresponding to the found key frame to the terminal.

The video information of this step may include: video content information, or video resource location information.

After the corresponding key frame is found, the video content information corresponding to the key frame may be sent to the terminal for the terminal to directly play, or sent to other playback devices for playing.

Alternatively, the video resource location information corresponding to the key frame, for example a URI (Uniform Resource Identifier), is sent to the terminal, so that the terminal acquires the corresponding video content according to the location information, or forwards the location information to another playback device, which acquires the corresponding video content according to the location information and plays it.

In this embodiment, since one or more key frames may be found, the video information corresponding to the key frames may likewise be one or more items. For example, the found video may be a single video or a group of videos; the method of this embodiment then sends the information of the single video to the terminal, or sends the information of each video in the group to the terminal.

Optionally, the method in this embodiment may obtain a second correspondence between the video information and the key frames of the video before step 202. In this case, step 203 may include: finding the corresponding video information according to the found key frame and the second correspondence; and sending the found video information to the terminal.

In this embodiment, the second correspondence between the video information and the key frames of the video may be established by the user terminal itself, or it may be established by another device and then obtained by the user terminal from that device. There is a one-to-one correspondence between video information and videos.

The process of establishing a second correspondence between the video information and the key frame of the video in this embodiment may include:

Obtain key frames of video and video information (such as video content or video resource location information) for the video;

Then, a second correspondence between the key frame of the video and the video information is established.

In this embodiment, the second correspondence may be a key frame index of the video information, in which the index value is a key frame and the index object is video information. After the key frame of the video is found, the key frame index of the video information is searched to match the corresponding video information.
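
The second correspondence can be sketched as a plain mapping from key frame to video information. In the example below, `VideoInfo` is a hypothetical record that carries either a resource location (URI) or the content itself, the `title` field and the `video.example` address are placeholders, and de-duplicating results when several key frames belong to the same video is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VideoInfo:
    """Video information: either the content itself or a resource location (URI)."""
    title: str                       # hypothetical field, for readability only
    uri: Optional[str] = None
    content: Optional[bytes] = None


def find_video_info(key_frame_index: Dict[str, VideoInfo],
                    frame_ids: List[str]) -> List[VideoInfo]:
    """Map found key frames to video information, dropping duplicates
    when several key frames belong to the same video."""
    found: List[VideoInfo] = []
    for frame_id in frame_ids:
        info = key_frame_index.get(frame_id)
        if info is not None and info not in found:
            found.append(info)
    return found


movie_a = VideoInfo("Movie A", uri="https://video.example/movie-a")
key_frame_index = {"movieA/kf-0001": movie_a, "movieA/kf-0002": movie_a}
videos = find_video_info(key_frame_index, ["movieA/kf-0001", "movieA/kf-0002"])
```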

With the video search method of this embodiment, the user terminal only needs to obtain a picture containing a video picture (for example, by taking a screenshot or a photograph of the video picture), perform image recognition on the picture to obtain keywords related to the video picture, and send the keywords to the server. The server automatically finds the corresponding video according to the keywords sent by the terminal and the stored correspondences, and feeds the search result back to the terminal. The video search method of this embodiment is thus based on image recognition technology: the user only needs to obtain a picture containing the video picture to quickly obtain the corresponding video information, so the operation is simple and fast. In addition, with the method of this embodiment the user does not need to memorize keyword information for searching the video, which reduces the difficulty of video search and improves the user experience.

According to the above description, as shown in FIG. 3, this embodiment further provides another video search method, which is applied to the server side, and includes the following steps:

Step 300: Acquire a key frame of the video; perform the image recognition on the key frame to obtain a keyword of the key frame.

Step 301: Establish a first correspondence between the video key frame and the keyword and a second correspondence between the video information and the video key frame.

In this embodiment, the correspondences may be established as indexes. For example, a key frame index of the video (i.e., the second correspondence) is established first, and then a keyword index of the key frames (i.e., the first correspondence) is established. In the key frame index of the video, the index value is a key frame and the index object is video information (video content information or resource location information); in the keyword index of the key frames, the index value is a keyword and the index object is a key frame. After a keyword sent by the terminal is received, the keyword index of the key frames is searched first to match the corresponding key frame, and then the key frame index of the video is searched to match the corresponding video information.

Step 302: Receive a picture keyword sent by the terminal, and search for the video key frame corresponding to the picture keyword according to the picture keyword and the first correspondence.

For example, after the picture keyword is received, it is used to match the corresponding key frame in the keyword index of the key frames.

Step 303: Search for video information corresponding to the video key frame according to the found video key frame and the second correspondence.

For example, after matching the corresponding key frame, the key frame is used to match the video corresponding to the key frame in the index of the key frame of the video.

Step 304: Send the found video information to the terminal.

For example, the media content of the video may be sent to the terminal for playing, or the URI may be sent to the terminal for the terminal to acquire the corresponding video content for playing.

After obtaining the video information, the user may only be able to play the previously watched video from the beginning, and then has to re-watch content already viewed or fast-forward, which degrades the user experience. For this case, this embodiment provides a solution in which the server also sends the related time information to the terminal, so that when the video is played the user can continue watching from the time point at which viewing was previously interrupted, which improves the user experience.

Optionally, before step 302, the method of this embodiment further includes: acquiring time information of the key frames of the video within the video; and establishing a third correspondence between the video key frames and the time information.

After the step 302, the method of the embodiment further includes: searching for the corresponding time information according to the found key frame and the third correspondence; and sending the found time information to the terminal.

In this embodiment, the time information corresponding to the key frame can be sent to the terminal, so that the user can continue watching the video from the time point of the previous viewing when the video is played, thereby improving the user experience.

As shown in FIG. 4, this embodiment further provides another video search method, which is applied to the server side, and includes the following steps:

Step 400: Acquire a key frame of the video, perform the image recognition on the key frame to obtain a keyword of the key frame, and obtain time information of the key frame in the video.

Step 401: Establish a first correspondence between the video key frames and the keywords, a second correspondence between the video information and the video key frames, and a third correspondence between the video key frames and the time information, and store the first correspondence, the second correspondence, and the third correspondence.

In this step, the correspondence between the keywords of the video and the video key frame is composed of the first correspondence and the second correspondence.

In this embodiment, the correspondences may be established as indexes, for example by establishing a key frame index of the video (i.e., the second correspondence) and a keyword index of the video key frames (i.e., the first correspondence).

In this embodiment, the third correspondence may also be established as an index, for example a key frame index of the time information, in which the index value is a key frame and the index object is time information. After the key frame is found, the corresponding time information can be matched in the key frame index of the time information according to that key frame. The time information in this embodiment may be time point information.
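
A key frame index of the time information can be sketched as a mapping from key frame to a time point. Representing the time point as seconds from the start of the video, and the helper names below, are assumptions made for illustration.

```python
from typing import Dict, Optional


def find_time_point(time_index: Dict[str, float], frame_id: str) -> Optional[float]:
    """Look up the time point (seconds from the start of the video) of a key frame."""
    return time_index.get(frame_id)


def format_time_point(seconds: float) -> str:
    """Render a time point as HH:MM:SS, dropping any fractional second."""
    whole = int(seconds)
    return f"{whole // 3600:02d}:{(whole % 3600) // 60:02d}:{whole % 60:02d}"


# Third correspondence: key frame -> time point within its video (sample data).
time_index = {"movieA/kf-0001": 0.0, "movieA/kf-0037": 1325.5}
resume_at = find_time_point(time_index, "movieA/kf-0037")
label = format_time_point(resume_at)
```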

Step 402: Receive a picture keyword sent by the terminal, and search for a video key frame corresponding to the picture keyword according to the picture keyword and the first correspondence.

Step 403: Search for the video information corresponding to the video key frame according to the found video key frame and the second correspondence, and search for the time information corresponding to the video key frame according to the found video key frame and the third correspondence.

Step 404: Send the found video information (such as content information or resource location information) and time information to the terminal.

With the method of this embodiment, the user can simply take a screenshot or photograph of a video, and the corresponding video and the time point within the video are matched, which makes watching videos more convenient for the user.
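
Steps 400 to 404 can be put together in a short server-side sketch. It assumes that keywords, video resource locations, and time points are already available for each key frame, returns only the best-matching key frame, and scores matches by the number of shared keywords; the class name, method names, scoring rule, and sample data are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class VideoSearchServer:
    """Holds the three correspondences and answers keyword queries."""

    def __init__(self) -> None:
        self.keyword_index: Dict[str, Set[str]] = {}   # first: keyword -> key frames
        self.video_index: Dict[str, str] = {}          # second: key frame -> video URI
        self.time_index: Dict[str, float] = {}         # third: key frame -> seconds

    def add_key_frame(self, frame_id: str, keywords: Iterable[str],
                      video_uri: str, time_s: float) -> None:
        """Steps 400-401: store a recognized key frame in all three indexes."""
        for keyword in keywords:
            self.keyword_index.setdefault(keyword, set()).add(frame_id)
        self.video_index[frame_id] = video_uri
        self.time_index[frame_id] = time_s

    def search(self, keywords: Iterable[str]) -> Optional[Tuple[str, float]]:
        """Steps 402-404: return (video URI, time point) for the best-matching key frame."""
        scores: Dict[str, int] = {}
        for keyword in set(keywords):
            for frame_id in self.keyword_index.get(keyword, ()):
                scores[frame_id] = scores.get(frame_id, 0) + 1
        if not scores:
            return None
        # Most matched keywords first; frame ID breaks ties deterministically.
        best = min(scores, key=lambda f: (-scores[f], f))
        return self.video_index[best], self.time_index[best]


server = VideoSearchServer()
server.add_key_frame("movieA/kf-0001", ["subject:lighthouse"],
                     "https://video.example/movie-a", 0.0)
server.add_key_frame("movieA/kf-0037", ["subject:lighthouse", "text:CHAPTER 2"],
                     "https://video.example/movie-a", 1325.5)
result = server.search(["subject:lighthouse", "text:CHAPTER 2"])
```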

Embodiment 2:

This embodiment provides a video search method, which is applied to the terminal side, as shown in FIG. 5, and includes the following steps:

Step 501: Acquire keywords.

In this embodiment, the keyword may be obtained in several ways. For example, the keyword may be generated by the terminal itself, or it may be generated by another device and then acquired by the terminal from that device.

Optionally, in this embodiment, the keyword may be a picture keyword, that is, a keyword related to a video picture that is obtained by performing image recognition on a picture containing the video picture.

The process of the terminal acquiring the picture keyword may include:

First, obtain a picture containing a video picture;

In this embodiment, there are various ways to obtain a picture including a video picture, for example, taking a screen shot to obtain a screen shot photo, or taking a picture of the video picture (such as taking a picture of a display that is playing a video).

Next, performing image recognition on the picture to obtain a keyword corresponding to the picture.

When the keyword is a picture keyword, performing image recognition on the picture to obtain keywords corresponding to the picture includes:

Performing image recognition on the picture to obtain a picture keyword related to the video picture.

Optionally, the terminal may use a dedicated image recognition application to perform image recognition on the picture and obtain the picture keywords related to the video picture; the application scans the photo containing the video picture to obtain the keywords.

In this embodiment, a picture containing a video picture takes one of two forms. In the first, the video picture fills the entire picture, that is, the picture is the video picture, for example a screenshot of the video; in this case image recognition is simply performed on the entire picture. In the second, the video picture fills only part of the picture; for example, when the photographed area is larger than the video display area, the captured picture also contains other content. In this case image recognition is performed only on the video picture, and the non-video content is discarded.
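
For the second form, discarding the non-video content amounts to cropping the picture to the video region before recognition. The sketch below represents a picture as rows of pixel values and assumes the bounding box of the video picture is already known; how that box is located is not described in the text and is not shown here. The `Image` alias and `crop_video_region` name are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def crop_video_region(picture: Image, box: Tuple[int, int, int, int]) -> Image:
    """Keep only the pixels inside box = (left, top, right, bottom); right/bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in picture[top:bottom]]


# A 4x4 photo in which the video picture (value 1) occupies the central 2x2 area.
photo = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
video_picture = crop_video_region(photo, (1, 1, 3, 3))
```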

The keywords related to the video picture that are identified in this embodiment may include at least one of: text in the video picture, the main content of the video picture, and the proportion of the video picture occupied by that main content. In this embodiment, the image recognition process on the terminal side is consistent with the image recognition process on the server side.

Step 502: Send the keyword to the server, so that the server searches for a key frame of the video according to the keyword.
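
The text does not define how the keyword is carried from the terminal to the server. Purely as an illustration, the sketch below serializes the keywords as a JSON request body and parses it again on the server side; the JSON layout, the `keywords` field name, and both function names are assumptions.

```python
import json
from typing import Iterable, List


def build_search_request(keywords: Iterable[str]) -> str:
    """Serialize picture keywords into a request body (hypothetical JSON format)."""
    unique_sorted: List[str] = sorted(set(keywords))
    return json.dumps({"keywords": unique_sorted}, ensure_ascii=False)


def parse_search_request(body: str) -> List[str]:
    """Server-side counterpart: recover the keyword list from the request body."""
    return list(json.loads(body)["keywords"])


body = build_search_request(["text:THE END", "subject:lighthouse", "text:THE END"])
received = parse_search_request(body)
```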

The video search method of the embodiment can send the keyword to the server, and the server automatically finds the key frame of the corresponding video, thereby finding the video, which is convenient and simple, and improves the user experience.

After finding the video key frame, the server side also sends the video information corresponding to that key frame to the terminal for video playback. Therefore, the method in this embodiment may further include, after step 502: receiving the video information sent by the server; and playing the video according to the video information.

As shown in FIG. 6, this embodiment provides a video playing method, including the following steps:

Step 601: Acquire a picture containing a video picture.

Step 602: Perform image recognition on the picture to obtain a picture keyword related to the video picture, and send the picture keyword to the server.

Step 603: Receive video information sent by the server.

After obtaining the picture keyword, the terminal sends it to the server. The server searches for the corresponding video according to the picture keyword and the stored correspondence between videos and the keywords of their key frames, and then sends the information of the found video to the terminal.

The video found in this embodiment may be a single video or a group of videos, for example, the group of videos most strongly associated with the picture keyword. Therefore, the video information received by the terminal in this embodiment may be the information of one video or of multiple videos (for example, a group of video information).

The information of the video in this embodiment may include content information of the video or identification information (for example, a URI) of the video.

Step 604: Perform video playback according to the video information.

When the information of the received video is the content information of the video, the terminal directly plays the content information of the video;

When the information of the received video is the location information (for example, a URI) of the video resource, the terminal acquires the corresponding video content according to the location information, and then plays the obtained video content.

When the terminal receives a set of video information, the user also needs to select the desired video information for playback.

The video playing method of this embodiment can enable the user to search for the desired video conveniently and quickly and play it.

In the case that the server also sends time information, the playing method of this embodiment may further include, after step 602: receiving the time information sent by the server. In this case, step 604 includes: performing video playback according to the time information and the video information.

In the method of this embodiment, the terminal can also receive the time information. From it the terminal knows the point in the video at which the user acquired the picture (that is, the point at which the user stopped watching), so playback can start from that point rather than from the beginning, which improves the user experience.
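The playback decision in step 604 can be sketched as below, assuming the video information carries either content or a resource location, that a group of candidates is resolved by a user selection, and that time information is expressed in seconds. The names `VideoInfo`, `fetch`, `choose` and `prepare_playback` are hypothetical, and `fetch` is a placeholder that performs no real network access.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VideoInfo:
    content: Optional[bytes] = None  # video content information, if sent directly
    location: Optional[str] = None   # video resource location (for example, a URI)


def fetch(location: str) -> bytes:
    """Placeholder for retrieving content from a location; no real network call."""
    return ("content-of:" + location).encode()


def choose(candidates: List[VideoInfo], selected: int = 0) -> VideoInfo:
    """When a group of video information is received, the user's selection picks one."""
    return candidates[selected]


def prepare_playback(
    candidates: List[VideoInfo],
    selected: int = 0,
    time_info: Optional[float] = None,
) -> Tuple[bytes, float]:
    """Return (content, start position in seconds) to hand to the player."""
    info = choose(candidates, selected)
    content = info.content if info.content is not None else fetch(info.location)
    start = time_info if time_info is not None else 0.0
    return content, start


group = [VideoInfo(content=b"movie-a"), VideoInfo(location="video://example/b")]
resumed = prepare_playback(group, selected=1, time_info=754.0)
```

When no time information is received, the sketch falls back to starting from the beginning, which matches the behaviour described for the case without time information.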

The above describes the case where the terminal plays the video directly. The following describes the case where the video is played by another playback device. As shown in FIG. 7, this embodiment further provides another video playing method, including the following steps:

Step 701: Acquire a picture containing a video picture.

For example, taking a picture of a television screen that is playing a video to obtain a picture containing a video picture.

Step 702: Perform image recognition on the picture to obtain a picture keyword related to the video picture, and send the picture keyword to the server.

In this embodiment, a picture containing a video picture takes one of two forms. In the first, the video picture fills the entire picture, that is, the picture is itself a video picture (for example, a screenshot of the video); in this case image recognition can simply be performed on the entire picture. In the second, the video picture fills only part of the picture; for example, when the photographed area is larger than the area of the video, the captured picture also contains other content. In this case, image recognition is performed only on the video picture, and the non-video content is discarded. For example, when a picture taken of a television screen is recognized, only the content of the television screen is recognized, and the portion that does not belong to the content of the television screen is discarded.

Step 703: Receive video information sent by the server.

For a description, reference may be made to the description of step 603 above.

Step 704: Send the video information to a playback device, where the playback device performs video playback according to the video information.

In this embodiment, the terminal does not play the video directly, but forwards the video information sent by the server to a playback device (such as a television or a set top box) for playing.

Optionally, when the information of the received video is the content information of the video, the terminal sends the content information of the video to the playing device, and the playing device directly plays the video after receiving the content information of the video;

When the received video information is video resource location information, the terminal sends the video resource location information to the playback device, and the playback device acquires the corresponding video content according to the received location information and plays it.

In the case that the server further sends time information to the terminal, the method shown in FIG. 7 further includes, after step 702: receiving the time information sent by the server, and sending the time information to the playback device, so that the playback device performs video playback according to the time information and the video information.
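The forwarding step can be illustrated with a small sketch in which the terminal packages the video information, plus the time information when the server supplied it, and hands it to the playback device. The message layout and the names `build_push_message` and `PlaybackDevice` are assumptions made for illustration; the text does not define a message format.

```python
import json
from typing import Optional


def build_push_message(video_info: dict, time_info: Optional[float] = None) -> str:
    """Package what the terminal forwards to the playback device (step 704)."""
    message = {"video_info": video_info}
    if time_info is not None:
        message["time_info"] = time_info
    return json.dumps(message, sort_keys=True)


class PlaybackDevice:
    """Stand-in for a television or set top box; it only records what to play."""

    def __init__(self) -> None:
        self.now_playing = None

    def receive(self, raw: str) -> None:
        message = json.loads(raw)
        # Without time information, playback starts from the beginning.
        start = message.get("time_info", 0.0)
        self.now_playing = (message["video_info"], start)


tv = PlaybackDevice()
tv.receive(build_push_message({"location": "video://example/b"}, 754.0))
```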

Embodiment 3:

Based on the descriptions of the first embodiment and the second embodiment, this embodiment introduces an application of the methods described in those embodiments:

First, the server establishes the key frame index of the video, the keyword index of the key frame, and the key frame index of the time point. The flow is as follows:

1. Process all the videos to obtain key frames of each video, and establish a key frame index of the video.

A key frame is an independent and complete frame. Within a group of pictures (GOP), the subsequent video frames depend on the key frame.

2. Obtain the time point information of the key frame in the video, perform image recognition for all key frames, and obtain keyword information of each key frame and save it.

In this embodiment, the image recognition algorithm and the content of the knowledge base determine the content of the keywords, and therefore also the accuracy of the video search and of the located time point.

At present, many applications can accurately identify the text in a picture, its main content, and the proportion of the picture occupied by the main content, so a set of keyword information can be used to identify a picture. This set of keywords is what this document refers to as the keyword.

3. Establish the keyword index of the key frames and the key frame index of the time points.
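The three indexes built in steps 1 to 3 can be sketched as plain dictionaries. This is only an illustrative data layout under stated assumptions: key frame extraction and image recognition are taken as already done, each key frame is represented by a record of (video identifier, time point in seconds, keyword set), and the function name `build_indexes` and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One record per key frame: (video_id, time_point_seconds, keywords).
KeyFrameRecord = Tuple[str, float, Set[str]]


def build_indexes(records: List[KeyFrameRecord]):
    video_of_frame: Dict[int, str] = {}   # key frame index of the video
    frames_of_keyword = defaultdict(set)  # keyword index of the key frames
    time_of_frame: Dict[int, float] = {}  # key frame index of the time points
    for frame_id, (video_id, time_point, keywords) in enumerate(records):
        video_of_frame[frame_id] = video_id
        time_of_frame[frame_id] = time_point
        for keyword in keywords:
            frames_of_keyword[keyword].add(frame_id)
    return video_of_frame, dict(frames_of_keyword), time_of_frame


records = [
    ("movie-a", 0.0, {"castle", "night"}),
    ("movie-a", 12.5, {"castle", "dragon"}),
    ("movie-b", 3.0, {"beach"}),
]
video_of, frames_of, time_of = build_indexes(records)
```

With this layout, a keyword leads to candidate key frames, and each key frame leads to its video and its time point, which is the lookup order used in the flows that follow.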

The following describes the video search and playback process, taking direct playback of the video on the terminal as an example:

After the terminal obtains a screenshot picture of the video, by taking a picture with the camera or in another way, the process includes the following steps, as shown in FIG. 8:

Step 801: The terminal performs image recognition on the screenshot picture to obtain the keyword information of the screenshot picture.

Step 802: The terminal sends the keyword information to the server.

Step 803: The server receives the keyword information sent by the terminal, and searches the keyword index of the key frames according to the keyword information to match the corresponding key frame.

Since the screenshot is not necessarily taken at the position of a key frame, it may not exactly match any existing key frame; in that case, the one or more closest frames need to be matched.

Step 804: According to the matched key frame, the server searches the key frame index of the time points and the key frame index of the videos to find the matching video and time point.

The matching result in this step may be a single video, in which case the server sends that video or its identification information to the terminal.

The matching result may also be a group of videos, in which case the server sends the group of videos or their identification information to the terminal.

Step 805: The server sends the video information corresponding to the matched video and the time point to the terminal.

The video information may include: identification information corresponding to the matched video or video content of the matched video.

Step 806: The terminal receives the time point and the video sent by the server, or the time point and the identification information; and then plays the corresponding video according to the received information.
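Steps 803 and 804 can be illustrated with a sketch of closest-match search over key frame keywords. The text does not say how closeness is measured; Jaccard similarity between keyword sets is used here purely as an assumed stand-in, and the names `KEY_FRAMES`, `jaccard` and `search` as well as the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# frame_id -> (video_id, time_point_seconds, keywords); made-up sample data.
KEY_FRAMES: Dict[int, Tuple[str, float, Set[str]]] = {
    0: ("movie-a", 0.0, {"castle", "night"}),
    1: ("movie-a", 12.5, {"castle", "dragon", "fire"}),
    2: ("movie-b", 3.0, {"beach", "sunset"}),
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two keyword sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: Set[str], key_frames=KEY_FRAMES, top_k: int = 1) -> List[Tuple[str, float]]:
    """Return (video_id, time_point) for the top_k closest key frames."""
    scored = [
        (jaccard(query, keywords), frame_id)
        for frame_id, (_, _, keywords) in key_frames.items()
    ]
    # Drop frames with no overlap; best score first, ties broken by lower frame id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [(key_frames[f][0], key_frames[f][1]) for _, f in ranked[:top_k]]


best = search({"dragon", "castle"})
group = search({"dragon", "castle"}, top_k=2)
```

Setting `top_k` above 1 corresponds to the case where a group of candidate videos and time points is returned rather than a single result.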

The following describes the video search and playback process, taking playback on another playback device (a television) as an example:

Under the premise that the mobile phone has taken a photo for the video played by the television to obtain the photo containing the video content, as shown in FIG. 9, the process of video search and playback includes the following steps:

Step 901: The mobile phone starts a specific identification application to scan a photo, and obtains keyword information related to the video content.

Since the area of the photograph may be larger than the area of the video, only the content of the television screen needs to be recognized, and the portion that does not belong to the content of the television screen is discarded.

The keyword information identified in this embodiment may be subject information of a photo, and a percentage of various colors.

Step 902: The specific identification application sends the keyword information to the server where the video is located through the network.

Step 903: After receiving the keyword information, the server searches the keyword index of the key frames according to the keyword information to match the corresponding key frame.

The result of the matching may be a key frame in a video, or a key frame in a corresponding group of videos with the strongest correlation.

Step 904: According to the matched key frame, the server searches the key frame index of the time points and the key frame index of the videos to find all matching videos and the corresponding time point information.

The matching result can be a video or a group of videos, a point in time information or a set of time point information.

Step 905: The server sends the video information corresponding to the matched video and the time point information to the terminal.

Since the received information may be a set of video information, in this case, the mobile application or the mobile phone user is required to perform screening. For example, the user filters out the video identifier of the desired video from a set of video identification information.

Step 906: The mobile phone pushes the video information and the time point information to the television or the set top box.

The push mode can be AirPlay mode, or DLNA.

Step 907: The television or the set top box starts the corresponding program play according to the video information and the time point information.
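The screening described for step 905 can be sketched as follows, assuming each candidate is a pair of video identifier and time point in seconds and that the user picks an identifier. The names `screen` and `to_push` and the dictionary layout are hypothetical; the actual transfer over AirPlay or DLNA is not shown.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (video identifier, time point in seconds)


def screen(candidates: List[Candidate], wanted_id: str) -> Optional[Candidate]:
    """Return the first candidate whose identifier the user picked, if any."""
    for candidate in candidates:
        if candidate[0] == wanted_id:
            return candidate
    return None


def to_push(candidate: Candidate) -> dict:
    """Shape the selected result for pushing to the television or set top box."""
    return {"video_id": candidate[0], "start_seconds": candidate[1]}


received = [("movie-a", 12.5), ("movie-b", 3.0)]
picked = screen(received, "movie-b")
payload = to_push(picked) if picked is not None else None
```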

Embodiment 4:

As shown in FIG. 10, the embodiment provides a video search device, which is applied to a server, and includes: a first acquiring module and a first searching module;

As shown in FIG. 11, the video search apparatus of this embodiment further includes: a first sending module;

Optionally, the first acquiring module is further configured to acquire a second correspondence between video information and a key frame of the video;

The first search module is further configured to search for video information corresponding to the key frame according to the found key frame and the second correspondence.

Optionally, the first obtaining module includes:

a key frame acquisition unit configured to acquire a key frame of the video;

a keyword acquiring unit, configured to perform the image recognition on a key frame of the video to acquire a keyword of the key frame;

The first correspondence establishing unit is configured to establish a first correspondence between the key frame of the video and the keyword.

Optionally, the first obtaining module further includes:

a time information obtaining unit, configured to acquire time information of a key frame of the video in the video;

a third correspondence establishing unit is configured to establish a third correspondence between the key frame and the time information of the video;

The first sending module is further configured to search for corresponding time information according to the found key frame and the third correspondence, and send the found time information to the terminal.
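The division of work among the server-side modules can be sketched as three small classes. This is one possible decomposition under stated assumptions, not the structure of the device itself: the class and method names are hypothetical, the three correspondences are held as dictionaries, and the `outbox` list stands in for the connection to the terminal.

```python
from collections import defaultdict


class FirstAcquiringModule:
    """Holds the first, second and third correspondences."""

    def __init__(self):
        self.first = defaultdict(set)  # keyword -> key frame ids
        self.second = {}               # key frame id -> video information
        self.third = {}                # key frame id -> time information

    def register(self, frame_id, keywords, video_info, time_info):
        for keyword in keywords:
            self.first[keyword].add(frame_id)
        self.second[frame_id] = video_info
        self.third[frame_id] = time_info


class FirstSearchModule:
    def __init__(self, acquired):
        self.acquired = acquired

    def find_key_frames(self, keyword):
        """Look up key frames by keyword using the first correspondence."""
        return sorted(self.acquired.first.get(keyword, set()))

    def find_video_info(self, frame_id):
        """Look up video information using the second correspondence."""
        return self.acquired.second[frame_id]


class FirstSendingModule:
    def __init__(self, acquired):
        self.acquired = acquired
        self.outbox = []  # stands in for the link to the terminal

    def send(self, frame_id, video_info):
        """Find the time information (third correspondence) and send both."""
        self.outbox.append((video_info, self.acquired.third[frame_id]))


acquired = FirstAcquiringModule()
acquired.register(1, {"castle", "dragon"}, "video://example/a", 12.5)
searcher = FirstSearchModule(acquired)
sender = FirstSendingModule(acquired)
for frame_id in searcher.find_key_frames("dragon"):
    sender.send(frame_id, searcher.find_video_info(frame_id))
```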

As shown in FIG. 12, the embodiment further provides a video search device, which is applied to a terminal, and includes: a second acquiring module and a second sending module;

The second obtaining module is configured to acquire a keyword;

The second acquiring module acquiring the keyword includes: acquiring a picture that contains a video picture; and performing image recognition on the picture to obtain a keyword corresponding to the picture.

Optionally, the video search device further includes: a second receiving module, configured to: receive video information sent by the server.

The embodiment of the present invention further provides a video playing device, where the video playing device includes any video searching device provided by the embodiment of the present invention. The video playing device further includes: a playing module, configured to perform video playing according to the video information.

Optionally, the second receiving module is further configured to: receive time information sent by the server;

The step of the playing module performing video playback according to the video information includes: performing video playback according to the video information and the time information.

The video search device further includes: a third sending module, configured to: after the video information sent by the server is received, send the video information to the playback device, so that the playback device performs video playback according to the video information.

Optionally, the second receiving module is further configured to: receive time information sent by the server;

The third sending module is further configured to: send the time information to the playing device, so that the playing device performs video playing according to the time information and the video information.

With the video search device of this embodiment, the user terminal only needs to acquire a picture containing a video picture (for example, by photographing the video picture or taking a screenshot), perform image recognition on the picture to obtain a keyword related to the video picture, and send the keyword to the server. The server automatically finds the corresponding video according to the keyword sent by the terminal and the stored correspondences, and feeds the search result back to the terminal. The video search in the embodiment of the present invention is thus based on image recognition technology: by merely acquiring a picture containing the video picture, the user can quickly obtain the corresponding video information, and the operation is simple and fast. In addition, with the device of this embodiment the user does not need to memorize keyword information for searching for the video, which reduces the difficulty of the video search and improves the user experience.

The embodiment of the invention further provides a computer readable storage medium storing computer executable instructions for performing the above method.

One of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments can be implemented as a computer program flow, which can be stored in a computer readable storage medium and executed on a corresponding hardware platform (e.g., a system, apparatus, or device), and which, when executed, includes one or a combination of the steps of the method embodiments.

Alternatively, all or some of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module.

The devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.

When the device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium. The above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Industrial applicability

According to the solution of the embodiment of the present invention, which is based on image recognition technology, the user can quickly obtain the corresponding video information merely by acquiring a picture containing the video picture, and the operation is simple and fast. In addition, when the method of the present invention is applied, the user does not need to memorize keyword information for searching for the video, which reduces the difficulty of the video search and improves the user experience.

Claims

A video search method, applied to a server, comprising the following steps:

Acquiring a first correspondence between a key frame of a video and a keyword;

Receiving a keyword sent by a terminal, and searching for the key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.
The video search method of claim 1, wherein after the key frame is found, the method further comprises:

Sending video information corresponding to the found key frame to the terminal.
The video search method of claim 2, wherein before the receiving the keyword sent by the terminal, the method further comprises: acquiring a second correspondence between the video information and the key frame of the video;

After the key frame of the corresponding video is found according to the keyword sent by the terminal and the first correspondence, and before the video information corresponding to the found key frame is sent to the terminal, the method further includes:

And searching for the video information corresponding to the key frame according to the found key frame and the second correspondence.
The video search method according to any one of claims 1 to 3, wherein the received keyword includes a picture keyword, the picture keyword being a keyword related to the video picture that is obtained by performing image recognition on a picture containing the video picture.
The video search method of claim 4, wherein acquiring the first correspondence between the key frame of the video and the keyword comprises:

Acquiring the key frame of the video;

Performing the image recognition on a key frame of the video to acquire a keyword of the key frame;

Establishing a first correspondence between a key frame of the video and a keyword.
The video search method according to claim 5, wherein the keyword of the key frame comprises at least one of: text in the key frame, the main content of the key frame, and the proportion of the key frame occupied by the main content.
The video search method according to claim 5, wherein before receiving the picture keyword sent by the terminal, the method further includes:

Obtaining time information of a key frame of the video in the video;

Establishing a third correspondence between key frames and time information of the video;

After the searching for the key frame of the corresponding video, the method further includes:

Finding corresponding time information according to the found key frame and the third correspondence relationship;

The found time information is sent to the terminal.
The video search method according to claim 2, wherein the video information comprises: video content information or video resource location information.
A video search method, applied to a terminal, comprising the following steps:

Acquiring a keyword;

Sending the keyword to a server, so that the server searches for a key frame of a video according to the keyword.
The video search method according to claim 9, wherein the step of acquiring a keyword comprises:

Acquiring a picture containing a video picture;

Performing image recognition on the picture to obtain a keyword corresponding to the picture.
The video search method according to claim 9 or 10, wherein after the keyword is transmitted to the server, the method further comprises:

Receiving video information sent by the server;

The video is played according to the video information.
The video search method according to claim 11, wherein after the keyword is transmitted to the server, the method further comprises: receiving time information sent by the server;

The step of performing video playback according to the video information includes:

The video is played according to the video information and the time information.
The video search method according to claim 9 or 10, wherein after the keyword is transmitted to the server, the method further comprises:

Receiving video information sent by the server;

And transmitting the video information to the playback device, where the playback device performs video playback according to the video information.
The video search method of claim 13, wherein after the keyword is sent to the server, the method further comprises:

Receiving time information sent by the server;

And transmitting the time information to the playing device, so that the playing device performs video playing according to the time information and the video information.
A video search device, applied to a server, comprising: a first acquiring module and a first searching module;

The first acquiring module is configured to acquire a first correspondence between a key frame of the video and the keyword;

The first search module is configured to receive a keyword sent by the terminal, and search for a key frame of the corresponding video according to the keyword sent by the terminal and the first correspondence.
The video search device of claim 15, further comprising: a first transmitting module;

The first sending module is configured to send video information corresponding to the found key frame to the terminal.
A video search device, applied to a terminal, comprising: a second acquiring module and a second sending module;

The second obtaining module is configured to acquire a keyword;

The second sending module is configured to send the keyword to the server, so that the server searches for a key frame of the video according to the keyword.
The video search device of claim 17, wherein the acquiring, by the second obtaining module, the keyword comprises: acquiring a picture including a video image;

Performing image recognition on the picture to obtain a keyword corresponding to the picture.