WO2019144850A1 - A video content-based video search method and video search apparatus - Google Patents


Info

Publication number
WO2019144850A1
WO2019144850A1 (application PCT/CN2019/072392; CN2019072392W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
information
query sequence
frame
Prior art date
Application number
PCT/CN2019/072392
Other languages
English (en)
French (fr)
Inventor
罗江春
陈锡岩
Original Assignee
北京一览科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京一览科技有限公司
Publication of WO2019144850A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G06F 16/743: Browsing or visualisation of a collection of video files or sequences
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content

Definitions

  • the present invention relates to the field of video search, and in particular, to a video search technology based on video content.
  • the prior art mainly searches for a video required by a user from a large amount of data through external tags labeled on the video, such as the video title, publisher name, video summary, or profile information.
  • however, the user sometimes only remembers a certain picture, or the content displayed or introduced by that picture, and often cannot search for the video that includes the picture according to the general external tags labeled on the video.
  • a video content-based video search device comprising:
  • a receiving device configured to receive a query sequence input by a user
  • a determining device configured to determine, according to a video frame index, at least one video frame matching the query sequence and its corresponding at least one video, wherein the video frame index is established or updated according to the identification of information corresponding to each video frame in the video
  • a positioning device configured to locate a play position of the corresponding at least one video according to the timestamp corresponding to the at least one video frame, and provide the at least one video to the user at the play position.
  • the video search device further includes:
  • a first identifying device configured to identify information corresponding to each video frame in each video, and obtain corresponding labeling information
  • an updating device configured to establish or update the video frame index according to the labeling information
  • the manner of matching the corresponding at least one video frame and the corresponding at least one video according to the query sequence in the determining apparatus includes:
  • the video search device further includes:
  • a second identifying device configured to identify information corresponding to each video frame in each video, and obtain corresponding labeling information
  • the manner of matching the corresponding at least one video frame and the corresponding at least one video according to the query sequence in the determining apparatus includes:
  • the annotation information is obtained by at least one of the following:
  • the video search device further includes:
  • a sorting device configured to sort the at least one video to obtain the sorted at least one video
  • the positioning device is used for:
  • the at least one video is sorted according to at least one of the following:
  • the positioning device is further configured to:
  • if the target video includes multiple video frames that match the query sequence, the playback position of the target video is located according to any one of the following: by default, according to the timestamp of the first video frame in the target video that matches the query sequence; or as selected by the user.
  • a video content-based video search method comprises:
  • a. receiving a query sequence input by a user;
  • b. determining, according to a video frame index, at least one video frame matching the query sequence and its corresponding at least one video, wherein the video frame index is established or updated according to the identification of information corresponding to each video frame in the video;
  • c. locating the play position of the corresponding at least one video according to the timestamp corresponding to the at least one video frame, and providing the at least one video to the user at the play position.
  • the video search method further includes:
  • the manner of matching the corresponding at least one video frame and the corresponding at least one video according to the query sequence in step b includes:
  • the video search method further includes:
  • the manner of matching the corresponding at least one video frame and the corresponding at least one video according to the query sequence in step b includes:
  • the annotation information is obtained by at least one of the following:
  • the video search method further includes:
  • step c includes:
  • the at least one video is sorted according to at least one of the following:
  • providing the at least one video to the user in the playing position in the step c further comprises:
  • if the target video includes multiple video frames that match the query sequence, the playback position of the target video is located according to any one of the following: by default, according to the timestamp of the first video frame in the target video that matches the query sequence; or as selected by the user.
  • a computer readable storage medium storing computer code which, when executed, performs the video search method according to any one of the above.
  • a computer apparatus comprising a memory and a processor, the memory storing computer code, and the processor being configured to perform the video search method according to any one of the above by executing the computer code.
  • the present invention has the following advantages:
  • the present invention determines, according to a video frame index, at least one video frame matching the query sequence and its corresponding at least one video, where the video frame index is established or updated according to the identification of information corresponding to each video frame in the video; it then locates the play position of the corresponding at least one video according to the timestamp corresponding to the at least one video frame, and provides the at least one video to the user at that play position. The matching video is thus determined based on the information of every video frame in the video, so the user can accurately find the video he or she wants to watch; and after the corresponding video is found, its playback position can be located at the video frame corresponding to the query sequence, so that the user can quickly view the matching video segment. This reduces the user's operation time and accurately positions the user at the play position corresponding to the query sequence.
  • FIG. 1 shows a schematic structural diagram of a video content-based video search device according to an aspect of the present invention
  • FIG. 2 is a flow chart showing a video content-based video search method according to another aspect of the present invention.
  • FIG. 3 illustrates a block diagram of an exemplary computer system/server suitable for use in implementing embodiments of the present invention.
  • the computer device includes a user device and a network device.
  • the user equipment includes, but is not limited to, a computer, a smart phone, a PDA, etc.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the computer device can be operated separately to implement the present invention, and can also access the network and implement the present invention by interacting with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • the user equipment, the network equipment, the network, and the like are merely examples, and other existing or future possible computer equipment or networks, such as those applicable to the present invention, are also included in the scope of the present invention. It is included here by reference.
  • FIG. 1 shows a schematic structural diagram of a video content-based video search device according to an aspect of the present invention.
  • the video search device 1 includes a receiving device 101, a determining device 102, and a positioning device 103.
  • the video search device 1 is for example located in a computer device comprising a user device and a network device.
  • the video search device 1 is located in the network device as an example for detailed description.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the computer device can be operated separately to implement the present invention, and can also access the network and implement the present invention by interacting with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • the receiving device 101 receives a query sequence input by the user. Specifically, when the user wants to search for a video segment related to the query sequence, or for a video that includes such a segment, the user inputs the query sequence through the input interface and clicks the search button; the receiving device 101 then receives the query sequence input by the user, so that subsequent devices can search the video database for a video segment matching the query sequence or, further, for a video that includes that segment.
  • for example, the user wants to search for a video clip of "Huang Xiaoming eating at home", or a video including that clip. The user inputs the query sequence "Huang Xiaoming eats at home" in the video search input interface, and the receiving device 101 receives it, for example by calling an application program interface one or more times.
  • the determining device 102 determines, according to the video frame index, at least one video frame matching the query sequence and its corresponding at least one video, wherein the video frame index is established or updated according to the identification of the information corresponding to each video frame in the video.
  • the video frame index may be a total index covering the video frames of all videos, or a sub-index established separately for each video.
  • the information included in each video frame is identified by recognizing each video frame in each video separately (the same video frame may include several different pieces of information), and the index is then established or updated from the correspondence between each video frame and the information it includes.
  • the video frame index therefore includes the correspondence between each video frame and at least one piece of information included in that frame. Further, each video frame may also carry an attribute tag marking which video the frame belongs to.
  • specifically, the video search device 1 identifies the information corresponding to each video frame in each video; for example, from each video frame it obtains at least one content summary corresponding to the content of that frame. The keywords summarizing the different information included in a video frame are referred to herein as annotation information. The video frame index therefore includes the correspondence between each video frame and the different annotation information of that frame.
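The disclosure does not give the video frame index a concrete data layout. As a minimal illustrative sketch (all names such as `FrameEntry` and `VideoFrameIndex` are hypothetical, not from the patent), such an index might record, for each frame, its attribute tag (owning video), its timestamp, and its annotation information, plus an inverted mapping from annotation labels back to frames:

```python
from dataclasses import dataclass

@dataclass
class FrameEntry:
    video_id: str       # attribute tag: which video the frame belongs to
    frame_no: int       # position of the frame within that video
    timestamp: float    # playback time of the frame, in seconds
    annotations: list   # labels summarizing the frame's content

class VideoFrameIndex:
    """Records frame-to-annotation correspondences, plus an inverted map."""

    def __init__(self):
        self.entries = []    # all indexed frames
        self.by_label = {}   # inverted index: annotation label -> frames

    def add_frame(self, entry):
        self.entries.append(entry)
        for label in entry.annotations:
            self.by_label.setdefault(label, []).append(entry)

# index one frame of the running example
index = VideoFrameIndex()
index.add_frame(FrameEntry("V1", 1000, 40.0,
                           ["Huang Xiaoming eating at home",
                            "Huang Xiaoming eating fried noodles"]))
```

The inverted `by_label` mapping is one possible way to make the later matching step fast; the patent itself only requires that the frame-to-annotation correspondence be recorded.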
  • the determining device 102 performs a matching query in the video frame index according to the received query sequence, by exact matching, fuzzy matching, or a combination of the two; for example, it determines in turn whether the query sequence hits the annotation information of each video frame in the index. A video frame whose annotation information is hit is a video frame corresponding to the query sequence input by the user, and the video to which that frame belongs can then be determined.
  • the determining device 102 can determine at least one video corresponding thereto according to the attribute tag of the at least one video frame.
  • the at least one video frame matching the query sequence may belong to the same video or to different videos, and may consist of continuous or discontinuous video frames.
  • the identification of the information corresponding to each video frame in the video includes, but is not limited to, scene recognition based on a single frame image, scene recognition based on continuous multi-frame images, recognition of the audio information corresponding to the image, and recognition of the subtitle information corresponding to the image.
  • for example, the 1000th video frame of video V1 is a picture of Huang Xiaoming eating at home. If the frame also shows Huang Xiaoming eating fried noodles, or Huang Xiaoming's parents, then after the information in the frame is identified, the video frame index may include the correspondence between the 1000th video frame of video V1 and the annotation information "Huang Xiaoming eating at home", "Huang Xiaoming eating fried noodles", and "Huang Xiaoming's parents". If the 1000th to 1049th video frames of video V1 all include Huang Xiaoming at home, Huang Xiaoming eating fried noodles, and Huang Xiaoming's parents, except that the parents do not appear in the 1025th to 1049th frames, then each of the 1000th to 1024th frames corresponds to "Huang Xiaoming eating at home", "Huang Xiaoming eating fried noodles", and "Huang Xiaoming's parents", while each of the 1025th to 1049th frames corresponds to "Huang Xiaoming eating at home" and "Huang Xiaoming eating fried noodles".
  • if the video database to be searched includes multiple videos, such as video V1, video V2, video V3, and so on, a total video frame index covering the video frames of all videos in the database may be established, or a separate video frame index may be established for each video; in the latter case, the determining device 102 matches the query sequence against the video frame index corresponding to each video in turn.
  • continuing the example, the determining device 102 may use fuzzy matching to query the video frame index, so that annotation information such as "Huang Xiaoming", "Huang Xiaoming eating", "Huang Xiaoming at home", "Huang Xiaoming eats at home", "stars eat at home", or "Huang Xiaoming and angelababy eat at home" is hit; the video frames corresponding to the hit annotation information are the video frames corresponding to "Huang Xiaoming eating at home". If the determining device 102 determines that the 54 pieces of annotation information hit in the video frame index correspond to the 1000th to 1049th video frames of video V1, the 225th video frame of video V3, and the 25th, 126th, and 127th video frames of video V8, then those frames are the video frames corresponding to "Huang Xiaoming eating at home", and the corresponding videos are videos V1, V3, and V8.
  • the determining device 102 determines the video corresponding to the query sequence according to the video frame index, where the index is established or updated according to the identification of the information corresponding to each video frame in the video. The determining device 102 can therefore use the per-frame information recorded in the index to find the annotation information hit by the query sequence, determine the corresponding video frames, and thereby determine the videos corresponding to the query sequence and provide them to the user. Because the search is based on video content, that is, on the information of each video frame, the user can find the video he or she wants to watch more accurately, which improves the search experience.
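As an illustrative sketch of this matching step (the data layout, the `fuzzy_match` heuristic, and its 0.5 threshold are assumptions, not from the patent), exact matching can be a direct string comparison while fuzzy matching can be approximated by word overlap between the query sequence and each annotation label:

```python
# hypothetical index: (video_id, frame_no, timestamp) -> annotation labels
frame_index = {
    ("V1", 1000, 40.0): ["Huang Xiaoming eating at home",
                         "Huang Xiaoming's parents"],
    ("V3", 225, 9.0): ["Huang Xiaoming eating at home"],
    ("V8", 25, 1.0): ["stars eating at home"],
}

def fuzzy_match(query, label, threshold=0.5):
    """Treat a label as hit when enough of the query's words appear in it."""
    q_words = query.lower().split()
    hits = sum(1 for w in q_words if w in label.lower())
    return hits / len(q_words) >= threshold

def search(query):
    """Return the (video_id, frame_no, timestamp) keys whose labels match."""
    return [key for key, labels in frame_index.items()
            if any(fuzzy_match(query, lab) for lab in labels)]

matches = search("Huang Xiaoming eating at home")
```

Note how the fuzzy rule also hits the "stars eating at home" label, mirroring the broad hits described in the example above; a real system would use a stronger similarity measure.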
  • the positioning device 103 locates the play position of the corresponding at least one video according to the timestamp corresponding to the at least one video frame, and provides the at least one video to the user at that play position. Specifically, each video frame in a video has a corresponding timestamp within that video. After the determining device 102 determines the at least one video frame corresponding to the query sequence, the positioning device 103 obtains the timestamp of each such frame in its corresponding video, and then positions the playback position of each corresponding video at the frame that matches the query sequence, so that when the user selects a video for viewing, it starts playing from the video frame corresponding to the query sequence. The positioning device 103 thus allows the user to directly view the video clip he or she wants to see.
  • the positioning device 103 acquires a target video selected by the user from the at least one video. If the target video includes multiple video frames that match the query sequence, the playback position of the target video is located according to either of the following: by default, according to the timestamp of the first matching video frame; or as selected by the user. Specifically, when the user selects a target video from the at least one video provided by the positioning device 103, the positioning device 103 acquires that target video; if it includes multiple matching video frames, the playback position may by default be positioned at the video frame with the smallest timestamp, that is, the frame with the earliest playback time.
  • the selection methods include a list form and an annotation form on the play progress bar, where the list form includes, but is not limited to, a pop-up window list. For example, after the user selects a video, the positioning device 103 provides a selection box in the form of a pop-up window list, and the video starts playing from the video frame selected by the user.
  • the positioning device 103 then provides the at least one video that has located the playback position to the user, so that regardless of which video is selected by the user for viewing, the video will start playing directly from the video frame matching the query sequence, so the user can The screen corresponding to the query sequence is directly viewed.
  • for example, the video frames matching "Huang Xiaoming eating at home" are the 1000th to 1049th video frames of video V1, the 225th video frame of video V3, and the 25th, 126th, and 127th video frames of video V8. By default, the playback position of video V1 is positioned at the 1000th video frame, that of V3 at the 225th video frame, and that of V8 at the 25th video frame. Alternatively, V1 may be positioned at any one of the 1000th to 1049th video frames, and V8 at any one of its 25th, 126th, and 127th video frames; or the user may select which matching video frame the playback position is positioned at.
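The default positioning rule above (the smallest timestamp wins unless the user chooses a specific matching frame) might be sketched as follows; the frame numbers and timestamps are hypothetical values continuing the running example:

```python
# matched frames per video: lists of (frame_no, timestamp) pairs
matched_frames = {
    "V1": [(n, n * 0.04) for n in range(1000, 1050)],
    "V3": [(225, 9.0)],
    "V8": [(25, 1.0), (126, 5.04), (127, 5.08)],
}

def default_play_position(video_id):
    """Earliest matching frame's timestamp: the default playback position."""
    return min(ts for _, ts in matched_frames[video_id])

def play_position(video_id, chosen_frame=None):
    """Use the user's chosen matching frame if given, else the default."""
    if chosen_frame is not None:
        return dict(matched_frames[video_id])[chosen_frame]
    return default_play_position(video_id)
```

For instance, `play_position("V8")` yields the timestamp of the 25th frame, while `play_position("V8", 126)` honors the user's choice of a later matching frame.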
  • the video search device 1 can thus search, based on the information of each video frame in the video, for the videos matching the query sequence, accurately finding videos that satisfy what the user wants to watch; and after the determining device 102 determines the corresponding videos, the positioning device 103 can accurately position each video's play position at the frame corresponding to the query sequence, so that playback starts there. The user can therefore view the matching video segment more quickly, which improves search efficiency and reduces operation time. In short, the video search device 1 not only accurately finds the video the user wants to view, but also accurately positions its play position at the video segment the user wants to watch.
  • the video search device 1 further includes a first identification device 104 (not shown) and an update device 105 (not shown).
  • the first identifying device 104 identifies the information corresponding to each video frame in each video to obtain corresponding annotation information. Specifically, the first identifying device 104 performs image recognition on each video frame in each video, and recognizes the related audio information or subtitle information corresponding to each frame, to obtain at least one piece of annotation information, where the annotation information is a summary of the content included in the corresponding video frame.
  • for example, the 1000th video frame of video V1 is a picture of Huang Xiaoming eating at home, and the frame also shows Huang Xiaoming eating fried noodles and Huang Xiaoming's parents. The 1000th video frame of video V1 therefore corresponds to the annotation information "Huang Xiaoming eats at home", "Huang Xiaoming eats fried noodles", and "Huang Xiaoming's parents".
  • the manner of obtaining the corresponding at least one annotation information includes but is not limited to:
  • the image features of a video frame image are identified by image recognition, and at least one piece of annotation information included in the frame is determined according to those features, each piece being a summary of different content in the frame.
  • alternatively, at least one content summary of a video frame image may be determined according to the image features of that frame together with at least one video frame before or after it; for example, the state of an object in the frame is determined according to the positional movement of the object across the preceding and following frames. The at least one content summary is then used as the annotation information of the frame, and the correspondence between the frame and its annotation information is updated into the video frame index.
  • for example, a video frame image shows a car. From that single frame it cannot be recognized whether the car is stopped or moving, so its running state must be determined jointly from the images before or after the frame. If the position of the car in the preceding and following images changes relative to its position in this frame, the car is determined to be running, and, according to the size of the position change, possibly running at high speed. "Car running" or "car running at high speed" is then used as the annotation information of the video frame, corresponding to that frame in the video frame index.
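A minimal sketch of this multi-frame judgment, assuming an object detector has already produced the car's per-frame center coordinates (the function name and thresholds are illustrative, not from the patent):

```python
def motion_state(positions, moving_thresh=5.0, fast_thresh=20.0):
    """Classify motion from per-frame (x, y) centers of the same object."""
    if len(positions) < 2:
        return "unknown"   # a single frame cannot reveal motion
    # displacement between the first and last observed positions
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if dist < moving_thresh:
        return "stopped"
    return "high-speed running" if dist > fast_thresh else "running"

# e.g. a car whose detected center shifts noticeably across three frames
label = motion_state([(100, 50), (110, 50), (118, 50)])
```

The resulting label ("car running", "car running at high speed", etc.) would then be written into the video frame index as annotation information for the frame.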
  • the updating device 105 establishes or updates the video frame index according to the annotation information, and the determining device matches the query sequence by finding, in the video frame index, the at least one video frame whose annotation information matches the query sequence, together with its corresponding at least one video.
  • specifically, the updating device 105 establishes the video frame index from the correspondence between each video frame in the video and its annotation information. The index thus records, for each video frame, at least one piece of corresponding annotation information, so that the video search device 1 matches the query sequence against the annotation information in the index; when the match succeeds, the video frame corresponding to that annotation information is a frame matching the query sequence, and the video search device 1 can then determine the frame's video from its attribute tag.
  • the video frame index may be a total index established over every video frame of every video, or a set of sub-indexes, one established from the video frames of each individual video.
  • when the video frame index is a total index covering each video frame of each video, after the receiving device 101 receives the user's query sequence, the determining device 102 performs matching in the total index and then determines the video corresponding to each matched frame. When the video frame index consists of sub-indexes, one per video, the determining device 102 matches the received query sequence against the sub-index of each video in turn, and when a frame is matched it directly knows which video the frame belongs to.
  • for example, the 48th video frame of video V112 is a picture of Zhang Jie singing. The first identifying device 104 can also recognize that the frame's content includes Zhang Jie singing on The New Voice of China and Zhang Jie singing "Against the War", so "Zhang Jie is singing", "Zhang Jie singing on The New Voice of China", and "Zhang Jie singing 'Against the War'" are used as the annotation information of the frame.
  • the updating device 105 then establishes a video frame index containing the correspondence between video frames and annotation information. If the index is a total index, it records the correspondence between the frame whose attribute tag marks it as the 48th video frame of video V112 and the annotation information above; if the index is a sub-index per video, the index corresponding to video V112 records the correspondence between the 48th video frame and that annotation information. Each video frame may have multiple pieces of annotation information that label it comprehensively from different aspects, so that the frame is more easily retrieved or matched.
  • in another embodiment, the video search device 1 further includes a second identifying device 106 (not shown) and an establishing device 107 (not shown).
  • the second identifying device 106 identifies the information corresponding to each video frame in each video to obtain corresponding annotation information, and the establishing device 107 establishes a sub-index for each video according to that annotation information. The determining device then matches the query sequence against the sub-index of each video in turn, to find at least one video frame whose annotation information matches the query sequence.
  • the second identification device 106 is implemented in the same manner as the first identification device 104 described above.
  • after the second identifying device 106 obtains the annotation information corresponding to the video frames and the establishing device 107 establishes a sub-index for each video, and once the receiving device 101 receives the user's query sequence, the determining device 102 matches the query sequence against the annotation information in each video's sub-index in turn, and determines the video frames corresponding to the at least one piece of annotation information matched by the query sequence.
  • based on the sub-indexes, the video search device 1 can provide an in-video search function for finding any video segment within a video: while watching, the user enters in the search input box the annotation information corresponding to any video frame or segment he or she wants to watch, and the video search device 1 searches within that video. After obtaining the video frame corresponding to the annotation information, the video search device 1 positions the playback position at that frame; if multiple video frames correspond to the annotation information, the positioning device 103 by default positions the playback position at the frame with the smallest timestamp, that is, the earliest playback position.
  • for example, video V27 is a segment of the movie "Frozen", and the video frame index corresponding to video V27 includes the correspondence between the 285th video frame and the annotation information "the Snow Queen magically building the Ice Palace". When the user inputs the query sequence "Ice and Snow Palace" in the search input box corresponding to video V27, the determining device 102 matches the annotation information "the Snow Queen magically building the Ice Palace", determines that the corresponding video frame is the 285th video frame, and the positioning device 103 positions the playback position at that frame according to its timestamp and asks the user whether to start playing from there. If the user agrees, the video plays directly from that video frame.
  • Thus, the video search device 1 enables the user to navigate directly to the desired position by searching, without manually dragging the progress bar.
  • In a preferred embodiment, the video search device 1 further includes a sorting device 108 (not shown).
  • The sorting device sorts the at least one video to obtain the sorted at least one video.
  • The positioning device 103 is configured to locate, according to the at least one video frame, the play position of each of the sorted videos, and to provide the sorted at least one video to the user at those play positions.
  • the manner of sorting the at least one video includes but is not limited to:
  • Video publisher information corresponding to the video: if the publisher has published many videos historically and those videos are highly rated, the video has a higher sorting priority or a higher sorting weight.
  • Source information of the video: if the video comes from a relatively well-known large website, for example iQiyi, Youku or Sohu, the video has a higher sorting priority or a higher sorting weight.
  • Clarity or fluency of the video: if the resolution of the video is higher, or playback is smoother, the video has a higher sorting priority or a higher sorting weight.
  • Subject information of the video: if the subject of the video has been popular recently, the video has a higher sorting priority or a higher sorting weight.
  • The matching manner of the query, for example fuzzy matching or exact matching: if the video was obtained by exact matching, it has a higher sorting priority or a higher sorting weight than if it was obtained by fuzzy matching.
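The sorting factors above can be combined into a single sorting weight. A minimal sketch, with hypothetical field names and hand-chosen weights (the description does not fix a concrete formula):

```python
def sort_weight(video):
    """Combine the ranking factors into one score; weights are illustrative."""
    score = 0.0
    score += 2.0 * video.get("publisher_rating", 0.0)                # publisher history / evaluation
    score += 1.5 * (1.0 if video.get("well_known_source") else 0.0)  # e.g. a major video site
    score += 1.0 * video.get("resolution", 0) / 1080.0               # clarity
    score += 1.0 * video.get("topic_popularity", 0.0)                # recent popularity of subject
    score += 1.0 * (1.0 if video.get("exact_match") else 0.0)        # exact beats fuzzy
    return score

def sort_videos(videos):
    """Return videos ordered by descending sorting weight."""
    return sorted(videos, key=sort_weight, reverse=True)

videos = [
    {"id": "V3", "publisher_rating": 0.2, "resolution": 720, "exact_match": False},
    {"id": "V1", "publisher_rating": 0.9, "well_known_source": True,
     "resolution": 1080, "exact_match": True},
]
ranked = sort_videos(videos)
```

A real implementation would learn or tune these weights; the point is only that each listed factor contributes monotonically to the sorting priority.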
  • FIG. 2 is a flow chart showing a video content based video search method in accordance with another aspect of the present invention.
  • In step S201, the video search device 1 receives the query sequence input by the user. Specifically, when the user wants to search for a video segment related to a query sequence, or for a video including such a segment, the user inputs the query sequence through the input interface and clicks the search button; the video search device 1 receives the query sequence so that subsequent steps can search the video database for video segments matching the query sequence, or further, for videos including those segments.
  • For example, suppose the user wants to search for a video segment showing "Huang Xiaoming eating at home", or a video including such a segment. The user inputs the query sequence "Huang Xiaoming eating at home" in the video search input interface, and in step S201 the video search device 1 receives this query sequence, for example by calling the application interface one or more times.
  • In step S202, the video search device 1 determines, according to the video frame index, at least one video frame matching the query sequence and the at least one video corresponding thereto, wherein the video frame index is established or updated according to the identification of the information corresponding to each video frame in each video.
  • the video frame index may be a total index including video frames in all videos, or may be a sub-index established for each video separately.
  • The information included in each video frame is obtained by identifying each video frame in each video separately, where the same video frame may include several different pieces of information; the video frame index is then established or updated according to each video frame and the pieces of information it includes.
  • The video frame index includes the correspondence between each video frame and at least one piece of information included in that frame. Further, each video frame may also carry an attribute tag marking which video it belongs to.
  • After the video search device 1 identifies the information corresponding to each video frame in each video, at least one content summary is obtained for each video frame, for example keywords summarizing the different pieces of information included in the frame; these summaries are referred to herein as annotation information. The video frame index therefore includes the correspondence between each video frame and the different pieces of annotation information of that frame.
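The correspondences described above can be held in a simple inverted index from annotation text to frames, with each entry carrying a video identifier that plays the role of the attribute tag. A minimal sketch with hypothetical data:

```python
from collections import defaultdict

# annotation text -> list of (video_id, frame_number); video_id acts as the
# attribute tag marking which video a frame belongs to
frame_index = defaultdict(list)

def add_annotation(video_id, frame_no, annotation):
    frame_index[annotation].append((video_id, frame_no))

# one video frame may carry several pieces of annotation information
add_annotation("V1", 1000, "Huang Xiaoming eating at home")
add_annotation("V1", 1000, "Huang Xiaoming eating fried noodles")
add_annotation("V1", 1000, "Huang Xiaoming's parents")
add_annotation("V3", 225, "Huang Xiaoming eating at home")
```

The same structure serves as a total index over all videos; restricting it to the frames of one video yields the per-video sub-index variant.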
  • The video search device 1 performs a matching query in the video frame index by exact matching, fuzzy matching, or a combination of the two according to the received query sequence, for example by sequentially determining whether the query sequence hits any annotation information in the index. The video frame corresponding to hit annotation information is a video frame matching the query sequence input by the user, and the video search device 1 can then determine the at least one corresponding video according to the attribute tags of the at least one video frame.
  • The at least one video frame matching the query sequence may belong to the same video or to different videos; the matched frames may be continuous video frames or discontinuous video frames.
  • The identification of the information corresponding to each video frame includes, but is not limited to: scene recognition based on a single frame image, scene recognition based on continuous multi-frame images, recognition of the audio information corresponding to the image, and recognition of the subtitle information corresponding to the image.
  • For example, suppose the 1000th video frame of video V1 is a picture of Huang Xiaoming eating at home, and the frame also shows Huang Xiaoming eating fried noodles, and Huang Xiaoming's parents. After the information in the frame is identified, the video frame index may include the correspondence between the 1000th video frame of V1 (attribute-tagged as belonging to V1) and "Huang Xiaoming eating at home", "Huang Xiaoming eating fried noodles" and "Huang Xiaoming's parents".
  • If the 1000th-1049th video frames of V1 all show Huang Xiaoming at home eating fried noodles with his parents, but his parents do not appear in the 1025th-1049th frames, then each of the 1000th-1024th frames corresponds to "Huang Xiaoming eating at home", "Huang Xiaoming eating fried noodles" and "Huang Xiaoming's parents", while each of the 1025th-1049th frames corresponds to "Huang Xiaoming eating at home" and "Huang Xiaoming eating fried noodles".
  • If the searched video database includes multiple videos such as V1, V2 and V3, a total video frame index covering the frames of all videos in the database may be established, or a separate sub-index may be established for each video; in the latter case, in step S202 the video search device 1 sequentially matches the query sequence against the video frame index corresponding to each video to determine the at least one matching video frame.
  • For example, for the query sequence "Huang Xiaoming eating at home", the video search device 1 may, in step S202, use fuzzy matching to query the video frame index, so that annotation information such as "Huang Xiaoming", "Huang Xiaoming eating", "Huang Xiaoming at home", "Huang Xiaoming eating at home", "stars eating at home" or "Huang Xiaoming and angelababy eating at home" is hit; the video frames corresponding to the hit annotation information are video frames matching "Huang Xiaoming eating at home", and the videos corresponding to those frames are further determined. Alternatively, exact matching may be used, in which case only the video frames whose annotation information is exactly "Huang Xiaoming eating at home", and their corresponding videos, are determined; or a combination of fuzzy and exact matching may be used to provide the matched video frames and videos to the user.
  • Suppose the video search device 1 determines that the hit annotation information corresponds, in the video frame index, to the 1000th-1049th video frames of video V1, the 225th video frame of video V3, and the 25th, 126th and 127th video frames of video V8. Then the video frames matching "Huang Xiaoming eating at home" are those frames, and the videos matching "Huang Xiaoming eating at home" are V1, V3 and V8.
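The exact and fuzzy matching described above could be sketched as follows; word overlap stands in for fuzzy matching here, which is one plausible reading (the description does not fix a fuzzy-matching algorithm):

```python
def exact_match(query, index):
    """Frames whose annotation equals the query exactly."""
    return index.get(query, [])

def fuzzy_match(query, index):
    """Frames whose annotation shares at least one word with the query (illustrative)."""
    q_words = set(query.split())
    hits = []
    for annotation, frames in index.items():
        if q_words & set(annotation.split()):
            hits.extend(frames)
    return hits

# hypothetical index: annotation -> (video_id, frame_number) pairs
index = {
    "Huang Xiaoming eating at home": [("V1", 1000), ("V3", 225)],
    "Huang Xiaoming eating": [("V8", 25)],
}
```

Exact matching returns only the frames annotated with the literal query; fuzzy matching also surfaces frames such as V8's, whose annotation only partially overlaps the query.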
  • The video search device 1 determines the videos corresponding to the query sequence according to the video frame index, which is established or updated according to the identification of the information corresponding to each video frame in the videos. Therefore, in step S202, the video search device 1 can search and match against the information of every video frame, determine which annotation information is hit by the query sequence, determine the video frames corresponding to the query sequence according to the hit information, and thereby determine the videos to provide to the user.
  • Because the search is based on video content, that is, on the information of each video frame, the user can search more accurately for the video he or she wants to watch, which improves the user's search experience.
  • In step S203, the video search device 1 locates the play position of each corresponding video according to the timestamp of the at least one video frame, and provides the at least one video to the user at that play position. Specifically, each video frame has a corresponding timestamp in its video.
  • After the video search device 1 determines the at least one video frame corresponding to the query sequence in step S202, it obtains, in step S203, the timestamp of each of those video frames in its corresponding video, and locates the play position of each video at the matching video frame, so that when the user selects a video for viewing, it starts playing from the video frame corresponding to the query sequence.
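Locating the play position from frame timestamps can be sketched as follows, assuming a constant frame rate so that a frame number converts to a timestamp (a simplification; real containers store per-frame timestamps):

```python
def frame_timestamp(frame_no, fps=25.0):
    """Timestamp in seconds of a frame under a constant frame rate."""
    return frame_no / fps

def play_positions(matches):
    """matches: {video_id: [matching frame numbers]} ->
    {video_id: seek time of the earliest matching frame}."""
    return {vid: frame_timestamp(min(frames)) for vid, frames in matches.items()}

# hypothetical matches from step S202
matches = {"V1": [1000, 1024, 1049], "V3": [225]}
positions = play_positions(matches)
```

A player would then seek each video to `positions[video_id]` before handing it to the user, so playback begins at the matching frame rather than at time zero.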
  • the video search device 1 allows the user to directly view the video clip he or she wants to see.
  • In a preferred embodiment, in step S203 the video search device 1 acquires a target video selected by the user from the at least one video. If the target video includes multiple video frames matching the query sequence, the playback position of the target video is located in either of the following ways: by default, according to the timestamp of the first video frame in the target video that matches the query sequence; or as selected by the user.
  • Specifically, after the video search device 1 acquires the target video selected by the user, the playback position may by default be located at the matching video frame with the smallest timestamp, that is, the earliest-played frame; it may by default be located at the video frame with the highest matching degree; it may be located randomly at any matching video frame; or the user may select which matching video frame to start from, for example through a list (including but not limited to a pop-up list) or through annotations on the play progress bar. For example, after the user selects a video, the video search device 1 provides, in step S203, a selection box in the form of a pop-up list from which the user chooses the desired starting frame.
  • In step S203, the video search device 1 provides the at least one video, with its playback position located, to the user, so that whichever video the user selects to watch starts playing directly from the video frame matching the query sequence, and the user can directly view the picture corresponding to the query sequence.
  • the video frame matching the "Huang Huaweing's meal" in the video V1 is: the 1000th to 1049th video frame in the video V1, the 225th video frame in the video V3, and the 25th, 126th, 127th in the video V8.
  • Video frames by default, the playback position of the video V1 is positioned to the 1000th video frame.
  • the playback position of the V3 is located to the 225th video frame, and the playback position of the V8 is positioned to the 25th video frame by default;
  • V1 is positioned to any one of the 1000-1049 video frames, any one of the 25th, 126th, and 127th video frames of V8; or the user may select which matching video frame to position the playback position to. .
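The positioning policies above (earliest timestamp, highest matching degree, random, user choice) could be dispatched as in the following sketch; the policy names are my own, not the patent's:

```python
import random

def choose_start_frame(frames, policy="earliest", scores=None, user_pick=None):
    """frames: matching frame numbers; scores: optional matching degree per frame."""
    if policy == "earliest":                      # default: smallest timestamp
        return min(frames)
    if policy == "best" and scores:               # highest matching degree
        return max(frames, key=lambda f: scores[f])
    if policy == "random":                        # any matching frame
        return random.choice(frames)
    if policy == "user" and user_pick in frames:  # chosen via list / progress bar
        return user_pick
    return min(frames)                            # fall back to the default

frames = [25, 126, 127]  # hypothetical matches in video V8
```

With `policy="earliest"` this reproduces the default behavior in the text: V8 starts at frame 25.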
  • In summary, based on the information of each video frame, the video search device 1 can not only search for the videos matching the query sequence and accurately find the videos that satisfy the user's viewing requirements, but can also, after determining the corresponding videos, accurately locate the play position of each video at the position corresponding to the query sequence in step S203, so that each video plays from the matching video frame. The user can thus view the video segment corresponding to the query sequence more quickly, which improves search efficiency and reduces the user's operation time. Therefore, the video search device 1 both accurately finds the videos the user wants to watch and accurately positions playback at the video segment the user wants to see.
  • In a preferred embodiment, the video search method further includes step S204 (not shown) and step S205 (not shown).
  • In step S204, the video search device 1 identifies the information corresponding to each video frame in each video to obtain the corresponding annotation information. Specifically, it performs image recognition on each video frame and identifies the related audio information or subtitle information corresponding to each frame, to obtain at least one piece of annotation information.
  • For example, if the 1000th video frame of video V1 is a picture of Huang Xiaoming eating at home, and the frame also shows Huang Xiaoming eating fried noodles and Huang Xiaoming's parents, then the 1000th video frame of V1 corresponds respectively to the annotation information "Huang Xiaoming eating at home", "Huang Xiaoming eating fried noodles" and "Huang Xiaoming's parents".
  • the manner of obtaining the corresponding at least one annotation information includes but is not limited to:
  • The image features of the video frame image are identified by means of image recognition, and at least one piece of annotation information included in the image is determined according to those features, where each piece of annotation information is a summary of different content included in the video frame.
  • Alternatively, at least one content summary of a video frame image may be determined according to the image features of that frame together with at least one video frame before or after it, for example by determining the state of an object in the frame from the positional movement of the object across the preceding and following frames; the at least one content summary is then used as the annotation information of the frame, and the correspondence between the frame and its annotation information is updated into the video frame index.
  • For example, if a video frame image shows a car, whether the car is stopped or moving cannot be recognized from that single frame; the running state of the car must be determined jointly from the frames before or after it. If the position of the car in the preceding and following frames changes relative to its position in the current frame, the car is determined to be moving, and according to the magnitude of the position change it may further be determined to be moving at high speed; "car moving" or "car moving at high speed" is then used as the annotation information of the frame and recorded in the video frame index in correspondence with the frame.
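The multi-frame motion inference in the car example can be sketched as follows, with an arbitrary displacement threshold for the "high speed" label (the description gives no concrete threshold):

```python
def motion_annotation(positions, fast_threshold=50.0):
    """positions: (x, y) coordinates of the tracked object in consecutive frames.
    Returns an annotation string based on how far the object moves."""
    if len(positions) < 2:
        return "state unknown"  # a single frame cannot reveal motion
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    displacement = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if displacement == 0:
        return "car stopped"
    if displacement > fast_threshold:
        return "car moving at high speed"
    return "car moving"
```

The returned string would be stored in the video frame index as annotation information for the frame, exactly like annotations obtained from single-frame recognition.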
  • In step S205, the video search device 1 establishes or updates the video frame index according to the annotation information. The matching of the at least one video frame and its corresponding at least one video according to the query sequence then includes: matching and determining, in the video frame index, at least one video frame whose annotation information matches the query sequence, and its corresponding at least one video.
  • Specifically, the video search device 1 establishes the video frame index according to the correspondence between each video frame in the video and its annotation information. The video frame index includes the correspondence between each video frame and at least one piece of its annotation information, so that the video search device 1 matches the query sequence against the annotation information in the index; if the matching succeeds, the video frame corresponding to the matched annotation information is a video frame matching the query sequence, and the video search device 1 can then determine the video to which each frame belongs according to its attribute tag.
  • the video frame index may be a total index established according to each video frame of each video, or may be a sub index established according to each video frame in any one video.
  • If the index is a total index, after the video search device 1 receives the user's query sequence, it performs matching in the total video frame index according to the query sequence and determines the corresponding videos according to the matched at least one video frame. If there is a sub-index for each video, the video search device 1 sequentially performs matching in the sub-index corresponding to each video according to the received query sequence; when a matching video frame is obtained, the device directly knows which video that frame belongs to.
  • For example, suppose the 48th video frame of video V112 is a picture of Zhang Jie singing. In step S204, the video search device 1 can recognize that the content of the frame also includes Zhang Jie singing on the show "Sing! China" and Zhang Jie singing an anti-war song, so "Zhang Jie is singing", "Zhang Jie singing on Sing! China" and "Zhang Jie singing an anti-war song" are used as the annotation information of the frame.
  • After obtaining the annotation information of the other frames of this video and of the frames of other videos, the video search device 1 establishes, in step S205, a video frame index including the correspondence between video frames and annotation information: either a total index in which the 48th frame, attribute-tagged as belonging to V112, corresponds to the three pieces of annotation information above, or, if there is a sub-index per video, a sub-index for V112 that includes the correspondence between the 48th frame and those pieces of annotation information. Each video frame may have multiple pieces of annotation information labeling it from different aspects, so that the frame is more easily retrieved and matched.
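Matching sequentially through per-video sub-indexes, as described above, can be sketched as follows (hypothetical data; a hit inside a sub-index directly identifies the video):

```python
def search_sub_indexes(query, sub_indexes):
    """sub_indexes: {video_id: {annotation: [frame numbers]}}.
    Returns (video_id, frame numbers) pairs whose annotation matches the query."""
    results = []
    for video_id, index in sub_indexes.items():  # one sub-index per video
        for annotation, frames in index.items():
            if query in annotation:              # simple fuzzy containment match
                results.append((video_id, frames))
    return results

sub_indexes = {
    "V112": {"Zhang Jie is singing": [48]},
    "V27": {"the Snow Queen magically builds an ice palace": [285]},
}
```

Because each sub-index is keyed by its video, no separate attribute-tag lookup is needed to recover which video a matched frame belongs to, unlike the total-index variant.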
  • In a preferred embodiment, the video search method further includes step S206 (not shown) and step S207 (not shown).
  • In step S206, the video search device 1 identifies the information corresponding to each video frame in each video to obtain the corresponding annotation information; in step S207, it establishes a sub-index for each video according to the annotation information. The matching of the at least one video frame and its corresponding at least one video according to the query sequence then includes: sequentially determining, in the sub-index corresponding to each video, at least one video frame whose annotation information matches the query sequence.
  • The implementation of step S206 is the same as that of step S204 described above.
  • In step S206, the video search device 1 obtains the annotation information corresponding to each video frame, and in step S207 it establishes a sub-index for each video according to that annotation information. After the video search device 1 receives the user's query sequence in step S201, it sequentially matches, in step S202, the query sequence against the annotation information in the sub-index corresponding to each video, and determines the video frames respectively corresponding to the at least one piece of matched annotation information.
  • In this way, the video search device 1 can provide an in-video search function for locating any video segment within a video: while watching a video, the user inputs, in a search input box, the annotation information corresponding to the video frame or video segment he or she wants to watch, and the video search device 1 searches within that video.
  • After obtaining the video frame corresponding to the annotation information, the video search device 1 locates the playback position at that video frame. If multiple video frames correspond to the annotation information, in step S203 the video search device 1 by default locates the playback position at the frame with the smallest timestamp, that is, the earliest-played video frame.
  • For example, suppose the video V27 is a segment of the movie video "Frozen", and a video frame index corresponding to V27 exists, in which the 285th video frame corresponds to the annotation information "the Snow Queen magically builds an ice palace".
  • When the user inputs the query sequence "ice palace" in the search input box corresponding to V27, the video search device 1 matches it, in step S202, against the annotation information "the Snow Queen magically builds an ice palace" and determines that the corresponding video frame is the 285th video frame; in step S203 it locates the play position at the 285th frame according to that frame's timestamp, and asks the user whether to start playing from that position. If the user agrees, the video is played directly from that video frame.
  • Thus, the video search device 1 enables the user to navigate directly to the desired position by searching, without manually dragging the progress bar.
  • In a preferred embodiment, the video search method further includes step S208 (not shown).
  • In step S208, the video search device 1 sorts the at least one video to obtain the sorted at least one video; in step S203, the video search device 1 locates, according to the at least one video frame, the play position of each of the sorted videos, and provides the sorted at least one video to the user at those play positions.
  • the manner of sorting the at least one video includes but is not limited to:
  • Video publisher information corresponding to the video: if the publisher has published many videos historically and those videos are highly rated, the video has a higher sorting priority or a higher sorting weight.
  • Source information of the video: if the video comes from a relatively well-known large website, for example iQiyi, Youku or Sohu, the video has a higher sorting priority or a higher sorting weight.
  • Clarity or fluency of the video: if the resolution of the video is higher, or playback is smoother, the video has a higher sorting priority or a higher sorting weight.
  • Subject information of the video: if the subject of the video has been popular recently, the video has a higher sorting priority or a higher sorting weight.
  • The matching manner of the query, for example fuzzy matching or exact matching: if the video was obtained by exact matching, it has a higher sorting priority or a higher sorting weight than if it was obtained by fuzzy matching.
  • The present invention also provides a computer readable storage medium storing computer code; when the computer code is executed, any of the video search methods described above is performed.
  • The present invention also provides a computer program product; when the computer program product is executed by a computer device, any of the video search methods described above is performed.
  • The present invention also provides a computer device, comprising a memory and a processor, the memory storing computer code and the processor being configured to execute the computer code so as to perform any of the video search methods described above.
  • FIG. 3 illustrates a block diagram of an exemplary computer system/server 12 suitable for use in implementing embodiments of the present invention.
  • the computer system/server 12 shown in FIG. 3 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • computer system/server 12 is embodied in the form of a general purpose computing device.
  • the components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, system memory 28, and bus 18 that connects different system components, including system memory 28 and processing unit 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • these architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MAC) bus, an Enhanced ISA Bus, a Video Electronics Standards Association (VESA) local bus, and peripheral component interconnects ( PCI) bus.
  • Computer system/server 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer system/server 12, including both volatile and non-volatile media, removable and non-removable media.
  • Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 3, commonly referred to as a "hard disk drive”).
  • Although not shown in FIG. 3, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media), may also be provided.
  • each drive can be coupled to bus 18 via one or more data medium interfaces.
  • Memory 28 can include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present invention.
  • a program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more applications, other programs Modules and program data, each of these examples or some combination may include an implementation of a network environment.
  • Program module 42 typically performs the functions and/or methods of the described embodiments of the present invention.
  • Computer system/server 12 may also be in communication with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, etc.), and may also be in communication with one or more devices that enable a user to interact with the computer system/server 12. And/or in communication with any device (e.g., network card, modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 22. Also, computer system/server 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through network adapter 20.
  • network adapter 20 communicates with other modules of computer system/server 12 via bus 18. It should be understood that although not shown in FIG. 3, other hardware and/or software modules may be utilized in conjunction with computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems. , tape drives, and data backup storage systems.
  • Processing unit 16 executes various functional applications and data processing by running programs stored in memory 28.
  • The memory 28 stores a computer program for performing the functions and processes of the present invention; when the processing unit 16 executes the corresponding computer program, the video content based video search of the present invention is implemented.
  • the present invention can be implemented in software and/or a combination of software and hardware.
  • the various devices of the present invention can be implemented using an application specific integrated circuit (ASIC) or any other similar hardware device.
  • the software program of the present invention may be executed by a processor to implement the steps or functions described above.
  • the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
  • some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.


Abstract

A content-based video search method and video search apparatus. The method comprises: receiving a query sequence input by a user (S201); determining, according to a video frame index, at least one video frame included therein that matches the query sequence and at least one video corresponding thereto, wherein the video frame index is established or updated according to the recognition of information corresponding to each video frame in a video (S202); and locating, according to a timestamp corresponding to the at least one video frame, a playback position of the at least one corresponding video, and providing the at least one video to the user at the playback position (S203). A video matching a query sequence can thus be searched for on the basis of the information of each video frame in a video, so that the exact video a user wants to watch is found; moreover, after the corresponding video is found, its playback position can be set to the video frame corresponding to the query sequence for playback, which reduces the user's operation time and improves the accuracy of video search.

Description

一种基于视频内容的视频搜索方法和视频搜索装置
相关申请的交叉引用
本申请享有2018年1月26日提交的专利申请号为201810077785.0、名称为“一种基于视频内容的视频搜索方法和视频搜索装置”的中国专利申请的优先权,该在先申请的内容以引用方式合并于此。
技术领域
本发明涉及视频搜索领域,尤其涉及一种基于视频内容的视频搜索技术。
背景技术
随着多媒体业务的不断发展,数据库中存储的视频数量越来越多,用户对视频搜索的需求也越来越大。
现有技术主要通过视频标注的外部标签,例如,视频标题,发布者名称,视频摘要,或者简介信息来实现从海量的数据中搜索用户需要的视频。但是用户有时候只能记得某个画面,或者该画面展示或者介绍的内容,经常无法根据视频标注的笼统的外部标签,搜索到包括该画面的视频。
因此,如何提出一种基于视频的内容或者视频帧的内容进行视频搜索的方法,成为本领域亟需解决的技术问题之一。
发明内容
本发明的目的是提供一种基于视频内容的视频搜索装置和视频搜索方法。
根据本发明的一个方面,提供了一种基于视频内容的视频搜索装 置,其中,该视频搜索装置包括:
接收装置,用于接收用户输入的查询序列;
确定装置,用于根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新;
定位装置,用于根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。
优选地,该视频搜索装置还包括:
第一识别装置,用于对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
更新装置,用于根据所述标注信息,建立或更新所述视频帧索引;
其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。
优选地,该视频搜索装置还包括:
第二识别装置,用于对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
建立装置,用于根据所述标注信息,为每一个视频建立一个子索引;
其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。
优选地,所述标注信息通过以下至少任一项获得:
识别一个视频帧图像的图像特征,根据所述图像特征确定所述一个视频帧图像对应的标注信息;
识别一个视频帧图像的图像特征,并根据识别的所述一个视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息;
识别至少一个连续的视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述至少一个连续的视频帧对应的标注信息;
提取至少一个连续的视频帧所对应的字幕信息,根据所述字幕信息确定所述至少一个连续的视频帧对应的标注信息。
优选地,该视频搜索装置还包括:
排序装置,用于对所述至少一个视频进行排序,获得排序后的至少一个视频;
其中,定位装置用于:
根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。
优选地,根据以下至少任一项对所述至少一个视频进行排序:
视频中所包括的与所述查询序列匹配的视频帧的数量;
视频对应的视频发布者信息;
视频的来源信息;
视频的清晰度;
视频的主题信息;
用户对视频的反馈信息。
优选地,所述定位装置还用于:
获取所述用户自所述至少一个视频中所选择的一个目标视频;
若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按以下任一项来定位所述目标视频的播放位置:
默认根据所述目标视频中第一个与所述查询序列匹配的视频帧 的时间戳来确定;
默认定位至与所述查询序列匹配程度最高的视频帧;
由所述用户选择。
根据本发明的另一个方面,还提供了一种基于视频内容的视频搜索方法,其中,该视频搜索方法包括:
a.接收用户输入的查询序列;
b.根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新;
c.根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。
优选地,该视频搜索方法还包括:
对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
根据所述标注信息,建立或更新所述视频帧索引;
其中,步骤b中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。
优选地,该视频搜索方法还包括:
对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
根据所述标注信息,为每一个视频建立一个子索引;
其中,步骤b中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。
优选地,所述标注信息通过以下至少任一项获得:
识别一个视频帧图像的图像特征,根据所述图像特征确定所述一个视频帧图像对应的标注信息;
识别一个视频帧图像的图像特征,并根据识别的所述视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息;
识别一个视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息;
提取一个视频帧所对应的字幕信息,根据所述字幕信息确定所述视频帧对应的标注信息。
优选地,该视频搜索方法还包括:
对所述至少一个视频进行排序,获得排序后的至少一个视频;
其中,步骤c包括:
根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。
优选地,根据以下至少任一项对所述至少一个视频进行排序:
视频中所包括的与所述查询序列匹配的视频帧的数量;
视频对应的视频发布者信息;
视频的来源信息;
视频的清晰度;
视频的主题信息;
用户对视频的反馈信息。
优选地,步骤c中以所述播放位置将所述至少一个视频提供给所述用户还包括:
获取所述用户自所述至少一个视频中所选择的一个目标视频;
若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按 以下任一项来定位所述目标视频的播放位置:
默认根据所述目标视频中第一个与所述查询序列匹配的视频帧的时间戳来确定;
默认定位至与所述查询序列匹配程度最高的视频帧;
由所述用户选择。
根据本发明的又一个方面,还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机代码,当所述计算机代码被执行时,如上任一项所述的视频搜索方法被执行。
根据本发明的再一个方面,还提供了一种计算机程序产品,当所述计算机程序产品被计算机设备执行时,如上任一项所述的视频搜索方法被执行。
根据本发明的再一个方面,还提供了一种计算机设备,所述计算机设备包括存储器和处理器,所述存储器中存储有计算机代码,所述处理器被配置来通过执行所述计算机代码以执行如上任一项所述的方法。
与现有技术相比,本发明具有以下优点:
本发明根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,并且所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新,然后根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户,不仅可以基于视频中每一个视频帧的信息来确定与查询序列匹配对应的视频,为用户精准的找到其要观看的视频,而且在搜索出对应的视频后,可以将视频的播放位置定位至与查询序列对应的视频帧处进行播放,使得用户可以快速的观看到与所述查询序列对应的视频片段,减少了用户的操作时间,而且能准确的为用户定位至与查询序列对应的播放位置。
附图说明
通过阅读参照以下附图所作的对非限制性实施例所作的详细描述,本发明的其它特征、目的和优点将会变得更明显:
图1示出根据本发明一个方面的一种基于视频内容的视频搜索装置的结构示意图;
图2示出根据本发明另一个方面的一种基于视频内容的视频搜索方法的流程示意图;
图3示出了适于用来实现本发明实施方式的示例性计算机系统/服务器的框图。
附图中相同或相似的附图标记代表相同或相似的部件。
具体实施方式
在更加详细地讨论示例性实施例之前应当提到的是,一些示例性实施例被描述成作为流程图描绘的处理或方法。虽然流程图将各项操作描述成顺序的处理,但是其中的许多操作可以被并行地、并发地或者同时实施。此外,各项操作的顺序可以被重新安排。当其操作完成时所述处理可以被终止,但是还可以具有未包括在附图中的附加步骤。所述处理可以对应于方法、函数、规程、子例程、子程序等等。
所述计算机设备包括用户设备与网络设备。其中,所述用户设备包括但不限于电脑、智能手机、PDA等;所述网络设备包括但不限于单个网络服务器、多个网络服务器组成的服务器组或基于云计算(Cloud Computing)的由大量计算机或网络服务器构成的云,其中,云计算是分布式计算的一种,由一群松散耦合的计算机集组成的一个超级虚拟计算机。其中,所述计算机设备可单独运行来实现本发明,也可接入网络并通过与网络中的其他计算机设备的交互操作来实现本发明。其中,所述计算机设备所处的网络包括但不限于互联网、广 域网、城域网、局域网、VPN网络等。
需要说明的是,所述用户设备、网络设备和网络等仅为举例,其他现有的或今后可能出现的计算机设备或网络如可适用于本发明,也应包含在本发明保护范围以内,并以引用方式包含于此。
后面所讨论的方法(其中一些通过流程图示出)可以通过硬件、软件、固件、中间件、微代码、硬件描述语言或者其任意组合来实施。当用软件、固件、中间件或微代码来实施时,用以实施必要任务的程序代码或代码段可以被存储在机器或计算机可读介质(比如存储介质)中。(一个或多个)处理器可以实施必要的任务。
这里所公开的具体结构和功能细节仅仅是代表性的,并且是用于描述本发明的示例性实施例的目的。但是本发明可以通过许多替换形式来具体实现,并且不应当被解释成仅仅受限于这里所阐述的实施例。
应当理解的是,虽然在这里可能使用了术语“第一”、“第二”等等来描述各个单元,但是这些单元不应当受这些术语限制。使用这些术语仅仅是为了将一个单元与另一个单元进行区分。举例来说,在不背离示例性实施例的范围的情况下,第一单元可以被称为第二单元,并且类似地第二单元可以被称为第一单元。这里所使用的术语“和/或”包括其中一个或更多所列出的相关联项目的任意和所有组合。
应当理解的是,当一个单元被称为“连接”或“耦合”到另一单元时,其可以直接连接或耦合到所述另一单元,或者可以存在中间单元。与此相对,当一个单元被称为“直接连接”或“直接耦合”到另一单元时,则不存在中间单元。应当按照类似的方式来解释被用于描述单元之间的关系的其他词语(例如“处于...之间”相比于“直接处于...之间”,“与...邻近”相比于“与...直接邻近”等等)。
这里所使用的术语仅仅是为了描述具体实施例而不意图限制示例性实施例。除非上下文明确地另有所指,否则这里所使用的单数形式“一个”、“一项”还意图包括复数。还应当理解的是,这里所使用的术语“包括”和/或“包含”规定所陈述的特征、整数、步骤、操作、单元和/或组件 的存在,而不排除存在或添加一个或更多其他特征、整数、步骤、操作、单元、组件和/或其组合。
还应当提到的是,在一些替换实现方式中,所提到的功能/动作可以按照不同于附图中标示的顺序发生。举例来说,取决于所涉及的功能/动作,相继示出的两幅图实际上可以基本上同时执行或者有时可以按照相反的顺序来执行。
下面结合附图对本发明作进一步详细描述。
图1示出根据本发明一个方面的一种基于视频内容的视频搜索装置的结构示意图。视频搜索装置1包括:接收装置101、确定装置102、和定位装置103。
在此,视频搜索装置1例如位于计算机设备中,所述计算机设备包括用户设备与网络设备。以下以该视频搜索装置1位于网络设备中为例进行详细描述。
其中,所述网络设备包括但不限于单个网络服务器、多个网络服务器组成的服务器组或基于云计算(Cloud Computing)的由大量计算机或网络服务器构成的云,其中,云计算是分布式计算的一种,由一群松散耦合的计算机集组成的一个超级虚拟计算机。其中,所述计算机设备可单独运行来实现本发明,也可接入网络并通过与网络中的其他计算机设备的交互操作来实现本发明。其中,所述计算机设备所处的网络包括但不限于互联网、广域网、城域网、局域网、VPN网络等。
接收装置101接收用户输入的查询序列。具体地,用户若想搜索某个与查询序列相关的视频片段,或想搜索某个包括前述视频片段的视频时,其通过输入界面输入查询序列,并点击搜索按钮,接收装置101接收用户输入的查询序列,以便后续装置在视频数据库中搜索其中包括与查询序列对应的视频片段,或进一步地,包括该视频片段的视频。例如,用户想搜索其中包括“黄晓明在家吃饭”的视频片段或者其中包括该视频片段的视频,用户在视频搜索输入界面输入:“黄晓明在家吃饭”这一查询序列,接收装置101例如通过一次或者多次 调用应用程序接口的方式接收用户输入的“黄晓明在家吃饭”这一查询序列。
确定装置102根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新。
在此,所述视频帧索引可以为包括所有视频中的视频帧的总索引,也可以是为每个视频分别建立的一个子索引。通过将每一个视频中的每一个视频帧分别进行识别,识别出每一个视频帧所包括的信息,其中,同一个视频帧可以包括不同信息,然后根据每一个视频帧以及每个视频帧分别包括的信息,建立一个对应于所有视频的总的视频帧索引,或者为每一个视频分别建立一个视频帧索引。所述视频帧索引中包括:每一个视频帧分别与其包括的至少一个信息的对应关系。进一步地,所述每一个视频帧还可以分别包括一个属性标记,该属性标记用于标记该视频帧属于哪一个视频。
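作为示意,上述"视频帧与标注信息的对应关系+属性标记"的索引结构可以用如下极简的倒排索引勾勒(其中的类名与字段均为本文说明所假设,并非本发明的实际实现;视频标识即充当属性标记,用于标记视频帧属于哪一个视频):

```python
from collections import defaultdict

class FrameIndex:
    """标注信息 -> {(视频属性标记, 帧号)} 的倒排索引示意。"""

    def __init__(self):
        self.posting = defaultdict(set)

    def add(self, video_id, frame_no, annotations):
        # 同一个视频帧可以对应多条不同的标注信息
        for tag in annotations:
            self.posting[tag].add((video_id, frame_no))

    def frames_for(self, tag):
        # 返回具有该标注信息的全部 (视频, 帧号),按视频与帧号排序
        return sorted(self.posting.get(tag, set()))

index = FrameIndex()
index.add("V1", 1000, ["黄晓明在家吃饭", "黄晓明吃炸酱面", "黄晓明父母"])
index.add("V1", 1025, ["黄晓明在家吃饭", "黄晓明吃炸酱面"])
print(index.frames_for("黄晓明父母"))  # [('V1', 1000)]
```

既可以把所有视频的视频帧加入同一个 FrameIndex 形成总索引,也可以为每个视频各建一个 FrameIndex 作为子索引。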
具体地,视频搜索装置1对每一个视频中每一个视频帧对应的信息进行识别后,自每一个视频帧分别获得至少一个对应该视频帧的内容的内容摘要,所述内容摘要例如分别为根据该视频帧内所包括的不同的信息总结出来的关键词,在此称之为标注信息,因此,视频帧索引中包括至少一个:视频帧分别与该视频帧的不同的标注信息的对应关系。确定装置102根据接收到的查询序列信息,通过精确匹配、模糊匹配或者两者相结合的方式,在视频帧索引中进行匹配查询,如依次判断该查询序列是否命中该视频帧索引中各个视频帧所对应的标注信息,若命中某个标注信息,则该标注信息对应的视频帧即是该用户输入的查询序列所对应的视频帧,并进而可以确定该视频帧对应的视频。例如,确定装置102根据所述至少一个视频帧的属性标记即可确定与其对应的至少一个视频。其中,与所述查询序列匹配对应的所述至少一个视频帧可以是属于同一个视频的视频帧,也可以是属于不同视频的视频帧;所述至少一个视频帧可以是连续的视频帧,也可以 是不连续的视频帧。其中,对视频中每一个视频帧对应的信息的识别包括但不限于:基于单独一帧图像的场景识别、基于连续多帧图像的场景识别、基于图像对应的音频信息的识别、基于图像对应的字幕信息的识别。
例如,视频V1的第1000个视频帧是黄晓明在家吃饭的画面,该视频帧还展示了黄晓明吃炸酱面的内容,或者黄晓明父母的画面,则对该视频帧中信息进行识别后,视频帧索引中可以包括:属性标记为视频V1的第1000个视频帧分别与“黄晓明在家吃饭”、“黄晓明吃炸酱面”和“黄晓明父母”的对应关系。若视频V1的第1000-1049个视频帧都为包括黄晓明在家吃饭,黄晓明吃的是炸酱面,以及黄晓明父母三者的画面,但是第1025-1049个视频帧中没有出现黄晓明父母,则在视频帧索引中,视频V1的第1000-1024个视频帧中的每一个视频帧都分别对应“黄晓明在家吃饭”、“黄晓明吃炸酱面”和“黄晓明父母”;视频V1的第1025-1049个视频帧中的每一个视频帧都分别对应“黄晓明在家吃饭”和“黄晓明吃炸酱面”。若搜索视频的视频数据库中包括视频V1,视频V2,视频V3……等多个视频,可以建立包括该数据库中所有视频中的视频帧的总的视频帧索引,也可以为每一个视频建立一个仅包括其视频帧的视频帧索引,例如为视频V1建立一个对应的视频帧索引时,确定装置102依次在每个视频对应的视频帧索引中为所述查询序列匹配对应的至少一个视频帧。
若接收装置101接收到的查询序列为“黄晓明在家吃饭”,确定装置102可以采用模糊匹配的方式,在所述视频帧索引中查询被“黄晓明”、“黄晓明吃饭”、“黄晓明在家”、“黄晓明在家吃饭”、“明星在家吃饭”、“黄晓明和angelababy在家吃饭”等关键词命中的标注信息,命中的所述标注信息对应的视频帧即是与“黄晓明在家吃饭”所对应的视频帧,并进而可以确定该视频帧对应的视频;或者采用精确匹配方式,仅在所述视频帧索引中搜索对应于“黄晓明在家吃饭”的视频帧及其对应的视频;或者采用模糊匹配和精确匹配相结 合的方式为用户匹配对应的视频帧及视频。若在视频帧索引中被“黄晓明在家吃饭”这一查询序列命中的标注信息为54个,确定装置102确定命中的54个标注信息在视频帧索引中分别对应于:视频V1的第1000-1049个视频帧,视频V3的第225个视频帧,以及视频V8的第25帧、第126帧、第127帧,则与“黄晓明在家吃饭”对应的视频帧包括视频V1的第1000-1049个视频帧,视频V3的第225个视频帧,以及视频V8的第25、126、127个视频帧,与“黄晓明在家吃饭”对应的视频分别为视频V1、V3和V8。
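上述精确匹配与模糊匹配的查询过程可以粗略勾勒如下(此处把模糊匹配简化为子串包含,仅为假设性近似;实际系统可采用分词、同义词扩展等更复杂的方式):

```python
def match_annotations(query, annotations, mode="fuzzy"):
    """在标注信息集合中为查询序列匹配命中项。"""
    hits = []
    for ann in annotations:
        if mode == "exact":
            # 精确匹配:标注信息与查询序列完全一致
            if ann == query:
                hits.append(ann)
        elif query in ann or ann in query:
            # 模糊匹配的一种极简近似:任一方包含另一方即视为命中
            hits.append(ann)
    return hits

annotations = ["黄晓明在家吃饭", "黄晓明", "张杰在唱歌"]
print(match_annotations("黄晓明在家吃饭", annotations))           # 模糊匹配:两条命中
print(match_annotations("黄晓明在家吃饭", annotations, "exact"))  # 精确匹配:一条命中
```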
在此,确定装置102根据视频帧索引确定与查询序列对应的视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或者更新的,因此,确定装置102可以根据视频帧索引,对视频中的每一个视频帧的信息进行检索匹配,确定其中被查询序列命中的信息,根据命中的所述信息确定与查询序列对应的视频帧,从而确定与所述查询序列对应的视频,提供给用户。基于视频的内容,即视频中每一个视频帧的信息进行搜索,使得用户可以更加精准的搜索到自己想要观看的视频,提升了用户的搜索体验。
本领域技术人员应能理解,对视频中每一个视频帧对应的信息的识别的方式仅为举例,现有的或者今后可能出现的对视频中每一个视频帧对应的信息的识别的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
定位装置103根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。具体地,视频中的每个视频帧都有其在该视频中对应的时间戳。确定装置102确定与查询序列对应的所述至少一个视频帧后,定位装置103获得其中的每一个视频帧在各自对应的视频中的时间戳,然后将与所述至少一个视频帧分别对应的视频的播放位置定位至与查询序列匹配的视频帧的位置,以便用户选中该视频进行观看时,该视频自与查询序列对应的视频帧处开始播放。定位装置103使 得用户可以直接观看其想看到的视频片段。
优选地,所述定位装置103获取所述用户自所述至少一个视频中所选择的一个目标视频,若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按以下任一项定位所述目标视频的播放位置:默认根据所述目标视频中第一个与所述查询序列匹配的视频帧的时间戳来确定;由所述用户选择。具体地,用户自定位装置103提供的至少一个视频中选中一目标视频进行观看时,定位装置103获取用户选中的所述目标视频。如果该目标视频中包括多个与所述查询序列匹配的视频帧,则:可以默认将该视频的播放位置定位至时间戳最小的那个视频帧的位置,即播放时间最早的视频帧的位置,也可以默认定位至匹配程度最高的视频帧;将该视频的播放位置随机定位至任意一个匹配的视频帧;或者由用户选择播放位置定位至哪个视频帧,选择的方法有列表形式和播放进度条标注形式,其中所述列表形式包括且不仅仅包括弹窗列表,例如,一视频被用户选中后,定位装置103以弹窗列表的形式为用户提供一个选择框,由用户选择自哪个对应的视频帧开始播放。然后定位装置103将定位了播放位置的所述至少一个视频提供给用户,使得用户无论从中选择哪个视频进行观看,该视频都会直接从与所述查询序列匹配的视频帧处开始播放,因而用户可以直接观看与所述查询序列对应的画面。
接上例,视频V1中与“黄晓明吃饭”匹配的视频帧为:视频V1中的第1000-1049个视频帧、视频V3中的第225个视频帧、视频V8中的第25、126、127个视频帧,则可默认将视频V1的播放位置定位至第1000个视频帧,默认将V3的播放位置定位至第225个视频帧,默认将V8的播放位置定位至第25个视频帧;或者V1定位至第1000-1049个视频帧中的任意一个视频帧、V8的第25、126、127个视频帧中的任意一个视频帧;或者可以由用户选择将播放位置定位至哪个匹配的视频帧。
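上述"默认定位至时间戳最小的匹配视频帧"的定位逻辑可以示意如下(假设时间戳由帧号与帧率换算得到,帧率取值仅为示例):

```python
def locate_playback(matched_frames, fps=25.0, policy="earliest"):
    """根据匹配视频帧确定默认播放位置,返回秒为单位的时间戳。

    matched_frames: 某个目标视频中与查询序列匹配的帧号列表。
    policy="earliest": 默认定位至时间戳最小(即播放时间最早)的匹配帧。
    """
    if not matched_frames:
        return None
    frame = min(matched_frames) if policy == "earliest" else matched_frames[0]
    return frame / fps

# 视频V1中第1000-1049帧与查询序列匹配,默认定位至时间戳最小的第1000帧
print(locate_playback(list(range(1000, 1050))))  # 40.0
```

若由用户选择播放位置,则可跳过默认策略,直接以用户选中的帧号换算时间戳。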
在此,视频搜索装置1不仅可以基于视频中每一个视频帧的信息 来搜索与查询序列匹配对应的视频,为用户准确的找到满足用户观看条件的视频,而且在确定装置102确定对应的视频后,定位装置103可以准确的将该视频的播放位置定位至与查询序列对应的位置,使该视频自与查询序列对应的视频帧处进行播放,从而使得用户可以更加快速的观看到与所述查询序列对应的视频片段,提高了用户的搜索效率,减少了用户的操作时间。因此,视频搜索装置1不仅能为用户准确的搜索到想观看视频,还能将视频的播放位置精确的定位至用户想观看的视频段的位置。
优选地,视频搜索装置1还包括第一识别装置104(未示出)和更新装置105(未示出)。
第一识别装置104对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息。具体地,第一识别装置104分别对每一个视频中的每一个视频帧进行图像识别,以及对每一个视频帧对应的相关音频信息或者字幕信息进行识别,获得对应的至少一个标注信息,其中,所述标注信息为对应的视频帧所包括的内容的摘要。例如,视频V1的第1000个视频帧是黄晓明在家吃饭的画面,该视频帧还包括黄晓明吃炸酱面的内容,或者黄晓明的父母,因此,视频V1的第1000个视频帧分别与标注信息:黄晓明在家吃饭、黄晓明吃炸酱面、黄晓明父母对应。
优选地,获得对应的所述至少一个标注信息的方式包括但不限于:
1)识别一个视频帧图像的图像特征,根据所述图像特征确定所述一个视频帧图像对应的标注信息。具体地,通过图像识别的方式,识别所述视频帧图像的图像特征,根据所述图像特征确定所述视频帧图像中包括的至少一个标注信息,所述至少一个标注信息分别为该视频帧所包括的不同内容的摘要。
2)识别一个视频帧图像的图像特征,并根据识别的所述一个视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息。具体地,若仅根据所述视频帧图像无法确定标注信息,可以根据该视频帧图像的前置至少一个视频帧和后置至少一个视频帧的图像特征,综合判断,确定该视频帧图像包括的至少一个内容摘要,例如,根据前后多个视频帧中物体的位置移动确定视频帧中该物体的状态,然后将所述至少一个内容摘要分别作为该视频帧的至少一个标注信息,并将该视频帧的至少一个标注信息分别与该视频帧的对应关系更新至所述视频帧索引中。例如,一视频帧图像显示的内容为一辆汽车,仅根据一个视频帧无法识别出该视频帧内出现的该汽车是停止的状态还是运动的状态,因此需要根据该视频帧的前置图像或者后置图像共同确定该汽车的运行状态,若根据该视频帧中出现的汽车在其前置图像和后置图像中的位置变化,判断该汽车为运行状态,并且根据位置变化的大小判断汽车为高速运行状态,然后将汽车运行或者汽车高速运行作为该视频帧的标注信息,在视频帧索引中与该视频帧对应。
3)识别一个视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息。具体地,识别所述视频帧对应的音频信息,将所述音频信息进行语音识别,转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息。
4)提取一个视频帧所对应的字幕信息,根据所述字幕信息确定所述视频帧对应的标注信息。
5)通过人工智能的方式,自动识别一个视频帧图像的图像特征、该视频帧图像的前置图像和后置图像的图像特征、该视频帧图像所对应的音频信息的内容、以及该视频帧图像对应的字幕信息的内容,并基于通过人工智能识别的上述信息确定该视频帧图像对应的标注信息。
本领域技术人员应能理解,上述获得对应的所述至少一个标注信息的方式仅为举例,现有的或者今后可能出现的其他获得对应的所述至少一个标注信息的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
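综合上述几种获得标注信息的方式,可以给出如下示意流程(其中的图像识别函数仅为占位实现,实际系统应接入图像识别模型、语音识别与字幕提取模块):

```python
def recognize_image(frame):
    # 占位:实际系统中应为图像特征识别模型的输出
    return set(frame.get("objects", []))

def annotate_frame(frame, prev_frame=None, audio_text=None, subtitle=None):
    """综合多种方式为单个视频帧生成标注信息集合(示意)。"""
    # 方式1:识别本帧图像特征
    tags = recognize_image(frame)
    # 方式2:借助前置图像,根据物体位置移动判断其运动状态
    if prev_frame is not None:
        for obj, pos in frame.get("positions", {}).items():
            old = prev_frame.get("positions", {}).get(obj)
            if old is not None and old != pos:
                tags.add(obj + "运动")
    # 方式3:音频经语音识别转换成的文字信息
    if audio_text:
        tags.add(audio_text)
    # 方式4:提取的字幕信息
    if subtitle:
        tags.add(subtitle)
    return tags

prev = {"objects": ["汽车"], "positions": {"汽车": (0, 0)}}
cur = {"objects": ["汽车"], "positions": {"汽车": (8, 0)}}
print(sorted(annotate_frame(cur, prev_frame=prev)))  # ['汽车', '汽车运动']
```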
更新装置105根据所述标注信息,建立或更新所述视频帧索引,其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。具体地,更新装置105根据视频中每一个视频帧与其对应的标注信息之间的对应关系建立视频帧索引。其中,视频帧索引中包括:每一个视频帧分别与其对应的至少一个标注信息的对应关系,以便视频搜索装置1将查询序列与视频帧索引中的标注信息进行匹配,若匹配成功,则该标注信息对应的视频帧为与查询序列匹配的视频帧,然后视频搜索装置1根据每一个视频帧的属性标记便能确定该视频帧所在的视频。其中,所述视频帧索引可以是根据每一个视频的每一个视频帧建立的总索引,也可以是分别根据任意一个视频中的每一个视频帧建立的一个子索引。当所述视频帧索引为包括每一个视频中的每一个视频帧的总索引时,在接收装置101接收用户的查询序列后,确定装置102根据该查询序列在该视频帧索引中进行匹配,然后根据匹配的至少一个视频帧分别确定其各自对应的视频;若所述视频帧为与每个视频分别对应的至少一个子索引,确定装置102根据接收的查询序列依次在每个视频对应的子索引中进行匹配,在匹配得到对应的视频帧时,确定装置102可以直接获知与该视频帧对应的视频。
例如,视频V112的第48个视频帧是张杰在唱歌的画面,通过该视频帧的音频信息、字幕信息、以及该视频帧上出现的海报信息,第一识别装置104可识别出该视频帧的内容还包括:张杰在中国新歌声唱歌,张杰唱逆战等信息,因此将“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”作为该视频帧的标注信息。在获得该视频中其他视频帧的标注信息以及其他视频的视频帧的标注信息后,更新装置105建立包括视频帧与标注信息的对应关系的视频帧索引。若所述视频帧索引为总的视频帧索引,则其中包括:属性标记为视频V112的第48个视频帧分别与标注信息“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”对应;或者若所述视频帧索引为对应于每个视频的子索引,则视频V112对应的视频帧索引中包括:第48个视频帧分别与标注信息“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”对应。其中,每一个视频帧的标注信息可以有多个,以从不同方面对该视频帧进行全面标注,以便该视频帧被更容易地检索到或者匹配到。
优选地,视频搜索装置1还包括第二识别装置106(未示出)和建立装置107(未示出)。
第二识别装置106对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;建立装置107根据所述标注信息,为每一个视频建立一个子索引;其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。在此,第二识别装置106与前述第一识别装置104的实现方式相同。具体地,第二识别装置106获得与视频帧对应的标注信息,建立装置107根据所述标注信息,为每一个视频建立一个子索引,接收装置101接收用户的查询序列后,确定装置102依次将各个视频分别对应的视频帧子索引中的标注信息与所述查询序列进行匹配对应,并根据与所述查询序列匹配对应的至少一个标注信息确定与所述至少一个标注信息所分别对应的视频帧。
优选地,可以基于视频搜索装置1为用户提供在视频内搜索任意视频段的视频内搜索功能,使得用户在观看视频时通过对其输入的关键词的搜索,便可以直接将播放位置定位至其想观看的视频帧位置。若用户自定位装置103提供的至少一个视频中选择一个视频进行播放观看,或者用户任意点选一个视频进行播放,用户可以根据该视频对 应的视频帧索引,即子索引,在视频内搜索功能输入框中输入用户想观看的任意视频帧或者视频片段所对应的标注信息,以便视频搜索装置1在该视频内进行搜索。在获得该标注信息对应的视频帧后,视频搜索装置1将播放位置定位至与所述标注信息对应的视频帧处。其中,若与该标注信息对应的视频帧为多个时,定位装置103默认将播放位置定位至其时间戳最小的视频帧处,即处于最早播放位置的视频帧处。例如,视频V27为电影视频《冰雪奇缘》的片段,存在与视频V27对应的视频帧索引,其中视频帧索引中包括第285个视频帧与标注信息“冰雪女王施魔法建冰雪宫殿”的对应关系。用户在该视频V27对应的搜索功能输入框中输入“冰雪宫殿”这一查询序列,确定装置102会为用户匹配“冰雪女王施魔法建冰雪宫殿”这一标注信息,然后根据该标注信息确定其对应的视频帧为第285个视频帧,定位装置103根据第285个视频帧的时间戳,将播放位置定位至播放第285个视频帧的位置,并向用户询问是否自该位置开始播放。若用户同意,则该视频直接自“冰雪女王施魔法建冰雪宫殿”这一视频帧开始播放。在此,通过视频搜索装置1实现无需用户手动拖拽,便能通过搜索便直接定位至想观看的位置进行播放。
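上述视频内搜索功能可以勾勒为:在选中视频的子索引内匹配标注信息,并返回时间戳最小的命中帧(示意代码,函数名为本文假设,匹配同样简化为子串包含):

```python
def search_in_video(sub_index, query):
    """在单个视频的子索引(标注信息 -> 帧号列表)内搜索。

    返回时间戳最小(即帧号最小)的匹配帧号;无命中时返回 None。
    """
    hits = [f for ann, frames in sub_index.items() if query in ann for f in frames]
    return min(hits) if hits else None

# 视频V27的子索引:第285个视频帧对应标注信息"冰雪女王施魔法建冰雪宫殿"
v27_index = {"冰雪女王施魔法建冰雪宫殿": [285]}
print(search_in_video(v27_index, "冰雪宫殿"))  # 285
```

得到帧号后,即可按前述方式换算时间戳并向用户询问是否自该位置开始播放。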
优选地,视频搜索装置1还包括排序装置108(未示出)。
排序装置对所述至少一个视频进行排序,获得排序后的至少一个视频;其中,定位装置103用于:根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。具体地,对所述至少一个视频进行排序的方式包括但不限于:
1)视频中所包括的与所述查询序列匹配的视频帧的数量。具体地,例如,与所述查询序列匹配的视频帧的数量多的视频的排序优先级高,或者排序权重较高。
2)视频对应的视频发布者信息。具体地,若视频发布者发布视频的历史记录比较多,且评价较高,则该视频的排序优先级较高或者 排序权重较高。
3)视频的来源信息。具体地,若视频来源于比较知名的大网站,例如,爱奇艺,优酷,搜狐等知名网站,则该视频的排序优先级较高或者排序权重较高。
4)视频的清晰度或流畅度。具体地,若视频的清晰度越高,或者播放越流畅,则该视频的排序优先级较高或者排序权重较高。
5)视频的主题信息。具体地,若视频的主题在最近一段时期内比较热门,则该视频的排序优先级较高或者排序权重较高。
6)用户对视频的反馈信息。具体地,若用户对该视频的反馈或者评分比较高,则该视频的排序优先级较高或者排序权重较高。
7)通过何种方式进行匹配查询,例如,模糊匹配还是精确匹配的方式。若该视频为通过精确匹配获得,则该视频的排序优先级较高或者排序权重较高;若该视频为通过模糊匹配获得,则该视频的排序优先级较低或者排序权重较低。
本领域技术人员应能理解,上述对所述视频进行排序的方式仅为举例,现有的或者今后可能出现的其他对视频进行排序的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
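上述各排序因素可以组合成一个加权打分函数,示意如下(各项权重均为假设值,本发明并未给出具体的打分公式):

```python
def rank_videos(candidates):
    """按多个因素加权打分,对候选视频降序排序(示意)。"""
    def score(v):
        return (2.0 * v.get("matched_frames", 0)     # 匹配视频帧数量
                + 1.0 * v.get("publisher_score", 0)  # 发布者信息
                + 1.0 * v.get("source_score", 0)     # 来源信息
                + 0.5 * v.get("clarity", 0)          # 清晰度/流畅度
                + 0.5 * v.get("topic_heat", 0)       # 主题热度
                + 1.0 * v.get("user_feedback", 0)    # 用户反馈
                + (1.0 if v.get("exact_match") else 0.0))  # 精确匹配加权
    return sorted(candidates, key=score, reverse=True)

videos = [
    {"id": "V3", "matched_frames": 1, "clarity": 2},
    {"id": "V1", "matched_frames": 50, "clarity": 1},
]
print([v["id"] for v in rank_videos(videos)])  # ['V1', 'V3']
```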
图2示出根据本发明另一个方面的一种基于视频内容的视频搜索方法的流程示意图。
在步骤S201中,视频搜索装置1接收用户输入的查询序列。具体地,用户若想搜索某个与查询序列相关的视频片段,或想搜索某个包括前述视频片段的视频时,其通过输入界面输入查询序列,并点击搜索按钮,视频搜索装置1接收用户输入的查询序列,以便后续装置在视频数据库中搜索其中包括与查询序列对应的视频片段,或进一步地,包括该视频片段的视频。例如,用户想搜索其中包括“黄晓明在家吃饭”的视频片段或者其中包括该视频片段的视频,用户在视频搜索输入界面输入:“黄晓明在家吃饭”这一查询序列,在步骤S201 中,视频搜索装置1例如通过一次或者多次调用应用程序接口的方式接收用户输入的“黄晓明在家吃饭”这一查询序列。
在步骤S202中,视频搜索装置1根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新。
在此,所述视频帧索引可以为包括所有视频中的视频帧的总索引,也可以是为每个视频分别建立的一个子索引。通过将每一个视频中的每一个视频帧分别进行识别,识别出每一个视频帧所包括的信息,其中,同一个视频帧可以包括不同信息,然后根据每一个视频帧以及每个视频帧分别包括的信息,建立一个对应于所有视频的总的视频帧索引,或者为每一个视频分别建立一个视频帧索引。所述视频帧索引中包括:每一个视频帧分别与其包括的至少一个信息的对应关系。进一步地,所述每一个视频帧还可以分别包括一个属性标记,该属性标记用于标记该视频帧属于哪一个视频。
具体地,视频搜索装置1对每一个视频中每一个视频帧对应的信息进行识别后,自每一个视频帧分别获得至少一个对应该视频帧的内容的内容摘要,所述内容摘要例如分别为根据该视频帧内所包括的不同的信息总结出来的关键词,在此称之为标注信息,因此,视频帧索引中包括至少一个:视频帧分别与该视频帧的不同的标注信息的对应关系。在步骤S202中,视频搜索装置1根据接收到的查询序列信息,通过精确匹配、模糊匹配或者两者相结合的方式,在视频帧索引中进行匹配查询,如依次判断该查询序列是否命中该视频帧索引中各个视频帧所对应的标注信息,若命中某个标注信息,则该标注信息对应的视频帧即是该用户输入的查询序列所对应的视频帧,并进而可以确定该视频帧对应的视频。例如,在步骤S202中,视频搜索装置1根据所述至少一个视频帧的属性标记即可确定与其对应的至少一个视频。其中,与所述查询序列匹配对应的所述至少一个视频帧可以是属于同 一个视频的视频帧,也可以是属于不同视频的视频帧;所述至少一个视频帧可以是连续的视频帧,也可以是不连续的视频帧。其中,对视频中每一个视频帧对应的信息的识别包括但不限于:基于单独一帧图像的场景识别、基于连续多帧图像的场景识别、基于图像对应的音频信息的识别、基于图像对应的字幕信息的识别。
例如,视频V1的第1000个视频帧是黄晓明在家吃饭的画面,该视频帧还展示了黄晓明吃炸酱面的内容,或者黄晓明父母的画面,则对该视频帧中信息进行识别后,视频帧索引中可以包括:属性标记为视频V1的第1000个视频帧分别与“黄晓明在家吃饭”、“黄晓明吃炸酱面”和“黄晓明父母”的对应关系。若视频V1的第1000-1049个视频帧都为包括黄晓明在家吃饭,黄晓明吃的是炸酱面,以及黄晓明父母三者的画面,但是第1025-1049个视频帧中没有出现黄晓明父母,则在视频帧索引中,视频V1的第1000-1024个视频帧中的每一个视频帧都分别对应“黄晓明在家吃饭”、“黄晓明吃炸酱面”和“黄晓明父母”;视频V1的第1025-1049个视频帧中的每一个视频帧都分别对应“黄晓明在家吃饭”和“黄晓明吃炸酱面”。若搜索视频的视频数据库中包括视频V1,视频V2,视频V3……等多个视频,可以建立包括该数据库中所有视频中的视频帧的总的视频帧索引,也可以为每一个视频建立一个仅包括其视频帧的视频帧索引,例如为视频V1建立一个对应的视频帧索引时,在步骤S202中,视频搜索装置1依次在每个视频对应的视频帧索引中为所述查询序列匹配对应的至少一个视频帧。
若在步骤S201中,视频搜索装置1接收到的查询序列为“黄晓明在家吃饭”,在步骤S202中,视频搜索装置1可以采用模糊匹配的方式,在所述视频帧索引中查询被“黄晓明”、“黄晓明吃饭”、“黄晓明在家”、“黄晓明在家吃饭”、“明星在家吃饭”、“黄晓明和angelababy在家吃饭”等关键词命中的标注信息,命中的所述标注信息对应的视频帧即是与“黄晓明在家吃饭”所对应的视频帧,并 进而可以确定该视频帧对应的视频;或者采用精确匹配方式,仅在所述视频帧索引中搜索对应于“黄晓明在家吃饭”的视频帧及其对应的视频;或者采用模糊匹配和精确匹配相结合的方式为用户匹配对应的视频帧及视频。若在视频帧索引中被“黄晓明在家吃饭”这一查询序列命中的标注信息为54个,在步骤S202中,视频搜索装置1确定命中的54个标注信息在视频帧索引中分别对应于:视频V1的第1000-1049个视频帧,视频V3的第225个视频帧,以及视频V8的第25帧、第126帧、第127帧,则与“黄晓明在家吃饭”对应的视频帧包括视频V1的第1000-1049个视频帧,视频V3的第225个视频帧,以及视频V8的第25、126、127个视频帧,与“黄晓明在家吃饭”对应的视频分别为视频V1、V3和V8。
在此,在步骤S202中,视频搜索装置1根据视频帧索引确定与查询序列对应的视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或者更新的,因此,在步骤S202中,视频搜索装置1可以根据视频帧索引,对视频中的每一个视频帧的信息进行检索匹配,确定其中被查询序列命中的信息,根据命中的所述信息确定与查询序列对应的视频帧,从而确定与所述查询序列对应的视频,提供给用户。基于视频的内容,即视频中每一个视频帧的信息进行搜索,使得用户可以更加精准的搜索到自己想要观看的视频,提升了用户的搜索体验。
本领域技术人员应能理解,对视频中每一个视频帧对应的信息的识别的方式仅为举例,现有的或者今后可能出现的对视频中每一个视频帧对应的信息的识别的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
在步骤S203中,视频搜索装置1根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。具体地,视频中的每个视频帧都有其在该视频中对应的时间戳。在步骤S202中,视频搜索装置1 确定与查询序列对应的所述至少一个视频帧后,在步骤S203中,视频搜索装置1获得其中的每一个视频帧在各自对应的视频中的时间戳,然后将与所述至少一个视频帧分别对应的视频的播放位置定位至与查询序列匹配的视频帧的位置,以便用户选中该视频进行观看时,该视频自与查询序列对应的视频帧处开始播放。视频搜索装置1使得用户可以直接观看其想看到的视频片段。
优选地,视频搜索装置1在步骤S203中获取所述用户自所述至少一个视频中所选择的一个目标视频,若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按以下任一项定位所述目标视频的播放位置:默认根据所述目标视频中第一个与所述查询序列匹配的视频帧的时间戳来确定;由所述用户选择。具体地,用户自视频搜索装置1提供的至少一个视频中选中一目标视频进行观看时,在步骤S203中,视频搜索装置1获取用户选中的所述目标视频。如果该目标视频中包括多个与所述查询序列匹配的视频帧,则:可以默认将该视频的播放位置定位至时间戳最小的那个视频帧的位置,即播放时间最早的视频帧的位置,也可以默认定位至匹配程度最高的视频帧;将该视频的播放位置随机定位至任意一个匹配的视频帧;或者由用户选择播放位置定位至哪个视频帧,选择的方法有列表形式和播放进度条标注形式,其中所述列表形式包括且不仅仅包括弹窗列表,例如,一视频被用户选中后,在步骤S203中,视频搜索装置1以弹窗列表的形式为用户提供一个选择框,由用户选择自哪个对应的视频帧开始播放。然后在步骤S203中,视频搜索装置1将定位了播放位置的所述至少一个视频提供给用户,使得用户无论从中选择哪个视频进行观看,该视频都会直接从与所述查询序列匹配的视频帧处开始播放,因而用户可以直接观看与所述查询序列对应的画面。
接上例,视频V1中与“黄晓明吃饭”匹配的视频帧为:视频V1中的第1000-1049个视频帧、视频V3中的第225个视频帧、视频V8中的第25、126、127个视频帧,则可默认将视频V1的播放位置定位 至第1000个视频帧,默认将V3的播放位置定位至第225个视频帧,默认将V8的播放位置定位至第25个视频帧;或者V1定位至第1000-1049个视频帧中的任意一个视频帧、V8的第25、126、127个视频帧中的任意一个视频帧;或者可以由用户选择将播放位置定位至哪个匹配的视频帧。
在此,视频搜索装置1不仅可以基于视频中每一个视频帧的信息来搜索与查询序列匹配对应的视频,为用户准确的找到满足用户观看条件的视频,而且在视频搜索装置1确定对应的视频后,在步骤S203中,视频搜索装置1可以准确的将该视频的播放位置定位至与查询序列对应的位置,使该视频自与查询序列对应的视频帧处进行播放,从而使得用户可以更加快速的观看到与所述查询序列对应的视频片段,提高了用户的搜索效率,减少了用户的操作时间。因此,视频搜索装置1不仅能为用户准确的搜索到想观看视频,还能将视频的播放位置精确的定位至用户想观看的视频段的位置。
优选地,该视频搜索方法还包括步骤S204(未示出)和步骤S205(未示出)。
在步骤S204中,视频搜索装置1对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息。具体地,在步骤S204中,视频搜索装置1分别对每一个视频中的每一个视频帧进行图像识别,以及对每一个视频帧对应的相关音频信息或者字幕信息进行识别,获得对应的至少一个标注信息,其中,所述标注信息为对应的视频帧所包括的内容的摘要。例如,视频V1的第1000个视频帧是黄晓明在家吃饭的画面,该视频帧还包括黄晓明吃炸酱面的内容,或者黄晓明的父母,因此,视频V1的第1000个视频帧分别与标注信息:黄晓明在家吃饭、黄晓明吃炸酱面、黄晓明父母对应。
优选地,获得对应的所述至少一个标注信息的方式包括但不限于:
1)识别一个视频帧图像的图像特征,根据所述图像特征确定所 述一个视频帧图像对应的标注信息。具体地,通过图像识别的方式,识别所述视频帧图像的图像特征,根据所述图像特征确定所述视频帧图像中包括的至少一个标注信息,所述至少一个标注信息分别为该视频帧所包括的不同内容的摘要。
2)识别一个视频帧图像的图像特征,并根据识别的所述一个视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息。具体地,若仅根据所述视频帧图像无法确定标注信息,可以根据该视频帧图像的前置至少一个视频帧和后置至少一个视频帧的图像特征,综合判断,确定该视频帧图像包括的至少一个内容摘要,例如,根据前后多个视频帧中物体的位置移动确定视频帧中该物体的状态,然后将所述至少一个内容摘要分别作为该视频帧的至少一个标注信息,并将该视频帧的至少一个标注信息分别与该视频帧的对应关系更新至所述视频帧索引中。例如,一视频帧图像显示的内容为一辆汽车,仅根据一个视频帧无法识别出该视频帧内出现的该汽车是停止的状态还是运动的状态,因此需要根据该视频帧的前置图像或者后置图像共同确定该汽车的运行状态,若根据该视频帧中出现的汽车在其前置图像和后置图像中的位置变化,判断该汽车为运行状态,并且根据位置变化的大小判断汽车为高速运行状态,然后将汽车运行或者汽车高速运行作为该视频帧的标注信息,在视频帧索引中与该视频帧对应。
3)识别一个视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息。具体地,识别所述视频帧对应的音频信息,将所述音频信息进行语音识别,转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息。
4)提取一个视频帧所对应的字幕信息,根据所述字幕信息确定所述视频帧对应的标注信息。
5)通过人工智能的方式,自动识别一个视频帧图像的图像特征、该视频帧图像的前置图像和后置图像的图像特征、该视频帧图像所对应的音频信息的内容、以及该视频帧图像对应的字幕信息的内容,并基于通过人工智能识别的上述信息确定该视频帧图像对应的标注信息。
本领域技术人员应能理解,上述获得对应的所述至少一个标注信息的方式仅为举例,现有的或者今后可能出现的其他获得对应的所述至少一个标注信息的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
在步骤S205中,视频搜索装置1根据所述标注信息,建立或更新所述视频帧索引,其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。具体地,在步骤S205中,视频搜索装置1根据视频中每一个视频帧与其对应的标注信息之间的对应关系建立视频帧索引。其中,视频帧索引中包括:每一个视频帧分别与其对应的至少一个标注信息的对应关系,以便视频搜索装置1将查询序列与视频帧索引中的标注信息进行匹配,若匹配成功,则该标注信息对应的视频帧为与查询序列匹配的视频帧,然后视频搜索装置1根据每一个视频帧的属性标记便能确定该视频帧所在的视频。其中,所述视频帧索引可以是根据每一个视频的每一个视频帧建立的总索引,也可以是分别根据任意一个视频中的每一个视频帧建立的一个子索引。当所述视频帧索引为包括每一个视频中的每一个视频帧的总索引时,在视频搜索装置1接收用户的查询序列后,视频搜索装置1根据该查询序列在该视频帧索引中进行匹配,然后根据匹配的至少一个视频帧分别确定其各自对应的视频;若所述视频帧为与每个视频分别对应的至少一个子索引,视频搜索装置1根据接收的查询序列依次在每个视频对应的子索引中进行匹配,在匹配得到对应的视频帧时,视频搜索装置1可以直接获知与该视频帧对应的视频。
例如,视频V112的第48个视频帧是张杰在唱歌的画面,通过该视频帧的音频信息、字幕信息、以及该视频帧上出现的海报信息,在步骤S204中,视频搜索装置1可识别出该视频帧的内容还包括:张杰在中国新歌声唱歌,张杰唱逆战等信息,因此将“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”作为该视频帧的标注信息。在获得该视频中其他视频帧的标注信息以及其他视频的视频帧的标注信息后,在步骤S205中,视频搜索装置1建立包括视频帧与标注信息的对应关系的视频帧索引。若所述视频帧索引为总的视频帧索引,则其中包括:属性标记为视频V112的第48个视频帧分别与标注信息“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”对应;或者若所述视频帧索引为对应于每个视频的子索引,则视频V112对应的视频帧索引中包括:第48个视频帧分别与标注信息“张杰在唱歌”、“张杰在中国新歌声唱歌”、“张杰唱逆战”对应。其中,每一个视频帧的标注信息可以有多个,以从不同方面对该视频帧进行全面标注,以便该视频帧被更容易地检索到或者匹配到。
优选地,该视频搜索方法还包括步骤S206(未示出)和步骤S207(未示出)。
在步骤S206中,视频搜索装置1对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;在步骤S207中,视频搜索装置1根据所述标注信息,为每一个视频建立一个子索引;其中,根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。在此,步骤S206与前述S204的实现过程相同。具体地,在步骤S206中,视频搜索装置1获得与视频帧对应的标注信息,在步骤S207中,视频搜索装置1根据所述标注信息,为每一个视频建立一个子索引,在步骤S201中,视频搜索装置1接收用户的查询序列后,在步骤S202中,视频搜索装置1依次将各个视频分别对应的视频帧子索引中的标注信息与所述查询序列进行匹配对应,并根据与 所述查询序列匹配对应的至少一个标注信息确定与所述至少一个标注信息所分别对应的视频帧。
优选地,可以基于视频搜索装置1为用户提供在视频内搜索任意视频段的视频内搜索功能,使得用户在观看视频时通过对其输入的关键词的搜索,便可以直接将播放位置定位至其想观看的视频帧位置。若用户自视频搜索装置1提供的至少一个视频中选择一个视频进行播放观看,或者用户任意点选一个视频进行播放,用户可以根据该视频对应的视频帧索引,即子索引,在视频内搜索功能输入框中输入用户想观看的任意视频帧或者视频片段所对应的标注信息,以便视频搜索装置1在该视频内进行搜索。在获得该标注信息对应的视频帧后,视频搜索装置1将播放位置定位至与所述标注信息对应的视频帧处。其中,若与该标注信息对应的视频帧为多个时,在步骤S203中,视频搜索装置1默认将播放位置定位至其时间戳最小的视频帧处,即处于最早播放位置的视频帧处。例如,视频V27为电影视频《冰雪奇缘》的片段,存在与视频V27对应的视频帧索引,其中视频帧索引中包括第285个视频帧与标注信息“冰雪女王施魔法建冰雪宫殿”的对应关系。用户在该视频V27对应的搜索功能输入框中输入“冰雪宫殿”这一查询序列,在步骤S202中,视频搜索装置1会为用户匹配“冰雪女王施魔法建冰雪宫殿”这一标注信息,然后根据该标注信息确定其对应的视频帧为第285个视频帧,在步骤S203中,视频搜索装置1根据第285个视频帧的时间戳,将播放位置定位至播放第285个视频帧的位置,并向用户询问是否自该位置开始播放。若用户同意,则该视频直接自“冰雪女王施魔法建冰雪宫殿”这一视频帧开始播放。在此,通过视频搜索装置1实现无需用户手动拖拽,便能通过搜索便直接定位至想观看的位置进行播放。
优选地,该视频搜索方法还包括步骤S208(未示出)。
在步骤S208中,视频搜索装置1对所述至少一个视频进行排序,获得排序后的至少一个视频;其中,在步骤S203中,视频搜索装置 1用于:根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。具体地,对所述至少一个视频进行排序的方式包括但不限于:
1)视频中所包括的与所述查询序列匹配的视频帧的数量。具体地,例如,与所述查询序列匹配的视频帧的数量多的视频的排序优先级高,或者排序权重较高。
2)视频对应的视频发布者信息。具体地,若视频发布者发布视频的历史记录比较多,且评价较高,则该视频的排序优先级较高或者排序权重较高。
3)视频的来源信息。具体地,若视频来源于比较知名的大网站,例如,爱奇艺,优酷,搜狐等知名网站,则该视频的排序优先级较高或者排序权重较高。
4)视频的清晰度或流畅度。具体地,若视频的清晰度越高,或者播放越流畅,则该视频的排序优先级较高或者排序权重较高。
5)视频的主题信息。具体地,若视频的主题在最近一段时期内比较热门,则该视频的排序优先级较高或者排序权重较高。
6)用户对视频的反馈信息。具体地,若用户对该视频的反馈或者评分比较高,则该视频的排序优先级较高或者排序权重较高。
7)通过何种方式进行匹配查询,例如,模糊匹配还是精确匹配的方式。若该视频为通过精确匹配获得,则该视频的排序优先级较高或者排序权重较高;若该视频为通过模糊匹配获得,则该视频的排序优先级较低或者排序权重较低。
本领域技术人员应能理解,上述对所述视频进行排序的方式仅为举例,现有的或者今后可能出现的其他对视频进行排序的方式,如可适用于本发明也应包含在本发明保护范围内,并在此以引用的方式包含于此。
本发明还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机代码,当所述计算机代码被执行时,如上任一项所述的视频搜索方法被执行。
本发明还提供了一种计算机程序产品,当所述计算机程序产品被计算机设备执行时,如上任一项所述的视频搜索方法被执行。
本发明还提供了一种计算机设备,所述计算机设备包括存储器和处理器,所述存储器中存储有计算机代码,所述处理器被配置来通过执行所述计算机代码以执行如上任一项所述的视频搜索方法。
图3示出了适于用来实现本发明实施方式的示例性计算机系统/服务器12的框图。图3显示的计算机系统/服务器12仅仅是一个示例,不应对本发明实施例的功能和使用范围带来任何限制。
如图3所示,计算机系统/服务器12以通用计算设备的形式表现。计算机系统/服务器12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。
总线18表示几类总线结构中的一种或多种,包括存储器总线或者存储器控制器,外围总线,图形加速端口,处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说,这些体系结构包括但不限于工业标准体系结构(ISA)总线,微通道体系结构(MAC)总线,增强型ISA总线、视频电子标准协会(VESA)局域总线以及外围组件互连(PCI)总线。
计算机系统/服务器12典型地包括多种计算机系统可读介质。这些介质可以是任何能够被计算机系统/服务器12访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。
存储器28可以包括易失性存储器形式的计算机系统可读介质,例如随机存取存储器(RAM)30和/或高速缓存存储器32。计算机系 统/服务器12可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例,存储系统34可以用于读写不可移动的、非易失性磁介质(图3未示出,通常称为“硬盘驱动器”)。尽管图3中未示出,可以提供用于对可移动非易失性磁盘(例如“软盘”)读写的磁盘驱动器,以及对可移动非易失性光盘(例如CD-ROM,DVD-ROM或者其它光介质)读写的光盘驱动器。在这些情况下,每个驱动器可以通过一个或者多个数据介质接口与总线18相连。存储器28可以包括至少一个程序产品,该程序产品具有一组(例如至少一个)程序模块,这些程序模块被配置以执行本发明各实施例的功能。
具有一组(至少一个)程序模块42的程序/实用工具40,可以存储在例如存储器28中,这样的程序模块42包括——但不限于——操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本发明所描述的实施例中的功能和/或方法。
计算机系统/服务器12也可以与一个或多个外部设备14(例如键盘、指向设备、显示器24等)通信,还可与一个或者多个使得用户能与该计算机系统/服务器12交互的设备通信,和/或与使得该计算机系统/服务器12能与一个或多个其它计算设备进行通信的任何设备(例如网卡,调制解调器等等)通信。这种通信可以通过输入/输出(I/O)接口22进行。并且,计算机系统/服务器12还可以通过网络适配器20与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器20通过总线18与计算机系统/服务器12的其它模块通信。应当明白,尽管图3中未示出,可以结合计算机系统/服务器12使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。
处理单元16通过运行存储在存储器28中的程序,从而执行各种功能应用以及数据处理。
例如,存储器28中存储有用于执行本发明的各项功能和处理的计算机程序,处理单元16执行相应计算机程序时,本发明的基于视频内容的视频搜索在网络端被实现。
需要注意的是,本发明可在软件和/或软件与硬件的组合体中被实施,例如,本发明的各个装置可采用专用集成电路(ASIC)或任何其他类似硬件设备来实现。在一个实施例中,本发明的软件程序可以通过处理器执行以实现上文所述步骤或功能。同样地,本发明的软件程序(包括相关的数据结构)可以被存储到计算机可读记录介质中,例如,RAM存储器,磁或光驱动器或软磁盘及类似设备。另外,本发明的一些步骤或功能可采用硬件来实现,例如,作为与处理器配合从而执行各个步骤或功能的电路。
对于本领域技术人员而言,显然本发明不限于上述示范性实施例的细节,而且在不背离本发明的精神或基本特征的情况下,能够以其他的具体形式实现本发明。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本发明的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本发明内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。系统权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。

Claims (17)

  1. 一种基于视频内容的视频搜索方法,其中,所述视频搜索方法包括:
    a.接收用户输入的查询序列;
    b.根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新;
    c.根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。
  2. 根据权利要求1所述的视频搜索方法,其中,该视频搜索方法还包括:
    对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
    根据所述标注信息,建立或更新所述视频帧索引;
    其中,步骤b中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
    根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。
  3. 根据权利要求1所述的视频搜索方法,其中,该视频搜索方法还包括:
    对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
    根据所述标注信息,为每一个视频建立一个子索引;
    其中,步骤b中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
    根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。
  4. 根据权利要求2或3所述的视频搜索方法,其中,所述标注信息通过以下至少任一项获得:
    识别一个视频帧图像的图像特征,根据所述图像特征确定所述一个视频帧图像对应的标注信息;
    识别一个视频帧图像的图像特征,并根据识别的所述视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息;
    识别一个视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息;
    提取一个视频帧所对应的字幕信息,根据所述字幕信息确定所述视频帧对应的标注信息。
  5. 根据权利要求1至4中任一项所述的视频搜索方法,其中,该视频搜索方法还包括:
    对所述至少一个视频进行排序,获得排序后的至少一个视频;
    其中,步骤c包括:
    根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。
  6. 根据权利要求5中任一项所述的视频搜索方法,其中,根据以下至少任一项对所述至少一个视频进行排序:
    视频中所包括的与所述查询序列匹配的视频帧的数量;
    视频对应的视频发布者信息;
    视频的来源信息;
    视频的清晰度;
    视频的主题信息;
    用户对视频的反馈信息。
  7. 根据权利要求1至6中任一项所述的视频搜索方法,其中,步 骤c中以所述播放位置将所述至少一个视频提供给所述用户还包括:
    获取所述用户自所述至少一个视频中所选择的一个目标视频;
    若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按以下任一项来定位所述目标视频的播放位置:
    默认根据所述目标视频中第一个与所述查询序列匹配的视频帧的时间戳来确定;
    由所述用户选择。
  8. 一种基于视频内容的视频搜索装置,其中,所述视频搜索装置包括:
    接收装置,用于接收用户输入的查询序列;
    确定装置,用于根据视频帧索引确定其中包括的与所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频,其中,所述视频帧索引根据对视频中每一个视频帧对应的信息的识别所建立或更新;
    定位装置,用于根据所述至少一个视频帧对应的时间戳,定位其对应的至少一个视频的播放位置,以所述播放位置将所述至少一个视频提供给所述用户。
  9. 根据权利要求8所述的视频搜索装置,其中,该视频搜索装置还包括:
    第一识别装置,用于对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
    更新装置,用于根据所述标注信息,建立或更新所述视频帧索引;
    其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
    根据所述查询序列,在所述视频帧索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧及其对应的至少一个视频。
  10. 根据权利要求8所述的视频搜索装置,其中,该视频搜索装 置还包括:
    第二识别装置,用于对每一个视频中每一个视频帧对应的信息进行识别,获得对应的标注信息;
    建立装置,用于根据所述标注信息,为每一个视频建立一个子索引;
    其中,在确定装置中根据所述查询序列匹配对应的至少一个视频帧及其对应的至少一个视频的方式包括:
    根据所述查询序列,依次在各个视频分别对应的各个子索引中匹配确定具有与所述查询序列相匹配的标注信息的至少一个视频帧。
  11. 根据权利要求9或10所述的视频搜索装置,其中,所述标注信息通过以下至少任一项获得:
    识别一个视频帧图像的图像特征,根据所述图像特征确定所述一个视频帧图像对应的标注信息;
    识别一个视频帧图像的图像特征,并根据识别的所述一个视频帧图像的前置图像和后置图像的图像特征,确定对应的标注信息;
    识别一个视频帧所对应的音频信息,将所述音频信息转换成文字信息,根据所述文字信息确定所述视频帧对应的标注信息;
    提取一个视频帧所对应的字幕信息,根据所述字幕信息确定所述视频帧对应的标注信息。
  12. 根据权利要求8至11中任一项所述的视频搜索装置,其中,该视频搜索装置还包括:
    排序装置,用于对所述至少一个视频进行排序,获得排序后的至少一个视频;
    其中,定位装置用于:
    根据所述至少一个视频帧,定位其对应的所述排序后的至少一个视频的播放位置,以所述播放位置将所述排序后的至少一个视频提供给所述用户。
  13. 根据权利要求12中任一项所述的视频搜索装置,其中,根据 以下至少任一项对所述至少一个视频进行排序:
    视频中所包括的与所述查询序列匹配的视频帧的数量;
    视频对应的视频发布者信息;
    视频的来源信息;
    视频的清晰度;
    视频的主题信息;
    用户对视频的反馈信息。
  14. 根据权利要求8至13中任一项所述的视频搜索装置,其中,所述定位装置还用于:
    获取所述用户自所述至少一个视频中所选择的一个目标视频;
    若所述目标视频中包括多个与所述查询序列匹配的视频帧,则按以下任一项来定位所述目标视频的播放位置:
    默认根据所述目标视频中第一个与所述查询序列匹配的视频帧的时间戳来确定;
    由所述用户选择。
  15. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机代码,当所述计算机代码被执行时,如权利要求1至7中任一项所述的视频搜索方法被执行。
  16. 一种计算机程序产品,当所述计算机程序产品被计算机设备执行时,如权利要求1至7中任一项所述的视频搜索方法被执行。
  17. 一种计算机设备,所述计算机设备包括:
    一个或多个处理器;
    存储器,用于存储一个或多个计算机程序;
    当所述一个或多个计算机程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现如权利要求1至7中任一项所述的视频搜索方法。
PCT/CN2019/072392 2018-01-26 2019-01-18 一种基于视频内容的视频搜索方法和视频搜索装置 WO2019144850A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810077785.0A CN108388583A (zh) 2018-01-26 2018-01-26 一种基于视频内容的视频搜索方法和视频搜索装置
CN201810077785.0 2018-01-26

Publications (1)

Publication Number Publication Date
WO2019144850A1 true WO2019144850A1 (zh) 2019-08-01

Family

ID=63077430

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072392 WO2019144850A1 (zh) 2018-01-26 2019-01-18 一种基于视频内容的视频搜索方法和视频搜索装置

Country Status (2)

Country Link
CN (1) CN108388583A (zh)
WO (1) WO2019144850A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388583A (zh) * 2018-01-26 2018-08-10 北京览科技有限公司 一种基于视频内容的视频搜索方法和视频搜索装置
CN109614515B (zh) * 2018-10-30 2020-09-01 北京奇艺世纪科技有限公司 视频搜索评价方法和系统
CN110162665B (zh) * 2018-12-28 2023-06-16 腾讯科技(深圳)有限公司 视频搜索方法、计算机设备及存储介质
CN109933691B (zh) * 2019-02-11 2023-06-09 北京百度网讯科技有限公司 用于内容检索的方法、装置、设备和存储介质
CN109905772B (zh) * 2019-03-12 2022-07-22 腾讯科技(深圳)有限公司 视频片段查询方法、装置、计算机设备及存储介质
CN112019789B (zh) * 2019-05-31 2022-05-31 杭州海康威视数字技术股份有限公司 录像回放方法及装置
CN110825913A (zh) * 2019-09-03 2020-02-21 上海擎测机电工程技术有限公司 专业词抽取和词性标注方法
CN110557683B (zh) * 2019-09-19 2021-08-10 维沃移动通信有限公司 一种视频播放控制方法及电子设备
CN110913241B (zh) * 2019-11-01 2022-09-30 北京奇艺世纪科技有限公司 一种视频检索方法、装置、电子设备及存储介质
CN112911378B (zh) * 2019-12-03 2024-09-24 西安光启智能技术有限公司 一种视频帧的查询方法
CN111538858B (zh) * 2020-05-06 2023-06-23 英华达(上海)科技有限公司 建立视频图谱的方法、装置、电子设备、存储介质
CN111901668B (zh) * 2020-09-07 2022-06-24 三星电子(中国)研发中心 视频播放方法和装置
CN112395420A (zh) * 2021-01-19 2021-02-23 平安科技(深圳)有限公司 视频内容检索方法、装置、计算机设备及存储介质
CN113722537B (zh) * 2021-08-11 2024-04-26 北京奇艺世纪科技有限公司 短视频排序及模型训练方法、装置、电子设备和存储介质
CN116886991B (zh) * 2023-08-21 2024-05-03 珠海嘉立信发展有限公司 生成视频资料的方法、装置、终端设备以及可读存储介质
CN118606513B (zh) * 2024-07-29 2024-10-08 广东科学中心 一种基于属性识别的科普内容匹配系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105120321A (zh) * 2015-08-21 2015-12-02 北京佳讯飞鸿电气股份有限公司 一种视频搜索方法、视频存储方法和相关装置
CN105335387A (zh) * 2014-07-04 2016-02-17 杭州海康威视系统技术有限公司 一种视频云存储系统的检索方法
CN108388583A (zh) * 2018-01-26 2018-08-10 北京览科技有限公司 一种基于视频内容的视频搜索方法和视频搜索装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539929B (zh) * 2009-04-17 2011-04-06 无锡天脉聚源传媒科技有限公司 利用计算机系统进行的电视新闻标引方法
US9684719B2 (en) * 2012-12-10 2017-06-20 Verint Systems Ltd. Object search by description
CN104281651B (zh) * 2014-09-16 2018-05-04 福建星网物联信息系统有限公司 一种海量视频数据检索的方法及其系统
CN105187795B (zh) * 2015-09-14 2018-11-09 博康云信科技有限公司 一种基于视图库的视频标签定位方法及装置
WO2017180928A1 (en) * 2016-04-13 2017-10-19 Tran David Dat Positional recording synchronization system
CN107122456A (zh) * 2017-04-26 2017-09-01 合信息技术(北京)有限公司 展示视频搜索结果的方法和装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335387A (zh) * 2014-07-04 2016-02-17 杭州海康威视系统技术有限公司 一种视频云存储系统的检索方法
CN105120321A (zh) * 2015-08-21 2015-12-02 北京佳讯飞鸿电气股份有限公司 一种视频搜索方法、视频存储方法和相关装置
CN108388583A (zh) * 2018-01-26 2018-08-10 北京览科技有限公司 一种基于视频内容的视频搜索方法和视频搜索装置

Also Published As

Publication number Publication date
CN108388583A (zh) 2018-08-10

Similar Documents

Publication Publication Date Title
WO2019144850A1 (zh) 一种基于视频内容的视频搜索方法和视频搜索装置
US9148619B2 (en) Music soundtrack recommendation engine for videos
CN105677735B (zh) 一种视频搜索方法及装置
US9357242B2 (en) Method and system for automatic tagging in television using crowd sourcing technique
US7908556B2 (en) Method and system for media landmark identification
US8280158B2 (en) Systems and methods for indexing presentation videos
US20190130185A1 (en) Visualization of Tagging Relevance to Video
US8300953B2 (en) Categorization of digital media based on media characteristics
US20190026367A1 (en) Navigating video scenes using cognitive insights
CN107229741B (zh) 信息搜索方法、装置、设备以及存储介质
US20080215548A1 (en) Information search method and system
WO2024001057A1 (zh) 一种基于注意力片段提示的视频检索方法
WO2017080173A1 (zh) 基于自然信息识别的推送系统和方法及一种客户端
JP4746568B2 (ja) 情報提供装置、情報提供方法、及びプログラム
US20210216772A1 (en) Visual Menu
CN113779381B (zh) 资源推荐方法、装置、电子设备和存储介质
WO2016155299A1 (zh) 用于显示网页标记信息的方法与装置
CN113704507B (zh) 数据处理方法、计算机设备以及可读存储介质
CN108702551B (zh) 用于提供视频的概要信息的方法和装置
CN111309200A (zh) 一种扩展阅读内容的确定方法、装置、设备及存储介质
US20150010288A1 (en) Media information server, apparatus and method for searching for media information related to media content, and computer-readable recording medium
CN113486212A (zh) 搜索推荐信息的生成和展示方法、装置、设备及存储介质
CN110035298B (zh) 一种媒体快速播放方法
JP5013840B2 (ja) 情報提供装置、情報提供方法、及びコンピュータプログラム
CN113965798A (zh) 一种视频信息生成、展示方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19744348

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 10/11/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 19744348

Country of ref document: EP

Kind code of ref document: A1