WO2019129075A1 - Method and apparatus for video retrieval, and computer-readable storage medium - Google Patents


Info

Publication number
WO2019129075A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
feature information
information
image
identifier
Prior art date
Application number
PCT/CN2018/123938
Other languages
English (en)
French (fr)
Inventor
贾嘉
俞婷婷
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2019129075A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/71 — Indexing; Data structures therefor; Storage structures
    • G06F16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 — Retrieval using metadata automatically derived from the content
    • G06F16/7847 — Retrieval using low-level visual features of the video content
    • G06F16/7867 — Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Definitions

  • Embodiments of the present disclosure relate to, but are not limited to, the field of information processing, and in particular, to a method and apparatus for video retrieval and a computer readable storage medium.
  • A conventional method for video retrieval on a terminal such as a mobile phone is for the user to attach text labels to specific frames of a video and later retrieve those frames by searching the text.
  • This method has several disadvantages: the user must add text labels manually, one by one, which is laborious and leaves the labeling incomplete; a text label can only be searched within its own video file, so retrieval cannot span multiple videos; and the existing labels cannot be classified automatically, making them hard to maintain and hard to search.
  • An aspect of the present disclosure provides a video retrieval method, including the following steps: after receiving a target image selected by a user, performing image recognition on the target image, extracting feature information, and saving the feature information as a video identifier; performing image recognition on each frame or sample frame of the currently played video, a specified video, or all locally stored videos, and updating the video identifier when feature information identical or similar to the saved feature information is found; and, after receiving a retrieval instruction, retrieving the video identifier according to the information carried by the retrieval instruction.
  • Another aspect of the present disclosure provides an apparatus including: an identification module configured to, after receiving a target image selected by a user, perform pattern recognition on the target image, extract feature information, and save the feature information as a video identifier; an identifier module configured to perform image recognition on each frame or sample frame of the currently played video, a specified video, or all locally stored videos, and to update the video identifier when feature information identical or similar to the saved feature information is found; and a retrieval module configured to, after receiving a retrieval instruction, retrieve the video identifier according to the information carried by the retrieval instruction.
  • Another aspect of the present disclosure provides a computer readable storage medium storing instructions that, when executed by a processor, implement the method as described above.
  • Another aspect of the present disclosure provides an apparatus for video retrieval, comprising a processor and a computer readable storage medium having instructions stored therein which, when executed by the processor, implement the method as described above.
  • FIG. 1 is a flowchart of a method for video retrieval according to an embodiment of the present disclosure
  • Figure 2 is a block diagram of the original system structure of the Android system
  • FIG. 3 is a structural block diagram of a system for video retrieval according to an embodiment of the present disclosure
  • FIG. 4 is a block diagram of a target tag engine of a video retrieval method according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of an apparatus for video retrieval according to an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a method for video retrieval according to an embodiment of the present disclosure.
  • the method of video retrieval according to an embodiment of the present disclosure includes steps S10-S30.
  • In step S10, after receiving the target image selected by the user, pattern recognition is performed on the target image, feature information is extracted, and the feature information is saved as a video identifier.
  • In step S20, image recognition is performed on each frame or sample frame of the currently played video, a specified video, or all locally stored videos, and when feature information identical or similar to the saved feature information is recognized, the video identifier is updated.
  • In step S30, after receiving the retrieval instruction, the video identifier is retrieved based on the information carried by the retrieval instruction.
  • The target image may be marked by the user; the marked object undergoes pattern recognition, and the recognized feature information is saved as a video identifier.
  • The target image may be selected during video recording or playback, or may be an existing image chosen by the user.
  • In the conventional method, text information can only be marked manually by the user, one video at a time.
  • In contrast, the method in this embodiment requires the user to mark only once; the video identifier is then updated automatically for the current video and all local video files, with every occurrence recorded.
  • The method of this embodiment lets the user quickly retrieve videos by text or by picture, find every time point at which the target object appears, and quickly locate and view it.
  • Because the video identifier is updated according to the feature information, when the target video involves a face the retrieval is not affected by the age or dress of the subject, making the results more accurate.
  • The method of the present disclosure can speed up video file retrieval, improve its accuracy, and improve the user experience, especially for users of Android mobile phones.
  • Performing pattern recognition on the target image to extract feature information includes the following steps: determining the identification area of the target image, performing edge detection on the identification area, and extracting the feature information of the main object.
  • The main object may be any one of the following: the object occupying the largest area, the object whose color differs most from the surrounding colors, the object with the highest definition, or another object with the largest recognition factor.
  • The feature information of the primary object may include one or more of the following: the type of the primary object, the color of the primary object, the area ratio of the primary object to the entire frame image, and the region of the entire frame image in which the primary object is located.
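As a concrete illustration of these fields, the primary object's feature information could be held in a small record like the following. All names here are hypothetical, chosen only to mirror the list above; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass

@dataclass
class MainFeatureInfo:
    """Hypothetical record for the primary object's feature information."""
    object_type: str   # type of the primary object, e.g. "face"
    color: str         # dominant color of the primary object
    area_ratio: float  # object area / whole-frame area, in [0, 1]
    region: str        # where in the frame the object sits, e.g. "center"

# Example values, matching the face example used later in the text:
info = MainFeatureInfo(object_type="face", color="brown",
                       area_ratio=0.12, region="center")
```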
  • the pattern recognition of the target image to extract feature information further includes the steps of: identifying one or more secondary objects in the identification area to obtain corresponding secondary feature information.
  • A number of category markings may be added to the video identifier by means of one or more of the "main feature information (category pointer)", "secondary feature information (category pointer)", and "auxiliary feature information (category pointer)" described below; automatically classifying all video identifiers in this way facilitates subsequent categorization and user retrieval.
  • the performing image recognition on the target image to extract feature information further includes: receiving feature information marked by a user.
  • The solution of the present disclosure can also identify a plurality of pieces of feature information (category pointers) for the object marked by the user, such as main feature information, secondary feature information, and auxiliary feature information, and record them to facilitate video classification and user retrieval in subsequent use.
  • The pattern recognition of the target image and the extraction of the feature information further include: enlarging the identification area to identify auxiliary feature information in the enlarged area.
  • The information corresponding to the video identifier includes one or more of the following: a feature image corresponding to the feature information, the time information at which the feature information appears in the video, and the file information of the video file.
  • step S10 may include steps S11-S14.
  • In step S11, during the recording or playing of a video, the user's selection operation is received.
  • The selection operation may be a tap, a box selection, a circle selection, or a selection whose area corresponds to the pressure applied.
  • In step S12, the identification area is determined according to the user's selection operation.
  • In step S13, image analysis is performed on the identification area and feature information is extracted.
  • In step S14, the feature information of the identification area is saved as a video identifier, together with a feature picture of the area and the time point at which the identification area appears in the video; the file information and/or identifier of the video file may further be recorded.
  • Performing image analysis on the identification area and extracting the feature information in step S13 may include step S1300: performing edge detection in the area selected by the user to identify the main object, and identifying the corresponding "category pointer", i.e., the main feature information (category pointer), according to the main object.
  • Step S13 may further include one or more of steps S1301, S1302, and S1303, in any order.
  • In step S1301, one or more secondary objects in the identification area are identified, and the corresponding secondary feature information (category pointers) is obtained.
  • In step S1302, if the user marks the video identification information in advance or edits it afterwards, the "category pointer" marked by the user, i.e., the manual feature information (category pointer), is further obtained.
  • In step S1303, the range of the marked object is expanded, and the auxiliary feature information (category pointer) is identified.
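Steps S1300–S1303 can be sketched as a pipeline that fills in the identifier's four category-pointer fields. Every function below is a stand-in (the real detectors would do edge detection, object recognition, etc.); the field names and return values are assumptions for illustration only.

```python
def build_video_identifier(selected_area, user_tags=None):
    """Sketch of steps S1300-S1303: assemble a video identifier from
    the user-selected area. Each helper stands in for a real detector."""
    identifier = {}
    # S1300: edge detection in the selected area -> main feature info
    identifier["main"] = detect_main_object(selected_area)
    # S1301: secondary objects inside the same area
    identifier["secondary"] = detect_secondary_objects(selected_area)
    # S1302: tags the user marked in advance or edited afterwards
    identifier["manual"] = list(user_tags or [])
    # S1303: enlarge the marked area, then recognize auxiliary features
    identifier["auxiliary"] = detect_auxiliary_features(expand_area(selected_area))
    return identifier

# Stand-in detectors so the sketch runs end to end:
def detect_main_object(area):        return {"type": "face", "color": "brown"}
def detect_secondary_objects(area):  return [{"type": "hat", "color": "yellow"}]
def expand_area(area):               return area  # e.g. grow the bounding box
def detect_auxiliary_features(area): return ["blue shirt"]

ident = build_video_identifier(selected_area=None, user_tags=["James"])
```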
  • Each frame or sample frame of the video may be analyzed (e.g., one frame sampled every 10 or 20 frames, or one frame per time interval, such as every 2 seconds).
  • Image analysis is then performed, and when feature information identical or similar to an existing video identifier is recognized, the video identifier is updated; specifically, the new time point at which the feature appears in the video is added to the video identification information.
  • That is, a partial or global scan examines each frame or sample frame of some or all video files against the feature information of the video identifier, in order to update the identifier.
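The scan-and-update loop described above can be sketched as follows. The `match` similarity test, the frame representation, and the sampling stride are all assumptions; a real implementation would compare extracted image features.

```python
def update_identifier(identifier, frames, fps, sample_every=10, match=None):
    """Scan sampled frames; append a new time point (seconds) to the
    identifier whenever the saved feature is recognized again (step S20)."""
    for idx in range(0, len(frames), sample_every):
        if match(frames[idx], identifier["feature"]):
            t = idx / fps  # time point where the feature appears
            if t not in identifier["time_points"]:
                identifier["time_points"].append(t)
    return identifier

# Toy example: "frames" are strings and matching is a substring test.
ident = {"feature": "man", "time_points": []}
frames = ["sky"] * 10 + ["man"] * 10 + ["sea"] * 10
update_identifier(ident, frames, fps=10.0, sample_every=5,
                  match=lambda frame, feat: feat in frame)
```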
  • the information carried by the retrieval instruction includes a feature image.
  • Before the retrieval instruction is received, the method includes displaying the feature images corresponding to all video identifiers.
  • Retrieving the video identifier according to the information carried by the retrieval instruction includes the following steps: acquiring the corresponding identifier name according to the feature image, searching all video identifiers in the database according to the identifier name, and acquiring the video file information corresponding to the identifier name.
  • the video identification retrieval process of step S30 may include the following methods:
  • the method can include steps S3011-S3013.
  • In step S3011, the feature pictures corresponding to all or some of the video identifiers are displayed to the user.
  • The pictures are displayed by category according to the category pointers of the feature information; when the number of pictures in one category exceeds a threshold, some of them may be hidden.
  • In step S3012, the user's selection is received and the identifier name corresponding to the feature picture is obtained.
  • The identifier name contains the information related to the feature picture that is to be searched and matched against the videos to be retrieved.
  • The identifier name is associated with the feature picture, while the video identifier is associated with the feature video; the feature video contains multiple pictures, only some of which are feature pictures.
  • In step S3013, all video identifiers are searched according to the identifier name, and the corresponding "video file information" is obtained and displayed to the user.
  • The specific presentation mode may be a folder view or video file thumbnails.
  • the video file information may include: a video file path, a video file name, a video file thumbnail, a video file content introduction, and the like.
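Steps S3011–S3013 amount to two lookups: feature picture → identifier name, then identifier name → video file information. A minimal sketch, with an in-memory dict standing in for the database (all names and paths are hypothetical):

```python
# Hypothetical stand-ins for the identifier database:
picture_to_name = {"pic_A.jpg": "identifier_A"}  # feature picture -> identifier name
identifiers = {
    "identifier_A": {
        "file": "/videos/file1.mp4",            # video file path + name
        "thumbnail": "file1_thumb.jpg",         # video file thumbnail
        "time_points": ["00:15", "00:44", "01:35"],
    },
}

def retrieve_by_picture(picture):
    """S3012: picture -> identifier name; S3013: name -> video file info."""
    name = picture_to_name[picture]
    return identifiers[name]

info = retrieve_by_picture("pic_A.jpg")
```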
  • A special mark is displayed on the progress bar at each position where the video identifier appears.
  • The main feature information (category pointer) may have higher priority than the secondary feature information (category pointer), which in turn has higher priority than the auxiliary feature information (category pointer); alternatively, the main feature information may have higher priority than the auxiliary feature information, which in turn has higher priority than the secondary feature information. The priority of the manual feature information (category pointer) is not limited.
  • The method can include steps S3021-S3023.
  • In step S3021, the text retrieval information input by the user is received.
  • In step S3022, all video identifiers are searched based on the text retrieval information.
  • the "main feature information (category pointer)", “secondary feature information (category pointer)” field, “manual feature information (type pointer)”, “auxiliary feature information (category pointer)" of the video identifier are retrieved. One or more of them.
  • step S3023 the "video file information" of the video ID of the retrieved hit is displayed to the user. It can be a file or a way to shorten a video file.
  • Approximate words may also be searched for in all video identifiers according to the text search terms input by the user.
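Text retrieval over the identifier fields, including the approximate-word matching just mentioned, can be sketched as below. The stdlib `difflib.get_close_matches` stands in for a real fuzzy-search component, and the records are hypothetical.

```python
import difflib

# Hypothetical identifier records with the four category-pointer fields:
identifiers = {
    "E": {"main": "mobile phone", "secondary": "black", "manual": "", "auxiliary": ""},
    "F": {"main": "phone",        "secondary": "white", "manual": "", "auxiliary": ""},
    "G": {"main": "beach",        "secondary": "reef",  "manual": "", "auxiliary": ""},
}

def search(term, fields=("main", "secondary", "manual", "auxiliary")):
    """Return identifier names whose fields contain the term exactly,
    or contain an approximate match of it (difflib as a stand-in)."""
    hits = []
    for name, ident in identifiers.items():
        words = [ident[f] for f in fields]
        exact = any(term in w for w in words)
        approx = difflib.get_close_matches(term, words, n=1, cutoff=0.8)
        if exact or approx:
            hits.append(name)
    return hits

result = search("phone")
```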
  • the application environment of the embodiment of the present disclosure may be an Android system mobile phone or other operating systems.
  • the Android system will be described as an example.
  • On an Android mobile phone there are various multimedia applications, such as taking/viewing pictures, recording/playing videos, and recording/playing audio. When the user performs these operations, the phone generates corresponding picture, video, and audio files on the storage device. The Android system uses an SQLite database to manage multimedia files: it scans the storage device and creates a database index based on the file header information of newly generated image, video, and audio files.
  • the file header information may include information such as a video file name, a video format, a video size, a video duration, a video resolution, a video frame rate, and the like.
  • The Android phone includes a file manager application that queries the SQLite database to build a file list view and presents it to the user for selection, for example by tapping a video file name to play that video.
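The header-based media index can be sketched with an in-memory SQLite database. The schema below is illustrative only, not Android's actual MediaProvider schema; the file name and values are invented.

```python
import sqlite3

# Minimal sketch of a header-based media index (illustrative schema):
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE media_index (
    file_name  TEXT, format TEXT, size_bytes INTEGER,
    duration_s REAL, resolution TEXT, frame_rate REAL)""")

# Indexing a newly generated video from its file header information only:
header = ("clip01.mp4", "mp4", 10_485_760, 95.0, "1920x1080", 30.0)
con.execute("INSERT INTO media_index VALUES (?, ?, ?, ?, ?, ?)", header)

# The file manager builds its list view by querying this index:
rows = con.execute("SELECT file_name, duration_s FROM media_index").fetchall()
```

Note that nothing in this index describes the video's *content*, which is exactly the limitation the disclosure addresses.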
  • FIG. 2 is a block diagram of the original system structure of the Android system.
  • Modules such as the video recording application (Camcorder), the video playback application (Video Player), and the file manager application (File Explorer) interact with the SQLite engine through the Media Provider to create the multimedia file data index.
  • the video file index information (such as video file name, video format, video size, video duration, video resolution, video frame rate, etc.) is written to the media provider database.
  • The video playback application and file manager application read the video file index information from the Media Provider database, and the user selects the desired video file based on the index information.
  • A video file on an existing Android mobile phone contains multiple frames of images compressed by a compression algorithm; to obtain the complete video content, the file must be decompressed. Because of this compression, and because images are graphical rather than textual, a video file cannot convey its complete information through a thumbnail the way a picture can, nor can its content be searched directly the way a text file can. Meanwhile, the SQLite database used by Android phones builds its data index only from multimedia file header information, so the content of a video file cannot be described accurately, the user cannot manage video files effectively, and target video files cannot be retrieved quickly.
  • FIG. 3 is a structural block diagram of a system for video retrieval according to an embodiment of the present disclosure.
  • a video recording application (Camcorder), a video playback application (Video Player), a file manager application (File Explorer) and the like implement video tagging and video retrieval by interacting with an object marker (Object Marker) module.
  • the tag data is written to the multimedia file database by the media provider.
  • The Object Marker acquires image data for each frame or sample frame, performs target detection, target tracking, and target recognition on the object marked by the user, and records the result as a video identifier.
  • The file manager application obtains the video file index from the Media Provider and, according to the video tag information in the Media Provider database, displays more detailed video content information to the user, including thumbnails of the marked target images, thereby providing a more intuitive video index.
  • FIG. 4 is a block diagram of a target tag engine of a video retrieval method according to an embodiment of the present disclosure.
  • the Object Marker Engine dispatches modules such as Object Detection, Object Tracking, Object Recognize, and Object Mark.
  • the target tag engine interacts with the video encoder/decoder (Video Encoder/Decoder) to acquire video per frame/sample frame images.
  • the target tag engine interacts with the media provider to write video identification information into the multimedia database index.
  • The user's marking operation is received, such as selecting a frame area (for example, a man's head) at 00:44 of "Video File 1"; image recognition is performed on the image in the frame area, feature information is extracted, and the result is saved as video identifier A, as shown in Table 1:
  • Image analysis is performed for each frame or sample frame of "Video File 1" (e.g., one frame is sampled every 10 or 20 frames, or one frame per time interval, such as every 2 seconds).
  • The video identifier A is then updated. For example, if the man appears starting at 00:15, 00:44, and 01:35 of the video, video identifier A is updated as shown in Table 2:
  • The time information may also be recorded as a time interval, such as [00:44-01:10].
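Recording intervals rather than isolated time points can be sketched by merging consecutive sample hits; the 1-second gap threshold below is an assumption, not specified by the disclosure.

```python
def to_intervals(time_points, max_gap=1.0):
    """Collapse sorted time points (seconds) into [start, end] intervals,
    starting a new interval whenever the gap exceeds max_gap seconds."""
    intervals = []
    for t in sorted(time_points):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t      # extend the current interval
        else:
            intervals.append([t, t])  # start a new interval
    return intervals

# Consecutive hits around 00:44-00:45 collapse into one interval:
spans = to_intervals([44.0, 44.5, 45.0, 70.0])
```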
  • In step S1300, edge detection is performed in the area selected by the user to identify the main object.
  • The main object may be selected as the object occupying the largest area, the object whose color differs most from the other colors, the object with the highest definition, or the object with the largest recognition factor.
  • The corresponding "category pointer" is identified; it may specifically be one or more of: the type of the main object, the color of the main object, the area ratio of the main object to the entire frame image, and the region of the entire frame image in which the main object is located.
  • The category pointer can be a specific value, a specific category, or a numeric range. An example is shown in Table 3:
  • The image analysis methods used may include the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Haar wavelet features, the generalized Hough transform, and other methods.
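As a far simpler stand-in for SIFT/SURF, a gradient-based edge response over a grayscale grid illustrates the edge-detection step. This is a plain Sobel operator written in pure Python for illustration only; it is not one of the methods named above.

```python
def sobel_magnitude(img):
    """Approximate edge strength |Gx| + |Gy| for interior pixels of a
    2-D grayscale grid (a list of equal-length rows of intensities)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

# A vertical light/dark boundary produces a strong response along the edge:
img = [[0, 0, 9, 9]] * 4
edges = sobel_magnitude(img)
```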
  • In step S1301, one or more secondary objects in the identification area are identified, and the corresponding secondary feature information (category pointers) is obtained.
  • Example 1: According to the feature information of video identifier A, secondary objects such as "brown eyes", "brown short hair", and "yellow hat" can be further identified, and the information is updated into video identifier A, for example saving the fields: secondary feature information (category pointer 1_1): eye; secondary feature information (category pointer 1_2): brown; secondary feature information (category pointer 2_1): short hair; secondary feature information (category pointer 2_2): brown; secondary feature information (category pointer 3_1): hat; secondary feature information (category pointer 3_2): yellow.
  • the saved information is shown in Table 4:
  • Example 2: The user marks a beach containing "a brown reef occupying 12% of the image" and "a white shell occupying 5% of the image". After the secondary objects are identified, the secondary feature information (category pointers) is saved as shown in Table 5:
  • In step S1302, if the user marks the video identification information in advance or edits it afterwards, the manual feature information (category pointer) marked by the user is further obtained, or the recognized category pointers are provided to the user for confirmation.
  • For example: user tags "James", "journa", or other tags.
  • In step S1303, the range of the marked object is expanded, the auxiliary feature information is identified, and the corresponding "category pointer" is obtained.
  • After edge detection, the range of the selected marked object's area is enlarged, thereby increasing the identifiable information. For example, where the user originally circled the face of object A, in addition to recognizing the feature information of the face, edge recognition can extend the marked object to the whole body of object A, extract new feature information, and store the result as auxiliary feature information, for example the features "blue shirt" and "worker's card".
  • For example: video identifier B: orange, blue suit; video identifier C: yellow hat, red bow tie.
  • The video identification information may be updated for some or all video files. For example, after video identifier A is added and has been updated for "Video File 1", the identifier is further updated for other video files such as "Video File 2", "Video File 3", and "Video File 4".
  • The updated video identifier A is shown in Table 8:
  • the feature pictures of all video identifiers are displayed to the user.
  • The specific presentation manner may be a folder view, or video file thumbnails.
  • The feature picture is the picture selected by the user and contains the information of interest to the user; the identifier name can be regarded as a database index, used to find and match, via the feature picture, the video identifiers corresponding to all identifier names in the database.
  • The video identifier corresponds one-to-one with the video file, and the video file information includes: a video file path, a video file name, a video file thumbnail, a video file content introduction, and the like.
  • The user inputs a text search term, such as "mobile phone"; the "main feature information (category pointer)" field of all video identifiers in the database is searched, video identification information related to the mobile phone is found, and the feature images of the search results are presented to the user, i.e., the feature pictures corresponding to video identifiers E and F are displayed. Approximate words may also be matched in all video identifiers based on the text entered by the user, as shown in Table 9:
  • the "primary feature information (category pointer)" is greater than “secondary feature information (category pointer)” is greater than “auxiliary feature information (category pointer)", or priority
  • the main feature information (category pointer) is larger than the “secondary feature information (category pointer)” is greater than the “secondary feature information (category pointer)", and the priority of the “feature information (category pointer)” is not limited in this disclosure.
  • The system receives the keywords or key sentences input by the user and segments them into a number of words or phrases.
  • The search is performed in one or more of the fields; each time a word or phrase hits, a certain proportional coefficient is added to, or multiplied into (or a combination of both), the relevance index of the video identifier.
  • The relevance indexes of all video identifiers are then sorted, and the video files most relevant to the search are presented to the user first.
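The scoring-and-sorting scheme just described can be sketched as follows. The per-field coefficients and the additive combination are assumptions for illustration; the disclosure leaves the exact coefficients and combination open.

```python
def rank(query, identifiers, weights=None):
    """Sketch of the relevance ranking: split the query into words, add a
    per-field coefficient to an identifier's score for every word hit,
    then sort identifiers by descending score. Weights are illustrative."""
    weights = weights or {"main": 3.0, "secondary": 2.0, "auxiliary": 1.0}
    words = query.lower().split()
    scores = {}
    for name, ident in identifiers.items():
        score = 0.0
        for field, coeff in weights.items():
            score += coeff * sum(w in ident.get(field, "") for w in words)
        scores[name] = score
    # Most relevant video identifiers first:
    return sorted(scores, key=scores.get, reverse=True)

identifiers = {
    "E": {"main": "mobile phone", "secondary": "black case"},
    "F": {"main": "beach",        "secondary": "phone booth"},
}
order = rank("mobile phone", identifiers)
```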
  • The above embodiment of the present disclosure can, from a single mark by the user, perform image analysis on the marked image, extract feature information, search the current video and other videos for identical or related images according to the feature information, and update the video identifier accordingly.
  • The updated video identification information records every time point, in every local video, at which the marked image appears, and also includes the "feature information (category pointers)" obtained from the feature information.
  • The above-described embodiments of the present disclosure can also identify a plurality of pieces of feature information (category pointers) for the object marked by the user, such as main feature information, secondary feature information, and auxiliary feature information, and record them to facilitate video classification and user retrieval in subsequent use.
  • FIG. 5 is a schematic diagram of an apparatus for video retrieval according to an embodiment of the present disclosure.
  • The apparatus of this embodiment may include: an identification module configured to, after receiving the target image selected by the user, perform pattern recognition on the target image, extract feature information, and save the feature information as a video identifier; an identifier module configured to perform image recognition on each frame or sample frame of a currently played video, a specified video, or all locally stored videos, and to update the video identifier when feature information identical or similar to the saved feature information is found; and a retrieval module configured to, after receiving a retrieval instruction, retrieve the video identifier according to the information carried by the retrieval instruction.
  • The identification module performing pattern recognition on the target image to extract feature information includes: determining the identification area of the target image, performing edge detection on the identification area, and extracting the feature information of the main object.
  • the main object may include any one of the following: an object occupying the largest area, an object having the largest color difference from other colors, and the object having the highest definition.
  • the feature information of the primary object may include one or more of the following: a type of the primary object, a color of the primary object, an area ratio of the primary object to the entire frame image, and a primary object located in a region of the entire frame of the image.
  • the identifying module performs pattern recognition on the target image to extract feature information, and further includes: identifying one or more secondary objects in the identified area to obtain corresponding secondary feature information.
  • the identifying module performs pattern recognition on the target image to extract feature information, and further includes: receiving feature information marked by the user.
  • the identifying module performing pattern recognition on the target image to extract feature information further includes: enlarging the identification area to recognize auxiliary feature information in the enlarged identification area.
  • the information corresponding to the video identifier includes one or more of the following: a feature image corresponding to the feature information, time information of where the feature information appears in the video, and file information of the video file.
  • the device also includes a display module.
  • the display module is configured to display, before the retrieval module receives the retrieval instruction, the feature images corresponding to all the video identifiers, where the information carried by the retrieval instruction includes a feature image,
  • the retrieval module retrieving the video identifier according to the information carried by the retrieval instruction includes: obtaining the corresponding identifier name from the feature image, looking up all video identifiers in the database according to the identifier name, and acquiring the video file information corresponding to the identifier name.
  • An embodiment of the present invention further provides an apparatus for video retrieval, comprising a processor and a computer-readable storage medium storing instructions which, when executed by the processor, implement the above method of video retrieval.
  • Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions which, when executed, implement the method of video retrieval.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure provides a method and apparatus for video retrieval, and a computer-readable storage medium. The method includes: after receiving a target image selected by a user, performing pattern recognition on the target image, extracting feature information, and saving the feature information as a video identifier; performing image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and updating the video identifier when feature information identical or similar to the extracted feature information is recognized; and after receiving a retrieval instruction, retrieving the video identifier according to information carried in the retrieval instruction.

Description

Method and Apparatus for Video Retrieval, and Computer-Readable Storage Medium. Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the field of information processing, and in particular to a method and apparatus for video retrieval and a computer-readable storage medium.
Background
In existing video retrieval methods on terminals such as mobile phones, the user annotates a particular frame of a video with text and later retrieves that image frame through the text. Such methods have the following drawbacks: the user can only annotate text labels manually, one by one, which is labor-intensive and leaves the annotation incomplete; only the text labels already marked in that video file can be searched, so retrieval across videos is impossible; and videos cannot be classified automatically from the existing labels, resulting in poor maintainability and difficult retrieval.
Summary
One aspect of the present disclosure provides a method for video retrieval, including the steps of: after receiving a target image selected by a user, performing pattern recognition on the target image, extracting feature information, and saving the feature information as a video identifier; performing image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and updating the video identifier when feature information identical or similar to the extracted feature information is recognized; and after receiving a retrieval instruction, retrieving the video identifier according to information carried in the retrieval instruction.
Another aspect of the present disclosure provides an apparatus for video retrieval, including: an identification module configured to, after receiving a target image selected by a user, perform pattern recognition on the target image, extract feature information, and save the feature information as a video identifier; an identifier module configured to perform image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and to update the video identifier when feature information identical or similar to the extracted feature information is recognized; and a retrieval module configured to, after receiving a retrieval instruction, retrieve the video identifier according to information carried in the retrieval instruction.
Another aspect of the present disclosure provides a computer-readable storage medium storing instructions which, when executed by a processor, implement the method described above.
Another aspect of the present disclosure provides an apparatus for video retrieval, including a processor and a computer-readable storage medium storing instructions which, when executed by the processor, implement the method described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for video retrieval according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of the original Android system architecture;
FIG. 3 is a block diagram of the system architecture of a method for video retrieval according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of the object marker engine of a method for video retrieval according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for video retrieval according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the present disclosure clearer, embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Note that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.
FIG. 1 is a flowchart of a method for video retrieval according to an embodiment of the present disclosure.
As shown in FIG. 1, the method for video retrieval according to an embodiment of the present disclosure includes steps S10 to S30.
In step S10, after a target image selected by a user is received, pattern recognition is performed on the target image, feature information is extracted, and the feature information is saved as a video identifier.
In step S20, image recognition is performed on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and when feature information identical or similar to the extracted feature information is recognized, the video identifier is updated.
In step S30, after a retrieval instruction is received, the video identifier is retrieved according to information carried in the retrieval instruction (that is, the information of the retrieval instruction).
In this embodiment, the user may mark the target image, and pattern recognition is performed on the object marked by the user to identify its feature information, which is saved as a video identifier. The target image may be selected during video recording or playback, or may be an image selected by the user elsewhere.
Compared with the prior-art approach in which the user must annotate text labels manually one by one, the method of this embodiment requires only a single marking by the user to update the video identifier for that video, or even for all local video files, recording all the video files involved and the time points or time ranges at which the video identifier appears in each video. The method also lets the user quickly retrieve videos by text or by picture, find all video time points at which the target object appears, and quickly locate and view them. Furthermore, because the video identifier is updated against videos based on "feature information", when the target involves a human face the result is not affected by the subject's age or dress, which makes the result more accurate.
The method of the present disclosure can therefore speed up video file retrieval, improve retrieval accuracy, and enhance the user experience, in particular for Android phone users.
In an embodiment, performing pattern recognition on the target image to extract feature information includes the steps of: determining an identification area of the target image, performing edge detection on the identification area, and extracting feature information of a main object.
The main object may be any one of the following: the object occupying the largest area, the object whose color differs most from the other colors, the object with the highest definition, or the object that maximizes some other recognition factor.
The feature information of the main object may include one or more of the following: the type of the main object, the color of the main object, the proportion of the whole frame image occupied by the main object, and the region of the whole frame image in which the main object is located.
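The main-object selection criteria above can be illustrated with a minimal Python sketch. The candidate records, field names, and scores below are hypothetical, since the text only names the criteria (largest area, strongest color difference, highest definition):

```python
# Sketch: choosing the "main object" among detected candidate objects.
# The candidate list and its per-object scores are illustrative assumptions.

def pick_main_object(candidates, criterion="area"):
    """Return the candidate maximizing the chosen recognition factor."""
    keys = {
        "area": lambda c: c["area"],             # largest region area
        "color": lambda c: c["color_contrast"],  # largest color difference
        "sharpness": lambda c: c["sharpness"],   # highest definition
    }
    return max(candidates, key=keys[criterion])

candidates = [
    {"name": "face",  "area": 1200, "color_contrast": 0.4, "sharpness": 0.9},
    {"name": "hat",   "area": 300,  "color_contrast": 0.8, "sharpness": 0.7},
    {"name": "shirt", "area": 2500, "color_contrast": 0.2, "sharpness": 0.5},
]
print(pick_main_object(candidates)["name"])           # shirt
print(pick_main_object(candidates, "color")["name"])  # hat
```

Switching the `criterion` argument switches which of the listed factors decides the main object, mirroring the "any one of the following" wording above.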
In an embodiment, performing pattern recognition on the target image to extract feature information further includes the step of: recognizing one or more secondary objects within the identification area to obtain corresponding secondary feature information.
One or more of the "main feature information (category pointer)", the "secondary feature information (category pointer)", and the "auxiliary feature information (category pointer)" described below may be used to attach several category tags to the video identifier; automatically classifying all video identifiers in this way facilitates subsequent categorization and user retrieval.
In an embodiment, performing pattern recognition on the target image to extract feature information further includes: receiving feature information annotated by the user.
The solution of the present disclosure can also identify multiple pieces of feature information (category pointers) for the object annotated by the user, such as main feature information (category pointer), secondary feature information (category pointer), and auxiliary feature information (category pointer), and record them, which facilitates the classification of videos in subsequent use as well as user retrieval.
In an embodiment, performing pattern recognition on the target image to extract feature information further includes: enlarging the identification area and recognizing auxiliary feature information within the enlarged identification area.
In an embodiment, the information corresponding to the video identifier includes one or more of the following: a feature picture corresponding to the feature information, time information of where the feature information appears in the video, and file information of the video file.
In an embodiment, step S10 may include steps S11 to S14.
In step S11, during video recording or playback, a selection operation of the user is received. Specifically, the selection operation may be tapping, box selection, or circle selection, or the corresponding selection region may be chosen according to the applied pressure.
In step S12, the identification area is determined according to the user's selection operation.
In step S13, image analysis is performed on the identification area to extract its feature information.
In step S14, the feature information of the identification area is saved as a video identifier, recording the feature information of the identification area and the time-point information of where the identification area appears in the video; the file information of the video file and/or a feature picture of the identification area may further be recorded.
In an embodiment, "performing image analysis on the identification area to extract its feature information" in step S13 may include step S1300: within the area selected by the user, perform edge detection to recognize the main object, and identify the corresponding "category pointer" from the main object, i.e., the main feature information (category pointer).
In an embodiment, step S13 may further include one or more of the following steps S1301, S1302, and S1303, in no particular order.
In step S1301, one or more secondary objects within the identification area are recognized to obtain the corresponding secondary feature information (category pointers).
In step S1302, where the user has marked in advance or edits the video identifier information afterwards, the "category pointer" annotated by the user, i.e., the manual feature information (category pointer), is further obtained.
In step S1303, the range of the marked object is enlarged and auxiliary feature information (category pointers) is recognized.
In an embodiment, in step S20, image analysis may be performed on each frame or sampled frame of the video (e.g., sampling one frame every 10 or 20 frames, or one frame every fixed interval such as 2 seconds); when feature information identical or similar to an existing video identifier is recognized, that video identifier is updated; specifically, the new time point at which the identification area appears in the video is added to the video identifier information.
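The sampling policy just described (every N frames, or one frame every fixed interval) can be sketched as follows; the function signature and the default frame rate are illustrative assumptions, not part of the original:

```python
# Sketch of the frame-sampling policy: analyze every N-th frame, or one
# frame every `every_seconds` seconds of playback.

def sampled_frame_indices(total_frames, every_n_frames=None,
                          every_seconds=None, fps=30):
    """Return the indices of the frames selected for image analysis."""
    if every_n_frames is not None:
        step = every_n_frames
    else:
        step = int(every_seconds * fps)
    return list(range(0, total_frames, step))

print(sampled_frame_indices(100, every_n_frames=20))        # [0, 20, 40, 60, 80]
print(sampled_frame_indices(300, every_seconds=2, fps=30))  # [0, 60, 120, 180, 240]
```

Only these sampled frames would then be fed to the image analysis of step S20, trading recognition granularity for processing cost.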
In an embodiment, some or all video files are scanned locally or globally, and each frame or sampled frame is searched according to the feature information of the video identifier so as to update the video identifier.
In an embodiment, the information carried in the retrieval instruction includes a feature image. Before the retrieval instruction is received, the method includes: displaying the feature pictures corresponding to all video identifiers. Retrieving the video identifier according to the information carried in the retrieval instruction includes the steps of: obtaining the corresponding identifier name from the feature image, looking up all video identifiers in the database according to the identifier name, and obtaining the video file information corresponding to the identifier name.
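The two-step lookup just described (feature image to identifier name, then identifier name to video file information) might look like this in outline; the in-memory dictionaries stand in for the database, and all names are hypothetical:

```python
# Sketch of the two-step lookup: feature image -> identifier name -> videos.
# The dictionaries below stand in for the multimedia database.

image_to_identifier = {"thumb_A.jpg": "identifier_A"}
identifier_db = {
    "identifier_A": {
        "videos": [
            {"file": "video1.mp4", "times": ["00:15", "00:44", "01:35"]},
            {"file": "video3.mp4", "times": ["02:10"]},
        ]
    }
}

def retrieve_by_feature_image(feature_image):
    name = image_to_identifier[feature_image]  # step 1: image -> identifier name
    return identifier_db[name]["videos"]       # step 2: name -> video file info

hits = retrieve_by_feature_image("thumb_A.jpg")
print([h["file"] for h in hits])  # ['video1.mp4', 'video3.mp4']
```

The returned records carry both the file information and the occurrence times, which is what allows the player to jump directly to the marked positions.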
The video identifier retrieval flow of step S30 may include the following methods.
Method 1: picture retrieval interface
This method may include steps S3011 to S3013.
In step S3011, the feature pictures corresponding to all or some video identifiers are displayed to the user.
In an embodiment, the pictures are displayed by category according to the category pointers of the feature information; when the number of pictures in one category exceeds a certain limit, some may be hidden.
In step S3012, the user's selection is received, and the identifier name corresponding to the selected feature picture is obtained.
The identifier name contains information related to the feature picture and is matched against the videos to be retrieved. The identifier name belongs to the feature-picture domain, whereas the video identifier belongs to the feature-video domain; a feature video contains many pictures, several of which are feature pictures (not all pictures are identifier pictures).
In step S3013, all video identifiers are searched according to the above identifier name, the corresponding "video file information" is obtained, and it is displayed to the user, either as files or as thumbnail video files. The video file information may include: the video file path, video file name, video file thumbnail, a synopsis of the video file content, and so on.
In an embodiment, after a video file is opened, special marks are displayed on the progress bar at the positions where the video identifier appears.
In an embodiment, during image classification the main feature information (category pointer) has a higher priority than the secondary feature information (category pointer), and the secondary feature information (category pointer) has a higher priority than the auxiliary feature information (category pointer); alternatively, the main feature information (category pointer) has a higher priority than the auxiliary feature information (category pointer), which has a higher priority than the secondary feature information (category pointer). The priority of the manual feature information (category pointer) is not limited.
Method 2: text retrieval interface
This method may include steps S3021 to S3023.
In step S3021, text retrieval information input by the user is received.
In step S3022, all video identifiers are searched according to the text retrieval information.
In an embodiment, one or more of the "main feature information (category pointer)", "secondary feature information (category pointer)", "manual feature information (category pointer)", and "auxiliary feature information (category pointer)" fields of the video identifiers are searched.
In step S3023, the "video file information" of the matched video identifiers is displayed to the user, either as files or as thumbnail video files.
In an embodiment, approximate words of the text search term input by the user may also be looked up among all video identifiers.
The application environment of the embodiments of the present disclosure may be an Android phone, or another operating system. The Android system is taken as an example below.
An Android phone runs various multimedia applications involving taking/viewing pictures, recording/playing video, recording/playing audio, and so on. When the user takes a picture, records a video, etc., the phone generates the corresponding picture, video, or audio file on the storage device. Meanwhile, the Android system manages multimedia files with an SQLite database: it scans the storage device and builds a database index, based on file header information, for newly generated picture, video, and audio files. The file header information may include, for example, the video file name, video format, video size, video duration, video resolution, and video frame rate. The Android phone includes a file manager application which, by querying the SQLite database, creates a file list view and presents it to the user for selection, for example tapping a video file name to play the video.
FIG. 2 is a block diagram of the original Android system architecture. As shown in FIG. 2, modules such as the video recording application (Camcorder), the video playback application (Video Player), and the file manager application (File Explorer) interact with the SQLite engine through the Media Provider to create the multimedia file data index. After the video recording application finishes recording, it writes the video file index information (e.g., video file name, format, size, duration, resolution, frame rate) into the Media Provider database. The video playback application and the file manager application read the video file index information from the Media Provider database, and the user selects the desired video file according to this index information.
Video files on existing Android phones contain multiple frames of images compressed by a compression algorithm; to obtain the full video content, the file must be decompressed. Because of this compression, and because images are a graphical form of expression, a video file cannot convey its full content through a thumbnail the way a picture can, nor can its content be searched directly the way a text file can. Moreover, the SQLite database used by the Android system builds its data index only from the multimedia file header information, so it cannot accurately describe the content of video files; as a result, users cannot manage video files effectively or quickly retrieve a target video file.
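To make the content-indexing idea concrete, here is a hypothetical sketch of storing per-identifier occurrence records in an SQLite table. The schema is an assumption for illustration only; the actual Media Provider schema is not given in the text:

```python
import sqlite3

# Hypothetical table extending the media index with per-identifier
# occurrence records (identifier name, video file, time point).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE video_identifier (
                   name TEXT, video_file TEXT, time_point TEXT)""")
con.executemany(
    "INSERT INTO video_identifier VALUES (?, ?, ?)",
    [("identifier_A", "video1.mp4", "00:15"),
     ("identifier_A", "video1.mp4", "00:44"),
     ("identifier_A", "video2.mp4", "01:35")])

# Retrieval: which files contain identifier_A?
hits = con.execute(
    """SELECT DISTINCT video_file FROM video_identifier
       WHERE name = ? ORDER BY video_file""",
    ("identifier_A",)).fetchall()
print([f for (f,) in hits])  # ['video1.mp4', 'video2.mp4']
```

Unlike an index built from file headers alone, such a table answers content questions ("where does this object appear?") with an ordinary SQL query.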
FIG. 3 is a block diagram of the system architecture of the video retrieval method provided by this embodiment. As shown in FIG. 3, modules such as the video recording application (Camcorder), the video playback application (Video Player), and the file manager application (File Explorer) interact with the Object Marker module to implement video marking and video retrieval; the video marking data are written into the multimedia file database through the Media Provider.
While the video recording application is recording, or the video playback application is playing, the Object Marker obtains the image data of each frame or sampled frame; the user marks it, and the module performs object detection, object tracking, and object recognition on it and records the result as a video identifier.
The file manager application obtains the video file index from the Media Provider and, based on the video marking information in the Media Provider database, shows the user more detailed video content information, together with thumbnails of the marked target images, thereby providing the user with a more intuitive video index.
FIG. 4 is a block diagram of the object marker engine of the video retrieval method provided by this embodiment. As shown in FIG. 4, the Object Marker Engine schedules modules such as Object Detection, Object Tracking, Object Recognize, and Object Mark. The Object Marker Engine interacts with the Video Encoder/Decoder to obtain each frame or sampled frame of the video, and interacts with the Media Provider to write the video identifier information into the multimedia database index.
The method and apparatus of the present disclosure are further described below through specific embodiments.
Embodiment 1:
A marking operation of the user is received, for example a box region (e.g., the head of a man) is selected at 00:44 of "video file 1". Image recognition is performed on the image inside the box region, feature information is extracted, and it is saved as video identifier A, as shown in Table 1:
Table 1 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000001)
Image analysis is performed on each frame or sampled frame of "video file 1" (e.g., sampling one frame every 10 or 20 frames, or one frame every fixed interval such as 2 seconds); whenever information identical to feature information A appears, video identifier A is updated. For example, if the man starts to appear at 00:15, 00:44, and 01:35 of the video, video identifier A is updated as shown in Table 2:
Table 2 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000002)
The time position information may also be recorded as a time interval, e.g., [00:44-01:10].
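The identifier-update step illustrated by Tables 1 and 2 can be sketched as follows; the record layout and field names are assumptions, since the tables themselves are only available as images in the original publication:

```python
# Sketch of updating a video identifier with new occurrence time points,
# loosely mirroring Tables 1 and 2. The field names are assumed.

identifier_A = {"name": "identifier_A",
                "occurrences": {}}  # video file -> list of time points

def update_identifier(identifier, video_file, time_point):
    """Add a newly recognized occurrence, ignoring duplicates."""
    times = identifier["occurrences"].setdefault(video_file, [])
    if time_point not in times:
        times.append(time_point)

for t in ("00:15", "00:44", "01:35", "00:44"):  # the repeat is ignored
    update_identifier(identifier_A, "video1.mp4", t)
print(identifier_A["occurrences"]["video1.mp4"])  # ['00:15', '00:44', '01:35']
```

The same record could equally store time intervals such as "00:44-01:10" in place of single time points, as noted above.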
In step S1300, edge detection is performed within the area selected by the user to recognize the main object.
As stated above, the main object may be selected as the object occupying the largest area, the object whose color differs most from the other colors, the object with the highest definition, or the object that maximizes some other recognition factor.
The corresponding "category pointer" is identified from the main object. Specifically, the category pointer may be one or more of: the type of the main object, the color of the main object, the proportion of the whole frame image occupied by the main object, and the region of the whole frame image in which the main object is located. A category pointer may be a specific value, a specific category, or a range. An example is shown in Table 3:
Table 3 (table content provided as an image in the original publication)
In step S13, the image analysis method used may be Scale-Invariant Feature Transform (SIFT)/Speeded Up Robust Features (SURF), Haar wavelet (haar) features, the generalized Hough transform, or a similar method.
In step S1301, one or more secondary objects within the identification area are recognized to obtain the corresponding secondary feature information (category pointers). For example:
Example 1: secondary objects such as "brown eyes", "short brown hair", and "yellow hat" may further be recognized from feature information A of video identifier A, and this information is added to video identifier A, e.g., saved as the fields secondary feature information (category pointer 1_1): eyes; secondary feature information (category pointer 1_2): brown; secondary feature information (category pointer 2_1): short hair; secondary feature information (category pointer 2_2): brown; secondary feature information (category pointer 3_1): hat; secondary feature information (category pointer 3_2): yellow. The saved information is shown in Table 4:
Table 4 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000004)
Example 2: the user marks a beach that also contains "brown reefs occupying 12% of the image" and "white shells occupying 5% of the image"; after the secondary objects are recognized, the secondary feature information (category pointers as follows) is saved, as shown in Table 5:
Table 5 (table content provided as images in the original publication: Figure PCTCN2018123938-appb-000005, Figure PCTCN2018123938-appb-000006)
In step S1302, where the user has marked in advance or edits the video identifier information afterwards, the manual feature information (category pointers) annotated by the user is further obtained, or the category pointers already recognized are presented to the user for confirmation; for example, the user marks "James", "reporter", or other labels, as shown in Table 6:
Table 6 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000007)
In step S1303, the range of the marked object is enlarged, auxiliary feature information is recognized, and the corresponding "category pointers" are obtained. Specifically, based on edge detection, the region of the selected marked object is enlarged so as to increase the recognizable information. For example, beyond the face portion of object A originally circled by the user, from which the category pointers of the face's feature information are derived, edge recognition may be used to extend the marked object to the whole body of object A, extract new feature information for recognition, and store the result as auxiliary feature information, for example the features "blue shirt" and "work badge". Similarly, after video identifier B (orange orange, blue suit) and video identifier C (yellow hat, red bow tie) are recognized, the saved information is as shown in Table 7:
Table 7 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000008)
When a new video identifier is added to the system, the video identifier information may be updated for some or all video files. For example, after video identifier A is added and has been updated for "video file 1", it is further updated for other videos such as "video file 2", "video file 3", and "video file 4", as shown in Table 8:
Table 8 (table content provided as images in the original publication: Figure PCTCN2018123938-appb-000009, Figure PCTCN2018123938-appb-000010)
Embodiment 2:
The feature pictures of all video identifiers are displayed to the user by category, according to the various feature information (category pointer) fields in the video identifiers.
The user taps a feature picture to retrieve the video information related to it. Specifically, the identifier name corresponding to the feature picture is obtained, all video identifier information in the database is looked up according to the identifier name, and the video file information corresponding to the identifier name, or the video file information together with its time positions, is obtained. The retrieval result is presented to the user, either as folders or as thumbnail video files.
The feature picture is a picture selected by the user and contains the information the user is interested in; the identifier name can be regarded as a database index, used to find/match, via the feature picture, the video identifiers corresponding to all identifier names in the database. Video identifiers correspond one-to-one with video files, and the video information includes: the video file path, video file name, video file thumbnail, a synopsis of the video file content, and so on.
Embodiment 3:
The user enters a text search term, e.g., "mobile phone"; the "main feature information (category pointer)" fields of all video identifiers in the database are then searched to find the video identifier information related to mobile phones, and the feature pictures of the search results are presented to the user, i.e., the feature pictures corresponding to video identifiers E and F are displayed. Approximate words of the user's search term may also be looked up among all video identifiers, as shown in Table 9:
Table 9 (table content provided as an image in the original publication: Figure PCTCN2018123938-appb-000011)
During retrieval, one or more of the "main feature information (category pointer)", "secondary feature information (category pointer)", "manual feature information (category pointer)", and "auxiliary feature information (category pointer)" fields of the video identifiers may also be searched simultaneously. In terms of priority, "main feature information (category pointer)" is greater than "secondary feature information (category pointer)", which is greater than "auxiliary feature information (category pointer)"; alternatively, "main feature information (category pointer)" is greater than "auxiliary feature information (category pointer)", which is greater than "secondary feature information (category pointer)". The priority of the manual feature information (category pointer) is not limited by the present disclosure.
The system receives a keyword or key sentence input by the user and segments it into several words. Each resulting word is searched in one or more of the "main feature information (category pointer)", "secondary feature information (category pointer)", "manual feature information (category pointer)", and "auxiliary feature information (category pointer)" fields of all video identifiers; for each word hit, the relevance index of that video identifier is incremented, or multiplied by a proportional coefficient, or a combination of both. The relevance indices of all video identifiers are then sorted, and the video files most relevant to the search are presented to the user first.
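The relevance-index ranking just described can be sketched as follows. The weight values are illustrative assumptions, since the text fixes only the priority ordering (main over secondary over auxiliary, with one variant), not concrete coefficients:

```python
# Sketch of the relevance-index ranking: each query token that hits a
# field adds that field's weight to the identifier's score.

FIELD_WEIGHTS = {"main": 3.0, "secondary": 2.0, "auxiliary": 1.0, "manual": 2.0}

def relevance(tokens, fields):
    """Accumulate weighted hits of query tokens over identifier fields."""
    return sum(FIELD_WEIGHTS[field]
               for field, words in fields.items()
               for tok in tokens if tok in words)

identifiers = {
    "identifier_E": {"main": ["phone"], "secondary": [], "auxiliary": []},
    "identifier_F": {"main": [], "secondary": ["phone"], "auxiliary": []},
}
tokens = ["phone"]
ranked = sorted(identifiers,
                key=lambda k: relevance(tokens, identifiers[k]), reverse=True)
print(ranked)  # ['identifier_E', 'identifier_F']
```

A hit in the main field outranks the same hit in the secondary field, which matches the stated priority ordering; multiplying by a coefficient instead of adding, as the text also allows, would only change the `relevance` accumulation rule.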
Through a single marking by the user, the above embodiments of the present disclosure perform image analysis on the marked image, extract feature information, search the current video or other videos for identical or related images according to the feature information, and update the video identifier accordingly. The updated video identifier information records all the time-point information of every local video in which the image appears, as well as the "feature information (category pointers)" derived from the feature information. With the video identifiers, the user can quickly search by text or picture and locate every time point in all related videos, achieving the technical effect of fast retrieval.
In addition, the above embodiments of the present disclosure can identify multiple pieces of feature information (category pointers) for the object marked by the user, such as main feature information (category pointer), secondary feature information (category pointer), and auxiliary feature information (category pointer), and record them, which facilitates the classification of videos in subsequent use as well as user retrieval.
FIG. 5 is a schematic diagram of an apparatus for video retrieval according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus of this embodiment may include: an identification module configured to, after receiving a target image selected by a user, perform pattern recognition on the target image, extract feature information, and save the feature information as a video identifier; an identifier module configured to perform image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and to update the video identifier when feature information identical or similar to the extracted feature information is recognized; and a retrieval module configured to, after receiving a retrieval instruction, retrieve the video identifier according to information carried in the retrieval instruction.
In an embodiment, the identification module performing pattern recognition on the target image to extract feature information includes the steps of: determining an identification area of the target image, performing edge detection on the identification area, and extracting feature information of a main object.
The main object may be any one of the following: the object occupying the largest area, the object whose color differs most from the other colors, and the object with the highest definition. The feature information of the main object may include one or more of the following: the type of the main object, the color of the main object, the proportion of the whole frame image occupied by the main object, and the region of the whole frame image in which the main object is located.
In an embodiment, the identification module performing pattern recognition on the target image to extract feature information further includes: recognizing one or more secondary objects within the identification area to obtain corresponding secondary feature information.
In an embodiment, the identification module performing pattern recognition on the target image to extract feature information further includes: receiving feature information annotated by the user.
In an embodiment, the identification module performing pattern recognition on the target image to extract feature information further includes: enlarging the identification area and recognizing auxiliary feature information within the enlarged identification area.
In an embodiment, the information corresponding to the video identifier includes one or more of the following: a feature picture corresponding to the feature information, time information of where the feature information appears in the video, and file information of the video file.
In an embodiment, the apparatus further includes a display module.
The display module is configured to display, before the retrieval module receives a retrieval instruction, the feature pictures corresponding to all video identifiers, the information carried in the retrieval instruction including a feature image.
The retrieval module retrieving the video identifier according to the information carried in the retrieval instruction includes the steps of: obtaining the corresponding identifier name from the feature image, looking up all video identifiers in the database according to the identifier name, and obtaining the video file information corresponding to the identifier name.
An embodiment of the present invention further provides an apparatus for video retrieval, including a processor and a computer-readable storage medium storing instructions which, when executed by the processor, implement the above method of video retrieval.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the method of video retrieval.
Those of ordinary skill in the art will understand that all or some of the steps of the above method can be accomplished by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, the modules/units in the above embodiments may be implemented in the form of hardware or in the form of software functional modules. The present invention is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present disclosure. The present disclosure may of course have various other embodiments, and those skilled in the art may make various corresponding changes and variations according to the present disclosure without departing from its spirit and essence; all such corresponding changes and variations shall fall within the protection scope of the claims appended to the present disclosure.

Claims (17)

  1. A method for video retrieval, comprising:
    after receiving a target image selected by a user, performing pattern recognition on the target image, extracting feature information, and saving the feature information as a video identifier;
    performing image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and updating the video identifier when feature information identical or similar to the extracted feature information is recognized; and
    after receiving a retrieval instruction, retrieving the video identifier according to information carried in the retrieval instruction.
  2. The method according to claim 1, wherein performing pattern recognition on the target image to extract feature information comprises:
    determining an identification area of the target image, performing edge detection on the identification area, and extracting feature information of a main object,
    wherein the main object comprises any one of the following: the object occupying the largest area, the object whose color differs most from the other colors, and the object with the highest definition.
  3. The method according to claim 2, wherein the feature information of the main object comprises one or more of the following:
    the type of the main object, the color of the main object, the proportion of the whole frame image occupied by the main object, and the region of the whole frame image in which the main object is located.
  4. The method according to claim 2, wherein performing pattern recognition on the target image to extract feature information further comprises:
    recognizing one or more secondary objects within the identification area to obtain corresponding secondary feature information.
  5. The method according to claim 2, wherein performing pattern recognition on the target image to extract feature information further comprises:
    receiving feature information annotated by the user.
  6. The method according to claim 2, wherein performing pattern recognition on the target image to extract feature information further comprises:
    enlarging the identification area and recognizing auxiliary feature information within the enlarged identification area.
  7. The method according to claim 1, wherein information corresponding to the video identifier comprises one or more of the following:
    a feature picture corresponding to the feature information, time information of where the feature information appears in the video, and file information of the video file.
  8. The method according to claim 1, wherein
    the information carried in the retrieval instruction comprises a feature image, and before the retrieval instruction is received, the method comprises: displaying feature pictures corresponding to all video identifiers; and
    retrieving the video identifier according to the information carried in the retrieval instruction comprises: obtaining the corresponding identifier name from the feature image, looking up all video identifiers in the database according to the identifier name, and obtaining the video file information corresponding to the identifier name.
  9. An apparatus for video retrieval, comprising:
    an identification module configured to, after receiving a target image selected by a user, perform pattern recognition on the target image, extract feature information, and save the feature information as a video identifier;
    an identifier module configured to perform image recognition on each frame or sampled frame of a currently played video, a specified video, or all locally stored videos, and to update the video identifier when feature information identical or similar to the extracted feature information is recognized; and
    a retrieval module configured to, after receiving a retrieval instruction, retrieve the video identifier according to information carried in the retrieval instruction.
  10. The apparatus according to claim 9, wherein
    the identification module performing pattern recognition on the target image to extract feature information comprises: determining an identification area of the target image, performing edge detection on the identification area, and extracting feature information of a main object,
    the main object comprises any one of the following:
    the object occupying the largest area, the object whose color differs most from the other colors, and the object with the highest definition, and
    the feature information of the main object comprises one or more of the following:
    the type of the main object, the color of the main object, the proportion of the whole frame image occupied by the main object, and the region of the whole frame image in which the main object is located.
  11. The apparatus according to claim 10, wherein
    the identification module performing pattern recognition on the target image to extract feature information further comprises: recognizing one or more secondary objects within the identification area to obtain corresponding secondary feature information.
  12. The apparatus according to claim 10, wherein
    the identification module performing pattern recognition on the target image to extract feature information further comprises: receiving feature information annotated by the user.
  13. The apparatus according to claim 10, wherein
    the identification module performing pattern recognition on the target image to extract feature information further comprises: enlarging the identification area and recognizing auxiliary feature information within the enlarged identification area.
  14. The apparatus according to claim 9, wherein information corresponding to the video identifier comprises one or more of the following:
    a feature picture corresponding to the feature information, time information of where the feature information appears in the video, and file information of the video file.
  15. The apparatus according to claim 9, wherein the apparatus further comprises a display module,
    the display module is configured to display, before the retrieval module receives a retrieval instruction, feature pictures corresponding to all video identifiers, the information carried in the retrieval instruction comprising a feature image, and
    the retrieval module retrieving the video identifier according to the information carried in the retrieval instruction comprises: obtaining the corresponding identifier name from the feature image, looking up all video identifiers in the database according to the identifier name, and obtaining the video file information corresponding to the identifier name.
  16. A computer-readable storage medium storing instructions which, when executed by a processor, implement the method according to any one of claims 1-8.
  17. An apparatus for video retrieval, comprising a processor and a computer-readable storage medium storing instructions which, when executed by the processor, implement the method according to any one of claims 1-8.
PCT/CN2018/123938 2017-12-27 2018-12-26 Method and apparatus for video retrieval, and computer-readable storage medium WO2019129075A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711447279.8 2017-12-27
CN201711447279.8A CN110110147A (zh) 2017-12-27 2017-12-27 Method and apparatus for video retrieval

Publications (1)

Publication Number Publication Date
WO2019129075A1 true WO2019129075A1 (zh) 2019-07-04

Family

ID=67066612

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123938 WO2019129075A1 (zh) 2017-12-27 2018-12-26 视频检索的方法和装置以及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN110110147A (zh)
WO (1) WO2019129075A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020154982A1 (en) * 2019-01-30 2020-08-06 Zheng Shi System and method for composing music with physical cards
CN110598705B (zh) * 2019-09-27 2022-02-22 腾讯科技(深圳)有限公司 Semantic annotation method and apparatus for images
CN110825909A (zh) * 2019-11-05 2020-02-21 北京字节跳动网络技术有限公司 Video image recognition method, apparatus, server, terminal, and storage medium
CN111027406B (zh) * 2019-11-18 2024-02-09 惠州Tcl移动通信有限公司 Picture recognition method and apparatus, storage medium, and electronic device
CN111241344B (zh) * 2020-01-14 2023-09-05 新华智云科技有限公司 Video duplicate-checking method, system, server, and storage medium
CN112347305B (zh) * 2020-12-11 2022-08-26 南京简昊智能科技有限公司 Intelligent operation and maintenance system and control method thereof
CN113254702A (zh) * 2021-05-28 2021-08-13 浙江大华技术股份有限公司 Method and apparatus for video recording retrieval

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211341A (zh) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Intelligent image pattern recognition search method
CN101840422A (zh) * 2010-04-09 2010-09-22 江苏东大金智建筑智能化系统工程有限公司 Intelligent video retrieval system and method based on object features and alarm behaviors
CN102509118A (zh) * 2011-09-28 2012-06-20 安科智慧城市技术(中国)有限公司 Surveillance video retrieval method
CN104504121A (zh) * 2014-12-29 2015-04-08 北京奇艺世纪科技有限公司 Video retrieval method and apparatus
CN104636497A (zh) * 2015-03-05 2015-05-20 四川智羽软件有限公司 Intelligent video data retrieval method
WO2016033676A1 (en) * 2014-09-02 2016-03-10 Netra Systems Inc. System and method for analyzing and searching imagery


Also Published As

Publication number Publication date
CN110110147A (zh) 2019-08-09

Similar Documents

Publication Publication Date Title
WO2019129075A1 (zh) Method and apparatus for video retrieval, and computer-readable storage medium
US9542419B1 (en) Computer-implemented method for performing similarity searches
US8583647B2 (en) Data processing device for automatically classifying a plurality of images into predetermined categories
US8897505B2 (en) System and method for enabling the use of captured images through recognition
US7783135B2 (en) System and method for providing objectified image renderings using recognition information from images
US7636450B1 (en) Displaying detected objects to indicate grouping
US7809722B2 (en) System and method for enabling search and retrieval from image files based on recognized information
US7243101B2 (en) Program, image managing apparatus and image managing method
US20180025215A1 (en) Anonymous live image search
WO2019144850A1 (zh) Video search method and video search apparatus based on video content
US20130179172A1 (en) Image reproducing device, image reproducing method
US8259995B1 (en) Designating a tag icon
US7694885B1 (en) Indicating a tag with visual data
US7813526B1 (en) Normalizing detected objects
US20170140226A1 (en) Apparatus and method for identifying a still image contained in moving image contents
JP2004234228A (ja) Image search device, keyword assignment method in image search device, and program
US20080162561A1 (en) Method and apparatus for semantic super-resolution of audio-visual data
JP2000276484A (ja) Image search device, image search method, and image display device
CN110263746A (zh) Gesture-based visual search
JP6046393B2 (ja) Information processing device, information processing system, information processing method, and recording medium
JP2007280325A (ja) Video display device
US20170154240A1 (en) Methods and systems for identifying an object in a video image
Papadopoulos et al. ClustTour: City exploration by use of hybrid photo clustering
US7702186B1 (en) Classification and retrieval of digital photo assets
US20120059855A1 (en) Method and computer program product for enabling organization of media objects

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18895807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/12/2020).

122 Ep: pct application non-entry in european phase

Ref document number: 18895807

Country of ref document: EP

Kind code of ref document: A1