WO2022070340A1 - Video search system, video search method, and computer program - Google Patents
Video search system, video search method, and computer program
- Publication number
- WO2022070340A1 (application PCT/JP2020/037251)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- similarity
- scene information
- search query
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/735—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
Definitions
- The present invention relates to the technical field of video search systems, video search methods, and computer programs for searching video.
- Patent Document 1 discloses a technique for extracting an image feature amount for each frame from a video and searching for the video.
- Patent Document 2 discloses a technique for searching a video using a still image as a search query.
- The present invention has been made in view of the above problems, and an object of the present invention is to provide a video search system, a video search method, and a computer program capable of appropriately searching for a desired video.
- One aspect of the video search system of the present invention includes a scene information acquisition unit that acquires scene information indicating the scene of a video, a search query acquisition unit that acquires a search query, a similarity calculation unit that calculates the similarity between the scene information and the search query, and a video search unit that searches for the video corresponding to the search query based on the similarity.
- In one aspect of the video search method of the present invention, scene information indicating the scene of a video is acquired, a search query is acquired, the similarity between the scene information and the search query is calculated, and the video corresponding to the search query is searched for based on the similarity.
- One aspect of the computer program of the present invention causes a computer to acquire scene information indicating the scene of a video, acquire a search query, calculate the similarity between the scene information and the search query, and search for the video corresponding to the search query based on the similarity.
- According to each of the above-mentioned video search system, video search method, and computer program, it is possible to appropriately search for a desired video and, in particular, to appropriately execute a video search using natural language.
- FIG. 1 is a block diagram showing a hardware configuration of the video search system according to the first embodiment.
- The video search system 10 includes a CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, and a storage device 14.
- The video search system 10 may further include an input device 15 and an output device 16.
- The CPU 11, the RAM 12, the ROM 13, the storage device 14, the input device 15, and the output device 16 are connected via a data bus 17.
- The CPU 11 reads a computer program.
- For example, the CPU 11 is configured to read a computer program stored in at least one of the RAM 12, the ROM 13, and the storage device 14.
- Alternatively, the CPU 11 may read a computer program stored in a computer-readable recording medium using a recording medium reading device (not shown).
- The CPU 11 may acquire (that is, read) a computer program from a device (not shown) arranged outside the video search system 10 via a network interface.
- The CPU 11 controls the RAM 12, the storage device 14, the input device 15, and the output device 16 by executing the read computer program.
- When the CPU 11 executes the read computer program, a functional block for searching video is realized in the CPU 11.
- The RAM 12 temporarily stores the computer program executed by the CPU 11.
- The RAM 12 also temporarily stores data used by the CPU 11 while the CPU 11 is executing a computer program.
- The RAM 12 may be, for example, a D-RAM (Dynamic RAM).
- The ROM 13 stores a computer program executed by the CPU 11.
- The ROM 13 may also store fixed data.
- The ROM 13 may be, for example, a P-ROM (Programmable ROM).
- The storage device 14 stores data that the video search system 10 retains for a long period of time.
- The storage device 14 may operate as a temporary storage device of the CPU 11.
- The storage device 14 may include, for example, at least one of a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device.
- The input device 15 is a device that receives input instructions from the user of the video search system 10.
- The input device 15 may include, for example, at least one of a keyboard, a mouse, and a touch panel.
- The output device 16 is a device that outputs information about the video search system 10 to the outside.
- The output device 16 may be, for example, a display device (for example, a display) capable of displaying information about the video search system 10.
- FIG. 2 is a block diagram showing a functional block included in the video search system according to the first embodiment.
- FIG. 3 is a block diagram showing a configuration of a modified example of the video search system according to the first embodiment.
- The video search system 10 is configured to be able to search the stored video for a desired video (specifically, a video corresponding to a search query input by a user).
- The video to be searched includes, for example, a video-based life log, but is not particularly limited.
- The video may be stored in, for example, the storage device 14 (see FIG. 1) or a storage means outside the system (for example, a server).
- The video search system 10 includes a scene information acquisition unit 110, a search query acquisition unit 120, a similarity calculation unit 130, and a video search unit 140 as functional blocks for realizing its functions. These functional blocks are realized, for example, in the CPU 11 (see FIG. 1).
- The scene information acquisition unit 110 is configured to be able to acquire scene information indicating the scene of a video.
- The scene information includes, for example, position information indicating where the video was captured, time information, and information indicating the situation and atmosphere when the video was captured.
- The scene information may include other information that may be related to the scene of the video.
- The position information is obtained from, for example, GPS (Global Positioning System).
- The time information is information about the date and time obtained from a time stamp or the like.
- The information indicating the situation, atmosphere, and the like when the video was captured may include information obtained from the behavior of the person capturing the video or of the person being captured.
- One piece of scene information may be assigned to each video, or, for a video in which the scene changes, a plurality of pieces of scene information may be assigned to one video.
- A plurality of pieces of scene information may also be assigned to the video of a given period.
- For example, time information obtained from a time stamp and position information obtained from GPS may both be assigned as scene information to the video of a given period.
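As an illustration of how several kinds of scene information could be attached to periods of one video, the following is a minimal sketch. The class names `SceneInfo` and `Video`, their fields, and the sample values are hypothetical and are not taken from the publication.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SceneInfo:
    """Scene information for one period of a video (hypothetical layout)."""
    start_sec: float
    end_sec: float
    timestamp: Optional[datetime] = None   # time information, e.g. from a time stamp
    latitude: Optional[float] = None       # position information, e.g. from GPS
    longitude: Optional[float] = None
    labels: List[str] = field(default_factory=list)  # words for situation/atmosphere


@dataclass
class Video:
    video_id: str
    scenes: List[SceneInfo] = field(default_factory=list)

    def scenes_at(self, t: float) -> List[SceneInfo]:
        """Return every piece of scene information whose period contains time t."""
        return [s for s in self.scenes if s.start_sec <= t < s.end_sec]


# Made-up sample data: one video whose scene changes after 60 seconds.
video = Video("v1", [
    SceneInfo(0, 60, datetime(2020, 9, 30, 12, 0), 43.06, 141.35, ["lunch"]),
    SceneInfo(60, 120, labels=["office"]),
])
```

Keeping the period on each record lets one video carry one or many pieces of scene information without changing the structure.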
- The scene information acquisition unit 110 may include a storage unit that stores the acquired scene information. The scene information acquired by the scene information acquisition unit 110 is output to the similarity calculation unit 130.
- The search query acquisition unit 120 is configured to be able to acquire the search query input by the user.
- The search query contains information about the video desired by the user (that is, the video to be searched for).
- The search query is entered, for example, in natural language.
- The search query in this case may include, for example, a plurality of words or phrases. Examples of search queries in natural language include "sandwiches eaten while using a computer", "distillation kilns visited", and "lunch eaten in Hokkaido".
- The user can input a search query using, for example, the input device 15 (see FIG. 1).
- The search query acquired by the search query acquisition unit 120 is output to the similarity calculation unit 130.
- The similarity calculation unit 130 is configured to be able to calculate the similarity between the scene information acquired by the scene information acquisition unit 110 and the search query acquired by the search query acquisition unit 120.
- The "similarity" here is calculated as a quantitative parameter indicating the degree of similarity between the scene information and the search query.
- The similarity may be calculated for each of a plurality of videos, or may be calculated for each predetermined period of a video. In this case, the predetermined period may be determined as appropriate according to the video and may be variable.
- The similarity calculation unit 130 may have a function of decomposing a search query into a plurality of words (search terms) by using, for example, a dictionary or morphological analysis.
- The similarity calculation unit 130 may calculate the number of matches between the scene information and the search terms as the similarity.
- The number of matches between the scene information and the search terms may be calculated, for example, over a preset aggregation time (for example, 1 minute or 1 hour).
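The match-count similarity described above can be sketched as follows. This is only an illustration: `split_query` is a naive regular-expression tokenizer with a small hand-picked stop-word list, standing in for the dictionary or morphological analysis mentioned in the text, and both function names are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would rely on a dictionary
# or a morphological analyzer instead.
STOP_WORDS = {"a", "an", "the", "in", "of", "while"}


def split_query(query):
    """Decompose a natural-language query into lowercase search terms."""
    return [w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in STOP_WORDS]


def match_count(scene_words, query):
    """Similarity = number of distinct search terms that also appear in the
    words of the scene information collected over one aggregation window."""
    terms = set(split_query(query))
    words = {w.lower() for w in scene_words}
    return len(terms & words)
```

Here `scene_words` is assumed to hold the words of the scene information that fall inside a single aggregation time, so calling `match_count` once per window gives one similarity value per window.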
- The similarity calculated by the similarity calculation unit 130 is output to the video search unit 140.
- The similarity calculation unit 130 may divide the video into a plurality of scene ranges based on the scene information and calculate the similarity for each scene range.
- The scene range may be set using the bias of the scene information in the video.
- For example, the video is divided at predetermined time intervals (for example, 10 seconds), and for each divided video (hereinafter referred to as a "separated video" as appropriate), the average value of the latitude and longitude information included in its position information is calculated.
- If the difference between the calculated average values of adjacent separated videos is less than a predetermined value, those separated videos are integrated into a single separated video (for example, given separated videos 1, 2, 3, 4, ..., if the difference between 3 and 4 is less than the predetermined value, 3 and 4 are integrated into 5, giving 1, 2, 5, ...). The average value is then recalculated for the integrated separated video, and the same process is repeated until no pair of adjacent separated videos has a difference less than the predetermined value. In this way, videos captured at relatively close locations are set as one scene.
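The integration procedure above can be sketched roughly as follows. The text does not say how the "difference" between two average positions is measured, so this sketch assumes the Euclidean distance between (latitude, longitude) averages in degrees; the function names and the sample coordinates are hypothetical.

```python
from math import hypot


def mean_position(points):
    """Average latitude and longitude of a non-empty list of (lat, lon) samples."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


def merge_scene_ranges(segments, threshold):
    """segments: time-ordered list of separated videos, each given as a
    non-empty list of (lat, lon) samples. Adjacent segments whose average
    positions differ by less than threshold are integrated, the average is
    recomputed, and the scan repeats until no such pair remains."""
    merged = [list(s) for s in segments]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged) - 1):
            a = mean_position(merged[i])
            b = mean_position(merged[i + 1])
            # Assumed difference measure: Euclidean distance in degrees.
            if hypot(a[0] - b[0], a[1] - b[1]) < threshold:
                merged[i:i + 2] = [merged[i] + merged[i + 1]]
                changed = True
                break  # restart the scan with the recomputed average
    return merged


# Made-up sample: two samples near one place, then two near another.
segments = [[(35.0, 139.0)], [(35.0001, 139.0001)], [(43.0, 141.0)], [(43.0002, 141.0001)]]
scene_ranges = merge_scene_ranges(segments, threshold=0.01)
```

With these sample values the four separated videos collapse into two scene ranges, one per location.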
- The scene range may be set using the bias of the object tag.
- The scene range may be set using information about what appears in the video for a certain period of time or longer. For example, a period in which the same object appears continuously for a certain period or longer may be set as one scene range.
- An object tag may be used to identify the object appearing in the video.
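One way to picture a scene range based on continuous appearance of an object is the sketch below. It assumes object tags are available per frame as sets of strings and that "a certain period" is expressed as a minimum number of consecutive frames; the function name and parameters are hypothetical.

```python
def scene_ranges_by_object(frame_tags, tag, min_frames):
    """Return (first_frame, last_frame) ranges in which `tag` appears in at
    least `min_frames` consecutive frames.

    frame_tags: list with one set of object tags per frame, in frame order.
    """
    ranges = []
    start = None  # first frame of the current run, or None when outside a run
    for i, tags in enumerate(frame_tags):
        if tag in tags:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                ranges.append((start, i - 1))
            start = None
    # Close a run that continues to the last frame.
    if start is not None and len(frame_tags) - start >= min_frames:
        ranges.append((start, len(frame_tags) - 1))
    return ranges
```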
- The video search unit 140 searches for the video corresponding to the search query based on the similarity calculated by the similarity calculation unit 130.
- The video search unit 140 searches for, for example, a video whose similarity satisfies a predetermined condition.
- The video search unit 140 may output the found video as a search result. In this case, a plurality of videos may be output.
- The video search unit 140 may output the video having the highest similarity, or may output a plurality of videos in descending order of similarity as the search result.
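A minimal sketch of such a selection step is shown below. The "predetermined condition" is assumed here to be a lower bound on the similarity, and the names `search_videos`, `top_k`, and `min_similarity` are hypothetical.

```python
def search_videos(similarities, top_k=1, min_similarity=0.0):
    """similarities: dict mapping a video ID to its similarity.

    Keeps the videos whose similarity is at least min_similarity and returns
    up to top_k video IDs in descending order of similarity (ties broken by ID).
    """
    hits = [(vid, s) for vid, s in similarities.items() if s >= min_similarity]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [vid for vid, _ in hits[:top_k]]
```

Calling it with `top_k=1` corresponds to outputting only the most similar video, while a larger `top_k` yields several results.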
- The video search unit 140 may have a function of playing back the video output as the search result.
- The video search unit 140 may have a function of displaying an image representing a video output as a search result, such as a thumbnail.
- The video search system 10 may be configured to include a scene information adding unit 150.
- The scene information adding unit 150 adds scene information to the video by using, for example, a scene recognition model trained in advance by machine learning. An existing technique can be adopted as appropriate as the specific method of automatically recognizing a scene and adding scene information.
- When the video search system 10 includes the scene information adding unit 150, the video search can be performed even when scene information has not been added to the video. That is, in the video search system 10, the scene information adding unit 150 can add scene information to the video, after which the video search is performed.
- When the video search system 10 does not include the scene information adding unit 150, it is sufficient to prepare a video to which scene information has been added in advance. In this case, the scene information may be added automatically by video analysis or may be added manually.
- FIG. 4 is a flowchart showing the operation flow of the video search system according to the first embodiment.
- The scene information acquisition unit 110 first acquires scene information from the stored video (step S101).
- Scene information may be added by the scene information adding unit 150 before step S101 is executed.
- The search query acquisition unit 120 acquires the search query entered by the user (step S102).
- The similarity calculation unit 130 calculates the similarity between the scene information acquired by the scene information acquisition unit 110 and the search query acquired by the search query acquisition unit 120 (step S103).
- The video search unit 140 searches for the video corresponding to the search query based on the similarity (step S104).
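The flow of steps S101 to S104 can be summarized in the following sketch. It assumes the scene information is already available as a list of words per video and takes the similarity measure as a parameter; `run_search` and `word_overlap` are hypothetical names, and `word_overlap` is just one simple stand-in for the similarity calculation.

```python
from typing import Callable, Dict, List


def run_search(scene_info: Dict[str, List[str]], query: str,
               similarity: Callable[[List[str], str], float],
               top_k: int = 1) -> List[str]:
    """scene_info: video ID -> words of its scene information (S101).
    query: the user's search query (S102)."""
    # S103: calculate the similarity between each video's scene information and the query.
    scores = {vid: similarity(words, query) for vid, words in scene_info.items()}
    # S104: search for the videos corresponding to the query based on the similarity.
    ranked = sorted(scores, key=lambda vid: (-scores[vid], vid))
    return ranked[:top_k]


def word_overlap(words: List[str], query: str) -> float:
    """Stand-in similarity: number of query words found in the scene words."""
    return float(len({w.lower() for w in words} & set(query.lower().split())))


# Made-up sample data.
scene_info = {"v1": ["office", "computer", "sandwich"], "v2": ["hokkaido", "lunch"]}
result = run_search(scene_info, "lunch eaten in hokkaido", word_overlap)
```

Narrowing down the results with a new query would amount to calling `run_search` again on the subset of videos returned earlier.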
- The video search system 10 may be configured to enable narrowing down of search results. In this case, after a new search query is acquired by the search query acquisition unit 120, the process of step S103 (that is, calculation of the similarity) and the process of step S104 (that is, video search based on the similarity) described above are executed again.
- In the video search system 10 according to the first embodiment, video search is performed based on the similarity between the scene information and the search query. Therefore, it is possible to appropriately search for the video corresponding to the search query. In particular, even when the search query is input in natural language, the video desired by the user can be appropriately found.
- Because video search using a natural-language search query is possible, the desired video can be found in a large amount of video even if some information is missing from the search query. In other words, it is possible to realize highly accurate video search while allowing some ambiguity.
- The second embodiment differs from the first embodiment described above only in part of its configuration and operation (specifically, in that clusters are used for calculating the similarity), and the other parts are substantially the same. Therefore, in the following, the parts different from the first embodiment will be described in detail, and the description of other overlapping parts will be omitted as appropriate.
- FIG. 5 is a block diagram showing a functional block included in the video search system according to the second embodiment.
- FIG. 6 is a table showing an example of a word corresponding to a cluster.
- In FIG. 5, the same components as those shown in FIG. 2 are designated by the same reference numerals.
- The video search system 10 according to the second embodiment includes a word vector analysis unit 50, a word clustering unit 60, a word cluster information storage unit 70, a scene information acquisition unit 110, a search query acquisition unit 120, a similarity calculation unit 130, a video search unit 140, a first cluster acquisition unit 160, and a second cluster acquisition unit 170. That is, in addition to the configuration of the first embodiment (see FIG. 2), the video search system 10 according to the second embodiment further includes the word vector analysis unit 50, the word clustering unit 60, the word cluster information storage unit 70, the first cluster acquisition unit 160, and the second cluster acquisition unit 170.
- The word vector analysis unit 50 is configured to analyze document data and convert words contained in the document into vector data (hereinafter referred to as "word vectors" as appropriate).
- The document data may be, for example, a general document such as a website or a dictionary, or a document related to the video (for example, a document related to the business or service of the photographer of the video).
- By using a document related to the video, it is possible to analyze similarity based on technical terms related to the video rather than the similarity of general words.
- The word vector analysis unit 50 converts words into word vectors by using, for example, a word embedding method such as word2vec or a document embedding method such as doc2vec.
- The word vectors generated by the word vector analysis unit 50 are output to the word clustering unit 60.
- The word clustering unit 60 is configured to be able to cluster the words based on the word vectors generated by the word vector analysis unit 50.
- The word clustering unit 60 may perform clustering based on the similarity between word vectors.
- The word clustering unit 60 performs clustering by, for example, k-means based on the cosine similarity or the Euclidean distance between word vectors.
- The clustering method is not particularly limited.
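To make the clustering step concrete, here is a small pure-Python k-means sketch using the Euclidean distance. The two-dimensional word vectors are made-up toy values, not the output of word2vec or doc2vec, and taking the first `k` vectors as initial centroids is an assumption chosen only to keep the example deterministic.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(vectors, k, iterations=20):
    """Cluster vectors into k groups and return one cluster ID per vector."""
    # Assumed initialization: the first k vectors serve as initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy word vectors for illustration only.
word_vectors = {
    "sandwich": (0.9, 0.1),
    "office": (0.1, 0.9),
    "lunch": (0.8, 0.2),
    "computer": (0.2, 0.8),
}
words = list(word_vectors)
labels = kmeans([word_vectors[w] for w in words], k=2)
clusters = dict(zip(words, labels))  # word -> cluster ID
```

With these toy vectors the food-related words and the office-related words end up in separate clusters.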
- The clustering result of the word clustering unit 60 is output to the word cluster information storage unit 70.
- The word cluster information storage unit 70 is configured to be able to store the result of clustering by the word clustering unit 60.
- The word cluster information storage unit 70 stores the ID of each cluster and the words belonging to each cluster, as shown in FIG. 6, for example.
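A store of cluster IDs and their words, together with the per-word inquiry used later by the cluster acquisition units, might look like the sketch below. The class `WordClusterStore` and its methods are hypothetical, and words that belong to no cluster are simply skipped here, which is an assumption.

```python
class WordClusterStore:
    """Hypothetical in-memory version of a word cluster information store."""

    def __init__(self):
        self._cluster_of = {}  # lowercase word -> cluster ID

    def add(self, cluster_id, words):
        """Register the words belonging to one cluster."""
        for word in words:
            self._cluster_of[word.lower()] = cluster_id

    def cluster_id(self, word):
        """Return the cluster ID for a word, or None if the word is unknown."""
        return self._cluster_of.get(word.lower())

    def cluster_ids(self, words):
        """Look up each word and return the cluster IDs of the known ones."""
        found = (self.cluster_id(w) for w in words)
        return [cid for cid in found if cid is not None]
```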
- The information stored in the word cluster information storage unit 70 is stored in a state in which it can be used as appropriate by the first cluster acquisition unit 160 and the second cluster acquisition unit 170.
- The first cluster acquisition unit 160 is configured to be able to acquire, using the information stored in the word cluster information storage unit 70 (that is, the result of clustering), the cluster to which the information included in the scene information acquired by the scene information acquisition unit 110 belongs (hereinafter referred to as the "first cluster" as appropriate).
- The information included in the scene information is, for example, a word included in the scene information, but is not limited to this.
- The information about the first cluster acquired by the first cluster acquisition unit 160 is output to the similarity calculation unit 130.
- The second cluster acquisition unit 170 is configured to be able to acquire, using the information stored in the word cluster information storage unit 70 (that is, the result of clustering), the cluster to which the information included in the search query acquired by the search query acquisition unit 120 (typically, a word included in the search query) belongs (hereinafter referred to as the "second cluster" as appropriate).
- The information about the second cluster acquired by the second cluster acquisition unit 170 is output to the similarity calculation unit 130.
- FIG. 7 is a flowchart showing the operation flow of the video search system according to the second embodiment.
- the same reference numerals are given to the same processes as those shown in FIG.
- The scene information acquisition unit 110 first acquires scene information from the stored video (step S101). Then, the first cluster acquisition unit 160 acquires the first cluster to which the information included in the scene information belongs by using the clustering result stored in the word cluster information storage unit 70 (step S201). For example, the first cluster acquisition unit 160 queries the word cluster information storage unit 70 for each of the words included in the scene information acquired from the video, and acquires the cluster ID corresponding to each word.
- The search query acquisition unit 120 acquires the search query entered by the user (step S102).
- The second cluster acquisition unit 170 acquires the second cluster to which the information included in the search query belongs by using the clustering result stored in the word cluster information storage unit 70 (step S202).
- For example, the second cluster acquisition unit 170 queries the word cluster information storage unit 70 for each of the search terms included in the search query, and acquires the cluster ID corresponding to each search term.
- The similarity calculation unit 130 calculates the similarity between the scene information and the search query by comparing the first cluster and the second cluster (step S103).
- The similarity in the second embodiment is calculated as the similarity between the first cluster (that is, the cluster to which the scene information belongs) and the second cluster (that is, the cluster to which the search query belongs).
- The video search unit 140 searches for the video corresponding to the search query based on the similarity (step S104).
- The similarity between the first cluster and the second cluster can be calculated as the cosine similarity obtained when the cluster information of the first cluster and the cluster information of the second cluster are regarded as vectors.
- When the cluster information of the first cluster is $V_a$ and the cluster information of the second cluster is $V_b$, the similarity between the first cluster and the second cluster can be calculated using the following equation (1):
  $$\mathrm{similarity}(V_a, V_b) = \frac{V_a \cdot V_b}{\|V_a\| \, \|V_b\|} \tag{1}$$
  where $\|V_a\|$ and $\|V_b\|$ are the norms of $V_a$ and $V_b$, respectively.
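The cosine similarity between two pieces of cluster information can be sketched as below. The text does not spell out how cluster information becomes a vector, so this sketch assumes a count vector with one component per cluster ID (IDs numbered from 0); the function names are hypothetical, and returning 0.0 for a zero vector is also an assumption.

```python
from math import sqrt


def cluster_vector(cluster_ids, num_clusters):
    """Assumed representation: component i counts the words in cluster i."""
    vec = [0] * num_clusters
    for cid in cluster_ids:
        vec[cid] += 1
    return vec


def cosine_similarity(va, vb):
    """Dot product of va and vb divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(va, vb))
    norm_a = sqrt(sum(a * a for a in va))
    norm_b = sqrt(sum(b * b for b in vb))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention when either side has no known words
    return dot / (norm_a * norm_b)


# First cluster: scene words fall in clusters 0, 0 and 2.
va = cluster_vector([0, 0, 2], num_clusters=3)
# Second cluster: search terms fall in clusters 0 and 2.
vb = cluster_vector([0, 2], num_clusters=3)
similarity = cosine_similarity(va, vb)
```

Because the comparison is made on cluster IDs rather than on the words themselves, a query term can contribute to the similarity even when the exact word never appears in the scene information.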
- In the second embodiment, the similarity is calculated using the clusters to which the words included in the scene information and the search query belong. By doing so, the similarity between the scene information and the search query can be calculated as a more appropriate value. Therefore, it is possible to search for the video corresponding to the search query more appropriately.
- The video search system 10 according to the third embodiment will be described with reference to FIGS. 8 to 11.
- The third embodiment differs from the first and second embodiments described above only in part of its configuration and operation (specifically, in that an object tag is used), and the other parts are substantially the same. Therefore, in the following, the parts different from the first and second embodiments will be described in detail, and the description of other overlapping parts will be omitted as appropriate.
- FIG. 8 is a block diagram showing a functional block included in the video search system according to the third embodiment.
- FIG. 9 is a table showing an example of an object tag.
- FIG. 10 is a block diagram showing a configuration of a modified example of the video search system according to the third embodiment.
- The same components as those shown in FIGS. 2 and 3 are designated by the same reference numerals.
- The video search system 10 according to the third embodiment includes a scene information acquisition unit 110, a search query acquisition unit 120, a similarity calculation unit 130, a video search unit 140, and an object tag acquisition unit 180. That is, the video search system 10 according to the third embodiment further includes the object tag acquisition unit 180 in addition to the configuration of the first embodiment (see FIG. 2).
- The object tag acquisition unit 180 is configured to be able to acquire object tags from the stored video.
- An object tag is information about an object appearing in the video and is associated with each object in the video. A plurality of object tags may be associated with one object.
- The object tag is typically a common noun, but may be associated with a proper noun by, for example, performing identification (that is, the object tag may include unique identification information that distinguishes individual objects). Further, the object tag may indicate information other than the name of the object (for example, its shape or properties).
- The object tag acquisition unit 180 may acquire object tags, for example, in units of video frames.
- The object tag acquisition unit 180 may include a storage unit that stores the acquired object tags. As shown in FIG. 9, for example, the object tags may be stored in the storage unit for each frame of each video.
- The object tags acquired by the object tag acquisition unit 180 are output to the similarity calculation unit 130.
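Storing object tags per frame of each video could be sketched as follows. The class `ObjectTagStore`, its methods, and the integer frame numbering are hypothetical and are not taken from FIG. 9.

```python
from collections import defaultdict


class ObjectTagStore:
    """Hypothetical store: video ID -> frame number -> set of object tags."""

    def __init__(self):
        self._tags = defaultdict(lambda: defaultdict(set))

    def add(self, video_id, frame, tag):
        """Attach one object tag to one frame; several tags per frame are allowed."""
        self._tags[video_id][frame].add(tag)

    def tags_in_range(self, video_id, first_frame, last_frame):
        """Collect every tag seen between first_frame and last_frame inclusive."""
        frames = self._tags.get(video_id, {})
        result = set()
        for frame, tags in frames.items():
            if first_frame <= frame <= last_frame:
                result |= tags
        return result
```

Collecting tags over a frame range gives the similarity calculation one bag of tags per scene range or aggregation window.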
- The video search system 10 may include the scene information adding unit 150 and an object tag adding unit 190. That is, the object tag adding unit 190 may be further provided in the modified example of the video search system shown in FIG.
- The object tag adding unit 190 associates an object tag with an object appearing in the video by using, for example, an object recognition model trained in advance by machine learning.
- An existing technique can be adopted as appropriate as the specific method of recognizing an object and attaching an object tag.
- When the video search system 10 includes the object tag adding unit 190, the video search can be performed even when no object tag is attached to the video. That is, the video search system 10 can perform a video search after the object tag adding unit 190 attaches object tags to the video.
- When the video search system 10 does not include the object tag adding unit 190, a video to which object tags are attached may be prepared in advance. In this case, the object tags may be attached automatically by video analysis or may be attached manually.
- FIG. 11 is a flowchart showing the operation flow of the video search system according to the third embodiment.
- the same reference numerals are given to the same processes as those shown in FIG.
- The scene information acquisition unit 110 first acquires scene information from the stored video (step S101). Further, the object tag acquisition unit 180 acquires object tags from the stored video (step S301). Further, the search query acquisition unit 120 acquires the search query input by the user (step S102). In the configuration provided with the object tag adding unit 190 described above, the object tag adding unit 190 may attach object tags before step S301 is executed.
- The similarity calculation unit 130 calculates the similarity between the scene information and the object tags on the one hand and the search query on the other (step S103).
- The similarity here may be calculated separately as the similarity between the scene information and the search query and the similarity between the object tag and the search query (that is, two kinds of similarity, one regarding the scene information and one regarding the object tag, may be calculated).
- Alternatively, the similarity may be calculated collectively as the similarity between both the scene information and the object tag and the search query (that is, a single kind of similarity that takes both the scene information and the object tag into account may be calculated).
- The video search unit 140 searches for the video corresponding to the search query based on the similarity (step S104). If the similarity between the scene information and the search query and the similarity between the object tag and the search query are calculated separately, the video corresponding to the search query may be searched for based on an overall similarity calculated from the two similarities (for example, the average value of the two similarities).
- In the third embodiment, the similarity is further calculated using the object tag.
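When the two similarities are calculated separately, an overall similarity such as their average can be sketched as follows. The weighting parameter `scene_weight` is a hypothetical generalization; the default of 0.5 reproduces the plain average given as the example in the text.

```python
def overall_similarity(scene_sim, tag_sim, scene_weight=0.5):
    """Combine the scene-information similarity and the object-tag similarity.

    scene_weight=0.5 gives the average of the two values; other weights are
    an assumed extension for illustration.
    """
    if not 0.0 <= scene_weight <= 1.0:
        raise ValueError("scene_weight must be between 0 and 1")
    return scene_weight * scene_sim + (1.0 - scene_weight) * tag_sim
```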
- The video search system 10 according to the fourth embodiment will be described with reference to FIGS. 12 and 13. The fourth embodiment differs from the third embodiment described above only in part of its configuration and operation (specifically, in that clusters are used for calculating the similarity), and the other parts are substantially the same. Therefore, in the following, the parts different from the third embodiment will be described in detail, and the description of other overlapping parts will be omitted as appropriate.
- FIG. 12 is a block diagram showing a functional block included in the video search system according to the fourth embodiment.
- In FIG. 12, the same components as those shown in FIGS. 5 and 8 are designated by the same reference numerals.
- The video search system 10 includes a word vector analysis unit 50, a word clustering unit 60, a word cluster information storage unit 70, a scene information acquisition unit 110, a search query acquisition unit 120, a similarity calculation unit 130, a video search unit 140, a first cluster acquisition unit 160, a second cluster acquisition unit 170, an object tag acquisition unit 180, and a third cluster acquisition unit 200.
- In the fourth embodiment, the first cluster acquisition unit 160, the second cluster acquisition unit 170, and the third cluster acquisition unit 200 are further provided.
- The first cluster acquisition unit 160 and the second cluster acquisition unit 170 may have the same configuration as in the second embodiment (see FIG. 5).
- The third cluster acquisition unit 200 uses the information stored in the word cluster information storage unit 70 (that is, the clustering result) to acquire the cluster to which the information included in the object tag acquired by the object tag acquisition unit 180 belongs (hereinafter referred to as the "third cluster" as appropriate).
- The information about the third cluster acquired by the third cluster acquisition unit 200 is output to the similarity calculation unit 130.
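- The cluster lookup performed by the third cluster acquisition unit 200 can be pictured with the following minimal Python sketch. It is an illustration only: the dictionary `store` is a hypothetical stand-in for the word cluster information storage unit 70, the function name `acquire_clusters` is invented here, and skipping words that were never clustered is an assumption, since the description does not say how such words are handled.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for the word cluster information storage unit 70:
# each word maps to the ID of the cluster it was assigned to.
WordClusterStore = Dict[str, int]


def acquire_clusters(words: Iterable[str], store: WordClusterStore) -> Set[int]:
    """Return the IDs of the clusters to which the given words belong.

    Words missing from the store are skipped (an assumption for this sketch).
    """
    return {store[w.lower()] for w in words if w.lower() in store}


# Made-up clustering result and object tag, for illustration only.
store = {"dog": 0, "cat": 0, "car": 1, "bicycle": 1, "lunch": 2}
object_tag = ["Dog", "bicycle", "umbrella"]

third_cluster = acquire_clusters(object_tag, store)
print(sorted(third_cluster))
```

- The same helper could serve for the first and second clusters by passing the words of the scene information or of the search query instead of the object tag.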
- FIG. 13 is a flowchart showing the operation flow of the video search system according to the fourth embodiment.
- In FIG. 13, the same processes as those shown in FIGS. 7 and 11 are given the same reference numerals.
- The scene information acquisition unit 110 first acquires scene information from the stored video (step S101). Then, the first cluster acquisition unit 160 acquires the first cluster to which the information included in the scene information belongs by using the clustering result stored in the word cluster information storage unit 70 (step S201).
- The object tag acquisition unit 180 acquires the object tag from the stored video (step S301). Then, the third cluster acquisition unit 200 acquires the third cluster to which the information included in the object tag belongs by using the clustering result stored in the word cluster information storage unit 70 (step S401).
- The search query acquisition unit 120 acquires the search query entered by the user (step S102). Then, the second cluster acquisition unit 170 acquires the second cluster to which the information included in the search query belongs by using the clustering result stored in the word cluster information storage unit 70 (step S202).
- The similarity calculation unit 130 calculates the similarity between the search query and the combination of the scene information and the object tag by comparing the first cluster and the third cluster with the second cluster (step S103).
- That is, the similarity in the fourth embodiment is calculated as the degree of similarity between the first cluster (the cluster to which the scene information belongs) and the third cluster (the cluster to which the object tag belongs) on one side, and the second cluster (the cluster to which the search query belongs) on the other.
- The video search unit 140 searches for the video corresponding to the search query based on the similarity (step S104).
- In the fourth embodiment, the similarity is calculated using information about the clusters to which the information included in the scene information, the object tag, and the search query belongs. This allows the similarity between the search query and the combination of the scene information and the object tag to be calculated as a more appropriate value. The video corresponding to the search query can therefore be searched for more appropriately.
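- Steps S103 and S104 can be sketched as follows. The description does not specify how the clusters are compared, so this sketch makes two assumptions: the first and third clusters are merged into one video-side set, and the comparison with the second cluster uses the Jaccard index. The function names, the threshold value, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index of two cluster-ID sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(first: Set[int], third: Set[int], second: Set[int]) -> float:
    """Compare the video-side clusters (first and third) with the query-side cluster (second).

    Merging the first and third clusters by set union is an assumption.
    """
    return jaccard(first | third, second)


def search(
    videos: Dict[str, Tuple[Set[int], Set[int]]],
    second: Set[int],
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return (video, similarity) pairs at or above the threshold, best first."""
    scored = [(name, similarity(f, t, second)) for name, (f, t) in videos.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up data: each video maps to (first cluster, third cluster).
videos = {
    "park_walk.mp4": ({0, 3}, {1}),
    "office.mp4": ({4}, {5}),
}
query_clusters = {0, 1}  # second cluster, obtained from the search query

print(search(videos, query_clusters))
```

- With this made-up data only `park_walk.mp4` shares clusters with the query, so it is the only video returned. Other comparison measures (for example, weighting scene information and object tags differently) would fit the same structure.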
- The video search system described in Appendix 1 is a video search system characterized by including: a scene information acquisition unit that acquires scene information indicating a scene of a video; a search query acquisition unit that acquires a search query; a similarity calculation unit that calculates the similarity between the scene information and the search query; and a video search unit that searches for a video corresponding to the search query based on the similarity.
- The video search system described in Appendix 2 is the video search system described in Appendix 1, further including a first cluster acquisition unit that acquires the first cluster to which the information included in the scene information belongs, and a second cluster acquisition unit that acquires the second cluster to which the information included in the search query belongs, wherein the similarity calculation unit compares the first cluster with the second cluster to calculate the similarity between the scene information and the search query.
- The video search system according to Appendix 3 is the video search system according to Appendix 1 or 2, wherein the scene information includes information about the place where the video was captured.
- The video search system according to Appendix 4 is the video search system according to any one of Appendices 1 to 3, wherein the scene information includes information about the date and time when the video was captured.
- The video search system according to Appendix 5 is the video search system according to any one of Appendices 1 to 4, wherein the scene information includes information about the behavior of the person who captured the video or of a person appearing in the video.
- The video search system according to Appendix 6 is the video search system according to any one of Appendices 1 to 5, further comprising a scene information adding unit for adding the scene information to the video.
- The video search system according to Appendix 7 further includes an object tag acquisition unit that acquires an object tag associated with an object appearing in the video, and the similarity calculation unit calculates the similarity using the scene information and the object tag.
- The video search system according to Appendix 8 is the video search system according to Appendix 7, further comprising an object information adding unit for associating the object tag with the object appearing in the video.
- The video search system according to Appendix 9 is the video search system according to any one of Appendices 1 to 8, wherein the similarity calculation unit divides the video into a plurality of scene ranges based on the scene information and calculates the similarity for each scene range.
- The video search system according to Appendix 10 is the video search system according to any one of Appendices 1 to 9, wherein the search query is in a natural language.
- The video search method according to Appendix 11 is a video search method characterized by acquiring scene information indicating a scene of a video, acquiring a search query, calculating the similarity between the scene information and the search query, and searching for a video corresponding to the search query based on the similarity.
- The computer program according to Appendix 12 is a computer program characterized by operating a computer to acquire scene information indicating a scene of a video, acquire a search query, calculate the similarity between the scene information and the search query, and search for a video corresponding to the search query based on the similarity.
- The recording medium described in Appendix 13 is a recording medium characterized in that the computer program described in Appendix 12 is recorded on it.
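- The per-scene-range calculation mentioned in Appendix 9 can be illustrated with the sketch below. It rests on assumptions that are not in the source: scene information is modeled as one label per frame, a scene range is a run of consecutive frames with the same label, and the similarity is the fraction of query words found in the range's scene words. All names and data are hypothetical.

```python
from itertools import groupby
from typing import List, Set, Tuple


def split_into_scene_ranges(frame_scenes: List[str]) -> List[Tuple[int, int, str]]:
    """Group consecutive frames with the same scene label.

    Returns (start_frame, end_frame, label) tuples; end_frame is inclusive.
    """
    ranges = []
    index = 0
    for label, group in groupby(frame_scenes):
        length = len(list(group))
        ranges.append((index, index + length - 1, label))
        index += length
    return ranges


def word_overlap(scene_words: Set[str], query_words: Set[str]) -> float:
    """Fraction of query words that also appear in the scene words."""
    if not query_words:
        return 0.0
    return len(scene_words & query_words) / len(query_words)


# Made-up per-frame scene labels and query words.
frame_scenes = ["home", "home", "station", "station", "station", "office"]
query = {"station", "morning"}

for start, end, label in split_into_scene_ranges(frame_scenes):
    print(start, end, label, word_overlap({label}, query))
```

- Scoring each range separately lets a search point to the part of a long video that matches the query, rather than to the video as a whole.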
- The present invention can be modified as appropriate within the scope of the claims and within a range that does not contradict the gist or idea of the invention that can be read from the entire specification, and a video search system, a video search method, and a computer program involving such modifications are also included in the technical idea of the present invention.
- 10 Video search system
- 110 Scene information acquisition unit
- 120 Search query acquisition unit
- 130 Similarity calculation unit
- 140 Video search unit
- 150 Scene information assignment unit
- 160 First cluster acquisition unit
- 170 Second cluster acquisition unit
- 180 Object tag acquisition unit
- 190 Object tag assignment unit
- 200 Third cluster acquisition unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022553334A JPWO2022070340A1 | 2020-09-30 | 2020-09-30 | |
| PCT/JP2020/037251 WO2022070340A1 (ja) | 2020-09-30 | 2020-09-30 | Video search system, video search method, and computer program |
| US18/023,124 US20230297613A1 (en) | 2020-09-30 | 2020-09-30 | Video search system, video search method, and computer program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/037251 WO2022070340A1 (ja) | 2020-09-30 | 2020-09-30 | Video search system, video search method, and computer program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022070340A1 (ja) | 2022-04-07 |
Family
ID=80949998
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/037251 Ceased WO2022070340A1 (ja) | 2020-09-30 | 2020-09-30 | Video search system, video search method, and computer program |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20230297613A1 |
| JP (1) | JPWO2022070340A1 |
| WO (1) | WO2022070340A1 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12130891B2 (en) * | 2020-11-05 | 2024-10-29 | Samsung Electronics Co., Ltd. | Method of live video event detection based on natural language queries, and an apparatus for the same |
| US12411827B2 (en) * | 2023-12-29 | 2025-09-09 | Dish Network Technologies India Private Limited | Proactively suggesting a digital medium and automatically generating a ribbon indicating the digital medium of interest to a user |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09128401A (ja) * | 1995-10-27 | 1997-05-16 | Sharp Corp | Moving image retrieval device and video-on-demand device |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050114357A1 (en) * | 2003-11-20 | 2005-05-26 | Rathinavelu Chengalvarayan | Collaborative media indexing system and method |
| US8126643B2 (en) * | 2007-12-28 | 2012-02-28 | Core Wireless Licensing S.A.R.L. | Method, apparatus and computer program product for providing instructions to a destination that is revealed upon arrival |
| EP2926269A4 (en) * | 2012-11-30 | 2016-10-12 | Thomson Licensing | METHOD AND DEVICE FOR VIDEO RECALL |
| CN106294344B (zh) * | 2015-05-13 | 2019-06-18 | 北京智谷睿拓技术服务有限公司 | Video retrieval method and device |
| US10311104B2 (en) * | 2016-04-13 | 2019-06-04 | Google Llc | Video competition discovery and recommendation |
| US20180101540A1 (en) * | 2016-10-10 | 2018-04-12 | Facebook, Inc. | Diversifying Media Search Results on Online Social Networks |
| US10061987B2 (en) * | 2016-11-11 | 2018-08-28 | Google Llc | Differential scoring: a high-precision scoring method for video matching |
| CN110110144A (zh) * | 2018-01-12 | 2019-08-09 | 天津三星通信技术研究有限公司 | Video processing method and device |
| KR20200024541A (ko) * | 2018-08-28 | 2020-03-09 | 십일번가 주식회사 | Method for supporting video content search and service device supporting the same |
| CN110688529A (zh) * | 2019-09-26 | 2020-01-14 | 北京字节跳动网络技术有限公司 | Method, apparatus, and electronic device for retrieving video |
| US11500927B2 (en) * | 2019-10-03 | 2022-11-15 | Adobe Inc. | Adaptive search results for multimedia search queries |
| US11302361B2 (en) * | 2019-12-23 | 2022-04-12 | Samsung Electronics Co., Ltd. | Apparatus for video searching using multi-modal criteria and method thereof |
| CN113094550B (zh) * | 2020-01-08 | 2023-10-24 | 百度在线网络技术(北京)有限公司 | Video retrieval method, apparatus, device, and medium |
| US11386151B2 (en) * | 2020-04-11 | 2022-07-12 | Open Space Labs, Inc. | Image search in walkthrough videos |
| CN111611436B (zh) * | 2020-06-24 | 2023-07-11 | 深圳市雅阅科技有限公司 | Label data processing method, apparatus, and computer-readable storage medium |
2020
- 2020-09-30 JP JP2022553334A patent/JPWO2022070340A1/ja active Pending
- 2020-09-30 US US18/023,124 patent/US20230297613A1/en not_active Abandoned
- 2020-09-30 WO PCT/JP2020/037251 patent/WO2022070340A1/ja not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| MORIMOTO, MAYO; MIKAMI, SAWAKO; MOTHASHI, YOSUKE: "A Study of Video Lifelog Retrieval System by using Natural Language", IPSJ SIG TECHNICAL REPORT, vol. 2020, no. 34 (2020-GN-109), 16 January 2020 (2020-01-16), JP , pages 1 - 8, XP009536891, ISSN: 0919-6072 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2022070340A1 | 2022-04-07 |
| US20230297613A1 (en) | 2023-09-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP4337064B2 (ja) | | Information processing device, information processing method, and program |
| US8971641B2 (en) | | Spatial image index and associated updating functionality |
| JP5489660B2 (ja) | | Image management device, control method therefor, and program |
| Karthika et al. | | Digital video copy detection using steganography frame based fusion techniques |
| Pita et al. | | A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data. |
| CN112765197B (zh) | | Data query method, apparatus, computer device, and storage medium |
| JP6377917B2 (ja) | | Image search device and image search program |
| JP7116969B2 (ja) | | Two-dimensional map generation device, two-dimensional map generation method, and program for two-dimensional map generation |
| JP2006216026A (ja) | | Efficient method for temporal event clustering of digital photographs |
| JP7416091B2 (ja) | | Video search system, video search method, and computer program |
| JP5866064B2 (ja) | | Image search device, image search method, and recording medium |
| WO2022070340A1 (ja) | | Video search system, video search method, and computer program |
| JP6314071B2 (ja) | | Information processing device, information processing method, and program |
| US9378248B2 (en) | | Retrieval apparatus, retrieval method, and computer-readable recording medium |
| US8533196B2 (en) | | Information processing device, processing method, computer program, and integrated circuit |
| US20180189602A1 (en) | | Method of and system for determining and selecting media representing event diversity |
| JP5265610B2 (ja) | | Related word extraction device |
| CN120804394A (zh) | | Service recommendation decision method based on large-model retrieval-augmented generation, and related device |
| JPWO2022070340A5 | | |
| US12373854B2 (en) | | Video providing system, video providing method, and computer program |
| Trad et al. | | Large scale visual-based event matching |
| JP2010009237A (ja) | | Multilingual similar document search device, method, program, and computer-readable recording medium |
| JP7646091B2 (ja) | | Information processing device, search method, and search program |
| JP6904619B1 (ja) | | Search method |
| CN115269785B (zh) | | Search method, apparatus, computer device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20956268; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022553334; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20956268; Country of ref document: EP; Kind code of ref document: A1 |