WO2021221209A1 - Method and apparatus for searching information in a video - Google Patents

Method and apparatus for searching information in a video

Info

Publication number
WO2021221209A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
scene
shot
sentence
metadata
Prior art date
Application number
PCT/KR2020/005718
Other languages
English (en)
Korean (ko)
Inventor
구원용
홍의재
Original Assignee
엠랩 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 엠랩 주식회사
Priority to PCT/KR2020/005718 priority Critical patent/WO2021221209A1/fr
Priority to KR1020207014777A priority patent/KR20210134866A/ko
Publication of WO2021221209A1 publication Critical patent/WO2021221209A1/fr

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F 16/70 Information retrieval of video data
              • G06F 16/71 Indexing; Data structures therefor; Storage structures
              • G06F 16/73 Querying
                • G06F 16/732 Query formulation
              • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F 16/783 Retrieval using metadata automatically derived from the content
                  • G06F 16/7844 Retrieval using original textual content or text extracted from visual content or transcript of audio data
                  • G06F 16/7847 Retrieval using low-level visual features of the video content
                    • G06F 16/785 Retrieval using colour or luminescence
                • G06F 16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00 Speech recognition
            • G10L 15/26 Speech to text systems

Definitions

  • The present invention relates to a method for retrieving information inside a moving picture (video).
  • It is intended to propose a method of searching for only the specific section of a video that contains the information the user wants and providing that section to the user.
  • In a preferred embodiment, a method for searching information inside a video includes: receiving a sentence from a user as a search term; searching, in a video that is indexed scene by scene with metadata provided in the form of a sentence for each scene, for the scene having the highest degree of matching with the search term; and playing back only the portion of the video from the start point to the end point of the found scene.
  • The user selects a single video to search and expresses the content to be found in the selected video in the form of a sentence.
  • The video is composed of at least one scene, and a summary sentence describing the contents of each scene is assigned to that scene in the form of a sentence and used as its metadata.
  • The degree of matching is determined using the Levenshtein distance technique, in which the distance is 0 when two sentences are identical and increases as the similarity between the two sentences decreases.
  • A method for retrieving information inside a video also includes: segmenting the video into shots; assigning a tag set to each segmented shot and deriving, for each shot, keywords that highlight its characteristics through topic analysis of the tag set; and generating scenes by performing hierarchical clustering based on the similarity between adjacent shots determined from the keywords.
  • The tag set is composed of an image tag and an audio tag.
  • A scene tag is assigned to each scene, and metadata in the form of a sentence is assigned to each scene, so that the video is indexed scene by scene.
  • The metadata for a scene is generated based on at least one keyword derived from each of the shots constituting the scene and on the text obtained by converting the voice data of each of those shots through a speech-to-text (STT) technique.
  • At least one sentence containing the keyword is selected from the text obtained by applying STT to the voice data of each shot constituting the scene; a single sentence is then generated from the selected sentences through deep learning, and the generated sentence is stored as metadata describing the scene.
  • Segmenting the video into shots includes: converting each sampled image into the HSV color space; generating three time series consisting of the median values of H (hue), S (saturation), and V (brightness); and setting the corresponding point as the start or end point of a shot when the inflection points detected in all three time series coincide.
  • In a preferred embodiment, an apparatus for searching video internal information includes: a search term input unit that receives a sentence from a user as a search term; a video section search unit that searches, within the video, for the specific section having the highest relevance to the search term; and a video section playback unit that plays back only that specific section. The video is segmented based on meaning, a sentence explaining the meaning of each segmented section is provided as that section's metadata, and relevance to the search term is determined by using these sentences as an index along the timeline of the video.
  • The apparatus and method for searching video internal information provide the user with only the specific section of the video that the user wants to find, so that the user can grasp the information quickly and easily without having to watch the video from beginning to end.
  • The user can also check what content the video contains before watching it.
  • FIG. 1 illustrates an example in which components constituting a moving picture are divided into a scene and a shot as a preferred embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for retrieving information inside a moving picture as a preferred embodiment of the present invention.
  • FIG. 3 is a diagram showing an internal configuration of an apparatus for searching video internal information as a preferred embodiment of the present invention.
  • FIG. 4 shows an example of dividing a shot in a moving picture as a preferred embodiment of the present invention.
  • FIG. 5 shows an example of assigning a tag set to a shot as a preferred embodiment of the present invention.
  • FIG. 6 shows an example of grouping a shot into a scene as a preferred embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for retrieving information in a moving picture as another preferred embodiment of the present invention.
  • FIG. 8 shows an embodiment of searching for information inside a moving picture as a preferred embodiment of the present invention.
  • In a preferred embodiment of the present invention, a method for searching information inside a video includes the steps of: receiving a sentence from a user as a search term; searching, in a video indexed scene by scene with metadata provided in the form of a sentence for each scene, for the scene having the highest degree of matching with the search term; and playing back only the portion of the video from the start point to the end point of the found scene.
  • FIG. 1 shows an example in which components constituting a moving picture are divided into a scene and a shot as a preferred embodiment of the present invention.
  • The moving picture 100 is segmented into n shots (n is a natural number) 111, 113, 121, 123, 125, 131, 133.
  • For a method of segmenting a video into shots, refer to FIG. 4.
  • At least one shot is grouped into units having similar meanings or subjects to constitute a scene.
  • For example, the first shot 111 and the second shot 113 may be grouped into the first scene 110, the third shot 121, the fourth shot 123, and the fifth shot 125 may be grouped into the second scene 120, and the sixth shot 131 and the seventh shot 133 may be grouped into the third scene 130.
  • a subject may include at least one meaning.
  • FIG. 2 is a flowchart of a method for retrieving video internal information as a preferred embodiment of the present invention.
  • the user selects the video and inputs a search word through a search word input interface provided when video selection is activated.
  • the video is indexed in units of scenes by providing metadata in the form of sentences for each scene.
  • the video internal information search apparatus searches for a specific section that matches the search word or has high relevance in the video, and reproduces only the searched specific section.
  • The video internal information search apparatus receives a search term sentence (S210), searches the video for the scene with the highest degree of matching with the search term (S220), and plays back only the portion from the start point to the end point of the found scene (S230).
  • FIG. 3 shows an internal configuration diagram of an apparatus 300 for searching video internal information as a preferred embodiment of the present invention.
  • FIGS. 4 to 6 show detailed functions of the video section search unit 320 constituting the apparatus 300 for searching video internal information.
  • FIG. 7 is a flowchart of a method for searching information inside a video.
  • a method for searching video internal information in a device for searching video internal information will be described with reference to FIGS. 3 to 7 .
  • the apparatus 300 for searching video internal information may be implemented in a terminal, a computer, a notebook computer, a handheld device, or a wearable device.
  • the apparatus 300 for searching video internal information may be implemented in the form of a terminal having an input unit for receiving a user's search word, a display for displaying a video, and a processor.
  • the method of searching the video internal information may be implemented by being installed in the form of an application in the terminal.
  • the apparatus 300 for searching video internal information includes a search word input unit 310 , a video section search unit 320 , and a video section playback unit 330 .
  • the video section search unit 320 includes a shot segmentation unit 340 , a scene generation unit 350 , a metadata generation unit 360 , and a video index unit 370 .
  • the search word input unit 310 receives a search word from the user in the form of a sentence.
  • the user can use all forms such as voice search, text search, and image search.
  • An example of an image search is a case where the contents scanned from a book are converted into text and used as a search term.
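  • As an illustration only, the following sketch shows how scanned content might be converted into a text search term using the open-source Tesseract OCR engine via the pytesseract wrapper; the file name and post-processing are assumptions, not part of the disclosed apparatus.

```python
# Hypothetical sketch: turn a scanned page into a text search term via OCR.
# Assumes Tesseract plus the pytesseract and Pillow packages are installed.
from PIL import Image
import pytesseract

def image_to_search_term(image_path: str) -> str:
    """OCR a scanned image and collapse the result into a one-line query."""
    text = pytesseract.image_to_string(Image.open(image_path))
    return " ".join(text.split())  # collapse whitespace into a single sentence

# Example (hypothetical file name):
# query = image_to_search_term("scanned_page.png")
```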
  • the search word input unit 310 may be implemented as a keyboard, a stylus pen, a microphone, or the like.
  • the video section search unit 320 searches for a specific section in the video that matches the search word input from the search word input unit 310 or has content related to the search word. As an embodiment, the video section search unit 320 searches for a scene in which a sentence having the highest degree of matching with the input search word sentence is assigned as metadata.
  • the video section search unit 320 indexes and manages videos so that information can be searched within a single video.
  • The shot segmentation unit 340 segments the video into shots (S710) and assigns a tag set to each segmented shot (S720).
  • A keyword is then derived for each shot by applying a topic analysis algorithm to its tag set (S730). The keywords serve to identify and discriminate the content of each of the shots constituting the video.
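  • A minimal sketch of this topic-analysis step is shown below, assuming each shot's tag set is available as a bag of words and using the LDA implementation in gensim; the example tags, the number of topics, and the choice of gensim are illustrative assumptions.

```python
# Hypothetical sketch: derive per-shot keywords via topic analysis (LDA) of tag sets.
# Assumes the gensim package; the tag sets below are illustrative only.
from gensim import corpora
from gensim.models import LdaModel

shot_tag_sets = [
    ["japan", "corona19", "severe", "anchor"],   # tags of shot 1 (assumed)
    ["japan", "corona19", "spread", "crowd"],    # tags of shot 2 (assumed)
    ["us", "corona19", "death", "hospital"],     # tags of shot 3 (assumed)
]

dictionary = corpora.Dictionary(shot_tag_sets)
corpus = [dictionary.doc2bow(tags) for tags in shot_tag_sets]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)

def shot_keywords(tags, topn=3):
    """Pick the shot's dominant topic and return its top words as keywords."""
    bow = dictionary.doc2bow(tags)
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda t: t[1])
    return [word for word, _ in lda.show_topic(topic_id, topn=topn)]

keywords_per_shot = [shot_keywords(tags) for tags in shot_tag_sets]
```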
  • the scene generator 350 determines the similarity between adjacent before and after shots on the timeline of the video. The similarity determination may be performed based on a keyword derived from each shot, an object detected in each shot, a voice feature detected in each shot, and the like. As a preferred embodiment of the present invention, the scene generator 350 may create a scene by grouping shots having a high degree of similarity between adjacent shots based on a keyword ( S740 ).
  • An algorithm for performing grouping may include a hierarchical clustering technique (S750). In this case, a plurality of shots included in one scene may be interpreted as delivering content having similar meaning or subject matter. For an example of grouping shots through hierarchical clustering in the scene generator 350 , refer to FIG. 8 .
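  • By way of illustration only, the sketch below groups adjacent shots by agglomeratively merging neighbours whose keyword sets are similar (Jaccard similarity); it is a simplified stand-in for the hierarchical clustering of step S750, and the similarity threshold is an arbitrary assumption.

```python
# Hypothetical sketch: group adjacent shots into scenes by keyword similarity.
# A simplified stand-in for the hierarchical clustering of step S750.
def jaccard(a: set, b: set) -> float:
    """Keyword-set similarity: 1.0 when identical, 0.0 when disjoint."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def group_shots_into_scenes(shot_keywords, threshold=0.4):
    """Merge a shot into the previous scene when its keywords are similar enough
    to that scene's accumulated keywords; otherwise start a new scene."""
    scenes = []  # each scene: {"shots": [shot indices], "keywords": set}
    for i, kws in enumerate(shot_keywords):
        kws = set(kws)
        if scenes and jaccard(scenes[-1]["keywords"], kws) >= threshold:
            scenes[-1]["shots"].append(i)
            scenes[-1]["keywords"] |= kws
        else:
            scenes.append({"shots": [i], "keywords": set(kws)})
    return scenes

# Example with keyword sets like those of FIG. 8 (threshold is an assumption):
shots = [
    {"japan", "corona19", "severe"},
    {"japan", "corona19", "spread"},
    {"new_york", "corona19", "europe", "inflow"},
    {"us", "corona19", "death"},
    {"us", "corona19", "confirmed", "death"},
    {"us", "corona19", "death"},
]
scenes = group_shots_into_scenes(shots)
```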
  • the scene generator 350 assigns a scene tag to each created scene (351, 353, 355).
  • the scene tag may be generated based on an image tag assigned to each of at least one shot included in each scene.
  • a scene tag may be generated by a combination of a tag set assigned to each of at least one shot constituting a scene.
  • the scene keyword may be generated by a combination of keywords derived from each of at least one shot constituting the scene.
  • the scene tag may serve as a weight when generating metadata for each scene.
  • the metadata generator 360 analyzes the scenes generated by the scene generator 350, and provides metadata for each scene, thereby supporting a search for internal video content (S760). Metadata assigned to each scene acts as an index.
  • the metadata is in the form of a summary sentence indicating the contents of each scene.
  • the metadata may be generated by further referring to a scene tag assigned to each of at least one shot constituting one scene.
  • Scene tags can serve as weights when performing deep learning to generate metadata. For example, weight may be assigned to image tag information and voice tag information extracted from at least one tag set included in the scene tag.
  • the metadata is generated based on STT (Speech to Text) data of voice data extracted from at least one shot constituting each scene, and a scene tag extracted from each of at least one shot constituting each scene.
  • A summary sentence is generated by applying deep learning to the STT data and the scene tags obtained from the shots constituting a scene. The summary sentence generated for each scene is then assigned to that scene as its metadata.
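  • The sketch below illustrates one plausible realisation of this step, assuming a scene's STT transcript and keywords are available: sentences containing a keyword are selected and condensed by a generic pretrained summarization model through the Hugging Face transformers pipeline. The model choice, sentence splitting, and length limits are assumptions; the description only specifies that a single summary sentence is produced through deep learning.

```python
# Hypothetical sketch: build one metadata sentence per scene from its STT text
# and keywords. The summarization model is a generic assumption, not the
# specific deep-learning model of the disclosed apparatus.
import re
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default pretrained model

def scene_metadata_sentence(stt_text: str, keywords: list) -> str:
    # 1) Split the transcript into rough sentences.
    sentences = re.split(r"(?<=[.!?])\s+", stt_text)
    # 2) Keep only sentences that mention at least one scene keyword.
    selected = [s for s in sentences
                if any(k.lower() in s.lower() for k in keywords)] or sentences
    # 3) Condense the selected sentences into a single short summary.
    result = summarizer(" ".join(selected), max_length=40, min_length=5,
                        do_sample=False)
    return result[0]["summary_text"]
```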
  • The video indexing unit 370 uses the metadata assigned to each scene of the video S300 as an index. For example, if the video S300 is divided into three scenes, the video indexing unit 370 uses the first sentence 371, assigned as metadata to the first scene 351 (0:00 to t1), as an index; the second sentence 373, assigned as metadata to the second scene 353 (t1 to t2), as an index; and the third sentence 375, assigned as metadata to the third scene 355 (t2 to t3), as an index.
  • When the user's search sentence is the first search sentence S311, and the first search sentence S311 has the highest degree of matching with the first sentence 371 among the metadata sentences 371, 373, and 375 assigned to the scenes of the video, the video section having the highest degree of matching with the search sentence input through the search term input unit 310 is the first scene 351.
  • The video section playback unit 330 then plays back only the section from 0:00 to t1 of the first scene 351 in the video S300.
  • The video indexing unit 370 can determine the degree of matching using the Levenshtein distance technique, in which the value becomes 0 when two sentences are identical and increases as the similarity between the two sentences decreases; however, the determination is not limited thereto, and various other algorithms for measuring the similarity between two sentences can be used.
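  • A minimal, self-contained sketch of this matching step follows: the scene index maps each (start, end) interval to its metadata sentence, the Levenshtein distance is computed with the standard dynamic-programming recurrence, and the scene whose sentence is closest to the query is returned. The example index contents are illustrative assumptions.

```python
# Hypothetical sketch: choose the scene whose metadata sentence has the smallest
# Levenshtein distance to the user's search sentence.
def levenshtein(a: str, b: str) -> int:
    """Edit distance: 0 when the sentences are identical, larger as they differ."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def find_best_scene(query: str, scene_index: dict) -> tuple:
    """scene_index maps (start_sec, end_sec) -> metadata sentence."""
    return min(scene_index, key=lambda span: levenshtein(query, scene_index[span]))

# Illustrative index values (assumed, not from the description):
scene_index = {
    (0, 60): "first sentence describing the first scene",
    (60, 120): "second sentence describing the second scene",
    (120, 180): "third sentence describing the third scene",
}
start, end = find_best_scene("second sentence", scene_index)
# The video section playback unit would then play only [start, end].
```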
  • Likewise, when the user's search sentence is the second search sentence S313, and the second search sentence S313 has the highest degree of matching with the second sentence 373 among the metadata sentences 371, 373, and 375, the video section playback unit 330 plays back only the section from t1 to t2 of the second scene 353 in the video S300.
  • When the user's search sentence is the third search sentence S315, and the third search sentence S315 has the highest degree of matching with the third sentence 375, the video indexing unit 370 determines that the video section having the highest degree of matching with the search sentence input through the search term input unit 310 is the third scene 355.
  • The video section playback unit 330 then plays back only the section from t2 to t3 of the third scene 355 in the video S300.
  • FIG. 4 shows an example of dividing a shot in a moving picture as a preferred embodiment of the present invention.
  • the x-axis represents time (sec)
  • the y-axis represents a representative HSV value.
  • The shot segmentation unit 340 of the video internal information search apparatus extracts frames from the video S300 at regular intervals as images and converts each image into the HSV color space. It then generates three time series consisting of the median values of H (hue) (S401), S (saturation) (S403), and V (brightness) (S405) of each image. When the inflection points of the three time series all coincide, or fall within a certain time window of one another, the corresponding point is set as the start or end point of a shot.
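  • A simplified sketch of this boundary-detection step is given below, assuming OpenCV is used for frame extraction and colour conversion; the sampling interval, the inflection-point test (a sign change of the discrete second difference), and the coincidence window are assumptions made for illustration.

```python
# Hypothetical sketch: detect shot boundaries from the median H, S, V time series.
# Assumes the opencv-python and numpy packages; parameters are illustrative.
import cv2
import numpy as np

def median_hsv_series(video_path: str, step_sec: float = 1.0) -> np.ndarray:
    """Sample frames at a regular interval and return the per-frame median
    hue, saturation and value as an array of shape (num_samples, 3)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(fps * step_sec)), 1)
    rows, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            rows.append([np.median(hsv[:, :, c]) for c in range(3)])
        idx += 1
    cap.release()
    return np.array(rows)

def inflection_points(x: np.ndarray) -> set:
    """Sample indices near which the discrete second difference changes sign."""
    d2 = np.diff(x, n=2)
    return {i + 1 for i in range(1, len(d2)) if d2[i - 1] * d2[i] < 0}

def shot_boundaries(series: np.ndarray, window: int = 1) -> list:
    """Indices where inflection points of H, S and V (nearly) coincide."""
    h, s, v = (inflection_points(series[:, c]) for c in range(3))
    return sorted(i for i in h
                  if any(abs(i - j) <= window for j in s)
                  and any(abs(i - j) <= window for j in v))
```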
  • FIG. 5 shows an example of assigning a tag set to a shot as a preferred embodiment of the present invention.
  • FIG. 5 illustrates an example in which the first tag set 550 is applied to the first shot 510 .
  • the shot 510 is classified into image data 510a and audio data 510b.
  • For the image data 510a, images are extracted once per second (520a), an object is detected in each image (530a), and an image tag is generated based on the detected objects (540a).
  • Image tags are information obtained by extracting objects from each image: object annotation or labeling is applied to the objects detected in the images to construct training data, and object recognition is then performed through deep learning for image recognition.
  • A tag set 550 is then generated. The tag set refers to the combination of the image tag 540a and the voice tag 540b detected during the time span of the first shot 510, for example from 0 to 10 seconds.
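  • As a hedged sketch of how such a tag set could be assembled, the code below turns object labels detected in a shot's sampled frames into image tags and words from the shot's STT transcript into voice tags. The torchvision detector and its label list are illustrative assumptions; the description does not prescribe a particular recognition model.

```python
# Hypothetical sketch: assemble one shot's tag set from image tags (detected
# object labels) and voice tags (words of the shot's STT transcript).
# torchvision >= 0.13 is assumed; the detector choice is illustrative.
import torch
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                           FasterRCNN_ResNet50_FPN_Weights)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # label index -> class name

def image_tags(frames, score_threshold=0.7):
    """frames: RGB uint8 numpy arrays of shape (H, W, 3) sampled from the shot."""
    tags = set()
    with torch.no_grad():
        for frame in frames:
            img = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
            pred = detector([img])[0]  # dict with "boxes", "labels", "scores"
            for label, score in zip(pred["labels"], pred["scores"]):
                if score.item() >= score_threshold:
                    tags.add(categories[int(label)])
    return tags

def tag_set_for_shot(frames, stt_text: str) -> dict:
    """Combine image tags and voice tags into one tag set, as in FIG. 5."""
    voice_tags = {w.strip(".,!?").lower() for w in stt_text.split()}
    return {"image_tags": image_tags(frames), "voice_tags": voice_tags}
```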
  • FIG. 6 shows an example of grouping a shot into a scene as a preferred embodiment of the present invention.
  • FIG. 6 illustrates an example of creating a scene through hierarchical clustering 640 after determining the degree of similarity based on the keyword 630 .
  • FIG. 8 shows an embodiment of searching for information inside a moving picture as a preferred embodiment of the present invention.
  • FIG. 8 shows an example in which the video 800 selected by the user in the shot segmentation unit is segmented into seven shots 801 to 807.
  • The device for searching video internal information generates a tag set by extracting an image tag and an audio tag from each of the seven shots 801 to 807, and then derives a keyword for each shot by performing topic analysis, such as LDA, on its tag set.
  • The first shot 801 spans 0:00 to 0:17, and the first keywords derived from it are (Japan, Corona 19, severe) 801a.
  • The second shot 802 spans 0:18 to 0:29, and the second keywords derived from it are (Japan, Corona 19, spread) 802a.
  • The third shot 803 spans 0:30 to 0:34, and the third keywords derived from it are (New York, Corona 19, Europe, inflow) 803a.
  • The fourth shot 804 spans 0:34 to 0:38, and the fourth keywords derived from it are (US, Corona 19, death) 804a.
  • The fifth shot 805 spans 0:39 to 0:41, and the fifth keywords derived from it are (US, Corona 19, confirmed, death).
  • The sixth shot 806 spans 0:42 to 0:45, and the sixth keywords derived from it are (US, Corona 19, death) 806a.
  • The seventh shot 807 spans 0:46 to 0:50, and the seventh keywords derived from it are (US, Corona 19, death) 807a.
  • the scene generator groups at least one shot based on the similarity.
  • the degree of similarity can be determined based on keywords extracted from each shot, and video tags and voice tags can be further referred to.
  • In the example of FIG. 8, the first shot 801 and the second shot 802 are grouped into the first scene 810, the third shot 803 forms the second scene 820, and the fourth to seventh shots 804 to 807 are grouped into the third scene 830.
  • The first scene 810 spans 0:00 to 0:29. Based on the first keywords (Japan, Corona 19, severe) 801a derived from the first shot 801 and the second keywords (Japan, Corona 19, spread) 802a derived from the second shot 802, and with reference to the voice data of the first shot 801 and the second shot 802, the metadata sentence "Japan's Corona 19 continues to spread" 810b is assigned to the first scene.
  • The second scene 820 spans 0:30 to 0:34. Based on the third keywords (New York, Corona 19, Europe, inflow) 803a derived from the third shot 803, and with reference to the voice data of the third shot 803, the metadata sentence "New York's Corona 19 is said to be coming in from Europe" 820b is assigned to the second scene.
  • The third scene 830 spans 0:35 to 0:50. Based on the fourth keywords (US, Corona 19, death) 804a derived from the fourth shot 804, the fifth keywords (US, Corona 19, confirmed, death) derived from the fifth shot 805, and the sixth keywords (US, Corona 19, death) 806a derived from the sixth shot 806, and with reference to the voice data of the fourth shot 804 to the sixth shot 806, the metadata sentence "This is the news of the death of COVID-19 in the United States" 830b is assigned to the third scene.
  • When the user selects the video 800 and the search term input interface is activated, the user inputs the content to be searched in the form of a sentence. For example, the search term sentence "What is the current state of Corona in the United States?" 840 may be input.
  • the video indexing unit searches for metadata with the highest degree of matching with the search word sentence 840 by using the metadata given to each scene as an index.
  • The degree of matching is determined based on the similarity between the search term 840 and the metadata sentences 810b, 820b, and 830b; for example, the Levenshtein distance technique, in which the value becomes 0 when two sentences are identical, can be used.
  • The video indexing unit searches for the metadata most similar to the user's search term 840, "What is the current state of Corona in the United States?", and the matching third scene 830 is played back to the user.
  • In this way, the user can search for and view only the section of the video 800 related to the search term 840, namely the section from 0:35 to 0:50 corresponding to the third scene 830.
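  • As a hedged end-to-end check of this matching step, the snippet below applies the Levenshtein distance to the three metadata sentences of FIG. 8 using the python-Levenshtein package. Note that raw character-level edit distance is a crude proxy for sentence similarity in English, so the selected scene may differ from the illustrative outcome described above; as noted earlier, other similarity algorithms can be used instead.

```python
# Hypothetical end-to-end check of the matching step for the FIG. 8 example.
# Assumes the python-Levenshtein package; character-level edit distance is a
# crude similarity proxy, so a different measure may be preferred in practice.
import Levenshtein

scene_index = {
    (0, 29): "Japan's Corona 19 continues to spread",
    (30, 34): "New York's Corona 19 is said to be coming in from Europe",
    (35, 50): "This is the news of the death of COVID-19 in the United States",
}
query = "What is the current state of Corona in the United States?"

distances = {span: Levenshtein.distance(query, sentence)
             for span, sentence in scene_index.items()}
best_span = min(distances, key=distances.get)
# The video section playback unit would then play only the seconds in best_span.
```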
  • the video indexing unit may provide the user with metadata assigned to each scene 810 to 830 constituting the video as an index. Users can preview the contents of the video in advance through the video index.
  • Methods according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.
  • the program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

In a preferred embodiment of the present invention, a method for searching information in a video comprises the steps of: receiving a sentence from a user as a search term; searching for the scene having the highest degree of matching with the search term in a video indexed scene by scene, in which metadata is provided in the form of a sentence for each scene; and playing back only the portion from the start point to the end point of the found scene in the video.
PCT/KR2020/005718 2020-04-29 2020-04-29 Procédé et appareil de recherche d'informations dans une vidéo WO2021221209A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/KR2020/005718 WO2021221209A1 (fr) 2020-04-29 2020-04-29 Procédé et appareil de recherche d'informations dans une vidéo
KR1020207014777A KR20210134866A (ko) 2020-04-29 2020-04-29 동영상 내부의 정보를 검색하는 방법 및 장치

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2020/005718 WO2021221209A1 (fr) 2020-04-29 2020-04-29 Procédé et appareil de recherche d'informations dans une vidéo

Publications (1)

Publication Number Publication Date
WO2021221209A1 true WO2021221209A1 (fr) 2021-11-04

Family

ID=78374167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/005718 WO2021221209A1 (fr) 2020-04-29 2020-04-29 Procédé et appareil de recherche d'informations dans une vidéo

Country Status (2)

Country Link
KR (1) KR20210134866A (fr)
WO (1) WO2021221209A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702707A (zh) * 2023-08-03 2023-09-05 腾讯科技(深圳)有限公司 基于动作生成模型的动作生成方法、装置及设备
CN117633297A (zh) * 2024-01-26 2024-03-01 江苏瑞宁信创科技有限公司 基于注释的视频检索方法、装置、系统及介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102670850B1 (ko) * 2023-05-04 2024-05-30 주식회사 액션파워 비디오 분할에 기초하여 비디오를 검색하는 방법

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080111376A (ko) * 2007-06-18 2008-12-23 한국전자통신연구원 디지털 비디오 특징점 비교 방법 및 이를 이용한 디지털비디오 관리 시스템
KR20150022088A (ko) * 2013-08-22 2015-03-04 주식회사 엘지유플러스 컨텍스트 기반 브이오디 검색 시스템 및 이를 이용한 브이오디 검색 방법
JP2016035607A (ja) * 2012-12-27 2016-03-17 パナソニック株式会社 ダイジェストを生成するための装置、方法、及びプログラム
KR20190114548A (ko) * 2018-03-30 2019-10-10 주식회사 엘지유플러스 콘텐츠 제어 장치 및 그 방법
KR20190129266A (ko) * 2018-05-10 2019-11-20 네이버 주식회사 컨텐츠 제공 서버, 컨텐츠 제공 단말 및 컨텐츠 제공 방법

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080111376A (ko) * 2007-06-18 2008-12-23 한국전자통신연구원 디지털 비디오 특징점 비교 방법 및 이를 이용한 디지털비디오 관리 시스템
JP2016035607A (ja) * 2012-12-27 2016-03-17 パナソニック株式会社 ダイジェストを生成するための装置、方法、及びプログラム
KR20150022088A (ko) * 2013-08-22 2015-03-04 주식회사 엘지유플러스 컨텍스트 기반 브이오디 검색 시스템 및 이를 이용한 브이오디 검색 방법
KR20190114548A (ko) * 2018-03-30 2019-10-10 주식회사 엘지유플러스 콘텐츠 제어 장치 및 그 방법
KR20190129266A (ko) * 2018-05-10 2019-11-20 네이버 주식회사 컨텐츠 제공 서버, 컨텐츠 제공 단말 및 컨텐츠 제공 방법

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702707A (zh) * 2023-08-03 2023-09-05 腾讯科技(深圳)有限公司 基于动作生成模型的动作生成方法、装置及设备
CN116702707B (zh) * 2023-08-03 2023-10-03 腾讯科技(深圳)有限公司 基于动作生成模型的动作生成方法、装置及设备
CN117633297A (zh) * 2024-01-26 2024-03-01 江苏瑞宁信创科技有限公司 基于注释的视频检索方法、装置、系统及介质
CN117633297B (zh) * 2024-01-26 2024-04-30 江苏瑞宁信创科技有限公司 基于注释的视频检索方法、装置、系统及介质

Also Published As

Publication number Publication date
KR20210134866A (ko) 2021-11-11

Similar Documents

Publication Publication Date Title
WO2021221209A1 (fr) Procédé et appareil de recherche d'informations dans une vidéo
CN108829893B (zh) 确定视频标签的方法、装置、存储介质和终端设备
WO2014092446A1 (fr) Système de recherche et procédé de recherche pour images à base d'objet
WO2010117213A2 (fr) Appareil et procédé destinés à fournir des informations en lien avec des programmes de radiodiffusion
WO2020080606A1 (fr) Procédé et système de génération automatique de métadonnées intégrées à un contenu vidéo à l'aide de métadonnées de vidéo et de données de script
US6507838B1 (en) Method for combining multi-modal queries for search of multimedia data using time overlap or co-occurrence and relevance scores
US6578040B1 (en) Method and apparatus for indexing of topics using foils
US8126897B2 (en) Unified inverted index for video passage retrieval
CN111078943B (zh) 一种视频文本摘要生成方法及装置
KR101516995B1 (ko) 컨텍스트 기반 브이오디 검색 시스템 및 이를 이용한 브이오디 검색 방법
US20110078176A1 (en) Image search apparatus and method
WO2017188606A2 (fr) Dispositif terminal et procédé de fourniture d'informations supplémentaires
WO2021167238A1 (fr) Procédé et système de création automatique d'une table des matières de vidéo sur la base d'un contenu
KR101640317B1 (ko) 오디오 및 비디오 데이터를 포함하는 영상의 저장 및 검색 장치와 저장 및 검색 방법
KR20200063316A (ko) 각본 기반의 영상 검색 장치 및 방법
Luo et al. Exploring large-scale video news via interactive visualization
JP2007328713A (ja) 関連語表示装置、検索装置、その方法及びプログラム
US20080016068A1 (en) Media-personality information search system, media-personality information acquiring apparatus, media-personality information search apparatus, and method and program therefor
WO2021221210A1 (fr) Procédé et appareil de génération d'itinéraire intelligent
Sack et al. Automated annotations of synchronized multimedia presentations
Aletras et al. Computing similarity between cultural heritage items using multimodal features
Tapu et al. TV news retrieval based on story segmentation and concept association
Kim et al. Content-Based Video Indexing and Retrieval--A Natural Language Approach--
WO2015190834A1 (fr) Procédé et recherche et de fourniture de vidéo
WO2016089110A1 (fr) Dispositif et procédé de génération de ressource de connaissances basée sur une entrée

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933240

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20933240

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 11/04/2023)
