WO2021149923A1 - Method and apparatus for providing image search - Google Patents
- Publication number
- WO2021149923A1 (PCT/KR2020/018892)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scene
- keyword
- keywords
- search
- image
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
Definitions
- the following embodiments relate to an image search providing method and apparatus.
- Users are often interested in some content of a specific image. That is, there are many cases where the user is interested in the content of a part of the image rather than the entire image. For example, there may be a case where a user who wants to watch a soccer relay wants to watch only a scene in which a specific player scores a goal, rather than watching the entire soccer relay video. Also, there may be a case where the viewer of the entertainment program wants to view only a specific scene.
- In the general video search method, since the entire soccer relay or the entire entertainment program is the target of the search, it is impossible to search for the partial scenes of the video desired by the user.
- Embodiments attempt to extract keywords at regular time intervals from an image.
- Embodiments intend to generate metadata in which a keyword corresponding to a scene and time information are matched.
- Embodiments are intended to provide a search for an image and a search for a specific scene based on the generated metadata.
- An image search providing method comprises: receiving, from an image analysis engine, one or more keywords corresponding to each shot of an image and a confidence value corresponding to each keyword; determining, among the one or more keywords, a keyword whose confidence value is equal to or greater than a first threshold value as a candidate keyword of each shot; determining, among successive shots, shots in which the match ratio of the candidate keywords of each shot is equal to or greater than a second threshold value as one scene; determining a final keyword representing the scene by statistically processing confidence values corresponding to keywords of the shots constituting the scene; receiving a scene search request comprising a search query; determining a scene corresponding to the search query based on the final keyword; and providing an image corresponding to the determined scene.
- the method of providing an image search may further include modifying the final keyword based on the search query.
- the first threshold value may be determined based on a distribution of the confidence values.
- the determining of the final keyword may include: determining a weight for each shot of the shots constituting the scene; weighted summing confidence values corresponding to keywords of shots constituting the scene based on the weights for each shot; and determining a keyword whose weighted sum is equal to or greater than a third threshold value as the final keyword.
- the determining of the weight for each shot may include determining the weight for each shot of the shots constituting the scene based on the number of candidate keywords included in each of the shots constituting the scene.
- the third threshold value may be determined based on a distribution of the confidence values corresponding to the keywords of shots constituting the scene.
- the weighted summing may include weighted summing confidence values corresponding to the candidate keywords of the shots constituting the scene based on the weight.
- the determining of the scene corresponding to the search query may include: comparing the search query with a keyword equal to or greater than the third threshold; and determining a scene corresponding to the search query based on the comparison result.
- The image search providing method may further include adding one or more relational keywords having a predetermined relation with the keywords equal to or greater than the third threshold value, and the determining of the scene corresponding to the search query may include determining the scene by further considering the relational keywords.
- the image analysis engine may include an external image analysis engine that receives the image and generates the first metadata.
- An image search providing apparatus includes a processor that receives, from an image analysis engine, one or more keywords corresponding to each shot of an image and a confidence value corresponding to each keyword, determines, among successive shots, shots in which the match ratio of the candidate keywords of each shot is equal to or greater than a second threshold value as one scene, statistically processes confidence values corresponding to keywords of the shots constituting the scene to determine a final keyword representing the scene, receives a scene search request including a search query, determines a scene corresponding to the search query based on the final keyword, and provides an image corresponding to the determined scene.
- The processor may modify the final keyword based on the search query.
- the first threshold value may be determined based on a distribution of the confidence values.
- The processor may determine a weight for each of the shots constituting the scene, weighted-sum confidence values corresponding to keywords of the shots constituting the scene based on the weights, and determine keywords whose weighted sum is equal to or greater than a third threshold value as the final keywords.
- the processor may determine a weight for each shot of the shots constituting the scene based on the number of the candidate keywords included in each of the shots constituting the scene.
- the third threshold value may be determined based on a distribution of the confidence values corresponding to the keywords of shots constituting a scene.
- The processor may weighted-sum confidence values corresponding to the candidate keywords of the shots constituting the scene based on the weights.
- the processor may compare the search query with a keyword equal to or greater than the third threshold, and determine a scene corresponding to the search query based on the comparison result.
- the processor may determine a scene corresponding to the search query by adding one or more relational keywords having a predetermined relation with a keyword equal to or greater than the third threshold value, and further considering the relational keyword.
- Embodiments may extract keywords from an image at regular time intervals.
- Embodiments may generate metadata in which a keyword corresponding to a scene and time information are matched.
- Embodiments may provide image search based on the generated metadata.
- FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment.
- FIG. 2 is a diagram for explaining a method of operating an image search providing system according to an exemplary embodiment.
- FIG. 3 is a diagram illustrating an example of an image search result page according to an exemplary embodiment.
- FIG. 4 is a flowchart illustrating a method for providing an image search according to an exemplary embodiment.
- FIG. 5 is a diagram for describing a specific method of determining a scene according to an exemplary embodiment.
- first or second may be used to describe various elements, but these terms should be understood only for the purpose of distinguishing one element from another element.
- a first component may be termed a second component, and similarly, a second component may also be termed a first component.
- the embodiments may be implemented in various types of products, such as personal computers, laptop computers, tablet computers, smart phones, televisions, smart home appliances, intelligent cars, kiosks, wearable devices, and the like.
- FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment.
- a network environment may include a plurality of terminals 110 , 120 , 130 , 140 , a plurality of servers 150 , 160 , and a network 170 .
- FIG. 1 is an example for describing the invention, and the number of terminals and servers is not limited to that shown in FIG. 1.
- the plurality of terminals 110 , 120 , 130 , and 140 may be a fixed terminal implemented as a computer device or a mobile terminal.
- The plurality of terminals 110, 120, 130, and 140 may include a smart phone, a mobile phone, a navigation device, a computer, a notebook computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a tablet PC, a head mounted display (HMD), a TV, a smart TV, and the like.
- the terminal 110 may communicate with the other terminals 120 , 130 , 140 and/or the servers 150 and 160 through the network 170 using a wireless or wired communication method.
- the server 150 may communicate with the terminals 110 , 120 , 130 , 140 and/or other servers 160 through the network 170 using a wireless or wired communication method.
- The communication method is not limited, and may include not only a communication method using a communication network (e.g., a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network) that the network 170 may comprise, but also short-range wireless communication between devices.
- The network 170 may include a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), the Internet, and the like.
- The network 170 may include any one or more of network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.
- Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of terminals 110, 120, 130, and 140 through the network 170 to provide commands, codes, files, contents, services, and the like.
- the server 150 may provide a file for installing an application to the terminal 110 connected through the network 170 .
- the terminal 110 may install the application using the file provided from the server 150 .
- When the terminal 110 is driven by an operating system (OS) and at least one program (e.g., a browser or the installed application), the service or content provided by the server 150 can be provided.
- When the terminal 110 transmits a service request message to the server 150 through the network 170 under the control of the application, the server 150 may transmit a code corresponding to the service request message to the terminal 110, and the terminal 110 may provide content to the user by composing and displaying a screen according to the code under the control of the application.
- FIG. 2 is a diagram for explaining a method of operating an image search providing system according to an exemplary embodiment.
- an image search providing system may include a terminal 210 , an image search providing apparatus 220 , and an image analysis engine 230 .
- the image search providing apparatus 220 includes a processor 221 and a database 222 .
- The terminal 210 according to an embodiment may be one of the terminals 110 to 140 of FIG. 1, and the image search providing apparatus 220 and the image analysis engine 230 may each be one of the servers 150 and 160 of FIG. 1.
- When a user requests a scene search including a search query from the image search providing apparatus 220 through the terminal 210, the image search providing apparatus 220 may determine a scene corresponding to the search query and provide an image corresponding to the scene to the user.
- The image search providing system utilizes keywords, appearance frequency, and relationships derived from each scene for the search, and further utilizes the user's search results and viewing records, so that more accurate search results can be provided.
- The image search providing apparatus 220 may extract, at regular time intervals, keywords related to the image corresponding to each time. More specifically, the image search providing apparatus 220 may transmit the image corresponding to each time interval to the image analysis engine 230, and receive keywords corresponding to each image from the image analysis engine 230.
- an image may be divided into minimum time intervals, and an image corresponding to each time may be referred to as a shot.
- the image search providing apparatus 220 may transmit the shot to the image analysis engine 230 , and may receive first metadata corresponding to each shot from the image analysis engine 230 .
- the first metadata may include one or more keywords corresponding to the image and a confidence value corresponding to the keywords.
- The confidence value may be a numerical value representing the degree of relation between the corresponding keyword and the shot. For example, the confidence value may be a value between 0 and 1, and the closer to 1, the more strongly the keyword is related to the corresponding shot.
- the image search providing apparatus 220 may determine a keyword having a confidence value equal to or greater than a first threshold value among one or more keywords as a candidate keyword.
- the image search providing apparatus 220 may determine a scene including one or more shots based on the first metadata. Through this, the image search providing apparatus 220 may also determine a scene change time based on the first metadata. As an example, the image search providing apparatus 220 may determine, among consecutive shots, shots in which the match ratio of the candidate keyword of each shot is equal to or greater than a second threshold value as one scene.
- a specific method of determining a scene based on the first metadata will be described in detail below with reference to FIG. 5 .
- the image search providing apparatus 220 may determine second metadata corresponding to the scene, and may determine a scene corresponding to the search query input by the user based on the second metadata of the scene. For example, the image search providing apparatus 220 may statistically process confidence values corresponding to keywords of shots constituting a scene to determine a final keyword representing the scene.
- the image analysis engine 230 may include an external image analysis engine that receives an image and generates first metadata corresponding to the input image.
- The image analysis engine 230 may be, for example, the Google Vision API. The Google Vision API builds metadata by finding the dominant objects included in an image, and can classify objects in the image into thousands of categories using the built metadata. However, the Google Vision API is merely exemplary, and various other types of models or devices that perform object recognition and output corresponding metadata may be employed as the image analysis engine.
- Because the image search providing apparatus 220 utilizes the first metadata received from the external image analysis engine rather than performing image analysis on its own, processing speed can be improved.
- the image search providing apparatus 220 includes a processor 221 and a database 222 .
- the image search providing apparatus 220 may include more components than those of FIG. 2 .
- the image search providing apparatus 220 may further include other components such as a memory, a communication module, an input/output interface, and the like.
- the memory is a computer-readable recording medium and may include a random access memory (RAM), a read only memory (ROM), and a permanent mass storage device such as a disk drive.
- Codes for an operating system, for a browser installed and driven on the operating system, and for at least one program such as the above-described application may be stored in the memory.
- These software components may be loaded from a computer-readable recording medium separate from the memory using a drive mechanism.
- the separate computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card.
- the software components may be loaded into the memory through a communication module rather than a computer-readable recording medium.
- the at least one program may be loaded into the memory based on a file distribution system that distributes installation files of developers or applications.
- the processor 221 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 221 by a memory or communication module. For example, the processor 221 may be configured to execute a received instruction according to a program code stored in a recording device such as a memory.
- The communication module may provide a function for the terminal 210 and the image search providing apparatus 220 to communicate with each other through a network, and may provide a function for communicating with another server. For example, a control signal, command, content, file, etc. provided under the control of the processor 221 of the image search providing apparatus 220 may be transmitted to the terminal 210 through the communication module and the network 170.
- FIG. 3 is a diagram illustrating an example of an image search result page according to an exemplary embodiment.
- the image search result page may include a scene image title field 310 , a player field 320 for reproducing a scene image, and an additional information field 330 .
- the scene video title field 310 may display a title of a scene video selected as a search result. According to the search word input by the user, "a scene where AAA and BBB enjoy a picture of a child in OOO" shown in FIG. 3 may be the title of the scene video.
- a player for playing the scene video selected as a result of the search is displayed in the player field 320 .
- the scene video may be set to be played only when the user clicks the play button of the player.
- In the additional information field 330, image quality information, file type information, capacity information, reproduction time information, screen size information, source information, and the like may be displayed.
- the image search method may provide the user with a search result including not only the scene image selected as a result of the scene image search, but also the entire image selected as matching the search query. That is, the image search method may display the entire image search result and the scene image search result corresponding to the search word input by the user.
- FIG. 4 is a flowchart illustrating a method for providing an image search according to an exemplary embodiment.
- steps 410 to 460 may be performed by the image search providing apparatus 220 described above with reference to FIG. 2 .
- the image search providing apparatus 220 may be implemented by one or more hardware modules, one or more software modules, or various combinations thereof.
- the image search providing apparatus 220 receives one or more keywords corresponding to each shot of an image and a confidence value corresponding to the keywords from the image analysis engine.
- the image analysis engine may be the image analysis engine 230 described above with reference to FIG. 2 .
- the image search providing apparatus 220 determines, among one or more keywords, a keyword having a confidence value equal to or greater than a first threshold value as a candidate keyword.
- the image search providing apparatus 220 may regard keywords having a confidence value less than the first threshold value as noise among keywords corresponding to the shot.
- the first threshold value may be determined based on a distribution of confidence values. For example, the first threshold value may be determined based on an average and variance of confidence values corresponding to a specific shot.
- The image search providing apparatus 220 determines, among consecutive shots, shots in which the match ratio of the candidate keywords of each shot is equal to or greater than the second threshold value as one scene.
- One or more consecutive shots may be grouped together to form a scene. For example, a scene in which the main character walks down the street may be divided into several shots taken from various angles, but all of those shots show the main character walking. A specific method for determining a scene will be described with reference to FIG. 5.
- FIG. 5 is a diagram for describing a specific method of determining a scene according to an exemplary embodiment.
- Shot 1 510 to shot 3 530 are consecutive shots, and Table 1 below shows the first metadata (e.g., keywords and confidence values) corresponding to shot 1 510 to shot 3 530 received from the image analysis engine.
- the image search providing apparatus 220 may determine, as a candidate keyword, a keyword having a confidence value equal to or greater than a first threshold among keywords corresponding to each shot.
- the first threshold value may be determined based on a distribution of confidence values. For example, the first threshold value may be determined as a confidence value corresponding to the lower 20%. Referring to Table 1, the first threshold value of the first shot 510 may be determined as 0.565, the first threshold value of the second shot 520 may be determined as 0.555, and the first threshold value of the third shot 530 may be determined as 0.615.
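- This percentile-based thresholding can be sketched as follows. The sketch assumes the first threshold is the confidence value at the lower-20% boundary of each shot's confidence distribution (one possible reading of the example above); the keyword/confidence values are made up for illustration, and the function name is hypothetical:

```python
def candidate_keywords(shot_keywords, pct=0.2):
    """shot_keywords: dict mapping keyword -> confidence for one shot.
    Keeps keywords whose confidence reaches the first threshold, taken here
    as the confidence value sitting just above the lower `pct` fraction."""
    confs = sorted(shot_keywords.values())
    cut = confs[int(len(confs) * pct)]  # assumed interpretation of "lower 20%"
    return {kw: c for kw, c in shot_keywords.items() if c >= cut}

# made-up excerpt of a shot's first metadata
shot = {"Man": 0.88, "Picture frame": 0.85, "Person": 0.60,
        "Animation": 0.40, "Mural": 0.35}
print(candidate_keywords(shot))  # 'Mural' (0.35) falls below the threshold and is dropped
```

Keywords below the cut are treated as noise, matching the filtering step described for shots 1 to 3.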
- For example, the keywords 'Animation', 'Exhibition', 'Illustration', 'Conversation', and 'Mural' of shot 1 510 may be excluded from the candidate keywords; the keywords 'Luggage & bags' and 'Visual Arts' of shot 2 520 may be excluded from the candidate keywords; and the keywords 'Black Hair', 'Conversation', and 'Jaw' of shot 3 530 may be excluded from the candidate keywords.
- the image search providing apparatus 220 may determine, among consecutive shots, shots in which the match ratio of candidate keywords of each shot is equal to or greater than the second threshold value as one scene.
- the image search providing apparatus 220 may determine, as one scene, shots in which a match ratio of a candidate keyword with a previous shot is equal to or greater than a second threshold value.
- For example, 5 ('Man', 'Picture frame', 'Clothing', 'Art', 'Event') of the 10 candidate keywords of shot 2 520 match candidate keywords of shot 1 510. Also, 3 ('Woman', 'Photography', and 'Gesture') of the 10 candidate keywords of shot 3 530 match candidate keywords of shot 2 520.
- Assume the second threshold value is, for example, 0.5.
- Since shot 2 520 matches 5 of its 10 candidate keywords with shot 1 510, the match ratio between shot 1 510 and shot 2 520 is 0.5 (5/10); because this is equal to or greater than the second threshold value of 0.5, they may be determined as shots constituting one scene.
- Since shot 3 530 matches 3 of its 10 candidate keywords with shot 2 520, shot 2 520 and shot 3 530 have a match ratio of 0.3 (3/10); since this is less than the second threshold value of 0.5, shot 2 520 and shot 3 530 may be determined as shots constituting different scenes.
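- The grouping rule illustrated by shots 1 to 3 can be sketched as follows. This is a minimal illustration, assuming each shot is represented by its set of candidate keywords and that the match ratio is the overlap with the previous shot divided by the current shot's candidate count, as in the 5/10 and 3/10 figures above; the keyword sets are partial, partly made-up reconstructions of the Table 1 example:

```python
def group_scenes(shots, second_threshold=0.5):
    """shots: list of candidate-keyword sets, one per consecutive shot.
    A shot joins the current scene when the ratio of its candidate keywords
    matching the previous shot reaches the second threshold; otherwise it
    starts a new scene."""
    scenes = [[0]]
    for i in range(1, len(shots)):
        ratio = len(shots[i] & shots[i - 1]) / len(shots[i])
        if ratio >= second_threshold:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes

shot1 = {"Man", "Picture frame", "Clothing", "Art", "Event", "Person",
         "Photo caption", "Room", "Smile", "Crowd", "Fun"}
shot2 = {"Man", "Picture frame", "Clothing", "Art", "Event",
         "Woman", "Photography", "Gesture", "Fashion", "Happiness"}
shot3 = {"Woman", "Photography", "Gesture", "Person", "Dress",
         "Interview", "Speech", "Smile", "Hairstyle", "Glasses"}
print(group_scenes([shot1, shot2, shot3]))  # shots 1-2 form one scene; shot 3 starts a new one
```

Here shot 2 overlaps shot 1 in 5 of its 10 candidates (ratio 0.5, same scene), while shot 3 overlaps shot 2 in only 3 of 10 (ratio 0.3, new scene), reproducing the example's outcome.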
- Alternatively, the image search providing apparatus 220 may cumulatively count, across successive shots, the number of candidate keywords of each shot that match those of the preceding shots, and calculate the match ratio of the candidate keywords based on the accumulated count.
- For example, among the 10 candidate keywords of shot 3 530, 1 ('Person') matches a candidate keyword of shot 1 510 and 3 ('Woman', 'Photography', 'Gesture') match candidate keywords of shot 2 520; accumulating these, shot 3 530 may have a match ratio of 0.4 (4/10).
- the image search providing apparatus 220 statistically processes confidence values corresponding to keywords of shots constituting a scene to determine a final keyword representing the scene.
- the image search providing apparatus 220 may determine a final keyword composed of keywords representing the scene by removing keywords that can be viewed as noise from keywords corresponding to one or more shots constituting the scene.
- the image search providing apparatus 220 may generate a final keyword matched with time information corresponding to a scene, and the final keyword matched with time information may be stored in a database in the form of a file.
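- A hypothetical shape for such a stored record is sketched below. The patent only specifies that the final keywords are matched with the scene's time information and stored in a database as a file; every field name and value here is an assumption for illustration:

```python
import json

# illustrative second-metadata record for one detected scene (schema is assumed)
scene_metadata = {
    "video_id": "soccer_relay_001",   # hypothetical identifier
    "scene_start": "00:12:30",        # time information matched to the scene
    "scene_end": "00:13:05",
    "final_keywords": ["Man", "Picture frame", "Art", "Event"],
}
print(json.dumps(scene_metadata, indent=2))
```

A record of this shape would let a search query be matched against `final_keywords` and the corresponding time range of the video returned.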
- The image search providing apparatus 220 may determine a weight for each of the shots constituting the scene, and may weighted-sum the confidence values corresponding to keywords of those shots based on the weights. Furthermore, the image search providing apparatus 220 may determine a keyword whose weighted sum is equal to or greater than the third threshold value as the final keyword.
- the statistical processing method is not limited to the above-described weighted sum, and may include any method related to statistical processing.
- the image search providing apparatus 220 may determine a weight for each shot based on the importance of shots constituting the scene. For example, the image search providing apparatus 220 may assign a greater weight to a shot having a higher importance.
- The image search providing apparatus 220 may determine that the greater the number of candidate keywords, the higher the importance. Accordingly, the image search providing apparatus 220 may determine a weight for each of the shots constituting the scene based on the number of candidate keywords included in each shot. For example, in Table 1, shot 1 510 has 17 candidate keywords and shot 2 520 has 10 candidate keywords, so shot 1 510 may have a weight of 0.63 (17/27) and shot 2 520 a weight of 0.37 (10/27). Table 2 below shows the weighted sums of the keywords and corresponding confidence values of the shots constituting the scene (e.g., shot 1 510 and shot 2 520) according to the example above.
- For example, the keywords 'Man', 'Picture frame', 'Art', 'Event', 'Clothing', 'Visual Arts', and 'Person', whose weighted sums of the confidence values of shot 1 510 and shot 2 520 are equal to or greater than a predetermined third threshold value (e.g., 0.5), may be determined as the final keywords.
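- The weighted-sum selection of final keywords can be sketched as follows. This is a minimal illustration with made-up confidence values: each shot's weight is its share of the scene's candidate-keyword count (as in the 17/27 and 10/27 example), and a keyword missing from a shot is assumed to contribute zero confidence:

```python
def final_keywords(shot_candidates, third_threshold=0.5):
    """shot_candidates: one dict (keyword -> confidence) per shot in the scene.
    A keyword whose weighted confidence sum reaches the third threshold
    becomes a final keyword representing the scene."""
    total = sum(len(s) for s in shot_candidates)
    weights = [len(s) / total for s in shot_candidates]
    scores = {}
    for w, shot in zip(weights, shot_candidates):
        for kw, conf in shot.items():
            scores[kw] = scores.get(kw, 0.0) + w * conf
    return {kw: round(v, 3) for kw, v in scores.items() if v >= third_threshold}

# made-up candidate keywords of two shots forming one scene
s1 = {"Man": 0.88, "Picture frame": 0.85, "Person": 0.60}
s2 = {"Man": 0.84, "Woman": 0.84, "Picture frame": 0.70}
print(final_keywords([s1, s2]))  # 'Man' and 'Picture frame' survive the 0.5 threshold
```

With equal weights (3/6 each), 'Man' scores 0.86 and 'Picture frame' 0.775, while 'Person' (0.30) and 'Woman' (0.42) fall below the third threshold and are discarded as noise.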
- the third threshold value may be determined based on a distribution of confidence values corresponding to keywords of shots constituting a scene.
- the third threshold value may be determined based on an average and variance of confidence values corresponding to keywords of shots constituting a scene.
- the third threshold value may be determined based on a predetermined percentile value of confidence values corresponding to keywords of shots constituting a scene.
- the method of determining the third threshold value is not limited to the above example, and includes any method that may be determined based on the distribution of confidence values.
- The image search providing apparatus 220 receives a scene search request including a search query.
- the image search providing apparatus 220 may include a search engine itself or may use an external search engine. When the image search providing apparatus 220 includes a search engine itself, it may directly receive a scene search request including a search query from a user's terminal. Alternatively, when an external search engine is used, the user may request a scene search through the external search engine, and the image search providing apparatus 220 may receive a scene search request including a search query from the external search engine.
- the image search providing apparatus 220 determines a scene corresponding to the search query based on the final keyword.
- the image search providing apparatus 220 may compare the final keyword with the search query, and may determine a scene corresponding to the search query based on the comparison result.
- the image search providing apparatus 220 may add one or more relational keywords having a predetermined relationship with the final keyword, and may determine a scene corresponding to the search query by further considering the relational keywords.
- The relational keywords may include synonyms of the final keyword, broader/narrower terms, and the like.
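- Relational-keyword expansion can be sketched as follows. The relation table here is a hypothetical stand-in (the patent does not specify how the relations are obtained; a thesaurus of synonyms and broader/narrower terms could serve), and all function names are illustrative:

```python
# hypothetical relation table: final keyword -> related terms
RELATED = {
    "Man": ["Person", "Male"],
    "Picture frame": ["Frame", "Artwork"],
}

def expand_keywords(final_kws):
    """Adds relational keywords so nearby terms in a query can still hit the scene."""
    expanded = set(final_kws)
    for kw in final_kws:
        expanded.update(RELATED.get(kw, []))
    return expanded

def matches_query(query_terms, final_kws):
    """True if any query term matches a final or relational keyword (case-insensitive)."""
    expanded = {k.lower() for k in expand_keywords(final_kws)}
    return any(t.lower() in expanded for t in query_terms)

print(matches_query(["person"], ["Man", "Picture frame"]))  # True: 'person' relates to 'Man'
```

In this sketch a query for "person" matches a scene whose final keywords are only 'Man' and 'Picture frame', because the relational keywords are considered in addition to the final keywords.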
- the image search providing apparatus 220 provides an image corresponding to the determined scene.
- the image search providing apparatus 220 may provide an image corresponding to a scene determined by the external search engine.
- the image search providing apparatus 220 may modify the final keyword based on the search query.
- the image search providing apparatus 220 may improve search accuracy while continuously correcting determined information by applying feedback on the search result.
- the embodiments described above may be implemented by a hardware component, a software component, and/or a combination of a hardware component and a software component.
- The apparatus, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
- the processing device may execute an operating system (OS) and one or more software applications running on the operating system.
- the processing device may also access, store, manipulate, process, and generate data in response to execution of the software.
- The processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.
- software may comprise a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively.
- software and/or data may be embodied, permanently or temporarily, in any kind of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, so as to be interpreted by the processing device or to provide instructions or data to the processing device.
- the software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.
- the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium.
- the computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.
- the program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software.
- Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
Abstract
Description
| Shot 1 | | Shot 2 | | Shot 3 | |
|---|---|---|---|---|---|
| Man | 0.88 | Woman | 0.84 | Woman | 0.83 |
| Picture frame | 0.85 | Man | 0.84 | Person | 0.79 |
| Person | 0.81 | Picture frame | 0.78 | Top | 0.67 |
| Clothing | 0.69 | Clothing | 0.56 | Gesture | 0.81 |
| Luggage & bags | 0.57 | Luggage & bags | 0.55 | Forehead | 0.78 |
| Art | 0.86 | Art | 0.68 | Finger | 0.72 |
| Visual Arts | 0.82 | Room | 0.66 | Scene | 0.71 |
| Modern Art | 0.74 | Event | 0.63 | Hand | 0.70 |
| Painting | 0.74 | Photography | 0.62 | Mouth | 0.68 |
| Organism | 0.72 | Conversation | 0.57 | Smile | 0.64 |
| Fun | 0.70 | Gesture | 0.56 | Photography | 0.62 |
| Event | 0.67 | Visual Arts | 0.55 | Black Hair | 0.61 |
| Adaptation | 0.67 | | | Conversation | 0.58 |
| Room | 0.66 | | | Jaw | 0.57 |
| Art Exhibition | 0.65 | | | | |
| Drawing | 0.59 | | | | |
| Portrait | 0.57 | | | | |
| Animation | 0.56 | | | | |
| Exhibition | 0.56 | | | | |
| Illustration | 0.55 | | | | |
| Conversation | 0.54 | | | | |
| Mural | 0.53 | | | | |
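The shot-level keyword/confidence lists above feed the grouping step: keywords whose confidence clears a first threshold become candidate keywords, and consecutive shots whose candidate sets overlap enough are merged into one scene. The sketch below assumes a Jaccard-style match ratio and illustrative threshold values; the source fixes neither the ratio definition nor the thresholds.

```python
def candidate_keywords(shot, first_threshold=0.55):
    """Keywords of one shot whose confidence clears the first threshold."""
    return {kw for kw, conf in shot.items() if conf >= first_threshold}

def match_ratio(a, b):
    """Overlap between two candidate sets (Jaccard index: an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_into_scenes(shots, first_threshold=0.55, second_threshold=0.3):
    """Merge consecutive shots into one scene whenever their
    candidate-keyword match ratio is at least the second threshold."""
    scenes = [[0]]
    prev = candidate_keywords(shots[0], first_threshold)
    for i in range(1, len(shots)):
        cur = candidate_keywords(shots[i], first_threshold)
        if match_ratio(prev, cur) >= second_threshold:
            scenes[-1].append(i)
        else:
            scenes.append([i])
        prev = cur
    return scenes
```

Applied to the table above, shots sharing many high-confidence keywords (Man, Picture frame, Art, ...) would merge into one scene, while a dissimilar shot would start a new one.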
| Scene 1 | |
|---|---|
| Man | 0.8652 |
| Picture frame | 0.8241 |
| Art | 0.8304 |
| Event | 0.4221 |
| Clothing | 0.6382 |
| Visual Arts | 0.5166 |
| Person | 0.5103 |
| Modern Art | 0.4662 |
| Painting | 0.4662 |
| Organism | 0.4536 |
| Fun | 0.441 |
| Adaptation | 0.4221 |
| Room | 0.4158 |
| Art Exhibition | 0.4095 |
| Drawing | 0.3717 |
| Luggage & bags | 0.3591 |
| Portrait | 0.3591 |
| Animation | 0.3528 |
| Exhibition | 0.3528 |
| Illustration | 0.3465 |
| Conversation | 0.3402 |
| Mural | 0.3339 |
| Woman | 0.3108 |
| Room | 0.2442 |
| Photography | 0.2294 |
| Conversation | 0.2109 |
| Gesture | 0.2072 |
| Luggage & bags | 0.2035 |
| Visual Arts | 0.2035 |
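The scene-level scores above are described as a statistical aggregation (a weighted sum) of the shot-level confidences, with keywords kept only when the sum clears a third threshold. A sketch, with illustrative weights and an illustrative threshold; the source derives both from the data rather than fixing them:

```python
def scene_scores(shots, weights):
    """Weighted sum of per-shot confidences for every keyword."""
    scores = {}
    for shot, w in zip(shots, weights):
        for kw, conf in shot.items():
            scores[kw] = scores.get(kw, 0.0) + w * conf
    return scores

def final_keywords(shots, weights, third_threshold=0.4):
    """Keep keywords whose scene-level score clears the third threshold."""
    return {kw: s for kw, s in scene_scores(shots, weights).items()
            if s >= third_threshold}
```

Keywords appearing in several shots (Man, Picture frame, Art) accumulate high scene scores, while keywords seen only in one shot fall toward the bottom of the table.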
Claims (20)
- An image search providing method comprising: receiving, from an image analysis engine, one or more keywords corresponding to each shot of an image and confidence values corresponding to the keywords; determining, among the one or more keywords, keywords whose confidence values are equal to or greater than a first threshold as candidate keywords; determining, among consecutive shots, shots in which the match ratio of the candidate keywords of each shot is equal to or greater than a second threshold as one scene; determining a final keyword representing the scene by statistically processing the confidence values corresponding to the keywords of the shots constituting the scene; receiving a scene search request including a search query; determining a scene corresponding to the search query based on the final keyword; and providing an image corresponding to the determined scene.
- The method of claim 1, further comprising modifying the final keyword based on the search query.
- The method of claim 1, wherein the first threshold is determined based on a distribution of the confidence values.
- The method of claim 3, wherein determining the final keyword comprises: determining a weight for each of the shots constituting the scene; computing a weighted sum of the confidence values corresponding to the keywords of the shots constituting the scene, based on the per-shot weights; and determining a keyword whose weighted sum is equal to or greater than a third threshold as the final keyword.
- The method of claim 4, wherein determining the per-shot weights comprises determining the weight of each shot constituting the scene based on the number of candidate keywords included in that shot.
- The method of claim 4, wherein the third threshold is determined based on a distribution of the confidence values corresponding to the keywords of the shots constituting the scene.
- The method of claim 4, wherein the weighted summing comprises computing a weighted sum of the confidence values corresponding to the candidate keywords of the shots constituting the scene, based on the weights.
- The method of claim 4, wherein determining the scene corresponding to the search query comprises: comparing the search query with the keywords equal to or greater than the third threshold; and determining the scene corresponding to the search query based on a result of the comparison.
- The method of claim 8, further comprising adding one or more relational keywords having a predetermined relationship with the keywords equal to or greater than the third threshold, wherein determining the scene corresponding to the search query comprises determining the scene by further considering the relational keywords.
- The method of claim 1, wherein the image analysis engine comprises an external image analysis engine that receives the image and generates the keywords and the confidence values corresponding to the keywords.
- A computer program stored in a medium to execute, in combination with hardware, the method of any one of claims 1 to 10.
- An image search providing apparatus comprising a processor configured to: receive, from an image analysis engine, one or more keywords corresponding to each shot of an image and confidence values corresponding to the keywords; determine, among the one or more keywords, keywords whose confidence values are equal to or greater than a first threshold as candidate keywords; determine, among consecutive shots, shots in which the match ratio of the candidate keywords of each shot is equal to or greater than a second threshold as one scene; determine a final keyword representing the scene by statistically processing the confidence values corresponding to the keywords of the shots constituting the scene; receive a scene search request including a search query; determine a scene corresponding to the search query based on the final keyword; and provide an image corresponding to the determined scene.
- The apparatus of claim 12, wherein the processor modifies the final keyword based on the search query.
- The apparatus of claim 12, wherein the first threshold is determined based on a distribution of the confidence values.
- The apparatus of claim 14, wherein the processor determines a weight for each of the shots constituting the scene, computes a weighted sum of the confidence values corresponding to the keywords of the shots constituting the scene based on the per-shot weights, and determines a keyword whose weighted sum is equal to or greater than a third threshold as the final keyword.
- The apparatus of claim 15, wherein the processor determines the weight of each shot constituting the scene based on the number of candidate keywords included in that shot.
- The apparatus of claim 15, wherein the third threshold is determined based on a distribution of the confidence values corresponding to the keywords of the shots constituting the scene.
- The apparatus of claim 15, wherein the processor computes the weighted sum of the confidence values corresponding to the candidate keywords of the shots constituting the scene based on the weights.
- The apparatus of claim 15, wherein the processor compares the search query with the keywords equal to or greater than the third threshold, and determines the scene corresponding to the search query based on a result of the comparison.
- The apparatus of claim 15, wherein the processor adds one or more relational keywords having a predetermined relationship with the keywords equal to or greater than the third threshold, and determines the scene corresponding to the search query by further considering the relational keywords.
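Claims 5 and 16 derive the per-shot weights from the number of candidate keywords each shot contributes. A minimal sketch; normalizing the weights to sum to 1 (and the uniform fallback) are assumptions, since the claims only require the weights to depend on the counts:

```python
def shot_weights(shots, first_threshold=0.55):
    """Weight each shot in proportion to its candidate-keyword count."""
    counts = [sum(1 for conf in shot.values() if conf >= first_threshold)
              for shot in shots]
    total = sum(counts)
    if total == 0:  # no candidate keywords anywhere: fall back to uniform
        return [1.0 / len(shots)] * len(shots)
    return [c / total for c in counts]
```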
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2020-0007170 | 2020-01-20 | ||
KR20200007170 | 2020-01-20 | ||
KR10-2020-0180768 | 2020-12-22 | ||
KR1020200180768A KR20210093744A (en) | 2020-01-20 | 2020-12-22 | Method and apparatus for providing image search |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021149923A1 true WO2021149923A1 (en) | 2021-07-29 |
Family
ID=76993220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/018892 WO2021149923A1 (en) | 2020-01-20 | 2020-12-22 | Method and apparatus for providing image search |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021149923A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060097895A (en) * | 2005-03-07 | 2006-09-18 | 삼성전자주식회사 | Method and apparatus for speech recognition |
KR20080103227A (en) * | 2007-05-23 | 2008-11-27 | 삼성전자주식회사 | Method for searching supplementary data related to contents data and apparatus thereof |
KR20160058587A (en) * | 2014-11-17 | 2016-05-25 | 삼성전자주식회사 | Display apparatus and method for summarizing of document |
KR20170090678A (en) * | 2016-01-29 | 2017-08-08 | (주) 다이퀘스트 | Apparatus for extracting scene keywords from video contents and keyword weighting factor calculation apparatus |
US20170300570A1 (en) * | 2016-04-13 | 2017-10-19 | Google Inc. | Video Metadata Association Recommendation |
KR20180136265A (en) * | 2017-06-14 | 2018-12-24 | 주식회사 핀인사이트 | Apparatus, method and computer-readable medium for searching and providing sectional video |
- 2020-12-22: WO PCT/KR2020/018892 patent/WO2021149923A1/en (active, Application Filing)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11481428B2 (en) | Bullet screen content processing method, application server, and user terminal | |
WO2020080606A1 (en) | Method and system for automatically generating video content-integrated metadata using video metadata and script data | |
WO2018074716A1 (en) | Method and system for recommending query by using search context | |
WO2010117213A2 (en) | Apparatus and method for providing information related to broadcasting programs | |
WO2015119335A1 (en) | Content recommendation method and device | |
WO2016013885A1 (en) | Method for retrieving image and electronic device thereof | |
WO2014106986A1 (en) | Electronic apparatus controlled by a user's voice and control method thereof | |
WO2020189884A1 (en) | Machine learning-based approach to demographic attribute inference using time-sensitive features | |
WO2020166883A1 (en) | Method and system for editing video on basis of context obtained using artificial intelligence | |
WO2018174314A1 (en) | Method and system for producing story video | |
WO2014175520A1 (en) | Display apparatus for providing recommendation information and method thereof | |
WO2020022550A1 (en) | Method and apparatus for providing dance game based on recognition of user motion | |
WO2018080228A1 (en) | Server for translation and translation method | |
CN104102683A (en) | Contextual queries for augmenting video display | |
WO2013066041A1 (en) | Social data management system and method for operating the same | |
WO2022071635A1 (en) | Recommending information to present to users without server-side collection of user data for those users | |
CN113254779A (en) | Content search method, device, equipment and medium | |
WO2024008184A1 (en) | Information display method and apparatus, electronic device, and computer readable medium | |
WO2021149923A1 (en) | Method and apparatus for providing image search | |
WO2020242089A2 (en) | Artificial intelligence-based curating method and device for performing same method | |
WO2021149924A1 (en) | Method and apparatus for providing media enrichment | |
KR20210093743A (en) | Method and apparatus for providing media enrichment | |
EP3555883A1 (en) | Security enhanced speech recognition method and device | |
WO2019194569A1 (en) | Image searching method, device, and computer program | |
WO2017222226A1 (en) | Method for registering advertised product on image content and server for executing same |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20916104; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20916104; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 191222) |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20916104; Country of ref document: EP; Kind code of ref document: A1 |