CN110751086A - Target searching method, device, equipment and storage medium based on video - Google Patents

Target searching method, device, equipment and storage medium based on video

Info

Publication number
CN110751086A
CN110751086A (application CN201910989383.2A)
Authority
CN
China
Prior art keywords
video
preset
target
frame
boundary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910989383.2A
Other languages
Chinese (zh)
Inventor
周鸣 (Zhou Ming)
王长虎 (Wang Changhu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910989383.2A priority Critical patent/CN110751086A/en
Publication of CN110751086A publication Critical patent/CN110751086A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval using metadata automatically derived from the content
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a video-based target searching method, device, equipment and storage medium. The video-based target searching method comprises the following steps: extracting frames from the video; detecting whether a preset target is present in a frame of the video, and if so, determining the position of the preset target in the frame with a bounding box; extracting the features of the preset target delimited by the bounding box; and searching for information to be recommended corresponding to the preset target using the features of the preset target; wherein the bounding box is the smallest box that encloses the preset target. The disclosed method and device can identify the same target across frames of a video, thereby avoiding repeated searches and allowing the optimal target instance to be selected, which improves the user experience.

Description

Target searching method, device, equipment and storage medium based on video
Technical Field
The present disclosure relates to the field of computer software technologies, and in particular, to a method, an apparatus, a device, and a storage medium for video-based object search.
Background
With the development of multimedia technology, people increasingly depend on various intelligent devices. Intelligent devices include a wide range of devices and terminals, such as system terminals that process and control information using computer technology and digital communication network technology. Currently, intelligent devices with touch screens, such as mobile phones, are widely used.
On such touch-screen devices, people often use computer vision technologies, for example the image recognition features built into e-commerce applications, such as Taobao's photo search (Pailitao) and JD.com's snap-to-shop. With the rise of short video and live streaming, however, users also want the content of a video to be detected and recognized while they watch, so that personalized recommendations can be made and the user experience improved.
At present, the main problem is the following: traditional recognition-based search operates on images, which are much simpler than video and are usually supplied actively by the user for matching; recognition-based search on video remains very difficult.
Disclosure of Invention
The present disclosure has been made to solve the above problems, and its object is to provide a simple and efficient video-based target searching method, device, equipment, and storage medium that can identify the same target in a video, so as to avoid repeated searching and to select the optimal target. This summary is provided to introduce a selection of concepts in simplified form that are further described below in the detailed description. It is not intended to identify key or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter.
In order to solve the above technical problem, an embodiment of the present disclosure provides a video-based target search method, which adopts the following technical solutions:
extracting frames from the video;
detecting whether a preset target is present in a frame of the video, and if so, determining the position of the preset target in the frame using a bounding box;
extracting the features of the preset target delimited by the bounding box;
searching for information to be recommended corresponding to the preset target using the features of the preset target;
wherein the bounding box is the smallest box that encloses the preset target.
In order to solve the above technical problem, an embodiment of the present disclosure further provides a video-based target search apparatus, which adopts the following technical solution, comprising:
a video frame extraction module for extracting frames from the video;
a target detection module for detecting whether a preset target is present in a frame of the video, and if so, determining the position of the preset target in the frame using a bounding box;
a feature extraction module for extracting the features of the preset target delimited by the bounding box;
a search module for searching for information to be recommended corresponding to the preset target using the features of the preset target;
wherein the bounding box is the smallest box that encloses the preset target.
In order to solve the above technical problem, an embodiment of the present disclosure further provides a computer device, which adopts the following technical solutions:
comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the video-based object search method described above.
In order to solve the above technical problem, an embodiment of the present disclosure further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements a video-based object search method as described above.
Compared with the prior art, the technical scheme of the present disclosure can identify the same target in a video, so as to avoid repeated searching and select the optimal target, thereby improving the user experience.
Drawings
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a video-based target search method according to the present disclosure;
FIG. 3 is a schematic diagram of a bounding box of a video-based object search method according to the present disclosure;
FIG. 4 is a schematic diagram of an extracted frame picture of one embodiment of a video-based object search method according to the present disclosure;
FIG. 5 is a schematic diagram illustrating one embodiment of a video-based object search apparatus, according to the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of a computer device according to the present disclosure.
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure; the terms "including" and "having," and any variations thereof, in the description and claims of this disclosure and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of the present disclosure or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the disclosure. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
[ System Structure ]
First, the structure of the system of one embodiment of the present disclosure is explained. As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, 104, a network 105, and a server 106. The network 105 serves as a medium for providing communication links between the terminal devices 101, 102, 103, 104 and the server 106.
In the present embodiment, an electronic device (e.g., terminal device 101, 102, 103, or 104 shown in fig. 1) on which the video-based object search method runs can transmit various information through the network 105. Network 105 may include various connection types, such as wired or wireless communication links, or fiber optic cables. It should be noted that the wireless connection means may include, but are not limited to, 3G/4G/5G connections, Wi-Fi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB connections, and other now known or later developed wireless connection means.
A user may use terminal devices 101, 102, 103, 104 to interact with a server 106 via a network 105 to receive or send messages or the like. Various client applications, such as a video live and play application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal device 101, 102, 103, or 104.
The terminal device 101, 102, 103 or 104 may be any of various electronic devices having a touch screen display and/or supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (MPEG Audio Layer III) players, MP4 (MPEG Audio Layer IV) players, head-mounted display devices, laptop computers, desktop computers, and the like.
The server 106 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101, 102, 103, or 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Here, the terminal device may implement the method of the disclosed embodiments independently, or by running an application on the Android system in cooperation with other electronic terminal devices; it may equally run an application on other operating systems, such as iOS, Windows, or HarmonyOS (Hongmeng), to implement the method of the disclosed embodiments.
[ video-based object search method ]
Referring to FIG. 2, a flow diagram for one embodiment of a video-based target search method is shown, according to the present disclosure. The video-based target searching method comprises the following steps:
s21, the terminal device extracts the frames in the currently played video in a manner of frame rate extraction, where the frame rate of the extracted video is at least 24fps, preferably 120fps, 240fps or 300 fps.
S22: the terminal device detects whether a preset target is present in a frame of the video, and if so, determines the position of the preset target in the frame using a bounding box; here, the bounding box is the smallest box that can enclose the preset target, for example a rectangular box or a circular box.
Here, the detected preset target may be a human body, whose features in the video frames, including gender and posture, are extracted with a classification algorithm. Posture here means front, side, back, and so on; it is preferable to detect a front or back posture, from which more information can be obtained.
Of course, the detected preset target may also be apparel. Apparel may be detected directly, or it may be detected on a human body by first detecting the body. In the latter case, it must be judged whether the detected apparel's bounding box lies within the detected human body's bounding box; if so, the detected body's gender and posture are assigned to the detected apparel, i.e., the apparel's gender equals the gender of the person wearing it.
Here, the criterion for judging whether the apparel's bounding box is within the human body's bounding box is IOA (Intersection Over Area). FIG. 3 is a schematic diagram of a bounding box of a video-based object search method according to the present disclosure. When judging whether the apparel's bounding box is within the body's bounding box, S3 denotes the area of the overlap between the apparel's bounding box and the body's bounding box, S1 denotes the area of the apparel's bounding box minus the overlap area S3, and S2 denotes the area of the body's bounding box minus the overlap area S3. The ratio of the overlap area S3 to the area of the apparel's bounding box (S1 + S3) is computed, i.e., IOA1 = S3/(S1 + S3); if IOA1 is greater than a first threshold, the apparel's bounding box is within the body's bounding box, and otherwise it is not. The first threshold is preferably 0.8, which gives a suitable criterion. The first threshold could also be 1.0, but that would require the overlap area S3 to coincide exactly with the apparel's bounding box area (S1 + S3), i.e., the apparel box would have to lie entirely inside the body box, which is too strict a criterion. In general, the first threshold may be any value between 0 and 1.0.
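As an illustration of the IOA test just described, the sketch below computes IOA1 = S3/(S1 + S3) for axis-aligned boxes. The (x1, y1, x2, y2) box format and the helper names are assumptions; the 0.8 threshold follows the text.

```python
def intersection_area(a, b):
    """S3: area of overlap between two axis-aligned boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def apparel_in_body(apparel_box, body_box, first_threshold=0.8):
    """IOA1 = S3 / (S1 + S3): overlap over the apparel box's own area."""
    ioa = intersection_area(apparel_box, body_box) / box_area(apparel_box)
    return ioa > first_threshold
```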
After the gender and posture of the apparel are determined, the style features of the apparel delimited by its bounding box can be extracted with a classification algorithm.
Here, detecting the target, for example the human body or the apparel, in the frames of the video further includes judging whether the target, for example a person's apparel, in adjacent frames is the same piece. The positional relationship of the apparel between the two frames is judged using the IOU (Intersection Over Union). Referring again to FIG. 3, in this case S3 denotes the area of the overlap between the apparel bounding boxes of the two adjacent frames, S1 denotes the area of the apparel's bounding box in the earlier frame minus the overlap area S3, and S2 denotes the area of the apparel's bounding box in the later frame minus the overlap area S3. If the ratio of the overlap area S3 to the total area of the two bounding boxes (S1 + S2 + S3), i.e., IOU = S3/(S1 + S2 + S3), is greater than a second threshold, and the Euclidean distance between the apparel features of the two frames is less than a third threshold, the apparel in the adjacent frames is the same piece. The second threshold is preferably 0.5, which gives a suitable criterion, though it may be any value between 0 and 1.0 depending on the desired strictness. The third threshold is preferably 0.1, though it may be made arbitrarily small to tighten the criterion. Here, a feature is an array [x1, x2, x3, …] of floating point numbers, and the Euclidean distance is the Euclidean distance between two such feature arrays.
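A sketch of this same-piece test for adjacent frames follows, reusing `intersection_area` and `box_area` from the IOA sketch above. The 0.5 and 0.1 thresholds follow the text; the function names are assumptions.

```python
import math

def iou(a, b):
    """IOU = S3 / (S1 + S2 + S3): overlap over the union of both boxes."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def euclidean_distance(f1, f2):
    """Euclidean distance between two float feature arrays [x1, x2, x3, ...]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

def same_apparel(box_prev, box_next, feat_prev, feat_next,
                 second_threshold=0.5, third_threshold=0.1):
    """True if the apparel in two adjacent frames is judged the same piece."""
    return (iou(box_prev, box_next) > second_threshold
            and euclidean_distance(feat_prev, feat_next) < third_threshold)
```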
Here, the video-based target search method of the present disclosure examines all pairs of adjacent frames in turn to judge whether the targets in all frames of the video are the same target; if the preset targets in every pair of adjacent frames are the same target, the preset target is the same throughout the video.
Here, when the target is, for example, a piece of apparel, after extracting the features of the apparel delimited by the bounding box, candidate targets with preset features are filtered out of the detected apparel: instances in different frames whose posture is neither front nor side are removed, the instances showing the front or side view of the apparel are kept as candidate target apparel, and the candidate whose bounding box has the largest area is chosen as the final target apparel, to be used for feature extraction and search in the subsequent steps. A video may contain multiple targets to be retrieved, each with its own final-target frame.
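The sketch below illustrates this filtering step: drop detections whose pose is not front or side, then keep the detection with the largest bounding box. The `Detection` record and its field names are illustrative assumptions; `box_area` is reused from the IOA sketch above.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple       # (x1, y1, x2, y2) bounding box
    pose: str        # e.g. "front", "side", "back"
    features: list   # float feature array [x1, x2, x3, ...]

def select_final_target(detections):
    """Keep front/side poses only, then return the largest remaining box."""
    candidates = [d for d in detections if d.pose in ("front", "side")]
    return max(candidates, key=lambda d: box_area(d.box)) if candidates else None
```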
S23: the terminal device extracts the features of the preset target delimited by the bounding box; here, a feature of the preset target is an array [x1, x2, x3, …] of floating point numbers.
Specifically, after the final target in the video is determined, the features of the final target delimited by its bounding box are extracted; the features of the final target are likewise an array [x1, x2, x3, …] of floating point numbers.
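As one way the float feature array could be produced, the sketch below crops the final target's bounding box and embeds it with a generic CNN. The torchvision ResNet-18 backbone is a stand-in assumption, since the patent does not name a concrete network.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Stand-in embedding network: classification head removed to expose the
# 512-dimensional embedding instead of class logits.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(frame, box):
    """Crop the bounding box from an HxWx3 uint8 frame; return float features."""
    x1, y1, x2, y2 = box
    with torch.no_grad():
        emb = model(preprocess(frame[y1:y2, x1:x2]).unsqueeze(0))
    return emb.squeeze(0).tolist()   # [x1, x2, x3, ...] as floats
```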
S24: the terminal device searches for information to be recommended corresponding to the preset target, such as its brand information and price information, using the features of the preset target.
Specifically, after the final target in the video is determined, the extracted feature data of the final target is used to search for the information to be recommended for that target, such as brand information and price information.
Here, different final targets in the video are searched separately; results that do not match predetermined attributes are excluded from the search results, and the remaining results are displayed.
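One plausible reading of this search step is a nearest-neighbour lookup over a catalog of pre-computed feature arrays. The catalog schema below (features, brand, price fields) is an assumption; the distance helper is reused from the same-piece sketch earlier.

```python
def search_recommendations(target_features, catalog, top_k=5):
    """Rank catalog entries by feature distance; return brand/price info."""
    ranked = sorted(catalog,
                    key=lambda item: euclidean_distance(target_features,
                                                        item["features"]))
    return [{"brand": it["brand"], "price": it["price"]} for it in ranked[:top_k]]
```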
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
[ example method ]
In the following, taking an example of searching for clothes in a video, a method of an embodiment of the present disclosure is described, which includes the following steps:
step 1, the terminal equipment extracts the frame in the video played currently, and the frame in the video played currently is extracted according to the frame rate. Fig. 4 is a schematic diagram of a frame picture extracted according to an embodiment of the video-based object search method of the present disclosure, as shown in fig. 4, S4 is an area of a bounding box representing a human body minus an area of a bounding box of clothing, S5 is an area of a bounding box representing clothing, and in the present embodiment, S5 is also an area of a coincident part of two bounding boxes.
Step 2: the terminal device detects the human body and the apparel in the video frames to determine whether they are present, and determines the positions of the human body and of the apparel on the body in the frame using bounding boxes.
First, the human body is detected: its position in the frame is determined by the body bounding box, and the features of the body within that box, namely gender and posture, are extracted with a classification algorithm. For example, in fig. 4, the gender of the human body is female and the posture is a frontal posture.
Then, the apparel on the human body is detected based on the detected body. When detecting apparel in this way, it is necessary to judge whether the detected apparel's bounding box lies within the detected body's bounding box.
Here, the criterion for judging whether the apparel is within the body's bounding box is the IOA (Intersection Over Area): whether the ratio of the overlap area S5 to the area of the apparel's bounding box, likewise S5, exceeds the first threshold, i.e., IOA2 = S5/S5. In the present embodiment IOA2 = 1, which is greater than the first threshold 0.8, so the apparel is within the body's bounding box, and the detected gender and posture of the body are assigned to the detected apparel: the apparel's gender is female and its posture is frontal.
Once the gender and posture of the apparel are detected, a classification algorithm can also be used to extract a style feature of the apparel; in this embodiment, for example, it is detected as a sports jacket.
Then, after the apparel on the human body in the current frame is detected, it is judged whether the apparel on the body in the frames adjacent to the current frame is the same piece. The positional relationship of the apparel between the two frames is judged using the IOU (Intersection Over Union): if the ratio of the overlap area S5 of the two apparel bounding boxes in the adjacent frames to their total area (S4 + S5) is greater than the second threshold 0.5, and the Euclidean distance between the apparel features of the two frames is less than the third threshold 0.1, the apparel in the adjacent frames is the same piece, namely the sports jacket shown in the picture of FIG. 4. Here, a feature is an array [x1, x2, x3, …] of floating point numbers, and the Euclidean distance is the Euclidean distance between two such feature arrays.
Next, all pairs of adjacent frames are judged in turn to determine whether the apparel in all frames of the video is the same sports jacket; if the apparel in every pair of adjacent frames is the same sports jacket, the preset target in the video is that sports jacket.
Then, the instances of the target sports jacket showing a front or side view are filtered out of all frames, i.e., instances whose posture is neither front nor side are removed, and among all bounding boxes of the sports jacket in the remaining frames, the one with the largest area is chosen as the final-target frame of the sports jacket in the video; this instance is used for feature extraction and retrieval in the subsequent steps. There may be multiple targets to retrieve in the video, for example a T-shirt on a person in the picture in addition to the sports jacket, and each piece of apparel has its own final-target frame.
Step 3: after the final-target sports jacket in the video is determined by the bounding box, the terminal device extracts the features of the final-target sports jacket delimited by the bounding box; the features of the final target are an array [x1, x2, x3, …] of floating point numbers.
Step 4: the terminal device uses the features of the final-target sports jacket to search for its information, such as brand information and price information. Different pieces of apparel in the video can likewise be searched; results that do not match predetermined attributes, for example apparel that is not women's or not of the sports jacket type, are excluded before the results are displayed.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read Only Memory (ROM), or a Random Access Memory (RAM).
[ video-based object search device ]
As shown in fig. 5, in order to implement the technical solution in the embodiment of the present disclosure, the present disclosure provides a video-based target searching apparatus, which may be specifically applied to various electronic terminal devices.
The video-based target searching device described in this embodiment includes: a video frame extraction module 501, an object detection module 502, a feature extraction module 503, a search module 504, a deduplication module 505, and a filtering module 506.
A video frame extraction module 501 extracts frames from the currently playing video; the frames are extracted at a given frame rate, which is at least 24 fps, preferably 120 fps, 240 fps, or 300 fps.
A target detection module 502 detects whether a preset target is present in a frame of the video, and if so, determines the position of the preset target in the frame using a bounding box; here, the bounding box is the smallest box that can enclose the preset target, for example a rectangular box or a circular box.
Here, the detected target may be a human body, whose gender and posture in the video frames are extracted with a classification algorithm. Posture means front, side, back, and so on; it is preferable to detect a front or back posture, from which more information can be obtained.
Of course, the detected target may also be apparel, which may be detected directly or detected on a human body by first detecting the body. In the latter case, it must be judged whether the detected apparel's bounding box lies within the detected body's bounding box; if so, the detected body's gender and posture are assigned to the detected apparel, i.e., the apparel's gender equals the gender of the person wearing it.
Here, the criterion for judging whether the apparel's bounding box is within the human body's bounding box is IOA (Intersection Over Area), as illustrated in FIG. 3. When judging whether the apparel is within the body's bounding box, S3 denotes the area of the overlap between the apparel's bounding box and the body's bounding box, S1 denotes the area of the apparel's bounding box minus the overlap area S3, and S2 denotes the area of the body's bounding box minus the overlap area S3. The ratio IOA1 = S3/(S1 + S3) of the overlap area to the area of the apparel's bounding box is computed; if IOA1 is greater than the first threshold, the apparel is within the body's bounding box, and otherwise it is not. The first threshold is preferably 0.8, which gives a suitable criterion; a threshold of 1.0 would require the apparel box to lie entirely within the body box, which is too strict a criterion. In general, the first threshold may be any value between 0 and 1.0.
After the gender and posture of the apparel are determined, the style features of the apparel can be extracted with a classification algorithm.
A feature extraction module 503 extracts the features of the preset target delimited by the bounding box; after the final target in the video is determined, the features of the final target delimited by its bounding box are extracted, the features being an array [x1, x2, x3, …] of floating point numbers. The final target is determined by the modules described below.
A search module 504 searches for information to be recommended corresponding to the preset target using the features of the preset target; after the final target in the video is determined, the extracted feature data of the final target is used to search for its information to be recommended, including brand information, price information, and the like of the apparel. Different targets in the video are searched separately; results that do not match predetermined attributes are excluded from the search results, and the remainder are displayed.
A deduplication module 505 judges whether the targets in adjacent frames are the same target by judging their positional relationship: if the ratio of the overlap area of the target's two bounding boxes in the adjacent frames to the total area of the two boxes is greater than a second threshold, and the Euclidean distance between the target's features is less than a third threshold, the targets in the adjacent frames are the same target. The criterion for the positional relationship of the apparel between the two frames is the IOU (Intersection Over Union), as illustrated in FIG. 3: S3 denotes the area of the overlap of the two apparel bounding boxes, S1 denotes the area of the apparel's bounding box in the earlier frame minus the overlap area S3, and S2 denotes the area of the apparel's bounding box in the later frame minus the overlap area S3. If the ratio IOU = S3/(S1 + S2 + S3) of the overlap area to the total area of the two boxes is greater than the second threshold, and the Euclidean distance between the apparel features of the two frames is less than the third threshold, the apparel in the adjacent frames is the same piece. The second threshold is preferably 0.5, though it may be any value between 0 and 1.0 depending on the desired strictness; the third threshold is preferably 0.1, though it may be made arbitrarily small to tighten the criterion. A feature is an array [x1, x2, x3, …] of floating point numbers, and the Euclidean distance is the Euclidean distance between two such feature arrays.
Here, all pairs of adjacent frames are judged in turn to determine whether the targets in all frames of the video are the same target; if the preset targets in every pair of adjacent frames are the same target, the preset target is the same throughout the video.
A filtering module 506, when the target is apparel, filters out candidate target apparel with preset features, i.e., removes instances in different frames whose posture is neither front nor side, and among all bounding boxes belonging to the same apparel in the remaining frames, determines the one with the largest area as the final-target frame of that apparel in the video; the final-target apparel is used for feature extraction and search in the subsequent steps. There may be multiple targets to retrieve in the video, each with its own final-target frame.
It should be understood that although each block in the block diagrams of the figures may represent a module, a portion of which comprises one or more executable instructions for implementing the specified logical function(s), the blocks are not necessarily executed sequentially. Each module and functional unit in the device embodiments in the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more modules or functional units are integrated into one module. The integrated modules can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
[ video-based object search apparatus ]
In order to solve the technical problem, an embodiment of the present disclosure further provides an electronic device. Referring now to fig. 6, a schematic diagram of an electronic device (e.g., a terminal device or a server in fig. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., a central processing unit, graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided; more or fewer may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When executed by the processing device 601, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (hypertext transfer protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a video-based target search method, including:
extracting frames from the video;
detecting whether a preset target is present in a frame of the video, and if so, determining the position of the preset target in the frame using a bounding box;
extracting the features of the preset target delimited by the bounding box;
searching for information to be recommended corresponding to the preset target using the features of the preset target;
wherein the bounding box is the smallest box that encloses the preset target.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the frames in the video are extracted according to a frame rate of at least 24 fps.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the preset target is a human body, and the features of the human body in the video frames delimited by the body's bounding box, including the body's gender and posture, are extracted.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the preset target further comprises apparel; whether the apparel's bounding box is within the human body's bounding box is judged, and if so, the gender and posture of the human body are assigned to the apparel.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
and extracting the style features of the apparel delimited by the apparel's bounding box.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the method for judging the clothing in the boundary box of the human body is that the ratio of the overlapping area of the boundary box of the clothing and the boundary box of the human body to the area of the boundary box of the clothing is larger than a first threshold value.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the method further comprises the steps of judging whether the targets in the adjacent front and rear frames are the same target or not, and if the ratio of the overlapping area of the two boundary frames of the target in the front and rear frames to the total area of the two boundary frames is larger than a second threshold value and the Euclidean distance of the features of the target is smaller than a third threshold value, determining that the targets in the adjacent front and rear frames are the same target.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
the method further comprises judging all pairs of adjacent frames in turn; if the preset targets in every pair of adjacent frames are the same target, the targets in the video are the same target.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
after extracting the features of the preset target delimited by the bounding box, the method further comprises: filtering candidate targets with preset features out of the preset targets, and determining the candidate whose bounding box has the largest area as the final target, wherein the preset features are: showing a front or side view of the preset target; and searching for information to be recommended corresponding to the preset target using the features of the preset target comprises: searching for the information to be recommended corresponding to the final target using the features of the final target.
In accordance with one or more embodiments of the present disclosure, there is provided a video-based object search method, characterized in that,
any one of the above video-based target search methods is applied in an Android application or an iOS application.
According to one or more embodiments of the present disclosure, there is provided a video-based target search apparatus, including:
a video frame extraction module for extracting frames from the video;
a target detection module for detecting whether a preset target is present in a frame of the video, and if so, determining the position of the preset target in the frame using a bounding box;
a feature extraction module for extracting the features of the preset target delimited by the bounding box;
a search module for searching for information to be recommended corresponding to the preset target using the features of the preset target;
wherein the bounding box is the smallest box that encloses the preset target.
According to one or more embodiments of the present disclosure, there is provided a video-based object search apparatus, characterized in that,
a deduplication module for judging whether the preset targets in adjacent frames are the same target by judging their positional relationship: if the ratio of the overlap area of the preset target's two bounding boxes in the adjacent frames to the total area of the two bounding boxes is greater than a second threshold, and the Euclidean distance between the preset target's features is less than a third threshold, the preset targets in the adjacent frames are the same target;
a filtering module for, after the features of the preset target delimited by the bounding box are extracted: filtering candidate targets with preset features out of the preset targets, and determining the candidate whose bounding box has the largest area as the final target, wherein the preset features are: showing a front or side view of the preset target; and wherein searching for information to be recommended corresponding to the preset target using the features of the preset target comprises: searching for the information to be recommended corresponding to the final target using the features of the final target.
According to one or more embodiments of the present disclosure, there is provided a computer device comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the video-based object search method of any one of the above.
According to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the video-based object search method as described in any one of the above.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the present disclosure is not limited to technical solutions formed by the particular combinations of the features described above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features disclosed herein having similar functions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. A video-based target search method, characterized by comprising the following steps:
extracting frames from the video;
detecting whether a preset target exists in a frame of the video, and if so, locating the preset target in the frame with a bounding box;
extracting the features of the preset target delimited by the bounding box;
searching for information to be recommended corresponding to the preset target through the features of the preset target;
wherein the bounding box is the smallest box enclosing the preset target.
2. The video-based target search method of claim 1,
wherein the frames are extracted from the video at a frame rate of at least 24 fps.
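As an illustrative sketch only (the disclosure does not name a decoding library), frame sampling at a target rate of at least 24 fps could look as follows with OpenCV; when the native rate is lower, every frame is kept:

```python
import cv2

def extract_frames(video_path, target_fps=24):
    """Decode the video and keep roughly target_fps frames per second."""
    capture = cv2.VideoCapture(video_path)
    native_fps = capture.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(1, int(native_fps // target_fps))  # keep every step-th frame
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```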
3. The video-based target search method of claim 1,
wherein the preset target is a human body, and features of the human body in the frame, including the gender and the posture of the human body, are extracted from the region delimited by the bounding box of the human body.
4. The video-based target search method of claim 3,
wherein the preset target further comprises apparel, and the method further comprises: judging whether the bounding box of the apparel is inside the bounding box of the human body, and if so, assigning the gender and the posture of that human body to the apparel.
5. The video-based target search method of claim 4, further comprising extracting style features of the apparel from the region delimited by the bounding box of the apparel.
6. The video-based target search method of claim 4,
wherein the bounding box of the apparel is judged to be inside the bounding box of the human body when the ratio of the overlap area of the apparel bounding box and the human-body bounding box to the area of the apparel bounding box is greater than a first threshold.
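A minimal sketch of this containment test; the first-threshold value is a placeholder, as the disclosure does not fix it:

```python
def apparel_inside_human(apparel_box, human_box, first_threshold=0.8):
    """True when the overlap exceeds first_threshold of the apparel box's own area."""
    ax1, ay1, ax2, ay2 = apparel_box
    hx1, hy1, hx2, hy2 = human_box
    iw = max(0, min(ax2, hx2) - max(ax1, hx1))  # overlap width
    ih = max(0, min(ay2, hy2) - max(ay1, hy1))  # overlap height
    apparel_area = (ax2 - ax1) * (ay2 - ay1)
    return apparel_area > 0 and (iw * ih) / apparel_area > first_threshold
```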
7. The video-based target search method of claim 1, further comprising sequentially judging whether the preset targets in each pair of adjacent previous and subsequent frames are the same target, wherein if the preset targets in every pair of adjacent frames are the same target, the preset targets throughout the video are the same target, and wherein the preset targets in a pair of adjacent frames are judged to be the same target if the ratio of the overlap area of their two bounding boxes to the total area of the two bounding boxes is greater than a second threshold and the Euclidean distance between their features is less than a third threshold.
8. The video-based target search method of claim 1, wherein after the features of the preset target determined by the bounding box are extracted, the method further comprises: filtering out, from the preset targets, candidate targets having a preset characteristic, and determining the candidate target with the largest bounding box area among the candidates as the final target, wherein the preset characteristic is that a front view or a side view of the preset target is displayed, and wherein searching for the information to be recommended corresponding to the preset target through the features of the preset target comprises: searching for the information to be recommended corresponding to the final target through the features of the final target.
9. A video-based target search apparatus, comprising:
a video frame extraction module, configured to extract frames from the video;
a target detection module, configured to detect whether a preset target exists in a frame of the video and, if so, to locate the preset target in the frame with a bounding box;
a feature extraction module, configured to extract the features of the preset target delimited by the bounding box;
a search module, configured to search for the information to be recommended corresponding to the preset target through the features of the preset target;
wherein the bounding box is the smallest box enclosing the preset target.
10. The video-based target search apparatus of claim 9, further comprising:
a deduplication module, configured to judge, from their positional relationship, whether the preset targets in adjacent previous and subsequent frames are the same target: if the ratio of the overlap area of the two bounding boxes of the preset targets in the previous and subsequent frames to the total area of the two bounding boxes is greater than a second threshold, and the Euclidean distance between the features of the preset targets is less than a third threshold, the preset targets in the adjacent frames are the same target; and
a filtering module, configured, after the features of the preset target determined by the bounding box are extracted, to filter out, from the preset targets, candidate targets having a preset characteristic, and to determine the candidate target with the largest bounding box area among the candidates as the final target, wherein the preset characteristic is that a front view or a side view of the preset target is displayed, so that the information to be recommended is searched for through the features of the final target.
11. A computer device, comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the video-based target search method of any one of claims 1-8.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the video-based target search method of any one of claims 1-8.
CN201910989383.2A 2019-10-17 2019-10-17 Target searching method, device, equipment and storage medium based on video Pending CN110751086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910989383.2A CN110751086A (en) 2019-10-17 2019-10-17 Target searching method, device, equipment and storage medium based on video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910989383.2A CN110751086A (en) 2019-10-17 2019-10-17 Target searching method, device, equipment and storage medium based on video

Publications (1)

Publication Number Publication Date
CN110751086A (en)

Family

ID=69278760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910989383.2A Pending CN110751086A (en) 2019-10-17 2019-10-17 Target searching method, device, equipment and storage medium based on video

Country Status (1)

Country Link
CN (1) CN110751086A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110274314A1 (en) * 2010-05-05 2011-11-10 Nec Laboratories America, Inc. Real-time clothing recognition in surveillance videos
US20140310304A1 (en) * 2013-04-12 2014-10-16 Ebay Inc. System and method for providing fashion recommendations
CN105824928A (en) * 2016-03-17 2016-08-03 广东欧珀移动通信有限公司 Mobile terminal, server, content-based image recognition searching method and system
EP3264290A1 (en) * 2016-06-29 2018-01-03 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for recommendation of an interface theme
CN107862277A (en) * 2017-11-02 2018-03-30 北京奇虎科技有限公司 Live dress ornament, which is dressed up, recommends method, apparatus, computing device and storage medium
CN108229415A (en) * 2018-01-17 2018-06-29 广东欧珀移动通信有限公司 Information recommendation method, device, electronic equipment and computer readable storage medium
CN109495784A (en) * 2018-11-29 2019-03-19 北京微播视界科技有限公司 Information-pushing method, device, electronic equipment and computer readable storage medium
CN109977906A (en) * 2019-04-04 2019-07-05 睿魔智能科技(深圳)有限公司 Gesture identification method and system, computer equipment and storage medium
CN110163096A (en) * 2019-04-16 2019-08-23 北京奇艺世纪科技有限公司 Character recognition method, device, electronic equipment and computer-readable medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Lianghua et al., "Product Photography and Image Processing" *

Similar Documents

Publication Publication Date Title
CN111652678B (en) Method, device, terminal, server and readable storage medium for displaying article information
CN110188719B (en) Target tracking method and device
CN107633066B (en) Information display method and device, electronic equipment and storage medium
CN110162670B (en) Method and device for generating expression package
JP2022553174A (en) Video retrieval method, device, terminal, and storage medium
WO2020044097A1 (en) Method and apparatus for implementing location-based service
CN111506758B (en) Method, device, computer equipment and storage medium for determining article name
WO2020107624A1 (en) Information pushing method and apparatus, electronic device and computer-readable storage medium
CN113395542B (en) Video generation method and device based on artificial intelligence, computer equipment and medium
CN112749350B (en) Information processing method and device of recommended object, storage medium and electronic equipment
CN112423021B (en) Video processing method and device, readable medium and electronic equipment
CN110858134A (en) Data, display processing method and device, electronic equipment and storage medium
CN105654039A (en) Image processing method and device
CN108764051B (en) Image processing method and device and mobile terminal
WO2023151589A1 (en) Video display method and apparatus, electronic device and storage medium
US20230421857A1 (en) Video-based information displaying method and apparatus, device and medium
CN112907628A (en) Video target tracking method and device, storage medium and electronic equipment
CN112380929A (en) Highlight segment obtaining method and device, electronic equipment and storage medium
CN110647688A (en) Information presentation method and device, electronic equipment and computer readable medium
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN112766406A (en) Article image processing method and device, computer equipment and storage medium
CN111930228A (en) Method, device, equipment and storage medium for detecting user gesture
CN112148962B (en) Method and device for pushing information
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN110751086A (en) Target searching method, device, equipment and storage medium based on video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination