CN109963164A - Method, apparatus and device for querying an object in a video - Google Patents

Method, apparatus and device for querying an object in a video

Info

Publication number
CN109963164A
CN109963164A (application CN201711340586.6A)
Authority
CN
China
Prior art keywords
video
history
information
description information
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711340586.6A
Other languages
Chinese (zh)
Inventor
陈小帅
张扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201711340586.6A priority Critical patent/CN109963164A/en
Publication of CN109963164A publication Critical patent/CN109963164A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for querying an object in a video. The method comprises: receiving a user query request carrying query information; looking up target description information matching the query information, where the target description information is description information set in advance for a target object in a target video; and forming a query result from the target object and returning the query result in response to the user query request. With the method provided by the embodiments of the invention, for objects such as a particular frame image or a particular segment in a video, the user can quickly locate the position he or she wants to watch simply by issuing a query, which simplifies the user's operations. The invention also discloses an apparatus and a device for querying an object in a video.

Description

Method, apparatus and device for querying an object in a video
Technical field
The present invention relates to the field of search technology, and in particular to a method, apparatus and device for querying an object in a video.
Background art
At present, many search engines provide a video search function through which users can find the videos they want. A complete video is usually long, and a user sometimes does not want to watch it from the beginning but only wishes to view a certain part of it. For example, the user may merely want to check a particular frame image in the video. As another example, the user may only want to watch a certain segment of the video. As yet another example, the user may want to start watching from a particular frame image or segment. When the user only wants to watch part of a video, he or she has to manually fast-forward to the desired part during playback, making the operation laborious and time-consuming.
Summary of the invention
The technical problem to be solved by the invention is to provide a method, apparatus and device for searching for an object in a video, so that the user can quickly search a video for the image and/or segment he or she wants and locate the desired position in the video by searching, thereby simplifying the user's operations.
In a first aspect, to solve the above technical problem, an embodiment of the present invention provides a method for querying an object in a video, the method comprising:
receiving a user query request carrying query information;
looking up target description information matching the query information, where the target description information is description information set in advance for a target object in a target video;
forming a query result from the target object, and returning the query result in response to the user query request;
where the target description information is information output by a machine learning model when video data including the target object is input to the machine learning model, and the machine learning model is trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
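The claimed flow — receive a query, match it against pre-set description information, and return the target object as the result — can be sketched as follows. The index contents, function names, and substring-based matching are all assumptions for illustration; the patent does not specify a matching algorithm or data layout.

```python
from typing import Optional, Tuple

# Hypothetical pre-built index: each entry maps description information,
# set in advance for a target object, to (target video, target object).
DESCRIPTION_INDEX = {
    "qiao feng fights the sweeping monk": ("demigods_ep40", "clip_17"),
    "duan yu falls into the cave": ("demigods_ep03", "clip_02"),
}

def handle_query(query: str) -> Optional[Tuple[str, str]]:
    """Look up target description information matching the query and
    return the corresponding target object, or None if nothing matches."""
    q = query.lower().strip()
    for description, target in DESCRIPTION_INDEX.items():
        # A naive substring match stands in for the unspecified matching step.
        if q in description or description in q:
            return target
    return None
```

A real system would replace the dictionary scan with an inverted index or embedding search, but the request/lookup/return shape would stay the same.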
Optionally, the above known description information is formed based on the subtitles corresponding to the history object in the history video and/or the topic of the segment corresponding to the history object in the history video.
Optionally, the above known description information includes any one or more of the following: time information, place information, type information, and person information;
the time information indicates the time depicted by the history object;
the place information indicates the place depicted by the history object;
the type information indicates the type of event depicted by the history object;
the person information indicates the person depicted by the history object.
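The four optional fields above can be modeled as a simple record; the class and field names below are assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DescriptionInfo:
    """One hypothetical shape for the known description information."""
    time: Optional[str] = None        # time depicted by the object
    place: Optional[str] = None       # place depicted by the object
    event_type: Optional[str] = None  # type of event depicted
    person: Optional[str] = None      # person depicted

    def to_text(self) -> str:
        """Flatten the populated fields into matchable text."""
        parts = (self.time, self.place, self.event_type, self.person)
        return " ".join(p for p in parts if p)
```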
Optionally, the target object is a video frame image in the target video and the history object is a video frame image in the history video;
or,
the target object is a video segment in the target video and the history object is a video segment in the history video.
Optionally, the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model is specifically trained based on the correspondence between the history video on the one hand and the history object and its known description information on the other;
when the target video is input to the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the query result is specifically the target object itself.
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object includes multiple video frame images, ranking the video frame images according to the degree of match between each video frame image's target description information and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated-image sequence from the video frame images according to the similarity ranking; and
returning the animated-image sequence as the query result in response to the user query request.
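The ranking-and-animation option above can be sketched as follows; token overlap stands in for the unspecified similarity measure, and all names are illustrative assumptions:

```python
from typing import Dict, List, Sequence, Tuple

def rank_frames(frames: Sequence[Tuple[str, Sequence[str]]],
                query_tokens: Sequence[str]) -> List[str]:
    """Rank frame images by how well each frame's target description
    information matches the query information (token overlap here)."""
    qset = set(query_tokens)

    def score(item: Tuple[str, Sequence[str]]) -> int:
        return len(qset & set(item[1]))

    return [frame_id for frame_id, _ in sorted(frames, key=score, reverse=True)]

def build_animation(ranked_frame_ids: List[str]) -> Dict[str, object]:
    """Construct the animated-image sequence from the similarity ranking."""
    return {"type": "animation", "frames": ranked_frame_ids}
```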
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object is a video segment, forming the query result from the target video and a start point identifier of the video segment; and
returning the query result in response to the user query request;
where the start point identifier is used to indicate the start position of the video segment within the target video.
Optionally, the query result is specifically formed from the target video, the start point identifier of the video segment, and an end point identifier of the video segment;
where the end point identifier is used to indicate the end position of the video segment within the target video.
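A query result built from the target video plus the start (and optionally end) point identifiers might look like the sketch below; the field names and seconds-based positions are assumptions:

```python
from typing import Dict, Optional

def make_clip_result(video_id: str, start_s: int,
                     end_s: Optional[int] = None) -> Dict[str, object]:
    """Form a query result from the target video and the video segment's
    start point identifier, optionally adding the end point identifier."""
    result: Dict[str, object] = {"video": video_id, "start": start_s}
    if end_s is not None:
        result["end"] = end_s  # end position of the segment in the video
    return result
```

A player receiving such a result could seek directly to `start` instead of making the user fast-forward manually.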
In a second aspect, to solve the above technical problem, an embodiment of the present invention further provides an apparatus for querying an object in a video, the apparatus comprising:
a receiving unit, configured to receive a user query request carrying query information;
a searching unit, configured to look up target description information matching the query information, where the target description information is description information set in advance for a target object in a target video; and
a returning unit, configured to form a query result from the target object and return the query result in response to the user query request;
where the target description information is information output by a machine learning model when video data including the target object is input to the machine learning model, and the machine learning model is trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Optionally, the above known description information is formed based on the subtitles corresponding to the history object in the history video and/or the topic of the segment corresponding to the history object in the history video.
Optionally, the above known description information includes any one or more of the following: time information, place information, type information, and person information;
the time information indicates the time depicted by the history object;
the place information indicates the place depicted by the history object;
the type information indicates the type of event depicted by the history object;
the person information indicates the person depicted by the history object.
Optionally, the target object is a video frame image in the target video and the history object is a video frame image in the history video;
or,
the target object is a video segment in the target video and the history object is a video segment in the history video.
Optionally, the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model is specifically trained based on the correspondence between the history video on the one hand and the history object and its known description information on the other;
when the target video is input to the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the query result is specifically the target object itself.
Optionally, the returning unit comprises:
a ranking subunit, configured to, if the target object includes multiple video frame images, rank the video frame images according to the degree of match between each video frame image's target description information and the query information, to obtain a similarity ranking of the video frame images;
a constructing subunit, configured to construct an animated-image sequence from the video frame images according to the similarity ranking; and
a returning subunit, configured to return the animated-image sequence as the query result in response to the user query request.
Optionally, the returning unit is configured to: if the target object is a video segment, form the query result from the target video and a start point identifier of the video segment; and return the query result in response to the user query request; where the start point identifier is used to indicate the start position of the video segment within the target video.
Optionally, the query result is specifically formed from the target video, the start point identifier of the video segment, and an end point identifier of the video segment;
where the end point identifier is used to indicate the end position of the video segment within the target video.
In a third aspect, to solve the above technical problem, an embodiment of the present invention provides a device for querying an object in a video, comprising a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
receiving a user query request carrying query information;
looking up target description information matching the query information, where the target description information is description information set in advance for a target object in a target video;
forming a query result from the target object, and returning the query result in response to the user query request;
where the target description information is information output by a machine learning model when video data including the target object is input to the machine learning model, and the machine learning model is trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Optionally, the above known description information is formed based on the subtitles corresponding to the history object in the history video and/or the topic of the segment corresponding to the history object in the history video.
Optionally, the above known description information includes any one or more of the following: time information, place information, type information, and person information;
the time information indicates the time depicted by the history object;
the place information indicates the place depicted by the history object;
the type information indicates the type of event depicted by the history object;
the person information indicates the person depicted by the history object.
Optionally, the target object is a video frame image in the target video and the history object is a video frame image in the history video;
or,
the target object is a video segment in the target video and the history object is a video segment in the history video.
Optionally, the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model is specifically trained based on the correspondence between the history video on the one hand and the history object and its known description information on the other;
when the target video is input to the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the query result is specifically the target object itself.
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object includes multiple video frame images, ranking the video frame images according to the degree of match between each video frame image's target description information and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated-image sequence from the video frame images according to the similarity ranking; and
returning the animated-image sequence as the query result in response to the user query request.
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object is a video segment, forming the query result from the target video and a start point identifier of the video segment; and
returning the query result in response to the user query request;
where the start point identifier is used to indicate the start position of the video segment within the target video.
Optionally, the query result is specifically formed from the target video, the start point identifier of the video segment, and an end point identifier of the video segment;
where the end point identifier is used to indicate the end position of the video segment within the target video.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a method for querying an object in a video, the method comprising:
receiving a user query request carrying query information;
looking up target description information matching the query information, where the target description information is description information set in advance for a target object in a target video;
forming a query result from the target object, and returning the query result in response to the user query request;
where the target description information is information output by a machine learning model when video data including the target object is input to the machine learning model, and the machine learning model is trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Optionally, the above known description information is formed based on the subtitles corresponding to the history object in the history video and/or the topic of the segment corresponding to the history object in the history video.
Optionally, the above known description information includes any one or more of the following: time information, place information, type information, and person information;
the time information indicates the time depicted by the history object;
the place information indicates the place depicted by the history object;
the type information indicates the type of event depicted by the history object;
the person information indicates the person depicted by the history object.
Optionally, the target object is a video frame image in the target video and the history object is a video frame image in the history video;
or,
the target object is a video segment in the target video and the history object is a video segment in the history video.
Optionally, the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model is specifically trained based on the correspondence between the history video on the one hand and the history object and its known description information on the other;
when the target video is input to the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the query result is specifically the target object itself.
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object includes multiple video frame images, ranking the video frame images according to the degree of match between each video frame image's target description information and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated-image sequence from the video frame images according to the similarity ranking; and
returning the animated-image sequence as the query result in response to the user query request.
Optionally, forming a query result from the target object and returning the query result in response to the user query request comprises:
if the target object is a video segment, forming the query result from the target video and a start point identifier of the video segment; and
returning the query result in response to the user query request;
where the start point identifier is used to indicate the start position of the video segment within the target video.
Optionally, the query result is specifically formed from the target video, the start point identifier of the video segment, and an end point identifier of the video segment;
where the end point identifier is used to indicate the end position of the video segment within the target video.
Compared with the prior art, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, a trained machine learning model can set description information in advance for each object in each video. When a user initiates a query request for certain query information, the target description information matching that query information can be looked up among the pre-set description information. If the target description information was set in advance for a target object in a target video, a query result can be formed from the target object and returned to the user. It can be seen that, for objects such as a particular frame image or a particular segment in a video, the user can quickly locate the position he or she wants to watch simply by issuing a query, which simplifies the user's operations.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments recorded in the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic framework diagram of an exemplary application scenario in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for querying an object in a video in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a training method for a machine learning model provided in an embodiment of the present invention;
Fig. 4 is an effect diagram of a query result consisting of multiple video frame images, provided in an embodiment of the present invention;
Fig. 5 is another effect diagram of a query result consisting of multiple video frame images, provided in an embodiment of the present invention;
Fig. 6 is a display effect diagram of a query result that is a target video, provided in an embodiment of the present invention;
Fig. 7 is another display effect diagram of a query result that is a target video, provided in an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an apparatus for querying an object in a video in an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a communication-message prompting device in an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a server in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The applicant has found through research that, when a user wants to watch a certain part of a video (e.g., a particular frame image in the complete video, or a certain segment of it), the user first enters, under the video search function of a search engine, query information for the desired object in the search box, and the search engine usually feeds back the corresponding complete video. The user then has to play the complete video and manually fast-forward to the desired part during playback. Clearly, when the user wants to view only a certain part of a complete video through the video search function of a search engine, the complete video must be obtained first and then operated on manually, and the part the user actually wants cannot be fed back directly, making the operation laborious and time-consuming and the user experience poor.
To solve the above problems, in the embodiments of the present invention, a machine learning model sets corresponding description information in advance for each object in each video. When a user wants to view a certain target object, the user only needs to determine query information roughly according to the content of the target object and initiate a query request for that query information; the server looks up the description information matching the query information as the target description information, determines the object corresponding to the target description information as the target object, and feeds it back to the user. In this way, the user obtains the target object he or she wants to view. Through the above method, the user can quickly search for the target object he or she wants to watch, which simplifies the user's operations.
For example, the embodiments of the present invention can be applied to the scenario shown in Fig. 1. In this scenario, a user terminal 101 and a server 103 interact over a network 102. First, in response to a user operation of inputting query information, the user terminal 101 generates a user query request carrying the query information according to the user operation, and sends the user query request to the server 103 through the network 102. After receiving the user query request, the server 103 can look up, among pre-set description information, target description information matching the query information, where the target description information is description information set in advance for a target object. The server 103 can form a query result from the target object and return the query result to the user terminal 101 through the network 102 in response to the user query request, so that the user terminal 101 presents or prompts the target object to the user.
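The terminal-server exchange of Fig. 1 can be sketched as a pair of serialization helpers; the JSON wire format and field names are assumptions, since the patent does not define one:

```python
import json

def make_query_request(query_text: str) -> str:
    """Client side: wrap the typed query information into a request."""
    return json.dumps({"type": "user_query", "query": query_text})

def parse_query_request(payload: str) -> str:
    """Server side: recover the query information from the request."""
    return json.loads(payload)["query"]
```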
It is understood that the user terminal 101 can be any user equipment with video and/or image playback functions, whether existing, under development, or developed in the future, that can interact with the server 103 through any form of wired and/or wireless connection (e.g., Wi-Fi, LAN, cellular, coaxial cable), including but not limited to existing, in-development, or future smartphones, non-smartphones, tablet computers, and the like.
In addition, the server 103 is only one example of a device, whether existing, under development, or developed in the future, that can analyze user query requests, obtain query results, and feed the query results back to the user terminal. The embodiments of the present invention are not limited in this respect.
It is understood that in the above application scenario, although the actions described in the embodiments of the present invention can be executed partly by the user terminal 101 and partly by the server 103, they can also be executed entirely by the server 103, or entirely by the user terminal 101. The present invention is not limited in terms of the executing subject, as long as the actions disclosed in the embodiments of the present invention are performed.
It should be noted that the above application scenario is shown merely for ease of understanding the present invention, and the embodiments of the present invention are not limited in this regard. On the contrary, the embodiments of the present invention can be applied to any applicable scenario.
The various non-limiting embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Exemplary method
Referring to Fig. 2, a flowchart of a method for querying an object in a video in an embodiment of the present invention is shown. In this embodiment, the method comprises:
Step 201: receive a user query request carrying query information.
In a specific implementation, the user can open an application with a search function on the user terminal and type query information into the application's search box; the application processes the query information and generates a user query request carrying the query information; the application sends the user query request to the server through the network to which the user terminal belongs; and the server processes the received user query request and can parse out the query information entered by the user.
In one implementation, the above application can be a client program (an app) on the user terminal, such as the Sogou Input Method app or the Sogou Search app. The user opens the app on the user terminal and enters query information in the app's search box; the app generates a user query request from the query information and sends the user query request to the server through the network on the user terminal, so that the server receives the user query request carrying the query information.
In another implementation, the above application can be a browser, such as Sogou Browser or Baidu Browser. The user opens the browser on the user terminal and enters query information in the browser's search box; the browser generates a user query request from the query information and sends the user query request to the server through the network on the user terminal; the server thus receives the user query request carrying the query information.
For example, the user opens Sogou Browser on a mobile phone and wants to search, among videos of "Demi-Gods and Semi-Devils", for the clip of "Qiao Feng fighting the sweeping monk". The user selects the "video" search function in the browser and enters the query information "Qiao Feng sweeping monk fight" in the search box. Sogou Browser generates a user query request carrying the query information "Qiao Feng sweeping monk fight" and sends it to the server through the network on the phone, and the server can obtain the query information "Qiao Feng sweeping monk fight" from the received user query request.
Step 202: search for target description information matching the query information, where the target description information is description information set in advance for a target object in a target video.
In a specific implementation, all videos on the server can be split in advance to form multiple objects; corresponding description information can then be set for each object in each video, and the correspondence between each object and its description information stored. In this way, after the server receives a user query request, it can search the preset description information for target description information matching the query information, take the object corresponding to the target description information as the target object, and take the video in which that object is located as the target video. The target object may be a video frame image in the target video, or may be a video clip in the target video.
It should be noted that, before searching for description information matching the query information of a user query request, corresponding description information needs to be set in advance on the server for each object in each video. In this embodiment, a machine learning model can be used to generate description information for the objects in a video. Each video on the server includes at least one object. For each object in a video, the input of the machine learning model may be the video data of the object included in the video, and the output of the machine learning model may be the description information of the object. That is, for each object in a video, the video data of the object included in the video is input into the trained machine learning model, and the machine learning model outputs the description information of the object. The machine learning model may be, for example, a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a genetic algorithm, and so on.
For the trained machine learning model on the server, the input is the video data of the objects included in a video, and the output is the description information corresponding to each object. The server can save the correspondence between each object and its description information.
As an example, assume the server holds video P and video Q. For video P, the video data of objects p1 and p2 included in video P is separately input into the trained machine learning model, which outputs description information m1 and description information m2. For video Q, the video data of objects q1, q2, q3, q4 and q5 included in video Q is separately input into the trained machine learning model, which outputs description information n1, n2, n3, n4 and n5. The server can then save the "object-description information" correspondences, namely: p1-m1, p2-m2, q1-n1, q2-n2, q3-n3, q4-n4, q5-n5.
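Assuming a simple word-overlap matcher (the embodiment does not fix how the matching in Step 202 is computed), the stored "object-description information" correspondences and the lookup of target objects can be sketched as:

```python
# Hypothetical description texts for a few of the objects named above.
descriptions = {
    ("P", "p1"): "Qiao Feng and the Sweeping Monk fight",
    ("P", "p2"): "three people become sworn brothers",
    ("Q", "q1"): "the monk meets Qiao Feng and Murong",
}

def find_targets(query: str):
    """Return (target video, target object) pairs whose description
    information shares at least one word with the query information."""
    query_words = set(query.lower().split())
    hits = []
    for (video, obj), text in descriptions.items():
        if query_words & set(text.lower().split()):
            hits.append((video, obj))
    return hits

print(find_targets("Qiao Feng Sweeping Monk fight"))  # [('P', 'p1'), ('Q', 'q1')]
```

A production system would of course use a stronger similarity measure than word overlap; the sketch only shows the shape of the correspondence table and the lookup.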
It can be understood that, before the machine learning model is used to set target description information for target objects, the machine learning model can be trained using, as training samples, history objects in history videos for which description information is already stored; the training samples of the machine learning model are the history objects in the history videos, and the training data are the video data of the history objects together with the known description information corresponding to the history objects. Specifically, the machine learning model can be trained based on the correspondence between the video data of the history objects included in the history videos and the known description information of those history objects. A history object may be a video frame image in a history video, or may be a video clip in a history video. In the case where the history objects are video frame images in history videos, the target objects are video frame images in target videos; in the case where the history objects are video clips in history videos, the target objects are video clips in target videos.
Known description information is information already known to describe the content of a history object.
As an example, the known description information may be formed based on the subtitle corresponding to the history object in the history video. For example, if the subtitle corresponding to the history object in the history video is "Today I finally have the honor of meeting Qiao Feng of the north and Murong of the south", then according to this subtitle the known description information "The monk meets Qiao Feng and Murong" can be added for the history object.
As another example, the known description information may be formed based on the segment theme corresponding to the history object in the history video, where the segment theme is description information carried by the history video itself, used to describe the related content in that history video. For example, if the segment theme of the history object in the history video is "Qiao Feng and the Sweeping Monk have a fierce fight", then according to this theme the known description information "Qiao Feng and the Sweeping Monk fight" can be added for the history object.
As yet another example, the known description information may be formed based on both the subtitle and the segment theme corresponding to the history object in the history video. For example, if the subtitle of the history object in the history video is "Today I finally have the honor of meeting Qiao Feng of the north and Murong of the south", and the theme is "Qiao Feng and the Sweeping Monk have a fierce fight", then according to the above subtitle and theme the known description information "Qiao Feng and the Sweeping Monk fight" can be added for the history object.
It can be understood that subtitles and segment themes can describe additional information that the video images cannot convey. For example, the images of a history video may only show two people fighting, without showing which two people are fighting; but the segment theme carried by the history video is "Qiao Feng fights the Sweeping Monk", that is, the segment theme carried by the history video describes the plot of the history video, which cannot be reflected in the images of the history video. Furthermore, if the machine learning model is trained using the segment themes carried by history videos as known description information, then the trained machine learning model can include, in the description information it outputs for an object in a video, additional content that cannot be conveyed by the images of the video. For example, if the images of a video show a scene of three people becoming sworn brothers, the machine learning model may output the description information "friendship" for the object in the video, rather than merely "three people become sworn brothers". It can be seen that, by forming the known description information of a history object from its corresponding subtitle and/or segment theme in the history video, the determined known description information can be made relevant to the plot of the video frame image or video clip, achieving a more specific and accurate description of the video frame image or video clip.
It can be understood that the known description information may contain one or more types of information. For example, the known description information may include any one or more of time information, location information, type information, and person information, where the time information indicates the time described in the history object, the location information indicates the place described in the history object, the type information indicates the event type described in the history object, and the person information indicates the person described in the history object.
If the known description information includes several of the time information, location information, type information, and person information, that is, as many types of information as possible, the content of the history object can be reflected more comprehensively.
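A minimal sketch of such multi-type known description information, assuming one optional field per information type; the field names and the sample values are hypothetical illustrations, not taken from the embodiment:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KnownDescription:
    """Known description information with up to four types of information."""
    time: Optional[str] = None           # time described in the history object
    location: Optional[str] = None       # place described in the history object
    event_type: Optional[str] = None     # event type described in the history object
    persons: Optional[List[str]] = None  # persons described in the history object

# A history object described with three of the four information types:
desc = KnownDescription(
    location="temple library",
    event_type="fight",
    persons=["Qiao Feng", "Sweeping Monk"],
)
print(desc.event_type)  # fight
```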
In some embodiments of the present embodiment, as shown in Fig. 3, the process of training the machine learning model based on the correspondence between the video data of the history objects included in the history videos and the known description information of the history objects may specifically include:
Step 301: initialize the machine learning model, take the video data of the history objects included in each history video as one piece of training data, and denote the number of pieces of training data as M, where M = the number of history videos.
Step 302: if the number of pieces of training data M > 0, execute Steps 303 to 308; if the number of pieces of training data M = 0, end the training directly and obtain the trained machine learning model.
Step 303: extract from the training data the next piece of training data: the video data i (i = 1, 2, ..., M) of the history objects included in a history video.
Step 304: input the video data i of the history objects included in the history video into the machine learning model, and output the learning objects and learning description information as the learning value j.
Step 305: obtain the history objects and known description information corresponding to the video data i of the history objects included in the history video, as the theoretical value k.
Step 306: estimate the error between the learning value j and the theoretical value k, denoted as error w.
Step 307: using the error w, adjust the above machine learning model in the reverse direction.
Step 308: delete from the training data the video data i of the history objects included in the history video, let the number of pieces of training data M = M - 1, and return to Step 302.
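The loop of Steps 301 to 308 can be sketched as follows with a stand-in model; the forward/loss/backward method names are assumptions, and in a real implementation the model would be an RNN, CNN or DNN and Step 307 would be a back-propagation update:

```python
class DummyModel:
    """Stand-in for the machine learning model; counts reverse adjustments."""
    def __init__(self):
        self.adjustments = 0

    def forward(self, video_data):      # Step 304: produce the learning value j
        return ["?"] * len(video_data)

    def loss(self, learned, known):     # Step 306: error w between j and k
        return sum(a != b for a, b in zip(learned, known))

    def backward(self, error):          # Step 307: reverse adjustment
        self.adjustments += 1

def train(model, history_videos):
    """history_videos: list of (video data, known description information)."""
    pending = list(history_videos)          # Step 301: M = len(pending)
    while pending:                          # Step 302: continue while M > 0
        video_data, known = pending.pop(0)  # Steps 303 and 308: take and remove one
        learned = model.forward(video_data)
        error = model.loss(learned, known)  # Steps 305-306
        model.backward(error)
    return model                            # M = 0: trained model

m = train(DummyModel(), [(["a1", "a2"], ["x1", "x2"]), (["b1"], ["y1"])])
print(m.adjustments)  # 2 -- one adjustment per history video, as in Fig. 3
```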
As an example, assume the machine learning model is a 4-layer RNN, and the training data are the video data of the history objects included in three history videos: history video A, history video B, and history video C. The "history object - known description information" pairs of history video A are: a1-x1, a2-x2, a3-x3, a4-x4; those of history video B are: b1-y1, b2-y2, b3-y3; those of history video C are: c1-z1, c2-z2, c3-z3. Then, according to the training method shown in Fig. 3, specifically: in the first step, the number of pieces of training data is determined as M = the number of history videos = 3; in the second step, because M > 0, history video A is extracted; in the third step, history video A is input into RNN 1, which outputs the learning object - learning description information pairs a1'-x1', a2'-x2', a3'-x3', a4'-x4' as the learning value A'; in the fourth step, the history object - known description information pairs a1-x1, a2-x2, a3-x3, a4-x4 in history video A are obtained as the theoretical value A''; in the fifth step, the error between the learning value A' and the theoretical value A'' is estimated and denoted as error ΔA; in the sixth step, RNN 1 is adjusted in the reverse direction using the error ΔA, yielding RNN 2; in the seventh step, history video A is deleted from the training data, and the number of pieces of training data is set to M = 3 - 1 = 2.
Returning to the second step above, because M = 2 and M > 0, history video B is extracted and the third to seventh steps are repeated, specifically: in the third step, history video B is input into RNN 2, which outputs the learning object - learning description information pairs b1'-y1', b2'-y2', b3'-y3' as the learning value B'; in the fourth step, the history object - known description information pairs b1-y1, b2-y2, b3-y3 in history video B are obtained as the theoretical value B''; in the fifth step, the error between the learning value B' and the theoretical value B'' is estimated and denoted as error ΔB; in the sixth step, RNN 2 is adjusted in the reverse direction using the error ΔB, yielding RNN 3; in the seventh step, history video B is deleted from the training data, and M = 2 - 1 = 1.
Likewise, returning to the second step above, because M = 1 and M > 0, history video C is extracted and the third to seventh steps are repeated, specifically: in the third step, history video C is input into RNN 3, which outputs the learning object - learning description information pairs c1'-z1', c2'-z2', c3'-z3' as the learning value C'; in the fourth step, the history object - known description information pairs c1-z1, c2-z2, c3-z3 in history video C are obtained as the theoretical value C''; in the fifth step, the error between the learning value C' and the theoretical value C'' is estimated and denoted as error ΔC; in the sixth step, RNN 3 is adjusted in the reverse direction using the error ΔC, yielding RNN 4; in the seventh step, history video C is deleted from the training data, and M = 1 - 1 = 0.
Likewise, returning to the second step above, because M = 0, the training ends, and the trained machine learning model, RNN 4, is obtained.
In a specific implementation, the adjustment of the machine learning model follows the same type of procedure as the adjustment performed when training machine learning models in general; the specific algorithm used for the adjustment is not limited in the present invention, and any algorithm that achieves the effect of training the machine learning model falls within the protection scope of the present invention.
Certainly, the more history videos are used to train the machine learning model, the more times the machine learning model is adjusted and the better the learning performance of the resulting model; the objects and description information obtained with that model are then closer to the theoretical values, that is, more accurate. Therefore, the more training samples there are, the better the trained machine learning model.
It can be understood that the input and output of the machine learning model can be designed in a variety of ways.
As an example, when training the machine learning model, the history objects themselves may serve as the input of the machine learning model, and the output is the known description information of the history objects. After training, this kind of machine learning model expresses the correspondence between objects and description information; therefore, after an object of a video on the server is itself input into the machine learning model, the description information corresponding to the object is output.
As another example, when training the machine learning model, the history videos themselves may serve as the input of the machine learning model, and the output is the history objects in the history videos and the known description information of those history objects. After training, this kind of machine learning model expresses the correspondence among videos, objects and description information; therefore, after a video is input into the machine learning model, the objects in the video and the corresponding description information are output.
In the latter example, the trained machine learning model can be used to process a video directly and obtain the objects and description information corresponding to the video, without a technician first splitting the video into objects and then using the machine learning model to obtain the description information, which provides a more convenient implementation for presetting the description information.
It should be noted that the above history objects used to train the machine learning model, and their corresponding known description information, may be a subset of the target objects on the server available for user queries, or may be training data samples dedicated to training the machine learning model and not provided as target objects for user queries.
The target object is the object the user wants to view. In a concrete realization, there are two possible cases for the target object:
In one case, the objects are video frame images in a video. The video is then divided into several image frames, each frame image is one object, and each object corresponds to one piece of description information.
For example, assume there are 100 image frames in video A. Video A is divided by image frame into 100 frame images: image 1, image 2, ..., image 100. Each image is one object, that is, video A includes 100 objects: object 1, object 2, ..., object 100, and each object has one piece of description information; then the 100 objects in video A correspond to 100 pieces of description information: description information 1, description information 2, ..., description information 100. The correspondence is as follows:
In the other case, the objects are video clips in a video. The video is then divided into several video clips, each video clip is one object, and each object corresponds to one piece of description information.
For example, for video A, video A is divided by video clip into 10 video clips: clip 1, clip 2, ..., clip 10. Each video clip is one object, that is, video A includes 10 objects: object 1, object 2, ..., object 10, and each object has one piece of description information; then the 10 objects in video A correspond to 10 pieces of description information: description information 1, description information 2, ..., description information 10. The correspondence is as follows:
In some embodiments, before or while querying an object in a video, the user may also set, in the application, the type of object to be queried, which may specifically be video frame image, video clip, or unrestricted. If the preset object type to be queried is video frame image, the search in Step 202 can be performed only over objects whose object type is video frame image; if the preset object type to be queried is video clip, the search in Step 202 is performed only over objects whose object type is video clip; if the preset object type to be queried is unrestricted, or the user has not set the type of object to be queried, the search in Step 202 can be performed over all object types.
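The object-type restriction described above can be sketched as a filter applied before the Step 202 search; the "frame"/"clip" labels and the record shape are assumptions for illustration:

```python
# Hypothetical preset objects with their types and description information.
objects = [
    {"id": "a1", "type": "frame", "description": "Qiao Feng fights the Sweeping Monk"},
    {"id": "c1", "type": "clip",  "description": "Qiao Feng fights the Sweeping Monk"},
    {"id": "a2", "type": "frame", "description": "three people become sworn brothers"},
]

def candidates(object_type=None):
    """Restrict the search space by object type; None means unrestricted."""
    if object_type is None:
        return objects
    return [o for o in objects if o["type"] == object_type]

print([o["id"] for o in candidates("clip")])  # ['c1']
print(len(candidates()))                      # 3
```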
Step 203: form a query result with the target object, and return the query result for the user query request.
The query result can be formed with the target object in two implementations: in one implementation, the target object itself serves as the query result; in the other implementation, the target video to which the target object belongs and the position indication information of the target object in the target video serve as the query result.
Under the first implementation, the query result specifically includes the target object itself, and this query result is returned to the user.
Under this implementation, there are the following two cases:
In one case, there is only one target object in the query result.
For example, if the obtained query result is one video frame image, the query result returned to the user may be that video frame image. The user keys in the query information "XXXX" in the search box. If the description information x1 of object a1 in video A is matched, then description information x1 is the target description information, the object a1 corresponding to target description information x1 is the target object, and the video A to which it belongs is the target video; then the target object a1 corresponding to description information x1, namely the video frame image "a1", can be displayed in the query result display area.
As another example, if the obtained query result is one video clip, the query result returned to the user may be that video clip. If the description information z1 of object c1 in video C is matched, then description information z1 is the target description information, the object c1 corresponding to target description information z1 is the target object, and the video C to which it belongs is the target video; then the target object c1 corresponding to description information z1, namely the video clip "c1", can be displayed in the query result display area.
Optionally, to optimize user experience, the corresponding description information can also be displayed below the target object, and a link to the target video corresponding to the target object can be displayed below the description information of the target object, making it convenient for the user to view the complete video content.
In the other case, there are multiple target objects in the query result.
For example, when the obtained query result is multiple video frame images, the query result returned to the user may be those multiple video frame images. The user keys in the query information "YYYY" in the search box. If the description information x1 of object a1 in video A, the description information x3 of object a3 in video A, and the description information m2 of object p2 in video P are matched, then description information x1, x3 and m2 are the target description information; the objects a1, a3 and p2 corresponding to target description information x1, x3 and m2 are the target objects; and the videos A and P to which they belong are the target videos. Then the target object a1 corresponding to description information x1 (video frame image "a1"), the target object a3 corresponding to description information x3 (video frame image "a3"), and the target object p2 corresponding to description information m2 (video frame image "p2") can be displayed in the query result display area.
As another example, if the obtained query result is multiple video clips, the query result returned to the user may be those multiple video clips. If the description information z1 of object c1 in video C, the description information z3 of object c3 in video C, and the description information n2 of object q2 in video Q are matched, then description information z1, z3 and n2 are the target description information; the objects c1, c3 and q2 corresponding to target description information z1, z3 and n2 are the target objects; and the videos C and Q to which they belong are the target videos. Then the target object c1 corresponding to description information z1 (video clip "c1"), the target object c3 corresponding to description information z3 (video clip "c3"), and the target object q2 corresponding to description information n2 (video clip "q2") can be displayed in the query result display area.
As yet another example, if the obtained multiple query results include both video clips and video frame images, the query results returned to the user are multiple query results including both video clips and video frame images. If the description information y1 of object b1 in video B, the description information y3 of object b3 in video B, and the description information m1 of object p1 in video P are matched, then description information y1, y3 and m1 are the target description information; the objects b1, b3 and p1 corresponding to target description information y1, y3 and m1 are the target objects; and the videos B and P to which they belong are the target videos. Then the target object b1 corresponding to description information y1 (video clip "b1"), the target object b3 corresponding to description information y3 (video clip "b3"), and the target object p1 corresponding to description information m1 (video frame image "p1") can be displayed in the query result display area.
Optionally, to optimize user experience, the corresponding description information can also be displayed below each target object, and a link to the target video corresponding to each target object can be displayed below its description information. Optionally, if one page cannot display all the query results completely, a page-turning plug-in can be displayed at the bottom of the page, providing the possibility of displaying more query results.
It should be noted that, in the case where the above query result includes multiple target objects themselves, the multiple target objects may come from the same target video or from different target videos, which is not limited in the present invention. As for the display of the query results, the target objects may be sorted in descending order of the similarity between each target object's corresponding target description information and the query information, or may be sorted in descending order of viewing likelihood according to the user's history of viewed videos; the specific sorting method used is not limited in the present invention.
In the above implementation, the target object itself serves as the query result and is returned directly to the user; the user can receive the query result intuitively, without having to search the entire video manually for the target object, which saves the user's time and improves the user experience.
In addition, if the object type to be queried is video frame image, and multiple target objects correspond to the matched target description information, then besides the above display modes there is another display mode for multiple video frame images, which may specifically include:
if the target objects include multiple video frame images, sorting the video frame images according to the degree of matching similarity between the target description information of each video frame image and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated image sequence from the video frame images according to the similarity ranking;
taking the animated image sequence as the query result, and returning the query result for the user query request.
The similarity ranking of the video frame images means that each video frame image, as a target object, is ranked according to the degree of similarity between its corresponding target description information and the query information keyed in by the user: the higher the degree of similarity, the greater the above similarity and the earlier the video frame image's position in the ranking; conversely, the lower the degree of similarity, the smaller the above similarity and the later the video frame image's position in the ranking.
An animated image sequence means that multiple original images are processed in a certain order into one image collection, and the processed image collection can dynamically display each original image in turn in that order.
In a specific implementation, assume the obtained target objects include 3 video frame images: image 1, image 2, image 3. The 3 video frame images are sorted according to the degree of matching similarity between the target description information of each video frame image and the query information, with video frame images of higher similarity ranked earlier, to obtain the similarity ranking of the 3 video frame images: image 2 > image 3 > image 1. According to the above similarity ranking, an animated image sequence O is constructed from the 3 video frame images; the animated image sequence O serves as the query result, which is returned for the user query request.
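Assuming a word-overlap similarity measure (the embodiment does not fix one), the similarity ranking image 2 > image 3 > image 1 and the display order of the animated image sequence O can be sketched as:

```python
def similarity(description: str, query: str) -> int:
    """Hypothetical similarity: number of words shared with the query."""
    return len(set(description.lower().split()) & set(query.lower().split()))

# Hypothetical target description information for the 3 video frame images.
frames = [
    ("image 1", "two people talk in a tavern"),
    ("image 2", "Qiao Feng and the Sweeping Monk fight fiercely"),
    ("image 3", "the Sweeping Monk raises his palm"),
]

query = "Qiao Feng Sweeping Monk fight"
ranked = sorted(frames, key=lambda f: similarity(f[1], query), reverse=True)
sequence_O = [name for name, _ in ranked]  # display order of the animated sequence
print(sequence_O)  # ['image 2', 'image 3', 'image 1']
```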
There are many ways to construct an animated image sequence. In the first, as shown in Fig. 4, image 2, image 3 and image 1 are stacked in turn to build the animated image sequence o1, with image 2 on top and image 1 at the bottom. The transformation between images is then realized at a fixed time interval (e.g. 0.1 s), or through an operation such as touching the image. For example, the animated image sequence o1 initially displays image 2; after 0.1 s passes or image 2 is touched, o1 displays image 3; after another 0.1 s or a touch on image 3, o1 displays image 1; after 0.1 s or a touch on image 1, o1 displays image 2 again, cycling round in turn until the user touches or otherwise performs an operation that selects a particular image.
In the second, as shown in Fig. 5, image 2, image 3 and image 1 are stitched together in turn from top to bottom to form an animated image sequence o2, with image 2 at the top of o2 and image 1 at the bottom. The animated sequence o2 then displays only a part of its content in the display area; in general, the size of the displayed part is the same as the display size of a single still image. The transformation between images is realized at a fixed time interval (e.g. 0.1 s), or through operations such as sliding up or down. For example, o2 initially displays image 2 and the upper half of image 3; after 0.1 s passes or o2 is slid upward, o2 displays the lower half of image 3 and image 1; after another 0.1 s or a downward slide on o2, o2 again displays image 2 and the upper half of image 3, cycling round in turn until the user touches or otherwise performs an operation that selects a particular image.
By displaying multiple video frame images through the construction of an animated image sequence and returning it to the user as the query result, the images are compressed to a certain extent without loss: the operation of constructing the animated image sequence makes the query result occupy as few space resources as possible, which facilitates transmission, viewing and downloading.
Under the second implementation, the query result specifically includes the target video to which the target object belongs and the position indication information of the target object in the target video; returning the query result to the user may then specifically include:
if the target object is a video clip, forming the query result with the target video and the start point identification of the video clip;
returning the query result for the user query request;
where the start point identification is used to indicate the start position of the video clip in the target video.
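The query result of this implementation can be sketched as a small record carrying the target video and the clip's start point identification (optionally also an end point identification); the field names and time format are assumptions:

```python
def make_query_result(target_video: str, start: str, end: str = None) -> dict:
    """Form a query result from the target video, the start point
    identification of the video clip, and optionally its end point
    identification."""
    result = {"target_video": target_video, "start_point": start}
    if end is not None:
        result["end_point"] = end
    return result

# A clip in video X marked at 02:30, with and without an end point mark:
print(make_query_result("video X", "02:30"))
print(make_query_result("video X", "02:30", "12:40"))
```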
In this implementation, the query result returned to the user may specifically be: the target video to which the target object belongs, and the start point identification of the target object in the target video (e.g. the start playing time or start position). The manner of displaying this query result, as shown in Fig. 6, is to display the playing interface of the target video and mark the "start point identification of the target object" on the playing progress bar. The complete video is not played within a first preset time (e.g. 2 s), and within the first preset time, first prompt information is provided at a prominent position in the window: "Play the target object directly?". If the first preset time is exceeded and the user has not triggered the above first prompt information, playback starts from moment 0 of the complete video; preferably, a prompting frame is displayed below the "start point identification of the target object" on the playing progress bar, asking the user whether to "jump to this point to watch". If it is detected within the first preset time that the user has triggered the above first prompt information, the video is played from the "start point of the target object".
If the user wants to view some complete video but only remembers the plot of one of its segments, then the user need only extract the key information of the remembered segment into the query information; using the above implementation of the present invention, the complete video corresponding to the query information can be obtained, which provides a possibility for the above situation. Moreover, through the form of the prompt information and the mark, the deficiency in the prior art that the target object cannot be played directly is overcome, achieving the effect of directly locating the target object.
In addition, in this implementation the query result may also be formed from the target video, the start-point identifier of the video clip, and the end-point identifier of the video clip;
Here, the end-point identifier is used to indicate the end position of the video clip within the target video.
In this implementation, the query result returned to the user may specifically be: the target video to which the target object belongs, the start-point identifier of the target object in the target video, and the end-point identifier of the target object in the target video. The manner of presenting this query result is shown in Fig. 7 and is similar to the situation shown in Fig. 6, except that, when playback is about to reach the "end-point identifier of the target object", a prompt box is also displayed below that identifier on the progress bar, asking the user "End viewing at this point?". When playback reaches the end-point identifier, playback pauses, and during a second preset time (for example, 2 s) a second prompt is displayed at a prominent position in the window: "End playback?". If the second preset time elapses and the user has not triggered the second prompt, playback resumes from the pause and continues until the entire complete video has been played; if the user is detected to have triggered the second prompt within the second preset time, playback of the video ends immediately.
As an example, assume that the obtained target object is video clip s, the target video to which clip s belongs is video X, the start-point identifier is the moment 02:30 on the progress bar, and the end-point identifier is the moment 12:40 on the progress bar. The query result is then: video X, 02:30, 12:40. When the user clicks this query result, the playback interface of video X is displayed, but video X is not played for 2 s; instead, a prompt is displayed at a prominent position in the window: "Play video clip s directly?".
In one case, the user clicks the prompt, and playback starts at 02:30. When playback is about to reach 12:40, the user is prompted below the 12:40 position on the progress bar: "End viewing at this point?". If the user clicks this prompt, playback exits immediately upon reaching 12:40. Alternatively, if the user does not click that prompt, playback pauses at 12:40, and within 2 s the prompt "End playback?" is displayed at a prominent position in the window; clicking it exits playback. In this case, the content of video X from 02:30 to 12:40 is played, that is, the content of target object s.
In another case, the user does not click "Play video clip s directly?", and playback starts from the beginning of video X. When playback is about to reach 12:40, the user is prompted below the 12:40 position on the progress bar: "End viewing at this point?". If the user clicks this prompt, playback exits immediately upon reaching 12:40. Alternatively, if the user does not click that prompt, playback pauses at 12:40, and within 2 s the prompt "End playback?" is displayed at a prominent position in the window; clicking it exits playback. In this case, the content of video X from 00:00 to 12:40 is played.
In yet another case, the user does not click "Play video clip s directly?", and playback starts from the beginning of video X. When playback is about to reach 12:40, the user is prompted below the 12:40 position on the progress bar: "End viewing at this point?"; the user does not click this prompt. Playback pauses at 12:40, and within 2 s the prompt "End playback?" is displayed at a prominent position in the window; the user does not click this prompt either. Playback then continues to the end of video X and exits. In this case, all the content of video X is played.
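The cases above can be summarized by which interval of video X is actually played. A minimal sketch follows (the names, and the assumed total length of video X, are illustrative, not from the patent):

```python
def played_interval(start_prompt_clicked: bool, end_prompt_clicked: bool,
                    clip_start_s: float, clip_end_s: float,
                    video_end_s: float) -> tuple:
    """Return (start, end) of the interval actually played: the start
    depends on whether the user accepted "Play video clip s directly?",
    and the end depends on whether the user accepted either prompt to
    end viewing at the clip's end point."""
    start = clip_start_s if start_prompt_clicked else 0.0
    end = clip_end_s if end_prompt_clicked else video_end_s
    return (start, end)

# Clip s spans 02:30 (150 s) to 12:40 (760 s); assume video X lasts 2400 s.
case1 = played_interval(True, True, 150.0, 760.0, 2400.0)
case2 = played_interval(False, True, 150.0, 760.0, 2400.0)
case3 = played_interval(False, False, 150.0, 760.0, 2400.0)
```

Case 1 plays only clip s, case 2 plays from the beginning of X to the clip's end point, and case 3 plays video X in full.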
The above implementation provides a convenient and efficient query mode for a user who wants to watch some complete video but remembers only one of its segments. By adding both the start-point identifier and the end-point identifier of the video clip, it provides the user with a way to view the target object precisely, allowing the user to decide more flexibly which video content to watch.
In this embodiment, the server segments all existing videos to form different history objects, and adds known description information to each history object. The server uses the correspondences between each history video and its history objects and known description information as training samples for a machine learning model, and trains the model to obtain a trained machine learning model. Each new video added to the server is then input into the trained machine learning model, which outputs the new objects and their corresponding new description information. The history objects and new objects are collectively referred to as objects, and the known description information and new description information are collectively referred to as description information.
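The assembly of training samples described above can be sketched as follows (a minimal illustration under an assumed data layout; none of the names come from the patent):

```python
def build_training_samples(history_videos):
    """Pair each history object's video data with its known description
    information; the resulting (video data, description) pairs form the
    training set for the machine learning model.

    history_videos: list of dicts shaped like
      {"video": ..., "objects": [{"data": ..., "description": ...}, ...]}
    """
    samples = []
    for hv in history_videos:
        for obj in hv["objects"]:
            samples.append((obj["data"], obj["description"]))
    return samples
```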
When a user initiates a user query request for certain query information, the server searches the preset description information for the target description information that matches the query information. If the target description information was preset for a target object in a target video, a query result can be formed with that target object and returned to the user. In this way, for objects such as a certain frame image or a certain clip in a video, the user can use the query mode provided by the embodiments of the present invention to quickly find the position in the video that he or she wants to watch, thereby simplifying the user's operations.
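The lookup step can be illustrated with a toy matcher. The patent does not specify the matching strategy, so simple word overlap is used here as a hedged stand-in; all names are hypothetical:

```python
def find_target_description(query_info: str, descriptions: dict):
    """Return the object id whose preset description information best
    matches the query information, scored by word overlap (an assumed
    stand-in for the patent's unspecified matching method)."""
    query_terms = set(query_info.lower().split())
    best_id, best_score = None, 0
    for obj_id, desc in descriptions.items():
        score = len(query_terms & set(desc.lower().split()))
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id

descriptions = {"clip_s": "night chase scene in the city",
                "frame_7": "sunrise over the sea"}
```

With this table, the query "city chase at night" would resolve to `clip_s`, whose description shares the most words with the query.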
Example devices
Referring to Fig. 8, a structural schematic diagram of a device for querying an object in a video according to an embodiment of the present invention is shown. In this embodiment, the device may specifically include: a receiving unit 801, a searching unit 802, and a return unit 803.
The receiving unit 801 is configured to receive a user query request carrying query information;
The searching unit 802 is configured to search for target description information matching the query information, where the target description information is description information set in advance for a target object in a target video;
The return unit 803 is configured to form a query result with the target object and return the query result in response to the user query request;
Here, the target description information is information output by a machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Optionally, the known description information is formed based on the subtitle corresponding to the history object in the history video and/or the theme of the segment corresponding to the history object in the history video.
Optionally, the known description information includes any one or more of the following: time information, location information, type information, and person information;
The time information indicates the time described by the history object;
The location information indicates the place described by the history object;
The type information indicates the event type described by the history object;
The person information indicates the person described by the history object.
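The four optional fields above can be pictured as one record per history object (an illustrative sketch; the class and field names are assumptions, not from the patent):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DescriptionInfo:
    """Known description information for a history object; every field
    is optional, matching "any one or more" above."""
    time: Optional[str] = None        # time described by the object
    location: Optional[str] = None    # place described by the object
    event_type: Optional[str] = None  # event type described by the object
    person: Optional[str] = None      # person described by the object

info = DescriptionInfo(time="night", location="city street",
                       event_type="chase", person="protagonist")
```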
Optionally, the target object is a video frame image in the target video, and the history object is a video frame image in the history video;
Or,
The target object is a video clip in the target video, and the history object is a video clip in the history video.
Optionally, the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
The machine learning model has specifically been trained based on the correspondence between the history video and the history object and the known description information;
When the target video is input into the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the query result in the return unit 803 is specifically the target object itself.
Optionally, the return unit 803 includes:
a sorting subunit, configured to, if the target object includes multiple video frame images, sort the video frame images according to the degree of similarity between the target description information of each video frame image and the query information, to obtain a similarity ranking of the video frame images;
a constructing subunit, configured to construct an animated image sequence from the video frame images according to the similarity ranking;
a returning subunit, configured to take the animated image sequence as the query result and return the query result in response to the user query request.
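The three subunits can be sketched end to end as one function (an illustration only; word overlap stands in for the unspecified similarity measure, and all names are assumptions):

```python
def build_animation_sequence(frames, query_info: str):
    """frames: list of (frame_id, description) pairs. Sort the frames by
    the degree of match between each frame's target description
    information and the query information, then return the frame ids in
    that similarity order, ready to be assembled into an animated image
    sequence."""
    query_terms = set(query_info.lower().split())
    ranked = sorted(frames,
                    key=lambda f: len(query_terms & set(f[1].lower().split())),
                    reverse=True)
    return [frame_id for frame_id, _ in ranked]
```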
Optionally, the return unit 803 is configured to:
if the target object is a video clip, form the query result with the target video and the start-point identifier of the video clip;
and return the query result in response to the user query request;
Here, the start-point identifier is used to indicate the start position of the video clip within the target video.
Optionally, the query result is specifically formed from the target video, the start-point identifier of the video clip, and the end-point identifier of the video clip;
Here, the end-point identifier is used to indicate the end position of the video clip within the target video.
With the device provided by the embodiments of the present invention, description information can be set in advance for each object in each video by the trained machine learning model, and the user can directly input information related to the video clip or image he or she wants to view as the query information. By matching the query information against the description information, the target description information is obtained; using the target description information, the target object corresponding to the query information, i.e., the video clip or image the user wants to view, can be determined; a query result is then formed with the target object and returned to the user. Thus, when a user wants to view a certain clip or image in a video, there is no longer any need to search through the entire video and manually adjust playback to locate the desired position. With the above device provided by the present invention, the user can quickly locate the desired position in the video directly by means of a query, which provides convenience to users and optimizes the user experience.
Referring to Fig. 9, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operations of the device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 to execute instructions, so as to perform all or part of the steps of the methods described above. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and the other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support the operation of the device 900. Examples of such data include instructions for any application or method operating on the device 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The power component 906 provides power to the various components of the device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia component 908 includes a screen providing an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC) which, when the device 900 is in an operation mode such as a call mode, a recording mode, or a voice recognition mode, is configured to receive external audio signals. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 may detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900; the sensor component 914 may also detect a change in the position of the device 900 or of a component of the device 900, the presence or absence of contact between the user and the device 900, the orientation or acceleration/deceleration of the device 900, and a change in the temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
Specifically, an embodiment of the present invention provides a device for querying an object in a video, which may specifically be the device 900, including a memory 904 and one or more programs, wherein the one or more programs are stored in the memory 904 and configured to be executed by the one or more processors 920, and the one or more programs include instructions for performing the following operations:
receiving a user query request carrying query information;
searching for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
forming a query result with the target object, and returning the query result in response to the user query request;
wherein the target description information is information output by the machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Optionally, the known description information is formed based on the subtitle corresponding to the history object in the history video and/or the theme of the segment corresponding to the history object in the history video.
Optionally, the known description information includes any one or more of the following: time information, location information, type information, and person information;
the time information indicates the time described by the history object;
the location information indicates the place described by the history object;
the type information indicates the event type described by the history object;
the person information indicates the person described by the history object.
Optionally, the processor 920 is further configured such that the one or more programs include instructions for performing the following operations:
the target object is a video frame image in the target video, and the history object is a video frame image in the history video;
or,
the target object is a video clip in the target video, and the history object is a video clip in the history video.
Optionally, the processor 920 is further configured such that the one or more programs include instructions for performing the following operations:
the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model has specifically been trained based on the correspondence between the history video and the history object and the known description information;
when the target video is input into the machine learning model, the machine learning model specifically outputs the target object and the target description information.
Optionally, the processor 920 is further configured such that the one or more programs include instructions for performing the following operation:
the query result is specifically the target object itself.
Optionally, the processor 920 is further configured such that the one or more programs include instructions for performing the following operations:
the forming a query result with the target object and returning the query result in response to the user query request includes:
if the target object includes multiple video frame images, sorting the video frame images according to the degree of similarity between the target description information of each video frame image and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated image sequence from the video frame images according to the similarity ranking;
taking the animated image sequence as the query result, and returning the query result in response to the user query request.
Optionally, the processor 920 is further configured such that the one or more programs include instructions for performing the following operations:
the forming a query result with the target object and returning the query result in response to the user query request includes:
if the target object is a video clip, forming the query result with the target video and the start-point identifier of the video clip;
returning the query result in response to the user query request;
wherein the start-point identifier is used to indicate the start position of the video clip within the target video.
Optionally, the query result is specifically formed from the target video, the start-point identifier of the video clip, and the end-point identifier of the video clip;
wherein the end-point identifier is used to indicate the end position of the video clip within the target video.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions, which can be executed by the processor 920 of the device 900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a method of querying an object in a video, the method comprising:
receiving a user query request carrying query information;
searching for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
forming a query result with the target object, and returning the query result in response to the user query request;
wherein the target description information is information output by the machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on the correspondence between video data including a history object in a history video and the known description information of the history object.
Fig. 10 is a structural schematic diagram of a server in an embodiment of the present invention. The server 1000 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1022 (for example, one or more processors), a memory 1032, and one or more storage media 1030 (such as one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide transient or persistent storage. The programs stored in the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and execute, on the server 1000, the series of instruction operations in the storage medium 1030.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, one or more keyboards 1056, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptive changes of the invention that follow the general principles of the invention and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (12)

1. A method of querying an object in a video, characterized by comprising:
receiving a user query request carrying query information;
searching for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
forming a query result with the target object, and returning the query result in response to the user query request;
wherein the target description information is information output by a machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on a correspondence between video data including a history object in a history video and known description information of the history object.
2. The method according to claim 1, wherein the known description information is formed based on a subtitle corresponding to the history object in the history video and/or a theme of a segment corresponding to the history object in the history video.
3. The method according to claim 1, wherein the known description information includes any one or more of the following: time information, location information, type information, and person information;
the time information indicates the time described by the history object;
the location information indicates the place described by the history object;
the type information indicates the event type described by the history object;
the person information indicates the person described by the history object.
4. The method according to any one of claims 1 to 3, wherein the target object is a video frame image in the target video and the history object is a video frame image in the history video;
or,
the target object is a video clip in the target video and the history object is a video clip in the history video.
5. The method according to claim 4, wherein the video data including the target object is specifically the target video, and the video data including the history object in the history video is specifically the history video;
the machine learning model has specifically been trained based on a correspondence between the history video and the history object and the known description information;
when the target video is input into the machine learning model, the machine learning model specifically outputs the target object and the target description information.
6. The method according to claim 1, wherein the query result is specifically the target object itself.
7. The method according to claim 1, wherein the forming a query result with the target object and returning the query result in response to the user query request comprises:
if the target object includes multiple video frame images, sorting the video frame images according to a degree of similarity between the target description information of each video frame image and the query information, to obtain a similarity ranking of the video frame images;
constructing an animated image sequence from the video frame images according to the similarity ranking;
taking the animated image sequence as the query result, and returning the query result in response to the user query request.
8. The method according to claim 1, wherein the forming a query result with the target object and returning the query result in response to the user query request comprises:
if the target object is a video clip, forming the query result with the target video and a start-point identifier of the video clip;
returning the query result in response to the user query request;
wherein the start-point identifier is used to indicate a start position of the video clip within the target video.
9. The method according to claim 8, wherein the query result is specifically formed from the target video, the start-point identifier of the video clip, and an end-point identifier of the video clip;
wherein the end-point identifier is used to indicate an end position of the video clip within the target video.
10. A device for querying an object in a video, characterized by comprising:
a receiving unit, configured to receive a user query request carrying query information;
a searching unit, configured to search for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
a return unit, configured to form a query result with the target object and return the query result in response to the user query request;
wherein the target description information is information output by a machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on a correspondence between video data including a history object in a history video and known description information of the history object.
11. A device for querying an object in a video, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
receiving a user query request carrying query information;
searching for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
forming a query result with the target object, and returning the query result in response to the user query request;
wherein the target description information is information output by a machine learning model when video data including the target object is input into the machine learning model, and the machine learning model has been trained based on a correspondence between video data including a history object in a history video and known description information of the history object.
12. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a method of querying an object in a video, the method comprising:
receiving a user query request carrying query information;
searching for target description information matching the query information, wherein the target description information is description information set in advance for a target object in a target video;
forming a query result with the target object, and returning the query result in response to the user query request;
wherein the target description information is information output by a machine learning model when video data including the target object is input to the machine learning model, and the machine learning model is trained based on correspondences between video data including a history object in a history video and known description information of the history object.
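The query flow shared by claims 10 through 12 (receive a query request, match it against description information set in advance for objects in videos, and return a result formed with the matched object) can be sketched as follows. The trained machine learning model that would produce the description information offline is replaced here by a hard-coded mapping, and every name, the substring matching rule, and the result layout are illustrative assumptions, not details from the patent.

```python
# Target description information, set in advance for objects in target
# videos. In the claims this text is output by a model trained on pairs of
# (video data of a history object, known description of that object);
# here it is a fixed mapping for illustration only.
description_index = {
    "a man in a red coat walking a dog": ("video_001", "obj_7"),
    "a blue car turning left at a crossing": ("video_002", "obj_3"),
}

def handle_query(query_info: str):
    """Receive a query request carrying query information, search for
    matching target description information, and form a query result
    with the matched target object (or None if nothing matches)."""
    for description, (video_id, object_id) in description_index.items():
        if query_info in description:  # naive substring match as a stand-in
            return {"target_video": video_id, "target_object": object_id}
    return None

print(handle_query("red coat"))
```

In a real system the substring test would be replaced by the retrieval matching the patent contemplates, but the control flow (receive, match, form result, return) mirrors the three units of the claimed apparatus.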
CN201711340586.6A 2017-12-14 2017-12-14 A kind of method, apparatus and equipment of query object in video Pending CN109963164A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711340586.6A CN109963164A (en) 2017-12-14 2017-12-14 A kind of method, apparatus and equipment of query object in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711340586.6A CN109963164A (en) 2017-12-14 2017-12-14 A kind of method, apparatus and equipment of query object in video

Publications (1)

Publication Number Publication Date
CN109963164A true CN109963164A (en) 2019-07-02

Family

ID=67018131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711340586.6A Pending CN109963164A (en) 2017-12-14 2017-12-14 A kind of method, apparatus and equipment of query object in video

Country Status (1)

Country Link
CN (1) CN109963164A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004040912A1 (en) * 2002-10-31 2004-05-13 Electronics And Telecommunications Research Institute Non-linear quantization and similarity matching methods for retrieving video sequence having a set of image frames
CN103761284A (en) * 2014-01-13 2014-04-30 中国农业大学 Video retrieval method and video retrieval system
CN103956166A (en) * 2014-05-27 2014-07-30 华东理工大学 Multimedia courseware retrieval system based on voice keyword recognition
CN104602128A (en) * 2014-12-31 2015-05-06 北京百度网讯科技有限公司 Video processing method and device
CN105677735A (en) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 Video search method and apparatus
CN106776890A (en) * 2016-11-29 2017-05-31 北京小米移动软件有限公司 The method of adjustment and device of video playback progress

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021160141A1 (en) * 2020-02-11 2021-08-19 北京字节跳动网络技术有限公司 Video processing method and apparatus, readable medium and electronic device
US11996124B2 (en) 2020-02-11 2024-05-28 Beijing Bytedance Network Technology Co., Ltd. Video processing method, apparatus, readable medium and electronic device
CN111416997A (en) * 2020-03-31 2020-07-14 百度在线网络技术(北京)有限公司 Video playing method and device, electronic equipment and storage medium
US11368754B2 (en) 2020-03-31 2022-06-21 Baidu Online Network Technology (Beijing) Co., Ltd. Video playing method, apparatus, electronic device and storage medium
CN111597885A (en) * 2020-04-07 2020-08-28 上海推乐信息技术服务有限公司 Video additional content detection method and system
CN111770386A (en) * 2020-05-29 2020-10-13 维沃移动通信有限公司 Video processing method, video processing device and electronic equipment
WO2023198033A1 (en) * 2022-04-11 2023-10-19 北京有竹居网络技术有限公司 Prompting method and device

Similar Documents

Publication Publication Date Title
CN109963164A (en) A kind of method, apparatus and equipment of query object in video
US11503377B2 (en) Method and electronic device for processing data
US11520824B2 (en) Method for displaying information, electronic device and system
CN103997688B (en) intelligent interactive system, device and method
CN105515834B (en) Device packets management system, method and device
CN109120981A (en) Information list methods of exhibiting, device and storage medium
CN108255970A (en) A kind of video retrieval method, terminal and computer readable storage medium
WO2016029561A1 (en) Display terminal-based data processing method
CN105182783A (en) Method, apparatus and terminal for controlling intelligent devices
CN108037863A (en) A kind of method and apparatus for showing image
CN105517112A (en) Method and device for displaying WiFi network information
CN105512220B (en) Image page output method and device
CN105808469B (en) Data processing method, device, terminal and smart machine
CN105549960B (en) Control the method and device of camera
CN104079964B (en) The method and device of transmission of video information
CN106550252A (en) The method for pushing of information, device and equipment
CN104202561A (en) Method and device for playing stream media data
CN109819288A (en) Determination method, apparatus, electronic equipment and the storage medium of advertisement dispensing video
CN109033991A (en) A kind of image-recognizing method and device
CN109151565A (en) Play method, apparatus, electronic equipment and the storage medium of voice
CN108319363A (en) Product introduction method, apparatus based on VR and electronic equipment
CN108108671A (en) Description of product information acquisition method and device
CN107333182A (en) The player method and device of multimedia file
CN106886540A (en) A kind of data search method, device and the device for data search
CN104679386B (en) The method and apparatus of recording processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190702)