CN113283480A - Object identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113283480A
CN113283480A
Authority
CN
China
Prior art keywords
video
video frame
identification
image
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110523550.1A
Other languages
Chinese (zh)
Other versions
CN113283480B (en)
Inventor
王视鎏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110523550.1A priority Critical patent/CN113283480B/en
Publication of CN113283480A publication Critical patent/CN113283480A/en
Application granted granted Critical
Publication of CN113283480B publication Critical patent/CN113283480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an object identification method and device, an electronic device, and a storage medium. The object identification method includes the following steps: receiving an identification request for an object in a currently playing target video; extracting the video identifier and the current playing time of the target video from the identification request; acquiring an object library corresponding to the video identifier, where the object library contains recognition results for objects in a plurality of video frames of the target video, the recognition results being obtained by recognizing clustering results of the objects in the target video; and, in the object library, acquiring the recognition result of an object in the video frame corresponding to at least one corresponding timestamp based on the current playing time, and returning the recognition result. According to the embodiment of the invention, even when the object to be identified appears in a frame in a hard-to-recognize condition such as a side face, a back view, defocus, or occlusion, the object in the video frame can still be recognized, improving the success rate of object identification.

Description

Object identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an object identification method and apparatus, an electronic device, and a storage medium.
Background
With the development of the internet, users often watch their favorite videos through video apps, and for unfamiliar characters, actors, or lines appearing in a video, they have to query and search for the related content on the internet on their own, which is very inconvenient.
In order to meet users' needs to identify characters, actors, lines, and the like while viewing, video apps provide an automatic identification function, so that a user can identify content in a video with a click while watching.
However, identification of video content is currently implemented mainly by analyzing information within the video, and when the object to be identified appears in a frame as a side face, a back view, out of focus, or occluded, the identification success rate is low.
Disclosure of Invention
In order to solve the technical problems described above or at least partially solve the technical problems, the present application provides an object recognition method, an apparatus, an electronic device, and a storage medium.
In a first aspect, the present application provides an object recognition method, including:
receiving an identification request aiming at an object in a currently played target video;
extracting the video identification and the current playing time of the target video from the identification request;
acquiring an object library corresponding to the video identifier, wherein the object library comprises recognition results of objects in a plurality of video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp;
and in the object library, acquiring, based on the current playing time, a recognition result of an object in a video frame corresponding to at least one corresponding timestamp, and returning the recognition result for display.
Optionally, before receiving an identification request for an object in a currently playing target video, the method further includes:
extracting a plurality of video frames from the target video according to a preset sampling rate;
carrying out object detection on a plurality of video frames in the target video to obtain an object image in each video frame;
clustering objects in the object images to obtain a clustering result, wherein the clustering result corresponds to one or more object images;
identifying the clustering results based on a preset object identification library to obtain an identification result corresponding to each clustering result;
determining a video frame corresponding to each clustering result according to the object image corresponding to each clustering result and the video frame to which the object image belongs;
determining an identification result corresponding to the object in each video frame according to the identification result corresponding to each clustering result;
and constructing the object library based on the identification result corresponding to the object in each video frame.
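For illustration only (not part of the claims), the library-construction steps above might be sketched end to end in Python; the data layout, function names, and the toy detector, clusterer, and recognizer are all assumptions:

```python
def build_object_library(video_frames, detect, cluster, recognize):
    """Build an object library mapping each frame's timestamp to the
    recognition results of the objects appearing in that frame."""
    # Step 1: object detection -- collect (timestamp, object_image) pairs.
    detections = []
    for frame in video_frames:
        for obj_image in detect(frame["image"]):
            detections.append((frame["timestamp"], obj_image))

    # Step 2: clustering -- group object images showing the same object
    # (each cluster is a list of indices into `detections`).
    clusters = cluster([img for _, img in detections])

    # Steps 3-4: recognize each cluster once, then propagate the result
    # to every video frame that contributed an object image to it.
    library = {}
    for member_indices in clusters:
        result = recognize([detections[i][1] for i in member_indices])
        for i in member_indices:
            timestamp = detections[i][0]
            library.setdefault(timestamp, []).append(result)
    return library

# Toy stand-ins for the detector, clusterer, and recognizer.
frames = [{"timestamp": 0.0, "image": "f0"}, {"timestamp": 0.5, "image": "f1"}]
detect = lambda img: [img + "-face"]              # one object image per frame
cluster = lambda imgs: [list(range(len(imgs)))]   # everything is one object
recognize = lambda imgs: "Actor A"                # cluster -> identity
print(build_object_library(frames, detect, cluster, recognize))
# -> {0.0: ['Actor A'], 0.5: ['Actor A']}
```

Recognizing once per cluster and propagating the result is what lets a frame where the object is occluded or out of focus still receive an identity, as the description emphasizes.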
Optionally, the acquiring, in the object library, based on the current playing time, a recognition result of an object in a video frame corresponding to at least one corresponding timestamp includes:
determining a query time period based on the current playing time;
determining, in the object library, at least one target video frame with a timestamp within the query time period;
merging repeated recognition results of the same object across the at least one target video frame;
and determining the merged, de-duplicated recognition results as the recognition results of the objects in the video frames corresponding to the current playing time.
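For illustration only (not part of the claims), the query-and-merge steps above might be sketched in Python as follows; the object-library layout, the window size, and all names are assumptions:

```python
from collections import OrderedDict

def query_recognition_results(object_library, play_time, window=2.0):
    """Return de-duplicated recognition results for frames whose
    timestamp falls within [play_time - window, play_time + window]."""
    lo, hi = play_time - window, play_time + window
    merged = OrderedDict()  # object id -> recognition result (first hit wins)
    for frame in object_library:
        if lo <= frame["timestamp"] <= hi:
            for obj_id, result in frame["objects"].items():
                merged.setdefault(obj_id, result)
    return list(merged.values())

# Example library: two frames; the same actor appears in both.
library = [
    {"timestamp": 10.0, "objects": {"cluster_1": "Actor A"}},
    {"timestamp": 10.5, "objects": {"cluster_1": "Actor A", "cluster_2": "Actor B"}},
]
print(query_recognition_results(library, 10.2, window=1.0))
# -> ['Actor A', 'Actor B']
```

Merging by cluster identifier rather than by display name is one plausible design choice here, since the clustering result already ties together every appearance of the same object.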
Optionally, the method further comprises:
acquiring object images in a plurality of video frames of the target video, wherein the object images are obtained by carrying out object detection on objects in the video frames;
carrying out image quality evaluation on objects in a plurality of video frames to obtain the quality scores of the object images in the video frames;
determining, for each object, among the video frames corresponding to the at least one timestamp acquired based on the current playing time, the video frame in which the object image with the highest quality score is located;
and returning the display information corresponding to the video frame where the object image with the highest quality score is located, so as to display the object image and the identification result together.
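For illustration only (not part of the claims), choosing per object the frame holding its highest-quality image might look like the following minimal sketch; the record format and names are assumptions:

```python
def best_frame_per_object(candidates):
    """Given (object_id, frame_id, quality_score) records for the frames
    matched to the current playing time, return for each object the frame
    in which its highest-quality object image appears."""
    best = {}  # object id -> (frame id, best score so far)
    for obj_id, frame_id, score in candidates:
        if obj_id not in best or score > best[obj_id][1]:
            best[obj_id] = (frame_id, score)
    return {obj_id: frame_id for obj_id, (frame_id, _) in best.items()}

records = [
    ("actor_a", "frame_7", 0.62),
    ("actor_a", "frame_9", 0.91),  # sharper image of the same actor
    ("actor_b", "frame_9", 0.74),
]
print(best_frame_per_object(records))
# -> {'actor_a': 'frame_9', 'actor_b': 'frame_9'}
```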
Optionally, the method further comprises:
acquiring a frame identifier of a video frame where the object image with the highest quality score is located;
the returning of the display information corresponding to the video frame where the object image with the highest quality score is located so as to be displayed together with the identification result includes:
and returning the frame identification corresponding to the identification result, and correspondingly displaying the video frame corresponding to the frame identification according to the displayed identification result.
Optionally, the method further comprises:
positioning object images in a plurality of video frames of the target video, and determining the image position of each object image in the corresponding video frame;
aiming at each object, acquiring the image position of the object image in the video frame where the object image with the highest quality score is located;
and returning the image position corresponding to the recognition result so as to correspondingly mark the image position in the displayed video frame according to the displayed recognition result.
Optionally, obtaining a frame identifier of a video frame in which the object image with the highest quality score is located includes:
determining whether the video frames corresponding to the at least one timestamp acquired based on the current playing time contain object images of at least two objects;
if the video frame corresponding to at least one acquired corresponding timestamp contains the object images of at least two objects, determining whether the object image with the highest quality score of each object is located in the same video frame;
and if the object image with the highest quality score of each object is positioned in the same video frame, acquiring the frame identifier of the video frame.
Optionally, the method further comprises:
and if the object image with the highest quality score of each object is positioned in different video frames, acquiring the frame identifier of the video frame where the object image with the highest quality score of each object is positioned, and correspondingly displaying the video frame corresponding to the frame identifier according to the displayed identification result.
Optionally, the determining, for each object, a video frame in which an object image with the highest quality score is located in the video frames corresponding to at least one corresponding timestamp acquired based on the current playing time includes:
acquiring shot information corresponding to the shots contained in the target video, where the shot information includes the shot time range of each shot;
determining, for each object, the shot time range corresponding to the object in the video frames corresponding to the at least one timestamp acquired based on the current playing time;
and determining, among the video frames with timestamps within the shot time range, the video frame in which the object image with the highest quality score is located.
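For illustration only (not part of the claims), restricting the best-quality search to the shot containing the matched frame might be sketched as follows; the shot representation and all names are assumptions:

```python
def best_frame_within_shot(candidates, shots, object_timestamp):
    """Restrict the best-quality search to frames whose timestamps fall in
    the same shot as the frame matched to the current playing time.
    `candidates` holds (timestamp, frame_id, quality_score) for one object;
    `shots` is a list of (start, end) shot time ranges in seconds."""
    # Find the shot covering the matched frame's timestamp.
    shot = next(((s, e) for s, e in shots if s <= object_timestamp <= e), None)
    if shot is None:
        return None
    start, end = shot
    in_shot = [c for c in candidates if start <= c[0] <= end]
    if not in_shot:
        return None
    # The highest-quality object image inside the shot wins.
    return max(in_shot, key=lambda c: c[2])[1]

shots = [(0.0, 4.0), (4.0, 9.0)]
candidates = [(1.0, "frame_2", 0.5), (3.5, "frame_8", 0.9), (5.0, "frame_12", 0.95)]
print(best_frame_within_shot(candidates, shots, 1.0))
# -> 'frame_8'  (frame_12 scores higher but lies in a different shot)
```

Keeping the returned frame inside the current shot avoids displaying a visually unrelated frame even when a higher-quality image of the object exists elsewhere in the video.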
Optionally, the method further comprises:
acquiring a head portrait image of an object in a video frame where an object image with the highest quality score is located;
the returning of the display information corresponding to the video frame where the object image with the highest quality score is located so as to be displayed together with the identification result includes:
and returning the head portrait image to correspondingly display the head portrait image of the object according to the displayed recognition result.
In a second aspect, the present application provides an object recognition apparatus, comprising:
the receiving module is used for receiving an identification request aiming at an object in a currently played target video;
the extraction module is used for extracting the video identifier and the current playing time of the target video from the identification request;
a first obtaining module, configured to obtain an object library corresponding to the video identifier, where the object library includes recognition results of objects in multiple video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp;
and the second acquisition module is used for acquiring an identification result of an object in the video frame corresponding to at least one corresponding timestamp in the object library based on the current playing time, and returning the identification result to display the identification result.
Optionally, before the receiving module, the apparatus further comprises:
the extraction module is used for extracting a plurality of video frames from the target video according to a preset sampling rate;
the detection module is used for carrying out object detection on a plurality of video frames in the target video to obtain an object image in each video frame;
the clustering module is used for clustering objects in the object images to obtain a clustering result, and the clustering result corresponds to one or more object images;
the first identification module is used for identifying the clustering results based on a preset object identification library to obtain an identification result corresponding to each clustering result;
the first determining module is used for determining a video frame corresponding to each clustering result according to the object image corresponding to each clustering result and the video frame to which the object image belongs;
the second determining module is used for determining the identification result corresponding to the object in each video frame according to the identification result corresponding to each clustering result;
and the construction module is used for constructing the object library based on the identification result corresponding to the object in each video frame.
Optionally, the second obtaining module includes:
a first determining unit, configured to determine a query time period based on the current play time;
a second determining unit, configured to determine, in the object library, at least one target video frame with a timestamp within the query time period;
a merging unit, configured to merge repeated recognition results of the same object in at least one target video frame;
and the third determining unit is used for determining the unrepeated identification result after the merging as the identification result of the object in the corresponding video frame at the current playing time.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain object images in multiple video frames of the target video, where the object images are obtained by performing object detection on objects in the video frames;
the first evaluation module is used for evaluating the image quality of the object in a plurality of video frames to obtain the quality score of the object image in each video frame;
a third determining module, configured to determine, for each object, among the video frames corresponding to the at least one timestamp acquired based on the current playing time, the video frame in which the object image with the highest quality score is located;
and the first returning module is used for returning the display information corresponding to the video frame where the object image with the highest quality score is located, so as to be displayed together with the recognition result.
Optionally, the apparatus further comprises:
the fourth acquisition module is used for acquiring the frame identifier of the video frame where the object image with the highest quality score is located;
the first returning module is further configured to return the frame identifier corresponding to the recognition result, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed recognition result.
Optionally, the apparatus further comprises:
a fourth determining module, configured to locate object images in multiple video frames of the target video, and determine image positions of the object images in corresponding video frames;
the fifth acquisition module is used for acquiring, for each object, the image position of the object image in the video frame where the object image with the highest quality score is located;
and the second returning module is used for returning the image position corresponding to the identification result so as to correspondingly mark the image position in the displayed video frame according to the displayed identification result.
Optionally, the third determining module includes:
a fourth determining unit, configured to determine whether the video frames corresponding to the at least one timestamp acquired based on the current playing time contain object images of at least two objects;
a fifth determining unit, configured to determine whether an object image with a highest quality score of each object is located in the same video frame if the video frame corresponding to the obtained at least one corresponding timestamp contains object images of at least two objects;
the first acquisition unit is used for acquiring the frame identifier of that video frame if the object image with the highest quality score of each object is located in the same video frame.
Optionally, the third determining module further includes:
and the second acquisition unit is used for acquiring the frame identifier of the video frame where the object image with the highest quality score of each object is located if the object image with the highest quality score of each object is located in different video frames, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed identification result.
Optionally, the fourth determining module includes:
a second obtaining unit, configured to obtain shot information corresponding to the shots contained in the target video, where the shot information includes the shot time range of each shot;
a sixth determining unit, configured to determine, for each object, the shot time range corresponding to the object in the video frames corresponding to the at least one timestamp acquired based on the current playing time;
and the selecting unit is used for determining the video frame where the object image with the highest quality score is located in the video frames with the timestamps within the shot time range.
Optionally, the apparatus further comprises:
the sixth acquisition module is used for acquiring the head portrait image of the object in the video frame where the object image with the highest quality score is located;
and the first returning module is also used for returning the head portrait image so as to correspondingly display the head portrait image of the object according to the displayed identification result.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another via the communication bus;
a memory for storing a computer program;
a processor configured to implement the object recognition method according to any one of the first aspect when executing a program stored in a memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a program of an object recognition method, which when executed by a processor, implements the steps of the object recognition method of any one of the first aspects.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
the method comprises the steps of receiving an identification request aiming at an object in a target video played currently, extracting a video identifier and a current playing time of the target video from the identification request, and obtaining an object library corresponding to the video identifier, wherein the object library comprises identification results of the object in a plurality of video frames of the target video, the identification results are obtained by identifying clustering results of the object in the target video, each video frame has a corresponding timestamp, and the identification results of the object in the video frames corresponding to at least one corresponding timestamp can be obtained in the object library based on the current playing time and returned.
According to the embodiment of the invention, the recognition results in the object library are obtained by recognizing clustering results of the objects in the target video, so that the different appearances of the same object correspond to the same clustering result. Even if an object appearing in a certain video frame is hard to recognize, it can be associated with its clustering result via the video frame and thence with the corresponding recognition result. Thus, even when the object to be identified appears in a hard-to-recognize condition such as a side face, a back view, defocus, or occlusion, the object in the video frame can still be recognized, improving the success rate of object recognition.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a flowchart of an object identification method according to an embodiment of the present application;
fig. 2 is a structural diagram of an object recognition apparatus according to an embodiment of the present application;
fig. 3 is a structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, identification of video content is implemented mainly by analyzing information within the video, but when the object to be identified appears in a frame as a side face, a back view, out of focus, or occluded, the identification success rate is low. Embodiments of the invention therefore provide an object identification method and apparatus, an electronic device, and a storage medium. The object identification method can be applied to a server, and the server can communicate with a client over a network.
As shown in fig. 1, the object recognition method may include the steps of:
step S101, receiving an identification request aiming at an object in a currently played target video;
in the embodiment of the invention, in the process of playing the target video, if the client detects the trigger operation of clicking the preset button in the video playing interface, the client generates the identification request and sends the identification request to the server, so that the server can determine the video and the video frame which are aimed at by the identification request, and the identification request can carry the video identifier of the target video and the current playing time.
After the client sends the identification request, the server may receive the identification request.
Step S102, extracting the video identification and the current playing time of the target video from the identification request;
the identification request carries the video identifier of the target video and the current playing time, so that the identification request can be analyzed to obtain the video identifier and the current playing time.
Step S103, acquiring an object library corresponding to the video identifier;
in the embodiment of the invention, an object library can be constructed for each video in advance, and the corresponding relation between the video identification of the video and the object library is established. The object library comprises recognition results of objects in a plurality of video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp.
The object may be a human face, a human figure, or the like, and the recognition result may be identity information of the object (such as a character name, an actor name, age, height, weight, and the like).
Any object in the target video appears in at least one video frame, and its appearance may differ across frames. For example, the same actor may be shown as a front face, a side face, or the face at another angle; as the body at different angles; partially occluded by other actors or objects; or out of focus. To obtain recognition results for the same actor across these different appearances, all actors appearing in the target video need to be clustered; each clustering result corresponds to one actor and to at least one of that actor's appearances.
In this step, an object library corresponding to the video identifier may be searched for according to the video identifier.
Step S104, in the object library, acquiring an identification result of an object in a video frame corresponding to at least one corresponding timestamp based on the current playing time, and returning the identification result to display the identification result.
Because the object library contains the recognition result of the object in each video frame, and each video frame has a corresponding timestamp, the corresponding video frames can be looked up according to the current playing time, the recognition results of the objects in those frames acquired, and the acquired results returned. After receiving the recognition results, the client can display them on a separate UI interface.
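For illustration only (not part of the claims), the server-side flow of steps S101 to S104 might look like the following minimal sketch; the request format, the in-memory library, the matching window, and all names are assumptions:

```python
OBJECT_LIBRARIES = {
    "video_42": [  # video identifier -> per-frame recognition results
        {"timestamp": 12.0, "results": ["Actor A"]},
        {"timestamp": 12.5, "results": ["Actor A", "Actor B"]},
    ],
}

def handle_identification_request(request, window=1.0):
    # S102: extract the video identifier and current playing time.
    video_id = request["video_id"]
    play_time = request["play_time"]
    # S103: look up the pre-built object library for this video.
    library = OBJECT_LIBRARIES.get(video_id, [])
    # S104: collect de-duplicated results from frames near the playing time.
    results = []
    for frame in library:
        if abs(frame["timestamp"] - play_time) <= window:
            for r in frame["results"]:
                if r not in results:
                    results.append(r)
    return results

print(handle_identification_request({"video_id": "video_42", "play_time": 12.2}))
# -> ['Actor A', 'Actor B']
```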
To summarize: an identification request for an object in a currently playing target video is received; the video identifier and the current playing time of the target video are extracted from the request; and an object library corresponding to the video identifier is acquired, where the object library contains recognition results for objects in a plurality of video frames of the target video, each video frame has a corresponding timestamp, and the recognition results are obtained by recognizing clustering results of the objects in the target video. Based on the current playing time, the recognition result of an object in the video frame corresponding to at least one corresponding timestamp can be acquired from the object library and returned.
According to the embodiment of the invention, the recognition results in the object library are obtained by recognizing clustering results of the objects in the target video, so that the different appearances of the same object correspond to the same clustering result. Even if an object appearing in a certain video frame is hard to recognize, it can be associated with its clustering result via the video frame and thence with the corresponding recognition result. Thus, even when the object to be identified appears in a hard-to-recognize condition such as a side face, a back view, defocus, or occlusion, the object in the video frame can still be recognized, improving the success rate of object recognition.
In a further embodiment of the present invention, before receiving the identification request for the object in the currently played target video in step S101, the method further includes:
step 201, extracting a plurality of video frames from the target video according to a preset sampling rate;
step 202, performing object detection on a plurality of video frames in the target video to obtain an object image in each video frame;
in the embodiment of the present invention, each video comprises a plurality of video frames, which are extracted from the video in advance at the server according to a preset, adjustable sampling rate, for example 8 frames per second; thus the plurality of video frames may be all, or only a portion, of the video frames in the target video.
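The frame-sampling step can be illustrated with a small helper; the function name and the rounding of the stride are assumptions made for illustration:

```python
def sample_frame_indices(total_frames, fps, sample_rate):
    """Indices of the frames to extract when sampling sample_rate frames
    per second from a video recorded at fps frames per second."""
    stride = max(1, round(fps / sample_rate))  # e.g. 24 fps, 8 samples/s -> every 3rd frame
    return list(range(0, total_frames, stride))
```

For a 24 fps video sampled at 8 frames per second, every third frame is kept.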
Object detection may refer to face detection, body-shape detection, or the like. Performing object detection on a video frame yields a detection box, and the image inside the detection box is the object image; for example, a pre-trained neural network model may be used to perform the object detection.
Step 203, clustering objects in the plurality of object images to obtain a clustering result, wherein the clustering result corresponds to one or more object images;
because the content presented by the same object may differ from one object image to another, clustering can aggregate the different presentations of the same object across object images into a single clustering result, and each clustering result may correspond to one or more object images containing the same object.
In an embodiment of the invention, the object features in each object image can be extracted and stored in correspondence with the clustering results. The stored object features facilitate recognition of the clustering results, and storing the clustering results themselves allows the recognition results corresponding to them to be rapidly regenerated after the object recognition library is updated, which facilitates bringing the data online quickly.
Step 204, identifying the clustering results based on a preset object identification library to obtain an identification result corresponding to each clustering result;
in the embodiment of the present invention, the object recognition library may store the identity information corresponding to each object, and the recognition may refer to face recognition or body-shape recognition; for example, a pre-trained neural network model may be used to perform the object recognition.
Since a clustering result may correspond to one or more object images containing the same object, the clustering result can be recognized by matching it against the identity information of each object in the object recognition library, thereby obtaining the recognition result corresponding to that clustering result.
Step 205, determining a video frame corresponding to each clustering result according to the object image corresponding to each clustering result and the video frame to which the object image belongs;
step 206, determining an identification result corresponding to the object in each video frame according to the identification result corresponding to each clustering result;
since each clustering result corresponds to one or more object images, each object image is extracted from a video frame, and the recognition result corresponding to each clustering result has been determined, the recognition result corresponding to the object in each video frame can be obtained from the video frames to which the object images belong.
Step 207, building the object library based on the identification result corresponding to the object in each video frame.
By storing the clustering results as an intermediate state, the offline data can be quickly regenerated and brought online after the object recognition library is updated.
According to the embodiment of the invention, the objects in the object images are clustered and then recognized in advance, offline, and the object library is then constructed from the recognition result corresponding to the object in each video frame. In this way, different presentations of the same object correspond to the same clustering result, so that during online production, even if the object appearing in a certain video frame is not easily recognized (for example, because the object image is unclear or the appearance differs), that frame can be associated with its clustering result and thereby with the corresponding recognition result. Thus, even when the object to be recognized appears as a side face or back view, or is out of focus or occluded, the object in the video frame can still be recognized, improving the success rate of object recognition.
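Steps 201 to 207 can be sketched end to end as follows. The detection, feature-extraction, and recognition callables stand in for the pre-trained neural network models mentioned above, and the greedy threshold-based clustering is only one simple choice of clustering method, assumed here for illustration:

```python
# Illustrative offline pipeline for steps 201-207: detect objects in the
# sampled frames, cluster them by feature similarity, recognise each cluster
# once, then fan the result out to every frame the cluster appears in.

def build_object_library(frames, detect_objects, extract_feature, recognise,
                         similarity, threshold=0.8):
    # Steps 201/202: frames is a list of (timestamp, image) pairs.
    detections = []  # (timestamp, feature)
    for ts, image in frames:
        for obj_image in detect_objects(image):
            detections.append((ts, extract_feature(obj_image)))

    # Step 203: greedy clustering — join the first cluster whose centroid
    # is similar enough, otherwise start a new cluster.
    clusters = []  # each: {"centroid": feature, "timestamps": [...]}
    for ts, feat in detections:
        for cluster in clusters:
            if similarity(cluster["centroid"], feat) >= threshold:
                cluster["timestamps"].append(ts)
                break
        else:
            clusters.append({"centroid": feat, "timestamps": [ts]})

    # Steps 204-207: recognise each cluster once, then record the identity
    # against every video frame in which the cluster appears.
    library = {}  # timestamp -> set of identities
    for cluster in clusters:
        identity = recognise(cluster["centroid"])
        for ts in cluster["timestamps"]:
            library.setdefault(ts, set()).add(identity)
    return library
```

Because recognition runs once per cluster rather than once per detection, a hard-to-recognize detection inherits the identity of its cluster, which is the mechanism the paragraph above describes.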
In practical applications, there is a certain delay between the moment the user forms the intention to identify and the moment the preset button of the client is actually operated, so the object to be identified in the video may well have changed by the time identification is actually triggered. For this reason, in another embodiment of the present invention, step S104 — obtaining, in the object library, the recognition result of the object in the corresponding video frame based on the current playing time — comprises:
step 301, determining a query time period based on the current playing time;
in the embodiment of the present invention, a preset time period may be added before and/or after the current playing time to obtain the query time period. The preset time period may be determined by jointly considering the average operation delay of users and the resource processing capability of the server, for example 2 seconds.
Step 302, in the object library, determining at least one target video frame with a timestamp within the query time period;
since each video frame has a corresponding timestamp, the timestamp of each video frame can be compared with two boundary thresholds of the query time period, and if the timestamps of one or more video frames are within the query time period, the one or more video frames can be determined as the target video frame.
Step 303, merging repeated recognition results of the same object in at least one target video frame;
because there may be multiple target video frames, and a given object may appear multiple times across them, repeated recognition results need to be merged so that only one recognition result is retained per object; this prevents too many duplicate results from being returned to the client and consuming excessive resources.
And step 304, determining the non-repeated identification result after the combination as the identification result of the object in the video frame corresponding to the current playing time.
Since the foregoing embodiment constructed an object library storing the recognition result of the object in each video frame, the recognition result of an object in a target video frame can be acquired from the object library.
The embodiment of the invention can automatically determine the query time period based on the current playing time, so that even if a certain delay elapses between the user forming a recognition intention and actually operating the preset button, the user still obtains the desired recognition result, which is convenient to use. Moreover, by expanding the query range, the video frames within it are associated with their clustering results and thereby with the corresponding recognition results; thus, even if the object to be recognized in a certain video frame is not easily recognized (for example, because the object image is unclear or the appearance differs), or appears as a side face or back view, or is out of focus or occluded, the object in that video frame can still be recognized, improving the success rate of object recognition.
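Steps 301 to 304 can be sketched as follows, assuming the object library is a mapping from frame timestamp to a list of recognition results; the two-second window mirrors the example above:

```python
def query_with_window(library, play_time, window=2.0):
    """Steps 301-304 sketch: expand the query around play_time, collect the
    results of frames inside the window, and keep one result per object."""
    lo, hi = play_time - window, play_time + window
    merged = {}
    for timestamp, results in library.items():
        if lo <= timestamp <= hi:          # step 302: timestamp within query period
            for identity in results:
                merged.setdefault(identity, timestamp)  # step 303: keep one per object
    return sorted(merged)                  # step 304: de-duplicated identities
```

The merge keeps the first hit per identity; a real implementation might instead keep the hit closest to the playing time.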
In practical applications, in order to determine which object is the recognition result when determining the recognition result of the object, in another embodiment of the present invention, the method further includes:
step 401, acquiring object images in a plurality of video frames of the target video.
In the embodiment of the present invention, the object image is obtained by performing object detection on an object in each video frame.
Step 402, performing image quality evaluation on objects in a plurality of video frames to obtain quality scores of object images in each video frame;
for example, the image quality evaluation may cover the sharpness of the object image, the deflection angle of the object in the image (e.g., how far a face is turned to the side), whether the object in the image is occluded, the proportion of the picture occupied by the object, and so on.
Each item is scored separately, and the per-item quality scores are combined (for example, multiplied by weight coefficients and then summed) to obtain the quality score of the object image in each video frame.
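The weighted combination can be sketched as below; the normalisation of each item and the weight coefficients are illustrative assumptions, not values from the embodiment:

```python
def quality_score(sharpness, yaw_degrees, occluded, area_ratio,
                  weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of the per-item scores described above.
    sharpness and area_ratio are assumed already normalised to [0, 1]."""
    yaw_score = max(0.0, 1.0 - abs(yaw_degrees) / 90.0)  # frontal face scores 1.0
    occlusion_score = 0.0 if occluded else 1.0
    items = (sharpness, yaw_score, occlusion_score, area_ratio)
    return sum(w * s for w, s in zip(weights, items))
```

A sharp, frontal, unoccluded, full-frame object image thus scores 1.0, and each defect lowers the score in proportion to its weight.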
Step 403, determining, for each object, a video frame in which the object image with the highest quality score is located, from among the video frames corresponding to at least one corresponding timestamp acquired based on the current playing time.
And step 404, returning display information corresponding to the video frame where the object image with the highest quality score is located, so as to display the object image and the identification result together.
According to the embodiment of the invention, by evaluating the quality scores of the object images, when the same object appears in multiple video frames, the video frame containing that object's highest-scoring object image is selected for each object according to the quality scores, and the display information corresponding to that video frame is returned so that it can be displayed together with the recognition result.
In practical applications, in order to make a user clearly know which object is the recognition result when seeing the recognition result of the object, based on the foregoing embodiment, in another embodiment of the present invention, the method further includes:
step 501, obtaining a frame identifier of a video frame where an object image with the highest quality score is located;
the frame identifier may be returned to the client together with the recognition result, or may not be sent to the client.
Step 404 includes:
step 502, returning the frame identifier corresponding to the recognition result, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed recognition result.
In the embodiment of the invention, by evaluating the quality scores of the object images, when the same object appears in multiple video frames, the frame identifier of the video frame containing that object's highest-scoring object image is selected for each object according to the quality scores and returned to the client. When displaying a recognition result, the client can therefore look up and display the video frame corresponding to the frame identifier associated with that result, so that the user can easily see which object each displayed recognition result refers to. Because only the frame identifier is transmitted, transmission bandwidth is also saved.
In practical applications, in order to facilitate determining which object the recognition result of the object is directed to when determining the recognition result of the object, in a further embodiment of the present invention, the method further comprises:
step 601, positioning object images in a plurality of video frames of the target video, and determining image positions of the object images in corresponding video frames.
In the embodiment of the present invention, the image position may refer to the coordinates of the four vertices of the object image, the coordinates of two diagonally opposite corners, or the like.
Step 602, for each object, acquiring an image position of an object image in a video frame where the object image with the highest quality score is located and an identification result of the object in the object image;
step 603, returning the image position corresponding to the recognition result, so as to correspondingly mark the image position in the displayed video frame according to the displayed recognition result.
The embodiment of the invention locates the image position of the object image within its video frame and returns it to the client, so that when the client displays the video frame corresponding to the frame identifier associated with a recognition result, it can mark the position of the object image on that video frame according to the image position; this makes it easy for the user to see which object the currently displayed recognition result refers to.
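Steps 601 to 603 can be sketched as follows, assuming each detection carries an identity, a quality score, and a bounding box given as diagonally opposite corners:

```python
def locate_best_object(frames_objects):
    """Steps 601-603 sketch: frames_objects maps frame_id to a list of
    (identity, quality, box) detections. For each identity, return the frame
    identifier, bounding box, and quality of its best-scoring object image."""
    best = {}  # identity -> (frame_id, box, quality)
    for frame_id, objects in frames_objects.items():
        for identity, quality, box in objects:
            if identity not in best or quality > best[identity][2]:
                best[identity] = (frame_id, box, quality)
    return best
```

The returned frame identifier and box are exactly what the client needs to display the frame and mark the object's position on it.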
In another embodiment of the present invention, the step 403 includes:
step 701, determining whether the video frames corresponding to the at least one corresponding timestamp acquired based on the current playing time contain object images of at least two objects;
since, in practical applications, group shots frequently occur in a video — that is, at least two objects may appear in the video frames acquired based on the current playing time — it is necessary to determine the number of objects appearing in the acquired video frames.
If the acquired video frame only comprises one object, the video frame comprising the object image with the highest quality score is selected from the video frames comprising the object.
Step 702, if the video frame corresponding to at least one acquired corresponding timestamp contains object images of at least two objects, determining whether the object image with the highest quality score of each object is located in the same video frame;
in this step, for each object, the quality scores of the object images containing that object may be compared to obtain the highest-scoring object image for that object and the video frame it belongs to; by comparing the frame identifiers of these video frames, it can be determined whether the highest-scoring images of all objects are located in the same video frame.
Step 703, if the object image with the highest quality score of each object is located in the same video frame, acquiring a frame identifier of the video frame;
step 704, if the object image with the highest quality score of each object is located in a different video frame, obtaining a frame identifier of the video frame where the object image with the highest quality score of each object is located, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed identification result.
According to the embodiment of the invention, when the highest-scoring object image of every object is located in the same video frame, the frame identifier of that shared video frame is obtained, so that the object images of all objects can be displayed in a single frame; this presents each object at its best image quality while saving system resources. When the highest-scoring object images of the objects are located in different video frames, the frame identifier of the video frame containing each object image can be obtained, so that the best-quality object image of each object can be displayed separately, which is convenient for the user to view.
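The branch in steps 701 to 704 can be sketched as follows, assuming a mapping from each object identity to the frame identifier of its best-scoring image has already been computed:

```python
def frame_ids_to_return(best_frames):
    """Steps 701-704 sketch: best_frames maps each object identity to the
    frame id holding its highest-quality image. If every object's best image
    sits in the same frame, one shared id is returned; otherwise one per object."""
    ids = set(best_frames.values())
    if len(ids) == 1:
        return ids.pop()          # step 703: a single shared frame for all objects
    return dict(best_frames)      # step 704: one frame identifier per object
```

Returning a single identifier in the shared case is what lets the client display every object from one frame.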
In practical applications, in order to determine which object the recognition result of the object is for when determining the recognition result of the object, in a further embodiment of the present invention, the step 403 includes:
step 801, acquiring lens information corresponding to a lens contained in the target video, wherein the lens information includes a lens time range of the lens;
in this step, the start time and the end time of each shot in the target video may be marked in advance, and a time range between the start time and the end time of each shot, that is, a shot time range, may be set.
Step 802, determining a shot time range corresponding to each object in a video frame corresponding to at least one corresponding timestamp acquired based on the current playing time;
in this step, the shot time range in which the current playing time is located can be searched.
And 803, determining the video frame where the object image with the highest quality score is located in the video frames with the time stamps within the shot time range.
According to the embodiment of the invention, the shot time range corresponding to each object is determined, and the video frame containing the highest-scoring object image is selected, according to the quality scores, from among the video frames whose timestamps fall within that shot time range. Compared with searching only the video frames corresponding to the at least one timestamp acquired from the current playing time, this expands the search scope to the whole shot time range for each object, so the finally selected object image is likely to have a higher quality score and the result is better.
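Steps 801 to 803 can be sketched as follows, assuming shots are given as (start, end) time ranges and one object's candidate frames as a mapping from timestamp to quality score:

```python
def best_frame_within_shot(shots, frame_scores, play_time):
    """Steps 801-803 sketch: find the shot containing play_time, then pick
    the timestamp with the highest quality score inside that whole shot."""
    for start, end in shots:
        if start <= play_time <= end:      # step 802: shot containing play_time
            candidates = {ts: q for ts, q in frame_scores.items()
                          if start <= ts <= end}
            if candidates:                  # step 803: best frame in the shot
                return max(candidates, key=candidates.get)
            return None
    return None
```

The whole shot is searched rather than just the frames nearest the playing time, which is what enlarges the search scope described above.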
Based on the foregoing embodiment, in a further embodiment of the present invention, the method further includes:
step 901, acquiring an avatar image of an object in a video frame where an object image with the highest quality score is located;
the avatar image may be returned to the client together with the recognition result or may not be sent to the client.
Step 404 includes:
and step 902, returning the head portrait image to correspondingly display the head portrait image of the object according to the displayed recognition result.
According to the embodiment of the invention, the video frame containing the highest-scoring object image is obtained within the shot time range corresponding to each object, so that when a recognition result is determined, the head portrait image of the object can be taken from a video frame within the same shot (where the actor's appearance is guaranteed to be unchanged), making it easy to see which object in the video frame the recognition result refers to.
In the embodiment of the invention, to allow the client to directly display the head portrait of the object within one shot, a shot refers to a continuous camera take around the current playing time (ensuring the actor's appearance is unchanged); the result with the highest face quality for the same character within the shot time range can then be selected and returned, which is convenient for users of the client to view.
In yet another embodiment of the present invention, the method further comprises:
1001, if the object library corresponding to the video identifier is not obtained, acquiring a screenshot of a currently played video frame when the client sends the identification request;
if the object library corresponding to the video identifier is not obtained, the server cannot return a recognition result to the client. Since the client cannot know in advance whether an object library exists for the video identifier, and taking a screenshot takes longer than sending the identification request, the client takes a screenshot of the currently played video frame every time it sends an identification request; if the server does not return a recognition result within a specified time period, the client sends the screenshot to the server for recognition.
Step 1002, performing object detection on the screenshot to obtain object characteristics;
and 1003, identifying the object features based on a preset object identification library to obtain an identification result, and returning the identification result.
The client side adopts a parallel offline-and-online scheme: where offline data is available (an object library corresponding to the video identifier exists and is obtained), the response speed is greatly improved; where no offline data is available (no object library exists or none is acquired), no extra response delay is introduced, which is convenient for users.
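The offline/online parallel scheme can be sketched from the client's perspective as follows; the callables and the timeout value are illustrative assumptions:

```python
# Illustrative client-side logic: the screenshot is captured eagerly alongside
# every request (capture is slower than the request itself), and is only
# uploaded when the offline object-library path returns nothing in time.

def identify_with_fallback(send_request, capture_screenshot, identify_screenshot,
                           timeout=1.0):
    screenshot = capture_screenshot()        # always taken, per the text above
    result = send_request(timeout=timeout)   # offline path: query the object library
    if result is not None:
        return result
    return identify_screenshot(screenshot)   # online path: detect + recognise on the fly
```

The eager capture is what keeps the online fallback from adding delay on top of the request round trip.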
In still another embodiment of the present invention, as shown in fig. 2, there is also provided an object recognition apparatus including:
a receiving module 11, configured to receive an identification request for an object in a currently played target video;
an extracting module 12, configured to extract the video identifier and the current playing time of the target video from the identification request;
a first obtaining module 13, configured to obtain an object library corresponding to the video identifier, where the object library includes recognition results of objects in multiple video frames of the target video, and the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp;
a second obtaining module 14, configured to obtain, in the object library, an identification result of an object in a video frame corresponding to at least one corresponding timestamp based on the current playing time, and return the identification result to show the identification result.
Optionally, before the receiving module, the apparatus further comprises:
the extraction module is used for extracting a plurality of video frames from the target video according to a preset sampling rate;
the detection module is used for carrying out object detection on a plurality of video frames in the target video to obtain an object image in each video frame;
the clustering module is used for clustering objects in the object images to obtain a clustering result, and the clustering result corresponds to one or more object images;
the first identification module is used for identifying the clustering results based on a preset object identification library to obtain an identification result corresponding to each clustering result;
the first determining module is used for determining a video frame corresponding to each clustering result according to the object image corresponding to each clustering result and the video frame to which the object image belongs;
the second determining module is used for determining the identification result corresponding to the object in each video frame according to the identification result corresponding to each clustering result;
and the construction module is used for constructing the object library based on the identification result corresponding to the object in each video frame.
Optionally, the second obtaining module includes:
a first determining unit, configured to determine a query time period based on the current play time;
a second determining unit, configured to determine, in the object library, a target video frame with at least one timestamp located within the query time period;
a merging unit, configured to merge repeated recognition results of the same object in at least one target video frame;
and the third determining unit is used for determining the unrepeated identification result after the merging as the identification result of the object in the corresponding video frame at the current playing time.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain object images in multiple video frames of the target video, where the object images are obtained by performing object detection on objects in the video frames;
the first evaluation module is used for evaluating the image quality of the object in a plurality of video frames to obtain the quality score of the object image in each video frame;
a third determining module, configured to obtain, for each object, a video frame in which an object image with a highest quality score is located, in video frames corresponding to at least one corresponding timestamp obtained based on the current playing time;
and the first returning module is used for returning the display information corresponding to the video frame where the object image with the highest quality score is located, so as to display it together with the recognition result.
Optionally, the apparatus further comprises:
the fourth acquisition module is used for acquiring the frame identifier of the video frame where the object image with the highest quality score is located;
the first returning module is further configured to return the frame identifier corresponding to the recognition result, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed recognition result.
Optionally, the apparatus further comprises:
a fourth determining module, configured to locate object images in multiple video frames of the target video, and determine image positions of the object images in corresponding video frames;
and the fifth acquisition module is used for acquiring the image position of the object image in the video frame where the object image with the highest quality score is located and the identification result of the object in the object image for each object.
Optionally, the apparatus further comprises:
and the second returning module is used for returning the image position corresponding to the identification result so as to correspondingly mark the image position in the displayed video frame according to the displayed identification result.
Optionally, the third determining module includes:
a fourth determining unit, configured to determine whether the video frames corresponding to the at least one corresponding timestamp acquired based on the current playing time contain object images of at least two objects;
a fifth determining unit, configured to determine whether an object image with a highest quality score of each object is located in the same video frame if the video frame corresponding to the obtained at least one corresponding timestamp contains object images of at least two objects;
the first acquisition unit is used for acquiring the frame identifier of that video frame if the object image with the highest quality score of each object is located in the same video frame.
Optionally, the third determining module further includes:
and the second acquisition unit is used for acquiring the frame identifier of the video frame where the object image with the highest quality score of each object is located if the object image with the highest quality score of each object is located in different video frames, so as to correspondingly display the video frame corresponding to the frame identifier according to the displayed identification result.
Optionally, the fourth determining module includes:
a second obtaining unit, configured to obtain lens information corresponding to a lens included in the target video, where the lens information includes a lens time range of the lens;
a sixth determining unit, configured to determine, for each object, a lens time range corresponding to the object in a video frame corresponding to at least one corresponding timestamp acquired based on the current playing time;
and the selecting unit is used for determining the video frame where the object image with the highest quality score is located in the video frames with the timestamps within the shot time range.
Optionally, the apparatus further comprises:
the sixth acquisition module is used for acquiring the head portrait image of the object in the video frame where the object image with the highest quality score is located;
and the first returning module is also used for returning the head portrait image so as to correspondingly display the head portrait image of the object according to the displayed identification result.
In another embodiment of the present invention, an electronic device is further provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the object identification method in the embodiment of the method when executing the program stored in the memory.
In the electronic device provided by the embodiment of the present invention, the processor executes the program stored in the memory to: receive an identification request for an object in a currently played target video; extract the video identifier and the current playing time of the target video from the identification request; and obtain an object library corresponding to the video identifier, wherein the object library comprises recognition results for objects in a plurality of video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp, so that the recognition results of the objects in the video frames corresponding to the matching timestamps can be obtained from the object library based on the current playing time and returned.
According to the embodiment of the invention, the recognition results in the object library are obtained by recognizing clustering results of the objects in the target video, so that different presentations of the same object correspond to the same clustering result. Consequently, even if the object appearing in a certain video frame is not easy to recognize, that frame can be associated with its clustering result and thereby with the corresponding recognition result. Thus, even when the object to be recognized appears as a side face or back view, or is out of focus or occluded, the object in the video frame can still be recognized, which improves the success rate of object recognition.
The communication bus 1140 of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean there is only one bus or only one type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The memory 1130 may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the foregoing processor.
The processor 1110 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), or the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In still another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements the steps of the object recognition method described in the foregoing method embodiment.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, described to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. An object recognition method, comprising:
receiving an identification request for an object in a currently played target video;
extracting a video identifier and a current playing time of the target video from the identification request;
acquiring an object library corresponding to the video identifier, wherein the object library comprises recognition results for objects in a plurality of video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp;
and, based on the current playing time, acquiring from the object library a recognition result for an object in a video frame corresponding to at least one corresponding timestamp, and returning the recognition result for display.
2. The object recognition method of claim 1, wherein, prior to the receiving of the identification request for the object in the currently played target video, the method further comprises:
extracting a plurality of video frames from the target video according to a preset sampling rate;
performing object detection on the plurality of video frames in the target video to obtain an object image in each video frame;
clustering the objects in the object images to obtain clustering results, wherein each clustering result corresponds to one or more object images;
recognizing the clustering results based on a preset object recognition library to obtain a recognition result corresponding to each clustering result;
determining the video frames corresponding to each clustering result according to the object images corresponding to the clustering result and the video frames to which those object images belong;
determining the recognition result corresponding to the object in each video frame according to the recognition result corresponding to each clustering result;
and constructing the object library based on the recognition results corresponding to the objects in the video frames.
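A minimal sketch of the offline pipeline in claim 2, assuming `detect`, `cluster`, and `recognize` callables in place of the detection model, clustering algorithm, and preset recognition library that the claim presupposes:

```python
def build_object_library(frames, detect, cluster, recognize):
    """Build a mapping timestamp -> recognized object names.
    frames: iterable of (timestamp, frame) pairs sampled from the video.
    detect(frame) -> list of object images found in the frame.
    cluster(images) -> list of clusters, each a list of indices into images.
    recognize(images) -> one label for a whole cluster."""
    detections = []  # (timestamp, object_image) for every detection in the video
    for timestamp, frame in frames:
        for obj_image in detect(frame):
            detections.append((timestamp, obj_image))

    clusters = cluster([img for _, img in detections])

    library = {}
    for member_indices in clusters:
        # Recognize the cluster once, then propagate the label to every
        # frame in which a member of the cluster was detected.
        label = recognize([detections[i][1] for i in member_indices])
        for i in member_indices:
            timestamp = detections[i][0]
            library.setdefault(timestamp, []).append(label)
    return library
```

Because the label is attached per cluster, a frame where the object shows only a side face or back view still inherits the label recognized from the cluster's clearer images.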
3. The object recognition method of claim 1, wherein the acquiring, in the object library based on the current playing time, of the recognition result for the object in the video frame corresponding to the at least one corresponding timestamp comprises:
determining a query time period based on the current playing time;
determining, in the object library, at least one target video frame whose timestamp falls within the query time period;
merging repeated recognition results of the same object across the at least one target video frame;
and determining the merged, de-duplicated recognition results as the recognition results for the objects in the video frames corresponding to the current playing time.
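A sketch of this query step, assuming the object library maps frame timestamps to lists of recognized names and that the query time period is a symmetric window around the playing time (the patent leaves the window's construction open):

```python
def query_window(library, play_time, half_window=1.0):
    """Collect recognition results from every frame whose timestamp lies in
    [play_time - half_window, play_time + half_window], merging repeated
    results for the same object across those frames."""
    merged, seen = [], set()
    for timestamp in sorted(library):
        if play_time - half_window <= timestamp <= play_time + half_window:
            for name in library[timestamp]:
                if name not in seen:  # keep only the first occurrence per object
                    seen.add(name)
                    merged.append(name)
    return merged
```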
4. The object recognition method of claim 1, further comprising:
acquiring object images in a plurality of video frames of the target video, wherein the object images are obtained by performing object detection on the objects in the video frames;
performing image quality evaluation on the objects in the plurality of video frames to obtain a quality score for the object image in each video frame;
for each object, among the video frames corresponding to the at least one timestamp acquired based on the current playing time, determining the video frame in which the object image with the highest quality score is located;
and returning display information corresponding to the video frame in which the object image with the highest quality score is located, so that the object image is displayed together with the recognition result.
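The best-frame selection in claim 4 can be sketched as follows, with quality scoring abstracted to precomputed numbers, since the claim does not prescribe a scoring model:

```python
def best_frame_per_object(frames, play_time, window=1.0):
    """frames: {timestamp: {object_name: quality_score}} for detected objects.
    For each object seen near play_time, return the timestamp of the video
    frame holding its highest-quality object image."""
    best = {}  # object_name -> (best score so far, timestamp of that frame)
    for timestamp, objects in frames.items():
        if abs(timestamp - play_time) > window:
            continue  # outside the frames acquired for the current playing time
        for name, score in objects.items():
            if name not in best or score > best[name][0]:
                best[name] = (score, timestamp)
    return {name: ts for name, (_, ts) in best.items()}
```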
5. The object recognition method of claim 4, further comprising:
acquiring a frame identifier of the video frame in which the object image with the highest quality score is located;
wherein the returning of the display information corresponding to the video frame in which the object image with the highest quality score is located, for display together with the recognition result, comprises:
returning the frame identifier together with the recognition result, so that the video frame corresponding to the frame identifier is displayed in accordance with the displayed recognition result.
6. The object recognition method of claim 4, further comprising:
locating the object images in the plurality of video frames of the target video, and determining the image position of each object image in its corresponding video frame;
for each object, acquiring the image position of that object's image in the video frame in which its highest-quality object image is located;
and returning the image position together with the recognition result, so that the image position is marked in the displayed video frame in accordance with the displayed recognition result.
7. The object recognition method of claim 5, wherein the acquiring of the frame identifier of the video frame in which the object image with the highest quality score is located comprises:
determining whether the video frames corresponding to the at least one timestamp acquired based on the current playing time contain object images of at least two objects;
if they contain object images of at least two objects, determining whether the highest-quality object image of each object is located in the same video frame;
and if the highest-quality object image of each object is located in the same video frame, acquiring the frame identifier of that video frame.
8. The object recognition method of claim 7, further comprising:
and if the highest-quality object images of the objects are located in different video frames, acquiring the frame identifier of the video frame in which each object's highest-quality object image is located, so that the video frames corresponding to the frame identifiers are displayed in accordance with the displayed recognition results.
9. The object recognition method of claim 4, wherein the determining, for each object among the video frames corresponding to the at least one timestamp acquired based on the current playing time, of the video frame in which the object image with the highest quality score is located comprises:
acquiring shot information for the shots contained in the target video, wherein the shot information comprises the shot time range of each shot;
determining the shot time range corresponding to each object in the video frames corresponding to the at least one timestamp acquired based on the current playing time;
and determining, among the video frames whose timestamps fall within that shot time range, the video frame in which the object image with the highest quality score is located.
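A sketch of the shot-constrained selection in claim 9, where shot boundaries are given as precomputed (start, end) ranges since shot segmentation itself is outside the claim:

```python
def best_frame_within_shot(frames, shots, play_time):
    """frames: {timestamp: quality_score} for one object's images.
    shots: list of (start, end) shot time ranges covering the video.
    Return the timestamp of the highest-quality frame inside the shot
    that contains the current playing time."""
    # Find the shot whose time range covers the playing time.
    start, end = next((s, e) for s, e in shots if s <= play_time <= e)
    # Restrict candidates to frames inside that shot, then pick the best.
    candidates = {ts: q for ts, q in frames.items() if start <= ts <= end}
    return max(candidates, key=candidates.get)
```

Restricting the candidates to one shot keeps the returned frame visually consistent with what the viewer is currently watching.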
10. The object recognition method of claim 4, further comprising:
acquiring a head portrait image of the object in the video frame in which the object image with the highest quality score is located;
wherein the returning of the display information corresponding to the video frame in which the object image with the highest quality score is located, for display together with the recognition result, comprises:
returning the head portrait image, so that the head portrait image of the object is displayed in accordance with the displayed recognition result.
11. An object recognition apparatus, comprising:
a receiving module, configured to receive an identification request for an object in a currently played target video;
an extraction module, configured to extract a video identifier and a current playing time of the target video from the identification request;
a first acquisition module, configured to acquire an object library corresponding to the video identifier, wherein the object library comprises recognition results for objects in a plurality of video frames of the target video, the recognition results are obtained by recognizing clustering results of the objects in the target video, and each video frame has a corresponding timestamp;
and a second acquisition module, configured to acquire, in the object library based on the current playing time, a recognition result for an object in a video frame corresponding to at least one corresponding timestamp, and to return the recognition result for display.
12. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor, configured to implement the object recognition method of any one of claims 1 to 10 when executing the program stored in the memory.
13. A computer-readable storage medium having stored thereon a program that, when executed by a processor, implements the steps of the object recognition method of any one of claims 1 to 10.
CN202110523550.1A 2021-05-13 2021-05-13 Object identification method and device, electronic equipment and storage medium Active CN113283480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110523550.1A CN113283480B (en) 2021-05-13 2021-05-13 Object identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110523550.1A CN113283480B (en) 2021-05-13 2021-05-13 Object identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113283480A true CN113283480A (en) 2021-08-20
CN113283480B CN113283480B (en) 2023-09-05

Family

ID=77278830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110523550.1A Active CN113283480B (en) 2021-05-13 2021-05-13 Object identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113283480B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090073294A (en) * 2007-12-31 2009-07-03 인하대학교 산학협력단 Method for social network analysis based on face recognition in an image or image sequences
US20120263433A1 (en) * 2011-04-12 2012-10-18 Microsoft Corporation Detecting Key Roles and Their Relationships from Video
CN103428537A (en) * 2013-07-30 2013-12-04 北京小米科技有限责任公司 Video processing method and video processing device
CN103530652A (en) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Face clustering based video categorization method and retrieval method as well as systems thereof
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching
CN108154171A (en) * 2017-12-20 2018-06-12 北京奇艺世纪科技有限公司 A kind of character recognition method, device and electronic equipment
CN110446104A (en) * 2019-08-30 2019-11-12 腾讯科技(深圳)有限公司 Method for processing video frequency, device and storage medium
CN111444822A (en) * 2020-03-24 2020-07-24 北京奇艺世纪科技有限公司 Object recognition method and apparatus, storage medium, and electronic apparatus
US20200304839A1 (en) * 2019-03-24 2020-09-24 International Business Machines Corporation Personalized key object identification in a live video stream
CN112132030A (en) * 2020-09-23 2020-12-25 湖南快乐阳光互动娱乐传媒有限公司 Video processing method and device, storage medium and electronic equipment


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449346A (en) * 2022-02-14 2022-05-06 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114449346B (en) * 2022-02-14 2023-08-15 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113283480B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US20230012732A1 (en) Video data processing method and apparatus, device, and medium
CN107431828B (en) Method and system for identifying related media content
CN103079092B (en) Obtain the method and apparatus of people information in video
CN110119700B (en) Avatar control method, avatar control device and electronic equipment
WO2016029561A1 (en) Display terminal-based data processing method
JP6986187B2 (en) Person identification methods, devices, electronic devices, storage media, and programs
US20170116465A1 (en) Video processing method and system, video player and cloud server
TWI586160B (en) Real time object scanning using a mobile phone and cloud-based visual search engine
JP7231638B2 (en) Image-based information acquisition method and apparatus
CN115396705B (en) Screen operation verification method, platform and system
CN113283480B (en) Object identification method and device, electronic equipment and storage medium
TWI712903B (en) Commodity information inquiry method and system
KR101189609B1 (en) System and method for providing video related service based on image
WO2014176746A1 (en) Method and system for automatically capturing an object using a mobile terminal
CN111444822B (en) Object recognition method and device, storage medium and electronic device
CN112468868A (en) Shooting background replacing method and device of automatic shooting equipment
CN111723278A (en) Menu recommendation method, device, recommendation system and related equipment
WO2023142400A1 (en) Data processing method and apparatus, and computer device, readable storage medium and computer program product
CN112685581B (en) Recommendation method and device of projection content, storage medium and electronic equipment
CN109784240B (en) Character recognition method, device and storage device
CN111666908B (en) Method, device, equipment and storage medium for generating interest portraits of video users
US20230019181A1 (en) Device and method for device localization
JP6496388B2 (en) Method and system for identifying associated media content
CN114143610A (en) Shooting guiding method and device, server and electronic equipment
CN114170121A (en) Training method and system for fusing image processing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant