CN111314759A - Video processing method and device, electronic equipment and storage medium

Info

Publication number: CN111314759A
Authority: CN (China)
Prior art keywords: video, multimedia content, target, target object, operation track
Legal status: Granted
Application number: CN202010137626.2A
Other languages: Chinese (zh)
Other versions: CN111314759B (en)
Inventors: 毕思远, 江宁, 刘莹
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202010137626.2A
Publication of CN111314759A; application granted; publication of CN111314759B
Current legal status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/42204: User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/4314: Generation of visual interfaces for content selection or interaction involving specific graphical features, for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
    • H04N21/47202: End-user interface for requesting content on demand, e.g. video on demand
    • H04N21/4825: End-user interface for program selection using a list of items to be played back in a given order, e.g. playlists
    • H04N21/485: End-user interface for client configuration
    • H04N21/8133: Monomedia components involving additional data specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a video processing method and device, an electronic device and a storage medium, which relate to computer vision technology and identify a target object in order to realize video processing. The method includes the following steps: playing a target video, wherein the target video contains at least one target object; acquiring an operation track input on a video playing interface; and if the operation track is determined to be consistent with a preset operation track, displaying, on the video playing interface, the multimedia content associated with the target object targeted by the operation track. With the video processing method and device, the electronic device and the storage medium, a user can conveniently and efficiently obtain content related to an object in a video while watching the video.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, people increasingly watch dramas and movies over the internet. When a user becomes interested in a character, item or the like appearing in a video, the user has to leave the video playing page, manually search for the content of interest through a browser or another related application, and return to the video playing page to continue watching after browsing the search results. The operation is cumbersome and interrupts the user's viewing; in particular, on a mobile terminal that cannot display multiple pages at the same time, such as a smartphone, the user experience is greatly reduced.
Disclosure of Invention
The embodiment of the application provides a video processing method and device, an electronic device and a storage medium, so that a user can conveniently and efficiently acquire related content of a certain object in a video when watching the video, and the operation convenience is improved.
In one aspect, an embodiment of the present application provides a video processing method, including:
playing a target video, wherein the target video comprises at least one target object;
acquiring an operation track input on a video playing interface;
and if the operation track is determined to be consistent with a preset operation track, displaying, on the video playing interface, the multimedia content associated with the target object targeted by the operation track.
Optionally, when it is determined that the playing time of the target video is within the triggerable time period, the method further includes:
and displaying the simulation input mode of the preset operation track in a set prompt area on the video playing interface.
Optionally, the method further comprises:
responding to the track setting operation, and displaying a track setting interface;
and determining the operation track input on the track setting interface as the preset operation track, or determining an operation track selected from a plurality of operation tracks displayed on the track setting interface as the preset operation track.
Optionally, the video processing method according to the embodiment of the present application further includes:
determining a target multimedia content type corresponding to the operation track according to the multimedia content type corresponding to the preset operation track matched with the operation track, wherein the corresponding preset operation track is configured for each multimedia content type in advance;
the acquiring of the multimedia content associated with the target object included in the corresponding target area specifically includes:
and acquiring multimedia content which is associated with the target object contained in the corresponding target area and belongs to the type of the target multimedia content.
Optionally, the video processing method according to the embodiment of the present application further includes:
determining a display mode corresponding to the operation track according to the display mode corresponding to the preset operation track matched with the operation track, wherein a corresponding preset operation track is configured in advance for each display mode;
the displaying, on the video playing interface, the multimedia content associated with the target object for which the operation trajectory is specific includes:
and displaying the multimedia content associated with the target object aimed at by the preset operation track on the video playing interface according to the display mode corresponding to the operation track.
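For illustration only, the following Kotlin sketch shows one way the per-track configuration described in the two optional aspects above could be represented, with each preset operation track shape mapped in advance to a multimedia content type and a display mode; the shape names, enum values and mappings are assumptions made for the example and are not specified by the application.

```kotlin
// Hypothetical sketch: each preset operation track shape is configured in advance
// with a multimedia content type and a display mode. All names are illustrative
// assumptions, not part of the application.
enum class TrackShape { CIRCLE, RECTANGLE, TRIANGLE, HEART }
enum class ContentType { VIDEO, PICTURE, AUDIO, TEXT, ANIMATION_EFFECT }
enum class DisplayMode { FLOATING_WINDOW, SPLIT_SCREEN }

data class TrackConfig(val contentType: ContentType, val displayMode: DisplayMode)

// The matched preset track shape selects both the target content type and the display mode.
val presetTrackConfigs = mapOf(
    TrackShape.CIRCLE to TrackConfig(ContentType.VIDEO, DisplayMode.SPLIT_SCREEN),
    TrackShape.RECTANGLE to TrackConfig(ContentType.PICTURE, DisplayMode.FLOATING_WINDOW),
    TrackShape.HEART to TrackConfig(ContentType.TEXT, DisplayMode.FLOATING_WINDOW)
)

fun configForInputTrack(matchedShape: TrackShape): TrackConfig? = presetTrackConfigs[matchedShape]
```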
In one aspect, an embodiment of the present application provides a video processing method, including:
receiving an identification request sent by terminal equipment, wherein the identification request comprises an image to be identified, and the image to be identified is an image of a target area, in a played video picture, corresponding to an operation track input on a video playing interface of the terminal equipment;
determining a target object contained in the image to be recognized;
and acquiring the multimedia content associated with the target object and sending the multimedia content to the terminal equipment.
In one aspect, an embodiment of the present application provides a video processing apparatus, including:
the playing control module is used for playing a target video, and the target video comprises at least one target object;
and the operation response module is used for acquiring an operation track input on a video playing interface and, if the operation track is determined to be consistent with a preset operation track, displaying, on the video playing interface, the multimedia content associated with the target object targeted by the operation track.
Optionally, the operation response module is specifically configured to:
when an event of inputting an operation track on the video playing interface is monitored, acquiring at least one video picture played on the video playing interface in the process of inputting the operation track;
determining a corresponding target area of the operation track in each video picture;
acquiring multimedia content associated with a target object contained in a corresponding target area;
and displaying the acquired multimedia content on the video playing interface.
Optionally, the operation response module is specifically configured to:
acquiring and displaying a multimedia content list associated with a target object contained in the target area;
and responding to the operation of selecting the multimedia content in the multimedia content list, and acquiring the selected multimedia content.
Optionally, when the selected multimedia content includes an extension video, the operation response module is specifically configured to:
play the extension video in a set display area of the video playing interface while the target video is played in the video playing interface, wherein the extension video contains the target object, and the extension video and the target video have the same shooting scene but different shooting angles.
Optionally, the operation response module is further configured to reduce a playing speed of the target video when an event that an operation trajectory is input on the video playing interface is monitored.
Optionally, the video processing apparatus further includes a detection module, configured to determine that the playing time of the target video is in a triggerable time period before the operation response module executes, and enable the operation response module.
Optionally, the operation response module is further configured to respond to a multimedia content change operation input on the video playing interface, and acquire multimedia content associated with other target objects included in the target video.
Optionally, the video processing apparatus further includes a prompt module, configured to display a simulation input mode of the preset operation track in a set prompt area on the video playing interface when it is determined that the playing time of the target video is within a triggerable time period.
Optionally, the video processing apparatus further includes a setting module, configured to:
responding to the track setting operation, and displaying a track setting interface;
and determining the operation track input on the track setting interface as the preset operation track, or determining an operation track selected from a plurality of operation tracks displayed on the track setting interface as the preset operation track.
Optionally, the operation response module is further configured to determine a target multimedia content type corresponding to the operation track according to the multimedia content type corresponding to the preset operation track matched with the operation track, where a corresponding preset operation track is configured in advance for each multimedia content type;
the operation response module is specifically configured to acquire multimedia content that is associated with a target object included in the corresponding target area and belongs to the type of the target multimedia content.
Optionally, the operation response module is further configured to determine, according to the display mode corresponding to the preset operation trajectory matched with the operation trajectory, a display mode corresponding to the operation trajectory, where a corresponding preset operation trajectory is configured in advance for each display mode;
the operation response module is specifically configured to display, on the video playing interface, the multimedia content associated with the target object targeted by the preset operation track according to the display mode corresponding to the operation track.
In one aspect, an embodiment of the present application provides a video processing apparatus, including:
the terminal equipment comprises a receiving module, a processing module and a display module, wherein the receiving module is used for receiving an identification request sent by the terminal equipment, the identification request comprises an image to be identified, and the image to be identified is a target area corresponding to an operation track input on a video playing interface of the terminal equipment in a played video picture;
the object identification module is used for determining a target object contained in the image to be identified;
the content acquisition module is used for acquiring the multimedia content associated with the target object;
and the sending module is used for sending the acquired multimedia content to the terminal equipment.
Optionally, the content obtaining module is specifically configured to obtain a multimedia content list associated with the target object;
the sending module is specifically configured to send the acquired multimedia content list to the terminal device;
the receiving module is specifically configured to receive a multimedia content acquisition request sent by the terminal device, where the multimedia content acquisition request includes a content identifier of a multimedia content selected from the multimedia content list;
the content obtaining module is specifically configured to obtain multimedia content corresponding to the content identifier;
the sending module is specifically configured to send the multimedia content corresponding to the obtained content identifier to the terminal device.
Optionally, the identification request further includes a multimedia content type corresponding to the operation track;
the content obtaining module is specifically configured to obtain the multimedia content associated with the target object and belonging to the multimedia content type in the identification request.
Optionally, when the multimedia content includes an extension video, the identification request further includes a video identifier and the playing time of the target video played by the terminal equipment;
the content obtaining module is specifically configured to:
acquiring, from the extension videos associated with the target video corresponding to the video identifier, an extension video that contains the target object and whose associated time period contains the playing time, wherein the target video is associated with at least one extension video, each extension video contains at least one target object, each extension video has the same shooting scene as the target video but a different shooting angle, and the associated time period of each extension video is the time period, determined according to the shooting time of the extension video, that the extension video corresponds to on the playing time axis of the target video;
and determining the acquired extension video as the extension video associated with the target object.
In one aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the methods when executing the computer program.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement the steps of any of the above-described methods.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, implement the steps of any of the methods described above.
With the video processing method and device, the electronic device and the storage medium provided by the embodiments of the application, when a user is interested in a target object, such as a person or an item, in the target video being watched, an operation track can be input in the area where the target object is presented on the video playing interface. After detecting that the operation track input by the user matches a preset operation track, the terminal device obtains the target object in the area corresponding to the operation track and then displays the multimedia content associated with the target object on the video playing interface. Thus, while watching a video, the user can circle any target object in the video at any time by inputting the preset operation track and have the multimedia content associated with that object displayed on the video playing interface. The user can therefore conveniently and efficiently obtain content related to any target object in the video while watching it, the related content is retrieved quickly without interrupting video playback, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic interface diagram illustrating that a terminal device displays multimedia content associated with a target object in response to a preset operation track according to an embodiment of the present application;
fig. 4 is a schematic flowchart illustrating a process of displaying multimedia content associated with a target object by a terminal device in response to a preset operation track according to an embodiment of the present application;
fig. 5A is a schematic interface diagram illustrating a video playing interface displaying multimedia content in an overlapping manner according to an embodiment of the present application;
fig. 5B is a schematic interface diagram illustrating multimedia content displayed in a separate area of a video playing interface according to an embodiment of the present application;
fig. 6 is a schematic diagram of the correspondence in playing time between a main video and a plurality of extension videos according to an embodiment of the present application;
fig. 7 is a schematic diagram of an interface for displaying a plurality of multimedia contents through a multimedia content list according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating a simulation input method for displaying a preset operation track in a set prompt area according to an embodiment of the present application;
fig. 9 is a schematic diagram illustrating an operation of acquiring multimedia content of another target object through a multimedia content change operation according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a trajectory setting interface according to an embodiment of the present application;
fig. 11 is a flowchart illustrating a video processing method according to an embodiment of the present application;
fig. 12 is a flowchart illustrating a video processing method according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below:
computer Vision technology (CV) Computer Vision is a science for researching how to make a machine "see", and further refers to that a camera and a Computer are used to replace human eyes to perform machine Vision such as identification, tracking and measurement on a target, and further image processing is performed, so that the Computer processing becomes an image more suitable for human eyes to observe or transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, synchronous positioning, map construction, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Operation track: a continuous track that the user inputs on the screen of the terminal device by manipulating an input device. For example, the input device may be a touch-enabled device such as a touch screen or a tablet; the user slides a finger or a stylus on the touch-enabled device to input a particular operation track, and the terminal device recognizes the shape of the input operation track and executes the function corresponding to it. The input device may also be a mouse, in which case the user inputs the operation track by moving the mouse. The operation track may have various shapes, such as a circle, a rectangle, a triangle or a heart, which is not limited in the embodiments of the present application.
Target object: refers to an item or person appearing in the video.
Multimedia content: the method refers to the synthesis of various media, and generally comprises various media forms such as texts, audios, images, videos, animation special effects and the like.
Main video: i.e., the main camera-position video, posted on video websites, which a user can click to watch.
Extension video: a video shot in the same scene as the main video but from a different camera position and angle. One main video may be associated with a plurality of extension videos.
A Client, also called the client side, refers to a program that corresponds to a server and provides local services to users. Except for some applications that run only locally, clients are generally installed on ordinary user machines and need to run in cooperation with a server. With the development of the internet, the more common clients include web browsers used on the world wide web, email clients for sending and receiving emails, and client software for instant messaging. For applications of this kind, a corresponding server and service program are required in the network to provide services such as database services and e-mail services, so a specific communication connection needs to be established between the client and the server to ensure the normal operation of the application program.
Any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
In practice, when a user becomes interested in a character, item or the like appearing in a video, the user has to leave the video playing page, manually search for the content of interest through a browser or another related application, and return to the video playing page to continue watching after browsing the search results. The operation is cumbersome and interrupts the user's viewing; on a mobile terminal that cannot display multiple pages at the same time, such as a smartphone, the user experience is reduced even further. Although some video clients recommend related video content according to the user's viewing history or preferences, for example displaying, while the user watches movie A, other movies of the same type or other movies and episodes featuring the actors of movie A in a recommendation list, such recommendations clearly cannot satisfy the user's needs, and when a character or item appearing in the video is not covered by the recommendation list, the user still has to search manually.
Therefore, the application provides a video processing method, which specifically includes the following steps: playing a target video, wherein the target video contains at least one target object; acquiring an operation track input on a video playing interface; and if the operation track is determined to be consistent with a preset operation track, displaying, on the video playing interface, the multimedia content associated with the target object targeted by the operation track. When the user is interested in a target object, such as a person or an item, in the target video being watched, an operation track can be input in the area where the target object is presented on the video playing interface. After detecting the operation track input by the user, the terminal device compares it with the preset operation track; if the two are determined to be consistent, the target object in the area corresponding to the operation track is obtained, and the multimedia content related to the target object is then displayed on the video playing interface. Thus, while watching a video, the user can circle any target object in the video at any time by inputting the preset operation track and have the multimedia content associated with that object displayed on the video playing interface. The user can therefore conveniently and efficiently obtain content related to any target object in the video while watching it, the related content is retrieved quickly without interrupting video playback, and the user experience is improved.
After introducing the design concept of the embodiment of the present application, some simple descriptions are provided below for application scenarios to which the technical solution of the embodiment of the present application can be applied, and it should be noted that the application scenarios described below are only used for describing the embodiment of the present application and are not limited. In specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application. The application scenario includes terminal device 101 (including terminal device 101-1, terminal device 101-2, … … terminal device 101-n), backend server 102, and data storage server 103. The terminal device 101, the background server 102 and the data storage server 103 are connected through a wireless or wired network. The terminal device 101 may be installed with various clients, and may be a device capable of displaying an object provided in the installed client, where the terminal device 101 includes, but is not limited to, an electronic device such as a desktop computer, a smart phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a smart television, and a vehicle-mounted device. The background server 102 and the data storage server 103 may be a server, a server cluster formed by a plurality of servers, or a cloud computing center. Of course, the background server 102 and the data storage server 103 shown in fig. 1 may be arranged in the same server or server cluster.
The background server 102 is used for providing video-related services such as video playing, video live broadcast and the like, and the data storage server 103 is used for storing videos and video-related multimedia contents, wherein the types of the multimedia contents include, but are not limited to, videos, pictures, audios, texts, animation special effects and the like. The client installed in the terminal device 101 may be a browser client, a video application client, or the like, and the background server 102 provides a video playing service. The user may access the backend server 102 through a client installed in the terminal device 101 to use the multimedia service provided by the multiple backend servers 102. For example, the terminal device 101 may access the background server 102 through a video application client, and may also access a web portal of the background server 102 through a browser client. In the process of using the video related service provided by the background server 102, the user can also issue comments, barrage and other contents in the process of watching the video for interaction.
Of course, the method provided in the embodiment of the present application is not limited to be used in the application scenario shown in fig. 1, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 1 will be described in the following method embodiments, and will not be described in detail herein.
To further illustrate the technical solutions provided by the embodiments of the present application, the following detailed description is given with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application present the method operation steps as shown in the following embodiments or figures, more or fewer operation steps may be included in the method on the basis of conventional or non-inventive labor. In steps where no necessary causal relationship exists logically, the order of execution of the steps is not limited to that provided by the embodiments of the present application.
The following describes the technical solution provided in the embodiment of the present application with reference to the application scenario shown in fig. 1.
Referring to fig. 2, an embodiment of the present application provides a video processing method, which can be applied to the terminal device 101 shown in fig. 1, and specifically includes the following steps:
s201, playing a target video, wherein the target video comprises at least one target object.
In specific implementation, a user can open a client on the terminal device, select a target video to be watched, the terminal device sends a request for acquiring the target video to the background server, and the background server sends the target video to the terminal device, so that the terminal device plays the target video. Or, the user can select a target video to be watched from videos stored in the terminal device, and play the target video through a client installed on the terminal device.
The target object may be an item or a person appearing in the video, such as an actor MARY, a car, etc. in the video.
S202, obtaining an operation track input on the video playing interface.
S203, judging whether the acquired operation track is consistent with a preset operation track; if the operation track is determined to be consistent with the preset operation track, step S204 is executed, otherwise, the next event of inputting the operation track is waited.
In specific implementation, the terminal device may identify the shape of the operation trajectory input by the user, compare the shape of the operation trajectory input by the user with the shape of the preset operation trajectory, and if the shape of the operation trajectory input by the user matches the shape of the preset operation trajectory, determine that the operation trajectory input by the user is the preset operation trajectory, and the terminal device executes step S204; if the shape of the operation trajectory input by the user does not match the shape of the preset operation trajectory, step S204 is not executed, and the next event of inputting the operation trajectory is waited. The terminal device may identify the shape of the operation track input by the user through any existing track identification method, and details are not repeated.
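As noted above, any existing track identification method may be used. Purely as an illustration, the following Kotlin sketch shows one simple heuristic for checking whether an input track matches a circle-shaped preset track; the point representation and the roundness threshold are assumptions for the example, not part of the application.

```kotlin
import kotlin.math.hypot

data class Point(val x: Float, val y: Float)

// Minimal sketch of one possible circle-shape check (illustrative only): the track
// matches a "circle" preset if its points stay roughly equidistant from their
// centroid and the stroke ends near where it began.
fun matchesPresetCircle(track: List<Point>, tolerance: Float = 0.25f): Boolean {
    if (track.size < 8) return false
    val cx = track.map { it.x }.average().toFloat()
    val cy = track.map { it.y }.average().toFloat()
    val radii = track.map { hypot((it.x - cx).toDouble(), (it.y - cy).toDouble()) }
    val meanR = radii.average()
    if (meanR == 0.0) return false
    // Largest relative deviation of any point from the mean radius.
    val maxDeviation = radii.maxOf { kotlin.math.abs(it - meanR) } / meanR
    // The stroke should roughly close on itself.
    val closed = hypot(
        (track.first().x - track.last().x).toDouble(),
        (track.first().y - track.last().y).toDouble()
    ) < meanR * 0.5
    return maxDeviation < tolerance && closed
}
```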
The shape of the preset operation trajectory in the embodiment of the present application is not limited to a circle, a rectangle, a triangle, a heart, and the like. The shape of the preset operation track can be preset and configured by the client, or the shape of the preset operation track can be set by the user through the client by user definition.
S204, displaying, on the video playing interface, the multimedia content associated with the target object targeted by the operation track.
In specific implementation, according to the track coordinates corresponding to the input operation track in the video playing interface and the video picture played in the video playing interface when the operation track is input, the corresponding area of the input operation track in the video picture is determined, image recognition processing is performed on the image in the area, and the target object contained in the image is recognized, so that the target object targeted by the input operation track is determined. The image recognition method adopted in the embodiment of the present application is not limited, and for example, the target object may be recognized from a video picture by using an image recognition model obtained based on a training deep learning network.
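Since the application does not prescribe a particular image recognition method, the following Kotlin sketch only illustrates, under assumed names, how a recognizer obtained by training a deep learning network might be wrapped behind an interface that later steps can call.

```kotlin
// Illustrative interface only: the recognizer, its input format and its result type
// are assumptions for the example, not a prescribed implementation.
data class RecognizedObject(val objectId: String, val label: String, val confidence: Float)

interface TargetObjectRecognizer {
    // Returns the most likely target object found in the cropped region image,
    // or null if nothing is recognized with sufficient confidence.
    fun recognize(regionImage: ByteArray): RecognizedObject?
}
```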
In practical application, a user can adjust the size of the input operation track according to the size of the target object in the video playing interface so as to ensure that only one target object exists in the area corresponding to the operation track, and the target object can be accurately identified based on the image in the area. For example, if the target object is a human, the region defined by the operation trajectory input by the user should cover the human face region of the target object, and if the target object is a bag, the region defined by the operation trajectory input by the user should cover the region where the bag is located. In the embodiment of the application, the target object is defined through the operation track input by the user, so that the user can independently select the target object needing to display the multimedia content, particularly, when a plurality of target objects appear in a video picture at the same time, the user can only define one target object interested by the user through the preset operation track, and then only the multimedia content related to the target object is obtained, and the independence and the convenience of the user in selecting the target object are improved.
In specific implementation, the multimedia content associated with each target object can be configured in advance, the background server can automatically capture the related information of the target object from the network, and after the captured information is cleaned, classified and the like, the captured information is associated with the target object and stored in the data storage server, so that the multimedia content associated with the target object can be provided to the terminal equipment when needed.
Referring to fig. 3, taking a smart phone as an example, when a user wants to view related information of a target object 301 in a target video being played, an operation track 302 may be input in an area presented by the target object on a video playing interface 30 through a touch screen of the smart phone, after monitoring the operation track 302 input by the user, the smart phone determines whether the input operation track 302 is consistent with a preset operation track, if so, determines the target object 301 included in an area corresponding to the operation track 302, and then displays a multimedia content 303 associated with the target object 301 on the video playing interface 30. Therefore, in the process of watching the video, a user can circle any target object in the video by inputting a preset operation track at any time and display the multimedia content associated with the target object on the video playing interface, so that the user can conveniently and efficiently acquire the related content of any target object in the video when watching the video, the interruption of video playing is avoided while the related content is quickly retrieved, and the user experience is improved.
Referring to fig. 4, in a possible implementation, step S204 specifically includes the following steps:
s401, when an event of inputting an operation track on a video playing interface is monitored, at least one video picture played on the video playing interface in the process of inputting the operation track is obtained.
In specific implementation, when monitoring an event of inputting an operation track on a video playing interface, the terminal device may perform at least one screen capture operation on the video playing interface in the process of inputting the operation track, so as to obtain at least one video picture played on the video playing interface in the process of inputting the operation track, and after determining the input operation track and a preset operation track, determine a target area corresponding to the operation track based on the captured video picture. Specifically, the terminal device may perform a screen capture operation when monitoring a start point of the input operation trajectory, may also perform a screen capture operation when monitoring an end point of the input operation trajectory, and may also perform a screen capture operation at any time in the process of inputting the operation trajectory.
In specific implementation, the terminal device may record playing time displayed by a playing time axis on the video playing interface when the operation track is input, and then, after the input operation track is determined to be the preset operation track, obtain at least one video frame corresponding to the playing time from the target video as the obtained video frame. For example, when the operation track is input, the playing time axis on the video playing interface shows 1 minute 20 seconds, and at least one video frame is acquired from a plurality of video frames in the target video at the time of 1 minute 20 seconds as the acquired video frame. When the time taken by the input operation track is long, the recorded playing time may be a time period, such as 1 minute 20 seconds to 1 minute 21 seconds, and at least one video frame is acquired from a plurality of video frames in the time period of 1 minute 20 seconds to 1 minute 21 seconds in the target video as the acquired video picture. In order to improve the accuracy of identifying the target object, a video frame with higher definition may be selected as the video picture for determining the target area in step S402.
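The following Kotlin sketch illustrates, under an assumed frame-extraction helper, how the frames played while the track was being input could be collected and the clearest one kept, as described above; the data fields and helper signature are assumptions for the example.

```kotlin
// Sketch under assumed types: collect the frames played during the track input
// (e.g. the 1 min 20 s to 1 min 21 s window in the example above) and keep the
// sharpest one for target-area recognition.
data class VideoFrame(val timestampMs: Long, val pixels: ByteArray, val sharpness: Double)

fun pickFrameForRecognition(
    extractFrames: (fromMs: Long, toMs: Long) -> List<VideoFrame>, // assumed decoder helper
    trackStartMs: Long,
    trackEndMs: Long
): VideoFrame? =
    extractFrames(trackStartMs, trackEndMs)
        .maxByOrNull { it.sharpness } // prefer the clearest frame, as suggested above
```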
S402, determining a corresponding target area of the input operation track in each video picture.
During specific implementation, the terminal device may determine a corresponding target area of the operation track in the video picture according to the track coordinate of the input operation track on the display screen, the relative position of the video playing interface in the display screen, and the relative position of the played video picture in the video playing interface. When full-screen playing is adopted, namely the video picture is full of the whole display screen, at the moment, the track coordinate of the operation track on the display screen is the track coordinate of the operation track in the video picture. Referring to fig. 3, the inner region defined by the operation trace 302 is the target region.
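As an illustration of the coordinate mapping described above, the following Kotlin sketch converts the bounding box of the input track from screen coordinates into video-frame coordinates; it reuses the Point type from the earlier track-matching sketch, and the field names are assumptions for the example.

```kotlin
// Sketch: map the track's bounding box from screen coordinates into video-frame
// coordinates using the position and size of the rendered video on the screen.
data class Rect(val left: Float, val top: Float, val width: Float, val height: Float)

fun trackBoundsToFrameRect(
    trackPoints: List<Point>,      // screen coordinates of the input track
    videoViewOnScreen: Rect,       // where the video picture is rendered on the screen
    frameWidth: Int, frameHeight: Int
): Rect {
    val minX = trackPoints.minOf { it.x }
    val maxX = trackPoints.maxOf { it.x }
    val minY = trackPoints.minOf { it.y }
    val maxY = trackPoints.maxOf { it.y }
    // Scale screen-relative offsets into pixel coordinates of the video frame.
    val sx = frameWidth / videoViewOnScreen.width
    val sy = frameHeight / videoViewOnScreen.height
    return Rect(
        left = (minX - videoViewOnScreen.left) * sx,
        top = (minY - videoViewOnScreen.top) * sy,
        width = (maxX - minX) * sx,
        height = (maxY - minY) * sy
    )
}
```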
S403, multimedia contents related to the target object contained in the corresponding target area are obtained.
In specific implementation, one video picture with higher definition can be selected from the obtained multiple video pictures, image recognition processing is performed on the basis of a target area corresponding to the video picture, and a target object contained in the target area is determined, so that the recognition accuracy of the target object is improved.
In a specific implementation, the image recognition processing may be performed on target areas corresponding to the plurality of video pictures, and the target object included in the target area may be determined based on the plurality of image recognition results, so as to improve the recognition accuracy of the target object. For example, if a total of 6 video frames are subjected to image recognition processing, wherein 5 image recognition results are the person a and 1 image recognition result is the person B, the target object included in the target area is determined to be the person a.
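The following one-line Kotlin sketch illustrates the majority vote described in the example above (six frames, five results for person A and one for person B); the string-label representation of recognition results is an assumption for the example.

```kotlin
// Sketch: combine recognition results from several captured frames by majority vote.
fun majorityVote(recognitions: List<String>): String? =
    recognitions.groupingBy { it }.eachCount().maxByOrNull { it.value }?.key
```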
In a possible implementation manner, the terminal device sends the image to be recognized corresponding to the at least one target area determined in step S402 to the background server. The background server carries out image recognition processing on each image to be recognized to obtain a target object contained in a target area, acquires multimedia content associated with the target object from the data storage server, and sends the multimedia content associated with the target object to the terminal equipment. And the terminal equipment displays the acquired multimedia content on the video playing interface.
In another possible implementation, step S403 specifically includes: acquiring and displaying a multimedia content list associated with a target object contained in a target area; and responding to the operation of selecting the multimedia content in the multimedia content list, and acquiring the selected multimedia content.
Specifically, the terminal device sends the image to be recognized corresponding to the at least one target area determined in step S402 to the background server. The background server carries out image recognition processing on each image to be recognized to obtain a target object contained in a target area, a multimedia content list associated with the target object is obtained from the data storage server, the multimedia content list contains a plurality of multimedia contents associated with the target object, and the background server sends the multimedia content list to the terminal equipment. The terminal equipment acquires and displays a multimedia content list sent by the background server, responds to the operation of selecting the multimedia content in the multimedia content list and sends a multimedia content acquisition request to the background server, wherein the multimedia content acquisition request comprises the content identification of the multimedia content selected from the multimedia content list. And the background server acquires the multimedia content corresponding to the content identifier in the multimedia content acquisition request and sends the multimedia content to the terminal equipment. And the terminal equipment receives and acquires the multimedia content returned by the background server and displays the multimedia content on a video playing interface.
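For illustration, the following Kotlin sketch outlines possible shapes for the messages exchanged in the flow described above (identification request, multimedia content list, and the follow-up content request); the field names and types are assumptions for the example and do not describe an actual API.

```kotlin
// Illustrative message shapes for the terminal-to-server exchange; all fields are assumed.
data class IdentificationRequest(
    val imageToIdentify: ByteArray,      // cropped target-area image
    val videoId: String? = null,         // optional, used when extension videos are requested
    val playTimeMs: Long? = null,        // optional playing time of the target video
    val contentType: String? = null      // optional content type implied by the track shape
)

data class MultimediaItem(val contentId: String, val type: String, val title: String)

data class MultimediaContentListResponse(val targetObjectId: String, val items: List<MultimediaItem>)

// Sent after the user selects an item from the displayed multimedia content list.
data class MultimediaContentRequest(val contentId: String)
```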
When the target object is associated with a plurality of multimedia contents, the plurality of multimedia contents can be presented to the user through the multimedia content list, so that the user can select the multimedia contents which the user needs to view from the multimedia content list. Of course, when the number of the multimedia contents associated with the target object is limited, for example, only 2 to 5 multimedia contents are associated, the associated multimedia contents may also be sequentially displayed according to a preset sequence.
In a specific implementation, there may be a waiting period during the execution of steps S401 to S402; in this case, a prompt message indicating that the target object is being identified may be displayed on the video playing interface to inform the user that the input track operation has been received.
And S404, displaying the acquired multimedia content on a video playing interface.
In specific implementation, when the target video is played in the video playing interface, the acquired multimedia content is displayed in the set display area of the video playing interface, that is, the target video and the multimedia content associated with the acquired target object are simultaneously displayed in the video playing interface, so that a user can acquire the related information of the target object while watching the target video.
In the manner shown in fig. 4, when a user becomes interested in a target object while watching a video, the user only needs to input, in the area where the target object appears on the video playing interface, an operation track consistent with the preset operation track; the terminal device then determines the target area delimited by the operation track in the video picture, identifies the target object based on the image corresponding to that area, and displays the multimedia content associated with the target object on the video playing interface. The operation is simple, the target object of interest is identified quickly, video playback is not interrupted, and the user experience is improved.
Fig. 5A shows one possible way of displaying multimedia content. In fig. 5A, the set display area 503 may be a pop-up window or a floating layer added to the video playing interface 501, where the set display area 503 covers only part of the video playing interface 501. If the target object selected by the user is the bag 502, the multimedia content associated with the bag 502 can be acquired and displayed in the set display area 503; the displayed multimedia content may include the brand, model, price, and front and back pictures of the bag 502, so that the user can quickly learn the relevant information about the bag 502.
Further, the user can adjust the size and position of the setting display area in the video playing interface to adjust the setting display area to the orientation according with the watching habit of the user. For example, the setting display area may be a left or right position in the video playback interface. Of course, the user can also move the position of the setting display area in the video playing interface at any time, so that the user can adjust the setting display area at any time when the setting display area blocks the area concerned by the user.
Fig. 5B is another possible way of displaying multimedia content. In fig. 5B, the video playing interface 501 is divided into a first area 504 and a second area 505, the target video is played in the first area 504, and the acquired multimedia content is displayed in the second area 505, that is, the second area 505 is a set display area. The embodiment of the present application does not limit the specific dividing manner. The ratio and the position of the first area 504 and the second area 505 in the video playing interface 501, and the arrangement manner of the first area 504 and the second area 505 may be configured in advance or set by a user, which is not limited in the embodiment of the present application. The display mode of fig. 5B ensures that the target video and the multimedia content are not occluded from each other, improving the viewing experience.
When there is a large amount of multimedia content, the multimedia content can be displayed by category for easy viewing. The classification of the multimedia content can be determined according to actual application requirements. For example, multimedia content can be classified into the following types: text, picture, video, audio. Alternatively, multimedia content can be divided into the following types: camera-position videos, detail tracking videos, related information about the target object, special effects, and the like.
The related information of the target object may include: a brief introduction, a photograph, a work (e.g., a movie, a television show, audio, a book, an advertisement, etc.) of the target object, etc.
An extension video of the target video is a video shot in the same scene as the target video but from a different camera angle. Specifically, when a program is shot, it can be filmed by multiple cameras at different camera positions: for example, camera I may shoot a panorama containing everyone, camera II may shoot close-ups of person A, and camera III may shoot the scene from yet another position. After the main video is edited, a plurality of extension videos can be edited from the footage shot by the other cameras, so that close-ups of different people in the same scene at the same moment can be shown simultaneously, or the same item or person can be shown from different angles. Each main video is associated with at least one extension video, and each extension video contains at least one target object. The editing staff further associate the playing time of each extension video with the playing time axis of the main video; specifically, the time period that each extension video corresponds to on the playing time axis of the main video can be determined according to the shooting time of the extension video and the shooting time of the main video, and the determined time period is the associated time period of that extension video. The main video, the extension videos and their associated time periods are then packaged and stored in the data storage server. The main video is the target video that is published on video websites and can be selected by the user to watch.
Referring to fig. 6, the correspondence in playing time between the main video and a plurality of extension videos is shown. As can be seen from the playing time axis of the main video shown in fig. 6, the duration of the main video is 2 minutes 50 seconds in total; the duration of extension video 1 is 50 seconds and its corresponding time period on the playing time axis is 00:30-01:20; the duration of extension video 2 is 40 seconds and its corresponding time period is 01:00-01:40; the durations of extension video 3 and extension video 4 are both 30 seconds, and their corresponding time period is 02:10-02:40.
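The following Kotlin sketch represents the association data illustrated in fig. 6; the durations, associated time periods and contained target objects are transcribed from the figure and the example below, while the data layout itself is an assumption for the example.

```kotlin
// Sketch of the association data illustrated in fig. 6 (layout assumed for illustration).
data class ExtensionVideo(
    val id: Int,
    val durationSec: Int,
    val assocStartSec: Int,   // start of the associated period on the main video's time axis
    val assocEndSec: Int,     // end of the associated period
    val targetObjects: Set<String>
)

val mainVideoDurationSec = 2 * 60 + 50   // 2 min 50 s

val extensionVideos = listOf(
    ExtensionVideo(1, 50, 30, 80, setOf("A")),        // 00:30-01:20, contains target object A
    ExtensionVideo(2, 40, 60, 100, setOf("B")),       // 01:00-01:40, contains target object B
    ExtensionVideo(3, 30, 130, 160, setOf("A", "C")), // 02:10-02:40, contains A and C
    ExtensionVideo(4, 30, 130, 160, setOf("B", "C"))  // 02:10-02:40, contains B and C
)
```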
In specific implementation, the extension bit video associated with the target object can be acquired from the multiple extension bit videos associated with the target video in the following manner: from the extension bit videos associated with the target video, acquire the extension bit videos that contain the target object and whose corresponding associated time period contains the playing time of the target video, and determine the acquired extension bit videos as the extension bit videos associated with the target object.
The target video is associated with at least one extension bit video, each extension bit video includes at least one target object, and each extension bit video has the same shooting scene as the target video but a different shooting visual angle. The associated time period of each extension bit video is the time period corresponding to the extension bit video on the playing time axis of the target video, determined according to the shooting time of the extension bit video. The playing time of the target video refers to the time point on the playing time axis of the target video at which the terminal device monitors the event of inputting the operation track.
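As an illustrative sketch of this selection rule (not taken from the embodiment itself; the class name ExtensionVideo and its fields are assumptions, with times handled in seconds):

```python
from dataclasses import dataclass

@dataclass
class ExtensionVideo:
    # Hypothetical record for one extension bit video; field names are illustrative.
    video_id: str
    target_objects: set   # identifiers of the target objects appearing in this extension bit video
    assoc_start: float    # start of the associated time period on the target video's time axis, in seconds
    assoc_end: float      # end of the associated time period, in seconds

def select_extension_videos(extension_videos, target_object, playing_time):
    """Return the extension bit videos that contain the target object and whose
    associated time period contains the current playing time of the target video."""
    return [
        ev for ev in extension_videos
        if target_object in ev.target_objects
        and ev.assoc_start <= playing_time <= ev.assoc_end
    ]

# Values mirroring fig. 6: extension bit video 1 contains object A with period 00:30-01:20, etc.
videos = [
    ExtensionVideo("ext1", {"A"}, 30, 80),
    ExtensionVideo("ext2", {"B"}, 60, 100),
    ExtensionVideo("ext3", {"A", "C"}, 130, 160),
    ExtensionVideo("ext4", {"B", "C"}, 130, 160),
]
print([v.video_id for v in select_extension_videos(videos, "A", 40)])   # ['ext1']
```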
The obtaining of the extension bit video associated with the target object can be executed by the background server, and the obtained extension bit video associated with the target object is sent to the terminal device.
After the terminal device acquires the extension bit video sent by the background server, it plays the extension bit video in the set display area of the video playing interface while the target video is played in the video playing interface. Specifically, the target video and the extension bit video can be played synchronously according to the playing time of the target video and the associated time period of the acquired extension bit video, that is, the playing progress of the extension bit video is kept consistent with the playing progress of the target video. Taking fig. 6 as an example, time point 00:00 on the playing time axis of extension bit video 1 corresponds to 00:30 on the playing time axis of the target video (i.e., the main bit video in fig. 6), and time point 00:50 on the playing time axis of extension bit video 1 corresponds to 01:20 on the playing time axis of the target video. Assuming the terminal device acquires extension bit video 1 when the target video is played to 00:40, playback starts from 00:10 of extension bit video 1; when the target video is played to 00:50, extension bit video 1 is synchronously played to 00:20; and when the target video is played to 01:20, extension bit video 1 finishes playing. During the playing of extension bit video 1, the user can close extension bit video 1 at any time.
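To make the synchronization rule concrete, the following sketch (an assumption, not the patent's own code, with times in seconds) maps the target video's playing time to a playback position inside an extension bit video:

```python
def extension_offset(target_playing_time, assoc_start, assoc_end):
    """Map the target video's playing time (seconds) to the playback position inside
    an extension bit video whose associated time period is [assoc_start, assoc_end]."""
    if not (assoc_start <= target_playing_time <= assoc_end):
        raise ValueError("target video is outside the extension bit video's associated time period")
    return target_playing_time - assoc_start

# Fig. 6 example: extension bit video 1 is associated with 00:30-01:20 (30 s to 80 s).
# When the target video is at 00:40 (40 s), playback starts from 00:10 of extension bit video 1.
print(extension_offset(40, 30, 80))   # 10
```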
Taking fig. 6 as an example, extension bit video 1 includes only target object A, extension bit video 2 includes only target object B, extension bit video 3 includes target objects A and C, and extension bit video 4 includes target objects B and C. When the target video (i.e., the main bit video in fig. 6) is played to 00:40 and the user inputs the preset operation track for target object A, only the associated time period (00:30-01:20) of extension bit video 1 includes the playing time (00:40) of the target video, and extension bit video 1 includes target object A; therefore extension bit video 1 is the extension bit video associated with target object A, and the terminal device acquires extension bit video 1 and plays it in the video playing interface. When the target video is played to 01:05 and the user inputs the preset operation track for target object B, the associated time period (00:30-01:20) of extension bit video 1 and the associated time period (01:00-01:40) of extension bit video 2 both include the playing time (01:05) of the target video, but extension bit video 1 does not include target object B; therefore only extension bit video 2 is the extension bit video associated with target object B, and the terminal device acquires extension bit video 2. At this time extension bit video 1 has not finished playing, so the terminal device can stop playing extension bit video 1 and then play extension bit video 2. When the target video is played to 02:15 and the user inputs the preset operation track for target object C, the associated time period (02:10-02:40) of extension bit video 3 and extension bit video 4 includes the playing time (02:15) of the target video, and both extension bit video 3 and extension bit video 4 include target object C; therefore extension bit video 3 and extension bit video 4 are extension bit videos associated with target object C, and the terminal device displays extension bit video 3 and extension bit video 4 in a multimedia content list and plays extension bit video 3 or extension bit video 4 according to the user's selection.
In the process of watching the target video, the user can circle a target object of interest by inputting the preset operation track on the video playing interface; the extension video containing the target object is then acquired based on the playing time of the target video, and the target video and the extension video are played synchronously. The extension video may be a close-up shot of the target object, or a video of the target object shot from an angle different from that of the target video, so that the user can watch the target object more clearly and comprehensively, which provides a better video watching experience and increases the interest of the video watching process.
In the field of real-time video transmission, the quality of the transmitted video needs to be reduced due to the limited network transmission rate, which may result in some details, such as facial expressions of characters or the details and textures of articles, not being well presented to users. For this reason, the embodiment of the application also provides detail tracking videos for the user to select and watch. A detail tracking video is a video, captured from an extension bit video, that contains local details of the target object; specifically, the pictures corresponding to the local details of the target object can be cropped from a high-definition extension bit video to form the detail tracking video of the target object. The detail tracking video also needs to be associated with the main machine position video, and the specific association manner can refer to the manner of associating the extension bit video with the main machine position video, which is not repeated here.
For example, a partial picture of the face of target object A can be cropped from a high-definition extension video to form a face tracking video. When the user circles target object A, the terminal device can display the face tracking video of target object A. In this way, the user can see both the panoramic picture and the facial expression of target object A, which improves the user experience. Moreover, because the face tracking video only crops local content from the high-definition video, its data volume is small, so transmission efficiency can be ensured while a high-definition picture is still provided for the user.
A special effect prop refers to an application plug-in capable of superimposing a special display effect on a video, for example adding a fireworks effect to the video, or adding glasses, a wig, and the like to a person in the video. When the multimedia content selected by the user is a special effect prop, the terminal device can display the special display effect corresponding to the special effect prop in the area, on the video playing interface, where the target object targeted by the preset operation track is located. Specifically, the area of the target object on the video playing interface can be determined according to the track coordinates of the preset operation track on the video playing interface, and the special display effect corresponding to the special effect prop is superimposed and displayed at the determined position. Furthermore, the area of the target object on the video playing interface can be located accurately by combining image recognition technology and image segmentation technology.
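A minimal sketch of how the overlay region might be derived from the recorded track coordinates (the function and its inputs are assumptions; a real implementation could refine the region with image recognition and segmentation as noted above):

```python
def track_bounding_box(track_points):
    """Given the track coordinates recorded on the video playing interface
    (a list of (x, y) points), return the enclosing rectangle in which the
    special display effect could be superimposed."""
    xs = [p[0] for p in track_points]
    ys = [p[1] for p in track_points]
    return min(xs), min(ys), max(xs), max(ys)

# e.g. a roughly circular gesture drawn around a person's face
print(track_bounding_box([(120, 80), (180, 70), (200, 140), (130, 150)]))
# (120, 70, 200, 150)
```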
Since the use of a special effect prop on a target object is triggered by the operation track input for that target object, the user can decide, according to the user's own needs, which target object in the video the special effect prop is applied to, which increases the interest of the video watching process.
On the basis of providing the above multiple types of multimedia content, if one target video is associated with multiple types of multimedia content at the same time, the multiple types of multimedia content can be displayed through a multimedia content list. In particular, each type in the multimedia content list may include a plurality of multimedia contents; for example, a plurality of special effects and a plurality of extension videos may be provided for the user to select, and the multimedia content selected by the user is then displayed on the video playing interface.
Referring to fig. 7, three types of multimedia content, namely "pan", "face tracking", and "more works", are shown in the left column 702 of the multimedia content list 701. When the user clicks "pan", at least one extension video related to the target object "MARY" is displayed in the right column 703 of the multimedia content list 701: when there is only one extension video, it is played directly in the right column 703; when there are multiple extension videos, a cover and brief description information corresponding to each extension video can be displayed in the right column 703, and the user can select one extension video to play. When the user clicks "face tracking", at least one face tracking video associated with the target object "MARY" is displayed in the right column 703 of the multimedia content list 701; the specific display manner can refer to the display manner of the extension video and is not repeated. When the user clicks "more works", personal information associated with the target object "MARY", such as the personal profile shown in fig. 7, and works such as movies and television shows can be presented.
On the basis of any of the above embodiments, the video processing method according to the embodiment of the present application further includes the following step: when an event of inputting an operation track on the video playing interface is monitored, reducing the playing speed of the target video.
In specific implementation, the events of inputting the operation track include a track input start event and a track input end event; the track input start event is generated when the start point of the operation track is input, and the track input end event is generated when the end point of the operation track is input. Taking the touch display screen of a smartphone as an example, the track input start event is generated when the user's finger touches the touch display screen and starts to slide, and the track input end event is generated when the user's finger leaves the touch display screen. When the track input start event is monitored, the terminal device reduces the playing speed of the target video, for example to 0.5 times the normal speed, so that the user can accurately mark the area corresponding to the target object by inputting the operation track, which improves the recognition accuracy of the target object. When the track input end event is monitored, the terminal device restores the playing speed of the target video to the normal speed.
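A minimal sketch of this behavior, assuming a player object that exposes a writable playback_rate attribute (both names are hypothetical, not an API from the embodiment):

```python
class TrackInputSpeedController:
    """Slows the player down while an operation track is being drawn."""

    def __init__(self, player, slow_rate=0.5):
        self.player = player          # assumed to expose a writable `playback_rate`
        self.slow_rate = slow_rate    # e.g. 0.5 times the normal speed

    def on_track_input_start(self):
        # generated when the user's finger touches the screen and starts to slide
        self.player.playback_rate = self.slow_rate

    def on_track_input_end(self):
        # generated when the user's finger leaves the screen
        self.player.playback_rate = 1.0
```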
By reducing the playing speed of the target video while the user inputs the operation track, the user can input the operation track that circles the target object without hurry, which reduces the operation difficulty, prevents the target object from failing to be accurately located because the playing speed is too fast, and helps improve the recognition accuracy of the target object.
In specific implementation, the terminal device may also stop playing the target video when monitoring a track input start event, and continue playing the target video when monitoring a track input end event. Due to the fact that the time for inputting the operation track is short, the watching experience of the user cannot be influenced by the short pause.
On the basis of any of the above embodiments, before step S203, the video processing method according to the embodiment of the present application further includes the following step: judging whether the playing time of the target video is within a triggerable time period; if so, step S203 is executed, otherwise step S203 is not executed. The triggerable time period refers to a time period during which the terminal device can respond to an operation track, input on the video playing interface, that is consistent with the preset operation track. Only within a triggerable time period is it judged whether the input operation track is consistent with the preset operation track, and step S204 is executed in the case of consistency, which can prevent misoperation by the user.
In specific implementation, a triggerable time period can be determined by setting a start tag and an end tag of the triggerable time period on the playing time axis of the target video. Each target video can be configured with at least one triggerable time period, the triggerable time periods of one target video do not overlap, and the start tag of a triggerable time period is located before its end tag on the playing time axis. For example, suppose the start tag of one triggerable time period of the target video is set at 00:30 of the playing time axis and the end tag is set at 00:50. When the target video is playing a video frame between 00:00 and 00:30, the terminal device is not triggered to execute step S203 even if the operation track input by the user is consistent with the preset operation track; when the target video is playing a video frame between 00:30 and 00:50, i.e., within the triggerable time period, the user can trigger the terminal device to execute step S203 by inputting the preset operation track.
In specific implementation, the triggerable time periods of the target video can be configured in advance by the video production side, and can be determined according to the associated time periods of the extension bit videos associated with the target video. Taking fig. 6 as an example, the associated time periods on the playing time axis of the target video include 00:30-01:20, 01:00-01:40, and 02:10-02:40, so the triggerable time periods may be 00:30-01:40 and 02:10-02:40.
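One possible way to derive the triggerable time periods from the associated time periods is to merge overlapping intervals; the sketch below (an assumption, with times in seconds) reproduces the fig. 6 example:

```python
def triggerable_periods(associated_periods):
    """Merge the associated time periods of a target video's extension bit videos
    into non-overlapping triggerable time periods."""
    merged = []
    for start, end in sorted(associated_periods):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # overlaps the previous period: extend it
        else:
            merged.append([start, end])
    return [tuple(p) for p in merged]

# Fig. 6: 00:30-01:20, 01:00-01:40 and 02:10-02:40, expressed in seconds
print(triggerable_periods([(30, 80), (60, 100), (130, 160)]))
# [(30, 100), (130, 160)]  i.e. 00:30-01:40 and 02:10-02:40
```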
Furthermore, the terminal device can set a prompt area on the video playing interface and display a simulation input mode of the preset operation track in it, so as to prompt the user how to input the preset operation track.
In specific implementation, the user can manually close the prompt information displayed in the set prompt area. Alternatively, the terminal device may display the simulation input mode of the preset operation track in the set prompt area on the video playing interface only while the first N seconds of video frames of the target video are played, and then automatically hide the set prompt area. In this way, the prompt serves its purpose while the prompt information is prevented from interfering with the user's viewing of the video.
In specific implementation, when it is determined that the playing time of the target video is within a triggerable time period, the simulation input mode of the preset operation track can be displayed in the set prompt area on the video playing interface; when the playing time of the target video is not within a triggerable time period, the set prompt area is automatically hidden. In this way, while prompting the simulation input mode of the preset operation track, the user is also prompted when a triggerable time period arrives, that is, when the terminal device can be triggered to display the multimedia content associated with the target object by inputting the preset operation track.
Referring to fig. 8, a simulation input mode of a preset operation track is displayed in a set prompt region 802 on the video playing interface 801, and a user can know that the preset operation track is a circle by setting prompt information displayed in the prompt region 802. A close button 803 may also be displayed in the setting prompt area 802, and the user may hide the setting prompt area 802 by clicking the close button 803.
On the basis of any of the above embodiments, the terminal device may respond to a multimedia content change operation input on the video playing interface to obtain multimedia content associated with other target objects included in the target video.
In specific implementation, the multimedia content changing operation includes, but is not limited to, a slide-up operation, a slide-left operation, a long-press operation, and the like input in the set display area. Referring to fig. 9, in the set display area on the video playing interface 901, the multimedia content associated with other target objects in the target video can be viewed through a slide-up operation. Specifically, the terminal device responds to the multimedia content changing operation input on the video playing interface and sends a multimedia content changing request to the background server; the background server randomly obtains multimedia content associated with other target objects in the target video and sends it to the terminal device; and the terminal device receives and displays the multimedia content associated with the other target objects. In this way, the user can quickly view the multimedia content related to other target objects through the multimedia content changing operation.
On the basis of any one of the above embodiments, the user can set the shape of the preset operation track in a user-defined manner. Specifically, the terminal device responds to the track setting operation, displays a track setting interface, and determines the operation track input on the track setting interface as a preset operation track. Or the terminal device responds to the track setting operation, displays a track setting interface, and determines an operation track selected from a plurality of operation tracks displayed on the track setting interface as a preset operation track. The track setting operation may be a click of a track setting button in a setting menu bar in the video playing interface, or a specified operation such as an input of a specific track or a slide-up operation in the video playing interface. Therefore, the user can set the corresponding preset operation track according to the operation habit of the user, and the operation convenience is improved.
Fig. 10 is a schematic diagram of a track setting interface. The user may select one operation track from the plurality of operation tracks displayed on the track setting interface 1001 as the preset operation track. The user may also draw an operation track in the track input region 1002 on the right and click the confirm button to take the drawn operation track as the preset operation track, or click the clear button to clear the operation track in the track input region 1002 and redraw it.
On the basis of any of the above embodiments, a corresponding preset operation track may also be configured in advance for each multimedia content type, for example, a circular track corresponding to an extension video, a triangular track corresponding to a face tracking video, a heart-shaped track corresponding to related information of a target object, and the like. During specific implementation, the user can also configure the corresponding relationship between each multimedia content type and each preset operation track.
For this reason, before performing step S203, the method of the embodiment of the present application further includes the following steps: determining the preset operation track matched with the input operation track, and determining the target multimedia content type corresponding to the operation track according to the multimedia content type corresponding to that preset operation track. Accordingly, when the multimedia content associated with the target object contained in the corresponding target area is acquired, only the multimedia content that is associated with the target object contained in the corresponding target area and that belongs to the target multimedia content type is acquired.
For example, the multimedia content associated with the target object includes an extension video, a face tracking video and related information of the target object, and the preset operation track input by the user for the target object is circular, and the type configured for the circular track is the extension video, then the terminal device only acquires the extension video associated with the target object.
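A small sketch of how such a configuration might be applied (the mapping and the content representation are assumptions, not part of the embodiment):

```python
# Hypothetical mapping from preset operation track shapes to multimedia content types.
TRACK_TYPE_MAP = {
    "circle": "extension_video",
    "triangle": "face_tracking_video",
    "heart": "target_object_info",
}

def filter_by_track_shape(matched_track_shape, contents):
    """Keep only the multimedia contents whose type corresponds to the preset operation
    track matched by the input track; `contents` is a list of (content_type, content) pairs."""
    target_type = TRACK_TYPE_MAP.get(matched_track_shape)
    return [content for content_type, content in contents if content_type == target_type]

contents = [("extension_video", "ext1"),
            ("face_tracking_video", "face1"),
            ("target_object_info", "profile")]
print(filter_by_track_shape("circle", contents))   # ['ext1']
```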
By configuring a corresponding preset operation track for each multimedia content type in advance, the user only needs to input the preset operation track corresponding to a certain multimedia content type to acquire the multimedia content that is associated with the circled target object and belongs to that multimedia content type. In this way, the user can quickly acquire different multimedia content through different preset operation tracks, and a single preset operation track both circles the target object and selects the type of multimedia content to be acquired, which improves operation convenience.
In addition to any of the above embodiments, a plurality of display modes for displaying multimedia content may be configured in advance, such as a normal mode for normally displaying multimedia content, and a cool mode for displaying multimedia content and adding a special display effect at the same time. The display mode can be further divided into an overlapping display mode shown in fig. 5A and a partitioned display mode shown in fig. 5B.
Based on this, a corresponding preset operation track can be configured for each display mode in advance, for example, a circular track corresponds to the overlapping display mode and a triangular track corresponds to the partitioned display mode. In specific implementation, the user can also configure the correspondence between each display mode and each preset operation track.
For this reason, before executing step S203, the video processing method according to the embodiment of the present application further includes the following step: the terminal device determines the display mode corresponding to the operation track according to the display mode corresponding to the preset operation track matched with the input operation track. Correspondingly, step S203 specifically includes: the terminal device displays, on the video playing interface, the multimedia content associated with the target object targeted by the operation track, according to the display mode corresponding to the operation track. In this way, when the user circles the target object through a preset operation track, the corresponding display mode is selected at the same time, which improves operation convenience and increases the diversity of display modes.
Referring to fig. 11, an embodiment of the present application provides another video processing method, which can be applied to the background server 102 shown in fig. 1, and specifically includes the following steps:
S1101, the background server receives an identification request sent by the terminal device, where the identification request includes an image to be identified, and the image to be identified is the image of the target area, in the played video picture, corresponding to the operation track input on the video playing interface of the terminal device.
After determining that the operation track input by the user is the preset operation track, the terminal device determines an image to be recognized corresponding to at least one target area through step S402, and sends an identification request to the background server, where the identification request includes the determined image to be recognized.
S1102, the background server determines a target object contained in the image to be recognized.
In specific implementation, the background server performs image recognition processing on each image to be recognized in the identification request to obtain the target object contained in the corresponding target area. The image recognition method adopted in the embodiment of the application is not limited; for example, the target object can be recognized from the image to be recognized by using an image recognition model obtained by training a deep learning network.
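A minimal sketch of this recognition step on the background server; the stub model below stands in for whatever trained model is actually used, which the embodiment deliberately leaves open:

```python
class _StubModel:
    # stand-in for a trained image recognition model used by the background server
    def predict(self, image):
        return "MARY", 0.93   # (recognized target object, confidence)

def identify_target_object(image_to_recognize, recognition_model, confidence_threshold=0.5):
    """Run the cropped target-area image through an image recognition model and return
    the recognized target object, or None if nothing is recognized confidently enough."""
    label, confidence = recognition_model.predict(image_to_recognize)
    return label if confidence >= confidence_threshold else None

print(identify_target_object(b"<cropped target-area image>", _StubModel()))   # MARY
```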
S1103, the background server obtains the multimedia content associated with the target object and sends the multimedia content to the terminal device.
In the embodiment of the application, the user can circle any target object in a video by inputting an operation track at any time while watching the video. The terminal device then sends the image of the area corresponding to the operation track to the background server; the background server recognizes the image to obtain the target object contained in it and sends the multimedia content associated with the target object to the terminal device; and the terminal device displays the multimedia content associated with the target object on the video playing interface. In this way, the user can conveniently and efficiently obtain content related to any target object in the video while watching it, the related content is retrieved quickly without interrupting video playing, and the user experience is improved.
In a possible implementation manner, the background server acquires the multimedia content associated with the target object from the data storage server, and sends the multimedia content associated with the target object to the terminal device.
In another possible implementation, referring to fig. 12, step S1103 specifically includes the following steps:
S1201, the background server obtains a multimedia content list associated with the target object and sends the multimedia content list to the terminal device.
In specific implementation, the background server acquires a multimedia content list associated with the target object from the data storage server, the multimedia content list includes a plurality of multimedia contents associated with the target object, and the background server sends the multimedia content list to the terminal device. The terminal equipment acquires and displays a multimedia content list sent by the background server, responds to the operation of selecting the multimedia content in the multimedia content list and sends a multimedia content acquisition request to the background server, wherein the multimedia content acquisition request comprises the content identification of the multimedia content selected from the multimedia content list.
S1202, the background server receives a multimedia content obtaining request sent by the terminal equipment, wherein the multimedia content obtaining request comprises a content identifier of the multimedia content selected from the multimedia content list.
S1203, the background server acquires the multimedia content corresponding to the content identification and sends the multimedia content to the terminal equipment.
The terminal device receives the multimedia content returned by the background server and displays it on the video playing interface.
When the target object is associated with a plurality of multimedia contents, the plurality of multimedia contents can be presented to the user through the multimedia content list, so that the user can select the multimedia contents which need to be viewed from the multimedia content list, and the multimedia contents selected by the user are displayed on the video playing interface.
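A small sketch of this list-then-fetch exchange, under the assumption of simple dictionary-shaped messages (all field names are illustrative; the embodiment does not prescribe a message format):

```python
def build_content_list_response(target_object, storage):
    # S1201 (server side): return the multimedia content list associated with the target object
    return [{"content_id": cid, "title": title} for cid, title in storage.get(target_object, [])]

def build_content_fetch_request(selected_entry):
    # terminal side: the user selects one entry from the displayed multimedia content list
    return {"content_id": selected_entry["content_id"]}

# hypothetical stored associations for the target object "MARY"
storage = {"MARY": [("c1", "extension bit video 1"), ("c2", "face tracking video")]}
listing = build_content_list_response("MARY", storage)
print(build_content_fetch_request(listing[0]))   # {'content_id': 'c1'}
```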
Further, a corresponding preset operation track can be configured in advance for each multimedia content type, for example, a circular track corresponding to the extension video, a triangular track corresponding to the face tracking video, and a heart-shaped track corresponding to the related information of the target object. In this case, the terminal device needs to determine the shape of the input operation track, compare the input operation track with the multiple preset operation tracks, determine the preset operation track matched with the operation track, determine the multimedia content type corresponding to the matched preset operation track as the target multimedia content type corresponding to the operation track, and add the target multimedia content type to the identification request sent to the background server.
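A sketch of what such an identification request might carry (the field names are illustrative assumptions; the patent does not prescribe a wire format):

```python
def build_identification_request(image_to_recognize, matched_track_shape,
                                 track_type_map, video_id=None, playing_time=None):
    """Assemble the identification request the terminal device sends to the background server."""
    request = {
        "image_to_recognize": image_to_recognize,
        # multimedia content type configured in advance for the matched preset operation track
        "target_content_type": track_type_map.get(matched_track_shape),
    }
    # when extension bit videos may be requested, the video identifier and the playing
    # time of the target video are also carried (see the paragraphs below)
    if video_id is not None:
        request["video_id"] = video_id
        request["playing_time"] = playing_time
    return request

req = build_identification_request(b"<jpeg bytes>", "circle",
                                   {"circle": "extension_video"}, "v123", 65.0)
print(req["target_content_type"])   # extension_video
```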
Based on this, step S1103 specifically includes: the background server acquires the multimedia content that is associated with the target object and belongs to the target multimedia content type carried in the identification request, and sends it to the terminal device.
For example, the multimedia content associated with the target object includes an extension video, a face tracking video and related information of the target object, and if the operation track input by the user for the target object is circular, and the type configured for the circular track is the extension video, only the extension video associated with the target object is acquired and sent to the terminal device.
In specific implementation, the background server may first obtain the multimedia content associated with the target object, then select the multimedia content belonging to the multimedia content type in the identification request from the multimedia content associated with the target object, and send the selected multimedia content to the terminal device.
By configuring a corresponding preset operation track for each multimedia content type in advance, the user can quickly acquire different multimedia content through different preset operation tracks, and a single preset operation track both circles the target object and selects the type of multimedia content to be acquired, which improves operation convenience.
On the basis of any of the above embodiments, when the multimedia content includes the extension video, the identification request sent by the terminal device further includes a video identifier and a playing time of a target video played by the terminal device.
Based on this, the extension bit video associated with the target object can be acquired as follows: from the extension bit videos associated with the target video corresponding to the video identifier, acquire the extension bit videos that contain the target object and whose corresponding associated time period contains the playing time, and determine the acquired extension bit videos as the extension bit videos associated with the target object. The target video is associated with at least one extension bit video, each extension bit video includes at least one target object, and each extension bit video has the same shooting scene as the target video but a different shooting visual angle; the associated time period of each extension bit video is the time period corresponding to the extension bit video on the playing time axis of the target video, determined according to the shooting time of the extension bit video.
In specific implementation, the background server may query the extension bit videos associated with the target video from the data storage server according to the video identifier in the identification request, then obtain the extension bit videos including the target object from the extension bit videos associated with the target video, then find the extension bit videos including the playing time in the associated time period from the obtained extension bit videos including the target object, and determine the found extension bit videos as the extension bit videos associated with the target object.
In specific implementation, the background server can query the extension bit videos associated with the target video from the data storage server according to the video identifiers in the identification request, then obtain the extension bit videos containing the playing time in the associated time period from the extension bit videos associated with the target video, then determine the extension bit videos containing the target object from the extension bit videos containing the playing time in the associated time period, and take the determined extension bit videos as the extension bit videos associated with the target object.
By synchronously playing the target video and the extension video corresponding to the target object in the target video, a user can watch a certain target object in the target video more clearly and comprehensively while watching the target video, better video watching experience is provided for the user, and the interestingness in the video watching process is improved.
When the multimedia content comprises the detail tracking video, the identification request sent by the terminal device further comprises the video identification and the playing time of the target video played by the terminal device. The background server finds out the detail tracking video associated with the target object according to the video identifier and the playing time in the identification request, and the specific implementation manner can refer to a method for obtaining the extension bit video associated with the target object, which is not repeated.
As shown in fig. 13, based on the same inventive concept as the video processing method, the embodiment of the present application further provides a video processing apparatus 130, which specifically includes a play control module 1301 and an operation response module 1302.
A playing control module 1301, configured to play a target video, where the target video includes at least one target object.
The operation response module 1302 is configured to obtain an operation track input on the video playing interface, and if it is determined that the operation track is consistent with the preset operation track, display multimedia content associated with a target object for which the operation track is specific on the video playing interface.
Optionally, the operation response module 1302 is specifically configured to:
when an event of inputting an operation track on a video playing interface is monitored, at least one video picture played on the video playing interface in the process of inputting the operation track is obtained;
determining a corresponding target area of the operation track in each video picture;
acquiring multimedia content associated with a target object contained in a corresponding target area;
and displaying the acquired multimedia content on a video playing interface.
Optionally, the operation response module 1302 is specifically configured to:
acquiring and displaying a multimedia content list associated with a target object contained in a target area;
and responding to the operation of selecting the multimedia content in the multimedia content list, and acquiring the selected multimedia content.
Optionally, when the selected multimedia content includes an extension bit video, the operation response module 1302 is specifically configured to:
when the target video is played in the video playing interface, the extension video is played in a set display area of the video playing interface, wherein the extension video comprises a target object, and the shooting scenes of the extension video and the target video are the same but the shooting visual angles are different.
Optionally, the operation response module 1302 is further configured to reduce the playing speed of the target video when an event that an operation track is input on the video playing interface is monitored.
Optionally, the video processing apparatus 130 further includes a detection module for determining that the playing time of the target video is in the triggerable time period and enabling the operation response module before the operation response module executes.
Optionally, the operation response module 1302 is further configured to respond to a multimedia content change operation input on the video playing interface, and acquire multimedia content associated with other target objects included in the target video.
Optionally, the video processing apparatus 130 further includes a prompt module, configured to display a simulation input mode of a preset operation track in a set prompt area on the video playing interface when it is determined that the playing time of the target video is in the triggerable time period.
Optionally, the video processing apparatus further comprises a setting module, configured to:
responding to the track setting operation, and displaying a track setting interface;
and determining the operation track input on the track setting interface as a preset operation track, or determining an operation track selected from a plurality of operation tracks displayed on the track setting interface as the preset operation track.
Optionally, the operation response module 1302 is further configured to determine a target multimedia content type corresponding to the operation track according to the multimedia content type corresponding to the preset operation track matched with the operation track, where a corresponding preset operation track is configured in advance for each multimedia content type.
Accordingly, the operation response module 1302 is specifically configured to obtain the multimedia content that is associated with the target object included in the corresponding target area and belongs to the type of the target multimedia content.
Optionally, the operation response module 1302 is further configured to determine the display mode corresponding to the operation track according to the display mode corresponding to the preset operation track matched with the operation track, where a corresponding preset operation track is configured in advance for each display mode.
Correspondingly, the operation response module 1302 is specifically configured to display, on the video playing interface, the multimedia content associated with the target object targeted by the operation track, according to the display mode corresponding to the operation track.
As shown in fig. 14, based on the same inventive concept as the video processing method, an embodiment of the present application further provides a video processing apparatus 140, which specifically includes: a receiving module 1401, an object identifying module 1402, a content obtaining module 1403, and a transmitting module 1404.
A receiving module 1401, configured to receive an identification request sent by a terminal device, where the identification request includes an image to be identified, and the image to be identified is a target area corresponding to an operation track input on a video playing interface of the terminal device in a played video picture.
An object recognition module 1402 is configured to determine a target object included in the image to be recognized.
A content obtaining module 1403, configured to obtain the multimedia content associated with the target object.
A sending module 1404, configured to send the obtained multimedia content to the terminal device.
Optionally, the content obtaining module 1403 is specifically configured to obtain a multimedia content list associated with the target object.
The sending module 1404 is specifically configured to send the obtained multimedia content list to the terminal device.
The receiving module 1401 is specifically configured to receive a multimedia content obtaining request sent by a terminal device, where the multimedia content obtaining request includes a content identifier of a multimedia content selected from a multimedia content list.
The content obtaining module 1403 is specifically configured to obtain the multimedia content corresponding to the content identifier.
The sending module 1404 is specifically configured to send the multimedia content corresponding to the obtained content identifier to the terminal device.
Optionally, the identification request further includes a multimedia content type corresponding to the operation track.
Accordingly, the content obtaining module 1403 is specifically configured to obtain the multimedia content associated with the target object and belonging to the multimedia content type in the identification request.
Optionally, when the multimedia content includes the extension video, the identification request further includes a video identifier and a playing time of a target video played by the terminal device.
Correspondingly, the content obtaining module 1403 is specifically configured to obtain, from the extension bit videos associated with the target video corresponding to the video identifier, the extension bit videos that include the target object and whose corresponding associated time period includes the playing time, and determine the obtained extension bit videos as the extension bit videos associated with the target object. The target video is associated with at least one extension bit video, each extension bit video includes at least one target object, and each extension bit video has the same shooting scene as the target video but a different shooting visual angle; the associated time period of each extension bit video is the time period corresponding to the extension bit video on the playing time axis of the target video, determined according to the shooting time of the extension bit video.
The video processing apparatus and the video processing method provided by the embodiment of the application adopt the same inventive concept, can obtain the same beneficial effects, and are not described herein again.
Based on the same inventive concept as the video processing method, an embodiment of the present application further provides an electronic device, which may specifically be a terminal device or a background server as shown in fig. 1. As shown in fig. 15, the electronic device 150 may include a processor 1501 and a memory 1502.
The Processor 1501 may be a general-purpose Processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 1502, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The Memory may include at least one type of storage medium, and may include, for example, a flash Memory, a hard disk, a multimedia card, a card-type Memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), a charged Erasable Programmable Read Only Memory (EEPROM), a magnetic Memory, a magnetic disk, an optical disk, and so on. The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to such. The memory 1502 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.
An embodiment of the present application provides a computer-readable storage medium for storing computer program instructions for the electronic device, which includes a program for executing the video processing method.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above embodiments are only intended to describe the technical solutions of the present application in detail and to help in understanding the methods of the embodiments of the present application, and should not be construed as limiting the embodiments of the present application. Modifications and substitutions that may be readily apparent to those skilled in the art are intended to be included within the scope of the embodiments of the present application.

Claims (15)

1. A video processing method, comprising:
playing a target video, wherein the target video comprises at least one target object;
acquiring an operation track input on a video playing interface;
and if the operation track is determined to be consistent with the preset operation track, displaying the multimedia content associated with the target object aimed at by the operation track on the video playing interface.
2. The method according to claim 1, wherein the displaying, on the video playing interface, the multimedia content associated with the target object for which the operation trajectory is directed specifically includes:
when an event of inputting an operation track on the video playing interface is monitored, acquiring at least one video picture played on the video playing interface in the process of inputting the operation track;
determining a corresponding target area of the operation track in each video picture;
acquiring multimedia content associated with a target object contained in a corresponding target area;
and displaying the acquired multimedia content on the video playing interface.
3. The method according to claim 2, wherein the obtaining of the multimedia content associated with the target object included in the target area specifically includes:
acquiring and displaying a multimedia content list associated with a target object contained in the target area;
and responding to the operation of selecting the multimedia content in the multimedia content list, and acquiring the selected multimedia content.
4. The method according to claim 3, wherein when the selected multimedia content includes an extension video, the displaying the acquired multimedia content on the video playing interface specifically includes:
when the target video is played in the video playing interface, the extension video is played in a set display area of the video playing interface, wherein the extension video comprises the target object, and the extension video and the target video have the same shooting scene but different shooting visual angles.
5. The method according to any one of claims 1 to 4, further comprising:
and when an event of inputting an operation track on the video playing interface is monitored, reducing the playing speed of the target video.
6. The method according to any one of claims 1 to 4, wherein before the obtaining the operation track input on the video playing interface, the method further comprises:
determining that the playing time of the target video is in a triggerable time period.
7. The method according to any one of claims 1 to 4, further comprising:
responding to the multimedia content change operation input on the video playing interface, and acquiring multimedia content related to other target objects contained in the target video.
8. A video processing method, comprising:
receiving an identification request sent by terminal equipment, wherein the identification request comprises an image to be identified, and the image to be identified is a target area corresponding to an operation track input on a video playing interface of the terminal equipment in a played video picture;
determining a target object contained in the image to be recognized;
and acquiring the multimedia content associated with the target object and sending the multimedia content to the terminal equipment.
9. The method according to claim 8, wherein the obtaining of the multimedia content associated with the target object and sending to the terminal device specifically includes:
acquiring a multimedia content list associated with the target object and sending the multimedia content list to the terminal equipment;
receiving a multimedia content acquisition request sent by the terminal equipment, wherein the multimedia content acquisition request comprises a content identifier of multimedia content selected from the multimedia content list;
and acquiring the multimedia content corresponding to the content identification, and sending the multimedia content to the terminal equipment.
10. The method of claim 8, wherein the identification request further includes a multimedia content type corresponding to the operation track;
the acquiring the multimedia content associated with the target object and sending the multimedia content to the terminal device specifically includes:
and acquiring the multimedia content which is associated with the target object and belongs to the multimedia content type in the identification request, and sending the multimedia content to the terminal equipment.
11. The method according to claim 8, wherein when the multimedia content includes an extension video, the identification request further includes a video identifier and a playing time of a target video played by the terminal device, and the acquiring the multimedia content associated with the target object specifically includes:
acquiring an extension position video which contains the target object and contains the playing time in a corresponding association time period from extension position videos which are associated with the target video corresponding to the video identification, wherein the target video is associated with at least one extension position video, each extension position video comprises at least one target object, each extension position video has the same shooting scene with the target video but different shooting visual angles, and the association time period of each extension position video is as follows: determining a time period corresponding to the extension video on the playing time axis of the target video according to the shooting time of the extension video;
and determining the obtained extension bit video as the extension bit video associated with the target object.
12. A video processing apparatus, comprising:
the playing control module is used for playing a target video, and the target video comprises at least one target object;
and the operation response module is used for acquiring an operation track input on a video playing interface, and if the operation track is determined to be consistent with a preset operation track, displaying the multimedia content associated with the target object aimed at by the operation track on the video playing interface.
13. A video processing apparatus, comprising:
the terminal equipment comprises a receiving module, a processing module and a display module, wherein the receiving module is used for receiving an identification request sent by the terminal equipment, the identification request comprises an image to be identified, and the image to be identified is a target area corresponding to an operation track input on a video playing interface of the terminal equipment in a played video picture;
the object identification module is used for determining a target object contained in the image to be identified;
the content acquisition module is used for acquiring the multimedia content associated with the target object;
and the sending module is used for sending the acquired multimedia content to the terminal equipment.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 11 are implemented when the computer program is executed by the processor.
15. A computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method of any one of claims 1 to 11.
CN202010137626.2A 2020-03-02 2020-03-02 Video processing method and device, electronic equipment and storage medium Active CN111314759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010137626.2A CN111314759B (en) 2020-03-02 2020-03-02 Video processing method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111314759A true CN111314759A (en) 2020-06-19
CN111314759B CN111314759B (en) 2021-08-10

Family

ID=71147918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010137626.2A Active CN111314759B (en) 2020-03-02 2020-03-02 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111314759B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110063415A1 (en) * 2009-09-16 2011-03-17 Pvi Virtual Media Services, Llc Hyperlinked 3D Video Inserts for Interactive Television
CN104145267A (en) * 2012-01-04 2014-11-12 Google LLC Systems and methods of image searching
CN105052155A (en) * 2013-03-20 2015-11-11 Google LLC Interpolated video tagging
CN105247845A (en) * 2013-05-28 2016-01-13 Qualcomm Inc Systems and methods for selecting media items
CN106658199A (en) * 2016-12-28 2017-05-10 NetEase Media Technology (Beijing) Co Ltd Video content display method and apparatus
CN109257611A (en) * 2017-07-12 2019-01-22 Alibaba Group Holding Ltd Video playing method and apparatus, terminal device, and server
US10327026B1 (en) * 2017-08-03 2019-06-18 Amazon Technologies, Inc. Presenting content-specific video advertisements upon request
CN108174303A (en) * 2017-12-29 2018-06-15 Beijing Moshanghua Technology Co Ltd Data processing method and apparatus for video playing content
CN110121093A (en) * 2018-02-06 2019-08-13 Youku Network Technology (Beijing) Co Ltd Method and apparatus for searching for a target object in a video
CN109151599A (en) * 2018-08-30 2019-01-04 Baidu Online Network Technology (Beijing) Co Ltd Video processing method and apparatus
CN110213599A (en) * 2019-04-16 2019-09-06 Tencent Technology (Shenzhen) Co Ltd Additional information processing method, device, and storage medium
CN110225387A (en) * 2019-05-20 2019-09-10 Beijing QIYI Century Science and Technology Co Ltd Information search method, apparatus, and electronic device
CN110719527A (en) * 2019-09-30 2020-01-21 Vivo Mobile Communication Co Ltd Video processing method, electronic device, and mobile terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301356A (en) * 2020-07-14 2021-08-24 Alibaba Group Holding Ltd Method and device for controlling video display
CN114089826A (en) * 2020-07-30 2022-02-25 Human Horizons (Shanghai) Cloud Computing Technology Co Ltd Vehicle-end scene generation method and apparatus, vehicle end, and storage medium
WO2022242577A1 (en) * 2021-05-20 2022-11-24 Vivo Mobile Communication Co Ltd Video processing method and apparatus, and electronic device
WO2023142400A1 (en) * 2022-01-27 2023-08-03 Tencent Technology (Shenzhen) Co Ltd Data processing method and apparatus, and computer device, readable storage medium and computer program product
US11978170B2 (en) 2022-01-27 2024-05-07 Tencent Technology (Shenzhen) Company Limited Data processing method, computer device and readable storage medium
CN115334346A (en) * 2022-08-08 2022-11-11 Beijing Dajia Internet Information Technology Co Ltd Interface display method, video publishing method, video editing method, and apparatus
CN115474076A (en) * 2022-08-15 2022-12-13 Zhuhai Shixi Technology Co Ltd Video stream image output method and apparatus, and camera device
CN115767141A (en) * 2022-08-26 2023-03-07 Vivo Mobile Communication Co Ltd Video playing method and apparatus, and electronic device
WO2024051601A1 (en) * 2022-09-08 2024-03-14 Douyin Vision Co Ltd Multimedia component triggering method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111314759B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN111314759B (en) Video processing method and device, electronic equipment and storage medium
US11287956B2 (en) Systems and methods for representing data, media, and time using spatial levels of detail in 2D and 3D digital applications
US20180152767A1 (en) Providing related objects during playback of video data
US10643264B2 (en) Method and computer readable medium for presentation of content items synchronized with media display
JP6681342B2 (en) Behavioral event measurement system and related method
CN109118290B (en) Method, system, and computer-readable non-transitory storage medium
US9024844B2 (en) Recognition of image on external display
US9343112B2 (en) Systems and methods for supplementing content from a server
IL263532A (en) Method, system and computer program product for interactively identifying same individuals or objects present in video recordings
US20150117837A1 (en) Systems and methods for supplementing content at a user device
CN114138991A (en) System and method for content recommendation based on user behavior
CN105635519B (en) Video processing method, apparatus and system
US10769207B2 (en) Multimedia focalization
US10440435B1 (en) Performing searches while viewing video content
US11528512B2 (en) Adjacent content classification and targeting
CN112804582A (en) Bullet screen processing method and device, electronic equipment and storage medium
CN112291609A (en) Video display and push method, device, storage medium and system thereof
TW202009682A (en) Interactive method and device based on augmented reality
CN105069005A (en) Data searching method and data searching device
US11468675B1 (en) Techniques for identifying objects from video content
WO2017011084A1 (en) System and method for interaction between touch points on a graphical display
TWI730539B (en) Method for displaying dynamic digital content, graphical user interface and system thereof
CN113301356A (en) Method and device for controlling video display
CN109756759B (en) Bullet screen information recommendation method and device
CN109213719A (en) Method and device for displaying content of electronic book and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40025243

Country of ref document: HK

GR01 Patent grant