CN110662113A - Video playing method and device and computer readable storage medium - Google Patents

Video playing method and device and computer readable storage medium

Info

Publication number
CN110662113A
Authority
CN
China
Prior art keywords
video
image
picture
images
playing
Prior art date
Legal status
Granted
Application number
CN201910909375.2A
Other languages
Chinese (zh)
Other versions
CN110662113B (en)
Inventor
郭梦茹
Current Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN201910909375.2A
Publication of CN110662113A
Application granted
Publication of CN110662113B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a video playing method and apparatus and a computer-readable storage medium, and belongs to the field of computer technologies. The method comprises the following steps: in the process of playing a first video, detecting whether a video picture of a second video is contained in the video picture of the first video, the second video being another video played within the picture of the first video; when the video picture of the second video is contained in the video picture of the first video, acquiring a plurality of video images from the first video, each of the plurality of video images containing a video image of the second video; extracting the video image of the second video from each of the plurality of video images; encoding the video images of the second video according to parameters of the first video to obtain the second video; and playing the second video. In this way, a user can watch, while viewing the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.

Description

Video playing method and device and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video playing method and apparatus, and a computer-readable storage medium.
Background
With the development of computer technology, terminals such as mobile phones and computers have become increasingly common, and their functions increasingly complete. Video playing is one of the basic functions of a terminal; it brings entertainment to people and has therefore received growing attention.
When a user watches a video, another video is often playing within the watched video at the same time; for example, a person shown in the watched video may be browsing a video on a computer or a television. The user may then be interested in that other video and want to watch it. A video playing method is therefore needed that can separately play the other video embedded in the watched video.
Disclosure of Invention
The embodiments of the application provide a video playing method, a video playing apparatus and a computer-readable storage medium, which can improve the flexibility of video playing. The technical solution is as follows:
in one aspect, a video playing method is provided, where the method includes:
in the process of playing a first video, detecting whether a video picture of a second video is contained in the video picture of the first video, wherein the second video is another video played within the video picture of the first video;
when the video picture of the second video is contained in the video picture of the first video, acquiring a plurality of video images from the first video, wherein each video image in the plurality of video images comprises the video image of the second video;
extracting a video image of the second video from each of the plurality of video images;
encoding the video images of the second video according to parameters of the first video to obtain the second video;
and playing the second video.
Optionally, the extracting the video image of the second video from each of the plurality of video images includes:
detecting a video frame from each video image in the plurality of video images, wherein the video frame is a border within which a video can be played;
and for any one video image in the plurality of video images, extracting the pixel points located within the video frame in that video image, and composing the extracted pixel points into one video image of the second video.
Optionally, the encoding the video image of the second video according to the parameter of the first video includes:
storing the video image of the second video in a designated folder;
and calling a specified tool according to the parameters of the first video and the name of the designated folder, so as to instruct the specified tool to encode the video images stored in the designated folder according to the parameters.
Optionally, the detecting whether the video picture of the second video is included in the video pictures of the first video includes:
detecting whether the video picture of the second video is contained in the video picture of the first video according to objects in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to text information in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to the audio of the first video and text information in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to a specified classification model.
Optionally, the detecting, according to an object in the video image of the first video, whether the video picture of the second video is included in the video pictures of the first video includes:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
Detecting whether scene attributes of at least two objects are different in a video image of the first video; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is included in the video pictures of the first video.
Optionally, the detecting, according to text information in the video image of the first video, whether the video picture of the second video is included in the video picture of the first video includes:
detecting whether text information continuously appears, within a reference duration, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference duration, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the detecting, according to the audio of the first video and the text information in the video image of the first video, whether the video picture of the second video is included in the video picture of the first video includes:
when audio is being played in the first video, if subtitle information does not appear at the subtitle position of the video images of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video images of the first video;
and when the audio matches the text information appearing at positions other than the subtitle position in the video images of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the detecting whether the video picture of the second video is included in the video pictures of the first video according to the specified classification model includes:
inputting the video images of the first video into a specified classification model, and outputting the type of the video images of the first video by the specified classification model;
and when the type of the video image of the first video is a specified type, determining that the video pictures of the second video are contained in the video pictures of the first video.
In one aspect, a video playing method is provided, where the method includes:
in the process of playing a first video, detecting whether a video picture of a second video is contained in the video picture of the first video, wherein the second video is another video played within the video picture of the first video;
when the video picture of the second video is contained in the video picture of the first video, acquiring a plurality of video images from the first video, wherein each video image in the plurality of video images comprises the video image of the second video;
acquiring a playing address of the second video according to the image characteristics of the plurality of video images;
acquiring the second video according to the playing address of the second video;
and playing the second video.
Optionally, the obtaining a play address of the second video according to the image features of the plurality of video images includes:
acquiring first video characteristics according to the image characteristics of the plurality of video images;
matching the first video features with a plurality of second video features stored in a video feature library;
and if the first video feature is successfully matched with one of the plurality of second video features, acquiring a video playing address corresponding to the successfully matched second video feature as the playing address of the second video.
Optionally, the obtaining a first video feature according to the image features of the plurality of video images includes:
determining image features of the plurality of video images as first video features; or
detecting a video frame from each video image in the plurality of video images, wherein the video frame is a border within which a video can be played; for any one of the video images, extracting the pixel points located within the video frame in that video image, and composing the extracted pixel points into a target image; and determining image features of the resulting plurality of target images as the first video feature.
Optionally, the detecting whether the video picture of the second video is included in the video pictures of the first video includes:
detecting whether the video picture of the second video is contained in the video picture of the first video according to objects in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to text information in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to the audio of the first video and text information in the video images of the first video; and/or
detecting whether the video picture of the second video is contained in the video picture of the first video according to a specified classification model.
Optionally, the detecting, according to an object in the video image of the first video, whether the video picture of the second video is included in the video pictures of the first video includes:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
Detecting whether scene attributes of at least two objects are different in a video image of the first video; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is included in the video pictures of the first video.
Optionally, the detecting, according to text information in the video image of the first video, whether the video picture of the second video is included in the video picture of the first video includes:
detecting whether text information continuously appears, within a reference duration, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference duration, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the detecting, according to the audio of the first video and the text information in the video image of the first video, whether the video picture of the second video is included in the video picture of the first video includes:
when audio is being played in the first video, if subtitle information does not appear at the subtitle position of the video images of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video images of the first video;
and when the audio matches the text information appearing at positions other than the subtitle position in the video images of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the detecting whether the video picture of the second video is included in the video pictures of the first video according to the specified classification model includes:
inputting the video images of the first video into a specified classification model, and outputting the type of the video images of the first video by the specified classification model;
and when the type of the video image of the first video is a specified type, determining that the video pictures of the second video are contained in the video pictures of the first video.
In one aspect, a video playing apparatus is provided, the apparatus including:
a detection module, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in the video picture of the first video, wherein the second video is another video played within the video picture of the first video;
an obtaining module, configured to obtain a plurality of video images from the first video when a video image of the second video is included in a video image of the first video, where each of the plurality of video images includes a video image of the second video;
an extraction module for extracting a video image of the second video from each of the plurality of video images;
the encoding module is used for encoding the video image of the second video according to the parameter of the first video to obtain the second video;
and the playing module is used for playing the second video.
Optionally, the extraction module comprises:
a detection unit, configured to detect a video frame from each video image in the plurality of video images, the video frame being a border within which a video can be played;
and an extraction unit, configured to, for any one of the plurality of video images, extract the pixel points located within the video frame in that video image and compose the extracted pixel points into one video image of the second video.
Optionally, the encoding module comprises:
a storage unit configured to store a video image of the second video in a designated folder;
and the calling unit is used for calling a specified tool according to the parameters of the first video and the name of the specified folder so as to indicate the specified tool to encode the video image stored in the specified folder according to the parameters.
Optionally, the detection module includes:
a first detection unit, configured to detect whether a video picture of the second video is included in a video picture of the first video according to an object in a video image of the first video; and/or
The second detection unit is used for detecting whether the video picture of the second video is contained in the video picture of the first video according to the character information in the video picture of the first video; and/or
A third detecting unit, configured to detect whether a video picture of the second video is included in a video picture of the first video according to the audio of the first video and the text information in the video image of the first video; and/or
And the fourth detection unit is used for detecting whether the video pictures of the second video are contained in the video pictures of the first video according to the specified classification model.
Optionally, the first detection unit is configured to:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
Detecting whether scene attributes of at least two objects are different in a video image of the first video; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is included in the video pictures of the first video.
Optionally, the second detecting unit is configured to:
detecting whether text information continuously appears, within a reference duration, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference duration, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the third detecting unit is configured to:
when audio is being played in the first video, if subtitle information does not appear at the subtitle position of the video images of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video images of the first video;
and when the audio matches the text information appearing at positions other than the subtitle position in the video images of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the fourth detecting unit is configured to:
inputting the video images of the first video into a specified classification model, and outputting the type of the video images of the first video by the specified classification model;
and when the type of the video image of the first video is a specified type, determining that the video pictures of the second video are contained in the video pictures of the first video.
In one aspect, a video playing apparatus is provided, the apparatus including:
a detection module, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in the video picture of the first video, wherein the second video is another video played within the video picture of the first video;
a first obtaining module, configured to obtain, when a video picture of the second video is included in a video picture of the first video, a plurality of video images from the first video, where each of the plurality of video images includes a video image of the second video;
the second obtaining module is used for obtaining the playing address of the second video according to the image characteristics of the plurality of video images;
the third obtaining module is used for obtaining the second video according to the playing address of the second video;
and the playing module is used for playing the second video.
Optionally, the second obtaining module includes:
the first acquisition unit is used for acquiring first video characteristics according to the image characteristics of the plurality of video images;
the matching unit is used for matching the first video characteristics with a plurality of second video characteristics stored in a video characteristic library;
and the second obtaining unit is used for obtaining a video playing address corresponding to one successfully matched second video feature as the playing address of the second video if the first video feature is successfully matched with one of the plurality of second video features.
Optionally, the first obtaining unit is configured to:
determining image features of the plurality of video images as first video features; or
detecting a video frame from each video image in the plurality of video images, wherein the video frame is a border within which a video can be played; for any one of the video images, extracting the pixel points located within the video frame in that video image, and composing the extracted pixel points into a target image; and determining image features of the resulting plurality of target images as the first video feature.
Optionally, the detection module includes:
a first detection unit, configured to detect whether the video picture of the second video is contained in the video picture of the first video according to objects in the video images of the first video; and/or
a second detection unit, configured to detect whether the video picture of the second video is contained in the video picture of the first video according to text information in the video images of the first video; and/or
a third detection unit, configured to detect whether the video picture of the second video is contained in the video picture of the first video according to the audio of the first video and text information in the video images of the first video; and/or
a fourth detection unit, configured to detect whether the video picture of the second video is contained in the video picture of the first video according to a specified classification model.
Optionally, the first detection unit is configured to:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
Detecting whether scene attributes of at least two objects are different in a video image of the first video; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is included in the video pictures of the first video.
Optionally, the second detecting unit is configured to:
detecting whether text information continuously appears, within a reference duration, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference duration, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the third detecting unit is configured to:
when audio is being played in the first video, if subtitle information does not appear at the subtitle position of the video images of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video images of the first video;
and when the audio matches the text information appearing at positions other than the subtitle position in the video images of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the fourth detecting unit is configured to:
inputting the video images of the first video into a specified classification model, and outputting the type of the video images of the first video by the specified classification model;
and when the type of the video image of the first video is a specified type, determining that the video pictures of the second video are contained in the video pictures of the first video.
In one aspect, a video playing apparatus is provided, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the video playing method described above.
In one aspect, a computer-readable storage medium is provided, which stores instructions that, when executed by a processor, implement the steps of the video playing method described above.
The technical scheme provided by the embodiment of the application can at least bring the following beneficial effects:
in the process of playing the first video, it is detected whether a video picture of a second video is contained in the video picture of the first video, the second video being another video played within the picture of the first video. When the video picture of the second video is contained in the video picture of the first video, a plurality of video images are obtained from the first video, and a video image of the second video is extracted from each of them. The video images of the second video are then encoded according to the parameters of the first video to obtain the second video, and finally the second video is played. In this way, the user can watch, while viewing the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a video playing method provided in an embodiment of the present application;
fig. 2 is a flowchart of another video playing method provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a video playing method provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a first video playing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a second video playback device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a third video playback device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, application scenarios of the embodiments of the present application will be described.
When watching a video, another video is often playing within the watched video at the same time; for example, a person shown in the watched video may be browsing a video on a computer or a television. In this case, the video picture of the watched video contains the video picture of the other video. The viewer may then be interested in that other video and want to watch it. The embodiments of the present application therefore provide a video playing method: when the video picture of a second video is contained in the video picture of a first video being played, the second video is acquired and played, meeting the user's need and improving the flexibility of video playing.
Fig. 1 is a flowchart of a video playing method provided in an embodiment of the present application, where the method is applied to a terminal. Referring to fig. 1, the method includes:
step 101: in the process of playing the first video, whether the video pictures of the second video are contained in the video pictures of the first video is detected.
When watching a video, another video is often playing within the watched video at the same time; for example, a person shown in the watched video may be browsing a video on a computer or a television. In this case, the video picture of one video contains the video picture of another video. Therefore, in the process of playing the first video, it can be detected whether the video picture of the second video is contained in the video picture of the first video, the second video being another video played within the picture of the first video.
The operation of detecting whether the video picture of the second video is contained in the video picture of the first video may adopt at least one of the following four possible implementations:
A first possible implementation: detecting whether the video picture of the second video is contained in the video picture of the first video according to the objects in the video images of the first video.
Specifically, whether a video frame is included in the video image of the first video may be detected, and when the video frame is included in the video image of the first video, it is determined that the video frame of the second video is included in the video frame of the first video; alternatively, it may be detected whether scene attributes of at least two objects are different in a video image of the first video; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is contained in the video pictures of the first video.
A video frame is a border within which a video can be played. For example, the video frame may be the window border of a video window in a page used for playing videos, such as the application interface of a video application or a web page of a video website, or it may be the device frame of a video playing device such as a television or a computer. The video frame is generally rectangular in shape, and its color is generally black.
In addition, when the video image of the first video includes the video frame, it indicates that the video image of the first video includes a frame capable of playing the video, and it can be determined that other videos are simultaneously played in the first video, that is, the video frame of the first video includes the video frame of the second video.
When detecting whether the video image of the first video contains a video frame, it may be detected whether the video image contains a black rectangular frame, and if so, it is determined that the video image of the first video contains a video frame. Alternatively, the objects contained in the video image of the first video may be identified, and if a video playing device exists among them, it is determined that a video frame is contained in the video image of the first video. Of course, whether the video image of the first video contains a video frame may also be detected in other ways, which is not limited in the embodiments of the present application.
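Purely as an illustration of the black-rectangular-frame check described above, the sketch below uses OpenCV in Python to look for a large, dark, roughly rectangular contour in a video image; the threshold and size values are assumptions of this sketch, not values taken from this application:

```python
import cv2

def find_video_border(frame_bgr, min_area_ratio=0.05):
    """Return the bounding box (x, y, w, h) of a dark rectangular
    frame in the image, or None when no candidate is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Keep only near-black pixels, where a video frame is expected.
    _, dark = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img, w_img = gray.shape
    for cnt in sorted(contours, key=cv2.contourArea, reverse=True):
        x, y, w, h = cv2.boundingRect(cnt)
        # Require a roughly rectangular contour of non-trivial size.
        rect_like = cv2.contourArea(cnt) / float(w * h) > 0.8
        big_enough = (w * h) / float(w_img * h_img) > min_area_ratio
        if rect_like and big_enough:
            return x, y, w, h
    return None
```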
It should be noted that the scene attribute of the object is used to indicate the scene where the object is usually located. For example, when the object is an object that generally appears outdoors, such as an airport, a tree, or a sea, the scene attribute of the object may be outdoors; when the object is an object that is usually present in a room, such as a wardrobe, a bed, or the like, the scene attribute of the object may be indoor.
In addition, when scene attributes of at least two objects in the video image of the first video are different, it is indicated that an object of another scene appears in one scene. For example, in the video image of the first video, there is an object whose scene attribute is indoor, and there is an object whose scene attribute is outdoor, which indicates that an object that should appear outdoors appears indoors or that an object that should appear indoors appears outdoors. In this case, it is highly probable that objects of different scene attributes belong to different videos, and thus it can be determined that other videos are simultaneously played in the first video, that is, a video picture including the second video among video pictures of the first video.
When detecting whether the scene attributes of at least two objects in the video image of the first video are different, the objects contained in the video image may first be identified; the scene attributes of the identified objects are then acquired, and it is then determined whether the scene attributes of at least two of those objects differ.
When acquiring the scene attribute of an identified object, the corresponding scene attribute can be looked up, for that object, in a stored correspondence between objects and scene attributes, and used as the object's scene attribute.
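A minimal sketch of this lookup, assuming the object labels have already been produced by some recognizer; the table contents below are illustrative only and are not prescribed by this application:

```python
# Hypothetical stored correspondence between objects and scene attributes.
SCENE_ATTRIBUTES = {
    "airport": "outdoor", "tree": "outdoor", "sea": "outdoor",
    "wardrobe": "indoor", "bed": "indoor",
}

def has_conflicting_scene_attributes(detected_objects):
    """detected_objects: object labels recognized in one video image.
    Returns True when at least two objects carry different scene
    attributes, which suggests an embedded second video."""
    attrs = {SCENE_ATTRIBUTES[o] for o in detected_objects
             if o in SCENE_ATTRIBUTES}
    return len(attrs) >= 2
```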
A second possible implementation: detecting whether the video picture of the second video is contained in the video picture of the first video according to the text information in the video images of the first video.
Specifically, it can be detected whether text information continuously appears, within a reference duration, at a target position other than the subtitle position in the video images of the first video; and when text information continuously appears at the target position within the reference duration, it is determined that the video picture of the second video is contained in the video picture of the first video.
In addition, the target position may be any one of fixed positions in the video image of the first video, except for the subtitle position. The reference time period may be set in advance, for example, the reference time period may be 2 seconds, 3 seconds, and the like, which is not limited in this embodiment of the application.
Furthermore, when text information continuously appears within the reference duration at a target position other than the subtitle position in the video images of the first video, this indicates that text has remained at a fixed non-subtitle position for a relatively long time. In this case, the text information is very likely the subtitle information of another video that is simultaneously played in the first video, so it can be determined that another video is simultaneously played in the first video, that is, the video picture of the first video contains the video picture of the second video.
In practice, this can be checked by monitoring, over the reference duration, whether text information keeps appearing at each fixed position of the video images of the first video other than the subtitle position.
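For illustration, a persistence check of this kind could be sketched as follows; the use of the pytesseract OCR binding and the frame/region representation are assumptions of the sketch, not choices made by this application:

```python
import pytesseract  # OCR binding; an assumed choice, not named in this application

def text_persists(frames, region, fps, ref_seconds=2.0):
    """frames: consecutive video images as arrays; region: (x, y, w, h)
    of a fixed target position other than the subtitle position.
    Returns True once text is present in that region in every frame
    over at least ref_seconds (the reference duration)."""
    needed = int(ref_seconds * fps)
    x, y, w, h = region
    streak = 0
    for frame in frames:
        text = pytesseract.image_to_string(frame[y:y + h, x:x + w]).strip()
        streak = streak + 1 if text else 0
        if streak >= needed:
            return True
    return False
```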
A third possible implementation: detecting whether the video picture of the second video is contained in the video picture of the first video according to the audio of the first video and the text information in the video images of the first video.
Specifically, while audio is being played in the first video, if subtitle information does not appear at the subtitle position of the video images of the first video, it may be detected whether the audio matches text information appearing at a position other than the subtitle position in the video images of the first video; and when the audio matches that text information, it is determined that the video picture of the second video is contained in the video picture of the first video.
It should be noted that, when audio is being played in the first video and no subtitle information appears at the subtitle position of the video images of the first video, the audio is not the audio corresponding to the subtitle information of the first video. In this case, if the audio matches text information appearing at a position other than the subtitle position, the audio is very likely the audio of another video simultaneously played in the first video, and the text information is very likely the subtitle information of that other video. It can thus be determined that another video is simultaneously played in the first video, that is, the video picture of the first video contains the video picture of the second video.
When detecting whether the audio matches text information appearing at a position other than the subtitle position in the video images of the first video, the audio may first be converted into text. At the same time, it is detected whether text information appears at a position other than the subtitle position in the video images of the first video. When text information is detected at such a position, the text obtained from the audio conversion is compared with the detected text: if the two are the same, the audio is determined to match the detected text information; if they differ, the audio is determined not to match it.
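A sketch of that comparison, which deliberately leaves the speech-to-text backend unspecified because this application does not name one:

```python
def audio_matches_onscreen_text(audio_clip, onscreen_text, transcribe):
    """transcribe: any speech-to-text callable supplied by the caller.
    Returns True when the text converted from the audio is the same as
    the text detected at a position other than the subtitle position."""
    spoken = transcribe(audio_clip).strip()
    return spoken != "" and spoken == onscreen_text.strip()
```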
A fourth possible implementation: detecting whether the video picture of the second video is contained in the video picture of the first video according to a specified classification model.
Specifically, the video image of the first video may be input into a specified classification model, and the type of the video image of the first video may be output by the specified classification model; and when the type of the video image of the first video is the designated type, determining that the video pictures of the second video are contained in the video pictures of the first video.
It should be noted that the specified classification model is a model for classifying video images, that is, it is used to determine the type of a video image. The types that the specified classification model can determine may include a specified type and a non-specified type: the specified type is the type of video images that contain images of other videos, and the non-specified type is the type of video images that do not.
In addition, when the type of the video image of the first video is the designated type, it indicates that the video image of the first video contains the video images of other videos, that is, the video picture of the second video is contained in the video picture of the first video.
Further, before the video images of the first video are input into the specified classification model, the model may be trained. Specifically, a set of training video images may be used as input samples, with the type of the training video images as the sample labels, to train a classification model to be trained and obtain the specified classification model. That is, the training video images may be input into the classification model to be trained, and after its output is obtained, the parameters of the model are adjusted using a loss function according to the difference between the output and the sample labels, yielding the specified classification model.
It should be noted that the training video image set may include a plurality of training video images, and the types of the plurality of training video images are all specified types, that is, the type of the training video image set is a specified type.
In addition, the classification model to be trained may be preset, for example, the classification model to be trained may be a decision tree model, a neural network model (such as a multilayer fully-connected neural network model), and the like, which is not limited in this embodiment of the present application.
The Loss function may be set in advance, for example, the Loss function may be a Euclidean distance Loss function (Euclidean Loss), a Sigmoid Cross Entropy Loss function (Sigmoid Cross Entropy Loss), a Softmax Loss function (Softmax With Loss), and the like, which is not limited in the embodiment of the present application.
The operation of adjusting the parameters in the classification model to be trained by using the loss function according to the difference between the output of the classification model to be trained and the sample label is similar to the operation of adjusting the parameters in the classification model by using the loss function according to the difference between the output of a certain model and the sample label in the related art, which is not described in detail in the embodiment of the present application.
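As one hypothetical realization of this training procedure, the sketch below uses PyTorch with a small multilayer fully-connected network (one of the model options mentioned above) and a cross-entropy loss, which corresponds to the Softmax loss option; the framework choice and all layer sizes are assumptions of the sketch:

```python
import torch
import torch.nn as nn

# A multilayer fully-connected classifier; sizes are illustrative.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 2),  # two types: specified / non-specified
)
loss_fn = nn.CrossEntropyLoss()  # softmax-with-loss style objective
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(images, labels):
    """images: a batch of training video images resized to 3x64x64;
    labels: 1 for the specified type, 0 for the non-specified type."""
    optimizer.zero_grad()
    output = model(images)
    loss = loss_fn(output, labels)  # difference between output and label
    loss.backward()                 # adjust parameters via the loss
    optimizer.step()
    return loss.item()
```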
It should be noted that, in addition to the above four possible implementation manners, it may also be detected in other manners whether the video picture of the second video is included in the video pictures of the first video, which is not limited in the embodiment of the present application.
Step 102: when the video picture of the second video is contained in the video picture of the first video, a plurality of video images are obtained from the first video.
It should be noted that each of the plurality of video images includes a video image of the second video. Through the above step 101, a plurality of video images including the video image of the second video in the first video can be determined, and at this time, the plurality of video images can be acquired.
Step 103: a video image of the second video is extracted from each of the plurality of video images.
Specifically, a video frame may be detected from each of the plurality of video images; and for any one of the plurality of video images, the pixel points located within the video frame in that video image are extracted and composed into one video image of the second video.
It should be noted that, for the process of detecting a video frame from each of the plurality of video images, reference may be made to the first possible implementation manner in step 101, which is not described again in this embodiment of the present application.
Because the video frame is a border within which a video is played, the image inside the video frame contained in each of the plurality of video images is a video image of the second video. The pixel points within the video frame of each video image can therefore be extracted to compose the video images of the second video.
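A minimal sketch of this extraction, reusing the hypothetical find_video_border helper from the earlier detection sketch; the trimmed border thickness is an assumption:

```python
def extract_inner_image(frame_bgr, border_box, border_px=4):
    """Compose the pixel points lying inside a detected video frame
    into one video image of the second video."""
    x, y, w, h = border_box
    t = border_px  # assumed thickness of the frame itself, in pixels
    return frame_bgr[y + t:y + h - t, x + t:x + w - t].copy()
```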
Step 104: and coding the video image of the second video according to the parameter of the first video to obtain the second video.
It should be noted that the parameter of the first video may be a parameter related to encoding of the first video, for example, the parameter of the first video may include a resolution, a bitrate, a frame rate, and the like of the first video.
In this case, the video image of the second video included in the video image of the first video is extracted, and then the extracted video image of the second video is independently encoded to obtain the second video.
Specifically, the operation of step 104 may be: storing the video images of the second video in a designated folder; and calling a specified tool according to the parameters of the first video and the name of the designated folder, so as to instruct the specified tool to encode the video images stored in the designated folder according to the parameters of the first video, obtaining the second video.
It should be noted that the designated folder may be a preset folder, and the designated folder is used for storing the video image of the second video.
In addition, the specified tool may be set in advance and is used for encoding video images. For example, the specified tool may be the FFmpeg (Fast Forward MPEG) tool.
Furthermore, after the specified tool is called according to the parameters of the first video and the name of the designated folder, the specified tool can locate the designated folder by its name, acquire the video images stored in it, and then encode the acquired video images according to the parameters of the first video to obtain the second video.
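For illustration, invoking FFmpeg from Python over the designated folder might look as follows; the image naming pattern and output path are assumptions of the sketch, while the parameters correspond to the resolution, bitrate and frame rate mentioned above:

```python
import subprocess

def encode_second_video(folder, width, height, fps, bitrate,
                        out_path="second_video.mp4"):
    """Call the FFmpeg command-line tool on the video images stored in
    the designated folder, encoding them with the first video's
    parameters. 'img_%05d.png' is an assumed naming pattern for the
    extracted images."""
    cmd = [
        "ffmpeg",
        "-framerate", str(fps),           # frame rate of the first video
        "-i", f"{folder}/img_%05d.png",   # images in the designated folder
        "-s", f"{width}x{height}",        # resolution of the first video
        "-b:v", bitrate,                  # bitrate, e.g. "1500k"
        "-pix_fmt", "yuv420p",
        out_path,
    ]
    subprocess.run(cmd, check=True)
```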
Step 105: and playing the second video.
Specifically, the second video may be played directly, or it may be played only when a play confirmation instruction is received.
It should be noted that the play confirmation instruction is used to instruct playing of the second video that is played simultaneously in the first video. The instruction may be triggered by the user through operations such as a click, a slide, a voice command or a gesture, which is not limited in the embodiments of the present application.
In addition, when the second video is played, a video playing window can be popped up on the page where the first video is played, and the second video is played in the video playing window, so that a user can watch the second video which is played simultaneously in the first video in the process of watching the first video, and the video watching experience of the user is improved. Of course, the page on which the first video is played may be switched to another page, and the second video may be played on the other page, which is not limited in this embodiment of the application.
In the embodiments of the application, in the process of playing the first video, it is detected whether the video picture of the second video is contained in the video picture of the first video, the second video being another video played within the picture of the first video. When it is, a plurality of video images are obtained from the first video, and a video image of the second video is extracted from each of them. The video images of the second video are then encoded according to the parameters of the first video to obtain the second video, which is finally played. In this way, the user can watch, while viewing the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.
Fig. 2 is a flowchart of a video playing method provided in an embodiment of the present application, where the method is applied to a terminal. Referring to fig. 2, the method includes:
step 201: in the process of playing the first video, whether the video pictures of the second video are contained in the video pictures of the first video is detected.
It should be noted that the operation of step 201 is the same as the operation of step 101 in the embodiment of fig. 1, and details thereof are not repeated in this embodiment of the application.
Step 202: when the video picture of the second video is contained in the video picture of the first video, a plurality of video images are obtained from the first video.
It should be noted that the operation of step 202 is the same as the operation of step 102 in the embodiment of fig. 1, and the description of this embodiment is omitted here.
Step 203: and acquiring the playing address of the second video according to the image characteristics of the plurality of video images.
It should be noted that the image features of the video image may be color features, texture features, shape features, spatial relationship features, and the like of the video image.
In addition, the operation of extracting the image features of the plurality of video images is similar to the operation of extracting the image features of the images in the related art, and this is not described in detail in the embodiments of the present application. For example, an image hashing algorithm may be used to obtain image features of the plurality of video images.
Specifically, the operation of step 203 may be: acquiring first video characteristics according to the image characteristics of the plurality of video images; matching the first video features with a plurality of second video features stored in a video feature library; and if the first video characteristic is successfully matched with one of the plurality of second video characteristics, acquiring a video playing address corresponding to the successfully matched second video characteristic as a playing address of the second video.
It should be noted that, a plurality of second video features stored in the video feature library may be preset, where each of the plurality of second video features is an image feature of a video image of one video, and each of the plurality of second video features corresponds to a video playing address of one video indicated by the second video feature.
In addition, when the first video feature is successfully matched with a second video feature stored in the video feature library, the first video feature is very likely the image feature of video images of the video indicated by that second video feature. Therefore, the video playing address corresponding to that second video feature can be obtained and used as the playing address of the second video.
The operation of acquiring the first video feature according to the image features of the plurality of video images may be: determining the image features of the plurality of video images as the first video feature; or detecting a video frame from each of the plurality of video images, extracting, for each video image, the pixel points located within its video frame to compose a target image, and determining the image features of the resulting target images as the first video feature.
When the first video feature is matched against the plurality of second video features stored in the video feature library, the similarity between each second video feature and the first video feature can be calculated. When the calculated similarity is greater than or equal to a similarity threshold, that second video feature is determined to match the first video feature successfully; when the calculated similarity is less than the similarity threshold, the match is determined to have failed.
It should be noted that the similarity threshold may be set in advance and is usually set relatively high, for example, 80% or 90%.
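A sketch of feature extraction and matching, assuming a perceptual hash (via the imagehash library) as the image hashing algorithm mentioned in step 203; the feature-library layout and the threshold handling are illustrative:

```python
from PIL import Image
import imagehash  # assumed hashing backend, not named in this application

def video_feature(image_paths):
    """A video feature: the perceptual hashes of a sequence of images."""
    return [imagehash.phash(Image.open(p)) for p in image_paths]

def match_play_address(first_feature, feature_library, threshold=0.9):
    """feature_library: {play_address: second_video_feature}.
    Returns the play address whose feature is similar enough to
    first_feature, or None when no match succeeds."""
    if not first_feature:
        return None
    bits = first_feature[0].hash.size  # bits per hash, typically 64
    for address, second_feature in feature_library.items():
        # Hamming-distance based similarity, averaged over image pairs.
        sims = [1 - (a - b) / bits
                for a, b in zip(first_feature, second_feature)]
        if sims and sum(sims) / len(sims) >= threshold:
            return address
    return None
```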
Step 204: and acquiring the second video according to the playing address of the second video.
It should be noted that the operation of step 204 is similar to the operation of obtaining a corresponding video according to a video playing address in the related art, and this is not described in detail in this embodiment of the present application.
In this case, the playing address of the second video that is played simultaneously in the first video is obtained first, and then the second video is obtained according to the playing address.
Step 205: and playing the second video.
It should be noted that the operation of step 205 is the same as the operation of step 105 in the embodiment of fig. 1, and details thereof are not repeated in this embodiment of the application.
In the embodiments of the application, in the process of playing the first video, it is detected whether the video picture of the second video is contained in the video picture of the first video, the second video being another video played within the picture of the first video. When it is, a plurality of video images are obtained from the first video, and the playing address of the second video is acquired according to the image features of those video images. The second video is then acquired according to its playing address and finally played. In this way, the user can watch, while viewing the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.
It is to be noted that the video playing method provided in the embodiment of the present application may be applied to a terminal running an operating system such as Android or iOS. When a video picture of another video exists within the video picture of the video being played, the method can obtain that other video, and the user may choose whether to play it.
A possible implementation of the video playing method is described below with reference to fig. 3.
Referring to fig. 3, in the process of playing the first video, the video picture of the first video may be detected. When a video picture of a second video is detected within the video picture of the first video, either of two approaches may be taken. In the first approach, the pixel points located inside the video frame are extracted from each of a plurality of video images of the first video that contain a video image of the second video, the extracted pixel points are composed into video images of the second video, the video images of the second video are encoded to obtain the second video, and the second video is stored. In the second approach, the image features of a plurality of video images of the first video that contain a video image of the second video are extracted, the playing address of the second video is obtained according to the extracted image features, and the second video is obtained according to that playing address. A sketch of the first approach is given below.
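The following is a hedged sketch of the first approach, assuming the bounding rectangle of the video frame has already been detected and the video images have already been sampled; OpenCV's VideoWriter stands in for the unspecified encoding tool, and all names are illustrative.

```python
# Crop the pixels inside the detected video frame from each sampled
# image and re-encode them into the second video.
import cv2

def assemble_second_video(images, rect, fps, out_path="second_video.mp4"):
    x, y, w, h = rect  # bounding box of the video frame, assumed detected
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
    for image in images:
        writer.write(image[y:y + h, x:x + w])  # pixels inside the frame
    writer.release()
    return out_path
```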
Then, when playback of the first video reaches a time point at which the video image of the first video contains a video image of the second video, a play prompt button can pop up on the page where the first video is played, prompting the user to choose whether to play the second video that is played simultaneously within the first video. When the user confirms the prompt and thereby triggers a play confirmation instruction, the previously acquired second video can be played directly.
Fig. 4 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present application. Referring to fig. 4, the apparatus includes: a detection module 401, an acquisition module 402, an extraction module 403, an encoding module 404, and a playing module 405.
a detection module 401, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in the video picture of the first video, where the second video is another video appearing within the video picture of the first video;
an obtaining module 402, configured to obtain, when a video picture of a second video is included in a video picture of a first video, a plurality of video images from the first video, where each of the plurality of video images includes a video image of the second video;
an extracting module 403, configured to extract a video image of the second video from each of the plurality of video images;
an encoding module 404, configured to encode the video images of the second video according to the parameters of the first video to obtain the second video;
and a playing module 405, configured to play the second video.
Optionally, the extracting module 403 includes:
the video frame detection device comprises a detection unit, a display unit and a display unit, wherein the detection unit is used for detecting a video frame from each video image in a plurality of video images, and the video frame is a frame body capable of playing a video inside;
and the extraction unit is used for extracting pixel points positioned in a video frame in one video image for any one of the plurality of video images and forming the extracted pixel points into one video image of the second video.
Optionally, the encoding module 404 includes:
a storage unit configured to store a video image of a second video in a designated folder;
and a calling unit, configured to call a specified tool according to the parameters of the first video and the name of the designated folder, so as to instruct the specified tool to encode the video images stored in the designated folder according to those parameters.
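The patent does not name the specified tool; FFmpeg is one plausible choice. A hedged sketch of invoking such a tool on a designated folder of sequentially numbered images, passing the first video's frame rate as the encoding parameter:

```python
# Assumes the images were saved as 00001.png, 00002.png, ... in the
# designated folder; the encoder choice (libx264) is illustrative.
import subprocess

def encode_folder(folder: str, frame_rate: float, out_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y",
         "-framerate", str(frame_rate),      # parameter of the first video
         "-i", f"{folder}/%05d.png",         # images stored in the folder
         "-c:v", "libx264", "-pix_fmt", "yuv420p",
         out_path],
        check=True,
    )
```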
Optionally, the detection module 401 includes:
a first detection unit, configured to detect, according to an object in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a second detection unit, configured to detect, according to text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a third detection unit, configured to detect, according to the audio of the first video and text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a fourth detection unit, configured to detect, according to a specified classification model, whether the video picture of the second video is contained in the video picture of the first video.
Optionally, the first detection unit is configured to:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
detecting whether the scene attributes of at least two objects in a video image of the first video are different; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is contained in the video picture of the first video.
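A minimal sketch of the video-frame branch of this check, assuming the embedded frame shows up as a large quadrilateral contour. The patent does not prescribe a detector, so Canny edges plus contour approximation is only one plausible realization, and all thresholds are illustrative.

```python
import cv2

def contains_video_frame(image, min_area_ratio=0.05):
    """Return True when a large quadrilateral (treated here as a
    video frame) is found in the image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h, w = gray.shape
    for contour in contours:
        approx = cv2.approxPolyDP(
            contour, 0.02 * cv2.arcLength(contour, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > min_area_ratio * w * h:
            return True
    return False
```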
Optionally, the second detection unit is configured to:
detecting whether text information continuously appears, within a reference time length, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference time length, determining that the video picture of the second video is contained in the video picture of the first video.
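A hedged sketch of this check: sample video images spanning the reference time length, OCR the target region, and require text in every sample. pytesseract is an assumed OCR backend, not one named by the patent, and the region coordinates are illustrative.

```python
import pytesseract

def text_persists(sampled_images, target_region) -> bool:
    """sampled_images: frames spanning the reference time length;
    target_region: (x, y, w, h) away from the subtitle position."""
    x, y, w, h = target_region
    for image in sampled_images:
        text = pytesseract.image_to_string(image[y:y + h, x:x + w]).strip()
        if not text:
            return False  # text absent in one sample: check fails
    return True
```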
Optionally, the third detection unit is configured to:
when audio is being played in the first video, if no subtitle information appears at the subtitle position of the video image of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video image of the first video;
and when the audio matches text information appearing at positions other than the subtitle position in the video image of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
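A hedged sketch of the audio/text comparison only: the patent does not specify how the audio is transcribed or how the match is scored, so the transcript is taken as an input (produced by any ASR routine) and a fuzzy string ratio with an illustrative threshold stands in for the matching rule.

```python
from difflib import SequenceMatcher

def audio_matches_text(transcript: str, on_screen_text: str,
                       threshold: float = 0.6) -> bool:
    """Fuzzily compare an ASR transcript with OCR'd on-screen text."""
    ratio = SequenceMatcher(None, transcript.lower(),
                            on_screen_text.lower()).ratio()
    return ratio >= threshold
```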
Optionally, the fourth detection unit is configured to:
inputting a video image of the first video into the specified classification model, which outputs the type of the video image of the first video;
and when the type of the video image of the first video is a designated type, determining that the video picture of the second video is contained in the video picture of the first video.
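A hedged sketch of this branch: the patent only requires a model that maps a video image to a type, so a torchvision ResNet with two output classes ("contains an embedded video picture" vs. "does not") is one plausible setup; the weights are assumed to have been fine-tuned offline.

```python
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.resnet18(num_classes=2)  # weights assumed fine-tuned offline
model.eval()

def is_designated_type(image) -> bool:
    """Return True when the model classifies the image as the
    designated type (class 1 here, by assumption)."""
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))
    return int(logits.argmax(dim=1)) == 1
```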
In the embodiment of the application, in the process of playing the first video, it is detected whether the video picture of the first video contains a video picture of a second video, the second video being another video appearing within the video picture of the first video. When the video picture of the second video is contained in the video picture of the first video, a plurality of video images are obtained from the first video, and a video image of the second video is extracted from each of the plurality of video images. The video images of the second video are then encoded according to the parameters of the first video to obtain the second video, which is finally played. In this way, a user can watch, while watching the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.
It should be noted that the division into the functional modules described above is merely illustrative of how the video playing apparatus provided in the above embodiment plays a video. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the video playing apparatus and the video playing method provided by the above embodiments belong to the same concept; the specific implementation processes of the apparatus are detailed in the method embodiments and are not repeated here.
Fig. 5 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present application. Referring to fig. 5, the apparatus includes: a detection module 501, a first obtaining module 502, a second obtaining module 503, a third obtaining module 504 and a playing module 505.
a detection module 501, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in the video picture of the first video, where the second video is another video appearing within the video picture of the first video;
a first obtaining module 502, configured to obtain, when a video picture of a second video is included in a video picture of a first video, a plurality of video images from the first video, where each of the plurality of video images includes a video image of the second video;
a second obtaining module 503, configured to obtain a playing address of a second video according to image features of multiple video images;
a third obtaining module 504, configured to obtain a second video according to a playing address of the second video;
and a playing module 505, configured to play the second video.
Optionally, the second obtaining module 503 includes:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring first video characteristics according to the image characteristics of a plurality of video images;
the matching unit is used for matching the first video characteristics with a plurality of second video characteristics stored in the video characteristic library;
and the second obtaining unit is used for obtaining a video playing address corresponding to one successfully matched second video characteristic as the playing address of the second video if the first video characteristic is successfully matched with one of the plurality of second video characteristics.
Optionally, the first obtaining unit is configured to:
determining the image features of the plurality of video images as the first video feature; or
detecting a video frame in each of the plurality of video images, where the video frame is a frame within which a video can be played; for any one of the plurality of video images, extracting the pixel points located inside the video frame in that video image and composing them into a target image; and determining the image features of the plurality of target images as the first video feature.
Optionally, the detection module 501 includes:
a first detection unit, configured to detect, according to an object in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a second detection unit, configured to detect, according to text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a third detection unit, configured to detect, according to the audio of the first video and text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a fourth detection unit, configured to detect, according to a specified classification model, whether the video picture of the second video is contained in the video picture of the first video.
Optionally, the first detection unit is configured to:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
detecting whether the scene attributes of at least two objects in a video image of the first video are different; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the second detection unit is configured to:
detecting whether text information continuously appears, within a reference time length, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference time length, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the third detection unit is configured to:
when audio is being played in the first video, if no subtitle information appears at the subtitle position of the video image of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video image of the first video;
and when the audio matches text information appearing at positions other than the subtitle position in the video image of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
Optionally, the fourth detection unit is configured to:
inputting a video image of the first video into the specified classification model, which outputs the type of the video image of the first video;
and when the type of the video image of the first video is a designated type, determining that the video picture of the second video is contained in the video picture of the first video.
In the embodiment of the application, in the process of playing the first video, it is detected whether the video picture of the first video contains a video picture of a second video, the second video being another video appearing within the video picture of the first video. When the video picture of the second video is contained in the video picture of the first video, a plurality of video images are obtained from the first video, and the playing address of the second video is obtained according to the image features of the plurality of video images. The second video is then obtained according to its playing address and finally played. In this way, a user can watch, while watching the first video, the second video that is played simultaneously within it, which improves the flexibility of video playing.
It should be noted that the division into the functional modules described above is merely illustrative of how the video playing apparatus provided in the above embodiment plays a video. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the video playing apparatus and the video playing method provided by the above embodiments belong to the same concept; the specific implementation processes of the apparatus are detailed in the method embodiments and are not repeated here.
Fig. 6 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present application. Referring to fig. 6, the apparatus may be a terminal 600, and the terminal 600 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement a video playback method provided by method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this application.
The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, it also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, disposed on the front panel of the terminal 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or a folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display 605 may be an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode) display, or the like.
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used for positioning the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be an alternating current power supply, a direct current power supply, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the touch screen display 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal 600. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or on a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a user's holding signal of the terminal 600 can be detected, and the processor 601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or vendor Logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display 605 to switch from the bright screen state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display 605 to switch from the screen-off state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (24)

1. A video playback method, the method comprising:
in the process of playing a first video, detecting whether a video picture of a second video is contained in a video picture of the first video, wherein the second video is another video appearing within the video picture of the first video;
when the video picture of the second video is contained in the video picture of the first video, acquiring a plurality of video images from the first video, wherein each video image in the plurality of video images comprises the video image of the second video;
extracting a video image of the second video from each of the plurality of video images;
coding a video image of the second video according to the parameter of the first video to obtain the second video;
and playing the second video.
2. The method of claim 1, wherein said extracting the video image of the second video from each of the plurality of video images comprises:
detecting a video frame in each of the plurality of video images, wherein the video frame is a frame within which a video can be played;
and for any one of the plurality of video images, extracting pixel points located inside the video frame in that video image, and composing the extracted pixel points into one video image of the second video.
3. The method of claim 1 or 2, wherein said encoding video images of said second video according to parameters of said first video comprises:
storing the video image of the second video in a designated folder;
and calling a specified tool according to the parameters of the first video and the name of the specified folder to indicate the specified tool to encode the video images stored in the specified folder according to the parameters.
4. A video playback method, the method comprising:
in the process of playing a first video, detecting whether a video picture of a second video is contained in a video picture of the first video, wherein the second video is another video appearing within the video picture of the first video;
when the video picture of the second video is contained in the video picture of the first video, acquiring a plurality of video images from the first video, wherein each video image in the plurality of video images comprises the video image of the second video;
acquiring a playing address of the second video according to the image features of the plurality of video images;
acquiring the second video according to the playing address of the second video;
and playing the second video.
5. The method of claim 4, wherein the obtaining the playing address of the second video according to the image features of the plurality of video images comprises:
acquiring first video characteristics according to the image characteristics of the plurality of video images;
matching the first video features with a plurality of second video features stored in a video feature library;
and if the first video feature is successfully matched with one of the plurality of second video features, acquiring a video playing address corresponding to the successfully matched second video feature as the playing address of the second video.
6. The method of claim 5, wherein said obtaining a first video feature from image features of said plurality of video images comprises:
determining the image features of the plurality of video images as the first video feature; or
detecting a video frame in each of the plurality of video images, wherein the video frame is a frame within which a video can be played; for any one of the plurality of video images, extracting pixel points located inside the video frame in that video image, and composing the extracted pixel points into a target image; and determining image features of the plurality of target images as the first video feature.
7. The method of claim 1 or 4, wherein said detecting whether the video pictures of the first video comprise video pictures of a second video comprises:
detecting, according to an object in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
detecting, according to text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
detecting, according to the audio of the first video and text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
detecting, according to a specified classification model, whether the video picture of the second video is contained in the video picture of the first video.
8. The method of claim 7, wherein said detecting whether the video pictures of the first video include the video pictures of the second video based on objects in the video images of the first video comprises:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
detecting whether the scene attributes of at least two objects in a video image of the first video are different; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is contained in the video picture of the first video.
9. The method of claim 7, wherein the detecting whether the video picture of the second video is included in the video pictures of the first video according to the text information in the video picture of the first video comprises:
detecting whether text information continuously appears, within a reference time length, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference time length, determining that the video picture of the second video is contained in the video picture of the first video.
10. The method of claim 7, wherein the detecting whether the video frame of the second video is included in the video frame of the first video according to the audio of the first video and the text information in the video image of the first video comprises:
when audio is being played in the first video, if no subtitle information appears at the subtitle position of the video image of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video image of the first video;
and when the audio matches text information appearing at positions other than the subtitle position in the video image of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
11. The method of claim 7, wherein said detecting whether the video pictures of the first video include the video pictures of the second video according to a specified classification model comprises:
inputting a video image of the first video into the specified classification model, which outputs the type of the video image of the first video;
and when the type of the video image of the first video is a designated type, determining that the video picture of the second video is contained in the video picture of the first video.
12. A video playback apparatus, comprising:
a detection module, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in a video picture of the first video, wherein the second video is another video appearing within the video picture of the first video;
an obtaining module, configured to obtain a plurality of video images from the first video when a video image of the second video is included in a video image of the first video, where each of the plurality of video images includes a video image of the second video;
an extraction module for extracting a video image of the second video from each of the plurality of video images;
an encoding module, configured to encode the video images of the second video according to the parameters of the first video to obtain the second video;
and a playing module, configured to play the second video.
13. The apparatus of claim 12, wherein the extraction module comprises:
a detection unit, configured to detect a video frame in each of the plurality of video images, wherein the video frame is a frame within which a video can be played;
and an extraction unit, configured to, for any one of the plurality of video images, extract pixel points located inside the video frame in that video image and compose the extracted pixel points into one video image of the second video.
14. The apparatus of claim 12 or 13, wherein the encoding module comprises:
a storage unit configured to store a video image of the second video in a designated folder;
and a calling unit, configured to call a specified tool according to the parameters of the first video and the name of the designated folder, so as to instruct the specified tool to encode the video images stored in the designated folder according to the parameters.
15. A video playback apparatus, comprising:
a detection module, configured to detect, in the process of playing a first video, whether a video picture of a second video is contained in a video picture of the first video, wherein the second video is another video appearing within the video picture of the first video;
a first obtaining module, configured to obtain, when a video picture of the second video is included in a video picture of the first video, a plurality of video images from the first video, where each of the plurality of video images includes a video image of the second video;
a second obtaining module, configured to obtain the playing address of the second video according to the image features of the plurality of video images;
a third obtaining module, configured to obtain the second video according to the playing address of the second video;
and a playing module, configured to play the second video.
16. The apparatus of claim 15, wherein the second obtaining module comprises:
a first acquisition unit, configured to acquire a first video feature according to the image features of the plurality of video images;
a matching unit, configured to match the first video feature with a plurality of second video features stored in a video feature library;
and a second acquisition unit, configured to, if the first video feature is successfully matched with one of the plurality of second video features, obtain the video playing address corresponding to the successfully matched second video feature as the playing address of the second video.
17. The apparatus of claim 16, wherein the first obtaining unit is to:
determining the image features of the plurality of video images as the first video feature; or
detecting a video frame in each of the plurality of video images, wherein the video frame is a frame within which a video can be played; for any one of the plurality of video images, extracting pixel points located inside the video frame in that video image, and composing the extracted pixel points into a target image; and determining image features of the plurality of target images as the first video feature.
18. The apparatus of claim 12 or 15, wherein the detection module comprises:
a first detection unit, configured to detect whether a video picture of the second video is included in a video picture of the first video according to an object in a video image of the first video; and/or
a second detection unit, configured to detect, according to text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a third detection unit, configured to detect, according to the audio of the first video and text information in a video image of the first video, whether the video picture of the second video is contained in the video picture of the first video; and/or
a fourth detection unit, configured to detect, according to a specified classification model, whether the video picture of the second video is contained in the video picture of the first video.
19. The apparatus of claim 18, wherein the first detection unit is to:
detecting whether a video frame is contained in a video image of the first video; when the video image of the first video contains a video frame, determining that the video picture of the second video is contained in the video picture of the first video; or
detecting whether the scene attributes of at least two objects in a video image of the first video are different; when the scene attributes of at least two objects in the video image of the first video are different, determining that the video picture of the second video is contained in the video picture of the first video.
20. The apparatus of claim 18, wherein the second detection unit is to:
detecting whether text information continuously appears, within a reference time length, at a target position other than the subtitle position in the video images of the first video;
and when text information continuously appears at the target position within the reference time length, determining that the video picture of the second video is contained in the video picture of the first video.
21. The apparatus of claim 18, wherein the third detection unit is to:
when audio is being played in the first video, if no subtitle information appears at the subtitle position of the video image of the first video, detecting whether the audio matches text information appearing at positions other than the subtitle position in the video image of the first video;
and when the audio matches text information appearing at positions other than the subtitle position in the video image of the first video, determining that the video picture of the second video is contained in the video picture of the first video.
22. The apparatus of claim 18, wherein the fourth detection unit is to:
inputting a video image of the first video into the specified classification model, which outputs the type of the video image of the first video;
and when the type of the video image of the first video is a designated type, determining that the video picture of the second video is contained in the video picture of the first video.
23. A video playback apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any of the methods of claims 1-11.
24. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of any of the methods of claims 1-11.
CN201910909375.2A 2019-09-25 2019-09-25 Video playing method and device and computer readable storage medium Active CN110662113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910909375.2A CN110662113B (en) 2019-09-25 2019-09-25 Video playing method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110662113A (en) 2020-01-07
CN110662113B (en) 2021-06-11

Family

ID=69039052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910909375.2A Active CN110662113B (en) 2019-09-25 2019-09-25 Video playing method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110662113B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020057372A1 (en) * 1998-11-13 2002-05-16 Philips Electronics North America Corporation Method and device for detecting an event in a program of a video and/or audio signal and for providing the program to a display upon detection of the event
CN101778282A (en) * 2010-01-12 2010-07-14 北京暴风网际科技有限公司 Method for concurrently playing different media files
CN104219572A (en) * 2014-09-26 2014-12-17 广州创维平面显示科技有限公司 Video-related information implicit transmission content distribution method and device
CN104602119A (en) * 2014-05-26 2015-05-06 腾讯科技(北京)有限公司 Video transcoding and decoding method and device and related information release control method and system
CN104661048A (en) * 2013-11-21 2015-05-27 乐视网信息技术(北京)股份有限公司 Method and device for controlling network multimedia resources
CN104808907A (en) * 2015-05-20 2015-07-29 腾讯科技(深圳)有限公司 Method and device for displaying content in same screen, and terminal equipment
CN105872695A (en) * 2015-12-31 2016-08-17 乐视网信息技术(北京)股份有限公司 Video playing method and device
CN107135421A (en) * 2017-06-13 2017-09-05 北京市博汇科技股份有限公司 Video features detection method and device
CN107534797A (en) * 2015-04-24 2018-01-02 皇家Kpn公司 Enhancing includes the media recording of cameras record
CN107547811A (en) * 2017-08-11 2018-01-05 中广热点云科技有限公司 Realize that signal converting intercepts the method and device of multiple video pictures after handling
CN107682741A (en) * 2017-09-28 2018-02-09 惠州Tcl移动通信有限公司 It is dynamically embedded into player method, storage medium and the mobile terminal of video
CN108024147A (en) * 2017-12-15 2018-05-11 深圳Tcl新技术有限公司 Scene playback method, smart television and computer-readable recording medium
CN108881947A (en) * 2017-05-15 2018-11-23 阿里巴巴集团控股有限公司 A kind of infringement detection method and device of live stream
CN109511010A (en) * 2017-09-15 2019-03-22 广州市动景计算机科技有限公司 Method for processing video frequency, video process apparatus, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110662113B (en) 2021-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant