CN111193965B - Video playing method, video processing method and device


Info

Publication number
CN111193965B
Authority
CN
China
Prior art keywords
video
target
image area
subtitle
video frame
Prior art date
Legal status
Active
Application number
CN202010044374.9A
Other languages
Chinese (zh)
Other versions
CN111193965A (en)
Inventor
周霆
王健
刘小辉
陈海龙
庹虎
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010044374.9A
Publication of CN111193965A
Application granted
Publication of CN111193965B
Legal status: Active

Classifications

    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/8133: Monomedia components involving additional data specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program

Abstract

The embodiment of the invention provides a video playing method, a video processing method and a video processing device, and belongs to the technical field of video. The video playing method comprises the following steps: receiving target information; and displaying the video image in the first target image area of each video frame while simultaneously displaying the target subtitle content. In the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played, an external subtitle corresponding to the video to be played is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect.

Description

Video playing method, video processing method and device
Technical Field
The present invention relates to the field of video technologies, and in particular, to a video playing method, a video processing method, and a video processing device.
Background
With the popularization of terminal devices (such as mobile phones and tablet computers), and because such devices are small and portable, more and more users watch videos on them.
In the prior art, there is a video playing mode in which only a local image of the video to be played is used for picture rendering; that is, only part of the image of each video frame of the video to be played is displayed during playback.
However, in this playing mode, since only a local image of each video frame is displayed, the subtitles corresponding to the video image may likewise be only partially displayed, which degrades the video playing effect.
Disclosure of Invention
The invention provides a video playing method, a video processing method and corresponding devices, to alleviate, at least to some extent, the prior-art problem that only partial subtitles are displayed when only a local image of the video to be played is used for picture rendering.
In a first aspect of the present invention, a video playing method is provided, which is applied to a terminal device, and the video playing method includes:
receiving target information; wherein the target information includes: a target video and subtitle content corresponding to the target video;
displaying the video image in the first target image area in each video frame, and simultaneously displaying the target subtitle content; the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is complete subtitle content corresponding to the displayed video image.
Optionally, the subtitle content is an external subtitle obtained from the embedded subtitle in the target video.
Optionally, before the displaying the video image in the first target image area in each video frame, the video playing method further includes:
removing the embedded subtitles in the second target image area;
the second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; the target video frame is a video frame in the target video that carries an embedded subtitle.
Optionally, the removing the embedded subtitle in the second target image region includes:
identifying the outline of the embedded subtitle in the second target image area;
determining pixel points corresponding to the embedded subtitles according to the outline of the embedded subtitles;
and replacing the color values of the pixel points corresponding to the embedded subtitle with a preset color value.
Optionally, the target information further includes position information, where the position information is coordinate point information of the same position in each video frame, or center coordinate point information of a third target image area in each video frame; the third target image area is a local image area in each video frame, and the position of the third target image area in each video frame changes dynamically.
Optionally, before the displaying the video image in the first target image region in each video frame, the video playing method further includes:
and determining the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
In a second aspect of the present invention, there is provided a video processing method applied to a server, the video processing method including:
sending the target information to the terminal device; wherein the target information includes: a target video and subtitle content corresponding to the target video; and the subtitle content is acquired and stored in advance by the server.
Optionally, before the sending the target information to the terminal device, the video processing method further includes:
and acquiring the subtitle content.
Optionally, the obtaining the subtitle content includes:
performing subtitle recognition on video frames in the target video at intervals of a preset number of frames; wherein the preset number of frames is greater than or equal to 0;
grouping the identified initial subtitle contents according to similarity;
taking a union of the initial subtitle contents in each group, and ordering the words in the resulting union using the length of the longest common subsequence as the metric; wherein the longest common subsequence is the longest common subsequence between the words in the union and the initial subtitle contents in the corresponding group;
and determining the ordered words as the subtitle content.
Optionally, the target information further includes position information, where the position information is coordinate point information of the same position in each video frame, or center coordinate point information of a third target image area in each video frame; the video frame is a video frame in the target video; the third target image area is a local image area in each video frame, and the position of the third target image area in each video frame changes dynamically.
In a third aspect of the present invention, there is provided a video playing apparatus applied to a terminal device, the video playing apparatus including:
the receiving module is used for receiving target information; wherein the target information includes: a target video and subtitle content corresponding to the target video;
the display module is used for displaying the video image in the first target image area in each video frame and simultaneously displaying the target subtitle content; wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is complete subtitle content corresponding to the displayed video image.
Optionally, the subtitle content is an external subtitle obtained from the embedded subtitle in the target video.
Optionally, the video playing apparatus further includes:
the subtitle processing module is used for removing the embedded subtitles in the second target image area;
the second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; the target video frame is a video frame in the target video that carries an embedded subtitle.
Optionally, the subtitle processing module includes:
the first identification unit is used for identifying the outline of the embedded subtitle in the second target image area;
the first determining unit is used for determining pixel points corresponding to the embedded subtitles according to the outline of the embedded subtitles;
and the pixel processing unit is used for replacing the color value of the pixel point corresponding to the embedded subtitle with a preset color value.
Optionally, the target information further includes position information, where the position information is coordinate point information of the same position in each video frame, or center coordinate point information of a third target image area in each video frame; the third target image area is a local image area in each video frame, and the position of the third target image area in each video frame changes dynamically.
Optionally, the video playing apparatus further includes:
and the determining module is used for determining the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
In a fourth aspect of the present invention, there is provided a video processing apparatus applied to a server, the video processing apparatus including:
the sending module is used for sending the target information to the terminal device; wherein the target information includes: a target video and subtitle content corresponding to the target video; and the subtitle content is acquired and stored in advance by the server.
Optionally, the video processing apparatus further includes:
and the acquisition module is used for acquiring the subtitle content.
Optionally, the obtaining module includes:
the second recognition unit is used for performing subtitle recognition on video frames in the target video at intervals of a preset number of frames; wherein the preset number of frames is greater than or equal to 0;
a grouping unit for grouping the identified initial caption contents according to the similarity;
the ordering unit is used for taking a union of the initial subtitle contents in each group and ordering the words in the resulting union using the length of the longest common subsequence as the metric; wherein the longest common subsequence is the longest common subsequence between the words in the union and the initial subtitle contents in the corresponding group;
and the second determining unit is used for determining the ordered words as the subtitle content.
Optionally, the target information further includes position information, where the position information is coordinate point information of the same position in each video frame, or center coordinate point information of a third target image area in each video frame; the video frame is a video frame in the target video; the third target image area is a local image area in each video frame, and the position of the third target image area in each video frame changes dynamically.
In a fifth aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
and a processor for implementing the steps of the video playing method or implementing the steps of the video processing method when executing the program stored in the memory.
In a sixth aspect of the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the video playing method described above or implements the video processing method described above.
In a seventh aspect of the embodiments of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the video playing method as described above or execute the video processing method as described above.
Compared with the prior art, the invention has the following advantages:
in the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), an external subtitle corresponding to the video to be played (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the invention are described below so that the technical means of the invention can be understood more clearly, and so that the above and other objects, features, and advantages of the invention become more readily apparent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below.
Fig. 1 is a schematic flowchart of a video playing method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another video playing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a subtitle removal process according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
fig. 5 is a flowchart illustrating another video processing method according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating interaction between a terminal device and a server according to an embodiment of the present invention;
fig. 7 is a block diagram of a video playing apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram of another video playing apparatus according to an embodiment of the present invention;
fig. 9 is a block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 10 is a block diagram of another video processing apparatus according to an embodiment of the present invention;
fig. 11 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a schematic flowchart of a video playing method according to an embodiment of the present invention. The video playing method is applied to a terminal device. The terminal devices described herein may include, but are not limited to: electronic devices with a video playing function, such as mobile phones, tablet computers, and wearable devices.
As shown in fig. 1, the video playing method may include:
step 101: target information is received.
The target information in this step includes: the target video and the subtitle content corresponding to the target video. The subtitle content corresponding to the target video is an external subtitle, i.e., a file independent of the target video.
The target information may be sent by a server or by another terminal device. For example, when the sender of the target information is a server, the terminal device may first send a request message to the server, where the request message requests that the target video be played in a target playing mode; the target playing mode is a playing mode in which the display is rendered with the video image in the first target image area of each video frame of the target video. After receiving the request message, the server sends the target information to the terminal device.
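For concreteness, the request/response exchange above could carry data shaped roughly as follows. This is a minimal sketch only: the field names and values are assumptions for illustration and are not specified by this embodiment.

```python
import json

# Hypothetical request message from the terminal device: it asks for the
# target video to be played in the target playing mode described above.
request_message = {
    "video_id": "target-video-001",   # assumed identifier of the target video
    "play_mode": "target",            # render only the first target image area
}

# Hypothetical target information returned by the server: the target video
# (here referenced by URL) plus the external subtitle, kept as independent data.
target_information = {
    "video_url": "https://example.com/target-video-001.mp4",
    "subtitle": [
        {"start_ms": 12000, "end_ms": 15500, "text": "clear weather"},
    ],
}

print(json.dumps(request_message))
print(json.dumps(target_information))
```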
The original subtitle corresponding to the target video can be an embedded subtitle or an external subtitle.
Step 102: and displaying the video image in the first target image area in each video frame and simultaneously displaying the target subtitle content.
The video frame in this step is a video frame in the target video.
The first target image area in this step is a local image area in each video frame.
The target subtitle content described in this step is the complete subtitle content corresponding to the displayed video image.
Generally, the target subtitle content corresponding to the target video is played synchronously with the target video according to the playing time of the target video.
In the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), an external subtitle corresponding to the video to be played (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect.
Fig. 2 is a schematic flowchart of another video playing method according to an embodiment of the present invention. The video playing method is applied to a terminal device. The terminal devices described herein may include, but are not limited to: electronic devices with a video playing function, such as mobile phones, tablet computers, and wearable devices.
As shown in fig. 2, the video playing method may include:
step 201: target information is received.
The target information in this step includes: the target video and the subtitle content corresponding to the target video.
In the embodiment of the invention, an embedded subtitle is embedded in the target video, and the subtitle content corresponding to the target video is an external subtitle obtained from the embedded subtitle in the target video. The external subtitle and the target video are two independent files.
In the embodiment of the invention, the target information may be sent by a server or by another terminal device. For example, when the sender of the target information is a server, the terminal device may first send a request message to the server, where the request message requests that the target video be played in a target playing mode; the target playing mode is a playing mode in which the display is rendered with the video image in the first target image area of each video frame of the target video. After receiving the request message, the server sends the target information to the terminal device.
Step 202: and removing the embedded subtitles in the second target image area.
The second target image area in this step is the entire image area of a target video frame or the first target image area in the target video frame. The target video frame is a video frame in the target video that carries an embedded subtitle.
In the embodiment of the invention, before the video image in the first target image area is displayed, the embedded subtitle in the video frame can be removed, so that the external subtitle presents a more natural display effect.
Optionally, the embedded subtitle may be removed by blurring the image region where it is located, or it may be hidden by overlaying a mask (e.g., a solid-color mask) on it.
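A minimal sketch of these two options in Python/OpenCV, assuming the rectangle bounding the embedded subtitle is already known; the kernel size and the mask color are arbitrary illustrative choices, not values given by this embodiment.

```python
import cv2

def hide_embedded_subtitle(frame, region, mode="blur"):
    """Hide the embedded subtitle inside `region` of a BGR frame.

    region: assumed (x, y, w, h) rectangle bounding the embedded subtitle.
    mode:   "blur" blurs the region; anything else covers it with a solid mask.
    """
    x, y, w, h = region
    roi = frame[y:y + h, x:x + w]
    if mode == "blur":
        # Blur the image region where the embedded subtitle is located.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    else:
        # Block the embedded subtitle with a solid-color (black) mask.
        frame[y:y + h, x:x + w] = (0, 0, 0)
    return frame
```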
Optionally, the removal of the embedded subtitles may be performed in real time during the playing process of the target video, or may be performed before the playing of the target video.
Step 203: and displaying the video image in the first target image area in each video frame and simultaneously displaying the target subtitle content.
The video frame in this step is a video frame in the target video.
The first target image area in this step is a local image area in each video frame.
The target subtitle content in this step is the complete subtitle content corresponding to the displayed video image.
Generally, the target subtitle content corresponding to the target video is played synchronously with the target video according to the playing time of the target video.
In the embodiment of the invention, the displayed video image is the image without the embedded subtitle.
In the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), an external subtitle corresponding to the video to be played (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect. In addition, in the embodiment of the invention, the embedded subtitle in the video to be played is removed, so that the external subtitle has a more natural display effect, further improving the playing effect.
Optionally, step 202: removing the embedded subtitle in the second target image region may include:
identifying the outline of the embedded subtitle in the second target image area; determining the pixel points corresponding to the embedded subtitle according to the outline of the embedded subtitle; and replacing the color values of the pixel points corresponding to the embedded subtitle with a preset color value.
In the embodiment of the invention, the color values of the pixel points corresponding to the embedded subtitle can be replaced with other color values (i.e., preset color values), thereby removing the embedded subtitle. Optionally, the preset color value may be a weighted average of the color values of the pixel points within a preset range around the embedded subtitle (i.e., its neighborhood pixels).
Optionally, in the embodiment of the present invention, when determining the pixel points corresponding to the embedded subtitle according to its outline, the outline of the embedded subtitle may first be dilated to remove the holes inside it. The embedded subtitle with the holes removed is then eroded to remove random noise points. Finally, the coverage of the embedded subtitle is determined from the denoised result, and thereby the pixel points corresponding to the embedded subtitle: the pixel points within the coverage of the embedded subtitle are the pixel points corresponding to it.
To better understand the above subtitle removal process, an example is described below.
Assume that the goal in this example is to remove the embedded subtitle in the first target image area.
As shown in fig. 3, "Iron burn" and "Iron burns" shown in 301 are embedded subtitles within the first target image area.
First, edge-detection (i.e., Sobel) convolution kernels are applied to the image in the first target image area in the horizontal and vertical directions to identify the outline of the embedded subtitle; the result is shown as 302 in fig. 3.
Next, the identified outline of the embedded subtitle is dilated (i.e., Dilation) to remove the holes in the outline; the result is shown as 303 in fig. 3.
Then, the embedded subtitle with the holes removed is eroded (i.e., Erosion) to remove random noise points, and the image region corresponding to the embedded subtitle is determined; the result is shown as 304 in fig. 3.
Then, using the Fast Marching Method (FMM), the pixels in the image region corresponding to the embedded subtitle are replaced, from the outside in, by weighted combinations of the pixels outside that region; the result is shown as 305 in fig. 3.
Finally, after the foregoing processing, when the target video is played, the external subtitle corresponding to the current video frame ("Iron burn" and "Iron burns") is displayed over the repaired video image, as shown by 306 in fig. 3.
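The pipeline illustrated by fig. 3 can be sketched with OpenCV roughly as follows. The threshold, kernel size, iteration counts, and inpainting radius are illustrative assumptions rather than values disclosed by this embodiment; cv2.INPAINT_TELEA is OpenCV's implementation of Fast Marching Method inpainting.

```python
import cv2
import numpy as np

def remove_embedded_subtitle(frame):
    """Remove a burned-in subtitle from a BGR frame, following the steps
    illustrated in fig. 3 (302-305)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 302: Sobel convolution in the horizontal and vertical directions
    # picks out the outline of the subtitle strokes.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    _, mask = cv2.threshold(edges, 80, 255, cv2.THRESH_BINARY)

    # 303: dilation fills the holes inside the stroke outlines.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=2)

    # 304: erosion removes isolated random noise points.
    mask = cv2.erode(mask, kernel, iterations=1)

    # 305: Fast Marching Method inpainting replaces pixels inside the mask,
    # from the outside in, with weighted combinations of nearby outside pixels.
    return cv2.inpaint(frame, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```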
Optionally, the target information further includes position information (referred to herein as first position information) of the embedded subtitle within the video frame in which it is located. With this position information, when removing the embedded subtitle, the position of the subtitle to be removed within the target video frame can be determined quickly and accurately; the whole video frame does not need to be analyzed, only a specific local image, so the computation load is small, the efficiency is high, and the power consumption of the terminal device is reduced.
Optionally, the second target image area may also be: an image area in the target video frame other than the first target image area.
After the display is rendered with the video image in the first target image area of each video frame, the user may want to view the video image outside the first target image area. For example, when the target video is a landscape video, the terminal device displays the video image in the first target image area of each video frame of the landscape video in portrait mode. Assume that, at this point, the terminal device has removed only the embedded subtitle within the first target image area. Since the user may want to view the video image to the right or to the left of the first target image area, when an operation for displaying a video image outside the first target image area is detected, the image area to be displayed can be determined, and the embedded subtitle in that image area can then be removed. Further, assume that a left-swipe on the screen of the terminal device is an operation for viewing the image area to the right of the first target image area, and a right-swipe is an operation for viewing the image area to its left. On detecting a left-swipe, the terminal device removes the embedded subtitle in the image area to the right of the first target image area; on detecting a right-swipe, it removes the embedded subtitle in the image area to the left; a sketch of this follows below.
It should be noted that "left" and "right" are relative to the user; for example, when the screen of the terminal device faces the user, the right side of the second target image area is on the user's right, and its left side is on the user's left.
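A minimal sketch of shifting the displayed area in response to such swipes; the function name and the fixed step size are assumptions for illustration.

```python
def shift_target_area(area, gesture, frame_w, step=120):
    """Shift the first target image area horizontally in response to a swipe.

    area:    (x, y, w, h) of the currently displayed first target image area
    gesture: "swipe_left" reveals the image area to the right, "swipe_right"
             the image area to the left (directions relative to the user)
    """
    x, y, w, h = area
    if gesture == "swipe_left":
        x = min(x + step, frame_w - w)   # stay inside the video frame
    elif gesture == "swipe_right":
        x = max(x - step, 0)
    # The embedded subtitle in the newly revealed area would be removed here.
    return x, y, w, h
```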
Optionally, the target information further includes position information (denoted here as second position information). The position information may be coordinate information of the same position in each video frame, such as the coordinate point information of the center point of each video frame. It may also be the center coordinate point information of a third target image area in each video frame (i.e., the coordinate information of the center point of the third target image area). The third target image area is a local image area in each video frame. In general, the video image in the third target image area is the image content the user pays the most attention to, so the position of the third target image area in each video frame changes dynamically.
In the embodiment of the present invention, the first target image area in each video frame may be determined according to the aforementioned position information (i.e., the second position information), the rotation angle of the terminal device, and the screen size of the terminal device.
For example, when playing the target video, the rotation angle of the terminal device may be determined first; assuming the terminal device is currently held vertically upright, the rotation angle is determined to be 0°, and the image to be displayed is played upright. When the rotation angle is greater than 0°, there is a corresponding angle greater than 0° between the image to be displayed and the terminal screen, so that the image to be displayed always remains upright relative to the user. Then, the first target image area in each video frame is determined based on the position information and the screen size information of the terminal device. For example, the coordinate point corresponding to the position information is taken as the center point of the first target image area; the width of the terminal screen is taken as the width of the first target image area; and the height of the target landscape video is taken as the height of the first target image area, thereby determining the first target image area. In this way, the determined first target image area better fits the screen size of the terminal device and better meets the viewing needs of the user.
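In code, determining the first target image area from the second position information and the screen size might look like the sketch below; rotation handling is omitted, and the names and the clamping behavior are assumptions.

```python
def first_target_area(center, screen_size, frame_size):
    """Derive the first target image area for a landscape video frame.

    center:      (cx, cy) point from the second position information
    screen_size: (screen_w, screen_h) of the terminal in portrait orientation
    frame_size:  (frame_w, frame_h) of the landscape video frame
    """
    cx, cy = center
    screen_w, _ = screen_size
    frame_w, frame_h = frame_size

    # Width follows the terminal screen, height follows the landscape video.
    w = min(screen_w, frame_w)
    h = frame_h

    # Center the area on the given point, clamped to stay inside the frame.
    x = min(max(cx - w // 2, 0), frame_w - w)
    y = min(max(cy - h // 2, 0), frame_h - h)
    return x, y, w, h
```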
Optionally, the target information may further include the correspondence between the subtitle content and the playing time of the target video. For example, if the target video carries an embedded subtitle reading "clear weather", a correspondence is established between that subtitle content and the playing period during which it continuously appears. Alternatively, the correspondence between the subtitle content and the playing time of the target video may be replaced by a correspondence between the subtitle content and video frames: continuing the foregoing example, a correspondence can be established between the subtitle content and the video frames in which it continuously appears.
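One plausible shape for this correspondence, together with the lookup used for synchronized display, is sketched below; the timing values and the list layout are invented for illustration.

```python
# Assumed correspondence between each external subtitle entry and the playing
# time during which its embedded counterpart appears in the target video.
subtitle_track = [
    (12000, 15500, "clear weather"),
    (15500, 19000, "Iron burns"),
]

def subtitle_at(position_ms):
    """Return the complete subtitle content for the current playing time,
    or None if no subtitle is shown at that moment."""
    for start_ms, end_ms, text in subtitle_track:
        if start_ms <= position_ms < end_ms:
            return text
    return None
```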
In the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), an external subtitle corresponding to the video to be played (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect. In addition, in the embodiment of the invention, the embedded subtitle in the video to be played is removed, so that the external subtitle has a more natural display effect, further improving the playing effect.
Fig. 4 is a flowchart illustrating a video processing method according to an embodiment of the present invention. The video processing method is applied to a server (i.e., the cloud).
As shown in fig. 4, the video processing method may include:
step 401: and sending the target information to the terminal equipment.
The target information in this step includes: the target video and the subtitle content corresponding to the target video. The subtitle content corresponding to the target video is the subtitle content acquired and stored in advance by the server; it is an external subtitle, i.e., a file independent of the target video. The original subtitle corresponding to the target video may be an embedded subtitle or an external subtitle.
In the embodiment of the invention, the server can acquire the subtitle content corresponding to the target video in advance and send the target information to the terminal device when the terminal device needs it, so that the terminal device displays the video image in the first target image area of each video frame of the target video while simultaneously displaying the target subtitle content. The first target image area is a local image area in each video frame. The target subtitle content is the complete subtitle content corresponding to the displayed video image.
For example, in order to obtain the target information, the terminal device may first send a request message to the server, where the request message requests that the target video be played in a target playing mode; the target playing mode is a playing mode in which the display is rendered with the video image in the first target image area of each video frame of the target video. After receiving the request message, the server sends the target information to the terminal device.
In the embodiment of the invention, the subtitle content sent by the server to the terminal device is the external subtitle corresponding to the target video, acquired in advance by the server. Because the external subtitle and the video to be played are two independent files, the terminal device can control the external subtitle independently, so that it is displayed completely on the screen of the terminal device, which improves the playing effect.
Fig. 5 is a schematic flowchart of another video processing method according to an embodiment of the present invention. The video processing method is applied to a server (i.e., the cloud).
As shown in fig. 5, the video processing method may include:
step 501: and acquiring subtitle content corresponding to the target video.
In the embodiment of the invention, an embedded subtitle is embedded in the target video, and the obtained subtitle content corresponding to the target video is an external subtitle obtained from the embedded subtitle in the target video. The external subtitle and the target video are two independent files.
Alternatively, after the server obtains the subtitle content, the subtitle content and the target video may be separately stored.
Step 502: and sending the target information to the terminal equipment.
The target information in this step includes: the target video and the subtitle content corresponding to the target video.
In the embodiment of the invention, the server can acquire the subtitle content corresponding to the target video in advance and send the target information to the terminal device when the terminal device needs it, so that the terminal device displays the video image in the first target image area of each video frame of the target video while simultaneously displaying the target subtitle content. The first target image area is a local image area in each video frame. The target subtitle content is the complete subtitle content corresponding to the displayed video image.
For example, in order to obtain the target information, the terminal device may first send a request message to the server, where the request message requests that the target video be played in a target playing mode; the target playing mode is a playing mode in which the display is rendered with the video image in the first target image area of each video frame of the target video. After receiving the request message, the server sends the target information to the terminal device.
In the embodiment of the invention, the subtitle content sent by the server to the terminal device is the external subtitle corresponding to the target video, acquired in advance by the server. Because the external subtitle and the video to be played are two independent files, the terminal device can control the external subtitle independently, so that it is displayed completely on the screen of the terminal device, which improves the playing effect.
Optionally, step 501: the obtaining of the subtitle content may include:
performing subtitle recognition on video frames in the target video at intervals of a preset number of frames; grouping the identified initial subtitle contents according to similarity; taking a union of the initial subtitle contents in each group, and ordering the words in the resulting union using the length of the longest common subsequence as the metric; and determining the ordered words as the subtitle content corresponding to the target video.
Wherein the preset number of frames is greater than or equal to 0.
In the embodiment of the present invention, the subtitle content in the target video may be recognized by Optical Character Recognition (OCR) technology. When recognizing the subtitles in the target video, recognition may be performed on every video frame, or once every several video frames to reduce the data processing load. The specific choice can be made as needed.
The identified subtitle contents (i.e., the initial subtitle contents) may be grouped according to their similarity, i.e., initial subtitle contents determined by similarity to be the same subtitle are placed in one group. Optionally, when grouping, the subtitle recognition results (i.e., the initial subtitle contents) of two adjacent video frames may be compared for similarity in sequence, following the order of the video frames in the target video. If the similarity is greater than or equal to a preset similarity value, the recognition results of the two video frames are placed in the same group. If the similarity is smaller than the preset similarity value, the two video frames are regarded as the boundary between two groups: the earlier frame belongs to one group, and the later frame belongs to the other. "Two adjacent video frames" here means two adjacent frames among those on which subtitle recognition was performed.
After grouping is completed, a union of the initial subtitle contents in the same group is taken, and the words in the union are ordered using the length of the longest common subsequence as the metric; the ordered words are determined as the subtitle content corresponding to the target video. The longest common subsequence here is the longest common subsequence between the words in the union and the initial subtitle contents in the corresponding group. In this way, the accuracy of the subtitle content is better guaranteed.
Assume the initial subtitle contents in a group are subtitle content A: "what appears, this time all reflected on the scroll" and subtitle content B: "show, this time all reflected on the rolled surface". If subtitle content A and subtitle content B are merged directly, then because the elements of the union are unordered, the word order of the merged subtitle content will likely be inconsistent with the word order of the original subtitle, making the recognition result inaccurate. In the embodiment of the present invention, to overcome this problem, after the union is obtained, the words in the union may be ordered using the length of the longest common subsequence as the metric; that is, the words in the resulting union must share the longest common subsequence with subtitle contents A and B. In this example, the longest common subsequence between the words in the union and subtitle contents A and B is "what appears, this time all reflected in the scroll", so the result of ordering the words in the union, measured by the length of the longest common subsequence, is: "what behaves, this time all react on the roll faces".
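The grouping and union-plus-LCS ordering just described can be sketched as follows. The similarity threshold and the pairwise word-level LCS merge are one plausible realization of the description, not the exact procedure of this embodiment.

```python
from difflib import SequenceMatcher

def group_captions(ocr_results, threshold=0.8):
    """Group consecutive OCR results (initial subtitle contents): a similarity
    below `threshold` between adjacent results marks a group boundary."""
    if not ocr_results:
        return []
    groups, current = [], [ocr_results[0]]
    for prev, cur in zip(ocr_results, ocr_results[1:]):
        if SequenceMatcher(None, prev, cur).ratio() >= threshold:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    return groups

def lcs_merge(a, b):
    """Merge two word lists into an ordered union: words on the longest common
    subsequence are kept once, the rest are interleaved around them."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def merge_group(group):
    """Reduce one group of recognized strings to a single subtitle."""
    words = group[0].split()
    for candidate in group[1:]:
        words = lcs_merge(words, candidate.split())
    return " ".join(words)
```

Applied to subtitle contents A and B above, merge_group keeps the words they share in their common order and interleaves the differing words around them.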
Optionally, in the embodiment of the present invention, when acquiring the subtitle content, grammar correction may also be performed on English subtitles, which can further improve the accuracy of the subtitle content.
Optionally, the target information further includes position information (referred to herein as first position information) of the embedded subtitle within the video frame in which it is located. With this position information, when removing the embedded subtitle, the position of the subtitle to be removed within the target video frame can be determined quickly and accurately; the whole video frame does not need to be analyzed, only a specific local image, so the computation load is small, the efficiency is high, and the power consumption of the terminal device is reduced.
Optionally, the target information further includes position information (denoted here as second position information). The position information may be coordinate information of the same position in each video frame, such as the coordinate point information of the center point of each video frame. It may also be the center coordinate point information of a third target image area in each video frame (i.e., the coordinate information of the center point of the third target image area). The video frames described here are video frames in the target video. The third target image area is a local image area in each video frame. In general, the video image in the third target image area is the image content the user pays the most attention to, so the position of the third target image area in each video frame changes dynamically.
Alternatively, the terminal device may determine the first target image area in each video frame according to the aforementioned position information (i.e., the second position information), the rotation angle of the terminal device, and the screen size of the terminal device.
For example, when playing the target video, the rotation angle of the terminal device may be determined; assuming the terminal device is currently held vertically upright, the rotation angle is determined to be 0°, and the image to be displayed is shown upright relative to the screen of the terminal device. When the rotation angle is greater than 0°, there is also an angle greater than 0° between the image to be displayed and the screen, but the image to be displayed is always shown upright relative to the user. Then, the first target image area in each video frame is determined based on the position information and the screen size information of the terminal device. For example, the coordinate point corresponding to the position information is taken as the center point of the first target image area; the width of the terminal screen is taken as the width of the first target image area; and the height of the target landscape video is taken as the height of the first target image area, thereby determining the first target image area. In this way, the determined first target image area better fits the screen size of the terminal device and better meets the viewing needs of the user. In addition, in the embodiment of the invention, although the terminal device displays only the video image in the first target image area, the video actually played is still the whole target video, and no re-clipping or re-splicing of the video is performed on the server side, so the integrity of the program's plot and its copyright can be fully ensured. Moreover, since the video does not need to be re-encoded and redistributed, video production efficiency and distribution quality of service are better guaranteed.
Optionally, the embedded subtitle in the target video may be removed on the server; the processing is similar to the embedded-subtitle removal performed by the terminal device. When the server processes the embedded subtitles, it can process the embedded subtitle in the whole video frame. For example: recognizing the outline of the embedded subtitle in the target video frame, dilating the recognized outline, and removing the holes in the outline; eroding the embedded subtitle with the holes removed to remove random noise points; determining the coverage of the embedded subtitle from the denoised result, and thereby the pixel points corresponding to the embedded subtitle; and finally, replacing the color values of the pixel points corresponding to the embedded subtitle with a preset color value. The preset color value may be a weighted average of the color values of the pixel points within a preset range around the embedded subtitle (i.e., its neighborhood pixels). The target video frame is a video frame in the target video that carries an embedded subtitle.
Performing the subtitle removal on the server side means that the terminal device does not need to process the embedded subtitle in the target video, which reduces the data processing load of the terminal device to some extent and lowers its power consumption.
In the embodiment of the invention, the subtitle content sent by the server to the terminal device is the external subtitle corresponding to the target video, acquired in advance by the server. Because the external subtitle and the video to be played are two independent files, the terminal device can control the external subtitle independently, so that it is displayed completely on the screen of the terminal device, which improves the playing effect.
Fig. 6 is a schematic diagram of interaction between a server and a terminal device according to an embodiment of the present invention. Through the description of the interaction example, the technical solution provided by the embodiment of the present invention is further explained.
It is assumed that, in the example, the first target image area is determined with the center coordinate point information of the third target image area.
As shown in fig. 6, on the server side, the third target image area of each video frame of the target video is determined by Artificial Intelligence (AI) techniques, the center coordinate point information of the third target image area is acquired, and the embedded subtitles of each video frame of the target video are recognized to obtain the subtitle content (i.e., the external subtitle). The center coordinate point information of the third target image area and the subtitle content are then stored on the cloud playback control platform.
When the terminal device detects a user-triggered request to play the target video in the target playing mode (explained above and not repeated here), it downloads the target video, the center coordinate point information of the third target image area, and the subtitle content from the cloud playback control platform. The terminal device determines the first target image area in each video frame of the target video according to the center coordinate point information of the third target image area, the rotation angle of the terminal device, and its screen size; it displays the video image in the first target image area through the player and synchronously displays the complete external subtitle corresponding to the current video image. In addition, the terminal device can remove the original subtitle content (i.e., the embedded subtitle) in the target video and repair the pixel information corresponding to the embedded subtitle using the pixel information around it.
Fig. 7 is a block diagram of a video playing apparatus according to an embodiment of the present invention, where the video playing apparatus is applied to a terminal device.
As shown in fig. 7, the video playing apparatus 700 may include:
a receiving module 701, configured to receive target information.
Wherein the target information includes: the method comprises the steps of target video and subtitle content corresponding to the target video.
A display module 702, configured to display the video image in the first target image area in each video frame and display the target subtitle content at the same time.
The video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is complete subtitle content corresponding to the displayed video image.
In the embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), an external subtitle corresponding to the video to be played (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently, so that it can be displayed completely on the screen of the terminal device, which improves the playing effect.
Fig. 8 is a block diagram of another video playing apparatus provided in an embodiment of the present invention, where the video playing apparatus is applied to a terminal device.
As shown in Fig. 8, the video playing apparatus 800 may include:
A receiving module 801, configured to receive target information.
Wherein the target information includes: the target video and the subtitle content corresponding to the target video.
A display module 802, configured to display the video image in the first target image area in each video frame, and simultaneously display the target subtitle content.
Wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is complete subtitle content corresponding to the displayed video image.
Optionally, the subtitle content is an external subtitle obtained according to an embedded subtitle in the target video.
Optionally, the video playing apparatus 800 further includes:
A subtitle processing module 803, configured to remove the embedded subtitle in the second target image area.
The second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; the target video frame is a video frame of the target video that contains an embedded subtitle.
Optionally, the subtitle processing module 803 includes:
the first identifying unit 8031 is configured to identify an outline of the embedded subtitle within the second target image area.
The first determining unit 8032 is configured to determine, according to the outline of the embedded subtitle, a pixel point corresponding to the embedded subtitle.
A pixel processing unit 8033, configured to replace the color value of each pixel corresponding to the embedded subtitle with a preset color value.
Optionally, the target information further includes position information, which is either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
Optionally, the video playing apparatus 800 further includes:
A determining module 804, configured to determine the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
In this embodiment of the invention, before picture rendering is performed using only a local image of the video to be played (i.e., the video image in the first target image area of the target video), the external subtitle corresponding to that video (i.e., the subtitle content corresponding to the target video) is obtained. Because the external subtitle and the video to be played are two independent files, the external subtitle can be controlled independently and displayed in full on the terminal device's screen, improving the playing effect.
Fig. 9 is a block diagram of a video processing apparatus according to an embodiment of the present invention, where the video processing apparatus is applied to a server.
As shown in fig. 9, the video processing apparatus 900 may include:
A sending module 901, configured to send the target information to the terminal device.
Wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is subtitle content that the server has acquired and stored in advance.
In this embodiment of the invention, the subtitle content that the server sends to the terminal device is an external subtitle, corresponding to the target video, that the server has acquired in advance. Because the external subtitle and the video to be played are two independent files, the terminal device can control the external subtitle independently, so the subtitle is displayed in full on the device's screen and the playing effect is improved.
Fig. 10 is a block diagram of another video processing apparatus provided in an embodiment of the present invention, where the video processing apparatus is applied to a server.
As shown in fig. 10, the video processing apparatus 1000 may include:
A sending module 1001, configured to send the target information to the terminal device.
Wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is subtitle content that the server has acquired and stored in advance.
Optionally, the video processing apparatus 1000 further includes:
An obtaining module 1002, configured to obtain the subtitle content.
Optionally, the obtaining module 1002 includes:
A second identification unit 1021, configured to perform subtitle recognition on video frames of the target video every preset number of frames.
Wherein the preset number of frames is greater than or equal to 0.
A grouping unit 1022, configured to group the recognized initial subtitle content by similarity.
A sorting unit 1023, configured to take the union of the initial subtitle content in each group and sort the words in the resulting union using the length of the longest common subsequence as the measure.
The longest common subsequence here is the longest common subsequence between the words in the union and the initial subtitle content in the corresponding group.
A second determining unit 1024, configured to determine the sorted words as the subtitle content.
Optionally, the target information further includes position information, which is either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the video frame is a video frame of the target video; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
In this embodiment of the invention, the subtitle content that the server sends to the terminal device is an external subtitle, corresponding to the target video, that the server has acquired in advance. Because the external subtitle and the video to be played are two independent files, the terminal device can control the external subtitle independently, so the subtitle is displayed in full on the device's screen and the playing effect is improved.
Because the apparatus embodiments above are substantially similar to the method embodiments, they are described briefly; for the relevant details, refer to the corresponding parts of the method embodiments.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 11, including a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, where the processor 1101, the communication interface 1102, and the memory 1103 communicate with one another through the communication bus 1104.
A memory 1103 for storing a computer program;
when the electronic device is a terminal device, the processor 1101 is configured to execute the program stored in the memory 1103, and implement the following steps:
receiving target information; and displaying the video image in the first target image area in each video frame and simultaneously displaying the target subtitle content.
Wherein the target information includes: a target video and subtitle content corresponding to the target video; the video frame is a video frame of the target video; the first target image area is a local image area in each video frame; the target subtitle content is the complete subtitle content corresponding to the displayed video image.
Optionally, before the video image in the first target image area in each video frame is displayed, when the processor 1101 executes the program stored in the memory 1103, the following steps are further implemented:
removing the embedded subtitle in the second target image area.
The second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; the target video frame is a video frame of the target video that contains an embedded subtitle.
Optionally, the removing the embedded subtitle in the second target image area includes:
identifying the outline of the embedded subtitle in the second target image area;
determining the pixels corresponding to the embedded subtitle according to the outline;
and replacing the color value of each pixel corresponding to the embedded subtitle with a preset color value.
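A minimal sketch of these three steps, assuming OpenCV: thresholding stands in for outline identification, a dilated mask marks the subtitle's pixels, and inpainting is used as one concrete way to replace their color values from surrounding pixels (a fixed preset color would work as well). The region, threshold value, and kernel size are illustrative assumptions:

```python
# Hypothetical sketch: erase bright embedded subtitle text inside a
# given region of a BGR frame by masking it and inpainting it away.
import cv2
import numpy as np

def remove_embedded_subtitle(frame, region):
    """region: (x, y, w, h) of the second target image area."""
    x, y, w, h = region
    roi = frame[y:y + h, x:x + w]

    # Step 1: identify the subtitle outline, here by thresholding for
    # near-white text and dilating to cover anti-aliased edges.
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)

    # Steps 2 and 3: the mask marks the pixels of the subtitle, and
    # inpainting replaces their color values from surrounding pixels.
    frame[y:y + h, x:x + w] = cv2.inpaint(roi, mask, 3, cv2.INPAINT_TELEA)
    return frame
```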
Optionally, before displaying the video image in the first target image area in each video frame, when the processor 1101 executes the program stored in the memory 1103, the following steps are further implemented:
determining the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
When the electronic device is a server, the processor 1101 is configured to execute the program stored in the memory 1103 and implement the following steps:
sending the target information to the terminal device.
Wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is subtitle content that the server has acquired and stored in advance.
Optionally, before the target information is sent to the terminal device, when the processor 1101 executes the program stored in the memory 1103, the following steps are further implemented:
acquiring the subtitle content.
Optionally, the obtaining the subtitle content includes:
performing subtitle recognition on video frames of the target video every preset number of frames, wherein the preset number of frames is greater than or equal to 0;
grouping the recognized initial subtitle content by similarity;
taking the union of the initial subtitle content in each group and sorting the words in the resulting union using the length of the longest common subsequence as the measure, wherein the longest common subsequence is the longest common subsequence between the words in the union and the initial subtitle content in the corresponding group;
and determining the sorted words as the subtitle content.
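The sorting criterion above is stated abstractly, so the sketch below makes one plausible reading concrete: take the union of the words recognized across a group, then pick the ordering of that union that maximizes the total longest-common-subsequence length against every recognized variant. The exhaustive search over permutations is only workable because subtitle lines are short; all names here are illustrative assumptions:

```python
# Hypothetical sketch: merge several noisy recognitions of the same
# subtitle line into one word sequence, scored by LCS agreement.
from itertools import permutations

def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def merge_group(group):
    """group: lists of tokens recognized for the same subtitle line.
    Returns the ordering of the union of tokens that maximizes the
    total LCS length against every recognized variant."""
    union = list(dict.fromkeys(t for line in group for t in line))
    return list(max(permutations(union),
                    key=lambda cand: sum(lcs_len(list(cand), line)
                                         for line in group)))

# Example: three OCR passes over one line, each missing a word.
group = [["the", "quick", "fox"],
         ["the", "brown", "fox"],
         ["quick", "brown", "fox"]]
print(merge_group(group))  # -> ['the', 'quick', 'brown', 'fox']
```

With three noisy recognitions that each drop one word, the ordering that best agrees with all of them recovers the full line, which is the point of measuring by LCS length rather than by exact string matches.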
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In still another embodiment provided by the present invention, a computer-readable storage medium is provided that stores instructions which, when run on a computer, cause the computer to execute the video playing method or the video processing method described in the above embodiments.
In yet another embodiment provided by the present invention, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the video playing method or the video processing method described in the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (20)

1. A video playing method, applied to a terminal device, characterized by comprising the following steps:
receiving target information; wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is an external subtitle obtained according to the embedded subtitle in the target video; the subtitle content corresponding to the target video is the complete subtitle content corresponding to the target video;
displaying the video image in the first target image area in each video frame and simultaneously displaying the target subtitle content; wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is the complete subtitle content corresponding to the displayed video image; and the video image is an image from which the embedded subtitle has been removed.
2. The video playing method according to claim 1, wherein before the displaying the video image in the first target image area in each video frame, the video playing method further comprises:
removing the embedded subtitle in the second target image area;
wherein the second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; and the target video frame is a video frame of the target video that contains an embedded subtitle.
3. The video playing method according to claim 2, wherein the removing the embedded subtitle in the second target image area comprises:
identifying the outline of the embedded subtitle in the second target image area;
determining the pixels corresponding to the embedded subtitle according to the outline;
and replacing the color value of each pixel corresponding to the embedded subtitle with a preset color value.
4. The video playing method according to claim 1, wherein the target information further comprises position information, the position information being either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
5. The video playing method according to claim 4, wherein before the displaying the video image in the first target image area in each video frame, the video playing method further comprises:
determining the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
6. A video processing method, applied to a server, characterized by comprising the following steps:
sending target information to a terminal device; wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is an external subtitle obtained by the server according to the embedded subtitle in the target video; the subtitle content corresponding to the target video is the complete subtitle content corresponding to the target video; the terminal device displays the video image in the first target image area in each video frame and simultaneously displays the target subtitle content; wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is the complete subtitle content corresponding to the displayed video image; and the video image is an image from which the embedded subtitle has been removed.
7. The video processing method according to claim 6, wherein before the sending the target information to the terminal device, the video processing method further comprises:
acquiring the subtitle content.
8. The video processing method according to claim 7, wherein said obtaining the subtitle content comprises:
performing subtitle recognition on video frames of the target video every preset number of frames; wherein the preset number of frames is greater than or equal to 0;
grouping the recognized initial subtitle content by similarity;
taking the union of the initial subtitle content in each group and sorting the words in the resulting union using the length of the longest common subsequence as the measure; the longest common subsequence is the longest common subsequence between the words in the union and the initial subtitle content in the corresponding group;
and determining the sorted words as the subtitle content.
9. The video processing method of claim 6, wherein the target information further comprises position information, the position information being either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the video frame is a video frame of the target video; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
10. A video playing apparatus, applied to a terminal device, characterized by comprising:
a receiving module, configured to receive target information; wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is an external subtitle obtained according to the embedded subtitle in the target video; the subtitle content corresponding to the target video is the complete subtitle content corresponding to the target video;
a display module, configured to display the video image in the first target image area in each video frame and simultaneously display the target subtitle content; wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is the complete subtitle content corresponding to the displayed video image; and the video image is an image from which the embedded subtitle has been removed.
11. The video playing apparatus according to claim 10, further comprising:
a subtitle processing module, configured to remove the embedded subtitle in the second target image area;
wherein the second target image area is the entire image area of a target video frame, the first target image area in the target video frame, or the image area of the target video frame other than the first target image area; and the target video frame is a video frame of the target video that contains an embedded subtitle.
12. The video playing apparatus according to claim 11, wherein the subtitle processing module comprises:
a first identification unit, configured to identify the outline of the embedded subtitle in the second target image area;
a first determining unit, configured to determine, according to the outline of the embedded subtitle, the pixels corresponding to the embedded subtitle;
and a pixel processing unit, configured to replace the color value of each pixel corresponding to the embedded subtitle with a preset color value.
13. The video playing apparatus of claim 10, wherein the target information further comprises position information, the position information being either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
14. The video playing apparatus according to claim 13, further comprising:
a determining module, configured to determine the first target image area in each video frame according to the position information and the rotation angle of the terminal device.
15. A video processing apparatus applied to a server, the video processing apparatus comprising:
a sending module, configured to send target information to a terminal device; wherein the target information includes: a target video and subtitle content corresponding to the target video; the subtitle content is an external subtitle obtained by the server according to the embedded subtitle in the target video; the subtitle content corresponding to the target video is the complete subtitle content corresponding to the target video; the terminal device displays the video image in the first target image area in each video frame and simultaneously displays the target subtitle content; wherein the video frame is a video frame in the target video; the first target image area is a local image area in each video frame; the target subtitle content is the complete subtitle content corresponding to the displayed video image; and the video image is an image from which the embedded subtitle has been removed.
16. The video processing apparatus according to claim 15, further comprising:
an acquisition module, configured to acquire the subtitle content.
17. The video processing apparatus of claim 16, wherein the acquisition module comprises:
a second identification unit, configured to perform subtitle recognition on video frames of the target video every preset number of frames; wherein the preset number of frames is greater than or equal to 0;
a grouping unit, configured to group the recognized initial subtitle content by similarity;
a sorting unit, configured to take the union of the initial subtitle content in each group and sort the words in the resulting union using the length of the longest common subsequence as the measure; wherein the longest common subsequence is the longest common subsequence between the words in the union and the initial subtitle content in the corresponding group;
and a second determining unit, configured to determine the sorted words as the subtitle content.
18. The video processing apparatus according to claim 15, wherein the target information further includes position information, the position information being either coordinate point information of the same position in each video frame or center coordinate point information of a third target image area in each video frame; the video frame is a video frame of the target video; the third target image area is a local image area in each video frame, and its position changes dynamically from frame to frame.
19. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus; wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the video playback method according to any one of claims 1 to 5 or the video processing method according to any one of claims 6 to 9 when executing a program stored in the memory.
20. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the video playback method according to any one of claims 1 to 5, or implements the video processing method according to any one of claims 6 to 9.
CN202010044374.9A 2020-01-15 2020-01-15 Video playing method, video processing method and device Active CN111193965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010044374.9A CN111193965B (en) 2020-01-15 2020-01-15 Video playing method, video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010044374.9A CN111193965B (en) 2020-01-15 2020-01-15 Video playing method, video processing method and device

Publications (2)

Publication Number Publication Date
CN111193965A CN111193965A (en) 2020-05-22
CN111193965B true CN111193965B (en) 2022-09-06

Family

ID=70710101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010044374.9A Active CN111193965B (en) 2020-01-15 2020-01-15 Video playing method, video processing method and device

Country Status (1)

Country Link
CN (1) CN111193965B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111669664B (en) * 2020-06-17 2022-06-07 Oppo广东移动通信有限公司 Video playing method, video playing device, electronic equipment and storage medium
CN112055246B (en) * 2020-09-11 2022-09-30 北京爱奇艺科技有限公司 Video processing method, device and system and storage medium
CN112887781A (en) * 2021-01-27 2021-06-01 维沃移动通信有限公司 Subtitle processing method and device
CN113014972B (en) * 2021-01-28 2024-02-27 维沃移动通信有限公司 Screen projection method, device and system
CN114979788A (en) * 2021-02-24 2022-08-30 上海哔哩哔哩科技有限公司 Bullet screen display method and device
CN113452935B (en) * 2021-08-31 2021-11-09 成都索贝数码科技股份有限公司 Horizontal screen and vertical screen live video generation system and method
CN114827745B (en) * 2022-04-08 2023-11-14 海信集团控股股份有限公司 Video subtitle generation method and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716655A (en) * 2013-12-16 2014-04-09 乐视致新电子科技(天津)有限公司 Subtitle conversion method and apparatus
CN106254933A (en) * 2016-08-08 2016-12-21 腾讯科技(深圳)有限公司 Subtitle extraction method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533474B (en) * 2008-03-12 2014-06-04 三星电子株式会社 Character and image recognition system based on video image and method thereof
US20130100346A1 (en) * 2011-10-19 2013-04-25 Isao Otsuka Video processing device, video display device, video recording device, video processing method, and recording medium
CN103607635A (en) * 2013-10-08 2014-02-26 十分(北京)信息科技有限公司 Method, device and terminal for caption identification
KR101524379B1 (en) * 2013-12-27 2015-06-04 인하대학교 산학협력단 System and method for the caption replacement of the released video for the interactive service
CN107864393A (en) * 2017-11-17 2018-03-30 青岛海信电器股份有限公司 The method and device that video is shown with captioning synchronization
CN109214999B (en) * 2018-09-21 2021-01-22 阿里巴巴(中国)有限公司 Method and device for eliminating video subtitles
CN110599525A (en) * 2019-09-30 2019-12-20 腾讯科技(深圳)有限公司 Image compensation method and apparatus, storage medium, and electronic apparatus

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716655A (en) * 2013-12-16 2014-04-09 乐视致新电子科技(天津)有限公司 Subtitle conversion method and apparatus
CN106254933A (en) * 2016-08-08 2016-12-21 腾讯科技(深圳)有限公司 Subtitle extraction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于近邻传播聚类的电视视频字幕监测方法";徐向丽;《电视技术》;20181015;全文 *

Also Published As

Publication number Publication date
CN111193965A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111193965B (en) Video playing method, video processing method and device
CN108495185B (en) Video title generation method and device
CN110691259B (en) Video playing method, system, device, electronic equipment and storage medium
CN110288036B (en) Image restoration method and device and electronic equipment
CN108171677B (en) Image processing method and related equipment
CN111861572B (en) Advertisement putting method and device, electronic equipment and computer readable storage medium
CN110827244A (en) Method and equipment for detecting appearance flaws of electronic equipment
CN106202086B (en) Picture processing and obtaining method, device and system
CN111325798A (en) Camera model correction method and device, AR implementation equipment and readable storage medium
CN111401238A (en) Method and device for detecting character close-up segments in video
CN110708568A (en) Video content mutation detection method and device
CN111031359B (en) Video playing method and device, electronic equipment and computer readable storage medium
CN112149756A (en) Model training method, image recognition method, device, equipment and storage medium
CN113112511A (en) Method and device for correcting test paper, storage medium and electronic equipment
CN111182338A (en) Video processing method and device, storage medium and electronic equipment
CN113965795A (en) Anchor interaction display method and device
CN113132744A (en) Processing method, model, electronic device and computer storage medium of live broadcast barrage
CN113596354A (en) Image processing method, image processing device, computer equipment and storage medium
CN109089040B (en) Image processing method, image processing device and terminal equipment
CN111131812A (en) Broadcast time testing method and device and computer readable storage medium
CN113312949A (en) Video data processing method, video data processing device and electronic equipment
CN111107385A (en) Live video processing method and device
US11967089B2 (en) Object tracking method, tracking processing method, corresponding apparatus, and electronic device
CN111641870B (en) Video playing method and device, electronic equipment and computer storage medium
US20210287381A1 (en) Object tracking method, tracking processing method, corresponding apparatus, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant