
Video processing method, device and equipment

Info

Publication number
CN112087660A
Authority
CN
China
Prior art keywords
video frame
text
text content
video
information
Prior art date
Legal status
Pending
Application number
CN201910517023.2A
Other languages
Chinese (zh)
Inventor
曾以亮
毛春静
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority application: CN201910517023.2A
Priority application: PCT/CN2020/095882 (published as WO2020249100A1)
Publication: CN112087660A
Legal status: Pending

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 — Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/44 — Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 — Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/80 — Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 — Assembly of content; Generation of multimedia applications
    • H04N21/854 — Content authoring
    • H04N21/8547 — Content authoring involving timestamps for synchronizing content

Abstract

Embodiments of this application provide a video processing method, apparatus, and device. In the method, a first device receives a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame and the text information includes text content and attribute information of the text content; the first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played. This improves the quality of the compressed video.

Description

Video processing method, device and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method, apparatus, and device.
Background
A video includes a plurality of video frames, and each video frame may include text content, for example subtitles, bullet comments (danmaku), and prompts.
At present, video resolution keeps increasing, which puts ever greater bandwidth pressure on the video transmission channel. To relieve this pressure, the video may be compressed before transmission. In practice, however, compression blurs the text content in the video, which hinders the user's recognition of the text content and lowers the quality of the compressed video.
Disclosure of Invention
This application provides a video processing method, apparatus, and device that improve the quality of compressed video.
In a first aspect, an embodiment of this application provides a video processing method, in which a first device receives a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame and the text information includes text content and attribute information of the text content; the first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
In the above process, for any first video frame in the video stream, the second device extracts the text information in the first video frame and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and the text information to the first device; because the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, it can merge the text content in the text information into the second video frame to obtain the third video frame.
In a possible implementation, the first device may add the text content to the second video frame to obtain the third video frame to be played in the following feasible manner: the first device generates a first image corresponding to the text content according to the text content and the attribute information, where the resolution of the first image is greater than the resolution of the second video frame; and the first device adds the first image to the second video frame according to the attribute information to obtain the third video frame.
In the above process, the first image generated according to the text content and the attribute information includes the text content, and since the resolution of the first image is greater than the resolution of the second video frame, the definition of the text content in the first image can be made higher, so that even if the video frame is compressed, the definition of the text content in the video frame can be made higher.
In a possible implementation, the generating, by the first device, of the first image corresponding to the text content according to the text content and the attribute information includes: the first device determines at least one group of text content within the text content, where each group of text content includes at least one character, the font, size, color, and font special effect of every character in a group are the same, and the font special effect includes at least one of affine, rotation, or projection; and the first device generates, according to the attribute information of each group of text content, a first image corresponding to that group.
In the above process, the text content in the first video frame is divided into at least one group of text content. Because the font, size, color, and font special effect of the characters in each group are the same, a first image can be generated separately for each group, which makes each generated first image more accurate.
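As an illustration of this grouping step, the sketch below treats each maximal run of consecutive recognized characters with identical attributes as one group of text content; the `Char` record and its field names are assumptions for the example, not part of the embodiment.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List

@dataclass(frozen=True)
class Char:
    symbol: str
    font: str
    size: int
    color: str
    effect: str  # e.g. "affine", "rotation", "projection", or "none"

def group_text_content(chars: List[Char]) -> List[List[Char]]:
    """Split recognized characters into groups whose font, size, color,
    and font special effect are all identical."""
    attrs = lambda c: (c.font, c.size, c.color, c.effect)
    return [list(run) for _, run in groupby(chars, key=attrs)]
```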
In one possible embodiment, the region of the first image other than the text content is transparent.
In the above process, because the region of the first image other than the text content is transparent, the first image can be prevented from covering the video picture of the second video frame when the first image is added to the second video frame.
In a possible implementation, the adding, by the first device, of the first image to the second video frame according to the attribute information to obtain the third video frame to be played includes: the first device acquires, from the attribute information, the position information of the text content in the first video frame, and adds the first image to the second video frame according to the position information to obtain the third video frame.
In the above process, according to the position information of the text content in the first video frame, the first image can be accurately added to the second video frame, so that the position of the text content of the first image in the second video frame is the same as the position of the text content in the first video frame.
In a possible implementation, before the first device adds the text content to the second video frame according to the attribute information, the first device acquires a first identifier from the second video frame, acquires a second identifier from the text information, and determines that the first identifier and the second identifier are the same. Optionally, the first identifier and the second identifier are the same timestamp.
In the above process, when the first identifier and the second identifier are the same, the text content and the second video frame correspond to the same first video frame, so the first image corresponding to the text content can be added to the correct second video frame.
In a possible implementation, the receiving, by the first device, of the second video frame sent by the second device and the text information extracted from the first video frame includes: the first device receives the second video frame sent by the second device from a first transmission channel; and the first device receives the text information sent by the second device from a second transmission channel, where the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
In the above process, the second device sends the second video frame and the text information to the first device on different transmission channels, and correspondingly the first device receives them from different transmission channels; this keeps data transmission efficient while keeping the transmission scheme on each channel simple.
In one possible embodiment, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
In a possible implementation, after the first device adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, the method further includes:
the first device plays the third video frame;
or,
the first device sends the third video frame to a third device, where the third device is configured to play the third video frame.
In a second aspect, an embodiment of the present application provides a video processing method, where a second device extracts text information in a first video frame, where the text information includes text content and attribute information; the second equipment compresses the first video frame to obtain a second video frame; the second device sends the second video frame and the text information to the first device.
In the above process, for any first video frame in the video stream, the second device extracts the text information in the first video frame and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and the text information to the first device; because the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, it can merge the text content in the text information into the second video frame to obtain the third video frame.
In a possible implementation, before the second device sends the second video frame and the text information to the first device, the method further includes: the second equipment generates a first identifier; the second device adds the first identifier to the second video frame and the text information, respectively. Optionally, the first identifier is a timestamp generated by the second device.
In the above process, by adding the first identifier to the second video frame and the text information, the first device can determine the correspondence between the second video frame and the text information according to the first identifier, and can then add the first image corresponding to the text content to the correct second video frame.
In a possible implementation, the sending, by the second device, of the second video frame and the text information to the first device includes: the second device sends the second video frame to the first device through a first transmission channel; and the second device sends the text information to the first device through a second transmission channel, where the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
In this process, the second device sends the second video frame and the text information to the first device on different transmission channels, which keeps data transmission efficient while keeping the transmission scheme on each channel simple.
In one possible embodiment, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
In a third aspect, an embodiment of the present application provides a video processing apparatus, including a receiving module and a processing module, where,
the receiving module is used for receiving a second video frame sent by a second device and text information extracted from the first video frame, wherein the second video frame is obtained by compressing the first video frame, and the text information comprises text content and attribute information of the text content;
and the processing module is used for adding the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
In a possible implementation, the processing module is specifically configured to:
generating a first image corresponding to the text content according to the text content and the attribute information, wherein the resolution of the first image is greater than that of the second video frame;
and adding the first image into the second video frame according to the attribute information to obtain the third video frame.
In a possible implementation, the processing module is specifically configured to:
determining at least one group of text contents in the text contents, wherein each group of text contents comprises at least one character, the font, the size, the color and the font special effect of each character in the group of text contents are the same, and the font special effect comprises at least one of affine, rotation or projection;
and respectively generating a first image corresponding to each group of text content according to the attribute information of each group of text content.
In one possible embodiment, the region of the first image other than the text content is transparent.
In a possible implementation, the processing module is specifically configured to:
acquiring, from the attribute information, the position information of the text content in the first video frame;
and adding the first image to the second video frame according to the position information to obtain the third video frame.
In a possible implementation, before the processing module adds the text content to the second video frame according to the attribute information, the processing module is further configured to:
acquiring a first identifier in the second video frame;
acquiring a second identifier in the text information;
determining that the first identity and the second identity are the same.
In a possible embodiment, the first identifier and the second identifier are the same time stamp.
In a possible implementation, the receiving module is specifically configured to:
receiving the second video frame sent by the second device from a first transmission channel;
and receiving the text information sent by the second equipment from a second transmission channel, wherein the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
In one possible implementation, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
In a possible implementation manner, after the processing module adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played, the processing module is further configured to:
playing the third video frame;
or,
sending the third video frame to a third device, where the third device is configured to play the third video frame.
In a fourth aspect, an embodiment of the present application provides a video processing apparatus, including a processing module and a sending module, wherein,
the processing module is used for extracting text information from the first video frame, wherein the text information comprises text content and attribute information;
the processing module is further configured to compress the first video frame to obtain a second video frame;
the sending module is configured to send the second video frame and the text information to the first device.
In a possible implementation, before the sending module sends the second video frame and the text information to the first device, the processing module is further configured to:
generating a first identifier;
adding the first identifier to the second video frame and the text information, respectively.
In one possible implementation, the first identifier is a timestamp generated by the second device.
In a possible implementation manner, the sending module is specifically configured to:
transmitting the second video frame to the first device through a first transmission channel;
and sending the text information to the first equipment through a second transmission channel, wherein the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
In one possible implementation, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
In a fifth aspect, an embodiment of the present application provides a video processing apparatus, including: a memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the video processing method according to any of the first aspect.
In a sixth aspect, an embodiment of the present application provides a video processing apparatus, including: a memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the video processing method according to any of the second aspect.
In a seventh aspect, an embodiment of the present application provides a storage medium, where the storage medium includes a computer program, and the computer program is used to implement the video processing method according to any one of the first aspect.
In an eighth aspect, an embodiment of the present application provides a storage medium, where the storage medium includes a computer program, and the computer program is used to implement the video processing method according to any one of the second aspects.
In a ninth aspect, an embodiment of the present application further provides a chip or an integrated circuit, including: a memory and a processor;
the memory is configured to store program instructions and, where needed, intermediate data;
the processor is configured to invoke the program instructions stored in the memory to implement the video processing method according to any of the first aspect.
In a tenth aspect, an embodiment of the present application further provides a chip or an integrated circuit, including: a memory and a processor;
the memory is configured to store program instructions and, where needed, intermediate data;
the processor is configured to call the program instructions stored in the memory to implement the video processing method according to any one of the second aspect.
In an eleventh aspect, the present application further provides a program product, where the program product includes a computer program, where the computer program is stored in a storage medium, and the computer program is used to implement the video processing method according to any one of the first aspect.
In a twelfth aspect, the present application further provides a program product, where the program product includes a computer program, where the computer program is stored in a storage medium, and the computer program is used to implement the video processing method according to any one of the second aspects.
According to the video processing method, apparatus, and device provided above, for any first video frame in a video stream, the second device first extracts the text information from the first video frame and compresses the first video frame to obtain a second video frame. The second device sends the second video frame and the text information to the first device; because the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After receiving the second video frame and the text information, the first device merges the text content in the text information into the second video frame to obtain a third video frame; because the text content is not compressed, its definition in the third video frame remains high.
Drawings
FIG. 1 is a diagram of a system architecture according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic view of a video frame according to an embodiment of the present application;
fig. 4A is a schematic view of another video frame provided in the embodiment of the present application;
fig. 4B is a schematic view of another video frame provided in the embodiment of the present application;
fig. 4C is a schematic view of another video frame provided in the embodiment of the present application;
FIG. 5 is an architecture diagram for processing video provided by an embodiment of the present application;
fig. 6 is a schematic flowchart of another video processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a first image provided by an embodiment of the present application;
fig. 8 is a schematic view of a video processing process according to an embodiment of the present application;
fig. 9 is a schematic view of another video processing process provided in the embodiment of the present application;
fig. 10 is a video processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic diagram of another video processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic hardware structure diagram of a video processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic hardware structure diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
For ease of understanding, the system architecture used in the present application will first be described.
Fig. 1 is a system architecture diagram according to an embodiment of this application. Referring to fig. 1, the system includes a first device 101 and a second device 102. A video transmission channel exists between the second device 102 and the first device 101; the second device 102 can transmit a video stream to the first device 101 through the transmission channel, and the first device 101 can play the received video stream.
Before the second device 102 sends the video stream to the first device 101, for any video frame in the video stream, the second device 102 may extract the text information in the video frame and compress the video frame. The second device 102 transmits the compressed video frame and the extracted text information to the first device 101; because the video frame has been compressed, the bandwidth pressure on the transmission channel between the two devices is reduced. After receiving the compressed video frame and the text information, the first device 101 may merge the text content in the text information into the video frame and play the result. Because the text content in the text information is not compressed, the text content in the video played by the first device 101 stays sharp rather than becoming blurred.
In a possible application scenario, when a user watches an online video through the first device 101, the second device 102 may be a video server, and the first device 101 may be a device capable of playing a video, such as a mobile phone, a computer, or a television.
In another possible application scenario, a user casts a video from a mobile phone onto a television (screen casting); the second device 102 may be a video server or the mobile phone, and the first device 101 may be the television.
The technical solutions of this application are described in detail below with reference to specific embodiments. The following embodiments may stand alone or be combined with one another; identical or similar content is not repeated across embodiments.
Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure. Referring to fig. 2, the method may include:
s201, the second device extracts text information in the first video frame.
Optionally, the second device may be a server, a mobile phone, a computer, or the like.
Optionally, the first video frame is any frame in a video stream to be sent to the first device by the second device. The second device has the same processing procedure for each frame in the video to be sent to the first device, and this application takes an arbitrary first video frame as an example.
Optionally, the second device may process the first video frame through a Character Recognition (CR) technology to extract text information in the first video frame. For example, the CR technique may include an Optical Character Recognition (OCR) technique or the like.
The text information includes text content and attribute information.
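The embodiments do not mandate a specific recognizer. As one hedged illustration, an off-the-shelf OCR library such as pytesseract can return recognized words together with their bounding boxes; recovering font, color, and font special effects would need additional analysis not shown here.

```python
import pytesseract
from pytesseract import Output

def extract_text_boxes(frame):
    """Run OCR on a video frame (a numpy array or PIL image) and return
    the recognized words with their positions in the frame."""
    data = pytesseract.image_to_data(frame, output_type=Output.DICT)
    boxes = []
    for i, word in enumerate(data["text"]):
        if word.strip():  # skip empty detections
            x, y = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            boxes.append({"content": word, "position": (x, y, x + w, y + h)})
    return boxes
```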
The text content includes one or more characters; for example, the characters may be Chinese characters, numbers, or letters. The text information may include a plurality of groups of text content and the attribute information of each group. Each group of text content includes at least one character; within a group, the distance between every two adjacent characters is the same, the font, size, color, and font special effect of every character are the same, and the font special effect includes at least one of affine, rotation, or projection.
Optionally, the attribute information may include the position, font, size, color, and font special effect of the text content in the video frame. The position of the text content in the video frame can be represented by at least two coordinates of the area the text content occupies. Fonts may include, for example, Song (SimSun), Hei, clerical script, and regular script. The font special effect may include at least one of affine, rotation, or projection. Optionally, when a group of text content includes a plurality of characters, the attribute information of the group may further include the distance between characters.
It should be noted that the attribute information and the font special effect may also include other items; this is not specifically limited in the embodiments of this application.
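One way to carry the extracted text information between the devices is a small record per group of text content, for example serialized as JSON on the wire; the field names below are illustrative assumptions rather than a format defined by the embodiment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TextInfo:
    content: str                          # the characters of this group
    position: Tuple[int, int, int, int]   # two rectangle corners: (x1, y1, x2, y2)
    font: str
    size: int
    color: str
    effect: str                           # affine / rotation / projection / none
    spacing: Optional[int] = None         # inter-character distance, for multi-character groups
```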
Fig. 3 is a schematic view of a video frame according to an embodiment of this application. Referring to fig. 3, video frame 301 includes the text content "happy family" and "the weather is nice today". Because the attribute information of each character in "happy family" is the same, and the attribute information of each character in "the weather is nice today" is the same, two groups of text content can be extracted from the video frame shown in fig. 3. Referring to video frame 302, "happy family" can be taken as one group of text content, whose position in video frame 302 is the rectangular region with vertices at point A1 and point A2, and "the weather is nice today" can be taken as another group, whose position in video frame 302 is the rectangular region with vertices at point B1 and point B2. The two groups of text content are shown in table 1:
TABLE 1
Text content                  Position                                      Font     Size     Color     Spacing     Special effect
happy family                  position 1 (rectangle with vertices A1, A2)   font 1   size 1   color 1   spacing 1   effect 1
the weather is nice today     position 2 (rectangle with vertices B1, B2)   font 2   size 2   color 2   spacing 2   effect 2
Optionally, the first video frame may remain unchanged after the text information is extracted from it. This reduces the video frame processing workload.
Optionally, after the text information is extracted from the first video frame, the text content may also be removed from the first video frame, for example in either of the following feasible implementations:
one possible implementation is:
and updating the color of the pixel of the text content in the first video frame according to the background pattern of the area of the text content in the first video frame so as to remove the text content in the first video frame. After removing the text content in the first video frame, the area of the first video frame where the text content is located may present a complete background pattern.
Optionally, the background pattern of the region where the text content is located may be a solid color, a preset regular shape, a preset image, or the like.
Video frames with the text information removed in this way are described below with reference to fig. 4A and 4B.
Fig. 4A is a schematic view of another video frame according to an embodiment of this application. Referring to fig. 4A, video frame A1 includes the text content "happy family" and "the weather is nice today"; the background pattern of the region where "happy family" is located is pure gray, and the background pattern of the region where "the weather is nice today" is located is pure red. The pixels of "happy family" in video frame A1 are therefore replaced with gray, and the pixels of "the weather is nice today" are replaced with red, yielding video frame A2. Referring to video frame A2, it includes no text content.
Fig. 4B is a schematic view of another video frame according to an embodiment of this application. Referring to fig. 4B, video frame B1 includes the text content "happy family" and "the weather is nice today"; the background pattern of the region where "happy family" is located is vertical stripes, and the background pattern of the region where "the weather is nice today" is located is petals. The pixels of "happy family" in video frame B1 may therefore be replaced with the colors of the corresponding pixels in the vertical stripes, and the pixels of "the weather is nice today" with the colors of the corresponding pixels in the petals, yielding video frame B2. Referring to video frame B2, it includes no text content; it contains complete vertical stripes and complete petals.
Another possible implementation:
and updating the color of the pixel where the text content in the first video frame is located to a preset color. For example, the preset color may be white, gray, etc.
Next, a description will be given of such a video frame from which text information is removed, with reference to fig. 4C.
Fig. 4C is a schematic view of another video frame according to an embodiment of this application. Referring to fig. 4C, video frame C1 includes the text content "happy family" and "the weather is nice today"; the background pattern of the region where "happy family" is located is vertical stripes, and the background pattern of the region where "the weather is nice today" is located is petals. Assuming the preset color is white, the pixels of "happy family" and of "the weather is nice today" in video frame C1 may be replaced with white, yielding video frame C2. Referring to video frame C2, it includes no text content; the white regions cover part of the pixels in the vertical stripes and part of the pixels in the petals.
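A sketch of both removal options with OpenCV, assuming a binary mask marking the text pixels is available from the recognition step; `cv2.inpaint` reconstructs each masked pixel from the surrounding background, which approximates the background-pattern approach of fig. 4A and 4B, while the second function is the preset-color approach of fig. 4C.

```python
import cv2
import numpy as np

def remove_text_with_background(frame: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Fill the text pixels from the surrounding background pattern."""
    return cv2.inpaint(frame, text_mask, 3, cv2.INPAINT_TELEA)

def remove_text_with_preset_color(frame: np.ndarray, text_mask: np.ndarray,
                                  color=(255, 255, 255)) -> np.ndarray:
    """Overwrite the text pixels with a preset color (white here)."""
    out = frame.copy()
    out[text_mask > 0] = color
    return out
```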
S202. The second device compresses the first video frame to obtain a second video frame.
The resolution of the second video frame is smaller than the resolution of the first video frame.
Optionally, the second device may determine, according to the bandwidth of the transmission channel between the second device and the first device, a compression ratio for compressing the first video frame, and compress the first video frame according to that ratio to obtain the second video frame. The bit rate (in bps) of the stream of second video frames is less than the bandwidth of the transmission channel between the second device and the first device.
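An illustrative sketch of this bandwidth-driven choice; the fixed frame rate and the 0.9 safety margin are assumptions for the example.

```python
def pick_compression_ratio(channel_bandwidth_bps: float,
                           raw_frame_bits: float,
                           fps: float = 30.0,
                           margin: float = 0.9) -> float:
    """Return the fraction of the original frame size to keep so that the
    resulting bit rate stays below the channel bandwidth."""
    raw_bitrate_bps = raw_frame_bits * fps
    target_bitrate_bps = channel_bandwidth_bps * margin
    # A ratio of 1.0 means the channel is wide enough without compression.
    return min(1.0, target_bitrate_bps / raw_bitrate_bps)
```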
S203. The second device sends the second video frame and the text information to the first device.
A video includes a plurality of video frames, and for each video frame the second device obtains the corresponding second video frame and text information and sends them to the first device. Accordingly, the first device may receive a plurality of second video frames and a plurality of pieces of text information; to enable the first device to determine the correspondence between second video frames and text information, the second device needs to send them according to a preset rule.
Optionally, the second device may send the second video frame and the text information to the first device in the following feasible manner:
the second device generates a first identifier and adds the first identifier to the second video frame and the text information, respectively, and the second device transmits the second video frame including the first identifier and the text information including the first identifier to the first device.
The identifiers corresponding to different video frames are different.
Optionally, the first identifier may be a timestamp generated by the second device according to the current time. When the second device needs to generate timestamps for a plurality of video frames at the same moment, it may generate a timestamp from the current time and then append a distinguishing mark, so that the timestamp of each video frame is different. For example, if the second device needs timestamps for video frame 1, video frame 2, and video frame 3 at the same moment, and the timestamp generated from the current time is timestamp 1, then after the second device appends the marks, the timestamp of video frame 1 may be timestamp 1 + a, that of video frame 2 may be timestamp 1 + b, and that of video frame 3 may be timestamp 1 + c.
In this feasible implementation, the second device adds the same identifier to the second video frame and the text information, so the first device can determine the correspondence between them according to the identifier; the process is simple and convenient.
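A sketch of this identifier scheme: a timestamp taken from the current time, made unique per frame by a per-frame suffix (the composition format is an assumption).

```python
import itertools
import time

_frame_counter = itertools.count()

def make_frame_identifier() -> str:
    """Timestamp from the current time plus a per-frame suffix, so frames
    stamped in the same instant still receive distinct identifiers."""
    return f"{time.time_ns()}-{next(_frame_counter)}"
```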
Alternatively, the second device may send the second video frame and the text information to the first device on the same transmission channel.
Alternatively, the second device may send the second video frame and the text information to the first device on different transmission channels. For example, the second device may send the second video frame to the first device through a first transmission channel and send the text information to the first device through a second transmission channel, where the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
For example, the first transmission channel may be an embedded Display Port (eDP) channel, a High Definition Multimedia Interface (HDMI) channel, or the like. The second transmission channel may be an Auxiliary channel (AUX), a Universal Serial Bus (USB) channel, or the like.
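The eDP/HDMI and AUX/USB channels above are hardware links; as a software stand-in, the sketch below models the two channels as two TCP connections, sending the compressed frame on the wide channel and the much smaller text information on the bypass channel. The ports and the length-prefixed framing are assumptions for the example.

```python
import json
import socket
import struct

def send_frame_and_text(frame_bytes: bytes, text_info: dict, host: str,
                        video_port: int = 9000, aux_port: int = 9001) -> None:
    """Send the second video frame and its text information on separate channels."""
    with socket.create_connection((host, video_port)) as video_channel:
        video_channel.sendall(struct.pack("!I", len(frame_bytes)) + frame_bytes)
    with socket.create_connection((host, aux_port)) as aux_channel:
        payload = json.dumps(text_info).encode("utf-8")
        aux_channel.sendall(struct.pack("!I", len(payload)) + payload)
```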
S204. The first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
Optionally, the first device may first determine which text information corresponds to the second video frame. For example, when the second device transmits the second video frame and the text information by the method shown in S203, the first device may determine the correspondence according to the identifier included in the received second video frame and the identifier included in the text information. For instance, assume a second video frame received by the first device includes a first identifier and a piece of text information received by the first device includes a second identifier; if the first identifier and the second identifier are the same, the first device determines that the second video frame corresponds to that text information.
Optionally, the first device may add the text content to the second video frame to obtain the third video frame in the following feasible manner: the first device generates a first image corresponding to the text content according to the text content and the attribute information, and adds the first image to the second video frame according to the attribute information to obtain the third video frame. For example, the first device may add the first image to the second video frame according to the position information in the attribute information, such as by overlaying the first image on the second video frame based on that position information.
Optionally, the region of the first image other than the text content is transparent.
Optionally, the resolution of the first image is greater than the resolution of the second video frame. For example, the resolution of the first image may be equal to the resolution of the first video frame, so that the definition of the text content in the third video frame can be made higher.
Optionally, after the first device obtains the third video frame, the first device may play the third video frame, or the first device may send the third video frame to the third device, so that the third device plays the third video frame.
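A sketch of this step using Pillow, assuming the first image has already been rendered as an RGBA image with a transparent background and that the position gives the top-left corner of the text area (any rescaling between first-frame and second-frame coordinates is glossed over here).

```python
from PIL import Image

def add_text_image(second_frame: Image.Image, first_image: Image.Image,
                   position: tuple) -> Image.Image:
    """Paste the transparent-background text image onto the compressed frame
    at the position the text content occupied in the first video frame."""
    third_frame = second_frame.convert("RGBA")
    # Passing first_image as its own mask keeps its transparent region clear,
    # so only the text pixels cover the video picture.
    third_frame.paste(first_image, position, first_image)
    return third_frame
```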
According to the video processing method provided by this embodiment of the application, for any first video frame in a video stream, the second device extracts the text information in the first video frame and compresses the first video frame to obtain a second video frame. The second device sends the second video frame and the text information to the first device; because the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, it can merge the text content in the text information into the second video frame to obtain a third video frame.
For ease of understanding, the architecture of the first device and the second device for processing video is described below in conjunction with fig. 5.
Fig. 5 is an architecture diagram for processing video according to an embodiment of this application. Referring to fig. 5, after the second device obtains the first video frame, it extracts the text information from the first video frame, and compresses the first video frame after the extraction to obtain a second video frame. The second device also generates a timestamp and adds the same timestamp to the text information and to the second video frame. The second device then sends the time-stamped text information and the time-stamped second video frame to the first device.
After the first device receives the text information with the timestamp and the second video frame with the timestamp, the first device may determine a correspondence between the text information and the second video frame according to the timestamp. After the first device determines to obtain the corresponding relationship between the text information and the second video frame, the first device may generate a first image according to the text information, and merge the first image and the second video frame to obtain a third video frame.
With the architecture shown in fig. 5, the video processing procedure will be described below with reference to fig. 6.
Fig. 6 is a flowchart illustrating another video processing method according to an embodiment of the present application. Referring to fig. 6, the method may include:
S601. The second device acquires a first video frame.
The second device obtains a first video frame in a video stream to be sent to the first device. The video stream may be a video stream local to the second device or may be a video stream received by the second device from another device.
S602. The second device extracts text information in the first video frame.
It should be noted that the execution process of S602 may refer to the execution process of S201, and is not described herein again.
S603. The second device compresses the first video frame to obtain a second video frame.
It should be noted that the execution process of S603 may refer to the execution process of S202, and details are not described here.
S604. The second device generates a timestamp.
Optionally, the second device may generate a timestamp according to the current time.
S605. The second device adds the timestamp to the text information and to the second video frame respectively.
After the timestamp is added, it is included in both the text information and the second video frame.
S606. The second device sends the second video frame including the timestamp to the first device through a first transmission channel.
S607. The second device sends the text information including the timestamp to the first device through a second transmission channel.
S608. The first device determines the correspondence between the text information and the second video frame according to the timestamp.
Optionally, the first device may receive a plurality of pieces of text information and a plurality of second video frames at the same time; therefore, the first device needs to determine the correspondence between text information and second video frames according to the timestamps. For example, the first device determines a piece of text information and a second video frame having the same timestamp as corresponding to each other.
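A sketch of this matching step on the first device, buffering whichever of the two arrives first; the pairing structure is an assumption for the example.

```python
class FramePairer:
    """Pair second video frames with text information by shared timestamp."""

    def __init__(self):
        self._frames = {}  # timestamp -> second video frame
        self._texts = {}   # timestamp -> text information

    def add_frame(self, timestamp, frame):
        """Return (frame, text_info) once both halves have arrived, else None."""
        if timestamp in self._texts:
            return frame, self._texts.pop(timestamp)
        self._frames[timestamp] = frame
        return None

    def add_text(self, timestamp, text_info):
        """Return (frame, text_info) once both halves have arrived, else None."""
        if timestamp in self._frames:
            return self._frames.pop(timestamp), text_info
        self._texts[timestamp] = text_info
        return None
```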
S609. The first device generates a first image according to the text information.
Optionally, if the text information includes multiple groups of text contents, the first device may generate a first image corresponding to each group of text contents.
Optionally, for any group of text contents, the first device generates a first image corresponding to the group of text contents according to the group of text contents, the font, the size, the color, and the font special effect of the group of text contents, and the resolution of the first image is greater than the resolution of the second video frame.
Alternatively, the size of the first image may be the same as the size of the area occupied by the text content in the first video frame.
Next, the first image will be described with reference to fig. 7.
Fig. 7 is a schematic diagram of a first image according to an embodiment of this application. Referring to fig. 7, assume that the first video frame is shown as 701. The first video frame 701 includes the text content "happy family" and "the weather is nice today".
Assuming that the text information extracted from the first video frame is as shown in table 1, the first device may generate an image 702 according to "happy family" and "position 1, font 1, size 1, color 1, spacing 1, effect 1" in table 1. The image 702 includes the text content "happy family", and the attribute information of this text content in the image 702 is the same as its attribute information in the first video frame 701. The size of the image 702 is the same as the size of the area occupied by "happy family" in the first video frame 701.
The first device may further generate an image 703 according to "the weather is nice today" and "position 2, font 2, size 2, color 2, spacing 2, effect 2" in table 1. The image 703 includes the text content "the weather is nice today", and the attribute information of this text content in the image 703 is the same as its attribute information in the first video frame 701. The size of the image 703 is the same as the size of the area occupied by "the weather is nice today" in the first video frame 701.
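A sketch of generating a first image such as 702 with Pillow, drawing the group's characters on a fully transparent canvas sized to the area the text occupied in the first video frame. The font file path is an assumption, and real affine, rotation, or projection effects would need additional transforms not shown.

```python
from PIL import Image, ImageDraw, ImageFont

def render_first_image(text: str, width: int, height: int,
                       font_path: str, font_size: int, color: str) -> Image.Image:
    """Draw one group of text content on a transparent canvas whose size
    matches the area the text occupied in the first video frame."""
    image = Image.new("RGBA", (width, height), (0, 0, 0, 0))  # fully transparent
    font = ImageFont.truetype(font_path, font_size)
    ImageDraw.Draw(image).text((0, 0), text, font=font, fill=color)
    return image
```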
S610, the first device merges the first image and the second video frame to obtain a third video frame.
Optionally, if there are a plurality of first images, the first device merges each first image into the second video frame to obtain the third video frame.
S611, the first device plays the third video frame.
In the embodiment shown in fig. 6, for any first video frame in the video stream, the second device extracts the text information in the first video frame and compresses the first video frame to obtain a second video frame. The second device sends the second video frame and the text information to the first device; because the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, it may merge the text content in the text information into the second video frame to obtain a third video frame, and play the third video frame.
On the basis of any of the above embodiments, the following describes the above video processing method with reference to fig. 8 to 9.
Fig. 8 is a schematic view of a video processing process according to an embodiment of this application. Referring to fig. 8, a first video frame is shown at 801; the first video frame 801 includes the text content "happy family" and "the weather is nice today". The font, size, color, and font special effect of every character in "happy family" are the same, and the font, size, color, and font special effect of every character in "the weather is nice today" are the same.
After the second device acquires the first video frame 801, it extracts text information 802 from the first video frame 801, where the text information 802 includes the two groups of text content "happy family" and "the weather is nice today" and the attribute information of each group. Assume the first video frame 801 is unchanged after the second device extracts the text information.
The second device compresses the first video frame 801 from which the text information has been extracted, obtaining a second video frame 803. The second device adds the same timestamp to the second video frame 803 and the text information 802 and sends the time-stamped second video frame 803 and text information 802 to the first device.
After the first device receives the second video frame 803 including the timestamp and the text information 802 including the timestamp, it determines from the timestamp that the second video frame 803 corresponds to the text information 802. The first device generates an image 804 from the text content "happy family" and its font, size, color, and font special effect; these attributes of "happy family" in the image 804 are the same as those of the corresponding text content in the first video frame 801. The first device likewise generates an image 805 from the text content "the weather is nice today" and its font, size, color, and font special effect, which again match those of the corresponding text content in the first video frame 801.
The first device overlays the image 804 on the second video frame 803 according to the position information of "happy family" in the first video frame 801, and overlays the image 805 on the second video frame 803 according to the position information of "the weather is nice today" in the first video frame 801, obtaining a third video frame 806. The first device may play the third video frame 806.
In the embodiment shown in fig. 8, not only can the bandwidth pressure between the second device and the first device be reduced, but also the definition of text content in the third video frame played by the first device can be made higher, so that the quality of video played by the first device is made higher. Further, after the second device extracts the text information from the first video frame, the first video frame is unchanged (the first video frame is not processed), so that the workload of video processing of the second device is small, and the efficiency of video processing is high.
Fig. 9 is a schematic view of another video processing process according to an embodiment of this application. Referring to fig. 9, a first video frame is shown at 901; the first video frame 901 includes the text content "happy family" and "the weather is nice today". The font, size, color, and font special effect of every character in "happy family" are the same, and the font, size, color, and font special effect of every character in "the weather is nice today" are the same.
After the second device acquires the first video frame 901, it extracts text information 902 from the first video frame 901, where the text information 902 includes the two groups of text content "happy family" and "the weather is nice today" and the attribute information of each group. After extracting the text information, the second device removes the text content from the first video frame 901, obtaining a first video frame 903. Referring to fig. 9, the first video frame 903 includes no text content.
The second device compresses the first video frame 903 to obtain a second video frame 904. The second device adds the same timestamp to the second video frame 904 and the text information 902 and sends the time-stamped second video frame 904 and text information 902 to the first device.
After the first device receives the second video frame 904 including the timestamp and the text information 902 including the timestamp, it determines from the timestamp that the second video frame 904 corresponds to the text information 902. The first device generates an image 905 from the text content "happy family" and its font, size, color, and font special effect; these attributes of "happy family" in the image 905 are the same as those of the corresponding text content in the first video frame 901. The first device likewise generates an image 906 from the text content "the weather is nice today" and its font, size, color, and font special effect, which again match those of the corresponding text content in the first video frame 901.
The first device overlays the image 905 on the second video frame 904 according to the position information of the text content "happy family" in the first video frame 901. The first device also overlays the image 906 on the second video frame 904 according to the position information of the text content "the weather is nice today" in the first video frame 901, obtaining a third video frame 907. The first device may then play the third video frame 907.
In the embodiment shown in fig. 9, the bandwidth pressure between the second device and the first device is reduced, and the text content in the third video frame played by the first device is sharper, so the quality of the video played by the first device is higher. Further, because the second device removes the text content from the first video frame after extracting the text information, the problem that the text content in the first image (the image 905 and the image 906) fails to completely cover the text content remaining in the second video frame is avoided when the first device adds the first image to the second video frame, so the video processing quality is high.
Fig. 10 shows a video processing apparatus according to an embodiment of the present application. The video processing apparatus 10 may be provided in a first device. Referring to fig. 10, the video processing apparatus 10 may include a receiving module 11 and a processing module 12, wherein:
the receiving module 11 is configured to receive a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content;
the processing module 12 is configured to add the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
Optionally, the receiving module 11 may execute S203 in the embodiment of fig. 2 and S606-S607 in the embodiment of fig. 6.
Optionally, the processing module 12 may execute S204 in the embodiment of fig. 2 and S608-S611 in the embodiment of fig. 6.
It should be noted that the video processing apparatus shown in the embodiment of the present application can execute the technical solutions shown in the above method embodiments, and the implementation principle and the beneficial effects thereof are similar, and are not described herein again.
In a possible implementation, the processing module 12 is specifically configured to:
generating a first image corresponding to the text content according to the text content and the attribute information, wherein the resolution of the first image is greater than that of the second video frame;
and adding the first image into the second video frame according to the attribute information to obtain the third video frame.
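Rendering the first image at a resolution higher than that of the second video frame is what keeps the re-added text sharp. Below is a minimal sketch using Pillow, assuming the attribute information supplies a font file path, a pixel size, and a color; the `scale` factor and all names are illustrative assumptions.

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_image(text: str, font_path: str, size_px: int,
                      color: str, scale: int = 2) -> Image.Image:
    """Rasterize `text` at `scale` times its size in the compressed frame,
    on a fully transparent background (alpha = 0 outside the glyphs)."""
    font = ImageFont.truetype(font_path, size_px * scale)
    left, top, right, bottom = font.getbbox(text)
    img = Image.new("RGBA", (right - left, bottom - top), (0, 0, 0, 0))
    ImageDraw.Draw(img).text((-left, -top), text, font=font, fill=color)
    return img
```

Because the background of the returned image is fully transparent, only the glyphs cover the frame when the image is overlaid; font special effects such as affine, rotation, or projection would be applied as further transforms on this image.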
In a possible implementation, the processing module 12 is specifically configured to:
determining at least one group of text contents in the text contents, wherein each group of text contents comprises at least one character, the font, the size, the color and the font special effect of each character in the group of text contents are the same, and the font special effect comprises at least one of affine, rotation or projection;
and respectively generating a first image corresponding to each group of text content according to the attribute information of each group of text content.
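Determining the groups of text content reduces to splitting the recognized characters into consecutive runs with identical rendering attributes, one first image per run. A sketch under the assumption that each character arrives as a small record (the field names are hypothetical):

```python
from itertools import groupby

def group_text(chars: list) -> list:
    """Split recognized characters into consecutive runs that share the same
    font, size, color, and font special effect. Each element of `chars` is
    assumed to look like
    {"ch": "h", "font": "sans", "size": 24, "color": "#ffffff", "effect": "none"}.
    """
    key = lambda c: (c["font"], c["size"], c["color"], c["effect"])
    return [("".join(c["ch"] for c in run), k) for k, run in groupby(chars, key=key)]
```

Note that groupby only merges consecutive characters, which matches the notion of a group of text content as a contiguous run sharing one set of attributes.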
In one possible embodiment, the region of the first image other than the text content is transparent.
In a possible implementation, the processing module 12 is specifically configured to:
acquiring, from the attribute information, the position information of the text content in the first video frame;
and adding the first image to the second video frame according to the position information to obtain the third video frame.
In a possible implementation, before the processing module adds the text content to the second video frame according to the attribute information, the processing module 12 is further configured to:
acquiring a first identifier in the second video frame;
acquiring a second identifier in the text information;
determining that the first identifier and the second identifier are the same.
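Since the second video frame and the text information arrive independently, the first device must hold whichever one arrives first until the counterpart carrying the same identifier appears. A minimal sketch of such pairing (class and method names are illustrative):

```python
class PairMatcher:
    """Buffer frames and text information until both halves of an
    identifier have arrived, since the two channels deliver them
    independently."""

    def __init__(self):
        self.frames, self.texts = {}, {}

    def add_frame(self, ident, frame):
        self.frames[ident] = frame
        return self._try_match(ident)

    def add_text(self, ident, text_info):
        self.texts[ident] = text_info
        return self._try_match(ident)

    def _try_match(self, ident):
        if ident in self.frames and ident in self.texts:
            return self.frames.pop(ident), self.texts.pop(ident)
        return None  # counterpart not here yet
```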
In a possible embodiment, the first identifier and the second identifier are the same time stamp.
In a possible implementation, the receiving module 11 is specifically configured to:
receiving the second video frame sent by the second device from a first transmission channel;
and receiving the text information sent by the second device from a second transmission channel, where the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
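The two channels can be pictured as two ordinary connections: a high-bandwidth one for the compressed frames and a parallel low-bandwidth one for the compact text information. A sketch of the receiving side, in which the host name, port numbers, and line-delimited JSON framing are all assumptions:

```python
import json
import socket

# The compressed frames travel on the main channel; the compact text
# information travels on a parallel low-bandwidth bypass channel.
video_sock = socket.create_connection(("second-device.local", 9000))
text_sock = socket.create_connection(("second-device.local", 9001))

def recv_text_info() -> dict:
    """Read one line-delimited JSON text-information message."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = text_sock.recv(4096)
        if not chunk:
            raise ConnectionError("text channel closed")
        buf += chunk
    return json.loads(buf)
```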
In one possible implementation, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
In a possible implementation manner, after the processing module 12 adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played, the processing module 12 is further configured to:
playing the third video frame;
or,
and sending the third video frame to a third device, wherein the third device is used for playing the third video frame.
It should be noted that the video processing apparatus shown in the embodiment of the present application can execute the technical solutions shown in the above method embodiments, and the implementation principle and the beneficial effects thereof are similar, and are not described herein again.
Fig. 11 shows another video processing apparatus according to an embodiment of the present application. The video processing apparatus 20 may be provided in a second device. Referring to fig. 11, the video processing apparatus 20 may include a processing module 21 and a sending module 22, wherein:
the processing module 21 is configured to extract text information from a first video frame, where the text information includes text content and attribute information;
the processing module 21 is further configured to perform compression processing on the first video frame to obtain a second video frame;
the sending module 22 is configured to send the second video frame and the text information to the first device.
Optionally, the processing module 21 may execute S201 to S202 in the embodiment of fig. 2 and S601 to S605 in the embodiment of fig. 6.
Optionally, the sending module 22 may execute S203 in the embodiment of fig. 2 and S606-S607 in the embodiment of fig. 6.
It should be noted that the video processing apparatus shown in the embodiment of the present application can execute the technical solutions shown in the above method embodiments, and the implementation principle and the beneficial effects thereof are similar, and are not described herein again.
In a possible implementation, before the sending module 22 sends the second video frame and the text information to the first device, the processing module 21 is further configured to:
generating a first identifier;
adding the first identifier to the second video frame and the text information, respectively.
In one possible implementation, the first identifier is a timestamp generated by the second device.
In a possible implementation, the sending module 22 is specifically configured to:
transmitting the second video frame to the first device through a first transmission channel;
and sending the text information to the first device through a second transmission channel, where the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
In one possible implementation, the attribute information includes a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect including at least one of affine, rotation, or projection.
It should be noted that the video processing apparatus shown in the embodiment of the present application can execute the technical solutions shown in the above method embodiments, and the implementation principle and the beneficial effects thereof are similar, and are not described herein again.
Fig. 12 is a schematic hardware structure diagram of a video processing apparatus according to an embodiment of the present application. Referring to fig. 12, the video processing apparatus 30 includes a memory 31, a processor 32, and a receiver 33, where the memory 31 and the processor 32 are in communication; illustratively, the memory 31, the processor 32, and the receiver 33 may communicate via a communication bus 34. The memory 31 is configured to store a computer program, and the processor 32 executes the computer program to implement the video processing method described above.
Optionally, the processor 32 shown in this application may implement the function of the processing module 12 in the embodiment of fig. 10, and the receiver 33 may implement the function of the receiving module 11 in the embodiment of fig. 10, which is not described herein again.
Optionally, the processor may be a CPU, another general-purpose processor, a DSP, an ASIC, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the video processing method embodiments disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Fig. 13 is a schematic hardware structure diagram of a video processing apparatus according to an embodiment of the present application. Referring to fig. 13, the video processing apparatus 40 includes: a memory 41, a processor 42, and a transmitter 43, wherein the memory 41 and the processor 42 are in communication; illustratively, the memory 41, the processor 42 and the transmitter 43 may communicate via a communication bus 44, the memory 41 being configured to store a computer program, the processor 42 executing the computer program to implement the video processing method described above.
Optionally, the processor 42 shown in this application may implement the function of the processing module 21 in the embodiment of fig. 11, and the transmitter 43 may implement the function of the sending module 22 in the embodiment of fig. 11, which is not described herein again.
Optionally, the processor may be a CPU, another general-purpose processor, a DSP, an ASIC, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the video processing method embodiments disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The present application provides a storage medium for storing a computer program for implementing the video processing method described in the above embodiments.
An embodiment of the present application further provides a chip or an integrated circuit, including: a memory and a processor;
the memory is configured to store program instructions and, in some cases, intermediate data;
the processor is configured to call the program instructions stored in the memory to implement the video processing method as described above.
Alternatively, the memory may be separate or integrated with the processor. In some embodiments, the memory may also be located outside of the chip or integrated circuit.
An embodiment of the present application further provides a program product, where the program product includes a computer program, where the computer program is stored in a storage medium, and the computer program is used to implement the video processing method described above.
All or a portion of the steps of the above method embodiments may be performed by hardware associated with program instructions. The aforementioned program may be stored in a readable memory. When executed, the program performs the steps of the method embodiments described above; and the aforementioned memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.
In the present application, the terms "include" and variations thereof may refer to non-limiting inclusions; the term "or" and variations thereof may mean "and/or". The terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. In the present application, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Claims (30)

1. A video processing method, comprising:
a first device receives a second video frame sent by a second device and text information extracted from a first video frame, wherein the second video frame is obtained by compressing the first video frame, and the text information comprises text content and attribute information of the text content; and
the first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
2. The method according to claim 1, wherein the adding, by the first device, the text content to the second video frame according to the attribute information to obtain a third video frame to be played comprises:
the first device generates a first image corresponding to the text content according to the text content and the attribute information, wherein the resolution of the first image is greater than that of the second video frame;
and the first equipment adds the first image to the second video frame according to the attribute information to obtain a third video frame.
3. The method according to claim 2, wherein the generating, by the first device, a first image corresponding to the text content according to the text content and the attribute information comprises:
the first device determines at least one group of text contents in the text contents, wherein each group of text contents comprises at least one character, the font, the size, the color and the font special effect of each character in one group of text contents are the same, and the font special effect comprises at least one of affine, rotation or projection;
and the first equipment generates a first image corresponding to each group of text content according to the attribute information of each group of text content.
4. A method according to claim 2 or 3, wherein the area of the first image other than the text content is transparent.
5. The method according to any one of claims 2 to 4, wherein the adding, by the first device, the first image to the second video frame according to the attribute information to obtain the third video frame to be played comprises:
the first device acquires, from the attribute information, the position information of the text content in the first video frame;
and the first device adds the first image to the second video frame according to the position information to obtain the third video frame.
6. The method according to any one of claims 1-5, wherein before the first device adds the text content to the second video frame according to the attribute information, the method further comprises:
the first device acquires a first identifier in the second video frame;
the first device acquires a second identifier in the text information;
and the first device determines that the first identifier and the second identifier are the same.
7. The method of claim 6, wherein the first identifier and the second identifier are the same timestamp.
8. The method according to any one of claims 1-7, wherein the receiving, by the first device, the second video frame sent by the second device and the text information extracted from the first video frame comprises:
the first device receives the second video frame sent by the second device from a first transmission channel;
and the first device receives the text information sent by the second device from a second transmission channel, wherein the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
9. The method of any of claims 1-8, wherein the attribute information comprises a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect comprising at least one of an affine, a rotation, or a projection.
10. The method according to any one of claims 1-9, wherein after the first device adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, the method further comprises:
the first device plays the third video frame;
or,
and the first device sends the third video frame to a third device, wherein the third device is configured to play the third video frame.
11. A video processing method, comprising:
a second device extracts text information from a first video frame, wherein the text information comprises text content and attribute information;
the second device compresses the first video frame to obtain a second video frame; and
the second device sends the second video frame and the text information to a first device.
12. The method according to claim 11, wherein before the second device sends the second video frame and the text information to the first device, the method further comprises:
the second device generates a first identifier;
and the second device adds the first identifier to the second video frame and the text information, respectively.
13. The method of claim 12, wherein the first identifier is a timestamp generated by the second device.
14. The method according to any one of claims 11-13, wherein the second device sending the second video frame and the text information to the first device comprises:
the second device sends the second video frame to the first device through a first transmission channel;
and the second device sends the text information to the first device through a second transmission channel, wherein the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
15. The method of any of claims 11-14, wherein the attribute information includes a position, font, size, color, and font special effects of the text content in the video frame, the font special effects including at least one of affine, rotation, or projection.
16. A video processing apparatus comprising a receiving module and a processing module, wherein,
the receiving module is used for receiving a second video frame sent by a second device and text information extracted from a first video frame, wherein the second video frame is obtained by compressing the first video frame, and the text information comprises text content and attribute information of the text content;
and the processing module is used for adding the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
17. The apparatus of claim 16, wherein the processing module is specifically configured to:
generating a first image corresponding to the text content according to the text content and the attribute information, wherein the resolution of the first image is greater than that of the second video frame;
and adding the first image into the second video frame according to the attribute information to obtain the third video frame.
18. The apparatus of claim 17, wherein the processing module is specifically configured to:
determining at least one group of text contents in the text contents, wherein each group of text contents comprises at least one character, the font, the size, the color and the font special effect of each character in the group of text contents are the same, and the font special effect comprises at least one of affine, rotation or projection;
and respectively generating a first image corresponding to each group of text content according to the attribute information of each group of text content.
19. The apparatus of claim 17 or 18, wherein an area of the first image other than the text content is transparent.
20. The apparatus according to any one of claims 17 to 19, wherein the processing module is specifically configured to:
acquiring, from the attribute information, the position information of the text content in the first video frame;
and adding the first image to the second video frame according to the position information to obtain the third video frame.
21. The apparatus according to any of claims 16-20, wherein before the processing module adds the text content to the second video frame according to the attribute information, the processing module is further configured to:
acquiring a first identifier in the second video frame;
acquiring a second identifier in the text information;
determining that the first identifier and the second identifier are the same.
22. The apparatus according to any one of claims 16 to 21, wherein the receiving module is specifically configured to:
receiving the second video frame sent by the second device from a first transmission channel;
and receiving the text information sent by the second device from a second transmission channel, wherein the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
23. The apparatus according to any of claims 16-22, wherein the attribute information comprises a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect comprising at least one of an affine, a rotation, or a projection.
24. A video processing apparatus comprising a processing module and a transmitting module, wherein,
the processing module is used for extracting text information from a first video frame, wherein the text information comprises text content and attribute information;
the processing module is further configured to compress the first video frame to obtain a second video frame;
the sending module is used for sending the second video frame and the text information to the first equipment.
25. The apparatus of claim 24, wherein before the sending module sends the second video frame and the text information to the first device, the processing module is further configured to:
generating a first identifier;
adding the first identifier to the second video frame and the text information, respectively.
26. The apparatus according to claim 24 or 25, wherein the sending module is specifically configured to:
transmitting the second video frame to the first device through a first transmission channel;
and sending the text information to the first device through a second transmission channel, wherein the second transmission channel is a low-bandwidth bypass channel parallel to the first transmission channel.
27. The apparatus according to any of claims 24-26, wherein the attribute information comprises a position, a font, a size, a color, and a font special effect of the text content in the video frame, the font special effect comprising at least one of an affine, a rotation, or a projection.
28. A video processing apparatus, comprising: memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the video processing method according to any of claims 1-10.
29. A video processing apparatus, comprising: memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the video processing method according to any of claims 11-15.
30. A storage medium, characterized in that the storage medium comprises a computer program for implementing the video processing method according to any one of claims 1 to 15.
CN201910517023.2A 2019-06-14 2019-06-14 Video processing method, device and equipment Pending CN112087660A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910517023.2A CN112087660A (en) 2019-06-14 2019-06-14 Video processing method, device and equipment
PCT/CN2020/095882 WO2020249100A1 (en) 2019-06-14 2020-06-12 Video processing method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910517023.2A CN112087660A (en) 2019-06-14 2019-06-14 Video processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN112087660A (en) 2020-12-15

Family

ID=73734071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910517023.2A Pending CN112087660A (en) 2019-06-14 2019-06-14 Video processing method, device and equipment

Country Status (2)

Country Link
CN (1) CN112087660A (en)
WO (1) WO2020249100A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887781A (en) * 2021-01-27 2021-06-01 维沃移动通信有限公司 Subtitle processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102246532A (en) * 2008-12-15 2011-11-16 爱立信电话股份有限公司 Method and apparatus for avoiding quality deterioration of transmitted media content
CN102957892A (en) * 2011-08-24 2013-03-06 三星电子(中国)研发中心 Method, system and device for realizing audio and video conference
CN103873877A (en) * 2012-12-14 2014-06-18 华为技术有限公司 Image transmission method and device for remote desktop
CN105979169A (en) * 2015-12-15 2016-09-28 乐视网信息技术(北京)股份有限公司 Video subtitle adding method, device and terminal
US9471990B1 (en) * 2015-10-20 2016-10-18 Interra Systems, Inc. Systems and methods for detection of burnt-in text in a video
CN109168006A (en) * 2018-09-05 2019-01-08 高新兴科技集团股份有限公司 The video coding-decoding method that a kind of figure and image coexist

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5052569B2 (en) * 2009-06-25 2012-10-17 シャープ株式会社 Image compression apparatus, image compression method, image expansion apparatus, image expansion method, image forming apparatus, computer program, and recording medium
CN102630043B (en) * 2012-04-01 2014-11-12 北京捷成世纪科技股份有限公司 Object-based video transcoding method and device
CN103731609B (en) * 2012-10-11 2019-04-26 百度在线网络技术(北京)有限公司 A kind of video broadcasting method and system
CN103905837B (en) * 2014-03-26 2017-04-19 小米科技有限责任公司 Image processing method and device and terminal
JP6128092B2 (en) * 2014-10-10 2017-05-17 コニカミノルタ株式会社 History generation apparatus and history generation method
CN108184118A (en) * 2016-12-08 2018-06-19 中兴通讯股份有限公司 Cloud desktop contents encode and coding/decoding method and device, system

Also Published As

Publication number Publication date
WO2020249100A1 (en) 2020-12-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201215