WO2020249100A1

WO2020249100A1 - Video processing method, apparatus and device

Info

Publication number: WO2020249100A1
Application number: PCT/CN2020/095882
Authority: WO
Inventors: 曾以亮; 毛春静
Original assignee: 华为技术有限公司
Priority date: 2019-06-14
Filing date: 2020-06-12
Publication date: 2020-12-17
Also published as: CN112087660A

Abstract

Provided are a video processing method, apparatus and device. The method comprises: a first device receiving a second video frame sent by a second device and text information extracted from a first video frame, wherein the second video frame is obtained by compressing the first video frame, and the text information comprises text content and attribute information of the text content; and the first device adding, according to the attribute information, the text content to the second video frame in order to obtain a third video frame to be played. The quality of a compressed video is improved.

Description

Video processing method, device and equipment

This application claims priority to Chinese patent application No. 201910517023.2, entitled "Video Processing Method, Device and Equipment", filed with the State Intellectual Property Office of China on June 14, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of computer technology, and in particular to a video processing method, device and equipment.

Background

A video includes multiple video frames, and each video frame may include text content. For example, the text content may include subtitles, bullet comments (danmaku), and prompt information.

At present, video resolution continues to increase, placing growing bandwidth pressure on video transmission channels. To relieve this pressure, a video can be compressed before it is transmitted. In practice, however, compression blurs the text content in the video, which makes the text harder for the user to read and lowers the quality of the compressed video.

Summary of the invention

This application provides a video processing method, device and equipment, which improve the quality of compressed video.

In a first aspect, an embodiment of the present application provides a video processing method. A first device receives a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content. The first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played.

In the above process, for any first video frame in the video stream, the second device first extracts text information from the first video frame and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and the text information to the first device. Since the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, the first device may merge the text content in the text information into the second video frame to obtain the third video frame. Since the text content is not compressed, the text content in the third video frame has a higher definition, and the quality of the compressed video is improved.

In a possible implementation manner, the first device may add the text content to the second video frame to obtain the third video frame to be played as follows: the first device generates a first image corresponding to the text content according to the text content and the attribute information, where the resolution of the first image is greater than the resolution of the second video frame; and the first device adds the first image to the second video frame according to the attribute information to obtain the third video frame.

In the above process, the first image generated according to the text content and the attribute information includes the text content. Since the resolution of the first image is greater than the resolution of the second video frame, the text content in the first image is clear. In this way, even though the video frame is compressed, the text content in the video frame retains a higher definition.

In a possible implementation manner, the first device generating the first image corresponding to the text content according to the text content and the attribute information includes: the first device determines at least one group of text content in the text content, where each group of text content includes at least one character, the font, size, color, and font effects of the characters in a group of text content are the same, and the font effects include at least one of affine, rotation, or projection; and the first device generates, according to the attribute information of each group of text content, the first image corresponding to that group of text content.

In the above process, the text content in the first video frame is divided into at least one group of text content. Since the font, size, color, and font effects of the characters in each group are the same, a first image can be generated separately for each group of text content. In this way, each generated first image is more accurate.

In a possible implementation manner, the area of the first image other than the text content is transparent.

In the above process, since the area of the first image other than the text content is transparent, the first image does not cover the video picture in the second video frame when the first image is added to the second video frame.

In a possible implementation manner, the first device adding the first image to the second video frame according to the attribute information to obtain the third video frame to be played includes: the first device obtains, from the attribute information, position information of the text content in the first video frame; and the first device adds the first image to the second video frame according to the position information to obtain the third video frame.

In the above process, the first image can be accurately added to the second video frame according to the position information of the text content in the first video frame, so that the position of the text content of the first image in the second video frame is the same as the position of the text content in the first video frame.
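The overlay step described above can be illustrated with a minimal sketch. It is not the implementation of this application: the function name, the pixel representation (rows of RGB tuples for the frame, rows of RGBA tuples for the first image) and the alpha-blending rule are assumptions made for illustration, and the sketch assumes the first image has already been scaled to the coordinate system of the second video frame.

```python
def overlay_text_image(frame, text_image, top_left):
    """Return a copy of `frame` with `text_image` composited at `top_left`.

    frame:      rows of (r, g, b) pixels (the decompressed second video frame).
    text_image: rows of (r, g, b, a) pixels; a == 0 marks the transparent
                area outside the text content.
    top_left:   (x, y) position taken from the attribute information.
    """
    x0, y0 = top_left
    out = [list(row) for row in frame]  # do not modify the input frame
    for dy, row in enumerate(text_image):
        for dx, (r, g, b, a) in enumerate(row):
            y, x = y0 + dy, x0 + dx
            inside = 0 <= y < len(out) and 0 <= x < len(out[y])
            if a == 0 or not inside:
                continue  # transparent or out of bounds: keep the video pixel
            alpha = a / 255
            br, bg, bb = out[y][x]
            out[y][x] = (
                round(alpha * r + (1 - alpha) * br),
                round(alpha * g + (1 - alpha) * bg),
                round(alpha * b + (1 - alpha) * bb),
            )
    return out
```

Because transparent pixels are skipped, only the text itself replaces pixels of the video picture, which matches the purpose of the transparent area described above.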

In a possible implementation manner, before the first device adds the text content to the second video frame according to the attribute information, the first device obtains a first identifier in the second video frame; the first device obtains a second identifier in the text information; and the first device determines that the first identifier and the second identifier are the same. Optionally, the first identifier and the second identifier are the same timestamp.

In the above process, when the first identifier and the second identifier are the same, the text content and the second video frame correspond to the same first video frame. In this way, the first image corresponding to the text content can be added to the correct second video frame.

In a possible implementation manner, the first device receiving the second video frame sent by the second device and the text information extracted from the first video frame includes: the first device receives, from a first transmission channel, the second video frame sent by the second device; and the first device receives, from a second transmission channel, the text information sent by the second device, where the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.

In the above process, the second device sends the second video frame and the text information to the first device on different transmission channels. Correspondingly, the first device receives the second video frame and the text information from different transmission channels. In this way, not only is the data transmission efficiency higher, but the data transmission method in each transmission channel is also simpler.

In a possible implementation manner, the attribute information includes the position, font, size, color, and font effects of the text content in the video frame, and the font effects include at least one of affine, rotation, or projection.

In a possible implementation manner, the first device adds text content to the second video frame according to the attribute information, and after obtaining the third video frame to be played, the method further includes:

The first device plays the third video frame;

or,

The first device sends the third video frame to the third device, and the third device is used to play the third video frame.

In a second aspect, an embodiment of the present application provides a video processing method. A second device extracts text information from a first video frame, where the text information includes text content and attribute information; the second device compresses the first video frame to obtain a second video frame; and the second device sends the second video frame and the text information to the first device.

In a possible implementation manner, before the second device sends the second video frame and the text information to the first device, the method further includes: the second device generates a first identifier; and the second device adds the first identifier to the second video frame and to the text information respectively. Optionally, the first identifier is a timestamp generated by the second device.

In the above process, by adding the first identifier to the second video frame and to the text information, the first device can determine the correspondence between the second video frame and the text information according to the first identifier, so that the first device can add the first image corresponding to the text content to the correct second video frame.

In a possible implementation manner, the second device sending the second video frame and the text information to the first device includes: the second device sends the second video frame to the first device through a first transmission channel; and the second device sends the text information to the first device through a second transmission channel, where the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.

In the above process, the second device sends the second video frame and the text information to the first device on different transmission channels. In this way, not only is the data transmission efficiency higher, but the data transmission method in each transmission channel is also relatively simple.

In a third aspect, an embodiment of the present application provides a video processing device, including a receiving module and a processing module, where:

The receiving module is configured to receive a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content;

The processing module is configured to add the text content to the second video frame according to the attribute information to obtain a third video frame to be played.

In a possible implementation manner, the processing module is specifically configured to:

Generating a first image corresponding to the text content according to the text content and the attribute information, where the resolution of the first image is greater than the resolution of the second video frame;

According to the attribute information, the first image is added to the second video frame to obtain the third video frame.

In a possible implementation manner, the processing module is specifically configured to:

Determine at least one group of text content in the text content, where each group of text content includes at least one character, the font, size, color, and font effects of the characters in a group of text content are the same, and the font effects include at least one of affine, rotation, or projection;

The first image corresponding to each group of text content is generated according to the attribute information of each group of text content respectively.

In a possible implementation manner, the area in the first image other than the text content is transparent.

In a possible implementation manner, the processing module is specifically configured to:

Obtain, from the attribute information, position information of the text content in the first video frame;

According to the position information, the first image is added to the second video frame to obtain the third video frame.

In a possible implementation manner, before the processing module adds the text content to the second video frame according to the attribute information, the processing module is further configured to:

Acquiring a first identifier in the second video frame;

Acquiring the second identifier in the text information;

It is determined that the first identifier and the second identifier are the same.

In a possible implementation manner, the first identifier and the second identifier are the same time stamp.

In a possible implementation manner, the receiving module is specifically configured to:

Receiving the second video frame sent by the second device from the first transmission channel;

Receiving the text information sent by the second device from a second transmission channel, where the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.

In a possible implementation manner, the attribute information includes the position, font, size, color, and font effects of the text content in the video frame, and the font effects include at least one of affine, rotation, or projection.

In a possible implementation manner, after the processing module adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, the processing module is further configured to:

Playing the third video frame;

or,

Sending the third video frame to a third device, where the third device is used to play the third video frame.

In a fourth aspect, an embodiment of the present application provides a video processing device, including a processing module and a sending module, where:

The processing module is configured to extract text information from a first video frame, where the text information includes text content and attribute information;

The processing module is further configured to perform compression processing on the first video frame to obtain a second video frame;

The sending module is configured to send the second video frame and the text information to the first device.

In a possible implementation manner, before the sending module sends the second video frame and the text information to the first device, the processing module is further configured to:

Generate the first identification;

The first identifier is added to the second video frame and the text information respectively.

In a possible implementation manner, the first identifier is a timestamp generated by the second device.

In a possible implementation manner, the sending module is specifically configured to:

Sending the second video frame to the first device through a first transmission channel;

The text information is sent to the first device through a second transmission channel, and the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.

In a fifth aspect, an embodiment of the present application provides a video processing device, including a memory, a processor, and a computer program, where the computer program is stored in the memory, and the processor runs the computer program to execute the video processing method according to any one of the first aspect.

In a sixth aspect, an embodiment of the present application provides a video processing device, including a memory, a processor, and a computer program, where the computer program is stored in the memory, and the processor runs the computer program to execute the video processing method according to any one of the second aspect.

In a seventh aspect, an embodiment of the present application provides a storage medium, the storage medium includes a computer program, and the computer program is used to implement the video processing method according to any one of the first aspect.

In an eighth aspect, an embodiment of the present application provides a storage medium, where the storage medium includes a computer program, and the computer program is used to implement the video processing method according to any one of the second aspect.

In a ninth aspect, an embodiment of the present application also provides a chip or integrated circuit, including: a memory and a processor;

The memory is used for storing program instructions and sometimes also used for storing intermediate data;

The processor is configured to call the program instructions stored in the memory to implement the video processing method according to any one of the first aspect.

In a tenth aspect, an embodiment of the present application also provides a chip or integrated circuit, including a memory and a processor;

The processor is configured to call the program instructions stored in the memory to implement the video processing method according to any one of the second aspect.

In an eleventh aspect, an embodiment of the present application also provides a program product, where the program product includes a computer program, the computer program is stored in a storage medium, and the computer program is used to implement the video processing method according to any one of the first aspect.

In a twelfth aspect, an embodiment of the present application further provides a program product, where the program product includes a computer program, the computer program is stored in a storage medium, and the computer program is used to implement the video processing method according to any one of the second aspect.

According to the video processing method, device and equipment provided by the embodiments of the present application, for any first video frame in the video stream, the second device first extracts text information from the first video frame and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and the text information to the first device. Since the second video frame is a compressed video frame, the bandwidth pressure on the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, the first device may merge the text content in the text information into the second video frame to obtain the third video frame. Since the text content is not compressed, the text content in the third video frame has a higher definition, and the quality of the compressed video is improved.

Description of the drawings

FIG. 1 is a system architecture diagram provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a video frame provided by an embodiment of the application;

FIG. 4A is a schematic diagram of another video frame provided by an embodiment of this application;

FIG. 4B is a schematic diagram of another video frame provided by an embodiment of this application;

FIG. 4C is a schematic diagram of still another video frame provided by an embodiment of this application;

FIG. 5 is an architecture diagram of video processing provided by an embodiment of the application;

FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of the application;

FIG. 7 is a schematic diagram of a first image provided by an embodiment of the application;

FIG. 8 is a schematic diagram of a video processing process provided by an embodiment of the application;

FIG. 9 is a schematic diagram of another video processing process provided by an embodiment of the application;

FIG. 10 is a video processing device provided by an embodiment of this application;

FIG. 11 is another video processing device provided by an embodiment of this application;

FIG. 12 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the application;

FIG. 13 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the application.

Detailed description

To facilitate understanding, the system architecture used in this application will be described first.

FIG. 1 is a system architecture diagram provided by an embodiment of the application. Referring to FIG. 1, the system includes a first device 101 and a second device 102. There is a video transmission channel between the second device 102 and the first device 101. The second device 102 can send a video stream to the first device 101 through the transmission channel, and the first device 101 can play the received video stream.

Before the second device 102 sends the video stream to the first device 101, for any video frame in the video stream, the second device 102 may first extract text information from the video frame and compress the video frame. The second device 102 sends the compressed video frame and the extracted text information to the first device 101. Since the second device 102 compresses the video frame, the bandwidth pressure on the transmission channel between the second device 102 and the first device 101 is reduced. After the first device 101 receives the compressed video frame and the text information, the first device 101 may merge the text content in the text information into the compressed video frame and play the merged video frame. Because the text content in the text information is not compressed, the text content in the video played by the first device 101 has a higher definition, which prevents the text content in the video from being blurred.

In a possible application scenario, when a user watches online videos through the first device 101, the second device 102 may be a video server, and the first device 101 may be a device capable of playing videos such as a mobile phone, a computer, or a TV.

In a possible application scenario, when a user uses a mobile phone to project a video onto a TV (video projection), the second device 102 may be a video server or the mobile phone, and the first device 101 may be the TV.

Hereinafter, the technical solutions shown in this application will be described in detail through specific embodiments. It should be noted that the following embodiments can exist independently or can be combined with each other. For the same or similar content, the description will not be repeated in different embodiments.

FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the application. Referring to FIG. 2, the method may include:

S201: The second device extracts text information from the first video frame.

Optionally, the second device may be a server, a mobile phone, a computer, etc.

Optionally, the first video frame is any frame in the video stream to be sent by the second device to the first device. Wherein, the second device has the same processing process for each frame in the video to be sent to the first device, and this application takes any first video frame as an example for description.

Optionally, the second device may process the first video frame by using character recognition (CR) technology to extract the text information from the first video frame. For example, CR technology may include optical character recognition (OCR) technology.

The text information includes text content and attribute information.

The text content includes one or more characters. For example, the characters may be Chinese characters, numbers, letters, and so on. The text information may include multiple groups of text content and the attribute information of each group of text content. Each group of text content includes at least one character. Within a group of text content, the spacing between every two adjacent characters is the same, and the font, size, color, and font effects of the characters are the same. The font effects include at least one of affine, rotation, or projection.

Optionally, the attribute information may include the position, font, size, color, and font special effects of the text content in the video frame. The position of the text content in the video frame can be represented by at least two coordinates of the area occupied by the text content in the video frame. Fonts can include Song Ti, Hei Ti, Li Shu, Kai Ti, etc. The font effect may include at least one of affine, rotation, or projection. Optionally, when a group of text content includes multiple characters, the attribute information of the group of text content may also include character spacing.

It should be noted that the attribute information and font special effects may also include others, which are not specifically limited in the embodiment of the present application.
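As an illustration of how text information of this kind might be represented and grouped, the following minimal Python sketch defines one record per group of text content and merges consecutive recognized characters that share the same attributes. All names (CharStyle, TextGroup, group_characters) and the input format are hypothetical and are not defined by this application. For brevity the sketch groups by font, size, color, and font effects only and does not check character spacing.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional, Tuple


@dataclass(frozen=True)
class CharStyle:
    """Attributes shared by every character in one group (hypothetical)."""
    font: str
    size: int
    color: str
    effects: Tuple[str, ...] = ()  # e.g. ("rotation",)


@dataclass
class TextGroup:
    """One group of text content plus its attribute information."""
    content: str
    style: CharStyle
    # Two opposite corners (x, y) of the rectangle occupied in the frame.
    position: Tuple[Tuple[int, int], Tuple[int, int]]
    spacing: Optional[int] = None  # only meaningful for multi-character groups


def group_characters(chars):
    """Merge consecutive recognized characters that share one style.

    `chars` is a list of (character, CharStyle, (x0, y0, x1, y1)) tuples,
    such as a character-recognition step might produce.
    """
    groups = []
    for style, run in groupby(chars, key=lambda c: c[1]):
        run = list(run)
        boxes = [c[2] for c in run]
        top_left = (min(b[0] for b in boxes), min(b[1] for b in boxes))
        bottom_right = (max(b[2] for b in boxes), max(b[3] for b in boxes))
        groups.append(TextGroup(
            content="".join(c[0] for c in run),
            style=style,
            position=(top_left, bottom_right),
        ))
    return groups
```

The two corner points stored in `position` correspond to representing the position by at least two coordinates of the area occupied by the text content.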

FIG. 3 is a schematic diagram of a video frame provided by an embodiment of the application. Referring to FIG. 3, the video frame 301 includes the text content "Happy House" and "Today's weather is good". Since the attribute information of each character in "Happy House" is the same, and the attribute information of each character in "Today's weather is good" is the same, two groups of text content can be extracted from the video frame shown in FIG. 3. Referring to video frame 302, "Happy House" can be regarded as one group of text content, whose position in video frame 302 is a rectangular area with points A1 and A2 as vertices. "Today's weather is good" can be regarded as another group of text content, whose position in video frame 302 is a rectangular area with points B1 and B2 as vertices. The two groups of text content are shown in Table 1:

Table 1

Optionally, after the text information is extracted from the first video frame, the first video frame may remain unchanged. In this way, the workload of video frame processing can be reduced.

Optionally, after the text information is extracted from the first video frame, the text content in the text information can also be removed from the first video frame. Optionally, the text content in the first video frame can be removed in the following feasible implementation manners:

One feasible implementation manner:

According to the background pattern of the area where the text content in the first video frame is located, the color of the pixel where the text content is located in the first video frame is updated to realize the removal of the text content in the first video frame. After removing the text content in the first video frame, the area where the text content is located in the first video frame can present a complete background pattern.

Optionally, the background pattern of the area where the text content is located may be a solid color, a preset regular shape, a preset image, and the like.

A video frame from which the text content has been removed in this way is described below with reference to FIGS. 4A-4B.

FIG. 4A is a schematic diagram of another video frame provided by an embodiment of the application. Referring to FIG. 4A, video frame A1 includes the text content "Happy House" and "Today's weather is good". The background pattern of the area where "Happy House" is located is pure gray, and the background pattern of the area where "Today's weather is good" is located is pure red. The color of the pixels where "Happy House" is located in video frame A1 can be replaced with gray, and the color of the pixels where "Today's weather is good" is located in video frame A1 can be replaced with red, to obtain video frame A2. Video frame A2 does not include the text content.

FIG. 4B is a schematic diagram of another video frame provided by an embodiment of the application. Referring to FIG. 4B, video frame B1 includes the text content "Happy House" and "Today's weather is good". The background pattern of the area where "Happy House" is located is vertical stripes, and the background pattern of the area where "Today's weather is good" is located is petals. The color of the pixels where "Happy House" is located in video frame B1 can be replaced with the color of the corresponding pixels in the vertical stripes, and the color of the pixels where "Today's weather is good" is located in video frame B1 can be replaced with the color of the corresponding pixels in the petals, to obtain video frame B2. Video frame B2 does not include the text content and includes complete vertical stripes and complete petals.
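For the solid-color background case of FIG. 4A, the removal can be sketched as follows. This is an illustrative assumption rather than the method of this application: the background color is estimated as the most common non-text color inside the text area, and the function name, the boolean text mask and the box format are hypothetical. Patterned backgrounds such as those in FIG. 4B would need a pattern-aware fill, which this sketch does not attempt.

```python
from collections import Counter


def remove_text_solid_background(frame, text_mask, box):
    """Repaint text pixels inside `box` with the dominant non-text color.

    frame:     rows of pixel values, e.g. (r, g, b) tuples.
    text_mask: rows of booleans, True where a pixel belongs to text.
    box:       (x0, y0, x1, y1), with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    out = [list(row) for row in frame]  # do not modify the input frame
    background = Counter(
        frame[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if not text_mask[y][x]
    )
    if not background:
        return out  # no background pixel to sample from: leave unchanged
    fill = background.most_common(1)[0][0]
    for y in range(y0, y1):
        for x in range(x0, x1):
            if text_mask[y][x]:
                out[y][x] = fill
    return out
```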

Another feasible implementation manner:

Update the color of the pixel where the text content in the first video frame is located to the preset color. For example, the preset color can be white, gray, etc.

A video frame from which the text content has been removed in this way is described below with reference to FIG. 4C.

FIG. 4C is a schematic diagram of still another video frame provided by an embodiment of this application. Referring to FIG. 4C, video frame C1 includes the text content "Happy House" and "Today's weather is good". The background pattern of the area where "Happy House" is located is vertical stripes, and the background pattern of the area where "Today's weather is good" is located is petals. Assuming that the preset color is white, the color of the pixels where "Happy House" is located in video frame C1 can be replaced with white, and the color of the pixels where "Today's weather is good" is located in video frame C1 can be replaced with white, to obtain video frame C2. Video frame C2 does not include the text content. The white fill covers some pixels of the vertical stripes and of the petals: in video frame C2, part of the vertical stripes is covered with white, and part of the petals is covered with white.

S202: The second device performs compression processing on the first video frame to obtain a second video frame.
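The preset-color approach is simpler, since it does not need to inspect the background. A minimal sketch, assuming a hypothetical boolean text mask with the same shape as the frame and white as the default preset color:

```python
def remove_text_preset_color(frame, text_mask, preset=(255, 255, 255)):
    """Return a copy of `frame` with every text pixel set to `preset`.

    frame:     rows of pixel values, e.g. (r, g, b) tuples.
    text_mask: rows of booleans, True where a pixel belongs to text.
    """
    return [
        [preset if is_text else pixel for pixel, is_text in zip(row, mask_row)]
        for row, mask_row in zip(frame, text_mask)
    ]
```

As in FIG. 4C, the filled pixels no longer carry the background pattern, so this variant trades visual completeness of the background for less processing.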

Wherein, the resolution of the second video frame is smaller than the resolution of the first video frame.

Optionally, the second device may determine a compression ratio for the first video frame according to the bandwidth of the transmission channel between the second device and the first device, and compress the first video frame according to the compression ratio to obtain the second video frame. The bit rate (in bps) of the compressed video is smaller than the bandwidth of the transmission channel between the second device and the first device.
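One way such a compression ratio could be derived is sketched below. The formula, the function name and the `headroom` parameter are assumptions for illustration and are not specified by this application. The sketch targets a fraction of the channel bandwidth so that the resulting bit rate stays strictly below the bandwidth.

```python
def choose_compression_ratio(raw_bitrate_bps, channel_bandwidth_bps, headroom=0.8):
    """Return the factor by which the raw bit rate should be divided.

    The target bit rate is `headroom` times the channel bandwidth, so the
    compressed bit rate (raw / ratio) stays below the bandwidth. A ratio
    of 1.0 means the stream already fits within the target.
    """
    if raw_bitrate_bps <= 0 or channel_bandwidth_bps <= 0:
        raise ValueError("bit rate and bandwidth must be positive")
    if not 0 < headroom < 1:
        raise ValueError("headroom must be between 0 and 1")
    target_bps = channel_bandwidth_bps * headroom
    return max(1.0, raw_bitrate_bps / target_bps)
```

For example, a 100 Mbps source over a 20 Mbps channel with the default headroom gives a ratio of about 6.25, that is, a compressed bit rate of about 16 Mbps.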

S203: The second device sends the second video frame and text information to the first device.

A video includes multiple video frames. For each video frame, the second device can obtain the corresponding second video frame and text information, and send the second video frame and the text information to the first device. Correspondingly, the first device can receive multiple second video frames and multiple pieces of text information. For the first device to determine the correspondence between the second video frames and the pieces of text information, the second device needs to send the second video frame and the text information according to a preset rule.

Optionally, the second device may send the second video frame and text information to the first device in the following feasible implementation manners:

The second device generates the first identifier, and adds the first identifier to the second video frame and the text information respectively, and the second device sends the second video frame including the first identifier and the text information including the first identifier to the first device.

Among them, different video frames correspond to different identifiers.

Optionally, the first identifier may be a timestamp generated by the second device according to the current time. When the second device needs to generate timestamps corresponding to multiple video frames at the same time, the second device may first generate a timestamp according to the current time, and then add an identifier to the timestamp, so that each video frame corresponds to a different timestamp. For example, suppose that the second device needs to generate the timestamps corresponding to video frame 1, video frame 2, and video frame 3 at the same time, and the timestamp generated according to the current time is timestamp 1. The second device then adds an identifier to timestamp 1: the timestamp corresponding to video frame 1 may be timestamp 1+a, the timestamp corresponding to video frame 2 may be timestamp 1+b, and the timestamp corresponding to video frame 3 may be timestamp 1+c.

In this feasible implementation manner, the second device adds the same identifier to the second video frame and the text information, so that the first device can determine the correspondence between the second video frame and the text information according to the identifier, and the process is simple and convenient.
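The timestamp-plus-identifier scheme described above can be sketched as follows. The function name `make_identifiers`, the millisecond resolution, and the numeric suffixes (in place of the a, b, c of the example) are assumptions chosen for illustration.

```python
import time
from typing import List, Optional


def make_identifiers(frame_count: int, now_ms: Optional[int] = None) -> List[str]:
    """Build one distinct identifier per frame from a single base timestamp.

    The base timestamp is shared; a per-frame suffix (an assumed numeric index)
    keeps identifiers different when several frames are stamped at the same time.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in milliseconds
    return [f"{now_ms}-{index}" for index in range(frame_count)]


# A fixed base timestamp is passed here so the result is reproducible.
ids = make_identifiers(3, now_ms=1560470400000)
```

The same identifier would then be attached to a second video frame and to its text information before both are sent.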

Optionally, the second device may send the second video frame and text information to the first device on the same transmission channel.

Optionally, the second device may also send the second video frame and text information to the first device over different transmission channels. For example, the second device can send the second video frame to the first device through the first transmission channel, and send the text information to the first device through the second transmission channel. The second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.

For example, the first transmission channel may be an embedded Display Port (eDP) channel, a High Definition Multimedia Interface (HDMI) channel, etc. The second transmission channel may be an auxiliary channel (Auxiliary, AUX), a universal serial bus (Universal Serial Bus, USB) channel, etc.

S204. The first device adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played.

Optionally, the first device may first determine the second video frame and the corresponding text information. For example, when the second device sends the second video frame and text information through the method shown in S203, the first device may determine the correspondence between the second video frame and the text information according to the identifier included in the received second video frame and the identifier included in the text information. For example, suppose that a second video frame received by the first device includes a first identifier, and a piece of text information received by the first device includes a second identifier. If the first identifier and the second identifier are the same, the first device determines that the second video frame corresponds to the text information.
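The identifier comparison on the first device can be sketched as a small buffer that holds whichever half of a pair arrives first. The class name `Pairer`, its methods, and the use of `bytes` and `dict` as payload types are hypothetical; this application only requires that a frame and text information carrying the same identifier be treated as corresponding.

```python
from typing import Dict, Optional, Tuple


class Pairer:
    """Match second video frames with text information that carry the same identifier."""

    def __init__(self) -> None:
        self._frames: Dict[str, bytes] = {}  # frames still waiting for their text
        self._texts: Dict[str, dict] = {}    # text information still waiting for a frame

    def on_frame(self, identifier: str, frame: bytes) -> Optional[Tuple[bytes, dict]]:
        text = self._texts.pop(identifier, None)
        if text is None:
            self._frames[identifier] = frame  # text not here yet, keep the frame
            return None
        return frame, text

    def on_text(self, identifier: str, text: dict) -> Optional[Tuple[bytes, dict]]:
        frame = self._frames.pop(identifier, None)
        if frame is None:
            self._texts[identifier] = text  # frame not here yet, keep the text
            return None
        return frame, text


pairer = Pairer()
first = pairer.on_frame("t1", b"frame-1")                  # nothing to pair with yet
second = pairer.on_text("t2", {"content": "other"})        # different identifier
third = pairer.on_text("t1", {"content": "Happy House"})   # identifiers are the same
```

Buffering in both directions lets the sketch tolerate either arrival order, which matters when the frame and the text information travel over different channels.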

Optionally, the first device may add the text content to the second video frame to obtain the third video frame through the following feasible implementation: the first device generates the first image corresponding to the text content according to the text content and the attribute information, and the first device adds the first image to the second video frame according to the attribute information to obtain the third video frame. For example, the first device may add the first image to the second video frame according to the location information in the attribute information. For example, the first device may overlay the first image on the second video frame according to the location information.

Optionally, the area except the text content in the first image is transparent.

Optionally, the resolution of the first image is greater than the resolution of the second video frame. For example, the resolution of the first image may be equal to the resolution of the first video frame, so that the definition of the text content in the third video frame can be made higher.
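Overlaying a first image whose non-text area is transparent can be sketched with plain alpha blending. The sketch assumes the second video frame and the first image are already on the same pixel grid (for example, after the frame has been scaled for display), which this application does not state; `overlay`, `top`, and `left` are hypothetical names, with `top` and `left` standing in for the location information.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def overlay(frame: List[List[RGB]], image: List[List[RGBA]],
            top: int, left: int) -> List[List[RGB]]:
    """Blend an RGBA image onto a copy of an RGB frame at (top, left)."""
    result = [row[:] for row in frame]
    for dy, image_row in enumerate(image):
        for dx, (r, g, b, a) in enumerate(image_row):
            y, x = top + dy, left + dx
            inside = 0 <= y < len(result) and 0 <= x < len(result[y])
            if a == 0 or not inside:
                continue  # transparent or out-of-frame pixels leave the frame as it is
            br, bg, bb = result[y][x]
            result[y][x] = ((r * a + br * (255 - a)) // 255,
                            (g * a + bg * (255 - a)) // 255,
                            (b * a + bb * (255 - a)) // 255)
    return result


second_frame = [[(50, 50, 50)] * 2 for _ in range(2)]  # 2x2 gray frame
first_image = [[(255, 0, 0, 255), (0, 0, 0, 0)]]       # one text pixel, one transparent
third_frame = overlay(second_frame, first_image, top=1, left=0)
```

Only the opaque text pixel changes the frame; the transparent pixel keeps the compressed background visible around the text.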

Optionally, after the first device obtains the third video frame, the first device may play the third video frame, or the first device may send the third video frame to a third device, so that the third device may play the third video frame.

In the video processing method provided by the embodiments of the present application, for any first video frame in a video stream, the second device first extracts text information from the first video frame, and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and text information to the first device. Since the second video frame is a compressed video frame, the bandwidth pressure of the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, the first device may merge the text content in the text information into the second video frame to obtain the third video frame. Since the text content is not compressed, the definition of the text content in the third video frame can be made higher, and the quality of the compressed video can be improved.

For ease of understanding, the following describes the video processing architecture of the first device and the second device in conjunction with FIG. 5.

FIG. 5 is an architecture diagram of video processing provided by an embodiment of the application. Referring to FIG. 5, after the second device obtains the first video frame, the second device extracts text information from the first video frame, and compresses the first video frame from which the text information has been extracted to obtain the second video frame. The second device also generates a time stamp, and adds the same time stamp to the text information and the second video frame, that is, the text information and the second video frame are each stamped with the same time stamp. The second device sends the time-stamped text information and the time-stamped second video frame to the first device.

After the first device receives the text information with a time stamp and the second video frame with a time stamp, the first device may determine the correspondence between the text information and the second video frame according to the time stamp. After the first device determines the correspondence between the text information and the second video frame, the first device may generate the first image according to the text information, and perform merging processing on the first image and the second video frame to obtain the third video frame.

Under the architecture shown in FIG. 5, the video processing process will be described below in conjunction with FIG. 6.

FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of the application. Referring to Figure 6, the method may include:

S601. The second device obtains the first video frame.

The second device obtains the first video frame in the video stream to be sent to the first device. The video stream may be a local video stream of the second device, or a video stream received by the second device from other devices.

S602. The second device extracts text information from the first video frame.

It should be noted that, for the execution process of S602, refer to the execution process of S201, which will not be repeated here.

S603. The second device performs compression processing on the first video frame to obtain a second video frame.

It should be noted that the execution process of S603 can be referred to the execution process of S202, which will not be repeated here.

S604. The second device generates a time stamp.

Optionally, the second device may generate a time stamp according to the current time.

S605. The second device adds a time stamp to the text information and the second video frame respectively.

After the time stamp is added to the text information, the time stamp is included in the text information. After adding the time stamp to the second video frame, the second video frame includes the time stamp.

S606: The second device sends the second video frame including the time stamp to the first device through the first transmission channel.

S607: The second device sends the text information including the time stamp to the first device through the second transmission channel.

S608: The first device determines the correspondence between the text information and the second video frame according to the timestamp.

Optionally, the first device may receive multiple text information and multiple second video frames at the same moment. Therefore, the first device needs to determine the correspondence between the text information and the second video frame according to the timestamp. For example, the first device determines the text information and the second video frame with the same time stamp as the text information and the second video frame having a corresponding relationship.

S609. The first device generates a first image according to the text information.

Optionally, if the text information includes multiple sets of text content, the first device may generate a first image corresponding to each set of text content.

Optionally, for any set of text content, the first device generates a first image corresponding to the set of text content according to the set of text content and the font, size, color, and font effects of the set of text content, and the resolution of the first image is greater than the resolution of the second video frame.

Optionally, the size of the first image may be the same as the size of the area occupied by the text content in the first video frame.

Hereinafter, the first image will be described with reference to FIG. 7.

FIG. 7 is a schematic diagram of a first image provided by an embodiment of the application. Referring to FIG. 7, assume that the first video frame is as shown at 701. The first video frame 701 includes the text content "Happy House" and "Today's weather is good."

Assuming that the text information extracted from the first video frame is as shown in Table 1, the first device can generate an image 702 according to "Happy House" and "Position 1, Font 1, Size 1, Color 1, Spacing 1, Special Effects 1" in Table 1. The image 702 includes the text content "Happy House". The attribute information of the text content in the image 702 is the same as the attribute information of the text content in the first video frame 701. The size of the image 702 is the same as the size of the area occupied by the text content "Happy House" in the first video frame 701.

The first device can also generate an image 703 according to "It's nice today" and "Position 2, Font 2, Size 2, Color 2, Spacing 2, Special Effects 2" in Table 1. The image 703 includes the text content "It's nice today". The attribute information of the text content in the image 703 is the same as the attribute information of the text content in the first video frame 701. The size of the image 703 is the same as the size of the area occupied by the text content "It's nice today" in the first video frame 701.
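Splitting text content into sets whose characters share the same font, size, color, and font effect, so that one first image can be generated per set, can be sketched as follows. The `Style` and `StyledChar` types, the attribute values, and the choice to group only consecutive characters are assumptions for illustration; position and spacing are omitted for brevity.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Style:
    font: str
    size: int
    color: str
    effect: str


@dataclass(frozen=True)
class StyledChar:
    char: str
    style: Style


def group_by_style(chars: List[StyledChar]) -> List[Tuple[Style, str]]:
    """Merge consecutive characters with identical style into one set of text content."""
    return [(style, "".join(c.char for c in run))
            for style, run in groupby(chars, key=lambda c: c.style)]


# Placeholder attribute values standing in for "Font 1", "Size 1", and so on.
title = Style(font="Font 1", size=32, color="Color 1", effect="none")
caption = Style(font="Font 2", size=18, color="Color 2", effect="none")
chars = ([StyledChar(ch, title) for ch in "Happy House"]
         + [StyledChar(ch, caption) for ch in "It's nice today"])
groups = group_by_style(chars)
```

Each resulting pair carries what is needed to render one first image: the text of the set and the single style that all of its characters share.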

S610. The first device combines the first image and the second video frame to obtain a third video frame.

Optionally, if the number of first images is multiple, the first device respectively performs merging processing on each first image and second video frame to obtain a third video frame.

S611. The first device plays the third video frame.

In the embodiment shown in FIG. 6, for any first video frame in the video stream, the second device first extracts text information from the first video frame, and compresses the first video frame to obtain the second video frame. The second device sends the second video frame and text information to the first device. Since the second video frame is a compressed video frame, the bandwidth pressure of the transmission channel between the second device and the first device is reduced. After the first device receives the second video frame and the text information, the first device may merge the text content in the text information into the second video frame to obtain the third video frame, and play the third video frame. Since the text content is not compressed, the definition of the text content in the third video frame played by the first device can be made higher, so that the quality of the video played by the first device is higher.

On the basis of any of the foregoing embodiments, the foregoing video processing method will be described below with reference to FIGS. 8-9.

FIG. 8 is a schematic diagram of a video processing process provided by an embodiment of this application. Referring to FIG. 8, the first video frame is shown as 801, and the first video frame 801 includes the text information "Happy House" and "It's nice today." Among them, the font, size, color, and font effects of the characters in the text content "Happy House" are the same, and the font, size, color, and font effects of the characters in the text content "It's nice today" are the same.

After acquiring the first video frame 801, the second device extracts the text information 802 from the first video frame 801, where the text information 802 includes two sets of text content, "Happy House" and "It's nice today", and the attribute information of each set of text content. After the second device extracts the text information from the first video frame 801, it is assumed that the first video frame 801 remains unchanged.

The second device performs compression processing on the first video frame 801 from which the text information has been extracted, to obtain a second video frame 803. The second device adds the same time stamp to the second video frame 803 and the text information 802, and transmits the second video frame 803 including the time stamp and the text information 802 including the time stamp.

After the first device receives the second video frame 803 including the time stamp and the text information 802 including the time stamp, the first device determines, according to the time stamp, that the second video frame 803 corresponds to the text information 802. The first device generates an image 804 according to the text content "Happy House" and the font, size, color, and font effects of that text content. The font, size, color, and font effects of the text content "Happy House" in the image 804 are the same as those of the corresponding text content in the first video frame 801. The first device generates an image 805 according to the text content "It's nice today" and the font, size, color, and font effects of that text content. The font, size, color, and font effects of the text content "It's nice today" in the image 805 are the same as those of the corresponding text content in the first video frame 801.

The first device overlays the image 804 on the second video frame 803 according to the position information of the text content "Happy House" in the first video frame 801. The first device also overlays the image 805 on the second video frame 803 according to the position information of the text content "It's nice today" in the first video frame 801 to obtain the third video frame 806. The first device can play the third video frame 806.

In the embodiment shown in FIG. 8, not only can the bandwidth pressure between the second device and the first device be reduced, but the definition of the text content in the third video frame played by the first device can also be made higher, so that the quality of the video played by the first device is higher. Further, after the second device extracts the text information from the first video frame, the first video frame remains unchanged (the first video frame is not processed), so that the workload of video processing on the second device is small and the video processing efficiency is higher.

FIG. 9 is a schematic diagram of another video processing process provided by an embodiment of this application. Referring to FIG. 9, the first video frame is shown as 901, and the first video frame 901 includes text information "Happy House" and "It's nice today." Among them, the font, size, color, and font effects of the characters in the text content "Happy House" are the same, and the font, size, color, and font effects of the characters in the text content "It's nice today" are the same.

After acquiring the first video frame 901, the second device extracts the text information 902 from the first video frame 901, where the text information 902 includes two sets of text content, "Happy House" and "The weather is good today", and the attribute information of each set of text content. After the second device extracts the text information from the first video frame 901, the second device removes the text content from the first video frame 901 to obtain the first video frame 903. Referring to FIG. 9, the first video frame 903 does not include text content.

The second device performs compression processing on the first video frame 903 to obtain the second video frame 904. The second device adds the same time stamp to the second video frame 904 and the text information 902, and transmits the second video frame 904 including the time stamp and the text information 902 including the time stamp.

After the first device receives the second video frame 904 including the time stamp and the text information 902 including the time stamp, the first device determines, according to the time stamp, that the second video frame 904 corresponds to the text information 902. The first device generates an image 905 according to the text content "Happy House" and the font, size, color, and font effects of that text content. The font, size, color, and font effects of the text content "Happy House" in the image 905 are the same as those of the corresponding text content in the first video frame 901. The first device generates an image 906 according to the text content "Today's weather is good" and the font, size, color, and font effects of that text content. The font, size, color, and font effects of the text content "Today's weather is good" in the image 906 are the same as those of the corresponding text content in the first video frame 901.

The first device overlays the image 905 on the second video frame 904 according to the position information of the text content "Happy House" in the first video frame 901. The first device also overlays the image 906 on the second video frame 904 according to the position information of the text content "It's nice today" in the first video frame 901 to obtain the third video frame 907. The first device can play the third video frame 907.

In the embodiment shown in FIG. 9, not only can the bandwidth pressure between the second device and the first device be reduced, but the definition of the text content in the third video frame played by the first device can also be made higher, so that the quality of the video played by the first device is higher. Further, after the second device extracts the text information from the first video frame, the text content is removed from the first video frame. In this way, when the first device adds the first image (image 905 and image 906) to the second video frame, the problem that the text content in the first image cannot completely cover the text content in the second video frame can be avoided, so that the quality of the video processing is higher.

Fig. 10 is a video processing device provided by an embodiment of the application. The video processing device 10 may be provided in the first device. Referring to FIG. 10, the video processing device 10 may include a receiving module 11 and a processing module 12, where:

The receiving module 11 is configured to receive a second video frame sent by a second device and text information extracted from the first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content;

The processing module 12 is configured to add the text content to the second video frame according to the attribute information to obtain a third video frame to be played.

Optionally, the receiving module 11 may execute S203 in the embodiment in FIG. 2 and S606-S607 in the embodiment in FIG. 6.

Optionally, the processing module 12 may execute S204 in the embodiment in FIG. 2 and S608-S611 in the embodiment in FIG. 6.

It should be noted that the video processing apparatus shown in the embodiments of the present application can execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects are similar, and will not be repeated here.

In a possible implementation manner, the processing module 12 is specifically configured to:

In a possible implementation manner, before the processing module adds the text content to the second video frame according to the attribute information, the processing module 12 is further configured to:

Acquiring a first identifier in the second video frame;

Acquiring the second identifier in the text information;

In a possible implementation manner, the receiving module 11 is specifically configured to:

In a possible implementation manner, after the processing module 12 adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, the processing module 12 is further configured to:

Playing the third video frame;

or,

FIG. 11 is another video processing device provided by an embodiment of this application. The video processing device 20 may be provided in a second device. Referring to FIG. 11, the video processing device 20 may include a processing module 21 and a sending module 22, where:

The processing module 21 is configured to extract text information from a first video frame, where the text information includes text content and attribute information;

The processing module 21 is further configured to perform compression processing on the first video frame to obtain a second video frame;

The sending module 22 is configured to send the second video frame and the text information to the first device.

Optionally, the processing module 21 may execute S201-S202 in the embodiment of FIG. 2 and S601-S605 in the embodiment of FIG. 6.

Optionally, the sending module 22 may execute S203 in the embodiment in FIG. 2 and S606-S607 in the embodiment in FIG. 6.

In a possible implementation manner, before the sending module 22 sends the second video frame and the text information to the first device, the processing module 21 is further configured to:

Generate the first identification;

In a possible implementation manner, the sending module 22 is specifically used for:

FIG. 12 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the application. Referring to FIG. 12, the video processing device 30 includes a memory 31, a processor 32, and a receiver 33, where the memory 31 and the processor 32 communicate. For example, the memory 31, the processor 32, and the receiver 33 can communicate through the bus 44. The memory 31 is used to store a computer program, and the processor 32 executes the computer program to implement the foregoing video processing method.

Optionally, the processor 32 shown in the present application may implement the function of the processing module 12 in the embodiment of FIG. 10, and the receiver 33 may implement the function of the receiving module 11 in the embodiment of FIG. 10, which will not be repeated here.

Optionally, the foregoing processor may be a CPU, or another general-purpose processor, a DSP, an ASIC, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The steps in the embodiments of the video processing method disclosed in this application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.

FIG. 13 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the application. Referring to FIG. 13, the video processing device 40 includes: a memory 41, a processor 42, and a transmitter 43, where the memory 41 and the processor 42 communicate; for example, the memory 41, the processor 42 and the transmitter 43 can communicate through The bus 44 communicates, the memory 41 is used to store a computer program, and the processor 42 executes the computer program to implement the foregoing video processing method.

Optionally, the processor 42 shown in the present application may implement the function of the processing module 21 in the embodiment of FIG. 11, and the transmitter 43 may implement the function of the sending module 22 in the embodiment of FIG. 11, which will not be repeated here.

The present application provides a storage medium, the storage medium is used to store a computer program, and the computer program is used to implement the video processing method described in the foregoing embodiment.

The embodiment of the present application also provides a chip or integrated circuit, including: a memory and a processor;

The processor is configured to call the program instructions stored in the memory to implement the video processing method described above.

Optionally, the memory can be independent or integrated with the processor. In some embodiments, the memory may also be located outside the chip or integrated circuit.

An embodiment of the present application also provides a program product, the program product includes a computer program, the computer program is stored in a storage medium, and the computer program is used to implement the above-mentioned video processing method.

All or part of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware. The aforementioned program can be stored in a readable memory. When the program is executed, it performs the steps of the foregoing method embodiments. The foregoing memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disks, solid-state drives, magnetic tapes, floppy disks, optical discs, and any combination thereof.

The embodiments of this application are described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each process and/or block in the flowchart and/or block diagram, and the combination of processes and/or blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to the processing unit of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processing unit of the computer or other programmable data processing equipment produce a device that realizes the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the present application. In this way, if these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include these modifications and variations.

In this application, the term "including" and its variations may refer to non-limiting inclusion; the term "or" and its variations may refer to "and/or". The terms "first", "second", etc. in this application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. In this application, "plurality" means two or more. "And/or" describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.

Claims

A video processing method, characterized by comprising:

The first device receives a second video frame sent by the second device and text information extracted from the first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content;

The first device adds the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
The method according to claim 1, wherein the first device adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, comprising:

Generating, by the first device, a first image corresponding to the text content according to the text content and the attribute information, the resolution of the first image is greater than the resolution of the second video frame;

The first device adds the first image to the second video frame according to the attribute information to obtain the third video frame.
The method according to claim 2, wherein the first device generating the first image corresponding to the text content according to the text content and the attribute information comprises:

The first device determines at least one group of text content in the text content, wherein each group of text content includes at least one character, the font, size, color, and font special effect of each character in a group of text content are the same, and the font special effect includes at least one of affine, rotation, or projection;

The first device respectively generates a first image corresponding to each group of text content according to the attribute information of each group of text content.
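The grouping step above can be illustrated with a minimal sketch. It assumes each character arrives paired with a style record; the names `Style` and `group_by_style` are hypothetical and do not come from the application, and only consecutive characters with identical attributes are merged into one group.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Style:
    """Hypothetical per-character attributes named in the claim."""
    font: str
    size: int
    color: str
    effect: str  # e.g. "none", "affine", "rotation", "projection"


def group_by_style(chars):
    """Split (character, Style) pairs into consecutive runs sharing one Style."""
    groups = []
    for style, run in groupby(chars, key=lambda pair: pair[1]):
        groups.append(("".join(ch for ch, _ in run), style))
    return groups


plain = Style("SimSun", 24, "#FFFFFF", "none")
tilted = Style("SimSun", 24, "#FFFF00", "rotation")
chars = [("H", plain), ("i", plain), ("!", tilted)]
groups = group_by_style(chars)
```

Each resulting group could then be rendered into its own first image using the attributes of that group.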
The method according to claim 2 or 3, wherein the area in the first image other than the text content is transparent.
The method according to any one of claims 2-4, wherein the first device adding the first image to the second video frame according to the attribute information to obtain the third video frame to be played comprises:

The first device obtains, from the attribute information, position information of the text content in the first video frame;

The first device adds the first image to the second video frame according to the position information to obtain the third video frame.
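As a rough illustration of adding a transparent first image at a given position, the sketch below models a frame as a list of pixel rows and marks transparent pixels with `None`. The function name `overlay`, the pixel representation, and the coordinates are assumptions made for illustration; scaling between the resolution of the first image and that of the second video frame is deliberately left out.

```python
def overlay(frame, image, x, y):
    """Return a copy of frame with image pasted at column x, row y.

    Pixels equal to None in image are treated as transparent, so the
    underlying frame pixel is kept. Pixels falling outside the frame
    are ignored.
    """
    out = [row[:] for row in frame]
    for dy, row in enumerate(image):
        for dx, pixel in enumerate(row):
            ty, tx = y + dy, x + dx
            if pixel is not None and 0 <= ty < len(out) and 0 <= tx < len(out[ty]):
                out[ty][tx] = pixel
    return out


second_frame = [[0] * 4 for _ in range(3)]   # stand-in for the compressed frame
first_image = [[9, None], [None, 9]]         # text pixels are 9, the rest transparent
third_frame = overlay(second_frame, first_image, 1, 1)
```

Only the text pixels replace frame pixels, which matches the requirement that the area of the first image outside the text content is transparent.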
The method according to any one of claims 1-5, wherein before the first device adds the text content to the second video frame according to the attribute information, the method further comprises:

Acquiring, by the first device, a first identifier in the second video frame;

The first device obtains the second identifier in the text information;

The first device determines that the first identifier and the second identifier are the same.
The method according to claim 6, wherein the first identifier and the second identifier are the same timestamp.
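The identifier check can be sketched as a lookup keyed on the timestamp. The dictionary layout, the key name `ts`, and the function `pair_by_timestamp` are hypothetical; the application does not specify how the identifiers are stored. A frame with no matching text information is paired with `None` and would be played without any text being added.

```python
def pair_by_timestamp(frames, texts):
    """Pair each frame with the text information carrying the same timestamp."""
    text_by_ts = {t["ts"]: t for t in texts}
    return [(f, text_by_ts.get(f["ts"])) for f in frames]


frames = [{"ts": 100, "data": "f0"}, {"ts": 140, "data": "f1"}]
texts = [{"ts": 140, "content": "hello"}]
pairs = pair_by_timestamp(frames, texts)
```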
The method according to any one of claims 1-7, wherein the receiving, by the first device, the second video frame sent by the second device and the text information extracted from the first video frame, comprises:

Receiving, by the first device, the second video frame sent by the second device from a first transmission channel;

The first device receives the text information sent by the second device from a second transmission channel, and the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
The method according to any one of claims 1-8, wherein the attribute information includes the position, font, size, color, and font special effects of the text content in the video frame, and the font special effects include at least one of affine, rotation, or projection.
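One possible way to carry the text content and its attribute information over a small side channel is a compact serialized record. The sketch below is an assumption for illustration only: the field names, the `TextInfo` class, and the choice of JSON are not taken from the application.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextInfo:
    """Hypothetical record: text content plus the attributes listed in the claim."""
    content: str
    x: int          # position of the text in the video frame
    y: int
    font: str
    size: int
    color: str
    effects: list   # any of "affine", "rotation", "projection"


def encode(info):
    """Serialize a TextInfo record to UTF-8 JSON bytes."""
    return json.dumps(asdict(info), ensure_ascii=False).encode("utf-8")


def decode(payload):
    """Rebuild a TextInfo record from UTF-8 JSON bytes."""
    return TextInfo(**json.loads(payload.decode("utf-8")))


info = TextInfo("字幕 subtitle", 120, 640, "SimSun", 24, "#FFFFFF", ["rotation"])
restored = decode(encode(info))
```

A record like this is far smaller than the pixels it describes, which is consistent with sending it over a channel of much lower bandwidth than the video channel.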
The method according to any one of claims 1-9, wherein after the first device adds the text content to the second video frame according to the attribute information to obtain the third video frame to be played, the method further comprises:

Playing the third video frame by the first device;

or,

The first device sends the third video frame to a third device, and the third device is used to play the third video frame.
A video processing method, characterized by comprising:

The second device extracts text information from the first video frame, where the text information includes text content and attribute information;

Performing compression processing on the first video frame by the second device to obtain a second video frame;

The second device sends the second video frame and the text information to the first device.
The method according to claim 11, wherein before the second device sends the second video frame and the text information to the first device, the method further comprises:

The second device generates a first identifier;

The second device adds the first identifier to the second video frame and the text information respectively.
The method according to claim 12, wherein the first identifier is a timestamp generated by the second device.
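On the sending side, attaching one timestamp to both outputs might look like the following sketch. The dictionary layout, the key name `ts`, the millisecond unit, and the function `tag_with_timestamp` are assumptions; the optional `now_ms` parameter exists only so the example is deterministic.

```python
import time


def tag_with_timestamp(frame, text_info, now_ms=None):
    """Return copies of frame and text_info that carry the same timestamp."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return {**frame, "ts": ts}, {**text_info, "ts": ts}


tagged_frame, tagged_text = tag_with_timestamp(
    {"data": b"compressed-bytes"}, {"content": "hi"}, now_ms=1000
)
```

Because both records carry the same value, the receiving device can later confirm that a piece of text information belongs to a given compressed frame.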
The method according to any one of claims 11-13, wherein the second device sending the second video frame and the text information to the first device comprises:

Sending, by the second device, the second video frame to the first device through a first transmission channel;

The second device sends the text information to the first device through a second transmission channel, and the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
The method according to any one of claims 11-14, wherein the attribute information includes the position, font, size, color, and font special effects of the text content in the video frame, and the font special effects include at least one of affine, rotation, or projection.
A video processing device, which is characterized by comprising a receiving module and a processing module, wherein:

The receiving module is configured to receive a second video frame sent by a second device and text information extracted from a first video frame, where the second video frame is obtained by compressing the first video frame, and the text information includes text content and attribute information of the text content;

The processing module is configured to add the text content to the second video frame according to the attribute information to obtain a third video frame to be played.
The device according to claim 16, wherein the processing module is specifically configured to:

Generate a first image corresponding to the text content according to the text content and the attribute information, where the resolution of the first image is greater than the resolution of the second video frame;

Add the first image to the second video frame according to the attribute information to obtain the third video frame.
The device according to claim 17, wherein the processing module is specifically configured to:

Determine at least one group of text content in the text content, wherein each group of text content includes at least one character, the font, size, color, and font special effects of each character in a group of text content are the same, and the font special effects include at least one of affine, rotation, or projection;

Generate a first image corresponding to each group of text content according to the attribute information of that group of text content.
The device according to claim 17 or 18, wherein the area in the first image other than the text content is transparent.
The device according to any one of claims 17-19, wherein the processing module is specifically configured to:

Obtain, from the attribute information, the position information of the text content in the first video frame;

Add the first image to the second video frame according to the position information to obtain the third video frame.
The device according to any one of claims 16-20, wherein before the processing module adds the text content to the second video frame according to the attribute information, the processing module is further configured to:

Acquire a first identifier in the second video frame;

Acquire a second identifier in the text information;

Determine that the first identifier and the second identifier are the same.
The device according to any one of claims 16-21, wherein the receiving module is specifically configured to:

Receive, from a first transmission channel, the second video frame sent by the second device;

Receive, from a second transmission channel, the text information sent by the second device, where the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
The device according to any one of claims 16-22, wherein the attribute information includes the position, font, size, color, and font special effects of the text content in the video frame, and the font special effects include at least one of affine, rotation, or projection.
A video processing device, characterized in that it comprises a processing module and a sending module, wherein:

The processing module is configured to extract text information from a first video frame, where the text information includes text content and attribute information;

The processing module is further configured to perform compression processing on the first video frame to obtain a second video frame;

The sending module is configured to send the second video frame and the text information to the first device.
The device according to claim 24, wherein before the sending module sends the second video frame and the text information to the first device, the processing module is further configured to:

Generate a first identifier;

Add the first identifier to the second video frame and the text information respectively.
The device according to claim 24 or 25, wherein the sending module is specifically configured to:

Send the second video frame to the first device through a first transmission channel;

Send the text information to the first device through a second transmission channel, where the second transmission channel is a parallel bypass small bandwidth channel of the first transmission channel.
The device according to any one of claims 24-26, wherein the attribute information includes the position, font, size, color, and font special effects of the text content in the video frame, and the font special effects include at least one of affine, rotation, or projection.
A video processing device, characterized by comprising: a memory, a processor, and a computer program, wherein the computer program is stored in the memory, and the processor runs the computer program to execute the video processing method according to any one of claims 1-10.
A video processing device, characterized by comprising: a memory, a processor, and a computer program, wherein the computer program is stored in the memory, and the processor runs the computer program to execute the video processing method according to any one of claims 11-15.
A storage medium, wherein the storage medium includes a computer program, and the computer program is used to implement the video processing method according to any one of claims 1-15.