CN112511860B

CN112511860B - Picture transmission method with clear character area

Info

Publication number: CN112511860B
Application number: CN202011338605.3A
Authority: CN
Inventors: 张浪; 孙利杰; 欧阳殷朝; 陈松政; 刘文清; 杨涛
Original assignee: Hunan Qilin Xin'an Technology Co ltd
Current assignee: Hunan Qilin Xin'an Technology Co ltd
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2022-05-24
Anticipated expiration: 2040-11-25
Also published as: CN112511860A

Abstract

The invention discloses a picture transmission method with clear text areas, comprising a step in which a server compresses and encodes screen image data and a step in which a client decompresses and decodes the screen image data. The server-side step comprises: capturing a current picture P_i and obtaining a character recognition area from the unit blocks that meet the conditions; transcoding picture P_i into a YUV-format picture P_i1; performing character recognition on the Y component of the character recognition area of picture P_i1 according to a character recognition algorithm to obtain a character area; h264-encoding picture P_i1 to obtain h264 data and a reconstructed picture P_i2; computing the difference between the YUV data of picture P_i1 and picture P_i2 in the character area to obtain character difference data; and compressing the character difference data according to a compression algorithm to obtain a character differential compression data packet, packing and compressing the h264 data and the character differential compression data packet into a picture compression data packet, and sending the picture compression data packet to the client. The invention reduces bandwidth consumption, keeps text areas clear, and improves the user experience.

Description

Picture transmission method with clear character area

Technical Field

The invention relates to the field of cloud desktop image transmission, in particular to an image transmission method with clear text areas.

Background

Computer screen transmission technology plays an important role in cloud desktops, network teaching systems and video conference systems. The general approach is to capture the computer screen image, compress and encode it as video, and transmit it over the network to a client for display. To reduce the network bandwidth used during transmission (especially across the public network), video encoding generally uses a lossy compression algorithm with a relatively large compression ratio. When the client displays the image, lossy compression blurs the whole picture, and the larger the compression ratio, the more blurred the picture becomes, so that sensitive areas of some pictures, especially text areas, can no longer be recognized.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: aiming at the technical problems in the prior art, the invention provides a picture transmission method with clear text areas, which can ensure that the consumed bandwidth is small, and can also ensure that sensitive areas such as texts are clear, so as to improve the user experience.

In order to solve the technical problems, the technical scheme provided by the invention is as follows:

A picture transmission method with clear text areas comprises a step in which a server compresses and encodes screen image data, specifically comprising:

A1) capturing a screen picture as the current picture P_i according to a preset time, and adding the position information of each unit block that lies in a changed area of the previous picture P_i-1 and in an unchanged area of the current picture P_i to the current character recognition area set A, wherein the unit blocks are areas into which the screen is divided by rows and columns;

A2) transcoding the current picture P_i into a YUV-format picture P_i1;

A3) obtaining the unit blocks to be recognized in picture P_i1 according to the elements of the current character recognition area set A, performing character recognition on the Y component of each unit block to be recognized according to a character recognition algorithm, and adding the position information of each successfully recognized unit block to the current character area set B;

A4) h264-encoding picture P_i1 to obtain encoded h264 data and a reconstructed picture P_i2;

A5) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the current character area set B, computing the difference between the YUV data of each character unit block in picture P_i1 and the YUV data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, and adding the position information of the character unit block and the corresponding character difference data to the current character area detail set C;

A6) compressing the current character area detail set C according to a compression algorithm to obtain a character differential compression data packet, packing and compressing the encoded h264 data and the character differential compression data packet into a picture compression data packet, and sending the picture compression data packet to the client.

Further, the method also comprises a step in which the client decompresses and decodes the screen image data, specifically comprising:

B1) acquiring the picture compression data packet sent by the server, and decompressing the picture compression data packet;

B2) if the decompressed content includes a character differential compression data packet, decompressing the character differential compression data packet to obtain the character area detail set C, decoding the decompressed h264 data to obtain the reconstructed picture P_i2, synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain a picture P_i3 with clear characters, and taking picture P_i3 as the final picture; otherwise, decoding the decompressed h264 data to obtain the reconstructed picture P_i2 and taking picture P_i2 as the final picture. Synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain the picture P_i3 with clear characters specifically comprises: obtaining the character unit blocks in picture P_i2 according to the position information in the character area detail set C, matching each character unit block against the character area detail set C to obtain the corresponding character difference data, and adding the YUV data of the character unit block to the corresponding character difference data to obtain the new YUV data of the character unit block.

Further, step A1) is preceded by a step of dividing the unit blocks, specifically comprising: dividing the screen into nw rows and nh columns of unit blocks of the same size according to a preset unit length w and a preset unit width h, defining a flag set flag[nw][nh] for all unit blocks, and setting all flags in flag[nw][nh] to 0.

Further, step A1) specifically comprises: obtaining all unit blocks corresponding to the area of the current picture P_i that has changed relative to the previous picture P_i-1 as first unit blocks, and obtaining all unit blocks corresponding to the area of the current picture P_i that has not changed relative to the previous picture P_i-1 as second unit blocks; setting the flags in flag[nw][nh] corresponding to the first unit blocks to 1; and matching each second unit block against flag[nw][nh], and, if the flag corresponding to a second unit block is 1, adding the position information of that second unit block to the current character recognition area set A and setting its flag in flag[nw][nh] to 0.

Further, capturing the screen picture as the current picture P_i according to the preset time in step A1) specifically comprises: judging whether the screen picture has changed within the preset time; if so, capturing the current screen picture as the current picture P_i; otherwise, taking the previous picture P_i-1 as the current picture P_i.

Further, step A1) also comprises a processing step for when the current character recognition area set A is empty: if the current character recognition area set A is empty, transcoding the current picture P_i into a YUV-format picture P_i1, h264-encoding it to obtain encoded h264 data, compressing the encoded h264 data into a picture compression data packet, and sending the picture compression data packet to the client.

Further, a network judgment step is included before step A5), specifically comprising:

C1) judging whether the network condition meets a preset condition; if so, jumping to step A5); otherwise, entering step C2);

C2) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the character area set B, computing the difference between the Y component data of each character unit block in picture P_i1 and the Y component data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, adding the position information of the character unit block and the corresponding character difference data to the character area detail set C, and jumping to step A6).

Further, a network judgment step is included before step A6), specifically comprising:

D1) judging whether the network condition meets a preset condition; if so, jumping to step A6); otherwise, entering step D2);

D2) compressing the encoded h264 data into a picture compression data packet, sending the picture compression data packet to the client, and returning to step A1).

Further, the character recognition algorithm in step A3) is the maximally stable extremal regions (MSER) algorithm.

Further, the compression algorithm in step A6) is a run-length compression algorithm or the zlib compression algorithm.

Compared with the prior art, the invention has the advantages that:

1. The screen is divided into unit blocks, so character recognition only needs to cover the areas of certain unit blocks rather than the whole picture, which reduces CPU consumption.

2. During recognition, the method does not perform character recognition on areas where the picture is changing, and recognizes an area that has stopped changing only once, which reduces how often character recognition runs and further reduces the CPU consumption it causes.

3. While retaining the high compression ratio of h264, the method extracts the text-area detail lost to h264 lossy compression and transmits that detail data after compressing it, which reduces bandwidth consumption.

4. The method performs character recognition on the Y component, so no grayscale conversion of the picture is needed, which improves processing efficiency and reduces CPU consumption.

Drawings

FIG. 1 is a diagram illustrating steps of encoding and compressing screen image data according to various embodiments of the present invention.

FIG. 2 is a flow chart of encoding and compressing screen image data according to various embodiments of the present invention.

FIG. 3 is a diagram illustrating steps for decoding decompressed screen image data according to various embodiments of the present invention.

FIG. 4 is a flow chart of decoding decompressed screen image data in accordance with various embodiments of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.

Before the method is carried out, the screen is first divided into unit blocks by rows and columns. Assuming the screen has width "width" and height "height", the screen is divided according to a preset unit length w and a preset unit width h into unit blocks of size w × h; that is, every unit block has the same size and covers a w × h area of the screen. The smaller the unit length w and the unit width h, the finer the character recognition in the subsequent steps, but the higher the CPU consumption; the specific values of w and h can be adjusted as needed. This gives:

Number of unit block rows: nw = (width + w - 1) / w

Number of unit block columns: nh = (height + h - 1) / h

Here the division is integer division, so each count is rounded up. The screen can thus be divided into a total of nw rows and nh columns of unit blocks of the same size.

A flag set flag[nw][nh] is then defined for all unit blocks, with the flags in flag[nw][nh] corresponding one-to-one to the unit blocks, and all flags in flag[nw][nh] are set to 0, that is, flag[nw][nh] = {0}.
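
The division and flag initialization described above can be sketched in C as follows. This is a minimal illustration rather than the patented implementation: the struct BlockGrid, the functions ceil_div, grid_init and grid_free, and the choice of a flat byte array for the flag set are all assumptions introduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the unit-block grid; all names are illustrative. */
typedef struct {
    int w, h;            /* preset unit length and unit width */
    int nw, nh;          /* block counts derived from the screen width and height */
    unsigned char *flag; /* nw * nh flags, one per unit block, all initially 0 */
} BlockGrid;

/* Ceiling division for positive integers: (a + b - 1) / b. */
int ceil_div(int a, int b) {
    assert(a > 0 && b > 0);
    return (a + b - 1) / b;
}

/* Divide a width x height screen into w x h unit blocks and zero every flag.
   Returns 0 on success, -1 if the flag array cannot be allocated. */
int grid_init(BlockGrid *g, int width, int height, int w, int h) {
    g->w = w;
    g->h = h;
    g->nw = ceil_div(width, w);   /* nw = (width + w - 1) / w */
    g->nh = ceil_div(height, h);  /* nh = (height + h - 1) / h */
    g->flag = calloc((size_t)g->nw * (size_t)g->nh, 1);
    return g->flag ? 0 : -1;
}

/* Release the flag array. */
void grid_free(BlockGrid *g) {
    free(g->flag);
    g->flag = NULL;
}
```

For a 1920 × 1080 screen with w = h = 64, this yields nw = 30 and nh = 17. Because 1080 is not a multiple of 64, the last blocks along the height only partly cover the screen, which is why the counts are rounded up.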

Example one

As shown in fig. 1 and fig. 2, the picture transmission method with clear text areas of this embodiment includes a step in which the server compresses and encodes screen image data, specifically including:

A1) capturing a screen picture as the current picture P_i according to a preset time, and adding the position information of each unit block that lies in a changed area of the previous picture P_i-1 and in an unchanged area of the current picture P_i to the current character recognition area set A, where A = {c0 ... cn} and the unit blocks are areas into which the screen is divided by rows and columns; the screen capture program may call interfaces such as NVIDIA NVFBC, AMD RapidFire, Windows DXGI, QXL or Mirror Driver, and these APIs can obtain both the whole screen picture and the changed area of the screen;

A2) transcoding the current picture P_i into a YUV-format picture P_i1;

A3) obtaining the unit blocks to be recognized in picture P_i1 according to the elements of the current character recognition area set A, performing character recognition on the Y component of each unit block to be recognized according to a character recognition algorithm, and adding the position information of each successfully recognized unit block to the current character area set B, where B = {k0 ... km};

A4) performing h264 lossy encoding on picture P_i1; with the existing x264 encoding interface, two pieces of data are obtained during encoding: the encoded h264 data and the reconstructed picture P_i2. Taking the open-source x264 encoding interface as an example:

X264_API int x264_encoder_encode( x264_t *, x264_nal_t **pp_nal, int *pi_nal, x264_picture_t *pic_in, x264_picture_t *pic_out );

x264_picture_t *pic_in: the original YUV picture P_i1 is passed in here;

x264_nal_t **pp_nal: the encoded h264 picture data is obtained here;

x264_picture_t *pic_out: the reconstructed picture P_i2 is obtained here;

The YUV data of picture P_i2 is the YUV data of picture P_i1 after h264 lossy encoding and decoding, so compared with the original YUV data of picture P_i1, the YUV data of picture P_i2 has lost much detail, which blurs the picture;

A5) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the current character area set B, computing the difference between the YUV data of each character unit block in picture P_i1 and the YUV data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, and adding the position information of the character unit block and the corresponding character difference data to the current character area detail set C, where C = {g0 ... gm};

A6) compressing the current character area detail set C according to a compression algorithm to obtain a character differential compression data packet, packing and compressing the encoded h264 data and the character differential compression data packet into a picture compression data packet, and sending the picture compression data packet to the client.
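
The difference calculation of step A5) can be illustrated for a single plane of one character unit block. The sketch below is not taken from the patent: the function name block_diff and its parameters are hypothetical, and storing each difference modulo 256 in one byte is an assumption made here so that the client can restore the original sample exactly by a modulo-256 addition. The patent itself does not specify how the difference is represented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Difference of one plane of a character unit block:
   diff = original (P_i1) - reconstructed (P_i2).
   Assumption: each difference is stored modulo 256 in one byte, so adding it
   back modulo 256 restores the original sample exactly.
   stride is the number of bytes per row of the plane; (x0, y0) is the top-left
   sample of the block; bw x bh is the block size, already clipped to the plane.
   diff must hold bw * bh bytes and is filled row by row. */
void block_diff(const uint8_t *orig, const uint8_t *recon, int stride,
                int x0, int y0, int bw, int bh, uint8_t *diff) {
    assert(bw > 0 && bh > 0 && stride >= x0 + bw);
    for (int r = 0; r < bh; r++) {
        size_t row = (size_t)(y0 + r) * (size_t)stride + (size_t)x0;
        for (int c = 0; c < bw; c++) {
            diff[(size_t)r * (size_t)bw + (size_t)c] =
                (uint8_t)(orig[row + (size_t)c] - recon[row + (size_t)c]);
        }
    }
}
```

Where h264 reproduces a sample exactly, the difference is 0, so the difference data of a well-encoded block tends to contain long runs of zeros. For 4:2:0 YUV data the U and V planes are subsampled, so a caller would pass correspondingly halved block coordinates and sizes for those planes.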

In this embodiment, character recognition is performed on a unit block only when it lies in a changed area of the previous picture and in an unchanged area of the current picture, where a changed area is an area in which a picture differs from the picture before it, and an unchanged area is an area in which it does not. If no unit block meets this condition, the picture has been changing continuously and the current character recognition area set A is empty. Step A1) of this embodiment therefore further includes a processing step for when the current character recognition area set A is empty: if the current character recognition area set A is empty, the current picture P_i is transcoded into a YUV-format picture P_i1, h264 lossy encoding is performed to obtain encoded h264 data, and the encoded h264 data is compressed into a picture compression data packet and sent to the client. That is, when no unit block meets the condition, the current picture P_i is directly transcoded and h264 lossy encoded, and the h264 data is compressed and sent to the client; character recognition is skipped for a constantly changing picture, which reduces the CPU consumption caused by character recognition.

In step A1) of this embodiment, the preset time is the time it takes for text to go from blurred to clear: the smaller the preset time, the faster text becomes clear but the higher the CPU consumption, and the value can be adjusted as needed. If the screen picture has not changed after the preset time, none of the areas where the unit blocks are located has changed. Capturing the screen picture as the current picture P_i according to the preset time in step A1) therefore specifically includes: judging whether the screen picture has changed within the preset time; if so, capturing the current screen picture as the current picture P_i; otherwise, taking the previous picture P_i-1 as the current picture P_i. When the screen has not changed, this embodiment uses the last captured picture for subsequent processing, which reduces resource consumption.

Step A1) of this embodiment specifically includes: obtaining all unit blocks corresponding to the area of the current picture P_i that has changed relative to the previous picture P_i-1 as first unit blocks, and obtaining all unit blocks corresponding to the area of the current picture P_i that has not changed relative to the previous picture P_i-1 as second unit blocks; setting the flags in flag[nw][nh] corresponding to the first unit blocks to 1; and matching each second unit block against flag[nw][nh], and, if the flag corresponding to a second unit block is 1, adding the position information of that second unit block to the current character recognition area set A and setting its flag in flag[nw][nh] to 0. Through these steps, character recognition is performed on a unit block exactly once, when the block goes from a changed area to an unchanged area, which further reduces the CPU consumption caused by character recognition.
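
The flag handling of step A1) amounts to a small per-block state machine, sketched below in C. The sketch assumes that the capture interface has already been reduced to a per-block mask (1 where the block overlaps the changed area); that mask, the struct BlockPos, the function update_flags and the row-by-row storage of the flags are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one unit block in the grid (hypothetical layout). */
typedef struct { int col, row; } BlockPos;

/* One pass of step A1) over a grid that is nw blocks wide and nh blocks high,
   stored row by row.
   changed[i] is 1 if block i lies in the area of the current picture that
   changed relative to the previous picture, 0 otherwise.
   flag[i] persists from one captured picture to the next.
   Blocks that need character recognition are written to set_a (capacity cap);
   the function returns how many were written. */
size_t update_flags(const unsigned char *changed, unsigned char *flag,
                    int nw, int nh, BlockPos *set_a, size_t cap) {
    size_t n = 0;
    assert(nw > 0 && nh > 0);
    for (int row = 0; row < nh; row++) {
        for (int col = 0; col < nw; col++) {
            size_t i = (size_t)row * (size_t)nw + (size_t)col;
            if (changed[i]) {
                flag[i] = 1;      /* first unit block: remember that it changed */
            } else if (flag[i] == 1) {
                assert(n < cap);  /* second unit block that has just settled */
                set_a[n].col = col;
                set_a[n].row = row;
                n++;
                flag[i] = 0;      /* so that it is recognized only once */
            }
        }
    }
    return n;
}
```

A block that changes in one picture and is unchanged in the next is reported once and then stays out of set A until it changes again; a block that keeps changing is never reported, which corresponds to the empty-set case described above.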

In this embodiment, the character recognition algorithm in step A3) is the maximally stable extremal regions algorithm (MSER). The YUV format includes three components, Y, U and V. The Y component represents brightness, and a picture containing only the Y component is a grayscale picture without color; the U and V components represent color. With the MSER algorithm, character recognition can be performed on the Y component alone: if characters are recognized, recognition succeeds; otherwise, recognition fails.
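
Because the Y plane of a planar YUV picture is already a grayscale image, the input to the recognizer can be obtained by copying the block's luma samples, with no color conversion. The following C sketch shows only that extraction step; the function copy_y_block and its parameters are hypothetical, and the MSER detector itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the luma (Y) samples of one unit block out of a planar YUV picture.
   y_plane points to the start of the Y plane; stride is its bytes per row.
   (x0, y0) is the top-left pixel of the block; bw x bh is the block size,
   already clipped to the picture. out must hold bw * bh bytes. */
void copy_y_block(const uint8_t *y_plane, int stride,
                  int x0, int y0, int bw, int bh, uint8_t *out) {
    assert(bw > 0 && bh > 0 && stride >= x0 + bw);
    for (int r = 0; r < bh; r++) {
        memcpy(out + (size_t)r * (size_t)bw,
               y_plane + (size_t)(y0 + r) * (size_t)stride + (size_t)x0,
               (size_t)bw);
    }
}
```

The resulting bw × bh buffer is a tightly packed grayscale image that could be handed to whichever MSER implementation is in use.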

The compression algorithm in step A6) of this embodiment is a conventional compression algorithm, such as the run-length encoding (RLE) compression algorithm or the zlib compression algorithm.
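
As an illustration of the run-length option, the sketch below implements a simple byte-oriented RLE in C. The patent does not define the encoded layout; the (count, value) pair format, the function names rle_encode and rle_decode, and the use of a return value of 0 to signal an undersized output buffer are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-oriented run-length encoding: each run becomes a (count, value) pair
   with count in 1..255. Returns the number of bytes written to out, or 0 if
   out (capacity cap) is too small. The worst-case output size is 2 * n. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out, size_t cap) {
    size_t o = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255) run++;
        if (o + 2 > cap) return 0;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}

/* Inverse of rle_encode. Returns the number of bytes written to out, or 0 if
   out (capacity cap) is too small. */
size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out, size_t cap) {
    size_t o = 0;
    assert(n % 2 == 0);  /* the input must consist of whole (count, value) pairs */
    for (size_t i = 0; i < n; i += 2) {
        size_t run = in[i];
        if (o + run > cap) return 0;
        for (size_t k = 0; k < run; k++) out[o++] = in[i + 1];
    }
    return o;
}
```

Run-length coding suits data dominated by long runs of one value, such as difference data that is mostly zero; on data without runs this format doubles the size, in which case a general-purpose compressor such as zlib is the more robust choice.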

As shown in fig. 3 and fig. 4, the picture transmission method with clear text areas of this embodiment further includes a step in which the client decompresses and decodes the screen image data, specifically including:

B1) acquiring the picture compression data packet sent by the server, and decompressing the picture compression data packet;

B2) if the decompressed content includes a character differential compression data packet, decompressing the character differential compression data packet to obtain the character area detail set C, decoding the decompressed h264 data to obtain the reconstructed picture P_i2, synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain a picture P_i3 with clear characters, and taking picture P_i3 as the final picture; otherwise, decoding the decompressed h264 data to obtain the reconstructed picture P_i2 and taking picture P_i2 as the final picture.

Synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain the picture P_i3 with clear characters specifically includes: obtaining the character unit blocks in picture P_i2 according to the position information in the character area detail set C, matching each character unit block against the character area detail set C to obtain the corresponding character difference data, and adding the YUV data of the character unit block to the corresponding character difference data to obtain the new YUV data of the character unit block.
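
The client-side synthesis can be illustrated for one plane of one character unit block. The C sketch below assumes, as an illustrative convention not stated in the patent, that the server stored each difference as (original - reconstructed) modulo 256 in one byte per sample, so that a modulo-256 addition restores the original sample. The function name block_apply and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the character difference data of one unit block back onto one plane of
   the reconstructed picture P_i2, in place.
   Assumption: diff holds (original - reconstructed) modulo 256, one byte per
   sample, row by row, so the modulo-256 sum restores the original sample.
   stride is the number of bytes per row of the plane; (x0, y0) is the top-left
   sample of the block; bw x bh is the block size. */
void block_apply(uint8_t *recon, int stride, int x0, int y0,
                 int bw, int bh, const uint8_t *diff) {
    assert(bw > 0 && bh > 0 && stride >= x0 + bw);
    for (int r = 0; r < bh; r++) {
        uint8_t *row = recon + (size_t)(y0 + r) * (size_t)stride + (size_t)x0;
        for (int c = 0; c < bw; c++) {
            row[c] = (uint8_t)(row[c] + diff[(size_t)r * (size_t)bw + (size_t)c]);
        }
    }
}
```

Only the blocks listed in the character area detail set C are touched; the rest of P_i2 is displayed exactly as decoded.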

In summary, the method of this embodiment divides the screen into unit blocks and has the server perform character recognition on unit blocks, which reduces CPU consumption. Character recognition is performed only once on a unit block that goes from a changed area to an unchanged area, and not at all on changed areas, which reduces CPU consumption further. Finally, the difference between the original YUV data of the captured screen picture and the YUV data of the picture reconstructed after h264 encoding is computed to extract the character difference data of the character areas, and the character difference data and the encoded h264 data are packed, compressed and sent to the client. The client only needs to synthesize the character difference data with the reconstructed picture to obtain a picture with clear characters, so the display quality of text is guaranteed while network bandwidth is saved.

Example two

This embodiment is basically the same as the first embodiment, except that a network judgment step is further included before step A5), specifically including:

C1) judging whether the network condition meets a preset condition; if so, jumping to step A5); otherwise, entering step C2);

C2) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the character area set B, computing the difference between the Y component data of each character unit block in picture P_i1 and the Y component data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, adding the position information of the character unit block and the corresponding character difference data to the character area detail set C, and jumping to step A6).

Correspondingly, in the step in which the client decompresses and decodes the screen image data, step B3) specifically includes: obtaining the character unit blocks in picture P_i2 according to the position information in the character area detail set C, matching each character unit block against the character area detail set C to obtain the corresponding character difference data, and adding the YUV data or the Y component data of the character unit block to the corresponding character difference data to obtain the new YUV data of the character unit block.

Through the steps, under the condition of poor network condition, the data transmission between the server and the client saves the network bandwidth, and the picture of the client can still display clear characters.

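Example three

This embodiment is basically the same as the second embodiment, except that a network judgment step is further included before step A6), specifically including:

D1) judging whether the network condition meets a preset condition; if so, jumping to step A6); otherwise, entering step D2);

D2) compressing the encoded h264 data into a picture compression data packet, sending the picture compression data packet to the client, and returning to step A1).

Through the above steps, on the basis of the second embodiment, this embodiment sends only the encoded h264 data when the network condition is even worse, which keeps the client's picture smooth, and resumes transmission of the character differential compression data packet once the network condition improves.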
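
Taken together, embodiments one to three describe three levels of detail that the server can send. The C sketch below collapses the two network judgments into a single decision for illustration. The patent only speaks of a "preset condition"; expressing it as an estimated bandwidth compared against two thresholds, and every name in the code, are assumptions introduced here.

```c
#include <assert.h>

/* What the server sends alongside the encoded h264 data. */
typedef enum {
    SEND_H264_ONLY, /* step D2): no character difference data */
    SEND_Y_DIFF,    /* step C2): difference of the Y component only */
    SEND_YUV_DIFF   /* step A5): difference of all three components */
} DiffMode;

/* Hypothetical policy: the "preset condition" is modelled as an estimated
   available bandwidth in kbit/s compared against two thresholds.
   yuv_min_kbps is the minimum for full YUV differences, y_min_kbps the
   minimum for Y-only differences. */
DiffMode choose_mode(int bandwidth_kbps, int yuv_min_kbps, int y_min_kbps) {
    assert(yuv_min_kbps >= y_min_kbps);
    if (bandwidth_kbps >= yuv_min_kbps) return SEND_YUV_DIFF;
    if (bandwidth_kbps >= y_min_kbps) return SEND_Y_DIFF;
    return SEND_H264_ONLY;
}
```

In the flow described above the two judgments occur at different points (before step A5) and before step A6)), so a faithful implementation would evaluate them separately; deciding once up front, as this sketch does, would also avoid computing difference data that is then not sent.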

The foregoing is merely a preferred embodiment of the invention and does not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of the technical solution of the invention, shall fall within the protection scope of the technical solution of the invention.

Claims

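1. A picture transmission method with clear text areas, characterized by comprising a step in which a server compresses and encodes screen image data, specifically comprising:

A1) capturing a screen picture as the current picture P_i according to a preset time, and adding the position information of each unit block that lies in a changed area of the previous picture P_i-1 and in an unchanged area of the current picture P_i to the current character recognition area set A, wherein the unit blocks are areas into which the screen is divided by rows and columns;

A2) transcoding the current picture P_i into a YUV-format picture P_i1;

A3) obtaining the unit blocks to be recognized in picture P_i1 according to the elements of the current character recognition area set A, performing character recognition on the Y component of each unit block to be recognized according to a character recognition algorithm, and adding the position information of each successfully recognized unit block to the current character area set B;

A4) h264-encoding picture P_i1 to obtain encoded h264 data and a reconstructed picture P_i2;

A5) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the current character area set B, computing the difference between the YUV data of each character unit block in picture P_i1 and the YUV data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, and adding the position information of the character unit block and the corresponding character difference data to the current character area detail set C;

A6) compressing the current character area detail set C according to a compression algorithm to obtain a character differential compression data packet, packing and compressing the encoded h264 data and the character differential compression data packet into a picture compression data packet, and sending the picture compression data packet to the client.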

2. The picture transmission method with clear text areas according to claim 1, further comprising a step in which the client decompresses and decodes the screen image data, specifically comprising:

B1) acquiring the picture compression data packet sent by the server, and decompressing the picture compression data packet;

B2) if the decompressed content includes a character differential compression data packet, decompressing the character differential compression data packet to obtain the character area detail set C, decoding the decompressed h264 data to obtain the reconstructed picture P_i2, synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain a picture P_i3 with clear characters, and taking picture P_i3 as the final picture; otherwise, decoding the decompressed h264 data to obtain the reconstructed picture P_i2 and taking picture P_i2 as the final picture; wherein synthesizing the character difference data in the character area detail set C with picture P_i2 to obtain the picture P_i3 with clear characters specifically comprises: obtaining the character unit blocks in picture P_i2 according to the position information in the character area detail set C, matching each character unit block against the character area detail set C to obtain the corresponding character difference data, and adding the YUV data of the character unit block to the corresponding character difference data to obtain the new YUV data of the character unit block.

3. The picture transmission method with clear text areas according to claim 1, wherein step A1) is preceded by a step of dividing the unit blocks, specifically comprising: dividing the screen into nw rows and nh columns of unit blocks of the same size according to a preset unit length w and a preset unit width h, defining a flag set flag[nw][nh] for all unit blocks, and setting all flags in flag[nw][nh] to 0.

4. The picture transmission method with clear text areas according to claim 3, wherein step A1) specifically comprises: obtaining all unit blocks corresponding to the area of the current picture P_i that has changed relative to the previous picture P_i-1 as first unit blocks, and obtaining all unit blocks corresponding to the area of the current picture P_i that has not changed relative to the previous picture P_i-1 as second unit blocks; setting the flags in flag[nw][nh] corresponding to the first unit blocks to 1; and matching each second unit block against flag[nw][nh], and, if the flag corresponding to a second unit block is 1, adding the position information of that second unit block to the current character recognition area set A and setting its flag in flag[nw][nh] to 0.

5. The picture transmission method with clear text areas according to claim 1, wherein capturing the screen picture as the current picture P_i according to the preset time in step A1) specifically comprises: judging whether the screen picture has changed within the preset time; if so, capturing the current screen picture as the current picture P_i; otherwise, taking the previous picture P_i-1 as the current picture P_i.

6. The picture transmission method with clear text areas according to claim 1, wherein step A1) further comprises a processing step for when the current character recognition area set A is empty: if the current character recognition area set A is empty, transcoding the current picture P_i into a YUV-format picture P_i1, h264-encoding it to obtain encoded h264 data, compressing the encoded h264 data into a picture compression data packet, and sending the picture compression data packet to the client.

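7. The picture transmission method with clear text areas according to claim 1, wherein step A5) is preceded by a network judgment step, specifically comprising:

C1) judging whether the network condition meets a preset condition; if so, jumping to step A5); otherwise, entering step C2);

C2) obtaining the mutually corresponding character unit blocks in picture P_i1 and picture P_i2 according to the elements of the character area set B, computing the difference between the Y component data of each character unit block in picture P_i1 and the Y component data of the corresponding character unit block in picture P_i2 to obtain the corresponding character difference data, adding the position information of the character unit block and the corresponding character difference data to the character area detail set C, and jumping to step A6).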

8. The picture transmission method with clear text areas according to claim 1, wherein step A6) is preceded by a network judgment step, specifically comprising:

D1) judging whether the network condition meets a preset condition; if so, jumping to step A6); otherwise, entering step D2);

D2) compressing the encoded h264 data into a picture compression data packet, sending the picture compression data packet to the client, and returning to step A1).

9. The picture transmission method with clear text areas according to claim 1, wherein the character recognition algorithm in step A3) is the maximally stable extremal regions (MSER) algorithm.

10. The picture transmission method with clear text areas according to claim 1, wherein the compression algorithm in step A6) is a run-length compression algorithm or the zlib compression algorithm.