CN111182220A - Image processing apparatus, remote device, and communication system


Info

Publication number: CN111182220A
Application number: CN202010020909.9A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: data, unit, mouse, control, decoding
Legal status: Pending
Inventors: 杨璐 (Yang Lu), 范志刚 (Fan Zhigang)
Current and original assignee: Xian Wanxiang Electronics Technology Co Ltd

Application filed by Xian Wanxiang Electronics Technology Co Ltd
Priority to CN202010020909.9A
Publication of CN111182220A
Priority to PCT/CN2020/130436 (WO2021139418A1)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen

Abstract

The present disclosure provides an image processing apparatus, a remote device, and a communication system, relating to the technical field of computer coding. The apparatus includes an image acquisition unit, an image encoding unit, a sound acquisition unit, a first sound encoding unit, and a first transmission control unit. The image acquisition unit acquires a computer picture according to a preset first control parameter, and the sound acquisition unit acquires first sound data according to the same parameter. The image encoding unit encodes the computer picture according to a preset second control parameter to obtain first encoded data, and the first sound encoding unit encodes the first sound data to obtain second encoded data. The first transmission control unit sends the first encoded data and the second encoded data to the remote device according to a preset third control parameter. The disclosed embodiments enable transmission parameters in the communication system to be adjusted dynamically while transmitting sound and image data.

Description

Image processing apparatus, remote device, and communication system
Technical Field
The present disclosure relates to the field of computer coding, and in particular, to an image processing apparatus, a remote device, and a communication system.
Background
Existing technology for remote transmission of computer pictures generally falls into two categories: remote picture transmission based on drawing instructions, and remote picture transmission based on pixel coding.
The drawing-instruction method has the following drawbacks. First, in order to render the display picture at the remote end, part of the source file data must be carried during transmission; because the remote end then holds source file data, a security risk exists. Second, since drawing instructions may differ between computers, the remote end can decode and display the picture only if its drawing instructions are the same as those of the source end, so the method has limited applicability.
The pixel-coding method has the following drawbacks. First, the quality of the display picture cannot be adjusted dynamically at the remote end. Second, the commonly used encoding protocols are H.264, H.265, or other open-source video protocols, which may lower the security of network transmission.
Disclosure of Invention
The embodiments of the present disclosure provide an image processing apparatus, a remote device, and a communication system, which enable dynamic adjustment of transmission parameters in the communication system when transmitting sound and image data. The technical solution is as follows:
In a first aspect of the disclosed embodiments, an image processing apparatus is provided, the apparatus including: an image acquisition unit, an image encoding unit, a sound acquisition unit, a first sound encoding unit, and a first transmission control unit;
the image acquisition unit is configured to acquire a computer picture according to a preset first control parameter, and the sound acquisition unit is configured to acquire first sound data according to the same parameter; the first control parameter is used to control acquisition of the computer picture;
the image encoding unit is configured to encode the computer picture according to a preset second control parameter to obtain first encoded data; the first sound encoding unit is configured to encode the first sound data to obtain second encoded data; the second control parameter is used to control the image encoding quality;
and the first transmission control unit is configured to send the first encoded data and the second encoded data to the remote device according to a preset third control parameter, the third control parameter being used to control the transmission quality from the first transmission control unit to the remote device.
In one embodiment, the apparatus further comprises a control decoding unit and a first sound decoding unit;
the first transmission control unit receives reverse control data sent by the remote device, sends the reverse control data to the control decoding unit for decoding to obtain first decoded data, and sends the first decoded data to the host;
the first transmission control unit also receives second sound data sent by the remote device, sends the second sound data to the first sound decoding unit for decoding to obtain second decoded data, and sends the second decoded data to the host.
In one embodiment, the first transmission control unit receives USB mouse operation encoded data sent by the remote device and sends it to the host.
In one embodiment, the apparatus further includes a scheduling unit, where the scheduling unit obtains first control data of the first transmission control unit, second control data of the image encoding unit, and third control data of the control decoding unit, respectively, and updates the first control parameter, the second control parameter, and the third control parameter according to a preset decision model based on the first control data, the second control data, and the third control data.
In one embodiment, when the third control parameter indicates that the bandwidth is insufficient, the first transmission control unit sequentially transmits the first decoded data, the second encoded data, and the second decoded data according to a preset priority.
In a second aspect of the embodiments of the present disclosure, there is provided a remote device, including: a second sound decoding unit, an image decoding unit, a second transmission control unit and a play control unit,
the second transmission control unit receives the first coded data and the second coded data sent by the image processing device, sends the first coded data to the image decoding unit to obtain third decoded data, sends the second coded data to the second sound decoding unit to obtain fourth decoded data, and sends the third decoded data and the fourth decoded data to the playing control unit;
the play control unit controls the synchronized display and playback of the third decoded data and the fourth decoded data.
In one embodiment, the remote device further includes a USB unit, a mouse decoding unit, and a mouse drawing unit. The second transmission control unit further receives mouse graphic data generated by a user operation, sends it to the mouse decoding unit for decoding to obtain control decoding data, and sends the control decoding data to the mouse drawing unit; the USB unit acquires USB mouse operation data and sends it to the mouse drawing unit; and the mouse drawing unit draws the mouse according to the control decoding data and the mouse operation data.
In one embodiment, the remote device further includes a keyboard-and-mouse encoding unit; the USB unit is further configured to send the USB mouse operation data to the keyboard-and-mouse encoding unit, which encodes and compresses the data to obtain USB mouse operation encoded data and sends it to the image processing apparatus through the second transmission control unit.
In one embodiment, the second transmission control unit receives control data sent by the image processing apparatus and sends it to the mouse decoding unit; the mouse decoding unit sends the decoded control data to the play control unit, and the play control unit displays the corresponding mouse graphic according to the control data.
In a third aspect of the embodiments of the present disclosure, a communication system is provided, where the system includes a host, the image processing apparatus disclosed in the first aspect, and the remote device disclosed in the second aspect.
The disclosed embodiment can realize dynamic adjustment of transmission parameters in the communication system to transmit sound and image data. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 2 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 3 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 4 is a block diagram of a remote device provided by an embodiment of the present disclosure;
fig. 5 is a block diagram of a remote device provided by an embodiment of the present disclosure;
fig. 6 is a block diagram of a remote device provided by an embodiment of the present disclosure;
fig. 7 is a block diagram of a communication system provided by an embodiment of the present disclosure;
FIG. 8 is a diagram of a host architecture provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of relationships between systems provided by embodiments of the present disclosure;
fig. 10 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
FIG. 11 is a workflow diagram of an image encoding unit provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of scheduling input and output provided by an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating a relationship between a transmission code rate, a delay and a display quality according to an embodiment of the present disclosure;
fig. 14 is a structural diagram of a transmission control unit provided in the embodiment of the present disclosure;
FIG. 15 is a schematic diagram of a redundancy generation model provided by embodiments of the present disclosure;
FIG. 16 is a schematic diagram of a prediction mechanism model provided by an embodiment of the present disclosure;
fig. 17 is a schematic diagram of a prediction mechanism model provided in the embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is an image processing apparatus provided in an embodiment of the present disclosure, and an image processing apparatus 200 shown in fig. 1 includes: an image acquisition unit 201, an image encoding unit 205, a sound acquisition unit 202, a first sound encoding unit 206, and a first transmission control unit 209;
the image acquisition unit 201 is used for acquiring a computer picture according to a preset first control parameter, the sound acquisition unit 202 is used for acquiring first sound data according to the parameter, and the first control parameter is used for controlling the acquisition of the computer picture;
the image coding unit 205 is configured to perform image coding on a computer screen according to a preset second control parameter to obtain first coded data; the first sound encoding unit 206 is configured to perform sound encoding on the first sound data to obtain second encoded data; the second control parameter is used for controlling the image coding quality;
the first transmission control unit 209 sends the first encoded data and the second encoded data to the remote device according to a preset third control parameter, where the third control parameter is used to control the transmission quality of the first transmission control unit to the remote device.
Fig. 2 is an image processing apparatus provided in an embodiment of the present disclosure, and the image processing apparatus 200 shown in fig. 2 further includes: a control decoding unit 207 and a first sound decoding unit 208;
the first transmission control unit 209 receives the reverse control data sent by the remote device, sends the reverse control data to the control decoding unit for decoding to obtain first decoded data, and sends the first decoded data to the host.
The first transmission control unit 209 receives the second sound data sent by the remote device, sends the second sound data to the first sound decoding unit 208 for decoding to obtain second decoded data, and sends the second decoded data to the host.
Optionally, the first transmission control unit 209 further receives USB mouse operation encoded data sent by the remote device, and sends the USB mouse operation encoded data to the host.
Fig. 3 is an image processing apparatus provided in an embodiment of the present disclosure, and the image processing apparatus 200 shown in fig. 3 further includes: the scheduling unit 210, the scheduling unit 210 obtains first control data of the first transmission control unit 209, second control data of the image encoding unit 205, and third control data of the control decoding unit 207, respectively, and updates the first control parameter, the second control parameter, and the third control parameter according to a preset decision model based on the first control data, the second control data, and the third control data.
Optionally, when the third control parameter indicates that the bandwidth is insufficient, the first transmission control unit 209 transmits the first decoded data, the second encoded data, and the second decoded data in order of a preset priority.
Fig. 4 is a remote device provided by an embodiment of the present disclosure, where the remote device 300 shown in fig. 4 includes: a second sound decoding unit 304, an image decoding unit 303, a second transmission control unit 308, and a play control unit 302,
the second transmission control unit 308 receives the first encoded data and the second encoded data sent by the image processing apparatus, sends the first encoded data to the image decoding unit 303 to obtain third decoded data, sends the second encoded data to the second sound decoding unit 304 to obtain fourth decoded data, and sends the third decoded data and the fourth decoded data to the playback control unit 302;
the play control unit 302 controls the synchronized display and playback of the third decoded data and the fourth decoded data.
Fig. 5 is a remote device provided by an embodiment of the present disclosure, where the remote device 300 shown in fig. 5 further includes: a USB unit 306, a mouse decoding unit 311, and a mouse drawing unit 310,
the second transmission control unit 308 also receives mouse graphic data generated by user operation, sends it to the mouse decoding unit 311 for decoding to obtain control decoding data, and sends the control decoding data to the mouse drawing unit 310; the USB unit 306 obtains USB mouse operation data and sends it to the mouse drawing unit 310; and the mouse drawing unit 310 draws the mouse according to the control decoding data and the mouse operation data.
Fig. 6 is a remote device provided by an embodiment of the present disclosure, where the remote device 300 shown in fig. 6 further includes: a keyboard-and-mouse encoding unit 305.
The USB unit 306 is further configured to send the USB mouse operation data to the keyboard-and-mouse encoding unit 305, which encodes and compresses the USB mouse operation data to obtain USB mouse operation encoded data and sends it to the image processing apparatus through the second transmission control unit 308.
Optionally, the second transmission control unit 308 further receives control data sent by the image processing apparatus and sends it to the mouse decoding unit 311; the mouse decoding unit 311 sends the decoded control data to the play control unit 302, and the play control unit 302 displays the corresponding mouse graphic according to the control data.
The remote device 300 works as follows: the second transmission control unit 308 receives data sent over the network and distributes it to the image decoding unit 303 and the second sound decoding unit 304, and the play control unit 302 determines when to send the decoded data to the display unit 301 for display or playback. The unit 308 also receives mouse graphic data generated by user operations, sends the mouse graphic or animation data to the mouse graphic/animation decoding unit 311 for decoding, and then passes the result to the mouse drawing unit 310; the mouse drawing unit 310 simultaneously receives mouse operation data input from the second USB unit 306.
The description of the mouse operation is as follows:
firstly, a user performs a mouse operation on the remote device 300, and the remote device 300 acquires mouse operation data of the user through the second USB unit 306;
secondly, the second USB unit 306 reports the mouse operation data to the mouse drawing unit 310 to determine the coordinates for mouse drawing; meanwhile, the second USB unit 306 sends the mouse operation data to the keyboard-and-mouse encoding unit 305, which encodes and compresses it and passes it to the second transmission control unit 308; the data is then sent back to the image processing apparatus 200 through the network card 309 and the network card 109, and the control input unit 203 delivers it to the operating system of the host 100, the control input unit 203 thus effectively simulating a keyboard and mouse;
thirdly, the host 100 performs the corresponding actions according to the mouse operation data and generates a display picture after the actions are executed; the image processing apparatus 200 acquires and encodes the display picture and sends the encoded image to the second transmission control unit 308 through the network card 109 and the network card 309. In addition, the image processing apparatus 200 needs to determine whether the mouse image corresponding to the current mouse operation data (such as an arrow, a hand, or a vertical bar) appears for the first time. If so, the image processing apparatus 200 also encodes the mouse image data, sets an identifier corresponding to it, sends the encoded mouse image data and the identifier to the second transmission control unit 308 through the network card 109 and the network card 309, and stores the correspondence between the mouse image and the identifier. If not, the image processing apparatus 200 only sends the identifier corresponding to the mouse image to the second transmission control unit 308 through the network card 109 and the network card 309.
fourthly, of the data the second transmission control unit 308 receives from the host 100, the encoded image is sent to the image decoding unit 303 for decoding, while the encoded mouse graphic data and its identifier, or the identifier alone, is sent to the mouse decoding unit 311 for decoding.
If the mouse decoding unit 311 receives the encoded mouse graphic data and the identifier, it saves the correspondence between the mouse graphic and the identifier (specifically, between the mouse graphic information and the identifier), decodes the encoded mouse graphic data, and sends the result to the play control unit 302; the mouse drawing unit 310 sends the mouse coordinates to the play control unit 302, and the play control unit 302 displays on the display unit 301 based on the image, the mouse image, and the mouse coordinates.
If the mouse decoding unit 311 receives only the identifier, it determines the corresponding mouse graphic information from the identifier and the previously stored correspondence and sends the mouse graphic information to the mouse drawing unit 310; the mouse drawing unit 310 draws the mouse image based on the mouse graphic information and the mouse coordinates it stores, and displays the image, the mouse image, and the mouse coordinates on the display unit 301.
Of course, if the remote device 300 already has the entire set of mouse images when it establishes a connection with the host 100, then even when a mouse image in the host 100 changes for the first time, the image does not need to be sent; the identifier corresponding to the mouse image is sent directly.
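As an illustration of this first-appearance rule, the following minimal Python sketch shows a host-side cache that decides whether to send an encoded mouse graphic together with its identifier or the identifier alone. All names are hypothetical; the patent does not prescribe an implementation.

```python
class MouseGraphicCache:
    """Host-side cache of mouse images (arrow, hand, vertical bar, ...)."""

    def __init__(self):
        self._ids = {}        # encoded image bytes -> identifier
        self._next_id = 0

    def lookup_or_register(self, image: bytes):
        """Return (identifier, first_time)."""
        if image in self._ids:
            return self._ids[image], False
        ident, self._next_id = self._next_id, self._next_id + 1
        self._ids[image] = ident
        return ident, True

def send_mouse_update(cache, image, encode, send):
    ident, first_time = cache.lookup_or_register(image)
    if first_time:
        # first appearance: send the encoded mouse graphic plus its identifier
        send({"id": ident, "graphic": encode(image)})
    else:
        # already known at the remote end: the identifier alone suffices
        send({"id": ident})
```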
In addition, the second transmission control unit 308 also handles audio input data from the audio input unit 307: the data is encoded and compressed by the audio encoding unit 312 and handed to the unit 308, which sends the encoded audio to the image processing apparatus 200 through the network card 309 and the network card 109; the unit 204 then delivers it to the operating system of the host 100, the unit 204 effectively simulating an audio input device.
The remote device 300 may be implemented as independent hardware or as software integrated with another operating system. It has a second transmission control unit 308 whose capabilities are the same as those of the first transmission control unit 209; its main functions are to receive data, report it in packets, and perform the necessary signaling feedback.
The remote device 300 further has a voice input unit 307 and an input voice encoding unit 312; the encoded voice is transmitted back to the host 100 by the second transmission control unit 308 and finally enters the operating system via the unit 108.
The remote device 300 further has a USB input unit 306, whose input comprises two parts. One part is keyboard and mouse input: this data is encoded by the keyboard-and-mouse encoding unit 305, transmitted back to the host 100 through the unit 308, and finally input to the operating system through the unit 107. The other part arises when the image processing apparatus 200 is integrated into the host 100: the second transmission control unit 308 and the first transmission control unit 209 directly perform USB redirection on the second USB unit 306 and map it to the first USB unit 106, so that the second USB unit 306 behaves as if a USB disk or other device were plugged directly into the first USB unit 106.
The data received by the second transmission control unit 308 is divided into image and sound; when the image processing apparatus 200 is integrated in the host 100, mouse graphic data is received as well. The three kinds of data are sent to their respective decoding units, then passed together to the play control unit 302, which synchronizes them (mainly audio-video synchronization) and finally decides how to play them on the display unit 301 of the hardware device.
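The patent states only that the play control unit 302 synchronizes audio and video before display; the sketch below assumes a simple presentation-timestamp scheme to illustrate one way such synchronization could work.

```python
import heapq
import itertools
import time

class PlayControl:
    """Release decoded audio/video frames in presentation-time order."""

    def __init__(self):
        self._queue = []                 # (pts, seq, kind, frame)
        self._seq = itertools.count()    # tie-breaker so frames never compare
        self._base = None                # wall-clock origin of the timeline

    def enqueue(self, pts: float, kind: str, frame) -> None:
        heapq.heappush(self._queue, (pts, next(self._seq), kind, frame))

    def run(self, display, speaker) -> None:
        while self._queue:
            pts, _, kind, frame = heapq.heappop(self._queue)
            if self._base is None:
                self._base = time.monotonic() - pts
            delay = (self._base + pts) - time.monotonic()
            if delay > 0:
                time.sleep(delay)        # hold the earlier stream until its time
            (display if kind == "video" else speaker)(frame)
```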
When the image processing apparatus 200 is integrated in the host 100, the image acquisition unit 201 may choose to capture only the computer desktop without the mouse. The mouse is then drawn by the remote device 300 itself: the host 100 transmits the whole set of mouse images and animations to the remote device 300 when the connection is established, and only key values and mouse position information are exchanged during keyboard and mouse interaction. To draw the mouse, the remote device 300 first obtains the mouse graphic, adds a layer on top of the original display picture, and renders the mouse image at the position obtained from the user event; this layer is overlaid on the original display picture. Drawing the mouse locally in this way reduces operation latency and improves the user experience.
Fig. 7 is a block diagram of a communication system according to an embodiment of the present disclosure. The communication system shown in Fig. 7 includes a host 100, the image processing apparatus 200 of any of the above embodiments, and the remote device 300 of any of the above embodiments. The host 100 represents the source computer system, and the image processing apparatus 200 and the remote device 300 communicate through a network over any medium (WiFi, wired, 4G, etc.). The image processing apparatus 200 may be integrated with the host 100 or used as a separate system independent of the host 100.
Fig. 8 is a schematic structural diagram of a host according to an embodiment of the disclosure; the host shown in Fig. 8 includes an image processing apparatus.
The overall system workflow is as follows:
Source end 100: the computer generates picture and sound data and sends them to the image processing unit 104 and the sound processing unit 105, respectively, for processing. The image processing apparatus 200 obtains the data through the image acquisition unit 201 and the sound acquisition unit 202, sends them to the image encoding unit 205 and the first sound encoding unit 206 for encoding, and then passes the encoded results to the first transmission control unit 209, which sends them to the remote device. The first transmission control unit 209 also receives the user's reverse control data and voice data from the remote device and sends them to the control decoding unit 207 and the first sound decoding unit 208, respectively, for decoding; the decoded results are delivered to the control input unit 107 and the sound input unit 108 of the computer system. The first transmission control unit 209 may also receive USB redirection data sent from the remote end and send it directly to the first USB unit 106 of the system.
In essence, the relationship between the disclosed embodiment and the computer system is shown in Fig. 9.
Fig. 10 is a schematic structural diagram of an image processing apparatus (i.e., the sending end of the image transmission system) provided in an embodiment of the present disclosure. As shown in Fig. 10, the image processing apparatus is divided into five units in total: acquisition, encoding, transmission, scheduling, and input. The apparatus may be integrated inside the source end 100 or used as an independent system, and it may be implemented in software or in hardware such as an FPGA or a chip.
The acquisition units can capture picture pixel data and sound data simultaneously. The working position of the image acquisition unit 201 differs between implementations. If the apparatus 200 is implemented inside the host 100, the image processing unit 104 sends a copy of the generated image data to the unit 201 when writing it into the image buffer; the unit 201 can also obtain from 104 the drawing instructions, image change data, and image movement data used when the image was generated, and these instructions and data can help the encoding unit accelerate encoding.
If the apparatus 200 is used independently outside the host 100, the unit 201 acquires image data through the physical image output interface of the host 100 and cannot obtain drawing instructions, image change data, or image movement data.
The workflow of the image encoding unit is shown in Fig. 11. After receiving the data, the image encoding unit determines whether auxiliary data is present. If auxiliary data exists, it is parsed and the result is used directly, which saves time and reduces latency.
If no auxiliary data exists, image recognition is needed. The basic rule of recognition is to compare two successive frames of image data pixel by pixel, analyze their characteristics, divide the image into macro blocks of different categories, and then analyze the motion vector of each image block. The motion vector is the direction and distance that a block has moved within the frame. After image recognition is completed, the motion vectors are processed: if a block has a motion vector, the unit judges whether the block has already been sent at the highest quality level; if so, only motion vector data is generated and passed to the next stage for packing; if not, the data of the block's next quality layer is obtained and packed together with the motion vector. If the current block has no motion, it is sent to a different image encoding unit according to its type, including but not limited to encoding units for text, pictures, and video.
The image coding in the present disclosure is a multi-level, i.e. progressive, coding technique: a frame of image generates a set of coding results, which we refer to as layers or quality layers, numbered 1 to n (n > 1). For example, a frame may be encoded into 5 layers; after the first layer is decoded the display may be blurred, each additional layer decoded makes it clearer, and after all layers are decoded the display reaches the quality of the original image, that is, the highest quality layer. Decoding any layer produces a displayable image, but a base quality layer is required to prevent the picture from fluctuating back and forth between sharp and unclear. The size of the base quality layer is controlled by the scheduling control unit 210. The encoding unit outputs data up to the base quality layer; whether the data of subsequent quality layers is transmitted depends on whether the picture block at the current position has changed. If it has not changed, one more layer is transmitted (up to the highest quality layer); if it has changed, the block is re-encoded and transmitted up to the base quality layer.
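A minimal sketch of this progressive output rule, under the assumption of a per-block context and a hypothetical encode_layers() helper that produces the full set of quality layers:

```python
def encode_block(block, context, base_quality_layer: int):
    """Return the layer data to transmit for one image block.

    `context` remembers, per block position, the layers of the current
    content and how many of them have been sent; unsent layers stay in
    the context. `encode_layers` (assumed) yields quality layers 1..n.
    """
    if block.changed or block.pos not in context.layers:
        layers = encode_layers(block)               # progressive layers 1..n
        context.layers[block.pos] = layers
        context.sent[block.pos] = min(base_quality_layer, len(layers))
        return layers[:context.sent[block.pos]]     # re-send up to base layer
    sent = context.sent[block.pos]
    layers = context.layers[block.pos]
    if sent >= len(layers):
        return []            # highest layer reached: send only an "unchanged" flag
    context.sent[block.pos] = sent + 1
    return [layers[sent]]    # one further refinement layer for a still block
```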
After recognition, the image encoding unit divides the original image data into small image blocks of 5 categories: unchanged, text, picture, video, and motion vector. Text, pictures, and video all support the multi-quality-layer (progressive) coding described above.
Some of the encoding units are lossy and some are lossless; the choice relates to the characteristics of the image and to human visual perception. For example, video uses lossy coding: video is always dynamic, and suitable lossy coding has no impact on subjective perception while reducing the code stream and increasing encoding speed. These encoding units all support multi-quality-layer output and always output the base quality layer data. For example, if a block of image data is encoded into 10 layers and the base quality layer is 3, the encoding unit only needs to output the data of quality layers 1-3; the remaining data stays in the context. The base quality layer of each encoding unit is determined by the scheduling control unit 210 according to the currently collected system parameters, and different encoding units may use different base quality layers.
In addition, part of the image data is unchanged data, i.e., data identical to the previous frame. Before each output, the unit judges whether such data has reached the highest quality layer; if not, the data of the next quality layer is output for packing and sending, and if so, only an "unchanged" identifier needs to be sent. The picture quality of the highest quality layer is determined by the maximum code rate: each encoding unit determines the output code rate of its highest quality layer according to the image block proportion and the maximum code rate, which in turn determines the output picture quality. The maximum number of quality layers may differ between encoding units; it is determined by each encoding unit and recorded in that unit's context.
The scheduling control unit 210 mainly collects the bandwidth, delay, packet loss rate, packet error rate, and next-stage network condition prediction reported by the first transmission control unit 209. It also collects the operation time, output code rate, and next-frame output prediction of each current encoding unit, reported by the image encoding unit 205 and the first sound encoding unit 206, as well as keyboard and mouse control information from the control decoding unit 207. Based on these data, the scheduling control unit 210 runs a decision model and outputs three sets of parameters: acquisition control parameters, encoding control parameters, and transmission control parameters.
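The following sketch merely names the inputs collected by unit 210 and its three output parameter sets; all field names are illustrative, and the decision model itself is passed in as a black box:

```python
from dataclasses import dataclass

@dataclass
class NetworkStats:            # reported by the first transmission control unit 209
    bandwidth_mbps: float
    delay_ms: float
    loss_rate: float
    error_rate: float
    forecast_good: bool        # next-stage network condition prediction

@dataclass
class EncoderStats:            # reported by the encoding units 205/206
    run_time_ms: float
    output_rate_mbps: float
    next_frame_rate_mbps: float

@dataclass
class ControlParams:           # the three output parameter sets of unit 210
    capture: dict              # e.g. {"frame_rate": 10}
    encode: dict               # e.g. {"base_quality_layer": 4, "max_rate_mbps": None}
    transmit: dict             # e.g. {"redundancy_ratio": 0.01, "queue_lengths": {...}}

def schedule(model, net: NetworkStats, enc: EncoderStats,
             keymouse_active: bool) -> ControlParams:
    """Run the preset decision model over the collected statistics."""
    return model(net, enc, keymouse_active)
```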
Two examples illustrate how the scheduling control unit 210 predictively adjusts the parameters for the next stage according to the bandwidth, delay, packet loss rate, packet error rate, next-stage network prediction, and the operation time, output code rate, and next-frame output prediction of each current encoding unit:
Example 1: the current bandwidth is 10M, the transmission delay is 2 ms, the packet loss rate is 0.01%, the packet error rate is 0, and the network is predicted to remain in this good state in the next stage; the current acquisition frame rate is 5 frames, the current average encoding delay is 10 ms, the current encoder output is 2 Mbps, the code rate of the next frame is expected to stay around 2 Mbps, the current highest quality layer is 6, and the base quality layer is 3; the user currently has keyboard and mouse activity, with keyboard events accounting for most of the time.
At this time, the optimal decision output by the scheduling control unit 210 should be: the bandwidth is sufficient and transmission is not limited; because the packet loss rate is low and the user may be editing a document in real time, the delay requirement is high and the amount of redundancy can be increased appropriately, for example to 1% of the total data volume. Real-time text operation places high demands on keyboard and mouse events, so the frame rate can be increased to 10 frames; if conditions remain unchanged in the next stage, it can be raised further to 15 frames (during text editing, as opposed to video, a frame rate of up to about 20 frames is indistinguishable to the human eye). Text requires high definition, so the base quality layer is raised to 4; if all conditions remain unchanged in the next stage, it can continue to rise, up to but not beyond the highest quality layer. The encoding unit does not need to limit its code rate; the bandwidth for the next frame is sufficient, and the rate may exceed 2 Mbps.
Example 2 continues the above scenario, but now the user stops editing the document and watches a video instead. The bandwidth is 10M, the transmission delay is 3 ms, the packet loss rate is 0.01%, the packet error rate is 0, and the network is predicted to remain in a good state in the next stage. The current acquisition frame rate is 20 fps (the earlier scenario has continued long enough for the frame rate to reach the maximum for text operation, 20 fps), the current average encoding delay is 10 ms, the current encoder output is 5 Mbps, the predicted code rate of the next frame is 6-7 Mbps, the current highest quality layer is 6, and the base quality layer is 6 (having reached the highest quality layer under the earlier scenario). There is now no keyboard or mouse activity.
At this time, the optimal decision output by the scheduling control unit 210 should be: based on the recognition results and the encoder feedback, raise the acquisition frame rate, first to 25 fps (60 fps at most, since current PC refresh rates are 60 fps), because the human eye does not perceive picture interruption above 24 fps; it can be raised further later. Scheduling first reduces the base quality layer of the encoding unit to half its previous value, i.e. to 3. For natural video the code stream can be reduced: a lossy video encoding unit is used, the code stream is limited to 5 Mbps, and the current frame rate of 25 fps is given to the encoding unit (as the acquisition frame rate rises later, the scheduling control unit must keep updating this parameter); the encoding unit adjusts its coding parameters according to the limited code stream and frame rate so that the average code stream fluctuates around 5 Mbps. The 10M bandwidth is sufficient for transmission. Since there is no user operation, the delay requirement is relaxed while the data volume grows, so the amount of redundancy can be reduced, for example to 0.5% of the total data volume, with other lost data recovered by retransmission; even if one frame is lost, the impact on video playback is small. At the same time the scheduling strategy is adjusted: if other channels in the overall transmission are currently lightly used, their reserved bandwidth can be allocated to the audio and video channels.
The decision model in the scheduling control unit 210 may be based on a neural network model.
A large number of input and output parameters are preset; the inputs are fed into the original neural network model and the model is trained, and training finishes when the difference between the model's output and the preset output parameters meets a condition. The trained neural network model is then used as the decision model in practical applications.
However, the actual network state is complex and variable; conditions may differ in every period. In a wide area network, for example, the quality may be good for two days, and after a few days the delay may become small while packet loss grows, and so on. Training time is measured in years: after training for, say, one year, the machine can acquire, encode, decode, and display adequately and can handle more than 80% of network conditions, but not all of them. The decision model therefore needs adjusting, that is, it must continue to be trained during application.
In practical application, the output parameters with better decision results and the corresponding input parameters in the current network state can be recorded for training the decision model, so that the decision model can be adjusted according to the current network state.
In addition, the application network can be divided into a local area network and a wide area network, the condition difference of the two networks is large, and in order to obtain a better decision result, two decision models can be trained respectively aiming at the local area network and the wide area network.
After the decision model is put to use in actual engineering, it continues to be trained through machine learning, finally establishing a mapping between the encoding and decoding units, acquisition, transmission, and the final picture quality.
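A hedged sketch of this train-then-refine cycle, using scikit-learn's MLPRegressor as a stand-in neural network (the patent does not specify a framework). X and Y are the preset input and output parameter sets, and refine_online() mirrors the continued training on decisions recorded in the field; separate models could be fit for LAN and WAN data as described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_decision_model(X: np.ndarray, Y: np.ndarray) -> MLPRegressor:
    """Offline phase: fit statistics -> control-parameter mappings until the
    loss improvement falls below tol (the 'difference meets a condition')."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, tol=1e-4)
    model.fit(X, Y)
    return model

def refine_online(model: MLPRegressor, x_new, y_good) -> MLPRegressor:
    """Deployment phase: keep training on (input, well-performing output)
    pairs recorded under the current network state."""
    model.partial_fit(np.atleast_2d(x_new), np.atleast_2d(y_good))
    return model
```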
Specifically, the acquisition control parameters affect the sampling frequency and output speed of the image acquisition unit 201 and the sound acquisition unit 202. The encoding control parameters affect the encoding quality, operation time, and output code rate of the image encoding unit 205 and the first sound encoding unit 206. Taking the image encoding unit 205 as an example, the scheduling control unit 210 may modify the base quality layer to determine the basic display quality of the output picture, or modify the maximum output code rate to affect the display quality when the picture is still. It may also find, from the next-frame output prediction, that the encoding time would be too long while the network prediction indicates excessive delay; it then limits the running time of the image encoding unit, and the encoding unit adjusts its coding parameters to run faster, ensuring smooth remote display output. The scheduling inputs and outputs are shown in Fig. 12.
All of these parameter adjustments ultimately affect two quantities, the transmission code rate and the delay, and these two quantities in turn determine the playback and display quality at the remote device. The relationship is shown in Fig. 13: delay is inversely related to display quality, and beyond a certain point it becomes unacceptable to the user; once the code rate rises past a certain level the best quality is reached, and raising it further does not help display quality.
The first transmission control unit 209 and the second transmission control unit 308 mainly implement functions of unpacking, sending, receiving, packing, packet loss retransmission, delay control, error correction control, and the like of data.
The first transmission control unit and the second transmission control unit are structured as shown in fig. 14.
The first transmission control unit 209 and the second transmission control unit 308 include a display protocol and description mapping 2091, which covers all protocols the upper layer may generate and produces a description-content mapping for upper-layer data. The description content comprises the necessary attributes of the current data, such as frame number, frame type, and quality level. According to the display protocol and description mapping 2091, the upper layer can assign transmission weight priorities to the various types of data and determine the queue length and sending order of each data stream from the descriptions, priorities, and weights. The upper layer corresponds to data such as images, sound, keyboard and mouse operation data, file stream data, and other control or scheduling instructions.
Taking image data as an example, the display protocol and description mapping 2091 generates the description content, i.e., the necessary attributes of the current data such as frame number, frame type, and quality level.
The transmit queue scheduler 2092 is used to schedule each queue.
Event controller 2093 is equivalent to a state machine for event processing. The upper-layer protocol essentially does not interact directly with data encoding or with the underlying link socket transmission; their processing flows differ, so data generation and consumption are not fully synchronized. In addition, parameter-modification events from the external dynamic control 2094 and network events reported by the network adaptive control 2096 may occur at any time and are completely asynchronous. The event controller 2093 maintains an internal event queue and divides events among different event processing threads and event queues, one queue per processing thread. Currently the system has 2 processing threads and 2 event queues: a sending thread and a receiving thread.
The external dynamic control 2094 reports network events or information such as the delay and packet loss rate to the scheduling control unit 210 for decision; the scheduling control unit 210 sends back quantified transmission targets such as delay and code stream size, and the external dynamic control 2094 adjusts the length of each queue according to these parameters.
Statistics 2095 receives the data reported by the sending and receiving threads of the event controller 2093 and by the network adaptive control 2096, including received data, packet loss parameters, and timestamps, and computes network parameters such as delay, effective bandwidth, packet loss rate, packet error rate, and RTT. The statistical results are reported to the scheduling control unit 210.
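An illustrative sketch of the kind of aggregation the statistics unit 2095 might perform; the counters and report fields are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class LinkStats:
    sent: int = 0                 # packets handed to the link
    received: int = 0             # packets confirmed received
    errored: int = 0              # packets received but corrupted
    bytes_received: int = 0
    rtt_samples_ms: list = field(default_factory=list)

    def report(self, window_s: float) -> dict:
        """Summarize one measurement window for the scheduling control unit 210."""
        lost = max(self.sent - self.received, 0)
        return {
            "loss_rate": lost / self.sent if self.sent else 0.0,
            "error_rate": self.errored / self.received if self.received else 0.0,
            "rtt_ms": (sum(self.rtt_samples_ms) / len(self.rtt_samples_ms)
                       if self.rtt_samples_ms else 0.0),
            "bandwidth_mbps": self.bytes_received * 8 / 1e6 / window_s,
        }
```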
The algorithm pool 2097 provides algorithms of different models for real-time computation of quantitative results, such as an FEC forward error correction model algorithm for generating redundant messages, a flow control algorithm for controlling the size of the network output code stream, and a network prediction model algorithm for predicting network behavior.
The scheduling control unit 210 specifies a target transmission delay and queue length for a given queue, adjusts queue priorities, and directs the redundancy generation model to recover lost data through the error-correction algorithm without retransmission. It also synchronizes the sending times of the audio and video queues, because playback on the remote device 300 must be synchronized; during transmission the audio and video queues should be scheduled as fairly as possible so that audio and video data stay roughly in step, reducing processing error at the remote device 300.
Fig. 15 shows a schematic description of the redundancy generation model:
the redundancy generation model comprises five units, an algorithm unit (AM), a control unit (CM), a redundancy calculation unit (RM), a transmission unit (TM) and a data analysis unit (PM).
The algorithm unit contains various algorithms for generating redundancy models, including but not limited to convolutional codes, hamming codes, BCH codes, RS codes, Turbo codes, LDPC codes, TPC codes, etc. (some units may require support of special hardware).
The control unit CM is a set of various strategies, and different decision mechanisms such as a hardware platform screening strategy, a data classification strategy, a network state judgment strategy and the like are adopted. And carrying out comprehensive judgment according to the data fed back by each unit, calling different redundancy algorithms in the AM, and finally generating a redundancy model. The CM balances fault tolerance, computational power, and load ratio in selecting a model.
And the redundancy calculation unit RM generates redundancy for the grouped data according to the redundancy model, the corresponding check code algorithm and the control packet dividing strategy and then sends the redundancy.
The transmission unit TM is responsible for generating data transmission and detecting states in the link, such as packet loss rate, packet error rate, RTT, TTL, and the like. These status data are fed back to the control unit as input parameters for the next phase transmission.
And the data analysis unit PM analyzes the received data, judges whether packet loss exists or not, and reversely calls a redundancy calculation method according to the existing information to recover the lost data if the packet loss exists.
The detailed execution flow is described below.
The redundancy generation model can be divided into a transmitting end and a receiving end.
Sending-end flow: after data enters the system, the CM unit assigns transmission priorities according to the data classification strategy, the importance and urgency of the data, and the data volume. The CM also receives the network state data fed back by the TM, calls the algorithms in the AM according to data priority and network state, builds a redundancy model, and updates it into the RM; the RM computes the result according to the redundancy model and hands it to the transmission unit for sending over the network.
Receiving-end flow: after receiving data, the TM at the receiving end parses its information and feeds it to the CM; the CM invokes the AM according to the parsed result to construct the redundancy model and updates it into the PM. The TM then judges whether packets were lost; if so, the relevant data is fed into the PM to recover the lost data.
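The sketch below illustrates the RM/PM halves of this flow with the simplest possible redundancy model, a single XOR parity packet per group of equal-length packets, which recovers one loss per group; the real AM would supply stronger codes (RS, LDPC, ...).

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rm_encode(packets: list) -> list:
    """RM side: append one XOR parity packet to a group of equal-length packets."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return packets + [parity]

def pm_decode(received: list) -> list:
    """PM side: `received` is the group with the parity packet last; a lost
    packet is marked None. Recovers at most one loss; beyond that the flow
    falls back to a packet-loss report and retransmission."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise LookupError("too many losses: request retransmission")
    if missing:
        acc = None
        for p in received:
            if p is not None:
                acc = p if acc is None else xor_bytes(acc, p)
        received[missing[0]] = acc   # XOR of all survivors rebuilds the loss
    return received[:-1]             # return data packets, drop the parity
```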
Because signaling, image, sound, and keyboard/mouse data differ in characteristics, encoding, and data volume, the upper layer can adopt different transmission strategies for different types of data. In the transmission controls 209 and 308, each type of data has its own queue for transmission scheduling; the queue lengths differ, and queue length affects transmission delay.
For example, the delay requirements for image data differ slightly between scenes. When the user is watching video, the delay can be increased appropriately (while keeping audio and video synchronized), since the user barely perceives it; the queue length can be increased, the redundancy ratio can be reduced appropriately (less computation, higher payload ratio), and the queue priority is lowest. When a document is being edited, however, the delay requirement is higher, because the user is continuously interacting with the source computer and expects fast operation response; the queue length is then reduced and the redundancy ratio raised to cut delay, ensuring fluent user operation, and the queue priority is raised to just below that of the keyboard and mouse data.
As another example, audio is simpler and falls into two cases, sound and silence. With sound present, the queue length can follow the video queue length, because audio-video synchronization is required, and the queue priority can be the same as video. In silence no data is generated and other queue scheduling is unaffected, so the queue length can be set to 1 (or slightly longer to absorb sudden sound) or not set at all.
Signaling is the interactive data generated while the source 200 and the remote end 300 establish a connection. Its volume is small but it is important, and its delay requirement is loose (second level). To ensure reliable transmission, the redundancy can be raised to 100% or even higher, and the queue length can be set to 10% of the total volume; for example, with 100 signaling messages per second the initial queue length is 10. Its queue priority is higher than video during playback but lower than the video queue during text operation. Keyboard and mouse events are frequent: the number of generated events is an order of magnitude higher than audio and video, especially for the mouse, but each event is small, and the requirements on delay and accuracy are high. The queue length is therefore generally an order of magnitude larger than the video queue, e.g. 200 versus 20 for video, and the keyboard/mouse queue has the highest priority. The length of each queue can be regulated dynamically according to the current operating scene and the delay requirements fed back by scheduling. Representative settings are sketched below.
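The per-queue settings discussed above could be represented as profiles like the following; the lengths, priorities, and redundancy ratios are taken from the text where given and otherwise illustrative (a higher priority number means served first):

```python
QUEUE_PROFILES = {
    # Watching video: tolerate delay, lower redundancy, video priority lowest.
    "video_watching": {
        "keymouse":  {"length": 200, "priority": 4},
        "signaling": {"length": 10,  "priority": 3, "redundancy": 1.0},
        "video":     {"length": 20,  "priority": 1, "redundancy": 0.005},
        "audio":     {"length": 20,  "priority": 1},  # matched to video for A/V sync
    },
    # Editing text: short video queue, more redundancy, video just below keymouse.
    "document_editing": {
        "keymouse":  {"length": 200, "priority": 4},
        "video":     {"length": 5,   "priority": 3, "redundancy": 0.01},
        "signaling": {"length": 10,  "priority": 2, "redundancy": 1.0},
        "audio":     {"length": 1,   "priority": 1},  # silence: minimal queue
    },
}
```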
These queue adjustments are actually performed by the external dynamic control 2094; the execution instructions originate from the scheduling control unit 210.
When the transmission unit receives a piece of data, 2091 places it into the appropriate data queue by category (the audio, video, keyboard/mouse operation event queues described above, etc.) and posts a data-send event to the event queue of the scheduling thread 2093. There is only one sending thread, which may serve events generated by several upper-layer protocols, so after receiving the events it merges and filters them and generates a single scheduling event that triggers the transmission queue scheduling 2092. An independent scheduling thread inside 2092 runs the scheduling algorithm, finds the data packet to be sent, and passes the data to the network adaptive control 2096. The network adaptive control 2096 encapsulates the data (encapsulation means splitting it into packets, generating redundancy, and adding extra control information for packet-loss recovery or retransmission) and delivers the encapsulated data to the send buffer of the socket at the bottom of the network 2098. The network 2098 keeps a dedicated sending thread running at all times, continuously reading the buffered data and sending it out.
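The send chain just described can be summarized in a short sketch. The following Python fragment is an illustrative reading of the 2091 -> 2093 -> 2092 -> 2096 -> 2098 flow under simplifying assumptions (a single-threaded scheduling step, a fixed priority order, and a length header standing in for real encapsulation); the class and method names are invented for the example.

```python
import queue
import socket

class SendPath:
    """Minimal sketch of the send chain 2091 -> 2093 -> 2092 -> 2096 -> 2098.
    Class, method, and queue names are illustrative, not the patent's code."""

    def __init__(self, sock: socket.socket):
        self.data_queues = {c: queue.Queue()
                            for c in ("video", "audio", "kbm", "signaling")}
        self.events = queue.Queue()  # event queue of the scheduling thread 2093
        self.sock = sock

    def submit(self, category: str, payload: bytes) -> None:
        # 2091: sort incoming data into its per-type queue, then post an event.
        self.data_queues[category].put(payload)
        self.events.put("data_ready")

    def schedule_once(self) -> None:
        # 2093: merge and filter the pending events into one scheduling event.
        while not self.events.empty():
            self.events.get()
        # 2092: choose the next packet. A fixed priority order stands in here
        # for the virtual-finish-time formula described later in the text.
        for category in ("kbm", "signaling", "video", "audio"):
            q = self.data_queues[category]
            if not q.empty():
                self._transmit(q.get())
                return

    def _transmit(self, payload: bytes) -> None:
        # 2096: "encapsulate" -- in the patent this splits the data, generates
        # redundancy, and adds control information; here, a length header only.
        framed = len(payload).to_bytes(4, "big") + payload
        # 2098: hand the frame to the socket buffer; in the real design a
        # dedicated sending thread keeps draining this buffer.
        self.sock.sendall(framed)
```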
Corresponding to the physical link, the network 2098 also keeps a dedicated receiving thread running constantly to ensure receiving efficiency. When data arrives, a receive event is raised to the network adaptive control 2096, which has an independent receiving thread that reads the data from the cache of the network 2098 and then decapsulates it. Decapsulation checks whether any data was lost: if not, the redundancy and control information are removed and the data is filled into its designated position; if packets were lost, the control information and redundancy are used to recover them by calculation; if too much was lost, a packet-loss report is sent to the source end, which retransmits the corresponding data. After decapsulation, 2096 judges whether the data forms a complete upper-layer protocol message; if so, a report event is generated and handled by the receiving thread of the event controller 2093, which is responsible for returning the data to the upper layer through the reporting interface registered with the display protocol and description mapping 2091. If a packet-loss event occurred, the network adaptive control 2096 also reports the packet-loss parameters, including packet number and size data, to the statistics thread of the event controller 2093.
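The decapsulation decision flow (no loss, recoverable loss, unrecoverable loss) can likewise be sketched. The XOR parity below is only a stand-in for the patent's redundancy and control information; the function name and the equal-length-packet assumption are the example's own.

```python
def decapsulate(packets: list[bytes | None], parity: bytes) -> bytes:
    """Sketch of receive-side unpacking in 2096. `packets` holds the received
    packets of one message (None marks a loss) and `parity` is one XOR
    redundancy packet; all packets are assumed equal length."""
    lost = [i for i, p in enumerate(packets) if p is None]
    if not lost:
        # No loss: strip redundancy/control info and deliver the payload.
        return b"".join(packets)
    if len(lost) == 1:
        # One loss: recover it by XOR-ing the parity with the survivors.
        recovered = bytearray(parity)
        for p in packets:
            if p is not None:
                for j, byte in enumerate(p):
                    recovered[j] ^= byte
        packets[lost[0]] = bytes(recovered)
        return b"".join(packets)
    # Too much loss: report to the source end so it retransmits the data.
    raise ConnectionError("packet-loss report: retransmission required")
```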
The transmit queue scheduling 2092 schedules the queues as follows. Based on the display protocol and description mapping 2091 and on feedback from the network adaptive control 2096, 2092 determines which queue's data should be sent at the current moment; it runs whenever any queue holds data, until all queued data has been sent. At each scheduling step it applies the following formula: among the head messages of all queues it finds the one that can complete transmission soonest, and selects that message for sending, repeating until every queue is empty.
virt_st = max(ats, last(virt_ft))
virt_ft = virt_st + size / (r · ri · wi)

wherein:

ats: the generation time of the message to be sent currently (at the head of the queue) in each queue;

virt_st: the virtual transmission start time. Initially, last(virt_ft) does not exist because no virt_ft has been computed yet, so virt_st is simply ats, the generation time of the earliest-generated head-of-queue message; in each subsequent scheduling round it is recomputed as the later of ats and last(virt_ft). virt_st is therefore weight data and cannot be taken as the real transmission start time;

last(virt_ft): the estimated transmission completion time computed in the previous scheduling round, read when the current round starts;

virt_ft: the estimated (virtual) transmission completion time, likewise weight data rather than the real completion time;

size: the size of the message; the larger the message, the longer its transmission time;

r: the current total available bandwidth;

ri: the share of the queue's currently allocated bandwidth in the total available bandwidth;

wi: the queue weight (the priority specified by 210).
This ensures that every queue gets served and that starvation does not occur. The algorithm is an adaptation of an operating-system job-scheduling algorithm, with a queue priority parameter added to suit the transmission of data of differing priorities.
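A minimal scheduler of this virtual-finish-time type can be sketched as follows. The sketch assumes the reading of the formula given above, with a per-queue service rate of r · ri · wi; the class and function names are invented for the example, and the real 2092 additionally runs in its own scheduling thread.

```python
class Message:
    def __init__(self, ats: float, size: int):
        self.ats = ats    # generation time of the message
        self.size = size  # message size in bytes

class TxQueue:
    def __init__(self, wi: float, ri: float):
        self.msgs: list[Message] = []
        self.wi = wi               # queue weight (priority specified by 210)
        self.ri = ri               # queue's share of the total bandwidth
        self.last_virt_ft = None   # virt_ft from the previous scheduling round

def pick_next(queues: list[TxQueue], r: float) -> TxQueue | None:
    """Return the queue whose head message has the smallest virtual finish
    time, i.e. the message that can complete transmission soonest."""
    best, best_ft = None, float("inf")
    for q in queues:
        if not q.msgs:
            continue  # empty queues take no part in this round
        head = q.msgs[0]
        # virt_st = max(ats, last(virt_ft)); on the first round last(virt_ft)
        # does not exist yet, so virt_st falls back to ats.
        virt_st = head.ats if q.last_virt_ft is None else max(head.ats, q.last_virt_ft)
        virt_ft = virt_st + head.size / (r * q.ri * q.wi)
        if virt_ft < best_ft:
            best, best_ft = q, virt_ft
    if best is not None:
        best.last_virt_ft = best_ft
        best.msgs.pop(0)  # message is handed on to network adaptive control 2096
    return best
```

Selecting the minimum virt_ft lets high-weight queues drain faster while low-weight queues still make progress, which is what keeps every queue served.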
The network adaptive control 2096 establishes a closed-loop prediction mechanism: based on the results of statistics 2095, the input of the external dynamic control 2094, and the network prediction model in the algorithm pool 2097, it predicts the network condition of the coming sending period. Network bandwidth and delay, as well as network events such as congestion, very large delay, abnormal packet loss, burst traffic, and disorder, can thus be predicted; this information is fed back and reported to influence the size and output speed of the upper-layer output code stream, or consumed internally to adjust the current transmission strategy.
Several common situations illustrate this. First, statistics and prediction show that transmission bandwidth is sufficient, delay is low, and the output code stream is small; the network adaptive control 2096 reports this, and the scheduling control unit 210 directs each upper-layer coding unit to increase its code stream, improving quality and the output quality of the remote display device. Second, the network delay of the next sending period is predicted to be small but the bandwidth insufficient; the adaptive control 2096 must then send control data preferentially (its volume is small and its transmission-quality requirement high), query the display protocol and description mapping 2091 about whether bulky data with low transmission-quality requirements can be discarded to reduce transmission pressure, and adjust the send cache to prevent the loss of data that cannot be discarded. Third, the current network bandwidth is sufficient and the delay low, but the packet error rate is high; the FEC model is then adjusted to generate more redundant packets so that data can be recovered without retransmission as far as possible.
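These three cases amount to a small decision table. The sketch below, with assumed key names and an assumed error-rate threshold, shows one way the reported prediction could be mapped to the adjustments described.

```python
def adapt(pred: dict) -> list[str]:
    """Map a predicted network state to actions, following the three cases
    above. Key names and the 5% threshold are illustrative assumptions."""
    actions: list[str] = []
    if pred["bandwidth_sufficient"] and pred["delay_low"]:
        if pred["packet_error_rate"] > 0.05:
            # Case 3: good link but errors -- strengthen the FEC model.
            actions.append("generate more redundant packets (adjust FEC)")
        else:
            # Case 1: headroom available -- 210 raises the encoders' code stream.
            actions.append("report to 210: increase upper-layer code stream")
    elif pred["delay_low"] and not pred["bandwidth_sufficient"]:
        # Case 2: tight bandwidth -- protect small, critical control data.
        actions += [
            "send control data preferentially",
            "ask 2091 whether bulky low-priority data may be discarded",
            "enlarge the send cache so undroppable data is not lost",
        ]
    return actions
```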
The prediction mechanism model is shown in fig. 16 (its inputs and outputs include, but are not limited to, the following).
Wherein the prediction mechanism model may be based on a neural network model.
A large number of input and output parameters are preset as training data. The input parameters may include: receive bandwidth, retransmission ratio, packet loss rate, network delay, and inter-packet delay; the output parameters may include: bandwidth and delay.
The input parameters are fed into an original neural network model and the model is trained; when the difference between the model's output and the preset output parameters satisfies a condition, training is complete, and the trained neural network model is used as the prediction mechanism model in practical application.
The application network can be divided into a local area network and a wide area network. Because the conditions of these two networks differ greatly, the training data can be split into two groups to obtain two prediction mechanism models trained separately for the local area network and the wide area network; this yields better decision results, effectively reduces training complexity, and improves the accuracy of the models' predictions.
However, since the actual network state is complex and changeable, training of the prediction mechanism model must continue in actual application: a basic model is first trained with the existing data, then tested and put into use, and training continues during use.
In practical application, output parameters that predicted well under the current network state, together with their corresponding input parameters, can be recorded and used to train the prediction mechanism model, so that the model keeps adjusting to the current network state.
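A minimal training sketch for such a prediction mechanism model, using scikit-learn's MLPRegressor as a stand-in for the patent's unspecified neural network and random arrays in place of recorded statistics, might look as follows.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_predictor(X: np.ndarray, y: np.ndarray) -> MLPRegressor:
    """Per-sample inputs: receive bandwidth, retransmission ratio, packet
    loss rate, network delay, inter-packet delay. Targets: predicted
    bandwidth and delay for the next sending period."""
    return MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000).fit(X, y)

# One model per network class (LAN/WAN), trained on separate data groups.
# The random arrays are placeholders for (input, output) pairs recorded by
# statistics 2095; in use, well-predicted samples keep being recorded and
# the models refitted.
rng = np.random.default_rng(0)
models = {net: train_predictor(rng.random((500, 5)), rng.random((500, 2)))
          for net in ("lan", "wan")}

predicted_bandwidth, predicted_delay = models["wan"].predict(rng.random((1, 5)))[0]
```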
In the present invention, the scheduling control unit 210 also provides a supplementary image change prediction mechanism to assist encoding when the image processing apparatus 200 is used independently of the host 100. In that case the image processing apparatus 200 cannot obtain from the host 100 information such as drawing commands, image changes, and image movement; without this unit, image recognition would have to be performed by comparing pixels to generate motion vectors and change-position information. The image change prediction mechanism of this unit establishes a correspondence between user operations and image changes: different user operations influence the image in different ways, and the predicted change results are generated to assist the work of the coding unit. The prediction mechanism model is shown in fig. 17.
The main function of the prediction mechanism unit is to assist the encoding unit in image recognition, so as to reduce the computing power and delay consumed by encoding as much as possible. The prediction mechanism unit can be a machine learning algorithm model, such as a neural network, k-nearest neighbors, Bayes, decision tree, SVM, logistic regression, maximum entropy model, hidden Markov model, conditional random field, AdaBoost, EM, or another algorithm, preferably a neural network. Because a larger amount of training data yields a better neural network training result, and sufficient data is available at present for training the neural network model, the trained prediction mechanism model gives a better prediction effect.
Of course, the prediction mechanism model may also be built with other machine learning algorithm models.
A large number of input and output parameters are preset as training data. The input parameters may include: the image recognition algorithm, bandwidth, and keyboard and mouse control time; the output parameters may include: the moving and changing regions, where the output-parameter results in the training data may be calculated by an image recognition algorithm based on pixel comparison.
The input parameters are fed into an original neural network model and the model is trained; when the difference between the model's output and the preset output parameters satisfies a condition, training is complete, and the trained neural network model is used as the prediction mechanism model in practical application.
In this way, the output of the prediction mechanism model can be sent to the encoding unit. Compared with computing the moving and changing regions with a pixel-comparison image recognition algorithm, computing them with the prediction mechanism model reduces the consumption of computing resources and the amount of computation (the consumption cannot be eliminated entirely, because the prediction only assists and its result is not necessarily correct).
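A corresponding sketch for the image change prediction model is shown below. The feature layout, the target encoding, and all names are assumptions; the patent specifies only the categories of input and output parameters.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Assumed feature layout per sample: a small encoding of the last user
# operation (e.g. one-hot click/drag/scroll/typing), current bandwidth, and
# time since the keyboard/mouse control event. Assumed target: a normalized
# change rectangle (x, y, w, h). Training pairs would come from the
# pixel-comparison image recognition algorithm; random data stands in here.
rng = np.random.default_rng(1)
X_train, y_train = rng.random((2000, 6)), rng.random((2000, 4))
predictor = MLPRegressor(hidden_layer_sizes=(64,), max_iter=800).fit(X_train, y_train)

def assist_encoder(features: np.ndarray, width: int, height: int) -> tuple[int, int, int, int]:
    """Hand the encoding unit a predicted change region in pixel coordinates.
    The encoder still verifies the region: the prediction only assists and
    its result is not necessarily correct."""
    x, y, w, h = predictor.predict(features.reshape(1, -1))[0]
    return (int(x * width), int(y * height), int(w * width), int(h * height))
```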
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. An image processing apparatus, characterized in that the apparatus comprises: an image acquisition unit, an image coding unit, a sound acquisition unit, a first sound coding unit, and a first transmission control unit;
the image acquisition unit is used for acquiring a computer picture according to a preset first control parameter, the sound acquisition unit is used for acquiring first sound data according to the first control parameter, and the first control parameter is used for controlling the acquisition of the computer picture;
the image coding unit is used for carrying out image coding on the computer picture according to a preset second control parameter to obtain first coded data; the first sound coding unit is used for carrying out sound coding on the first sound data to obtain second coded data; the second control parameter is used for controlling the image coding quality;
and the first transmission control unit is used for sending the first coded data and the second coded data to the remote device according to a preset third control parameter, the third control parameter being used for controlling the transmission quality from the first transmission control unit to the remote device.
2. The apparatus of claim 1, further comprising a control decoding unit and a first sound decoding unit;
the first transmission control unit receives reverse control data sent by the remote device, sends the reverse control data to the control decoding unit for decoding to obtain first decoded data, and sends the first decoded data to the host;
and the first transmission control unit receives second sound data sent by the remote device, sends the second sound data to the first sound decoding unit for decoding to obtain second decoded data, and sends the second decoded data to the host.
3. The apparatus according to claim 2, wherein the first transmission control unit receives USB mouse operation encoded data transmitted by the remote device and transmits the USB mouse operation encoded data to the host.
4. The apparatus according to any one of claims 1 to 3, further comprising a scheduling unit, wherein the scheduling unit obtains first control data of the first transmission control unit, second control data of the image encoding unit, and third control data of the control decoding unit, respectively, and updates the first control parameter, the second control parameter, and the third control parameter according to a preset decision model based on the first control data, the second control data, and the third control data.
5. The apparatus of claim 4, wherein, when the third control parameter indicates that the bandwidth is insufficient, the first transmission control unit transmits the first decoded data, the second encoded data, and the second decoded data in order of a preset priority.
6. A remote device, the remote device comprising: a second sound decoding unit, an image decoding unit, a second transmission control unit and a play control unit,
the second transmission control unit receives the first coded data and the second coded data sent by the image processing apparatus, sends the first coded data to the image decoding unit to obtain third decoded data, sends the second coded data to the second sound decoding unit to obtain fourth decoded data, and sends the third decoded data and the fourth decoded data to the playing control unit;
the playing control unit controls the third decoded data and the fourth decoded data to be displayed and played synchronously.
7. The remote device according to claim 6, wherein the remote device further comprises a USB unit, a mouse decoding unit, and a mouse rendering unit, and the second transmission control unit further receives mouse graphics data generated by a user operation, sends the mouse graphics data to the mouse decoding unit for decoding to obtain control decoding data, and sends the control decoding data to the mouse rendering unit; the USB unit acquires USB mouse operation data and sends the USB mouse operation data to a mouse drawing unit; and the mouse drawing unit is used for drawing a mouse according to the control decoding data and the mouse operation data.
8. The remote device according to claim 7, wherein the remote device further comprises a keyboard and mouse encoding unit, the USB unit is further configured to send USB mouse operation data to the keyboard and mouse encoding unit, the keyboard and mouse encoding unit encodes and compresses the USB mouse operation data to obtain USB mouse operation encoded data, and sends the USB mouse operation encoded data to the image processing apparatus through the second transmission control unit.
9. The remote device according to claim 8, wherein the second transmission control unit receives control data sent by the image processing apparatus and sends the control data to the mouse decoding unit, the mouse decoding unit sends the decoded control data to the playback control unit, and the playback control unit displays a corresponding mouse graphic according to the control data.
10. A communication system, characterized in that the system comprises a host, an image processing apparatus according to any one of claims 1 to 5 and a remote device according to any one of claims 6 to 9.
CN202010020909.9A 2020-01-09 2020-01-09 Image processing apparatus, remote device, and communication system Pending CN111182220A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010020909.9A CN111182220A (en) 2020-01-09 2020-01-09 Image processing apparatus, remote device, and communication system
PCT/CN2020/130436 WO2021139418A1 (en) 2020-01-09 2020-11-20 Image processing apparatus, remote device, and communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010020909.9A CN111182220A (en) 2020-01-09 2020-01-09 Image processing apparatus, remote device, and communication system

Publications (1)

Publication Number Publication Date
CN111182220A true CN111182220A (en) 2020-05-19

Family

ID=70657843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010020909.9A Pending CN111182220A (en) 2020-01-09 2020-01-09 Image processing apparatus, remote device, and communication system

Country Status (2)

Country Link
CN (1) CN111182220A (en)
WO (1) WO2021139418A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8312164B2 (en) * 2010-04-14 2012-11-13 Adobe Systems Incorporated Media quality enhancement among connected media communication devices
US10567765B2 (en) * 2014-01-15 2020-02-18 Avigilon Corporation Streaming multiple encodings with virtual stream identifiers
CN104768026B * 2015-04-17 2018-01-30 Industrial and Commercial Bank of China A kind of multichannel audio frequency and video transcoding device
US20170094296A1 (en) * 2015-09-28 2017-03-30 Cybrook Inc. Bandwidth Adjustment For Real-time Video Transmission
CN111182220A (en) * 2020-01-09 2020-05-19 西安万像电子科技有限公司 Image processing apparatus, remote device, and communication system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105429983A * 2015-11-27 2016-03-23 Liu Jun Media data acquisition method, media terminal and music teaching system
CN109101209A * 2018-09-05 2018-12-28 Guangzhou Venus Household Co Ltd Smart screen sharing method, device, intelligent elevated table and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139418A1 (en) * 2020-01-09 2021-07-15 西安万像电子科技有限公司 Image processing apparatus, remote device, and communication system
CN112788193A (en) * 2020-12-30 2021-05-11 北京达佳互联信息技术有限公司 Image transmission method, image transmission device, electronic equipment and storage medium
WO2022147745A1 (en) * 2021-01-08 2022-07-14 深圳市大疆创新科技有限公司 Encoding method, decoding method, encoding apparatus, decoding apparatus
CN113810639A (en) * 2021-09-28 2021-12-17 深圳万兴软件有限公司 Method, device and related medium for recording mouse information and re-editing
CN113810639B (en) * 2021-09-28 2023-09-29 深圳万兴软件有限公司 Method, device and related medium for recording mouse information and re-editing
CN115225520A (en) * 2022-07-15 2022-10-21 同济大学 Multimodal network flow prediction method and device based on meta-learning framework
CN115225520B (en) * 2022-07-15 2023-09-26 同济大学 Multi-mode network flow prediction method and device based on meta-learning framework

Also Published As

Publication number Publication date
WO2021139418A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111246262A (en) Code scheduling control method and scheduling controller
CN111182220A (en) Image processing apparatus, remote device, and communication system
CN112543342B (en) Virtual video live broadcast processing method and device, storage medium and electronic equipment
CN104735470B (en) A kind of streaming media data transmission method and device
CN110430441B (en) Cloud mobile phone video acquisition method, system, device and storage medium
CN105659590A (en) Controlling resolution of encoded video
US20110299588A1 (en) Rate control in video communication via virtual transmission buffer
CN111818115B (en) Processing method, device and system
CN112995636B (en) 360-degree virtual reality video transmission system based on edge calculation and active cache and parameter optimization method
CN111010582A (en) Cloud desktop image processing method, device and equipment and readable storage medium
US20110235709A1 (en) Frame dropping algorithm for fast adaptation of buffered compressed video to network condition changes
CN103636212A (en) Frame encoding selection based on frame similarities and visual quality and interests
CN106791910A (en) Frame of video processing method and processing device
CN109413456A (en) It is a kind of to assume code rate Adaptable System and method towards the dynamic self-adapting Streaming Media based on HTTP more
CN106993190A (en) Software-hardware synergism coding method and system
CN109874027A (en) A kind of low delay educational surgery demonstration live broadcasting method and its system
CN111208960A (en) Remote display delay reducing method based on frame extraction control and time synchronization algorithm
CN112866746A (en) Multi-path streaming cloud game control method, device, equipment and storage medium
CN107333143A (en) 5G multiple access concurrent transmission control systems and method
US20220408097A1 (en) Adaptively encoding video frames using content and network analysis
Dong et al. Ultra-low latency, stable, and scalable video transmission for free-viewpoint video services
CN114827617B (en) Video coding and decoding method and system based on perception model
CN112565670B (en) Method for rapidly and smoothly drawing multi-layer video of cloud conference
CN113996056A (en) Data sending and receiving method of cloud game and related equipment
JP2013168941A (en) Video system for displaying image data, method, and computer program, and coder device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519