WO2021168827A1 - Image transmission method and device - Google Patents
Image transmission method and device
- Publication number
- WO2021168827A1 (PCT/CN2020/077260)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- resolution
- resolution image
- low
- residual
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—using adaptive coding
- H04N19/134—characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/169—characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—the unit being an image region, e.g. an object
- H04N19/172—the region being a picture, frame or field
- H04N19/30—using hierarchical techniques, e.g. scalability
- H04N19/33—using hierarchical techniques, e.g. scalability in the spatial domain
- H04N19/40—using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
- H04N19/42—characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—using parallelised computational arrangements
- H04N19/50—using predictive coding
- H04N19/59—involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
Definitions
- the embodiments of the present application relate to image processing technology, and in particular, to an image transmission method and device.
- Video generally refers to various technologies that capture, record, process, store, transmit, and reproduce a series of still images in the form of electrical signals.
- the higher the resolution of a video, the higher its definition and the larger the amount of video data. Since the visual effects of high-resolution video are more realistic than those of low-resolution video, more and more video application scenarios choose to transmit high-resolution video in real time in order to provide users with a good experience, such as Internet live streaming, watching high-definition video online, and high-definition video conferencing.
- the embodiments of the present application provide an image transmission method and device, which are used to implement high-resolution video transmission.
- an embodiment of the present application provides an image transmission method, which can be applied to the sending end in the transmission system, and can also be applied to the chip in the sending end.
- the method in the embodiments of the present application is used to solve the problem of large transmission volume of high-resolution video caused by the existing way of transmitting high-resolution video.
- the first high-resolution image is converted into a first low-resolution image, wherein the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the image residual between the first high-resolution image and the second high-resolution image is obtained, and the image residual is used to reflect low-frequency information.
- the image residual is encoded to obtain a second code stream, and the first code stream and the second code stream are sent.
- the high-frequency information mentioned above is image detail information, and the low-frequency information is image contour information; the image detail information changes more rapidly than the image contour information.
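As an illustration of this split, the low-frequency (contour) part of an image can be separated from the high-frequency (detail) part with a simple smoothing filter. The box blur below is purely illustrative; the embodiments do not prescribe any particular filter:

```python
import numpy as np

def split_frequency_bands(image, k=5):
    """Split an image into a low-frequency (contour) part and a
    high-frequency (detail) part using a box blur (illustrative only)."""
    pad = k // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    h, w = image.shape
    low = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + h, dx:dx + w]
    low /= k * k                               # slowly varying contours
    high = image.astype(np.float64) - low      # rapidly varying detail
    return low, high

img = np.tile(np.arange(8.0), (8, 1))          # a smooth horizontal gradient
low, high = split_frequency_bands(img)
```

By construction the two bands sum back to the original image, which is the property the residual-based scheme relies on.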
- the method can be applied to transmit video, and the first high-resolution image is a frame of image in the video.
- the sending end can decompose the high-resolution image to be transmitted into a low-resolution image, and transmit an image residual that reflects only the low-frequency information of the high-resolution image. The amount of data in such a residual is much smaller than that of a residual reflecting both the low-frequency and the high-frequency information of the high-resolution image, so the transmission bit rate required to transmit it is greatly reduced; that is, the transmission volume is reduced.
- the transmission bit rate required for low-resolution images is much lower than the transmission bit rate required for high-resolution images. Therefore, in this embodiment, a lower transmission code rate can be used between the sending end and the receiving end, and the transmission of high-resolution images can be realized.
- the transmission of high-resolution video can therefore be realized with a lower transmission code rate, which is within the code rate supported by most transmission equipment and thus meets the requirements of application scenarios that transmit high-resolution video in real time.
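A rough pixel-count comparison illustrates why the low-resolution image needs a much lower code rate. The frame sizes are example values, not taken from the embodiments:

```python
# Illustrative arithmetic only; actual code rates depend on the codec,
# the content, and the encoding parameters.
hi_res_pixels = 3840 * 2160   # e.g. a 4K frame (example value)
lo_res_pixels = 1920 * 1080   # e.g. a 1080p frame after conversion
ratio = hi_res_pixels / lo_res_pixels
# The raw pixel budget of the low-resolution frame is 4x smaller here.
```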
- the foregoing obtaining the second high-resolution image may include: decoding the first code stream to obtain the second low-resolution image.
- the second low-resolution image is used to reconstruct the second high-resolution image.
- a neural network is used to reconstruct the second high-resolution image.
- the second high-resolution image obtained by the sending end covers the coding and decoding loss of the first low-resolution image during transmission.
- the sender is based on the second high-resolution image and the first high-resolution image, and the acquired image residual also covers the coding and decoding loss of the first low-resolution image in the transmission process.
- in this way, after the receiving end obtains the second high-resolution image based on the second low-resolution image and combines it with the decoded image residual, the coding and decoding loss of the first low-resolution image in the transmission process can be eliminated.
- only the transmission loss of the image residual remains on the transmission path, which reduces the loss in the third high-resolution image recovered by the receiving end and improves the image quality.
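The sender-side steps above can be sketched end to end. Average pooling, nearest-neighbour upscaling, and coarse quantisation below are hypothetical stand-ins for the resolution conversion, the reconstruction step (which may be a neural network), and the lossy codec, respectively:

```python
import numpy as np

def downscale(img, factor=2):
    """Average-pool stand-in for the high-to-low resolution conversion."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale(img, factor=2):
    """Nearest-neighbour stand-in for the reconstruction step."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def lossy_codec(img, step=8.0):
    """Quantisation as a toy stand-in for a lossy encode + decode."""
    return np.round(img / step) * step

# Sender side, mirroring the described steps on a synthetic frame:
first_hi = np.random.default_rng(0).uniform(0, 255, (8, 8))
first_lo = downscale(first_hi)        # first low-resolution image
second_lo = lossy_codec(first_lo)     # result of decoding the first code stream
second_hi = upscale(second_lo)        # second high-resolution image
residual = first_hi - second_hi       # image residual (covers the codec loss)

# Receiver side: the same reconstruction plus the residual restores the
# frame exactly when the residual itself is transmitted losslessly.
third_hi = upscale(second_lo) + residual
```

Because the residual is computed against the reconstruction of the *decoded* low-resolution image, the codec loss cancels out at the receiver, which is the effect described above.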
- the above-mentioned acquisition of the image residual between the first high-resolution image and the second high-resolution image may include: subtracting the pixel value of the second pixel corresponding to the first pixel in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the first pixel residual in the image residual.
- the above-mentioned obtaining of the image residual between the first high-resolution image and the second high-resolution image may include: converting the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution.
- the pixel value of the third pixel corresponding to the first pixel in the fourth high-resolution image is then subtracted from the pixel value of the first pixel in the first high-resolution image to obtain the second pixel residual in the image residual.
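The two residual-computation cases (equal first and third resolutions, and first resolution greater than the third) can be sketched as follows; the nearest-neighbour conversion to the fourth high-resolution image is an assumption for illustration only:

```python
import numpy as np

def residual_same_resolution(first_hi, second_hi):
    """Case 1: the first and third resolutions are equal - subtract the
    co-located pixel values directly."""
    return first_hi - second_hi

def residual_after_conversion(first_hi, second_hi, factor=2):
    """Case 2: the first resolution is greater than the third - first
    convert the second high-resolution image into a fourth high-resolution
    image at the first resolution, then subtract pixel-wise.
    Nearest-neighbour upscaling is an illustrative conversion only."""
    fourth_hi = second_hi.repeat(factor, axis=0).repeat(factor, axis=1)
    return first_hi - fourth_hi

first_hi = np.arange(16.0).reshape(4, 4)
case1 = residual_same_resolution(first_hi, np.zeros((4, 4)))
case2 = residual_after_conversion(first_hi, np.ones((2, 2)))
```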
- image residuals can be obtained in a variety of ways, which expands the application scenarios of the solution.
- the foregoing encoding of the first low-resolution image may include: encoding the first low-resolution image in a lossy encoding manner. In this way, the transmission code rate of the first low-resolution image can be reduced, thereby further reducing the transmission code rate of the high-resolution image between the sending end and the receiving end.
- the foregoing encoded image residual may include: encoding the image residual in a lossy encoding manner. In this way, the transmission code rate of the image residual can be reduced, thereby further reducing the transmission code rate of the high-resolution image between the sender and the receiver.
- the foregoing encoded image residual may include: encoding the image residual in a lossless encoding manner. In this way, the transmission loss of the image residual can be avoided, the quality of the image residual can be ensured, and the quality of the image restored by using the image residual can be guaranteed.
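As a sketch of lossless residual coding, any generic lossless compressor can serve; zlib is used here purely as an illustration, since the embodiments do not name a specific lossless codec. Residuals tend to contain mostly small values, which such coders compress well:

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
# A synthetic residual: mostly small signed values, as residuals often are.
residual = rng.integers(-2, 3, size=(64, 64)).astype(np.int16)

raw = residual.tobytes()
second_stream = zlib.compress(raw, level=9)   # "second code stream" (lossless)
decoded = np.frombuffer(zlib.decompress(second_stream),
                        dtype=np.int16).reshape(64, 64)
```

The round trip is bit-exact, so the quality of the image restored with the residual is preserved, as stated above.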
- the embodiments of the present application provide an image transmission method, which can be applied to the sending end in the transmission system, and can also be applied to the chip in the sending end.
- the method of the embodiment of the present application is used to solve the problem of large transmission loss of the high-resolution video caused by the existing method of transmitting high-resolution video.
- the first high-resolution image is converted into a first low-resolution image
- the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the first low-resolution image is encoded, for example, the first low-resolution image is encoded in a lossy encoding manner to obtain a first code stream. Decode the first code stream to obtain a second low-resolution image.
- the second low-resolution image is used to reconstruct a second high-resolution image, and the third resolution of the second high-resolution image is higher than the second resolution.
- the image residual is encoded to obtain the second code stream, and the first code stream and the second code stream are sent.
- the method can be applied to transmit video, and the first high-resolution image is a frame of image in the video.
- the sending end uses the decoded second low-resolution image to obtain the second high-resolution image, so that the second high-resolution image obtained by the sending end covers the coding and decoding loss of the first low-resolution image in the transmission process.
- the sender obtains the image residual based on the second high-resolution image and the first high-resolution image, so the acquired image residual also covers the coding and decoding loss of the first low-resolution image in the transmission process. In this way, after the receiving end obtains the second high-resolution image based on the second low-resolution image and combines it with the decoded image residual, the coding and decoding loss of the first low-resolution image in the transmission process can be eliminated.
- the image quality of the high-resolution video can be improved, and the user experience can be improved.
- the foregoing reconstruction of the second high-resolution image using the second low-resolution image may include: using the second low-resolution image to reconstruct the second high-resolution image using a neural network.
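The embodiments do not fix a network architecture for this reconstruction. As one plausible building block, sub-pixel (ESPCN-style) super-resolution networks end with a depth-to-space rearrangement that turns r*r feature channels into an r-times larger image; that final step is sketched here in NumPy:

```python
import numpy as np

def pixel_shuffle(features, r=2):
    """Depth-to-space rearrangement used by sub-pixel super-resolution
    networks. Shown only as one plausible way a neural network could map
    a low-resolution input to a higher resolution; the embodiments do
    not specify an architecture.
    features: (r*r, H, W) channel-first feature maps."""
    c, h, w = features.shape
    assert c == r * r
    # out[h*r + r1, w*r + r2] = features[r1*r + r2, h, w]
    return features.reshape(r, r, h, w).transpose(2, 0, 3, 1).reshape(h * r, w * r)

feats = np.arange(16.0).reshape(4, 2, 2)   # 4 channels of a 2x2 feature map
hi = pixel_shuffle(feats, r=2)             # a 4x4 reconstructed image
```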
- the above-mentioned acquisition of the image residual between the first high-resolution image and the second high-resolution image may include: subtracting the pixel value of the second pixel corresponding to the first pixel in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the first pixel residual in the image residual.
- the above-mentioned obtaining of the image residual between the first high-resolution image and the second high-resolution image may include: converting the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution.
- the pixel value of the third pixel corresponding to the first pixel in the fourth high-resolution image is then subtracted from the pixel value of the first pixel in the first high-resolution image to obtain the second pixel residual in the image residual.
- image residuals can be obtained in a variety of ways, which expands the application scenarios of the solution.
- the foregoing encoded image residual may include: encoding the image residual in a lossy encoding manner. In this way, the transmission code rate of the image residual can be reduced, thereby further reducing the transmission code rate of the high-resolution image between the sender and the receiver.
- the foregoing encoded image residual may include: encoding the image residual in a lossless encoding manner. In this way, the transmission loss of image residuals can be avoided, the loss of the third high-resolution image recovered by the receiving end is further reduced, and the quality of the image is improved.
- the method can be applied to transmit video, and the first high-resolution image is a frame of image in the video.
- the embodiments of the present application provide an image transmission method, which can be applied to the receiving end in the transmission system, and can also be applied to the chip in the receiving end. Specifically: receive the first code stream and the second code stream; decode the first code stream to obtain a second low-resolution image; and use the second low-resolution image to reconstruct a second high-resolution image, where the first resolution of the second high-resolution image is higher than the second resolution of the second low-resolution image, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image.
- the second code stream is decoded to obtain an image residual between the first high-resolution image and the second high-resolution image, and the image residual is used to reflect low-frequency information.
- the second high-resolution image is combined with the image residual to obtain the third high-resolution image.
- the above-mentioned high-frequency information is image detail information
- the low-frequency information is image contour information.
- the image detail information changes more rapidly relative to the image contour information.
- the method can be applied to transmit video, and the first high-resolution image is a frame of image in the video.
- the foregoing reconstruction of the second high-resolution image using the second low-resolution image may include: using the second low-resolution image to reconstruct the second high-resolution image using a neural network.
- combining the second high-resolution image with the image residual to obtain the third high-resolution image may include: adding the pixel value of the second pixel in the second high-resolution image to the pixel residual corresponding to the second pixel in the image residual, to obtain the third pixel corresponding to the second pixel in the third high-resolution image.
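This pixel-wise combination at the receiving end can be sketched as follows; clipping the sum back to the 8-bit sample range is an implementation assumption, not something the embodiments specify:

```python
import numpy as np

def combine(second_hi, residual):
    """Receiver-side combination: each second pixel's value plus the
    co-located pixel residual yields the third pixel. Clipping to the
    8-bit range is an implementation assumption."""
    third = second_hi.astype(np.int32) + residual.astype(np.int32)
    return np.clip(third, 0, 255).astype(np.uint8)

second_hi = np.full((2, 2), 200, dtype=np.uint8)
residual = np.array([[10, -10], [70, -250]], dtype=np.int16)
third_hi = combine(second_hi, residual)
```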
- an embodiment of the present application provides a transmission device.
- the transmission device may include: a first processing module, a first encoding module, a second processing module, a third processing module, a second encoding module, and a sending module.
- the transmission device may further include: a decoding module.
- the first processing module is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the first encoding module is used to encode the first low-resolution image to obtain the first code stream.
- the second processing module is used to obtain a second high-resolution image, where the third resolution of the second high-resolution image is higher than the second resolution, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image.
- the third processing module is used to obtain an image residual between the first high-resolution image and the second high-resolution image, and the image residual is used to reflect low-frequency information.
- the second encoding module is used to encode the image residual to obtain the second code stream.
- the sending module is used to send the first code stream and the second code stream.
- the above-mentioned high-frequency information is image detail information
- the low-frequency information is image contour information.
- the image detail information changes more rapidly relative to the image contour information.
- the decoding module is used to decode the first code stream to obtain the second low-resolution image.
- the second processing module is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image.
- the second processing module is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image using a neural network.
- the first resolution and the third resolution are the same; the third processing module is specifically used to subtract the pixel value of the second pixel corresponding to the first pixel in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the first pixel residual in the image residual.
- the first resolution is greater than the third resolution
- the third processing module is specifically used to convert the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution, and to subtract the pixel value of the third pixel corresponding to the first pixel in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the second pixel residual in the image residual.
- the first encoding module is specifically configured to encode the first low-resolution image in a lossy encoding manner.
- the second encoding module is specifically configured to encode image residuals in a lossy encoding manner.
- the second encoding module is specifically configured to encode image residuals in a lossless encoding manner.
- the device is applied to transmit video
- the first high-resolution image is a frame of image in the video.
- for the beneficial effects of the transmission device provided by the above-mentioned fourth aspect and each possible implementation manner of the fourth aspect, reference may be made to the beneficial effects brought about by the above-mentioned first aspect and each possible implementation manner of the first aspect, which will not be repeated here.
- an embodiment of the present application provides a transmission device, which may include: a first processing module, a first encoding module, a decoding module, a second processing module, a third processing module, a second encoding module, and a sending module .
- the first processing module is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the first encoding module is configured to encode a first low-resolution image (for example, encoding the first low-resolution image in a lossy encoding manner) to obtain a first code stream.
- the decoding module is used to decode the first code stream to obtain the second low-resolution image.
- the second processing module is configured to reconstruct a second high-resolution image using the second low-resolution image, and the third resolution of the second high-resolution image is higher than the second resolution.
- the third processing module is used to obtain the image residual between the first high-resolution image and the second high-resolution image.
- the second encoding module is used to encode the image residual to obtain the second code stream.
- the sending module is used to send the first code stream and the second code stream.
- the second processing module is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image using a neural network.
- the first resolution and the third resolution are the same, and the third processing module is specifically used to subtract the pixel value of the second pixel corresponding to the first pixel in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the first pixel residual in the image residual.
- the first resolution is greater than the third resolution
- the third processing module is specifically used to convert the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution, and to subtract the pixel value of the third pixel corresponding to the first pixel in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image to obtain the second pixel residual in the image residual.
- the second encoding module is specifically configured to encode image residuals in a lossy encoding manner.
- the second encoding module is specifically configured to encode image residuals in a lossless encoding manner.
- the device is applied to transmit video
- the first high-resolution image is a frame of image in the video.
- an embodiment of the present application provides a transmission device.
- the transmission device may include: a receiving module, a first decoding module, a first processing module, a second decoding module, and a second processing module.
- the receiving module is used to receive the first code stream and the second code stream.
- the first decoding module is used to decode the first code stream to obtain the second low-resolution image.
- the first processing module is configured to reconstruct a second high-resolution image using the second low-resolution image, where the first resolution of the second high-resolution image is higher than the second resolution of the second low-resolution image, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image.
- the second decoding module is used to decode the second code stream to obtain the image residual between the first high-resolution image and the second high-resolution image, and the image residual is used to reflect low-frequency information.
- the second processing module is used to combine the second high-resolution image with the image residual to obtain a third high-resolution image.
- the above-mentioned high-frequency information is image detail information
- the low-frequency information is image contour information.
- the image detail information changes more rapidly relative to the image contour information.
- the first processing module is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image using a neural network.
- the second processing module is specifically configured to add the pixel value of the second pixel in the second high-resolution image to the pixel residual corresponding to the second pixel in the image residual, to obtain the third pixel corresponding to the second pixel in the third high-resolution image.
- the device is applied to transmit video
- the first high-resolution image is a frame of image in the video.
- an embodiment of the present application provides a transmission device.
- the transmission device includes a processor system and a memory.
- the memory is used to store computer executable program code, and the program code includes instructions; when the processor system executes the instructions, the instructions cause the transmission device to execute the method described in any one of the first aspect to the third aspect.
- an embodiment of the present application provides a transmission system, which may include the sending end described in the first aspect or the second aspect and the receiving end described in the third aspect; or the transmission device described in the foregoing fourth aspect or fifth aspect and the transmission device described in the sixth aspect or seventh aspect.
- an embodiment of the present application provides a chip with a computer program stored on the chip, and when the computer program is executed by the chip, the method described in any one of the first aspect to the third aspect is implemented.
- an embodiment of the present application provides a transmission device, which includes a unit, module, or circuit for executing the method provided in the foregoing first aspect or each possible implementation manner of the first aspect, or the second aspect or each possible implementation manner of the second aspect.
- the transmission device may be a sending end or a module applied to the sending end, for example, it may be a chip applied to the sending end.
- an embodiment of the present application provides a transmission device, including a unit, module, or circuit for executing the method provided in the third aspect or each possible implementation manner of the third aspect.
- the transmission device may be a receiving end or a module applied to the receiving end, for example, it may be a chip applied to the receiving end.
- the embodiments of the present application provide a computer-readable storage medium for storing computer programs or instructions.
- when the computer programs or instructions are run on a computer, the computer is caused to execute the method described in any one of the first aspect to the third aspect.
- the embodiments of the present application provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the method described in any one of the first aspect to the third aspect.
- the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in any one of the first aspect to the third aspect.
- FIG. 1 is a schematic diagram of the architecture of a transmission system applied in an embodiment of this application;
- FIG. 2 is a schematic diagram of a video transmission;
- FIG. 2A is a schematic diagram of an image in the frequency domain;
- FIG. 3 is a schematic structural diagram of a transmission device provided by an embodiment of the application.
- FIG. 5 is a schematic diagram of an image provided by an embodiment of the application.
- FIG. 6 is a schematic diagram of another image provided by an embodiment of the application.
- FIG. 7 is a schematic diagram of another image provided by an embodiment of the application.
- FIG. 8 is a schematic diagram of a process for acquiring a second high-resolution image according to an embodiment of the application.
- FIG. 9 is a schematic diagram of a neural network model provided by an embodiment of the application.
- FIG. 10 is a schematic diagram of a video transmission process provided by an embodiment of this application.
- FIG. 11 is a schematic flowchart of another image transmission method provided by an embodiment of this application.
- FIG. 12 is a schematic structural diagram of a transmission device provided by an embodiment of the application.
- FIG. 13 is a schematic structural diagram of another transmission device provided by an embodiment of this application.
- FIG. 1 is a schematic diagram of the architecture of a transmission system applied in an embodiment of this application.
- the transmission system may include a sending end and a receiving end. Among them, the sending end and the receiving end are connected in a wireless or wired manner.
- Fig. 1 is only a schematic diagram.
- the transmission system may also include other network equipment between the sending end and the receiving end, for example, relay equipment and backhaul equipment, etc., which are not shown in Fig. 1.
- the embodiment of the present application does not limit the number of sending ends and receiving ends included in the transmission system.
- the sender and receiver are collectively referred to as transmission equipment in the following application documents.
- the transmission equipment can be fixed position or movable.
- transmission equipment can be deployed on land, including indoor or outdoor, handheld or vehicle-mounted; it can also be deployed on the water; it can also be deployed on airborne aircraft, balloons, and satellites.
- the transmission device involved in the embodiment of the present application may be a server, a terminal device, and the like.
- the terminal device may also be called a terminal, a user equipment (UE), a mobile station (mobile station, MS), a mobile terminal (mobile terminal, MT), and so on.
- Terminal devices can be mobile phones, tablets, computers with wireless transceiver functions, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart cities, wireless terminals in smart homes, etc.
- the embodiments of this application do not limit the application scenarios of the transmission system.
- the transmission system can be applied to video transmission and/or other static image transmission scenarios.
- the subsequent introduction mainly takes video transmission and processing as an example, related technical solutions can also be used for the transmission and processing of static images, which is not limited in this embodiment.
- A pixel is the basic unit of an image.
- a frame of image includes multiple pixels. Each pixel has a definite position and an assigned color value in the image, and the positions and color values of all pixels together determine the appearance of the image.
- a pixel may also be referred to as a pixel point, and the embodiments of the present application do not distinguish between the two.
- the data of each pixel (that is, the data used to reflect the color value) is related to the format of the image.
- common image formats are the RGB format and the YUV format.
- when the image format is the RGB format, each pixel includes 3 channels of data, which respectively represent the red component, the green component, and the blue component of the image.
- when the image format is the YUV format, each pixel includes 1.5 channels of data on average, which respectively represent the brightness of the image and the color of the image.
- the number of pixels included in a frame of image is related to the resolution of the image. Taking the image resolution of 3840 ⁇ 2160 as an example, the resolution represents that the image has 3840 pixels in each row in the horizontal direction and 2160 pixels in each column in the vertical direction. It should be understood that when the image size is fixed, the higher the resolution of the image, the more pixels the image contains, the clearer the image (or the higher the image quality). Correspondingly, the image contains more data.
- a video includes a plurality of continuous still image frames, where one frame is one image. It can also be said that a video is a continuous sequence of images. Therefore, the resolution of the image can also be understood as the resolution of the video, and the format of the image can also be understood as the format of the video. That is, the video includes continuous still images of the same resolution and the same format. The higher the resolution of the video, the higher the definition of the video, and the larger the amount of video data. For ease of understanding, the following Table 1 exemplarily gives examples of some common video resolutions. It should be understood that Table 1 is only an illustration, and is not a limitation on the resolution of the video.
- the level of video resolution is a relative concept. For example, taking 2k video as an example, compared to 1080p video, 2k video is a high-resolution video; compared to 4k video, 2k video is a low-resolution video.
- for ease of description, a video with a high resolution and a large amount of data, for example, 4k video, 8k video, etc., is referred to as a high-resolution video. Therefore, the resolution of the high-resolution video involved in the subsequent embodiments is higher than the resolution of the low-resolution video, and the specific values refer to the above-mentioned examples but are not limiting.
- FIG. 2 is a schematic diagram of video transmission. As shown in Figure 2, at present, the sender and receiver usually use the following methods to transmit high-resolution video:
- the sending end encodes the high-resolution video HR to be transmitted, and obtains the code stream HR_str of the high-resolution video HR.
- the encoding method used when the transmitting end encodes the high-resolution video HR may be a general encoding method. For example, the encoding method of the H.264 protocol or the encoding method of the H.265 protocol is adopted.
- S202 The sending end sends the code stream HR_str to the receiving end.
- the receiving end receives the code stream HR_str.
- the receiving end decodes the code stream HR_str to obtain a decoded high-resolution video HR_end. That is, HR_end is the high-resolution video obtained by the receiving end.
- the decoding method used when the receiving end decodes the code stream HR_str is related to the encoding method used by the transmitting end.
- the sending end adopts the encoding method of H.264 protocol, and correspondingly, the receiving end can adopt the decoding method of H.264 protocol.
- the sending end adopts the encoding method of H.265 protocol, and correspondingly, the receiving end can adopt the decoding method of H.265 protocol.
- the transmission bit rate of the video can be as shown in the following formula (1):
- Transmission bit rate = size of each frame of image × number of frames / compression ratio (1)
- the transmission bit rate refers to the number of bits of the video transmitted per second
- alternatively, the transmission bit rate refers to the bit rate used by the sending end when sending the video, or the bit rate used by the receiving end when receiving the video.
- the number of frames is the number of video frames transmitted per second.
- the compression ratio is related to the encoding method used and the resolution of the video.
- the size of each frame of image can be shown in the following formula (2):
- Image size = resolution × number of channels × data bit width (2)
- the number of channels can be determined according to the format of the video. For example, when the video format is the RGB format, the number of channels is 3, and when the video format is the YUV format, the number of channels is 1.5.
- the data bit width is the number of bits occupied by each data of a pixel, usually 8 bits.
- each data of a pixel uses an 8-bit width.
- the sender uses the H.265 protocol encoding method to encode the 4k video. Under normal circumstances, the H.265 protocol encoding method can achieve a compression ratio of up to 100 times for 4k video.
- the transmission bit rate needs to reach 29859840 bits per second (bps), about 28.5 megabits per second (Mbps).
- the receiving end also needs to use a transmission code rate of 28.5 Mbps.
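The arithmetic behind formulas (1) and (2) can be checked with a short sketch. The frame rate of 30 frames per second is an assumption inferred from the quoted 29,859,840 bps figure, and the function names are purely illustrative:

```python
def image_size_bits(width, height, channels, bit_width=8):
    # Formula (2): image size = resolution x number of channels x data bit width
    return width * height * channels * bit_width

def transmission_bit_rate(frame_bits, frames_per_second, compression_ratio):
    # Formula (1): transmission bit rate = frame size x frame count / compression ratio
    return frame_bits * frames_per_second / compression_ratio

frame_bits = image_size_bits(3840, 2160, 1.5)      # one 4k frame in YUV format
rate = transmission_bit_rate(frame_bits, 30, 100)  # assumed 30 fps, 100x compression
print(int(rate))  # 29859840 bps, about 28.5 Mbps
```

This reproduces the 28.5 Mbps figure quoted above for 4k YUV video under H.265-class compression.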
- the transmission device needs to use a relatively large transmission code rate (for example, a transmission code rate above 30 Mbps) to transmit the high-resolution video, that is, the transmission volume is large.
- the transmission bit rate supported by traditional transmission equipment is 1 Mbps to 20 Mbps, which leads to problems such as high transmission delay, low transmission efficiency, poor image quality, and video freezes when transmitting high-resolution video.
- Such problems cannot meet the application scenarios of real-time transmission of high-resolution video.
- the application scenarios of real-time transmission of high-resolution video mentioned here can be, for example, webcast, online viewing of high-definition video, and high-definition video conference.
- when the transmission device uses the method shown in Figure 2 to transmit high-resolution video, it uses a general codec protocol. Because the encoding methods in general codec protocols obtain a higher compression ratio by losing a certain amount of image quality, the high-resolution video decoded and restored at the receiving end suffers a large loss, which degrades the user experience at the receiving end. Therefore, how to transmit high-resolution video is an urgent problem to be solved.
- Figure 2A is a schematic diagram of an image in the frequency domain.
- a frame of image includes high-frequency information (usually image detail information) and low-frequency information (usually image contour information).
- the frequency of the image detail information is higher than the frequency of the image contour information, and the frequency changes faster. That is, the image detail information changes more rapidly in the image relative to the image contour information.
- for details, please refer to the subsequent comparative description of FIG. 6 and FIG. 7.
- the higher the frequency the faster the change of image information.
- the lower the frequency the slower the change of image information.
- the amount of data that reflects image detail information is usually greater than the amount of data that reflects image outline information (ie, low-frequency information).
- the aforementioned low-frequency information may also be referred to as low-frequency components
- the aforementioned high-frequency information may also be referred to as high-frequency components.
- the high frequency and low frequency mentioned here are not a certain fixed frequency band, but two relative frequencies in a frame of image. That is, in a frame of image, a relatively high frequency is called a high frequency, and a relatively low frequency is called a low frequency.
- the range of high frequency and low frequency in different images may be different, as long as the frequency of low frequency information in the frequency domain is lower than that of high frequency information.
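The relative high/low frequency split described above can be illustrated with a minimal sketch. A box (mean) blur stands in here for whatever low-pass filter an implementation might actually use; the function names and the choice of filter are assumptions, not the patent's method:

```python
def box_blur(img, radius=1):
    """Mean filter: a crude low-pass that keeps the slowly varying
    contour information and smooths out fast-changing detail."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]
                        n += 1
            out[y][x] = acc / n
    return out

def split_frequencies(img):
    """Decompose one channel of an image into a low-frequency part
    (contours) and a high-frequency residual (detail)."""
    low = box_blur(img)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high
```

By construction, adding the low-frequency and high-frequency parts recovers the original image exactly, which mirrors the decomposition-and-merge idea used throughout this method.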
- the embodiments of the present application provide an image transmission method that decomposes a high-resolution image into a low-resolution image and an image residual used to reflect the low-frequency information of the high-resolution image, and transmits both.
- the amount of data carried by the low-resolution image and the image residual is small; therefore, the transmission bit rate for transmitting high-definition images can be greatly reduced, that is, the transmission volume is reduced.
- the transmission bit rate of the high-resolution video can also be greatly reduced, so that the application scenario of real-time transmission of high-resolution video can be met.
- the image transmission methods provided by the embodiments of the present application include but are not limited to the above-mentioned transmission scenarios of high-definition images and/or high-resolution videos, and can also be applied to any other scenario that needs to transmit images and/or videos, which will not be described in detail here.
- FIG. 3 is a schematic structural diagram of a transmission device 100 provided by an embodiment of the application.
- the transmission device 100 includes: a memory 101 and a processor system 102.
- the memory 101 and the processor system 102 are communicatively connected to each other.
- the memory 101 and the processor system 102 may realize a communication connection through a network connection, for example.
- the above-mentioned transmission device 100 may further include a bus 103.
- the memory 101 and the processor system 102 communicate with each other through the bus 103.
- the bus 103 may include a path for transmitting information between various components of the transmission device 100 (for example, the memory 101 and the processor system 102).
- the memory 101 may include a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
- the memory 101 may store a program.
- when the program stored in the memory 101 is executed, the processor system 102 is used to execute the actions of the sending end or the actions of the receiving end in the image transmission method provided in the embodiments of the present application.
- the processor system 102 may include various processors that execute the image transmission method provided in the embodiments of the present application, such as at least one of the following: a processor 31, an encoder 32, a decoder 33, an image signal processor (ISP) 34, an embedded neural-network processing unit (NPU) 35, and a communication interface 36.
- the processor 31, for example, can be used to perform image resolution conversion, obtain image residuals, and other operations; the encoder 32 can, for example, be used to encode an image; the decoder 33 can, for example, be used to decode an image; and the ISP 34 can be used, for example, to process image data to obtain one or more frames of images.
- the NPU 35 may be used to implement the function of a neural network, for example, and the communication interface 36 may be used to implement communication between the transmission device 100 and other devices or a communication network, for example.
- a graphics processing unit (GPU) or the CPU may be used instead of the NPU to implement the function of the neural network.
- the processor 31 may include a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, or a microcontroller, and may further include an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
- the processor 31 may also be an integrated circuit chip with signal processing capability.
- the functions of the transmission device 100 of the present application can be completed by an integrated logic circuit of hardware in the processor 31 or instructions in the form of software.
- the above-mentioned processor 31 may also include a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which can implement or execute the methods, steps, and logical block diagrams disclosed in the following embodiments of the present application.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc., such as the aforementioned CPU, microprocessor, or microcontroller.
- the steps may be directly embodied as being executed and completed by the processor system, or executed and completed by a combination of hardware and software modules in the processor system.
- the software module can be located in a mature storage medium in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
- the storage medium is located in the memory 101, and the processor 31 reads the information in the memory 101 and completes the functions of the transmission device 100 in the embodiment of the present application in combination with its hardware.
- the communication interface 36 uses a transceiver module such as but not limited to a transceiver to implement communication between the transmission device 100 and other devices or a communication network. For example, image transmission can be performed through the communication interface 36.
- the above-mentioned transmission device 100 may further include a camera sensor 104 and a power supply 105, which is not limited in this embodiment.
- both the sending end and the receiving end can use the equipment shown in Fig. 3.
- the two devices need not be completely the same, and parts of the design can be slightly changed without affecting the technical implementation.
- the following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
- FIG. 4 is a schematic flowchart of an image transmission method provided by an embodiment of the application.
- This embodiment relates to the process of decomposing the original high-resolution image to be transmitted into low-resolution images, and transmitting the image residuals used to reflect the low-frequency information of the high-resolution images.
- the method may include the following steps.
- S401: The sending end converts the first high-resolution image into the first low-resolution image.
- the above-mentioned first high-resolution image is the original high-resolution image to be transmitted, and the first resolution of the first high-resolution image may be, for example, 4k or 8k.
- the first high-resolution image may be a frame of image in the video.
- the first resolution of the first high-resolution image may also be referred to as the resolution of the video.
- the sending end converts the resolution of the first high-resolution image to obtain the first low-resolution image. That is, the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the second resolution of the first low-resolution image may be 720p, 480p, CIF, etc., for example.
- the sending end may convert the resolution of the first high-resolution image through the CPU 31 of the sending end to obtain the first low-resolution image.
- This embodiment does not limit the foregoing implementation manner of converting the first high-resolution image into the first low-resolution image by the sending end.
- for example, the sending end may use the method shown in Table 2 below to convert the first high-resolution image into the first low-resolution image:
- the foregoing sending end may also adopt any other manner capable of implementing resolution conversion to convert the first high-resolution image into the first low-resolution image, for example, based on distributed sampling, Markov chain Monte Carlo sampling, Gibbs sampling, etc., which will not be repeated here.
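As one concrete illustration of resolution conversion (not necessarily the method of Table 2, which is not reproduced here), a simple average-pooling downsample converts a high-resolution single-channel image into a low-resolution one. The function name and the pooling choice are assumptions for illustration:

```python
def downsample(img, factor):
    """Average-pooling downsample: each low-resolution pixel is the mean
    of a factor x factor block of high-resolution pixels. Assumes the
    image dimensions are divisible by `factor`."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            out[y][x] = sum(block) / len(block)
    return out
```

For example, downsampling a 3840×2160 image with `factor=3` would yield a 1280×720 (720p) image, matching the kind of conversion described in S401.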
- the above-mentioned first high-resolution image includes high-frequency information (usually image detail information) and low-frequency information (usually image contour information). Among them, the image detail information changes more rapidly relative to the image contour information.
- FIG. 5 is a schematic diagram of an image provided by an embodiment of the application. The image shown in FIG. 5 is the first high-resolution image (ie, the original high-resolution image), and the first high-resolution image can clearly reflect the detailed information of the animal and the outline information of the animal.
- the sending end encodes the first low-resolution image to obtain a first code stream.
- the sending end may use an encoding method in a general encoding/decoding protocol to encode the first low-resolution image to obtain the first code stream.
- the encoding method in the general encoding and decoding protocol can be, for example, the encoding method using the H.264 protocol, the encoding method using the H.265 protocol (also known as high efficiency video coding (HEVC)), the VP8 protocol Any one of the encoding method, the encoding method of the VP9 protocol, the encoding method of the RV40 protocol, etc.
- the encoding method in the general encoding and decoding protocol mentioned here may also be referred to as a lossy encoding method. That is, an encoding method that obtains a higher compression ratio by losing a certain image quality.
- the compression ratio of the lossy encoding method can be, for example, 50, 100, or 500.
- the sending end may encode the first low-resolution image through the encoder 32 of the sending end to obtain the first code stream.
- the sending end acquires a second high-resolution image.
- the second high-resolution image includes the high-frequency information of the first high-resolution image, and excludes the low-frequency information of the first high-resolution image.
- the third resolution of the second high-resolution image is higher than the second resolution.
- the second high-resolution image includes the high-frequency information of the first high-resolution image, or in other words, includes the fast-changing detail information in the first high-resolution image, that is, the high-frequency components in the first high-resolution image.
- FIG. 6 is a schematic diagram of another image provided by an embodiment of this application.
- the image shown in FIG. 6 is a second high-resolution image corresponding to the first high-resolution image shown in FIG. 5.
- comparing FIG. 5 and FIG. 6, it can be seen that the second high-resolution image shown in FIG. 6 mainly includes the detailed information of the animal.
- the sending end may obtain the second high-resolution image according to the first low-resolution image.
- for example, the sending end may first encode and then decode the first low-resolution image to obtain the second low-resolution image, and then the sending end uses the second low-resolution image to obtain the second high-resolution image.
- the sending end may obtain the second high-resolution image and the like according to the first high-resolution image.
- FIG. 7 is a schematic diagram of another image provided by an embodiment of the application.
- the image shown in Fig. 7 is the image residual.
- the first high-resolution image includes the detailed information of the animal as well as the contour information of the animal, the second high-resolution image mainly includes the detailed information of the animal (that is, high-frequency information), and the image residual is used to reflect the contour information of the animal (that is, low-frequency information).
- the texture of the detail information in FIG. 6 changes rapidly in the image, while the texture of the contour information in FIG. 7 changes slowly.
- for example, the sending end can subtract the pixel value of the second pixel in the second high-resolution image corresponding to the first pixel from the pixel value of the first pixel in the first high-resolution image to obtain the first pixel residual in the image residual.
- the sending end may obtain the image residual between the first high-resolution image and the second high-resolution image through the processor 31 of the sending end.
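The pixel-wise subtraction just described can be sketched as follows, for one channel of two equal-resolution images; the function name is illustrative:

```python
def image_residual(first_hr, second_hr):
    """Image residual: each pixel residual is the pixel value in the
    first high-resolution image minus the pixel value of the
    corresponding pixel in the second high-resolution image.
    Assumes both images already have the same resolution."""
    h, w = len(first_hr), len(first_hr[0])
    return [[first_hr[y][x] - second_hr[y][x] for x in range(w)]
            for y in range(h)]
```

Because the second high-resolution image carries the high-frequency detail, this difference is dominated by the slowly varying low-frequency (contour) information, which is why it compresses well.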
- the transmitting end encodes the image residual to obtain a second code stream.
- the sending end can use the encoding method in the general encoding and decoding protocol to encode the image residual. That is, the image residual is encoded in a lossy encoding method.
- the sending end can also use a separate encoding method to encode the image residuals. For example, entropy coding, self-encoding network, etc.
- the compression ratio used by the separate encoding method mentioned here is relatively low, and the loss of image quality is negligible.
- the compression ratio of the separate encoding method may be between 0.5 and 2.
- this encoding method may also be referred to as lossless encoding. That is, the image residual is encoded in a lossless encoding method.
- the sending end may encode the image residual through the encoder 32 of the sending end to obtain the second code stream.
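A minimal sketch of the lossless path for the image residual, using `zlib` from the Python standard library as a stand-in for an entropy coder (the patent does not specify zlib; this only demonstrates that a lossless code stream decodes bit-exactly):

```python
import zlib

def encode_residual_lossless(residual_bytes):
    """Lossless encoding of a serialized image residual. The decoded
    bytes are identical to the input, so no image quality is lost."""
    return zlib.compress(residual_bytes, level=9)

def decode_residual_lossless(code_stream):
    return zlib.decompress(code_stream)

# Residuals dominated by low-frequency information vary slowly and
# therefore compress well even losslessly.
residual = bytes([0, 0, 1, 0, 0, 2, 0, 0, 1] * 100)
stream = encode_residual_lossless(residual)
assert decode_residual_lossless(stream) == residual  # bit-exact round trip
print(len(residual), "->", len(stream), "bytes")
```

A lossy codec in a general codec protocol could be substituted here when a higher compression ratio matters more than exact reconstruction, as the text notes.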
- the sending end sends the first code stream and the second code stream.
- the receiving end receives the first code stream and the second code stream.
- the transmitting end may send the first code stream and the second code stream to the communication interface 36 of the receiving end through the communication interface 36 of the transmitting end, so that the receiving end can receive the first code stream and the second code stream.
- the receiving end decodes the first code stream to obtain a second low-resolution image.
- the decoding method used when the receiving end decodes the first code stream is related to the encoding method used when the transmitting end encodes the first low-resolution image. That is, the encoding method of the protocol used by the sending end to encode the first low-resolution image, the receiving end needs to decode the first code stream correspondingly using the decoding method of the protocol, which will not be repeated here. For example, if the sending end uses the VP8 protocol encoding method to encode the first low-resolution image, correspondingly, the receiving end may use the VP8 protocol decoding method to decode the first code stream.
- the second low-resolution image obtained by the receiving end decoding the first code stream has the second resolution, that is, the resolution of the second low-resolution image is the same as the resolution of the first low-resolution image.
- the receiving end may decode the first code stream through the decoder 33 of the receiving end to obtain the second low-resolution image.
- the receiving end uses the second low-resolution image to reconstruct the second high-resolution image.
- the manner in which the receiving end uses the second low-resolution image to reconstruct the second high-resolution image is related to the manner in which the transmitting end obtains the second high-resolution image. This part of the content will be highlighted below.
- the receiving end decodes the second code stream to obtain an image residual between the first high-resolution image and the second high-resolution image.
- the decoding method used when the receiving end decodes the second code stream is related to the encoding method used when the transmitting end encodes the image residual. That is, what kind of encoding method the transmitting end uses to encode the image residual, the receiving end needs to use the decoding method of the protocol to decode the second code stream, which will not be repeated here.
- the receiving end may decode the second code stream through the decoder 33 of the receiving end to obtain the image residual between the first high-resolution image and the second high-resolution image.
- the receiving end combines the second high-resolution image with the image residual to obtain a third high-resolution image. For example, the receiving end adds the pixel value of the second pixel in the second high-resolution image to the pixel residual in the image residual corresponding to the second pixel to obtain the third pixel in the third high-resolution image corresponding to the second pixel.
- the third high-resolution image can be obtained by combining the third pixel points. At this time, the third high-resolution image is the high-resolution image obtained by the receiving end.
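The merge step at the receiving end can be sketched as the inverse of the subtraction at the sending end. Clamping to the valid pixel range of the data bit width is an implementation detail assumed here, not specified by the text:

```python
def merge_residual(second_hr, residual, bit_width=8):
    """Adds each pixel residual back to the corresponding pixel of the
    second high-resolution image, clamping to the valid range of the
    data bit width (0..255 for 8 bits)."""
    max_val = (1 << bit_width) - 1
    h, w = len(second_hr), len(second_hr[0])
    return [[min(max(second_hr[y][x] + residual[y][x], 0), max_val)
             for x in range(w)] for y in range(h)]
```

When the residual was encoded losslessly, this addition recovers the first high-resolution image exactly (up to the clamping range).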
- the sending end can first convert the resolution of the two high-resolution images to the same resolution, and then perform the subtraction process.
- for example, the sending end may first convert the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution. That is, the resolution of the first high-resolution image is the same as the resolution of the fourth high-resolution image. Then, the sending end subtracts the pixel value of the third pixel in the fourth high-resolution image corresponding to the first pixel from the pixel value of the first pixel in the first high-resolution image to obtain the second pixel residual in the image residual.
- alternatively, the first high-resolution image may be converted into a fifth high-resolution image, where the fifth high-resolution image has the third resolution. That is, the resolution of the fifth high-resolution image is the same as the resolution of the second high-resolution image.
- then, the sending end subtracts the pixel value of the second pixel in the second high-resolution image corresponding to the fourth pixel from the pixel value of the fourth pixel in the fifth high-resolution image to obtain the second pixel residual in the image residual.
- the receiving end may first merge the second high-resolution image with the image residual to obtain the fifth high-resolution image. Then, the receiving end may perform resolution conversion on the fifth high-resolution image to obtain a third high-resolution image. So far, the sending end transmits the first high-resolution image to the receiving end.
- the receiving end may combine the second high-resolution image with the image residual through the processor 31 of the receiving end to obtain the third high-resolution image.
- if the encoding causes no loss, the third high-resolution image may be the same as the first high-resolution image at the sending end. If the encoding causes loss, the third high-resolution image may be slightly different from the first high-resolution image at the sending end; it can be considered that the restored third high-resolution image is roughly the same as the first high-resolution image.
- in the image transmission method provided by this embodiment, the sending end decomposes the high-resolution image to be transmitted into a low-resolution image and an image residual used to reflect the low-frequency information of the high-resolution image, and transmits both. Since the amount of data in an image residual that reflects only the low-frequency information of the high-resolution image is much smaller than that of an image residual that reflects both the low-frequency information and the high-frequency information, the transmission bit rate required in this embodiment drops significantly compared with transmitting an image residual carrying both low-frequency and high-frequency information.
- the transmission code rate required for low-resolution images is much lower than the transmission code rate required for high-resolution images. Therefore, in this embodiment, a lower transmission code rate can be used between the sending end and the receiving end, and the transmission of high-resolution images can be realized.
- the transmission of high-resolution video can be realized by using a lower transmission code rate, which falls within the transmission code rate supported by most transmission equipment, so as to meet the application scenarios of real-time transmission of high-resolution video.
- FIG. 8 is a schematic diagram of a process for acquiring a second high-resolution image according to an embodiment of the application.
- the sending end involved in this embodiment uses the second low-resolution image to reconstruct the second high-resolution image.
- the foregoing step S402 may include the following steps.
- S801: The transmitting end decodes the first code stream to obtain a second low-resolution image.
- the decoding method used when the transmitting end decodes the first code stream is related to the encoding method used when the transmitting end encodes the first low-resolution image.
- whichever encoding protocol the sending end uses to encode the first low-resolution image, the corresponding decoding method of that protocol is used to decode the first code stream, which will not be repeated here.
- the decoder 33 at the transmitting end may decode the first code stream to obtain the second low-resolution image.
- the sending end uses the second low-resolution image to reconstruct the second high-resolution image.
- the receiving end can use the second low-resolution image to reconstruct the second high-resolution image in the same manner in S408.
- the second high-resolution image obtained by the sending end covers the encoding and decoding loss of the first low-resolution image during transmission.
- since the image residual obtained by the sending end is based on the second high-resolution image and the first high-resolution image, the image residual also covers the encoding and decoding loss of the first low-resolution image during transmission.
- after the receiving end obtains the second high-resolution image based on the second low-resolution image and combines it with the decoded image residual, the encoding and decoding loss introduced in the transmission of the first low-resolution image can be eliminated.
- only the transmission loss of the image residual remains on the channel, which reduces the loss of the third high-resolution image recovered by the receiving end (for example, only the loss of the image residual remains), or even eliminates the loss entirely (for example, when a lossless encoding method is used to encode the image residual), thereby improving the quality of the image.
- the sending end may use the following method to reconstruct the second high-resolution image using the second low-resolution image.
- the first method uses a neural network to reconstruct the second high-resolution image from the second low-resolution image.
- the receiving end can also use the neural network to reconstruct the second high-resolution image. That is, the sending end and the receiving end are deployed with the same neural network, so that the second low-resolution image is input to the same neural network to obtain the second high-resolution image output by the neural network.
- the neural network consists of two parts, namely the network structure and the parameters of the convolution kernel.
- the network structure mentioned here may include, for example, at least one of the following information: the number of network layers, the number of convolution kernels in each layer, the size of each convolution kernel, and the connection relationship of each layer.
- the parameters of the convolution kernel are used to constrain the operations performed by the convolution kernel. Therefore, the neural network involved in the embodiments of the present application can be implemented in two steps: the first step is to design the network structure, and the second step is to train the neural network to obtain the parameters of each convolution kernel.
- Fig. 9 is a schematic diagram of a neural network model provided by an embodiment of the application.
- the network structure can be determined according to the task goals achieved by the neural network.
- the task goal achieved by the neural network in this embodiment is to use low-resolution images to construct high-resolution images that cover image detail information.
- the neural network in this embodiment needs to have two functions: one is to extract image detail information, and the other is to convert the resolution. Therefore, the network architecture of the neural network in this embodiment may include two parts: a network layer used to extract image detail information (abbreviated as: extraction layer), and a network layer used to convert low resolution to high resolution (abbreviated as: conversion layer).
- both the extraction layer and the conversion layer mentioned here may include at least one convolutional layer.
- the extraction layer can be implemented using 1 to N convolutional layers, and the conversion layer can be implemented using 1 to M deconvolutional layers. Where N and M are both integers greater than or equal to 2.
- FIG. 9 is a schematic diagram of an example where the extraction layer is located before the conversion layer.
- the position of the two network layers in the neural network is not limited.
- the extraction layer can be located after the conversion layer or before it. That is, the neural network can first extract image detail information and then convert low resolution to high resolution; or it can first convert low resolution to high resolution and then extract image detail information; or it can extract part of the image detail information, convert low resolution to high resolution, and then perform the remaining extraction of image detail information.
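Assuming the standard convolution and transposed-convolution output-size formulas, the size bookkeeping for an extraction-then-conversion layout might look as follows; the layer counts, kernel sizes, and the 360-line input are illustrative and not taken from the patent's tables:

```python
# Hedged bookkeeping for an extraction-then-conversion layout
# (layer counts, kernel sizes, and input size are illustrative).
def conv_out(size, kernel, stride=1, pad=0):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=2, pad=0):
    # transposed convolution inverts the formula above
    return (size - 1) * stride - 2 * pad + kernel

s = 360                                 # low-resolution input height
s = conv_out(s, 3, pad=1)               # extraction layer 1: size kept
s = conv_out(s, 3, pad=1)               # extraction layer 2: size kept
s = deconv_out(s, 4, stride=2, pad=1)   # conversion layer: 2x upscale
assert s == 720
```

Placing the deconvolutional conversion layer last keeps the extraction layers operating at the cheaper low resolution, which matches the Table-3-style ordering described below.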
- the length ⁇ width in the above table is used to characterize the size of the convolution kernel used by the convolution layer.
- the size of the input channel of each layer can be determined according to the number of convolution kernels of the previous layer, and each input channel of the first layer corresponds to one channel of the image in the RGB format.
- the neural network shown in Table 3 first performs the operation of extracting image detail information, and then performs the operation of low-resolution conversion to high-resolution.
- the neural network shown in Table 4 performs the operation of converting low-resolution to high-resolution after performing partial extraction of image detail information, and then performs the remaining operations of extracting image detail information.
- the neural networks of the two structures can reconstruct the second high-resolution image based on the second low-resolution image
- in the neural network shown in Table 3, the extraction of image detail information is performed on the low-resolution image.
- in the neural network shown in Table 4, part of the extraction of image detail information is performed on high-resolution images. Since the data volume of a high-resolution image is greater than that of a low-resolution image, the neural network shown in Table 4 requires more computation to reconstruct the second high-resolution image based on the second low-resolution image, but the extracted image detail information is more accurate. Therefore, in specific implementation, the specific architecture of the neural network can be selected according to actual needs. It can be understood that although the above examples describe a neural network applied to an image in the RGB format, the embodiment of the present application does not limit the image format to which the neural network corresponds.
- after determining the architecture of the neural network, the neural network can be trained, for example, in the following manner to obtain the parameters of each convolution kernel: first, a training set for training the neural network is constructed.
- the training set may include S groups of samples, and each group includes one frame of input image X and one label image Y. The input image X is a low-resolution image, and the label image Y is the target output to be achieved after the low-resolution image is input to the neural network.
- S is an integer greater than or equal to 1.
- the S samples can be obtained using S high-resolution images. Specifically, taking the i-th high-resolution image as an example, resolution conversion may be performed on the i-th high-resolution image to obtain a low-resolution image corresponding to the i-th high-resolution image.
- the low-resolution image is the input image Xi.
- i is an integer greater than 0 and less than or equal to S.
- the i-th high-resolution image can be subjected to frequency domain conversion (for example, Fourier transform, wavelet transform, or other spatial-to-frequency-domain methods, etc.) to obtain the frequency domain information of the i-th high-resolution image .
- the frequency domain information of the i-th high-resolution image can be extracted (for example, a high-pass filter is used to extract the frequency domain information) to obtain the high-frequency information of the i-th high-resolution image.
- the high-frequency information of the i-th high-resolution image is converted into an image.
- the high-frequency information can be converted into an image by converting the frequency domain to the spatial domain.
- the high-frequency information of the i-th high-resolution image can be converted into an image through inverse Fourier transform.
- the obtained image is the label image Yi of the input image Xi.
- the frequency of the above-mentioned high-frequency information is greater than or equal to a preset frequency threshold.
- the preset frequency threshold can be determined according to actual needs.
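As a hedged stand-in for the Fourier-based label construction described above, a moving average can play the role of the low-pass step, and subtracting it from the signal leaves the high-frequency part that becomes the label Y_i. A real pipeline would use a frequency-domain transform and a high-pass filter as the text describes; every name below is illustrative:

```python
# Toy stand-in for constructing label Y_i: a moving average acts as the
# low-pass, and subtracting it leaves the high-frequency information.
def low_pass(x, k=3):
    half = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def high_frequency_label(hr):
    lp = low_pass(hr)
    return [a - b for a, b in zip(hr, lp)]

hr_row = [10.0, 10.0, 20.0, 10.0, 10.0]   # one row of the i-th HR image
label = high_frequency_label(hr_row)
# the sharp spike at index 2 survives, flat regions go to ~0
assert abs(label[2]) > abs(label[0])
```

The choice of window size here plays the role of the preset frequency threshold: a wider low-pass window lets less detail through, so more of the signal ends up in the label.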
- since the purpose of training the neural network is to continuously adjust and optimize the parameters of each convolution kernel so that the image Y' output by the neural network for an input image X is as similar as possible to the label image Y, the objective function of the neural network can be defined as min Σ_{i=1}^{S} ||Y'_i - Y_i||^2, that is, minimizing the difference between the network output and the label image over all S groups of samples.
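The minimisation idea can be illustrated with a one-parameter toy "network" y' = w·x trained by gradient descent on the squared error; this is only a sketch of the min ||Y' − Y||² objective, not the patent's training procedure:

```python
# One-parameter toy of min ||Y' - Y||^2: "network" y' = w * x,
# trained by gradient descent (purely illustrative).
def loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # labels generated by w = 2
w = 0.0
for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= 0.01 * grad                        # adjust the parameter
assert abs(w - 2.0) < 1e-3                  # converges to the label rule
```

Training a real convolutional network adjusts every kernel parameter the same way, just with many more parameters and backpropagated gradients.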
- the trained neural network model can be applied to the above-mentioned sending end and receiving end to reconstruct the second high-resolution image using the second low-resolution image.
- the neural network model described in the above examples and the manner of training it are only examples. Since there are many image formats, the neural networks applied to images of each format are not listed here one by one. In specific implementation, those skilled in the art can construct and train a corresponding neural network according to the format of the image to be processed and the neural network's task goal of "building a high-resolution image covering image detail information from a low-resolution image", which will not be repeated here.
- the foregoing embodiments all take the neural network model as an example to illustrate how to use the second low-resolution image to reconstruct the second high-resolution image.
- the neural network involved in this embodiment is an artificial intelligence (AI) model and includes, but is not limited to, a convolutional neural network.
- the sending end may use the second low-resolution image to reconstruct the second high-resolution image using the neural network through the NPU 35 of the sending end.
- the sending end can use the second low-resolution image to reconstruct the second high-resolution image using a neural network through the GPU (not shown in FIG. 3) of the sending end.
- the sending end may use the second low-resolution image to reconstruct the second high-resolution image using a neural network through the processor 31 of the sending end.
- the sending end may use the processor 31 and GPU on the sending end to use the second low-resolution image, use a neural network to reconstruct the second high-resolution image, and so on.
- the receiving end can also be implemented in a similar manner, which will not be repeated here.
- the sending end can perform resolution conversion on the second low-resolution image to obtain a high-resolution image (for example, a sixth high-resolution image). Then, the sending end may perform frequency domain conversion on the sixth high-resolution image to obtain the frequency domain information of the sixth high-resolution image. Then, the sending end may extract this frequency domain information to obtain the high-frequency information of the sixth high-resolution image. Finally, the sending end can convert the high-frequency information of the sixth high-resolution image into the second high-resolution image. For how to implement frequency domain conversion, how to extract frequency domain information, and how to convert high-frequency information into images, refer to the description of constructing the sample set for training the neural network, which will not be repeated here. Correspondingly, the receiving end can also use this method to reconstruct the second high-resolution image using the second low-resolution image.
- the sending end may use the GPU (not shown in FIG. 3) of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 and GPU on the sending end to obtain the second high-resolution image and the like in this manner.
- the receiving end can also be implemented in a similar manner, which will not be repeated here.
- the foregoing sending end may directly use the first low-resolution image to reconstruct the second high-resolution image.
- the sending end may input the first low-resolution image into the neural network to obtain the second high-resolution image.
- the receiving end can input the second low-resolution image into the neural network to obtain the second high-resolution image.
- the sending end may use the GPU (not shown in FIG. 3) of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 and GPU on the sending end to obtain the second high-resolution image and the like in this manner.
- the receiving end can also be implemented in a similar manner, which will not be repeated here.
- the sending end may first convert the first low-resolution image into a seventh high-resolution image, and perform high-frequency information extraction on the seventh high-resolution image to obtain the second high-resolution image.
- the receiving end may convert the second low-resolution image into an eighth high-resolution image, and perform high-frequency information extraction on the eighth high-resolution image to obtain the second high-resolution image.
- the sending end may use the GPU (not shown in FIG. 3) of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 and GPU on the sending end to obtain the second high-resolution image and the like in this manner.
- the receiving end can also be implemented in a similar manner, which will not be repeated here.
- the above-mentioned sending end may also use the first high-resolution image to reconstruct the second high-resolution image.
- high-frequency information extraction may be performed on the first high-resolution image to obtain the second high-resolution image.
- the receiving end can convert the second low-resolution image into a seventh high-resolution image, and perform high-frequency information extraction on the seventh high-resolution image to obtain the second high-resolution image.
- the sending end may use the GPU (not shown in FIG. 3) of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 of the sending end to obtain the second high-resolution image in this manner.
- the sending end may use the processor 31 and GPU on the sending end to obtain the second high-resolution image and the like in this manner.
- the receiving end can also be implemented in a similar manner, which will not be repeated here.
- FIG. 10 is a schematic diagram of a video transmission process provided by an embodiment of the present application.
- the method includes: (1) The sending end converts the high-resolution video HR to be transmitted into the low-resolution video LR. Among them, the resolution of the original high-resolution video HR is higher than the resolution of the low-resolution video LR.
- the high-resolution video HR to be transmitted may also be referred to as the original high-resolution video, and the resolution of the high-resolution video HR to be transmitted may be, for example, 4k or 8k.
- the transmitting end encodes the low-resolution video LR to obtain the bit stream LR_str of the low-resolution video LR.
- the sender uses a general codec protocol to encode low-resolution video LR.
- the transmitting end decodes the code stream LR_str of the low-resolution video LR to obtain the decoded low-resolution video LR'.
- the decoded low-resolution video LR' has the same resolution as the low-resolution video LR.
- the sending end uses the decoded low-resolution video LR' as input and inputs it to the neural network to obtain the high-resolution video HR_hf output by the neural network.
- the high-resolution video HR_hf includes the high-frequency information of each image in the high-resolution video HR, and excludes the low-frequency information of each image in the high-resolution video HR.
- the sender obtains the video residual Res_lf between the high-resolution video HR and the high-resolution video HR_hf.
- the video residual Res_lf is used to reflect the low-frequency information of each image in the high-resolution video HR. For example, if the resolutions of the high-resolution video HR and the high-resolution video HR_hf are the same, the sender can subtract the pixel value of the second pixel in the high-resolution video HR_hf, which corresponds to the first pixel of each image in the high-resolution video HR, from the pixel value of that first pixel, to obtain the first pixel residual in the video residual Res_lf.
- the sending end encodes the video residual Res_lf to obtain the code stream Res_str of the video residual Res_lf.
- the sender uses a general encoding and decoding protocol to encode the video residual Res_lf.
- the sending end uses a separate encoding method to encode the video residual Res_lf. For example, entropy coding, self-encoding network, etc.
- the transmitting end sends the code stream LR_str of the low-resolution video LR and the code stream Res_str of the video residual Res_lf.
- the receiving end receives the code stream LR_str and the code stream Res_str.
- the receiving end decodes the code stream LR_str to obtain the decoded low-resolution video LR'.
- the decoding method used when the receiving end decodes the code stream LR_str corresponds to the encoding method used when the transmitting end encodes the low-resolution video LR. That is, whichever encoding protocol the transmitting end uses to encode the low-resolution video LR, the receiving end needs to use the decoding method of that protocol to decode the bit stream LR_str, which will not be repeated here.
- the receiving end uses the decoded low-resolution video LR' as input and inputs it to the neural network to obtain the high-resolution video HR_hf output by the neural network. It should be understood that the neural network deployed at the receiving end is the same as the neural network deployed at the transmitting end, or at least approximately the same in function.
- the receiving end decodes the code stream Res_str to obtain the decoded video residual Res_lf'.
- the decoding method used when the receiving end decodes the code stream Res_str corresponds to the encoding method used when the transmitting end encodes the video residual Res_lf. That is, whichever encoding protocol the transmitting end uses to encode the video residual Res_lf, the receiving end needs to use the decoding method of that protocol to decode the bit stream Res_str, which will not be repeated here.
- the receiving end combines the high-resolution video HR_hf with the decoded video residual Res_lf' to restore the third high-resolution video HR_end. For example, the receiving end adds the pixel value of the second pixel in the high-resolution video HR_hf to the pixel residual of the second pixel in the video residual Res_lf' to obtain the third high-resolution video HR_end The third pixel corresponding to the second pixel.
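The combination step above reduces to a per-pixel addition; a minimal sketch with flat lists standing in for image planes (all names are illustrative):

```python
# Per-pixel combination at the receiver: HR_hf + Res_lf' -> HR_end
# (flat lists stand in for image planes; names are illustrative).
def combine(hr_hf, res_lf):
    return [p + r for p, r in zip(hr_hf, res_lf)]

hr_hf  = [5, -3, 2, 0]          # high-frequency reconstruction HR_hf
res_lf = [120, 130, 118, 125]   # decoded low-frequency residual Res_lf'
hr_end = combine(hr_hf, res_lf) # restored third high-resolution frame
assert hr_end == [125, 127, 120, 125]
```

This is the inverse of the subtraction the sender performs when computing Res_lf, so the two operations cancel apart from any residual codec loss.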
- FIG. 10 uses the neural network reconstruction of the high-resolution video HR_hf as an example to illustrate how to transmit the high-resolution video. It should be understood that this part can also be implemented using the aforementioned second way of "using the second low-resolution image to reconstruct the second high-resolution image".
- the sending end processes the 4k video to obtain 720p video and 4k video residual C#.
- the 4K video residual C# is used to reflect the image contour information of each image of the 4k video (that is, used to reflect the low-frequency information of each image of the 4k video).
- the 4K video residual C# includes a small amount of data. Therefore, when encoding the 4k video residual C# in this embodiment, a larger compression ratio can be used, for example, the compression ratio can be up to 500 times.
- the transmission bit rate required to transmit the 4k video residual C# is:
- the sum of the two transmission bit rates is 12Mbps. That is to say, when the 4k video is transmitted using the scheme shown in FIG. 10, the transmission bit rate of the sender needs to reach 12Mbps. Correspondingly, when the receiving end receives the 4k video, it also needs to use a transmission bit rate of 12 Mbps.
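As a rough sanity check of these numbers, dividing the raw bit rate of 4k RGB video by the 500:1 compression ratio mentioned above lands in the same range as the 12 Mbps figure reported for the two streams combined. The frame geometry and frame rate below are assumptions for illustration; only the 500x ratio and the 12 Mbps total come from the text:

```python
# Rough arithmetic: raw 4k RGB rate vs. a 500:1 compressed residual.
# Frame geometry and frame rate are assumed for illustration.
width, height, channels, bits, fps = 3840, 2160, 3, 8, 30
raw_bits_per_sec = width * height * channels * bits * fps
residual_mbps = raw_bits_per_sec / 500 / 1e6
# lands on the order of the reported 12 Mbps combined figure
assert 10 < residual_mbps < 13
```

The point of the arithmetic is only that a heavily compressed residual stream fits in a budget that ordinary transmission equipment supports.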
- since the transmitting end and the receiving end both use the encoded-and-decoded low-resolution video LR', the high-resolution video HR_hf containing the image detail information of each image in the high-resolution video HR is acquired identically at both ends. The sending end uses this HR_hf to calculate the video residual, and the receiving end uses the decoded video residual to correct the same HR_hf.
- the video residual can therefore eliminate the codec loss of the low-resolution video LR, so that the entire transmission path carries only the codec loss of the video residual; the codec loss of the low-resolution video LR no longer appears, and the quality of video transmission can be improved.
- the foregoing embodiment focuses on how to decompose a high-resolution image into a low-resolution image and image residuals used to reflect the low-frequency information of the high-resolution image for transmission, so as to reduce the transmission bit rate of the high-resolution image.
- the embodiments of this application also provide another image transmission method. Although this method also decomposes high-resolution images into low-resolution images and image residuals for transmission, the focus is on how to eliminate the loss introduced during the transmission of the low-resolution image.
- the transmitted image residual can be an image residual used to reflect the low-frequency information of the high-resolution image, an image residual used to reflect both the low-frequency information and the high-frequency information of the high-resolution image, or an image residual used to reflect the high-frequency information of the high-resolution image, which is not limited.
- FIG. 11 is a schematic flowchart of another image transmission method provided by an embodiment of the application.
- the method may include: S1101, the sending end converts the first high-resolution image into the first low-resolution image. Wherein, the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image. S1102.
- the transmitting end encodes the first low-resolution image to obtain a first code stream. For example, the sending end encodes the first low-resolution image in a lossy encoding manner to obtain the first code stream.
- the sending end decodes the first code stream to obtain a second low-resolution image.
- the sending end uses the second low-resolution image to reconstruct the second high-resolution image.
- the third resolution of the second high-resolution image is higher than the second resolution.
- the second high-resolution image may include both the image detail information and the image contour information of the first high-resolution image, or may include only the image detail information of the first high-resolution image, or may include only the image contour information of the first high-resolution image, which is not limited in the embodiment of the present application.
- the sending end obtains an image residual between the first high-resolution image and the second high-resolution image.
- the sender encodes the image residual to obtain a second code stream.
- the image residual is encoded in a lossy encoding method, or the image residual is encoded in a lossless encoding method.
- the sending end sends the first code stream and the second code stream.
- the receiving end decodes the first code stream to obtain a second low-resolution image.
- the receiving end uses the second low-resolution image to reconstruct the second high-resolution image.
- the receiving end decodes the second code stream to obtain an image residual between the first high-resolution image and the second high-resolution image.
- the receiving end combines the second high-resolution image with the image residual to obtain a third high-resolution image.
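The whole send/receive path of the steps above can be exercised end to end on a toy signal, modelling the lossy codec as integer quantisation. Because the sender computes the residual against its own decoded low-resolution stream, the receiver recovers the original exactly when the residual itself is transmitted losslessly; all names here are illustrative:

```python
# Toy end-to-end run of the method above, with the lossy codec
# modelled as integer quantisation (every name is illustrative).
def lossy_encode(xs, step=4):
    return [round(x / step) for x in xs]      # lossy encoding: quantise

def lossy_decode(codes, step=4):
    return [c * step for c in codes]          # decoding: dequantise

def reconstruct(lr):
    # stand-in for the neural reconstruction: doubles the resolution
    out = []
    for v in lr:
        out.extend([v, v])
    return out

hr = [10, 13, 22, 25, 30, 33, 41, 44]         # first high-resolution image
lr = hr[::2]                                  # convert to low resolution
lr_dec = lossy_decode(lossy_encode(lr))       # sender decodes own stream
hr2 = reconstruct(lr_dec)                     # second high-resolution image
res = [a - b for a, b in zip(hr, hr2)]        # image residual

# Receiver: decodes the same low-resolution stream, so its reconstruction
# matches hr2 exactly; adding a lossless residual restores hr.
hr2_rx = reconstruct(lossy_decode(lossy_encode(lr)))
hr3 = [a + b for a, b in zip(hr2_rx, res)]    # third high-resolution image
assert hr3 == hr                              # LR codec loss eliminated
```

If the residual were instead computed against the original (undecoded) low-resolution signal, the quantisation error of the low-resolution stream would survive into hr3, which is exactly the loss this method is designed to remove.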
- the sending end uses the decoded second low-resolution image to obtain the second high-resolution image, so that the second high-resolution image obtained by the sending end covers the codec loss of the first low-resolution image during transmission.
- since the sender obtains the image residual based on the second high-resolution image and the first high-resolution image, the residual also covers the codec loss of the first low-resolution image in the transmission process. In this way, after the receiving end obtains the second high-resolution image based on the second low-resolution image and combines it with the decoded image residual, the codec loss introduced in the transmission of the first low-resolution image can be eliminated, leaving at most the codec loss of the image residual (or no loss at all when a lossless encoding method is used to encode the image residual), which improves the quality of the image.
- the image quality of the high-resolution video can be improved, and the user experience can be improved.
- FIG. 12 is a schematic structural diagram of a transmission device provided by an embodiment of this application. It can be understood that the transmission device can correspondingly implement the operations or steps of the sending end in the foregoing method embodiments.
- the transmission device may be the transmitting end or may be a component configurable at the transmitting end, such as a chip. As shown in FIG. 12, the transmission device may include: a first processing module 11, a first encoding module 12, a second processing module 13, a third processing module 14, a second encoding module 15, and a sending module 16.
- the transmission device may further include: a decoding module 17.
- the first processing module 11 is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the first encoding module 12 is used for encoding a first low-resolution image to obtain a first code stream.
- the second processing module 13 is configured to obtain a second high-resolution image, where the third resolution of the second high-resolution image is higher than the second resolution, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image.
- the third processing module 14 is configured to obtain an image residual between the first high-resolution image and the second high-resolution image, and the image residual is used to reflect low-frequency information.
- the second encoding module 15 is used for encoding image residuals to obtain a second code stream.
- the sending module 16 is used to send the first code stream and the second code stream.
- the above-mentioned high-frequency information is image detail information
- the low-frequency information is image contour information.
- the image detail information changes more rapidly relative to the image contour information.
- the decoding module 17 is used to decode the first code stream to obtain the second low-resolution image.
- the second processing module 13 is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image.
- the second processing module 13 is specifically configured to use the second low-resolution image to reconstruct the second high-resolution image using a neural network.
- the first resolution and the third resolution are the same; the third processing module 14 is specifically configured to subtract the pixel value of the second pixel in the second high-resolution image, which corresponds to the first pixel, from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual.
- the first resolution is greater than the third resolution
- the third processing module 14 is specifically configured to convert the second high-resolution image into a fourth high-resolution image, and to subtract the pixel value of the third pixel in the fourth high-resolution image, which corresponds to the first pixel, from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual, where the fourth high-resolution image has the first resolution.
- the first encoding module 12 is specifically configured to encode the first low-resolution image in a lossy encoding manner.
- the second encoding module 15 is specifically configured to encode image residuals in a lossy encoding manner.
- the second encoding module 15 is specifically configured to encode image residuals in a lossless encoding manner.
- the device is applied to transmit video
- the first high-resolution image is a frame of image in the video.
- the transmission device provided in this embodiment can perform the actions of the sending end in the method embodiments corresponding to FIG. 4 and FIG.
- the first processing module 11 is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image.
- the first encoding module 12 is used for encoding a first low-resolution image (for example, encoding the first low-resolution image in a lossy encoding manner) to obtain a first code stream.
- the decoding module 17 is used to decode the first code stream to obtain the second low-resolution image.
- the second processing module 13 is configured to reconstruct a second high-resolution image using the second low-resolution image, and the third resolution of the second high-resolution image is higher than the second resolution.
- the third processing module 14 is used to obtain the image residual between the first high-resolution image and the second high-resolution image.
- the second encoding module 15 is used for encoding image residuals to obtain a second code stream.
- the sending module 16 is used to send the first code stream and the second code stream.
- the second processing module 13 is specifically configured to reconstruct the second high-resolution image from the second low-resolution image using a neural network.
- the first resolution and the third resolution are the same, and the third processing module 14 is specifically configured to subtract the pixel value of the second pixel corresponding to the first pixel in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual.
- alternatively, the first resolution is greater than the third resolution, and the third processing module 14 is specifically configured to convert the second high-resolution image into a fourth high-resolution image, and to subtract the pixel value of the third pixel corresponding to the first pixel in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual, where the fourth high-resolution image has the first resolution.
- the second encoding module 15 is specifically configured to encode the image residual in a lossy encoding manner.
- the second encoding module 15 is specifically configured to encode the image residual in a lossless encoding manner.
- the device is applied to transmitting video, and the first high-resolution image is a frame of the video.
- the transmission device provided in this embodiment can perform the actions of the sending end in the method embodiment corresponding to FIG. 11, and its implementation principles and technical effects are similar, and will not be repeated here.
- the above-mentioned device may further include at least one storage module, which may include data and/or instructions, and each of the above-mentioned modules may read the data and/or instructions in the storage module to implement the corresponding method.
- FIG. 13 is a schematic structural diagram of another transmission device provided by an embodiment of the application. It is understandable that the transmission device can correspondingly implement the operations or steps of the receiving end in the foregoing method embodiments.
- the transmission device may be a receiving end or a component configurable at the receiving end, such as a chip. As shown in FIG. 13, the transmission device may include: a receiving module 21, a first decoding module 22, a first processing module 23, a second decoding module 24, and a second processing module 25.
- the receiving module 21 is configured to receive the first code stream and the second code stream.
- the first decoding module 22 is used to decode the first code stream to obtain the second low-resolution image.
- the first processing module 23 is configured to reconstruct a second high-resolution image using the second low-resolution image, where the first resolution of the second high-resolution image is higher than the second resolution of the second low-resolution image, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image.
- the second decoding module 24 is used to decode the second code stream to obtain an image residual between the first high-resolution image and the second high-resolution image, and the image residual is used to reflect low-frequency information.
- the second processing module 25 is used to combine the second high-resolution image with the image residual to obtain a third high-resolution image.
- the above-mentioned high-frequency information is image detail information, and the low-frequency information is image contour information.
- the image detail information changes more rapidly than the image contour information.
- the first processing module 23 is specifically configured to reconstruct the second high-resolution image from the second low-resolution image using a neural network.
- the second processing module 25 is specifically configured to add the pixel value of the second pixel in the second high-resolution image to the pixel residual corresponding to the second pixel in the image residual, to obtain the third pixel corresponding to the second pixel in the third high-resolution image.
- the device is applied to transmitting video, and the first high-resolution image is a frame of the video.
- the transmission device provided in this embodiment can perform the actions of the receiving end in the foregoing method embodiments, and its implementation principles and technical effects are similar, and will not be repeated here.
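The sender and receiver module chains described above can be sketched end to end. The following is a minimal numpy sketch under stated assumptions, not the patent's implementation: block-mean downsampling stands in for the resolution conversion of the first processing module, an identity "codec" stands in for the encoders and decoders, and nearest-neighbour upsampling stands in for the neural-network reconstruction of the second high-resolution image. With a lossless residual path, the receiver's merge recovers the original image exactly.

```python
import numpy as np

def downsample(img, f):
    # Block-mean downsampling: first processing module (HR -> LR).
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upsample(img, f):
    # Nearest-neighbour upsampling: stands in for the neural-network
    # reconstruction of the second high-resolution image.
    return np.repeat(np.repeat(img, f, axis=0), f, axis=1)

def send(hr, f=3):
    lr = downsample(hr, f)        # first processing module
    stream1 = lr                  # first encoding module (identity codec here)
    lr_dec = stream1              # decoding module
    hr2 = upsample(lr_dec, f)     # second processing module (reconstruction)
    residual = hr - hr2           # third processing module (image residual)
    stream2 = residual            # second encoding module (lossless here)
    return stream1, stream2       # sending module

def receive(stream1, stream2, f=3):
    lr = stream1                  # first decoding module
    hr2 = upsample(lr, f)         # first processing module (same reconstruction)
    residual = stream2            # second decoding module
    return hr2 + residual         # second processing module (merge)

rng = np.random.default_rng(0)
hr = rng.random((9, 12))
s1, s2 = send(hr)
hr3 = receive(s1, s2)
assert np.allclose(hr3, hr)  # lossless residual path restores the original
```

Note that the receiver must apply the same reconstruction as the sender for the residual to line up pixel by pixel, which is why the two ends deploy the same network in the patent's embodiments.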
- the above-mentioned device may further include at least one storage module, which may include data and/or instructions, and each of the above-mentioned modules may read the data and/or instructions in the storage module to implement the corresponding method.
- the various modules involved in FIG. 12 and FIG. 13 can be implemented by software, hardware or a combination of both.
- in actual implementation, the sending module may be a transmitter and the receiving module may be a receiver; alternatively, the sending module and the receiving module may be implemented by a transceiver, or by a communication interface.
- the processing module may be implemented in the form of software invoked by a processing element, or in the form of hardware.
- the processing module may be at least one separately configured processing element, or may be integrated into a chip of the above-mentioned device for implementation.
- each step of the above method, or each of the above modules, can be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
- the above modules may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA).
- the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code.
- these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).
- the present application also provides a transmission device 100 as shown in FIG. 3.
- the processor system 102 in the transmission device 100 reads the program and data set stored in the memory 101 to execute the aforementioned image transmission method.
- the embodiment of the present application also provides a computer-readable storage medium on which computer instructions are stored for implementing the method executed by the sending end or the method executed by the receiving end in the foregoing method embodiments.
- when the instructions are executed, the transmission device can implement the method executed by the sending end or the method executed by the receiving end in the foregoing method embodiments.
- the embodiments of the present application also provide a computer program product containing instructions, which when executed, cause the computer to implement the method executed by the sending end or the method executed by the receiving end in the foregoing method embodiments.
- An embodiment of the present application also provides a transmission system, which includes the sending end and/or the receiving end in the above embodiment.
- the transmission system includes: the sending end and the receiving end in the embodiment corresponding to FIG. 4 or FIG. 9 above.
- the transmission system includes: the transmission device described in conjunction with FIG. 12 and the transmission device described in FIG. 13.
- the transmission system includes: the transmission device described above in conjunction with FIG. 3.
- the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- when software is used, the embodiments may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
- the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- for example, computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive (SSD)).
- "plural" herein refers to two or more.
- the term "and/or" herein is merely an association relationship describing the associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone.
- the character "/" herein generally indicates an "or" relationship between the associated objects before and after it; in a formula, the character "/" indicates a "division" relationship.
- the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
Abstract
Embodiments of the present application provide an image transmission method and apparatus. The method includes: converting a first high-resolution image into a first low-resolution image, where a first resolution of the first high-resolution image is higher than a second resolution of the first low-resolution image; encoding the first low-resolution image to obtain a first code stream; obtaining a second high-resolution image, where a third resolution of the second high-resolution image is higher than the second resolution, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image; obtaining an image residual between the first high-resolution image and the second high-resolution image, where the image residual reflects the low-frequency information of the first high-resolution image; encoding the image residual to obtain a second code stream; and sending the first code stream and the second code stream. Embodiments of the present application can substantially reduce the bit rate required to transmit high-resolution video, and can therefore satisfy application scenarios in which high-resolution video is transmitted in real time.
Description
Embodiments of the present application relate to image processing technology, and in particular to an image transmission method and apparatus.
Video broadly refers to the various techniques that capture, record, process, store, transmit, and reproduce a series of still images as electrical signals. The higher the resolution of a video, the sharper the video and the larger its data volume. Because the visual effect of high-resolution video is more lifelike than that of low-resolution video, more and more video application scenarios choose to transmit high-resolution video in real time in order to give users a good experience, for example live streaming, watching high-definition video online, and high-definition video conferencing.
At present, when transmitting high-resolution video, most transmission devices simply encode the high-resolution video and then transmit it. However, this way of transmitting high-resolution video cannot meet the needs of practical use. How to transmit high-resolution video is therefore a problem that needs to be solved urgently.
Summary of the invention
Embodiments of the present application provide an image transmission method and apparatus for implementing the transmission of high-resolution video.
In a first aspect, an embodiment of the present application provides an image transmission method. The method may be applied to the sending end in a transmission system, or to a chip in the sending end. The method of this embodiment addresses the problem that the existing way of transmitting high-resolution video results in a large transmission volume. Specifically: a first high-resolution image is converted into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image; the first low-resolution image is encoded to obtain a first code stream; a second high-resolution image is obtained, where the third resolution of the second high-resolution image is higher than the second resolution, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image; an image residual between the first high-resolution image and the second high-resolution image is obtained, the image residual reflecting the low-frequency information; the image residual is encoded to obtain a second code stream; and the first code stream and the second code stream are sent.
It should be understood that the high-frequency information mentioned above is image detail information and the low-frequency information is image contour information. In the first high-resolution image, the image detail information changes more rapidly than the image contour information.
Optionally, the method may be applied to transmitting video, and the first high-resolution image is a frame of the video.
In the image transmission method provided by this embodiment, the sending end may decompose the high-resolution image to be transmitted into a low-resolution image and an image residual that reflects the low-frequency information of the high-resolution image, and transmit both. The data volume of an image residual that reflects only the low-frequency information of the high-resolution image is far smaller than that of an image residual that reflects both the low-frequency and the high-frequency information. The bit rate required to transmit the residual of this embodiment is therefore greatly reduced compared with transmitting a residual that carries both low-frequency and high-frequency information; that is, the transmission volume is reduced. In addition, the bit rate required to transmit the low-resolution image is also far lower than that required to transmit the high-resolution image. Consequently, the sending end and the receiving end can transmit a high-resolution image at a relatively low bit rate. When the method is applied to transmitting high-resolution video, the video can likewise be transmitted at a relatively low bit rate that most transmission devices support, thereby satisfying application scenarios that transmit high-resolution video in real time.
As a possible implementation, obtaining the second high-resolution image may include: decoding the first code stream to obtain a second low-resolution image, and reconstructing the second high-resolution image using the second low-resolution image, for example reconstructing it from the second low-resolution image with a neural network.
By obtaining the second high-resolution image from the decoded second low-resolution image, the second high-resolution image obtained by the sending end covers the encoding and decoding loss that the first low-resolution image suffers during transmission, and the image residual that the sending end obtains from this second high-resolution image and the first high-resolution image also covers that loss. In this way, after the receiving end obtains the second high-resolution image from the second low-resolution image and merges it with the decoded image residual, the loss incurred in transmitting the first low-resolution image is cancelled out; only the transmission loss of the image residual remains on the transmission path, which reduces the loss of the third high-resolution image recovered by the receiving end and improves image quality.
As a possible implementation, when the first resolution and the third resolution are the same, obtaining the image residual between the first high-resolution image and the second high-resolution image may include: subtracting the pixel value of the second pixel, corresponding to the first pixel, in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual. Alternatively, when the first resolution is greater than the third resolution, obtaining the image residual may include: converting the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution, and subtracting the pixel value of the third pixel, corresponding to the first pixel, in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual. In this way, the image residual can be obtained in multiple ways, which broadens the application scenarios of the solution.
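The second branch above (first resolution greater than third) can be sketched in numpy. Nearest-neighbour interpolation is used here as a hypothetical choice for converting the second high-resolution image into the fourth high-resolution image; the text does not prescribe a conversion method:

```python
import numpy as np

def to_first_resolution(img, factor):
    # Hypothetical conversion of the second HR image into the fourth HR
    # image (nearest-neighbour); the fourth image has the first resolution.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

first_hr = np.arange(16, dtype=float).reshape(4, 4)   # first resolution: 4x4
second_hr = np.array([[1.0, 2.0], [3.0, 4.0]])        # third resolution: 2x2
fourth_hr = to_first_resolution(second_hr, 2)         # now 4x4
residual = first_hr - fourth_hr                       # second pixel residuals
assert residual.shape == first_hr.shape
```

The per-pixel subtraction is elementwise, so the residual always has the first resolution and can be merged back by the corresponding addition at the receiving end.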
As a possible implementation, encoding the first low-resolution image may include: encoding the first low-resolution image in a lossy encoding manner. This reduces the bit rate needed to transmit the first low-resolution image, and thus further reduces the bit rate needed to transmit the high-resolution image between the sending end and the receiving end.
As a possible implementation, encoding the image residual may include: encoding the image residual in a lossy encoding manner. This reduces the bit rate needed to transmit the image residual, and thus further reduces the bit rate needed to transmit the high-resolution image between the sending end and the receiving end.
As a possible implementation, encoding the image residual may include: encoding the image residual in a lossless encoding manner. This avoids transmission loss of the image residual and ensures its quality, thereby guaranteeing the quality of the image recovered using the residual.
In a second aspect, an embodiment of the present application provides an image transmission method. The method may be applied to the sending end in a transmission system, or to a chip in the sending end. The method of this embodiment addresses the problem that the existing way of transmitting high-resolution video results in a large transmission loss. Specifically: a first high-resolution image is converted into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image; the first low-resolution image is encoded, for example in a lossy encoding manner, to obtain a first code stream; the first code stream is decoded to obtain a second low-resolution image; a second high-resolution image is reconstructed using the second low-resolution image, where the third resolution of the second high-resolution image is higher than the second resolution; an image residual between the first high-resolution image and the second high-resolution image is obtained; the image residual is encoded to obtain a second code stream; and the first code stream and the second code stream are sent.
Optionally, the method may be applied to transmitting video, and the first high-resolution image is a frame of the video.
In the image transmission method provided by this embodiment, by obtaining the second high-resolution image from the decoded second low-resolution image, the second high-resolution image obtained by the sending end covers the encoding and decoding loss that the first low-resolution image suffers during transmission, and so does the image residual obtained from the second high-resolution image and the first high-resolution image. After the receiving end obtains the second high-resolution image from the second low-resolution image and merges it with the decoded image residual, the loss incurred in transmitting the first low-resolution image is cancelled out, which reduces the loss of the third high-resolution image recovered by the receiving end and improves image quality. When the method is applied to transmitting high-resolution video, the image quality of the video is improved, improving the user experience.
As a possible implementation, reconstructing the second high-resolution image using the second low-resolution image may include: reconstructing the second high-resolution image from the second low-resolution image using a neural network.
As a possible implementation, when the first resolution and the third resolution are the same, obtaining the image residual between the first high-resolution image and the second high-resolution image may include: subtracting the pixel value of the second pixel, corresponding to the first pixel, in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual. Alternatively, when the first resolution is greater than the third resolution, obtaining the image residual may include: converting the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution, and subtracting the pixel value of the third pixel, corresponding to the first pixel, in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual. In this way, the image residual can be obtained in multiple ways, which broadens the application scenarios of the solution.
As a possible implementation, encoding the image residual may include: encoding the image residual in a lossy encoding manner. This reduces the bit rate needed to transmit the image residual, and thus further reduces the bit rate needed to transmit the high-resolution image between the sending end and the receiving end.
As a possible implementation, encoding the image residual may include: encoding the image residual in a lossless encoding manner. This avoids transmission loss of the image residual, further reduces the loss of the third high-resolution image recovered by the receiving end, and improves image quality.
Optionally, the method may be applied to transmitting video, and the first high-resolution image is a frame of the video.
In a third aspect, an embodiment of the present application provides an image transmission method. The method may be applied to the receiving end in a transmission system, or to a chip in the receiving end. Specifically: a first code stream and a second code stream are received; the first code stream is decoded to obtain a second low-resolution image; a second high-resolution image is reconstructed using the second low-resolution image, where the first resolution of the second high-resolution image is higher than the second resolution of the second low-resolution image, and the second high-resolution image includes the high-frequency information of a first high-resolution image and excludes the low-frequency information of the first high-resolution image; the second code stream is decoded to obtain an image residual between the first high-resolution image and the second high-resolution image, the image residual reflecting the low-frequency information; and the second high-resolution image is merged with the image residual to obtain a third high-resolution image.
It should be understood that the high-frequency information above is image detail information and the low-frequency information is image contour information. In the first high-resolution image, the image detail information changes more rapidly than the image contour information.
Optionally, the method may be applied to transmitting video, and the first high-resolution image is a frame of the video.
As a possible implementation, reconstructing the second high-resolution image using the second low-resolution image may include: reconstructing the second high-resolution image from the second low-resolution image using a neural network.
As a possible implementation, merging the second high-resolution image with the image residual to obtain the third high-resolution image may include: adding the pixel value of the second pixel in the second high-resolution image to the pixel residual, corresponding to the second pixel, in the image residual, to obtain the third pixel, corresponding to the second pixel, in the third high-resolution image.
For the beneficial effects of the image transmission methods provided by the third aspect and its possible implementations, reference may be made to the beneficial effects brought by the first aspect and its possible implementations, which are not repeated here.
In a fourth aspect, an embodiment of the present application provides a transmission apparatus. The transmission apparatus may include: a first processing module, a first encoding module, a second processing module, a third processing module, a second encoding module, and a sending module. Optionally, in some embodiments, the transmission apparatus may further include a decoding module.
The first processing module is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image. The first encoding module is configured to encode the first low-resolution image to obtain a first code stream. The second processing module is configured to obtain a second high-resolution image, where the third resolution of the second high-resolution image is higher than the second resolution, and the second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image. The third processing module is configured to obtain an image residual between the first high-resolution image and the second high-resolution image, the image residual reflecting the low-frequency information. The second encoding module is configured to encode the image residual to obtain a second code stream. The sending module is configured to send the first code stream and the second code stream.
It should be understood that the high-frequency information above is image detail information and the low-frequency information is image contour information. In the first high-resolution image, the image detail information changes more rapidly than the image contour information.
As a possible implementation, the decoding module is configured to decode the first code stream to obtain a second low-resolution image, and the second processing module is specifically configured to reconstruct the second high-resolution image using the second low-resolution image, for example to reconstruct the second high-resolution image from the second low-resolution image using a neural network.
As a possible implementation, the first resolution and the third resolution are the same, and the third processing module is specifically configured to subtract the pixel value of the second pixel, corresponding to the first pixel, in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual. Alternatively, the first resolution is greater than the third resolution, and the third processing module is specifically configured to convert the second high-resolution image into a fourth high-resolution image having the first resolution, and to subtract the pixel value of the third pixel, corresponding to the first pixel, in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual.
As a possible implementation, the first encoding module is specifically configured to encode the first low-resolution image in a lossy encoding manner.
As a possible implementation, the second encoding module is specifically configured to encode the image residual in a lossy encoding manner.
As a possible implementation, the second encoding module is specifically configured to encode the image residual in a lossless encoding manner.
As a possible implementation, the apparatus is applied to transmitting video, and the first high-resolution image is a frame of the video.
For the beneficial effects of the transmission apparatus provided by the fourth aspect and its possible implementations, reference may be made to the beneficial effects brought by the first aspect and its possible implementations, which are not repeated here.
In a fifth aspect, an embodiment of the present application provides a transmission apparatus. The transmission apparatus may include: a first processing module, a first encoding module, a decoding module, a second processing module, a third processing module, a second encoding module, and a sending module.
The first processing module is configured to convert a first high-resolution image into a first low-resolution image, where the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image. The first encoding module is configured to encode the first low-resolution image (for example, in a lossy encoding manner) to obtain a first code stream. The decoding module is configured to decode the first code stream to obtain a second low-resolution image. The second processing module is configured to reconstruct a second high-resolution image using the second low-resolution image, where the third resolution of the second high-resolution image is higher than the second resolution. The third processing module is configured to obtain an image residual between the first high-resolution image and the second high-resolution image. The second encoding module is configured to encode the image residual to obtain a second code stream. The sending module is configured to send the first code stream and the second code stream.
As a possible implementation, the second processing module is specifically configured to reconstruct the second high-resolution image from the second low-resolution image using a neural network.
As a possible implementation, the first resolution and the third resolution are the same, and the third processing module is specifically configured to subtract the pixel value of the second pixel, corresponding to the first pixel, in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual. Alternatively, the first resolution is greater than the third resolution, and the third processing module is specifically configured to convert the second high-resolution image into a fourth high-resolution image having the first resolution, and to subtract the pixel value of the third pixel, corresponding to the first pixel, in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual.
As a possible implementation, the second encoding module is specifically configured to encode the image residual in a lossy encoding manner.
As a possible implementation, the second encoding module is specifically configured to encode the image residual in a lossless encoding manner.
As a possible implementation, the apparatus is applied to transmitting video, and the first high-resolution image is a frame of the video.
For the beneficial effects of the transmission apparatus provided by the fifth aspect and its possible implementations, reference may be made to the beneficial effects brought by the second aspect and its possible implementations, which are not repeated here.
In a sixth aspect, an embodiment of the present application provides a transmission apparatus. The transmission apparatus may include: a receiving module, a first decoding module, a first processing module, a second decoding module, and a second processing module.
The receiving module is configured to receive a first code stream and a second code stream. The first decoding module is configured to decode the first code stream to obtain a second low-resolution image. The first processing module is configured to reconstruct a second high-resolution image using the second low-resolution image, where the first resolution of the second high-resolution image is higher than the second resolution of the second low-resolution image, and the second high-resolution image includes the high-frequency information of a first high-resolution image and excludes the low-frequency information of the first high-resolution image. The second decoding module is configured to decode the second code stream to obtain an image residual between the first high-resolution image and the second high-resolution image, the image residual reflecting the low-frequency information. The second processing module is configured to merge the second high-resolution image with the image residual to obtain a third high-resolution image.
It should be understood that the high-frequency information above is image detail information and the low-frequency information is image contour information. In the first high-resolution image, the image detail information changes more rapidly than the image contour information.
As a possible implementation, the first processing module is specifically configured to reconstruct the second high-resolution image from the second low-resolution image using a neural network.
As a possible implementation, the second processing module is specifically configured to add the pixel value of the second pixel in the second high-resolution image to the pixel residual, corresponding to the second pixel, in the image residual, to obtain the third pixel, corresponding to the second pixel, in the third high-resolution image.
As a possible implementation, the apparatus is applied to transmitting video, and the first high-resolution image is a frame of the video.
For the beneficial effects of the transmission apparatus provided by the sixth aspect and its possible implementations, reference may be made to the beneficial effects brought by the first aspect and its possible implementations, which are not repeated here.
In a seventh aspect, an embodiment of the present application provides a transmission device. The transmission device includes a processor system and a memory, where the memory is configured to store computer-executable program code, the program code including instructions; when the processor system executes the instructions, the instructions cause the transmission device to perform the method described in any one of the first to third aspects.
In an eighth aspect, an embodiment of the present application provides a transmission system. The transmission system may include the sending end described in the first aspect and the receiving end described in the third aspect; or the sending end described in the second aspect; or the transmission apparatus described in the fourth aspect and the transmission apparatus described in the sixth aspect; or the transmission apparatus described in the fifth aspect; or the transmission device described in the seventh aspect.
In a ninth aspect, an embodiment of the present application provides a chip on which a computer program is stored; when the computer program is executed by the chip, the method described in any one of the first to third aspects is implemented.
In a tenth aspect, an embodiment of the present application provides a transmission apparatus, including units, modules, or circuits configured to perform the method provided by the first aspect or any possible implementation of the first aspect, or by the second aspect or any possible implementation of the second aspect. The transmission apparatus may be a sending end, or a module applied to a sending end, for example a chip applied to the sending end.
In an eleventh aspect, an embodiment of the present application provides a transmission apparatus, including units, modules, or circuits configured to perform the method provided by the third aspect or any possible implementation of the third aspect. The transmission apparatus may be a receiving end, or a module applied to a receiving end, for example a chip applied to the receiving end.
In a twelfth aspect, an embodiment of the present application provides a computer-readable storage medium configured to store a computer program or instructions; when the computer program or the instructions are run on a computer, the computer is caused to perform the method described in any one of the first to third aspects.
In a thirteenth aspect, an embodiment of the present application provides a computer program product; when the computer program product is run on a computer, the computer is caused to perform the method described in any one of the first to third aspects.
In a fourteenth aspect, an embodiment of the present application provides a computer program product containing instructions; when the computer program product is run on a computer, the computer is caused to perform the method described in any one of the first to third aspects.
FIG. 1 is a schematic architectural diagram of a transmission system to which an embodiment of the present application is applied;
FIG. 2 is a schematic diagram of video transmission;
FIG. 2A is a schematic frequency-domain diagram of an image;
FIG. 3 is a schematic structural diagram of a transmission device provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of an image transmission method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an image provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another image provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of yet another image provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of obtaining a second high-resolution image provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a neural network model provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of video transmission provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of another image transmission method provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a transmission apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another transmission apparatus provided by an embodiment of the present application.
FIG. 1 is a schematic architectural diagram of a transmission system to which an embodiment of the present application is applied. As shown in FIG. 1, the transmission system may include a sending end and a receiving end, which are connected wirelessly or by wire. FIG. 1 is only a schematic diagram; the transmission system may further include other network devices between the sending end and the receiving end, for example relay devices and backhaul devices, which are not drawn in FIG. 1. The embodiments of the present application do not limit the number of sending ends and receiving ends included in the transmission system. For ease of description, the sending end and the receiving end are collectively referred to as transmission devices in the following.
A transmission device may be at a fixed location or may be movable. For example, a transmission device may be deployed on land, including indoors or outdoors, handheld, or vehicle-mounted; it may also be deployed on the water surface, or on aircraft, balloons, and artificial satellites in the air.
The transmission devices involved in the embodiments of the present application may be servers, terminal devices, and the like. A terminal device may also be called a terminal, user equipment (UE), mobile station (MS), mobile terminal (MT), and so on. The terminal device may be a mobile phone, a tablet computer (pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and the like.
The embodiments of the present application do not limit the application scenarios of the transmission system. For example, the transmission system may be applied to video transmission and/or other still-image transmission scenarios. Although the following description mainly takes video transmission and processing as an example, the related technical solutions can also be used for the transmission and processing of still images, which is not limited in this embodiment.
To facilitate understanding of how the transmission system transmits video, some concepts are first explained below.
A pixel is the unit of an image; one frame of image includes multiple pixels, and each pixel has a definite position and an assigned color value in the image. The positions and color values of all pixels determine how the image looks. In some embodiments, a pixel point may also be called a pixel; the embodiments of the present application do not distinguish between the two.
The data of each pixel (that is, the data reflecting its color value) depends on the format of the image. At present, common image formats include the RGB format and the YUV format. When the image is in RGB format, each pixel includes data of three channels, namely data representing the red, blue, and green sensitivities of the image. When the image is in YUV format, each pixel includes data of 1.5 channels, namely data representing the luminance of the image and data representing the chrominance of the image. It should be understood that the specific format of an image depends on how the image signal processor (ISP) of the sending end processes the image; for details, reference may be made to the prior art, which is not repeated here.
The number of pixels included in one frame of image is related to the resolution of the image. Taking an image resolution of 3840×2160 as an example, this resolution indicates that each row of the image has 3840 pixels horizontally and each column has 2160 pixels vertically. It should be understood that, for a fixed image size, the higher the resolution, the more pixels the image contains and the sharper the image is (in other words, the higher the image quality); correspondingly, the more data the image contains.
A video includes multiple consecutive still image frames; one frame is one image (which may also be called one picture). In other words, a video is a continuous sequence of images. The resolution of the images can therefore also be understood as the resolution of the video, and the format of the images as the format of the video; that is, a video includes consecutive still images of the same resolution and format. The higher the resolution of a video, the sharper the video and the larger its data volume. For ease of understanding, Table 1 below gives examples of some common video resolutions. It should be understood that Table 1 is only an illustration and does not limit the resolution of video.
Table 1
Generally speaking, whether a video resolution is high or low is a relative concept. Taking 2K video as an example, 2K video is high-resolution video relative to 1080p video, and low-resolution video relative to 4K video. In the embodiments of the present application, for ease of description, video with high resolution and a large data volume is called high-resolution video, for example 4K video and 8K video. The resolution of the high-resolution video involved in subsequent embodiments is therefore higher than that of the low-resolution video; the specific values refer to the above examples but are not limiting.
FIG. 2 is a schematic diagram of video transmission. As shown in FIG. 2, at present, a sending end and a receiving end usually transmit high-resolution video in the following way:
S201: The sending end encodes the high-resolution video HR to be transmitted, obtaining a code stream HR_str of the high-resolution video HR. The encoding manner used by the sending end may be a general-purpose one, for example the encoding manner of the H.264 protocol or the encoding manner of the H.265 protocol.
S202: The sending end sends the code stream HR_str to the receiving end, and the receiving end receives the code stream HR_str accordingly.
S203: The receiving end decodes the code stream HR_str to obtain the decoded high-resolution video HR_end; that is, HR_end is the high-resolution video obtained by the receiving end. The decoding manner used by the receiving end to decode the code stream HR_str corresponds to the encoding manner used by the sending end. For example, if the sending end uses the H.264 encoding manner, the receiving end may use the H.264 decoding manner; if the sending end uses the H.265 encoding manner, the receiving end may use the H.265 decoding manner.
The transmission bit rate of a video can be expressed by the following formula (1):
Transmission bit rate = size of each frame × number of frames / compression ratio (1)
Here, the transmission bit rate refers to the number of bits of video transmitted per second; it is the bit rate the sending end needs to use to send the video and the bit rate the receiving end needs to use to receive it. The number of frames is the number of video frames transmitted per second. The compression ratio is related to the encoding manner used and to the resolution of the video. The size of each frame can be expressed by the following formula (2):
Image size = resolution × number of channels × data bit width (2)
Here, the number of channels can be determined according to the format of the video: for example, when the video is in RGB format, the number of channels is 3; when the video is in YUV format, the number of channels is 1.5. The data bit width is the number of bits occupied by each data item of a pixel, usually 8 bits.
As an example, consider transmitting 4K video in the transmission manner shown in FIG. 2, assuming that 30 frames of the 4K video can be transmitted per second, where the resolution of the 4K video is 3840×2160, its format is YUV, and each data item of a pixel uses an 8-bit width. The sending end encodes the 4K video with the H.265 encoding manner; normally, the compression ratio of H.265 encoding for 4K video can reach 100.
In this example, by the calculation of formula (1), transmitting the 4K video requires the following transmission bit rate:
(3840×2160×1.5×8)×30/100 = 29859840
That is, when sending the 4K video, the sending end needs a transmission bit rate of 29859840 bits per second (bps), about 28.5 million bits per second (Mbps). Correspondingly, the receiving end also needs a transmission bit rate of 28.5 Mbps to receive the 4K video.
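The arithmetic above can be checked directly against formulas (1) and (2):

```python
# Formula (2): image size = resolution x number of channels x data bit width
width, height = 3840, 2160      # 4K resolution
channels = 1.5                  # YUV format
bit_depth = 8                   # bits per data item of a pixel

frame_bits = width * height * channels * bit_depth

# Formula (1): bit rate = frame size x frames per second / compression ratio
fps = 30
compression_ratio = 100         # typical for H.265 on 4K video, per the text

bitrate = frame_bits * fps / compression_ratio
print(int(bitrate))             # 29859840 bits per second
```

Dividing by 2^20 gives roughly 28.5, matching the "about 28.5 Mbps" figure in the text.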
This example shows that, because of the large data volume of high-resolution video, a transmission device needs to use a large transmission bit rate (for example, above 30 Mbps) to transmit it; that is, the transmission volume is large. However, limited by bandwidth, power consumption, and other factors, conventional transmission devices support transmission bit rates of 1 Mbps to 20 Mbps, which leads to problems such as large transmission delay, low transmission efficiency, poor image quality, and video stuttering for high-resolution video, and cannot satisfy application scenarios that transmit high-resolution video in real time, for example live streaming, watching high-definition video online, and high-definition video conferencing.
In addition, when a transmission device transmits high-resolution video in the manner shown in FIG. 2, it uses a general-purpose codec protocol. Because the encoding manner in a general-purpose codec protocol obtains a high compression ratio by sacrificing a certain amount of image quality, the high-resolution video recovered by decoding at the receiving end suffers a large loss, degrading the user experience at the receiving end. How to transmit high-resolution video is therefore a problem that needs to be solved urgently.
FIG. 2A is a schematic frequency-domain diagram of an image. As shown in FIG. 2A, studies of images show that one frame of image includes high-frequency information (usually image detail information) and low-frequency information (usually image contour information). After the image is transformed into the frequency domain, it can be seen that the frequency of the image detail information is higher than that of the image contour information and changes faster; that is, the image detail information changes more rapidly in the image than the image contour information does (see the comparison of FIG. 6 and FIG. 7 described later). In other words, the higher the frequency of image information, the faster it changes, and the lower the frequency, the more slowly it changes. Generally, the data volume reflecting image detail information (that is, high-frequency information) is larger than that reflecting image contour information (that is, low-frequency information). In some embodiments, the low-frequency information may also be called a low-frequency component, and the high-frequency information a high-frequency component. It should be noted that the high and low frequencies mentioned here are not fixed frequency bands but two relative frequencies within one frame of image: in one frame, the relatively high frequencies are called high frequencies and the relatively low frequencies are called low frequencies. The ranges of high and low frequencies may differ between images, as long as the frequency of the low-frequency information is lower than that of the high-frequency information in the frequency domain.
In view of this characteristic, an embodiment of the present application provides an image transmission method that decomposes a high-resolution image into a low-resolution image and an image residual reflecting the low-frequency information of the high-resolution image for transmission. Because the low-resolution image and the image residual carry relatively little data, the transmission bit rate for transmitting a high-definition image can be greatly reduced; that is, the transmission volume is reduced. When the method is applied to transmitting high-resolution video, the transmission bit rate of the video can likewise be greatly reduced, thereby satisfying application scenarios that transmit high-resolution video in real time. It should be understood that the image transmission method provided by the embodiments of the present application is not limited to the above high-definition image and/or high-resolution video transmission scenarios, and can also be applied to any other scenario in which images and/or video need to be transmitted, which is not elaborated here.
To facilitate understanding of the embodiments of the present application, a transmission device 100 involved in the embodiments is first described by way of example. FIG. 3 is a schematic structural diagram of a transmission device 100 provided by an embodiment of the present application. As shown in FIG. 3, the transmission device 100 includes a memory 101 and a processor system 102, which are communicatively connected to each other, for example through a network connection. Alternatively, the transmission device 100 may further include a bus 103, with the memory 101 and the processor system 102 communicatively connected to each other through the bus 103; FIG. 3 shows a transmission device 100 in which the memory 101 and the processor system 102 are connected through the bus 103. It should be understood that, when the transmission device 100 includes the bus 103, the bus 103 may include a path for transferring information between the components of the transmission device 100 (for example, the memory 101 and the processor system 102).
The memory 101 may include a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 101 may store a program; when the program stored in the memory 101 is executed by the processor system 102, the processor system 102 and the communication interface are used to perform the actions of the sending end, or the actions of the receiving end, in the image transmission method provided by the embodiments of the present application.
The processor system 102 may include various processors that perform the image transmission method provided by the embodiments of the present application, for example at least one of the following: a processor 31, an encoder 32, a decoder 33, an image signal processor (ISP) 34, a neural-network processing unit (NPU) 35, and a communication interface 36.
The processor 31 may be used, for example, for operations such as image resolution conversion and obtaining image residuals; the encoder 32 may be used, for example, to encode images; the decoder 33 may be used, for example, to decode images; the ISP 34 may be used, for example, to process image data to obtain one or more frames of images; the NPU may be used, for example, to implement the functions of a neural network; and the communication interface 36 may be used, for example, to implement communication between the transmission device 100 and other devices or communication networks. Optionally, in some embodiments, a graphics processing unit (GPU) may be used instead of the NPU to implement the functions of the neural network, or a CPU may be used instead of the NPU, and so on.
The processor 31 may include a general-purpose central processing unit (CPU), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
The processor 31 may also be an integrated circuit chip with signal processing capability. During implementation, the functions of the transmission device 100 of the present application may be completed by integrated logic circuits of hardware in the processor 31, or by instructions in the form of software. The processor 31 may also include a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the following embodiments of the present application. The general-purpose processor may be a microprocessor, or any conventional processor such as the CPU, microprocessor, or microcontroller mentioned above. The steps of the methods disclosed in the following embodiments of the present application may be directly embodied as being completed by the processor system, or completed by a combination of hardware and software modules in the processor system. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 101; the processor 31 reads the information in the memory 101 and completes the functions of the transmission device 100 of the embodiments of the present application in combination with its hardware.
The communication interface 36 uses a transceiver module, such as but not limited to a transceiver, to implement communication between the transmission device 100 and other devices or communication networks; for example, images may be transmitted through the communication interface 36. Optionally, in some embodiments, the transmission device 100 may further include a camera sensor 104 and a power supply 105, which is not limited in this embodiment.
The technical solutions of the embodiments of the present application are described in detail below with reference to specific embodiments, taking the transmission device shown in FIG. 3 as both the sending end and the receiving end; that is, both the sending end and the receiving end may adopt the device shown in FIG. 3, although the two devices need not be completely identical and local designs may vary slightly without affecting the implementation. The following specific embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments.
FIG. 4 is a schematic flowchart of an image transmission method provided by an embodiment of the present application. This embodiment concerns the process of decomposing an original high-resolution image to be transmitted into a low-resolution image and an image residual reflecting the low-frequency information of the high-resolution image, and transmitting them. As shown in FIG. 4, the method may include: S401: The sending end converts a first high-resolution image into a first low-resolution image. The first high-resolution image is the original high-resolution image to be transmitted, and its first resolution may be, for example, 4K or 8K. Optionally, when the method is applied to transmitting video, the first high-resolution image may be a frame of the video; in this case, the first resolution of the first high-resolution image may also be called the resolution of the video.
In this embodiment, the sending end converts the resolution of the first high-resolution image to obtain the first low-resolution image; that is, the first resolution of the first high-resolution image is higher than the second resolution of the first low-resolution image. For example, the second resolution of the first low-resolution image may be 720p, 480p, CIF, or the like. For example, the sending end may convert the resolution of the first high-resolution image through its CPU 31 to obtain the first low-resolution image.
This embodiment does not limit the implementation by which the sending end converts the first high-resolution image into the first low-resolution image. Taking the conversion of a first high-resolution image with a resolution of 3840×2160 into a first low-resolution image with a resolution of 1280×720 as an example, the sending end may, for example, convert the first high-resolution image into the first low-resolution image in the manner shown in Table 2 below:
Table 2
It should be understood that Table 2 is only an illustration. In specific implementations, the sending end may also convert the first high-resolution image into the first low-resolution image in any other manner capable of resolution conversion, for example distribution-based sampling, Markov chain Monte Carlo sampling, or Gibbs sampling, which is not elaborated here.
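As one concrete possibility for this conversion (the contents of Table 2 are not reproduced here), note that 3840×2160 maps onto 1280×720 with an integer factor of 3 in each dimension, so simple block averaging suffices. The following is only an illustrative choice, not the method mandated by the text:

```python
import numpy as np

def block_average_downsample(img, factor=3):
    # Average each factor x factor block of pixels into one output pixel.
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

hr = np.ones((2160, 3840))          # first high-resolution image (one channel)
lr = block_average_downsample(hr)   # first low-resolution image
assert lr.shape == (720, 1280)
```

For a YUV or RGB image, the same operation would be applied per channel; non-integer scale factors would call for interpolation instead of block averaging.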
The first high-resolution image includes high-frequency information (usually image detail information) and low-frequency information (usually image contour information), where the image detail information changes more rapidly than the image contour information. The first high-resolution image is illustrated below with an example. FIG. 5 is a schematic diagram of an image provided by an embodiment of the present application. The image shown in FIG. 5 is the first high-resolution image (that is, the original high-resolution image), which clearly shows the detail information of an animal as well as the contour information of the animal.
S402: The sending end encodes the first low-resolution image to obtain a first code stream. For example, the sending end may encode the first low-resolution image using the encoding manner of a general-purpose codec protocol, for example any of the encoding manner of the H.264 protocol, the encoding manner of the H.265 protocol (also called high efficiency video coding (HEVC)), the VP8 protocol, the VP9 protocol, or the RV40 protocol. It should be understood that the encoding manner of a general-purpose codec protocol mentioned here may also be called a lossy encoding manner, that is, an encoding manner that obtains a high compression ratio by sacrificing a certain amount of image quality; for example, the compression ratio of a lossy encoding manner may be 50, 100, 500, or the like. For example, the sending end may encode the first low-resolution image through its encoder 32 to obtain the first code stream.
S403: The sending end obtains a second high-resolution image. The second high-resolution image includes the high-frequency information of the first high-resolution image and excludes the low-frequency information of the first high-resolution image, and the third resolution of the second high-resolution image is higher than the second resolution.
In this embodiment, the second high-resolution image includes the high-frequency information of the first high-resolution image, in other words, the rapidly changing detail information of the first high-resolution image, or its high-frequency component. Continuing the example of FIG. 5, FIG. 6 is a schematic diagram of another image provided by an embodiment of the present application. The image shown in FIG. 6 is the second high-resolution image corresponding to the first high-resolution image shown in FIG. 5; comparing FIG. 5 and FIG. 6, the second high-resolution image in FIG. 6 mainly includes the detail information of the animal.
Optionally, the sending end may obtain the second high-resolution image from the first low-resolution image. Alternatively, the sending end may first encode and then decode the first low-resolution image to obtain a second low-resolution image, and then use the second low-resolution image to obtain the second high-resolution image. Alternatively, the sending end may obtain the second high-resolution image from the first high-resolution image, and so on.
S404: The sending end obtains an image residual between the first high-resolution image and the second high-resolution image, where the image residual reflects the low-frequency information of the first high-resolution image, in other words, the image contour information of the first high-resolution image, or its low-frequency component. Continuing the example of FIG. 5, FIG. 7 is a schematic diagram of yet another image provided by an embodiment of the present application. The image shown in FIG. 7 is the image residual. Comparing FIG. 5, FIG. 6, and FIG. 7, the first high-resolution image includes the detail information of the animal as well as its contour information; the second high-resolution image mainly includes the detail information of the animal (that is, the high-frequency information); and the image residual reflects the contour information of the animal (that is, the low-frequency information). The texture of the detail information in FIG. 6 changes rapidly across the image, whereas the texture of the contour information in FIG. 7 changes slowly.
For example, when the first resolution of the first high-resolution image and the third resolution of the second high-resolution image are the same, the sending end may subtract the pixel value of the second pixel, corresponding to the first pixel, in the second high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the first pixel residual in the image residual. For example, the sending end may obtain the image residual between the first and second high-resolution images through its processor 31.
S405: The sending end encodes the image residual to obtain a second code stream. For example, the sending end may encode the image residual using the encoding manner of a general-purpose codec protocol, that is, in a lossy encoding manner. The sending end may also encode the image residual using a separate encoding manner, for example entropy coding or an auto-encoder network. The separate encoding manner mentioned here uses a low compression ratio, and its loss of image quality is negligible; for example, its compression ratio may be between 0.5 and 2. In some embodiments, this encoding manner may also be called lossless encoding, that is, encoding the image residual in a lossless encoding manner. For example, the sending end may encode the image residual through its encoder 32 to obtain the second code stream.
S406: The sending end sends the first code stream and the second code stream, and the receiving end receives them accordingly. For example, the sending end may send the first code stream and the second code stream from its communication interface 36 to the communication interface 36 of the receiving end, so that the receiving end receives the first and second code streams.
S407: The receiving end decodes the first code stream to obtain a second low-resolution image. The decoding manner used by the receiving end corresponds to the encoding manner the sending end used for the first low-resolution image: whatever protocol the sending end used for encoding, the receiving end must use the decoding manner of that protocol to decode the first code stream, which is not elaborated here. For example, if the sending end encoded the first low-resolution image with the VP8 protocol, the receiving end may decode the first code stream with the VP8 protocol. The second low-resolution image obtained by decoding the first code stream has the second resolution; that is, the resolution of the second low-resolution image is the same as that of the first low-resolution image. For example, the receiving end may decode the first code stream through its decoder 33 to obtain the second low-resolution image.
S408: The receiving end reconstructs a second high-resolution image using the second low-resolution image. The manner in which the receiving end reconstructs the second high-resolution image is related to how the sending end obtains the second high-resolution image; this is described in detail below.
S409: The receiving end decodes the second code stream to obtain the image residual between the first high-resolution image and the second high-resolution image. The decoding manner used by the receiving end corresponds to the encoding manner the sending end used for the image residual: whatever protocol the sending end used for encoding the residual, the receiving end must use the decoding manner of that protocol to decode the second code stream, which is not elaborated here. For example, the receiving end may decode the second code stream through its decoder 33 to obtain the image residual.
S410: The receiving end merges the second high-resolution image with the image residual to obtain a third high-resolution image. For example, the receiving end adds the pixel value of the second pixel in the second high-resolution image to the pixel residual, corresponding to the second pixel, in the image residual, to obtain the third pixel, corresponding to the second pixel, in the third high-resolution image. Combining the third pixels yields the third high-resolution image, which is the high-resolution image obtained by the receiving end.
It should be understood that the above description takes the case where the first resolution of the first high-resolution image and the third resolution of the second high-resolution image are the same as an example to illustrate how the sending end obtains the image residual and how the receiving end uses the image residual to obtain the third high-resolution image.
In some embodiments, the first resolution of the first high-resolution image and the third resolution of the second high-resolution image may differ; for example, the first resolution may be greater than the third resolution, or smaller than it. In that case, the sending end may first convert the two high-resolution images to the same resolution and then perform the subtraction.
For example, the sending end may first convert the second high-resolution image into a fourth high-resolution image, where the fourth high-resolution image has the first resolution; that is, the resolution of the first high-resolution image is the same as that of the fourth high-resolution image. The sending end then subtracts the pixel value of the third pixel, corresponding to the first pixel, in the fourth high-resolution image from the pixel value of the first pixel in the first high-resolution image, to obtain the second pixel residual in the image residual.
As another example, the first high-resolution image may be converted into a fifth high-resolution image, where the fifth high-resolution image has the third resolution; that is, the resolution of the fifth high-resolution image is the same as that of the second high-resolution image. The sending end then subtracts the pixel value of the second pixel, corresponding to the fourth pixel, in the second high-resolution image from the pixel value of the fourth pixel in the fifth high-resolution image, to obtain the second pixel residual in the image residual. Correspondingly, after the receiving end reconstructs the second high-resolution image and decodes the second code stream to obtain the image residual, it may first merge the second high-resolution image with the image residual to obtain the fifth high-resolution image, and then convert the resolution of the fifth high-resolution image to obtain the third high-resolution image. At this point, the sending end has transmitted the first high-resolution image to the receiving end.
For example, the receiving end may merge the second high-resolution image with the image residual through its processor 31 to obtain the third high-resolution image. If image encoding loss is not considered, the third high-resolution image may be identical to the first high-resolution image at the sending end. If encoding introduces loss, the third high-resolution image differs slightly from the first high-resolution image, and the recovered third high-resolution image can be considered approximately the same as the first high-resolution image.
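The loss-cancellation point above can be demonstrated numerically: if the sender computes the residual against the reconstruction built from the *decoded* (lossy) low-resolution image, and the residual itself is carried losslessly, the receiver's merge in S410 cancels the codec loss entirely. The following sketch uses uniform quantization as a hypothetical stand-in for a lossy codec and nearest-neighbour upsampling as a stand-in for the reconstruction:

```python
import numpy as np

def lossy_roundtrip(img, step=0.1):
    # Stand-in for lossy encode + decode: uniform quantization.
    return np.round(img / step) * step

def reconstruct(lr, factor=2):
    # Stand-in for the neural-network reconstruction (nearest-neighbour).
    return np.repeat(np.repeat(lr, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(1)
first_hr = rng.random((8, 8))
first_lr = first_hr.reshape(4, 2, 4, 2).mean(axis=(1, 3))

second_lr = lossy_roundtrip(first_lr)      # what the receiver will also see
second_hr = reconstruct(second_lr)         # sender-side reconstruction
residual = first_hr - second_hr            # carried losslessly

# Receiver: same decode + same reconstruction + merge (S410).
third_hr = reconstruct(lossy_roundtrip(first_lr)) + residual
assert np.allclose(third_hr, first_hr)     # codec loss cancelled exactly
```

Had the sender computed the residual against a reconstruction of the *original* low-resolution image instead, the quantization error would survive in the receiver's output.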
In the image transmission method provided by this embodiment, the sending end may decompose the high-resolution image to be transmitted into a low-resolution image and an image residual that reflects the low-frequency information of the high-resolution image, and transmit both. The data volume of an image residual that reflects only the low-frequency information of the high-resolution image is far smaller than that of an image residual that reflects both the low-frequency and the high-frequency information. The bit rate required to transmit the residual of this embodiment is therefore greatly reduced compared with transmitting a residual that carries both low-frequency and high-frequency information. In addition, the bit rate required to transmit the low-resolution image is also far lower than that required to transmit the high-resolution image. Consequently, the sending end and the receiving end can transmit a high-resolution image at a relatively low bit rate. When the method is applied to transmitting high-resolution video, the video can likewise be transmitted at a relatively low bit rate that most transmission devices support, thereby satisfying application scenarios that transmit high-resolution video in real time.
How the sending end obtains the second high-resolution image, and how the receiving end reconstructs it using the second low-resolution image, are described below. FIG. 8 is a schematic flowchart of obtaining a second high-resolution image provided by an embodiment of the present application. This embodiment concerns the sending end reconstructing the second high-resolution image using the second low-resolution image. As shown in FIG. 8, step S403 above may include: S801: The sending end decodes the first code stream to obtain the second low-resolution image. The decoding manner used by the sending end corresponds to the encoding manner it used for the first low-resolution image: whatever protocol was used for encoding, the decoding manner of that protocol must be used to decode the first code stream, which is not elaborated here. For example, the decoder 33 of the sending end may decode the first code stream to obtain the second low-resolution image.
S802: The sending end reconstructs the second high-resolution image using the second low-resolution image. Correspondingly, the receiving end may reconstruct the second high-resolution image from the second low-resolution image in the same manner in S408. By obtaining the second high-resolution image from the decoded second low-resolution image, the second high-resolution image obtained by the sending end covers the encoding and decoding loss that the first low-resolution image suffers during transmission, and the image residual obtained from the second and first high-resolution images also covers that loss. In this way, after the receiving end obtains the second high-resolution image from the second low-resolution image and merges it with the decoded image residual, the loss incurred in transmitting the first low-resolution image is cancelled out; only the transmission loss of the image residual remains on the path, which reduces the loss of the third high-resolution image recovered by the receiving end (for example, only the loss of the image residual remains), or even eliminates loss entirely (for example, when the image residual is encoded losslessly), improving image quality.
Optionally, the sending end may reconstruct the second high-resolution image from the second low-resolution image in the following ways.
First way: the sending end reconstructs the second high-resolution image from the second low-resolution image using a neural network. Correspondingly, the receiving end may also reconstruct the second high-resolution image using a neural network; that is, the sending end and the receiving end deploy the same neural network, input the second low-resolution image into that network, and obtain the second high-resolution image it outputs.
A neural network consists of two parts: the network structure and the parameters of the convolution kernels. The network structure mentioned here may include, for example, at least one of the following: the number of network layers, the number of convolution kernels per layer, the size of each convolution kernel, and the connections between layers. The parameters of the convolution kernels constrain the operations the kernels perform. The neural network involved in the embodiments of the present application can therefore be realized in two steps: step 1, design the network structure; step 2, train the neural network to obtain the parameters of the convolution kernels.
FIG. 9 is a schematic diagram of a neural network model provided by an embodiment of the present application. As shown in FIG. 9, the network structure can be determined according to the task objective the neural network is to achieve. The task objective of the neural network in this embodiment is to construct, from a low-resolution image, a high-resolution image covering the image detail information. That is, the neural network in this embodiment needs two functions: one is extracting image detail information, the other is resolution conversion. The network architecture may therefore include two parts: network layers for extracting image detail information (extraction layers for short), and network layers for converting low resolution to high resolution (conversion layers for short). It should be understood that both the extraction layers and the conversion layers may include at least one convolution layer: the extraction layers may be implemented with 1 to N convolution layers, and the conversion layers with 1 to M deconvolution layers, where N and M are both integers greater than or equal to 2.
Although FIG. 9 shows the extraction layers before the conversion layers, it should be understood that the positions of the two parts in the neural network are not limited. For example, the extraction layers may be located after or before the conversion layers; that is, the neural network may first extract the image detail information and then perform the low-to-high resolution conversion, or first perform the conversion and then extract the detail information, or perform part of the detail extraction, then the resolution conversion, and then the remaining detail extraction.
As an example, taking a neural network applied to images in RGB format, two examples of implementing the neural network of the embodiments of the present application are given below:
Table 3
In the table above, length × width represents the size of the convolution kernels used by the layer. The size of the input channels of each layer can be determined from the number of convolution kernels of the previous layer, and each input channel of the first layer corresponds to one channel of the RGB-format image.
Table 4
As Tables 3 and 4 show, the neural network of Table 3 first extracts the image detail information and then performs the low-to-high resolution conversion, while the neural network of Table 4 performs part of the detail extraction, then the resolution conversion, and then the remaining detail extraction.
Although both network structures can reconstruct the second high-resolution image from the second low-resolution image, in Table 3 the detail extraction operates on the low-resolution image, whereas in Table 4 part of the detail extraction operates on the high-resolution image. Because the data volume of a high-resolution image is larger than that of a low-resolution image, the network of Table 4 consumes more computation than the network of Table 3 when reconstructing the second high-resolution image, but the extracted detail information is more accurate. In specific implementations, the architecture of the neural network can therefore be chosen according to actual needs. It can be understood that, although the above example concerns a neural network applied to RGB-format images, the embodiments of the present application do not limit the structure of neural networks applied to images of any particular format.
After the architecture of the neural network is determined, the neural network may, for example, be trained as follows to obtain the parameters of the convolution kernels. First, a training set for the neural network is constructed. The training set may include S groups of samples, each group including one input image X and one label image Y, where the input image X is a low-resolution image and the label image Y is the target output the neural network should produce when that low-resolution image is input, and S is an integer greater than or equal to 1.
The S groups of samples can be obtained from S high-resolution images. Specifically, taking the i-th high-resolution image as an example, the i-th high-resolution image may be resolution-converted to obtain its corresponding low-resolution image, which is the input image Xi, where i is an integer greater than 0 and less than or equal to S.
Meanwhile, the i-th high-resolution image may be transformed into the frequency domain (for example, by the Fourier transform, the wavelet transform, or another spatial-to-frequency-domain method) to obtain its frequency-domain information. The frequency-domain information of the i-th high-resolution image may then be filtered (for example, with a high-pass filter) to obtain the high-frequency information of the i-th high-resolution image. Finally, the high-frequency information of the i-th high-resolution image is converted into an image, for example by a frequency-to-spatial-domain method such as the inverse Fourier transform. The resulting image is the label image Yi of the input image Xi. The frequency of the above high-frequency information is greater than or equal to a preset frequency threshold; it should be understood that the high-frequency information here is the image detail information, and the preset frequency threshold can be determined according to actual needs.
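The label-generation step just described (frequency-domain transform, high-pass extraction, inverse transform) can be sketched with numpy's FFT. The cutoff radius here is an assumption standing in for the unspecified preset frequency threshold:

```python
import numpy as np

def high_frequency_label(img, cutoff=4):
    # Fourier transform, zero out frequencies at or below the cutoff radius
    # (a hypothetical preset frequency threshold), then transform back.
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    f[radius <= cutoff] = 0                  # high-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

rng = np.random.default_rng(2)
img = rng.random((32, 32))
label = high_frequency_label(img)
# Zeroing the DC bin removes the mean brightness, leaving only detail.
assert abs(label.mean()) < 1e-9
```

A wavelet transform or a spatial high-pass kernel would serve equally well; the text leaves the choice of transform and threshold open.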
其次,构建该神经网络的目标函数。由于训练该神经网络的目的是通过不断调整和优化各卷积核的参数,使神经网络根据输入的输入图像X所输出的图像Y’与标签图像Y尽可能相似,因此,该神经网络的目标函数例如可以定义为min||Y-Y’||₂。
最后,使用构建好的训练集和目标函数训练该神经网络,直至该目标函数的取值小于或等于预设阈值。关于如何使用构建好的训练集和目标函数训练该神经网络可以参见现有技术,对此不再赘述。此时,该训练好的神经网络模型即可应用于上述发送端和接收端,以利用第二低分辨率图像重构第二高分辨率图像。
应理解,上述示例所描述的神经网络模型,以及,训练神经网络模型的方式仅是一种示例。由于图像的格式众多,此处不再一一列举应用于各种格式的图像的神经网络。具体实现时,本领域技术人员可以根据所需处理的图像的格式,以及,该神经网络所需实现的任务目标“使用低分辨率图像构建涵盖图像细节信息的高分辨率图像”,进行相应神经网络的构建和训练,以实现上述功能,对此不再赘述。
另外,虽然上述实施例均以神经网络模型为例,对如何利用第二低分辨率图像重构第二高分辨率图像进行了示例说明。但是,本领域技术人员可以理解的是,也可以采用其他人工智能(artificial intelligence,AI)模型替代上述神经网络,实现利用第二低分辨率图像重构第二高分辨率图像的功能,对此不再赘述。本实施例中涉及的神经网络包括但不限于卷积神经网络。
示例性的,发送端可以通过发送端的NPU35,利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。或者,发送端可以通过发送端的GPU(图3未示出),利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。或者,发送端可以通过发送端的处理器31,利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。或者,发送端可以通过发送端的处理器31和GPU,利用第二低分辨率图像,使用神经网络重构第二高分辨率图像等。相应地,接收端也可以采用类似的方式实现,对此不再赘述。
第二种方式:发送端可以对第二低分辨率图像进行分辨率转换,得到一个高分辨率图像(例如第六高分辨率图像)。然后,发送端可以对该第六高分辨率图像进行频域转换,以得到该第六高分辨率图像的频域信息。然后,发送端可以对该第六高分辨率图像的频域信息进行提取,以得到该第六高分辨率图像的高频信息。最后,发送端可以将该第六高分辨率图像的高频信息转换为第二高分辨率图像。关于如何实现频域转换、频域信息的提取,以及如何将高频信息转换为图像,可以参见前述构建训练神经网络的样本集中的内容,对此不再赘述。相应地,接收端也可以采用该方式,利用第二低分辨率图像重构第二高分辨率图像。
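第二种方式的处理流程可以用如下示意性代码粗略说明(其中最近邻上采样和仅滤除直流分量的高通滤波均为简化假设,实际可采用其他分辨率转换和滤波方式):

```python
import numpy as np

# 示意性示例:第二种方式的流程草图——先将第二低分辨率图像上采样为
# 第六高分辨率图像(此处假设用最近邻插值),再经频域高通得到第二高分辨率图像。
def upscale_nearest(img, factor=2):
    return np.kron(img, np.ones((factor, factor)))

def extract_high_freq(img):
    f = np.fft.fft2(img)
    f[0, 0] = 0                     # 仅滤除直流分量,作为高通滤波的最简假设
    return np.real(np.fft.ifft2(f))

lr = np.arange(16.0).reshape(4, 4)      # 假设的第二低分辨率图像
hr6 = upscale_nearest(lr)               # 第六高分辨率图像,8x8
hr2 = extract_high_freq(hr6)            # 第二高分辨率图像(只保留高频信息)
print(hr2.shape, abs(hr2.mean()) < 1e-9)
```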
示例性的,发送端可以通过发送端的GPU(图3未示出),使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31,使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31和GPU,使用该方式获取第二高分辨率图像等。相应地,接收端也可以采用类似的方式实现,对此不再赘述。
应理解,上述示例仅是对利用第二低分辨率图像重构第二高分辨率图像的示意。本领域技术人员也可以利用第二低分辨率图像,采用其他方式重构第二高分辨率图像,对此不进行限定。
可选的,在一些实施例中,上述发送端也可以直接使用第一低分辨率图像重构第二高分辨率图像。例如,发送端可以将第一低分辨率图像输入神经网络,获取第二高分辨率图像。相应地,接收端可以将第二低分辨率图像输入神经网络,获取第二高分辨率图像。示例性的,发送端可以通过发送端的GPU(图3未示出),使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31,使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31和GPU,使用该方式获取第二高分辨率图像等。相应地,接收端也可以采用类似的方式实现,对此不再赘述。
或者,发送端可以先将第一低分辨率图像转换为第七高分辨率图像,并对该第七高分辨率图像进行高频信息提取,以获取第二高分辨率图像。相应地,接收端可以将第二低分辨率图像转换为第八高分辨率图像,并对该第八高分辨率图像进行高频信息提取,以获取第二高分辨率图像。示例性的,发送端可以通过发送端的GPU(图3未示出),使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31,使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31和GPU,使用该方式获取第二高分辨率图像等。相应地,接收端也可以采用类似的方式实现,对此不再赘述。
在另一些实施例中,上述发送端还可以使用第一高分辨率图像重构第二高分辨率图像。例如,可以对第一高分辨率图像进行高频信息提取,以获取第二高分辨率图像。相应地,接收端可以将第二低分辨率图像转换为第七高分辨率图像,并对该第七高分辨率图像进行高频信息提取,以获取第二高分辨率图像。示例性的,发送端可以通过发送端的GPU(图3未示出),使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31,使用该方式获取第二高分辨率图像。或者,发送端可以通过发送端的处理器31和GPU,使用该方式获取第二高分辨率图像等。相应地,接收端也可以采用类似的方式实现,对此不再赘述。
下面以将本申请实施例提供的图像传输方法应用于传输视频为例,对如何传输视频进行示例说明:图10为本申请实施例提供的一种视频传输的流程示意图。如图10所示,该方法包括:(1)发送端将待传输的高分辨率视频HR转换为低分辨率视频LR。其中,原始高分辨率视频HR的分辨率高于低分辨率视频LR的分辨率。该待传输的高分辨率视频HR也可以称为原始高分辨率视频,该待传输的高分辨率视频HR的分辨率例如可以为4k或8k等。
(2)发送端对低分辨率视频LR进行编码,得到低分辨率视频LR的码流LR_str。例如,发送端使用通用编解码协议对低分辨率视频LR进行编码。
(3)发送端对低分辨率视频LR的码流LR_str进行解码得到解码后的低分辨率视频LR’。其中,解码后的低分辨率视频LR’与低分辨率视频LR的分辨率相同。
(4)发送端使用解码后的低分辨率视频LR’作为输入,输入至神经网络,得到神经网络输出的高分辨率视频HR_hf。其中,该高分辨率视频HR_hf中包括高分辨率视频HR中各图像的高频信息,且排除了高分辨率视频HR中各图像的低频信息。
(5)发送端获取高分辨率视频HR和高分辨率视频HR_hf之间的视频残差Res_lf。其中,视频残差Res_lf用于反映高分辨率视频HR中各图像的低频信息。例如,高分辨率视频HR和高分辨率视频HR_hf分辨率相同,发送端可以将高分辨率视频HR中各图像的第一像素点的像素值,与高分辨率视频HR_hf中对应于第一像素点的第二像素点的像素值相减,以得到视频残差Res_lf中的第一像素点残差。
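步骤(5)中逐像素相减得到视频残差的过程,可以用如下示意性代码说明(其中像素值均为假设数据):

```python
import numpy as np

# 示意性示例:两视频帧分辨率相同时,按像素相减得到视频残差 Res_lf。
hr = np.array([[120, 130], [140, 150]], dtype=np.int16)     # 假设的 HR 帧局部
hr_hf = np.array([[20, 25], [30, 35]], dtype=np.int16)      # 假设的 HR_hf 帧局部
res_lf = hr - hr_hf    # 第一像素点的像素值减去对应第二像素点的像素值
print(res_lf)  # [[100 105] [110 115]]:反映各图像的低频信息
```

注意此处使用带符号的 int16 存放残差,以避免无符号像素类型相减时的回绕。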
(6)发送端对视频残差Res_lf进行编码,得到视频残差Res_lf的码流Res_str。例如,发送端使用通用编解码协议对视频残差Res_lf进行编码。或者,发送端使用单独的编码方式对视频残差Res_lf进行编码。例如,熵编码、自编码网络等。
(7)发送端发送低分辨率视频LR的码流LR_str和视频残差Res_lf的码流Res_str。相应地,接收端接收该码流LR_str和码流Res_str。
(8)接收端对码流LR_str进行解码,得到解码后的低分辨率视频LR’。接收端解码码流LR_str时所使用的解码方式,与发送端对低分辨率视频LR编码时所采用的编码方式相关。即,发送端采用何种协议的编码方式对低分辨率视频LR进行编码,则接收端需对应地采用该协议的解码方式对码流LR_str进行解码,对此不再赘述。
(9)接收端使用解码后的低分辨率视频LR’作为输入,输入至神经网络,得到神经网络输出的高分辨率视频HR_hf。应理解,接收端部署的神经网络与发送端部署的神经网络相同,或功能上至少近似相同。
(10)接收端对码流Res_str进行解码,得到解码后的视频残差Res_lf’。接收端解码码流Res_str时所使用的解码方式,与发送端对视频残差Res_lf编码时所采用的编码方式相关。即,发送端采用何种协议的编码方式对视频残差Res_lf进行编码,则接收端需对应地采用该协议的解码方式对码流Res_str进行解码,对此不再赘述。
(11)接收端将高分辨率视频HR_hf和解码后的视频残差Res_lf’合并,恢复出第三高分辨率视频HR_end。例如,接收端将高分辨率视频HR_hf中第二像素点的像素值,与视频残差Res_lf’中对应于第二像素点的像素点残差相加,以得到第三高分辨率视频HR_end中对应于第二像素点的第三像素点。
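步骤(11)中逐像素相加恢复高分辨率视频的过程,可以用如下示意性代码说明(像素值为假设数据;在残差无损传输的简化假设下,恢复结果与原始帧完全一致):

```python
import numpy as np

# 示意性示例:接收端将 HR_hf 与解码后的残差逐像素相加,恢复 HR_end。
hr = np.array([[120, 130], [140, 150]], dtype=np.int16)   # 发送端的原始 HR 帧局部
hr_hf = np.array([[20, 25], [30, 35]], dtype=np.int16)    # 两端一致的 HR_hf 帧局部
res = hr - hr_hf                 # 发送端计算的残差(假设残差无损传输)
hr_end = hr_hf + res             # 第二像素点像素值加上对应的像素点残差
print(np.array_equal(hr_end, hr))  # True:残差修正后恢复出原始帧
```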
虽然上述均以高分辨率视频HR和高分辨率视频HR_hf分辨率相同为例,对发送端如何获取视频残差Res_lf,以及接收端如何使用该视频残差Res_lf恢复出高分辨率视频HR_end进行了示例说明。但是应理解,当高分辨率视频HR和高分辨率视频HR_hf分辨率不同时,发送端可以对其中一个视频的分辨率进行转换后,再获取视频残差Res_lf。相应地,接收端基于该解码后的视频残差Res_lf’恢复出高分辨率视频HR_end时,也需执行对应的分辨率转换操作。具体可以参见前述关于第一高分辨率图像的第一分辨率和第二高分辨率图像的第三分辨率不同时的描述,其实现方式与原理类似,对此不再赘述。
另外,上述图10所示的示例以利用神经网络重构高分辨率视频HR_hf为例,对如何传输高分辨率视频进行了示例说明。应理解,该部分内容也可以采用前述所说“利用第二低分辨率图像重构第二高分辨率图像”的第二种方式实现,其实现方式与原理类似,对此不再赘述。
同样以传输4k视频为例,在该示例下,假定发送端对该4k视频处理得到的是720p视频,以及4k视频残差C#。其中,4k视频残差C#用于反映4k视频的各图像的图像轮廓信息(即用于反映4k视频的各图像的低频信息)。相比于用于反映4k视频的各图像的低频信息和高频信息的4k视频残差,4k视频残差C#所包括的数据量很少。因此,在对本实施例中的4k视频残差C#进行编码时,可以采用较大的压缩比,例如,该压缩比可达500倍。
则在该示例下,通过公式(1)的计算方式可知,传输该720p视频B需要采用如下传输码率:
(1280×720×1.5×8)×30/50/1024/1024=6.3Mbps
传输该4k视频残差C#所需的传输码率为:
(3840×2160×1.5×8)×30/500/1024/1024=5.7Mbps
两个传输码率相加为12Mbps。也就是说,在采用图10所示的方案传输该4k视频时,发送端的传输码率需要达到12Mbps。相应地,接收端接收该4k视频时,也需要使用12Mbps的传输码率。
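上述两个传输码率可以按公式(1)的计算方式复算如下(其中1.5对应YUV420格式每像素的字节数、8为字节转比特、30为假设帧率、50和500为两路各自假设的压缩比):

```python
# 示意性示例:按正文公式 (1) 复算两路码流的传输码率(单位 Mbps)。
# 各参数取值沿用正文示例中的假设,并非固定要求。

def bitrate_mbps(width, height, fps=30, ratio=1):
    # 宽 x 高 x 1.5(YUV420 每像素字节数)x 8(比特)x 帧率 / 压缩比 / 1024 / 1024
    return width * height * 1.5 * 8 * fps / ratio / 1024 / 1024

r_720p = bitrate_mbps(1280, 720, ratio=50)    # 720p 视频码流
r_res = bitrate_mbps(3840, 2160, ratio=500)   # 4k 视频残差 C# 码流
print(round(r_720p, 1), round(r_res, 1), round(r_720p + r_res))  # 6.3 5.7 12
```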
通过该示例可以看出,在通过图10所示的方式进行高分辨率视频的传输时,即,在通过将高分辨率视频转换为低分辨率视频和用于反映高分辨率视频中各图像的低频信息的视频残差进行传输时,由于低分辨率视频的传输码率远低于高分辨率视频的码率,且用于反映高分辨率视频中各图像的低频信息的视频残差的数据量很少,可以使用较高的压缩比进行编码,因此,仅需要很低的传输码率即可实现高分辨率视频的传输,且所需的传输码率位于大部分传输设备所支持的传输码率范围内。因此,可以满足实时传输高分辨率视频的应用场景,实现高分辨率视频的无延时传输,提高了传输效率。
另外,由于发送端和接收端均是采用编解码后的低分辨率视频LR’,获取包括高分辨率视频HR中各图像的图像细节信息的高分辨率视频HR_hf,所以,发送端和接收端所获取到的高分辨率视频HR_hf完全一致。由于发送端会使用高分辨率视频HR_hf计算视频残差,接收端会使用解码后的视频残差对该高分辨率视频HR_hf进行修正,因此,可以通过该视频残差消除低分辨率视频LR的编解码损失,使得整个传输通路上仅有视频残差的编解码损失,不再有低分辨率视频LR的编解码损失,从而可以提高视频传输的质量。
前述实施例重点描述的是如何将高分辨率图像分解为低分辨率图像和用于反映高分辨率图像的低频信息的图像残差进行传输,以降低高分辨率图像的传输码率的内容。下面,本申请实施例还提供了另外一种图像传输方法,该方法虽然也是将高分辨率图像分解为低分辨率图像和图像残差进行传输,但是重点涉及的是如何消除低分辨率图像传输过程中的损失。在该实施例中,所传输的图像残差可以是用于反映高分辨率图像的低频信息的图像残差,也可以是用于反映高分辨率图像的低频信息和高频信息的图像残差,也可以是用于反映高分辨率图像的高频信息的图像残差,对此不进行限定。
图11为本申请实施例提供的另一种图像传输方法的流程示意图。如图11所示,该方法可以包括:S1101、发送端将第一高分辨率图像转换为第一低分辨率图像。其中,第一高分辨率图像的第一分辨率高于第一低分辨率图像的第二分辨率。S1102、发送端编码第一低分辨率图像,以得到第一码流。例如,发送端以有损编码方式编码第一低分辨率图像,以得到第一码流。
S1103、发送端解码第一码流,以得到第二低分辨率图像。S1104、发送端利用第二低分辨率图像重构第二高分辨率图像。其中,第二高分辨率图像的第三分辨率高于第二分辨率。可选的,该第二高分辨率图像可以包括:第一高分辨率图像的图像细节信息和图像轮廓信息,也可以仅包括第一高分辨率图像的图像细节信息,也可以仅包括第一高分辨率图像的轮廓信息等,本申请实施例对此不进行限定。
S1105、发送端获取第一高分辨率图像和第二高分辨率图像之间的图像残差。S1106、发送端编码图像残差以得到第二码流。例如,以有损编码方式编码图像残差,或者,以无 损编码方式编码图像残差。
S1107、发送端发送第一码流和第二码流。S1108、接收端解码第一码流,以得到第二低分辨率图像。S1109、接收端利用第二低分辨率图像重构第二高分辨率图像。
S1110、接收端解码第二码流,以得到第一高分辨率图像和第二高分辨率图像之间的图像残差。S1111、接收端将第二高分辨率图像与图像残差合并,以得到第三高分辨率图像。关于上述各步骤的实现方式,以及,各技术特征的解释,可以参见前述实施例中相关描述,在此不再赘述。
在本实施例中,发送端通过采用解码后的第二低分辨率图像去获取第二高分辨率图像的方式,使发送端所获取到的第二高分辨率图像,涵盖了第一低分辨率图像在传输过程中的编解码损失。发送端基于该第二高分辨率图像与第一高分辨率图像所获取的图像残差,也涵盖了第一低分辨率图像在传输过程中的编解码损失。这样,接收端在基于该第二低分辨率图像获取到第二高分辨率图像后,将第二高分辨率图像与解码得到的图像残差进行合并时,可以消除第一低分辨率图像传输过程中的损失。此时,通路上只有图像残差的传输损失,减少了接收端恢复出来的第三高分辨率图像的损失(例如只有图像残差的损失),甚至无损失(例如,使用无损编码方式对图像残差进行编码的场景),提高了图像的质量。当该方法应用于传输高分辨率视频时,可以提高高分辨率视频的图像质量,提高了用户体验。
为了进一步说明技术效果,图12为本申请实施例提供的一种传输装置的结构示意图。可以理解的是,该传输装置可以对应实现前述各方法实施例中发送端的操作或者步骤。该传输装置可以是发送端或者可以是可配置于发送端的部件,例如芯片。如图12所示,该传输装置可以包括:第一处理模块11、第一编码模块12、第二处理模块13、第三处理模块14、第二编码模块15和发送模块16。可选的,在一些实施例中,该传输装置还可以包括:解码模块17。
在一种可能的实现方式中,第一处理模块11,用于将第一高分辨率图像转换为第一低分辨率图像,该第一高分辨率图像的第一分辨率高于第一低分辨率图像的第二分辨率。第一编码模块12,用于编码第一低分辨率图像以得到第一码流。第二处理模块13,用于获取第二高分辨率图像,该第二高分辨率图像的第三分辨率高于第二分辨率,第二高分辨率图像中包括第一高分辨率图像的高频信息,且排除了第一高分辨率图像的低频信息。第三处理模块14,用于获取第一高分辨率图像和第二高分辨率图像之间的图像残差,该图像残差用于反映低频信息。第二编码模块15,用于编码图像残差以得到第二码流。发送模块16,用于发送第一码流和第二码流。
应理解,上述高频信息是图像细节信息,低频信息是图像轮廓信息。在第一高分辨率图像中,图像细节信息相对于图像轮廓信息更快速地变化。
作为一种可能的实现方式,解码模块17,用于解码第一码流,以得到第二低分辨率图像。第二处理模块13,具体用于利用第二低分辨率图像重构第二高分辨率图像。例如,第二处理模块13,具体用于利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。
作为一种可能的实现方式,第一分辨率和第三分辨率相同;第三处理模块14,具体用于将第一高分辨率图像中第一像素点的像素值,与第二高分辨率图像中对应于第一像素点的第二像素点的像素值相减,以得到图像残差中的第一像素点残差。或者,第一分辨率大 于第三分辨率,第三处理模块14,具体用于将第二高分辨率图像转换为第四高分辨率图像,并将第一高分辨率图像中第一像素点的像素值,与第四高分辨率图像中对应于第一像素点的第三像素点的像素值相减,以得到图像残差中的第二像素点残差,其中,第四高分辨率图像具有第一分辨率。
作为一种可能的实现方式,第一编码模块12,具体用于以有损编码方式编码第一低分辨率图像。
作为一种可能的实现方式,第二编码模块15,具体用于以有损编码方式编码图像残差。
作为一种可能的实现方式,第二编码模块15,具体用于以无损编码方式编码图像残差。
作为一种可能的实现方式,该装置应用于传输视频,第一高分辨率图像为该视频中的一帧图像。
本实施例提供的传输装置,可以执行上述图4和图8所对应的方法实施例中发送端的动作,其实现原理和技术效果类似,在此不再赘述。
在另一实施例中,第一处理模块11,用于将第一高分辨率图像转换为第一低分辨率图像,该第一高分辨率图像的第一分辨率高于第一低分辨率图像的第二分辨率。第一编码模块12,用于编码第一低分辨率图像(例如以有损编码方式编码所述第一低分辨率图像),以得到第一码流。解码模块17,用于解码第一码流,以得到第二低分辨率图像。第二处理模块13,用于利用第二低分辨率图像重构第二高分辨率图像,该第二高分辨率图像的第三分辨率高于第二分辨率。第三处理模块14,用于获取第一高分辨率图像和第二高分辨率图像之间的图像残差。第二编码模块15,用于编码图像残差以得到第二码流。发送模块16,用于发送第一码流和第二码流。
作为一种可能的实现方式,第二处理模块13,具体用于利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。
作为一种可能的实现方式,第一分辨率和第三分辨率相同,第三处理模块14,具体用于将第一高分辨率图像中第一像素点的像素值,与第二高分辨率图像中对应于第一像素点的第二像素点的像素值相减,以得到图像残差中的第一像素点残差。或者,第一分辨率大于第三分辨率,第三处理模块14,具体用于将第二高分辨率图像转换为第四高分辨率图像,并将第一高分辨率图像中第一像素点的像素值,与第四高分辨率图像中对应于第一像素点的第三像素点的像素值相减,以得到图像残差中的第二像素点残差,其中,第四高分辨率图像具有第一分辨率。
作为一种可能的实现方式,第二编码模块15,具体用于以有损编码方式编码图像残差。
作为一种可能的实现方式,第二编码模块15,具体用于以无损编码方式编码图像残差。
作为一种可能的实现方式,该装置应用于传输视频,第一高分辨率图像为该视频中的一帧图像。
本实施例提供的传输装置,可以执行上述图11所对应的方法实施例中发送端的动作,其实现原理和技术效果类似,在此不再赘述。
可选的,上述装置中还可以包括至少一个存储模块,该存储模块可以包括数据和/或指令,上述各模块可以读取存储模块中的数据和/或指令,实现对应的方法。
为了进一步说明技术效果,图13为本申请实施例提供的另一种传输装置的结构示意图。可以理解的是,该传输装置可以对应实现前述各方法实施例中接收端的操作或者步骤。该传输装置可以是接收端或者可以是可配置于接收端的部件,例如芯片。如图13所示,该传输装置可以包括:接收模块21、第一解码模块22、第一处理模块23、第二解码模块24和第二处理模块25。
接收模块21,用于接收第一码流和第二码流。第一解码模块22,用于解码第一码流,以得到第二低分辨率图像。第一处理模块23,用于利用第二低分辨率图像重构第二高分辨率图像,该第二高分辨率图像的第一分辨率高于第二低分辨率图像的第二分辨率,第二高分辨率图像中包括第一高分辨率图像的高频信息,且排除了第一高分辨率图像的低频信息。第二解码模块24,用于解码第二码流,以得到第一高分辨率图像和第二高分辨率图像之间的图像残差,该图像残差用于反映低频信息。第二处理模块25,用于将第二高分辨率图像与图像残差合并,以得到第三高分辨率图像。
应理解,上述高频信息是图像细节信息,低频信息是图像轮廓信息。在第一高分辨率图像中,图像细节信息相对于图像轮廓信息更快速地变化。
作为一种可能的实现方式,第一处理模块23,具体用于利用第二低分辨率图像,使用神经网络重构第二高分辨率图像。
作为一种可能的实现方式,第二处理模块25,具体用于将第二高分辨率图像中第二像素点的像素值,与图像残差中对应于第二像素点的像素点残差相加,以得到第三高分辨率图像中对应于第二像素点的第三像素点。
作为一种可能的实现方式,该装置应用于传输视频,第一高分辨率图像为视频中的一帧图像。
本实施例提供的传输装置,可以执行上述方法实施例中接收端的动作,其实现原理和技术效果类似,在此不再赘述。可选的,上述装置中还可以包括至少一个存储模块,该存储模块可以包括数据和/或指令,上述各模块可以读取存储模块中的数据和/或指令,实现对应的方法。
图12和图13中涉及的各个模块可以软件、硬件或二者结合来实现。例如,以上各个实施例中发送模块实际实现时可以为发送器,接收模块实际实现时可以为接收器,或者,发送模块和接收模块通过收发器实现,或者,发送模块和接收模块通过通信接口实现。而处理模块可以以软件通过处理元件调用的形式实现;也可以以硬件的形式实现。例如,处理模块可以为至少一个单独设立的处理元件,也可以集成在上述装置的某一个芯片中实现,此外,也可以以程序代码的形式存储于上述装置的存储器中,由上述装置的某一个处理元件调用并执行以上处理模块的功能。此外这些模块全部或部分可以集成在一起,也可以独立实现。这里所述的处理元件可以是一种集成电路,具有信号的处理能力。在实现过程中,上述方法的各步骤或以上各个模块可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。
例如,以上这些模块可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个专用集成电路(application specific integrated circuit,ASIC),或,一个或多个微处理器(digital signal processor,DSP),或,一个或者多个现场可编程门阵列(field programmable gate array,FPGA)等。再如,当以上某个模块通过处理元件调度程序代码的形式实现时,该处理元件可以是通用处理器,例如中央处理器(central processing unit,CPU)或其它可以调用程序代码的处理器。再如,这些模块可以集成在一起,以片上系统 (system-on-a-chip,SOC)的形式实现。
本申请还提供一种如图3所示的传输设备100,该传输设备100中的处理器系统102读取存储器101存储的程序和数据集合以执行前述图像传输方法。
本申请实施例还提供一种计算机可读存储介质,其上存储有用于实现上述方法实施例中由发送端执行的方法,或由接收端执行的方法的计算机指令。例如,该计算机指令被执行时,使得传输装置可以实现上述方法实施例中发送端执行的方法、或者、接收端执行的方法。
本申请实施例还提供一种包含指令的计算机程序产品,该指令被执行时使得该计算机实现上述方法实施例中由发送端执行的方法,或由接收端执行的方法。
本申请实施例还提供一种传输系统,该传输系统包括上文实施例中的发送端,和/或接收端。
作为一个示例,该传输系统包括:上文结合图4或图11对应的实施例中的发送端和接收端。
作为另一示例,该传输系统包括:上文结合图12描述的传输装置、图13描述的传输装置。
作为另一示例,该传输系统包括:上文结合图3所描述的传输设备。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Drive(SSD))等。
本文中的术语“多个”是指两个或两个以上。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系;在公式中,字符“/”,表示前后关联对象是一种“相除”的关系。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。
可以理解的是,在本申请的实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请的实施例的实施过程构成任何限定。
Claims (30)
- 一种图像传输方法,其特征在于,所述方法包括:将第一高分辨率图像转换为第一低分辨率图像,所述第一高分辨率图像的第一分辨率高于所述第一低分辨率图像的第二分辨率;编码所述第一低分辨率图像以得到第一码流;获取第二高分辨率图像,所述第二高分辨率图像的第三分辨率高于所述第二分辨率,所述第二高分辨率图像中包括所述第一高分辨率图像的高频信息,且排除了所述第一高分辨率图像的低频信息;获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差,所述图像残差用于反映所述低频信息;编码所述图像残差以得到第二码流;发送所述第一码流和所述第二码流。
- 根据权利要求1所述的方法,其特征在于,所述高频信息是图像细节信息,所述低频信息是图像轮廓信息。
- 根据权利要求1或2所述的方法,其特征在于,所述获取第二高分辨率图像,包括:解码所述第一码流,以得到第二低分辨率图像;利用所述第二低分辨率图像重构所述第二高分辨率图像。
- 根据权利要求3所述的方法,其特征在于,所述利用所述第二低分辨率图像重构第二高分辨率图像,包括:利用所述第二低分辨率图像,使用神经网络重构所述第二高分辨率图像。
- 根据权利要求1-4任一项所述的方法,其特征在于,所述第一分辨率和所述第三分辨率相同;所述获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差,包括:将所述第一高分辨率图像中第一像素点的像素值,与所述第二高分辨率图像中对应于所述第一像素点的第二像素点的像素值相减,以得到所述图像残差中的第一像素点残差。
- 根据权利要求1-5任一项所述的方法,其特征在于,所述方法应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
- 一种图像传输方法,其特征在于,所述方法包括:将第一高分辨率图像转换为第一低分辨率图像,所述第一高分辨率图像的第一分辨率高于所述第一低分辨率图像的第二分辨率;编码所述第一低分辨率图像,以得到第一码流;解码所述第一码流,以得到第二低分辨率图像;利用所述第二低分辨率图像重构第二高分辨率图像,所述第二高分辨率图像的第三分辨率高于所述第二分辨率;获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差;编码所述图像残差以得到第二码流;发送所述第一码流和所述第二码流。
- 根据权利要求7所述的方法,其特征在于,所述利用所述第二低分辨率图像重构第二高分辨率图像,包括:利用所述第二低分辨率图像,使用神经网络重构第二高分辨率图像。
- 根据权利要求7或8所述的方法,其特征在于,所述第一分辨率和所述第三分辨率相同;所述获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差,包括:将所述第一高分辨率图像中第一像素点的像素值,与所述第二高分辨率图像中对应于所述第一像素点的第二像素点的像素值相减,以得到所述图像残差中的第一像素点残差。
- 根据权利要求7-9任一项所述的方法,其特征在于,所述方法应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
- 一种图像传输方法,其特征在于,所述方法包括:接收第一码流和第二码流;解码所述第一码流,以得到第二低分辨率图像;利用所述第二低分辨率图像重构第二高分辨率图像,所述第二高分辨率图像的第一分辨率高于所述第二低分辨率图像的第二分辨率,所述第二高分辨率图像中包括所述第一高分辨率图像的高频信息,且排除了所述第一高分辨率图像的低频信息;解码所述第二码流,以得到第一高分辨率图像和所述第二高分辨率图像之间的图像残差,所述图像残差用于反映所述低频信息;将所述第二高分辨率图像与所述图像残差合并,以得到第三高分辨率图像。
- 根据权利要求11所述的方法,其特征在于,所述高频信息是图像细节信息,所述低频信息是图像轮廓信息。
- 根据权利要求11或12所述的方法,其特征在于,所述利用所述第二低分辨率图像重构第二高分辨率图像,包括:利用所述第二低分辨率图像,使用神经网络重构所述第二高分辨率图像。
- 根据权利要求11-13任一项所述的方法,其特征在于,所述将所述第二高分辨率图像与所述图像残差合并,以得到所述第三高分辨率图像,包括:将所述第二高分辨率图像中第二像素点的像素值,与所述图像残差中对应于所述第二像素点的像素点残差相加,以得到所述第三高分辨率图像中对应于所述第二像素点的第三像素点。
- 根据权利要求11-14任一项所述的方法,其特征在于,所述方法应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
- 一种传输装置,其特征在于,所述装置包括:第一处理模块,用于将第一高分辨率图像转换为第一低分辨率图像,所述第一高分辨率图像的第一分辨率高于所述第一低分辨率图像的第二分辨率;第一编码模块,用于编码所述第一低分辨率图像以得到第一码流;第二处理模块,用于获取第二高分辨率图像,所述第二高分辨率图像的第三分辨率高于所述第二分辨率,所述第二高分辨率图像中包括所述第一高分辨率图像的高频信息,且排除了所述第一高分辨率图像的低频信息;第三处理模块,用于获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差,所述图像残差用于反映所述低频信息;第二编码模块,用于编码所述图像残差以得到第二码流;发送模块,用于发送所述第一码流和所述第二码流。
- 根据权利要求16所述的装置,其特征在于,所述高频信息是图像细节信息,所述低频信息是图像轮廓信息。
- 根据权利要求17所述的装置,其特征在于,所述装置还包括:解码模块,用于解码所述第一码流,以得到第二低分辨率图像;所述第二处理模块,具体用于利用所述第二低分辨率图像重构所述第二高分辨率图像。
- 根据权利要求16-18任一项所述的装置,其特征在于,所述第二处理模块,具体用于利用所述第二低分辨率图像,使用神经网络重构所述第二高分辨率图像。
- 根据权利要求16-19任一项所述的装置,其特征在于,所述第一分辨率和所述第三分辨率相同;所述第三处理模块,具体用于将所述第一高分辨率图像中第一像素点的像素值,与所述第二高分辨率图像中对应于所述第一像素点的第二像素点的像素值相减,以得到所述图像残差中的第一像素点残差。
- 根据权利要求16-20任一项所述的装置,其特征在于,所述装置应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
- 一种传输装置,其特征在于,所述装置包括:第一处理模块,用于将第一高分辨率图像转换为第一低分辨率图像,所述第一高分辨率图像的第一分辨率高于所述第一低分辨率图像的第二分辨率;第一编码模块,用于编码所述第一低分辨率图像,以得到第一码流;解码模块,用于解码所述第一码流,以得到第二低分辨率图像;第二处理模块,用于利用所述第二低分辨率图像重构第二高分辨率图像,所述第二高分辨率图像的第三分辨率高于所述第二分辨率;第三处理模块,用于获取所述第一高分辨率图像和所述第二高分辨率图像之间的图像残差;第二编码模块,用于编码所述图像残差以得到第二码流;发送模块,用于发送所述第一码流和所述第二码流。
- 根据权利要求22所述的装置,其特征在于,所述第二处理模块,具体用于利用所述第二低分辨率图像,使用神经网络重构第二高分辨率图像。
- 根据权利要求22或23所述的装置,其特征在于,所述第一分辨率和所述第三分辨率相同;所述第三处理模块,具体用于将所述第一高分辨率图像中第一像素点的像素值,与所述第二高分辨率图像中对应于所述第一像素点的第二像素点的像素值相减,以得到所述图像残差中的第一像素点残差。
- 根据权利要求22-24任一项所述的装置,其特征在于,所述装置应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
- 一种传输装置,其特征在于,所述装置包括:接收模块,用于接收第一码流和第二码流;第一解码模块,用于解码所述第一码流,以得到第二低分辨率图像;第一处理模块,用于利用所述第二低分辨率图像重构第二高分辨率图像,所述第二高分辨率图像的第一分辨率高于所述第二低分辨率图像的第二分辨率,所述第二高分辨率图像中包括所述第一高分辨率图像的高频信息,且排除了所述第一高分辨率图像的低频信息;第二解码模块,用于解码所述第二码流,以得到第一高分辨率图像和所述第二高分辨率图像之间的图像残差,所述图像残差用于反映所述低频信息;第二处理模块,用于将所述第二高分辨率图像与所述图像残差合并,以得到第三高分辨率图像。
- 根据权利要求26所述的装置,其特征在于,所述高频信息是图像细节信息,所述低频信息是图像轮廓信息。
- 根据权利要求26或27所述的装置,其特征在于,所述第一处理模块,具体用于利用所述第二低分辨率图像,使用神经网络重构所述第二高分辨率图像。
- 根据权利要求26-28任一项所述的装置,其特征在于,所述第二处理模块,具体用于将所述第二高分辨率图像中第二像素点的像素值,与所述图像残差中对应于所述第二像素点的像素点残差相加,以得到所述第三高分辨率图像中对应于所述第二像素点的第三像素点。
- 根据权利要求26-29任一项所述的装置,其特征在于,所述装置应用于传输视频,所述第一高分辨率图像为所述视频中的一帧图像。
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/077260 WO2021168827A1 (zh) | 2020-02-28 | 2020-02-28 | 图像传输方法及装置 |
CN202080006694.3A CN113574886B (zh) | 2020-02-28 | 2020-02-28 | 图像传输方法及装置 |
EP20921135.8A EP4090024A4 (en) | 2020-02-28 | 2020-02-28 | IMAGE TRANSMISSION METHOD AND APPARATUS |
US17/897,808 US20230007282A1 (en) | 2020-02-28 | 2022-08-29 | Image transmission method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/077260 WO2021168827A1 (zh) | 2020-02-28 | 2020-02-28 | 图像传输方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/897,808 Continuation US20230007282A1 (en) | 2020-02-28 | 2022-08-29 | Image transmission method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021168827A1 (zh) | 2021-09-02 |
Family
ID=77490583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/077260 WO2021168827A1 (zh) | 2020-02-28 | 2020-02-28 | 图像传输方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230007282A1 (zh) |
EP (1) | EP4090024A4 (zh) |
CN (1) | CN113574886B (zh) |
WO (1) | WO2021168827A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4181067A4 (en) * | 2020-11-30 | 2023-12-27 | Samsung Electronics Co., Ltd. | DEVICE AND METHOD FOR ENCODING AND DECODING IMAGES BY AI |
US20230232098A1 (en) * | 2022-01-14 | 2023-07-20 | Bendix Commercial Vehicle Systems Llc | System and method for opportunistic imaging |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101345870A (zh) * | 2008-09-04 | 2009-01-14 | 上海交通大学 | 低码率视频超分辨率重构的编码装置和解码装置 |
CN101799919A (zh) * | 2010-04-08 | 2010-08-11 | 西安交通大学 | 一种基于pca对齐的正面人脸图像超分辨率重建方法 |
CN106162180A (zh) * | 2016-06-30 | 2016-11-23 | 北京奇艺世纪科技有限公司 | 一种图像编解码方法及装置 |
CN108076301A (zh) * | 2016-11-11 | 2018-05-25 | 联芯科技有限公司 | VoLTE视频多方电话的视频处理方法和系统 |
US20180174275A1 (en) * | 2016-12-15 | 2018-06-21 | WaveOne Inc. | Autoencoding image residuals for improving upsampled images |
CN109451323A (zh) * | 2018-12-14 | 2019-03-08 | 上海国茂数字技术有限公司 | 一种无损图像编码方法及装置 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978717A (zh) * | 2015-06-11 | 2015-10-14 | 沈阳东软医疗系统有限公司 | Ct重建图像的处理方法、装置及设备 |
CN109547784A (zh) * | 2017-09-21 | 2019-03-29 | 华为技术有限公司 | 一种编码、解码方法及装置 |
KR20200056943A (ko) * | 2018-11-15 | 2020-05-25 | 한국전자통신연구원 | 영역 차등적 영상 부/복호화 방법 및 장치 |
Non-Patent Citations (1)
Title |
---|
See also references of EP4090024A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4090024A4 (en) | 2023-01-18 |
US20230007282A1 (en) | 2023-01-05 |
EP4090024A1 (en) | 2022-11-16 |
CN113574886A (zh) | 2021-10-29 |
CN113574886B (zh) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200145692A1 (en) | Video processing method and apparatus | |
US11475539B2 (en) | Electronic apparatus, system and controlling method thereof | |
WO2021036795A1 (zh) | 视频超分辨率处理方法及装置 | |
KR101266667B1 (ko) | 장치 내 제어기에서 프로그래밍되는 압축 방법 및 시스템 | |
WO2019184639A1 (zh) | 一种双向帧间预测方法及装置 | |
US20230007282A1 (en) | Image transmission method and apparatus | |
CN113747242B (zh) | 图像处理方法、装置、电子设备及存储介质 | |
WO2019056898A1 (zh) | 一种编码、解码方法及装置 | |
WO2023279961A1 (zh) | 视频图像的编解码方法及装置 | |
WO2024078066A1 (zh) | 视频解码方法、视频编码方法、装置、存储介质及设备 | |
CN111406404B (zh) | 获得视频文件的压缩方法、解压缩方法、系统及存储介质 | |
WO2020168501A1 (zh) | 图像的编码方法、解码方法及所适用的设备、系统 | |
CN115866297A (zh) | 视频处理方法、装置、设备及存储介质 | |
WO2022179509A1 (zh) | 音视频或图像分层压缩方法和装置 | |
Yang et al. | Graph-convolution network for image compression | |
CN105306941B (zh) | 一种视频编码方法 | |
CN107945108A (zh) | 视频处理方法及装置 | |
CN114727116A (zh) | 编码方法及装置 | |
WO2022111349A1 (zh) | 图像处理方法、设备、存储介质及计算机程序产品 | |
WO2023279968A1 (zh) | 视频图像的编解码方法及装置 | |
WO2023015520A1 (zh) | 图像编解码方法和装置 | |
CN116708793B (zh) | 视频的传输方法、装置、设备及存储介质 | |
WO2023197717A1 (zh) | 一种图像解码方法、编码方法及装置 | |
CN113747099B (zh) | 视频传输方法和设备 | |
WO2021237474A1 (zh) | 视频传输方法、装置和系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20921135 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020921135 Country of ref document: EP Effective date: 20220809 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |