WO2024078064A1 - Image processing method and apparatus, and terminal - Google Patents

Image processing method and apparatus, and terminal

Info

Publication number
WO2024078064A1
WO2024078064A1 · PCT/CN2023/105927 · CN2023105927W
Authority
WO
WIPO (PCT)
Prior art keywords
image
auxiliary stream
current frame
processed
information
Prior art date
Application number
PCT/CN2023/105927
Other languages
French (fr)
Chinese (zh)
Inventor
鄢玉民 (Yan Yumin)
宋晨 (Song Chen)
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2024078064A1 publication Critical patent/WO2024078064A1/en

Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
                            • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
                    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
                        • H04N 21/81 Monomedia components thereof
                        • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
                            • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
                • H04N 5/00 Details of television systems
                    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
                        • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
                            • H04N 5/265 Mixing
                • H04N 7/00 Television systems
                    • H04N 7/14 Systems for two-way working
                        • H04N 7/15 Conference systems

Definitions

  • the present application relates to but is not limited to the field of image processing technology.
  • the terminal will interact with the user by displaying multiple frames of images.
  • users' demand for real-time interaction is becoming more and more prominent.
  • Traditional video conferencing displays only a given acquired frame and cannot synchronously update the differences between consecutive frames, which reduces the interactivity between the terminal and the user and cannot meet users' needs.
  • the present application provides an image processing method, device, terminal, electronic device and storage medium.
  • the present application provides an image processing method, the method comprising: synthesizing an auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image; detecting the current frame synthesized image and the previous frame synthesized image to determine difference information; encoding the current frame synthesized image based on the difference information to generate encoded data; sending the encoded data to a peer device so that the peer device processes the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image.
  • the present application provides an image processing method, the method comprising: obtaining encoded data, which is the data sent by the image processing method in the first aspect; decoding the encoded data to obtain a decoded image, which is an image carrying an auxiliary stream image and its corresponding annotation information; and displaying the decoded image.
  • the present application provides an encoding device, which includes: a synthesis module, configured to synthesize an auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image; a detection module, configured to detect the current frame synthesized image and the previous frame synthesized image and determine the difference information; an encoding module, configured to encode the current frame synthesized image according to the difference information to generate encoded data; and a sending module, configured to send the encoded data to a peer device, so that the peer device processes the encoded data, obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image.
  • the present application provides a decoding device, comprising: an acquisition module, configured to acquire encoded data, the encoded data being the data sent by the image processing method in the first aspect; a decoding module, configured to decode the encoded data to obtain a decoded image, the decoded image being an image carrying an auxiliary stream image and its corresponding annotation information; and a display module, configured to display the decoded image.
  • the present application provides a terminal, comprising: an encoding device and/or a decoding device; the encoding device is configured to execute the image processing method in the first aspect of the present application; and the decoding device is configured to execute the image processing method in the second aspect of the present application.
  • the present application provides an image processing system, the image processing system comprising: a plurality of terminals connected in communication, the terminals being configured to implement any one of the image processing methods in the present application.
  • the present application provides an electronic device, comprising: one or more processors; a memory on which one or more programs are stored, and when the one or more programs are executed by one or more processors, the one or more processors implement any image processing method in the present application.
  • the present application provides a readable storage medium, which stores a computer program, and when the computer program is executed by a processor, any one of the image processing methods in the present application is implemented.
  • FIG. 1 is a schematic flow chart of an image processing method provided in the present application.
  • FIG. 2 is a schematic flow chart of image processing by the image synthesis device provided in the present application.
  • FIG. 3 is a schematic flow chart of detecting auxiliary stream images provided by the present application.
  • FIG. 4 is a schematic flow chart of the image processing method provided by the present application.
  • FIG. 5 is a block diagram of the image processing system provided by the present application.
  • FIG. 6 is a block diagram of the image processing system provided by the present application.
  • FIG. 7 is a block diagram of the image processing system provided by the present application.
  • FIG. 8 is a schematic diagram of a display interface for auxiliary stream images provided in the present application.
  • FIG. 9 is a block diagram of the encoding device provided in the present application.
  • FIG. 10 is a block diagram of the decoding device provided in the present application.
  • FIG. 11 is a block diagram of the terminal provided in the present application.
  • FIG. 12 is a block diagram of the image processing system provided by the present application.
  • FIG. 13 is a block diagram of an exemplary hardware architecture of a computing device capable of implementing the image processing method and apparatus according to the present application.
  • FIG. 1 is a flow chart of an image processing method provided by the present application. The method can be applied to an encoding device. As shown in FIG. 1, the image processing method in the present application includes but is not limited to the following steps S101 to S104.
  • Step S101 synthesize the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image.
  • Step S102 Detect the current frame synthesized image and the previous frame synthesized image to determine difference information.
  • the previous frame composite image is an image generated by synthesizing the previous frame image of the auxiliary stream image and the annotation information corresponding to the previous frame image of the auxiliary stream image.
  • Step S103 encoding the current frame synthesized image according to the difference information to generate encoded data.
  • Step S104 Send the encoded data to the peer device so that the peer device can process the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image.
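Steps S101 to S104 can be sketched as a minimal encoder-side loop. This is an illustrative toy model, not the patent's implementation: an image is a list of pixel rows, annotation is a dict of coordinate-to-pixel overrides, and the difference detection works per pixel rather than per region.

```python
def synthesize(aux_image, annotation):
    # Step S101: overlay annotation pixels onto the auxiliary stream image.
    return [[annotation.get((x, y), px) for x, px in enumerate(row)]
            for y, row in enumerate(aux_image)]

def detect_difference(current, previous):
    # Step S102: collect coordinates whose pixels differ between the two frames.
    if previous is None:
        return [(x, y) for y, row in enumerate(current) for x in range(len(row))]
    return [(x, y)
            for y, (row_c, row_p) in enumerate(zip(current, previous))
            for x, (c, p) in enumerate(zip(row_c, row_p)) if c != p]

def encode(current, diff):
    # Step S103: only the changed pixels go into the payload.
    return [(x, y, current[y][x]) for x, y in diff]

# Step S104 would send the payload to the peer device; here we only build it.
aux = [[0, 0], [0, 0]]
frame1 = synthesize(aux, {(1, 0): 9})                 # first annotated frame
frame2 = synthesize(aux, {(1, 0): 9, (0, 1): 7})      # one new annotation point
payload = encode(frame2, detect_difference(frame2, frame1))
```

Because only the single changed pixel reaches `payload`, the sketch mirrors the patent's point that encoding the difference, rather than the whole frame, reduces encoding work.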
  • the counterpart device is a device that can process the encoded data, obtain and display the decoded image including the annotation information corresponding to the auxiliary stream image.
  • the counterpart device can be a decoding device, a receiving terminal and other devices.
  • the counterpart device can be set based on the actual application scenario. Other unspecified counterpart devices are also within the scope of protection of this application and will not be repeated here.
  • By synthesizing the auxiliary stream image with its annotation information, the terminal's annotations on the auxiliary stream image can be clearly identified. Detecting the current frame synthesized image against the previous frame synthesized image determines the difference information, so that the user synchronously obtains the differences between two consecutive frames, improving the interactivity between the terminal and the user. Encoding the current frame synthesized image based on the difference information speeds up encoding and reduces its energy consumption. Finally, sending the encoded data to the peer device lets the peer device process it, obtain and display the decoded image including the annotation information corresponding to the auxiliary stream image, so that the annotation information is displayed more clearly on the peer device.
  • the synthesis of the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image in step S101 can be implemented in the following manner: based on multiple frame rates, obtaining the annotation information corresponding to the auxiliary stream image; processing the annotation information corresponding to the auxiliary stream image according to a preset container and a preset image format to generate an annotated image; and integrating the auxiliary stream image and the annotated image to generate a current frame synthesized image.
  • the annotation information corresponding to the auxiliary stream image may be information based on multiple frame rates in the form of point set data.
  • Frame rate refers to the number of frames or images shown or displayed per second.
  • Frame rate is mainly used to refer to the number of frames of an image played per second in the synchronized audio and/or image of a movie, television or video.
  • the frame rate may be 120 frames per second, or 24 frames per second (or 25 frames per second, 30 frames per second), etc.
  • the real-time change of the auxiliary stream image can be clarified, and then the annotation information corresponding to the auxiliary stream image is processed according to a preset container (such as a bitmap container, etc.) and a preset image format (such as an image format of a red green blue alpha (RGBA) color space; a YUV image format, etc.), so that the obtained annotated image can better reflect the real-time change characteristics and meet the user's needs.
  • the "Y" in the YUV image format represents brightness (Luminance or Luma), i.e., the grayscale value, while "U" and "V" represent chrominance (Chrominance or Chroma), which describes the color and saturation of the image and specifies the color of the pixel.
  • auxiliary stream image and the annotated image are integrated (for example, it can be superimposed synthesis or differential synthesis, etc.) to generate a current frame composite image, which is convenient for subsequent processing and improves the image processing efficiency.
  • the auxiliary stream image and the annotated image are integrated to generate a current frame composite image, including: converting the image formats of the auxiliary stream image and the annotated image respectively to obtain a converted image set; scaling each image in the converted image set according to a preset image resolution to obtain a scaled image set; synchronizing each image in the scaled image set according to a preset frame rate to obtain a processed auxiliary stream image and a processed annotated image; and superimposing and synthesizing the processed auxiliary stream image and the processed annotated image to generate a current frame composite image.
  • the processed auxiliary stream images and the processed annotated images can be more conveniently superimposed and synthesized, thereby ensuring the accuracy of the superimposed images and improving the image processing efficiency.
  • FIG2 shows a schematic diagram of a process flow of an image synthesis device provided by the present application for processing an image.
  • the image synthesis device 200 includes but is not limited to the following modules: a label collector 201, a data conversion module 202, an auxiliary stream image acquisition module 203, an image format conversion module 204, an image scaling module 205, a frame rate synchronization module 206, and an image overlay module 207.
  • the annotation collector 201 is configured to collect annotation information and support the collection of annotation information at multiple frame rates, obtain annotation information presented in the form of point set data, or passively receive point set data pushed by an annotation source.
  • the auxiliary stream image acquisition module 203 is configured to acquire auxiliary stream images, support acquisition of auxiliary stream images of various frame rates, and support various image formats, and can actively acquire auxiliary stream images or passively receive auxiliary stream image data push.
  • the data conversion module 202 is configured to process the point set data, for example, to convert the point set data based on a preset container such as a bitmap into an annotated image; the synthesized annotated image supports output in preset image formats such as the RGBA color-space image format and the YUV image format.
  • the image format conversion module 204 is configured to convert the format of the auxiliary stream image and the format of the annotated image into the same type of image format to avoid image synthesis failure caused by different image formats.
  • the image scaling module 205 is configured to stretch the auxiliary stream image and the annotated image to the same image resolution according to a preset image resolution, and is applied to the scenario where the resolution of the auxiliary stream image, the resolution of the annotated image and the resolution of the target image are inconsistent.
  • the frame rate synchronization module 206 is configured to synchronize the acquisition frequencies of the auxiliary stream image and the annotated image according to a preset frame rate, and control the frequency of the synthesized current frame synthesized image by dropping frames and/or inserting frames, thereby reducing the data processing pressure of the image synthesis device 200 and improving the efficiency and stability of image synthesis.
  • the image synthesis device 200 may process the input auxiliary stream image and annotation information in the following manner.
  • the auxiliary stream image is collected by the auxiliary stream image collection module 203, and the annotation information is collected by the annotation collector 201. Then, the annotation information corresponding to the auxiliary stream image is processed by the data conversion module 202 according to the preset container and the preset image format to generate an annotated image.
  • the image format conversion module 204 performs image format conversion on the auxiliary stream image and the annotated image respectively to obtain a converted image set, wherein the converted image set includes: the auxiliary stream image after image format conversion and the annotated image after image format conversion.
  • the image scaling module 205 performs scaling processing on the auxiliary stream image after the image format conversion and the annotated image after the image format conversion, for example, according to the preset image resolution, the resolution of the auxiliary stream image after the image format conversion is adjusted to obtain the scaled auxiliary stream image; according to the preset image resolution, the annotated image after the image format conversion is adjusted to obtain the scaled annotated image. This ensures that the image resolutions of the scaled annotated image and the scaled auxiliary stream image are both the preset image resolutions, which facilitates the subsequent image processing.
  • the scaled annotated image and the scaled auxiliary stream image are synchronized by the frame rate synchronization module 206 to obtain a processed auxiliary stream image and a processed annotated image with the same frame rate (for example, both are preset frame rates).
  • each image in the scaled image set is synchronized according to a preset frame rate to obtain a processed auxiliary stream image and a processed annotated image, including: when it is determined that the actual frame rate of the images in the scaled image set is greater than the preset frame rate, frame dropping processing is performed on each image in the scaled image set based on a sampling method to obtain a processed auxiliary stream image and a processed annotated image; when it is determined that the actual frame rate of the images in the scaled image set is less than the preset frame rate, internal interpolation is used to process each image in the scaled image set to obtain a processed auxiliary stream image and a processed annotated image.
  • By processing auxiliary stream images and annotated images in different frame rate synchronization modes, the success rate of the superposition synthesis process can be increased and the image processing efficiency improved.
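The two synchronization branches (drop frames when the actual frame rate exceeds the preset one; interpolate when it falls below) can be sketched as follows. This is a toy model where a "frame" is a single number and interpolation is a linear midpoint; real frame interpolation operates on whole images, and all names here are illustrative.

```python
def sync_frame_rate(frames, actual_fps, target_fps):
    if actual_fps > target_fps:
        # Frame dropping by sampling: keep every (actual/target)-th frame.
        step = actual_fps / target_fps
        return [frames[int(i * step)] for i in range(int(len(frames) / step))]
    if actual_fps < target_fps:
        # Frame insertion by interpolation between neighbouring frames.
        out = []
        for a, b in zip(frames, frames[1:]):
            out.extend([a, (a + b) / 2])
        out.append(frames[-1])
        return out
    return frames

dropped = sync_frame_rate([0, 1, 2, 3, 4, 5], actual_fps=60, target_fps=30)
padded  = sync_frame_rate([0, 2, 4], actual_fps=15, target_fps=30)
```

`dropped` keeps every second frame, while `padded` gains a midpoint between each pair, so both streams converge on the preset frame rate as module 206 requires.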
  • the image superposition module 207 is used to superimpose the processed auxiliary stream image and the processed annotated image to generate a current frame composite image.
  • the processed auxiliary stream image and the processed annotated image are superimposed and synthesized to generate a current frame synthesized image, including: using the processed auxiliary stream image as the background image, superimposing the annotation features in the processed annotated image onto the processed auxiliary stream image, and obtaining the current frame synthesized image.
  • the transparency of the annotated image is set to be fully transparent, thereby obtaining the annotation features in the processed annotated image.
  • the annotation features in the processed annotated image are superimposed on the processed auxiliary stream image to obtain the current frame composite image.
  • the current frame composite image can have both the annotation features in the processed annotated image and the image features of the processed auxiliary stream image, thereby enriching the content of the current frame composite image.
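The superimposition described above can be sketched as a per-pixel alpha test: wherever the annotated image is fully transparent, the auxiliary-stream background shows through, and elsewhere the annotation feature wins. Pixels are RGBA tuples; the function name is illustrative.

```python
def overlay(background, annotated):
    # Keep the background pixel wherever the annotation pixel is fully
    # transparent (alpha == 0); otherwise take the annotation pixel.
    return [[fg if fg[3] != 0 else bg
             for bg, fg in zip(bg_row, fg_row)]
            for bg_row, fg_row in zip(background, annotated)]

CLEAR = (0, 0, 0, 0)          # fully transparent annotation pixel
RED   = (255, 0, 0, 255)      # an annotation stroke
GREY  = (128, 128, 128, 255)  # auxiliary-stream content

aux       = [[GREY, GREY], [GREY, GREY]]
annotated = [[CLEAR, RED], [CLEAR, CLEAR]]
composite = overlay(aux, annotated)
```

The resulting `composite` carries both the auxiliary-stream image features and the annotation features, matching the description of the current frame composite image.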
  • the processed auxiliary stream image and the processed annotated image are superimposed and synthesized to generate a current frame synthesized image, including: processing the processed auxiliary stream image according to preset transparency information to obtain image features of the processed auxiliary stream image, wherein the image features of the processed auxiliary stream image match the annotated information;
  • the processed annotated image is used as the background image, and the image features of the processed auxiliary stream image are superimposed on the processed annotated image to obtain a current frame composite image.
  • the processed auxiliary stream image is processed according to the preset transparency information, so that the image features of the processed auxiliary stream image can be obtained, and the image features of the processed auxiliary stream image match the annotation information, so that the characteristics of the annotation information can be represented, so that the processed auxiliary stream image can be further processed.
  • the image features of the processed auxiliary stream image are superimposed on the processed annotated image, so that the current frame synthetic image has both the annotated features in the processed annotated image and the image features of the processed auxiliary stream image.
  • the detection of the current frame composite image and the previous frame composite image to determine the difference information in step S102 can be implemented in the following manner: based on preset sizes, the current frame composite image and the pre-stored previous frame composite image are partitioned to obtain a first region image set corresponding to the current frame composite image and a second region image set corresponding to the previous frame composite image; based on the number of regions, the first region image and the second region image are compared to obtain the difference information.
  • the first region image set includes a plurality of first region images
  • the second region image set includes a plurality of second region images
  • the preset size can be a predefined minimum size for partitioning (blocking) an image. For example, if the preset size is 16×16, the current frame composite image can be divided into multiple 16×16 first area images, and the previous frame composite image into multiple 16×16 second area images. This divides the image finely and makes the differences between images more prominent.
  • the number of regions in the first region image set is the same as the number of regions in the second region image set, which can facilitate block-by-block comparison of the region images in the two region image sets, thereby making the obtained difference information more accurate.
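Partitioning and the block-by-block comparison can be sketched as below, using a toy 4×4 frame and 2×2 blocks in place of the 16×16 blocks named above (function names are illustrative):

```python
def partition(image, size):
    # Split the image into size x size region images, in row-major order.
    h, w = len(image), len(image[0])
    blocks = []
    for y in range(0, h, size):
        for x in range(0, w, size):
            blocks.append([row[x:x + size] for row in image[y:y + size]])
    return blocks

def diff_blocks(current, previous, size):
    # Compare the two region-image sets block by block; return the indices
    # of blocks whose contents differ (the "difference regions").
    cur, prev = partition(current, size), partition(previous, size)
    return [i for i, (c, p) in enumerate(zip(cur, prev)) if c != p]

prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[3][3] = 9          # one changed pixel in the bottom-right block
changed = diff_blocks(curr_frame, prev_frame, size=2)
```

Because both frames are partitioned with the same preset size, the two region-image sets have the same number of regions and line up index-for-index, which is what makes the block-by-block comparison valid.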
  • the difference information includes: at least one difference region.
  • Encoding the current frame synthesized image according to the difference information to generate encoded data includes: determining difference contour information according to the at least one difference region; cropping the current frame synthesized image according to the difference contour information to obtain a changed region image; encoding the changed region image to generate encoded data.
  • the difference region is used to characterize an image region where the image features of the first region image are different from the image features of the second region image, and can accurately measure the difference between the two frames of images, making it convenient to process the current frame synthesized image.
  • At least one difference area is merged to the maximum extent to obtain difference contour information, clarifying the image boundary of the differences, so that the current frame synthesized image can be cropped based on the difference contour information to obtain image change information that includes only the difference information.
  • the encoded data can reflect the difference between the previous and next two frames of image, thereby improving the encoding speed of the current frame synthesized image.
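Merging the difference regions into one circumscribed rectangle and cropping the current frame to it can be sketched as follows (regions are `(x, y, w, h)` tuples; the merge-to-bounding-box strategy is one reading of "merged to the maximum extent", and the names are illustrative):

```python
def bounding_contour(regions):
    # Smallest axis-aligned rectangle covering every difference region.
    x0 = min(x for x, y, w, h in regions)
    y0 = min(y for x, y, w, h in regions)
    x1 = max(x + w for x, y, w, h in regions)
    y1 = max(y + h for x, y, w, h in regions)
    return (x0, y0, x1 - x0, y1 - y0)

def crop(image, contour):
    # Cut the changed-area image out of the current frame composite image.
    x, y, w, h = contour
    return [row[x:x + w] for row in image[y:y + h]]

frame = [[c for c in range(6)] for _ in range(6)]
contour = bounding_contour([(0, 0, 2, 2), (4, 4, 2, 2)])
changed_area = crop(frame, contour)
```

Only `changed_area` would then be handed to the encoder, so the encoded data reflects just the difference between the two frames.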
  • Fig. 3 shows a schematic diagram of a process for detecting auxiliary stream images provided by the present application.
  • the input image of the region detection device 300 is an auxiliary stream annotated image F1, which is a processed image obtained after being processed by the image synthesis device 200, and can simultaneously reflect the features of the auxiliary stream image and the annotated image.
  • After acquiring the auxiliary stream annotated image F1, the region detection device 300 performs block (or partition) processing on it to obtain a first region image set, which includes a plurality of first region images.
  • By dividing the auxiliary stream annotated image F1 into blocks (or partitions), local information of the image can be reflected, facilitating subsequent comparison of features of different local images and enabling detection of changed areas.
  • the region detection device 300 also pre-stores a second region image set, which includes multiple second region images.
  • the second region image set is an image set obtained by performing block (or partition) processing on the previous frame synthetic image, which can reflect the image features in different regions of the previous frame synthetic image.
  • when difference information is detected (for example, differing feature information in a certain area), the image block in that area is cached and the area where the image block is located is recorded.
  • the process of caching image blocks can be performed synchronously by multiple threads or by scanning the image blocks with differences line by line.
  • the contour of the image with differences can be extracted (for example, the circumscribed rectangular contour of the image block is extracted, etc.), and then based on the contour, the image within the contour is cropped to generate a difference image corresponding to the difference information.
  • the difference image and the auxiliary stream annotated image F1 are both input to the encoding module 310 for encoding, so that the encoded data can be obtained quickly and accurately.
  • the image of the changed area within the contour is cropped to obtain difference information, thereby improving the accuracy of judging the difference changes of the auxiliary stream image.
  • the method further includes: skipping the current frame synthetic image when it is determined that the difference information indicates that there is no difference between the current frame synthetic image and the previous frame synthetic image.
  • sending the encoded data to the peer device includes: sending the encoded data to the peer device through a first channel; after sending the encoded data to the peer device, further includes: sending labeled data corresponding to the labeled information to the peer device through a second channel.
  • the annotation data corresponding to the annotation information may be data obtained by packaging the annotation information and complying with the transmission rules of the second channel.
  • the annotation information is represented by binary data, and a data packet header (e.g., a data packet header representing information such as the network address of the peer device) is added in front of the binary data, thereby obtaining the annotation data corresponding to the annotation information.
  • Sending encoded data and labeled data corresponding to the labeling information to the peer device through different transmission channels can facilitate the peer device's processing of different data, so that the peer device can analyze and process the obtained encoded data more quickly, thereby improving data processing efficiency.
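Packaging the annotation point set for the second channel — a binary payload prefixed by a small packet header carrying peer-address information — can be sketched with `struct`. The header layout here (4-byte IPv4 destination plus a 2-byte point count, then 16-bit coordinate pairs) is invented for illustration; the patent does not fix a wire format.

```python
import struct

def pack_annotation(points, dest_addr):
    # Hypothetical header: 4-byte IPv4 destination + big-endian point count,
    # followed by the point set as pairs of unsigned 16-bit coordinates.
    header = bytes(int(b) for b in dest_addr.split(".")) + struct.pack(">H", len(points))
    body = b"".join(struct.pack(">HH", x, y) for x, y in points)
    return header + body

packet = pack_annotation([(10, 20), (30, 40)], "192.168.1.5")
```

A packet in this shape would travel over the second channel while the encoded image data travels over the first, letting the peer device route and parse the two kinds of data independently.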
  • FIG. 4 is a flow chart of the image processing method provided by the present application. The method can be applied to a decoding device. As shown in FIG. 4, the image processing method in the embodiment of the present application includes but is not limited to the following steps S401 to S404.
  • Step S401 obtaining encoded data.
  • the encoded data is data sent by the peer device (such as an encoding device) and encoded by any image processing method in the present application.
  • the encoded data is data obtained by the encoding device by encoding the current frame synthetic image based on the difference information
  • the difference information is information obtained by the encoding device by detecting the current frame synthetic image and the previous frame synthetic image
  • the current frame synthetic image is an image synthesized by the encoding device on the auxiliary stream image and its corresponding annotation information.
  • Step S402 decode the encoded data to obtain a decoded image.
  • the decoded image is an image that carries the auxiliary stream image and its corresponding annotation information.
  • the decoding device since the encoded data sent by the encoding device is data encoded by any image processing method in the present application, that is, the encoded data already carries the annotation information and the auxiliary stream image, the decoding device only needs to perform corresponding decoding on the encoded data, thereby ensuring that the decoded image includes the characteristics of the auxiliary stream image and the characteristics of the annotation information corresponding to the auxiliary stream image.
  • the decoding method used by the decoding device to decode the encoded data matches the encoding method of the encoded data to ensure that an accurate decoded image is obtained.
  • the encoding device can use a specific compression technology to encode the current frame synthetic image based on the difference information to obtain encoded data, and the decoding device needs to use the same compression technology to decode the encoded data so that the obtained decoded image can simultaneously include the characteristics of the auxiliary stream image and the characteristics of the annotation information corresponding to the auxiliary stream image.
  • Step S403: superimpose the decoded image on the previous frame synthesized image to generate an image to be displayed.
  • the decoded image includes an auxiliary stream image and annotation information, and the decoded image can reflect the features of the annotation information and the auxiliary stream image.
  • the decoded image is superimposed with the previous frame composite image to generate an image to be displayed, so that the image to be displayed can reflect the annotation information.
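  • a minimal sketch of this superimposition step is shown below, assuming (purely for illustration) that the decoded data arrives as a list of changed row-blocks; the block layout, frame representation, and names are hypothetical, not taken from the patent.

```python
def superimpose(prev_frame, changed_blocks, block_h):
    """Overlay decoded changed blocks onto the cached previous composite frame.

    prev_frame     -- previous frame composite image as a list of pixel rows
    changed_blocks -- list of (block_index, rows) pairs from the decoder
    block_h        -- number of pixel rows per block
    """
    frame = [row[:] for row in prev_frame]        # start from the cached frame
    for block_idx, rows in changed_blocks:
        for i, row in enumerate(rows):            # copy each decoded row in place
            frame[block_idx * block_h + i] = row[:]
    return frame                                  # the image to be displayed
```

Because only the changed blocks are decoded, the previous frame supplies every unchanged region, and the result still reflects the annotation information.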
  • Step S404: display the image to be displayed.
  • before performing step S404 of displaying the image to be displayed, the method further includes: rendering the image to be displayed to obtain a rendered image to be displayed.
  • rendering allows the surface shading effect of the image to be displayed to be reflected intuitively and in real time, showing its texture characteristics and the influence of the light source on it, so that the user can view the rendered image, improving the viewing experience.
  • the invention relates to a method for processing a video image. The encoded data is data sent by the encoding device and encoded by any one of the image processing methods in the present application, which facilitates subsequent processing; the encoded data is decoded to obtain a decoded image carrying the auxiliary stream image and its corresponding annotation information, so that the decoded image can reflect the characteristics of both; the decoded image is superimposed on the previous frame composite image to generate an image to be displayed, and the image to be displayed is displayed, so that it can reflect the annotation information.
  • in step S401, obtaining the encoded data includes: receiving the encoded data through a first channel, wherein the encoded data corresponds to the current frame synthesized image, and the current frame synthesized image is an image synthesized from the auxiliary stream image and its corresponding annotation information; before decoding the encoded data to obtain the decoded image, the method further includes: receiving annotation data corresponding to the annotation information through a second channel.
  • the annotation data is data corresponding to the annotation information.
  • the annotation data can be represented as binary data and is used to represent the annotation information.
  • clarifying the specific meaning of the annotation information corresponding to the auxiliary stream image facilitates subsequent processing of the data to be analyzed and improves data processing efficiency; and by separately processing the data transmitted in different channels, different types of data can be handled, improving the accuracy of data processing.
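  • the binary annotation data on the second channel could, for example, be serialized as a packed point set. The little-endian unsigned 16-bit layout below is an assumed wire format chosen only for illustration; the patent does not define one.

```python
import struct

def pack_points(points):
    """Serialize annotated (x, y) points as unsigned 16-bit little-endian pairs."""
    return b"".join(struct.pack("<HH", x, y) for x, y in points)

def unpack_points(data):
    """Recover the annotated point set from the second-channel payload."""
    return [struct.unpack_from("<HH", data, off) for off in range(0, len(data), 4)]
```

Transmitting the compact point set separately from the encoded video data is what allows the two channels to be processed by different logic on the receiving side.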
  • Fig. 5 is a block diagram of the image processing system provided by the present application. As shown in Fig. 5, a first terminal 510 is connected to a second terminal 520 for communication (eg, communicating via the Internet or a communication network, etc.).
  • the first terminal 510 includes: an image synthesis device 511, a region detection device 512, an encoding module 513 and an auxiliary stream data sending module 514.
  • the second terminal 520 includes: a receiving module 521, a decoding module 522 and an image rendering module 523. The functions of each module can refer to the description in the above embodiment.
  • the image synthesis device 511 can simultaneously obtain the annotation information and the auxiliary stream image, process the annotation information, generate the annotation image, and then synthesize the annotation image with the auxiliary stream image to generate the current frame synthesized image, so that the current frame synthesized image can simultaneously reflect the image features of the auxiliary stream image and the features corresponding to the annotation information.
  • the superimposed auxiliary stream image processed by the decoding module 522 can reflect the characteristics of the annotation information to some extent, but the finally obtained image cannot represent those characteristics accurately and clearly.
  • Fig. 6 shows a block diagram of the image processing system provided by the present application. As shown in Fig. 6, a first terminal 610 is connected in communication with a second terminal 620 (eg, communicating via the Internet or a communication network, etc.).
  • the first terminal 610 includes: an image synthesis device 611, a region detection device 612, an encoding module 613, an auxiliary stream data sending module 614 and an annotation information sending module 615.
  • the second terminal 620 includes: a receiving module 621, a decoding module 622, an image rendering module 623 and an annotation information receiving module 624. The functions of each module can refer to the description in the above embodiment.
  • the image synthesis device 611 can simultaneously obtain the annotation information and the auxiliary stream image, process the annotation information, generate the annotation image, and then synthesize the annotation image with the auxiliary stream image to generate the current frame synthesized image, so that the current frame synthesized image can simultaneously reflect the image features of the auxiliary stream image and the features corresponding to the annotation information.
  • annotation information sending module 615 can also obtain annotation information and send the annotation information to the second terminal 620 to facilitate the second terminal 620 to analyze the superimposed auxiliary stream image output by the decoding module 622, so that the image input to the image rendering module 623 can clearly and accurately reflect the characteristics of the annotation information.
  • the terminals can all support decoding of the annotation information, so that the user can obtain the characteristics of the annotation information.
  • Fig. 7 shows a block diagram of the image processing system provided by the present application.
  • the first terminal 710 is connected to the second terminal 720 and the third terminal 730 for communication (eg, communicating via the Internet or a communication network, etc.).
  • the first terminal 710 includes: an image synthesis device 711, a region detection device 712, an encoding module 713, an auxiliary stream data sending module 714, and a label information sending module 715.
  • the second terminal 720 includes: a receiving module 721, a decoding module 722, an image rendering module 723, and a label information receiving module 724.
  • the third terminal 730 includes: a receiving module 731, a decoding module 732, and an image rendering module 733. The functions of each module can refer to the description in the above embodiment.
  • the encoding module 713 is responsible for converting the image output by the area detection device 712 into a suitable encoded data stream for transmission.
  • the auxiliary stream data sending module 714 and the annotation information sending module 715 transmit the image data to the second terminal 720 (or the third terminal 730) through a wired communication network or a wireless communication network (such as an optical network composed of optical fibers).
  • the image data processing may be implemented in the following manner.
  • the image synthesis device 711 obtains the annotation information and the auxiliary stream image, and it can obtain both at the same time; it processes the annotation information to generate the annotation image, and then synthesizes the annotation image with the auxiliary stream image to generate the current frame synthesized image, so that the current frame synthesized image can simultaneously reflect the image features of the auxiliary stream image and the features corresponding to the annotation information.
  • the annotation information is a series of annotated point set data generated by the annotation source.
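  • the synthesis performed by image synthesis device 711 can be sketched as drawing the annotated point set over the auxiliary stream image. The single-channel pixel representation and the `mark` value below are illustrative assumptions, not the patent's actual pixel format.

```python
def synthesize_frame(aux_image, points, mark=255):
    """Draw the annotation point set onto a copy of the auxiliary stream image."""
    frame = [row[:] for row in aux_image]   # keep the original aux image intact
    for x, y in points:
        frame[y][x] = mark                  # annotated pixels overwrite aux pixels
    return frame                            # the current frame composite image
```

The resulting composite frame carries both the auxiliary stream content and the annotation, which is what later lets a terminal without annotation support still display the marks.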
  • the area detection device 712 performs difference detection on different areas of the input current frame composite image to obtain difference information, and inputs both the difference information and the auxiliary stream image into the encoding module 713 for encoding, generates encoded data, and outputs the encoded data to the auxiliary stream data sending module 714, so that the auxiliary stream data sending module 714 sends the obtained encoded data to the second terminal 720 (and/or the third terminal 730) through the communication network, so that the second terminal 720 and/or the third terminal 730 can obtain the encoded data synchronized with the auxiliary stream image and annotation information.
  • the region detection device 712 may obtain a first region image set including a plurality of first region images by dividing the input current frame synthetic image into blocks, and then compare the plurality of first region images with a plurality of second region images cached therein to obtain the changed region information.
  • the plurality of second region images are images obtained by dividing the previous frame synthetic image into blocks by the region detection device 712.
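  • the comparison performed by region detection device 712 might look like the sketch below, which splits both composite frames into row blocks and reports which blocks changed; the block size and frame representation are assumptions for illustration.

```python
def detect_changed_blocks(curr_frame, prev_frame, block_h):
    """Split both composite frames into row blocks and report which differ."""
    changed = []
    for idx in range(len(curr_frame) // block_h):
        curr_block = curr_frame[idx * block_h:(idx + 1) * block_h]
        prev_block = prev_frame[idx * block_h:(idx + 1) * block_h]
        if curr_block != prev_block:        # any pixel difference marks the block
            changed.append(idx)
    return changed                          # the changed-region information
```

Only the blocks listed here need to be encoded and transmitted, which is how comparing cached previous-frame regions limits the range of image superposition.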
  • the second terminal 720 and/or the third terminal 730 will decode the encoded data, but the difference is that the second terminal 720 can also obtain the original annotation information at the same time to facilitate its analysis of the encoded data and obtain accurate auxiliary stream images and annotation information.
  • the annotated image and the auxiliary stream image are superimposed and synthesized, which ensures the content consistency of the current frame synthesized image; by comparing the differences between adjacent image frames, the range of image superposition corresponding to the annotation information can be limited, improving the speed of image synthesis. This meets the needs of users in different application scenarios, improves product competitiveness, and solves the problem of inconsistent interaction with auxiliary stream content between the first terminal 710, which supports annotation, and the third terminal 730, which does not.
  • FIG8 shows a schematic diagram of a display interface for auxiliary stream images provided by the present application.
  • (A) in FIG8 represents a display interface of an auxiliary stream image with annotated information sent by the first terminal 710, or a display interface of an auxiliary stream image with annotated information displayed by the second terminal 720.
  • (B) in FIG. 8 shows a display interface in the prior art in which a terminal displays only an auxiliary stream image (i.e., an auxiliary stream image without annotation information).
  • (C) in FIG. 8 shows a display interface of the auxiliary stream image displayed by the third terminal 730.
  • the display status of the information can be clearly marked, which facilitates user viewing and improves the user experience.
  • Fig. 9 shows a block diagram of the encoding device provided by the present application. As shown in Fig. 9, in one embodiment, the encoding device 900 includes but is not limited to the following modules.
  • the synthesis module 901 is configured to synthesize the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image;
  • the detection module 902 is configured to detect the current frame synthesized image and the previous frame synthesized image to determine the difference information;
  • the encoding module 903 is configured to encode the current frame synthesized image according to the difference information to generate encoded data;
  • the sending module 904 is configured to send the encoded data to the opposite device so that the opposite device processes the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image.
  • the encoding device 900 in this embodiment can implement any image processing method applied to the encoding device in the embodiments of the present application.
  • the synthesis module synthesizes the auxiliary stream image and its corresponding annotation information to generate the current frame synthetic image, which can clarify the information of the terminal annotating the auxiliary stream image;
  • the detection module detects the current frame synthetic image and the previous frame synthetic image to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, thereby improving the interactivity between the terminal and the user;
  • the encoding module encodes the current frame synthetic image according to the difference information to generate encoded data, which speeds up image encoding and reduces encoding energy consumption; and the sending module sends the encoded data to the opposite device, so that the opposite device processes the encoded data and obtains and displays the decoded image including the annotation information corresponding to the auxiliary stream image. The opposite device can thus view the decoded image with the annotation information and display the annotation information more clearly.
  • Fig. 10 shows a block diagram of a decoding device provided by the present application. As shown in Fig. 10, in one embodiment, the decoding device 1000 includes but is not limited to the following modules.
  • the acquisition module 1001 is configured to acquire encoded data, which is data sent by the encoding device after being encoded by any one of the image processing methods in the present application; the decoding module 1002 is configured to decode the encoded data to obtain a decoded image, which carries the auxiliary stream image and its corresponding annotation information; the generation module 1003 is configured to superimpose the decoded image on the previous frame composite image to generate an image to be displayed; the display module 1004 is configured to display the image to be displayed.
  • the decoding device 1000 in this embodiment can implement any image processing method applied to a decoding device in the embodiments of the present application.
  • in the decoding device of the implementation of the present application, the acquisition module obtains the encoded data, which clarifies the processing requirements for the encoded data; the encoded data is data sent by the encoding device and encoded by any one of the image processing methods in the present application, which facilitates subsequent processing. The encoded data is decoded to obtain and display a decoded image carrying the auxiliary stream image and its corresponding annotation information, so that the decoded image can reflect the characteristics of both, which is convenient for users.
  • FIG11 is a block diagram showing a terminal provided by the present application.
  • the terminal 1100 includes but is not limited to the following modules: an encoding device 1101 and/or a decoding device 1102 .
  • (A) in FIG. 11 indicates that the terminal 1100 includes only the encoding device 1101 ; (B) in FIG. 11 indicates that the terminal 1100 includes only the decoding device 1102 ; and (C) in FIG. 11 indicates that the terminal 1100 includes the encoding device 1101 and the decoding device 1102 .
  • the encoding device 1101 is configured to execute any image processing method applied to an encoding device in the embodiments of the present application.
  • the decoding device 1102 is configured to execute any image processing method applied to a decoding device in the embodiments of the present application.
  • the terminal 1100 may be a terminal supporting audio/video conferencing functions (such as a smart phone, etc.), or a tablet computer supporting online teaching (or a personal computer, etc.).
  • the above terminal categories are only examples, and specific settings can be made according to actual needs. Other unspecified terminal categories are also within the scope of protection of this application and will not be repeated here.
  • the auxiliary stream image and its corresponding annotation information are synthesized by the encoding device to generate the current frame synthetic image, which can clarify the information of the terminal on the auxiliary stream image; the current frame synthetic image and the previous frame synthetic image are detected to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, thereby improving the interactivity between the terminal and the user; the auxiliary stream image is encoded according to the difference information to generate encoding data, which can speed up the encoding speed of the image to reduce the energy consumption of encoding.
  • the encoded data and its corresponding annotation information are obtained, which clarifies the processing requirements for the encoded data
  • the encoded data is obtained by the encoding device encoding the auxiliary stream image according to the difference information
  • the difference information is obtained by the encoding device detecting the current frame synthetic image and the previous frame synthetic image, which enables the user to synchronously obtain the difference information between two consecutive frames, thereby improving the interactivity between the terminal and the user
  • the encoded data is decoded to obtain the image to be analyzed, thereby speeding up the processing of the image to be analyzed
  • the image to be analyzed is processed according to the annotation information corresponding to the encoded data to obtain a decoded image, so that the decoded image can reflect the characteristics of the annotation information and the auxiliary stream image, which is convenient for users.
  • Fig. 12 shows a block diagram of the image processing system provided by the present application.
  • the image processing system includes a plurality of terminals connected in communication; wherein the terminals can implement any one of the image processing methods in the embodiments of the present application.
  • the image processing system includes but is not limited to the following devices: at least one transmitting terminal 1201 in communication connection, and at least one first receiving terminal 1202 and/or second receiving terminal 1203 .
  • (A) in Figure 12 indicates that the image processing system includes: a sending terminal 1201 and a first receiving terminal 1202 that are communicatively connected; (B) in Figure 12 indicates that the image processing system includes: a sending terminal 1201 and a second receiving terminal 1203 that are communicatively connected; (C) in Figure 12 indicates that the image processing system includes: a sending terminal 1201, and a first receiving terminal 1202 and a second receiving terminal 1203 that are respectively communicatively connected to the sending terminal 1201.
  • the sending terminal 1201 is configured to execute any one of the image processing methods applied to an encoding device in the embodiments of the present application.
  • the first receiving terminal 1202 is configured to execute any one of the image processing methods applied to a decoding device in the embodiments of the present application.
  • the second receiving terminal 1203 is configured to obtain the encoded data sent by the first terminal, decode the encoded data, obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image, wherein the encoded data is data obtained by the encoding device by encoding the current frame synthetic image according to the difference information, the difference information is information obtained by the encoding device by detecting the current frame synthetic image and the previous frame synthetic image, and the current frame synthetic image is an image synthesized by the encoding device on the auxiliary stream image and its corresponding annotation information.
  • the auxiliary stream image and its corresponding annotation information are synthesized by the sending terminal to generate the current frame synthesized image, which can clearly identify the information annotated by the sending terminal on the auxiliary stream image; the current frame synthesized image and the previous frame synthesized image are detected to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, thereby improving the interactivity between the terminal and the user; the auxiliary stream image is encoded according to the difference information to generate encoded data, which can speed up the encoding speed of the image to reduce the energy consumption of encoding.
  • different receiving terminals can receive the encoded data and process the encoded data to obtain and display the decoded image including the annotation information corresponding to the auxiliary stream image, so that the first receiving terminal and/or the second receiving terminal can view the decoded image with the annotation information, so as to display the annotation information more clearly.
  • FIG. 13 is a block diagram showing an exemplary hardware architecture of a computing device capable of implementing the image processing method and apparatus according to the present application.
  • the computing device 1300 includes an input device 1301, an input interface 1302, a central processing unit 1303, a memory 1304, an output interface 1305, and an output device 1306.
  • the input interface 1302, the central processing unit 1303, the memory 1304, and the output interface 1305 are interconnected via a bus 1307.
  • the input device 1301 and the output device 1306 are connected to the bus 1307 through the input interface 1302 and the output interface 1305 respectively, and are thereby connected to the other components of the computing device 1300.
  • the input device 1301 receives input information from the outside and transmits the input information to the central processing unit 1303 through the input interface 1302; the central processing unit 1303 processes the input information based on the computer executable instructions stored in the memory 1304 to generate output information, temporarily or permanently stores the output information in the memory 1304, and then transmits the output information to the output device 1306 through the output interface 1305; the output device 1306 outputs the output information to the outside of the computing device 1300 for user use.
  • the computing device shown in Figure 13 can be implemented as an electronic device, which may include: a memory configured to store a program; a processor configured to run the program stored in the memory to execute the image processing method described in the above embodiment.
  • the computing device shown in Figure 13 can be implemented as an image processing system, which may include: a memory configured to store a program; a processor configured to run the program stored in the memory to execute the image processing method described in the above embodiment.
  • Embodiments of the present application may be implemented by executing computer program instructions by a data processor of a mobile device, for example in a processor entity, or by hardware, or by a combination of software and hardware.
  • the computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
  • the block diagrams of any logic flow in the drawings of this application may represent program steps, or may represent interconnected logic circuits, modules and functions, or may represent a combination of program steps and logic circuits, modules and functions.
  • the computer program may be stored in a memory.
  • the memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, including but not limited to read-only memory (ROM), random access memory (RAM), and optical storage devices and systems (digital versatile discs (DVDs) or CDs).
  • Computer-readable media may include non-transitory storage media.
  • the data processor may be of any type suitable for the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a processor based on a multi-core processor architecture.


Abstract

Provided in the present application are an image processing method and apparatus, and a terminal. The method comprises: performing synthesis on an auxiliary stream image and labeling information corresponding to same, so as to generate the current frame of synthesized image; performing detection on the current frame of synthesized image and the previous frame of synthesized image, so as to determine difference information; encoding the current frame of synthesized image according to the difference information, so as to generate encoded data; and sending the encoded data to a peer device, such that the peer device processes the encoded data, so as to obtain and display a decoded image comprising the labeling information corresponding to the auxiliary stream image.

Description

图像处理方法、装置和终端Image processing method, device and terminal
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本申请要求2022年10月11日提交给中国专利局的第202211239520.9号专利申请的优先权,其全部内容通过引用合并于此。This application claims priority to patent application No. 202211239520.9 filed with the China Patent Office on October 11, 2022, the entire contents of which are incorporated herein by reference.
技术领域Technical Field
本申请涉及但不限于图像处理技术领域。The present application relates to but is not limited to the field of image processing technology.
背景技术Background technique
目前,在进行视频会议的过程中,终端会通过展示多帧图像的方式,与用户进行人机交互。但是,随着视频会议的发展,用户对实时互动的需求越来越突出。而传统的视频会议仅展示获取到的某帧图像,无法对连续帧之间的差异进行同步更新,降低了终端与用户之间的互动性,无法满足用户的使用需求。At present, during video conferencing, the terminal will interact with the user by displaying multiple frames of images. However, with the development of video conferencing, users' demand for real-time interaction is becoming more and more prominent. Traditional video conferencing only displays a certain frame of image acquired, and cannot synchronously update the difference between consecutive frames, which reduces the interactivity between the terminal and the user and cannot meet the user's usage needs.
发明内容Summary of the invention
本申请提供一种图像处理方法、装置、终端、电子设备和存储介质。The present application provides an image processing method, device, terminal, electronic device and storage medium.
第一方面,本申请提供一种图像处理方法,方法包括:对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像;对当前帧合成图像和前一帧合成图像进行检测,确定差异信息;依据差异信息对当前帧合成图像进行编码,生成编码数据;向对端设备发送编码数据,以使对端设备对编码数据进行处理,获得并显示包括辅流图像对应的标注信息的解码图像。In a first aspect, the present application provides an image processing method, the method comprising: synthesizing an auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image; detecting the current frame synthesized image and the previous frame synthesized image to determine difference information; encoding the current frame synthesized image based on the difference information to generate encoded data; sending the encoded data to a peer device so that the peer device processes the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image.
In a second aspect, the present application provides an image processing method, the method comprising: obtaining encoded data, the encoded data being the data sent by the image processing method of the first aspect; decoding the encoded data to obtain a decoded image, the decoded image being an image carrying an auxiliary stream image and its corresponding annotation information; and displaying the decoded image.
In a third aspect, the present application provides an encoding device, comprising: a synthesis module configured to synthesize an auxiliary stream image and its corresponding annotation information to generate a current-frame composite image; a detection module configured to compare the current-frame composite image with the previous-frame composite image to determine difference information; an encoding module configured to encode the current-frame composite image according to the difference information to generate encoded data; and a sending module configured to send the encoded data to a peer device, so that the peer device processes the encoded data and obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image.
In a fourth aspect, the present application provides a decoding device, comprising: an acquisition module configured to obtain encoded data, the encoded data being the data sent by the image processing method of the first aspect; a decoding module configured to decode the encoded data to obtain a decoded image, the decoded image being an image carrying an auxiliary stream image and its corresponding annotation information; and a display module configured to display the decoded image.
In a fifth aspect, the present application provides a terminal, comprising an encoding device and/or a decoding device; the encoding device is configured to perform the image processing method of the first aspect of the present application, and the decoding device is configured to perform the image processing method of the second aspect of the present application.
In a sixth aspect, the present application provides an image processing system, the system comprising a plurality of terminals in communication connection, each terminal being configured to implement any one of the image processing methods of the present application.
In a seventh aspect, the present application provides an electronic device, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the image processing methods of the present application.
In an eighth aspect, the present application provides a readable storage medium storing a computer program which, when executed by a processor, implements any one of the image processing methods of the present application.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of an image processing method provided in the present application.
FIG. 2 is a schematic flowchart of image processing performed by the image synthesis device provided in the present application.
FIG. 3 is a schematic flowchart of detecting an auxiliary stream image as provided in the present application.
FIG. 4 is a schematic flowchart of an image processing method provided in the present application.
FIG. 5 is a block diagram of an image processing system provided in the present application.
FIG. 6 is a block diagram of an image processing system provided in the present application.
FIG. 7 is a block diagram of an image processing system provided in the present application.
FIG. 8 is a schematic diagram of a display interface for an auxiliary stream image provided in the present application.
FIG. 9 is a block diagram of the encoding device provided in the present application.
FIG. 10 is a block diagram of the decoding device provided in the present application.
FIG. 11 is a block diagram of the terminal provided in the present application.
FIG. 12 is a block diagram of an image processing system provided in the present application.
FIG. 13 is a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the image processing method and apparatus according to the present application.
DETAILED DESCRIPTION
To make the purpose, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the implementations of the present application and the features within them may be combined with one another arbitrarily.
FIG. 1 is a schematic flowchart of an image processing method provided in the present application. The method may be applied to an encoding device. As shown in FIG. 1, the image processing method of the present application includes, but is not limited to, the following steps S101 to S104.
Step S101: synthesize an auxiliary stream image and its corresponding annotation information to generate a current-frame composite image.
Step S102: compare the current-frame composite image with the previous-frame composite image to determine difference information.
Here, the previous-frame composite image is an image generated by synthesizing the previous frame of the auxiliary stream image with the annotation information corresponding to that previous frame.
Step S103: encode the current-frame composite image according to the difference information to generate encoded data.
Step S104: send the encoded data to a peer device, so that the peer device processes the encoded data and obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image.
Here, the peer device is a device capable of processing the encoded data and obtaining and displaying a decoded image that includes the annotation information corresponding to the auxiliary stream image. For example, the peer device may be a decoding device, a receiving terminal, or similar equipment; the peer device may be chosen according to the actual application scenario, and other peer devices not described here also fall within the scope of protection of the present application and are not enumerated further.
In the present application, synthesizing the auxiliary stream image with its corresponding annotation information to generate the current-frame composite image makes explicit the information with which the terminal annotates the auxiliary stream image. Comparing the current-frame composite image with the previous-frame composite image to determine difference information allows the user to obtain the difference between two consecutive frames synchronously, improving the interactivity between the terminal and the user. Encoding the current-frame composite image according to the difference information speeds up encoding and reduces its energy consumption. Furthermore, sending the encoded data to the peer device, so that the peer device processes it and obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image, lets the peer device view the decoded image together with the annotation information and display that information clearly and unambiguously.
In some implementations, synthesizing the auxiliary stream image and its corresponding annotation information to generate the current-frame composite image in step S101 may be implemented as follows: obtaining, on the basis of multiple supported frame rates, the annotation information corresponding to the auxiliary stream image; processing that annotation information according to a preset container and a preset image format to generate an annotation image; and integrating the auxiliary stream image with the annotation image to generate the current-frame composite image.
The annotation information corresponding to the auxiliary stream image may be information presented in the form of point-set data at any of multiple frame rates. The frame rate is the number of frames or images projected or displayed per second; it mainly refers to the number of image frames played per second in the synchronized audio and/or video of film, television, or video content. For example, the frame rate may be 120 frames per second, or 24 frames per second (or 25 or 30 frames per second), and so on.
Acquiring the annotation information corresponding to the auxiliary stream image at a variety of frame rates makes the real-time changes of the auxiliary stream image explicit. The annotation information is then processed according to a preset container (e.g., a bitmap container) and a preset image format (e.g., an image format in the Red Green Blue Alpha (RGBA) color space, a YUV image format, etc.), so that the resulting annotation image better reflects real-time change characteristics and meets the user's real-time requirements.
In the YUV image format, "Y" denotes luminance (Luma), that is, the grayscale value, while "U" and "V" denote chrominance (Chroma), which describes the color and saturation of the image and is used to specify the color of a pixel.
Further, the auxiliary stream image and the annotation image are integrated (for example, by superimposition or by differential synthesis) to generate the current-frame composite image, which facilitates subsequent processing and improves image-processing efficiency.
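The data-conversion step described above, rendering point-set annotation data into a preset container in a preset image format, can be sketched as follows. This is a minimal, hypothetical illustration: the point-set layout (a list of (x, y) coordinates) and the single-color RGBA rendering are assumptions for clarity, not the application's actual format.

```python
def points_to_rgba_bitmap(points, width, height, color=(255, 0, 0, 255)):
    """Render annotation point-set data into an RGBA bitmap container.

    Assumed layout: `points` is a list of (x, y) pixel coordinates.
    Pixels not covered by an annotation point stay fully transparent
    (alpha = 0), so the bitmap can later be overlaid on a background.
    """
    transparent = (0, 0, 0, 0)
    bitmap = [[transparent] * width for _ in range(height)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            bitmap[y][x] = color
    return bitmap
```

An equivalent conversion targeting a YUV output format would differ only in the per-pixel representation, not in the overall container-filling logic.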
In some implementations, integrating the auxiliary stream image with the annotation image to generate the current-frame composite image includes: performing image format conversion on the auxiliary stream image and the annotation image respectively to obtain a converted image set; scaling each image in the converted image set according to a preset image resolution to obtain a scaled image set; synchronizing the images in the scaled image set according to a preset frame rate to obtain a processed auxiliary stream image and a processed annotation image; and superimposing the processed auxiliary stream image and the processed annotation image to generate the current-frame composite image.
Processing the auxiliary stream image and the annotation image at multiple levels and in different dimensions makes the processed images easier to superimpose, guarantees the accuracy of the superimposed image, and improves image-processing efficiency.
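Of the steps just listed, the scaling step can be illustrated with a simple nearest-neighbour resize. This is a sketch under the assumption that images are row-major lists of pixel rows; the device's actual scaling algorithm is not specified in the text.

```python
def scale_to_preset(image, preset_w, preset_h):
    """Nearest-neighbour scaling of a row-major image to the preset
    resolution, so that the auxiliary stream image and the annotation
    image end up with the same width and height before superimposition."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // preset_h][x * src_w // preset_w]
         for x in range(preset_w)]
        for y in range(preset_h)
    ]
```

Both images would be passed through the same function with the same preset resolution, which is what guarantees matching dimensions in the later overlay step.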
For example, FIG. 2 is a schematic flowchart of image processing performed by the image synthesis device provided in the present application. As shown in FIG. 2, the image synthesis device 200 includes, but is not limited to, the following modules: an annotation collector 201, a data conversion module 202, an auxiliary stream image acquisition module 203, an image format conversion module 204, an image scaling module 205, a frame rate synchronization module 206, and an image superimposition module 207.
The annotation collector 201 is configured to collect annotation information; it supports collection at multiple frame rates, and either actively obtains annotation information presented as point-set data or passively receives point-set data pushed by an annotation source.
The auxiliary stream image acquisition module 203 is configured to collect auxiliary stream images; it supports collection at multiple frame rates and in multiple image formats, and can either actively obtain auxiliary stream images or passively receive pushed auxiliary stream image data.
The data conversion module 202 is configured to process the point-set data, for example converting it, on the basis of a preset container such as a bitmap (BitMap), into an annotation image suitable for synthesis; the annotation image supports output in preset image formats such as the RGBA color space format and the YUV image format.
The image format conversion module 204 is configured to convert the auxiliary stream image and the annotation image into the same image format, to avoid image synthesis failures caused by mismatched formats.
The image scaling module 205 is configured to stretch the auxiliary stream image and the annotation image to the same image resolution according to a preset image resolution; it is used in scenarios where the resolutions of the auxiliary stream image, the annotation image, and the target image are inconsistent.
The frame rate synchronization module 206 is configured to synchronize the acquisition rates of the auxiliary stream image and the annotation image according to a preset frame rate, controlling the rate of the resulting current-frame composite images by dropping and/or inserting frames, thereby reducing the data-processing load of the image synthesis device 200 and improving the efficiency and stability of image synthesis.
For example, the image synthesis device 200 may process the input auxiliary stream image and annotation information as follows.
First, the auxiliary stream image acquisition module 203 collects the auxiliary stream image, and the annotation collector 201 collects the annotation information. The data conversion module 202 then processes the annotation information corresponding to the auxiliary stream image according to the preset container and the preset image format to generate an annotation image.
The image format conversion module 204 performs image format conversion on the auxiliary stream image and the annotation image respectively to obtain a converted image set, which includes the format-converted auxiliary stream image and the format-converted annotation image.
The image scaling module 205 then scales the format-converted auxiliary stream image and the format-converted annotation image: for example, the resolution of the format-converted auxiliary stream image is adjusted according to the preset image resolution to obtain a scaled auxiliary stream image, and the format-converted annotation image is adjusted according to the preset image resolution to obtain a scaled annotation image. This guarantees that the scaled annotation image and the scaled auxiliary stream image both have the preset image resolution, facilitating subsequent processing.
Further, the frame rate synchronization module 206 synchronizes the scaled annotation image and the scaled auxiliary stream image, so as to obtain a processed auxiliary stream image and a processed annotation image with the same frame rate (for example, both at the preset frame rate).
For example, synchronizing the images in the scaled image set according to the preset frame rate to obtain the processed auxiliary stream image and the processed annotation image includes: when the actual frame rate of the images in the scaled image set is determined to be higher than the preset frame rate, dropping frames from the images by sampling to obtain the processed auxiliary stream image and the processed annotation image; and when the actual frame rate is determined to be lower than the preset frame rate, processing the images by internal frame insertion to obtain the processed auxiliary stream image and the processed annotation image.
Processing the auxiliary stream image and the annotation image with these different frame-rate synchronization methods increases the success rate of the superimposition step and improves image-processing efficiency.
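The drop-frame / insert-frame behaviour described above can be sketched as a simple resampler. Duplicating frames stands in here for "internal frame insertion", which in practice could be a more elaborate interpolation; the one-second frame window is an assumption of this sketch.

```python
def sync_frame_rate(frames, actual_fps, preset_fps):
    """Resample a one-second frame sequence to the preset frame rate.

    actual_fps > preset_fps: frames are dropped by sampling;
    actual_fps < preset_fps: frames are duplicated (a crude stand-in
    for frame insertion); equal rates pass through unchanged.
    """
    if actual_fps == preset_fps:
        return list(frames)
    n_out = max(1, round(len(frames) * preset_fps / actual_fps))
    return [frames[i * len(frames) // n_out] for i in range(n_out)]
```

Applying the same resampler to both the auxiliary stream sequence and the annotation sequence yields the matched frame rates that the superimposition step relies on.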
Finally, the image superimposition module 207 superimposes the processed auxiliary stream image and the processed annotation image to generate the current-frame composite image.
In some implementations, superimposing the processed auxiliary stream image and the processed annotation image to generate the current-frame composite image includes: using the processed auxiliary stream image as the background image, and superimposing the annotation features of the processed annotation image onto the processed auxiliary stream image to obtain the current-frame composite image.
For example, by analyzing the Alpha component of the processed annotation image, the non-annotated parts of the annotation image are rendered fully transparent, so that the annotation features of the processed annotation image are obtained. These annotation features are then superimposed onto the processed auxiliary stream image to obtain the current-frame composite image. The current-frame composite image therefore carries both the annotation features of the processed annotation image and the image features of the processed auxiliary stream image, enriching its content.
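A minimal sketch of this alpha-based superimposition, assuming both images are equally sized row-major grids of RGBA tuples: a non-zero alpha in the annotation image marks an annotation feature, and every other pixel lets the auxiliary-stream background show through. Real compositors would typically blend partial alpha values rather than apply this hard threshold.

```python
def overlay_annotation(aux_image, annot_image):
    """Superimpose annotation features onto the auxiliary-stream background.

    A pixel of the annotation image counts as an annotation feature when
    its alpha component (index 3) is non-zero; fully transparent pixels
    keep the background pixel instead.
    """
    return [
        [ann if ann[3] > 0 else bg for ann, bg in zip(annot_row, aux_row)]
        for annot_row, aux_row in zip(annot_image, aux_image)
    ]
```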
In some implementations, superimposing the processed auxiliary stream image and the processed annotation image to generate the current-frame composite image includes: processing the processed auxiliary stream image according to preset transparency information to obtain image features of the processed auxiliary stream image, the image features matching the annotation information; and using the processed annotation image as the background image, superimposing the image features of the processed auxiliary stream image onto the processed annotation image to obtain the current-frame composite image.
Processing the processed auxiliary stream image according to the preset transparency information yields image features of the processed auxiliary stream image that match the annotation information and can therefore characterize it. Superimposing these image features onto the processed annotation image then gives a current-frame composite image that carries both the annotation features of the processed annotation image and the image features of the processed auxiliary stream image.
In some implementations, comparing the current-frame composite image with the previous-frame composite image to determine difference information in step S102 may be implemented as follows: partitioning the current-frame composite image and the pre-stored previous-frame composite image according to a preset size, to obtain a first region image set corresponding to the current-frame composite image and a second region image set corresponding to the previous-frame composite image; and comparing, region by region, the first region images with the second region images to obtain the difference information.
The first region image set includes a plurality of first region images, and the second region image set includes a plurality of second region images.
It should be noted that the preset size may be a predefined minimum size for partitioning the image into regions or blocks. For example, with a preset size of 16×16, the current-frame composite image can be divided into a plurality of 16×16 first region images, and the previous-frame composite image can likewise be divided into a plurality of 16×16 second region images. This divides the images at a fine granularity and highlights the differences between them.
Moreover, the number of regions in the first region image set equals the number of regions in the second region image set, which makes it convenient to compare the region images of the two sets block by block and makes the obtained difference information more accurate.
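The partition-and-compare step can be sketched as follows. The 16×16 default mirrors the example above; a block is flagged as a difference region whenever any of its pixels differ, whereas a production implementation might instead compare block checksums or apply a threshold.

```python
def partition(image, block=16):
    """Split a row-major image into {(bx, by): block_pixels} regions."""
    h, w = len(image), len(image[0])
    return {
        (bx, by): [row[bx * block:(bx + 1) * block]
                   for row in image[by * block:(by + 1) * block]]
        for by in range((h + block - 1) // block)
        for bx in range((w + block - 1) // block)
    }

def diff_regions(curr, prev, block=16):
    """Compare current and previous composite frames block by block and
    return the coordinates of every region whose contents differ."""
    curr_blocks = partition(curr, block)
    prev_blocks = partition(prev, block)
    return sorted(pos for pos in curr_blocks
                  if curr_blocks[pos] != prev_blocks.get(pos))
```

An empty result means the two frames are identical, which corresponds to the skip-frame case discussed below.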
In some implementations, the difference information includes at least one difference region. Encoding the current-frame composite image according to the difference information to generate encoded data includes: determining difference contour information according to the at least one difference region; cropping the current-frame composite image according to the difference contour information to obtain a changed-area image; and encoding the changed-area image to generate the encoded data.
A difference region characterizes an image region in which the image features of a first region image differ from those of the corresponding second region image; it accurately measures the difference between the two frames and facilitates processing of the current-frame composite image.
For example, the difference regions are merged over their maximum extent to obtain the difference contour information, which delimits the boundary of the differing image area. The current-frame composite image is then cropped on the basis of the difference contour information to obtain a changed-area image that contains only the difference information and reflects how the image has changed.
Encoding only the changed-area image lets the encoded data reflect the difference between the two consecutive frames and speeds up the encoding of the current-frame composite image.
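Merging the difference regions into one enclosing contour and cropping can be sketched as below. Using a circumscribed rectangle as the contour is an assumption consistent with merging the regions over their maximum extent; coordinates are block indices on input and pixel coordinates on output.

```python
def difference_contour(regions, block=16):
    """Merge difference-region coordinates into one enclosing rectangle,
    returned as pixel coordinates (x0, y0, x1, y1)."""
    xs = [bx for bx, _ in regions]
    ys = [by for _, by in regions]
    return (min(xs) * block, min(ys) * block,
            (max(xs) + 1) * block, (max(ys) + 1) * block)

def crop_changed_area(image, contour):
    """Cut the changed-area image out of the current composite frame."""
    x0, y0, x1, y1 = contour
    return [row[x0:x1] for row in image[y0:y1]]
```

Only the cropped region then needs to be handed to the encoder, which is the source of the speed-up described above.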
For example, FIG. 3 is a schematic flowchart of detecting an auxiliary stream image as provided in the present application. As shown in FIG. 3, the input to the region detection device 300 is an annotated auxiliary stream image F1, which is the processed image obtained from the image synthesis device 200 and reflects the features of both the auxiliary stream image and the annotation image.
After obtaining the annotated auxiliary stream image F1, the region detection device 300 partitions it into blocks (or regions) to obtain a first region image set comprising a plurality of first region images.
Partitioning the annotated auxiliary stream image F1 into blocks (or regions) exposes its local information, which facilitates the subsequent comparison of the features of different local images and thus the detection of changed regions.
The region detection device 300 also pre-stores a second region image set comprising a plurality of second region images. This set was obtained by partitioning the previous-frame composite image into blocks (or regions) and reflects the image features of the different regions of the previous-frame composite image.
Further, by comparing the second region images of the second region image set block by block (or region by region) with the first region images of the first region image set, difference information (for example, differing feature information within a given region) is obtained.
It should be noted that if a difference is found in the image of a certain region, the image block of that region is cached and the region in which it lies is recorded. Caching the image blocks may proceed on multiple threads in parallel, or may be done by scanning the differing image blocks line by line.
By storing the differing image blocks and then integrating them, the contour of the differing image area can be extracted (for example, by extracting the circumscribed rectangle of the image blocks). Based on that contour, the image inside it is cropped to generate a difference image corresponding to the difference information. Both the difference image and the annotated auxiliary stream image F1 are fed to the encoding module 310 for encoding, so that the encoded data is obtained quickly and accurately.
If it is determined that there is no difference between the first region image set and the second region image set, no contour extraction is needed, and processing of that auxiliary stream frame is simply skipped.
Comparing the cached previous-frame composite image with the auxiliary stream image frame by frame, extracting the contour of the changed image area, and cropping the changed-area image inside that contour yields the difference information and improves the accuracy with which changes in the auxiliary stream image are judged.
In some implementations, after comparing the current-frame composite image with the previous-frame composite image to determine the difference information, the method further includes: skipping the current-frame composite image when the difference information indicates that there is no difference between the current-frame composite image and the previous-frame composite image.
It should be noted that when there is no difference between the current-frame composite image and the previous-frame composite image, the two frames are identical; the current frame therefore needs no processing and can simply be skipped, which speeds up image processing.
In some implementations, sending the encoded data to the peer device includes: sending the encoded data to the peer device through a first channel. After sending the encoded data to the peer device, the method further includes: sending annotation data corresponding to the annotation information to the peer device through a second channel.
The annotation data corresponding to the annotation information may be the annotation information packetized so as to comply with the transmission rules of the second channel. For example, the annotation information is represented as binary data, and a packet header (e.g., one carrying information such as the network address of the peer device) is prepended to the binary data, thereby obtaining the annotation data corresponding to the annotation information.
Sending the encoded data and the annotation data corresponding to the annotation information to the peer device through different transmission channels (e.g., the first channel and the second channel) makes it easier for the peer device to handle the different kinds of data, so that it can analyze and process the received encoded data more quickly, improving data-processing efficiency.
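One way the second-channel packetization described above might look is sketched below. The header layout (big-endian address length and payload length, followed by the peer's network address) and the JSON serialization of the point set are assumptions made for illustration, since the text does not fix a wire format.

```python
import json
import struct


def pack_annotation(points, peer_addr):
    """Packetize annotation point-set data for the second channel:
    a header carrying the peer network address and the payload length,
    followed by the points serialized as JSON (the binary annotation data)."""
    payload = json.dumps(points).encode("utf-8")
    addr = peer_addr.encode("utf-8")
    header = struct.pack("!HI", len(addr), len(payload)) + addr
    return header + payload


def unpack_annotation(packet):
    """Inverse operation, as the peer device would apply it."""
    addr_len, payload_len = struct.unpack("!HI", packet[:6])
    addr = packet[6:6 + addr_len].decode("utf-8")
    payload = packet[6 + addr_len:6 + addr_len + payload_len]
    return addr, json.loads(payload)
```

Because the annotation data travels separately from the encoded video, the peer can parse it with this lightweight inverse step without touching the video decoder.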
图4示出本申请提供的图像处理方法的流程示意图。该方法可应用于解码装置。如图4所示,本申请实施方式中的图像处理方法包括但不限于以下步骤S401至S404。FIG4 is a flow chart of the image processing method provided by the present application. The method can be applied to a decoding device. As shown in FIG4 , the image processing method in the embodiment of the present application includes but is not limited to the following steps S401 to S404.
步骤S401,获取编码数据。Step S401, obtaining encoded data.
其中,编码数据是对端设备(如编码装置)发送的经过其采用本申请中任意一种图像处理方法进行编码的数据。The encoded data is data sent by the peer device (such as an encoding device) that the peer device has encoded using any of the image processing methods in this application.
例如,编码数据是编码装置依据差异信息对当前帧合成图像进行编码获得的数据,差异信息是编码装置对当前帧合成图像和前一帧合成图像进行检测获得的信息,当前帧合成图像是编码装置对辅流图像及其对应的标注信息进行合成的图像。For example, the encoded data is data obtained by the encoding device by encoding the current frame synthetic image based on the difference information, the difference information is information obtained by the encoding device by detecting the current frame synthetic image and the previous frame synthetic image, and the current frame synthetic image is an image synthesized by the encoding device on the auxiliary stream image and its corresponding annotation information.
步骤S402,对编码数据进行解码,获得解码图像。Step S402: decode the encoded data to obtain a decoded image.
其中,解码图像为携带有辅流图像及其对应的标注信息的图像。The decoded image is an image that carries the auxiliary stream image and its corresponding annotation information.
需要说明的是,由于编码装置发送的编码数据是经过其采用本申请中任意一种图像处理方法进行编码的数据,即编码数据已经携带了标注信息和辅流图像,因此,解码装置只需针对该编码数据做对应的解码即可,从而能够保证解码图像包括辅流图像的特征,以及该辅流图像对应的标注信息的特征。解码装置对编码数据进行解码的解码方式与编码数据的编码方式相匹配,以保证获取到准确的解码图像。It should be noted that, since the encoded data sent by the encoding device is data encoded by any image processing method in the present application, that is, the encoded data already carries the annotation information and the auxiliary stream image, the decoding device only needs to perform corresponding decoding on the encoded data, thereby ensuring that the decoded image includes the characteristics of the auxiliary stream image and the characteristics of the annotation information corresponding to the auxiliary stream image. The decoding method used by the decoding device to decode the encoded data matches the encoding method of the encoded data to ensure that an accurate decoded image is obtained.
例如,编码装置可以采用某个特定的压缩技术,依据差异信息对当前帧合成图像进行编码,获得编码数据,则解码装置需要采用相同的压缩技术对编码数据进行解码,以使获取到的解码图像能够同时包括辅流图像的特征,以及该辅流图像对应的标注信息的特征。For example, the encoding device can use a specific compression technology to encode the current frame synthetic image based on the difference information to obtain encoded data, and the decoding device needs to use the same compression technology to decode the encoded data so that the obtained decoded image can simultaneously include the characteristics of the auxiliary stream image and the characteristics of the annotation information corresponding to the auxiliary stream image.
步骤S403,将解码图像和前一帧合成图像进行叠加,生成待显示图像。Step S403: superimpose the decoded image and the previous frame of synthesized image to generate an image to be displayed.
其中,解码图像包括辅流图像和标注信息,该解码图像能够体现标注信息和辅流图像的特征,通过将解码图像和前一帧合成图像进行叠加,从而生成待显示图像,以使待显示图像能够体现标注信息。The decoded image includes an auxiliary stream image and annotation information, and the decoded image can reflect the features of the annotation information and the auxiliary stream image. The decoded image is superimposed with the previous frame composite image to generate an image to be displayed, so that the image to be displayed can reflect the annotation information.
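The superposition of the decoded image onto the previous frame composite image can be sketched as follows. Representing images as 2-D lists and marking pixels the decoded image does not cover with `None` are assumptions for illustration; the application does not specify how uncovered regions are represented.

```python
def overlay(decoded, previous):
    # Pixels carried by the decoded image overwrite the corresponding
    # pixels of the previous composite frame; None marks positions the
    # decoded image does not cover.
    return [
        [d if d is not None else p for d, p in zip(drow, prow)]
        for drow, prow in zip(decoded, previous)
    ]

previous = [[1, 1], [1, 1]]
decoded = [[None, 9], [None, None]]  # only one pixel changed
display = overlay(decoded, previous)
```

The generated image to be displayed keeps the unchanged background while reflecting the newly decoded annotation content.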
步骤S404,显示待显示图像。Step S404: display the image to be displayed.
在一些实施方式中,在执行步骤S404中的显示待显示图像之前,所述方法还包括:对待显示图像进行渲染,获得渲染后的待显示图像。In some implementations, before performing step S404 of displaying the image to be displayed, the method further includes: rendering the image to be displayed to obtain a rendered image to be displayed.
通过对待显示图像进行渲染,能够直观、实时的体现对待显示图像的表面着色效果,从而显示出待显示图像的纹理特征,以及光源对待显示图像的影响效果,从而使用户也可观看到渲染后的待显示图像,提升用户的观看感受。By rendering the image to be displayed, the surface shading effect of the image to be displayed can be reflected intuitively and in real time, thereby showing the texture characteristics of the image to be displayed and the influence of the light source on the image to be displayed, so that the user can also view the rendered image to be displayed, thereby improving the user's viewing experience.
在本实施方式中,通过获取编码数据,能够明确对编码数据的处理需求,其中的编码数据是如编码装置发送的经过其采用本申请中任意一种图像处理方法进行编码的数据,便于后续处理;对编码数据进行解码,获得并显示解码图像,该解码图像为携带有辅流图像及其对应的标注信息的图像,以使该解码图像能够体现标注信息和辅流图像的特征;将解码图像和前一帧合成图像进行叠加,生成待显示图像,并显示该待显示图像,能够使待显示图像体现标注信息。In this embodiment, obtaining the encoded data makes the processing requirements for the encoded data clear; the encoded data is data sent by, for example, an encoding device that has encoded it using any of the image processing methods in this application, which facilitates subsequent processing. The encoded data is decoded to obtain and display a decoded image, which carries the auxiliary stream image and its corresponding annotation information, so that the decoded image reflects the features of both the annotation information and the auxiliary stream image. The decoded image is superimposed with the previous frame composite image to generate an image to be displayed, and the image to be displayed is displayed, so that it reflects the annotation information.
在一些实施方式中,步骤S401中的获取编码数据,包括:通过第一通道接收编码数据,其中,编码数据为与当前帧合成图像对应的数据,当前帧合成图像为对辅流图像及其对应的标注信息进行合成的图像;对编码数据进行解码,获得解码图像之前,所述方法还包括:通过第二通道接收与标注信息对应的标注数据。In some embodiments, obtaining the encoded data in step S401 includes: receiving the encoded data through a first channel, wherein the encoded data is data corresponding to the current frame composite image, and the current frame composite image is an image obtained by synthesizing the auxiliary stream image and its corresponding annotation information; before decoding the encoded data to obtain the decoded image, the method further includes: receiving annotation data corresponding to the annotation information through a second channel.
其中,标注数据是与标注信息对应的数据,例如,该标注数据可以采用二进制数据表示,用于表征信息的数据。The annotation data is data corresponding to the annotation information. For example, the annotation data can be represented by binary data and is used to represent the information.
通过对第二通道接收到的与标注信息对应的标注数据进行解析,能够明确与辅流图像对应的标注信息的具体含义,以便于后续对待分析数据进行处理,提升数据处理效率;并且,通过对不同的通道中传输的数据进行分别处理,能够针对不同类型的数据进行处理,提升对数据的处理准确性。By parsing the annotation data corresponding to the annotation information received by the second channel, the specific meaning of the annotation information corresponding to the auxiliary stream image can be clarified, so as to facilitate the subsequent processing of the data to be analyzed and improve the data processing efficiency; and by separately processing the data transmitted in different channels, different types of data can be processed to improve the accuracy of data processing.
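A minimal sketch of parsing the annotation data received on the second channel follows, assuming an illustrative header layout (4-byte peer IPv4 address, 2-byte port, 4-byte payload length); the actual packet format is not fixed by this application, so the sample packet is constructed inline with the same assumed layout.

```python
import json
import socket
import struct

def parse_annotation(packet):
    """Split an annotation packet into its assumed header fields and payload."""
    ip = socket.inet_ntoa(packet[:4])                    # peer network address
    port, length = struct.unpack("!HI", packet[4:10])    # port and payload length
    annotation = json.loads(packet[10:10 + length].decode("utf-8"))
    return ip, port, annotation

# Build a sample packet with the same assumed layout, then parse it back.
payload = json.dumps({"points": [[5, 6]]}).encode("utf-8")
sample = socket.inet_aton("192.0.2.7") + struct.pack("!HI", 9000, len(payload)) + payload
ip, port, annotation = parse_annotation(sample)
```

After parsing, the decoding side has the specific meaning of the annotation information available for subsequent analysis of the auxiliary stream image.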
图5示出本申请提供的图像处理系统的组成方框图。如图5所示,第一终端510与第二终端520通信连接(如,通过互联网或通信网络进行通信等)。Fig. 5 is a block diagram of the image processing system provided by the present application. As shown in Fig. 5, a first terminal 510 is connected to a second terminal 520 for communication (eg, communicating via the Internet or a communication network, etc.).
其中,第一终端510包括:图像合成装置511、区域检测装置512、编码模块513和辅流数据发送模块514。第二终端520包括:接收模块521、解码模块522和图像渲染模块523。各模块的作用可以参照上述实施方式中的描述。The first terminal 510 includes: an image synthesis device 511, a region detection device 512, an encoding module 513 and an auxiliary stream data sending module 514. The second terminal 520 includes: a receiving module 521, a decoding module 522 and an image rendering module 523. The functions of each module can refer to the description in the above embodiment.
其中,图像合成装置511能够同时获取标注信息和辅流图像,并对标注信息进行处理,生成标注图像,然后将该标注图像与辅流图像进行合成,生成当前帧合成图像。以使该当前帧合成图像能够同时体现辅流图像的图像特征,以及标注信息对应的特征。The image synthesis device 511 can simultaneously obtain the annotation information and the auxiliary stream image, process the annotation information, generate the annotation image, and then synthesize the annotation image with the auxiliary stream image to generate the current frame synthesized image, so that the current frame synthesized image can simultaneously reflect the image features of the auxiliary stream image and the features corresponding to the annotation information.
由于第二终端520仅能进行常规的图像解码,因此,经过其解码模块522处理的叠加后的辅流图像,也能体现标注信息的特征,但会使最终获得的图像不能准确清晰地表征标注信息的特征。Since the second terminal 520 can only perform conventional image decoding, the superimposed auxiliary stream image processed by its decoding module 522 can still reflect the features of the annotation information, but the finally obtained image cannot represent those features accurately and clearly.
图6示出本申请提供的图像处理系统的组成方框图。如图6所示,第一终端610与第二终端620通信连接(如,通过互联网或通信网络进行通信等)。Fig. 6 shows a block diagram of the image processing system provided by the present application. As shown in Fig. 6, a first terminal 610 is connected in communication with a second terminal 620 (eg, communicating via the Internet or a communication network, etc.).
其中,第一终端610包括:图像合成装置611、区域检测装置612、编码模块613、辅流数据发送模块614和标注信息发送模块615。第二终端620包括:接收模块621、解码模块622、图像渲染模块623和标注信息接收模块624。各模块的作用可以参照上述实施方式中的描述。The first terminal 610 includes: an image synthesis device 611, a region detection device 612, an encoding module 613, an auxiliary stream data sending module 614 and an annotation information sending module 615. The second terminal 620 includes: a receiving module 621, a decoding module 622, an image rendering module 623 and an annotation information receiving module 624. The functions of each module can refer to the description in the above embodiment.
其中,图像合成装置611能够同时获取标注信息和辅流图像,并对标注信息进行处理,生成标注图像,然后将该标注图像与辅流图像进行合成,生成当前帧合成图像。以使该当前帧合成图像能够同时体现辅流图像的图像特征,以及标注信息对应的特征。The image synthesis device 611 can simultaneously obtain the annotation information and the auxiliary stream image, process the annotation information, generate the annotation image, and then synthesize the annotation image with the auxiliary stream image to generate the current frame synthesized image, so that the current frame synthesized image can simultaneously reflect the image features of the auxiliary stream image and the features corresponding to the annotation information.
需要说明的是,标注信息发送模块615也可以获取标注信息,并将该标注信息发送至第二终端620,以方便第二终端620对解码模块622输出的叠加后的辅流图像的分析,从而使输入至图像渲染模块623的图像能够清晰准确的体现标注信息的特征。It should be noted that the annotation information sending module 615 can also obtain annotation information and send the annotation information to the second terminal 620 to facilitate the second terminal 620 to analyze the superimposed auxiliary stream image output by the decoding module 622, so that the image input to the image rendering module 623 can clearly and accurately reflect the characteristics of the annotation information.
在不同类型的终端之间进行通信时,终端都能够支持对标注信息的解码,使用户获得标注信息的特征。When communicating between different types of terminals, the terminals can all support decoding of the annotation information, so that the user can obtain the characteristics of the annotation information.
例如,图7示出本申请提供的图像处理系统的组成方框图。如图7所示,第一终端710分别与第二终端720和第三终端730通信连接(如,通过互联网或通信网络进行通信等)。For example, Fig. 7 shows a block diagram of the image processing system provided by the present application. As shown in Fig. 7, the first terminal 710 is connected to the second terminal 720 and the third terminal 730 for communication (eg, communicating via the Internet or a communication network, etc.).
其中,第一终端710包括:图像合成装置711、区域检测装置712、编码模块713、辅流数据发送模块714和标注信息发送模块715。第二终端720包括:接收模块721、解码模块722、图像渲染模块723和标注信息接收模块724。第三终端730包括:接收模块731、解码模块732和图像渲染模块733。各模块的作用可以参照上述实施方式中的描述。The first terminal 710 includes: an image synthesis device 711, a region detection device 712, an encoding module 713, an auxiliary stream data sending module 714, and a label information sending module 715. The second terminal 720 includes: a receiving module 721, a decoding module 722, an image rendering module 723, and a label information receiving module 724. The third terminal 730 includes: a receiving module 731, a decoding module 732, and an image rendering module 733. The functions of each module can refer to the description in the above embodiment.
编码模块713负责将区域检测装置712输出的图像转换为适合网络传输的压缩格式(例如,H.264格式等)。辅流数据发送模块714和标注信息发送模块715分别通过有线通信网络(如,光纤构成的光网络等)或无线通信网络将图像数据传输至第二终端720(或,第三终端730)。The encoding module 713 is responsible for converting the image output by the region detection device 712 into a compressed format suitable for network transmission (for example, the H.264 format). The auxiliary stream data sending module 714 and the annotation information sending module 715 each transmit the image data to the second terminal 720 (or the third terminal 730) through a wired communication network (such as an optical network composed of optical fibers) or a wireless communication network.
例如,可采用如下方式实现图像数据的处理。For example, the image data processing may be implemented in the following manner.
图像合成装置711能够同时获取标注信息和辅流图像,并对标注信息进行处理,生成标注图像,然后将该标注图像与辅流图像进行合成,生成当前帧合成图像,以使该当前帧合成图像能够同时体现辅流图像的图像特征,以及标注信息对应的特征。其中,标注信息是标注源产生的一系列标注的点集数据。The image synthesis device 711 can simultaneously obtain the annotation information and the auxiliary stream image, process the annotation information to generate an annotation image, and then synthesize the annotation image with the auxiliary stream image to generate the current frame composite image, so that the current frame composite image reflects both the image features of the auxiliary stream image and the features corresponding to the annotation information. The annotation information is a series of annotated point set data generated by an annotation source.
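The synthesis of the annotated point set data with the auxiliary stream image can be sketched as follows. Representing the image as a 2-D list of gray values and burning each annotated point in with a fixed mark value are assumptions for illustration; the actual rasterization of annotation strokes is not specified here.

```python
def synthesize(aux_image, points, mark=255):
    # Copy the auxiliary stream frame, then rasterize each annotated
    # point onto the copy to form the current-frame composite image.
    composite = [row[:] for row in aux_image]
    for x, y in points:
        if 0 <= y < len(composite) and 0 <= x < len(composite[0]):
            composite[y][x] = mark
    return composite

# Hypothetical 3x4 auxiliary stream frame and a two-point annotation.
aux = [[0] * 4 for _ in range(3)]
current = synthesize(aux, [(1, 0), (2, 2)])
```

The composite carries both the auxiliary stream content and the annotation marks, while the original auxiliary frame is left untouched for reuse.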
区域检测装置712对输入的当前帧合成图像进行不同区域的差异性检测,以获得差异信息,并将该差异信息和辅流图像都输入至编码模块713中进行编码,生成编码数据,并将该编码数据输出至辅流数据发送模块714,以使辅流数据发送模块714将获得的编码数据通过通信网络发送至第二终端720(和/或第三终端730),以使第二终端720和/或第三终端730能够获得辅流图像和标注信息同步的编码数据。The area detection device 712 performs difference detection on different areas of the input current frame composite image to obtain difference information, and inputs both the difference information and the auxiliary stream image into the encoding module 713 for encoding, generates encoded data, and outputs the encoded data to the auxiliary stream data sending module 714, so that the auxiliary stream data sending module 714 sends the obtained encoded data to the second terminal 720 (and/or the third terminal 730) through the communication network, so that the second terminal 720 and/or the third terminal 730 can obtain the encoded data synchronized with the auxiliary stream image and annotation information.
其中,区域检测装置712可以通过对输入的当前帧合成图像进行分块,获得包括多个第一区域图像的第一区域图像集合,然后将多个第一区域图像与其内部缓存的多个第二区域图像进行对比,获得变化区域信息。其中,多个第二区域图像是区域检测装置712对前一帧合成图像进行分块获得的图像。The region detection device 712 may obtain a first region image set including a plurality of first region images by dividing the input current frame synthetic image into blocks, and then compare the plurality of first region images with a plurality of second region images cached therein to obtain the changed region information. The plurality of second region images are images obtained by dividing the previous frame synthetic image into blocks by the region detection device 712.
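The block-wise difference detection performed by the region detection device 712 can be sketched as follows. The tile size and the exact-match comparison between tiles are assumptions for illustration; the device may use any block size and similarity criterion.

```python
def split_blocks(frame, block):
    # Partition a 2-D frame into block x block tiles keyed by tile index,
    # corresponding to the first/second region image sets.
    tiles = {}
    for by in range(0, len(frame), block):
        for bx in range(0, len(frame[0]), block):
            tiles[(by // block, bx // block)] = [
                row[bx:bx + block] for row in frame[by:by + block]
            ]
    return tiles

def changed_regions(current, previous, block=2):
    # Compare current-frame tiles against the cached previous-frame tiles
    # to obtain the changed-region information.
    cur = split_blocks(current, block)
    prev = split_blocks(previous, block)
    return sorted(key for key in cur if cur[key] != prev.get(key))

# Hypothetical 4x4 frames differing in a single pixel.
previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[3][3] = 7  # change falls in the bottom-right tile
regions = changed_regions(current, previous)
```

Only the tile containing the changed pixel is reported, so the encoder can restrict its work to that region.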
需要说明的是,第二终端720和/或第三终端730在接收到编码数据后,都会对该编码数据进行解码,但不同的是,第二终端720还可以同时获得原始的标注信息,以方便其对编码数据进行分析,获得准确的辅流图像和标注信息。It should be noted that after receiving the encoded data, the second terminal 720 and/or the third terminal 730 will decode the encoded data, but the difference is that the second terminal 720 can also obtain the original annotation information at the same time to facilitate its analysis of the encoded data and obtain accurate auxiliary stream images and annotation information.
进一步地,还需要采用图像渲染模块733或图像渲染模块723对叠加后的辅流图像进行渲染,以保证用户获得更清晰的图像。Furthermore, the image rendering module 733 or the image rendering module 723 is also used to render the superimposed auxiliary stream image, to ensure that the user obtains a clearer image.
在本实施方式中,通过将标注图像和辅流图像进行叠加合成,能够保证当前帧合成图像的内容一致性,并且通过对比相邻图像帧之间的差异,以限定标注信息对应的图像叠加的范围,从而提高图像合成的速度。能够满足用户在不同应用场景下的需求,提升产品竞争力。解决了可标注的第一终端710和无法标注的第三终端730之间的辅流内容交互不一致的问题。In this embodiment, superimposing and synthesizing the annotation image and the auxiliary stream image ensures the content consistency of the current frame composite image, and comparing the differences between adjacent image frames limits the range of the image superposition corresponding to the annotation information, thereby increasing the speed of image synthesis. This can meet users' needs in different application scenarios and improve product competitiveness, and solves the problem of inconsistent auxiliary stream content interaction between the first terminal 710, which supports annotation, and the third terminal 730, which does not.
图8示出本申请提供的对辅流图像的展示界面示意图。如图8所示,图8中的(A)表示第一终端710发送的具有标注信息的辅流图像的展示界面,或,第二终端720显示的具有标注信息的辅流图像的展示界面。FIG8 shows a schematic diagram of a display interface for auxiliary stream images provided by the present application. As shown in FIG8 , (A) in FIG8 represents a display interface of an auxiliary stream image with annotated information sent by the first terminal 710, or a display interface of an auxiliary stream image with annotated information displayed by the second terminal 720.
图8中的(B)表示现有技术中的终端展示的仅显示辅流图像(即没有标注信息的辅流图像)的展示界面。(B) in FIG. 8 shows a display interface in the prior art in which only an auxiliary stream image (ie, an auxiliary stream image without annotation information) is displayed by a terminal.
图8中的(C)表示第三终端730展示的辅流图像的展示界面。(C) in FIG. 8 shows a display interface of the auxiliary stream image displayed by the third terminal 730 .
通过对比图8中的三个展示界面,能够明确标注信息的显示状态,方便用户的查看,提升用户的使用体验。By comparing the three display interfaces in FIG. 8, the display status of the annotation information can be clearly seen, which facilitates viewing by the user and improves the user experience.
图9示出本申请提供的编码装置的组成方框图。如图9所示,在一个实施方式中,编码装置900包括但不限于如下模块。Fig. 9 shows a block diagram of the coding device provided by the present application. As shown in Fig. 9, in one embodiment, the coding device 900 includes but is not limited to the following modules.
合成模块901,被配置为对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像;检测模块902,被配置为对当前帧合成图像和前一帧合成图像进行检测,确定差异信息;编码模块903,被配置为依据差异信息对当前帧合成图像进行编码,生成编码数据;发送模块904,被配置为向对端设备发送所述编码数据,以使所述对端设备对所述编码数据进行处理,获得并显示包括所述辅流图像对应的标注信息的解码图像。The synthesis module 901 is configured to synthesize the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image; the detection module 902 is configured to detect the current frame synthesized image and the previous frame synthesized image to determine the difference information; the encoding module 903 is configured to encode the current frame synthesized image according to the difference information to generate encoded data; the sending module 904 is configured to send the encoded data to the opposite device so that the opposite device processes the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image.
需要说明的是,本实施方式中的编码装置900能够实现本申请实施方式中任一种应用于编码装置的图像处理方法。It should be noted that the encoding device 900 in this embodiment can implement any image processing method applied to the encoding device in the embodiments of the present application.
根据本申请实施方式的编码装置,通过合成模块对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像,能够明确终端对辅流图像进行标注的信息;检测模块对当前帧合成图像和前一帧合成图像进行检测,确定差异信息,使用户可以同步获取连续两帧之间的差异信息,提升终端与用户之间的互动性;编码模块依据差异信息对辅流图像进行编码,生成编码数据,可加快图像的编码速度,以降低编码的能耗;并且,通过发送模块向对端设备发送编码数据,以使对端设备对编码数据进行处理,获得并显示包括辅流图像对应的标注信息的解码图像,方便对端设备能够查看到带有标注信息的解码图像,使对端设备可以更清晰明确地显示标注信息。According to the encoding device of the embodiment of the present application, the synthesis module synthesizes the auxiliary stream image and its corresponding annotation information to generate the current frame composite image, which makes clear the information with which the terminal annotates the auxiliary stream image; the detection module detects the current frame composite image and the previous frame composite image to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, improving the interactivity between the terminal and the user; the encoding module encodes the auxiliary stream image according to the difference information to generate encoded data, which speeds up image encoding and reduces encoding energy consumption; and the sending module sends the encoded data to the peer device, so that the peer device processes the encoded data to obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image, allowing the peer device to view the decoded image with the annotation information and display it more clearly.
图10示出本申请提供的解码装置的组成方框图。如图10所示,在一个实施方式中,解码装置1000包括但不限于如下模块。Fig. 10 shows a block diagram of a decoding device provided by the present application. As shown in Fig. 10, in one embodiment, the decoding device 1000 includes but is not limited to the following modules.
获取模块1001,被配置为获取编码数据,编码数据为本申请中的编码装置采用的任意一项图像处理方法所发送的数据;解码模块1002,被配置为对编码数据进行解码,获得解码图像,解码图像为携带有辅流图像及其对应的标注信息的图像;生成模块1003,被配置为将所述解码图像和前一帧合成图像进行叠加,生成待显示图像;显示模块1004,被配置为显示待显示图像。The acquisition module 1001 is configured to acquire encoded data, which is data sent by any image processing method adopted by the encoding device in the present application; the decoding module 1002 is configured to decode the encoded data to obtain a decoded image, which is an image carrying an auxiliary stream image and its corresponding annotation information; the generation module 1003 is configured to superimpose the decoded image and the previous frame composite image to generate an image to be displayed; the display module 1004 is configured to display the image to be displayed.
需要说明的是,本实施方式中的解码装置1000能够实现本申请实施方式中任一种应用于解码装置的图像处理方法。It should be noted that the decoding device 1000 in this embodiment can implement any image processing method applied to a decoding device in the embodiments of the present application.
根据本申请实施方式的解码装置,通过使用获取模块获取编码数据,能够明确对编码数据的处理需求,其中的编码数据是编码装置发送的经过其采用本申请中任意一种图像处理方法进行编码的数据,便于后续处理;对编码数据进行解码,获得并显示解码图像,该解码图像为携带有辅流图像及其对应的标注信息的图像,以使该解码图像能够体现标注信息和辅流图像的特征,方便用户的使用。According to the decoding device of the implementation mode of the present application, by using the acquisition module to obtain the encoded data, it is possible to clarify the processing requirements for the encoded data, wherein the encoded data is the data sent by the encoding device and encoded by it using any one of the image processing methods in the present application, which is convenient for subsequent processing; the encoded data is decoded to obtain and display a decoded image, which is an image carrying an auxiliary stream image and its corresponding annotation information, so that the decoded image can reflect the characteristics of the annotation information and the auxiliary stream image, which is convenient for users to use.
图11示出本申请提供的终端的组成方框图。如图11所示,在一个实施方式中,终端1100包括但不限于如下模块:编码装置1101,和/或,解码装置1102。FIG11 is a block diagram showing a terminal provided by the present application. As shown in FIG11 , in one embodiment, the terminal 1100 includes but is not limited to the following modules: an encoding device 1101 and/or a decoding device 1102 .
例如,图11中的(A)表示终端1100仅包括编码装置1101;图11中的(B)表示终端1100仅包括解码装置1102;图11中的(C)表示终端1100包括编码装置1101和解码装置1102。For example, (A) in FIG. 11 indicates that the terminal 1100 includes only the encoding device 1101 ; (B) in FIG. 11 indicates that the terminal 1100 includes only the decoding device 1102 ; and (C) in FIG. 11 indicates that the terminal 1100 includes the encoding device 1101 and the decoding device 1102 .
其中,编码装置1101,被配置为执行本申请实施方式中任一项应用于编码装置的图像处理方法。解码装置1102,被配置为执行本申请实施方式中任一项应用于解码装置的图像处理方法。 The encoding device 1101 is configured to execute any image processing method applied to an encoding device in the embodiments of the present application. The decoding device 1102 is configured to execute any image processing method applied to a decoding device in the embodiments of the present application.
例如,终端1100可以为支持音频/视频会议功能的终端(如,智能手机等),也可以为支持网络授课的平板电脑(或,个人电脑等)。以上对于终端的类别仅是举例说明,可根据实际需要进行具体设定,其他未说明的终端的类别也在本申请的保护范围之内,在此不再赘述。For example, the terminal 1100 may be a terminal supporting audio/video conferencing functions (such as a smart phone, etc.), or a tablet computer supporting online teaching (or a personal computer, etc.). The above terminal categories are only examples, and specific settings can be made according to actual needs. Other unspecified terminal categories are also within the scope of protection of this application and will not be repeated here.
根据本申请实施方式的终端,通过编码装置对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像,能够明确终端对辅流图像进行标注的信息;对当前帧合成图像和前一帧合成图像进行检测,确定差异信息,使用户可以同步获取连续两帧之间的差异信息,提升终端与用户之间的互动性;依据差异信息对辅流图像进行编码,生成编码数据,可加快图像的编码速度,以降低编码的能耗。并通过解码装置获取编码数据及其对应的标注信息,能够明确对编码数据的处理需求,并且,其中的编码数据是编码装置依据差异信息对辅流图像进行编码获得的图像,差异信息是编码装置对当前帧合成图像和前一帧合成图像进行检测获得的信息,能够使用户可以同步获取连续两帧之间的差异信息,提升终端与用户之间的互动性;对编码数据进行解码,获得待分析图像,以加快对待分析图像的处理速度;依据编码数据对应的标注信息对待分析图像进行处理,获得解码图像,以使该解码图像能够体现标注信息和辅流图像的特征,方便用户的使用。According to the terminal of the embodiment of the present application, the encoding device synthesizes the auxiliary stream image and its corresponding annotation information to generate the current frame composite image, which makes clear the information with which the terminal annotates the auxiliary stream image; the current frame composite image and the previous frame composite image are detected to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, improving the interactivity between the terminal and the user; and the auxiliary stream image is encoded according to the difference information to generate encoded data, which speeds up image encoding and reduces encoding energy consumption. The decoding device obtains the encoded data and its corresponding annotation information, which makes clear the processing requirements for the encoded data; the encoded data is obtained by the encoding device encoding the auxiliary stream image according to the difference information, and the difference information is obtained by the encoding device detecting the current frame composite image and the previous frame composite image, so that the user can synchronously obtain the difference information between two consecutive frames, improving the interactivity between the terminal and the user. The encoded data is decoded to obtain an image to be analyzed, speeding up the processing of the image to be analyzed; and the image to be analyzed is processed according to the annotation information corresponding to the encoded data to obtain a decoded image, so that the decoded image reflects the features of the annotation information and the auxiliary stream image, which is convenient for the user.
图12示出本申请提供的图像处理系统的组成方框图。图像处理系统包括通信连接的多个终端;其中,终端能够实现本申请实施方式中的任意一种图像处理方法。Fig. 12 shows a block diagram of the image processing system provided by the present application. The image processing system includes a plurality of terminals connected in communication; wherein the terminals can implement any one of the image processing methods in the embodiments of the present application.
例如,如图12所示,在一个实施方式中,该图像处理系统包括但不限于如下设备:通信连接的至少一个发送终端1201,以及至少一个第一接收终端1202和/或第二接收终端1203。For example, as shown in FIG. 12 , in one embodiment, the image processing system includes but is not limited to the following devices: at least one transmitting terminal 1201 in communication connection, and at least one first receiving terminal 1202 and/or second receiving terminal 1203 .
例如,图12中的(A)表示图像处理系统包括:通信连接的发送终端1201和第一接收终端1202;图12中的(B)表示图像处理系统包括:通信连接的发送终端1201和第二接收终端1203;图12中的(C)表示图像处理系统包括:发送终端1201、以及分别与发送终端1201通信连接的第一接收终端1202和第二接收终端1203。For example, (A) in Figure 12 indicates that the image processing system includes: a sending terminal 1201 and a first receiving terminal 1202 that are communicatively connected; (B) in Figure 12 indicates that the image processing system includes: a sending terminal 1201 and a second receiving terminal 1203 that are communicatively connected; (C) in Figure 12 indicates that the image processing system includes: a sending terminal 1201, and a first receiving terminal 1202 and a second receiving terminal 1203 that are respectively communicatively connected to the sending terminal 1201.
其中,发送终端1201,被配置为执行本申请实施方式中任一种应用于编码装置的图像处理方法。The sending terminal 1201 is configured to execute any of the image processing methods applied to an encoding device in the embodiments of the present application.
第一接收终端1202,被配置为执行本申请实施方式中任一种应用于解码装置的图像处理方法。The first receiving terminal 1202 is configured to execute any one of the image processing methods applied to a decoding device in the embodiments of the present application.
第二接收终端1203,被配置为获取发送终端1201发送的编码数据,对编码数据进行解码,获得并显示包括辅流图像对应的标注信息的解码图像,其中,编码数据是编码装置依据差异信息对当前帧合成图像进行编码获得的数据,差异信息是编码装置对当前帧合成图像和前一帧合成图像进行检测获得的信息,当前帧合成图像是编码装置对辅流图像及其对应的标注信息进行合成的图像。The second receiving terminal 1203 is configured to obtain the encoded data sent by the sending terminal 1201, decode the encoded data, and obtain and display a decoded image including the annotation information corresponding to the auxiliary stream image, wherein the encoded data is data obtained by the encoding device encoding the current frame composite image according to the difference information, the difference information is information obtained by the encoding device detecting the current frame composite image and the previous frame composite image, and the current frame composite image is an image obtained by the encoding device synthesizing the auxiliary stream image and its corresponding annotation information.
根据本申请实施方式的图像处理系统,通过发送终端对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像,能够明确发送终端对辅流图像进行标注的信息;对当前帧合成图像和前一帧合成图像进行检测,确定差异信息,使用户可以同步获取连续两帧之间的差异信息,提升终端与用户之间的互动性;依据差异信息对辅流图像进行编码,生成编码数据,可加快图像的编码速度,以降低编码的能耗。进一步地,通过至少一个发送终端向第一接收终端和/或第二接收终端发送编码数据,能够使不同的接收终端都接收到该编码数据,并对编码数据进行处理,从而获得并显示包括辅流图像对应的标注信息的解码图像,使第一接收终端和/或第二接收终端都能够查看到带有标注信息的解码图像,以便更清晰明确的显示标注信息。According to the image processing system of the implementation mode of the present application, the auxiliary stream image and its corresponding annotation information are synthesized by the sending terminal to generate the current frame synthesized image, which can clearly identify the information annotated by the sending terminal on the auxiliary stream image; the current frame synthesized image and the previous frame synthesized image are detected to determine the difference information, so that the user can synchronously obtain the difference information between two consecutive frames, thereby improving the interactivity between the terminal and the user; the auxiliary stream image is encoded according to the difference information to generate encoded data, which can speed up the encoding speed of the image to reduce the energy consumption of encoding. Furthermore, by sending the encoded data to the first receiving terminal and/or the second receiving terminal through at least one sending terminal, different receiving terminals can receive the encoded data and process the encoded data to obtain and display the decoded image including the annotation information corresponding to the auxiliary stream image, so that the first receiving terminal and/or the second receiving terminal can view the decoded image with the annotation information, so as to display the annotation information more clearly.
需要明确的是,本发明并不局限于上文实施方式中所描述并在图中示出的特定配置和处理。为了描述的方便和简洁,这里省略了对已知方法的详细描述,并且上述描述的系统、模块和单元的具体工作过程,可以参考前述方法实施方式中的对应过程,在此不再赘述。It should be clear that the present invention is not limited to the specific configurations and processes described in the above embodiments and shown in the figures. For the convenience and brevity of description, a detailed description of the known methods is omitted here, and the specific working processes of the systems, modules and units described above can refer to the corresponding processes in the above method embodiments, which will not be repeated here.
图13示出能够实现根据本申请的图像处理方法和装置的计算设备的示例性硬件架构的结构图。FIG. 13 is a block diagram showing an exemplary hardware architecture of a computing device capable of implementing the image processing method and apparatus according to the present application.
如图13所示，计算设备1300包括输入设备1301、输入接口1302、中央处理器1303、存储器1304、输出接口1305、以及输出设备1306。其中，输入接口1302、中央处理器1303、存储器1304、以及输出接口1305通过总线1307相互连接，输入设备1301和输出设备1306分别通过输入接口1302和输出接口1305与总线1307连接，进而与计算设备1300的其他组件连接。As shown in FIG. 13, the computing device 1300 includes an input device 1301, an input interface 1302, a central processing unit 1303, a memory 1304, an output interface 1305, and an output device 1306. The input interface 1302, the central processing unit 1303, the memory 1304, and the output interface 1305 are interconnected via a bus 1307; the input device 1301 and the output device 1306 are connected to the bus 1307 through the input interface 1302 and the output interface 1305 respectively, and are thereby connected to the other components of the computing device 1300.
示例性地,输入设备1301接收来自外部的输入信息,并通过输入接口1302将输入信息传送到中央处理器1303;中央处理器1303基于存储器1304中存储的计算机可执行指令对输入信息进行处理以生成输出信息,将输出信息临时或者永久地存储在存储器1304中,然后通过输出接口1305将输出信息传送到输出设备1306;输出设备1306将输出信息输出到计算设备1300的外部供用户使用。Exemplarily, the input device 1301 receives input information from the outside and transmits the input information to the central processing unit 1303 through the input interface 1302; the central processing unit 1303 processes the input information based on the computer executable instructions stored in the memory 1304 to generate output information, temporarily or permanently stores the output information in the memory 1304, and then transmits the output information to the output device 1306 through the output interface 1305; the output device 1306 outputs the output information to the outside of the computing device 1300 for user use.
在一个实施方式中,图13所示的计算设备可以被实现为一种电子设备,该电子设备可以包括:存储器,被配置为存储程序;处理器,被配置为运行存储器中存储的程序,以执行上述实施方式描述的图像处理方法。In one embodiment, the computing device shown in Figure 13 can be implemented as an electronic device, which may include: a memory configured to store a program; a processor configured to run the program stored in the memory to execute the image processing method described in the above embodiment.
在一个实施方式中,图13所示的计算设备可以被实现为一种图像处理系统,该系统可以包括:存储器,被配置为存储程序;处理器,被配置为运行存储器中存储的程序,以执行上述实施方式描述的图像处理方法。In one embodiment, the computing device shown in Figure 13 can be implemented as an image processing system, which may include: a memory configured to store a program; a processor configured to run the program stored in the memory to execute the image processing method described in the above embodiment.
以上所述,仅为本申请的示例性实施方式而已,并非用于限定本申请的保护范围。一般来说,本申请的多种实施方式可以在硬件或专用电路、软件、逻辑或其任何组合中实现。例如,一些方面可以被实现在硬件中,而其它方面可以被实现在可以被控制器、微处理器或其它计算装置执行的固件或软件中,尽管本申请不限于此。The above is only an exemplary embodiment of the present application and is not intended to limit the scope of protection of the present application. In general, the various embodiments of the present application can be implemented in hardware or special circuits, software, logic or any combination thereof. For example, some aspects can be implemented in hardware, while other aspects can be implemented in firmware or software that can be executed by a controller, microprocessor or other computing device, although the present application is not limited thereto.
本申请的实施方式可以通过移动装置的数据处理器执行计算机程序指令来实现,例如在处理器实体中,或者通过硬件,或者通过软件和硬件的组合。计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码。Embodiments of the present application may be implemented by executing computer program instructions by a data processor of a mobile device, for example in a processor entity, or by hardware, or by a combination of software and hardware. The computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
本申请附图中的任何逻辑流程的框图可以表示程序步骤，或者可以表示相互连接的逻辑电路、模块和功能，或者可以表示程序步骤与逻辑电路、模块和功能的组合。计算机程序可以存储在存储器上。存储器可以具有任何适合于本地技术环境的类型，并且可以使用任何适合的数据存储技术实现，例如但不限于只读存储器（ROM）、随机访问存储器（RAM）、光存储器装置和系统（数码多功能光碟DVD或CD光盘）等。计算机可读介质可以包括非瞬时性存储介质。数据处理器可以是任何适合于本地技术环境的类型，例如但不限于通用计算机、专用计算机、微处理器、数字信号处理器（DSP）、专用集成电路（ASIC）、可编程逻辑器件（FPGA）以及基于多核处理器架构的处理器。The block diagram of any logic flow in the drawings of this application may represent program steps, or may represent interconnected logic circuits, modules and functions, or may represent a combination of program steps and logic circuits, modules and functions. The computer program may be stored in a memory. The memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as, but not limited to, read-only memory (ROM), random access memory (RAM), optical storage devices and systems (digital versatile discs DVD or CD discs), etc. Computer-readable media may include non-transitory storage media. The data processor may be of any type suitable for the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (FPGA), and a processor based on a multi-core processor architecture.
通过示范性和非限制性的示例，上文已提供了对本申请的示范实施方式的详细描述。结合附图和权利要求来考虑，对以上实施方式的多种修改和调整对本领域技术人员来说将是显而易见的，且不偏离本发明的范围。因此，本发明的恰当范围将根据权利要求确定。 By way of exemplary and non-limiting examples, a detailed description of exemplary embodiments of the present application has been provided above. Considered in conjunction with the drawings and claims, various modifications and adjustments to the above embodiments will be apparent to those skilled in the art without departing from the scope of the present invention. Therefore, the proper scope of the present invention is to be determined according to the claims.

Claims (17)

  1. 一种图像处理方法,包括:An image processing method, comprising:
    对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像;Synthesize the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image;
    对所述当前帧合成图像和前一帧合成图像进行检测,确定差异信息;Detecting the current frame synthesized image and the previous frame synthesized image to determine difference information;
    依据所述差异信息对所述当前帧合成图像进行编码,生成编码数据;Encoding the current frame synthesized image according to the difference information to generate encoded data;
    向对端设备发送所述编码数据,以使所述对端设备对所述编码数据进行处理,获得并显示包括所述辅流图像对应的标注信息的解码图像。The encoded data is sent to a peer device, so that the peer device processes the encoded data, obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image.
  2. 根据权利要求1所述的方法,其中,所述对所述当前帧合成图像和前一帧合成图像进行检测,确定差异信息,包括:The method according to claim 1, wherein the detecting the current frame synthetic image and the previous frame synthetic image to determine the difference information comprises:
    依据预设尺寸,分别对所述当前帧合成图像和预先存储的所述前一帧合成图像进行分区处理,获得与所述当前帧合成图像对应的第一区域图像集合,以及与所述前一帧合成图像对应的第二区域图像集合;其中,所述第一区域图像集合包括多个第一区域图像,所述第二区域图像集合包括多个第二区域图像,所述第一区域图像集合中的区域数量与所述第二区域图像集合中的区域数量相同;According to a preset size, the current frame synthetic image and the pre-stored previous frame synthetic image are respectively partitioned to obtain a first region image set corresponding to the current frame synthetic image and a second region image set corresponding to the previous frame synthetic image; wherein the first region image set includes a plurality of first region images, the second region image set includes a plurality of second region images, and the number of regions in the first region image set is the same as the number of regions in the second region image set;
    依据所述区域数量,分别对比所述第一区域图像和所述第二区域图像,获得所述差异信息。According to the number of regions, the first region image and the second region image are compared respectively to obtain the difference information.
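The region-by-region comparison of claim 2 can be illustrated with a minimal sketch (Python with NumPy). The function name `detect_differences`, the block size, and exact pixel equality as the "image feature" comparison are all assumptions for illustration, not the patent's implementation:

```python
# Sketch of claim 2: partition the current and previous composite frames
# into equal-sized regions and compare them region by region.
import numpy as np

def detect_differences(curr, prev, block=4):
    """Return (row, col) indices of block x block tiles whose pixels differ."""
    assert curr.shape == prev.shape
    h, w = curr.shape[:2]
    diffs = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            a = curr[i:i + block, j:j + block]
            b = prev[i:i + block, j:j + block]
            if not np.array_equal(a, b):  # image features differ in this region
                diffs.append((i // block, j // block))
    return diffs

prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[5, 5] = 255  # one changed pixel, falling in the lower-right tile
regions = detect_differences(curr, prev, block=4)
```

Identical frames yield an empty difference list, which is the condition claim 4 uses to skip encoding the current frame entirely.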
  3. 根据权利要求2所述的方法,其中,所述差异信息包括:至少一个差异区域,所述差异区域为所述第一区域图像的图像特征和所述第二区域图像的图像特征不同的图像区域;The method according to claim 2, wherein the difference information comprises: at least one difference region, the difference region being an image region where image features of the first region image and image features of the second region image are different;
    所述依据所述差异信息对所述当前帧合成图像进行编码,生成编码数据,包括:The step of encoding the current frame synthesized image according to the difference information to generate encoded data includes:
    依据至少一个所述差异区域,确定差异轮廓信息; Determining difference profile information according to at least one of the difference regions;
    依据所述差异轮廓信息对所述当前帧合成图像进行剪裁,获得变化区域图像;Cutting the current frame synthesized image according to the difference contour information to obtain a changed area image;
    对所述变化区域图像进行编码,生成所述编码数据。The changed region image is encoded to generate the encoded data.
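One simple reading of claim 3's "difference contour" is the bounding rectangle of the changed tiles; the sketch below (hypothetical helper `crop_changed_area`, tile-index input as produced by a claim-2-style comparison) crops the current composite frame to that rectangle so only the changed area need be encoded:

```python
# Sketch of claim 3: derive a contour from the difference regions and
# crop the current composite frame to the changed area.
import numpy as np

def crop_changed_area(curr, diff_regions, block=4):
    """Bound the changed tiles with one rectangle and crop the frame to it."""
    rows = [r for r, _ in diff_regions]
    cols = [c for _, c in diff_regions]
    top, bottom = min(rows) * block, (max(rows) + 1) * block
    left, right = min(cols) * block, (max(cols) + 1) * block
    return (top, left, bottom, right), curr[top:bottom, left:right]

curr = np.arange(64, dtype=np.uint8).reshape(8, 8)
contour, patch = crop_changed_area(curr, [(1, 1)], block=4)
```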
  4. 根据权利要求1所述的方法,其中,所述对所述当前帧合成图像和前一帧合成图像进行检测,确定差异信息之后,还包括:The method according to claim 1, wherein after detecting the current frame synthetic image and the previous frame synthetic image and determining the difference information, the method further comprises:
    在确定所述差异信息表征所述当前帧合成图像和所述前一帧合成图像之间无差异的情况下,跳过所述当前帧合成图像。In a case where it is determined that the difference information represents that there is no difference between the current frame synthesized image and the previous frame synthesized image, the current frame synthesized image is skipped.
  5. 根据权利要求1所述的方法,其中,所述对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像,包括:The method according to claim 1, wherein synthesizing the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image comprises:
    基于多种帧频,获取所述辅流图像对应的标注信息;Based on multiple frame rates, obtaining annotation information corresponding to the auxiliary stream image;
    依据预设容器和预设图像格式对所述辅流图像对应的标注信息进行处理,生成标注图像;Processing the annotation information corresponding to the auxiliary stream image according to a preset container and a preset image format to generate an annotated image;
    将所述辅流图像和所述标注图像进行图像整合,生成所述当前帧合成图像。The auxiliary stream image and the annotated image are integrated to generate the current frame composite image.
  6. 根据权利要求5所述的方法,其中,所述将所述辅流图像和所述标注图像进行图像整合,生成所述当前帧合成图像,包括:The method according to claim 5, wherein the step of integrating the auxiliary stream image and the annotated image to generate the current frame composite image comprises:
    分别对所述辅流图像和所述标注图像进行图像格式转换,获得转换图像集合;Performing image format conversion on the auxiliary stream image and the annotated image respectively to obtain a converted image set;
    依据预设图像分辨率对所述转换图像集合中的各个图像进行缩放处理,获得缩放图像集合;Scaling each image in the converted image set according to a preset image resolution to obtain a scaled image set;
    依据预设帧频对所述缩放图像集合中的各个图像进行同步,获得处理后的辅流图像和处理后的标注图像;Synchronizing each image in the zoomed image set according to a preset frame rate to obtain a processed auxiliary stream image and a processed annotated image;
    将所述处理后的辅流图像和所述处理后的标注图像进行叠加合成,生成所述当前帧合成图像。The processed auxiliary stream image and the processed annotated image are superimposed and synthesized to generate the current frame synthesized image.
  7. 根据权利要求6所述的方法，其中，所述依据预设帧频对所述缩放图像集合中的各个图像进行同步，获得处理后的辅流图像和处理后的标注图像，包括：The method according to claim 6, wherein synchronizing each image in the zoomed image set according to a preset frame rate to obtain the processed auxiliary stream image and the processed annotated image comprises:
    在确定所述缩放图像集合中的图像的实际帧频大于所述预设帧频的情况下,基于采样的方式对所述缩放图像集合中的各个图像进行丢帧处理,获得所述处理后的辅流图像和所述处理后的标注图像;When it is determined that the actual frame rate of the image in the zoomed image set is greater than the preset frame rate, performing frame drop processing on each image in the zoomed image set based on a sampling manner to obtain the processed auxiliary stream image and the processed annotated image;
    在确定所述缩放图像集合中的图像的实际帧频小于所述预设帧频的情况下,采用内部插帧的方式对所述缩放图像集合中的各个图像进行处理,获得所述处理后的辅流图像和所述处理后的标注图像。When it is determined that the actual frame rate of the images in the scaled image set is less than the preset frame rate, each image in the scaled image set is processed by internal interpolation to obtain the processed auxiliary stream image and the processed annotated image.
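Claim 7's two-sided frame-rate synchronization can be sketched in plain Python. Frame duplication stands in for the claimed internal interpolation, and the function name and sampling scheme are illustrative assumptions:

```python
# Sketch of claim 7: drop frames by sampling when the source runs faster
# than the preset frame rate, and insert frames when it runs slower.
def synchronize(frames, actual_fps, target_fps):
    if actual_fps == target_fps:
        return list(frames)
    out = []
    if actual_fps > target_fps:
        # sample-based frame dropping
        step = actual_fps / target_fps
        i = 0.0
        while round(i) < len(frames):
            out.append(frames[round(i)])
            i += step
    else:
        # frame insertion (duplication as a stand-in for interpolation)
        ratio = target_fps / actual_fps
        for f in frames:
            out.extend([f] * round(ratio))
    return out
```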
  8. 根据权利要求6所述的方法,其中,所述将所述处理后的辅流图像和所述处理后的标注图像进行叠加合成,生成所述当前帧合成图像,包括:The method according to claim 6, wherein the step of superimposing and synthesizing the processed auxiliary stream image and the processed annotated image to generate the current frame synthesized image comprises:
    以所述处理后的辅流图像为背景图像,将所述处理后的标注图像中的标注特征叠加至所述处理后的辅流图像中,获得所述当前帧合成图像。The processed auxiliary stream image is used as a background image, and the annotation features in the processed annotation image are superimposed on the processed auxiliary stream image to obtain the current frame composite image.
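Claim 8's compositing, with the processed auxiliary-stream image as background, can be sketched as a masked overwrite (the boolean `mask` marking annotation pixels is an illustrative assumption):

```python
# Sketch of claim 8: keep the auxiliary-stream image as background and
# write the annotation features on top of it where the mask is set.
import numpy as np

def overlay_annotation(aux, annot, mask):
    out = aux.copy()
    out[mask] = annot[mask]  # annotation features replace background pixels
    return out

aux = np.full((4, 4), 10, dtype=np.uint8)    # background: auxiliary stream
annot = np.full((4, 4), 200, dtype=np.uint8)  # annotation image
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True                             # one annotated pixel
frame = overlay_annotation(aux, annot, mask)
```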
  9. 根据权利要求6所述的方法,其中,所述将所述处理后的辅流图像和所述处理后的标注图像进行叠加合成,生成所述当前帧合成图像,包括:The method according to claim 6, wherein the step of superimposing and synthesizing the processed auxiliary stream image and the processed annotated image to generate the current frame synthesized image comprises:
    依据预设透明度信息对所述处理后的辅流图像进行处理,获得所述处理后的辅流图像的图像特征,所述处理后的辅流图像的图像特征与所述标注信息相匹配;Processing the processed auxiliary stream image according to preset transparency information to obtain image features of the processed auxiliary stream image, wherein the image features of the processed auxiliary stream image match the annotation information;
    以所述处理后的标注图像为背景图像,将所述处理后的辅流图像的图像特征叠加至所述处理后的标注图像中,获得所述当前帧合成图像。The processed annotated image is used as a background image, and image features of the processed auxiliary stream image are superimposed on the processed annotated image to obtain the current frame composite image.
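Claim 9's variant, blending the auxiliary-stream features into the annotated background with a preset transparency, can be sketched as a fixed-alpha blend (a single scalar `alpha` is an assumption; the claim leaves the transparency information unspecified):

```python
# Sketch of claim 9: annotated image as background, auxiliary-stream
# features blended in with preset transparency.
import numpy as np

def blend_with_transparency(aux, annot, alpha):
    mixed = alpha * aux.astype(float) + (1 - alpha) * annot.astype(float)
    return mixed.astype(np.uint8)

aux = np.full((2, 2), 100, dtype=np.uint8)
annot = np.full((2, 2), 200, dtype=np.uint8)
blended = blend_with_transparency(aux, annot, 0.5)
```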
  10. 一种图像处理方法,包括:An image processing method, comprising:
    获取编码数据,所述编码数据为权利要求1至9中任意一项所述的图像处理方法所发送的数据;Acquire coded data, wherein the coded data is data sent by the image processing method according to any one of claims 1 to 9;
    对所述编码数据进行解码,获得解码图像,所述解码图像为携带有辅流图像及其对应的标注信息的图像; Decoding the encoded data to obtain a decoded image, where the decoded image is an image carrying the auxiliary stream image and its corresponding annotation information;
    将所述解码图像和前一帧合成图像进行叠加,生成待显示图像;Superimposing the decoded image and the previous frame of synthesized image to generate an image to be displayed;
    显示所述待显示图像。The image to be displayed is displayed.
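On the receiving side, claim 10's superposition of the decoded image onto the previous composite frame pairs naturally with claim 3's cropping: the decoded patch is written back at its contour. A minimal sketch (the contour tuple format mirrors the hypothetical encoder sketch and is an assumption):

```python
# Sketch of claim 10: superimpose the decoded changed-area image onto
# the previous composite frame to rebuild the image to be displayed.
import numpy as np

def apply_patch(prev_frame, patch, contour):
    top, left, bottom, right = contour
    out = prev_frame.copy()
    out[top:bottom, left:right] = patch  # overwrite only the changed area
    return out

prev_frame = np.zeros((8, 8), dtype=np.uint8)
patch = np.full((4, 4), 7, dtype=np.uint8)
display = apply_patch(prev_frame, patch, (4, 4, 8, 8))
```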
  11. 根据权利要求10所述的方法,其中,所述显示所述待显示图像之前,还包括:The method according to claim 10, wherein before displaying the image to be displayed, the method further comprises:
    对所述待显示图像进行渲染,获得渲染后的待显示图像。The image to be displayed is rendered to obtain a rendered image to be displayed.
  12. 一种编码装置,包括:A coding device, comprising:
    合成模块,被配置为对辅流图像及其对应的标注信息进行合成,生成当前帧合成图像;A synthesis module is configured to synthesize the auxiliary stream image and its corresponding annotation information to generate a current frame synthesized image;
    检测模块,被配置为对所述当前帧合成图像和前一帧合成图像进行检测,确定差异信息;A detection module, configured to detect the current frame synthetic image and the previous frame synthetic image to determine difference information;
    编码模块,被配置为依据所述差异信息对所述当前帧合成图像进行编码,生成编码数据;An encoding module, configured to encode the current frame synthesized image according to the difference information to generate encoded data;
    发送模块,被配置为向对端设备发送所述编码数据,以使所述对端设备对所述编码数据进行处理,获得并显示包括所述辅流图像对应的标注信息的解码图像。The sending module is configured to send the encoded data to a peer device, so that the peer device processes the encoded data, obtains and displays a decoded image including the annotation information corresponding to the auxiliary stream image.
  13. 一种解码装置,包括:A decoding device, comprising:
    获取模块,被配置为获取编码数据,所述编码数据为权利要求1至9中任意一项所述的图像处理方法所发送的数据;An acquisition module, configured to acquire coded data, wherein the coded data is data sent by the image processing method according to any one of claims 1 to 9;
    解码模块,被配置为对所述编码数据进行解码,获得解码图像,所述解码图像为携带有辅流图像及其对应的标注信息的图像;A decoding module, configured to decode the encoded data to obtain a decoded image, wherein the decoded image is an image carrying a secondary stream image and its corresponding annotation information;
    生成模块,被配置为将所述解码图像和前一帧合成图像进行叠加,生成待显示图像;A generating module, configured to superimpose the decoded image and a previous frame of synthesized image to generate an image to be displayed;
    显示模块,被配置为显示所述待显示图像。The display module is configured to display the image to be displayed.
  14. 一种终端,包括:编码装置和/或解码装置;A terminal, comprising: an encoding device and/or a decoding device;
    所述编码装置,被配置为执行如权利要求1至9中任一项所述的图像处理方法; The encoding device is configured to perform the image processing method according to any one of claims 1 to 9;
    所述解码装置,被配置为执行如权利要求10至11中任一项所述的图像处理方法。The decoding device is configured to execute the image processing method according to any one of claims 10 to 11.
  15. 一种图像处理系统,包括:通信连接的多个终端;An image processing system, comprising: a plurality of terminals in communication connection;
    所述终端,被配置为执行如权利要求1至9中任一项,或,如权利要求10至11中任一项所述的图像处理方法。The terminal is configured to execute the image processing method according to any one of claims 1 to 9, or according to any one of claims 10 to 11.
  16. 一种电子设备,包括:An electronic device, comprising:
    一个或多个处理器;one or more processors;
    存储器,其上存储有一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1至9中任一项,或,如权利要求10至11中任一项所述的图像处理方法。A memory having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method as claimed in any one of claims 1 to 9, or any one of claims 10 to 11.
  17. 一种可读存储介质,其中,所述可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至9中任一项,或,如权利要求10至11中任一项所述的图像处理方法。 A readable storage medium, wherein the readable storage medium stores a computer program, and when the computer program is executed by a processor, the image processing method according to any one of claims 1 to 9 or any one of claims 10 to 11 is implemented.
PCT/CN2023/105927 2022-10-11 2023-07-05 Image processing method and apparatus, and terminal WO2024078064A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211239520.9A CN117915022A (en) 2022-10-11 2022-10-11 Image processing method, device and terminal
CN202211239520.9 2022-10-11

Publications (1)

Publication Number Publication Date
WO2024078064A1 true WO2024078064A1 (en) 2024-04-18

Family

ID=90668671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105927 WO2024078064A1 (en) 2022-10-11 2023-07-05 Image processing method and apparatus, and terminal

Country Status (2)

Country Link
CN (1) CN117915022A (en)
WO (1) WO2024078064A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070183679A1 (en) * 2004-02-05 2007-08-09 Vodafone K.K. Image processing method, image processing device and mobile communication terminal
JP2012156820A (en) * 2011-01-27 2012-08-16 Nippon Telegr & Teleph Corp <Ntt> Video communication system, and operation method of the same
CN103281539A (en) * 2013-06-07 2013-09-04 华为技术有限公司 Method, device and terminal for image encoding and decoding processing
CN105744281A (en) * 2016-03-28 2016-07-06 飞依诺科技(苏州)有限公司 Continuous image processing method and device
CN106791937A (en) * 2016-12-15 2017-05-31 广东威创视讯科技股份有限公司 The mask method and system of video image
CN114419502A (en) * 2022-01-12 2022-04-29 深圳力维智联技术有限公司 Data analysis method and device and storage medium

Also Published As

Publication number Publication date
CN117915022A (en) 2024-04-19
