WO2016011886A1

WO2016011886A1 - Method and apparatus for decoding image

Info

Publication number: WO2016011886A1
Application number: PCT/CN2015/083270
Authority: WO
Inventors: 何正军; 陈国权
Original assignee: 阿里巴巴集团控股有限公司; 何正军; 陈国权
Priority date: 2014-07-25
Filing date: 2015-07-03
Publication date: 2016-01-28
Also published as: CN105338358B; CN105338358A

Abstract

Disclosed are a method and apparatus for decoding an image. The method comprises: receiving a request for decoding a specified image; decomposing the specified image into a plurality of decoding tasks; determining decoding capability information about a central processing unit (CPU) and a graphics processing unit (GPU) of a current terminal device respectively when decoding the image; and allocating the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU. By means of the present application, the efficiency of decoding an image can be improved.

Description

Method and device for decoding images

Technical field

The present application relates to the field of image decoding technologies, and in particular, to a method and apparatus for decoding an image.

Background art

Currently, decoding and rendering images is a commonly used processing technique in many mobile terminal applications. For mobile applications that display goods, such as Taobao and Tmall, pictures carry very important information about the goods, because they give customers intuitive and fast information and can show the details of the goods from different angles. However, the CPU used by mobile devices (usually ARM) still has a certain processing gap compared with that of a PC, and mobile devices are highly sensitive to power consumption. How to enable a mobile terminal application to decode an image quickly has therefore become a critical part of rendering the image to the screen.

For the above reasons, people have been looking for various methods of accelerating image processing. However, owing to the limited floating-point computing power of the CPU, image processing operations that require high-density calculation have not seen an obvious improvement in processing performance and efficiency. With the rapid development of programmable graphics processing units (GPUs), using the GPU to accelerate image processing has gradually become a research hotspot. The GPU is a core processor dedicated to image processing; its role relative to the graphics card is equivalent to the role of the CPU in the entire terminal device. In the prior art, however, the acceleration achieved by decoding on the GPU falls short of expectations, mainly in the following respects:

First, before processing starts on the GPU, the data must first be written to the GPU's memory. In addition, because the mobile application cannot know the specific location of the cache used for display, the CPU host side must also read the data back into CPU memory after the GPU processing is completed. Obviously, the acceleration gained from parallel decoding on the GPU is reduced by this data input and output (IO).

Second, after writing data to the GPU and triggering the GPU to decode, the CPU needs to wait for the GPU decoding to complete, and then read the decoded data back to the CPU memory. This process will cause the CPU to wait idle during the time that the GPU is decoding, eventually resulting in wasted CPU processing power in this processing time.

In summary, how to further increase the decoding rate of an image has become a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the invention

The present application provides a method and apparatus for decoding an image, which can improve the decoding efficiency of the image.

This application provides the following solutions:

A method of decoding an image, comprising:

Receiving a request to decode a specified image;

Decomposing the specified image into a plurality of decoding tasks;

Determining decoding capability information of the central processing unit (CPU) and the graphics processing unit (GPU) of the current terminal device, respectively, when decoding an image;

Allocating the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.

An apparatus for decoding an image, comprising:

a decoding request receiving unit, configured to receive a request to decode the specified image;

a decoding task decomposition unit, configured to decompose the specified image into a plurality of decoding tasks;

a decoding capability information determining unit, configured to determine decoding capability information of a central processing unit CPU and a graphics processing unit GPU of the current terminal device when decoding an image, respectively;

And a decoding task allocation unit, configured to allocate the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.

According to a specific embodiment provided by the present application, the present application discloses the following technical effects:

Through the embodiments of the present application, joint decoding by the CPU and the GPU can be implemented. Decoding tasks are allocated between the two according to their respective decoding capabilities. Further, preliminary decoding can be performed on the CPU side to decompose the single image into multiple decoding tasks, and the task amount is then allocated in units of decoding tasks. After allocation, the CPU and the GPU perform the same operation steps but are assigned different amounts of tasks according to their respective decoding capabilities. The two decodings can therefore be kept synchronized as far as possible, which avoids the situation in which one processor finishes decoding and then waits a long time for the other processor to finish, so that the decoding efficiency of the image is improved as a whole.

Of course, implementing any of the products of the present application does not necessarily require all of the advantages described above to be achieved at the same time.

Description of the drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without any creative work.

FIG. 1 is a flowchart of a method provided by an embodiment of the present application;

FIG. 2 is a flow chart of another method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of an apparatus provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are clearly and completely described in the following with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application are within the scope of the present disclosure.

In the embodiments of the present application, in order to improve the efficiency of image decoding, the CPU and the GPU may decode jointly. In a specific implementation, a single image can be decomposed into multiple decoding tasks, and the decoding tasks can be allocated according to the ratio between the decoding capabilities of the CPU and the GPU of the current terminal device when decoding images. This keeps the CPU and the GPU synchronized during decoding as far as possible, avoids the situation in which one processor waits a long time for the other processor's decoding result for the same image, and makes full use of the image decoding capability of both the CPU and the GPU, thereby improving the image decoding efficiency of the terminal. The specific implementation is described in detail below.

Referring to FIG. 1, the embodiment of the present application first provides a method for decoding an image, and the method may specifically include the following steps:

S101: Receive a request for decoding a specified image.

The request received in this step may be sent by upper-layer code, for example, the front-end code of an application such as a browser. The specified image generally refers to a single image; that is, the steps in the embodiments of the present application describe the process of decoding a single image, for example, a description picture of a certain commodity object.

In practical applications, the method can be applied to displaying product object pictures on an e-commerce transaction platform, or to displaying images in other image-related application software and programs. In the case of displaying product object pictures, the client may send a browsing request for a product object to the server and then receive the page information returned by the server, which contains the product object picture information. In the process of displaying the page information, the request from the upper-layer code to decode the specified image may be received, where the specified image may be any one of the product object pictures to be displayed. If the page includes multiple product object pictures, each of them can be displayed according to the method of the embodiments of the present application.

S102: Decompose the specified image into multiple decoding tasks.

After the decoding request is received, in the embodiments of the present application, preliminary decoding may first be performed on the image to decompose it into multiple decoding tasks. For example, for an image in the JPEG format, MCU (minimum coded unit) decoding may be performed first, so that the image is decomposed into a plurality of MCU block rows, each composed of a plurality of MCU blocks. Decoding tasks can then be assigned between the CPU and the GPU in units of MCU block rows.
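As a rough illustration of this decomposition, the sketch below computes how many MCU block rows an image yields. It assumes the usual JPEG convention that one MCU spans 8 pixels times the largest sampling factor in each direction; the function name `mcu_grid` and its parameters are hypothetical and are not taken from the application.

```python
import math

def mcu_grid(width, height, h_max=2, v_max=2):
    """Return (blocks_per_row, row_count) for an image.

    Assumes an MCU covers 8*h_max by 8*v_max pixels, where h_max and
    v_max are the largest horizontal and vertical sampling factors.
    """
    mcu_w = 8 * h_max
    mcu_h = 8 * v_max
    blocks_per_row = math.ceil(width / mcu_w)
    row_count = math.ceil(height / mcu_h)
    return blocks_per_row, row_count

# Each MCU block row is treated as one decoding task.
blocks, rows = mcu_grid(640, 480)
tasks = list(range(rows))
```

Treating one row as one task keeps the unit of allocation aligned with the row-based decomposition described above.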

S103: determining decoding capability information of the central processing unit CPU and the graphics processing unit GPU of the current terminal device when decoding the image respectively;

The decoding capability information may be measured by a variety of specific parameters, such as the decoding rate or the time taken to decode the same image. In practical applications, there are many CPU models and many GPU models. A given terminal device is equipped with a CPU of one model and a GPU of one model, and the CPU and the GPU may be independent of each other or integrated. In addition, terminal devices may differ in other hardware configurations, such as memory size. Therefore, the decoding capabilities of the CPU and the GPU when decoding images may differ from one terminal device to another. The embodiments of the present application allocate the decoding tasks according to the decoding capabilities of the CPU and the GPU of the current terminal device when decoding images.

It should be noted that images in actual applications generally come in multiple formats, for example, the JPEG format, the jpg format, the tif format, the bmp format, and the like. On the same terminal device, the CPU and the GPU may have different decoding capabilities when decoding images of different formats. Therefore, when determining the decoding capability information, the format of the specified image may first be determined, and then the decoding capability information of the CPU and the GPU of the current terminal device when decoding images of that format may be determined. Allocating the decoding tasks with this information better synchronizes the decoding of the two processors.

In addition, there may be multiple encoding types of images within the same format, and the CPU and the GPU of the same terminal device may also exhibit different decoding capabilities when decoding images of different encoding types in the same format. Therefore, to further improve the accuracy of the synchronization, in a specific implementation, the format of the specified image may first be determined, then the encoding type of the specified image within that format, and then the decoding capability information of the CPU and the GPU of the current terminal device when decoding images of that encoding type in that format, so that this information can be used to allocate the decoding tasks between the CPU and the GPU. For example, for JPEG images, the IDCT (inverse discrete cosine transform) may use a low-speed, high-precision integer method, a high-speed, lower-precision integer method, or a fast floating-point method. Upsampling (upsample) can be further divided into fullsize, h2v1, h2v2, h2v1_fancy, h2v2_fancy, and so on. Different IDCT and upsample methods can be combined into a variety of specific encoding types, and the same CPU or GPU may have different decoding capabilities for JPEG images of different encoding types. For example, the horizontal and vertical sampling factors used during decoding affect the processing time of the IDCT and the upsampling. In this way, for a JPEG image, the decoding capability information of the current terminal device when decoding JPEG images of the specific encoding type can be obtained and used to allocate the decoding tasks.

There are various ways of obtaining the decoding capability information of the CPU and the GPU of the current terminal device, which are described in the following examples.

In one implementation, it is considered that terminal devices of the same model have the same hardware configuration, such as the same models of CPU and GPU, so their decoding capabilities when decoding images are generally the same. Therefore, a first database may be established in advance, by testing or the like, which stores the correspondence between the models of a plurality of terminal devices and the decoding capability information of their CPUs and GPUs when decoding images. When a decoding request is received in the current terminal device, the model of the current terminal device may be obtained first, and the first database may then be queried with that model to find the decoding capability information of the CPU and the GPU of the current terminal device when decoding images. For example, in practical applications, the structure of the first database can be as shown in Table 1 below:

Table 1

For example, the terminal device models may include iPhone 5S 32G, Samsung S4 16G, and so on. In Table 1, the decoding capability is expressed as a rate; in actual applications, other parameters may also be used.

Of course, the database may also store the decoding capability values corresponding to the CPU and GPU of each terminal device when decoding images of various formats. At this point, the structure of the first database can be as shown in Table 2 below:

Table 2

The specific image format may include a JPEG format, a jpg format, a tif format, a bmp format, and the like.

In addition, the decoding capability information corresponding to different coding types in each format may be saved in the first database, and the structure of the specific first database is not introduced here.
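One way to picture such a first database is a lookup keyed by device model and image format. The sketch below is only an assumption about how the correspondence might be held in memory: the model names, the rate values, and the function names are hypothetical placeholders, not data from the application or from any real device.

```python
# Hypothetical first database: (device model, image format) -> (cpu_rate, gpu_rate).
# The rates are placeholder values, not measurements of any real device.
FIRST_DATABASE = {
    ("model-A", "jpeg"): (3.0, 2.0),
    ("model-A", "bmp"): (4.0, 1.0),
    ("model-B", "jpeg"): (2.0, 2.0),
}

def lookup_capability(model, image_format):
    """Return (cpu_rate, gpu_rate) for the device, or None if it is not listed."""
    return FIRST_DATABASE.get((model, image_format.lower()))

def capability_ratio(model, image_format):
    """Return cpu_rate / gpu_rate, or None if the device is not listed."""
    rates = lookup_capability(model, image_format)
    if rates is None:
        return None
    cpu_rate, gpu_rate = rates
    return cpu_rate / gpu_rate
```

Extending the key with an encoding type would cover the per-encoding-type case in the same way.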

It should be noted that, as shown in Table 1 and Table 2 above, the first database records the values of the decoding capabilities of the CPU and the GPU of each terminal device. In actual applications, the information actually used for decoding task allocation is the ratio between the two decoding capabilities. Therefore, the ratio can also be saved directly in the first database, so that the decoding capability ratio between the CPU and the GPU of the current terminal device can be obtained directly by querying the first database. For example, the structure of the first database in this case can be as shown in Table 3 below:

Table 3

Similarly, the ratio of the decoding capability of the CPU and the GPU when processing the images of each format by the terminal devices of each model can also be stored in the database. For example, the structure of the specific first database can be as shown in Table 4 below:

Table 4

Of course, in practical applications, the decoding capabilities of the CPU and the GPU may be related not only to the device model but also to other factors such as the memory usage of the terminal device. Decoding capability information obtained by the above table lookup therefore may not accurately reflect the decoding capabilities of the CPU and the GPU in a specific terminal device. For this reason, in the embodiments of the present application, the decoding capability information of the CPU and the GPU can also be obtained by testing in the specific terminal device.

For example, in one implementation, the decoding capability information of the CPU and the GPU of the current terminal device when decoding images may be tested in advance using a preset image, and the test result saved, so that after a decoding request is received, the decoding capability information of the CPU and the GPU of the current terminal device can be determined from the saved test result. Specifically, a test-dedicated JPEG image may be prepared in advance (for example, it may be downloaded to the terminal device together with the installation package when the decoder is installed). In the terminal device, the preset image is decomposed into a plurality of test decoding tasks in the manner described in step S102, and the dedicated JPEG image is then decoded by the CPU and by the GPU separately. The decoding rate of each is calculated from the task amount of the decoding tasks and the time each takes to decode, and the rate information is saved as the respective decoding capability information in a data table preset locally on the terminal device. After a specific decoding request is received, the decoding capability information of the CPU and the GPU of the current terminal device when decoding images can be obtained by querying the data table. Of course, there may be more than one dedicated test image, corresponding to different formats, and within the same format there may be multiple dedicated images corresponding to different encoding types. During testing, the decoding capability information of the CPU and the GPU for images of each format or each encoding type may be tested separately and saved in the terminal device; after a specific decoding request is received, the data acquired in the local test can be used to allocate the decoding tasks.

In addition, a mobile terminal device has limited resources, so downloading an additional test-dedicated image and running an additional test program may affect its performance. Therefore, in the embodiments of the present application, the decoding capability information of the CPU and the GPU of the current terminal device when decoding images may instead be tested using the images carried in actual decoding requests, and the test results saved, so that when a new decoding request is received, the decoding capability information of the CPU and the GPU of the current terminal device can be determined from the previously saved test results.

For example, when a decoding request is received for the first time, the current terminal device has not yet obtained any information about the decoding capabilities of its CPU and GPU. The image carried in the current request may therefore be decomposed into multiple decoding tasks that are all allocated to one of the processors (for example, the CPU) for decoding; the decoding capability information of that processor is then calculated from the task amount of the decoding tasks and the time taken to complete the decoding, and saved locally on the current terminal device. When a decoding request is received for the second time, the image carried in the current request is decomposed into multiple decoding tasks that are all allocated to the other processor (for example, the GPU) for decoding; the decoding capability information of the other processor is then calculated from the task amount of the decoding tasks and the time taken to complete the decoding, and saved locally on the current terminal device. In this way, from the third decoding request onward, the local data table of the terminal device can be queried directly to obtain the decoding capability information of the CPU and the GPU and allocate the decoding tasks.

Of course, in this implementation, the decoding capability test may also be performed separately for different image formats and for different encoding types within the same format. When a decoding request for an image of a certain format is received for the first time, the image carried in the current request may be decomposed into a plurality of decoding tasks that are all allocated to one of the processors (for example, the CPU) for decoding, and the decoding capability information of that processor is then calculated and saved locally on the current terminal device. When a decoding request for an image of that format is received for the second time, the image carried in the current request may be decomposed into multiple decoding tasks that are all allocated to the other processor (for example, the GPU) for decoding, and the decoding capability information of the other processor is then calculated and saved locally on the current terminal device. In this way, from the third decoding request for an image of that format onward, the local data table of the terminal device can be queried directly to obtain the decoding capability information of the CPU and the GPU for images of that format and allocate the decoding tasks. The case of different encoding types within the same format is similar and is not described in detail here.
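The preset-image test described above amounts to timing each processor over the same set of test decoding tasks and saving the resulting rates. The following is a minimal sketch under stated assumptions: `cpu_decode` and `gpu_decode` are hypothetical stand-ins for the real decoding routines, the local data table is modelled as a plain dictionary, and the rate is expressed in tasks per second.

```python
import time

def measure_rate(decode_task, tasks):
    """Decode every task once and return the rate in tasks per second.

    decode_task is a stand-in for a real CPU or GPU decoding routine.
    """
    start = time.perf_counter()
    for task in tasks:
        decode_task(task)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(tasks) / max(elapsed, 1e-9)

def run_preset_test(cpu_decode, gpu_decode, tasks, table, key):
    """Measure both processors on the preset image and save the rates under key."""
    table[key] = {
        "cpu": measure_rate(cpu_decode, tasks),
        "gpu": measure_rate(gpu_decode, tasks),
    }
    return table[key]
```

Using one key per format or encoding type, for example `"jpeg"`, would let the same table hold the results for several dedicated test images.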
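The three-stage behaviour described above (first request measures one processor, second request measures the other, later requests split the work) can be sketched as a small decision function. The names `choose_plan` and `record_rate` and the dictionary layout are hypothetical; the sketch only assumes that the CPU is measured first, as in the example given in the text.

```python
def choose_plan(saved_rates):
    """Decide how to handle a request based on what has been measured so far.

    saved_rates is a dict that may hold 'cpu' and 'gpu' decoding rates.
    """
    if "cpu" not in saved_rates:
        return "cpu_only"   # first request: measure the CPU
    if "gpu" not in saved_rates:
        return "gpu_only"   # second request: measure the GPU
    return "split"          # third request onward: joint decoding

def record_rate(saved_rates, processor, task_count, seconds):
    """Save rate = task amount / time spent for one processor."""
    saved_rates[processor] = task_count / seconds
```

Keeping one such dictionary per image format or encoding type would give the per-format variant of the same scheme.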

S104: Allocate the multiple decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.

After the decoding capability ratio between the CPU and the GPU is obtained, the decoding tasks obtained by the previous decomposition can be allocated to the CPU and the GPU for parallel processing. In a specific implementation, a first number of decoding tasks may be allocated to the CPU and a second number of decoding tasks to the GPU according to the total number of decoding tasks and the decoding capability ratio, where the sum of the first number and the second number equals the total number of decoding tasks. The ratio between the first number and the second number should be equal to, or as close as possible to, the decoding capability ratio between the CPU and the GPU, to improve the accuracy of the synchronization.

After the decoding in the CPU is completed, the decoding result of the GPU can be read back from the GPU and combined with the decoding result of the CPU to obtain a combined decoding result, which is then returned.

In practical applications, full synchronization generally cannot be achieved. When the CPU finishes decoding, the GPU may or may not have finished. Therefore, when reading back the decoding result from the GPU, the CPU can attempt to read back the decoding results of the second number of MCU block rows. Specifically, it can be determined whether the GPU has completed decoding. If so, the GPU decoding result is read back from the GPU; otherwise, the CPU waits until the GPU has completed decoding and then reads back the decoding results of the second number of MCU block rows from the GPU.

Specifically, when the decoding results are combined, the two decoding results are saved according to a predetermined rule, which is determined mainly by the positions that the respective decoding tasks occupy in the original image. Therefore, in a specific implementation, the relative positional relationship, within the original specified image, between the decoding tasks allocated to the CPU and those allocated to the GPU may be recorded. The storage address for the GPU decoding result is then determined from the storage address of the CPU decoding result and the relative positional relationship, and the GPU decoding result is saved to that address, which merges the decoding results.

For example, suppose a JPEG image has 100 MCU block rows, of which 60 are to be allocated to the CPU for decoding and 40 to the GPU. When assigning, the first 60 rows can be allocated to the CPU and the last 40 rows to the GPU. After the CPU completes its decoding and saves the result at some address, the end address of the CPU decoding result can be determined. After the GPU decoding result is read back, the address unit following that end address is used as the starting address for saving the GPU decoding result; that is, the CPU decoding result and the GPU decoding result are stored contiguously. This combines the decoding results of the CPU and the GPU, and the combined decoding result can be returned to the upper-layer code, for example, to the browser's display module for rendering, display, and other subsequent operations.
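The proportional split can be written down in a few lines. The sketch below assumes the decoding capabilities are given as two positive rates and rounds the CPU share to the nearest whole task; the function name `split_tasks` is hypothetical.

```python
def split_tasks(total, cpu_rate, gpu_rate):
    """Return (first_number, second_number) of tasks for the CPU and the GPU.

    The two numbers always sum to total, and their ratio is as close to
    cpu_rate : gpu_rate as whole tasks allow.
    """
    first = round(total * cpu_rate / (cpu_rate + gpu_rate))
    second = total - first
    return first, second
```

For instance, with 100 tasks and a capability ratio of 3:2, `split_tasks(100, 3, 2)` gives 60 tasks to the CPU and 40 to the GPU.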
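The check-then-wait behaviour can be illustrated with a worker thread standing in for the GPU. This is only a sketch of the control flow: a real implementation would use a GPU API rather than a thread, and `decode_row` is a hypothetical placeholder for the per-row decoding routine.

```python
from concurrent.futures import ThreadPoolExecutor

def joint_decode(cpu_rows, gpu_rows, decode_row):
    """Decode cpu_rows on the calling thread and gpu_rows on a worker.

    The worker thread only stands in for the GPU. Returns the two result
    lists and a flag saying whether the "GPU" had already finished when
    the CPU share was done.
    """
    with ThreadPoolExecutor(max_workers=1) as gpu:
        pending = gpu.submit(lambda: [decode_row(r) for r in gpu_rows])
        cpu_result = [decode_row(r) for r in cpu_rows]
        gpu_was_ready = pending.done()   # has the GPU completed decoding?
        gpu_result = pending.result()    # returns at once if ready, else waits
    return cpu_result, gpu_result, gpu_was_ready
```

The better the task split matches the real capability ratio, the more often the flag is true or the wait is short.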
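The contiguous storage in this example can be sketched with a single output buffer, assuming the CPU decoded the leading rows and the GPU the trailing rows. The function name `merge_results` is hypothetical, and byte strings stand in for the decoded pixel data.

```python
def merge_results(cpu_bytes, gpu_bytes):
    """Place the GPU output directly after the CPU output in one buffer."""
    merged = bytearray(len(cpu_bytes) + len(gpu_bytes))
    merged[:len(cpu_bytes)] = cpu_bytes
    gpu_start = len(cpu_bytes)   # the unit following the CPU end address
    merged[gpu_start:] = gpu_bytes
    return bytes(merged)
```

If the allocation were reversed, with the GPU taking the leading rows, the recorded positional relationship would simply swap the order of the two copies.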

For ease of understanding, the second embodiment below describes the embodiments of the present application in further detail, taking the decoding of a JPEG image as an example.

First, the basic operation steps of JPEG image decoding are briefly introduced. When a JPEG image is decoded, the related information of the file to be decoded is first read item by item according to the JPEG file data storage layout, in preparation for the subsequent decoding operations. The image data stream is composed of MCUs, and each MCU is composed of data units and color components; the image data stream stores information in units of bits, and the data was transformed from the spatial domain to the frequency domain by the forward discrete cosine transform (FDCT) at encoding time. Therefore, the decoding mainly includes the following steps:

First, MCU decoding is performed on the JPEG image, which decomposes it into a plurality of MCU block rows, each composed of a plurality of MCU blocks.

Then, since the data in the file is the result of transforming the spatial domain to the frequency domain by the forward discrete cosine transform (FDCT) at encoding time, the inverse discrete cosine transform (IDCT) must be applied when decoding; that is, the frequency domain values in each color component unit matrix are converted back to the spatial domain. The frequency domain matrix is 8*8, and after the inverse discrete cosine transform the spatial domain matrix is still 8*8.

After the IDCT, upsampling (upsample) is also required.

Finally, to display an image on the screen, the colors of the image must be represented in RGB mode. Therefore, the YCbCr mode needs to be converted to the RGB mode during decoding. For convenience of description, this step is referred to as YCC2RGB.

General JPEG image decoding involves the major steps described above. In the embodiments of the present application, the CPU and the GPU decode jointly; that is, the same decoding job is completed by the CPU and the GPU together, each taking a share of the decoding tasks. The specific decoding steps are still those listed above, and the specific implementation of each step is not the focus of the embodiments of the present application, so it is not described in detail. How to allocate the decoding tasks so as to maximize decoding efficiency is the key content of the embodiments of the present application and is therefore described in detail.
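The three steps that follow MCU decoding can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the IDCT is the direct textbook formula rather than any optimized variant, the upsampling simply repeats samples horizontally (the h2v1 case, without the interpolation a fancy variant would add), and the color conversion uses the commonly published JFIF full-range coefficients. The function names are hypothetical and are not taken from the application.

```python
import math

def idct_8x8(coeffs):
    """Direct (slow) 8x8 inverse DCT; coeffs[v][u] holds frequency values."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = total / 4
    return out

def upsample_h2v1(row):
    """Double a chroma row horizontally by repeating each sample."""
    out = []
    for sample in row:
        out.extend((sample, sample))
    return out

def clamp(value):
    """Round to the nearest integer and limit to the 0..255 range."""
    return max(0, min(255, int(round(value))))

def ycc_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr pixel to RGB (JFIF coefficients)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)
```

Each function works on data from a single block or row, which is why rows of MCU blocks can be handed to different processors independently.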

Referring to FIG. 2, when decoding a JPEG image, the following steps may be included:

S201: Receive a request for decoding a specified JPEG image;

S202: performing MCU decoding on the specified JPEG image to obtain multiple MCU block rows, where each MCU block row is composed of multiple MCU blocks;

After the decoding request is received, MCU decoding may first be performed on the image in the CPU, with all MCU decoding operations performed in the CPU. After this decoding, multiple MCU block rows are obtained, each composed of multiple MCU blocks. Decoding tasks can then be assigned between the CPU and the GPU in units of MCU block rows. The number of MCU blocks in each MCU block row is generally determined by factors such as the height and width of the specific image and the horizontal and vertical sampling factors.

It should be noted that, in the embodiments of the present application, the minimum granularity of decoding task allocation is one MCU block row. This is because an upsample (upsampling) operation is required in the GPU, and if the processed data is not a complete MCU block row, the upsampling calculation on the GPU will fail.

S203: Determine decoding capability information of the current terminal device CPU and the GPU when decoding the JPEG image respectively;

As described in the first embodiment, when determining the decoding capability information, there may be multiple ways, including querying a pre-established database, or using a preset test-specific image for testing, or performing in an actual decoding process. Testing, and more. In the second embodiment, a specific implementation manner is mainly described in the manner of testing in the actual decoding process.

Specifically, when receiving the request for decoding a specified JPEG image, the encoding type of the specified JPEG image may be acquired, and then determining whether the current terminal device has saved the CPU and the processing of the GPU when processing the encoding type JPEG image The rate information, if both are not yet, proves that the encoding type image is decoded on the terminal device for the first time. Therefore, all decoding tasks can be assigned to the CPU, that is, all decoding operations of the JPEG image are all After the CPU completes, the task amount of the decoding task can be calculated and the time taken by the CPU to complete the decoding task is recorded. It should be noted that, since the MCU decoding work is more suitable for the CPU, in the embodiment of the present application, after the CPU and the GPU are jointly decoded, the MCU decoding operation may be completed on the CPU. Assign subsequent tasks to the CPU and GPU. Therefore, when calculating the task amount of the CPU and the time spent here, the calculation may be started after the completion of the MCU decoding, that is, from the completion of the MCU decoding to the completion of all the decoding tasks. Traffic and time spent. Then, according to the ratio between the task amount and the elapsed time, the decoding rate of the CPU when decoding the encoding type JPEG image can be calculated, and the decoding capability information of the CPU when decoding the JPEG image of the encoding type is currently Saved in the terminal device. It should be noted that, for the JPEG image, after decoding by using the CPU, the decoding result can be returned, and when another decoding request is received later, if it is found that the JPEG image of the encoding type is still, it can be directly based on the already The saved information acquires the processing rate when the CPU in the current terminal device decodes the JPEG image of the type.

If, after the request for decoding a specified JPEG image is received and the encoding type of the image is obtained, it is found that the current terminal device stores only the processing rate information of the CPU for JPEG images of that encoding type, a JPEG image of that encoding type is being decoded on the terminal device for the second time. The image can therefore be decoded using the GPU, so as to test the processing rate of the GPU when decoding JPEG images of that encoding type. However, in a specific implementation, the CPU is more suitable for the MCU decoding operation, so in the preferred scheme of joint CPU and GPU decoding provided by the embodiment of the present application, the MCU decoding operations are all completed on the CPU. The test may therefore proceed as follows: first, MCU decoding is performed on the currently specified JPEG image on the CPU, and the MCU decoding result is then transmitted to the GPU, so that the GPU performs the subsequent decoding operations, including IDCT, Upsample, YCC2RGB, and so on, based on the MCU decoding result. After the GPU decoding is completed, the CPU reads back the GPU decoding result, calculates the task amount of the decoding task allocated to the GPU, and records the time taken by the GPU to perform the decoding. In addition, the decoding time measured for the GPU may also include the IO time, namely the time taken to write the MCU decoding output data to GPU memory and the time taken to read the decoded RGB data back to CPU memory. That is, the time taken to send the decoding task to the GPU, the time taken by the GPU to perform the decoding, and the time taken to read the decoding result back from the GPU can all be measured. Then, from the task amount of the decoding task allocated to the GPU and the sum of these times, the processing rate of the GPU when decoding JPEG images of that encoding type is calculated, and this decoding capability information of the GPU is saved in the current terminal device.
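The rate calculation described above can be illustrated with a minimal sketch. The function name `effective_rate`, the use of MCU block rows as the unit of task amount, and all timing values below are assumptions made only for illustration; they are not measurements from the present application.

```python
def effective_rate(task_amount, *durations_ms):
    """Return task units processed per millisecond over the summed durations."""
    total_ms = sum(durations_ms)
    if total_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return task_amount / total_ms


# Hypothetical measurements for 60 MCU block rows, in milliseconds.
# CPU: only the time from the end of MCU decoding to the end of all decoding.
cpu_rate = effective_rate(60, 30)
# GPU: upload of MCU output + GPU computation + read-back of RGB data.
gpu_rate = effective_rate(60, 4, 10, 6)
```

Including the transfer times in the GPU figure means the saved rate reflects what the CPU-side decoder actually observes, rather than the raw compute speed of the GPU.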

In this way, when the decoding request for the JPEG image of the encoding type is subsequently received, the processing rate information of the CPU and the GPU when decoding the JPEG image of the encoding type can be directly read from the current terminal device.

That is to say, in the above manner, for JPEG images of each encoding type, the first two decoding operations use the CPU and the GPU respectively, so as to test their respective decoding capabilities. When a new JPEG image of that encoding type is subsequently received, the decoding processing rate information of the CPU and the GPU can be obtained by querying the terminal device.
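The three cases described above (no rate saved, only the CPU rate saved, both rates saved) can be sketched as a small lookup structure. The class `RateStore`, its method names, and the encoding type labels are hypothetical; a real implementation would also persist the rates on the terminal device.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rates:
    cpu: Optional[float] = None
    gpu: Optional[float] = None


class RateStore:
    """In-memory stand-in for the rate information saved on the terminal device."""

    def __init__(self) -> None:
        self._rates: Dict[str, Rates] = {}

    def plan(self, encoding_type: str) -> str:
        """Decide how the next image of this encoding type should be decoded."""
        rates = self._rates.get(encoding_type, Rates())
        if rates.cpu is None:
            return "cpu_only"   # first time: decode on the CPU and time it
        if rates.gpu is None:
            return "gpu_test"   # second time: MCU decode on CPU, rest on GPU
        return "joint"          # both rates known: split by the rate ratio

    def record(self, encoding_type: str, processor: str, rate: float) -> None:
        if processor not in ("cpu", "gpu"):
            raise ValueError("processor must be 'cpu' or 'gpu'")
        rates = self._rates.setdefault(encoding_type, Rates())
        setattr(rates, processor, rate)
```

For example, after `record("type_a", "cpu", 2.0)` the next call to `plan("type_a")` returns `"gpu_test"`, while an unseen encoding type still returns `"cpu_only"`.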

S204: Allocate decoding tasks to the CPU and the GPU according to the total number of MCU block rows and the pre-acquired rate ratio, where the decoding task of the CPU includes a first number of MCU block rows and the decoding task of the GPU includes a second number of MCU block rows;

After all the MCU block rows of the JPEG picture are obtained, the decoding rate ratio of the CPU and the GPU of the current terminal device can be queried. Of course, the encoding type of the currently specified JPEG image may be determined first, and the decoding rate ratio of the CPU and the GPU for that encoding type is then queried. The decoding tasks can then be allocated to the CPU and the GPU according to the total number of MCU block rows and the decoding rate ratio, where the decoding task of the CPU includes a first number of MCU block rows and the decoding task of the GPU includes a second number of MCU block rows. The sum of the first number and the second number equals the total number of MCU block rows, and the specific values of the first number and the second number are determined according to the total number of rows and the aforementioned decoding rate ratio. For example, in the ideal case, the ratio of the first number to the second number is exactly equal to the decoding rate ratio between the CPU and the GPU. In practical applications, because the total number of rows may not divide evenly, the ratio of the first number to the second number is made as close as possible to the decoding rate ratio between the CPU and the GPU. For example, assuming that the decoding rate ratio between the CPU and the GPU is 2:1 and the total number of MCU block rows is 90, the first number is determined to be 60 and the second number to be 30. If, however, the total number of MCU block rows is 91, which is not divisible by 3, the first number may be determined to be 60 and the second number 31, or the first number may be determined to be 61 and the second number 30, and so on.
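A minimal sketch of this allocation is given below. The function name `split_block_rows` is hypothetical, and rounding to the nearest integer is only one of the admissible ways to handle a total that does not divide evenly.

```python
def split_block_rows(total_rows, cpu_rate, gpu_rate):
    """Split MCU block rows between CPU and GPU in proportion to their rates.

    Returns (first_number, second_number); the two always sum to total_rows.
    """
    if total_rows < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("total_rows must be >= 0 and rates must be positive")
    cpu_rows = round(total_rows * cpu_rate / (cpu_rate + gpu_rate))
    cpu_rows = max(0, min(total_rows, cpu_rows))  # keep the split in range
    return cpu_rows, total_rows - cpu_rows


print(split_block_rows(90, 2, 1))  # (60, 30)
print(split_block_rows(91, 2, 1))  # (61, 30)
```

With 91 rows this sketch picks 61 and 30; choosing 60 and 31 instead would be equally consistent with the description.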

It should be noted that, when the decoding tasks are allocated, the allocation generally follows the position of each block in the original image. For example, where the decoding rate ratio between the CPU and the GPU is 2:1, the first two-thirds of the block rows can be allocated to the CPU and the last third of the block rows to the GPU, so that when the decoding results are subsequently merged, the GPU decoding result can be saved directly after the address at which the CPU decoding result is located.

S205: Send a decoding request to the GPU in a non-blocking manner, with the second number of MCU block rows as a parameter, so that the GPU decodes the second number of MCU block rows;

After the MCU block rows to be decoded by the GPU are determined, those block rows can be sent to the GPU, that is, a decoding request is sent to the GPU with the second number of MCU block rows as a parameter. It should be noted that, when the decoding request is sent to the GPU, the number of GPU work items in each dimension may be configured based on the second number of rows (the specifics vary with the GPU programming platform, currently mainly CUDA or OpenCL). In addition, the GPU performs the decoding operations that follow MCU decoding, on the MCU block rows that MCU decoding has already produced; these operations mainly include IDCT, upsample, YCC2RGB, and the like. The decoding request sent to the GPU needs to be non-blocking, so that the decoder on the CPU side can continue with subsequent operations after sending the request without waiting for the GPU to return a response. That is, the CPU decodes the first number of MCU block rows, performing the operations that follow MCU decoding, such as IDCT, upsample, and YCC2RGB, on those rows. In other words, once MCU decoding has been completed on the CPU and the subsequent decoding tasks are allocated between the CPU and the GPU, the CPU and the GPU execute the same sequence of decoding steps, and the task amounts are allocated according to their respective decoding rates. This keeps the CPU and the GPU synchronized as closely as possible: the CPU finishes decoding the first number of MCU block rows at about the same time as the GPU finishes decoding the second number of MCU block rows, so neither has to wait long for the other.
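The non-blocking pattern described above can be sketched as follows. This is only an illustration of the control flow: a single worker thread stands in for the GPU command queue (a real implementation would use a non-blocking kernel launch in CUDA or OpenCL), and `decode_rows` is a hypothetical placeholder for the IDCT, upsample, and YCC2RGB steps.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_rows(rows):
    """Placeholder for IDCT + upsample + YCC2RGB on a list of MCU block rows."""
    return [f"rgb({row})" for row in rows]


def joint_decode(block_rows, cpu_count):
    """Decode the first cpu_count rows on the CPU and the rest on the 'GPU'."""
    cpu_part, gpu_part = block_rows[:cpu_count], block_rows[cpu_count:]
    with ThreadPoolExecutor(max_workers=1) as gpu_queue:
        # submit() returns immediately, like a non-blocking decoding request.
        pending = gpu_queue.submit(decode_rows, gpu_part)
        # The CPU decodes its own share while the request is being served.
        cpu_result = decode_rows(cpu_part)
        # Read back the GPU result only after the CPU share is finished.
        gpu_result = pending.result()
    return cpu_result + gpu_result
```

Because the request is issued before the CPU starts its own share, the two shares overlap in time; if the split matches the rate ratio, `pending.result()` should return with little or no waiting.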

It should be noted that IDCT is generally the most laborious and complicated of the steps that follow MCU decoding. For this reason, in the embodiment of the present application, the GPU may use multiple threads to perform the IDCT on an MCU block simultaneously. To this end, each MCU block may also be divided into groups. For example, a single MCU is Hmax*8 pixels wide and Vmax*8 pixels high, and each MCU is divided into several data units, typically of size 8*8. Each data unit can therefore be divided into 8 groups of 8 values each, so that in the GPU the 8 groups can be distributed to 8 different threads that perform the IDCT concurrently (to ensure optimal memory access efficiency, the 8 groups of data of one MCU block generally need to be allocated to different threads of the same workgroup for decoding). However, the main strength of the GPU lies in the decoding computation itself; operations such as grouping the data and calculating the storage addresses of the subsequent IDCT results are a weakness of the GPU. For this reason, in the embodiment of the present application, after the decoder on the CPU side determines which MCU block rows need to be assigned to the GPU, it may also first divide each MCU block into groups and calculate an address offset for each group, so that when the decoding request is sent to the GPU, the grouping result of each MCU block and the address offset corresponding to each group can be sent to the GPU together. In this way, during decoding, the GPU allocates the data of each group to a thread, performs the IDCT, and saves the IDCT result of each group according to the corresponding address offset, without having to calculate the addresses again, which improves GPU decoding efficiency. After the GPU has performed the IDCT on each MCU block, the subsequent upsample and YCC2RGB calculations can be performed.
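The CPU-side grouping and offset calculation can be sketched as follows. The function name `plan_groups`, the tuple layout, and the assumed output layout (data units stored one after another, 64 values each, one group per row of 8 values) are illustrative assumptions; the present application does not prescribe a particular memory layout.

```python
UNIT = 8  # a data unit is UNIT x UNIT values


def plan_groups(mcu_count, units_per_mcu):
    """Precompute, on the CPU, one descriptor per group for the GPU threads.

    Each descriptor is (mcu_index, unit_index, group_index, address_offset),
    where address_offset is where that group's IDCT result will be stored.
    """
    plan = []
    for mcu in range(mcu_count):
        for unit in range(units_per_mcu):
            unit_base = (mcu * units_per_mcu + unit) * UNIT * UNIT
            for group in range(UNIT):
                plan.append((mcu, unit, group, unit_base + group * UNIT))
    return plan


# Hypothetical case: 2 MCU blocks with 6 data units each.
descriptors = plan_groups(2, 6)
print(len(descriptors))  # 96
print(descriptors[9])    # (0, 1, 1, 72)
```

Each GPU thread would then take one descriptor, transform its 8 values, and write the result at the given offset without recomputing any address.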

S206: Decode the first number of MCU block rows in the CPU;

As described above, after sending the decoding request to the GPU, the CPU can decode the first number of MCU block rows. The specific operations to be performed are likewise mainly the IDCT, upsample, and YCC2RGB steps.
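As an illustration of the last of these steps, the following sketch converts one YCbCr sample to RGB. It uses the full-range conversion coefficients of the JFIF convention, which is an assumption here since the present application does not state which coefficients are used; the function name `ycc_to_rgb` is likewise hypothetical.

```python
def ycc_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr sample (0..255 each) to an (R, G, B) tuple."""

    def clamp(value):
        # Round to the nearest integer and keep the result within 0..255.
        return max(0, min(255, int(round(value))))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycc_to_rgb(128, 128, 128))  # (128, 128, 128)
```

The same per-pixel arithmetic applies on the CPU and on the GPU, which is why the two sides can share rows of the same image after MCU decoding.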

S207: After decoding in the CPU is completed, read back the decoding result of the GPU from the GPU.

S208: Combine the decoding result of the CPU with the decoding result of the GPU to obtain a combined decoding result.

After reading back the decoding result of the GPU, it can be merged with the decoding result of the CPU, so that the combined decoding result can be obtained, and the combined decoding result can be further returned to the upper layer code. For details, refer to the description of step S104 in the first embodiment, and the detailed description is omitted here.
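The merge can be sketched as a copy into one output buffer, assuming (as in the allocation example given earlier) that the CPU decodes the leading block rows and the GPU the trailing ones. The function name `merge_results` and the parameter `row_bytes` (the number of decoded bytes per MCU block row) are hypothetical.

```python
def merge_results(cpu_rows, gpu_rows, row_bytes, cpu_rgb, gpu_rgb):
    """Place the GPU result directly after the CPU result in one buffer."""
    if len(cpu_rgb) != cpu_rows * row_bytes:
        raise ValueError("unexpected CPU result size")
    if len(gpu_rgb) != gpu_rows * row_bytes:
        raise ValueError("unexpected GPU result size")
    out = bytearray((cpu_rows + gpu_rows) * row_bytes)
    gpu_offset = cpu_rows * row_bytes  # save address of the GPU result
    out[:gpu_offset] = cpu_rgb
    out[gpu_offset:] = gpu_rgb
    return bytes(out)


print(merge_results(2, 1, 4, b"A" * 8, b"B" * 4))  # b'AAAAAAAABBBB'
```

If the CPU had instead been given the trailing rows, the same idea applies with the two offsets exchanged.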

In summary, in the embodiment of the present application, joint decoding by the CPU and the GPU can be implemented for JPEG images. The decoding tasks of the two are allocated based on their respective decoding capabilities, and the allocation is made in units of MCU block rows after MCU decoding has been performed on the CPU side. Thus, after the tasks are assigned, the CPU and the GPU perform the same operation steps but are allocated different task amounts according to their respective decoding capabilities. The decoding of the two can therefore be kept synchronized to the greatest possible extent, avoiding the situation in which one party, having finished decoding, must wait for the other, and thereby improving the overall decoding efficiency for JPEG images.

Corresponding to the method for decoding a JPEG image provided by the embodiment of the present application, the embodiment of the present application also provides an apparatus for decoding an image. Referring to FIG. 3, the apparatus may include:

a decoding request receiving unit 301, configured to receive a request for decoding a specified image;

Decoding task decomposition unit 302, configured to decompose the specified image into multiple decoding tasks;

The decoding capability information determining unit 303 is configured to determine decoding capability information when the central processing unit CPU of the current terminal device and the graphics processing unit GPU respectively decode the image;

The decoding task allocation unit 304 is configured to allocate the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.

The decoding capability information determining unit 303 may specifically include:

a format determining subunit, configured to determine a format of the specified image;

The first capability information determining subunit is configured to determine decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the format, respectively.

An encoding type determining subunit, configured to determine an encoding type of the specified image in the format;

The second capability information determining subunit is configured to determine decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the encoding type in the format, respectively.

Specifically, the decoding capability information determining unit 303 may specifically include:

The terminal model information obtaining subunit is configured to obtain model information of the current terminal device;

The query subunit is configured to query a pre-established database according to the model of the current terminal device, and obtain the decoding capability information of the CPU and the GPU of the current terminal device when respectively decoding the image; wherein the database stores correspondences between the models of a plurality of terminal devices and the decoding capability information, when respectively decoding an image, of the CPUs and GPUs with which those devices are equipped.

In another implementation, the device may further include:

a first test unit, configured to test, in advance, the decoding capability information of the current terminal device CPU and the GPU when decoding the image, and save the test result;

The decoding capability information determining unit 303 can be specifically configured to:

Determining, according to the saved test result, decoding capability information of the CPU of the current terminal device and the GPU when decoding the image, respectively.

In another implementation, the decoding capability information determining unit 303 may further include:

a second test unit, configured to pre-use the image carried in the decoding request, test the decoding capability information of the current terminal device CPU and the GPU when decoding the image, and save the test result;

Specifically, the decoding task assignment unit 304 may specifically include:

An allocation subunit, configured to allocate a first number of decoding tasks to the CPU according to the total number of decoding tasks and the decoding capability ratio, and allocate a second number of decoding tasks to the GPU;

a decoding request sending subunit, configured to send a decoding request to the GPU in a non-blocking manner by using the decoding task of the second number as a parameter, so that the GPU decodes the decoding task of the second number;

And a decoding subunit, configured to decode the decoding task of the first number in the CPU.

In a specific implementation, the device may further include:

a decoding result readback unit, configured to read back a decoding result of the GPU from the GPU after decoding in the CPU is completed;

And a decoding result merging unit, configured to combine the decoding result of the CPU with the decoding result of the GPU, and return the combined result.

In addition, the device may further include:

a recording unit, configured to record, when the plurality of decoding tasks are allocated to the CPU and the GPU for parallel processing, relative positional relationship information, in the specified image, between the decoding tasks allocated to the CPU and the decoding tasks allocated to the GPU;

The decoding result combining unit may be specifically configured to:

Determine, according to the save address of the decoding result of the CPU and the relative positional relationship, a save address for the decoding result of the GPU, and save the decoding result of the GPU to that save address.

The specified image includes an image in a JPEG format, and the decoding task decomposition unit 302 may be specifically configured to:

The specified image is subjected to minimum coding unit MCU decoding to obtain a plurality of MCU block lines, each MCU block line is determined as one decoding task, and each MCU block line is composed of a plurality of MCU blocks.

Wherein, the device may further comprise:

a data group dividing unit, configured to divide each MCU block allocated to the GPU processing into a preset number of data groups;

An address offset calculation unit, configured to calculate an address offset of the calculation result corresponding to each data group after the inverse discrete cosine transform IDCT is performed in the GPU;

a sending unit, configured to: when the decoding task is allocated to the GPU, also send the grouping information and the address offset information corresponding to each group to the GPU, so that the GPU allocates the same MCU block to multiple threads for IDCT calculation according to the grouping information and saves the IDCT calculation results according to the address offsets, so that subsequent steps can read the IDCT calculation results according to the address offsets and perform subsequent decoding calculations.

In practical applications, the device may further include:

a browsing request sending unit, configured to send a browsing request of the commodity object to the server before receiving the request for decoding the specified image;

The page information receiving unit is configured to receive page information including the product object picture information returned by the server, and receive the request for decoding the specified image in the process of displaying the page information.

It will be apparent to those skilled in the art from the above description of the embodiments that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application may in essence be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application or in portions of the embodiments.

The various embodiments in the specification are described in a progressive manner; the same or similar parts of the various embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the device or system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and for the relevant parts reference can be made to the description of the method embodiments. The device and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

The method and device for decoding an image provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

A method for decoding an image, comprising:

Receiving a request to decode a specified image;

Decomposing the specified image into a plurality of decoding tasks;

Determining decoding capability information of the central processor CPU of the current terminal device and the graphics processor GPU when decoding the image respectively;

The plurality of decoding tasks are allocated to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.
The method according to claim 1, wherein the determining the decoding capability information of the CPU of the current terminal device and the GPU when decoding the image respectively includes:

Determining a format of the specified image;

Determining decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the format, respectively.
The method according to claim 1, wherein the determining the decoding capability information of the CPU of the current terminal device and the GPU when decoding the image respectively includes:

Determining a format of the specified image;

Determining an encoding type of the specified image in the format;

Determining decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the encoding type in the format, respectively.
The method according to any one of claims 1 to 3, wherein the determining the decoding capability information of the CPU of the current terminal device and the GPU when decoding the image respectively includes:

Obtain the model information of the current terminal device;

Querying a pre-established database according to the model of the current terminal device, and acquiring the decoding capability information of the CPU and the GPU of the current terminal device when respectively decoding the image; wherein the database stores correspondences between the models of a plurality of terminal devices and the decoding capability information, when respectively decoding an image, of the CPUs and GPUs with which those devices are equipped.
The method according to any one of claims 1 to 3, further comprising:

Pre-using the preset image to test the decoding capability information of the CPU and the GPU of the current terminal device when decoding the image, respectively, and saving the test result;

The determining the decoding capability information of the CPU of the current terminal device and the GPU when decoding the image respectively includes:

Determining, according to the saved test result, decoding capability information of the CPU of the current terminal device and the GPU when decoding the image, respectively.
The method according to any one of claims 1 to 3, further comprising:

Pre-utilizing the image carried in the decoding request, testing the decoding capability information of the CPU and the GPU of the current terminal device when decoding the image, and saving the test result;

The determining the decoding capability information of the CPU of the current terminal device and the GPU when decoding the image respectively includes:

Determining, according to the saved test result, decoding capability information of the CPU of the current terminal device and the GPU when decoding the image, respectively.
The method according to claim 1, wherein the assigning the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU comprises:

Allocating a first number of decoding tasks to the CPU according to the total number of decoding tasks and the decoding capability ratio, and allocating a second number of decoding tasks to the GPU;

Sending a decoding request to the GPU in a non-blocking manner, with the second number of decoding tasks as a parameter, so that the GPU decodes the second number of decoding tasks;

The first number of decoding tasks are decoded in the CPU.
The method of claim 7 further comprising:

After decoding in the CPU is completed, reading back the decoding result of the GPU from the GPU;

Combining the decoding result of the CPU with the decoding result of the GPU, and returning the combined result.
The method according to claim 8, wherein when the plurality of decoding tasks are allocated to the CPU and the GPU for parallel processing, the method further includes:

Recording relative position relationship information of the decoding task assigned to the CPU and the decoding task assigned to the GPU in the specified image;

The combining the decoding result of the CPU with the decoding result of the GPU includes:

Determining a save address for the decoding result of the GPU according to the save address of the decoding result of the CPU and the relative positional relationship, and saving the decoding result of the GPU to that save address.
The method according to claim 1, wherein the specified image comprises an image in a JPEG format, and the decomposing the specified image into a plurality of decoding tasks comprises:

The specified image is subjected to minimum coding unit MCU decoding to obtain a plurality of MCU block lines, each MCU block line is determined as one decoding task, and each MCU block line is composed of a plurality of MCU blocks.
The method of claim 10, further comprising:

Dividing each MCU block allocated to the GPU processing into a preset number of data groups;

Calculating the address offset of the calculation result corresponding to each data group after the inverse discrete cosine transform IDCT is performed in the GPU;

When the decoding task is allocated to the GPU, the grouping information and the address offset information corresponding to each group are also sent to the GPU, so that the GPU allocates the same MCU block to multiple threads for IDCT calculation according to the grouping information and saves the IDCT calculation results according to the address offsets, so that subsequent steps can read the IDCT calculation results according to the address offsets and perform subsequent decoding calculations.
The method according to any one of claims 1 to 3, wherein, before the receiving a request for decoding a specified image, the method further comprises:

Sending a browse request for the commodity object to the server;

Receiving page information including the product object picture information returned by the server, and receiving the request for decoding the specified image in the process of displaying the page information.
An apparatus for decoding an image, comprising:

a decoding request receiving unit, configured to receive a request to decode the specified image;

a decoding task decomposition unit, configured to decompose the specified image into a plurality of decoding tasks;

a decoding capability information determining unit, configured to determine decoding capability information of a central processing unit CPU and a graphics processing unit GPU of the current terminal device when decoding an image, respectively;

And a decoding task allocation unit, configured to allocate the plurality of decoding tasks to the CPU and the GPU for parallel processing according to a decoding capability ratio between the CPU and the GPU.
The apparatus according to claim 13, wherein the decoding capability information determining unit comprises:

a format determining subunit, configured to determine a format of the specified image;

The first capability information determining subunit is configured to determine decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the format, respectively.
The apparatus according to claim 13, wherein the decoding capability information determining unit comprises:

a format determining subunit, configured to determine a format of the specified image;

An encoding type determining subunit, configured to determine an encoding type of the specified image in the format;

The second capability information determining subunit is configured to determine decoding capability information of the CPU of the current terminal device and the GPU when decoding the image of the encoding type in the format, respectively.
The apparatus according to any one of claims 13 to 15, wherein the decoding capability information determining unit comprises:

The terminal model information obtaining subunit is configured to obtain model information of the current terminal device;

The query subunit is configured to query a pre-established database according to the model of the current terminal device, and obtain the decoding capability information of the CPU and the GPU of the current terminal device when respectively decoding the image; wherein the database stores correspondences between the models of a plurality of terminal devices and the decoding capability information, when respectively decoding an image, of the CPUs and GPUs with which those devices are equipped.
The device according to any one of claims 13 to 15, further comprising:

a first test unit, configured to test, in advance, the decoding capability information of the current terminal device CPU and the GPU when decoding the image, and save the test result;

The decoding capability information determining unit is specifically configured to:

Determining, according to the saved test result, decoding capability information of the CPU of the current terminal device and the GPU when decoding the image, respectively.
The device according to any one of claims 13 to 15, further comprising:

a second test unit, configured to pre-use the image carried in the decoding request, test the decoding capability information of the current terminal device CPU and the GPU when decoding the image, and save the test result;

The decoding capability information determining unit is specifically configured to:

Determining, according to the saved test result, decoding capability information of the CPU of the current terminal device and the GPU when decoding the image, respectively.
The apparatus according to claim 13, wherein the decoding task allocation unit comprises:

An allocation subunit, configured to allocate a first number of decoding tasks to the CPU according to the total number of decoding tasks and the decoding capability ratio, and allocate a second number of decoding tasks to the GPU;

a decoding request sending subunit, configured to send a decoding request to the GPU in a non-blocking manner by using the decoding task of the second number as a parameter, so that the GPU decodes the decoding task of the second number;

And a decoding subunit, configured to decode the decoding task of the first number in the CPU.
The device according to claim 19, further comprising:

a decoding result readback unit, configured to read back a decoding result of the GPU from the GPU after decoding in the CPU is completed;

And a decoding result merging unit, configured to combine the decoding result of the CPU with the decoding result of the GPU, and return the combined result.
The device according to claim 20, further comprising:

a recording unit, configured to record, when the plurality of decoding tasks are allocated to the CPU and the GPU for parallel processing, relative positional relationship information, in the specified image, between the decoding tasks allocated to the CPU and the decoding tasks allocated to the GPU;

The decoding result combining unit is specifically configured to:

Determine, according to the save address of the decoding result of the CPU and the relative positional relationship, a save address for the decoding result of the GPU, and save the decoding result of the GPU to that save address.
The device according to claim 13, wherein the specified image comprises an image in a JPEG format, and the decoding task decomposition unit is specifically configured to:

The specified image is subjected to minimum coding unit MCU decoding to obtain a plurality of MCU block lines, each MCU block line is determined as one decoding task, and each MCU block line is composed of a plurality of MCU blocks.
The device according to claim 22, further comprising:

a data group dividing unit, configured to divide each MCU block allocated to the GPU processing into a preset number of data groups;

An address offset calculation unit, configured to calculate an address offset of the calculation result corresponding to each data group after the inverse discrete cosine transform IDCT is performed in the GPU;

a sending unit, configured to: when the decoding task is allocated to the GPU, also send the grouping information and the address offset information corresponding to each group to the GPU, so that the GPU allocates the same MCU block to multiple threads for IDCT calculation according to the grouping information and saves the IDCT calculation results according to the address offsets, so that subsequent steps can read the IDCT calculation results according to the address offsets and perform subsequent decoding calculations.
The device according to any one of claims 13 to 15, further comprising:

a browsing request sending unit, configured to send a browsing request of the commodity object to the server before receiving the request for decoding the specified image;

The page information receiving unit is configured to receive page information including the product object picture information returned by the server, and receive the request for decoding the specified image in the process of displaying the page information.