CN114064557A

CN114064557A - Image processing apparatus

Info

Publication number: CN114064557A
Application number: CN202010744770.2A
Authority: CN
Inventors: 陈羿逞; 童旭荣
Original assignee: Realtek Semiconductor Corp
Current assignee: Realtek Semiconductor Corp
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2022-02-18

Abstract

A graphics processing apparatus includes a plurality of system-on-chips (SoCs) that cooperate to achieve higher graphics processing performance. The apparatus includes a first SoC, an external circuit, and a second SoC. The external circuit is not included in either SoC. The first SoC includes: a first graphics processing unit (GPU) for dividing data to be processed into a first portion and a second portion, and for acquiring and processing the first portion to generate and output first data; and a first transceiver circuit for obtaining the second portion, transmitting the second portion to the second SoC via the external circuit, and receiving second data via the external circuit in order to forward the second data. The second SoC includes: a second transceiver circuit for receiving the second portion via the external circuit and outputting the second data to the first SoC via the external circuit; and a second GPU that receives the second portion from the second transceiver circuit and processes it to output the second data to the second transceiver circuit.

Description

Image processing apparatus

Technical Field

The present invention relates to a graphics processing apparatus, and more particularly, to a graphics processing apparatus including a plurality of system-on-chips that can operate cooperatively.

Background

A system-on-chip (SoC) design integrates the main functions of an end product (or system) into a single chip, which is called an SoC.

An SoC with low computing power is generally used in lower-end electronic products (e.g., TVs with 1920 × 1080 resolution), and an SoC with high computing power is generally used in higher-end electronic products (e.g., TVs with 3840 × 1920 resolution). The total cost of developing and manufacturing several SoCs with different computing capabilities is higher than the cost of developing and manufacturing any one of them, and using a high-computing-power SoC in a lower-end product is not cost-effective. The industry therefore needs a technique that achieves high computing power by combining several low-computing-power SoCs, so that a single low-computing-power SoC can be used in lower-end products and a combination of such SoCs can be used in higher-end products.

Known multi-core and multi-cluster technologies include the Generic Interrupt Controller (GIC), Coherent Mesh Network (CMN) technology, and Cache Coherent Interconnect for Accelerators (CCIX) technology. None of these techniques addresses the cooperative operation of different system-on-chips.

Disclosure of Invention

One objective of the present disclosure is to provide a graphics processing apparatus including a plurality of system-on-chips that can operate cooperatively to achieve higher graphics processing performance.

One embodiment of the disclosed graphics processing apparatus includes a first system-on-chip, an external circuit, and a second system-on-chip. The first system-on-chip includes a first graphics processing unit (GPU) and a first transceiver circuit. In an enhanced processing mode, the first GPU divides the data to be processed into a plurality of input portions including a first input portion and a second input portion, and fetches and processes the first input portion to generate and output first output data. The first transceiver circuit is coupled to the first GPU; in the enhanced processing mode, it obtains the second input portion and transmits the second input portion to the second system-on-chip via the external circuit, and it receives second output data via the external circuit in order to forward the second output data. The external circuit is not included in either of the first system-on-chip and the second system-on-chip. The second system-on-chip includes a second transceiver circuit and a second GPU. In the enhanced processing mode, the second transceiver circuit receives the second input portion via the external circuit and outputs the second output data to the first system-on-chip via the external circuit. The second GPU is coupled to the second transceiver circuit; in the enhanced processing mode, it receives the second input portion, processes the second input portion to generate the second output data, and outputs the second output data to the second transceiver circuit.

The features, operation and efficacy of the present invention are described in detail herein with reference to the accompanying drawings.

Drawings

FIG. 1 shows an embodiment of an image processing apparatus according to the present disclosure;

FIG. 2 illustrates one embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 3 illustrates another embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 4 shows an embodiment of an image processing pipeline that can be used as the first/second image processing pipeline of FIG. 3;

FIG. 5 shows another embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 6 shows one embodiment of a data processing apparatus of the present disclosure;

FIG. 7 illustrates one embodiment of the first SoC and the second SoC of FIG. 6;

FIG. 8 shows an example of the cooperation of the first SoC and the second SoC of FIG. 6;

FIG. 9 shows another embodiment of the first SoC and the second SoC of FIG. 6;

FIG. 10 shows one embodiment of a graphics processing device of the present disclosure;

FIG. 11 illustrates one embodiment of the first SoC and the second SoC of FIG. 10;

FIG. 12 shows a schematic diagram of an exemplary implementation of FIG. 11; and

FIG. 13 shows a schematic diagram of another exemplary implementation of FIG. 11.

Detailed Description

The present disclosure describes an image processing apparatus, a data processing apparatus, and a graphics processing apparatus, each of which includes a plurality of system-on-chips that can operate cooperatively to achieve higher processing performance. To facilitate understanding, the following description contains many examples, illustrations, and exemplary implementations; these are not intended to limit the scope of the present invention.

FIG. 1 shows an embodiment of an image processing apparatus according to the present disclosure. The image processing apparatus 100 of FIG. 1 includes a first SoC 110, a second SoC 120, and an external circuit 130. The first SoC 110 is a main SoC and the second SoC 120 is a performance-enhancing SoC; the two have the same or different circuit configurations, although, depending on implementation requirements, some circuits in the first SoC 110 and/or the second SoC 120 may serve no substantial function. The external circuit 130 is not included in either of the first SoC 110 and the second SoC 120. For example, if the first SoC 110 and the second SoC 120 are packaged chips disposed on a circuit board (e.g., a printed circuit board), the external circuit 130 may be or include a signal transmission line of the circuit board. As another example, if the first SoC 110 and the second SoC 120 are both unpackaged dies contained in one semiconductor package, the external circuit 130 is contained in the semiconductor package and, depending on the type of the package (e.g., wire bond package, flip chip package, etc.), includes at least one of the following: at least one connecting pad; at least one connecting wire; at least one metal ball; and at least one circuit located on the surface of the substrate or contained in the substrate.

FIG. 2 shows one embodiment of the first SoC 110 and the second SoC 120 of FIG. 1. As shown in FIG. 2, the first SoC 110 includes a data splitting circuit 112, a first image processing circuit 114, and a transmitting circuit 116, and the second SoC 120 includes a receiving circuit 122 and a second image processing circuit 124. Each of the first SoC 110 and the second SoC 120 processes a portion of the input image data, so that the two SoCs together achieve higher image processing performance without either exceeding its own processing capability. The circuits of the first SoC 110 and the second SoC 120 are described below.

Please refer to FIGS. 1-2. The data splitting circuit 112 is configured to split the input image data into N input portions, including a first input portion and a second input portion, for the first image processing circuit 114 and the second image processing circuit 124 to process respectively. N is an integer greater than 1, which also means that the image processing apparatus 100 includes N SoCs that can operate cooperatively. In an exemplary implementation, the data splitting circuit 112 counts the number of horizontal pixels received for a horizontal image line to determine the horizontal position of the currently received pixel, and accordingly divides the input image data into data of a left half frame and data of a right half frame (when N is 2) or into more parts (when N > 2); this can be implemented with known techniques. In an exemplary implementation where N is 2, the processing capability of each of the first image processing circuit 114 and the second image processing circuit 124 corresponds to a size of 7680 pixels × 4320 pixels at a frame rate of 60Hz (abbreviated as 8K4K60Hz) or to an equivalent combination of size and frame rate (e.g., 4K120Hz), and the splitting of the input image data is one of the following:

(1) The size and frame rate of the input image data are 8K4K60Hz. The first input portion is the data of the left half frame of the input image data, and its size and frame rate are 3840 pixels × 4320 pixels and 60Hz (abbreviated as 4K60Hz), respectively. The second input portion is the data of the right half frame of the input image data, and its size and frame rate are also 4K60Hz.

(2) The size and frame rate of the input image data are 8K4K60Hz. The first input portion is the data of the left half frame plus part of the data of the right half frame of the input image data, and its size and frame rate are (3840 + n) pixels × 4320 pixels and 60Hz (abbreviated as (4K + n)4K60Hz), respectively. The second input portion is the data of the right half frame plus part of the data of the left half frame of the input image data, and its size and frame rate are also (4K + n)4K60Hz. In this case, the right-half-frame data in the first input portion and the left-half-frame data in the second input portion are typically the data adjacent to the boundary between the left and right halves; the first image processing circuit 114 and the second image processing circuit 124 use this data as a reference so that the processed left and right halves join seamlessly.

(3) The size and frame rate of the input image data are 3840 pixels × 2160 pixels and 120Hz (abbreviated as 4K2K120Hz), respectively. The first input portion is the data of the left half frame of the input image data, and its size and frame rate are 1920 pixels × 2160 pixels and 120Hz (abbreviated as 2K120Hz), respectively. The second input portion is the data of the right half frame of the input image data, and its size and frame rate are also 2K120Hz.

(4) The size and frame rate of the input image data are 4K2K120Hz. The first input portion is the data of the left half frame plus part of the data of the right half frame of the input image data, and its size and frame rate are (1920 + n) pixels × 2160 pixels and 120Hz (abbreviated as (2K + n)2K120Hz), respectively. The second input portion is the data of the right half frame plus part of the data of the left half frame of the input image data, and its size and frame rate are also (2K + n)2K120Hz. In this case, the right-half-frame data in the first input portion and the left-half-frame data in the second input portion are typically the data adjacent to the boundary between the left and right halves; the first image processing circuit 114 and the second image processing circuit 124 use this data as a reference so that the processed left and right halves join seamlessly.
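The splitting in cases (1) to (4) can be illustrated in software. The following is only a minimal sketch, assuming one horizontal line is available as a list of pixel values; the function name `split_line` and the parameter `n_overlap` (the extra n pixels of cases (2) and (4)) are hypothetical and do not come from the disclosure, which describes a hardware circuit.

```python
from typing import List, Tuple


def split_line(line: List[int], n_overlap: int = 0) -> Tuple[List[int], List[int]]:
    """Split one horizontal image line into a left and a right input portion.

    The horizontal position of each pixel is obtained by counting the pixels
    received so far. With n_overlap > 0, each portion also keeps n_overlap
    pixels from the other side of the boundary (cases (2) and (4)).
    """
    half = len(line) // 2
    if not 0 <= n_overlap <= half:
        raise ValueError("n_overlap must be between 0 and half the line width")
    left: List[int] = []
    right: List[int] = []
    for count, pixel in enumerate(line):  # count = pixels received before this one
        if count < half + n_overlap:
            left.append(pixel)
        if count >= half - n_overlap:
            right.append(pixel)
    return left, right


# One 7680-pixel line; pixel values are simply their horizontal positions.
line = list(range(7680))
left, right = split_line(line, n_overlap=16)
print(len(left), len(right))  # each portion is (3840 + 16) pixels wide
```

With `n_overlap=0` the function reproduces case (1) or (3): two portions with no shared pixels.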

Please refer to FIGS. 1-2. The transmitting circuit 116 is coupled to the data splitting circuit 112 and is configured to receive the second input portion and output it to the second SoC 120 via the external circuit 130. The receiving circuit 122 is coupled to the external circuit 130 and is configured to receive the second input portion and forward it to the second image processing circuit 124. In an exemplary implementation, the transmission between the transmitting circuit 116 and the receiving circuit 122 is based on a known or self-developed signaling standard (e.g., the V-by-One HS standard or the HDMI standard), whereas neither the transmission inside the first SoC 110 nor the transmission inside the second SoC 120 is based on that standard; the signaling standard typically supports a maximum data transfer rate that is not less than the image data processing capability of the second SoC 120. In an exemplary implementation, the first SoC 110 includes a first encryption circuit (not shown) for encrypting the second input portion before it is output to the receiving circuit 122 via the external circuit 130, and the second SoC 120 includes a second decryption circuit (not shown) for decrypting the second input portion after it is received. In an exemplary implementation, if the first SoC 110 needs to receive data from the second SoC 120, the first SoC 110 includes a first transceiver circuit (e.g., the first transceiver circuit 310 of FIG. 3) that includes the transmitting circuit 116; if the data received from the second SoC 120 is encrypted, the first SoC 110 also includes a first decryption circuit (not shown) to decrypt that data. In this exemplary implementation, if the second SoC 120 is capable of outputting data to the first SoC 110, the second SoC 120 includes a second transceiver circuit (e.g., the second transceiver circuit 320 of FIG. 3) that includes the receiving circuit 122, and the second SoC 120 may optionally include a second encryption circuit (not shown) for encrypting data before the second transceiver circuit outputs it to the first transceiver circuit. The encryption and decryption circuits may be implemented with known or self-developed technologies, such as High-bandwidth Digital Content Protection (HDCP).

Please refer to FIGS. 1-2. The first image processing circuit 114 is coupled to the data splitting circuit 112 and is configured to receive and process the first input portion to generate a first output portion, one of a plurality of output portions of the output image data, for a back-end circuit (e.g., a panel control circuit). The second image processing circuit 124 is coupled to the receiving circuit 122 and is configured to receive and process the second input portion to generate a second output portion of the output image data for the back-end circuit. For example, in case (1) or (2) above, when the size and frame rate of each of the first and second output portions are 3840 pixels × 4320 pixels and 120Hz (abbreviated as 4K120Hz), each of the first and second image processing circuits 114/124 includes a known or self-developed frame rate conversion (FRC) circuit (e.g., the FRC circuit 420 of FIG. 4) for converting the input frame rate (60Hz) of the first/second input portion into the output frame rate (120Hz) of the first/second output portion, and the first output portion and the second output portion together constitute one complete frame within one output frame period (1/120 second). Likewise, in case (3) or (4) above, when the size and frame rate of each of the first and second output portions are 4K120Hz, each of the first and second image processing circuits 114/124 includes a known or self-developed scaler (e.g., the scaler 430 of FIG. 4) for scaling the size of the first/second input portion (1920 pixels × 2160 pixels or (1920 + n) pixels × 2160 pixels) to the size of the first/second output portion (3840 pixels × 4320 pixels), and the first output portion and the second output portion together constitute one complete frame within one output frame period (1/120 second).
In addition, depending on implementation requirements, the first image processing circuit 114 may output at least a part of the first output portion to the second image processing circuit 124 through the transmitting circuit 116 and the receiving circuit 122, and/or the second image processing circuit 124 may output at least a part of the second output portion to the first image processing circuit 114 through the second transceiver circuit and the first transceiver circuit. For example, the two image processing circuits can exchange data that is to be sent to the panel for display, and process that data to meet the particular requirements of the panel.

It should be noted that the combination of the first output portion and the second output portion (e.g., the combination (8K4K120Hz) of the left half frame (4K120Hz) and the right half frame (4K120Hz) in any of cases (1)-(4)) determines the size and frame rate of the output image. The amount of data per unit time determined by that size and frame rate (i.e., the data transfer rate needed to output the output image) is greater than the per-unit-time processing capability (e.g., 4K120Hz) of the first image processing circuit 114 and greater than that of the second image processing circuit 124. In other words, the combination of the first SoC 110 and the second SoC 120 achieves higher processing performance than either SoC alone.
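The comparison above can be checked with simple arithmetic. The sketch below counts pixels per second only and assumes that the per-pixel data width is the same everywhere, so it cancels out; the helper name `pixel_rate` is hypothetical.

```python
def pixel_rate(width: int, height: int, frame_rate_hz: int) -> int:
    """Pixels per second for a given frame size and frame rate."""
    return width * height * frame_rate_hz


per_soc_capability = pixel_rate(3840, 4320, 120)   # 4K120Hz, one image processing circuit
equivalent_8k60 = pixel_rate(7680, 4320, 60)       # 8K4K60Hz, stated as equivalent capability
combined_output = pixel_rate(7680, 4320, 120)      # 8K4K120Hz, both output portions together

print(per_soc_capability)                          # 1990656000
print(equivalent_8k60 == per_soc_capability)       # True
print(combined_output // per_soc_capability)       # 2
```

Under these assumptions the combined output requires twice the pixel rate that either image processing circuit can handle alone, which is the point the paragraph makes.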

FIG. 3 shows another embodiment of the first SoC 110 and the second SoC 120 of FIG. 1 and, in particular, an embodiment of the first image processing circuit 114 and the second image processing circuit 124. In FIG. 3, the circuit configuration of the second SoC 120 is the same as that of the first SoC 110. Accordingly, the first SoC 110 includes the first transceiver circuit 310, which includes the transmitting circuit 116 (not shown in FIG. 3); the second SoC 120 includes the second transceiver circuit 320, which includes the receiving circuit 122 (not shown in FIG. 3); and the second SoC 120 further includes another data splitting circuit 330 corresponding to the data splitting circuit 112, although the data splitting circuit 330 serves no substantial function here and can be disabled or omitted. In addition, the first image processing circuit 114 and the second image processing circuit 124 have the same circuit configuration, in which some circuits may serve no substantial function and may be disabled or omitted. The first image processing circuit 114 includes a first delay circuit 1142, a first selection circuit 1144, and a first image processing pipeline 1146; the second image processing circuit 124 includes a second delay circuit 1242, a second selection circuit 1244, and a second image processing pipeline 1246. Note that if the input image data is or includes encoded data, each of the first image processing circuit 114 and the second image processing circuit 124 may further include a decoder (not shown) for decoding the encoded data so that the first image processing pipeline 1146 or the second image processing pipeline 1246 processes the decoded data.

Please refer to FIG. 3. Because the path over which the data splitting circuit 112 outputs the first input portion to the first image processing circuit 114 is generally shorter than the path over which it outputs the second input portion to the second image processing circuit 124, the first delay circuit 1142 receives and delays the first input portion so that the time at which the first image processing circuit 114 receives the first input portion is substantially synchronized with the time at which the second image processing circuit 124 receives the second input portion. Here, substantially synchronized means that the difference between the two receiving times is smaller than a predetermined threshold and can be ignored. The first selection circuit 1144 is coupled between the first delay circuit 1142 and the first image processing pipeline 1146 and is also coupled to the first transceiver circuit 310 (as shown by the dashed line in FIG. 3); the first selection circuit 1144 receives the first input portion from the first delay circuit 1142 and outputs it to the first image processing pipeline 1146. The first image processing pipeline 1146 is coupled to the first selection circuit 1144 and is configured to receive and process the first input portion to generate the first output portion.
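The role of the first delay circuit 1142 can be modeled as a fixed-length FIFO. The sketch below is an illustration under assumptions, not the disclosed circuit: the class `DelayLine` and the latency value `LINK_LATENCY` are hypothetical, and the longer inter-SoC path is itself modeled as a delay of the same length.

```python
from collections import deque
from typing import Deque, List, Optional


class DelayLine:
    """Fixed-length FIFO that delays each sample by `delay_cycles` pushes."""

    def __init__(self, delay_cycles: int) -> None:
        if delay_cycles < 0:
            raise ValueError("delay_cycles must be non-negative")
        # Pre-fill with None so the first outputs are empty slots.
        self._fifo: Deque[Optional[int]] = deque([None] * delay_cycles)

    def push(self, sample: Optional[int]) -> Optional[int]:
        """Insert one sample and return the sample pushed delay_cycles earlier."""
        self._fifo.append(sample)
        return self._fifo.popleft()


# Hypothetical latency of the inter-SoC path, in cycles (not from the source).
LINK_LATENCY = 3

first_delay = DelayLine(LINK_LATENCY)  # stands in for the first delay circuit 1142
link = DelayLine(LINK_LATENCY)         # stands in for the path through the external circuit 130

first_portion: List[Optional[int]] = [10, 11, 12, 13]
second_portion: List[Optional[int]] = [20, 21, 22, 23]
padding: List[Optional[int]] = [None] * LINK_LATENCY  # extra cycles to flush the FIFOs

arrivals_first = [first_delay.push(s) for s in first_portion + padding]
arrivals_second = [link.push(s) for s in second_portion + padding]

# Both portions now reach their pipelines in the same cycle.
print(arrivals_first.index(10), arrivals_second.index(20))
```

In a real design the delay would be chosen so that the remaining difference in arrival times stays below the predetermined threshold mentioned above.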

Please refer to FIG. 3. The second delay circuit 1242 is coupled to the data splitting circuit 330 (as shown by the dashed line in FIG. 3) but serves no substantial function here. The second selection circuit 1244 is coupled to the second delay circuit 1242 (as shown by the dashed line in FIG. 3) and is coupled between the second transceiver circuit 320 and the second image processing pipeline 1246; the second selection circuit 1244 receives the second input portion from the second transceiver circuit 320 and outputs it to the second image processing pipeline 1246. The second image processing pipeline 1246 is coupled to the second selection circuit 1244 and is configured to receive and process the second input portion to generate the second output portion.

Please refer to FIG. 3. In an exemplary implementation, the first image processing pipeline 1146 exchanges one or more synchronization signals (e.g., at least one horizontal synchronization signal and/or at least one vertical synchronization signal) with the second image processing pipeline 1246 to substantially synchronize the first output portion with the second output portion. In an exemplary implementation, a dedicated line (not shown) is provided between the first image processing pipeline 1146 and the second image processing pipeline 1246 for unidirectional or bidirectional signal transmission, and the part of the dedicated line that lies between the two SoCs is included in the external circuit 130; those skilled in the art can refer to the first transceiver 525, the external circuit 570, and the second transceiver 555 of FIG. 5 and the related description to understand how to implement the dedicated line. Various methods may be used for transmission between the first image processing pipeline 1146 and the second image processing pipeline 1246, including:

(1) The dedicated line is used to transmit data between the first image processing pipeline 1146 and the second image processing pipeline 1246. Either image processing circuit can receive or access the data from the other image processing circuit according to the timing of the input image data, and can process and output the data from the data splitting circuit 112; the data from the other image processing circuit can be temporarily stored in a buffer (not shown) before being output to the image processing circuit;

(2) The existing path (i.e., the first transceiver circuit 310, the external circuit 130, and the second transceiver circuit 320) is used for the transmission between the first image processing pipeline 1146 and the second image processing pipeline 1246. If the existing path can only transmit or receive at any given time, each SoC can use a known or self-developed arbiter (not shown) to decide when to transmit and when to receive according to the timing of the input image data; if the existing path can transmit and receive at the same time, either SoC can temporarily store the received data in a buffer (not shown), and the image processing circuit of that SoC can receive or access the data in the buffer according to the timing of the input image data and process and output the data from the data splitting circuit 112.

FIG. 4 shows an embodiment of an image processing pipeline 400 that may be used as either of the first image processing pipeline 1146 and the second image processing pipeline 1246. The image processing pipeline 400 includes: a known or self-developed image characteristic adjusting circuit 410 for adjusting image characteristics such as brightness, contrast, and saturation; a known or self-developed frame rate conversion circuit 420; and a known or self-developed scaler 430. The order of the circuits in the image processing pipeline 400 depends on implementation requirements; in addition, the image processing pipeline 400 may include more circuits (e.g., a known or self-developed panel timing converter) or omit circuits that are not used.
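The idea that the stages of the image processing pipeline 400 can be ordered, added, or omitted as needed can be sketched as a list of composable functions. This is only an illustrative software analogy: the stage functions below are hypothetical stand-ins, frame repetition is used in place of a real frame rate conversion algorithm, and nearest-neighbour pixel duplication is used in place of a real scaler; the disclosure does not specify any of these algorithms.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]                       # rows of 8-bit pixel values
Stage = Callable[[List[Frame]], List[Frame]]  # a stage maps frames to frames


def adjust_brightness(offset: int) -> Stage:
    """Stand-in for the image characteristic adjusting circuit 410."""
    def stage(frames: List[Frame]) -> List[Frame]:
        return [[[min(255, max(0, p + offset)) for p in row] for row in f]
                for f in frames]
    return stage


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Stand-in for the frame rate conversion circuit 420 (60Hz to 120Hz)."""
    out: List[Frame] = []
    for f in frames:
        out.extend([f, f])  # simple repetition; real FRC is usually more elaborate
    return out


def scale_2x(frames: List[Frame]) -> List[Frame]:
    """Stand-in for the scaler 430: doubles width and height."""
    out: List[Frame] = []
    for f in frames:
        scaled: Frame = []
        for row in f:
            wide = [p for p in row for _ in range(2)]
            scaled.extend([wide, list(wide)])
        out.append(scaled)
    return out


def run_pipeline(frames: List[Frame], stages: Sequence[Stage]) -> List[Frame]:
    """Apply the stages in the given order; the order is a design choice."""
    for stage in stages:
        frames = stage(frames)
    return frames


frames_in: List[Frame] = [[[10, 20], [30, 40]]]  # one 2 x 2 frame
frames_out = run_pipeline(frames_in, [adjust_brightness(5), double_frame_rate, scale_2x])
print(len(frames_out), len(frames_out[0]), len(frames_out[0][0]))
```

Reordering or dropping entries in the stage list corresponds to changing the position of a circuit in the pipeline or omitting an unused one.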

FIG. 5 shows another embodiment of the first SoC 110 and the second SoC 120 of FIG. 1. In this embodiment, the first SoC 110 and the second SoC 120 are a first television SoC and a second television SoC, respectively, for converting various kinds of input image data into image data that can be displayed on a television panel. The first SoC 110 further includes a first system bus 510, a first processor 520 (e.g., a central processing unit (CPU) or a graphics processing unit (GPU)), a first transceiver 525, and other circuits 530 (e.g., network circuits, USB circuits, audio circuits, storage circuits, etc.); the second SoC 120 further includes a second system bus 540, a second processor 550, a second transceiver 555, and other circuits 560. The first processor 520 and the second processor 550 cooperate through the first transceiver 525, the external circuit 570, and the second transceiver 555; the details and variations are found in the description of the embodiments of FIGS. 6-13. Note that, depending on implementation requirements, the external circuit 570 may be integrated with the external circuit 130, and each of the first SoC 110 and the second SoC 120 may include an access circuit (as shown in FIG. 9) for controlling the destination of data. Note also that, provided the resulting performance is acceptable, the first transceiver 525 may be integrated with the transmitting circuit 116 and the second transceiver 555 may be integrated with the receiving circuit 122, in which case data transmission after the integration can be managed by a known or self-developed arbiter.
In addition, the first image processing circuit 114 communicates with the first processor 520 through the first system bus 510 to use the computing resources of the first processor 520 or to be controlled by the first processor 520; the second image processing circuit 124 communicates with the second processor 550 through the second system bus 540 to use the computing resources of the second processor 550 or to be controlled by the second processor 550. Each of the first SoC 110 and the second SoC 120 may be used on its own in lower-end television products (e.g., 4K televisions), and the two SoCs may also cooperate in higher-end television products (e.g., 8K televisions).

FIG. 6 shows one embodiment of a data processing apparatus of the present disclosure. The data processing apparatus 600 of FIG. 6 includes a first SoC 610, a second SoC 620, and an external circuit 630. The first SoC 610 serves as a main SoC and the second SoC 620 serves as a performance-enhancing SoC; the two have the same or different circuit configurations, although, depending on implementation requirements, some circuits in the first SoC 610 and/or the second SoC 620 may serve no substantial function. The external circuit 630 is not included in either of the first SoC 610 and the second SoC 620. For example, if the first SoC 610 and the second SoC 620 are packaged chips disposed on a circuit board (e.g., a printed circuit board), the external circuit 630 may be or include a signal transmission line of the circuit board. As another example, if the first SoC 610 and the second SoC 620 are both unpackaged dies contained in one semiconductor package, the external circuit 630 may be contained in the semiconductor package and, depending on the type of the package (e.g., wire bond package, flip chip package, etc.), may include at least one of the following: at least one connecting pad; at least one connecting wire; at least one metal ball; and at least one circuit located on the surface of the substrate or contained in the substrate.

FIG. 7 shows one embodiment of the first SoC 610 and the second SoC 620 of FIG. 6. As shown in FIG. 7, the first SoC 610 includes a first CPU 612 and a first transceiver circuit 614, and the second SoC 620 includes a second CPU 622 and a second transceiver circuit 624. Each of the first SoC 610 and the second SoC 620 processes a portion of the data to be processed, so that the two SoCs together achieve higher data processing performance without either exceeding its own processing capability. The circuits of the first SoC 610 and the second SoC 620 are described below.

Please refer to fig. 6-7. In an enhanced processing mode (i.e., when the first SoC 610 and the second SoC 620 run simultaneously), the first CPU 612 is configured to divide the to-be-processed data into a plurality of input portions, including a first input portion and a second input portion, according to the to-be-processed data itself or information related to it; the first CPU 612 is also configured to fetch and process the first input portion in the enhanced processing mode to generate and output first output data. For example, at least a portion of the first SoC 610 runs in a Rich Execution Environment (REE); all of the second SoC 620 runs in a Trusted Execution Environment (TEE); the first input portion is non-sensitive data such as system operating data of a general-purpose operating system (e.g., an open source operating system); the second input portion is sensitive data such as at least one of: data to be authenticated (e.g., identification data such as fingerprint data, a Personal Identification Number (PIN), payment information, etc.); confidential/secret data (e.g., private keys, certificates, etc.); and protected data (e.g., Digital Rights Management (DRM) data such as encrypted compressed video data). In the above example, the sensitive data of the second input portion is transmitted from the first SoC 610 to the second SoC 620 via the external circuit 630. Therefore, if data transmitted via the external circuit 630 (e.g., a trace on a circuit board) is easily stolen, the communication between the first SoC 610 and the second SoC 620 generally needs to conform to a secure transmission specification (e.g., Digital Transmission Content Protection (DTCP)); if data transmitted via the external circuit 630 (e.g., pads, solder balls, etc. in a semiconductor package) is less susceptible to theft, the communication between the first SoC 610 and the second SoC 620 does not necessarily need to conform to the secure transmission specification. As another example, the first SoC 610 includes two parts that operate in the REE and the TEE, respectively, and the first input portion is non-sensitive data and/or sensitive data; because data transmission between the two parts stays within the same SoC, it need not conform to the aforementioned secure transmission specification.
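The division by sensitivity described above can be pictured with a minimal sketch. The item labels, the `Item` type, and the rule that a circuit-board link always requires protection are illustrative assumptions; the description only distinguishes sensitive from non-sensitive data and says that an in-package link does not necessarily need a secure transmission specification.

```python
from dataclasses import dataclass

# Hypothetical labels for sensitive data; chosen only to mirror the examples in the text.
SENSITIVE_KINDS = {"fingerprint", "pin", "payment", "private_key", "certificate", "drm_video"}


@dataclass(frozen=True)
class Item:
    kind: str
    payload: bytes


def divide(items):
    """Split pending items into (first input portion, second input portion)."""
    first_input = [it for it in items if it.kind not in SENSITIVE_KINDS]
    second_input = [it for it in items if it.kind in SENSITIVE_KINDS]
    return first_input, second_input


def needs_secure_link(external_circuit: str) -> bool:
    # Simplifying assumption: board traces are treated as easy to probe,
    # connections inside a semiconductor package as harder to probe.
    return external_circuit == "circuit_board"


pending = [Item("os_log", b"boot ok"), Item("pin", b"1234"), Item("drm_video", b"\x00\x01")]
first_part, second_part = divide(pending)
```

In this sketch the first portion stays on the first SoC, and the second portion is what would be handed to the first transceiver circuit.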

Please refer to fig. 6-7. The first transceiver circuit 614 is coupled to the first CPU 612 and is configured, in the enhanced processing mode, to retrieve the second input portion from the first CPU 612 or a memory (e.g., the system memory 920 of fig. 9) and transmit the second input portion to the second SoC 620 via the external circuit 630; the first transceiver circuit 614 is further configured, in the enhanced processing mode, to receive the second output data of the second SoC 620 via the external circuit 630 and forward the second output data. The second transceiver circuit 624 is configured, in the enhanced processing mode, to receive the second input portion via the external circuit 630 and to output the second output data to the first SoC 610 via the external circuit 630. The second CPU 622 is configured, in the enhanced processing mode, to receive the second input portion from the second transceiver circuit 624, process the second input portion to generate the second output data, and output the second output data to the second transceiver circuit 624.
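The round trip through the two transceiver circuits can be modeled roughly as follows. This is only a software analogy: two queues stand in for the external circuit 630, a thread stands in for the second SoC, and splitting a string in half and upper-casing it are placeholders for the real division and processing.

```python
import queue
import threading

# Modeling assumption: one queue per direction represents the external circuit 630.
to_second = queue.Queue()
to_first = queue.Queue()


def second_soc():
    """Second transceiver circuit receives, second CPU processes, result is sent back."""
    second_input = to_second.get()
    second_output = second_input.upper()  # placeholder for the real processing
    to_first.put(second_output)


def first_soc(data: str) -> str:
    """First CPU divides and processes; first transceiver circuit ships the second portion."""
    middle = len(data) // 2
    first_input, second_input = data[:middle], data[middle:]
    to_second.put(second_input)          # transmit via the external circuit
    first_output = first_input.upper()   # first CPU works on its own portion meanwhile
    second_output = to_first.get()       # receive the second output data
    return first_output + second_output


worker = threading.Thread(target=second_soc)
worker.start()
result = first_soc("abcdefgh")
worker.join()
```

Both portions are worked on concurrently, and the first SoC is the only side that sees the combined result.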

Please refer to fig. 6-7. In one exemplary implementation, the first CPU 612 includes a first cache memory 6122, and the second CPU 622 includes a second cache memory 6222; when the first CPU 612 processes the first input portion, the first CPU 612 uses the first cache memory 6122 to store first cache data (e.g., data to be processed or processed data) associated with the first input portion; when the second CPU 622 processes the second input portion, the second CPU 622 uses the second cache memory 6222 to store second cache data (e.g., data to be processed or processed data) associated with the second input portion; the first cache data and the second cache data are not kept coherent, that is, the first CPU 612 does not need to be aware of the progress of the second CPU 622 in processing the second input portion, the second CPU 622 does not need to be aware of the progress of the first CPU 612 in processing the first input portion, and the data stored in the first cache memory 6122 is not required to be consistent with the data stored in the second cache memory 6222, which differs from the prior art (e.g., CCIX).
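As a rough illustration of what "not coherent" means here, the sketch below models each cache as a private store with no snooping or invalidation messages to the other. The `LocalCache` class and the addresses are hypothetical and do not describe the actual cache hardware.

```python
class LocalCache:
    """A private cache with no snooping or invalidation traffic to any other cache."""

    def __init__(self):
        self._lines = {}

    def store(self, address, value):
        self._lines[address] = value

    def load(self, address):
        return self._lines.get(address)


first_cache = LocalCache()   # stands in for the first cache memory 6122
second_cache = LocalCache()  # stands in for the second cache memory 6222

# Each side caches its own working data; neither write is visible to the other side.
first_cache.store(0x100, "first-portion intermediate")
second_cache.store(0x100, "second-portion intermediate")
```

Because the two processors work on disjoint input portions, nothing in this model needs to reconcile the two stores.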

To assist understanding, one exemplary implementation is presented below. The first SoC 610 and the second SoC 620 cooperate to process a network video stream as shown in fig. 8. Fig. 8 shows the following stages of processing:

(1) S810: the first SoC 610 outputs login data (i.e., sensitive data) of the network video streaming service to the second SoC 620.

(2) S820: the second SoC 620 processes and verifies the user account information.

(3) S830: the second SoC 620 handles DRM-related issues.

(4) S840: the first SoC 610 starts playing the network video.

(5) S850: the first SoC 610 receives encrypted network video stream data from the network and outputs the encrypted network video stream data (i.e., sensitive data) to the second SoC 620.

(6) S860: the second SoC 620 decrypts the encrypted network video stream data.

(7) S870: the second SoC 620 sends the decrypted data to the first SoC 610 under DTCP protection.

(8) S880: the first SoC 610 outputs video data through the secure display path.

Since one of ordinary skill in the art will understand how to implement the stages of fig. 8 using the circuits of fig. 6-7 based on the above description, redundant descriptions are omitted here.
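For orientation, the stages of fig. 8 can be tabulated by which SoC performs each stage and whether the stage sends data across the external circuit 630. The table below is a reading of the stage list above, and the `Stage` type and helper names are illustrative.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    label: str
    runs_on: str        # "first" or "second" SoC
    crosses_link: bool  # True when the stage sends data over the external circuit 630


FIG8_STAGES = [
    Stage("S810", "first", True),    # sensitive service data sent to the second SoC
    Stage("S820", "second", False),  # account information verified
    Stage("S830", "second", False),  # DRM handling
    Stage("S840", "first", False),   # playback starts
    Stage("S850", "first", True),    # encrypted stream forwarded to the second SoC
    Stage("S860", "second", False),  # decryption
    Stage("S870", "second", True),   # decrypted data returned under DTCP protection
    Stage("S880", "first", False),   # output through the secure display path
]


def link_crossings(stages):
    """Labels of the stages that transfer data between the two SoCs."""
    return [s.label for s in stages if s.crosses_link]


def stages_on(stages, soc):
    """Labels of the stages performed by the given SoC."""
    return [s.label for s in stages if s.runs_on == soc]
```

Listing the crossings makes it easy to see where a secure transmission specification would matter in this example.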

FIG. 9 shows another embodiment of the first SoC 610 and the second SoC 620 of FIG. 6. As shown in fig. 9, in addition to the first CPU 612 and the first transceiver circuit 614, the first SoC 610 further includes a first system bus 910, a first system memory 920 (e.g., a DRAM), a first memory access circuit 930, a first encryption/decryption circuit 940, and other circuits 950 (e.g., a network circuit, a USB circuit, an audio circuit, a graphics processor, etc.); in addition to the second CPU 622 and the second transceiver circuit 624, the second SoC 620 further includes a second system bus 960, a second system memory 970 (e.g., a DRAM), a second memory access circuit 980, a second encryption/decryption circuit 990, and other circuits 995 (e.g., a network circuit, a USB circuit, an audio circuit, a graphics processor, etc.). In addition, a dedicated line may optionally be disposed between the first CPU 612 and the second CPU 622, as shown by the dashed line in fig. 9, so that the two CPUs can exchange signals unidirectionally or bidirectionally (for example, an Interrupt Request (IRQ) and/or control signals/information required for cooperative operation); the portion of the dedicated line that lies between the two SoCs is included in the external circuit 630. If the dedicated line is not provided, the signals are carried over a path formed by the memory access circuits, the encryption/decryption circuits, the transceiver circuits, and the like.

Please refer to fig. 6 and 9. The first memory access circuit 930 is a known or self-developed circuit for receiving/forwarding instructions or data of the first CPU 612 and for accessing the first system memory 920 via the first system bus 910; the first CPU 612 may also access the first system memory 920 directly via the first system bus 910, as the implementation requires. The first encryption/decryption circuit 940 is a known or self-developed circuit used to obtain the second input portion from the first memory access circuit 930, encrypt the second input portion, and provide the encrypted second input portion to the first transceiver circuit 614 for output to the second transceiver circuit 624. The first encryption/decryption circuit 940 is further configured to receive the second output data from the first transceiver circuit 614, decrypt the second output data, and output the decrypted second output data to the first memory access circuit 930. The operation of each circuit of the second SoC 620 is similar to that of the corresponding circuit of the first SoC 610, and redundant description is omitted here. In one exemplary implementation, the second input portion contains compressed data, and the second CPU 622 is configured to decompress the compressed data to generate decompressed data for inclusion in the second output data. In one exemplary implementation, the second input portion includes audio data, and the second CPU 622 is configured to perform equalization on the audio data to generate equalized audio data for inclusion in the second output data. It should be noted that the encryption/decryption circuits may be disabled or omitted as required by the implementation.
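The encrypt, transmit, process, and return path for the decompression example can be sketched as below. The cipher here is a toy keystream built from SHA-256 purely so the sketch runs with the standard library; it is a stand-in for the encryption/decryption circuits 940 and 990, not a description of them, and the shared key is hypothetical.

```python
import hashlib
import zlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (SHA-256 counter keystream); illustrative only, not real security."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


LINK_KEY = b"hypothetical shared key"

original = b"frame " * 50
second_input = zlib.compress(original)  # second input portion: compressed data

# First SoC: encrypt, then hand the result to the first transceiver circuit.
on_the_wire = keystream_xor(second_input, LINK_KEY)

# Second SoC: decrypt, decompress on the second CPU, encrypt the result for the return trip.
decompressed = zlib.decompress(keystream_xor(on_the_wire, LINK_KEY))
returned = keystream_xor(decompressed, LINK_KEY)

# First SoC: decrypt the second output data before passing it to the memory access circuit.
second_output = keystream_xor(returned, LINK_KEY)
```

Applying the same XOR keystream twice restores the plaintext, which is why one helper serves for both encryption and decryption in this sketch.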

Please refer to fig. 6, 7 and 9. In one exemplary implementation, each of the first SoC 610 and the second SoC 620 is a television SoC. In an exemplary implementation, the second SoC 620 is enabled in the enhanced processing mode and disabled in the normal processing mode to reduce power consumption, each mode depending on at least one of: setting by a user; the current performance metric of the first CPU 612; the nature (e.g., sensitivity or independence) of the data to be processed. In an exemplary implementation, the combination of the first output data and the second output data indicates an amount of data per unit time that is greater than the data amount per unit time processing capability of the first CPU 612 and greater than the data amount per unit time processing capability of the second CPU 622, which indicates that the processing capability of the data processing apparatus 600 is better than the processing capability of either of the first SoC 610 and the second SoC 620.
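One way to picture the mode decision and the throughput claim is the sketch below. The priority order (user setting first, then data nature and load), the 0.8 load threshold, and the additive throughput model are all assumptions made for illustration; the description only states that the mode depends on at least one of these factors.

```python
def choose_mode(user_setting=None, cpu_load=0.0, data_is_sensitive=False, load_threshold=0.8):
    """Return "enhanced" (second SoC enabled) or "normal" (second SoC disabled)."""
    if user_setting in ("normal", "enhanced"):
        return user_setting  # assumed: an explicit user setting wins
    if data_is_sensitive or cpu_load > load_threshold:
        return "enhanced"
    return "normal"


def combined_rate(first_rate, second_rate, mode):
    """Idealized data amount per unit time, ignoring link and division overhead."""
    return first_rate + second_rate if mode == "enhanced" else first_rate


mode = choose_mode(cpu_load=0.95)
rate = combined_rate(first_rate=100, second_rate=100, mode=mode)
```

Under this idealized model the combined rate in the enhanced mode exceeds what either CPU could deliver alone, which is the property the paragraph above describes.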

FIG. 10 shows one embodiment of a graphics processing device of the present disclosure. The graphics processing device 1000 of fig. 10 includes a first SoC 1010, a second SoC 1020, and an external circuit 1030. The first SoC 1010 is a main SoC, and the second SoC 1020 is a performance-enhancing SoC; the two have the same or different circuit configurations. However, depending on implementation requirements, certain circuits in the first SoC 1010 and/or the second SoC 1020 may serve no substantive function. The external circuit 1030 is not included in either of the first SoC 1010 and the second SoC 1020. For example, if the first SoC 1010 and the second SoC 1020 are packaged chips mounted on a circuit board (e.g., a printed circuit board), the external circuit 1030 may be or include a signal transmission line of the circuit board. As another example, if the first SoC 1010 and the second SoC 1020 are both unpackaged dies contained in a semiconductor package, the external circuit 1030 may be contained in the semiconductor package and may include at least one of the following, depending on the type of the semiconductor package (e.g., wire bond, flip chip, etc.): at least one connecting pad; at least one connecting wire; at least one metal ball; and at least one circuit located on the surface of the substrate or contained in the substrate.

FIG. 11 illustrates one embodiment of the first SoC 1010 and the second SoC 1020 of FIG. 10. As shown in fig. 11, the first SoC 1010 includes a first GPU 1012 and a first transceiver circuit 1014, and the second SoC 1020 includes a second GPU 1022 and a second transceiver circuit 1024. Each of the first SoC 1010 and the second SoC 1020 is configured to process a portion of data to be processed to achieve higher graphics processing performance through cooperation without exceeding processing capacity. The circuits of the first SoC 1010 and the second SoC 1020 are described below.

Please refer to fig. 10-11. The first GPU 1012 is configured to divide the data to be processed into a plurality of input portions, including a first input portion and a second input portion, in an enhanced processing mode (i.e., when the first SoC 1010 and the second SoC 1020 run simultaneously); the first GPU 1012 is also configured to fetch and process the first input portion in the enhanced processing mode to generate and output first output data. The first transceiver circuit 1014 is configured, in the enhanced processing mode, to retrieve the second input portion from the first GPU 1012 or from a memory access circuit (not shown) controlled by the first GPU 1012 and to transmit the second input portion to the second SoC 1020 via the external circuit 1030; the first transceiver circuit 1014 is further configured, in the enhanced processing mode, to receive second output data via the external circuit 1030 and output the second output data. The second transceiver circuit 1024 is configured, in the enhanced processing mode, to receive the second input portion via the external circuit 1030 and to transmit the second output data to the first SoC 1010 via the external circuit 1030. The second GPU 1022 is configured, in the enhanced processing mode, to receive the second input portion from the second transceiver circuit 1024 and process the second input portion to generate the second output data. The second GPU 1022 is further configured to output the second output data to the second transceiver circuit 1024 in the enhanced processing mode.

Please refer to fig. 10-11. In one exemplary implementation, the first GPU 1012 includes a first cache memory 1110, and the second GPU 1022 includes a second cache memory 1120; when the first GPU 1012 processes the first input portion, the first GPU 1012 uses the first cache memory 1110 to store first cache data (e.g., pending data or processed data) associated with the first input portion; when the second GPU 1022 processes the second input portion, the second GPU 1022 uses the second cache memory 1120 to store second cache data (e.g., pending data or processed data) associated with the second input portion; the first and second cache data are not consistent, i.e., the first GPU 1012 does not need to be aware of the progress of the second GPU 1022 in processing the second input portion, the second GPU 1022 does not need to be aware of the progress of the first GPU 1012 in processing the first input portion, and the stored data in the first cache memory 1110 does not need to be consistent with the stored data in the second cache memory 1120, unlike the prior art (e.g., CCIX).

Building on the above, for example, the first SoC 1010 executes a first application (e.g., a camera application or a second game application) and a second application (e.g., a chat application), and the second SoC 1020 executes a third application (e.g., a first game application). The first input portion includes data related to the first application and data related to the second application (i.e., data to be rendered by the first GPU 1012); the second input portion includes data related to the third application and data of keyboard/mouse events that control execution of the third application (i.e., data to be rendered by the second GPU 1022). The first output data includes first rendering data (e.g., screen data of the first application) and second rendering data (e.g., screen data of the second application); the second output data includes third rendering data (e.g., screen data of the third application) and an interrupt request. The first SoC 1010 executes an Interrupt Service Routine (ISR) (e.g., the interrupt service routine 1210 of fig. 12) to write the third rendering data into a data buffer circuit (e.g., the data buffer circuit 1220 of fig. 12) according to the interrupt request; the first SoC 1010 further executes software (e.g., the known alpha blending software 1230 of fig. 12) to read the third rendering data from the data buffer circuit and superimpose the first, second, and third rendering data so that they are displayed on the same OSD frame. Fig. 12 is a diagram illustrating the above example, in which one example of the data buffer circuit 1220 includes three stages of registers connected in a ring (not shown). After the first SoC 1010 writes the third rendering data into the X-th register of the data buffer circuit 1220, the write pointer of the data buffer circuit 1220 is updated from pointing to the X-th register to pointing to the (X+1)-th register; before the first SoC 1010 reads the third rendering data from the X-th register of the data buffer circuit 1220, the read pointer of the data buffer circuit 1220 is updated from pointing to the (X+2)-th register to pointing to the X-th register. Here X, (X+1), and (X+2) are three consecutive integers, and the number following (X+2) is X, which forms the ring. It is noted that the ISR and the software superposition described above can be implemented with known or self-developed techniques.
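The pointer behavior of the three-register ring can be sketched in software as follows. The class and method names are hypothetical, and the sketch deliberately omits overflow and underflow checks, which the description does not discuss.

```python
class RingRegisters:
    """Three registers connected in a ring, with one write pointer and one read pointer."""

    def __init__(self, stages=3):
        self.registers = [None] * stages
        self.write_pointer = 0
        # Assumed initial state: the read pointer sits one slot behind the write pointer,
        # so that advancing it before a read lands on the register just written.
        self.read_pointer = stages - 1

    def write(self, rendering_data):
        """Write into the register under the write pointer, then advance the pointer."""
        self.registers[self.write_pointer] = rendering_data
        self.write_pointer = (self.write_pointer + 1) % len(self.registers)

    def read(self):
        """Advance the read pointer first, then read the register it now points to."""
        self.read_pointer = (self.read_pointer + 1) % len(self.registers)
        return self.registers[self.read_pointer]


buffer_1220 = RingRegisters()
buffer_1220.write("frame-0")        # the ISR writes the third rendering data
first_read = buffer_1220.read()     # the blending software reads it back
buffer_1220.write("frame-1")
buffer_1220.write("frame-2")
later_reads = [buffer_1220.read(), buffer_1220.read()]
```

The modulo step is what makes the register after (X+2) wrap back to X.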

As another example, the first input portion includes primary user interface data (i.e., data to be rendered by the first GPU 1012); the second input portion includes two-dimensional data/position data (i.e., data to be rendered by the second GPU 1022) and an interrupt request that causes the second GPU 1022 to receive and process the second input portion; the first output data includes first rendering data; the second output data includes second rendering data (e.g., Augmented Reality (AR) data or Virtual Reality (VR) data); and the first SoC 1010 superimposes graphics through hardware (e.g., the known On-Screen Display (OSD) generating hardware 1310 of fig. 13), displaying the first rendering data on a first OSD layer and the second rendering data on a second OSD layer. It is noted that the above-described technique of superimposing graphics by hardware can be implemented with known or self-developed techniques. Fig. 13 is a schematic diagram of the above example.
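In this example the layering is done by OSD hardware. Purely to illustrate what stacking a second layer on a first layer means, the following software sketch composites RGBA pixels with the standard "over" operator; the pixel format (straight alpha, components in the range 0 to 1) and the function names are assumptions.

```python
def blend_pixel(top, bottom):
    """Composite one RGBA pixel over another ('over' operator, straight alpha in 0..1)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1 - ta)
    if out_a == 0:
        return (0.0, 0.0, 0.0, 0.0)

    def mix(t, b):
        return (t * ta + b * ba * (1 - ta)) / out_a

    return (mix(tr, br), mix(tg, bg), mix(tb, bb), out_a)


def compose(layers):
    """Stack layers ordered bottom to top; each layer is an equal-length list of RGBA pixels."""
    frame = layers[0]
    for layer in layers[1:]:
        frame = [blend_pixel(t, b) for t, b in zip(layer, frame)]
    return frame


first_layer = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]   # e.g., primary user interface
second_layer = [(1.0, 0.0, 0.0, 0.5), (1.0, 0.0, 0.0, 0.0)]  # e.g., AR overlay, partly transparent
frame = compose([first_layer, second_layer])
```

Where the upper layer is fully transparent the lower layer shows through unchanged, and where it is half transparent the two colors are averaged.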

Please refer to fig. 10. In one exemplary implementation, each of the first SoC 1010 and the second SoC 1020 is a television SoC. In an exemplary implementation, the second SoC 1020 is enabled in the enhanced processing mode and disabled in the normal processing mode to reduce power consumption, each mode depending on at least one of: a setting by a user; the current performance metric of the first GPU 1012; and the nature of the data to be processed (e.g., high computational resource requirements). In an exemplary implementation, the combination of the first output data and the second output data represents an amount of data per unit time that is greater than the amount of data the first GPU 1012 can process per unit time and greater than the amount of data the second GPU 1022 can process per unit time, which indicates that the processing capability of the graphics processing apparatus 1000 exceeds that of either of the first SoC 1010 and the second SoC 1020.

Please refer to fig. 10, and also refer to fig. 5/9. In one example, the first SoC 1010 includes a first CPU and a first system bus (not shown), and the first GPU 1012 communicates with the first CPU via the first system bus to utilize the computing resources of the first CPU; the second SoC 1020 includes a second CPU and a second system bus, and the second GPU 1022 communicates with the second CPU through the second system bus to utilize the computing resources of the second CPU.

It should be noted that, where implementation is feasible, a person skilled in the art may selectively implement some or all of the technical features of any one of the foregoing embodiments, or a combination of some or all of the technical features of several of the foregoing embodiments, thereby increasing flexibility in implementing the invention.

In summary, each of the image processing apparatus, the data processing apparatus, and the graphics processing apparatus of the present disclosure can achieve higher processing performance through the cooperative operation of a plurality of SoCs.

Although the embodiments of the present invention have been described above, these embodiments are not intended to limit the present invention, and those skilled in the art can make variations on the technical features of the present invention according to the explicit or implicit contents of the present invention, and all such variations may fall within the scope of the claims of the present invention.

Description of the reference numerals

100 image processing apparatus

110 first SoC

120 second SoC

130 external circuit

112 data shunt circuit

114 first image processing circuit

116 transmission circuit

122 receiving circuit

124 second image processing circuit

310 first transceiver circuit

320 second transceiver circuit

330 data shunt circuit

1142 first delay circuit

1144 first selection circuit

1146 first image processing pipeline

1242 second delay circuit

1244 second selection circuit

1246 second image processing pipeline

400 image processing pipeline

410 image characteristic adjusting circuit

420 frame rate conversion circuit

430 scaler

510 first system bus

520 first processor

525 first transceiver

530 other circuits

540 second system bus

550 second processor

555 second transceiver

560 other circuits

570 external circuit

600 data processing apparatus

610 first SoC

620 second SoC

630 external circuit

612 first CPU

614 first transceiver circuit

622 second CPU

624 second transceiver circuit

6122 first cache memory

6222 second cache memory

S810-S880 video stream processing stage

910 first system bus

920 first system memory

930 first memory access circuit

940 first encryption/decryption circuit

950 other circuits

960 second system bus

970 second system memory

980 second memory access circuit

990 second encryption/decryption circuit

995 other circuits

1000 graphics processing apparatus

1010 first SoC

1020 second SoC

1030 external circuit

1012 first GPU

1014 first transceiver circuit

1022 second GPU

1024 second transceiver circuit

1110 first cache memory

1120 second cache memory

1210 interrupt service routine

1220 data buffer circuit

1230 alpha blending software

1310 OSD generating hardware

Claims

1. A graphics processing apparatus comprising a plurality of system-on-chips that are cooperatively operable, the graphics processing apparatus comprising:

a first system-on-chip comprising:

a first graphics processor for dividing data to be processed into a plurality of input portions including a first input portion and a second input portion in an enhanced processing mode, the first graphics processor further for acquiring and processing the first input portion to generate and output first output data in the enhanced processing mode; and

a first transceiver circuit, coupled to the first graphics processor, for obtaining the second input portion in the enhanced processing mode to transmit the second input portion to a second system-on-chip via an external circuit, and for receiving second output data via the external circuit in the enhanced processing mode to output the second output data;

the external circuit not included in either of the first system-on-chip and the second system-on-chip; and

the second system-on-chip, comprising:

a second transceiver circuit for receiving the second input portion via the external circuit and transmitting the second output data to the first system-on-chip via the external circuit in the enhanced processing mode; and

a second graphics processor, coupled to the second transceiver circuit, for receiving the second input portion and processing the second input portion to generate the second output data in the enhanced processing mode, and for outputting the second output data to the second transceiver circuit in the enhanced processing mode.

2. The graphics processing apparatus of claim 1, wherein the first graphics processor uses a first cache memory to store first cache data when the first graphics processor is processing the first input portion; the second graphics processor uses a second cache memory to store second cache data when the second graphics processor is processing the second input portion; and the first cache data is not coherent with the second cache data.

3. The graphics processing apparatus of claim 1, wherein the first output data comprises first rendering data, the second output data comprises second rendering data, the first system on chip executing software to superimpose the first rendering data and the second rendering data.

4. The graphics processing apparatus according to claim 1, wherein the first output data includes first rendering data, the second output data includes second rendering data, and the first system on chip superimposes the first rendering data and the second rendering data by hardware to display the first rendering data as a first layer screen and the second rendering data as a second layer screen.

5. The graphics processing apparatus according to claim 1, wherein the first and second system-on-chips are packaged chips disposed on a circuit board, and the external circuit belongs to the circuit board.

6. The graphics processing apparatus of claim 1, wherein the first system-on-chip and the second system-on-chip are both unpackaged dies included in a semiconductor package, the external circuit being included in the semiconductor package.

7. The graphics processing apparatus according to claim 1, wherein the first system-on-chip and the second system-on-chip have the same circuit configuration, the first system-on-chip being a main system-on-chip and the second system-on-chip being a performance enhancing system-on-chip.

8. The graphics processing apparatus according to claim 1, wherein the second system-on-chip is disabled in a normal processing mode and enabled in the enhanced processing mode.

9. The graphics processing apparatus according to claim 8, wherein the second system-on-chip is disabled in the normal processing mode in accordance with at least one of: setting by a user; a current performance metric of the first graphics processor; and the nature of the data to be processed.

10. The graphics processing apparatus of claim 1, wherein a combination of the first output data and the second output data indicates an amount of data per unit time that is greater than a data amount per unit time processing capability of the first graphics processor and greater than a data amount per unit time processing capability of the second graphics processor.