CN114064557B

CN114064557B - Image processing apparatus

Info

Publication number: CN114064557B
Application number: CN202010744770.2A
Authority: CN
Inventors: 陈羿逞; 童旭荣
Original assignee: Realtek Semiconductor Corp
Current assignee: Realtek Semiconductor Corp
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2024-08-27
Anticipated expiration: 2040-07-29
Also published as: CN114064557A

Abstract

A graphics processing apparatus includes a plurality of systems-on-a-chip (SoCs) that cooperate to achieve higher graphics processing performance. The apparatus includes a first SoC, an external circuit, and a second SoC. The external circuit is not included in any SoC. The first SoC includes: a first graphics processing unit (GPU) that divides data to be processed into a first portion and a second portion, and that obtains and processes the first portion to generate and output first data; and a first transceiver circuit that obtains the second portion and transmits it to the second SoC via the external circuit, and that receives second data via the external circuit and forwards the second data. The second SoC includes: a second transceiver circuit that receives the second portion via the external circuit and outputs the second data to the first SoC via the external circuit; and a second GPU that receives the second portion from the second transceiver circuit and processes it to output the second data to the second transceiver circuit.

Description

Image processing apparatus

Technical Field

The present invention relates to graphics processing devices, and more particularly to graphics processing devices that include multiple systems-on-chip (SoCs) capable of operating cooperatively.

Background

A system-on-a-chip (SoC) design integrates the main functions of an end product (or system) into a single chip, which is called an SoC.

An SoC with low computing capability is typically used in lower-end electronic products (e.g., a television with 1920 x 1080 resolution), while an SoC with high computing capability is typically used in higher-end electronic products (e.g., a television with 3840 x 1920 resolution). Developing and manufacturing several SoCs of different computing capabilities necessarily costs more in total than developing and manufacturing any one of them, and an SoC with high computing capability is not cost-effective for lower-end electronic products. The industry therefore needs a technology that achieves high computing capability by combining a plurality of SoCs with low computing capability, so that a single such SoC can be used for lower-end electronic products and a combination of such SoCs can be used for higher-end electronic products.

Known multi-core and multi-cluster technologies include the Generic Interrupt Controller (GIC), the Coherent Mesh Network (CMN), and Cache Coherent Interconnect for Accelerators (CCIX). These technologies do not address cooperative operation among different SoCs.

Disclosure of Invention

It is an object of the present disclosure to provide a graphics processing apparatus including a plurality of systems-on-chip (SoCs) that can operate cooperatively to achieve higher graphics processing performance.

One embodiment of a graphics processing apparatus of the present disclosure includes a first system-on-chip, an external circuit, and a second system-on-chip. The first system-on-chip includes a first graphics processor (Graphics Processing Unit, GPU) and a first transceiver circuit. In an enhanced processing mode, the first graphics processor divides data to be processed into a plurality of input portions including a first input portion and a second input portion, and obtains and processes the first input portion to generate and output first output data. The first transceiver circuit is coupled to the first graphics processor; in the enhanced processing mode, it obtains the second input portion and transmits it to the second system-on-chip through the external circuit, and it receives second output data via the external circuit and forwards the second output data. The external circuit is not included in either the first system-on-chip or the second system-on-chip. The second system-on-chip includes a second transceiver circuit and a second graphics processor. In the enhanced processing mode, the second transceiver circuit receives the second input portion via the external circuit and outputs the second output data to the first system-on-chip via the external circuit. The second graphics processor is coupled to the second transceiver circuit; in the enhanced processing mode, it receives the second input portion, processes it to generate the second output data, and outputs the second output data to the second transceiver circuit.

The features, implementation and effects of the present invention are described in detail below with reference to the preferred embodiments of the present invention in conjunction with the accompanying drawings.

Drawings

FIG. 1 shows an embodiment of an image processing apparatus of the present disclosure;

FIG. 2 shows one embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 3 shows another embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 4 shows an embodiment of an image processing pipeline that may be used as the first/second image processing pipeline of FIG. 3;

FIG. 5 shows a further embodiment of the first SoC and the second SoC of FIG. 1;

FIG. 6 illustrates one embodiment of a data processing apparatus of the present disclosure;

FIG. 7 illustrates one embodiment of the first SoC and the second SoC of FIG. 6;

FIG. 8 shows an example of collaboration of the first SoC and the second SoC of FIG. 6;

FIG. 9 shows another embodiment of the first SoC and the second SoC of FIG. 6;

FIG. 10 illustrates one embodiment of a graphics processing apparatus of the present disclosure;

FIG. 11 illustrates one embodiment of the first SoC and the second SoC of FIG. 10;

FIG. 12 shows a schematic diagram of an exemplary implementation of FIG. 11; and

FIG. 13 shows a schematic diagram of another exemplary implementation of FIG. 11.

Detailed Description

The present disclosure discloses an image processing apparatus, a data processing apparatus, and a graphics processing apparatus, each apparatus including a plurality of system-on-chips that can cooperatively operate to achieve higher processing performance. To facilitate understanding, the following description contains many specifics, examples and exemplary implementations, which are not intended to limit the scope of the implementations of the present invention.

FIG. 1 shows an embodiment of an image processing apparatus of the present disclosure. The image processing apparatus 100 of FIG. 1 includes a first SoC 110, a second SoC 120, and an external circuit 130. The first SoC 110 acts as a primary SoC and the second SoC 120 acts as a performance-enhancing SoC; the two SoCs have the same or different circuit configurations. Depending on implementation requirements, however, certain circuits in the first SoC 110 and/or the second SoC 120 may serve no substantive function. The external circuit 130 is not included in either the first SoC 110 or the second SoC 120. For example, if the first SoC 110 and the second SoC 120 are both packaged chips disposed on a circuit board (e.g., a printed circuit board), the external circuit 130 may be or include signal transmission lines of the circuit board. As another example, if the first SoC 110 and the second SoC 120 are unpackaged dies included in a semiconductor package, the external circuit 130 is included in the semiconductor package and, depending on the type of the semiconductor package (e.g., wire-bond package, flip-chip package, etc.), may include at least one of the following: at least one connection pad; at least one connecting wire; at least one metal ball; and at least one circuit located on the surface of a substrate or contained in the substrate.

FIG. 2 shows one embodiment of the first SoC 110 and the second SoC 120 of FIG. 1. As shown in FIG. 2, the first SoC 110 includes a data splitting circuit 112, a first image processing circuit 114, and a transmitting circuit 116, and the second SoC 120 includes a receiving circuit 122 and a second image processing circuit 124. Each of the first SoC 110 and the second SoC 120 processes a portion of the input image data, so that through cooperation the two SoCs achieve higher image processing performance without either SoC exceeding its own processing capability. The circuits of the first SoC 110 and the second SoC 120 are described below.

Please refer to FIGS. 1-2. The data splitting circuit 112 splits the input image data into N input portions, including a first input portion and a second input portion, for the first image processing circuit 114 and the second image processing circuit 124 to process respectively, where N is an integer greater than 1; this also means that the image processing apparatus 100 includes N cooperating SoCs. In one exemplary implementation, the data splitting circuit 112 counts the pixels received on each horizontal image line to determine the horizontal position of the currently received pixel, and thereby splits the input image data into left-half-frame data and right-half-frame data (when N = 2) or into more portions (when N > 2); this can be implemented with known techniques. In one exemplary implementation, N is 2, and the processing capability of each of the first image processing circuit 114 and the second image processing circuit 124 corresponds to an image size of 7680 pixels × 4320 pixels at a frame rate of 60 Hz (8K4K60Hz for short) or an equivalent combination of image size and frame rate (e.g., the 4K4K120Hz described below). The input image data is then split in one of the following ways:

(1) The size and frame rate of the input image data are 8K4K60Hz. The first input portion is the data corresponding to the left half frame of the input image data, with a size and frame rate of 3840 pixels × 4320 pixels and 60 Hz (4K4K60Hz for short). The second input portion is the data corresponding to the right half frame of the input image data, also at 4K4K60Hz.

(2) The size and frame rate of the input image data are 8K4K60Hz. The first input portion is the data of the left half frame of the input image data plus the data of a part of the right half frame, with a size and frame rate of (3840+n) pixels × 4320 pixels and 60 Hz ((4K+n)4K60Hz for short). The second input portion is the data of the right half frame plus the data of a part of the left half frame, also at (4K+n)4K60Hz. In this case, the right-half-frame data in the first input portion and the left-half-frame data in the second input portion are usually the data adjacent to the seam between the left and right half frames; the first image processing circuit 114 and the second image processing circuit 124 use this data as a reference so that the processed left and right half frames join seamlessly.

(3) The size and frame rate of the input image data are 3840 pixels × 2160 pixels and 120 Hz (4K2K120Hz for short). The first input portion is the data corresponding to the left half frame of the input image data, with a size and frame rate of 1920 pixels × 2160 pixels and 120 Hz (2K2K120Hz for short). The second input portion is the data corresponding to the right half frame of the input image data, also at 2K2K120Hz.

(4) The size and frame rate of the input image data are 4K2K120Hz. The first input portion is the data of the left half frame of the input image data plus the data of a part of the right half frame, with a size and frame rate of (1920+n) pixels × 2160 pixels and 120 Hz ((2K+n)2K120Hz for short). The second input portion is the data of the right half frame plus the data of a part of the left half frame, also at (2K+n)2K120Hz. In this case, the right-half-frame data in the first input portion and the left-half-frame data in the second input portion are usually the data adjacent to the seam between the left and right half frames; the first image processing circuit 114 and the second image processing circuit 124 use this data as a reference so that the processed left and right half frames join seamlessly.
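The four split cases above can be illustrated with a minimal software sketch. It is not the circuit of the disclosure: it merely models, in Python, a splitter that counts the pixels of one horizontal line and routes each pixel to the first and/or second input portion. The function name `split_line` and the list-based pixel representation are assumptions made for illustration; the parameter `n` corresponds to the overlap width n of cases (2) and (4).

```python
def split_line(line, n=0):
    """Split one horizontal image line into a first (left) and a second (right) input portion.

    `line` is a list of pixel values; `n` is the number of extra pixels taken
    from across the seam (n = 0 models cases (1) and (3), n > 0 models (2) and (4)).
    Hypothetical helper, for illustration only.
    """
    half = len(line) // 2
    first, second = [], []
    for x, pixel in enumerate(line):  # x = number of pixels already received on this line
        if x < half + n:              # left half plus n pixels of the right half
            first.append(pixel)
        if x >= half - n:             # right half plus n pixels of the left half
            second.append(pixel)
    return first, second


# One 7680-pixel line of an 8K4K frame, with each pixel labelled by its position.
line_8k = list(range(7680))
first_plain, second_plain = split_line(line_8k)           # case (1)
first_ovl, second_ovl = split_line(line_8k, n=16)         # case (2), assuming n = 16
print(len(first_plain), len(second_plain), len(first_ovl), len(second_ovl))
```

With n = 16, each portion is 3840 + 16 pixels wide, and the 32 pixels around the seam appear in both portions, which is the reference data the two image processing circuits use for seamless joining.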

Please refer to FIGS. 1-2. The transmitting circuit 116 is coupled to the data splitting circuit 112; it receives the second input portion and outputs it to the second SoC 120 via the external circuit 130. The receiving circuit 122 is coupled to the external circuit 130; it receives the second input portion and forwards it to the second image processing circuit 124. In one exemplary implementation, the transmission between the transmitting circuit 116 and the receiving circuit 122 is based on a known or self-developed signaling standard (e.g., the V-by-One HS standard or the HDMI standard), whereas neither the transmission within the first SoC 110 nor the transmission within the second SoC 120 is based on that signaling standard; the standard typically supports a maximum data transmission rate that is not less than the image data processing capability of the second SoC 120. In one exemplary implementation, the first SoC 110 includes a first encryption circuit (not shown) that encrypts the second input portion before it is output to the receiving circuit 122 via the external circuit 130, and the second SoC 120 includes a second decryption circuit (not shown) that decrypts the second input portion after it is received. In one exemplary implementation, if the first SoC 110 needs to receive data from the second SoC 120, the first SoC 110 includes a first transceiver circuit (e.g., the first transceiver circuit 310 of FIG. 3) that includes the transmitting circuit 116; if the data received from the second SoC 120 is encrypted, the first SoC 110 includes a first decryption circuit (not shown) to decrypt that data. In this exemplary implementation, if the second SoC 120 is capable of outputting data to the first SoC 110, the second SoC 120 includes a second transceiver circuit (e.g., the second transceiver circuit 320 of FIG. 3) that includes the receiving circuit 122, and, depending on implementation requirements, the second SoC 120 includes a second encryption circuit (not shown) that encrypts the data before the second transceiver circuit outputs it to the first transceiver circuit. The encryption and decryption circuits may be implemented with known or self-developed techniques, such as High-bandwidth Digital Content Protection (HDCP).

Please refer to FIGS. 1-2. The first image processing circuit 114 is coupled to the data splitting circuit 112; it receives and processes the first input portion to generate a first output portion of a plurality of output portions of output image data, and outputs it to a back-end circuit (e.g., a panel control circuit). The second image processing circuit 124 is coupled to the receiving circuit 122; it receives and processes the second input portion to generate a second output portion of the plurality of output portions, and outputs it to the back-end circuit. For example, in case (1) or (2) above, when the size and frame rate of each of the first and second output portions are 3840 pixels × 4320 pixels and 120 Hz (4K4K120Hz for short), each of the first image processing circuit 114 and the second image processing circuit 124 includes a known or self-developed frame rate conversion (FRC) circuit (e.g., the FRC circuit 420 of FIG. 4) that converts the input frame rate (60 Hz) of the first/second input portion into the output frame rate (120 Hz) of the first/second output portion, and the first output portion and the second output portion together form one complete image frame within a time equal to the reciprocal of the output frame rate (1/120 Hz). As another example, in case (3) or (4) above, when the size and frame rate of each of the first and second output portions are 4K4K120Hz, each of the first image processing circuit 114 and the second image processing circuit 124 includes a known or self-developed scaler (e.g., the scaler 430 of FIG. 4) that scales the size of the first/second input portion (1920 pixels × 2160 pixels, or (1920+n) pixels × 2160 pixels) to the size of the first/second output portion (3840 pixels × 4320 pixels), and the first output portion and the second output portion together form one complete image frame within a time equal to the reciprocal of the output frame rate (1/120 Hz). In addition, depending on implementation requirements, the first image processing circuit 114 may output at least a portion of the first output portion to the second image processing circuit 124 via the transmitting circuit 116 and the receiving circuit 122, and/or the second image processing circuit 124 may output at least a portion of the second output portion to the first image processing circuit 114 via the second transceiver circuit and the first transceiver circuit; for example, the two image processing circuits may exchange data that is to be sent to a panel for display and process it to meet the specific needs of the panel.

It should be noted that the combination of the first output portion and the second output portion (e.g., the left half frame (4K4K120Hz) combined with the right half frame (4K4K120Hz) to give 8K4K120Hz) determines the size and frame rate of the output image. That size and frame rate in turn determine the amount of data per unit time (i.e., the data transmission rate of the output image), which is greater than the amount of data per unit time that the first image processing circuit 114 can process (e.g., 4K4K120Hz) and also greater than the amount that the second image processing circuit 124 can process (e.g., 4K4K120Hz). In other words, the combination of the first SoC 110 and the second SoC 120 achieves higher processing performance than either SoC alone.
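The throughput comparison in the preceding paragraph can be checked with simple arithmetic. The sketch below counts pixels per second only; bits per pixel, blanking intervals, and link overhead are deliberately ignored, and the function name `pixel_rate` is a hypothetical helper.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given image size and frame rate (blanking ignored)."""
    return width * height * fps


per_soc_capability = pixel_rate(7680, 4320, 60)   # 8K4K60Hz, the stated per-circuit capability
half_frame_output = pixel_rate(3840, 4320, 120)   # 4K4K120Hz, one output portion
full_frame_output = pixel_rate(7680, 4320, 120)   # 8K4K120Hz, both output portions combined

print(per_soc_capability)   # 1990656000
print(half_frame_output)    # 1990656000
print(full_frame_output)    # 3981312000
```

The half-frame output at 120 Hz carries exactly the same pixel rate as a full 8K4K frame at 60 Hz, which is why the text treats the two as equivalent capabilities, while the combined 8K4K120Hz output is twice what either image processing circuit can handle alone.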

FIG. 3 shows another embodiment of the first SoC 110 and the second SoC 120 of FIG. 1, and in particular shows an embodiment of the first image processing circuit 114 and the second image processing circuit 124. In FIG. 3, the circuit configuration of the second SoC 120 is the same as that of the first SoC 110. Accordingly, the first SoC 110 includes a first transceiver circuit 310 that includes the transmitting circuit 116 (not shown in FIG. 3), the second SoC 120 includes a second transceiver circuit 320 that includes the receiving circuit 122 (not shown in FIG. 3), and the second SoC 120 further includes another data splitting circuit 330 corresponding to the data splitting circuit 112; the data splitting circuit 330 serves no substantive function, however, and may be disabled or omitted. In addition, the first image processing circuit 114 and the second image processing circuit 124 have the same circuit configuration, in which circuits that serve no substantive function may be disabled or omitted. The first image processing circuit 114 includes a first delay circuit 1142, a first selection circuit 1144, and a first image processing pipeline 1146; the second image processing circuit 124 includes a second delay circuit 1242, a second selection circuit 1244, and a second image processing pipeline 1246. It should be noted that if the input image data is or includes encoded data, each of the first image processing circuit 114 and the second image processing circuit 124 may further include a decoder (not shown) that decodes the encoded data so that the first image processing pipeline 1146 / the second image processing pipeline 1246 processes the decoded data.

Please refer to FIG. 3. The path over which the data splitting circuit 112 outputs the first input portion to the first image processing circuit 114 is generally shorter than the path over which it outputs the second input portion to the second image processing circuit 124. The first delay circuit 1142 therefore receives and delays the first input portion so that the time at which the first image processing circuit 114 receives the first input portion is substantially synchronous with the time at which the second image processing circuit 124 receives the second input portion; "substantially synchronous" means that the difference between the two receiving times is smaller than a predetermined threshold and is negligible. The first selection circuit 1144 is coupled between the first delay circuit 1142 and the first image processing pipeline 1146, and is also coupled to the first transceiver circuit 310 (as shown by the dashed line in FIG. 3); the first selection circuit 1144 receives the first input portion from the first delay circuit 1142 and outputs it to the first image processing pipeline 1146. The first image processing pipeline 1146 is coupled to the first selection circuit 1144; it receives and processes the first input portion to generate the first output portion.

Please refer to FIG. 3. The second delay circuit 1242 is coupled to the data splitting circuit 330 (as shown by the dashed line in FIG. 3) but serves no substantive function. The second selection circuit 1244 is coupled to the second delay circuit 1242 (as shown by the dashed line in FIG. 3) and is coupled between the second transceiver circuit 320 and the second image processing pipeline 1246; the second selection circuit 1244 receives the second input portion from the second transceiver circuit 320 and outputs it to the second image processing pipeline 1246. The second image processing pipeline 1246 is coupled to the second selection circuit 1244; it receives and processes the second input portion to generate the second output portion.

Please refer to FIG. 3. In one exemplary implementation, the first image processing pipeline 1146 exchanges one or more synchronization signals (e.g., at least one horizontal synchronization signal and/or at least one vertical synchronization signal) with the second image processing pipeline 1246 to substantially synchronize the first output portion with the second output portion. In one exemplary implementation, a dedicated line (not shown) is provided between the first image processing pipeline 1146 and the second image processing pipeline 1246 for unidirectional or bidirectional signal transmission, and the portion of the dedicated line that lies between the two SoCs is included in the external circuit 130; one of ordinary skill in the art can refer to the first transceiver 525, the external circuit 570, and the second transceiver 555 of FIG. 5 and the associated description to understand how to implement the dedicated line. The transfer between the first image processing pipeline 1146 and the second image processing pipeline 1246 may be implemented in various ways, including:

(1) The dedicated line is used for the transfer between the first image processing pipeline 1146 and the second image processing pipeline 1246. According to the timing of the input image data, either image processing pipeline can receive/access the data transmitted from the other image processing pipeline, and can process and output the data from the data splitting circuit 112; the transmitted data may be temporarily stored in a buffer (not shown) before being output to the image processing pipeline;

(2) The transfer between the first image processing pipeline 1146 and the second image processing pipeline 1246 uses the existing path (i.e., the first transceiver circuit 310, the external circuit 130, and the second transceiver circuit 320). If the existing path can be used only for transmission or only for reception at any given time, each SoC may use a known or self-developed arbiter (not shown) to determine the timing of transmission and reception according to the timing of the input image data. If the existing path can be used for transmission and reception at the same time, either SoC can store the received data in a buffer (not shown), and the image processing pipeline of that SoC can receive/access the buffered data according to the timing of the input image data, and can process and output the data from the data splitting circuit 112.

FIG. 4 shows an embodiment of an image processing pipeline 400 that may be used as either the first image processing pipeline 1146 or the second image processing pipeline 1246. The image processing pipeline 400 includes: a known or self-developed image characteristic adjustment circuit 410 for adjusting image characteristics such as brightness, contrast, and saturation; a known or self-developed frame rate conversion circuit 420; and a known or self-developed scaler 430. The order of the circuits in the image processing pipeline 400 depends on implementation requirements; in addition, the image processing pipeline 400 may include additional circuits (e.g., a known or self-developed panel timing converter) or may omit unused circuits.
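The three-stage pipeline of FIG. 4 can be modelled in software as a configurable list of stages. The sketch below is illustrative only and substitutes the simplest possible technique for each stage: a brightness offset stands in for the image characteristic adjustment circuit 410, frame repetition for the frame rate conversion circuit 420, and nearest-neighbour 2x upscaling for the scaler 430. None of these is stated in the disclosure as the actual method, and all function names are hypothetical.

```python
def adjust_brightness(frames, offset=10):
    """Stand-in for circuit 410: add an offset to every pixel, clamped to 0..255."""
    return [[[min(255, max(0, p + offset)) for p in row] for row in frame]
            for frame in frames]


def double_frame_rate(frames):
    """Stand-in for circuit 420: 60 Hz -> 120 Hz by repeating each frame once."""
    out = []
    for frame in frames:
        out.append(frame)
        out.append([row[:] for row in frame])  # independent copy of the repeated frame
    return out


def upscale_2x(frames):
    """Stand-in for scaler 430: nearest-neighbour 2x scaling in both directions."""
    out = []
    for frame in frames:
        scaled = []
        for row in frame:
            wide = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
            scaled.append(wide)
            scaled.append(wide[:])                     # repeat each row vertically
        out.append(scaled)
    return out


def run_pipeline(frames, stages):
    """Apply the stages in the given order; the order is an implementation choice."""
    for stage in stages:
        frames = stage(frames)
    return frames


# One tiny 2x2 grey-level frame as input.
frames_in = [[[0, 100], [200, 250]]]
frames_out = run_pipeline(frames_in, [adjust_brightness, double_frame_rate, upscale_2x])
print(len(frames_out), len(frames_out[0]), len(frames_out[0][0]))
```

Passing the stages as a list mirrors the statement that the order of the circuits depends on implementation requirements: reordering or dropping a stage only changes the list.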

FIG. 5 shows a further embodiment of the first SoC 110 and the second SoC 120 of FIG. 1. In this embodiment, the first SoC 110 and the second SoC 120 are a first television SoC and a second television SoC, respectively, and convert various kinds of input image data into image data that a television panel can display. The first SoC 110 further includes a first system bus 510, a first processor 520 (e.g., a central processing unit (CPU) or a graphics processing unit (GPU)), a first transceiver 525, and other circuits 530 (e.g., network circuits, USB circuits, audio circuits, storage circuits, etc.); the second SoC 120 further includes a second system bus 540, a second processor 550, a second transceiver 555, and other circuits 560. The first processor 520 and the second processor 550 cooperate via the first transceiver 525, an external circuit 570, and the second transceiver 555; details and variations of this cooperation are given in the description of the embodiments of FIGS. 6-13. It should be noted that, depending on implementation requirements, the external circuit 570 may be integrated with the external circuit 130, in which case each of the first SoC 110 and the second SoC 120 may include an access circuit (as shown in FIG. 9) for controlling where data is forwarded (its destination). It should further be noted that, provided the performance is acceptable, the first transceiver 525 may be integrated with the transmitting circuit 116 and the second transceiver 555 may be integrated with the receiving circuit 122, with data transmission after integration managed by a known or self-developed arbiter. In addition, the first image processing circuit 114 communicates with the first processor 520 via the first system bus 510 in order to use the computing resources of the first processor 520 or to be controlled by the first processor 520; the second image processing circuit 124 communicates with the second processor 550 via the second system bus 540 in order to use the computing resources of the second processor 550 or to be controlled by the second processor 550. Each of the first SoC 110 and the second SoC 120 may be used alone for lower-end television products (e.g., 4K televisions), and the two SoCs may be used together for higher-end television products (e.g., 8K televisions).

FIG. 6 shows one embodiment of a data processing apparatus of the present disclosure. The data processing apparatus 600 of FIG. 6 includes a first SoC 610, a second SoC 620, and an external circuit 630. The first SoC 610 acts as the primary SoC and the second SoC 620 acts as the performance-enhancing SoC; the two SoCs have the same or different circuit configurations. Depending on implementation requirements, however, certain circuits in the first SoC 610 and/or the second SoC 620 may serve no substantive function. The external circuit 630 is not included in either the first SoC 610 or the second SoC 620. For example, if the first SoC 610 and the second SoC 620 are packaged chips disposed on a circuit board (e.g., a printed circuit board), the external circuit 630 may be or include signal transmission lines of the circuit board. As another example, if the first SoC 610 and the second SoC 620 are unpackaged dies included in a semiconductor package, the external circuit 630 is included in the semiconductor package and, depending on the type of the semiconductor package (e.g., wire-bond package, flip-chip package, etc.), may include at least one of the following: at least one connection pad; at least one connecting wire; at least one metal ball; and at least one circuit located on the surface of a substrate or contained in the substrate.

FIG. 7 shows one embodiment of the first SoC 610 and the second SoC 620 of FIG. 6. As shown in FIG. 7, the first SoC 610 includes a first CPU 612 and a first transceiver circuit 614, and the second SoC 620 includes a second CPU 622 and a second transceiver circuit 624. Each of the first SoC 610 and the second SoC 620 processes a portion of the data to be processed, so that through cooperation the two SoCs achieve higher data processing performance without either SoC exceeding its own processing capability. The circuits of the first SoC 610 and the second SoC 620 are described below.

Please refer to fig. 6-7. In the enhanced processing mode (i.e., when the first SoC 610 and the second SoC 620 are running simultaneously), the first CPU 612 is configured to divide the data to be processed into a plurality of input portions, including a first input portion and a second input portion, according to the data to be processed or information related thereto; the first CPU 612 is further configured to obtain and process the first input portion to generate and output first output data in the enhanced processing mode. For example, at least a portion of the first SoC 610 runs in a rich execution environment (REE); all of the second SoC 620 runs in a trusted execution environment (TEE); the first input portion is non-sensitive data such as system operating data of a general-purpose operating system (e.g., an open-source operating system); and the second input portion is sensitive data such as at least one of: data to be verified (e.g., identification data such as fingerprint data, a personal identification number (PIN), payment information, etc.); confidential/secret data (e.g., a private key, a certificate, etc.); and protected data (e.g., Digital Rights Management (DRM) data such as encrypted compressed video data). In the above example, the sensitive data of the second input portion is transmitted from the first SoC 610 to the second SoC 620 through the external circuit 630. Accordingly, if the data transmitted through the external circuit 630 (e.g., a circuit on a circuit board) is relatively easy to steal, the communication between the first SoC 610 and the second SoC 620 generally has to comply with a secure transmission specification (e.g., Digital Transmission Content Protection (DTCP)); if the data transmitted through the external circuit 630 (e.g., pads, solder balls, etc. within a semiconductor package) is relatively hard to steal, the communication between the first SoC 610 and the second SoC 620 does not necessarily have to comply with the secure transmission specification. For another example, the first SoC 610 includes two parts running in the REE and the TEE, respectively; the first input portion is non-sensitive and/or sensitive data; and the data transmission between the two parts takes place within the same SoC and therefore need not comply with the aforementioned secure transmission specification.
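The division by sensitivity described above can be illustrated with a minimal sketch. The category labels, the `Item` type and both function names are hypothetical; the specification lists the kinds of sensitive data but does not define a data format or a routing interface.

```python
from dataclasses import dataclass

# Hypothetical labels for the sensitive categories named in the description.
SENSITIVE_KINDS = {
    "fingerprint", "pin", "payment",   # data to be verified
    "private_key", "certificate",      # confidential/secret data
    "drm_video",                       # protected data
}


@dataclass(frozen=True)
class Item:
    kind: str
    payload: bytes


def split_by_sensitivity(items):
    """Return (first_input_portion, second_input_portion).

    Non-sensitive items stay on the first SoC (REE); sensitive items are
    routed to the second SoC (TEE).
    """
    first = [it for it in items if it.kind not in SENSITIVE_KINDS]
    second = [it for it in items if it.kind in SENSITIVE_KINDS]
    return first, second


def needs_link_protection(external_circuit: str) -> bool:
    """Assume board-level traces are easier to probe than in-package links."""
    return external_circuit == "circuit_board"


items = [
    Item("os_state", b"boot-log"),
    Item("pin", b"1234"),
    Item("drm_video", b"\x00\x01"),
]
first_portion, second_portion = split_by_sensitivity(items)
print([it.kind for it in first_portion])
print([it.kind for it in second_portion])
```

In this sketch the choice of link protection is a simple function of where the external circuit resides, mirroring the circuit-board versus semiconductor-package distinction drawn above.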

Please refer to fig. 6-7. The first transceiver circuit 614 is coupled to the first CPU 612 and is configured, in the enhanced processing mode, to retrieve the second input portion from the first CPU 612 or a memory (e.g., the first system memory 920 of fig. 9) and transmit the second input portion to the second SoC 620 via the external circuit 630; the first transceiver circuit 614 is further configured to receive the second output data of the second SoC 620 via the external circuit 630 in the enhanced processing mode and to forward the second output data. The second transceiver circuit 624 is configured to receive the second input portion via the external circuit 630 and output the second output data to the first SoC 610 via the external circuit 630 in the enhanced processing mode. The second CPU 622 is configured to receive the second input portion from the second transceiver circuit 624 in the enhanced processing mode, and process the second input portion to generate the second output data, thereby outputting the second output data to the second transceiver circuit 624.

Please refer to fig. 6-7. In one exemplary implementation, the first CPU 612 includes a first cache memory 6122 and the second CPU 622 includes a second cache memory 6222; when the first CPU 612 processes the first input portion, the first CPU 612 uses the first cache memory 6122 to store first cache data (e.g., data to be processed or processed data) associated with the first input portion; when the second CPU 622 processes the second input portion, the second CPU 622 uses the second cache memory 6222 to store second cache data (e.g., data to be processed or processed data) associated with the second input portion; the first cache data is not coherent (i.e., may be inconsistent) with the second cache data. In other words, the first CPU 612 does not need to track the progress of the second CPU 622 in processing the second input portion, the second CPU 622 does not need to track the progress of the first CPU 612 in processing the first input portion, and the data stored in the first cache memory 6122 and the data stored in the second cache memory 6222 do not need to be consistent, which is different from the prior art (e.g., CCIX).

To aid understanding, one exemplary implementation is set forth below. The first SoC 610 cooperates with the second SoC 620 to process a network video stream as shown in fig. 8. Fig. 8 shows the following stages of processing:

(1) S810: the first SoC 610 outputs login data (i.e., sensitive data) of the network video streaming service to the second SoC 620.

(2) S820: the second SoC 620 processes the user account information and performs authentication.

(3) S830: the second SoC 620 handles DRM related issues.

(4) S840: the first SoC 610 begins playing the network video.

(5) S850: the first SoC 610 receives the encrypted network video stream data from the network and outputs the encrypted network video stream data (i.e., sensitive data) to the second SoC 620.

(6) S860: the second SoC 620 decrypts the encrypted network video stream data.

(7) S870: the second SoC 620 transmits the decrypted data to the first SoC 610 under DTCP protection.

(8) S880: the first SoC 610 outputs video data through the secure display path.

Redundant description is omitted herein since one of ordinary skill in the art can implement the stages of fig. 8 using the circuits of fig. 6-7 in light of the above description.
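As a reading aid only, the stages of fig. 8 can be tabulated to show which SoC performs each stage and where data crosses the external circuit 630. The `Stage` type, its field names and the direction strings are hypothetical notation, not part of the disclosure.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    label: str
    actor: str                      # SoC that performs the stage
    action: str
    link_transfer: Optional[str]    # direction across external circuit 630, if any


STAGES = [
    Stage("S810", "first SoC 610", "output login data", "610->620"),
    Stage("S820", "second SoC 620", "process account information, authenticate", None),
    Stage("S830", "second SoC 620", "handle DRM-related matters", None),
    Stage("S840", "first SoC 610", "begin playing the network video", None),
    Stage("S850", "first SoC 610", "receive encrypted stream and forward it", "610->620"),
    Stage("S860", "second SoC 620", "decrypt the stream", None),
    Stage("S870", "second SoC 620", "return decrypted data under DTCP", "620->610"),
    Stage("S880", "first SoC 610", "output video via secure display path", None),
]


def link_crossings(stages):
    """List the stages in which data travels over the external circuit."""
    return [(s.label, s.link_transfer) for s in stages if s.link_transfer]


def stages_of(stages, actor):
    """List the labels of the stages performed by the given SoC."""
    return [s.label for s in stages if s.actor == actor]


for label, direction in link_crossings(STAGES):
    print(label, direction)
```

The table makes explicit that only S810, S850 and S870 involve a transfer between the two SoCs, which are the points where the transmission protection discussed earlier would apply.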

Fig. 9 shows another embodiment of the first SoC 610 and the second SoC 620 of fig. 6. As shown in fig. 9, the first SoC 610 includes, in addition to the first CPU 612 and the first transceiver circuit 614, a first system bus 910, a first system memory 920 (e.g., DRAM), a first memory access circuit 930, a first encryption/decryption circuit 940 and other circuits 950 (e.g., network circuits, USB circuits, audio circuits, graphics processors, etc.); the second SoC 620 includes, in addition to the second CPU 622 and the second transceiver circuit 624, a second system bus 960, a second system memory 970 (e.g., DRAM), a second memory access circuit 980, a second encryption/decryption circuit 990, and other circuits 995 (e.g., network circuits, USB circuits, audio circuits, graphics processors, etc.). In addition, a dedicated line may optionally be provided between the first CPU 612 and the second CPU 622, as shown by the dashed line in fig. 9, so that the two CPUs can perform signal transmission (e.g., transmission of an interrupt request (IRQ) and/or transmission of control signals/information required for cooperative operation) unidirectionally or bidirectionally, wherein the portion of the dedicated line between the two SoCs is included in the external circuit 630; if the dedicated line is not provided, the signal transmission takes place over a path formed by the memory access circuits, the encryption/decryption circuits, the transceiver circuits, and the like.

Please refer to fig. 6 and fig. 9. The first memory access circuit 930 is a known or self-developed circuit for receiving/forwarding instructions or data from the first CPU 612 and for accessing the first system memory 920 via the first system bus 910; the first CPU 612 may also access the first system memory 920 directly via the first system bus 910 according to implementation requirements. The first encryption/decryption circuit 940 is a known or self-developed circuit for obtaining the second input portion from the first memory access circuit 930, encrypting it, and providing the encrypted second input portion to the first transceiver circuit 614 for output to the second transceiver circuit 624. The first encryption/decryption circuit 940 is further configured to receive the second output data from the first transceiver circuit 614 and decrypt it, so as to output the decrypted second output data to the first memory access circuit 930. The operation of each circuit of the second SoC 620 is similar to that of the corresponding circuit of the first SoC 610, and a repeated description is omitted here. In one exemplary implementation, the second input portion contains compressed data and the second CPU 622 is configured to decompress the compressed data to generate decompressed data contained in the second output data. In one exemplary implementation, the second input portion contains audio data, and the second CPU 622 is configured to apply equalization to the audio data to produce equalized audio data contained in the second output data. It should be noted that the encryption/decryption circuits may be disabled or omitted depending on implementation requirements.
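The encrypt, transmit, decrypt and decompress path described above can be sketched as follows. This is an illustration under stated assumptions: the XOR routine is a toy stand-in for the encryption/decryption circuits 940 and 990 and offers no real security, `KEY` is a hypothetical pre-shared key, and zlib is chosen only as a convenient example of compressed data; the specification names no cipher or compression format.

```python
import zlib
from itertools import cycle

KEY = b"shared-key"  # hypothetical pre-shared key, for illustration only


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher standing in for circuits 940/990. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def first_soc_send(second_input: bytes) -> bytes:
    # memory access circuit 930 -> encryption circuit 940 -> transceiver 614
    return xor_cipher(second_input, KEY)


def second_soc_process(wire_data: bytes) -> bytes:
    # transceiver 624 -> decryption circuit 990 -> second CPU 622
    compressed = xor_cipher(wire_data, KEY)
    decompressed = zlib.decompress(compressed)   # work done by the second CPU
    # the second output data is encrypted again for the return trip
    return xor_cipher(decompressed, KEY)


def first_soc_receive(wire_data: bytes) -> bytes:
    # transceiver 614 -> decryption circuit 940 -> memory access circuit 930
    return xor_cipher(wire_data, KEY)


original = b"frame data " * 8
second_input = zlib.compress(original)
second_output = first_soc_receive(second_soc_process(first_soc_send(second_input)))
print(second_output == original)
```

If the encryption/decryption circuits are disabled or omitted, the same flow applies with the cipher steps replaced by pass-through.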

Please refer to fig. 6, 7, 9. In one exemplary implementation, each of the first SoC 610 and the second SoC 620 is a television SoC. In one exemplary implementation, the second SoC 620 is enabled in the enhanced processing mode and disabled in the normal processing mode to reduce power consumption; the selection of the mode may depend on at least one of: a user setting; the current performance index of the first CPU 612; and the nature (e.g., sensitivity or independence) of the data to be processed. In one exemplary implementation, the combination of the first output data and the second output data represents an amount of data per unit time that is greater than the amount of data the first CPU 612 can process per unit time and also greater than the amount of data the second CPU 622 can process per unit time, which means that the processing capability of the data processing apparatus 600 is better than that of either of the first SoC 610 and the second SoC 620.
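The mode selection described above might be sketched as below. The `Status` fields, the load threshold and the decision rule are all assumptions made for illustration; the specification only states which factors the mode may depend on, not how they are combined.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    ENHANCED = "enhanced"


@dataclass
class Status:
    user_requests_enhanced: bool   # user setting
    cpu_load: float                # hypothetical performance index, 0.0 to 1.0
    has_sensitive_data: bool       # nature of the data to be processed


LOAD_THRESHOLD = 0.85  # hypothetical value


def select_mode(status: Status) -> Mode:
    """Pick the enhanced mode if any one of the three factors calls for it."""
    if (status.user_requests_enhanced
            or status.has_sensitive_data
            or status.cpu_load > LOAD_THRESHOLD):
        return Mode.ENHANCED
    return Mode.NORMAL


def second_soc_enabled(mode: Mode) -> bool:
    """The second SoC is disabled in the normal mode to reduce power."""
    return mode is Mode.ENHANCED


idle = Status(user_requests_enhanced=False, cpu_load=0.30, has_sensitive_data=False)
busy = Status(user_requests_enhanced=False, cpu_load=0.95, has_sensitive_data=False)
print(select_mode(idle), select_mode(busy))
```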

Fig. 10 shows one embodiment of a graphics processing apparatus of the present disclosure. The graphics processing apparatus 1000 of fig. 10 includes a first SoC 1010, a second SoC 1020, and an external circuit 1030. The first SoC 1010 acts as a primary SoC and the second SoC 1020 acts as a performance-enhancing SoC; the two have the same or different circuit configurations. However, based on implementation requirements, certain circuitry in the first SoC 1010 and/or the second SoC 1020 may not play a substantial role. The external circuit 1030 is not included in either of the first SoC 1010 and the second SoC 1020. For example, if the first SoC 1010 and the second SoC 1020 are both packaged chips disposed on a circuit board (e.g., a printed circuit board), the external circuit 1030 may be/include a signal transmission line of the circuit board. For another example, if the first SoC 1010 and the second SoC 1020 are unpackaged dice included in a semiconductor package, the external circuit 1030 may be included in the semiconductor package, and may include at least one of the following depending on the type of semiconductor package (e.g., wire-bond package, flip-chip package, etc.): at least one connection pad; at least one connecting line; at least one metal ball; and at least one circuit positioned on the surface of a substrate or contained in the substrate.

Fig. 11 shows one embodiment of the first SoC 1010 and the second SoC 1020 of fig. 10. As shown in fig. 11, the first SoC 1010 includes a first GPU 1012 and a first transceiving circuit 1014, and the second SoC 1020 includes a second GPU 1022 and a second transceiving circuit 1024. Each of the first SoC 1010 and the second SoC 1020 is configured to process a portion of the data to be processed to achieve higher graphics processing performance through cooperation without exceeding processing capabilities. The respective circuits of the first SoC 1010 and the second SoC 1020 are described below.

Please refer to fig. 10-11. The first GPU 1012 is configured to divide the data to be processed into a plurality of input portions including a first input portion and a second input portion in an enhanced processing mode (i.e., when the first SoC 1010 and the second SoC 1020 are simultaneously operating); the first GPU 1012 is also configured to obtain and process the first input portion to generate and output first output data in the enhanced processing mode. The first transceiver circuit 1014 is configured to obtain, in the enhanced processing mode, the second input portion from the first GPU 1012 or from a memory access circuit (not shown) controlled by the first GPU 1012, and to transmit the second input portion to the second SoC 1020 via the external circuit 1030; the first transceiver circuit 1014 is further configured to receive second output data via the external circuit 1030 in the enhanced processing mode and to output the second output data. The second transceiver circuit 1024 is configured to receive the second input portion via the external circuit 1030 and transmit the second output data to the first SoC 1010 via the external circuit 1030 in the enhanced processing mode. The second GPU 1022 is configured to receive the second input portion from the second transceiver circuit 1024 and process the second input portion to generate the second output data in the enhanced processing mode. The second GPU 1022 is further configured to output the second output data to the second transceiver circuit 1024 in the enhanced processing mode.
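The division, remote processing and return of results described above can be sketched in a few lines. The tile-based split, the `first_share` parameter and the string-based `render` stand-in are hypothetical; the specification does not say how the first GPU 1012 chooses the portions. The two portions would be processed concurrently in the apparatus; they are processed one after the other here for simplicity.

```python
def split_workload(tiles, first_share=0.5):
    """Divide the data to be processed into two input portions.

    first_share is a hypothetical fraction kept by the first GPU.
    """
    cut = int(len(tiles) * first_share)
    return tiles[:cut], tiles[cut:]


def render(tile, gpu_name):
    """Stand-in for GPU processing; tags the tile with the GPU that handled it."""
    return f"{gpu_name}:{tile}"


def second_soc(second_input):
    # second transceiver circuit 1024 -> second GPU 1022 -> back to 1024
    return [render(t, "gpu1022") for t in second_input]


def first_soc(tiles, first_share=0.5):
    first_input, second_input = split_workload(tiles, first_share)
    first_output = [render(t, "gpu1012") for t in first_input]
    # first transceiver circuit 1014 sends the second input portion over the
    # external circuit 1030 and receives the second output data in return
    second_output = second_soc(second_input)
    return first_output, second_output


first_output, second_output = first_soc(["t0", "t1", "t2", "t3"])
print(first_output, second_output)
```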

Please refer to fig. 10-11. In one exemplary implementation, the first GPU 1012 includes a first cache memory 1110, and the second GPU 1022 includes a second cache memory 1120; when the first GPU 1012 processes the first input portion, the first GPU 1012 uses the first cache memory 1110 to store first cache data (e.g., data to be processed or processed data) associated with the first input portion; when the second GPU 1022 processes the second input portion, the second GPU 1022 uses the second cache memory 1120 to store second cache data (e.g., data to be processed or processed data) associated with the second input portion; the first cache data is not coherent (i.e., may be inconsistent) with the second cache data. In other words, the first GPU 1012 does not need to track the progress of the second GPU 1022 in processing the second input portion, the second GPU 1022 does not need to track the progress of the first GPU 1012 in processing the first input portion, and the data stored in the first cache memory 1110 and the data stored in the second cache memory 1120 do not need to be consistent, which is different from the prior art (e.g., CCIX).

In this regard, for example, the first SoC 1010 executes a first application (e.g., a photography application or a second game application) and a second application (e.g., a chat application), and the second SoC 1020 executes a third application (e.g., a first game application). The first input portion includes data related to the first application and data related to the second application (i.e., data to be rendered by the first GPU 1012); the second input portion includes data related to the third application and data of a keyboard/mouse event that controls execution of the third application (i.e., data to be rendered by the second GPU 1022). The first output data includes first rendering data (e.g., picture data of the first application) and second rendering data (e.g., picture data of the second application); the second output data includes third rendering data (e.g., picture data of the third application) and an interrupt request. According to the interrupt request, the first SoC 1010 executes an interrupt service routine (ISR) (e.g., the interrupt service routine 1210 of fig. 12) to write the third rendering data to a data buffering circuit (e.g., the data buffering circuit 1220 of fig. 12). In addition, the first SoC 1010 executes software (e.g., the known alpha blending software 1230 of fig. 12) to read the third rendering data from the data buffering circuit and superimpose the first, second and third rendering data for display on the same screen. Fig. 12 is a schematic diagram of the above example. One example of the data buffering circuit 1220 includes three stages of registers connected in a ring (not shown). After writing the third rendering data to the X-th register of the data buffering circuit 1220, the first SoC 1010 updates the write pointer of the data buffering circuit 1220 so that it points to the (X+1)-th register instead of the X-th register. Before the first SoC 1010 reads the third rendering data from the X-th register of the data buffering circuit 1220, the read pointer of the data buffering circuit 1220 is likewise updated so that it points to the X-th register instead of the (X+2)-th register. Here X, (X+1) and (X+2) are three consecutive integers, and the number following (X+2) is X, so that the three form a cycle. It is noted that the above-mentioned ISR and the software superposition can be implemented by known or self-developed techniques.
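The pointer behavior of the three-stage ring can be sketched as follows. The class and method names are hypothetical, and the sketch deliberately omits full/empty detection, which the description does not address. The write pointer advances after each write, while the read pointer advances before each read and therefore rests on the register read most recently.

```python
class RingBuffer:
    """Three registers connected in a ring, with separate read/write pointers."""

    def __init__(self, stages=3):
        self.registers = [None] * stages
        self.write_ptr = 0
        # Starts on the last register so that the first read lands on register 0.
        self.read_ptr = stages - 1

    def isr_write(self, rendering_data):
        """Write to register X, then move the write pointer to X+1 (cyclic)."""
        self.registers[self.write_ptr] = rendering_data
        self.write_ptr = (self.write_ptr + 1) % len(self.registers)

    def read(self):
        """Move the read pointer from X+2 to X (cyclic), then read register X."""
        self.read_ptr = (self.read_ptr + 1) % len(self.registers)
        return self.registers[self.read_ptr]


buf = RingBuffer()
buf.isr_write("frame0")
buf.isr_write("frame1")
print(buf.read(), buf.write_ptr, buf.read_ptr)
```

With three stages, advancing the read pointer by one is the same as moving it from the (X+2)-th register to the X-th register, since the number following (X+2) in the cycle is X.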

For another example, the first input portion includes main user interface data (i.e., data to be rendered by the first GPU 1012); the second input portion includes two-dimensional data/position data (i.e., data to be rendered by the second GPU 1022) and an interrupt request that prompts the second GPU 1022 to receive and process the second input portion; the first output data includes first rendering data; the second output data includes second rendering data (e.g., Augmented Reality (AR) data or Virtual Reality (VR) data); and the first SoC 1010 superimposes the rendering data by hardware (e.g., the known on-screen display (OSD) generating hardware 1310 of fig. 13) to display the first rendering data on a first-layer screen and the second rendering data on a second-layer screen. It is noted that the above technique for superimposing graphics by hardware may be implemented by known or self-developed techniques. Fig. 13 is a schematic view of the above example.
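The superposition of two layers, whether done by the alpha blending software of fig. 12 or by layered OSD hardware as in fig. 13, can be modeled in software as a per-pixel blend. This is a sketch under assumptions: the specification gives no blending equation, so the standard integer formula below, the 8-bit alpha range and the choice of the second layer as the upper layer are all illustrative.

```python
def blend_pixel(top, bottom, alpha):
    """Blend two RGB pixels; alpha is 0 (bottom only) to 255 (top only)."""
    return tuple((alpha * t + (255 - alpha) * b) // 255 for t, b in zip(top, bottom))


def compose(first_layer, second_layer, alpha_mask):
    """Overlay second_layer on first_layer using a per-pixel alpha mask.

    All three arguments are row-major lists of equal dimensions.
    """
    return [
        [blend_pixel(s, f, a) for f, s, a in zip(f_row, s_row, a_row)]
        for f_row, s_row, a_row in zip(first_layer, second_layer, alpha_mask)
    ]


ui = [[(0, 0, 255), (0, 0, 255)]]        # first rendering data (one row)
overlay = [[(250, 0, 0), (250, 0, 0)]]   # second rendering data
mask = [[255, 0]]                        # opaque overlay, then transparent
print(compose(ui, overlay, mask))
```

A fully opaque mask value shows the second layer, a zero value lets the first layer show through, and intermediate values mix the two.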

Please refer to fig. 10. In one exemplary implementation, each of the first SoC 1010 and the second SoC 1020 is a television SoC. In one exemplary implementation, the second SoC 1020 is enabled in the enhanced processing mode and disabled in the normal processing mode to reduce power consumption; the selection of the mode may depend on at least one of: a user setting; the current performance index of the first GPU 1012; and the nature of the data to be processed (e.g., high computational resource requirements). In one exemplary implementation, the combination of the first output data and the second output data represents an amount of data per unit time that is greater than the amount of data the first GPU 1012 can process per unit time and also greater than the amount of data the second GPU 1022 can process per unit time, which means that the processing capability of the graphics processing apparatus 1000 is better than that of either of the first SoC 1010 and the second SoC 1020.

Please refer to fig. 10, and refer to fig. 5/9. In one implementation example, the first SoC 1010 includes a first CPU and a first system bus (not shown), and the first GPU 1012 communicates with the first CPU via the first system bus to utilize the computing resources of the first CPU; the second SoC 1020 includes a second CPU and a second system bus, and the second GPU 1022 communicates with the second CPU through the second system bus to utilize the computing resources of the second CPU.

It should be noted that, where possible, one of ordinary skill in the art may selectively implement some or all of the features of any one of the embodiments described above, or may selectively implement combinations of some or all of the features of multiple embodiments described above, thereby increasing the flexibility of implementing the invention.

In summary, each of the image processing apparatus, the data processing apparatus and the graphics processing apparatus of the present disclosure may realize higher processing performance through a plurality of socs operating cooperatively.

Although the embodiments of the present invention have been described above, these embodiments are not intended to limit the present invention, and those skilled in the art may make variations to the technical features of the present invention according to the explicit or implicit disclosure of the present invention, and all such variations may fall within the scope of claims sought to be protected by the present invention, in other words, the scope of claims of the present invention should be regarded as being defined by the claims of the present specification.

Description of the reference numerals

100 Image processing device

110 First SoC

120 Second SoC

130 External circuits

112 Data shunt circuit

114 First image processing circuit

116 Transfer circuit

122 Receiving circuit

124 Second image processing circuit

310 First transceiver circuit

320 Second transceiver circuit

330 Data shunt circuit

1142 First delay circuit

1144 First selection circuit

1146 First image processing pipeline

1242 Second delay circuit

1244 Second selection circuit

1246 Second image processing pipeline

400 Image processing pipeline

410 Image characteristic adjusting circuit

420 Frame rate conversion circuit

430 Scaler

510 First system bus

520 First processor

525 First transceiver

530 Other circuits

540 Second system bus

550 Second processor

555 Second transceiver

560 Other circuits

570 External circuit

600 Data processing apparatus

610 First SoC

620 Second SoC

630 External circuits

612 First CPU

614 First transceiver circuit

622 Second CPU

624 Second transceiver circuit

6122 First cache memory

6222 Second cache memory

S810-S880 Video stream processing stages

910 First system bus

920 First system memory

930 First memory access circuit

940 First encryption and decryption circuit

950 Other circuits

960 Second system bus

970 Second system memory

980 Second memory access circuit

990 Second encryption and decryption circuit

995 Other circuits

1000 Graphics processing apparatus

1010 First SoC

1020 Second SoC

1030 External circuit

1012 First GPU

1014 First transceiver circuit

1022 Second GPU

1024 Second transceiver circuit

1110 First cache memory

1120 Second cache memory

1210 Interrupt service routine

1220 Data buffering circuit

1230 Alpha blending software

1310 OSD generating hardware

Claims

1. A graphics processing apparatus comprising a plurality of system-on-chip chips that are interoperable, the graphics processing apparatus comprising:

A first system-on-chip comprising:

A first graphics processor configured to divide data to be processed into a plurality of input portions including a first input portion and a second input portion in an enhanced processing mode, and further configured to obtain and process the first input portion to generate and output first output data in the enhanced processing mode; and

A first transceiver circuit coupled to the first graphics processor and configured to obtain the second input portion and transmit the second input portion to a second system-on-chip through an external circuit in the enhanced processing mode, and further configured to receive second output data through the external circuit in the enhanced processing mode and output the second output data;

The external circuit is not included in any one of the first system-on-chip and the second system-on-chip; and

The second system-on-chip includes:

A second transceiver circuit for receiving the second input portion via the external circuit and transmitting the second output data to the first system-on-chip via the external circuit in the enhanced processing mode; and

The second graphics processor is coupled to the second transceiver circuit, and is configured to receive the second input portion and process the second input portion to generate the second output data in the enhanced processing mode, and is further configured to output the second output data to the second transceiver circuit in the enhanced processing mode.

2. The graphics processing apparatus of claim 1, wherein when the first graphics processor processes the first input portion, the first graphics processor uses a first cache memory to store first cache data; when the second graphics processor processes the second input portion, the second graphics processor uses a second cache memory to store second cache data; the first cache data is inconsistent with the second cache data.

3. The graphics processing apparatus of claim 1, wherein the first output data includes first rendering data and the second output data includes second rendering data, the first system-on-chip executing software to superimpose the first rendering data and the second rendering data.

4. The graphics processing apparatus of claim 1, wherein the first output data comprises first rendering data and the second output data comprises second rendering data, the first system-on-chip superimposing the first rendering data and the second rendering data by hardware to display the first rendering data as a first layer screen and the second rendering data as a second layer screen.

5. The graphics processing apparatus of claim 1, wherein the first system-on-chip and the second system-on-chip are both packaged chips disposed on a circuit board, the external circuit belonging to the circuit board.

6. The graphics processing apparatus of claim 1, wherein the first system-on-chip and the second system-on-chip are both unpackaged dice contained in a semiconductor package, the external circuit being contained in the semiconductor package.

7. The graphics processing apparatus of claim 1, wherein the first system-on-chip and the second system-on-chip have the same circuit configuration, the first system-on-chip being a primary system-on-chip and the second system-on-chip being a performance enhancing system-on-chip.

8. The graphics processing apparatus of claim 1, wherein the second system-on-chip is disabled in a normal processing mode and enabled in the enhanced processing mode.

9. The graphics processing apparatus of claim 8, wherein the second system-on-chip is disabled in the normal processing mode according to at least one of: user setting; a current performance index of the first graphics processor; and the nature of the data to be processed.

10. The graphics processing apparatus of claim 1, wherein the combination of the first output data and the second output data indicates an amount of data per unit time that is greater than an amount of data per unit time processing capability of the first graphics processor and greater than an amount of data per unit time processing capability of the second graphics processor.