CN117349222A - Processor, data processing method and computer equipment - Google Patents

Publication number
CN117349222A
Authority
CN
China
Prior art keywords
sub
data
module
modules
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311315788.0A
Other languages
Chinese (zh)
Inventor
蔡俊伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311315788.0A priority Critical patent/CN117349222A/en
Publication of CN117349222A publication Critical patent/CN117349222A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/17Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The application discloses a processor, a data processing method and computer equipment, and belongs to the technical field of computers. The processor comprises a sub-processor, wherein the sub-processor comprises a memory and a plurality of execution modules which are connected with each other, and each execution module comprises a plurality of operation sub-modules and an interaction sub-module; the memory is used for dividing first data into K first sub-data and transmitting the K first sub-data to the operation sub-modules respectively; each operation sub-module is used for operating on the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmitting the intermediate sub-data to the interaction sub-module belonging to the same execution module; and the interaction sub-module is used for acquiring the intermediate sub-data transmitted by the operation sub-modules, transmitting the intermediate sub-data to other interaction sub-modules, receiving the intermediate sub-data transmitted by the other interaction sub-modules, and processing the obtained plurality of intermediate sub-data to obtain second data. The processor and the method can reduce the area of each execution module and reduce the difficulty of layout and wiring within the execution module.

Description

Processor, data processing method and computer equipment
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a processor, a data processing method and computer equipment.
Background
A processor includes a variety of different types of sub-processors. A sub-processor typically includes multiple operation units that can operate on data in parallel, which provides the parallelism of the sub-processor. The operation units are laid out on an execution module, which is the smallest module used when laying out the sub-processor. Because all of the operation units in the sub-processor must be laid out on one execution module, the execution module has to hold a large number of operation units, which inflates the area of the execution module and makes layout and wiring within it difficult.
Disclosure of Invention
The embodiment of the application provides a processor, a data processing method and computer equipment, which can reduce the area of an execution module and reduce the difficulty of carrying out layout and wiring on the execution module. The technical scheme is as follows:
In one aspect, a processor is provided, the processor including a sub-processor, the sub-processor including a memory and a plurality of execution modules connected to each other, each execution module including a plurality of operation sub-modules and an interaction sub-module, the interaction sub-module being connected to the plurality of operation sub-modules;
The memory is used for dividing the first data into K first sub-data, respectively transmitting the K first sub-data into the operation sub-modules, wherein K is the total number of the operation sub-modules in the sub-processor, and K is an integer greater than 1;
the operation sub-module is used for operating the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmitting the intermediate sub-data to the interaction sub-module belonging to the same execution module;
the interaction sub-module is used for acquiring the intermediate sub-data transmitted by the operation sub-module, transmitting the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receiving the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processing the obtained plurality of intermediate sub-data to obtain second data.
In another aspect, a data processing method is provided, which is executed by a processor, the processor includes a sub-processor, the sub-processor includes a memory and a plurality of execution modules connected to each other, the execution modules include a plurality of operation sub-modules and an interaction sub-module, and the interaction sub-module is connected to the plurality of operation sub-modules; the method comprises the following steps:
The memory divides the first data into K first sub-data, the K first sub-data are respectively transmitted into the operation sub-modules, K is the total number of the operation sub-modules in the sub-processor, and K is an integer greater than 1;
the operation sub-module performs operation on the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmits the intermediate sub-data to the interaction sub-module belonging to the same execution module;
the interaction sub-module acquires intermediate sub-data transmitted by the operation sub-module, transmits the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receives the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processes the acquired plurality of intermediate sub-data to acquire second data.
Optionally, the interaction sub-module further comprises an OR operation unit; the method further comprises the steps of:
the first left input port transmits an input preset signal to the OR operation unit, the value of the preset signal being 0;
the first right input port transmits the intermediate sub-data transmitted from the second left output port to the OR operation unit;
and the OR operation unit performs an OR operation on the preset signal and the intermediate sub-data, and transmits the intermediate sub-data after the OR operation to the splicing unit.
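The effect of the preset signal can be illustrated with a short sketch: OR-ing with 0 leaves the intermediate sub-data unchanged, so data arriving on the first right input port passes through unmodified. The function name `or_unit` and the 8-bit sample value are illustrative assumptions and do not come from the application.

```python
def or_unit(left_input: int, right_input: int) -> int:
    """Bitwise OR of the two input-port values (hypothetical model of the OR operation unit)."""
    return left_input | right_input


PRESET_SIGNAL = 0        # value fed to the first left input port
incoming = 0b10110010    # sample intermediate sub-data on the first right input port

# OR with 0 is the identity, so the incoming data reaches the splicing unit intact.
merged = or_unit(PRESET_SIGNAL, incoming)
```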
Optionally, the interaction sub-module further comprises a replication unit; the method further comprises the steps of:
the replication unit copies the intermediate sub-data transmitted by the operation sub-module to obtain two copies of the intermediate sub-data, transmits one copy to the first left output port, and transmits the other copy to the first right output port; wherein the first left output port is left unconnected.
Optionally, the method further comprises:
the splicing unit acquires an indication signal, wherein the indication signal is used for indicating the target bit number required to be processed by an execution module where the interaction sub-module is located;
the splicing unit splices the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain intermediate data, and the method comprises the following steps:
and the splicing unit splices the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules based on the target bit number to obtain the intermediate data, so that the intermediate sub-data transmitted by the operation sub-module is positioned in the target bit number of the intermediate data.
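The splicing step can be sketched as follows, assuming two execution modules, an indication signal of 0 for the low half and 1 for the high half, and bit 1 being the least significant bit. These conventions, the function name `splice`, and the 8-bit half width are assumptions made for illustration only.

```python
def splice(local: int, remote: int, indication: int, half_width: int) -> int:
    """Place `local` in the half selected by `indication` (0 = low, 1 = high)
    and `remote` in the other half."""
    mask = (1 << half_width) - 1
    local &= mask
    remote &= mask
    if indication == 0:
        return (remote << half_width) | local
    return (local << half_width) | remote


# Each module splices from its own viewpoint, yet both obtain the same intermediate data.
low_part, high_part = 0xAB, 0xCD
from_module_0 = splice(low_part, high_part, indication=0, half_width=8)
from_module_1 = splice(high_part, low_part, indication=1, half_width=8)
```

Under these assumptions, each module's own intermediate sub-data lands on the bit positions that the module is responsible for, regardless of which module performs the splice.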
Optionally, the second operation unit performs an operation on the intermediate data to obtain the second data, including:
the second operation unit responds to a shift instruction of the intermediate data, determines target shift times, and shifts the intermediate data for the first time to obtain third data;
the second operation unit obtains an indication signal, wherein the indication signal is used for indicating a target bit number required to be processed by an execution module where the interaction sub-module is located, and third sub-data on the target bit number is determined in the third data;
and the second operation unit shifts the third sub data until the current shift times reach the target shift times to obtain the second data.
Optionally, the interaction sub-module is located in a central area of the execution module, and a plurality of operation sub-modules in the execution module are distributed around the interaction sub-module.
In another aspect, a computer device is provided that includes a processor and a memory having at least one computer program stored therein, the at least one computer program being loaded and executed by the processor to implement the operations performed by the processor as described above.
In another aspect, a computer readable storage medium having at least one computer program stored therein is provided, the at least one computer program being loaded and executed by a processor to implement operations performed by the processor as described above.
In another aspect, a computer program product is provided, comprising a computer program loaded and executed by a processor to implement operations performed by the processor as described above.
According to the scheme provided by the embodiments of the application, the sub-processor comprises a plurality of execution modules and each execution module comprises a plurality of operation sub-modules, which is equivalent to distributing all of the operation sub-modules of the sub-processor across different execution modules. The memory divides the first data into a plurality of first sub-data, which are respectively transmitted to the operation sub-modules of the different execution modules for operation. An interaction sub-module is also deployed on each execution module, so that the intermediate sub-data obtained by the operation sub-modules on different execution modules can be gathered in the interaction sub-module, and the interaction sub-module processes the intermediate sub-data obtained by all of the operation sub-modules as a whole to obtain the second data. Because the operation sub-modules are spread across different execution modules, the number of operation sub-modules that must be laid out on any one execution module is greatly reduced, which reduces the area of the execution module and the difficulty of layout and wiring within it.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a processor according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another processor according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a shift operation according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a sub-processor according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an execution module according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another execution module according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a further processor according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another sub-processor provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an interaction sub-module according to an embodiment of the present application;
FIG. 10 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a terminal according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first execution module may be referred to as a second execution module, and similarly, a second execution module may be referred to as a first execution module, without departing from the scope of the present application.
Here, "at least one" refers to one or more; for example, at least one execution module may be any integer number of execution modules greater than or equal to one, such as one, two, or three execution modules. "A plurality of" refers to two or more; for example, a plurality of execution modules may be any integer number of execution modules greater than or equal to two, such as two or three execution modules. "Each" refers to every one of the at least one; for example, if the plurality of execution modules is 3 execution modules, each execution module refers to every one of the 3 execution modules.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals (including but not limited to signals transmitted between a user terminal and other devices, etc.) involved in the present application are fully authorized by the user or the relevant parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The embodiment of the application provides a processor, which comprises a sub-processor, wherein the sub-processor comprises a memory and a plurality of execution modules connected with each other, each execution module comprises a plurality of operation sub-modules and an interaction sub-module, and the interaction sub-module is connected with the plurality of operation sub-modules. In some embodiments, the processor is disposed in a computer device. Optionally, the computer device is a terminal or a server. Optionally, the server is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Optionally, the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, or the like, but is not limited thereto.
FIG. 1 is a schematic structural diagram of a processor according to an embodiment of the present application. As shown in FIG. 1, the processor includes a sub-processor including a memory 10 and a plurality of execution modules 20 connected to each other, each execution module 20 including a plurality of operation sub-modules 201 and an interaction sub-module 202, the interaction sub-module 202 being connected to the plurality of operation sub-modules 201.
The processor may be an AI (Artificial Intelligence) processor or AI chip, also called an AI accelerator or AI computing card, which is a processor for handling large computing tasks in the field of artificial intelligence. The sub-processor may be a vector operation sub-processor in the AI processor that provides hardware support for single instruction, multiple data (SIMD) operations. Parallelism describes the amount of vector data that can be computed in one cycle. The higher the parallelism, the higher the computing power of the vector operation sub-processor; at the same time, more hardware computing resources must be stacked in the vector operation sub-processor, which may cause the chip area of the vector operation sub-processor to expand sharply.
The execution module 20 in the sub-processor is a Harden, which is the smallest module placed by the back-end flow. The memory 10 in the sub-processor is connected to each execution module 20, any two of the plurality of execution modules 20 are connected to each other, and the connection between any two execution modules 20 is made through their respective interaction sub-modules 202. The memory 10 may be a vector L1 memory. Each execution module 20 includes a plurality of operation sub-modules 201 and an interaction sub-module 202, and the interaction sub-module 202 is also connected to each operation sub-module 201 in the same execution module 20. The interaction sub-module 202 may be referred to as a CROSS, which is a component for performing data interaction between different modules.
In the related art, the sub-processor includes a plurality of operation sub-modules, and the plurality of operation sub-modules are integrated on one execution module, so that when the number of the plurality of operation sub-modules is large, the area of the execution module is rapidly expanded, and the difficulty of layout and wiring of the execution module is also increased. The embodiment of the application provides a scheme for splitting an execution module, which splits an execution module into a plurality of execution modules, each execution module comprises a plurality of operation sub-modules, the area of the execution module is saved, the complexity of the back-end flow for layout and wiring can be reduced, the back-end layout and wiring flow is simplified, and the resource occupation pressure is relieved.
Referring to FIG. 1, the memory 10 is configured to divide the first data into K first sub-data and transmit the K first sub-data to the operation sub-modules 201 respectively, where K is the total number of operation sub-modules 201 in the sub-processor and K is an integer greater than 1. That is, the plurality of execution modules 20 in the sub-processor contain K operation sub-modules 201 in total; the memory 10 divides the first data to be processed into K first sub-data and transmits them to the K operation sub-modules 201, so that each operation sub-module 201 receives one first sub-data. For example, if the sub-processor processes 4096 bits, each operation sub-module 201 processes 256 bits, the sub-processor includes 16 operation sub-modules 201, and the first data is 4096 bits long, then the memory 10 divides the 4096-bit first data into 16 first sub-data of 256 bits each and transmits one 256-bit first sub-data to each operation sub-module 201.
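The 4096-bit example can be checked with a minimal sketch. Representing the first data as a `bytes` object and the helper name `split_first_data` are assumptions made for illustration; the application does not prescribe a software representation.

```python
def split_first_data(first_data: bytes, k: int) -> list:
    """Divide first_data into k equal-length first sub-data."""
    if k <= 1:
        raise ValueError("K must be an integer greater than 1")
    if len(first_data) % k != 0:
        raise ValueError("first data length must be divisible by K")
    n = len(first_data) // k
    return [first_data[i * n:(i + 1) * n] for i in range(k)]


# 4096 bits = 512 bytes, split across 16 operation sub-modules.
first_data = bytes(i % 256 for i in range(4096 // 8))
sub_data = split_first_data(first_data, k=16)
```

Each element of `sub_data` is 32 bytes (256 bits), one per operation sub-module.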
Referring to FIG. 1, the operation sub-module 201 is configured to operate on the first sub-data transmitted from the memory 10 to obtain intermediate sub-data, and transmit the intermediate sub-data to the interaction sub-module 202 belonging to the same execution module 20. For any operation sub-module 201, after the memory 10 transmits a first sub-data to the operation sub-module 201, the operation sub-module 201 operates on the first sub-data to obtain an intermediate sub-data, and then transmits the intermediate sub-data to the interaction sub-module 202 located in the same execution module 20 as the operation sub-module 201.
Referring to FIG. 1, the interaction sub-module 202 is configured to obtain the intermediate sub-data transmitted by the operation sub-module 201, transmit the intermediate sub-data to other interaction sub-modules 202 belonging to different execution modules, receive the intermediate sub-data transmitted by the other interaction sub-modules 202 belonging to different execution modules, and process the obtained plurality of intermediate sub-data to obtain second data. For the interaction sub-module 202 in any execution module 20, the interaction sub-module 202 receives the intermediate sub-data transmitted by the operation sub-modules 201 in the same execution module 20 and the intermediate sub-data transmitted by the interaction sub-modules 202 in different execution modules 20, and also transmits the intermediate sub-data received from the operation sub-modules 201 to the interaction sub-modules 202 in different execution modules 20. In other words, the interaction sub-module 202 in each execution module 20 can acquire the intermediate sub-data provided by all operation sub-modules 201. The interaction sub-module 202 processes the received plurality of intermediate sub-data to obtain the second data. The second data is obtained by processing the intermediate sub-data corresponding to every first sub-data in the first data: the whole operation on the first data is distributed to the plurality of operation sub-modules 201 for execution, and the interaction sub-module 202 finally processes the intermediate sub-data obtained by the plurality of operation sub-modules 201. The second data can therefore be regarded as the data obtained by operating on the first data as a whole.
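The data flow described above can be modeled with a small simulation. This is a sketch under stated assumptions: two execution modules with two operation sub-modules each, doubling as a placeholder operation, and concatenation as the "processing" performed by the interaction sub-module. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


def operate(sub_data):
    """Placeholder operation of an operation sub-module: double each element."""
    return [2 * x for x in sub_data]


@dataclass
class ExecutionModule:
    index: int
    local: list = field(default_factory=list)     # from own operation sub-modules
    received: dict = field(default_factory=dict)  # from other interaction sub-modules

    def run_operation_sub_modules(self, first_sub_data):
        self.local = [operate(s) for s in first_sub_data]

    def send_to(self, other):
        other.received[self.index] = self.local

    def second_data(self):
        parts = dict(self.received)
        parts[self.index] = self.local
        # "Processing" is modeled as concatenation in module order (an assumption).
        return [x for i in sorted(parts) for chunk in parts[i] for x in chunk]


first_data = list(range(8))
k, per_module = 4, 2
n = len(first_data) // k
chunks = [first_data[i * n:(i + 1) * n] for i in range(k)]
modules = [ExecutionModule(i) for i in range(k // per_module)]
for m in modules:
    m.run_operation_sub_modules(chunks[m.index * per_module:(m.index + 1) * per_module])
for m in modules:
    for other in modules:
        if other is not m:
            m.send_to(other)
results = [m.second_data() for m in modules]
```

After the exchange, every execution module holds the same second data, which equals the placeholder operation applied to the first data as a whole.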
In summary, in the scheme provided by the embodiments of the present application, the sub-processor includes a plurality of execution modules and each execution module includes a plurality of operation sub-modules, which is equivalent to distributing all of the operation sub-modules of the sub-processor across different execution modules. The memory divides the first data into a plurality of first sub-data, which are respectively transmitted to the operation sub-modules of the different execution modules for operation. An interaction sub-module is also deployed on each execution module, so that the intermediate sub-data obtained by the operation sub-modules on different execution modules can be gathered in the interaction sub-module, and the interaction sub-module processes the intermediate sub-data obtained by all of the operation sub-modules as a whole to obtain the second data. Because the operation sub-modules are spread across different execution modules, the number of operation sub-modules that must be laid out on any one execution module is greatly reduced, which reduces the area of the execution module and the difficulty of layout and wiring within it.
In some embodiments, referring to FIG. 2, the operation sub-module 201 includes a first operation unit 211 and a register 221. The register 221 is connected to the first operation unit 211, and the memory 10 is connected to the register 221. The first operation unit 211 may be a vector arithmetic logic unit (VALU), and the register 221 may be a vector register file (VRF).
Referring to FIG. 2, the memory 10 is configured to load each of the K first sub-data into the register 221 corresponding to that first sub-data. The register 221 is configured to cache the first sub-data. The first operation unit 211 is configured to read the first sub-data from the register 221 and operate on the first sub-data to obtain intermediate sub-data. After the memory 10 divides the first data into K first sub-data, each first sub-data corresponds to one operation sub-module 201. For any first sub-data, the memory 10 uses a Load instruction (data load instruction) to load the first sub-data into the register 221 of the corresponding operation sub-module 201, and the register 221 caches the first sub-data. The first operation unit 211 then reads the first sub-data from the register 221 in the same operation sub-module 201.
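The pairing of one register with one first operation unit can be sketched as a small class. The class name, the method names, and the element-wise negation used as the operation are hypothetical placeholders chosen for illustration.

```python
class OperationSubModule:
    """Hypothetical model: one register (VRF) paired with one first operation unit (VALU)."""

    def __init__(self):
        self.register = None

    def load(self, first_sub_data):
        # Models the Load instruction: memory -> this sub-module's register.
        self.register = list(first_sub_data)

    def operate(self):
        # The first operation unit reads only its own register.
        # Element-wise negation stands in for the real operation.
        return [-x for x in self.register]


first_data = list(range(16))
K = 4
n = len(first_data) // K
sub_modules = [OperationSubModule() for _ in range(K)]
for i, sm in enumerate(sub_modules):
    sm.load(first_data[i * n:(i + 1) * n])
intermediate = [sm.operate() for sm in sub_modules]
```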
In the related art, one register is deployed in the sub-processor, and every operation sub-module reads data from that same register. In the embodiment of the present application, the register is also split in a distributed manner: the number of registers obtained by splitting equals the number of first operation units in the sub-processor, each first operation unit corresponds to one register, and each first operation unit together with its corresponding register forms an operation sub-module. Each first operation unit only needs to read data from its corresponding register, so the register and the first operation unit can be tightly coupled and placed closer together, which helps reduce the timing pressure on the first operation unit when it reads data or writes data back while executing its internal operation logic.
In one possible implementation, the first data has a length of M bits, the first sub-data has a length of N bits, K is equal to the ratio of M to N, M and N are integers greater than 1, and N is less than M. The memory 10 is configured to load a plurality of first sub-data occupying contiguous bit positions into the registers 221 of the plurality of operation sub-modules 201 in the same execution module 20, where the number of these first sub-data is equal to the number of operation sub-modules 201 in the execution module 20. For example, if the first data has a length of 4096 bits, K is equal to 16, the first sub-data has a length of 256 bits, and there are two execution modules, the memory 10 may transmit bits 1 to 2048 of the first data to the registers 221 of the plurality of operation sub-modules 201 in the first execution module 20, and bits 2049 to 4096 of the first data to the registers 221 of the plurality of operation sub-modules 201 in the second execution module 20.
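The contiguous assignment in this example can be worked out with a short sketch. It assumes that the K operation sub-modules are divided evenly among the execution modules (8 per module here), which the example implies but does not state; the function name is hypothetical.

```python
def assign_bit_ranges(m_bits: int, n_bits: int, num_modules: int):
    """Return, per execution module, the 1-based inclusive bit ranges
    loaded into the registers of its operation sub-modules."""
    k = m_bits // n_bits            # K = M / N
    per_module = k // num_modules   # assumed even distribution
    layout = []
    for module in range(num_modules):
        ranges = []
        for j in range(per_module):
            start = (module * per_module + j) * n_bits + 1
            ranges.append((start, start + n_bits - 1))
        layout.append(ranges)
    return layout


layout = assign_bit_ranges(m_bits=4096, n_bits=256, num_modules=2)
```

The first execution module's ranges run from bit 1 to bit 2048 and the second module's from bit 2049 to bit 4096, matching the example in the text.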
In this embodiment, the first sub-data loaded into the registers 221 of the plurality of operation sub-modules 201 in the same execution module 20 occupy contiguous bit positions, which ensures that the operation sub-modules 201 in the same execution module 20 process a contiguous range of bits and thereby reduces the burden of data interaction.
In one possible implementation, the interaction sub-module 202 is further configured to obtain an indication signal, where the indication signal indicates the target bit number (that is, the range of bit positions) to be processed by the execution module 20 where the interaction sub-module 202 is located, and to determine, from the second data, the second sub-data on the target bit number. The interaction sub-module 202 is further configured to write the second sub-data back to the register 221 in the execution module 20 where the interaction sub-module 202 is located if the second data is a processing result of the first data, and to transmit the second sub-data to the first operation unit 211 in the execution module 20 where the interaction sub-module 202 is located if the second data is not a processing result of the first data.
Since the sub-processor includes a plurality of execution modules 20, each execution module 20 only needs to process a part of the first data, and the bits that each execution module 20 needs to process are different, the indication signal is needed to distinguish which bits each execution module needs to process. For example, the sub-processor includes 2 execution modules 20 and the first data has 4096 bits; one execution module 20 processes bits 1 to 2048, and the other execution module 20 processes bits 2049 to 4096. If the value of the indication signal obtained by the interaction sub-module 202 is 0, the target bit number to be processed by the execution module 20 where the interaction sub-module is located is bits 1 to 2048. If the value of the indication signal obtained by the interaction sub-module 202 is 1, the target bit number to be processed by the execution module 20 where the interaction sub-module is located is bits 2049 to 4096.
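The mapping from indication signal to bit range, and the selection of the second sub-data, can be sketched as follows. The sketch assumes that bit 1 is the least significant bit and that the second data is held as a Python integer; both conventions and the function names are assumptions made for illustration.

```python
def target_bit_range(indication: int, total_bits: int = 4096, num_modules: int = 2):
    """Map an indication signal value to the 1-based inclusive bit range
    handled by the corresponding execution module."""
    width = total_bits // num_modules
    start = indication * width + 1
    return start, start + width - 1


def extract_bits(data: int, start: int, end: int) -> int:
    """Extract bits start..end (1-based, bit 1 = least significant, an assumed convention)."""
    width = end - start + 1
    return (data >> (start - 1)) & ((1 << width) - 1)


second_data = (0xBEEF << 2048) | 0xCAFE
low = extract_bits(second_data, *target_bit_range(0))
high = extract_bits(second_data, *target_bit_range(1))
```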
After the interaction sub-module 202 obtains the second data, if the second data is a processing result of the first data, processing is complete and the second data needs no further processing. The interaction sub-module 202 writes the second sub-data on the target bit number in the second data back to the register 221 in the execution module 20 where the interaction sub-module 202 is located, the register 221 caches the second sub-data, and the memory 10 subsequently reads and stores the second sub-data in the register 221 through a Store instruction (data save instruction). For example, if the bits to be processed by the execution module 20 where the interaction sub-module 202 is located are bits 2049 to 4096, the interaction sub-module 202 writes the second sub-data on bits 2049 to 4096 of the second data back to the register 221. Optionally, writing the second sub-data back to the register 221 in the execution module 20 where the interaction sub-module 202 is located means that the second sub-data is divided into a plurality of parts, each of which is written back to the register 221 of one operation sub-module 201 in the execution module 20, and the bit positions of the second sub-data written back to each register 221 are the same as the bit positions of the first sub-data cached in that register 221. For example, when the second data is the result of a shift operation on the first data, the second data is a processing result of the first data.
After the interaction sub-module 202 obtains the second data, if the second data is not the processing result of the first data, the processing is not yet complete. In this case, the interaction sub-module 202 transmits the second sub-data on the target bit number in the second data to the first operation unit 211 in the execution module 20 where the interaction sub-module 202 is located, and the first operation unit 211 continues to operate on the second sub-data. If the operation result does not need to interact again with the operation results of other first operation units 211, the first operation unit 211 writes back the operation result to the register 221; if the operation result needs to interact again with the operation results of other first operation units 211, the first operation unit 211 transmits the operation result to the interaction sub-module 202. For example, if the bits that the execution module 20 where the interaction sub-module 202 is located needs to process are the 2049th to 4096th bits, the interaction sub-module 202 transmits the second sub-data on the 2049th to 4096th bits of the second data to the first operation unit 211. Optionally, transmitting the second sub-data to the first operation unit 211 in the execution module 20 where the interaction sub-module 202 is located means that the second sub-data is divided into a plurality of pieces, which are transmitted to the first operation units 211 of the respective operation sub-modules 201 in the execution module 20, and the number of bits transmitted to each first operation unit 211 is the same as the number of bits of the first sub-data previously transmitted to that first operation unit 211. That the second data is not the processing result of the first data can be understood as meaning that the second data is not the final result required by the processing request for the first data.
In this embodiment of the present application, after obtaining the second data, the interaction sub-module only transmits the sub-data on the target number of bits in the second data to the operation sub-module (transmitted to the first operation unit or the register), where the target number of bits is the number of bits that the execution module where the interaction sub-module is located is responsible for, so as to ensure that each operation sub-module is only responsible for the data on the specific number of bits, and avoid errors in the operation process.
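The selection and routing described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function names `select_target_bits` and `route_second_sub_data`, the representation of the second data as a `bytes` object whose first byte holds the 1st to 8th bits, and the even split among execution modules are hypothetical and are not taken from the embodiment.

```python
def select_target_bits(second_data: bytes, indication: int, num_modules: int = 2) -> bytes:
    """Return the part of second_data that the indicated execution module is responsible for.

    Assumption: the data is split evenly, and indication i selects the i-th part.
    """
    if len(second_data) % num_modules != 0:
        raise ValueError("data length must divide evenly among the execution modules")
    if not 0 <= indication < num_modules:
        raise ValueError("indication signal out of range")
    part = len(second_data) // num_modules
    return second_data[indication * part:(indication + 1) * part]


def route_second_sub_data(second_data: bytes, indication: int, is_final: bool) -> tuple[str, bytes]:
    """Pick the destination of the second sub-data.

    A final result is written back to the registers; otherwise the second
    sub-data goes back to the first operation units for further operation.
    """
    sub = select_target_bits(second_data, indication)
    destination = "register" if is_final else "first_operation_unit"
    return destination, sub


second_data = bytes(range(256)) * 2  # 512 bytes = 4096 bits
dest, sub = route_second_sub_data(second_data, indication=1, is_final=True)
print(dest, len(sub) * 8)  # destination and number of bits written back
```

With an indication signal of 1, only the second half of the data (the 2049th to 4096th bits in the example above) is passed on, so the operation sub-modules of that execution module never see bits for which another execution module is responsible.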
In some embodiments, referring to fig. 2, the interaction sub-module 202 includes a splicing unit 212, a second operation unit 222, an input port 232, and an output port 242. The input port 232 is used for receiving intermediate sub-data transmitted by other interaction sub-modules 202 belonging to different execution modules 20; the splicing unit 212 is configured to splice the intermediate sub-data input by the operation sub-module 201 and the intermediate sub-data input by the other interaction sub-modules 202 belonging to different execution modules 20 to obtain intermediate data; the second operation unit 222 is configured to perform an operation on the intermediate data to obtain second data; the output port 242 is used to transmit the intermediate sub-data input by the operation sub-module 201 to other interaction sub-modules 202 belonging to different execution modules 20.
The other interaction sub-modules 202 and the interaction sub-modules 202 belong to different execution modules 20, and the other interaction sub-modules 202 are interaction sub-modules in different execution modules 20. The input port 232 of each interaction sub-module 202 is connected to the output port 242 of the other interaction sub-modules 202 belonging to different execution modules 20, and intermediate sub-data can be mutually transmitted between the respective interaction sub-modules 202 of different execution modules 20 through the connection between the input port 232 and the output port 242, that is, the interaction sub-module 202 of one execution module 20 can obtain the intermediate sub-data provided by the interaction sub-module 202 of another execution module 20.
In one possible implementation, the splicing unit in the interaction sub-module 202 further obtains an indication signal, where the indication signal is used to indicate the target number of bits that need to be processed by the execution module 20 where the interaction sub-module 202 is located. The splicing unit splices the intermediate sub-data input by the operation sub-module 201 and the intermediate sub-data input by the other interaction sub-modules 202 belonging to different execution modules 20 based on the target bit number, so as to obtain intermediate data, so that the intermediate sub-data input by the operation sub-module 201 is located in the target bit number of the intermediate data.
Since the bits for which each execution module 20 is responsible are different, the position of each intermediate sub-data within the first data needs to be considered when splicing the intermediate sub-data provided by the operation sub-modules 201 of the plurality of execution modules 20. It is therefore necessary to use the indication signal to distinguish which bits each execution module 20 needs to process. For example, the sub-processor includes 2 execution modules 20 and the first data has 4096 bits; one execution module 20 is responsible for processing the 1st to 2048th bits, and the other execution module 20 is responsible for processing the 2049th to 4096th bits. The intermediate sub-data acquired by the splicing unit from the current execution module 20 is intermediate sub-data a, and the intermediate sub-data provided by the other execution module 20 is intermediate sub-data b. In the case that the value of the indication signal obtained by the splicing unit is 0, the target bit number to be processed by the execution module 20 where the interaction sub-module is located is the 1st to 2048th bits, so the intermediate sub-data a occupies the 1st to 2048th bits and the intermediate sub-data b occupies the 2049th to 4096th bits, and the spliced intermediate data is { intermediate sub-data a, intermediate sub-data b }. In the case that the value of the indication signal obtained by the splicing unit is 1, the target bit number to be processed by the execution module 20 where the interaction sub-module is located is the 2049th to 4096th bits, so the intermediate sub-data a occupies the 2049th to 4096th bits and the intermediate sub-data b occupies the 1st to 2048th bits, and the spliced intermediate data is { intermediate sub-data b, intermediate sub-data a }.
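The splicing rule in the example above can be illustrated with a short Python sketch. The function name `splice` and the use of `bytes` objects, in which concatenation `a + b` stands for { a, b } with a occupying the 1st to 2048th bits, are assumptions made only for illustration.

```python
def splice(own: bytes, other: bytes, indication: int) -> bytes:
    """Splice two intermediate sub-data so that `own` lands on the target bits.

    indication 0: this execution module is responsible for the first half,
                  so the result is {own, other}.
    indication 1: this execution module is responsible for the second half,
                  so the result is {other, own}.
    """
    if indication == 0:
        return own + other
    if indication == 1:
        return other + own
    raise ValueError("indication signal must be 0 or 1 for two execution modules")


low_half = b"\xaa" * 256   # intermediate sub-data for the 1st to 2048th bits
high_half = b"\x55" * 256  # intermediate sub-data for the 2049th to 4096th bits

# Each interaction sub-module sees its own data as `own` and the peer's as `other`.
from_module_0 = splice(low_half, high_half, indication=0)
from_module_1 = splice(high_half, low_half, indication=1)
print(from_module_0 == from_module_1)
```

Although the two interaction sub-modules receive their inputs in opposite roles, the indication signal makes both of them assemble the same intermediate data.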
In one possible implementation, the second operation unit 222 determines the target shift number in response to the shift instruction of the intermediate data, and performs the first shift on the intermediate data to obtain the third data. The second operation unit 222 obtains an indication signal, where the indication signal is used to indicate a target number of bits to be processed by the execution module 20 where the interaction sub-module 202 is located, determines third sub-data on the target number of bits in the third data, and shifts the third sub-data until the current shift number reaches the target shift number, so as to obtain second data.
Since the interaction sub-module 202 only needs to input the data of the number of bits for which the current execution module 20 is responsible to the operation sub-module 201, in order to reduce the operation amount of the interaction sub-module 202, in the second operation unit 222 of the interaction sub-module 202, only the data of the target number of bits for which the current execution module 20 is responsible may be processed, thereby reducing the operation amount of the shift operation and being beneficial to improving the processing efficiency.
Taking a shift instruction as an example, the shift operation is performed on the intermediate data in units of bytes, and the shift information is a 9-bit array in which each bit indicates whether the corresponding shift is performed. As shown in fig. 3, (1) in fig. 3 is the shift process provided by the related art, and (2) in fig. 3 is the shift process provided by the embodiment of the present application. Referring to (1) in fig. 3, in the related art, 9 shift operations are performed in turn according to the 9-bit shift information to obtain shift data 8, where the shift data 8 is the second data obtained by shifting the intermediate data, and the second sub-data is then selected from the shift data 8 according to the target bit number indicated by the indication signal. Because every shift operation is applied to the complete data, the operation amount is large and the processing efficiency is low. Referring to (2) in fig. 3, in the embodiment of the present application, the first shift operation is performed according to shift information [8], the 9th bit of the shift information, to obtain shift data 0, and shift sub-data 0 is then selected from the shift data 0 according to the target bit number indicated by the indication signal, where the shift data 0 is the third data and the shift sub-data 0 is the third sub-data. Then, according to the remaining 8 bits of the shift information, 8 shift operations are performed on the third sub-data to obtain shift sub-data 8, where the shift sub-data 8 is the second data. This second data is identical to the second sub-data in (1), so no further selection from the second data is required. Except for the first shift operation, every shift operation is applied only to the data on the bits for which the current execution module is responsible, so that the operation amount of the shift operation is reduced and the processing efficiency is improved.
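The two shift flows can be compared with the following Python sketch. Several points are assumptions rather than statements of the embodiment: the shift is modeled as a logical right shift, bit 0 of the Python integer is treated as the lowest bit, `shift_info[i]` enables a shift of 2**i bytes, and the narrowed flow keeps a small window of extra "guard" bits next to the target bits. The guard window is added here because the source does not state how bits that cross the boundary of the target range are handled during the remaining shifts.

```python
def shift_full(data: int, shift_info: list[int], width: int) -> int:
    """Flow (1): every stage shifts the complete data."""
    mask = (1 << width) - 1
    for i in reversed(range(len(shift_info))):
        if shift_info[i]:
            data = (data >> (8 << i)) & mask  # 2**i bytes = 8 * 2**i bits
    return data


def shift_narrow(data: int, shift_info: list[int], width: int, lo: int, hi: int) -> int:
    """Flow (2): one full-width shift, then the remaining stages on a window.

    lo and hi are the bounds of the target bits (lo inclusive, hi exclusive).
    """
    top = len(shift_info) - 1
    if shift_info[top]:
        data >>= 8 << top  # first shift, controlled by the highest bit
    data &= (1 << width) - 1
    # The remaining stages move the data by at most (2**top - 1) bytes.
    guard = 8 * ((1 << top) - 1)
    window = (data >> lo) & ((1 << (hi - lo + guard)) - 1)
    for i in reversed(range(top)):
        if shift_info[i]:
            window >>= 8 << i
    return window & ((1 << (hi - lo)) - 1)


WIDTH = 4096
data = int.from_bytes(bytes(range(256)) * 2, "little")
amount = 300  # shift amount in bytes, below 512
shift_info = [(amount >> i) & 1 for i in range(9)]

full = shift_full(data, shift_info, WIDTH)
lower = shift_narrow(data, shift_info, WIDTH, 0, 2048)
upper = shift_narrow(data, shift_info, WIDTH, 2048, 4096)
print(lower == full & ((1 << 2048) - 1), upper == full >> 2048)
```

In this model only the first stage touches all 4096 bits; the remaining eight stages operate on the narrower window, which mirrors the reduction in operation amount described above.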
In one possible implementation, the plurality of execution modules 20 in the sub-processor include a first execution module 20a and a second execution module 20b, the first execution module 20a including an interaction sub-module 202a, and the second execution module 20b including an interaction sub-module 202b. In order to reduce the difficulty of layout and wiring, the first execution module 20a and the second execution module 20b are arranged in a left-right mirror image manner. Referring to fig. 4, the input port 232a and the output port 242a in the interaction sub-module 202a are located on the right side, with the input port 232a above the output port 242a; the input port 232b and the output port 242b in the interaction sub-module 202b are located on the left side, with the input port 232b above the output port 242b. As a result, the wiring between the input port 232a and the output port 242b crosses the wiring between the output port 242a and the input port 232b.
In some embodiments, the interaction sub-module 202 is located in a central region of the execution module 20, and the plurality of operation sub-modules 201 in the execution module 20 are distributed around the interaction sub-module 202.
Optionally, the sub-processor includes 2 execution modules and the number of operation sub-modules in the sub-processor is 16, so each execution module includes one interaction sub-module and 8 operation sub-modules. Referring to fig. 5 and 6, the execution module includes operation sub-module 0 to operation sub-module 7 and an interaction sub-module; the interaction sub-module is located in a central area of the execution module, and operation sub-module 0 to operation sub-module 7 are distributed around the interaction sub-module.
In the embodiment of the application, since each operation submodule needs to perform data interaction with the interaction submodule, the interaction submodule is laid out in the central area of the execution module, the operation submodule is laid out around the interaction submodule, the distance between the operation submodule and the interaction submodule is reduced as much as possible, and the wiring pressure of wiring between the operation submodule and the interaction submodule is reduced.
In addition, each register in the operation sub-module includes a load port for loading data in the memory into the register and a store port for storing data in the register into the memory. When the register is laid out, the load port and the store port of the register are placed close to each other, so that the wiring pressure of the load port and the store port is reduced.
The embodiment of the application provides a scheme for splitting the execution module, and one problem to be considered when splitting is how many execution modules to split into. Without splitting, the actual number of minimum units in one execution module is between 8000k and 9000k, whereas the desired number of minimum units in one execution module is between 4000k and 5000k. The execution module can therefore be split into 2 execution modules to satisfy the desired number of minimum units in one execution module.
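The arithmetic behind the choice of 2 can be written out as a short Python sketch. It assumes that the minimum units are divided evenly among the split execution modules and that the extra units introduced by the interaction sub-modules are negligible; neither assumption is stated in the embodiment, and the function name `min_split_count` is hypothetical.

```python
import math


def min_split_count(actual_units: int, desired_max_units: int) -> int:
    """Smallest number of execution modules so that each stays within the desired maximum."""
    return math.ceil(actual_units / desired_max_units)


# Actual range: 8000k to 9000k minimum units; desired range: 4000k to 5000k.
for actual in (8_000_000, 9_000_000):
    count = min_split_count(actual, 5_000_000)
    print(actual, count, actual // count)  # total units, split count, units per module
```

For both ends of the actual range the result is 2, and the per-module count (4000k to 4500k) falls inside the desired range.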
Fig. 7 is a schematic structural diagram of another processor according to an embodiment of the present application. As shown in fig. 7, the processor includes a sub-processor including a memory 10 and a first execution module 20a and a second execution module 20b connected to each other. The first execution module 20a includes a plurality of operation sub-modules 201a and an interaction sub-module 202a, the interaction sub-module 202a being connected to the plurality of operation sub-modules 201a. The second execution module 20b includes a plurality of operation sub-modules 201b and an interaction sub-module 202b, the interaction sub-module 202b being connected to the plurality of operation sub-modules 201b.
The interaction sub-module 202a includes a splicing unit, a second operation unit, a first input port, and a first output port. The interaction sub-module 202b includes a splicing unit, a second operation unit, a second input port, and a second output port. The first input port is connected with the second output port, and the second input port is connected with the first output port.
The memory 10 divides the first data into K first sub-data, and transmits the K first sub-data to the operation sub-modules, respectively, where K is the total number of operation sub-modules in the sub-processor, and K is an integer greater than 1.
For the first execution module 20a, the operation sub-module 201a in the first execution module 20a performs an operation on the first sub-data input from the memory 10 to obtain intermediate sub-data, and the intermediate sub-data is input to the interaction sub-module 202a in the first execution module 20a. The first input port of the interaction sub-module 202a receives the intermediate sub-data transferred by the interaction sub-module 202b of the second execution module 20b through the connection between the first input port and the second output port. The splicing unit in the interaction sub-module 202a splices the intermediate sub-data input by the operation sub-module 201a and the intermediate sub-data input by the interaction sub-module 202b to obtain intermediate data. The second operation unit in the interaction sub-module 202a performs an operation on the intermediate data to obtain second data. The first output port in the interaction sub-module 202a passes the intermediate sub-data of the operation sub-module 201a to the interaction sub-module 202b in the second execution module 20b through the connection between the first output port and the second input port.
For the second execution module 20b, the operation sub-module 201b in the second execution module 20b performs an operation on the first sub-data input from the memory 10 to obtain intermediate sub-data, and the intermediate sub-data is input to the interaction sub-module 202b in the second execution module 20b. The second input port of the interaction sub-module 202b receives the intermediate sub-data transferred by the interaction sub-module 202a of the first execution module 20a through the connection between the second input port and the first output port. The splicing unit in the interaction sub-module 202b splices the intermediate sub-data input by the operation sub-module 201b and the intermediate sub-data input by the interaction sub-module 202a to obtain intermediate data. The second operation unit in the interaction sub-module 202b performs an operation on the intermediate data to obtain second data. The second output port in the interaction sub-module 202b passes the intermediate sub-data of the operation sub-module 201b to the interaction sub-module 202a in the first execution module 20a through the connection between the second output port and the first input port.
In one possible implementation, referring to fig. 8, the first execution module 20a and the second execution module 20b have the same structure, and the first input port includes a first left input port and a first right input port, the first output port includes a first left output port and a first right output port, the first left output port is located below the first left input port, and the first right input port is located below the first right output port. The second input port comprises a second left input port and a second right input port, the second output port comprises a second left output port and a second right output port, the second left output port is positioned below the second left input port, and the second right input port is positioned below the second right output port; the first right input port is connected with the second left output port, and the second left input port is connected with the first right output port.
For the interaction sub-module 202a in the first execution module 20a, the first right input port is configured to receive, through the connection between the first right input port and the second left output port, the intermediate sub-data transmitted from the second left output port in the second execution module 20b; the first right output port is configured to transmit the intermediate sub-data of the operation sub-module 201a received by the interaction sub-module 202a to the second left input port in the second execution module 20b through the connection between the second left input port and the first right output port.
For the interaction sub-module 202b in the second execution module 20b, the second left input port is configured to receive, through the connection between the second left input port and the first right output port, the intermediate sub-data transmitted from the first right output port in the first execution module 20a; the second left output port is configured to transmit the intermediate sub-data of the operation sub-module 201b received by the interaction sub-module 202b to the first right input port in the first execution module 20a through the connection between the first right input port and the second left output port.
In one possible implementation, the interaction sub-module 202a and the interaction sub-module 202b each further comprise an OR operation unit.
Referring to fig. 8 and 9, for the interaction sub-module 202a in the first execution module 20a, the first left input port transmits an input preset signal to the OR operation unit, and the value of the preset signal is 0. The first right input port receives the intermediate sub-data transmitted from the second left output port in the second execution module 20b and transmits it to the OR operation unit. The OR operation unit performs an OR operation on the preset signal and the incoming intermediate sub-data, and transmits the result to the splicing unit. Because the value of the preset signal is 0, the result of the OR operation is the incoming intermediate sub-data itself.
Referring to fig. 8 and 9, for the interaction sub-module 202b in the second execution module 20b, the second right input port transmits an input preset signal to the OR operation unit, and the value of the preset signal is 0. The second left input port receives the intermediate sub-data transmitted from the first right output port in the first execution module 20a and transmits it to the OR operation unit. The OR operation unit performs an OR operation on the preset signal and the incoming intermediate sub-data, and transmits the result to the splicing unit. Because the value of the preset signal is 0, the result of the OR operation is the incoming intermediate sub-data itself.
In one possible implementation, the interaction sub-module 202a and the interaction sub-module 202b further comprise a replication unit.
Referring to fig. 8 and 9, for the interaction sub-module 202a in the first execution module 20a, the copying unit copies the intermediate sub-data input from the operation sub-module 201a to obtain two intermediate sub-data, one intermediate sub-data is input to the first left output port, and the other intermediate sub-data is input to the first right output port. The first left output port is a null connection, that is, the intermediate sub-data is not output to other components, and the first right output port transmits the intermediate sub-data to the second left input port.
Referring to fig. 8 and 9, for the interaction sub-module 202b in the second execution module 20b, the copying unit copies the intermediate sub-data input from the operation sub-module 201b to obtain two intermediate sub-data, one intermediate sub-data is input to the second right output port, and the other intermediate sub-data is input to the second left output port. The second right output port is in idle connection, that is, the intermediate sub-data is not output to other components, and the second left output port passes the intermediate sub-data to the first right input port.
In this embodiment, as shown in fig. 8 and fig. 9, although a set of redundant input ports and output ports is added to each execution module, the first execution module and the second execution module keep the same structure, and the data interaction ports between the first execution module and the second execution module are aligned. The connection is therefore simple and causes no wiring crossover, and the first execution module and the second execution module can be placed back to back and connected directly, which avoids the area consumed by a wiring channel and further reduces the complexity of layout and wiring.
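The role of the redundant ports, the OR operation unit and the copying unit can be modeled at the port level with the following Python sketch. The class name `InteractionPorts`, its fields, and the use of integers for the intermediate sub-data are illustrative assumptions; the sketch only shows why two identical modules can be wired back to back without a per-module special case.

```python
from dataclasses import dataclass

PRESET = 0  # value of the preset signal on an input port with no peer


@dataclass
class InteractionPorts:
    """Port-level model of one interaction sub-module; every instance is identical."""
    own: int               # intermediate sub-data from the local operation sub-modules
    left_in: int = PRESET
    right_in: int = PRESET

    def outputs(self) -> tuple[int, int]:
        """Copying unit: the same data is driven on the left and right output ports."""
        return self.own, self.own

    def incoming(self) -> int:
        """OR operation unit: OR-ing with the preset value 0 leaves the peer data unchanged."""
        return self.left_in | self.right_in


a = InteractionPorts(own=0x1234)  # first execution module, on the left
b = InteractionPorts(own=0xABCD)  # second execution module, on the right

# Back-to-back placement: the right side of a faces the left side of b.
_, a_right_out = a.outputs()  # the left output of a is left unconnected
b_left_out, _ = b.outputs()   # the right output of b is left unconnected
b.left_in = a_right_out
a.right_in = b_left_out

print(hex(a.incoming()), hex(b.incoming()))
```

Each instance receives the data of its peer regardless of which side the peer is on, which is what allows a single module design to be instantiated twice.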
According to the processor provided by the embodiment of the application, the execution module is split into the first execution module and the second execution module, which are identical in structure, so that the complexity of layout and wiring in the back-end synthesis flow can be reduced. In addition, because the first execution module and the second execution module have the same structure, they can be obtained by instantiating a single Harden (hardened module) twice, which further simplifies the layout and wiring flow and improves the efficiency of layout and wiring.
Fig. 10 is a flowchart of a data processing method provided in an embodiment of the present application, where the embodiment of the present application is executed by a processor, and the processor includes a sub-processor, where the sub-processor includes a memory and a plurality of execution modules connected to each other, and the execution modules include a plurality of operation sub-modules and an interaction sub-module, where the interaction sub-module is connected to the plurality of operation sub-modules. Wherein the processor may be an AI processor or an AI chip, etc. Referring to fig. 10, the method includes:
1001. The memory divides the first data into K first sub-data and transmits the K first sub-data to the operation sub-modules respectively, where K is the total number of the operation sub-modules in the sub-processor, and K is an integer greater than 1.
1002. The operation sub-module operates on the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmits the intermediate sub-data to the interaction sub-module belonging to the same execution module.
1003. The interaction sub-module acquires intermediate sub-data transmitted by the operation sub-module, transmits the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receives the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processes the acquired plurality of intermediate sub-data to acquire second data.
According to the method provided by the embodiment of the application, the sub-processor comprises a plurality of execution modules, each execution module comprises a plurality of operation sub-modules, which is equivalent to respectively deploying all operation sub-modules on the sub-processor onto different execution modules, and the memory divides the first data into a plurality of first sub-data which are respectively transmitted into the operation sub-modules of different execution modules for operation. And the interactive sub-module is also deployed on the execution module, intermediate sub-data obtained by the operation sub-modules on different execution modules can be concentrated in the interactive sub-module, and the interactive sub-module carries out overall processing on the intermediate sub-data obtained by all the operation sub-modules, so as to obtain second data. Because a plurality of operation submodules are scattered on different execution modules, the number of the operation submodules which are required to be laid out on one execution module is greatly reduced, the area of the execution module is reduced, and the difficulty in carrying out layout and wiring on the execution module is reduced.
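Steps 1001 to 1003 can be walked through with a small Python simulation. The concrete operations are placeholders chosen only so that the flow is visible: the operation sub-modules invert every byte, and the interaction sub-modules reverse the byte order of the spliced intermediate data. The function name `run_sub_processor` and its parameters are likewise assumptions and do not describe the actual operations of the embodiment.

```python
def run_sub_processor(first_data: bytes, num_modules: int = 2, subs_per_module: int = 8) -> list[bytes]:
    """Return the second data obtained by the interaction sub-module of each execution module."""
    k = num_modules * subs_per_module  # total number of operation sub-modules
    if len(first_data) % k != 0:
        raise ValueError("first data must divide evenly into K first sub-data")
    n = len(first_data) // k

    # 1001: the memory divides the first data into K first sub-data.
    first_sub = [first_data[i * n:(i + 1) * n] for i in range(k)]

    # 1002: each operation sub-module operates on its first sub-data
    # (placeholder operation: invert every byte).
    intermediate = [bytes(b ^ 0xFF for b in s) for s in first_sub]
    per_module = [
        b"".join(intermediate[m * subs_per_module:(m + 1) * subs_per_module])
        for m in range(num_modules)
    ]

    # 1003: every interaction sub-module holds its own intermediate sub-data,
    # receives the others, and processes the whole
    # (placeholder operation: reverse the byte order).
    results = []
    for m in range(num_modules):
        received = {p: per_module[p] for p in range(num_modules) if p != m}
        received[m] = per_module[m]
        spliced = b"".join(received[p] for p in range(num_modules))
        results.append(spliced[::-1])
    return results


first_data = bytes(range(256)) * 2  # 4096 bits
second_data = run_sub_processor(first_data)
print(len(second_data), second_data[0] == second_data[1])
```

The placeholder in step 1003 needs bytes from both halves of the data, which is the kind of operation that a single execution module could not complete on its own sub-data alone.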
In one possible implementation, the operation sub-module includes a first operation unit and a register, the register being coupled to the first operation unit and the memory being coupled to the register. The memory dividing the first data into K first sub-data and transmitting the K first sub-data to the operation sub-modules respectively includes: the memory loads each of the K first sub-data into the register corresponding to that first sub-data. The operation sub-module operating on the first sub-data transmitted from the memory to obtain intermediate sub-data includes: the register caches the first sub-data; the first operation unit reads the first sub-data from the register and performs an operation on the first sub-data to obtain the intermediate sub-data.
In one possible implementation, the first data has a length of M bits, the first sub data has a length of N bits, K is equal to a ratio of M to N, M and N are integers greater than 1, and N is less than M. The memory loads each first sub data in the K first sub data to a register corresponding to the first sub data, and the memory comprises: the memory loads a plurality of first sub-data with continuous bit numbers into registers of a plurality of operation sub-modules in the same execution module respectively, wherein the number of the plurality of first sub-data is equal to the number of the plurality of operation sub-modules in the execution module.
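The loading rule above can be sketched in Python as follows. The function name `distribution_plan` is hypothetical, and the value N = 256 used in the example is an assumption obtained by combining the 4096-bit first data and the 16 operation sub-modules mentioned in earlier examples.

```python
def distribution_plan(m_bits: int, n_bits: int, num_modules: int) -> dict[tuple[int, int], tuple[int, int]]:
    """Map (execution module, operation sub-module) to a 1-based inclusive bit range.

    First sub-data with consecutive bit numbers go to the same execution module.
    """
    if m_bits % n_bits != 0:
        raise ValueError("M must be a multiple of N")
    k = m_bits // n_bits  # K = M / N first sub-data
    if k % num_modules != 0:
        raise ValueError("K must divide evenly among the execution modules")
    per_module = k // num_modules

    plan = {}
    for index in range(k):
        module, sub_module = divmod(index, per_module)
        plan[(module, sub_module)] = (index * n_bits + 1, (index + 1) * n_bits)
    return plan


plan = distribution_plan(m_bits=4096, n_bits=256, num_modules=2)
print(plan[(0, 0)], plan[(0, 7)], plan[(1, 0)], plan[(1, 7)])
```

In this example the first execution module receives the 1st to 2048th bits and the second receives the 2049th to 4096th bits, which matches the target bit ranges used in the indication signal examples.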
In one possible implementation, the method further includes: the interaction sub-module acquires an indication signal, wherein the indication signal is used for indicating a target bit number required to be processed by an execution module where the interaction sub-module is located, and second sub-data on the target bit number is determined in the second data; the interaction sub-module writes back the second sub-data to a register in an execution module where the interaction sub-module is located when the second data is a processing result of the first data; and transmitting the second sub-data to a first operation unit in an execution module where the interaction sub-module is located when the second data is not a processing result of the first data.
In one possible implementation, the interaction submodule includes a splicing unit, a second operation unit, an input port and an output port. The interaction sub-module obtains intermediate sub-data transmitted by the operation sub-module, transmits the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receives the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processes the obtained plurality of intermediate sub-data to obtain second data, wherein the method comprises the following steps: the input port receives intermediate sub-data transmitted by other interaction sub-modules in different execution modules; the splicing unit splices the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain intermediate data; the second operation unit performs operation on the intermediate data to obtain second data; the output port transmits the intermediate sub-data transmitted by the operation sub-module to other interaction sub-modules belonging to different execution modules.
In one possible implementation, the plurality of execution modules includes a first execution module and a second execution module; the interaction sub-module in the first execution module comprises a first input port and a first output port, the interaction sub-module in the second execution module comprises a second input port and a second output port, the first input port is connected with the second output port, and the second input port is connected with the first output port. The input port receives intermediate sub-data transmitted by other interaction sub-modules in different execution modules, and the input port comprises: the first input port receives intermediate sub-data transmitted by the interaction sub-module in the second execution module through connection between the first input port and the second output port. The output port transmits the intermediate sub-data transmitted by the operation sub-module to other interaction sub-modules belonging to different execution modules, and the method comprises the following steps: the first output port communicates the intermediate sub-data to the interaction sub-module in the second execution module via a connection between the first output port and the second input port.
In one possible implementation manner, the first execution module and the second execution module have the same structure, the first input port includes a first left input port and a first right input port, the first output port includes a first left output port and a first right output port, the first left output port is located below the first left input port, and the first right input port is located below the first right output port; the second input port comprises a second left input port and a second right input port, the second output port comprises a second left output port and a second right output port, the second left output port is positioned below the second left input port, and the second right input port is positioned below the second right output port; the first right input port is connected with the second left output port, and the second left input port is connected with the first right output port; the first input port receives intermediate sub-data transmitted by an interaction sub-module in the second execution module through connection between the first input port and the second output port, and the method comprises the following steps: the first right input port receives intermediate sub-data transmitted from the second left output port in the second execution module through connection between the first right input port and the second left output port. The first output port is connected with the second input port through the first output port, and transmits the intermediate sub-data to an interaction sub-module in the second execution module, and the interaction sub-module comprises: the first right output port communicates intermediate sub-data to a second left input port in the second execution module via a connection between the second left input port and the first right output port.
In one possible implementation, the interaction sub-module further comprises an OR operation unit. The method further comprises: the first left input port transmits an input preset signal into the OR operation unit, the value of the preset signal being 0; the first right input port transmits the intermediate sub-data received from the second left output port into the OR operation unit; and the OR operation unit performs an OR operation on the preset signal and the intermediate sub-data and transmits the result of the OR operation into the splicing unit.
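Because a bitwise OR with 0 leaves the other operand unchanged, the OR operation unit passes the received intermediate sub-data through unmodified when the other input carries the preset signal. The following minimal Python sketch illustrates this property; the names `PRESET_SIGNAL` and `or_unit` are hypothetical and the input value is arbitrary.

```python
PRESET_SIGNAL = 0  # value fed to the left input port, per the text


def or_unit(left_input: int, right_input: int) -> int:
    """Hypothetical model of the OR operation unit: bitwise OR of both inputs."""
    return left_input | right_input


received = 0b1011_0010  # intermediate sub-data arriving on the right input port
forwarded = or_unit(PRESET_SIGNAL, received)
```

Here `forwarded` equals `received`, which is what lets a single merging unit serve a module whose left input port has no neighbor attached.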
In one possible implementation, the interaction sub-module further comprises a copying unit. The method further comprises: the copying unit copies the intermediate sub-data transmitted by the operation sub-module to obtain two copies of the intermediate sub-data, transmits one copy to the first left output port, and transmits the other copy to the first right output port, where the first left output port is left unconnected (a null connection).
In one possible implementation, the method further includes: the splicing unit acquires an indication signal, where the indication signal indicates the target bit number that the execution module where the interaction sub-module is located needs to process. The splicing unit splicing the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain intermediate data comprises: the splicing unit splices, based on the target bit number, the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain the intermediate data, so that the intermediate sub-data transmitted by the operation sub-module is located at the target bit number of the intermediate data.
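The splicing step can be illustrated with a small Python sketch. It rests on several assumptions that are not stated in the text: there are exactly two execution modules, each contributes a sub-data of the same width, and the indication signal is modeled as a slot index (`own_slot`, 0 for the low bits and 1 for the high bits). The function name `splice` is hypothetical.

```python
def splice(own: int, other: int, own_slot: int, width: int) -> int:
    """Concatenate two `width`-bit values so that `own` lands in `own_slot`.

    own_slot == 0 places `own` in the low bits, own_slot == 1 in the high bits;
    `other` (received from the neighboring module) fills the remaining slot.
    """
    mask = (1 << width) - 1
    own &= mask
    other &= mask
    if own_slot == 0:
        return (other << width) | own
    if own_slot == 1:
        return (own << width) | other
    raise ValueError("own_slot must be 0 or 1 in this two-module sketch")


# The module responsible for the low byte holds 0x34 and receives 0x12.
low_view = splice(own=0x34, other=0x12, own_slot=0, width=8)
# The module responsible for the high byte holds 0x12 and receives 0x34.
high_view = splice(own=0x12, other=0x34, own_slot=1, width=8)
```

Both modules arrive at the same 16-bit intermediate data (`0x1234`) even though each starts from a different local value, because each places its own sub-data at the position its indication signal names.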
In one possible implementation, the second operation unit operating on the intermediate data to obtain second data comprises: the second operation unit, in response to a shift instruction for the intermediate data, determines a target number of shifts and performs a first shift on the intermediate data to obtain third data; the second operation unit acquires an indication signal, where the indication signal indicates the target bit number that the execution module where the interaction sub-module is located needs to process, and determines, in the third data, the third sub-data at the target bit number; and the second operation unit shifts the third sub-data until the current number of shifts reaches the target number of shifts, to obtain the second data.
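The first two steps of this procedure (shifting the spliced intermediate data, then selecting the bits this execution module is responsible for) can be sketched in Python as follows. The sketch deliberately does not model the subsequent shifts applied to the third sub-data, since the passage does not specify how they are carried out. The function `shift_and_select` and its parameters are hypothetical, and a left shift on 16-bit data split into two 8-bit halves is assumed purely for illustration.

```python
def shift_and_select(intermediate: int, total_width: int, shift: int,
                     target_lo: int, target_width: int) -> int:
    """Left-shift the spliced data, then keep only this module's target bits.

    target_lo is the index of the lowest target bit; target_width is the
    number of target bits (both stand in for the indication signal).
    """
    full_mask = (1 << total_width) - 1
    third_data = (intermediate << shift) & full_mask  # shifted full-width data
    return (third_data >> target_lo) & ((1 << target_width) - 1)


intermediate = 0x1234  # spliced from 0x12 (high module) and 0x34 (low module)
low_part = shift_and_select(intermediate, 16, 4, target_lo=0, target_width=8)
high_part = shift_and_select(intermediate, 16, 4, target_lo=8, target_width=8)
```

With a 4-bit left shift, the high module's result `0x23` contains a nibble that originated in the low module's sub-data, which shows why the modules exchange intermediate sub-data before shifting.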
In one possible implementation, the interaction sub-module is located in a central region of the execution module, and a plurality of operation sub-modules in the execution module are distributed around the interaction sub-module.
It should be noted that the data processing method provided in the embodiments of the present application and the processor provided in the foregoing embodiments belong to the same inventive concept. For the specific implementation of the data processing method, refer to the foregoing processor embodiments; details are not repeated here.
The present application also provides a computer device including a processor and a memory, where at least one computer program is stored in the memory, where the at least one computer program is loaded and executed by the processor to implement the operations performed in the processor of the above embodiments.
Optionally, the computer device is provided as a terminal. Fig. 11 shows a schematic structural diagram of a terminal 1100 according to an exemplary embodiment of the present application. The terminal 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1102 is used to store at least one computer program, which is executed by the processor 1101 to implement the various embodiments described above.
In some embodiments, the terminal 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Optionally, the peripheral device comprises: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, and a power supply 1108.
The peripheral interface 1103 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it is also able to collect touch signals on or above its surface. The touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, disposed on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 1100. The display screen 1105 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. The front camera is disposed on the front panel of the terminal 1100, and the rear camera is disposed on the rear surface of the terminal 1100. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing, or inputting the electric signals to the radio frequency circuit 1104 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal 1100, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.
A power supply 1108 is used to power the various components in terminal 1100. The power supply 1108 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 1108 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the structure shown in fig. 11 is not limiting and that terminal 1100 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Optionally, the computer device is provided as a server. Fig. 12 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 1200 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPU) 1201 and one or more memories 1202, where at least one computer program is stored in the memories 1202 and is loaded and executed by the processors 1201 to implement the foregoing embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, as well as other components for implementing the functions of the device, which are not described herein.
The present application also provides a computer readable storage medium having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to implement the operations performed by the processor of the above embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program loaded and executed by a processor to implement the operations performed by the processor of the embodiments described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is merely an optional embodiment of the present application and is not intended to limit the embodiments of the present application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the embodiments of the present application shall fall within the scope of protection of the present application.

Claims (20)

1. A processor, wherein the processor comprises a sub-processor, the sub-processor comprises a memory and a plurality of execution modules connected with each other, the execution modules comprise a plurality of operation sub-modules and an interaction sub-module, and the interaction sub-module is connected with the plurality of operation sub-modules;
the memory is used for dividing the first data into K first sub-data, respectively transmitting the K first sub-data into the operation sub-modules, wherein K is the total number of the operation sub-modules in the sub-processor, and K is an integer greater than 1;
the operation sub-module is used for operating the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmitting the intermediate sub-data to the interaction sub-module belonging to the same execution module;
the interaction sub-module is used for acquiring the intermediate sub-data transmitted by the operation sub-module, transmitting the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receiving the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processing the obtained plurality of intermediate sub-data to obtain second data.
2. The processor of claim 1, wherein the operation sub-module comprises a first operation unit and a register, the register being connected to the first operation unit and the memory being connected to the register;
the memory is used for respectively loading each first sub data in the K first sub data to a register corresponding to the first sub data;
the register is used for caching the first sub data;
the first operation unit is configured to read the first sub-data from the register, and operate on the first sub-data to obtain the intermediate sub-data.
3. The processor of claim 2, wherein the first data is M bits in length, the first sub-data is N bits in length, K is equal to a ratio of M to N, M and N are integers greater than 1, and N is less than M;
the memory is used for respectively loading a plurality of first sub-data with continuous digits into registers of a plurality of operation sub-modules in the same execution module, and the number of the plurality of first sub-data is equal to the number of the plurality of operation sub-modules in the execution module.
4. The processor of claim 2, wherein the interaction sub-module is further configured to obtain an indication signal, where the indication signal is used to indicate a target bit number that the execution module where the interaction sub-module is located needs to process, and to determine, in the second data, second sub-data at the target bit number;
the interaction sub-module is further configured to write back the second sub-data to a register in the execution module where the interaction sub-module is located in the case where the second data is a processing result of the first data; and, in the case where the second data is not the processing result of the first data, to transmit the second sub-data to a first operation unit in the execution module where the interaction sub-module is located.
5. The processor of claim 1, wherein the interaction sub-module comprises a splicing unit, a second operation unit, an input port, and an output port;
the input port is used for receiving intermediate sub-data transmitted by other interaction sub-modules in different execution modules;
the splicing unit is used for splicing the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain intermediate data;
the second operation unit is used for performing operation on the intermediate data to obtain second data;
and the output port is used for transmitting the intermediate sub-data transmitted by the operation sub-module to other interaction sub-modules belonging to different execution modules.
6. The processor of claim 5, wherein the plurality of execution modules comprises a first execution module and a second execution module; the interaction sub-module in the first execution module comprises a first input port and a first output port, the interaction sub-module in the second execution module comprises a second input port and a second output port, the first input port is connected with the second output port, and the second input port is connected with the first output port;
the first input port is configured to receive, through the connection between the first input port and the second output port, the intermediate sub-data transmitted by the interaction sub-module in the second execution module;
the first output port is configured to transmit the intermediate sub-data to an interaction sub-module in the second execution module through a connection between the first output port and the second input port.
7. The processor of claim 6, wherein the first execution module and the second execution module are identical in structure, the first input port comprising a first left input port and a first right input port, the first output port comprising a first left output port and a first right output port, the first left output port being located below the first left input port, the first right input port being located below the first right output port; the second input port comprises a second left input port and a second right input port, the second output port comprises a second left output port and a second right output port, the second left output port is positioned below the second left input port, and the second right input port is positioned below the second right output port; the first right input port is connected with the second left output port, and the second left input port is connected with the first right output port;
the first right input port is configured to receive, through the connection between the first right input port and the second left output port, the intermediate sub-data transmitted from the second left output port in the second execution module;
the first right output port is configured to transfer the intermediate sub-data into a second left input port in the second execution module through a connection between the second left input port and the first right output port.
8. The processor of claim 7, wherein the interaction sub-module further comprises an OR operation unit;
the first left input port is used for transmitting an input preset signal into the OR operation unit, and the value of the preset signal is 0;
the first right input port is used for transmitting the intermediate sub-data received from the second left output port into the OR operation unit;
and the OR operation unit is used for performing an OR operation on the preset signal and the intermediate sub-data, and transmitting the result of the OR operation into the splicing unit.
9. The processor of claim 7, wherein the interaction sub-module further comprises a copying unit;
the copying unit is used for copying the intermediate sub-data transmitted by the operation sub-module to obtain two intermediate sub-data, transmitting one intermediate sub-data into the first left output port, and transmitting the other intermediate sub-data into the first right output port; wherein the first left output port is a null connection.
10. The processor of claim 5, wherein the splicing unit is further configured to obtain an indication signal, where the indication signal is used to indicate a target bit number that the execution module where the interaction sub-module is located needs to process;
the splicing unit is used for splicing the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules based on the target bit number to obtain the intermediate data, so that the intermediate sub-data transmitted by the operation sub-module is positioned in the target bit number of the intermediate data.
11. The processor according to claim 5, wherein the second operation unit is configured to determine a target number of shifts in response to a shift instruction for the intermediate data, and to perform a first shift on the intermediate data to obtain third data;
the second operation unit is configured to obtain an indication signal, where the indication signal is used to indicate a target bit number that the execution module where the interaction sub-module is located needs to process, and to determine, in the third data, third sub-data at the target bit number;
and the second operation unit is configured to shift the third sub-data until the current number of shifts reaches the target number of shifts, so as to obtain the second data.
12. The processor of claim 1, wherein the interaction sub-module is located in a central region of the execution module, and wherein a plurality of operation sub-modules in the execution module are distributed around the interaction sub-module.
13. A data processing method, characterized in that the method is executed by a processor, the processor comprises a sub-processor, the sub-processor comprises a memory and a plurality of execution modules connected with each other, the execution modules comprise a plurality of operation sub-modules and an interaction sub-module, and the interaction sub-module is connected with the plurality of operation sub-modules; the method comprises the following steps:
the memory divides the first data into K first sub-data, the K first sub-data are respectively transmitted into the operation sub-modules, K is the total number of the operation sub-modules in the sub-processor, and K is an integer greater than 1;
the operation sub-module performs operation on the first sub-data transmitted from the memory to obtain intermediate sub-data, and transmits the intermediate sub-data to the interaction sub-module belonging to the same execution module;
the interaction sub-module acquires intermediate sub-data transmitted by the operation sub-module, transmits the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receives the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processes the acquired plurality of intermediate sub-data to acquire second data.
14. The method of claim 13, wherein the operation sub-module comprises a first operation unit and a register, the register being connected to the first operation unit and the memory being connected to the register;
the memory dividing the first data into K first sub-data and respectively transmitting the K first sub-data into the operation sub-modules comprises:
the memory loads each first sub data in the K first sub data to a register corresponding to the first sub data respectively;
the operation sub-module performing operation on the first sub-data transmitted from the memory to obtain intermediate sub-data comprises:
the register caches the first sub-data;
the first operation unit reads the first sub-data from the register, and performs operation on the first sub-data to obtain the intermediate sub-data.
15. The method of claim 14, wherein the first data has a length of M bits, the first sub data has a length of N bits, K is equal to a ratio of M to N, M and N are integers greater than 1, and N is less than M;
the memory respectively loading each first sub data in the K first sub data to a register corresponding to the first sub data comprises:
the memory loads a plurality of first sub-data with continuous bit numbers into registers of a plurality of operation sub-modules in the same execution module respectively, and the number of the plurality of first sub-data is equal to the number of the plurality of operation sub-modules in the execution module.
16. The method of claim 14, wherein the method further comprises:
the interaction sub-module acquires an indication signal, wherein the indication signal is used for indicating a target bit number required to be processed by an execution module where the interaction sub-module is located, and second sub-data on the target bit number is determined in the second data;
the interaction sub-module writes back the second sub-data to a register in an execution module where the interaction sub-module is located when the second data is a processing result of the first data; and under the condition that the second data is not the processing result of the first data, transmitting the second sub-data to a first operation unit in an execution module where the interaction sub-module is located.
17. The method of claim 13, wherein the interaction sub-module comprises a splicing unit, a second operation unit, an input port, and an output port; the interaction sub-module acquiring the intermediate sub-data transmitted by the operation sub-module, transmitting the intermediate sub-data to other interaction sub-modules belonging to different execution modules, receiving the intermediate sub-data transmitted by the other interaction sub-modules belonging to different execution modules, and processing the obtained plurality of intermediate sub-data to obtain second data comprises:
the input port receives intermediate sub-data transmitted by other interaction sub-modules in different execution modules;
the splicing unit splices the intermediate sub-data transmitted by the operation sub-module and the intermediate sub-data transmitted by other interaction sub-modules belonging to different execution modules to obtain intermediate data;
the second operation unit performs operation on the intermediate data to obtain second data;
and the output port transmits the intermediate sub-data transmitted by the operation sub-module to other interaction sub-modules belonging to different execution modules.
18. The method of claim 17, wherein the plurality of execution modules includes a first execution module and a second execution module; the interaction sub-module in the first execution module comprises a first input port and a first output port, the interaction sub-module in the second execution module comprises a second input port and a second output port, the first input port is connected with the second output port, and the second input port is connected with the first output port;
the input port receiving intermediate sub-data transmitted by other interaction sub-modules in different execution modules comprises:
the first input port receives, through the connection between the first input port and the second output port, the intermediate sub-data transmitted by the interaction sub-module in the second execution module;
the output port transmitting the intermediate sub-data transmitted by the operation sub-module to other interaction sub-modules belonging to different execution modules comprises:
the first output port transmits the intermediate sub-data to the interaction sub-module in the second execution module through the connection between the first output port and the second input port.
19. The method of claim 18, wherein the first execution module and the second execution module are identical in structure, the first input port comprising a first left input port and a first right input port, the first output port comprising a first left output port and a first right output port, the first left output port being located below the first left input port, the first right input port being located below the first right output port; the second input port comprises a second left input port and a second right input port, the second output port comprises a second left output port and a second right output port, the second left output port is positioned below the second left input port, and the second right input port is positioned below the second right output port; the first right input port is connected with the second left output port, and the second left input port is connected with the first right output port;
the first input port receiving, through the connection between the first input port and the second output port, the intermediate sub-data transmitted by the interaction sub-module in the second execution module comprises:
the first right input port receives, through the connection between the first right input port and the second left output port, the intermediate sub-data transmitted from the second left output port in the second execution module;
the first output port transmitting the intermediate sub-data to the interaction sub-module in the second execution module through the connection between the first output port and the second input port comprises:
the first right output port transmits the intermediate sub-data to the second left input port in the second execution module through the connection between the second left input port and the first right output port.
20. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one computer program that is loaded and executed by the processor to implement the operations performed by the processor of any of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311315788.0A CN117349222A (en) 2023-10-11 2023-10-11 Processor, data processing method and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311315788.0A CN117349222A (en) 2023-10-11 2023-10-11 Processor, data processing method and computer equipment

Publications (1)

Publication Number Publication Date
CN117349222A true CN117349222A (en) 2024-01-05

Family

ID=89367554

Country Status (1)

Country Link
CN (1) CN117349222A (en)

Similar Documents

Publication Publication Date Title
CN116842307B (en) Data processing method, device, equipment, chip and storage medium
CN110147347B (en) Chip for matrix processing, matrix processing method, device and storage medium
CN109120862A (en) High-dynamic-range image acquisition method, device and mobile terminal
CN110673944B (en) Method and device for executing task
CN111045732B (en) Data processing method, chip, device and storage medium
CN112560435B (en) Text corpus processing method, device, equipment and storage medium
CN112632918A (en) Document editing method, device, terminal and storage medium
CN111193604B (en) Deployment method, device, equipment and storage medium of virtual network function chain
CN117215990A (en) Inter-core communication method and device of multi-core chip and multi-core chip
CN117349222A (en) Processor, data processing method and computer equipment
CN111626035A (en) Layout analysis method and electronic equipment
CN115964331A (en) Data access method, device and equipment
CN110969217B (en) Method and device for image processing based on convolutional neural network
CN116909626B (en) Data processing method, processor and computer equipment
CN116881194B (en) Processor, data processing method and computer equipment
CN114417773A (en) Chip layout method and device, electronic equipment and readable storage medium
CN116820524B (en) Model updating method, device, computer equipment and storage medium
CN112231619A (en) Conversion method, conversion device, electronic equipment and storage medium
CN113282242B (en) Distributed storage method, device, equipment and computer readable storage medium
CN116935824B (en) Audio data filtering method, device, equipment and storage medium
CN117667208B (en) Data operation method, memory and computer equipment
CN115391524B (en) Sensitive word detection method and device, computer equipment, storage medium and product
CN113641611B (en) I2C interface circuit, control method thereof and electronic equipment
CN116501227B (en) Picture display method and device, electronic equipment and storage medium
CN116980277B (en) Data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication