WO2023151460A1 - Data processing method, apparatus, chip and medium - Google Patents

Data processing method, apparatus, chip and medium

Info

Publication number
WO2023151460A1
WO2023151460A1 PCT/CN2023/072631 CN2023072631W
Authority
WO
WIPO (PCT)
Prior art keywords
computing
cores
core
target
data processing
Prior art date
Application number
PCT/CN2023/072631
Other languages
English (en)
French (fr)
Inventor
沈杨书
何伟
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2023151460A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/545 Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This description relates to the technical field of data processing, and in particular to a data processing method, device, chip and medium.
  • When data is processed through multiple computing cores, the data processing of these cores is globally synchronized; that is, the subsequent data processing can continue only after all of the computing cores have completed their data processing.
  • If even one of the multiple computing cores has not completed its data processing, the subsequent data processing cannot proceed, which makes data processing less efficient.
  • this specification provides a data processing method, device, terminal and medium.
  • A data processing method is provided, which is applied to a many-core chip. The many-core chip includes a plurality of computing cores, and a target circuit is arranged between two adjacent computing cores on the many-core chip. The method includes:
  • Computing cores in the first target area are controlled to perform data processing synchronously.
  • While controlling the computing cores in the first target area to perform data processing synchronously, the method further includes:
  • The computing cores in the second target area partially overlap or do not overlap with the computing cores in the first target area.
  • Determining, according to the obtained working status of each computing core, that all computing cores in the first target area are in an idle state includes:
  • In the case that any two connected computing cores are both in an idle state, determining the two connected computing cores as a first sub-area core;
  • Determining the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
  • the target circuit is an OR circuit.
  • the OR circuit is an OR circuit including a switch.
  • When a computing core is in an idle state, it sends first indication information to the target circuit, and when the computing core is in a non-idle state, it sends second indication information to the target circuit;
  • the first indication information is used to indicate that the computing core is in an idle state
  • the second indication information is used to indicate that the computing core is in a non-idle state.
  • the method also includes any of the following:
  • If the indication information sent by both computing cores is the first indication information, it is determined that the sub-area core formed by the two computing cores is in an idle state;
  • If the indication information sent by either computing core is the second indication information, it is determined that the sub-area core formed by the two computing cores is in a non-idle state.
  • A data processing device is provided, which is applied to a many-core chip, where a target circuit is arranged between two adjacent computing cores on the many-core chip. The device includes:
  • An acquisition module, configured to acquire the working status of each computing core based on the target circuits set between the computing cores;
  • A determining module, configured to determine, according to the acquired working status of each computing core, that all the computing cores in the first target area are in an idle state;
  • A control module, configured to control the computing cores in the first target area to perform data processing synchronously.
  • The control module, when controlling the computing cores in the first target area to perform data processing synchronously, is further configured to:
  • The computing cores in the second target area partially overlap or do not overlap with the computing cores in the first target area.
  • the acquiring module is specifically configured to:
  • In the case that any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core;
  • Determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
  • the target circuit is an OR circuit.
  • the OR circuit is an OR circuit including a switch.
  • When a computing core is in an idle state, it sends first indication information to the target circuit, and when the computing core is in a non-idle state, it sends second indication information to the target circuit;
  • the first indication information is used to indicate that the computing core is in an idle state
  • the second indication information is used to indicate that the computing core is in a non-idle state.
  • the determination module is also used for any of the following:
  • If the indication information sent by both computing cores is the first indication information, it is determined that the sub-area core formed by the two computing cores is in an idle state;
  • If the indication information sent by either computing core is the second indication information, it is determined that the sub-area core formed by the two computing cores is in a non-idle state.
  • A many-core chip is provided, which includes a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the computer program, implements the operations performed by the above data processing method.
  • a computer-readable storage medium is provided.
  • a program is stored on the computer-readable storage medium, and the program is used by a processor to execute the operations performed by the above data processing method.
  • a computer program product including a computer program, and when the computer program is executed by a processor, operations performed by the above data processing method are implemented.
  • The working status of each computing core can be obtained through the target circuits set between the computing cores. When, according to the working status of each computing core, the computing cores in the first target area are all in an idle state, the computing cores in the first target area are controlled to perform data processing synchronously, so as to realize local synchronization of the computing cores in the many-core chip.
  • This makes the processing process more flexible and data processing more efficient.
  • Fig. 1 is a flow chart of a data processing method shown in this specification according to an exemplary embodiment.
  • Fig. 2 is a schematic diagram showing a connection relationship of a computing core according to an exemplary embodiment in this specification.
  • Fig. 3 is a schematic diagram of a target circuit to which computing cores at various positions are connected according to an exemplary embodiment of this specification.
  • Fig. 4 is a schematic diagram of computing cores included in a many-core chip shown in this specification according to an exemplary embodiment.
  • Fig. 5 is a schematic diagram showing division of target areas according to an exemplary embodiment in this specification.
  • Fig. 6 is a schematic diagram of a data processing process shown in this specification according to an exemplary embodiment.
  • Fig. 7 is a block diagram of a data processing device shown in this specification according to an exemplary embodiment.
  • Although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination.”
  • the present application provides a data processing method, which can be applied to many-core chips, so that the many-core chips can process data to be processed through the data processing method provided by the present application.
  • the many-core chip is a chip including a plurality of computing cores, so that the many-core chip can use the multiple computing cores to realize the processing of the data to be processed.
  • the many-core chip is a computing chip, or the many-core chip is a perception chip, etc.
  • the present application does not limit the chip type of the many-core chip.
  • The many-core chip can be applied to various types of computer equipment. For example, the many-core chip can be applied to a server, or to a terminal such as a desktop computer, a portable computer, a tablet computer, a smartphone, or a smart watch, which is not limited in this application.
  • Figure 1 is a flow chart of a data processing method shown in this specification according to an exemplary embodiment. The method is applied to a many-core chip that includes a plurality of computing cores, with a target circuit arranged between two adjacent computing cores on the many-core chip, and the method includes the following steps:
  • Step 101: based on the target circuits set between the computing cores, obtain the working status of each computing core.
  • A computing core is used to execute the computing process corresponding to the neurons of the neural network mapped onto the many-core chip. One computing core can process one or more neurons, or multiple computing cores can jointly process one neuron, so as to process the data to be processed.
  • If a computing core is executing the computing process of the neurons mapped to it, that is, processing the data to be processed, the computing core is in a non-idle state (or working state); if the computing core is not executing a computing process at this time, that is, not processing data to be processed, the computing core is in an idle state.
  • Step 102: according to the acquired working status of each computing core, determine that all computing cores in the first target area are in an idle state.
  • Step 103: control the computing cores in the first target area to perform data processing synchronously.
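  • Steps 101 to 103 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `try_start_area`, the `status` dictionary and the `start` callback are hypothetical, and the 0/1 encoding of idle and non-idle is an assumption borrowed from the indication values described in this specification.

```python
from typing import Callable, Dict, Iterable

IDLE, BUSY = 0, 1  # assumed encoding: 0 = idle, 1 = non-idle


def try_start_area(area: Iterable[int],
                   status: Dict[int, int],
                   start: Callable[[int], None]) -> bool:
    """Start every core in `area` together, but only if all of them are idle.

    `status` stands in for the working statuses obtained in step 101.
    Returns True when the area was started, False otherwise.
    """
    cores = list(area)
    # Step 102: every core of the first target area must be idle.
    if any(status[core] != IDLE for core in cores):
        return False
    # Step 103: launch the cores of the area as one synchronized group.
    for core in cores:
        start(core)
        status[core] = BUSY
    return True


status = {1: IDLE, 2: IDLE, 3: BUSY}
started = []
print(try_start_area([2, 3], status, started.append))  # core 3 is busy
print(try_start_area([1, 2], status, started.append))  # both idle
```

  • Only the cores of the requested area are inspected, so a busy core outside the area (core 3 in the second call) does not block the start. This is the local, rather than global, synchronization that the method aims at.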
  • the computing cores in the first target area may implement multiple types of processing operations to implement the data processing process.
  • the processing operation may be a convolution operation, a mapping operation, etc., and the present application does not limit the specific type of the processing operation.
  • types of processing operations performed by computing cores in the first target area may be the same or different, which is not limited in the present application.
  • In the data processing method provided by this application, a target circuit is set between two adjacent computing cores, so that the working status of each computing core can be obtained through the target circuits set between the computing cores. When, according to the working status of each computing core, the computing cores in the first target area are all in an idle state, the computing cores in the first target area are controlled to perform data processing synchronously, so as to realize local synchronization of the computing cores in the many-core chip.
  • This makes the processing process more flexible and data processing more efficient.
  • In addition, the data processing method provided by this application does not require routing communication within the many-core chip. Instead, it uses the dedicated target circuits directly to synchronize local computing cores on the many-core chip, which is more direct and efficient than synchronization through routing.
  • the above embodiment shown in Figure 1 is based on setting a target circuit between two adjacent computing cores as an example.
  • The target circuit can also be set according to a set interval, that is, one target circuit can be set every set number of computing cores; for example, one target circuit is set every 2 computing cores.
  • When obtaining the working status of each computing core based on the target circuits set between the computing cores, the many-core chip may, after receiving the data to be processed, send a control instruction to each computing core, so that each computing core sends indication information to its corresponding target circuit based on the received control instruction. The indication information indicates the working state of the computing core, so that the target circuit, after receiving the indication information sent by the two computing cores connected to it, can determine the working states of the two computing cores based on that indication information.
  • each computing core may also actively send indication information to its corresponding target circuit.
  • each computing core may also send indication information for indicating its own working state to its corresponding target circuit every preset period of time.
  • The indication information may include first indication information and second indication information. The first indication information may be used to indicate that the computing core is in an idle state, and the second indication information may be used to indicate that the computing core is in a non-idle state.
  • An OR circuit can be used as the target circuit. For the connection relationship between two computing cores, refer to Figure 2, which is a schematic diagram of a connection relationship of computing cores shown in this specification according to an exemplary embodiment. The two computing cores are connected by an OR circuit, and each of them can send indication information indicating its working state to the OR circuit, so that the OR circuit can determine the working status of the two computing cores based on the received indication information.
  • FIG. 3 is a schematic diagram of a target circuit connected to computing cores at various positions shown in this specification according to an exemplary embodiment.
  • For a computing core in the middle of the array, four OR circuits can be connected to the computing core; for a computing core at the edge of the array, three OR circuits can be connected; for a computing core at a vertex of the array, two OR circuits can be connected.
  • FIG. 4 is a schematic diagram of computing cores included in a many-core chip shown in this specification according to an exemplary embodiment.
  • In FIG. 4, the 9 computing cores form a 3*3 computing core array. Computing core 405 is the computing core in the middle of the array and is connected to four target circuits, namely target circuit 3, target circuit 4, target circuit 8 and target circuit 11;
  • Computing core 402, computing core 404, computing core 406 and computing core 408 are computing cores at the edge of the array, and each is connected to three target circuits. The three target circuits connected to computing core 402 are target circuit 1, target circuit 2 and target circuit 8; the three target circuits connected to computing core 404 are target circuit 3, target circuit 7 and target circuit 10;
  • the three target circuits connected to computing core 406 are target circuit 4, target circuit 9 and target circuit 12; and the three target circuits connected to computing core 408 are target circuit 5, target circuit 6 and target circuit 11;
  • Computing core 401, computing core 403, computing core 407 and computing core 409 are computing cores at the vertex positions of the array, and each is connected to two target circuits. The two target circuits connected to computing core 401 are target circuit 1 and target circuit 7; the two target circuits connected to computing core 403 are target circuit 2 and target circuit 9; the two target circuits connected to computing core 407 are target circuit 5 and target circuit 10; and the two target circuits connected to computing core 409 are target circuit 6 and target circuit 12.
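  • The FIG. 4 wiring can be written down as a small lookup table, shown here as an illustrative Python sketch. The pairing of each target circuit with two computing cores is inferred from which cores list the same circuit number in the description; the names `CIRCUITS` and `circuits_of` are hypothetical.

```python
# Target circuit number -> the two computing cores it sits between,
# inferred from the per-core circuit lists given for FIG. 4.
CIRCUITS = {
    1: (401, 402), 2: (402, 403),                   # top row
    3: (404, 405), 4: (405, 406),                   # middle row
    5: (407, 408), 6: (408, 409),                   # bottom row
    7: (401, 404), 8: (402, 405), 9: (403, 406),    # top row to middle row
    10: (404, 407), 11: (405, 408), 12: (406, 409), # middle row to bottom row
}


def circuits_of(core: int) -> list:
    """Return the sorted target circuit numbers attached to `core`."""
    return sorted(n for n, pair in CIRCUITS.items() if core in pair)


for core in (405, 404, 401):
    print(core, circuits_of(core))
```

  • With this table the middle core 405 has four circuits, each edge core has three, and each vertex core has two, which matches the counts stated for the 3*3 array.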
  • the working status of the sub-area cores composed of every two computing cores can be determined through the logic of the OR operation, so as to determine whether each computing core in the first target area is in the idle state. That is, for the above step 102, when it is determined that all the computing cores in the first target area are in an idle state according to the obtained working status of each computing core, the following steps may be included:
  • Step 1021: through the target circuit, obtain the working status of the two computing cores connected to the target circuit.
  • For either of the two computing cores connected to the target circuit, if the computing core is in an idle state, it sends the first indication information to the target circuit, and if the computing core is in a non-idle state, it sends the second indication information to the target circuit.
  • 0 can be used as the first indication information
  • 1 can be used as the second indication information.
  • When a computing core is in an idle state, it can send 0 to its corresponding target circuit, so that the target circuit can determine that the computing core is in an idle state based on the received 0; when the computing core is in a non-idle state (that is, a working state), it can send 1 to its corresponding target circuit, so that the target circuit can determine that the computing core is in a non-idle state based on the received 1.
  • Step 1022: in the case that any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core.
  • If the indication information sent by both of two connected computing cores is the first indication information, it is determined that the sub-area core formed by the two computing cores is in an idle state;
  • If the indication information sent by either computing core is the second indication information, it is determined that the sub-area core formed by the two computing cores is in a non-idle state.
  • The case in which the indication information sent by either computing core is the second indication information may include the following two situations:
  • the indication information sent by one computing core is the first indication information, and the indication information sent by the other computing core is the second indication information;
  • the indication information sent by the two computing cores is the second indication information.
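  • The OR logic described above can be checked with a short Python sketch. It assumes the 0/1 encoding given in this specification (0 as the first indication information, 1 as the second); the function names `or_circuit` and `sub_area_is_idle` are hypothetical and only model the behaviour in software.

```python
FIRST_INDICATION = 0   # computing core is idle
SECOND_INDICATION = 1  # computing core is non-idle


def or_circuit(a: int, b: int) -> int:
    """Model the target circuit as a two-input OR of the indication bits."""
    return a | b


def sub_area_is_idle(a: int, b: int) -> bool:
    """A sub-area core (two connected cores) is idle only if the OR output is 0."""
    return or_circuit(a, b) == FIRST_INDICATION


# Enumerate all four input combinations.
for a in (FIRST_INDICATION, SECOND_INDICATION):
    for b in (FIRST_INDICATION, SECOND_INDICATION):
        print(a, b, "idle" if sub_area_is_idle(a, b) else "non-idle")
```

  • Only the 0/0 combination yields an idle sub-area core. The two mixed combinations and the 1/1 combination correspond to the two non-idle situations listed above.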
  • the target circuit may be an OR circuit, for example, an OR circuit including a switch, and optionally, an OR circuit including an OR gate.
  • the specific implementation manner of the OR circuit is not limited in this application.
  • Step 1023: determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
  • If two sub-area cores include the same computing core, that is, if one of the two computing cores in each of the two sub-area cores is the same core, the two sub-area cores can be determined to be connected sub-area cores.
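  • Steps 1021 to 1023 amount to collecting idle pairs and merging the pairs that share a core. The following Python sketch shows one way to do this with a breadth-first search; it is an illustration under assumptions, not the patented circuit. The function name `connected_idle_cores`, the `circuits` pair list and the `status` dictionary (0 = idle, 1 = non-idle) are hypothetical.

```python
from collections import deque


def connected_idle_cores(start, circuits, status):
    """Collect the cores reachable from `start` through idle sub-area cores.

    circuits: iterable of (core_a, core_b) pairs, one per target circuit.
    status:   dict core -> 0 (idle) or 1 (non-idle).
    Returns an empty set if `start` does not belong to any idle pair.
    """
    # Steps 1021/1022: a pair is a first sub-area core when its OR output is 0.
    neighbours = {}
    for a, b in circuits:
        if (status[a] | status[b]) == 0:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    if start not in neighbours:
        return set()
    # Step 1023: sub-area cores sharing a core are connected, so walk them.
    seen = {start}
    queue = deque([start])
    while queue:
        core = queue.popleft()
        for nxt in neighbours[core]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# 2*2 example: cores 1 2 / 3 4, with core 3 busy.
circuits = [(1, 2), (3, 4), (1, 3), (2, 4)]
status = {1: 0, 2: 0, 3: 1, 4: 0}
print(sorted(connected_idle_cores(1, circuits, status)))
```

  • In the example the idle pairs are (1, 2) and (2, 4). They share core 2, so cores 1, 2 and 4 form one candidate area, while busy core 3 is excluded.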
  • the first target area may be a preset area, and may also be an area dynamically determined according to restriction conditions.
  • The capability of each computing core to process data is limited; in other words, the amount of data that each computing core can process at one time has an upper limit. Therefore, the target number of computing cores for processing the data to be processed can be determined based on the upper limit of the amount of data each computing core can process at one time.
  • The upper limit of the amount of data that each computing core can process at one time may be the same for every computing core, or may differ between computing cores.
  • Accordingly, the above process of determining the first target area according to the restriction conditions can be implemented in the following two ways:
  • In the case that the upper limit of the amount of data that each computing core can process at one time is the same, the data volume of the data to be processed can be divided by that upper limit, and the resulting value is used as the target number of computing cores for processing the data to be processed. Multiple connected first sub-area cores are then determined from the computing cores in the idle state, such that the number of computing cores they include meets the target number.
  • In the case that the upper limit of the amount of data that each computing core can process at one time differs, the upper limits of the connected computing cores can be accumulated until the accumulated value is greater than or equal to the amount of data to be processed. The number of computing cores that contributed to the accumulated value is the target number, which yields a first target area made up of at least one connected first sub-area core whose number of computing cores meets the target number.
  • For example, starting from the amount of data the first computing core can process at one time, the amount of data that a second computing core connected to the first computing core can process at one time is accumulated, and so on, sequentially accumulating the amounts of data that connected computing cores can process at one time until the accumulated amount reaches the amount of data to be processed.
  • In this way, the computing cores determined to meet the data volume requirement also satisfy the connection relationship, yielding a first target area made up of at least one connected first sub-area core whose number of computing cores meets the target number.
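  • Both ways of determining the target number can be sketched in Python as follows. The function names `target_number_equal` and `select_cores` are hypothetical. Rounding the quotient up to a whole number of cores is an assumption, since the specification only says the data volume is divided by the per-core upper limit, and `order` is assumed to already list idle cores in an order in which each core is connected to an earlier one.

```python
def target_number_equal(data_amount: int, per_core_limit: int) -> int:
    """Equal limits: data volume divided by the per-core limit.

    Rounding up is an assumption so that the chosen cores can hold all data.
    """
    return (data_amount + per_core_limit - 1) // per_core_limit


def select_cores(order, limits, data_amount):
    """Differing limits: accumulate per-core limits along `order`.

    order:  connected idle cores, in visiting order (assumed precomputed).
    limits: dict core -> amount of data the core can process at one time.
    Returns the chosen cores, or None if the cores cannot hold the data.
    """
    chosen, total = [], 0
    for core in order:
        if total >= data_amount:
            break
        chosen.append(core)
        total += limits[core]
    return chosen if total >= data_amount else None


print(target_number_equal(10, 4))
print(select_cores([1, 2, 3, 4], {1: 3, 2: 5, 3: 2, 4: 4}, 9))
```

  • In the second call the accumulated limits are 3, then 8, then 10, so accumulation stops after the third core and the target number is 3.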
  • The division of the first target area may be implemented by OR circuits that include switches. For example, after determining the first sub-area cores used to form the first target area, a control instruction can be sent to the corresponding target circuit to control the closing or opening of the switch contained in the target circuit, so as to control whether the target circuit is available or unavailable.
  • Fig. 5 is a schematic diagram of the division of a target area shown in this specification according to an exemplary embodiment.
  • The many-core chip shown in Fig. 5 includes 36 computing cores, and these 36 computing cores are divided into 4 target areas, which can respectively process 4 groups of data to be processed.
  • the circles in Figure 5 represent the calculation cores, and the circles with different shadings represent the calculation cores in different target areas.
  • The computing cores corresponding to circles with different shadings form different target areas. The switches in target circuits 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516 and 517 are in the open state, and the switches in the other target circuits in the figure are in the closed state, so as to realize the division of these four target areas.
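  • The effect of open and closed switches on area division can be modelled in Python with a union-find structure. This is a sketch under the assumption that a closed switch keeps its two computing cores in the same target area and an open switch separates them; the function name `target_areas` and the small 2*2 example (not the 36-core layout of Fig. 5) are hypothetical.

```python
def target_areas(cores, circuits, open_switches):
    """Group cores into target areas joined by circuits whose switch is closed.

    cores:         iterable of core ids.
    circuits:      dict circuit_id -> (core_a, core_b).
    open_switches: set of circuit ids whose switch is open (area boundary).
    """
    parent = {c: c for c in cores}

    def find(c):
        # Follow parents to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for cid, (a, b) in circuits.items():
        if cid not in open_switches:      # closed switch: same target area
            parent[find(a)] = find(b)

    groups = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return sorted(groups.values(), key=min)


# 2*2 example: cores 1 2 / 3 4; opening the two vertical circuits
# splits the array into a top area and a bottom area.
circuits = {10: (1, 2), 11: (3, 4), 12: (1, 3), 13: (2, 4)}
print(target_areas([1, 2, 3, 4], circuits, open_switches={12, 13}))
```

  • Changing only the set of open switches changes the partition, which mirrors how the target areas in the figure are defined by the switches that are in the open state.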
  • The above process is illustrated by taking the division of the target areas by controlling the opening or closing of the switches as an example.
  • The division of the target areas can also be performed in other ways. For example, the switch of each target circuit in the many-core chip may be in a closed state, and the many-core chip sends control information to each target circuit, so that the target circuits whose switches are closed divide the target areas based on the received control information.
  • the control information is used to indicate whether the two computing cores connected to the target circuit can be used as computing cores in one target area.
  • The grouping of computing cores is realized through switches, so that some of the computing cores in the many-core chip can be combined into the first target area and perform data processing synchronously without the need for routing or master control. This reduces the processing pressure on the many-core chip and thereby increases its processing speed.
  • the data to be processed can be various types of data, for example, the data to be processed can be image data, text data, behavior data, etc., specifically, the data to be processed can be image features, text, user behavior data (such as click rate, etc.), this application does not limit the specific type of data to be processed.
  • the first target area is a pre-set area
  • the many-core chip may also receive other data to be processed.
  • Computing cores other than those included in the first target area can be used to process the other data to be processed.
  • The computing cores in the second target area are controlled to perform data processing synchronously; the computing cores in the second target area partially overlap or do not overlap with the computing cores in the first target area.
  • the second target area may be a pre-set area, or an area dynamically determined according to restriction conditions.
  • For an introduction to the second target area, refer to the above description of the first target area, which is not repeated here.
  • The type of data to be processed by the computing cores included in the first target area may be the same as or different from the type of data to be processed by the computing cores included in the second target area, which is not limited in this application.
  • The types of processing operations performed by the computing cores included in the first target area may be the same as or different from the types of processing operations performed by the computing cores included in the second target area, which is not limited in this application.
  • The computing cores in the first target area and the computing cores in the second target area can perform computations at the same time. After some computing cores in the first target area complete their computing tasks, those computing cores can be released from the first target area, so that they can be determined as computing cores included in the second target area and continue to execute the computing tasks corresponding to the second target area.
  • The first target area and the second target area need not be fixed, and the division of the areas may be dynamically adjusted as needed during the computation; for example, the number of computing cores included in an area may be increased or decreased to dynamically adjust the division of the first target area and the second target area.
  • FIG. 6 is a schematic diagram of a data processing process shown in this specification according to an exemplary embodiment.
  • Computing core 601, computing core 602, computing core 603, computing core 604, computing core 605 and computing core 606 can be used as computing cores in the first target area to execute computing tasks synchronously.
  • When time t1 is reached, the computing tasks of computing core 601, computing core 602 and computing core 603 have been completed, but the computing tasks of computing core 604, computing core 605 and computing core 606 have not.
  • At this point, computing core 601, computing core 602 and computing core 603 can be released from the computing cores included in the first target area, so that they can be used as computing cores in the second target area to execute the next computing task synchronously.
  • Computing core 604, computing core 605 and computing core 606 remain computing cores in the first target area and continue their unfinished computing tasks. When time t2 is reached and their computing tasks have been completed, computing core 604, computing core 605 and computing core 606 can go on to perform other computing tasks.
  • Computing core 601, computing core 602 and computing core 603 in the first target area can continue to execute other data processing tasks after completing the current data processing task, without waiting for computing core 604, computing core 605 and computing core 606 in the first target area to complete their data processing tasks. This realizes local synchronization of the data processing of the computing cores, so that the many-core chip can start subsequent operations without waiting for the data processing tasks of all computing cores to complete.
  • In this way, the data processing efficiency of the many-core chip can be improved, so that the computing power of the many-core chip can be fully utilized.
  • Corresponding to the foregoing method embodiments, this specification also provides embodiments of an apparatus and of the chip to which it is applied.
  • FIG. 7 is a block diagram of a data processing apparatus according to an exemplary embodiment of this specification. The many-core chip includes multiple computing cores, and a target circuit is arranged between two adjacently positioned computing cores on the many-core chip; the data processing apparatus includes:
  • An acquisition module 701, configured to acquire the working state of each computing core based on the target circuits arranged between the computing cores;
  • A determining module 702, configured to determine, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state;
  • A control module 703, configured to control the computing cores in the first target area to perform data processing synchronously.
  • While controlling the computing cores in the first target area to perform data processing synchronously, the control module 703 is further configured to: when it is determined that all computing cores in a second target area are in an idle state, control the computing cores in the second target area to perform data processing synchronously;
  • The computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.
  • The acquisition module 701 is specifically configured to:
  • When any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core;
  • Determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
  • the target circuit is an OR circuit.
  • the OR circuit is an OR circuit including a switch.
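  • For either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state;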
  • the first indication information is used to indicate that the computing core is in an idle state
  • the second indication information is used to indicate that the computing core is in a non-idle state.
  • the determination module 702 is also used for any of the following:
  • When the indication information sent by both computing cores is the first indication information, it is determined that the sub-area core formed by the two computing cores is in an idle state;
  • When the indication information sent by either computing core is the second indication information, it is determined that the sub-area core formed by the two computing cores is in a non-idle state.
  • Because the apparatus embodiment basically corresponds to the method embodiment, see the description of the method embodiment for the related parts.
  • The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
  • the present application also provides a many-core chip.
  • The many-core chip includes a memory, multiple computing cores, and a computer program stored in the memory and executable on the computing cores, where the computing cores, when executing the program, implement the operations performed by the data processing method provided in any embodiment.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium can be in various forms.
  • For example, the computer-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, a storage disk of any type (such as a compact disc or DVD), a similar storage medium, or a combination thereof.
  • the computer-readable storage medium can also be paper or other suitable printable program medium.
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the computing core, the data processing method provided by any embodiment of the present application is implemented.
  • The present application also provides a computer program product including a computer program; when the computer program is executed by a computing core, the data processing method provided in any embodiment of the present application is implemented.
  • Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, apparatus, chip, computer-readable storage medium, or computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • In particular, the chip embodiment is described relatively briefly because it is basically similar to the method embodiment; for the related parts, see the description of the method embodiment.
  • Embodiments of the subject matter and functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver device for execution by the data processing apparatus.
  • the computer-readable storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Microcomputers (AREA)

Abstract

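This specification provides a data processing method, apparatus, chip, and medium, belonging to the technical field of data processing. In the method, a target circuit is arranged between two adjacently positioned computing cores, so that the working state of each computing core can be acquired through the target circuits arranged between the computing cores. Based on those working states, when all computing cores in a first target area are in an idle state, the computing cores in the first target area are controlled to perform data processing synchronously. This achieves local synchronization of computing cores within the many-core chip; compared with global synchronization of the computing cores within the chip, the processing is more flexible, and data processing efficiency is therefore higher.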

Description

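Data Processing Method, Apparatus, Chip, and Medium
Technical Field
This specification relates to the technical field of data processing, and in particular to a data processing method, apparatus, chip, and medium.
Background
With the growing demand for high-performance computing, many-core chips, which have a large number of computing cores and strong computing capability, have gradually become one of the important research directions in the field of chip research.
In the related art, when a many-core chip processes data through multiple computing cores, the data processing of those cores is globally synchronized. That is, subsequent data processing can continue only after every computing core has finished its data processing; if even one of the computing cores has not finished, subsequent data processing cannot proceed, which results in low data processing efficiency.
Summary
To overcome the problems in the related art, this specification provides a data processing method, apparatus, chip, and medium.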
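According to a first aspect of the embodiments of this specification, a data processing method is provided, applied to a many-core chip. The many-core chip includes multiple computing cores, and a target circuit is arranged between two adjacently positioned computing cores on the many-core chip. The method includes:
acquiring the working state of each computing core based on the target circuits arranged between the computing cores;
determining, according to the acquired working state of each computing core, that all computing cores in a first target area are in an idle state;
controlling the computing cores in the first target area to perform data processing synchronously.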
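In some embodiments of this specification, while the computing cores in the first target area are being controlled to perform data processing synchronously, the method further includes:
when it is determined that all computing cores in a second target area are in an idle state, controlling the computing cores in the second target area to perform data processing synchronously;
where the computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.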
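In some embodiments of this specification, determining, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state includes:
acquiring, through the target circuit, the working states of any two computing cores connected to the target circuit;
when any two connected computing cores are both in an idle state, determining the two connected computing cores as a first sub-area core;
determining the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.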
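In some embodiments of this specification, the target circuit is an OR circuit.
In some embodiments of this specification, the OR circuit is an OR circuit that includes a switch.
In some embodiments of this specification, for either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state;
where the first indication information indicates that the computing core is in an idle state, and the second indication information indicates that the computing core is in a non-idle state.
In some embodiments of this specification, the method further includes any one of the following:
when the indication information sent by both computing cores is the first indication information, determining that the sub-area core formed by the two computing cores is in an idle state;
when the indication information sent by either computing core is the second indication information, determining that the sub-area core formed by the two computing cores is in a non-idle state.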
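According to a second aspect of the embodiments of this specification, a data processing apparatus is provided, applied to a many-core chip in which a target circuit is arranged between two adjacently positioned computing cores. The apparatus includes:
an acquisition module, configured to acquire the working state of each computing core based on the target circuits arranged between the computing cores;
a determining module, configured to determine, according to the acquired working state of each computing core, that all computing cores in a first target area are in an idle state;
a control module, configured to control the computing cores in the first target area to perform data processing synchronously.
In some embodiments of this specification, while controlling the computing cores in the first target area to perform data processing synchronously, the control module is further configured to:
when it is determined that all computing cores in a second target area are in an idle state, control the computing cores in the second target area to perform data processing synchronously;
where the computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.
In some embodiments of this specification, when determining, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state, the acquisition module is specifically configured to:
acquire, through the target circuit, the working states of any two computing cores connected to the target circuit;
when any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core;
determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
In some embodiments of this specification, the target circuit is an OR circuit.
In some embodiments of this specification, the OR circuit is an OR circuit that includes a switch.
In some embodiments of this specification, for either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state;
where the first indication information indicates that the computing core is in an idle state, and the second indication information indicates that the computing core is in a non-idle state.
In some embodiments of this specification, the determining module is further configured for any one of the following:
when the indication information sent by both computing cores is the first indication information, determining that the sub-area core formed by the two computing cores is in an idle state;
when the indication information sent by either computing core is the second indication information, determining that the sub-area core formed by the two computing cores is in a non-idle state.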
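According to a third aspect of the embodiments of this specification, a many-core chip is provided. The many-core chip includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the operations performed by the above data processing method.
According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided. A program is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the operations performed by the above data processing method.
According to a fifth aspect of the embodiments of this specification, a computer program product is provided, including a computer program that, when executed by a processor, implements the operations performed by the above data processing method.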
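The technical solutions provided by the embodiments of this specification may have the following beneficial effects:
A target circuit is arranged between two adjacently positioned computing cores, so that the working state of each computing core can be acquired through the target circuits arranged between the computing cores. Based on those working states, when all computing cores in the first target area are in an idle state, the computing cores in the first target area are controlled to perform data processing synchronously. This achieves local synchronization of computing cores within the many-core chip; compared with global synchronization of the computing cores within the chip, the processing is more flexible, and data processing efficiency is therefore higher.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit this specification.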
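Brief Description of the Drawings
The accompanying drawings are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with this specification and, together with the description, serve to explain its principles.
FIG. 1 is a flowchart of a data processing method according to an exemplary embodiment of this specification.
FIG. 2 is a schematic diagram of the connection relationship between computing cores according to an exemplary embodiment of this specification.
FIG. 3 is a schematic diagram of the target circuits connected to computing cores at various positions according to an exemplary embodiment of this specification.
FIG. 4 is a schematic diagram of the computing cores included in a many-core chip according to an exemplary embodiment of this specification.
FIG. 5 is a schematic diagram of a division into target areas according to an exemplary embodiment of this specification.
FIG. 6 is a schematic diagram of a data processing process according to an exemplary embodiment of this specification.
FIG. 7 is a block diagram of a data processing apparatus according to an exemplary embodiment of this specification.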
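Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification, as detailed in this application.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this application are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
This application provides a data processing method that can be applied to a many-core chip, so that the many-core chip can process data to be processed using the method. A many-core chip is a chip that includes multiple computing cores, which it can use to process the data to be processed.
Optionally, the many-core chip is a computing chip or a sensing chip, among others; this application does not limit the type of the many-core chip. The many-core chip can be used in many types of computer devices. For example, it can be used in a server, or in a terminal such as a desktop computer, portable computer, tablet, smartphone, or smartwatch; this application does not limit this either.
The above introduces the application scenarios of this application. The data processing method provided by this application is described in detail below with reference to the embodiments of this specification.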
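FIG. 1 is a flowchart of a data processing method according to an exemplary embodiment of this specification. The method is applied to a many-core chip that includes multiple computing cores, with a target circuit arranged between two adjacently positioned computing cores on the chip. The method includes the following steps:
Step 101: acquire the working state of each computing core based on the target circuits arranged between the computing cores.
The computing cores execute the computation corresponding to the neurons of a neural network mapped onto the many-core chip. One computing core may handle one or more neurons, or multiple computing cores may jointly handle one neuron, thereby processing the data to be processed.
Note that if a computing core is executing the computation of the neurons mapped to it, that is, it is processing data to be processed, the computing core is in a non-idle state (also called the working state). If the computing core is not currently executing any computation, that is, it is not processing data to be processed, the computing core is in an idle state.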
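Step 102: determine, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state.
Step 103: control the computing cores in the first target area to perform data processing synchronously.
Optionally, the computing cores in the first target area can carry out data processing through many types of processing operations, such as convolution operations and mapping operations; this application does not limit the specific type of processing operation.
In addition, the processing operations executed by the computing cores in the first target area may be of the same type or of different types; this application does not limit this.
In the data processing method provided by this application, a target circuit is arranged between two adjacently positioned computing cores, so that the working state of each computing core can be acquired through the target circuits arranged between the computing cores. Based on those working states, when all computing cores in the first target area are in an idle state, the computing cores in the first target area are controlled to perform data processing synchronously. This achieves local synchronization of computing cores within the many-core chip; compared with global synchronization, the processing is more flexible, and data processing efficiency is therefore higher. In addition, the method does not rely on the routing communication of the many-core chip. Instead, it uses the arranged target circuits directly, synchronizing local computing cores on the chip by means of dedicated circuits, which is more direct and efficient than synchronization through routing.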
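Having introduced the basic implementation of this application, various non-limiting implementations are described in detail below.
The embodiment shown in FIG. 1 is described using the example of one target circuit arranged between every two adjacent computing cores. In other possible implementations, target circuits may instead be arranged at a set interval, that is, one target circuit for every set number of computing cores, for example one target circuit every two computing cores.
Whichever way the target circuits are arranged, they can be used to acquire the working states of the computing cores. In some embodiments, for step 101, when acquiring the working state of each computing core based on the target circuits, the many-core chip may, after receiving data to be processed, send a control instruction to each computing core. Based on the received control instruction, each computing core sends indication information to its corresponding target circuit. The indication information indicates the working state of the computing core, so that after the target circuit receives the indication information from the two computing cores connected to it, it can determine the working states of those two computing cores from that information.
Optionally, each computing core may also send indication information to its corresponding target circuit on its own initiative. For example, each computing core may send indication information indicating its own working state to its corresponding target circuit at a preset interval.
The indication information may include first indication information and second indication information. The first indication information indicates that the computing core is in an idle state, and the second indication information indicates that the computing core is in a non-idle state.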
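In some embodiments, an OR circuit can be used as the target circuit, in which case the connection relationship between two computing cores is as shown in FIG. 2, a schematic diagram of the connection relationship between computing cores according to an exemplary embodiment of this specification. The two computing cores are connected through an OR circuit. Based on its own working state, each of the two computing cores sends indication information indicating that state to the OR circuit, so that the OR circuit can determine the working states of the two computing cores from the received indication information.
Note that the computing cores of a many-core chip may be distributed on the chip as a computing core array, and computing cores at different positions on the chip are connected to different numbers of target circuits. See FIG. 3, a schematic diagram of the target circuits connected to computing cores at various positions according to an exemplary embodiment of this specification. A computing core in the middle of the array may be connected to four OR circuits; a computing core on an edge of the array may be connected to three OR circuits; and a computing core at a vertex of the array may be connected to two OR circuits.
The target-circuit connections of computing cores at each position are further described below with reference to FIG. 4, a schematic diagram of the computing cores included in a many-core chip according to an exemplary embodiment of this specification. The nine computing cores in FIG. 4 form a 3×3 computing core array. Computing core 405 is in the middle of the array and is connected to four target circuits: target circuits 3, 4, 8, and 11. Computing cores 402, 404, 406, and 408 are on the edges of the array, and each is connected to three target circuits: computing core 402 to target circuits 1, 2, and 8; computing core 404 to target circuits 3, 7, and 10; computing core 406 to target circuits 4, 9, and 12; and computing core 408 to target circuits 5, 6, and 11. Computing cores 401, 403, 407, and 409 are at the vertices of the array, and each is connected to two target circuits: computing core 401 to target circuits 1 and 7; computing core 403 to target circuits 2 and 9; computing core 407 to target circuits 5 and 10; and computing core 409 to target circuits 6 and 12.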
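The connection counts described for FIG. 4 can be reproduced with a short sketch. The function names `build_target_circuits` and `circuits_per_core`, and the numbering rule (horizontal neighbours first, then vertical neighbours, both row by row), are assumptions introduced here for illustration; the rule was chosen only because it happens to reproduce the circuit numbers listed in the text.

```python
from collections import defaultdict


def build_target_circuits(rows, cols, first_core=401):
    """Return {circuit_id: (core_a, core_b)} for a rows x cols core array.

    Hypothetical numbering: horizontal neighbours are numbered first,
    then vertical neighbours, each row by row.
    """
    def core(r, c):
        return first_core + r * cols + c

    circuits = {}
    next_id = 1
    for r in range(rows):              # circuits between left/right neighbours
        for c in range(cols - 1):
            circuits[next_id] = (core(r, c), core(r, c + 1))
            next_id += 1
    for r in range(rows - 1):          # circuits between upper/lower neighbours
        for c in range(cols):
            circuits[next_id] = (core(r, c), core(r + 1, c))
            next_id += 1
    return circuits


def circuits_per_core(circuits):
    """Map each core to the set of target circuits attached to it."""
    attached = defaultdict(set)
    for circuit_id, (a, b) in circuits.items():
        attached[a].add(circuit_id)
        attached[b].add(circuit_id)
    return dict(attached)


attached = circuits_per_core(build_target_circuits(3, 3))
print(sorted(attached[405]))  # middle core: four circuits
print(sorted(attached[402]))  # edge core: three circuits
print(sorted(attached[401]))  # vertex core: two circuits
```

For a 3×3 array this yields twelve target circuits in total, with four, three, and two circuits attached to middle, edge, and vertex cores respectively.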
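When an OR circuit is used as the target circuit, the logic of the OR operation can be used to determine the working state of the sub-area core formed by every two computing cores, and thereby determine whether all computing cores in the first target area are in an idle state. That is, for step 102, determining according to the acquired working state of each computing core that all computing cores in the first target area are in an idle state may include the following steps:
Step 1021: acquire, through the target circuit, the working states of any two computing cores connected to the target circuit.
For either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state.
In one possible implementation, 0 can be used as the first indication information and 1 as the second indication information. For any computing core, when it is in an idle state, it can send 0 to its corresponding target circuit, so that the target circuit can determine from the received 0 that the computing core is idle; when it is in a non-idle state (that is, the working state), it can send 1 to its corresponding target circuit, so that the target circuit can determine from the received 1 that the computing core is non-idle.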
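Step 1022: when any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core.
In one possible implementation, when the indication information sent by both connected computing cores is the first indication information, it is determined that the sub-area core formed by the two computing cores is in an idle state. Still using 0 as the first indication information, if both computing cores connected to the target circuit output 0, it can be determined that the sub-area core formed by these two computing cores is in an idle state.
In another possible implementation, when the indication information sent by either computing core is the second indication information, it is determined that the sub-area core formed by the two computing cores is in a non-idle state.
The case where either computing core sends the second indication information covers the following two situations:
1. One computing core sends the first indication information, and the other sends the second indication information;
2. Both computing cores send the second indication information.
Still using 0 as the first indication information and 1 as the second indication information, if one computing core outputs 0 and the other outputs 1, it can be determined that the sub-area core formed by the two computing cores is in a non-idle state; likewise, if both computing cores output 1, it can be determined that the sub-area core formed by the two computing cores is in a non-idle state.
The target circuit may be an OR circuit, for example an OR circuit that includes a switch, or optionally an OR circuit that includes an OR gate; this application does not limit the specific implementation of the OR circuit.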
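The OR logic described above can be modelled in software as a minimal sketch. The encoding (0 for the first indication information, 1 for the second) follows the example in the text; the constant and function names are hypothetical, and the actual target circuit is hardware rather than code.

```python
IDLE = 0   # first indication information: the core is idle
BUSY = 1   # second indication information: the core is non-idle


def sub_area_state(indication_a, indication_b):
    """Model the OR circuit: the output is 1 if either core reports 1."""
    return indication_a | indication_b


def sub_area_is_idle(indication_a, indication_b):
    """A two-core sub-area core is idle only when both cores report 0."""
    return sub_area_state(indication_a, indication_b) == IDLE


for a in (IDLE, BUSY):
    for b in (IDLE, BUSY):
        print(a, b, "idle" if sub_area_is_idle(a, b) else "non-idle")
```

Only the input pair (0, 0) yields an idle sub-area core; the other three pairs correspond to the two non-idle situations listed above.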
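Step 1023: determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
For any two connected sub-area cores, the two sub-area cores include one computing core in common. That is, if one of the two computing cores in one sub-area core is the same as one of the two computing cores in the other, the two sub-area cores are determined to be connected.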
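One way to picture steps 1021 to 1023 together is to merge idle two-core sub-area cores that share a computing core. The sketch below uses a union-find structure for the merging; that choice, the function name `connected_idle_areas`, and the data layout are assumptions made for illustration, not something the specification prescribes.

```python
def connected_idle_areas(circuits, state):
    """Group cores into candidate target areas.

    circuits: {circuit_id: (core_a, core_b)}, one entry per target circuit.
    state:    {core: 0 or 1}, where 0 means idle and 1 means non-idle.
    Returns a list of core sets, one per group of connected idle sub-area cores.
    """
    parent = {}

    def find(core):
        parent.setdefault(core, core)
        while parent[core] != core:
            parent[core] = parent[parent[core]]   # path halving
            core = parent[core]
        return core

    for a, b in circuits.values():
        if (state[a] | state[b]) == 0:            # OR output 0: idle sub-area core
            parent[find(a)] = find(b)             # sub-area cores sharing a core merge

    areas = {}
    for core in list(parent):
        areas.setdefault(find(core), set()).add(core)
    return sorted(areas.values(), key=min)


# Five cores in a row; core 3 is busy, so two separate idle areas remain.
row = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 5)}
print(connected_idle_areas(row, {1: 0, 2: 0, 3: 1, 4: 0, 5: 0}))
```

Note that an idle core whose neighbours are all busy never appears in any area here, because it does not belong to any idle sub-area core.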
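Note that the first target area may be a preset area, or an area determined dynamically according to constraints.
When the first target area is determined according to constraints, this can be done as follows:
Based on the data volume of the data to be processed, determine the target number of computing cores for processing that data, and determine as the first target area the area corresponding to sub-area cores that are in an idle state and whose number of computing cores matches the target number.
Each computing core has a limited data processing capability; in other words, there is an upper limit on the data volume each computing core can process at one time. Therefore, the target number of computing cores for processing the data can be determined based on the upper limit of the data volume each computing core can process at one time.
Optionally, the upper limit of the data volume that can be processed at one time is the same for all computing cores, or it differs between computing cores. Accordingly, determining the first target area according to constraints can be implemented in the following two ways:
In one possible implementation, if the upper limit of the data volume that can be processed at one time is the same for every computing core, the data volume of the data to be processed can be divided by that upper limit, and the result is taken as the target number of computing cores for processing the data. Multiple connected first sub-area cores are then determined from the idle computing cores, such that the number of computing cores they include meets the target number.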
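As a small worked example of the equal-capacity case, the sketch below divides the data volume by the per-core limit. Rounding the quotient up to a whole number of cores is an assumption added here (the text only says the two values are divided), and the function name and units are hypothetical.

```python
def target_core_count(data_volume, per_core_limit):
    """Number of cores needed when every core has the same one-time limit.

    The quotient is rounded up (an assumption) so that the selected cores
    can hold all of the data.
    """
    if data_volume < 0 or per_core_limit <= 0:
        raise ValueError("data_volume must be >= 0 and per_core_limit > 0")
    return -(-data_volume // per_core_limit)   # integer ceiling division


print(target_core_count(1000, 256))  # 1000 / 256 = 3.9..., so 4 cores
print(target_core_count(1024, 256))  # divides exactly, so 4 cores
```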
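In another possible implementation, if the upper limit of the data volume that can be processed at one time differs between computing cores, the upper limits of connected computing cores can be accumulated until the accumulated value is greater than or equal to the data volume of the data to be processed. The number of computing cores contributing to the accumulated value is the target number, which yields a first target area composed of at least one first sub-area core that is connected and whose number of computing cores matches the target number.
For example, starting from the one-time data volume limit of any idle first computing core, the one-time limit of a second computing core connected to the first computing core is added, and so on, accumulating the limits of computing cores that have a connection relationship until the accumulated data volume reaches the data volume of the data to be processed. With this accumulation, the computing cores determined to meet the data volume requirement also meet the connection requirement, which yields a first target area composed of at least one first sub-area core that is connected and whose number of computing cores matches the target number.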
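The accumulation described above can be sketched as a breadth-first walk over idle, connected cores. The breadth-first order, the function name `select_connected_cores`, and the adjacency-list layout are assumptions for illustration; the specification only requires that the accumulated cores be connected and that their combined limit reach the data volume.

```python
from collections import deque


def select_connected_cores(start, neighbours, capacity, idle, data_volume):
    """Accumulate per-core limits over connected idle cores.

    start:       an idle core to begin from
    neighbours:  {core: [cores joined to it by a target circuit]}
    capacity:    {core: one-time data volume limit}
    idle:        set of idle cores
    Returns the chosen cores in selection order, or None if the idle cores
    reachable from `start` cannot hold `data_volume`.
    """
    if start not in idle:
        return None
    chosen, total = [], 0
    queue, seen = deque([start]), {start}
    while queue and total < data_volume:
        core = queue.popleft()
        chosen.append(core)
        total += capacity[core]
        for nxt in neighbours.get(core, ()):
            if nxt in idle and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return chosen if total >= data_volume else None


neighbours = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
capacity = {1: 100, 2: 50, 3: 200, 4: 100}
print(select_connected_cores(1, neighbours, capacity, {1, 2, 3, 4}, 300))
```

Because every core is enqueued by a core that has already been chosen, the selected set always stays connected, which mirrors the connection requirement in the text.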
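In one possible implementation, the division of the first target area can be realized with OR circuits that include switches. For example, after the first sub-area cores that make up the first target area are determined, control instructions can be sent to the corresponding target circuits to close or open the switches they contain, thereby controlling whether each target circuit is available or unavailable.
When sending control instructions, a first control instruction is sent to the target circuit corresponding to a first sub-area core in an idle state, to put that target circuit into the available state; a second control instruction is sent to the target circuit corresponding to a first sub-area core in a non-idle state, to put that target circuit into the unavailable state.
See FIG. 5, a schematic diagram of a division into target areas according to an exemplary embodiment of this specification. The many-core chip shown in FIG. 5 includes 36 computing cores, which are divided into four target areas that can process four groups of data to be processed respectively. The circles in FIG. 5 represent computing cores, and circles with different shading represent computing cores in different target areas; in other words, the computing cores corresponding to circles with different shading form different target areas. The switches in target circuits 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, and 517 are open, and the switches in the other target circuits in the figure are closed, which realizes the division into these four target areas.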
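The relationship between an area division and the switch states can be illustrated as follows. This is a sketch under assumptions: the function name `switch_states`, the string labels, and the 2×2 example are hypothetical, and it models only the idea shown in FIG. 5 that switches on area boundaries are open while switches inside an area are closed.

```python
def switch_states(circuits, area_of):
    """Decide the switch state of each target circuit for a given division.

    circuits: {circuit_id: (core_a, core_b)}
    area_of:  {core: area label}
    A switch stays closed when both cores belong to the same area and is
    opened when the circuit sits on the boundary between two areas.
    """
    return {
        circuit_id: "closed" if area_of[a] == area_of[b] else "open"
        for circuit_id, (a, b) in circuits.items()
    }


# 2x2 array: cores 1 and 2 form area "A", cores 3 and 4 form area "B".
circuits = {1: (1, 2), 2: (3, 4), 3: (1, 3), 4: (2, 4)}
area_of = {1: "A", 2: "A", 3: "B", 4: "B"}
print(switch_states(circuits, area_of))
```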
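The above process is described using the example of dividing target areas by closing or opening switches. In other possible implementations, target areas can be divided in other ways. For example, the switches of all target circuits in the many-core chip may be closed, and the many-core chip sends control information to each target circuit, so that the target circuits, whose switches are closed, divide the target areas based on the received control information. The control information indicates whether the two computing cores connected to a target circuit can serve as computing cores in the same target area.
Grouping computing cores through switches makes it possible to form a first target area from some of the computing cores in the many-core chip, so that the computing cores in the first target area can perform data processing synchronously without control by routing and a main controller. This reduces the processing load on the many-core chip and in turn increases its processing speed.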
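Optionally, the data to be processed can be of many types, such as image data, text data, or behavior data. Specifically, the data to be processed can be image features, text, or user behavior data (such as click-through rate); this application does not limit the specific type of the data to be processed.
In addition, when the first target area is a preset area, the preset first target area can also be adjusted, in the same way as described above, based on the data volume of the data to be processed, to ensure that the processing capability of the computing cores in the first target area meets the processing requirements of that data.
In some embodiments, while the computing cores in the first target area are being controlled to perform data processing synchronously in step 103, the many-core chip may receive other data to be processed. In that case, the other data can be processed by computing cores other than those included in the first target area.
In one possible implementation, when it is determined that all computing cores in a second target area are in an idle state, the computing cores in the second target area are controlled to perform data processing synchronously; the computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.
The second target area may be a preset area, or an area determined dynamically according to constraints. For details, see the description of the first target area above, which is not repeated here.
Optionally, the type of data processed by the computing cores in the first target area and the type of data processed by the computing cores in the second target area may be the same or different; this application does not limit this.
In addition, the type of processing operation executed by the computing cores in the first target area and the type executed by the computing cores in the second target area may be the same or different; this application does not limit this either.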
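Note that the computing cores in the first target area and those in the second target area can compute at the same time. After some computing cores in the first target area complete their computing tasks, the first target area can release those cores, so that they can be assigned to the second target area and continue with the computing tasks corresponding to the second target area. In other words, the first target area and the second target area need not be fixed; the division of areas can be adjusted dynamically as needed during computation, for example by increasing or decreasing the number of computing cores included in an area.
For example, see FIG. 6, a schematic diagram of a data processing process according to an exemplary embodiment of this specification, which takes a many-core chip with six computing cores as an example. Computing cores 601, 602, 603, 604, 605, and 606 can serve as the computing cores in the first target area and execute computing tasks synchronously. When time t1 is reached, the computing tasks of computing cores 601, 602, and 603 have been completed, while those of computing cores 604, 605, and 606 have not. At this point, computing cores 601, 602, and 603 can be released from the first target area so that they can serve as computing cores in the second target area and synchronously execute the next computing task. Meanwhile, computing cores 604, 605, and 606 remain in the first target area and continue their unfinished computing tasks until time t2, when those tasks are completed; computing cores 604, 605, and 606 can then go on to execute other computing tasks.
In this process, computing cores 601, 602, and 603 in the first target area can continue with other data processing tasks as soon as they finish the current one, without waiting for computing cores 604, 605, and 606 in the first target area to finish theirs. This achieves local synchronization of the computing cores' data processing, so the many-core chip can start subsequent data processing tasks without waiting for all computing cores to finish, which improves the data processing efficiency of the many-core chip and allows its computing power to be fully utilized.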
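The release step in the FIG. 6 example can be sketched as a simple partition of the first target area at a given time. The function name `release_finished` and the numeric values standing in for t1 and t2 are placeholders chosen for illustration only.

```python
def release_finished(area, finish_time, now):
    """Split an area into cores that can be released and cores still working.

    area:        set of cores currently in the first target area
    finish_time: {core: time at which its current task completes}
    now:         current time
    Returns (released, remaining).
    """
    released = {core for core in area if finish_time[core] <= now}
    return released, area - released


T1, T2 = 1, 2   # placeholder values for the times t1 and t2 in FIG. 6
first_area = {601, 602, 603, 604, 605, 606}
finish_time = {601: T1, 602: T1, 603: T1, 604: T2, 605: T2, 606: T2}

second_area, first_area = release_finished(first_area, finish_time, now=T1)
print(sorted(second_area))  # cores free to start the next task at t1
print(sorted(first_area))   # cores still finishing the current task
```

Under global synchronization the next task could not start before t2; with the release at t1, cores 601 to 603 begin the next task while cores 604 to 606 are still working.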
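Corresponding to the foregoing method embodiments, this specification also provides embodiments of an apparatus and of the chip to which it is applied.
FIG. 7 is a block diagram of a data processing apparatus according to an exemplary embodiment of this specification. The many-core chip includes multiple computing cores, and a target circuit is arranged between two adjacently positioned computing cores on the many-core chip. The data processing apparatus includes:
an acquisition module 701, configured to acquire the working state of each computing core based on the target circuits arranged between the computing cores;
a determining module 702, configured to determine, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state;
a control module 703, configured to control the computing cores in the first target area to perform data processing synchronously.
In some embodiments of this specification, while controlling the computing cores in the first target area to perform data processing synchronously, the control module 703 is further configured to:
when it is determined that all computing cores in a second target area are in an idle state, control the computing cores in the second target area to perform data processing synchronously;
where the computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.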
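In some embodiments of this specification, when determining, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state, the acquisition module 701 is specifically configured to:
acquire, through the target circuit, the working states of any two computing cores connected to the target circuit;
when any two connected computing cores are both in an idle state, determine the two connected computing cores as a first sub-area core;
determine the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
In some embodiments of this specification, the target circuit is an OR circuit.
In some embodiments of this specification, the OR circuit is an OR circuit that includes a switch.
In some embodiments of this specification, for either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state;
where the first indication information indicates that the computing core is in an idle state, and the second indication information indicates that the computing core is in a non-idle state.
In some embodiments of this specification, the determining module 702 is further configured for any one of the following:
when the indication information sent by both computing cores is the first indication information, determining that the sub-area core formed by the two computing cores is in an idle state;
when the indication information sent by either computing core is the second indication information, determining that the sub-area core formed by the two computing cores is in a non-idle state.
For how the functions and roles of each unit in the above apparatus are implemented, see the implementation of the corresponding steps in the above method; details are not repeated here.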
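Because the apparatus embodiment basically corresponds to the method embodiment, see the description of the method embodiment for the related parts. The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
This application also provides a many-core chip. The many-core chip includes a memory, multiple computing cores, and a computer program stored in the memory and executable on the computing cores, where the computing cores, when executing the program, implement the operations performed by the data processing method provided in any embodiment.
This application also provides a computer-readable storage medium, which can take many forms. For example, in different examples the computer-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, a storage disk of any type (such as a compact disc or DVD), a similar storage medium, or a combination thereof. In particular, the computer-readable storage medium can also be paper or another suitable medium on which a program can be printed. A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a computing core, the data processing method provided in any embodiment of this application is implemented.
This application also provides a computer program product including a computer program; when the computer program is executed by a computing core, the data processing method provided in any embodiment of this application is implemented.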
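Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, apparatus, chip, computer-readable storage medium, or computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the chip embodiment is described relatively briefly because it is basically similar to the method embodiment; for the related parts, see the description of the method embodiment.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of this application. In some cases, the actions or steps recited in this application can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver device for execution by the data processing apparatus. The computer-readable storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.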
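Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what is claimed; rather, they mainly describe features of specific embodiments of particular inventions. Certain features described in this specification in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented separately in multiple embodiments or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequence, or that all illustrated operations be performed, to achieve the desired results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of this application. In some cases, the actions recited in this application can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
Other implementations of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this specification. That is, this specification is not limited to the precise structures described above and shown in the drawings, and various modifications and changes can be made without departing from its scope.
The above are merely optional embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.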

Claims (10)

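  1. A data processing method, characterized in that it is applied to a many-core chip, the many-core chip including multiple computing cores, with a target circuit arranged between two adjacently positioned computing cores on the many-core chip, the method comprising:
    acquiring the working state of each computing core based on the target circuits arranged between the computing cores;
    determining, according to the acquired working state of each computing core, that all computing cores in a first target area are in an idle state;
    controlling the computing cores in the first target area to perform data processing synchronously.
  2. The method according to claim 1, characterized in that, during the controlling of the computing cores in the first target area to perform data processing synchronously, the method further comprises:
    when it is determined that all computing cores in a second target area are in an idle state, controlling the computing cores in the second target area to perform data processing synchronously;
    wherein the computing cores in the second target area partially overlap, or do not overlap, with the computing cores in the first target area.
  3. The method according to claim 1, characterized in that the determining, according to the acquired working state of each computing core, that all computing cores in the first target area are in an idle state comprises:
    acquiring, through the target circuit, the working states of any two computing cores connected to the target circuit;
    when any two connected computing cores are both in an idle state, determining the two connected computing cores as a first sub-area core;
    determining the computing cores of multiple connected first sub-area cores as the computing cores in the first target area.
  4. The method according to claim 1, characterized in that the target circuit is an OR circuit.
  5. The method according to claim 4, characterized in that the OR circuit is an OR circuit that includes a switch.
  6. The method according to claim 4, characterized in that, for either of the two computing cores connected to the target circuit, first indication information is sent to the target circuit when the computing core is in an idle state, and second indication information is sent to the target circuit when the computing core is in a non-idle state;
    wherein the first indication information indicates that the computing core is in an idle state, and the second indication information indicates that the computing core is in a non-idle state.
  7. The method according to claim 6, characterized in that the method further comprises any one of the following:
    when the indication information sent by both computing cores is the first indication information, determining that the sub-area core formed by the two computing cores is in an idle state;
    when the indication information sent by either computing core is the second indication information, determining that the sub-area core formed by the two computing cores is in a non-idle state.
  8. A data processing apparatus, characterized in that it is applied to a many-core chip, the many-core chip including multiple computing cores, with a target circuit arranged between two adjacently positioned computing cores on the many-core chip, the apparatus comprising:
    an acquisition module, configured to acquire the working state of each computing core based on the target circuits arranged between the computing cores;
    a determining module, configured to determine, according to the acquired working state of each computing core, that all computing cores in a first target area are in an idle state;
    a control module, configured to control the computing cores in the first target area to perform data processing synchronously.
  9. A many-core chip, characterized in that the many-core chip includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the operations performed by the data processing method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a program is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the operations performed by the data processing method according to any one of claims 1 to 7.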
PCT/CN2023/072631 2022-02-14 2023-01-17 Data processing method, apparatus, chip, and medium WO2023151460A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210134715.0 2022-02-14
CN202210134715.0A CN114546640A (zh) 2022-02-14 2022-02-14 Data processing method, apparatus, chip, and medium

Publications (1)

Publication Number Publication Date
WO2023151460A1 true WO2023151460A1 (zh) 2023-08-17

Family

ID=81676352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072631 WO2023151460A1 (zh) 2022-02-14 2023-01-17 Data processing method, apparatus, chip, and medium

Country Status (2)

Country Link
CN (1) CN114546640A (zh)
WO (1) WO2023151460A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114546640A (zh) * 2022-02-14 2022-05-27 北京灵汐科技有限公司 Data processing method, apparatus, chip, and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120060170A1 (en) * 2009-05-26 2012-03-08 Telefonaktiebolaget Lm Ericsson (Publ) Method and scheduler in an operating system
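CN104008013A (zh) * 2013-02-26 2014-08-27 华为技术有限公司 Core resource allocation method and apparatus, and many-core system
CN112035578A (zh) * 2020-11-06 2020-12-04 北京谷数科技股份有限公司 Data parallel processing method and apparatus based on a many-core processor
CN113407238A (zh) * 2020-03-16 2021-09-17 北京灵汐科技有限公司 Many-core architecture with heterogeneous processors and data processing method thereof
CN114546640A (zh) * 2022-02-14 2022-05-27 北京灵汐科技有限公司 Data processing method, apparatus, chip, and medium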

Also Published As

Publication number Publication date
CN114546640A (zh) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23752240

Country of ref document: EP

Kind code of ref document: A1