CN116304744A - Data processing method, device, electronic equipment, readable storage medium and chip - Google Patents
- Publication number
- CN116304744A (CN202310324754.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- registers
- data set
- computing units
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Image Processing (AREA)
Abstract
The application discloses a data processing method, a data processing device, electronic equipment, a readable storage medium and a chip, and belongs to the technical field of data processing. The data processing method comprises the following steps: acquiring a first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set, wherein M is a positive integer; acquiring P target data subsets in the second data set, wherein $P = \lceil M/N \rceil$ and P is a positive integer; and matching each target data subset in the P target data subsets with the first data set through N computing units.
Description
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a data processing method, a data processing device, electronic equipment, a readable storage medium and a chip.
Background
In the fields of image data processing and neural network processing, there is a processing method that performs matching calculation between data of a smaller size and data of a larger size.
In the matching processing mode of the related art, multiple rounds of repeated matching calculation are needed for the two pieces of data, and each round of matching calculation needs to read the data again, so that the whole matching process takes a long time and consumes a lot of power.
Disclosure of Invention
The embodiment of the application aims to provide a data processing method, a device, electronic equipment, a readable storage medium and a chip, so that the number of data reading times in the matching processing process is reduced, and the working efficiency is improved.
In a first aspect, an embodiment of the present application provides a data processing method, performed by an electronic device, where the electronic device includes N computing units, N is an integer greater than 1, and the data processing method includes: acquiring a first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set, wherein M is a positive integer; acquiring P target data subsets in the second data set, wherein $P = \lceil M/N \rceil$ and P is a positive integer; and matching each target data subset in the P target data subsets with the first data set through the N computing units.
In a second aspect, an embodiment of the present application provides a data processing apparatus, applied to an electronic device, where the electronic device includes N computing units, N is an integer greater than 1, and the data processing apparatus includes: an acquisition module, used for acquiring a first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set, wherein M is a positive integer; the acquisition module being further used for acquiring P target data subsets in the second data set, wherein $P = \lceil M/N \rceil$ and P is a positive integer; and a processing module, used for matching each target data subset in the P target data subsets with the first data set through the N computing units.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the data processing method as in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps of the data processing method as in the first aspect.
In a fifth aspect, embodiments of the present application provide a chip comprising a processor and a communication interface coupled to the processor, the processor being configured to execute programs or instructions to implement the steps of the data processing method as in the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executed by at least one processor to implement the steps of the data processing method as in the first aspect.
In the embodiment of the application, N computing units are configured in the electronic device, where N is an integer greater than 1. A first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set are acquired; P target data subsets in the second data set are acquired, where $P = \lceil M/N \rceil$; and each of the P target data subsets is matched with the first data set through the N computing units. With this scheme, the data does not need to be read again in each matching process, so that the number of data reads in the matching process is reduced and the working efficiency is improved.
Drawings
FIG. 1 illustrates a flow chart of a data processing method according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of N registers and N compute units provided by some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of the flow of second data through the registers provided by some embodiments of the present application;
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 5 shows a block diagram of an electronic device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type and do not limit the number of objects, for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The data processing method, the device, the electronic equipment and the storage medium provided by the embodiment of the application are described in detail below with reference to fig. 1 to 6 through specific embodiments and application scenes thereof.
In some embodiments of the present application, a data processing method is provided, which is performed by an electronic device, where the electronic device includes N computing units and N is an integer greater than 1. Fig. 1 shows a flowchart of the data processing method according to some embodiments of the present application. As shown in fig. 1, the data processing method includes:
In this embodiment of the present application, the first data set and the second data set are two data sets that need to be matched, and the processing times M is the number of calculation rounds that need to be repeated in the matching process of the first data set and the second data set, that is, the calculation field of view. The processing times M may be a preset number and is associated with the processing procedure of the first data set and the second data set. The processing times M can be obtained from an image algorithm and can be determined according to the required matching accuracy.
Illustratively, the data in the first data set is smaller size data (kernel data) and the data in the second data set is larger size data (data).
In the embodiment of the present application, when the number N of computing units is greater than or equal to the processing times M, $P = \lceil M/N \rceil = 1$: dividing the processing times M by the number N of computing units and rounding up gives a single target data subset that needs to be acquired only once, that is, the second data set is taken as the target data subset. When the number N of computing units is smaller than the processing times M, the number P of target data subsets to be acquired is likewise $P = \lceil M/N \rceil$, that is, the processing times M divided by the number N of computing units, rounded up. Optionally, the number of second data in the target data subset is greater than or equal to N.
The processing times M is the number of times matching processing is performed on the first data in the first data set, that is, the first data in the first data set needs to be matched with M different groups of second data in the second data set. The number N of computing units is the processing parallelism with which the electronic device matches the target data subset against the data in the first data set, that is, by reading the data once, the first data in the first data set can be matched with N different groups of second data. The number P of target data subsets is therefore the value obtained by dividing M by N and rounding up.
For example, the number of processing times is 7, and the number of calculation units is 4, so that P is 2, that is, two target data subsets may be acquired.
For example, the number of processing times is not greater than 4, and the number of calculation units is 4, so P is 1, that is, one target data subset is acquired.
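The two examples above can be checked with a minimal sketch of the rounding-up rule. The function name `num_target_subsets` is hypothetical and not part of the application; the sketch only assumes the stated relation that P is M divided by N, rounded up.

```python
def num_target_subsets(m: int, n: int) -> int:
    """Return P = ceil(M / N) using integer arithmetic only.

    m: number of processing times M (positive integer)
    n: number of computing units N (integer greater than 1)
    """
    if m < 1:
        raise ValueError("M must be a positive integer")
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    # (m + n - 1) // n is the usual integer form of rounding m / n upward.
    return (m + n - 1) // n


print(num_target_subsets(7, 4))  # 2
print(num_target_subsets(4, 4))  # 1
```

Integer arithmetic is used instead of floating-point division so that the result stays exact for large M.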
It should be noted that the number N of computing units may be user-defined according to the algorithm that the electronic device needs to execute. An appropriate parallelism N is determined by comprehensively considering factors such as the power consumption requirement and performance requirement of the electronic device, the number of computations, and the resource overhead of each group of computing units, so as to configure an appropriate number of computing units.
In the embodiment of the application, according to the processing parallelism N of the matching processing performed by the electronic device and the processing times M of the matching processing required on the first data set, the number P of target data subsets to be acquired can be determined as $P = \lceil M/N \rceil$. This reduces the number of times that the electronic device reads the second data set from the memory, further reduces the power consumption required by the whole matching process flow, and improves the efficiency of the matching process flow.
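To make the saving concrete, the following sketch compares read counts under a simple counting model. The model is an assumption and is not stated in the application: the related-art approach is taken to read K second data in each of the M rounds, while the scheme here reads P subsets of K + N - 1 second data each, a subset length that agrees with the 15-element subsets of the Fig. 3 example (K = 12, N = 4). Both function names are hypothetical.

```python
def reads_related_art(m: int, k: int) -> int:
    """Assumed model: every one of the M rounds re-reads K second data."""
    return m * k


def reads_with_subsets(m: int, n: int, k: int) -> int:
    """Assumed model: P = ceil(M / N) subsets, each of K + N - 1 second data."""
    p = (m + n - 1) // n
    return p * (k + n - 1)


# Values taken from the Fig. 3 example: M = 7, N = 4, 12 first data.
print(reads_related_art(7, 12))      # 84
print(reads_with_subsets(7, 4, 12))  # 30
```

Under this model the benefit grows with N, at the cost of more computing units and registers.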
Step 106: matching each target data subset of the P target data subsets with the first data set through N computing units.
In this embodiment of the present application, the data of each of the P target data subsets is data in the second data set, and each target data subset is matched with the first data set by using the N computing units. That is, only one target data subset is matched with the first data set in the same period, and after the matching processing of one target data subset with the first data set is completed, the matching processing of another target data subset with the first data set continues. The number of second data in the target data subset is larger than the number of first data in the first data set, and the N computing units perform multi-channel matching computation on the target data subset and the first data set, so as to match each second data in the target data subset against each first data in the first data set.
Specifically, in the matching process, the matching process is performed on the target data subset and the data in the first data set by using N computing units, where the N computing units can process the data synchronously at the same processing time.
Illustratively, the number of computing units is four. At the same computing moment, first data in the first data set is input into each of the four computing units, and four second data in the target data subset are respectively input into the four computing units, so that the four computing units can perform the matching computation synchronously, which reduces the time required by the matching processing.
Illustratively, the matching process of the first data set and the second data set is 3D denoising of the image, i.e. matching process of two frames of image data that are different in time domain. The matching process may also be performed on two spatially distinct frames of image data.
In the embodiment of the application, N computing units are configured in the electronic device, where N is an integer greater than 1. A first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set are acquired; P target data subsets in the second data set are acquired, where $P = \lceil M/N \rceil$; and each of the P target data subsets is matched with the first data set through the N computing units. With this scheme, the data does not need to be read again in each matching process, so that the number of data reads in the matching process is reduced and the working efficiency is improved.
In some embodiments of the present application, matching each of the P target data subsets with the first data set by the N computing units includes: at the Q-th clock, inputting N second data in the target data subset into the N computing units respectively according to the arrangement sequence, wherein the N second data comprise the (X-N+1)-th data to the X-th data in the target data subset, Q and X are positive integers, Q=X, and Q is greater than or equal to N; inputting first data in the first data set into the N computing units, wherein the first data is the data extracted from the first data set at the Q-th clock; and performing matching processing on the first data and the N second data through the N computing units. Optionally, inputting the first data in the first data set to the N computing units includes inputting one piece of first data in the first data set to each of the N computing units, that is, the N computing units hold the same first data.
In this embodiment of the present application, at each time in the process of performing the matching process on the target data subset and the first data set, the second data in the target data subset and the first data in the first data set are calculated by N calculation units.
Specifically, at the Q-th clock, the (X-N+1)-th to X-th second data in the target data subset are respectively input to the N computing units, and the X-th first data in the first data set is input to each of the N computing units; the X-th second data and the X-th first data are both data extracted at the Q-th clock. At this time, the X-th first data and a different second data are input into each computing unit, so that the N computing units can synchronously match N different second data in the target data subset with the same first data. At the (Q+1)-th clock, the (X-N+2)-th to (X+1)-th second data in the target data subset are respectively input to the N computing units, and the (X+1)-th first data in the first data set is input to each of the N computing units. The first data input to the N computing units changes with the computing clock, and the second data in each of the N computing units also changes with the computing clock, so that sliding computation over the data in the first data set and the target data subset is realized. For example, with 4 computing units, at the 4th clock the 1st to 4th second data in the target data subset are respectively input to the computing units and matched with the first data in the first data set, and at the 5th clock the 2nd to 5th second data in the target data subset are respectively input to the computing units and matched with the first data in the first data set.
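The index window described above can be sketched as a small helper. The name `second_data_indices` is hypothetical; the sketch only assumes the stated rule that, with Q = X and Q greater than or equal to N, the (X-N+1)-th to X-th second data are fed to the N computing units at the Q-th clock (1-based indices).

```python
def second_data_indices(q: int, n: int) -> list[int]:
    """1-based indices of the second data fed to the N units at clock Q (Q = X)."""
    if q < n:
        raise ValueError("Q must be greater than or equal to N")
    # The window covers X-N+1 .. X inclusive, with X equal to Q.
    return list(range(q - n + 1, q + 1))


print(second_data_indices(4, 4))  # [1, 2, 3, 4]
print(second_data_indices(5, 4))  # [2, 3, 4, 5]
```

Each clock shifts the window by one element, which is the sliding behaviour the paragraph describes.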
Illustratively, the number of computing units is 4, namely AU1, AU2, AU3 and AU4, the first data in the first data set is kernel data, and the second data in the target data subset is data. Each clock cycle corresponds to a moment, and the same kernel data is input to the 4 computing units in each clock cycle. In the 1st clock cycle, data1 is written into AU1, data2 into AU2, data3 into AU3 and data4 into AU4, and kernel1 is written synchronously into AU1, AU2, AU3 and AU4. In the 2nd clock cycle, data2 is written into AU1, data3 into AU2, data4 into AU3 and data5 into AU4, and kernel2 is written synchronously into AU1, AU2, AU3 and AU4, and so on, so that the data in the second data set slides through the N computing units until each first data in the first data set has been input into the computing units once, at which point the round of computation stops. The second data undergoes sliding computation through the 4 computing units in sequence, and the computing units synchronously compute the first data in the first data set against four different sequences of second data. After the 4 computing units have matched the target data subset with the first data set, 4 rounds of matching processing on the first data set are complete. When the number of processing times is 8, the matching processing of the first data set and the second data set can be completed by acquiring a target data subset once more.
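The sliding computation can be sketched as a short simulation. Two points are assumptions rather than content of the application: the matching operation is taken to be a sum of absolute differences, since the application does not specify the metric, and indices are 0-based, so unit i receives data[t + i] while every unit receives kernel[t] at clock t. The function name `sliding_match` is hypothetical.

```python
def sliding_match(kernel: list[int], data: list[int], n: int) -> list[int]:
    """Accumulate one match score per computing unit.

    At clock t, every unit receives kernel[t]; unit i receives data[t + i].
    The score (sum of absolute differences) is an assumed matching metric.
    """
    k = len(kernel)
    if len(data) < k + n - 1:
        raise ValueError("target data subset is too short for n units")
    scores = [0] * n
    for t in range(k):          # one clock per first data (kernel element)
        for i in range(n):      # the n units work in parallel in hardware
            scores[i] += abs(kernel[t] - data[t + i])
    return scores


kernel = [1, 2, 3]
data = [1, 2, 3, 4, 5, 6]   # len(kernel) + n - 1 elements for n = 4
print(sliding_match(kernel, data, 4))  # [0, 3, 6, 9]
```

A single pass over the target data subset yields N scores, one per offset, which corresponds to N of the M processing rounds.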
In the embodiment of the present application, the second data in the target data subset are sequentially input into the N computing units according to the computing time, and the first data in the first data set are synchronously input into the N computing units for computation, so that different computing units can match different second data against the same first data. After all the first data in the first data set have been input into the computing units once, N rounds of matching processing are complete. Reading the target data subset once therefore allows N rounds of matching processing on the first data set, which improves the matching processing efficiency and reduces the number of times data in the second data set is acquired.
In some embodiments of the present application, the electronic device further includes N registers, where the N registers are in one-to-one correspondence with the N computing units.
In this embodiment of the present application, the electronic device is further configured with the same number of registers as computing units. The N registers are sequentially connected, so that data can be transmitted between them, and the N registers are in one-to-one correspondence with the N computing units. In the process of inputting the N second data into the N computing units according to the arrangement sequence, each of the N computing units reads the data in its corresponding register.
Specifically, the N registers are sequentially connected, and data in the target data subset is transmitted in the N registers according to the computation time, so that each second data in the target data subset sequentially flows through the N registers. In the embodiment of the application, the N-level registers and the N computing units which are arranged in one-to-one correspondence with the N-level registers are arranged in the electronic equipment, in the data reading process, the data stored in the N-level registers can be respectively read through the N computing units, the data in the target data subset flows in the N-level registers, the data in the target data subset can sequentially enter the computing units to be calculated, the times required for acquiring the target data subset in the local memory are reduced, the power consumption caused by reading the data is reduced, and the processing speed is improved.
In some embodiments of the present application, before matching each of the P target data subsets with the first data set by the N computing units, the method further includes:
and respectively transmitting the N second data in the target data subset to N registers so that the second data in the N registers can be input into corresponding computing units at each clock.
In the embodiment of the application, the second data in the target data subset are respectively transmitted to the N registers, so that the corresponding N computing units can read the second data in the corresponding registers at each clock, and the second data in the target data subset are sequentially transmitted to the N registers along with the change of the clock.
Specifically, the number of the registers is the same as the number of the computing units, and the registers are matched with the computing units, so that the second data in each register can be input into the corresponding computing unit.
In the embodiment of the application, N-level registers are arranged in the electronic device, and the second data in the target data subset is input into the N-level registers so that it is transmitted through them. N rounds of matching processing can thus be performed each time the target data subset is acquired, which reduces the number of times the target data subset is acquired from the local memory, reduces the power consumption caused by reading the data, and improves the processing speed.
In some embodiments of the present application, the N registers are connected in sequence;
transmitting N second data in the target data subset to the N registers, respectively, comprising: at the Y-th clock, inputting the Z-th second data in the target data subset into the first register of the N registers, wherein Y and Z are positive integers, and Y=Z; and at each clock, the second data input into the N registers at the previous clock flows to the next register according to the arrangement sequence of the N registers, until it flows out of the N registers.
In this embodiment of the present application, the N registers are sequentially connected, and the second data in the target data subset can flow through the N registers. Specifically, the second data is input to the first register of the N registers and flows to the next register every time one clock passes, until it flows out of the N registers. In the process of inputting the data in the target data subset to the N registers, only the corresponding second data needs to be input to the first register at each clock, and at each clock the second data in each register flows to the next register, so that each second data passes through the N registers in sequence and each computing unit can read its corresponding second data at each clock.
Illustratively, taking N=4 as an example: at the 1st clock, the 1st second data is input to the first register; at the 2nd clock, the 2nd second data is input to the first register, and the 1st second data input at the 1st clock flows to the second register connected with the first register; at the 3rd clock, the 3rd second data is input to the first register, the 2nd second data flows to the second register, and the 1st second data flows to the third register; at the 4th clock, the 4th second data is input to the first register, the 3rd second data flows to the second register, the 2nd second data flows to the third register, and the 1st second data flows to the last register; at the 5th clock, the 5th second data is input to the first register, the 4th second data flows to the second register, the 3rd second data flows to the third register, the 2nd second data flows to the last register, and the 1st second data flows out of the last register.
Fig. 2 is a schematic diagram of N registers and N computing units provided in some embodiments of the present application. As shown in fig. 2, the number of registers is 4, namely reg0, reg1, reg2 and reg3, and the number of computing units is also 4, namely AU0, AU1, AU2 and AU3. The computing units correspond to the registers one by one; the second data of the target data subset is input into the 4 registers and, flowing through them in sequence, reaches the 4 computing units in turn. The first data kernel in the first data set is synchronously input into the 4 computing units and is computed with 4 different second data.
Illustratively, each stage of registers can store one second datum. At clock 1, data0 in the target data subset is registered into reg0. At clock 2, data0 is transferred from reg0 to reg1, and data1 is registered into reg0. At clock 3, data0 is transferred from reg1 to reg2, data1 is transferred from reg0 to reg1, and data2 is registered into reg0. At clock 4, data0 is transferred from reg2 to reg3, data1 is transferred from reg1 to reg2, data2 is transferred from reg0 to reg1, and data3 is registered into reg0. At clock 5, data0 is transferred out of reg3, data1 is transferred from reg2 to reg3, data2 is transferred from reg1 to reg2, data3 is transferred from reg0 to reg1, and data4 is registered into reg0. It can be seen that reg0 through reg3 are connected in sequence and the second data in the target data subset flows through the 4 registers in sequence. So at clock 1, computing unit AU0 reads data0. At clock 2, AU0 reads data1 and AU1 reads data0. At clock 3, AU0 reads data2, AU1 reads data1, and AU2 reads data0. At clock 4, AU0 reads data3, AU1 reads data2, AU2 reads data1, and AU3 reads data0. At clock 5, AU0 reads data4, AU1 reads data3, AU2 reads data2, and AU3 reads data1.
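The register flow above behaves like a shift register, which can be sketched as follows. The function name `simulate_registers` and the use of `None` for an empty register are assumptions made for illustration; AUi is taken to read regi at every clock, as in the one-to-one correspondence described for fig. 2.

```python
def simulate_registers(stream: list[str], n: int, clocks: int) -> list[list]:
    """Return the contents of reg0..reg(n-1) after each clock.

    At every clock the next element of `stream` enters reg0 and each
    earlier element moves one register onward; the element that was in
    the last register flows out. None marks a register that is still empty.
    """
    regs = [None] * n
    history = []
    for clk in range(clocks):
        incoming = stream[clk] if clk < len(stream) else None
        regs = [incoming] + regs[:-1]   # shift by one stage
        history.append(list(regs))
    return history


stream = [f"data{i}" for i in range(5)]
for clk, regs in enumerate(simulate_registers(stream, 4, 5), start=1):
    # Index i of each row is what computing unit AUi reads at this clock.
    print(f"clock {clk}: {regs}")
```

At clock 4 the row reads data3, data2, data1, data0, and at clock 5 it reads data4, data3, data2, data1, matching the reads listed for AU0 to AU3.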
Fig. 3 illustrates a schematic diagram of the flow of second data through the registers provided by some embodiments of the present application. As shown in fig. 3, the first data set includes 12 first data, namely kernelA, kernelB to kernelL, the number of processing times is 7, the number of computing units and the number of registers are both 4, and the 4 registers are reg0, reg1, reg2 and reg3. At each clock (CLK), each of the 4 computing units performs matching processing on the first data and the second data it holds. The number of target data subsets is 2, namely a first target data subset and a second target data subset, each of which includes 15 second data. The second data in the first target data subset are data0, data1 to data14, and the second data in the second target data subset are data4, data5 to data18. In the calculation process from clk3 to clk14, the data in the reg3 register completes the matching processing of data0 to data11 with kernelA to kernelL, the data in the reg2 register completes the matching processing of data1 to data12 with kernelA to kernelL, the data in the reg1 register completes the matching processing of data2 to data13 with kernelA to kernelL, and the data in the reg0 register completes the matching processing of data3 to data14 with kernelA to kernelL. At this point the first target data subset has completed its matching processing with the first data set, and the matching processing of the second target data subset with the first data set begins, following the same procedure. After the second target data subset completes the matching processing with the first data set, the first data set has completed 7 rounds of matching processing with the second data set.
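The index ranges in the Fig. 3 example can be reproduced with a short sketch. The general formulas are inferred from this single example and are assumptions: subset j is taken to start at data index j*N and to contain K + N - 1 second data, where K is the number of first data. Both function names are hypothetical.

```python
def target_subset_ranges(m: int, n: int, k: int) -> list[tuple[int, int]]:
    """Inclusive (first, last) second-data indices of each target data subset."""
    p = (m + n - 1) // n   # P = ceil(M / N)
    return [(j * n, j * n + k + n - 2) for j in range(p)]


def register_windows(start: int, n: int, k: int) -> dict[str, tuple[int, int]]:
    """Inclusive second-data range matched through each register for one subset.

    reg(n-1) holds the oldest datum, so it sees the window starting at `start`.
    """
    return {
        f"reg{r}": (start + n - 1 - r, start + n - 1 - r + k - 1)
        for r in range(n)
    }


# Fig. 3 values: M = 7 processing times, N = 4 units, K = 12 first data.
print(target_subset_ranges(7, 4, 12))  # [(0, 14), (4, 18)]
print(register_windows(0, 4, 12))
```

For the first subset the windows are data3 to data14 for reg0 and data0 to data11 for reg3, as in the text. The second subset provides four further offsets, of which only three are needed to reach the 7 processing times.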
In this embodiment of the application, the N registers are connected in sequence so that the second data flows through them, which ensures that every second data passes through the N registers in order. Acquiring the target data subset once therefore allows N rounds of matching processing with the first data set, which reduces the number of times data is acquired from the local storage area and lowers power consumption while improving the processing speed.
According to the data processing method provided by the embodiment of the application, the execution body can be a data processing device. In the embodiments of the present application, a method for executing data processing by a data processing device is taken as an example, and the data processing device provided in the embodiments of the present application is described.
In some embodiments of the present application, a data processing apparatus is provided, which is applied to an electronic device, where the electronic device includes N computing units, N is an integer greater than 1, fig. 4 shows a block diagram of a structure of the data processing apparatus according to an embodiment of the present application, and as shown in fig. 4, a data processing apparatus 400 includes:
an acquisition module 402, configured to acquire a first data set, a second data set, and the number of processing times M for performing matching processing on data in the first data set and the second data set, where M is a positive integer;
the acquisition module 402 being further configured to acquire P target data subsets in the second data set, where $P = \lceil M/N \rceil$ and P is a positive integer;
and a processing module 404, configured to match each of the P target data subsets with the first data set through N computing units.
In this embodiment of the present application, N computing units are provided in the electronic device, where N is an integer greater than 1. A first data set, a second data set, and a number of processing times M for performing matching processing on data in the first data set and the second data set are acquired; P target data subsets in the second data set are acquired; and each of the P target data subsets is matched with the first data set through the N computing units. With this scheme, the data does not need to be read again for each matching process, which reduces the number of data reads during matching and improves working efficiency.
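How the second data set might be divided into target data subsets can be sketched as follows. The text here does not give the relationship between P, M, and N, so the sketch makes two assumptions that are only inferred from the Fig. 3 example (M = 7, N = 4, two subsets of 15 second data starting at data0 and data4): that P = ceil(M / N), and that each subset starts N items after the previous one and holds enough items for N sliding windows. The function name `target_subsets` is hypothetical.

```python
import math


def target_subsets(second_data, num_first, m, n):
    """Split second_data into overlapping target data subsets.

    Assumptions (not stated in the text, inferred from the Fig. 3 example):
      - the number of subsets is P = ceil(M / N);
      - each subset holds num_first + N - 1 items, enough for N windows;
      - consecutive subsets start N items apart.
    """
    p = math.ceil(m / n)
    length = num_first + n - 1
    return [second_data[i * n : i * n + length] for i in range(p)]


second = [f"data{i}" for i in range(19)]
subsets = target_subsets(second, num_first=12, m=7, n=4)
for s in subsets:
    print(s[0], "...", s[-1], f"({len(s)} items)")
```

With these assumed rules the example yields data0 through data14 and data4 through data18, the two target data sets named in the Fig. 3 description.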
In some embodiments of the present application, the processing module 404 is configured to input, at the Q-th clock, N second data in the target data subset to the N computing units respectively, in their arrangement order, where the N second data are the (X-N+1)-th to the X-th data in the target data subset, Q and X are positive integers, Q = X, and Q is greater than or equal to N;
the processing module 404 is configured to input first data in the first data set to the N computing units, where the first data is the data extracted from the first data set at the Q-th clock;
and the processing module 404 is configured to perform matching processing on the first data and the N second data through the N computing units, respectively.
In this embodiment of the present application, the second data in the target data subset are input to the N computing units in sequence, clock by clock, while the first data in the first data set are input to the N computing units synchronously. Different computing units can thus match different second data against the same first data. Once every first data in the first data set has been input to the computing units once, N matching processes are complete: reading the target data subset a single time allows it to undergo N matching processes with the first data set, which improves matching efficiency and reduces the number of times data in the second data set is obtained.
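The effect described above, N matching processes from a single pass over the first data set, can be illustrated numerically. The sketch below assumes that "matching processing" is a multiply-accumulate, which the text does not specify; the function name `match_one_pass` and the indexing of computing units by window offset are likewise assumptions made for illustration.

```python
def match_one_pass(subset, first_data, n):
    """One pass over first_data; unit u matches first_data[k] with subset[k + u]."""
    acc = [0] * n
    for k, first in enumerate(first_data):   # one first data broadcast per clock
        window = subset[k : k + n]           # the N second data visible at this clock
        for u in range(n):
            # Hypothetical matching operation: multiply-accumulate.
            acc[u] += window[u] * first
    return acc


subset = list(range(15))   # stands in for data0 .. data14
first = [1] * 12           # stands in for kernelA .. kernelL
result = match_one_pass(subset, first, 4)

# Reference: compute each of the N sliding-window results independently.
reference = [sum(s * f for s, f in zip(subset[u : u + 12], first)) for u in range(4)]
print(result == reference)
```

In this example the 15 values of the subset are each fetched once, whereas fetching a separate 12-value window for each of the 4 matching processes would take 48 fetches.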
In some embodiments of the present application, the electronic device further includes N registers, where the N registers are in one-to-one correspondence with the N computing units.
In this embodiment of the present application, N stages of registers and N computing units in one-to-one correspondence with them are provided in the electronic device. During data reading, the N computing units read the data stored in the N registers respectively. Because the data in the target data subset flows through the N stages of registers, it enters the computing units in sequence for calculation. This reduces the number of times the target data subset must be fetched from the local memory, lowers the power consumed by reading data, and increases processing speed.
In some embodiments of the present application, the processing module 404 is configured to transmit the second data in the target data subset to the N registers respectively, so that the second data in the N registers can be input to the corresponding computing units at each clock.
In this embodiment of the present application, N stages of registers are provided in the electronic device, and the second data in the target data subset is input to the registers and passed along them. A target data subset therefore needs to be fetched only once for N matching processes to be performed, which reduces the number of times the target data subset must be fetched from the local memory, lowers the power consumed by reading data, and increases processing speed.
In some embodiments of the present application, the N registers are connected in sequence;
the processing module 404 is further configured to input, at the Y-th clock, the Z-th second data in the target data subset into the first of the N registers, where Y and Z are positive integers and Y = Z; and, at each clock, the second data input into the N registers at the previous clock flows to the next register in the order in which the N registers are arranged, until it flows out of the N registers.
In this embodiment of the present application, the N registers are connected in sequence, and the output of each register serves as the input of the next, so the second data in the target data subset can flow through the N registers. Specifically, each second data is input to the first of the N registers and moves to the next register at every clock until it flows out of the N registers. When the data in the target data subset is input to the N registers, only the corresponding second data needs to be input to the first register at each clock; at each clock the second data in every register moves to the next register, so each second data passes through the N registers in turn and each computing unit can read its corresponding second data at each clock. This reduces the number of times data is obtained from the local storage area and lowers power consumption while increasing processing speed.
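The sequentially connected registers behave like a shift register, which can be modelled in software with a bounded double-ended queue. This is a minimal sketch, assuming a hypothetical class name `RegisterChain` and string placeholders for the second data; it models only the data movement, not the hardware.

```python
from collections import deque


class RegisterChain:
    """Software model of N registers connected in sequence."""

    def __init__(self, n):
        # Index 0 is the first register; index n - 1 is the last.
        self.regs = deque([None] * n, maxlen=n)

    def clock(self, new_value):
        """Advance one clock: shift every value along and load new_value.

        Returns the value that flows out of the last register.
        """
        out = self.regs[-1]
        # With maxlen set, appendleft drops the right-most (last) element.
        self.regs.appendleft(new_value)
        return out


chain = RegisterChain(4)
outs = [chain.clock(f"data{i}") for i in range(6)]
print(list(chain.regs))
print(outs)
```

Only one new value is loaded per clock, yet every computing unit sees a fresh value in its own register, which is the property the paragraph above relies on.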
The data processing apparatus in the embodiments of the present application may be an electronic device, or may be a component in an electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, mobile Internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); it may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine. The embodiments of the present application are not specifically limited in this respect.
The data processing apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The data processing device provided in the embodiment of the present application can implement each process implemented by the foregoing method embodiment, and in order to avoid repetition, details are not repeated here.
Optionally, an embodiment of the present application further provides an electronic device. Fig. 5 shows a structural block diagram of the electronic device according to an embodiment of the present application. As shown in Fig. 5, the electronic device 500 includes a processor 502, a memory 504, and a program or instructions stored in the memory 504 and executable on the processor 502. When executed by the processor 502, the program or instructions implement each process of the foregoing method embodiment and achieve the same technical effects; to avoid repetition, details are not repeated here.
The electronic device in the embodiments of the present application includes both mobile electronic devices and non-mobile electronic devices.
Fig. 6 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 600 includes, but is not limited to: radio frequency unit 601, network module 602, audio output unit 603, input unit 604, sensor 605, display unit 606, user input unit 607, interface unit 608, memory 609, and processor 610.
Those skilled in the art will appreciate that the electronic device 600 may further include a power source (e.g., a battery) for powering the various components. The power source may be logically connected to the processor 610 through a power management system, so that functions such as charge management, discharge management, and power consumption management are performed by the power management system. The electronic device structure shown in Fig. 6 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components, which is not described in detail here.
The processor 610 is configured to acquire a first data set, a second data set, and a number of processing times M for performing matching processing on data in the first data set and the second data set, where M is a positive integer;
the processor 610 is configured to acquire P target data subsets in the second data set, where P is a positive integer;
and the processor 610 is configured to match each of the P target data subsets with the first data set through the N computing units.
In this embodiment of the present application, N computing units are provided in the electronic device, where N is an integer greater than 1. A first data set, a second data set, and a number of processing times M for performing matching processing on data in the first data set and the second data set are acquired; P target data subsets in the second data set are acquired; and each of the P target data subsets is matched with the first data set through the N computing units. With this scheme, the data does not need to be read again for each matching process, which reduces the number of data reads during matching and improves working efficiency.
Further, the processor 610 is configured to input, at the Q-th clock, N second data in the target data subset to the N computing units respectively, in their arrangement order, where the N second data are the (X-N+1)-th to the X-th data in the target data subset, Q and X are positive integers, Q = X, and Q is greater than or equal to N;
the processor 610 is configured to input first data in the first data set to the N computing units, where the first data is the data extracted from the first data set at the Q-th clock;
and the processor 610 is configured to perform matching processing on the first data and the N second data through the N computing units, respectively.
In this embodiment of the present application, the second data in the target data subset are input to the N computing units in sequence, clock by clock, while the first data in the first data set are input to the N computing units synchronously. Different computing units can thus match different second data against the same first data. Once every first data in the first data set has been input to the computing units once, N matching processes are complete: reading the target data subset a single time allows it to undergo N matching processes with the first data set, which improves matching efficiency and reduces the number of times data in the second data set is obtained.
Further, the electronic device further includes N registers, where the N registers are in one-to-one correspondence with the N computing units.
In this embodiment of the present application, N stages of registers and N computing units in one-to-one correspondence with them are provided in the electronic device. During data reading, the N computing units read the data stored in the N registers respectively. Because the data in the target data subset flows through the N stages of registers, it enters the computing units in sequence for calculation. This reduces the number of times the target data subset must be fetched from the local memory, lowers the power consumed by reading data, and increases processing speed.
Further, the processor 610 is configured to transmit the N second data in the target data subset to the N registers, so that the second data in the N registers can be input to the corresponding computing units at each clock.
In this embodiment of the present application, N stages of registers are provided in the electronic device, and the second data in the target data subset is input to the registers and passed along them. A target data subset therefore needs to be fetched only once for N matching processes to be performed, which reduces the number of times the target data subset must be fetched from the local memory, lowers the power consumed by reading data, and increases processing speed.
Further, the N registers are connected in sequence;
the processor 610 is further configured to input, at the Y-th clock, the Z-th second data in the target data subset into the first of the N registers, where Y and Z are positive integers and Y = Z; and, at each clock, the second data input into the N registers at the previous clock flows to the next register in the order in which the N registers are arranged, until it flows out of the N registers.
In this embodiment of the present application, the N registers are connected in sequence, so the second data in the target data subset can flow through the N registers. Specifically, each second data is input to the first of the N registers and moves to the next register at every clock until it flows out of the N registers. When the data in the target data subset is input to the N registers, only the corresponding second data needs to be input to the first register at each clock; at each clock the second data in every register moves to the next register, so each second data passes through the N registers in turn and each computing unit can read its corresponding second data at each clock. This reduces the number of times data is obtained from the local storage area and lowers power consumption while increasing processing speed.
It should be understood that in the embodiment of the present application, the input unit 604 may include a graphics processor (Graphics Processing Unit, GPU) 6041 and a microphone 6042, and the graphics processor 6041 processes image data of still pictures or videos obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 607 includes at least one of a touch panel 6071 and other input devices 6072. The touch panel 6071 is also called a touch screen. The touch panel 6071 may include two parts of a touch detection device and a touch controller. Other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, application programs or instructions required for at least one function (such as a sound playing function or an image playing function), and the like. Further, the memory 609 may include volatile memory or nonvolatile memory, or the memory 609 may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synch Link DRAM (SLDRAM), or Direct Rambus RAM (DRRAM). The memory 609 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 610 may include one or more processing units; optionally, the processor 610 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, etc., and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 610.
The embodiment of the application further provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above method embodiment, and can achieve the same technical effects, so that repetition is avoided, and no further description is given here.
The processor is a processor in the electronic device in the above embodiment. Readable storage media include computer readable storage media such as Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic or optical disks, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or instructions, the processes of the above method embodiment are realized, the same technical effects can be achieved, and in order to avoid repetition, the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-level chips, system chips, chip systems, or system-on-chip chips, etc.
The embodiments of the present application provide a computer program product, which is stored in a storage medium, and the program product is executed by at least one processor to implement the respective processes of the above method embodiments, and achieve the same technical effects, and are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.
Claims (13)
1. A data processing method performed by an electronic device, the electronic device comprising N computing units, N being an integer greater than 1, the data processing method comprising:
acquiring a first data set, a second data set, and a number of processing times M for performing matching processing on data in the first data set and the second data set, wherein M is a positive integer;
acquiring P target data subsets in the second data set, wherein P is a positive integer;
and matching each target data subset of the P target data subsets with the first data set through the N computing units.
2. The data processing method according to claim 1, wherein said matching each of the P target data subsets with the first data set through the N computing units comprises: at a Q-th clock, inputting N second data in a target data subset to the N computing units respectively, in their arrangement order, wherein the N second data are the (X-N+1)-th to the X-th data in the target data subset, Q and X are positive integers, Q = X, and Q is greater than or equal to N;
inputting first data in the first data set to the N computing units, wherein the first data is data extracted from the first data set at the Q-th clock;
and performing matching processing on the first data and the N second data through the N computing units, respectively.
3. The data processing method according to claim 1, wherein the electronic device further includes N registers, the N registers being in one-to-one correspondence with the N computing units.
4. The data processing method according to claim 3, wherein, before said matching each of the P target data subsets with the first data set through the N computing units, the method further comprises:
transmitting N second data in the target data subset to the N registers, respectively.
5. The data processing method according to claim 4, wherein the N registers are connected in sequence;
the transmitting the N second data in the target data subset to the N registers respectively includes:
at a Y-th clock, inputting the Z-th second data in the target data subset into the first of the N registers, wherein Y and Z are positive integers and Y = Z;
and, at each clock, the second data input into the N registers at the previous clock flows to the next register in the order in which the N registers are arranged, until it flows out of the N registers.
6. A data processing apparatus applied to an electronic device, the electronic device comprising N computing units, N being an integer greater than 1, the data processing apparatus comprising:
an acquisition module, configured to acquire a first data set, a second data set, and a number of processing times M for performing matching processing on data in the first data set and the second data set, wherein M is a positive integer;
the acquisition module being further configured to acquire P target data subsets in the second data set, wherein P is a positive integer;
and the processing module is used for matching each target data subset in the P target data subsets with the first data set through the N computing units.
7. The data processing apparatus according to claim 6, wherein,
the processing module is configured to input, at a Q-th clock, N second data in the target data subset to the N computing units respectively, in their arrangement order, wherein the N second data are the (X-N+1)-th to the X-th data in the target data subset, Q and X are positive integers, Q = X, and Q is greater than or equal to N;
the processing module is configured to input first data in the first data set to the N computing units, wherein the first data is data extracted from the first data set at the Q-th clock;
and the processing module is configured to perform matching processing on the first data and the N second data through the N computing units, respectively.
8. The data processing apparatus of claim 6, wherein the electronic device further comprises N registers, the N registers being in one-to-one correspondence with the N computing units.
9. The data processing apparatus according to claim 8, wherein,
and the processing module is used for respectively transmitting the second data in the target data subset to the N registers.
10. The data processing apparatus according to claim 9, wherein the N registers are connected in sequence;
the processing module is configured to input, at a Y-th clock, the Z-th second data in the target data subset into the first of the N registers, wherein Y and Z are positive integers and Y = Z;
and, at each clock, the second data input into the N registers at the previous clock flows to the next register in the order in which the N registers are arranged, until it flows out of the N registers.
11. An electronic device, comprising:
a processor and a memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the data processing method according to any one of claims 1 to 5.
12. A readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the data processing method according to any of claims 1 to 5.
13. A chip comprising a processor and a communication interface, the communication interface and the processor being coupled, the processor being configured to execute programs or instructions for implementing the steps of the data processing method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310324754.1A CN116304744A (en) | 2023-03-30 | 2023-03-30 | Data processing method, device, electronic equipment, readable storage medium and chip |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116304744A (en) | 2023-06-23 |
Family
ID=86824001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310324754.1A Pending CN116304744A (en) | 2023-03-30 | 2023-03-30 | Data processing method, device, electronic equipment, readable storage medium and chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116304744A (en) |
- 2023-03-30: CN application CN202310324754.1A, published as CN116304744A; status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||