EP3120243A1 - Verfahren und vorrichtung zur unterstützung mit code-optimierung und -parallelisierung - Google Patents
Verfahren und Vorrichtung zur Unterstützung mit Code-Optimierung und -Parallelisierung (Method and device for assisting with code optimization and parallelization)
- Publication number
- EP3120243A1 (application EP15709476.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- code
- optimized
- application
- versions
- different
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims abstract description 49
- 238000005457 optimization Methods 0.000 claims description 17
- 238000004590 computer program Methods 0.000 claims description 5
- 238000010586 diagram Methods 0.000 claims description 5
- 238000000513 principal component analysis Methods 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 description 16
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000013213 extrapolation Methods 0.000 description 8
- 238000005259 measurement Methods 0.000 description 8
- 238000012512 characterization method Methods 0.000 description 7
- 238000000605 extraction Methods 0.000 description 7
- 238000004458 analytical method Methods 0.000 description 6
- 238000012545 processing Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000009313 farming Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000005192 partition Methods 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 230000002194 synthesizing effect Effects 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000001364 causal effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000005206 flow analysis Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3428—Benchmarking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/83—Indexing scheme relating to error detection, to error correction, and to monitoring the solution involving signatures
Definitions
- the invention relates to the field of software engineering for parallel architecture, and in particular that of assistance with optimization and code parallelization.
- Code optimization typically involves making changes to the code in order to reduce resource requirements, reduce execution times, or improve power consumption.
- various tools exist to help with code optimization.
- For sequential architectures it is known to use a description in C language of a sequential algorithm.
- code optimization requires human intervention.
- human intervention often introduces variability in the quality of the code generated for each parallel computing architecture. This variability raises various problems, in particular when comparing two parallel architectures, where the result of the analysis is subjective because it depends strongly on the developer's expertise in the architectures studied.
- Another problem is related to the prediction of the performance of a new application code on several target architectures; the prediction can be imprecise because it depends on the human expertise of the developer.
- the Tanabe patent application US 2009/0138862 A1 proposes a device for aiding parallelization, which performs a dependency analysis to extract the opportunities for parallelization within a program.
- the parallelization opportunities correspond to the statically possible parallelizations for a given application. No indication as to the method of parallelizing, nor as to the potential gains, is provided. As such, the expertise related to parallelization is not taken into account.
- An object of the present invention is to provide a method for synthesizing and formalizing the expertise of developers of parallel architectures, so that any developer can estimate the performance and consumption of application codes on various computing architectures.
- the technical advantages of the present invention are to allow an estimation of the performance and the consumption of application codes on various computation architectures, without requiring the intervention of expert developers, nor the porting of codes on the architectures concerned.
- the device of the present invention makes it possible to assist a developer in the effort of porting a code from a source architecture to a target architecture, starting from a non-optimized application code written in a language that can be compiled natively on a reference platform, such as C, C++ or FORTRAN for example.
- the device of the invention advantageously comprises a database of existing experimental measurements that can be enriched.
- the measurements are either made by the process operator or imported from outside experiments.
- Each experimental measurement consists of evaluating the performance of several reference application codes on several target architectures.
- Each reference application code is available in a non-optimized and sequential version, allowing direct performance evaluation on a single core of each target architecture.
- Each reference code is also available in parallelized version and optimized for each target architecture.
- the invention will find application in studies on the choice, the implementation and the achievable performance when porting applications to new architectures.
- the invention will apply to the industrial field where the application codes often evolve less rapidly than the parallel computing architectures, and where the problem of porting existing application code to new parallel architectures is crucial.
- the present invention makes it possible to assist manufacturers in the porting of "business" application codes to advanced parallel architectures whose complexity may be difficult to master.
- the method of the invention makes it possible to qualify and compare new parallel architectures in order to better understand an offer available on the market.
- a method of assisting the optimization and parallelization of the code of an application running on a computer includes the steps of: comparing a portion of code representing a hot spot of the application to a plurality of non-optimized code versions to determine a correlation with at least one non-optimized code version; and generating, from said at least one non-optimized code version, performance predictions for different architectures and different parallel programming models for said hot spot.
- the comparing step includes calculating a correlation coefficient between said hot spot and the plurality of non-optimized code versions.
- the comparison step comprises a step of generating a signature for said hot spot and comparing the signature with a plurality of signatures associated with the plurality of non-optimized code versions.
- the comparison step between the signatures is performed according to a principal component analysis (PCA).
- the signatures associated with the plurality of non-optimized code versions contain at least metrics relating to the stability of a data stream, to a parallelization ratio, to a re-use distance of the data stream and to a volume of data.
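As an illustration of the comparison step above, the following minimal Python sketch represents a hot-spot signature as a vector of the four claimed metrics and computes a correlation coefficient against reference signatures; all names and numerical values are hypothetical and are not taken from the patent.

```python
import numpy as np

# Hypothetical signatures: (data-flow stability, parallelization ratio,
# data-flow reuse distance, data volume) -- names and values are illustrative.
reference_sigs = {
    "max_3x3":        np.array([0.95, 0.30, 100.0, 2.0e6]),
    "deriche_filter": np.array([0.60, 0.10, 900.0, 2.0e6]),
    "matrix_mult":    np.array([0.99, 0.80,  64.0, 8.0e6]),
}
hotspot_sig = np.array([0.92, 0.35, 128.0, 2.1e6])

# Normalize each metric over the whole set so no single metric dominates.
all_sigs = np.vstack([hotspot_sig] + list(reference_sigs.values()))
mean, std = all_sigs.mean(axis=0), all_sigs.std(axis=0)
norm = lambda s: (s - mean) / std

# Pearson correlation coefficient between the hot spot and each reference version.
scores = {name: float(np.corrcoef(norm(hotspot_sig), norm(sig))[0, 1])
          for name, sig in reference_sigs.items()}
print(max(scores, key=scores.get), scores)   # reference version with the highest correlation
```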
- the plurality of non-optimized code versions are stored in a reference database, where each non-optimized code version is a non-optimized code version for a reference platform and is associated with different code versions optimized and parallelized on different architectures and according to different parallel programming models.
- the different code versions, optimized and parallelized on different architectures and according to different parallel programming models, are stored in a porting database, and the step of generating predictions consists in extracting from said porting database the porting data for said non-optimized code version.
- the method further comprises a step that makes it possible to display the result of the predictions for a user.
- the result is displayed as Kiviat diagrams.
- the method may include an initial step of receiving an executable code of an application to be optimized and parallelized and a step of detecting in the executable code of a portion of code representing a hotspot.
- the invention also covers a device which comprises means for implementing the method.
- the invention may operate in the form of a computer program product that includes code instructions for performing the claimed process steps when the program is run on a computer.
- Figure 1 schematically shows a device in which the invention can be implemented
- Figure 2 shows a sequence of steps of the method of the invention in one embodiment
- FIG. 3 illustrates, in the form of radar diagrams, the result of the method of the invention for an example application.
- the device of the invention comprises an extraction module (102) able to analyze a non-optimized executable code representative of an application and to extract hot spots from the code.
- Hot spots are portions of the code that penalize the performance of the application. These portions typically represent the smallest number of code lines for the greatest share of the run time.
- Hot spots are non-optimized code portions representing discernible and compact phases of the original application.
- the executable code entering on the extraction module is a code generated, by a compilation device, from the source code of the application to be analyzed.
- the executable code can be either a file available in the direct environment of the device (100), stored on an internal disk of a computer implementing the device and operated by a user, or a file from a near or remote external source.
- Executable code can come from a compiler that translates source code written in C/C++ or Fortran.
- the executable code is executed by an emulator in order to extract the appropriate characteristics.
- the device (100) performing the analysis of the executable code of this application emulates the execution of the executable code on its dataset.
- the application dataset is the input image.
- the extraction module (102) is coupled to a characterization module (104) capable of characterizing the hot spots extracted from the code.
- the characterization of hot spots consists of calculating a signature for each hot spot extracted from the incoming code.
- the characterization module is also coupled to a database (106) of reference micro-kernels.
- the base (106) is an empirical knowledge base of known optimization and parallelization techniques, coming either from the process operator or from external sources, and consisting of reference micro-kernels.
- the knowledge base contains six reference micro-kernels chosen to cover the algorithmic space of computer vision as widely as possible.
- the reference micro-kernels are chosen according to several parameters: the type of data access (for example a linear or random traversal of the input image); the regularity of the data (for example whether the nature of the calculations is predictable before execution or, on the contrary, depends on intermediate results at run time); and the complexity of the data (for example the number of different calculations performed on a single datum, such as each pixel of an image).
- Each reference micro-kernel has a non-optimized code version that corresponds to a basic way of coding on a reference platform, and different code versions optimized and parallelized on different architectures.
- the reference platform is an x86 processor.
- Input images are generated randomly and measurements are made on different image sizes. The multitude of parameters on the input images makes it possible to characterize the micro-kernel's algorithm independently of its inputs.
- the database of measurements that is obtained has four input axes: (1) the target architecture, (2) the micro-kernel, (3) the size of the input data set and (4) the type of parallelization and optimization relative to the different programming models (for example, data-level or task-level parallelization).
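A minimal sketch of how such a four-axis measurement database could be held in memory is given below; the architectures, kernels and figures are invented for illustration and are not the patent's measured data.

```python
# Hypothetical in-memory view of the measurement database described above,
# keyed by its four input axes; values hold the measured quantities.
measurements = {
    # (architecture, kernel, input size, parallelization) : results
    ("x86_multicore", "max_3x3", (1024, 1024), "openmp_data"): {
        "exec_time_s": 0.012, "speedup_vs_seq": 6.8, "energy_j": 0.9,
    },
    ("gpu_cuda", "max_3x3", (1024, 1024), "cuda_data"): {
        "exec_time_s": 0.003, "speedup_vs_seq": 27.1, "energy_j": 0.4,
    },
}

def lookup(arch, kernel, size, model):
    """Return the stored measurement for one point of the 4-axis space."""
    return measurements.get((arch, kernel, size, model))

print(lookup("gpu_cuda", "max_3x3", (1024, 1024), "cuda_data"))
```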
- Micro-kernels can come from outside sources, provided by or retrieved from developers around the world, in order to accumulate past expertise. The choice of micro-kernels is made according to the field of application in order to increase the precision and relevance of the process.
- the characterization module (104) calculates a signature for any execution of executable code on an input dataset.
- the module calculates the signature of each reference micro-kernel of the knowledge base on each of its input datasets.
- the calculation of the signatures of the reference micro-kernels is done only once, when a reference micro-kernel is integrated into the database, as part of a calibration process. This calculation is performed before using the device on an input application.
- the characterization module makes it possible to calculate the signatures of the extracted hot spots by executing the executable code of the input application with its data set.
- the output of the signature module 104 is coupled to the input of a correlation module (108), which is itself coupled to the base of reference micro-kernels.
- the correlation module makes it possible to establish correlations between the signature of a portion of code extracted from the code of the input application and the signatures of the reference micro-kernels of the knowledge base 106.
- the output of the correlation module is coupled to the input of an extrapolation module (110).
- the extrapolation module is also coupled to a porting database (112) which contains the data relating to the porting of the reference micro-kernels to various parallel architectures.
- the porting architectures are representative of a panel of existing parallel architectures.
- the extrapolation module makes it possible, by extracting the appropriate data from the porting database 112, to make predictions or projections of the performance of the kernels extracted from the incoming code, per architecture and per parallel programming model.
- the result of extrapolations is then available at the output of the extrapolation module and can be presented to the user in various forms such as that illustrated for example in Figure 3 by Kiviat diagrams.
- the data contained in the reference database also makes it possible to produce statistical predictions of the performance of the application once it has been parallelized, on measures such as execution time, the number of resources used or, for example, power consumption.
- Figure 2 illustrates the steps performed by method 200 of the invention in a preferred implementation.
- the method begins with a step (202) for receiving an executable code representative of the application.
- the executable code can be in C, C++ or Fortran, or any other language compilable natively on the reference machine.
- the code to be analyzed is a non-optimized code.
- the method makes it possible to search for hot spots in the code.
- the application kernels that are extracted will be the parts of the code that will be optimized as will be detailed later.
- the step of extracting application kernels consists in breaking down the code and searching for long continuous portions of program execution ("discernible portions") that involve a minimum number of program instructions ("compact portions").
- the extraction step is performed with a tool based on a functional x86 processor emulator.
- well-known profiling and sampling tools such as GProf or OProfile can also be used to perform program hot-spot extraction.
- the static instructions of the code are extracted to keep only the portions corresponding to the original source code.
- the method makes it possible to test whether the hot spot found covers a major part of the code of the application.
- the method repeats the step of searching for and extracting hot spots from the remainder of the code.
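The coverage test and iteration described above can be sketched as follows, assuming the profile has already been reduced to (portion, self time) pairs, for example from a GProf or OProfile flat profile; the threshold and data are illustrative, not the patent's.

```python
def select_hot_spots(profile, coverage_target=0.80):
    """Greedily pick the code portions that account for most of the run time.

    `profile` is a list of (portion_name, self_time_seconds) pairs.  Portions
    are accumulated, largest first, until the coverage target is reached.
    """
    total = sum(t for _, t in profile)
    hot_spots, covered = [], 0.0
    for name, t in sorted(profile, key=lambda p: p[1], reverse=True):
        if covered / total >= coverage_target:
            break
        hot_spots.append(name)
        covered += t
    return hot_spots, covered / total

profile = [("convolve", 6.2), ("normalize", 1.1), ("io_read", 0.4), ("main", 0.3)]
print(select_hot_spots(profile))   # -> (['convolve', 'normalize'], 0.9125)
```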
- the step of searching and extracting hot spots is done in the traces of dynamic instructions.
- the next step (206) allows the characterization of the extracted kernels by calculating a signature representative of each hot spot.
- the signature is computed using the same emulator as used for the extraction step, and contains several metrics: (1) the stability of the data stream, (2) the parallelization ratio, (3) the reuse distance of the data stream and (4) the data volume.
- the stability of the data stream is an indicator of the average number of producer locations for each of the instructions. It captures whether the calculations follow a fixed data-flow pattern or whether the data is subject to complex address calculations. In the latter case, stream-oriented architectures such as GPUs would not be effective targets. In addition, poor data-flow stability can lead to limited parallelization possibilities, because it means that many dependencies are only revealed during execution.
- the parallelization ratio calculates on an ideal data flow graph the ratio between the ideal parallelism width and the number of executed instructions. A high value of this indicator means high parallelization possibilities.
- the data-stream reuse distance gives the average time for which one byte of data must be stored before reuse. This measurement is evaluated on an ideal data-flow graph; it makes it possible to know the ideal data locality of a kernel and to determine whether the kernel would favor a high-bandwidth or a low-latency architecture.
- the data volume evaluates the total amount of data that the code processes. This information is important because the other signature parameters are independent of the data volume, all being calculated relative to the number of executed instructions.
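As a sketch of metric (2), the following fragment computes a parallelization ratio on a toy ideal data-flow graph, assuming the "ideal parallelism width" is the number of executed instructions divided by the critical-path length; this interpretation and the graph are assumptions, not the patent's exact definition.

```python
# Ideal data-flow graph of a toy kernel: instruction -> instructions it depends on.
deps = {
    "i1": [], "i2": [], "i3": ["i1", "i2"],
    "i4": [], "i5": ["i4"], "i6": ["i3", "i5"],
}

memo = {}
def depth(node):
    """Length of the longest dependency chain ending at `node` (critical path)."""
    if node not in memo:
        memo[node] = 1 + max((depth(d) for d in deps[node]), default=0)
    return memo[node]

critical_path = max(depth(n) for n in deps)       # 3 for this toy graph
ideal_width = len(deps) / critical_path           # assumed: instructions / critical path
parallelization_ratio = ideal_width / len(deps)   # ratio of ideal width to executed instructions
print(critical_path, ideal_width, parallelization_ratio)   # 3 2.0 0.333...
```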
- these metrics are as hardware-independent as possible, in order to measure application-related information rather than architecture-related information.
- the synthetic metrics in this embodiment come from a richer intermediate representation consisting of a time-folded graph of the set of interactions between the different instructions of the input executable code.
- the intermediate representation is kept with the signature so that new metrics can be recalculated quickly without having to repeat step 206.
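A minimal sketch of building such a time-folded interaction graph from a dynamic instruction trace: repeated producer/consumer interactions between the same static instructions are folded into a single weighted edge. The trace format and identifiers below are hypothetical.

```python
from collections import Counter

# Dynamic trace: (static instruction id, destination, sources) per executed instruction.
trace = [
    ("I1", "r1", []), ("I2", "r2", ["r1"]), ("I1", "r1", []),
    ("I2", "r2", ["r1"]), ("I3", "r3", ["r2", "r1"]),
]

last_writer = {}   # register/address -> static instruction that produced it
edges = Counter()  # (producer static instr, consumer static instr) -> occurrence count

for instr, dest, sources in trace:
    for src in sources:
        if src in last_writer:
            edges[(last_writer[src], instr)] += 1   # fold repeated interactions over time
    last_writer[dest] = instr

print(dict(edges))   # e.g. {('I1', 'I2'): 2, ('I2', 'I3'): 1, ('I1', 'I3'): 1}
```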
- step 206 makes it possible to assign a signature to each application kernel of the non-optimized code version.
- step (208) consists in comparing the previously computed signature of an application kernel with the reference micro-kernel signatures.
- the method makes it possible to search the reference micro-kernel database 106 using the signature of a non-optimized code version, and to correlate a non-optimized application kernel with a non-optimized reference micro-kernel.
- the correlation calculation between the signatures is performed according to principal component analysis (PCA).
- the method makes it possible to select, for each application kernel, the closest reference micro-kernel, by retaining the reference micro-kernel at the optimum distance from the application kernel.
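A minimal sketch of this selection step: the signatures are standardized, projected onto their leading principal components (here via an SVD), and the reference micro-kernel closest to the application kernel in that space is retained. The signature values below are invented for illustration.

```python
import numpy as np

# Rows: signatures (one per reference kernel, plus the application kernel under study).
names = ["max_3x3", "deriche", "fgl", "quadtree", "integral", "matmul", "app_kernel"]
S = np.array([
    [0.95, 0.30, 100.0, 2.0e6],
    [0.60, 0.10, 900.0, 2.0e6],
    [0.62, 0.12, 800.0, 2.0e6],
    [0.40, 0.25, 300.0, 1.5e6],
    [0.85, 0.20, 400.0, 2.0e6],
    [0.99, 0.80,  64.0, 8.0e6],
    [0.93, 0.33, 120.0, 2.1e6],
])

# Standardize each metric, then project onto the first two principal components.
Z = (S - S.mean(axis=0)) / S.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Z @ Vt[:2].T                      # coordinates in the two leading components

app = P[-1]
dists = np.linalg.norm(P[:-1] - app, axis=1)
print("closest reference kernel:", names[int(dists.argmin())])
```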
- the next step (212) consists, for each non-optimized application kernel, in extrapolating the performance of the non-optimized code to the target architectures by referring to the data in the porting database for the selected reference micro-kernel.
- the extrapolated performances are essentially the consumption and the speed of execution of a program, after parallelization and optimization.
- the extrapolation consists in extracting from the porting database 112 the relevant data for the non-optimized micro-kernel studied. Extrapolation allows the estimation of performance based on concrete, empirical portings resulting from field expertise.
- the result of the extrapolation (214) can be presented to the user in a variety of forms to enable selection of the target platform best suited to their constraints.
- FIG. 3 illustrates results obtained by the method of the invention as part of an analysis of a code relating to an image processing application.
- the reference micro-kernel base (106) is composed of the following six micro-kernels:
- the 'Max 3x3' kernel is well known to those skilled in the art; it is a 2D filter whose memory accesses outnumber its arithmetic operations.
- the 'Deriche Filter' and 'FGL Filter' kernels are 1D filters (respectively ×8 and ×4). These filters have crossed horizontal and vertical access patterns, and their dependencies are causal and anti-causal.
- the 'Quad-tree variance calculation' kernel is an algorithm that partitions the image into zones of low variance. This algorithm exhibits a recursive behavior, in that it partitions the areas of the image with high variance ever more finely. By construction, this algorithm is also strongly dependent on the data (the values of the pixels of the image).
- the 'Integral Image' kernel is an algorithm that calculates, for each destination pixel, the sum of all the source pixels above and to the left of the destination pixel. This algorithm exhibits a diagonal dependency scheme present in many image processing algorithms (a minimal sketch of this recurrence is given after this list of kernels).
- the 'Matrix Multiplication' micro-kernel is a well-known algorithm that exhibits a very characteristic 3D access pattern.
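For reference, the classical integral-image (summed-area table) recurrence matching the 'Integral Image' description above is sketched below; it makes the diagonal dependency on three previously computed values explicit. This is the textbook formulation, not code taken from the patent.

```python
def integral_image(src):
    """Each output pixel is the sum of all source pixels above and to its left
    (inclusive); note the diagonal dependency on three previously computed values."""
    h, w = len(src), len(src[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            up     = ii[y - 1][x]     if y > 0 else 0
            left   = ii[y][x - 1]     if x > 0 else 0
            upleft = ii[y - 1][x - 1] if y > 0 and x > 0 else 0
            ii[y][x] = src[y][x] + up + left - upleft
    return ii

print(integral_image([[1, 2], [3, 4]]))   # [[1, 3], [4, 10]]
```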
- OpenMP (Open Multi-Processing)
- Farming
- OpenCL (Open Computing Language)
- CUDA (Compute Unified Device Architecture)
- the farming model was developed in C using the PThread library. In this model, the task to be performed is split into many independent subtasks, which are executed on a smaller number of worker threads.
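The patent's farming model is written in C with PThread; the following is only a minimal Python analogue of the same idea, with many independent subtasks dispatched to a smaller pool of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile_id):
    """Stand-in for one independent subtask (e.g. one image tile)."""
    return tile_id * tile_id

subtasks = range(64)                               # many independent subtasks...
with ThreadPoolExecutor(max_workers=4) as pool:    # ...executed on fewer worker threads
    results = list(pool.map(process_tile, subtasks))

print(sum(results))
```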
- OpenCL and CUDA are standard languages used to program Graphics Processing Units (GPUs).
- OpenCL is also used for Intel® multiprocessors.
- the input data set in the example shown corresponds to images whose size is in the range of 256×256 up to 2048×2048 pixels.
- the parameters of the reference base used by the method are:
- a prediction of the performance is performed. This prediction provides insight into the best architecture and programming model to use. Once a 'programming model / architecture' pair is chosen, speedup measurements (m_speedup) can be extracted from the database. To calculate the final execution time on the target platform (Predicted_time), a sequential performance ratio (arch_factor) between the reference architecture and the tested architecture is also needed.
- variable 'Seq_ref_time' represents the sequential execution time of the portions of the application outside the hot spots.
- the variable 'Seq_kernel_ref_time' represents the sequential execution time of the code portions of the application corresponding to the hot spots.
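One plausible way to combine these quantities is an Amdahl-style formula, sketched below; the exact expression used by the method is not given in this text, so this combination is an assumption.

```python
def predicted_time(seq_ref_time, seq_kernel_ref_time, arch_factor, m_speedup):
    """Assumed combination of the quantities defined above: the part outside the
    hot spots is only rescaled by the sequential performance ratio of the target,
    while the hot-spot part is additionally divided by the measured speedup.
    The exact formula used by the method is not given in this text."""
    return arch_factor * (seq_ref_time + seq_kernel_ref_time / m_speedup)

# e.g. 1 s outside the hot spots, 9 s inside, target core 0.8x as fast, speedup 12x
print(predicted_time(1.0, 9.0, 1.0 / 0.8, 12.0))   # ~2.19 s
```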
- a correlation between the extracted kernels and the reference micro-kernels provides a confidence coefficient that can be used to determine whether the selected reference kernel is actually very close to the application kernel.
- the method of the invention also makes it possible to compute correlations between the reference kernels themselves and thus to evaluate maximum and average values for the confidence coefficient, the minimum values always being zero and corresponding to comparisons of kernels with themselves.
- the selected reference kernels are considered good candidates when their confidence coefficient (in comparison with the application kernels) is below the minimum confidence coefficient between two distinct reference kernels.
- FIG. 3 shows, for each of the four architectures studied, the results obtained by applying the method of the invention according to seven parameters: (302) multi-core performance; (304) single-core efficiency; (306) number of cores; (308) energy efficiency; (310) ease of porting; (312) memory capacity; (314) regularity of performance. Even without a detailed forecast of application performance, these visual diagrams provide the user with a quick comparison between the four target platforms on these parameters and help select the most promising platform.
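A Kiviat (radar) diagram such as those in FIG. 3 can be produced, for one platform, with a short matplotlib sketch like the following; the axis labels follow the seven parameters above and the scores are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

axes_labels = ["Multi-core perf.", "Single-core eff.", "Number of cores",
               "Energy eff.", "Ease of porting", "Memory capacity", "Perf. regularity"]
scores = [0.8, 0.6, 0.9, 0.5, 0.4, 0.7, 0.6]        # illustrative values for one platform

angles = np.linspace(0, 2 * np.pi, len(axes_labels), endpoint=False)
angles_closed = np.append(angles, angles[0])        # close the polygon
scores_closed = np.append(scores, scores[0])

ax = plt.subplot(polar=True)
ax.plot(angles_closed, scores_closed)
ax.fill(angles_closed, scores_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(axes_labels)
plt.show()
```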
- although the illustrated example is essentially a performance prediction calculation, the method of the invention can also be run for other predictions, such as, for example, latency measurements.
- the support can be electronic, magnetic, optical or electromagnetic, or be an infrared-type propagation medium.
- Such supports are, for example, semiconductor memories (RAM, ROM), magnetic or optical tapes, or disks (Compact Disk - Read Only Memory (CD-ROM), Compact Disk - Read/Write (CD-R/W) and DVD).
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Devices For Executing Special Programs (AREA)
- Stored Programmes (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1452304A FR3018932B1 (fr) | 2014-03-20 | 2014-03-20 | Procede et dispositif d'aide a l'optimisation et la parallelisation de code |
PCT/EP2015/055040 WO2015140021A1 (fr) | 2014-03-20 | 2015-03-11 | Procede et dispositif d'aide a l'optimisation et la parallelisation de code |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3120243A1 true EP3120243A1 (de) | 2017-01-25 |
Family
ID=51303071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15709476.4A Withdrawn EP3120243A1 (de) | 2014-03-20 | 2015-03-11 | Verfahren und vorrichtung zur unterstützung mit code-optimierung und -parallelisierung |
Country Status (4)
Country | Link |
---|---|
US (1) | US20170090891A1 (de) |
EP (1) | EP3120243A1 (de) |
FR (1) | FR3018932B1 (de) |
WO (1) | WO2015140021A1 (de) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2539961B (en) * | 2015-07-03 | 2022-03-02 | Fujitsu Ltd | Code hotspot encapsulation |
JP6953800B2 (ja) | 2016-07-08 | 2021-10-27 | 富士通株式会社 | シミュレーションジョブを実行するためのシステム、コントローラ、方法、及びプログラム |
CN107451213A (zh) * | 2017-07-17 | 2017-12-08 | 广州特道信息科技有限公司 | 舆情分析方法及装置 |
US10878082B2 (en) | 2019-03-25 | 2020-12-29 | Aurora Labs Ltd. | Dynamic CFI using line-of-code behavior and relation models |
US11775317B2 (en) * | 2021-04-30 | 2023-10-03 | International Business Machines Corporation | Locate neural network performance hot spots |
FR3122752B1 (fr) * | 2021-05-05 | 2023-09-29 | Centre Nat Etd Spatiales | Procédé mis en œuvre par ordinateur pour déterminer automatiquement une architecture cible. |
CN113852814B (zh) * | 2021-07-19 | 2023-06-16 | 南京邮电大学 | 数据级和任务级融合的并行解码方法、装置及存储介质 |
WO2023234952A1 (en) * | 2022-06-03 | 2023-12-07 | Google Llc | Caching compilation outputs using optimization profiles |
-
2014
- 2014-03-20 FR FR1452304A patent/FR3018932B1/fr not_active Expired - Fee Related
-
2015
- 2015-03-11 WO PCT/EP2015/055040 patent/WO2015140021A1/fr active Application Filing
- 2015-03-11 EP EP15709476.4A patent/EP3120243A1/de not_active Withdrawn
- 2015-03-11 US US15/126,820 patent/US20170090891A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
None * |
See also references of WO2015140021A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2015140021A1 (fr) | 2015-09-24 |
FR3018932B1 (fr) | 2016-12-09 |
US20170090891A1 (en) | 2017-03-30 |
FR3018932A1 (fr) | 2015-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015140021A1 (fr) | Procede et dispositif d'aide a l'optimisation et la parallelisation de code | |
Qasaimeh et al. | Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels | |
EP1704476B1 (de) | System zum automatischen erzeugen optimierter codes | |
Wang et al. | CAVBench: A benchmark suite for connected and autonomous vehicles | |
Nardi et al. | Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | |
Verma et al. | Performance evaluation of deep learning compilers for edge inference | |
EP2805234B1 (de) | Verfahren zur optimierung der parallelen verarbeitung von daten auf einer hardwareplattform | |
Antikainen et al. | Nonnegative tensor factorization accelerated using GPGPU | |
GB2555673A (en) | Image patch matching using probabilistic sampling based on an oracle | |
US11398015B2 (en) | Iterative image inpainting with confidence feedback | |
WO2015183851A1 (en) | Combining compute tasks for a graphics processing unit | |
Gutiérrez-Zaballa et al. | On-chip hyperspectral image segmentation with fully convolutional networks for scene understanding in autonomous driving | |
US10754630B2 (en) | Build-time code section-specific compiler selection | |
US11443118B2 (en) | Word embedding method and apparatus, and word search method | |
Yatskou et al. | Simulation modelling and machine learning platform for processing fluorescence spectroscopy data | |
Peredo et al. | Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming | |
Ortega et al. | High performance computing for optical diffraction tomography | |
WO2022187843A1 (en) | Methods and systems for raman spectra-based identification of chemical compounds | |
Saini et al. | Bang for the Buck: Evaluating the Cost-Effectiveness of Heterogeneous Edge Platforms for Neural Network Workloads | |
Arunachalam et al. | End-to-end industrial IoT: software optimization and acceleration | |
Alawneh et al. | Ice simulation using GPGPU | |
Tang | TensorRT inference performance study in MLModelScope | |
Danopoulos et al. | A quantitative comparison for image recognition on accelerated heterogeneous cloud infrastructures | |
Gouin et al. | Threewise: a local variance algorithm for GPU | |
US20240220571A1 (en) | Vectorized sparse convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20160916 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20170717 |
|
R17C | First examination report despatched (corrected) |
Effective date: 20171017 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20201001 |