EP3120243A1 - Method and device for assisting with code optimization and parallelization - Google Patents

Method and device for assisting with code optimization and parallelization

Info

Publication number
EP3120243A1
Authority
EP
European Patent Office
Prior art keywords
code
optimized
application
versions
different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15709476.4A
Other languages
English (en)
French (fr)
Inventor
Alexandre GUERRE
Yves LHUILLIER
Jean-Thomas AQUAVIVA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Commissariat a lEnergie Atomique et aux Energies Alternatives CEA
Original Assignee
Commissariat a lEnergie Atomique et aux Energies Alternatives CEA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Commissariat a lEnergie Atomique et aux Energies Alternatives CEA

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3428 Benchmarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/83 Indexing scheme relating to error detection, to error correction, and to monitoring the solution involving signatures

Definitions

  • the invention relates to the field of software engineering for parallel architecture, and in particular that of assistance with optimization and code parallelization.
  • Code optimization typically involves changing the code in order to reduce resource requirements, shorten function execution times, or lower power consumption.
  • tools help with code optimization.
  • For sequential architectures it is known to use a description in C language of a sequential algorithm.
  • code optimization requires human intervention.
  • human intervention often introduces a variability in the quality of the codes generated for each of the parallel computing architectures. This variability raises various problems, in particular that related to the comparison of two parallel architectures where the result of the analysis is subjective because strongly dependent on the expertise of the developer of the architectures studied.
  • Another problem is related to the prediction of the performances of a new application code for several target architectures, the prediction can be imprecise because dependent on the human expertise of the
  • the Tanabe patent application US 2009/0138862 A1 proposes a device for aiding parallelization, which performs a dependency analysis to extract the opportunities for parallelization within a program.
  • the parallelization opportunities correspond to the statistically possible parallelizations for a given application. No indication as to the method of parallelizing, nor as to the potential gains, is provided. As such, the expertise related to parallelization is not taken into account.
  • An object of the present invention is to provide a method for synthesizing and formalizing the expertise of parallel-architecture developers, so that any developer can estimate the performance and power consumption of application codes on various computing architectures.
  • the technical advantages of the present invention are to allow an estimation of the performance and the consumption of application codes on various computation architectures, without requiring the intervention of expert developers, nor the porting of codes on the architectures concerned.
  • the device of the present invention makes it possible to assist a developer in porting code from a source architecture to a target architecture, starting from non-optimized application code written in a language that can be compiled natively on a reference platform, such as C, C++ or FORTRAN.
  • the device of the invention advantageously comprises a database of existing experimental measurements that can be enriched.
  • the measurements are either made by the process operator or imported from outside experiments.
  • Each experimental measurement consists of evaluating the performance of several reference application codes on several target architectures.
  • Each reference application code is available in a non-optimized and sequential version, allowing direct performance evaluation on a single core of each target architecture.
  • Each reference code is also available in parallelized version and optimized for each target architecture.
  • the invention will find application in studies of the choice, implementation and achievable performance of porting applications to new architectures.
  • the invention will apply to the industrial field where the application codes often evolve less rapidly than the parallel computing architectures, and where the problem of porting existing application code to new parallel architectures is crucial.
  • the present invention makes it possible to assist manufacturers in the porting of "business" application codes to advanced parallel architectures whose complexity may be difficult to master.
  • the method of the invention makes it possible to qualify and compare new parallel architectures in order to better understand an offer available on the market.
  • a method of assisting the optimization and code parallelization of an application running on a computer includes the steps of: comparing a portion of code representing a hot spot of an application with a plurality of non-optimized code versions to determine a correlation with at least one non-optimized code version; and generating, from said at least one non-optimized code version, performance predictions for different architectures and different parallel programming models for said hot spot.
  • the comparing step includes calculating a correlation coefficient between said hot spot and the plurality of non-optimized code versions.
  • the comparison step comprises a step of generating a signature for said hot spot and comparing the signature with a plurality of signatures associated with the plurality of non-optimized code versions.
  • the comparison step between the signatures is performed according to a principal component analysis (PCA).
  • the signatures associated with the plurality of non-optimized code versions contain at least metrics relating to the stability of a data stream, to a parallelization ratio, to a re-use distance of the data stream and to a volume of data.
  • the plurality of non-optimized code versions are stored in a reference database where each non-optimized code version is a non-optimized code version for a reference platform and is associated with different optimized code versions and parallelized on different architectures and according to different models of parallel programming.
  • the different code versions optimized and parallelized on different architectures and according to different parallel programming models are stored in a porting database, and the step of generating predictions consists in extracting, from said porting database, porting data for said non-optimized code version.
  • the method further comprises a step that makes it possible to display the result of the predictions for a user.
  • the result is displayed as Kiviat diagrams.
  • the method may include an initial step of receiving an executable code of an application to be optimized and parallelized and a step of detecting in the executable code of a portion of code representing a hotspot.
  • the invention also covers a device which comprises means for implementing the method.
  • the invention may operate in the form of a computer program product that includes code instructions for performing the claimed process steps when the program is run on a computer.
  • Figure 1 schematically shows a device in which the invention can be implemented
  • Figure 2 shows a sequence of steps of the method of the invention in one embodiment
  • FIG. 3 illustrates in the form of radar diagrams the result of the method of the invention for an example of application.
  • the device of the invention comprises an extraction module (102) able to analyze a non-optimized executable code representative of an application and to extract hot spots from the code.
  • Hot spots are portions of the code that penalize the performance of the application. These portions typically account for the greatest share of run time in the smallest number of code lines.
  • Hot spots are unoptimized code portions representing discernable and compact phases of the original application.
  • the executable code entering on the extraction module is a code generated, by a compilation device, from the source code of the application to be analyzed.
  • the executable code can be either a file available in the direct environment of the device (100), stored on an internal disk of a computer implementing the device and operated by a user, or a file from a nearby or remote external source.
  • Executable code can come from a compiler that compiles source code written in C, C++ or Fortran.
  • the executable code is executed by an emulator in order to extract the appropriate characteristics.
  • the device (100) performing the analysis of the executable code of this application emulates the execution of the executable code on its dataset.
  • the application dataset is the input image.
  • the extraction module (102) is coupled to a characterization module (104) capable of characterizing the hot spots extracted from the code.
  • the characterization of hot spots consists of calculating a signature for each hot spot extracted from the incoming code.
  • the characterization module is also coupled to a database (106) of reference micro-kernels.
  • the base (106) is an empirical knowledge base of known optimization and parallelization techniques, either from the process operator or from external sources, consisting of reference micro-kernels.
  • the knowledge base contains six reference micro-kernels chosen to cover the algorithmic space of vision as widely as possible.
  • the reference micro-kernels are chosen according to several parameters: the type of data access (for example, a linear or random traversal of the input image); the regularity of the data (for example, whether the nature of the calculations is predictable before execution or instead depends on intermediate results at run time); and the complexity of the data (for example, the number of different calculations performed on a single datum, such as each pixel of an image).
  • Each reference micro-kernel has a non-optimized code version that corresponds to a basic way of coding on a reference platform and different code versions optimized and parallelized on different architectures.
  • the reference platform is an x86 processor.
  • Input images are generated randomly and measurements are made on different image sizes. Varying the parameters of the input images in this way makes it possible to characterize the micro-kernel algorithm independently of its inputs.
  • the database of measurements that is obtained has four input axes: (1) the target architecture, (2) the micro-kernel, (3) the size of the input data set and (4) the type of parallelization and optimization relative to different programming models (for example, data-level or task-level parallelization).
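  • As a minimal sketch of how such a four-axis measurement database could be organized, the following Python fragment keys each measurement by architecture, kernel, input size and programming model. The names `MeasurementKey`, `record` and `speedups_for`, the architecture labels and every numeric value are hypothetical placeholders introduced for illustration; the text does not specify a storage format.

```python
from typing import Dict, NamedTuple, Tuple


class MeasurementKey(NamedTuple):
    """One point on the four input axes of the measurement database."""
    architecture: str   # (1) target architecture
    kernel: str         # (2) reference kernel
    input_size: int     # (3) size of the input data set (image side, in pixels)
    model: str          # (4) parallel programming model


# Hypothetical in-memory store: key -> measured speedup over the sequential version.
measurements: Dict[MeasurementKey, float] = {}


def record(architecture: str, kernel: str, input_size: int, model: str,
           speedup: float) -> None:
    """Store one experimental measurement."""
    measurements[MeasurementKey(architecture, kernel, input_size, model)] = speedup


def speedups_for(kernel: str, input_size: int) -> Dict[Tuple[str, str], float]:
    """Return {(architecture, model): speedup} for one kernel and input size."""
    return {
        (key.architecture, key.model): value
        for key, value in measurements.items()
        if key.kernel == kernel and key.input_size == input_size
    }


# Placeholder values for illustration only, not measured results.
record("arch_A", "integral_image", 512, "OpenMP", 3.0)
record("arch_B", "integral_image", 512, "CUDA", 9.0)
record("arch_A", "matrix_multiplication", 512, "OpenMP", 3.5)
```

  • Querying by kernel and input size returns one entry per 'programming model / architecture' pair, which is the shape of data a later extrapolation step would need.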
  • Micro-kernels can come from outside sources, provided by or retrieved from developers around the world to accumulate past expertise. The choice of micro-kernels is made according to a field of application in order to increase the precision and the relevance of the process.
  • the characterization module (104) calculates a signature for any execution of executable code on an input dataset.
  • the module calculates the signature of each reference micro-kernel of the knowledge base on each of its input datasets.
  • the signatures of the reference micro-kernels are calculated only once, when the reference micro-kernel is integrated into the database, as a calibration process. This calculation is performed before the device is used on an input application.
  • the characterization module makes it possible to calculate the signatures of the extracted hot spots by executing the executable code of the input application with its data set.
  • the output of the characterization module (104) is coupled to the input of a correlation module (108), which is itself coupled to the base of reference micro-kernels.
  • the correlation module makes it possible to establish correlations between the signature of a portion of code extracted from the code of the input application and the signatures of the reference micro-kernels of the knowledge base (106).
  • the output of the correlation module is coupled to the input of an extrapolation module (110).
  • the extrapolation module is also coupled to a porting database (112) which contains the data relating to the porting of reference micro-kernels to various parallel architectures.
  • the porting architectures are representative of a panel of existing parallel architectures.
  • the extrapolation module makes it possible, by extracting appropriate data from the porting database (112), to make predictions or projections of the performance of the kernels extracted from the incoming code on the different architectures and for each parallel programming model.
  • the result of extrapolations is then available at the output of the extrapolation module and can be presented to the user in various forms such as that illustrated for example in Figure 3 by Kiviat diagrams.
  • the data contained in the reference database also makes it possible to produce statistical predictions of the performance of the application once it has been parallelized, on measures such as execution times, a number of monopolized resources or, for example, power consumption.
  • Figure 2 illustrates the steps performed by method 200 of the invention in a preferred implementation.
  • the method begins with a step (202) for receiving an executable code representative of the application.
  • the executable code can be in C, C ++ or Fortran language or any other language compilable natively on the reference machine.
  • the code to be analyzed is a non-optimized code.
  • the method makes it possible to search for hot spots in the code.
  • the application kernels that are extracted will be the parts of the code that will be optimized as will be detailed later.
  • the step of extracting application kernels consists in breaking down the code and searching for long continuous portions of program execution ("discernible portions") that involve a minimum number of program instructions ("compact portions").
  • the extraction step is performed with a tool based on a functional x86 processor emulator.
  • other tools that perform program hot-spot extraction can be used, such as the well-known profiling and sampling tools GProf or OProfile.
  • the static instructions of the code are extracted to keep only the portions corresponding to the original source code.
  • the method makes it possible to test whether the hot spot found covers a major part of the code of the application.
  • the method repeats the step of searching for and extracting hot spots from the remainder of the code.
  • the step of searching and extracting hot spots is done in the traces of dynamic instructions.
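  • The iterative search for hot spots until they cover a major part of the execution can be sketched as follows. This is a simplified illustration, not the emulator-based tool of the invention: it assumes the dynamic instruction trace has already been reduced to one code-region label per executed instruction, and the function name `find_hot_spots` and the 80 % coverage threshold are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def find_hot_spots(trace: Sequence[str],
                   coverage: float = 0.8) -> Tuple[List[str], float]:
    """Pick the most-executed regions until they cover `coverage` of the trace.

    `trace` holds one code-region label per executed (dynamic) instruction.
    Returns the selected regions and the fraction of the trace they cover.
    """
    total = len(trace)
    if total == 0:
        return [], 0.0
    hot: List[str] = []
    covered = 0
    # Regions are visited from the most to the least executed.
    for region, count in Counter(trace).most_common():
        if covered / total >= coverage:
            break  # the hot spots found so far cover a major part of the run
        hot.append(region)
        covered += count
    return hot, covered / total


# Toy trace: one loop dominates the execution.
example_trace = ["loop"] * 90 + ["init"] * 6 + ["io"] * 4
```

  • On the toy trace, a single region is enough to reach the threshold, which mirrors the idea of a compact portion of code accounting for most of the run time.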
  • the next step (206) allows the characterization of the extracted kernels by calculating a signature representative of each hot spot.
  • the signature is computed using the same emulator as used for the extraction step, and contains several metrics: (1) the stability of the data stream, (2) the parallelization ratio, (3) the reuse distance of the data stream and (4) the data volume.
  • the stability of the data stream is an indicator of the average number of producer locations for each of the instructions. It captures if the calculations follow a fixed data flow circuit or if data is subjected to complex address calculations. In the latter case, continuous architectures such as GPUs would not be effective targets. In addition, poor data flow stability can lead to limited parallelization possibilities because it means that many dependencies are revealed during execution.
  • the parallelization ratio calculates on an ideal data flow graph the ratio between the ideal parallelism width and the number of executed instructions. A high value of this indicator means high parallelization possibilities.
  • the data stream reuse distance gives the average time that one byte of data must be stored before reuse. This measurement is evaluated on an ideal data flow graph; it reveals the ideal data locality of a kernel and indicates whether the kernel would favor a high-bandwidth or a low-latency architecture.
  • the data volume evaluates the total amount of data that the code processes. This information is important because the other signature parameters are independent of the data volume, all being calculated in relation to the number of executed instructions.
  • these metrics are hardware-independent as much as possible in order to measure application-related information rather than architecture-related information.
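  • To make the reuse-distance metric concrete, here is a rough sketch that averages the time between successive accesses to the same address. It is an approximation under stated assumptions: the text evaluates this metric on an ideal data flow graph, whereas this sketch simply uses the position in a flat access trace as the time axis. The function name `average_reuse_distance` is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def average_reuse_distance(accesses: Sequence[Hashable]) -> Optional[float]:
    """Average number of time steps between two accesses to the same address.

    `accesses` lists one address per time step. Returns None when no address
    is ever reused, since the distance is then undefined.
    """
    last_seen: Dict[Hashable, int] = {}
    distances: List[int] = []
    for step, address in enumerate(accesses):
        if address in last_seen:
            # Time the datum had to be kept since its previous use.
            distances.append(step - last_seen[address])
        last_seen[address] = step
    if not distances:
        return None
    return sum(distances) / len(distances)
```

  • A short distance suggests good locality, while a long one means data must be held for a long time before reuse.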
  • the synthetic metrics in this embodiment come from a richer intermediate representation consisting of a time-folded graph of the set of interactions between the different instructions of the input executable code.
  • an intermediate representation is kept with the signature so that new metrics can be recalculated quickly without repeating step 206.
  • step 206 makes it possible to assign a signature to each application kernel of the non-optimized code version.
  • step (208) is to compare the previously computed signature for an application kernel with the reference micro-kernel signatures.
  • the method makes it possible to search the reference micro-kernel database (106) by the signature of a non-optimized code version, and to correlate a non-optimized application kernel with a non-optimized reference micro-kernel.
  • the correlation calculation between the signatures is performed according to principal component analysis (PCA).
  • the method makes it possible to select, for each application kernel, the closest reference micro-kernel by retaining the reference micro-kernel whose distance to the application kernel is optimal.
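  • Selecting the closest reference kernel can be sketched as a nearest-neighbor search over signatures. Note the substitution: the text performs the comparison with a principal component analysis, whereas this sketch uses a simpler stand-in, a Euclidean distance over metrics scaled by their standard deviation. The signature tuples (stability, parallelization ratio, reuse distance, data volume) and all values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Signature = Sequence[float]


def column_stds(references: Dict[str, Signature]) -> List[float]:
    """Standard deviation of each metric across the reference signatures."""
    rows = list(references.values())
    stds = []
    for i in range(len(rows[0])):
        column = [row[i] for row in rows]
        mean = sum(column) / len(column)
        variance = sum((x - mean) ** 2 for x in column) / len(column)
        stds.append(math.sqrt(variance) or 1.0)  # avoid dividing by zero
    return stds


def scaled_distance(a: Signature, b: Signature, stds: Sequence[float]) -> float:
    """Euclidean distance after scaling each metric by its spread."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds)))


def closest_reference(hot_spot: Signature,
                      references: Dict[str, Signature]) -> str:
    """Name of the reference kernel whose signature is nearest to the hot spot."""
    stds = column_stds(references)
    return min(references,
               key=lambda name: scaled_distance(hot_spot, references[name], stds))


# Hypothetical signatures, for illustration only.
reference_signatures = {
    "max_3x3": (0.90, 0.80, 2.0, 100.0),
    "quad_tree": (0.20, 0.10, 50.0, 400.0),
    "matrix_multiplication": (0.95, 0.60, 30.0, 900.0),
}
```

  • Scaling by the standard deviation keeps a metric with large raw values, such as data volume, from dominating the distance.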
  • the next step (212) is, for each non-optimized kernel, to extrapolate the performance of the non-optimized code to the target architectures by referring to data in the porting database for the selected closest reference micro-kernel.
  • the extrapolated performances are essentially the power consumption and the speed of execution of a program, after parallelization and optimization.
  • the extrapolation consists in extracting from the porting database (112) the relevant data for the non-optimized micro-kernel studied. Extrapolation allows performance to be estimated from concrete, empirical ports that embody domain expertise.
  • the result of the extrapolation (214) can be presented to the user in a variety of forms to enable selection of the appropriate target platform for its constraints.
  • FIG. 3 illustrates results obtained by the method of the invention as part of an analysis of a code relating to an image processing application.
  • the reference micro-kernel base (106) is composed of the following six micro-kernels:
  • the 'Max 3x3' kernel is well known to those skilled in the art as a 2D memory-access filter, which performs more memory accesses than operations.
  • the 'Deriche Filter' and 'FGL Filter' kernels are respectively x8 and x4 1D filters. These filters have horizontal and vertical cross access patterns and their dependencies are causal and anticausal.
  • the 'Quad-tree variance calculation' kernel is an algorithm that partitions the image into zones of low variance. This algorithm is recursive in that it partitions areas of the image with high variance more and more finely. By construction, this algorithm is also strongly dependent on the data (the pixel values of the image).
  • the 'Integral image' kernel is an algorithm that calculates, for each destination pixel, the sum of all the source pixels above and to the left of the destination pixel. This algorithm exhibits a diagonal dependency scheme present in many image processing algorithms.
  • the 'Matrix Multiplication' micro-kernel is a well-known algorithm that displays a very characteristic 3D access pattern.
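  • As an illustration of one of these reference kernels, a non-optimized sequential integral image can be written as below. The sketch assumes the common inclusive convention (the destination pixel's own source value is part of the sum), which the text does not state explicitly. The recurrence shows the dependency of each output on its upper, left and upper-left neighbors.

```python
from typing import List


def integral_image(src: List[List[int]]) -> List[List[int]]:
    """Sum of all source pixels above and to the left of each pixel (inclusive)."""
    height, width = len(src), len(src[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            up = out[y - 1][x] if y > 0 else 0
            left = out[y][x - 1] if x > 0 else 0
            diagonal = out[y - 1][x - 1] if y > 0 and x > 0 else 0
            # The upper-left block is counted twice (in `up` and in `left`),
            # so it is subtracted once.
            out[y][x] = src[y][x] + up + left - diagonal
    return out
```

  • Because every output depends on three previously computed neighbors, pixels can only be processed in parallel along anti-diagonals, which is what makes this kernel a useful probe of an architecture.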
  • OpenMP (Open Multi-Processing)
  • Farming
  • OpenCL (Open Computing Language)
  • CUDA (Compute Unified Device Architecture)
  • the farming model was developed in C using the PThread library. In this model, the task to be performed is split into many independent subtasks, which are executed on a smaller number of worker threads.
  • OpenCL and CUDA are standard languages used to program Graphics Processing Units (GPUs).
  • OpenCL is also used for Intel® multiprocessors.
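  • The farming model described above (many independent subtasks served by fewer worker threads) can be sketched with a thread pool. This is not the C/PThread implementation mentioned in the text: it is a Python illustration of the scheduling structure only, and the row-wise split and the names `row_max` and `farm` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def row_max(row: List[int]) -> int:
    """One independent subtask: reduce a single image row."""
    return max(row)


def farm(image: List[List[int]], workers: int = 4) -> List[int]:
    """Run one subtask per row on a pool of `workers` threads.

    There are typically many more rows (subtasks) than workers; each worker
    picks up the next pending subtask when it finishes the previous one.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the input order in its results.
        return list(pool.map(row_max, image))
```

  • In standard CPython the global interpreter lock prevents CPU-bound threads from running truly in parallel, so this sketch shows how work is distributed rather than delivering a speedup.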
  • the input data set in the example shown corresponds to images whose size is in the range of 256 * 256 up to 2048 * 2048 pixels.
  • the parameters of the reference base used by the method are:
  • a prediction of the performance is performed. This prediction provides insight into the best architecture and programming model to use. Once a 'programming model / architecture' pair is chosen, speedup measurements (m_speedup) can be extracted from the database. To calculate the final execution time on the target platform (Predicted_time), a sequential performance ratio (arch_factor) between the reference architecture and the tested architecture is also needed.
  • variable 'Seq_ref_time' represents the sequential execution time of the portions of the application outside the hot spots.
  • the variable 'Seq_kernel_ref_time' represents the sequential execution time of the code portions of the application corresponding to the hot spots.
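  • The formula that combines these variables does not appear in the text as reproduced here. The sketch below is therefore only an assumption: an Amdahl-style combination in which the hot-spot time is divided by the measured speedup, the rest of the application stays sequential, and the whole is scaled by arch_factor, taken here as the ratio of sequential time on the target to sequential time on the reference.

```python
def predicted_time(seq_ref_time: float,
                   seq_kernel_ref_time: float,
                   m_speedup: float,
                   arch_factor: float) -> float:
    """Assumed estimate of the execution time on the target platform.

    seq_ref_time        -- sequential time outside the hot spots (reference)
    seq_kernel_ref_time -- sequential time of the hot spots (reference)
    m_speedup           -- measured speedup for the chosen model/architecture
    arch_factor         -- assumed: target sequential time / reference
                           sequential time
    """
    if m_speedup <= 0 or arch_factor <= 0:
        raise ValueError("m_speedup and arch_factor must be positive")
    # Only the hot spots benefit from parallelization; the rest runs as is.
    return arch_factor * (seq_ref_time + seq_kernel_ref_time / m_speedup)
```

  • With this assumed form, a large speedup on the hot spots still leaves the sequential remainder as a floor on the predicted time.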
  • the correlation between an extracted kernel and the selected reference kernel provides a confidence coefficient that can be used to determine whether the selected reference kernel is actually close to the application kernel.
  • the method of the invention also makes it possible to perform a correlation between the reference kernels and thus to evaluate maximum and average values for the confidence coefficient, the minimum values always being zero and corresponding to comparisons of kernels with themselves.
  • the selected reference kernels are considered good candidates when their confidence coefficient (in comparison with the application kernels) is below the minimum confidence coefficient between two distinct reference kernels.
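  • The acceptance rule above can be sketched as follows, treating the confidence coefficient as a distance between signatures (zero for a kernel compared with itself). The plain Euclidean distance and the function names are assumptions made for illustration; the text does not define how the coefficient is computed.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

Signature = Sequence[float]


def coefficient(a: Signature, b: Signature) -> float:
    """Assumed confidence coefficient: Euclidean distance between signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_good_candidate(application: Signature, selected: str,
                      references: Dict[str, Signature]) -> bool:
    """Accept the selected reference kernel only if it is closer to the
    application kernel than any two distinct reference kernels are to each
    other. Requires at least two reference kernels."""
    threshold = min(
        coefficient(references[first], references[second])
        for first, second in combinations(references, 2)
    )
    return coefficient(application, references[selected]) < threshold
```

  • The threshold adapts to the reference base: the more densely the reference kernels cover the signature space, the stricter the acceptance test becomes.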
  • FIG. 3 shows, for each of the four architectures studied, the results obtained by operating the method of the invention according to seven parameters: (302) Multi-core performance; (304) Single-core efficiency; (306) Number of cores; (308) Energy efficiency; (310) Ease of porting; (312) Memory capacity; (314) Regularity of performance. Even without a detailed forecast of application performance, these visual diagrams provide a user with a quick comparison between the four target platforms for these parameters and help select the most promising platform.
  • although the illustrated example essentially concerns a performance prediction calculation, the method of the invention can also be run for other predictions, such as latency measurements.
  • the medium can be electronic, magnetic, optical, electromagnetic or an infrared-type propagation medium.
  • Such media are, for example, Random Access Memory (RAM), Read-Only Memory (ROM), magnetic or optical tapes, diskettes or disks (Compact Disk - Read Only Memory (CD-ROM), Compact Disk - Read/Write (CD-R/W) and DVD).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)
EP15709476.4A 2014-03-20 2015-03-11 Method and device for assisting with code optimization and parallelization Withdrawn EP3120243A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1452304A FR3018932B1 (fr) 2014-03-20 2014-03-20 Method and device for assisting with code optimization and parallelization
PCT/EP2015/055040 WO2015140021A1 (fr) 2014-03-20 2015-03-11 Method and device for assisting with code optimization and parallelization

Publications (1)

Publication Number Publication Date
EP3120243A1 (de) 2017-01-25

Family

ID=51303071

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15709476.4A Withdrawn EP3120243A1 (de) 2014-03-20 2015-03-11 Method and device for assisting with code optimization and parallelization

Country Status (4)

Country Link
US (1) US20170090891A1 (de)
EP (1) EP3120243A1 (de)
FR (1) FR3018932B1 (de)
WO (1) WO2015140021A1 (de)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2539961B (en) * 2015-07-03 2022-03-02 Fujitsu Ltd Code hotspot encapsulation
JP6953800B2 (ja) 2016-07-08 2021-10-27 富士通株式会社 シミュレーションジョブを実行するためのシステム、コントローラ、方法、及びプログラム
CN107451213A (zh) * 2017-07-17 2017-12-08 广州特道信息科技有限公司 舆情分析方法及装置
US10878082B2 (en) 2019-03-25 2020-12-29 Aurora Labs Ltd. Dynamic CFI using line-of-code behavior and relation models
US11775317B2 (en) * 2021-04-30 2023-10-03 International Business Machines Corporation Locate neural network performance hot spots
FR3122752B1 (fr) * 2021-05-05 2023-09-29 Centre Nat Etd Spatiales Procédé mis en œuvre par ordinateur pour déterminer automatiquement une architecture cible.
CN113852814B (zh) * 2021-07-19 2023-06-16 南京邮电大学 数据级和任务级融合的并行解码方法、装置及存储介质
WO2023234952A1 (en) * 2022-06-03 2023-12-07 Google Llc Caching compilation outputs using optimization profiles

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2015140021A1 *

Also Published As

Publication number Publication date
WO2015140021A1 (fr) 2015-09-24
FR3018932B1 (fr) 2016-12-09
US20170090891A1 (en) 2017-03-30
FR3018932A1 (fr) 2015-09-25

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20160916

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20170717

R17C First examination report despatched (corrected)

Effective date: 20171017

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20201001