Palamadai Natarajan et al., 2017 - Google Patents

Autotuning divide‐and‐conquer stencil computations

Document ID: 721709892673851483
Author: Palamadai Natarajan E; Mehri Dehnavi M; Leiserson C
Publication year: 2017
Publication venue: Concurrency and Computation: Practice and Experience

Snippet

This paper explores autotuning strategies for serial divide‐and‐conquer stencil computations, comparing the efficacy of traditional “heuristic” autotuning with that of “pruned‐exhaustive” autotuning. We present a pruned‐exhaustive autotuner called Ztune that …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30442—Query optimisation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30389—Query formulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring

Similar Documents

Publication	Publication Date	Title
Wang et al.	2017	Gunrock: GPU graph analytics
Gao et al.	2020	Estimating GPU memory consumption of deep learning models
US10437573B2 (en)	2019-10-08	General purpose distributed data parallel computing using a high level language
Sundaram et al.	2015	Graphmat: High performance graph analytics made productive
US8239847B2 (en)	2012-08-07	General distributed reduction for data parallel computing
Xie et al.	2013	Fast iterative graph computation with block updates
Reguly et al.	2017	Loop tiling in large-scale stencil codes at run-time with OPS
Gareev et al.	2018	High-performance generalized tensor operations: A compiler-oriented approach
Tang et al.	2015	Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency
Christen et al.	2012	Patus for convenient high-performance stencils: evaluation in earthquake simulations
Elafrou et al.	2018	Sparsex: A library for high-performance sparse matrix-vector multiplication on multicore platforms
Sevenich et al.	2016	Using domain-specific languages for analytic graph databases
Izquierdo-Carrasco et al.	2013	A generic vectorization scheme and a GPU kernel for the phylogenetic likelihood library
Mustafa	2022	A survey of performance tuning techniques and tools for parallel applications
Devarajan et al.	2018	Vidya: Performing code-block I/O characterization for data access optimization
Wernsing et al.	2012	Elastic computing: A portable optimization framework for hybrid computers
Goda et al.	2020	Out-of-order execution of database queries
Sankaran et al.	2022	Benchmarking the linear algebra awareness of tensorflow and pytorch
Satish et al.	2015	Can traditional programming bridge the ninja performance gap for parallel computing applications?
Abdelfattah et al.	2022	Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
Palamadai Natarajan et al.	2017	Autotuning divide‐and‐conquer stencil computations
Lukarski	2012	Parallel sparse linear algebra for multi-core and many-core platforms: Parallel solvers and preconditioners
Palkowski et al.	2019	Tiling Nussinov’s RNA folding loop nest with a space-time approach
AlOnazi et al.	2017	Asynchronous task-based parallelization of algebraic multigrid
Hofmann et al.	2015	Performance analysis of the Kahan-enhanced scalar product on current multicore processors