Palamadai Natarajan et al., 2017 - Google Patents
Autotuning divide‐and‐conquer stencil computations
- Document ID
- 721709892673851483
- Author
- Palamadai Natarajan E
- Mehri Dehnavi M
- Leiserson C
- Publication year
- 2017
- Publication venue
- Concurrency and Computation: Practice and Experience
Snippet
This paper explores autotuning strategies for serial divide‐and‐conquer stencil computations, comparing the efficacy of traditional “heuristic” autotuning with that of “pruned‐exhaustive” autotuning. We present a pruned‐exhaustive autotuner called Ztune that …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30442—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30389—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Wang et al. | Gunrock: GPU graph analytics | |
| Gao et al. | Estimating GPU memory consumption of deep learning models | |
| US10437573B2 (en) | General purpose distributed data parallel computing using a high level language | |
| Sundaram et al. | Graphmat: High performance graph analytics made productive | |
| US8239847B2 (en) | General distributed reduction for data parallel computing | |
| Xie et al. | Fast iterative graph computation with block updates | |
| Reguly et al. | Loop tiling in large-scale stencil codes at run-time with OPS | |
| Gareev et al. | High-performance generalized tensor operations: A compiler-oriented approach | |
| Tang et al. | Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency | |
| Christen et al. | Patus for convenient high-performance stencils: evaluation in earthquake simulations | |
| Elafrou et al. | Sparsex: A library for high-performance sparse matrix-vector multiplication on multicore platforms | |
| Sevenich et al. | Using domain-specific languages for analytic graph databases | |
| Izquierdo-Carrasco et al. | A generic vectorization scheme and a GPU kernel for the phylogenetic likelihood library | |
| Mustafa | A survey of performance tuning techniques and tools for parallel applications | |
| Devarajan et al. | Vidya: Performing code-block I/O characterization for data access optimization | |
| Wernsing et al. | Elastic computing: A portable optimization framework for hybrid computers | |
| Goda et al. | Out-of-order execution of database queries | |
| Sankaran et al. | Benchmarking the linear algebra awareness of tensorflow and pytorch | |
| Satish et al. | Can traditional programming bridge the ninja performance gap for parallel computing applications? | |
| Abdelfattah et al. | Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers | |
| Palamadai Natarajan et al. | Autotuning divide‐and‐conquer stencil computations | |
| Lukarski | Parallel sparse linear algebra for multi-core and many-core platforms: Parallel solvers and preconditioners | |
| Palkowski et al. | Tiling Nussinov’s RNA folding loop nest with a space-time approach | |
| AlOnazi et al. | Asynchronous task-based parallelization of algebraic multigrid | |
| Hofmann et al. | Performance analysis of the Kahan-enhanced scalar product on current multicore processors |