Liu, 2017 - Google Patents
An Optimization Compiler Framework Based on Polyhedron Model for GPGPUs
- Document ID
- 11213241217785687214
- Author
- Liu L
- Publication year
- 2017
Snippet
General purpose GPU (GPGPU) is an effective many-core architecture that can yield high throughput for many scientific applications with thread-level parallelism. However, several challenges still limit further performance improvements and make GPU programming …
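
The thread-level parallelism the snippet refers to is what a polyhedral optimizing compiler exploits when it maps independent iterations of an affine loop nest onto GPU threads. The sketch below is not taken from the thesis; it is a minimal, hand-written illustration assuming a standard CUDA toolchain, using a plain SAXPY loop (names `saxpy_cpu`/`saxpy_gpu` are illustrative) whose iterations carry no dependences and can therefore each become one GPU thread.

```cuda
// Illustrative only (not from the cited thesis): an affine, dependence-free loop
// and the CUDA kernel that maps each iteration to one GPU thread.
#include <cstdio>
#include <cuda_runtime.h>

// CPU reference: every iteration is independent, so all of them may run in parallel.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// GPU version: blockIdx/threadIdx replace the loop index (one thread per iteration).
__global__ void saxpy_gpu(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                          // guard against the padded last block
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));   // unified memory keeps the sketch short
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int block = 256;                            // threads per block
    int grid  = (n + block - 1) / block;        // enough blocks to cover all n iterations
    saxpy_gpu<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f (expected 4.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```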
Classifications
(most specific CPC code in each group; all fall under G06F—Electrical digital data processing)
- G06F9/3891—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling, out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/5061—Partitioning or combining of resources
- G06F9/30087—Synchronisation or serialisation instructions
- G06F8/456—Parallelism detection
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
- G06F8/44—Encoding
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G06F11/36—Preventing errors by testing or debugging software
Similar Documents
| Publication | Title |
|---|---|
| US8782645B2 (en) | Automatic load balancing for heterogeneous cores |
| US10430190B2 (en) | Systems and methods for selectively controlling multithreaded execution of executable code segments |
| Kim et al. | CuMAPz: A tool to analyze memory access patterns in CUDA |
| EP4291980B1 (en) | System for auto-parallelization of processing codes for multi-processor systems with optimized latency, and method thereof |
| Hayes et al. | Unified on-chip memory allocation for SIMT architecture |
| Baskaran et al. | An architecture interface and offload model for low-overhead, near-data, distributed accelerators |
| Kandemir et al. | Optimizing shared cache behavior of chip multiprocessors |
| Gerzhoy et al. | Nested MIMD-SIMD parallelization for heterogeneous microprocessors |
| Huynh et al. | Mapping streaming applications onto GPU systems |
| Kim et al. | Memory performance estimation of CUDA programs |
| Garcia et al. | A dynamic schema to increase performance in many-core architectures through percolation operations |
| Liu | An Optimization Compiler Framework Based on Polyhedron Model for GPGPUs |
| Wu et al. | A model-based software solution for simultaneous multiple kernels on GPUs |
| Braak et al. | R-GPU: A reconfigurable GPU architecture |
| Zuzak et al. | Exploiting multi-loop parallelism on heterogeneous microprocessors |
| Rajput et al. | Proactive bottleneck performance analysis in parallel computing using OpenMP |
| Ding et al. | Multicore-aware code co-positioning to reduce WCET on dual-core processors with shared instruction caches |
| Arandi | The data-driven multithreading virtual machine |
| Liu et al. | A compiler-assisted locality aware CTA mapping scheme |
| Zhang | Inter-warp divergence aware execution on GPUs |
| Damani | Optimized scheduling and resource allocation for thread parallel architectures |
| Murthy | Optimal loop unrolling for GPGPU programs |
| Baskaran | Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems |
| Kumar et al. | A new compiler for space-time scheduling of ILP processors |
| Vianello | Development of a multi-GPU Navier-Stokes solver |