Ravi et al., 2010 - Google Patents
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurationsRavi et al., 2010
View PDF- Document ID
- 8401803320159736434
- Author
- Ravi V
- Ma W
- Chiu D
- Agrawal G
- Publication year
- Publication venue
- Proceedings of the 24th ACM international conference on supercomputing
External Links
Snippet
A trend that has materialized, and has given rise to much attention, is of the increasingly  heterogeneous computing platforms. Presently, it has become very common for a desktop or  a notebook computer to come equipped with both a multi-core CPU and a GPU. Capitalizing … 
    - 230000001603 reducing 0 title description 69
Classifications
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
 
- 
        - G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/25—Using a specific main memory architecture
 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| Ravi et al. | Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations | |
| Chen et al. | GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data | |
| Chen et al. | FlinkCL: An OpenCL-based in-memory computing architecture on heterogeneous CPU-GPU clusters for big data | |
| Bauer et al. | Legate NumPy: Accelerated and distributed array computing | |
| Callahan et al. | The cascade high productivity language | |
| Unat et al. | Mint: realizing CUDA performance in 3D stencil methods with annotated C | |
| Chen et al. | Optimizing mapreduce for gpus with effective shared memory usage | |
| Kim et al. | Achieving a single compute device image in OpenCL for multiple GPUs | |
| Lee et al. | Early evaluation of directive-based GPU programming models for productive exascale computing | |
| Gropp et al. | Programming for exascale computers | |
| Lee et al. | Skmd: Single kernel on multiple devices for transparent cpu-gpu collaboration | |
| US8528001B2 (en) | Controlling and dynamically varying automatic parallelization | |
| Samadi et al. | Adaptive input-aware compilation for graphics engines | |
| Doerfert et al. | Polly's polyhedral scheduling in the presence of reductions | |
| Pienaar et al. | Automatic generation of software pipelines for heterogeneous parallel systems | |
| Aguilar Mena et al. | OmpSs-2@ Cluster: Distributed memory execution of nested OpenMP-style tasks | |
| Totoni et al. | HPAT: high performance analytics with scripting ease-of-use | |
| Planas et al. | SSMART: smart scheduling of multi-architecture tasks on heterogeneous systems | |
| Cieslewicz et al. | Automatic contention detection and amelioration for data-intensive operations | |
| SFX Teixeira et al. | Automated mapping of task-based programs onto distributed and heterogeneous machines | |
| Garg et al. | Share-a-GPU: Providing simple and effective time-sharing on GPUs | |
| Shrestha et al. | Jagged tiling for intra-tile parallelism and fine-grain multithreading | |
| Jin et al. | Using compiler directives for accelerating CFD applications on GPUs | |
| Jindal et al. | Exploiting GPUs with the super instruction architecture | |
| Geng et al. | The importance of efficient fine-grain synchronization for many-core systems |