Wu et al., 2025 - Google Patents
EITHOT: Efficient In-place Transposition of High Order Tensors on GPUsWu et al., 2025
- Document ID
- 3787467860718913876
- Author
- Wu C
- Tu C
- Cheng K
- Lee C
- Publication year
- Publication venue
- ACM Transactions on Parallel Computing
External Links
Snippet
Tensor transposition is a fundamental operation in tensor calculations with various applications. However, a naive implementation that copies each element from the source tensor to the transposed position in the target tensor requires double space, making it …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30442—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3457—Performance evaluation by simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Gale et al. | Sparse gpu kernels for deep learning | |
| Filippone et al. | Sparse matrix-vector multiplication on GPGPUs | |
| Dongarra et al. | Accelerating numerical dense linear algebra calculations with GPUs | |
| Gremse et al. | GPU-accelerated sparse matrix-matrix multiplication by iterative row merging | |
| Springer et al. | HPTT: A high-performance tensor transposition C++ library | |
| Ashari et al. | On optimizing machine learning workloads via kernel fusion | |
| Liu et al. | Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors | |
| Tang et al. | Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes | |
| Yeralan et al. | Algorithm 980: Sparse QR factorization on the GPU | |
| Elafrou et al. | Sparsex: A library for high-performance sparse matrix-vector multiplication on multicore platforms | |
| Guo et al. | A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs | |
| Koza et al. | Compressed multirow storage format for sparse matrices on graphics processing units | |
| Huang et al. | Strassen’s algorithm reloaded on GPUs | |
| Basaran et al. | Grex: An efficient MapReduce framework for graphics processing units | |
| Bartezzaghi et al. | An explicit dynamics GPU structural solver for thin shell finite elements | |
| Tolmachev | VkFFT-a performant, cross-platform and open-source GPU FFT library | |
| Bernaschi et al. | A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units | |
| Oyarzun et al. | Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers | |
| Park et al. | mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices | |
| Gao et al. | A systematic literature survey of sparse matrix-vector multiplication | |
| Liu et al. | Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA | |
| Wu et al. | EITHOT: Efficient In-place Transposition of High Order Tensors on GPUs | |
| Reddy et al. | New sparse matrix storage format to improve the performance of total SPMV time | |
| Page et al. | Scalability of sparse matrix dense vector multiply (SpMV) on a migrating thread architecture | |
| Corrigan et al. | A hybrid grid compressible flow solver for large-scale supersonic jet noise simulations on multi-GPU clusters |