Wu et al., 2025 - Google Patents

EITHOT: Efficient In-place Transposition of High Order Tensors on GPUs

Document ID: 3787467860718913876
Authors: Wu C; Tu C; Cheng K; Lee C
Publication year: 2025
Publication venue: ACM Transactions on Parallel Computing

Snippet

Tensor transposition is a fundamental operation in tensor calculations with various applications. However, a naive implementation that copies each element from the source tensor to the transposed position in the target tensor requires double space, making it …

Continue reading at dl.acm.org.
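The snippet describes the naive approach: every element is copied from the source tensor to its transposed position in a separate target tensor, so peak memory doubles. The pure-Python sketch below contrasts that with classic cycle-following, a long-standing in-place technique that moves elements along the cycles of the index permutation. It is shown only to illustrate what "in-place" means. It is not EITHOT's algorithm, which the snippet does not describe, and it makes no attempt at GPU parallelism. Assumptions: a dense row-major tensor stored in a flat list, NumPy-style permutation semantics (output axis `k` is input axis `perm[k]`), and hypothetical function names (`row_major_strides`, `transpose_out_of_place`, `transpose_in_place`).

```python
# Illustrative sketch with hypothetical helper names: naive out-of-place copy vs.
# classic cycle-following in-place transposition (not the EITHOT method).
from itertools import product
from typing import List, MutableSequence, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Element strides of a dense row-major (C-order) tensor."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_out_of_place(src: Sequence, shape: Sequence[int],
                           perm: Sequence[int]) -> Tuple[list, List[int]]:
    """Naive transposition: copy each element into a second, equally sized buffer.

    Output axis k is input axis perm[k], so peak memory is twice the tensor size.
    """
    out_shape = [shape[p] for p in perm]
    in_str, out_str = row_major_strides(shape), row_major_strides(out_shape)
    dst = [None] * len(src)  # the extra full-size buffer
    for idx in product(*(range(n) for n in shape)):
        src_off = sum(i * s for i, s in zip(idx, in_str))
        dst_off = sum(idx[p] * s for p, s in zip(perm, out_str))
        dst[dst_off] = src[src_off]
    return dst, out_shape


def _dest(off: int, shape, in_str, perm, out_str) -> int:
    """Map a source offset to the offset it occupies after transposition."""
    idx = [(off // s) % n for s, n in zip(in_str, shape)]
    return sum(idx[p] * s for p, s in zip(perm, out_str))


def transpose_in_place(buf: MutableSequence, shape: Sequence[int],
                       perm: Sequence[int]) -> List[int]:
    """Cycle-following transposition without a second tensor-sized buffer.

    Each permutation cycle is rotated once, starting from its smallest offset;
    finding that leader re-walks the cycle, trading index arithmetic for memory.
    """
    out_shape = [shape[p] for p in perm]
    in_str, out_str = row_major_strides(shape), row_major_strides(out_shape)

    def step(o: int) -> int:
        return _dest(o, shape, in_str, perm, out_str)

    for start in range(len(buf)):
        nxt = step(start)
        while nxt > start:  # stop at the first cycle member <= start
            nxt = step(nxt)
        if nxt < start:
            continue  # cycle already rotated from a smaller leader
        carried, cur = buf[start], step(start)
        while cur != start:
            # Drop the carried value into its slot and pick up the displaced one.
            buf[cur], carried = carried, buf[cur]
            cur = step(cur)
        buf[start] = carried
    return out_shape


if __name__ == "__main__":
    data = list(range(6))  # 2x3 matrix [[0, 1, 2], [3, 4, 5]]
    transpose_in_place(data, [2, 3], [1, 0])
    print(data)  # [0, 3, 1, 4, 2, 5]
```

The leader check keeps extra memory independent of tensor size at the cost of re-walking each cycle, which is quadratic in the worst case (a single long cycle). A visited bitmap is the usual alternative when a few bits per element can be spared.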

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
              - G06F8/456—Parallelism detection
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
                - G06F9/30087—Synchronisation or serialisation instructions
          - G06F9/46—Multiprogramming arrangements
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5061—Partitioning or combining of resources
            - G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
      - G06F11/00—Error detection; Error correction; Monitoring
        - G06F11/30—Monitoring
          - G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
            - G06F11/3457—Performance evaluation by simulation
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
          - G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
            - G06F17/141—Discrete Fourier transforms
              - G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
            - G06F17/30386—Retrieval requests
              - G06F17/30424—Query processing
                - G06F17/30442—Query optimisation
          - G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
            - G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation

Similar Documents

Authors	Year	Title
Gale et al.	2020	Sparse GPU kernels for deep learning
Filippone et al.	2017	Sparse matrix-vector multiplication on GPGPUs
Dongarra et al.	2014	Accelerating numerical dense linear algebra calculations with GPUs
Gremse et al.	2015	GPU-accelerated sparse matrix-matrix multiplication by iterative row merging
Springer et al.	2017	HPTT: A high-performance tensor transposition C++ library
Ashari et al.	2015	On optimizing machine learning workloads via kernel fusion
Liu et al.	2015	Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Tang et al.	2013	Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes
Yeralan et al.	2017	Algorithm 980: Sparse QR factorization on the GPU
Elafrou et al.	2018	SparseX: A library for high-performance sparse matrix-vector multiplication on multicore platforms
Guo et al.	2011	A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs
Koza et al.	2014	Compressed multirow storage format for sparse matrices on graphics processing units
Huang et al.	2020	Strassen’s algorithm reloaded on GPUs
Basaran et al.	2013	Grex: An efficient MapReduce framework for graphics processing units
Bartezzaghi et al.	2015	An explicit dynamics GPU structural solver for thin shell finite elements
Tolmachev	2023	VkFFT-a performant, cross-platform and open-source GPU FFT library
Bernaschi et al.	2016	A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units
Oyarzun et al.	2017	Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
Park et al.	2022	mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices
Gao et al.	2024	A systematic literature survey of sparse matrix-vector multiplication
Liu et al.	2009	Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA
Wu et al.	2025	EITHOT: Efficient In-place Transposition of High Order Tensors on GPUs
Reddy et al.	2012	New sparse matrix storage format to improve the performance of total SPMV time
Page et al.	2020	Scalability of sparse matrix dense vector multiply (SpMV) on a migrating thread architecture
Corrigan et al.	2012	A hybrid grid compressible flow solver for large-scale supersonic jet noise simulations on multi-GPU clusters