Cicek et al., 2022 - Google Patents

Energy efficient boosting of GEMM accelerators for DNN via reuse

Document ID
10388025999366812905
Author
Cicek N
Shen X
Ozturk O
Publication year
2022
Publication venue
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Snippet

Reuse-centric convolutional neural network (CNN) acceleration speeds up CNN inference by reusing computations for similar neuron vectors in the CNN's input layer or activation maps. This new paradigm of optimizations is, however, largely limited by the overheads in neuron …
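
The reuse idea in the snippet can be illustrated with a small sketch. This is not the authors' method: the function names (matvec, reuse_gemm), the step parameter, and the use of grid rounding as the similarity test are all assumptions made for illustration. Rows of the input matrix stand in for neuron vectors; rows that fall into the same grid cell are treated as similar, one product is computed for their centroid, and that result is reused for every row in the group.

```python
from collections import defaultdict


def matvec(row, weights):
    """Multiply a 1 x K row vector by a K x N weight matrix (list of lists)."""
    n_cols = len(weights[0])
    return [sum(row[k] * weights[k][j] for k in range(len(row)))
            for j in range(n_cols)]


def reuse_gemm(inputs, weights, step=0.5):
    """Approximate inputs @ weights with one product per group of similar rows.

    Similarity detection here is a hypothetical stand-in: each element is
    rounded to a grid of width `step`, and rows with equal signatures are
    grouped. Returns (output_rows, number_of_products_computed).
    """
    groups = defaultdict(list)
    for idx, row in enumerate(inputs):
        signature = tuple(round(x / step) for x in row)
        groups[signature].append(idx)

    width = len(inputs[0])
    outputs = [None] * len(inputs)
    for members in groups.values():
        # The centroid represents every row in the group.
        centroid = [sum(inputs[i][d] for i in members) / len(members)
                    for d in range(width)]
        product = matvec(centroid, weights)  # computed once per group
        for i in members:
            outputs[i] = list(product)       # reused for each similar row
    return outputs, len(groups)


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [1.1, 2.1], [5.0, 0.0], [1.0, 1.9]]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    outputs, computed = reuse_gemm(inputs, weights)
    print(computed, "products computed for", len(inputs), "rows")
```

The result is approximate: rows in a group all receive the centroid's product, so a coarser step saves more multiplications at a larger accuracy cost. The grouping pass itself is extra work, which is the kind of similarity-detection overhead the snippet refers to.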

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored programme computers
    • G06F15/78 Architectures of general purpose stored programme computers comprising a single central processing unit
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163 Interprocessor communication
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology

Similar Documents

Publication    Title
Cao et al. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity
Mittal et al. A survey of deep learning on CPUs: Opportunities and co-optimizations
Albericio et al. Cnvlutin: Ineffectual-neuron-free deep neural network computing
Chung et al. LINQits: Big data on little clients
Aluru et al. A review of hardware acceleration for computational genomics
Gong et al. SAVE: Sparsity-aware vector engine for accelerating DNN training and inference on CPUs
US20080250227A1 (en) General Purpose Multiprocessor Programming Apparatus And Method
Kim et al. Accelerating large-scale graph-based nearest neighbor search on a computational storage platform
Lee et al. ANNA: Specialized architecture for approximate nearest neighbor search
Chen et al. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs
Wang et al. Accelerating generalized linear models with MLWeaving: A one-size-fits-all system for any-precision learning
Cicek et al. Energy efficient boosting of GEMM accelerators for DNN via reuse
Cong et al. Best-effort FPGA programming: A few steps can go a long way
Soltaniyeh et al. An accelerator for sparse convolutional neural networks leveraging systolic general matrix-matrix multiplication
Han et al. DistME: A fast and elastic distributed matrix computation engine using GPUs
Cicek et al. General reuse-centric CNN accelerator
Chen et al. fgSpMSpV: A fine-grained parallel SpMSpV framework on HPC platforms
Lin et al. HitGNN: High-throughput GNN training framework on CPU+Multi-FPGA heterogeneous platform
Yesil et al. Hardware accelerator design for data centers
Lee et al. Similarity search on automata processors
Lee et al. MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units
Gupta et al. Store-n-learn: Classification and clustering with hyperdimensional computing across flash hierarchy
Qararyah et al. An efficient hybrid deep learning accelerator for compact and heterogeneous CNNs
Jeon et al. XEM: Tensor accelerator for AB21 supercomputing artificial intelligence processor
Sharafeddin et al. On the effectiveness of accelerating MapReduce functions using the Xilinx Vivado HLS tool