Hardware Software Co-Design for Efficient Deep Learning Systems

Date: 6th Apr 2021

Time: 04:00 PM

Venue: Google Meet

PAST EVENT

Details

Deep Neural Networks (DNNs) have achieved remarkable success across a wide range of domains, but this success comes at the cost of ever-increasing computational demands. There are three principal approaches to addressing this challenge: (i) improving the resource efficiency of deep learning systems, (ii) designing efficient networks, and (iii) improving the design methodology of efficient networks and devices. In this seminar, we present key ideas for improving the execution efficiency of DNNs along all three approaches.

First, towards building efficient systems, we present SparseCache, an enhanced cache architecture for accelerating DNNs on resource-constrained hardware platforms. SparseCache adds a Ternary Content Addressable Memory (TCAM)-based micro-architectural extension, called the null-cache, that exploits an essential attribute of DNNs, namely sparsity, to store zero-valued cache lines compactly. By storing only the addresses of zero-valued lines in the null-cache, SparseCache increases the effective cache capacity, reducing the overall miss rate and execution time by up to 28% and 21%, respectively, across four state-of-the-art DNNs.

Second, towards building efficient networks, we present FuSeConv, an efficient convolution operator tailored to systolic arrays. Depth-wise separable convolutions are the de-facto building block for efficient inference, but their computational pattern is not systolic and therefore maps poorly onto systolic-array based hardware accelerators. FuSeConv overcomes this problem by decomposing convolutions fully into separable 1D convolutions along both the spatial and depth dimensions. The resulting computation is systolic and, with a modified dataflow, utilizes the systolic array efficiently, achieving speedups of 3-7x on a 64x64 systolic array.

Finally, towards improving the design methodology of efficient networks and devices, we present generalizable DNN cost models for mobile devices. Designing efficient networks requires latency characterization across many hardware platforms, which is infeasible to do by direct measurement given the wide diversity of platforms; cost models make such characterization practical. Existing studies, however, are restricted and typically build an individual cost model for each hardware platform. We systematically study cost models by building a large crowd-sourced repository of 118 networks and 105 mobile devices, and our key contribution is a single cost model that generalizes across this wide product space of networks and hardware. Specifically, we represent a hardware device by its latencies on a judiciously chosen set of networks, called the signature set, which enables this generalizable cost model. Our results show that by carefully choosing the signature set, the network representation, and the ML algorithm, we can build powerful cost models that generalize, making it possible to design efficient networks through extensive characterization.
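
To make the null-cache idea concrete, here is a minimal Python sketch of a cache that records only the addresses of zero-valued lines in a separate associative store (modelling the TCAM), so the data cache keeps its capacity for non-zero lines. The line size, capacities, and LRU policy are illustrative assumptions, not the exact SparseCache micro-architecture.

```python
from collections import OrderedDict

LINE_WORDS = 16  # words per cache line (assumed)

class SparseCache:
    def __init__(self, data_lines=256, null_entries=1024):
        self.data = OrderedDict()   # addr -> line contents, kept in LRU order
        self.null = OrderedDict()   # addr -> None; compact store for zero lines
        self.data_lines = data_lines
        self.null_entries = null_entries
        self.hits = self.misses = 0

    def _evict(self, store, capacity):
        while len(store) > capacity:
            store.popitem(last=False)        # drop the least-recently-used entry

    def read(self, addr, memory):
        if addr in self.null:                # TCAM-style lookup: zero line, no data stored
            self.null.move_to_end(addr)
            self.hits += 1
            return [0] * LINE_WORDS
        if addr in self.data:                # ordinary data-cache hit
            self.data.move_to_end(addr)
            self.hits += 1
            return self.data[addr]
        self.misses += 1                     # miss: fetch the line and fill
        line = memory.get(addr, [0] * LINE_WORDS)
        if all(w == 0 for w in line):        # zero line -> record only its address
            self.null[addr] = None
            self._evict(self.null, self.null_entries)
        else:
            self.data[addr] = line
            self._evict(self.data, self.data_lines)
        return line
```

With sparse DNN activations, a large fraction of lines land in the null-cache, freeing data-cache capacity and lowering the miss rate, which is the effect the abstract quantifies.

The FuSeConv decomposition can be sketched in PyTorch as a block that replaces the k x k depthwise convolution with parallel 1D depthwise convolutions along rows and columns, followed by the usual pointwise convolution. Splitting the channels into halves and the layer shapes below are assumptions for illustration; the actual operator and its modified systolic dataflow are defined in the work itself.

```python
import torch
import torch.nn as nn

class FuSeConvBlock(nn.Module):
    """Sketch of a FuSeConv-style block: the k x k depthwise convolution is
    replaced by fully separable 1D depthwise convolutions (1 x k along rows,
    k x 1 along columns), whose 1D structure maps naturally onto the rows and
    columns of a systolic array."""

    def __init__(self, channels, out_channels, k=3):
        super().__init__()
        half = channels // 2
        self.split_sizes = [half, channels - half]
        # Row-wise 1D depthwise convolution on the first half of the channels.
        self.row = nn.Conv2d(half, half, kernel_size=(1, k),
                             padding=(0, k // 2), groups=half, bias=False)
        # Column-wise 1D depthwise convolution on the remaining channels.
        self.col = nn.Conv2d(channels - half, channels - half, kernel_size=(k, 1),
                             padding=(k // 2, 0), groups=channels - half, bias=False)
        # Standard pointwise (1 x 1) convolution mixes channels afterwards.
        self.pointwise = nn.Conv2d(channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        a, b = torch.split(x, self.split_sizes, dim=1)
        y = torch.cat([self.row(a), self.col(b)], dim=1)
        return self.pointwise(y)

# Example: FuSeConvBlock(64, 128)(torch.randn(1, 64, 56, 56)) -> shape (1, 128, 56, 56)
```

Finally, the signature-set idea behind the generalizable cost model can be sketched as follows: each device is represented by its measured latencies on a small, fixed signature set of networks, each network by a feature vector, and a single regressor is trained over all (device, network) pairs. The feature representation, signature-set contents, and the choice of gradient-boosted trees here are assumptions for illustration, not the specific design evaluated in the work.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def device_signature(latency_table, device, signature_networks):
    """A device is represented by its measured latencies on a fixed signature set."""
    return np.array([latency_table[(device, net)] for net in signature_networks])

def build_dataset(latency_table, network_features, devices, networks, signature_networks):
    """Assemble (device signature ++ network features) -> latency training pairs."""
    X, y = [], []
    for dev in devices:
        sig = device_signature(latency_table, dev, signature_networks)
        for net in networks:
            if net in signature_networks:
                continue  # signature networks are inputs, not prediction targets
            X.append(np.concatenate([sig, network_features[net]]))
            y.append(latency_table[(dev, net)])
    return np.array(X), np.array(y)

# A single model covers the whole (device x network) product space.
model = GradientBoostingRegressor(n_estimators=300, max_depth=6)
# model.fit(X_train, y_train)    # train on measured (device, network) pairs
# model.predict(X_new)           # predict latency for unseen combinations
```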

Speakers

Vinod Ganesan

Department of Computer Science and Engineering