Master of Science
Artificial intelligence has come a long way from being a mere spectacle on the silver screen in the 1920s [Hml18]. As artificial intelligence continues to evolve and we develop more sophisticated Artificial Neural Networks, the need for specialized, more efficient machines (less computational strain while maintaining the same performance) becomes increasingly evident. Though techniques such as Multilayer Perceptrons, Convolutional Neural Networks, and Recurrent Neural Networks may seem to be on the cutting edge of technology, many of these ideas are over 60 years old! However, many of the earlier models, at the time of their introduction, either lacked algorithmic sophistication (the very early McCulloch-Pitts neuron and the Rosenblatt Perceptron) or suffered from insufficient training data to allow effective learning of a solution. Now, however, we are in the era of Big Data and the Internet of Things, where we have everything from autonomous vehicles to smart toilets with Amazon's Alexa integration. This has driven an incredible rise in the sophistication of Artificial Neural Networks, but that sophistication has come at the expense of high computational complexity. Though traditional CPUs and GPUs have been the go-to processors for these applications, there is increasing interest in developing specialized hardware that not only speeds up these computations but also performs them in the most energy-efficient manner.
The objective of this Thesis is to provide the reader with a clear understanding of a subdiscipline of artificial intelligence, Artificial Neural Networks, also referred to as Multilayer Perceptrons or Deep Neural Networks; of current challenges and opportunities within the Deep Learning field; and of proposed Domain Specific Architectures [Hen17] that aim to optimize both the type of computations performed in Artificial Neural Networks and the way data is moved through the processor, in order to increase energy efficiency. The Domain Specific Architecture guidelines utilized in this study are: investing in dedicated memories close to the processor; µ-architectural optimizations; leveraging the easiest form of parallelism; reducing data size; and designing the Domain Specific Architecture to the domain specific language.
This study has managed to leverage four of the five Domain Specific Architecture design guidelines. We have leveraged dedicated memories and µ-architectural optimizations by building dedicated Functional Units, each with its own dedicated memory and specialized multiplication hardware. We have also leveraged the easiest form of parallelism by using a Spatial Architecture, as opposed to the traditional Temporal Architecture; specifically, the Spatial Architecture operates as a Systolic Array. Finally, we have investigated the use of a newly proposed mid-precision floating point representation of data values, which consists of 12 bits in parallel and is based on the IEEE 754 half-precision standard, which uses 16 bits.
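To make the mid-precision idea concrete, the following is a minimal sketch of encoding and decoding such a 12-bit value. The thesis abstract does not spell out the bit split, so this sketch assumes the format keeps half-precision's 1 sign bit and 5 exponent bits (bias 15) and trims the mantissa from 10 bits to 6; subnormals, infinities, and NaNs are omitted for brevity.

```python
import math

# Assumed field widths: 1 sign + 5 exponent + 6 mantissa = 12 bits.
# The 5-bit exponent and bias of 15 match IEEE 754 half precision;
# the 6-bit mantissa is an illustrative assumption, not the thesis's spec.
EXP_BITS = 5
MAN_BITS = 6
BIAS = (1 << (EXP_BITS - 1)) - 1  # 15

def encode_fp12(x: float) -> int:
    """Pack a normal float into the assumed 12-bit layout (no subnormals/NaN)."""
    sign = 1 if x < 0 else 0
    x = abs(x)
    if x == 0.0:
        return sign << (EXP_BITS + MAN_BITS)
    exp = math.floor(math.log2(x))
    frac = x / (2.0 ** exp) - 1.0          # fractional part in [0, 1)
    mant = round(frac * (1 << MAN_BITS))   # round to 6 mantissa bits
    if mant == 1 << MAN_BITS:              # rounding carried into the exponent
        mant = 0
        exp += 1
    e = exp + BIAS
    return (sign << (EXP_BITS + MAN_BITS)) | (e << MAN_BITS) | mant

def decode_fp12(bits: int) -> float:
    """Unpack the assumed 12-bit layout back into a Python float."""
    sign = (bits >> (EXP_BITS + MAN_BITS)) & 1
    e = (bits >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mant = bits & ((1 << MAN_BITS) - 1)
    if e == 0 and mant == 0:
        return -0.0 if sign else 0.0
    value = (1.0 + mant / (1 << MAN_BITS)) * 2.0 ** (e - BIAS)
    return -value if sign else value
```

Under this assumed split, values whose fraction fits in 6 bits (e.g. 1.5 or 0.25) round-trip exactly, while others incur a small quantization error, which is the precision-versus-bit-width trade such a format is designed to make.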
The organization of this Thesis is as follows: first, we give a brief history of artificial intelligence, machine learning, and Artificial Neural Networks; next, a much more comprehensive background study of these algorithms, their origins, and some of their modern applications; a history of computer architecture and its different classifications; the Domain Specific Architecture guidelines; and the approach to the proposed DSA design. We then discuss the specific problems in the primary areas of study needed to build this Domain Specific Architecture, including the test-bed design and the results of the proposed design. A conclusion for the study, as well as a discussion of future work, is given in the final section.
Angel Izael Solis
Solis, Angel Izael, "Dedicated Hardware for Machine/Deep Learning: Domain Specific Architectures" (2019). Open Access Theses & Dissertations. 172.