Seminar Integrated Systems

Lecturer (contributors)
Type: Seminar
Scope: 3 SWS
Semester: Summer semester 2019
Language of instruction: German
Position within curricula: See TUMonline

Dates

Participation criteria & registration

See TUMonline. Limited number of participants! Registration in TUMonline from 15.03. to 24.04.2019. Every student must choose a seminar topic before the introductory session by contacting the respective topic supervisor. Topics are assigned in the order in which requests arrive. The individual topics are announced at http://www.lis.ei.tum.de/?id=hsis.

Learning objectives

By taking part in the module's sessions, students gain knowledge of integrated systems and their fields of application. Students are then able to work on a task from a current topic area of integrated systems independently and scientifically, to conduct their own literature research, and to prepare a written report on it. Beyond that, students can present the insights they have gained to a professional audience. This comprises working out a topic and an overview presentation, practicing presentation techniques, and giving a technical talk followed by a discussion.

Description

Changing focus topics on integrated circuits and systems as well as their applications. The module participants independently work through current scientific contributions, prepare a written report that is graded, and present their results. The topic is treated in depth during the discussion.

Prerequisites

Basic knowledge of integrated circuits and systems as well as their applications

Teaching and learning methods

Each participant works on an individual technical task, mostly in independent individual work. Depending on the individual topic, the participant is assigned a dedicated supervisor. The supervisor helps the student especially at the beginning of the work by introducing the technical topic, providing suitable literature, and giving helpful advice both on the technical work and on preparing the written report and the talk.

Assessment

The grade is composed of the following elements:
- a four-page report in IEEE format
- a poster presentation
- a 15-minute talk followed by questions

Recommended literature

Topic-specific literature is recommended by the respective supervisor and should be supplemented by the student's own research.

Links

Offered topics

Main seminars

Low-precision training of BinaryNets

Keywords:
BinaryNets, Vectorized Hardware-Accelerator

Abstract:
Binarized ConvNets (BinaryNets) [1, 2] are far more computationally efficient than vanilla ConvNets, as MAC operations can be replaced by lightweight XNOR and popcount operations. Training BinaryNets still poses unsolved challenges: on the one hand, full-precision weights and activations are maintained throughout training, which requires very potent GPUs (e.g. Nvidia Volta); mixed-precision training is used to improve training performance on GPUs [3]. On the other hand, low-precision training of BinaryNets [4, 5] is more efficient in terms of compute resources, but its application on commodity training hardware is still unsolved.

Description

Convolutional Neural Networks (ConvNets) are a major trend in solving complex tasks in the field of computer vision, and these models have become state of the art in several visual recognition tasks. Leading-edge ConvNets have a vast number of trainable parameters, arranged in hundreds of layers. Binarized ConvNets (BinaryNets) [1, 2] are far more computationally efficient than vanilla ConvNets, as MAC operations can be replaced by lightweight XNOR and popcount operations.

The process of training BinaryNets still poses some unsolved challenges. On the one hand, full-precision weights and activations are maintained throughout training, which requires very potent GPUs (e.g. Nvidia Volta); mixed-precision training is used to improve training performance on GPUs [3]. On the other hand, low-precision training of BinaryNets [4, 5] is more efficient in terms of compute resources, but its application on commodity training hardware is still unsolved.

The main task of this seminar topic is an analysis and comparison of recent publications on low-precision training of BinaryNets on vectorized hardware accelerators. Furthermore, the implementation differences as well as potential hardware should be discussed in this work.
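The replacement of MAC operations by XNOR and popcount mentioned above can be illustrated with a minimal sketch (an assumed illustration, not taken from the cited papers; `pack`, `dot_mac` and `dot_xnor_popcount` are hypothetical helper names):

```python
# Sketch: dot product of two ±1 vectors via XNOR + popcount.
# Encoding assumption: bit 1 represents +1, bit 0 represents -1.

def dot_mac(a, b):
    """Reference dot product using multiply-accumulate (MAC)."""
    return sum(x * y for x, y in zip(a, b))

def pack(vec):
    """Pack a ±1 vector into an integer bit mask."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def dot_xnor_popcount(a_bits, b_bits, n):
    """Same result using bitwise XNOR and popcount on packed words."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements

a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
assert dot_mac(a, b) == dot_xnor_popcount(pack(a), pack(b), len(a)) == 2
```

A hardware MAC array thus collapses into parallel XNOR gates feeding a popcount tree, which is the source of the efficiency gain the topic refers to.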

References:

[1] Lin, Xiaofan & Zhao, Cong & Pan, Wei. (2017). Towards Accurate Binary Convolutional Neural Network. NIPS.

[2] Zhuang, B., Shen, C., Tan, M., Liu, L., & Reid, I.D. (2018). Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation.

[3] Micikevicius, Paulius & Narang, Sharan & Alben, Jonah & Diamos, Gregory & Elsen, Erich & Garcia, David & Ginsburg, Boris & Houston, Michael & Kuchaiev, Oleksii & Venkatesh, Ganesh & Wu, Hao. (2018). Mixed Precision Training. ICLR.

[4] Zhou, Shuchang & Ni, Zekun & Zhou, Xinyu & Wen, He & Wu, Yuxin & Zou, Yuheng. (2016). DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients.

[5] Hubara, Itay & Courbariaux, Matthieu & Soudry, Daniel & El-Yaniv, Ran & Bengio, Y. (2016). Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. Journal of Machine Learning Research.

Contact

Alexander Frickenstein

alexander.frickenstein@bmw.de

Phone: +49-151-601-66600

Supervisor:

Alexander Frickenstein

Application of BinaryNets on Vectorized Hardware-Accelerator

Keywords:
BinaryNets, Vectorized Hardware-Accelerator

Abstract:
Binarized ConvNets (BinaryNets) [1, 2] are far more computationally efficient than vanilla ConvNets, as MAC operations can be replaced by lightweight XNOR and popcount operations. The main task of this seminar topic is an analysis and comparison of recent publications applying BinaryNets on vectorized hardware accelerators.

Description

Convolutional Neural Networks (ConvNets) are a major trend in solving complex tasks in the field of computer vision, and these models have become state of the art in several visual recognition tasks. Leading-edge ConvNets have a vast number of trainable parameters, arranged in hundreds of layers. Binarized ConvNets (BinaryNets) [1, 2] are far more computationally efficient than vanilla ConvNets, as MAC operations can be replaced by lightweight XNOR and popcount operations.

Beyond that, BinaryNets have to be applicable to modern vectorized hardware accelerators (i.e. CPUs [3, 4], GPUs [5] or FPGAs). Vector units can process multiple elements within the same operation, but they require structured data and predefined data types.

The main task of this seminar topic is an analysis and comparison of recent publications applying BinaryNets on vectorized hardware accelerators. Furthermore, the implementation differences as well as potential hardware should be discussed in this work.
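How bit-packing maps BinaryNets onto vector units can be sketched as follows (an assumed NumPy emulation of a SIMD datapath, not taken from the cited papers; `pack_pm1` and `binary_dot` are hypothetical names):

```python
import numpy as np

# Sketch: packing ±1 activations so that one 8-bit word holds 8 values lets
# a vector unit process many elements per operation -- emulated here with
# NumPy's element-wise integer ops on the packed words.

def pack_pm1(v):
    """Pack a ±1 vector into a uint8 array, bit 1 encoding +1."""
    return np.packbits(v == 1)

def binary_dot(a, b):
    """Dot product of two ±1 vectors via packed XNOR + popcount."""
    n = len(a)
    pa, pb = pack_pm1(a), pack_pm1(b)
    xnor = np.bitwise_not(np.bitwise_xor(pa, pb))   # bit 1 where signs agree
    matches = int(np.unpackbits(xnor)[:n].sum())    # popcount over n bits
    return 2 * matches - n

rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=256)
b = rng.choice([-1, 1], size=256)
assert binary_dot(a, b) == int(a @ b)
```

The structured-data requirement mentioned above appears directly in the sketch: the ±1 values must first be laid out as a contiguous, fixed-width bit array before the vectorized XNOR can touch them.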

References:

[1] Lin, Xiaofan & Zhao, Cong & Pan, Wei. (2017). Towards Accurate Binary Convolutional Neural Network. NIPS.

[2] Zhuang, B., Shen, C., Tan, M., Liu, L., & Reid, I.D. (2018). Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation.

[3] Rastegari, Mohammad & Ordonez, Vicente & Redmon, Joseph & Farhadi, Ali. (2016). XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. ECCV.

[4] Yash Akhauri. (2018). https://software.intel.com/en-us/articles/binary-neural-networks-

[5] Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., & Bengio, Y. (2016). Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Contact

Alexander Frickenstein

alexander.frickenstein@bmw.de

Phone: +49-151-601-66600

Supervisor:

Alexander Frickenstein

Efficient Neural Architecture Search Techniques for Various HW Platforms

Keywords:
Neural Architectural Search, Optimization, DNN

Abstract:
Neural Architecture Search (NAS) is an essential tool for mapping CNN architectures onto various HW platforms such as CPUs, GPUs and FPGAs. The integration of NAS with popular optimization schemes such as quantization and pruning could be further surveyed.

Description

Neural Architecture Search (NAS) is an essential tool for mapping CNN architectures onto various HW platforms such as CPUs, GPUs and FPGAs. However, directly applying NAS is computationally expensive and time consuming, as CNNs span a potentially enormous design space. ProxylessNAS [1] and MnasNet [2] are among the popular NAS techniques in the literature.

Understanding the importance of NAS for realizing HW-aware CNN models is the central aspect of this topic. The integration of NAS with popular optimization schemes such as quantization and pruning could be further surveyed.

References:

[1] Han Cai, Ligeng Zhu and Song Han, ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019.

[2] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan and Quoc V. Le, MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018

Contact

Manoj-Rohit Vemparala

Manoj-Rohit.Vemparala@bmw.de

Phone: +49-151-601-95959

Supervisor:

Alexander Frickenstein

Deep Learning Inference Accelerators Exploiting Sparsity in Optimized Convolutional Neural Networks

Keywords:
Pruning, Sparsity, Optimization, DNN

Abstract:
Effective pruning methods have become popular in the last few years. Custom ASIC architectures proposed in the literature, such as EIE [2] or SCNN [3], efficiently accelerate the irregular sparsity existing in CNN models. A survey of sparse-CNN hardware accelerators is therefore a central aspect of this work.

Description

Convolutional Neural Networks (CNNs) are biologically inspired algorithms that are able to detect prominent features and provide an output such as an image classification or object detection. As CNNs get deeper, the redundancy in the networks also increases. Thus, effective pruning methods have become popular in the last few years. Han et al. [1] have shown that pruning-based approaches remove redundant weights and can achieve a compression rate of up to 49x for VGG-16. However, it is difficult to integrate zero detection into the existing data pipeline and realize an efficient accelerator. Custom ASIC architectures proposed in the literature, such as EIE [2] or SCNN [3], efficiently accelerate the irregular sparsity existing in CNN models.

In this topic, the importance of pruning among the various CNN optimization techniques must be conveyed. A survey of sparse-CNN hardware accelerators is a central aspect of this work. The existing dataflow techniques for pruning-based CNN accelerators could be examined as well.
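As a hypothetical illustration of why accelerators such as EIE store pruned weights in compressed form: with a CSR (compressed sparse row) layout, the datapath iterates over non-zero weights only instead of detecting zeros on the fly. The helper names below are assumptions for this sketch, not taken from the cited papers:

```python
# Sketch: pruned weights in CSR format; the matrix-vector product
# touches only the surviving (non-zero) weights.

def to_csr(dense):
    """Convert a dense weight matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:                 # pruned weights are simply dropped
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = W @ x, skipping zero weights entirely."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

W = [[0, 2, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 0, 0]]
x = [4, 5, 6, 7]
assert csr_matvec(*to_csr(W), x) == [10, 25, 0]
```

The irregularity the text mentions is visible here: the number of multiplications per row varies with the sparsity pattern, which is what makes load balancing hard for fixed hardware pipelines.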

References:

[1] Song Han, Xingyu Liu and William J Dally, Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016.

[2] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz and William J. Dally, EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA 2016.

[3] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally, SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks, ISCA 2017.

Contact

Manoj-Rohit Vemparala

Manoj-Rohit.Vemparala@bmw.de

Phone: +49-151-601-95959

Supervisor:

Alexander Frickenstein

Image Quality Metrics for Computer Vision

Description

Camera systems for visually capturing the environment are an essential sensor modality for autonomous vehicles. To collect reliably usable data in all situations of everyday traffic scenarios, extreme demands are placed on the cameras used, for example with respect to dynamic range and the resolution of distant objects.
To make autonomous driving ready for series production, an optimization of all camera components is therefore necessary. What characterizes this optimization in particular is that the optimum is defined by perfect interpretability of the data by computer vision (CV) systems. This is in contrast to traditional camera design, which optimizes for human vision.
A multitude of subjective and objective image quality metrics exists for assessing the quality of the data captured by camera systems. So far, however, the significance even of the objective metrics has mainly been studied with respect to human vision. CV algorithms do not necessarily show the same sensitivities to image errors and artifacts.
Recent publications have proposed image quality metrics based on image properties that are potentially particularly relevant for CV, such as the contained structures and the information content. In this seminar, such concepts shall be researched and the aspects relevant to CV extracted. Where possible, it shall also be evaluated to what extent the proposed metrics correlate with known sensitivities of CV algorithms.
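As a minimal illustration of a structure-based metric from the recommended literature: the SSIM index of Wang et al. combines luminance, contrast and structure statistics. The sketch below computes a single global SSIM value instead of the usual local-window average, which is a deliberate simplification:

```python
# Sketch: global SSIM between two flattened grayscale signals.
# SSIM = ((2*mx*my + c1) * (2*cov + c2)) / ((mx^2 + my^2 + c1) * (vx + vy + c2))

def ssim_global(x, y, L=255, k1=0.01, k2=0.03):
    n = len(x)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2        # stabilizing constants
    mx, my = sum(x) / n, sum(y) / n              # mean luminance
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)  # variance (contrast)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = [10, 50, 90, 130, 170, 210]
assert abs(ssim_global(img, img) - 1.0) < 1e-9   # identical images score 1
noisy = [p + d for p, d in zip(img, [5, -5, 5, -5, 5, -5])]
assert ssim_global(img, noisy) < 1.0             # distortion lowers the score
```

Unlike a pixel-wise error such as MSE, the covariance term rewards preserved structure, which is exactly the property the publications above argue is relevant for CV.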

Recommended introductory literature

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612.

Sheikh, H. R., & Bovik, A. C. (2004, May). Image information and visual quality. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 3, pp. 709-712). IEEE.

Haccius, C., & Herfet, T. (2017). Computer Vision Performance and Image Quality Metrics: A Reciprocal Relation. In Computer Science & Information Technology (CS & IT). Academy & Industry Research Collaboration Center (AIRCC). https://doi.org/10.5121/csit.2017.70104

Contact person

Korbinian Weikl, M.Sc.
korbinian.weikl@bmw.de
+49-151-601-48831


Supervisor:

Korbinian Weikl

Image Signal Processing for Computer Vision

Description

Camera systems for visually capturing the environment are an essential sensor modality for autonomous vehicles. To collect reliably usable data in all situations of everyday traffic scenarios, extreme demands are placed on the cameras used, for example with respect to dynamic range and the resolution of distant objects.

To make autonomous driving ready for series production, an optimization of all camera components is therefore necessary. What characterizes this optimization in particular is that the optimum is defined by perfect interpretability of the data by computer vision (CV) systems. This is in contrast to traditional camera design, which optimizes for human vision.

One camera element that is particularly shaped by optimization for human vision is the Image Signal Processor (ISP). Among other things, it performs sensor corrections and processing steps to improve image quality, turning raw camera data into images a human viewer can interpret.

Recent publications have examined the influence of individual stages of existing ISP pipelines on CV systems and derived new approaches to ISP design from the requirements of CV. In this seminar, the results of these investigations shall be researched and condensed into a concept for a CV-optimized ISP.

Recommended introductory literature

Heide, F., Steinberger, M., Tsai, Y. T., Rouf, M., Pająk, D., Reddy, D., ... & Kautz, J. (2014). FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics (TOG), 33(6), 231.

Buckler, M., Jayasuriya, S., & Sampson, A. (2017). Reconfiguring the imaging pipeline for computer vision. In Proceedings of the IEEE International Conference on Computer Vision (pp. 975-984).

Diamond, S., Sitzmann, V., Boyd, S., Wetzstein, G., & Heide, F. (2017). Dirty pixels: Optimizing image classification architectures for raw sensor data. arXiv preprint arXiv:1701.06487.

Contact person
Korbinian Weikl, M.Sc.
korbinian.weikl@bmw.de
+49-151-601-48831


Supervisor:

Korbinian Weikl

Application of Machine-Learning-Based Approaches in FPGA Design Optimization

Description

The ever higher demands on the computational capabilities of FPGA devices in application domains like data analysis or image processing are forcing researchers to rethink their conventional approaches to system design. One alternative is approximate computing, which performs inexact calculations instead of the exact ones and brings better performance, area and energy efficiency to hardware systems. However, it is essential to keep the application quality degradation caused by such approximations below a tolerable limit. Machine-learning-based approaches such as learning classifier systems or genetic algorithms play an important role in identifying the optimal FPGA design parameters that maximize the above benefits, with or without approximations in the calculations.

This seminar aims to identify and analyze applications of machine-learning-based approaches in FPGA design optimization, with or without approximations.

Contact

Manu Manuel, Room: N2116, manu.manuel@tum.de, +49 89 289 28338

Supervisor:

Manu Manuel

Hardware Acceleration Techniques for Virtualized Network Functions

Description

Network traffic traverses a number of different links before reaching its destination and is processed in one way or another at each node of the network. These processing tasks range from simple IP forwarding to more complex operations like the deployment of firewalls. Traditionally, network functions were realized using dedicated proprietary hardware, which is now more and more replaced by commodity servers executing the same tasks in software, effectively virtualizing the network functions. Since general-purpose computing equipment is inherently slower in direct comparison, however, recent approaches attempt to support it using hardware accelerators like FPGAs, ASICs or NPUs.

The goal of this seminar topic is to investigate state-of-the-art hardware acceleration techniques to improve virtualized network function performance.

Supervisor:

Franz Biersack

Knobs and the Metrics They Influence for Runtime Performance and Power Optimization on MPSoCs

Keywords:
Runtime, Optimization, MPSoC, Performance, Power Saving

Description

Modern multi-processor systems-on-chip (MPSoCs) are designed to deliver high peak performance when the user requests it, but for devices like mobile phones they also need to be very power efficient in stand-by mode. To manage the large dynamic range between these two operating points, such chips include several knobs that can be adjusted at runtime to switch from one mode to another.

The goal of this seminar topic is a literature survey of the different knobs currently available at the hardware as well as the software level to change a system's performance at runtime. The survey should compare the different knobs with respect to their impact on performance and power, their local vs. global effect, and the periodicity at which they can be applied. Furthermore, the different performance metrics influenced by each knob should be mentioned.

Supervisor:

Florian Maurer

Reinforcement Learning Approaches for Interacting Individuals

Keywords:
Machine Learning, Reinforcement Learning, Multi-Agent, Centralized, Decentralized

Abstract:
Research on Learning in Multi-Agent Systems

Description

In recent research, machine learning has been successfully applied to many kinds of problems, such as image classification or control, and has been shown to build highly accurate models even from large amounts of sensor data. In many problems, such as autonomous driving, coordination between individually controlled entities (cars) is necessary to optimize for a specific goal (e.g. routing with the least average travel time).

For such problems, mainly two approaches exist: (1) the centralized and (2) the decentralized one. In the centralized approach (1), all sensor data is shared with a central controller that processes it. In the decentralized approach (2), only parts of the data are shared between the different entities, which decide on their actions locally based on the available information.

The goal of this seminar topic is to compare the advantages and disadvantages of the centralized and the decentralized (multi-agent) approach.

Supervisor:

Florian Maurer

Approximate Computing Methods for FPGA-Based Image Processing

Description

Digital image processing in professional applications places ever higher demands, so that the computing power and power consumption of FPGA devices reach their limits. Approximate computing provides a new design paradigm by performing inexact calculations instead of the exact ones. As a result, fewer resources are used in the FPGA devices, more functions can be implemented, and the energy efficiency of the calculations is improved. However, approximate computing always trades application quality against these benefits. Hence, it is important to keep the quality degradation below a tolerable limit.
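A minimal sketch of the accuracy-versus-cost trade (a hypothetical truncation-based approximate multiplier, not a specific scheme from the literature): dropping the t least significant bits of each operand shrinks the partial-product array an FPGA must implement, at the cost of a bounded error.

```python
# Sketch: truncation-based approximate multiplication. Discarding t LSBs
# per operand reduces the multiplier width from n x n to (n-t) x (n-t),
# saving FPGA resources while introducing a small relative error.

def approx_mul(a, b, t):
    """Multiply after truncating t LSBs; rescale to the original magnitude."""
    return ((a >> t) * (b >> t)) << (2 * t)

exact = 51051 * 40921
approx = approx_mul(51051, 40921, 4)
rel_err = abs(exact - approx) / exact
assert rel_err < 0.001   # quality degradation stays below a tolerable limit
```

For image pixels, such sub-0.1% errors are typically invisible, which is why multiplier truncation is a common entry point into approximate datapath design.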

This seminar aims to identify current trends in approximate computing on FPGAs for image processing and to analyze the most interesting approaches in detail.

Contact

Manu Manuel, Room: N2116, manu.manuel@tum.de, +49 89 289 28338

Supervisor:

Manu Manuel

Demand-Aware Power Management in Modern Multi-Core Processors

Description

For decades, microprocessors were operated at ever higher clock frequencies to steadily raise their processing power. After hitting the power wall in the early 2000s, however, a paradigm shift away from performance increases through frequency alone set in. Instead, higher numbers of processor cores were used to achieve further performance gains. And while modern processors still allow clock frequencies of several GHz, their workload often varies significantly, and operating them at a lower clock frequency is often sufficient. This also allows lower supply voltages, which overall can save considerable power. Thus, modern CPUs provide interfaces designed to control the frequency and voltage they operate at, in order to save energy and extend their life cycle.
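The leverage of combined voltage and frequency scaling follows from the first-order dynamic power model of CMOS logic, P = C · V² · f. The operating points below are assumed for illustration, not measured on any real CPU:

```python
# Sketch: dynamic (switching) power of CMOS logic.
# Halving the frequency often permits a lower supply voltage as well,
# so power drops far more than linearly.

def dynamic_power(c_eff, v, f):
    """P = effective switched capacitance * V^2 * f."""
    return c_eff * v * v * f

high = dynamic_power(c_eff=1e-9, v=1.2, f=3.0e9)   # 3 GHz at 1.2 V
low = dynamic_power(c_eff=1e-9, v=0.9, f=1.5e9)    # 1.5 GHz at 0.9 V
assert low / high < 0.5   # half the frequency, but ~3.6x less power
```

The quadratic voltage term is why DVFS interfaces scale voltage and frequency together rather than frequency alone.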

The goal of this seminar topic is to investigate mechanisms, advantages and drawbacks of state of the art power and thermal management technologies in modern multi-core processors.

Supervisor:

Franz Biersack

FPGAs in Cloud Computing

Description

For many years, cloud computing focused primarily on providing general-purpose, software-programmable resources such as CPUs and GPUs at a large scale. As the tools for hardware development advance and become viable for a broader group of (software) developers, big operators such as Amazon, Microsoft or Alibaba have recently introduced FPGA-equipped nodes to their portfolios to allow customers to accelerate their workloads. The goal of this seminar topic is to survey the availability of "programmable hardware" across public cloud environments, to give an overview of how customers can program and integrate these resources in their applications, and to outline use cases, limitations and technical challenges.

Supervisor:

Inference on FPGA: A Survey of Technologies and Trends Towards Efficient Neural Network Accelerators

Description

In recent years, the growing interest in deep learning across multiple disciplines has created a new competitive field in the semiconductor industry. The race towards providing the best hardware platform for training or inference has pushed some of the largest manufacturers into making acquisitions, changing some of the fundamental subcomponents of their existing solutions, and providing new solutions altogether.

The goal of this survey is to study the efforts made by the major FPGA manufacturers to push their platforms as the ideal solution for deep neural network inference on edge devices.

Supervisor:

Nael Al-Fasfous

Analysis of Software Graph Processors

Description

The copying of graphs is well suited for near-memory hardware acceleration. The goal of this seminar is to survey and analyze existing comparable software approaches.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Multi-Agent Distributed Reinforcement Learning Approaches

Keywords:
Machine Learning, Reinforcement Learning, Multi-Agent

Abstract:
Research on Learning in Multi-Agent Systems

Description

In recent research, machine learning has been successfully applied to many kinds of problems, such as image classification or control, and has been shown to build highly accurate models even from large amounts of sensor data. In many problems, such as autonomous driving, coordination between individually controlled entities (cars) is necessary to optimize for a specific goal (e.g. routing with the least average travel time).

For such problems, mainly two approaches exist: (1) the centralized and (2) the decentralized one. In the centralized approach (1), all sensor data is shared with a central controller that processes it. In the decentralized approach (2), only parts of the data are shared between the different entities, which decide on their actions locally based on the available information.

The goal of this seminar topic is to search for and investigate different learning systems that are based on the decentralized (multi-agent) approach and ideally aim for a single global goal.

Supervisor:

Florian Maurer

Hardware-level Approximation of Convolutional Neural Networks

Description

Convolutional Neural Networks (CNNs) have become the state of the art in image classification and other computer vision tasks. This has led to a substantial effort from industry and academia to bring such neural networks to edge devices. With tight area, power and latency constraints, this challenge presents many optimization opportunities. Neural networks are inherently error-resilient, which makes approximate computing a prominent approach for such optimizations.

Many approximation techniques have been used to optimize CNNs at the structural level. These include quantization, pruning, and low-rank approximation, among others. The goal of this seminar is to research possible approximations at the hardware level, to further exploit the resilience of such neural networks.

Supervisor:

Nael Al-Fasfous

Hardware Acceleration of Convolutional Neural Networks on FPGA Platforms

Description

Convolutional Neural Networks (CNNs) have become the state of the art in image classification and other computer vision tasks. Their highly parallel structure renders general-purpose Central Processing Units (CPUs) inefficient at running inference for such networks. General-purpose Graphics Processing Units (GPUs), on the other hand, have a highly parallel architecture and prove to be very fast at executing most CNNs. Unfortunately, GPUs are power-hungry and therefore not suitable for edge applications, where energy is a valuable, limited resource. This has led to an effort from industry and academia to develop custom hardware accelerators for CNNs at the edge. Since recent state-of-the-art CNNs vary in their structure and often have architectural irregularities, a single, fixed hardware accelerator design is usually not the best option. Rapid prototyping and support for pipeline irregularities make Field Programmable Gate Arrays (FPGAs) an excellent platform for designing accelerators tailored to the latest CNNs.

The goal of this seminar is to research the latest advancements and trends in CNN accelerator design on FPGAs.

Supervisor:

Nael Al-Fasfous

Coherence for Near Memory Computing

Abstract:
The goal of this seminar is to provide a survey of cache coherence mechanisms for near-memory computing.

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

The goal of this seminar is to investigate how cache coherence mechanisms adapt to such near memory computing operations.

Contact

srivatsa.akshay@tum.de

Supervisor:

A Survey on Hybrid/Adaptive Coherency Protocols

Abstract:
The goal of this seminar is to survey hybrid, architecture-specific extensions to traditional cache coherence mechanisms that optimize application performance.

Description

Hardware-supported coherent architectures allow for faster coherency messages and easier programming models. For large systems with varying application demands, however, the scalability and performance of such schemes may be suboptimal. The goal of this seminar topic is a detailed analysis of different hybrid/adaptive coherency schemes in the scope of NoC-based distributed shared memory MPSoCs.

Contact

srivatsa.akshay@tum.de

Supervisor:

Analysis of Near Memory Graph Accelerators

Description

The copying of graphs is well suited for near-memory hardware acceleration. The goal of this seminar is to survey and analyze existing approaches.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

An Introduction to Finite Length Codes for SoCs

Description

High data integrity is key in modern SoC communication. However, due to ever decreasing feature sizes, modern silicon devices become more vulnerable to transient faults. At the same time, on-chip communication operates on rather small chunks of data, in contrast to traditional unreliable communication scenarios like wireless communication. Therefore, conventional measures like the channel capacity introduced by Shannon do not hold anymore, paving the way for new methods that quantify channels and codes alike while taking the code length into account. The goal of this seminar is to provide an introduction to the field and methods of finite length codes.
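As a concrete instance of the short-block regime described above (an illustrative classic, not necessarily from the seminar literature): the Hamming(7,4) code protects a 4-bit chunk with three parity bits and corrects any single transient bit flip.

```python
# Sketch: Hamming(7,4) -- 4 data bits, 3 parity bits, corrects 1 bit flip.
# Code bit positions are 1..7; parity bit i covers positions whose
# binary index has bit i set.

def encode(d):          # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):          # returns the corrected data bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = encode(data)
code[4] ^= 1                          # inject a transient fault
assert decode(code) == data           # the fault is corrected
```

With a rate of 4/7 at block length 7, such a code sits far from Shannon's asymptotic capacity, which is precisely why finite length analysis is needed for on-chip links.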

Supervisor: