Currently offered Theses

New topics are often being prepared for advertisement and are not yet listed here. It is also sometimes possible to define a topic matching your specific interests. Therefore, do not hesitate to contact our scientific staff if you are interested in contributing to our work. If you have further questions concerning a thesis at the institute, please contact Dr. Thomas Wild.

For students interested in an "Ingenieurpraxis":

We supervise such internships carried out in industry if the topic matches our area of work. However, we do not offer these internships at our chair itself, as in our view students should gain early experience of working in industry.

 


Extending Region Based Cache Coherence to Global (DDR) Memory for Distributed Shared MPSoCs on an FPGA Prototype

Keywords:
Cache Coherence, Distributed Shared Memory MPSoCs

Short Description:
The goal of this project is to extend RBCC to global memory with distributed directories.

Description

Providing hardware coherence for modern tile-based MPSoCs requires additional area. As a result, this does not scale with increasing tile counts. As part of the Invasive Computing project, we introduced Region Based Cache Coherence (RBCC), which is a scalable approach that provides on-demand coherence. RBCC enables users to dynamically create/destroy coherency regions based on application requirements. Currently, RBCC has been developed for the distributed tile-local memories of our system. The next step is to extend RBCC to the global memory, so as to fully utilize the memory capacity of our heterogeneous multicore architecture.
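As a purely illustrative aid (not the actual RBCC implementation; all names below are invented for this sketch), the following Python model shows the bookkeeping idea behind a coherency region: a directory tracks which tiles of the region share a cache line and which copies must be invalidated when one tile writes.

    # Minimal software sketch of a per-region directory (illustrative only).
    class RegionDirectory:
        def __init__(self, region_tiles):
            self.region_tiles = set(region_tiles)  # tiles belonging to this coherency region
            self.sharers = {}                      # cache-line address -> set of sharer tiles

        def read(self, tile, addr):
            if tile not in self.region_tiles:
                raise ValueError("tile is not part of this coherency region")
            self.sharers.setdefault(addr, set()).add(tile)

        def write(self, tile, addr):
            if tile not in self.region_tiles:
                raise ValueError("tile is not part of this coherency region")
            # All other sharers must invalidate their copies before the write.
            invalidate = self.sharers.get(addr, set()) - {tile}
            self.sharers[addr] = {tile}
            return invalidate

    # Example: a region spanning tiles 0-3; tile 2 writes a line that tiles 0 and 1 read.
    d = RegionDirectory(region_tiles=[0, 1, 2, 3])
    d.read(0, 0x1000)
    d.read(1, 0x1000)
    print(d.write(2, 0x1000))  # -> {0, 1}

In this thesis, similar directory state would have to be maintained for globally shared DDR memory in a distributed fashion, as indicated in the short description above.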

Towards this goal you’ll complete the following tasks:

  • Investigate existing distributed directory based cache coherence schemes

  • Extend RBCC to global DDR memory

  • Verify the design on an FPGA-based hardware platform

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very Good VHDL Skills

  • Good C/C++ Skills

  • Good understanding of MPSoCs and Cache Coherence Schemes

  • Self-motivated and structured work style

Contact

Akshay Srivatsa
Chair of Integrated Systems
Arcisstraße 21, 80333 Munich
Tel. +49 89 289 22963
srivatsa.akshay@tum.de
www.lis.ei.tum.de

Supervisor:

----

HW-SW interface design for a self-aware SoC paradigm based on hardware machine learning (IPF)

Keywords:
Machine Learning, Supervisory control theory, Learning Classifier system, HW-SW Interface, VHDL, C, SoC

Short Description:
The goal of this project is to develop a hardware-software interface for the machine learning based IPF platform.

Description

Today's Multi-Processor Systems-on-Chip (MPSoCs) are becoming more and more complex due to the growing number of cores and accelerators. Hence, it is no longer possible to set runtime parameters like frequency and task distribution optimally at design time. Therefore, future controllers try to make use of machine learning that is aware of the system's current state (self-awareness).

Information Processing Factory (IPF) is a global project that aims to demonstrate self-awareness across multiple abstraction levels. It represents a paradigm shift in platform design by envisioning the move towards a consistent platform-centric design in which the combination of self-organized learning and formal reactive methods guarantees the applicability of such cyber-physical systems in safety-critical and high-availability applications.

At TUM, we explore the application and implementation of machine learning algorithms in hardware to optimize the mode of operation of MPSoCs at runtime. 
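As background, a learning classifier system, such as the Learning Classifier Tables (LCTs) used in this project, maintains a set of condition-action rules whose fitness is updated from a reward signal. The following is a heavily simplified, hypothetical Python sketch of that idea; the real LCT is a hardware module and its interface differs.

    # Hypothetical, minimal Learning Classifier Table (LCT) model: condition -> action
    # rules with fitness values that are reinforced by a reward. Illustrative only.
    class LCT:
        def __init__(self, rules):
            # rules: list of (condition, action); condition is a predicate on the state
            self.rules = [{"cond": c, "action": a, "fitness": 1.0} for c, a in rules]
            self.last = None

        def decide(self, state):
            matching = [r for r in self.rules if r["cond"](state)]
            # For simplicity, always exploit the fittest matching rule
            # (a real LCS also explores, e.g. via roulette-wheel selection).
            self.last = max(matching, key=lambda r: r["fitness"])
            return self.last["action"]

        def reward(self, value):
            # Widrow-Hoff style update of the last applied rule's fitness.
            self.last["fitness"] += 0.1 * (value - self.last["fitness"])

    # Example: pick a DVFS level based on the core temperature.
    lct = LCT([(lambda s: s["temp"] > 80, "freq_low"),
               (lambda s: s["temp"] <= 80, "freq_high")])
    action = lct.decide({"temp": 85})  # -> "freq_low"
    lct.reward(1.0)                    # the decision turned out well

The HW-SW interface developed in this project would, for instance, carry such decisions and reward updates between the hardware LCT and the software running on the FPGA platform.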

Towards this goal, you'll complete the following tasks:
1. Understand the current implementation of Learning Classifier Tables (LCT) and Supervisory Control as well as their communication with SW in VHDL.
2. Design and implement a new HW-SW interface which supports new features and functionalities on the FPGA.
3. Develop a software API to utilize the functionalities implemented in hardware.
4. Test your new HW-SW interface.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:
• Good VHDL Skills
• Good C/C++ Skills
• Good Understanding of MPSoCs
• Self-motivated and structured work style
• Optional: basic knowledge of machine learning

Contact

Anmol Prakash Surhonne
Chair of Integrated Systems
Arcisstrasse 21, 80333 Munich Germany
Tel. +49 89 289 23872
anmol.surhonne@tum.de
www.lis.ei.tum.de 

Florian Maurer
Chair of Integrated Systems
Arcisstrasse 21, 80333 Munich Germany
Tel. +49 89 289 23870
flo.maurer@tum.de
www.lis.ei.tum.de

Supervisor:

Anmol Prakash Surhonne, Florian Maurer
------

Evaluate sensor processing algorithm on new computer platform

Description

  • Get familiarised with the new computer architecture based on the ARM Cortex-R52 and with the typical sensor processing algorithm
  • Evaluate the computer hardware and software architecture for a sensor processing implementation
  • Perform an implementation of the algorithm with a selected software architecture
  • Evaluate the performance in terms of processing speed and computer resources (memory, interconnect bandwidth, …)
  • Documentation of the implementation, evaluation and test results

Supervisor:

------

Implementation of a navigation clock assessment algorithm for space applications

Description

  • Understand and replay the simulations of the algorithm in Matlab Simulink
  • If necessary, adapt them for a VHDL-oriented implementation
  • Evaluate the FPGA technology, suitable for space, for the algorithm implementation
  • Evaluate the algorithm's resource needs for an FPGA implementation
  • Perform all steps to define, implement and test (vs. Matlab) the algorithm in the FPGA
  • Documentation associated with the VHDL development process

Supervisor:

------

Implementation of streaming test interfaces for a new space computing platform

Description

  • Get familiarised with the new computer architecture and with the streaming interface concept
  • Implement the streaming interface for the test system in VHDL
  • Generate test software for the HPDP computer architecture and the test system
  • Debug and execute the interface tests on the HPDP hardware
  • Execute performance tests
  • Documentation of all the evaluation, implementation, various analysis and test results

Supervisor:

------

Implementation of benchmarks for a new computing platform for space applications

Description

  • Evaluate the HPDP processor architecture for the required benchmarks: signal processing (such as FIR, FFT) and I/O performance
  • Partition the benchmarks into their sequential and parallel parts
  • Implement the benchmarks in C and NML (Native Mapping Language, a software dataflow description language) to achieve best performance
  • Debug and execute the benchmarks on the HPDP hardware
  • Documentation of all the evaluation, implementation, various analysis and test results

Supervisor:

------

Evaluation of space processors for robotic and autonomous navigation

Description

  • Evaluate the array processor architecture for image-based navigation algorithms
  • Evaluate and improve the existing image processing algorithm based on the images obtained from the camera embarked on the rover, especially considering what difficulties might appear with respect to the same implementation in Matlab or C/C++
  • Perform all steps to implement and test the image-based navigation algorithms on the on-board processor (HPDP) and analyse the results (quality of detection and tracking of features, throughput and execution time of the algorithms)
  • Design the motor control activities for the manoeuvring of the rover with a given μP
  • Documentation of all the evaluation, implementation, various analysis and test results

Supervisor:

------

Convolutional Neural Networks for space image processing

Description

  • Develop a systematic and mathematical understanding of the principles of CNNs
  • Object shape and intensity learning in space: methods, algorithms and CNNs
  • Object recognition in space with CNNs: algorithms and methodologies
  • Implementation in an FPGA technology suitable for space, in VHDL
  • Evaluation of the suitability of CNNs for space

Supervisor:

------

Evaluation of a new European FPGA technology ("Brave") for space mass memory

Description

- Evaluation of the "Brave Large" FPGA technology in terms of the impact for current space mass memory FPGA architectures
- Perform the portability assessment of a space mass memory applications to the new FPGA technology, make use of the
available on-chip ARM Cortex R5 core if necessary
- Demonstrate the performance of the application with an implementation on the simulation tools and in hardware
- Evaluation of tool chain (Synthesis and Place and Route) for this technology
- Documentation of all the evaluation, implementation, various analysis and test results.

Supervisor:

------

Exploring the Dynamicity of Region Based Cache Coherence for Distributed Shared Memory MPSoCs on an FPGA Prototype

Keywords:
Cache Coherence, Distributed Shared Memory MPSoCs

Short Description:
The goal of this project is to explore the dynamicity of RBCC and minimize the context switching penalties.

Description

Providing hardware coherence for modern tile-based MPSoCs requires additional area. As a result, this does not scale with increasing tile counts. As part of the Invasive Computing project, we introduced Region Based Cache Coherence (RBCC) which is a scalable approach that provides on-demand coherence. RBCC enables users to dynamically create/destroy coherency regions based on application requirements. With such dynamicity, the associated context switching overheads like cache flushing, directory flushing, coherency region reconfigurations, etc. need to be investigated and optimized.

Towards this goal you’ll complete the following tasks:
• Investigate existing directory based cache coherence schemes
• Implement/Modify a dynamic framework for RBCC
• Verify the design on an FPGA-based hardware platform

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:
• Very Good VHDL Skills
• Good C/C++ Skills
• Good understanding of MPSoCs and Cache Coherence Schemes
• Self-motivated and structured work style

Contact

Akshay Srivatsa
Chair of Integrated Systems
Arcisstraße 21, 80333 Munich
Tel. +49 89 289 22963
srivatsa.akshay@tum.de
www.lis.ei.tum.de

Supervisor:

------

Approximate Computing for FPGA-based Image Processing

Description

Digital image processing in professional applications places ever-higher demands on computing power, so that the performance and power consumption of FPGA devices reach their limits. Approximate computing refers to a set of methods that do not perform calculations exactly, but only approximately. As a result, fewer resources are used in the FPGA, more functions can be implemented in the existing FPGA devices, and the energy efficiency of the calculations is improved. However, approximate computing always degrades the quality of the application, so an optimization process must be found that maximizes the benefit while keeping the degradation below a tolerable limit.
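As a small, hedged illustration of the principle (not tied to any particular FPGA design; function and parameter names are invented), consider truncating operand bits before a multiplication: the hardware then needs fewer partial products, at the price of a bounded error.

    # Illustrative approximate multiplier: drop the low-order bits of both
    # operands before multiplying. Fewer bits -> fewer partial products in HW,
    # at the cost of a small, bounded relative error.
    def approx_mul(a, b, drop_bits=4):
        a_t = (a >> drop_bits) << drop_bits
        b_t = (b >> drop_bits) << drop_bits
        return a_t * b_t

    exact = 12345 * 6789
    approx = approx_mul(12345, 6789)
    print(exact, approx, abs(exact - approx) / exact)  # relative error stays small

An optimization process as described above would then tune parameters such as drop_bits per operator so that the image quality stays above the tolerable limit.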

Supervisor:

------

An Introduction to Finite Length Codes for SoCs

Description

High data integrity is key in modern SoC communication. However, due to the ever decreasing feature size, modern silicon devices become more vulnerable to transient faults. At the same time, on-chip communication operates on rather small chunks of data, in contrast to traditional unreliable communication scenarios like wireless communication. Therefore, conventional measures like the channel capacity introduced by Shannon do not hold anymore, paving the way for new methods to quantify channels and codes alike that take the code length into account. The goal of this seminar is to provide an introduction into the field and methods of finite length codes.

Supervisor:

------

Meltdown: Concept, Cause and Effect

Description

When researchers published their discovery of the Meltdown and Spectre side-channel attacks on modern CPUs at the beginning of 2018, an entire industry was forced to rethink state-of-the-art techniques used to increase the processing power of their designs. This seminar shall present the core concepts of modern processors, the exploits leading to Meltdown, as well as mitigation techniques.

Supervisor:

------


Design and Implementation of a Network Interface for a Fault-Tolerant Time-Division Multiplexed Network on Chip

Description

Enabled by ever decreasing structure sizes, modern System on Chips (SoC) integrate a large amount of different processing elements, making them Multi-Processor System on Chips (MPSoC). These processing elements require a communication infrastructure to exchange data with each other and with shared resources such as memory and I/O ports. The limited scalability of bus-based solutions has led to a paradigm shift towards Network on Chips (NoC) which allow for multiple data streams between different nodes to be exchanged in parallel.
In order to implement a safety-critical real-time application on such an MPSoC, the NoC must fulfill certain requirements: it must ensure that no critical data gets lost, all critical data gets delivered within a certain deadline, and other applications cannot interfere with the critical application. And all this must be guaranteed even in case of a fault in the NoC.
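To make the guarantee mechanism concrete, the sketch below models a slot table as it could be used when admitting TDM connections. It is a simplified illustration only: the per-hop slot shifting of pipelined TDM NoCs is ignored, and all names are invented rather than taken from the project's actual design.

    # Simplified TDM slot-table model: a connection is only admitted if its
    # slots are free on every link of its path, which is what rules out
    # interference between applications. Illustrative only.
    NUM_SLOTS = 8

    class SlotTable:
        def __init__(self):
            self.links = {}  # link id -> list of slot owners (None = free)

        def reserve(self, path, slots, conn_id):
            """Reserve `slots` on every link of `path`; all-or-nothing."""
            for link in path:
                table = self.links.setdefault(link, [None] * NUM_SLOTS)
                if any(table[s] is not None for s in slots):
                    return False  # conflict: a slot is already owned by another connection
            for link in path:
                for s in slots:
                    self.links[link][s] = conn_id
            return True

    tdm = SlotTable()
    print(tdm.reserve(path=["r0-r1", "r1-r2"], slots=[0, 4], conn_id="A"))  # True
    print(tdm.reserve(path=["r1-r2"], slots=[4], conn_id="B"))              # False, slot 4 taken

The network interface addressed in this thesis would inject and receive data according to such slot reservations.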

Goal

The goal of this thesis is to implement a Network Interface for a Time-Division Multiplexed NoC that meets the criteria described above and create tests to validate the behavior of the implemented hardware.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in a hardware description language, e.g. VHDL or (System)Verilog
  • Solid Python programming skills
  • At least basic knowledge of the functionality of NoCs
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the concept of TDM NoCs
  • design and implement a complex hardware module in SystemVerilog
  • create tests to validate hardware modules
  • document your work in the form of a scientific report and a presentation

 

 

Contact

Max Koenen
Room N2118
Tel. 089 289 23084
max.koenen@tum.de

Supervisor:

------

Extensions & Performance Benchmarks of a CAPI-based Network Interface Card

Extensions & Performance Benchmarks of a CAPI-based Network Interface Card

Description

With ever-increasing network data rates, the data transfer between network interface card (NIC) and the host system has a decisive impact on the achievable application performance. To fully exploit the host system’s CPU capacity for application processing, it is important to minimize I/O processing overheads. In this project, we want to extend the implementation and optimize the performance of an FPGA-based NIC that is connected to the host system with the Coherent Accelerator Processor Interface (CAPI) [1] for IBM POWER8 Systems.

In a previous project an initial implementation of the CAPI-based NIC was developed using the CAPI Storage, Network and Analytics Programming (SNAP) framework [2]. The goal of this project is to integrate the physical network interfaces in the design, as well as to identify and mitigate performance bottlenecks.

[1] https://developer.ibm.com/linuxonpower/capi/

[2] https://openpowerfoundation.org/blogs/capi-snap-simple-developers

Towards this goal you will complete the following tasks:

  • Analyze source code and working principles of the existing NIC implementation
  • Get familiar with CAPI and the CAPI SNAP framework
  • Integrate an Ethernet Media Access Controller (MAC) IP core into the FPGA design
  • Benchmark throughput and latency of FPGA-to-host communication through simulations and measurements
  • Identify performance bottlenecks, propose and implement improvements
  • Extend the design to make use of multiple RX/TX queues for multi-core processing

Prerequisites

To successfully complete this project, you should already have several of the following skills and experiences:

  • Knowledge of a hardware description language such as Verilog and/or VHDL
  • Hands-on FPGA development experience
  • Solid C programming skills
  • Proficiency using Linux
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the basic working principles of NICs, as well as FPGA-host communication mechanisms
  • apply your theoretical knowledge to an implementation consisting of both hard- and software parts
  • document your work in the form of a scientific report and a presentation

Contact

Andreas Oeldemann
Room N2137
Tel. 089 289 22962
andreas.oeldemann@tum.de

The thesis is carried out in cooperation with

Power Systems Acceleration Department
IBM Systems – HW Development Böblingen
IBM Deutschland R&D GmbH

 

 

Supervisor:

-----

Design and Implementation of a Hardware Managed Queue

Description

Queues are a central element of operating systems and of application control flow in general.

This project is part of a hardware-software codesign.

Goal

The goal of this project is to develop a hardware managed queue for a NoC-based multiprocessor platform
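For orientation, the following is a minimal software model of the kind of bounded FIFO that such a hardware-managed queue could replace; all names are invented for this sketch and do not reflect the project's actual HW/SW interface.

    # Minimal bounded FIFO (ring buffer) as a software reference point.
    # A hardware-managed queue would implement enqueue/dequeue (and the
    # full/empty handling) in logic instead of in software. Illustrative only.
    class RingQueue:
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.head = 0    # next element to dequeue
            self.tail = 0    # next free slot
            self.count = 0

        def enqueue(self, item):
            if self.count == len(self.buf):
                return False              # full; a HW queue could stall or notify the producer
            self.buf[self.tail] = item
            self.tail = (self.tail + 1) % len(self.buf)
            self.count += 1
            return True

        def dequeue(self):
            if self.count == 0:
                return None               # empty
            item = self.buf[self.head]
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
            return item

    q = RingQueue(4)
    q.enqueue("task0")
    q.enqueue("task1")
    print(q.dequeue())  # -> task0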

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of a complex system
  • Good knowledge about hardware development.
  • Very good knowledge about digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

Application Profiling for Near Memory Computing

Description

Hitting a wall is not a pleasant thing. Computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid 90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to profile applications in the context of Near Memory Computing and to identify useful functions or primitives that could be accelerated.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in C/C++
  • Good programming skills in SystemC
  • Very good analytical thinking and understanding of complex problems
  • Good knowledge about digital circuit design
  • Very good knowledge in the field of Near Memory Computing

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

FPGA Prototyping a Bus Front-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing. Computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid 90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a bus front-end for near memory operations on a FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of a complex system
  • Good knowledge about hardware development.
  • Very good knowledge about digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

FPGA Prototyping a Memory Back-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing. Computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid 90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a memory back-end for near memory operations on a FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of a complex system
  • Good knowledge about hardware development.
  • Very good knowledge about digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

------

Frequency Optimization of a FPGA Prototype

Description

Our NoC-based many-core design is implemented on multiple Xilinx Virtex-7 FPGAs. Its operating frequency is currently limited by individual components.

Goal

The goal of this work is to optimize the overall frequency of an FPGA design.

This work includes:

  • Identification of the critical paths of the design
  • Pipelining the design to reach higher frequencies

Prerequisites

For this challenging task, several prerequisites should be met:

  • Very good knowledge of VHDL
  • Very good knowledge of the Xilinx Vivado Synthesis Tool
  • Very good experience with FPGA design
  • Very good knowledge about digital circuit design

Application

If you are interested, send me an email with your CV, your transcript of records and a summary of your experience attached.

Contact

Sven Rheindt

Room: N2140

Tel. 089 289 28387

sven.rheindt@tum.de

Supervisor:

---

Simulator Support for Dynamic Task Migration

Description

Hitting a wall is not a pleasant thing. Computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid 90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to implement dynamic data migration into a trace-based simulator and to evaluate its potential.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in C++ or SystemC
  • Good comprehension of a complex system
  • Very good knowledge about hardware development.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Ongoing Works

Master's Theses

Design and Implementation of a Fault-Tolerant Low-Throughput Broadcast Control & Management Network for System on Chip

Description

Enabled by ever decreasing structure sizes, modern System on Chips (SoC) integrate a large amount of different processing elements, making them Multi-Processor System on Chips (MPSoC). These processing elements require a communication infrastructure to exchange data with each other and with shared resources such as memory and I/O ports. The limited scalability of bus-based solutions has led to a paradigm shift towards Network on Chips (NoC) which allow for multiple data streams between different nodes to be exchanged in parallel.
One way of organizing the access to such a NoC is by using Time-Division Multiplexing (TDM), which allows service guarantees to be given. However, such a TDM NoC must be configured before it can be used, which requires a reliable configuration network.

Goal

The goal of this thesis is to implement a reliable broadcast configuration network that can be used to configure the routers and network interfaces of a TDM NoC and to create tests to validate the implemented hardware.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Good programming skills in a hardware description language, e.g. VHDL or (System)Verilog
  • Good knowledge of on-chip communication
  • Solid Python programming skills
  • At least basic knowledge of the functionality of NoCs
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the concept of TDM NoCs
  • create and extend hardware modules in SystemVerilog
  • create tests to validate hardware modules
  • document your work in the form of a scientific report and a presentation

 

Contact

Max Koenen
Room N2118
Tel. 089 289 23084
max.koenen@tum.de

Supervisor:

Forschungspraxis or MSCE Internships

Implementation of Fault-Injection & Fault-Detection Mechanisms in a Time-Division Multiplexed Network on Chip

Description

Enabled by ever decreasing structure sizes, modern System on Chips (SoC) integrate a large amount of different processing elements, making them Multi-Processor System on Chips (MPSoC). These processing elements require a communication infrastructure to exchange data with each other and with shared resources such as memory and I/O ports. The limited scalability of bus-based solutions has led to a paradigm shift towards Network on Chips (NoC) which allow for multiple data streams between different nodes to be exchanged in parallel.
To implement safety-critical real-time applications on such an MPSoC, the NoC must be fault-tolerant. In order to fulfill this requirement, it is necessary to first detect a fault in the system. Furthermore, to test this requirement, it is necessary to be able to inject errors into the system at random times and places.

Goal

The goal of this thesis is to implement a fault-injection and a fault-detection mechanism in a Time-Division Multiplexed (TDM) NoC and to create tests to validate the behavior of the hardware models.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • At least basic programming skills in a hardware description language, e.g. VHDL or (System)Verilog
  • Solid Python programming skills
  • At least basic knowledge of the functionality of NoCs
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the concept of TDM NoCs
  • understand the concept of fault-detection in hardware
  • create and extend hardware modules in SystemVerilog
  • create tests to validate hardware modules
  • document your work in the form of a scientific report and a presentation

 

Contact

Max Koenen
Room N2118
Tel. 089 289 23084
max.koenen@tum.de

Supervisor:

Seminars

Architectures for Neuromorphic Computing

Description

The goal of neuromorphic computers is to mimic the behaviour of the human nervous system or brain. Since the behaviour of neurons differs greatly from how classical computer systems work, there is a need for new architectures. The approaches range from specialized CMOS designs and MOSFET-based architectures to memristor-based approaches. The goal of this seminar is to present the challenges posed by neuromorphic computing and how different architectures approach them.

Supervisor:

Energy Efficiency of Neural Networks

Description

Deep and Convolutional Neural Networks are currently the de-facto standard when it comes to machine learning, and in the past years there have been great advances regarding their performance. However, with the wide adoption of these techniques in data centers around the world, energy efficiency becomes a more and more important aspect. Therefore, the goal of this seminar is to provide an overview of neural network implementations in software and hardware with regard to their energy efficiency.

Supervisor:


Meltdown: Concept, Cause and Effect

Description

When researchers published their discovery of the Meltdown and Spectre side-channel attacks on modern CPUs at the beginning of 2018, an entire industry was forced to rethink state-of-the-art techniques used to increase the processing power of their designs. This seminar shall present the core concepts of modern processors, the exploits leading to Meltdown, as well as mitigation techniques.

Supervisor:

Specialized Hardware for Deep-Learning Applications (MSEI)

Keywords:
Inference Engines, TPU, DNNs

Short Description:
Deep Neural Networks (DNNs) have become a fundamental aspect of today's applications in the field of computer vision. Because of their inherent demand for memory and compute resources, their efficient implementation is still an unsolved problem. Moreover, their power consumption makes them impracticable for some embedded applications. Specialized hardware accelerators for DNNs are becoming more and more popular in the field of AI. In contrast to CPUs or GPUs, these systems are designed solely to compute DNNs efficiently. A survey of recent hardware accelerators for DNNs is in the scope of this seminar.

Description

Deep Neural Networks (DNNs) have become a fundamental aspect of today's applications in the field of computer vision. Because of their inherent demand for memory and compute resources, their efficient implementation is still an unsolved problem. Moreover, their power consumption makes them impracticable for some embedded applications. Specialized hardware accelerators for DNNs are becoming more and more popular in the field of AI. In contrast to CPUs or GPUs, these systems are designed solely to compute DNNs efficiently. A survey of recent hardware accelerators for DNNs is in the scope of this seminar.

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Synchronization via SpaceWire

Description

The Wide Field Imager (WFI) is an instrument of the ESA ATHENA satellite. Individual modules of the WFI subsystems (Instrument Control and Power Distribution Unit, Detector Electronics) are connected to each other via a SpaceWire router. To synchronize the submodules with each other, the time code of the SpaceWire protocol can be used. The goal of this work is to research the synchronization options offered by SpaceWire and to detail and characterize one or more possible concepts. Possible limitations of this method are to be identified and, if applicable, demonstrated on an existing test setup.

Contact

m.plattner@tum.de

Supervisor:

Applying Binary Weights and Activations to Deep Neural Networks (MSCE)

Keywords:
Binary, EXOR, Optimization, DNN

Short Description:
Binary weights and activations are capable of increasing the performance of DNNs drastically

Description

Floating-point operations are computationally heavy, memory hungry and consume a lot of energy, making them less suitable for embedded hardware. Fixed-point operations are therefore commonly used in embedded hardware [1]. Going one step further, binary weights and operations for CNNs are currently being discussed in the research community.

Courbariaux et al. [3] have applied binary weights to DNNs with a minor loss of accuracy. Moreover, in [2], they have shown that binary activations and computations can also be used within DNNs.
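The core idea can be illustrated with a few lines of Python (a hedged sketch only, not the referenced implementations): once weights and activations are constrained to +1/-1 and packed as bits, a dot product reduces to an XNOR followed by a popcount instead of many multiply-accumulate operations.

    # Illustrative binary dot product via XNOR + popcount (not the BinaryNet code).
    def binarize(values):
        """Map real values to +1/-1 by their sign and pack them into an int bitmask."""
        bits = 0
        for i, v in enumerate(values):
            if v >= 0:
                bits |= 1 << i
        return bits

    def binary_dot(x_bits, w_bits, n):
        """Dot product of two {+1, -1} vectors of length n given their bit encodings."""
        matches = bin(~(x_bits ^ w_bits) & ((1 << n) - 1)).count("1")  # XNOR, then popcount
        return 2 * matches - n  # every matching sign contributes +1, every mismatch -1

    x = [0.3, -1.2, 0.7, -0.1]
    w = [1.0, -0.5, -0.9, 0.2]
    print(binary_dot(binarize(x), binarize(w), len(x)))  # -> 0, same as the sign-vector dot product

This is also why binary networks map well to FPGAs and ASICs, where XNOR gates and popcount trees are cheap compared to floating-point multipliers.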

An elaborate analysis and comparison of recent publications in the field of low bit-width CNN applications is the main task of this seminar topic. Furthermore, the differences between the implementations, as well as potential hardware, should be discussed in this work.

References:

[1] Google TPU; https://cloud.google.com/tpu/?hl=de; 2018.

[2] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, et al.; BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1; 2016.

[3] Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David; BinaryConnect: Training Deep Neural Networks with binary weights during propagations; 2016.

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Applying Binary Weights and Activations to Deep Neural Networks (MSEI)

Keywords:
Binary, EXOR, Optimization, DNN

Short Description:
Binary weights and activations are capable of increasing the performance of DNNs drastically

Description

Floating-point operations are computationally heavy, memory hungry and consume a lot of energy, making them less suitable for embedded hardware. Fixed-point operations are therefore commonly used in embedded hardware [1]. Going one step further, binary weights and operations for CNNs are currently being discussed in the research community.

Courbariaux et al. [3] have applied binary weights to DNNs with a minor loss of accuracy. Moreover, in [2], they have shown that binary activations and computations can also be used within DNNs.

An elaborate analysis and comparison of recent publications in the field of low bit-width CNN applications is the main task of this seminar topic. Furthermore, the differences between the implementations, as well as potential hardware, should be discussed in this work.

References:

[1] Google TPU; https://cloud.google.com/tpu/?hl=de; 2018.

[2] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, et al.; BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1; 2016.

[3] Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David; BinaryConnect: Training Deep Neural Networks with binary weights during propagations; 2016.

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Sparse Weights and Activations in Deep Neural Networks (MSCE)

Keywords:
Sparse, DNN, Optimization

Short Description:
If the computations of convolutional neural networks are performed sparsely, the computational cost and the memory demand are reduced.

Description

Nowadays, Convolutional Neural Networks (CNNs) are used in a wide field, such as image and sound recognition, object detection and mobile vision. While offering remarkable results in several classification tasks and regression analysis, deep learning algorithms are extremely computationally heavy and storing the vast number of parameters requires a lot of memory space and bandwidth.

Song Han et al. [1] have demonstrated that DNNs contain a huge number of redundant and unused parameters. Removing those weights from the model makes the kernels and computations sparse. As a result, sparse weights and activations decrease the memory demand of the model. Moreover, if the computations are performed sparsely, the computational cost is reduced analogously. However, modern accelerators for DNNs are highly parallelized, which makes exploiting this sparsity a non-trivial task.
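As a hedged illustration of the pruning step (not the authors' implementation; the numbers are arbitrary), magnitude-based pruning simply zeroes out the smallest weights so that only the remaining ones need to be stored and computed:

    # Illustrative magnitude pruning in the spirit of Deep Compression [1].
    import numpy as np

    def prune(weights, sparsity=0.75):
        """Zero out the smallest-magnitude weights until `sparsity` is reached."""
        threshold = np.quantile(np.abs(weights), sparsity)
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    w = np.random.randn(4, 4).astype(np.float32)
    w_pruned, mask = prune(w)
    print(f"kept {int(mask.sum())} of {mask.size} weights")
    # Only the non-zero weights (plus their indices) need to be stored, and the
    # corresponding multiply-accumulates can in principle be skipped.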

In this seminar, the methods and possibilities of performing sparse convolutions and operations of DNNs on parallel hardware are to be studied.

References:

[1] Song Han, Huizi Mao, William J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; ICLR 2016

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Sparse Weights and Activations in Deep Neural Networks (MSEI)

Keywords:
Sparse, DNN, Optimization

Short Description:
If the computations of convolutional neural networks are performed sparsely, the computational cost and the memory demand are reduced.

Description

Nowadays, Convolutional Neural Networks (CNNs) are used in a wide field, such as image and sound recognition, object detection and mobile vision. While offering remarkable results in several classification tasks and regression analysis, deep learning algorithms are extremely computationally heavy and storing the vast number of parameters requires a lot of memory space and bandwidth.

Song Han et al. [1] have demonstrated that DNNs contain a huge number of redundant and unused parameters. Removing those weights from the model makes the kernels and computations sparse. As a result, sparse weights and activations decrease the memory demand of the model. Moreover, if the computations are performed sparsely, the computational cost is reduced analogously. However, modern accelerators for DNNs are highly parallelized, which makes exploiting this sparsity a non-trivial task.

In this seminar, the methods and possibilities of performing sparse convolutions and operations of DNNs on parallel hardware are to be studied.

References:

[1] Song Han, Huizi Mao, William J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; ICLR 2016

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Convolution in Convolutional Neural Networks (MSCE)

Keywords:
DNN, Optimization, Convolution, GEMM, FFT, Winograd, Hardware

Short Description:
Choosing the right convolution technique helps to accelerate the computation of deep neural networks in embedded systems. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

Description

Applying state-of-the-art CNNs in embedded systems, for example in the field of autonomous driving, is a doubly challenging task. Firstly, automotive applications have constrained hardware resources, such as limited memory and computational performance. Secondly, applications for autonomous driving have to perform fast and require low latency.

However, the classification of one image by VGG-16 [1] requires 15.5 billion floating-point operations.

Within CNNs, convolutions can be applied differently. However, choosing the right method helps to accelerate the computation. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].
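As a small, hedged example of one of these methods (illustrative only; no padding or stride handling), GEMM-based convolution unrolls the input patches into a matrix ("im2col") so that the whole convolution becomes a single matrix multiplication:

    # Illustrative im2col/GEMM convolution (valid cross-correlation, as used in CNNs).
    import numpy as np

    def conv2d_gemm(x, k):
        """x: (H, W) input, k: (kh, kw) kernel -> (H-kh+1, W-kw+1) output."""
        H, W = x.shape
        kh, kw = k.shape
        oh, ow = H - kh + 1, W - kw + 1
        # Unroll every kh x kw patch of the input into one row of the im2col matrix.
        cols = np.array([x[i:i + kh, j:j + kw].ravel()
                         for i in range(oh) for j in range(ow)])
        return (cols @ k.ravel()).reshape(oh, ow)  # one GEMM/GEMV call does all the work

    x = np.arange(16, dtype=np.float32).reshape(4, 4)
    k = np.ones((3, 3), dtype=np.float32)
    print(conv2d_gemm(x, k))  # each output value is the sum over one 3x3 patch

FFT-based [3] and Winograd [4] convolutions instead reduce the number of multiplications, trading arithmetic for transform overhead, which tends to pay off for large and for small kernels, respectively.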

In this seminar topic, existing methods for convolutions are studied. Moreover, the details of those methods and their use for CNNs are elaborated and the effects of convolutional parameters (kernel size, padding or stride) and potential hardware accelerators have to be evaluated.

References:

[1] K. Simonyan, A. Zisserman; Very Deep Convolutional Networks for Large-Scale Image Recognition; ICLR 2015.

[2] Rahul Garg, Laurie Hendren; A Portable and High-Performance General Matrix-Multiply; IEEE 2014.

[3] Tahmid Abtahi, Amey Kulkarni, Tinoosh Mohsenin; Accelerating convolutional neural network with FFT on tiny cores; ISCAS 2017.

[4] Andrew Lavin, Scott Gray, Fast Algorithms for Convolutional Neural Networks, CVPR 2016.

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

Convolution in Convolutional Neural Networks (MSEI)

Keywords:
DNN, Optimization, Convolution, GEMM, FFT, Winograd, Hardware

Short Description:
Choosing the right convolution technique helps to accelerate the computation of deep neural networks in embedded systems. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

Description

Applying state-of-the-art CNNs in embedded systems, for example in the field of autonomous driving, is a doubly challenging task. Firstly, automotive applications have constrained hardware resources, such as limited memory and computational performance. Secondly, applications for autonomous driving have to perform fast and require low latency.

However, the classification of one image by VGG-16 [1] requires 15.5 billion floating-point operations.

Within CNNs, convolutions can be applied differently. However, choosing the right method helps to accelerate the computation. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

In this seminar topic, existing methods for convolutions are studied. Moreover, the details of those methods and their use for CNNs are elaborated and the effects of convolutional parameters (kernel size, padding or stride) and potential hardware accelerators have to be evaluated.

References:

[1] K. Simonyan, A. Zisserman; Very Deep Convolutional Networks for Large-Scale Image Recognition; ICLR 2015.

[2] Rahul Garg, Laurie Hendren; A Portable and High-Performance General Matrix-Multiply; IEEE 2014.

[3] Tahmid Abtahi, Amey Kulkarni, Tinoosh Mohsenin; Accelerating convolutional neural network with FFT on tiny cores; ISCAS 2017.

[4] Andrew Lavin, Scott Gray, Fast Algorithms for Convolutional Neural Networks, CVPR 2016.

Contact

alexander.frickenstein@bmw.de

Supervisor:

Alexander Frickenstein

The Evolution of Bitcoin Hardware

Description

Since its deployment in 2009, Bitcoin has achieved remarkable success and spawned hundreds of other cryptocurrencies. This seminar topic traces the evolution of the hardware underlying the system, from early GPU-based homebrew machines to today’s datacenters powered by application-specific integrated circuits. These ASIC clouds provide a glimpse into planet-scale computing’s future.

Supervisor:

Static WCET Analysis for Multi-Core Systems

Description

It is indispensable to know the worst-case execution time (WCET) for the development of real-time systems. There exist several methods to approximate the WCET on single-core platforms. Whenever multiple tasks run simultaneously on a multi-core platform, these methods cannot provide a reliable estimation any more. The goal of this seminar is to summarize the major problems which arise when analysing multi-core applications and some methods to solve them.

Contact

Dirk Gabriel
Raum N2117
Tel. 089 289 28578
dirk.gabriel@tum.de

Supervisor:

Representation Learning for Multicore Power/Thermo Features

Keywords:
Machine learning, representation learning, multicore, power, temperature

Description

 

Reducing the power consumption of multicore processors is an ongoing and challenging task for processor designers. With increasing transistor counts, power and thermal information is increasingly difficult to obtain. To obtain more reliable power/thermo information, designers are starting to use novel machine learning algorithms. In this seminar, you will investigate different representation learning algorithms – a subset of machine learning algorithms – for identifying power/thermo features on chip. You will give an overview of related work on multicore representation learning and finally make an educated guess which representation learning algorithms are best suited for identifying power/thermo features.

 

Contact

mark.sagi@tum.de

Supervisor:

Convolution in Convolutional Neural Networks

Keywords:
DNN, Optimization, Convolution, GEMM, FFT, Winograd, Hardware

Short Description:
Choosing the right convolution technique helps to accelerate the computation of deep neural networks in embedded systems. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

Description

 

Applying state-of-the-art CNNs in embedded systems, for example in the field of autonomous driving, is a doubly challenging task. Firstly, automotive applications have constrained hardware resources, such as limited memory and computational performance. Secondly, applications for autonomous driving have to perform fast and require low latency.

 

However, the classification of one image by VGG-16 [1] requires 15.5 billion floating-point operations.

 

Within CNNs, convolutions can be applied differently. However, choosing the right method helps to accelerate the computation. Instead of using conventional matrix multiplications, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

In this seminar topic, existing methods for convolutions are studied. Moreover, the details of those methods and their use for CNNs are elaborated and the effects of convolutional parameters (kernel size, padding or stride) and potential hardware accelerators have to be evaluated.

 

References:

[1] K. Simonyan, A. Zisserman; Very Deep Convolutional Networks for Large-Scale Image Recognition; ICLR 2015.

[2] Rahul Garg, Laurie Hendren; A Portable and High-Performance General Matrix-Multiply; IEEE 2014.

[3] Tahmid Abtahi, Amey Kulkarni, Tinoosh Mohsenin; Accelerating convolutional neural network with FFT on tiny cores; ISCAS 2017.

[4] Andrew Lavin, Scott Gray; Fast Algorithms for Convolutional Neural Networks; CVPR 2016.

Supervisor:

Alexander Frickenstein

Student Assistant Jobs

Student Assistant for the Lecture "Digitale Schaltungen"

Description

The tasks include

  • Preliminary correction of homework assignments and practical exercises
  • 2 ngSpice tutorial sessions for the practical exercises

Contact

Sven Rheindt

Room: N2140

Tel. 089 289 28387

sven.rheindt@tum.de

Supervisor: