Currently offered Theses

Often, new topics that are not yet listed here are in preparation for being advertised. Sometimes there is also the possibility to define a topic matching your specific interests. Therefore, do not hesitate to contact our scientific staff if you are interested in contributing to our work. If you have further questions concerning a thesis at the institute, please contact Dr. Thomas Wild.

For students interested in an "Ingenieurpraxis":

We supervise such internships carried out in industry if the topic matches our area of work. However, we do not offer such internships at our chair itself, as in our view students should gain early experience of working in industry.

 

------

Extensions & Performance Benchmarks of a CAPI-based Network Interface Card

Description

With ever-increasing network data rates, the data transfer between network interface card (NIC) and the host system has a decisive impact on the achievable application performance. To fully exploit the host system’s CPU capacity for application processing, it is important to minimize I/O processing overheads. In this project, we want to extend the implementation and optimize the performance of an FPGA-based NIC that is connected to the host system with the Coherent Accelerator Processor Interface (CAPI) [1] for IBM POWER8 Systems.

In a previous project, an initial implementation of the CAPI-based NIC was developed using the CAPI Storage, Network and Analytics Programming (SNAP) framework [2]. The goal of this project is to integrate the physical network interfaces into the design, as well as to identify and mitigate performance bottlenecks.

[1] https://developer.ibm.com/linuxonpower/capi/

[2] https://openpowerfoundation.org/blogs/capi-snap-simple-developers

Towards this goal you will complete the following tasks:

  • Analyze source code and working principles of the existing NIC implementation
  • Get familiar with CAPI and the CAPI SNAP framework
  • Integrate an Ethernet Media Access Controller (MAC) IP core into the FPGA design
  • Benchmark throughput and latency of FPGA-to-host communication through simulations and measurements (see the sketch below this list)
  • Identify performance bottlenecks, propose and implement improvements
  • Extend the design to make use of multiple RX/TX queues for multi-core processing
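
As a rough illustration of the measurement part of this task, the sketch below shows how a host-side round-trip latency benchmark could be structured. The functions nic_send_descriptor() and nic_poll_completion() are invented placeholders standing in for the actual FPGA/host communication path; they are not part of the CAPI SNAP API.

```c
/*
 * Minimal host-side sketch of a round-trip latency measurement loop.
 * nic_send_descriptor() and nic_poll_completion() are placeholder stubs
 * for the actual CAPI/SNAP communication path.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

/* Placeholder: post a dummy TX descriptor to the FPGA. */
static void nic_send_descriptor(uint64_t tag) { (void)tag; }

/* Placeholder: return 1 once the FPGA has answered for this tag. */
static int nic_poll_completion(uint64_t tag) { (void)tag; return 1; }

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    const int iterations = 100000;
    uint64_t total_ns = 0;

    for (int i = 0; i < iterations; i++) {
        uint64_t start = now_ns();
        nic_send_descriptor((uint64_t)i);
        while (!nic_poll_completion((uint64_t)i))
            ;                               /* busy-wait for the round trip */
        total_ns += now_ns() - start;
    }

    printf("average round-trip latency: %.1f ns\n",
           (double)total_ns / iterations);
    return 0;
}
```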

Prerequisites

To successfully complete this project, you should already have several of the following skills and experiences:

  • Knowledge of a hardware description language such as Verilog and/or VHDL
  • Hands-on FPGA development experience
  • Solid C programming skills
  • Proficiency using Linux
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the basic working principles of NICs, as well as FPGA-host communication mechanisms
  • apply your theoretical knowledge to an implementation consisting of both hard- and software parts
  • document work in a scientific report form and in a presentation

Contact

Andreas Oeldemann
Room N2137
Tel. 089 289 22962
andreas.oeldemann@tum.de

The thesis is carried out in cooperation with

Power Systems Acceleration Department
IBM Systems – HW Development Böblingen
IBM Deutschland R&D GmbH

 

 

Supervisor:

-----

Design and Implementation of a Hardware Managed Queue

Description

Queues are a central element of operating systems and of application control flow in general.

This project is part of a hardware/software co-design.

Goal

The goal of this project is to develop a hardware-managed queue for a NoC-based multiprocessor platform.
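
To give an idea of the intended functionality, the sketch below models the behaviour such a hardware block would implement (a bounded ring buffer with head and tail pointers) as a small C reference model. Depth, data width and interface details are assumptions for illustration, not the actual design.

```c
/*
 * Software reference model of the queue behaviour the hardware block
 * is meant to implement: a bounded ring buffer with head/tail pointers.
 * Parameters are assumed for illustration only.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define QUEUE_DEPTH 16          /* assumed depth; would be a design parameter */

typedef struct {
    uint32_t slots[QUEUE_DEPTH];
    unsigned head;              /* next element to dequeue */
    unsigned tail;              /* next free slot          */
    unsigned count;
} hwq_model_t;

static bool hwq_enqueue(hwq_model_t *q, uint32_t item)
{
    if (q->count == QUEUE_DEPTH)
        return false;           /* hardware would raise a "full" flag */
    q->slots[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

static bool hwq_dequeue(hwq_model_t *q, uint32_t *item)
{
    if (q->count == 0)
        return false;           /* hardware would raise an "empty" flag */
    *item = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

int main(void)
{
    hwq_model_t q = {0};
    uint32_t v;
    hwq_enqueue(&q, 42);
    if (hwq_dequeue(&q, &v))
        printf("dequeued %u\n", v);
    return 0;
}
```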

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

----

Multi-core Interference Channel Analysis (at GE Aviation)

Description

This work is offered by General Electric Aviation and supervised at TUM LIS.

About GE Aviation

GE Aviation Munich is an R&D center of excellence located in the heart of southern Germany, on the Garching campus of the Technical University of Munich. This creates a unique blend for our engineers: they work in a university setting while performing research and development in a world-class industrial environment that is dedicated to bringing innovative technologies to market. Within the R&D community, the center maintains close partnerships with numerous universities, research institutions and technology companies in Germany and abroad.

Role summary

The role of the student will be to conduct an interference channel analysis to address potential safety challenges of modern multi-core architectures.

Responsibilities / Goals

GE Aviation is investigating the use of modern multi-core architectures. You will characterize the interference channels of two different multi-core architectures (NXP T1040 and Xilinx Zynq UltraScale+). The former is a quad-core PowerPC built around the e5500 core, the latter a quad-core ARM processor built around the Cortex-A53 core.
In your role you will:

  • Investigate domain specific literature (CAST-32A) which will give you a guideline and direction
  • Identify interference channels by using the specifications of both architectures mentioned above
  • Perform a state-of-the-art search of existing test suites that help to exercise and identify interference channels
  • Characterize each identified test suite's interference channel analysis capability and the granularity of its results
  • Implement a test suite based on the existing ones and successfully run it on both architectures in the lab

Expected Qualifications

  • Good C/C++ Skills
  • Good understanding of real-time operating systems (e.g. RTLinux, FreeRTOS, WindRiver VXWorks) and MPSoCs
  • Fluency in German and English
  • Experience in use of real-time operating systems is a plus
  • Self-motivated, structured work style and good communication skills

Contact

Supervisor at GE Aviation: Alexander Walsch

Online application form

Supervisor:

------

Near Memory Traffic Compression for NoC-based Distributed Memory Architectures

Description

The bandwidth of data movement in NoC-based distributed memory architectures is one of the major bottlenecks of such systems.

Compressing the data traffic in the system is one possible way to mitigate this bottleneck.

The goal of this project is to survey available data traffic compression schemes and architectures.
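
As a very simple, hypothetical example of the kind of scheme such a survey would cover, the sketch below applies zero run-length encoding to a cache-line-sized payload. Real proposals from the literature (e.g. frequent-pattern or delta-based compression) are considerably more sophisticated; this only illustrates the basic idea of shrinking mostly-zero traffic.

```c
/*
 * Illustration of one very simple traffic compression idea: zero
 * run-length encoding of a cache-line-sized block. Zero runs are
 * replaced by a (0x00, run length) pair; nonzero bytes pass through.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the number of output bytes; out[] must hold up to 2*len bytes. */
static size_t zrle_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        if (in[i] == 0) {
            size_t run = 0;
            while (i + run < len && in[i + run] == 0 && run < 255)
                run++;
            out[o++] = 0x00;
            out[o++] = (uint8_t)run;
            i += run;
        } else {
            out[o++] = in[i++];      /* literal nonzero byte */
        }
    }
    return o;
}

int main(void)
{
    uint8_t line[64] = {0};          /* a mostly-zero 64-byte "cache line" */
    line[0]  = 0xAB;
    line[63] = 0xCD;

    uint8_t packet[128];
    size_t n = zrle_encode(line, sizeof(line), packet);
    printf("64 payload bytes compressed to %zu bytes\n", n);
    return 0;
}
```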

 

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

Application Profiling for Near Memory Computing

Description


Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to profile applications in the context of Near Memory Computing and to identify useful functions or primitives that could be accelerated.
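
For illustration only, the sketch below shows the kind of memory-bound primitive such profiling typically exposes: a traversal that performs almost no computation per element, so its runtime is dominated by memory accesses, making it a natural candidate for near-memory execution.

```c
/*
 * Illustrative memory-bound primitive: a linked-list traversal with
 * almost no computation per element. Its runtime is dominated by
 * memory accesses, which is what makes primitives like this
 * interesting for near-memory acceleration.
 */
#include <stdlib.h>
#include <stdio.h>

struct node {
    struct node *next;
    long value;
};

static long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;        /* one addition per pointer dereference */
    return sum;
}

int main(void)
{
    const long count = 1L << 20;
    struct node *nodes = malloc(sizeof(*nodes) * count);
    if (nodes == NULL)
        return 1;

    for (long i = 0; i < count; i++) {
        nodes[i].value = i;
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
    }

    printf("sum = %ld\n", sum_list(&nodes[0]));
    free(nodes);
    return 0;
}
```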

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in C/C++
  • Good programming skills in SystemC
  • Very good analytical thinking and understanding of complex problems
  • Good knowledge of digital circuit design
  • Very good knowledge in the field of Near Memory Computing

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

FPGA Prototyping a Bus Front-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a bus front-end for near memory operations on a FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

---

FPGA Prototyping a Memory Back-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a memory back-end for near memory operations on a FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

------

Frequency Optimization of a FPGA Prototype

Description

Our NoC-based many-core design is implemented on multiple Xilinx Virtex-7 FPGAs. Its operating frequency is currently limited by individual components.

Goal

The goal of this work is to optimize the overall frequency of an FPGA design.

This work includes:

  • Identification of the critical paths of the design
  • Pipelining the design to reach higher frequencies

Prerequisites

For this challenging task, several prerequisites should be met:

  • Very good knowledge of VHDL
  • Very good knowledge of the Xilinx Vivado Synthesis Tool
  • Very good experience with FPGA design
  • Very good knowledge about digital circuit design

Application

If you are interested, send me an email with your CV, your transcript of records, and a summary of your experience attached.

Contact

Sven Rheindt

Room: N2140

Tel. 089 289 28387

sven.rheindt@tum.de

Supervisor:

---

Simulator Support for Dynamic Task Migration

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to implement dynamic data migration into a trace-based simulator and to evaluate its potential.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in C++ or SystemC
  • Good comprehension of complex systems
  • Very good knowledge of hardware development

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

------

Further Development of a Linux Client Deployment System with Puppet

Description

The Department of Electrical Engineering and Information Technology provides a large number of Linux PCs for students and staff. A configuration management system ensures that a uniform configuration is present on all PCs and is kept up to date at all times. For these tasks we use the open source tools Foreman (for the base installation) and Puppet (for configuration management). Your task as a working student is to support the adaptation and further development of this system. The concrete tasks are assigned as needed; currently planned is, for example, the implementation of an automated test environment.
This work gives you the unique opportunity to sit at the "control lever of automation" as it is common in today's cloud environments. With your work you influence and improve the installation of hundreds of PCs. To carry out the task successfully, the following prerequisites are necessary:

  • very good Linux skills
  • proficiency with tools from the open source world, such as git, scripting languages, etc.
  • interest in a longer-term engagement
  • an independent working style and the willingness to familiarize yourself with new topics


Please explain briefly in your application why you are interested in this topic and which relevant prior experience you have already gained.

Supervisor:

-----

Hardware accelerated Image Fusion

Description

Automated driving systems require reliable information on the current environment in order to make proper decisions. Different sensor systems like cameras, LIDAR and radar contribute to this information. To minimize the possibility of incorrect detections or undetected objects, the data provided by the different sensors must be exhaustively analyzed and compared to each other.

Such comparisons are only possible if the full surroundings are observed by each sensor system. As a single camera has a limited viewing angle, multiple cameras are placed at different positions around the vehicle to provide the required visual input.

Additionally, the processing of the sensor inputs and the data fusion must stay within limited time bounds to ensure low end-to-end reaction times. For the camera systems this calls for a hardware-accelerated implementation in order to achieve the required processing time.

Goal

The major goal of the thesis is the selection and implementation of a suitable algorithm to combine multiple images provided by different cameras into one image. While the evaluation of the algorithm can be done with a pure software version, e.g. using OpenCV, the final version should run on a Xilinx Zynq with suitable hardware accelerators implemented in the FPGA part.
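
Purely as an illustration of the simplest fusion step, the sketch below blends two already-aligned grayscale images in their overlap region with per-pixel weights. A real pipeline additionally needs geometric alignment (camera calibration, warping) before blending; the point here is only that such regular per-pixel kernels map well onto FPGA logic.

```c
/*
 * Per-pixel weighted blending of two aligned grayscale images:
 * out[i] = (alpha[i] * a[i] + (255 - alpha[i]) * b[i]) / 255
 * Illustrative only; dimensions and data are toy values.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static void blend_images(const uint8_t *a, const uint8_t *b,
                         const uint8_t *alpha, uint8_t *out,
                         size_t width, size_t height)
{
    for (size_t i = 0; i < width * height; i++) {
        uint32_t w = alpha[i];
        out[i] = (uint8_t)((w * a[i] + (255u - w) * b[i]) / 255u);
    }
}

int main(void)
{
    /* 4x1 toy overlap region fading from image a to image b */
    uint8_t a[4]     = {100, 100, 100, 100};
    uint8_t b[4]     = {200, 200, 200, 200};
    uint8_t alpha[4] = {255, 170,  85,   0};
    uint8_t out[4];

    blend_images(a, b, alpha, out, 4, 1);
    for (int i = 0; i < 4; i++)
        printf("%u ", out[i]);
    printf("\n");
    return 0;
}
```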

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Knowledge of a hardware description language e.g. VHDL
  • Solid C programming skills
  • Hands-on FPGA development experience, preferably using Xilinx Vivado
  • Self-motivated and structured work style

Contact

Dirk Gabriel
Room N2117
Tel. 089 289 28578
dirk.gabriel@tum.de

Supervisor:

Ongoing Works

Master's Theses

Efficient Offloading of Network Functionalities via ISA Extension

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to efficiently offload network functionalities and near memory operations via ISA extension. A hardware prototype will be built.

Learning Objectives

Towards this goal you’ll complete the following tasks: 

  • Work within a larger project and understand the concepts of an existing HW platform
  • Develop, implement and test an advanced hardware module on the given platform
  • Compare and evaluate the implementation against the state of the art
  • Document your work in a written thesis report and present your work in a presentation

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in VHDL
  • Good programming skills in C
  • Good comprehension of complex systems
  • Very good knowledge of hardware development

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Student

Steffen Schlienz

Simulator Support for Dynamic Data Migration

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls in the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to implement dynamic data migration into a trace-based simulator and to evaluate its potential.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very good programming skills in C++ or SystemC
  • Good comprehension of complex systems
  • Very good knowledge of hardware development

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Student

Iffat Brekhna

Optimizing Region Based Cache Coherence for the InvasIC Architecture on a FPGA Prototype

Keywords:
Cache Coherence, Distributed Directories, FPGA

Description

Providing hardware cache coherence for modern tile-based MPSoCs requires additional chip area, which does not scale well with increasing tile counts. As part of the Invasive Computing project, we introduced Region Based Cache Coherence (RBCC), a dynamic and scalable approach that provides on-demand coherence based on application requirements. However, the directories currently used for RBCC are not optimized for area. Therefore, RBCC can be further enhanced by optimizing these structures in conjunction with the coherence protocol for hybrid distributed shared memory MPSoCs.

Goal

The goal of this project is to optimize directory structures with smart replacement policies and implement a modified coherence protocol to save on-chip area without sacrificing performance.

Towards this goal you’ll complete the following tasks:

  • Investigate existing directory-based cache coherence schemes
  • Implement a smart directory structure to reduce hardware overheads (a sketch of one possible organization is shown below this list)
  • Implement a hybrid cache coherence protocol for distributed shared memory systems
  • Verify the design on an FPGA-based hardware platform
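
The sketch below is only an illustration of what a "smarter" directory organization can look like: a limited-pointer entry with an age-based replacement policy, modelled in C. All field widths and parameters are assumptions for the sketch; it is not the RBCC directory itself.

```c
/*
 * Software model of a limited-pointer directory entry, one classic way
 * to reduce directory area compared to a full sharer bit vector.
 * Field widths, pointer count and the age-based replacement are
 * assumptions for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_SHARER_PTRS 4   /* track at most 4 sharers explicitly */

enum coherence_state { INVALID, SHARED, MODIFIED };

struct dir_entry {
    uint32_t tag;                          /* cache-line address tag           */
    enum coherence_state state;
    uint8_t  num_sharers;                  /* valid entries in sharer_ptr[]    */
    uint8_t  sharer_ptr[MAX_SHARER_PTRS];  /* tile IDs instead of a bit vector */
    uint8_t  age;                          /* age counter for replacement      */
    uint8_t  overflow;                     /* more sharers than pointers:
                                              fall back to broadcast           */
};

/* Pick a replacement victim within one directory set: oldest entry wins. */
static unsigned select_victim(const struct dir_entry *set, unsigned ways)
{
    unsigned victim = 0;
    for (unsigned w = 1; w < ways; w++)
        if (set[w].age > set[victim].age)
            victim = w;
    return victim;
}

int main(void)
{
    struct dir_entry set[4] = {{0}};
    set[2].age = 7;                        /* least recently used entry */
    printf("victim way: %u\n", select_victim(set, 4));
    return 0;
}
```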

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very Good VHDL Skills
  • Good C/C++ Skills
  • Good understanding of MPSoCs and Cache Coherence Schemes
  • Self-motivated and structured work style

Learning Objectives

After you have successfully completed this project, you will be able to

  • Understand the challenges of cache coherence in multi-core systems
  • Understand the workflow from software to hardware

Contact

Akshay Srivatsa
Room N2140
Tel. 089 289 22963
srivatsa.akshay@tum.de

Supervisor:

Seminars

To Speed Up Artificial Intelligence, Mix Memory and Processing

Description

If John von Neumann were designing a computer today, there’s no way he would build a thick wall between processing and memory. At least, that’s what computer engineer Naresh Shanbhag of the University of Illinois at Urbana-Champaign believes. The eponymous von Neumann architecture was published in 1945. It enabled the first stored-memory, reprogrammable computers—and it’s been the backbone of the industry ever since.

Now, Shanbhag thinks it’s time to switch to a design that’s better suited for today’s data-intensive tasks. In February, at the International Solid-State Circuits Conference (ISSCC), in San Francisco, he and others made their case for a new architecture that brings computing and memory closer together. The idea is not to replace the processor altogether but to add new functions to the memory that will make devices smarter without requiring more power.

Read further...

The goal of this seminar is to analyze the potential of and need for near-memory computing in the field of artificial intelligence.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Error Detection and Correction Codes for Network on Chip

Description

Modern Multi-Processor Systems-on-Chip (MPSoCs) often rely on a Network-on-Chip (NoC) to connect the different components on the chip. However, with ever decreasing feature sizes the systems become less reliable and errors can occur. Safety-critical applications, e.g. for autonomous driving, must be able to tolerate and compensate such occurring errors. The goal of this seminar is to survey different error detection and correction codes that are used in NoCs and compare their implementation cost in terms of area, power consumption, and added latency.
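
As a minimal reference point, the sketch below shows the simplest scheme such a survey would start from: a single even-parity bit per flit, which detects any single bit flip but cannot correct it. The 32-bit flit width is an assumption for illustration; stronger codes such as Hamming SEC-DED or CRC trade additional check bits and logic for correction capability.

```c
/*
 * Single even-parity bit per 32-bit flit: detects any single bit error,
 * corrects none. Illustrative baseline only.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* Even parity over a 32-bit flit (XOR of all bits). */
static uint8_t parity32(uint32_t flit)
{
    flit ^= flit >> 16;
    flit ^= flit >> 8;
    flit ^= flit >> 4;
    flit ^= flit >> 2;
    flit ^= flit >> 1;
    return (uint8_t)(flit & 1u);
}

int main(void)
{
    uint32_t flit = 0xCAFEBABE;
    uint8_t p = parity32(flit);            /* sent alongside the flit   */

    uint32_t received = flit ^ (1u << 7);  /* inject a single bit error */
    bool error_detected = (parity32(received) != p);

    printf("error detected: %s\n", error_detected ? "yes" : "no");
    return 0;
}
```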

Supervisor:

Representation Learning for Multicore Power/Thermo Features

Keywords:
Machine learning, representation learning, multicore, power, temperature

Description

 

Reducing the power consumption of multicore processors is an ongoing and challenging task for processor designers. With increasing transistor counts, power and thermal information is increasingly difficult to obtain. To obtain more and more reliable power/thermo information, designers are starting to use novel machine learning algorithms. In this seminar, you will investigate different representation learning algorithms – a subset of machine learning algorithms – for identifying power/thermo features on chip. You will give an overview of related work on multicore representation learning and finally make an educated guess as to which representation learning algorithms are best suited for identifying power/thermo features.

 

Contact

mark.sagi@tum.de

Supervisor:

Convolution in Convolutional Neural Networks

Keywords:
DNN, Optimization, Convolution, GEMM, FFT, Winograd, Hardware

Short Description:
Choosing the right convolution technique helps to accelerate the computation of deep neural networks in embedded systems. Instead of computing the convolution directly, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

Description

 

Applying state-of-the-art CNNs in embedded systems, as in the field of autonomous driving, is challenging in two respects. Firstly, automotive applications have constrained hardware resources, such as limited memory capacity and computational performance. Secondly, applications for autonomous driving have to perform fast and require low latency.

 

However, the classification of one image by VGG-16 [1] requires 15.5 billion floating-point operations.

 

Within CNNs, convolutions can be implemented in different ways, and choosing the right method helps to accelerate the computation. Instead of computing the convolution directly, one can use methods such as GEMM [2], FFT [3] or Winograd [4].

In this seminar topic, existing methods for computing convolutions are studied. The details of those methods and their use in CNNs are elaborated, and the effects of convolution parameters (kernel size, padding, stride) as well as potential hardware accelerators are evaluated.
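
To make the GEMM route concrete, the sketch below lowers a tiny single-channel convolution (no padding, stride 1) to a matrix multiplication via im2col. The dimensions and the naive inner product are purely illustrative and ignore the blocking and vectorization a real implementation would use.

```c
/*
 * im2col + GEMM for a tiny single-channel convolution: the input is
 * rearranged into a patch matrix, then the convolution becomes one
 * matrix multiplication. Illustrative dimensions only.
 */
#include <stdio.h>

#define H 4               /* input height        */
#define W 4               /* input width         */
#define K 3               /* kernel size (K x K) */
#define OH (H - K + 1)
#define OW (W - K + 1)

/* Rearrange the K*K input patches into columns: cols is (K*K) x (OH*OW). */
static void im2col(float in[H][W], float cols[K * K][OH * OW])
{
    for (int oy = 0; oy < OH; oy++)
        for (int ox = 0; ox < OW; ox++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    cols[ky * K + kx][oy * OW + ox] = in[oy + ky][ox + kx];
}

/* out (1 x OH*OW) = kernel (1 x K*K) * cols (K*K x OH*OW) */
static void gemm_row(const float kernel[K * K],
                     float cols[K * K][OH * OW], float out[OH * OW])
{
    for (int c = 0; c < OH * OW; c++) {
        float acc = 0.0f;
        for (int i = 0; i < K * K; i++)
            acc += kernel[i] * cols[i][c];
        out[c] = acc;
    }
}

int main(void)
{
    float in[H][W], kernel[K * K], cols[K * K][OH * OW], out[OH * OW];

    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            in[y][x] = (float)(y * W + x);
    for (int i = 0; i < K * K; i++)
        kernel[i] = 1.0f / (K * K);           /* simple box filter */

    im2col(in, cols);
    gemm_row(kernel, cols, out);

    printf("out[0] = %.2f\n", out[0]);        /* mean of the top-left 3x3 patch */
    return 0;
}
```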

 

References:

[1] K. Simonyan, A. Zisserman; Very Deep Convolutional Networks for Large-Scale Image Recognition; ICLR 2015.

[2] Rahul Garg, Laurie Hendren; A Portable and High-Performance General Matrix-Multiply; IEEE 2014.

[3] Tahmid Abtahi, Amey Kulkarni, Tinoosh Mohsenin; Accelerating convolutional neural network with FFT on tiny cores; ISCAS 2017.

[4] Andrew Lavin, Scott Gray; Fast Algorithms for Convolutional Neural Networks; CVPR 2016.

Supervisor:

Alexander Frickenstein

Student Assistant Jobs

Student Assistant for the Lecture Digitale Schaltungen

Description

The position comprises:

  • Pre-correction of homework and practical exercises
  • 2 ngSpice tutorial sessions for the practical exercises

Contact

Sven Rheindt

Room: N2140

Tel. 089 289 28387

sven.rheindt@tum.de

Supervisor: