Current Offers for Student Projects and Theses

In our research groups, topics are often in preparation that are not yet listed here. In some cases it is also possible to define a topic according to your particular interests. To do so, simply contact a staff member from the relevant research area. If you have general questions about carrying out a thesis or project at LIS beyond that, please contact Dr. Thomas Wild.

For students interested in an Ingenieurpraxis (industrial internship):

We are happy to supervise Ingenieurpraxis internships carried out in industry, provided the topic fits our own research areas. However, we do not offer Ingenieurpraxis positions at the chair itself, since in our view students should gain industry experience early on.

 

------

Optimization methods applied to microelectronics design

Description

Microelectronics design involves several complex sub-problems (e.g. placement, routing, partitioning, application mapping, power/thermal management) that are often classified as NP-complete or NP-hard. To solve these, several optimization methods (e.g. metaheuristics) from the field of operations research can be used.

The goal of this seminar is to survey the sub-problems encountered in microelectronics design, show how they can be analyzed and modeled, and what kind of optimization methods can be used to solve them.
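As a hedged illustration of the kind of method covered, the following sketch applies simulated annealing, one common metaheuristic, to a toy one-dimensional placement problem; the cost function, net list and cooling schedule are illustrative assumptions rather than part of any real design flow.

    // Toy simulated annealing for a 1-D placement problem (illustrative assumptions only).
    #include <cstdio>
    #include <cstdlib>
    #include <cmath>
    #include <random>
    #include <utility>
    #include <vector>

    // Cost: total wirelength of a few assumed two-pin nets between placed modules.
    double cost(const std::vector<int>& pos, const std::vector<std::pair<int, int>>& nets) {
        double c = 0.0;
        for (const auto& n : nets)
            c += std::abs(pos[n.first] - pos[n.second]);
        return c;
    }

    int main() {
        const int numModules = 8;
        std::vector<std::pair<int, int>> nets = {{0, 7}, {1, 3}, {2, 6}, {4, 5}, {0, 4}, {3, 6}};
        std::vector<int> placement(numModules);
        for (int i = 0; i < numModules; ++i) placement[i] = i;    // initial placement: module i at slot i

        std::mt19937 rng(42);
        std::uniform_int_distribution<int> pick(0, numModules - 1);
        std::uniform_real_distribution<double> uni(0.0, 1.0);

        double current = cost(placement, nets);
        for (double T = 10.0; T > 1e-3; T *= 0.95) {              // geometric cooling schedule
            for (int k = 0; k < 100; ++k) {
                int a = pick(rng), b = pick(rng);
                std::swap(placement[a], placement[b]);            // neighbourhood move: swap two modules
                double proposed = cost(placement, nets);
                if (proposed <= current || uni(rng) < std::exp((current - proposed) / T))
                    current = proposed;                           // accept (possibly worse) solution
                else
                    std::swap(placement[a], placement[b]);        // reject: undo the move
            }
        }
        std::printf("final total wirelength: %.0f\n", current);
    }

The same accept/reject skeleton carries over to routing, partitioning or mapping problems once a suitable cost function and neighbourhood move are defined.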

Contact

Anh Vu Doan
Room N2116
anhvu.doan@tum.de

Supervisor:

Nguyen Anh Vu Doan
------

Extensions & Performance Benchmarks of a CAPI-based Network Interface Card

Description

With ever-increasing network data rates, the data transfer between network interface card (NIC) and the host system has a decisive impact on the achievable application performance. To fully exploit the host system’s CPU capacity for application processing, it is important to minimize I/O processing overheads. In this project, we want to extend the implementation and optimize the performance of an FPGA-based NIC that is connected to the host system with the Coherent Accelerator Processor Interface (CAPI) [1] for IBM POWER8 Systems.

In a previous project an initial implementation of the CAPI-based NIC was developed using the CAPI Storage, Network and Analytics Programming (SNAP) framework [2]. The goal of this project is to integrate the physical network interfaces in the design, as well as to identify and mitigate performance bottlenecks.

[1] https://developer.ibm.com/linuxonpower/capi/

[2] https://openpowerfoundation.org/blogs/capi-snap-simple-developers

Towards this goal you will complete the following tasks:

  • Analyze the source code and working principles of the existing NIC implementation
  • Get familiar with CAPI and the CAPI SNAP framework
  • Integrate an Ethernet Media Access Controller (MAC) IP core into the FPGA design
  • Benchmark throughput and latency of FPGA-to-host communication through simulations and measurements (a host-side measurement sketch follows this list)
  • Identify performance bottlenecks, propose and implement improvements
  • Extend the design to make use of multiple RX/TX queues for multi-core processing
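For the benchmarking task, a host-side micro-benchmark could be structured like the sketch below; the device path, the MMIO mapping size and the register offset are hypothetical placeholders and not the actual CAPI SNAP interface.

    // Hypothetical host-side MMIO read-latency micro-benchmark (placeholder device path and offsets).
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        // Assumption: the accelerator exposes a 4 KiB MMIO region through a character device.
        int fd = open("/dev/cxl/afu0.0m", O_RDWR);                        // placeholder path
        if (fd < 0) { perror("open"); return 1; }
        void* map = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }
        volatile uint64_t* regs = static_cast<volatile uint64_t*>(map);

        const int iterations = 100000;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            (void)regs[0];                                                // read a status register (placeholder offset)
        auto end = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(end - start).count();
        std::printf("average MMIO read latency: %.1f ns\n", ns / iterations);

        munmap(map, 4096);
        close(fd);
        return 0;
    }

A throughput measurement would replace the register read with bulk DMA transfers and report bytes per second instead of a per-access latency.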

Prerequisites

To successfully complete this project, you should already have several of the following skills and experiences:

  • Knowledge of a hardware description language such as Verilog and/or VHDL
  • Hands-on FPGA development experience
  • Solid C programming skills
  • Proficiency using Linux
  • Self-motivated and structured work style

Learning Objectives

By completing this project, you will be able to

  • understand the basic working principles of NICs, as well as FPGA-host communication mechanisms
  • apply your theoretical knowledge to an implementation consisting of both hard- and software parts
  • document your work in the form of a scientific report and a presentation

Contact

Andreas Oeldemann
Room N2137
Tel. 089 289 22962
andreas.oeldemann@tum.de

The thesis is carried out in cooperation with

Power Systems Acceleration Department
IBM Systems – HW Development Böblingen
IBM Deutschland R&D GmbH

 

 

Supervisor:

Andreas Oeldemann
-----

Design and Implementation of a Hardware Managed Queue

Description

Queues are a central element of operating systems and of application control flow in general.

This project is part of a hardware/software co-design effort.

Goal

The goal of this project is to develop a hardware-managed queue for a NoC-based multiprocessor platform.
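For orientation, a software reference model of such a queue might look like the single-producer/single-consumer ring buffer sketched below; the interface and sizing are illustrative assumptions, and the actual hardware unit and its integration into the NoC platform are to be defined in the project.

    // Illustrative software reference model of a single-producer/single-consumer queue.
    #include <array>
    #include <atomic>
    #include <cstdint>
    #include <optional>

    template <typename T, std::size_t N>
    class SpscQueue {
    public:
        bool push(const T& item) {
            std::size_t head = head_.load(std::memory_order_relaxed);
            std::size_t next = (head + 1) % N;
            if (next == tail_.load(std::memory_order_acquire))
                return false;                        // queue full
            buffer_[head] = item;
            head_.store(next, std::memory_order_release);
            return true;
        }

        std::optional<T> pop() {
            std::size_t tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire))
                return std::nullopt;                 // queue empty
            T item = buffer_[tail];
            tail_.store((tail + 1) % N, std::memory_order_release);
            return item;
        }

    private:
        std::array<T, N> buffer_{};
        std::atomic<std::size_t> head_{0};           // next free slot (producer side)
        std::atomic<std::size_t> tail_{0};           // next item to consume
    };

    int main() {
        SpscQueue<std::uint32_t, 8> q;
        q.push(42);
        auto v = q.pop();                            // v contains 42
        return v ? 0 : 1;
    }

A hardware-managed version moves the pointer handling and the full/empty checks into dedicated logic so that cores only issue enqueue/dequeue requests.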

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
----

Multi-core Interference Channel Analysis (at GE Aviation)

Description

This topic is offered by General Electric Aviation and supervised at TUM LIS.

About GE Aviation

GE Aviation Munich is an R&D center of excellence located in the heart of southern Germany, on the Garching campus of the Technical University of Munich. This creates a unique blend for our engineers: working in a university setting while performing research and development in a world-class industrial environment dedicated to bringing innovative technologies to market. Within the R&D community, the center maintains close partnerships with numerous universities, research institutions and technology companies in Germany and abroad.

Role summary

The role of the student will be to conduct an interference channel analysis to address potential safety challenges of modern multi-core architectures.

Responsibilities / Goals

GE Aviation is investigating the use of modern multi-core architectures. You will characterize the interference channels of two different multi-core architectures (NXP T1040 and Xilinx Zynq UltraScale+). The former is a quad-core PowerPC built around the e5500 core, the latter a quad-core ARM SoC built around the Cortex-A53 core.
In your role you will:

  • Investigate domain-specific literature (CAST-32A), which will give you a guideline and direction
  • Identify interference channels using the specifications of both architectures mentioned above
  • Perform a state-of-the-art search of existing test suites that help to exercise and identify interference channels
  • Characterize each identified test suite's interference-channel analysis capability and the granularity of its results
  • Implement a test suite based on the existing ones and successfully run it on both architectures in the lab (a minimal contention micro-benchmark of the kind such suites build on is sketched below)
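The sketch below illustrates what exercising an interference channel can look like in its simplest form: one thread measures its memory access time while another thread on a different core thrashes the shared last-level cache. The buffer size, core numbers and Linux affinity calls are illustrative assumptions, not part of any qualified test suite.

    // Illustrative shared-cache/DRAM interference micro-benchmark (Linux, compile with -pthread).
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <pthread.h>
    #include <thread>
    #include <vector>

    constexpr std::size_t kBytes = 64 * 1024 * 1024;    // assumed to exceed the last-level cache

    void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    // Walk the buffer touching one byte per cache line; returns elapsed milliseconds.
    double walk(std::vector<char>& buf, int rounds) {
        volatile long sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < rounds; ++r)
            for (std::size_t i = 0; i < buf.size(); i += 64)
                sum += buf[i];
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        std::vector<char> victim(kBytes, 1), aggressor(kBytes, 2);

        pin_to_core(0);
        double baseline = walk(victim, 10);                      // victim core running alone

        std::atomic<bool> stop{false};
        std::thread noisy([&] {                                  // aggressor on another core
            pin_to_core(1);
            while (!stop.load()) walk(aggressor, 1);
        });
        double contended = walk(victim, 10);                     // victim under contention
        stop = true;
        noisy.join();

        std::printf("baseline: %.1f ms, contended: %.1f ms (slowdown x%.2f)\n",
                    baseline, contended, contended / baseline);
        return 0;
    }

A real test suite would add benchmarks for further channels such as the interconnect, DDR controller queues and shared I/O, and would report results per channel.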

Expected Qualifications

  • Good C/C++ Skills
  • Good understanding of real-time operating systems (e.g. RTLinux, FreeRTOS, WindRiver VXWorks) and MPSoCs
  • Fluency in German and English
  • Experience in use of real-time operating systems is a plus
  • Self-motivated, structured work style and good communication skills

Contact

Supervisor at GE Aviation: Alexander Walsch

Online application form

Supervisor:

Thomas Wild
------

To Speed Up Artificial Intelligence, Mix Memory and Processing

Description

If John von Neumann were designing a computer today, there is no way he would build a thick wall between processing and memory. At least, that is what computer engineer Naresh Shanbhag of the University of Illinois at Urbana-Champaign believes. The eponymous von Neumann architecture was published in 1945. It enabled the first stored-program, reprogrammable computers, and it has been the backbone of the industry ever since.

Now, Shanbhag thinks it’s time to switch to a design that’s better suited for today’s data-intensive tasks. In February, at the International Solid-State Circuits Conference (ISSCC), in San Francisco, he and others made their case for a new architecture that brings computing and memory closer together. The idea is not to replace the processor altogether but to add new functions to the memory that will make devices smarter without requiring more power.


The goal of this seminar is to analyze the potential of and need for near-memory computing in the field of artificial intelligence.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
------

Near Memory Traffic Compression for NoC-based Distributed Memory Architectures

Description

The bandwidth of data movement in NoC-based distributed memory architectures is one of the major bottlenecks of such systems.

Compressing the data traffic in the system could alleviate this bottleneck.

The goal of this project is to survey available data traffic compression schemes and architectures.
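To give a flavour of the schemes such a survey would cover, the sketch below applies a simple zero-run-length encoding to a cache-line-sized block before it would be injected into the NoC; the block size and packet format are illustrative assumptions, not a proposal for the target architecture.

    // Illustrative zero-run-length compression of a 64-byte block (toy packet format).
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Encode: literal bytes are copied; runs of zero bytes become {0x00, run_length}.
    std::vector<uint8_t> compress(const uint8_t* block, std::size_t n) {
        std::vector<uint8_t> out;
        for (std::size_t i = 0; i < n;) {
            if (block[i] == 0) {
                std::size_t run = 0;
                while (i + run < n && block[i + run] == 0 && run < 255) ++run;
                out.push_back(0x00);
                out.push_back(static_cast<uint8_t>(run));
                i += run;
            } else {
                out.push_back(block[i++]);
            }
        }
        return out;
    }

    int main() {
        uint8_t line[64] = {0};                 // mostly-zero cache line
        line[0] = 0xAB; line[63] = 0xCD;
        auto packed = compress(line, sizeof(line));
        std::printf("64-byte line compressed to %zu bytes\n", packed.size());
    }

A real scheme would additionally have to bound worst-case expansion, fit the NoC flit format and keep the compression/decompression latency low enough to be worthwhile.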

 

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
------

The Evolution of Bitcoin Hardware

Description

Since its deployment in 2009, Bitcoin has achieved remarkable success and spawned hundreds of other cryptocurrencies. This seminar topic traces the evolution of the hardware underlying the system, from early GPU-based homebrew machines to today’s datacenters powered by application-specific integrated circuits. These ASIC clouds provide a glimpse into planet-scale computing’s future.

Supervisor:

Armin Sadighi
------

Brain-inspired computing

Description

The inner workings of the brain as a biological information processing system remain largely a mystery to science. Yet there is a growing interest in applying what is known about the brain to the design of novel computing systems, in part to explore hypotheses of brain function, but also to see if brain-inspired approaches can point to novel computational systems capable of circumventing the limitations of conventional approaches, particularly in the light of the slowing of the historical exponential progress resulting from Moore’s Law. Although there are, as yet, few compelling demonstrations of the advantages of such approaches in engineered systems, a number of large-scale platforms have been developed recently that promise to accelerate progress both in understanding the biology and in supporting engineering applications. SpiNNaker (Spiking Neural Network Architecture) is one such large-scale example, and much has been learnt in the design, development and commissioning of this machine that will inform future developments in this area.

Supervisor:

Armin Sadighi
------

Static WCET Analysis for Multi-Core Systems

Description

Knowing the worst-case execution time (WCET) is indispensable for the development of real-time systems. Several methods exist to bound the WCET on single-core platforms. As soon as multiple tasks run simultaneously on a multi-core platform, however, these methods can no longer provide reliable estimates. The goal of this seminar is to summarize the major problems that arise when analysing multi-core applications and to present some methods to solve them.
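As background, many static WCET analyses bound the execution time with the implicit path enumeration technique (IPET), i.e. an integer linear program over basic-block execution counts; the formulation below uses generic textbook notation and is not tied to a particular tool.

    % Generic IPET formulation: bound the WCET of a task from its control-flow graph.
    \begin{align*}
      \mathrm{WCET} \;\le\; \max \sum_{i} c_i \, x_i
      \quad \text{subject to} \quad
      x_i \;=\; \sum_{e \in \mathrm{in}(B_i)} x_e \;=\; \sum_{e \in \mathrm{out}(B_i)} x_e ,
      \qquad
      x_{\mathrm{body}} \;\le\; n_{\max} \cdot x_{\mathrm{entry}} ,
    \end{align*}
    % where c_i bounds the execution time of basic block B_i, x_i is its execution count,
    % x_e counts traversals of CFG edge e, and the last constraint encodes a loop bound n_max.

On a multi-core platform the per-block bounds c_i additionally depend on interference on shared resources such as buses, caches and DRAM, which is exactly what breaks the single-core methods mentioned above.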

Contact

Dirk Gabriel
Room N2117
Tel. 089 289 28578
dirk.gabriel@tum.de

Supervisor:

Dirk Gabriel
------

Multi-Agent Reinforcement Learning for Multicore Processors

Keywords:
Machine learning, multi-agent reinforcement learning, multicore, power, temperature

Description

Reducing the power consumption of multicore processors is an ongoing and challenging task for processor designers. Many different machine learning algorithms (reinforcement learning (RL), supervised learning, unsupervised learning) have been proposed to manage the power and performance of multicore processors. In particular, RL has been widely used for dynamic voltage and frequency scaling (DVFS), dynamic power management (DPM) and dynamic thermal management (DTM). However, the interaction of multiple RL algorithms, e.g. RL-1 managing DVFS and RL-2 managing DPM, is an open research area. In this seminar, you will identify state-of-the-art RL algorithms for DVFS, DPM and DTM and investigate whether there are any multi-agent RL algorithms for multicore power/thermal management.
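For orientation, a single-agent tabular Q-learning loop for DVFS could look like the sketch below; the state and action sets, the placeholder environment and the reward function are simplified assumptions, and a multi-agent setup would add further agents whose actions influence each other's rewards.

    // Toy tabular Q-learning agent choosing a DVFS level (illustrative assumptions only).
    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <random>

    constexpr int kStates  = 4;   // e.g. quantized core utilization
    constexpr int kActions = 3;   // e.g. low / medium / high frequency level

    int main() {
        std::array<std::array<double, kActions>, kStates> Q{};   // Q-table, zero-initialized
        std::mt19937 rng(1);
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        std::uniform_int_distribution<int> randAction(0, kActions - 1);
        std::uniform_int_distribution<int> randState(0, kStates - 1);

        const double alpha = 0.1, gamma = 0.9, epsilon = 0.1;
        int state = 0;
        for (int step = 0; step < 10000; ++step) {
            // Epsilon-greedy action selection.
            int action = randAction(rng);
            if (uni(rng) > epsilon) {
                action = 0;
                for (int a = 1; a < kActions; ++a)
                    if (Q[state][a] > Q[state][action]) action = a;
            }
            // Placeholder environment: reward trades off performance against power (made up).
            int nextState = randState(rng);
            double reward = static_cast<double>(state) * 0.5 - action * 0.3;
            // Q-learning update.
            double best = Q[nextState][0];
            for (int a = 1; a < kActions; ++a) best = std::max(best, Q[nextState][a]);
            Q[state][action] += alpha * (reward + gamma * best - Q[state][action]);
            state = nextState;
        }
        std::printf("Q[0][0..2] = %.2f %.2f %.2f\n", Q[0][0], Q[0][1], Q[0][2]);
    }

In a real system the state would come from measured utilization, power and temperature, and the reward from the management objective (e.g. energy-delay product).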

Contact

mark.sagi@tum.de

Supervisor:

Mark Sagi
------

Applying Sparse Weights and Activations to Deep Neural Networks

Keywords:
Sparse, DNN, Optimization

Short description:
If the computations of convolutional neural networks are performed sparsely, the computational cost and the memory demand are reduced.

Description

Nowadays, Convolutional Neural Networks (CNNs) are used in a wide range of fields, such as image and sound recognition, object detection and mobile vision. While offering remarkable results in several classification and regression tasks, deep learning algorithms are extremely computationally heavy, and storing the vast number of parameters requires a lot of memory space and bandwidth.

 

Song Han et al. [1] have demonstrated that DNNs contain a huge number of redundant and unused parameters. Removing those weights from the model makes the kernels and computations sparse. As a result, sparse weights and activations decrease the memory demand of the model. Moreover, if the computations are performed sparsely, the computational cost is reduced analogously. Modern DNN accelerators, however, are highly parallelized, which makes exploiting this sparsity non-trivial.

In this seminar, the methods and possibilities of performing sparse convolutions and operations of DNNs on parallel hardware are to be studied.
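As a small illustration of why sparsity saves work, the sketch below stores a pruned weight matrix in compressed sparse row (CSR) form and performs a matrix-vector product that only touches the remaining non-zero weights; the dimensions and values are made up for illustration.

    // Illustrative sparse (CSR) matrix-vector product, as used after weight pruning.
    #include <cstdio>
    #include <vector>

    struct CsrMatrix {
        int rows;
        std::vector<int> rowPtr;      // size rows + 1
        std::vector<int> colIdx;      // column index of each non-zero
        std::vector<float> values;    // non-zero weight values
    };

    // y = W * x, touching only the non-zero entries of W.
    std::vector<float> spmv(const CsrMatrix& W, const std::vector<float>& x) {
        std::vector<float> y(W.rows, 0.0f);
        for (int r = 0; r < W.rows; ++r)
            for (int k = W.rowPtr[r]; k < W.rowPtr[r + 1]; ++k)
                y[r] += W.values[k] * x[W.colIdx[k]];
        return y;
    }

    int main() {
        // 3x4 weight matrix with only 4 of 12 entries remaining after pruning.
        CsrMatrix W{3, {0, 2, 3, 4}, {0, 3, 1, 2}, {0.5f, -1.0f, 2.0f, 0.25f}};
        std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};
        auto y = spmv(W, x);
        std::printf("y = %.2f %.2f %.2f\n", y[0], y[1], y[2]);
    }

The seminar question is how such irregular, data-dependent loops map onto highly parallel accelerators, where the regular dense formulation is usually preferred.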

 

References:

[1] Song Han, Huizi Mao, William J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; ICLR 2016.

Contact

alexander.frickenstein@bmw.de

alexander.frickenstein@tum.de

Supervisor:

Alexander Frickenstein
------

Applying Binary Weights and Activations to Deep Neural Networks

Keywords:
Binary, EXOR, Optimization, DNN

Short description:
Binary weights and activations are capable of increasing the performance of DNNs drastically.

Description

Floating-point operations are computationally heavy, memory hungry and energy intensive, which makes them a poor fit for embedded hardware. Fixed-point operations are therefore commonly used in embedded hardware [1]. Going one step further, binary weights and operations for CNNs are currently being discussed in the research community.

 

Courbariaux et al. have applied binary weights to DNNs with only a minor loss of accuracy [3]. Moreover, they have shown that binary activations and computations can also be used within DNNs [2].

An elaborate analysis and comparison of recent publications in the field of low bit-width CNN applications is the main task of this seminar topic. Furthermore, the differences in implementation, as well as suitable hardware, should be discussed in this work.
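To illustrate why binarization is attractive for hardware, the sketch below computes the dot product of two vectors whose elements are constrained to +1/-1 and packed into machine words, using only XNOR and popcount; the packing convention is an illustrative assumption.

    // Illustrative XNOR/popcount dot product of two binarized (+1/-1) vectors.
    #include <bitset>
    #include <cstdint>
    #include <cstdio>

    // Bits encode +1 as 1 and -1 as 0; only the n lowest bits are valid elements.
    int binary_dot(std::uint64_t a, std::uint64_t b, int n) {
        std::uint64_t mask = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
        int matches = static_cast<int>(std::bitset<64>(~(a ^ b) & mask).count());
        return 2 * matches - n;   // each matching bit contributes +1, each mismatch -1
    }

    int main() {
        // Two 8-element vectors: w = {+1,-1,+1,+1,-1,-1,+1,-1}, x = {+1,+1,-1,+1,-1,+1,+1,-1}
        std::uint64_t w = 0b01001101;
        std::uint64_t x = 0b01101011;
        std::printf("dot(w, x) = %d\n", binary_dot(w, x, 8));   // prints 2
    }

A multiply-accumulate over 64 binary elements thus collapses into one XNOR and one popcount instruction, which is the main source of the reported speed-ups.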

 

References:

[1] Google TPU; https://cloud.google.com/tpu/?hl=de; 2018.

[2] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, et al.; BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1; 2016.

[3] Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David; BinaryConnect: Training Deep Neural Networks with binary weights during propagations; 2016.

Supervisor:

Alexander Frickenstein
---

Application Profiling for Near Memory Computing


Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to profile applications in the context of Near Memory Computing and to identify useful functions or primitives that could be accelerated.
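In its simplest form, such profiling can be done by instrumenting candidate functions with timers and call counters, as in the sketch below; the instrumented reduction kernel is only a stand-in for real primitives from the target applications.

    // Minimal manual profiling of a candidate function (illustrative stand-in kernel).
    #include <chrono>
    #include <cstdio>
    #include <vector>

    static long long g_calls = 0;
    static double g_totalMs = 0.0;

    // Stand-in for a data-intensive primitive that might be offloaded to near-memory logic.
    double reduce_sum(const std::vector<double>& data) {
        auto t0 = std::chrono::steady_clock::now();
        double sum = 0.0;
        for (double v : data) sum += v;
        auto t1 = std::chrono::steady_clock::now();
        ++g_calls;
        g_totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
        return sum;
    }

    int main() {
        std::vector<double> data(1 << 20, 1.0);
        double s = 0.0;
        for (int i = 0; i < 100; ++i) s += reduce_sum(data);
        std::printf("sum=%.0f, calls=%lld, avg time=%.3f ms\n", s, g_calls, g_totalMs / g_calls);
    }

Besides time, a near-memory profile would also record how much data each candidate touches and where that data resides, since the locality wall is the actual target.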

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in C/C++
  • Good programming skills in SystemC
  • Very good analytical thinking and understanding of complex problems
  • Good knowledge about digital circuit design
  • Very good knowledge in the field of Near Memory Computing

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
---

FPGA Prototyping a Bus Front-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a bus front-end for near-memory operations on an FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
---

FPGA Prototyping a Memory Back-End for Near Memory Accelerators

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to develop a memory back-end for near-memory operations on an FPGA prototype.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good comprehension of complex systems
  • Good knowledge of hardware development
  • Very good knowledge of digital circuit design

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
------

Frequency Optimization of an FPGA Prototype

Description

Our NoC-based many-core design is implemented on multiple Xilinx Virtex-7 FPGAs. Its operating frequency is currently limited by individual components.

Goal

The goal of this work is to optimize the overall frequency of an FPGA design.

This work includes:

  • Identification of the critical paths of the design
  • Pipelining the design to reach higher frequencies

Prerequisites

For this challenging task, several prerequisites should be met:

  • Very good knowledge of VHDL
  • Very good knowledge of the Xilinx Vivado Synthesis Tool
  • Very good experience with FPGA design
  • Very good knowledge about digital circuit design

Application

If you are interested, send me an email with your CV, your transcript of records and a summary of your experience attached.

Contact

Sven Rheindt

Room: N2140

Tel. 089 289 28387

sven.rheindt@tum.de

Supervisor:

Sven Rheindt
---

Simulator Support for Dynamic Task Migration

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to implement dynamic data migration into a trace-based simulator and to evaluate its potential.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in C++ or SystemC
  • Good comprehension of a complex system
  • Very good knowledge about hardware development.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt
----

Approximate Computing for Digital Cinema

Keywords:
Analysis and estimation of numerical inaccuracies, programming (Matlab or GPU/FPGA), image processing

Description

For professional cinema and TV applications, the image quality of a digital camera has to meet the highest standards. The signal processing of the sensor data in the camera consists of many different steps that influence the image quality. These steps are performed on an FPGA inside the camera and, when recording raw data, on a computer with a CPU and/or GPU. This means the same steps have to be implemented and optimized for different platforms.

Implementing the processing for different compute platforms can introduce deviations, for example when float operations are approximated with integer operations. These numerical inaccuracies may then grow through error propagation across the many processing steps. This work shall therefore investigate how deviations propagate through an image processing chain (error propagation).

If the errors at the end of the processing chain become too large, the quality can be dramatically reduced. How the deviations are to be rated is assessed using established image quality metrics.

Even small errors, however, can cause difficulties, especially when testing the processing steps. Automated testing becomes difficult to impossible, and finding errors that stem not from numerical inaccuracies but from real differences becomes much harder.

To improve the quality estimation and the testing of image processing algorithms, this internship shall carry out a detailed analysis of the numerical inaccuracies on the different platforms: FPGA, CPU and GPU. After the individual operations have been compared, the error propagation through a typical camera processing chain shall be investigated. Depending on the type of work, your interests and the available time frame, the work can then be continued either towards test automation or towards optimization through approximate computing.

Test automation
In the direction of test automation, the results of the error propagation analysis can be used directly to derive proposals for testing the processing steps. Automated testing is simple if the implementation of the individual steps is bit-exact on all three platforms (FPGA, CPU, GPU). In that case an automated test can easily check the results for different input data against a reference implementation. As mentioned above, bit-exactness is usually not given, so automated tests have to be developed that are based on the results of the error propagation analysis.

Optimization through approximate computing
Performance optimization through approximate computing means in this case that, for example, overly precise computation steps whose approximation by less precise steps causes little or no loss of quality are replaced by simpler implementations that tolerate errors. The aim is higher speed or lower resource usage. If this direction is chosen, the steps for which a performance optimization is worthwhile shall first be identified and, ideally, clever optimizations that improve performance shall be found. Depending on experience and interest, this part of the work can focus more on GPU processing with CUDA/OpenCL or on FPGA processing with VHDL.

To give a better idea of the topic, here is a rough overview of the planned activities:
1) Error propagation in an image processing chain
2) Evaluation of rounding errors and differences caused by the different implementations of the arithmetic operations needed in image processing, depending on the platform (FPGA, CPU, GPU) and number representation (integer, float, ...)
3) Further development in direction a) or b)
a) Tests for the whole image chain or a part of it
i) Image chain built from FPGA modules (from the HIL test)
ii) Image chain in Trilian (GPU/CPU)
iii) Image chain in Matlab (as reference)
b) Performance optimization through approximate computing
i) With the goal of improving resource usage and/or data rate on the FPGA
ii) With the goal of speeding up the GPU implementation
These points serve as orientation; the approach to answering the questions above shall be planned independently within the scope of a Master's thesis (in coordination with and advised by the supervisors).
If you enjoy image processing and would like to get to know different compute platforms, simply send a short application consisting of your CV and a current transcript of records by email.
Prerequisite is experience with at least one of the following tools/languages: Matlab, C/C++, VHDL, CUDA, OpenCL. Details in a personal conversation.
The topic is thus suitable for a longer internship or also for a final thesis in electrical engineering or computer science. If you are interested in a working student position or in a Bachelor's thesis, the topic will be narrowed down accordingly.
Further information: Dr. Tamara Seybold; Email: tseybold@arri.de
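As a tiny numerical illustration of the effect to be investigated, the sketch below runs the same small processing chain once in double precision and once in a 16-bit fixed-point approximation and reports the accumulated deviation; the Q8.8 format, the gain and the black-level value are illustrative assumptions, not ARRI's actual pipeline.

    // Toy comparison of a floating-point vs. fixed-point processing chain (illustrative only).
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // Q8.8 fixed-point helpers (8 integer bits, 8 fractional bits) -- an assumed format.
    int32_t to_fix(double v)    { return static_cast<int32_t>(std::lround(v * 256.0)); }
    double  from_fix(int32_t v) { return v / 256.0; }
    int32_t fix_mul(int32_t a, int32_t b) { return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 8); }

    int main() {
        const double gain = 1.19, blackLevel = 0.05;    // made-up processing parameters
        double maxErr = 0.0;
        for (int code = 0; code <= 255; ++code) {
            double in = code / 255.0;

            // Reference chain in double precision: black-level subtraction, gain, clamp.
            double ref = std::clamp((in - blackLevel) * gain, 0.0, 1.0);

            // Same chain in Q8.8 fixed point: each step introduces quantization error.
            int32_t fx = to_fix(in);
            fx = fix_mul(fx - to_fix(blackLevel), to_fix(gain));
            double approx = std::clamp(from_fix(fx), 0.0, 1.0);

            maxErr = std::max(maxErr, std::fabs(ref - approx));
        }
        std::printf("max deviation after the chain: %.5f of full scale\n", maxErr);
    }

A longer chain with more steps (demosaicing, white balance, gamma) accumulates such per-step deviations, which is exactly the error propagation the thesis is meant to quantify.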

Supervisor:

Walter Stechele
------

Further Development of a Linux Client Deployment System with Puppet

Description

The Faculty of Electrical Engineering and Information Technology provides a large number of Linux PCs for students and staff. A configuration management system ensures that a uniform configuration is present on all PCs and is kept up to date at all times. For these tasks we use the open source tools Foreman (for the base installation) and Puppet (for configuration management). Your task as a working student is to support the adaptation and further development of this system. The concrete tasks are assigned as needed; currently planned is, for example, the implementation of an automated test environment.
This position gives you the unique opportunity to sit at the "control lever of automation" as it is used in today's cloud environments. With your work you influence and improve the installation of hundreds of PCs. To carry out the task successfully, the following prerequisites are necessary:

  • very good Linux skills
  • proficiency with tools from the open source world, such as git, scripting languages, etc.
  • interest in a longer-term engagement
  • an independent working style and the desire to familiarize yourself with new topics


Please explain briefly in your application why you are interested in this topic and which relevant prior experience you have already gained.

Supervisor:

Philipp Wagner
-----

Hardware accelerated Image Fusion

Description

Automated driving systems require reliable information on the current environment in order to make proper decisions. Different sensor systems like cameras, LIDAR and radar contribute to this information. To minimize the possibility of incorrect recognitions or undetected objects the data provided by the different sensors must be exhaustively analyzed and compared to each other.

Such comparisons are only possible if the full surrounding is observed by each sensor system. As a single camera has a limited viewing angle, multiple cameras are placed at different positions around the vehicle to provide the required visual input.

Additionally, the processing time of the sensor inputs and of the data fusion must stay within tight bounds to ensure low end-to-end reaction times. For the camera systems this calls for a hardware-accelerated implementation in order to achieve the required processing time.

Goal

The major goal of the thesis is the selection and implementation of a suitable algorithm to combine multiple images provided by different cameras into one image. While the algorithm can first be evaluated in a pure software version, e.g. with OpenCV, the final version should run on a Xilinx Zynq with suitable hardware accelerators implemented in the FPGA part.
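For the software evaluation stage, a first experiment could use OpenCV's high-level stitching API, as sketched below; the input file names are placeholders, and a real setup would operate on calibrated camera streams rather than still images.

    // Minimal OpenCV-based image fusion/stitching experiment (software evaluation only).
    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // Placeholder input images from two cameras with overlapping fields of view.
        std::vector<cv::Mat> images;
        images.push_back(cv::imread("cam_left.png"));
        images.push_back(cv::imread("cam_right.png"));
        for (const auto& img : images)
            if (img.empty()) { std::cerr << "failed to load input image\n"; return 1; }

        cv::Mat fused;
        cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(cv::Stitcher::PANORAMA);
        cv::Stitcher::Status status = stitcher->stitch(images, fused);
        if (status != cv::Stitcher::OK) {
            std::cerr << "stitching failed, status " << static_cast<int>(status) << "\n";
            return 1;
        }
        cv::imwrite("fused.png", fused);       // combined view from both cameras
        return 0;
    }

Profiling which stages of such a pipeline (feature matching, warping, blending) dominate the runtime then indicates which parts are worth moving into the FPGA fabric.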

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Knowledge of a hardware description language e.g. VHDL
  • Solid C programming skills
  • Hands-on FPGA development experience, preferably using Xilinx Vivado
  • Self-motivated and structured work style

Contact

Dirk Gabriel
Room N2117
Tel. 089 289 28578
dirk.gabriel@tum.de

Supervisor:

Dirk Gabriel


Ongoing Theses

Master's Theses

Efficient Offloading of Network Functionalities via ISA Extension

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to efficiently offload network functionalities and near memory operations via ISA extension. A hardware prototype will be built.

Learning Objectives

Towards this goal you will complete the following tasks:

  • Work in a larger project and understand the concept of an existing HW platform
  • Develop, implement and test an advanced hardware module on the given platform
  • Compare and evaluate the implementation against the state of the art
  • Document your work in a written thesis report and present your work in a presentation

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in VHDL
  • Good programming skills in C
  • Good comprehension of a complex system
  • Very good knowledge about hardware development

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt

Student

Steffen Schlienz

Simulator Support for Dynamic Data Migration

Description

Hitting a wall is not a pleasant thing, and computer systems have faced many walls over the last decades. Having broken through the memory wall in the mid-90s and the power wall in 2004, they now face the next crucial barrier to scalability: although systems can be scaled to hundreds or thousands of cores through NoCs, performance does not scale accordingly due to data-to-task dislocality. We now face the locality wall.

The newest trend to tackle this issue is data-task migration and processing in or near memory.

Goal

The goal of this project is to implement dynamic data migration into a trace-based simulator and to evaluate its potential.

Prerequisites

To successfully complete this project, you should already have the following skills and experiences.

  • Very good programming skills in C++ or SystemC
  • Good comprehension of a complex system
  • Very good knowledge about hardware development.

Contact

Sven Rheindt, Room: N2140, Phone +49.89.289.28387, sven.rheindt@tum.de

Supervisor:

Sven Rheindt

Student

Iffat Brekhna

Optimizing Region Based Cache Coherence for the InvasIC Architecture (HW)

Keywords:
Cache Coherence, Distributed Directories, FPGA

Description

Providing hardware coherence for modern tile-based MPSoCs requires additional chip area and therefore does not scale with increasing tile counts. As part of the Invasive Computing project, we introduced Region Based Cache Coherence (RBCC), a dynamic, scalable approach that provides on-demand coherence based on application requirements. However, the directories currently used for RBCC are not optimized for area. RBCC can therefore be further enhanced by optimizing these structures in conjunction with the coherency protocol for hybrid distributed shared memory MPSoCs.
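To make the area trade-off concrete, the sketch below models a minimal directory entry with a full-map sharer bit vector, i.e. the structure whose size grows with the number of tiles and which smarter organizations try to shrink; the field widths and states are illustrative assumptions, not the InvasIC implementation.

    // Illustrative full-map directory entry for one cache line (not the InvasIC design).
    #include <bitset>
    #include <cstdint>
    #include <cstdio>

    constexpr int kTiles = 16;                   // assumed number of tiles in the coherence region

    enum class State : uint8_t { Invalid, Shared, Modified };

    struct DirectoryEntry {
        State state = State::Invalid;
        std::bitset<kTiles> sharers;             // one presence bit per tile -> area grows with tile count

        void add_sharer(int tile) { sharers.set(tile); state = State::Shared; }
        void set_owner(int tile)  { sharers.reset(); sharers.set(tile); state = State::Modified; }
        // Invalidation targets on a write: every sharer except the requesting tile.
        std::bitset<kTiles> invalidation_targets(int requester) const {
            auto t = sharers;
            t.reset(requester);
            return t;
        }
    };

    int main() {
        DirectoryEntry e;
        e.add_sharer(2);
        e.add_sharer(5);
        auto targets = e.invalidation_targets(5);        // tile 5 writes -> invalidate tile 2
        std::printf("invalidations needed: %zu, bits per entry: %d\n",
                    targets.count(), kTiles + 2);        // +2 bits for the state field
    }

Replacing the full sharer map by coarser or limited-pointer representations, combined with a suitable replacement policy, is one way the directory area can be reduced, at the cost of occasional extra invalidations.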

Goal

The goal of this project is to optimize directory structures with smart replacement policies and to implement a modified coherence protocol to save on-chip area without sacrificing performance.

Towards this goal you’ll complete the following tasks:

  • Investigate existing directory-based cache coherence schemes
  • Implement a smart directory structure to reduce hardware overheads
  • Implement a hybrid cache coherence protocol for distributed shared memory systems
  • Verify the design on an FPGA-based hardware platform

Prerequisites

To successfully complete this project, you should already have the following skills and experiences:

  • Very Good VHDL Skills
  • Good C/C++ Skills
  • Good understanding of MPSoCs and Cache Coherence Schemes
  • Self-motivated and structured work style

Learning Objectives

 After you have successfully completed this project, you will be able to

  • Understand the challenges of cache coherence in multi-core systems
  • Understand the work flow from software-to-hardware

Contact

Akshay Srivatsa
Room N2140
Tel. 089 289 22963
srivatsa.akshay@tum.de

Supervisor:

Srivatsa Akshay Sateesh