Introduction to Power-Aware HPC
Lecture in the winter-term 2020/21
Prof. Dr. D. Kranzlmüller,
Dr. Hayk Shoukourian
This course will be held in English!
Welcome to the course webpage for Introduction to Power-Aware HPC for the winter term 2020/21 at LMU Munich. Here you will find details on the lecture and the accompanying practical project.
Connection details have been communicated to all registered students via the Uni2Work framework.
Due to the current COVID-19 situation, the lecture will be held online. We will be using the Zoom remote conferencing service. Please ensure that the Zoom client is installed on your device before joining the meeting. The connection specifics will be announced soon to all registered participants. Please note the Rules for Online Teaching.
Please note that the lecture room has been changed. The lectures will be held in room A 120, Hauptgebaeude/Main Building.
- Registration will be opened on the 3rd of September via UNI2WORK
(NOTE: registration closes on 11.10.2020 at 18:00)
- The lectures are scheduled for Wednesdays from 10:00 to 12:00. The first lecture will take place on 04.11.2020 at 10:00. The room number is 220 (Amalienstr. 73A)
Contents of the lecture
Some current High Performance Computing (HPC) systems already consume more than 15 MW of power, enough to sustain a small city.
Energy consumption is becoming a dominating factor in the Total Cost of Ownership of many HPC systems, making high-performance design and energy-efficient design in many ways synonymous.
Apart from the high power bills, power consumption of this magnitude acts as a limiting factor in building and operating Exascale systems, i.e. the next generation of HPC systems, capable of performing 10^18 floating point operations per second. It could cause the entire data center's power delivery and cooling infrastructure to breach safety limits, and it affects environmental sustainability through a high carbon footprint.
Therefore, it is important to be preemptive in improving energy/power efficiency of HPC data centers.
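To put the 15 MW figure into perspective, the following back-of-the-envelope sketch converts a constant power draw into annual energy and cost. The 15 MW value comes from the text above; the electricity price of 0.15 EUR/kWh and the assumption of a constant draw are illustrative placeholders, not figures from this course.

```python
# Back-of-the-envelope estimate. The flat tariff and the constant power draw
# are assumptions made purely for illustration.
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def annual_energy_mwh(avg_power_mw: float) -> float:
    """Energy drawn over one year at a constant average power, in MWh."""
    return avg_power_mw * HOURS_PER_YEAR


def annual_cost_eur(avg_power_mw: float, price_eur_per_kwh: float) -> float:
    """Yearly electricity cost for a constant draw at a flat tariff."""
    return annual_energy_mwh(avg_power_mw) * 1000 * price_eur_per_kwh


energy = annual_energy_mwh(15.0)    # 15 MW, as quoted in the text
cost = annual_cost_eur(15.0, 0.15)  # 0.15 EUR/kWh is an assumed price
print(f"{energy:,.0f} MWh per year, about {cost / 1e6:.1f} million EUR")
```

Under these assumptions the electricity bill alone reaches tens of millions of euros per year, which is why energy shows up so prominently in the Total Cost of Ownership.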
This course explores different energy consumption issues in modern HPC data centers, discusses their impacts on the design of new computing systems and presents different strategies that aim to reduce the overall power consumption.
The lecture will cover the main concepts of energy consumption paradigms that should remain valid despite the continuous technological changes in the area.
Upon completion of this course, participants should have acquired knowledge of:
- the importance of power/energy-efficiency for modern data centers
- the theory behind a variety of impacts that power dissipation in a CMOS chip has on HPC data centers
- contemporary tools for monitoring different power consumption related metrics
- diverse techniques for energy-efficiency tuning
- power-related challenges for next generation HPC systems
- contemporary resource management and scheduling techniques that are tuned for energy-efficiency
- power variation in homogeneous HPC systems and the potential of possible cost savings
- Intel's Model Specific Registers (MSRs) used for power management support
- principles of various machine learning techniques and their applications for intelligent power management
- high-frequency data collection techniques
- datacenter basics (understand the building blocks of modern datacenters and learn about possible architectures)
The course is intended for master's students of computer science and related fields. The lecture and the project work have a cumulative weight of 6 ECTS.
More formally, in German:
Die Vorlesung richtet sich an Master-Studierende der Informatik. Für die Vorlesung und die Projektarbeit werden 6 ECTS-Punkte vergeben.
The number of students is limited to 20. Registration opens on 03.09.2020 at 08:00 via UNI2WORK and closes on 11.10.2020 at 18:00.
- Python knowledge
- Interest in energy-efficient supercomputing
- Interest in developing machine-learning frameworks
- Lecture: Wednesdays, 10:15 to 11:45 ONLINE. The connection details will be communicated to all registered participants.
The first lecture will be held on 4th of November 2020 (online, via Zoom).
- Guided Tour at Leibniz Supercomputing Centre (LRZ): TBA. Meeting point: LRZ, 85748 Garching bei Muenchen.
- Exam: TBA (see the exam section for more details)
- Repeat Exam: TBA (see the exam section for more details)
Project: "Increasing Cooling Efficiency of a Data Center"
This project aims at building Machine-Learning (ML) based models for predicting the power consumption of an HPC data center's cooling loop.
Participants will form groups, and each group will be assigned one year of operational data collected at the Leibniz Supercomputing Centre (LRZ).
The provided data will contain various sensor measurements from LRZ's building infrastructure.
Each group will need to analyze the data, then design and develop an ML-based model capable of predicting the power consumption of LRZ's warm-water cooling loop.
During this project, students will gain experience that can be applied not only to HPC data centers but also to other domains involving ML-based modeling.
The detailed description of the project assignment will follow during the lecture.
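As a rough orientation for the workflow described above (analyze data, fit a model, check its predictions), here is a minimal baseline sketch. It is not the project assignment or a recommended solution: the data is synthetic, the feature `it_power_kw` and target `cooling_kw` are hypothetical names, and a single-feature least-squares line stands in for whatever ML model a group eventually chooses.

```python
import random

# Synthetic stand-in for the sensor data. The real LRZ data set, its column
# names and its value ranges are not known here; everything below is assumed.
random.seed(0)
it_power_kw = [random.uniform(1000.0, 3000.0) for _ in range(500)]  # hypothetical feature
cooling_kw = [0.1 * p + 50.0 + random.gauss(0.0, 5.0) for p in it_power_kw]  # hypothetical target


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Keep the samples in their recorded order and hold out the last 20 %,
# so the model is evaluated on data it has not seen during fitting.
split = int(0.8 * len(it_power_kw))
slope, intercept = fit_line(it_power_kw[:split], cooling_kw[:split])
predictions = [slope * p + intercept for p in it_power_kw[split:]]
mae = mean_absolute_error(cooling_kw[split:], predictions)
print(f"slope={slope:.3f}, intercept={intercept:.1f} kW, held-out MAE={mae:.2f} kW")
```

A simple baseline like this is mainly useful as a reference point: any more elaborate ML model should achieve a lower held-out error than the baseline before its added complexity is justified.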
The training data can be found in the Project Section.
There will be a written examination (closed book) which will be held in February 2021. The exact time and room will be published as soon as possible.
The retake of the exam is scheduled for:
The lecture notes are available in the Download Section.
CMOS VLSI Design: A Circuits and Systems Perspective (4th Edition) by Neil Weste, David Harris
Computer Organization and Design RISC-V Edition: The Hardware Software Interface by David A. Patterson, John L. Hennessy
Energy-Efficient Distributed Computing Systems by Albert Y. Zomaya, Young Choon Lee
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
Machine Learning: An Algorithmic Perspective, second edition by Stephen Marsland
Introduction to Apache Flink: Stream Processing for Real Time and Beyond by Ellen Friedman, Kostas Tzoumas
The Data Center as a Computer by Luiz André Barroso, Jimmy Clidaras, Urs Hölzle
Additional scholarly articles: sources will be indicated in the course slides
Rules for Online Teaching
Most teaching currently takes place online. As lecturers, we ask for your patience if things do not work perfectly right away, and we hope for your constructive participation. In this situation, we would also like to point out explicitly some rules that would be self-evident in real life:
- In live meetings, we ask you to handle audio (off by default) and bandwidth (video as needed) responsibly.
- Participants are not allowed to record or forward sessions.
- Distributing content (video, audio, images, PDFs, etc.) through channels other than those intended by the author is not allowed.
If you violate one of these rules, you can expect to be excluded from the respective course, and we reserve the right to take further action. With everyone else, we look forward to the joint experiment of an "online semester".
Consultation: by appointment, or after lectures.