May 3: Yves Robert
Published: 2017-05-02

Talk title: Fault-tolerance techniques for High Performance Computing

Speaker: Prof. Yves Robert

Host: 王长波

Time: Wednesday, May 3, 13:30-14:30

Venue: Room 201, Mathematics Building, Zhongbei Campus

Speaker bio:

Yves Robert received his PhD degree from Institut National Polytechnique de Grenoble. He is currently a full professor in the Computer Science Laboratory LIP at ENS Lyon. He is the author of 7 books, 150 papers published in international journals, and 240 papers published in international conferences. He is the editor of 11 book proceedings and 13 journal special issues, and has advised 30 PhD theses. His main research interests are scheduling techniques and resilient algorithms for large-scale platforms. He is a Fellow of the IEEE and has been elected a Senior Member of Institut Universitaire de France. He received the 2014 IEEE TCSC Award for Excellence in Scalable Computing and the 2016 IEEE TCPP Outstanding Service Award. He has held a Visiting Scientist position at the University of Tennessee, Knoxville since 2011.

Abstract:

This talk will provide an overview of fault-tolerance techniques for High Performance Computing at very large scale. We first address fail-stop errors, a.k.a. unrecoverable failures, and discuss various checkpoint protocols. We then discuss silent errors, a.k.a. silent data corruptions, and present several detection/correction mechanisms.
