- Prof. Miroslaw Malek, University of Lugano, Switzerland
- Dr. Jorge Cardoso, IT R&D Division, Huawei
Prof. Miroslaw Malek
Miroslaw Malek is Professor and Director of the Advanced Learning and Research Institute (ALaRI) at the Faculty of Informatics of the Università della Svizzera italiana in Lugano. Previously, he was a professor at the University of Texas at Austin (1977-1994) and at Humboldt University in Berlin (1994-2012). He also held visiting appointments at IBM Yorktown Heights, AT&T Bell Labs, Stanford University, NYU, TU Vienna, Politecnico di Milano, Università di Roma “La Sapienza,” and the Chinese University of Hong Kong, as well as the IBM-Japan Chair at Keio University. He received his PhD in Computer Science from the Technical University of Wroclaw in Poland.
His research interests focus on dependable and secure architectures and services in parallel, cloud, distributed, and embedded computing environments, including failure prediction and service availability. Among other work, he has participated in two pioneering parallel computer projects, contributed to the theory and practice of dependable parallel network design, and made numerous contributions reflected in over 250 publications and nine books. He has founded, organized, and co-organized numerous workshops and conferences. He has served on the editorial boards of several journals, including ACM Computing Surveys, and is a consultant to governments and companies on technical and strategic issues in information technology.
Predictive Analytics: a Shortcut to Dependable Computing
Abstract: We introduce three major “tyrants” (complexity, time, and uncertainty) and conclude that, because of them, dependability is and will remain a permanent challenge.
We briefly analyze the impact of these tyrants on modeling and models and argue that, given current levels of complexity and the necessity of dealing with time, we need, in addition to classical synthesis and analysis methods, empirical, data-driven approaches. These require monitoring, online measurement, online analysis, diagnosis, failure prediction, and decision making to support recovery and nonstop computing and communication. We call for a change of mindset by advocating proactive fault management: instead of waiting for a failure, we anticipate and avoid it or minimize the potential damage. Two case studies illustrate such approaches.
In the first case study, we address the problem of proactive fault management by demonstrating how runtime monitoring, variable selection and model re-evaluation lead to effective failure prediction.
The second case study focuses on early malware detection in high-impact, Android-based systems. The main goal is to develop lightweight techniques for dynamic malware detection that are suitable for battery-operated environments. For this purpose, we propose using a minimal set of the most indicative memory and CPU features reflecting malicious behavior.
Finally, we argue that data-driven models derived from monitoring and measurement will positively impact dependability, and we observe that, by using predictive analytics and effective methods for failure avoidance and downtime minimization, we have the potential to enhance system availability by an order of magnitude or more.
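To make the “order of magnitude” claim concrete, here is the standard steady-state availability relation (an illustration added here, not taken from the talk), where MTTF is the mean time to failure and MTTR the mean time to repair. Failure avoidance raises MTTF and downtime minimization lowers MTTR; the factors $k$ and $m$ are hypothetical improvement ratios:

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}, \qquad
U = 1 - A = \frac{\mathrm{MTTR}}{\mathrm{MTTF} + \mathrm{MTTR}} \approx \frac{\mathrm{MTTR}}{\mathrm{MTTF}}
\quad (\mathrm{MTTR} \ll \mathrm{MTTF})

\mathrm{MTTF}' = k\,\mathrm{MTTF}, \quad \mathrm{MTTR}' = \frac{\mathrm{MTTR}}{m}
\;\Longrightarrow\; U' \approx \frac{U}{k\,m}
```

For example, if $k\,m = 10$, unavailability drops from $U = 10^{-3}$ to $U' = 10^{-4}$, taking availability from 99.9% to 99.99%.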
Dr. Jorge Cardoso
Dr. Jorge Cardoso is Chief Architect for Cloud Operations and Analytics at Huawei’s German Research Centre (GRC) in Munich. He has also been a Professor at the University of Coimbra since 2009. In 2013 and 2014, he was a Guest Professor at the Karlsruhe Institute of Technology (KIT) and a Fellow at the Technical University of Dresden (TU Dresden). Previously, he worked for major companies such as SAP Research (Germany), on the Internet of Services, and the Boeing Company in Seattle (USA), on Enterprise Application Integration. Since 2013, he has been Vice-Chair of the KEYSTONE COST Action, an EU research network bringing together more than 70 researchers from 26 countries. He holds a Ph.D. in Computer Science from the University of Georgia (USA).
Cloud Reliability: Decreasing outage frequency using fault injection
Abstract: In 2016, Google Cloud had 74 minutes of total downtime, Amazon Web Services had 108 minutes, and Microsoft Azure had 270 minutes (see cloudharmony.com). Reliability is one of the most important properties of a successful cloud platform. Several approaches can increase reliability, ranging from automated replication to live migration and formal system analysis. Another interesting approach is to use software fault injection to test a platform during prototyping, implementation, and operation. Fault injection for testing cloud applications was popularized by Netflix and its Chaos Monkey tool. The main idea behind this technique is to inject failures in a controlled manner to ensure that a system can survive failures during operation. This talk will explain how fault injection can also be applied to detect vulnerabilities of the OpenStack cloud platform, and how to detect the damage caused by the injected faults effectively and efficiently.
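The 2016 downtime totals translate directly into annual availability. The following is a minimal Python sketch, assuming availability is measured as uptime over the full calendar year 2016 (a leap year, so 366 × 24 × 60 = 527,040 minutes) and using only the three downtime figures quoted in the abstract; the names `MINUTES_2016`, `downtime_minutes`, and `availability` are illustrative.

```python
# Minutes in calendar year 2016 (a leap year: 366 days).
MINUTES_2016 = 366 * 24 * 60  # 527,040 minutes

# Total 2016 downtime per provider, as quoted in the abstract (cloudharmony.com).
downtime_minutes = {
    "Google Cloud": 74,
    "Amazon Web Services": 108,
    "Microsoft Azure": 270,
}


def availability(downtime: float, period: float = MINUTES_2016) -> float:
    """Return the fraction of `period` during which the service was up."""
    return 1.0 - downtime / period


if __name__ == "__main__":
    for provider, minutes in downtime_minutes.items():
        # Format the availability fraction as a percentage with 4 decimals.
        print(f"{provider}: {availability(minutes):.4%}")
```

Run as a script, it prints 99.9860% for Google Cloud, 99.9795% for Amazon Web Services, and 99.9488% for Microsoft Azure, so even a few hours of downtime per year separates providers at the “three to four nines” level.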