Tutorials

We are pleased to offer the following tutorials from leading experts in academia and industry.

Practical Tooling for Serverless Computing

Tuesday morning, December 5, Salon A
Presenter:
Josef Spillner https://blog.zhaw.ch/icclab/josef-spillner/

Description:
Serverless computing is considered one of the strongest trends in cloud application development. This tutorial passes on the outcome of many months of research on this topic to attendees. You will come to understand what FaaS and serverless computing are all about, what the state of the technology is, which tools can be used in research and teaching, and where the research challenges lie.

The tutorial consists of four parts: serverless foundations, function developer tools, function execution tools, and research challenges. Each part lasts around 45 minutes, with short (10-minute) breaks in between, for a total of 3.5 hours.

Intended audience:
Due to the novelty of the topic, this tutorial addresses anybody with background knowledge in cloud computing and cloud applications. No prior knowledge of serverless concepts or technologies is required.

The tutorial is of particular interest to PhD candidates and proposal writers who need compiled background knowledge and some hands-on experience on the topic in spite of the lack of literature. It also contains sufficient practical advice for professional software engineers looking to add cloud-delivered functionality to their applications.

Supplementary material:
https://drive.switch.ch/index.php/s/Upjd0aXCypZjMnZ

Interactive Video Streaming Using Heterogeneous Cloud Services

Tuesday morning, December 5, Salon B
Presenter:
Dr. Mohsen Amini
Research lab Web site: http://hpcclab.org/
Homepage: http://www.ucs.louisiana.edu/~mxa2975/

Description:
Thanks to high-speed Internet, basic video streaming has become an ordinary service nowadays. However, what is currently offered is far from the higher-level services that enable stream viewers to interact with the video streams. Interactive video streaming enables processing of the video streams upon viewers’ requests for a particular video. For instance, a viewer may request to watch a video stream with a particular resolution. Another example is a viewer who requests to view a summary (highlights) of a video stream.

Cloud services provide an ideal platform for video streaming providers to satisfy the computational demands of interactive video streaming. However, the problem in utilizing cloud services for interactive video streaming is: "how to provide a robust interactive video streaming service that guarantees the QoS desired by viewers, while spending the minimum cost on cloud services?" Accordingly, the objective of this tutorial is to present the challenges, structures, and methods required to enable interactive video streaming that guarantees QoS in a cost-efficient manner. In particular, we present a framework for interactive video streaming called Interactive Video Streaming Engine (IVSE) that deals with the challenges of cloud-based interactive video streaming services and provides methods to address them.

For the sake of cost-efficiency, the Resource Provisioning component of IVSE supplies heterogeneous types of Virtual Machines (VMs), where each VM has a different affinity with the various arriving video processing tasks. As a result, the system creates and manages a dynamic heterogeneous VM cluster whose configuration varies to conform to the arriving workload.

In summary, this tutorial describes innovations in interactive video streaming, particularly in the following areas:
• Robust, cost-efficient, and self-configurable VM provisioning policy: We will explain novel methods to provision a dynamic VM cluster that adapts its heterogeneity to the arriving requests.
• Heterogeneity- and QoS-aware scheduling method: It efficiently schedules streaming tasks on the available heterogeneous VMs with the goal of minimizing both missed task deadlines and startup delays.
• Execution time prediction for video streaming tasks: We will elaborate on the factors that influence the execution times of video streaming tasks. In addition, we will explain how to model the affinity that exists between heterogeneous VMs and tasks while considering their cost difference.
• Priority-aware admission control method: It prioritizes the submission of streaming tasks to minimize the startup delay. The method can also consider the viewer’s subscription priority and the network speed at the viewer’s end.
• Cost-efficient caching methods: We will elaborate on the trade-off between computation and storage for video streams. We also provide a formal way to measure the hotness of video streams and methods that perform caching based on the hotness measure.
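
The computation-versus-storage trade-off mentioned above can be illustrated with a minimal sketch. It is not the method presented in the tutorial: the function name, the cost figures, and the use of expected views per month as a crude stand-in for a hotness measure are all assumptions made for illustration.

```python
def should_cache(transcode_cost, storage_cost_per_month, expected_views_per_month):
    """Return True when storing a processed stream is cheaper than re-processing it.

    All parameters are hypothetical. Costs share one currency unit, and
    expected_views_per_month is only a crude stand-in for a hotness measure.
    """
    # Cost of re-transcoding the stream every time it is requested in a month.
    recompute_cost = transcode_cost * expected_views_per_month
    return storage_cost_per_month < recompute_cost


# A frequently requested ("hot") stream is worth storing; a rarely requested one is not.
hot = should_cache(transcode_cost=0.05, storage_cost_per_month=0.10,
                   expected_views_per_month=40)
cold = should_cache(transcode_cost=0.05, storage_cost_per_month=0.10,
                    expected_views_per_month=1)
```

With these made-up numbers, the hot stream is cached and the cold one is re-processed on demand.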

Intended audience:
Our tutorial provides guidelines for researchers and practitioners in the Utility and Cloud Computing area who are interested in applications of cloud computing systems. In particular, the tutorial will be of interest to those who conduct research in heterogeneous computing, performance modeling, task scheduling, resource provisioning, and caching policies.

The findings, approaches, and outcomes can be beneficial to researchers who work on other High Performance Computing applications. We present our empirically validated tools and evaluation benchmarks for use by practitioners and researchers in real-world practice. The tutorial provides technology transfer and will break down the barriers between interactive video streaming and cloud-based heterogeneous high performance computing.

Supplementary material:
http://hpcclab.org/index.php/projects/high-quality-video-streaming-using-cloud-services/

Machine Learning GPU Power Measurement on Chameleon Cloud

Tuesday afternoon, December 5, Salon A
Presenter:
Joon-Yee Chuah, Texas Advanced Computing Center (http://tacc.utexas.edu)

Description:
Machine Learning (ML) is becoming critical for many industrial and scientific endeavors, and has a growing presence in High Performance Computing (HPC) environments. Neural network training requires long execution times for large data sets, and libraries like TensorFlow implement GPU acceleration to reduce the total runtime for each calculation.

This tutorial demonstrates how to 1) use Chameleon Cloud to perform comparative studies of ML training performance across different hardware configurations; and 2) run and monitor power utilization of TensorFlow on NVIDIA GPUs.
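
As a rough illustration of what monitoring GPU power can look like, the sketch below parses readings in the CSV shape that `nvidia-smi --query-gpu=timestamp,power.draw --format=csv,noheader,nounits` emits and averages them. Whether the tutorial uses `nvidia-smi` at all is an assumption, the sample values are made up, and the helper names are hypothetical.

```python
import csv
import io

# Made-up sample in the shape produced by:
#   nvidia-smi --query-gpu=timestamp,power.draw --format=csv,noheader,nounits -l 1
SAMPLE = """\
2017/12/05 14:00:01.000, 60.00
2017/12/05 14:00:02.000, 150.00
2017/12/05 14:00:03.000, 180.00
"""


def parse_power_samples(text):
    """Return a list of (timestamp, watts) tuples from nvidia-smi style CSV text."""
    readings = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        timestamp, watts = (field.strip() for field in row)
        try:
            readings.append((timestamp, float(watts)))
        except ValueError:
            continue  # skip non-numeric readings such as "[N/A]"
    return readings


def mean_power(readings):
    """Average power draw in watts over all readings."""
    return sum(watts for _, watts in readings) / len(readings)


readings = parse_power_samples(SAMPLE)
average_watts = mean_power(readings)
```

In practice the text would come from a log captured while a training job runs, so that the readings can be lined up with the job's start and end times.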

Intended audience:
Students or academic researchers interested in measuring performance and power utilization on GPUs, and individuals interested in learning to use bare metal cloud systems to evaluate hardware for system design.

Supplementary material:
http://bit.ly/chameleon_gpu

Understanding Performance Interference Benchmarking and Application Profiling Techniques for Cloud-hosted Latency-sensitive Application

Tuesday afternoon, December 5, Salon B
Presenters:
Shashank Shekhar - http://www.dre.vanderbilt.edu/~sshekhar/
Yogesh Barve - http://www.dre.vanderbilt.edu/~ydbarve/
Dr. Aniruddha Gokhale - http://www.dre.vanderbilt.edu/~gokhale/ (will not be able to attend)

Description:
Cloud infrastructure providers must have an up-to-date understanding of their cloud resources’ usage so that they can effectively manage their cloud platforms while supporting multi-tenancy. At the same time, timely and scalable access to various resource usage statistics is critical to service providers who host their services in the cloud, so that they can ensure that their services deliver the required quality of service to their customers through elastic, on-demand auto-scaling while minimizing service-hosting costs.

To that end, these providers must understand how their services will perform under a variety of multi-tenancy scenarios and workload patterns. Conducting such benchmarking experiments and obtaining the resource statistics needed to pinpoint the sources of problems, such as the level of performance interference that stems from multi-tenancy, is hard for a variety of reasons. These include heterogeneity in hardware and operating systems and the availability of hardware-specific, low-level statistics collection tools, all of which make it extremely complex for providers to use existing capabilities and to extend them as the hardware and the statistics collection needs change. These challenges are further amplified as cloud platforms increasingly span fog and edge resources. Thus, an extensible framework that provides a higher level of abstraction, and is therefore easy to use, is needed.

Although modern frameworks such as collectd are designed to address these challenges, they provide only the building blocks, leaving users responsible for integrating and extending the capabilities. To overcome these challenges, this tutorial presents a framework called INDICES, which builds on collectd and provides an integrated and extensible framework for users to rapidly conduct a variety of performance benchmarking experiments and collect a range of resource usage and application performance statistics. The tutorial focuses on the design of INDICES, provides attendees with basic hands-on experimentation, and describes how INDICES can be deployed in the cloud.

Intended audience:
The tutorial targets an audience with interest in resource management and those who want to develop expertise in server performance monitoring and benchmarking.

Redfish – Overview and Deep Dive

Wednesday morning, December 6, Salon A
Presenter:
Jeff Hilland, President DMTF and Chief Technologist for Manageability, Hewlett Packard Enterprise Data Center Infrastructure Group.
http://www.dmtf.org/about/officers/bios#hilland
https://www.linkedin.com/in/jeff-hilland-7a4bb8a

Intended audience:
Professionals or academics interested in any of the following: RESTful APIs, Manageability, Cloud Infrastructure, Data Center Infrastructure, Standards, Hybrid IT

Description:
Redfish is a RESTful manageability API for Software Defined IT infrastructure. As it is already built into many items of data center infrastructure, Redfish is ready for use for a growing number of modern data center control and automation tasks.
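
To give a feel for the RESTful style, the following minimal sketch parses a hand-written, abbreviated example of a Redfish service root document (the resource served at `/redfish/v1/`) and lists the resource collections it links to through `@odata.id`. The payload is an illustrative sample rather than the output of a real service, and the helper name is hypothetical.

```python
import json

# Hand-written, abbreviated sample of a service root document (illustrative only).
SAMPLE_SERVICE_ROOT = """
{
  "@odata.id": "/redfish/v1/",
  "Id": "RootService",
  "Name": "Root Service",
  "Systems": {"@odata.id": "/redfish/v1/Systems"},
  "Chassis": {"@odata.id": "/redfish/v1/Chassis"},
  "Managers": {"@odata.id": "/redfish/v1/Managers"}
}
"""


def collection_links(service_root):
    """Map each linked resource name to the URI given in its "@odata.id" member."""
    return {
        name: value["@odata.id"]
        for name, value in service_root.items()
        if isinstance(value, dict) and "@odata.id" in value
    }


root = json.loads(SAMPLE_SERVICE_ROOT)
links = collection_links(root)
```

A client would then issue an HTTP GET against one of these URIs, for example the `Systems` link, and follow the `@odata.id` references in the response in the same way.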

This tutorial will provide a deep dive into the goals, architecture, protocol, data model and surrounding adoption efforts. There will be a brief overview of the Distributed Management Task Force (DMTF, www.dmtf.org) and of how DMTF is expanding the scope of Redfish through working relationships with organizations such as the Storage Networking Industry Association, IETF, Open Compute Project, Green Grid, ASHRAE and others.

The talk will begin by making the case for Redfish, including coverage of why Redfish is needed and the data center control tasks it is designed to handle. The tutorial will then detail the data modeling approach and common modeling, and then dive into the management of major and minor components including servers, storage, networking and other parts of the infrastructure. An overview of the schema definition languages and open source efforts will also be given.


Supplementary material:
http://Redfish.dmtf.org

Resource Management in Cloud Platform as a Service Systems

Thursday morning, December 7, Salon A
Presenters:
Stefania Costache (http://researcher.ibm.com/researcher/view.php?person=us-svcostac)
Djawida Dib (https://www.linkedin.com/in/djawidadib/)
Nikos Parlavantzas (http://people.irisa.fr/Nikolaos.Parlavantzas/)
Christine Morin (http://people.rennes.inria.fr/Christine.Morin/)

Description:
Platform-as-a-Service (PaaS) clouds offer services to automate the deployment and management of applications, relieving application owners of the complexity of managing the underlying infrastructure resources. However, application owners have an increasingly large diversity and volume of workloads, which they want to execute at minimum cost while maintaining desired performance guarantees.

In this tutorial we give an overview of how existing PaaS systems cope with this challenge. In particular, we present a taxonomy of commonly encountered design decisions regarding how PaaS systems manage underlying resources. We then use this taxonomy to discuss an extensive set of PaaS systems targeting different application domains.

Intended Audience:
The tutorial targets a broad audience, encompassing PhD students, researchers in cloud computing and engineers developing PaaS services or cloud applications.

HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications

Friday morning and afternoon, December 8, Salon A
Presenters:
- Dhabaleswar K. (DK) Panda (http://www.cse.ohio-state.edu/~panda)
- Xiaoyi Lu (http://www.cse.ohio-state.edu/~luxi)

Description:
Significant growth has been witnessed during the last few years in HPC clusters with multi-/many-core processors, accelerators, and high-performance interconnects (such as InfiniBand, Omni-Path, iWARP, and RoCE). To alleviate the cost burden, sharing HPC cluster resources with end users through virtualization for both scientific computing and Big Data processing is becoming more and more attractive.

In this tutorial, we first provide an overview of popular virtualization system software in HPC cloud environments, such as hypervisors (e.g., KVM), containers (e.g., Docker, Singularity), OpenStack, Slurm, etc. Then we provide an overview of high-performance interconnects and communication mechanisms on HPC clouds, such as InfiniBand, RDMA, SR-IOV, IVShmem, etc. We further discuss the opportunities and technical challenges of designing a high-performance MPI runtime over these environments.

Next, we introduce our proposed novel approaches to enhance MPI library design over SR-IOV enabled InfiniBand clusters with both virtual machines and containers. We also discuss how to integrate these designs into popular cloud management systems like OpenStack and HPC cluster resource managers like Slurm. Beyond HPC middleware and applications, we will also demonstrate how high-performance solutions can be designed to run Big Data and Deep Learning workloads (like Hadoop, Spark, TensorFlow, CNTK, Caffe) in HPC cloud environments.

Intended audience:
This tutorial is targeted at various categories of people working in the areas of HPC, Big Data processing, and Deep Learning on modern HPC clouds with high-performance interconnects. The specific audiences this tutorial is aimed at include:

• Scientists, engineers, researchers, and students engaged in designing next-generation HPC, Big Data, and Deep Learning systems and applications over HPC clouds with high-performance interconnects;

• Designers and developers of Cloud Computing, HPC, Big Data, Deep Learning, OpenStack, Slurm, MPI, Hadoop, Spark, gRPC/TensorFlow, etc. middleware;

• Newcomers to the field of Cloud Computing, HPC, Big Data processing, and Deep Learning on modern high-performance computing clusters who are interested in familiarizing themselves with OpenStack, Slurm, MPI, Hadoop, Spark, gRPC/TensorFlow, RDMA, high-performance networking, etc.;

• Managers and administrators responsible for setting up next-generation HPC Clouds to efficiently run HPC, Big Data, and Deep Learning workloads in their organizations/laboratories.

Supplementary material:
- http://web.cse.ohio-state.edu/~panda.2/ucc17_cloud_tut.html
