I run my seminar sessions fairly open-ended, guided by my interests and
those of the students. Ideally, I like to see students performing
independent reading and studying to help find papers and select topics
under the general theme of the term's study. There are no hard
prerequisites for this class, except that you should be skilled in the
general computing field and ready for independent work and learning.
Theme for the Term
In this session, I am planning to look at parallel and distributed
systems with a special emphasis on multi-/many-core processors,
virtualization, and the opportunities that virtualization enables for
optimizing the supporting system services for parallel applications.
Specifically, I hope to explore the opportunities that virtualization
provides to Beowulf clusters built from multi-core and many-core
processors. In this study we will examine the optimization of virtual
application environments (operating systems) in Beowulf systems. We
will likely be looking at micro-kernels, minimalist lightweight
operating systems, and so on. We will use PDES (Parallel Discrete
Event Simulation) as the application framework upon which we will
conduct our studies.
Grading
The class will be organized as readings and discussions; no projects,
homework, or exams will be assigned. My expectation is that students
will read and explore this problem space on their own and bring
interesting papers to the class for review and discussion.
Possible Topics/Readings
These pages will change throughout the class as we decide
which papers to study. Check these pages regularly.
Parallel & Distributed Simulation Overview:
We will have a quick overview of PDES as an application space for
the main studies of parallel systems to be discussed during the
quarter.
Fujimoto's Survey of
Parallel Simulation: Old, but a solid foundation. A
must-read. Full citation: R. Fujimoto, "Parallel Discrete
Event Simulation," Communications of the ACM, 33, 10,
30-53, October 1990.
Lamport's Theory of
Clocks: Fundamental underpinnings of event dependencies and
ordering. Full citation: L. Lamport, "Time, Clocks, and the Ordering
of Events in a Distributed System," Communications of the
ACM, 21, 7, 558-565, July 1978.
P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris,
A. Ho, R. Neugebauer, I. Pratt, A. Warfield, "Xen and the Art of
Virtualization," Proc. of the ACM Symposium on Operating
Systems Principles (SOSP '03), October 2003.
A. Kivity, Y. Kamay, D. Laor, U. Lublin, A. Liguori, "kvm: the Linux Virtual
Machine Monitor," Proc. of the Linux Symposium,
Volume 1, 225-230, June 2007.
D. Abramson, J. Jackson, S. Muthrasanallur, G. Neiger,
G. Regnier, R. Sankaran, I. Schoinas, R. Uhlig, B. Vembu,
J. Wiegert, "Intel
Virtualization Technology for Directed I/O," Intel
Technology Journal, Vol 10, No 3, 179-192, August 2006.
Virtualization and Memory Management (read in order for best
understanding of the problem and solutions; it would be nice if we
could locate an overview paper on Intel's current solution):
X. Zhang, A. X. F. Xu, Q. Li, D. K. Y. Yao, S. Qing, and
H. Zhang, "A Hash-TLB
Approach for MMU Virtualization in Xen/IA64,"
International Symposium on Parallel and Distributed
Processing (IPDPS '08), 1-8, April 2008.
G. Chen, X. Wang, Z. Wang, X. Wen, X. Jin, Y. Luo, and
X. Li, "REMOCA: Hypervisor
Remote Disk Cache," International Symposium on Parallel and
Distributed Processing with Applications, 161-169, 2009.
Multi-core, many-core, and the return of Parallel Computing
K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer,
J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen,
J. Wawrzynek, D. Wessel, and K. Yelick, "A View of the
Parallel Computing Landscape," Communications of the
ACM, 52, 10, 56-67, October 2009.
Specific Machines/hardware:
IBM's Blue Gene Machines
A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu,
P. Coteus, M. E. Giampapa, R. A. Haring, P. Heidelberger,
D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht,
B. D. Steinmacher-Burow, T. Takken, P. Vranas, "Overview of
the Blue Gene/L System Architecture," IBM Journal of Research
and Development, Vol 49, No 2/3, 195-212, March/May 2005.
Operating Systems/Application Support Environments
One could always write a custom O/S for a Beowulf cluster or
replace various system services (e.g., TCP/IP, task scheduling,
memory management) with application specific specialized services
to optimize a parallel application. However, most Beowulf
clusters are general-purpose machines that serve a wide
range of users. Thus, the typical Beowulf cluster is set up
running a standard desktop/server O/S with standard services.
With virtualization we can now easily deploy a customized
execution environment to help optimize parallel applications.
However, the key question is: does the cost of virtualization
exceed the performance gains of a customized virtual guest?
While much work has been done to develop full custom, lightweight
O/S's for clusters, we will separate our studies of them into a
separate section. Within the space of optimizations in a virtual
guest, unfortunately, not much has been done. As you know, we've
been looking at replacing the standard TCP/IP protocol stack with
an active message stack. Others have projects to develop compact
distributions to use as virtual guests. Hopefully, the class will
be able to find additional readings; for now, all I have are:
Replace TCP/IP with a lightweight messaging subsystem like
GAMMA.
Currently we are reworking the GAMMA driver into a version we're
calling ucgamma. A draft paper with some preliminary results is
available.
Here's a project that proposes to create a small O/S to
serve as a virtual guest for parallel applications: JeOS,
the Just Enough OS Project.
U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany,
J. H. Ahn, P. Mattson, J. D. Owens, "Programmable Stream
Processors," IEEE Computer, 54-62, August 2003.
CUDA
GPU Gems 2, Chandra recommends chapters 31, 32, and 33.
Pthreads. We will skip a formal review of them in class;
check out one of the online tutorials instead.
Lock-free Data Structures. The first 5 pages of this next
paper give a good introduction to the two most common errors with
atomic instructions, and the rest of it is their solution to these
problems. A good read.
warped's Lock-free Data Structures. We will not be reporting
progress on these data structures this quarter (they are not well
enough developed). However, we do use a calendar queue data
structure (as described in the following paper) to help
organize/plan a scheduling window for the simulation objects.
V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia,
V. Talwar, P. Ranganathan, "GViM:
GPU-Accelerated Virtual Machines," Workshop on
System-Level Virtualization for High Performance Computing
(HPCVirt '09), March 2009.
Larrabee: L. Seiler, D. Carmean, E. Sprangle, T. Forsyth,
M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin,
R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: A Many-Core x86
Architecture for Visual Computing," ACM Transactions on
Graphics, 27, 3, August 2008.