ITR/SI+AP: GUARANTEEING HIGH-BANDWIDTH CONNECTIONS IN A DYNAMIC DISTRIBUTED
NETWORK: DEVELOPMENT OF A DEPLOYABLE, AUTHENTICATED QoS
INVESTIGATORS
William A. Adamson, Research Investigator, Information Technology Division,
Assistant Director, Center for Information Technology Integration, University
of Michigan
Brian Athey, Assistant Professor of Biology, Director, University of Michigan
Visible Human Project
Steve Corbató, Director, Backbone Network, University Corporation for Advanced
Internet Development
Farnam Jahanian, Associate Professor of Electrical Engineering and Computer
Science, Director of Software Systems Laboratory
Shawn P. McKee, Assistant Research Scientist, Department of Physics, University
of Michigan
Eric Myers, Research Physicist, Department of Physics, University of Michigan
Homer A. Neal, Professor of Physics, Director, UM-ATLAS Collaboratory Project,
University of Michigan
Jeffrey C. Ogden, UM Internet2 Coordinator, Associate Director, Merit Network
Kang G. Shin, Professor of Electrical Engineering and Computer Science,
Director, Real-Time Computing Laboratory, University of Michigan
Victor K. Wong, Academic Liaison and Director, Information Technology for
Research, University of Michigan
PARTICIPATING INSTITUTIONS
University of Michigan
University Corporation for Advanced Internet Development
Merit Network, Inc.
EXECUTIVE SUMMARY
We propose to develop a secured, dynamic Quality of Service (QoS) computer
network in a production environment. This has never been done before. We will
focus our efforts mainly on implementing QoS at end-points and at the gigaPoP
level, but with a view towards using QoS over intermediate Internet2 QoS
domains when that becomes possible. This proposal builds on current work, seed
funded by several institutions, to develop a working QoS channel between
researchers at two end-points, specifically the University of Michigan (U-M)
Physics Department in Ann Arbor and the European Laboratory for Particle
Physics (CERN) in Geneva, Switzerland. We will do this by 1) implementing a
system of bandwidth brokers which use distributed authentication and
authorization to dynamically control differentiated services at the router
level; 2) implementing additional intermediate QoS domains and/or additional
end-points; 3) studying this network to further understand the requirements for
a scalable authorization policy and to understand the effects of network
topology on performance; and 4) testing the network in real-world situations
against practical research applications that require QoS. We expect our
proposed QoS network to serve as a deployable template for other research and
education communities that need to make use of QoS. Our larger goal is to
enable a whole series of collaborative research applications that require QoS
for optimal performance, such as high-quality interactive
video streaming and large-scale distributed computing. We will test our QoS
network with several such applications, including those of direct use to the
current research efforts of the ATLAS project, one of the world's largest
scientific collaborations, and the NGI Visible Human project. We
will also test our work with a variety of network monitoring and diagnostic
tools.
BACKGROUND: The need for QoS
Many research tasks of current and future interest require the transfer of
large amounts of data across computer networks in a timely fashion. The
Internet as it is presently constructed is not up to this task. Today's
Internet provides only a "best effort" data delivery service where
all network traffic receives the same priority, whether it contains
time-sensitive medical data or is just part of a popular music file. When
network traffic exceeds the capacity of some network segments to carry all of
the traffic, the network responds to the congestion by dropping some packets,
which must then be resent. Network congestion is usually not serious enough to
keep the resent packets from eventually reaching their final destination, but
the need to re-send the packets causes delays. Well-behaved network
applications are expected to slow their rate of data transmission when they
re-send packets, and thus adapt to the network resources that are available.
This allows large numbers of users and applications to share the network, but
the speed of delivery and the amount of network bandwidth available to a
specific user and application varies. The variability in network performance
is beyond the control of any single user or application.
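The back-off behavior described above can be illustrated with a minimal sketch of additive-increase, multiplicative-decrease (AIMD) rate control, the general scheme TCP-style senders follow. The function name, the single-sender model, and all numbers are illustrative assumptions, not measurements from any network discussed here.

```python
def aimd_rates(capacity, start_rate, increase, decrease_factor, steps):
    """Simulate one sender adjusting its rate with additive increase and
    multiplicative decrease (AIMD). Returns the rate used at each step."""
    rate = start_rate
    history = []
    for _ in range(steps):
        history.append(rate)
        if rate > capacity:
            # Congestion: packets were dropped, so back off sharply.
            rate = rate * decrease_factor
        else:
            # No loss seen: probe for more bandwidth a little at a time.
            rate = rate + increase
    return history


if __name__ == "__main__":
    # Hypothetical numbers: a 100 Mbit/s bottleneck, sender starts at 60.
    for step, rate in enumerate(aimd_rates(100.0, 60.0, 10.0, 0.5, 12)):
        print(f"step {step:2d}: {rate:6.1f} Mbit/s")
```

The resulting sawtooth in the sending rate is the variability that a single user or application cannot control under best-effort delivery.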
The variable performance provided by the Internet's "best effort" delivery
service works well for many network applications, but not for others. Examples
of demanding applications that do not work well on today's Internet include
high-quality interactive video, the real-time control of
remote scientific instruments, scientific visualization where the computational
function and the display function are remote from each other, and distributed
grid computations where the use of remote computational and storage
elements must be tightly coordinated in order to perform the computation
efficiently.
Two approaches are being pursued to overcome the limitations of "best effort"
delivery. The first, and so far the most common, approach is to
build specialized portions of the network that are capable of very high
performance and which are open to relatively few users. This approach
over-provisions portions of the network in the hope that there will be little
or no network congestion, and hence no need to re-send packets, and no need for
applications to slow the data rate at which they transmit to the network,
resulting in good network performance with little variation due to external
factors.
The other approach is to implement different levels of network service that may
be requested by an application, so that important or time-sensitive traffic is
given preference over other traffic, much like first-class mail is given
preference over parcel post. With this approach there is no longer a single
best effort service. Instead, data packets are marked to indicate the service
level they require and capable networks give the packets different priorities.
The general name for such differentiated service is "Quality of Service" (QoS).
These two approaches, over-provisioning and QoS, are not
mutually exclusive. In fact it seems likely that significant progress will
require that both approaches be pursued.
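As a minimal sketch of what "marking" a packet can look like from an application, the following sets the Differentiated Services code point (DSCP) on a UDP socket using the standard socket API. The helper names are hypothetical, and the choice of the Expedited Forwarding code point is an assumption made for illustration. Whether any router honors the marking depends entirely on the policy of each network along the path.

```python
import socket

# Standard DiffServ code point for Expedited Forwarding (RFC 3246).
DSCP_EF = 46


def dscp_to_tos(dscp):
    """Return the IP TOS/Traffic Class byte carrying the given 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    # DSCP occupies the upper six bits; the lower two bits are ECN.
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing IPv4 packets request `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

In a deployed system the marking would more likely be applied or policed at an edge router under bandwidth broker control, since end hosts cannot be trusted to mark only authorized traffic.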
A particular example of the need for QoS is the ATLAS project [1], which
involves almost 2000 physicists around the world working to design, construct,
test, and operate a particle physics detector for CERN's Large Hadron
Collider (LHC). ATLAS participants need guaranteed high bandwidth and low
latency now for interactive collaboration (e.g., high quality video
conferencing) and for testing and implementing distributed grid computing. As
the ATLAS detector becomes operational it will generate, on average, several
terabytes of data per day (on the order of a petabyte per year), and
distributed grid computing is seen as possibly the only way that meaningful
physics can be extracted by dispersed researchers from such a large volume of
data. ATLAS researchers, both at Michigan and elsewhere, are beginning initial
work to implement both QoS and distributed grid computing.
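To give a feel for these volumes, the short calculation below converts a daily data volume into a yearly total and a sustained average bit rate. The figure of 3 TB per day is only an assumed stand-in for "several terabytes", and decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def yearly_volume_pb(tb_per_day):
    """Total data accumulated over 365 days, in petabytes."""
    return tb_per_day * TB * 365 / PB


def average_rate_mbps(tb_per_day):
    """Sustained average rate needed to move one day's data in one day."""
    return tb_per_day * TB * 8 / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    daily = 3  # assumed illustrative value for "several terabytes per day"
    print(f"{yearly_volume_pb(daily):.3f} PB per year")
    print(f"{average_rate_mbps(daily):.1f} Mbit/s sustained average")
```

Under these assumptions the yearly total is about 1.1 PB and the sustained average is roughly 278 Mbit/s. This is an average only; interactive analysis and bulk replication would need considerably higher peak rates.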
Guaranteed high bandwidth and low latency will also be important in medical and
life sciences applications. An immediate example is provided by the Next
Generation Internet (NGI) Visible Human (VH) Project [2]. Its aim is to
develop NGI systems to serve visible human datasets in novel and educationally
useful ways. These include a comprehensive set of interactive 2D and 3D VH
browsers featuring arbitrary 2D cutting and 3D visualizations as well as an
interactive web navigation engine to create and visualize anatomic flythroughs
under the haptic control of the users. This will allow for delivery of several
simultaneous high quality digital streams, enabled by the QoS network system,
creating structured medical knowledge using the VH datasets.
Although the idea behind QoS is simple, implementing it in practice in a
real-world network environment has turned out to be a daunting task. The
ability to create a dedicated preferred channel has only recently been
demonstrated [3], but a number of other issues remain. In a dynamic
environment bandwidth allocations would have to be created and released as
needed, by making a request to a "Bandwidth Broker" (BB), which is a
program that controls the differentiated service of routers. Separate
bandwidth brokers will control different QoS domains, so some means of
coordinating bandwidth allocations across separate QoS domains is needed.
Abuse of preferred services could cause serious problems, so authentication and
authorization to and between Bandwidth Brokers will be an important component
of a practical QoS system.
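The reservation and cross-domain coordination problem can be sketched as follows. This is a toy model under stated assumptions: the class and function names are hypothetical and are not the GARA interface, each domain is reduced to a single pool of premium bandwidth, and authentication and authorization are omitted entirely.

```python
class BandwidthBroker:
    """Toy admission control for one QoS domain (not the GARA API)."""

    def __init__(self, domain, capacity_mbps):
        self.domain = domain
        self.capacity_mbps = capacity_mbps
        self.reservations = {}  # reservation id -> reserved Mbit/s

    def available(self):
        return self.capacity_mbps - sum(self.reservations.values())

    def reserve(self, res_id, mbps):
        """Admit the request only if enough premium capacity remains."""
        if res_id in self.reservations:
            return False
        if mbps <= 0 or mbps > self.available():
            return False
        self.reservations[res_id] = mbps
        return True

    def release(self, res_id):
        self.reservations.pop(res_id, None)


def reserve_end_to_end(brokers, res_id, mbps):
    """Reserve along a path of domains; undo everything if any broker refuses."""
    granted = []
    for broker in brokers:
        if broker.reserve(res_id, mbps):
            granted.append(broker)
        else:
            for earlier in granted:
                earlier.release(res_id)
            return False
    return True
```

The all-or-nothing rollback reflects the point that a reservation is only useful if every domain on the path grants it; a partial reservation would tie up capacity without delivering end-to-end service.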
Several other hurdles exist. One is the fact that it may not be possible to
have any control over intermediate network segments, which are often commercial
commodity networks and which may not support QoS. One solution to this
problem is the creation of separate over-provisioned intermediate network
segments, such as the Internet2 Abilene network. The biggest problems with
QoS, however, are expected to be at the end-point networks, roughly at the
level of either a campus network or a gigaPoP. The effects of adding QoS flows
to existing network traffic are presently unknown. One cannot hope to tune a
network to use QoS effectively until the effects of QoS on existing networks
can be measured and understood.
PRESENT WORK: Authenticated QoS Signaling
Because of the importance of QoS, work on implementing QoS is already underway
at the University of Michigan, with seed funding and participation from several
U-M sources: the Physics Department, the Center for Information Technology
Integration (CITI), the Office of the Vice President for Research (OVPR), the
office of the Chief Information Officer (CIO), and the College of Literature,
Science and the Arts (LS&A). Support is also provided by the University
Corporation for Advanced Internet Development (UCAID), Merit Network Inc, the
European Laboratory for Particle Physics (CERN), the NGI Visible Human project,
and the Argonne National Laboratory.
This non-NSF funded project focuses just on the problems of Authenticated QoS
Signaling [4]. In a dynamic network environment, QoS requires signaling to
bandwidth brokers to make and release bandwidth reservations. To avoid abuse,
these signals must be authenticated, and the bandwidth broker must verify the
authorization of the authenticated signals. The system we are constructing
uses the Akenti Access Control System [5] to provide Public Key (PK) based
authorization decisions, the Lightweight Directory Access Protocol (LDAP) for
storing and retrieving directory data (such as PK certificates, user, resource
or Akenti authorization data), and the Globus Architecture for Reservation and
Allocation (GARA) bandwidth broker software from Argonne National Laboratory.
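The two checks described above, authentication followed by authorization, can be sketched in a few lines. This is a deliberately simplified stand-in and not the project's design: shared-secret HMAC signatures take the place of public-key certificates, a plain dictionary takes the place of the policy that Akenti would evaluate, and every name and value is hypothetical.

```python
import hashlib
import hmac

# Hypothetical stand-ins: shared secrets replace PK certificates, and a
# plain dictionary replaces the policy an Akenti server would evaluate.
USER_KEYS = {"alice": b"alice-demo-key"}
MAX_MBPS = {"alice": 50}


def sign(user_key, message):
    """Produce a hex signature over `message` with the user's key."""
    return hmac.new(user_key, message, hashlib.sha256).hexdigest()


def authenticate(user, message, signature):
    """Check that the request really comes from `user`."""
    key = USER_KEYS.get(user)
    if key is None:
        return False
    return hmac.compare_digest(sign(key, message), signature)


def authorize(user, mbps):
    """Check that policy lets `user` reserve this much bandwidth."""
    return 0 < mbps <= MAX_MBPS.get(user, 0)


def handle_request(user, mbps, signature):
    """Authenticate first, then authorize, before any reservation is made."""
    message = f"reserve:{user}:{mbps}".encode()
    if not authenticate(user, message, signature):
        return "rejected: authentication failed"
    if not authorize(user, mbps):
        return "rejected: not authorized"
    return "accepted"
```

Keeping the two steps separate mirrors the division of labor in the text: one component establishes who is asking, and another decides what that identity is allowed to reserve.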
The Internet2 Middleware Working Group is developing LDAP schema to provide a
common directory namespace for Internet2 organizations. This effort has so far
produced a user schema, called Eduperson. One goal of our present project is to
use the Internet2 Middleware Working Group's directory definitions and
provide the group with some real-world feedback.
GARA is actually a part of ANL's Globus system for grid computing [6], so
the system will be tested using Globus as one of the underlying applications
requiring QoS. Because the University of Michigan relies heavily on Kerberos
for authentication and authorization, an important additional component is the
KX509 software from U-M's Center for Information Technology Integration
(CITI), which creates and signs short-term PK certificates based on valid
Kerberos authentication of the requester, thus joining the Kerberos and PK
systems. Integration of KX509 with GARA will result in the ability to use
Kerberos as a bandwidth broker authentication method.
Although this project has only just started, significant progress has already
been made. Basic services are in place. The Akenti software has been compiled
and tested and is running in its base form. CITI's LDAP service is
currently being configured with the Internet2 Eduperson schema. The Globus
software has been compiled and tested and is now running. GARA, the Globus
bandwidth broker component, is compiled and is currently being tested. A
router has been made available for the project. Once the router hardware has
been upgraded and the bandwidth broker is up and running we will begin to
reserve bandwidth with GARA in its current form. KX509, Akenti and LDAP
integration will follow.
Solving the problem of authenticated QoS authorization is an important step
toward a fully operational QoS network, though it is only one of many steps
toward that goal.
NEXT CHALLENGES: A secured, dynamic QoS that works end-to-end
This ITR grant will enable us to tackle the next set of problems that must be
overcome in order to create a practical QoS system. Our next goal is to move
from a simple demonstration of QoS signaling between two specific end-points
toward a production QoS deployment by developing the tools needed to realize
end-to-end QoS for diverse, and ultimately any, end-points. Once this has been
accomplished we will test the network using practical research applications
that require QoS. In particular, we propose to meet the following challenges:
I) Analyze and understand how QoS router configurations affect a working
production network. Installing and removing configurations can produce
combinations of configuration features that lead to undesirable network
behavior. These configurations include:
II) Significantly extend the simple end-point only QoS domain model by:
III) In our present work, a minimum set of Akenti attributes is used to
describe a simple authorization policy. In a production system, the policies
will be much more complex. While Akenti is designed to accommodate such
complexity, we will:
IV) Analyze and understand the effects of network topology on performance.
This includes:
V) Train first ourselves and then additional network engineers, departmental
network administrators, application developers, and end users in what is
required to develop, deploy, operate, troubleshoot, and tune high performance
networking and QoS solutions.
VI) Test the performance of the network in real-world situations against
practical research applications that require QoS, particularly those that will
be of direct use to the ATLAS project and the Visible Human project. We expect
these to include (but not be limited to):
There are a number of reasons why the University of Michigan is an ideal choice
for this project. There is strong interest at Michigan in making QoS work, as
evinced by the fact that work on QoS has already started. Multiple
organizations are committed to the project, both in and outside of the
University. These organizations have strong track records in software
development and advanced network design and operation, including the operation
of production networks. Through our partners we have access to the underlying
network infrastructure within departments at U-M, the U-M campus network, the
Michigan gigaPoP, Internet2/Abilene, STARTAP, and CERN. We have several
real-world applications that require QoS, and a strong incentive to get those
applications working. We anticipate the participation of students from our
Research Experience for Undergraduates at CERN program, which will involve them
in both the latest research in particle physics and the newest developments in
computer networking. This project will support real science and education by
putting
applications that use QoS in the hands of end users through our work with the
ATLAS and Visible Human projects.