1. ITR/SI+AP: GUARANTEEING HIGH-BANDWIDTH CONNECTIONS IN A DYNAMIC DISTRIBUTED NETWORK: DEVELOPMENT OF A DEPLOYABLE, AUTHENTICATED QoS
    1. INVESTIGATORS
    2. PARTICIPATING INSTITUTIONS
    3. EXECUTIVE SUMMARY
    4. BACKGROUND: The need for QoS
    5. PRESENT WORK: Authenticated QoS Signaling
    6. NEXT CHALLENGES: A secured, dynamic QoS that works end-to-end

ITR/SI+AP: GUARANTEEING HIGH-BANDWIDTH CONNECTIONS IN A DYNAMIC DISTRIBUTED NETWORK: DEVELOPMENT OF A DEPLOYABLE, AUTHENTICATED QoS

 

 

INVESTIGATORS

 

William A. Adamson, Research Investigator, Information Technology Division, Assistant Director, Center for Information Technology Integration, University of Michigan

 

Brian Athey, Assistant Professor of Biology, Director, University of Michigan Visible Human Project

 

Steve Corbató, Director, Backbone Network, University Corporation for Advanced Internet Development

 

Farnam Jahanian, Associate Professor of Electrical Engineering and Computer Science, Director of Software Systems Laboratory

 

Shawn P. McKee, Assistant Research Scientist, Department of Physics, University of Michigan

 

Eric Myers, Research Physicist, Department of Physics, University of Michigan

 

Homer A. Neal, Professor of Physics, Director, UM-ATLAS Collaboratory Project, University of Michigan

 

Jeffrey C. Ogden, UM Internet2 Coordinator, Associate Director, Merit Network

 

Kang G. Shin, Professor of Electrical Engineering and Computer Science, Director, Real-Time Computing Laboratory, University of Michigan

 

Victor K. Wong, Academic Liaison and Director, Information Technology for Research, University of Michigan

 

 

PARTICIPATING INSTITUTIONS

 

University of Michigan

University Corporation for Advanced Internet Development

Merit Network, Inc.

 

EXECUTIVE SUMMARY

 

We propose to develop a secured, dynamic Quality of Service (QoS) computer network in a production environment. This has never been done before. We will focus our efforts mainly on implementing QoS at end-points and at the gigaPoP level, but with a view towards using QoS over intermediate Internet2 QoS domains when that becomes possible. This proposal builds on current work, seed-funded by several institutions, to develop a working QoS channel between researchers at two end-points, specifically the University of Michigan (U-M) Physics Department in Ann Arbor and the European Laboratory for Particle Physics (CERN) in Geneva, Switzerland. We will do this by 1) implementing a system of bandwidth brokers which use distributed authentication and authorization to dynamically control differentiated services at the router level; 2) implementing additional intermediate QoS domains and/or additional end-points; 3) studying this network to further understand the requirements for a scalable authorization policy and to understand the effects of network topology on performance; and 4) testing the network in real-world situations against practical research applications that require QoS. We expect our proposed QoS network to serve as a deployable template for other research and education communities that need to make use of QoS. Our larger goal is to enable a whole series of collaborative research applications that require QoS for optimal performance, such as high quality interactive video streaming and large scale distributed computing. We will test our QoS network with several such applications, including those of direct use to the current research efforts of the ATLAS project – one of the world’s largest scientific collaborations – and the NGI Visible Human project. We will also test our work with a variety of network monitoring and diagnostic tools.

 

BACKGROUND: The need for QoS

 

Many research tasks of current and future interest require the transfer of large amounts of data across computer networks in a timely fashion. The Internet as it is presently constructed is not up to this task. Today’s Internet provides only a “best effort” data delivery service where all network traffic receives the same priority, whether it contains time-sensitive medical data or is just part of a popular music file. When network traffic exceeds the capacity of some network segments to carry all of the traffic, the network responds to the congestion by dropping some packets, which must then be resent. Network congestion is usually not serious enough to keep the resent packets from eventually reaching their final destination, but the need to re-send the packets causes delays. Well-behaved network applications are expected to slow their rate of data transmission when they re-send packets, and thus adapt to the network resources that are available. This allows large numbers of users and applications to share the network, but the speed of delivery and the amount of network bandwidth available to a specific user and application varies. The variability in network performance is beyond the control of any single user or application.
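The adaptive behavior described above can be illustrated with a small simulation. This is only a sketch, assuming an additive-increase, multiplicative-decrease rule of the kind used by TCP congestion control; the function name, the parameter values, and the single-bottleneck model are illustrative assumptions and are not part of this proposal.

```python
def aimd_rates(capacity, rounds, start=1.0, increase=1.0, decrease=0.5):
    """Return the sending rate used in each round by a simple AIMD sender.

    capacity is the (assumed) bottleneck capacity; a round in which the
    sender exceeds it is treated as a round with packet loss.
    """
    rate = start
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            # Loss detected: packets were dropped, so back off sharply.
            rate = rate * decrease
        else:
            # No loss: probe gently for more bandwidth.
            rate = rate + increase
    return history


rates = aimd_rates(capacity=10.0, rounds=30)
print(rates)
```

The resulting sawtooth pattern shows why delivered bandwidth varies from moment to moment even when a single, well-behaved sender uses the link.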

The variable performance provided by the Internet’s “best effort” delivery service works well for many network applications, but not for others. Examples of demanding applications that do not work well on today’s Internet include high quality interactive video, the real-time control of remote scientific instruments, scientific visualization where the computational function and the display function are remote from each other, and distributed “grid” computations where the use of remote computational and storage elements must be tightly coordinated in order to perform the computation efficiently.

Two approaches are being pursued to overcome the limitations of “best effort” delivery. The first and so far the most common approach is to build specialized portions of the network that are capable of very high performance and which are open to relatively few users. This approach over-provisions portions of the network in the hope that there will be little or no network congestion, and hence no need to re-send packets, and no need for applications to slow the data rate at which they transmit to the network, resulting in good network performance with little variation due to external factors.

The other approach is to implement different levels of network service that may be requested by an application, so that important or time-sensitive traffic is given preference over other traffic, much like first-class mail is given preference over parcel post. With this approach there is no longer a single best effort service. Instead, data packets are marked to indicate the service level they require, and capable networks give the packets different priorities. The general name for such differentiated service is “Quality of Service” (QoS). These two approaches, over-provisioning and QoS, are not mutually exclusive. In fact, it seems likely that significant progress will require that both approaches be pursued.
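As a rough illustration of packet marking, the sketch below computes the IP header byte that carries a Differentiated Services code point (DSCP) and asks the operating system to apply it to a socket. The choice of the Expedited Forwarding code point (46) is an assumption made for the example; this proposal does not specify which code points would be used, and the helper names are hypothetical. The IP_TOS socket option is not available on every platform.

```python
import socket

# Illustrative code points; the values actually used in a deployment
# would be set by network policy and are assumptions here.
DSCP_BEST_EFFORT = 0   # default forwarding
DSCP_EXPEDITED = 46    # Expedited Forwarding, a low-latency class


def tos_byte(dscp):
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def mark_socket(sock, dscp):
    """Request that outgoing packets on this socket carry the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))


print(hex(tos_byte(DSCP_EXPEDITED)))  # 0xb8
```

Marking at the end host is only a request. Routers along the path decide whether to honor the marking, which is why control over who may mark traffic matters.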

A particular example of the need for QoS is the ATLAS project [1], which involves almost 2000 physicists around the world working to design, construct, test, and operate a particle physics detector for CERN’s Large Hadron Collider (LHC). ATLAS participants need guaranteed high bandwidth and low latency now for interactive collaboration (e.g., high quality video conferencing) and for testing and implementing distributed grid computing. As the ATLAS detector becomes operational it will generate, on average, several terabytes of data per day (on the order of a petabyte per year), and distributed grid computing is seen as possibly the only way that meaningful physics can be extracted by dispersed researchers from such a large volume of data. ATLAS researchers, both at Michigan and elsewhere, are beginning initial work to implement both QoS and distributed grid computing.
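The data volumes quoted above can be checked with simple arithmetic. The sketch below assumes 3 terabytes per day as a stand-in for "several" and uses decimal units; both are assumptions made for illustration, not ATLAS figures.

```python
TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte
SECONDS_PER_DAY = 86_400


def yearly_volume_pb(tb_per_day):
    """Petabytes accumulated over a 365-day year at a steady daily rate."""
    return tb_per_day * TB * 365 / PB


def sustained_mbps(tb_per_day):
    """Average megabits per second needed to move one day's data in a day."""
    return tb_per_day * TB * 8 / SECONDS_PER_DAY / 10**6


print(yearly_volume_pb(3))        # about 1.1 PB per year
print(round(sustained_mbps(3)))   # about 278 Mbps, sustained around the clock
```

Under these assumptions, moving a single copy of the data requires several hundred megabits per second sustained continuously, before any allowance for peaks, retransmission, or multiple destinations.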

Guaranteed high bandwidth and low latency will also be important in medical and life sciences applications. An immediate example is provided by the Next Generation Internet (NGI) Visible Human (VH) Project [2]. Its aim is to develop NGI systems to serve visible human datasets in novel and educationally useful ways. These include a comprehensive set of interactive 2D and 3D VH browsers featuring arbitrary 2D cutting and 3D visualizations as well as an interactive web navigation engine to create and visualize anatomic flythroughs under the haptic control of the users. This will allow for delivery of several simultaneous high quality digital streams, enabled by the QoS network system, creating structured medical knowledge using the VH datasets.

Although the idea behind QoS is simple, implementing it in practice in a real-world network environment has turned out to be a daunting task. The ability to create a dedicated preferred channel has only recently been demonstrated [3], but a number of other issues remain. In a dynamic environment bandwidth allocations would have to be created and released as needed, by making a request to a “Bandwidth Broker” (BB), which is a program that controls the differentiated service of routers. Separate bandwidth brokers will control different QoS domains, so some means of coordinating bandwidth allocations across separate QoS domains is needed. Abuse of preferred services could cause serious problems, so authentication and authorization to and between Bandwidth Brokers will be an important component of a practical QoS system.
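The roles described above can be sketched as a toy model: each broker admits or refuses reservations within its own domain, and an end-to-end request succeeds only if every domain on the path grants it. The class and function names are hypothetical and do not reflect the interface of any existing bandwidth broker software; authentication and authorization are omitted here.

```python
class BandwidthBroker:
    """Toy admission control for a single QoS domain (illustrative only)."""

    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.reservations = {}  # reservation id -> reserved Mbps
        self._next_id = 1

    def available(self):
        return self.capacity_mbps - sum(self.reservations.values())

    def reserve(self, mbps):
        """Grant a reservation id, or None if the domain lacks capacity."""
        if mbps <= 0 or mbps > self.available():
            return None
        rid = self._next_id
        self._next_id += 1
        self.reservations[rid] = mbps
        return rid

    def release(self, rid):
        """Free a reservation; returns True if it existed."""
        return self.reservations.pop(rid, None) is not None


def reserve_end_to_end(brokers, mbps):
    """Reserve in every domain on the path, or roll back and return None."""
    granted = []
    for broker in brokers:
        rid = broker.reserve(mbps)
        if rid is None:
            # One domain refused: undo the reservations already made.
            for earlier_broker, earlier_rid in granted:
                earlier_broker.release(earlier_rid)
            return None
        granted.append((broker, rid))
    return granted


campus = BandwidthBroker(capacity_mbps=100)
backbone = BandwidthBroker(capacity_mbps=40)
first = reserve_end_to_end([campus, backbone], 30)   # fits in both domains
second = reserve_end_to_end([campus, backbone], 30)  # backbone has only 10 left
```

The rollback step is the simplest possible form of cross-domain coordination; a real system would also need to handle brokers that fail or time out partway through a request.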

Several other hurdles exist. One is the fact that it may not be possible to have any control over intermediate network segments, which are often commercial commodity networks and which may not support QoS. One solution to this problem is the creation of separate over-provisioned intermediate network segments, such as the Internet2 Abilene network. The biggest problems with QoS, however, are expected to be at the end-point networks, roughly at the level of either a campus network or a gigaPoP. The effects of adding QoS flows to existing network traffic are presently unknown. One cannot hope to tune a network to use QoS effectively until the effects of QoS on existing networks can be measured and understood.

 

PRESENT WORK: Authenticated QoS Signaling

 

Because of the importance of QoS, work on implementing QoS is already underway at the University of Michigan, with seed funding and participation from several U-M sources: the Physics Department, the Center for Information Technology Integration (CITI), the Office of the Vice President for Research (OVPR), the Office of the Chief Information Officer (CIO), and the College of Literature, Science and the Arts (LS&A). Support is also provided by the University Corporation for Advanced Internet Development (UCAID), Merit Network, Inc., the European Laboratory for Particle Physics (CERN), the NGI Visible Human project, and the Argonne National Laboratory.

This non-NSF funded project focuses just on the problems of Authenticated QoS Signaling [4]. In a dynamic network environment, QoS requires signaling to bandwidth brokers to make and release bandwidth reservations. To avoid abuse, these signals must be authenticated, and the bandwidth broker must verify the authorization of the authenticated signals. The system we are constructing uses the Akenti Access Control System [5] to provide Public Key (PK) based authorization decisions, the Lightweight Directory Access Protocol (LDAP) for storing and retrieving directory data (such as PK certificates, user, resource or Akenti authorization data), and the Globus Architecture for Reservation and Allocation (GARA) bandwidth broker software from Argonne National Laboratory.
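The two checks described above, authentication of the signal followed by authorization of the authenticated requester, can be sketched as follows. This is a simplified illustration and not the design of the system under construction: shared-key message authentication codes stand in for public-key certificates, and an in-memory dictionary stands in for Akenti policy decisions and LDAP lookups. All names, keys, and limits are hypothetical.

```python
import hashlib
import hmac

# Hypothetical stand-ins for certificate and policy stores.
USER_KEYS = {"alice": b"alice-secret"}
MAX_MBPS = {"alice": 50}


def sign(user, request, key):
    """Produce a keyed digest binding the user name to the request text."""
    message = f"{user}:{request}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def authenticate(user, request, signature):
    """Check that the signal really comes from the named user."""
    key = USER_KEYS.get(user)
    if key is None:
        return False
    return hmac.compare_digest(sign(user, request, key), signature)


def authorize(user, mbps):
    """Check that this user is allowed to reserve this much bandwidth."""
    return mbps <= MAX_MBPS.get(user, 0)


def handle_signal(user, mbps, signature):
    """Apply both checks, in order, to a reservation request."""
    request = f"reserve {mbps}"
    if not authenticate(user, request, signature):
        return "rejected: not authenticated"
    if not authorize(user, mbps):
        return "rejected: not authorized"
    return "accepted"


good = sign("alice", "reserve 20", b"alice-secret")
print(handle_signal("alice", 20, good))
```

Keeping the two checks separate mirrors the division of labor in the text: one component establishes who is asking, and another decides what that requester may do.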

The Internet2 Middleware Working Group is developing an LDAP schema to provide a common directory namespace for Internet2 organizations. This effort has so far produced a user schema, called eduPerson. One goal of our present project is to use the Internet2 Middleware Working Group’s directory definitions and provide some real-world feedback to the group.

GARA is actually a part of ANL’s Globus system for grid computing [6], so the system will be tested using Globus as one of the underlying applications requiring QoS. Because the University of Michigan relies heavily on Kerberos for authentication and authorization, an important additional component is the KX509 software from U-M’s Center for Information Technology Integration (CITI), which creates and signs short-term PK certificates based on valid Kerberos authentication of the requester, thus joining the Kerberos and PK systems. Integration of KX509 with GARA will result in the ability to use Kerberos as a bandwidth broker authentication method.

Although this project has only just started, significant progress has already been made. Basic services are in place. The Akenti software has been compiled and tested and is running in its base form. CITI’s LDAP service is currently being configured with the Internet2 eduPerson schema. The Globus software has been compiled and tested and is now running. GARA, the Globus bandwidth broker component, has been compiled and is currently being tested. A router has been made available for the project. Once the router hardware has been upgraded and the bandwidth broker is up and running, we will begin to reserve bandwidth with GARA in its current form. KX509, Akenti and LDAP integration will follow.

Solving the problem of authenticated QoS authorization is an important step toward a fully operational QoS network, though it is only one of many steps toward that goal.


NEXT CHALLENGES: A secured, dynamic QoS that works end-to-end

 

This ITR grant will enable us to tackle the next set of problems that must be overcome in order to create a practical QoS system. Our next goal is to move from a simple demonstration of QoS signaling between two specific end-points toward a production QoS deployment by developing the tools needed to realize end-to-end QoS for diverse, and ultimately any, end-points. Once this has been accomplished we will test the network using practical research applications that require QoS. In particular, we propose to meet the following challenges:

I)   Analyze and understand how QoS router configurations affect a working production network. Installing and removing configurations can lead to combinations of configuration features that lead to undesirable network behavior. These configurations include:

II)   Significantly extend the simple end-point only QoS domain model by:

III)   In our present work, a minimal set of Akenti attributes is used to describe a simple authorization policy. In a production system, the policies will be much more complex. While Akenti is designed to accommodate such complexity, we will:

IV) Analyze and understand the effects of network topology on performance. This includes:

V)   Train first ourselves and then additional network engineers, departmental network administrators, application developers, and end users in what is required to develop, deploy, operate, troubleshoot, and tune high performance networking and QoS solutions.

 

VI)   Test the performance of the network in real-world situations against practical research applications that require QoS, particularly those that will be of direct use to the ATLAS project and the Visible Human project. We expect these to include (but not be limited to):

There are a number of reasons why the University of Michigan is an ideal choice for this project. There is strong interest at Michigan in making QoS work, as evinced by the fact that work on QoS has already started. Multiple organizations are committed to the project, both in and outside of the University. These organizations have strong track records in software development and advanced network design and operation, including the operation of production networks. Through our partners we have access to the underlying network infrastructure within departments at U-M, the U-M campus network, the Michigan gigaPoP, Internet2/Abilene, STARTAP, and CERN. We have several real-world applications that require QoS, and a strong incentive to get those applications working. We anticipate the participation of students from our Research Experience for Undergraduates at CERN program, which will involve them in both the latest research in particle physics and the newest developments in computer networking. This project will support real science and education by putting applications that use QoS in the hands of end users through our work with the ATLAS and Visible Human projects.

 
