MATC: Michigan ATLAS Tier 2 Center
A Proposal to Establish a
Research and Development Oriented
Prototype Tier 2 Computing Center
at the University of Michigan
Project Director
William R. Martin
Nuclear Engineering and Radiological Sciences
Co-Principal Investigators
Abhijit Bose
Center for Parallel Computing
Thomas Hacker
Center for Parallel Computing
Shawn McKee
Department of Physics
Homer A. Neal
Department of Physics
February 28, 2001
University of Michigan
Ann Arbor, MI 48109
I. Introduction
The University of Michigan (UM) ATLAS group hereby submits a proposal for a
U.S. ATLAS Tier 2 Computing Center that leverages existing activities and
resources at the UM and modest external funds to create a powerful entity in
support of the goals of the ATLAS Project. The proposed effort will be carried
out by a very strong UM team that covers the spectrum of physics analysis and
grid computing, from network analysis and measurement to support of
data-intensive computing and support of specific applications such as ATLAS
simulation and reconstruction. Moreover, this proposal takes advantage of
extraordinary current UM facilities and resources, including massively parallel
computing systems, data-intensive systems including robot tape storage,
support from expert systems support staff who are active in grid work, and
expert user consultant staff who are knowledgeable in nearly all areas of
advanced computing including data-intensive computing.
Section II describes the prototype Tier 2 Center we propose to establish at
Michigan. Section III summarizes the current activities in advanced computing
that are underway at Michigan, including grid computing, parallel computing,
data-intensive computing, and network analysis and tuning, including QoS.
Section IV continues this discussion of current UM efforts by describing the
ongoing ATLAS-related computing activities at Michigan. Section V delineates
the specific personnel that will contribute to the Tier 2 Center, both directly
and indirectly. These two sections indicate the extent to which the prototype
Tier 2 Center will complement and leverage existing UM activities and UM
resources that will play an important role in the operation of the Tier 2
Center. Section VI presents the proposed budget for the prototype Tier 2 Center
and Section VII discusses stated criteria for the prototype Tier 2 Centers and
the UM resources and activities that respond to these criteria.
II. Prototype Tier 2 Center
Introduction
This section describes the overall goals and objectives of the prototype Tier 2
Center along with a description of its resources, management, and activities
over the next two years. Subsequent sections include a proposed budget for two
years of operation and a discussion of the current activities and resources at the
UM which will contribute to the Tier 2 Center.
Overall Goals for the ATLAS Prototype Tier 2 Centers
The overarching goals of the ATLAS Tier 2 Centers are to:
Objectives for the UM Prototype Tier 2 Center
In support of these overall goals, we have defined the following specific
objectives for our proposed Tier 2 Center:
We feel that if we are successful in carrying out these objectives, the
overarching goals of the ATLAS Tier 2 initiative will be met. Carrying out
these objectives will involve establishing and operating a robust
grid-computing environment that will offer access to UM compute-intensive and
data-intensive resources to users on and off campus. Much of this is already in
place and the proposed Tier 2 Center will involve consolidation and focusing
of current activities, along with modest additional investments in key areas,
into a coherent and flexible center that will carry out its role as a prototype
Tier 2 Center.
Major Tasks of the Center
To meet the objectives outlined above, we propose to undertake the following
major tasks:
Establish the Testbed Facility. This task includes the completion of the
Pentium III cluster as well as the
integration of the mass store system with the cluster. These are described as
separate tasks below. The mass storage system is already fully integrated with
the SP2/SP3 system. This effort will be completed within six months of project
inception.
Complete the Cluster. In partnership with several other faculty groups on
campus, we propose the investment of project funds and current staff resources
to finish the 128-CPU Pentium III cluster, which is nearly complete as of the
date of this proposal.
This investment will allow the Tier 2 Center to access a significant share
(about 16 nodes) of the cluster at any time as well as access to a major
fraction of the cluster on occasion for development or demonstration purposes.
It will be possible using queuing priorities to configure a portion of the
cluster (e.g., 16 nodes) for specific needs and ready access by the Tier 2 user
community.
Integrate the Mass Store with the Cluster. We will install equipment as needed
to allow fast access to the mass storage
system from the cluster and will install the necessary software on the cluster
to allow the integrated system to meet the needs of the Tier 2 user community.
The CPC consulting staff will assist the Tier 2 users in using the system and
the CPC systems staff will assist in system development and configuration to
allow efficient and convenient use of these systems. The CPC plans to link
together its mass storage system, AFS, and Globus to provide a uniform global
file system that can be used locally to manage experimental data, which can
then be processed with grid resources. In addition to the large mass storage
system, a
640 GB high speed RAID storage system will be available to the nodes on the
cluster. The 640 GB high speed storage, in combination with the 42 TB mass
storage system, will allow ATLAS users to prefetch and stage datasets from the
mass storage system into the RAID on an as-needed basis. Results of computation
can also be stored on the high speed RAID for later migration to the mass
storage system. This integrated pyramid approach to storage will provide ATLAS
users with both high speed and large scale access to the datasets generated
from experiments.
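The staging behavior described above can be illustrated with a small sketch. It is not the CPC implementation: the class name, the dataset names and sizes, and the least-recently-used eviction policy are all assumptions chosen for illustration. Only the 640 GB capacity of the fast tier comes from the text.

```python
from collections import OrderedDict


class StagingCache:
    """Toy model of a fast disk tier in front of a large mass store.

    Hypothetical illustration only; the eviction policy (least recently
    used) is an assumption, not something stated in the proposal.
    """

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        # dataset name -> size in GB, ordered from least to most recently used
        self.resident = OrderedDict()

    def used_gb(self):
        return sum(self.resident.values())

    def stage(self, name, size_gb):
        """Bring a dataset onto the fast tier; return names evicted to make room."""
        if size_gb > self.capacity_gb:
            raise ValueError("dataset larger than the fast tier")
        if name in self.resident:
            self.resident.move_to_end(name)  # already staged: mark as recently used
            return []
        evicted = []
        while self.used_gb() + size_gb > self.capacity_gb:
            old_name, _ = self.resident.popitem(last=False)  # drop least recently used
            evicted.append(old_name)
        self.resident[name] = size_gb
        return evicted


cache = StagingCache(capacity_gb=640)   # 640 GB RAID, as quoted in the text
cache.stage("run_a", 300)               # hypothetical dataset names and sizes
cache.stage("run_b", 300)
evicted = cache.stage("run_c", 100)     # does not fit, so the oldest dataset leaves
print(evicted)
```

In a real deployment the evicted dataset would remain safely on the mass storage system, so eviction from the RAID only costs a later re-stage.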
Establish the User Testbed and Suite of Prototypical Services. This task will
involve the close collaboration of the ATLAS Integration and Systems Support
teams (these teams are defined under Structure of the Tier 2 Center below) to
ensure that the user demands for the prototypical Tier 2
Center are representative of the demands expected for the permanent Tier 2
Centers. This effort will also include an examination of the feasibility of
offering enhanced services to properly authenticated users who have a need for
specific resources of interest to ATLAS, including expedited processing or
priority access to scarce resources (e.g., access to a large data archive). This
task should be completed within a year of project inception.
Assess Performance. We plan to assess how well the Tier 2 Center delivers
compute and data services for a broad spectrum of ATLAS needs. Users will be
able to access two distinct systems -- the integrated cluster/mass store system
or the integrated SP/mass store system -- and we plan to assess the performance
of both alternatives for providing services to the ATLAS community. If possible,
we
would like to compare and contrast the cost and performance tradeoffs of our
Tivoli-based (ADSM) data management system with the HPSS-based system that is
present at the prototype Tier 2 Center at Indiana University (IU). This task
will include results from the Mock Data Challenge 2 task described next.
To provide guidance to future Tier 2 sites in the selection of appropriate
hardware and software solutions to support the ATLAS effort, we will assess the
relative performance of various commercial and non-commercial hardware and
software components. We will develop a taxonomy of critical factors that
directly affect the ability of the computer systems and networks to deliver
expected performance, and will use that taxonomy to assess the contributions of
various software, hardware, and network components to the performance of ATLAS
applications. The systems to be analyzed will include the IBM SP3 and SP2
systems, the Intel Pentium III cluster, and potentially other hardware
resources available within the NPACI partnership. A final report based upon
these measurements will set out a specific set of requirements and
recommendations.
Mock Data Challenge 2. One of the prototype center's primary responsibilities
will be to take part in the ATLAS Mock Data Challenge 2 (MDC2) starting in
January of 2003. MDC2 will
test the ATLAS computing model with special emphasis on the grid component.
While the anticipated data size will be ~10% of a normal high luminosity year
(around 100 TB), the number of interconnects (computers) will be more like 50%.
Thus the viability and performance of the grid will be realistically tested.
The analysis load imposed by 100 million events will require something like
1000 machines of 30 Si95 power, running for 10 million CPU-seconds. Multiple
Tier 1 centers will be involved, but, because of our large involvement in
ATLAS, we anticipate the US will carry a significant fraction of the analysis
load (~25%). This implies the Tier 1 center and the 2 prototype Tier 2 centers
will have to provide significant resources to carry out a successful MDC2 in
the US. The hardware resources available at a prototype Tier 2 center at
Michigan will be more than sufficient for MDC2. We will have a dedicated 16-node
component of the CPC's cluster available full time for the analysis. Local
disk storage will be built up to 1 TB with the ADSM system providing near-line
storage of up to 20 TB. Additional computing resources will also be available
from the CPC cluster, though not dedicated for the ATLAS MDC2. Our network
connectivity will be at least at the OC12 level to each of the 16 nodes in the
cluster and our global network connectivity to Internet2 could be at the OC48
level or greater, by the time of MDC2. We plan to hire a second Tier 2 FTE
during the second year of the prototype center whose primary responsibility
will be to plan for and carry out MDC2. We envision hiring someone with broad
knowledge in both hardware and software who can prepare for and oversee the
MDC2 activities at our Tier 2 center. This person would additionally serve as
another software/user support person within the center, albeit one who is
especially knowledgeable about ATLAS software.
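As a rough cross-check of the MDC2 figures quoted above, the arithmetic can be written out as a short sketch. It assumes that the 100 TB figure is the MDC2 data volume, that units are decimal (1 TB = 1,000,000 MB), and that the US share of about 25% applies uniformly to the aggregate processing power; the variable names are illustrative only.

```python
# Figures quoted in the text above.
total_data_tb = 100          # "around 100 TB"
total_events = 100_000_000   # "100 million events"
machines = 1000              # "1000 machines"
si95_per_machine = 30        # "30 Si95" per machine
us_share = 0.25              # "~25%" of the analysis load carried by the US

# Assumption: decimal units, 1 TB = 1,000,000 MB.
mb_per_event = total_data_tb * 1_000_000 / total_events

# Aggregate processing power implied by the quoted machine count.
aggregate_si95 = machines * si95_per_machine

# Assumption: the US share applies uniformly to the aggregate power.
us_si95 = aggregate_si95 * us_share
us_machine_equivalents = us_si95 / si95_per_machine

print(mb_per_event, aggregate_si95, us_si95, us_machine_equivalents)
```

Under these assumptions the average event size is 1 MB, and the US share corresponds to the equivalent of 250 machines of the quoted rating, spread across the Tier 1 center and the two prototype Tier 2 centers.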
User Support and Consulting.
It is imperative that a project of this magnitude have a very strong user
support component to help project personnel use the ATLAS Grid and port
existing applications to the Grid architecture. Once the Grid is
operational, there will be integration issues among the various components that
make up the Grid (such as Globus, data storage, application interfaces that
are used to access network resources, etc.) for both application developers
and users. The staff at CPC bring valuable experience in this area, having
served since 1998 as one of the four national sites for the NPACI Consulting
and Strategic Applications Collaborations (SAC) activities. Randy
Crawford and Abhijit Bose have been handling the day-to-day consulting and
support for the national users of NPACI along with staff from UCSD, the
University of Texas, and Caltech. Our experience with remote consulting (via
the Remedy system) and distance education can be applied and duplicated across
the participating institutions of the ATLAS project.
Collaboration with other US ATLAS Sites. We are aware that several other sites
within US ATLAS, in addition to the
Tier-1 Center, have considerable strengths and much to offer in achieving the
overall goals stated for US ATLAS computing. We already have good working
relations with the key individuals at the Indiana Tier-2 Center, a Center for
which we would offer complementary strengths, and would make extraordinary
efforts to involve colleagues at the other US ATLAS sites in our activities as
well. This could include their full participation in the planning of Center
activities and involvement in the various test bed projects.
Structure of the Tier 2 Center
The Center will be organized around the following research and development
areas:
Management of Center
The Center will have a Director and leaders for each of the above areas. An
Executive Committee consisting of the area leaders, three faculty, and the
Director will make decisions regarding future directions and allocation of
resources to carry out the goals and objectives of the Center. The Director
will be W. Martin, who will provide 20% of his time during the calendar year in
support of this initiative, and we propose that half (10%) be provided by the
Tier 2 grant and half by the UM. Homer Neal and the area leaders (A. Bose, T.
Hacker, and S. McKee) are co-principal investigators on this grant and they
will provide their time and services to the Tier 2 Center in support of their
roles on the Executive Committee and in managing and directing activities that
support the Tier 2 Center. No new funding is being requested for the
co-principal investigators as their efforts will be covered by other UM
sources. The estimated efforts of the project director and co-principal
investigators are summarized in Section V below.
Computing Resources
The Center computing resources can be divided into resources needed to carry
out its research and development role versus resources needed to allow the
Center to be prototypical of a planned Tier 2 Center. Access to development
resources is needed to allow our system support and user support staff to
develop and implement the infrastructure to allow the grid-computing
environment to work in the context of the Michigan facilities. Access to
demonstration resources is needed to ensure that realistic services are
provided in response to actual user demands; however, we anticipate modest
production usage of the Center resources, only enough to quantify the impact of
such usage on the overall computing environment. Both types of resources will
entail sporadic, incidental, and usually short duration access to the primary
CPC resources (computing and data) but development access will also require
ancillary equipment such as a high end workstation and visualization equipment
and perhaps network equipment that can be essentially dedicated for MATC
activities. In addition, we propose to make extensive use of the 128-CPU Intel
Pentium III cluster that is currently being built. We propose to invest modest
resources
(~ $40,000) in this facility to enhance it for the purpose of our proposed
center, and the remaining funds (~ $40,000) will be used just prior to MDC2 to
enable us to meet the hardware needs of the challenge.
A major component of our proposed effort is the assessment and comparison of
two different approaches to providing the compute cycles for the Tier 2 Center,
namely via the IBM SP2/SP3 facility and via a locally-developed and maintained
cluster. The UM will provide sufficient access to its current SP2/SP3
facilities to allow this evaluation and assessment. This access is enabled by
the recent decision by the UM's College of Literature, Science, and the
Arts (LSA) to invest $200,000 into the current CPC facilities, which will
guarantee access to a substantial portion of the SP facility, including some
production usage as needed during the MDC2 campaign.
The cluster is being built by a combination of funds from several faculty
groups at the UM and the proposed investment of additional funds as a part of
this initiative will provide the Tier 2 Center with guaranteed access to a
portion of the cluster as well as occasional access to the entire cluster, thus
providing access to a substantial computing capability as well as computing
capacity.
Staff Resources
The Tier 2 Center staff resources will consist of professional computing staff
who will engage in research, system development, and user support functions. In
addition, physicists on staff who are actively engaged in ATLAS activities will
provide crucial advice and oversight regarding the services the Center needs to
provide, and will play the key role of testing and evaluating the integrated
computing infrastructure in the Tier 2 Center for ATLAS-related activities.
Ongoing activities in these
areas by current faculty and staff will allow us to fulfill the objectives of
our Center with a minimum of additional staff. A list of participants and their
estimated efforts, both directly contributing to the Tier 2 Center as well as
related efforts that indirectly contribute, is provided in Section V.
As detailed in Section V, ongoing activities of approximately 0.7
FTEs on the CPC staff are directly related to the current project in the key
areas of grid computing, networking, and operating and maintaining the
compute-intensive and data-intensive facilities. Complementing the CPC effort
is an additional 1.6 FTEs (direct participation) from the ATLAS group, who will
play a key role with the research and development groups to create, deploy,
and assess the suite of prototypical services for the ATLAS users.
Therefore, given the extensive ongoing activities that will contribute heavily
to meeting the objectives of our proposed center, we will only need to hire one
additional full-time staff during the first year of the prototype Tier 2
Center. This person would ideally be a systems support person with experience
in grid computing, data-intensive computing, and networking, who could provide
support to the Research and ATLAS Integration areas as well as the Systems
Support area. This would not be a senior person but someone who would report to
T. Hacker in his role as head of the Systems Support area.
During year 2, we propose to hire a second systems support person. This person
would ideally be a more senior person who could take over the operational
responsibilities of the Tier 2 Center as it grows and its activities expand to
meet the increased needs of the user community. One of the primary
responsibilities of this new hire would be to prepare for and coordinate the
Mock Data Challenge 2 effort for our Tier 2 prototype center.
III. Advanced Computing Activities at Michigan
The University of Michigan has a long history in the development and operation
of state-of-the-art networking and computing facilities. Indeed, it was the
NSFnet facility operated by Merit, Inc. and the University of Michigan that was
the first network management system established to support the burgeoning
needs of U.S. research universities. Today, the headquarters for UCAID
(Internet2) are located in Ann Arbor and that consortium is led by an
individual who is a member of the UM faculty.
We will not provide an extensive overview of these entities, but will make a
few comments about some of the existing campus computing structures and
capabilities that would play key roles in the support and operation of the
proposed Tier 2 Center.
Center for Parallel Computing
The Center for Parallel Computing (CPC) was established in 1992 as a result of
a major infrastructure grant from the NSF and substantial funding from Ford
Motor Company. The CPC operates a suite of parallel computing systems and an
integrated mass storage system, both for the UM user community as well as the
national user community affiliated with the NPACI partnership, which is
described in more detail later. The resources and activities of CPC are
described in a following section.
[Ref: http://www.engin.umich.edu/labs/cpc]
Laboratory for Scientific Computation
The Laboratory for Scientific Computation (LaSC) is an interdisciplinary unit
with the primary goal to foster and promote the use of advanced computational
methods in research and instructional activities at the UM. LaSC provides
consultation and assistance in scientific computing, numerical methods, and
parallel computing, both for research and instruction. In addition, LaSC
administers the Doctoral Degree Program in Scientific Computing and coordinates
the Department of Energy (DoE) Computational Fellowship Program at the UM.
Parallel and Cluster Computing at Michigan
The UM has a strong and extensive record in parallel computing, beginning 15
years ago with the NCUBE-64 hypercube parallel processor and continuing to
today's large IBM SP2 and SP3 systems. In the intervening years, the UM
obtained and operated many diverse parallel systems, including KSR-1, KSR-2,
Convex Exemplar, SGI PowerChallenge, and SGI Origin systems. The parallel
systems currently operated by CPC include the following:
The IBM SP2/SP3 systems will be designated SP systems in the
following sections. We propose to make extensive use of the SP systems and the
Pentium III cluster in the Tier 2 Center, as will be discussed in detail in the
following sections.
Data-intensive Computing at Michigan
There are several data-intensive computing systems and initiatives under way at
the UM that will be leveraged for the ATLAS project. The CPC operates a 42 TB
mass storage system based on Tivoli that includes 2 TB of disk. The mass
storage space is made available to local and national researchers via NFS and
Storage Resource Broker (SRB), a national scale data system developed by the
lead site of the NPACI partnership, San Diego Supercomputer Center. As a major
participant in the Data Intensive Computing thrust area in NPACI, members of
the CPC staff (Hacker, Mach, Britt) are involved in a distributed development
effort with staff at SDSC to expand the number of platforms and storage systems
that SRB can interact with to build a national scale distributed storage
system. In addition to the NPACI SRB activities, the CPC, in conjunction with
the ATLAS project, will work with SRB development staff at SDSC and within
NPACI to investigate the use of the SRB system and NPACI data intensive
computing resources to store and deliver ATLAS datasets on a national and
international scale. In parallel with the SRB work, the CPC (Hacker, Mach,
Britt) is putting into production a multi-resident AFS server (MR-AFS) that
will make the mass storage system available to the worldwide research community
via the AFS file system. In addition, the CPC will be deploying a 640 GB Linux
RAID to support data-intensive computing activities for the Intel cluster
described in this proposal.
The CPC recently expanded the software base of the San Diego Supercomputer
Center Storage Resource Broker by adding support within SRB for ODBC-compliant
databases, in particular Sybase. This extended SRB's existing functionality of
using ODBC primitives that can support both ODBC databases and IBM's DB2 at the
same time. The expertise of CPC staff in handling terascale levels of data as
well as developing core technologies supporting data-intensive computing will
be valuable to the ATLAS project because of the head-start we have gained in
these areas. [Ref: http://www.npaci.edu/DICE/SRB/index.html]
The UM's Center for Information Technology Integration (CITI) is
developing several storage technologies that will be utilized at the CPC in
support of NPACI and ATLAS computing activities. CITI, with the support of a
grant from Sun, is developing NFS version 4, which is designed for wide area
networks and provides strong authentication and authorization mechanisms.
[Ref: http://www.citi.umich.edu/projects/nfsv4]
Additionally, with the support of the Naval Research Laboratory, CITI has
developed a version of AFS for very high speed networks that operates natively
on high-speed ATM networks. CPC is currently investigating the integration of
these new technologies into the data-intensive computing support
infrastructure.
[Ref: Antonelli, C. AFS for Very High Speed Networks, http://www.citi.umich.edu/projects/vafs]
The production systems operated by CPC, in conjunction with the research and
development activities of CPC and CITI, provide a world-class data
infrastructure base for ATLAS computing activities to be supported at the
proposed Tier 2 Center.
Staff Resources for Computing at Michigan
The CPC facilities are operated and maintained by three systems support staff
and three user support staff. The systems support staff are associated with
CAEN, the College of Engineering Computer-Aided Engineering Network, a
comprehensive organization that operates and maintains a system of nearly 2000
workstations and supporting network and computing infrastructure. The user
support staff is associated with LaSC, and together with the systems support
staff, assist in the support of the CPC facilities. For convenience, all of
these staff will be described as CPC staff for the purpose of this proposal
since they are responsible for operating, maintaining, and supporting the CPC
facilities.
A premium has been placed on user support staff who are knowledgeable about a
wide spectrum of computing issues, including numerical methods and algorithms
for parallel computers. Thus two have PhD degrees (Mathematics and Mechanical
Engineering) and two have MS degrees in Computer Science. All are expert in
scientific computation and parallel computing. Having expert user support staff
available to talk to current and prospective users is critical to the success
of CPC and is a key ingredient of our proposed Tier 2 Center. One of the user
support staff (A. Bose) has also been very active in research issues related to
network QoS and Internet backbone networks as described below in more detail.
The CPC systems support staff are highly experienced and are responsible for
the operation and maintenance of the parallel systems described above, as well
as the mass store system and the visualization facilities. They are also active
in grid computing and networking, as noted below.
In addition to the CPC support staff, there are several other staff at the UM
who are involved in activities directly and indirectly related to this proposed
Tier 2 Center. These include staff involved in grid computing related to ATLAS
(Myers, McKee) and a staff member (Myers) who is actively involved in networking
(QoS). Andy Adamson,
a staff member of CITI, is developing a QoS capability that is tightly
integrated with GARA (Globus resource manager), the Internet2 Qbone, and
PKI-based authentication and authorization mechanisms to manage access to
backbone QoS facilities.
Grid Computing at Michigan
Our systems support staff is also deeply involved in grid computing. There are
several campus-wide and national level grid projects in which CPC is involved.
On campus, CPC (Hacker, Mach) and Physics (Neal, McKee, Myers) are working with
CITI on wide area network Quality of Service (QoS) authorization and
reservation mechanisms that are tightly integrated with Globus. Additionally,
CPC staff (Hacker, Mach, Britt) are working on building a bridge between grid
environments such as Legion and Globus, which are based on X.509
authentication, and Kerberos and the global AFS file system. The CPC plans to
link together its mass storage system, AFS, and Globus to provide a uniform
global file system that can be used locally to manage experimental data, which
can then be processed with grid resources.
On a national scale, one of the systems staff (Hacker) is involved in the
Global Grid Forum (GGF) as a co-chair of the GGF Accounting working group. The
working group is involved in defining and implementing Grid-wide standards for
access account interchange (user accounts), accounting record interchange, and
defining means of balance and exchange (based on economy models) between
resource providers and consumers on the Grid.
Also, a recent initiative (Hacker, Mach, Britt) with the University of Texas at
Austin has resulted in a working prototype grid consisting of 64-node SP2
systems at both institutions that are operated essentially as a single
load-balanced system from the point of view of users. The Michigan/Texas SP2
grid uses commercially available technology, such as AFS and LoadLeveler, to
provide a base production-computing infrastructure, and we are investigating
the tight integration of Globus into the system to give the Grid user community
wider access to the resources. The CPC plans to integrate the results of
Grid research currently underway at the University of Michigan and the Global
Grid Forum into an NPACI-wide metasystems computing Grid.
Within the NPACI community, the University of Michigan is a significant
mid-range partner site that is actively working with SDSC on creating a
production quality Grid computing system between the NPACI partners based on a
combination of Globus and Legion. The work currently being done by systems
staff at SDSC, University of Texas and University of Michigan is putting the
foundations into place on a national scale to support Grid computing for the
national and international research community.
QoS Guarantees and Grid Computing
A challenging problem in Grid computing is to guarantee specific service levels
to users. These service levels may comprise network resources such as
guaranteed bandwidth, packet-loss ratio, delay and available bit rate along
with end-to-end guarantees on acceptable and measurable service metrics for
applications that use the grid. For example, in the ATLAS Grid, an acceptable
guarantee can be defined in terms of sustained bandwidth, a pre-agreed priority
level for all data packets among participating nodes in the Grid (these can be
the data caches that will require periodic synchronization), prioritization
over other traffic in the grid, etc.
In such a scenario, packets destined to and from the ATLAS Grid can be
processed in the underlying network (routers and switching fabric) such that
the service levels can be met. This will be crucial as scientists and project
personnel will require timely and efficient access to the very large volumes of
data that will be generated during the experiment. For example, the file
transfer protocol (ftp) as it is available on today's Internet will not work
for ATLAS. However, ftp can be used in ATLAS within a framework that can
reserve and guarantee the appropriate service levels required to move
terabyte-sized data files.
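To give a sense of scale for moving terabyte-sized files, the following sketch estimates transfer times over the OC-12 and OC-48 link classes mentioned earlier in this proposal. It uses the nominal SONET line rates (622.08 and 2488.32 Mbit/s) and decimal terabytes; the `efficiency` parameter is a hypothetical knob standing in for protocol overhead and competing traffic, not a measured value.

```python
# Nominal SONET line rates in Mbit/s.
OC_LINE_RATE_MBPS = {"OC-12": 622.08, "OC-48": 2488.32}


def transfer_hours(size_tb, rate_mbps, efficiency=1.0):
    """Hours needed to move size_tb decimal terabytes at the given line rate.

    efficiency is an assumed fraction (0-1] of the line rate that the
    transfer actually sustains.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (rate_mbps * 1e6 * efficiency)
    return seconds / 3600.0


for name, rate in OC_LINE_RATE_MBPS.items():
    ideal = transfer_hours(1, rate)
    half = transfer_hours(1, rate, efficiency=0.5)
    print(f"{name}: {ideal:.2f} h at line rate, {half:.2f} h at 50% efficiency")
```

Even at full line rate, one terabyte occupies an OC-12 link for several hours, which is why sustained, reserved bandwidth matters more for this workload than peak rate.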
The Differentiated Services (DiffServ) working group at the Internet
Engineering Task Force (IETF) was formed to standardize a common mechanism for
providing differentiated classes of service for Internet traffic, including
data, voice, multimedia, etc. Differentiated services are expected to be an
integral part of tomorrow's Internet. There are already a number of data
networking vendors supporting some form of differentiated services in their
hardware. The DiffServ approach to providing quality of service (QoS) consists
of marking a small bit-pattern in the IP header of a packet (TOS byte in IPv4
or Traffic Class byte in IPv6) and providing particular packet forwarding
services (per-hop behavior or PHB) at each router. Michigan's ATLAS team
members Andy Adamson and Abhijit Bose have been active in the DiffServ working
group, building testbeds of DiffServ-enabled prototype networks on the
Michigan campus. Recently, Bose, along with Prof. Kang Shin and his graduate
students in the Electrical Engineering and Computer Science department, built a
Linux-based DiffServ network in the Real-Time Computing Laboratory that will
serve as a testbed for a feasibility study of employing DiffServ in wide area
networks. They have successfully demonstrated guaranteed performance of
real-time streamed video data over a congested prototype network. They are
currently developing a framework that will enable
existing network applications such as ftp, telnet, and web browsers to reserve,
monitor, and utilize high bandwidth networks (such as QBone and Internet2)
without any change in the source code. The knowledge, expertise and software
gained from such experiments and testbeds can be directly applied to the ATLAS
Grid project. We envision an ATLAS Grid consisting of high-speed networks
(using both public as well as private networks) connecting experimental and
computational sites with transparent QoS and differentiated service levels
built-in within the Grid.
[Ref: http://www.engin.umich.edu/labs/cpc/diffserv.html]
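The packet marking described above can be illustrated from the point of view
of an end host. The following is a minimal sketch, not the RTCL framework
itself: it assumes a POSIX-style host where Python's standard socket module
exposes the IP_TOS option, and it assumes (purely for illustration) that the
network treats the Expedited Forwarding code point, 46, as its premium class.
The helper names are hypothetical.

```python
import socket

# Assumption for illustration: the network maps the Expedited Forwarding
# code point (DSCP 46, binary 101110) to its premium per-hop behavior.
DSCP_EF = 46


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0-63)")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Hypothetical helper: a UDP socket whose outgoing packets carry `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS sets the whole TOS byte; the two low-order bits are left at zero.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(f"TOS byte for DSCP {DSCP_EF}: 0x{dscp_to_tos(DSCP_EF):02x}")
```

Marking alone guarantees nothing: the routers along the path must be
configured to honor the code point, which is what the testbeds described above
are designed to exercise.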
Challenges and Efforts to Bring the Grid Online
Several key software technologies must be deployed in a sustainable manner for
the Grid to become an effective and reliable computational tool for scientists.
These key components of the Grid include authentication, authorization,
resource identification and discovery, and account management and accounting.
CPC and CITI are actively involved in performing the research necessary to
bridge the gap from development to deployment of these Grid facilities. The
University of Michigan has a long and distinguished track record in bringing
cutting-edge technologies into production to support academic communities. The
Institutional File System project and the Uniquename system are examples of
this success. Over 120,000 users are actively managed by the production systems
put into place at the university to provide access to systems ranging from UNIX
workstations to online student billing. A recent example of this leadership is
the development of an X.509-based certificate system that can transform
Kerberos service tickets to short-lived X.509 certificates for web browsers.
Current challenges identified by CPC and CITI staff in bringing Grid
technologies online include creating a national system of peered Certificate
Authorities (CAs) with mutual trust and Service Level Agreements (SLAs) to
provide secure and reliable key signing and verification services. Another
challenge is research into utilizing smart card technologies with traditional
password-based authentication to store, retrieve, and forward X.509
certificates. In the area of account creation, one of the staff members of CPC
(Hacker) recently submitted a paper on predicting the peak number of
computational jobs and unique users in order to provision a Grid system that
provides a predictable grade of service when dynamically assigning accounts and
disk space on demand to a stream of users drawn from an unbounded pool of Grid
users. To provide authorization services, CPC and CITI staff are working on
integrating an X.509-based authorization system (Akenti) into the production
systems at U-M and the Grid. Finally, in the area of accounting, one of the CPC
staff (Hacker) is actively involved in research on defining accounting and
exchange of computing resources as co-chair of the Accounting Working Group,
with William Thigpen of NASA IPG, in the Global Grid Forum. In addition to this
work, Hacker is working with NPACI systems staff in defining and developing
systems for accounting data interchange between NPACI systems within the NPACI
partnership.
Network Resources at Michigan
The CPC is connected to Internet2 via a dedicated OC-12 (622 Mbps) link, and
several systems at the CPC are now connected to the OC-12 link with fiber-optic
Gigabit Ethernet (1000 Mbps) links (see Appendix A for a detailed network map).
For the next year, the CPC (Hacker) plans to work on tuning the OC-12 uplink,
local networks, and hosts with help from Matt Mathis at the Pittsburgh
Supercomputing Center and the Web100 project. Additionally, the CPC (Hacker)
will participate in a 1000 Mbps test fiber network on campus that will
provide a direct (1 hop) connection to the ATLAS research group in Physics and
the gateway to Merit.
Network Analysis and Research at Michigan
QoS and network performance are very important to the ATLAS computing model.
The success of the ATLAS project depends upon the ability of the networking
infrastructure to move petabytes of data. This goal is difficult to realize
today given the current poor configuration and tuning of local and global
networks. Facilities for precise network tuning and functioning Quality of
Service mechanisms must be deployed in the local, national, and global network
paths between ATLAS sites. Research currently underway at the U-M and within
the Grid community is addressing these problems.
The first problem, network tuning, will be addressed by combining a targeted
deployment of test machines with a measurement infrastructure. Test probe
machines will be installed at U-M, Merit, Internet2, CERN, and other ATLAS
sites to allow ATLAS administrators to accurately baseline and tune actual
network bandwidth between ATLAS sites. Software from PSC (Web100) will be used
on the end hosts to provide host- and application-specific network tuning
functionality, which will be used in conjunction with the raw network
performance measured by the test probes at ATLAS sites. The combination of
network and host tuning will provide a mechanism for ATLAS applications to
adapt their characteristics to changing network conditions and so maximize
throughput.
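To indicate why host tuning matters on a long path, the following sketch
computes the bandwidth-delay product of the OC-12 link and the throughput
ceiling imposed by an untuned TCP window. The 120 ms round-trip time is an
assumed, illustrative figure for a transatlantic path, not a measurement, and
the 64 KB window is the largest possible without TCP window scaling. The
function names are our own.

```python
def bandwidth_delay_product_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of `link_bps` full."""
    return link_bps * rtt_s / 8.0


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8.0 / rtt_s


OC12_BPS = 622e6                   # link rate quoted in this proposal
ASSUMED_RTT_S = 0.120              # assumption: illustrative round-trip time
DEFAULT_WINDOW_BYTES = 64 * 1024   # largest window without window scaling

bdp = bandwidth_delay_product_bytes(OC12_BPS, ASSUMED_RTT_S)
ceiling = window_limited_throughput_bps(DEFAULT_WINDOW_BYTES, ASSUMED_RTT_S)

print(f"Buffer needed to fill the link: {bdp / 1e6:.2f} MB")
print(f"Throughput with a 64 KB window: {ceiling / 1e6:.2f} Mb/s")
```

Under these assumptions an untuned host would use well under one percent of
the link, which is the gap that per-host and per-application tuning is meant
to close.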
The second problem, QoS, will be addressed with the results of the QoS work of
Andy Adamson, along with the results of local area network QoS research
currently being performed by staff members of CPC. This project is targeted at
providing differentiated service end-to-end in a wide-area network
(WAN) setting. The initial phase is internally funded by UM, Argonne, UCAID and
CERN and is focusing on implementing the signaling necessary to provide QoS.
Our group is actively seeking further funding from both DOE and NSF to create a
deployable toolkit which can enable QoS within a campus domain. The goal of
the QoS project is to integrate the Public Key Infrastructure (PKI) based
Akenti authorization system into Argonne National Lab's Globus-based bandwidth
broker and the Internet2 Middleware working group LDAP directory schema to
provide a scalable, authenticated means to request network quality of service.
As a critical component of this work, Adamson has developed a test suite to
validate the ability of various hardware platforms (Cisco, Torrent, and others)
to actually deliver the reserved network bandwidth.
[Ref: http://www.citi.umich.edu/projects/qos/]
Synthesis of these two approaches will provide two key facilities to ATLAS
applications: first, the ability to pre-reserve required bandwidth with the QoS
facilities, and second, the ability to actually use the bandwidth up to the
QoS limits imposed by the network.
Michigan's NPACI Grant
The NPACI (National Partnership for Advanced Computational Infrastructure)
project is an NSF-funded grant that supports infrastructure (facilities, staff,
and equipment) for high performance computing, including data-intensive
computing. The lead institution in the NPACI partnership is UCSD, which
operates the San Diego Supercomputer Center. In addition to UCSD and the
University of Michigan, the other major resource partners are Caltech, UC
Berkeley, and the University of Texas. The UM is a "mid-range" site,
operating and maintaining a mid-sized parallel computing facility that provides
computational cycles, data resources, and expert consultation to the users at
the UM as well as the national NPACI community. Approximately 75% of the
computing facilities are allocated for usage by the NPACI user community and
the remainder by the UM community, reflecting the approximate share of the
funds to purchase these systems. The NPACI allocations on these systems are
peer-reviewed.
The NPACI focus is development and deployment of infrastructure (hardware,
software, and people) for high end computing, including data and CPU intensive
computing. A substantial effort has been expended in the grid area, since this
is clearly the path for the future. The NSF provides about $1M per year to the
UM to carry out its role as an NPACI mid-range site, and these funds provide
support for the parallel computing facility, the data-intensive facility, and
support staff. Since a major objective of an ATLAS Tier 2 Center is to provide
resources to its community of users, such as computing infrastructure for ATLAS
applications, there is a natural linkage and considerable overlap between the
goals of the Tier 2 Center and the UM's NPACI project. This overlap offers
substantial opportunities for leveraging resources, as will be discussed later.
IV. Current ATLAS-related Computing Activities at Michigan
The University of Michigan now has 15 physicists, 4 graduate students, 3
technicians, and 11 other staff and students working in ATLAS. Our major
responsibilities include research & development and production activities
associated with the MDT chambers for the forward muon spectrometer, specific
electronics for the muon chambers, and a variety of computer-related tasks.
The following activities are currently underway in direct support of our ATLAS
activities and are particularly pertinent to the goals of our proposed Tier 2
Center:
Quality of Service networking (QoS)
- The University of Michigan is actively involved in QoS efforts in both the
Grid computing community and the Internet2 community. This work will be a key
technology component of the University's contribution to the ATLAS
project. Since networking and QoS are so critical to both the grid computing
model and ATLAS, we feel our efforts in this area are an important aspect of the
development of a prototype Tier 2 center. Furthermore, our work on networking
and QoS is resulting in significant network infrastructure improvements, which
directly benefit the Tier 2 center.
ATLAS involves transfer of large volumes of data among the participating
sites. These data sets can be time-critical and may require a quality of
service guarantee from the networks that are used to transfer, access and
modify these data sets. The research and prototyping work done at the
University of Michigan in the areas of QoS, differentiated services, network
performance and analysis will be directly applicable to the ATLAS Grid and the
underlying networks that connect the various sites. We address specifics of our
contributions to date in a later section.
Grid computing for ATLAS
- UM is one of the testbed sites for Grid Computing in the US ATLAS Grid
effort and has recently hosted a grid testbed meeting for US ATLAS. With this
proposal we now merge the efforts of the UM ATLAS physicists with those
involved in the grid effort at the CPC. The resulting synergy between Physics,
CPC, and NPACI will produce a very significant grid effort for ATLAS. Combining
this expertise in the context of a prototype US ATLAS Tier 2 center will be
unique, providing a critical connection between Tier 2 development and the grid
computing model.
Coordination of the ATLAS Muon Database
- S. Goldfarb is the coordinator for the ATLAS Muon Database Task. His
responsibilities include the design of the event and detector description data
models for the muon spectrometer, and the development of software interfaces to
the data stores. As a contribution toward this task, he and S. McKee have
taken on the leading role in the development of an XML-based description of the
muon spectrometer geometry and readout. This generic description will be used
as the single source of input for the applications that simulate, reconstruct,
and visualize muon spectrometer data. Preliminary tests of the model are
concentrating on the usage of the description to generate GEANT4 geometries.
[Ref: http://atlas.physics.lsa.umich.edu/aspdb/main.asp]
Other activities
- There are a number of additional software and computing projects in support
of the ATLAS project:
These activities provide a broad and rich context conducive to Tier 2
development and prototyping.
In addition to a strong collection of individuals already in the Michigan ATLAS
group, including Jay Chapman, Bing Zhou, Homer Neal, Edward Diehl, Zhengguo
Zhao, Myron Campbell, Shawn McKee, Dan Levin, Steven Goldfarb, Tieshing Dai,
and Eric Myers, Bill Martin is a recent addition who will be directing the
proposed center. He is a professor of nuclear engineering at the UM and has
deep research interests and experience in computational physics. Martin has
been very active over the past two decades in Monte Carlo methods development
and the use of high performance computers in scientific computing. He is the
founding director of the UM's Laboratory for Scientific Computation and
was the founding director of the Center for Parallel Computing. He is currently
director of the UM's NPACI (National Partnership for Advanced
Computational Infrastructure) grant, which will play a major role in assuring
the success of the Tier 2 effort at Michigan. The NPACI project is described in
more detail in the preceding section. Appendix B includes short vitae for the
above people and other participants on this project.
V. Leveraging Current UM Resources for the Prototype Tier 2 Center
As noted throughout this proposal, the UM is engaged in many activities that
relate directly to the prototype Tier 2 Center goals and objectives. These
ongoing activities are an important part of our proposal because they allow us
to carry out the role of the prototype Tier 2 Center with minimal new funding
for staff and equipment. For convenience, we have summarized in this section
all of the current resources at the UM, including staff resources as well as
physical resources, that will contribute to the operation of the Tier 2 Center.
Staff Resources
The following faculty and staff will be directly involved in the activities of
the Tier 2 Center. Estimates of their effort are included, where "Direct
participation" means activities directly related to the activities of the
Tier 2 Center, whether or not supported by Center funds, and "Related
efforts" refers to other activities that contribute less directly to the
Tier 2 Center.
William R. Martin.
Martin is currently the director of the Laboratory for Scientific Computation
and Director of the NPACI Project. He spends about 20% of his time on these
activities at the current time. He will spend about 20% of his time to direct
the Tier 2 Center, and it is proposed that half of this, or 10%, be supported
directly by the Tier 2 Center funding. He will also be involved in the GEANT4
simulation and development efforts as part of the UM ATLAS team.
Direct participation: .2 FTE
Related efforts: .2 FTE
Homer A. Neal.
Neal is currently a physicist in the ATLAS Collaboration, Director of the UM
ATLAS Collaboratory Project, and Institutional Representative for the
University of Michigan in the ATLAS experiment. He is part of the UM MDT Muon
Detector Group, has worked with S. McKee on the development of the MDT
Production Database and is part of the UM group exploring the development of an
ATLAS Trigger Database. Neal has also played a central role in the coordination
of the current QoS research underway at the University of Michigan, involving
researchers from the UM ITD Division, UM Physics, Internet2, Merit, CERN and
Argonne National Laboratory. The ATLAS Collaboratory Project has been active in
the development and implementation of web-based archiving tools, with ongoing
test applications occurring in the archiving of the CERN Summer Lecture Program
and in the delivery of GEANT4 training lectures (e.g., via web archiving of
the GEANT4 Workshop that was held in Ann Arbor in February 2001). Neal would
devote 15% FTE toward direct activities in the Tier 2 Center, and 50% toward
related activities.
Direct participation: .15 FTE
Related efforts: .5 FTE
Shawn McKee.
McKee is currently the physicist in charge of the UM muon production
database. Additionally, he is active in the US ATLAS grid testbed effort and the
ATLAS muon detector description and simulation effort, and is involved in
network research related to the grid computing model. He is committed to
ATLAS for 20% of his time through 2005 and is working on a
number of related activities that will contribute to the Tier 2 Center. He
will lead the ATLAS Integration area of the prototype Tier 2 Center.
Direct participation: .2 FTE
Related efforts: .6 FTE
Eric Myers.
Myers is currently a research physicist working in the ATLAS Collaboratory
Project. His special skills include computer networking and network security
issues. He is playing the lead role in several of the UM web-based archiving
initiatives (e.g., the web archiving of the current GEANT4 Workshop in Ann
Arbor). He is also involved in grid computing research and development. He
hosted, with Shawn McKee, the recent US ATLAS Grid Workshop in Ann Arbor. In
that event one of his key presentations was on how to install and bring up the
Globus software on multiple platforms. He would be available to directly
contribute 50% of his time to Tier-2 Center activities. An additional 25% of
his time would be devoted to Tier-2 related activities.
Direct participation: .5 FTE
Related efforts: .25 FTE
Thomas Hacker.
Hacker is currently a Systems Project Coordinator responsible for managing
the systems and systems staff for the CPC. He is actively involved in
developing and deploying cluster computing, grid computing, mass storage
systems, and network measurement and tuning. He will devote 20% of his time
to Tier-2 Center efforts, and 60% to related efforts that have a direct impact
upon ATLAS activities. Hacker will lead the Systems Support area of the
prototype Tier 2 Center.
Direct participation: .2 FTE
Related efforts: .6 FTE
Abhijit Bose.
Bose is currently a computational scientist at CPC and has responsibility for
user applications and support for all CPC computing platforms. He also
participates in the Strategic Applications Collaborations (SAC) program of
NPACI where staff from the resource partners team up with scientists in
optimizing parallel codes across NPACI systems. He is also a member of the Real
Time Computing Laboratory (RTCL) in the Electrical Engineering and Computer
Science Department at Michigan where he is working on the differentiated
services (DiffServ) framework for QoS networking. He, along with other
researchers at RTCL, recently built a DiffServ testbed using high-performance
Linux workstations and off-the-shelf routers, and demonstrated the feasibility
of managing and delivering bandwidth to specific applications and users over a
single network for a commercial customer. He will be involved with network QoS
for the proposed ATLAS Grid and will lead the efforts related to the Research
area of the prototype Tier 2 Center.
Direct participation: .2 FTE
Related efforts: .25 FTE
Randall Crawford.
Crawford is currently a computational scientist at CPC and has responsibility
for user applications and support for all CPC computing platforms. He is also
working with the DICE research group at SDSC on the Storage Resource Broker in
support of ODBC/Sybase. He has expertise in parallel application development
on Very Large Databases (VLDB) and in benchmarking of file systems and parallel
computing systems. He was the lead engineer on an operational decision support
system consisting of over four billion events and a terabyte of storage. He will
be involved with benchmarking initiatives for optimal performance of the ATLAS
Grid as well as consulting and user support activities.
Direct participation: .1 FTE
Related efforts: .2 FTE
David Woodcock.
Woodcock is a user-support specialist at CPC focusing on parallel mathematical
and scientific libraries. He was one of the designers of AGM, which performs
automatic matching of the spot patterns from 2-D electrophoresis developed by
BioImage Inc. in Ann Arbor. He also wrote Rhapsody - a parallel applications
package built on PVM. At CPC, he has been working with faculty from the
Engineering and Medical schools on a variety of projects, including design of
experiments and computer simulation of the spread of HIV, among others.
He will be involved with consulting and user-support activities of the ATLAS
project.
Direct participation: .1 FTE
Related efforts: .2 FTE
Rod Mach.
Rod Mach is currently a systems research programmer for CPC. His special
skills include AFS/Kerberos administration, SP2 administration, web
programming, and systems security. He is currently working on the
Michigan-Texas grid, AFS installation across NPACI, and assisting with the
Legion and Globus efforts on the SP3.
Direct participation: .1 FTE
Related efforts: .2 FTE
Matthew Britt.
Matthew Britt is a system administrator working for CPC. His focus is on
large-scale data storage, and he has been involved in network benchmarking,
grid-related storage projects (SRB), and Linux cluster design and
implementation.
Direct participation: .1 FTE
Related efforts: .2 FTE
Steven Goldfarb.
Steven Goldfarb is a Senior Research Fellow at the University of
Michigan and a member of the UM ATLAS Group. He is the Task Leader for the
ATLAS-wide Muon Database Project. He is responsible for the planning and
development of the core database software for the ATLAS Muon Spectrometer. This
work involves the design and implementation of software interfaces to the
transient and persistent stores for the Detector Description and Event data and
the integration with the Conditions and Production data. He has organized and
chaired workshops on HEP detector description languages and has been involved
in efforts to promote XML for use in ATLAS detector description applications.
He is permanently located at CERN and would provide a valuable presence there
for our Tier-2 initiatives involving CERN.
He would devote 25% of his time to direct Tier-2 activities, and 50% of his
time to related activities.
Direct participation: .25 FTE
Related efforts: .5 FTE
Others from ATLAS group.
Several other University of Michigan physicists are expected to contribute
both directly and indirectly to the operation of the Tier-2 Center. These
individuals include Wolfgang Lorenzon, Dan Levin, Tieshing Dai, Jay Chapman,
Bing Zhou, Edward Diehl, Zhengguo Zhao, and Myron Campbell. The estimated total
direct contribution is .5 FTE, and the total related contribution is 1.0 FTE.
Direct participation: .5 FTE
Related efforts: 1.0 FTE
Summary of Leveraged Staff Resources.
The total number of direct and related FTEs that will contribute to the
prototype Tier 2 Center at Michigan is given below.
Direct participants: 2.6 FTE (not including new hires)
Related participants: 4.7 FTE
New staff: 2.0 FTE (by year 2)
Total participation: 9.3 FTE
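The totals above can be reproduced from the individual estimates given in this
section. The short sketch below is offered only as a cross-check; it works in
hundredths of an FTE to avoid floating-point rounding, and the variable names
are our own.

```python
# (direct, related) effort per participant, in hundredths of an FTE,
# transcribed from the individual entries in this section.
EFFORT_HUNDREDTHS = {
    "Martin": (20, 20),
    "Neal": (15, 50),
    "McKee": (20, 60),
    "Myers": (50, 25),
    "Hacker": (20, 60),
    "Bose": (20, 25),
    "Crawford": (10, 20),
    "Woodcock": (10, 20),
    "Mach": (10, 20),
    "Britt": (10, 20),
    "Goldfarb": (25, 50),
    "Others from ATLAS group": (50, 100),
}
NEW_STAFF_HUNDREDTHS = 200  # two new hires by year 2

direct = sum(d for d, _ in EFFORT_HUNDREDTHS.values())
related = sum(r for _, r in EFFORT_HUNDREDTHS.values())
total = direct + related + NEW_STAFF_HUNDREDTHS

print(f"Direct participants:  {direct / 100:.1f} FTE")
print(f"Related participants: {related / 100:.1f} FTE")
print(f"Total participation:  {total / 100:.1f} FTE")
```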
Physical Resources
The physical resources that will be leveraged for the Tier 2 Center include
computing resources, data resources, and networking resources and are listed
below.
Computing resources.
The new Pentium III cluster with 128 nodes will be a shared facility that
will accommodate several faculty groups at the UM. Approximately 16 of the
nodes will be available on an everyday basis, based on the proposed
expenditure of project funds to complete the cluster, while a larger fraction
of the cluster will be available for occasional demonstration or development
usage. The configuration of each host computer in the cluster is: two 800 MHz
Intel Pentium III CPUs, 1 GB RAM, a 40 GB (or larger) drive, and an Intel Gigabit
Ethernet adapter for switch interconnect.
Also, access to the SP systems will be available to the Tier 2 Center.
Occasional access such as for development or demonstration will be provided by
the UM. Occasional production usage of the SP system will also be provided
although heavy production usage will be accommodated by applying to NPACI for a
large allocation on the SP system. We anticipate no difficulty in obtaining
large blocks of time on the SP system if needed through this NPACI allocation
process.
Data resources.
The Center for Parallel Computing provides several levels of data storage,
varying by capacity, I/O rate, and longevity of storage. Local scratch space on
the disks on each node is available for researchers to use on a short-term
basis for high speed access to imported data sets or to store output from job
execution. To provide researchers with secure network attached storage, private
and shared disk space is made available via AFS. Datasets stored in AFS can be
downloaded, uploaded and accessed by AFS client software running on hosts
anywhere in the world. To meet the needs of researchers who require a large,
unlimited storage system, CPC operates a mass storage system with an overall
capacity of 42 TB based on Tivoli that is available via NFS.
Network resources.
The CPC has a direct 622 Mb/s OC-12 link through the university network to
Abilene. The link terminates at a Cisco 5500 switch that provides 100 Mb/s FDDI,
1000 Mb/s Gigabit Ethernet, and 100 Mb/s 100BaseT connections to the systems
operated by the CPC. The system that will be utilized by ATLAS will have a
Gigabit Ethernet link to the Cisco 5500, and will have 622 Mb/s access to the
Abilene network. Work is under way to connect the CPC to Physics and Merit via
a Gigabit Ethernet link.
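To give a sense of scale for these link rates, the sketch below estimates the
time to move one terabyte at the rates quoted in this section. It assumes the
full line rate is available with no protocol overhead or competing traffic,
which real transfers will not achieve, and it takes a terabyte to be 10^12
bytes; the function name is our own.

```python
TERABYTE_BYTES = 1e12  # assumption: decimal terabyte


def transfer_time_hours(size_bytes: float, rate_bps: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and contention."""
    return size_bytes * 8.0 / rate_bps / 3600.0


LINK_RATES_BPS = {
    "100BaseT (100 Mb/s)": 100e6,
    "OC-12 (622 Mb/s)": 622e6,
    "Gigabit Ethernet (1000 Mb/s)": 1000e6,
}

for name, rate in LINK_RATES_BPS.items():
    hours = transfer_time_hours(TERABYTE_BYTES, rate)
    print(f"{name}: {hours:.2f} hours per terabyte")
```

Even at the full OC-12 rate a terabyte takes several hours under these ideal
assumptions, which is why sustained, well-tuned throughput matters for ATLAS
data movement.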
Other Michigan Resources
In addition to the above resources that will be directly used during the
operation of the prototype Tier 2 Center, there are other resources that the
Center may take advantage of. These include facilities for conducting
conferences and workshops as well as facilities and staff to produce
high-quality web-based distance training modules.
Conference/workshop facilities.
To provide facilities for ATLAS meetings and training sessions, the
University of Michigan Media Union houses a Conference Center, consisting of
five rooms available for meetings and workshops, two teleconferencing rooms,
and two training rooms equipped specifically for the use of computer training.
[Ref: http://www.ummu.umich.edu/facilities/teleconference/teleconf.html,
http://www.ummu.umich.edu/facilities/training/training.html,
http://www.ummu.umich.edu/facilities/conferenceCenter/confrm.html]
The Media Union teleconferencing facilities consist of a meeting room that can
be divided into two separate halves, a foyer suitable for coffee or light
lunch buffet service, a coatroom, and a small catering kitchen. There are two
lecterns, each outfitted with a built-in computer (one a Pentium Pro and the
other a Macintosh) and built-in controls to other equipment. Each half-room has
a whiteboard, visualizer, rear-projection screen, projector, traditional
projection screen, internet hookups (802.11 wireless and wired), and power
outlets in the floor, and can project a PowerPoint, Web, or other presentation.
The room can be configured for videoconferencing using Polycom over dedicated
ISDN lines.
Each of the Media Union training rooms has forty-four workstations and can be
divided into four separate quadrant teaching areas, each with its own
instructor's station. Instructors can easily connect a laptop computer to the
projector, and an overhead screen projector and VCR are also available.
Monitors for the student workstations are recessed into the desks, allowing
easier communication among students and instructors.
Distance learning and online education and training.
For ATLAS education, outreach, and training functions, the Center for
Professional Development (CPD) at the University of Michigan College of
Engineering can provide comprehensive program development, instructional
technology, and support services. This unit has developed and delivered
successful programs and services for over 30 years, using progressively
innovative technologies to serve hundreds of thousands of users. Resources
include two-way videoconferencing, satellite uplink, multimedia/web-based
program development, and broadcast quality video production capabilities.
Through CPD, the University of Michigan serves as the hub for the National
Partnership for Advanced Computational Infrastructure (NPACI) Distance
Training, and provides on-demand training and tutorials for new users of a
variety of NPACI resources that are distributed across the NPACI partner sites.
Details about additional CPD capabilities and examples of web-based training
and education programs are available at the Showcase on
http://cpd.engin.umich.edu and at the Michigan Engineering Online site,
http://meonline.engin.umich.edu or by email at borbely@engin.umich.edu.
VI. Budget
The following budget reflects the costs for two years of operation of the
proposed prototype Tier 2 Center. It covers half of the costs for W. Martin to
direct the center, one full-time staff member in Year 1 and a second in Year 2,
and a modest equipment budget. The UM will provide cash cost sharing for half
($40,000) of the equipment costs.
                          Year 1      Year 2    Notes
Salary (Director)        $13,500     $14,000    10% of academic year salary
Salary (new staff 1)      60,000      62,500    Systems support
Salary (new staff 2)         ---      75,000    Systems support
Benefits                  19,600      40,400    26.7%
Supplies                   4,000       4,000
Travel                     4,000       4,000
Subtotal                 101,100     199,900
Equipment                 80,000         ---
Indirect costs            51,561     101,949    51% on Subtotal
Total Project Costs     $232,661    $301,849
UM cash cost sharing     $40,000         ---    50% of equipment
Funds to be requested   $192,661    $301,849
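The derived rows of the budget (subtotal, indirect costs, total, and funds
requested) follow from the line items and the 51% indirect rate. The sketch
below reproduces that arithmetic as a cross-check; the benefits figures are
taken as given from the budget rather than recomputed, and the names used are
our own.

```python
# Line items subject to indirect costs: (Year 1, Year 2), in dollars.
LINE_ITEMS = {
    "Salary (Director)": (13_500, 14_000),
    "Salary (new staff 1)": (60_000, 62_500),
    "Salary (new staff 2)": (0, 75_000),
    "Benefits": (19_600, 40_400),  # taken as stated in the budget
    "Supplies": (4_000, 4_000),
    "Travel": (4_000, 4_000),
}
EQUIPMENT = (80_000, 0)
UM_COST_SHARE = (40_000, 0)   # 50% of equipment
INDIRECT_PERCENT = 51         # applied to the subtotal only


def budget_for_year(year_index: int) -> tuple:
    """Return (subtotal, indirect, total, requested) for year 0 or 1."""
    subtotal = sum(item[year_index] for item in LINE_ITEMS.values())
    indirect = round(subtotal * INDIRECT_PERCENT / 100)
    total = subtotal + EQUIPMENT[year_index] + indirect
    requested = total - UM_COST_SHARE[year_index]
    return subtotal, indirect, total, requested


for i, label in enumerate(("Year 1", "Year 2")):
    subtotal, indirect, total, requested = budget_for_year(i)
    print(f"{label}: subtotal ${subtotal:,}, indirect ${indirect:,}, "
          f"total ${total:,}, requested ${requested:,}")
```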
VII. Relevance to Selection Criteria for a Tier 2 Center
This section presents a quick overview of the selection criteria listed in the
document and a brief summary of the UM response to the criteria.
The first criterion is that any prototype Tier 2 Center must be acceptable to
the NSF. Since the UM is an institution of higher learning with many
NSF-supported research grants, this is not an issue.
The second criterion is that the chosen site must be active in Grid research.
Michigan is already an active part of the US ATLAS grid testbed. Also, as part
of our networking and QoS work we are contributing to the Globus authentication
and authorization model. The NPACI group here has independently been active in
grid issues as discussed above and with their joining in the ATLAS effort,
Michigan will have a significant grid presence.
The third selection criterion relates to a technically capable PI. Martin
certainly meets this criterion. His work in developing the Michigan NPACI
site is very similar to the work that would be needed to develop a Tier 2
Center. In addition, he has been involved in the past with the GEANT code and has
substantial experience with using and developing general purpose Monte Carlo
codes.
The fourth point discusses leveraging existing infrastructure and resources. We
estimate that ~ 2.5 FTEs of current faculty and staff will be directed towards
activities of the Tier 2 Center while another ~ 4.5 FTEs will contribute to
the overall computing environment and will help make our Tier 2 Center a
success. In addition, the UM's computing and data resources will be
heavily utilized with modest investments by the Tier 2 Center.
The last major criterion is WAN connectivity. Michigan already has network
access that is the equal of any ATLAS site. Our current Internet2 connection is
OC12 and could easily be upgraded to OC48 if there is need. We are also
directly connecting some of our machines to a special gigabit campus network,
which bypasses the bulk of the campus traffic. Currently we are the only site
we know of which provides at least 622 Mbps connectivity directly from our
computers to Internet2.
In conclusion, we believe the case for selecting the University of Michigan as
a US ATLAS Prototype Tier-2 Center is a strong one. We know of no other site in
US ATLAS that is better suited to meeting the criteria that have been
announced. We have a collection of talented, dedicated computer professionals
and physicists who could make an enormous contribution toward the attainment of
the goals necessary for the success of ATLAS Computing in the US and
worldwide.
Appendix A
CPC High Performance Network Topology
Appendix B
Curriculum Vitae
William R. Martin
Professor, Department of Nuclear Engineering and Radiological Sciences
Director, Laboratory for Scientific Computation
University of Michigan
Ann Arbor, MI 48109-2104
wrm@umich.edu
Currently on sabbatical leave at Los Alamos National Laboratory through June
2001
Personal
Married with two children
U.S. citizen
DOE Q clearance
Educational
B.S.E., Engineering Physics, University of Michigan, 1967.
M.S., Physics, University of Wisconsin, 1968.
M.S.E., Nuclear Engineering, University of Michigan, 1975.
Ph.D., Nuclear Engineering, University of Michigan, 1976.
Professional
1969-73 : Officer, U.S. Navy, Naval Reactors Division, Washington, D.C.
1976-77 : Principal physicist, Combustion Engineering, Inc.
1977-present : Professor, Department of Nuclear Engineering (1989);
Associate Professor (1981-88); Assistant Professor (1977-81).
1990-94: Chair, Department of Nuclear Engineering.
1986 - present: Director of the Laboratory for Scientific Computation.
1993-94: Director of the Center for Parallel Computing.
1994-99: Associate Dean for Academic Affairs, College of Engineering.
Honors and Awards
Distinguished Scholar Award, College of Engineering, Univ. of Michigan,
1967.
Royal Society Visiting Scholar, Imperial College, London, 1989.
Glenn Murphy Award for Outstanding Contributions to the Profession and
Teaching of Nuclear Engineering, American Society for Engineering Education,
1993.
Fellow of the American Nuclear Society, 1995.
Relevant Experience
Martin has been active in developing numerical methods for application to
nuclear engineering problems, including neutron and photon transport,
thermal-hydraulics and heat transfer, and reactor plant simulation. His focus
in recent years has been the development of Monte Carlo methods for electron
and photon transport and their adaptation to advanced computer architectures,
including massively parallel processors and distributed networks of
workstations. Other areas include the development of numerical methods for
solving the Boltzmann transport equation, including the use of finite element
methods and discrete ordinate methods. He has also worked on advanced fuel
designs and fuel cycles for light water reactors and new core designs for
research reactors. Martin has organized three short courses for IBM Corporation
in the area of scientific computation and advanced computer architectures. He
has been the Director of the UM Laboratory for Scientific Computation
since its creation in 1986. Martin led the successful initiative to develop a
high performance computing facility at Michigan, which culminated with NSF and
Ford Motor Company funding for the Center for Parallel Computing, for which he
was founding Director. He has published approximately 90 journal
articles and conference proceedings and has graduated 22 Ph.D. students while
a faculty member at Michigan.
Editorial Service
Associate Editor, Journal of Computational Physics (Academic Press, Inc.)
Advisory Editor, Nuclear Science and Engineering (American Nuclear Society)
Member of the Editorial Board, Transport Theory and Statistical Physics
(Marcel Dekker)
Consulting Activities
Argonne National Laboratory (1978-79) -- neutronics methods for research
reactors.
Bendix Aerospace Systems (1980-81) -- neutron and gamma transport computations.
Los Alamos National Laboratory (1980-82) -- vectorized Monte Carlo.
Consumers Power Company (1981-82) -- nuclear reactor operator training.
International Atomic Energy Agency (1982-83) -- low-enrichment research
reactors.
International Business Machines Corporation (1984-85) -- the development and
presentation of three short courses in scientific computation.
Rockwell International Corporation (1985) -- vector supercomputers.
Viking Corporation (1987) -- review of reactor plant simulation code for BWR
transient simulation (for the Nuclear Regulatory Commission).
Los Alamos National Laboratory (1986-89) -- Monte Carlo and particle-in-cell
methods for particle transport on hypercube parallel processors.
Winfrith Atomic Energy Establishment (1989) -- Monte Carlo on advanced
computers.
Applied Research Associates (1990) -- Monte Carlo methods on advanced
computers.
Lawrence Livermore National Laboratory (1982-present) -- vectorized Monte
Carlo; computational transport methods; advanced computer architectures, Monte
Carlo methods.
Selected Publications
Book
J. J. Duderstadt and W. R. Martin, Transport Theory, Wiley-Interscience,
New York (1979).
Chapters in Books
1. D.K. Wehe, J.C. Lee, W.R. Martin, R.C. Mann, W.R. Hamel, and J. Tulenko,
"Intelligent Robotics and Remote Sensing Systems for the Nuclear Industry,"
Nucl. Eng. Des. 113, 259 (1989).
2. W.R. Martin, "Monte Carlo Methods on Advanced Computer Architectures,"
Advances in Nuclear Science and Technology 22, 105-164 (1992).
Chapter in Electronic Textbook
W.R. Martin and P. Burns, "Introduction to Monte Carlo Methods," in the
electronic textbook High Performance Computing, Computational Science Education
Project, Department of Energy, http://csep1.phy.ornl.gov/csep.html, 1995.
Archival Journal Articles from 1990
1. W.R. Martin and G.C. Pomraning, "Monte Carlo Analysis of the
Backscattering of Radiation from a Sphere to a Plane,"
J. Quant. Spect. Rad. Transfer 43, 115-126 (1990).
2. J.L. Vujic and W.R. Martin, "Vectorization and Parallelization of a
Production Reactor Assembly Code," Prog. in Nuclear Energy 25, 291-305 (1991).
3. J.S. Koopman, W.R. Martin, et al., "Assessing Risk Factors for Transmission
of Infection," American Journal of Epidemiology 133, 1199-1209 (1991).
4. W.R. Martin, "Monte Carlo Methods on Advanced Computer Architectures,"
Advances in Nuclear Science and Technology 22, 105-164 (1991).
5. M.L. Adams and W.R. Martin, "Diffusion-Synthetic Acceleration of
Discontinuous Finite Element Transport Iterations,"
Nuclear Science and Engineering 111, 145-167 (1992).
6. C.T. Ballinger, J.A. Rathkopf, and W.R. Martin, "The Response History
Monte Carlo Method for Electron Transport,"
Nuclear Science and Engineering 112, 283-295 (1992).
7. C.T. Ballinger, D.E. Cullen, S.T. Perkins, J.A. Rathkopf, W.R. Martin, and
S.J. Wilderman, "Single-scatter Monte Carlo compared to condensed history
results for low energy electrons,"
Nuclear Instruments and Methods in Physics Research B72, 19-27 (1992).
8. W.R. Martin, "Undergraduate Education in Nuclear Engineering in the
USA," J. of Radioanalytical and Nuclear Chemistry 171, 183-192 (1993).
9. S.E. Aumeier, J.C. Lee, D.M. Cribley, and W.R. Martin, "Cross Section
Parameterization Using Irradiation Time and Exposure for Global Depletion
Analysis," Nuclear Technology 108, 299-319 (1994).
10. D.C. Losey, J.C. Lee, W.R. Martin, and T.C. Adamson, "Singular
Perturbation Solutions of the Neutron Transport Equation,"
Nuclear Science and Engineering 123, 68-85 (1996).
Conference Proceedings from 1990
1. R.E. Grove, W.R. Martin, J.C. Lee, A. Oukebdane, and M.C. Edlund,
"Preliminary Design and Analysis of a Slightly-Enriched Spectral Shift
Reactor,"
Int. Conf. on the Physics of Reactors: Operation, Design, and Computation,
Marseille, France (April 1990).
2. J.L. Vujic and W.R. Martin, "Two-Dimensional Collision Probability
Method with Anisotropic Scattering for Vector and Parallel Processing,"
Int. Conf. on the Physics of Reactors: Operation, Design, and Computation,
Marseille, France (April 1990).
3. W.R. Martin, "The Impact of Advanced Computers on Nuclear Reactor
Analysis,"
Third Int. Seminar on Finite Element and Allied Methods in Radiation Transport,
Imperial College, London (April 1990). [invited]
4. J.L. Vujic and W.R. Martin, "Vectorization and Parallelization of a
Production Reactor Assembly Code,"
Third Int. Seminar on Finite Element and Allied Methods in Radiation Transport,
Imperial College, London (April 1990).
