1.  Michigan ATLAS Group Computing
      1. Motivation for Proposed Computing Facility
      3. Description of UMROCKS Cluster and Prototype Computing Facility
      3. Michigan-MSU Tier-2 proposal
      4. Grid and Network Research and Development at Michigan


 Michigan ATLAS Group Computing

The University of Michigan ATLAS Group is one of the largest groups in the ATLAS Collaboration and, as such, has made major contributions to the construction and commissioning of the ATLAS detector as outlined in this proposal. We expect to play a leading role in calibrating and validating the performance of the ATLAS muon system during the era of LHC operations. Our ultimate goal, of course, is to participate fully in the ATLAS physics analysis with particular emphasis on W/Z/Higgs events with final-state muons.

The basic ATLAS computing philosophy rests on a grid-linked system of tiered computing centers. This philosophy has emphasized the leveraged acquisition of computing hardware and operational personnel with little regard for mission-oriented computing activities. While our group, along with the ATLAS group at Michigan State, has submitted a strong Tier-2 proposal to US ATLAS, there is no guarantee that we will be selected. As responsible and dedicated scientists, we must plan to implement significant ATLAS computing resources independently of the outcome of future Tier-2 competitions. We count on our funding agencies to ensure that a major group such as our Michigan ATLAS group will be properly supported in acquiring and operating the necessary computational facilities. The idea that a group would make major detector contributions and be denied the opportunity to participate fully in the physics analysis of the experiment is fundamentally unacceptable.

We have continued to use the HYPNOS cluster acquired last year (now called UMROCKS) for significant simulation activities. UMROCKS is described in detail in the section below. During the last year we have had to deal with the age of UMROCKS: a significant amount of effort has gone into repairing and replacing failing components. Our plan for replacing this aging equipment, and our prototyping of new replacement equipment to support our efforts, are also described below.

Before describing our plans for this facility, we first motivate the scope and capability of required computing resources with a brief discussion of the kinds of calibration activities and physics analyses we intend to pursue in our group.


Motivation for Proposed Computing Facility

The Michigan ATLAS group includes physicists who have worked with and analyzed data from both the CDF and D0 detectors at Fermilab. This experience led naturally to our interest in fabricating the MDT chambers and to our desire to ensure that they function properly and yield good physics data. One of the many steps entailed in a physics analysis, namely calibration of the detector and validation of its performance, can be performed periodically using the Z->mu-mu and W->mu-nu events generated during the normal course of LHC running. As noted elsewhere, Michigan is now one of three ATLAS Muon Calibration centers, and as such we will require a significant amount of computational capacity to fulfill our responsibility in this area. In one month's running at standard luminosity some 100 million such events will be recorded. It will require 160 CPUs and 180 TB of storage to analyze this one month’s worth of events in one month’s time.
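
As a rough consistency check, the per-event budget implied by these figures can be worked out directly. The sketch below is illustrative only: the 30-day month and decimal terabytes (1 TB = 10^12 bytes) are assumptions made here, not values from the estimate itself, and the function names are hypothetical.

```python
# Back-of-envelope check of the calibration estimate quoted above.
# Source figures: 100 million events, 160 CPUs, 180 TB, one month.
# Assumptions (not from the proposal): 30-day month, 1 TB = 1e12 bytes.

SECONDS_PER_DAY = 86_400


def cpu_seconds_per_event(n_events, n_cpus, days=30):
    """CPU-seconds available per event if n_cpus run for the whole period."""
    return n_cpus * days * SECONDS_PER_DAY / n_events


def megabytes_per_event(n_events, storage_tb):
    """Storage per event in MB, using decimal units (1 TB = 1e6 MB)."""
    return storage_tb * 1e6 / n_events


if __name__ == "__main__":
    n_events = 100_000_000
    print(f"{cpu_seconds_per_event(n_events, 160):.2f} CPU-s per event")
    print(f"{megabytes_per_event(n_events, 180):.2f} MB per event")
```

Under these assumptions the quoted resources correspond to roughly 4 CPU-seconds and 1.8 MB of storage per event.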

This calibration and validation is just one step in any full-scale physics analysis involving the ATLAS MDT system. Work has already begun on a number of physics analyses as described elsewhere.

This co-existence of necessary and ongoing detector monitoring and physics analysis requires a much larger computational and storage facility than that exemplified above. Further, such a system will require ongoing attention in and of itself, both to keep the hardware functioning and to maintain an up-to-date and secure software environment. One-half FTE of manpower will be dedicated to these tasks, supervising hourly help for low-level work and coordinating between the analysis personnel and the ATLAS software management staff to ensure a smoothly functioning system with minimal disruptions.

Some of this computational load can in principle be served by the Tier-1 and Tier-2 US-ATLAS centers, and if we are selected as one of the two remaining Tier-2 sites, Michigan would be a logical place to carry out this work. Even so, Tier-2 resources are controlled by a US ATLAS Resource Allocation Committee (RAC), and we will still need local resources we can control to meet our group's computing needs.

While grid computing promises to deliver resources to the whole collaboration based upon prioritized policies, this type of capability is far from being realized and may not be effective until well after LHC startup. The only way we can be certain of meeting our needs is through use of our own dedicated, local resources.


Description of UMROCKS Cluster and Prototype Computing Facility

The UMROCKS computing cluster, formerly an NPACI resource, has been transferred to the Physics department into an appropriately modified laboratory space (2268 West Hall). The cluster provides ~100 nodes of dual AMD Athlon MP 2000 (and 2400) processors, 2 GB of RAM per node, two 100 GB hard disks per node, a 1 TB RAID array and a Fast Ethernet switch for interconnections. While this cluster is over three and a half years old, it is still a powerful computing resource capable of running ATLAS simulation, reconstruction and analysis jobs.

UMROCKS has given us extensive experience in system and cluster management. We have learned a number of lessons about reliability and robustness that we have integrated into our operations and future equipment planning. Understanding how best to organize and manage storage, networks and computing resources is critical to delivering a successful infrastructure to support our ATLAS work.

Because of the age of the UMROCKS nodes and the ever-increasing demand for computational power and storage required to meet our ATLAS computing needs, we are planning to acquire and deploy equipment in constant dollars per year. The University has been very supportive of our needs and is doubling our DOE computer equipment funding to help us acquire these resources. This model provides newer, more capable equipment each year, both to meet expanding demands and to replace outdated or failing equipment. By spreading out the purchases we are able to reap the benefits implicit in Moore’s law: increases in capability for constant dollar amounts.

The following table shows the cost of providing 16 dual-processor nodes per year (the current equivalent is a dual dual-core Opteron 285, 1U rack mount, with 8 GB RAM) and 4 "storage" nodes per year (disk servers, 11 TB / node in FY2006), as well as funds for parts (materials and supplies, M&S).

Year     CPUs      Storage   M&S      Equip Tot
2006-7   $77,800   $67,200   $5,000   $150,000
2007-8   $77,800   $67,200   $5,000   $150,000
2008-9   $77,800   $67,200   $5,000   $150,000
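
The line items can be cross-checked and turned into implied unit costs. The sketch below assumes that each line item covers exactly the nodes listed in the text (16 compute nodes and 4 storage nodes per year, 11 TB per storage node in FY2006); the variable and function names are illustrative, and the unit costs are derived here rather than quoted from any vendor.

```python
# Yearly equipment budget from the table above (US dollars).
YEARLY_BUDGET = {"CPUs": 77_800, "Storage": 67_200, "M&S": 5_000}

# Quantities stated in the text accompanying the table.
COMPUTE_NODES_PER_YEAR = 16
STORAGE_NODES_PER_YEAR = 4
TB_PER_STORAGE_NODE_FY2006 = 11
N_YEARS = 3


def yearly_total(budget):
    """Sum of all line items for one year."""
    return sum(budget.values())


def unit_costs(budget):
    """Implied cost per compute node, per storage node, and per TB (FY2006)."""
    per_compute = budget["CPUs"] / COMPUTE_NODES_PER_YEAR
    per_storage = budget["Storage"] / STORAGE_NODES_PER_YEAR
    per_tb = per_storage / TB_PER_STORAGE_NODE_FY2006
    return per_compute, per_storage, per_tb


if __name__ == "__main__":
    total = yearly_total(YEARLY_BUDGET)
    per_compute, per_storage, per_tb = unit_costs(YEARLY_BUDGET)
    print(f"Yearly total:       ${total:,}")
    print(f"Three-year total:   ${total * N_YEARS:,}")
    print(f"Per compute node:   ${per_compute:,.2f}")
    print(f"Per storage node:   ${per_storage:,.2f}")
    print(f"Per TB (FY2006):    ${per_tb:,.2f}")
```

With these inputs the three line items sum to $150,000 per year, or $450,000 over three years, and the implied costs are about $4,860 per compute node, $16,800 per storage node and roughly $1,530 per TB in the first year.
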

The following table lists the capability we will acquire as a function of year. Note that for storage we list the amount acquired each year in parentheses after the integrated total.

Year     Nodes   Node(TB)   Si2K   Disk(TB)      StorNodes
2006-7   16      12         100    44            4
2007-8   32      25         226    106.2(62.2)   4(8)
2008-9   48      43         391    194.2(88)     4(12)

This plan delivers the basic required resources for Michigan ATLAS computing.
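
The integrated totals in the capability table follow from the yearly acquisitions by a running sum. The sketch below reproduces the Nodes and Disk(TB) totals under that reading; the yearly values are taken from the table and the text (16 compute nodes and 4 storage nodes per year), and the helper name is hypothetical.

```python
from itertools import accumulate

YEARS = ["2006-7", "2007-8", "2008-9"]

# Amounts acquired in each year, as read from the table and text.
COMPUTE_NODES_ACQUIRED = [16, 16, 16]
STORAGE_NODES_ACQUIRED = [4, 4, 4]
DISK_TB_ACQUIRED = [44.0, 62.2, 88.0]  # first-year value, then the parenthesized ones


def integrated(acquired, ndigits=1):
    """Running total of yearly acquisitions, rounded to avoid float noise."""
    return [round(total, ndigits) for total in accumulate(acquired)]


if __name__ == "__main__":
    nodes = integrated(COMPUTE_NODES_ACQUIRED, 0)
    stor_nodes = integrated(STORAGE_NODES_ACQUIRED, 0)
    disk = integrated(DISK_TB_ACQUIRED)
    for year, n, s, d in zip(YEARS, nodes, stor_nodes, disk):
        print(f"{year}: {n} nodes, {s} storage nodes, {d} TB disk")
```

The running sums give 16, 32 and 48 compute nodes and 44, 106.2 and 194.2 TB of disk, matching the table. For storage nodes the running totals are 4, 8 and 12, which correspond to the values shown in parentheses in the StorNodes column.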

We are prototyping a subset of this equipment to validate its suitability for our use. In May 2006 we acquired 1 storage node and 4 dual dual-core processor nodes for testing, and we plan to have testing completed by the end of July, in time to guide our 2006-2007 purchases.


Michigan-MSU Tier-2 proposal

As mentioned, our Michigan ATLAS group, along with the Michigan State ATLAS group, has proposed to be one of the final two US ATLAS Tier-2 sites. We have leveraged over $4M in institutional support over 5 years to deliver a strong proposal to US ATLAS.

If we are selected, Shawn McKee would be Co-Director (at 50% time) and Bob Ball would be the joint Tier-2 Manager (at 100% time).

Our Tier-2 would be based upon the same type of systems outlined above and would benefit from excellent networking, strong institutional support and significant local computational, grid, simulation and analysis expertise. Because of MiLR (Michigan Lambda Rail), our Tier-2 site would be unique in having redundant 10 Gigabit Ethernet links between our distributed Michigan-MSU sites and Chicago.


Grid and Network Research and Development at Michigan

Michigan has played a significant role in networking and grids for ATLAS. Shawn McKee is the US ATLAS networking manager, co-chairs the Internet2 High-Energy and Nuclear Physics (HENP) Working Group and is a founding co-chair of a new Open Science Grid Networking Technical Group. From 2002 to 2005 he was the technical lead on the NSF Middleware Initiative (NMI) testbed at Michigan, and he is continuing to participate in SURAGrid. He is chairing the LHC-OPN monitoring effort, which is focused on providing monitoring and measurement of the network connecting the world-wide LHC Tier-1 centers with CERN (Tier-0). He is also part of the Michigan MGRID project, which is developing grid middleware and techniques for easy access to distributed grid resources. Perhaps the greatest impact for ATLAS in the grid and network area will come from three newly funded projects on which McKee is Co-PI:

 
· UltraLight: A $2 Million NSF ITR funded by MPS (Mathematical and Physical Sciences) which is exploring advanced networking infrastructure in support of LHC-scale physics. In addition to a number of CMS collaborators, Michigan, Brookhaven and SLAC are participants. The final two years of UltraLight are focused on integrating new capabilities from this project into the computing models of ATLAS and CMS.
· TeraPaths: A DOE/MICS funded project at BNL, with participation of Michigan, concentrating on developing MPLS/QoS capabilities for the Tier 1, Tier 2 and eventually Tier 3 computing centers for LHC. This work is being directly integrated into the Tier 1 efforts and we are closely coupled to ESnet and other HEP sites.
· GridNFS: A $1.2 Million NSF NMI Development project to create a “grid-aware” version of NFS (Network File System) based upon the newly developed NFS V4 standard. We have an agreement to test our project within OSG and plan to work closely with the OSG Storage working group and CERN to make GridNFS accessible to LHC projects.
