ATLAS Great Lakes Tier-2 (AGL-Tier2)
Shawn McKee (for the AGL Tier2)
University of Michigan
US ATLAS Tier-2 Meeting at Harvard
Boston, MA, August 17th, 2006

Shawn McKee - AGL-Tier2 (MSU/UM)

The AGL Tier2
The ATLAS Great Lakes Tier-2 (AGL-Tier2) is a joint effort of
Michigan and Michigan State Universities.

AGL Personnel Involved
We have a large group of faculty across MSU and UM who will be involved in related aspects of the AGL-Tier2
The table at the right shows the expected contribution by name and year

10GE Protected Network
We will have a single “/23” network for the AGL-Tier2
• Internally each site (UM/MSU) will have a /24
Our network will have three 10GE wavelengths on MiLR in a “triangle”
• Loss of any one of the three waves still leaves both sites connected
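
The addressing and redundancy plan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 10.10.0.0/23 block is a placeholder (the real prefix is not given on the slide), the assignment of each /24 to UM or MSU is arbitrary, and the third triangle vertex, called HUB here, is a hypothetical name.

```python
import ipaddress

# Hypothetical address block: the slide does not state the real /23.
TIER2_BLOCK = ipaddress.ip_network("10.10.0.0/23")

def split_per_site(block):
    """Split a /23 into two /24 subnets, one per site (order is an assumption)."""
    um, msu = block.subnets(new_prefix=24)
    return {"UM": um, "MSU": msu}

# The three waves of the triangle, written as undirected links.
# "HUB" is a made-up name for the third vertex.
WAVES = [("UM", "MSU"), ("MSU", "HUB"), ("HUB", "UM")]

def connected(links, a, b):
    """Return True if b is reachable from a over the given undirected links."""
    seen, frontier = {a}, [a]
    while frontier:
        node = frontier.pop()
        for x, y in links:
            for src, dst in ((x, y), (y, x)):
                if src == node and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return b in seen

def survives_single_failure(links, a="UM", b="MSU"):
    """Check that a and b stay connected when any one link is removed."""
    return all(
        connected([link for link in links if link != failed], a, b)
        for failed in links
    )

if __name__ == "__main__":
    print(split_per_site(TIER2_BLOCK))
    print(survives_single_failure(WAVES))
```

With a triangle, removing any single wave still leaves a two-hop path between the sites, which is what the check confirms.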

Existing MSU Cluster

Current UMROCKS Cluster
We have a 5-rack AMD Athlon cluster with 70 operational nodes (2000/2400/2600 dual processor, 2 GB RAM)
Two 100+GB disks
Plan to have ~100 nodes operational
ROCKS V4.1

UMROCKS Installation
ROCKS V4.1 with SLC V4.3 (32 bit) OS
Significant additional software: TWiki, Cacti, Inventory, Syslog, etc.
Cluster running dCache as well

Ganglia Info on Athlon Cluster
Currently ~70 operational nodes; plan for ~100 from parts

AGL-Tier2 TWiki

Cacti Graphing/Monitoring

AGL-Tier2 Inventory Software

Existing Servers/Services
In addition to the UMROCKS cluster we have a number of servers/services operational
Two gatekeepers: Dual Xeon 3.6 (2MB cache), 4GB RAM, Intel SE7520AF2 motherboards, IMM card (IPMI) called gate01/gate02.grid.umich.edu
AFS Cell atlas.umich.edu hosted on linat02/linat03/linat04.grid.umich.edu with file servers attera/linat06/linat07/linat08/atums1/atums2 (about 6TB)
NFS data servers umfs01/umfs02/linat09/linat10/linat11 hosting about 22TB total
Hypnos.grid.umich.edu is dCache headnode for UMROCKS
Have MonALISA node at ml-um.ultralight.org and other monitoring services
Oracle server on one of the “prototype” systems for Calibration/Alignment DB replication
Planned servers: NDT node, GridFTP, DQ2

OSG & ATLAS Software Status/Plans
As shown we have both AFS and NFS storage at our Tier-2
We plan to install software on AFS (good for read-only type data). OSG (0.4.1) and ATLAS software already in AFS (/afs/atlas.umich.edu)
ATLAS software is mirrored via Pacman on our AFS cell at: http://gate01.grid.umich.edu/am-UM/ATLAS.mirror
All users have their home space in AFS. Our system is set up to get a Kerberos TGT (and AFS tokens) at login via gssklog (instructions on TWiki)
All OSG accounts created with “uniqname” IDs

Prototype Opteron Cluster
Testbed for dual dual-core systems (Opteron 280s, 4GB RAM)

Prototype Building Block Details
We have purchased 5 dual dual-core Opteron 280 systems and an NFS storage server to test with.
Worker nodes are using Supermicro H8DAR-T (1U) motherboards (AMD 8132 chipset), 4GB of RAM, dual dual-core Opteron 280, three 250GB SATA-II hot-swappable drives, CDROM (4 cores/1U)
Disk server is a dual dual-core Opteron 280, 5U, 24 SATA-II (500GB) drives, dual 250GB system disks, 8GB of RAM, dual 1GE NICs, Areca 1170 RAID6 controller (11TB)
Need to test I/O scaling using NFS with ~20 clients / storage server
Possible to use 10GE NIC on storage server if network impacts the performance.
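
The storage figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes all 24 data drives sit in a single RAID6 set (two drives' worth of parity), uses decimal units, and divides raw link bandwidth evenly among clients with no protocol overhead; none of these assumptions are stated on the slide, and the function names are illustrative only.

```python
def raid6_usable_tb(drives, drive_gb):
    """Usable capacity of one RAID6 set: two drives' worth goes to parity."""
    return (drives - 2) * drive_gb / 1000

def per_client_mb_s(link_gbit, links, clients):
    """Even share of raw link bandwidth per NFS client, in MB/s (decimal)."""
    total_mb_s = link_gbit * links * 1000 / 8
    return total_mb_s / clients

if __name__ == "__main__":
    # 24 x 500GB drives in RAID6
    print(raid6_usable_tb(24, 500))    # 11.0, matching the 11TB quoted
    # dual 1GE NICs shared by ~20 clients
    print(per_client_mb_s(1, 2, 20))   # 12.5
    # a single 10GE NIC shared by ~20 clients
    print(per_client_mb_s(10, 1, 20))  # 62.5
```

Under these idealized assumptions each client gets at most about 12.5 MB/s over dual 1GE, which suggests why a 10GE NIC is kept as an option if the network turns out to be the bottleneck.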

Michigan State Space Planning

Proposal Numbers for AGL-Tier2
The table above is from our proposal. The FY06 numbers assumed full funding in FY06.
We need to discuss the US ATLAS needs and timescales to deploy resources for the AGL-Tier2.
One consideration is the availability of the long-term computer space at both MSU and UM
We have 59 CPU-years of Opteron time from our Center for Advanced Computing which we will dedicate to the AGL-Tier2 (gate02/torque)

Planning for AGL-Tier2 Profile
MSU and UM are both working on high-quality server spaces.
Michigan will share the Internet2/Merit space (MITC)
• 4MW of power for 2MW of cooling and 2MW of equipment (flywheels and generators)
• ~280 racks of space
• Lots of fiber access
The MITC space is scheduled to be ready March 2007
We have “interim” space in our college's server room IF we need to use it (up to 6 racks' worth), but this would require two moves.
MSU space will be ready on the same timescale (spring 2007)
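
As a rough illustration of what the MITC numbers imply, the sketch below estimates the average power budget per rack. It assumes the 2MW of equipment power is spread evenly over all ~280 racks, which the slide does not say; real allocations per tenant and per rack may differ.

```python
def kw_per_rack(equipment_mw, racks):
    """Average equipment power per rack in kW, assuming an even spread."""
    return equipment_mw * 1000 / racks

if __name__ == "__main__":
    # 2MW of equipment power over ~280 racks
    print(round(kw_per_rack(2, 280), 1))  # 7.1
```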

Summary
The AGL-Tier2 is very close to being able to provide cycles for US ATLAS.
The AGL-Tier2 should truly be a single “site” from US ATLAS’s point of view, even though our equipment and services are distributed between two campuses 60 miles apart. This is because of MiLR (10GE) and a common network address block.
We have some useful services running to help monitor and manage our equipment. If others are interested we are happy to share…
The amount of cycles and storage required should be discussed as well as the time profile so we can optimize our plans.
