A Proposal Submitted to the National Science Foundation 2004 ITR Program
NSF 04-012; February 25, 2004
GECSR:
A Global Grid-Enabled Collaboratory for
Scientific Research
Julian Bunn (co-PI), Philippe Galvez
California Institute of Technology
Daniel Elvira, Daniel Green, John Womersley
Fermi National Accelerator Laboratory
Heidi Alvarez
Florida International University
Deborah A. Agarwal
Lawrence Berkeley National Laboratory
Raymond Brock
Michigan State University
Yasar Onel, Shaowen Wang
University of Iowa
Nicholas Hadley, Shuichi Kunori (co-PI), Andris Skuja
University of Maryland
Eric Dey, Tom Finholt (co-PI), Steven Goldfarb, Homer A. Neal (PI), Charles Severance
University of Michigan
Kaushik De
University of Texas at Arlington
Project Summary
ITR: A Global Grid-Enabled Collaboratory for Scientific Research
The High Energy and Nuclear Physics (HENP) communities face unprecedented challenges as they seek to
effectively execute the next generation of experiments involving hundreds to thousands of collaborators around the
world. The very success of these experiments will critically depend on sustaining vigorous geographically
distributed collaborations that are coherent, efficient and effective while accessing, processing, and sharing
Petabyte-scale data. This has never been done before and there are serious concerns that the science emerging from
these projects will be limited by the absence of robust tools to make such large-scale, highly interactive
collaborations possible. We seek to address these challenges by developing and deploying a collaborative
environment, called the Grid-Enabled Collaboratory for Scientific Research (GECSR), that both integrates existing
technologies and develops innovative new technologies that will be readily adopted by the target communities.
GECSR will combine the best open-source applications from within the HENP communities (and from allied Grid
and National Middleware efforts) within a common portal interface. To ensure that the capabilities of the GECSR
meet the needs of target users, we will follow an iterative development model that in each cycle will include
systematic assessment of user requirements, tool development and deployment consistent with requirements, and
evaluation to determine whether deployed tools will satisfy requirements. A distinguishing feature of this approach
is that the combination of expertise in physics, computer science, and social science will greatly enhance the chances
of both technological and sociological success. Institutions in our proposal have established track records of
international leadership in each of these areas.
The recent report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure identifies five key service
categories that will provide a foundation for the comprehensive knowledge environments that will enable
individuals, teams and organizations to revolutionize scientific practice. The HENP community, working with
computer science and communities in astrophysics and other disciplines, has addressed four of these services, based
largely on common underlying middleware, in the widespread deployment of grid-enabled high performance
computing resources, and through several data grid projects. The proposed Grid-Enabled Collaboratory for
Scientific Research will provide the collaboration services, the fifth of these service elements of cyberinfrastructure,
that are required for the HENP community to fully realize a functionally complete environment, with the potential to
significantly transform the conduct of research.
Broader Impact
This proposal is distinguished by a tight integration between 1) the science of collaboratories, 2) a globally scalable
working environment built on the foundation of a powerful, fully functional set of working collaborative tools, 3) an
agent-based monitoring and decision-support system that will allow collaborating scientists to perform data-intensive
analysis tasks efficiently, and 4) an education and outreach agenda that is inclusive of minorities and
reaches high-school physics programs through vehicles such as QuarkNet and CHEPREO and success models such
as ThinkQuest. Assessment of the methodology of scientific collaborations and the iterative evaluation of the tools
by a team independent of the developers will be a critical element in ensuring the success of the proposed work and
its value beyond HENP.
The broader implications of this proposal, and its mission to grant full partnership to groups in all world regions, are
driven by (1) the right of groups and governments that have contributed to the global Collaborations to share and
collaborate in the data analysis, and thus in the process of search and discovery that is at the root of all basic
research; (2) the need to justify global projects and global Collaborations as a valid and vital means of conducting
future leading-edge scientific research and other endeavors, without exploitation of the poorer nations and world
regions; and (3) the need to develop effective means of collaboration and cooperation among subgroups with vastly
different academic and cultural backgrounds, and especially to allow effective collaboration by university-based
students in the native cultural milieu of their home countries. Six percent of this proposal’s budget is allocated to
Education and Outreach that specifically targets minority populations: this should produce a sea change in how high
energy physics experimentation, and the necessary global collaboration, impacts the quality of science education and
addresses the needs of underserved populations across the US, and around the world.
ITR: A Global Grid-Enabled Collaboratory for Scientific Research
The High Energy and Nuclear Physics (HENP) communities face unprecedented challenges as they seek to
effectively execute the next generation of experiments involving hundreds to thousands of collaborators around the
world. The very success of these experiments will critically depend on sustaining vigorous geographically
distributed collaborations that are coherent, efficient and effective while accessing, processing, and sharing
Petabyte-scale data. This has never been done before and there are serious concerns that the science emerging from
these projects will be limited by the absence of robust tools to make such large-scale, highly interactive
collaborations possible. We seek to address these challenges by developing and deploying a collaborative
environment that both integrates existing technologies and develops innovative new technologies that will be readily
adopted by the target communities. That is the purpose of the proposal presented herein.
The HENP communities have a strong history of innovative use of information technology to support collaborative
activity, as indicated by HENP leadership in development of the world’s most advanced Grid-based systems
[1,2,3,4], in deployment of scalable videoconferencing tools [5], and in the use of Grid-enabled analysis
environments [6]. However, despite this important progress, there is no unifying collaborative framework that links
these key capabilities together. Recent work has shown that heterogeneous collaborative tools – i.e., where
physicists must shift from one application to another as they perform different tasks – can introduce new sources of
complexity that impede rather than aid collaborations [7]. We will seek to avoid this problem by adopting a holistic
design approach that involves representatives from the end-user community in all stages of the research,
development and deployment processes.
We propose to produce an integrated collaborative environment, called the Grid-Enabled Collaboratory for
Scientific Research (GECSR), that will combine the best open-source applications from within the HENP
communities (and from allied Grid and National Middleware efforts) within a common portal interface. To ensure
that the capabilities of the GECSR meet the needs of target users, we will follow an iterative development model
that in each cycle will include systematic assessment of user requirements, tool development and deployment
consistent with requirements, and evaluation to determine whether deployed tools will satisfy requirements. A
distinguishing feature of this approach is that the combination of expertise in physics, computer science, and social
science will greatly enhance the chances of both technological and sociological success. Institutions in our proposal
have established track records of international leadership in each of these areas.
The significance of the GECSR effort is underlined by the recent report of the NSF Blue-Ribbon Advisory Panel on
Cyberinfrastructure [8], which identified five key service categories that form the foundation for comprehensive
knowledge environments to enable individuals, teams and organizations to revolutionize scientific practice. These
include: high performance computation services; data, information, and knowledge management services; observation,
measurement, and fabrication services; interfaces and visualization services; and collaboration services. The HENP
communities have generated coordinated activity within the first four of these categories, primarily through: the use
of common underlying middleware; widespread deployment of grid-enabled high performance computing resources;
data grid projects for management of data, information and knowledge [9]; instruments that can be monitored
remotely through eLogs and other interfaces; and development of the Grid-enabled Analysis Environment [6]. In
contrast, progress within the collaboration services category has not been at a comparable level. As a result, there is
some risk that the value of the important work done in the other areas may be diminished if physicists lack highly usable and
standard collaborative interfaces that tie together the disparate computational, data, and instrument resources. The
proposed GECSR fills this gap by creating a coherent community-sanctioned activity within the collaboration
services category that is equal in caliber and level of coordination to ongoing efforts within the other four categories.
HENP’s Collaboration Challenges
The major HENP experiments of the next twenty years will break new ground in our understanding of the
fundamental interactions, structures and symmetries that govern the nature of matter and spacetime in our universe.
Among the principal goals at the high energy frontier are to find the mechanism responsible for mass in the
universe, to discover the “Higgs” particles associated with mass generation, to understand the unification of the
fundamental forces of nature, to search for new symmetries and extra dimensions, and to explore the fundamental
mechanism that led to the predominance of matter over antimatter in the observable cosmos.
The largest collaborations today, such as CMS [10] and ATLAS [11], which are building experiments for CERN’s [12]
Large Hadron Collider (LHC) [13] program, each encompass of the order of 2000 physicists from 150 institutions
in more than 30 countries. Each of these collaborations includes 300-400 physicists in the US, from more than 30
universities as well as the major US HEP laboratories. The LHC experiments are expected to begin collecting data in
2007, with a high level of collaborative activity required up to that date and well beyond. The current generation of
experiments now in operation and taking data at SLAC [14] and Fermilab (D0 [15] and CDF [16]) are similar in scale
to the US contingent of the next-generation experiments. Each of these experiments faces unprecedented challenges
in terms of:
• The global extent and multilevel organization of the physics collaborations, leading to the need to
collaborate and share data-intensive work in fundamentally new ways.
• The data-intensiveness of the work, where the data volume to be processed, distributed, accessed and
analyzed by a major experiment is in the Petabyte (10^15 Bytes) range now, and is expected to rise to the
Exabyte (10^18 Bytes) range within the next ten years.
• The complexity of the data, particularly at the LHC where the physics discovery potential is related to the
very high intensity (luminosity) as well as the high energy of the collisions, such that ~20 interactions
accompany the particle interaction of interest.
The new paradigm of “Grids” and grid computing [17] is thought to hold the key to meeting the collaboration,
computing and data-management needs of HENP and there are significant efforts underway to explore and develop
the grid toolkits and middleware that will be required for success in HENP. The complex problems of connecting
and enabling resources like networks, computers and storage certainly justify the attention and support from the
funding agencies and the HENP communities. However, these purely technical issues are equaled in complexity by
the socio-technical problems of effectively and efficiently connecting and enabling physicists to do their science
using emerging cyberinfrastructure – and these topics have not received as much attention from HENP physicists or
the agencies. There is considerable risk in allowing this discrepancy to persist and the primary goal of this proposal
is to redress this situation.
A particular problem for the organization and conduct of cyberinfrastructure-based physics is that past practice and
experience provide only limited guidance for what should be done in the future. Up to the present, HENP physicists
have performed experiments and analyses in tightly coupled cooperating groups. Day-to-day research was typically
conducted in smaller teams (15 physicists) that worked closely together and then shared their results within a larger
overarching collection of teams for verification and further analysis. The scale of these enterprises, combined with
frequent geographic concentration of collaborators, meant that communication could be organized primarily through
informal and formal face-to-face contact (e.g., hallway encounters or lab meetings). By contrast, the worldwide
scope and large size of newer HENP collaborations, such as CMS and ATLAS, make dependence on face-to-face
communication unrealistic. The geographic dispersion of these next-generation collaborations highlights the need for
novel tools and work practices. For example, while experiments may run twenty-four hours a day, physicists are
most likely to be active during working hours in their local time zones. Working together under these circumstances
will require collaboration tools that support both synchronous and asynchronous activity.
To summarize, we argue that a focused effort to develop a robust collaborative environment for large-scale HENP
experiments is critical, because physicists will receive their primary exposure to the other components of
cyberinfrastructure via collaboration services – and if these services are deficient, then the potential benefit of the
other capabilities will be reduced. The GECSR, then, is an effort to create a collaboration services element that is
equal to the outstanding work already accomplished and underway in other physics cyberinfrastructure projects.
Approach and Architecture
To ensure the highest probability of meeting user needs, we will employ an iterative development strategy that
combines assessment, development, deployment, and evaluation within each release cycle [18,19,20,21,22]. Our
goal is the creation of a common portal interface that provides a standard look and feel for uniform access to a
diverse array of the best open-source applications from within the HENP communities, and from related Grid and
NMI efforts. The basis for the GECSR portal interface will be the Open Grid Computing Environment (OGCE)
[23]. The advantage of the OGCE is that it provides a common context to plug in specialized capabilities – such as
features specific to the HENP communities – alongside a standard set of collaboration tools and services (e.g.,
announcement capabilities, persistent chat, shared calendars, role-based access control, and threaded discussions).
At the University of Michigan, a variant of the OGCE is currently in use as an enterprise-wide course management
system with approximately 30,000 users per term. In the OGCE, the common context to add specialized features is
achieved through the use of open-source components and standards, such as JSR 168, a Java standard that defines
elements called portlets for delivery of applications through portals. An advantage of the portlet approach, as
implemented in OGCE, is that portlets are decomposed into their presentation components supported by persistent
services accessed via a standardized API. In addition, the OGCE optionally extends login authentication to use the
Grid [24] as an authentication provider, such that a proxy credential can also be stored by OGCE to allow portlets to
perform Grid operations.
Users will organize activity in the GECSR portal environment through sessions, or combinations of people, data,
and communication that are tracked by the system. For example, in the case of a research group with ongoing
meetings, a new session will begin by calling up archived data, metadata, and tools that the group has used in the
past. A key feature of any session is that resources available to users will be consistent with preferences and
access/authorization levels of everyone involved in the session. For instance, a session will not employ a given
capability, such as audio recording, unless a group decision rule is satisfied (e.g., majority approval or unanimous
approval). Sessions will consist of both synchronous and asynchronous activity. In the synchronous setting,
participants in a session will be able to access a wide array of communication capabilities, including video and audio
conferencing, supplemented by the ability to view and control common applications, such as text editors, data
analysis tools, or graphics viewers. In the asynchronous setting, users will be able to access previous sessions and to
receive notifications when aspects of a session have changed, such as the addition of new or revised data. To aid
retrieval of session content, the GECSR will use automatic archiving, where all versions of artifacts used in a
session, such as documents, will be captured and cataloged.
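As an illustration of the group decision rule described above, the following minimal Python sketch shows how a session might gate a capability such as audio recording on majority or unanimous approval. The class and method names (Session, Rule, approve, is_enabled) are hypothetical and are not part of OGCE or any existing GECSR component; treating a capability with no registered rule as unrestricted is likewise an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Rule(Enum):
    MAJORITY = "majority"
    UNANIMOUS = "unanimous"


@dataclass
class Session:
    """Hypothetical session object: people plus per-capability decision rules."""
    participants: List[str]
    # capability name -> rule that must be satisfied before it is enabled
    rules: Dict[str, Rule] = field(default_factory=dict)
    # capability name -> participants who have approved it so far
    approvals: Dict[str, Set[str]] = field(default_factory=dict)

    def approve(self, capability: str, participant: str) -> None:
        if participant not in self.participants:
            raise ValueError(f"{participant} is not in this session")
        self.approvals.setdefault(capability, set()).add(participant)

    def is_enabled(self, capability: str) -> bool:
        rule = self.rules.get(capability)
        if rule is None:
            return True  # assumption: no rule registered means unrestricted
        votes = len(self.approvals.get(capability, set()))
        total = len(self.participants)
        if rule is Rule.UNANIMOUS:
            return votes == total
        return votes > total / 2  # strict majority
```

In this sketch a three-person session with a majority rule on audio recording would enable recording once two participants have approved, while a unanimous rule would require all three.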
GECSR sessions will draw on the features of leading collaboration tools – many developed by or with the help of
personnel involved with this proposal. A key motivation for selection of these tools, rather than comparable
commercial products, is that the applications proposed for the GECSR are open source and in the public domain.
This distinction is critical because the costs of commercial collaboration tools, such as WebEx, Microsoft Office
Live Meeting (formerly Placeware), or Xerox Docushare, represent a significant barrier to widespread adoption
and use. Further, customization of these off-the-shelf products is unlikely due to proprietary control of the
underlying software. Therefore, for synchronous audio and video communication, the GECSR will employ both the
Virtual Rooms Videoconferencing System (VRVS) [5] and the Access Grid [25]. VRVS, installed on 26,000
machines with over 8200 registered users in 99 countries, provides a worldwide multipoint connection service that
allows users to work together with different protocols (SIP, H.323, Mbone, Access Grid, MPEG-2, and MPEG-4),
operating systems, and applications. The VRVS user base is growing at a rate of roughly one new user per hour. The
Access Grid is a specific implementation of high-quality audio hardware with large video displays that allows
natural conversations in group-to-group interactions. VRVS interoperates with the Access Grid and provides
scalable access to an AG along with interfacing of H.323 hardware. For sharing and viewing common applications
in real time, GECSR will use the Virtual Network Computing (VNC) system. For secure presence awareness and
instant messaging, the GECSR will use the Pervasive Collaborative Computing Environment (PCCE) [26], which
includes Jabber eXtensible Messaging and Presence protocols as well as collaborative computational workflow
tools. For asynchronous playback of audio, video, and PowerPoint slides, the GECSR will use the Syncomat
application developed for the Web Lecture Archive Project [27]. For access to stored data, the GECSR will use a
peer-to-peer file sharing system based on a reliable and secure group communication protocol [28]. Finally, to
automatically configure resources within sessions, the GECSR will use the Monitoring Agents in A Large Integrated
Services Architecture (MonALISA) system [29], a monitoring framework and multithreaded, auto-discovering
services architecture.
We plan to integrate the tools described above into a unified system with the following novel features:
• Persistent Collaboration. Object persistence is a critical attribute of a collaborative environment. Without
the ability to archive and make available the documents, whiteboards, conversations and interactions that
occur in the course of an ongoing collaboration we lose one of the most transforming aspects that collaborative
environments could provide. Persistent documents and interactions in collaborative sessions create a form
of group memory, keeping otherwise transient events or results available, not only for those who
participated in the original session, but any others who may need to find such results.
• Language of access. We need to provide the users of the GECSR with a language of access to allow each
user to define the circumstances under which various collaborators, groups or agents can interrupt them and
at what level. The language should be rich enough to allow definition of automated ways of responding
under specified circumstances and must encompass dynamic definition of new groups or interactions. This
language should also extend to allowing authorization decisions for persistent objects and the context-sensitive
use of collaborative tools, as well as use of tools consistent with group preferences and norms
(i.e., as expressed through group decision rules). Such a capability is required to assure the users that they
will have the flexibility to map social and organizational policies from existing collaborations to
GECSR-enabled collaborations.
• Human-System-Human as well as Human-Human interactions for collaborative work.
A customizable dynamic environment that archives interactions and allows dynamic creation of groups
could quickly produce more data and metadata than a typical user can process. Providing agent-assisted
collaboration as an integral part of the GECSR will be critical to giving users the capabilities they will need
to find information, join groups and interact with colleagues. In addition, monitoring tools can aid
collaborations by automatically optimizing communication channels, such as in videoconferences. Finally,
for data-intensive transactions, agents can help users determine optimal strategies for moving and viewing
data.
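To make the "language of access" feature above more concrete, the following Python sketch shows one possible shape for such rules: each rule names a group, an optional user status, and the level at which members of that group may interrupt. Everything here (Level, AccessRule, allowed_level, the group and status names, and the first-match-wins evaluation order) is an illustrative assumption rather than a design adopted by the proposal.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional, Set


class Level(IntEnum):
    BLOCK = 0      # never interrupt
    NOTIFY = 1     # passive notification only
    INTERRUPT = 2  # may open a chat or call immediately


@dataclass(frozen=True)
class AccessRule:
    group: str             # group the requester must belong to, or "*" for anyone
    status: Optional[str]  # user status the rule applies to; None matches any status
    level: Level


def allowed_level(rules: Iterable[AccessRule],
                  requester_groups: Set[str],
                  status: str,
                  default: Level = Level.NOTIFY) -> Level:
    """Return the level granted by the first matching rule (first match wins)."""
    for rule in rules:
        group_ok = rule.group == "*" or rule.group in requester_groups
        status_ok = rule.status is None or rule.status == status
        if group_ok and status_ok:
            return rule.level
    return default


# Example policy: the shift crew may always interrupt; everyone else is
# blocked while the user is in a meeting and falls back to the default otherwise.
policy = [
    AccessRule("shift-crew", None, Level.INTERRUPT),
    AccessRule("*", "in-meeting", Level.BLOCK),
]
```

A richer language would also need to cover automated responses, dynamically defined groups, and authorization for persistent objects, which this sketch does not attempt.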
The importance of assessment and evaluation
Because collaborative tools intersect with individual, group, and organizational work practices, a critical goal for
successful development and deployment is identification and resolution of key socio-technical problems.
Preliminary studies of the ATLAS Collaboration [7], for example, show that dissatisfaction with currently deployed
technologies reflects the mismatch between expectations and performance, such as the difference in ease
between face-to-face meetings and videoconferences. GECSR’s effectiveness depends on its ability to provide
capabilities essential to the HENP community. To guide and improve the alignment of performance with
expectations, the GECSR team will include researchers with expertise in human-computer interaction (HCI) who
can distill common themes from diverse and complicated social and technical feedback. Understanding these social
and technical themes will produce two critical benefits for the GECSR project. First, the HCI analyses will help
identify the functionality the various GECSR systems must provide to meet the requirements of the HENP
community. Second, the HCI analyses will play a role in the adoption of GECSR by providing project visibility to
target users very early in the design process, managing expectations, and keeping the end users engaged. The HCI
effort falls into two categories of activity: assessment and evaluation. Assessment activity can be thought of as
elicitation of requirements from physicists. Evaluation activity can be thought of as measurement of satisfaction
with deployed systems.
To meet the goals of the assessment effort, we will use multiple, complementary methods. Specifically, we propose
to combine a community-based survey with selected intense observation of physicists at work. Surveys are a
well-understood mechanism for efficiently obtaining data about preferences in a larger population through extrapolation
from a relatively small number of randomly selected respondents. The sample for the GECSR user survey will be a
stratified random selection of physicists, students and technicians involved in the ATLAS, CMS, D0, and CDF
collaborations. Items included in the questionnaire will cover: a) communication and collaboration (e.g., frequency
and mode of communication by location of collaborators; size and geographic distribution of participants within
primary collaborations); b) barriers and challenges to communication and collaboration (e.g., coordination loss,
delays, miscommunication, information overload); c) practice (e.g., primary field of research; allocation of time
across research activities); d) expectations for the GECSR (e.g., new opportunities for collaboration; better access to
experimental data; new ways to incorporate research data in the classroom); e) priorities for GECSR capabilities
(e.g., access to data repositories; remote participation in meetings and experiments; collaborative data visualization);
and f) policies for data and instrument use (e.g., safety; data access; intellectual property). The aim of the GECSR
survey activity will be to produce snapshots of needs and attitudes within the HENP community of users at the start,
midpoint, and end of the GECSR project.
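The stratified random selection described above can be sketched in a few lines of Python. The choice of strata (collaboration and role), the proportional sampling fraction, and the function name stratified_sample are assumptions made for illustration; the proposal does not specify how strata or sample sizes will be defined.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction of members, at random, from every stratum.

    population -- iterable of members (any objects)
    stratum_of -- function mapping a member to a sortable stratum key
    fraction   -- share of each stratum to sample, between 0 and 1
    seed       -- optional seed so that a draw can be reproduced
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Take at least one respondent so small strata are still represented.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical roster: (name, collaboration, role) tuples.
roster = (
    [(f"p{i}", "ATLAS", "physicist") for i in range(40)]
    + [(f"s{i}", "CMS", "student") for i in range(20)]
    + [(f"t{i}", "D0", "technician") for i in range(2)]
)
respondents = stratified_sample(roster, lambda m: (m[1], m[2]), fraction=0.25, seed=1)
```

Sampling within each stratum keeps small groups, such as technicians on one experiment, from being missed by a purely random draw over the whole roster.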
In contrast to the broad perspective provided by the survey data, we propose to periodically observe activity and
interview participants at 35 sites from U.S. institutions involved in the CMS and ATLAS collaborations. Data
collected within these settings will help illuminate variation in user needs across different social and institutional
arrangements (e.g., a large research university versus a small liberal arts college). Specifically, systematic
observation is often the best way to detect the causes of low adoption of deployed technologies – as in Ruhleder
and Star’s [30] analysis of Worm Community System (WCS) use by C. elegans researchers or Orlikowski’s [31]
analysis of Lotus Notes use by employees of a consulting firm. In both cases, technically elegant tools failed
because key features did not match critical features of work organization, such as criteria for promotion or reward
systems. The aim of the observational and interview activity will be to generate descriptions of processes that either
enhance or impede adoption of GECSR applications. As with the survey effort, these data will be used to shape
deployment and development priorities – and serve as input to the evaluation effort.
In the iterative development scheme proposed for GECSR, the evaluation component provides closure on each
release cycle – and provides input for the next cycle – by offering feedback on user satisfaction and performance
related to GECSR use. We propose to measure user satisfaction in terms of subjective reports and objective
indicators. We plan to obtain subjective reports of user satisfaction through questionnaire items included on the
second and third wave administrations of the community survey (described previously). These items will focus on:
a) interface design (e.g., clarity of interface elements; function of icons); b) usability (e.g., do actions have expected
consequences; are there predictable failure modes; is system help sufficient); c) functionality (e.g., does an
application do what a user wants; are there missing capabilities); d) graphical layout (e.g., do plots show standard
units; do applications consume appropriate screen real estate); and e) performance (e.g., did the application update
promptly; did data transfer reliably). Objective indicators of satisfaction will include usage logs, where use levels
are assumed to correlate positively with satisfaction.
In terms of performance related to GECSR use, we propose to use a variety of measures. First, because a target of
GECSR development is collaboration, we will focus on the structure of collegial networks before and after GECSR
introduction as a marker of GECSR impact on the extent and diversity of network ties. Newman et al. [32]
have developed a number of techniques for analyzing and visualizing graphs of extensive networks of association
among physicists (e.g., using co-authorship). We will employ similar techniques to characterize networks among
the population of scientists within the ATLAS, CMS, D0, and CDF collaborations. For example, a critical question
will be the degree to which GECSR use results in more diverse collaborations, measured both in terms of geographic
dispersion and in terms of research focus. In addition, we will be interested in whether GECSR use provides a
greater capacity for collaboration, as measured by the number and size of primary collaborations. Second, because
we expect GECSR use to enhance the significance of collaborations, we will examine subjective and objective
measures of collaboration impact related to use. For instance, subjective measures might include questionnaire
items about the self-reported importance of a collaboration, the likelihood of future collaborations with the same
collaborators, and the quality of the collaboration experience (e.g., trustworthiness of collaborators). Objective
measures of impact might include the output from a collaboration, such as the number of papers and reports, and
the visibility of this output. Finally, because one of the anticipated benefits of collaboratory-based science is a
leveling of access to experts, instruments, and data, we will examine whether the GECSR expands
participation in physics research. That is, beyond the focal audience of users – the physicists in the large
collaborations – we will look for evidence that GECSR tools create new opportunities for engagement by faculty at
non-PhD institutions, by undergraduates, by K-12 students, and by the general public. Specifically, we plan to
produce case descriptions of novel forms of involvement by new players that the GECSR may enable. The results
of this evaluation activity will be shared with the larger scientific community and will inform the development of
future tools, environments and collaborations in other science, engineering and education domains.
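As a rough illustration of the first set of measures above (the extent and geographic diversity of collegial ties before and after GECSR introduction), the Python sketch below builds a co-authorship graph from lists of paper authors and computes two simple statistics. These are deliberately basic stand-ins, not the techniques of Newman et al. [32]; the function names, the use of an author's site as a proxy for geographic dispersion, and the sample data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """papers: iterable of author-name lists. Returns author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for a in unique:
            graph[a]  # touch the key so single-author papers still add a node
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def mean_degree(graph):
    """Average number of distinct co-authors per author (extent of ties)."""
    if not graph:
        return 0.0
    return sum(len(neighbors) for neighbors in graph.values()) / len(graph)


def cross_site_fraction(graph, site_of):
    """Fraction of co-authorship ties linking authors at different sites."""
    ties = {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}
    if not ties:
        return 0.0
    crossing = sum(1 for tie in ties if len({site_of[x] for x in tie}) > 1)
    return crossing / len(ties)


# Hypothetical data: author lists before and after tool introduction.
site_of = {"A": "Michigan", "B": "Michigan", "C": "Caltech", "D": "Caltech"}
before = coauthor_graph([["A", "B"], ["C", "D"]])
after = coauthor_graph([["A", "B", "C"], ["C", "D"]])
```

Comparing mean_degree and cross_site_fraction for the two graphs gives one crude indication of whether ties have become more numerous and more geographically dispersed; a real analysis would also need to control for collaboration size and publication practices.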
Education and Outreach (E&O)
The Education and Outreach (E&O) program associated with this proposal will enable a collaborative learning
community engaging traditional and nontraditional students in our studies of particle physics at the energy frontier.
In the collaborative learning environments that will result from the development components of the proposal,
students will become empowered to do science projects and/or help develop the learning environment itself under
the supervision of teachers and education researchers. We will deploy collaboratory tools into high schools and
college classrooms to disseminate high-energy physics knowledge, enhance physics classroom activities, and
explore how the tools adapt to the school community.
Florida International University (FIU) will introduce the tools to local high schools and evaluate their impact. FIU,
in collaboration with partners at Florida State University (FSU), the University of Florida (UF), and the California
Institute of Technology (Caltech), and in cooperation with the NSF, is creating and operating an inter-regional
Grid-enabled Center for High-Energy Physics Research and Educational Outreach (CHEPREO) at FIU, encompassing an
integrated program of research, network infrastructure development, and education and outreach at one of the largest
minority schools in the US [33]. CHEPREO is extending FIU’s existing research activities at Jefferson National
Laboratory to the long-term high-energy physics research program at the CMS experiment at CERN, creating a robust
outreach activity based on CMS research, developing an advanced networking and Grid computing infrastructure that
will draw in new collaborators from South America, and enhancing science and math education in South Florida for
underserved minority students through pedagogic enhancements and teacher training led by a Physics Learning
Center. Through GECSR support, FIU will also lead a coordinated program for E&O by creating a community of
practice among the various collaborating institutions involved in this proposal, enabled through GECSR tools to
ensure the integration of E&O project goals: to engage minorities; improve physics classroom curriculum directed by
teacher/student contribution and response; add a new dimension to existing programs (QuarkNet, ThinkQuest); and
employ education research techniques to investigate these approaches.
QuarkNet-affiliated [34] teachers will utilize the collaboratory tools through summer high-energy research
programs. QuarkNet is a successful national program of particle physics education and outreach, supported by NSF,
that partners high school teachers with physicist mentors. Teachers work as researchers and collaborators in
forefront physics projects. The program, now in its fifth year, has over 50 participating university and national
laboratory centers, five of which are represented in this proposal. These centers have well-established relationships between
teachers and physicists and have reached tens of thousands of high school students through the programs. The
University of Texas at Arlington (UTA) and the University of Iowa (UI) are among the twelve original QuarkNet
Centers funded by NSF since 1999. FIU is a new QuarkNet center. The groups also cover significant geographic
regions and have the potential to engage underrepresented groups in HENP. The E&O team envisages a
competition, modeled after the ThinkQuest [35] concept of forming five-person teams with a coach to develop a project
to be shared. While ThinkQuest projects are website lessons, we would develop live lesson plans that would involve
remote teams linked together in the collaborative learning environment. Moreover, teams could and should consist
of geographically separated members with similar interests. By providing access to our facilities and research activities,
we will build personal fulfillment and science excitement in students as well as open their eyes to new career
opportunities. Teachers will be reinvigorated through greater exposure to pedagogical innovations that will translate
into successful physics classroom experiences.
Our first goal is to disseminate the collaboratory tools into schools located near our partners. The tools will provide
a 'real window' into the exciting world of particle physics at remote accelerator laboratories for many K-12 students
and teachers. For education and outreach involving K-12 teachers, we will draw from the pool of active and highly
regarded QuarkNet Teachers affiliated with our participating groups’ QuarkNet Centers. Through the collaboratory
tools, students and teachers will interact with physicists at universities and laboratories to bring high-energy physics
into the classroom. Classroom-to-classroom interactions will also be a vital part of the proposal to foster teacher
mentoring as well as to build a community among students. We will also provide internships for students to actively
work on developing, prototyping, testing and using the collaborative tools. The Education and Outreach Area
Coordinator will establish a group that will discuss communication and development issues related to education.
These discussions will be aimed at developing collaborative learning sessions that draw on both the
QuarkNet and ThinkQuest team lesson development models, with a new dimension of live interactive participation,
and collaborative class and research projects among undergraduates and high school students.
Prairie View A&M (PVAMU) is the second oldest university in the state of Texas. Black students constituted 92% of the
undergraduate enrollment of 7255 students at PVAMU in Fall 2002. UTA will partner with PVAMU to concentrate
on minority recruiting programs. Since we are delivering a new collaborative environment in this proposal, it would
be appropriate to focus on how this new paradigm can help traditionally underrepresented students in physics.
PVAMU played a major role in developing graphics software for BaBar. PVAMU is in the process of putting
together a large donated grid computing cluster. The collaborative tools to be developed through this proposal, with
active student involvement at PVAMU, can open doors for traditionally underrepresented students to participate in
the greatest scientific discoveries at U.S. and international laboratories.
In the area of education research, we will develop direct and collaborative links with colleagues engaged in studying
the effectiveness of collaboratory tools from an educational perspective at the undergraduate level as well as in the
education of scientists joining large-scale research projects. To do this, we will provide support for a graduate student
doing dissertation research in the School of Education at Michigan on the effectiveness of information delivery
using the WLAP technology. Undergraduates participating in a summer Research Experience for Undergraduates
(REU) program at CERN and general physics undergraduates will serve as subjects in research on design principles for the
WLAP technology and on the role of individual differences in the design of instructional materials
for it [36,37]. The results of this research will be applied immediately to the research and development efforts of
this proposal that relate to the human-computer interface and will be incorporated into the other aspects of this
educational and outreach proposal. Through this research, physicists will gain insight from professional educators
into the nature and effectiveness of information delivery using the WLAP technology. Simultaneously, educators
will gain insight into the nature and operation of science experimentation in large collaborations, which may spawn
further research. The team at the University of Michigan will also carefully examine the transferability of the
knowledge gained in the development of the GECSR Project to the development of college-level courses. By
drawing upon the 300+ high-energy physics lectures already recorded by the proposal participants from the CERN
Summer Student Program and interfacing these through the OGCE portal, we plan to develop a full senior-level
course in high energy physics and to evaluate and track the performance of students who enroll in it.
The University of Iowa team will leverage four existing programs to conduct education and outreach for this
proposal: Iowa QuarkNet, Science Education Center of School of Education, GROW (Grid Research and education
group @ IoWa), and the Studio of Academic Technologies for Education.
A special effort will be undertaken by Florida International University to develop a “Grids for Kids”
collaboration around a physics grid tailored for high schools, which will enable students in grades 9–12 to
interact in a virtual global village where hands-on implementation and use can both promote understanding and
spark an interest in continuing studies in physics and technology at the undergraduate level. A small grid array (15
servers) in a high school classroom could stimulate interest by giving students tools for tasks ranging from analyzing a simple result set to
modeling complex physical behavior. Other related projects are planned at the K-12 level and for teachers. Teachers’
lesson plans can now include more meaningful applications of theory. Experiments can include larger and
more complex data sets, and their results can be analyzed much faster than is possible with
current technology. An environment where teachers can submit a data set to run on a physics cluster with gigaflops
of computing power and monitor the job as it runs, all from a remote location, can be an exhilarating
experience for kids of any background. Grid physics for kids would greatly improve the entry-level knowledge of
university students wherever this initiative is implemented.
Summary of Coordination Plan
A significant challenge in the proposed GECSR project involves successful coordination of diverse participants
representing multiple institutions and different disciplinary backgrounds. For example, a recent analysis of 62
completed projects from the NSF Knowledge and Distributed Intelligence (KDI) program, conducted by Cummings
and Kiesler [38,39], found varying outcomes in geographically distributed multidisciplinary teams. Specifically,
performance did not vary by multidisciplinarity – but was strongly influenced by geographic dispersion. In
particular, dispersed projects that did better employed a number of common management practices – including
holding a kickoff workshop or conference, conducting regular audio conferences, and exchanging email regularly.
However, workshops and sustained communication were costly in terms of resources and attention of project
leaders. Analysis of responses from the PIs in the KDI projects showed the need for tools and practices that could
accomplish the following: a) manage and track the trajectory of tasks over time; b) reduce information overload; c)
support spontaneous conversation – including awareness of availability for conversation at a distance; d) support
simultaneous decision making; and e) support scheduling of and participation in presentations and meetings across
distance.
In recognition of the risks posed by a team distributed across eight or more institutions we will take explicit steps to
address the pitfalls identified in the KDI study (in addition to regular use of standard modes of communication, such
as phone and email). Note that the short description here is expanded in the separate three-page
Coordination Plan that accompanies this submission. First, project activity will be led and directed by a
management committee consisting of the four CoPIs and chaired by Dr. Neal. The management committee will
meet three times a month, with the default mode being via videoconference – using technology under development
in the scope of the GECSR project (e.g., VRVS and Access Grid). Once a month the committee will hold an
augmented management meeting where leaders of the core technology will join the four CoPIs. Second, the project
will use the calendar tool within the OGCE to maintain an updated roadmap of project deliverables and milestones.
In addition, a current Gantt chart will be posted in the resources area of the project OGCE workspace for quick
review of schedule expectations and contingencies. Third, to reduce information overload, the project will use the
resources area within the OGCE project space to post and review critical documents and URLs, with notification
levels that individuals can adjust (e.g., frequency and type of notifications). Rigorous use of the resources area
should also reduce the phenomenon of “death by email attachment” that plagues many distributed projects. Fourth,
to provide presence awareness and opportunities for spontaneous interaction, project members will adopt the Jabber-based
instant messaging application being developed for GECSR and will also use the built-in presence awareness and
chat capabilities available within OGCE. Fifth, to support distributed presentations, the project will use tools in the
Access Grid distribution, such as dppt, to view PowerPoint slides across multiple locations. Finally, to jump-start
creation and maintenance of trust and cohesion within the project team, the project will hold an annual workshop
where participants can gather and discuss issues face-to-face, with the initial meeting held as a kickoff event
shortly after notification of funding. In addition, the project leadership will identify opportunities for subsets of the
project personnel to meet, such as combining project-related meetings with larger meetings (e.g., ATLAS week).
Timeline and Milestones
First Year: “Integrating Collaborative Tools”
The goal of the first year will be successful integration of core collaborative tools within the OGCE portal
environment. Specifically, at the end of Year 1, users will be able to access key capabilities to support GECSR
sessions within a common interface, including: videoconferencing, presence awareness and messaging, control and
viewing of shared applications, and a shared file system. Each tool identified for integration into OGCE will be
enhanced as needed to provide security, persistence, and compatibility with access and monitoring capabilities (e.g.,
MonALISA).
In terms of synchronous collaborative tool development, key deliverables in Year 1 will be:
•
Adaptation of VRVS to work within the OGCE environment. Specifically, this will involve development
of management tools within OGCE that allow users to create groups and associate these groups with
GECSR sessions. The management tools will also allow users to associate VRVS resources with GECSR
sessions, such as live or archived videoconferences. For example, users will have the option to convert a
transient GECSR session into a persistent session, where the GECSR system will automatically apply
archiving (e.g., indexed MPEG audio/video streams) to communication in persistent sessions – subject to
the preferences of the collaborators; (Caltech)
•
Adaptation of VNC to work within the OGCE environment. Specifically, this will involve development of
OGCE tools that allow users associated with a GECSR session to control and view shared applications.
Specialized modules (for example, Java applets) optimized for quality of service under
dynamically changing bandwidth will be explored and developed early on; (Caltech, Michigan)
•
Adaptation of MonALISA to work within the OGCE environment. The MonALISA system provides a
distributed monitoring and control service [40] for collaborative applications. MonALISA is based on a
scalable Dynamic Distributed Services Architecture [41] and uses a multithreaded engine for
asynchronous data collection and interaction with other services and agents. Activity related to MonALISA
in Year 1 will involve development of OGCE tools that allow users to access output of MonALISA
processes, such as monitoring of hosts where VRVS reflectors are running (load, CPU, IO traffic),
monitoring the status of VRVS reflectors (number of VRVS rooms, clients, peers, traffic), providing
remote administrative control of reflectors using an X.509 certificate-based SSL connection, and providing real-time
measurements of the quality of connectivity between selected possible peers (ABPing measures RTT,
packet loss, and jitter using UDP and computes the quality of the connection every 2 s); (Caltech,
Michigan)
•
Adaptation of the PCCE to work within the OGCE environment. The PCCE software provides secure
presence and messaging for both synchronous and asynchronous communication. The PCCE
software is based on the XMPP standards [42] and leverages the Jabber open source software
developments [43]; (LBNL, UTA)
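As an illustration of the connection-quality measurement mentioned in the MonALISA deliverable above, the following Python sketch derives mean RTT, jitter, and packet loss from one probing interval and combines them into a single score. The function name, the weights, and the scoring formula are assumptions made purely for illustration; they are not taken from ABPing or MonALISA, whose actual computation is not described here.

```python
from statistics import mean


def link_quality(rtts_ms, sent, rtt_w=1.0, jitter_w=1.0, loss_w=100.0):
    """Summarize one probing interval for a peer link (hypothetical scoring).

    rtts_ms: round-trip times (ms) of the probes that were answered, in order.
    sent:    number of probes sent during the interval.
    Returns (mean_rtt, jitter, loss_fraction, score); a lower score is better.
    """
    received = len(rtts_ms)
    # Fraction of probes that never came back.
    loss = 1.0 - received / sent if sent else 1.0
    if received == 0:
        # Peer unreachable in this interval: worst possible score.
        return None, None, loss, float("inf")
    rtt = mean(rtts_ms)
    # Jitter taken as the mean absolute difference between consecutive RTTs.
    if received > 1:
        jitter = mean(abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:]))
    else:
        jitter = 0.0
    # Weighted sum; the weights are illustrative assumptions, not ABPing's.
    score = rtt_w * rtt + jitter_w * jitter + loss_w * loss
    return rtt, jitter, loss, score


if __name__ == "__main__":
    # Four probes sent in one interval, three answered.
    print(link_quality([10.0, 12.0, 11.0], sent=4))
```

A monitoring agent could call such a function once per measurement interval for each candidate peer and publish the resulting score, so that links can be ranked against one another.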
In terms of asynchronous collaborative tool development, key deliverables in Year 1 will be:
•
Adaptation of reliable and secure peer-to-peer information sharing within the OGCE environment. The
scalable and secure peer-to-peer information sharing tool enables groups to securely share information
among collaborators from the information’s natural storage location (e.g., files and databases) [28]; (LBNL)
•
Adaptation of automatic archiving systems within the OGCE environment; (All)
In terms of deployment, key deliverables in Year 1 will be:
•
Use of GECSR by the physics participants in this proposal to perform physics analysis, detector
commissioning and detector operation; (All, Maryland leads)
•
Tighter integration of the legacy experiment-specific code with the GECSR environment, for example,
integrating PCCE and OGCE infrastructure into data production and analysis tools; (All, Michigan leads)
In terms of assessment and evaluation, key deliverables in Year 1 will be:
•
Selection of the community sample, administration of the baseline community survey, analysis of the
survey data, and submission of a report based on the survey results; (Michigan)
•
Selection of the sites for in-depth observation and interviews, completion of the baseline visits to these
sites, and submission of a report based on the results of the observations and interviews; (Michigan)
•
Analysis of feedback from physics participants in the GECSR project about the performance and usability
of the GECSR environment; (Michigan)
Second Year: “Introducing Dynamic Collaboration Capability”
The focus of the development effort in Year 2 will shift from the integration effort of the Year 1 release to the
development of new tools that leverage capabilities that will exist in GECSR after the initial release and add
completely new capabilities to GECSR. In particular, a key goal for Year 2 will be development of a “language of
access” that, when combined with other monitoring capabilities in GECSR, will assist users in discovering and
joining interesting collaborations. All of the OGCE services will also be adapted to support WSDL bindings using
the WS-Resource Framework (WSRF) [44]. This will allow collaborative capabilities to be added to any
application that is capable of acting as a WSRF client.
In terms of synchronous collaborative tool development, key deliverables for Year 2 will be:
•
Dynamic optimization of connectivity. Using MonALISA, we will provide a dynamic distributed decision
unit to compute a global Minimum Spanning Tree (MST) to optimize connectivity and to correctly re-establish
connectivity when reflectors are down or unreachable due to network problems. This will be done using a
set of agents deployed to all the monitoring services to collect the connectivity information and to
dynamically compute the MST; (Caltech)
•
Implementation of a language of access. The GECSR will have a language of access, using authorization
tools [45] that allow association of ad hoc groups of physicists with GECSR sessions based on individual
X.509 certificates. Prior work has produced Web-based tools that implement much of the language-of-access
functionality. The new work in Year 2 will integrate these existing tools within the OGCE
environment; (Caltech)
•
Implementation of location-aware monitoring. We will use a geographic-information-systems-based Grid
information broker (GISGIB) [46] to provide physicists with a location-aware mechanism for monitoring
GECSR-based collaborations, discovering interesting collaborations, and analyzing the spatial and
temporal characteristics of collaborations. GISGIB will help physicists dynamically identify where and
when interesting collaborations are taking place. In addition, the GISGIB can be used to organize and
analyze coll