A Proposal Submitted to the National Science Foundation 2004 ITR Program
    NSF 04-012; February 25, 2004
     
     
    GECSR:
    A Global Grid-Enabled Collaboratory for
    Scientific Research
     
     
     
     
     
     
     
     
    Julian Bunn (co-PI), Philippe Galvez
    California Institute of Technology
     
    Daniel Elvira, Daniel Green, John Womersley
    Fermi National Accelerator Laboratory
     
    Heidi Alvarez
    Florida International University
     
    Deborah A. Agarwal
    Lawrence Berkeley National Laboratory
     
    Raymond Brock
    Michigan State University
     
    Yasar Onel, Shaowen Wang
    University of Iowa
     
    Nicholas Hadley, Shuichi Kunori (co-PI), Andris Skuja
    University of Maryland
     
    Eric Dey, Tom Finholt (co-PI), Steven Goldfarb, Homer A. Neal (PI), Charles Severance
    University of Michigan
     
    Kaushik De
    University of Texas at Arlington
     

    Project Summary
    ITR: A Global Grid-Enabled Collaboratory for Scientific Research
     
    The High Energy and Nuclear Physics (HENP) communities face unprecedented challenges as they seek to
    effectively execute the next generation of experiments involving hundreds to thousands of collaborators around the
    world. The very success of these experiments will critically depend on sustaining vigorous geographically
    distributed collaborations that are coherent, efficient and effective while accessing, processing, and sharing
    Petabyte­scale data. This has never been done before and there are serious concerns that the science emerging from
    these projects will be limited by the absence of robust tools to make such large­scale, highly interactive
    collaborations possible. We seek to address these challenges by developing and deploying a collaborative
    environment, called the Grid-Enabled Collaboratory for Scientific Research (GECSR), that both integrates existing
    technologies and develops innovative new technologies that will be readily adopted by the target communities.
    GECSR will combine the best open­source applications from within the HENP communities (and from allied Grid
    and National Middleware efforts) within a common portal interface. To ensure that the capabilities of the GECSR
    meet the needs of target users, we will follow an iterative development model that in each cycle will include
    systematic assessment of user requirements, tool development and deployment consistent with requirements, and
    evaluation to determine whether deployed tools will satisfy requirements. A distinguishing feature of this approach
    is that the combination of expertise in physics, computer science, and social science will greatly enhance the chances
    of both technological and sociological success. Institutions in our proposal have established track records of
    international leadership in each of these areas.
    The recent report of the NSF Blue­Ribbon Advisory Panel on Cyberinfrastructure identifies five key service
    categories that will provide a foundation for the comprehensive knowledge environments that will enable
    individuals, teams and organizations to revolutionize scientific practice. The HENP community, working with
    computer science and communities in astrophysics and other disciplines, has addressed four of these services, based
    largely on common underlying middleware, in the widespread deployment of grid­enabled high performance
    computing resources, and through several data grid projects. The proposed Grid­Enabled Collaboratory for
    Scientific Research will provide the collaboration services, the fifth of these service elements of cyberinfrastructure,
    that is required for the HENP community to fully realize a functionally complete environment, with the potential to
    significantly transform the conduct of research.
     
    Broader Impact
    This proposal is distinguished by a tight integration of 1) the science of collaboratories, 2) a globally scalable
    working environment built on the foundation of a powerful, fully functional set of working collaborative tools, 3) an
    agent-based monitoring and decision-support system that will allow collaborating scientists to perform data-intensive
    analysis tasks efficiently, and 4) an education and outreach agenda that is inclusive of minorities and
    reaches high-school physics programs through vehicles such as QuarkNet and CHEPREO and success models such
    as ThinkQuest. Assessment of the methodology of scientific collaborations and the iterative evaluation of the tools
    by a team independent of the developers will be a critical element in ensuring the success of the proposed work and
    its value beyond HENP.
     
    The broader implications of this proposal, and its mission to grant full partnership to groups in all world regions, are
    driven by (1) the right of groups and governments that have contributed to the global Collaborations to share and
    collaborate in the data analysis, and thus in the process of search and discovery that is at the root of all basic
    research; (2) the need to justify global projects and global Collaborations as a valid and vital means of conducting
    future leading-edge scientific research and other endeavors, without exploitation of the poorer nations and world
    regions; and (3) the need to develop effective means of collaboration and cooperation among sub-groups with vastly
    different academic and cultural backgrounds, and especially to allow effective collaboration by university-based
    students in the native cultural milieu of their home countries. Six percent of this proposal’s budget is allocated to
    Education and Outreach that specifically targets minority populations: this should produce a sea change in how high
    energy physics experimentation and the necessary global collaboration affect the quality of science education and
    address the needs of underserved populations across the US and around the world.

     
    ITR: A Global Grid-Enabled Collaboratory for Scientific Research
    The High Energy and Nuclear Physics (HENP) communities face unprecedented challenges as they seek to
    effectively execute the next generation of experiments involving hundreds to thousands of collaborators around the
    world. The very success of these experiments will critically depend on sustaining vigorous geographically
    distributed collaborations that are coherent, efficient and effective while accessing, processing, and sharing
    Petabyte­scale data. This has never been done before and there are serious concerns that the science emerging from
    these projects will be limited by the absence of robust tools to make such large­scale, highly interactive
    collaborations possible. We seek to address these challenges by developing and deploying a collaborative
    environment that both integrates existing technologies and develops innovative new technologies that will be readily
    adopted by the target communities. That is the purpose of the proposal presented herein.
    The HENP communities have a strong history of innovative use of information technology to support collaborative
    activity, as indicated by HENP leadership in development of the world’s most advanced Grid­based systems
    [1,2,3,4], in deployment of scalable videoconferencing tools [5], and in the use of Grid­enabled analysis
    environments [6]. However, despite this important progress, there is no unifying collaborative framework that links
    these key capabilities together. Recent work has shown that heterogeneous collaborative tools – i.e., where
    physicists must shift from one application to another as they perform different tasks – can introduce new sources of
    complexity that impede rather than aid collaborations [7]. We will seek to avoid this problem by adopting a holistic
    design approach that involves representatives from the end­user community in all stages of the research,
    development and deployment processes.
    We propose to produce an integrated collaborative environment, called the Grid-Enabled Collaboratory for
    Scientific Research (GECSR), that will combine the best open-source applications from within the HENP
    communities (and from allied Grid and National Middleware efforts) within a common portal interface. To ensure
    that the capabilities of the GECSR meet the needs of target users, we will follow an iterative development model
    that in each cycle will include systematic assessment of user requirements, tool development and deployment
    consistent with requirements, and evaluation to determine whether deployed tools will satisfy requirements. A
    distinguishing feature of this approach is that the combination of expertise in physics, computer science, and social
    science will greatly enhance the chances of both technological and sociological success. Institutions in our proposal
    have established track records of international leadership in each of these areas.
    The significance of the GECSR effort is underlined by the recent report of the NSF Blue­Ribbon Advisory Panel on
    Cyberinfrastructure [8], which identified five key service categories that form the foundation for comprehensive
    knowledge environments to enable individuals, teams and organizations to revolutionize scientific practice. These
    include: high performance computation services; data, information, and knowledge management services; observation,
    measurement, and fabrication services; interfaces and visualization services; and collaboration services. The HENP
    communities have generated coordinated activity within the first four of these categories – primarily through: the use
    of common underlying middleware; widespread deployment of grid­enabled high performance computing resources;
    data grid projects for management of data, information and knowledge [9]; instruments that can be monitored
    remotely through eLogs and other interfaces; and development of the Grid­enabled Analysis Environment [6]. In
    contrast, progress within the collaboration services category has not been at a comparable level. As a result, there is
    some risk that the important work done in the other areas may be diminished if physicists lack highly usable and
    standard collaborative interfaces that tie together the disparate computational, data, and instrument resources. The
    proposed GECSR fills this gap by creating a coherent community­sanctioned activity within the collaboration
    services category that is equal in caliber and level of coordination to ongoing efforts within the other four categories.
    HENP’s Collaboration Challenges
    The major HENP experiments of the next twenty years will break new ground in our understanding of the
    fundamental interactions, structures and symmetries that govern the nature of matter and space­time in our universe.
    Among the principal goals at the high energy frontier are to find the mechanism responsible for mass in the
    universe, to discover the “Higgs” particles associated with mass generation, to understand the unification of the
    fundamental forces of nature, to search for new symmetries and extra dimensions, and to explore the fundamental
    mechanism that led to the predominance of matter over antimatter in the observable cosmos.
    The largest collaborations today, such as CMS[10] and ATLAS[11], which are building experiments for CERN’s[12]
    Large Hadron Collider (LHC [13]) program, each encompass on the order of 2000 physicists from 150 institutions
    in more than 30 countries. Each of these collaborations includes 300-400 physicists in the US, from more than 30
    universities as well as the major US HEP laboratories. The LHC experiments are expected to begin collecting data in
    2007, with a high level of collaborative activity required up to that date and well beyond. The current generation of
    experiments now in operation and taking data at SLAC[14] and Fermilab (D0[15] and CDF[16]) are similar in scale
    to the US contingent of the next­generation experiments. Each of these experiments faces unprecedented challenges
    in terms of:
    • The global extent and multi-level organization of the physics collaborations, leading to the need to
      collaborate and share data-intensive work in fundamentally new ways.
    • The data-intensiveness of the work, where the data volume to be processed, distributed, accessed and
      analyzed by a major experiment is in the Petabyte (10^15 Bytes) range now, and is expected to rise to the
      Exabyte (10^18 Bytes) range within the next ten years.
    • The complexity of the data, particularly at the LHC where the physics discovery potential is related to the
      very high intensity (luminosity) as well as the high energy of the collisions, such that ~20 interactions
      accompany the particle interaction of interest.
    The new paradigm of “Grids” and grid­computing[17] is thought to hold the key to meeting the collaboration,
    computing and data­management needs of HENP and there are significant efforts underway to explore and develop
    the grid toolkits and middleware that will be required for success in HENP. The complex problems of connecting
    and enabling resources like networks, computers and storage certainly justify the attention and support from the
    funding agencies and the HENP communities. However, these purely technical issues are equaled in complexity by
    the socio­technical problems of effectively and efficiently connecting and enabling physicists to do their science
    using emerging cyberinfrastructure – and these topics have not received as much attention from HENP physicists or
    the agencies. There is considerable risk in allowing this discrepancy to persist and the primary goal of this proposal
    is to redress this situation.
    A particular problem for the organization and conduct of cyberinfrastructure-based physics is that past practice and
    experience provide only limited guidance for what should be done in the future. Up to the present, HENP physicists
    have performed experiments and analyses in tightly coupled cooperating groups. Day-to-day research was typically
    conducted in smaller teams (1-5 physicists) that worked closely together and then shared their results within a larger
    overarching collection of teams for verification and further analysis. The scale of these enterprises, combined with
    frequent geographic concentration of collaborators, meant that communication could be organized primarily through
    informal and formal face­to­face contact (e.g., hallway encounters or lab meetings). By contrast, the worldwide
    scope and large size of newer HENP collaborations, such as CMS and ATLAS, make dependence on face­to­face
    communication unrealistic. The geographic dispersion of these next generation collaborations highlights the need for
    novel tools and work practices. For example, while experiments may run twenty­four hours a day, physicists are
    most likely to be active during working hours in their local time zones. Working together under these circumstances
    will require collaboration tools that support both synchronous and asynchronous activity.
    To summarize, we argue that a focused effort to develop a robust collaborative environment for large-scale HENP
    experiments is critical, because physicists will receive their primary exposure to the other components of
    cyberinfrastructure via collaboration services – and if these services are deficient, then the potential benefit of the
    other capabilities will be reduced. The GECSR, then, is an effort to create a collaboration services element that is
    equal to the outstanding work already accomplished and underway in other physics cyberinfrastructure projects.
    Approach and Architecture
    To ensure the highest probability of meeting user needs, we will employ an iterative development strategy that
    combines assessment, development, deployment, and evaluation within each release cycle [18,19,20,21,22]. Our
    goal is the creation of a common portal interface that provides a standard look and feel for uniform access to a
    diverse array of the best open source applications from within the HENP communities, and from related Grid and
    NMI efforts. The basis for the GECSR portal interface will be the Open Grid Computing Environment (OGCE)
    [23]. The advantage of the OGCE is that it provides a common context to plug­in specialized capabilities – such as
    features specific to the HENP communities – alongside a standard set of collaboration tools and services (e.g.,
    announcement capabilities, persistent chat, shared calendars, role­based access control, and threaded discussions).
    At the University of Michigan, a variant of the OGCE is currently in use as an enterprise-wide course management
    system with approximately 30,000 users per term. In the OGCE, the common context to add specialized features is
    achieved through the use of open-source components and standards, such as JSR 168, a Java standard that defines
    elements called portlets for delivery of applications through portals. An advantage of the portlet approach, as
    implemented in OGCE, is that portlets are decomposed into their presentation components supported by persistent
    services accessed via a standardized API. In addition, the OGCE optionally extends login authentication to use the
    Grid [24] as an authentication provider, such that a proxy credential can also be stored by OGCE to allow portlets to
    perform Grid operations.
    Users will organize activity in the GECSR portal environment through sessions, or combinations of people, data,
    and communication that are tracked by the system. For example, in the case of a research group with ongoing
    meetings, a new session will begin by calling up archived data, metadata, and tools that the group has used in the
    past. A key feature of any session is that resources available to users will be consistent with preferences and
    access/authorization levels of everyone involved in the session. For instance, a session will not employ a given
    capability, such as audio recording, unless a group decision rule is satisfied (e.g., majority approval or unanimous
    approval). Sessions will consist of both synchronous and asynchronous activity. In the synchronous setting,
    participants in a session will be able to access a wide array of communication capabilities, including video and audio
    conferencing, supplemented by the ability to view and control common applications, such as text editors, data
    analysis tools, or graphics viewers. In the asynchronous setting, users will be able to access previous sessions and to
    receive notifications when aspects of a session have changed, such as the addition of new or revised data. To aid
    retrieval of session content, the GECSR will use automatic archiving, where all versions of artifacts used in a
    session, such as documents, will be captured and cataloged.
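The group decision rule described above can be illustrated with a minimal Python sketch. It is not the GECSR design: the `Session` class, the capability name `audio_recording`, and the rule names `majority` and `unanimous` are all assumptions introduced here for illustration, and a participant who has not voted is assumed to count as not approving.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of a session: its participants and their votes."""
    participants: set
    # Maps a capability name (e.g. "audio_recording") to {participant: approves}.
    votes: dict = field(default_factory=dict)

    def vote(self, capability, participant, approves):
        """Record one participant's approval or refusal of a capability."""
        if participant not in self.participants:
            raise ValueError(f"{participant} is not in this session")
        self.votes.setdefault(capability, {})[participant] = approves

    def is_enabled(self, capability, rule="majority"):
        """Apply a group decision rule; a missing vote counts as non-approval."""
        if not self.participants:
            return False
        ballots = self.votes.get(capability, {})
        approvals = sum(1 for p in self.participants if ballots.get(p, False))
        if rule == "unanimous":
            return approvals == len(self.participants)
        if rule == "majority":
            return 2 * approvals > len(self.participants)
        raise ValueError(f"unknown decision rule: {rule}")


session = Session(participants={"alice", "bob", "carol"})
session.vote("audio_recording", "alice", True)
session.vote("audio_recording", "bob", True)
print(session.is_enabled("audio_recording", rule="majority"))   # True
print(session.is_enabled("audio_recording", rule="unanimous"))  # False
```

Treating silence as refusal is a conservative choice for capabilities such as recording; a real system might instead wait for all votes or apply a timeout.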
    GECSR sessions will draw on the features of leading collaboration tools – many developed by or with the help of
    personnel involved with this proposal. A key motivation for selection of these tools, rather than comparable
    commercial products, is that the applications proposed for the GECSR are open source and in the public domain.
    This distinction is critical because the costs of commercial collaboration tools – such as WebEx, Microsoft Office
    Live Meeting (formerly Placeware), or Xerox Docushare – represent a significant barrier to widespread adoption
    and use. Further, customization of these off-the-shelf products is unlikely due to proprietary control of the
    underlying software. Therefore, for synchronous audio and video communication, the GECSR will employ both the
    Virtual Rooms Videoconferencing System (VRVS) [5] and the Access Grid [25]. VRVS, installed on 26,000
    machines with over 8200 registered users in 99 countries, provides a worldwide multi­point connection service that
    allows users to work together with different protocols (SIP, H.323, Mbone, Access Grid, MPEG2, and MPEG4),
    operating systems, and applications. VRVS users are growing at a rate of roughly one new user per hour. The
    Access Grid is a specific implementation of high-quality audio hardware with large video displays that allows
    natural conversations in group-to-group interactions. VRVS interoperates with the Access Grid and provides
    scalable access to an Access Grid (AG) along with interfacing of H.323 hardware. For sharing and viewing common
    applications in real time, GECSR will use the Virtual Network Computing (VNC) system. For secure presence awareness and
    instant messaging, the GECSR will use the Pervasive Collaborative Computing Environment (PCCE) [26], which
    includes Jabber eXtensible Messaging and Presence protocols as well as collaborative computational workflow
    tools. For asynchronous playback of audio, video, and PowerPoint slides, the GECSR will use the Synco­mat
    application developed for the Web Lecture Archive Project [27]. For access to stored data, the GECSR will use a
    peer­to­peer file sharing system based on a reliable and secure group communication protocol [28]. Finally, to
    automatically configure resources within sessions, the GECSR will use the Monitoring Agents in A Large Integrated
    Services Architecture (MonALISA) system [29], a monitoring framework and multithreaded, auto­discovering
    services architecture.
     
    We plan to integrate the tools described above into a unified system with the following novel features:
    • Persistent Collaboration. Object persistence is a critical attribute of a collaborative environment. Without
      the ability to archive and make available the documents, whiteboards, conversations and interactions that
      occur in the course of an ongoing collaboration, we lose one of the most transforming aspects collaborative
      environments could provide. Persistent documents and interactions in collaborative sessions create a form
      of group memory, keeping otherwise transient events or results available, not only for those who
      participated in the original session, but also for any others who may need to find such results.
    • Language of access. We need to provide the users of the GECSR with a language of access to allow each
      user to define the circumstances under which various collaborators, groups or agents can interrupt them and
      at what level. The language should be rich enough to allow definition of automated ways of responding
      under specified circumstances and must encompass dynamic definition of new groups or interactions. This
      language should also extend to allowing authorization decisions for persistent objects and the context-sensitive
      use of collaborative tools, as well as use of tools consistent with group preferences and norms
      (i.e., as expressed through group decision rules). Such a capability is required to assure the users that they
      will have the flexibility to map social and organizational policies from existing collaborations to
      GECSR-enabled collaborations.
    • Human-System-Human as well as Human-Human interactions for collaborative work.
      A customizable dynamic environment that archives interactions and allows dynamic creation of groups
      could quickly produce more data and meta-data than a typical user can process. Providing agent-assisted
      collaboration as an integral part of the GECSR will be critical to giving users the capabilities they will need
      to find information, join groups and interact with colleagues. In addition, monitoring tools can aid
      collaborations by automatically optimizing communication channels, such as in videoconferences. Finally,
      for data-intensive transactions, agents can help users determine optimal strategies for moving and viewing
      data.
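As one possible reading of the "language of access" feature listed above, the following Python sketch expresses a user's interruption policy as an ordered list of rules. Everything in it is hypothetical: the `AccessRule` fields, the four interruption levels, the group and context names, and the first-match evaluation order are assumptions made for illustration, not part of any GECSR specification.

```python
from dataclasses import dataclass

# Interruption levels, ordered from least to most intrusive (illustrative names).
LEVELS = ["none", "notify", "message", "call"]


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: who may interrupt, in which context, up to what level."""
    requester_group: str  # a group name, or "*" for anyone
    context: str          # the user's current context, or "*" for any context
    max_level: str        # highest interruption level this rule grants


def allowed_level(rules, requester_groups, context):
    """Return the level granted by the first matching rule, defaulting to 'none'."""
    for rule in rules:
        group_ok = rule.requester_group == "*" or rule.requester_group in requester_groups
        context_ok = rule.context in ("*", context)
        if group_ok and context_ok:
            return rule.max_level
    return "none"


def may_interrupt(rules, requester_groups, context, requested_level):
    """True if the requested level does not exceed the level the rules grant."""
    granted = allowed_level(rules, requester_groups, context)
    return LEVELS.index(requested_level) <= LEVELS.index(granted)


# One user's policy; earlier rules take precedence over later ones.
rules = [
    AccessRule("shift-crew", "*", "call"),
    AccessRule("analysis-team", "in-meeting", "notify"),
    AccessRule("analysis-team", "*", "message"),
]

print(may_interrupt(rules, {"analysis-team"}, "in-meeting", "message"))  # False
print(may_interrupt(rules, {"shift-crew"}, "in-meeting", "call"))        # True
```

Because the first matching rule wins, specific rules must precede general ones. A richer language would also need the dynamic group definitions and automated responses that the text calls for.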
    The Importance of Assessment and Evaluation
    Because collaborative tools intersect with individual, group, and organizational work practices, a critical goal for
    successful development and deployment is identification and resolution of key socio­technical problems.
    Preliminary studies of the ATLAS Collaboration [7], for example, show that dissatisfaction with currently deployed
    technologies reflects the mismatch between expectations and performance, such as the difference between the ease
    of face­to­face meetings compared to videoconferences. GECSR’s effectiveness depends on its ability to provide
    capabilities essential to the HENP community. To guide and improve the alignment of performance with
    expectations, the GECSR team will include researchers with expertise in human­computer interaction (HCI) who
    can distill common themes from diverse and complicated social and technical feedback. Understanding these social
    and technical themes will produce two critical benefits for the GECSR project. First, the HCI analyses will help
    identify the functionality the various GECSR systems must provide to meet the requirements of the HENP
    community. Second, the HCI analyses will play a role in the adoption of GECSR by providing project visibility to
    target users very early in the design process, managing expectations, and keeping the end users engaged. The HCI
    effort falls into two categories of activity: assessment and evaluation. Assessment activity can be thought of as
    elicitation of requirements from physicists. Evaluation activity can be thought of as measurement of satisfaction
    with deployed systems.
    To meet the goals of the assessment effort, we will use multiple, complementary methods. Specifically, we propose
    to combine a community-based survey with selected intense observation of physicists at work. Surveys are a
    well-understood mechanism for efficiently obtaining data about preferences in a larger population through extrapolation
    from a relatively small number of randomly selected respondents. The sample for the GECSR user survey will be a
    stratified random selection of physicists, students and technicians involved in the ATLAS, CMS, D0, and CDF
    collaborations. Items included in the questionnaire will cover: a) communication and collaboration (e.g., frequency
    and mode of communication by location of collaborators; size and geographic distribution of participants within
    primary collaborations); b) barriers and challenges to communication and collaboration (e.g., coordination loss,
    delays, miscommunication, information overload); c) practice (e.g., primary field of research; allocation of time
    across research activities); d) expectations for the GECSR (e.g., new opportunities for collaboration; better access to
    experimental data; new ways to incorporate research data in the classroom); e) priorities for GECSR capabilities
    (e.g., access to data repositories; remote participation in meetings and experiments; collaborative data visualization);
    and f) policies for data and instrument use (e.g., safety; data access; intellectual property). The aim of the GECSR
    survey activity will be to produce snapshots of needs and attitudes within the HENP community of users at the start,
    mid­point, and end of the GECSR project.
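The stratified random selection mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the roster is synthetic, the strata are taken to be (collaboration, role) pairs, and the 10% sampling fraction is arbitrary. The proposal does not specify any of these.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum (at least one each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic roster of (person id, collaboration, role); 20 people per stratum.
roster = []
for collab in ("ATLAS", "CMS", "D0", "CDF"):
    for role in ("physicist", "student", "technician"):
        for n in range(20):
            roster.append((f"{collab}-{role}-{n}", collab, role))

sample = stratified_sample(roster, lambda m: (m[1], m[2]), fraction=0.1, seed=1)
print(len(sample))  # 24: two respondents from each of the 12 strata
```

Sampling within strata guarantees that small groups, such as technicians in one collaboration, are represented, which a simple random draw from the whole roster would not.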
    In contrast to the broad perspective provided by the survey data, we propose to periodically observe activity and
    interview participants at 3­5 sites from U.S. institutions involved in the CMS and ATLAS collaborations. Data
    collected within these settings will help illuminate variation in user needs across different social and institutional
    arrangements (e.g., a large research university versus a small liberal arts college). Specifically, systematic
    observation is often the best way to detect the causes of low adoption of deployed technologies, as in Ruhleder
    and Star’s [30] analysis of Worm Community System (WCS) use by C. elegans researchers or Orlikowski’s [31]
    analysis of Lotus Notes use by employees of a consulting firm. In both cases, technically elegant tools failed
    because key features did not match critical features of work organization, such as criteria for promotion or rewards
    systems. The aim of the observational and interview activity will be to generate descriptions of processes that either
    enhance or impede adoption of GECSR applications. As with the survey effort, these data will be used to shape
    deployment and development priorities – and serve as input to the evaluation effort.
    In the iterative development scheme proposed for GECSR, the evaluation component provides closure on each
    release cycle – and provides input for the next cycle – by offering feedback on user satisfaction and performance
    related to GECSR use. We propose to measure user satisfaction in terms of subjective reports and objective
    indicators. We plan to obtain subjective reports of user satisfaction through questionnaire items included on the
    second and third wave administrations of the community survey (described previously). These items will focus on:
    a) interface design (e.g., clarity of interface elements; function of icons); b) usability (e.g., do actions have expected
    consequences; are there predictable failure modes; is system help sufficient); c) functionality (e.g., does an
    application do what a user wants; are there missing capabilities); d) graphical layout (e.g., do plots show standard
    units; do applications consume appropriate screen real estate); and e) performance (e.g., did the application update
    promptly; did data transfer reliably). Objective indicators of satisfaction will include usage logs, where use levels
    are assumed to correlate positively with satisfaction.
    In terms of performance related to GECSR use, we propose to use a variety of measures. First, because a target of
    GECSR development is collaboration, we will focus on the structure of collegial networks before and after GECSR
    introduction as a marker of GECSR impact on the extent and diversity of network ties. Newman et al. [32]
    have developed a number of techniques for analyzing and visualizing graphs of extensive networks of association
    among physicists (e.g., using co­authorship). We will employ similar techniques to characterize networks among
    the population of scientists within the ATLAS, CMS, D0, and CDF collaborations. For example, a critical question
    will be the degree to which GECSR use results in more diverse collaborations, both measured in terms of geographic
    dispersion and in terms of research focus. In addition, we will be interested in whether GECSR use provides a
    greater capacity for collaboration, as measured by the number and size of primary collaborations. Second, because
    we expect GECSR use to enhance the significance of collaborations, we will be examining subjective and objective
    measures of collaboration impact related to use. For instance, subjective measures might include questionnaire
    items about the self­reported importance of a collaboration, the likelihood of future collaborations with the same
    collaborators, and the quality of the collaboration experience (e.g., trustworthiness of collaborators). Objective
    measures of impact might include the output from a collaboration, such as the number of papers and reports – and
    the visibility of this output. Finally, because one of the anticipated benefits of collaboratory­based science is a
    leveling of access to experts, instruments, and data, we will be examining whether the GECSR expands
    participation in physics research. That is, beyond the focal audience of users – the physicists in the large
    collaborations – we will look for evidence that GECSR tools create new opportunities for engagement by faculty at
    non-PhD institutions, by undergraduates, by K-12 students, and by the general public. Specifically, we plan to
    produce case descriptions of novel forms of involvement by new players that the GECSR may enable. The results
    of this evaluation activity will be shared with the larger scientific community and will inform the development of
    future tools, environments and collaborations in other science, engineering and education domains.
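    As an illustration of the network measures discussed above, the following minimal sketch builds a co-authorship
    graph from author lists and reports two simple indicators: the mean number of ties per author and the fraction of
    ties that cross institutions. The author names, institution assignments, and the choice of indicators are
    hypothetical and are not taken from Newman et al. [32]; the actual analysis may use different measures.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected co-authorship graph: author -> set of co-authors."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # keep single-author papers too
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def tie_summary(graph, institution):
    """Return (mean ties per author, fraction of ties crossing institutions)."""
    if not graph:
        return 0.0, 0.0
    edges = {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}
    mean_degree = 2 * len(edges) / len(graph)
    crossing = sum(1 for e in edges if len({institution[x] for x in e}) > 1)
    cross_fraction = crossing / len(edges) if edges else 0.0
    return mean_degree, cross_fraction


# Hypothetical author lists before and after GECSR introduction.
institution = {"ana": "FIU", "bo": "FIU", "chen": "Michigan", "dee": "Caltech"}
before = [["ana", "bo"], ["chen"], ["dee"]]
after = [["ana", "bo"], ["ana", "chen"], ["bo", "dee"], ["chen", "dee"]]

before_stats = tie_summary(coauthor_graph(before), institution)
after_stats = tie_summary(coauthor_graph(after), institution)
print("before:", before_stats)
print("after: ", after_stats)
```

    Comparing the two summaries gives a first, coarse view of whether ties became more numerous and more
    geographically diverse; measures of research-focus diversity would require additional metadata per author.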
    Education and Outreach (E&O)
    The Education and Outreach (E&O) program associated with this proposal will enable a collaborative learning
    community engaging traditional and non­traditional students in our studies of particle physics at the energy frontier.
    In the collaborative learning environments that will result from the development components of the proposal,
    students will become empowered to do science projects and/or help develop the learning environment itself under
    the supervision of teachers and education researchers. We will deploy collaboratory tools into high schools and
    college classrooms to disseminate high­energy physics knowledge, enhance physics classroom activities, and
    explore how the tools adapt to the school community.
    Florida International University (FIU) will introduce the tools to local high schools and evaluate their impact. FIU,
    in collaboration with partners at Florida State University (FSU), the University of Florida (UF), and the California
    Institute of Technology (Caltech), and in cooperation with the NSF, is creating and operating an interregional
    Grid-enabled Center for High-Energy Physics Research and Educational Outreach (CHEPREO) at FIU, encompassing an
    integrated program of research, network infrastructure development, and education and outreach at one of the largest
    minority schools in the US [33]. CHEPREO aims to extend FIU’s existing research activities at Jefferson National
    Laboratory to the long-term high-energy physics research program at the CMS experiment at CERN, create a robust
    outreach activity based on CMS research, develop an advanced networking and Grid computing infrastructure that
    will draw in new collaborators from South America, and enhance science and math education in South Florida for
    underserved minority students through pedagogic enhancements and teacher training led by a Physics Learning
    Center. Through GECSR support, FIU will also lead a coordinated program for E&O by creating a community of
    practice among the various collaborating institutions involved in this proposal, enabled through GECSR tools to
    ensure the integration of E&O project goals: to engage minorities; improve physics classroom curriculum directed by
    teacher/student contribution and response; add a new dimension to existing programs (QuarkNet, ThinkQuest); and
    employ education research techniques to investigate these approaches.
    QuarkNet­affiliated [34] teachers will utilize the collaboratory tools through summer high­energy research
    programs. QuarkNet is a successful national program of particle physics education and outreach, supported by NSF,
    that partners high school teachers with physicist mentors. Teachers work as researchers and collaborators in
    forefront physics projects. The program, now in its fifth year, has over 50 participating university and national
    laboratory centers, five of which are represented in this proposal. These centers have well-established relationships between
    teachers and physicists and have reached tens of thousands of high school students through the programs. The
    University of Texas at Arlington (UTA) and the University of Iowa (UI) are among the twelve original QuarkNet
    Centers funded by NSF since 1999. FIU is a new QuarkNet center. The groups also cover significant geographic
    regions and have the potential to engage underrepresented groups in HENP. The E&O team envisages a
    competition, modeled after the ThinkQuest [35] concept of forming five-person teams with a coach to develop a project
    to be shared. While ThinkQuest projects are website lessons, we would develop live lesson plans that would involve
    remote teams linked together into the collaborative learning environment. Moreover, teams could and should consist
    of geographically separated members with shared interests. By providing access to our facilities and research activities,
    we will build personal fulfillment and science excitement in students as well as open their eyes to new career
    opportunities. Teachers will be reinvigorated through greater exposure to pedagogical innovations that will translate
    into successful physics classroom experiences.
    Our first goal is to disseminate the collaboratory tools into schools located near our partners. The tools will provide
    a 'real window' into the exciting world of particle physics at remote accelerator laboratories for many K­12 students
    and teachers. For education and outreach involving K­12 teachers, we will draw from the pool of active and highly
    regarded QuarkNet teachers affiliated with our participating groups’ QuarkNet Centers. Through the collaboratory
    tools, students and teachers will interact with physicists at universities and laboratories to bring high energy physics
    into the classroom. Classroom­classroom interactions will also be a vital part of the proposal to foster teacher
    mentoring as well as to build a community among students. We will also provide internships for students to actively
    work on developing, prototyping, testing and using the collaborative tools. The Education and Outreach Area
    Coordinator will establish a group that will discuss communication and development issues related to education.
    These discussions will be aimed at developing collaborative learning sessions that draw on both the
    QuarkNet and ThinkQuest team lesson development models, with a new dimension of live interactive participation,
    and at developing collaborative class and research projects among undergraduates and high school students.
    Prairie View A&M (PVAMU) is the second oldest university in the state of Texas. Black students constituted 92% of the
    undergraduate enrollment of 7255 students at PVAMU in Fall 2002. UTA will partner with PVAMU to concentrate
    on minority recruiting programs. Since we are delivering a new collaborative environment in this proposal, it would
    be appropriate to focus on how this new paradigm can help traditionally under­represented students in physics.
    PVAMU played a major role in developing graphics software for BaBar. PVAMU is in the process of putting
    together a large donated grid computing cluster. The collaborative tools to be developed through this proposal, with
    active student involvement at PVAMU, can open doors for traditionally underrepresented students to participate in
    the greatest scientific discoveries at U.S. and international laboratories.
    In the area of education research, we will develop direct and collaborative links with colleagues engaged in studying
    the effectiveness of collaboratory tools from an educational perspective at the undergraduate level as well as in the
    education of scientists joining large-scale research projects. To do this, we will provide support for a graduate student
    doing dissertation research in the School of Education at Michigan on the effectiveness of information delivery
    using the WLAP technology. Undergraduates participating in a summer Research Experience for Undergraduates
    (REU) at CERN and general physics undergraduates will serve as subjects in research on design principles for the
    WLAP technology and on the role of individual differences in the design of instructional materials
    for that technology [36,37]. The results of this research will be immediately applied to the research and development efforts of
    this proposal that relate to human­computer interface and will be incorporated into the other aspects of this
    educational and outreach proposal. Through this research, physicists will gain insight from professional educators
    into the nature and effectiveness of information delivery using the WLAP technology. Simultaneously, educators
    will gain insight into the nature and operation of science experimentation in large collaborations, which may spawn
    further research. The team at the University of Michigan will also carefully examine the transferability of the
    knowledge gained in the development of the GECSR Project to the development of college­level courses. By
    drawing upon the 300+ high-energy physics lectures already recorded by the proposal participants from the CERN
    Summer Student Program and interfacing these through the OGCE portal, we plan to develop a full senior-level
    course in high-energy physics and to evaluate and track the performance of students who enroll in it.
    The University of Iowa team will leverage four existing programs to conduct education and outreach for this
    proposal: Iowa QuarkNet, Science Education Center of School of Education, GROW (Grid Research and education
    group @ IoWa), and the Studio of Academic Technologies for Education.
    Florida International University will undertake a special effort to develop a “Grids for Kids” collaboration: a
    physics grid tailored for high school implementation that will enable students in grades 9–12 to interact in a
    virtual global village, where hands-on implementation and use can promote understanding and spark an interest in
    continuing studies in physics and technology at the undergraduate level. A small grid array (1-5 servers) in a high
    school classroom may stimulate interest by giving students tools for tasks ranging from analyzing a simple result
    set to modeling complex physical behavior. Other related projects are planned at the K-12 level and for teachers.
    Teachers' lesson plans can then include more meaningful applications of theory. Experiments can include larger and
    more complex data sets, and their results can be analyzed much faster than with the technology currently available
    in classrooms. An environment in which teachers can submit a data set to run on a physics cluster with Gigaflops
    of computing power and monitor the job as it runs, all from a remote location, can be an exhilarating
    experience for students of any background. Wherever this initiative is implemented, grid physics for kids would
    substantially raise the entry-level knowledge of university students.
    Summary of Coordination Plan
    A significant challenge in the proposed GECSR project involves successful coordination of diverse participants
    representing multiple institutions and different disciplinary backgrounds. For example, a recent analysis of 62
    completed projects from the NSF Knowledge and Distributed Intelligence (KDI) program, conducted by Cummings
    and Kiesler [38,39] found varying outcomes in geographically distributed multi­disciplinary teams. Specifically,
    performance did not vary by multidisciplinarity – but was strongly influenced by geographic dispersion. In
    particular, dispersed projects that did better employed a number of common management practices – including
    holding a kickoff workshop or conference, conducting regular audio conferences, and exchanging email regularly.
    However, workshops and sustained communication were costly in terms of resources and attention of project
    leaders. Analysis of responses from the PIs in the KDI projects showed the need for tools and practices that could
    accomplish the following: a) manage and track the trajectory of tasks over time; b) reduce information overload; c)
    support spontaneous conversation – including awareness of availability for conversation at a distance; d) support
    simultaneous decision making; and e) support scheduling of and participation in presentations and meetings across
    distance.
    In recognition of the risks posed by a team distributed across eight or more institutions we will take explicit steps to
    address the pitfalls identified in the KDI study (in addition to regular use of standard modes of communication, such
    as phone and email). Note that the short description here is expanded in the separate three-page
    Coordination Plan that accompanies this submission. First, project activity will be led and directed by a
    management committee consisting of the four Co­PIs and chaired by Dr. Neal. The management committee will
    meet three times a month, with the default mode being via videoconference – using technology under development
    in the scope of the GECSR project (e.g., VRVS and Access Grid). Once a month the committee will hold an
    augmented management meeting where leaders of the core technology will join the four Co­PIs. Second, the project
    will use the calendar tool within the OGCE to maintain an updated roadmap of project deliverables and milestones.
    In addition, a current Gantt chart will be posted in the resources area of the project OGCE workspace for quick
    review of schedule expectations and contingencies. Third, to reduce information overload, the project will use the
    resources area within the OGCE project space to post and review critical documents and URLs, with notification
    levels that individuals can adjust (e.g., frequency and type of notifications). Rigorous use of the resources area
    should also reduce the phenomenon of “death by email attachment” that plagues many distributed projects. Fourth,
    to provide presence awareness and opportunities for spontaneous interaction, project members will adopt the Jabber-based
    instant messaging application being developed for GECSR – and also use the built-in presence awareness and
    chat capabilities available within OGCE. Fifth, to support distributed presentations, the project will use tools in the
    Access Grid distribution, such as dppt, to view PowerPoint slides across multiple locations. Finally, to jumpstart
    creation and maintenance of trust and cohesion within the project team, the project will hold an annual workshop
    where participants can gather and discuss issues face­to­face – with the initial meeting held as a kickoff event
    shortly after notification of funding. In addition, the project leadership will identify opportunities for sub­sets of the
    project personnel to meet, such as combining project­related meetings with larger meetings (e.g., ATLAS week).
    Timeline and Milestones
    First Year: “Integrating Collaborative Tools”
    The goal of the first year will be successful integration of core collaborative tools within the OGCE portal
    environment. Specifically, at the end of Year 1, users will be able to access key capabilities to support GECSR
    sessions within a common interface, including: videoconferencing, presence awareness and messaging, control and
    viewing of shared applications, and a shared file system. Each tool identified for integration into OGCE will be
    enhanced as needed to provide security, persistence, and compatibility with access and monitoring capabilities (e.g.,
    MonALISA).
    In terms of synchronous collaborative tool development, key deliverables in Year 1 will be:
    Adaptation of VRVS to work within the OGCE environment. Specifically, this will involve development
    of management tools within OGCE that allow users to create groups and associate these groups with
    GECSR sessions. The management tools will also allow users to associate VRVS resources with GECSR
    sessions, such as live or archived videoconferences. For example, users will have the option to convert a
    transient GECSR session into a persistent session, where the GECSR system will automatically apply
    archiving (e.g., indexed MPEG audio/video streams) to communication in persistent sessions – subject to
    the preferences of the collaborators; (CalTech)
    Adaptation of VNC to work within the OGCE environment. Specifically, this will involve development of
    OGCE tools that allow users associated with a GECSR session to control and view shared applications.
    Specialized modules (for example, Java applets) optimized for quality of service under dynamically
    changing bandwidth will also be explored and developed at an early stage; (CalTech, Michigan)
    Adaptation of MonALISA to work within the OGCE environment. The MonALISA system provides a
    distributed monitoring and control service [40] for collaborative applications. MonALISA is based on a
    scalable Dynamic Distributed Services Architecture [41] and uses a multi­threaded engine for
    asynchronous data collection and interaction with other services and agents. Activity related to MonALISA
    in Year 1 will involve development of OGCE tools that allow users to access output of MonALISA
    processes, such as monitoring of hosts where VRVS reflectors are running (load, CPU, IO traffic),
    monitoring the status of VRVS reflectors (number of VRVS rooms, clients, peers, traffic), providing
    remote administrative control of reflectors using an X.509-certificate-based SSL connection, and providing
    real-time measurements of the quality of connectivity between selected possible peers (ABPing measures RTT,
    packet loss, and jitter using UDP and computes the quality of the connection every 2 s); (CalTech,
    Michigan)
    Adaptation of the PCCE to work within the OGCE environment. The PCCE software provides secure
    presence and messaging services that support both synchronous and asynchronous communication. The PCCE
    software is based on the XMPP standards [42] and leverages the Jabber open source software
    developments [43]; (LBNL, UTA)
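    To make the connection-quality measurement mentioned in the MonALISA deliverable above more concrete, the
    following minimal sketch summarizes one probing window into RTT, packet loss, and jitter, and then combines them
    into a single cost. It is an illustration only: the function names, the jitter definition (mean absolute
    difference between consecutive RTTs), and the weights are assumptions and do not reproduce the actual ABPing
    formula.

```python
from statistics import mean


def link_stats(rtts_ms):
    """Summarize one probing window.

    `rtts_ms` holds one entry per UDP probe: the round-trip time in
    milliseconds, or None if the probe was lost.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    loss = 1.0 - len(received) / len(rtts_ms)
    if not received:
        return {"rtt": None, "loss": loss, "jitter": None}
    # Assumed jitter definition: mean absolute change between consecutive RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"rtt": mean(received), "loss": loss, "jitter": jitter}


def link_cost(stats, w_rtt=1.0, w_loss=500.0, w_jitter=2.0):
    """Hypothetical weighted cost; lower is better, unreachable peers cost inf."""
    if stats["rtt"] is None:
        return float("inf")
    return w_rtt * stats["rtt"] + w_loss * stats["loss"] + w_jitter * stats["jitter"]


# One hypothetical window of four probes, one of which was lost.
window = [40.0, 44.0, None, 42.0]
stats = link_stats(window)
cost = link_cost(stats)
print(stats, cost)
```

    In a deployment, a function such as `link_stats` would be re-evaluated on each measurement interval (every 2 s in
    the description above), and the resulting costs would feed the peer-selection logic.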
    In terms of asynchronous collaborative tool development, key deliverables in Year 1 will be:
    Adaptation of reliable and secure peer­to­peer information sharing within the OGCE environment. The
    scalable and secure peer­to­peer information sharing tool enables groups to securely share information
    among collaborators from the information’s natural storage location (e.g., files and databases) [28]; (LBNL)
    Adaptation of automatic archiving systems within the OGCE environment; (All)
    In terms of deployment, key deliverables in Year 1 will be:
    Use of GECSR by the physics participants in this proposal to perform physics analysis, detector
    commissioning and detector operation; (All, Maryland leads)
    Tighter integration of the legacy experiment-specific code with the GECSR environment, for example,
    integrating PCCE and OGCE infrastructure into data production and analysis tools; (All, Michigan leads)
    In terms of assessment and evaluation, key deliverables in Year 1 will be:
    Selection of the community sample, administration of the baseline community survey, analysis of the
    survey data, and submission of a report based on the survey results; (Michigan)
    Selection of the sites for in­depth observation and interviews, completion of the baseline visits to these
    sites, and submission of a report based on the results of the observations and interviews; (Michigan)
    Analysis of feedback from physics participants in the GECSR project about the performance and usability
    of the GECSR environment; (Michigan)
    Second Year: “Introducing Dynamic Collaboration Capability”
    The focus of the development effort in Year 2 will shift from the integration effort of the Year 1 release to the
    development of new tools that leverage capabilities that will exist in GECSR after the initial release and add
    completely new capabilities to GECSR. In particular, a key goal for Year 2 will be development of a “language of
    access” that when combined with other monitoring capabilities in GECSR will assist users in discovering and
    joining interesting collaborations. All of the OGCE services will be adapted to also support WSDL bindings using
    the WS-Resource Framework (WS-RF) [44]. This will allow collaborative capabilities to be added to any
    application that is capable of acting as a WS-RF client.
    In terms of synchronous collaborative tool development, key deliverables for Year 2 will be:
    Dynamic optimization of connectivity. Using MonALISA, we will provide a dynamic distributed decision
    unit to compute a global Minimum Spanning Tree (MST) to optimize connectivity and to restore
    connectivity when reflectors are down or unreachable due to network problems. This will be done using a
    set of agents deployed to all the monitoring services to collect the connectivity information and to
    dynamically compute the MST; (CalTech)
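    The MST computation described above can be sketched as follows. This is a minimal, centralized illustration using
    Kruskal's algorithm, not the distributed agent-based implementation proposed here; the reflector names and link
    costs are hypothetical, and the cost is assumed to be any "lower is better" connection-quality figure.

```python
def minimum_spanning_tree(nodes, links):
    """Kruskal's algorithm over the given nodes.

    `links` maps (a, b) pairs to a cost. Links touching a node that is not
    in `nodes` (for example, a reflector that is down) are ignored.
    Returns the chosen edges as (a, b, cost) tuples in order of selection.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (a, b), cost in sorted(links.items(), key=lambda item: item[1]):
        if a not in parent or b not in parent:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


def reachable_tree(nodes, links, down=()):
    """Recompute the tree after excluding reflectors reported as down."""
    unavailable = set(down)
    alive = [n for n in nodes if n not in unavailable]
    return minimum_spanning_tree(alive, links)


# Hypothetical reflectors and link costs.
reflectors = ["caltech", "cern", "fnal", "umich"]
links = {
    ("caltech", "fnal"): 30.0,
    ("fnal", "umich"): 10.0,
    ("caltech", "umich"): 45.0,
    ("cern", "fnal"): 60.0,
    ("cern", "umich"): 65.0,
    ("caltech", "cern"): 90.0,
}

full = reachable_tree(reflectors, links)
degraded = reachable_tree(reflectors, links, down=["fnal"])
print(full)
print(degraded)
```

    When a reflector is reported down, the tree is simply recomputed over the remaining reflectors, so traffic that
    previously passed through the failed reflector is rerouted over the next-best links.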
    Implementation of a language of access. The GECSR will have a language of access, using authorization
    tools [45] that allow association of ad­hoc groups of physicists with GECSR sessions based on individual
    X.509 certificates. Prior work has produced Web­based tools that implement much of the language of
    access functionality. The new work in Year 2 will integrate these existing tools within the OGCE
    environment; (CalTech)
    Implementation of location­aware monitoring. We will use a geographic­information­systems­based Grid
    information broker (GIS­GIB) [46] to provide physicists with a location­aware mechanism for monitoring
    GECSR­based collaborations, discovering interesting collaborations, and analyzing the spatial and
    temporal characteristics of collaborations. GIS­GIB will help physicists dynamically identify where and
    when interesting collaborations are taking place. In addition, the GIS­GIB can be used to organize and
    analyze coll