The High Energy and Nuclear Physics (HENP) communities face unprecedented challenges as they seek to effectively execute the next generation of experiments involving hundreds to thousands of collaborators around the world. The very success of these experiments will critically depend on sustaining vigorous geographically distributed collaborations that are coherent, efficient and effective while accessing, processing, and sharing Petabyte-scale data. This has never been done before and there are serious concerns that the science emerging from these projects will be limited by the absence of robust tools to make such large-scale, highly interactive collaborations possible. We seek to address these challenges by developing and deploying a collaborative environment that both integrates existing technologies and develops innovative new technologies that will be readily adopted by the target communities. That is the purpose of the proposal presented herein.
The HENP communities have a strong history of innovative use of information technology to support collaborative activity, as indicated by HENP leadership in development of the world’s most advanced Grid-based systems [1,2,3,4], in deployment of scalable videoconferencing tools [5], and in the use of Grid-enabled analysis environments [6]. However, despite this important progress, there is no unifying collaborative framework that links these key capabilities together. Recent work has shown that heterogeneous collaborative tools – i.e., where physicists must shift from one application to another as they perform different tasks – can introduce new sources of complexity that impede rather than aid collaborations [7]. We will seek to avoid this problem by adopting a holistic design approach that involves representatives from the end-user community in all stages of the research, development and deployment processes.
We propose to produce an integrated collaborative environment, called the Grid-Enabled Collaboratory for Scientific Research (GECSR), that will combine the best open-source applications from within the HENP communities (and from allied Grid and National Middleware efforts) within a common portal interface. To ensure that the capabilities of the GECSR meet the needs of target users, we will follow an iterative development model that in each cycle will include systematic assessment of user requirements, tool development and deployment consistent with requirements, and evaluation to determine whether deployed tools will satisfy requirements. A distinguishing feature of this approach is that the combination of expertise in physics, computer science, and social science will greatly enhance the chances of both technological and sociological success. Institutions in our proposal have established track records of international leadership in each of these areas.
The significance of the GECSR effort is underlined by the recent report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure [8], which identified five key service categories that form the foundation for comprehensive knowledge environments to enable individuals, teams and organizations to revolutionize scientific practice. These include: high performance computation services; data, information, and knowledge management services; observation, measurement, and fabrication services; interface and visualization services; and collaboration services. The HENP communities have generated coordinated activity within the first four of these categories, primarily through: the use of common underlying middleware; widespread deployment of grid-enabled high performance computing resources; data grid projects for management of data, information and knowledge [9]; instruments that can be monitored remotely through eLogs and other interfaces; and development of the Grid-enabled Analysis Environment [6]. In contrast, progress within the collaboration services category has not been at a comparable level. As a result, there is some risk that the value of the important work done in the other areas may be diminished if physicists lack highly usable and standard collaborative interfaces that tie together the disparate computational, data, and instrument resources. The proposed GECSR fills this gap by creating a coherent community-sanctioned activity within the collaboration services category that is equal in caliber and level of coordination to ongoing efforts within the other four categories.
HENP’s Collaboration Challenges
The major HENP experiments of the next twenty years will break new ground in our understanding of the fundamental interactions, structures and symmetries that govern the nature of matter and space-time in our universe. Among the principal goals at the high energy frontier are to find the mechanism responsible for mass in the universe, to discover the “Higgs” particles associated with mass generation, to understand the unification of the fundamental forces of nature, to search for new symmetries and extra dimensions, and to explore the fundamental mechanism that led to the predominance of matter over antimatter in the observable cosmos.
The largest collaborations today, such as CMS [10] and ATLAS [11], which are building experiments for CERN’s [12] Large Hadron Collider (LHC) [13] program, each encompass on the order of 2000 physicists from 150 institutions in more than 30 countries. Each of these collaborations includes 300-400 physicists in the US, from more than 30 universities as well as the major US HEP laboratories. The LHC experiments are expected to begin collecting data in 2007, with a high level of collaborative activity required up to that date and well beyond. The current-generation experiments now in operation and taking data at SLAC [14] and Fermilab (D0 [15] and CDF [16]) are similar in scale to the US contingent of the next-generation experiments. Each of these experiments faces unprecedented challenges in terms of:
The global extent and multi-level organization of the physics collaborations, leading to the need to collaborate and share data-intensive work in fundamentally new ways.
The data-intensiveness of the work, where the data volumes to be processed, distributed, accessed and analyzed by a major experiment are in the Petabyte (10^15 Bytes) range now, and are expected to rise to the Exabyte (10^18 Bytes) range within the next ten years.
The complexity of the data, particularly at the LHC where the physics discovery potential is related to the very high intensity (luminosity) as well as the high energy of the collisions, such that ~20 interactions accompany the particle interaction of interest.
The new paradigm of “Grids” and grid-computing[17] is thought to hold the key to meeting the collaboration, computing and data-management needs of HENP and there are significant efforts underway to explore and develop the grid toolkits and middleware that will be required for success in HENP. The complex problems of connecting and enabling resources like networks, computers and storage certainly justify the attention and support from the funding agencies and the HENP communities. However, these purely technical issues are equaled in complexity by the socio-technical problems of effectively and efficiently connecting and enabling physicists to do their science using emerging cyberinfrastructure – and these topics have not received as much attention from HENP physicists or the agencies. There is considerable risk in allowing this discrepancy to persist and the primary goal of this proposal is to redress this situation.
A particular problem for the organization and conduct of cyberinfrastructure-based physics is that past practice and experience provide only limited guidance for what should be done in the future. Up to the present, HENP physicists have performed experiments and analyses in tightly coupled cooperating groups. Day-to-day research was typically conducted in smaller teams (1-5 physicists) that worked closely together and then shared their results within a larger overarching collection of teams for verification and further analysis. The scale of these enterprises, combined with frequent geographic concentration of collaborators, meant that communication could be organized primarily through informal and formal face-to-face contact (e.g., hallway encounters or lab meetings). By contrast, the worldwide scope and large size of newer HENP collaborations, such as CMS and ATLAS, make dependence on face-to-face communication unrealistic. The geographic dispersion of these next generation collaborations highlights the need for novel tools and work practices. For example, while experiments may run twenty-four hours a day, physicists are most likely to be active during working hours in their local time zones. Working together under these circumstances will require collaboration tools that support both synchronous and asynchronous activity.
To summarize, we argue that a focused effort to develop a robust collaborative environment for large-scale HENP experiments is critical, because physicists will receive their primary exposure to the other components of cyberinfrastructure via collaboration services, and if these services are deficient, then the potential benefit of the other capabilities will be reduced. The GECSR, then, is an effort to create a collaboration services element that is equal to the outstanding work already accomplished and underway in other physics cyberinfrastructure projects.
Approach and Architecture
To ensure the highest probability of meeting user needs, we will employ an iterative development strategy that combines assessment, development, deployment, and evaluation within each release cycle [18,19,20,21,22]. Our goal is the creation of a common portal interface that provides a standard look and feel for uniform access to a diverse array of the best open source applications from within the HENP communities, and from related Grid and NMI efforts. The basis for the GECSR portal interface will be the Open Grid Computing Environment (OGCE) [23]. The advantage of the OGCE is that it provides a common context in which to plug in specialized capabilities, such as features specific to the HENP communities, alongside a standard set of collaboration tools and services (e.g., announcement capabilities, persistent chat, shared calendars, role-based access control, and threaded discussions). At the University of Michigan, a variant of the OGCE is currently in use as an enterprise-wide course management system with approximately 30,000 users per term. In the OGCE, the common context for adding specialized features is achieved through the use of open-source components and standards, such as JSR 168, a Java standard that defines elements called portlets for delivery of applications through portals. An advantage of the portlet approach, as implemented in OGCE, is that portlets are decomposed into their presentation components supported by persistent services accessed via a standardized API. In addition, the OGCE optionally extends login authentication to use the Grid [24] as an authentication provider, such that a proxy credential can also be stored by OGCE to allow portlets to perform Grid operations.
Users will organize activity in the GECSR portal environment through sessions, or combinations of people, data, and communication that are tracked by the system. For example, in the case of a research group with ongoing meetings a new session will begin by calling up archived data, metadata, and tools that the group has used in the past. A key feature of any session is that resources available to users will be consistent with preferences and access/authorization levels of everyone involved in the session. For instance, a session will not employ a given capability, such as audio recording, unless a group decision rule is satisfied (e.g., majority approval or unanimous approval). Sessions will consist of both synchronous and asynchronous activity. In the synchronous setting, participants in a session will be able to access a wide array of communication capabilities, including video and audio conferencing, supplemented by the ability to view and control common applications, such as text editors, data analysis tools, or graphics viewers. In the asynchronous setting, users will be able to access previous sessions and to receive notifications when aspects of a session have changed, such as the addition of new or revised data. To aid retrieval of session content, the GECSR will use automatic archiving, where all versions of artifacts used in a session, such as documents, will be captured and cataloged.
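The group decision rule described above can be illustrated with a minimal sketch. The `Session` class, the `DecisionRule` names, and the `capability_enabled` method are hypothetical and chosen only for illustration; they are not part of the OGCE or of any existing GECSR component. The sketch assumes that each participant either approves a capability (such as audio recording) or does not, and that a strict majority is meant by "majority approval".

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class DecisionRule(Enum):
    """Hypothetical group decision rules for enabling a session capability."""
    MAJORITY = "majority"
    UNANIMOUS = "unanimous"


@dataclass
class Session:
    """Hypothetical session record: the participants and the rule they agreed on."""
    participants: List[str]
    rule: DecisionRule

    def capability_enabled(self, approvals: Set[str]) -> bool:
        """Return True if the approving participants satisfy the session's rule."""
        total = len(self.participants)
        if total == 0:
            return False
        # Only approvals from actual session participants are counted.
        votes = len(approvals & set(self.participants))
        if self.rule is DecisionRule.UNANIMOUS:
            return votes == total
        return votes * 2 > total  # strict majority (an assumption of this sketch)


# Example: audio recording in a three-person session under majority approval.
weekly_meeting = Session(["ana", "bo", "chen"], DecisionRule.MAJORITY)
recording_allowed = weekly_meeting.capability_enabled({"ana", "chen"})
```

In this example two of three participants approve, so the capability would be enabled under the majority rule but not under the unanimous rule.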
GECSR sessions will draw on the features of leading collaboration tools, many developed by or with the help of personnel involved with this proposal. A key motivation for selection of these tools, rather than comparable commercial products, is that the applications proposed for the GECSR are open source and in the public domain. This distinction is critical because the costs of commercial collaboration tools (such as WebEx, Microsoft Office Live Meeting (formerly Placeware), or Xerox Docushare) represent a significant barrier to widespread adoption and use. Further, customization of these off-the-shelf products is unlikely due to proprietary control of the underlying software. Therefore, for synchronous audio and video communication, the GECSR will employ both the Virtual Rooms Videoconferencing System (VRVS) [5] and the Access Grid [25]. VRVS, installed on 26,000 machines with over 8200 registered users in 99 countries, provides a worldwide multi-point connection service that allows users to work together with different protocols (SIP, H.323, Mbone, Access Grid, MPEG2, and MPEG4), operating systems, and applications. The number of VRVS users is growing at a rate of roughly one new user per hour. The Access Grid (AG) is a specific implementation of high-quality audio hardware with large video displays that allows natural conversations in group-to-group interactions. VRVS interoperates with the Access Grid and provides scalable access to an AG along with interfacing of H.323 hardware. For sharing and viewing common applications in real time, GECSR will use the Virtual Network Computing (VNC) system. For secure presence awareness and instant messaging, the GECSR will use the Pervasive Collaborative Computing Environment (PCCE) [26], which includes Jabber eXtensible Messaging and Presence protocols as well as collaborative computational workflow tools.
For asynchronous playback of audio, video, and PowerPoint slides, the GECSR will use the Synco-mat application developed for the Web Lecture Archive Project [27]. For access to stored data, the GECSR will use a peer-to-peer file sharing system based on a reliable and secure group communication protocol [28]. Finally, to automatically configure resources within sessions, the GECSR will use the Monitoring Agents in A Large Integrated Services Architecture (MonALISA) system [29], a monitoring framework and multithreaded, auto-discovering services architecture.
We plan to integrate the tools described above into a unified system with the following novel features:
Persistent Collaboration. Object persistence is a critical attribute of a collaborative environment. Without the ability to archive and make available the documents, whiteboards, conversations and interactions that occur in the course of an ongoing collaboration we lose one of the most transforming aspects collaborative environments could provide. Persistent documents and interactions in collaborative sessions create a form of group memory, keeping otherwise transient events or results available, not only for those who participated in the original session, but any others who may need to find such results;
Language of access. We need to provide the users of the GECSR with a language of access to allow each user to define the circumstances under which various collaborators, groups or agents can interrupt them and at what level. The language should be rich enough to allow definition of automated ways of responding under specified circumstances and must encompass dynamic definition of new groups or interactions. This language should also extend to allowing authorization decisions for persistent objects and the context sensitive use of collaborative tools, as well as use of tools consistent with group preferences and norms (i.e., as expressed through group decision rules). Such a capability is required to assure the users that they will have the flexibility to map social and organizational policies from existing collaborations to GECSR-enabled collaborations.
Human-System-Human as well as Human-Human interactions for collaborative work. A customizable dynamic environment that archives interactions and allows dynamic creation of groups could quickly produce more data and meta-data than a typical user can process. Providing agent-assisted collaboration as an integral part of the GECSR will be critical to giving users the capabilities they will need to find information, join groups and interact with colleagues. In addition, monitoring tools can aid collaborations by automatically optimizing communication channels, such as in videoconferences. Finally, for data-intensive transactions, agents can help users determine optimal strategies for moving and viewing data.
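As a rough illustration of the "language of access" feature listed above, the following sketch evaluates an ordered list of interruption rules for one user. Everything here is an assumption made for illustration: the `AccessRule` fields, the integer urgency scale, the action names, and the first-match-wins evaluation order are hypothetical and do not describe a designed GECSR policy language.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical rule in a user's access policy."""
    group: str        # group the sender must belong to; "*" matches anyone
    min_urgency: int  # lowest urgency level at which this rule applies
    action: str       # e.g. "interrupt", "queue", or "auto_reply"


def decide(rules: Iterable[AccessRule], sender_groups: Set[str],
           urgency: int, default: str = "queue") -> str:
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        group_matches = rule.group == "*" or rule.group in sender_groups
        if group_matches and urgency >= rule.min_urgency:
            return rule.action
    return default


# Example policy: shift crew may always interrupt; anyone may interrupt for
# urgency 3 or higher; all other requests receive an automated reply.
policy = [
    AccessRule("shift-crew", 1, "interrupt"),
    AccessRule("*", 3, "interrupt"),
    AccessRule("*", 0, "auto_reply"),
]
action = decide(policy, {"students"}, urgency=2)
```

A real policy language would also need to cover dynamic group definitions and authorization for persistent objects, which this sketch does not attempt.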
The importance of assessment and evaluation
Because collaborative tools intersect with individual, group, and organizational work practices, a critical goal for successful development and deployment is identification and resolution of key socio-technical problems. Preliminary studies of the ATLAS Collaboration [7], for example, show that dissatisfaction with currently deployed technologies reflects the mismatch between expectations and performance, such as the difference between the ease of face-to-face meetings compared to videoconferences. GECSR’s effectiveness depends on its ability to provide capabilities essential to the HENP community. To guide and improve the alignment of performance with expectations, the GECSR team will include researchers with expertise in human-computer interaction (HCI) who can distill common themes from diverse and complicated social and technical feedback. Understanding these social and technical themes will produce two critical benefits for the GECSR project. First, the HCI analyses will help identify the functionality the various GECSR systems must provide to meet the requirements of the HENP community. Second, the HCI analyses will play a role in the adoption of GECSR by providing project visibility to target users very early in the design process, managing expectations, and keeping the end users engaged. The HCI effort falls into two categories of activity: assessment and evaluation. Assessment activity can be thought of as elicitation of requirements from physicists. Evaluation activity can be thought of as measurement of satisfaction with deployed systems.
To meet the goals of the assessment effort, we will use multiple, complementary methods. Specifically, we propose to combine a community-based survey with selected intense observation of physicists at work. Surveys are a well-understood mechanism for efficiently obtaining data about preferences in a larger population through extrapolation from a relatively small number of randomly-selected respondents. The sample for the GECSR user survey will be a randomly stratified selection of physicists, students and technicians involved in the ATLAS, CMS, D0, and CDF collaborations. Items included in the questionnaire will cover: a) communication and collaboration (e.g., frequency and mode of communication by location of collaborators; size and geographic distribution of participants within primary collaborations); b) barriers and challenges to communication and collaboration (e.g., coordination loss, delays, miscommunication, information overload); c) practice (e.g., primary field of research; allocation of time across research activities); d) expectations for the GECSR (e.g., new opportunities for collaboration; better access to experimental data; new ways to incorporate research data in the classroom); e) priorities for GECSR capabilities (e.g., access to data repositories; remote participation in meetings and experiments; collaborative data visualization); and f) policies for data and instrument use (e.g., safety; data access; intellectual property). The aim of the GECSR survey activity will be to produce snapshots of needs and attitudes within the HENP community of users at the start, mid-point, and end of the GECSR project.
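The stratified selection of survey respondents described above could be drawn along the following lines. This is a minimal sketch under stated assumptions: the `stratified_sample` function, the record layout, the choice of collaboration as the stratum, and the 25% sampling fraction are all hypothetical; the actual strata and sample sizes would be set during survey design.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def stratified_sample(population: List[Dict], strata_key: Callable[[Dict], Hashable],
                      fraction: float, seed: int = 0) -> List[Dict]:
    """Draw the same fraction of members, at random, from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for person in population:
        strata[strata_key(person)].append(person)
    sample = []
    for key in sorted(strata, key=str):
        members = strata[key]
        # At least one respondent per stratum, never more than the stratum holds.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical roster: 20 people in each of the four collaborations.
roster = [{"id": i, "collab": c}
          for c in ("ATLAS", "CMS", "D0", "CDF") for i in range(20)]
respondents = stratified_sample(roster, lambda p: p["collab"], fraction=0.25)
```

Stratifying by role (physicist, student, technician) within each collaboration would follow the same pattern with a composite key.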
In contrast to the broad perspective provided by the survey data, we propose to periodically observe activity and interview participants at 3-5 sites from U.S. institutions involved in the CMS and ATLAS collaborations. Data collected within these settings will help illuminate variation in user needs across different social and institutional arrangements (e.g., a large research university versus a small liberal arts college). Specifically, systematic observation is often the best way to detect the causes of low adoption of deployed technologies, as in Ruhleder and Star’s [30] analysis of Worm Community System (WCS) use by C. elegans researchers or Orlikowski’s [31] analysis of Lotus Notes use by employees of a consulting firm. In both cases, technically elegant tools failed because key features did not match critical features of work organization, such as criteria for promotion or rewards systems. The aim of the observational and interview activity will be to generate descriptions of processes that either enhance or impede adoption of GECSR applications. As with the survey effort, these data will be used to shape deployment and development priorities, and will serve as input to the evaluation effort.
In the iterative development scheme proposed for GECSR, the evaluation component provides closure on each release cycle – and provides input for the next cycle – by offering feedback on user satisfaction and performance related to GECSR use. We propose to measure user satisfaction in terms of subjective reports and objective indicators. We plan to obtain subjective reports of user satisfaction through questionnaire items included on the second and third wave administrations of the community survey (described previously). These items will focus on: a) interface design (e.g., clarity of interface elements; function of icons); b) usability (e.g., do actions have expected consequences; are there predictable failure modes; is system help sufficient); c) functionality (e.g., does an application do what a user wants; are there missing capabilities); d) graphical layout (e.g., do plots show standard units; do applications consume appropriate screen real estate); and e) performance (e.g., did the application update promptly; did data transfer reliably). Objective indicators of satisfaction will include usage logs, where use levels are assumed to relate positively with satisfaction.
In terms of performance related to GECSR use, we propose to use a variety of measures. First, because a target of GECSR development is collaboration, we will focus on the structure of collegial networks before and after GECSR introduction as a marker of GECSR impact on the extent and diversity of network ties. Newman et al. [32] have developed a number of techniques for analyzing and visualizing graphs of extensive networks of association among physicists (e.g., using co-authorship). We will employ similar techniques to characterize networks among the population of scientists within the ATLAS, CMS, D0, and CDF collaborations. For example, a critical question will be the degree to which GECSR use results in more diverse collaborations, measured both in terms of geographic dispersion and in terms of research focus. In addition, we will be interested in whether GECSR use provides a greater capacity for collaboration, as measured by the number and size of primary collaborations. Second, because we expect GECSR use to enhance the significance of collaborations, we will examine subjective and objective measures of collaboration impact related to use. For instance, subjective measures might include questionnaire items about the self-reported importance of a collaboration, the likelihood of future collaborations with the same collaborators, and the quality of the collaboration experience (e.g., trustworthiness of collaborators). Objective measures of impact might include the output from a collaboration, such as the number of papers and reports, and the visibility of this output. Finally, because one of the anticipated benefits of collaboratory-based science is a leveling of access to experts, instruments, and data, we will examine whether the GECSR expands participation in physics research.
That is, beyond the focal audience of users (the physicists in the large collaborations), we will look for evidence that GECSR tools create new opportunities for engagement by faculty at non-PhD institutions, by undergraduates, by K-12 students, and by the general public. Specifically, we plan to produce case descriptions of novel forms of involvement by new players that the GECSR may enable. The results of this evaluation activity will be shared with the larger scientific community and will inform the development of future tools, environments and collaborations in other science, engineering and education domains.
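The network measures of collaboration extent and diversity discussed above can be sketched as follows. This is an illustrative simplification, not the method of Newman et al. [32]: the function names, the use of author lists as input, and the "cross-site fraction" as a proxy for geographic diversity are assumptions of this sketch, and the sample data are invented.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def coauthor_edges(papers: Iterable[List[str]]) -> Set[Tuple[str, str]]:
    """Build undirected co-authorship ties: one edge per pair sharing a paper."""
    edges = set()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges


def network_summary(papers: Iterable[List[str]], site_of: Dict[str, str]) -> Dict[str, float]:
    """Summarize extent (ties, mean degree) and diversity (cross-site ties)."""
    edges = coauthor_edges(papers)
    # Authors with no co-authors have no ties and are not counted as nodes here.
    nodes = {name for edge in edges for name in edge}
    cross = sum(1 for a, b in edges if site_of[a] != site_of[b])
    return {
        "physicists": len(nodes),
        "ties": len(edges),
        "mean_degree": 2 * len(edges) / len(nodes) if nodes else 0.0,
        "cross_site_fraction": cross / len(edges) if edges else 0.0,
    }


# Invented data: two single-site papers before, plus one cross-site paper after.
sites = {"ann": "Michigan", "bob": "Michigan", "cat": "Caltech", "dan": "Caltech"}
before = network_summary([["ann", "bob"], ["cat", "dan"]], sites)
after = network_summary([["ann", "bob"], ["cat", "dan"], ["ann", "cat", "dan"]], sites)
```

Comparing the two summaries shows the kind of before-and-after contrast the evaluation would look for: in the invented data the number of ties doubles and half of the ties become cross-site.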
Education and Outreach (E&O)
The Education and Outreach (E&O) program associated with this proposal will enable a collaborative learning community engaging traditional and non-traditional students in our studies of particle physics at the energy frontier. In the collaborative learning environments that will result from the development components of the proposal, students will become empowered to do science projects and/or help develop the learning environment itself under the supervision of teachers, and education researchers. We will deploy collaboratory tools into high schools and college classrooms to disseminate high-energy physics knowledge, enhance physics classroom activities, and explore how the tools adapt to the school community.
Florida International University (FIU) will introduce the tools to local high schools and evaluate their impact. FIU, in collaboration with partners at Florida State University (FSU), the University of Florida (UF), and the California Institute of Technology (Caltech), and in cooperation with the NSF, is creating and operating an interregional Grid-enabled Center for High-Energy Physics Research and Educational Outreach (CHEPREO) at FIU, encompassing an integrated program of research, network infrastructure development, and education and outreach at one of the largest minority schools in the US [33]. CHEPREO is extending FIU’s existing research activities at Jefferson National Laboratory to the long-term high-energy physics research program at the CMS experiment at CERN, creating a robust outreach activity based on CMS research, developing an advanced networking and Grid computing infrastructure that will draw in new collaborators from South America, and enhancing science and math education in South Florida for underserved minority students through pedagogic enhancements and teacher training led by a Physics Learning Center. Through GECSR support, FIU will also lead a coordinated program for E&O by creating a community of practice among the various collaborating institutions involved in this proposal, enabled through GECSR tools, to ensure the integration of E&O project goals: to engage minorities; to improve physics classroom curriculum directed by teacher/student contribution and response; to add a new dimension to existing programs (QuarkNet, ThinkQuest); and to employ education research techniques to investigate these approaches.
QuarkNet-affiliated [34] teachers will utilize the collaboratory tools through summer high-energy research programs. QuarkNet is a successful national program of particle physics education and outreach, supported by NSF, that partners high school teachers with physicist mentors. Teachers work as researchers and collaborators in forefront physics projects. The program, now in its fifth year, has over 50 participating university and national laboratory centers, five of which are represented in this proposal. These centers have well-established relationships between teachers and physicists and have reached tens of thousands of high school students through the programs. The University of Texas at Arlington (UTA) and the University of Iowa (UI) are among the twelve original QuarkNet Centers funded by NSF since 1999. FIU is a new QuarkNet center. The groups also cover significant geographic regions and have the potential to engage underrepresented groups in HENP. The E&O team envisages a competition, modeled after the ThinkQuest [35] concept of forming five-person teams with a coach to develop a project to be shared. While ThinkQuest projects are website lessons, we would develop live lesson plans that would involve remote teams linked together into the collaborative learning environment. Moreover, teams could and should consist of geographically separated members with like interests. By providing access to our facilities and research activities, we will build personal fulfillment and science excitement in students as well as open their eyes to new career opportunities. Teachers will be reinvigorated through greater exposure to pedagogical innovations that will translate into successful physics classroom experiences.
Our first goal is to disseminate the collaboratory tools into schools located near our partners. The tools will provide a 'real window' into the exciting world of particle physics at remote accelerator laboratories for many K-12 students and teachers. For education and outreach involving K-12 teachers, we will draw from the pool of active and highly regarded QuarkNet Teachers affiliated with our participating groups’ QuarkNet Centers. Through the collaboratory tools, students and teachers will interact with physicists at universities and laboratories to bring high energy physics into the classroom. Classroom-classroom interactions will also be a vital part of the proposal to foster teacher mentoring as well as to build a community among students. We will also provide internships for students to actively work on developing, prototyping, testing and using the collaborative tools. The Education and Outreach Area Coordinator will establish a group that will discuss communication and development issues related to education.
These discussions will be aimed at developing collaborative learning sessions that reference both the QuarkNet and ThinkQuest team lesson development models, with a new dimension of live interactive participation, and at developing collaborative class and research projects among undergraduates and high school students.
Prairie View A&M (PVAMU) is the second oldest university in the state of Texas. Black students constituted 92% of the undergraduate enrollment of 7,255 students at PVAMU in Fall 2002. UTA will partner with PVAMU to concentrate on minority recruiting programs. Since we are delivering a new collaborative environment in this proposal, it is appropriate to focus on how this new paradigm can help students who are traditionally underrepresented in physics. PVAMU played a major role in developing graphics software for BaBar. PVAMU is in the process of putting together a large donated grid computing cluster. The collaborative tools to be developed through this proposal, with active student involvement at PVAMU, can open doors for traditionally underrepresented students to participate in the greatest scientific discoveries at U.S. and international laboratories.
In the area of education research, we will develop direct and collaborative links with colleagues engaged in studying the effectiveness of collaboratory tools from an educational perspective at the undergraduate level as well as in the education of scientists joining large-scale research projects. To do this, we will provide support for a graduate student doing dissertation research in the School of Education at Michigan on the effectiveness of information delivery using the WLAP technology. Undergraduates participating in a summer Research Experience for Undergraduates (REU) at CERN and general physics undergraduates will serve as subjects in research on design principles for the WLAP technology and on the role of individual differences in the design of instructional materials for it [36,37]. The results of this research will be immediately applied to the research and development efforts of this proposal that relate to human-computer interface and will be incorporated into the other aspects of this educational and outreach proposal. Through this research, physicists will gain insight from professional educators into the nature and effectiveness of information delivery using the WLAP technology. Simultaneously, educators will gain insight into the nature and operation of science experimentation in large collaborations, which may spawn further research. The team at the University of Michigan will also carefully examine the transferability of the knowledge gained in the development of the GECSR Project to the development of college-level courses. By drawing upon the 300+ high energy physics lectures already recorded by the proposal participants from the CERN Summer Student Program and interfacing these through the OGCE portal, we plan to develop a full senior-level course in high energy physics and to evaluate and track the performance of students who enroll in it.
The University of Iowa team will leverage four existing programs to conduct education and outreach for this proposal: Iowa QuarkNet, Science Education Center of School of Education, GROW (Grid Research and education group @ IoWa), and the Studio of Academic Technologies for Education.
A special effort will be undertaken by Florida International University to develop a “Grids for Kids” collaboration on a physics grid tailored for high school implementation. This grid will enable students in grades 9–12 to interact in a virtual global village, where hands-on implementation and use can promote understanding and spark an interest in continuing studies in physics and technology at the undergraduate level. A small grid array (1-5 servers) in a high school classroom may readily stimulate interest by giving students tools for tasks that range from analyzing a simple result set to modeling complex physical behavior. Other related projects are planned at the K-12 level and for teachers. Teachers' lesson plans can now include more meaningful applications of theory. Experiments can include larger and more complex data sets, and their results can be analyzed at a much higher rate than current classroom technology allows. An environment where teachers can submit a data set to run on a physics cluster with Gigaflops of computing power, and monitor the job as it runs, all from a remote location, can be an exhilarating experience for kids of any background. Grid physics for kids would greatly improve the entry-level knowledge of university students wherever this initiative is implemented.
Summary of Coordination Plan
A significant challenge in the proposed GECSR project involves successful coordination of diverse participants representing multiple institutions and different disciplinary backgrounds. For example, a recent analysis of 62 completed projects from the NSF Knowledge and Distributed Intelligence (KDI) program, conducted by Cummings and Kiesler [38,39], found varying outcomes in geographically distributed multi-disciplinary teams. Specifically, performance did not vary by multidisciplinarity – but was strongly influenced by geographic dispersion. In particular, dispersed projects that did better employed a number of common management practices – including holding a kickoff workshop or conference, conducting regular audio conferences, and exchanging email regularly. However, workshops and sustained communication were costly in terms of resources and attention of project leaders. Analysis of responses from the PIs in the KDI projects showed the need for tools and practices that could accomplish the following: a) manage and track the trajectory of tasks over time; b) reduce information overload; c) support spontaneous conversation – including awareness of availability for conversation at a distance; d) support simultaneous decision making; and e) support scheduling of and participation in presentations and meetings across distance. In recognition of the risks posed by a team distributed across eight or more institutions, we will take explicit steps to address the pitfalls identified in the KDI study (in addition to regular use of standard modes of communication, such as phone and email). Note that the short description here is expanded in the separate three-page Coordination Plan that accompanies this submission. First, project activity will be led and directed by a management committee consisting of the four Co-PIs and chaired by Dr. Neal.
The management committee will meet three times a month, with the default mode being via videoconference – using technology under development in the scope of the GECSR project (e.g., VRVS and Access Grid). Once a month the committee will hold an augmented management meeting where leaders of the core technology will join the four Co-PIs. Second, the project will use the calendar tool within the OGCE to maintain an updated roadmap of project deliverables and milestones. In addition, a current Gantt chart will be posted in the resources area of the project OGCE workspace for quick review of schedule expectations and contingencies. Third, to reduce information overload, the project will use the resources area within the OGCE project space to post and review critical documents and URLs, with notification levels that individuals can adjust (e.g., frequency and type of notifications). Rigorous use of the resources area should also reduce the phenomenon of “death by email attachment” that plagues many distributed projects. Fourth, to provide presence awareness and opportunities for spontaneous interaction, project members will adopt the Jabber-based instant messaging application being developed for GECSR – and also use the built-in presence awareness and chat capabilities available within OGCE. Fifth, to support distributed presentations, the project will use tools in the Access Grid distribution, such as dppt, to view PowerPoint slides across multiple locations. Finally, to jumpstart creation and maintenance of trust and cohesion within the project team, the project will hold an annual workshop where participants can gather and discuss issues face-to-face – with the initial meeting held as a kickoff event shortly after notification of funding. In addition, the project leadership will identify opportunities for sub-sets of the project personnel to meet, such as combining project-related meetings with larger meetings (e.g., ATLAS week).
Timeline and Milestones
First Year: “Integrating Collaborative Tools”
The goal of the first year will be successful integration of core collaborative tools within the OGCE portal environment. Specifically, at the end of Year 1, users will be able to access key capabilities to support GECSR sessions within a common interface, including: videoconferencing, presence awareness and messaging, control and viewing of shared applications, and a shared file system. Each tool identified for integration into OGCE will be enhanced as needed to provide security, persistence, and compatibility with access and monitoring capabilities (e.g., MonALISA).
In terms of synchronous collaborative tool development, key deliverables in Year 1 will be:
Adaptation of VRVS to work within the OGCE environment. Specifically, this will involve development of management tools within OGCE that allow users to create groups and associate these groups with GECSR sessions. The management tools will also allow users to associate VRVS resources with GECSR sessions, such as live or archived videoconferences. For example, users will have the option to convert a transient GECSR session into a persistent session, where the GECSR system will automatically apply archiving (e.g., indexed MPEG audio/video streams) to communication in persistent sessions – subject to the preferences of the collaborators; (CalTech)
Adaptation of VNC to work within the OGCE environment. Specifically, this will involve development of OGCE tools that allow users associated with a GECSR session to control and view shared applications. Specialized modules (for example, Java applets) optimized for quality of service under dynamically changing bandwidths will be explored and developed early in the project; (CalTech, Michigan)
Adaptation of MonALISA to work within the OGCE environment. The MonALISA system provides a distributed monitoring and control service [40] for collaborative applications. MonALISA is based on a scalable Dynamic Distributed Services Architecture [41] and uses a multi-threaded engine for asynchronous data collection and interaction with other services and agents. Activity related to MonALISA in Year 1 will involve development of OGCE tools that allow users to access output of MonALISA processes, such as monitoring of hosts where VRVS reflectors are running (load, CPU, IO traffic), monitoring the status of VRVS reflectors (number of VRVS rooms, clients, peers, traffic), providing remote administrative control of reflectors using an X.509 certificate-based SSL connection, and providing real-time measurements of the quality of connectivity between selected possible peers (ABPing measures RTT, lost packets, and jitter using UDP and computes the quality of the connection every 2 s); (CalTech, Michigan)
Adaptation of the PCCE to work within the OGCE environment. The PCCE software provides secure presence and messaging for both synchronous and asynchronous communication. The PCCE software is based on the XMPP standards [42] and leverages the Jabber open source software developments [43]; (LBNL, UTA)
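As an illustration of the per-link measurements mentioned in the MonALISA item above, the following Python sketch derives RTT, packet loss, and jitter from one window of probe results and folds them into a single quality figure. It is not the ABPing implementation: the function name `link_quality`, the input format, and the scoring formula with its weights are assumptions made only for this example.

```python
from statistics import mean


def link_quality(rtts_ms):
    """Summarize one probe window for a single peer-to-peer link.

    rtts_ms: list of round-trip times in milliseconds, one per probe;
             None marks a probe whose reply never arrived (hypothetical format).
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")

    received = [r for r in rtts_ms if r is not None]
    loss = 1 - len(received) / len(rtts_ms)  # fraction of probes lost

    if not received:
        # Every probe was lost: the link is treated as unusable.
        return {"rtt": None, "loss": 1.0, "jitter": None, "score": 0.0}

    rtt = mean(received)
    # Jitter taken here as the mean absolute change between consecutive RTTs.
    if len(received) > 1:
        jitter = mean(abs(b - a) for a, b in zip(received, received[1:]))
    else:
        jitter = 0.0

    # Hypothetical score in (0, 1]: higher is better. The divisors 100 and 10
    # are arbitrary illustration weights, not values from any real tool.
    score = (1 - loss) / (1 + rtt / 100 + jitter / 10)
    return {"rtt": rtt, "loss": loss, "jitter": jitter, "score": score}
```

A monitoring agent could call such a function once per measurement interval for each candidate peer and publish the resulting score, so that links can be ranked without exposing the raw probe data.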
In terms of asynchronous collaborative tool development, key deliverables in Year 1 will be:
Adaptation of reliable and secure peer-to-peer information sharing within the OGCE environment. The scalable and secure peer-to-peer information sharing tool enables groups to securely share information among collaborators from the information’s natural storage location (e.g., files and databases) [28]; (LBNL)
Adaptation of automatic archiving systems within the OGCE environment; (All)
In terms of deployment, key deliverables in Year 1 will be:
Use of GECSR by the physics participants in this proposal to perform physics analysis, detector commissioning and detector operation; (All, Maryland leads)
Tighter integration of the legacy experiment-specific code with the GECSR environment, for example, integrating PCCE and OGCE infrastructure into data production and analysis tools; (All, Michigan leads)
In terms of assessment and evaluation, key deliverables in Year 1 will be:
Selection of the community sample, administration of the baseline community survey, analysis of the survey data, and submission of a report based on the survey results; (Michigan)
Selection of the sites for in-depth observation and interviews, completion of the baseline visits to these sites, and submission of a report based on the results of the observations and interviews; (Michigan)
Analysis of feedback from physics participants in the GECSR project about the performance and usability of the GECSR environment; (Michigan)
Second Year: “Introducing Dynamic Collaboration Capability”
The focus of the development effort in Year 2 will shift from the integration effort of the Year 1 release to the development of new tools that leverage capabilities that will exist in GECSR after the initial release and add completely new capabilities to GECSR. In particular, a key goal for Year 2 will be development of a “language of access” that, when combined with other monitoring capabilities in GECSR, will assist users in discovering and joining interesting collaborations. All of the OGCE services will be adapted to also support WSDL bindings using the WS-Resource Framework (WS-RF) [44]. This will allow collaborative capabilities to be added to any application that is capable of acting as a WS-RF client.
In terms of synchronous collaborative tool development, key deliverables for Year 2 will be:
Dynamic optimization of connectivity. Using MonALISA, we will provide a dynamic distributed decision unit that computes a global Minimum Spanning Tree (MST) to optimize connectivity and to restore correct connectivity when reflectors are down or unreachable due to network problems. This will be done using a set of agents deployed to all the monitoring services to collect the connectivity information and to dynamically compute the MST; (CalTech)
Implementation of a language of access. The GECSR will have a language of access, using authorization tools [45] that allow association of ad-hoc groups of physicists with GECSR sessions based on individual X.509 certificates. Prior work has produced Web-based tools that implement much of the language of access functionality. The new work in Year 2 will integrate these existing tools within the OGCE environment; (CalTech)
Implementation of location-aware monitoring. We will use a geographic-information-systems-based Grid information broker (GIS-GIB) [46] to provide physicists with a location-aware mechanism for monitoring GECSR-based collaborations, discovering interesting collaborations, and analyzing the spatial and temporal characteristics of collaborations. GIS-GIB will help physicists dynamically identify where and when interesting collaborations are taking place. In addition, the GIS-GIB can be used to organize and analyze collaborative efforts that are geographically distributed and where the level of activity varies over time; (Iowa)
Implementation of collaborative, grid-enabled dialogs. Collaborative work-sharing will be enhanced by decision support from the monitoring and other Grid-management agents, which will interact with the collaborators in a given session to determine their needs, match the needs to Grid system conditions, and begin a simple dialog in case there are difficult choices to be made (e.g., in terms of moving large volumes of data); (CalTech, UTA)
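The dynamic connectivity optimization listed above can be pictured with a small, centralized sketch. The Python below uses Kruskal's algorithm to build a minimum spanning tree over the reflectors that are currently reachable, and simply recomputes the tree when one drops out. The reflector names, link costs, and the function `spanning_tree` are hypothetical, and the distributed agent-based computation described for MonALISA is replaced here by a single-process calculation.

```python
def spanning_tree(links, up):
    """Kruskal's algorithm restricted to reachable reflectors.

    links: dict mapping (reflector_a, reflector_b) -> cost, e.g. measured RTT
    up:    set of reflectors that currently respond to monitoring
    Returns a list of (a, b, cost) edges. If the reachable reflectors are not
    all connected, the result is a spanning forest rather than a single tree.
    """
    parent = {r: r for r in up}  # union-find forest, one entry per reflector

    def find(r):
        # Walk to the set representative, compressing the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    tree = []
    for (a, b), cost in sorted(links.items(), key=lambda kv: kv[1]):
        if a not in up or b not in up:
            continue  # skip links that touch a down or unreachable reflector
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


# Hypothetical reflectors and link costs, for illustration only.
LINKS = {
    ("cern", "caltech"): 170,
    ("cern", "fnal"): 120,
    ("fnal", "caltech"): 60,
    ("fnal", "umich"): 15,
    ("caltech", "umich"): 70,
}

full = spanning_tree(LINKS, {"cern", "caltech", "fnal", "umich"})
degraded = spanning_tree(LINKS, {"cern", "caltech", "umich"})  # "fnal" is down
```

Recomputing from scratch is the simplest choice for a handful of reflectors; a deployment with many reflectors and frequent changes might prefer an incremental update, which this sketch does not attempt.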
In terms of asynchronous collaborative tool development, key deliverables for Year 2 will be:
Adaptation of HEPBook [47] to work within the OGCE environment. HEPBook provides a graphical user interface for entry of multi-media content, extensive search capabilities, access control, authorization, and protocols to connect to remote repositories. In addition, we will provide an application programmer interface to allow users to define new HEPBook entry types and to define new software agents (e.g., for managing HEPBook configuration). In particular, users will be able to use languages other than Java and the Java Expert System Shell (JESS). The basis for the agent platform will be JBOSS and Java Enterprise Servers; (Maryland)
Adaptation of Quadtree-based Replica Management Service to work within the OGCE environment. An adaptive quadtree approach is proposed to support replica management service in a dynamic data sharing system within a GECSR session or across sessions. It is anticipated that Grid-based collaborative LHC analysis will require frequent data replication and distribution based on dynamic data access patterns, continuing generation of new data, and diverse scopes of human-to-human collaborations. Current Grid middleware (e.g., GDMP) does not support regional and global optimization of dynamically distributing, replicating, and sharing large LHC datasets on the Grid. A preliminary study [48] has demonstrated that a quadtree-based algorithm can adaptively decompose and share datasets on the Grid based on their characteristics such as frequency of data access. The quadtree approach will be integrated with GECSR-based group communication protocols to provide optimal solutions to large-scale data allocation and replication within GECSR sessions; (Iowa)
Document versioning and annotation capabilities associated with the peer-to-peer information sharing tool. This capability will provide an initial coarse-grained collaborative document editing facility. (LBNL)
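The adaptive quadtree idea in the replica management item above can be illustrated with a toy decomposition. In this Python sketch a dataset is laid out on a square grid of access counts, and a region is split into four quadrants only while its access total exceeds a threshold, so frequently accessed regions end up as small tiles that could be replicated separately. The grid layout, the threshold rule, and the names `decompose` and `ACCESS` are assumptions for illustration and are not taken from the preliminary study [48].

```python
def decompose(counts, x, y, size, threshold):
    """Recursively split a square region of an access-count grid.

    counts:    square list of lists, counts[row][col], side a power of two
    x, y:      column and row of the region's top-left cell
    size:      side length of the region
    threshold: a region is split while its access total exceeds this value
    Returns a list of leaf tiles as (x, y, size, total_accesses).
    """
    total = sum(counts[row][col]
                for row in range(y, y + size)
                for col in range(x, x + size))
    if size == 1 or total <= threshold:
        return [(x, y, size, total)]  # cold enough, or cannot split further

    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += decompose(counts, x + dx, y + dy, half, threshold)
    return leaves


# Hypothetical access counts: one hot corner in an otherwise quiet dataset.
ACCESS = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 9, 8],
    [0, 0, 7, 1],
]

leaves = decompose(ACCESS, 0, 0, 4, threshold=5)
```

With this input the three quiet quadrants stay as single 2x2 tiles while the busy quadrant is split down to individual cells, which is the adaptive behavior the item describes: replication effort concentrates where access is heaviest.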
In terms of deployment, key deliverables in Year 2 will be:
Release of the GECSR environment for use by a limited number of external user groups in the HENP community (e.g., groups associated with CDF and D0 at the Tevatron and with ATLAS and CMS at the LHC). These groups will be selected to cover a representative distribution of potential GECSR users. Candidate groups include two physics object groups from CMS (each group has 30+ members focused on development of algorithms for physics object reconstruction and implementation of these algorithms in software), sub-groups of the CMS object groups (2-5 members focused on specific tasks), the Hadron calorimeter group (this group has around-the-clock activity associated with beam test and detector commissioning at CERN, and there are already plans to perform remote tasks from the US), and the challenge-driven data production groups in the LHC experiments. (Maryland)
In terms of assessment and evaluation, key deliverables in Year 2 will be:
Analysis of feedback from test users in the external groups about the performance and usability of the GECSR environment. (Michigan)
In terms of support of users, key deliverables in Year 2 will be:
Third Year: “Deploying a production environment”
In terms of asynchronous collaborative tool development, key deliverables in Year 3 will be:
Implementation of a prototype tool for collaborative LaTeX document preparation. Physics collaborations often revolve around preparation of LaTeX documents, and mechanisms for marking up, versioning, and commenting on these documents are a critical need. Work on this capability will begin in Year 2 and a first prototype capability will be available in Year 3. (LBNL)
In terms of deployment, key deliverables in Year 3 will be:
In terms of assessment and evaluation, key deliverables in Year 3 will be:
Administration of the mid-point community survey, analysis of the survey data, and submission of a report based on the survey results. (Michigan)
Completion of the mid-point visits to the in-depth sites, and submission of a report based on the results of the observations and interviews. (Michigan)
In terms of support of users, key deliverables in Year 3 will be:
Setup and opening of a GECSR support office at Fermilab. We propose to have one dedicated computer professional for deployment and support. Since this person needs to support a large number of internal and external user groups, the deployment and support activity will reside at Fermilab and utilize the infrastructure of the US CMS Tier-1 center for the most efficient deployment and support of the user community. In addition, the person will work closely with computing and software support groups in each HEP collaboration to provide additional support to users. The deployment and support person will also collect bug reports on the deployed GECSR environment and pass this feedback to the development team for further improvement of GECSR. (Maryland)
Enhancements to deployed tools and capabilities based on prioritization by the management team using the assessment and evaluation from the second year. (All)
Fourth Year: “Operational use of the GECSR”
In terms of asynchronous collaborative tool development, key deliverables in Year 4 will be:
In terms of deployment, key deliverables in Year 4 will be:
Widespread adoption and use of GECSR within the HENP community. Physicists in this collaboration will work closely with the management group of each target HEP collaboration for wide adoption and use of the GECSR environment. Some participants in this proposal are members of such management groups. D. Green is the US-CMS Research Program manager and also a member of the international CMS management board. J. Womersley is a co-spokesperson of the D0 collaboration. We have already begun discussions with managers of other HEP collaborations. (All, Maryland leads)
In terms of assessment and evaluation, key deliverables in Year 4 will be:
Administration of the final community survey, analysis of the survey data, and submission of a report based on the survey results. (Michigan)
Completion of the final visits to the in-depth sites, and submission of a report based on the results of the observations and interviews. (Michigan)
In terms of support of users, key deliverables in Year 4 will be:
Broader Impacts
Though HENP will be the targeted community for this proposal, the tools, systems, and integrated working environment proposed here will be designed and constructed to be broadly applicable, so that GECSR will contribute to improved modes of collaborative work in many fields of science and engineering, education, and homeland security. Specifically, there are three target areas where the GECSR experience strongly overlaps with related ongoing efforts in other fields and will have immediate impact. First, integration of videoconferencing capabilities within the OGCE framework will deliver video applications to other communities using OGCE, such as the earthquake engineers via NEESgrid, who currently rely on costly proprietary unicast technologies (e.g., H.323 devices) [49]. Second, GECSR applications will be compatible with the Sakai environment, since OGCE and Sakai share the same base software. Sakai is a recently announced Mellon Foundation effort involving Michigan, MIT, Stanford, Indiana and the uPortal consortium that will build an open source course management system. Assuming widespread adoption of Sakai, any institution using Sakai will also be able to incorporate GECSR features, allowing a seamless integration of research and classroom tools [50]. Third, the NIH has created eight regional centers for biodefense, called Regional Centers of Excellence, that will require tools for secure collaboration and data movement across geographically distributed labs. At least one of these centers, the Great Lakes RCE, has already adopted an OGCE-based collaboration environment [51] – and the NIH has announced a competition to construct cyberinfrastructure to join all eight centers, which is likely to expand the use of OGCE beyond the Great Lakes center.
Therefore, while GECSR development is imperative for the health and success of current and future global physics collaborations, output from the GECSR project will also be vital for the success of other high visibility cyberinfrastructure initiatives.
In terms of the impact on physicists, the collaborations enabled and fostered by the GECSR, with the support of governments and funding agencies around the world, will start to fulfill the vision that scientists from all regions are allowed, encouraged, and given the means to function as full partners in the scientific process of search and discovery. This proposal is an initial attempt to realize worldwide participation in the next generation of HENP experiments. True global collaboration in the data-intensive analysis that upcoming experiments will require creates a demand for a new generation of grid-enabled collaborative tools that can both access resources distributed around the world and knit together geographically dispersed collaborators.
Background of our Groups
This proposal brings together several groups with complementary and extensive knowledge and experience in several areas of collaborative scientific research:
Caltech: Caltech’s expertise is in videoconferencing software, virtual rooms, the Virtual Room Videoconferencing System (VRVS), agent-based monitoring, virtual organization management, high performance wide area network implementation, management and operations, Grid systems and Grid-Enabled Analysis development, and self-learning distributed systems optimization. The Caltech group, led by the US CMS Collaboration Board Chair, originated and has had responsibility for transatlantic networking for HEP since 1982. It currently co-manages the operation and development of the “LHCNet” links between CERN and Starlight in Chicago [52]. It led the MONARC project [53] that developed the worldwide distributed computing model for LHC data analysis, and originated the Data Grid hierarchy concept that has been adopted by the LHC experiments. It shares leadership in the Particle Physics Data Grid [2] and iVDGL [54]. It developed VRVS [5], which went into production in 1997 and currently runs on over 20,000 hosts.
The Caltech CMS group in HEP has worked in close collaboration with the Center for Advanced Computing Research (CACR) since 1996. This collaboration began with the GIOD (Globally Interconnected Object Databases) [55] project funded by Hewlett Packard, and the first accurate evaluation of the backgrounds to Higgs particle searches using the diphoton signature, using pre-TeraGrid prototypes that took advantage of the large-scale computational resources at CACR and other NPACI sites. This collaboration led to the development of the first prototype Tier2 center at CACR and SDSC, by the CMS groups at Caltech and UCSD. Caltech and UCSD are two of the four TeraGrid [56] sites (along with Argonne and NCSA).
Caltech, and the CMS group in particular, has been very active in fostering collaborations for science and education on a global scale. The group has provided assistance in the design and implementation of Tier2 centers in developing countries since 1999, including India, Pakistan, China, Korea and Brazil among others. Recent events organized by this group include a lecture series on Grids, Networks and national IT infrastructure at the renowned Nathiagali Summer College in Islamabad, Pakistan (founded by Nobel Laureate Abdus Salam in 1974), a session on Grids at UERJ in Rio (and made available to the world scientific community using VRVS and AMPATH) in February 2002, and a session on the “Role of New Technologies in Developing the Information Society” in Bucharest in November 2002 [57]. Recently the Inter-regional Grid Enabled Center for High Energy Physics Research and Educational Outreach (CHEPREO) at FIU was established through the auspices of the NSF, and provides an ideal testbed environment for the deployment, assessment, and evaluation of collaborative tools for radical improvement in science and math education.
Fermilab: Fermilab is the US center of CMS for detector construction, commissioning and operations, and physics analysis. It is also the US CMS Tier-1 center for computing and one of the US institutions working on LHC machine construction and operation. Many of the Fermilab team members are also members of CDF or D0. Their participation in this proposal is as a user group: to help define requirements and evaluate the prototype collaboratory environment. In addition, the members from the Computing Division (CD) will coordinate work on this proposal and other computing projects at Fermilab, which include the US CMS computing project and the HEPBook development project in CD.
Florida International University: FIU is ranked by the Carnegie Foundation for the Advancement of Teaching as a Doctoral/Research University-Extensive, the highest ranking in its classification system. FIU has more than 32,686 students, 1,100 full-time faculty, and 90,000 alumni, making it the largest university in South Florida and placing it among the nation’s 25 largest colleges and universities. It is unique in that it has the highest proportion of international students and faculty of any major university in the country. Its mission includes being the principal educational and research interface between the State universities and South and Central America and the Caribbean. AMPATH provides opportunities for Hispanic minority students to work on the project in technology and administrative positions. FIU is one of the largest minority universities in the continental US and the largest Hispanic-serving minority university, with over 32,000 students (55% Hispanic and 20% African American). The FIU nuclear physics group was established in 1995 when 8 positions (6 experimental and 2 theoretical) were created in partnership with the Thomas Jefferson National Accelerator Facility. Members of this group have gone on to collaborate in the CMS experiment through the NSF-supported CHEPREO project.
University of Iowa: The University of Iowa team members will bring their expertise in the following areas to the execution of the proposed GECSR project: (1) Grid research, education, and outreach in HENP through the collaboration between GROW (Grid Research and educatiOn group @ IoWa) and the university CMS community; (2) CMS collaboration experience in forward calorimeter construction and data analysis; (3) collaborative visualization (e.g., the NSF-funded LIVE (Laboratory for Immersive Visualization for the Environment) [58]); (4) geographical-information-systems-based Grid monitoring; (5) participation, as an iVDGL member institution, in the development and operation of USCMS Grid testbeds; and (6) operation of an intra-campus research Grid, HawkGrid, supported by its operation center.
Lawrence Berkeley National Laboratory: The members of the Lawrence Berkeley National Laboratory team bring to the proposal extensive expertise in physics collaborations and nine years of development, deployment and use of a broad range of collaborative tools including a Pervasive Collaborative Computing Environment[26], secure and reliable group communication protocols [59], secure messaging and presence, Grid computing, secure peer-to-peer shared file systems[28], security models appropriate to collaborative activities[26], and the Access Grid videoconferencing systems.
Maryland: The members of the University of Maryland team have extensive experience in coordinating many groups with a large number of people in both CMS and D0. They have worked as coordinators of the CMS hadron calorimeter construction project, the CMS Jet and Missing Et physics group, the D0 physics group and its subgroups (top, new phenomena), the D0 Computing group, the D0 simulation group, etc. The Maryland team brings to the proposal close ties to the user community. The team also takes responsibility for deploying the developed collaboratory environment and for user support.
University of Michigan: The University of Michigan is an international leader in the design and assessment of collaboratories, in the development of open-source environments for Web-based classroom and research collaboration, and in the capture and storage of lecture and presentation content for replay over the Web. In terms of collaboratories, faculty at the School of Information (SI) developed and deployed the NSF-funded Space Physics and Aeronomy Research Collaboratory (SPARC) in 1993, which remains the world’s oldest continuously operating collaboratory. Over the past decade, SI faculty have led several other important NSF- and NIH-sponsored collaboratory projects, including the currently funded George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) [60], where Dr. Finholt is a co-PI on NEESgrid [49], the grid-based collaboratory component of NEES. In addition, SI faculty are leading an effort to develop general principles for collaboratory design and use through the NSF ITR-funded Science of Collaboratories [61] project, where Dr. Finholt is also a co-PI. In terms of environments for Web-based collaboration, faculty and staff at the University of Michigan’s Duderstadt Center (formerly the Media Union) are leading the effort to build an open-source course management system called Sakai [50]. The Sakai project, carried out with MIT, Stanford, Indiana, and the uPortal consortium, was just launched with a major gift from the Mellon Foundation. Dr. Severance is leading the effort to produce grid-enabled versions of Sakai, such as for the NEESgrid and OGCE projects. Finally, in terms of capture and replay of lecture content over the Web, faculty and staff at the Media Union and the Department of Physics produced the first applications for archiving and replaying CERN seminars and tutorials. Specifically, Professor Neal, Dr. McKee, and Dr. Severance have directed and shaped the Web Lecture Archive Project [27], which in an expanded form will provide a key mechanism for education and outreach in this proposal. Overarching these independent activities is the Michigan Grid Research and Infrastructure Development (MGRID) center [62], an interdisciplinary effort funded by the University of Michigan administration to develop an internal grid testbed (mgrid.org). MGRID center staff will work with key projects, such as the research proposed here, to enhance the evolution of data and computing grid applications at Michigan and elsewhere. Drs. Neal and Finholt are appointed to the Executive Board of MGRID, and Dr. Severance is a key member of the MGRID technical advisory team.
University of Texas at Arlington: Close interactions between the HEP researchers and CS researchers at UTA have led to recent acquisitions from the NSF MRI program and joint supervision of graduate students. The HEP group at UTA played a leading role in developing tools (Grid Application Toolkit, GRAT [63]) for large-scale Monte Carlo data generation using Data Grid middleware such as Globus [64] and Condor [65]. Thousands of CPU days are made available for specific studies over short periods of 1-2 weeks at 3 national laboratories and 7 universities in the U.S. The collaboratory tools proposed here will allow collaborations to share and manage the task of data production and analysis. The CS researchers at UTA have extensive experience in multimedia (author of the MtreC MPEG4 Toolkit [66]) and database research. UTA serves a large minority population in the Dallas-Ft. Worth metroplex.
Results from Prior NSF Support
Conclusion: Transition to a New Era of Collaboration in Scientific Research and Education
Management Coordination Plan
Specific roles of project personnel
The Education and Outreach Coordinator will be Dr. Heidi Alvarez (Florida International University), who has led development of the Center for High-Energy Physics Research and Educational Outreach (CHEPREO) at FIU.
The Education and Outreach Coordinator will establish and lead the group that will design and implement the GECSR E&O activities. These activities will aim to develop collaborative learning sessions, and collaborative class and research projects, among undergraduates and high school students.
How the project will be managed
Specific coordination mechanisms
A significant challenge in the proposed GECSR project involves successful coordination of diverse participants representing multiple institutions and different disciplinary backgrounds. For example, a recent NSF report on 62 completed projects from the Knowledge and Distributed Intelligence (KDI) program (Cummings & Kiesler, 2003) found varying outcomes in geographically distributed multi-disciplinary teams. Specifically, performance did not vary by multidisciplinarity, but was strongly influenced by geographic dispersion. In particular, dispersed projects that did better employed a number of common management practices, including holding a kickoff workshop or conference, conducting regular audio conferences, and exchanging email regularly. However, workshops and sustained communication were costly in terms of resources and attention of project leaders. Analysis of responses from the PIs in the KDI projects showed the need for tools and practices that could accomplish the following: a) manage and track the trajectory of tasks over time; b) reduce information overload; c) support spontaneous conversation, including awareness of availability for conversation at a distance; d) support simultaneous decision making; and e) support scheduling of and participation in presentations and meetings across distance.
In recognition of the risks posed by a team distributed across eight or more institutions, we will take explicit steps to address the pitfalls identified in the KDI study (in addition to regular use of standard modes of communication, such as phone and email). First, the management team will meet three times a month, with the default mode being via videoconference – using technology under development in the scope of the GECSR project (e.g., VRVS and Access Grid). Second, the management team will hold a monthly meeting of the augmented management team. Third, the project will use the calendar tool within the OGCE to maintain an updated roadmap of project deliverables and milestones. In addition, a current Gantt chart will be posted in the resources area of the project OGCE workspace for quick review of schedule expectations and contingencies. Fourth, to reduce information overload, the project will use the resources area within the OGCE project space to post and review critical documents and URLs, with notification levels that individuals can adjust (e.g., frequency and type of notifications). Rigorous use of the resources area should also reduce the phenomenon of “death by email attachment” that plagues many distributed projects.
Fifth, to provide presence awareness and opportunities for spontaneous interaction, project members will adopt the Jabber-based instant messaging application developed for GECSR – and also use the built-in presence awareness and chat capabilities within OGCE. Sixth, to support distributed presentations, the project will use tools in the Access Grid distribution, such as dppt, to view PowerPoint slides across multiple locations. Finally, to jump-start the creation and maintenance of trust and cohesion within the project team, the project will hold an annual workshop where participants can gather and discuss issues face-to-face – with the initial meeting held as a kickoff event shortly after notification of funding. In addition, the project leadership will identify opportunities for subsets of the project personnel to meet, such as combining project-related meetings with larger meetings (e.g., ATLAS week).
Specific budget items to support coordination
The most significant budget item in support of coordination within the GECSR project is funding for development and deployment of the GECSR tools themselves – which, in addition to being used by physics researchers to do their work, will also be used by the members of the GECSR team to execute the GECSR project. Another significant budget item is funding for travel, which will support an annual face-to-face meeting of the augmented management team and an annual meeting of the EAB.
First Year: “Integrating Collaborative Tools”
Second Year: “Introducing Dynamic Collaboration Capability”
Establishment of a single integrated GECSR help desk and support function. (Maryland)
Enhancements to deployed tools based on prioritization by the project management team using the assessment and evaluation information from the first year. (All)
Third Year: “Deploying a production environment”
The focus of activity in Year 3 will shift to deployment of GECSR within the larger HENP community.
Deployment of the GECSR environment for the QuarkNet [46], CHEPREO [47] and PVAMU outreach participants. (FIU, UTA)
Deployment of prototype collaborative editing capabilities to a select HENP group for testing and feedback. (Maryland, MSU)
Fourth Year: “Operational use of the GECSR”
The focus of activity in Year 4 will shift to operational use of GECSR within the larger HENP community.
Collaborative LaTeX document editing capability. This capability will provide users with rudimentary change tracking and commenting facilities that are integrated into LaTeX. (LBNL)
Continued operation of the GECSR support office at Fermilab. (Maryland)
Enhancements to deployed tools and capabilities based on prioritization by the management team using the assessment and evaluation from the second year. (All)
The broader implications of this proposal, and its mission to grant full partnership to groups in all world regions, are driven by: (1) the right of groups and governments that have contributed to the global collaborations to share and collaborate in the data analysis, and thus in the process of search and discovery that is at the root of all basic research; (2) the need to justify global projects and global collaborations as a valid and vital means of conducting future leading-edge scientific research and other endeavors, without exploitation of the poorer nations and world regions; and (3) the need to develop effective means of collaboration and cooperation among sub-groups with vastly different academic and cultural backgrounds, and especially to allow effective collaboration by university-based students from their home countries.
Homer Neal: University of Michigan, NSF Award DUE #0333580 (10/1/2003 – 9/30/2005; $249,998), PI, “Web Lecture Capture Project”. The project is designed to develop a fast and efficient process for capturing and publishing content-rich lectures on the Web in the setting of large conferences and meetings with multiple parallel sessions, using the WLAP technology. Significant milestones have been achieved, including a recent demonstration at the American Physical Society Meeting. University of Michigan, NSF Award PHY #0139604 (10/1/2001 – 9/30/2004; $310,000), PI, “Research Experience for Undergraduates Program at the European Organization for Nuclear Research (CERN)”. This program provides the opportunity for U.S. undergraduates to spend summers at CERN carrying out research and participating in the prestigious CERN summer student program. A number of students in this program would be expected to work on the GECSR project proposed herein.
Julian Bunn: Johns Hopkins University Subaward Agreement No. 08002-48195 under NSF Grant No. PHY-9980044 (9/15/1999 – 8/31/2003; $1,051,771), “Accessing Large Data Archives in Astronomy and Particle Physics”. Results: The sheer amount of data used in the fields of astronomy and particle physics has forced both communities to move from flat files to highly organized databases. This project contributed to (1) the organization of data for efficient access, (2) storage of data in widely distributed locations, and (3) efficient handling of entire major archives.
Tom Finholt: CMS-0117853 (8/01/2001 – 9/30/2004; $1,814,026), “NEESgrid (Network for Earthquake Engineering Simulation): A Distributed Virtual Laboratory For Advanced Earthquake Experimentation and Simulation”. Results: This project will contribute to the system integration effort for the George E. Brown, Jr. Network for Earthquake Engineering Simulation. It will define the user requirements for the NEESgrid and contribute several software components including collaboration tools through the CompreHensive CollaborativE Framework (CHEF). Representative Papers: Severance, C.R. (2003). Integrating the grid into CHEF. NEESgrid Technical Report TR-2003-1 and Finholt, T.A., Wierba, E.E., Birnholtz, J.P. & Hofer, E. (2002). NEESgrid user requirements document. NEESgrid Technical Report.
Shuichi Kunori: No prior support
We are proposing to launch a next-generation collaborative framework for scientific research, harnessing multi-institutional expertise, and existing and new tools across a broad range of technologies, to begin the transition to a new era of collaboration in the sciences. Just as there is an ongoing transition to grid computing, enabling transparent access between people and computing resources, we foresee a transition to collaborative environments that enable new modes of transparent, persistent and spontaneous interaction between scientists located at sites around the globe. The collaboratory we envision will provide secure, intelligent, easy-to-use collaborative tools – delivered via a uniform portal interface – integrated with discipline-specific tools for each scientific field. This makes it a key required element for the success of today's global scientific collaborations, and gives it the potential to become a cornerstone of more tightly knit worldwide collaborations of the future. Extension of the GECSR system and concepts to the formation of Grid-enabled Learning Environments will provide educators and students alike with richer and easier modes of access to and interaction with scientists located in all world regions, making them, from their classrooms and homes, part of the discovery process of frontier science. The proposal’s research outcomes, from the development of the collaboratory to a deeper understanding of effective modes of global communication, could have a profound and broad impact on research and education, as well as on the way multi-institution organizations operate, by enhancing our ability to interact and share work collectively across global networks.
The Project Director will be Dr. Homer Neal (University of Michigan), who is the director of the UM-ATLAS Collaboratory Project, the Samuel A. Goudsmit Professor of Physics, Interim President Emeritus, Vice President Emeritus for Research, and former chair of physics from 1987 to 1993 at Michigan. Dr. Neal's research area is experimental high energy physics and he is currently conducting his research at CERN, where his research group is part of the ATLAS Experiment. His technical research expertise includes the design of particle detectors, particle event reconstruction and analysis, large-scale database management, and particle physics phenomenology. He has led many experiments that have elucidated the nature of spin effects in high energy particle interactions, including proton-proton elastic scattering, electron-positron scattering, and various inclusive hadronic reactions. In the ATLAS experiment, for which he is the Michigan Institutional Representative, Dr. Neal is involved in developing the computing infrastructure required to carry out the planned physics analysis. This includes work on high-speed networking between CERN and the US, quality of service protocols, and the development of collaborative tools. He is also involved with the Michigan effort to construct muon chambers for the ATLAS forward muon spectrometer.
The Project Coordinator will be Dr. Shuichi Kunori (University of Maryland), who has long experience in leadership of physics analysis groups. The Project Coordinator has overall responsibility for project coordination and deliverables; for calling meetings, writing reports, dissemination, and the use of the delivered tools in the physics collaborations; and for working closely with the Technical Coordinator on their joint responsibilities. The Project Coordinator will report to the Steering Group for the Management Team.
The Technical Coordinator will be Dr. Julian Bunn (California Institute of Technology), who has significant technical expertise in grids, collaborative systems, and HEP computing. The Technical Coordinator will have overall responsibility for the day-to-day coordination of the technical work of the project, for identifying needed inter-team coordination and work, and for working closely with the Project Coordinator on their joint responsibilities.
The Assessment and Evaluation Coordinator will be Dr. Thomas Finholt (University of Michigan), who has led assessment and evaluation of several major collaboratory projects including NEESgrid, the collaboratory element of the George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES). The Assessment and Evaluation Coordinator will design, administer, and analyze the community survey and design, conduct, and analyze the in-depth site visits. In addition, the Assessment and Evaluation Coordinator will work with the rest of the project management to ensure that development and deployment activities are consistent with documented user needs.
This project will involve coordination of development and deployment activity across distributed institutions. In addition, this project will be part of the LHC research program, where thousands of physicists are engaged in an ongoing managed international collaborative effort, in which the core of the project team is already embedded as collaborators in the LHC physics experiments. Therefore, the project will employ multiple management structures to meet the need for internal organization and the need for external coordination with the larger LHC research community.
In terms of internal organization, the project will be led by Dr. Neal and a management team consisting of the coordinators named previously: Bunn, Finholt, Kunori, and Alvarez. The management team will be responsible for running the project, will employ the specific coordination mechanisms described in the next section, will track the progress of the project, and will administer funds. The experience of this team will enable us to meet the significant management challenges of the project: the broad scope of the problem to be addressed; the diverse multi-disciplinary teams; and the diverse external constituencies including national labs and international organizations. In addition, the project will employ an augmented management team, consisting of the core project leadership and the leads within each of the technical development areas.
In terms of external coordination, the project will form a GECSR user group, with the responsibility for assisting the assessment and evaluation effort in collection of user requirements and of user experiences with the GECSR tools. The GECSR user group activity will be led by Dr. Raymond Brock (Michigan State University). Through the GECSR user group, interested and early-adopter members of the research community can discuss and give feedback on issues and suggestions for: more effective modes of communication during work sessions; required features or new paradigms; and future development work. A GECSR Web site will be maintained to keep the scientific community up to date with development plans and issues, and to encourage online feedback.
Finally, to determine broad strategic directions and to stay current with the physics community and relevant technology communities, the GECSR project will form an external advisory board (EAB). The EAB will engage both the physics user community and the information technology research community to advise the GECSR management team. Members will include people from the physics experiments (e.g., US ATLAS and US CMS research program managers or designees), people from Grid projects (e.g., GriPhyN), computer science researchers, and other information technology experts. The EAB will meet once a year.