Apache Web Services in the Real World, an E-Science Perspective
Srinath Perera
Architect, WSO2 Inc.
Member, Apache Software Foundation
Lanka Software Foundation

Outline
● Linked Environments for Atmospheric Discovery Project (LEAD), the Use Case.
● LEAD Architecture, using SOA to build a Large Scale E-Science Project.
● History: LEAD and Apache Web Service Projects.
● Apache as a Sustainability Model for Academic Projects.

E-Science
● Continuation of High Performance Computing, Parallel Computing, and Grid.
● Cyber-infrastructures to support Scientific Research.
● Built around "Computation" as the third Pillar of Science (along with Analysis and Experimentation).
● Characterized by a wide range of computing (CPU minutes to CPU years) and data (a few KBs to PBs of data) requirements.
● Based on real-life use cases.

Reality is Harder than Fiction
● E-Science joins Theory with real-life data.
● Real-life applications often go beyond our experience.
● Most weather models are run at much lower than ideal resolution; otherwise a 24-hour forecast would take more than 24 hours to compute!
● Physics use cases (e.g. the Large Hadron Collider), telescopes, and genome analysis generate terabytes of data in days, if not hours, and moving 1 TB takes hours even on the 10 Gbps networks of TeraGrid.
● Scale, geographical distribution of resources, and heterogeneity make these use cases complex.

Linked Environments for Atmospheric Discovery (LEAD)
● U.S. NSF funded, 10+ Universities, $11M, 5 Years.
● Used for U.S. national weather forecasts by NOAA.
● Presented to the U.S. Congress as an example to justify scientific research spending by the U.S. NSF.
● Has brought state-of-the-art forecasting capabilities to a wider audience, ranging from hardcore scientists to high school students.

LEAD: Dynamic Weather Analysis at U.S.-Wide Scale

Why is it Hard?
● Geographically Distributed Sensors, Computing Power, Storage, and Expertise.
● Handling Failures and Recovery.
● Long Running Jobs (> 1 Hour).
● Large Scale Jobs (10-1000+ processors).
● Large Sized Data (KBs to GBs of data).
● Need to serve many Parallel Users.
● Usage Spikes.

LEAD as an Example
● Assume a hurricane develops, and 1000 scientists across the U.S. come to the LEAD portal to run forecasts.
● Let's assume:
  – Each user runs 3 workflows.
  – Each workflow has 6 services, generates about 300 notifications, moves 50 files of 100 MB each, generates 50 files of 100 MB each, and runs for one hour.
  – Each service needs 5 CPU hours.

Which Means
● 3000 parallel workflows
● Need 90,000 CPU hours per hour, i.e. about 90,000 CPUs
● 250 TPS for the messaging system
● Move about 8 GB/sec through the network
● Generate 15 TB of data per hour

LEAD cannot handle these numbers yet, but they give us an idea of the challenge.

SOA, E-Science and LEAD
● E-Science infrastructures are Distributed, Complex, and Heterogeneous.
● SOA is designed to handle just that.
● LEAD is based on many SOA specs:
  – WSDL, SOAP, WS-Addressing for Communication
  – WS-BPEL for Workflows
  – WS-Eventing for Messaging
  – WSDM for Service Management
● LEAD people have worked closely with, and contributed to, Web Services, pushing its limits to apply it to LEAD.
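
The figures in the hurricane example above follow from simple arithmetic, and the sketch below reproduces them. The function and variable names are illustrative and do not come from LEAD. It assumes decimal units (1 GB = 1000 MB) and that the network figure counts both the moved and the generated files, since that reading is what yields roughly 8 GB/sec.

```python
# Back-of-the-envelope load estimate for the LEAD hurricane scenario.
# All inputs are the slide's stated assumptions; names are illustrative only.

USERS = 1000
WORKFLOWS_PER_USER = 3
SERVICES_PER_WORKFLOW = 6
CPU_HOURS_PER_SERVICE = 5
NOTIFICATIONS_PER_WORKFLOW = 300
FILES_MOVED_PER_WORKFLOW = 50
FILES_GENERATED_PER_WORKFLOW = 50
FILE_SIZE_MB = 100
RUN_HOURS = 1  # each workflow runs for one hour


def lead_load_estimate():
    """Return the aggregate load implied by the assumptions above."""
    workflows = USERS * WORKFLOWS_PER_USER
    seconds = RUN_HOURS * 3600

    # CPU hours consumed during the run, spread over the run's duration.
    cpu_hours = workflows * SERVICES_PER_WORKFLOW * CPU_HOURS_PER_SERVICE
    cpus_needed = cpu_hours / RUN_HOURS

    # Messaging rate: notifications from all workflows, averaged per second.
    notifications_per_sec = workflows * NOTIFICATIONS_PER_WORKFLOW / seconds

    # Data volumes in MB (decimal units assumed: 1 GB = 1000 MB).
    moved_mb = workflows * FILES_MOVED_PER_WORKFLOW * FILE_SIZE_MB
    generated_mb = workflows * FILES_GENERATED_PER_WORKFLOW * FILE_SIZE_MB

    # Assumption: moved and generated files both cross the network.
    network_gb_per_sec = (moved_mb + generated_mb) / 1000 / seconds

    return {
        "parallel_workflows": workflows,
        "cpus_needed": cpus_needed,
        "notifications_per_sec": notifications_per_sec,
        "network_gb_per_sec": network_gb_per_sec,
        "generated_tb_per_hour": generated_mb / 1_000_000 / RUN_HOURS,
    }


if __name__ == "__main__":
    for name, value in lead_load_estimate().items():
        print(f"{name}: {value:,.1f}")
```

Changing any single assumption, for example the number of users or the file size, shows how quickly the totals grow during a usage spike.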