An Integrated Framework for Parameter-based Optimization of Scientific Workflows

Vijay S Kumar, G. Mehta, K. Vahi, V. Ratnakar, P. Sadayappan, Jihie Kim, Ewa Deelman, Yolanda Gil, Mary Hall, Tahsin Kurc, Joel Saltz

15 June 2009, HPDC 2009

Motivations
• Performance of data analysis applications is influenced by parameters
  – optimization: search for optimal values in a multi-dimensional parameter space
• A systematic approach to:
  – enable the tuning of performance parameters (i.e., select optimal parameter values given an application execution context)
  – support optimizations arising from performance-quality trade-offs

Contributions of this paper
• No auto-tuning yet (work in progress)
• Core framework that can
  – support workflow execution (with application-level QoS) in distributed heterogeneous environments
  – enable manual tuning of parameters simultaneously
  – allow application developers and users to express applications semantically
  – leverage semantic descriptions to achieve performance optimizations
    • customized data-driven scheduling within Condor

Application characteristics
• Workflows: Directed Acyclic Graphs with well-defined data flow dependencies
  – mix of sequential, pleasingly parallelizable and complex parallel components
  – flexible execution in distributed environments
• Multidimensional data analysis
  – data partitioned into chunks for analysis
  – dataset elements bear spatial relationships, constraints
  – data has an inherent notion of quality; applications can trade accuracy of analysis output for performance
• End-user queries supplemented with application-level QoS requirements

Application scenario 1: No quality trade-offs
• Minimize makespan while preserving highest output quality
• Scale execution to handle terabyte-sized image data

Application scenario 2: Trade quality for performance
• Support queries with application-level QoS requirements
  – “Minimize time to classify image regions with 60% accuracy”
  – “Maximize classification accuracy of overall image within 30 minutes”

Performance optimization decisions
• What algorithm to use for this component?
• What data-chunking strategy to adopt?
• What is the quality of input data to this component?
• What is the processing order of the chunks?
• Where to map each workflow component?
• Which components to merge into meta-components?
• Which components need to perform at lower accuracy levels?
View each decision as a parameter that can be tuned

Conventional Approach
Inputs: workflow design, datasets
Application workflow
• Workflow Description (semantic representation)
  – component discovery
  – workflow composition
  – workflow validation
• Workflow Execution
  – clusters, the Grid or SOA
  – task-based / services-based
  – batch mode / interactive

Proposed approach: extensions
Inputs: workflow design, metadata, datasets
Analysis requests, queries with QoS: “Maximize accuracy within t time units”
Application workflow
• Trade-off module
  – map high-level queries to low-level execution strategies
  – select appropriate values for performance parameters
• Description module (semantic representation)
  – search for components
  – workflow composition
  – workflow validation
  – performance parameters
• Execution module (hierarchical execution)
  – map workflow components onto Grid sites
  – fine-grain dataflow execution of components on clusters

An instance of our proposed framework
• Description module: WINGS (Workflow INstance Generation and Selection)
• Execution module: Pegasus WMS, DataCutter, Condor, DAGMan
• Trade-off module: interacts with the description and execution modules
(diagram label: PARAMETERS)
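
The two QoS query styles quoted in the slides ("minimize time at a given accuracy" and "maximize accuracy within a time budget") can be read as constrained searches over a parameter space. The sketch below is only an illustration of that idea, not the framework's implementation (the slides state that auto-tuning is not yet available). The `Config` fields, the candidate values, and the cost/accuracy numbers in `estimate` are all hypothetical and exist only to make the example runnable.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Config:
    """One point in the parameter space (all fields are hypothetical)."""
    chunk_size: int    # edge length of a square data chunk, in pixels
    algorithm: str     # which variant of a component to run
    resolution: float  # fraction of full input quality, 0 < r <= 1


def parameter_space() -> Iterable[Config]:
    """Enumerate every combination of the assumed candidate values."""
    for c, a, r in product((256, 512, 1024), ("fast", "precise"), (0.25, 0.5, 1.0)):
        yield Config(c, a, r)


def estimate(cfg: Config) -> Tuple[float, float]:
    """Toy model returning (minutes, accuracy); the numbers are made up."""
    base_minutes = 20.0 if cfg.algorithm == "fast" else 60.0
    overhead = 512 / cfg.chunk_size  # smaller chunks cost more scheduling time
    minutes = base_minutes * cfg.resolution ** 2 + overhead
    base_accuracy = 0.7 if cfg.algorithm == "fast" else 0.9
    accuracy = base_accuracy * (0.5 + 0.5 * cfg.resolution)
    return minutes, accuracy


def minimize_time(min_accuracy: float) -> Optional[Config]:
    """Fastest configuration whose estimated accuracy meets the floor."""
    feasible = [c for c in parameter_space() if estimate(c)[1] >= min_accuracy]
    return min(feasible, key=lambda c: estimate(c)[0], default=None)


def maximize_accuracy(max_minutes: float) -> Optional[Config]:
    """Most accurate configuration within the time budget (ties: faster wins)."""
    feasible = [c for c in parameter_space() if estimate(c)[0] <= max_minutes]
    return max(feasible, key=lambda c: (estimate(c)[1], -estimate(c)[0]), default=None)


if __name__ == "__main__":
    print(minimize_time(0.6))       # query style 1: accuracy floor of 60%
    print(maximize_accuracy(30.0))  # query style 2: budget of 30 minutes
```

Exhaustive enumeration is used here only because the assumed space has 18 points. A real parameter space would be far larger, and the estimates would have to come from measurements or performance models rather than a fixed formula.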