Project Report
Cloud Computing Report1
December 22, 2010
Marcus Ljungblad
Navaneeth Rameshan
Wasif Malik
This report is prepared by
Marcus Ljungblad
Navaneeth Rameshan
Wasif Malik
1This report is a part of the cloud computing project.
Contents
1
Introduction
1
2
ProposedMethod
2
2.1
Attempted approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
2.1.1
Web Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
2.1.2
Single Task Mode . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
2.2
Proposed method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
2.2.1
Web Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
2.2.2
Single Task Mode . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
3
Implementation
5
4
Results
7
4.1
Single-Task mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
4.1.1
Simulation summary . . . . . . . . . . . . . . . . . . . . . . . . .
9
4.2
Web-Task Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
4.2.1
Simulation Summary . . . . . . . . . . . . . . . . . . . . . . . . .
11
4.3
Round-Robin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
4.3.1
Simulation Summary . . . . . . . . . . . . . . . . . . . . . . . . .
12
5
Conclusion
13
5.1
Scope for Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
5.1.1
Shifting jobs between workers . . . . . . . . . . . . . . . . . . . .
13
5.1.2
Load parameters . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
5.1.3
Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
5.2
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
i
List of Figures
2.1
High level flow of Scheduling algorithm . . . . . . . . . . . . . . . . . . .
3
3.1
UML diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
4.1
Response and Queued jobs over time in single-task mode. . . . . . . . . .
8
4.2
Number of active, idle, and computing workers over time for single-task
mode.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
4.3
Response time and queued jobs over time in web-task mode
. . . . . . .
10
4.4
Active, idle and computing workers over time in web-task mode. . . . . .
11
4.5
Response time and queued jobs using round-robin scheduling . . . . . . .
12
ii
Chapter 1
Introduction
Distributing jobs of unknown size efficiently across a large number of machines is one
of the greatest challenges today in cloud computing. The goals ranges from minimizing
cost to minimizing time to complete a set of jobs; two often contradicting requirements.
While the best way to complete all jobs as fast as possible may be to schedule one job per
machine, it is far from the most cost-efficient. In effect, one must always make a trade-off
between the two.
In this report we present three algorithms: one focuses on minimizing response time for
a job, one on minimizing cost, and one reference algorithm implemented using round-
robin. To test the algorithms a cloud simulator was implemented in C++. Results from
simulations with varying inputs are compared against the round-robin algorithm. Finally,
a set of improvements to the evaluated algorithms are proposed.
1
Chapter 2
ProposedMethod
2.1
Attempted approaches
The following subsections describe the initial attempted approaches for scheduling.
2.1.1
Web Mode
For the web mode, we intended to do an efficient distribution of jobs to worker nodes, to
minimize swapping costs and fit them in memory in the best possible way so as to ensure
load distribution. However, since the scheduler has no information of the job memory, the
only possible way to efficiently distribute jobs is to schedule jobs in round-robin or random
distribution initially and get information about the job's memory from the worker. Then
the cost of transferring jobs from one worker node to another is calculated to see if it is
feasible to do a transfer for efficient load distribution. It may not be feasible to transfer a
job if the job has already computed most of its instructions. Although the worker nodes
don't provide a feedback on the number of instructions, the worst case time taken for the
jobs completed is used to estimate the time remaining. We discarded this method, as the
estimate for cost was deviating significantly from the actual cost. Time taken for a job
to complete depends largely on the number of jobs that are already present in the worker
node, the swapping costs and also the number of instructions for the job. We believe that
these factors made it difficult to predict the completion time.
2
3
2.1.2
Single Task Mode
For the single task mode, we intended to submit jobs in a round robin manner and also
compute the job away time for all the jobs that have been submitted . The job away
time is computed every scheduler cycle after at least one job from a task completes. The
average time is used in this case to estimate the completion time for jobs that have been
submitted. If the job away time is lesser than the average time of completion for the jobs
in the task, then the estimated time of completion is the average time itself but if the job
away time is greater than the average time, the average completion time is recomputed
as a weighted average. However, here again the estimation of time to complete deviated
significantly from the actual values and as a result either more workers were started than
what was necessary or lesser in some cases.
2.2
Proposed method
Figure 2.1 shows the high level flow of the scheduling algorithm.
Figure 2.1: High level flow of Scheduling algorithm
2.2.1
Web Mode
In the web mode, the goal of the scheduling algorithm is to minimize the average response
time. Ideally, a practical solution would be to distribute load evenly and to start enough
4
number of worker nodes to minimize response time. The jobs are submitted in round-
robin manner until at least one job completes. This is a modified version of the normal
round robin algorithm; it only sends one job per active worker in each scheduling cycle.
This results in the scheduler holding back jobs and increases the chances of jobs finishing
quickly due to reduced swapping time. As jobs complete, we keep track of worst case
completion time. Based on the current time, we also keep a track of time until next
charging tick. As soon as we have at least one completed job, the future jobs are scheduled
to worker nodes based on their current load. In the implementation, load is the number of
jobs a worker node is currently working on. Lesser the load, more jobs can be sent to the
worker and vice versa. For each worker node, the worst case execution times for the jobs
to be sent are estimated based on the worst case completion time seen so far. If the worst
case execution time of the jobs at hand exceeds the time until the next charging tick,
then only that many jobs are sent that are estimated to complete within the charging
tick. These jobs are chosen randomly from the queue for a good distribution. Jobs that
arent sent are considered spilled and spilled jobs for each worker is accumulated in the
same cycle. Depending on the estimation of how long the spilled jobs take, new worker
nodes are started. The scheduler cannot sent the spilled jobs to the newly started workers
immediately because the workers take some time to boot up and cant accept jobs in that
time. So, the spilled jobs are saved to a hash map and the scheduler tries to send them
to the specified workers at the start of each scheduling cycle. As soon as the new worker
node(s) boot up, the jobs will be submitted to them.
2.2.2
Single Task Mode
In single task mode, a similar scheduling algorithm like above is used but with one
major difference; the decision to start new nodes is dependent on the percentage of waste
value. The more money the user is willing to waste, the more nodes will be started
by the scheduler. This would result in a very quick response time but the cost will be
significantly more. Similarly, the lower the percentage of waste value, the more strict is
the scheduler in starting new nodes, the lesser the cost, but increased average response
time. For estimating the time it would take to complete all the jobs in hand, the scheduler
uses the same approach used in web task scheduler i.e. calculates the completion time
by multiplying the number of jobs in queue with the worst time to complete one job.
Theoretically, It would have been better to consider the average completion time for each
task and then decide if new nodes should be started or not; but due to time constraints
and complexity, this approach was not implemented.
Chapter 3
Implementation
The implementation of all modules was done in C++. The UML diagram in figure 3.1
shows the relationship between modules and their key attributes.
Simulator
-currentTime: long
1
1
1
0..n
TaskGen
Scheduler
Worker
-tasks: list<Task>
-queuedJobs: list<Job>
-workerState: enum
-jobs: list<Job>
-runningJobs: list<Job>
+execute()scheduler: Scheduler
-completedJobs: list<Job>
+startWorker()submit jobs/tasks
-workers: list<Worker>
-sendTask()+stopWorker()1
1
-workerStats: list<WorkerStats>
-createTask()+submitJobs()+runScheduler()has
+getState()+submitJobs(list<Job>)+getAvailableMemory()1
0..n
+notifyJobCompletion()+isAcceptingJobs()-getSlowestJobTime()+getTotalMemory() 1
1
-fetchJobsFromQueueRandomly()+getCostPerHour()-startWorkerNode()+getInstructionsPerTime() has
has
-runRoundRobinScheduler()+getTotalExecutionTime()-runWebScheduler()+getTotalCPUTime()-runSingleTaskScheduler()+getAverageResponseTime()
+getTotalCost() 0..n
0..n
+getQueuedJobs()
+getJobsCompleted()Task
Job
-taskid: long
-jobid: long
-jobrate
-taskid: long
has
-num_of_jobs
num_instructions: long
1
1..n
-jobs
mem_size: long
+getJobID()
+getTaskID()Figure 3.1: UML diagram
The clock functionality was implemented in the Simulator class by having a while loop
calling the task generator, scheduler, and worker objects in every iteration. One iteration
5
6
is considered to be one millisecond by each module. But since the scheduler can only
work after a configurable interval, the scheduler ignores iterations and only does work
after the specified scheduling interval. The jobs and tasks are fed to the scheduler by the
task generator at a rate specified in the input file (input.conf). The number of workers
to start automatically at the start of the simulation can be configured in workers.conf
but they will still take time to start-up when the simulation starts. Till that time, the
jobs will be queued at the schedulers end. The initial workers objects are created by
simulator and passed on to the scheduler at the start of the simulation. A limitation in
this implementation is that the scheduler has to wait for workers to start, even though
an initial numbers of started workers is specified.
The worker node is implemented as a state machine with the following states: Initialising,
Idle, Computing, Swapping, or Offline. It can accept jobs in all states except Initialising
and Offline. It maintains two queues: jobs in memory, and jobs in hard drive. When
swapping occurs at a node, it is conducted in a round-robin fashion. A job is started only
if it fits in the memory, if not, an existing job is swapped out (i.e moved to the jobs in
hard drive queue) and the next job retried. Moreover, the worker maintains a public API
statistical use towards the scheduler.
Chapter 4
Results
In this section, three test runs are presented and evaluated. The following configuration
was used for all the test runs.
Table 4.1: Simulator Configuration
Scheduling interval
0.1 s
Worker node speed
300 instr/s
Worker node memory
8 Gb
Worker swapping cost
5 instr/gb
Worker quantum
0.1 s
Worker node start-up time
120 s
Worker node notification time
2 instr
Worker node cost
1 Euro/hour
Allowed waste
30%
Workers started
2
4.1
Single-Task mode
Table 4.2 shows the input configuration for the jobs.
It can be seen from figure4.1 that in the first 120 seconds no jobs are scheduled as the
workers are being started during this time. Then jobs are scheduled in a round-robin
fashion until a sufficiently accurate estimation of when the jobs will complete is attained.
At this point one more worker is started and the spilled jobs are sent to the designated
worker. However, as well see in the following graph 4.2, this new worker is not well used
7
Document Outline
Add New Comment