I’ve noticed there are not many articles about boto and Amazon Web Services. Although boto’s documentation is quite good, it lacks practical examples. More specifically, I found that a fair amount of RTFM was needed to get an Elastic MapReduce job started on Amazon using boto (and I did it from Google App Engine, just to go full cloud!). So here it goes, a very basic EMR job launcher using boto:
from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep
from boto.regioninfo import RegionInfo

zone_name = 'eu-west-1'
access_id = ...
private_key = ...

# Connect to EMR
conn = EmrConnection(access_id, private_key,
          region=RegionInfo(name=zone_name,
              endpoint=zone_name + '.elasticmapreduce.amazonaws.com'))

# Create a step for the EC2 instances to install Hive
args = [u's3://' + zone_name + '.elasticmapreduce/libs/hive/hive-script',
        u'--base-path', u's3://' + zone_name + '.elasticmapreduce/libs/hive/',
        u'--install-hive', u'--hive-versions', u'0.7.1']
start_jar = 's3://' + zone_name + '.elasticmapreduce/libs/script-runner/script-runner.jar'
setup_step = JarStep('Hive setup', start_jar, step_args=args)

# Create a jobflow using the connection to EMR and specifying the
# Hive setup step (log_bucket and keep_alive are defined elsewhere
# in the original script)
jobid = conn.run_jobflow(
            "Hive job",
            log_bucket.get_bucket_url(),  # S3 URI where EMR writes its logs
            steps=[setup_step],
            keep_alive=keep_alive,
            action_on_failure='CANCEL_AND_WAIT',
            master_instance_type='m1.medium',
            slave_instance_type='m1.medium',
            num_instances=2,
            hadoop_version="0.20")

# Set termination protection, so the jobflow won't be killed after the
# script is finished (that way we can reuse the instances for something else).
# Don't forget to shut it down when you're done!
conn.set_termination_protection(jobid, True)

# Add a step that runs a Hive SQL script stored in S3
s3_url = 'Link to a Hive SQL file in S3'
args = [u's3://' + zone_name + '.elasticmapreduce/libs/hive/hive-script',
        '--base-path', 's3://' + zone_name + '.elasticmapreduce/libs/hive/',
        '--hive-versions', '0.7.1',
        '--run-hive-script', '--args',
        '-f', s3_url]
step = JarStep('Run SQL', start_jar, step_args=args)
conn.add_jobflow_steps(jobid, [step])

(Original source: https://monoinfinito.wordpress.com/2013/07/11/starting-an-emr-job-with-boto/)
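Two things in the snippet above are placeholders: log_bucket.get_bucket_url() (a helper from the original script that returns the S3 URI for the EMR logs) and s3_url (the Hive SQL file to run). Here is a minimal, hypothetical sketch of how those values could be produced with boto's S3 API; the bucket and key names are made up for the example:

import boto
from boto.s3.key import Key

# NOTE: 'my-emr-logs' and 'scripts/report.hql' are invented for this example
s3 = boto.connect_s3(access_id, private_key)
log_bucket = s3.create_bucket('my-emr-logs', location='EU')  # eu-west-1
# (use s3.get_bucket('my-emr-logs') instead if the bucket already exists)

# Upload the Hive SQL file that the 'Run SQL' step will execute
key = Key(log_bucket, 'scripts/report.hql')
key.set_contents_from_filename('report.hql')

# Values the launcher expects, instead of log_bucket.get_bucket_url()
log_uri = 's3n://my-emr-logs/logs/'
s3_url = 's3://my-emr-logs/scripts/report.hql'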
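Because the jobflow is started with keep_alive and termination protection, it will keep running (and billing you) after its steps finish. A minimal sketch of how you might watch the job and shut it down afterwards with the same boto connection (the polling helper is my own addition, not part of the original script):

import time

def wait_and_shutdown(conn, jobid, poll_seconds=60):
    # Poll the jobflow until it settles into an idle or terminal state
    while True:
        flow = conn.describe_jobflow(jobid)
        print('Jobflow %s is %s' % (jobid, flow.state))
        if flow.state in ('WAITING', 'COMPLETED', 'FAILED', 'TERMINATED'):
            break
        time.sleep(poll_seconds)

    # Termination protection was enabled above, so lift it before terminating
    conn.set_termination_protection(jobid, False)
    conn.terminate_jobflow(jobid)

Calling terminate_jobflow is what actually stops the billing clock; with keep_alive set, simply letting the steps finish is not enough.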