Over the past few months, we've talked about using the AWS SDK for Java to store and retrieve Java objects in Amazon DynamoDB. Our first post was about the basic features of the DynamoDBMapper framework, and then we zeroed in on the behavior of auto-paginated scan. Today we're going to spend some time talking about how to store complex types in DynamoDB. We'll be working with the User class again, reproduced here:
@DynamoDBTable(tableName = "users")
public class User {

    private Integer id;
    private Set<String> friends;
    private String status;

    @DynamoDBHashKey
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @DynamoDBAttribute
    public Set<String> getFriends() { return friends; }
    public void setFriends(Set<String> friends) { this.friends = friends; }

    @DynamoDBAttribute
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
DynamoDBMapper works with String, Date, and any numeric type such as int, Integer, byte, Long, etc. But what do you do when your domain object contains a reference to a complex type that you want persisted into DynamoDB?
Let's imagine that we want to store the phone number for each User in the system, and that we're working with a PhoneNumber class to represent it. For the sake of brevity, we are assuming it's an American phone number. Our simple PhoneNumber POJO looks like this:
public class PhoneNumber {

    private String areaCode;
    private String exchange;
    private String subscriberLineIdentifier;

    public String getAreaCode() { return areaCode; }
    public void setAreaCode(String areaCode) { this.areaCode = areaCode; }

    public String getExchange() { return exchange; }
    public void setExchange(String exchange) { this.exchange = exchange; }

    public String getSubscriberLineIdentifier() { return subscriberLineIdentifier; }
    public void setSubscriberLineIdentifier(String subscriberLineIdentifier) { this.subscriberLineIdentifier = subscriberLineIdentifier; }
}
If we simply add a PhoneNumber property to the User class, DynamoDBMapper will complain because it doesn't know how to represent the PhoneNumber class as one of DynamoDB's basic data types.
Introducing the @DynamoDBMarshalling annotation
The DynamoDBMapper framework supports this use case by allowing you to specify how to convert your class into a String and vice versa. All you have to do is implement the DynamoDBMarshaller interface for your domain object. For a phone number, we can represent it using the standard (xxx) xxx-xxxx pattern with the following class:
public class PhoneNumberMarshaller implements DynamoDBMarshaller<PhoneNumber> {

    @Override
    public String marshall(PhoneNumber number) {
        return "(" + number.getAreaCode() + ") "
                + number.getExchange() + "-"
                + number.getSubscriberLineIdentifier();
    }

    @Override
    public PhoneNumber unmarshall(Class<PhoneNumber> clazz, String s) {
        String[] areaCodeAndNumber = s.split(" ");
        String areaCode = areaCodeAndNumber[0].substring(1, 4);
        String[] exchangeAndSlid = areaCodeAndNumber[1].split("-");
        PhoneNumber number = new PhoneNumber();
        number.setAreaCode(areaCode);
        number.setExchange(exchangeAndSlid[0]);
        number.setSubscriberLineIdentifier(exchangeAndSlid[1]);
        return number;
    }
}
The DynamoDBMarshaller interface is parameterized on the domain object you're working with, making the interface strictly typed.
Now that we have a class that knows how to convert our PhoneNumber class into a String and back, we just need to tell the DynamoDBMapper framework about it. We do so with the @DynamoDBMarshalling annotation.
@DynamoDBTable(tableName = "users")
public class User {

    ...

    @DynamoDBMarshalling(marshallerClass = PhoneNumberMarshaller.class)
    public PhoneNumber getPhoneNumber() { return phoneNumber; }
    public void setPhoneNumber(PhoneNumber phoneNumber) { this.phoneNumber = phoneNumber; }
}
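With the annotation in place, the mapper persists the phone number as a single string attribute and converts it back on load. Here's a minimal sketch of how that looks in practice, assuming a configured AmazonDynamoDB client, an existing users table, and illustrative values:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(credentials);
DynamoDBMapper mapper = new DynamoDBMapper(dynamo);

PhoneNumber number = new PhoneNumber();
number.setAreaCode("206");
number.setExchange("555");
number.setSubscriberLineIdentifier("0100");

User user = new User();
user.setId(42);
user.setPhoneNumber(number);

// Stored in DynamoDB as the string attribute "(206) 555-0100",
// then unmarshalled back into a PhoneNumber when the item is loaded.
mapper.save(user);
User loaded = mapper.load(User.class, 42);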
Built-in support for JSON representation
The above example uses a very compact String representation of a phone number to use as little space in your DynamoDB table as possible. But if you're not overly concerned about storage costs or space usage, you can just use the built-in JSON marshaling capability to marshal your domain object. Defining a JSON marshaller class takes just a single line of code:
class PhoneNumberJSONMarshaller extends JsonMarshaller<PhoneNumber> { }
This built-in marshaller produces a String representation that's more verbose than one you could write yourself. A phone number marshaled with this class would end up looking like this (with spaces added for clarity):
{ "areaCode" : "xxx", "exchange: : "xxx", "subscriberLineIdentifier" : "xxxx" }
When writing a custom marshaller, you'll also want to consider how easy it will be to write a scan filter that can find a particular value. Our compact phone number representation will be much easier to scan for than the JSON representation.
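For example, because the compact format always begins with the parenthesized area code, a single BEGINS_WITH condition finds every user in a given area code. A rough sketch, reusing the mapper from the snippet above and assuming the attribute keeps its default name of "phoneNumber":

DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
Map<String, Condition> filter = new HashMap<String, Condition>();

// Matches phone numbers stored as "(206) xxx-xxxx"
filter.put("phoneNumber", new Condition()
        .withComparisonOperator(ComparisonOperator.BEGINS_WITH)
        .withAttributeValueList(new AttributeValue().withS("(206)")));

scanExpression.setScanFilter(filter);
List<User> usersInAreaCode = mapper.scan(User.class, scanExpression);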
We're always looking for ways to make our customers' lives easier, so please let us know how you're using DynamoDBMapper to store complex objects, and what marshaling patterns have worked well for you. Share your success stories or complaints in the comments!
-
Wherever you or your customers are in the world, there are AWS data centers nearby.
Each AWS region is a completely independent stack of services, totally isolated from other regions. You should always host your AWS application in the region nearest your customers. For example, if your customers are in Japan, running your website from Amazon EC2 instances in the Asia Pacific (Tokyo) region will ensure that your customers get the lowest possible latency when they connect to your site.
New in the 1.4 release of the AWS SDK for Java, the SDK now knows how to look up the endpoint for a given service in a particular region. Previously, developers needed to look up these endpoints themselves and then hard-code them into their applications when creating a client, like so:
AmazonDynamoDB dynamo = new AmazonDynamoDBClient(credentials);
dynamo.setEndpoint("https://dynamodb.us-west-2.amazonaws.com");
Now you can simply tell the client which region to use, and the SDK looks up the endpoint for you:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(credentials);
dynamo.setRegion(Region.getRegion(Regions.US_WEST_2));
You can also use a Region object as a factory to create a client that's already configured for that region:

AmazonDynamoDB dynamo = Region.getRegion(Regions.US_WEST_2)
        .createClient(AmazonDynamoDBClient.class, credentials, clientConfig);
Note that the setRegion() method isn't thread-safe. We recommend setting the region once, when a client object is first created, then leaving it alone for the duration of the client's life cycle. Otherwise, the SDK's automatic retry logic could yield unexpected behavior if setRegion() is called at the wrong time. Using the Region objects as client factories encourages this pattern. If you need to talk to more than one region for a particular service, we recommend creating one service client object per region, rather than trying to share.
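In practice that just means creating each regional client up front and routing calls through the right one, rather than mutating a shared client. A small sketch, reusing the credentials and clientConfig objects from the snippet above:

// One client per region, created once and reused.
AmazonDynamoDB dynamoOregon = Region.getRegion(Regions.US_WEST_2)
        .createClient(AmazonDynamoDBClient.class, credentials, clientConfig);
AmazonDynamoDB dynamoTokyo = Region.getRegion(Regions.AP_NORTHEAST_1)
        .createClient(AmazonDynamoDBClient.class, credentials, clientConfig);

// Requests are routed by which client you call, with no region switching involved.
dynamoOregon.listTables();
dynamoTokyo.listTables();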
Finally, at times it may be useful to programmatically determine which regions a given service is available in. It's possible to ask a Region object if a given service is supported there:
Region.getRegion(Regions.US_WEST_2).isServiceSupported(ServiceAbbreviations.Dynamodb);
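If you want the full list rather than a yes-or-no answer for a single region, you can loop over all the regions the SDK knows about. A brief sketch, assuming your SDK version includes the RegionUtils helper in com.amazonaws.regions:

for (Region region : RegionUtils.getRegions()) {
    if (region.isServiceSupported(ServiceAbbreviations.Dynamodb)) {
        System.out.println("DynamoDB is available in " + region.getName());
    }
}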
For more information about the available regions and edge locations, see http://aws.amazon.com/about-aws/globalinfrastructure/.
-
Jason and I are at EclipseCon in Boston this week to discuss what we've learned developing the AWS Toolkit for Eclipse over the last three years. Our session is chock full of advice for how to develop great Eclipse plug-ins, and offers a behind-the-scenes look at how we build the Toolkit. Here's what we plan to cover:
Learn best practices for Eclipse plug-in development that took us years to figure out!
If you are attending the conference, come by to say hello and get all your questions about the Toolkit answered! We are also handing out $100 AWS credits to help you get started using AWS services without a financial commitment, so come talk to us and we'll hook you up.
The AWS Toolkit for Eclipse brings the AWS cloud to the Eclipse workbench, allowing developers to develop, debug, and deploy Java applications on the AWS platform. For three years, we've worked to integrate AWS services into your Eclipse development workflow. We started with a small seed of functionality for managing EC2 instances, and today support nine services and counting. We learned a lot on the way, and we'd like to share!
The Toolkit touches a wide array of Eclipse technologies and frameworks, from the Web Tools Platform to the Common Navigator Framework. By now we've explored so much of the Eclipse platform that we've started to become embarrassed by the parts of the Toolkit that we wrote first. If only someone had told us the right way to do things in the first place! Instead, we had to learn the hard way how to make our code robust, our user interfaces reliable and operating-system independent (not to mention pretty).
We're here to teach from our experience, to share all the things we wish someone had told us before we learned it the hard way. These are the pointers that will save you hours of frustration and help you deliver a better product to your customers. They're the tips we would send back in time to tell our younger selves. We'll show you how we used them to make the Toolkit better and how to incorporate them into your own product.
Topics include getting the most out of SWT layouts, using data binding to give great visual feedback in wizards, managing releases and updates, design patterns for resource sharing, and much more.
-
When you're building the next great application with AWS services, you'll probably end up with several different AWS accounts. You may have one account for your production application's resources, another for your development environment, and a couple more for personal testing. It can be really helpful to switch between these various accounts during your development, either to move resources between accounts, to compare configuration values, or to debug a problem that only occurs in one environment.
The AWS Toolkit for Eclipse makes it painless to work with multiple AWS accounts. You can configure the toolkit to store as many different accounts as you like using the toolkit preferences page:
The previous screenshot illustrates configuring each account with a name to help you remember what it's for, as well as its access credentials. If you're importing your credentials into Eclipse for the first time, you can follow the links in the preferences dialog box to the credentials page on aws.amazon.com, where you can copy and paste them.
To configure multiple accounts, simply click the "Add account" button and fill in the account's name and credentials. You can use the drop-down menu to edit the details of any individual account, as well as select the active account the toolkit will use.
Once you have all your accounts configured, you can quickly switch between them using the triangle drop-down menu in the upper-right-hand corner of the AWS Explorer view. It's easy to miss this menu in Eclipse's UI, so here's a screenshot illustrating where to find it. The same drop-down menu also contains a shortcut to the accounts preferences page.
Switching the active account will cause the AWS Explorer view to refresh, showing you the AWS resources for whichever account you select. The active account will also be used for any actions you select from the orange AWS cube menu, such as launching a new Amazon EC2 instance.
How are you using the AWS Toolkit for Eclipse to manage your AWS accounts? Is the interface easy to understand? Does it work well for your use case? Let us know in the comments!
-
The DynamoDBMapper framework is a simple way to get Java objects into Amazon DynamoDB and back out again. In a blog post a few months ago, we outlined a simple use case for saving an object to DynamoDB, loading it, and then deleting it. If you haven't used the DynamoDBMapper framework before, you should take a few moments to read the previous post, since the use case we're examining today is more advanced.
Reintroducing the User Class
For this example, we'll be working with the same simple User class as the last post. The class has been properly annotated with the DynamoDBMapper annotations so that it works with the framework. The only difference is that, this time, the class has a @DynamoDBRangeKey attribute.
@DynamoDBTable(tableName = "users")
public static class User {

    private Integer id;
    private Date joinDate;
    private Set<String> friends;
    private String status;

    @DynamoDBHashKey
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @DynamoDBRangeKey
    public Date getJoinDate() { return joinDate; }
    public void setJoinDate(Date joinDate) { this.joinDate = joinDate; }

    @DynamoDBAttribute(attributeName = "allFriends")
    public Set<String> getFriends() { return friends; }
    public void setFriends(Set<String> friends) { this.friends = friends; }

    @DynamoDBAttribute
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
This time, we're going to scan the table for users matching particular criteria: active users who count "Jason" among their friends.

DynamoDBMapper mapper = new DynamoDBMapper(dynamo);

DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
Map<String, Condition> filter = new HashMap<String, Condition>();

filter.put("allFriends", new Condition()
        .withComparisonOperator(ComparisonOperator.CONTAINS)
        .withAttributeValueList(new AttributeValue().withS("Jason")));

filter.put("status", new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withS("active")));

scanExpression.setScanFilter(filter);

List<User> scanResult = mapper.scan(User.class, scanExpression);
Notice that the @DynamoDBAttribute annotation on getFriends() overrides the name of the attribute to be "allFriends." Also notice that we're using the CONTAINS comparison operator, which will check to see if a set-typed attribute contains a given value. The scan method on DynamoDBMapper immediately returns a list of results, which we can iterate over like so:
int usersFound = 0;
for (User user : scanResult) {
    System.out.println("Found user with id: " + user.getId());
    usersFound++;
}
System.out.println("Found " + usersFound + " users.");
Running this code with request logging enabled produces output like the following:

Found user with id: 5
DEBUG com.amazonaws.request - Sending Request: POST https://dynamodb.us-east-1.amazonaws.com/ ...
DEBUG com.amazonaws.request - Sending Request: POST https://dynamodb.us-east-1.amazonaws.com/ ...
DEBUG com.amazonaws.request - Sending Request: POST https://dynamodb.us-east-1.amazonaws.com/ ...
DEBUG com.amazonaws.request - Sending Request: POST https://dynamodb.us-east-1.amazonaws.com/ ...
Found user with id: 6
Why does it take four service calls to iterate from user 5 to user 6? To answer this question, we need to understand how the scan operation works in DynamoDB, and what the scan operation in DynamoDBMapper is doing for us behind the scenes.
The Limit Parameter and Provisioned Throughput
In DynamoDB, the scan operation takes an optional limit parameter. Many new customers of the service get confused by this parameter, assuming that it's used to limit the number of results that are returned by the operation, as is the case with the query operation. This isn't the case at all. The limit for a scan doesn't apply to how many results are returned, but to how many table items are examined. Because scan works on arbitrary item attributes, not the indexed table keys like query does, DynamoDB has to scan through every item in the table to find the ones you want, and it can't predict ahead of time how many items it will have to examine to find a match. The limit parameter is there so that you can control how much of your table's provisioned throughput to consume with the scan before returning the results collected so far, which may be empty. That's why it took four service calls to find user 6 after finding user 5: DynamoDB had to scan through three full pages of the table before it found another item that matched the filters we specified. The List object returned by DynamoDBMapper.scan() hides this complexity from you and magically returns all the matching items in your table, no matter how many service calls it takes, so that you can concentrate on working with the domain objects in your search, rather than writing service calls in a loop. But it's still helpful to understand what's going on behind the scenes, so that you know how the scan operation can affect your table's available provisioned throughput.
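For comparison, here is roughly the pattern the mapper runs through on your behalf with the low-level client; this is an illustrative sketch rather than the mapper's actual implementation. It reuses the dynamo client and the filter map from the scan example above. Each call consumes throughput up to the limit, may return an empty page, and hands back a LastEvaluatedKey to resume from.

ScanRequest request = new ScanRequest()
        .withTableName("users")
        .withLimit(100)              // examine at most 100 items per call
        .withScanFilter(filter);     // same filter map as above

Map<String, AttributeValue> lastKey = null;
do {
    request.setExclusiveStartKey(lastKey);
    ScanResult result = dynamo.scan(request);

    // A page can contain zero matches even though the scan isn't finished yet.
    for (Map<String, AttributeValue> item : result.getItems()) {
        System.out.println("Found user with id: " + item.get("id").getN());
    }

    lastKey = result.getLastEvaluatedKey();
} while (lastKey != null);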
Auto-Pagination to the Rescue
The scan method returns a PaginatedList, which lazily loads more results from DynamoDB as necessary. The list will make as many service calls as necessary to load the next item in the list. In the example above, it had to make four service calls to find the next matching user between user 5 and user 6. Importantly, not all methods from the List interface can take advantage of lazy loading. For example, if you call get(), the list will load items up to the index you specified, if it hasn't loaded that many already. If you call the size() method, the list will load every single result in order to give you an accurate count. This can result in lots of provisioned throughput being consumed without you intending to, so be careful. On a very large table, it could even exhaust all the memory in your JVM.
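In practice, that means preferring lazy iteration (and stopping early) over methods that force the entire result set to load. A small illustrative sketch using the scanResult list from above:

// Cheap: iterates lazily and stops after the first ten matches.
int shown = 0;
for (User user : scanResult) {
    System.out.println(user.getId());
    if (++shown == 10) {
        break;
    }
}

// Potentially expensive: forces the list to fetch every matching item
// in the table just to produce an accurate count.
int total = scanResult.size();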
We've had customer requests to provide manually paginated scan and query methods for DynamoDBMapper to enable more fine-tuned control of provisioned throughput consumption, and we're working on getting those out in a future release. In the meantime, tell us how you're using the auto-paginated scan and query functionality, and what you would like to see improved, in the comments!
-
If you couldn't make it to AWS re:Invent this year, you can watch all of the presentations on the AWS YouTube channel. My talk was about using the AWS Toolkit for Eclipse to develop and deploy a simple meme generation app.
The application uses a common AWS architectural design pattern to process its workload and serve content. All the binary image data is stored in an Amazon S3 bucket; the image metadata is stored in Amazon DynamoDB; and the image processing jobs are managed using an Amazon SQS queue.
Here's what happens when a customer creates a new meme image (a sketch of the back-end worker loop follows the list):
- The JSP page running in AWS Elastic Beanstalk asks Amazon S3 for a set of all the images in the bucket, and displays them to the customer.
- The customer selects their image and a caption to write onto it, then initiates a post.
- The JSP page inserts a new item into DynamoDB containing the customer's choices, such as the S3 key of the blank image and the caption to write onto it.
- The JSP page inserts a message into the SQS queue containing the ID of the DynamoDB item inserted in the previous step.
- The JSP page polls the DynamoDB item periodically, waiting for the state to become "DONE".
- A back-end image processing node on Amazon EC2 polls the SQS queue for work to do and finds the message inserted by the JSP page.
- The back-end worker loads the appropriate item from DynamoDB, downloads the blank macro image from Amazon S3, writes the caption onto the image, then uploads it back to the bucket.
- The back-end worker marks the DynamoDB item as "DONE".
- The JSP page notices the work is done and displays the finished image to the customer.
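For a feel of the back-end worker's main loop, here's a minimal sketch. It is not the code from the talk: the queueUrl and credentials variables, the MemeJob domain class, and its state attribute are all hypothetical stand-ins, and the image processing itself is omitted.

AmazonSQS sqs = new AmazonSQSClient(credentials);
DynamoDBMapper mapper = new DynamoDBMapper(new AmazonDynamoDBClient(credentials));

while (true) {
    // 1. Poll the queue for a job message containing the DynamoDB item ID.
    ReceiveMessageResult result = sqs.receiveMessage(
            new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(1));

    for (Message message : result.getMessages()) {
        String jobId = message.getBody();

        // 2. Load the job item (MemeJob is a hypothetical annotated domain class).
        MemeJob job = mapper.load(MemeJob.class, jobId);

        // ... download the blank image from S3, write the caption, upload the result ...

        // 3. Mark the job as DONE so the JSP page can display the finished image.
        job.setState("DONE");
        mapper.save(job);

        // 4. Delete the message so the job isn't processed twice.
        sqs.deleteMessage(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
    }
}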
-
As we announced previously, the AWS Toolkit for Eclipse now supports creating AWS-enabled Android projects, making it easier to get started talking to AWS services from your Android app. The Toolkit will also optionally create a sample Android application that talks to S3. Let's walk through creating a new AWS Android project and running the sample.
First, make sure that you have the newest AWS Toolkit for Eclipse, available at aws.amazon.com/eclipse.
To create a new AWS-enabled Android project, choose File > New > Project... and find the AWS Android Project wizard.
The wizard will ask you to choose a project name and an Android target. If you haven't set up your Android SDK yet, you'll be able to do so from this wizard. Also make sure the option to create a sample application is checked.
That's it! The newly created project is configured with the AWS SDK for Android and the sample application. You'll want to edit the file Constants.java to fill in your AWS credentials and choose an S3 bucket name before running the application.
If this is your first time using the Android Eclipse plug-in, you may need to create an Android Virtual Device at this point using the AVD Manager view. On Windows 7, I found that I couldn't start the emulator with the default memory settings, as referenced in this Stack Overflow question, so I had to change them:
With this change, the emulator started right up for me, and I was able to see the S3Uploader application in the device's application list.
Finally, there's one last trick you might find useful in using the sample application: it relies on images in the Android image gallery of the emulated device. If you can't be bothered with mounting a file system, a simple way to get some images in there is to save them from the web browser. Just start the web browser, then tap-hold on an image and choose "Save Image".
We're excited by how much easier it is to get this sample running now that Eclipse does most of the setup for you. Give it a try, and let us know how it works for you!
-
The AWS Toolkit for Eclipse will automatically download new releases of the AWS SDK for Java, ensuring that you always have the most recent version of the service clients and productivity libraries. Some customers with slow network connections told us that the automatic downloads were sometimes triggered when they didn't want to wait. In response, we made a couple of small changes to make this process more predictable and easier to manage.
First, we changed the directory where we download the SDKs. In previous releases of the Toolkit, the SDKs were stored in a directory specific to your Eclipse workspace, so you would get a new SDK downloaded every time you started a new workspace. To eliminate this duplication, we consolidated all SDKs, for all workspaces, into one directory. It defaults to your home directory, but you can configure it to be wherever you want via a new preference page.
We also added a preference setting to not automatically check for and download new releases, so that customers adversely impacted by downloading every release of the SDK can opt out of this behavior. Even if you decide to manage your SDK releases manually, you can always update to the latest version using the Check for updates now button in the preferences.
As a final note, the same preferences can be configured for the AWS SDK for Android.
-
The first ever AWS conference, re:Invent, has sold out! If you are one of the lucky people with a conference pass, we hope that you will come to hear Jason Fulghum and me speak about advanced features of the AWS SDK for Java and the AWS Toolkit for Eclipse. On Wednesday morning after the keynote, I'm giving a presentation and live demo of the AWS Toolkit for Eclipse's features. It's called Develop, Deploy, and Debug with Eclipse and the AWS SDK for Java:
The AWS SDK for Java and the AWS Toolkit for Eclipse enable developers to easily manage AWS resources, quickly build web scale Java applications that interact with AWS services, and deploy those applications to the AWS platform. In this session, learn what functionality the AWS SDK for Java and the AWS Toolkit for Eclipse provide, see common usage scenarios with the AWS SDK for Java, and discover how to use the management, deployment, and debugging capabilities in the AWS Toolkit for Eclipse.
Then on Thursday afternoon, Jason Fulghum will be sharing our best tips and tricks for getting the most out of the AWS SDK for Java in his presentation, Being Productive with the AWS SDK for Java:
The AWS SDK for Java includes several higher level APIs that make working with AWS simpler. Learn more about higher level APIs such as TransferManager, which allows you to easily manage asynchronous uploads and downloads from Amazon Simple Storage Service (Amazon S3); the AWS DynamoDB Object Persistence Layer, which allows you to annotate your Java classes to specify how the SDK should map them to AWS DynamoDB tables when they are saved and loaded; and other higher level APIs such as the Amazon Simple Email Service (Amazon SES) JavaMail provider.
We hope to see you at re:Invent! But if you can't make it this year, don't worry -- we'll be sharing the tips in our talks, and more, right here on the AWS Java Blog in the months to come.
-
The AWS SDK for Java makes it easy to store objects in Amazon DynamoDB and get them back out again, all without having to write the code to transform your objects into table items and vice versa. All you need to do is annotate your domain classes in a few places and the SDK will handle the work of getting objects into and out of the database. For example, here's a minimal User class we want to store in DynamoDB:
@DynamoDBTable(tableName = "users")
public class User {

    private Integer id;
    private Set<String> friends;
    private String status;

    @DynamoDBHashKey
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @DynamoDBAttribute
    public Set<String> getFriends() { return friends; }
    public void setFriends(Set<String> friends) { this.friends = friends; }

    @DynamoDBAttribute
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
Once the class is annotated, saving, updating, and deleting an item takes only a few lines of code (newUser here is a previously constructed and populated User object):

AmazonDynamoDB dynamo = new AmazonDynamoDBClient(awsCredentials);
DynamoDBMapper mapper = new DynamoDBMapper(dynamo);

// save a new item
mapper.save(newUser);

// update the item
newUser.setStatus("active");
newUser.getFriends().remove("Jeremy");
mapper.save(newUser);

// delete the item
mapper.delete(newUser);
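Getting an object back out is just as short. A minimal sketch, reusing the same mapper and assuming an existing user whose hash key is 42 (an illustrative value):

// Look the item up by its hash key and rebuild the User object.
User user = mapper.load(User.class, 42);
System.out.println(user.getStatus());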