
add part 1

Charles Reid 2 years ago

# AWSome Day Notes: Part 1: The Basics

Following are some notes from Amazon's AWSome Day (Tuesday, February 27, 2018).

## EC2 Costs and Scheduling

Cost of a node:
* Important to understand Amazon's price model: users pay for *access*, not for *hardware*
* The cost of an AWS node is the cost of *on the spot access* (on-demand use, not to be confused with the spot instances mentioned below)

* If you can anticipate your usage, you can schedule instances in advance, and get a discount
* Discount of 50% for one-year reservation (if you keep it busy for 6 months, you've made your money back)
* Spot instances also available - need to be robust to sudden starts/stops (good for embarrassingly parallel jobs)
* Cheaper to anticipate your usage and plan ahead
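
The "6 months" break-even figure falls out of simple arithmetic. Here is a rough sketch, assuming a flat 50% reservation discount as quoted above. The hourly rate and the function names are made up for illustration; they are not real AWS prices or APIs.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def on_demand_cost(hourly_rate, hours_used):
    """On demand: pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate, discount=0.5):
    """Reserved: pay for the whole year at a discount, used or not."""
    return hourly_rate * HOURS_PER_YEAR * (1 - discount)


def break_even_hours(discount=0.5):
    """Hours of use per year beyond which the reservation is cheaper."""
    return HOURS_PER_YEAR * (1 - discount)


if __name__ == "__main__":
    rate = 0.10  # hypothetical $/hour, for illustration only
    print("break-even hours per year:", break_even_hours())
    print("reserved, full year:      ", reserved_cost(rate))
    print("on demand, 3 months:      ", on_demand_cost(rate, HOURS_PER_YEAR / 4))
    print("on demand, 9 months:      ", on_demand_cost(rate, HOURS_PER_YEAR * 3 / 4))
```

With a 50% discount the break-even point is half the hours in a year, which is where "keep it busy for 6 months" comes from.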

## EC2 Transfer Costs

EC2 Instances:
* See [EC2 Instance Pricing - Data Transfer]( section
* Network costs for AWS nodes are an important consideration for high-traffic nodes (>10 TB)

* Traffic going from the internet *into* a node is always free
* Traffic going from the node *out* to the internet incurs costs after 10 TB
* Outbound traffic costs ~$90/TB
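
A back-of-the-envelope estimate of outbound cost, taking the figures in these notes at face value (nothing charged below 10 TB, ~$90/TB after that). Actual AWS pricing is tiered and changes over time, so treat the defaults below as assumptions from the notes, not as the current price list. The function name is made up for illustration.

```python
def ec2_outbound_cost_usd(tb_out, free_tb=10.0, usd_per_tb=90.0):
    """Estimate monthly cost of traffic from an EC2 node out to the internet.

    free_tb and usd_per_tb are the rough figures from these notes,
    not authoritative AWS prices.
    """
    billable_tb = max(0.0, tb_out - free_tb)
    return billable_tb * usd_per_tb


if __name__ == "__main__":
    for tb in (5, 10, 12, 50):
        print(f"{tb:>3} TB out -> ${ec2_outbound_cost_usd(tb):,.2f}")
```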

AWS Regions:
* Traffic *within* a region does not incur costs (well... it's complicated)
* Traffic *between* regions will incur costs

* Transfer *into* an EC2 node from S3 bucket in same AWS region does not incur costs
* Transfer *out of* an EC2 node into S3 bucket in same AWS region does not incur costs
* (If they did charge you, they would be double-dipping...)

Note: the list of prices is like a legal document, so use the [AWS Monthly Calculator]( to estimate monthly costs with more detail.

## S3 Transfer Costs

* See [S3 Pricing - Data Transfer](
* Price model for storage is similar to price model for AWS nodes: you pay for *access*, not for *hardware*
* To give a sense of why, think about logistics of a large "disk farm": all the intensive operations are done by the head nodes, disks are just passive
* Busier disk farm needs sophisticated hardware for parallel read/write, high-bandwidth network lines, fast encryption

S3 storage pricing:
* Rule of thumb: ~$20/TB/mo to store the data

* Transfer *into* an S3 bucket from the internet is always free (getting stuff into the bucket is the easy part - that's how they get ya)
* Transfer *out* of an S3 bucket to the internet costs ~$90/TB

* Transfer *out* of an S3 bucket to most other Amazon regions costs ~$20/TB
* Transfer *out* of an S3 bucket into an EC2 node in the same AWS region does not incur costs
* Transfer *into* an S3 bucket from an EC2 node in the same AWS region does not incur costs

As mentioned above, this means you won't be double-charged for transferring data from an S3 bucket to an EC2 node, then from the EC2 node out to the internet.
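
The S3 transfer rules above boil down to a small lookup table. This is only a sketch using the approximate per-TB figures from these notes; the destination labels and the function name are invented for illustration.

```python
# Approximate cost of moving data *out* of an S3 bucket, in USD per TB,
# using the rough figures from these notes (not a current price list).
S3_TRANSFER_OUT_USD_PER_TB = {
    "internet": 90.0,
    "other_region": 20.0,
    "ec2_same_region": 0.0,
}


def s3_transfer_out_cost(tb, destination):
    """Estimate the cost of transferring `tb` terabytes out of a bucket."""
    if destination not in S3_TRANSFER_OUT_USD_PER_TB:
        raise ValueError(f"unknown destination: {destination!r}")
    return tb * S3_TRANSFER_OUT_USD_PER_TB[destination]


if __name__ == "__main__":
    for dest in S3_TRANSFER_OUT_USD_PER_TB:
        print(f"2 TB to {dest:<16} ${s3_transfer_out_cost(2, dest):.2f}")
```

The zero in the `ec2_same_region` row is the "no double-dipping" rule: pulling data from a bucket onto a node in the same region adds nothing to the bill.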

## S3 Storage Hierarchies

Continuing with the theme of planning ahead...

Storage hierarchies:
* Biggest cost of storage is not disk space, it's transfer
* Paying for speed, paying for timeliness, paying for *on the spot access* to your data
* Your data will be cheaper if you're willing to wait a few minutes or deal with a slow connection

Storage tiers:
* Standard (~$20/TB/mo)
* Infrequent access (~$13/TB/mo) - for data accessed less often, but at the same transfer speed
* Glacier (~$4/TB/mo) - retrieval delay of up to 12 hours (smaller files = faster), deleting data that has been stored for *less* than 3 months incurs costs
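
To get a feel for what the tiers mean over a year, here is a quick comparison using the approximate prices listed above (storage only, ignoring transfer and retrieval fees). The tier keys and function names are made up for illustration.

```python
# Approximate storage prices from these notes, in USD per TB per month.
STORAGE_USD_PER_TB_MONTH = {
    "standard": 20.0,
    "infrequent_access": 13.0,
    "glacier": 4.0,
}


def monthly_storage_cost(tb, tier):
    """Storage-only cost per month; ignores transfer and retrieval fees."""
    return tb * STORAGE_USD_PER_TB_MONTH[tier]


def yearly_savings(tb, from_tier, to_tier):
    """Storage dollars saved per year by keeping `tb` in a cheaper tier."""
    return 12 * (monthly_storage_cost(tb, from_tier)
                 - monthly_storage_cost(tb, to_tier))


if __name__ == "__main__":
    for tier in STORAGE_USD_PER_TB_MONTH:
        print(f"10 TB in {tier:<18} ${monthly_storage_cost(10, tier):.2f}/mo")
    print("standard -> glacier, 10 TB:",
          yearly_savings(10, "standard", "glacier"), "USD/year")
```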

[Glacier Pricing](

Lifecycle rules:
* Can create rules to move old data from S3 buckets into Glacier
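
A lifecycle rule can also be set from code. Below is a minimal sketch using boto3's `put_bucket_lifecycle_configuration`. The 90-day threshold, the rule ID, and the bucket name are assumptions for illustration, and running `apply_lifecycle` needs boto3 installed plus AWS credentials with permission to change the bucket.

```python
def glacier_lifecycle_config(days=90, prefix=""):
    """Build a lifecycle configuration that moves objects to Glacier.

    `days` and `prefix` are illustrative defaults, not recommendations.
    An empty prefix applies the rule to every object in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket_name, config):
    """Attach the configuration to a bucket (requires boto3 + credentials)."""
    import boto3  # imported here so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    # Only prints the rule; call apply_lifecycle("my-hypothetical-bucket", cfg)
    # yourself once you are happy with it.
    cfg = glacier_lifecycle_config(days=90, prefix="logs/")
    print(cfg)
```

Keep the Glacier early-deletion charge in mind when picking a threshold: data moved into Glacier and deleted soon afterwards still incurs costs.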

## EFS vs EBS vs S3

When do you use EFS, EBS, or S3?

Elastic Block Store (EBS):
* **This is probably what you want**
* EBS is block storage for one EC2 node - designed for general purpose applications
* Cost: ~$120/TB/mo

Elastic File System (EFS):
* EFS is file storage (a shared network file system) for multiple EC2 nodes - designed for fast read-write operations, many incremental changes to files
* "Elastic" part of EFS - capacity grows dynamically as you add data (PB+ scale)
* Hard drive on steroids - like plugging in a hard drive over a network, but big/fast/smart enough to be accessible to thousands+ of machines
* Expensive: ~$300/TB/mo

Simple Storage Service (S3):
* S3 is object storage - it stores blobs of raw data, creates snapshots in time
* If you change a single character of a large file, the bucket has to create a new snapshot of the whole file
* Booting from S3 as a hard disk would take you about a thousand years... don't do that
* Cheapest: ~$20/TB/mo
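
Putting the three rough prices side by side makes the trade-off concrete. This sketch uses only the approximate per-TB monthly figures from these notes (storage only, no transfer or request fees); the names are invented for illustration.

```python
# Approximate prices from these notes, in USD per TB per month.
USD_PER_TB_MONTH = {"EBS": 120.0, "EFS": 300.0, "S3": 20.0}


def monthly_costs(tb):
    """Storage-only monthly cost of keeping `tb` terabytes in each service."""
    return {name: tb * rate for name, rate in USD_PER_TB_MONTH.items()}


def cost_relative_to_s3(name):
    """How many times more expensive a service is than S3, per TB."""
    return USD_PER_TB_MONTH[name] / USD_PER_TB_MONTH["S3"]


if __name__ == "__main__":
    for name, cost in monthly_costs(5).items():
        print(f"5 TB on {name}: ${cost:.2f}/mo "
              f"({cost_relative_to_s3(name):.0f}x S3)")
```

Price is only half the story: S3 is cheapest per TB, but it cannot act as a disk, which is why EBS is still "probably what you want" for a single node.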

Cool but $$$:
* You may see "appliances" mentioned in Amazon documentation - Amazon will ship you a physical data transfer appliance that encrypts and copies data on site ([Snowball](
* Can also purchase special network connections that bypass the public internet - like ISP putting alligator clips between your network lines and Amazon's network lines