AWS notes and scripts
  1. # AWSome Day Notes: Part 1: The Basics
  2. Following are some notes from Amazon's AWSome Day (Tuesday, February 27, 2018).
  3. ## EC2 Costs and Scheduling
  4. Cost of a node:
  5. * Important to understand Amazon's price model: users pay for *access*, not for *hardware*
  6. * Cost of AWS node is cost for *on the spot access*
  7. Scheduling:
  8. * If you can anticipate your usage, you can schedule instances in advance, and get a discount
  9. * Discount of 50% for one-year reservation (if you keep it busy for 6 months, you've made your money back)
  10. * Spot instances also available - need to be robust to sudden starts/stops (good for embarrassingly parallel jobs)
  11. * Cheaper to anticipate your usage and plan ahead
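The break-even claim above (a 50% discount pays for itself after 6 busy months) is simple arithmetic, sketched below. This is only an illustration: the function names are made up for this example, and it assumes the reservation is paid for the full term whether or not the node is used, while on-demand is billed only for the months the node is actually running.

```python
def break_even_months(discount: float, term_months: int = 12) -> float:
    """Months of on-demand usage that cost as much as the whole reservation.

    Assumption: the reserved price is (1 - discount) times the on-demand
    price, paid for every month of the term.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return term_months * (1 - discount)


def cheaper_option(expected_busy_months: float,
                   discount: float = 0.5,
                   term_months: int = 12) -> str:
    """Pick the cheaper plan for a given number of busy months."""
    if expected_busy_months > break_even_months(discount, term_months):
        return "reserved"
    return "on-demand"


print(break_even_months(0.5))  # 6.0
print(cheaper_option(8))       # reserved
print(cheaper_option(3))       # on-demand
```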
  12. ## EC2 Transfer Costs
  13. EC2 Instances:
  14. * See [EC2 Instance Pricing - Data Transfer]( section
  15. * Network costs for AWS nodes are an important consideration for high-traffic nodes (>10 TB)
  16. EC2-Internet:
  17. * Traffic going from the internet *into* a node is always free
  18. * Traffic going from the node *out* to the internet incurs costs after 10 TB
  19. * Outbound traffic costs ~$90/TB
  20. AWS Regions:
  21. * Traffic *within* a region does not incur costs (well... it's complicated)
  22. * Traffic *between* regions will incur costs
  23. EC2-S3:
  24. * Transfer *into* an EC2 node from S3 bucket in same AWS region does not incur costs
  25. * Transfer *out of* an EC2 node into S3 bucket in same AWS region does not incur costs
  26. * (If they did charge you, they would be double-dipping...)
  27. Note: the list of prices is like a legal document, so use the [AWS Monthly Calculator]( to estimate monthly costs with more detail.
  28. ## S3 Transfer Costs
  29. * See [S3 Pricing - Data Transfer](
  30. * Price model for storage is similar to the price model for AWS nodes: you pay for *access*, not for *hardware*
  31. * To give a sense of why, think about logistics of a large "disk farm": all the intensive operations are done by the head nodes, disks are just passive
  32. * Busier disk farm needs sophisticated hardware for parallel read/write, high-bandwidth network lines, fast encryption
  33. S3 storage pricing:
  34. * Rule of thumb: ~$20/TB per month to store the data
  35. S3-Internet:
  36. * Transfer *into* an S3 bucket from the internet is always free (getting stuff into the bucket is the easy part - that's how they get ya)
  37. * Transfer *out* of an S3 bucket to the internet costs ~$90/TB
  38. S3-EC2:
  39. * Transfer *out* of an S3 bucket to most other Amazon regions costs ~$20/TB
  40. * Transfer *out* of an S3 bucket into an EC2 node in the same AWS region does not incur costs
  41. * Transfer *into* an S3 bucket from an EC2 node in the same AWS region does not incur costs
  42. As mentioned above, this means you won't be double-charged for transferring data from an S3 bucket to an EC2 node, then from the EC2 node out to the internet.
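To make the "no double charge" point concrete, here is a rough estimator built from the approximate per-TB rates quoted in these notes. The rates are the notes' 2018 rules of thumb rather than current AWS prices, and the dictionary keys and function names are hypothetical, chosen only for this sketch.

```python
# Approximate transfer rates from these notes, in USD per TB.
# These are rule-of-thumb figures, not current AWS prices.
S3_EGRESS_RATE_PER_TB = {
    "internet": 90.0,         # S3 bucket -> public internet
    "other_region": 20.0,     # S3 bucket -> most other AWS regions
    "same_region_ec2": 0.0,   # S3 bucket -> EC2 node in the same region
}
EC2_TO_INTERNET_RATE_PER_TB = 90.0  # EC2 node -> public internet


def s3_egress_cost(tb: float, destination: str) -> float:
    """Estimated cost of moving `tb` terabytes out of an S3 bucket."""
    if tb < 0:
        raise ValueError("tb must be non-negative")
    return tb * S3_EGRESS_RATE_PER_TB[destination]


def via_ec2_cost(tb: float) -> float:
    """Estimated cost of S3 -> EC2 (same region) -> internet."""
    return s3_egress_cost(tb, "same_region_ec2") + tb * EC2_TO_INTERNET_RATE_PER_TB


# Sending 5 TB to the internet costs the same either way:
print(s3_egress_cost(5, "internet"))  # 450.0
print(via_ec2_cost(5))                # 450.0
```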
  43. ## S3 Storage Hierarchies
  44. Continuing with the theme of planning ahead...
  45. Storage hierarchies:
  46. * Biggest cost of storage is not disk space, it's transfer
  47. * Paying for speed, paying for timeliness, paying for *on the spot access* to your data
  48. * Your data will be cheaper if you're willing to wait a few minutes or deal with a slow connection
  49. Storage classes:
  50. * Standard (~$20/TB)
  51. * Infrequent access (~$13/TB) - less frequent access, but at same transfer speed
  52. * Glacier (~$4/TB) - delay of up to 12 hours (smaller files = faster), deleting data *newer* than 3 months incurs costs
  53. [Glacier Pricing](
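As a quick way to compare the tiers listed above, the sketch below multiplies a data size by the approximate per-TB rates from these notes. It assumes the rates are per month, and it deliberately ignores retrieval fees and Glacier's early-deletion charge, so treat the output as a lower bound. The names are hypothetical.

```python
# Approximate storage rates from these notes, in USD per TB (assumed per month).
STORAGE_RATE_PER_TB = {
    "standard": 20.0,
    "infrequent_access": 13.0,
    "glacier": 4.0,
}


def monthly_storage_cost(tb: float, storage_class: str) -> float:
    """Estimated monthly cost of keeping `tb` terabytes in one storage class."""
    if tb < 0:
        raise ValueError("tb must be non-negative")
    return tb * STORAGE_RATE_PER_TB[storage_class]


def savings_vs_standard(tb: float, storage_class: str) -> float:
    """Monthly savings from using `storage_class` instead of Standard."""
    return monthly_storage_cost(tb, "standard") - monthly_storage_cost(tb, storage_class)


for name in STORAGE_RATE_PER_TB:
    print(name, monthly_storage_cost(10, name))

print(savings_vs_standard(10, "glacier"))  # 160.0
```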
  54. Lifecycle rules:
  55. * Can create rules to move old data from S3 buckets into Glacier
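One way to express such a rule from a script is with boto3's `put_bucket_lifecycle_configuration`. The sketch below is not from the talk: the 90-day threshold, the rule ID, the `logs/` prefix, and the bucket name are placeholders. Building the configuration is kept separate from applying it, so the rule can be inspected without AWS credentials.

```python
def glacier_lifecycle_config(days: int = 90, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    `days` and `prefix` are illustrative defaults, not recommendations.
    An empty prefix applies the rule to every object in the bucket.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Attach the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported only when actually applying

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = glacier_lifecycle_config(days=90, prefix="logs/")
print(config["Rules"][0]["Transitions"])
# apply_lifecycle("my-example-bucket", config)  # hypothetical bucket name
```

Note that applying a lifecycle configuration replaces any rules the bucket already has, so existing rules need to be included in the same configuration.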
  56. ## EFS vs EBS vs S3
  57. When do you use EFS, EBS, or S3?
  58. Elastic Block Store (EBS):
  59. * **This is probably what you want**
  60. * EBS is block storage for one EC2 node - designed for general purpose applications
  61. * Cost: ~$120/TB/mo
  62. Elastic File System (EFS):
  63. * EFS is file storage shared by multiple EC2 nodes - designed for fast read-write operations, many incremental changes to files
  64. * "Elastic" part of EFS - the file system grows dynamically as your data grows (PB+ scale)
  65. * Hard drive on steroids - like plugging in a hard drive over a network, but big/fast/smart enough to be accessible to thousands+ of machines
  66. * Expensive: ~$300/TB/mo
  67. S3:
  68. * S3 is object storage - it stores blobs of raw data, creates snapshots in time
  69. * If you change a single character of a large file, the bucket has to create a new snapshot
  70. * Booting from S3 as a hard disk would take you about a thousand years... don't do that
  71. * Cheapest: ~$20/TB/mo
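For a side-by-side feel of the three price points above, this small sketch ranks EBS, EFS, and S3 by estimated monthly cost for a given amount of data. The rates are the rough figures from these notes, and the S3 figure is assumed to be per month like the other two. Price is only one axis: the three services are not interchangeable, as the bullets above explain.

```python
# Rough per-TB monthly rates from these notes, in USD.
RATE_PER_TB_MONTH = {
    "ebs": 120.0,
    "efs": 300.0,
    "s3": 20.0,  # assumed to be per month, like the other two
}


def rank_by_cost(tb: float) -> list:
    """Return (service, estimated monthly cost) pairs, cheapest first."""
    if tb < 0:
        raise ValueError("tb must be non-negative")
    costs = [(name, tb * rate) for name, rate in RATE_PER_TB_MONTH.items()]
    return sorted(costs, key=lambda pair: pair[1])


print(rank_by_cost(2))
# [('s3', 40.0), ('ebs', 240.0), ('efs', 600.0)]
```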
  72. Cool but $$$:
  73. * You may see "appliances" mentioned in Amazon documentation - Amazon will ship you a physical data transfer appliance that encrypts and copies data on site ([Snowball](
  74. * Can also purchase special network connections that bypass the public internet - like ISP putting alligator clips between your network lines and Amazon's network lines