Richa Khandelwal

AWS

A collection of 3 posts
Apache Spark

Tuning Spark Jobs on EMR with YARN - Lessons Learnt

Apache Spark is a distributed processing system that can process data at a very large scale. Even though Spark's memory model is optimized to handle large amounts of data, it is not magic, and several settings can help you get the most out of your cluster. I am summarizing
Jun 21, 2017 4 min read
AWS

Cross-Account S3 bucket settings for data transfer on Hadoop based systems

While trying to write some data from one AWS account to another, I ran into several cross-account S3 settings issues. My Google searches turned up little, so I am documenting them here in case somebody else runs into the same problems. Problem: Account 1 (let's call it Dumbledore) has an S3 bucket. Account
Jan 1, 2017 3 min read
AWS

Migrating to EMR 5.0.X for Spark 2.0

AWS released EMR 5.0 recently. It is a major release and contains upgrades such as Apache Spark 2.0, Apache Hive 2.1, Presto 0.150, Apache Zeppelin 0.6.1, etc. Spark 2.0 comes with various performance and API updates. There are also some breaking changes that
Aug 4, 2016 2 min read
Richa Khandelwal © 2023