Richa Khandelwal

AWS

Tuning Spark Jobs on EMR with YARN - Lessons Learnt

Apache Spark is a distributed processing system that can process data at a very large scale. Even though Spark's memory model is optimized to handle large amounts of data, it is not magic, and there are several settings that can help you get the most out of your cluster. I
Richa Khandelwal Jun 21, 2017

Cross-Account S3 bucket settings for data transfer on Hadoop based systems

While trying to write some data from one AWS account to another, I ran into several cross-account S3 settings issues. My Google searches turned up very little, so I am documenting the issues here in case somebody else runs into them. Problem: Account 1 (let's call it Dumbledore) has an S3
Richa Khandelwal Jan 1, 2017

Migrating to EMR 5.0.X for Spark 2.0

AWS released EMR 5.0 recently. It is a major release and contains upgrades such as Apache Spark 2.0, Apache Hive 2.1, Presto 0.150, and Apache Zeppelin 0.6.1. Spark 2.0 comes with various performance and API updates. There are also some breaking changes that
Richa Khandelwal Aug 4, 2016
