Tuning Spark Jobs on EMR with YARN - Lessons Learnt
Apache Spark is a distributed processing system that can process data at a very
large scale. Even though Spark's memory model is optimized to handle large
amounts of data, it is not magic, and there are several settings that can help
you get the most out of your cluster. I