Spark executor lost because of timeout even after setting a long timeout value of 1000 seconds


Hi, I have written a Spark job which seems to work fine for about an hour. After that, executors start getting lost because of a timeout, and I see the following statement in the log:

15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 1051638 ms exceeds timeout 1000000 ms
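To gauge how late the heartbeat was, the two numbers in the warning above can be pulled out and compared. The following is a minimal Python sketch, assuming the warning keeps the wording shown above. The function name `parse_heartbeat_warning` and the regular expression are illustrative and not part of Spark.

```python
import re

# Pattern for the HeartbeatReceiver warning quoted above (assumed wording).
# "with" is optional so abbreviated copies of the line also match.
HEARTBEAT_RE = re.compile(
    r"removing executor (\S+) (?:with )?no recent heartbeats: "
    r"(\d+) ms exceeds timeout (\d+) ms",
    re.IGNORECASE,
)


def parse_heartbeat_warning(line):
    """Return (executor_id, silent_ms, timeout_ms), or None if no match."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    executor_id, silent_ms, timeout_ms = match.groups()
    return executor_id, int(silent_ms), int(timeout_ms)


line = ("15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 "
        "with no recent heartbeats: 1051638 ms exceeds timeout 1000000 ms")
executor_id, silent_ms, timeout_ms = parse_heartbeat_warning(line)

# How far past the configured timeout the executor had gone silent.
overshoot_s = (silent_ms - timeout_ms) / 1000.0
print("executor %s was silent for %.1f s, %.1f s past the timeout"
      % (executor_id, silent_ms / 1000.0, overshoot_s))
```

For the line above, the executor had been silent for about 1051.6 seconds (roughly 17.5 minutes), which is about 51.6 seconds past the 1000-second timeout.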

I don't see any errors other than the above warning. Because of it, the executor gets removed by YARN, and I then see an RPC client disassociated error, an IOException (connection refused), and a FetchFailedException.

After an executor gets removed, I see it getting added again and it starts working, and then some other executors fail again. My question is: is it normal for executors to get lost? What happens to the tasks that the lost executors were working on? My Spark job keeps running for a long time, around 4-5 hours. I have a cluster with 1.2 TB of memory and a good number of CPU cores. To solve the above timeout issue I tried increasing spark.akka.timeout to 1000 seconds, but no luck. I am using the following command to run my Spark job. Please guide me, as I am new to Spark. I am using Spark 1.4.1. Thanks in advance.

./spark-submit --class com.xyz.abc.mysparkjob --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 4g --master yarn-client --executor-memory 25g --executor-cores 8 --num-executors 5 --jars /path/to/spark-job.jar
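As a rough check of what the command above actually requests from the cluster, the executor flags can be multiplied out. This is only a back-of-the-envelope sketch in Python: the values are copied from the command, 1.2 TB is taken as 1200 GB (an assumption), and driver memory and any per-executor overhead added by YARN are left out.

```python
# Values copied from the spark-submit command above.
num_executors = 5
executor_memory_gb = 25
executor_cores = 8

# Cluster size from the question; 1.2 TB is taken as 1200 GB (an assumption).
cluster_memory_gb = 1200

total_heap_gb = num_executors * executor_memory_gb   # executor heap only
total_cores = num_executors * executor_cores         # cores across executors
share = total_heap_gb / float(cluster_memory_gb)

print("executor heap requested: %d GB across %d executors"
      % (total_heap_gb, num_executors))
print("executor cores requested: %d" % total_cores)
print("share of cluster memory: %.1f%%" % (share * 100))
```

With these flags the job asks for 125 GB of executor heap and 40 cores, which is roughly a tenth of the stated 1.2 TB.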

What might be happening is that the slaves cannot launch an executor anymore, due to a memory issue. Look for the following messages in the master logs:

15/07/13 13:46:50 INFO Master: Removing executor app-20150713133347-0000/5 because it is EXITED
15/07/13 13:46:50 INFO Master: Launching executor app-20150713133347-0000/9 on worker worker-20150713153302-192.168.122.229-59013
15/07/13 13:46:50 DEBUG Master: [actor] handled message (2.247517 ms) ExecutorStateChanged(app-20150713133347-0000,5,EXITED,Some(Command exited with code 1),Some(1)) from Actor[akka.tcp://sparkWorker@192.168.122.229:59013/user/Worker#-83763597]
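To see how often executors are being removed and with which exit codes, the master log can be scanned for these messages. The following is a minimal Python sketch, assuming the message wording shown above. The function name `summarize_master_log` and both regular expressions are illustrative and not part of Spark.

```python
import re

# Assumed wording of the Master log messages; the small words are optional
# so the patterns also tolerate abbreviated copies of the lines.
REMOVED_RE = re.compile(
    r"removing executor (\S+)/(\d+) because (?:it is )?(\w+)",
    re.IGNORECASE,
)
EXIT_CODE_RE = re.compile(r"exited (?:with )?code (\d+)", re.IGNORECASE)


def summarize_master_log(lines):
    """Collect removed executors and any exit codes seen in the given lines."""
    removed = []      # (app_id, executor_id, state) tuples
    exit_codes = []   # integer exit codes
    for line in lines:
        match = REMOVED_RE.search(line)
        if match:
            removed.append(
                (match.group(1), int(match.group(2)), match.group(3).upper())
            )
        for code in EXIT_CODE_RE.findall(line):
            exit_codes.append(int(code))
    return removed, exit_codes


sample = [
    "15/07/13 13:46:50 INFO Master: Removing executor "
    "app-20150713133347-0000/5 because it is EXITED",
    "15/07/13 13:46:50 INFO Master: Launching executor "
    "app-20150713133347-0000/9 on worker "
    "worker-20150713153302-192.168.122.229-59013",
    "15/07/13 13:46:50 DEBUG Master: [actor] handled message (2.247517 ms) "
    "ExecutorStateChanged(app-20150713133347-0000,5,EXITED,"
    "Some(Command exited with code 1),Some(1))",
]
removed, exit_codes = summarize_master_log(sample)
print(removed)
print(exit_codes)
```

In practice the `lines` argument would be the open master log file. A growing list of removed executors with non-zero exit codes is a hint to go and read the corresponding worker logs.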

You might find some detailed Java errors in the worker's log directory, and maybe a file of this type: work/app-id/executor-id/hs_err_pid11865.log.

See http://pastebin.com/b4fbxvhr

This issue might be resolved by the application's management of its RDDs, not by increasing the size of the JVM's heap.

