Spark lacks a good strategy for selecting valuable RDDs to cache in limited memory. When memory is full, Spark discards the least recently used RDD and ignores other factors such as computation cost. This paper proposes a self-adaptive cache management strategy (SACM), which comprises an automatic selection algorithm (Selection), a parallel cache cleanup algorithm (PCC), and a lowest weight replacement algorithm (LWR). The Selection algorithm identifies valuable RDDs and caches their partitions to speed up data-intensive computations. PCC cleans up valueless RDDs asynchronously to improve memory utilization. LWR jointly considers an RDD's usage frequency, its computation cost, and its size. Experimental results show that Spark with the Selection algorithm computes faster than traditional Spark, that the parallel cleanup algorithm improves memory utilization, and that LWR performs better in limited memory.
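
To make the lowest weight replacement idea concrete, a minimal sketch follows. The abstract names the three factors (usage frequency, computation cost, size) but not how they are combined, so the weight formula here (frequency times cost, divided by size) is an assumption chosen only for illustration. The names `CachedRDD`, `weight`, and `evict_lowest_weight` are hypothetical and are not part of Spark or of SACM.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    use_count: int        # how often the RDD has been used
    compute_cost: float   # cost to recompute it (e.g. seconds)
    size: float           # memory footprint (e.g. MB)


def weight(rdd: CachedRDD) -> float:
    # Assumed formula, not taken from the paper: frequently used,
    # expensive-to-recompute, small RDDs receive a high weight.
    return rdd.use_count * rdd.compute_cost / rdd.size


def evict_lowest_weight(cache, needed):
    """Pick lowest-weight RDDs until at least `needed` units of memory are freed."""
    evicted = []
    freed = 0.0
    for rdd in sorted(cache, key=weight):
        if freed >= needed:
            break
        evicted.append(rdd.name)
        freed += rdd.size
    return evicted


cache = [
    CachedRDD("a", use_count=5, compute_cost=10.0, size=100.0),  # weight 0.5
    CachedRDD("b", use_count=1, compute_cost=2.0, size=200.0),   # weight 0.01
    CachedRDD("c", use_count=3, compute_cost=50.0, size=50.0),   # weight 3.0
]
print(evict_lowest_weight(cache, needed=250.0))  # prints ['b', 'a']
```

Under this assumed weighting, the large, cheap, rarely used RDD `b` is evicted first, while the small, expensive, frequently used RDD `c` stays cached. A least-recently-used policy would instead decide purely on access order.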