Self-Adaptive Strategy for Cache Management in Spark

BIAN Chen; YU Jiong; YING Chang-tian; XIU Wei-rong

doi:10.3969/j.issn.0372-2112.2017.02.003

- Acta Electronica Sinica Vol. 45, Issue 2, Pages: 278-284(2017)
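- Author affiliations:
  
  1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046
  2. School of Information Engineering, Urumqi Vocational University, Urumqi, Xinjiang 830002
  3. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046
  4. School of Information Engineering, Urumqi Vocational University, Urumqi, Xinjiang 830002
- Funding:
  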
  National Natural Science Foundation of China (No.61262088, No.61462079)
- DOI: 10.3969/j.issn.0372-2112.2017.02.003
  CLC: TP311
- Published Online: 25 February 2017
  
  Published: 2017

BIAN Chen, YU Jiong, YING Chang-tian, et al. Self-Adaptive Strategy for Cache Management in Spark[J]. Acta Electronica Sinica, 2017, 45(2): 278-284. DOI: 10.3969/j.issn.0372-2112.2017.02.003.

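Abstract (translated from the Chinese)

The parallel computing framework Spark lacks an effective cache selection mechanism and cannot automatically identify and cache highly reused data. Its cache replacement algorithm is LRU, whose measure is too coarse, and this hurts task execution efficiency. This paper proposes a self-adaptive cache management strategy for Spark (Self-Adaptive Cache Management, SACM), comprising an automatic cache selection algorithm (Selection), a parallel cache cleanup algorithm (Parallel Cache Cleanup, PCC), and a weight-based cache replacement algorithm (Lowest Weight Replacement, LWR). The selection algorithm analyzes the DAG (Directed Acyclic Graph) structure of a task to identify reused RDDs and cache them automatically. The parallel cache cleanup algorithm asynchronously removes RDDs that no longer have value, improving cluster memory utilization. The weight-based replacement algorithm chooses the replacement target by weight, which avoids the task delay caused by recomputing complex RDDs and preserves computational efficiency under resource bottlenecks. Experiments show that the strategy improves Spark's task execution efficiency and makes effective use of memory resources.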
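The idea of identifying reused RDDs from a job's DAG can be illustrated with a minimal sketch. This is not the paper's Selection algorithm, whose details are not given in the abstract. It assumes that "reused" simply means an RDD has more than one direct child in the DAG. The function name `select_rdds_to_cache`, the dictionary representation of the DAG, and the example job are all hypothetical.

```python
from collections import defaultdict


def select_rdds_to_cache(dag):
    """Return the RDDs that more than one child RDD depends on.

    `dag` maps each RDD name to the list of parent RDD names it is
    computed from (an assumed, simplified stand-in for Spark lineage).
    """
    child_count = defaultdict(int)
    for child, parents in dag.items():
        for parent in set(parents):  # count each child once per parent
            child_count[parent] += 1
    # Assumption: an RDD with two or more children is a reuse candidate.
    return sorted(rdd for rdd, n in child_count.items() if n > 1)


# Hypothetical job: `parsed` feeds both `errors` and `warnings`.
example_dag = {
    "raw": [],
    "parsed": ["raw"],
    "errors": ["parsed"],
    "warnings": ["parsed"],
    "report": ["errors", "warnings"],
}

print(select_rdds_to_cache(example_dag))  # ['parsed']
```

In this toy job only `parsed` would be cached, since every other RDD is consumed exactly once.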

Abstract

As a parallel computation framework, Spark lacks a good strategy for selecting valuable RDDs to cache in limited memory. When memory is full, Spark discards the least recently used RDD while ignoring other factors such as computation cost. This paper proposes a self-adaptive cache management strategy (SACM), which comprises an automatic selection algorithm (Selection), a parallel cache cleanup algorithm (PCC), and a lowest weight replacement algorithm (LWR). The Selection algorithm finds valuable RDDs and caches their partitions to speed up data-intensive computations. PCC cleans up valueless RDDs asynchronously to improve memory utilization. LWR jointly considers the usage frequency of an RDD, its computation cost, and its size. Experimental results show that Spark with the selection algorithm computes faster than traditional Spark, that the parallel cleanup algorithm improves memory utilization, and that LWR performs better when memory is limited.
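A weight-based replacement policy of the kind LWR describes can be sketched as follows. The abstract names the three factors (usage frequency, computation cost, size) but not how they are combined, so the formula `use_count * compute_cost / size_mb` is an assumption made for illustration. The names `CachedRDD`, `weight`, and `evict_lowest_weight` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    use_count: int       # how often the RDD is reused
    compute_cost: float  # seconds needed to recompute it
    size_mb: float       # memory it occupies


def weight(rdd):
    # Assumed formula, not the paper's: frequently used, expensive,
    # small RDDs get a high weight and are kept longest.
    return rdd.use_count * rdd.compute_cost / rdd.size_mb


def evict_lowest_weight(cache, needed_mb):
    """Return names of RDDs to evict, lowest weight first,
    until at least `needed_mb` would be freed."""
    evicted, freed = [], 0.0
    for rdd in sorted(cache, key=weight):
        if freed >= needed_mb:
            break
        evicted.append(rdd.name)
        freed += rdd.size_mb
    return evicted


cache = [
    CachedRDD("a", use_count=1, compute_cost=2.0, size_mb=100.0),   # weight 0.02
    CachedRDD("b", use_count=5, compute_cost=30.0, size_mb=200.0),  # weight 0.75
    CachedRDD("c", use_count=2, compute_cost=1.0, size_mb=50.0),    # weight 0.04
]

print(evict_lowest_weight(cache, needed_mb=120))  # ['a', 'c']
```

A recency-only policy could evict `b` if it happened to be the least recently used, even though it is the most expensive to recompute. Under the assumed weight it is evicted last.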
