A novel low-latency, high-throughput programmable tree architecture is proposed that efficiently implements both the FSA and TSSA algorithms for motion estimation. Compared with other architectures, the proposed design reduces the number of processing elements (PEs) by one third and halves the PE delay. A dedicated buffering structure (the ME window) lowers the I/O bandwidth and the I/O pin count, and pipeline interleaving achieves 100% hardware utilization. Owing to these properties, the architecture is well suited to VLSI implementation.
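
For orientation, the two block-matching searches that the architecture targets can be sketched in software as follows. This is a minimal illustration and not the proposed hardware. It assumes that FSA and TSSA denote the full-search and three-step search algorithms, that the matching criterion is the sum of absolute differences (SAD), and that frames are plain lists of rows. The function names `sad`, `in_bounds`, `full_search`, and `three_step_search` are hypothetical and do not come from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block
    of `ref` displaced by (dx, dy). SAD is an assumed criterion."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def in_bounds(ref, bx, by, dx, dy, n):
    """True if the displaced n x n block lies entirely inside `ref`."""
    h, w = len(ref), len(ref[0])
    return 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h


def full_search(cur, ref, bx, by, n, p):
    """Full search: evaluate every displacement in [-p, p] x [-p, p]
    and return the (dx, dy) with the smallest SAD."""
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not in_bounds(ref, bx, by, dx, dy, n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def three_step_search(cur, ref, bx, by, n, step=4):
    """Three-step search: test the 3 x 3 grid of candidates around the
    current centre, move to the best one, halve the step, and repeat.
    With step = 4, 2, 1 the reachable range is +/-7 pixels."""
    cx, cy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        best = (best_cost, cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                if not in_bounds(ref, bx, by, mx, my, n):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best[0]:
                    best = (cost, mx, my)
        best_cost, cx, cy = best
        step //= 2
    return cx, cy
```

The full search evaluates (2p + 1)^2 candidates per block, whereas the three-step search evaluates at most 27, which is why a programmable architecture that serves both is attractive. The three-step search is a heuristic and may return a different vector from the full search when the SAD surface is not unimodal.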