To reduce the long run time of convolutional neural networks (CNNs), which is caused mostly by the high complexity of the convolution operation, an FPGA implementation of a configurable CNN co-accelerator with an eight-stage pipeline structure is proposed. Embedding the pooling controller in the convolution controller frees more resources for the computational module. In particular, a mirror-tree structure is designed to increase parallelism. Furthermore, to increase computational density and calculation speed at the same time, the Map algorithm is implemented in this design. Experimental results show that the computing performance of this implementation reaches 22.74 GOPS with 32-bit fixed-point/floating-point data. Compared with the MAPLE accelerator, the computational density is increased by 283.3% and the calculation speed by 224.9%. Compared with MCA (Memory-Centric Accelerator), the computational density is increased by 14.47% and the calculation speed by 33.76%. With a precision range between 8-bit and 16-bit fixed point, the performance reaches 58.3 GOPS, and the computational density is increased by 8.5% compared with LBA (Layer-Based Accelerator).
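
As a reading aid for the reported percentages: an increase of p% corresponds to a ratio of 1 + p/100 relative to the baseline. The minimal Python sketch below converts the gains quoted above into ratios. The helper name `gain_to_ratio` and the dictionary layout are illustrative assumptions, not part of the described design, and no absolute baseline values are inferred.

```python
def gain_to_ratio(percent_increase: float) -> float:
    """Convert 'increased by p%' into a ratio relative to the baseline."""
    return 1.0 + percent_increase / 100.0


# Percentage gains exactly as quoted in the text, keyed by (baseline, metric).
REPORTED_GAINS = {
    ("MAPLE", "computational density"): 283.3,
    ("MAPLE", "calculation speed"): 224.9,
    ("MCA", "computational density"): 14.47,
    ("MCA", "calculation speed"): 33.76,
    ("LBA", "computational density"): 8.5,
}


def reported_ratios(gains: dict) -> dict:
    """Map every quoted gain to its ratio, rounded to four decimals."""
    return {key: round(gain_to_ratio(value), 4) for key, value in gains.items()}


if __name__ == "__main__":
    for (baseline, metric), ratio in reported_ratios(REPORTED_GAINS).items():
        print(f"{metric} vs {baseline}: {ratio}x")
```

Read this way, a 283.3% increase in computational density over MAPLE means roughly 3.8 times the baseline value, not 2.8 times.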