Video object segmentation (VOS) is an active research topic in computer vision. Traditional deep-learning-based VOS methods fine-tune the deep network online, which makes segmentation slow and makes real-time requirements difficult to meet. We therefore propose a fast VOS method. First, a weight-shared siamese encoder subnet maps the reference stream and the target stream into the same feature space, so that the same object has similar features in both streams. Then, a global feature extraction subnet matches target features against the features of the given object to locate it. Finally, a decoder subnet restores the object features and recovers edge information by incorporating low-level features of the target stream, producing the output mask. Experiments on public benchmark datasets show that our method significantly improves speed while achieving good segmentation performance.
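The matching idea behind the first two stages can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and every name, weight and threshold in it is hypothetical. The encoder is assumed to be a per-pixel linear layer with ReLU, standing in for a learned convolutional encoder. "Global matching" is assumed to mean that each target pixel receives its highest cosine similarity to any pixel of the given object in the reference frame. The learned decoder and its low-level feature connections are replaced by a plain threshold (`threshold_mask`).

```python
import math
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of per-pixel vectors


def encode(frame: Grid, weights: Sequence[Sequence[float]]) -> Grid:
    """Hypothetical shared encoder: per-pixel linear map followed by ReLU.

    Both streams are encoded with the SAME `weights`; sharing the weights
    is what places reference and target features in one feature space.
    """
    return [
        [[max(0.0, sum(w * x for w, x in zip(w_row, pixel))) for w_row in weights]
         for pixel in row]
        for row in frame
    ]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def global_match(target: Grid, reference: Grid,
                 ref_mask: List[List[int]]) -> List[List[float]]:
    """Assumed global matching: for every target pixel, take its best cosine
    similarity to ANY reference pixel inside the given object mask."""
    obj = [reference[i][j]
           for i, mask_row in enumerate(ref_mask)
           for j, m in enumerate(mask_row) if m]
    if not obj:
        raise ValueError("reference mask selects no object pixels")
    return [[max(cosine(f, o) for o in obj) for f in row] for row in target]


def threshold_mask(similarity: List[List[float]],
                   threshold: float = 0.9) -> List[List[int]]:
    """Stand-in for the learned decoder: binarize the similarity map."""
    return [[1 if s >= threshold else 0 for s in row] for row in similarity]


# Toy 2x2 frames with 2 input channels; the object is the "reddish" pixel.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # illustrative 2 -> 3 channel map
ref_frame = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
ref_mask = [[1, 0], [0, 0]]  # object at top-left in the reference frame
target_frame = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.9, 0.1]]]  # object moved

similarity = global_match(encode(target_frame, weights),
                          encode(ref_frame, weights), ref_mask)
mask = threshold_mask(similarity)
print(mask)  # [[0, 0], [0, 1]]
```

Because `encode` uses the same `weights` for both frames, the object's pixels map to similar directions in feature space in both streams. A plain similarity test can therefore relocate the object after it moves, and no per-video fine-tuning step is involved at inference time.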