Learning Dynamic Siamese Network for Visual Object Tracking

Qing Guo       Wei Feng       Ce Zhou       Rui Huang        Liang Wan        Song Wang

International Conference on Computer Vision



How to effectively learn the temporal variation of target appearance and exclude the interference of cluttered background, while maintaining real-time response, is an essential problem in visual object tracking. Recently, Siamese networks have shown the great potential of matching-based trackers in achieving balanced accuracy and beyond-real-time speed. However, they still lag far behind classification-and-updating-based trackers in tolerating temporal changes of objects and imaging conditions. In this paper, we propose a dynamic Siamese network with a fast transformation learning model that enables effective online learning of target appearance variation and background suppression from previous frames. We then present elementwise multi-layer fusion to adaptively integrate the network outputs using multi-level deep features. Unlike state-of-the-art trackers, our approach allows the use of any feasible generally or particularly trained features, such as SiamFC and VGG. More importantly, the proposed dynamic Siamese network can be jointly trained as a whole directly on labeled video sequences, and thus can take full advantage of the rich spatial-temporal information of moving objects. As a result, our approach achieves state-of-the-art performance on the OTB-2013 and VOT-2015 benchmarks, while exhibiting superiorly balanced accuracy and real-time response compared with state-of-the-art competitors.
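To give a flavor of the "fast transformation learning" the abstract refers to, below is a minimal NumPy sketch under an assumption: the per-frame transformation is learned as a regularized linear regression between consecutive deep feature maps, solved in closed form in the Fourier domain (the same trick used by correlation-filter trackers). The function names, the feature shapes, and the regularizer `lam` are all illustrative, not the paper's exact formulation.

```python
import numpy as np

def learn_transformation(feat_prev, feat_curr, lam=0.1):
    """Hypothetical closed-form solution, per frequency, of
        min_V ||V (*) feat_prev - feat_curr||^2 + lam * ||V||^2,
    where (*) denotes circular convolution over the spatial axes."""
    Fp = np.fft.fft2(feat_prev, axes=(0, 1))  # previous-frame features, Fourier domain
    Fc = np.fft.fft2(feat_curr, axes=(0, 1))  # current-frame features, Fourier domain
    # Elementwise ridge-regression solution at each spatial frequency.
    return (Fc * np.conj(Fp)) / (Fp * np.conj(Fp) + lam)

def apply_transformation(V, feat):
    """Warp a new feature map with the learned V (circular convolution)."""
    F = np.fft.fft2(feat, axes=(0, 1))
    return np.real(np.fft.ifft2(V * F, axes=(0, 1)))

# Sanity check: with identical frames and a tiny regularizer, the learned
# transformation should approximately reproduce the input features.
rng = np.random.default_rng(0)
feat = rng.random((8, 8, 3))
V = learn_transformation(feat, feat, lam=1e-6)
out = apply_transformation(V, feat)
```

Because the solve is elementwise in the frequency domain, its cost is dominated by a few FFTs per frame, which is what makes online updating at each frame compatible with real-time tracking.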



