Weighted Multi-Modal Fusion for RGB-T Tracking

Authors

  • Dao Vu Hiep (Corresponding Author) School of Information and Communication Technology, Hanoi University of Science and Technology
  • Tran Quang Duc School of Information and Communication Technology, Hanoi University of Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.84.2022.32-41

Keywords:

Visual Object Tracking; Multi-modal fusion; Convolutional Neural Network; Discriminative Correlation Filters.

Abstract

Visual object tracking is an important task in computer vision, and RGB trackers such as KCF, CSRDCF, SiamFC, SiamRPN, ATOM, SiamDW, and DiMP are commonly believed to be fast and reliable enough to be deployed. However, RGB tracking performs unsatisfactorily in adverse environmental conditions, e.g., low illumination, rain, and smog. Thermal infrared sensors (8–14 µm) have been found to provide a more stable signal in these scenarios. Some same-level multi-modal fusion algorithms, such as FSRPN, SiamDW_T, and mfDiMP, obtain better results but do not take the environmental conditions into account. This paper describes a weighted multi-modal fusion method for RGB-T tracking. Experiments carried out on the VOT-RGBT dataset demonstrate that our algorithm achieves an expected average overlap (EAO) of 0.423, higher than some popular tracking algorithms, and can operate at 13 fps on commodity hardware.
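
The abstract does not say how the modality weights are computed or at which stage the fusion is applied, so the following is only a minimal sketch of the general idea of weighted fusion, not the authors' method. It assumes two same-sized 2-D maps (one from the RGB branch, one from the thermal branch) and two non-negative reliability weights supplied by the caller; the function name `fuse_weighted` and its parameters are hypothetical.

```python
def fuse_weighted(rgb_map, tir_map, w_rgb, w_tir):
    """Blend two same-sized 2-D maps as a convex combination.

    Illustrative assumption only: how the weights are derived
    (e.g., from illumination or weather cues) is not specified in
    the abstract, so they are plain inputs here.
    """
    if w_rgb < 0 or w_tir < 0:
        raise ValueError("weights must be non-negative")
    total = w_rgb + w_tir
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    if len(rgb_map) != len(tir_map):
        raise ValueError("maps must have the same number of rows")

    # Normalize so the two modality weights sum to 1.
    alpha = w_rgb / total

    fused = []
    for rgb_row, tir_row in zip(rgb_map, tir_map):
        if len(rgb_row) != len(tir_row):
            raise ValueError("rows must have the same length")
        # Element-wise blend of the RGB and thermal values.
        fused.append([alpha * r + (1.0 - alpha) * t
                      for r, t in zip(rgb_row, tir_row)])
    return fused


# Hypothetical low-light case: the thermal map is trusted three times
# as much as the RGB map.
fused = fuse_weighted([[2.0, 2.0]], [[4.0, 4.0]], w_rgb=1.0, w_tir=3.0)
```

With equal weights the sketch reduces to plain averaging of the two modalities; unequal weights let the more reliable modality dominate the fused map.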

Published

28-12-2022

How to Cite

Dao, H., and D. Tran. “Weighted Multi-Modal Fusion for RGB-T Tracking”. Journal of Military Science and Technology, no. 84, Dec. 2022, pp. 32-41, doi:10.54939/1859-1043.j.mst.84.2022.32-41.

Issue

Section

Research Articles