Transformer-based framework for accurate segmentation of high-resolution images in structural health monitoring

Computer-Aided Civil and Infrastructure Engineering


Mohsen Azimi1, Tony Yang1

1Department of Civil Engineering, The University of British Columbia, Vancouver, British Columbia, Canada   

Abstract


High-resolution image segmentation is essential in structural health monitoring (SHM), enabling accurate detection and quantification of structural components and damage. However, conventional convolutional neural network-based segmentation methods face limitations in real-world deployment, particularly when handling high-resolution images, for which they produce low-resolution outputs. This study introduces a novel framework named Refined-Segment Anything Model (R-SAM) to overcome such challenges. R-SAM leverages the state-of-the-art zero-shot SAM to generate unlabeled segmentation masks, subsequently employing the DEtection TRansformer (DETR) model to label the instances. The key feature and contribution of R-SAM is its refinement module, which improves the accuracy of the masks generated by SAM without the need for extensive data annotation and fine-tuning. The effectiveness of the proposed framework was assessed through qualitative and quantitative analyses across diverse case studies, including multiclass segmentation, simultaneous segmentation and tracking, and 3D reconstruction. The results demonstrate that R-SAM outperforms state-of-the-art convolutional neural network-based segmentation models, with a mean intersection-over-union of 97% and a mean boundary accuracy of 87%. In addition, the high coefficients of determination achieved in target-free tracking case studies highlight its versatility in addressing various challenges in SHM.
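The abstract reports mean intersection-over-union (mIoU) as the headline segmentation metric. As a point of reference, below is a minimal NumPy sketch of how IoU is conventionally computed between binary masks and averaged over a dataset; this is an illustrative assumption, not the paper's exact evaluation protocol, and the function names are hypothetical:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks are treated as a perfect match.
    return float(intersection) / float(union) if union else 1.0

def mean_iou(preds, gts) -> float:
    """Mean IoU over paired (prediction, ground-truth) masks."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: predicted mask covers 4 pixels, ground truth covers 6,
# and they overlap on 4 pixels, so IoU = 4 / 6.
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1          # 2x2 block = 4 foreground pixels
gt = np.zeros((4, 4), dtype=int)
gt[1:3, 1:4] = 1            # 2x3 block = 6 foreground pixels
```

A mean boundary accuracy, as also reported in the abstract, would instead restrict this comparison to a narrow band around the mask contours; its exact definition is not given in the abstract.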