Abstract: Segmentation and perspective correction of identity documents captured by a smartphone camera are challenging due to variable illumination, varying viewing angles, motion blur, occlusions, and complex backgrounds. To address these challenges, several state-of-the-art deep learning models are evaluated for identity card segmentation, including U-Net, Mask R-CNN, Mobile U-Net, YOLOv8, YOLO11, and SegFormer.
These models are trained on custom synthetic images and tested on real samples to evaluate transferability from synthetic to real images. Among the evaluated models, U-Net achieves the best performance, with a Dice score of 99.90% on synthetic data and 98.65% on real images.
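For reference, the Dice score reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The following is a minimal sketch of that formula for binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration; the paper's exact evaluation
    code (thresholding, averaging over images) is not specified here.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)
```

A score of 1.0 means the predicted and ground-truth masks coincide exactly; multiplying by 100 gives the percentage form used in the reported results.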
This work also introduces a robust line-based perspective correction method that overcomes common limitations of edge-based and corner-based rectification approaches. The method is evaluated on the custom dataset and on the public MIDV-500 and MIDV-2020 datasets. On the custom dataset, the proposed approach reduces the average corner distance to 2.59 px on synthetic images and 4.66 px on real images.
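One plausible reading of the average corner distance is the mean Euclidean distance, in pixels, between each predicted document corner and its ground-truth counterpart. The sketch below computes that quantity under this assumption. The function name `mean_corner_distance` is hypothetical, and the corners are assumed to be given in the same order in both inputs; the paper's exact definition may differ.

```python
import numpy as np


def mean_corner_distance(pred_corners, true_corners):
    """Mean Euclidean distance (px) between matched corner points.

    Assumes both inputs are (N, 2) sequences of (x, y) coordinates
    listed in the same order, e.g. top-left, top-right, bottom-right,
    bottom-left. Illustrative only.
    """
    pred = np.asarray(pred_corners, dtype=float)
    true = np.asarray(true_corners, dtype=float)
    if pred.shape != true.shape or pred.ndim != 2 or pred.shape[1] != 2:
        raise ValueError("expected two arrays of shape (N, 2)")

    # Per-corner Euclidean distance, then the average over all corners.
    distances = np.linalg.norm(pred - true, axis=1)
    return float(distances.mean())
```

For example, if three of four corners match exactly and the fourth is off by 3 px horizontally and 4 px vertically, the per-corner distances are 5, 0, 0, and 0, so the average is 1.25 px.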
DOI: 10.1016/j.rineng.2026.110566