Computer Science & Information Engineering
190022 Taiwan
BrailleEye: Advancing Vision Language Models for the Visually Impaired Through Synthetic Data Augmentation
The educational applications of Vision-Language Models (VLMs) such as OpenAI's
GPT-5 have grown rapidly in recent years. When applied to assist the learning of
visually impaired students, however, most existing VLMs show a critical limitation: they
lack understanding of Braille and tactile graphics. As a result, they often fail to provide
meaningful assistance or feedback, and in some cases even generate misleading responses.
This gap reflects a severe inequality in educational accessibility for the visually impaired,
running counter to the United Nations Sustainable Development Goals (SDGs).
To address the scarcity of domain-specific VLMs and training data for the visually
impaired, this study proposes a novel Synthetic Data Augmentation (SDA) method tailored
to education for the blind. The proposed pipeline combines easily accessible datasets
(such as SQuAD) with large-scale VLM-assisted synthesis to efficiently generate high-quality,
generalizable training data at low cost, substantially reducing the resources
required by traditional data-collection methods.
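The core of the pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration (the names `to_braille` and `make_sample` are not from the paper): a SQuAD-style (context, question, answer) triple is converted into a Braille-grounded training sample by transcribing the context into uncontracted (grade-1) Unicode Braille. The real pipeline would additionally rasterize the Braille into images and use a larger VLM to synthesize captions and explanations.

```python
# Hypothetical sketch of the SDA idea: turn plain-text QA pairs into
# Braille-grounded training samples. Only the text stage is shown so the
# example stays self-contained.

# Uncontracted (grade-1) Braille: letter -> raised-dot positions (1..6).
DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5), "k": (1, 3), "l": (1, 2, 3), "m": (1, 3, 4),
    "n": (1, 3, 4, 5), "o": (1, 3, 5), "p": (1, 2, 3, 4),
    "q": (1, 2, 3, 4, 5), "r": (1, 2, 3, 5), "s": (2, 3, 4),
    "t": (2, 3, 4, 5), "u": (1, 3, 6), "v": (1, 2, 3, 6),
    "w": (2, 4, 5, 6), "x": (1, 3, 4, 6), "y": (1, 3, 4, 5, 6),
    "z": (1, 3, 5, 6),
}

def to_braille(text: str) -> str:
    """Transcribe a-z (and spaces) into Unicode Braille patterns."""
    out = []
    for ch in text.lower():
        if ch == " ":
            out.append("\u2800")  # blank Braille cell
        elif ch in DOTS:
            # Unicode Braille block: U+2800 plus bit (i-1) per raised dot i.
            out.append(chr(0x2800 + sum(1 << (d - 1) for d in DOTS[ch])))
    return "".join(out)

def make_sample(context: str, question: str, answer: str) -> dict:
    """Pair a Braille-rendered context with its QA supervision signal."""
    return {
        "braille_context": to_braille(context),
        "question": question,
        "answer": answer,
    }
```

For example, `make_sample("cab", "What word is written?", "cab")` yields a sample whose `braille_context` is `⠉⠁⠃`, ready to be rendered as an image and paired with the question-answer supervision.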
Building on Google DeepMind's state-of-the-art lightweight open-source model
Gemma 3 4B, we integrate the SigLIP ViT vision encoder and apply LoRA supervised
fine-tuning on the synthesized dataset using a standard personal computer. Experimental
results show that our model surpasses GPT-5 and previous Braille-related VLM studies,
achieving 94.3% accuracy in Braille transcription, 74.2% exact match (EM) on exams for
visually impaired students, 87.2% in tactile-diagram captioning, and human expert
evaluation scores of 6.53/7 and 6.87/7. These results demonstrate the model's stability
and strong visual-assistance capability on generalized evaluation data. Furthermore, with
only 36% of the parameters of prior models, its lightweight architecture enables practical
deployment on personal devices such as smartphones and laptops, extending the reach of
AI-powered educational tools for the visually impaired worldwide.
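A back-of-the-envelope sketch shows why LoRA fine-tuning of a 4B-parameter model fits on a personal computer. The transformer dimensions below (32 layers, hidden size 2560, rank 16) are illustrative placeholders, not the actual Gemma 3 4B configuration.

```python
# LoRA trains a low-rank update B @ A (B: d_out x r, A: r x d_in) in place
# of a full d_out x d_in weight update, so each adapted projection adds
# only r * (d_in + d_out) trainable parameters.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted linear layer."""
    return rank * (d_in + d_out)

# Illustrative transformer (NOT the real Gemma 3 4B config): 32 layers,
# hidden size 2560, rank-16 adapters on the four attention projections
# (q, k, v, o) of each layer.
layers, hidden, rank = 32, 2560, 16
per_layer = 4 * lora_trainable_params(hidden, hidden, rank)
total = layers * per_layer
print(f"trainable LoRA params: {total:,}")          # on the order of 10 M
print(f"fraction of a 4B model: {total / 4e9:.4%}")  # well under 1%
```

With roughly ten million trainable parameters, optimizer state and gradients fit comfortably in consumer-GPU memory while the 4B base weights stay frozen, which is what makes fine-tuning on a standard personal computer feasible.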