Abstract
Spiking neural networks (SNNs) are a relatively new type of energy-efficient artificial intelligence model. Research on SNNs covers both unimodal and multimodal models, and existing work has studied multimodal SNNs that fuse different types of data, such as images and audio, into one model. However, the effects of noise on multimodal SNNs have not been thoroughly investigated, even though the type and level of noise in each data modality may influence performance. In this paper, we propose a new framework for studying the effects of noise on multimodal SNNs in a classification task. The framework explores preprocessing techniques, the insertion of audio and image noise, and Leaky Integrate-and-Fire (LIF) neurons. Experimental results show that the multimodal SNN outperforms its unimodal counterparts. Moreover, performance varied across conditions: some types of audio and images yielded better results than others, as did some noise levels. All datasets consist of image frames and audio snippets extracted from videos. The noise was simulated in software, so evaluating the method with real noise from real-world data acquisition is left for future work.