March 18, 2024
Join us for the March 2024 ECE Research Seminar.
ECE Research Seminar
March 22, 2024, Friday, 12:00 – 1:00 pm, Simrall 104
https://msstate.webex.com/msstate/j.php?MTID=m202a6146dd992ad4f3fa78a7fe6bfec8
Enhancing Vision Tasks through Wavelet-Integrated Attention Mechanisms: Attention Mechanism in CNNs and Self-Attention in Vision Transformers
Simegnew Yihunie Alaba | sa1724@msstate.edu
Abstract: The human visual system inherently prioritizes some areas of a scene rather than processing all details simultaneously. Analogously, the attention mechanism highlights the most critical features while disregarding less important ones. This work introduces two novel approaches that enhance the performance of neural networks in vision tasks by integrating the discrete wavelet transform (DWT), improving feature extraction in CNN models and reducing the computational burden in vision transformers. The first approach, the Wavelet Convolutional Attention Module (WCAM), incorporates inherent wavelet properties into CNNs: multiresolution, invertible downsampling, the capability to separate noise from the input, and sparsity. WCAM directs CNNs toward more efficient and discriminative feature extraction and learning, emphasizing crucial aspects of features while suppressing less significant ones. WCAM’s versatility and lightweight design allow easy integration into existing CNN architectures without additional computational complexity. The second model, WiFormer, combines the DWT with vision transformers to reduce the computational cost incurred by multi-head self-attention. This integration retains crucial image details while minimizing computational demands, and the DWT’s noise filtering and sparsity enhance the model’s robustness. WiFormer reduces the cost of scaled dot-product attention by operating in the frequency domain, which allows efficient element-wise multiplication. Despite being trained solely on the ImageNet-1k dataset, without pretraining on larger datasets, WiFormer achieves a top-1 accuracy of 85.6% on image classification and outperforms comparable state-of-the-art models. Extensive testing on datasets such as CIFAR-10, CIFAR-100, ImageNet-1k, and MS COCO demonstrates the superiority of WCAM and WiFormer over existing convolutional attention modules and vision transformers in image classification and object detection tasks.
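For readers unfamiliar with the "invertible downsampling" property mentioned in the abstract, the following is a minimal illustrative sketch of a single-level 2D Haar DWT in NumPy. It is not the speaker's WCAM or WiFormer implementation. The function names `haar_dwt2` and `haar_idwt2` are hypothetical, the Haar wavelet is chosen here only because it is the simplest case, and the subband labels follow one common convention (others swap LH and HL). The sketch shows that the transform halves the spatial resolution, yet the input can be recovered exactly from the four subbands.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT (orthonormal) of an array with even height and width.

    Returns four half-resolution subbands: one approximation (ll) and
    three detail bands (lh, hl, hh). Naming conventions vary across libraries.
    """
    x = np.asarray(x, dtype=float)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low-pass in both directions)
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array from the subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


# Toy input standing in for a feature map (random values, illustration only).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

ll, lh, hl, hh = haar_dwt2(image)
reconstructed = haar_idwt2(ll, lh, hl, hh)
```

Unlike max pooling or strided convolution, which discard information when downsampling, the four subbands together keep everything needed to rebuild the input, which is the sense in which this kind of downsampling is invertible.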
Mr. Simegnew Yihunie Alaba is a Ph.D. candidate in Electrical and Computer Engineering at Mississippi State University. He earned his Master of Science in Computer Engineering from Addis Ababa University and a Bachelor of Science in Electrical Engineering from Arbaminch University in Ethiopia. His academic focus encompasses computer vision, machine learning, deep learning, and autonomous driving. He has been working as a graduate research assistant at the High Performance Computing Collaboratory/GRI since January 2021 under the supervision of Dr. John Ball.
* For further information contact: Dr. Jenny Du | du@ece.msstate.edu | 5-2035
The Department of Electrical and Computer Engineering at Mississippi State University consists of 27 faculty members (including seven endowed professors), seven professional staff, and over 700 undergraduate and graduate students, approximately 100 of whom are at the Ph.D. level. The department has research expenditures of over $14.24 million and houses the largest High Voltage Laboratory among North American universities.