This article features the work of Sagar Samtani, Assistant Professor of Operations and Decision Technologies; Weimer Faculty Fellow; Director, Kelley’s Data Science and Artificial Intelligence Lab (DSAIL) at the Kelley School of Business.
As machine learning has advanced, deep learning (DL) models, whose layered architecture is loosely inspired by the structure of the human brain, have achieved significant success on a variety of natural language processing (NLP) tasks. These models have improved performance on text classification and text regression tasks, but they have also proved to be highly vulnerable to adversarial attacks.
The authors propose a new explanation-based method for adversarial text attacks that builds on additive feature attribution methods, a family of explainable AI tools that account for a model's output by assigning each input feature a contribution score. The method uses these attribution scores to measure how sensitive the model is to each input word, and then exploits that sensitivity to craft black-box adversarial attacks on DL models that perform text classification or regression.
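The general idea can be illustrated with a minimal, hypothetical sketch (not the authors' actual method): use a simple leave-one-out word attribution, standing in for a full additive feature attribution explainer, to rank the words a black-box model is most sensitive to, then substitute those words until the prediction changes. The classifier, word lists, and substitution dictionary below are all toy assumptions for illustration.

```python
import math

def black_box_classifier(text: str) -> float:
    """Toy stand-in for a DL sentiment model: returns P(positive).
    We only query it, never inspect it, so the attack is black-box."""
    positive = {"great", "excellent", "good", "love"}
    negative = {"terrible", "awful", "bad", "hate"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1.0 / (1.0 + math.exp(-score))  # squash raw score to (0, 1)

def word_attributions(text: str, model) -> list:
    """Leave-one-out attribution: drop each word in turn and record how
    much the model's output changes. Larger change = more sensitive word."""
    words = text.split()
    base = model(text)
    attributions = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        attributions.append((words[i], base - model(reduced), i))
    return attributions

def attack(text: str, model, substitutes: dict, threshold: float = 0.5) -> str:
    """Greedily replace the highest-attribution word that has a known
    substitute until the model's output falls to the decision threshold."""
    words = text.split()
    while model(" ".join(words)) > threshold:
        attrs = word_attributions(" ".join(words), model)
        candidates = [a for a in attrs if a[0].lower() in substitutes]
        if not candidates:
            break  # nothing left to perturb
        word, _, idx = max(candidates, key=lambda a: abs(a[1]))
        words[idx] = substitutes[word.lower()]
    return " ".join(words)

original = "This movie was great and I love it"
adversarial = attack(original, black_box_classifier,
                     {"great": "fine", "love": "watched"})
print(black_box_classifier(original))     # confidently positive
print(black_box_classifier(adversarial))  # pushed down to the boundary
```

The sketch only needs query access to the model's output, which is what makes attribution-guided attacks attractive in the black-box setting: the explanation method identifies which words to perturb without any knowledge of the model's internals.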