Research in Responsible AI Lab
The Responsible AI Lab is committed to enhancing AI's fairness, explainability, accountability, transparency, ethics, security, and privacy through the design and development of novel algorithms and models, harnessing AI's potential as a powerful force for social good. Our research also examines and anticipates the societal impacts of both established and emerging AI techniques, with a particular emphasis on controversial applications such as face recognition, through big social data mining and deep learning. In addition, our lab engages in interdisciplinary efforts to tackle multifaceted real-world challenges across diverse domains, including sustainable manufacturing and smart agriculture. We envision a future where AI is thoughtfully and responsibly integrated into society, shaping a more equitable world.
Responsible AI
Towards Transferable Targeted Adversarial Examples
Transferability of adversarial examples is critical for black-box attacks on deep learning models. While most existing studies focus on enhancing the transferability of untargeted adversarial attacks, few have studied how to generate transferable targeted adversarial examples that can mislead models into predicting a specific class. Moreover, existing transferable targeted adversarial attacks usually fail to sufficiently characterize the target class distribution and thus suffer from limited transferability. In this research, we propose the Transferable Targeted Adversarial Attack (TTAA), which captures the distribution information of the target class from both label-wise and feature-wise perspectives to generate highly transferable targeted adversarial examples. To this end, we design a generative adversarial training framework consisting of a generator that produces targeted adversarial examples and feature-label dual discriminators that distinguish the generated adversarial examples from target class images. Specifically, the label discriminator guides the adversarial examples to learn label-related distribution information about the target class, while the feature discriminator, which extracts feature-wise information with strong cross-model consistency, enables the adversarial examples to learn transferable distribution information. Furthermore, we introduce random perturbation dropping, which augments the diversity of the adversarial examples used during training, to further enhance transferability.
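The training framework can be pictured as a GAN with two critics. Below is a minimal PyTorch-style sketch of that loop; the network architectures, loss weights, epsilon bound, and drop probability are illustrative assumptions, not the exact configuration used in this research.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Produces a bounded perturbation that is added to the clean image."""
    def __init__(self, eps=16 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        return torch.clamp(x + self.eps * self.net(x), 0, 1)

def random_perturbation_dropping(x_adv, x_clean, drop_prob=0.1):
    """Randomly revert perturbed pixels to clean values to diversify training."""
    keep = (torch.rand_like(x_adv) > drop_prob).float()
    return keep * x_adv + (1 - keep) * x_clean

gen = Generator()
label_disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
feat_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Flatten())
feat_disc = nn.Linear(8 * 32 * 32, 1)
for p in feat_net.parameters():
    p.requires_grad_(False)  # frozen extractor stands in for cross-model-consistent features

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(list(label_disc.parameters()) + list(feat_disc.parameters()), lr=1e-4)

# Toy stand-in for batches of (source images, target-class images).
loader = [(torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32))]
ones, zeros = torch.ones(4, 1), torch.zeros(4, 1)

for x_clean, x_target in loader:
    x_adv = random_perturbation_dropping(gen(x_clean), x_clean)
    # Discriminator step: target-class images are "real", adversarial examples are "fake".
    d_loss = (bce(label_disc(x_target), ones) + bce(label_disc(x_adv.detach()), zeros)
              + bce(feat_disc(feat_net(x_target)), ones)
              + bce(feat_disc(feat_net(x_adv.detach())), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: make both discriminators judge adversarial examples as "real".
    g_loss = bce(label_disc(x_adv), ones) + bce(feat_disc(feat_net(x_adv)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```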


Fairness-aware Adversarial Network Pruning
Network pruning aims to compress models while minimizing the loss in accuracy. With the increasing focus on bias in AI systems, the tendency of traditional network pruning methods to inherit or even magnify bias has raised a new perspective: fairness-aware network pruning. Straightforward pruning-plus-debiasing methods and recent designs that monitor disparities across demographic attributes during pruning have endeavored to enhance fairness in pruning. However, neither a simple assembly of the two tasks nor specifically designed pruning strategies can achieve the optimal trade-off among pruning ratio, accuracy, and fairness. This research proposes an end-to-end learnable framework for fairness-aware network pruning, which optimizes the pruning and debiasing tasks jointly through adversarial training against the final evaluation metrics: accuracy for pruning, and disparate impact and equalized odds for fairness. In other words, our fairness-aware adversarial pruning method learns to prune without any handcrafted rules, so it can flexibly adapt to various network structures. Extensive experiments demonstrate the generalization capacity of our approach, as well as its superior performance on pruning and debiasing simultaneously. Notably, the proposed method preserves state-of-the-art pruning performance while improving fairness by around 50% compared to traditional pruning methods.
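To make the joint-optimization idea concrete, here is a minimal sketch in PyTorch of training a learnable soft pruning mask against a task loss, a sparsity penalty, and a differentiable fairness surrogate. The mask parameterization, the demographic-parity surrogate, and the loss weights are illustrative assumptions, not the framework's actual design.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a learnable soft pruning mask."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.mask_logits = nn.Parameter(torch.zeros(d_out, d_in))

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits)  # soft mask in (0, 1)
        return x @ (self.weight * mask).t()

def demographic_parity_gap(scores, groups):
    """Differentiable surrogate: gap between the groups' mean positive scores."""
    p = torch.sigmoid(scores)
    return (p[groups == 0].mean() - p[groups == 1].mean()).abs()

model = nn.Sequential(MaskedLinear(10, 32), nn.ReLU(), MaskedLinear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Toy data: features, binary labels, and a binary demographic attribute.
x, y = torch.randn(256, 10), torch.randint(0, 2, (256, 1)).float()
g = torch.randint(0, 2, (256,))

for _ in range(100):
    logits = model(x)
    task = nn.functional.binary_cross_entropy_with_logits(logits, y)
    sparsity = torch.stack([torch.sigmoid(m.mask_logits).mean()
                            for m in model if isinstance(m, MaskedLinear)]).mean()
    fairness = demographic_parity_gap(logits.squeeze(1), g)
    loss = task + 1.0 * sparsity + 1.0 * fairness  # illustrative weights
    opt.zero_grad(); loss.backward(); opt.step()
```

After training, weights whose mask values fall below a threshold would be pruned away, so the sparsity pattern itself is shaped by the fairness objective rather than by handcrafted rules.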
Investigating Code Generation Performance of ChatGPT
The recent advancements in large language models and generative models are enabling innovative ways of performing tasks like programming, debugging, and testing. Our research presents a scalable, crowdsourcing data-driven framework to investigate the code generation performance of generative large language models, focusing on ChatGPT to reveal insights and patterns in code generation. We propose a hybrid keyword expansion method that filters relevant social media posts on Twitter and Reddit using topic modeling and expert knowledge. Our data analytics show that ChatGPT has been used in more than 10 programming languages for a diverse range of tasks such as code debugging, interview preparation, and academic assignment solving. Surprisingly, our analysis shows that fear is the dominant emotion associated with ChatGPT's code generation, overshadowing happiness, anger, surprise, and sadness. We also identify many ethical issues in the generated code; in certain instances, ChatGPT has produced code that exhibits biases related to race, gender, or other demographic characteristics. Furthermore, we construct a dataset of ChatGPT prompts and corresponding code by analyzing screenshots of ChatGPT code generation shared on social media. This dataset enables us to evaluate the quality of the generated code, and we will make it publicly available soon.
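As a rough illustration of the hybrid keyword expansion idea, the sketch below uses scikit-learn's LDA to surface candidate terms from seed-matched posts and then keeps only expert-approved ones. The seed terms, toy posts, and whitelist are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for Twitter/Reddit posts retrieved with the seed keyword(s).
posts = [
    "chatgpt wrote my python debugging script",
    "used chatgpt for leetcode interview prep in java",
    "chatgpt generated sql for my homework assignment",
]
seed = {"chatgpt"}
matched = [p for p in posts if seed & set(p.split())]

# Topic modeling step: surface candidate expansion terms from matched posts.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(matched)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
candidates = set()
for topic in lda.components_:
    candidates.update(vocab[i] for i in topic.argsort()[-5:])  # top terms per topic

# Expert-knowledge step: keep only terms a human curator approved.
expert_whitelist = {"debugging", "interview", "homework", "sql", "java", "python"}
expanded = sorted(seed | (candidates & expert_whitelist))
print(expanded)  # expanded keyword set used to filter further posts
```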


Adversarial Attacking and Improving Gender Fairness in Image Search
Adversarial attacks threaten the safety of AI models, but such attacks can also be used to examine and evaluate the fairness, robustness, trustworthiness, and security of AI systems. In our recently accepted AAAI'22 paper, we proposed adversarial attack queries composed of professions and countries (e.g., "CEO United States") to investigate whether gender bias is thoroughly mitigated by AI-based image search engines. Our experiments on Google, Baidu, Naver, and Yandex Image Search showed that the proposed attack could effectively trigger high levels of gender bias in image search results. To defend against such attacks and mitigate gender bias, we designed and implemented three novel re-ranking algorithms -- the epsilon-greedy algorithm, the relevance-aware swapping algorithm, and the fairness-greedy algorithm -- to re-rank returned images for given image queries. This work was selected as an oral presentation and featured by AAAS EurekAlert! and ACM Tech News.
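To give a flavor of the first of the three algorithms, here is a minimal sketch of epsilon-greedy re-ranking: with probability epsilon, the next slot is filled by a random remaining image instead of the most relevant one, which breaks up homogeneous (e.g., single-gender) runs in the results. The parameter values are illustrative, and the other two algorithms are not shown.

```python
import random

def epsilon_greedy_rerank(images, epsilon=0.3, seed=42):
    """images: list of (image_id, relevance) pairs; returns a re-ranked list."""
    rng = random.Random(seed)
    remaining = sorted(images, key=lambda x: -x[1])  # by relevance, descending
    reranked = []
    while remaining:
        if rng.random() < epsilon:
            pick = rng.randrange(len(remaining))  # exploration: random image
        else:
            pick = 0                              # exploitation: most relevant image
        reranked.append(remaining.pop(pick))
    return reranked

results = [("img_a", 0.9), ("img_b", 0.8), ("img_c", 0.7), ("img_d", 0.6)]
print([i for i, _ in epsilon_greedy_rerank(results)])
```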
Adversarial Auditing Facial Recognition Systems
Dr. Feng is leading, as PI, a research project funded by Microsoft Azure ($20,000) and the UW Strategic Research Fund ($4,000) -- "Investigating Fairness and Trustworthiness of AI Facial Recognition Systems by Adversarial Attacks" -- where we explore the following research questions: (i) Do state-of-the-art facial recognition systems exhibit brittleness and embedded biases when inferring demographic features from original face images? (ii) Which adversarial attacks (e.g., pixel perturbation, rotation, semantic interference, and content editing) are more effective in confusing AI facial recognition systems? (iii) Do AI facial recognition systems perform differently on the same adversarial attacks across different demographic groups? (iv) How can we improve the robustness of AI facial recognition systems and their capability to defend against such adversarial attacks? Working with my postdoc mentor, Prof. Chirag Shah, we are now writing an NSF Expeditions in Computing proposal for Trustworthy AI in Information Access Systems.


Towards Fairness-Aware Ranking by Defining Latent Groups Using Inferred Features
Group fairness in search and recommendation has drawn increasing attention in recent years. This project explores how to define latent groups, which cannot be determined by self-contained features but must be inferred from external data sources, for fairness-aware ranking. In particular, taking the Semantic Scholar dataset released in the TREC 2020 Fairness Ranking Track as a case study, we infer and extract multiple fairness-related dimensions of author identity, including gender and location, to construct groups. Furthermore, we propose a fairness-aware re-ranking algorithm incorporating both the weighted relevance and the diversity of returned items for given queries, as sketched below. Our experimental results demonstrate that different combinations of relative weights assigned to relevance, gender, and location groups perform as expected.
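The following is a minimal greedy sketch of trading off relevance against group diversity when re-ranking; the scoring weights and the decaying novelty bonus are illustrative assumptions rather than the algorithm's exact formulation.

```python
def rerank(items, w_rel=0.7, w_div=0.3, k=10):
    """items: dicts with 'relevance' and an inferred 'group' (e.g., gender/location)."""
    selected, seen_groups = [], {}
    candidates = list(items)
    while candidates and len(selected) < k:
        def score(it):
            # Diversity bonus decays with how often the group already appears.
            novelty = 1.0 / (1 + seen_groups.get(it["group"], 0))
            return w_rel * it["relevance"] + w_div * novelty
        best = max(candidates, key=score)
        candidates.remove(best)
        seen_groups[best["group"]] = seen_groups.get(best["group"], 0) + 1
        selected.append(best)
    return selected

docs = [{"id": i, "relevance": r, "group": g}
        for i, (r, g) in enumerate([(0.9, "m"), (0.85, "m"), (0.8, "f"), (0.7, "f")])]
print([d["id"] for d in rerank(docs, k=4)])  # a lower-relevance "f" doc moves up
```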
ExpScore: Learning Metrics for Recommendation Explanation
Many information access and machine learning systems, including recommender systems, lack transparency and accountability. High-quality recommendation explanations are of great significance for enhancing the transparency and interpretability of such systems. However, evaluating the quality of recommendation explanations remains challenging due to the lack of human-annotated data and benchmarks. In this project, we present a large explanation dataset named RecoExp, which contains thousands of crowdsourced ratings of perceived quality in explaining recommendations. To measure explainability in a comprehensive and interpretable manner, we propose ExpScore, a novel machine learning-based metric that incorporates the definition of explainability from various perspectives (e.g., relevance, readability, subjectivity, and sentiment polarity). Experiments demonstrate that ExpScore not only vastly outperforms existing metrics but also remains explainable itself. These resources and our findings can serve as a public good for scholars as well as recommender system users.
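As a rough illustration of a learned, feature-based metric of this kind, the sketch below extracts simple stand-in features for relevance, readability, subjectivity, and sentiment, and fits a small regressor on toy crowdsourced ratings. The features, data, and model are simplified assumptions, not the actual ExpScore design.

```python
import numpy as np
from sklearn.linear_model import Ridge

def features(explanation, item_title):
    words = explanation.split()
    overlap = len(set(words) & set(item_title.split()))                 # crude relevance
    avg_word_len = np.mean([len(w) for w in words])                     # crude readability
    first_person = sum(w.lower() in {"i", "my", "we"} for w in words)   # crude subjectivity
    positive = sum(w.lower() in {"great", "love", "good"} for w in words)  # crude sentiment
    return [overlap, avg_word_len, first_person, positive]

# Toy training triples: (explanation, item title, crowdsourced rating in [1, 5]).
data = [
    ("great camera and battery", "phone with great camera", 4.5),
    ("i love it", "wireless earbuds", 2.0),
    ("good sound quality for the price", "wireless earbuds", 4.0),
]
X = np.array([features(e, t) for e, t, _ in data])
y = np.array([r for _, _, r in data])
expscore = Ridge().fit(X, y)  # learned mapping from features to perceived quality
print(expscore.predict([features("great battery life", "phone with great camera")]))
```

Because the metric is a transparent model over named features, its own predictions stay interpretable, which mirrors the "keeps itself explainable" goal.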


Towards Generating Robust, Fair, and Emotion-Aware Explanations for Recommender Systems
Recommender systems often suffer from a lack of fairness and transparency. Providing robust and unbiased explanations for recommendations has been drawing more and more attention, as it can help address these issues and improve the trustworthiness and informativeness of recommender systems. Current explanation generation models have been found to exaggerate certain emotions without accurately capturing the underlying tone or meaning. In this project, we propose a novel method based on a multi-head transformer, called the Emotion-aware Transformer for Explainable Recommendation (EmoTER), to generate more robust, fair, and emotion-enhanced explanations. To measure the linguistic quality and emotion fairness of the generated explanations, we adopt both automatic text metrics and human perceptions for evaluation. Experiments on three widely used benchmark datasets with multiple evaluation metrics demonstrate that EmoTER consistently outperforms existing state-of-the-art explanation generation models in terms of text quality, explainability, and fairness with respect to emotion distribution. An implementation of EmoTER will be released as an open-source toolkit to support further research.
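One simple way to quantify emotion fairness of this kind is to compare the emotion-label distribution of generated explanations against that of the ground-truth explanations. The sketch below does this with a toy lexicon-based emotion tagger and KL divergence; both the tagger and the distance choice are illustrative assumptions, not EmoTER's actual evaluation protocol.

```python
from collections import Counter
import math

LEXICON = {"love": "joy", "great": "joy", "terrible": "anger",
           "disappointing": "sadness", "awful": "anger"}

def emotion_distribution(texts):
    counts = Counter(LEXICON[w] for t in texts for w in t.lower().split() if w in LEXICON)
    total = sum(counts.values()) or 1
    return {e: counts.get(e, 0) / total for e in ("joy", "anger", "sadness")}

def kl_divergence(p, q, eps=1e-9):
    return sum(p[e] * math.log((p[e] + eps) / (q[e] + eps)) for e in p)

generated = ["great sound, love the bass", "great battery"]
reference = ["love the bass", "terrible fit", "disappointing battery"]
# Large divergence indicates the generator exaggerates some emotions.
print(kl_divergence(emotion_distribution(reference), emotion_distribution(generated)))
```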
SenCAPTCHA: A Mobile-First CAPTCHA Using Orientation Sensors
With the increasing amount of time spent on mobile devices, it is necessary to design mobile-friendly software that enhances security and preserves privacy. To fight against malicious bots on mobile, we designed and developed SenCAPTCHA, a mobile-first CAPTCHA that leverages the device's orientation sensors to allow for easy completion on devices with small screens (e.g., smartphones, smartwatches). SenCAPTCHA takes advantage of the fact that detecting animal facial keypoints in mutated images is an AI-hard problem: it shows users an image of an animal and asks them to tilt their device to guide a red ball into the center of that animal's eye. A demo and the source code of SenCAPTCHA are available at www.sencaptcha.org. We described the design of SenCAPTCHA and demonstrated that it is resilient to various machine learning based attacks. We also conducted two IRB-approved usability studies of SenCAPTCHA involving a total of 472 mobile device users recruited from Amazon Mechanical Turk; our results showed that SenCAPTCHA was viewed as a "fun" CAPTCHA and that half of the participants preferred it over other existing CAPTCHAs. This work was awarded the Best Presentation for the Security, Privacy, and Acceptance Track at ACM UbiComp'2020, and was nominated for the Conference Best Presentation, Audience Award, and Judges Award.
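The core interaction loop can be sketched in a few lines: orientation readings steer the ball toward the animal-eye keypoint, and the challenge passes when the ball reaches a small radius around it. The sensor values, gain, and threshold below are illustrative stand-ins, not the deployed implementation.

```python
import math

def update_ball(ball, pitch, roll, gain=2.0):
    """Map device orientation angles (radians) to a 2D ball displacement."""
    return (ball[0] + gain * math.sin(roll), ball[1] + gain * math.sin(pitch))

def solved(ball, eye, radius=5.0):
    """Challenge passes when the ball is within `radius` of the eye keypoint."""
    return math.dist(ball, eye) <= radius

ball, eye = (0.0, 0.0), (20.0, 10.0)
readings = [(0.15, 0.3)] * 40  # stand-in for a stream of (pitch, roll) samples
for pitch, roll in readings:
    ball = update_ball(ball, pitch, roll)
    if solved(ball, eye):
        print("CAPTCHA passed")
        break
```

Because the eye location comes from an AI-hard keypoint-detection problem on a mutated image, a bot that can fake sensor input still cannot easily compute where to steer.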

Social Media Data Analytics, Mining, and Modeling
Real-time Large-scale Social Media Auditing and Monitoring Systems
Crowdsourced data can deliver valuable insights into how people perceive and react to the world. Relying on crowdsourced social media data, we were the first to conduct a systematic, large-scale study of petitions for new emojis and to develop an interactive real-time requested-emoji tracking system. We collected more than thirty million related tweets in one year and examined patterns of new emoji petitions by visualizing spatiotemporal distributions, summarizing advocacy behaviors, and exploring the factors that inspire such requests. We also studied the equity, diversity, and fairness issues arising from unreleased but expected emojis, and highlighted the societal significance of new emojis in areas such as business promotion and violence control. We further proposed time-continuity-sensitive ranking algorithms to identify the most desired emojis, and implemented a web-based real-time requested-emoji tracking system - www.call4emoji.org - providing interactive query services such as ranking emojis under different policies and filtering requested emojis by keywords and date ranges. This work has been covered internationally by news outlets including the Financial Times, Business Insider, and Yahoo! Finance. As an extension of this work, my proposal titled "Fairness and Transparency in Social Media Listening: Evidence in Emoji Requests" was awarded a 2020 Kaggle Open Data Research Grant (one of 19 winners globally).
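As a rough sketch of what "time-continuity-sensitive" ranking can mean, the example below gives each request a recency-decayed vote and rewards emojis requested across many distinct days over single bursts. The decay rate and continuity bonus are illustrative assumptions, not the deployed algorithms.

```python
import math
from collections import defaultdict

def rank_requested_emojis(requests, now, half_life_days=30.0):
    """requests: iterable of (emoji, day_index); now: current day index."""
    score, days_seen = defaultdict(float), defaultdict(set)
    for emoji, day in requests:
        score[emoji] += 0.5 ** ((now - day) / half_life_days)  # recency-decayed vote
        days_seen[emoji].add(day)
    for emoji in score:
        score[emoji] *= math.log1p(len(days_seen[emoji]))      # continuity bonus
    return sorted(score, key=score.get, reverse=True)

# Sustained interest (spread over months) outranks a one-day burst of requests.
reqs = [("bubble_tea", 1), ("bubble_tea", 40), ("bubble_tea", 80),
        ("mecha", 80), ("mecha", 80), ("mecha", 80)]
print(rank_requested_emojis(reqs, now=90))
```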


Unifying Telescope and Microscope: A Multi-lens Framework with Open Data for Modeling Emerging Events
The benefits of open data, such as accessibility and transparency, have motivated and enabled a large number of research studies and applications in both academia and industry. However, each open dataset offers only a single perspective, and its potential inherent limitations (e.g., demographic biases) may lead to poor decisions and misjudgments. This project discusses how to create and use multiple digital lenses empowered by open data, including census data (macro lens), search logs (meso lens), and social data (micro lens), to investigate general real-world events. To reveal the unique angles and perspectives brought by each open lens, we summarize and compare the underpinning open data along eleven dimensions, such as utility, data volume, dynamic variability, and demographic fairness. Then, we propose an easy-to-use and generalized open data-driven framework, which automatically retrieves multi-source data, extracts features, and trains machine learning models for an event specified by answering the what, when, and where questions. Requiring little manual effort, the framework's generalization and automation capabilities enable rapid investigation of general events and phenomena, such as disasters, sports events, and political activities. We also conduct two case studies, on the COVID-19 pandemic and the Great American Eclipse, to demonstrate its feasibility and effectiveness at different time granularities.
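The what/when/where interface can be pictured as a small event specification that drives each lens. The sketch below uses canned numbers where real lens fetchers would query census portals, search-log APIs, and social media APIs; all names and values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class EventSpec:
    what: str    # e.g., "solar eclipse"
    when: tuple  # (start_date, end_date)
    where: str   # e.g., "US"

# Stand-in lens fetchers; real ones would retrieve open data for the spec.
def census_lens(spec):  return {"population_density": 36.0}                    # macro
def search_lens(spec):  return {"query_volume": 1200.0}                        # meso
def social_lens(spec):  return {"tweet_count": 5400.0, "mean_sentiment": 0.3}  # micro

def build_features(spec):
    feats = {}
    for lens in (census_lens, search_lens, social_lens):
        feats.update(lens(spec))  # each lens contributes its own perspective
    return feats

eclipse = EventSpec(what="solar eclipse", when=("2017-08-20", "2017-08-23"), where="US")
print(build_features(eclipse))  # features would then feed a downstream ML model
```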
Analyzing the Emerging Security and Privacy Issues on Social Media
Real-time social media auditing and monitoring can serve as a crowdsourcing tool to track and analyze emerging security and privacy issues reported and discussed by people online. I am currently leading a research study that leverages social media listening and auditing to investigate self-reported phishing attacks. Specifically, we filter and sample real-time phishing-related social media postings and conduct data analytics to reveal potential patterns. We aim to figure out how emerging phishing attacks correlate with trending and viral topics on social media, such as the COVID-19 vaccine and cryptocurrency, and we will then train machine learning models to predict potential phishing attacks before they occur. To mitigate potential social media data bias, we incorporate the hourly/daily aggregated search logs of phishing-related keywords provided by Google Trends when modeling and analyzing phishing attacks. Dr. Feng is collaborating on this project with Prof. Scott Ruoti of the Usable Security Empirical Research Lab (USER Lab) at the University of Tennessee and Prof. Rick Wash of the Behavior Information and Technology Lab (BITLab) at Michigan State University.
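One simple way to combine the two signals is to normalize the hourly social posting counts and a Google Trends-style interest series onto a common scale and blend them before modeling. The sketch below is a minimal illustration with made-up numbers and an assumed equal blending weight.

```python
import numpy as np

post_counts = np.array([12, 15, 40, 90, 30, 18], dtype=float)      # hourly phishing-related posts
trends_interest = np.array([20, 22, 55, 80, 35, 25], dtype=float)  # search interest, 0-100 scale

def znorm(x):
    """Standardize a series so signals with different units are comparable."""
    return (x - x.mean()) / (x.std() + 1e-9)

# Blending search logs with social counts dampens platform-specific biases.
blended = 0.5 * znorm(post_counts) + 0.5 * znorm(trends_interest)
print(np.round(blended, 2))  # joint signal fed to a predictive model as a feature
```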


Investigating Smart Transportation and Human Mobility Through the Lens of Social Media
Recently, shared dockless electric scooters (e-scooters) have emerged as a daily alternative to driving for short-distance commuters in large cities, thanks to their affordability, easy accessibility via an app, and zero emissions. Meanwhile, e-scooters bring challenges to city management, such as traffic rules, public safety, parking regulations, and liability issues. In this project, we collected and investigated 5.8 million scooter-tagged tweets and 144,197 images, generated by 2.7 million users from October 2018 to March 2020, to take a closer look at shared e-scooters through crowdsourced data analytics. We profiled e-scooter usage from spatial-temporal perspectives, explored different stakeholders (i.e., riders, gig workers, and ridesharing companies), examined operation patterns (e.g., injury types and parking behaviors), and conducted sentiment analysis. We also confirmed a substantial gender gap among shared e-scooter riders, with 34.86% identified as female and 65.14% as male. To the best of our knowledge, this project is the first large-scale systematic study of shared e-scooters using big social data. In another project, we took the opportunity of the 2017 Great American Eclipse to look into its potential social, emotional, and human movement impacts at the national level through more than five million English eclipse-mentioning tweets.
EmojiCloud: an Open-source Python-based Tool for Emoji Cloud Visualization
This project proposes EmojiCloud, an open-source Python-based emoji cloud visualization tool, to provide a quick and straightforward understanding of emojis from the perspective of frequency and importance. EmojiCloud is flexible enough to support diverse drawing shapes, such as rectangles, ellipses, and image-masked canvases. We also follow inclusive and personalized design principles to cover the unique emoji designs of seven emoji vendors (e.g., Twitter, Apple, and Windows) and allow users to customize plotted emojis and background colors. We hope EmojiCloud can benefit the whole emoji community through its flexibility, inclusiveness, and customizability. Please pip install EmojiCloud and give it a try; a usage sketch follows.
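The snippet below is a hypothetical usage sketch only: the import, function name, and parameters are illustrative guesses, not the package's confirmed API, so please consult the linked tutorial for the real interface.

```python
# pip install EmojiCloud
from EmojiCloud import plot_emoji_cloud  # hypothetical import path

# Emoji weights reflecting frequency/importance in your data.
emoji_weights = {"😀": 20, "🔥": 15, "🎉": 10, "🚀": 5}

# Hypothetical parameters: canvas shape, vendor design, and output path.
plot_emoji_cloud(emoji_weights, canvas="ellipse", vendor="Twitter",
                 save_path="cloud.png")
```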
Tutorial | Source Code | Paper | Slides | Online Service (available soon)
AI in Cross-domain Applications

DeepWelding: a Deep Learning Enhanced Approach to GTAW Using Multi-source Sensing Images
Deep learning has great potential to reshape manufacturing industries. In this project, we present DeepWelding, a novel framework that applies deep learning techniques to improve gas tungsten arc welding (GTAW) process monitoring and penetration detection using multi-source sensing images. The framework is capable of analyzing multiple types of optical sensing images synchronously and consists of three consecutive deep-learning-enhanced phases: image preprocessing, image selection, and weld penetration classification. Specifically, we adopted generative adversarial networks (pix2pix) for image denoising and a classic convolutional neural network (AlexNet) for image selection, and both delivered satisfactory performance. However, five individual neural networks with heterogeneous architectures demonstrated inconsistent generalization capabilities in the classification phase when holding out multi-source images generated with specific experiment settings. Therefore, we designed two ensemble methods that combine multiple neural networks to improve the model's performance on unseen data collected from different experiment settings. We also found that the quality of model predictions was heavily influenced by the data collection environment. We hope these findings are beneficial for the broad intelligent welding community.
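To illustrate the ensembling idea in the classification phase, the sketch below soft-votes by averaging the softmax outputs of several heterogeneous classifiers, which tends to stabilize predictions on images from unseen experiment settings. The toy architectures and three-class setup are illustrative stand-ins for the five networks used in the study.

```python
import torch
import torch.nn as nn

def make_cnn(width):
    """Toy classifier; `width` varies to mimic heterogeneous architectures."""
    return nn.Sequential(
        nn.Conv2d(1, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, 3))  # e.g., 3 penetration classes

models = [make_cnn(w) for w in (8, 16, 32)]  # heterogeneous ensemble members

def ensemble_predict(x):
    """Soft voting: average class probabilities across all members."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)

batch = torch.rand(4, 1, 64, 64)  # stand-in multi-source sensing images
print(ensemble_predict(batch))
```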
SocialCattle: IoT-based Mastitis Detection and Control through Social Cattle Behavior Sensing in Smart Farms
Effective and efficient animal disease detection and control have drawn increasing attention in smart farming in recent years. It is crucial to explore how to harvest data and enable data-driven decision making for the rapid diagnosis and early treatment of infectious diseases among herds. This project proposes an IoT-based animal social behavior sensing framework to model mastitis propagation and infer mastitis infection risks among dairy cows. To monitor cow social behaviors, we deploy portable GPS devices on cows to track their movement trajectories and contacts with each other. Based on the collected location data, we build directed, weighted cattle social behavior graphs by treating cows as vertices and their contacts as edges, assigning contact frequencies between cows as edge weights, and determining edge directions according to the spatial-temporal information of the contacts. Then, we propose a flexible probabilistic disease transmission model, which considers both direct contacts with infected cows and indirect contacts via environmental contamination, to estimate and forecast mastitis infection probabilities. Our model can answer two common questions in animal disease detection and control: 1) which cows should be given the highest priority for investigation to determine whether there are already infected cows on the farm, and 2) how to rank cows for further screening when only a tiny number of sick cows have been identified. Both theoretical and simulation-based analyses of in-the-field experiments (17 cows and more than 70 hours of data) demonstrate the proposed framework's effectiveness. In addition, somatic cell count (SCC) mastitis tests validate our predictions in real-world scenarios.
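The risk-inference idea can be sketched compactly: each cow's infection probability combines per-contact transmission from known-infected neighbors (weighted by contact frequency) with a background environmental-contamination term, and cows are ranked by that probability for screening. The graph, transmission rate, and environmental rate below are illustrative assumptions, not the calibrated model.

```python
# Directed, weighted contact graph: cow -> {neighbor: contact frequency}.
contacts = {
    "cow1": {"cow2": 5, "cow3": 1},
    "cow2": {"cow3": 3},
    "cow3": {},
}
infected = {"cow1"}  # cows already confirmed infected

def infection_risk(cow, beta=0.05, env=0.01):
    """P(infection) = 1 - P(no env. infection) * prod over contacts of P(no transmission)."""
    p_safe = 1.0 - env  # indirect exposure via environmental contamination
    for src in infected:
        freq = contacts.get(src, {}).get(cow, 0)
        p_safe *= (1.0 - beta) ** freq  # each contact is an independent transmission chance
    return 1.0 - p_safe

# Rank healthy cows by inferred risk to prioritize further screening.
ranked = sorted((c for c in contacts if c not in infected),
                key=infection_risk, reverse=True)
print([(c, round(infection_risk(c), 3)) for c in ranked])
```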
