Research in Responsible AI Lab
The Responsible AI Lab is committed to enhancing AI's fairness, explainability, accountability, transparency, ethics, security, and privacy through the design and development of novel algorithms and models, harnessing AI's potential as a powerful force for promoting social good. Moreover, our research examines and anticipates the societal impacts of both established and emerging AI techniques, with a particular emphasis on controversial applications such as face recognition, through big social data mining and deep learning. In addition, our lab engages in interdisciplinary efforts to tackle multifaceted real-world challenges spanning diverse domains, including education, biology, sustainable manufacturing, and smart agriculture. We envision a future where AI is thoughtfully and responsibly integrated into society, shaping a more equitable world. Below, selected research works are highlighted in Responsible AI, Generative AI, Applied AI, and Social Media Mining.
Responsible AI
PreciseDebias: An Automatic Prompt Engineering Approach for Generative AI to Mitigate Image Demographic Biases
This project addresses the critical issue of demographic biases in image-centric applications, such as image search engines and generative AI systems. These biases often perpetuate stereotypes and misrepresent diverse demographic groups. To tackle this problem, PreciseDebias introduces a comprehensive end-to-end framework designed to rectify demographic bias in image generation. PreciseDebias leverages fine-tuned Large Language Models (LLMs) in conjunction with text-to-image generative models to transform generic text prompts into demographically-informed prompts. The core innovation lies in the instruction-following LLM, which is meticulously designed to assess model biases and ensure balanced training. This approach allows the generation of images that accurately reflect specified demographic distributions, thus promoting diversity and fairness. Extensive experiments demonstrate PreciseDebias's effectiveness in mitigating biases related to both ethnicity and gender in generated images. The results showcase its robustness and ability to capture demographic intricacies, outperforming existing baseline methods. The generalization capabilities of PreciseDebias are further highlighted through diverse image generation across multiple professions and demographic attributes. To foster further research and reproducibility, all models and code associated with PreciseDebias will be made publicly accessible. Bias Mitigation LLM Prompt Engineering Image Generation
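A minimal sketch of the core idea, assuming a hand-specified target distribution (the actual PreciseDebias distributions come from its fine-tuned, instruction-following LLM, and the attribute names and prompt template below are illustrative):

```python
import random

# Hypothetical target distribution for a profession; in PreciseDebias the
# distribution-aware rewriting is done by a fine-tuned LLM, not a lookup table.
TARGET_DISTRIBUTION = {
    ("female", "East Asian"): 0.25,
    ("male", "East Asian"): 0.25,
    ("female", "Black"): 0.25,
    ("male", "Black"): 0.25,
}

def demographically_informed_prompt(generic_prompt: str) -> str:
    """Rewrite a generic prompt by sampling attributes from the target mix."""
    pairs, weights = zip(*TARGET_DISTRIBUTION.items())
    gender, ethnicity = random.choices(pairs, weights=weights, k=1)[0]
    return f"a photo of a {ethnicity} {gender} {generic_prompt}"

# Over many generations, the sampled attributes match the specified distribution.
print(demographically_informed_prompt("software engineer"))
```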
Privacy-Preserving Model Accuracy Estimation on Unlabeled Datasets Through Distribution-Aware Adversarial Perturbation
This work tackles a critical issue in deep learning: accurately estimating model performance on unlabeled datasets while ensuring data privacy. Traditional methods often require direct access to the entire test dataset, risking data leakage and model theft. DAAP introduces a novel approach that leverages a publicly available dataset as an intermediary to bridge the gap between the model and the test data, thus mitigating privacy concerns. DAAP employs distribution-aware adversarial perturbations to minimize distributional discrepancies between datasets, enabling precise estimation of model performance on unseen test data. Two specialized strategies are presented for white-box and black-box model contexts. The white-box strategy reduces output entropy disparities, while the black-box strategy manipulates distribution discriminators. By avoiding direct interaction with the test data, DAAP ensures both data and model privacy. Extensive evaluations on the CIFAR-10-C, CIFAR-100-C, and CelebA datasets demonstrate DAAP's effectiveness in accurately estimating performance while safeguarding privacy. This innovative framework enhances the reliability and integrity of model performance assessments in real-world applications. Privacy Adversarial Learning Model Accuracy Estimation
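To illustrate the white-box signal, here is a sketch of mean output entropy over a public intermediary set (the function name and loader interface are assumptions; DAAP additionally applies distribution-aware adversarial perturbations before measuring this quantity):

```python
import torch
import torch.nn.functional as F

def mean_prediction_entropy(model, loader, device="cpu"):
    """Average Shannon entropy of softmax outputs on a public dataset; in
    DAAP-style estimation, lower entropy on the (perturbed) public set is
    treated as a proxy for higher accuracy on the unseen test data."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, _ in loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            total += ent.sum().item()
            n += x.size(0)
    return total / n
```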
Towards Transferable Targeted Adversarial Examples
Transferability of adversarial examples is critical for black-box deep learning model attacks. While most existing studies focus on enhancing the transferability of untargeted adversarial attacks, few have studied how to generate transferable targeted adversarial examples that can mislead models into predicting a specific class. Moreover, existing transferable targeted adversarial attacks usually fail to sufficiently characterize the target class distribution and thus suffer from limited transferability. In this research, we propose the Transferable Targeted Adversarial Attack (TTAA), which captures the distribution information of the target class from both label-wise and feature-wise perspectives, to generate highly transferable targeted adversarial examples. To this end, we design a generative adversarial training framework consisting of a generator to produce targeted adversarial examples and feature-label dual discriminators to distinguish the generated adversarial examples from target class images. Specifically, we design the label discriminator to guide the adversarial examples to learn label-related distribution information about the target class. Meanwhile, we design a feature discriminator, which extracts feature-wise information with strong cross-model consistency, to enable the adversarial examples to learn transferable distribution information. Furthermore, we introduce random perturbation dropping, which augments the diversity of adversarial examples used during training, to further enhance transferability. Security Adversarial Attack Transferability
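A compact sketch of a TTAA-style generator objective under these dual discriminators (the module definitions, perturbation budget, and non-saturating GAN loss form are assumptions rather than the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def generator_loss(x, target_class, generator, label_disc, feature_disc,
                   eps=16 / 255):
    """One generator step: the generator emits a bounded perturbation, and the
    dual discriminators push the adversarial example toward the target class
    in both label space and feature space."""
    delta = eps * torch.tanh(generator(x))      # bounded perturbation
    x_adv = (x + delta).clamp(0, 1)
    # Label-wise: adversarial examples should be classified as the target class.
    label_loss = F.cross_entropy(label_disc(x_adv), target_class)
    # Feature-wise: fool the discriminator that separates generated examples
    # from real target-class images (non-saturating GAN loss).
    feat_logits = feature_disc(x_adv)
    feature_loss = F.binary_cross_entropy_with_logits(
        feat_logits, torch.ones_like(feat_logits))
    return label_loss + feature_loss
```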
Fairness-aware Adversarial Network Pruning
Network pruning aims to compress models while minimizing loss in accuracy. With the increasing focus on bias in AI systems, the tendency of traditional network pruning methods to inherit or even magnify bias has opened a new perspective: fairness-aware network pruning. Straightforward pruning-plus-debiasing methods and recent designs that monitor disparities of demographic attributes during pruning have endeavored to enhance fairness in pruning. However, neither simply assembling the two tasks nor specially designed pruning strategies achieves the optimal trade-off among pruning ratio, accuracy, and fairness. This research proposes an end-to-end learnable framework for fairness-aware network pruning, which optimizes the pruning and debiasing tasks jointly through adversarial training against the final evaluation metrics: accuracy for pruning, and disparate impact and equalized odds for fairness. In other words, our fairness-aware adversarial pruning method learns to prune without any handcrafted rules, so our approach can flexibly adapt to various network structures. Exhaustive experimentation demonstrates the generalization capacity of our approach, as well as superior performance on pruning and debiasing simultaneously. Notably, the proposed method preserves state-of-the-art pruning performance while improving fairness by around 50% compared to traditional pruning methods. Fairness Adversarial Attack Pruning
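For reference, the two fairness metrics the framework trains against can be computed as follows (a standard formulation; the binary group encoding is illustrative):

```python
import numpy as np

def disparate_impact(y_pred, group):
    """Ratio of positive-prediction rates between two groups
    (closer to 1 is fairer)."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in TPR and FPR across groups (closer to 0 is fairer)."""
    gaps = []
    for label in (0, 1):  # FPR when label == 0, TPR when label == 1
        mask = y_true == label
        r0 = y_pred[mask & (group == 0)].mean()
        r1 = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(r0 - r1))
    return max(gaps)
```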
Investigating Code Generation Performance of ChatGPT
The recent advancements in large language models and generative models are enabling innovative ways of performing tasks like programming, debugging, and testing. Our research presents a scalable, crowdsourced, data-driven framework to investigate the code generation performance of generative large language models. We focus on ChatGPT to reveal insights and patterns in code generation. We propose a hybrid keyword expansion method that filters relevant social media posts on Twitter and Reddit using topic modeling and expert knowledge. Our data analytics show that ChatGPT has been used in more than 10 programming languages for a diverse range of tasks such as code debugging, interview preparation, and academic assignment solving. Surprisingly, our analysis shows that fear is the dominant emotion associated with ChatGPT's code generation, overshadowing happiness, anger, surprise, and sadness. We also identify many ethical issues in the generated code; in certain instances, ChatGPT has produced code that showed biases related to race, gender, or other demographic characteristics. Furthermore, we construct a ChatGPT prompt and corresponding code dataset by analyzing screenshots of ChatGPT code generation shared on social media. This dataset enables us to evaluate the quality of the generated code, and we have made the dataset available to the public. ChatGPT Demographic Bias Code Generation
Adversarial Attacking and Improving Gender Fairness in Image Search
Adversarial attacks are threatening the safety of AI models, but such attacks can also be used to examine and evaluate the fairness, robustness, trustworthiness, and security of AI systems. In our recently accepted AAAI'22 paper, we proposed adversarial attack queries composed of professions and countries (e.g., "CEO United States") to investigate whether gender bias is thoroughly mitigated by AI-based image search engines. Our experiments on Google, Baidu, Naver, and Yandex Image Search showed that the proposed attack could effectively trigger high levels of gender bias in image search results. To defend against such attacks and mitigate gender bias, we designed and implemented three novel re-ranking algorithms -- the epsilon-greedy algorithm, the relevance-aware swapping algorithm, and the fairness-greedy algorithm -- to re-rank returned images for given image queries. This work was selected as an oral presentation and featured by AAAS EurekAlert! and ACM Tech News. Bias Mitigation Gender Fairness Adversarial Attack
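A minimal sketch of the epsilon-greedy re-ranking idea (the exact swap policy in the AAAI'22 paper may differ, and the epsilon value here is illustrative):

```python
import random

def epsilon_greedy_rerank(ranked_images, epsilon=0.3, seed=None):
    """With probability epsilon, swap each top-ranked image with a randomly
    chosen lower-ranked one, diluting bias concentrated at the top of the
    ranking while mostly preserving relevance order."""
    rng = random.Random(seed)
    images = list(ranked_images)
    for i in range(len(images)):
        if rng.random() < epsilon:
            j = rng.randrange(i, len(images))
            images[i], images[j] = images[j], images[i]
    return images
```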
Adversarial Auditing Facial Recognition Systems
Dr. Feng is leading a Microsoft Azure ($20,000) and UW Strategic Research Fund ($4,000) funded research project as a PI -- "Investigating Fairness and Trustworthiness of AI Facial Recognition Systems by Adversarial Attacks" -- which explores and answers the following research questions: (i) Do state-of-the-art facial recognition systems exhibit brittleness and embedded biases when inferring demographic features from original face images? (ii) Which adversarial attacks (e.g., pixel perturbation, rotation, semantic interference, and content editing) are more effective in confusing AI facial recognition systems? (iii) Will AI facial recognition systems perform differently on the same adversarial attacks across different demographic groups? (iv) How can the robustness and capability of AI facial recognition systems be improved to defend against such adversarial attacks? Working with his postdoc mentor, Prof. Chirag Shah, Dr. Feng is now writing an NSF Expeditions in Computing (Expeditions) proposal for Trustworthy AI in Information Access Systems. Fairness and Trustworthiness Facial Recognition Adversarial Auditing
Towards Fairness-Aware Ranking by Defining Latent Groups Using Inferred Features
Group fairness in search and recommendation has drawn increasing attention in recent years. This project explores how to define latent groups, which cannot be determined from self-contained features but must be inferred from external data sources, for fairness-aware ranking. In particular, taking the Semantic Scholar dataset released in the TREC 2020 Fairness Ranking Track as a case study, we infer and extract multiple fairness-related dimensions of author identity, including gender and location, to construct groups. Furthermore, we propose a fairness-aware re-ranking algorithm incorporating both the weighted relevance and the diversity of returned items for given queries. Our experimental results demonstrate that different combinations of relative weights assigned to relevance, gender, and location groups perform as expected. Group Fairness Fairness-aware Ranking
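The re-ranking objective can be sketched as a greedy selection that trades off relevance against the diversity gain of each item's inferred group (the weights, input format, and function name below are assumptions):

```python
def fairness_aware_rerank(candidates, w_rel=0.7, w_div=0.3, k=10):
    """Greedy re-ranking: each step picks the item maximizing a weighted sum
    of relevance and the diversity gain of adding its (inferred) group.
    candidates: list of (item, relevance, group) with relevance in [0, 1]."""
    selected, groups_seen = [], set()
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c):
            _, rel, grp = c
            gain = 1.0 if grp not in groups_seen else 0.0
            return w_rel * rel + w_div * gain
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best[0])
        groups_seen.add(best[2])
    return selected
```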
ExpScore: Learning Metrics for Recommendation Explanation
Many information access and machine learning systems, including recommender systems, lack transparency and accountability. High-quality recommendation explanations are of great significance in enhancing the transparency and interpretability of such systems. However, evaluating the quality of recommendation explanations is still challenging due to the lack of human-annotated data and benchmarks. In this project, we present a large explanation dataset named RecoExp, which contains thousands of crowdsourced ratings of perceived quality in explaining recommendations. To measure explainability in a comprehensive and interpretable manner, we propose ExpScore, a novel machine learning-based metric that incorporates the definition of explainability from various perspectives (e.g., relevance, readability, subjectivity, and sentiment polarity). Experiments demonstrate that ExpScore not only vastly outperforms existing metrics but also remains explainable itself. These resources and our findings can serve the public good for scholars as well as recommender system users. Explainability Recommender System
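To make the metric's ingredients concrete, here is a toy sketch of ExpScore-style features and their weighted combination (the real ExpScore learns its weights from the RecoExp ratings over much richer features; every feature definition and weight below is illustrative):

```python
import re

def explanation_features(expl: str, item_keywords: set) -> dict:
    """Toy stand-ins for ExpScore-style signals: keyword overlap for relevance,
    length for readability, and opinion words for subjectivity."""
    words = re.findall(r"[a-zA-Z']+", expl.lower())
    return {
        "relevance": len(set(words) & item_keywords) / max(len(item_keywords), 1),
        "readability": 1.0 / (1.0 + len(words) / 20),  # shorter ~ easier to read
        "subjectivity": sum(w in {"great", "terrible", "love", "hate"}
                            for w in words) / max(len(words), 1),
    }

def exp_score(features,
              weights={"relevance": 0.6, "readability": 0.3, "subjectivity": 0.1}):
    """Weighted combination; ExpScore learns such weights from human ratings."""
    return sum(weights[k] * v for k, v in features.items())
```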
SenCAPTCHA: A Mobile-First CAPTCHA Using Orientation Sensors
With the increasing amount of time spent on mobile devices, it is necessary to design mobile-friendly software to enhance security and preserve privacy. To fight malicious bots on mobile, we designed and developed SenCAPTCHA, a mobile-first CAPTCHA that leverages the device's orientation sensors to allow for easy completion on devices with small screen sizes (e.g., smartphones, smartwatches). SenCAPTCHA takes advantage of the fact that detecting animal facial keypoints in mutated images is an AI-hard problem. It works by showing users an image of an animal and asking them to tilt their devices to guide a red ball into the center of that animal's eye. A demo and the source code of SenCAPTCHA are available at www.sencaptcha.org. We described the design of SenCAPTCHA and demonstrated that it is resilient to various machine learning based attacks. We also conducted two IRB-approved usability studies of SenCAPTCHA involving a total of 472 mobile device users recruited from Amazon Mechanical Turk; our results showed that SenCAPTCHA was viewed as a "fun" CAPTCHA and that half of the participants preferred it to other existing CAPTCHAs. This work was awarded the Best Presentation for the Security, Privacy, and Acceptance Track at ACM UbiComp 2020, and was nominated for the Conference Best Presentation, Audience Award, and Judges Award. Usable Security Privacy Mobile Sensing
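A simplified sketch of the interaction loop, mapping orientation-sensor tilt to ball movement and checking the eye target (the gain, screen bounds, and success radius are assumptions, not SenCAPTCHA's actual parameters):

```python
def update_ball(ball_xy, tilt_xy, dt=0.016, gain=250.0, bounds=(320, 480)):
    """Move the ball proportionally to device tilt (roll/pitch in radians),
    clamped to the screen; one frame of the interaction loop."""
    x = min(max(ball_xy[0] + gain * tilt_xy[0] * dt, 0.0), bounds[0])
    y = min(max(ball_xy[1] + gain * tilt_xy[1] * dt, 0.0), bounds[1])
    return (x, y)

def solved(ball_xy, eye_xy, radius=12.0):
    """The CAPTCHA passes once the ball reaches the animal-eye target."""
    return ((ball_xy[0] - eye_xy[0]) ** 2 +
            (ball_xy[1] - eye_xy[1]) ** 2) <= radius ** 2
```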
Generative AI
GenFlowchart: Parsing and Understanding Flowchart Using Generative AI
This project introduces a novel framework aimed at enhancing the automated parsing and understanding of flowcharts. Flowcharts, widely used for system requirement analysis, preliminary planning, and detailed design, encapsulate logical flows and component-level information in an easily interpretable manner. However, the automated parsing of these diagrams is challenging due to their intricate logical structures and text-rich nature. GenFlowchart leverages cutting-edge technologies to address these challenges. It begins with a state-of-the-art segmentation model, the Segment Anything Model (SAM), to delineate various components and geometrical shapes within the flowchart. Optical Character Recognition (OCR) is then employed to extract text from each component, facilitating a deeper functional understanding. Finally, generative AI is utilized to integrate the segmented results and extracted text, reconstructing the flowchart's workflows. This framework is evaluated on multiple flowcharts and benchmarked against several baseline approaches, demonstrating its superior performance. GenFlowchart offers significant advancements in flowchart parsing and interpretation, providing a comprehensive tool for accurately understanding and processing flowcharts across diverse applications. The project underscores the potential of generative AI in enhancing the accuracy and efficiency of automated flowchart analysis. LLM Image Segmentation OCR Multimodal Learning
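A sketch of the three-stage pipeline using the public segment-anything and pytesseract APIs (the checkpoint path, prompt wording, and downstream generative model are assumptions):

```python
import cv2
import pytesseract
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

def parse_flowchart(image_path, checkpoint="sam_vit_h.pth"):
    """Segment shapes with SAM, OCR each region, then hand the pieces to a
    generative model to reconstruct the workflow."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image)
    components = []
    for m in masks:
        x, y, w, h = (int(v) for v in m["bbox"])
        text = pytesseract.image_to_string(image[y:y + h, x:x + w]).strip()
        components.append({"bbox": (x, y, w, h), "text": text})
    prompt = ("Reconstruct the workflow from these flowchart components:\n"
              + "\n".join(f"{c['bbox']}: {c['text']}" for c in components))
    return prompt  # would then be sent to a generative model such as GPT-4
```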
LitAI: Enhancing Multimodal Literature Understanding and Mining with Generative AI
This project aims to revolutionize the retrieval and comprehension of multimodal information from literature documents, which are crucial for scientific research and knowledge discovery. Traditional methods for extracting information from literature, particularly those involving text, tables, and figures in PDF formats, often face challenges due to the diverse and non-standardized presentation formats. LitAI addresses these challenges by integrating Optical Character Recognition (OCR) with generative AI tools. This integration facilitates the accurate extraction of text, tables, and figures from PDF documents. Specifically, LitAI employs the capabilities of generative AI, such as ChatGPT, to enhance the quality of text recognition, correct typographical errors, and improve the overall coherence of extracted text. For table parsing, LitAI uses prompt engineering techniques to handle complex nested structures, ensuring precise data extraction. Moreover, the framework leverages GPT-4 Vision to analyze and interpret figures within their contextual descriptions, providing a comprehensive understanding of visual data in literature. GPT-4 LLM Literature Mining Multimodal Learning
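A minimal sketch of the OCR-then-correct step (the prompt wording is an assumption, and the call to the generative model is left as a placeholder):

```python
import pytesseract
from pdf2image import convert_from_path

def extract_and_clean(pdf_path):
    """Render each PDF page, OCR it, then ask a generative model to fix
    typographical errors and restore coherence, as in the LitAI text stage."""
    pages = convert_from_path(pdf_path)
    raw_text = "\n".join(pytesseract.image_to_string(p) for p in pages)
    cleanup_prompt = (
        "Correct OCR errors and restore paragraph coherence without changing "
        "the meaning of the following text:\n\n" + raw_text
    )
    return cleanup_prompt  # sent to ChatGPT/GPT-4 in the LitAI pipeline
```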
S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document
This work presents an innovative framework designed to tackle the complexities of comprehending large-scale scientific software. Traditional methods face significant challenges due to the extensive codebases, diverse programming languages, and intricate documentation. S3LLM leverages generative AI, specifically large language models (LLMs), to enhance the analysis and understanding of scientific software. S3LLM integrates open-source LLaMA-2 models, which facilitate the conversion of natural language queries into domain-specific language (DSL) queries. This approach allows for efficient scanning and parsing of entire code repositories, including diverse metadata formats such as DOT, SQL, and custom formats. Additionally, S3LLM employs retrieval-augmented generation (RAG) and LangChain technologies to enable comprehensive document querying. The framework is demonstrated on the Energy Exascale Earth System Model (E3SM), showcasing its effectiveness in analyzing source code, metadata, and textual documents. By providing a user-friendly interface and enabling natural language interactions, S3LLM significantly reduces the need for extensive coding expertise, thereby making the process of understanding complex scientific software more efficient and accessible. LLM RAG DSL Large-Scale Scientific Software Understanding
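As a rough illustration of the retrieval-augmented querying step, here is a minimal stand-in using a TF-IDF retriever (S3LLM itself builds on LangChain and LLaMA-2; the retriever, prompt, and function name below are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_context(question, documents, k=3):
    """Fetch the k most similar documentation chunks and wrap them into a
    grounded prompt for the language model."""
    vec = TfidfVectorizer().fit(documents + [question])
    sims = cosine_similarity(vec.transform([question]),
                             vec.transform(documents))[0]
    top = sims.argsort()[::-1][:k]
    context = "\n\n".join(documents[i] for i in top)
    return (f"Answer using only this project documentation:\n{context}\n\n"
            f"Question: {question}")  # then sent to a LLaMA-2 model
```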
Towards Generating Robust, Fair, and Emotion-Aware Explanations for Recommender Systems
Recommender systems often suffer from a lack of fairness and transparency. Providing robust and unbiased explanations for recommendations has been drawing more and more attention, as it can help address these issues and improve the trustworthiness and informativeness of recommender systems. Current explanation generation models are found to exaggerate certain emotions without accurately capturing the underlying tone or meaning. In this project, we propose a novel method based on a multi-head transformer, called Emotion-aware Transformer for Explainable Recommendation (EmoTER), to generate more robust, fair, and emotion-enhanced explanations. To measure the linguistic quality and emotion fairness of the generated explanations, we adopt both automatic text metrics and human perceptions for evaluation. Experiments on three widely-used benchmark datasets with multiple evaluation metrics demonstrate that EmoTER consistently outperforms the existing state-of-the-art explanation generation models in terms of text quality, explainability, and fairness with respect to emotion distribution. The implementation of EmoTER will be released as an open-source toolkit to support further research. Explanation Generation Emotion-Aware Fairness Recommender System
Applied AI in Cross-domain Science
FractionNet: AI-Driven Insights in Transforming Handwritten Fraction Detection for Teacher Education
This work addresses the critical issue of inadequate fraction reasoning skills among elementary students and the challenges teachers face in understanding and teaching fractions. FractionNet is an innovative AI-based tool designed to assist teachers, students, and parents in analyzing children's handwritten fraction work. The system leverages a custom dataset derived from MNIST, employing YOLOv8 for object detection and Convolutional Neural Networks (CNNs) for understanding fractions. This research showcases FractionNet's ability to accurately detect and compute various handwritten fraction configurations through empirical experiments and visual examples. The framework aims to bridge the gap left by traditional Optical Character Recognition (OCR) techniques, which often struggle with the diversity and inconsistency of handwritten fractions. By integrating advanced AI methods, FractionNet provides a comprehensive solution for analyzing and understanding handwritten mathematical notations. Ultimately, FractionNet is envisioned as a valuable resource in teacher education, enhancing teachers' ability to assess and support students' fraction learning, and offering a novel approach to improving the comprehension of complex handwritten mathematical expressions. AI for Education Math Teaching Fraction Recognition
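A simplified sketch of how detector output can be assembled into a fraction value, assuming labels for digits and the fraction bar with image coordinates (y grows downward, so the numerator sits at smaller y; the label scheme is an assumption):

```python
from fractions import Fraction

def assemble_fraction(detections):
    """Turn YOLO-style detections into a value: digits above the fraction bar
    form the numerator, digits below form the denominator.
    detections: list of (label, x_center, y_center)."""
    bar = next(d for d in detections if d[0] == "bar")
    digits = [d for d in detections if d[0] != "bar"]
    num = sorted((d for d in digits if d[2] < bar[2]), key=lambda d: d[1])
    den = sorted((d for d in digits if d[2] > bar[2]), key=lambda d: d[1])
    return Fraction(int("".join(d[0] for d in num)),
                    int("".join(d[0] for d in den)))

# Boxes for "1" above the bar and "2" below evaluate to Fraction(1, 2).
print(assemble_fraction([("1", 50, 10), ("bar", 50, 30), ("2", 50, 55)]))
```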
Generative AI for Cardiac Organoid Fluorescence Generation
hPSC-derived cardiac organoids are essential for modeling heart development and disease, yet traditional bright-field microscopic imaging falls short of providing cell type-specific information. Fluorescence microscopy, while informative, is limited by its labor-intensive and sample-specific nature. To overcome these limitations, this project proposes an innovative approach using conditional Generative Adversarial Networks (GANs) to colorize grayscale images of cardiac organoids, thereby providing comprehensive fluorescence data. By employing the Pix2Pix GAN model enhanced with the Convolutional Block Attention Module (CBAM), the framework focuses on critical features to achieve accurate and realistic colorization. This method not only bridges the gap left by traditional imaging techniques but also introduces a novel evaluation metric, the Weighted Patch Histogram, which captures spatially aware color histogram information for a more accurate assessment of the generated images. The integration of advanced AI techniques in this research signifies a substantial advancement in the field, promising to enhance the visualization and analysis of cardiac organoids. This, in turn, facilitates better understanding and potential breakthroughs in cardiac research and drug development. AI for Biology Cardiac Organoid GANs CBAM
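For concreteness, here is a standard PyTorch implementation of the CBAM block described above (the reduction ratio and kernel size are common defaults, not necessarily this project's settings):

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, pluggable into a Pix2Pix generator."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```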
DeepWelding: a Deep Learning Enhanced Approach to GTAW Using Multi-source Sensing Images
Deep learning has great potential to reshape manufacturing industries. In this paper, we present DeepWelding, a novel framework that applies deep learning techniques to improve gas tungsten arc welding (GTAW) process monitoring and penetration detection using multi-source sensing images. The framework is capable of analyzing multiple types of optical sensing images synchronously and consists of three consecutive deep learning-enhanced phases: image preprocessing, image selection, and weld penetration classification. Specifically, we adopted generative adversarial networks (pix2pix) for image denoising and classic convolutional neural networks (AlexNet) for image selection. Both pix2pix and AlexNet delivered satisfactory performance. However, five individual neural networks with heterogeneous architectures demonstrated inconsistent generalization capabilities in the classification phase when holding out multi-source images generated with specific experiment settings. Therefore, we designed two ensemble methods that combine multiple neural networks to improve the model's performance on unseen data collected from different experiment settings. We also found that the quality of model prediction was heavily influenced by the data stream collection environment. We hope these findings are beneficial for the broad intelligent welding community. AI for Manufacturing Smart Welding GANs
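One common form such an ensemble can take is soft voting over the heterogeneous classifiers (a sketch only; the paper's two ensemble methods may differ from simple probability averaging):

```python
import torch

def ensemble_predict(models, x):
    """Soft-voting ensemble: average the softmax outputs of several
    classifiers to stabilize penetration predictions on unseen settings."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.argmax(dim=1)
```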
SocialCattle: IoT-based Mastitis Detection and Control through Social Cattle Behavior Sensing in Smart Farms
Effective and efficient animal disease detection and control have drawn increasing attention in smart farming in recent years. It is crucial to explore how to harvest data and enable data-driven decision making for rapid diagnosis and early treatment of infectious diseases among herds. This paper proposes an IoT-based animal social behavior sensing framework to model mastitis propagation and infer mastitis infection risks among dairy cows. To monitor cow social behaviors, we deploy portable GPS devices on cows to track their movement trajectories and contacts with each other. Based on the collected location data, we build directed and weighted cattle social behavior graphs by treating cows as vertices and their contacts as edges, assigning contact frequencies between cows as edge weights, and determining edge directions according to spatial-temporal contact information. Then, we propose a flexible probabilistic disease transmission model, which considers both direct contacts with infected cows and indirect contacts via environmental contamination, to estimate and forecast mastitis infection probabilities. Our model can answer two common questions in animal disease detection and control: 1) which cows should be given the highest priority for investigation to determine whether there are already infected cows on the farm, and 2) how to rank cows for further screening when only a tiny number of sick cows have been identified. Both theoretical and simulation-based analytics of in-the-field experiments (17 cows and more than 70 hours of data) demonstrate the proposed framework's effectiveness. In addition, somatic cell count (SCC) mastitis tests validated our predictions in real-world scenarios. AI for Agriculture Smart Farm Precision Agriculture
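A simplified one-step form of such a transmission model, combining environmental risk with escape probabilities over weighted directed contacts (beta, env_risk, and the data layout are assumptions, not the paper's parameters):

```python
def infection_risk(prior, contacts, beta=0.1, env_risk=0.02):
    """One propagation step: a cow escapes infection only if it escapes it
    through every weighted contact and via the environment.
    contacts[i] is a list of (j, weight) edges directed toward cow i."""
    risk = {}
    for i, edges in contacts.items():
        p_escape = 1.0 - env_risk
        for j, w in edges:
            p_escape *= 1.0 - beta * w * prior[j]
        risk[i] = 1.0 - p_escape
    return risk

# Cows ranked by risk give the screening priorities discussed above.
prior = {"a": 0.9, "b": 0.0, "c": 0.0}
contacts = {"a": [], "b": [("a", 0.8)], "c": [("a", 0.1)]}
print(sorted(infection_risk(prior, contacts).items(), key=lambda kv: -kv[1]))
```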
Social Media Data Analytics, Mining, and Modeling
Real-time Large-scale Social Media Auditing and Monitoring Systems
Crowdsourced data can deliver valuable insights into how people perceive and react to the world. Relying on crowdsourced social media data, we were the first to conduct a systematic, large-scale study of petitions for new emojis and to develop an interactive real-time requested-emoji tracking system. We collected more than thirty million related tweets in one year and examined patterns of new emoji petitions by visualizing spatiotemporal distributions, summarizing advocacy behaviors, and exploring factors that inspire such requests. We also studied the equity, diversity, and fairness issues arising from unreleased but expected emojis, and highlighted the societal significance of new emojis, such as business promotion and violence control. We further proposed time-continuity sensitive ranking algorithms to identify the most desired emojis, and implemented a web-based real-time requested-emoji tracking system - www.call4emoji.org - providing interactive query services such as ranking emojis under different policies and filtering requested emojis by keywords and date ranges. This work has been covered internationally by news outlets including Financial Times, Business Insider, and Yahoo! Finance. As an extension of this work, the proposal titled "Fairness and Transparency in Social Media Listening: Evidence in Emoji Requests" was awarded a 2020 Kaggle Open Data Research Grant (one of 19 winners globally). Social Media Real-time Tracking Human-Computer Interaction
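One simple way to make a request ranking time-continuity sensitive is exponential recency decay (a sketch of the idea; the half-life and scoring details are assumptions, not the paper's algorithm):

```python
import math
from collections import defaultdict

def rank_requested_emojis(requests, now, half_life_days=30.0, top_k=10):
    """Recent petitions count more than old ones, so rankings track the
    current demand. requests: iterable of (emoji, timestamp_in_days)."""
    scores = defaultdict(float)
    for emoji, t in requests:
        scores[emoji] += math.exp(-math.log(2) * (now - t) / half_life_days)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```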
Unifying Telescope and Microscope: A Multi-lens Framework with Open Data for Modeling Emerging Events
The benefits of open data, such as accessibility and transparency, have motivated and enabled a large number of research studies and applications in both academia and industry. However, each open dataset offers only a single perspective, and its potential inherent limitations (e.g., demographic biases) may lead to poor decisions and misjudgments. This project discusses how to create and use multiple digital lenses empowered by open data, including census data (macro lens), search logs (meso lens), and social data (micro lens), to investigate general real-world events. To reveal the unique angles and perspectives brought by each open lens, we summarize and compare the underpinning open data along eleven dimensions, such as utility, data volume, dynamic variability, and demographic fairness. Then, we propose an easy-to-use and generalized open data-driven framework, which automatically retrieves multi-source data, extracts features, and trains machine learning models for the event specified by answering what, when, and where questions. Requiring little manual effort, the framework's generalization and automation capabilities enable instant investigation of general events and phenomena, such as disasters, sports events, and political activities. We also conduct two case studies, on the COVID-19 pandemic and the Great American Eclipse, to demonstrate its feasibility and effectiveness at different time granularities. Open Data Data Fusion Model Fusion Event Modeling Big Data
Analyzing the Emerging Security and Privacy Issues on Social Media
Real-time social media auditing and monitoring can serve as a crowdsourcing tool to track and analyze the emerging security and privacy issues reported and discussed by people online. Dr. Feng is leading a research study that leverages social media listening and auditing to investigate self-reported phishing attacks. Specifically, we filter and sample real-time phishing-related social media postings and conduct data analytics to reveal potential patterns. We aim to figure out how emerging phishing attacks correlate with trending and viral topics on social media, such as the COVID-19 vaccine and cryptocurrency, and then train machine learning models to predict potential phishing attacks before they occur. To mitigate potential social media data bias, we incorporate the hourly/daily aggregated search logs of phishing-related keywords provided by Google Trends when modeling and analyzing phishing attacks. Dr. Feng is collaborating with Prof. Scott Ruoti in the Usable Security Empirical Research Lab (USER Lab) at the University of Tennessee and Prof. Rick Wash in the Behavior Information and Technology Lab (BITLab) at Michigan State University on this project. Social Media Security Privacy
Investigating Smart Transportation and Human Mobility Through the Lens of Social Media
Recently, shared dockless electric scooters (e-scooters) have emerged as a daily alternative to driving for short-distance commuters in large cities due to their affordability, easy accessibility via an app, and zero emissions. Meanwhile, e-scooters come with challenges in city management, such as traffic rules, public safety, parking regulations, and liability issues. In this project, we collected and investigated 5.8 million scooter-tagged tweets and 144,197 images, generated by 2.7 million users from October 2018 to March 2020, to take a closer look at shared e-scooters via crowdsourced data analytics. We profiled e-scooter usage from spatial-temporal perspectives, explored different stakeholders (i.e., riders, gig workers, and ridesharing companies), examined operation patterns (e.g., injury types and parking behaviors), and conducted sentiment analysis. We also confirmed a substantial gender gap in shared e-scooter use, with 34.86% of riders identified as female and 65.14% as male. To the best of our knowledge, this project is the first large-scale systematic study of shared e-scooters using big social data. In another project, we took the opportunity of the 2017 Great American Eclipse to look into its potential social, emotional, and human movement impacts at the national level through more than five million English eclipse-mentioning tweets. Social Media Human Mobility Big Data
EmojiCloud: an Open-source Python-based Tool for Emoji Cloud Visualization
This project proposes EmojiCloud, an open-source Python-based emoji cloud visualization tool, to generate a quick and straightforward understanding of emojis from the perspective of frequency and importance. EmojiCloud is flexible enough to support diverse drawing shapes, such as rectangles, ellipses, and image-masked canvases. We also follow inclusive and personalized design principles to cover the unique emoji designs of seven emoji vendors (e.g., Twitter, Apple, and Windows) and allow users to customize plotted emojis and background colors. We hope EmojiCloud can benefit the whole emoji community thanks to its flexibility, inclusiveness, and customizability. Please pip install EmojiCloud and give it a try. Social Media Data Visualization Emoji