Social Media Comment Map
Project Overview
When people scroll social media, they often spend more time reading the comments than the original post.
Yet what we see in comment sections is rarely a neutral sample of public opinion. It is shaped by opaque ranking algorithms, engagement metrics, and visibility dynamics that amplify certain voices over others. As a result, readers may overestimate extreme views, misjudge the distribution of opinions, or misperceive social norms.
This project asks:
Can we make collective voices visible without flattening individual ones?
Instead of treating comments as engagement signals or raw text data, this approach treats them as a space of collective sensemaking, where representation itself shapes perception.
Why Comment Spaces Matter
Comment sections are a major site of social inference. People form impressions of what “most people think” based on what they can see, and that visibility is platform-mediated.
If ranking systems disproportionately surface outrage, slogan-like rhetoric, or high-engagement edge cases, readers may infer skewed social norms. This design problem is also a social cognition problem.
The Intervention
I built an interactive prototype that transforms a comment thread into a semantic map.
Each comment is embedded using a language model and projected into a two-dimensional space. Comments with similar meanings appear closer together, forming clusters that reflect recurring themes, framings, or narrative styles.
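To make "similar meanings appear closer together" concrete, here is a minimal sketch of how closeness between two embedded comments is commonly measured, using cosine similarity. The three-dimensional vectors and the example comments are made up for illustration; real embeddings come from the model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings (assumption for illustration).
toy_embeddings = {
    "This policy will hurt renters": [0.9, 0.1, 0.2],
    "Tenants are going to pay for this": [0.8, 0.2, 0.3],
    "Love the color palette in this photo": [0.1, 0.9, 0.1],
}


def most_similar(query, embeddings):
    """Return the other comment whose vector points most nearly the same way."""
    others = (text for text in embeddings if text != query)
    return max(
        others,
        key=lambda text: cosine_similarity(embeddings[query], embeddings[text]),
    )
```

With these toy vectors, the two housing comments score about 0.98 against each other, while the photo comment scores about 0.24 against the first, so `most_similar` pairs the two housing comments even though they share no keywords.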
Crucially, this map does not summarize away individual voices. Users can click any point to view:
- the full original comment
- number of likes
- timestamp
- cluster affiliation
The goal is not to replace comments with AI summaries, but to make their structure visible.
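One way to keep every individual comment reachable is to carry the original fields alongside each point's coordinates. The sketch below is hypothetical: the class name, field names, and panel layout are my own illustration, not the prototype's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommentPoint:
    """One dot on the map, with the original comment kept intact (illustrative schema)."""
    text: str
    likes: int
    timestamp: datetime
    cluster: int  # assumption: -1 marks a comment left outside every cluster
    x: float
    y: float


def detail_panel(point: CommentPoint) -> str:
    """Format the details shown when a point is clicked."""
    group = "unclustered" if point.cluster == -1 else f"cluster {point.cluster}"
    return "\n".join([
        point.text,
        f"Likes: {point.likes}",
        f"Posted: {point.timestamp:%Y-%m-%d %H:%M}",
        f"Group: {group}",
    ])
```

Because the full text travels with the point, the map never has to substitute a summary for what someone actually wrote.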
Demo
Open the interactive map in a new tab.
Method
This system uses a lightweight NLP and visualization pipeline:
- Data Collection
Comment threads are exported as CSV/Excel files (currently from Xiaohongshu and Reddit-style formats).
- Text Embedding
Sentence-level embeddings are generated using OpenAI embedding models.
- Dimensionality Reduction
UMAP projects high-dimensional embeddings into a 2D semantic space.
- Clustering
HDBSCAN identifies dense thematic clusters without predefining cluster counts.
- Interactive Visualization
Plotly renders an interactive scatterplot where each point represents a comment.
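The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the prototype's actual code: the column names (`text`, `likes`, `timestamp`), the model name `text-embedding-3-small`, the parameter values, and the choice to cluster on the 2D coordinates rather than the full embeddings are all mine. It needs the `openai`, `numpy`, `umap-learn`, `hdbscan`, and `plotly` packages; they are imported inside the functions so the CSV step can be tried on its own.

```python
import csv


def load_comments(handle):
    """Read comment rows from an open CSV file, skipping empty comments.

    Assumes hypothetical columns `text`, `likes`, `timestamp`.
    """
    rows = []
    for row in csv.DictReader(handle):
        text = (row.get("text") or "").strip()
        if text:
            rows.append({
                "text": text,
                "likes": int(row.get("likes") or 0),
                "timestamp": row.get("timestamp") or "",
            })
    return rows


def embed(texts, model="text-embedding-3-small"):
    """Embed each comment. The model name is an assumption, not the project's choice."""
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment

    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def project_and_cluster(embeddings, min_cluster_size=5, seed=42):
    """Project to 2D with UMAP, then cluster the 2D points with HDBSCAN."""
    import numpy as np
    import umap
    import hdbscan

    vectors = np.asarray(embeddings)
    coords = umap.UMAP(
        n_components=2, metric="cosine", random_state=seed
    ).fit_transform(vectors)
    # HDBSCAN labels points it cannot assign to a dense region as -1.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, labels


def build_map(rows, coords, labels):
    """Build a Plotly scatterplot with one point per comment."""
    import plotly.express as px

    data = {
        "x": [float(c[0]) for c in coords],
        "y": [float(c[1]) for c in coords],
        "cluster": [str(int(label)) for label in labels],
        "text": [r["text"] for r in rows],
        "likes": [r["likes"] for r in rows],
        "timestamp": [r["timestamp"] for r in rows],
    }
    return px.scatter(
        data, x="x", y="y", color="cluster",
        hover_data=["text", "likes", "timestamp"],
    )


def run_pipeline(csv_path, html_path):
    """Run all steps and write a standalone HTML map."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = load_comments(handle)
    coords, labels = project_and_cluster(embed([r["text"] for r in rows]))
    build_map(rows, coords, labels).write_html(html_path)
```

A call such as `run_pipeline("comments.csv", "comment_map.html")` (both paths hypothetical) would produce a map that opens in a browser. Very long threads would likely need the embedding request split into batches, and very short threads may need a smaller UMAP `n_neighbors` and a smaller `min_cluster_size`.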
Findings and Implications
Across multiple datasets, several patterns emerged:
- More narrative-driven responses formed larger, more diffuse semantic regions.
- In some cases, what appeared to be polarized discussions visually resembled continuous landscapes rather than sharply separated camps.
- TF-IDF-based maps produced sharper separations around keyword repetition, while embedding-based maps revealed deeper semantic continuity.
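The TF-IDF contrast is easier to see in a small example. The sketch below is a plain-Python TF-IDF with a smoothed IDF term, written for illustration only (it is not the code behind the maps, and the three comments are made up). Because TF-IDF only sees shared tokens, two comments that repeat a keyword score as similar, while a paraphrase with no words in common scores zero.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Made-up comments for illustration.
comments = [
    "rent is too high",
    "rent control now",
    "housing costs are crushing tenants",
]
vecs = tfidf_vectors(comments)
keyword_overlap = cosine(vecs[0], vecs[1])  # both contain the token "rent"
paraphrase = cosine(vecs[0], vecs[2])       # related topic, no shared tokens
```

Here `keyword_overlap` is positive and `paraphrase` is exactly zero. An embedding model is intended to place the first and third comments near each other anyway, which is consistent with the sharper, keyword-driven separations seen in the TF-IDF maps.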
Why This Matters
From a research perspective, comment presentation may shape:
- perceived social norms
- perceived polarization
- perceived toxicity
- willingness to express minority views
Social psychology literature shows that people infer majority opinion from visible cues. If comment ordering distorts the visible distribution of opinions, readers' perception of norms may be distorted as well.
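A toy calculation shows how this distortion can arise. Everything here is hypothetical: the cluster names, the like counts, and the use of "top-k by likes" as a stand-in for a platform ranking, which in practice is opaque.

```python
from collections import Counter


def cluster_shares(comments):
    """Fraction of comments that fall in each cluster."""
    counts = Counter(c["cluster"] for c in comments)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def visible_slice(comments, k):
    """Top-k comments by likes: a simplified stand-in for a ranking algorithm."""
    return sorted(comments, key=lambda c: c["likes"], reverse=True)[:k]


# Made-up thread: 2 highly liked comments in one cluster, 8 quieter ones in another.
thread = (
    [{"cluster": "outrage", "likes": 500}] * 2
    + [{"cluster": "moderate", "likes": 10}] * 8
)

full_view = cluster_shares(thread)
top_view = cluster_shares(visible_slice(thread, 2))
```

In this made-up thread, `full_view` puts the "moderate" cluster at 80% of comments, but `top_view` contains only the "outrage" cluster, so a reader who sees just the top two comments would never encounter the majority position.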
From a product perspective, this raises design questions:
- Could alternative representations reduce misperception and polarization?
- Could visible distribution maps improve deliberative quality?
- How might platforms expose diversity without suppressing engagement?
From a policy perspective, transparency in comment representation could become part of broader algorithmic accountability conversations.
Resources
This project was inspired by Talk to the City, built by the AI Objectives Institute, which explores how collective input can be summarized while preserving nuance.