Social Media Comment Map

Project Overview

When people scroll social media, they often spend more time reading the comments than the original post.

Yet what we see in comment sections is rarely a neutral sample of public opinion. It is shaped by opaque ranking algorithms, engagement metrics, and visibility dynamics that amplify certain voices over others. As a result, readers may overestimate extreme views, misjudge the distribution of opinions, or misperceive social norms.

This project asks:

Can we make collective voices visible without flattening individual ones?

Instead of treating comments as engagement signals or raw text data, this approach treats them as a space of collective sensemaking, where representation itself shapes perception.

Why Comment Spaces Matter

Comment sections are a major site of social inference. People form impressions of what “most people think” based on what they can see, and that visibility is platform-mediated.

If ranking systems disproportionately surface outrage, slogan-like rhetoric, or high-engagement edge cases, readers may infer skewed social norms. This design problem is also a social cognition problem.

The Intervention

I built an interactive prototype that transforms a comment thread into a semantic map.

Each comment is embedded using a language model and projected into a two-dimensional space. Comments with similar meanings appear closer together, forming clusters that reflect recurring themes, framings, or narrative styles.
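To illustrate what "similar meanings appear closer together" means, here is a minimal sketch using cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors and the example comments are hypothetical stand-ins; real embeddings have hundreds or thousands of dimensions, and the prototype may use a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real comment embeddings.
toy_embeddings = {
    "Shipping was slow": [0.9, 0.1, 0.0],
    "Delivery took forever": [0.8, 0.2, 0.1],
    "Love the color": [0.1, 0.9, 0.2],
}

slow = toy_embeddings["Shipping was slow"]
forever = toy_embeddings["Delivery took forever"]
color = toy_embeddings["Love the color"]

# The two delivery complaints score higher with each other than with the
# unrelated comment, so a 2D projection would tend to place them nearby.
print(cosine_similarity(slow, forever) > cosine_similarity(slow, color))
```

A 2D projection can only approximate these relationships, so distances on the map are best read as neighborhoods rather than exact measurements.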

Crucially, this map does not summarize away individual voices. Users can click any point to view:

The goal is not to replace comments with AI summaries, but to make their structure visible.

Demo

Alternatively, open the interactive map in a new tab

Method

This system uses a lightweight NLP and visualization pipeline:

  1. Data Collection
    Comment threads are exported as CSV or Excel files (currently from Xiaohongshu and Reddit-style formats).
  2. Text Embedding
    Sentence-level embeddings are generated using OpenAI embedding models.
  3. Dimensionality Reduction
    UMAP projects high-dimensional embeddings into a 2D semantic space.
  4. Clustering
    HDBSCAN identifies dense thematic clusters without predefining cluster counts.
  5. Interactive Visualization
    Plotly renders an interactive scatterplot where each point represents a comment.
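The five steps above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the function names, the `comment` column name, the `text-embedding-3-small` model, the UMAP and HDBSCAN parameters, and the choice to cluster on the 2D coordinates rather than the original embeddings are all assumptions. It requires the `openai`, `umap-learn`, `hdbscan`, `numpy`, and `plotly` packages, which are imported lazily so the CSV loader can be used on its own.

```python
import csv
import io


def load_comments(csv_text, text_column="comment"):
    """Step 1: read comment texts from exported CSV content, skipping blank rows.

    The column name "comment" is an assumption; adjust it to match the export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[text_column].strip()
        for row in reader
        if (row.get(text_column) or "").strip()
    ]


def embed(texts, model="text-embedding-3-small"):
    """Step 2: embed each comment (model name is an assumption)."""
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment

    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def project_and_cluster(embeddings, min_cluster_size=5):
    """Steps 3 and 4: reduce to 2D with UMAP, then cluster with HDBSCAN."""
    import hdbscan
    import numpy as np
    import umap

    vectors = np.asarray(embeddings)
    coords = umap.UMAP(
        n_components=2, metric="cosine", random_state=42
    ).fit_transform(vectors)
    # HDBSCAN assigns the label -1 to points it treats as noise.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, labels


def build_figure(texts, coords, labels):
    """Step 5: one point per comment, colored by cluster, text shown on hover."""
    import plotly.express as px

    return px.scatter(
        x=coords[:, 0],
        y=coords[:, 1],
        color=[str(label) for label in labels],
        hover_name=texts,
    )
```

With an exported file read into a string, the steps chain together as `texts = load_comments(csv_text)`, `coords, labels = project_and_cluster(embed(texts))`, and `build_figure(texts, coords, labels).show()`. Very small threads may need smaller UMAP and HDBSCAN settings than the defaults used here.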

Findings and Implications

Across multiple datasets, several patterns emerged:

Why This Matters

From a research perspective, comment presentation may shape:

Social psychology literature shows that people infer majority opinion from visible cues. If comment ordering distorts the visible distribution of opinions, perceptions of social norms may be distorted as well.
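As a rough illustration of this reasoning, the sketch below compares the share of each cluster in a full thread with its share among the top comments when sorted by likes. The data is entirely made up for illustration, and the `cluster` and `likes` fields and both function names are hypothetical; they are not part of the prototype.

```python
from collections import Counter


def cluster_shares(labels):
    """Fraction of comments that fall in each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def visible_labels(comments, k):
    """Cluster labels of the k most-liked comments, i.e. what a reader sees first."""
    ranked = sorted(comments, key=lambda c: c["likes"], reverse=True)
    return [c["cluster"] for c in ranked[:k]]


# Hypothetical toy thread: most comments are supportive, but the two
# outraged comments attract far more likes.
comments = [
    {"cluster": "supportive", "likes": 3},
    {"cluster": "supportive", "likes": 5},
    {"cluster": "supportive", "likes": 2},
    {"cluster": "supportive", "likes": 4},
    {"cluster": "supportive", "likes": 1},
    {"cluster": "supportive", "likes": 6},
    {"cluster": "outraged", "likes": 90},
    {"cluster": "outraged", "likes": 75},
    {"cluster": "neutral", "likes": 2},
    {"cluster": "neutral", "likes": 8},
]

full = cluster_shares([c["cluster"] for c in comments])
top = cluster_shares(visible_labels(comments, k=3))

print("full thread:", full)
print("top 3 by likes:", top)
```

In this toy thread the supportive cluster makes up most of the comments yet is absent from the top three, which is the kind of gap between actual and visible distribution the paragraph describes.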

From a product perspective, this raises design questions:

From a policy perspective, transparency in comment representation could become part of broader algorithmic accountability conversations.

Resources

This project was inspired by Talk to the City, built by the AI Objectives Institute, which explores how collective input can be summarized while preserving nuance.