As an undergraduate linguistics major in New Mexico, I was exposed not only to the languages of present-day America but also to lots of Indigenous scholarship and writing on those languages. Some of the most eye-opening, “aha” moments of my college career came from those classes.

I remember reading an essay by (I think¹) Vine Deloria Jr in which he asked why his being a native speaker of his language seemed to have no value, while a white academic could spend a year in his community doing fieldwork, learn that same language (incompletely!), and turn that knowledge into tenure or a published book.

The relationship between (white) academia and Native communities has almost universally been an extractive one, not a mutually beneficial one. Once you see that, you see it everywhere, and you can’t unsee it. I first saw it for myself reading that essay.

The author’s question in that essay really stopped me in my tracks. I’d gotten interested in linguistics in high school through a hobby of constructing languages². Writing descriptive grammars was something I’d wanted to do, but that’s probably the most “extractive” kind of linguistics work there is. Swoop into a community, learn the language, swoop away and publish a book? Not cool. When you’re in high school, reading grammars for fun (yes, I did this), sometimes you forget that behind every dry academic description of a language, there is (or was) a speech community, a group of people who live their lives through that language. Those people matter.

My university Linguistics department worked hard to collaborate with Native communities in a way that was more fair - coauthoring papers and books with community members, for instance, or exchanging knowledge for knowledge by teaching courses at a tribal school in exchange for language access. They also paid community members for their time and expertise - this seems like an obvious thing to do, but sadly it’s not standard practice.

I’m very glad that my department was so aware of these inequalities and worked intentionally to address them. In the end, though, I found it hard to reconcile myself to the extractive nature of field research, as compelling as the linguistic knowledge gained might be.

Hugging Face just announced that it had created a Bluesky dataset, and the Bluesky community (the Bluesky communities that I’m part of, at least) responded very negatively to its doing so³. I see the same sort of extractive relationship in this dataset that Vine Deloria Jr did with academic researchers. They visit a community and spend time there, but only in order to extract a valuable output. They offer no value in return. The community that actually produced the knowledge or information sees no benefit from the work. In this case, they weren’t even consulted.

What could “consulted” mean here? What would a more fair exchange of value look like? Well, for one thing, I would hope to see informed consent. This is something that seems like it could be done fairly easily with Bluesky labelers. If I can subscribe to a labeler and like a post to add a “my favorite Taylor Swift era” label to my profile, I could just as easily do the same to add an “I consent to using my posts to train machine learning models” label. (And I bet doing that would have been received much more positively by the community - using Bluesky’s own tools and features to collect consent in an opt-in manner would have been rad.)
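To make the opt-in idea a little more concrete, here is a rough sketch of what filtering on such a label might look like. Everything in it is an assumption for illustration: the label value `ml-training-consent` and the labeler DID are made up, the `Label` fields are only loosely modeled on how labels are shaped, and the posts are plain in-memory objects rather than anything fetched from a real API.

```python
from dataclasses import dataclass

# Hypothetical values: neither this label nor this labeler actually exists.
CONSENT_LABEL = "ml-training-consent"
CONSENT_LABELER = "did:example:consent-labeler"


@dataclass(frozen=True)
class Label:
    src: str           # DID of the labeler that issued the label
    uri: str           # subject of the label; here, the account's DID
    val: str           # the label value
    neg: bool = False  # True when this label retracts an earlier one


@dataclass(frozen=True)
class Post:
    author: str  # DID of the account that wrote the post
    text: str


def consenting_authors(labels):
    """Return accounts whose most recent consent label is not a retraction.

    Labels are assumed to arrive in chronological order, so a later
    retraction overrides an earlier opt-in.
    """
    consent = {}
    for label in labels:
        if label.src == CONSENT_LABELER and label.val == CONSENT_LABEL:
            consent[label.uri] = not label.neg
    return {did for did, opted_in in consent.items() if opted_in}


def opt_in_only(posts, labels):
    """Keep only posts whose authors have opted in. No label means no consent."""
    allowed = consenting_authors(labels)
    return [post for post in posts if post.author in allowed]
```

The design choice that matters is the default: an account with no label, a label from some other labeler, or a retracted label is left out.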

I acknowledge that opt-in consent makes the creation of such a dataset harder⁴ and more complicated, and that the data collected will be less representative. But that’s not a reason to proceed without consent or community participation. If your work depends on knowledge or language or The Sauce that’s produced by a community of people, by and for themselves, and you’re collecting those things, and taking them away, and turning them into some kind of Value without their consent and without giving them anything in exchange, you’re in the wrong.

It’s a lot harder to do this work in a way that’s fair to all parties. But I want to live in a world where we all do our best to be fair to one another.


  1. It pains me that I haven’t been able to locate the specific essay. I’m traveling this week but I need to dig through my undergraduate papers and see if I can find it. I’d like to accurately attribute it. ↩︎

  2. aka, conlanging↩︎

  3. It’s since been taken down↩︎

  4. Insert “that’s a feature, not a bug” comment here. ↩︎