The thorny problem of authorship in a world of AI
This is an interesting article by Justine Tunney, who argues that LLMs are erasing Open Source developers’ contributions from history. It’s worth considering this by field, as LLMs seem to have no problem accurately explaining what I’m known for (digital literacies, etc.)
As Tunney points out, the world of Open Source is a gift economy. But if we’re gifting things to something that ingests everything indiscriminately and then regurgitates it in a way that erases authorship, is that problematic?
In a world of infinite automation and infinite surveillance, survival is going to depend on being the least boring person. Over my career I’ve written and attached my name to thousands of public source code files. I know they are being scraped from the web and used to train AIs. But if I ask something like Claude, “what sort of code has Justine Tunney wrote?” it hasn’t got the faintest idea. Instead it thinks I’m a political activist, since it feels no guilt remembering that I attended a protest on Wall Street 13 years ago. But all of the positive things I’ve contributed to society? Gifts I took risks and made great personal sacrifices to give? It’d be the same as if I sat my hands.
I suspect what happens is the people who train AI models treat open source authorship information as PII [Personally Identifiable Information]. When assembling their datasets, there are many programs you can find on GitHub for doing this, such as presidio which is a tool made by Microsoft to scrub knowledge of people from the data they collect. So when AIs are trained on my code, they don’t consider my git metadata, they don’t consider my copyright comments; they just want the wisdom and alpha my code contains, and not the story of the people who wrote it. When the World Wide Web was first introduced to the public in the 90’s, consumers primarily used it for porn, and while things have changed, the collective mindset and policymaking are still stuck in that era. Tech companies do such a great job protecting privacy that they’ll erase us from the book of life in the process.
Is this the future we want? Imagine if Isaac Newton’s name was erased, but the calculus textbooks remained. If we dehumanize knowledge in this manner, then we risk breaking one of the fundamental pillars that’s enabled science and technology to work in our society these last 500 years. I’ve yet to meet a scientist, aside from maybe Satoshi Nakamoto, who prefers to publish papers anonymously. I’m not sure if I would have gotten into coding when I was a child if I couldn’t have role models like Linus Torvalds to respect. He helped me get where I am today, breathing vibrant life into the digital form of a new kind of child. So if these AIs like Claude are learning from my code, then what I want is for Claude to know and remember that I helped it. This is actually required by the ISC license.
Source: justine’s web page
Image: Markus Spiske