Six per cent isn’t a lot, but perhaps a number of approaches working together could help with this?

The proliferation of harmful and offensive content is a problem that many online platforms face today. One of the most common approaches to moderating offensive content online is to identify and remove it after it has been posted, a process increasingly assisted by machine learning algorithms. More recently, platforms have begun employing moderation approaches that seek to intervene before offensive content is posted. In this paper, we conduct an online randomized controlled experiment on Twitter to evaluate a new intervention that aims to encourage participants to reconsider their offensive content and, ultimately, seeks to reduce the amount of offensive content on the platform. The intervention prompts users who are about to post harmful content with an opportunity to pause and reconsider their Tweet. We find that users in our treatment group, who were prompted with this intervention, posted 6% fewer offensive Tweets than non-prompted users in our control group. This decrease in the creation of offensive content cannot be attributed solely to the deletion and revision of prompted Tweets: we also observed a decrease in both the number of offensive Tweets that prompted users created in the future and the number of offensive replies to prompted Tweets. We conclude that interventions allowing users to reconsider their comments can be an effective mechanism for reducing offensive content online.
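To make the mechanism concrete, here is a minimal sketch of the pre-posting intervention flow the abstract describes. Everything here is an illustrative assumption: `score_offensiveness`, `THRESHOLD`, and `ask_user` are hypothetical stand-ins, since neither Twitter's classifier nor its decision cutoff is published in the paper.

```python
# Hypothetical sketch of the pre-posting intervention described in the
# abstract. score_offensiveness, THRESHOLD, and ask_user are illustrative
# stand-ins, not Twitter's actual implementation.

THRESHOLD = 0.8  # assumed decision cutoff; the paper does not publish one


def score_offensiveness(text: str) -> float:
    """Stand-in for the platform's offensive-content classifier."""
    # A real system would call a trained model; this toy keyword check
    # just lets the example run end to end.
    return 1.0 if {"idiot", "stupid"} & set(text.lower().split()) else 0.0


def on_submit(draft: str, ask_user) -> str | None:
    """Intervene before posting: a flagged draft triggers a prompt that
    lets the author post anyway, revise, or delete."""
    if score_offensiveness(draft) < THRESHOLD:
        return draft  # unflagged drafts are posted immediately
    action, revised = ask_user(draft)  # ("post" | "revise" | "delete", text)
    if action == "delete":
        return None  # the Tweet is never created
    if action == "revise":
        return revised  # the user posts the edited text instead
    return draft  # the user chose to post the original anyway


# Example: a prompted user who chooses to revise.
posted = on_submit("you are an idiot", lambda d: ("revise", "I disagree."))
assert posted == "I disagree."
```

The key design point is that the decision stays in the author's hands (post, revise, or delete), which is what distinguishes this kind of intervention from identify-and-remove moderation after the fact.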
Source: Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content | arXiv.org