Evolving Mastodon’s Trust & Safety Features

Here are my personal thoughts about how we could handle Trust & Safety features in Mastodon software.

This is based on my own experience in the field, my current knowledge of the source code and architecture, as well as my experience managing infrastructure for mastodon.social & mastodon.online since December 2022.

Context

Managing a Mastodon instance is hard work, with most of the effort going to moderation and abuse handling, not to the technical operations one might expect. This includes:

  • Reviewing and approving new registrations
  • Pro-active content moderation
  • Handling user-generated reports
  • Fighting abuse such as spam, denial of service attacks and other inauthentic activity
  • Member support (lost password, help using the platform, bugs…)

As the Fediverse grows, these topics need to be rigorously handled and we need to build tools to scale them to match user growth, without requiring a proportional growth of the moderation burden.

Mastodon currently has many features to manage these activities, including:

  • Built-in rate-limits
  • Advanced moderation features
  • Settings to harden your instance (restricting sign ups, invite-only…)
  • API & webhooks to develop custom tools
  • Various tools to limit or block remote instances and accounts that are not behaving properly

These features are quickly being outscaled by the growth in membership, and the community is asking for more effort in this area. This aligns with the Mastodon roadmap, as Mastodon gGmbH (the non-profit maintaining the software) itself administers two of the biggest instances and works to meet these needs on a daily basis. The moderation burden is not limited to Mastodon; every Fediverse platform needs to implement similar tools and features, and most end up re-implementing similar interfaces. Some instance admins have also developed custom tooling using the Mastodon API and webhooks to be more efficient at these tasks, or to introduce additional features.

Vision

I would like Mastodon to support configuring trusted providers for some of these moderation activities.

As a Mastodon instance admin, you could configure one or more trusted providers (third party or self-hosted), each handling some or all of a range of specific concerns: data-driven account moderation, semi-automated abuse-fighting, human intervention, context-sensitive abuse-fighting, end user support and more. You would then rely on those trusted services for these matters.

Multiple providers could be configured and consulted to inform or take actions (e.g. should this sign-up be allowed? Should this account be authorised to post?), with the Mastodon server making the final decision based on their responses (an average score, for example). Obviously some features, like end user support, should only allow one provider, while most of the others would allow several.
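
To make this concrete, here is a minimal sketch of how an instance could combine provider responses into a decision. It assumes each provider returns a score between 0 and 1; the names, thresholds and averaging strategy are illustrative only and not part of any existing Mastodon API.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ProviderVerdict:
    provider: str
    score: float  # 0.0 = certainly abusive, 1.0 = certainly fine

def decide(verdicts: list[ProviderVerdict],
           allow_threshold: float = 0.7,
           review_threshold: float = 0.4) -> str:
    """Combine provider scores into a single decision.

    The aggregation here is a plain average; a real implementation could
    instead weight providers by their configured trust level.
    """
    if not verdicts:
        return "review"  # no data at all: fall back to human review
    combined = mean(v.score for v in verdicts)
    if combined >= allow_threshold:
        return "allow"
    if combined >= review_threshold:
        return "review"  # queue for moderators
    return "reject"

# Example: three providers asked "should this sign-up be allowed?"
print(decide([
    ProviderVerdict("own-provider", 0.9),
    ProviderVerdict("coop-provider", 0.6),
    ProviderVerdict("intel-sharing", 0.8),
]))  # -> "allow"
```

A trust-weighted average, giving more weight to your own provider, would be a natural refinement of this.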

Depending on the level of trust you have configured for each provider, you could choose to act without human intervention, or have their suggested actions put into a queue for you to review and decide. In any case, an audit log of all actions taken needs to be available, as well as the ability to retract a given action.

Why external software?

Mastodon’s source code has, by necessity, grown more and more complex, and the velocity of the project is not compatible with the often very fast pace of moderating spam and inauthentic behaviour.

Delegating this to software outside of the main code base would allow these tools to evolve more quickly, attract different contributors and have fewer dependencies on the core team. Additional funding may also be found from parties that want to support moderation tooling, but not platform development.

Furthermore, if we design a platform-agnostic protocol, it will attract various external contributors, both individuals and companies, who may not be Mastodon users. The software will also be able to help other platforms and apps with their own moderation burdens.

It also allows anybody to start writing their own implementation, either because they want specific features or because they want to quickly experiment with an idea.

Why external providers?

As it stands, scaling a Mastodon instance is hard, as the operators need to form a moderation team with 24/7 availability and good reaction times, learn about all the specific vectors for abuse and common issues, design guidelines and more.

If you are a very small instance, you most probably do not have the resources to manage all of this, but you are still expected to react quickly if anything happens. For example, a spam attack launched from your instance against users on other instances needs to be handled quickly and efficiently if you do not want to get blocked (and run the risk of never being unblocked). I think we could provide an opt-in default provider at installation, at least for some concerns (spam-fighting?), to alleviate the load on those new admins.

On the other hand, if (like Mastodon gGmbH) you administer multiple instances, you need your team to manage each instance separately, which can very quickly become painful. Having a common provider for all your instances makes life much easier for your moderation team, and allows coordinated action on every instance when needed (sharing IP blocks and toxic domains, for example).

Several organisations (either non-profits or co-ops) are also in the works, with the goal of pooling resources to tackle these topics. Those organisations could run their own provider service and offer it to their members. I can even envision companies jumping on this topic and charging money for it, like Akismet does.

Furthermore, abuse-fighting is very often more effective when you have a view broader than your own instance. For example, if a bad actor starts an attack on one instance, it is very likely that they will target other instances once the initial target mitigates their actions. Sharing this kind of information blunts the attack and allows proactive action. The same goes for corpus-based methods (NLP, classification), which are much more effective when more data is available.

The last argument in favour of mutualisation is money: access to good information is expensive (reputation databases, Crowdsec…), it does not make sense for each organisation to pay for it separately, and most instances will not have the funding for it anyway. Having external providers handle it is much more efficient and can unlock very powerful proactive abilities by leveraging those tools.

Why multiple providers?

There are a few big reasons for this:

  • Availability: you may not want to depend on only one provider. Even the best services have downtime, and you need to handle this gracefully but still have some protection
  • Trust: you may not want to trust a single provider on critical decisions regarding your instance. Having multiple providers configured allows your instance to build a consensus-based decision, similarly to how mail spam fighting software works based on various RBLs
  • Spreading intelligence signals: this goes both ways, as you might want to send various signals from the traffic you see to some providers to help the overall network (see below for more details), but you also might want more data at your disposal to make decisions. A specific example of this is spam-fighting, where you really want as much information as possible about IP / email reputation in order to be proactive about it.

Implementation

Mastodon source code

I envision the implementation in Mastodon’s software to be mostly around a few areas:

  • Provider setup and auditing (why has an action been taken?)
  • Sending various telemetry data to the trusted providers: posts (probably not content at first, for privacy reasons), sign-ups, current activity rates?
  • Gathering immediate information synchronously when specific actions are in progress (new post, sign-up) and acting upon the results (see the sketch after this list)
  • If we go for a more advanced integration with features like human moderation and user support, then these need more specific integrations and most probably API / webhooks conforming to the to-be-designed protocol (see below)
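
As an illustration of the third point, here is a rough sketch of what a synchronous sign-up check could look like on the instance side. The provider URL, endpoint path, payload fields and response format are all assumptions made for this example; no such endpoint exists today.

```python
import requests  # any HTTP client would do

# Hypothetical provider endpoint; nothing here is a real API.
PROVIDER_URL = "https://provider.example/api/v1/checks/sign-up"

def check_sign_up(email_domain: str, ip_prefix: str, api_key: str) -> dict:
    """Ask a configured provider whether a sign-up should be allowed.

    Only coarse-grained data (email domain, truncated IP) is sent here,
    in line with the privacy levels discussed later in this post.
    """
    response = requests.post(
        PROVIDER_URL,
        json={"email_domain": email_domain, "ip_prefix": ip_prefix},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=2,  # keep the sign-up flow snappy even if the provider is slow
    )
    response.raise_for_status()
    # e.g. {"score": 0.92, "reasons": ["known_good_ip_range"]}
    return response.json()
```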

Reference provider implementation(s)

A reference provider needs to be implemented alongside the Mastodon work above, to be able to test it during development and gather real-world feedback on how it performs.

It would live in a separate repository, and could be a semi-independent project at the start (but not too independent, as it will need to evolve alongside the Mastodon implementation), then become more independent as the protocol settles down.

I would like this implementation to be the flagship one and gather most of the volunteers at the start of the project. We need to ensure that as many volunteers as possible are participating in this implementation rather than having multiple groups working on their own solution.

API / Protocol

Most of what I discussed above could probably be implemented with the existing Mastodon API & webhooks, but I strongly think we need to work on a specific protocol for this purpose, for two reasons:

  1. These use cases have specific needs around privacy: more privacy in some areas, as data is going to an external party, and less in others, to expose specific data that is not available anywhere else
  2. I would really like those features to not be tied to Mastodon and to be usable by other software. We should try to use generic (as opposed to Mastodon-specific) vocabulary and objects for the protocol, which would help the whole Fediverse. Going this way is probably more work, but the long-term impact is a lot bigger

This protocol would be based on HTTP, versioned, and would use simple and common interfaces such as a REST API and webhooks, with secret keys set up when an admin registers with a provider to secure communication.
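
For instance, the shared secret could be used to sign every request and webhook delivery with an HMAC, so the receiver can verify both the origin and the integrity of the payload. This is only a sketch of one possible scheme; the header name mentioned in the comment is a placeholder, not part of any specification.

```python
import hashlib
import hmac

def sign_payload(secret: str, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify_payload(secret: str, body: bytes, received_signature: str) -> bool:
    """Verify a signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_signature)

# The sender would attach the signature in a header such as
# `X-Moderation-Signature` (placeholder name), and the receiver would
# recompute and compare it before trusting the payload.
```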

It would also need to specify client (i.e. instance software) behaviour around rate limits and timeouts (when to fail open, when to retry…).
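
Here is a sketch of the kind of client behaviour the protocol could mandate: a short timeout, a bounded number of retries, and an explicit signal to the caller when a provider is unreachable, so the instance's fail-open or fail-closed policy can kick in. All values and names here are placeholders.

```python
import time
import requests

def query_provider(url: str, payload: dict, api_key: str,
                   timeout_s: float = 2.0, retries: int = 2):
    """Query a provider, retrying briefly, and return None if it is unreachable.

    Returning None lets the caller decide whether to fail open (accept),
    fail closed (reject) or simply ignore this provider for the decision.
    """
    for attempt in range(retries + 1):
        try:
            response = requests.post(
                url, json=payload,
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=timeout_s,
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt < retries:
                time.sleep(0.2 * (attempt + 1))  # small backoff between retries
    return None  # provider unavailable: caller applies its failure policy
```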

Instance admins should be able to choose the access level they want each provider to have. For example, they could allow their moderation provider to act on reports and block users, but require admin approval for blocking an instance or other more impactful actions.
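
One way to model this is a set of permission scopes granted per provider, with the more impactful ones routed through an approval queue. The scope names below are invented for illustration and do not map to existing Mastodon OAuth scopes.

```python
from enum import Enum

class ProviderScope(Enum):
    """Hypothetical permission scopes an admin could grant to a provider."""
    READ_REPORTS = "read_reports"
    RESOLVE_REPORTS = "resolve_reports"
    SUSPEND_ACCOUNTS = "suspend_accounts"
    BLOCK_DOMAINS = "block_domains"  # more impactful: may require admin approval

# Example configuration: the provider may resolve reports and suspend
# accounts on its own, but domain-level blocks go to an approval queue.
GRANTED = {ProviderScope.READ_REPORTS, ProviderScope.RESOLVE_REPORTS,
           ProviderScope.SUSPEND_ACCOUNTS}
NEEDS_APPROVAL = {ProviderScope.BLOCK_DOMAINS}

def handle_requested_action(scope: ProviderScope) -> str:
    """Decide what happens when a provider requests an action with this scope."""
    if scope in GRANTED:
        return "execute"
    if scope in NEEDS_APPROVAL:
        return "queue_for_admin_approval"
    return "deny"
```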

I believe the protocol should also allow some providers to interact with other providers, for example to use their signals as inputs for their own decisions and share intelligence.

Concerns

Here are various concerns I have in mind, which need to drive the design of these features.

Privacy

This is the big concern. We have always tried not to leak any information outside of your Mastodon instance, and the provider proposal obviously breaks that.

As always, privacy should be under the control of the user, in this case the instance admin. For this, I suggest the protocol supports various levels of data privacy for each provider, allowing instance admins to send only the amount of data they feel comfortable sharing with each provider. For example (a configuration sketch follows this list):

  • Your own provider: send everything (including IPs, email addresses, post content). This is considered internal to your service. This provider also handles your human moderation and user support, so it has full access.
  • A non-profit provider you are a member of: you have a contract with them and their privacy policy meets your needs. You could send content URLs, anonymised IPs (truncated to /24) and tokenised email addresses.
  • A third provider you want to share some data with for intelligence sharing, but nothing more: you could send post data without content, AS numbers and email domains.
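
Here is a sketch of what such per-provider data-sharing levels could look like in an instance's configuration, mirroring the three examples above. All field names and level names are invented for illustration.

```python
# Hypothetical per-provider data-sharing configuration (illustrative only).
PROVIDERS = [
    {
        "name": "own-provider.example",
        "share": {"ips": "full", "emails": "full", "post_content": True},
        "roles": ["moderation", "user_support", "spam"],
    },
    {
        "name": "coop-provider.example",
        "share": {"ips": "truncated_/24", "emails": "tokenised",
                  "post_content": False, "content_urls": True},
        "roles": ["spam", "intel_sharing"],
    },
    {
        "name": "intel-only.example",
        "share": {"ips": "asn_only", "emails": "domain_only",
                  "post_content": False},
        "roles": ["intel_sharing"],
    },
]
```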

I firmly believe that instances should also expose the list of configured providers, along with the information being shared with each.

I am not an expert on these topics, but there has been a lot of research in this area, and some very nice solutions to adjacent problems have been deployed (for example, k-anonymity in the HIBP Pwned Passwords API).
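
To illustrate how a k-anonymity style lookup could apply here: instead of sending a full email hash to a provider, the instance could send only a short hash prefix and perform the final match locally, so the provider never learns exactly which address was checked. This mirrors the approach of the HIBP Pwned Passwords range API, but the helper below and its provider call are hypothetical.

```python
import hashlib

def email_hash(email: str) -> str:
    """Normalise and hash an email address."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest().upper()

def is_flagged(email: str, query_provider) -> bool:
    """k-anonymity style lookup: send only the first 5 hex characters.

    `query_provider(prefix)` stands in for an HTTP call returning every known
    flagged hash suffix sharing that prefix; the exact match happens locally,
    so the provider never sees the full hash.
    """
    digest = email_hash(email)
    prefix, suffix = digest[:5], digest[5:]
    return suffix in query_provider(prefix)
```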

Availability

You do not want your instance to go down if a provider is having issues. Providers need to be designed to be as available as possible, both by having a sane architecture and by being hosted behind protection services so they are not impacted by DDoS or other attacks.

But instances also need to gracefully handle those failures, by either ignoring the failing provider and using the ones that are responding, failing open (accepting everything) or failing closed (rejecting everything).

This could even be a setting controlled by the server admin: benefit from data crowdsourced from multiple instances when available, but make their own provider mandatory and reject everything if it is down (same failure domain as the instance).
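
A small sketch of what that setting could look like, assuming verdicts have already been collected (or not) from each configured provider; the policy names are illustrative.

```python
# Sketch of an admin-configurable failure policy (names are illustrative).
FAILURE_POLICY = {
    "own-provider.example": "mandatory",   # unreachable -> reject (fail closed)
    "coop-provider.example": "optional",   # unreachable -> ignore for this decision
}

def apply_failure_policy(responses: dict) -> str | None:
    """`responses` maps provider name to its verdict, or None if unreachable.

    Returns "reject" if a mandatory provider is down, otherwise None so normal
    score aggregation can proceed with whichever providers answered.
    """
    for provider, policy in FAILURE_POLICY.items():
        if policy == "mandatory" and responses.get(provider) is None:
            return "reject"
    return None
```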

Misuse

Such a service is in an ideal position to be used for nefarious purposes.

This can be mitigated by using a consensus between various providers, but security is very important here and needs to be a focus of the project. All best practices need to be implemented from the start (mandatory 2FA, audit log, dependency management, automatic code auditing…).

Any organisation deploying this kind of service should also be audited by external auditors to ensure they are following the best practices and will not be a source of data leaks or security issues. The software needs to be designed in such a way that those audits are as easy as possible.

Next steps

The scope of such a project is huge, and I strongly think it would be unwise to start by trying to design it entirely and writing all the needed protocols. It needs live experimentation and real use as soon as possible, to better understand what can work and what needs more thinking or an alternate design.

As such, I suggest this plan:

  1. Start with a small, manageable scope. Spam-fighting is probably the easiest part to implement, as the flows are simple: ingest some telemetry data, and expose an endpoint that answers “should I allow this sign-up?” (based on IP reputation, rate of new registrations, email provider reputation…) or “should I allow this new post?” (for example, if your instance usually has 5 new posts per minute, a spike to 500 posts over 60 seconds is an easy signal to act upon; see the sketch after this list). Telemetry for this can be limited to less privacy-sensitive data at the start, as it should first work on traffic-based behaviour rather than content.
  2. Do not start with interoperability. This is a highly desirable goal, but experimentation will be much faster if the initial work is strongly tied to a single platform. I feel confident that we can get something plugged into mastodon.social or mastodon.online quite early, and this should bring relevant real-world data that could be used to start tuning the various algorithms. Obviously other platforms can also start working on their own implementations, with the caveat that nothing is normalised at the moment and the protocol can change.
  3. Design a modular protocol from the start, with versioned APIs / payloads, so it can evolve quickly and instances can choose which parts they want to implement. This will be very important because, once the initial implementation is done, other parts will be added over time, such as report management or content moderation features.
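
As a toy illustration of the traffic-based signal mentioned in the first point (the jump from 5 posts per minute to 500 posts in 60 seconds), a first provider implementation could start with something as simple as comparing the current one-minute rate to a configured baseline. This is a sketch, not a proposed algorithm.

```python
from collections import deque
from time import time

class RateSpikeDetector:
    """Flag when the current 60-second rate far exceeds the usual baseline."""

    def __init__(self, baseline_per_minute: float, spike_factor: float = 10.0):
        self.baseline = baseline_per_minute
        self.spike_factor = spike_factor
        self.events = deque()  # timestamps of recent events

    def record(self, timestamp: float | None = None) -> bool:
        """Record one event (e.g. a new post); return True if it looks like a spike."""
        now = timestamp if timestamp is not None else time()
        self.events.append(now)
        while self.events and now - self.events[0] > 60:
            self.events.popleft()
        return len(self.events) > self.baseline * self.spike_factor

# An instance that usually sees 5 posts per minute would flag a burst of
# more than 50 posts in the last 60 seconds with the defaults above.
detector = RateSpikeDetector(baseline_per_minute=5)
```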

I hope that this vision can be a first meeting point for the multiple organisations, groups and people interested in this topic and that things can start moving in the coming weeks. These matters are important and have been discussed a lot, but I would really like some real progress to be made and experimentation work to start soon.

I will advocate for this on the Mastodon side, as this is the group I have the most influence with, and I will work with IFTAS to ensure we go in the same direction.

If you want to contribute, you can become a Mastodon sponsor (money really helps; at the moment there is only one full-time developer on the project), join the Mastodon Discord (available to Patreon members) or reach out to me directly.

Thanks for reading, and let’s start building!