How will censoring of entries for legal or ethical reasons be handled? For example, a version of a component might contain copyrighted or illegal material, someone might invoke their GDPR right to erasure, or someone might be at risk of being doxed through information stored in a component. Has any thought been given to that? Will there be something like a tombstone with a specific format that can replace a log entry at any time and still pass validation as a special case? Or will it be handled in some other way?
There are two different scenarios here:
In one, a release's "content" is all that needs to be taken down; in this case the release would be marked "yanked" and the registry would stop serving / delete its content.
In another, something about a package name or release metadata must be taken down. Hopefully this would be rare, but certainly a court could compel a registry to do so. In that case - under the current design - the entire package log would need to be removed and presumably replaced (triggering warnings and requiring user intervention for consumers). There is room in the protocol for tombstoning a single release's entry but the log semantics would need to be updated to allow for that.
The information in the cryptographic data structures is kept very minimal.
Additionally, the strategy Lann describes for content takedown can also work for other pieces of metadata that the log refers to by hash instead of including directly. This actually includes package names, which we don't include directly in logs and instead refer to by hash. Other future metadata like a "rationale" or "context" field for a given event could also be included via hash-indirection with the content stored separately. It's my hope that by being cautious here and employing this indirection we can stay ahead of any concerns that might otherwise force us to rewrite the whole log.
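To make the hash-indirection idea concrete, here is a minimal sketch, not the registry's actual data structures. It assumes a simple SHA-256 hash chain; the names `IndirectLog`, `take_down`, and `fetch` are hypothetical. The log commits only to digests, and the content lives in a separate store, so removing content leaves the log head unchanged.

```python
import hashlib
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class IndirectLog:
    """Hypothetical log that commits to content digests, never to content."""

    def __init__(self) -> None:
        self.digests: List[str] = []        # what the log commits to
        self.store: Dict[str, bytes] = {}   # content, kept outside the log

    def append(self, content: bytes) -> str:
        d = sha256_hex(content)
        self.digests.append(d)
        self.store[d] = content
        return d

    def head(self) -> str:
        # Fold the digests into a simple hash chain (an assumption made
        # for illustration; a real log would likely use a richer structure).
        h = "0" * 64
        for d in self.digests:
            h = sha256_hex((h + d).encode("ascii"))
        return h

    def take_down(self, digest: str) -> None:
        # Delete the content only; the digest stays in the log.
        self.store.pop(digest, None)

    def fetch(self, digest: str) -> Optional[bytes]:
        return self.store.get(digest)


log = IndirectLog()
log.append(b"release 1.0.0 contents")
bad = log.append(b"release 1.1.0 contents (to be removed)")
log.append(b"release 1.2.0 contents")

head_before = log.head()
log.take_down(bad)
head_after = log.head()
```

In this sketch `head_before` and `head_after` are equal, so consumers validating the log see no change, while `log.fetch(bad)` no longer returns the removed content.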
This actually includes package names, which we don't include directly in logs and instead refer to by hash.
You mean directly hashing the package name? That would be crackable using something like hashcat given enough compute power, right? Or do you mean some kind of scheme which has a commitment to a specific name while only being able to crack it with knowledge of a separately stored salt that can be censored if necessary?
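The two options in this question can be sketched as follows. This is an illustration of the question, not of what the protocol does: the wordlist, the example package name, and the `commit`/`verify` helpers are all hypothetical, and SHA-256 over a fixed-length random salt followed by the name is just one assumed way to build such a commitment.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def plain_hash(name: str) -> str:
    # Unsalted hash of a package name.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def dictionary_attack(target: str, candidates: Iterable[str]) -> Optional[str]:
    # Recover a name from its unsalted hash by trying guesses,
    # which is what a tool like hashcat does at much larger scale.
    for guess in candidates:
        if plain_hash(guess) == target:
            return guess
    return None


def commit(name: str, salt: bytes) -> str:
    # Salted commitment: the salt has a fixed length (32 bytes), so the
    # split between salt and name is unambiguous.
    return hashlib.sha256(salt + name.encode("utf-8")).hexdigest()


def verify(name: str, salt: bytes, commitment: str) -> bool:
    return hmac.compare_digest(commit(name, salt), commitment)


# Unsalted: a guessable name is recoverable from its hash alone.
wordlist = ["left-pad", "example:secret-project", "wasi:http"]
recovered = dictionary_attack(plain_hash("example:secret-project"), wordlist)

# Salted: the log would hold only the commitment; the salt is stored
# separately and can be deleted if the name must be censored.
salt = secrets.token_bytes(32)
commitment = commit("example:secret-project", salt)
opens_with_salt = verify("example:secret-project", salt, commitment)
opens_with_other_salt = verify(
    "example:secret-project", secrets.token_bytes(32), commitment
)
```

Once the salt is deleted, checking a guessed name against the commitment would also require guessing the 256-bit salt, so the dictionary attack above no longer applies.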
It is an interesting idea but I am not aware of any case where the hash of censored content also needs to be censored.
If the package name is simple enough for the hash to be crackable in reasonable time, you will need to censor the hash itself too to prevent being able to get the information anyway, right?
If it was a security issue, yes
Last updated: Jan 24 2025 at 00:11 UTC